During controller reset, the driver tries to flush all the pending firmware
event works from worker queue that are queued prior to the reset. However,
if any work is waiting for device addition/removal operation to be
completed at the SML, then a deadlock is observed. This is due to the
controller reset waiting for the device addition/removal to be completed
and the device/addition removal is waiting for the controller reset to be
completed.
To limit this deadlock, continue with the controller reset handling without
canceling the work which is waiting for device addition/removal operation
to complete at SML.
Link: https://lore.kernel.org/r/20220210095817.22828-2-sreekanth.reddy@broadcom.com
Signed-off-by: Sreekanth Reddy <sreekanth.reddy@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Remove some warnings found by running scripts/kernel-doc, which is caused
by using 'make W=1'.
drivers/scsi/mpi3mr/mpi3mr_fw.c:2188: warning: Function parameter or
member 'reason_code' not described in 'mpi3mr_check_rh_fault_ioc'
drivers/scsi/mpi3mr/mpi3mr_fw.c:3650: warning: Excess function parameter
'init_type' description in 'mpi3mr_init_ioc'
drivers/scsi/mpi3mr/mpi3mr_fw.c:4177: warning: bad line
Link: https://lore.kernel.org/r/20211231082350.19315-1-yang.lee@linux.alibaba.com
Signed-off-by: Yang Li <yang.lee@linux.alibaba.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Add support for the io_uring interface in I/O-polled mode.
This feature is disabled in the driver by default. To enable the feature, a
module parameter "poll_queues" has to be set with the desired number of
polling queues.
When the feature is enabled, the driver reserves a certain number of
operational queue pairs for the poll_queues either from the available queue
pairs or creates additional queue pairs based on the operational queue
availability.
The Polling queues will have corresponding IRQ and ISR functions as similar
to default queues. However, the IRQ line is disabled by the driver for
poll_queues.
Link: https://lore.kernel.org/r/20211220141159.16117-22-sreekanth.reddy@broadcom.com
Signed-off-by: Sreekanth Reddy <sreekanth.reddy@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The IOC sends a Prepare for Reset Event to the host to prepare for a Soft
Reset. This event data has two reason codes:
1. Start - The host is expected to gracefully quiesce all I/O within
approximately 1 second.
2. Abort - The IOC is requesting to abort a previous Prepare for Reset
Event request. Normal I/O may be resumed.
Link: https://lore.kernel.org/r/20211220141159.16117-20-sreekanth.reddy@broadcom.com
Signed-off-by: Sreekanth Reddy <sreekanth.reddy@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Enhance driver to gracefully handle discrepancies in certain key data sizes
between firmware update operations as mentioned below:
- The driver displays an error message and marks the controller as
unrecoverable if the firmware reports ReplyFrameSize that is greater
than the current ReplyFrameSize.
- If the firmware reports ReplyFrameSize greater than the current
ReplyFrameSize then the driver uses the current ReplyFrameSize while
copying the reply messages.
- The driver displays an error message and marks the controller as
unrecoverable if the firmware reports MaxOperationalReplyQueues less
than the currently allocated operational reply queues count.
- If the firmware reports MaxOperationalReplyQueues that is greater than
the currently allocated operational reply queue count then the driver
ignores the new increased value and uses the previously allocated number
of operational queues only.
- If the firmware reports MaxDevHandle greater than the previously used
MaxDevHandle value after a reset then the driver re-allocates the
'device remove pending bitmap' buffer with the newer size using
krealloc().
Link: https://lore.kernel.org/r/20211220141159.16117-18-sreekanth.reddy@broadcom.com
Signed-off-by: Sreekanth Reddy <sreekanth.reddy@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Detect asynchronous reset that occurred in the firmware by polling for
reset history bit of IOC status register is set and if that bit is set,
then the driver waits for the controller to become ready and then
re-initializes the controller.
Also reduce the time driver is waiting for the controller to acknowledge
the reset action after issuing a specific reset action to the
controller. The wait time is reduced from 510 seconds to 30 seconds. If the
controller didn't acknowledge a specific reset action within the time
interval then the driver marks the controller as unrecoverable instead of
retrying two more times prior to giving up.
Link: https://lore.kernel.org/r/20211220141159.16117-17-sreekanth.reddy@broadcom.com
Signed-off-by: Sreekanth Reddy <sreekanth.reddy@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Currently the driver marks the controller as unrecoverable if there is an
asynchronous reset or fault during the initialization, reinitialization
post reset, and OS resume.
Enhance driver to retry the initialization, re-initialization, and resume
sequences for a maximum of 3 times if the controller became faulty or
asynchronously reset due to a firmware activation during the initialization
sequence.
Link: https://lore.kernel.org/r/20211220141159.16117-15-sreekanth.reddy@broadcom.com
Signed-off-by: Sreekanth Reddy <sreekanth.reddy@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Save snapdump and fault the controller with the given reason code if it is
already not in the fault or not in asynchronous reset. This ensures that
soft reset is issued from the watchdog thread. This will also be used to
handle initialization time faults/resets/timeout as in those cases
immediate soft reset invocation is not required.
Link: https://lore.kernel.org/r/20211220141159.16117-12-sreekanth.reddy@broadcom.com
Signed-off-by: Sreekanth Reddy <sreekanth.reddy@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
I intended to move from snprintf() to scnprintf() in the previous patch but
I messed up and did not do that. The result of my bug is that it this
function could trigger a WARN() if the buffer is too large.
Link: https://lore.kernel.org/r/20211013083005.GA8592@kili
Fixes: 76a4f7cc59 ("scsi: mpi3mr: Clean up mpi3mr_print_ioc_info()")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
This function is more complicated than necessary.
If we change from scnprintf() to snprintf() that lets us remove the if
bytes_wrote < sizeof(protocol) checks. Also, we can use bytes_wrote ? ","
: "" to print the comma and remove the separate if statement and the
"is_string_nonempty" variable.
[mkp: a few formatting cleanups and s/wrote/written/]
Link: https://lore.kernel.org/r/20210916132605.GF25094@kili
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The pci_alloc_irq_vectors_affinity() function returns negative error codes
or it returns a number between the minimum vectors (1 in this case) and
max_vectors. It won't return zero. Because "i" is a u16 then the error
handling won't work. And also if it did work the error code was not set.
Really "max_vectors" can be an int as well because we're doing a min_t() on
int type. The other change is that it's better to remove unnecessary
initialization so that static checkers can warn us if there are ever
uninitialized variable bugs introduced in the future.
I changed the error code from -1 (-EPERM) if the kmalloc() failed to
-ENOMEM. And on success path I changed it from "return retval;" to "return
0;" which shouldn't affect the compiled code but makes it more readable.
Link: https://lore.kernel.org/r/YMCJcnmSI4kOIyv/@mwanda
Fixes: 824a156633 ("scsi: mpi3mr: Base driver code")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Register driver for threaded interrupts.
By default the driver will attempt I/O completion from interrupt context
(primary handler). Since the driver tracks per reply queue outstanding
I/Os, it will schedule threaded ISR if there are any outstanding I/Os
expected on that particular reply queue.
Threaded ISR (secondary handler) will loop for I/O completion as long as
there are outstanding I/Os (speculative method using same per reply queue
outstanding counter) or it has completed some X amount of commands
(something like budget).
Link: https://lore.kernel.org/r/20210520152545.2710479-18-kashyap.desai@broadcom.com
Cc: sathya.prakash@broadcom.com
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Tomas Henzl <thenzl@redhat.com>
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Kashyap Desai <kashyap.desai@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Detection of firmware fault or any kind of unresponsiveness in the
controller (any admin command which times out) results in resetting the
controller. The primary reset mechanisms used are either soft reset or diag
fault reset. A reset is performed if the host sets the ResetAction field in
the HostDiagnostic register to either 001b (soft reset) or 007b (diag fault
reset). After successfully resetting the controller the driver
reinitializes the controller by going through start of the day
initialization procedure. Pending I/Os during the reset are returned back
to the SCSI midlayer for retry.
Link: https://lore.kernel.org/r/20210520152545.2710479-10-kashyap.desai@broadcom.com
Cc: sathya.prakash@broadcom.co
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Tomas Henzl <thenzl@redhat.com>
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Kashyap Desai <kashyap.desai@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Create operational request and reply queue pair.
The MPI3 transport interface consists of an Administrative Request Queue,
an Administrative Reply Queue, and Operational Messaging Queues. The
Operational Messaging Queues are the primary communication mechanism
between the host and the I/O Controller (IOC). Request messages, allocated
in host memory, identify I/O operations to be performed by the IOC. These
operations are queued on an Operational Request Queue by the host driver.
Reply descriptors track I/O operations as they complete. The IOC queues
these completions in an Operational Reply Queue.
To fulfil large contiguous memory requirement, driver creates multiple
segments and provide the list of segments. Each segment size should be 4K
which is a hardware requirement. An element array is contiguous or
segmented. A contiguous element array is located in contiguous physical
memory. A contiguous element array must be aligned on an element size
boundary. An element's physical address within the array may be directly
calculated from the base address, the Producer/Consumer index, and the
element size.
Expected phased identifier bit is used to find out valid entry on reply
queue. Driver sets <ephase> bit and IOC inverts the value of this bit on
each pass.
Link: https://lore.kernel.org/r/20210520152545.2710479-4-kashyap.desai@broadcom.com
Cc: sathya.prakash@broadcom.com
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Tomas Henzl <thenzl@redhat.com>
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Kashyap Desai <kashyap.desai@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>