User experience slow recovery when target device went through a stop/start
of the authentication application (app_stop/app_start).
Between the period of app_stop and app_start on the target device, target
device choose to send ELS Reject for any receive AUTH ELS command. At this
time, authentication application does not do ELS reject if it encounters
error.
Therefore, AUTH ELS reject signify authentication application is not
running. If driver passes up the AUTH ELS Reject to the authentication
application, then it would result in authentication application
retrying/resending the same AUTH ELS command again + delay.
As a work around, driver should trigger a session tear down where it tells
the local authentication application to also tear down. At the next
relogin, both sides are then synchronized.
Link: https://lore.kernel.org/r/20220608115849.16693-10-njavali@marvell.com
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Quinn Tran <qutran@marvell.com>
Signed-off-by: Nilesh Javali <njavali@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The scenario is this: User loaded driver but has not started authentication
app. All sessions to secure device will exhaust all login attempts, fail,
and in stay in deleted state. Then some time later the app is started. The
driver will replenish the login retry count, trigger delete to prepare for
secure login. After deletion, relogin is triggered.
For the session that is already deleted, the delete trigger is a no-op. If
none of the sessions trigger a relogin, no progress is made.
Add a relogin trigger.
Link: https://lore.kernel.org/r/20220608115849.16693-5-njavali@marvell.com
Fixes: 7ebb336e45 ("scsi: qla2xxx: edif: Add start + stop bsgs")
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Quinn Tran <qutran@marvell.com>
Signed-off-by: Nilesh Javali <njavali@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
After initiator has burned up all login retries, target authentication
application begins to run. This triggers a link bounce on target side.
Initiator will attempt another login. Due to N2N, the PRLI [nvme | fcp] can
fail because of the mode mismatch with target. This patch add a few more
login retries to revive the connection.
Link: https://lore.kernel.org/r/20220607044627.19563-11-njavali@marvell.com
Fixes: 4de067e5df ("scsi: qla2xxx: edif: Add N2N support for EDIF")
Signed-off-by: Quinn Tran <qutran@marvell.com>
Signed-off-by: Nilesh Javali <njavali@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
User failed to see disk via n2n topology. Driver used up all login retries
before authentication application started. When authentication application
started, driver did not have enough login retries to connect securely. On
app_start, driver will reset the login retry attempt count.
Link: https://lore.kernel.org/r/20220607044627.19563-10-njavali@marvell.com
Fixes: 4de067e5df ("scsi: qla2xxx: edif: Add N2N support for EDIF")
Signed-off-by: Quinn Tran <qutran@marvell.com>
Signed-off-by: Nilesh Javali <njavali@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Recently driver has implemented a new doorbell mechanism via bsg. The new
doorbell tells driver the exact buffer size application has where driver
can fill it up with events. The old doorbell guestimated application buffer
size is 256.
Remove duplicate functionality, the application has moved on to the new
doorbell interface.
Link: https://lore.kernel.org/r/20220607044627.19563-9-njavali@marvell.com
Signed-off-by: Quinn Tran <qutran@marvell.com>
Signed-off-by: Nilesh Javali <njavali@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Add bsg interface for app to read doorbell events. This interface lets
driver know how much app can read based on return buffer size. When the
next event(s) occur, driver will return the bsg_job with the event(s) in
the return buffer.
If there is no event to read, driver will hold on to the bsg_job up to few
seconds as a way to control the polling interval.
Link: https://lore.kernel.org/r/20220607044627.19563-5-njavali@marvell.com
Fixes: dd30706e73 ("scsi: qla2xxx: edif: Add key update")
Signed-off-by: Quinn Tran <qutran@marvell.com>
Signed-off-by: Nilesh Javali <njavali@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
This patch uses GFFID switch command to scan whether remote device is
Target or Initiator mode. Based on that info, driver will not pass up
Initiator info to authentication application. This helps reduce unnecessary
stress for authentication application to deal with unused connections.
Link: https://lore.kernel.org/r/20220607044627.19563-2-njavali@marvell.com
Fixes: 7ebb336e45 ("scsi: qla2xxx: edif: Add start + stop bsgs")
Signed-off-by: Quinn Tran <qutran@marvell.com>
Signed-off-by: Nilesh Javali <njavali@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Pull more SCSI updates from James Bottomley:
"Mostly small bug fixes plus other trivial updates.
The major change of note is moving ufs out of scsi and a minor update
to lpfc vmid handling"
* tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (24 commits)
scsi: qla2xxx: Remove unused 'ql_dm_tgt_ex_pct' parameter
scsi: qla2xxx: Remove setting of 'req' and 'rsp' parameters
scsi: mpi3mr: Fix kernel-doc
scsi: lpfc: Add support for ATTO Fibre Channel devices
scsi: core: Return BLK_STS_TRANSPORT for ALUA transitioning
scsi: sd_zbc: Prevent zone information memory leak
scsi: sd: Fix potential NULL pointer dereference
scsi: mpi3mr: Rework mrioc->bsg_device model to fix warnings
scsi: myrb: Fix up null pointer access on myrb_cleanup()
scsi: core: Unexport scsi_bus_type
scsi: sd: Don't call blk_cleanup_disk() in sd_probe()
scsi: ufs: ufshcd: Delete unnecessary NULL check
scsi: isci: Fix typo in comment
scsi: pmcraid: Fix typo in comment
scsi: smartpqi: Fix typo in comment
scsi: qedf: Fix typo in comment
scsi: esas2r: Fix typo in comment
scsi: storvsc: Fix typo in comment
scsi: ufs: Split the drivers/scsi/ufs directory
scsi: qla1280: Remove redundant variable
...
cppcheck reports
[drivers/scsi/qla2xxx/qla_mid.c:594]: (warning) Assignment of function parameter has no effect outside the function. Did you forget dereferencing it?
[drivers/scsi/qla2xxx/qla_mid.c:620]: (warning) Assignment of function parameter has no effect outside the function. Did you forget dereferencing it?
The functions qla25xx_free_req_que() and qla25xx_free_rsp_que() are
similar. They free a 'req' and a 'rsp' parameter respectively. The last
statement of both functions is setting the parameter to NULL. This has no
effect and can be removed.
Link: https://lore.kernel.org/r/20220521201607.4145298-1-trix@redhat.com
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Tom Rix <trix@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Pull SCSI updates from James Bottomley:
"This consists of a small set of driver updates (lpfc, ufs, mpt3sas
mpi3mr, iscsi target). Apart from that this is mostly small fixes with
very few core changes (the biggest one being VPD caching)"
* tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (177 commits)
scsi: target: tcmu: Avoid holding XArray lock when calling lock_page
scsi: elx: efct: Remove NULL check after calling container_of()
scsi: dpt_i2o: Drop redundant spinlock initialization
scsi: qedf: Remove redundant variable op
scsi: hisi_sas: Fix memory ordering in hisi_sas_task_deliver()
scsi: fnic: Replace DMA mask of 64 bits with 47 bits
scsi: mpi3mr: Add target device related sysfs attributes
scsi: mpi3mr: Add shost related sysfs attributes
scsi: elx: efct: Remove redundant memset() statement
scsi: megaraid_sas: Remove redundant memset() statement
scsi: mpi3mr: Return error if dma_alloc_coherent() fails
scsi: hisi_sas: Fix rescan after deleting a disk
scsi: hisi_sas: Use sas_ata_wait_after_reset() in IT nexus reset
scsi: libsas: Refactor sas_ata_hard_reset()
scsi: mpt3sas: Update driver version to 42.100.00.00
scsi: mpt3sas: Fix junk chars displayed while printing ChipName
scsi: ipr: Use kobj_to_dev()
scsi: mpi3mr: Fix a NULL vs IS_ERR() bug in mpi3mr_bsg_init()
scsi: bnx2fc: Avoid using get_cpu() in bnx2fc_cmd_alloc()
scsi: libfc: Remove get_cpu() semantics in fc_exch_em_alloc()
...
Aborting commands that have already been sent to the firmware can
cause BUG in qlt_free_cmd(): BUG_ON(cmd->sg_mapped)
For instance:
- Command passes rdx_to_xfer state, maps sgl, sends to the firmware
- Reset occurs, qla2xxx performs ISP error recovery, aborts the command
- Target stack calls qlt_abort_cmd() and then qlt_free_cmd()
- BUG_ON(cmd->sg_mapped) in qlt_free_cmd() occurs because sgl was not
unmapped
Thus, unmap sgl in qlt_abort_cmd() for commands with the aborted flag set.
Link: https://lore.kernel.org/r/AS8PR10MB4952D545F84B6B1DFD39EC1E9DEE9@AS8PR10MB4952.EURPRD10.PROD.OUTLOOK.COM
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Gleb Chesnokov <Chesnokov.G@raidix.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
For session recovery, driver relies on the dpc thread to initiate certain
operations. The dpc thread runs exclusively without the Mailbox interface
being occupied. A recent code change for heartbeat check via mailbox cmd 0
is preventing the dpc thread from carrying out its operation. This patch
allows the higher priority error recovery to run first before running the
lower priority heartbeat check.
Link: https://lore.kernel.org/r/20220310092604.22950-9-njavali@marvell.com
Fixes: d94d8158e1 ("scsi: qla2xxx: Add heartbeat check")
Cc: stable@vger.kernel.org
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Quinn Tran <qutran@marvell.com>
Signed-off-by: Nilesh Javali <njavali@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
User experienced device lost. The log shows Get port data base command was
queued up, failed, and requeued again. Every time it is requeued, it set
the FCF_ASYNC_ACTIVE. This prevents any recovery code from occurring
because driver thinks a recovery is in progress for this session. In
essence, this session is hung. The reason it gets into this place is the
session deletion got in front of this call due to link perturbation.
Break the requeue cycle and exit. The session deletion code will trigger a
session relogin.
Link: https://lore.kernel.org/r/20220310092604.22950-8-njavali@marvell.com
Fixes: 726b854870 ("qla2xxx: Add framework for async fabric discovery")
Cc: stable@vger.kernel.org
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Quinn Tran <qutran@marvell.com>
Signed-off-by: Nilesh Javali <njavali@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>