linux/drivers/scsi/qla2xxx
Bill Kuzeja a5dd506e15 scsi: qla2xxx: Fix scsi scan hang triggered if adapter fails during init
A system can hit hung task timeouts if a QLogic board fails during
initialization (if the board breaks again or fails the init). The hang
occurs in the scsi scan.

In a nutshell, since commit beb9e315e6 ("qla2xxx: Prevent removal and
board_disable race"), it is possible for ha (base_vha->hw) to be freed
early by a call to qla2x00_remove_one when pdev->enable_cnt equals zero:

       if (!atomic_read(&pdev->enable_cnt)) {
               scsi_host_put(base_vha->host);
               kfree(ha);
               pci_set_drvdata(pdev, NULL);
               return;
       }

Almost always, the scsi_host_put above frees the vha structure
(attached to the end of the Scsi_Host we're putting) since it's the last
put, and life is good.  However, if we are entering this routine because
the adapter has broken sometime during initialization AND a scsi scan is
already in progress (and has done its own scsi_host_get), vha will not
be freed. What's worse, the scsi scan will access the freed ha structure
through qla2xxx_scan_finished:

        if (time > vha->hw->loop_reset_delay * HZ)
                return 1;

The scsi scan keeps checking to see if a scan is complete by calling
qla2xxx_scan_finished. There is a timeout value that limits the length
of time a scan can take (hw->loop_reset_delay, usually set to 5
seconds), but this definition is in the data structure (hw) that can get
freed early.

This can yield unpredictable results, the worst of which is that the
scsi scan can hang indefinitely. This happens when the freed structure
gets reused and loop_reset_delay gets overwritten with garbage, which
the scan obliviously uses as its timeout value.

The fix for this is simple: at the top of qla2xxx_scan_finished, check
for the UNLOADING bit in the vha structure (vha is not freed at this
point). If UNLOADING is set, we exit the scan for this adapter
immediately, before the freed ha structure is dereferenced, and
continue on.
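
The ordering of the checks can be illustrated with a small user-space
sketch. This is not the driver code: mock_hw, mock_vha, the unloading and
loop_ready fields, mock_scan_finished, and the HZ value are all
hypothetical stand-ins, and the final "loop ready" return is an assumption
about what the rest of the routine does. The only point it shows is that
the flag in vha is tested before vha->hw is touched.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define HZ 1000UL   /* stand-in for the kernel tick rate (assumption) */

/* Simplified, hypothetical stand-ins for the driver's ha and vha. */
struct mock_hw {
	unsigned long loop_reset_delay;   /* scan timeout, in seconds */
};

struct mock_vha {
	bool unloading;        /* models the UNLOADING bit */
	bool loop_ready;       /* assumed completion condition */
	struct mock_hw *hw;    /* may be freed once unloading is set */
};

/* Returns 1 when the scan should stop, 0 to keep polling. */
static int mock_scan_finished(const struct mock_vha *vha, unsigned long time)
{
	assert(vha != NULL);

	/* vha is still valid here, but vha->hw may not be: test the flag first. */
	if (vha->unloading)
		return 1;

	/* Only reached while hw is known to be alive. */
	if (time > vha->hw->loop_reset_delay * HZ)
		return 1;

	return vha->loop_ready ? 1 : 0;
}
```

With unloading set, the function returns 1 without reading hw at all, so
a stale or reused hw can no longer supply a garbage timeout.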

This problem is hard to hit, but I have run into it doing negative
testing many times now (with a test specifically designed to bring it
out), so I can verify that this fix works. My testing has been against a
RHEL7 driver variant, but the bug and patch are equally relevant to
the upstream driver.

Fixes: beb9e315e6 ("qla2xxx: Prevent removal and board_disable race")
Cc: <stable@vger.kernel.org> # v3.18+
Signed-off-by: Bill Kuzeja <william.kuzeja@stratus.com>
Acked-by: Himanshu Madhani <himanshu.madhani@cavium.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2016-11-01 16:39:01 -04:00
..
Kconfig tcm_qla2xxx Add SCSI command jammer/discard capability 2016-05-10 01:27:17 -07:00
Makefile
qla_attr.c qla2xxx: Disable the adapter and skip error recovery in case of register disconnect. 2016-07-15 15:35:50 -04:00
qla_bsg.c qla2xxx: Add bsg interface to support statistics counter reset. 2016-07-15 15:35:37 -04:00
qla_bsg.h qla2xxx: Add bsg interface to support statistics counter reset. 2016-07-15 15:35:37 -04:00
qla_dbg.c qla2xxx: Fix duplicate message id. 2016-07-15 15:35:51 -04:00
qla_dbg.h
qla_def.h scsi: qla2xxx: Use struct t10_pi_tuple 2016-09-15 09:41:56 -04:00
qla_devtbl.h
qla_dfs.c qla2xxx: Add DebugFS node for target sess list. 2016-03-10 21:48:27 -08:00
qla_fw.h qla2xxx: Fix BBCR offset 2016-07-15 15:35:51 -04:00
qla_gbl.h qla2xxx: Add bsg interface to support D_Port Diagnostics. 2016-07-15 15:31:31 -04:00
qla_gs.c qla2xxx: Remove __constant_ prefix 2015-08-26 10:40:32 -07:00
qla_init.c qla2xxx: Let DPORT be enabled purely by nvram. 2016-07-15 15:35:48 -04:00
qla_inline.h qla2xxx: Avoid side effects when using endianizer macros. 2016-02-23 21:27:02 -05:00
qla_iocb.c qla2xxx: Added interface to send explicit LOGO. 2016-01-07 13:57:43 -08:00
qla_isr.c scsi: qla2xxx: Use struct t10_pi_tuple 2016-09-15 09:41:56 -04:00
qla_mbx.c qla2xxx: Fix duplicate message id. 2016-07-15 15:35:51 -04:00
qla_mid.c qla2xxx: Fix stale pointer access. 2016-02-06 19:44:30 -08:00
qla_mr.c qla2xxx: Remove use of 'struct timeval' 2016-04-15 16:53:18 -04:00
qla_mr.h
qla_nx2.c qla2xxx: Replace two macros with an inline function 2015-08-26 10:35:35 -07:00
qla_nx2.h qla2xxx: Replace two macros with an inline function 2015-08-26 10:35:35 -07:00
qla_nx.c qla2xxx: Indicate out-of-memory with -ENOMEM 2016-04-11 16:57:09 -04:00
qla_nx.h scsi/qla2xxx: Remove erroneous unused macro qla82xx_get_temp_val1() 2016-06-21 13:22:39 +02:00
qla_os.c scsi: qla2xxx: Fix scsi scan hang triggered if adapter fails during init 2016-11-01 16:39:01 -04:00
qla_settings.h
qla_sup.c treewide: Fix typos in printk 2016-04-28 10:52:28 +02:00
qla_target.c qla2xxx: Properly initialize IO statistics. 2016-07-15 15:31:31 -04:00
qla_target.h tcm_qla2xxx: introduce a private sess_kref 2016-05-10 01:19:33 -07:00
qla_tmpl.c qla2xxx: Add ram area DDR for fwdump template entry T262. 2016-07-15 15:31:31 -04:00
qla_tmpl.h
qla_version.h qla2xxx: Update driver version to 8.07.00.38-k 2016-07-15 15:35:52 -04:00
tcm_qla2xxx.c tcm_qla2xxx Add SCSI command jammer/discard capability 2016-05-10 01:27:17 -07:00
tcm_qla2xxx.h tcm_qla2xxx Add SCSI command jammer/discard capability 2016-05-10 01:27:17 -07:00