linux/drivers/scsi/lpfc
Mauricio Faria de Oliveira 05a05872c8 lpfc: fix oops in lpfc_sli4_scmd_to_wqidx_distr() from lpfc_send_taskmgmt()
The lpfc_sli4_scmd_to_wqidx_distr() function expects the scsi_cmnd
'lpfc_cmd->pCmd' not to be null, and point to the midlayer command.

That's not true in the .eh_(device|target|bus)_reset_handler path,
because lpfc_send_taskmgmt() sends commands not from the midlayer, so
does not set 'lpfc_cmd->pCmd'.

That is true in the .queuecommand path because lpfc_queuecommand()
stores the scsi_cmnd from midlayer in lpfc_cmd->pCmd; and lpfc_cmd is
stored by lpfc_scsi_prep_cmnd() in piocbq->context1 -- which is passed
to lpfc_sli4_scmd_to_wqidx_distr() as lpfc_cmd parameter.

This problem can be hit on SCSI EH, and immediately with sg_reset.
These 2 test-cases demonstrate the problem/fix with next-20160601.

Test-case 1) sg_reset

    # strace sg_reset --device /dev/sdm
    <...>
    open("/dev/sdm", O_RDWR|O_NONBLOCK)     = 3
    ioctl(3, SG_SCSI_RESET, 0x3fffde6d0994 <unfinished ...>
    +++ killed by SIGSEGV +++
    Segmentation fault

    # dmesg
    Unable to handle kernel paging request for data at address 0x00000000
    Faulting instruction address: 0xd00000001c88442c
    Oops: Kernel access of bad area, sig: 11 [#1]
    <...>
    CPU: 104 PID: 16333 Comm: sg_reset Tainted: G        W       4.7.0-rc1-next-20160601-00004-g95b89dc #6
    <...>
    NIP [d00000001c88442c] lpfc_sli4_scmd_to_wqidx_distr+0xc/0xd0 [lpfc]
    LR [d00000001c826fe8] lpfc_sli_calc_ring.part.27+0x98/0xd0 [lpfc]
    Call Trace:
    [c000003c9ec876f0] [c000003c9ec87770] 0xc000003c9ec87770 (unreliable)
    [c000003c9ec87720] [d00000001c82e004] lpfc_sli_issue_iocb+0xd4/0x260 [lpfc]
    [c000003c9ec87780] [d00000001c831a3c] lpfc_sli_issue_iocb_wait+0x15c/0x5b0 [lpfc]
    [c000003c9ec87880] [d00000001c87f27c] lpfc_send_taskmgmt+0x24c/0x650 [lpfc]
    [c000003c9ec87950] [d00000001c87fd7c] lpfc_device_reset_handler+0x10c/0x200 [lpfc]
    [c000003c9ec87a10] [c000000000610694] scsi_try_bus_device_reset+0x44/0xc0
    [c000003c9ec87a40] [c0000000006113e8] scsi_ioctl_reset+0x198/0x2c0
    [c000003c9ec87bf0] [c00000000060fe5c] scsi_ioctl+0x13c/0x4b0
    [c000003c9ec87c80] [c0000000006629b0] sd_ioctl+0xf0/0x120
    [c000003c9ec87cd0] [c00000000046e4f8] blkdev_ioctl+0x248/0xb70
    [c000003c9ec87d30] [c0000000002a1f60] block_ioctl+0x70/0x90
    [c000003c9ec87d50] [c00000000026d334] do_vfs_ioctl+0xc4/0x890
    [c000003c9ec87de0] [c00000000026db60] SyS_ioctl+0x60/0xc0
    [c000003c9ec87e30] [c000000000009120] system_call+0x38/0x108
    Instruction dump:
    <...>

    With fix:

    # strace sg_reset --device /dev/sdm
    <...>
    open("/dev/sdm", O_RDWR|O_NONBLOCK)     = 3
    ioctl(3, SG_SCSI_RESET, 0x3fffe103c554) = 0
    close(3)                                = 0
    exit_group(0)                           = ?
    +++ exited with 0 +++

    # dmesg
    [  424.658649] lpfc 0006:01:00.4: 4:(0):0713 SCSI layer issued Device Reset (1, 0) return x2002

Test-case 2) SCSI EH

    Using this debug patch to wire an SCSI EH trigger, for lpfc_scsi_cmd_iocb_cmpl():
    -       cmd->scsi_done(cmd);
    +       if ((phba->pport ? phba->pport->cfg_log_verbose : phba->cfg_log_verbose) == 0x32100000)
    +               printk(KERN_ALERT "lpfc: skip scsi_done()\n");
    +       else
    +               cmd->scsi_done(cmd);

    # echo 0x32100000 > /sys/class/scsi_host/host11/lpfc_log_verbose

    # dd if=/dev/sdm of=/dev/null iflag=direct &
    <...>

    After a while:

    # dmesg
    lpfc 0006:01:00.4: 4:(0):3053 lpfc_log_verbose changed from 0 (x0) to 839909376 (x32100000)
    lpfc: skip scsi_done()
    <...>
    Unable to handle kernel paging request for data at address 0x00000000
    Faulting instruction address: 0xd0000000199e448c
    Oops: Kernel access of bad area, sig: 11 [#1]
    <...>
    CPU: 96 PID: 28556 Comm: scsi_eh_11 Tainted: G        W       4.7.0-rc1-next-20160601-00004-g95b89dc #6
    <...>
    NIP [d0000000199e448c] lpfc_sli4_scmd_to_wqidx_distr+0xc/0xd0 [lpfc]
    LR [d000000019986fe8] lpfc_sli_calc_ring.part.27+0x98/0xd0 [lpfc]
    Call Trace:
    [c000000ff0d0b890] [c000000ff0d0b900] 0xc000000ff0d0b900 (unreliable)
    [c000000ff0d0b8c0] [d00000001998e004] lpfc_sli_issue_iocb+0xd4/0x260 [lpfc]
    [c000000ff0d0b920] [d000000019991a3c] lpfc_sli_issue_iocb_wait+0x15c/0x5b0 [lpfc]
    [c000000ff0d0ba20] [d0000000199df27c] lpfc_send_taskmgmt+0x24c/0x650 [lpfc]
    [c000000ff0d0baf0] [d0000000199dfd7c] lpfc_device_reset_handler+0x10c/0x200 [lpfc]
    [c000000ff0d0bbb0] [c000000000610694] scsi_try_bus_device_reset+0x44/0xc0
    [c000000ff0d0bbe0] [c0000000006126cc] scsi_eh_ready_devs+0x49c/0x9c0
    [c000000ff0d0bcb0] [c000000000614160] scsi_error_handler+0x580/0x680
    [c000000ff0d0bd80] [c0000000000ae848] kthread+0x108/0x130
    [c000000ff0d0be30] [c0000000000094a8] ret_from_kernel_thread+0x5c/0xb4
    Instruction dump:
    <...>

    With fix:

    # dmesg
    lpfc 0006:01:00.4: 4:(0):3053 lpfc_log_verbose changed from 0 (x0) to 839909376 (x32100000)
    lpfc: skip scsi_done()
    <...>
    lpfc 0006:01:00.4: 4:(0):0713 SCSI layer issued Device Reset (0, 0) return x2002
    <...>
    lpfc 0006:01:00.4: 4:(0):0723 SCSI layer issued Target Reset (1, 0) return x2002
    <...>
    lpfc 0006:01:00.4: 4:(0):0714 SCSI layer issued Bus Reset Data: x2002
    <...>
    lpfc 0006:01:00.4: 4:(0):3172 SCSI layer issued Host Reset Data:
    <...>

Fixes: 8b0dff1416 ("lpfc: Add support for using block multi-queue")
Cc: <stable@vger.kernel.org> # v4.2+
Signed-off-by: Mauricio Faria de Oliveira <mauricfo@linux.vnet.ibm.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Acked-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2016-07-27 00:31:09 -04:00
..
lpfc_attr.c scsi: lpfc: avoid harmless comparison warning 2016-07-20 19:53:07 -04:00
lpfc_attr.h lpfc: Re-organize source for easier driver attribute management 2016-07-15 15:25:06 -04:00
lpfc_bsg.c lpfc: remove set but not used variables 2015-10-27 10:06:00 +09:00
lpfc_bsg.h lpfc: Update copyright to 2015 2015-04-10 07:50:42 -07:00
lpfc_compat.h
lpfc_crtn.h lpfc: Copyright updates 2016-07-15 15:25:06 -04:00
lpfc_ct.c lpfc: Disable FDMI probing if not connected to a fabric 2016-07-15 15:25:06 -04:00
lpfc_debugfs.c lpfc: fix missing zero termination in debugfs 2016-02-23 21:27:02 -05:00
lpfc_debugfs.h
lpfc_disc.h lpfc: Fix rport leak. 2015-06-05 22:40:19 -07:00
lpfc_els.c lpfc: Remove global lpfc_delay_discovery attribute in leiu of per-hba lpfc_delay_discovery 2016-07-15 15:25:06 -04:00
lpfc_hbadisc.c lpfc: Update modified file copyrights 2016-04-11 16:57:09 -04:00
lpfc_hw4.h lpfc: Add MDS Diagnostics Support 2016-07-15 15:25:06 -04:00
lpfc_hw.h lpfc: Correct RDP response Revision location 2016-07-15 15:25:06 -04:00
lpfc_ids.h lpfc: Re-organize source for easier device-id management 2016-07-15 15:25:06 -04:00
lpfc_init.c lpfc: Remove global lpfc_enable_npiv attribute in leiu of per-hba lpfc_enable_npiv 2016-07-15 15:25:06 -04:00
lpfc_logmsg.h [SCSI] lpfc 8.3.39: Fixed BlockGuard error reporting 2013-05-02 12:38:02 -07:00
lpfc_mbox.c lpfc: Update modified file copyrights 2016-04-11 16:57:09 -04:00
lpfc_mem.c Revert "lpfc: Delete unnecessary checks before the function call mempool_destroy" 2016-05-11 13:02:43 -07:00
lpfc_nl.h
lpfc_nportdisc.c lpfc: Update modified file copyrights 2016-04-11 16:57:09 -04:00
lpfc_scsi.c lpfc: fix oops in lpfc_sli4_scmd_to_wqidx_distr() from lpfc_send_taskmgmt() 2016-07-27 00:31:09 -04:00
lpfc_scsi.h lpfc: Copyright updates 2016-07-15 15:25:06 -04:00
lpfc_sli4.h lpfc: Copyright updates 2016-07-15 15:25:06 -04:00
lpfc_sli.c lpfc: call lpfc_sli_validate_fcp_iocb() with the hbalock held 2016-07-20 19:45:35 -04:00
lpfc_sli.h lpfc: Copyright updates 2016-07-15 15:25:06 -04:00
lpfc_version.h lpfc: Update lpfc version to 11.2.0.0 2016-07-15 15:25:06 -04:00
lpfc_vport.c lpfc: Update modified file copyrights 2016-04-11 16:57:09 -04:00
lpfc_vport.h [SCSI] lpfc 8.3.39: Fixed VPI allocation issues after firmware dump is performed 2013-05-02 12:37:45 -07:00
lpfc.h lpfc: Correct issue with ioremap() call on 32bit kernel 2016-07-15 15:25:06 -04:00
Makefile