When a timestamp update or an event acknowledgment command times out, the
driver invokes the soft reset handler to recover the controller while
holding a mutex lock. The soft reset handler also tries to acquire the same
mutex to send initialization commands to the controller, which leads to a
deadlock.
To resolve the issue, the driver will check the status and if this indicates
the controller is operational, the driver will issue a diagnostic fault
reset and exit out of the command processing function. If the controller is
already faulted or asynchronously reset, then the driver will just exit the
command processing function.
Signed-off-by: Ranjan Kumar <ranjan.kumar@broadcom.com>
Link: https://lore.kernel.org/r/20230804104248.118924-2-ranjan.kumar@broadcom.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
While performing certain power-off sequences, PCI drivers are called to
suspend and resume their underlying devices through PCI PM (power
management) interface. However, the hardware does not support PCI PM
suspend/resume operations, so system-wide suspend/resume leads to a bad MFW
(management firmware) state, which causes various follow-up errors in the
driver when communicating with the device/firmware.
To fix this, the driver implements a PCI PM suspend handler that explicitly
reports the operation as unsupported to the PCI subsystem, thus preventing
the system from entering suspended/standby mode.
Fixes: 61d8658b4a ("scsi: qedf: Add QLogic FastLinQ offload FCoE driver framework.")
Signed-off-by: Saurav Kashyap <skashyap@marvell.com>
Signed-off-by: Nilesh Javali <njavali@marvell.com>
Link: https://lore.kernel.org/r/20230807093725.46829-1-njavali@marvell.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
While performing certain power-off sequences, PCI drivers are called to
suspend and resume their underlying devices through PCI PM (power
management) interface. However, the hardware does not support PCI PM
suspend/resume operations, so system-wide suspend/resume leads to a bad MFW
(management firmware) state, which causes various follow-up errors in the
driver when communicating with the device/firmware.
To fix this, the driver implements a PCI PM suspend handler that explicitly
reports the operation as unsupported to the PCI subsystem, thus preventing
the system from entering suspended/standby mode.
Fixes: ace7f46ba5 ("scsi: qedi: Add QLogic FastLinQ offload iSCSI driver framework.")
Signed-off-by: Nilesh Javali <njavali@marvell.com>
Link: https://lore.kernel.org/r/20230807093725.46829-2-njavali@marvell.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
As &qedi_percpu->p_work_lock is acquired by hard IRQ qedi_msix_handler(),
other acquisitions of the same lock under process context should disable
IRQ, otherwise deadlock could happen if the IRQ preempts the execution
while the lock is held in process context on the same CPU.
qedi_cpu_offline() is one such function which acquires the lock in process
context.
[Deadlock Scenario]
qedi_cpu_offline()
->spin_lock(&p->p_work_lock)
<irq>
->qedi_msix_handler()
->qedi_process_completions()
->spin_lock_irqsave(&p->p_work_lock, flags); (deadlock here)
This flaw was found by an experimental static analysis tool I am developing
for IRQ-related deadlocks.
The tentative patch fixes the potential deadlock by using
spin_lock_irqsave() under process context.
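The locking rule can be illustrated with a small userspace model. This is
only a sketch and not the driver code: struct toy_cpu and the toy_*()
helpers are hypothetical names, interrupts are modeled as a plain boolean,
and a single CPU is assumed.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical single-CPU model: the lock is a "held" flag and
 * interrupts are a boolean saying whether an IRQ may preempt us. */
struct toy_cpu {
	bool irqs_enabled;
	bool lock_held;
};

/* Models spin_lock(): takes the lock, leaves interrupts as they are. */
static void toy_spin_lock(struct toy_cpu *cpu)
{
	assert(!cpu->lock_held);	/* a real spinlock would spin here */
	cpu->lock_held = true;
}

/* Models spin_lock_irqsave(): save IRQ state, disable IRQs, take lock. */
static bool toy_spin_lock_irqsave(struct toy_cpu *cpu)
{
	bool flags = cpu->irqs_enabled;

	assert(!cpu->lock_held);
	cpu->irqs_enabled = false;
	cpu->lock_held = true;
	return flags;
}

/* Models spin_unlock_irqrestore(): drop lock, restore saved IRQ state. */
static void toy_spin_unlock_irqrestore(struct toy_cpu *cpu, bool flags)
{
	cpu->lock_held = false;
	cpu->irqs_enabled = flags;
}

/* An IRQ handler that needs the same lock deadlocks only if it can run
 * (IRQs enabled) while the lock is already held on this CPU. */
static bool toy_irq_would_deadlock(const struct toy_cpu *cpu)
{
	return cpu->irqs_enabled && cpu->lock_held;
}
```

In this model, the plain lock leaves a window in which the handler can
preempt the lock holder, while the irqsave variant closes that window.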
Signed-off-by: Chengfeng Ye <dg573847474@gmail.com>
Link: https://lore.kernel.org/r/20230726125655.4197-1-dg573847474@gmail.com
Acked-by: Manish Rangankar <mrangankar@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
When preparing protection DIF I/O for DMA, the driver obtains reference
tags from scsi_prot_ref_tag(). Previously, there was a wrong assumption
that an all 0xffffffff value meant error and thus the driver failed the
I/O. This patch removes the evaluation code and accepts whatever the upper
layer returns.
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Link: https://lore.kernel.org/r/20230803211932.155745-1-justintee8345@gmail.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
If device_add() returns an error, the name allocated by dev_set_name() needs
to be freed. As the comment of device_add() says, put_device() should be used
to give up the reference in the error path. So fix this by calling
put_device(), then the name can be freed in kobject_cleanup().
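The reference-counting idea can be sketched in userspace as follows. This
is a hedged model, not the driver or driver-core code: struct toy_device
and the toy_device_*() helpers are hypothetical, and the release step
stands in for the cleanup that frees the name.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a refcounted object with a heap-allocated name. */
struct toy_device {
	int refcount;
	char *name;
	int released;	/* set once the release path has run */
};

/* Models initialization plus naming: one reference, name on the heap. */
static int toy_device_init(struct toy_device *dev, const char *name)
{
	size_t len = strlen(name) + 1;

	dev->name = malloc(len);
	if (!dev->name)
		return -1;
	memcpy(dev->name, name, len);
	dev->refcount = 1;
	dev->released = 0;
	return 0;
}

/* Models dropping a reference: the last put frees the name. */
static void toy_device_put(struct toy_device *dev)
{
	assert(dev->refcount > 0);
	if (--dev->refcount == 0) {
		free(dev->name);
		dev->name = NULL;
		dev->released = 1;
	}
}

/* Models a registration that may fail; add_result is the simulated
 * outcome. On failure the reference is dropped so the name cannot leak. */
static int toy_device_register(struct toy_device *dev, int add_result)
{
	if (add_result) {
		toy_device_put(dev);
		return add_result;
	}
	return 0;
}
```

Freeing the object directly in the error path instead of dropping the
reference would skip the release step in this model, which is the leak
the text describes.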
Fixes: c8806b6c9e ("snic: driver for Cisco SCSI HBA")
Signed-off-by: Zhu Wang <wangzhu9@huawei.com>
Acked-by: Narsimhulu Musini <nmusini@cisco.com>
Link: https://lore.kernel.org/r/20230801111421.63651-1-wangzhu9@huawei.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
If device_add() returns an error, the name allocated by dev_set_name() needs
to be freed. As the comment of device_add() says, put_device() should be used
to decrease the reference count in the error path. So fix this by calling
put_device(), then the name can be freed in kobject_cleanup().
Fixes: ee959b00c3 ("SCSI: convert struct class_device to struct device")
Signed-off-by: Zhu Wang <wangzhu9@huawei.com>
Link: https://lore.kernel.org/r/20230803020230.226903-1-wangzhu9@huawei.com
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Only nodes whose state is at least past a PLOGI issue and strictly less
than a PRLI issue should be put into device recovery mode upon RSCN
receipt. Previously, the allowance of LOGO and PRLI completion states did
not make sense because those nodes should be allowed to flow through and
marked as NPort disappeared as is normally done. A follow up RSCN GID_FT
would recover those nodes in such cases.
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Link: https://lore.kernel.org/r/20230804195546.157839-1-justintee8345@gmail.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Merge tag 'ata-6.5-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/libata
Pull ata fix from Damien Le Moal:
- Prevent the scsi disk driver from issuing a START STOP UNIT command
for ATA devices during system resume as this causes various issues
reported by multiple users.
* tag 'ata-6.5-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/libata:
ata,scsi: do not issue START STOP UNIT on resume
ata_sas_port_init() now only contains a single initialization.
Move this single initialization to ata_sas_port_alloc(), since:
1) ata_sas_port_alloc() already initializes some of the struct members.
2) ata_sas_port_alloc() is only used by libsas.
Suggested-by: John Garry <john.g.garry@oracle.com>
Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
Reviewed-by: John Garry <john.g.garry@oracle.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
Rename __ata_port_probe() to ata_port_probe() and drop the wrapper
ata_sas_async_probe().
Signed-off-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
Reviewed-by: Jason Yan <yanaijie@huawei.com>
Reviewed-by: John Garry <john.g.garry@oracle.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
Is now a wrapper around kfree(), so call it directly.
Signed-off-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
Reviewed-by: John Garry <john.g.garry@oracle.com>
Reviewed-by: Jason Yan <yanaijie@huawei.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
Callbacks are empty now, so remove them.
Also, remove the call to ap->ops->port_start() in ata_sas_port_init(),
as this would otherwise cause a NULL pointer dereference, now when the
callback is gone.
Signed-off-by: Hannes Reinecke <hare@suse.de>
[niklas: remove the call to ap->ops->port_start() in ata_sas_port_init()]
Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
Reviewed-by: Jason Yan <yanaijie@huawei.com>
Reviewed-by: John Garry <john.g.garry@oracle.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
During system resume, ata_port_pm_resume() triggers ata EH to
1) Resume the controller
2) Reset and rescan the ports
3) Revalidate devices
This EH execution is started asynchronously from ata_port_pm_resume(),
which means that when sd_resume() is executed, none or only part of the
above processing may have been executed. However, sd_resume() issues a
START STOP UNIT to wake up the drive from sleep mode. This command is
translated to ATA with ata_scsi_start_stop_xlat() and issued to the
device. However, depending on the state of execution of the EH process
and revalidation triggered by ata_port_pm_resume(), two things may
happen:
1) The START STOP UNIT fails if it is received before the controller has
been reenabled at the beginning of the EH execution. This is visible
with error messages like:
ata10.00: device reported invalid CHS sector 0
sd 9:0:0:0: [sdc] Start/Stop Unit failed: Result: hostbyte=DID_OK driverbyte=DRIVER_OK
sd 9:0:0:0: [sdc] Sense Key : Illegal Request [current]
sd 9:0:0:0: [sdc] Add. Sense: Unaligned write command
sd 9:0:0:0: PM: dpm_run_callback(): scsi_bus_resume+0x0/0x90 returns -5
sd 9:0:0:0: PM: failed to resume async: error -5
2) The START STOP UNIT command is received while the EH process is
on-going, which means that it is stopped and must wait for its
completion, at which point the command is rather useless as the drive
is already fully spun up. This case also results in a
significant delay in sd_resume() which is observable by users as
the entire system resume completion is delayed.
Given that ATA devices will be woken up by libata activity on resume,
sd_resume() has no need to issue a START STOP UNIT command, which solves
the above mentioned problems. Do not issue this command by introducing
the new scsi_device flag no_start_on_resume and setting this flag to 1
in ata_scsi_dev_config(). sd_resume() is modified to issue a START STOP
UNIT command only if this flag is not set.
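The flag check can be modeled in a few lines. This is a sketch under
stated assumptions and not the sd or libata code: struct toy_scsi_device,
toy_ata_dev_config(), toy_sd_resume() and the start_stop_issued counter
are hypothetical; only the flag name no_start_on_resume comes from the
text above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_scsi_device {
	bool no_start_on_resume;	/* flag named in the text above */
	int start_stop_issued;		/* hypothetical counter for this sketch */
};

/* Models the ATA device configuration step, which sets the flag. */
static void toy_ata_dev_config(struct toy_scsi_device *sdev)
{
	assert(sdev != NULL);
	sdev->no_start_on_resume = true;
}

/* Models the resume path: the START STOP UNIT command is issued only
 * when the flag is not set. */
static int toy_sd_resume(struct toy_scsi_device *sdev)
{
	assert(sdev != NULL);
	if (!sdev->no_start_on_resume)
		sdev->start_stop_issued++;
	return 0;
}
```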
Reported-by: Paul Ausbeck <paula@soe.ucsc.edu>
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=215880
Fixes: a19a93e4c6 ("scsi: core: pm: Rely on the device driver core for async power management")
Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
Tested-by: Tanner Watkins <dalzot@gmail.com>
Tested-by: Paul Ausbeck <paula@soe.ucsc.edu>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
(lightly modified commit message mostly by Linus Torvalds)
The parsing code for /proc/scsi/scsi is disgusting and broken. We should
have just used 'sscanf()' or something simple like that, but the logic may
actually predate our kernel sscanf library routine for all I know. It
certainly predates both git and BK histories.
And we can't change it to be something sane like that now, because the
string matching at the start is done case-insensitively, and the separator
parsing between numbers isn't done at all, so *any* separator will work,
including a possible terminating NUL character.
This interface is root-only, and entirely for legacy use, so there is
absolutely no point in trying to tighten up the parsing. Because any
separator has traditionally worked, it's entirely possible that people have
used random characters rather than the suggested space.
So don't bother to try to pretty it up, and let's just make a minimal patch
that can be back-ported and we can forget about this whole sorry thing for
another two decades.
Just make it at least not read past the end of the supplied data.
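For illustration, bounded parsing of this kind can look like the
userspace sketch below. It is not the code from this patch: the helper
names parse_uint_bounded(), skip_separator() and parse_hcil() are
hypothetical, and the input is assumed to be four decimal numbers with
any single byte accepted as a separator, as the text describes.

```c
#include <assert.h>
#include <stddef.h>

/* Parse a decimal number from [p, end) without ever reading at or past
 * end. Returns the position after the digits; *out gets the value. */
static const char *parse_uint_bounded(const char *p, const char *end,
				      unsigned int *out)
{
	unsigned int val = 0;

	assert(p <= end);
	while (p < end && *p >= '0' && *p <= '9') {
		val = val * 10 + (unsigned int)(*p - '0');
		p++;
	}
	*out = val;
	return p;
}

/* Skip one separator byte of any kind, but only if one is available. */
static const char *skip_separator(const char *p, const char *end)
{
	return p < end ? p + 1 : p;
}

/* Parse four numbers from a length-bounded buffer that need not be
 * NUL-terminated. Missing numbers are reported as 0. */
static void parse_hcil(const char *buf, size_t len, unsigned int v[4])
{
	const char *p = buf;
	const char *end = buf + len;
	int i;

	for (i = 0; i < 4; i++) {
		p = parse_uint_bounded(p, end, &v[i]);
		p = skip_separator(p, end);
	}
}
```

Every pointer advance is guarded by a comparison against end, so a
truncated command yields zeros instead of reads beyond the supplied data.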
Link: https://lore.kernel.org/linux-scsi/b570f5fe-cb7c-863a-6ed9-f6774c219b88@cybernetics.com/
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Martin K Petersen <martin.petersen@oracle.com>
Cc: James Bottomley <jejb@linux.ibm.com>
Cc: Willy Tarreau <w@1wt.eu>
Cc: stable@kernel.org
Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Signed-off-by: Tony Battersby <tonyb@cybernetics.com>
Signed-off-by: Martin K Petersen <martin.petersen@oracle.com>
The qedf_dbg_fp_int_cmd_read() function invokes sprintf() directly on a
__user pointer, which may crash the kernel.
Avoid doing that by vmalloc()'ating a buffer for scnprintf() and then
calling simple_read_from_buffer() which does a proper copy_to_user() call.
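The pattern of formatting into a private buffer and then copying out a
slice can be modeled in userspace as below. This is a sketch, not the
driver code: toy_read_from_buffer() only imitates the general behavior of
a read helper (memcpy() stands in for the copy to user space),
toy_debug_read() and its message text are hypothetical, and a small
on-stack buffer is used here for brevity.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Userspace model of a read helper: copies at most count bytes starting
 * at *ppos from a buffer holding `available` bytes, then advances *ppos. */
static long toy_read_from_buffer(void *to, size_t count, long *ppos,
				 const void *from, size_t available)
{
	long pos = *ppos;

	if (pos < 0)
		return -1;
	if ((size_t)pos >= available || count == 0)
		return 0;
	if (count > available - (size_t)pos)
		count = available - (size_t)pos;
	memcpy(to, (const char *)from + pos, count);
	*ppos = pos + (long)count;
	return (long)count;
}

/* Format into a private buffer first, never into the caller's buffer,
 * then hand out only the requested slice. */
static long toy_debug_read(char *user_buf, size_t count, long *ppos,
			   unsigned int value)
{
	char cbuf[32];
	int len;

	len = snprintf(cbuf, sizeof(cbuf), "debug mask = 0x%x\n", value);
	assert(len > 0 && (size_t)len < sizeof(cbuf));
	return toy_read_from_buffer(user_buf, count, ppos, cbuf, (size_t)len);
}
```

A second read at the advanced offset returns 0 in this model, which is
how a reader would see end of file.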
Fixes: 61d8658b4a ("scsi: qedf: Add QLogic FastLinQ offload FCoE driver framework.")
Link: https://lore.kernel.org/lkml/20230724120241.40495-1-oleksandr@redhat.com/
Link: https://lore.kernel.org/linux-scsi/20230726101236.11922-1-skashyap@marvell.com/
Cc: Saurav Kashyap <skashyap@marvell.com>
Cc: Rob Evers <revers@redhat.com>
Cc: Johannes Thumshirn <Johannes.Thumshirn@wdc.com>
Cc: David Laight <David.Laight@ACULAB.COM>
Cc: Jozef Bacik <jobacik@redhat.com>
Cc: Laurence Oberman <loberman@redhat.com>
Cc: "James E.J. Bottomley" <jejb@linux.ibm.com>
Cc: "Martin K. Petersen" <martin.petersen@oracle.com>
Cc: GR-QLogic-Storage-Upstream@marvell.com
Cc: linux-scsi@vger.kernel.org
Reviewed-by: Laurence Oberman <loberman@redhat.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Tested-by: Laurence Oberman <loberman@redhat.com>
Acked-by: Saurav Kashyap <skashyap@marvell.com>
Signed-off-by: Oleksandr Natalenko <oleksandr@redhat.com>
Link: https://lore.kernel.org/r/20230731084034.37021-4-oleksandr@redhat.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The qedf_dbg_debug_cmd_read() function invokes sprintf() directly on a
__user pointer, which may crash the kernel.
Avoid doing that by using a small on-stack buffer for scnprintf() and then
calling simple_read_from_buffer() which does a proper copy_to_user() call.
Fixes: 61d8658b4a ("scsi: qedf: Add QLogic FastLinQ offload FCoE driver framework.")
Link: https://lore.kernel.org/lkml/20230724120241.40495-1-oleksandr@redhat.com/
Link: https://lore.kernel.org/linux-scsi/20230726101236.11922-1-skashyap@marvell.com/
Cc: Saurav Kashyap <skashyap@marvell.com>
Cc: Rob Evers <revers@redhat.com>
Cc: Johannes Thumshirn <Johannes.Thumshirn@wdc.com>
Cc: David Laight <David.Laight@ACULAB.COM>
Cc: Jozef Bacik <jobacik@redhat.com>
Cc: Laurence Oberman <loberman@redhat.com>
Cc: "James E.J. Bottomley" <jejb@linux.ibm.com>
Cc: "Martin K. Petersen" <martin.petersen@oracle.com>
Cc: GR-QLogic-Storage-Upstream@marvell.com
Cc: linux-scsi@vger.kernel.org
Reviewed-by: Laurence Oberman <loberman@redhat.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Tested-by: Laurence Oberman <loberman@redhat.com>
Acked-by: Saurav Kashyap <skashyap@marvell.com>
Signed-off-by: Oleksandr Natalenko <oleksandr@redhat.com>
Link: https://lore.kernel.org/r/20230731084034.37021-3-oleksandr@redhat.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The qedf_dbg_stop_io_on_error_cmd_read() function invokes sprintf()
directly on a __user pointer, which may crash the kernel.
Avoid doing that by using a small on-stack buffer for scnprintf() and then
calling simple_read_from_buffer() which does a proper copy_to_user() call.
Fixes: 61d8658b4a ("scsi: qedf: Add QLogic FastLinQ offload FCoE driver framework.")
Link: https://lore.kernel.org/lkml/20230724120241.40495-1-oleksandr@redhat.com/
Link: https://lore.kernel.org/linux-scsi/20230726101236.11922-1-skashyap@marvell.com/
Cc: Saurav Kashyap <skashyap@marvell.com>
Cc: Rob Evers <revers@redhat.com>
Cc: Johannes Thumshirn <Johannes.Thumshirn@wdc.com>
Cc: David Laight <David.Laight@ACULAB.COM>
Cc: Jozef Bacik <jobacik@redhat.com>
Cc: Laurence Oberman <loberman@redhat.com>
Cc: "James E.J. Bottomley" <jejb@linux.ibm.com>
Cc: "Martin K. Petersen" <martin.petersen@oracle.com>
Cc: GR-QLogic-Storage-Upstream@marvell.com
Cc: linux-scsi@vger.kernel.org
Reviewed-by: Laurence Oberman <loberman@redhat.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Tested-by: Laurence Oberman <loberman@redhat.com>
Acked-by: Saurav Kashyap <skashyap@marvell.com>
Signed-off-by: Oleksandr Natalenko <oleksandr@redhat.com>
Link: https://lore.kernel.org/r/20230731084034.37021-2-oleksandr@redhat.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Add a check for the command slot value to avoid dereferencing a NULL
pointer.
Found by Linux Verification Center (linuxtesting.org) with SVACE.
Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Co-developed-by: Vladimir Telezhnikov <vtelezhnikov@astralinux.ru>
Signed-off-by: Vladimir Telezhnikov <vtelezhnikov@astralinux.ru>
Signed-off-by: Alexandra Diupina <adiupina@astralinux.ru>
Link: https://lore.kernel.org/r/20230728123521.18293-1-adiupina@astralinux.ru
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
fnic_clean_pending_aborts() was returning a non-zero value irrespective of
failure or success. This caused the caller of this function to assume that
the device reset had failed, even though it would succeed in most cases. As
a consequence, a successful device reset would escalate to host reset.
Reviewed-by: Sesidhar Baddela <sebaddel@cisco.com>
Tested-by: Karan Tilak Kumar <kartilak@cisco.com>
Signed-off-by: Karan Tilak Kumar <kartilak@cisco.com>
Link: https://lore.kernel.org/r/20230727193919.2519-1-kartilak@cisco.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Hyper-V provides the ability to connect Fibre Channel LUNs to the host
system and present them in a guest VM as a SCSI device. I/O to the vFC
device is handled by the storvsc driver. The storvsc driver includes a
partial integration with the FC transport implemented in the generic
portion of the Linux SCSI subsystem so that FC attributes can be displayed
in /sys. However, the partial integration means that some aspects of vFC
don't work properly. Unfortunately, a full and correct integration isn't
practical because of limitations in what Hyper-V provides to the guest.
In particular, in the context of Hyper-V storvsc, the FC transport timeout
function fc_eh_timed_out() causes a kernel panic because it can't find the
rport and dereferences a NULL pointer. The original patch that added the
call from storvsc_eh_timed_out() to fc_eh_timed_out() is faulty in this
regard.
In many cases a timeout is due to a transient condition, so the situation
can be improved by just continuing to wait like with other I/O requests
issued by storvsc, and avoiding the guaranteed panic. For a permanent
failure, continuing to wait may result in a hung thread instead of a panic,
which again may be better.
So fix the panic by removing the storvsc call to fc_eh_timed_out(). This
allows storvsc to keep waiting for a response. The change has been tested
by users who experienced a panic in fc_eh_timed_out() due to transient
timeouts, and it solves their problem.
In the future we may want to deprecate the vFC functionality in storvsc
since it can't be fully fixed. But it has current users for whom it is
working well enough, so it should probably stay for a while longer.
Fixes: 3930d73098 ("scsi: storvsc: use default I/O timeout handler for FC devices")
Cc: stable@vger.kernel.org
Signed-off-by: Michael Kelley <mikelley@microsoft.com>
Link: https://lore.kernel.org/r/1690606764-79669-1-git-send-email-mikelley@microsoft.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
LKP reports the below warning when building for RISC-V with a randconfig
configuration.
drivers/scsi/hisi_sas/hisi_sas_v3_hw.c:4567:35: sparse:
sparse: incorrect type in argument 4 (different base types)
@@ expected restricted __le32 [usertype] *[assigned] ptr
@@ got unsigned int * @@
Type cast to fix this warning.
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202307260823.whMNpZ1C-lkp@intel.com/
Signed-off-by: Sunil V L <sunilvl@ventanamicro.com>
Link: https://lore.kernel.org/r/20230726051759.30038-1-sunilvl@ventanamicro.com
Reviewed-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
When building with CONFIG_AIC7XXX_BUILD_FIRMWARE=y, two fatal errors
are reported as shown below:
aicasm_gram.tab.c:203:10: fatal error: aicasm_gram.tab.h:
No such file or directory
aicasm_macro_gram.tab.c:167:10: fatal error: aicasm_macro_gram.tab.h:
No such file or directory
Fix these issues to make randconfig builds more reliable.
[mkp: add missing include]
Signed-off-by: Wang Jinchao <wangjinchao@xfusion.com>
Link: https://lore.kernel.org/r/ZK0XIj6XzY5MCvtd@fedora
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
If pm8001_init_sas_add() fails, return error code in pm8001_pci_probe().
Fixes: 14a8f116cd ("scsi: pm80xx: Add GET_NVMD timeout during probe")
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Link: https://lore.kernel.org/r/20230725125706.566990-1-yangyingliang@huawei.com
Reviewed-by: Igor Pylypiv <ipylypiv@google.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
There are three places that qla4xxx parses nlattrs:
- qla4xxx_set_chap_entry()
- qla4xxx_iface_set_param()
- qla4xxx_sysfs_ddb_set_param()
and each of them directly converts the nlattr to specific pointer of
structure without length checking. This could be dangerous as those
attributes are not validated and a malformed nlattr (e.g., length 0) could
result in an OOB read that leaks heap dirty data.
Add the nla_len check before accessing the nlattr data and return EINVAL if
the length check fails.
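A length check of this kind can be sketched in userspace as follows. The
types here are simplified stand-ins and not the kernel's netlink
definitions: struct toy_nlattr, struct toy_param_info, toy_nla_len() and
toy_get_param() are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>

/* Simplified attribute header; nla_len counts header plus payload. */
struct toy_nlattr {
	uint16_t nla_len;
	uint16_t nla_type;
};

/* Hypothetical fixed-size payload that a handler expects to find. */
struct toy_param_info {
	uint32_t param;
	uint32_t value;
};

/* Payload length: total attribute length minus the header. */
static int toy_nla_len(const struct toy_nlattr *nla)
{
	return (int)nla->nla_len - (int)sizeof(*nla);
}

/* Validate the payload length before interpreting the payload. */
static int toy_get_param(const struct toy_nlattr *nla,
			 struct toy_param_info *out)
{
	assert(nla != NULL && out != NULL);
	if (toy_nla_len(nla) < (int)sizeof(*out))
		return -EINVAL;
	memcpy(out, (const unsigned char *)nla + sizeof(*nla), sizeof(*out));
	return 0;
}
```

An attribute that claims a zero-length payload is rejected before any
payload byte is touched.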
Fixes: 26ffd7b45f ("[SCSI] qla4xxx: Add support to set CHAP entries")
Fixes: 1e9e2be3ee ("[SCSI] qla4xxx: Add flash node mgmt support")
Fixes: 00c31889f7 ("[SCSI] qla4xxx: fix data alignment and use nl helpers")
Signed-off-by: Lin Ma <linma@zju.edu.cn>
Link: https://lore.kernel.org/r/20230723080053.3714534-1-linma@zju.edu.cn
Reviewed-by: Chris Leech <cleech@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
beiscsi_iface_set_param() parses nlattr with nla_for_each_attr and assumes
every attribute can be viewed as struct iscsi_iface_param_info.
This is not true because there is no nla_policy to validate the
attributes passed from the upper function iscsi_set_iface_params().
Add the nla_len check before accessing the nlattr data and return EINVAL if
the length check fails.
Fixes: 0e43895ec1 ("[SCSI] be2iscsi: adding functionality to change network settings using iscsiadm")
Signed-off-by: Lin Ma <linma@zju.edu.cn>
Link: https://lore.kernel.org/r/20230723075938.3713864-1-linma@zju.edu.cn
Reviewed-by: Chris Leech <cleech@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The functions iscsi_if_set_param() and iscsi_if_set_host_param() convert an
nlattr payload to type char* and then call C string handling functions like
sscanf and kstrdup:
char *data = (char*)ev + sizeof(*ev);
...
sscanf(data, "%d", &value);
However, since the nlattr is provided by the user-space program and the
nlmsg skb is allocated with GFP_KERNEL instead of GFP_ZERO flag (see
netlink_alloc_large_skb() in netlink_sendmsg()), dirty data on the heap can
lead to an OOB access for those string handling functions.
Investigating how the bug was introduced shows that the old parsing code,
starting from commit
fd7255f51a ("[SCSI] iscsi: add sysfs attrs for uspace sync up"), treated
the nlattr as integer bytes instead of a string and had a length check in
iscsi_copy_param():
if (ev->u.set_param.len != sizeof(uint32_t))
BUG();
But since commit a54a52caad ("[SCSI] iscsi: fixup set/get param
functions"), the code has treated the nlattr as a C string without adding
any strlen() checks, opening the possibility of an OOB access.
Fix the potential OOB by adding the strlen() check before accessing the
buf. If the data passes this check, all low-level set_param handlers can
safely treat this buf as a legal C string.
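The idea of validating termination before using string functions can be
sketched in userspace like this. Note the substitution: the sketch uses
memchr() rather than strlen() so that the check itself never reads past
len, which is my choice for the example and not a description of the
patch. toy_set_param() is a hypothetical name.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Accept the payload only if a NUL terminator exists within len bytes.
 * After that, string functions cannot run past the end of the data. */
static int toy_set_param(const char *data, size_t len, long *value)
{
	assert(data != NULL && value != NULL);
	if (len == 0 || memchr(data, '\0', len) == NULL)
		return -EINVAL;
	*value = strtol(data, NULL, 10);
	return 0;
}
```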
Fixes: fd7255f51a ("[SCSI] iscsi: add sysfs attrs for uspace sync up")
Fixes: 1d9bf13a9c ("[SCSI] iscsi class: add iscsi host set param event")
Signed-off-by: Lin Ma <linma@zju.edu.cn>
Link: https://lore.kernel.org/r/20230723075820.3713119-1-linma@zju.edu.cn
Reviewed-by: Chris Leech <cleech@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The current NETLINK_ISCSI netlink parsing loop checks every nlmsg to make
sure the length is bigger than sizeof(struct iscsi_uevent) and then calls
iscsi_if_recv_msg().
nlh = nlmsg_hdr(skb);
if (nlh->nlmsg_len < sizeof(*nlh) + sizeof(*ev) ||
skb->len < nlh->nlmsg_len) {
break;
}
...
err = iscsi_if_recv_msg(skb, nlh, &group);
Hence, in iscsi_if_recv_msg() the nlmsg_data can be safely converted to
iscsi_uevent as the length is already checked.
However, in other cases the length of nlattr payload is not checked before
the payload is converted to other data structures. One example is
iscsi_set_path() which converts the payload to type iscsi_path without any
checks:
params = (struct iscsi_path *)((char *)ev + sizeof(*ev));
Whereas iscsi_if_transport_conn() correctly checks the pdu_len:
pdu_len = nlh->nlmsg_len - sizeof(*nlh) - sizeof(*ev);
if ((ev->u.send_pdu.hdr_size > pdu_len) ..
err = -EINVAL;
To sum up, some code paths called in iscsi_if_recv_msg() do not check the
length of the data (see the picture below) and directly convert the data to
another data structure. This could result in out-of-bounds reads and heap
dirty data leakage.
_________ nlmsg_len(nlh) _______________
/ \
+----------+--------------+---------------------------+
| nlmsghdr | iscsi_uevent | data |
+----------+--------------+---------------------------+
\ /
iscsi_uevent->u.set_param.len
Fix the issue by adding the length check before accessing it. To clean up
the code, an additional parameter named rlen is added. The rlen is
calculated at the beginning of iscsi_if_recv_msg() which avoids duplicated
calculation.
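Computing the remaining length once and checking it in each handler can
be sketched as below. The structures are simplified stand-ins with
hypothetical fields, not the kernel definitions, and toy_remaining_len()
and toy_set_path() are illustrative names; only the name rlen comes from
the text above.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-ins for the message header, event and payload. */
struct toy_nlmsghdr { uint32_t nlmsg_len; };	/* total message length */
struct toy_uevent   { uint32_t type; };
struct toy_path     { uint32_t handle; uint8_t mac[6]; uint16_t vlan; };

/* Bytes left after header and event, or -EINVAL if even those are
 * missing. Computed once, then passed to the handlers as rlen. */
static long toy_remaining_len(const struct toy_nlmsghdr *nlh)
{
	size_t fixed = sizeof(struct toy_nlmsghdr) + sizeof(struct toy_uevent);

	assert(nlh != NULL);
	if (nlh->nlmsg_len < fixed)
		return -EINVAL;
	return (long)(nlh->nlmsg_len - fixed);
}

/* A handler checks rlen before treating the data as its own structure. */
static int toy_set_path(long rlen)
{
	if (rlen < 0 || (size_t)rlen < sizeof(struct toy_path))
		return -EINVAL;
	return 0;
}
```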
Fixes: ac20c7bf07 ("[SCSI] iscsi_transport: Added Ping support")
Fixes: 43514774ff ("[SCSI] iscsi class: Add new NETLINK_ISCSI messages for cnic/bnx2i driver.")
Fixes: 1d9bf13a9c ("[SCSI] iscsi class: add iscsi host set param event")
Fixes: 01cb225dad ("[SCSI] iscsi: add target discvery event to transport class")
Fixes: 264faaaa12 ("[SCSI] iscsi: add transport end point callbacks")
Fixes: fd7255f51a ("[SCSI] iscsi: add sysfs attrs for uspace sync up")
Signed-off-by: Lin Ma <linma@zju.edu.cn>
Link: https://lore.kernel.org/r/20230725024529.428311-1-linma@zju.edu.cn
Reviewed-by: Chris Leech <cleech@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
blk_mq_run_queue() runs the queue asynchronously if BLK_MQ_F_BLOCKING
has been set. This is suboptimal since running the queue asynchronously
is slower than running the queue synchronously. This patch modifies
blk_mq_run_queue() as follows if BLK_MQ_F_BLOCKING has been set:
- Run the queue synchronously if it is allowed to sleep.
- Run the queue asynchronously if it is not allowed to sleep.
Additionally, blk_mq_run_hw_queue(hctx, false) calls are modified into
blk_mq_run_hw_queue(hctx, hctx->flags & BLK_MQ_F_BLOCKING) if the caller
may be invoked from atomic context.
The following caller chains have been reviewed:
blk_mq_run_hw_queue(hctx, false)
blk_mq_get_tag() /* may sleep, hence the functions it calls may also sleep */
blk_execute_rq() /* may sleep */
blk_mq_run_hw_queues(q, async=false)
blk_freeze_queue_start() /* may sleep */
blk_mq_requeue_work() /* may sleep */
scsi_kick_queue()
scsi_requeue_run_queue() /* may sleep */
scsi_run_host_queues()
scsi_ioctl_reset() /* may sleep */
blk_mq_insert_requests(hctx, ctx, list, run_queue_async=false)
blk_mq_dispatch_plug_list(plug, from_sched=false)
blk_mq_flush_plug_list(plug, from_schedule=false)
__blk_flush_plug(plug, from_schedule=false)
blk_add_rq_to_plug()
blk_mq_submit_bio() /* may sleep if REQ_NOWAIT has not been set */
blk_mq_plug_issue_direct()
blk_mq_flush_plug_list() /* see above */
blk_mq_dispatch_plug_list(plug, from_sched=false)
blk_mq_flush_plug_list() /* see above */
blk_mq_try_issue_directly()
blk_mq_submit_bio() /* may sleep if REQ_NOWAIT has not been set */
blk_mq_try_issue_list_directly(hctx, list)
blk_mq_insert_requests() /* see above */
Cc: Christoph Hellwig <hch@lst.de>
Cc: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Link: https://lore.kernel.org/r/20230721172731.955724-4-bvanassche@acm.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
blk_mq_kick_requeue_list() calls blk_mq_run_hw_queues() asynchronously.
Leave out the direct blk_mq_run_hw_queues() call. This patch causes
scsi_run_queue() to call blk_mq_run_hw_queues() asynchronously instead
of synchronously. Since scsi_run_queue() is not called from the hot I/O
submission path, this patch does not affect the hot path.
This patch prepares for allowing blk_mq_run_hw_queue() to sleep if
BLK_MQ_F_BLOCKING has been set. scsi_run_queue() may be called from
atomic context and must not sleep. Hence the removal of the
blk_mq_run_hw_queues(q, false) call. See also scsi_unblock_requests().
Cc: "Martin K. Petersen" <martin.petersen@oracle.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: "Martin K. Petersen" <martin.petersen@oracle.com>
Link: https://lore.kernel.org/r/20230721172731.955724-3-bvanassche@acm.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Inline scsi_kick_queue() to prepare for modifying the second argument
passed to blk_mq_run_hw_queues().
Reviewed-by: Christoph Hellwig <hch@lst.de>
Cc: "Martin K. Petersen" <martin.petersen@oracle.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: "Martin K. Petersen" <martin.petersen@oracle.com>
Link: https://lore.kernel.org/r/20230721172731.955724-2-bvanassche@acm.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Nilesh Javali <njavali@marvell.com> says:
Martin,
Please apply the qla2xxx driver bug fixes to the scsi tree at your
earliest convenience.
Link: https://lore.kernel.org/r/20230714070104.40052-1-njavali@marvell.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Inconsistent behavior was observed when a TMF times out: when FW detects
the timeout, the session is torn down, but when the driver detects the
timeout, the session is not torn down.
Allow the TMF error to return to the upper layer without session teardown.
Cc: stable@vger.kernel.org
Signed-off-by: Quinn Tran <qutran@marvell.com>
Signed-off-by: Nilesh Javali <njavali@marvell.com>
Link: https://lore.kernel.org/r/20230714070104.40052-10-njavali@marvell.com
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Task management can retry up to 5 times when FW resources become a
bottleneck. Between the retries, there is a short sleep. Current code assumes
the chip has not reset or session has not changed.
Check for chip reset or session change before sending Task management.
Cc: stable@vger.kernel.org
Fixes: 9803fb5d27 ("scsi: qla2xxx: Fix task management cmd failure")
Signed-off-by: Quinn Tran <qutran@marvell.com>
Signed-off-by: Nilesh Javali <njavali@marvell.com>
Link: https://lore.kernel.org/r/20230714070104.40052-9-njavali@marvell.com
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Connection does not resume after a host reset / chip reset. The blockage
is caused by the FCF_ASYNC_ACTIVE flag being left on. The gnl command was
interrupted by the chip reset. On exiting the command, this flag should be
turned off to allow relogin to reoccur. Clear this flag to prevent blockage.
Cc: stable@vger.kernel.org
Fixes: 17e64648aa ("scsi: qla2xxx: Correct fcport flags handling")
Signed-off-by: Quinn Tran <qutran@marvell.com>
Signed-off-by: Nilesh Javali <njavali@marvell.com>
Link: https://lore.kernel.org/r/20230714070104.40052-7-njavali@marvell.com
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Link up failure occurred where the driver failed to see certain events from
FW indicating link up (AEN 8011) and fabric login completion (AEN 8014).
Without these 2 events, the driver would not proceed to scan the fabric.
The cause is a delay in receiving the interrupt for Mailbox 60, which makes
qla set the fw_started flag late. The late setting of this flag causes other
interrupts to be dropped. These dropped interrupts happen to be the link up
(AEN 8011) and fabric login completion (AEN 8014).
Set the fw_started flag early to prevent interrupts from being dropped.
Cc: stable@vger.kernel.org
Signed-off-by: Quinn Tran <qutran@marvell.com>
Signed-off-by: Nilesh Javali <njavali@marvell.com>
Link: https://lore.kernel.org/r/20230714070104.40052-6-njavali@marvell.com
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
For each TMF request, the driver iterates through each qpair and flushes
commands associated with the TMF. At the end of the qpair flush, a Marker is
used to complete the flush transaction. This process was repeated for each
qpair. The multiple flushes and markers for this TMF request seem to cause
confusion for FW.
Instead, 1 flush is sent to FW. Driver would wait for FW to go through all
the I/Os on each qpair to be read then return. Driver then closes out the
transaction with a Marker.
Cc: stable@vger.kernel.org
Fixes: d90171dd0d ("scsi: qla2xxx: Multi-que support for TMF")
Signed-off-by: Quinn Tran <qutran@marvell.com>
Signed-off-by: Nilesh Javali <njavali@marvell.com>
Link: https://lore.kernel.org/r/20230714070104.40052-5-njavali@marvell.com
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Per FW recommendation, 8 TMFs can be outstanding for each
function. Previously, the driver allowed 8 per target.
Limit TMF to 8 per function.
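A minimal sketch of a per-function limit, assuming a single counter shared
by every target behind the function. The names (struct pci_function,
tmf_get_slot(), tmf_put_slot()) are illustrative, and the locking the driver
would need around the counter is omitted:

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_TMF_PER_FUNCTION 8

/* One counter per PCI function, shared by every target behind it. */
struct pci_function {
	int tmf_active;
};

/* Reserve a TMF slot; returns false when 8 are already outstanding. */
static bool tmf_get_slot(struct pci_function *fn)
{
	if (fn->tmf_active >= MAX_TMF_PER_FUNCTION)
		return false;	/* caller must wait or fail the TMF */
	fn->tmf_active++;
	return true;
}

/* Release a slot once the TMF has completed. */
static void tmf_put_slot(struct pci_function *fn)
{
	assert(fn->tmf_active > 0);
	fn->tmf_active--;
}
```

With a per-target counter, N targets could have 8 * N TMFs outstanding on
the same function; a shared counter keeps the total at 8.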
Cc: stable@vger.kernel.org
Fixes: 6a87679626 ("scsi: qla2xxx: Fix task management cmd fail due to unavailable resource")
Signed-off-by: Quinn Tran <qutran@marvell.com>
Signed-off-by: Nilesh Javali <njavali@marvell.com>
Link: https://lore.kernel.org/r/20230714070104.40052-4-njavali@marvell.com
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
During NVMe queue creation, a new qpair is created. The FW resource limit
needs to be readjusted to take the new qpair into account. Otherwise, NVMe
commands cannot go through. This issue was discovered while
testing/forcing FW execution to fail at load time.
Add a call to readjust the IOCB and exchange limits.
In addition, make the get FW state command require FW to be running.
Otherwise, an error is generated.
Cc: stable@vger.kernel.org
Signed-off-by: Quinn Tran <qutran@marvell.com>
Signed-off-by: Nilesh Javali <njavali@marvell.com>
Link: https://lore.kernel.org/r/20230714070104.40052-3-njavali@marvell.com
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
System crash when using a debug kernel due to linked list corruption. The
corruption is caused by session deletion being allowed to queue up twice.
Here's the internal trace that shows the same port was allowed to double
queue for deletion on different CPUs.
20808683956 015 qla2xxx [0000:13:00.1]-e801:4: Scheduling sess ffff93ebf9306800 for deletion 50:06:0e:80:12:48:ff:50 fc4_type 1
20808683957 027 qla2xxx [0000:13:00.1]-e801:4: Scheduling sess ffff93ebf9306800 for deletion 50:06:0e:80:12:48:ff:50 fc4_type 1
Move the clearing/setting of the deleted flag under the lock.
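The double queueing in the trace above happens when two CPUs both test the
flag before either one sets it. The sketch below makes "test and set" one
indivisible step. It uses a C11 atomic exchange in place of the driver's
lock so that it can run in user space, and struct sess and
sess_schedule_for_deletion() are hypothetical names:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

struct sess {
	atomic_bool deleted;	/* set once deletion has been scheduled */
	int times_queued;	/* how often the session was put on the list */
};

/* Returns true only for the caller that actually queued the deletion. */
static bool sess_schedule_for_deletion(struct sess *s)
{
	assert(s);
	/* Test and set in one atomic step; returns the previous value. */
	if (atomic_exchange(&s->deleted, true))
		return false;	/* someone else already queued it */
	s->times_queued++;	/* stands in for adding the work item */
	return true;
}
```

Whichever CPU loses the exchange sees the flag already set and backs off, so
the session can be linked into the deletion list only once.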
Cc: stable@vger.kernel.org
Fixes: 726b854870 ("qla2xxx: Add framework for async fabric discovery")
Signed-off-by: Quinn Tran <qutran@marvell.com>
Signed-off-by: Nilesh Javali <njavali@marvell.com>
Link: https://lore.kernel.org/r/20230714070104.40052-2-njavali@marvell.com
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Justin Tee <justintee8345@gmail.com> says:
Update lpfc to revision 14.2.0.14
This patch set contains logging improvements, kref handling fixes,
discovery bug fixes, and refactoring of repeated code.
The patches were cut against Martin's 6.6/scsi-queue tree.
Link: https://lore.kernel.org/r/20230712180522.112722-1-justintee8345@gmail.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Currently, we have dated logic to work around the differences between SLI-4
and SLI-3 resource reporting through sysfs.
Leave the SLI-3 path untouched, but for the SLI-4 path, retrieve resource
values from the phba->sli4_hba->max_cfg_param structure. Max values are
populated during ACQE events right after the READ_CONFIG mbox cmd is sent.
Instead of the dated subtraction logic, the used resource calculation is fed
directly into sysfs for display.
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Link: https://lore.kernel.org/r/20230712180522.112722-11-justintee8345@gmail.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
During initialization, a lot of the same logic is used on MSI-X vector CPU
affinity assignment.
Create a lpfc_next_present_cpu() helper routine, and apply its usage for
refactoring purposes.
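The text does not show the helper itself. A plausible shape for such a
routine, sketched here over a plain 64-bit mask instead of the kernel's
cpumask API, is "next present CPU after start, wrapping to the first one".
NR_CPUS and next_present_cpu() are assumptions made for illustration:

```c
#include <assert.h>
#include <stdint.h>

#define NR_CPUS 64	/* hypothetical: one bit per CPU in a 64-bit mask */

/*
 * Return the first present CPU after 'start', wrapping around to CPU 0.
 * Returns 'start' itself if it is the only present CPU, or -1 if the
 * mask is empty.
 */
static int next_present_cpu(uint64_t present, int start)
{
	int i;

	assert(start >= 0 && start < NR_CPUS);
	for (i = 1; i <= NR_CPUS; i++) {
		int cpu = (start + i) % NR_CPUS;

		if (present & (UINT64_C(1) << cpu))
			return cpu;
	}
	return -1;
}
```

Centralizing the wraparound in one helper is what lets the repeated
open-coded loops in the affinity assignment collapse into single calls.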
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Link: https://lore.kernel.org/r/20230712180522.112722-10-justintee8345@gmail.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
A mailbox timeout error usually indicates something has gone wrong, and a
follow up reset of the HBA is a typical recovery mechanism. Introduce a
MBX_TMO_ERR flag to detect such cases and have lpfc_els_flush_cmd abort ELS
commands if the MBX_TMO_ERR flag condition was set. This ensures all of
the registered SGL resources meant for ELS traffic are not leaked after an
HBA reset.
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Link: https://lore.kernel.org/r/20230712180522.112722-9-justintee8345@gmail.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
This patch provides better target rport recovery when a target rport is
running in initiator mode to discover the fabric. Such a target will issue
a LOGO before switching back to strict target mode and changes are made to
recover the login. Log messages are also updated accordingly.
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Link: https://lore.kernel.org/r/20230712180522.112722-8-justintee8345@gmail.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Previously, Establish Image Pair was set in all PRLI_ACC responses
regardless of whether the received PRLI was from an initiator or target
function. Specific target vendors that can operate in both initiator and
target mode may view the PRLI_ACC with Establish Image Pair set as an invalid
service parameter when operating in initiator-only mode. This causes
discovery issues later when the target switches on its target mode function.
Revise logic that determines an rport's role as an initiator or target and
set the Establish Image Pair service parameter bit only if the Target
Function bit is set.
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Link: https://lore.kernel.org/r/20230712180522.112722-7-justintee8345@gmail.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The ndlp kref count implementation in lpfc_dev_loss_tmo_callbk() removes
the initial node reference when a vport is unloading. When lpfc_cleanup()
sends a DEVICE_RM event and is in NPR state, the driver calls
lpfc_drop_node(). Subsequently, lpfc_drop_node() also removes an ndlp kref
thinking it is the initial reference. This unintentionally introduces an
extra kref decrement on the ndlp object.
Fix by using the NLP_DROPPED node flag in lpfc_dev_loss_tmo_callbk() and
lpfc_drop_node() to coordinate the removal of the initial node reference.
In lpfc_dev_loss_tmo_callbk(), remove the SCSI transport reference provided
the node is registered in the dev_loss context because the driver cannot
call the SCSI transport in dev_loss context or afterwards. And, have
lpfc_drop_node() not remove a reference if another thread is acting or has
already acted on it.
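The coordination can be pictured as "whoever sets the dropped flag first
owns the put of the initial reference". The sketch below is a simplified
model under that assumption: struct node, NODE_DROPPED and
node_drop_initial_ref() are stand-ins, a plain integer replaces the kref, and
the locking the driver needs around the flag is left out:

```c
#include <assert.h>
#include <stdbool.h>

#define NODE_DROPPED 0x1u	/* stand-in for the NLP_DROPPED node flag */

struct node {
	int refcount;		/* plain int instead of a kref */
	unsigned int flags;
};

/*
 * Drop the initial reference at most once, no matter how many paths
 * call this. Returns true for the caller that performed the drop.
 */
static bool node_drop_initial_ref(struct node *n)
{
	if (n->flags & NODE_DROPPED)
		return false;	/* another path already dropped it */
	n->flags |= NODE_DROPPED;
	assert(n->refcount > 0);
	n->refcount--;
	return true;
}
```

If both the dev_loss callback and the drop-node path go through a check like
this, the second caller becomes a no-op instead of an extra decrement.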
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Link: https://lore.kernel.org/r/20230712180522.112722-6-justintee8345@gmail.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Conditionalize when to put an ndlp into recovery mode when processing
RSCNs. As long as an ndlp state is beyond a PLOGI issue and has been
mapped to a transport layer before, the ndlp qualifies to be put into
recovery mode. Otherwise, treat the ndlp rport normally through the
discovery engine.
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Link: https://lore.kernel.org/r/20230712180522.112722-5-justintee8345@gmail.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
In lpfc_cmpl_els_flogi(), the return out: label decrements the ndlp kref
signaling that FLOGI processing on the ndlp is complete. In loop topology
path, there is an unnecessary ndlp put because it also branches to the out:
label. This also signals ndlp usage completion too soon. As such, remove
the extra lpfc_nlp_put() when in loop topology.
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Link: https://lore.kernel.org/r/20230712180522.112722-4-justintee8345@gmail.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The driver is reaching into a nvme_fc_cmd_iu ptr that belongs to the
transport during an abort. This could cause an unintentional ptr
dereference into memory that the driver does not own. Since the
nvme_fc_cmd_iu ptr was for logging purposes only, simplify the log message
such that the nvme_fc_cmd_iu reference is no longer needed.
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Link: https://lore.kernel.org/r/20230712180522.112722-3-justintee8345@gmail.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The firmware diagnostic dump log message does not need to be a part of the
driver's log trace buffer because it is an expected user triggered event.
Change LOG_TRACE_EVENT verbose flag to LOG_SLI.
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Link: https://lore.kernel.org/r/20230712180522.112722-2-justintee8345@gmail.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The DT of_device.h and of_platform.h date back to the separate
of_platform_bus_type before it was merged into the regular platform bus. As
part of that merge prepping Arm DT support 13 years ago, they "temporarily"
include each other. They also include platform_device.h and of.h. As a
result, there's a pretty much random mix of those include files used
throughout the tree. In order to detangle these headers and replace the
implicit includes with struct declarations, users need to explicitly
include the correct includes.
Signed-off-by: Rob Herring <robh@kernel.org>
Link: https://lore.kernel.org/r/20230714175052.4066150-1-robh@kernel.org
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The Hyper-V host is queried to get the max transfer size that it supports,
and this value is used to set max_sectors for the synthetic SCSI
controller. However, this max transfer size may be too large for virtual
Fibre Channel devices, which are limited to 512 Kbytes. If a larger
transfer size is used with a vFC device, Hyper-V always returns an error,
and storvsc logs a message like this where the SRB status and SCSI status
are both zero:
hv_storvsc <GUID>: tag#197 cmd 0x8a status: scsi 0x0 srb 0x0 hv 0xc0000001
Add logic to limit the max transfer size to 512 Kbytes for vFC devices.
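The limit amounts to a clamp applied only to vFC devices. A minimal sketch,
with hypothetical names (VFC_MAX_XFER_BYTES, max_sectors_for()) and assuming
512-byte sectors for the conversion to max_sectors:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define VFC_MAX_XFER_BYTES (512u * 1024u)	/* limit stated above */
#define SECTOR_SHIFT 9				/* 512-byte sectors */

/* host_max_bytes is the size reported by the host; is_vfc marks a vFC device. */
static uint32_t max_sectors_for(uint32_t host_max_bytes, bool is_vfc)
{
	uint32_t bytes = host_max_bytes;

	assert(host_max_bytes > 0);
	if (is_vfc && bytes > VFC_MAX_XFER_BYTES)
		bytes = VFC_MAX_XFER_BYTES;
	return bytes >> SECTOR_SHIFT;
}
```

For a host that reports 8 Mbytes, a synthetic SCSI device keeps the full
16384 sectors while a vFC device is held to 1024 sectors (512 Kbytes).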
Fixes: 1d3e098078 ("scsi: storvsc: Correct reporting of Hyper-V I/O size limits")
Cc: stable@vger.kernel.org
Signed-off-by: Michael Kelley <mikelley@microsoft.com>
Link: https://lore.kernel.org/r/1689887102-32806-1-git-send-email-mikelley@microsoft.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Currently spinlock hisi_hba->lock is used by both interrupts and threads
which requires the use of spin_lock_irqsave()/spin_unlock_irqrestore().
However, some places still use spin_lock()/spin_unlock(). Reviewing the
code revealed that it is unnecessary to use hisi_hba->lock in the function
hisi_sas_port_notify_formed() which is the only place that uses the
spinlock in interrupt context. So delete unused lock in
hisi_sas_port_notify_formed().
Signed-off-by: Yihang Li <liyihang9@huawei.com>
Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Link: https://lore.kernel.org/r/1689045300-44318-4-git-send-email-chenxiang66@hisilicon.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
When FIO and a debugfs snapshot run concurrently, some SATA I/Os are
returned to the upper layer as failed because HISI_SAS_REJECT_CMD_BIT is set.
The SCSI layer then invokes the error handling thread. However,
sas_ata_hard_reset() in EH also fails because HISI_SAS_REJECT_CMD_BIT is
set. As a result, the device is disabled.
Call scsi_block_requests() before a debugfs snapshot and wait for commands
to complete before setting HISI_SAS_REJECT_CMD_BIT to avoid SATA I/O
failures.
Signed-off-by: Yihang Li <liyihang9@huawei.com>
Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Link: https://lore.kernel.org/r/1689045300-44318-3-git-send-email-chenxiang66@hisilicon.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The PIO read command has no response frame and the struct iu[1024] won't be
filled. I/Os which are normally completed will be treated as failed in
sas_ata_task_done() when iu contains abnormal dirty data.
Consequently ending_fis should not be filled by iu when the response frame
hasn't been written to memory.
Fixes: d380f55503 ("scsi: hisi_sas: Don't bother clearing status buffer IU in task prep")
Signed-off-by: Xingui Yang <yangxingui@huawei.com>
Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Link: https://lore.kernel.org/r/1689045300-44318-2-git-send-email-chenxiang66@hisilicon.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Commit fcaa174a9c ("scsi/sg: don't grab scsi host module reference") made
a mess of how blk_get_queue() is called: blk_get_queue() returns true on
success while the caller expects it to return 0 on success.
Fix this problem and also add a corresponding error message on failure.
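The mismatch is easy to reproduce outside the kernel. In the sketch below,
get_queue() is a stand-in with the same "true on success" convention, and the
two add_device_*() functions are hypothetical callers. The -ENODEV return
value is an assumption for illustration, not taken from the patch:

```c
#include <errno.h>
#include <stdbool.h>

/* Stand-in for a helper that, like blk_get_queue(), returns true on success. */
static bool get_queue(bool queue_alive)
{
	return queue_alive;
}

/* Buggy caller: treats the bool as an errno-style "0 means success" value. */
static int add_device_buggy(bool queue_alive)
{
	int error = get_queue(queue_alive);	/* true converts to 1 */

	if (error)
		return error;	/* success is reported as error 1 */
	return 0;		/* failure slips through as success */
}

/* Fixed caller: tests the bool directly and maps failure to an errno. */
static int add_device_fixed(bool queue_alive)
{
	if (!get_queue(queue_alive))
		return -ENODEV;
	return 0;
}
```

The buggy variant inverts both outcomes, which is why the problem shows up
as a failure on the normal, successful path.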
Fixes: fcaa174a9c ("scsi/sg: don't grab scsi host module reference")
Reported-by: Marc Hartmayer <mhartmay@linux.ibm.com>
Closes: https://lore.kernel.org/all/87lefv622n.fsf@linux.ibm.com/
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Link: https://lore.kernel.org/r/20230705024001.177585-1-yukuai1@huaweicloud.com
Tested-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Tested-by: Marc Hartmayer <mhartmay@linux.ibm.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Marc Hartmayer <mhartmay@linux.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
In response to a disk I/O request, Hyper-V has been observed to return SRB
status value 0x30. This indicates the request was not processed by Hyper-V
because low memory conditions on the host caused an internal error. The
0x30 status is not recognized by storvsc, so the I/O operation is not
flagged as an error. The request is treated as if it completed normally but
with zero data transferred, causing a flood of retries.
Add a definition for this SRB status value and handle it like other error
statuses from the Hyper-V host.
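One way to picture the change is as a status classification in which
unrecognized values fall through to "not an error". Only the value 0x30
comes from the text above. The macro name SRB_STATUS_INTERNAL_ERROR, the
other two status values and srb_is_host_error() are assumptions for this
sketch:

```c
#include <stdbool.h>

#define SRB_STATUS_SUCCESS		0x01	/* illustrative */
#define SRB_STATUS_ERROR		0x04	/* illustrative */
#define SRB_STATUS_INTERNAL_ERROR	0x30	/* value from the text; name assumed */

/* Classify an SRB status; anything not listed is treated as "no error". */
static bool srb_is_host_error(unsigned int srb_status)
{
	switch (srb_status) {
	case SRB_STATUS_ERROR:
	case SRB_STATUS_INTERNAL_ERROR:	/* without this case, 0x30 hits default */
		return true;
	default:
		return false;
	}
}
```

Before a case for 0x30 exists, the request lands in the default branch and
looks like a normal completion with no data, which is what triggers the
retry flood.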
Signed-off-by: Michael Kelley <mikelley@microsoft.com>
Link: https://lore.kernel.org/r/1688788886-94279-1-git-send-email-mikelley@microsoft.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
A few late arriving patches that missed the initial pull request.
It's mostly bug fixes (the dt-bindings is a fix for the initial pull).
Signed-off-by: James E.J. Bottomley <jejb@linux.ibm.com>
Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull more SCSI updates from James Bottomley:
"A few late arriving patches that missed the initial pull request. It's
mostly bug fixes (the dt-bindings is a fix for the initial pull)"
* tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: ufs: core: Remove unused function declaration
scsi: target: docs: Remove tcm_mod_builder.py
scsi: target: iblock: Quiet bool conversion warning with pr_preempt use
scsi: dt-bindings: ufs: qcom: Fix ICE phandle
scsi: core: Simplify scsi_cdl_check_cmd()
scsi: isci: Fix comment typo
scsi: smartpqi: Replace one-element arrays with flexible-array members
scsi: target: tcmu: Replace strlcpy() with strscpy()
scsi: ncr53c8xx: Replace strlcpy() with strscpy()
scsi: lpfc: Fix lpfc_name struct packing
Damien Le Moal <dlemoal@kernel.org> says:
blk_revalidate_disk_zones() implements checks of the zones of a zoned
block device, verifying that the zone size is a power of 2 number of
sectors, that all zones (except possibly the last one) have the same
size and that zones cover the entire addressing space of the device.
While these checks are appropriate to verify that well tested hardware
devices have an adequate zone configuration, they are lacking in certain areas
which may result in issues with potentially buggy emulated devices
implemented with user drivers such as ublk or tcmu. Specifically, this
function does not check if the device driver indicated support for the
mandatory zone append writes, that is, if the device
max_zone_append_sectors queue limit is set to a non-zero value.
Additionally, invalid zones such as a zero length zone with a start
sector equal to the device capacity will not be detected and result in
out of bounds use of the zone bitmaps prepared with the callback
function blk_revalidate_zone_cb().
This series addresses these issues by modifying the 4 block device drivers
that currently support zoned block devices to ensure that they all set a
zoned device zone size and max zone append sectors limit before
executing blk_revalidate_disk_zones(). With these changes in place,
patch 5 improves blk_revalidate_disk_zones() to address the missing
checks, relying on the fact that the zone size and zone append limit are
normally set when this function is called.
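The checks listed in this cover letter can be summarized in a small
user-space model. This is not the block layer code: struct zone and
zones_valid() are hypothetical, and the zone size and zone append limit are
passed in as plain arguments where the kernel reads queue limits:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct zone {
	uint64_t start;	/* first sector */
	uint64_t len;	/* length in sectors */
};

static bool zones_valid(const struct zone *z, size_t nr, uint64_t capacity,
			uint64_t zone_size, uint32_t max_append)
{
	uint64_t next = 0;	/* where the next zone must start */
	size_t i;

	/* Zone size must be a power of 2 and zone append must be supported. */
	if (zone_size == 0 || (zone_size & (zone_size - 1)) || max_append == 0)
		return false;

	for (i = 0; i < nr; i++) {
		bool last = (i + 1 == nr);

		/* Rejects gaps, overlaps and a zero-length zone at capacity. */
		if (z[i].len == 0 || z[i].start != next || z[i].start >= capacity)
			return false;
		/* Only the last zone may be smaller than zone_size. */
		if (last ? z[i].len > zone_size : z[i].len != zone_size)
			return false;
		next += z[i].len;
	}
	return next == capacity;	/* zones must cover the whole device */
}
```

Having the zone size and append limit known up front is what lets the
per-zone loop reject a bad zone before it is used to index anything.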
Link: https://lore.kernel.org/r/20230703024812.76778-1-dlemoal@kernel.org
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
In sd_zbc_revalidate_zones(), execute blk_queue_chunk_sectors() and
blk_queue_max_zone_append_sectors() to respectively set a ZBC device
zone size and maximum zone append sector limit before executing
blk_revalidate_disk_zones(). This is to allow the block layer zone
revalidation to check these device characteristics prior to checking all
zones of the device.
Since blk_queue_max_zone_append_sectors() already caps the device
maximum zone append limit to the zone size and to the maximum command
size, the max_append value passed to blk_queue_max_zone_append_sectors()
is simplified to the maximum number of segments times the number of
sectors per page.
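The arithmetic can be sketched as follows, assuming 4 KiB pages and 512-byte
sectors. zone_append_limit() is a hypothetical function that combines the
value the driver passes in with the two caps the text attributes to
blk_queue_max_zone_append_sectors():

```c
#include <stdint.h>

#define SECTORS_PER_PAGE (4096u / 512u)	/* assumes 4 KiB pages */

static uint32_t zone_append_limit(uint32_t max_segments,
				  uint32_t zone_sectors,
				  uint32_t max_cmd_sectors)
{
	/* What the driver now passes: segments times sectors per page. */
	uint64_t limit = (uint64_t)max_segments * SECTORS_PER_PAGE;

	/* Caps described above: the zone size and the maximum command size. */
	if (limit > zone_sectors)
		limit = zone_sectors;
	if (limit > max_cmd_sectors)
		limit = max_cmd_sectors;
	return (uint32_t)limit;
}
```

With 128 segments the uncapped value is 1024 sectors. The caps only matter
when the zone or the maximum command is smaller than that.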
Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
Link: https://lore.kernel.org/r/20230703024812.76778-2-dlemoal@kernel.org
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The one-element array in aac_aifcmd is actually meant as a flexible array,
and causes an overflow warning that can be avoided using the normal flex
arrays:
drivers/scsi/aacraid/commsup.c:1166:17: error: array index 1 is past the end of the array (that has type 'u8[1]' (aka 'unsigned char[1]'), cast to '__le32 *' (aka 'unsigned int *')) [-Werror,-Warray-bounds]
(((__le32 *)aifcmd->data)[1] == cpu_to_le32(3));
^ ~
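For readers unfamiliar with the pattern, here is a user-space sketch of a
flexible array member replacing a one-element array. The struct layout and
the helper names (struct aif_cmd, aif_alloc(), aif_word()) are illustrative
and are not the aacraid definitions:

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* With "data[1]" the compiler assumes a single byte; "data[]" has no bound. */
struct aif_cmd {
	uint32_t command;
	uint32_t seqnum;
	uint8_t data[];		/* flexible array member */
};

/* Allocate a command with 'len' bytes of payload after the header. */
static struct aif_cmd *aif_alloc(const void *payload, size_t len)
{
	struct aif_cmd *aif = malloc(sizeof(*aif) + len);

	if (!aif)
		return NULL;
	aif->command = 0;
	aif->seqnum = 0;
	memcpy(aif->data, payload, len);
	return aif;
}

/* Read the idx-th 32-bit word of the payload. */
static uint32_t aif_word(const struct aif_cmd *aif, size_t idx)
{
	uint32_t w;

	memcpy(&w, aif->data + idx * sizeof(w), sizeof(w));
	return w;
}
```

Because the flexible array contributes nothing to sizeof(), the allocation
size is simply the header plus the payload length.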
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Link: https://lore.kernel.org/r/20230703114851.1194510-1-arnd@kernel.org
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The ramdisk rwlocks are not used anymore.
Fixes: 87c715dcde ("scsi: scsi_debug: Add per_host_store option")
Signed-off-by: Maurizio Lombardi <mlombard@redhat.com>
Link: https://lore.kernel.org/r/20230628150638.53218-1-mlombard@redhat.com
Reviewed-by: Laurence Oberman <loberman@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
This should be negative -EAGAIN instead of positive. The callers treat
non-zero error codes the same so it doesn't really impact runtime beyond
some trivial differences to debug output.
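A small illustration of why the sign does not change behavior for such
callers. The function names are hypothetical and the "busy" argument just
selects the error path:

```c
#include <errno.h>

/* Before: the error code was returned with the wrong (positive) sign. */
static int session_cleanup_buggy(int busy)
{
	return busy ? EAGAIN : 0;
}

/* After: negative errno, following the usual kernel convention. */
static int session_cleanup_fixed(int busy)
{
	return busy ? -EAGAIN : 0;
}

/* A caller like the ones described: any non-zero value is a failure. */
static int failed(int rc)
{
	return rc != 0;
}
```

Both variants satisfy failed(), so only code that prints or compares the
raw value notices the difference.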
Fixes: 80676d054e ("scsi: qla2xxx: Fix session cleanup hang")
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Link: https://lore.kernel.org/r/49866d28-4cfe-47b0-842b-78f110e61aab@moroto.mountain
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Smatch and Clang both complain that LOGIN_TEMPLATE_SIZE is more than
sizeof(ha->plogi_els_payld.fl_csp).
Smatch warning:
drivers/scsi/qla2xxx/qla_iocb.c:3075 qla24xx_els_dcmd2_iocb()
warn: '&ha->plogi_els_payld.fl_csp' sometimes too small '16' size = 112
Clang warning:
include/linux/fortify-string.h:592:4: error: call to
'__read_overflow2_field' declared with 'warning' attribute: detected
read beyond size of field (2nd parameter); maybe use struct_group()?
[-Werror,-Wattribute-warning]
__read_overflow2_field(q_size_field, size);
When I was reading this code I assumed the "- 4" meant that we were
skipping the last 4 bytes but actually it turned out that we are
skipping the first four bytes.
I have rewritten it to remove the magic numbers, be more clear and
silence the static checker warnings.
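The general technique, sketched outside the driver: express "everything
after the 4-byte header" with offsetof() and copy from the start of the
struct, not from the 16-byte field. struct login_frame and the
LOGIN_BODY_* macros are made up for this example. Its field sizes were chosen
to match the 16 and 112 in the Smatch warning:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct login_frame {
	uint8_t cmd;
	uint8_t resv[3];
	uint8_t csp[16];	/* the field the old copy started from */
	uint8_t rest[96];
};

/* Skip the first four bytes by name, not with a magic "- 4". */
#define LOGIN_BODY_OFFSET offsetof(struct login_frame, csp)
#define LOGIN_BODY_SIZE   (sizeof(struct login_frame) - LOGIN_BODY_OFFSET)

static void copy_login_body(uint8_t *dst, const struct login_frame *src)
{
	/*
	 * Start from the struct base so the copy is not seen as a read
	 * past the end of the 16-byte csp field.
	 */
	memcpy(dst, (const uint8_t *)src + LOGIN_BODY_OFFSET, LOGIN_BODY_SIZE);
}
```

The number of bytes copied is the same as before. What changes is that the
source pointer and the length are both derived from the struct layout.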
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Link: https://lore.kernel.org/r/4aa0485e-766f-4b02-8d5d-c6781ea8f511@moroto.mountain
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The variable phba->fcf.fcf_flag is often protected by the lock
phba->hbalock when it is accessed. Here is an example in
lpfc_unregister_fcf_rescan():
spin_lock_irq(&phba->hbalock);
phba->fcf.fcf_flag |= FCF_INIT_DISC;
spin_unlock_irq(&phba->hbalock);
However, in the same function, phba->fcf.fcf_flag is assigned with 0
without holding the lock, and thus can cause a data race:
phba->fcf.fcf_flag = 0;
To fix this possible data race, a lock and unlock pair is added when
accessing the variable phba->fcf.fcf_flag.
Reported-by: BassCheck <bass@buaa.edu.cn>
Signed-off-by: Tuo Li <islituo@gmail.com>
Link: https://lore.kernel.org/r/20230630024748.1035993-1-islituo@gmail.com
Reviewed-by: Justin Tee <justin.tee@broadcom.com>
Reviewed-by: Laurence Oberman <loberman@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Updates to the usual drivers (ufs, pm80xx, libata-scsi, smartpqi,
lpfc, qla2xxx). We have a couple of major core changes impacting
other systems: Command Duration Limits, which spills into block and
ATA and block level Persistent Reservation Operations, which touches
block, nvme, target and dm (both of which are added with merge commits
containing a cover letter explaining what's going on).
Signed-off-by: James E.J. Bottomley <jejb@linux.ibm.com>
Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull SCSI updates from James Bottomley:
"Updates to the usual drivers (ufs, pm80xx, libata-scsi, smartpqi,
lpfc, qla2xxx).
We have a couple of major core changes impacting other systems:
- Command Duration Limits, which spills into block and ATA
- block level Persistent Reservation Operations, which touches block,
nvme, target and dm
Both of these are added with merge commits containing a cover letter
explaining what's going on"
* tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (187 commits)
scsi: core: Improve warning message in scsi_device_block()
scsi: core: Replace scsi_target_block() with scsi_block_targets()
scsi: core: Don't wait for quiesce in scsi_device_block()
scsi: core: Don't wait for quiesce in scsi_stop_queue()
scsi: core: Merge scsi_internal_device_block() and device_block()
scsi: sg: Increase number of devices
scsi: bsg: Increase number of devices
scsi: qla2xxx: Remove unused nvme_ls_waitq wait queue
scsi: ufs: ufs-pci: Add support for Intel Arrow Lake
scsi: sd: sd_zbc: Use PAGE_SECTORS_SHIFT
scsi: ufs: wb: Add explicit flush_threshold sysfs attribute
scsi: ufs: ufs-qcom: Switch to the new ICE API
scsi: ufs: dt-bindings: qcom: Add ICE phandle
scsi: ufs: ufs-mediatek: Set UFSHCD_QUIRK_MCQ_BROKEN_RTC quirk
scsi: ufs: ufs-mediatek: Set UFSHCD_QUIRK_MCQ_BROKEN_INTR quirk
scsi: ufs: core: Add host quirk UFSHCD_QUIRK_MCQ_BROKEN_RTC
scsi: ufs: core: Add host quirk UFSHCD_QUIRK_MCQ_BROKEN_INTR
scsi: ufs: core: Remove dedicated hwq for dev command
scsi: ufs: core: mcq: Fix the incorrect OCS value for the device command
scsi: ufs: dt-bindings: samsung,exynos: Drop unneeded quotes
...
Merge tag 'ata-6.5-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/libata
Pull ata updates from Damien Le Moal:
- Add support for the .remove_new callback to the ata_platform code to
simplify device removal interface (Uwe)
- Code simplification in ata_dev_revalidate() (Yahu)
- Fix code indentation and coding style in the pata_parport protocol
modules to avoid warnings from static code analyzers (me)
- Clarify ata_eh_qc_retry() behavior with better comments (Niklas)
- Simplify and improve ata_change_queue_depth() behavior to have a
consistent behavior between libsas managed devices and libata managed
devices (e.g. AHCI connected devices) (me)
- Cleanup libata-scsi and libata-eh code to use the ata_ncq_enabled()
and ata_ncq_supported() helpers instead of open coding flags tests
(me)
- Cleanup ahci_reset_controller() code (me)
- Change the pata_octeon_cf and sata_svw drivers to use
of_property_read_reg() to simplify the code (Rob, me)
- Remove unnecessary include files from ahci_octeon driver (me)
- Modify the DesignWare ahci dt bindings to add support for the
Rockchip RK3588 AHCI (Sebastian)
* tag 'ata-6.5-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/libata: (29 commits)
dt-bindings: phy: rockchip: rk3588 has two reset lines
dt-bindings: ata: dwc-ahci: add Rockchip RK3588
dt-bindings: ata: dwc-ahci: add PHY clocks
ata: ahci_octeon: Remove unnecessary include
ata: pata_octeon_cf: Add missing header include
ata: ahci: Cleanup ahci_reset_controller()
ata: Use of_property_read_reg() to parse "reg"
ata: libata-scsi: Use ata_ncq_supported in ata_scsi_dev_config()
ata: libata-eh: Use ata_ncq_enabled() in ata_eh_speed_down()
ata: libata-sata: Improve ata_change_queue_depth()
ata: libata-sata: Simplify ata_change_queue_depth()
ata: libata-eh: Clarify ata_eh_qc_retry() behavior at call site
ata: pata_parport: Fix on26 module code indentation and style
ata: pata_parport: Fix on20 module code indentation and style
ata: pata_parport: Fix ktti module code indentation and style
ata: pata_parport: Fix kbic module code indentation and style
ata: pata_parport: Fix friq module code indentation and style
ata: pata_parport: Fix fit3 module code indentation and style
ata: pata_parport: Fix fit2 module code indentation and style
ata: pata_parport: Fix epia module code indentation and style
...
Reading the 800+ pages of SPC often leads to a brain shutdown and to less
than ideal code... This resulted in the checks of the rwcdlp and cdlp
fields in scsi_cdl_check_cmd() to have identical if-else branches.
Replace this with a comment describing the cases we are interested in and
replace the if-else code block with a simple test of the cdlp field that is
used as the function return value.
Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Julia Lawall <julia.lawall@inria.fr>
Closes: https://lore.kernel.org/r/202306221657.BJHEADkz-lkp@intel.com/
Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
Link: https://lore.kernel.org/r/20230623073057.816199-1-dlemoal@kernel.org
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Core
----
- Rework the sendpage & splice implementations. Instead of feeding
data into sockets page by page extend sendmsg handlers to support
taking a reference on the data, controlled by a new flag called
MSG_SPLICE_PAGES. Rework the handling of unexpected-end-of-file
to invoke an additional callback instead of trying to predict what
the right combination of MORE/NOTLAST flags is.
Remove the MSG_SENDPAGE_NOTLAST flag completely.
- Implement SCM_PIDFD, a new CMSG type analogous to
SCM_CREDENTIALS, but it contains pidfd instead of plain pid.
- Enable socket busy polling with CONFIG_RT.
- Improve reliability and efficiency of reporting for ref_tracker.
- Auto-generate a user space C library for various Netlink families.
Protocols
---------
- Allow TCP to shrink the advertised window when necessary, prevent
sk_rcvbuf auto-tuning from growing the window all the way up to
tcp_rmem[2].
- Use per-VMA locking for "page-flipping" TCP receive zerocopy.
- Prepare TCP for device-to-device data transfers, by making sure
that payloads are always attached to skbs as page frags.
- Make the backoff time for the first N TCP SYN retransmissions
linear. Exponential backoff is unnecessarily conservative.
- Create a new MPTCP getsockopt to retrieve all info (MPTCP_FULL_INFO).
- Avoid waking up applications using TLS sockets until we have
a full record.
- Allow using kernel memory for protocol ioctl callbacks, paving
the way to issuing ioctls over io_uring.
- Add nolocalbypass option to VxLAN, forcing packets to be fully
encapsulated even if they are destined for a local IP address.
- Make TCPv4 use consistent hash in TIME_WAIT and SYN_RECV. Ensure
in-kernel ECMP implementations (e.g. Open vSwitch) select the same
link for all packets. Support L4 symmetric hashing in Open vSwitch.
- PPPoE: make number of hash bits configurable.
- Allow DNS to be overwritten by DHCPACK in the in-kernel DHCP client
(ipconfig).
- Add layer 2 miss indication and filtering, allowing higher layers
(e.g. ACL filters) to make forwarding decisions based on whether
packet matched forwarding state in lower devices (bridge).
- Support matching on Connectivity Fault Management (CFM) packets.
- Hide the "link becomes ready" IPv6 messages by demoting their
printk level to debug.
- HSR: don't enable promiscuous mode if device offloads the proto.
- Support active scanning in IEEE 802.15.4.
- Continue work on Multi-Link Operation for WiFi 7.
BPF
---
- Add precision propagation for subprogs and callbacks. This allows
maintaining verification efficiency when subprograms are used,
or in fact passing the verifier at all for complex programs,
especially those using open-coded iterators.
- Improve BPF's {g,s}etsockopt() length handling. Previously BPF
assumed the length is always equal to the amount of written data.
But some protos allow passing a NULL buffer to discover what
the output buffer *should* be, without writing anything.
- Accept dynptr memory as memory arguments passed to helpers.
- Add routing table ID to bpf_fib_lookup BPF helper.
- Support O_PATH FDs in BPF_OBJ_PIN and BPF_OBJ_GET commands.
- Drop bpf_capable() check in BPF_MAP_FREEZE command (used to mark
maps as read-only).
- Show target_{obj,btf}_id in tracing link fdinfo.
- Addition of several new kfuncs (most of the names are self-explanatory):
- Add a set of new dynptr kfuncs: bpf_dynptr_adjust(),
bpf_dynptr_is_null(), bpf_dynptr_is_rdonly(), bpf_dynptr_size()
and bpf_dynptr_clone().
- bpf_task_under_cgroup()
- bpf_sock_destroy() - force closing sockets
- bpf_cpumask_first_and(), rework bpf_cpumask_any*() kfuncs
Netfilter
---------
- Relax set/map validation checks in nf_tables. Allow checking
presence of an entry in a map without using the value.
- Increase ip_vs_conn_tab_bits range for 64BIT builds.
- Allow updating size of a set.
- Improve NAT tuple selection when connection is closing.
Driver API
----------
- Integrate netdev with LED subsystem, to allow configuring HW
"offloaded" blinking of LEDs based on link state and activity
(i.e. packets coming in and out).
- Support configuring rate selection pins of SFP modules.
- Factor Clause 73 auto-negotiation code out of the drivers, provide
common helper routines.
- Add more fool-proof helpers for managing lifetime of MDIO devices
associated with the PCS layer.
- Allow drivers to report advanced statistics related to Time Aware
scheduler offload (taprio).
- Allow opting out of VF statistics in link dump, to allow more VFs
to fit into the message.
- Split devlink instance and devlink port operations.
New hardware / drivers
----------------------
- Ethernet:
- Synopsys EMAC4 IP support (stmmac)
- Marvell 88E6361 8 port (5x1GE + 3x2.5GE) switches
- Marvell 88E6250 7 port switches
- Microchip LAN8650/1 Rev.B0 PHYs
- MediaTek MT7981/MT7988 built-in 1GE PHY driver
- WiFi:
- Realtek RTL8192FU, 2.4 GHz, b/g/n mode, 2T2R, 300 Mbps
- Realtek RTL8723DS (SDIO variant)
- Realtek RTL8851BE
- CAN:
- Fintek F81604
Drivers
-------
- Ethernet NICs:
- Intel (100G, ice):
- support dynamic interrupt allocation
- use meta data match instead of VF MAC addr on slow-path
- nVidia/Mellanox:
- extend link aggregation to handle 4, rather than just 2 ports
- spawn sub-functions without any features by default
- OcteonTX2:
- support HTB (Tx scheduling/QoS) offload
- make RSS hash generation configurable
- support selecting Rx queue using TC filters
- Wangxun (ngbe/txgbe):
- add basic Tx/Rx packet offloads
- add phylink support (SFP/PCS control)
- Freescale/NXP (enetc):
- report TAPRIO packet statistics
- Solarflare/AMD:
- support matching on IP ToS and UDP source port of outer header
- VxLAN and GENEVE tunnel encapsulation over IPv4 or IPv6
- add devlink dev info support for EF10
- Virtual NICs:
- Microsoft vNIC:
- size the Rx indirection table based on requested configuration
- support VLAN tagging
- Amazon vNIC:
- try to reuse Rx buffers if not fully consumed, useful for ARM
servers running with 16kB pages
- Google vNIC:
- support TCP segmentation of >64kB frames
- Ethernet embedded switches:
- Marvell (mv88e6xxx):
- enable USXGMII (88E6191X)
- Microchip:
- lan966x: add support for Egress Stage 0 ACL engine
- lan966x: support mapping packet priority to internal switch
priority (based on PCP or DSCP)
- Ethernet PHYs:
- Broadcom PHYs:
- support for Wake-on-LAN for BCM54210E/B50212E
- report LPI counter
- Microsemi PHYs: support RGMII delay configuration (VSC85xx)
- Micrel PHYs: receive timestamp in the frame (LAN8841)
- Realtek PHYs: support optional external PHY clock
- Altera TSE PCS: merge the driver into Lynx PCS which it is
a variant of
- CAN: Kvaser PCIEcan:
- support packet timestamping
- WiFi:
- Intel (iwlwifi):
- major update for new firmware and Multi-Link Operation (MLO)
- configuration rework to drop test devices and split
the different families
- support for segmented PNVM images and power tables
- new vendor entries for PPAG (platform antenna gain) feature
- Qualcomm 802.11ax (ath11k):
- Multiple Basic Service Set Identifier (MBSSID) and
Enhanced MBSSID Advertisement (EMA) support in AP mode
- support factory test mode
- RealTek (rtw89):
- add RSSI based antenna diversity
- support U-NII-4 channels on 5 GHz band
- RealTek (rtl8xxxu):
- AP mode support for 8188f
- support USB RX aggregation for the newer chips
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAmSbJM4ACgkQMUZtbf5S
IrtoDhAAhEim1+LBIKf4lhPcVdZ2p/TkpnwTz5jsTwSeRBAxTwuNJ2fQhFXg13E3
MnRq6QaEp8G4/tA/gynLvQop+FEZEnv+horP0zf/XLcC8euU7UrKdrpt/4xxdP07
IL/fFWsoUGNO+L9LNaHwBo8g7nHvOkPscHEBHc2Xrvzab56TJk6vPySfLqcpKlNZ
CHWDwTpgRqNZzSKiSpoMVd9OVMKUXcPYHpDmfEJ5l+e8vTXmZzOLHrSELHU5nP5f
mHV7gxkDCTshoGcaed7UTiOvgu1p6E5EchDJxiLaSUbgsd8SZ3u4oXwRxgj33RK/
fB2+UaLrRt/DdlHvT/Ph8e8Ygu77yIXMjT49jsfur/zVA0HEA2dFb7V6QlsYRmQp
J25pnrdXmE15llgqsC0/UOW5J1laTjII+T2T70UOAqQl4LWYAQDG4WwsAqTzU0KY
dueydDouTp9XC2WYrRUEQxJUzxaOaazskDUHc5c8oHp/zVBT+djdgtvVR9+gi6+7
yy4elI77FlEEqL0ItdU/lSWINayAlPLsIHkMyhSGKX0XDpKjeycPqkNx4UterXB/
JKIR5RBWllRft+igIngIkKX0tJGMU0whngiw7d1WLw25wgu4sB53hiWWoSba14hv
tXMxwZs5iGaPcT38oRVMZz8I1kJM4Dz3SyI7twVvi4RUut64EG4=
=9i4I
-----END PGP SIGNATURE-----
Merge tag 'net-next-6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next
Pull networking changes from Jakub Kicinski:
"WiFi 7 and sendpage changes are the biggest pieces of work for this
release. The latter will definitely require fixes but I think that we
got it to a reasonable point.
Core:
- Rework the sendpage & splice implementations
Instead of feeding data into sockets page by page extend sendmsg
handlers to support taking a reference on the data, controlled by a
new flag called MSG_SPLICE_PAGES
Rework the handling of unexpected-end-of-file to invoke an
additional callback instead of trying to predict what the right
combination of MORE/NOTLAST flags is
Remove the MSG_SENDPAGE_NOTLAST flag completely
- Implement SCM_PIDFD, a new CMSG type analogous to
SCM_CREDENTIALS, but carrying a pidfd instead of a plain pid
- Enable socket busy polling with CONFIG_RT
- Improve reliability and efficiency of reporting for ref_tracker
- Auto-generate a user space C library for various Netlink families
Protocols:
- Allow TCP to shrink the advertised window when necessary, prevent
sk_rcvbuf auto-tuning from growing the window all the way up to
tcp_rmem[2]
- Use per-VMA locking for "page-flipping" TCP receive zerocopy
- Prepare TCP for device-to-device data transfers, by making sure
that payloads are always attached to skbs as page frags
- Make the backoff time for the first N TCP SYN retransmissions
linear. Exponential backoff is unnecessarily conservative
- Create a new MPTCP getsockopt to retrieve all info
(MPTCP_FULL_INFO)
- Avoid waking up applications using TLS sockets until we have a full
record
- Allow using kernel memory for protocol ioctl callbacks, paving the
way to issuing ioctls over io_uring
- Add nolocalbypass option to VxLAN, forcing packets to be fully
encapsulated even if they are destined for a local IP address
- Make TCPv4 use consistent hash in TIME_WAIT and SYN_RECV. Ensure
in-kernel ECMP implementations (e.g. Open vSwitch) select the same
link for all packets. Support L4 symmetric hashing in Open vSwitch
- PPPoE: make number of hash bits configurable
- Allow DNS to be overwritten by DHCPACK in the in-kernel DHCP client
(ipconfig)
- Add layer 2 miss indication and filtering, allowing higher layers
(e.g. ACL filters) to make forwarding decisions based on whether
packet matched forwarding state in lower devices (bridge)
- Support matching on Connectivity Fault Management (CFM) packets
- Hide the "link becomes ready" IPv6 messages by demoting their
printk level to debug
- HSR: don't enable promiscuous mode if device offloads the proto
- Support active scanning in IEEE 802.15.4
- Continue work on Multi-Link Operation for WiFi 7
BPF:
- Add precision propagation for subprogs and callbacks. This allows
maintaining verification efficiency when subprograms are used, or
in fact passing the verifier at all for complex programs,
especially those using open-coded iterators
- Improve BPF's {g,s}etsockopt() length handling. Previously BPF
assumed the length is always equal to the amount of written data.
But some protos allow passing a NULL buffer to discover what the
output buffer *should* be, without writing anything
- Accept dynptr memory as memory arguments passed to helpers
- Add routing table ID to bpf_fib_lookup BPF helper
- Support O_PATH FDs in BPF_OBJ_PIN and BPF_OBJ_GET commands
- Drop bpf_capable() check in BPF_MAP_FREEZE command (used to mark
maps as read-only)
- Show target_{obj,btf}_id in tracing link fdinfo
- Addition of several new kfuncs (most of the names are
self-explanatory):
- Add a set of new dynptr kfuncs: bpf_dynptr_adjust(),
bpf_dynptr_is_null(), bpf_dynptr_is_rdonly(), bpf_dynptr_size()
and bpf_dynptr_clone().
- bpf_task_under_cgroup()
- bpf_sock_destroy() - force closing sockets
- bpf_cpumask_first_and(), rework bpf_cpumask_any*() kfuncs
Netfilter:
- Relax set/map validation checks in nf_tables. Allow checking
presence of an entry in a map without using the value
- Increase ip_vs_conn_tab_bits range for 64BIT builds
- Allow updating size of a set
- Improve NAT tuple selection when connection is closing
Driver API:
- Integrate netdev with LED subsystem, to allow configuring HW
"offloaded" blinking of LEDs based on link state and activity
(i.e. packets coming in and out)
- Support configuring rate selection pins of SFP modules
- Factor Clause 73 auto-negotiation code out of the drivers, provide
common helper routines
- Add more fool-proof helpers for managing lifetime of MDIO devices
associated with the PCS layer
- Allow drivers to report advanced statistics related to Time Aware
scheduler offload (taprio)
- Allow opting out of VF statistics in link dump, to allow more VFs
to fit into the message
- Split devlink instance and devlink port operations
New hardware / drivers:
- Ethernet:
- Synopsys EMAC4 IP support (stmmac)
- Marvell 88E6361 8 port (5x1GE + 3x2.5GE) switches
- Marvell 88E6250 7 port switches
- Microchip LAN8650/1 Rev.B0 PHYs
- MediaTek MT7981/MT7988 built-in 1GE PHY driver
- WiFi:
- Realtek RTL8192FU, 2.4 GHz, b/g/n mode, 2T2R, 300 Mbps
- Realtek RTL8723DS (SDIO variant)
- Realtek RTL8851BE
- CAN:
- Fintek F81604
Drivers:
- Ethernet NICs:
- Intel (100G, ice):
- support dynamic interrupt allocation
- use meta data match instead of VF MAC addr on slow-path
- nVidia/Mellanox:
- extend link aggregation to handle 4, rather than just 2 ports
- spawn sub-functions without any features by default
- OcteonTX2:
- support HTB (Tx scheduling/QoS) offload
- make RSS hash generation configurable
- support selecting Rx queue using TC filters
- Wangxun (ngbe/txgbe):
- add basic Tx/Rx packet offloads
- add phylink support (SFP/PCS control)
- Freescale/NXP (enetc):
- report TAPRIO packet statistics
- Solarflare/AMD:
- support matching on IP ToS and UDP source port of outer
header
- VxLAN and GENEVE tunnel encapsulation over IPv4 or IPv6
- add devlink dev info support for EF10
- Virtual NICs:
- Microsoft vNIC:
- size the Rx indirection table based on requested
configuration
- support VLAN tagging
- Amazon vNIC:
- try to reuse Rx buffers if not fully consumed, useful for ARM
servers running with 16kB pages
- Google vNIC:
- support TCP segmentation of >64kB frames
- Ethernet embedded switches:
- Marvell (mv88e6xxx):
- enable USXGMII (88E6191X)
- Microchip:
- lan966x: add support for Egress Stage 0 ACL engine
- lan966x: support mapping packet priority to internal switch
priority (based on PCP or DSCP)
- Ethernet PHYs:
- Broadcom PHYs:
- support for Wake-on-LAN for BCM54210E/B50212E
- report LPI counter
- Microsemi PHYs: support RGMII delay configuration (VSC85xx)
- Micrel PHYs: receive timestamp in the frame (LAN8841)
- Realtek PHYs: support optional external PHY clock
- Altera TSE PCS: merge the driver into Lynx PCS which it is a
variant of
- CAN: Kvaser PCIEcan:
- support packet timestamping
- WiFi:
- Intel (iwlwifi):
- major update for new firmware and Multi-Link Operation (MLO)
- configuration rework to drop test devices and split the
different families
- support for segmented PNVM images and power tables
- new vendor entries for PPAG (platform antenna gain) feature
- Qualcomm 802.11ax (ath11k):
- Multiple Basic Service Set Identifier (MBSSID) and Enhanced
MBSSID Advertisement (EMA) support in AP mode
- support factory test mode
- RealTek (rtw89):
- add RSSI based antenna diversity
- support U-NII-4 channels on 5 GHz band
- RealTek (rtl8xxxu):
- AP mode support for 8188f
- support USB RX aggregation for the newer chips"
* tag 'net-next-6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1602 commits)
net: scm: introduce and use scm_recv_unix helper
af_unix: Skip SCM_PIDFD if scm->pid is NULL.
net: lan743x: Simplify comparison
netlink: Add __sock_i_ino() for __netlink_diag_dump().
net: dsa: avoid suspicious RCU usage for synced VLAN-aware MAC addresses
Revert "af_unix: Call scm_recv() only after scm_set_cred()."
phylink: ReST-ify the phylink_pcs_neg_mode() kdoc
libceph: Partially revert changes to support MSG_SPLICE_PAGES
net: phy: mscc: fix packet loss due to RGMII delays
net: mana: use vmalloc_array and vcalloc
net: enetc: use vmalloc_array and vcalloc
ionic: use vmalloc_array and vcalloc
pds_core: use vmalloc_array and vcalloc
gve: use vmalloc_array and vcalloc
octeon_ep: use vmalloc_array and vcalloc
net: usb: qmi_wwan: add u-blox 0x1312 composition
perf trace: fix MSG_SPLICE_PAGES build error
ipvlan: Fix return value of ipvlan_queue_xmit()
netfilter: nf_tables: fix underflow in chain reference counter
netfilter: nf_tables: unbind non-anonymous set if rule construction fails
...
- Fix KMSAN vs FORTIFY in strlcpy/strlcat (Alexander Potapenko)
- Convert strreplace() to return string start (Andy Shevchenko)
- Flexible array conversions (Arnd Bergmann, Wyes Karny, Kees Cook)
- Add missing function prototypes seen with W=1 (Arnd Bergmann)
- Fix strscpy() kernel-doc typo (Arne Welzel)
- Replace strlcpy() with strscpy() across many subsystems which were
either Acked by respective maintainers or were trivial changes that
went ignored for multiple weeks (Azeem Shaikh)
- Remove unneeded cc-option test for UBSAN_TRAP (Nick Desaulniers)
- Add KUnit tests for strcat()-family
- Enable KUnit tests of FORTIFY wrappers under UML
- Add more complete FORTIFY protections for strlcat()
- Add missed disabling of FORTIFY for all arch purgatories.
- Enable -fstrict-flex-arrays=3 globally
- Tightening UBSAN_BOUNDS when using GCC
- Improve checkpatch to check for strcpy, strncpy, and fake flex arrays
- Improve use of const variables in FORTIFY
- Add requested struct_size_t() helper for types not pointers
- Add __counted_by macro for annotating flexible array size members
-----BEGIN PGP SIGNATURE-----
iQJKBAABCgA0FiEEpcP2jyKd1g9yPm4TiXL039xtwCYFAmSbftQWHGtlZXNjb29r
QGNocm9taXVtLm9yZwAKCRCJcvTf3G3AJj0MD/9X9jzJzCmsAU+yNldeoAzC84Sk
GVU3RBxGcTNysL1gZXynkIgigw7DWc4htMGeSABHHwQRVP65JCH1Kw/VqIkyumbx
9LdX6IklMJb4pRT4PVU3azebV4eNmSjlur2UxMeW54Czm91/6I8RHbJOyAPnOUmo
2oomGdP/hpEHtKR7hgy8Axc6w5ySwQixh2V5sVZG3VbvCS5WKTmTXbs6puuRT5hz
iHt7v+7VtEg/Qf1W7J2oxfoghvVBsaRrSLrExWT/oZYh1ZxM7DsCAAoG/IsDgHGA
9LBXiRECgAFThbHVxLvvKZQMXdVk0i8iXLX43XMKC0wTA+NTyH7wlcQQ4RWNMuo8
sfA9Qm9gMArXaf64aymr3Uwn20Zan0391HdlbhOJZAE6v3PPJbleUnM58AzD2d3r
5Lz6AIFBxDImy+3f9iDWgacCT5/PkeiXTHzk9QnKhJyKKtRA58XJxj4q2+rPnGJP
n4haXqoxD5FJbxdXiGKk31RS0U5HBug7wkOcUrTqDHUbc/QNU2b7dxTKUx+zYtCU
uV5emPzpF4H4z+91WpO47n9gkMAfwV0lt9S2dwS8pxsgqctbmIan+Jgip7rsqZ2G
OgLXBsb43eEs+6WgO8tVt/ZHYj9ivGMdrcNcsIfikzNs/xweUJ53k2xSEn2xEa5J
cwANDmkL6QQK7yfeeg==
=s0j1
-----END PGP SIGNATURE-----
Merge tag 'hardening-v6.5-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux
Pull hardening updates from Kees Cook:
"There are three areas of note:
A bunch of strlcpy()->strscpy() conversions ended up living in my tree
since they were either Acked by maintainers for me to carry, or got
ignored for multiple weeks (and were trivial changes).
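As a rough userspace illustration of what these conversions change: strlcpy()
returns the length of the source string, so it has to read the whole source
and callers must compare the result against the buffer size to notice
truncation, whereas strscpy() returns the number of bytes copied or -E2BIG
when the string did not fit. The sketch below models only that documented
return contract. demo_strscpy() is a hypothetical name and this is not the
kernel implementation.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>

/*
 * Hypothetical userspace stand-in modelled on the strscpy() contract:
 * always NUL-terminate when count > 0, return the number of characters
 * copied (excluding the NUL), or -E2BIG if count is 0 or src was truncated.
 */
static ssize_t demo_strscpy(char *dst, const char *src, size_t count)
{
	size_t i;

	if (count == 0)
		return -E2BIG;

	/* Copy at most count - 1 characters, leaving room for the NUL. */
	for (i = 0; i < count - 1 && src[i] != '\0'; i++)
		dst[i] = src[i];
	dst[i] = '\0';

	/* If the source still has characters left, the copy was truncated. */
	return src[i] == '\0' ? (ssize_t)i : -E2BIG;
}
```

With an 8-byte destination, copying "short" returns 5, while copying
"much too long" stores "much to" and returns -E2BIG, so a caller can detect
truncation with a single sign check.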
The compiler option '-fstrict-flex-arrays=3' has been enabled
globally, and has been in -next for the entire devel cycle. This
changes compiler diagnostics (though mainly just -Warray-bounds which
is disabled) and potential UBSAN_BOUNDS and FORTIFY _warning_
coverage. In other words, there are no new restrictions, just
potentially new warnings. Any new FORTIFY warnings we've seen have
been fixed (usually in their respective subsystem trees). For more
details, see commit df8fc4e934.
The under-development compiler attribute __counted_by has been added
so that we can start annotating flexible array members with their
associated structure member that tracks the count of flexible array
elements at run-time. It is possible (likely?) that the exact syntax
of the attribute will change before it is finalized, but GCC and Clang
are working together to sort it out. Any changes can be made to the
macro while we continue to add annotations.
As an example of that last case, I have a treewide commit waiting with
such annotations found via Coccinelle:
https://git.kernel.org/linus/adc5b3cb48a049563dc673f348eab7b6beba8a9b
Also see commit dd06e72e68 for more details.
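A minimal sketch of what such an annotation can look like, assuming a
compiler that may or may not implement the attribute yet. DEMO_COUNTED_BY,
struct demo_samples and demo_samples_alloc() are hypothetical names made up
for this illustration; the fallback to an empty macro mirrors the idea that
the annotation can be adjusted in one place while the attribute syntax
settles.

```c
#include <stddef.h>
#include <stdlib.h>

/*
 * DEMO_COUNTED_BY is a hypothetical stand-in for the __counted_by macro:
 * use the compiler attribute when it is available, otherwise expand to
 * nothing so the annotation is documentation only.
 */
#if defined(__has_attribute)
# if __has_attribute(__counted_by__)
#  define DEMO_COUNTED_BY(member) __attribute__((__counted_by__(member)))
# endif
#endif
#ifndef DEMO_COUNTED_BY
# define DEMO_COUNTED_BY(member)
#endif

struct demo_samples {
	size_t nr;			/* number of valid entries in val[] */
	int val[] DEMO_COUNTED_BY(nr);	/* flexible array whose length is nr */
};

/*
 * Allocate a zeroed object with room for nr entries. Overflow of the size
 * computation is not checked here; this is only a sketch.
 */
static struct demo_samples *demo_samples_alloc(size_t nr)
{
	struct demo_samples *s;

	s = calloc(1, sizeof(*s) + nr * sizeof(s->val[0]));
	if (!s)
		return NULL;
	s->nr = nr;	/* set the count before any access to val[] */
	return s;
}
```

The count member is assigned before the array is touched, since bounds
checking built on the attribute would consult nr when val[] is indexed.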
Summary:
- Fix KMSAN vs FORTIFY in strlcpy/strlcat (Alexander Potapenko)
- Convert strreplace() to return string start (Andy Shevchenko)
- Flexible array conversions (Arnd Bergmann, Wyes Karny, Kees Cook)
- Add missing function prototypes seen with W=1 (Arnd Bergmann)
- Fix strscpy() kernel-doc typo (Arne Welzel)
- Replace strlcpy() with strscpy() across many subsystems which were
either Acked by respective maintainers or were trivial changes that
went ignored for multiple weeks (Azeem Shaikh)
- Remove unneeded cc-option test for UBSAN_TRAP (Nick Desaulniers)
- Add KUnit tests for strcat()-family
- Enable KUnit tests of FORTIFY wrappers under UML
- Add more complete FORTIFY protections for strlcat()
- Add missed disabling of FORTIFY for all arch purgatories.
- Enable -fstrict-flex-arrays=3 globally
- Tightening UBSAN_BOUNDS when using GCC
- Improve checkpatch to check for strcpy, strncpy, and fake flex
arrays
- Improve use of const variables in FORTIFY
- Add requested struct_size_t() helper for types not pointers
- Add __counted_by macro for annotating flexible array size members"
* tag 'hardening-v6.5-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: (54 commits)
netfilter: ipset: Replace strlcpy with strscpy
uml: Replace strlcpy with strscpy
um: Use HOST_DIR for mrproper
kallsyms: Replace all non-returning strlcpy with strscpy
sh: Replace all non-returning strlcpy with strscpy
of/flattree: Replace all non-returning strlcpy with strscpy
sparc64: Replace all non-returning strlcpy with strscpy
Hexagon: Replace all non-returning strlcpy with strscpy
kobject: Use return value of strreplace()
lib/string_helpers: Change returned value of the strreplace()
jbd2: Avoid printing outside the boundary of the buffer
checkpatch: Check for 0-length and 1-element arrays
riscv/purgatory: Do not use fortified string functions
s390/purgatory: Do not use fortified string functions
x86/purgatory: Do not use fortified string functions
acpi: Replace struct acpi_table_slit 1-element array with flex-array
clocksource: Replace all non-returning strlcpy with strscpy
string: use __builtin_memcpy() in strlcpy/strlcat
staging: most: Replace all non-returning strlcpy with strscpy
drm/i2c: tda998x: Replace all non-returning strlcpy with strscpy
...
For historical reasons, unbound workqueues with max concurrency limit of 1
are considered ordered, even though the concurrency limit hasn't been
system-wide for a long time. This creates ambiguity around whether ordered
execution is actually required for correctness, which was actually confusing
for e.g. btrfs (btrfs updates are being routed through the btrfs tree).
There aren't that many users in the tree which use the combination and there
are pending improvements to unbound workqueue affinity handling which will
make inadvertent use of ordered workqueue a bigger loss. This pull request
clarifies the situation for most of them by updating the ones which require
ordered execution to use alloc_ordered_workqueue().
There are some conversions being routed through subsystem-specific trees and
likely a few stragglers. Once they're all converted, workqueue can trigger a
warning on unbound + @max_active==1 usages and eventually drop the implicit
ordered behavior.
-----BEGIN PGP SIGNATURE-----
iIQEABYIACwWIQTfIjM1kS57o3GsC/uxYfJx3gVYGQUCZJoKnA4cdGpAa2VybmVs
Lm9yZwAKCRCxYfJx3gVYGc5SAQDOtjML7Cx9AYzbY5+nYc0wTebRRTXGeOu7A3Xy
j50rVgEAjHgvHLIdmeYmVhCeHOSN4q7Wn5AOwaIqZalOhfLyKQk=
=hs79
-----END PGP SIGNATURE-----
Merge tag 'wq-for-6.5-cleanup-ordered' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq
Pull ordered workqueue creation updates from Tejun Heo:
"For historical reasons, unbound workqueues with max concurrency limit
of 1 are considered ordered, even though the concurrency limit hasn't
been system-wide for a long time.
This creates ambiguity around whether ordered execution is actually
required for correctness, which was actually confusing for e.g. btrfs
(btrfs updates are being routed through the btrfs tree).
There aren't that many users in the tree which use the combination and
there are pending improvements to unbound workqueue affinity handling
which will make inadvertent use of ordered workqueue a bigger loss.
This clarifies the situation for most of them by updating the ones
which require ordered execution to use alloc_ordered_workqueue().
There are some conversions being routed through subsystem-specific
trees and likely a few stragglers. Once they're all converted,
workqueue can trigger a warning on unbound + @max_active==1 usages and
eventually drop the implicit ordered behavior"
* tag 'wq-for-6.5-cleanup-ordered' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq:
rxrpc: Use alloc_ordered_workqueue() to create ordered workqueues
net: qrtr: Use alloc_ordered_workqueue() to create ordered workqueues
net: wwan: t7xx: Use alloc_ordered_workqueue() to create ordered workqueues
dm integrity: Use alloc_ordered_workqueue() to create ordered workqueues
media: amphion: Use alloc_ordered_workqueue() to create ordered workqueues
scsi: NCR5380: Use default @max_active for hostdata->work_q
media: coda: Use alloc_ordered_workqueue() to create ordered workqueues
crypto: octeontx2: Use alloc_ordered_workqueue() to create ordered workqueues
wifi: ath10/11/12k: Use alloc_ordered_workqueue() to create ordered workqueues
wifi: mwifiex: Use default @max_active for workqueues
wifi: iwlwifi: Use default @max_active for trans_pcie->rba.alloc_wq
xen/pvcalls: Use alloc_ordered_workqueue() to create ordered workqueues
virt: acrn: Use alloc_ordered_workqueue() to create ordered workqueues
net: octeontx2: Use alloc_ordered_workqueue() to create ordered workqueues
net: thunderx: Use alloc_ordered_workqueue() to create ordered workqueues
greybus: Use alloc_ordered_workqueue() to create ordered workqueues
powerpc, workqueue: Use alloc_ordered_workqueue() to create ordered workqueues
-----BEGIN PGP SIGNATURE-----
iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmSV8dwQHGF4Ym9lQGtl
cm5lbC5kawAKCRD301j7KXHgpilGD/9Yys1oxIXJpRf00fzrylAlBthRxMjFQVWw
zAut106hAQiBHvU8IkmGA3MvEFVHxtzwYhHI7IR8K3aZBIqscweCqmVI9JyogJw9
U9Twnzel47VmuKdM94FeoN+hbj1fP8EWTjzmy67/zEEfFCdmHvNlMi3lSrGYIpFy
39LxTB99Y4UarM5PtWbes37GYYljzMSWKuo4AfBkvq1eQa+sZ0Vq2xAABKq3UM7f
apqhgHtkJooRePDP0eQp+kAyyVMgW2jIK+oIdJDxNF3CKTu2w40RzaYz6fp+jVSU
H4R/xS59GW4/xql+VBJDh/qJg9K62DPPYjlW8BmSR8+IjvfFpsyH3/MacE50CD3P
20fs/Mnj49H79fDrQEHJI53cOOb2EmUitbwLbvOcColNTPpt8loBtdQxjF2RMU8R
Nyort9DJPFclYCxky1LYg1CNEC2Ln4Zy/jD47wPvqRmOQphOoVlV/hPnOEqvjaZC
49Vn70W2DeE9cXvYI7ha+XIg6/oj+Gs3iusEbV08Ci7EAtXgI+ZUUsQ97K8UNiUh
h2lqSJtuI7lBpYP9sf+BeCch5UCC+xGYyTdoM5f58lehWBBPtbs0g7S9RyRyOYxe
n+yxEUo3dAGzJ/xsKAjinbZfeWIpr0b1TkAh4w3Cq/BKzRr9Bp8lBAxYuancbQ+Y
1ADPteUOTA==
=zP4Y
-----END PGP SIGNATURE-----
Merge tag 'for-6.5/block-2023-06-23' of git://git.kernel.dk/linux
Pull block updates from Jens Axboe:
- NVMe pull request via Keith:
- Various cleanups all around (Irvin, Chaitanya, Christophe)
- Better struct packing (Christophe JAILLET)
- Reduce controller error logs for optional commands (Keith)
- Support for >=64KiB block sizes (Daniel Gomez)
- Fabrics fixes and code organization (Max, Chaitanya, Daniel
Wagner)
- bcache updates via Coly:
- Fix a race at init time (Mingzhe Zou)
- Misc fixes and cleanups (Andrea, Thomas, Zheng, Ye)
- use page pinning in the block layer for dio (David)
- convert old block dio code to page pinning (David, Christoph)
- cleanups for pktcdvd (Andy)
- cleanups for rnbd (Guoqing)
- use the unchecked __bio_add_page() for the initial single page
additions (Johannes)
- fix overflows in the Amiga partition handling code (Michael)
- improve mq-deadline zoned device support (Bart)
- keep passthrough requests out of the IO schedulers (Christoph, Ming)
- improve support for flush requests, making them less special to deal
with (Christoph)
- add bdev holder ops and shutdown methods (Christoph)
- fix the name_to_dev_t() situation and use cases (Christoph)
- decouple the block open flags from fmode_t (Christoph)
- ublk updates and cleanups, including adding user copy support (Ming)
- BFQ sanity checking (Bart)
- convert brd from radix to xarray (Pankaj)
- constify various structures (Thomas, Ivan)
- more fine grained persistent reservation ioctl capability checks
(Jingbo)
- misc fixes and cleanups (Arnd, Azeem, Demi, Ed, Hengqi, Hou, Jan,
Jordy, Li, Min, Yu, Zhong, Waiman)
* tag 'for-6.5/block-2023-06-23' of git://git.kernel.dk/linux: (266 commits)
scsi/sg: don't grab scsi host module reference
ext4: Fix warning in blkdev_put()
block: don't return -EINVAL for not found names in devt_from_devname
cdrom: Fix spectre-v1 gadget
block: Improve kernel-doc headers
blk-mq: don't insert passthrough request into sw queue
bsg: make bsg_class a static const structure
ublk: make ublk_chr_class a static const structure
aoe: make aoe_class a static const structure
block/rnbd: make all 'class' structures const
block: fix the exclusive open mask in disk_scan_partitions
block: add overflow checks for Amiga partition support
block: change all __u32 annotations to __be32 in affs_hardblocks.h
block: fix signed int overflow in Amiga partition support
block: add capacity validation in bdev_add_partition()
block: fine-granular CAP_SYS_ADMIN for Persistent Reservation
block: disallow Persistent Reservation on partitions
reiserfs: fix blkdev_put() warning from release_journal_dev()
block: fix wrong mode for blkdev_get_by_dev() from disk_scan_partitions()
block: document the holder argument to blkdev_get_by_path
...
Use sendmsg() with MSG_SPLICE_PAGES rather than sendpage. This allows
multiple pages and multipage folios to be passed through.
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Mike Christie <michael.christie@oracle.com>
cc: Lee Duncan <lduncan@suse.com>
cc: Chris Leech <cleech@redhat.com>
cc: "James E.J. Bottomley" <jejb@linux.ibm.com>
cc: "Martin K. Petersen" <martin.petersen@oracle.com>
cc: Jens Axboe <axboe@kernel.dk>
cc: Matthew Wilcox <willy@infradead.org>
cc: Al Viro <viro@zeniv.linux.org.uk>
cc: open-iscsi@googlegroups.com
Link: https://lore.kernel.org/r/20230623225513.2732256-12-dhowells@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
In order to prevent request_queue from being freed before cleaning up
blktrace debugfs entries, commit db59133e92 ("scsi: sg: fix blktrace
debugfs entries leakage") uses scsi_device_get(). However,
scsi_device_get() also grabs a scsi module reference, so the scsi module
can't be removed.
It's reported that blktests can't unload scsi_debug after block/001:
blktests (master) # ./check block
block/001 (stress device hotplugging) [failed]
+++ /root/blktests/results/nodev/block/001.out.bad 2023-06-19
Running block/001
Stressing sd
+modprobe: FATAL: Module scsi_debug is in use.
Fix this problem by grabbing a request_queue reference directly, so that
the scsi host module can still be unloaded while the request_queue is
pinned by the sg device.
Reported-by: Chaitanya Kulkarni <chaitanyak@nvidia.com>
Link: https://lore.kernel.org/all/1760da91-876d-fc9c-ab51-999a6f66ad50@nvidia.com/
Fixes: db59133e92 ("scsi: sg: fix blktrace debugfs entries leakage")
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20230621160111.1433521-1-yukuai1@huaweicloud.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
One-element arrays are deprecated, and we are replacing them with flexible
array members instead. So, replace one-element arrays with flexible-array
members in a couple of structures, and refactor the rest of the code,
accordingly.
This helps with the ongoing efforts to tighten the FORTIFY_SOURCE routines
on memcpy().
This results in no differences in binary output.
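The general shape of such a conversion can be sketched as follows. The
structures and helpers are hypothetical (they are not the ones touched by
this patch) and only illustrate why the allocation size can stay identical
once the "- 1" adjustment in the size computation is dropped.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical structures, used only to illustrate the conversion. */
struct demo_old {
	uint32_t count;
	uint32_t entry[1];	/* deprecated one-element "fake" flexible array */
};

struct demo_new {
	uint32_t count;
	uint32_t entry[];	/* C99 flexible array member */
};

/*
 * Old style: sizeof() already includes the first element, so the size
 * computation subtracts one. Assumes n >= 1.
 */
static size_t demo_old_size(size_t n)
{
	return sizeof(struct demo_old) + (n - 1) * sizeof(uint32_t);
}

/* New style: sizeof() covers only the header, so no adjustment is needed. */
static size_t demo_new_size(size_t n)
{
	return sizeof(struct demo_new) + n * sizeof(uint32_t);
}
```

For any n >= 1 both helpers return the same byte count, which is consistent
with a refactoring that leaves the generated code unchanged.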
Link: https://github.com/KSPP/linux/issues/79
Link: https://github.com/KSPP/linux/issues/204
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Link: https://lore.kernel.org/r/ZJNdKDkuRbFZpASS@work
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
clang points out that the lpfc_name structure has an 8-byte alignment
requirement on most architectures, but is embedded in a number of other
structures that are forced to be only 1-byte aligned:
drivers/scsi/lpfc/lpfc_hw.h:1516:30: error: field pe within 'struct lpfc_fdmi_reg_port_list' is less aligned than 'struct lpfc_fdmi_port_entry' and is usually due to 'struct lpfc_fdmi_reg_port_list' being packed, which can lead to unaligned accesses [-Werror,-Wunaligned-access]
struct lpfc_fdmi_port_entry pe;
drivers/scsi/lpfc/lpfc_hw.h:850:19: error: field portName within 'struct _ADISC' is less aligned than 'struct lpfc_name' and is usually due to 'struct _ADISC' being packed, which can lead to unaligned accesses [-Werror,-Wunaligned-access]
drivers/scsi/lpfc/lpfc_hw.h:851:19: error: field nodeName within 'struct _ADISC' is less aligned than 'struct lpfc_name' and is usually due to 'struct _ADISC' being packed, which can lead to unaligned accesses [-Werror,-Wunaligned-access]
drivers/scsi/lpfc/lpfc_hw.h:922:19: error: field portName within 'struct _RNID' is less aligned than 'struct lpfc_name' and is usually due to 'struct _RNID' being packed, which can lead to unaligned accesses [-Werror,-Wunaligned-access]
drivers/scsi/lpfc/lpfc_hw.h:923:19: error: field nodeName within 'struct _RNID' is less aligned than 'struct lpfc_name' and is usually due to 'struct _RNID' being packed, which can lead to unaligned accesses [-Werror,-Wunaligned-access]
From the git history, I can see that all the __packed annotations were done
specifically to avoid introducing implicit padding around the lpfc_name
instances, though this was probably the wrong approach.
To improve this, only annotate the one uint64_t field inside of lpfc_name
as packed, with an explicit 4-byte alignment, as is the default already on
the 32-bit x86 ABI but not on most others. With this, the other __packed
annotations can be removed again, as this avoids the incorrect padding.
Two other structures change their layout as a result of this change:
- struct _LOGO never gained a __packed annotation even though it has the
same alignment problem as the others, but it is not used anywhere in the
driver today.
- struct serv_param similarly has this issue, and it is used. My guess is
that this is only an internal structure rather than part of a binary
interface, so the padding has no negative effect here.
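As a rough illustration of the layout change described above, the sketch
below marks only the 64-bit member as packed with an explicit 4-byte
alignment, so the containing structure needs no __packed annotation of its
own. The names wwn_name and adisc_like are hypothetical stand-ins, not the
driver's definitions, and the attribute syntax assumes GCC or Clang:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a world-wide name: only the 64-bit member is
 * annotated, which caps the alignment of the whole union at 4 bytes. */
struct wwn_name {
	union {
		uint8_t bytes[8];
		uint64_t name __attribute__((packed, aligned(4)));
	} u;
};

/* Hypothetical container: a 4-byte field followed by two names.
 * No packed attribute is needed to avoid padding before port_name. */
struct adisc_like {
	uint32_t hard_addr;
	struct wwn_name port_name;
	struct wwn_name node_name;
	uint32_t did;
};

/* Compile-time checks of the layout this sketch expects. */
static_assert(sizeof(struct wwn_name) == 8, "name must stay 8 bytes");
static_assert(_Alignof(struct wwn_name) == 4, "name must be 4-byte aligned");

size_t wwn_name_alignment(void)
{
	return _Alignof(struct wwn_name);
}

size_t port_name_offset(void)
{
	return offsetof(struct adisc_like, port_name);
}

size_t adisc_like_size(void)
{
	return sizeof(struct adisc_like);
}
```

With the 4-byte alignment, port_name sits directly after the 32-bit field
and the container keeps its natural alignment, so the compiler has no
under-aligned member to warn about.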
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Link: https://lore.kernel.org/r/20230616090705.2623408-1-arnd@kernel.org
Reviewed-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Martin Wilck <mwilck@suse.com> says:
This patch series addresses some issues we saw in a test setup with a
large number of SCSI LUNs. The first two patches simply increase the
number of available sg and bsg devices. 3-5 fix a large delay we
encountered between blocking a Fibre Channel remote port and the
dev_loss_tmo. 6 renames scsi_target_block() to scsi_block_targets(),
and makes additional changes to this API, as suggested in the review
of the v2 series. 7 improves a warning message.
Link: https://lore.kernel.org/r/20230614103616.31857-1-mwilck@suse.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
If __scsi_internal_device_block() returns an error, it is always -EINVAL
because of an invalid state transition. For debugging purposes, it makes
more sense to print the device state.
Signed-off-by: Martin Wilck <mwilck@suse.com>
Link: https://lore.kernel.org/r/20230614103616.31857-8-mwilck@suse.com
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
All callers (fc_remote_port_delete(), __iscsi_block_session(),
__srp_start_tl_fail_timers(), srp_reconnect_rport(), snic_tgt_del()) pass
parent devices of scsi_target devices to scsi_target_block().
Rename the function to scsi_block_targets(), and simplify it by assuming
that it is always passed a parent device. Also, have callers pass the
Scsi_Host pointer to scsi_block_targets(), as every caller has this pointer
readily available.
Suggested-by: Christoph Hellwig <hch@lst.de>
Suggested-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Martin Wilck <mwilck@suse.com>
Link: https://lore.kernel.org/r/20230614103616.31857-7-mwilck@suse.com
Cc: Karan Tilak Kumar <kartilak@cisco.com>
Cc: Sesidhar Baddela <sebaddel@cisco.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
scsi_device_block() is only called from scsi_target_block(), which calls it
repeatedly for every child device. For targets with many devices, waiting
for every queue to quiesce may cause a substantial delay (we measured more
than 100s delay for blocking a FC rport with 2048 LUNs).
Just call blk_mq_wait_quiesce_done() once from scsi_target_block() after
stopping all queues.
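The difference between waiting per device and waiting once can be sketched
with a small userspace model. Everything here is a hypothetical stand-in
(fake_queue, fake_quiesce_nowait(), fake_wait_quiesce_done()); the model
only counts how many waits each strategy performs and does not reproduce
the block layer's behavior:

```c
#include <stddef.h>

#define MAX_LUNS 4096

/* Hypothetical stand-in for a request queue. */
struct fake_queue {
	int quiesced;
};

static struct fake_queue queues[MAX_LUNS];
static unsigned int waits_done;

/* Models starting a quiesce without waiting for it to take effect. */
static void fake_quiesce_nowait(struct fake_queue *q)
{
	q->quiesced = 1;
}

/* Models one (potentially slow) wait for quiescing to complete. */
static void fake_wait_quiesce_done(void)
{
	waits_done++;
}

/* Old pattern: stop and wait for every child device in turn. */
unsigned int block_waiting_per_device(size_t n)
{
	waits_done = 0;
	for (size_t i = 0; i < n && i < MAX_LUNS; i++) {
		fake_quiesce_nowait(&queues[i]);
		fake_wait_quiesce_done();
	}
	return waits_done;
}

/* New pattern: stop all queues first, then wait a single time. */
unsigned int block_waiting_once(size_t n)
{
	waits_done = 0;
	for (size_t i = 0; i < n && i < MAX_LUNS; i++)
		fake_quiesce_nowait(&queues[i]);
	if (n > 0)
		fake_wait_quiesce_done();
	return waits_done;
}
```

For 2048 devices the first variant performs 2048 waits and the second
performs one, which is the source of the delay reduction described above.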
Signed-off-by: Martin Wilck <mwilck@suse.com>
Link: https://lore.kernel.org/r/20230614103616.31857-6-mwilck@suse.com
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
scsi_stop_queue() has just two callers, one with and one without
"nowait". As blk_mq_quiesce_queue() comes down to
blk_mq_quiesce_queue_nowait() followed by blk_mq_wait_quiesce_done(), we
might as well open-code this in scsi_device_block().
Also, add a comment explaining why blk_mq_quiesce_queue_nowait() must be
called with the state_mutex held, see
https://lore.kernel.org/linux-scsi/3b8b13bf-a458-827a-b916-07d7eee8ae00@acm.org/.
Signed-off-by: Martin Wilck <mwilck@suse.com>
Link: https://lore.kernel.org/r/20230614103616.31857-5-mwilck@suse.com
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
scsi_internal_device_block() is only called from device_block(). Merge the
two functions, and call the result scsi_device_block(), as the name
device_block() is confusingly generic.
Signed-off-by: Martin Wilck <mwilck@suse.com>
Link: https://lore.kernel.org/r/20230614103616.31857-4-mwilck@suse.com
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Larger setups may need to allocate more than 32k sg devices, so increase
the number of devices to the full range of minor device numbers.
Signed-off-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Martin Wilck <mwilck@suse.com>
Link: https://lore.kernel.org/r/20230614103616.31857-3-mwilck@suse.com
Acked-by: Douglas Gilbert <dgilbert@interlog.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Nilesh Javali <njavali@marvell.com> says:
Please apply the qla2xxx driver klocwork fixes to the scsi tree at
your earliest convenience.
Link: https://lore.kernel.org/r/20230607113843.37185-1-njavali@marvell.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
sg_ioctl() supports enabling blktrace, which creates debugfs entries under
"/sys/kernel/debug/block/sgx/". However, there is no guarantee that the
user will remove these entries through ioctl, and deleting the sg device
doesn't clean up these blktrace entries.
This problem could be fixed by cleaning up blktrace while releasing the
request_queue. However, it's not a good idea to add this special handling
to the common layer just for the sg device.
Fix this problem by shutting down blktrace in sg_device_destroy(), where
the device is deleted and all users have closed the device. Also grab a
scsi_device reference in sg_add_device() to prevent the scsi_device from
being freed before sg_device_destroy().
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Link: https://lore.kernel.org/r/20230610022003.2557284-3-yukuai1@huaweicloud.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Klocwork reported that array 'port_dstate_str' of size 10 may be indexed
with values 10..15.
Correct the array index to stay within bounds.
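One common way to guard such a lookup is to compare the index against the
array size before using it. The sketch below is only an illustration: the
table contents, the state_name() helper, and the "Invalid" fallback are
hypothetical, and the actual patch may correct the index differently:

```c
#include <stddef.h>
#include <string.h>

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Hypothetical state table; not the driver's port_dstate_str contents. */
static const char *const state_names[] = {
	"Unknown",
	"Unconfigured",
	"Dead",
	"Lost",
	"Online",
};

/* Returns a printable name, falling back for out-of-range states
 * instead of reading past the end of the table. */
const char *state_name(unsigned int state)
{
	if (state >= ARRAY_SIZE(state_names))
		return "Invalid";
	return state_names[state];
}
```

Deriving the bound from the array itself keeps the check correct if
entries are added to the table later.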
Cc: stable@vger.kernel.org
Signed-off-by: Bikash Hazarika <bhazarika@marvell.com>
Signed-off-by: Nilesh Javali <njavali@marvell.com>
Link: https://lore.kernel.org/r/20230607113843.37185-8-njavali@marvell.com
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Klocwork tool reported pointer 'rport' returned from call to function
fc_bsg_to_rport() may be NULL and will be dereferenced.
Add a fix to validate rport before dereferencing.
Cc: stable@vger.kernel.org
Signed-off-by: Shreyas Deodhar <sdeodhar@marvell.com>
Signed-off-by: Nilesh Javali <njavali@marvell.com>
Link: https://lore.kernel.org/r/20230607113843.37185-7-njavali@marvell.com
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Klocwork warning: Buffer Overflow - Array Index Out of Bounds
Driver uses fc_els_flogi to calculate size of buffer. The actual buffer is
nested inside of fc_els_flogi which is smaller.
Replace structure name to allow proper size calculation.
Cc: stable@vger.kernel.org
Signed-off-by: Quinn Tran <qutran@marvell.com>
Signed-off-by: Nilesh Javali <njavali@marvell.com>
Link: https://lore.kernel.org/r/20230607113843.37185-6-njavali@marvell.com
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Klocwork reported a warning that rport may be NULL and will be
dereferenced. The rport returned by the call to fc_bsg_to_rport() could be
NULL and then be dereferenced.
Check that fc_bsg_to_rport() returned a valid rport.
Cc: stable@vger.kernel.org
Signed-off-by: Nilesh Javali <njavali@marvell.com>
Link: https://lore.kernel.org/r/20230607113843.37185-5-njavali@marvell.com
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Klocwork reported a warning that a NULL pointer may be dereferenced. The
routine exits when sa_ctl is NULL, and fcport is allocated after the exit
call, thus causing a NULL fcport pointer to be dereferenced at the time of
exit.
To avoid the fcport pointer dereference, exit the routine when sa_ctl is
NULL.
Cc: stable@vger.kernel.org
Signed-off-by: Nilesh Javali <njavali@marvell.com>
Link: https://lore.kernel.org/r/20230607113843.37185-4-njavali@marvell.com
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Hyper-V synthetic SCSI devices do not support the MAINTENANCE_IN SCSI
command, so scsi_report_opcode() always fails, resulting in messages like
this:
hv_storvsc <guid>: tag#205 cmd 0xa3 status: scsi 0x2 srb 0x86 hv 0xc0000001
The recently added support for command duration limits calls
scsi_report_opcode() four times as each device comes online, which
significantly increases the number of messages logged in a system with many
disks.
Fix the problem by always marking Hyper-V synthetic SCSI devices as not
supporting scsi_report_opcode(). With this setting, the MAINTENANCE_IN SCSI
command is not issued and no messages are logged.
Signed-off-by: Michael Kelley <mikelley@microsoft.com>
Link: https://lore.kernel.org/r/1686343101-18930-1-git-send-email-mikelley@microsoft.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Fix the I/O hang that arises because of the MSIx vector not having a mapped
online CPU upon receiving completion.
SCSI cmds take the blk_mq route, which is set up during init. Reserved cmds
fetch the vector_no from mq_map after init is complete. Before init, they
have to use 0, as per the norm.
Reviewed-by: Gilbert Wu <gilbert.wu@microchip.com>
Signed-off-by: Sagar Biradar <Sagar.Biradar@microchip.com>
Reviewed-by: John Garry <john.g.garry@oracle.com>
Link: https://lore.kernel.org/r/20230519230834.27436-1-sagar.biradar@microchip.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The only overlap between the block open flags mapped into the fmode_t and
other uses of fmode_t are FMODE_READ and FMODE_WRITE. Define a new
blk_mode_t instead for use in blkdev_get_by_{dev,path}, ->open and
->ioctl and stop abusing fmode_t.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Jack Wang <jinpu.wang@ionos.com> [rnbd]
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Christian Brauner <brauner@kernel.org>
Link: https://lore.kernel.org/r/20230608110258.189493-28-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Instead of passing a fmode_t and only checking it for FMODE_WRITE, pass
a bool open_for_write to prepare for callers that won't have the fmode_t.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Acked-by: Christian Brauner <brauner@kernel.org>
Link: https://lore.kernel.org/r/20230608110258.189493-21-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Instead of passing a fmode_t and only checking it for FMODE_WRITE, pass
a bool open_for_write to prepare for callers that won't have the fmode_t.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Acked-by: Christian Brauner <brauner@kernel.org>
Link: https://lore.kernel.org/r/20230608110258.189493-20-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Instead of passing a fmode_t and only checking it for FMODE_WRITE, pass
a bool open_for_write to prepare for callers that won't have the fmode_t.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Acked-by: Christian Brauner <brauner@kernel.org>
Link: https://lore.kernel.org/r/20230608110258.189493-19-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
The mode argument to the ->release block_device_operation is never used,
so remove it.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Acked-by: Christian Brauner <brauner@kernel.org>
Acked-by: Jack Wang <jinpu.wang@ionos.com> [rnbd]
Link: https://lore.kernel.org/r/20230608110258.189493-10-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
->open is only called on the whole device. Make that explicit by
passing a gendisk instead of the block_device.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Acked-by: Christian Brauner <brauner@kernel.org>
Acked-by: Jack Wang <jinpu.wang@ionos.com> [rnbd]
Link: https://lore.kernel.org/r/20230608110258.189493-9-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
bdev_check_media_change should only ever be called for the whole device.
Pass a gendisk to make that explicit and rename the function to
disk_check_media_change.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Acked-by: Christian Brauner <brauner@kernel.org>
Link: https://lore.kernel.org/r/20230608110258.189493-8-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
One-element arrays as fake flex arrays are deprecated and we are moving
towards adopting C99 flexible-array members, instead. So, replace
one-element array declaration in struct ct_sns_gpnft_rsp, which is
ultimately being used inside a union:
drivers/scsi/qla2xxx/qla_def.h:
	struct ct_sns_gpnft_pkt {
		union {
			struct ct_sns_req req;
			struct ct_sns_gpnft_rsp rsp;
		} p;
	};
Refactor the rest of the code, accordingly.
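The general shape of such a conversion can be sketched as follows. The
structures rsp_entry, rsp_old, and rsp_new are hypothetical and much
simpler than the driver's definitions; the point is only how the size
computation changes when the fake one-element array becomes a C99
flexible-array member:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical response entry made of byte fields only. */
struct rsp_entry {
	uint8_t port_id[4];
	uint8_t port_name[8];
};

/* Deprecated style: a one-element array used as a fake flexible array. */
struct rsp_old {
	uint16_t hdr;
	struct rsp_entry entries[1];
};

/* C99 style: a true flexible-array member. */
struct rsp_new {
	uint16_t hdr;
	struct rsp_entry entries[];
};

/* Old size computation must subtract the element already counted by
 * sizeof(struct rsp_old). Assumes n >= 1. */
size_t rsp_old_alloc_size(size_t n)
{
	return sizeof(struct rsp_old) + (n - 1) * sizeof(struct rsp_entry);
}

/* New size computation needs no off-by-one adjustment. */
size_t rsp_new_alloc_size(size_t n)
{
	return sizeof(struct rsp_new) + n * sizeof(struct rsp_entry);
}
```

Both helpers yield the same number of bytes for the same entry count, but
the flexible-array version lets the compiler and fortified routines see
that entries has no fixed bound of one element.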
This issue was found with the help of Coccinelle.
Link: https://github.com/KSPP/linux/issues/245
Link: https://github.com/KSPP/linux/issues/193
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Link: https://lore.kernel.org/r/ZH+/rZ1R1cBjIxjS@work
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The .remove() callback for a platform driver returns an int which makes
many driver authors wrongly assume it's possible to do error handling by
returning an error code. However the value returned is (mostly) ignored and
this typically results in resource leaks. To improve here there is a quest
to make the remove callback return void. In the first step of this quest
all drivers are converted to .remove_new() which already returns void.
hisi_sas_remove() returned zero unconditionally so this was changed to
return void. Then it has the right prototype to be used directly as remove
callback for the two hisi_sas drivers.
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Link: https://lore.kernel.org/r/20230518202043.261739-1-u.kleine-koenig@pengutronix.de
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Prevent any potential integer wrapping issue, and avoid a
-Wstringop-overflow warning by using the check_mul_overflow() helper.
drivers/scsi/lpfc/lpfc.h:
	#define LPFC_RAS_MIN_BUFF_POST_SIZE (256 * 1024)
drivers/scsi/lpfc/lpfc_debugfs.c:
	size = LPFC_RAS_MIN_BUFF_POST_SIZE * phba->cfg_ras_fwlog_buffsize;
This can wrap to negative if cfg_ras_fwlog_buffsize is large
enough. And even when in practice this is not possible (due to
phba->cfg_ras_fwlog_buffsize never being larger than 4[1]), the
compiler is legitimately warning us about potentially buggy code.
Fix the following warning seen under GCC-13:
In function ‘lpfc_debugfs_ras_log_data’,
inlined from ‘lpfc_debugfs_ras_log_open’ at drivers/scsi/lpfc/lpfc_debugfs.c:2271:15:
drivers/scsi/lpfc/lpfc_debugfs.c:2210:25: warning: ‘memcpy’ specified bound between 18446744071562067968 and 18446744073709551615 exceeds maximum object size 9223372036854775807 [-Wstringop-overflow=]
2210 | memcpy(buffer + copied, dmabuf->virt,
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
2211 | size - copied - 1);
| ~~~~~~~~~~~~~~~~~~
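A minimal userspace sketch of the checked multiplication follows. It uses
the GCC/Clang builtin __builtin_mul_overflow() in place of the kernel's
check_mul_overflow() helper; the macro MIN_BUFF_POST_SIZE, the function
ras_log_buffer_size(), and the -1 error return are assumptions made for
this illustration, not the driver's code:

```c
/* Mirrors the value of LPFC_RAS_MIN_BUFF_POST_SIZE quoted above. */
#define MIN_BUFF_POST_SIZE (256 * 1024)

/* Returns the buffer size in bytes, or -1 if the product does not fit
 * in an int. The builtin reports overflow instead of silently wrapping. */
int ras_log_buffer_size(unsigned int buffsize)
{
	int size;

	if (__builtin_mul_overflow(MIN_BUFF_POST_SIZE, buffsize, &size))
		return -1;
	return size;
}
```

With a multiplier of 4 the result is 1 MiB, while a multiplier of 8192
would need 2^31 and is rejected rather than wrapping to a negative size
that later reaches memcpy().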
Link: https://github.com/KSPP/linux/issues/305
Link: https://lore.kernel.org/linux-hardening/CABPRKS8zyzrbsWt4B5fp7kMowAZFiMLKg5kW26uELpg1cDKY3A@mail.gmail.com/ [1]
Co-developed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Link: https://lore.kernel.org/r/ZHkseX6TiFahvxJA@work
Reviewed-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Prefer struct_size() over open-coded versions of idiom:
sizeof(struct-with-flex-array) + sizeof(typeof-flex-array-elements) * count
where count is the max number of items the flexible array is supposed to
contain.
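The idiom can be approximated in userspace as shown below. The types item
and item_list and the helper item_list_size() are hypothetical; the helper
mimics the saturating behavior of struct_size() by returning SIZE_MAX when
the computation would overflow, using GCC/Clang overflow builtins:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical element type. */
struct item {
	uint32_t id;
	uint32_t flags;
};

/* Hypothetical structure ending in a flexible-array member. */
struct item_list {
	uint32_t count;
	struct item items[];
};

/* Computes sizeof(struct item_list) + count * sizeof(struct item),
 * saturating to SIZE_MAX on overflow so that a subsequent allocation
 * fails instead of returning an undersized buffer. */
size_t item_list_size(size_t count)
{
	size_t bytes;

	if (__builtin_mul_overflow(count, sizeof(struct item), &bytes) ||
	    __builtin_add_overflow(bytes, sizeof(struct item_list), &bytes))
		return SIZE_MAX;
	return bytes;
}
```

For small counts the helper matches the open-coded expression; the
difference only shows for counts large enough to wrap size_t.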
Link: https://github.com/KSPP/linux/issues/160
Co-developed-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Co-developed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Link: https://lore.kernel.org/r/20230531223319.24328-1-justintee8345@gmail.com
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Commit 141f3d6256 ("ata: libata-sata: Fix device queue depth control")
added a struct ata_device argument to ata_change_queue_depth() to
address problems with changing the queue depth of ATA devices managed
through libsas. This was due to problems with ata_scsi_find_dev() which
are now fixed with commit 7f875850f2 ("ata: libata-scsi: Use correct
device no in ata_find_dev()").
Undo some of the changes of commit 141f3d6256: remove the added struct
ata_device argument and again use ata_scsi_find_dev() to find the
target ATA device structure. While doing this, also make sure that
ata_scsi_find_dev() is called with ap->lock held, as it should.
libsas and libata call sites of ata_change_queue_depth() are updated to
match the modified function arguments.
Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Jason Yan <yanaijie@huawei.com>
Reviewed-by: John Garry <john.g.garry@oracle.com>
Copy the sense data to internal driver buffer when the firmware completes
any SCSI I/O command sent through admin queue with sense data for further
use.
Fixes: 506bc1a0d6 ("scsi: mpi3mr: Add support for MPT commands")
Cc: <stable@vger.kernel.org>
Signed-off-by: Sathya Prakash <sathya.prakash@broadcom.com>
Signed-off-by: Sumit Saxena <sumit.saxena@broadcom.com>
Link: https://lore.kernel.org/r/20230531184025.3803-1-sumit.saxena@broadcom.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Add fatal error checking for the pm8001_phy_control() and
pm8001_lu_reset() functions.
Signed-off-by: Changyuan Lyu <changyuanl@google.com>
Signed-off-by: Pranav Prasad <pranavpp@google.com>
Link: https://lore.kernel.org/r/20230526235155.433243-1-pranavpp@google.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
In a future patch HAS_IOPORT=n will result in inb()/outb() and friends not
being declared. We thus need to add HAS_IOPORT as a dependency for those
drivers using them.
Co-developed-by: Arnd Bergmann <arnd@kernel.org>
Signed-off-by: Arnd Bergmann <arnd@kernel.org>
Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com>
Link: https://lore.kernel.org/r/20230522105049.1467313-32-schnelle@linux.ibm.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Justin Tee <justintee8345@gmail.com> says:
Update lpfc to revision 14.2.0.13
This patch set contains discovery bug fixes, firmware logging
improvements, clean up of CQ handling, and statistics collection
enhancements.
The patches were cut against Martin's 6.5/scsi-queue tree.
Link: https://lore.kernel.org/r/20230523183206.7728-1-justintee8345@gmail.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Various improvements are made for collecting congestion statistics:
- Pre-existing logic is replaced with use of an hrtimer for increased
reporting accuracy.
- Congestion timestamp information is reorganized into a single struct.
- Common statistic collection logic is refactored into a helper routine.
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Link: https://lore.kernel.org/r/20230523183206.7728-8-justintee8345@gmail.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
There is mishandling of SLI-4 CQE status values larger than what is allowed
by the LPFC_IOCB_STATUS_MASK of 4 bits. The LPFC_IOCB_STATUS_MASK is a
leftover SLI-3 construct and serves no purpose in SLI-4 path.
Remove the LPFC_IOCB_STATUS_MASK and clean up general CQE status handling
in SLI-4 completion paths.
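The effect of a 4-bit mask on wider status values can be illustrated with
a short sketch. The mask name OLD_STATUS_MASK and the example status
values are hypothetical; only the 4-bit width is taken from the
description above:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical 4-bit mask, standing in for the removed SLI-3 construct. */
#define OLD_STATUS_MASK 0x0fu

/* Status as seen by code that still applies the 4-bit mask. */
uint32_t status_with_mask(uint32_t cqe_status)
{
	return cqe_status & OLD_STATUS_MASK;
}

/* True when two different status values become indistinguishable
 * after masking, which is the mishandling described above. */
bool statuses_alias(uint32_t a, uint32_t b)
{
	return a != b && status_with_mask(a) == status_with_mask(b);
}
```

Any status above 0xf loses its upper bits under the mask, so for example
0x13 and 0x03 would be handled identically.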
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Link: https://lore.kernel.org/r/20230523183206.7728-7-justintee8345@gmail.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
A firmware upgrade does not necessitate dumping of phba->dbg_log[] to kmsg
via LOG_TRACE_EVENT. A simple KERN_NOTICE log message should suffice to
notify the user of successful or unsuccessful firmware upgrade. As such,
firmware upgrade log messages are updated to use KERN_NOTICE instead of
LOG_TRACE_EVENT. Additionally, in order to notify the user of reset type
for instantiating newly downloaded firmware, lpfc_log_msg's default
KERN_LEVEL is updated to 5 or KERN_NOTICE.
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Link: https://lore.kernel.org/r/20230523183206.7728-6-justintee8345@gmail.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
When NPIV ports are zoned to devices that support both initiator and target
mode, a remote device's initiated PRLI results in unintended final kref
clean up of the device's ndlp structure. This disrupts NPIV ports'
discovery for target devices that support both initiator and target mode.
Modify the NPIV lpfc_drop_node clause such that we allow the ndlp to live
so long as it was in NLP_STE_PLOGI_ISSUE, NLP_STE_REG_LOGIN_ISSUE, or
NLP_STE_PRLI_ISSUE nlp_state. This allows lpfc's issued PRLI completion
routine to determine if the final kref clean up should execute rather than
a remote device's issued PRLI.
Fixes: db651ec225 ("scsi: lpfc: Correct used_rpi count when devloss tmo fires with no recovery")
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Link: https://lore.kernel.org/r/20230523183206.7728-5-justintee8345@gmail.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Pre-existing device loss recovery logic via the NLP_IN_RECOV_POST_DEV_LOSS
flag only handled Fabric Port Login, Fabric Controller, Management, and
Name Server addresses.
Fabric domain controllers fall under the same category for usage of the
NLP_IN_RECOV_POST_DEV_LOSS flag. Add a default case statement to mark an
ndlp for device loss recovery.
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Link: https://lore.kernel.org/r/20230523183206.7728-4-justintee8345@gmail.com
Acked-by: Martin Wilck <mwilck@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
In the dev_loss_tmo callback routine, we return early if the ndlp is in a
state of rediscovery. This occurs when a target proactively PLOGIs or PRLIs
after an RSCN before the dev_loss_tmo callback routine is scheduled to run.
Move the clearing of the NLP_IN_DEV_LOSS flag before the ndlp state check
in such cases.
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Link: https://lore.kernel.org/r/20230523183206.7728-3-justintee8345@gmail.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Due to a target port D_ID swap, it is possible for the
lpfc_register_remote_port() routine to touch post mortem fc_rport memory
when trying to access fc_rport->dd_data.
The D_ID swap causes a simultaneous call to lpfc_unregister_remote_port(),
where fc_remote_port_delete() reclaims fc_rport memory.
Remove the fc_rport->dd_data->pnode NULL assignment because the following
line reassigns ndlp->rport with an fc_rport object from
fc_remote_port_add() anyways. The pnode nullification is superfluous.
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Link: https://lore.kernel.org/r/20230523183206.7728-2-justintee8345@gmail.com
Acked-by: Martin Wilck <mwilck@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
strlcpy() reads the entire source buffer first. This read may exceed the
destination size limit. This is both inefficient and can lead to linear
read overflows if a source string is not NUL-terminated [1]. In an effort
to remove strlcpy() completely [2], replace strlcpy() here with strscpy().
No return values were used, so direct replacement is safe.
[1] https://www.kernel.org/doc/html/latest/process/deprecated.html#strlcpy
[2] https://github.com/KSPP/linux/issues/89
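The behavioral difference can be sketched in userspace. The function
bounded_copy() below is a hypothetical approximation of strscpy()
semantics, not the kernel implementation: it never reads more than size
bytes of the source, always NUL-terminates a non-empty destination, and
returns -E2BIG on truncation rather than the full source length:

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Copies src into dst (capacity size). Returns the number of characters
 * copied, not counting the NUL, or -E2BIG if the string was truncated
 * or size is zero. */
long bounded_copy(char *dst, const char *src, size_t size)
{
	size_t i;

	if (size == 0)
		return -E2BIG;

	for (i = 0; i < size; i++) {
		dst[i] = src[i];
		if (src[i] == '\0')
			return (long)i;
	}

	/* Source did not fit: terminate and report truncation. */
	dst[size - 1] = '\0';
	return -E2BIG;
}
```

Because the loop stops at the destination size, an unterminated source
cannot cause reads past that bound, which is the problem with strlcpy()
noted above.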
Signed-off-by: Azeem Shaikh <azeemshaikh38@gmail.com>
Link: https://lore.kernel.org/r/20230530162321.984035-1-azeemshaikh38@gmail.com
Reviewed-by: Kees Cook <keescook@chromium.org>
Acked-by: Don Brace <don.brace@microchip.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Bart Van Assche <bvanassche@acm.org> says:
In the traces we recorded while testing zoned storage we noticed that UFS
commands are requeued while the clock is being ungated. Command requeueing
makes it harder than necessary to preserve the command order. Hence this
patch series that modifies the SCSI core and also the UFS driver such that
clock ungating does not trigger command requeueing.
Link: https://lore.kernel.org/r/20230529202640.11883-1-bvanassche@acm.org
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Make scsi_host_block() easier to read by converting it to the widely used
early-return style. See also commit f983622ae6 ("scsi: core: Avoid
calling synchronize_rcu() for each device in scsi_host_block()").
Reviewed-by: Mike Christie <michael.christie@oracle.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Cc: Ye Bin <yebin10@huawei.com>
Cc: Hannes Reinecke <hare@suse.de>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Link: https://lore.kernel.org/r/20230529202640.11883-2-bvanassche@acm.org
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
gcc 13 may assign another type to enumeration constants than gcc 12. Split
the large enum at the top of source file stex.c such that the type of the
constants used in time expressions is changed back to the same type chosen
by gcc 12. This patch suppresses compiler warnings like this one:
In file included from ./include/linux/bitops.h:7,
from ./include/linux/kernel.h:22,
from drivers/scsi/stex.c:13:
drivers/scsi/stex.c: In function ‘stex_common_handshake’:
./include/linux/typecheck.h:12:25: error: comparison of distinct pointer types lacks a cast [-Werror]
12 | (void)(&__dummy == &__dummy2); \
| ^~
./include/linux/jiffies.h:106:10: note: in expansion of macro ‘typecheck’
106 | typecheck(unsigned long, b) && \
| ^~~~~~~~~
drivers/scsi/stex.c:1035:29: note: in expansion of macro ‘time_after’
1035 | if (time_after(jiffies, before + MU_MAX_DELAY * HZ)) {
| ^~~~~~~~~~
See also https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107405.
Cc: stable@vger.kernel.org
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Tested-by: Randy Dunlap <rdunlap@infradead.org> # build-tested
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Link: https://lore.kernel.org/r/20230529195034.3077-1-bvanassche@acm.org
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
This loop will exit successfully when "found" is false or in the failure
case it times out with "wait_iter" set to -1. The test for timeouts is
impossible as is.
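The pitfall can be reproduced with a small model of such a loop. The loop
shape (a post-decrement in the loop condition) is an assumption based on
the description above, and wait_until_clear(), timed_out_buggy(), and
timed_out_fixed() are hypothetical helpers, not the driver's code:

```c
#include <stdbool.h>

/* Polls until the condition clears or the iterations run out.
 * busy_polls is the number of polls that still find the item.
 * Returns the final value of wait_iter. */
int wait_until_clear(int busy_polls, int wait_iter)
{
	bool found;

	do {
		found = busy_polls-- > 0;
	} while (found && wait_iter--);

	return wait_iter;
}

/* Tests for zero, which the timeout path never produces. */
bool timed_out_buggy(int wait_iter)
{
	return !wait_iter;
}

/* Tests for -1, the value left behind by the post-decrement. */
bool timed_out_fixed(int wait_iter)
{
	return wait_iter == -1;
}
```

On timeout the post-decrement evaluates 0 and leaves wait_iter at -1, so a
test for zero misses the timeout. A zero can still occur, but only when
the condition clears on the very last allowed poll, which is a success.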
Fixes: b843adde8d ("scsi: qla2xxx: Fix mem access after free")
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Link: https://lore.kernel.org/r/cea5a62f-b873-4347-8f8e-c67527ced8d2@kili.mountain
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>