Commit Graph

1249698 Commits

Author SHA1 Message Date
Christoph Hellwig
0327ca9d53 block: use queue_limits_commit_update in queue_max_sectors_store
Convert queue_max_sectors_store to use queue_limits_commit_update to
check and update the max_sectors limit and freeze the queue before
doing so to ensure we don't have requests in flight while changing
the limits.

Note that this removes the previously held queue_lock that doesn't
protect against any other reader or writer.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: John Garry <john.g.garry@oracle.com>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Link: https://lore.kernel.org/r/20240213073425.1621680-6-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-02-13 08:56:59 -07:00
Christoph Hellwig
d690cb8ae1 block: add an API to atomically update queue limits
Add a new queue_limits_{start,commit}_update pair of functions that
allows taking an atomic snapshot of queue limits, update it, and
commit it if it passes validity checking.  Also use the low-level
validation helper to implement blk_set_default_limits instead of
duplicating the initialization.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Link: https://lore.kernel.org/r/20240213073425.1621680-5-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-02-13 08:56:59 -07:00
Christoph Hellwig
c490f226a0 block: decouple blk_set_stacking_limits from blk_set_default_limits
blk_set_stacking_limits uses very little from blk_set_default_limits.
Open code these initializations in preparation for rewriting
blk_set_default_limits.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Link: https://lore.kernel.org/r/20240213073425.1621680-4-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-02-13 08:56:59 -07:00
Christoph Hellwig
b9947297d0 block: refactor disk_update_readahead
Factor out a blk_apply_bdi_limits limits helper that can be used with
an explicit queue_limits argument, which will be useful later.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Link: https://lore.kernel.org/r/20240213073425.1621680-3-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-02-13 08:56:59 -07:00
Christoph Hellwig
8c4955c069 block: move max_{open,active}_zones to struct queue_limits
The maximum number of open and active zones is a limit on the queue
and should be places there so that we can including it in the upcoming
queue limits batch update API.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Link: https://lore.kernel.org/r/20240213073425.1621680-2-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-02-13 08:56:59 -07:00
Arnd Bergmann
fe0b1e9a73 drbd: fix function cast warnings in state machine
There are four state machines in drbd that use a common infrastructure, with
a cast to an incompatible function type in REMEMBER_STATE_CHANGE that clang-16
now warns about:

drivers/block/drbd/drbd_state.c:1632:3: error: cast from 'int (*)(struct sk_buff *, unsigned int, struct drbd_resource_state_change *, enum drbd_notification_type)' to 'typeof (last_func)' (aka 'int (*)(struct sk_buff *, unsigned int, void *, enum drbd_notification_type)') converts to incompatible function type [-Werror,-Wcast-function-type-strict]
 1632 |                 REMEMBER_STATE_CHANGE(notify_resource_state_change,
      |                 ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 1633 |                                       resource_state_change, NOTIFY_CHANGE);
      |                                       ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
drivers/block/drbd/drbd_state.c:1619:17: note: expanded from macro 'REMEMBER_STATE_CHANGE'
 1619 |            last_func = (typeof(last_func))func; \
      |                        ^~~~~~~~~~~~~~~~~~~~~~~
drivers/block/drbd/drbd_state.c:1641:4: error: cast from 'int (*)(struct sk_buff *, unsigned int, struct drbd_connection_state_change *, enum drbd_notification_type)' to 'typeof (last_func)' (aka 'int (*)(struct sk_buff *, unsigned int, void *, enum drbd_notification_type)') converts to incompatible function type [-Werror,-Wcast-function-type-strict]
 1641 |                         REMEMBER_STATE_CHANGE(notify_connection_state_change,
      |                         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 1642 |                                               connection_state_change, NOTIFY_CHANGE);
      |                                               ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Change these all to actually expect a void pointer to be passed, which
matches the caller.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Link: https://lore.kernel.org/r/20240213100354.457128-1-arnd@kernel.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-02-13 08:55:40 -07:00
Arnd Bergmann
7789bf0552 floppy: fix function pointer cast warnings
clang-16 complains about a control flow integrity (kcfi) violation
casting between incompatible pointers:

drivers/block/floppy.c:2001:11: error: cast from 'void (*)(void)' to 'done_f' (aka 'void (*)(int)') converts to incompatible function type [-Werror,-Wcast-function-type-strict]
 2001 |         .done           = (done_f)empty
      |                           ^~~~~~~~~~~~~

Just add another empty function with the correct prototype as a
workaround.

The warning is for code that was added before the start of the normal
git history, but I tracked it done to an early change in the reconstructed
linux-history.git.

Fixes: 598a477afe06 ("Import 1.1.41")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Link: https://lore.kernel.org/r/20240213095918.455478-1-arnd@kernel.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-02-13 08:55:33 -07:00
Li Nan
6cf3506587 md: fix kmemleak of rdev->serial
If kobject_add() is fail in bind_rdev_to_array(), 'rdev->serial' will be
alloc not be freed, and kmemleak occurs.

unreferenced object 0xffff88815a350000 (size 49152):
  comm "mdadm", pid 789, jiffies 4294716910
  hex dump (first 32 bytes):
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
  backtrace (crc f773277a):
    [<0000000058b0a453>] kmemleak_alloc+0x61/0xe0
    [<00000000366adf14>] __kmalloc_large_node+0x15e/0x270
    [<000000002e82961b>] __kmalloc_node.cold+0x11/0x7f
    [<00000000f206d60a>] kvmalloc_node+0x74/0x150
    [<0000000034bf3363>] rdev_init_serial+0x67/0x170
    [<0000000010e08fe9>] mddev_create_serial_pool+0x62/0x220
    [<00000000c3837bf0>] bind_rdev_to_array+0x2af/0x630
    [<0000000073c28560>] md_add_new_disk+0x400/0x9f0
    [<00000000770e30ff>] md_ioctl+0x15bf/0x1c10
    [<000000006cfab718>] blkdev_ioctl+0x191/0x3f0
    [<0000000085086a11>] vfs_ioctl+0x22/0x60
    [<0000000018b656fe>] __x64_sys_ioctl+0xba/0xe0
    [<00000000e54e675e>] do_syscall_64+0x71/0x150
    [<000000008b0ad622>] entry_SYSCALL_64_after_hwframe+0x6c/0x74

Fixes: 963c555e75 ("md: introduce mddev_create/destroy_wb_pool for the change of member device")
Signed-off-by: Li Nan <linan122@huawei.com>
Signed-off-by: Song Liu <song@kernel.org>
Link: https://lore.kernel.org/r/20240208085556.2412922-1-linan666@huaweicloud.com
2024-02-12 09:20:08 -08:00
Kanchan Joshi
921e81db52 nvme: allow integrity when PI is not in first bytes
NVM command set 1.0 (or later) mandates PI to be in the last bytes of
metadata. But this was not supported in the block-layer, and driver
registered a nop profile.

Since block-integrity can now handle flexible PI offset, change the
driver to support this configuration.

Signed-off-by: Kanchan Joshi <joshi.k@samsung.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Link: https://lore.kernel.org/r/20240201130126.211402-4-joshi.k@samsung.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-02-12 08:49:31 -07:00
Kanchan Joshi
60d21aac52 block: support PI at non-zero offset within metadata
Block layer integrity processing assumes that protection information
(PI) is placed in the first bytes of each metadata block.

Remove this limitation and include the metadata before the PI in the
calculation of the guard tag.

Signed-off-by: Kanchan Joshi <joshi.k@samsung.com>
Signed-off-by: Chinmay Gameti <c.gameti@samsung.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Link: https://lore.kernel.org/r/20240201130126.211402-3-joshi.k@samsung.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-02-12 08:49:31 -07:00
Kanchan Joshi
6b5c132a3f block: refactor guard helpers
Allow computation using the existing guard value.
This is a prep patch.

Signed-off-by: Kanchan Joshi <joshi.k@samsung.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Link: https://lore.kernel.org/r/20240201130126.211402-2-joshi.k@samsung.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-02-12 08:49:31 -07:00
Johannes Thumshirn
71f4ecdbb4 block: remove gfp_flags from blkdev_zone_mgmt
Now that all callers pass in GFP_KERNEL to blkdev_zone_mgmt() and use
memalloc_no{io,fs}_{save,restore}() to define the allocation scope, we can
drop the gfp_mask parameter from blkdev_zone_mgmt() as well as
blkdev_zone_reset_all() and blkdev_zone_reset_all_emulated().

Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Mike Snitzer <snitzer@kernel.org>
Link: https://lore.kernel.org/r/20240128-zonefs_nofs-v3-5-ae3b7c8def61@wdc.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-02-12 08:41:16 -07:00
Johannes Thumshirn
147ec1c60e f2fs: guard blkdev_zone_mgmt with nofs scope
Guard the calls to blkdev_zone_mgmt() with a memalloc_nofs scope.
This helps us getting rid of the GFP_NOFS argument to blkdev_zone_mgmt();

Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Chao Yu <chao@kernel.org>
Link: https://lore.kernel.org/r/20240128-zonefs_nofs-v3-4-ae3b7c8def61@wdc.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-02-12 08:41:16 -07:00
Johannes Thumshirn
d9d556755f btrfs: zoned: call blkdev_zone_mgmt in nofs scope
Add a memalloc_nofs scope around all calls to blkdev_zone_mgmt(). This
allows us to further get rid of the GFP_NOFS argument for
blkdev_zone_mgmt().

Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: David Sterba <dsterba@suse.com>
Link: https://lore.kernel.org/r/20240128-zonefs_nofs-v3-3-ae3b7c8def61@wdc.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-02-12 08:41:16 -07:00
Johannes Thumshirn
218082010a dm: dm-zoned: guard blkdev_zone_mgmt with noio scope
Guard the calls to blkdev_zone_mgmt() with a memalloc_noio scope.
This helps us getting rid of the GFP_NOIO argument to blkdev_zone_mgmt();

Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Mike Snitzer <snitzer@kernel.org>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Link: https://lore.kernel.org/r/20240128-zonefs_nofs-v3-2-ae3b7c8def61@wdc.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-02-12 08:41:16 -07:00
Johannes Thumshirn
9105ce591b zonefs: pass GFP_KERNEL to blkdev_zone_mgmt() call
Pass GFP_KERNEL instead of GFP_NOFS to the blkdev_zone_mgmt() call in
zonefs_zone_mgmt().

As as zonefs_zone_mgmt() and zonefs_inode_zone_mgmt() are never called
from a place that can recurse back into the filesystem on memory reclaim,
it is save to call blkdev_zone_mgmt() with GFP_KERNEL.

Link: https://lore.kernel.org/all/ZZcgXI46AinlcBDP@casper.infradead.org/
Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Acked-by: Damien Le Moal <dlemoal@kernel.org>
Link: https://lore.kernel.org/r/20240128-zonefs_nofs-v3-1-ae3b7c8def61@wdc.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-02-12 08:41:15 -07:00
Miroslav Franc
c3116e62dd s390/dasd: fix double module refcount decrement
Once the discipline is associated with the device, deleting the device
takes care of decrementing the module's refcount.  Doing it manually on
this error path causes refcount to artificially decrease on each error
while it should just stay the same.

Fixes: c020d722b1 ("s390/dasd: fix panic during offline processing")
Signed-off-by: Miroslav Franc <mfranc@suse.cz>
Signed-off-by: Jan Höppner <hoeppner@linux.ibm.com>
Signed-off-by: Stefan Haberland <sth@linux.ibm.com>
Link: https://lore.kernel.org/r/20240209124522.3697827-3-sth@linux.ibm.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-02-09 07:33:06 -07:00
Jan Höppner
1df0f512fa s390/dasd: Improve ERP error messages
Some ERP errors still share the same message format and only add
different reason codes to it. These reason codes don't have any meaning
anymore.
Make the individual error messages more explicit and remove the reason
codes altogether. Comments around the error messages are also removed as
they provide no additional value anymore with more explicit messages.

Signed-off-by: Jan Höppner <hoeppner@linux.ibm.com>
Reviewed-by: Stefan Haberland <sth@linux.ibm.com>
Signed-off-by: Stefan Haberland <sth@linux.ibm.com>
Link: https://lore.kernel.org/r/20240209124522.3697827-2-sth@linux.ibm.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-02-09 07:33:06 -07:00
Shin'ichiro Kawasaki
14509b748f null_blk: add configfs variable shared_tags
Allow setting shared_tags through configfs, which could only be set as a
module parameter. For that purpose, delay tag_set initialization from
null_init() to null_add_dev(). Refer tag_set.ops as the flag to check if
tag_set is initialized or not.

The following parameters can not be set through configfs yet:

    timeout
    requeue
    init_hctx

Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Link: https://lore.kernel.org/r/20240130042134.2463659-1-shinichiro.kawasaki@wdc.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-02-08 11:31:23 -07:00
Kunwu Chan
48ff13a618 block: Simplify the allocation of slab caches
Use the new KMEM_CACHE() macro instead of direct kmem_cache_create
to simplify the creation of SLAB caches.

Signed-off-by: Kunwu Chan <chentao@kylinos.cn>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Link: https://lore.kernel.org/r/20240131094323.146659-1-chentao@kylinos.cn
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-02-08 11:29:40 -07:00
Pavel Begunkov
e516c3fc6c block: optimise in irq bio put caching
When enlisting a bio into ->free_list_irq we protect the list by
disabling irqs. It's likely they're already disabled and performance of
local_irq_{save,restore}() is decent, but it's not zero cost.

Let's only use the irq cache when when we're serving a hard irq, which
allows to remove local_irq_{save,restore}(), and fall back to bio_free()
in all left cases.

Profiles indicate that the bio_put() cost is reduced by ~3.5 times
(1.76% -> 0.49%), and total throughput of a CPU bound benchmark improve
by around 1% (t/io_uring with high QD and several drives).

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/36d207540b7046c653cc16e5ff08fe7234b19f81.1707314970.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-02-08 10:18:48 -07:00
Pavel Begunkov
c9f5f3aa19 block: extend bio caching to task context
bio_put_percpu_cache() puts all non-iopoll bios into the irq-safe list,
which entails disabling irqs. The overhead of that is not that bad when
interrupts are already off but getting worse otherwise. We can optimise
it when we're in the task context by using ->free_list directly just as
the IOPOLL path does.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/4774e1a0f905f96c63174b0f3e4f79f0d9b63246.1707314970.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-02-08 10:18:47 -07:00
Jan Höppner
79ae56fc47 s390/dasd: Use dev_*() for device log messages
All log messages in dasd.c use the printk variants of pr_*(). They all
add the name of the affected device manually to the log message.
This can be simplified by using the dev_*() variants of printk, which
include the device information and make a separate call to dev_name()
unnecessary.

The KMSG_COMPONENT and the pr_fmt() definition can be dropped. Note that
this removes the "dasd: " prefix from the one pr_info() call in
dasd_init(). However, the log message already provides all relevant
information.

Signed-off-by: Jan Höppner <hoeppner@linux.ibm.com>
Reviewed-by: Stefan Haberland <sth@linux.ibm.com>
Signed-off-by: Stefan Haberland <sth@linux.ibm.com>
Link: https://lore.kernel.org/r/20240208164248.540985-10-sth@linux.ibm.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-02-08 10:12:01 -07:00
Jan Höppner
c6c6c69df6 s390/dasd: Remove PRINTK_HEADER and KMSG_COMPONENT definitions
PRINTK_HEADER was mainly used to prefix log messages with the module
name. Most components don't use this definition anymore. Either because
there are no log messages being generated anymore, or pr_*() were
replaced by dev_*(), which contains device and component information
already.

PRINTK_HEADER is also dropped in the function
dasd_3990_erp_handle_match_erp() in dasd_3990_erp.c from a panic() call
as panic() already provides all relevant information.

KMSG_COMPONENT was mainly used to identify a component in a long gone
kernel message catalog feature.

Remove both definition since they're either not used or alternatives
make the code slightly shorter and more readable.

Signed-off-by: Jan Höppner <hoeppner@linux.ibm.com>
Reviewed-by: Stefan Haberland <sth@linux.ibm.com>
Signed-off-by: Stefan Haberland <sth@linux.ibm.com>
Link: https://lore.kernel.org/r/20240208164248.540985-9-sth@linux.ibm.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-02-08 10:12:01 -07:00
Jan Höppner
4ba6366dbb s390/dasd: Remove %p format specifier from error messages
Printing pointer in error messages doesn't add any value since the
addresses are hashed. Remove the %p format specifier and adapt the error
messages slightly.

Replace %p with %px in ERP to get the actual addresses since ERP is used
for debugging purposes only anyway.

Signed-off-by: Jan Höppner <hoeppner@linux.ibm.com>
Reviewed-by: Stefan Haberland <sth@linux.ibm.com>
Signed-off-by: Stefan Haberland <sth@linux.ibm.com>
Link: https://lore.kernel.org/r/20240208164248.540985-8-sth@linux.ibm.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-02-08 10:12:01 -07:00
Jan Höppner
0b3644b475 s390/dasd: Use dev_err() over printk()
To reduce the information required for the string generation in the
sense dump functions, use the more concise dev_err() variant over
printk(KERN_ERR, ...) to improve code readability.

The dev_err() function provides the component and device name for free
and the separate dev_name() calls as well as the PRINTK_HEADER can be
dropped.

Dropping PRINTK_HEADER removes the "dasd(eckd):" for all lines. Only the
first line of a dev_err() call is prefixed with the component and device
(e.g. "dasd-eckd 0.0.95d0:").

The format specifier for printed pointers is also changed to unhashed
(%px) as this can help with debugging and servicing.

Signed-off-by: Jan Höppner <hoeppner@linux.ibm.com>
Reviewed-by: Stefan Haberland <sth@linux.ibm.com>
Signed-off-by: Stefan Haberland <sth@linux.ibm.com>
Link: https://lore.kernel.org/r/20240208164248.540985-7-sth@linux.ibm.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-02-08 10:12:01 -07:00
Jan Höppner
32312cf229 s390/dasd: Remove unused message logging macros
The macros DEV_MESSAGE, MESSAGE, DEV_MESSAGE_LOG, and MESSAGE_LOG, are
not used and there is no history anymore of any usage. Remove them.

Signed-off-by: Jan Höppner <hoeppner@linux.ibm.com>
Reviewed-by: Stefan Haberland <sth@linux.ibm.com>
Signed-off-by: Stefan Haberland <sth@linux.ibm.com>
Link: https://lore.kernel.org/r/20240208164248.540985-6-sth@linux.ibm.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-02-08 10:12:01 -07:00
Jan Höppner
4849494f05 s390/dasd: Move allocation error message to DBF
All error messages for a failling dasd_smalloc_request() call are logged
via DBF, except one. There is no value in logging this particular
allocation failure via dev_err(). Move the message to DBF, too, to be
in line with the rest.

Signed-off-by: Jan Höppner <hoeppner@linux.ibm.com>
Reviewed-by: Stefan Haberland <sth@linux.ibm.com>
Signed-off-by: Stefan Haberland <sth@linux.ibm.com>
Link: https://lore.kernel.org/r/20240208164248.540985-5-sth@linux.ibm.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-02-08 10:12:01 -07:00
Jan Höppner
8d7ac904c9 s390/dasd: Remove unnecessary errorstring generation
In quite a few cases an errorstring is generated using snprintf() before
it's passed to dev_err(). This indirection is unnecessary and all
information can simply be passed directly to dev_err() instead.
The errrorstring and ERRORLENGTH definitions are removed entirely.

While at it, rephrase the error messages to provide more context where
possible. Also, fix a few incorrectly used format specifier (e.g. %x02
-> %02x) in those messages.

Signed-off-by: Jan Höppner <hoeppner@linux.ibm.com>
Reviewed-by: Stefan Haberland <sth@linux.ibm.com>
Signed-off-by: Stefan Haberland <sth@linux.ibm.com>
Link: https://lore.kernel.org/r/20240208164248.540985-4-sth@linux.ibm.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-02-08 10:12:01 -07:00
Jan Höppner
9c386d0f6e s390/dasd: Use sysfs_emit() over sprintf()
sysfs_emit() should be used in show() functions. There are still a
couple of functions that use sprintf().
Replace outstanding occurrences of sprintf() in all show() functions
with sysfs_emit().

Signed-off-by: Jan Höppner <hoeppner@linux.ibm.com>
Reviewed-by: Stefan Haberland <sth@linux.ibm.com>
Signed-off-by: Stefan Haberland <sth@linux.ibm.com>
Link: https://lore.kernel.org/r/20240208164248.540985-3-sth@linux.ibm.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-02-08 10:12:01 -07:00
Jan Höppner
e5de34db1e s390/dasd: Simplify uid string generation
There are two variants of the device uid string. One containing the
virtual device unit information table (vduit) identifying the device as
a virtual device located on a real device in a z/VM environment. The
other variant does not contain those additional information.

Simplify the string generation with a shorter check of an existing vduit
embedded in the snprintf() calls.

Signed-off-by: Jan Höppner <hoeppner@linux.ibm.com>
Reviewed-by: Stefan Haberland <sth@linux.ibm.com>
Signed-off-by: Stefan Haberland <sth@linux.ibm.com>
Link: https://lore.kernel.org/r/20240208164248.540985-2-sth@linux.ibm.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-02-08 10:12:01 -07:00
Ricardo B. Marliere
052618c71c block: rbd: make rbd_bus_type const
Now that the driver core can properly handle constant struct bus_type,
move the rbd_bus_type variable to be a constant structure as well,
placing it into read-only memory which can not be modified at runtime.

Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Suggested-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Ricardo B. Marliere <ricardo@marliere.net>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Alex Elder <elder@linaro.org>
Link: https://lore.kernel.org/r/20240204-bus_cleanup-block-v1-1-fc77afd8d7cc@marliere.net
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-02-06 10:37:45 -07:00
Song Liu
83cbdaf61b md/multipath: Remove md-multipath.h
md-multipath is already deprecated. Remove the header file.

Signed-off-by: Song Liu <song@kernel.org>
2024-02-05 15:34:39 -08:00
Marc Zyngier
95b77082b7 md/linear: Get rid of md-linear.h
Given that 849d18e27b ("md: Remove deprecated CONFIG_MD_LINEAR")
killed the linear flavour of MD, it seems only logical to drop
the leftover include file that used to come with it.

I also feel that it should be my own privilege to remove my 30 year
old attempt at writing kernel code ;-). RIP!

Cc: Song Liu <song@kernel.org>
Cc: Yu Kuai <yukuai3@huawei.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Yu Kuai <yukuai3@huawei.com>
Signed-off-by: Song Liu <song@kernel.org>
Link: https://lore.kernel.org/r/20240201224549.750644-1-maz@kernel.org
2024-02-05 15:32:30 -08:00
Li Lingfeng
570b9147de md: use RCU lock to protect traversal in md_spares_need_change()
Since md_start_sync() will be called without the protect of mddev_lock,
and it can run concurrently with array reconfiguration, traversal of rdev
in it should be protected by RCU lock.
Commit bc08041b32 ("md: suspend array in md_start_sync() if array need
reconfiguration") added md_spares_need_change() to md_start_sync(),
casusing use of rdev without any protection.
Fix this by adding RCU lock in md_spares_need_change().

Fixes: bc08041b32 ("md: suspend array in md_start_sync() if array need reconfiguration")
Cc: stable@vger.kernel.org # 6.7+
Signed-off-by: Li Lingfeng <lilingfeng3@huawei.com>
Signed-off-by: Song Liu <song@kernel.org>
Link: https://lore.kernel.org/r/20240104133629.1277517-1-lilingfeng@huaweicloud.com
2024-02-05 15:23:59 -08:00
Li Lingfeng
9cfcf99e7e md: get rdev->mddev with READ_ONCE()
Users may get rdev->mddev by sysfs while rdev is releasing.
So use both READ_ONCE() and WRITE_ONCE() to prevent load/store tearing
and to read/write mddev atomically.

Signed-off-by: Li Lingfeng <lilingfeng3@huawei.com>
Reviewed-by: Yu Kuai <yukuai3@huawei.com>
Signed-off-by: Song Liu <song@kernel.org>
Link: https://lore.kernel.org/r/20231229070500.3602712-1-lilingfeng@huaweicloud.com
2024-02-05 15:23:58 -08:00
Yu Kuai
faeaf210a5 md: remove redundant md_wakeup_thread()
On the one hand, mddev_unlock() will call md_wakeup_thread()
unconditionally; on the other hand, md_check_recovery() can't make
progress if 'reconfig_mutex' can't be grabbed. Hence, it really doesn't
make sense to wake up daemon thread while 'reconfig_mutex' is still
grabbed.

Remove all the md_wakup_thread() for 'mddev->thread' while
'reconfig_mtuex' is still grabbed.

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Signed-off-by: Song Liu <song@kernel.org>
Link: https://lore.kernel.org/r/20231228125553.2697765-3-yukuai1@huaweicloud.com
2024-02-05 15:23:58 -08:00
Yu Kuai
61c90765e1 md: remove redundant check of 'mddev->sync_thread'
The lifetime of sync_thread:

1) Set MD_RECOVERY_NEEDED and wake up daemon thread (by ioctl/sysfs or
   other events);
2) Daemon thread woke up, md_check_recovery() found that
   MD_RECOVERY_NEEDED is set:
   a) try to grab reconfig_mutex;
   b) set MD_RECOVERY_RUNNING;
   c) clear MD_RECOVERY_NEEDED, and then queue sync_work;
3) md_start_sync() choose sync_action, then register sync_thread;
4) md_do_sync() is done, set MD_RECOVERY_DONE and wake up daemon thread;
5) Daemon thread woke up, md_check_recovery() found that
   MD_RECOVERY_DONE is set:
   a) try to grab reconfig_mutex;
   b) unregister sync_thread;
   c) clear MD_RECOVERY_RUNNING and MD_RECOVERY_DONE;

Hence there is no such case that MD_RECOVERY_RUNNING is not set, while
sync_thread is registered.

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Signed-off-by: Song Liu <song@kernel.org>
Link: https://lore.kernel.org/r/20231228125553.2697765-2-yukuai1@huaweicloud.com
2024-02-05 15:23:58 -08:00
Tang Yizhou
3bca7640b4 blk-throttle: Eliminate redundant checks for data direction
After calling throtl_peek_queued(), the data direction can be determined so
there is no need to call bio_data_dir() to check the direction again.

Signed-off-by: Tang Yizhou <yizhou.tang@shopee.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20240123081248.3752878-1-yizhou.tang@shopee.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-02-05 10:16:12 -07:00
Jens Axboe
06b23f92af block: update cached timestamp post schedule/preemption
Mark the task as having a cached timestamp when set assign it, so we
can efficiently check if it needs updating post being scheduled back in.
This covers both the actual schedule out case, which would've flushed
the plug, and the preemption case which doesn't touch the plugged
requests (for many reasons, one of them being then we'd need to have
preemption disabled around plug state manipulation).

Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-02-05 10:07:34 -07:00
Jens Axboe
da4c8c3d09 block: cache current nsec time in struct blk_plug
Querying the current time is the most costly thing we do in the block
layer per IO, and depending on kernel config settings, we may do it
many times per IO.

None of the callers actually need nsec granularity. Take advantage of
that by caching the current time in the plug, with the assumption here
being that any time checking will be temporally close enough that the
slight loss of precision doesn't matter.

If the block plug gets flushed, eg on preempt or schedule out, then
we invalidate the cached clock.

On a basic peak IOPS test case with iostats enabled, this changes
the performance from:

IOPS=108.41M, BW=52.93GiB/s, IOS/call=31/31
IOPS=108.43M, BW=52.94GiB/s, IOS/call=32/32
IOPS=108.29M, BW=52.88GiB/s, IOS/call=31/32
IOPS=108.35M, BW=52.91GiB/s, IOS/call=32/32
IOPS=108.42M, BW=52.94GiB/s, IOS/call=31/31
IOPS=108.40M, BW=52.93GiB/s, IOS/call=32/32
IOPS=108.31M, BW=52.89GiB/s, IOS/call=32/31

to

IOPS=118.79M, BW=58.00GiB/s, IOS/call=31/32
IOPS=118.62M, BW=57.92GiB/s, IOS/call=31/31
IOPS=118.80M, BW=58.01GiB/s, IOS/call=32/31
IOPS=118.78M, BW=58.00GiB/s, IOS/call=32/32
IOPS=118.69M, BW=57.95GiB/s, IOS/call=32/31
IOPS=118.62M, BW=57.92GiB/s, IOS/call=32/31
IOPS=118.63M, BW=57.92GiB/s, IOS/call=31/32

which is more than a 9% improvement in performance. Looking at perf diff,
we can see a huge reduction in time overhead:

    10.55%     -9.88%  [kernel.vmlinux]  [k] read_tsc
     1.31%     -1.22%  [kernel.vmlinux]  [k] ktime_get

Note that since this relies on blk_plug for the caching, it's only
applicable to the issue side. But this is where most of the time calls
happen anyway. On the completion side, cached time stamping is done with
struct io_comp patch, as long as the driver supports it.

It's also worth noting that the above testing doesn't enable any of the
higher cost CPU items on the block layer side, like wbt, cgroups,
iocost, etc, which all would add additional time querying and hence
overhead. IOW, results would likely look even better in comparison with
those enabled, as distros would do.

Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-02-05 10:07:28 -07:00
Jens Axboe
08420cf70c block: add blk_time_get_ns() and blk_time_get() helpers
Convert any user of ktime_get_ns() to use blk_time_get_ns(), and
ktime_get() to blk_time_get(), so we have a unified API for querying the
current time in nanoseconds or as ktime.

No functional changes intended, this patch just wraps ktime_get_ns()
and ktime_get() with a block helper.

Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-02-05 10:07:22 -07:00
Jens Axboe
c4e47bbb00 block: move cgroup time handling code into blk.h
In preparation for moving time keeping into blk.h, move the cgroup
related code for timestamps in here too. This will help avoid a circular
dependency, and also moves it into a more appropriate header as this one
is private to the block layer code.

Leave struct bio_issue in blk_types.h as it's a proper time definition.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-02-05 10:07:17 -07:00
Christoph Hellwig
72e84e909e blk-mq: special case cached requests less
Share the main merge / split / integrity preparation code between the
cached request vs newly allocated request cases, and add comments
explaining the cached request handling.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Tested-by: Damien Le Moal <dlemoal@kernel.org>
Link: https://lore.kernel.org/r/20240124092658.2258309-4-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-02-05 10:06:53 -07:00
Christoph Hellwig
337e89feb7 blk-mq: introduce a blk_mq_peek_cached_request helper
Add a new helper to check if there is suitable cached request in
blk_mq_submit_bio.  This removes open coded logic in blk_mq_submit_bio
and moves some checks that so far are in blk_mq_use_cached_rq to
be performed earlier.  This avoids the case where we first do check
with the cached request but then later end up allocating a new one
anyway and need to grab a queue reference.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Tested-by: Damien Le Moal <dlemoal@kernel.org>
Link: https://lore.kernel.org/r/20240124092658.2258309-3-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-02-05 10:05:16 -07:00
Christoph Hellwig
0f299da55a blk-mq: move blk_mq_attempt_bio_merge out blk_mq_get_new_requests
blk_mq_attempt_bio_merge has nothing to do with allocating a new
request, it avoids allocating a new request.  Move the call out of
blk_mq_get_new_requests and into the only caller.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Tested-by: Damien Le Moal <dlemoal@kernel.org>
Link: https://lore.kernel.org/r/20240124092658.2258309-2-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-02-05 10:03:51 -07:00
Linus Torvalds
54be6c6c5a Linux 6.8-rc3 2024-02-04 12:20:36 +00:00
Linus Torvalds
3f24fcdacd Miscellaneous bug fixes and cleanups in ext4's multi-block allocator
and extent handling code.
 -----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCAAdFiEEK2m5VNv+CHkogTfJ8vlZVpUNgaMFAmW/G4YACgkQ8vlZVpUN
 gaPTpwf/c/Fk27GV8ge9PQtR6gmir/lyw2qkvK3Z+12aEsblZRmyvElyZWjAuNQG
 bciQyltabIPOA4XxfsZOrdgYI42n0rTTFG7bmI0lr+BJM/HRw0tGGMN91FZla0FP
 EXv/AiHKCqlT5OFZbD+8n1TzfdOgWotjug1VLteXve3YKjkDgt5IQm/0Gx9hKBld
 IR8SrQlD/rYe+VPvaHz5G4u09Ne5pUE5fDj3xE23wxfU5cloVzuVRCSOGWUCTnCW
 T0v6sHeKrmiLC8tIOZkBjer4nXC0MOu0p5+geAjwOArc9VJ1Lh2eAkH+GgDOVprx
 ahdl2qmbIbacBYECIeQ/+1i78+O1yw==
 =CmYr
 -----END PGP SIGNATURE-----

Merge tag 'for-linus-6.8-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4

Pull ext4 fixes from Ted Ts'o:
 "Miscellaneous bug fixes and cleanups in ext4's multi-block allocator
  and extent handling code"

* tag 'for-linus-6.8-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (23 commits)
  ext4: make ext4_set_iomap() recognize IOMAP_DELALLOC map type
  ext4: make ext4_map_blocks() distinguish delalloc only extent
  ext4: add a hole extent entry in cache after punch
  ext4: correct the hole length returned by ext4_map_blocks()
  ext4: convert to exclusive lock while inserting delalloc extents
  ext4: refactor ext4_da_map_blocks()
  ext4: remove 'needed' in trace_ext4_discard_preallocations
  ext4: remove unnecessary parameter "needed" in ext4_discard_preallocations
  ext4: remove unused return value of ext4_mb_release_group_pa
  ext4: remove unused return value of ext4_mb_release_inode_pa
  ext4: remove unused return value of ext4_mb_release
  ext4: remove unused ext4_allocation_context::ac_groups_considered
  ext4: remove unneeded return value of ext4_mb_release_context
  ext4: remove unused parameter ngroup in ext4_mb_choose_next_group_*()
  ext4: remove unused return value of __mb_check_buddy
  ext4: mark the group block bitmap as corrupted before reporting an error
  ext4: avoid allocating blocks from corrupted group in ext4_mb_find_by_goal()
  ext4: avoid allocating blocks from corrupted group in ext4_mb_try_best_found()
  ext4: avoid dividing by 0 in mb_update_avg_fragment_size() when block bitmap corrupt
  ext4: avoid bb_free and bb_fragments inconsistency in mb_free_blocks()
  ...
2024-02-04 07:33:01 +00:00
Linus Torvalds
9e28c7a23b five smb3 client fixes, mostly multichannel related
-----BEGIN PGP SIGNATURE-----
 
 iQGzBAABCgAdFiEE6fsu8pdIjtWE/DpLiiy9cAdyT1EFAmW+oQoACgkQiiy9cAdy
 T1E6GwwAk9PQo1N8h6AI0PWCPoiHps8YzpISTgBfchDs+sIvo1sZEVMzTmSlWm4F
 cON4nb3QBkD4aqWbCdb1tjby2VAb0aHuRbCQPAczW4+R7He8ELlGYow7IPBWmyI8
 CTQRirYrJuIePh1aT2UG4mykNyQzp5i5ycsdcWFiIGliXrTp0De0rvE/60KGLuJ/
 lnDZp7+FHytIfpwVzl4bGax4odQh14whxdQme4C9a8kVfYPQGKFyADbuxy0KmWmu
 8kkEcGpY/bPdqLE1tGzNoVci9cEdYK6yUzkc8dj2ddsZ7YDHPz0NtsdWlC517qt+
 ekDADHRQZiKwyJzMGHibK8E4WoP0+JjB4j6OTc369ICzgjuIutWCCIexbS0IV/po
 IuyRQTvLSTfYpE8eoQavAKaty6sDnJcP7FepPtH2k5yGr4+ILZV4ozoH+hqzn2/K
 sf/q9ZfkyfaEi2JkMD9TTv7k2EWLMntpxklbpg59VKHY5MGvlQqoRl9b+jzfgKEZ
 xS/myr3e
 =q5wf
 -----END PGP SIGNATURE-----

Merge tag 'v6.8-rc3-smb-client-fixes' of git://git.samba.org/sfrench/cifs-2.6

Pull smb client fixes from Steve French:
 "Five smb3 client fixes, mostly multichannel related:

   - four multichannel fixes including fix for channel allocation when
     multiple inactive channels, fix for unneeded race in channel
     deallocation, correct redundant channel scaling, and redundant
     multichannel disabling scenarios

   - add warning if max compound requests reached"

* tag 'v6.8-rc3-smb-client-fixes' of git://git.samba.org/sfrench/cifs-2.6:
  smb: client: increase number of PDUs allowed in a compound request
  cifs: failure to add channel on iface should bump up weight
  cifs: do not search for channel if server is terminating
  cifs: avoid redundant calls to disable multichannel
  cifs: make sure that channel scaling is done only once
2024-02-04 07:26:19 +00:00
Linus Torvalds
fc86e5c990 Bug fixes for 6.8-rc3:
* Clear XFS_ATTR_INCOMPLETE filter on removing xattr from a node format
    attribute fork.
 
  * Remove conditional compilation of realtime geometry validator functions to
    prevent confusing error messages from being printed on the console during the
    mount operation.
 
 Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQQjMC4mbgVeU7MxEIYH7y4RirJu9AUCZbo7FAAKCRAH7y4RirJu
 9Cn+APsFEbHA8YQpCSxGDM+Xelez9X7wroi6QkyOxRP6Lqq6ogD6A96uuV86TxkQ
 Hkse9IAKkFoLmyzohT9u7Bv46M/X4w8=
 =Ez8Z
 -----END PGP SIGNATURE-----

Merge tag 'xfs-6.8-fixes-2' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux

Pull xfs fixes from Chandan Babu:

 - Clear XFS_ATTR_INCOMPLETE filter on removing xattr from a node format
   attribute fork

 - Remove conditional compilation of realtime geometry validator
   functions to prevent confusing error messages from being printed on
   the console during the mount operation

* tag 'xfs-6.8-fixes-2' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
  xfs: remove conditional building of rt geometry validator functions
  xfs: reset XFS_ATTR_INCOMPLETE filter on node removal
2024-02-04 07:22:51 +00:00