linux

mirror of https://github.com/torvalds/linux.git synced 2024-11-10 22:21:40 +00:00

Author	SHA1	Message	Date
Christoph Hellwig	89ea5ceb53	blk-mq: cleanup __blk_mq_sched_dispatch_requests __blk_mq_sched_dispatch_requests currently has duplicated logic for the cases where requests are on the hctx dispatch list or not. Merge the two with a new need_dispatch variable and remove a few pointless local variables. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Link: https://lore.kernel.org/r/20230413060651.694656-2-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>	2023-04-13 06:57:18 -06:00
Christoph Hellwig	b12e5c6c75	blk-mq: pass a flags argument to blk_mq_add_to_requeue_list Replace the boolean at_head argument with the same flags that are already passed to blk_mq_insert_request. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Link: https://lore.kernel.org/r/20230413064057.707578-21-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>	2023-04-13 06:52:30 -06:00
Christoph Hellwig	93fffe16f7	blk-mq: pass a flags argument to elevator_type->insert_requests Instead of passing a bool at_head, pass down the full flags from the blk_mq_insert_request interface. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Link: https://lore.kernel.org/r/20230413064057.707578-20-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>	2023-04-13 06:52:30 -06:00
Christoph Hellwig	2b5976134b	blk-mq: pass a flags argument to blk_mq_request_bypass_insert Replace the boolean at_head argument with the same flags that are already passed to blk_mq_insert_request. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Link: https://lore.kernel.org/r/20230413064057.707578-19-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>	2023-04-13 06:52:30 -06:00
Christoph Hellwig	710fa3789e	blk-mq: pass a flags argument to blk_mq_insert_request Replace the at_head bool with a flags argument that so far only contains a single BLK_MQ_INSERT_AT_HEAD value. This makes it much easier to grep for head insertions into the blk-mq dispatch queues. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Link: https://lore.kernel.org/r/20230413064057.707578-18-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>	2023-04-13 06:52:30 -06:00
Christoph Hellwig	214a441805	blk-mq: don't kick the requeue_list in blk_mq_add_to_requeue_list blk_mq_add_to_requeue_list takes a bool parameter to control how to kick the requeue list at the end of the function. Move the call to blk_mq_kick_requeue_list to the callers that want it instead. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Link: https://lore.kernel.org/r/20230413064057.707578-17-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>	2023-04-13 06:52:30 -06:00
Christoph Hellwig	2394395cd5	blk-mq: don't run the hw_queue from blk_mq_request_bypass_insert blk_mq_request_bypass_insert takes a bool parameter to control how to run the queue at the end of the function. Move the blk_mq_run_hw_queue call to the callers that want it instead. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Link: https://lore.kernel.org/r/20230413064057.707578-16-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>	2023-04-13 06:52:30 -06:00
Christoph Hellwig	f0dbe6e88e	blk-mq: don't run the hw_queue from blk_mq_insert_request blk_mq_insert_request takes two bool parameters to control how to run the queue at the end of the function. Move the blk_mq_run_hw_queue call to the callers that want it instead. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Link: https://lore.kernel.org/r/20230413064057.707578-15-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>	2023-04-13 06:52:30 -06:00
Christoph Hellwig	e1f44ac0d7	blk-mq: fold __blk_mq_try_issue_directly into its two callers Due to the wildly different behavior based on the bypass_insert argument, not a whole lot of code in __blk_mq_try_issue_directly is actually shared between blk_mq_try_issue_directly and blk_mq_request_issue_directly. Remove __blk_mq_try_issue_directly and fold the code into the two callers instead. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Link: https://lore.kernel.org/r/20230413064057.707578-14-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>	2023-04-13 06:52:30 -06:00
Christoph Hellwig	2b71b87707	blk-mq: factor out a blk_mq_get_budget_and_tag helper Factor out a helper from __blk_mq_try_issue_directly in preparation of folding that function into its two callers. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Link: https://lore.kernel.org/r/20230413064057.707578-13-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>	2023-04-13 06:52:30 -06:00
Christoph Hellwig	a1e948b81a	blk-mq: refactor the DONTPREP/SOFTBARRIER andling in blk_mq_requeue_work Split the RQF_DONTPREP and RQF_SOFTBARRIER in separate branches to make the code more readable. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Link: https://lore.kernel.org/r/20230413064057.707578-12-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>	2023-04-13 06:52:30 -06:00
Christoph Hellwig	53548d2a94	blk-mq: refactor passthrough vs flush handling in blk_mq_insert_request While both passthrough and flush requests call directly into blk_mq_request_bypass_insert, the parameters aren't the same. Split the handling into two separate conditionals and turn the whole function into an if/elif/elif/else flow instead of the gotos. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Link: https://lore.kernel.org/r/20230413064057.707578-11-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>	2023-04-13 06:52:30 -06:00
Christoph Hellwig	a4fa57ffb7	blk-mq: remove blk_flush_queue_rq Just call blk_mq_add_to_requeue_list directly from the two callers. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Link: https://lore.kernel.org/r/20230413064057.707578-10-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>	2023-04-13 06:52:29 -06:00
Christoph Hellwig	4ec5c0553c	blk-mq: fold __blk_mq_insert_req_list into blk_mq_insert_request Remove this very small helper and fold it into the only caller. Note that this moves the trace_block_rq_insert out of ctx->lock, matching the other calls to this tracepoint. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Link: https://lore.kernel.org/r/20230413064057.707578-9-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>	2023-04-13 06:52:29 -06:00
Christoph Hellwig	a88db1e000	blk-mq: fold __blk_mq_insert_request into blk_mq_insert_request There is no good point in keeping the __blk_mq_insert_request around for two function calls and a singler caller. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Link: https://lore.kernel.org/r/20230413064057.707578-8-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>	2023-04-13 06:52:29 -06:00
Christoph Hellwig	2bd215df79	blk-mq: move blk_mq_sched_insert_request to blk-mq.c blk_mq_sched_insert_request is the main request insert helper and not directly I/O scheduler related. Move blk_mq_sched_insert_request to blk-mq.c, rename it to blk_mq_insert_request and mark it static. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Link: https://lore.kernel.org/r/20230413064057.707578-7-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>	2023-04-13 06:52:29 -06:00
Christoph Hellwig	05a9311770	blk-mq: fold blk_mq_sched_insert_requests into blk_mq_dispatch_plug_list blk_mq_dispatch_plug_list is the only caller of blk_mq_sched_insert_requests, and it makes sense to just fold it there as blk_mq_sched_insert_requests isn't specific to I/O schedulers despite the name. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Link: https://lore.kernel.org/r/20230413064057.707578-6-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>	2023-04-13 06:52:29 -06:00
Christoph Hellwig	94aa228c2a	blk-mq: move more logic into blk_mq_insert_requests Move all logic related to the direct insert (including the call to blk_mq_run_hw_queue) into blk_mq_insert_requests to streamline the code flow up a bit, and to allow marking blk_mq_try_issue_list_directly static. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Link: https://lore.kernel.org/r/20230413064057.707578-5-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>	2023-04-13 06:52:29 -06:00
Christoph Hellwig	90110e04f2	blk-mq: include <linux/blk-mq.h> in block/blk-mq.h block/blk-mq.h needs various definitions from <linux/blk-mq.h>, include it there instead of relying on the source files to include both. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Link: https://lore.kernel.org/r/20230413064057.707578-4-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>	2023-04-13 06:52:29 -06:00
Christoph Hellwig	bebe84ebee	blk-mq: remove blk-mq-tag.h blk-mq-tag.h is always included by blk-mq.h, and causes recursive inclusion hell with further changes. Just merge it into blk-mq.h instead. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Link: https://lore.kernel.org/r/20230413064057.707578-3-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>	2023-04-13 06:52:29 -06:00
Christoph Hellwig	50947d7fe9	blk-mq: don't plug for head insertions in blk_execute_rq_nowait Plugs never insert at head, so don't plug for head insertions. Fixes: `1c2d2fff6d` ("block: wire-up support for passthrough plugging") Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Link: https://lore.kernel.org/r/20230413064057.707578-2-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>	2023-04-13 06:52:29 -06:00
Chengming Zhou	8e15dfbd9a	blk-throttle: only enable blk-stat when BLK_DEV_THROTTLING_LOW blk_throtl_register() will unconditionally enable blk-stat for gendisk when register, even when we have no BLK_DEV_THROTTLING_LOW config. Since the kernel always has only BLK_DEV_THROTTLING config and the BLK_DEV_THROTTLING_LOW config is still in EXPERIMENTAL state, we can just skip blk-stat when !BLK_DEV_THROTTLING_LOW. Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com> Acked-by: Tejun Heo <tj@kernel.org> Link: https://lore.kernel.org/r/20230413062805.2081970-2-chengming.zhou@linux.dev Signed-off-by: Jens Axboe <axboe@kernel.dk>	2023-04-13 06:48:11 -06:00
Chengming Zhou	20de765f6d	blk-stat: fix QUEUE_FLAG_STATS clear We need to set QUEUE_FLAG_STATS for two cases: 1. blk_stat_enable_accounting() 2. blk_stat_add_callback() So we should clear it only when ((q->stats->accounting == 0) && list_empty(&q->stats->callbacks)). blk_stat_disable_accounting() only check if q->stats->accounting is 0 before clear the flag, this patch fix it. Also add list_empty(&q->stats->callbacks)) check when enable, or the flag is already set. The bug can be reproduced on kernel without BLK_DEV_THROTTLING (since it unconditionally enable accounting, see the next patch). # cat /sys/block/sr0/queue/scheduler none mq-deadline [bfq] # cat /sys/kernel/debug/block/sr0/state SAME_COMP\|IO_STAT\|INIT_DONE\|STATS\|REGISTERED\|NOWAIT\|30 # echo none > /sys/block/sr0/queue/scheduler # cat /sys/kernel/debug/block/sr0/state SAME_COMP\|IO_STAT\|INIT_DONE\|REGISTERED\|NOWAIT # cat /sys/block/sr0/queue/wbt_lat_usec 75000 We can see that after changing elevator from "bfq" to "none", "STATS" flag is lost even though WBT callback still need it. Fixes: `68497092bd` ("block: make queue stat accounting a reference") Cc: <stable@vger.kernel.org> # v5.17+ Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com> Acked-by: Tejun Heo <tj@kernel.org> Link: https://lore.kernel.org/r/20230413062805.2081970-1-chengming.zhou@linux.dev Signed-off-by: Jens Axboe <axboe@kernel.dk>	2023-04-13 06:48:11 -06:00
Tejun Heo	a13696b83d	blk-iolatency: Make initialization lazy Other rq_qos policies such as wbt and iocost are lazy-initialized when they are configured for the first time for the device but iolatency is initialized unconditionally from blkcg_init_disk() during gendisk init. Lazy init is beneficial because rq_qos policies add runtime overhead when initialized as every IO has to walk all registered rq_qos callbacks. This patch switches iolatency to lazy initialization too so that it only registered its rq_qos policy when it is first configured. Note that there is a known race condition between blkcg config file writes and del_gendisk() and this patch makes iolatency susceptible to it by exposing the init path to race against the deletion path. However, that problem already exists in iocost and is being worked on. Signed-off-by: Tejun Heo <tj@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Cc: Josef Bacik <josef@toxicpanda.com> Link: https://lore.kernel.org/r/20230413000649.115785-5-tj@kernel.org Signed-off-by: Jens Axboe <axboe@kernel.dk>	2023-04-13 06:46:49 -06:00
Tejun Heo	3304918758	blk-iolatency: s/blkcg_rq_qos/iolat_rq_qos/ The name was too generic given that there are multiple blkcg rq-qos policies. Signed-off-by: Tejun Heo <tj@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Cc: Josef Bacik <josef@toxicpanda.com> Link: https://lore.kernel.org/r/20230413000649.115785-4-tj@kernel.org Signed-off-by: Jens Axboe <axboe@kernel.dk>	2023-04-13 06:46:49 -06:00
Tejun Heo	faffaab289	blkcg: Restructure blkg_conf_prep() and friends We want to support lazy init of rq-qos policies so that iolatency is enabled lazily on configuration instead of gendisk initialization. The way blkg config helpers are structured now is a bit awkward for that. Let's restructure: * blkcg_conf_open_bdev() is renamed to blkg_conf_open_bdev(). The blkcg_ prefix was used because the bdev opening step is blkg-independent. However, the distinction is too subtle and confuses more than helps. Let's switch to blkg prefix so that it's consistent with the type and other helper names. * struct blkg_conf_ctx now remembers the original input string and is always initialized by the new blkg_conf_init(). * blkg_conf_open_bdev() is updated to take a pointer to blkg_conf_ctx like blkg_conf_prep() and can be called multiple times safely. Instead of modifying the double pointer to input string directly, blkg_conf_open_bdev() now sets blkg_conf_ctx->body. * blkg_conf_finish() is renamed to blkg_conf_exit() for symmetry and now must be called on all blkg_conf_ctx's which were initialized with blkg_conf_init(). Combined, this allows the users to either open the bdev first or do it altogether with blkg_conf_prep() which will help implementing lazy init of rq-qos policies. blkg_conf_init/exit() will also be used implement synchronization against device removal. This is necessary because iolat / iocost are configured through cgroupfs instead of one of the files under /sys/block/DEVICE. As cgroupfs operations aren't synchronized with block layer, the lazy init and other configuration operations may race against device removal. This patch makes blkg_conf_init/exit() used consistently for all cgroup-orginating configurations making them a good place to implement explicit synchronization. Users are updated accordingly. No behavior change is intended by this patch. v2: bfq wasn't updated in v1 causing a build error. Fixed. v3: Update the description to include future use of blkg_conf_init/exit() as synchronization points. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Josef Bacik <josef@toxicpanda.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Yu Kuai <yukuai1@huaweicloud.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20230413000649.115785-3-tj@kernel.org Signed-off-by: Jens Axboe <axboe@kernel.dk>	2023-04-13 06:46:49 -06:00
Tejun Heo	83462a6c97	blkcg: Drop unnecessary RCU read [un]locks from blkg_conf_prep/finish() Now that all RCU flavors have been combined either holding a spin lock, disabling irq or disabling preemption implies RCU read lock, so there's no need to use rcu_read_[un]lock() explicitly while holding queue_lock. This shouldn't cause any behavior changes. v2: Description updated. Leave __acquires/release on queue_lock alone. Signed-off-by: Tejun Heo <tj@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20230413000649.115785-2-tj@kernel.org Signed-off-by: Jens Axboe <axboe@kernel.dk>	2023-04-13 06:46:48 -06:00
Chengming Zhou	650e2cb50f	blk-cgroup: delete cpd_init_fn of blkcg_policy blkcg_policy cpd_init_fn() is used to just initialize some default fields of policy data, which is enough to do in cpd_alloc_fn(). This patch delete the only user bfq_cpd_init(), and remove cpd_init_fn from blkcg_policy. Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com> Acked-by: Tejun Heo <tj@kernel.org> Link: https://lore.kernel.org/r/20230406145050.49914-4-zhouchengming@bytedance.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2023-04-06 16:17:32 -06:00
Chengming Zhou	d1023165ee	blk-cgroup: delete cpd_bind_fn of blkcg_policy cpd_bind_fn is just used for update default weight when block subsys attached to a hierarchy. No any policy need it anymore. Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com> Acked-by: Tejun Heo <tj@kernel.org> Link: https://lore.kernel.org/r/20230406145050.49914-3-zhouchengming@bytedance.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2023-04-06 16:17:32 -06:00
Chengming Zhou	e9f2f3f590	block, bfq: remove BFQ_WEIGHT_LEGACY_DFL BFQ_WEIGHT_LEGACY_DFL is the same as CGROUP_WEIGHT_DFL, which means we don't need cpd_bind_fn() callback to update default weight when attached to a hierarchy. This patch remove BFQ_WEIGHT_LEGACY_DFL and cpd_bind_fn(). Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com> Acked-by: Tejun Heo <tj@kernel.org> Link: https://lore.kernel.org/r/20230406145050.49914-2-zhouchengming@bytedance.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2023-04-06 16:17:32 -06:00
Ondrej Kozina	4c4dd04e75	sed-opal: Add command to read locking range parameters. It returns following attributes: locking range start locking range length read lock enabled write lock enabled lock state (RW, RO or LK) It can be retrieved by user authority provided the authority was added to locking range via prior IOC_OPAL_ADD_USR_TO_LR ioctl command. The command was extended to add user in ACE that allows to read attributes listed above. Signed-off-by: Ondrej Kozina <okozina@redhat.com> Tested-by: Luca Boccassi <bluca@debian.org> Tested-by: Milan Broz <gmazyland@gmail.com> Link: https://lore.kernel.org/r/20230405111223.272816-6-okozina@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2023-04-05 07:46:26 -06:00
Ondrej Kozina	baf82b679c	sed-opal: add helper to get multiple columns at once. Refactors current code querying single column to use the new helper. Real multi column usage will be added later. Signed-off-by: Ondrej Kozina <okozina@redhat.com> Tested-by: Luca Boccassi <bluca@debian.org> Tested-by: Milan Broz <gmazyland@gmail.com> Acked-by: Christian Brauner <brauner@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20230405111223.272816-5-okozina@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2023-04-05 07:46:26 -06:00
Ondrej Kozina	8be19a02f1	sed-opal: allow user authority to get locking range attributes. Extend ACE set of locking range attributes accessible to user authority. This patch allows user authority to get following locking range attribues when user get added to locking range via IOC_OPAL_ADD_USR_TO_LR: locking range start locking range end read lock enabled write lock enabled read locked write locked lock on reset active key Note: Admin1 authority always remains in the ACE. Otherwise it breaks current userspace expecting Admin1 in the ACE (sedutils). See TCG OPAL2 s.4.3.1.7 "ACE_Locking_RangeNNNN_Get_RangeStartToActiveKey". Signed-off-by: Ondrej Kozina <okozina@redhat.com> Tested-by: Luca Boccassi <bluca@debian.org> Tested-by: Milan Broz <gmazyland@gmail.com> Acked-by: Christian Brauner <brauner@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20230405111223.272816-4-okozina@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2023-04-05 07:46:25 -06:00
Ondrej Kozina	175b654402	sed-opal: add helper for adding user authorities in ACE. Move ACE construction away from add_user_to_lr routine and refactor it to be used also in later code. Also adds boolean operators defines from TCG Core specification. Signed-off-by: Ondrej Kozina <okozina@redhat.com> Tested-by: Luca Boccassi <bluca@debian.org> Tested-by: Milan Broz <gmazyland@gmail.com> Link: https://lore.kernel.org/r/20230405111223.272816-3-okozina@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2023-04-05 07:46:25 -06:00
Ondrej Kozina	2fce95b196	sed-opal: do not add same authority twice in boolean ace. While adding user authority in boolean ace value of uid OPAL_LOCKINGRANGE_ACE_WRLOCKED or OPAL_LOCKINGRANGE_ACE_RDLOCKED, it was added twice. It seemed redundant when only single authority was added in the set method aka { authority1, authority1, OR }: TCG Storage Architecture Core Specification, 5.1.3.3 ACE_expression "This is an alternative type where the options are either a uidref to an Authority object or one of the boolean_ACE (AND = 0 and OR = 1) options. This type is used within the AC_element list to form a postfix Boolean expression of Authorities." Signed-off-by: Ondrej Kozina <okozina@redhat.com> Tested-by: Luca Boccassi <bluca@debian.org> Tested-by: Milan Broz <gmazyland@gmail.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Acked-by: Christian Brauner <brauner@kernel.org> Link: https://lore.kernel.org/r/20230405111223.272816-2-okozina@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2023-04-05 07:46:25 -06:00
Chaitanya Kulkarni	06965037ce	block: open code __blk_account_io_done() There is only one caller for __blk_account_io_done(), the function is small enough to fit in its caller blk_account_io_done(). Remove the function and opencode in the its caller blk_account_io_done(). Signed-off-by: Chaitanya Kulkarni <kch@nvidia.com> Link: https://lore.kernel.org/r/20230327073427.4403-2-kch@nvidia.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2023-03-27 13:22:58 -06:00
Chaitanya Kulkarni	e165fb4dd6	block: open code __blk_account_io_start() There is only one caller for __blk_account_io_start(), the function is small enough to fit in its caller blk_account_io_start(). Remove the function and opencode in the its caller blk_account_io_start(). Signed-off-by: Chaitanya Kulkarni <kch@nvidia.com> Link: https://lore.kernel.org/r/20230327073427.4403-2-kch@nvidia.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2023-03-27 13:22:58 -06:00
Keith Busch	54bdd67d0f	blk-mq: remove hybrid polling io_uring provides the only way user space can poll completions, and that always sets BLK_POLL_NOSLEEP. This effectively makes hybrid polling dead code, so remove it and everything supporting it. Hybrid polling was effectively killed off with `9650b453a3`, "block: ignore RWF_HIPRI hint for sync dio", but still potentially reachable through io_uring until `d729cf9acb`, "io_uring: don't sleep when polling for I/O", but hybrid polling probably should not have been reachable through that async interface from the beginning. Fixes: `9650b453a3` ("block: ignore RWF_HIPRI hint for sync dio") Fixes: `d729cf9acb` ("io_uring: don't sleep when polling for I/O") Signed-off-by: Keith Busch <kbusch@kernel.org> Link: https://lore.kernel.org/r/20230320194926.3353144-1-kbusch@meta.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2023-03-20 15:30:03 -06:00
Eric Biggers	4cf2c3ab2c	blk-crypto: drop the NULL check from blk_crypto_put_keyslot() Now that all callers of blk_crypto_put_keyslot() check for NULL before calling it, there is no need for blk_crypto_put_keyslot() to do the NULL check itself. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Eric Biggers <ebiggers@google.com> Link: https://lore.kernel.org/r/20230315183907.53675-2-ebiggers@kernel.org Signed-off-by: Jens Axboe <axboe@kernel.dk>	2023-03-16 09:35:09 -06:00
Eric Biggers	5b8562f0e8	blk-mq: return actual keyslot error in blk_insert_cloned_request() To avoid hiding information, pass on the error code from blk_crypto_rq_get_keyslot() instead of always using BLK_STS_IOERR. Signed-off-by: Eric Biggers <ebiggers@google.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20230315183907.53675-2-ebiggers@kernel.org Signed-off-by: Jens Axboe <axboe@kernel.dk>	2023-03-16 09:35:09 -06:00
Eric Biggers	435c0e9996	blk-crypto: remove blk_crypto_insert_cloned_request() blk_crypto_insert_cloned_request() is the same as blk_crypto_rq_get_keyslot(), so just use that directly. Signed-off-by: Eric Biggers <ebiggers@google.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20230315183907.53675-2-ebiggers@kernel.org Signed-off-by: Jens Axboe <axboe@kernel.dk>	2023-03-16 09:35:09 -06:00
Eric Biggers	5c7cb94452	blk-crypto: make blk_crypto_evict_key() more robust If blk_crypto_evict_key() sees that the key is still in-use (due to a bug) or that ->keyslot_evict failed, it currently just returns while leaving the key linked into the keyslot management structures. However, blk_crypto_evict_key() is only called in contexts such as inode eviction where failure is not an option. So actually the caller proceeds with freeing the blk_crypto_key regardless of the return value of blk_crypto_evict_key(). These two assumptions don't match, and the result is that there can be a use-after-free in blk_crypto_reprogram_all_keys() after one of these errors occurs. (Note, these errors shouldn't happen; we're just talking about what happens if they do anyway.) Fix this by making blk_crypto_evict_key() unlink the key from the keyslot management structures even on failure. Also improve some comments. Fixes: `1b26283970` ("block: Keyslot Manager for Inline Encryption") Cc: stable@vger.kernel.org Signed-off-by: Eric Biggers <ebiggers@google.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20230315183907.53675-2-ebiggers@kernel.org Signed-off-by: Jens Axboe <axboe@kernel.dk>	2023-03-16 09:35:09 -06:00
Eric Biggers	70493a63ba	blk-crypto: make blk_crypto_evict_key() return void blk_crypto_evict_key() is only called in contexts such as inode eviction where failure is not an option. So there is nothing the caller can do with errors except log them. (dm-table.c does "use" the error code, but only to pass on to upper layers, so it doesn't really count.) Just make blk_crypto_evict_key() return void and log errors itself. Cc: stable@vger.kernel.org Signed-off-by: Eric Biggers <ebiggers@google.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20230315183907.53675-2-ebiggers@kernel.org Signed-off-by: Jens Axboe <axboe@kernel.dk>	2023-03-16 09:35:09 -06:00
Eric Biggers	9cd1e56667	blk-mq: release crypto keyslot before reporting I/O complete Once all I/O using a blk_crypto_key has completed, filesystems can call blk_crypto_evict_key(). However, the block layer currently doesn't call blk_crypto_put_keyslot() until the request is being freed, which happens after upper layers have been told (via bio_endio()) the I/O has completed. This causes a race condition where blk_crypto_evict_key() can see 'slot_refs != 0' without there being an actual bug. This makes __blk_crypto_evict_key() hit the 'WARN_ON_ONCE(atomic_read(&slot->slot_refs) != 0)' and return without doing anything, eventually causing a use-after-free in blk_crypto_reprogram_all_keys(). (This is a very rare bug and has only been seen when per-file keys are being used with fscrypt.) There are two options to fix this: either release the keyslot before bio_endio() is called on the request's last bio, or make __blk_crypto_evict_key() ignore slot_refs. Let's go with the first solution, since it preserves the ability to report bugs (via WARN_ON_ONCE) where a key is evicted while still in-use. Fixes: `a892c8d52c` ("block: Inline encryption support for blk-mq") Cc: stable@vger.kernel.org Reviewed-by: Nathan Huckleberry <nhuck@google.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Eric Biggers <ebiggers@google.com> Link: https://lore.kernel.org/r/20230315183907.53675-2-ebiggers@kernel.org Signed-off-by: Jens Axboe <axboe@kernel.dk>	2023-03-16 09:35:09 -06:00
Yu Kuai	e2f2a39452	block, bfq: fix uaf for 'stable_merge_bfqq' Before commit `fd571df0ac` ("block, bfq: turn bfqq_data into an array in bfq_io_cq"), process reference is read before bfq_put_stable_ref(), and it's safe if bfq_put_stable_ref() put the last reference, because process reference will be 0 and 'stable_merge_bfqq' won't be accessed in this case. However, the commit changed the order and will cause uaf for 'stable_merge_bfqq'. In order to emphasize that bfq_put_stable_ref() can drop the last reference, fix the problem by moving bfq_put_stable_ref() to the end of bfq_setup_stable_merge(). Fixes: `fd571df0ac` ("block, bfq: turn bfqq_data into an array in bfq_io_cq") Reported-and-tested-by: Shinichiro Kawasaki <shinichiro.kawasaki@wdc.com> Link: https://lore.kernel.org/linux-block/20230307071448.rzihxbm4jhbf5krj@shindev/ Signed-off-by: Yu Kuai <yukuai3@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2023-03-08 07:34:50 -07:00
Yu Kuai	428913bce1	block: fix wrong mode for blkdev_put() from disk_scan_partitions() If disk_scan_partitions() is called with 'FMODE_EXCL', blkdev_get_by_dev() will be called without 'FMODE_EXCL', however, follow blkdev_put() is still called with 'FMODE_EXCL', which will cause 'bd_holders' counter to leak. Fix the problem by using the right mode for blkdev_put(). Reported-by: syzbot+2bcc0d79e548c4f62a59@syzkaller.appspotmail.com Link: https://lore.kernel.org/lkml/f9649d501bc8c3444769418f6c26263555d9d3be.camel@linux.ibm.com/T/ Tested-by: Julian Ruess <julianr@linux.ibm.com> Fixes: `e5cfefa97b` ("block: fix scan partition for exclusively open device again") Signed-off-by: Yu Kuai <yukuai3@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2023-03-07 07:24:38 -07:00
Linus Torvalds	9d0281b56b	block-6.3-2023-03-03 -----BEGIN PGP SIGNATURE----- iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmQB57MQHGF4Ym9lQGtl cm5lbC5kawAKCRD301j7KXHgputpEADVrc1OFzHOivJq+LJ3HS3ufhLBthtgu1Lp sEHvDNp9tBGXMLkomuCYpAju5TBAEKC+AJTZyj9iS1j++ItoezdoP55YRIH7t2Or UTy8ex3rLPGkQk6k3o8roWCyajTW/ZS+4fmk+NkVYMLsQBp9I+kFbxgJa5bbREdU Z8b/9hcBGz58R8Kq+TEMp/bO7oCV4c8xWumrKER+MktDDx0kc5d+afWXoy7bEKFg jLB3gleTM9HUpa9a2GPc4fxqdb0KanQdMtiyn/oplg0JcZLMiHfRbiRnsgQkjN0O RVtUcdxXmOkQeFra4GXPiHmQBcIfE85wP4wxb8p/F2StYRhb1epzzeCXOhuNZvv4 dd6OSARgtzWt3OlHka4aC63H4kzs9SxJp0F2uwuPLV0fM91TP1oOTWV+53FrQr9Z OQYyB8d9Il4K72NFLwU4ukJ1fPoCRHjpgAXIIkasEjaBftpJlMNnfblncTZTBumy XumFVdKfvqc3OFt8LLKWqLDV0j3TknVeCMPKhsbRwQ0NG4vlNOSWaLkGJCDLJ7ga ebf8AD5eaLCT9qyYquBuW5VBKZH5Z4rf5yHta9Dx+Omu0JTQYtTkiiM3UTdpDbtq SObZ31UvLoYK2dOZcVgjhE2RgM/AV5jJcx7aHhT3UptavAehHbePgiNhuEEntlKv L87kXJkSSQ== =ezrg -----END PGP SIGNATURE----- Merge tag 'block-6.3-2023-03-03' of git://git.kernel.dk/linux Pull block fixes from Jens Axboe: - NVMe pull request via Christoph: - Don't access released socket during error recovery (Akinobu Mita) - Bring back auto-removal of deleted namespaces during sequential scan (Christoph Hellwig) - Fix an error code in nvme_auth_process_dhchap_challenge (Dan Carpenter) - Show well known discovery name (Daniel Wagner) - Add a missing endianess conversion in effects masking (Keith Busch) - Fix for a regression introduced in blk-rq-qos during init in this merge window (Breno) - Reorder a few fields in struct blk_mq_tag_set, eliminating a few holes and shrinking it (Christophe) - Remove redundant bdev_get_queue() NULL checks (Juhyung) - Add sed-opal single user mode support flag (Luca) - Remove SQE128 check in ublk as it isn't needed, saving some memory (Ming) - Op specific segment checking for cloned requests (Uday) - Exclusive open partition scan fixes (Yu) - Loop offset/size checking before assigning them in the device (Zhong) - Bio polling fixes (me) * tag 'block-6.3-2023-03-03' of git://git.kernel.dk/linux: blk-mq: enforce op-specific segment limits in blk_insert_cloned_request nvme-fabrics: show well known discovery name nvme-tcp: don't access released socket during error recovery nvme-auth: fix an error code in nvme_auth_process_dhchap_challenge() nvme: bring back auto-removal of deleted namespaces during sequential scan blk-iocost: Pass gendisk to ioc_refresh_params nvme: fix sparse warning on effects masking block: be a bit more careful in checking for NULL bdev while polling block: clear bio->bi_bdev when putting a bio back in the cache loop: loop_set_status_from_info() check before assignment ublk: remove check IO_URING_F_SQE128 in ublk_ch_uring_cmd block: remove more NULL checks after bdev_get_queue() blk-mq: Reorder fields in 'struct blk_mq_tag_set' block: fix scan partition for exclusively open device again block: Revert "block: Do not reread partition table on exclusively open device" sed-opal: add support flag for SUM in status ioctl	2023-03-03 10:21:39 -08:00
Uday Shankar	49d2439832	blk-mq: enforce op-specific segment limits in blk_insert_cloned_request The block layer might merge together discard requests up until the max_discard_segments limit is hit, but blk_insert_cloned_request checks the segment count against max_segments regardless of the req op. This can result in errors like the following when discards are issued through a DM device and max_discard_segments exceeds max_segments for the queue of the chosen underlying device. blk_insert_cloned_request: over max segments limit. (256 > 129) Fix this by looking at the req_op and enforcing the appropriate segment limit - max_discard_segments for REQ_OP_DISCARDs and max_segments for everything else. Signed-off-by: Uday Shankar <ushankar@purestorage.com> Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20230301000655.48112-1-ushankar@purestorage.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2023-03-02 21:00:20 -07:00
Breno Leitao	e33b93650f	blk-iocost: Pass gendisk to ioc_refresh_params Current kernel (`d2980d8d82`) crashes when blk_iocost_init for `nvme1` disk. BUG: kernel NULL pointer dereference, address: 0000000000000050 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page blk_iocost_init (include/asm-generic/qspinlock.h:128 include/linux/spinlock.h:203 include/linux/spinlock_api_smp.h:158 include/linux/spinlock.h:400 block/blk-iocost.c:2884) ioc_qos_write (block/blk-iocost.c:3198) ? kretprobe_perf_func (kernel/trace/trace_kprobe.c:1566) ? kernfs_fop_write_iter (include/linux/slab.h:584 fs/kernfs/file.c:311) ? __kmem_cache_alloc_node (mm/slab.h:? mm/slub.c:3452 mm/slub.c:3491) ? _copy_from_iter (arch/x86/include/asm/uaccess_64.h:46 arch/x86/include/asm/uaccess_64.h:52 lib/iov_iter.c:183 lib/iov_iter.c:628) ? kretprobe_dispatcher (kernel/trace/trace_kprobe.c:1693) cgroup_file_write (kernel/cgroup/cgroup.c:4061) kernfs_fop_write_iter (fs/kernfs/file.c:334) vfs_write (include/linux/fs.h:1849 fs/read_write.c:491 fs/read_write.c:584) ksys_write (fs/read_write.c:637) do_syscall_64 (arch/x86/entry/common.c:50 arch/x86/entry/common.c:80) entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:120) This happens because ioc_refresh_params() is being called without a properly initialized ioc->rqos, which is happening later in the callee side. ioc_refresh_params() -> ioc_autop_idx() tries to access ioc->rqos.disk->queue but ioc->rqos.disk is NULL, causing the BUG above. Create function, called ioc_refresh_params_disk(), that is similar to ioc_refresh_params() but where the "struct gendisk" could be passed as an explicit argument. This function will be called when ioc->rqos.disk is not initialized. Fixes: `ce57b55860` ("blk-rq-qos: make rq_qos_add and rq_qos_del more useful") Signed-off-by: Breno Leitao <leitao@debian.org> Acked-by: Tejun Heo <tj@kernel.org> Link: https://lore.kernel.org/r/20230228111654.1778120-1-leitao@debian.org Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2023-02-28 05:51:19 -07:00
Linus Torvalds	a93e884edf	Driver core changes for 6.3-rc1 Here is the large set of driver core changes for 6.3-rc1. There's a lot of changes this development cycle, most of the work falls into two different categories: - fw_devlink fixes and updates. This has gone through numerous review cycles and lots of review and testing by lots of different devices. Hopefully all should be good now, and Saravana will be keeping a watch for any potential regression on odd embedded systems. - driver core changes to work to make struct bus_type able to be moved into read-only memory (i.e. const) The recent work with Rust has pointed out a number of areas in the driver core where we are passing around and working with structures that really do not have to be dynamic at all, and they should be able to be read-only making things safer overall. This is the contuation of that work (started last release with kobject changes) in moving struct bus_type to be constant. We didn't quite make it for this release, but the remaining patches will be finished up for the release after this one, but the groundwork has been laid for this effort. Other than that we have in here: - debugfs memory leak fixes in some subsystems - error path cleanups and fixes for some never-able-to-be-hit codepaths. - cacheinfo rework and fixes - Other tiny fixes, full details are in the shortlog All of these have been in linux-next for a while with no reported problems. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> -----BEGIN PGP SIGNATURE----- iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCY/ipdg8cZ3JlZ0Brcm9h aC5jb20ACgkQMUfUDdst+ynL3gCgwzbcWu0So3piZyLiJKxsVo9C2EsAn3sZ9gN6 6oeFOjD3JDju3cQsfGgd =Su6W -----END PGP SIGNATURE----- Merge tag 'driver-core-6.3-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core Pull driver core updates from Greg KH: "Here is the large set of driver core changes for 6.3-rc1. There's a lot of changes this development cycle, most of the work falls into two different categories: - fw_devlink fixes and updates. This has gone through numerous review cycles and lots of review and testing by lots of different devices. Hopefully all should be good now, and Saravana will be keeping a watch for any potential regression on odd embedded systems. - driver core changes to work to make struct bus_type able to be moved into read-only memory (i.e. const) The recent work with Rust has pointed out a number of areas in the driver core where we are passing around and working with structures that really do not have to be dynamic at all, and they should be able to be read-only making things safer overall. This is the contuation of that work (started last release with kobject changes) in moving struct bus_type to be constant. We didn't quite make it for this release, but the remaining patches will be finished up for the release after this one, but the groundwork has been laid for this effort. Other than that we have in here: - debugfs memory leak fixes in some subsystems - error path cleanups and fixes for some never-able-to-be-hit codepaths. - cacheinfo rework and fixes - Other tiny fixes, full details are in the shortlog All of these have been in linux-next for a while with no reported problems" [ Geert Uytterhoeven points out that that last sentence isn't true, and that there's a pending report that has a fix that is queued up - Linus ] * tag 'driver-core-6.3-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (124 commits) debugfs: drop inline constant formatting for ERR_PTR(-ERROR) OPP: fix error checking in opp_migrate_dentry() debugfs: update comment of debugfs_rename() i3c: fix device.h kernel-doc warnings dma-mapping: no need to pass a bus_type into get_arch_dma_ops() driver core: class: move EXPORT_SYMBOL_GPL() lines to the correct place Revert "driver core: add error handling for devtmpfs_create_node()" Revert "devtmpfs: add debug info to handle()" Revert "devtmpfs: remove return value of devtmpfs_delete_node()" driver core: cpu: don't hand-override the uevent bus_type callback. devtmpfs: remove return value of devtmpfs_delete_node() devtmpfs: add debug info to handle() driver core: add error handling for devtmpfs_create_node() driver core: bus: update my copyright notice driver core: bus: add bus_get_dev_root() function driver core: bus: constify bus_unregister() driver core: bus: constify some internal functions driver core: bus: constify bus_get_kset() driver core: bus: constify bus_register/unregister_notifier() driver core: remove private pointer from struct bus_type ...	2023-02-24 12:58:55 -08:00

1 2 3 4 5 ...

6779 Commits