linux/block
Tejun Heo a051661ca6 blkcg: implement per-blkg request allocation
Currently, request_queue has one request_list to allocate requests
from regardless of blkcg of the IO being issued.  When the unified
request pool is used up, cfq proportional IO limits become meaningless
- whoever grabs the next request being freed wins the race regardless
of the configured weights.

This can be easily demonstrated by creating a blkio cgroup w/ very low
weight, put a program which can issue a lot of random direct IOs there
and running a sequential IO from a different cgroup.  As soon as the
request pool is used up, the sequential IO bandwidth crashes.

This patch implements per-blkg request_list.  Each blkg has its own
request_list and any IO allocates its request from the matching blkg
making blkcgs completely isolated in terms of request allocation.

* Root blkcg uses the request_list embedded in each request_queue,
  which was renamed to @q->root_rl from @q->rq.  While making blkcg rl
  handling a bit harier, this enables avoiding most overhead for root
  blkcg.

* Queue fullness is properly per request_list but bdi isn't blkcg
  aware yet, so congestion state currently just follows the root
  blkcg.  As writeback isn't aware of blkcg yet, this works okay for
  async congestion but readahead may get the wrong signals.  It's
  better than blkcg completely collapsing with shared request_list but
  needs to be improved with future changes.

* After this change, each block cgroup gets a full request pool making
  resource consumption of each cgroup higher.  This makes allowing
  non-root users to create cgroups less desirable; however, note that
  allowing non-root users to directly manage cgroups is already
  severely broken regardless of this patch - each block cgroup
  consumes kernel memory and skews IO weight (IO weights are not
  hierarchical).

v2: queue-sysfs.txt updated and patch description udpated as suggested
    by Vivek.

v3: blk_get_rl() wasn't checking error return from
    blkg_lookup_create() and may cause oops on lookup failure.  Fix it
    by falling back to root_rl on blkg lookup failures.  This problem
    was spotted by Rakesh Iyer <rni@google.com>.

v4: Updated to accomodate 458f27a982 "block: Avoid missed wakeup in
    request waitqueue".  blk_drain_queue() now wakes up waiters on all
    blkg->rl on the target queue.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Vivek Goyal <vgoyal@redhat.com>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2012-06-26 18:42:49 -04:00
..
partitions s390/dasd: re-prioritize partition detection message 2012-05-16 14:42:51 +02:00
blk-cgroup.c blkcg: implement per-blkg request allocation 2012-06-26 18:42:49 -04:00
blk-cgroup.h blkcg: implement per-blkg request allocation 2012-06-26 18:42:49 -04:00
blk-core.c blkcg: implement per-blkg request allocation 2012-06-26 18:42:49 -04:00
blk-exec.c block: add missing blk_queue_dead() checks 2011-12-14 00:33:37 +01:00
blk-flush.c blk-flush: move the queue kick into 2011-10-24 16:24:31 +02:00
blk-integrity.c block: add export.h to files using EXPORT_SYMBOL/THIS_MODULE macros 2011-10-31 19:31:12 -04:00
blk-ioc.c block: avoid infinite loop in get_task_io_context() 2012-05-31 13:39:05 +02:00
blk-iopoll.c tree-wide: fix assorted typos all over the place 2009-12-04 15:39:55 +01:00
blk-lib.c block: fix patch import error in max_discard_sectors check 2011-07-23 20:34:59 +02:00
blk-map.c block: re-use existing 'reading' variable instead of checking direction again 2011-12-21 15:27:24 +01:00
blk-merge.c block: separate out blk_rq_merge_ok() and blk_try_merge() from elevator functions 2012-02-08 09:19:38 +01:00
blk-settings.c block: Introduce blk_set_stacking_limits function 2012-01-11 16:27:11 +01:00
blk-softirq.c sched, block: Unify cache detection 2012-01-27 13:28:48 +01:00
blk-sysfs.c blkcg: implement per-blkg request allocation 2012-06-26 18:42:49 -04:00
blk-tag.c block: fix blk_queue_end_tag() 2011-12-29 09:16:28 +01:00
blk-throttle.c block: allocate io_context upfront 2012-06-25 11:53:50 +02:00
blk-timeout.c block: Drop dead function blk_abort_queue() 2012-06-15 08:46:23 +02:00
blk.h block: prepare for multiple request_lists 2012-06-25 11:53:52 +02:00
bsg-lib.c block: drop custom queue draining used by scsi_transport_{iscsi|fc} 2012-06-25 11:53:48 +02:00
bsg.c bsg: fix sysfs link remove warning 2012-02-08 20:02:03 +01:00
cfq-iosched.c block: blkcg_policy_cfq shouldn't be used if !CONFIG_CFQ_GROUP_IOSCHED 2012-06-04 10:02:29 +02:00
compat_ioctl.c block: Add BLKROTATIONAL ioctl 2012-01-11 16:29:31 +01:00
deadline-iosched.c elevator: make elevator_init_fn() return 0/-errno 2012-03-06 21:27:21 +01:00
elevator.c blkcg: implement per-queue policy activation 2012-04-20 10:06:06 +02:00
genhd.c block: fix buffer overflow when printing partition UUIDs 2012-05-15 08:22:04 +02:00
ioctl.c Merge branch 'for-3.3/core' of git://git.kernel.dk/linux-block 2012-01-15 12:24:45 -08:00
Kconfig move fs/partitions to block/ 2012-01-03 22:54:06 -05:00
Kconfig.iosched blkcg: make CONFIG_BLK_CGROUP bool 2012-03-06 21:27:21 +01:00
Makefile separate partition format handling from generic code 2012-01-03 22:54:06 -05:00
noop-iosched.c elevator: make elevator_init_fn() return 0/-errno 2012-03-06 21:27:21 +01:00
partition-generic.c block: Fix NULL pointer dereference in sd_revalidate_disk 2012-03-02 10:38:33 +01:00
scsi_ioctl.c scsi: Silence unnecessary warnings about ioctl to partition 2012-06-15 12:52:46 +02:00