linux/block
Jeff Moyer c6ce194325 cfq-iosched: fix incorrect filing of rt async cfqq
Hi,

If you can manage to submit an async write as the first async I/O from
the context of a process with realtime scheduling priority, then a
cfq_queue is allocated, but filed into the wrong async_cfqq bucket.  It
ends up in the best effort array, but actually has realtime I/O
scheduling priority set in cfqq->ioprio.

The reason is that cfq_get_queue assumes the default scheduling class and
priority when there is no information present (i.e. when the async cfqq
is created):

static struct cfq_queue *
cfq_get_queue(struct cfq_data *cfqd, bool is_sync, struct cfq_io_cq *cic,
	      struct bio *bio, gfp_t gfp_mask)
{
	const int ioprio_class = IOPRIO_PRIO_CLASS(cic->ioprio);
	const int ioprio = IOPRIO_PRIO_DATA(cic->ioprio);

cic->ioprio starts out as 0, which is "invalid".  So, class of 0
(IOPRIO_CLASS_NONE) is passed to cfq_async_queue_prio like so:

		async_cfqq = cfq_async_queue_prio(cfqd, ioprio_class, ioprio);

static struct cfq_queue **
cfq_async_queue_prio(struct cfq_data *cfqd, int ioprio_class, int ioprio)
{
        switch (ioprio_class) {
        case IOPRIO_CLASS_RT:
                return &cfqd->async_cfqq[0][ioprio];
        case IOPRIO_CLASS_NONE:
                ioprio = IOPRIO_NORM;
                /* fall through */
        case IOPRIO_CLASS_BE:
                return &cfqd->async_cfqq[1][ioprio];
        case IOPRIO_CLASS_IDLE:
                return &cfqd->async_idle_cfqq;
        default:
                BUG();
        }
}

Here, instead of returning a class mapped from the process' scheduling
priority, we get back the bucket associated with IOPRIO_CLASS_BE.

Now, there is no queue allocated there yet, so we create it:

		cfqq = cfq_find_alloc_queue(cfqd, is_sync, cic, bio, gfp_mask);

That function ends up doing this:

			cfq_init_cfqq(cfqd, cfqq, current->pid, is_sync);
			cfq_init_prio_data(cfqq, cic);

cfq_init_cfqq marks the priority as having changed.  Then, cfq_init_prio
data does this:

	ioprio_class = IOPRIO_PRIO_CLASS(cic->ioprio);
	switch (ioprio_class) {
	default:
		printk(KERN_ERR "cfq: bad prio %x\n", ioprio_class);
	case IOPRIO_CLASS_NONE:
		/*
		 * no prio set, inherit CPU scheduling settings
		 */
		cfqq->ioprio = task_nice_ioprio(tsk);
		cfqq->ioprio_class = task_nice_ioclass(tsk);
		break;

So we basically have two code paths that treat IOPRIO_CLASS_NONE
differently, which results in an RT async cfqq filed into a best effort
bucket.

Attached is a patch which fixes the problem.  I'm not sure how to make
it cleaner.  Suggestions would be welcome.

Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
Tested-by: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com>
Cc: stable@kernel.org
Signed-off-by: Jens Axboe <axboe@fb.com>
2015-01-21 10:38:30 -07:00
..
partitions block: Replace strnicmp with strncasecmp 2014-09-27 16:48:55 -06:00
bio-integrity.c block: fix regression where bio_integrity_process uses wrong bio_vec iterator 2014-12-02 08:15:21 -07:00
bio.c bio: modify __bio_add_page() to accept pages that don't start a new segment 2014-12-11 09:11:52 -07:00
blk-cgroup.c blkcg: remove blkcg->id 2014-09-08 09:55:37 -06:00
blk-cgroup.h blkcg: remove blkcg->id 2014-09-08 09:55:37 -06:00
blk-core.c block: wake up waiters when a queue is marked dying 2014-12-31 09:39:16 -07:00
blk-exec.c blk-mq: avoid infinite recursion with the FUA flag 2014-09-22 11:55:19 -06:00
blk-flush.c blk-mq: support per-distpatch_queue flush machinery 2014-09-25 15:22:45 -06:00
blk-integrity.c block: Don't merge requests if integrity flags differ 2014-09-27 09:14:57 -06:00
blk-ioc.c block: Substitute rcu_access_pointer() for rcu_dereference_raw() 2014-02-18 12:21:26 -08:00
blk-iopoll.c Merge branch 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip into next 2014-06-03 12:57:53 -07:00
blk-lib.c block/blk-lib.c: make __blkdev_issue_zeroout static 2014-05-26 17:39:09 -06:00
blk-map.c block: remove struct request buffer member 2014-04-15 14:03:02 -06:00
blk-merge.c block: blk-merge: fix blk_recount_segments() 2014-11-11 16:24:15 -07:00
blk-mq-cpu.c blk-mq: add file comments and update copyright notices 2014-05-28 10:15:41 -06:00
blk-mq-cpumap.c blk-mq: Use all available hardware queues 2014-12-09 09:08:21 -07:00
blk-mq-sysfs.c blk-mq: Fix uninitialized kobject at CPU hotplugging 2014-12-10 08:57:31 -07:00
blk-mq-tag.c blk-mq: fix false negative out-of-tags condition 2015-01-14 08:49:55 -07:00
blk-mq-tag.h block: wake up waiters when a queue is marked dying 2014-12-31 09:39:16 -07:00
blk-mq.c blk-mq: export blk_mq_freeze_queue() 2015-01-02 15:05:12 -07:00
blk-mq.h block: wake up waiters when a queue is marked dying 2014-12-31 09:39:16 -07:00
blk-settings.c block: remove artifical max_hw_sectors cap 2014-10-21 14:02:54 -06:00
blk-softirq.c block: fix regression with block enabled tagging 2014-04-09 21:54:06 -06:00
blk-sysfs.c blk-mq: Fix a use-after-free 2014-12-09 09:07:13 -07:00
blk-tag.c block: don't assume last put of shared tags is for the host 2014-07-08 12:25:28 +02:00
blk-throttle.c cgroup: remove sane_behavior support on non-default hierarchies 2014-07-09 10:08:08 -04:00
blk-timeout.c block: fix blk_abort_request on blk-mq 2014-09-22 12:00:08 -06:00
blk.h blk-mq: support per-distpatch_queue flush machinery 2014-09-25 15:22:45 -06:00
bounce.c mm: convert some level-less printks to pr_* 2014-06-06 16:08:18 -07:00
bsg-lib.c bsg: Remove unused function bsg_goose_queue() 2012-12-06 14:33:02 +01:00
bsg.c bsg: fix potential error pointer dereference 2014-08-29 08:34:14 -06:00
cfq-iosched.c cfq-iosched: fix incorrect filing of rt async cfqq 2015-01-21 10:38:30 -07:00
cmdline-parser.c block: remove unrelated header files and export symbol 2014-01-21 20:18:26 -08:00
compat_ioctl.c block, bdi: an active gendisk always has a request_queue associated with it 2014-09-08 10:00:35 -06:00
deadline-iosched.c block: Stop abusing csd.list for fifo_time 2014-02-24 14:46:32 -08:00
elevator.c block / PM: Replace CONFIG_PM_RUNTIME with CONFIG_PM 2014-12-04 01:00:23 +01:00
genhd.c genhd: check for int overflow in disk_expand_part_tbl() 2014-11-19 13:09:07 -07:00
ioctl.c block, bdi: an active gendisk always has a request_queue associated with it 2014-09-08 10:00:35 -06:00
ioprio.c block: Fix computation of merged request priority 2014-10-31 08:30:43 -06:00
Kconfig block: Add T10 Protection Information functions 2014-09-27 09:14:59 -06:00
Kconfig.iosched blkcg: make CONFIG_BLK_CGROUP bool 2012-03-06 21:27:21 +01:00
Makefile block: Add T10 Protection Information functions 2014-09-27 09:14:59 -06:00
noop-iosched.c elevator: Fix a race in elevator switching 2013-07-03 13:25:24 +02:00
partition-generic.c block: Fix dev_t minor allocation lifetime 2014-09-03 15:01:02 -06:00
scsi_ioctl.c Merge remote-tracking branch 'scsi-queue/core-for-3.19' into for-linus 2014-12-08 07:40:20 -08:00
t10-pi.c block: Add T10 Protection Information functions 2014-09-27 09:14:59 -06:00