linux/kernel/sched
Thomas Gleixner 81a44c5441 sched: Queue RT tasks to head when prio drops
The following scenario does not work correctly:

Runqueue of CPUx contains two runnable and pinned tasks:

 T1: SCHED_FIFO, prio 80
 T2: SCHED_FIFO, prio 80

T1 is on the cpu and executes the following syscalls (classic priority
ceiling scenario):

 sys_sched_setscheduler(pid(T1), SCHED_FIFO, .prio = 90);
 ...
 sys_sched_setscheduler(pid(T1), SCHED_FIFO, .prio = 80);
 ...

Now T1 gets preempted by T3 (SCHED_FIFO, prio 95). After T3 goes back
to sleep the scheduler picks T2. Surprise!

The same happens w/o actual preemption when T1 is forced into the
scheduler due to a sporadic NEED_RESCHED event. The scheduler invokes
pick_next_task() which returns T2. So T1 gets preempted and scheduled
out.

This happens because sched_setscheduler() dequeues T1 from the prio 90
list and then enqueues it on the tail of the prio 80 list behind T2.
This violates the POSIX spec and surprises user space which relies on
the guarantee that SCHED_FIFO tasks are not scheduled out unless they
give the CPU up voluntarily or are preempted by a higher priority
task. In the latter case the preempted task must get back on the CPU
after the preempting task schedules out again.

We fixed a similar issue already in commit 60db48c (sched: Queue a
deboosted task to the head of the RT prio queue). The same treatment
is necessary for sched_setscheduler(). So enqueue to head of the prio
bucket list if the priority of the task is lowered.

It might be possible that existing user space relies on the current
behaviour, but it can be considered highly unlikely due to the corner
case nature of the application scenario.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1391803122-4425-6-git-send-email-bigeasy@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-02-22 18:09:41 +01:00
..
auto_group.c sched/autogroup: Fix race with task_groups list 2013-05-28 09:40:22 +02:00
auto_group.h Revert "sched/autogroup: Fix crash on reboot when autogroup is disabled" 2012-12-11 10:23:45 +01:00
clock.c sched/clock: Fixup early initialization 2014-01-23 14:48:36 +01:00
completion.c sched: Move completion code from core.c to completion.c 2013-11-06 07:49:19 +01:00
core.c sched: Queue RT tasks to head when prio drops 2014-02-22 18:09:41 +01:00
cpuacct.c cgroup: replace cftype->read_seq_string() with cftype->seq_show() 2013-12-05 12:28:04 -05:00
cpuacct.h sched/cpuacct: Initialize root cpuacct earlier 2013-04-10 13:54:20 +02:00
cpudeadline.c sched/deadline: Fix sparse static warnings 2014-01-16 09:27:03 +01:00
cpudeadline.h sched/deadline: speed up SCHED_DEADLINE pushes with a push-heap 2014-01-13 13:46:46 +01:00
cpupri.c sched: Fix some kernel-doc warnings 2013-07-18 09:58:21 +02:00
cpupri.h
cputime.c sched: Implement task_nice() as static inline function 2014-02-09 15:28:23 +01:00
deadline.c sched: Remove some #ifdeffery 2014-02-21 21:43:18 +01:00
debug.c sched: Add statistic for newidle load balance cost 2014-02-11 09:58:18 +01:00
fair.c sched: Fix hotplug task migration 2014-02-21 21:43:18 +01:00
features.h sched/numa: Resist moving tasks towards nodes with fewer hinting faults 2013-10-09 12:40:27 +02:00
idle_task.c sched: Remove some #ifdeffery 2014-02-21 21:43:18 +01:00
idle.c sched/idle: Move cpu/idle.c to sched/idle.c 2014-02-11 09:58:30 +01:00
Makefile sched/idle: Move cpu/idle.c to sched/idle.c 2014-02-11 09:58:30 +01:00
proc.c sched: Change get_rq_runnable_load() to static and inline 2013-06-27 10:07:44 +02:00
rt.c sched: Remove some #ifdeffery 2014-02-21 21:43:18 +01:00
sched.h sched: Remove some #ifdeffery 2014-02-21 21:43:18 +01:00
stats.c fix a leak in /proc/schedstats 2013-04-29 15:41:45 -04:00
stats.h sched: Micro-optimize by dropping unnecessary task_rq() calls 2013-09-25 13:51:06 +02:00
stop_task.c sched: Fix hotplug task migration 2014-02-21 21:43:18 +01:00
wait.c sched: Move wait code from core.c to wait.c 2013-11-06 07:49:18 +01:00