linux/kernel
Johannes Weiner 777c6c5f1f wait: prevent exclusive waiter starvation
With exclusive waiters, every process woken up through the wait queue must
ensure that the next waiter down the line is woken when it has finished.

Interruptible waiters don't do that when aborting due to a signal.  And if
an aborting waiter is concurrently woken up through the waitqueue, noone
will ever wake up the next waiter.

This has been observed with __wait_on_bit_lock() used by
lock_page_killable(): the first contender on the queue was aborting when
the actual lock holder woke it up concurrently.  The aborted contender
didn't acquire the lock and therefor never did an unlock followed by
waking up the next waiter.

Add abort_exclusive_wait() which removes the process' wait descriptor from
the waitqueue, iff still queued, or wakes up the next waiter otherwise.
It does so under the waitqueue lock.  Racing with a wake up means the
aborting process is either already woken (removed from the queue) and will
wake up the next waiter, or it will remove itself from the queue and the
concurrent wake up will apply to the next waiter after it.

Use abort_exclusive_wait() in __wait_event_interruptible_exclusive() and
__wait_on_bit_lock() when they were interrupted by other means than a wake
up through the queue.

[akpm@linux-foundation.org: coding-style fixes]
Reported-by: Chris Mason <chris.mason@oracle.com>
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Mentored-by: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Matthew Wilcox <matthew@wil.cx>
Cc: Chuck Lever <cel@citi.umich.edu>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: <stable@kernel.org>		["after some testing"]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-02-05 12:56:48 -08:00
..
irq Merge branch 'core/xen' into x86/urgent 2009-02-04 14:54:56 +01:00
power Hibernation: Introduce system_entering_hibernation 2009-01-27 02:15:45 -05:00
time hrtimers: allow the hot-unplugging of all cpus 2009-01-30 22:35:29 +01:00
trace ftrace: do_each_pid_task() needs rcu lock 2009-02-03 22:50:58 +01:00
.gitignore
acct.c [CVE-2009-0029] System call wrappers part 04 2009-01-14 14:15:19 +01:00
async.c kernel/async.c: fix printk warnings 2009-02-05 12:56:46 -08:00
audit_tree.c audit: validate comparison operations, store them in sane form 2009-01-04 15:14:42 -05:00
audit.c [PATCH] fix broken timestamps in AVC generated by kernel threads 2008-12-09 02:27:41 -05:00
audit.h fixing audit rule ordering mess, part 1 2009-01-04 15:14:41 -05:00
auditfilter.c audit: validate comparison operations, store them in sane form 2009-01-04 15:14:42 -05:00
auditsc.c make sure that filterkey of task,always rules is reported 2009-01-04 15:14:42 -05:00
backtracetest.c
bounds.c
capability.c [CVE-2009-0029] System call wrappers part 04 2009-01-14 14:15:19 +01:00
cgroup_debug.c cgroups: fix probable race with put_css_set[_taskexit] and find_css_set 2008-10-20 08:52:38 -07:00
cgroup_freezer.c freezer_cg: disable writing freezer.state of root cgroup 2008-11-12 17:17:16 -08:00
cgroup.c cgroup: fix root_count when mount fails due to busy subsystem 2009-01-29 18:04:45 -08:00
compat.c Allow times and time system calls to return small negative values 2009-01-06 15:59:13 -08:00
configs.c kernel/configs.c: remove useless comments 2008-10-20 08:52:34 -07:00
cpu.c stop_machine/cpu hotplug: fix disable_nonboot_cpus 2009-01-07 11:36:14 -08:00
cpuset.c cpuset: fix possible deadlock in async_rebuild_sched_domains 2009-01-19 02:44:00 +01:00
cred-internals.h CRED: Inaugurate COW credentials 2008-11-14 10:39:23 +11:00
cred.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/security-testing-2.6 2009-01-09 13:59:25 -08:00
delayacct.c schedstat: consolidate per-task cpu runtime stats 2008-12-18 13:54:01 +01:00
dma-coherent.c dma-coherent: Restore dma_alloc_from_coherent() large alloc fall back policy. 2009-01-21 18:51:53 +09:00
dma.c kernel/dma.c: remove a CVS keyword 2008-10-16 11:21:30 -07:00
exec_domain.c [CVE-2009-0029] System call wrappers part 04 2009-01-14 14:15:19 +01:00
exit.c [CVE-2009-0029] System call wrappers part 08 2009-01-14 14:15:21 +01:00
extable.c Merge branch 'core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip 2008-12-30 16:10:19 -08:00
fork.c Merge branch 'timers-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip 2009-01-26 09:47:43 -08:00
freezer.c freezer_cg: use thaw_process() in unfreeze_cgroup() 2008-10-30 11:38:45 -07:00
futex_compat.c CRED: Use RCU to access another task's creds and to release a task's own creds 2008-11-14 10:39:19 +11:00
futex.c [CVE-2009-0029] System call wrappers part 31 2009-01-14 14:15:31 +01:00
hrtimer.c hrtimer: prevent negative expiry value after clock_was_set() 2009-01-30 22:35:34 +01:00
itimer.c [CVE-2009-0029] System call wrappers part 05 2009-01-14 14:15:20 +01:00
kallsyms.c Revert "kbuild: strip generated symbols from *.ko" 2009-01-14 21:38:20 +01:00
Kconfig.freezer container freezer: implement freezer cgroup subsystem 2008-10-20 08:52:34 -07:00
Kconfig.hz
Kconfig.preempt rcu: provide RCU options on non-preempt architectures too 2008-12-25 09:31:28 +01:00
kexec.c [CVE-2009-0029] System call wrappers part 07 2009-01-14 14:15:20 +01:00
kfifo.c
kgdb.c kgdb: call touch_softlockup_watchdog on resume 2008-10-06 13:50:59 -05:00
kmod.c kmod: fix varargs kernel-doc 2009-01-06 15:59:27 -08:00
kprobes.c kprobes: check CONFIG_FREEZER instead of CONFIG_PM 2009-01-16 14:32:17 -05:00
ksysfs.c kernel/ksysfs.c:fix dependence on CONFIG_NET 2009-01-06 10:44:31 -08:00
kthread.c tracepoints: add DECLARE_TRACE() and DEFINE_TRACE() 2008-11-16 09:01:36 +01:00
latencytop.c KSYM_SYMBOL_LEN fixes 2008-12-10 08:01:54 -08:00
lockdep_internals.h lockdep: build fix 2008-08-13 12:55:10 +02:00
lockdep_proc.c lockstat: contend with points 2008-10-20 15:43:10 +02:00
lockdep.c Merge branch 'core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip 2008-12-30 16:10:19 -08:00
Makefile kernel/up.c: omit it if SMP=y, USE_GENERIC_SMP_HELPERS=n 2009-01-14 09:42:11 -08:00
marker.c markers/tracpoints: fix non-modular build 2008-11-16 09:52:03 +01:00
module.c modules: Use a better scheme for refcounting 2009-02-02 19:17:55 -08:00
mutex-debug.c
mutex-debug.h
mutex.c mutex: __used is needed for function referenced only from inline asm 2008-11-24 10:00:28 +01:00
mutex.h
notifier.c Merge commit 'v2.6.28-rc6' into core/debug 2008-11-26 08:22:50 +01:00
ns_cgroup.c ns_cgroup: remove unused spinlock 2009-01-08 08:31:02 -08:00
nsproxy.c User namespaces: set of cleanups (v2) 2008-11-24 18:57:41 -05:00
panic.c oops: increment the oops UUID every time we oops 2009-01-06 15:59:12 -08:00
params.c Fix compile warning in kernel/params.c 2008-10-23 12:09:00 -07:00
pid_namespace.c pid_ns: (BUG 11391) change ->child_reaper when init->group_leader exits 2008-09-02 19:21:38 -07:00
pid.c pid: generalize task_active_pid_ns 2009-01-08 08:31:12 -08:00
pm_qos_params.c pm_qos_requirement might sleep 2008-09-02 19:21:40 -07:00
posix-cpu-timers.c itimers: remove the per-cpu-ish-ness 2009-01-07 18:52:44 +01:00
posix-timers.c [CVE-2009-0029] System call wrappers part 05 2009-01-14 14:15:20 +01:00
printk.c [CVE-2009-0029] System call wrappers part 27 2009-01-14 14:15:29 +01:00
profile.c profile: don't include <asm/ptrace.h> twice. 2009-01-06 15:59:14 -08:00
ptrace.c [CVE-2009-0029] System call wrappers part 27 2009-01-14 14:15:29 +01:00
rcuclassic.c rcu: add __cpuinit to rcu_init_percpu_data() 2009-01-14 11:26:40 +01:00
rcupdate.c rcu: eliminate synchronize_rcu_xxx macro 2009-01-05 10:18:08 +01:00
rcupreempt_trace.c "Tree RCU": scalable classic RCU implementation 2008-12-18 21:56:04 +01:00
rcupreempt.c rcu: eliminate synchronize_rcu_xxx macro 2009-01-05 10:18:08 +01:00
rcutorture.c rcu: fix bug in rcutorture system-shutdown code 2009-01-07 23:36:25 +01:00
rcutree_trace.c "Tree RCU": scalable classic RCU implementation 2008-12-18 21:56:04 +01:00
rcutree.c rcu: add __cpuinit to rcu_init_percpu_data() 2009-01-14 11:26:40 +01:00
relay.c relay: fix lock imbalance in relay_late_setup_files 2009-01-18 20:29:35 +01:00
res_counter.c memcg: memory cgroup resource counters for hierarchy 2009-01-08 08:31:05 -08:00
resource.c resources: fix parameter name and kernel-doc 2009-01-15 16:39:38 -08:00
rtmutex_common.h
rtmutex-debug.c
rtmutex-debug.h
rtmutex-tester.c
rtmutex.c hrtimer: convert kernel/* to the new hrtimer apis 2008-09-05 21:35:13 -07:00
rtmutex.h
rwsem.c
sched_clock.c sched_clock: prevent scd->clock from moving backwards, take #2 2008-12-31 09:53:21 +01:00
sched_cpupri.c sched: fix section mismatch 2009-01-06 11:07:15 +01:00
sched_cpupri.h sched: convert struct cpupri_vec cpumask_var_t. 2008-11-24 17:52:22 +01:00
sched_debug.c sched: partly revert "sched debug: remove NULL checking in print_cfs_rt_rq()" 2009-01-11 02:40:32 +01:00
sched_fair.c sched: fix buddie group latency 2009-02-01 10:49:51 +01:00
sched_features.h sched: backward looking buddy 2008-11-05 10:30:14 +01:00
sched_idletask.c sched: add CONFIG_SMP consistency 2008-10-22 10:01:52 +02:00
sched_rt.c sched_rt: don't use first_cpu on cpumask created with cpumask_and 2009-02-01 10:49:52 +01:00
sched_stats.h itimers: remove the per-cpu-ish-ness 2009-01-07 18:52:44 +01:00
sched.c wait: prevent exclusive waiter starvation 2009-02-05 12:56:48 -08:00
seccomp.c
semaphore.c semaphore: __down_common: use signal_pending_state() 2008-08-05 14:33:47 -07:00
signal.c signals, debug: fix BUG: using smp_processor_id() in preemptible code in print_fatal_signal() 2009-01-27 00:36:19 +01:00
smp.c generic-ipi: use per cpu data for single cpu ipi calls 2009-01-30 18:31:08 +01:00
softirq.c Merge branch 'cpus4096-for-linus-3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip 2009-01-03 12:04:39 -08:00
softlockup.c softlock: fix false panic which can occur if softlockup_thresh is reduced 2009-01-14 11:48:07 +01:00
spinlock.c lockdep: spin_lock_nest_lock(), checkpatch fixes 2008-08-13 13:56:51 +02:00
srcu.c
stacktrace.c stacktrace: provide save_stack_trace_tsk() weak alias 2008-12-25 11:44:43 +01:00
stop_machine.c stop_machine: introduce stop_machine_create/destroy. 2009-01-05 08:40:14 +10:30
sys_ni.c [CVE-2009-0029] Make sys_syslog a conditional system call 2009-01-14 14:15:16 +01:00
sys.c revert "rlimit: permit setting RLIMIT_NOFILE to RLIM_INFINITY" 2009-02-05 12:56:47 -08:00
sysctl_check.c [XFS] remove restricted chown parameter from xfs linux 2008-10-30 18:30:09 +11:00
sysctl.c Merge branch 'core/debugobjects' into core/urgent 2009-01-22 10:03:02 +01:00
taskstats.c cpumask: convert rest of files in kernel/ 2009-01-01 10:12:28 +10:30
test_kprobes.c kprobes: add tests for register_kprobes 2009-01-06 15:59:20 -08:00
time.c [CVE-2009-0029] System call wrappers part 01 2009-01-14 14:15:18 +01:00
timeconst.pl
timer.c [CVE-2009-0029] System call wrappers part 27 2009-01-14 14:15:29 +01:00
tracepoint.c markers/tracpoints: fix non-modular build 2008-11-16 09:52:03 +01:00
tsacct.c mm: introduce get_mm_hiwater_xxx(), fix taskstats->hiwater_xxx accounting 2009-01-06 15:59:09 -08:00
uid16.c [CVE-2009-0029] System call wrappers part 19 2009-01-14 14:15:26 +01:00
up.c smp_call_function_single(): be slightly less stupid, fix #2 2009-01-12 16:04:37 +01:00
user_namespace.c User namespaces: set of cleanups (v2) 2008-11-24 18:57:41 -05:00
user.c Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip 2008-12-28 12:27:58 -08:00
utsname_sysctl.c sysctl: simplify ->strategy 2008-10-16 11:21:47 -07:00
utsname.c removed unused #include <linux/version.h>'s 2008-08-23 12:14:12 -07:00
wait.c wait: prevent exclusive waiter starvation 2009-02-05 12:56:48 -08:00
workqueue.c work_on_cpu: Use our own workqueue. 2009-01-19 22:36:07 +01:00