linux/kernel
Paul Jackson abb5a5cc6b [PATCH] Cpuset: fix ABBA deadlock with cpu hotplug lock
Fix ABBA deadlock between lock_cpu_hotplug() and the cpuset
callback_mutex lock.

It only happens on cpu_exclusive cpusets, due to the dynamic
sched domain code trying to take the cpu hotplug lock inside
the cpuset callback_mutex lock.

This bug has apparently been here for several months, but didn't
get hit until the right customer load on a large system.

This fix appears right from inspection, but it will take a few
more days running it on that customers workload to be confident
we nailed it.  We don't have any other reproducible test case.

The cpu_hotplug_lock() tends to cover large runs of code.
The other places that hold both that lock and the cpuset callback
mutex lock always nest the cpuset lock inside the hotplug lock.
This place tries to do the reverse, risking an ABBA deadlock.

This is in the cpuset_rmdir() code, where we:
  * take the callback_mutex lock
  * mark the cpuset CS_REMOVED
  * call update_cpu_domains for cpu_exclusive cpusets
  * in that call, take the cpu_hotplug lock if the
    cpuset is marked for removal.

Thanks to Jack Steiner for identifying this deadlock.

The fix is to tear down the dynamic sched domain before we grab
the cpuset callback_mutex lock.  This way, the two locks are
serialized, with the hotplug lock taken and released before
trying for the cpuset lock.

I suspect that this bug was introduced when I changed the
cpuset locking from one lock to two.  The dynamic sched domain
dependency on cpu_exclusive cpusets and its hotplug hooks were
added to this code earlier, when cpusets had only a single lock.
It may well have been fine then.

Signed-off-by: Paul Jackson <pj@sgi.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-07-23 13:03:05 -07:00
..
irq Merge git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc 2006-07-03 15:28:34 -07:00
power [PATCH] remove kernel/power/pm.c:pm_unregister_all() 2006-07-12 16:09:08 -07:00
time [PATCH] time: rename clocksource functions 2006-06-26 09:58:21 -07:00
.gitignore gitignore: ignore more generated files 2006-01-03 11:35:26 +01:00
acct.c [PATCH] Fix sighand->siglock usage in kernel/acct.c 2006-07-14 21:53:54 -07:00
audit.c [NETLINK]: Encapsulate eff_cap usage within security framework. 2006-06-29 16:57:55 -07:00
audit.h [PATCH] add rule filterkey 2006-07-01 05:43:06 -04:00
auditfilter.c [PATCH] audit syscall classes 2006-07-01 07:44:10 -04:00
auditsc.c [PATCH] audit: support for object context filters 2006-07-01 05:44:19 -04:00
capability.c [PATCH] sched: cleanup, remove task_t, convert to struct task_struct 2006-07-03 15:27:11 -07:00
compat.c [PATCH] N32 sigset and __COMPAT_ENDIAN_SWAP__ 2006-06-25 10:01:15 -07:00
configs.c Remove obsolete #include <linux/config.h> 2006-06-30 19:25:36 +02:00
cpu.c cpu hotplug: simplify and hopefully fix locking 2006-07-23 12:12:16 -07:00
cpuset.c [PATCH] Cpuset: fix ABBA deadlock with cpu hotplug lock 2006-07-23 13:03:05 -07:00
delayacct.c [PATCH] per-task-delay-accounting: /proc export of aggregated block I/O delays 2006-07-14 21:53:57 -07:00
dma.c
exec_domain.c Remove obsolete #include <linux/config.h> 2006-06-30 19:25:36 +02:00
exit.c [PATCH] per-task delay accounting taskstats interface: control exit data through cpumasks 2006-07-14 21:53:57 -07:00
extable.c [PATCH] symbol_put_addr() locks kernel 2006-05-15 11:20:55 -07:00
fork.c [PATCH] delay accounting taskstats interface send tgid once 2006-07-14 21:53:57 -07:00
futex_compat.c [PATCH] pi-futex: futex_lock_pi/futex_unlock_pi support 2006-06-27 17:32:47 -07:00
futex.c [PATCH] pi-futex: Validate futex type instead of oopsing 2006-07-10 13:24:18 -07:00
hrtimer.c [PATCH] sched: cleanup, remove task_t, convert to struct task_struct 2006-07-03 15:27:11 -07:00
itimer.c [PATCH] hrtimers: remove data field 2006-03-26 08:57:03 -08:00
kallsyms.c [PATCH] null-terminate over-long /proc/kallsyms symbols 2006-07-14 21:53:52 -07:00
Kconfig.hz [PATCH] i386: Selectable Frequency of the Timer Interrupt 2005-06-23 09:45:10 -07:00
Kconfig.preempt [PATCH] sched: voluntary kernel preemption 2005-06-25 16:24:45 -07:00
kexec.c [POWERPC] Add the use of the firmware soft-reset-nmi to kdump. 2006-06-28 15:18:52 +10:00
kfifo.c [PATCH] gfp flags annotations - part 1 2005-10-08 15:00:57 -07:00
kmod.c [PATCH] lockdep: annotate on-stack completions 2006-07-03 15:27:09 -07:00
kprobes.c [PATCH] Notify page fault call chain 2006-06-26 09:58:22 -07:00
ksysfs.c Remove obsolete #include <linux/config.h> 2006-06-30 19:25:36 +02:00
kthread.c [PATCH] remove kernel/kthread.c:kthread_stop_sem() 2006-07-14 21:53:52 -07:00
lockdep_internals.h [PATCH] lockdep: core 2006-07-03 15:27:03 -07:00
lockdep_proc.c [PATCH] lockdep: procfs 2006-07-03 15:27:04 -07:00
lockdep.c [PATCH] lockdep: core, reduce per-lock class-cache size 2006-07-10 13:24:14 -07:00
Makefile [PATCH] per-task-delay-accounting: taskstats interface 2006-07-14 21:53:56 -07:00
module.c [PATCH] null-terminate over-long /proc/kallsyms symbols 2006-07-14 21:53:52 -07:00
mutex-debug.c [PATCH] lockdep: prove mutex locking correctness 2006-07-03 15:27:04 -07:00
mutex-debug.h [PATCH] lockdep: better lock debugging 2006-07-03 15:27:01 -07:00
mutex.c [PATCH] lockdep: prove mutex locking correctness 2006-07-03 15:27:04 -07:00
mutex.h [PATCH] lockdep: prove mutex locking correctness 2006-07-03 15:27:04 -07:00
panic.c [PATCH] lockdep: disable lock debugging when kernel state becomes untrusted 2006-07-10 13:24:27 -07:00
params.c Remove obsolete #include <linux/config.h> 2006-06-30 19:25:36 +02:00
pid.c [PATCH] sched: cleanup, remove task_t, convert to struct task_struct 2006-07-03 15:27:11 -07:00
posix-cpu-timers.c [PATCH] arm_timer: remove a racy and obsolete PF_EXITING check 2006-06-17 10:52:13 -07:00
posix-timers.c [PATCH] hrtimers: remove data field 2006-03-26 08:57:03 -08:00
printk.c [PATCH] kernel/printk.c: EXPORT_SYMBOL_UNUSED 2006-07-10 13:24:17 -07:00
profile.c Remove obsolete #include <linux/config.h> 2006-06-30 19:25:36 +02:00
ptrace.c [PATCH] sched: cleanup, remove task_t, convert to struct task_struct 2006-07-03 15:27:11 -07:00
rcupdate.c [PATCH] lockdep: locking init debugging improvement 2006-07-03 15:27:02 -07:00
rcutorture.c [PATCH] rcutorture: add call_rcu_bh() operations 2006-06-27 17:32:40 -07:00
relay.c [PATCH] relay: consolidate sendfile() and read() code 2006-03-23 19:58:45 +01:00
resource.c [PATCH] The scheduled unexport of insert_resource 2006-07-12 16:09:08 -07:00
rtmutex_common.h [PATCH] pi-futex: futex_lock_pi/futex_unlock_pi support 2006-06-27 17:32:47 -07:00
rtmutex-debug.c [PATCH] sched: cleanup, remove task_t, convert to struct task_struct 2006-07-03 15:27:11 -07:00
rtmutex-debug.h [PATCH] lockdep: better lock debugging 2006-07-03 15:27:01 -07:00
rtmutex-tester.c [PATCH] Add try_to_freeze() to rt-test kthreads 2006-07-14 21:53:53 -07:00
rtmutex.c [PATCH] sched: cleanup, remove task_t, convert to struct task_struct 2006-07-03 15:27:11 -07:00
rtmutex.h [PATCH] lockdep: better lock debugging 2006-07-03 15:27:01 -07:00
rwsem.c [PATCH] lockdep: prove rwsem locking correctness 2006-07-03 15:27:04 -07:00
sched.c [PATCH] per-task-delay-accounting: cpu delay collection via schedstats 2006-07-14 21:53:56 -07:00
seccomp.c
signal.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/bunk/trivial 2006-06-30 15:39:30 -07:00
softirq.c [PATCH] unexport open_softirq 2006-07-14 21:53:53 -07:00
softlockup.c [PATCH] cpu hotplug: revert initdata patch submitted for 2.6.17 2006-06-27 17:32:41 -07:00
spinlock.c [PATCH] lockdep: prove spinlock rwlock locking correctness 2006-07-03 15:27:04 -07:00
stacktrace.c [PATCH] lockdep: stacktrace subsystem, core 2006-07-03 15:27:02 -07:00
stop_machine.c [PATCH] revert "kthread: convert stop_machine into a kthread" 2006-07-03 21:25:20 -07:00
sys_ni.c [PATCH] sys_move_pages: 32bit support (i386, x86_64) 2006-06-23 07:42:53 -07:00
sys.c [PATCH] Fix prctl privilege escalation and suid_dumpable (CVE-2006-2451) 2006-07-12 12:50:25 -07:00
sysctl.c [PATCH] ZVC/zone_reclaim: Leave 1% of unmapped pagecache pages for file I/O 2006-07-03 15:26:59 -07:00
taskstats.c [PATCH] Remove down_write() from taskstats code invoked on the exit() path 2006-07-14 21:53:57 -07:00
time.c [PATCH] Time: Introduce arch generic time accessors 2006-06-26 09:58:20 -07:00
timer.c [PATCH] improve timekeeping resume robustness 2006-07-14 21:53:54 -07:00
uid16.c [PATCH] Add more prevent_tail_call() 2006-04-19 16:27:18 -07:00
unwind.c [PATCH] x86_64: allow unwinder to build without module support 2006-06-26 10:48:18 -07:00
user.c [PATCH] selinux: add hooks for key subsystem 2006-06-22 15:05:55 -07:00
wait.c [PATCH] uninline init_waitqueue_head() 2006-07-10 13:24:25 -07:00
workqueue.c Merge master.kernel.org:/pub/scm/linux/kernel/git/davej/cpufreq 2006-07-04 14:00:26 -07:00