linux/kernel
Christoph Lameter 0a2966b48f [PATCH] Fix longstanding load balancing bug in the scheduler
The scheduler will stop load balancing if the most busy processor contains
processes pinned via processor affinity.

The scheduler currently only does one search for busiest cpu.  If it cannot
pull any tasks away from the busiest cpu because they were pinned then the
scheduler goes into a corner and sulks leaving the idle processors idle.

F.e.  If you have processor 0 busy running four tasks pinned via taskset,
there are none on processor 1 and one just started two processes on
processor 2 then the scheduler will not move one of the two processes away
from processor 2.

This patch fixes that issue by forcing the scheduler to come out of its
corner and retrying the load balancing by considering other processors for
load balancing.

This patch was originally developed by John Hawkes and discussed at

    http://marc.theaimsgroup.com/?l=linux-kernel&m=113901368523205&w=2.

I have removed extraneous material and gone back to equipping struct rq
with the cpu the queue is associated with since this makes the patch much
easier and it is likely that others in the future will have the same
difficulty of figuring out which processor owns which runqueue.

The overhead added through these patches is a single word on the stack if
the kernel is configured to support 32 cpus or less (32 bit).  For 32 bit
environments the maximum number of cpus that can be configued is 255 which
would result in the use of 32 bytes additional on the stack.  On IA64 up to
1k cpus can be configured which will result in the use of 128 additional
bytes on the stack.  The maximum additional cache footprint is one
cacheline.  Typically memory use will be much less than a cacheline and the
additional cpumask will be placed on the stack in a cacheline that already
contains other local variable.

Signed-off-by: Christoph Lameter <clameter@sgi.com>
Cc: John Hawkes <hawkes@sgi.com>
Cc: "Siddha, Suresh B" <suresh.b.siddha@intel.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Cc: Peter Williams <pwil3058@bigpond.net.au>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-09-26 08:48:43 -07:00
..
irq [PATCH] genirq core: fix handle_level_irq() 2006-09-19 07:57:20 -07:00
power [PATCH] prevent swsusp with PAE 2006-09-06 11:00:02 -07:00
time [PATCH] time: rename clocksource functions 2006-06-26 09:58:21 -07:00
.gitignore gitignore: ignore more generated files 2006-01-03 11:35:26 +01:00
acct.c [PATCH] Fix sighand->siglock usage in kernel/acct.c 2006-07-14 21:53:54 -07:00
audit.c [PATCH] sanity check audit_buffer 2006-09-11 13:32:17 -04:00
audit.h [PATCH] audit: AUDIT_PERM support 2006-09-11 13:32:30 -04:00
auditfilter.c [PATCH] audit: AUDIT_PERM support 2006-09-11 13:32:30 -04:00
auditsc.c [PATCH] audit: AUDIT_PERM support 2006-09-11 13:32:30 -04:00
capability.c [PATCH] sched: cleanup, remove task_t, convert to struct task_struct 2006-07-03 15:27:11 -07:00
compat.c [PATCH] N32 sigset and __COMPAT_ENDIAN_SWAP__ 2006-06-25 10:01:15 -07:00
configs.c Remove obsolete #include <linux/config.h> 2006-06-30 19:25:36 +02:00
cpu.c cpu hotplug: simplify and hopefully fix locking 2006-07-23 12:12:16 -07:00
cpuset.c [PATCH] cpuset: oom panic fix 2006-08-27 11:01:32 -07:00
delayacct.c [PATCH] task delay accounting fixes 2006-09-01 11:39:08 -07:00
dma.c
exec_domain.c Remove obsolete #include <linux/config.h> 2006-06-30 19:25:36 +02:00
exit.c [PATCH] eligible_child: remove an obsolete ->tgid check 2006-09-02 14:51:27 -07:00
extable.c [PATCH] symbol_put_addr() locks kernel 2006-05-15 11:20:55 -07:00
fork.c [PATCH] task delay accounting fixes 2006-09-01 11:39:08 -07:00
futex_compat.c [PATCH] futex: Apply recent futex fixes to futex_compat 2006-08-06 08:57:49 -07:00
futex.c [PATCH] Use the correct restart option for futex_lock_pi 2006-09-08 10:22:50 -07:00
hrtimer.c [PATCH] fix hrtimer percpu usage typo 2006-08-14 12:54:28 -07:00
itimer.c [PATCH] hrtimers: remove data field 2006-03-26 08:57:03 -08:00
kallsyms.c [PATCH] null-terminate over-long /proc/kallsyms symbols 2006-07-14 21:53:52 -07:00
Kconfig.hz [PATCH] i386: Selectable Frequency of the Timer Interrupt 2005-06-23 09:45:10 -07:00
Kconfig.preempt [PATCH] sched: voluntary kernel preemption 2005-06-25 16:24:45 -07:00
kexec.c [POWERPC] Add the use of the firmware soft-reset-nmi to kdump. 2006-06-28 15:18:52 +10:00
kfifo.c [PATCH] gfp flags annotations - part 1 2005-10-08 15:00:57 -07:00
kmod.c [PATCH] bug fix in kernel/kmod.c 2006-09-16 12:54:32 -07:00
kprobes.c [PATCH] IA64: kprobe invalidate icache of jump buffer 2006-07-31 13:28:38 -07:00
ksysfs.c Remove obsolete #include <linux/config.h> 2006-06-30 19:25:36 +02:00
kthread.c [PATCH] remove kernel/kthread.c:kthread_stop_sem() 2006-07-14 21:53:52 -07:00
lockdep_internals.h [PATCH] lockdep: double the number of stack-trace entries 2006-09-13 07:32:14 -07:00
lockdep_proc.c [PATCH] lockdep: procfs 2006-07-03 15:27:04 -07:00
lockdep.c [PATCH] lockdep: core, reduce per-lock class-cache size 2006-07-10 13:24:14 -07:00
Makefile [PATCH] per-task-delay-accounting: taskstats interface 2006-07-14 21:53:56 -07:00
module.c [PATCH] load_module: no BUG if module_subsys uninitialized 2006-09-25 17:38:36 -07:00
mutex-debug.c [PATCH] lockdep: prove mutex locking correctness 2006-07-03 15:27:04 -07:00
mutex-debug.h [PATCH] lockdep: better lock debugging 2006-07-03 15:27:01 -07:00
mutex.c [PATCH] lockdep: prove mutex locking correctness 2006-07-03 15:27:04 -07:00
mutex.h [PATCH] lockdep: prove mutex locking correctness 2006-07-03 15:27:04 -07:00
panic.c [PATCH] lockdep: do not touch console state when tainting the kernel 2006-09-06 11:00:02 -07:00
params.c Remove obsolete #include <linux/config.h> 2006-06-30 19:25:36 +02:00
pid.c [PATCH] sched: cleanup, remove task_t, convert to struct task_struct 2006-07-03 15:27:11 -07:00
posix-cpu-timers.c [PATCH] arm_timer: remove a racy and obsolete PF_EXITING check 2006-06-17 10:52:13 -07:00
posix-timers.c [PATCH] hrtimers: remove data field 2006-03-26 08:57:03 -08:00
printk.c [PATCH] vt: printk: Fix framebuffer console triggering might_sleep assertion 2006-08-06 08:57:47 -07:00
profile.c Remove obsolete #include <linux/config.h> 2006-06-30 19:25:36 +02:00
ptrace.c [PATCH] sched: cleanup, remove task_t, convert to struct task_struct 2006-07-03 15:27:11 -07:00
rcupdate.c [PATCH] rcu_do_batch: make ->qlen decrement irq safe 2006-09-13 07:32:14 -07:00
rcutorture.c [PATCH] rcutorture: add call_rcu_bh() operations 2006-06-27 17:32:40 -07:00
relay.c [PATCH] relay: consolidate sendfile() and read() code 2006-03-23 19:58:45 +01:00
resource.c [PATCH] memory hotadd fixes: find_next_system_ram catch range fix 2006-08-06 08:57:48 -07:00
rtmutex_common.h [PATCH] pi-futex: futex_lock_pi/futex_unlock_pi support 2006-06-27 17:32:47 -07:00
rtmutex-debug.c [PATCH] sched: cleanup, remove task_t, convert to struct task_struct 2006-07-03 15:27:11 -07:00
rtmutex-debug.h [PATCH] lockdep: better lock debugging 2006-07-03 15:27:01 -07:00
rtmutex-tester.c [PATCH] Add try_to_freeze() to rt-test kthreads 2006-07-14 21:53:53 -07:00
rtmutex.c [PATCH] reference rt-mutex-design in rtmutex.c 2006-07-31 13:28:43 -07:00
rtmutex.h [PATCH] lockdep: better lock debugging 2006-07-03 15:27:01 -07:00
rwsem.c [PATCH] lockdep: prove rwsem locking correctness 2006-07-03 15:27:04 -07:00
sched.c [PATCH] Fix longstanding load balancing bug in the scheduler 2006-09-26 08:48:43 -07:00
seccomp.c
signal.c Fix force_sig_info() semantics after cleanups 2006-08-02 20:17:49 -07:00
softirq.c [PATCH] Reducing local_bh_enable/disable overhead in irqtrace 2006-07-31 13:28:42 -07:00
softlockup.c [PATCH] cpu hotplug: replace __devinit* with __cpuinit* for cpu notifications 2006-07-31 13:28:39 -07:00
spinlock.c [PATCH] lockdep ifdef fix 2006-09-06 11:00:01 -07:00
stacktrace.c [PATCH] lockdep: stacktrace subsystem, core 2006-07-03 15:27:02 -07:00
stop_machine.c [PATCH] Remove redundant up() in stop_machine() 2006-08-27 11:01:31 -07:00
sys_ni.c [PATCH] sys_move_pages: 32bit support (i386, x86_64) 2006-06-23 07:42:53 -07:00
sys.c [PATCH] Fix prctl privilege escalation and suid_dumpable (CVE-2006-2451) 2006-07-12 12:50:25 -07:00
sysctl.c [PATCH] ZVC/zone_reclaim: Leave 1% of unmapped pagecache pages for file I/O 2006-07-03 15:26:59 -07:00
taskstats.c [NETLINK]: Extend netlink messaging interface 2006-09-22 14:53:43 -07:00
time.c [PATCH] Time: Introduce arch generic time accessors 2006-06-26 09:58:20 -07:00
timer.c [PATCH] sys_getppid oopses on debug kernel 2006-08-14 12:54:29 -07:00
uid16.c [PATCH] Add more prevent_tail_call() 2006-04-19 16:27:18 -07:00
unwind.c [PATCH] x86_64: allow unwinder to build without module support 2006-06-26 10:48:18 -07:00
user.c [PATCH] selinux: add hooks for key subsystem 2006-06-22 15:05:55 -07:00
wait.c [PATCH] uninline init_waitqueue_head() 2006-07-10 13:24:25 -07:00
workqueue.c [PATCH] workqueue: remove lock_cpu_hotplug() 2006-08-14 12:54:29 -07:00