linux/kernel
Tejun Heo e22bee782b workqueue: implement concurrency managed dynamic worker pool
Instead of creating a worker for each cwq and putting it into the
shared pool, manage per-cpu workers dynamically.

Works aren't supposed to be cpu cycle hogs, so maintaining just enough
concurrency to prevent work processing from stalling due to lack of
processing context is optimal.  gcwq keeps the number of concurrently
active workers at the minimum necessary, but no lower.  As long as one
or more workers are running on the cpu, no new worker is scheduled, so
that works can be processed in batches as much as possible; but when
the last running worker blocks, gcwq immediately schedules a new worker
so that the cpu doesn't sit idle while there are works to be processed.
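The batching rule above can be sketched as a tiny user-space C model.
Everything in it (struct gcwq_model, needs_another_worker(),
on_queue_work(), on_worker_sleep(), on_worker_wakeup()) is a
hypothetical simplification for illustration, not the code in
workqueue.c: plain counters stand in for the real per-cpu state, and
actually executing a work (and so draining nr_pending) is not modeled.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-cpu state; NOT the kernel's gcwq structure. */
struct gcwq_model {
	int nr_running;	/* workers currently runnable on this cpu */
	int nr_idle;	/* idle workers parked on this cpu */
	int nr_pending;	/* works queued but not yet started */
};

/* More concurrency is needed only if works are pending and nothing runs. */
static bool needs_another_worker(const struct gcwq_model *g)
{
	return g->nr_pending > 0 && g->nr_running == 0;
}

/* Wake one idle worker if needed; returns true if one was woken. */
static bool wake_idle_if_needed(struct gcwq_model *g)
{
	if (needs_another_worker(g) && g->nr_idle > 0) {
		g->nr_idle--;
		g->nr_running++;
		return true;
	}
	return false;
}

/* Queue a work: while some worker is running, leave it for batching. */
static bool on_queue_work(struct gcwq_model *g)
{
	g->nr_pending++;
	return wake_idle_if_needed(g);
}

/* A running worker blocks: if it was the last one, wake a replacement. */
static bool on_worker_sleep(struct gcwq_model *g)
{
	assert(g->nr_running > 0);
	g->nr_running--;
	return wake_idle_if_needed(g);
}

/* A blocked worker becomes runnable again. */
static void on_worker_wakeup(struct gcwq_model *g)
{
	g->nr_running++;
}
```

Queuing a second work while one worker is running wakes nobody; only
when that worker blocks does on_worker_sleep() bring in an idle one.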

gcwq always keeps at least one idle worker around.  When a new worker
is necessary and the worker is the last idle one, the worker assumes
the role of "manager" and manages the worker pool, i.e. creates another
worker.  Forward progress is guaranteed by having dedicated rescue
workers for workqueues which may be needed while creating a new worker.
When the manager has trouble creating a new worker, the mayday timer
activates and rescue workers are summoned to the cpu to execute works
which might be necessary to create new workers.
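The manager role can be sketched the same way.  This is a hypothetical
model, not the kernel's implementation: struct pool_model, leave_idle(),
try_manage(), the create_ok()/create_fail() stubs standing in for
kernel thread creation, and MAYDAY_AFTER_FAILURES are all invented for
illustration.  In particular, the real mayday mechanism is timer
driven; a simple count of failed creation attempts stands in for that
timer here.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical pool state; NOT the kernel's data structures. */
struct pool_model {
	int nr_idle;		/* idle workers kept in reserve */
	int nr_workers;		/* total workers on this cpu */
	int create_failures;	/* consecutive failed creation attempts */
	bool mayday;		/* rescuers have been summoned */
};

/* Stand-in for the mayday timer: summon rescuers after this many failures. */
#define MAYDAY_AFTER_FAILURES 3

/* Stubs standing in for spawning a worker thread. */
static bool create_ok(void) { return true; }
static bool create_fail(void) { return false; }

/* Manager step: only acts when no spare idle worker is left. */
static bool try_manage(struct pool_model *p, bool (*try_create)(void))
{
	if (p->nr_idle > 0)
		return false;		/* a spare idle worker still exists */

	if (try_create()) {
		p->nr_workers++;
		p->nr_idle++;		/* the new worker starts out idle */
		p->create_failures = 0;
		p->mayday = false;
		return true;
	}
	if (++p->create_failures >= MAYDAY_AFTER_FAILURES)
		p->mayday = true;	/* let rescuers run the works */
	return false;
}

/* A worker leaves the idle list; if it was the last, it becomes manager. */
static bool leave_idle(struct pool_model *p, bool (*try_create)(void))
{
	assert(p->nr_idle > 0);
	p->nr_idle--;
	return try_manage(p, try_create);
}
```

When creation keeps failing, repeated try_manage() calls eventually set
mayday, which is the point at which rescuers would be summoned.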

Trustee is expanded to serve the role of manager while a CPU is being
taken down and for as long as it stays down.  As no new works are
supposed to be queued on a dead cpu, it just needs to drain all the
existing ones.  Trustee continues to try to create new workers and
summon rescuers as long as there are pending works.  If the CPU is
brought back up while the trustee is still trying to drain the gcwq
from the previous offlining, the trustee kills all idle workers, tells
workers which are still busy to rebind to the cpu, and passes control
over to gcwq, which assumes the manager role as necessary.
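The trustee handoff described above reduces to a small state change.
The sketch below is a hypothetical model only: struct trustee_model,
trustee_keeps_draining() and trustee_hand_back() are invented names,
and worker creation and rescuer summoning during the drain are left out.

```c
#include <stdbool.h>

/* Hypothetical state of one cpu's pool; NOT the kernel's structures. */
struct trustee_model {
	bool cpu_online;
	bool trustee_in_charge;
	int nr_idle;		/* idle workers */
	int nr_busy;		/* workers still executing works */
	int nr_pending;		/* works left to drain */
	int nr_rebind;		/* busy workers asked to rebind to the cpu */
};

/* While the cpu is down, the trustee keeps going until nothing is pending. */
static bool trustee_keeps_draining(const struct trustee_model *t)
{
	return t->trustee_in_charge && !t->cpu_online && t->nr_pending > 0;
}

/* The cpu came back before draining finished: hand control back to gcwq. */
static void trustee_hand_back(struct trustee_model *t)
{
	t->cpu_online = true;
	t->nr_idle = 0;			/* idle workers are killed */
	t->nr_rebind = t->nr_busy;	/* busy workers rebind to the cpu */
	t->trustee_in_charge = false;	/* gcwq manages workers again */
}
```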

Concurrency managed worker pool reduces the number of workers
drastically.  Only workers which are necessary to keep the processing
going are created and kept.  It also reduces cache footprint by
avoiding unnecessary context switches between different workers.

Please note that this patch does not increase max_active of any
workqueue.  All workqueues can still only process one work per cpu.

Signed-off-by: Tejun Heo <tj@kernel.org>
2010-06-29 10:07:14 +02:00
debug module: fix kdb's illicit use of struct module_use. 2010-06-05 11:17:36 +09:30
gcov microblaze: Enable GCOV_PROFILE_ALL 2009-09-21 14:29:21 +02:00
irq genirq: Clear CPU mask in affinity_hint when none is provided 2010-05-12 11:23:34 +02:00
power workqueue: reimplement workqueue freeze using max_active 2010-06-29 10:07:12 +02:00
time Merge branch 'timers-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip 2010-05-19 17:11:10 -07:00
trace workqueue: temporarily remove workqueue tracing 2010-06-29 10:07:11 +02:00
.gitignore
acct.c Merge branch 'next' into for-linus 2010-05-18 08:57:00 +10:00
async.c include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h 2010-03-30 22:02:32 +09:00
audit_tree.c include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h 2010-03-30 22:02:32 +09:00
audit_watch.c include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h 2010-03-30 22:02:32 +09:00
audit.c include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h 2010-03-30 22:02:32 +09:00
audit.h
auditfilter.c include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h 2010-03-30 22:02:32 +09:00
auditsc.c audit: preface audit printk with audit 2010-04-05 13:19:45 -07:00
backtracetest.c
bounds.c kbuild: move bounds.h to include/generated 2009-12-12 13:08:14 +01:00
capability.c sched: Remove remaining USER_SCHED code 2010-04-02 20:12:00 +02:00
cgroup_freezer.c Freezer / cgroup freezer: Update stale locking comments 2010-05-10 23:18:47 +02:00
cgroup.c cgroups: alloc_css_id() increments hierarchy depth 2010-06-04 15:21:45 -07:00
compat.c cpumask: fix compat getaffinity 2010-05-19 11:48:18 -07:00
configs.c
cpu.c sched: adjust when cpu_active and cpuset configurations are updated during cpu on/offlining 2010-06-08 21:40:36 +02:00
cpuset.c sched: adjust when cpu_active and cpuset configurations are updated during cpu on/offlining 2010-06-08 21:40:36 +02:00
cred.c umh: creds: kill subprocess_info->cred logic 2010-05-27 09:12:45 -07:00
delayacct.c headers: taskstats_kern.h trim 2009-09-18 09:48:52 -07:00
dma.c
early_res.c x86: Do not free zero sized per cpu areas 2010-03-29 18:55:40 +02:00
elfcore.c elf coredump: add extended numbering support 2010-03-06 11:26:46 -08:00
exec_domain.c sys_personality: change sys_personality() to accept "unsigned int" instead of u_long 2010-06-04 15:21:45 -07:00
exit.c proc: turn signal_struct->count into "int nr_threads" 2010-05-27 09:12:47 -07:00
extable.c
fork.c sched: add hooks for workqueue 2010-06-08 21:40:37 +02:00
freezer.c
futex_compat.c futex: Protect pid lookup in compat code with RCU 2009-12-09 14:22:14 +01:00
futex.c futex: Handle futex value corruption gracefully 2010-02-03 15:13:22 +01:00
groups.c security: remove dead hook task_setgroups 2010-04-12 12:19:18 +10:00
hrtimer.c hrtimer: Avoid double seqlock 2010-05-26 16:15:37 +02:00
hung_task.c softlockup: Fix hung_task_check_count sysctl 2009-11-27 06:21:57 +01:00
hw_breakpoint.c hw_breakpoints: Fix percpu build failure 2010-05-04 08:39:36 +02:00
itimer.c itimers: Fix racy writes to cpu_itimer fields 2009-11-18 16:32:12 +01:00
kallsyms.c kdb: core for kgdb back end (2 of 2) 2010-05-20 21:04:21 -05:00
Kconfig.freezer
Kconfig.hz
Kconfig.locks mutex: Better control mutex adaptive spinning config 2009-12-03 11:50:11 +01:00
Kconfig.preempt
kexec.c kexec: fix OOPS in crash_kernel_shrink 2010-05-11 17:33:42 -07:00
kfifo.c kfifo: Don't use integer as NULL pointer 2010-02-16 15:11:08 -08:00
kmod.c call_usermodehelper: UMH_WAIT_EXEC ignores kernel_thread() failure 2010-05-27 09:12:45 -07:00
kprobes.c kprobes: Move enable/disable_kprobe() out from debugfs code 2010-05-08 18:08:30 +02:00
ksysfs.c sysfs: add struct file* to bin_attr callbacks 2010-05-21 09:37:31 -07:00
kthread.c kthread: implement kthread_data() 2010-06-29 10:07:09 +02:00
latencytop.c include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h 2010-03-30 22:02:32 +09:00
lockdep_internals.h lockdep: No need to disable preemption in debug atomic ops 2010-05-04 05:38:16 +02:00
lockdep_proc.c lockstat: Make lockstat counting per cpu 2010-04-06 00:15:37 +02:00
lockdep_states.h
lockdep.c lockdep: Add novalidate class for dev->mutex conversion 2010-05-21 09:37:30 -07:00
Makefile Move kernel/kgdb.c to kernel/debug/debug_core.c 2010-05-20 21:04:18 -05:00
module.c module: fix bne2 "gave up waiting for init of module libcrc32c" 2010-06-05 11:17:37 +09:30
mutex-debug.c headers: remove sched.h from interrupt.h 2009-10-11 11:20:58 -07:00
mutex-debug.h locking: Implement new raw_spinlock 2009-12-14 23:55:32 +01:00
mutex.c mutex: Fix optimistic spinning vs. BKL 2010-05-19 08:18:44 +02:00
mutex.h
notifier.c sched: Use lockdep-based checking on rcu_dereference() 2010-02-25 10:34:26 +01:00
ns_cgroup.c cgroups: let ss->can_attach and ss->attach do whole threadgroups at a time 2009-09-24 07:20:58 -07:00
nsproxy.c include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h 2010-03-30 22:02:32 +09:00
padata.c kernel/: convert cpu notifier to return encapsulate errno value 2010-05-27 09:12:48 -07:00
panic.c panic: call console_verbose() in panic 2010-05-27 09:12:53 -07:00
params.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial 2010-03-12 16:04:50 -08:00
perf_event.c perf: Fix signed comparison in perf_adjust_period() 2010-06-08 18:43:00 +02:00
pid_namespace.c include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h 2010-03-30 22:02:32 +09:00
pid.c pids: increase pid_max based on num_possible_cpus 2010-05-27 09:12:51 -07:00
pm_qos_params.c PM: PM QOS update fix 2010-05-17 00:21:03 +02:00
posix-cpu-timers.c posix-cpu-timers: avoid "task->signal != NULL" checks 2010-05-27 09:12:46 -07:00
posix-timers.c posix_timer: Fix error path in timer_create 2010-05-27 22:38:15 +02:00
printk.c printk,kdb: capture printk() when in kdb shell 2010-05-20 21:04:27 -05:00
profile.c numa: in-kernel profiling: use cpu_to_mem() for per cpu allocations 2010-05-27 09:12:57 -07:00
ptrace.c ptrace: PTRACE_GETFDPIC: fix the unsafe usage of child->mm 2010-05-27 09:12:44 -07:00
range.c x86: Change range end to start+size 2010-02-10 17:47:17 -08:00
rcupdate.c rcu: slim down rcutiny by removing rcu_scheduler_active and friends 2010-05-10 11:08:34 -07:00
rcutiny_plugin.h rcu: slim down rcutiny by removing rcu_scheduler_active and friends 2010-05-10 11:08:34 -07:00
rcutiny.c rcu: remove all rcu head initializations, except on_stack initializations 2010-05-11 16:10:47 -07:00
rcutorture.c Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip 2010-05-18 08:27:54 -07:00
rcutree_plugin.h rcu: remove all rcu head initializations, except on_stack initializations 2010-05-11 16:10:47 -07:00
rcutree_trace.c rcu: reduce the number of spurious RCU_SOFTIRQ invocations 2010-05-10 11:08:35 -07:00
rcutree.c rcu: remove all rcu head initializations, except on_stack initializations 2010-05-11 16:10:47 -07:00
rcutree.h rcu: reduce the number of spurious RCU_SOFTIRQ invocations 2010-05-10 11:08:35 -07:00
relay.c kernel/: convert cpu notifier to return encapsulate errno value 2010-05-27 09:12:48 -07:00
res_counter.c include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h 2010-03-30 22:02:32 +09:00
resource.c resource: shared I/O region support 2010-05-11 12:01:10 -07:00
rtmutex_common.h
rtmutex-debug.c sched: Convert pi_lock to raw_spinlock 2009-12-14 23:55:33 +01:00
rtmutex-debug.h
rtmutex-tester.c
rtmutex.c rtmutes: Convert rtmutex.lock to raw_spinlock 2009-12-14 23:55:33 +01:00
rtmutex.h
rwsem.c
sched_clock.c blkio: fix for modular blk-cgroup build 2010-04-15 08:54:59 +02:00
sched_cpupri.c include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h 2010-03-30 22:02:32 +09:00
sched_cpupri.h sched: Convert cpupri lock to raw_spinlock 2009-12-14 23:55:33 +01:00
sched_debug.c proc_sched_show_task(): use get_nr_threads() 2010-05-27 09:12:47 -07:00
sched_fair.c sched: Fix wake_affine() vs RT tasks 2010-06-01 09:27:16 +02:00
sched_features.h sched: Remove ASYM_GRAN feature 2010-03-11 18:32:53 +01:00
sched_idletask.c sched: Cure load average vs NO_HZ woes 2010-04-23 11:02:02 +02:00
sched_rt.c sched: Add enqueue/dequeue flags 2010-04-02 20:12:05 +02:00
sched_stats.h
sched.c sched: add hooks for workqueue 2010-06-08 21:40:37 +02:00
seccomp.c
semaphore.c
signal.c exit: change zap_other_threads() to count sub-threads 2010-05-27 09:12:46 -07:00
slow-work-debugfs.c SLOW_WORK: Move slow_work's proc file to debugfs 2009-12-01 08:20:31 -08:00
slow-work.c slow-work: use get_ref wrapper instead of directly calling get_ref 2010-03-29 09:13:30 -07:00
slow-work.h SLOW_WORK: CONFIG_SLOW_WORK_PROC should be CONFIG_SLOW_WORK_DEBUG 2010-03-29 09:14:47 -07:00
smp.c kernel/: convert cpu notifier to return encapsulate errno value 2010-05-27 09:12:48 -07:00
softirq.c kernel/: fix BUG_ON checks for cpu notifier callbacks direct call 2010-06-04 15:21:45 -07:00
softlockup.c softlockup: Stop spurious softlockup messages due to overflow 2010-03-21 19:30:13 +01:00
spinlock.c locking: Cleanup the name space completely 2009-12-14 23:55:33 +01:00
srcu.c include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h 2010-03-30 22:02:32 +09:00
stacktrace.c
stop_machine.c sched: Make sure timers have migrated before killing the migration_thread 2010-05-31 08:37:44 +02:00
sys_ni.c Add generic sys_ipc wrapper 2010-03-12 15:52:32 -08:00
sys.c kmod: add init function to usermodehelper 2010-05-27 09:12:44 -07:00
sysctl_binary.c sysctl: don't use own implementation of hex_to_bin() 2010-05-25 08:07:05 -07:00
sysctl_check.c ipv4 05/05: add sysctl to accept packets with local source addresses 2009-12-03 12:14:38 -08:00
sysctl.c pipe: change /proc/sys/fs/pipe-max-pages to byte sized interface 2010-06-03 14:54:39 +02:00
taskstats.c include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h 2010-03-30 22:02:32 +09:00
test_kprobes.c
time.c timekeeping: Fix timezone update 2010-05-24 11:50:38 +02:00
timeconst.pl
timer.c kernel/: fix BUG_ON checks for cpu notifier callbacks direct call 2010-06-04 15:21:45 -07:00
tracepoint.c tracing: Let tracepoints have data passed to tracepoint callbacks 2010-05-14 09:50:34 -04:00
tsacct.c mm: clean up mm_counter 2010-03-06 11:26:23 -08:00
uid16.c headers: utsname.h redux 2009-09-23 18:13:10 -07:00
up.c
user_namespace.c kref: remove kref_set 2010-05-21 09:37:29 -07:00
user-return-notifier.c core: Clean up user return notifers use of per_cpu 2009-12-02 10:22:59 +01:00
user.c sched: Remove a stale comment 2010-05-10 08:48:39 +02:00
utsname_sysctl.c sysctl kernel: Remove binary sysctl logic 2009-11-12 02:04:55 -08:00
utsname.c
wait.c
workqueue_sched.h workqueue: implement concurrency managed dynamic worker pool 2010-06-29 10:07:14 +02:00
workqueue.c workqueue: implement concurrency managed dynamic worker pool 2010-06-29 10:07:14 +02:00