linux/kernel
Tejun Heo ed27b9f7a1 cgroup: don't hold css_set_rwsem across css task iteration
css_sets are synchronized through css_set_rwsem but the locking scheme
is kinda bizarre.  The hot paths - fork and exit - have to write lock
the rwsem making the rw part pointless; furthermore, many readers
already hold cgroup_mutex.

One of the readers is css task iteration.  It read locks the rwsem
over the entire duration of iteration.  This leads to silly locking
behavior.  When cpuset tries to migrate processes of a cgroup to a
different NUMA node, css_set_rwsem is held across the entire migration
attempt which can take a long time locking out forking, exiting and
other cgroup operations.

This patch updates css task iteration so that it locks css_set_rwsem
only while the iterator is being advanced.  css task iteration
involves two levels - css_set and task iteration.  As css_sets in use
are practically immutable, simply pinning the current one is enough
for resuming iteration afterwards.  Task iteration is tricky as tasks
may leave their css_set while iteration is in progress.  This is
solved by keeping track of active iterators and advancing them if
their next task leaves its css_set.

v2: put_task_struct() in css_task_iter_next() moved outside
    css_set_rwsem.  A later patch will add cgroup operations to
    task_struct free path which may grab the same lock and this avoids
    deadlock possibilities.

    css_set_move_task() updated to use list_for_each_entry_safe() when
    walking task_iters and advancing them.  This is necessary as
    advancing an iter may remove it from the list.

Signed-off-by: Tejun Heo <tj@kernel.org>
2015-10-15 16:41:52 -04:00
..
bpf bpf: fix out of bounds access in verifier log 2015-09-09 14:11:55 -07:00
configs kconfig: add xenconfig defconfig helper 2015-06-16 11:04:29 +01:00
debug
events kexec: split kexec_load syscall from kexec core code 2015-09-10 13:29:01 -07:00
gcov gcov: add support for GCC 5.1 2015-06-30 19:44:57 -07:00
irq Merge branch 'irq-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2015-09-01 14:33:35 -07:00
livepatch livepatch: Improve error handling in klp_disable_func() 2015-07-14 22:48:06 +02:00
locking Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2015-09-05 20:34:28 -07:00
power Merge branch 'for-4.3/core' of git://git.kernel.dk/linux-block 2015-09-02 13:10:25 -07:00
printk kexec: split kexec_load syscall from kexec core code 2015-09-10 13:29:01 -07:00
rcu rcu,locking: Privatize smp_mb__after_unlock_lock() 2015-08-04 08:49:21 -07:00
sched userfaultfd: waitqueue: add nr wake parameter to __wake_up_locked_key 2015-09-04 16:54:41 -07:00
time Merge branch 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2015-09-01 14:04:50 -07:00
trace Mostly this is just clean ups and micro optimizations. 2015-09-08 14:04:14 -07:00
.gitignore
acct.c
async.c
audit_fsnotify.c audit: clean simple fsnotify implementation 2015-08-06 16:14:53 -04:00
audit_tree.c Merge branch 'upstream' of git://git.infradead.org/users/pcmoore/audit 2015-09-08 13:34:59 -07:00
audit_watch.c Merge branch 'upstream' of git://git.infradead.org/users/pcmoore/audit 2015-09-08 13:34:59 -07:00
audit.c Merge branch 'upstream' of git://git.infradead.org/users/pcmoore/audit 2015-09-08 13:34:59 -07:00
audit.h Merge branch 'upstream' of git://git.infradead.org/users/pcmoore/audit 2015-09-08 13:34:59 -07:00
auditfilter.c audit: implement audit by executable 2015-08-06 16:17:25 -04:00
auditsc.c Merge branch 'upstream' of git://git.infradead.org/users/pcmoore/audit 2015-09-08 13:34:59 -07:00
backtracetest.c
bounds.c
capability.c
cgroup_freezer.c cgroup: allow a cgroup subsystem to reject a fork 2015-07-14 17:29:23 -04:00
cgroup_pids.c cgroup: pids: fix invalid get/put usage 2015-08-25 14:19:25 -04:00
cgroup.c cgroup: don't hold css_set_rwsem across css task iteration 2015-10-15 16:41:52 -04:00
compat.c compat: cleanup coding in compat_get_bitmap() and compat_put_bitmap() 2015-06-04 23:57:18 +02:00
configs.c
context_tracking.c
cpu_pm.c kernel/cpu_pm: fix cpu_cluster_pm_exit comment 2015-09-03 02:42:20 +02:00
cpu.c Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2015-08-31 20:26:22 -07:00
cpuset.c cgroup: replace cgroup_has_tasks() with cgroup_is_populated() 2015-10-15 16:41:50 -04:00
crash_dump.c
cred.c kernel/cred.c: remove unnecessary kdebug atomic reads 2015-09-10 13:29:01 -07:00
delayacct.c
dma.c
elfcore.c
exec_domain.c
exit.c kernel: exit: fix typo in comment 2015-08-07 13:59:49 +02:00
extable.c kernel/extable.c: remove duplicated include 2015-09-10 13:29:01 -07:00
fork.c sched, cgroup: replace signal_struct->group_rwsem with a global percpu_rwsem 2015-09-16 12:53:17 -04:00
freezer.c
futex_compat.c
futex.c futex: Make should_fail_futex() static 2015-07-20 21:43:54 +02:00
groups.c
hung_task.c
irq_work.c
jump_label.c locking/static_keys: Add selftest 2015-08-03 11:34:16 +02:00
kallsyms.c
kcmp.c
Kconfig.freezer
Kconfig.hz
Kconfig.locks
Kconfig.preempt
kexec_core.c kexec: export KERNEL_IMAGE_SIZE to vmcoreinfo 2015-09-10 13:29:01 -07:00
kexec_file.c kexec: split kexec_file syscall code to kexec_file.c 2015-09-10 13:29:01 -07:00
kexec_internal.h kexec: split kexec_file syscall code to kexec_file.c 2015-09-10 13:29:01 -07:00
kexec.c kexec: split kexec_load syscall from kexec core code 2015-09-10 13:29:01 -07:00
kmod.c kmod: handle UMH_WAIT_PROC from system unbound workqueue 2015-09-10 13:29:01 -07:00
kprobes.c perf/x86/hw_breakpoints: Disallow kernel breakpoints unless kprobe-safe 2015-08-04 10:16:54 +02:00
ksysfs.c kexec: split kexec_load syscall from kexec core code 2015-09-10 13:29:01 -07:00
kthread.c kernel/kthread.c:kthread_create_on_node(): clarify documentation 2015-09-04 16:54:41 -07:00
latencytop.c
Makefile sys_membarrier(): system-wide memory barrier (generic, x86) 2015-09-11 15:21:34 -07:00
membarrier.c sys_membarrier(): system-wide memory barrier (generic, x86) 2015-09-11 15:21:34 -07:00
memremap.c add devm_memremap_pages 2015-08-27 19:40:58 -04:00
module_signing.c PKCS#7: Appropriately restrict authenticated attributes and content type 2015-08-12 17:01:01 +01:00
module-internal.h
module.c module: weaken locking assertion for oops path. 2015-07-29 06:13:22 +09:30
notifier.c Merge branch 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2015-09-01 08:40:25 -07:00
nsproxy.c
padata.c
panic.c kernel/panic/kexec: fix "crash_kexec_post_notifiers" option issue in oops path 2015-06-30 19:44:57 -07:00
params.c Minor merge needed, due to function move. 2015-07-01 10:49:25 -07:00
pid_namespace.c
pid.c rcu: Rename rcu_lockdep_assert() to RCU_LOCKDEP_WARN() 2015-07-22 15:27:32 -07:00
profile.c mm: rename alloc_pages_exact_node() to __alloc_pages_node() 2015-09-08 15:35:28 -07:00
ptrace.c seccomp: add ptrace options for suspend/resume 2015-07-15 11:52:52 -07:00
range.c
reboot.c kexec: split kexec_load syscall from kexec core code 2015-09-10 13:29:01 -07:00
relay.c kernel/relay.c: use kvfree() in relay_free_page_array() 2015-06-30 19:44:59 -07:00
resource.c mm: enhance region_is_ram() to region_intersects() 2015-08-10 23:07:05 -04:00
seccomp.c Merge tag 'seccomp-next' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux into next 2015-07-20 17:19:19 +10:00
signal.c signal: fix information leak in copy_siginfo_to_user 2015-08-07 04:39:40 +03:00
smp.c
smpboot.c smpboot: allow passing the cpumask on per-cpu thread registration 2015-09-04 16:54:41 -07:00
smpboot.h
softirq.c
stacktrace.c
stop_machine.c stop_machine: Remove cpu_stop_work's from list in cpu_stop_park() 2015-08-03 12:21:28 +02:00
sys_ni.c sys_membarrier(): system-wide memory barrier (generic, x86) 2015-09-11 15:21:34 -07:00
sys.c vfs: Commit to never having exectuables on proc and sysfs. 2015-07-10 10:39:25 -05:00
sysctl_binary.c
sysctl.c sysctl: fix int -> unsigned long assignments in INT_MIN case 2015-09-10 13:29:01 -07:00
task_work.c task_work: remove fifo ordering guarantee 2015-09-05 13:46:58 -07:00
taskstats.c
test_kprobes.c
torture.c
tracepoint.c
tsacct.c
uid16.c
up.c
user_namespace.c capabilities: ambient capabilities 2015-09-04 16:54:41 -07:00
user-return-notifier.c
user.c
utsname_sysctl.c
utsname.c
watchdog.c watchdog: rename watchdog_suspend() and watchdog_resume() 2015-09-04 16:54:41 -07:00
workqueue_internal.h
workqueue.c Merge branch 'for-4.3' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq 2015-09-02 08:02:20 -07:00