linux/kernel
Wanpeng Li 5473e0cc37 sched: 'Annotate' migrate_tasks()
Kernel testing triggered this warning:

| WARNING: CPU: 0 PID: 13 at kernel/sched/core.c:1156 do_set_cpus_allowed+0x7e/0x80()
| Modules linked in:
| CPU: 0 PID: 13 Comm: migration/0 Not tainted 4.2.0-rc1-00049-g25834c7 #2
| Call Trace:
|   dump_stack+0x4b/0x75
|   warn_slowpath_common+0x8b/0xc0
|   warn_slowpath_null+0x22/0x30
|   do_set_cpus_allowed+0x7e/0x80
|   cpuset_cpus_allowed_fallback+0x7c/0x170
|   select_fallback_rq+0x221/0x280
|   migration_call+0xe3/0x250
|   notifier_call_chain+0x53/0x70
|   __raw_notifier_call_chain+0x1e/0x30
|   cpu_notify+0x28/0x50
|   take_cpu_down+0x22/0x40
|   multi_cpu_stop+0xd5/0x140
|   cpu_stopper_thread+0xbc/0x170
|   smpboot_thread_fn+0x174/0x2f0
|   kthread+0xc4/0xe0
|   ret_from_kernel_thread+0x21/0x30

As Peterz pointed out:

| So the normal rules for changing task_struct::cpus_allowed are holding
| both pi_lock and rq->lock, such that holding either stabilizes the mask.
|
| This is so that wakeup can happen without rq->lock and load-balance
| without pi_lock.
|
| From this we already get the relaxation that we can omit acquiring
| rq->lock if the task is not on the rq, because in that case
| load-balancing will not apply to it.
|
| ** these are the rules currently tested in do_set_cpus_allowed() **
|
| Now, since __set_cpus_allowed_ptr() uses task_rq_lock() which
| unconditionally acquires both locks, we could get away with holding just
| rq->lock when on_rq for modification because that'd still exclude
| __set_cpus_allowed_ptr(), it would also work against
| __kthread_bind_mask() because that assumes !on_rq.
|
| That said, this is all somewhat fragile.
|
| Now, I don't think dropping rq->lock is quite as disastrous as it
| usually is because !cpu_active at this point, which means load-balance
| will not interfere, but that too is somewhat fragile.
|
| So we end up with a choice of two fragile..

This patch fixes the warning by following those rules: migrate_tasks()
now changes task_struct::cpus_allowed with both pi_lock and rq->lock held.
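
The locking rule quoted above can be illustrated with a small userspace model. This is only a sketch under stated assumptions, not the kernel code or the patch itself: pthread mutexes stand in for the kernel's raw spinlocks, and every name here (toy_rq, toy_task, toy_set_cpus_allowed(), toy_read_mask_wakeup(), toy_read_mask_balance(), toy_migrate_fixup()) is hypothetical. It shows a writer that holds both locks, two readers that each hold only one, and a migration-style path that enters with only the runqueue lock and has to drop it to take pi_lock in the right order.

```c
#include <pthread.h>

/* Hypothetical stand-in for a runqueue: only its lock matters here. */
struct toy_rq {
	pthread_mutex_t lock;
};

/* Hypothetical stand-in for task_struct. */
struct toy_task {
	pthread_mutex_t pi_lock;
	struct toy_rq *rq;
	unsigned long cpus_allowed;	/* bit n set: CPU n is allowed */
};

/* Writer: holds both locks (pi_lock first), so either lock alone excludes it. */
static void toy_set_cpus_allowed(struct toy_task *p, unsigned long mask)
{
	pthread_mutex_lock(&p->pi_lock);
	pthread_mutex_lock(&p->rq->lock);
	p->cpus_allowed = mask;
	pthread_mutex_unlock(&p->rq->lock);
	pthread_mutex_unlock(&p->pi_lock);
}

/* Wakeup-style reader: pi_lock alone is enough to see a stable mask. */
static unsigned long toy_read_mask_wakeup(struct toy_task *p)
{
	unsigned long mask;

	pthread_mutex_lock(&p->pi_lock);
	mask = p->cpus_allowed;
	pthread_mutex_unlock(&p->pi_lock);
	return mask;
}

/* Load-balance-style reader: the runqueue lock alone is enough. */
static unsigned long toy_read_mask_balance(struct toy_task *p)
{
	unsigned long mask;

	pthread_mutex_lock(&p->rq->lock);
	mask = p->cpus_allowed;
	pthread_mutex_unlock(&p->rq->lock);
	return mask;
}

/*
 * Migration-style writer. The caller enters holding p->rq->lock and gets
 * it back on return. Taking pi_lock while still holding rq->lock would
 * invert the lock order used by toy_set_cpus_allowed(), so rq->lock is
 * dropped first and both locks are then taken in the usual order.
 */
static void toy_migrate_fixup(struct toy_task *p, unsigned long fallback)
{
	pthread_mutex_unlock(&p->rq->lock);
	pthread_mutex_lock(&p->pi_lock);
	pthread_mutex_lock(&p->rq->lock);

	p->cpus_allowed = fallback;	/* both locks held: follows the rule */

	pthread_mutex_unlock(&p->pi_lock);
	/* return with p->rq->lock still held, as on entry */
}
```

In this model nothing protects the window between dropping and retaking the runqueue lock; the quoted discussion argues that the equivalent window in the kernel is tolerable because the CPU is no longer active at that point.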

Reported-by: kernel test robot <ying.huang@intel.com>
Reported-by: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
[ Modified changelog and patch. ]
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/BLU436-SMTP1660820490DE202E3934ED3806E0@phx.gbl
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-09-11 07:57:50 +02:00
bpf bpf: allow networking programs to use bpf_trace_printk() for debugging 2015-06-15 15:53:50 -07:00
configs kconfig: add xenconfig defconfig helper 2015-06-16 11:04:29 +01:00
debug
events perf/ring-buffer: Clarify the use of page::private for high-order AUX allocations 2015-08-12 11:43:20 +02:00
gcov gcov: add support for GCC 5.1 2015-06-30 19:44:57 -07:00
irq Merge branch 'irq-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2015-09-01 14:33:35 -07:00
livepatch Merge branches 'for-4.1/upstream-fixes', 'for-4.2/kaslr' and 'for-4.2/upstream' into for-linus 2015-06-22 16:26:56 +02:00
locking locking/pvqspinlock: Fix kernel panic in locking-selftest 2015-07-21 10:18:07 +02:00
power Power management and ACPI fixes for v4.2-rc1 2015-07-01 14:17:44 -07:00
printk printk: improve the description of /dev/kmsg line format 2015-06-30 19:44:59 -07:00
rcu rcu,locking: Privatize smp_mb__after_unlock_lock() 2015-08-04 08:49:21 -07:00
sched sched: 'Annotate' migrate_tasks() 2015-09-11 07:57:50 +02:00
time nohz: Assert existing housekeepers when nohz full enabled 2015-09-02 10:33:22 +02:00
trace Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2015-08-31 20:26:22 -07:00
.gitignore
acct.c acct: check FMODE_CAN_WRITE 2015-04-11 22:27:55 -04:00
async.c
audit_tree.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2015-04-26 17:22:07 -07:00
audit_watch.c VFS: audit: d_backing_inode() annotations 2015-04-15 15:06:55 -04:00
audit.c Merge branch 'upstream' of git://git.infradead.org/users/pcmoore/audit 2015-06-27 13:53:16 -07:00
audit.h Merge branch 'upstream' of git://git.infradead.org/users/pcmoore/audit 2015-04-22 14:49:23 -07:00
auditfilter.c
auditsc.c Fix broken audit tests for exec arg len 2015-07-08 09:33:38 -07:00
backtracetest.c
bounds.c
capability.c kernel: conditionally support non-root users, groups and capabilities 2015-04-15 16:35:22 -07:00
cgroup_freezer.c
cgroup.c rcu: Rename rcu_lockdep_assert() to RCU_LOCKDEP_WARN() 2015-07-22 15:27:32 -07:00
compat.c compat: cleanup coding in compat_get_bitmap() and compat_put_bitmap() 2015-06-04 23:57:18 +02:00
configs.c
context_tracking.c context_tracking: Inherit TIF_NOHZ through forks instead of context switches 2015-05-07 12:02:51 +02:00
cpu_pm.c
cpu.c Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2015-08-31 20:26:22 -07:00
cpuset.c cpuset: use trialcs->mems_allowed as a temp variable 2015-08-10 11:18:41 -04:00
crash_dump.c
cred.c kernel: conditionally support non-root users, groups and capabilities 2015-04-15 16:35:22 -07:00
delayacct.c
dma.c
elfcore.c
exec_domain.c Remove rest of exec domains. 2015-04-12 21:03:31 +02:00
exit.c exit,stats: /* obey this comment */ 2015-06-25 17:00:43 -07:00
extable.c
fork.c Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2015-08-31 20:26:22 -07:00
freezer.c
futex_compat.c
futex.c Merge branch 'sched-locking-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2015-06-24 14:46:01 -07:00
groups.c kernel: conditionally support non-root users, groups and capabilities 2015-04-15 16:35:22 -07:00
hung_task.c kernel/hung_task.c: change hung_task.c to use for_each_process_thread() 2015-04-15 16:35:22 -07:00
irq_work.c
jump_label.c module, jump_label: Fix module locking 2015-05-27 11:09:50 +09:30
kallsyms.c
kcmp.c
Kconfig.freezer
Kconfig.hz
Kconfig.locks locking/qrwlock: Rename QUEUE_RWLOCK to QUEUED_RWLOCKS 2015-05-12 09:46:00 +02:00
Kconfig.preempt
kexec.c kernel/panic/kexec: fix "crash_kexec_post_notifiers" option issue in oops path 2015-06-30 19:44:57 -07:00
kmod.c
kprobes.c perf/x86/hw_breakpoints: Disallow kernel breakpoints unless kprobe-safe 2015-08-04 10:16:54 +02:00
ksysfs.c
kthread.c Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2015-08-31 20:26:22 -07:00
latencytop.c
Makefile make certificate list change message more useful 2015-07-02 16:42:13 -07:00
module_signing.c
module-internal.h
module.c module: weaken locking assertion for oops path. 2015-07-29 06:13:22 +09:30
notifier.c Merge branch 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2015-09-01 08:40:25 -07:00
nsproxy.c
padata.c
panic.c kernel/panic/kexec: fix "crash_kexec_post_notifiers" option issue in oops path 2015-06-30 19:44:57 -07:00
params.c Minor merge needed, due to function move. 2015-07-01 10:49:25 -07:00
pid_namespace.c
pid.c rcu: Rename rcu_lockdep_assert() to RCU_LOCKDEP_WARN() 2015-07-22 15:27:32 -07:00
profile.c
ptrace.c ptrace: ptrace_detach() can no longer race with SIGKILL 2015-04-17 09:04:06 -04:00
range.c
reboot.c kernel/reboot.c: add orderly_reboot for graceful reboot 2015-04-15 16:35:23 -07:00
relay.c kernel/relay.c: use kvfree() in relay_free_page_array() 2015-06-30 19:44:59 -07:00
resource.c mm: Fix bugs in region_is_ram() 2015-07-22 17:20:34 +02:00
seccomp.c seccomp, filter: add and use bpf_prog_create_from_user from seccomp 2015-05-09 17:35:05 -04:00
signal.c signal: fix information leak in copy_siginfo_to_user 2015-08-07 04:39:40 +03:00
smp.c smp: Fix error case handling in smp_call_function_*() 2015-04-19 13:19:23 -07:00
smpboot.c watchdog: add watchdog_cpumask sysctl to assist nohz 2015-06-24 17:49:40 -07:00
smpboot.h
softirq.c
stacktrace.c
stop_machine.c stop_machine: Remove cpu_stop_work's from list in cpu_stop_park() 2015-08-03 12:21:28 +02:00
sys_ni.c x86/ldt: Make modify_ldt() optional 2015-07-31 13:30:45 +02:00
sys.c prctl: more prctl(PR_SET_MM_*) checks 2015-06-25 17:00:37 -07:00
sysctl_binary.c
sysctl.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace 2015-07-03 15:20:57 -07:00
system_certificates.S
system_keyring.c
task_work.c
taskstats.c
test_kprobes.c
torture.c rcu: Convert ACCESS_ONCE() to READ_ONCE() and WRITE_ONCE() 2015-05-27 12:56:15 -07:00
tracepoint.c
tsacct.c
uid16.c
up.c
user_namespace.c
user-return-notifier.c
user.c
utsname_sysctl.c
utsname.c
watchdog.c watchdog: add watchdog_cpumask sysctl to assist nohz 2015-06-24 17:49:40 -07:00
workqueue_internal.h
workqueue.c Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2015-08-31 20:26:22 -07:00