linux/kernel
Tejun Heo 85fbd722ad libata, freezer: avoid block device removal while system is frozen
Freezable kthreads and workqueues are fundamentally problematic in
that they effectively introduce a big kernel lock widely used in the
kernel and have already been the culprit of several deadlock
scenarios.  This is the latest occurrence.

During resume, libata rescans all the ports and revalidates all
pre-existing devices.  If it determines that a device has gone
missing, the device is removed from the system, which involves
invalidating the block device and flushing the bdi while holding
driver core layer locks.  Unfortunately, this can race with the rest
of device resume.  Because freezable kthreads and workqueues are
thawed only after device resume is complete, and block device removal
depends on freezable workqueues and kthreads (e.g. bdi_wq, jbd2) to
make progress, this can lead to deadlock: block device removal can't
proceed because kthreads are frozen, and kthreads can't be thawed
because device resume is blocked behind block device removal.

839a8e8660 ("writeback: replace custom worker pool implementation
with unbound workqueue") made this particular deadlock scenario more
visible, but the underlying problem has always been there: the
original forker task and jbd2 are freezable too.  In fact, this is
highly likely just one of many possible deadlock scenarios, given
that the freezer behaves as a big kernel lock and we don't have any
debug mechanism around it.

I believe the right thing to do is to get rid of freezable kthreads
and workqueues.  This is something fundamentally broken.  For now,
implement a funny workaround in libata - just avoid doing block device
hot[un]plug while the system is frozen.  Kernel engineering at its
finest.  :(

v2: Add EXPORT_SYMBOL_GPL(pm_freezing) for cases where libata is built
    as a module.

v3: Comment updated and polling interval changed to 10ms as suggested
    by Rafael.

v4: Add #ifdef CONFIG_FREEZER around the hack, as pm_freezing is not
    defined when FREEZER is not configured, which breaks the build.
    Reported by kbuild test robot.
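
As a rough illustration of the workaround described above, the sketch
below models it in userspace Python rather than kernel C.  Only the
pm_freezing name and the 10ms polling interval come from the notes
above; hotplug_work() and simulate_resume() are hypothetical names
chosen for this example, and a threading.Event stands in for the
kernel flag.  It is a model of the idea (defer block device removal
until the system is no longer frozen), not the actual libata change.

```python
import threading
import time

POLL_INTERVAL_S = 0.010  # 10ms, the polling interval named in the v3 note

# Assumption: a threading.Event stands in for the kernel's pm_freezing flag.
pm_freezing = threading.Event()


def hotplug_work(log):
    """Hypothetical hotplug worker: defer device removal while frozen."""
    while pm_freezing.is_set():
        # Do not start block device removal yet; it would depend on
        # frozen helpers and could block device resume.
        time.sleep(POLL_INTERVAL_S)
    log.append("remove device")


def simulate_resume():
    """Run one frozen-then-thawed cycle and return the order of events."""
    log = []
    pm_freezing.set()  # system is frozen when the worker is queued
    worker = threading.Thread(target=hotplug_work, args=(log,))
    worker.start()
    time.sleep(5 * POLL_INTERVAL_S)  # device resume runs; worker only polls
    log.append("thaw")  # recorded before the flag is cleared
    pm_freezing.clear()
    worker.join()
    return log
```

In this model simulate_resume() returns ["thaw", "remove device"]:
the removal step is never attempted while the flag is set, so it
cannot end up waiting on helpers that are still frozen.  Polling is
crude, which matches the "funny workaround" framing above.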

Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Tomaž Šolc <tomaz.solc@tablix.org>
Reviewed-by: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Link: https://bugzilla.kernel.org/show_bug.cgi?id=62801
Link: http://lkml.kernel.org/r/20131213174932.GA27070@htj.dyndns.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Len Brown <len.brown@intel.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: stable@vger.kernel.org
Cc: kbuild test robot <fengguang.wu@intel.com>
2013-12-19 13:50:32 -05:00
cpu sched: Add NEED_RESCHED to the preempt_count 2013-09-25 14:07:49 +02:00
debug kdb: Add support for external NMI handler to call KGDB/KDB 2013-10-03 18:47:54 +02:00
events list: introduce list_next_entry() and list_prev_entry() 2013-11-13 12:09:23 +09:00
gcov gcov: reuse kbasename helper 2013-11-13 12:09:34 +09:00
irq Merge branch 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2013-11-19 10:40:00 -08:00
locking locking/lockdep: Mark __lockdep_count_forward_deps() as static 2013-11-13 13:50:17 +01:00
power Merge branch 'pm-sleep' 2013-11-19 01:07:08 +01:00
printk printk.c: comments should refer to /proc/vmcore instead of /proc/vmcoreinfo 2013-11-13 12:09:14 +09:00
rcu This batch of changes is mostly clean ups and small bug fixes. 2013-11-16 12:23:18 -08:00
sched sched/fair: Avoid integer overflow 2013-11-13 13:33:55 +01:00
time Merge branch 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2013-11-12 10:36:00 +09:00
trace This batch of changes is mostly clean ups and small bug fixes. 2013-11-16 12:23:18 -08:00
.gitignore kernel/hz.bc: ignore. 2013-04-22 07:09:06 -07:00
acct.c fs: Fix hang with BSD accounting on frozen filesystem 2013-05-04 14:57:58 -04:00
async.c
audit_tree.c kernel/audit_tree.c:audit_add_tree_rule(): protect `rule' from kill_rules() 2013-06-12 16:29:46 -07:00
audit_watch.c
audit.c Merge git://git.infradead.org/users/eparis/audit 2013-11-21 19:18:14 -08:00
audit.h audit: call audit_bprm() only once to add AUDIT_EXECVE information 2013-11-05 11:15:03 -05:00
auditfilter.c audit: do not reject all AUDIT_INODE filter types 2013-11-05 11:09:16 -05:00
auditsc.c audit: fix type of sessionid in audit_set_loginuid() 2013-11-06 11:47:24 -05:00
backtracetest.c
bounds.c kernel/bounds: avoid circular dependencies in generated headers 2013-11-19 14:20:12 -08:00
capability.c xfs: update for v3.12-rc1 2013-09-09 11:19:09 -07:00
cgroup_freezer.c cgroup: make css_for_each_descendant() and friends include the origin css in the iteration 2013-08-08 20:11:27 -04:00
cgroup.c consolidate simple ->d_delete() instances 2013-11-15 22:04:17 -05:00
compat.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/signal 2013-05-01 07:21:43 -07:00
configs.c proc: Supply PDE attribute setting accessor functions 2013-05-01 17:29:18 -04:00
context_tracking.c Linux 3.12-rc4 2013-10-09 12:36:13 +02:00
cpu_pm.c
cpu.c Merge branch 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2013-11-14 16:55:11 +09:00
cpuset.c Merge branch 'for-3.12' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup 2013-09-03 18:25:03 -07:00
crash_dump.c
cred.c
delayacct.c kernel/delayacct.c: remove redundant checking in __delayacct_add_tsk() 2013-11-13 12:09:12 +09:00
dma.c
elfcore.c switch elf_core_write_extra_phdrs() to dump_emit() 2013-11-09 00:16:23 -05:00
exec_domain.c
exit.c ptrace: revert "Prepare to fix racy accesses on task breakpoints" 2013-07-09 10:33:26 -07:00
extable.c extable: skip sorting if the table is empty 2013-09-11 15:58:25 -07:00
fork.c mm: implement split page table lock for PMD level 2013-11-15 09:32:15 +09:00
freezer.c libata, freezer: avoid block device removal while system is frozen 2013-12-19 13:50:32 -05:00
futex_compat.c
futex.c locking: Move the rtmutex code to kernel/locking/ 2013-11-06 09:23:59 +01:00
groups.c userns: Kill nsown_capable it makes the wrong thing easy 2013-08-30 23:44:11 -07:00
hrtimer.c kernel: delete __cpuinit usage from all core kernel files 2013-07-14 19:36:59 -04:00
hung_task.c Here are the 3.13 KVM changes. There was a lot of work on the PPC 2013-11-15 13:51:36 +09:00
irq_work.c
itimer.c
jump_label.c static_key: WARN on usage before jump_label_init was called 2013-10-19 19:45:35 -04:00
kallsyms.c kernel: kallsyms: memory override issue, need check destination buffer length 2013-04-15 15:17:26 +09:30
kcmp.c
Kconfig.freezer
Kconfig.hz kernel: remove CONFIG_USE_GENERIC_SMP_HELPERS 2013-11-15 09:32:22 +09:00
Kconfig.locks locking: Fix copy/paste errors of "ARCH_INLINE_*_UNLOCK_BH" 2013-05-28 08:50:00 +02:00
Kconfig.preempt
kexec.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial 2013-11-15 16:47:22 -08:00
kmod.c kernel/kmod.c: check for NULL in call_usermodehelper_exec() 2013-09-30 14:31:02 -07:00
kprobes.c kprobes: use KSYM_NAME_LEN to size identifier buffers 2013-11-13 12:09:26 +09:00
ksysfs.c kernel: replace strict_strto*() with kstrto*() 2013-09-12 15:38:03 -07:00
kthread.c kthread: make kthread_create() killable 2013-11-13 12:08:59 +09:00
latencytop.c
Makefile Merge branch 'for-linus2' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security 2013-11-21 19:46:00 -08:00
module_signing.c keys: change asymmetric keys to use common hash definitions 2013-10-25 17:15:18 -04:00
module-internal.h KEYS: Separate the kernel signature checking keyring from module signing 2013-09-25 17:17:01 +01:00
module.c Mainly boring here, too. rmmod --wait finally removed, though. 2013-11-15 13:27:50 +09:00
notifier.c
nsproxy.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace 2013-09-07 14:35:32 -07:00
padata.c padata - Register hotcpu notifier after initialization 2013-08-29 14:37:59 +10:00
panic.c kernel/panic.c: reduce 1 byte usage for print tainted buffer 2013-11-13 12:09:35 +09:00
params.c kernel/params: fix handling of signed integer types 2013-09-28 12:35:52 -07:00
pid_namespace.c pid_namespace: make freeing struct pid_namespace rcu-delayed 2013-10-24 23:43:29 -04:00
pid.c pidns: fix free_pid() to handle the first fork failure 2013-09-30 14:31:03 -07:00
posix-cpu-timers.c posix_timers: fix racy timer delta caching on task exit 2013-07-03 16:54:42 +02:00
posix-timers.c posix-timers: Remove unused variable 2013-04-18 12:51:19 +02:00
profile.c kernel: delete __cpuinit usage from all core kernel files 2013-07-14 19:36:59 -04:00
ptrace.c exec/ptrace: fix get_dumpable() incorrect tests 2013-11-13 12:09:33 +09:00
range.c range: Do not add new blank slot with add_range_with_merge 2013-06-18 11:32:10 -05:00
reboot.c kernel/reboot.c: re-enable the function of variable reboot_default 2013-09-24 17:00:26 -07:00
relay.c kernel: delete __cpuinit usage from all core kernel files 2013-07-14 19:36:59 -04:00
res_counter.c memcg: reduce function dereference 2013-09-12 15:38:02 -07:00
resource.c kernel/resource.c: remove the unneeded assignment in function __find_resource 2013-07-03 16:08:06 -07:00
seccomp.c seccomp: allow BPF_XOR based ALU instructions. 2013-03-26 11:07:19 +11:00
signal.c constify copy_siginfo_to_user{,32}() 2013-11-09 00:16:29 -05:00
smp.c kernel: fix generic_exec_single indentation 2013-11-15 09:32:22 +09:00
smpboot.c kernel: delete __cpuinit usage from all core kernel files 2013-07-14 19:36:59 -04:00
smpboot.h
softirq.c revert "softirq: Add support for triggering softirq work on softirqs" 2013-11-15 09:32:22 +09:00
stacktrace.c
stop_machine.c stop_machine: Fix race between stop_two_cpus() and stop_cpus() 2013-11-11 12:43:38 +01:00
sys_ni.c unify compat fanotify_mark(2), switch to COMPAT_SYSCALL_DEFINE 2013-05-09 13:46:38 -04:00
sys.c kernel/sys.c: remove obsolete #include <linux/kexec.h> 2013-11-13 12:09:13 +09:00
sysctl_binary.c kernel/sysctl_binary.c: use scnprintf() instead of snprintf() 2013-11-13 12:09:33 +09:00
sysctl.c Merge branch 'core-locking-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2013-11-14 16:30:30 +09:00
system_certificates.S kernel/system_certificate.S: use real contents instead of macro GLOBAL() 2013-10-30 12:58:00 +00:00
system_keyring.c KEYS: Make the system 'trusted' keyring viewable by userspace 2013-09-25 17:17:01 +01:00
task_work.c task_work: documentation 2013-09-11 15:58:27 -07:00
taskstats.c genetlink: only pass array to genl_register_family_with_ops() 2013-11-19 16:39:05 -05:00
test_kprobes.c kernel/: rename random32() to prandom_u32() 2013-04-29 18:28:42 -07:00
time.c sched: Rename sched.c as sched/core.c in comments and Documentation 2013-06-19 12:58:42 +02:00
timeconst.bc
timer.c sched: Introduce preempt_count accessor functions 2013-09-25 14:07:32 +02:00
tracepoint.c Tracing updates for Linux 3.10 2013-04-29 13:55:38 -07:00
tsacct.c
uid16.c userns: Kill nsown_capable it makes the wrong thing easy 2013-08-30 23:44:11 -07:00
up.c kernel: provide a __smp_call_function_single stub for !CONFIG_SMP 2013-11-15 09:32:22 +09:00
user_namespace.c KEYS: Add per-user_namespace registers for persistent per-UID kerberos caches 2013-09-24 10:35:19 +01:00
user-return-notifier.c
user.c KEYS: Add per-user_namespace registers for persistent per-UID kerberos caches 2013-09-24 10:35:19 +01:00
utsname_sysctl.c
utsname.c userns: Kill nsown_capable it makes the wrong thing easy 2013-08-30 23:44:11 -07:00
watchdog.c watchdog: update watchdog_thresh properly 2013-09-24 17:00:25 -07:00
workqueue_internal.h sched: Rename sched.c as sched/core.c in comments and Documentation 2013-06-19 12:58:42 +02:00
workqueue.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial 2013-09-06 09:36:28 -07:00