linux/drivers
David Rientjes 30467e0b3b mm, hotplug: fix concurrent memory hot-add deadlock
There's a deadlock when concurrently hot-adding memory through the probe
interface and switching a memory block from offline to online.

When hot-adding memory via the probe interface, add_memory() first takes
mem_hotplug_begin() and then device_lock() is later taken when registering
the newly initialized memory block.  This creates a lock dependency of (1)
mem_hotplug.lock (2) dev->mutex.

When switching a memory block from offline to online, the write(2) first
grabs dev->mutex in device_online(), and online_pages() then takes
mem_hotplug_begin().

This creates a lock inversion between mem_hotplug.lock and dev->mutex.
Vitaly reports that this deadlock can happen when a kworker handling a
probe event races with systemd-udevd switching a memory block's state.
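
The two acquisition orders described above can be sketched as a tiny
lock-order checker. This is only an illustrative userspace model, not kernel
code: the enum values, record_order(), probe_path() and
online_path_before_patch() are hypothetical names that stand in for the two
paths and the two locks named in this message.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical IDs standing in for the two kernel locks. */
enum lock_id { MEM_HOTPLUG_LOCK, DEV_MUTEX, NR_LOCKS };

/* order_seen[a][b] is true once some path took b while holding a. */
static bool order_seen[NR_LOCKS][NR_LOCKS];

/*
 * Record that `inner` was acquired while `outer` was held.
 * Returns true if the opposite order was recorded earlier, meaning
 * the two paths can deadlock when they race.
 */
static bool record_order(enum lock_id outer, enum lock_id inner)
{
	assert(outer < NR_LOCKS && inner < NR_LOCKS && outer != inner);
	order_seen[outer][inner] = true;
	return order_seen[inner][outer];
}

/* Probe path: mem_hotplug.lock first, then dev->mutex. */
static bool probe_path(void)
{
	return record_order(MEM_HOTPLUG_LOCK, DEV_MUTEX);
}

/* Online path before the patch: dev->mutex first, then mem_hotplug.lock. */
static bool online_path_before_patch(void)
{
	return record_order(DEV_MUTEX, MEM_HOTPLUG_LOCK);
}
```

Calling probe_path() and then online_path_before_patch() makes the second
call report an inversion, which is the (1) mem_hotplug.lock (2) dev->mutex
dependency meeting its reverse.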

This patch requires the state transition to take mem_hotplug_begin()
before dev->mutex.  Because hot-adding memory via the probe interface
creates a memory block while holding mem_hotplug_begin(), there is no way
to take dev->mutex first in that case.

online_pages() and offline_pages() are only called when transitioning
memory block state.  We now require that mem_hotplug_begin() is taken
before calling them -- this requires exporting mem_hotplug_begin() and
mem_hotplug_done() to generic code.  In all hot-add and hot-remove cases,
mem_hotplug_begin() is taken prior to device_online().  This is all that
is needed to avoid the deadlock.
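
The calling convention described above can be modelled in a few lines of
single-threaded C. This is a rough sketch under stated assumptions, not the
patch itself: every model_* function and the boolean flags are hypothetical
stand-ins for the kernel functions of similar name, and the flags only track
which lock is held so the ordering rule can be checked.

```c
#include <assert.h>
#include <stdbool.h>

/* Flags modelling lock state; they are not real locks. */
static bool hotplug_held;	/* stands in for mem_hotplug.lock */
static bool dev_held;		/* stands in for dev->mutex */
static bool order_ok;		/* dev->mutex was taken under mem_hotplug.lock */

static void model_mem_hotplug_begin(void) { hotplug_held = true; }
static void model_mem_hotplug_done(void)  { hotplug_held = false; }

static void model_device_lock(void)
{
	/* Remember whether the required outer lock was already held. */
	order_ok = hotplug_held;
	dev_held = true;
}

static void model_device_unlock(void)
{
	assert(dev_held);
	dev_held = false;
}

/* Returns 0 if the caller holds the hotplug lock, -1 otherwise. */
static int model_online_pages(void)
{
	return hotplug_held ? 0 : -1;
}

/* Takes the device lock, then onlines pages under it. */
static int model_device_online(void)
{
	int ret;

	model_device_lock();
	ret = model_online_pages();
	model_device_unlock();
	return ret;
}

/* State transition with the new ordering: hotplug lock, then device. */
static int model_state_store_online(void)
{
	int ret;

	model_mem_hotplug_begin();
	ret = model_device_online();
	model_mem_hotplug_done();
	return ret;
}

/* True when neither modelled lock is held. */
static bool model_all_released(void)
{
	return !hotplug_held && !dev_held;
}
```

In this model, model_state_store_online() succeeds and leaves order_ok set,
while calling model_device_online() directly, without the outer lock, fails
the check in model_online_pages().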

Signed-off-by: David Rientjes <rientjes@google.com>
Reported-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Tested-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Cc: "K. Y. Srinivasan" <kys@microsoft.com>
Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Cc: Tang Chen <tangchen@cn.fujitsu.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Zhang Zhen <zhenzhang.zhang@huawei.com>
Cc: Vladimir Davydov <vdavydov@parallels.com>
Cc: Wang Nan <wangnan0@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-04-14 16:49:00 -07:00
accessibility
acpi ACPI / battery: Fix doubly added battery on system suspend 2015-04-14 09:03:33 -07:00
amba ARM: 8256/1: driver coamba: add device binding path 'driver_override' 2015-02-10 10:23:15 +00:00
android android: binder: fix binder mmap failures 2015-03-01 18:43:51 -08:00
ata Merge branch 'for-4.1' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/libata 2015-04-13 16:42:16 -07:00
atm
auxdisplay
base mm, hotplug: fix concurrent memory hot-add deadlock 2015-04-14 16:49:00 -07:00
bcma bcma: implement host code support for PCIe Gen 2 devices 2015-01-29 10:54:43 +02:00
block Merge branch 'linus' into irq/core to get the GIC updates which 2015-04-08 23:26:21 +02:00
bluetooth Bluetooth: btusb: Fix issue with CSR based Intel Wireless controllers 2015-02-23 09:30:35 +02:00
bus genirq: Remove the deprecated 'IRQF_DISABLED' request_irq() flag entirely 2015-03-05 20:53:06 +01:00
cdrom
char ipmi_ssif: Use interruptible completion for waiting in the thread 2015-04-10 20:51:42 -05:00
clk The clk fixes for 4.0-rc4 comprise three themes. First are the usual 2015-03-15 15:07:08 -07:00
clocksource ARM, clocksource/drivers: Provide read_boot_clock64() and read_persistent_clock64() and use them 2015-04-03 08:18:23 +02:00
connector
coresight coresight: fix function etm_writel_cp14() parameter order 2015-02-04 10:42:55 -08:00
cpufreq cpufreq: Schedule work for the first-online CPU on resume 2015-04-03 12:59:47 +02:00
cpuidle Merge branch 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2015-04-13 11:08:28 -07:00
crypto Merge branch 'for-linus' of git://git.infradead.org/users/vkoul/slave-dma 2015-02-18 08:49:20 -08:00
dca
devfreq Merge branches 'pm-cpufreq', 'pm-cpuidle', 'pm-devfreq', 'pm-opp' and 'pm-tools' 2015-02-13 21:39:06 +01:00
dio
dma Staging driver patches for 4.1-rc1 2015-04-13 17:37:33 -07:00
dma-buf
edac EDAC: Constify of_device_id array 2015-03-20 17:50:07 +01:00
eisa
extcon extcon: max77693: Constify struct regmap_config 2015-01-26 13:47:55 +09:00
firewire firewire: core: use correct vendor/model IDs 2015-02-02 21:56:03 +01:00
firmware Merge branch 'core-efi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2015-04-13 10:22:30 -07:00
fmc
gpio gpio: syscon: reduce message level when direction reg offset not in dt 2015-03-27 11:17:08 +01:00
gpu regulator: Updates for v4.1 2015-04-13 15:13:25 -07:00
hid power supply and reset changes for the v4.1 series 2015-04-13 15:21:34 -07:00
hsi HSI: cmt_speech: fix error return code 2015-04-05 14:45:27 +02:00
hv Char / Misc patches for 3.20-rc1 2015-02-15 10:48:44 -08:00
hwmon hwmon: (pwm-fan) Update the duty cycle inorder to control the pwm-fan 2015-04-12 15:59:11 -07:00
hwspinlock
i2c Revert "i2c: core: Dispose OF IRQ mapping at client removal time" 2015-03-12 10:23:05 +01:00
ide Merge branch 'for-4.1' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/libata 2015-04-13 16:42:16 -07:00
idle intel_idle: Use explicit broadcast oneshot control function 2015-04-03 08:44:35 +02:00
iio Staging driver patches for 4.1-rc1 2015-04-13 17:37:33 -07:00
infiniband IB/uverbs: Prevent integer overflow in ib_umem_get address arithmetic 2015-04-02 09:53:59 -07:00
input Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input 2015-04-06 14:10:08 -07:00
iommu PCI changes for the v4.1 merge window: 2015-04-13 15:45:47 -07:00
ipack
irqchip irqchip core change for v4.1 (round 3) 2015-04-11 11:17:28 +02:00
isdn isdn: icn: use strlcpy() when parsing setup options 2015-03-15 22:24:37 -04:00
leds leds: leds-gpio: Pass on error codes unmodified 2015-02-02 14:36:10 -08:00
lguest lguest: now needs PCI_DIRECT. 2015-04-01 10:29:05 -07:00
macintosh
mailbox Merge branch 'mailbox-devel' of git://git.linaro.org/landing-teams/working/fujitsu/integration 2015-02-11 12:56:40 -08:00
mcb mcb: Fix error path of mcb_pci_probe 2015-02-03 15:48:51 -08:00
md md/raid0: fix bug with chunksize not a power of 2. 2015-04-10 15:36:31 +10:00
media [media] rtl28xxu: return success for unimplemented FE callback 2015-04-02 18:27:14 -03:00
memory memory/fsl-corenet-cf: Add t1040 support 2015-01-29 22:57:43 -06:00
memstick
message i2o: move to staging 2015-02-03 15:58:39 -08:00
mfd power supply and reset changes for the v4.1 series 2015-04-13 15:21:34 -07:00
misc Merge branch 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2015-04-13 13:16:36 -07:00
mmc mmc: sdhci-st: Update the quirks for this controller. 2015-04-10 12:55:40 +02:00
mtd Merge branch 'linus' into irq/core to get the GIC updates which 2015-04-08 23:26:21 +02:00
net net/mlx4_core: Fix error message deprecation for ConnectX-2 cards 2015-04-06 17:32:27 -04:00
nfc NFC: nci: Move NFCEE discovery logic 2015-02-04 09:15:18 +01:00
ntb
nubus
of PCI changes for the v4.1 merge window: 2015-04-13 15:45:47 -07:00
oprofile
parisc
parport
pci PCI changes for the v4.1 merge window: 2015-04-13 15:45:47 -07:00
pcmcia Revert "pcmcia: add a new resource manager for non ISA systems" 2015-03-11 14:21:23 +01:00
phy USB patches for 4.1-rc1 2015-04-13 17:07:21 -07:00
pinctrl pinctrl: sun4i: GPIOs configured as irq must be set to input before reading 2015-03-18 10:56:46 +01:00
platform power_supply: Change ownership from driver to core 2015-03-13 23:15:51 +01:00
pnp PNP: Don't check for overlaps with unassigned PCI BARs 2015-03-12 12:30:00 -05:00
power power: twl4030_madc_battery: Add missing MODULE_ALIAS 2015-04-06 19:39:57 +02:00
powercap powercap / RAPL: handle domains with different energy units 2015-03-13 23:18:44 +01:00
pps
ps3
ptp
pwm pwm: tegra: Use NSEC_PER_SEC 2015-02-18 08:40:29 +01:00
rapidio Merge branch 'for-linus' of git://git.infradead.org/users/vkoul/slave-dma 2015-02-18 08:49:20 -08:00
ras
regulator Merge remote-tracking branch 'regulator/topic/wm8350' into regulator-next 2015-04-10 19:16:06 +01:00
remoteproc
reset
rpmsg virtio_rpmsg: set DRIVER_OK before using device 2015-03-13 15:55:42 +10:30
rtc time, drivers/rtc: Don't bother with rtc_resume() for the nonstop clocksource 2015-04-03 08:18:34 +02:00
s390 s390/dcss: array index 'i' is used before limits check. 2015-02-26 09:24:48 +01:00
sbus
scsi Merge branch 'for-4.1' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/libata 2015-04-13 16:42:16 -07:00
sfi
sh drivers: sh: Disable PM runtime for multi-platform r8a7740 with genpd 2015-02-24 07:26:12 +09:00
sn
soc ARM: SoC driver updates 2015-02-17 09:38:59 -08:00
spi Merge remote-tracking branches 'spi/topic/spidev' and 'spi/topic/spidev-test' into spi-next 2015-04-11 23:09:31 +01:00
spmi
ssb treewide: Remove unnecessary SSB_DEVTABLE_END macro 2015-02-11 14:38:29 -08:00
staging Staging driver patches for 4.1-rc1 2015-04-13 17:37:33 -07:00
target iscsi target: fix oops when adding reject pdu 2015-04-10 12:33:55 -07:00
tc
thermal drivers: thermal: st: remove several sparse warnings 2015-04-07 13:43:28 -07:00
thunderbolt
tty Merge branch 'for-4.1' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq 2015-04-13 16:19:18 -07:00
uio
usb USB patches for 4.1-rc1 2015-04-13 17:07:21 -07:00
uwb uwb: Remove umc bus legacy suspend/resume support 2015-03-18 17:27:03 +01:00
vfio vfio-pci: Add missing break to enable VFIO_PCI_ERR_IRQ_INDEX 2015-03-12 09:51:38 -06:00
vhost Merge git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending 2015-03-21 11:24:38 -07:00
video OMAPDSS: fix regression with display sysfs files 2015-02-26 10:23:15 +02:00
virt
virtio virtio_mmio: fix access width for mmio 2015-03-17 12:12:21 +10:30
vlynq
vme vme: tsi148: Master windows support USERx and CR/CSR accesses, not slaves 2015-03-06 17:03:22 -08:00
w1
watchdog watchdog: imgpdc: Fix default heartbeat 2015-03-27 08:47:50 +01:00
xen xen: regression fixes for 4.0-rc6 2015-04-02 13:53:53 -07:00
zorro
Kconfig i2o: move to staging 2015-02-03 15:58:39 -08:00
Makefile