linux/drivers
Neil Horman 9fe0617d9b bonding: prevent deadlock on slave store with alb mode (v3)
This soft lockup was recently reported:

[root@dell-per715-01 ~]# echo +bond5 > /sys/class/net/bonding_masters
[root@dell-per715-01 ~]# echo +eth1 > /sys/class/net/bond5/bonding/slaves
bonding: bond5: doing slave updates when interface is down.
bonding bond5: master_dev is not up in bond_enslave
[root@dell-per715-01 ~]# echo -eth1 > /sys/class/net/bond5/bonding/slaves
bonding: bond5: doing slave updates when interface is down.

BUG: soft lockup - CPU#12 stuck for 60s! [bash:6444]
CPU 12:
Modules linked in: bonding autofs4 hidp rfcomm l2cap bluetooth lockd sunrpc
be2d
Pid: 6444, comm: bash Not tainted 2.6.18-262.el5 #1
RIP: 0010:[<ffffffff80064bf0>]  [<ffffffff80064bf0>]
.text.lock.spinlock+0x26/00
RSP: 0018:ffff810113167da8  EFLAGS: 00000286
RAX: ffff810113167fd8 RBX: ffff810123a47800 RCX: 0000000000ff1025
RDX: 0000000000000000 RSI: ffff810123a47800 RDI: ffff81021b57f6f8
RBP: ffff81021b57f500 R08: 0000000000000000 R09: 000000000000000c
R10: 00000000ffffffff R11: ffff81011d41c000 R12: ffff81021b57f000
R13: 0000000000000000 R14: 0000000000000282 R15: 0000000000000282
FS:  00002b3b41ef3f50(0000) GS:ffff810123b27940(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00002b3b456dd000 CR3: 000000031fc60000 CR4: 00000000000006e0

Call Trace:
 [<ffffffff80064af9>] _spin_lock_bh+0x9/0x14
 [<ffffffff886937d7>] :bonding:tlb_clear_slave+0x22/0xa1
 [<ffffffff8869423c>] :bonding:bond_alb_deinit_slave+0xba/0xf0
 [<ffffffff8868dda6>] :bonding:bond_release+0x1b4/0x450
 [<ffffffff8006457b>] __down_write_nested+0x12/0x92
 [<ffffffff88696ae4>] :bonding:bonding_store_slaves+0x25c/0x2f7
 [<ffffffff801106f7>] sysfs_write_file+0xb9/0xe8
 [<ffffffff80016b87>] vfs_write+0xce/0x174
 [<ffffffff80017450>] sys_write+0x45/0x6e
 [<ffffffff8005d28d>] tracesys+0xd5/0xe0

It occurs because we are able to change the slave configuarion of a bond while
the bond interface is down.  The bonding driver initializes some data structures
only after its ndo_open routine is called.  Among them is the initalization of
the alb tx and rx hash locks.  So if we add or remove a slave without first
opening the bond master device, we run the risk of trying to lock/unlock a
spinlock that has garbage for data in it, which results in our above softlock.

Note that sometimes this works, because in many cases an unlocked spinlock has
the raw_lock parameter initialized to zero (meaning that the kzalloc of the
net_device private data is equivalent to calling spin_lock_init), but thats not
true in all cases, and we aren't guaranteed that condition, so we need to pass
the relevant spinlocks through the spin_lock_init function.

Fix it by moving the spin_lock_init calls for the tx and rx hashtable locks to
the ndo_init path, so they are ready for use by the bond_store_slaves path.

Change notes:
v2) Based on conversation with Jay and Nicolas it seems that the ability to
enslave devices while the bond master is down should be safe to do.  As such
this is an outlier bug, and so instead we'll just initalize the errant spinlocks
in the init path rather than the open path, solving the problem.  We'll also
remove the warnings about the bond being down during enslave operations, since
it should be safe

v3) Fix spelling error

Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Reported-by: jtluka@redhat.com
CC: Jay Vosburgh <fubar@us.ibm.com>
CC: Andy Gospodarek <andy@greyhouse.net>
CC: nicolas.2p.debian@gmail.com
CC: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-05-25 17:55:33 -04:00
..
accessibility
acpi Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial 2011-05-23 09:12:26 -07:00
amba
ata Merge branch 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jgarzik/libata-dev 2011-05-20 14:31:27 -07:00
atm Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6 2011-05-20 13:43:21 -07:00
auxdisplay
base Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial 2011-05-23 09:12:26 -07:00
bcma bcma: add PCI ID of the card found in Thinkpad X120e 2011-05-16 14:25:29 -04:00
block Add appropriate <linux/prefetch.h> include for prefetch users 2011-05-22 21:41:57 -07:00
bluetooth Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next-2.6 into for-davem 2011-05-16 19:32:19 -04:00
cdrom
char Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/apm 2011-05-23 09:08:19 -07:00
clk CLKDEV: Fix clkdev return value for NULL clk case 2011-04-30 10:14:08 +01:00
clocksource Merge branch 'consolidate-clksrc-i8253' of master.kernel.org:~rmk/linux-2.6-arm into timers/clocksource 2011-05-14 12:06:36 +02:00
connector
cpufreq Merge branch 'x86-cpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip 2011-05-19 17:55:12 -07:00
cpuidle
crypto Merge git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 2011-05-20 17:24:14 -07:00
dca
dio
dma Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial 2011-05-23 09:12:26 -07:00
edac drivercore: revert addition of of_match to struct device 2011-05-18 12:32:23 -06:00
eisa
firewire firewire: sbp2: parallelize login, reconnect, logout 2011-05-10 22:53:46 +02:00
firmware Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial 2011-05-23 09:12:26 -07:00
gpio
gpu Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial 2011-05-23 09:12:26 -07:00
hid Merge branches 'doc', 'multitouch', 'upstream' and 'upstream-fixes' into for-linus 2011-05-23 12:49:25 +02:00
hwmon hwmon: (coretemp) Fix checkpatch errors 2011-05-21 07:29:17 -07:00
hwspinlock
i2c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial 2011-05-23 09:12:26 -07:00
ide Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/ide-2.6 2011-05-22 22:08:32 -07:00
idle
ieee802154 net: call dev_alloc_name from register_netdevice 2011-05-05 10:57:45 -07:00
infiniband Add appropriate <linux/prefetch.h> include for prefetch users 2011-05-22 21:41:57 -07:00
input input/atari: Fix mouse movement and button mapping 2011-05-19 18:19:12 +02:00
isdn isdn: netjet - blacklist Digium TDM400P 2011-05-25 17:55:32 -04:00
leds Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial 2011-05-23 09:12:26 -07:00
lguest Correct occurrences of 2011-05-06 09:27:55 -07:00
macintosh Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial 2011-05-23 09:12:26 -07:00
mca
md Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial 2011-05-23 09:12:26 -07:00
media Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial 2011-05-23 09:12:26 -07:00
memstick
message Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial 2011-05-23 09:12:26 -07:00
mfd mfd: Fix for the TWL4030 PM sleep/wakeup sequence 2011-05-11 11:09:58 +02:00
misc Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial 2011-05-23 09:12:26 -07:00
mmc Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial 2011-05-23 09:12:26 -07:00
mtd MIPS: Alchemy: Clean up GPIO registers and accessors 2011-05-19 09:55:46 +01:00
net bonding: prevent deadlock on slave store with alb mode (v3) 2011-05-25 17:55:33 -04:00
nfc
nubus
of
oprofile
parisc
parport
pci Merge branch 'core-iommu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip 2011-05-19 17:28:58 -07:00
pcmcia
platform Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial 2011-05-23 09:12:26 -07:00
pnp
power
pps
ps3
rapidio rapidio: fix default routing initialization 2011-05-18 02:55:22 -07:00
regulator regulator: change debug statement be consistent with the style of the rest 2011-05-10 10:11:29 +02:00
rtc Merge branch 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip 2011-05-19 17:45:08 -07:00
s390 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6 2011-05-20 13:43:21 -07:00
sbus
scsi Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial 2011-05-23 09:12:26 -07:00
sfi
sh
sn
spi Haavard Skinnemoen has left Atmel 2011-05-18 23:24:50 +02:00
ssb Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6 2011-05-20 13:43:21 -07:00
staging Merge branch 'for-davem' of ssh://master.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next-2.6 2011-05-25 13:28:55 -04:00
target Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial 2011-05-23 09:12:26 -07:00
tc
telephony
thermal
tty Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial 2011-05-23 09:12:26 -07:00
uio
usb Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial 2011-05-23 09:12:26 -07:00
uwb
vhost Correct occurrences of 2011-05-06 09:27:55 -07:00
video Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial 2011-05-23 09:12:26 -07:00
virtio
vlynq
w1
watchdog Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial 2011-05-23 09:12:26 -07:00
xen Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial 2011-05-23 09:12:26 -07:00
zorro
Kconfig Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6 2011-05-20 13:43:21 -07:00
Makefile bcma: add Broadcom specific AMBA bus driver 2011-05-10 15:54:54 -04:00