linux/drivers
Tony Camuso cdea46566b ipmi: use rcu lock around call to intf->handlers->sender()
A vendor with a system having more than 128 CPUs occasionally encounters
the following crash during shutdown. This is not an easily reproducible
event, but the vendor was able to provide the following analysis of the
crash, which exhibits the same footprint each time.

crash> bt
PID: 0      TASK: ffff88017c70ce70  CPU: 5   COMMAND: "swapper/5"
 #0 [ffff88085c143ac8] machine_kexec at ffffffff81059c8b
 #1 [ffff88085c143b28] __crash_kexec at ffffffff811052e2
 #2 [ffff88085c143bf8] crash_kexec at ffffffff811053d0
 #3 [ffff88085c143c10] oops_end at ffffffff8168ef88
 #4 [ffff88085c143c38] no_context at ffffffff8167ebb3
 #5 [ffff88085c143c88] __bad_area_nosemaphore at ffffffff8167ec49
 #6 [ffff88085c143cd0] bad_area_nosemaphore at ffffffff8167edb3
 #7 [ffff88085c143ce0] __do_page_fault at ffffffff81691d1e
 #8 [ffff88085c143d40] do_page_fault at ffffffff81691ec5
 #9 [ffff88085c143d70] page_fault at ffffffff8168e188
    [exception RIP: unknown or invalid address]
    RIP: ffffffffa053c800  RSP: ffff88085c143e28  RFLAGS: 00010206
    RAX: ffff88017c72bfd8  RBX: ffff88017a8dc000  RCX: ffff8810588b5ac8
    RDX: ffff8810588b5a00  RSI: ffffffffa053c800  RDI: ffff8810588b5a00
    RBP: ffff88085c143e58   R8: ffff88017c70d408   R9: ffff88017a8dc000
    R10: 0000000000000002  R11: ffff88085c143da0  R12: ffff8810588b5ac8
    R13: 0000000000000100  R14: ffffffffa053c800  R15: ffff8810588b5a00
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
    <IRQ stack>
    [exception RIP: cpuidle_enter_state+82]
    RIP: ffffffff81514192  RSP: ffff88017c72be50  RFLAGS: 00000202
    RAX: 0000001e4c3c6f16  RBX: 000000000000f8a0  RCX: 0000000000000018
    RDX: 0000000225c17d03  RSI: ffff88017c72bfd8  RDI: 0000001e4c3c6f16
    RBP: ffff88017c72be78   R8: 000000000000237e   R9: 0000000000000018
    R10: 0000000000002494  R11: 0000000000000001  R12: ffff88017c72be20
    R13: ffff88085c14f8e0  R14: 0000000000000082  R15: 0000001e4c3bb400
    ORIG_RAX: ffffffffffffff10  CS: 0010  SS: 0018

This is the corresponding stack trace.

It crashed because the area pointed to by the RIP extracted from the
timer element had already been removed during the shutdown process.

The function is smi_timeout().

We think ffff8810588b5a00 in RDX is the struct smi_info passed as a parameter.

crash> rd ffff8810588b5a00 20
ffff8810588b5a00:  ffff8810588b6000 0000000000000000   .`.X............
ffff8810588b5a10:  ffff880853264400 ffffffffa05417e0   .D&S......T.....
ffff8810588b5a20:  24a024a000000000 0000000000000000   .....$.$........
ffff8810588b5a30:  0000000000000000 0000000000000000   ................
ffff8810588b5a40:  ffffffffa053a040 ffffffffa053a060   @.S.....`.S.....
ffff8810588b5a50:  0000000000000000 0000000100000001   ................
ffff8810588b5a60:  0000000000000000 0000000000000e00   ................
ffff8810588b5a70:  ffffffffa053a580 ffffffffa053a6e0   ..S.......S.....
ffff8810588b5a80:  ffffffffa053a4a0 ffffffffa053a250   ..S.....P.S.....
ffff8810588b5a90:  0000000500000002 0000000000000000   ................

Unfortunately, the top of this area has already been destroyed by someone.
But for two reasons we think this is a struct smi_info:
 1) The addresses stored between ffff8810588b5a70 and ffff8810588b5a80
  are inside ipmi_si_intf.c; see crash> module ffff88085779d2c0

 2) We've found the area that points to it.
  It is at offset 0x68 of ffff880859df4000

crash> rd  ffff880859df4000 100
ffff880859df4000:  0000000000000000 0000000000000001   ................
ffff880859df4010:  ffffffffa0535290 dead000000000200   .RS.............
ffff880859df4020:  ffff880859df4020 ffff880859df4020    @.Y.... @.Y....
ffff880859df4030:  0000000000000002 0000000000100010   ................
ffff880859df4040:  ffff880859df4040 ffff880859df4040   @@.Y....@@.Y....
ffff880859df4050:  0000000000000000 0000000000000000   ................
ffff880859df4060:  0000000000000000 ffff8810588b5a00   .........Z.X....
ffff880859df4070:  0000000000000001 ffff880859df4078   ........x@.Y....

If we regard it as a struct ipmi_smi in the shutdown process,
it looks consistent.
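Reason 2 is essentially a search of dumped memory for the suspect smi_info address. As a minimal userspace sketch (the helper find_qword_offset and the array name are hypothetical, and only the 16 quadwords shown in the rd output are used), the scan below reports the byte offset of the first matching quadword. For ffff8810588b5a00 the match is word index 13, so the offset is 13 * 8 = 0x68, in agreement with the analysis.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: return the byte offset of the first 8-byte word
 * in `words` equal to `target`, or -1 if no word matches. The words are
 * assumed to be in the order crash's `rd` prints them (lowest address
 * first), so the offset is relative to the dump's start address.
 */
long find_qword_offset(const uint64_t *words, size_t count, uint64_t target)
{
	assert(words != NULL || count == 0);
	for (size_t i = 0; i < count; i++) {
		if (words[i] == target)
			return (long)(i * sizeof(uint64_t));
	}
	return -1;
}

/* The 16 quadwords at ffff880859df4000 shown in the rd output above. */
const uint64_t dump_ffff880859df4000[16] = {
	0x0000000000000000ULL, 0x0000000000000001ULL,
	0xffffffffa0535290ULL, 0xdead000000000200ULL,
	0xffff880859df4020ULL, 0xffff880859df4020ULL,
	0x0000000000000002ULL, 0x0000000000100010ULL,
	0xffff880859df4040ULL, 0xffff880859df4040ULL,
	0x0000000000000000ULL, 0x0000000000000000ULL,
	0x0000000000000000ULL, 0xffff8810588b5a00ULL,
	0x0000000000000001ULL, 0xffff880859df4078ULL,
};
```

Calling find_qword_offset(dump_ffff880859df4000, 16, 0xffff8810588b5a00ULL) yields 0x68; a value that does not appear in the dump yields -1.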

The remedy for this apparent race is affixed below.

Signed-off-by: Tony Camuso <tcamuso@redhat.com>
Cc: stable@vger.kernel.org # 3.19

This was first introduced in 7ea0ed2b5b ("ipmi: Make the
message handler easier to use for SMI interfaces"),
where some code was moved outside of the rcu_read_lock()
and the lock was not added.

Signed-off-by: Corey Minyard <cminyard@mvista.com>
2017-06-19 12:49:34 -05:00
accessibility
acpi libnvdimm for 4.12 2017-05-05 18:49:20 -07:00
amba
android
ata Merge branch 'for-4.12' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/libata 2017-05-01 13:34:49 -07:00
atm
auxdisplay Merge branch 'for-4.11/libnvdimm' into for-4.12/dax 2017-04-12 21:59:01 -07:00
base DeviceTree for 4.12: 2017-05-05 19:33:07 -07:00
bcma
block Merge branch 'for-linus' of git://git.kernel.dk/linux-block 2017-05-06 11:25:08 -07:00
bluetooth Bluetooth: hci_ldisc: Add protocol check to hci_uart_tx_wakeup() 2017-04-30 12:22:14 +02:00
bus
cdrom scsi: introduce a result field in struct scsi_request 2017-04-20 12:16:10 -06:00
char ipmi: use rcu lock around call to intf->handlers->sender() 2017-06-19 12:49:34 -05:00
clk MMC core: 2017-05-02 17:34:32 -07:00
clocksource Merge branch 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2017-05-01 16:15:18 -07:00
connector
cpufreq Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2017-05-01 19:12:53 -07:00
cpuidle Merge branches 'pm-cpuidle', 'pm-core', 'pm-domains', 'pm-avs' and 'pm-devfreq' 2017-04-28 23:15:34 +02:00
crypto Merge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 2017-05-02 15:53:46 -07:00
dax libnvdimm for 4.12 2017-05-05 18:49:20 -07:00
dca
devfreq
dio
dma
dma-buf dma-buf: Rename dma-ops to prevent conflict with kunmap_atomic macro 2017-04-20 13:47:46 +05:30
edac EDAC, ghes: Do not enable it by default 2017-04-27 14:15:38 +02:00
eisa
extcon
firewire
firmware arm64 updates for 4.12: 2017-05-05 12:11:37 -07:00
fmc
fpga fpga fr br: update supported version numbers 2017-04-26 11:38:56 +02:00
fsi
gpio char/misc patches for 4.12-rc1 2017-05-04 19:15:35 -07:00
gpu media updates for v4.12-rc1 2017-05-05 17:34:57 -07:00
hid Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial 2017-05-02 19:09:35 -07:00
hsi HSI: ssi_protocol: double free in ssip_pn_xmit() 2017-04-21 17:58:45 +02:00
hv char/misc patches for 4.12-rc1 2017-05-04 19:15:35 -07:00
hwmon hwmon: (twl4030-madc) drop driver 2017-04-30 11:45:31 -07:00
hwspinlock
hwtracing char/misc patches for 4.12-rc1 2017-05-04 19:15:35 -07:00
i2c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input 2017-05-03 12:38:20 -07:00
ide ide-pm: always pass 0 error to ide_complete_rq in ide_do_devset 2017-04-26 07:53:35 -06:00
idle
iio Staging/IIO patches for 4.12-rc1 2017-05-05 18:16:23 -07:00
infiniband char/misc patches for 4.12-rc1 2017-05-04 19:15:35 -07:00
input char/misc patches for 4.12-rc1 2017-05-04 19:15:35 -07:00
iommu extra pull request because I missed tegra. 2017-05-05 17:18:44 -07:00
ipack
irqchip irqchip/mbigen: Fix return value check in mbigen_device_probe() 2017-04-30 11:21:16 +02:00
isdn
leds leds: pca9532: Extend pca9532 device tree support 2017-04-19 20:27:50 +02:00
lguest
lightnvm lightnvm: fix bad back free on error path 2017-05-04 07:53:04 -06:00
macintosh DeviceTree for 4.12: 2017-05-05 19:33:07 -07:00
mailbox mailbox: handle empty message in tx_tick 2017-04-27 16:20:04 +05:30
mcb
md Merge branch 'for-linus' of git://git.kernel.dk/linux-block 2017-05-06 11:25:08 -07:00
media Staging/IIO patches for 4.12-rc1 2017-05-05 18:16:23 -07:00
memory - New Drivers 2017-05-03 12:16:25 -07:00
memstick
message scsi: mpt: Move scsi_remove_host() out of mptscsih_remove_host() 2017-04-24 18:21:17 -04:00
mfd mfd: axp20x: Support AXP803 variant 2017-04-27 11:54:49 +01:00
misc powerpc updates for 4.12 part 1. 2017-05-05 11:36:44 -07:00
mmc MMC core: 2017-05-02 17:34:32 -07:00
mtd Merge branch 'for-linus' of git://git.kernel.dk/linux-block 2017-05-06 11:25:08 -07:00
net qede: Fix possible misconfiguration of advertised autoneg value. 2017-05-04 12:31:03 -04:00
nfc nfc: fix get_unaligned_...() misuses 2017-04-17 00:42:22 +02:00
ntb
nubus nubus: Clean up whitespace 2017-04-20 09:54:24 +02:00
nvdimm libnvdimm for 4.12 2017-05-05 18:49:20 -07:00
nvme lightnvm: create cmd before allocating request 2017-05-04 07:53:04 -06:00
nvmem
of DeviceTree for 4.12: 2017-05-05 19:33:07 -07:00
oprofile
parisc
parport
pci main drm pull request for 4.12 kernel 2017-05-03 11:44:24 -07:00
pcmcia
perf
phy
pinctrl Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial 2017-05-02 19:09:35 -07:00
platform char/misc patches for 4.12-rc1 2017-05-04 19:15:35 -07:00
pnp
power ACPI updates for v4.12-rc1 2017-05-01 14:13:28 -07:00
powercap
pps
ps3
ptp Merge branch 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2017-05-01 16:15:18 -07:00
pwm
rapidio char/misc patches for 4.12-rc1 2017-05-04 19:15:35 -07:00
ras
regulator Merge remote-tracking branch 'regulator/topic/vctrl' into regulator-next 2017-04-30 22:17:44 +09:00
remoteproc
reset
rpmsg
rtc
s390 libnvdimm for 4.12 2017-05-05 18:49:20 -07:00
sbus
scsi Merge branch 'for-linus' of git://git.kernel.dk/linux-block 2017-05-06 11:25:08 -07:00
sfi
sh
sn
soc Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next 2017-05-02 16:40:27 -07:00
spi Merge remote-tracking branches 'spi/topic/ti-qspi' and 'spi/topic/xlp' into spi-next 2017-04-26 15:58:22 +01:00
spmi
ssb
staging Staging/IIO patches for 4.12-rc1 2017-05-05 18:16:23 -07:00
target Merge branch 'for-4.12/block' of git://git.kernel.dk/linux-block 2017-05-01 10:39:57 -07:00
tc
thermal - New Drivers 2017-05-03 12:16:25 -07:00
thunderbolt
tty DeviceTree for 4.12: 2017-05-05 19:33:07 -07:00
uio
usb DeviceTree for 4.12: 2017-05-05 19:33:07 -07:00
uwb
vfio powerpc updates for 4.12 part 1. 2017-05-05 11:36:44 -07:00
vhost VSOCK: Add virtio vsock vsockmon hooks 2017-04-24 12:35:56 -04:00
video - New Drivers 2017-05-03 12:11:44 -07:00
virt
virtio
vlynq
vme
w1
watchdog watchdog: iTCO_wdt: Add PMC specific noreboot update api 2017-04-28 21:51:28 +03:00
xen xen: Implement EFI reset_system callback 2017-05-02 12:06:50 +02:00
zorro
Kconfig
Makefile libnvdimm for 4.12 2017-05-05 18:49:20 -07:00