linux/drivers/edac
Daniel J Blueman 0c510cc83b EDAC, amd64_edac: Prevent OOPS with >16 memory controllers
When DRAM errors occur on memory controllers beyond the first
EDAC_MAX_MCS (16), the kernel dereferences unallocated structures and
oopses (see the splat below). This occurs on at least NumaConnect systems.

Fix by checking whether a memory controller info structure was found
before using it.
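
The guard described above can be illustrated with a minimal user-space sketch. This is not the actual amd64_edac.c change: `struct mc_info`, `mc_table`, `NUM_NODES`, `decode_error_for_node()` and `demo()` are hypothetical names chosen for illustration, and only the limit of 16 is taken from the commit message. The idea is that a per-node lookup can return NULL for a node that never had a structure allocated, so the decoder must bail out instead of dereferencing it.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_MCS   16   /* mirrors EDAC_MAX_MCS (16) from the commit message */
#define NUM_NODES 32   /* hypothetical: more nodes than allocated controllers */

/* Hypothetical stand-in for a per-controller info structure. */
struct mc_info {
	int mc_idx;
	unsigned long errors_decoded;
};

/* Per-node lookup table; entries stay NULL where nothing was allocated. */
static struct mc_info *mc_table[NUM_NODES];

/*
 * Decode an error reported for node_id.
 * Returns 0 if the error was attributed to a controller,
 * -1 if no controller info structure exists for that node.
 */
static int decode_error_for_node(unsigned int node_id)
{
	struct mc_info *mci;

	if (node_id >= NUM_NODES)
		return -1;

	mci = mc_table[node_id];
	if (!mci)		/* the guard: nothing allocated for this node */
		return -1;

	mci->errors_decoded++;
	return 0;
}

/* Small usage example exercising both the allocated and the NULL path. */
static void demo(void)
{
	static struct mc_info mc0 = { 0, 0 };

	mc_table[0] = &mc0;
	assert(decode_error_for_node(0) == 0);       /* allocated: decoded */
	assert(decode_error_for_node(MAX_MCS) == -1); /* never allocated: skipped */
}
```

Without the `if (!mci)` check, the second call in `demo()` would dereference a NULL pointer, which is the user-space analogue of the oops shown below.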

BUG: unable to handle kernel NULL pointer dereference at 0000000000000320
IP: [<ffffffff819f714f>] decode_bus_error+0x2f/0x2b0
PGD 2f8b5a3067 PUD 2f8b5a2067 PMD 0
Oops: 0000 [#2] SMP
Modules linked in:
CPU: 224 PID: 11930 Comm: stream_c.exe.gn Tainted: G   D    3.19.0 #1
Hardware name: Supermicro H8QGL/H8QGL, BIOS 3.5b    01/28/2015
task: ffff8807dbfb8c00 ti: ffff8807dd16c000 task.ti: ffff8807dd16c000
RIP: 0010:[<ffffffff819f714f>] [<ffffffff819f714f>] decode_bus_error+0x2f/0x2b0
RSP: 0000:ffff8907dfc03c48 EFLAGS: 00010297
RAX: 0000000000000001 RBX: 9c67400010080a13 RCX: 0000000000001dc6
RDX: 000000001dc61dc6 RSI: ffff8907dfc03df0 RDI: 000000000000001c
RBP: ffff8907dfc03ce8 R08: 0000000000000000 R09: 0000000000000022
R10: ffff891fffa30380 R11: 00000000001cfc90 R12: 0000000000000008
R13: 0000000000000000 R14: 000000000000001c R15: 00009c6740001000
FS: 00007fa97ee18700(0000) GS:ffff8907dfc00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000320 CR3: 0000003f889b8000 CR4: 00000000000407e0
Stack:
 0000000000000000 ffff8907dfc03df0 0000000000000008 9c67400010080a13
 000000000000001c 00009c6740001000 ffff8907dfc03c88 ffffffff810e4f9a
 ffff8907dfc03ce8 ffffffff81b375b9 0000000000000000 0000000000000010
Call Trace:
 <IRQ>
 ? vprintk_default
 ? printk
 amd_decode_mce
 notifier_call_chain
 atomic_notifier_call_chain
 mce_log
 machine_check_poll
 mce_timer_fn
 ? mce_cpu_restart
 call_timer_fn.isra.29
 run_timer_softirq
 __do_softirq
 irq_exit
 smp_apic_timer_interrupt
 apic_timer_interrupt
 <EOI>
 ? down_read_trylock
 __do_page_fault
 ? __schedule
 do_page_fault
 page_fault

Signed-off-by: Daniel J Blueman <daniel@numascale.com>
Link: http://lkml.kernel.org/r/1424144078-24589-1-git-send-email-daniel@numascale.com
Cc: stable@vger.kernel.org
[ Boris: massage commit message ]
Signed-off-by: Borislav Petkov <bp@suse.de>
2015-02-17 10:32:12 +01:00
altera_edac.c edac: altera: Add Altera SDRAM EDAC support 2014-09-04 13:41:46 -05:00
amd64_edac_dbg.c amd64_edac: convert sysfs logic to use struct device 2012-06-11 13:23:40 -03:00
amd64_edac_inj.c EDAC: Replace strict_strtoul() with kstrtoul() 2013-06-08 10:16:33 +02:00
amd64_edac.c EDAC, amd64_edac: Prevent OOPS with >16 memory controllers 2015-02-17 10:32:12 +01:00
amd64_edac.h amd64_edac: Add F15h M60h support 2014-10-30 13:42:48 +01:00
amd76x_edac.c EDAC: Remove DEFINE_PCI_DEVICE_TABLE macro 2013-12-06 10:23:41 +01:00
amd8111_edac.c amd8111_edac: Fix leaks in probe error paths 2014-02-25 10:09:09 +01:00
amd8111_edac.h
amd8131_edac.c edac: Drop __DATE__ usage 2011-04-19 00:23:22 +02:00
amd8131_edac.h tree-wide: fix comment/printk typos 2010-11-01 15:38:34 -04:00
cell_edac.c edac: drop owner assignment from platform_drivers 2014-10-20 16:20:30 +02:00
cpc925_edac.c cpc925_edac: Report UE events properly 2014-10-22 22:58:45 +02:00
e7xxx_edac.c e7xxx_edac: Report CE events properly 2014-10-22 22:59:00 +02:00
e752x_edac.c e752x_edac: Drop pvt->bridge_ck 2014-02-25 10:01:30 +01:00
edac_core.h EDAC: Fix mem_types strings type 2014-09-02 09:11:16 +02:00
edac_device_sysfs.c edac: Convert debugfX to edac_dbg(X, 2012-06-11 13:23:49 -03:00
edac_device.c EDAC: Don't try to cancel workqueue when it's never setup 2014-01-10 15:57:36 +01:00
edac_mc_sysfs.c sb_edac: Fix off-by-one error in number of channels 2014-12-02 12:06:51 -02:00
edac_mc.c EDAC: Sync memory types and names 2014-10-20 14:22:50 +02:00
edac_module.c EDAC, edac_module.c: Remove unnecessary test on unsigned value 2014-06-24 15:13:08 +02:00
edac_module.h EDAC: Poll timeout cannot be zero, p2 2014-02-14 10:40:29 +01:00
edac_pci_sysfs.c EDAC, pci_sysfs: remove unneccessary ifdef around entire file 2014-11-11 18:17:57 +01:00
edac_pci.c edac: Unify reporting of device info for device, mc and pci 2013-11-04 17:01:09 -06:00
edac_stub.c EDAC: Add an edac_report parameter to EDAC 2013-12-11 18:06:47 +01:00
ghes_edac.c ghes_edac: Use snprintf() to silence a static checker warning 2014-11-11 18:08:56 +01:00
highbank_l2_edac.c edac, highbank: Improve and unify naming 2013-11-04 17:01:07 -06:00
highbank_mc_edac.c edac, highbank: Moving error injection to sysfs for edac 2013-11-04 17:01:11 -06:00
i7core_edac.c Merge branch 'v4l_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media 2014-04-04 09:50:07 -07:00
i3000_edac.c EDAC: Delete unnecessary check before calling pci_dev_put() 2014-11-19 16:33:48 +01:00
i3200_edac.c EDAC updates all over the place: 2014-12-08 20:17:49 -08:00
i5000_edac.c EDAC: Remove DEFINE_PCI_DEVICE_TABLE macro 2013-12-06 10:23:41 +01:00
i5100_edac.c i5100_edac: Remove an unneeded condition in i5100_init_csrows() 2014-02-20 11:52:58 +01:00
i5400_edac.c Merge branch 'v4l_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media 2014-04-04 09:50:07 -07:00
i7300_edac.c Linux 3.14-rc5 2014-03-11 06:55:49 -03:00
i82443bxgx_edac.c EDAC: Delete unnecessary check before calling pci_dev_put() 2014-11-19 16:33:48 +01:00
i82860_edac.c i82860_edac: Report CE events properly 2014-10-22 22:58:31 +02:00
i82875p_edac.c Merge branches 'pci/host-exynos', 'pci/host-imx6', 'pci/resource' and 'pci/misc' into next 2014-05-30 11:41:17 -06:00
i82975x_edac.c EDAC: Remove DEFINE_PCI_DEVICE_TABLE macro 2013-12-06 10:23:41 +01:00
ie31200_edac.c ie31200_edac: Allocate mci and map mchbar first 2014-07-10 10:55:12 +02:00
Kconfig EDAC, mce_amd_inj: Convert mce_amd_inj module to debugfs 2014-11-25 13:09:33 +01:00
Makefile EDAC, pci_sysfs: remove unneccessary ifdef around entire file 2014-11-11 18:17:57 +01:00
mce_amd_inj.c EDAC, mce_amd_inj: Add an injector function 2014-11-25 13:09:45 +01:00
mce_amd.c EDAC, MCE, AMD: Correct formatting of decoded text 2014-11-25 13:09:49 +01:00
mce_amd.h x86, mce, severity: Extend the the mce_severity mechanism to handle UCNA/DEFERRED error 2014-11-19 10:55:43 -08:00
mpc85xx_edac.c edac: drop owner assignment from platform_drivers 2014-10-20 16:20:30 +02:00
mpc85xx_edac.h edac/85xx: Add PCIe error interrupt edac support 2013-11-25 11:29:15 +01:00
mv64x60_edac.c {mv64x60,ppc4xx}_edac,: Remove deprecated IRQF_DISABLED 2014-10-20 14:23:09 +02:00
mv64x60_edac.h edac: Drop __DATE__ usage 2011-04-19 00:23:22 +02:00
octeon_edac-l2c.c Drivers: edac: remove __dev* attributes. 2013-01-03 15:57:03 -08:00
octeon_edac-lmc.c EDAC: Octeon: Add error injection support 2014-03-31 18:17:12 +02:00
octeon_edac-pc.c Drivers: edac: remove __dev* attributes. 2013-01-03 15:57:03 -08:00
octeon_edac-pci.c Drivers: edac: remove __dev* attributes. 2013-01-03 15:57:03 -08:00
pasemi_edac.c Drivers: edac: remove __dev* attributes. 2013-01-03 15:57:03 -08:00
ppc4xx_edac.c Driver core patches for 3.19-rc1 2014-12-14 16:10:09 -08:00
ppc4xx_edac.h
r82600_edac.c EDAC: Remove DEFINE_PCI_DEVICE_TABLE macro 2013-12-06 10:23:41 +01:00
sb_edac.c sb_edac: Fix detection on SNB machines 2015-02-09 16:55:26 +01:00
tile_edac.c edac: drop owner assignment from platform_drivers 2014-10-20 16:20:30 +02:00
x38_edac.c EDAC: Delete unnecessary check before calling pci_dev_put() 2014-11-19 16:33:48 +01:00