linux/arch
Igor Mammedov 89f898c1e1 x86: Fix list/memory corruption on CPU hotplug
currently if AP wake up is failed, master CPU marks AP as not
present in do_boot_cpu() by calling set_cpu_present(cpu, false).
That leads to following list corruption on the next physical CPU
hotplug:

[  418.107336] WARNING: CPU: 1 PID: 45 at lib/list_debug.c:33 __list_add+0xbe/0xd0()
[  418.115268] list_add corruption. prev->next should be next (ffff88003dc57600), but was ffff88003e20c3a0. (prev=ffff88003e20c3a0).
[  418.123693] Modules linked in: nf_conntrack_netbios_ns nf_conntrack_broadcast ipt_MASQUERADE ip6t_REJECT ipt_REJECT cfg80211 xt_conntrack rfkill ee
[  418.138979] CPU: 1 PID: 45 Comm: kworker/u10:1 Not tainted 3.14.0-rc6+ #387
[  418.149989] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2007
[  418.165750] Workqueue: kacpi_hotplug acpi_hotplug_work_fn
[  418.166433]  0000000000000021 ffff880038ca7988 ffffffff8159b22d 0000000000000021
[  418.176460]  ffff880038ca79d8 ffff880038ca79c8 ffffffff8106942c ffff880038ca79e8
[  418.177453]  ffff88003e20c3a0 ffff88003dc57600 ffff88003e20c3a0 00000000ffffffea
[  418.178445] Call Trace:
[  418.185811]  [<ffffffff8159b22d>] dump_stack+0x49/0x5c
[  418.186440]  [<ffffffff8106942c>] warn_slowpath_common+0x8c/0xc0
[  418.187192]  [<ffffffff81069516>] warn_slowpath_fmt+0x46/0x50
[  418.191231]  [<ffffffff8136ef51>] ? acpi_ns_get_node+0xb7/0xc7
[  418.193889]  [<ffffffff812f796e>] __list_add+0xbe/0xd0
[  418.196649]  [<ffffffff812e2aa9>] kobject_add_internal+0x79/0x200
[  418.208610]  [<ffffffff812e2e18>] kobject_add_varg+0x38/0x60
[  418.213831]  [<ffffffff812e2ef4>] kobject_add+0x44/0x70
[  418.229961]  [<ffffffff813e2c60>] device_add+0xd0/0x550
[  418.234991]  [<ffffffff813f0e95>] ? pm_runtime_init+0xe5/0xf0
[  418.250226]  [<ffffffff813e32be>] device_register+0x1e/0x30
[  418.255296]  [<ffffffff813e82a3>] register_cpu+0xe3/0x130
[  418.266539]  [<ffffffff81592be5>] arch_register_cpu+0x65/0x150
[  418.285845]  [<ffffffff81355c0d>] acpi_processor_hotadd_init+0x5a/0x9b
...
Which is caused by the fact that generic_processor_info() allocates
logical CPU id by calling:

 cpu = cpumask_next_zero(-1, cpu_present_mask);

which returns id of previously failed to wake up CPU, since its
bit is cleared by do_boot_cpu() and as result register_cpu()
tries to register another CPU with the same id as already
present but failed to be onlined CPU.

Taking in account that AP will not do anything if master CPU
failed to wake it up, there is no reason to mark that AP as not
present and break next cpu hotplug attempts. As a side effect of
not marking AP as not present, user would be allowed to online
it again later.

Also fix memory corruption in acpi_unmap_lsapic()

if during CPU hotplug master CPU failed to wake up AP
it set percpu x86_cpu_to_apicid to BAD_APICID=0xFFFF for AP.

However following attempt to unplug that CPU will lead to
out of bound write access to __apicid_to_node[] which is
32768 items long on x86_64 kernel.

So with above fix of cpu_present_mask make sure that a present
CPU has a valid APIC ID by not setting x86_cpu_to_apicid
to BAD_APICID in do_boot_cpu() on failure and allow
acpi_processor_remove()->acpi_unmap_lsapic() cleanly remove CPU.

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Acked-by: Toshi Kani <toshi.kani@hp.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1401975765-22328-2-git-send-email-imammedo@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-06-05 16:33:07 +02:00
..
alpha Merge git://git.infradead.org/users/eparis/audit 2014-04-12 12:38:53 -07:00
arc ARC: !PREEMPT: Ensure Return to kernel mode is IRQ safe 2014-04-30 08:21:43 -07:00
arm Merge branch 'fixes' of git://ftp.arm.linux.org.uk/~rmk/linux-arm 2014-05-29 18:31:09 -07:00
arm64 arm64: mm: fix pmd_write CoW brokenness 2014-05-29 11:31:14 +01:00
avr32 Char/Misc driver patches for 3.15-rc1 2014-04-01 16:13:21 -07:00
blackfin blackfin updates for Linux 3.15 2014-04-12 17:26:45 -07:00
c6x Merge branch 'core-locking-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2014-03-31 10:59:39 -07:00
cris Kconfig: rename HAS_IOPORT to HAS_IOPORT_MAP 2014-04-07 16:36:11 -07:00
frv PCI changes for the v3.15 merge window: 2014-04-01 15:14:04 -07:00
hexagon Hexagon: Delete stale barrier.h 2014-05-01 10:09:47 -07:00
ia64 ia64: add renameat2 syscall 2014-05-20 10:59:38 +02:00
m32r Kconfig: rename HAS_IOPORT to HAS_IOPORT_MAP 2014-04-07 16:36:11 -07:00
m68k m68k: add renameat2 syscall 2014-05-20 10:59:37 +02:00
metag metag: Remove _STK_LIM_MAX override 2014-05-15 00:30:32 +01:00
microblaze Microblaze patches for 3.15-rc1 2014-04-11 11:53:45 -07:00
mips Merge branch 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus 2014-06-01 18:28:58 -07:00
mn10300 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2014-04-12 14:49:50 -07:00
openrisc Kconfig: rename HAS_IOPORT to HAS_IOPORT_MAP 2014-04-07 16:36:11 -07:00
parisc parisc: 'renameat2()' doesn't need (or have) a separate compat system call 2014-05-23 09:23:51 -07:00
powerpc Merge branch 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc 2014-06-01 18:30:07 -07:00
s390 Small fixes for x86, slightly larger fixes for PPC, and a forgotten s390 patch. 2014-05-28 08:08:03 -07:00
score score: remove unused CPU_SCORE7 Kconfig parameter 2014-04-03 16:20:52 -07:00
sh mm: split 'tlb_flush_mmu()' into tlb flushing and memory freeing parts 2014-04-25 16:05:40 -07:00
sparc sparc64: fix format string mismatch in arch/sparc/kernel/sysfs.c 2014-05-21 12:54:42 -07:00
tile Kconfig: rename HAS_IOPORT to HAS_IOPORT_MAP 2014-04-07 16:36:11 -07:00
um mm: split 'tlb_flush_mmu()' into tlb flushing and memory freeing parts 2014-04-25 16:05:40 -07:00
unicore32 Kconfig: rename HAS_IOPORT to HAS_IOPORT_MAP 2014-04-07 16:36:11 -07:00
x86 x86: Fix list/memory corruption on CPU hotplug 2014-06-05 16:33:07 +02:00
xtensa Xtensa patchset for v3.15. 2014-05-05 15:36:59 -07:00
.gitignore
Kconfig