linux/arch/x86/kernel
Vitaly Kuznetsov 59107e2f48 x86/hyperv: Handle unknown NMIs on one CPU when unknown_nmi_panic
There is a feature in Hyper-V ('Debug-VM --InjectNonMaskableInterrupt')
which injects NMI to the guest. We may want to crash the guest and do kdump
on this NMI by enabling unknown_nmi_panic. To make kdump succeed we need to
allow the kdump kernel to re-establish VMBus connection so it will see
VMBus devices (storage, network,..).

To properly unload VMBus making it possible to start over during kdump we
need to do the following:

 - Send an 'unload' message to the hypervisor. This can be done on any CPU
   so we do this the crashing CPU.

 - Receive the 'unload finished' reply message. WS2012R2 delivers this
   message to the CPU which was used to establish VMBus connection during
   module load and this CPU may differ from the CPU sending 'unload'.

Receiving a VMBus message means the following:

 - There is a per-CPU slot in memory for one message. This slot can in
   theory be accessed by any CPU.

 - We get an interrupt on the CPU when a message was placed into the slot.

 - When we read the message we need to clear the slot and signal the fact
   to the hypervisor. In case there are more messages to this CPU pending
   the hypervisor will deliver the next message. The signaling is done by
   writing to an MSR so this can only be done on the appropriate CPU.

To avoid doing cross-CPU work on crash we have vmbus_wait_for_unload()
function which checks message slots for all CPUs in a loop waiting for the
'unload finished' messages. However, there is an issue which arises when
these conditions are met:

 - We're crashing on a CPU which is different from the one which was used
   to initially contact the hypervisor.

 - The CPU which was used for the initial contact is blocked with interrupts
   disabled and there is a message pending in the message slot.

In this case we won't be able to read the 'unload finished' message on the
crashing CPU. This is reproducible when we receive unknown NMIs on all CPUs
simultaneously: the first CPU entering panic() will proceed to crash and
all other CPUs will stop themselves with interrupts disabled.

The suggested solution is to handle unknown NMIs for Hyper-V guests on the
first CPU which gets them only. This will allow us to rely on VMBus
interrupt handler being able to receive the 'unload finish' message in
case it is delivered to a different CPU.

The issue is not reproducible on WS2016 as Debug-VM delivers NMI to the
boot CPU only, WS2012R2 and earlier Hyper-V versions are affected.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Acked-by: K. Y. Srinivasan <kys@microsoft.com>
Cc: devel@linuxdriverproject.org
Cc: Haiyang Zhang <haiyangz@microsoft.com>
Link: http://lkml.kernel.org/r/20161202100720.28121-1-vkuznets@redhat.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-12-20 09:31:48 +01:00
..
acpi x86/init: Add i8042 state to the platform data 2016-12-19 11:34:15 +01:00
apic x86/smpboot: Make logical package management more robust 2016-12-13 10:22:39 +01:00
cpu x86/hyperv: Handle unknown NMIs on one CPU when unknown_nmi_panic 2016-12-20 09:31:48 +01:00
fpu Merge branch 'x86-fpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2016-12-12 14:27:49 -08:00
kprobes kprobes: Unpoison stack in jprobe_return() for KASAN 2016-10-16 11:02:31 +02:00
.gitignore
alternative.c x86/asm: Stop depending on ptrace.h in alternative.h 2016-04-29 11:56:40 +02:00
amd_gart_64.c dma-mapping: use unsigned long for dma_attrs 2016-08-04 08:50:07 -04:00
amd_nb.c x86/amd_nb: Add SMN and Indirect Data Fabric access for AMD Fam17h 2016-11-16 20:46:38 +01:00
apb_timer.c x86/apb_timer: Convert to hotplug state machine 2016-07-15 10:40:22 +02:00
aperture_64.c param: convert some "on"/"off" users to strtobool 2016-03-17 15:09:34 -07:00
apm_32.c Merge branch 'linus' into sched/core, to pick up fixes 2016-11-16 10:16:28 +01:00
asm-offsets_32.c sched/x86: Rewrite the switch_to() code 2016-08-24 12:31:41 +02:00
asm-offsets_64.c sched/x86: Rewrite the switch_to() code 2016-08-24 12:31:41 +02:00
asm-offsets.c x86: Move thread_info into task_struct 2016-09-15 08:25:13 +02:00
audit_64.c
bootflag.c
check.c
cpuid.c Merge branch 'smp-hotplug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2016-12-12 19:25:04 -08:00
crash_dump_32.c
crash_dump_64.c
crash.c kexec_file: Change kexec_add_buffer to take kexec_buf as argument. 2016-11-30 23:14:59 +11:00
devicetree.c x86/cpufeature: Replace cpu_has_apic with boot_cpu_has() usage 2016-04-13 11:37:41 +02:00
doublefault.c
dumpstack_32.c x86/dumpstack: Make stack name tags more comprehensible 2016-11-21 13:00:42 +01:00
dumpstack_64.c x86/dumpstack: Make stack name tags more comprehensible 2016-11-21 13:00:42 +01:00
dumpstack.c Merge branch 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2016-12-12 13:49:57 -08:00
e820.c x86/e820: Don't merge consecutive E820_PRAM ranges 2016-10-16 11:16:48 +02:00
early_printk.c x86: Fix misspellings in comments 2016-02-24 08:44:58 +01:00
early-quirks.c Merge tag 'drm-for-v4.9' of git://people.freedesktop.org/~airlied/linux 2016-10-11 18:12:22 -07:00
ebda.c x86/boot: Simplify EBDA-vs-BIOS reservation logic 2016-07-22 11:46:01 +02:00
espfix_64.c x86: get rid of superfluous __GFP_REPEAT 2016-06-24 17:23:52 -07:00
ftrace.c ftrace/x86: Implement HAVE_FUNCTION_GRAPH_RET_ADDR_PTR 2016-08-24 12:15:15 +02:00
head32.c x86/boot: Run reserve_bios_regions() after we initialize the memory map 2016-08-11 11:14:59 +02:00
head64.c x86/boot: Run reserve_bios_regions() after we initialize the memory map 2016-08-11 11:14:59 +02:00
head_32.S Merge branch 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2016-12-12 13:49:57 -08:00
head_64.S x86/boot/64: Push correct start_cpu() return address 2016-12-14 08:48:05 +01:00
hpet.c x86/hpet: Reduce HPET counter read contention 2016-09-09 15:16:19 +02:00
hw_breakpoint.c x86/kernel: Audit and remove any unnecessary uses of module.h 2016-07-14 15:06:41 +02:00
i8237.c
i8253.c x86/kernel: Audit and remove any unnecessary uses of module.h 2016-07-14 15:06:41 +02:00
i8259.c x86/irq: Probe for PIC presence before allocating descs for legacy IRQs 2015-11-07 10:37:37 +01:00
io_delay.c x86/kernel: Audit and remove any unnecessary uses of module.h 2016-07-14 15:06:41 +02:00
ioport.c x86/iopl: Fix iopl capability check on Xen PV 2016-03-17 09:49:27 +01:00
irq_32.c x86/kernel: Audit and remove any unnecessary uses of module.h 2016-07-14 15:06:41 +02:00
irq_64.c x86: Remove empty idle.h header 2016-12-09 21:23:22 +01:00
irq_work.c treewide: Remove old email address 2015-11-23 09:44:58 +01:00
irq.c x86: Remove empty idle.h header 2016-12-09 21:23:22 +01:00
irqinit.c
itmt.c x86/sched: Use #include <linux/mutex.h> instead of #include <asm/mutex.h> 2016-11-28 09:43:49 +01:00
jump_label.c x86/asm: Stop depending on ptrace.h in alternative.h 2016-04-29 11:56:40 +02:00
kdebugfs.c x86/kernel: Audit and remove any unnecessary uses of module.h 2016-07-14 15:06:41 +02:00
kexec-bzimage64.c kexec_file: Change kexec_add_buffer to take kexec_buf as argument. 2016-11-30 23:14:59 +11:00
kgdb.c sched/x86: Add 'struct inactive_task_frame' to better document the sleeping task stack frame 2016-08-24 12:27:41 +02:00
ksysfs.c x86: Apply more __ro_after_init and const 2016-08-10 14:55:05 +02:00
kvm.c Merge branch 'x86-idle-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2016-12-12 14:55:04 -08:00
kvmclock.c Merge branch 'linus' into x86/asm, to pick up recent fixes 2016-09-15 08:24:53 +02:00
ldt.c Merge branch 'akpm' (patches from Andrew) 2016-12-12 20:50:02 -08:00
livepatch.c livepatch/x86: apply alternatives and paravirt patches after relocations 2016-08-18 23:41:55 +02:00
machine_kexec_32.c
machine_kexec_64.c kexec: export the value of phys_base instead of symbol address 2016-12-14 16:04:07 -08:00
Makefile Merge branch 'x86-timers-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2016-12-18 13:59:10 -08:00
mcount_64.S x86: Fix export for mcount and __fentry__ 2016-10-26 12:38:17 +02:00
mmconf-fam10h_64.c
module.c x86/asm: Stop depending on ptrace.h in alternative.h 2016-04-29 11:56:40 +02:00
mpparse.c x86/mm/numa: Open code function early_get_boot_cpu_id() 2016-08-15 08:51:54 +02:00
msr.c x86/msr: Convert to hotplug state machine 2016-11-22 23:34:39 +01:00
nmi_selftest.c
nmi.c x86: include linux/ratelimit.h in nmi.c 2016-06-06 17:10:15 +02:00
paravirt_patch_32.c Merge branch 'x86-fpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2016-12-12 14:27:49 -08:00
paravirt_patch_64.c Merge branch 'x86-fpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2016-12-12 14:27:49 -08:00
paravirt-spinlocks.c x86/paravirt: Optimize native pv_lock_ops.vcpu_is_preempted() 2016-11-22 12:48:11 +01:00
paravirt.c x86/fpu: Remove clts() 2016-11-01 07:47:55 +01:00
pci-calgary_64.c dma-mapping: use unsigned long for dma_attrs 2016-08-04 08:50:07 -04:00
pci-dma.c dma-mapping: use unsigned long for dma_attrs 2016-08-04 08:50:07 -04:00
pci-iommu_table.c x86: Fix non-static inlines 2016-04-16 13:21:40 +02:00
pci-nommu.c dma-mapping: use unsigned long for dma_attrs 2016-08-04 08:50:07 -04:00
pci-swiotlb.c dma-mapping: use unsigned long for dma_attrs 2016-08-04 08:50:07 -04:00
pcspeaker.c
perf_regs.c
platform-quirks.c x86/init: Add i8042 state to the platform data 2016-12-19 11:34:15 +01:00
pmem.c x86/kernel: Audit and remove any unnecessary uses of module.h 2016-07-14 15:06:41 +02:00
probe_roms.c
process_32.c Merge branch 'x86-idle-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2016-12-12 14:55:04 -08:00
process_64.c Merge branch 'x86-idle-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2016-12-12 14:55:04 -08:00
process.c Merge branch 'x86-timers-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2016-12-18 13:59:10 -08:00
ptrace.c Merge branch 'x86-vdso-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2016-10-03 17:29:01 -07:00
pvclock.c KVM: x86: introduce get_kvmclock_ns 2016-09-20 09:26:15 +02:00
quirks.c x86/quirks: Hide maybe-uninitialized warning 2016-10-25 11:45:13 +02:00
reboot_fixups_32.c
reboot.c x86: Apply more __ro_after_init and const 2016-08-10 14:55:05 +02:00
relocate_kernel_32.S
relocate_kernel_64.S
resource.c x86/e820: Prepare e280 code for switch to dynamic storage 2016-09-21 15:02:12 +02:00
rtc.c timekeeping: Ignore the bogus sleep time if pm_trace is enabled 2016-11-29 18:02:58 +01:00
setup_percpu.c x86/percpu: Remove unnecessary include of module.h, add asm/desc.h 2016-11-15 07:26:37 +01:00
setup.c mm: remove x86-only restriction of movable_node 2016-12-12 18:55:07 -08:00
signal_compat.c x86/signal: Remove bogus user_64bit_mode() check from sigaction_compat_abi() 2016-10-20 13:05:15 +02:00
signal.c Merge branch 'x86-vdso-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2016-10-03 17:29:01 -07:00
smp.c x86/apic: Prevent tracing on apic_msr_write_eoi() 2016-11-09 22:03:14 +01:00
smpboot.c x86/smpboot: Prevent false positive out of bounds cpumask access warning 2016-12-15 11:32:31 +01:00
stacktrace.c x86/stacktrace: Convert save_stack_trace_*() to use the new unwinder 2016-09-20 08:29:33 +02:00
step.c mm: replace access_process_vm() write parameter with gup_flags 2016-10-19 08:31:25 -07:00
sys_x86_64.c x86: use simpler API for random address requests 2016-10-11 15:06:32 -07:00
sysfb_efi.c Merge branch 'linus' into efi/core, to pick up fixes 2016-05-07 07:00:07 +02:00
sysfb_simplefb.c x86/sysfb: Fix lfb_size calculation 2016-11-16 09:38:23 +01:00
sysfb.c
tboot.c x86/e820: Prepare e280 code for switch to dynamic storage 2016-09-21 15:02:12 +02:00
tce_64.c x86/cpufeature: Remove cpu_has_clflush 2016-03-31 13:35:09 +02:00
test_nx.c x86/mm: Always enable CONFIG_DEBUG_RODATA and remove the Kconfig option 2016-02-22 08:51:38 +01:00
test_rodata.c x86: Don't use module.h just for AUTHOR / LICENSE tags 2016-07-14 13:04:20 +02:00
time.c
tls.c x86/tls: Synchronize segment registers in set_thread_area() 2016-04-29 11:56:42 +02:00
tls.h
topology.c
trace_clock.c
tracepoint.c tracing: Have the reg function allow to fail 2016-12-09 09:13:30 -05:00
traps.c x86/fpu: Handle #NM without FPU emulation as an error 2016-11-01 07:47:54 +01:00
tsc_msr.c x86/tsc: Set TSC_KNOWN_FREQ and TSC_RELIABLE flags on Intel Atom SoCs 2016-11-18 10:58:31 +01:00
tsc_sync.c x86/tsc: Limit the adjust value further 2016-12-18 16:37:04 +01:00
tsc.c x86/tsc: Force TSC_ADJUST register to value >= zero 2016-12-15 11:44:29 +01:00
unwind_frame.c x86/unwind: Dump stack data on warnings 2016-12-19 11:47:05 +01:00
unwind_guess.c x86/unwind: Fix guess-unwinder regression 2016-11-28 07:47:54 +01:00
uprobes.c uprobes/x86: Fix RIP-relative handling of EVEX-encoded instructions 2016-08-12 08:29:24 +02:00
verify_cpu.S x86/cpufeature: Carve out X86_FEATURE_* 2016-01-30 11:22:17 +01:00
vm86_32.c x86, bitops: remove use of "sbb" to return CF 2016-06-08 12:41:20 -07:00
vmlinux.lds.S x86/boot: Move the _stext marker to before the boot code 2016-10-20 09:15:24 +02:00
vsmp_64.c
x86_init.c x86/init: Remove i8042_detect() from platform ops 2016-12-19 11:34:15 +01:00