linux/arch/x86/kernel
Don Zickus 13beacee81 perf/x86/p4: Fix counter corruption when using lots of perf groups
On a P4 box stressing perf with:

   ./perf record -o perf.data ./perf stat -v ./perf bench all

it was noticed that a slew of unknown NMIs would pop out rather quickly.

Painfully debugging this ancient platform, led me to notice cross cpu counter
corruption.

The P4 machine is special in that it has 18 counters, half are used for cpu0
and the other half is for cpu1 (or all 18 if hyperthreading is disabled).  But
the splitting of the counters has to be actively managed by the software.

In this particular bug, one of the cpu0 specific counters was being used by
cpu1 and caused all sorts of random unknown nmis.

I am not entirely sure on the corruption path, but what happens is:

 o perf schedules a group with p4_pmu_schedule_events()
 o inside p4_pmu_schedule_events(), it notices an hwc pointer is being reused
   but for a different cpu, so it 'swaps' the config bits and returns the
   updated 'assign' array with a _new_ index.
 o perf schedules another group with p4_pmu_schedule_events()
 o inside p4_pmu_schedule_events(), it notices an hwc pointer is being reused
   (the same one as above) but for the _same_ cpu [BUG!!], so it updates the
   'assign' array to use the _old_ (wrong cpu) index because the _new_ index is in
   an earlier part of the 'assign' array (and hasn't been committed yet).
 o perf commits the transaction using the wrong index and corrupts the other cpu

The [BUG!!] is because the 'hwc->config' is updated but not the 'hwc->idx'.  So
the check for 'p4_should_swap_ts()' is correct the first time around but
incorrect the second time around (because hwc->config was updated in between).

I think the spirit of perf was to not modify anything until all the
transactions had a chance to 'test' if they would succeed, and if so, commit
atomically.  However, P4 breaks this spirit by touching the hwc->config
element.

So my fix is to continue the un-perf like breakage, by assigning hwc->idx to -1
on swap to tell follow up group scheduling to find a new index.

Of course if the transaction fails rolling this back will be difficult, but
that is not different than how the current code works. :-)  And I wasn't sure
how much effort to cleanup the code I should do for a platform that is almost
10 years old by now.

Hence the lazy fix.

Signed-off-by: Don Zickus <dzickus@redhat.com>
Acked-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1391024270-19469-1-git-send-email-dzickus@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-02-09 13:17:23 +01:00
..
acpi ACPI and power management updates for 3.14-rc1 2014-01-24 15:51:02 -08:00
apic Merge branch 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2014-01-31 08:59:46 -08:00
cpu perf/x86/p4: Fix counter corruption when using lots of perf groups 2014-02-09 13:17:23 +01:00
kprobes Merge branch 'x86-asmlinkage-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2013-09-04 08:42:44 -07:00
.gitignore
alternative.c lockdep, x86/alternatives: Drop ancient lockdep fixup message 2013-09-28 10:09:41 +02:00
amd_gart_64.c x86, mm: use pfn_range_is_mapped() with gart 2012-11-17 11:59:10 -08:00
amd_nb.c x86/AMD/NB: Fix amd_set_subcaches() parameter type 2014-01-25 08:50:09 +01:00
apb_timer.c intel_mid: Renamed *mrst* to *intel_mid* 2013-10-17 16:40:47 -07:00
aperture_64.c x86/mm/gart: Drop unnecessary check 2013-04-16 10:54:40 +02:00
apm_32.c x86, asmlinkage, apm: Make APM data structure used from assembler visible 2013-08-06 14:20:20 -07:00
asm-offsets_32.c x86: Get rid of ->hard_math and all the FPU asm fu 2013-06-06 14:32:04 -07:00
asm-offsets_64.c x86, gdt, hibernate: Store/load GDT for hibernate path. 2013-05-02 11:27:35 -07:00
asm-offsets.c sched, x86: Provide a per-cpu preempt_count implementation 2013-09-25 14:07:57 +02:00
audit_64.c
bootflag.c
check.c x86/mm: memblock: switch to use NUMA_NO_NODE 2014-01-21 16:19:47 -08:00
cpuid.c x86: delete __cpuinit usage from all x86 files 2013-07-14 19:36:56 -04:00
crash_dump_32.c
crash_dump_64.c
crash.c x86: Delete non-required instances of include <linux/init.h> 2014-01-06 21:25:18 -08:00
devicetree.c Merge remote-tracking branch 'grant/devicetree/next' into for-next 2013-11-07 10:34:46 -06:00
doublefault.c x86: Delete non-required instances of include <linux/init.h> 2014-01-06 21:25:18 -08:00
dumpstack_32.c dump_stack: unify debug information printed by show_regs() 2013-04-30 17:04:02 -07:00
dumpstack_64.c dump_stack: unify debug information printed by show_regs() 2013-04-30 17:04:02 -07:00
dumpstack.c x86/dumpstack: Fix printk_address for direct addresses 2013-11-12 21:06:06 +01:00
e820.c x86/mm: memblock: switch to use NUMA_NO_NODE 2014-01-21 16:19:47 -08:00
early_printk.c Merge branch 'x86-intel-mid-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2013-11-12 11:12:22 +09:00
early-quirks.c x86/early quirk: use gen6 stolen detection for VLV 2013-11-14 09:32:11 +01:00
entry_32.S ftrace/x86: Load ftrace_ops in parameter not the variable holding it 2014-01-09 13:24:29 -08:00
entry_64.S ftrace/x86: Load ftrace_ops in parameter not the variable holding it 2014-01-09 13:24:29 -08:00
ftrace.c ftrace/x86: skip over the breakpoint for ftrace caller 2013-11-05 16:01:47 -05:00
head_32.S Merge branch 'x86-cpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2013-09-04 09:11:16 -07:00
head_64.S x86: Make sure IDT is page aligned 2013-07-16 15:14:48 -07:00
head.c x86: Make sure we can boot in the case the BDA contains pure garbage 2013-02-27 13:38:57 -08:00
head32.c intel_mid: Renamed *mrst* to *intel_mid* 2013-10-17 16:40:47 -07:00
head64.c x86, trace: Register exception handler to trace IDT 2013-11-08 14:15:45 -08:00
hpet.c x86, hpet: Introduce x86_msi_ops.setup_hpet_msi 2013-01-28 10:48:30 +01:00
hw_breakpoint.c x86: Delete non-required instances of include <linux/init.h> 2014-01-06 21:25:18 -08:00
i386_ksyms_32.c sched, x86: Optimize the preempt_schedule() call 2013-09-25 14:23:07 +02:00
i387.c x86: move fpu_counter into ARCH specific thread_struct 2013-11-13 12:09:13 +09:00
i8237.c
i8253.c
i8259.c x86/irq: Correct comment about i8259 initialization 2013-09-04 07:46:04 +02:00
io_delay.c
ioport.c x86: get rid of pt_regs argument of iopl(2) 2013-02-03 18:16:24 -05:00
iosf_mbi.c arch: x86: New MailBox support driver for Intel SOC's 2014-01-08 14:36:29 -08:00
irq_32.c Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2013-11-12 10:20:12 +09:00
irq_64.c irq: Consolidate do_softirq() arch overriden implementations 2013-10-01 12:53:25 +02:00
irq_work.c x86, asmlinkage: Make all interrupt handlers asmlinkage / __visible 2013-08-06 14:18:23 -07:00
irq.c x86, cpu hotplug: Fix stack frame warning in check_irq_vectors_for_cpu_disable() 2014-01-30 16:40:13 -08:00
irqinit.c x86/irq: Fix do_IRQ() interrupt warning for cpu hotplug retriggered irqs 2014-01-12 13:13:02 +01:00
jump_label.c x86/jump_label: expect default_nop if static_key gets enabled on boot-up 2013-10-19 19:45:35 -04:00
kdebugfs.c
kgdb.c x86: Delete non-required instances of include <linux/init.h> 2014-01-06 21:25:18 -08:00
ksysfs.c x86: ksysfs.c build fix 2014-01-03 14:37:13 +00:00
kvm.c Second batch of KVM updates. Some minor x86 fixes, 2014-01-31 08:37:32 -08:00
kvmclock.c pvclock: detect watchdog reset at pvclock read 2013-11-06 09:48:43 +02:00
ldt.c
machine_kexec_32.c x86: Delete non-required instances of include <linux/init.h> 2014-01-06 21:25:18 -08:00
machine_kexec_64.c x86, kexec, 64bit: Only set ident mapping for ram. 2013-01-29 15:26:35 -08:00
Makefile Merge branch 'x86-platform-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2014-01-20 12:09:31 -08:00
mmconf-fam10h_64.c x86: delete __cpuinit usage from all x86 files 2013-07-14 19:36:56 -04:00
module.c mm/arch: use NUMA_NO_NODE 2013-11-13 12:09:05 +09:00
mpparse.c
msr.c x86, msr: Use file_inode(), not f_mapping->host 2013-10-08 12:01:29 -07:00
nmi_selftest.c
nmi.c x86/nmi: Push duration printk() to irq context 2014-02-09 13:17:22 +01:00
paravirt_patch_32.c
paravirt_patch_64.c
paravirt-spinlocks.c x86, ticketlock: Add slowpath logic 2013-08-09 07:54:00 -07:00
paravirt.c x86, paravirt: Remove duplicate definition for DEF_NATIVE 2013-09-04 09:46:43 -07:00
pci-calgary_64.c
pci-dma.c x86/dma-debug: Bump PREALLOC_DMA_DEBUG_ENTRIES 2013-01-24 17:34:18 +01:00
pci-iommu_table.c
pci-nommu.c x86: Delete non-required instances of include <linux/init.h> 2014-01-06 21:25:18 -08:00
pci-swiotlb.c
pcspeaker.c
perf_regs.c perf: Fix off by one test in perf_reg_value() 2012-09-19 17:08:40 +02:00
preempt.S sched, x86: Optimize the preempt_schedule() call 2013-09-25 14:23:07 +02:00
probe_roms.c x86/pci/probe_roms: Add missing __iomem annotation to pci_map_biosrom() 2012-09-05 10:52:25 +02:00
process_32.c x86: Delete non-required instances of include <linux/init.h> 2014-01-06 21:25:18 -08:00
process_64.c Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2013-11-14 16:55:56 +09:00
process.c sched, idle: Fix the idle polling state logic 2013-09-25 13:53:10 +02:00
ptrace.c ptrace/x86: cleanup ptrace_set_debugreg() 2013-07-09 10:33:26 -07:00
pvclock.c hung_task: add method to reset detector 2013-11-06 09:49:02 +02:00
quirks.c x86/quirks: Add workaround for AMD F16h Erratum792 2014-01-25 08:44:19 +01:00
reboot_fixups_32.c
reboot.c x86/apic, doc: Justification for disabling IO APIC before Local APIC 2013-12-04 19:33:21 -08:00
relocate_kernel_32.S x86, asm, cleanup: Replace open-coded control register values with symbolic 2013-06-25 16:26:06 -07:00
relocate_kernel_64.S x86, reloc: Use xorl instead of xorq in relocate_kernel_64.S 2013-06-20 21:30:04 -07:00
resource.c
rtc.c Merge branch 'x86-intel-mid-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2013-11-12 11:12:22 +09:00
setup_percpu.c
setup.c x86: revert wrong memblock current limit setting 2014-01-27 21:02:38 -08:00
signal.c Merge branch 'x86-smap-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2013-09-04 11:08:32 -07:00
smp.c x86, asmlinkage: Make all interrupt handlers asmlinkage / __visible 2013-08-06 14:18:23 -07:00
smpboot.c Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2014-01-20 12:11:41 -08:00
stacktrace.c
step.c ptrace: ensure arch_ptrace/ptrace_request can never race with SIGKILL 2013-01-22 10:08:00 -08:00
sys_x86_64.c x86 get_unmapped_area: Access mmap_legacy_base through mm_struct member 2013-08-22 10:19:35 -07:00
syscall_32.c x86, asmlinkage: Make syscall tables visible 2013-08-06 14:20:18 -07:00
syscall_64.c x86, asmlinkage: Make syscall tables visible 2013-08-06 14:20:18 -07:00
sysfb_efi.c x86: sysfb: move EFI quirks from efifb to sysfb 2013-08-02 16:17:47 -07:00
sysfb_simplefb.c x86/simplefb: Mark framebuffer mem-resources as IORESOURCE_BUSY to avoid bootup warning 2013-10-03 07:51:11 +02:00
sysfb.c x86: sysfb: move EFI quirks from efifb to sysfb 2013-08-02 16:17:47 -07:00
tboot.c x86 / tboot / ACPI: Fail extended mode reduced hardware sleep 2013-07-31 14:25:51 +02:00
tce_64.c
test_nx.c
test_rodata.c
time.c
tls.c make SYSCALL_DEFINE<n>-generated wrappers do asmlinkage_protect 2013-03-03 22:58:33 -05:00
tls.h
topology.c hotplug, powerpc, x86: Remove cpu_hotplug_driver_lock() 2013-09-30 19:55:51 +02:00
trace_clock.c tracing,x86: Add a TSC trace_clock 2012-11-13 15:48:27 -05:00
tracepoint.c x86: Make sure IDT is page aligned 2013-07-16 15:14:48 -07:00
traps.c x86/traps: Clean up error exception handler definitions 2013-12-12 14:46:42 +01:00
tsc_msr.c x86, tsc, apic: Unbreak static (MSR) calibration when CONFIG_X86_LOCAL_APIC=n 2014-01-16 13:00:21 -08:00
tsc_sync.c x86: Delete non-required instances of include <linux/init.h> 2014-01-06 21:25:18 -08:00
tsc.c sched/x86/tsc: Initialize multiplier to 0 2014-01-23 14:48:36 +01:00
uprobes.c uretprobes/x86: Hijack return address 2013-04-13 15:31:55 +02:00
verify_cpu.S
vm86_32.c x86, vm86: fix VM86 syscalls: use SYSCALL_DEFINEx(...) 2013-05-02 20:36:32 -04:00
vmlinux.lds.S x86: intel-mid: Add section for sfi device table 2013-10-17 16:41:04 -07:00
vsmp_64.c x86, asmlinkage, paravirt: Make paravirt thunks global 2014-01-29 22:17:17 -08:00
vsyscall_64.c x86: delete __cpuinit usage from all x86 files 2013-07-14 19:36:56 -04:00
vsyscall_emu_64.S
vsyscall_trace.h
x86_init.c PCI: Drop "irq" param from *_restore_msi_irqs() 2013-12-13 08:44:30 -07:00
x8664_ksyms_64.c sched, x86: Optimize the preempt_schedule() call 2013-09-25 14:23:07 +02:00
xsave.c x86, xsave: Support eager-only xsave features, add MPX support 2013-12-06 17:17:42 -08:00