linux/arch/x86/kernel
Frederic Weisbecker a2bbe75089 x86: Don't use frame pointer to save old stack on irq entry
rbp is used in SAVE_ARGS_IRQ to save the old stack pointer
in order to restore it later in ret_from_intr.

It is convenient because we save its value in the irq regs
and it's easily restored using the leave instruction.

However this is a kind of abuse of the frame pointer which
role is to help unwinding the kernel by chaining frames
together, each node following the return address to the
previous frame.

But although we are breaking the frame by changing the stack
pointer, there is no preceding return address before the new
frame. Hence using the frame pointer to link the two stacks
breaks the stack unwinders that find a random value instead of
a return address here.

There is no workaround that can work in every case. We are using
the fixup_bp_irq_link() function to dereference that abused frame
pointer in the case of non nesting interrupt (which means stack
changed).
But that doesn't fix the case of interrupts that don't change the
stack (but we still have the unconditional frame link), which is
the case of hardirq interrupting softirq. We have no way to detect
this transition so the frame irq link is considered as a real frame
pointer and the return address is dereferenced but it is still a
spurious one.

There are two possible results of this: either the spurious return
address, a random stack value, luckily belongs to the kernel text
and then the unwinding can continue and we just have a weird entry
in the stack trace. Or it doesn't belong to the kernel text and
unwinding stops there.

This is the reason why stacktraces (including perf callchains) on
irqs that interrupted softirqs don't work very well.

To solve this, we don't save the old stack pointer on rbp anymore
but we save it to a scratch register that we push on the new
stack and that we pop back later on irq return.

This preserves the whole frame chain without spurious return addresses
in the middle and drops the need for the horrid fixup_bp_irq_link()
workaround.

And finally irqs that interrupt softirq are sanely unwinded.

Before:

    99.81%         perf  [kernel.kallsyms]  [k] perf_pending_event
                   |
                   --- perf_pending_event
                       irq_work_run
                       smp_irq_work_interrupt
                       irq_work_interrupt
                      |
                      |--41.60%-- __read
                      |          |
                      |          |--99.90%-- create_worker
                      |          |          bench_sched_messaging
                      |          |          cmd_bench
                      |          |          run_builtin
                      |          |          main
                      |          |          __libc_start_main
                      |           --0.10%-- [...]

After:

     1.64%  swapper  [kernel.kallsyms]  [k] perf_pending_event
            |
            --- perf_pending_event
                irq_work_run
                smp_irq_work_interrupt
                irq_work_interrupt
               |
               |--95.00%-- arch_irq_work_raise
               |          irq_work_queue
               |          __perf_event_overflow
               |          perf_swevent_overflow
               |          perf_swevent_event
               |          perf_tp_event
               |          perf_trace_softirq
               |          __do_softirq
               |          call_softirq
               |          do_softirq
               |          irq_exit
               |          |
               |          |--73.68%-- smp_apic_timer_interrupt
               |          |          apic_timer_interrupt
               |          |          |
               |          |          |--96.43%-- amd_e400_idle
               |          |          |          cpu_idle
               |          |          |          start_secondary

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jan Beulich <JBeulich@novell.com>
2011-07-02 18:06:36 +02:00
..
acpi x86, ioapic: Consolidate mp_ioapics[] into 'struct ioapic' 2011-05-20 13:40:59 +02:00
apic Merge branch 'urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/rric/oprofile into perf/urgent 2011-06-08 15:49:03 +02:00
cpu Merge branch 'idle-release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-idle-2.6 2011-05-29 11:18:09 -07:00
.gitignore
alternative.c Merge branch 'x86-cpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip 2011-05-19 17:55:12 -07:00
amd_gart_64.c x86, gart: Rename pci-gart_64.c to amd_gart_64.c 2011-05-10 17:22:06 +02:00
amd_iommu_init.c x86/amd-iommu: Fix 3 possible endless loops 2011-06-06 16:10:15 +02:00
amd_iommu.c x86/amd-iommu: Fix boot crash with hidden PCI devices 2011-06-07 10:06:59 +02:00
amd_nb.c x86, amd-nb: Rename CPU PCI id define for F4 2011-03-31 08:51:38 +02:00
apb_timer.c Merge branch 'consolidate-clksrc-i8253' of master.kernel.org:~rmk/linux-2.6-arm into timers/clocksource 2011-05-14 12:06:36 +02:00
aperture_64.c x86, gart: Don't enforce GART aperture lower-bound by alignment 2011-04-18 10:31:38 -07:00
apm_32.c Merge branch 'idle-release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-idle-2.6 2011-05-29 11:18:09 -07:00
asm-offsets_32.c x86: Partly unify asm-offsets_{32,64}.c 2011-02-10 13:31:37 +01:00
asm-offsets_64.c x86: Partly unify asm-offsets_{32,64}.c 2011-02-10 13:31:37 +01:00
asm-offsets.c x86, asm: Cleanup unnecssary macros in asm-offsets.c 2011-02-25 16:37:32 -08:00
audit_64.c
bootflag.c
check.c x86: Don't check for BIOS corruption in first 64K when there's no need to 2011-03-09 16:36:41 +01:00
cpuid.c BKL: remove extraneous #include <smp_lock.h> 2010-11-17 08:59:32 -08:00
crash_dump_32.c crash_dump: export is_kdump_kernel to modules, consolidate elfcorehdr_addr, setup_elfcorehdr and saved_max_pfn 2011-03-23 19:47:19 -07:00
crash_dump_64.c crash_dump: export is_kdump_kernel to modules, consolidate elfcorehdr_addr, setup_elfcorehdr and saved_max_pfn 2011-03-23 19:47:19 -07:00
crash.c
devicetree.c x86: devicetree: Add missing early_init_dt_setup_initrd_arch stub 2011-06-09 15:39:43 +02:00
doublefault_32.c
dumpstack_32.c x86, dumpstack: Correct stack dump info when frame pointer is available 2011-03-18 10:51:42 +01:00
dumpstack_64.c x86: Don't use frame pointer to save old stack on irq entry 2011-07-02 18:06:36 +02:00
dumpstack.c Merge branch 'driver-core-next' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core-2.6 2011-05-19 18:24:11 -07:00
e820.c crash_dump: export is_kdump_kernel to modules, consolidate elfcorehdr_addr, setup_elfcorehdr and saved_max_pfn 2011-03-23 19:47:19 -07:00
early_printk.c x86, earlyprintk: Move mrst early console to platform/ and fix a typo 2010-12-06 20:52:04 +01:00
early-quirks.c x86, quirk: Fix SB600 revision check 2011-03-16 14:03:32 +01:00
entry_32.S Merge branch 'x86-trampoline-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip 2011-03-16 10:10:02 -07:00
entry_64.S x86: Don't use frame pointer to save old stack on irq entry 2011-07-02 18:06:36 +02:00
ftrace.c x86/ftrace: Fix compiler warning in ftrace.c 2011-05-25 19:56:26 -04:00
head32.c Merge branch 'x86-platform-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip 2011-05-19 18:08:06 -07:00
head64.c x86: Cleanup highmap after brk is concluded 2011-03-19 11:58:19 -07:00
head_32.S Merge branch 'x86-platform-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip 2011-03-15 20:01:36 -07:00
head_64.S x86, trampoline: Common infrastructure for low memory trampolines 2011-02-17 21:02:43 -08:00
head.c x86: Use memblock to replace early_res 2010-08-27 11:12:29 -07:00
hpet.c x86: hpet: Cleanup the clockevents init and register code 2011-05-19 14:24:16 +02:00
hw_breakpoint.c x86: Use this_cpu_ops to optimize code 2010-12-30 12:20:28 +01:00
i386_ksyms_32.c
i387.c x86: Fix common misspellings 2011-03-18 10:39:30 +01:00
i8237.c x86: Use syscore_ops instead of sysdev classes and sysdevs 2011-03-23 22:15:54 +01:00
i8253.c x86: Convert PIT to clockevents_config_and_register() 2011-05-19 14:24:16 +02:00
i8259.c x86: Use syscore_ops instead of sysdev classes and sysdevs 2011-03-23 22:15:54 +01:00
init_task.c
io_delay.c
ioport.c x86: Use bitmap library functions 2011-02-17 14:59:22 +01:00
irq_32.c x86: Fix common misspellings 2011-03-18 10:39:30 +01:00
irq_64.c
irq_work.c irq_work: Add generic hardirq context callbacks 2010-10-18 19:58:50 +02:00
irq.c x86: Don't unmask disabled irqs when migrating them 2011-05-19 14:51:08 +02:00
irqinit.c Merge branch 'x86-platform-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip 2011-03-15 20:01:36 -07:00
jump_label.c x86, cpu: Clean up and unify the NOP selection infrastructure 2011-04-18 16:40:21 -07:00
kdebugfs.c llseek: automatically add .llseek fop 2010-10-15 15:53:27 +02:00
kgdb.c Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jwessel/linux-2.6-kgdb 2011-03-25 21:04:56 -07:00
kprobes.c kprobes, x86: Disable irqs during optimized callback 2011-05-11 13:21:23 +02:00
kvm.c KVM guest: Fix section mismatch derived from kvm_guest_cpu_online() 2011-03-17 13:08:25 -03:00
kvmclock.c x86: Convert remaining x86 clocksources to clocksource_register_hz/khz 2011-02-21 13:33:33 -08:00
ldt.c
machine_kexec_32.c
machine_kexec_64.c x86, cleanups: Use clear_page/copy_page rather than memset/memcpy 2010-09-22 15:36:49 -07:00
Makefile x86: Put back -pg to tsc.o and add no GCOV to vread_tsc_64.o 2011-05-27 23:47:16 -04:00
mca_32.c x86: Fix common misspellings 2011-03-18 10:39:30 +01:00
microcode_amd.c x86, microcode, AMD: Fix signedness bug in generic_load_microcode() 2011-02-20 14:01:32 +01:00
microcode_core.c x86, microcode: Unregister syscore_ops after microcode unloaded 2011-03-29 11:12:04 +02:00
microcode_intel.c x86/microcode: Fix double vfree() and remove redundant pointer checks before vfree() 2010-12-27 14:33:30 +01:00
mmconf-fam10h_64.c x86-64: Fix and clean up AMD Fam10 MMCONF enabling 2010-11-18 13:41:35 +01:00
module.c jump label: Introduce static_branch() interface 2011-04-04 12:48:08 -04:00
mpparse.c x86, ioapic: Consolidate mp_ioapics[] into 'struct ioapic' 2011-05-20 13:40:59 +02:00
msr.c BKL: remove extraneous #include <smp_lock.h> 2010-11-17 08:59:32 -08:00
paravirt_patch_32.c
paravirt_patch_64.c
paravirt-spinlocks.c
paravirt.c thp: add pmd paravirt ops 2011-01-13 17:32:39 -08:00
pci-calgary_64.c x86: Fix common misspellings 2011-03-18 10:39:30 +01:00
pci-dma.c x86/PCI: Remove dma32_reserve_bootmem 2011-05-10 15:43:32 -07:00
pci-iommu_table.c arch/x86/kernel/pci-iommu_table.c: Convert sprintf_symbol to %pS 2011-05-10 10:21:35 +02:00
pci-nommu.c
pci-swiotlb.c x86, swiotlb: Make SWIOTLB use IOMMU_INIT_* macros. 2010-08-26 15:13:37 -07:00
pcspeaker.c
probe_roms.c x86: Introduce pci_map_biosrom() 2011-03-15 15:34:15 -07:00
process_32.c exec: delay address limit change until point of no return 2011-06-09 12:50:05 -07:00
process_64.c exec: delay address limit change until point of no return 2011-06-09 12:50:05 -07:00
process.c Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip 2011-06-01 02:07:22 +09:00
ptrace.c x86: Get rid of asmregparm 2011-05-24 14:33:35 +02:00
pvclock.c x86/pvclock: Zero last_value on resume 2010-11-28 09:33:20 +01:00
quirks.c x86: HPET force enable for CX700 / VIA Epia LT 2010-09-15 16:27:04 +02:00
reboot_32.S x86, reboot: Fix relocations in reboot_32.S 2011-05-02 14:44:46 -07:00
reboot_fixups_32.c x86: Ce4100: Add reboot_fixup() for CE4100 2010-11-12 00:45:41 +01:00
reboot.c x86: Reorder reboot method preferences 2011-04-06 10:36:50 +02:00
relocate_kernel_32.S
relocate_kernel_64.S
resource.c x86: avoid high BIOS area when allocating address space 2010-12-17 10:01:30 -08:00
rtc.c rtc: cmos: Add OF bindings 2011-02-23 22:27:55 +01:00
setup_percpu.c x86: Unify CPU -> NUMA node mapping between 32 and 64bit 2011-01-28 14:54:09 +01:00
setup.c Merge branch 'linus' into x86/urgent 2011-05-26 13:51:35 +02:00
signal.c x86: signal: sys_rt_sigreturn() should use set_current_blocked() 2011-04-28 13:01:38 +02:00
smp.c sched: Provide scheduler_ipi() callback in response to smp_send_reschedule() 2011-04-14 08:52:32 +02:00
smpboot.c x86: cpu-hotplug: Prevent softirq wakeup on wrong CPU 2011-06-08 11:21:19 +02:00
stacktrace.c x86: Remove warning and warning_symbol from struct stacktrace_ops 2011-05-12 15:31:28 +02:00
step.c x86: Fix common misspellings 2011-03-18 10:39:30 +01:00
sys_i386_32.c i386: Make kernel_execve() suitable for stack unwinding 2010-09-03 08:16:02 +02:00
sys_x86_64.c
syscall_64.c
syscall_table_32.S ns: Wire up the setns system call 2011-05-28 10:48:39 -07:00
tboot.c mm: convert mm->cpu_vm_cpumask into cpumask_var_t 2011-05-25 08:39:21 -07:00
tce_64.c
test_nx.c x86: Eliminate various 'set but not used' warnings 2011-05-21 19:10:33 +02:00
test_rodata.c
time.c x86-64: Clean up vdso/kernel shared variables 2011-05-24 14:51:28 +02:00
tls.c
tls.h
topology.c x86: Fix common misspellings 2011-03-18 10:39:30 +01:00
trampoline_32.S x86, trampoline: Common infrastructure for low memory trampolines 2011-02-17 21:02:43 -08:00
trampoline_64.S x86-64, trampoline: Remove unused variable 2011-02-18 15:50:36 -08:00
trampoline.c x86, trampoline: Common infrastructure for low memory trampolines 2011-02-17 21:02:43 -08:00
traps.c x86, NMI: Clean-up default_do_nmi() 2011-01-07 15:08:53 +01:00
tsc_sync.c
tsc.c x86-64: Move vread_tsc into a new file with sensible options 2011-05-24 14:51:29 +02:00
verify_cpu.S x86: Fix common misspellings 2011-03-18 10:39:30 +01:00
vm86_32.c thp: split_huge_page_mm/vma 2011-01-13 17:32:41 -08:00
vmlinux.lds.S Merge branch 'x86-vdso-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip 2011-05-26 12:19:31 -07:00
vread_tsc_64.c x86-64: Move vread_tsc into a new file with sensible options 2011-05-24 14:51:29 +02:00
vsmp_64.c
vsyscall_64.c Merge branch 'x86-vdso-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip 2011-05-26 12:19:31 -07:00
x86_init.c Merge branch 'x86-platform-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip 2011-05-19 18:08:06 -07:00
x8664_ksyms_64.c x86-64, mem: Convert memmove() to assembly file and fix return value bug 2011-01-25 16:58:39 -08:00
xsave.c x86: Fix common misspellings 2011-03-18 10:39:30 +01:00