linux/arch/x86/kernel
Andy Lutomirski 2a23c6b8a9 x86_64, entry: Use sysret to return to userspace when possible
The x86_64 entry code currently jumps through complex and
inconsistent hoops to try to minimize the impact of syscall exit
work.  For a true fast-path syscall, almost nothing needs to be
done, so returning is just a check for exit work and sysret.  For a
full slow-path return from a syscall, the C exit hook is invoked if
needed and we join the iret path.

Using iret to return to userspace is very slow, so the entry code
has accumulated various special cases to try to do certain forms of
exit work without invoking iret.  This is error-prone, since it
duplicates assembly code paths, and it's dangerous, since sysret
can malfunction in interesting ways if used carelessly.  It's
also inefficient, since a lot of useful cases aren't optimized
and therefore force an iret out of a combination of paranoia and
the fact that no one has bothered to write even more asm code
to avoid it.

I would argue that this approach is backwards.  Rather than trying
to avoid the iret path, we should instead try to make the iret path
fast.  Under a specific set of conditions, iret is unnecessary.  In
particular, if RIP==RCX, RFLAGS==R11, RIP is canonical, RF is not
set, and both SS and CS are as expected, then
movq 32(%rsp),%rsp;sysret does the same thing as iret.  This set of
conditions is nearly always satisfied on return from syscalls, and
it can even occasionally be satisfied on return from an irq.

Even with the careful checks for sysret applicability, this cuts
nearly 80ns off of the overhead from syscalls with unoptimized exit
work.  This includes tracing and context tracking, and any return
that invokes KVM's user return notifier.  For example, the cost of
getpid with CONFIG_CONTEXT_TRACKING_FORCE=y drops from ~360ns to
~280ns on my computer.

This may allow the removal and even eventual conversion to C
of a respectable amount of exit asm.

This may require further tweaking to give the full benefit on Xen.

It may be worthwhile to adjust signal delivery and exec to try hit
the sysret path.

This does not optimize returns to 32-bit userspace.  Making the same
optimization for CS == __USER32_CS is conceptually straightforward,
but it will require some tedious code to handle the differences
between sysretl and sysexitl.

Link: http://lkml.kernel.org/r/71428f63e681e1b4aa1a781e3ef7c27f027d1103.1421453410.git.luto@amacapital.net
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
2015-02-01 04:03:01 -08:00
..
acpi x86/xen: Treat SCI interrupt as normal GSI interrupt 2015-01-20 11:44:40 +01:00
apic Merge branch 'x86-apic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2014-12-19 14:02:02 -08:00
cpu This is my accumulated x86 entry work, part 1, for 3.20. The meat 2015-01-28 15:33:26 +01:00
kprobes ftrace/jprobes/x86: Fix conflict between jprobes and function graph tracing 2015-01-15 09:39:18 -05:00
.gitignore
alternative.c kprobes, x86: Allow kprobes on text_poke/hw_breakpoint 2014-04-24 10:03:02 +02:00
amd_gart_64.c x86: enable DMA CMA with swiotlb 2014-06-04 16:53:57 -07:00
amd_nb.c x86, amd_nb: Add device IDs to NB tables for F15h M60h 2014-10-20 14:18:45 +02:00
apb_timer.c x86, intel-mid: Create IRQs for APB timers and RTC timers 2014-10-29 08:52:23 +01:00
aperture_64.c x86/gart: Tidy messages and add bridge device info 2014-05-23 10:47:19 -06:00
apm_32.c cpuidle: Invert CPUIDLE_FLAG_TIME_VALID logic 2014-11-12 21:17:27 +01:00
asm-offsets_32.c x86/asm: Guard against building the 32/64-bit versions of the asm-offsets*.c file directly 2014-12-11 11:43:56 +01:00
asm-offsets_64.c Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2014-12-14 11:51:50 -08:00
asm-offsets.c sched, x86: Provide a per-cpu preempt_count implementation 2013-09-25 14:07:57 +02:00
audit_64.c x86: hook up execveat system call 2014-12-13 12:42:51 -08:00
bootflag.c
check.c x86/mm: memblock: switch to use NUMA_NO_NODE 2014-01-21 16:19:47 -08:00
cpuid.c x86, cpuid: Use PTR_ERR_OR_ZERO 2014-10-17 13:40:51 -07:00
crash_dump_32.c
crash_dump_64.c
crash.c x86, irq: Move IOAPIC related declarations from hw_irq.h into io_apic.h 2014-12-16 14:08:17 +01:00
devicetree.c x86, irq, devicetree: Release IOAPIC pin when PCI device is disabled 2014-06-21 23:05:44 +02:00
doublefault.c x86: Delete non-required instances of include <linux/init.h> 2014-01-06 21:25:18 -08:00
dumpstack_32.c Merge branch 'x86-threadinfo-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2014-04-01 10:17:18 -07:00
dumpstack_64.c x86_64, traps: Stop using IST for #SS 2014-11-23 13:56:19 -08:00
dumpstack.c kprobes, x86: Use NOKPROBE_SYMBOL() instead of __kprobes annotation 2014-04-24 10:26:38 +02:00
e820.c x86/mm: Use min() instead of min_t() in the e820 printout code 2014-12-11 11:35:02 +01:00
early_printk.c Merge branch 'x86-intel-mid-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2013-11-12 11:12:22 +09:00
early-quirks.c drm/i915/skl: Add the additional graphics stolen sizes 2014-09-24 14:47:39 +02:00
entry_32.S Merge branch 'x86-apic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2014-12-19 14:02:02 -08:00
entry_64.S x86_64, entry: Use sysret to return to userspace when possible 2015-02-01 04:03:01 -08:00
espfix_64.c x86, espfix: Remove stale ptemask 2014-11-11 17:57:46 +01:00
ftrace.c ftrace/fgraph/x86: Have prepare_ftrace_return() take ip as first parameter 2014-12-01 14:08:58 -05:00
head32.c asmlinkage, x86: Add explicit __visible to arch/x86/* 2014-05-05 16:07:44 -07:00
head64.c kernel/printk: use symbolic defines for console loglevels 2014-06-04 16:54:17 -07:00
head_32.S x86: fix compile error due to X86_TRAP_NMI use in asm files 2014-03-07 18:58:40 -08:00
head_64.S x86: fix compile error due to X86_TRAP_NMI use in asm files 2014-03-07 18:58:40 -08:00
head.c x86: Make sure we can boot in the case the BDA contains pure garbage 2013-02-27 13:38:57 -08:00
hpet.c Merge branch 'x86/vdso' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip into next 2014-06-05 08:05:29 -07:00
hw_breakpoint.c x86: Replace __get_cpu_var uses 2014-08-26 13:45:49 -04:00
i386_ksyms_32.c sched, x86: Optimize the preempt_schedule() call 2013-09-25 14:23:07 +02:00
i387.c x86/xsaves: Clear reserved bits in xsave header 2014-05-29 14:33:00 -07:00
i8237.c
i8253.c
i8259.c x86/irq: Fix XT-PIC-XT-PIC in /proc/interrupts 2014-10-28 12:01:08 +01:00
io_delay.c
ioport.c x86: get rid of pt_regs argument of iopl(2) 2013-02-03 18:16:24 -05:00
iosf_mbi.c x86/platform/intel/iosf: Add debugfs config option for IOSF 2014-09-19 13:08:43 +02:00
irq_32.c x86: Clean up current_stack_pointer 2015-01-02 10:22:46 -08:00
irq_64.c x86: Replace __get_cpu_var uses 2014-08-26 13:45:49 -04:00
irq_work.c x86: Tell irq work about self IPI support 2014-09-13 18:38:29 +02:00
irq.c x86, irq: Properly tag virtualization entry in /proc/interrupts 2015-01-20 12:37:23 +01:00
irqinit.c x86, irq: Move local APIC related code from io_apic.c into vector.c 2014-12-16 14:08:16 +01:00
jump_label.c x86/jump_label: expect default_nop if static_key gets enabled on boot-up 2013-10-19 19:45:35 -04:00
kdebugfs.c
kexec-bzimage64.c kexec-bzimage64: fix sparse warnings 2014-10-14 02:18:21 +02:00
kgdb.c x86: Delete non-required instances of include <linux/init.h> 2014-01-06 21:25:18 -08:00
ksysfs.c x86: ksysfs.c build fix 2014-01-03 14:37:13 +00:00
kvm.c x86, kvm: Clear paravirt_enabled on KVM guests for espfix32's benefit 2014-12-10 12:49:39 +01:00
kvmclock.c x86, kvm: Clear paravirt_enabled on KVM guests for espfix32's benefit 2014-12-10 12:49:39 +01:00
ldt.c Revert "x86-64, modify_ldt: Make support for 16-bit segments a runtime option" 2014-05-21 10:22:59 -07:00
machine_kexec_32.c x86, irq: Move IOAPIC related declarations from hw_irq.h into io_apic.h 2014-12-16 14:08:17 +01:00
machine_kexec_64.c x86, irq: Move IOAPIC related declarations from hw_irq.h into io_apic.h 2014-12-16 14:08:17 +01:00
Makefile x86_64,vsyscall: Make vsyscall emulation configurable 2014-11-03 21:44:57 +01:00
mcount_64.S ftrace/fgraph/x86: Have prepare_ftrace_return() take ip as first parameter 2014-12-01 14:08:58 -05:00
mmconf-fam10h_64.c x86: delete __cpuinit usage from all x86 files 2013-07-14 19:36:56 -04:00
module.c x86, kaslr: fix module lock ordering problem 2014-03-24 10:18:26 -07:00
mpparse.c x86, apic: Remove mps_oem_check callback 2014-07-31 08:05:42 -07:00
msr.c x86, msr: Use seek definitions instead of hard-coded values 2014-10-17 13:40:55 -07:00
nmi_selftest.c
nmi.c kprobes, x86: Use NOKPROBE_SYMBOL() instead of __kprobes annotation 2014-04-24 10:26:38 +02:00
paravirt_patch_32.c
paravirt_patch_64.c x86_64/entry/xen: Do not invoke espfix64 on Xen 2014-07-28 15:25:40 -07:00
paravirt-spinlocks.c x86, ticketlock: Add slowpath logic 2013-08-09 07:54:00 -07:00
paravirt.c kprobes, x86: Prohibit probing on native_set_debugreg()/load_idt() 2014-04-24 10:02:58 +02:00
pci-calgary_64.c x86, calgary: Use 8M TCE table size by default 2014-04-10 19:51:32 -07:00
pci-dma.c arch/x86/kernel/pci-dma.c: fix dma_generic_alloc_coherent() when CONFIG_DMA_CMA is enabled 2014-06-04 16:53:57 -07:00
pci-iommu_table.c
pci-nommu.c x86: Delete non-required instances of include <linux/init.h> 2014-01-06 21:25:18 -08:00
pci-swiotlb.c x86: enable DMA CMA with swiotlb 2014-06-04 16:53:57 -07:00
pcspeaker.c
perf_regs.c perf/x86_64: Improve user regs sampling 2015-01-09 11:12:29 +01:00
pmc_atom.c x86/platform/pmc_atom: Fix warning when CONFIG_DEBUG_FS=n 2014-09-19 13:02:21 +02:00
probe_roms.c
process_32.c x86: copy_thread: Don't nullify ->ptrace_bps twice 2014-09-02 14:51:17 -07:00
process_64.c x86_64, switch_to(): Load TLS descriptors before switching DS and ES 2014-12-11 11:40:08 +01:00
process.c x86, fpu: Shift "fpu_counter = 0" from copy_thread() to arch_dup_task_struct() 2014-09-02 14:51:16 -07:00
ptrace.c x86, syscall: Fix _TIF_NOHZ handling in syscall_trace_enter_phase1 2014-11-20 23:01:53 +01:00
pvclock.c hung_task: add method to reset detector 2013-11-06 09:49:02 +02:00
quirks.c x86: HPET force enable for e6xx based systems 2014-09-15 17:53:35 -07:00
reboot_fixups_32.c
reboot.c x86, irq: Move IOAPIC related declarations from hw_irq.h into io_apic.h 2014-12-16 14:08:17 +01:00
relocate_kernel_32.S x86, asm, cleanup: Replace open-coded control register values with symbolic 2013-06-25 16:26:06 -07:00
relocate_kernel_64.S x86, reloc: Use xorl instead of xorq in relocate_kernel_64.S 2013-06-20 21:30:04 -07:00
resource.c x86: don't exclude low BIOS area when allocating address space for non-PCI cards 2014-07-16 12:29:36 -06:00
rtc.c Merge branch 'x86-intel-mid-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2013-11-12 11:12:22 +09:00
setup_percpu.c x86: Convert a few more per-CPU items to read-mostly ones 2014-11-04 20:13:28 +01:00
setup.c Merge branch 'x86-vdso-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2014-12-10 14:24:20 -08:00
signal.c x86, mce: Get rid of TIF_MCE_NOTIFY and associated mce tricks 2015-01-07 07:47:42 -08:00
smp.c asmlinkage, x86: Add explicit __visible to arch/x86/* 2014-05-05 16:07:44 -07:00
smpboot.c x86, smpboot: Remove pointless preempt_disable() in native_smp_prepare_cpus() 2014-12-16 14:08:14 +01:00
stacktrace.c
step.c ptrace: ensure arch_ptrace/ptrace_request can never race with SIGKILL 2013-01-22 10:08:00 -08:00
sys_x86_64.c x86 get_unmapped_area: Access mmap_legacy_base through mm_struct member 2013-08-22 10:19:35 -07:00
syscall_32.c x86, asmlinkage: Make syscall tables visible 2013-08-06 14:20:18 -07:00
syscall_64.c x86, asmlinkage: Make syscall tables visible 2013-08-06 14:20:18 -07:00
sysfb_efi.c x86: sysfb: move EFI quirks from efifb to sysfb 2013-08-02 16:17:47 -07:00
sysfb_simplefb.c x86/simplefb: Use PTR_ERR_OR_ZERO 2014-10-17 13:40:52 -07:00
sysfb.c x86/sysfb: Use PTR_ERR_OR_ZERO 2014-10-17 13:40:52 -07:00
tboot.c x86 / tboot / ACPI: Fail extended mode reduced hardware sleep 2013-07-31 14:25:51 +02:00
tce_64.c
test_nx.c
test_rodata.c
time.c x86_64/vdso: Remove jiffies from the vvar page 2014-10-28 11:22:13 +01:00
tls.c x86, tls: Interpret an all-zero struct user_desc as "no segment" 2015-01-22 21:45:07 +01:00
tls.h
topology.c hotplug, powerpc, x86: Remove cpu_hotplug_driver_lock() 2013-09-30 19:55:51 +02:00
trace_clock.c tracing,x86: Add a TSC trace_clock 2012-11-13 15:48:27 -05:00
tracepoint.c x86: Make sure IDT is page aligned 2013-07-16 15:14:48 -07:00
traps.c x86, traps: Fix ist_enter from userspace 2015-02-01 04:02:53 -08:00
tsc_msr.c x86: tsc: Add missing Baytrail frequency to the table 2014-02-19 17:12:24 +01:00
tsc_sync.c x86: Delete non-required instances of include <linux/init.h> 2014-01-06 21:25:18 -08:00
tsc.c x86, apic: Handle a bad TSC more gracefully 2014-10-22 21:31:46 +02:00
uprobes.c x86: Remove arbitrary instruction size limit in instruction decoder 2014-11-18 00:58:52 +01:00
verify_cpu.S
vm86_32.c x86, vm86: fix VM86 syscalls: use SYSCALL_DEFINEx(...) 2013-05-02 20:36:32 -04:00
vmlinux.lds.S x86-64: Use RIP-relative addressing for most per-CPU accesses 2014-11-04 20:43:14 +01:00
vsmp_64.c x86/apic/vsmp: Make is_vsmp_box() static 2014-08-01 15:09:45 -07:00
vsyscall_64.c x86_64/vsyscall: Restore orig_ax after vsyscall seccomp 2014-11-10 10:46:35 +01:00
vsyscall_emu_64.S
vsyscall_gtod.c timekeeping: Create struct tk_read_base and use it in struct timekeeper 2014-07-23 15:01:53 -07:00
vsyscall_trace.h
x86_init.c Revert "PCI: Add x86_msi.msi_mask_irq() and msix_mask_irq()" 2014-11-11 15:14:30 -07:00
x8664_ksyms_64.c sched, x86: Optimize the preempt_schedule() call 2013-09-25 14:23:07 +02:00
xsave.c x86: export get_xsave_addr 2014-12-05 13:55:44 +01:00