linux/arch/x86
Andy Lutomirski 2a23c6b8a9 x86_64, entry: Use sysret to return to userspace when possible
The x86_64 entry code currently jumps through complex and
inconsistent hoops to try to minimize the impact of syscall exit
work.  For a true fast-path syscall, almost nothing needs to be
done, so returning is just a check for exit work and sysret.  For a
full slow-path return from a syscall, the C exit hook is invoked if
needed and we join the iret path.

Using iret to return to userspace is very slow, so the entry code
has accumulated various special cases to try to do certain forms of
exit work without invoking iret.  This is error-prone, since it
duplicates assembly code paths, and it's dangerous, since sysret
can malfunction in interesting ways if used carelessly.  It's
also inefficient, since a lot of useful cases aren't optimized
and therefore force an iret out of a combination of paranoia and
the fact that no one has bothered to write even more asm code
to avoid it.

I would argue that this approach is backwards.  Rather than trying
to avoid the iret path, we should instead try to make the iret path
fast.  Under a specific set of conditions, iret is unnecessary.  In
particular, if RIP==RCX, RFLAGS==R11, RIP is canonical, RF is not
set, and both SS and CS are as expected, then
movq 32(%rsp),%rsp;sysret does the same thing as iret.  This set of
conditions is nearly always satisfied on return from syscalls, and
it can even occasionally be satisfied on return from an irq.

Even with the careful checks for sysret applicability, this cuts
nearly 80ns off of the overhead from syscalls with unoptimized exit
work.  This includes tracing and context tracking, and any return
that invokes KVM's user return notifier.  For example, the cost of
getpid with CONFIG_CONTEXT_TRACKING_FORCE=y drops from ~360ns to
~280ns on my computer.

This may allow the removal and even eventual conversion to C
of a respectable amount of exit asm.

This may require further tweaking to give the full benefit on Xen.

It may be worthwhile to adjust signal delivery and exec to try hit
the sysret path.

This does not optimize returns to 32-bit userspace.  Making the same
optimization for CS == __USER32_CS is conceptually straightforward,
but it will require some tedious code to handle the differences
between sysretl and sysexitl.

Link: http://lkml.kernel.org/r/71428f63e681e1b4aa1a781e3ef7c27f027d1103.1421453410.git.luto@amacapital.net
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
2015-02-01 04:03:01 -08:00
..
boot x86, boot: Skip relocs when load address unchanged 2015-01-20 12:37:23 +01:00
configs x86/kconfig/defconfig: Enable CONFIG_FHANDLE=y 2014-12-08 12:04:17 +01:00
crypto crypto: sha-mb - Add avx2_supported check. 2015-01-05 21:35:02 +11:00
ia32 x86: ia32entry.S: fix wrong symbolic constant usage: R11->ARGOFFSET 2015-01-13 14:10:31 -08:00
include This is my accumulated x86 entry work, part 1, for 3.20. The meat 2015-01-28 15:33:26 +01:00
kernel x86_64, entry: Use sysret to return to userspace when possible 2015-02-01 04:03:01 -08:00
kvm kvm: x86: drop severity of "generation wraparound" message 2014-12-27 21:52:28 +01:00
lguest x86: Avoid building unused IRQ entry stubs 2014-12-16 14:08:14 +01:00
lib x86: Fix off-by-one in instruction decoder 2015-01-09 11:12:26 +01:00
math-emu
mm x86, mpx: Explicitly disable 32-bit MPX support on 64-bit kernels 2015-01-22 21:11:06 +01:00
net Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2014-12-10 15:48:20 -05:00
oprofile percpu: Resolve ambiguities in __get_cpu_var/cpumask_var_t 2014-08-28 08:58:57 -04:00
pci x86/xen: Override ACPI IRQ management callback __acpi_unregister_gsi 2015-01-20 11:44:41 +01:00
platform Merge branch 'x86-apic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2014-12-19 14:02:02 -08:00
power nosave: consolidate __nosave_{begin,end} in <asm/sections.h> 2014-10-09 22:26:04 -04:00
purgatory Merge branches 'x86-build-for-linus', 'x86-cleanups-for-linus' and 'x86-debug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2014-12-10 12:35:46 -08:00
realmode
syscalls x86: hook up execveat system call 2014-12-13 12:42:51 -08:00
tools Merge branch 'x86-boot-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2014-12-10 12:10:24 -08:00
um x86, um: actually mark system call tables readonly 2015-01-04 14:21:25 +01:00
vdso x86_64, vdso: Fix the vdso address randomization algorithm 2014-12-20 16:56:57 -08:00
video
xen xen: bug fixes for 3.19-rc4 2015-01-14 08:07:42 +13:00
.gitignore x86/build: Add arch/x86/purgatory/ make generated files to gitignore 2014-10-09 09:29:46 +02:00
Kbuild kexec: create a new config option CONFIG_KEXEC_FILE for new syscall 2014-08-29 16:28:16 -07:00
Kconfig Merge branch 'x86-apic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2014-12-19 14:02:02 -08:00
Kconfig.cpu
Kconfig.debug
Makefile Merge branch 'x86-build-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2014-10-13 18:17:33 +02:00
Makefile_32.cpu
Makefile.um