Commit Graph

41594 Commits

Author SHA1 Message Date
Kees Cook
c7500c1b53 um: Allow builds with Clang
Add SUBARCH target for Clang+um (which must go last, not alphabetically,
so the other SUBARCHes are assigned). Remove open-coded "DEFINE"
macro, instead using linux/kbuild.h's version which was updated to use
Clang-friendly assembly in commit cf0c3e68aa ("kbuild: fix asm-offset
generation to work with clang"). Redefine "DEFINE_LONGS" in terms of
"COMMENT" and "DEFINE" so that the intended coment actually has useful
content. Add a missed "break" to avoid implicit fall-through warnings.

This lets me run KUnit tests with Clang:

$ ./tools/testing/kunit/kunit.py run --make_options LLVM=1
...

Cc: Jeff Dike <jdike@addtoit.com>
Cc: Richard Weinberger <richard@nod.at>
Cc: Anton Ivanov <anton.ivanov@cambridgegreys.com>
Cc: Masahiro Yamada <masahiroy@kernel.org>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: David Gow <davidgow@google.com>
Cc: linux-um@lists.infradead.org
Cc: linux-kbuild@vger.kernel.org
Cc: linux-kselftest@vger.kernel.org
Cc: kunit-dev@googlegroups.com
Cc: llvm@lists.linux.dev
Reviewed-by: Nathan Chancellor <nathan@kernel.org>
Link: https://lore.kernel.org/lkml/Yg2YubZxvYvx7%2Fnm@dev-arch.archlinux-ax161/
Tested-by: David Gow <davidgow@google.com>
Link: https://lore.kernel.org/lkml/CABVgOSk=oFxsbSbQE-v65VwR2+mXeGXDDjzq8t7FShwjJ3+kUg@mail.gmail.com/
Signed-off-by: Kees Cook <keescook@chromium.org>
---
v1: https://lore.kernel.org/lkml/20220217002843.2312603-1-keescook@chromium.org
v2: https://lore.kernel.org/lkml/20220224055831.1854786-1-keescook@chromium.org
v3:
 - use kbuild.h to avoid duplication (Masahiro)
 - fix intended comments (Masahiro)
 - use SUBARCH (Nathan)
2022-03-21 08:13:03 -07:00
Paolo Bonzini
c9b8fecddb KVM: use kvcalloc for array allocations
Instead of using array_size, use a function that takes care of the
multiplication.  While at it, switch to kvcalloc since this allocation
should not be very large.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-03-21 09:28:41 -04:00
Oliver Upton
6d8491910f KVM: x86: Introduce KVM_CAP_DISABLE_QUIRKS2
KVM_CAP_DISABLE_QUIRKS is irrevocably broken. The capability does not
advertise the set of quirks which may be disabled to userspace, so it is
impossible to predict the behavior of KVM. Worse yet,
KVM_CAP_DISABLE_QUIRKS will tolerate any value for cap->args[0], meaning
it fails to reject attempts to set invalid quirk bits.

The only valid workaround for the quirky quirks API is to add a new CAP.
Actually advertise the set of quirks that can be disabled to userspace
so it can predict KVM's behavior. Reject values for cap->args[0] that
contain invalid bits.

Finally, add documentation for the new capability and describe the
existing quirks.

Signed-off-by: Oliver Upton <oupton@google.com>
Message-Id: <20220301060351.442881-5-oupton@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-03-21 09:28:41 -04:00
Thomas Gleixner
5e17b2ee45 kvm: x86: Require const tsc for RT
Non constant TSC is a nightmare on bare metal already, but with
virtualization it becomes a complete disaster because the workarounds
are horrible latency wise. That's also a preliminary for running RT in
a guest on top of a RT host.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Message-Id: <Yh5eJSG19S2sjZfy@linutronix.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-03-21 09:28:40 -04:00
Paolo Bonzini
f144c49e8c KVM: x86: synthesize CPUID leaf 0x80000021h if useful
Guests X86_BUG_NULL_SEG if and only if the host has them.  Use the info
from static_cpu_has_bug to form the 0x80000021 CPUID leaf that was
defined for Zen3.  Userspace can then set the bit even on older CPUs
that do not have the bug, such as Zen2.

Do the same for X86_FEATURE_LFENCE_RDTSC as well, since various processors
have had very different ways of detecting it and not all of them are
available to userspace.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-03-21 09:28:40 -04:00
Paolo Bonzini
58b3d12c0a KVM: x86: add support for CPUID leaf 0x80000021
CPUID leaf 0x80000021 defines some features (or lack of bugs) of AMD
processors.  Expose the ones that make sense via KVM_GET_SUPPORTED_CPUID.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-03-21 09:28:40 -04:00
Maxim Levitsky
bf07be36cd KVM: x86: do not use KVM_X86_OP_OPTIONAL_RET0 for get_mt_mask
KVM_X86_OP_OPTIONAL_RET0 can only be used with 32-bit return values on 32-bit
systems, because unsigned long is only 32-bits wide there and 64-bit values
are returned in edx:eax.

Reported-by: Maxim Levitsky <mlevitsk@redhat.com>
Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-03-21 09:28:25 -04:00
Paolo Bonzini
873dd12217 Revert "KVM: x86/mmu: Zap only TDP MMU leafs in kvm_zap_gfn_range()"
This reverts commit cf3e26427c.

Multi-vCPU Hyper-V guests started crashing randomly on boot with the
latest kvm/queue and the problem can be bisected the problem to this
particular patch. Basically, I'm not able to boot e.g. 16-vCPU guest
successfully anymore. Both Intel and AMD seem to be affected. Reverting
the commit saves the day.

Reported-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-03-21 05:11:51 -04:00
Paolo Bonzini
fcb93eb6d0 kvm: x86/mmu: Flush TLB before zap_gfn_range releases RCU
Since "KVM: x86/mmu: Zap only TDP MMU leafs in kvm_zap_gfn_range()"
is going to be reverted, it's not going to be true anymore that
the zap-page flow does not free any 'struct kvm_mmu_page'.  Introduce
an early flush before tdp_mmu_zap_leafs() returns, to preserve
bisectability.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-03-21 05:11:51 -04:00
Borislav Petkov
fe83f5eae4 kvm/emulate: Fix SETcc emulation function offsets with SLS
The commit in Fixes started adding INT3 after RETs as a mitigation
against straight-line speculation.

The fastop SETcc implementation in kvm's insn emulator uses macro magic
to generate all possible SETcc functions and to jump to them when
emulating the respective instruction.

However, it hardcodes the size and alignment of those functions to 4: a
three-byte SETcc insn and a single-byte RET. BUT, with SLS, there's an
INT3 that gets slapped after the RET, which brings the whole scheme out
of alignment:

  15:   0f 90 c0                seto   %al
  18:   c3                      ret
  19:   cc                      int3
  1a:   0f 1f 00                nopl   (%rax)
  1d:   0f 91 c0                setno  %al
  20:   c3                      ret
  21:   cc                      int3
  22:   0f 1f 00                nopl   (%rax)
  25:   0f 92 c0                setb   %al
  28:   c3                      ret
  29:   cc                      int3

and this explodes like this:

  int3: 0000 [#1] PREEMPT SMP PTI
  CPU: 0 PID: 2435 Comm: qemu-system-x86 Not tainted 5.17.0-rc8-sls #1
  Hardware name: Dell Inc. Precision WorkStation T3400  /0TP412, BIOS A14 04/30/2012
  RIP: 0010:setc+0x5/0x8 [kvm]
  Code: 00 00 0f 1f 00 0f b6 05 43 24 06 00 c3 cc 0f 1f 80 00 00 00 00 0f 90 c0 c3 cc 0f \
	  1f 00 0f 91 c0 c3 cc 0f 1f 00 0f 92 c0 c3 cc <0f> 1f 00 0f 93 c0 c3 cc 0f 1f 00 \
	  0f 94 c0 c3 cc 0f 1f 00 0f 95 c0
  Call Trace:
   <TASK>
   ? x86_emulate_insn [kvm]
   ? x86_emulate_instruction [kvm]
   ? vmx_handle_exit [kvm_intel]
   ? kvm_arch_vcpu_ioctl_run [kvm]
   ? kvm_vcpu_ioctl [kvm]
   ? __x64_sys_ioctl
   ? do_syscall_64
   ? entry_SYSCALL_64_after_hwframe
   </TASK>

Raise the alignment value when SLS is enabled and use a macro for that
instead of hard-coding naked numbers.

Fixes: e463a09af2 ("x86: Add straight-line-speculation mitigation")
Reported-by: Jamie Heilman <jamie@audible.transient.net>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Jamie Heilman <jamie@audible.transient.net>
Link: https://lore.kernel.org/r/YjGzJwjrvxg5YZ0Z@audible.transient.net
[Add a comment and a bit of safety checking, since this is going to be changed
 again for IBT support. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-03-20 14:55:46 +01:00
Rafael J. Wysocki
31035f3e20 Merge branch 'thermal-hfi'
Merge Intel Hardware Feedback Interface (HFI) thermal driver for
5.18-rc1 and update the intel-speed-select utility to support that
driver.

* thermal-hfi:
  tools/power/x86/intel-speed-select: v1.12 release
  tools/power/x86/intel-speed-select: HFI support
  tools/power/x86/intel-speed-select: OOB daemon mode
  thermal: intel: hfi: INTEL_HFI_THERMAL depends on NET
  thermal: netlink: Fix parameter type of thermal_genl_cpu_capability_event() stub
  thermal: intel: hfi: Notify user space for HFI events
  thermal: netlink: Add a new event to notify CPU capabilities change
  thermal: intel: hfi: Enable notification interrupt
  thermal: intel: hfi: Handle CPU hotplug events
  thermal: intel: hfi: Minimally initialize the Hardware Feedback Interface
  x86/cpu: Add definitions for the Intel Hardware Feedback Interface
  x86/Documentation: Describe the Intel Hardware Feedback Interface
2022-03-18 19:00:26 +01:00
Rafael J. Wysocki
dfad78e07e Merge branches 'pm-sleep', 'pm-domains' and 'pm-docs'
Merge changes related to system sleep, PM domains changes and power
management documentation changes for 5.18-rc1:

 - Fix load_image_and_restore() error path (Ye Bin).

 - Fix typos in comments in the system wakeup hadling code (Tom Rix).

 - Clean up non-kernel-doc comments in hibernation code (Jiapeng
   Chong).

 - Fix __setup handler error handling in system-wide suspend and
   hibernation core code (Randy Dunlap).

 - Add device name to suspend_report_result() (Youngjin Jang).

 - Make virtual guests honour ACPI S4 hardware signature by
   default (David Woodhouse).

 - Block power off of a parent PM domain unless child is in deepest
   state (Ulf Hansson).

 - Use dev_err_probe() to simplify error handling for generic PM
   domains (Ahmad Fatoum).

 - Fix sleep-in-atomic bug caused by genpd_debug_remove() (Shawn Guo).

 - Document Intel uncore frequency scaling (Srinivas Pandruvada).

* pm-sleep:
  PM: hibernate: Honour ACPI hardware signature by default for virtual guests
  PM: sleep: Add device name to suspend_report_result()
  PM: suspend: fix return value of __setup handler
  PM: hibernate: fix __setup handler error handling
  PM: hibernate: Clean up non-kernel-doc comments
  PM: sleep: wakeup: Fix typos in comments
  PM: hibernate: fix load_image_and_restore() error path

* pm-domains:
  PM: domains: Fix sleep-in-atomic bug caused by genpd_debug_remove()
  PM: domains: use dev_err_probe() to simplify error handling
  PM: domains: Prevent power off for parent unless child is in deepest state

* pm-docs:
  Documentation: admin-guide: pm: Document uncore frequency scaling
2022-03-18 18:29:21 +01:00
Rafael J. Wysocki
24b2b094b5 Merge branches 'acpi-ec', 'acpi-cppc', 'acpi-fan' and 'acpi-battery'
Merge ACPI EC driver changes, CPPC-related changes, ACPI fan driver
changes and ACPI battery driver changes for 5.18-rc1:

 - Make wakeup events checks in the ACPI EC driver more
   straightforward and clean up acpi_ec_submit_event() (Rafael
   Wysocki).

 - Make it possible to obtain the CPU capacity with the help of CPPC
   information (Ionela Voinescu).

 - Improve fine grained fan control in the ACPI fan driver and
   document it (Srinivas Pandruvada).

 - Add device HID and quirk for Microsoft Surface Go 3 to the ACPI
   battery driver (Maximilian Luz).

* acpi-ec:
  ACPI: EC: Rearrange code in acpi_ec_submit_event()
  ACPI: EC: Reduce indentation level in acpi_ec_submit_event()
  ACPI: EC: Do not return result from advance_transaction()

* acpi-cppc:
  arm64, topology: enable use of init_cpu_capacity_cppc()
  arch_topology: obtain cpu capacity using information from CPPC
  x86, ACPI: rename init_freq_invariance_cppc() to arch_init_invariance_cppc()

* acpi-fan:
  Documentation/admin-guide/acpi: Add documentation for fine grain control
  ACPI: fan: Add additional attributes for fine grain control
  ACPI: fan: Properly handle fine grain control
  ACPI: fan: Optimize struct acpi_fan_fif
  ACPI: fan: Separate file for attributes creation
  ACPI: fan: Fix error reporting to user space

* acpi-battery:
  ACPI: battery: Add device HID and quirk for Microsoft Surface Go 3
2022-03-18 17:36:54 +01:00
Rafael J. Wysocki
03d5c98d91 Merge branches 'acpi-pm', 'acpi-properties', 'acpi-misc' and 'acpi-x86'
Merge ACPI power management changes, ACPI device properties handling
changes, x86-specific ACPI changes and miscellaneous ACPI changes for
5.18-rc1:

 - Add power management debug messages related to suspend-to-idle in
   two places (Rafael Wysocki).

 - Fix __acpi_node_get_property_reference() return value and clean up
   that function (Andy Shevchenko, Sakari Ailus).

 - Fix return value of the __setup handler in the ACPI PM timer clock
   source driver (Randy Dunlap).

 - Clean up double words in two comments (Tom Rix).

 - Add "skip i2c clients" quirks for Lenovo Yoga Tablet 1050F/L and
   Nextbook Ares 8 (Hans de Goede).

 - Clean up frequency invariance handling on x86 in the ACPI CPPC
   library (Huang Rui).

 - Work around broken XSDT on the Advantech DAC-BJ01 board (Mark
   Cilissen).

* acpi-pm:
  ACPI: EC / PM: Print additional debug message in acpi_ec_dispatch_gpe()
  ACPI: PM: Print additional debug message in acpi_s2idle_wake()

* acpi-properties:
  ACPI: property: Get rid of redundant 'else'
  ACPI: properties: Consistently return -ENOENT if there are no more references

* acpi-misc:
  clocksource: acpi_pm: fix return value of __setup handler
  ACPI: clean up double words in two comments

* acpi-x86:
  ACPI / x86: Work around broken XSDT on Advantech DAC-BJ01 board
  x86/ACPI: CPPC: Move init_freq_invariance_cppc() into x86 CPPC
  x86: Expose init_freq_invariance() to topology header
  x86/ACPI: CPPC: Move AMD maximum frequency ratio setting function into x86 CPPC
  x86/ACPI: CPPC: Rename cppc_msr.c to cppc.c
  ACPI / x86: Add skip i2c clients quirk for Lenovo Yoga Tablet 1050F/L
  ACPI / x86: Add skip i2c clients quirk for Nextbook Ares 8
2022-03-18 17:23:05 +01:00
Masami Hiramatsu
75caf33eda rethook: x86: Add rethook x86 implementation
Add rethook for x86 implementation. Most of the code has been copied from
kretprobes on x86.

Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Tested-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/164735286243.1084943.7477055110527046644.stgit@devnote2
2022-03-17 20:16:35 -07:00
Jakub Kicinski
e243f39685 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
No conflicts.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-17 13:56:58 -07:00
Hou Tao
73e14451f3 bpf, x86: Fall back to interpreter mode when extra pass fails
Extra pass for subprog jit may fail (e.g. due to bpf_jit_harden race),
but bpf_func is not cleared for the subprog and jit_subprogs will
succeed. The running of the bpf program may lead to oops because the
memory for the jited subprog image has already been freed.

So fall back to interpreter mode by clearing bpf_func/jited/jited_len
when extra pass fails.

Signed-off-by: Hou Tao <houtao1@huawei.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20220309123321.2400262-2-houtao1@huawei.com
2022-03-16 15:12:18 -07:00
David Woodhouse
f6c46b1d62 PM: hibernate: Honour ACPI hardware signature by default for virtual guests
The ACPI specification says that OSPM should refuse to restore from
hibernate if the hardware signature changes, and should boot from
scratch. However, real BIOSes often vary the hardware signature in cases
where we *do* want to resume from hibernate, so Linux doesn't follow the
spec by default.

However, in a virtual environment there's no reason for the VMM to vary
the hardware signature *unless* it wants to trigger a clean reboot as
defined by the ACPI spec. So enable the check by default if a hypervisor
is detected.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-03-16 19:29:32 +01:00
Jiri Kosina
d4c9df20a3 x86/nmi: Remove the 'strange power saving mode' hint from unknown NMI handler
The

  Do you have a strange power saving mode enabled?

hint when unknown NMI happens dates back to i386 stone age, and isn't
currently really helpful.

Unknown NMIs are coming for many different reasons (broken firmware,
faulty hardware, ...) and rarely have anything to do with 'strange power
saving mode' (whatever that even is).

Just remove it as it's largerly misleading.

Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/nycvar.YFH.7.76.2203140924120.24795@cbobk.fhfr.pm
2022-03-16 11:02:41 +01:00
Greg Kroah-Hartman
7f220d4a38 Merge tag 'v5.17-rc8' into usb-next
We need the Xen USB fixes as other patches depend on those changes.

Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-03-16 09:04:22 +01:00
jianchunfu
309b517276 arch:x86:xen: Remove unnecessary assignment in xen_apic_read()
In the function xen_apic_read(), the initialized value of 'ret' is unused
because it will be assigned by the function HYPERVISOR_platform_op(),
thus remove it.

Signed-off-by: jianchunfu <jianchunfu@cmss.chinamobile.com>
Link: https://lore.kernel.org/r/20220314070514.2602-1-jianchunfu@cmss.chinamobile.com
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
2022-03-15 20:35:35 -05:00
Peter Zijlstra
b0ae33a2d2 usb: early: xhci-dbc: Remove duplicate keep parsing
The generic earlyprintk= parsing already parses the optional ",keep",
no need to duplicate that in the xdbc driver.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20220304152135.975568860@infradead.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-03-15 18:20:34 +01:00
Peter Zijlstra
69f8aeab43 x86/tsc: Be consistent about use_tsc_delay()
Currently loops_per_jiffy is set in tsc_early_init(), but then don't
switch to delay_tsc, with the result that delay_loop is used with
loops_per_jiffy set for delay_tsc.

Then in (late) tsc_init() lpj_fine is set (which is mostly unused) and
after which use_tsc_delay() is finally called.

Move both loops_per_jiffy and use_tsc_delay() into
tsc_enable_sched_clock() which is called the moment tsc_khz is
determined, be it early or late. Keeping the lot consistent.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20220304152135.914397165@infradead.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-03-15 18:20:33 +01:00
Ingo Molnar
9cea0d46f5 Merge branch 'x86/cpu' into x86/core, to resolve conflicts
Conflicts:
	arch/x86/include/asm/cpufeatures.h

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2022-03-15 12:52:51 +01:00
Ingo Molnar
8c490b42fe Merge branch 'x86/pasid' into x86/core, to resolve conflicts
Conflicts:
	tools/objtool/arch/x86/decode.c

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2022-03-15 12:50:59 +01:00
Nathan Chancellor
aaeed6ecc1 x86/Kconfig: Do not allow CONFIG_X86_X32_ABI=y with llvm-objcopy
There are two outstanding issues with CONFIG_X86_X32_ABI and
llvm-objcopy, with similar root causes:

1. llvm-objcopy does not properly convert .note.gnu.property when going
   from x86_64 to x86_x32, resulting in a corrupted section when
   linking:

   https://github.com/ClangBuiltLinux/linux/issues/1141

2. llvm-objcopy produces corrupted compressed debug sections when going
   from x86_64 to x86_x32, also resulting in an error when linking:

   https://github.com/ClangBuiltLinux/linux/issues/514

After commit 41c5ef31ad71 ("x86/ibt: Base IBT bits"), the
.note.gnu.property section is always generated when
CONFIG_X86_KERNEL_IBT is enabled, which causes the first issue to become
visible with an allmodconfig build:

  ld.lld: error: arch/x86/entry/vdso/vclock_gettime-x32.o:(.note.gnu.property+0x1c): program property is too short

To avoid this error, do not allow CONFIG_X86_X32_ABI to be selected when
using llvm-objcopy. If the two issues ever get fixed in llvm-objcopy,
this can be turned into a feature check.

Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20220314194842.3452-3-nathan@kernel.org
2022-03-15 10:32:48 +01:00
Masahiro Yamada
83a44a4f47 x86: Remove toolchain check for X32 ABI capability
Commit 0bf6276392 ("x32: Warn and disable rather than error if
binutils too old") added a small test in arch/x86/Makefile because
binutils 2.22 or newer is needed to properly support elf32-x86-64. This
check is no longer necessary, as the minimum supported version of
binutils is 2.23, which is enforced at configuration time with
scripts/min-tool-version.sh.

Remove this check and replace all uses of CONFIG_X86_X32 with
CONFIG_X86_X32_ABI, as two symbols are no longer necessary.

[nathan: Rebase, fix up a few places where CONFIG_X86_X32 was still
         used, and simplify commit message to satisfy -tip requirements]

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20220314194842.3452-2-nathan@kernel.org
2022-03-15 10:32:48 +01:00
Peter Zijlstra
ed53a0d971 x86/alternative: Use .ibt_endbr_seal to seal indirect calls
Objtool's --ibt option generates .ibt_endbr_seal which lists
superfluous ENDBR instructions. That is those instructions for which
the function is never indirectly called.

Overwrite these ENDBR instructions with a NOP4 such that these
function can never be indirect called, reducing the number of viable
ENDBR targets in the kernel.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>
Link: https://lore.kernel.org/r/20220308154319.822545231@infradead.org
2022-03-15 10:32:47 +01:00
Peter Zijlstra
89bc853eae objtool: Find unused ENDBR instructions
Find all ENDBR instructions which are never referenced and stick them
in a section such that the kernel can poison them, sealing the
functions from ever being an indirect call target.

This removes about 1-in-4 ENDBR instructions.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>
Link: https://lore.kernel.org/r/20220308154319.763643193@infradead.org
2022-03-15 10:32:47 +01:00
Peter Zijlstra
3515899bef x86: Annotate idtentry_df()
Without CONFIG_X86_ESPFIX64 exc_double_fault() is noreturn and objtool
is clever enough to figure that out.

vmlinux.o: warning: objtool: asm_exc_double_fault()+0x22: unreachable instruction

0000000000001260 <asm_exc_double_fault>:
1260:       f3 0f 1e fa             endbr64
1264:       90                      nop
1265:       90                      nop
1266:       90                      nop
1267:       e8 84 03 00 00          call   15f0 <paranoid_entry>
126c:       48 89 e7                mov    %rsp,%rdi
126f:       48 8b 74 24 78          mov    0x78(%rsp),%rsi
1274:       48 c7 44 24 78 ff ff ff ff      movq   $0xffffffffffffffff,0x78(%rsp)
127d:       e8 00 00 00 00          call   1282 <asm_exc_double_fault+0x22> 127e: R_X86_64_PLT32    exc_double_fault-0x4
1282:       e9 09 04 00 00          jmp    1690 <paranoid_exit>

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/Yi9gOW9f1GGwwUD6@hirez.programming.kicks-ass.net
2022-03-15 10:32:45 +01:00
Peter Zijlstra
dca5da2abe x86,objtool: Move the ASM_REACHABLE annotation to objtool.h
Because we need a variant for .S files too.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/Yi9gOW9f1GGwwUD6@hirez.programming.kicks-ass.net
2022-03-15 10:32:45 +01:00
Peter Zijlstra
be0075951f x86: Annotate call_on_stack()
vmlinux.o: warning: objtool: page_fault_oops()+0x13c: unreachable instruction

0000 000000000005b460 <page_fault_oops>:
...
0128    5b588:  49 89 23                mov    %rsp,(%r11)
012b    5b58b:  4c 89 dc                mov    %r11,%rsp
012e    5b58e:  4c 89 f2                mov    %r14,%rdx
0131    5b591:  48 89 ee                mov    %rbp,%rsi
0134    5b594:  4c 89 e7                mov    %r12,%rdi
0137    5b597:  e8 00 00 00 00          call   5b59c <page_fault_oops+0x13c>    5b598: R_X86_64_PLT32   handle_stack_overflow-0x4
013c    5b59c:  5c                      pop    %rsp

vmlinux.o: warning: objtool: sysvec_reboot()+0x6d: unreachable instruction

0000 00000000000033f0 <sysvec_reboot>:
...
005d     344d:  4c 89 dc                mov    %r11,%rsp
0060     3450:  e8 00 00 00 00          call   3455 <sysvec_reboot+0x65>        3451: R_X86_64_PLT32    irq_enter_rcu-0x4
0065     3455:  48 89 ef                mov    %rbp,%rdi
0068     3458:  e8 00 00 00 00          call   345d <sysvec_reboot+0x6d>        3459: R_X86_64_PC32     .text+0x47d0c
006d     345d:  e8 00 00 00 00          call   3462 <sysvec_reboot+0x72>        345e: R_X86_64_PLT32    irq_exit_rcu-0x4
0072     3462:  5c                      pop    %rsp

Both cases are due to a call_on_stack() calling a __noreturn function.
Since that's an inline asm, GCC can't do anything about the
instructions after the CALL. Therefore put in an explicit
ASM_REACHABLE annotation to make sure objtool and gcc are consistently
confused about control flow.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>
Link: https://lore.kernel.org/r/20220308154319.468805622@infradead.org
2022-03-15 10:32:44 +01:00
Peter Zijlstra
f9cdf7ca57 x86: Mark stop_this_cpu() __noreturn
vmlinux.o: warning: objtool: smp_stop_nmi_callback()+0x2b: unreachable instruction

0000 0000000000047cf0 <smp_stop_nmi_callback>:
...
0026    47d16:  e8 00 00 00 00          call   47d1b <smp_stop_nmi_callback+0x2b>       47d17: R_X86_64_PLT32   stop_this_cpu-0x4
002b    47d1b:  b8 01 00 00 00          mov    $0x1,%eax

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>
Link: https://lore.kernel.org/r/20220308154319.290905453@infradead.org
2022-03-15 10:32:43 +01:00
Peter Zijlstra
2b6ff7dea6 x86/ibt: Dont generate ENDBR in .discard.text
Having ENDBR in discarded sections can easily lead to relocations into
discarded sections which the linkers aren't really fond of. Objtool
also shouldn't generate them, but why tempt fate.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>
Link: https://lore.kernel.org/r/20220308154319.054842742@infradead.org
2022-03-15 10:32:42 +01:00
Peter Zijlstra
e8d61bdf0f x86/ibt,sev: Annotations
No IBT on AMD so far.. probably correct, who knows.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>
Link: https://lore.kernel.org/r/20220308154318.995109889@infradead.org
2022-03-15 10:32:41 +01:00
Peter Zijlstra
3215de84c0 x86/ibt,ftrace: Annotate ftrace code patching
These are code patching sites, not indirect targets.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>
Link: https://lore.kernel.org/r/20220308154318.936599479@infradead.org
2022-03-15 10:32:41 +01:00
Peter Zijlstra
3e3f069504 x86/ibt: Annotate text references
Annotate away some of the generic code references. This is things
where we take the address of a symbol for exception handling or return
addresses (eg. context switch).

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>
Link: https://lore.kernel.org/r/20220308154318.877758523@infradead.org
2022-03-15 10:32:40 +01:00
Peter Zijlstra
fe379fa4d1 x86/ibt: Disable IBT around firmware
Assume firmware isn't IBT clean and disable it across calls.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>
Link: https://lore.kernel.org/r/20220308154318.759989383@infradead.org
2022-03-15 10:32:40 +01:00
Peter Zijlstra
99c95c5d4f x86/alternative: Simplify int3_selftest_ip
Similar to ibt_selftest_ip, apply the same pattern.

Suggested-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>
Link: https://lore.kernel.org/r/20220308154318.700456643@infradead.org
2022-03-15 10:32:40 +01:00
Peter Zijlstra
af22700390 x86/ibt,kexec: Disable CET on kexec
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>
Link: https://lore.kernel.org/r/20220308154318.641454603@infradead.org
2022-03-15 10:32:39 +01:00
Peter Zijlstra
991625f3dd x86/ibt: Add IBT feature, MSR and #CP handling
The bits required to make the hardware go.. Of note is that, provided
the syscall entry points are covered with ENDBR, #CP doesn't need to
be an IST because we'll never hit the syscall gap.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>
Link: https://lore.kernel.org/r/20220308154318.582331711@infradead.org
2022-03-15 10:32:39 +01:00
Peter Zijlstra
5891271055 x86/ibt,bpf: Add ENDBR instructions to prologue and trampoline
With IBT enabled builds we need ENDBR instructions at indirect jump
target sites, since we start execution of the JIT'ed code through an
indirect jump, the very first instruction needs to be ENDBR.

Similarly, since eBPF tail-calls use indirect branches, their landing
site needs to be an ENDBR too.

The trampolines need similar adjustment.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Fixed-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>
Link: https://lore.kernel.org/r/20220308154318.464998838@infradead.org
2022-03-15 10:32:38 +01:00
Peter Zijlstra
cc66bb9145 x86/ibt,kprobes: Cure sym+0 equals fentry woes
In order to allow kprobes to skip the ENDBR instructions at sym+0 for
X86_KERNEL_IBT builds, change _kprobe_addr() to take an architecture
callback to inspect the function at hand and modify the offset if
needed.

This streamlines the existing interface to cover more cases and
require less hooks. Once PowerPC gets fully converted there will only
be the one arch hook.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>
Link: https://lore.kernel.org/r/20220308154318.405947704@infradead.org
2022-03-15 10:32:38 +01:00
Peter Zijlstra
e52fc2cf3f x86/ibt,ftrace: Make function-graph play nice
Return trampoline must not use indirect branch to return; while this
preserves the RSB, it is fundamentally incompatible with IBT. Instead
use a retpoline like ROP gadget that defeats IBT while not unbalancing
the RSB.

And since ftrace_stub is no longer a plain RET, don't use it to copy
from. Since RET is a trivial instruction, poke it directly.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>
Link: https://lore.kernel.org/r/20220308154318.347296408@infradead.org
2022-03-15 10:32:37 +01:00
Peter Zijlstra
aebfd12521 x86/ibt,ftrace: Search for __fentry__ location
Currently a lot of ftrace code assumes __fentry__ is at sym+0. However
with Intel IBT enabled the first instruction of a function will most
likely be ENDBR.

Change ftrace_location() to not only return the __fentry__ location
when called for the __fentry__ location, but also when called for the
sym+0 location.

Then audit/update all callsites of this function to consistently use
these new semantics.

Suggested-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>
Link: https://lore.kernel.org/r/20220308154318.227581603@infradead.org
2022-03-15 10:32:37 +01:00
Peter Zijlstra
6649fa876d x86/ibt,kvm: Add ENDBR to fastops
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>
Link: https://lore.kernel.org/r/20220308154318.168850084@infradead.org
2022-03-15 10:32:37 +01:00
Peter Zijlstra
214b9a83b6 x86/ibt,crypto: Add ENDBR for the jump-table entries
The code does:

	## branch into array
	mov     jump_table(,%rax,8), %bufp
	JMP_NOSPEC bufp

resulting in needing to mark the jump-table entries with ENDBR.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>
Link: https://lore.kernel.org/r/20220308154318.110500806@infradead.org
2022-03-15 10:32:36 +01:00
Peter Zijlstra
c3b037917c x86/ibt,paravirt: Sprinkle ENDBR
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>
Link: https://lore.kernel.org/r/20220308154318.051635891@infradead.org
2022-03-15 10:32:36 +01:00
Peter Zijlstra
c4691712b5 x86/linkage: Add ENDBR to SYM_FUNC_START*()
Ensure the ASM functions have ENDBR on for IBT builds, this follows
the ARM64 example. Unlike ARM64, we'll likely end up overwriting them
with poison.

Suggested-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>
Link: https://lore.kernel.org/r/20220308154317.992708941@infradead.org
2022-03-15 10:32:36 +01:00
Peter Zijlstra
8f93402b92 x86/ibt,entry: Sprinkle ENDBR dust
Kernel entry points should be having ENDBR on for IBT configs.

The SYSCALL entry points are found through taking their respective
address in order to program them in the MSRs, while the exception
entry points are found through UNWIND_HINT_IRET_REGS.

The rule is that any UNWIND_HINT_IRET_REGS at sym+0 should have an
ENDBR, see the later objtool ibt validation patch.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>
Link: https://lore.kernel.org/r/20220308154317.933157479@infradead.org
2022-03-15 10:32:35 +01:00