linux

Author	SHA1	Message	Date
Tianyu Lan	faff44069f	x86/hyperv: Add Write/Read MSR registers via ghcb page Hyperv provides GHCB protocol to write Synthetic Interrupt Controller MSR registers in Isolation VM with AMD SEV SNP and these registers are emulated by hypervisor directly. Hyperv requires to write SINTx MSR registers twice. First writes MSR via GHCB page to communicate with hypervisor and then writes wrmsr instruction to talk with paravisor which runs in VMPL0. Guest OS ID MSR also needs to be set via GHCB page. Reviewed-by: Michael Kelley <mikelley@microsoft.com> Signed-off-by: Tianyu Lan <Tianyu.Lan@microsoft.com> Link: https://lore.kernel.org/r/20211025122116.264793-7-ltykernel@gmail.com Signed-off-by: Wei Liu <wei.liu@kernel.org>	2021-10-28 11:22:38 +00:00
Tianyu Lan	810a521265	x86/hyperv: Add new hvcall guest address host visibility support Add new hvcall guest address host visibility support to mark memory visible to host. Call it inside set_memory_decrypted /encrypted(). Add HYPERVISOR feature check in the hv_is_isolation_supported() to optimize in non-virtualization environment. Acked-by: Dave Hansen <dave.hansen@intel.com> Reviewed-by: Michael Kelley <mikelley@microsoft.com> Signed-off-by: Tianyu Lan <Tianyu.Lan@microsoft.com> Link: https://lore.kernel.org/r/20211025122116.264793-4-ltykernel@gmail.com [ wei: fix conflicts with tip ] Signed-off-by: Wei Liu <wei.liu@kernel.org>	2021-10-28 11:21:33 +00:00
Tianyu Lan	af788f355e	x86/hyperv: Initialize shared memory boundary in the Isolation VM. Hyper-V exposes shared memory boundary via cpuid HYPERV_CPUID_ISOLATION_CONFIG and store it in the shared_gpa_boundary of ms_hyperv struct. This prepares to share memory with host for SNP guest. Reviewed-by: Michael Kelley <mikelley@microsoft.com> Signed-off-by: Tianyu Lan <Tianyu.Lan@microsoft.com> Link: https://lore.kernel.org/r/20211025122116.264793-3-ltykernel@gmail.com Signed-off-by: Wei Liu <wei.liu@kernel.org>	2021-10-28 10:47:52 +00:00
Tianyu Lan	0cc4f6d9f0	x86/hyperv: Initialize GHCB page in Isolation VM Hyperv exposes GHCB page via SEV ES GHCB MSR for SNP guest to communicate with hypervisor. Map GHCB page for all cpus to read/write MSR register and submit hvcall request via ghcb page. Reviewed-by: Michael Kelley <mikelley@microsoft.com> Signed-off-by: Tianyu Lan <Tianyu.Lan@microsoft.com> Link: https://lore.kernel.org/r/20211025122116.264793-2-ltykernel@gmail.com Signed-off-by: Wei Liu <wei.liu@kernel.org>	2021-10-28 10:47:52 +00:00
Wei Liu	e82f2069b5	Merge remote-tracking branch 'tip/x86/cc' into hyperv-next	2021-10-28 10:46:03 +00:00
Dave Airlie	970eae1560	BackMerge tag 'v5.15-rc7' into drm-next The msm next tree is based on rc3, so let's just backmerge rc7 before pulling it in. Signed-off-by: Dave Airlie <airlied@redhat.com>	2021-10-28 14:59:38 +10:00
王贇	ce5e48036c	ftrace: disable preemption when recursion locked As the documentation explained, ftrace_test_recursion_trylock() and ftrace_test_recursion_unlock() were supposed to disable and enable preemption properly, however currently this work is done outside of the function, which could be missing by mistake. And since the internal using of trace_test_and_set_recursion() and trace_clear_recursion() also require preemption disabled, we can just merge the logical. This patch will make sure the preemption has been disabled when trace_test_and_set_recursion() return bit >= 0, and trace_clear_recursion() will enable the preemption if previously enabled. Link: https://lkml.kernel.org/r/13bde807-779c-aa4c-0672-20515ae365ea@linux.alibaba.com CC: Petr Mladek <pmladek@suse.com> Cc: Guo Ren <guoren@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com> Cc: Helge Deller <deller@gmx.de> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Borislav Petkov <bp@alien8.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Jiri Kosina <jikos@kernel.org> Cc: Joe Lawrence <joe.lawrence@redhat.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Jisheng Zhang <jszhang@kernel.org> CC: Steven Rostedt <rostedt@goodmis.org> CC: Miroslav Benes <mbenes@suse.cz> Reported-by: Abaci <abaci@linux.alibaba.com> Suggested-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Michael Wang <yun.wang@linux.alibaba.com> [ Removed extra line in comment - SDR ] Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>	2021-10-27 11:21:49 -04:00
Paolo Bonzini	9b0971ca7f	KVM: SEV-ES: fix another issue with string I/O VMGEXITs If the guest requests string I/O from the hypervisor via VMGEXIT, SW_EXITINFO2 will contain the REP count. However, sev_es_string_io was incorrectly treating it as the size of the GHCB buffer in bytes. This fixes the "outsw" test in the experimental SEV tests of kvm-unit-tests. Cc: stable@vger.kernel.org Fixes: `7ed9abfe8e` ("KVM: SVM: Support string IO operations for an SEV-ES guest") Reported-by: Marc Orr <marcorr@google.com> Tested-by: Marc Orr <marcorr@google.com> Reviewed-by: Marc Orr <marcorr@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-10-27 10:58:26 -04:00
Kees Cook	33f98a9798	x86/boot/compressed: Avoid duplicate malloc() implementations The early malloc() and free() implementation in include/linux/decompress/mm.h (which is also included by the static decompressors) is static. This is fine when the only thing interested in using malloc() is the decompression code, but the x86 early boot environment may use malloc() in a couple places, leading to a potential collision when the static copies of the available memory region ("malloc_ptr") gets reset to the global "free_mem_ptr" value. As it happened, the existing usage pattern was accidentally safe because each user did 1 malloc() and 1 free() before returning and were not nested: extract_kernel() (misc.c) choose_random_location() (kaslr.c) mem_avoid_init() handle_mem_options() malloc() ... free() ... parse_elf() (misc.c) malloc() ... free() Once the future FGKASLR series is added, however, it will insert additional malloc() calls local to fgkaslr.c in the middle of parse_elf()'s malloc()/free() pair: parse_elf() (misc.c) malloc() if (...) { layout_randomized_image(output, &ehdr, phdrs); malloc() <- boom ... else layout_image(output, &ehdr, phdrs); free() To avoid collisions, there must be a single implementation of malloc(). Adjust include/linux/decompress/mm.h so that visibility can be controlled, provide prototypes in misc.h, and implement the functions in misc.c. This also results in a small size savings: $ size vmlinux.before vmlinux.after text data bss dec hex filename 8842314 468 178320 9021102 89a6ae vmlinux.before 8842240 468 178320 9021028 89a664 vmlinux.after Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20211013175742.1197608-4-keescook@chromium.org	2021-10-27 11:07:59 +02:00
Kees Cook	0d054d4e82	x86/boot: Allow a "silent" kaslr random byte fetch Under earlyprintk, each RNG call produces a debug report line. To support the future FGKASLR feature, which will fetch random bytes during function shuffling, this is not useful information (each line is identical and tells us nothing new), needlessly spamming the console. Instead, allow for a NULL "purpose" to suppress the debug reporting. Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Nick Desaulniers <ndesaulniers@google.com> Link: https://lore.kernel.org/r/20211013175742.1197608-3-keescook@chromium.org	2021-10-27 11:07:58 +02:00
Kristen Carlson Accardi	a54c401ae6	x86/tools/relocs: Support >64K section headers While the relocs tool already supports finding the total number of section headers if vmlinux exceeds 64K sections, it fails to read the extended symbol table to get section header indexes for symbols, causing incorrect symbol table indexes to be used when there are > 64K symbols. Parse the ELF file to read the extended symbol table info, and then replace all direct references to st_shndx with calls to sym_index(), which will determine whether the value can be read directly or whether the value should be pulled out of the extended table. This is needed for future FGKASLR support, which uses a separate section per function. Signed-off-by: Kristen Carlson Accardi <kristen@linux.intel.com> Signed-off-by: Alexander Lobakin <alexandr.lobakin@intel.com> Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Kees Cook <keescook@chromium.org> Reviewed-by: Tony Luck <tony.luck@intel.com> Acked-by: H. Peter Anvin (Intel) <hpa@zytor.com> Tested-by: Tony Luck <tony.luck@intel.com> Link: https://lore.kernel.org/r/20211013175742.1197608-2-keescook@chromium.org	2021-10-27 11:07:58 +02:00
Masami Hiramatsu	1f6d3a8f5e	kprobes: Add a test case for stacktrace from kretprobe handler Add a test case for stacktrace from kretprobe handler and nested kretprobe handlers. This test checks both of stack trace inside kretprobe handler and stack trace from pt_regs. Those stack trace must include actual function return address instead of kretprobe trampoline. The nested kretprobe stacktrace test checks whether the unwinder can correctly unwind the call frame on the stack which has been modified by the kretprobe. Since the stacktrace on kretprobe is correctly fixed only on x86, this introduces a meta kconfig ARCH_CORRECT_STACKTRACE_ON_KRETPROBE which tells user that the stacktrace on kretprobe is correct or not. The test results will be shown like below; TAP version 14 1..1 # Subtest: kprobes_test 1..6 ok 1 - test_kprobe ok 2 - test_kprobes ok 3 - test_kretprobe ok 4 - test_kretprobes ok 5 - test_stacktrace_on_kretprobe ok 6 - test_stacktrace_on_nested_kretprobe # kprobes_test: pass:6 fail:0 skip:0 total:6 # Totals: pass:6 fail:0 skip:0 total:6 ok 1 - kprobes_test Link: https://lkml.kernel.org/r/163516211244.604541.18350507860972214415.stgit@devnote2 Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>	2021-10-26 17:23:45 -04:00
Peter Zijlstra	5d1ceb3969	x86: Fix __get_wchan() for !STACKTRACE Use asm/unwind.h to implement wchan, since we cannot always rely on STACKTRACE=y. Fixes: `bc9bbb8173` ("x86: Fix get_wchan() to support the ORC unwinder") Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Kees Cook <keescook@chromium.org> Link: https://lkml.kernel.org/r/20211022152104.137058575@infradead.org	2021-10-26 21:10:12 +02:00
Chang S. Bae	2308ee57d9	x86/fpu/amx: Enable the AMX feature in 64-bit mode Add the AMX state components in XFEATURE_MASK_USER_SUPPORTED and the TILE_DATA component to the dynamic states and update the permission check table accordingly. This is only effective on 64 bit kernels as for 32bit kernels XFEATURE_MASK_TILE is defined as 0. TILE_DATA is caller-saved state and the only dynamic state. Add build time sanity check to ensure the assumption that every dynamic feature is caller- saved. Make AMX state depend on XFD as it is dynamic feature. Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lore.kernel.org/r/20211021225527.10184-24-chang.seok.bae@intel.com	2021-10-26 10:53:03 +02:00
Chang S. Bae	db3e7321b4	x86/fpu: Add XFD handling for dynamic states To handle the dynamic sizing of buffers on first use the XFD MSR has to be armed. Store the delta between the maximum available and the default feature bits in init_fpstate where it can be retrieved for task creation. If the delta is non zero then dynamic features are enabled. This needs also to enable the static key which guards the XFD updates. This is delayed to an initcall because the FPU setup runs before jump labels are initialized. Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lore.kernel.org/r/20211021225527.10184-23-chang.seok.bae@intel.com	2021-10-26 10:53:03 +02:00
Chang S. Bae	2ae996e0c1	x86/fpu: Calculate the default sizes independently When dynamically enabled states are supported the maximum and default sizes for the kernel buffers and user space interfaces are not longer identical. Put the necessary calculations in place which only take the default enabled features into account. Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lore.kernel.org/r/20211021225527.10184-22-chang.seok.bae@intel.com	2021-10-26 10:53:02 +02:00
Chang S. Bae	eec2113eab	x86/fpu/amx: Define AMX state components and have it used for boot-time checks The XSTATE initialization uses check_xstate_against_struct() to sanity check the size of XSTATE-enabled features. AMX is a XSAVE-enabled feature, and its size is not hard-coded but discoverable at run-time via CPUID. The AMX state is composed of state components 17 and 18, which are all user state components. The first component is the XTILECFG state of a 64-byte tile-related control register. The state component 18, called XTILEDATA, contains the actual tile data, and the state size varies on implementations. The architectural maximum, as defined in the CPUID(0x1d, 1): EAX[15:0], is a byte less than 64KB. The first implementation supports 8KB. Check the XTILEDATA state size dynamically. The feature introduces the new tile register, TMM. Define one register struct only and read the number of registers from CPUID. Cross-check the overall size with CPUID again. Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lore.kernel.org/r/20211021225527.10184-21-chang.seok.bae@intel.com	2021-10-26 10:53:02 +02:00
Chang S. Bae	70c3f1671b	x86/fpu/xstate: Prepare XSAVE feature table for gaps in state component numbers The kernel checks at boot time which features are available by walking a XSAVE feature table which contains the CPUID feature bit numbers which need to be checked whether a feature is available on a CPU or not. So far the feature numbers have been linear, but AMX will create a gap which the current code cannot handle. Make the table entries explicitly indexed and adjust the loop code accordingly to prepare for that. No functional change. Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Len Brown <len.brown@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lore.kernel.org/r/20211021225527.10184-20-chang.seok.bae@intel.com	2021-10-26 10:53:02 +02:00
Chang S. Bae	500afbf645	x86/fpu/xstate: Add fpstate_realloc()/free() The fpstate embedded in struct fpu is the default state for storing the FPU registers. It's sized so that the default supported features can be stored. For dynamically enabled features the register buffer is too small. The #NM handler detects first use of a feature which is disabled in the XFD MSR. After handling permission checks it recalculates the size for kernel space and user space state and invokes fpstate_realloc() which tries to reallocate fpstate and install it. Provide the allocator function which checks whether the current buffer size is sufficient and if not allocates one. If allocation is successful the new fpstate is initialized with the new features and sizes and the now enabled features is removed from the task's XFD mask. realloc_fpstate() uses vzalloc(). If use of this mechanism grows to re-allocate buffers larger than 64KB, a more sophisticated allocation scheme that includes purpose-built reclaim capability might be justified. Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lore.kernel.org/r/20211021225527.10184-19-chang.seok.bae@intel.com	2021-10-26 10:53:02 +02:00
Chang S. Bae	783e87b404	x86/fpu/xstate: Add XFD #NM handler If the XFD MSR has feature bits set then #NM will be raised when user space attempts to use an instruction related to one of these features. When the task has no permissions to use that feature, raise SIGILL, which is the same behavior as #UD. If the task has permissions, calculate the new buffer size for the extended feature set and allocate a larger fpstate. In the unlikely case that vzalloc() fails, SIGSEGV is raised. The allocation function will be added in the next step. Provide a stub which fails for now. [ tglx: Updated serialization ] Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lore.kernel.org/r/20211021225527.10184-18-chang.seok.bae@intel.com	2021-10-26 10:53:02 +02:00
Chang S. Bae	672365477a	x86/fpu: Update XFD state where required The IA32_XFD_MSR allows to arm #NM traps for XSTATE components which are enabled in XCR0. The register has to be restored before the tasks XSTATE is restored. The life time rules are the same as for FPU state. XFD is updated on return to userspace only when the FPU state of the task is not up to date in the registers. It's updated before the XRSTORS so that eventually enabled dynamic features are restored as well and not brought into init state. Also in signal handling for restoring FPU state from user space the correctness of the XFD state has to be ensured. Add it to CPU initialization and resume as well. Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lore.kernel.org/r/20211021225527.10184-17-chang.seok.bae@intel.com	2021-10-26 10:53:02 +02:00
Thomas Gleixner	5529acf47e	x86/fpu: Add sanity checks for XFD Add debug functionality to ensure that the XFD MSR is up to date for XSAVE* and XRSTOR* operations. [ tglx: Improve comment. ] Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20211021225527.10184-16-chang.seok.bae@intel.com	2021-10-26 10:52:35 +02:00
Chang S. Bae	8bf26758ca	x86/fpu: Add XFD state to fpstate Add storage for XFD register state to struct fpstate. This will be used to store the XFD MSR state. This will be used for switching the XFD MSR when FPU content is restored. Add a per-CPU variable to cache the current MSR value so the MSR has only to be written when the values are different. Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20211021225527.10184-15-chang.seok.bae@intel.com	2021-10-26 10:18:09 +02:00
Chang S. Bae	dae1bd5838	x86/msr-index: Add MSRs for XFD XFD introduces two MSRs: - IA32_XFD to enable/disable a feature controlled by XFD - IA32_XFD_ERR to expose to the #NM trap handler which feature was tried to be used for the first time. Both use the same xstate-component bitmap format, used by XCR0. Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20211021225527.10184-14-chang.seok.bae@intel.com	2021-10-26 10:18:09 +02:00
Chang S. Bae	c351101678	x86/cpufeatures: Add eXtended Feature Disabling (XFD) feature bit Intel's eXtended Feature Disable (XFD) feature is an extension of the XSAVE architecture. XFD allows the kernel to enable a feature state in XCR0 and to receive a #NM trap when a task uses instructions accessing that state. This is going to be used to postpone the allocation of a larger XSTATE buffer for a task to the point where it is actually using a related instruction after the permission to use that facility has been granted. XFD is not used by the kernel, but only applied to userspace. This is a matter of policy as the kernel knows how a fpstate is reallocated and the XFD state. The compacted XSAVE format is adjustable for dynamic features. Make XFD depend on XSAVES. Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20211021225527.10184-13-chang.seok.bae@intel.com	2021-10-26 10:18:09 +02:00
Chang S. Bae	e61d6310a0	x86/fpu: Reset permission and fpstate on exec() On exec(), extended register states saved in the buffer is cleared. With dynamic features, each task carries variables besides the register states. The struct fpu has permission information and struct fpstate contains buffer size and feature masks. They are all dynamically updated with dynamic features. Reset the current task's entire FPU data before an exec() so that the new task starts with default permission and fpstate. Rename the register state reset function because the old naming confuses as it does not reset struct fpstate. Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20211021225527.10184-12-chang.seok.bae@intel.com	2021-10-26 10:18:09 +02:00
Thomas Gleixner	9e798e9aa1	x86/fpu: Prepare fpu_clone() for dynamically enabled features The default portion of the parent's FPU state is saved in a child task. With dynamic features enabled, the non-default portion is not saved in a child's fpstate because these register states are defined to be caller-saved. The new task's fpstate is therefore the default buffer. Fork inherits the permission of the parent. Also, do not use memcpy() when TIF_NEED_FPU_LOAD is set because it is invalid when the parent has dynamic features. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20211021225527.10184-11-chang.seok.bae@intel.com	2021-10-26 10:18:09 +02:00
Chang S. Bae	53599b4d54	x86/fpu/signal: Prepare for variable sigframe length The software reserved portion of the fxsave frame in the signal frame is copied from structures which have been set up at boot time. With dynamically enabled features the content of these structures is no longer correct because the xfeatures and size can be different per task. Calculate the software reserved portion at runtime and fill in the xfeatures and size values from the tasks active fpstate. Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20211021225527.10184-10-chang.seok.bae@intel.com	2021-10-26 10:18:09 +02:00
Thomas Gleixner	4b7ca609a3	x86/signal: Use fpu::__state_user_size for sigalt stack validation Use the current->group_leader->fpu to check for pending permissions to use extended features and validate against the resulting user space size which is stored in the group leaders fpu struct as well. This prevents a task from installing a too small sized sigaltstack after permissions to use dynamically enabled features have been granted, but the task has not (yet) used a related instruction. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20211021225527.10184-9-chang.seok.bae@intel.com	2021-10-26 10:18:09 +02:00
Thomas Gleixner	23686ef25d	x86/fpu: Add basic helpers for dynamically enabled features To allow building up the infrastructure required to support dynamically enabled FPU features, add: - XFEATURES_MASK_DYNAMIC This constant will hold xfeatures which can be dynamically enabled. - fpu_state_size_dynamic() A static branch for 64-bit and a simple 'return false' for 32-bit. This helper allows to add dynamic-feature-specific changes to common code which is shared between 32-bit and 64-bit without #ifdeffery. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20211021225527.10184-8-chang.seok.bae@intel.com	2021-10-26 10:18:09 +02:00
Chang S. Bae	db8268df09	x86/arch_prctl: Add controls for dynamic XSTATE components Dynamically enabled XSTATE features are by default disabled for all processes. A process has to request permission to use such a feature. To support this implement a architecture specific prctl() with the options: - ARCH_GET_XCOMP_SUPP Copies the supported feature bitmap into the user space provided u64 storage. The pointer is handed in via arg2 - ARCH_GET_XCOMP_PERM Copies the process wide permitted feature bitmap into the user space provided u64 storage. The pointer is handed in via arg2 - ARCH_REQ_XCOMP_PERM Request permission for a feature set. A feature set can be mapped to a facility, e.g. AMX, and can require one or more XSTATE components to be enabled. The feature argument is the number of the highest XSTATE component which is required for a facility to work. The request argument is not a user supplied bitmap because that makes filtering harder (think seccomp) and even impossible because to support 32bit tasks the argument would have to be a pointer. The permission mechanism works this way: Task asks for permission for a facility and kernel checks whether that's supported. If supported it does: 1) Check whether permission has already been granted 2) Compute the size of the required kernel and user space buffer (sigframe) size. 3) Validate that no task has a sigaltstack installed which is smaller than the resulting sigframe size 4) Add the requested feature bit(s) to the permission bitmap of current->group_leader->fpu and store the sizes in the group leaders fpu struct as well. If that is successful then the feature is still not enabled for any of the tasks. The first usage of a related instruction will result in a #NM trap. The trap handler validates the permission bit of the tasks group leader and if permitted it installs a larger kernel buffer and transfers the permission and size info to the new fpstate container which makes all the FPU functions which require per task information aware of the extended feature set. [ tglx: Adopted to new base code, added missing serialization, massaged namings, comments and changelog ] Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20211021225527.10184-7-chang.seok.bae@intel.com	2021-10-26 10:18:09 +02:00
Thomas Gleixner	c33f0a81a2	x86/fpu: Add fpu_state_config::legacy_features The upcoming prctl() which is required to request the permission for a dynamically enabled feature will also provide an option to retrieve the supported features. If the CPU does not support XSAVE, the supported features would be 0 even when the CPU supports FP and SSE. Provide separate storage for the legacy feature set to avoid that and fill in the bits in the legacy init function. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20211021225527.10184-6-chang.seok.bae@intel.com	2021-10-26 10:18:09 +02:00
Thomas Gleixner	6f6a7c09c4	x86/fpu: Add members to struct fpu to cache permission information Dynamically enabled features can be requested by any thread of a running process at any time. The request does neither enable the feature nor allocate larger buffers. It just stores the permission to use the feature by adding the features to the permission bitmap and by calculating the required sizes for kernel and user space. The reallocation of the kernel buffer happens when the feature is used for the first time which is caught by an exception. The permission bitmap is then checked and if the feature is permitted, then it becomes fully enabled. If not, the task dies similarly to a task which uses an undefined instruction. The size information is precomputed to allow proper sigaltstack size checks once the feature is permitted, but not yet in use because otherwise this would open race windows where too small stacks could be installed causing a later fail on signal delivery. Initialize them to the default feature set and sizes. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20211021225527.10184-5-chang.seok.bae@intel.com	2021-10-26 10:18:09 +02:00
Chang S. Bae	84e4dccc8f	x86/fpu/xstate: Provide xstate_calculate_size() Split out the size calculation from the paranoia check so it can be used for recalculating buffer sizes when dynamically enabled features are supported. Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com> [ tglx: Adopted to changed base code ] Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20211021225527.10184-4-chang.seok.bae@intel.com	2021-10-26 10:18:09 +02:00
Thomas Gleixner	3aac3ebea0	x86/signal: Implement sigaltstack size validation For historical reasons MINSIGSTKSZ is a constant which became already too small with AVX512 support. Add a mechanism to enforce strict checking of the sigaltstack size against the real size of the FPU frame. The strict check can be enabled via a config option and can also be controlled via the kernel command line option 'strict_sas_size' independent of the config switch. Enabling it might break existing applications which allocate a too small sigaltstack but 'work' because they never get a signal delivered. Though it can be handy to filter out binaries which are not yet aware of AT_MINSIGSTKSZ. Also the upcoming support for dynamically enabled FPU features requires a strict sanity check to ensure that: - Enabling of a dynamic feature, which changes the sigframe size fits into an enabled sigaltstack - Installing a too small sigaltstack after a dynamic feature has been added is not possible. Implement the base check which is controlled by config and command line options. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20211021225527.10184-3-chang.seok.bae@intel.com	2021-10-26 10:18:09 +02:00
Daniel Bristot de Oliveira	9bd985766a	trace/osnoise: Fix an ifdef comment s/CONFIG_OSNOISE_TRAECR/CONFIG_OSNOISE_TRACER/ No functional changes. Link: https://lkml.kernel.org/r/33924a16f6e5559ce24952ca7d62561604bfd94a.1634308385.git.bristot@kernel.org Cc: Daniel Bristot de Oliveira <bristot@kernel.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: x86@kernel.org Cc: linux-doc@vger.kernel.org Cc: linux-kernel@vger.kernel.org Signed-off-by: Daniel Bristot de Oliveira <bristot@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>	2021-10-25 23:02:36 -04:00
Eric W. Biederman	1fbd60df8a	signal/vm86_32: Properly send SIGSEGV when the vm86 state cannot be saved. Update save_v86_state to always complete all of it's work except possibly some of the copies to userspace even if save_v86_state takes a fault. This ensures that the kernel is always in a sane state, even if userspace has done something silly. When save_v86_state takes a fault update it to force userspace to take a SIGSEGV and terminate the userspace application. As Andy pointed out in review of the first version of this change there are races between sigaction and the application terinating. Now that the code has been modified to always perform all save_v86_state's work (except possibly copying to userspace) those races do not matter from a kernel perspective. Forcing the userspace application to terminate (by resetting it's handler to SIGDFL) is there to keep everything as close to the current behavior as possible while removing the unique (and difficult to maintain) use of do_exit. If this new SIGSEGV happens during handle_signal the next time around the exit_to_user_mode_loop, SIGSEGV will be delivered to userspace. All of the callers of handle_vm86_trap and handle_vm86_fault run the exit_to_user_mode_loop before they return to userspace any signal sent to the current task during their execution will be delivered to the current task before that tasks exits to usermode. Cc: Andy Lutomirski <luto@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: x86@kernel.org Cc: H Peter Anvin <hpa@zytor.com> v1: https://lkml.kernel.org/r/20211020174406.17889-10-ebiederm@xmission.com Link: https://lkml.kernel.org/r/877de1xcr6.fsf_-_@disp2133 Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>	2021-10-25 15:56:29 -05:00
Eric W. Biederman	1a4d21a23c	signal/vm86_32: Replace open coded BUG_ON with an actual BUG_ON The function save_v86_state is only called when userspace was operating in vm86 mode before entering the kernel. Not having vm86 state in the task_struct should never happen. So transform the hand rolled BUG_ON into an actual BUG_ON to make it clear what is happening. Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: x86@kernel.org Cc: H Peter Anvin <hpa@zytor.com> Link: https://lkml.kernel.org/r/20211020174406.17889-9-ebiederm@xmission.com Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>	2021-10-25 15:56:29 -05:00
Tianyu Lan	007faec014	x86/sev: Expose sev_es_ghcb_hv_call() for use by HyperV Hyper-V needs to issue the GHCB HV call in order to read/write MSRs in Isolation VMs. For that, expose sev_es_ghcb_hv_call(). The Hyper-V Isolation VMs are unenlightened guests and run a paravisor at VMPL0 for communicating. GHCB pages are being allocated and set up by that paravisor. Linux gets the GHCB page's physical address via MSR_AMD64_SEV_ES_GHCB from the paravisor and should not change it. Add a @set_ghcb_msr parameter to sev_es_ghcb_hv_call() to control whether the function should set the GHCB's address prior to the call or not and export that function for use by HyperV. [ bp: - Massage commit message - add a struct ghcb forward declaration to fix randconfig builds. ] Signed-off-by: Tianyu Lan <Tianyu.Lan@microsoft.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Michael Kelley <mikelley@microsoft.com> Link: https://lore.kernel.org/r/20211025122116.264793-6-ltykernel@gmail.com	2021-10-25 18:11:42 +02:00
David Woodhouse	0985dba842	KVM: x86/xen: Fix kvm_xen_has_interrupt() sleeping in kvm_vcpu_block() In kvm_vcpu_block, the current task is set to TASK_INTERRUPTIBLE before making a final check whether the vCPU should be woken from HLT by any incoming interrupt. This is a problem for the get_user() in __kvm_xen_has_interrupt(), which really shouldn't be sleeping when the task state has already been set. I think it's actually harmless as it would just manifest itself as a spurious wakeup, but it's causing a debug warning: [ 230.963649] do not call blocking ops when !TASK_RUNNING; state=1 set at [<00000000b6bcdbc9>] prepare_to_swait_exclusive+0x30/0x80 Fix the warning by turning it into an explicit spurious wakeup. When invoked with !task_is_running(current) (and we might as well add in_atomic() there while we're at it), just return 1 to indicate that an IRQ is pending, which will cause a wakeup and then something will call it again in a context that can sleep so it can fault the page back in. Cc: stable@vger.kernel.org Fixes: `40da8ccd72` ("KVM: x86/xen: Add event channel interrupt vector upcall") Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Message-Id: <168bf8c689561da904e48e2ff5ae4713eaef9e2d.camel@infradead.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-10-25 09:10:18 -04:00
David Woodhouse	8228c77d8b	KVM: x86: switch pvclock_gtod_sync_lock to a raw spinlock On the preemption path when updating a Xen guest's runstate times, this lock is taken inside the scheduler rq->lock, which is a raw spinlock. This was shown in a lockdep warning: [ 89.138354] ============================= [ 89.138356] [ BUG: Invalid wait context ] [ 89.138358] 5.15.0-rc5+ #834 Tainted: G S I E [ 89.138360] ----------------------------- [ 89.138361] xen_shinfo_test/2575 is trying to lock: [ 89.138363] ffffa34a0364efd8 (&kvm->arch.pvclock_gtod_sync_lock){....}-{3:3}, at: get_kvmclock_ns+0x1f/0x130 [kvm] [ 89.138442] other info that might help us debug this: [ 89.138444] context-{5:5} [ 89.138445] 4 locks held by xen_shinfo_test/2575: [ 89.138447] #0: ffff972bdc3b8108 (&vcpu->mutex){+.+.}-{4:4}, at: kvm_vcpu_ioctl+0x77/0x6f0 [kvm] [ 89.138483] #1: ffffa34a03662e90 (&kvm->srcu){....}-{0:0}, at: kvm_arch_vcpu_ioctl_run+0xdc/0x8b0 [kvm] [ 89.138526] #2: ffff97331fdbac98 (&rq->__lock){-.-.}-{2:2}, at: __schedule+0xff/0xbd0 [ 89.138534] #3: ffffa34a03662e90 (&kvm->srcu){....}-{0:0}, at: kvm_arch_vcpu_put+0x26/0x170 [kvm] ... [ 89.138695] get_kvmclock_ns+0x1f/0x130 [kvm] [ 89.138734] kvm_xen_update_runstate+0x14/0x90 [kvm] [ 89.138783] kvm_xen_update_runstate_guest+0x15/0xd0 [kvm] [ 89.138830] kvm_arch_vcpu_put+0xe6/0x170 [kvm] [ 89.138870] kvm_sched_out+0x2f/0x40 [kvm] [ 89.138900] __schedule+0x5de/0xbd0 Cc: stable@vger.kernel.org Reported-by: syzbot+b282b65c2c68492df769@syzkaller.appspotmail.com Fixes: `30b5c851af` ("KVM: x86/xen: Add support for vCPU runstate information") Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Message-Id: <1b02a06421c17993df337493a68ba923f3bd5c0f.camel@infradead.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-10-25 08:14:38 -04:00
David Edmondson	0d7d84498f	KVM: x86: SGX must obey the KVM_INTERNAL_ERROR_EMULATION protocol When passing the failing address and size out to user space, SGX must ensure not to trample on the earlier fields of the emulation_failure sub-union of struct kvm_run. Signed-off-by: David Edmondson <david.edmondson@oracle.com> Reviewed-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210920103737.2696756-5-david.edmondson@oracle.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-10-25 06:48:25 -04:00
David Edmondson	e615e35589	KVM: x86: On emulation failure, convey the exit reason, etc. to userspace Should instruction emulation fail, include the VM exit reason, etc. in the emulation_failure data passed to userspace, in order that the VMM can report it as a debugging aid when describing the failure. Suggested-by: Joao Martins <joao.m.martins@oracle.com> Signed-off-by: David Edmondson <david.edmondson@oracle.com> Reviewed-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210920103737.2696756-4-david.edmondson@oracle.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-10-25 06:48:24 -04:00
David Edmondson	0a62a0319a	KVM: x86: Get exit_reason as part of kvm_x86_ops.get_exit_info Extend the get_exit_info static call to provide the reason for the VM exit. Modify relevant trace points to use this rather than extracting the reason in the caller. Signed-off-by: David Edmondson <david.edmondson@oracle.com> Reviewed-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210920103737.2696756-3-david.edmondson@oracle.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-10-25 06:48:24 -04:00
Rob Herring	f2739ca15c	x86/of: Kill unused early_init_dt_scan_chosen_arch() There are no callers for early_init_dt_scan_chosen_arch(), so remove it. Signed-off-by: Rob Herring <robh@kernel.org> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Frank Rowand <frank.rowand@sony.com> Link: https://lkml.kernel.org/r/20211022164642.2815706-1-robh@kernel.org	2021-10-25 10:56:37 +02:00
Masahiro Yamada	8212f8986d	kbuild: use more subdir- for visiting subdirectories while cleaning Documentation/kbuild/makefiles.rst suggests to use "archclean" for cleaning arch/$(SRCARCH)/boot/, but it is not a hard requirement. Since commit `d92cc4d516` ("kbuild: require all architectures to have arch/$(SRCARCH)/Kbuild"), we can use the "subdir- += boot" trick for all architectures. This can take advantage of the parallel option (-j) for "make clean". I also cleaned up the comments in arch/$(SRCARCH)/Makefile. The "archdep" target no longer exists. Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Reviewed-by: Kees Cook <keescook@chromium.org> Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc)	2021-10-24 13:49:46 +09:00
Thomas Gleixner	582b01b6ab	x86/fpu: Remove old KVM FPU interface No more users. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20211022185313.074853631@linutronix.de	2021-10-23 17:05:19 +02:00
Thomas Gleixner	d69c1382e1	x86/kvm: Convert FPU handling to a single swap buffer For the upcoming AMX support it's necessary to do a proper integration with KVM. Currently KVM allocates two FPU structs which are used for saving the user state of the vCPU thread and restoring the guest state when entering vcpu_run() and doing the reverse operation before leaving vcpu_run(). With the new fpstate mechanism this can be reduced to one extra buffer by swapping the fpstate pointer in current::thread::fpu. This makes the upcoming support for AMX and XFD simpler because then fpstate information (features, sizes, xfd) are always consistent and it does not require any nasty workarounds. Convert the KVM FPU code over to this new scheme. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20211022185313.019454292@linutronix.de	2021-10-23 16:13:29 +02:00
Thomas Gleixner	69f6ed1d14	x86/fpu: Provide infrastructure for KVM FPU cleanup For the upcoming AMX support it's necessary to do a proper integration with KVM. Currently KVM allocates two FPU structs which are used for saving the user state of the vCPU thread and restoring the guest state when entering vcpu_run() and doing the reverse operation before leaving vcpu_run(). With the new fpstate mechanism this can be reduced to one extra buffer by swapping the fpstate pointer in current::thread::fpu. This makes the upcoming support for AMX and XFD simpler because then fpstate information (features, sizes, xfd) are always consistent and it does not require any nasty workarounds. Provide: - An allocator which initializes the state properly - A replacement for the existing FPU swap mechanim Aside of the reduced memory footprint, this also makes state switching more efficient when TIF_FPU_NEED_LOAD is set. It does not require a memcpy as the state is already correct in the to be swapped out fpstate. The existing interfaces will be removed once KVM is converted over. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20211022185312.954684740@linutronix.de	2021-10-23 14:50:19 +02:00
Thomas Gleixner	75c52dad5e	x86/fpu: Prepare for sanitizing KVM FPU code For the upcoming AMX support it's necessary to do a proper integration with KVM. To avoid more nasty hackery in KVM which violate encapsulation extend struct fpu and fpstate so the fpstate switching can be consolidated and simplified. Currently KVM allocates two FPU structs which are used for saving the user state of the vCPU thread and restoring the guest state when entering vcpu_run() and doing the reverse operation before leaving vcpu_run(). With the new fpstate mechanism this can be reduced to one extra buffer by swapping the fpstate pointer in current::thread::fpu. This makes the upcoming support for AMX and XFD simpler because then fpstate information (features, sizes, xfd) are always consistent and it does not require any nasty workarounds. Add fpu::__task_fpstate to save the regular fpstate pointer while the task is inside vcpu_run(). Add some state fields to fpstate to indicate the nature of the state. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20211022185312.896403942@linutronix.de	2021-10-23 13:14:50 +02:00

... 12 13 14 15 16 ...

40406 Commits