linux

Author	SHA1	Message	Date
Sean Christopherson	527d5cd7ee	KVM: x86/mmu: Zap only obsolete roots if a root shadow page is zapped Zap only obsolete roots when responding to zapping a single root shadow page. Because KVM keeps root_count elevated when stuffing a previous root into its PGD cache, shadowing a 64-bit guest means that zapping any root causes all vCPUs to reload all roots, even if their current root is not affected by the zap. For many kernels, zapping a single root is a frequent operation, e.g. in Linux it happens whenever an mm is dropped, e.g. process exits, etc... Signed-off-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Ben Gardon <bgardon@google.com> Message-Id: <20220225182248.3812651-5-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-03-01 08:58:25 -05:00
Paolo Bonzini	0c1c92f15f	KVM: x86/mmu: do not pass vcpu to root freeing functions These functions only operate on a given MMU, of which there is more than one in a vCPU (we care about two, because the third does not have any roots and is only used to walk guest page tables). They do need a struct kvm in order to lock the mmu_lock, but they do not needed anything else in the struct kvm_vcpu. So, pass the vcpu->kvm directly to them. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-02-25 08:20:17 -05:00
Paolo Bonzini	b9e5603c2a	KVM: x86: use struct kvm_mmu_root_info for mmu->root The root_hpa and root_pgd fields form essentially a struct kvm_mmu_root_info. Use the struct to have more consistency between mmu->root and mmu->prev_roots. The patch is entirely search and replace except for cached_root_available, which does not need a temporary struct kvm_mmu_root_info anymore. Reviewed-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-02-25 08:20:16 -05:00
David Dunn	ba7bb663f5	KVM: x86: Provide per VM capability for disabling PMU virtualization Add a new capability, KVM_CAP_PMU_CAPABILITY, that takes a bitmask of settings/features to allow userspace to configure PMU virtualization on a per-VM basis. For now, support a single flag, KVM_PMU_CAP_DISABLE, to allow disabling PMU virtualization for a VM even when KVM is configured with enable_pmu=true a module level. To keep KVM simple, disallow changing VM's PMU configuration after vCPUs have been created. Signed-off-by: David Dunn <daviddunn@google.com> Message-Id: <20220223225743.2703915-2-daviddunn@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-02-25 08:20:14 -05:00
Sean Christopherson	925088781e	KVM: x86: Fix pointer mistmatch warning when patching RET0 static calls Cast kvm_x86_ops.func to 'void ' when updating KVM static calls that are conditionally patched to __static_call_return0(). clang complains about using mismatching pointers in the ternary operator, which breaks the build when compiling with CONFIG_KVM_WERROR=y. >> arch/x86/include/asm/kvm-x86-ops.h:82:1: warning: pointer type mismatch ('bool ()(struct kvm_vcpu )' and 'void ') [-Wpointer-type-mismatch] Fixes: `5be2226f41` ("KVM: x86: allow defining return-0 static calls") Reported-by: Like Xu <like.xu.linux@gmail.com> Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Reviewed-by: David Dunn <daviddunn@google.com> Reviewed-by: Nathan Chancellor <nathan@kernel.org> Tested-by: Nathan Chancellor <nathan@kernel.org> Message-Id: <20220223162355.3174907-1-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-02-25 08:20:14 -05:00
Arnd Bergmann	12700c17fc	uaccess: generalize access_ok() There are many different ways that access_ok() is defined across architectures, but in the end, they all just compare against the user_addr_max() value or they accept anything. Provide one definition that works for most architectures, checking against TASK_SIZE_MAX for user processes or skipping the check inside of uaccess_kernel() sections. For architectures without CONFIG_SET_FS(), this should be the fastest check, as it comes down to a single comparison of a pointer against a compile-time constant, while the architecture specific versions tend to do something more complex for historic reasons or get something wrong. Type checking for __user annotations is handled inconsistently across architectures, but this is easily simplified as well by using an inline function that takes a 'const void __user *' argument. A handful of callers need an extra __user annotation for this. Some architectures had trick to use 33-bit or 65-bit arithmetic on the addresses to calculate the overflow, however this simpler version uses fewer registers, which means it can produce better object code in the end despite needing a second (statically predicted) branch. Reviewed-by: Christoph Hellwig <hch@lst.de> Acked-by: Mark Rutland <mark.rutland@arm.com> [arm64, asm-generic] Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> Acked-by: Stafford Horne <shorne@gmail.com> Acked-by: Dinh Nguyen <dinguyen@kernel.org> Signed-off-by: Arnd Bergmann <arnd@arndb.de>	2022-02-25 09:36:05 +01:00
Arnd Bergmann	34737e2698	uaccess: add generic __{get,put}_kernel_nofault Nine architectures are still missing __{get,put}_kernel_nofault: alpha, ia64, microblaze, nds32, nios2, openrisc, sh, sparc32, xtensa. Add a generic version that lets everything use the normal copy_{from,to}_kernel_nofault() code based on these, removing the last use of get_fs()/set_fs() from architecture-independent code. Reviewed-by: Christoph Hellwig <hch@lst.de> Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Arnd Bergmann <arnd@arndb.de>	2022-02-25 09:36:05 +01:00
Arnd Bergmann	1830a1d6a5	x86: use more conventional access_ok() definition The way that access_ok() is defined on x86 is slightly different from most other architectures, and a bit more complex. The generic version tends to result in the best output on all architectures, as it results in single comparison against a constant limit for calls with a known size. There are a few callers of __range_not_ok(), all of which use TASK_SIZE as the limit rather than TASK_SIZE_MAX, but I could not see any reason for picking this. Changing these to call __access_ok() instead uses the default limit, but keeps the behavior otherwise. x86 is the only architecture with a WARN_ON_IN_IRQ() checking access_ok(), but it's probably best to leave that in place. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Arnd Bergmann <arnd@arndb.de>	2022-02-25 09:36:05 +01:00
Arnd Bergmann	36903abedf	x86: remove __range_not_ok() The __range_not_ok() helper is an x86 (and sparc64) specific interface that does roughly the same thing as __access_ok(), but with different calling conventions. Change this to use the normal interface in order for consistency as we clean up all access_ok() implementations. This changes the limit from TASK_SIZE to TASK_SIZE_MAX, which Al points out is the right thing do do here anyway. The callers have to use __access_ok() instead of the normal access_ok() though, because on x86 that contains a WARN_ON_IN_IRQ() check that cannot be used inside of NMI context while tracing. The check in copy_code() is not needed any more, because this one is already done by copy_from_user_nmi(). Suggested-by: Al Viro <viro@zeniv.linux.org.uk> Suggested-by: Christoph Hellwig <hch@infradead.org> Link: https://lore.kernel.org/lkml/YgsUKcXGR7r4nINj@zeniv-ca.linux.org.uk/ Signed-off-by: Arnd Bergmann <arnd@arndb.de>	2022-02-25 09:36:05 +01:00
Linus Torvalds	1f840c0ef4	Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm Pull kvm fixes from Paolo Bonzini: "x86 host: - Expose KVM_CAP_ENABLE_CAP since it is supported - Disable KVM_HC_CLOCK_PAIRING in TSC catchup mode - Ensure async page fault token is nonzero - Fix lockdep false negative - Fix FPU migration regression from the AMX changes x86 guest: - Don't use PV TLB/IPI/yield on uniprocessor guests PPC: - reserve capability id (topic branch for ppc/kvm)" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: KVM: x86: nSVM: disallow userspace setting of MSR_AMD64_TSC_RATIO to non default value when tsc scaling disabled KVM: x86/mmu: make apf token non-zero to fix bug KVM: PPC: reserve capability 210 for KVM_CAP_PPC_AIL_MODE_3 x86/kvm: Don't use pv tlb/ipi/sched_yield if on 1 vCPU x86/kvm: Fix compilation warning in non-x86_64 builds x86/kvm/fpu: Remove kvm_vcpu_arch.guest_supported_xcr0 x86/kvm/fpu: Limit guest user_xfeatures to supported bits of XCR0 kvm: x86: Disable KVM_HC_CLOCK_PAIRING if tsc is in always catchup mode KVM: Fix lockdep false negative during host resume KVM: x86: Add KVM_CAP_ENABLE_CAP to x86	2022-02-24 14:05:49 -08:00
Brijesh Singh	1e8c5971c2	x86/mm/cpa: Generalize __set_memory_enc_pgtable() The kernel provides infrastructure to set or clear the encryption mask from the pages for AMD SEV, but TDX requires few tweaks. - TDX and SEV have different requirements to the cache and TLB flushing. - TDX has own routine to notify VMM about page encryption status change. Modify __set_memory_enc_pgtable() and make it flexible enough to cover both AMD SEV and Intel TDX. The AMD-specific behavior is isolated in the callbacks under x86_platform.guest. TDX will provide own version of said callbacks. [ bp: Beat into submission. ] Signed-off-by: Brijesh Singh <brijesh.singh@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Link: https://lore.kernel.org/r/20220223043528.2093214-1-brijesh.singh@amd.com	2022-02-23 19:14:29 +01:00
Kirill A. Shutemov	b577f542f9	x86/coco: Add API to handle encryption mask AMD SME/SEV uses a bit in the page table entries to indicate that the page is encrypted and not accessible to the VMM. TDX uses a similar approach, but the polarity of the mask is opposite to AMD: if the bit is set the page is accessible to VMM. Provide vendor-neutral API to deal with the mask: cc_mkenc() and cc_mkdec() modify given address to make it encrypted/decrypted. It can be applied to phys_addr_t, pgprotval_t or page table entry value. pgprot_encrypted() and pgprot_decrypted() reimplemented using new helpers. The implementation will be extended to cover TDX. pgprot_decrypted() is used by drivers (i915, virtio_gpu, vfio). cc_mkdec() called by pgprot_decrypted(). Export cc_mkdec(). Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com> Link: https://lore.kernel.org/r/20220222185740.26228-5-kirill.shutemov@linux.intel.com	2022-02-23 19:14:29 +01:00
Kirill A. Shutemov	655a0fa34b	x86/coco: Explicitly declare type of confidential computing platform The kernel derives the confidential computing platform type it is running as from sme_me_mask on AMD or by using hv_is_isolation_supported() on HyperV isolation VMs. This detection process will be more complicated as more platforms get added. Declare a confidential computing vendor variable explicitly and set it via cc_set_vendor() on the respective platform. [ bp: Massage commit message, fixup HyperV check. ] Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com> Link: https://lore.kernel.org/r/20220222185740.26228-4-kirill.shutemov@linux.intel.com	2022-02-23 19:14:16 +01:00
Christoph Hellwig	4509d950a6	x86/pat: Remove the unused set_pages_array_wt() function Commit `623dffb2a2` ("x86/mm/pat: Add set_memory_wt() for Write-Through type") added it but there were no users. [ bp: Add a commit message. ] Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lore.kernel.org/r/20220223072852.616143-1-hch@lst.de	2022-02-23 13:34:08 +01:00
Ingo Molnar	669f45f19c	sched/headers: Add initial new headers as identity mappings This allows code sharing between fast-headers tree and the vanilla scheduler tree. Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Peter Zijlstra <peterz@infradead.org>	2022-02-23 10:58:28 +01:00
Ingo Molnar	6255b48aeb	Merge tag 'v5.17-rc5' into sched/core, to resolve conflicts New conflicts in sched/core due to the following upstream fixes: `44585f7bc0` ("psi: fix "defined but not used" warnings when CONFIG_PROC_FS=n") `a06247c680` ("psi: Fix uaf issue when psi trigger is destroyed while being polled") Conflicts: include/linux/psi_types.h kernel/sched/psi.c Signed-off-by: Ingo Molnar <mingo@kernel.org>	2022-02-21 11:53:51 +01:00
Peter Zijlstra	1e19da8522	x86/speculation: Add eIBRS + Retpoline options Thanks to the chaps at VUsec it is now clear that eIBRS is not sufficient, therefore allow enabling of retpolines along with eIBRS. Add spectre_v2=eibrs, spectre_v2=eibrs,lfence and spectre_v2=eibrs,retpoline options to explicitly pick your preferred means of mitigation. Since there's new mitigations there's also user visible changes in /sys/devices/system/cpu/vulnerabilities/spectre_v2 to reflect these new mitigations. [ bp: Massage commit message, trim error messages, do more precise eIBRS mode checking. ] Co-developed-by: Josh Poimboeuf <jpoimboe@redhat.com> Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Patrick Colp <patrick.colp@oracle.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de>	2022-02-21 10:21:35 +01:00
Peter Zijlstra (Intel)	d45476d983	x86/speculation: Rename RETPOLINE_AMD to RETPOLINE_LFENCE The RETPOLINE_AMD name is unfortunate since it isn't necessarily AMD only, in fact Hygon also uses it. Furthermore it will likely be sufficient for some Intel processors. Therefore rename the thing to RETPOLINE_LFENCE to better describe what it is. Add the spectre_v2=retpoline,lfence option as an alias to spectre_v2=retpoline,amd to preserve existing setups. However, the output of /sys/devices/system/cpu/vulnerabilities/spectre_v2 will be changed. [ bp: Fix typos, massage. ] Co-developed-by: Josh Poimboeuf <jpoimboe@redhat.com> Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Thomas Gleixner <tglx@linutronix.de>	2022-02-21 10:21:28 +01:00
Mark Rutland	8a69fe0be1	sched/preempt: Refactor sched_dynamic_update() Currently sched_dynamic_update needs to open-code the enabled/disabled function names for each preemption model it supports, when in practice this is a boolean enabled/disabled state for each function. Make this clearer and avoid repetition by defining the enabled/disabled states at the function definition, and using helper macros to perform the static_call_update(). Where x86 currently overrides the enabled function, it is made to provide both the enabled and disabled states for consistency, with defaults provided by the core code otherwise. In subsequent patches this will allow us to support PREEMPT_DYNAMIC without static calls. There should be no functional change as a result of this patch. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Ard Biesheuvel <ardb@kernel.org> Acked-by: Frederic Weisbecker <frederic@kernel.org> Link: https://lore.kernel.org/r/20220214165216.2231574-3-mark.rutland@arm.com	2022-02-19 11:11:07 +01:00
Sean Christopherson	1bbc60d0c7	KVM: x86/mmu: Remove MMU auditing Remove mmu_audit.c and all its collateral, the auditing code has suffered severe bitrot, ironically partly due to shadow paging being more stable and thus not benefiting as much from auditing, but mostly due to TDP supplanting shadow paging for non-nested guests and shadowing of nested TDP not heavily stressing the logic that is being audited. Signed-off-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-02-18 13:46:23 -05:00
Paolo Bonzini	5be2226f41	KVM: x86: allow defining return-0 static calls A few vendor callbacks are only used by VMX, but they return an integer or bool value. Introduce KVM_X86_OP_OPTIONAL_RET0 for them: if a func is NULL in struct kvm_x86_ops, it will be changed to __static_call_return0 when updating static calls. Reviewed-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-02-18 12:44:22 -05:00
Paolo Bonzini	abb6d479e2	KVM: x86: make several APIC virtualization callbacks optional All their invocations are conditional on vcpu->arch.apicv_active, meaning that they need not be implemented by vendor code: even though at the moment both vendors implement APIC virtualization, all of them can be optional. In fact SVM does not need many of them, and their implementation can be deleted now. Reviewed-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-02-18 12:41:23 -05:00
Paolo Bonzini	dd2319c618	KVM: x86: warn on incorrectly NULL members of kvm_x86_ops Use the newly corrected KVM_X86_OP annotations to warn about possible NULL pointer dereferences as soon as the vendor module is loaded. Reviewed-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-02-18 12:31:43 -05:00
Paolo Bonzini	e4fc23bad8	KVM: x86: remove KVM_X86_OP_NULL and mark optional kvm_x86_ops The original use of KVM_X86_OP_NULL, which was to mark calls that do not follow a specific naming convention, is not in use anymore. Instead, let's mark calls that are optional because they are always invoked within conditionals or with static_call_cond. Those that are _not_, i.e. those that are defined with KVM_X86_OP, must be defined by both vendor modules or some kind of NULL pointer dereference is bound to happen at runtime. Reviewed-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-02-18 12:31:27 -05:00
Paolo Bonzini	8a2897853c	KVM: x86: return 1 unconditionally for availability of KVM_CAP_VAPIC The two ioctls used to implement userspace-accelerated TPR, KVM_TPR_ACCESS_REPORTING and KVM_SET_VAPIC_ADDR, are available even if hardware-accelerated TPR can be used. So there is no reason not to report KVM_CAP_VAPIC. Reviewed-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-02-18 12:30:56 -05:00
Jakub Kicinski	6b5567b1b2	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net No conflicts. Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-02-17 11:44:20 -08:00
Leonardo Bras	988896bb61	x86/kvm/fpu: Remove kvm_vcpu_arch.guest_supported_xcr0 kvm_vcpu_arch currently contains the guest supported features in both guest_supported_xcr0 and guest_fpu.fpstate->user_xfeatures field. Currently both fields are set to the same value in kvm_vcpu_after_set_cpuid() and are not changed anywhere else after that. Since it's not good to keep duplicated data, remove guest_supported_xcr0. To keep the code more readable, introduce kvm_guest_supported_xcr() and kvm_guest_supported_xfd() to replace the previous usages of guest_supported_xcr0. Signed-off-by: Leonardo Bras <leobras@redhat.com> Message-Id: <20220217053028.96432-3-leobras@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-02-17 10:06:49 -05:00
Gustavo A. R. Silva	5224f79096	treewide: Replace zero-length arrays with flexible-array members There is a regular need in the kernel to provide a way to declare having a dynamically sized set of trailing elements in a structure. Kernel code should always use “flexible array members”[1] for these cases. The older style of one-element or zero-length arrays should no longer be used[2]. This code was transformed with the help of Coccinelle: (next-20220214$ spatch --jobs $(getconf _NPROCESSORS_ONLN) --sp-file script.cocci --include-headers --dir . > output.patch) @@ identifier S, member, array; type T1, T2; @@ struct S { ... T1 member; T2 array[ - 0 ]; }; UAPI and wireless changes were intentionally excluded from this patch and will be sent out separately. [1] https://en.wikipedia.org/wiki/Flexible_array_member [2] https://www.kernel.org/doc/html/v5.16/process/deprecated.html#zero-length-and-one-element-arrays Link: https://github.com/KSPP/linux/issues/78 Reviewed-by: Kees Cook <keescook@chromium.org> Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>	2022-02-17 07:00:39 -06:00
Linus Torvalds	c5d9ae265b	Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm Pull kvm fixes from Paolo Bonzini: "ARM: - Read HW interrupt pending state from the HW x86: - Don't truncate the performance event mask on AMD - Fix Xen runstate updates to be atomic when preempting vCPU - Fix for AMD AVIC interrupt injection race - Several other AMD fixes" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: KVM: x86/pmu: Use AMD64_RAW_EVENT_MASK for PERF_TYPE_RAW KVM: x86/pmu: Don't truncate the PerfEvtSeln MSR when creating a perf event KVM: SVM: fix race between interrupt delivery and AVIC inhibition KVM: SVM: set IRR in svm_deliver_interrupt KVM: SVM: extract avic_ring_doorbell selftests: kvm: Remove absent target file KVM: arm64: vgic: Read HW interrupt pending state from the HW KVM: x86/xen: Fix runstate updates to be atomic when preempting vCPU KVM: x86: SVM: move avic definitions from AMD's spec to svm.h KVM: x86: lapic: don't touch irr_pending in kvm_apic_update_apicv when inhibiting it KVM: x86: nSVM: deal with L1 hypervisor that intercepts interrupts but lets L2 control them KVM: x86: nSVM: expose clean bit support to the guest KVM: x86: nSVM/nVMX: set nested_run_pending on VM entry which is a result of RSM KVM: x86: nSVM: mark vmcb01 as dirty when restoring SMM saved state KVM: x86: nSVM: fix potential NULL derefernce on nested migration KVM: x86: SVM: don't passthrough SMAP/SMEP/PKE bits in !NPT && !gCR0.PG case Revert "svm: Add warning message for AVIC IPI invalid target"	2022-02-15 11:07:59 -08:00
Alexander Shishkin	161a9a3370	perf/x86/intel/pt: Add a capability and config bit for disabling TNTs As of Intel SDM (https://www.intel.com/sdm) version 076, there is a new Intel PT feature called TNT-Disable which is enabled config bit 55. TNT-Disable disables Taken-Not-Taken packets to reduce the tracing overhead, but with the result that exact control flow information is lost. Add a capability and config bit for TNT-Disable. Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com> Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Adrian Hunter <adrian.hunter@intel.com> Link: https://lore.kernel.org/r/20220126104815.2807416-3-adrian.hunter@intel.com	2022-02-15 17:47:11 +01:00
Alexander Shishkin	28c24ded64	perf/x86/intel/pt: Add a capability and config bit for event tracing As of Intel SDM (https://www.intel.com/sdm) version 076, there is a new Intel PT feature called Event Trace which is enabled config bit 31. Event Trace exposes details about asynchronous events such as interrupts and VM-Entry/Exit. Add a capability and config bit for Event Trace. Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com> Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Adrian Hunter <adrian.hunter@intel.com> Link: https://lore.kernel.org/r/20220126104815.2807416-2-adrian.hunter@intel.com	2022-02-15 17:47:11 +01:00
Fenghua Yu	7c1ef59145	x86/cpufeatures: Re-enable ENQCMD The ENQCMD feature can only be used if CONFIG_INTEL_IOMMU_SVM is set. Add X86_FEATURE_ENQCMD to the disabled features mask as appropriate so that cpu_feature_enabled() can be used to check the feature. [ bp: Massage commit message. ] Signed-off-by: Fenghua Yu <fenghua.yu@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Tony Luck <tony.luck@intel.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20220207230254.3342514-10-fenghua.yu@intel.com	2022-02-15 11:31:43 +01:00
Linus Torvalds	42964a18f8	Merge tag 'objtool_urgent_for_v5.17_rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull objtool fix from Borislav Petkov: "Fix a case where objtool would mistakenly warn about instructions being unreachable" * tag 'objtool_urgent_for_v5.17_rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/bug: Merge annotate_reachable() into _BUG_FLAGS() asm	2022-02-13 09:43:34 -08:00
Borislav Petkov	b008893b08	x86/ptrace: Always inline v8086_mode() for instrumentation Instrumentation glue like KASAN causes the following warning: vmlinux.o: warning: objtool: mce_gather_info()+0x5f: call to v8086_mode.constprop.0() leaves .noinstr.text section due to gcc creating a function call for that oneliner. Force-inline it and even save some vmlinux bytes (.config is close to an allmodconfig): text data bss dec hex filename 209431677 208257651 34411048 452100376 1af28118 vmlinux.before 209431519 208257615 34411048 452100182 1af28056 vmlinux.after Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Marco Elver <elver@google.com> Link: https://lore.kernel.org/r/20220204083015.17317-3-bp@alien8.de	2022-02-12 22:07:13 +01:00
Borislav Petkov	f5c54f77b0	cpumask: Add a x86-specific cpumask_clear_cpu() helper Add a x86-specific cpumask_clear_cpu() helper which will be used in places where the explicit KASAN-instrumentation in the *_bit() helpers is unwanted. Also, always inline two more cpumask generic helpers. allyesconfig: text data bss dec hex filename 190553143 159425889 32076404 382055436 16c5b40c vmlinux.before 190551812 159424945 32076404 382053161 16c5ab29 vmlinux.after Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Marco Elver <elver@google.com> Link: https://lore.kernel.org/r/20220204083015.17317-2-bp@alien8.de	2022-02-12 18:20:05 +01:00
Linus Torvalds	4a387c98b3	Merge tag 'for-linus-5.17a-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip Pull xen fixes from Juergen Gross: - Two small cleanups - Another fix for addressing the EFI framebuffer above 4GB when running as Xen dom0 - A patch to let Xen guests use reserved bits in MSI- and IO-APIC- registers for extended APIC-IDs the same way KVM guests are doing it already * tag 'for-linus-5.17a-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: xen/pci: Make use of the helper macro LIST_HEAD() xen/x2apic: Fix inconsistent indenting xen/x86: detect support for extended destination ID xen/x86: obtain full video frame buffer address for Dom0 also under EFI	2022-02-12 09:08:57 -08:00
Ard Biesheuvel	297565aa22	lib/xor: make xor prototypes more friendly to compiler vectorization Modern compilers are perfectly capable of extracting parallelism from the XOR routines, provided that the prototypes reflect the nature of the input accurately, in particular, the fact that the input vectors are expected not to overlap. This is not documented explicitly, but is implied by the interchangeability of the various C routines, some of which use temporary variables while others don't: this means that these routines only behave identically for non-overlapping inputs. So let's decorate these input vectors with the __restrict modifier, which informs the compiler that there is no overlap. While at it, make the input-only vectors pointer-to-const as well. Tested-by: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Reviewed-by: Nick Desaulniers <ndesaulniers@google.com> Link: https://github.com/ClangBuiltLinux/linux/issues/563 Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2022-02-11 20:39:39 +11:00
Jakub Kicinski	5b91c5cc0e	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net No conflicts. Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-02-10 17:29:56 -08:00
David Matlack	cb00a70bd4	KVM: x86/mmu: Split huge pages mapped by the TDP MMU during KVM_CLEAR_DIRTY_LOG When using KVM_DIRTY_LOG_INITIALLY_SET, huge pages are not write-protected when dirty logging is enabled on the memslot. Instead they are write-protected once userspace invokes KVM_CLEAR_DIRTY_LOG for the first time and only for the specific sub-region being cleared. Enhance KVM_CLEAR_DIRTY_LOG to also try to split huge pages prior to write-protecting to avoid causing write-protection faults on vCPU threads. This also allows userspace to smear the cost of huge page splitting across multiple ioctls, rather than splitting the entire memslot as is the case when initially-all-set is not used. Signed-off-by: David Matlack <dmatlack@google.com> Message-Id: <20220119230739.2234394-17-dmatlack@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-02-10 13:50:43 -05:00
David Matlack	a3fe5dbda0	KVM: x86/mmu: Split huge pages mapped by the TDP MMU when dirty logging is enabled When dirty logging is enabled without initially-all-set, try to split all huge pages in the memslot down to 4KB pages so that vCPUs do not have to take expensive write-protection faults to split huge pages. Eager page splitting is best-effort only. This commit only adds the support for the TDP MMU, and even there splitting may fail due to out of memory conditions. Failures to split a huge page is fine from a correctness standpoint because KVM will always follow up splitting by write-protecting any remaining huge pages. Eager page splitting moves the cost of splitting huge pages off of the vCPU threads and onto the thread enabling dirty logging on the memslot. This is useful because: 1. Splitting on the vCPU thread interrupts vCPUs execution and is disruptive to customers whereas splitting on VM ioctl threads can run in parallel with vCPU execution. 2. Splitting all huge pages at once is more efficient because it does not require performing VM-exit handling or walking the page table for every 4KiB page in the memslot, and greatly reduces the amount of contention on the mmu_lock. For example, when running dirty_log_perf_test with 96 virtual CPUs, 1GiB per vCPU, and 1GiB HugeTLB memory, the time it takes vCPUs to write to all of their memory after dirty logging is enabled decreased by 95% from 2.94s to 0.14s. Eager Page Splitting is over 100x more efficient than the current implementation of splitting on fault under the read lock. For example, taking the same workload as above, Eager Page Splitting reduced the CPU required to split all huge pages from ~270 CPU-seconds ((2.94s - 0.14s) * 96 vCPU threads) to only 1.55 CPU-seconds. Eager page splitting does increase the amount of time it takes to enable dirty logging since it has split all huge pages. For example, the time it took to enable dirty logging in the 96GiB region of the aforementioned test increased from 0.001s to 1.55s. Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: David Matlack <dmatlack@google.com> Message-Id: <20220119230739.2234394-16-dmatlack@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-02-10 13:50:42 -05:00
Sean Christopherson	03d004cd07	KVM: x86: Use more verbose names for mem encrypt kvm_x86_ops hooks Use slightly more verbose names for the so called "memory encrypt", a.k.a. "mem enc", kvm_x86_ops hooks to bridge the gap between the current super short kvm_x86_ops names and SVM's more verbose, but non-conforming names. This is a step toward using kvm-x86-ops.h with KVM_X86_CVM_OP() to fill svm_x86_ops. Opportunistically rename mem_enc_op() to mem_enc_ioctl() to better reflect its true nature, as it really is a full fledged ioctl() of its own. Ideally, the hook would be named confidential_vm_ioctl() or so, as the ioctl() is a gateway to more than just memory encryption, and because its underlying purpose to support Confidential VMs, which can be provided without memory encryption, e.g. if the TCB of the guest includes the host kernel but not host userspace, or by isolation in hardware without encrypting memory. But, diverging from KVM_MEMORY_ENCRYPT_OP even further is undeseriable, and short of creating alises for all related ioctl()s, which introduces a different flavor of divergence, KVM is stuck with the nomenclature. Defer renaming SVM's functions to a future commit as there are additional changes needed to make SVM fully conforming and to match reality (looking at you, svm_vm_copy_asid_from()). No functional change intended. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20220128005208.4008533-20-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-02-10 13:50:29 -05:00
Sean Christopherson	872e0c5308	KVM: x86: Move get_cs_db_l_bits() helper to SVM Move kvm_get_cs_db_l_bits() to SVM and rename it appropriately so that its svm_x86_ops entry can be filled via kvm-x86-ops, and to eliminate a superfluous export from KVM x86. No functional change intended. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20220128005208.4008533-16-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-02-10 13:47:21 -05:00
Sean Christopherson	7ad02ef0da	KVM: x86: Use static_call() for copy/move encryption context ioctls() Define and use static_call()s for .vm_{copy,move}_enc_context_from(), mostly so that the op is defined in kvm-x86-ops.h. This will allow using KVM_X86_OP in vendor code to wire up the implementation. Any performance gains eeked out by using static_call() is a happy bonus and not the primary motiviation. Opportunistically refactor the code to reduce indentation and keep line lengths reasonable, and to be consistent when wrapping versus running a bit over the 80 char soft limit. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20220128005208.4008533-12-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-02-10 13:47:20 -05:00
Sean Christopherson	a0941a64a9	KVM: x86: Use static_call() for .vcpu_deliver_sipi_vector() Define and use a static_call() for kvm_x86_ops.vcpu_deliver_sipi_vector(), mostly so that the op is defined in kvm-x86-ops.h. This will allow using KVM_X86_OP in vendor code to wire up the implementation. Any performance gains eeked out by using static_call() is a happy bonus and not the primary motiviation. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20220128005208.4008533-6-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-02-10 13:47:18 -05:00
Sean Christopherson	e27bc0440e	KVM: x86: Rename kvm_x86_ops pointers to align w/ preferred vendor names Rename a variety of kvm_x86_op function pointers so that preferred name for vendor implementations follows the pattern <vendor>_<function>, e.g. rename .run() to .vcpu_run() to match {svm,vmx}_vcpu_run(). This will allow vendor implementations to be wired up via the KVM_X86_OP macro. In many cases, VMX and SVM "disagree" on the preferred name, though in reality it's VMX and x86 that disagree as SVM blindly prepended _svm to the kvm_x86_ops name. Justification for using the VMX nomenclature: - set_{irq,nmi} => inject_{irq,nmi} because the helper is injecting an event that has already been "set" in e.g. the vIRR. SVM's relevant VMCB field is even named event_inj, and KVM's stat is irq_injections. - prepare_guest_switch => prepare_switch_to_guest because the former is ambiguous, e.g. it could mean switching between multiple guests, switching from the guest to host, etc... - update_pi_irte => pi_update_irte to allow for matching match the rest of VMX's posted interrupt naming scheme, which is vmx_pi_<blah>(). - start_assignment => pi_start_assignment to again follow VMX's posted interrupt naming scheme, and to provide context for what bit of code might care about an otherwise undescribed "assignment". The "tlb_flush" => "flush_tlb" creates an inconsistency with respect to Hyper-V's "tlb_remote_flush" hooks, but Hyper-V really is the one that's wrong. x86, VMX, and SVM all use flush_tlb, and even common KVM is on a variant of the bandwagon with "kvm_flush_remote_tlbs", e.g. a more appropriate name for the Hyper-V hooks would be flush_remote_tlbs. Leave that change for another time as the Hyper-V hooks always start as NULL, i.e. the name doesn't matter for using kvm-x86-ops.h, and changing all names requires an astounding amount of churn. VMX and SVM function names are intentionally left as is to minimize the diff. Both VMX and SVM will need to rename even more functions in order to fully utilize KVM_X86_OPS, i.e. an additional patch for each is inevitable. No functional change intended. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20220128005208.4008533-5-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-02-10 13:47:17 -05:00
Jinrong Liang	62711e5a74	KVM: x86: Remove unused "vcpu" of kvm_scale_tsc() The "struct kvm_vcpu *vcpu" parameter of kvm_scale_tsc() is not used, so remove it. No functional change intended. Signed-off-by: Jinrong Liang <cloudliang@tencent.com> Message-Id: <20220125095909.38122-18-cloudliang@tencent.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-02-10 13:47:14 -05:00
Roger Pau Monne	e07e98da92	xen/x86: detect support for extended destination ID Xen allows the usage of some previously reserved bits in the IO-APIC RTE and the MSI address fields in order to store high bits for the target APIC ID. Such feature is already implemented by QEMU/KVM and HyperV, so in order to enable it just add the handler that checks for it's presence. Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Juergen Gross <jgross@suse.com> Link: https://lore.kernel.org/r/20220120152527.7524-3-roger.pau@citrix.com Signed-off-by: Juergen Gross <jgross@suse.com>	2022-02-10 11:10:17 +01:00
Jakub Kicinski	1127170d45	Merge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next Daniel Borkmann says: ==================== pull-request: bpf-next 2022-02-09 We've added 126 non-merge commits during the last 16 day(s) which contain a total of 201 files changed, 4049 insertions(+), 2215 deletions(-). The main changes are: 1) Add custom BPF allocator for JITs that pack multiple programs into a huge page to reduce iTLB pressure, from Song Liu. 2) Add __user tagging support in vmlinux BTF and utilize it from BPF verifier when generating loads, from Yonghong Song. 3) Add per-socket fast path check guarding from cgroup/BPF overhead when used by only some sockets, from Pavel Begunkov. 4) Continued libbpf deprecation work of APIs/features and removal of their usage from samples, selftests, libbpf & bpftool, from Andrii Nakryiko and various others. 5) Improve BPF instruction set documentation by adding byte swap instructions and cleaning up load/store section, from Christoph Hellwig. 6) Switch BPF preload infra to light skeleton and remove libbpf dependency from it, from Alexei Starovoitov. 7) Fix architecture-agnostic macros in libbpf for accessing syscall arguments from BPF progs for non-x86 architectures, from Ilya Leoshkevich. 8) Rework port members in struct bpf_sk_lookup and struct bpf_sock to be of 16-bit field with anonymous zero padding, from Jakub Sitnicki. 9) Add new bpf_copy_from_user_task() helper to read memory from a different task than current. Add ability to create sleepable BPF iterator progs, from Kenny Yu. 10) Implement XSK batching for ice's zero-copy driver used by AF_XDP and utilize TX batching API from XSK buffer pool, from Maciej Fijalkowski. 11) Generate temporary netns names for BPF selftests to avoid naming collisions, from Hangbin Liu. 12) Implement bpf_core_types_are_compat() with limited recursion for in-kernel usage, from Matteo Croce. 13) Simplify pahole version detection and finally enable CONFIG_DEBUG_INFO_DWARF5 to be selected with CONFIG_DEBUG_INFO_BTF, from Nathan Chancellor. 14) Misc minor fixes to libbpf and selftests from various folks. * https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (126 commits) selftests/bpf: Cover 4-byte load from remote_port in bpf_sk_lookup bpf: Make remote_port field in struct bpf_sk_lookup 16-bit wide libbpf: Fix compilation warning due to mismatched printf format selftests/bpf: Test BPF_KPROBE_SYSCALL macro libbpf: Add BPF_KPROBE_SYSCALL macro libbpf: Fix accessing the first syscall argument on s390 libbpf: Fix accessing the first syscall argument on arm64 libbpf: Allow overriding PT_REGS_PARM1{_CORE}_SYSCALL selftests/bpf: Skip test_bpf_syscall_macro's syscall_arg1 on arm64 and s390 libbpf: Fix accessing syscall arguments on riscv libbpf: Fix riscv register names libbpf: Fix accessing syscall arguments on powerpc selftests/bpf: Use PT_REGS_SYSCALL_REGS in bpf_syscall_macro libbpf: Add PT_REGS_SYSCALL_REGS macro selftests/bpf: Fix an endianness issue in bpf_syscall_macro test bpf: Fix bpf_prog_pack build HPAGE_PMD_SIZE bpf: Fix leftover header->pages in sparc and powerpc code. libbpf: Fix signedness bug in btf_dump_array_data() selftests/bpf: Do not export subtest as standalone test bpf, x86_64: Fail gracefully on bpf_jit_binary_pack_finalize failures ... ==================== Link: https://lore.kernel.org/r/20220209210050.8425-1-daniel@iogearbox.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-02-09 18:40:56 -08:00
Maxim Levitsky	3915035282	KVM: x86: SVM: move avic definitions from AMD's spec to svm.h asm/svm.h is the correct place for all values that are defined in the SVM spec, and that includes AVIC. Also add some values from the spec that were not defined before and will be soon useful. Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20220207155447.840194-10-mlevitsk@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-02-08 13:30:50 -05:00
Jim Mattson	fa31a4d669	x86/cpufeatures: Put the AMX macros in the word 18 block These macros are for bits in CPUID.(EAX=7,ECX=0):EDX, not for bits in CPUID(EAX=7,ECX=1):EAX. Put them with their brethren. [ bp: Sort word 18 bits properly, as caught by Like Xu <like.xu.linux@gmail.com> ] Signed-off-by: Jim Mattson <jmattson@google.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20220203194308.2469117-1-jmattson@google.com	2022-02-08 10:23:35 +01:00

... 5 6 7 8 9 ...

11340 Commits