Commit Graph

41594 Commits

Author SHA1 Message Date
Sean Christopherson
872e0c5308 KVM: x86: Move get_cs_db_l_bits() helper to SVM
Move kvm_get_cs_db_l_bits() to SVM and rename it appropriately so that
its svm_x86_ops entry can be filled via kvm-x86-ops, and to eliminate a
superfluous export from KVM x86.

No functional change intended.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20220128005208.4008533-16-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-02-10 13:47:21 -05:00
Sean Christopherson
58fccda47e KVM: VMX: Rename VMX functions to conform to kvm_x86_ops names
Massage VMX's implementation names for kvm_x86_ops to maximize use of
kvm-x86-ops.h.  Leave cpu_has_vmx_wbinvd_exit() as-is to preserve the
cpu_has_vmx_*() pattern used for querying VMCS capabilities.  Keep
pi_has_pending_interrupt() as vmx_dy_apicv_has_pending_interrupt() does
a poor job of describing exactly what is being checked in VMX land.

No functional change intended.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20220128005208.4008533-14-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-02-10 13:47:20 -05:00
Sean Christopherson
7ad02ef0da KVM: x86: Use static_call() for copy/move encryption context ioctls()
Define and use static_call()s for .vm_{copy,move}_enc_context_from(),
mostly so that the op is defined in kvm-x86-ops.h.  This will allow using
KVM_X86_OP in vendor code to wire up the implementation.  Any performance
gains eked out by using static_call() are a happy bonus and not the
primary motivation.

Opportunistically refactor the code to reduce indentation and keep line
lengths reasonable, and to be consistent about when to wrap versus run
a bit over the 80 char soft limit.
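
A rough sketch of the mechanism this series builds on (names per
kvm-x86-ops.h; "fd" is shorthand for the real parameters):

  /* arch/x86/include/asm/kvm-x86-ops.h lists each op exactly once: */
  KVM_X86_OP_NULL(vm_copy_enc_context_from)

  /* common x86 includes that header to stamp out one static call per op: */
  DEFINE_STATIC_CALL_NULL(kvm_x86_vm_copy_enc_context_from,
              *(((struct kvm_x86_ops *)0)->vm_copy_enc_context_from));

  /* so that call sites become: */
  r = static_call(kvm_x86_vm_copy_enc_context_from)(kvm, fd);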

Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20220128005208.4008533-12-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-02-10 13:47:20 -05:00
Sean Christopherson
dfc4e6ca04 KVM: x86: Unexport kvm_x86_ops
Drop the export of kvm_x86_ops now that it is no longer referenced by SVM or
VMX.  Disallowing access to kvm_x86_ops is very desirable as it prevents
vendor code from incorrectly modifying hooks after they have been set by
kvm_arch_hardware_setup(), and more importantly after each function's
associated static_call key has been updated.

No functional change intended.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20220128005208.4008533-11-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-02-10 13:47:19 -05:00
Sean Christopherson
3d4421f8f2 KVM: x86: Uninline and export hv_track_root_tdp()
Uninline and export Hyper-V's hv_track_root_tdp(), which is (somewhat
indirectly) the last remaining reference to kvm_x86_ops from vendor
modules, i.e. will allow unexporting kvm_x86_ops.  Reloading the TDP PGD
isn't the fastest of paths, hv_track_root_tdp() isn't exactly tiny, and
disallowing vendor code from accessing kvm_x86_ops provides nice-to-have
encapsulation of common x86 code (and of Hyper-V code for that matter).

No functional change intended.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20220128005208.4008533-10-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-02-10 13:47:19 -05:00
Sean Christopherson
0bcd556e15 KVM: nVMX: Refactor PMU refresh to avoid referencing kvm_x86_ops.pmu_ops
Refactor the nested VMX PMU refresh helper to pass it a flag stating
whether or not the vCPU has PERF_GLOBAL_CTRL instead of having the nVMX
helper query the information by bouncing through kvm_x86_ops.pmu_ops.
This will allow a future patch to use static_call() for the PMU ops
without having to export any static call definitions from common x86, and
it is also a step toward unexporting kvm_x86_ops.

Alternatively, nVMX could call kvm_pmu_is_valid_msr() to indirectly use
kvm_x86_ops.pmu_ops, but that would incur an extra layer of indirection
and would require exporting kvm_pmu_is_valid_msr().

Opportunistically rename the helper to keep line lengths somewhat
reasonable, and to better capture its high-level role.
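
A hedged sketch of the new shape (the renamed helper and its parameter
name are written here as assumptions; see the patch for the exact
signature):

  /* nVMX no longer bounces through kvm_x86_ops.pmu_ops; the common x86
   * caller passes in the one fact nVMX actually needs:
   */
  void nested_vmx_pmu_refresh(struct kvm_vcpu *vcpu,
                              bool vcpu_has_perf_global_ctrl);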

No functional change intended.

Cc: Like Xu <like.xu.linux@gmail.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20220128005208.4008533-9-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-02-10 13:47:19 -05:00
Sean Christopherson
0264a35109 KVM: xen: Use static_call() for invoking kvm_x86_ops hooks
Use static_call() for invoking kvm_x86_ops functions that already have a
defined static call, mostly as a step toward having _all_ calls to
kvm_x86_ops route through a static_call() in order to simplify auditing,
e.g. via grep, that all functions have an entry in kvm-x86-ops.h, but
also because there's no reason not to use a static_call().

Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20220128005208.4008533-8-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-02-10 13:47:18 -05:00
Sean Christopherson
a0941a64a9 KVM: x86: Use static_call() for .vcpu_deliver_sipi_vector()
Define and use a static_call() for kvm_x86_ops.vcpu_deliver_sipi_vector(),
mostly so that the op is defined in kvm-x86-ops.h.  This will allow using
KVM_X86_OP in vendor code to wire up the implementation.  Any performance
gains eked out by using static_call() are a happy bonus and not the
primary motivation.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20220128005208.4008533-6-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-02-10 13:47:18 -05:00
Sean Christopherson
ef2d488c65 KVM: VMX: Call vmx_get_cpl() directly in handle_dr()
Use vmx_get_cpl() instead of bouncing through kvm_x86_ops.get_cpl() when
performing a CPL check on MOV DR accesses.  This avoids a RETPOLINE (when
enabled), and more importantly removes a vendor reference to kvm_x86_ops
and helps pave the way for unexporting kvm_x86_ops.
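
Roughly (the #GP injection is sketched from the surrounding handler):

  /* before: indirect call through the ops table, a RETPOLINE when enabled */
  if (kvm_x86_ops.get_cpl(vcpu) > 0) {
          kvm_inject_gp(vcpu, 0);
          return 1;
  }

  /* after: VMX calls its own helper directly */
  if (vmx_get_cpl(vcpu) > 0) {
          kvm_inject_gp(vcpu, 0);
          return 1;
  }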

Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20220128005208.4008533-7-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-02-10 13:47:17 -05:00
Sean Christopherson
e27bc0440e KVM: x86: Rename kvm_x86_ops pointers to align w/ preferred vendor names
Rename a variety of kvm_x86_op function pointers so that preferred name
for vendor implementations follows the pattern <vendor>_<function>, e.g.
rename .run() to .vcpu_run() to match {svm,vmx}_vcpu_run().  This will
allow vendor implementations to be wired up via the KVM_X86_OP macro.

In many cases, VMX and SVM "disagree" on the preferred name, though in
reality it's VMX and x86 that disagree, as SVM blindly prepended svm_ to
the kvm_x86_ops name.  Justification for using the VMX nomenclature:

  - set_{irq,nmi} => inject_{irq,nmi} because the helper is injecting an
    event that has already been "set" in e.g. the vIRR.  SVM's relevant
    VMCB field is even named event_inj, and KVM's stat is irq_injections.

  - prepare_guest_switch => prepare_switch_to_guest because the former is
    ambiguous, e.g. it could mean switching between multiple guests,
    switching from the guest to host, etc...

  - update_pi_irte => pi_update_irte to match the rest of VMX's posted
    interrupt naming scheme, which is vmx_pi_<blah>().

  - start_assignment => pi_start_assignment to again follow VMX's posted
    interrupt naming scheme, and to provide context for what bit of code
    might care about an otherwise undescribed "assignment".

The "tlb_flush" => "flush_tlb" creates an inconsistency with respect to
Hyper-V's "tlb_remote_flush" hooks, but Hyper-V really is the one that's
wrong.  x86, VMX, and SVM all use flush_tlb, and even common KVM is on a
variant of the bandwagon with "kvm_flush_remote_tlbs", e.g. a more
appropriate name for the Hyper-V hooks would be flush_remote_tlbs.  Leave
that change for another time as the Hyper-V hooks always start as NULL,
i.e. the name doesn't matter for using kvm-x86-ops.h, and changing all
names requires an astounding amount of churn.

VMX and SVM function names are intentionally left as is to minimize the
diff.  Both VMX and SVM will need to rename even more functions in order
to fully utilize KVM_X86_OPS, i.e. an additional patch for each is
inevitable.
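
In diff form, a representative struct change looks like (sketch):

  -	fastpath_t (*run)(struct kvm_vcpu *vcpu);
  +	fastpath_t (*vcpu_run)(struct kvm_vcpu *vcpu);

so that KVM_X86_OP(vcpu_run) can expand to svm_vcpu_run()/vmx_vcpu_run()
without special casing.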

No functional change intended.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20220128005208.4008533-5-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-02-10 13:47:17 -05:00
Sean Christopherson
feee3d9d5b KVM: x86: Drop export for .tlb_flush_current() static_call key
Remove the export of kvm_x86_tlb_flush_current() as there are no longer
any users outside of common x86 code.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20220128005208.4008533-4-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-02-10 13:47:16 -05:00
Paolo Bonzini
2746a6b72a KVM: x86: skip host CPUID call for hypervisor leaves
Hypervisor leaves are always synthesized by __do_cpuid_func; just return
zeroes and do not ask the host.  Even on nested virtualization, a value
from another hypervisor would be bogus, because all hypercalls and MSRs
are processed by KVM.
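
A sketch of the resulting check (the exact location and form of the
range test may differ):

  /* hypervisor leaves 0x40000000-0x4fffffff never come from hardware */
  if (function >= 0x40000000 && function <= 0x4fffffff) {
          entry->eax = entry->ebx = entry->ecx = entry->edx = 0;
          return;
  }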

Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-02-10 13:47:16 -05:00
Jinrong Liang
9d68c6f60e KVM: x86: Remove unused "flags" of kvm_pv_kick_cpu_op()
The "unsigned long flags" parameter of  kvm_pv_kick_cpu_op() is not used,
so remove it. No functional change intended.
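
The change is mechanical; roughly:

  -static void kvm_pv_kick_cpu_op(struct kvm *kvm, unsigned long flags, int apicid)
  +static void kvm_pv_kick_cpu_op(struct kvm *kvm, int apicid)

The other parameter-removal patches in this series follow the same shape.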

Signed-off-by: Jinrong Liang <cloudliang@tencent.com>
Message-Id: <20220125095909.38122-20-cloudliang@tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-02-10 13:47:15 -05:00
Jinrong Liang
62711e5a74 KVM: x86: Remove unused "vcpu" of kvm_scale_tsc()
The "struct kvm_vcpu *vcpu" parameter of kvm_scale_tsc() is not used,
so remove it. No functional change intended.

Signed-off-by: Jinrong Liang <cloudliang@tencent.com>
Message-Id: <20220125095909.38122-18-cloudliang@tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-02-10 13:47:14 -05:00
Jinrong Liang
7127fd3677 KVM: x86/emulate: Remove unused "tss_selector" of task_switch_{16, 32}()
The "u16 tss_selector" parameter of task_switch_{16, 32}()
is not used, so remove it. No functional change intended.

Signed-off-by: Jinrong Liang <cloudliang@tencent.com>
Message-Id: <20220125095909.38122-16-cloudliang@tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-02-10 13:47:14 -05:00
Jinrong Liang
09d9423d0e KVM: x86/emulate: Remove unused "ctxt" of setup_syscalls_segments()
The "struct x86_emulate_ctxt *ctxt" parameter of setup_syscalls_segments()
is not used, so remove it. No functional change intended.

Signed-off-by: Jinrong Liang <cloudliang@tencent.com>
Message-Id: <20220125095909.38122-15-cloudliang@tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-02-10 13:47:13 -05:00
Jinrong Liang
019024e563 KVM: x86/ioapic: Remove unused "addr" and "length" of ioapic_read_indirect()
The "unsigned long addr" and "unsigned long length" parameter of
ioapic_read_indirect() is not used, so remove it.

No functional change intended.

Signed-off-by: Jinrong Liang <cloudliang@tencent.com>
Message-Id: <20220125095909.38122-14-cloudliang@tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-02-10 13:47:13 -05:00
Jinrong Liang
1f2e66f037 KVM: x86/i8259: Remove unused "addr" of elcr_ioport_{read,write}()
The "u32 addr" parameter of elcr_ioport_write() and elcr_ioport_read()
is not used, so remove it. No functional change intended.

Signed-off-by: Jinrong Liang <cloudliang@tencent.com>
Message-Id: <20220125095909.38122-13-cloudliang@tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-02-10 13:47:12 -05:00
Paolo Bonzini
068f7ea618 KVM: SVM: improve split between svm_prepare_guest_switch and sev_es_prepare_guest_switch
KVM performs the VMSAVE to the host save area for both regular and SEV-ES
guests, so hoist it up to svm_prepare_guest_switch.  And because
sev_es_prepare_guest_switch does not really need to know the details
of struct svm_cpu_data *, just pass it the pointer to the host save area
inside the HSAVE page.
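
A hedged sketch of the resulting split (hostsa and the save-area
plumbing are abbreviated/assumed):

  /* in svm_prepare_guest_switch(): VMSAVE is now common to all guests */
  vmsave(__sme_page_pa(sd->save_area));

  if (sev_es_guest(vcpu->kvm))
          /* receives only the host save area pointer, not svm_cpu_data */
          sev_es_prepare_guest_switch(hostsa);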

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-02-10 13:47:12 -05:00
Jinrong Liang
98242dcafe KVM: x86/svm: Remove unused "vcpu" of svm_check_exit_valid()
The "struct kvm_vcpu *vcpu" parameter of svm_check_exit_valid()
is not used, so remove it. No functional change intended.

Signed-off-by: Jinrong Liang <cloudliang@tencent.com>
Message-Id: <20220125095909.38122-7-cloudliang@tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-02-10 13:47:11 -05:00
Jinrong Liang
0758d6a7c3 KVM: x86/mmu_audit: Remove unused "level" of audit_spte_after_sync()
The "int level" parameter of audit_spte_after_sync() is not used,
so remove it. No functional change intended.

Signed-off-by: Jinrong Liang <cloudliang@tencent.com>
Message-Id: <20220125095909.38122-6-cloudliang@tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-02-10 13:47:11 -05:00
Jinrong Liang
ad6d6b949e KVM: x86/tdp_mmu: Remove unused "kvm" of kvm_tdp_mmu_get_root()
The "struct kvm *kvm" parameter of kvm_tdp_mmu_get_root() is not used,
so remove it. No functional change intended.

Signed-off-by: Jinrong Liang <cloudliang@tencent.com>
Message-Id: <20220125095909.38122-5-cloudliang@tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-02-10 13:47:10 -05:00
Jinrong Liang
e8f6e7383c KVM: x86/mmu: Remove unused "vcpu" of reset_{tdp,ept}_shadow_zero_bits_mask()
The "struct kvm_vcpu *vcpu" parameter of reset_ept_shadow_zero_bits_mask()
and reset_tdp_shadow_zero_bits_mask() is not used, so remove it.

No functional change intended.

Signed-off-by: Jinrong Liang <cloudliang@tencent.com>
Message-Id: <20220125095909.38122-4-cloudliang@tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-02-10 13:47:10 -05:00
Jinrong Liang
a0e72cd1e9 KVM: x86/mmu: Remove unused "kvm" of __rmap_write_protect()
The "struct kvm *kvm" parameter of __rmap_write_protect()
is not used, so remove it. No functional change intended.

Signed-off-by: Jinrong Liang <cloudliang@tencent.com>
Message-Id: <20220125095909.38122-3-cloudliang@tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-02-10 13:47:10 -05:00
Jinrong Liang
61827671ca KVM: x86/mmu: Remove unused "kvm" of kvm_mmu_unlink_parents()
The "struct kvm *kvm" parameter of kvm_mmu_unlink_parents()
is not used, so remove it. No functional change intended.

Signed-off-by: Jinrong Liang <cloudliang@tencent.com>
Message-Id: <20220125095909.38122-2-cloudliang@tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-02-10 13:47:09 -05:00
Sean Christopherson
f15756428d KVM: x86: Skip APICv update if APICv is disabled at the module level
Bail from the APICv update paths _before_ taking apicv_update_lock if
APICv is disabled at the module level.  kvm_request_apicv_update() in
particular is invoked from multiple paths that can be reached without
APICv being enabled, e.g. svm_enable_irq_window(), and taking the
rw_sem for write when APICv is disabled may introduce unnecessary
contention and stalls.
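
A sketch of the resulting fast path:

  void kvm_request_apicv_update(struct kvm *kvm, bool activate, ulong bit)
  {
          if (!enable_apicv)
                  return;         /* bail before taking apicv_update_lock */

          down_write(&kvm->arch.apicv_update_lock);
          __kvm_request_apicv_update(kvm, activate, bit);
          up_write(&kvm->arch.apicv_update_lock);
  }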

Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20211208015236.1616697-25-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-02-10 13:47:09 -05:00
Sean Christopherson
7446cfebe8 KVM: x86: Drop NULL check on kvm_x86_ops.check_apicv_inhibit_reasons
Drop the useless NULL check on kvm_x86_ops.check_apicv_inhibit_reasons
when handling an APICv update; both VMX and SVM unconditionally implement
the helper and leave it non-NULL even if APICv is disabled at the module
level.  The latter is a moot point now that __kvm_request_apicv_update()
is called if and only if enable_apicv is true.

No functional change intended.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20211208015236.1616697-26-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-02-10 13:47:08 -05:00
Sean Christopherson
cf9e255532 KVM: x86: Unexport __kvm_request_apicv_update()
Unexport __kvm_request_apicv_update(); it's not used by vendor code and
should never be used by vendor code.  The only reason it's exposed at all
is because Hyper-V's SynIC needs to track how many auto-EOIs are in use,
and it's convenient to use apicv_update_lock to guard that tracking.

No functional change intended.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20211208015236.1616697-27-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-02-10 13:47:08 -05:00
Sean Christopherson
d62007edf0 KVM: x86/mmu: Zap _all_ roots when unmapping gfn range in TDP MMU
Zap both valid and invalid roots when zapping/unmapping a gfn range, as
KVM must ensure it holds no references to the freed page after returning
from the unmap operation.  Most notably, the TDP MMU doesn't zap invalid
roots in mmu_notifier callbacks.  This leads to use-after-free and other
issues if the mmu_notifier runs to completion while an invalid root
zapper yields, as KVM fails to honor the requirement that there must be
_no_ references to the page after the mmu_notifier returns.

The bug is most easily reproduced by hacking KVM to cause a collision
between set_nx_huge_pages() and kvm_mmu_notifier_release(), but the bug
exists between kvm_mmu_notifier_invalidate_range_start() and memslot
updates as well.  Invalidating a root ensures pages aren't accessible by
the guest, and KVM won't read or write page data itself, but KVM will
trigger e.g. kvm_set_pfn_dirty() when zapping SPTEs, and thus completing
a zap of an invalid root _after_ the mmu_notifier returns is fatal.

  WARNING: CPU: 24 PID: 1496 at arch/x86/kvm/../../../virt/kvm/kvm_main.c:173 [kvm]
  RIP: 0010:kvm_is_zone_device_pfn+0x96/0xa0 [kvm]
  Call Trace:
   <TASK>
   kvm_set_pfn_dirty+0xa8/0xe0 [kvm]
   __handle_changed_spte+0x2ab/0x5e0 [kvm]
   __handle_changed_spte+0x2ab/0x5e0 [kvm]
   __handle_changed_spte+0x2ab/0x5e0 [kvm]
   zap_gfn_range+0x1f3/0x310 [kvm]
   kvm_tdp_mmu_zap_invalidated_roots+0x50/0x90 [kvm]
   kvm_mmu_zap_all_fast+0x177/0x1a0 [kvm]
   set_nx_huge_pages+0xb4/0x190 [kvm]
   param_attr_store+0x70/0x100
   module_attr_store+0x19/0x30
   kernfs_fop_write_iter+0x119/0x1b0
   new_sync_write+0x11c/0x1b0
   vfs_write+0x1cc/0x270
   ksys_write+0x5f/0xe0
   do_syscall_64+0x38/0xc0
   entry_SYSCALL_64_after_hwframe+0x44/0xae
   </TASK>

Fixes: b7cccd397f ("KVM: x86/mmu: Fast invalidation for TDP MMU")
Cc: stable@vger.kernel.org
Cc: Ben Gardon <bgardon@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20211215011557.399940-4-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-02-10 13:47:07 -05:00
Sean Christopherson
04dc4e6ce2 KVM: x86/mmu: Move "invalid" check out of kvm_tdp_mmu_get_root()
Move the check for an invalid root out of kvm_tdp_mmu_get_root() and into
the one place it actually matters, tdp_mmu_next_root(), as the other user
already has an implicit validity check.  A future bug fix will need to
get references to invalid roots to honor mmu_notifier requests; there's
no point in forcing what will be a common path to open code getting a
reference to a root.
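
After the move, the split looks roughly like (sketch):

  /* the getter becomes a pure refcount operation: */
  return refcount_inc_not_zero(&root->tdp_mmu_root_count);

  /* while tdp_mmu_next_root() is the one place filtering on validity: */
  if (root->role.invalid || !kvm_tdp_mmu_get_root(kvm, root))
          continue;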

No functional change intended.

Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20211215011557.399940-3-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-02-10 13:47:07 -05:00
Sean Christopherson
83b83a0207 KVM: x86/mmu: Use common TDP MMU zap helper for MMU notifier unmap hook
Use the common TDP MMU zap helper when handling an MMU notifier unmap
event; the two flows are semantically identical.  Consolidate the code in
preparation for a future bug fix, as both kvm_tdp_mmu_unmap_gfn_range()
and __kvm_tdp_mmu_zap_gfn_range() are guilty of not zapping SPTEs in
invalid roots.

No functional change intended.

Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20211215011557.399940-2-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-02-10 13:47:06 -05:00
David Woodhouse
fcb732d8f8 KVM: x86/xen: Fix runstate updates to be atomic when preempting vCPU
There are circumstances when kvm_xen_update_runstate_guest() should not
sleep because it ends up being called from __schedule() when the vCPU
is preempted:

[  222.830825]  kvm_xen_update_runstate_guest+0x24/0x100
[  222.830878]  kvm_arch_vcpu_put+0x14c/0x200
[  222.830920]  kvm_sched_out+0x30/0x40
[  222.830960]  __schedule+0x55c/0x9f0

To handle this, make it use the same trick as __kvm_xen_has_interrupt(),
of using the hva from the gfn_to_hva_cache directly. Then it can use
pagefault_disable() around the accesses and just bail out if the page
is absent (which is unlikely).
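
In sketch form (user_state/state are placeholder names; the user pointer
is derived from the cache's hva, as in __kvm_xen_has_interrupt()):

  pagefault_disable();
  ret = __put_user(state, user_state);   /* must not sleep here */
  pagefault_enable();
  if (ret)
          return;   /* page is absent; skip the update rather than sleep */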

I almost switched to using a gfn_to_pfn_cache here and bailing out if
kvm_map_gfn() fails, like kvm_steal_time_set_preempted() does — but on
closer inspection it looks like kvm_map_gfn() will *always* fail in
atomic context for a page in IOMEM, which means it will silently fail
to make the update every single time for such guests, AFAICT. So I
didn't do it that way after all. And will probably fix that one too.

Cc: stable@vger.kernel.org
Fixes: 30b5c851af ("KVM: x86/xen: Add support for vCPU runstate information")
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Message-Id: <b17a93e5ff4561e57b1238e3e7ccd0b613eb827e.camel@infradead.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-02-10 13:39:06 -05:00
Jiapeng Chong
afea27dc31 xen/x2apic: Fix inconsistent indenting
Eliminate the following smatch warning:

arch/x86/xen/enlighten_hvm.c:189 xen_cpu_dead_hvm() warn: inconsistent
indenting.

Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Link: https://lore.kernel.org/r/20220207103506.102008-1-jiapeng.chong@linux.alibaba.com
Signed-off-by: Juergen Gross <jgross@suse.com>
2022-02-10 11:10:20 +01:00
Roger Pau Monne
e07e98da92 xen/x86: detect support for extended destination ID
Xen allows the usage of some previously reserved bits in the IO-APIC
RTE and the MSI address fields in order to store high bits for the
target APIC ID. Such a feature is already implemented by QEMU/KVM and
Hyper-V, so in order to enable it just add the handler that checks for
its presence.
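
A sketch of what such a handler can look like (the flag and hook names
here are assumptions, not necessarily the ones in the patch):

  static bool __init xen_msi_ext_dest_id(void)
  {
          return cpuid_eax(xen_cpuid_base() + 4) & XEN_HVM_CPUID_EXT_DEST_ID;
  }

  /* wired up via x86_init.hyper.msi_ext_dest_id = xen_msi_ext_dest_id; */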

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Link: https://lore.kernel.org/r/20220120152527.7524-3-roger.pau@citrix.com
Signed-off-by: Juergen Gross <jgross@suse.com>
2022-02-10 11:10:17 +01:00
Jan Beulich
f34c4f2dd2 xen/x86: obtain full video frame buffer address for Dom0 also under EFI
The initial change would not work when Xen was booted from EFI: There is
an early exit from the case block in that case. Move the necessary code
ahead of that.

Fixes: 335e4dd67b ("xen/x86: obtain upper 32 bits of video frame buffer address for Dom0")
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Juergen Gross <jgross@suse.com>

Link: https://lore.kernel.org/r/2501ce9d-40e5-b49d-b0e5-435544d17d4a@suse.com
Signed-off-by: Juergen Gross <jgross@suse.com>
2022-02-10 11:07:23 +01:00
Jakub Kicinski
1127170d45 Merge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Daniel Borkmann says:

====================
pull-request: bpf-next 2022-02-09

We've added 126 non-merge commits during the last 16 day(s) which contain
a total of 201 files changed, 4049 insertions(+), 2215 deletions(-).

The main changes are:

1) Add custom BPF allocator for JITs that pack multiple programs into a huge
   page to reduce iTLB pressure, from Song Liu.

2) Add __user tagging support in vmlinux BTF and utilize it from BPF
   verifier when generating loads, from Yonghong Song.

3) Add per-socket fast path check guarding from cgroup/BPF overhead when
   used by only some sockets, from Pavel Begunkov.

4) Continued libbpf deprecation work of APIs/features and removal of their
   usage from samples, selftests, libbpf & bpftool, from Andrii Nakryiko
   and various others.

5) Improve BPF instruction set documentation by adding byte swap
   instructions and cleaning up load/store section, from Christoph Hellwig.

6) Switch BPF preload infra to light skeleton and remove libbpf dependency
   from it, from Alexei Starovoitov.

7) Fix architecture-agnostic macros in libbpf for accessing syscall
   arguments from BPF progs for non-x86 architectures,
   from Ilya Leoshkevich.

8) Rework port members in struct bpf_sk_lookup and struct bpf_sock to be
   16-bit fields with anonymous zero padding, from Jakub Sitnicki.

9) Add new bpf_copy_from_user_task() helper to read memory from a different
   task than current. Add ability to create sleepable BPF iterator progs,
   from Kenny Yu.

10) Implement XSK batching for ice's zero-copy driver used by AF_XDP and
    utilize TX batching API from XSK buffer pool, from Maciej Fijalkowski.

11) Generate temporary netns names for BPF selftests to avoid naming
    collisions, from Hangbin Liu.

12) Implement bpf_core_types_are_compat() with limited recursion for
    in-kernel usage, from Matteo Croce.

13) Simplify pahole version detection and finally enable CONFIG_DEBUG_INFO_DWARF5
    to be selected with CONFIG_DEBUG_INFO_BTF, from Nathan Chancellor.

14) Misc minor fixes to libbpf and selftests from various folks.

* https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (126 commits)
  selftests/bpf: Cover 4-byte load from remote_port in bpf_sk_lookup
  bpf: Make remote_port field in struct bpf_sk_lookup 16-bit wide
  libbpf: Fix compilation warning due to mismatched printf format
  selftests/bpf: Test BPF_KPROBE_SYSCALL macro
  libbpf: Add BPF_KPROBE_SYSCALL macro
  libbpf: Fix accessing the first syscall argument on s390
  libbpf: Fix accessing the first syscall argument on arm64
  libbpf: Allow overriding PT_REGS_PARM1{_CORE}_SYSCALL
  selftests/bpf: Skip test_bpf_syscall_macro's syscall_arg1 on arm64 and s390
  libbpf: Fix accessing syscall arguments on riscv
  libbpf: Fix riscv register names
  libbpf: Fix accessing syscall arguments on powerpc
  selftests/bpf: Use PT_REGS_SYSCALL_REGS in bpf_syscall_macro
  libbpf: Add PT_REGS_SYSCALL_REGS macro
  selftests/bpf: Fix an endianness issue in bpf_syscall_macro test
  bpf: Fix bpf_prog_pack build HPAGE_PMD_SIZE
  bpf: Fix leftover header->pages in sparc and powerpc code.
  libbpf: Fix signedness bug in btf_dump_array_data()
  selftests/bpf: Do not export subtest as standalone test
  bpf, x86_64: Fail gracefully on bpf_jit_binary_pack_finalize failures
  ...
====================

Link: https://lore.kernel.org/r/20220209210050.8425-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-02-09 18:40:56 -08:00
Hans de Goede
3eb616b264 x86/PCI: revert "Ignore E820 reservations for bridge windows on newer systems"
Commit 7f7b4236f2 ("x86/PCI: Ignore E820 reservations for bridge windows
on newer systems") fixes the touchpad not working on laptops like
the Lenovo IdeaPad 3 15IIL05 and the Lenovo IdeaPad 5 14IIL05, as well as
fixing Thunderbolt hotplug issues on the Lenovo Yoga C940.

Unfortunately it turns out that this is causing issues with suspend/resume
on Lenovo ThinkPad X1 Carbon Gen 2 laptops. So, per the no regressions
policy, revert this. Note I'm looking into another fix for the issues this
fixed.

Fixes: 7f7b4236f2 ("x86/PCI: Ignore E820 reservations for bridge windows on newer systems")
BugLink: https://bugzilla.redhat.com/show_bug.cgi?id=2029207
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-02-09 19:42:58 +01:00
Maxim Levitsky
3915035282 KVM: x86: SVM: move avic definitions from AMD's spec to svm.h
asm/svm.h is the correct place for all values that are defined in
the SVM spec, and that includes AVIC.

Also add some values from the spec that were not defined before
and will soon be useful.

Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20220207155447.840194-10-mlevitsk@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-02-08 13:30:50 -05:00
Maxim Levitsky
755c2bf878 KVM: x86: lapic: don't touch irr_pending in kvm_apic_update_apicv when inhibiting it
kvm_apic_update_apicv is called while AVIC is still active, thus IRR bits
can be set by the CPU after it is called without causing irr_pending to
be set to true.

Also, the logic in avic_kick_target_vcpu doesn't expect a race with this
function, so to keep it simple, just keep irr_pending set to true and
let the next interrupt injection into the guest clear it.

Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20220207155447.840194-9-mlevitsk@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-02-08 13:30:49 -05:00
Maxim Levitsky
2b0ecccb55 KVM: x86: nSVM: deal with L1 hypervisor that intercepts interrupts but lets L2 control them
Fix a corner case in which the L1 hypervisor intercepts
interrupts (INTERCEPT_INTR) and either doesn't set
virtual interrupt masking (V_INTR_MASKING) or enters a
nested guest with EFLAGS.IF disabled prior to the entry.

In this case, despite the fact that L1 intercepts the interrupts,
KVM still needs to set up an interrupt window to wait before
injecting the INTR vmexit.

Currently, KVM instead enters an endless loop of 'req_immediate_exit'.

Exactly the same issue also happens for SMIs and NMIs.
Fix this as well.

Note that on VMX, this case is impossible, as there is only the
'vmexit on external interrupts' execution control, which is either set,
in which case both the host's and the guest's EFLAGS.IF are ignored, or
not set, in which case no VM exits are delivered.

Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20220207155447.840194-8-mlevitsk@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-02-08 13:30:49 -05:00
Maxim Levitsky
91f673b3e1 KVM: x86: nSVM: expose clean bit support to the guest
KVM already honours a few clean bits, thus it makes sense
to let the nested guest know about it.

Note that KVM also doesn't check if the hardware supports
clean bits, and therefore nested KVM was
already setting clean bits and L0 KVM
was already honouring them.

Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20220207155447.840194-6-mlevitsk@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-02-08 13:30:49 -05:00
Maxim Levitsky
759cbd5967 KVM: x86: nSVM/nVMX: set nested_run_pending on VM entry which is a result of RSM
While RSM-induced VM entries are not full VM entries, they still need
to be followed by an actual VM entry to complete, unlike setting the
nested state.

This patch fixes the boot of Hyper-V and SMM-enabled Windows VMs
running nested on KVM, which failed due to this issue combined with the
lack of dirty bit setting.

Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Cc: stable@vger.kernel.org
Message-Id: <20220207155447.840194-5-mlevitsk@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-02-08 13:30:48 -05:00
Maxim Levitsky
e8efa4ff00 KVM: x86: nSVM: mark vmcb01 as dirty when restoring SMM saved state
While restoring the SMM state usually makes KVM enter the nested guest,
and thus a different VMCB (vmcb02 vs. vmcb01), KVM should still mark
the VMCB as dirty, since the hardware can in theory cache multiple
VMCBs.

Failure to do so, combined with the lack of setting nested_run_pending
(which is fixed in the next patch), might make KVM re-enter vmcb01,
which was just exited from, with a completely different set of guest
state registers (SMM vs. non-SMM) and without the proper dirty bits
set, which results in the CPU reusing a stale IDTR pointer and leads
to a guest shutdown on any interrupt.

On real hardware this usually doesn't happen, but when running nested,
L0's KVM does check and honour a few dirty bits, causing this issue to
happen.

This patch fixes the boot of Hyper-V and SMM-enabled Windows VMs
running nested on KVM.
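
The fix itself boils down to one call in the SMM-restore path (the
exact call site is assumed):

  /* force a full VMCB reload; the hardware may have vmcb01 cached */
  vmcb_mark_all_dirty(svm->vmcb01.ptr);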

Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Cc: stable@vger.kernel.org
Message-Id: <20220207155447.840194-4-mlevitsk@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-02-08 13:30:47 -05:00
Maxim Levitsky
e1779c2714 KVM: x86: nSVM: fix potential NULL dereference on nested migration
It turns out that due to review feedback and/or rebases I accidentally
moved the call to nested_svm_load_cr3 too early, before the NPT is
enabled, which is very wrong to do.

KVM can't even access guest memory at that point, as nested NPT is
needed for that, and of course it won't initialize the walk_mmu, which
is the main issue the patch was addressing.

Fix this for real.

Fixes: 232f75d3b4 ("KVM: nSVM: call nested_svm_load_cr3 on nested state load")
Cc: stable@vger.kernel.org

Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20220207155447.840194-3-mlevitsk@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-02-08 13:30:47 -05:00
Maxim Levitsky
c53bbe2145 KVM: x86: SVM: don't passthrough SMAP/SMEP/PKE bits in !NPT && !gCR0.PG case
When the guest doesn't enable paging and NPT/EPT is disabled, we use
the guest's paging CR3s as KVM's shadow paging pointer and are
technically in direct mode, as if we were using NPT/EPT.

In direct mode we create SPTEs with user mode permissions because
usually in direct mode the NPT/EPT doesn't need to restrict access
based on guest CPL (there are MBE/GMET extensions for that, but KVM
doesn't use them).

In this special "use guest paging as direct" mode, however, if
CR4.SMAP/CR4.SMEP are enabled, the CPU will fault on each access and
KVM will enter an endless loop of page faults.

Since page protection doesn't have any meaning in the !PG case, just
don't passthrough these bits.

The fix is the same as was done for VMX in commit 656ec4a492
("KVM: VMX: fix SMEP and SMAP without EPT").

This fixes the boot of Windows 10 without NPT for good.  (Without this
patch, the BSP boots, but the APs were stuck in an endless loop of page
faults, causing the VM to boot with only 1 CPU.)
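
A sketch of the fix's shape, following the VMX precedent (the exact
code may differ):

  unsigned long hcr4 = cr4;

  if (!npt_enabled && !is_paging(vcpu))
          /* permission bits are meaningless without guest paging, and
           * passing them through makes every SPTE access fault:
           */
          hcr4 &= ~(X86_CR4_SMEP | X86_CR4_SMAP | X86_CR4_PKE);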

Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Cc: stable@vger.kernel.org
Message-Id: <20220207155447.840194-2-mlevitsk@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-02-08 13:30:47 -05:00
Sean Christopherson
dd4589eee9 Revert "svm: Add warning message for AVIC IPI invalid target"
Remove a WARN on an "AVIC IPI invalid target" exit, the WARN is trivial
to trigger from the guest, as it will fail on any destination APIC ID that
doesn't exist from the guest's perspective.

Don't bother recording anything in the kernel log, the common tracepoint
for kvm_avic_incomplete_ipi() is sufficient for debugging.

This reverts commit 37ef0c4414.

Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20220204214205.3306634-2-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-02-08 13:30:46 -05:00
Song Liu
f95f768f0a bpf, x86_64: Fail gracefully on bpf_jit_binary_pack_finalize failures
Instead of BUG_ON(), fail gracefully and return orig_prog.
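
Roughly (the label follows the existing x86 JIT cleanup path):

  if (bpf_jit_binary_pack_finalize(prog, header, rw_header)) {
          /* don't BUG_ON(); hand back the unmodified program instead */
          prog = orig_prog;
          goto out_addrs;
  }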

Fixes: 1022a5498f ("bpf, x86_64: Use bpf_jit_binary_pack_alloc")
Signed-off-by: Song Liu <song@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20220208062533.3802081-1-song@kernel.org
2022-02-08 09:23:18 -08:00
Jim Mattson
fa31a4d669 x86/cpufeatures: Put the AMX macros in the word 18 block
These macros are for bits in CPUID.(EAX=7,ECX=0):EDX, not for bits in
CPUID(EAX=7,ECX=1):EAX. Put them with their brethren.

  [ bp: Sort word 18 bits properly, as caught by Like Xu
    <like.xu.linux@gmail.com> ]

Signed-off-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20220203194308.2469117-1-jmattson@google.com
2022-02-08 10:23:35 +01:00
Song Liu
1022a5498f bpf, x86_64: Use bpf_jit_binary_pack_alloc
Use bpf_jit_binary_pack_alloc in x86_64 jit. The jit engine first writes
the program to the rw buffer. When the jit is done, the program is copied
to the final location with bpf_jit_binary_pack_finalize.

Note that we need to do bpf_tail_call_direct_fixup after finalize.
Therefore, the text_live = false logic in __bpf_arch_text_poke is no
longer needed.
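
A sketch of the two-buffer flow (argument lists abbreviated):

  header = bpf_jit_binary_pack_alloc(size, &image, align,
                                     &rw_header, &rw_image, jit_fill_hole);

  /* ... emit instructions into rw_image ... */

  /* copies the rw buffer into the packed read-only image */
  bpf_jit_binary_pack_finalize(prog, header, rw_header);

  /* tail call fixup must run after the final read-only location exists */
  bpf_tail_call_direct_fixup(prog);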

Signed-off-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20220204185742.271030-10-song@kernel.org
2022-02-07 18:13:01 -08:00
Song Liu
ebc1415d9b bpf: Introduce bpf_arch_text_copy
This will be used to copy JITed text to RO protected module memory. On
x86, bpf_arch_text_copy is implemented with text_poke_copy.

bpf_arch_text_copy returns pointer to dst on success, and ERR_PTR(errno)
on errors.
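
The x86 implementation is essentially a thin wrapper (sketch):

  void *bpf_arch_text_copy(void *dst, void *src, size_t len)
  {
          if (!text_poke_copy(dst, src, len))
                  return ERR_PTR(-EINVAL);

          return dst;
  }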

Signed-off-by: Song Liu <song@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20220204185742.271030-7-song@kernel.org
2022-02-07 18:13:01 -08:00