linux

Author	SHA1	Message	Date
Sean Christopherson	a3322d5cd8	KVM: nSVM: Set the shadow root level to the TDP level for nested NPT Override the shadow root level in the MMU context when configuring NPT for shadowing nested NPT. The level is always tied to the TDP level of the host, not whatever level the guest happens to be using. Fixes: `096586fda5` ("KVM: nSVM: Correctly set the shadow NPT root level in its MMU role") Cc: stable@vger.kernel.org Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210305011101.3597423-2-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-03-15 04:42:27 -04:00
Sean Christopherson	73ad160693	KVM: x86/mmu: WARN on NULL pae_root or lm_root, or bad shadow root level WARN if KVM is about to dereference a NULL pae_root or lm_root when loading an MMU, and convert the BUG() on a bad shadow_root_level into a WARN (now that errors are handled cleanly). With nested NPT, botching the level and sending KVM down the wrong path is all too easy, and the on-demand allocation of pae_root and lm_root means bugs crash the host. Obviously, KVM could unconditionally allocate the roots, but that's arguably a worse failure mode as it would potentially corrupt the guest instead of crashing it. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210305011101.3597423-18-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-03-15 04:42:25 -04:00
Sean Christopherson	a91f387b4b	KVM: x86/mmu: Sync roots after MMU load iff load as successful For clarity, explicitly skip syncing roots if the MMU load failed instead of relying on the !VALID_PAGE check in kvm_mmu_sync_roots(). Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210305011101.3597423-17-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-03-15 04:42:24 -04:00
Sean Christopherson	61a1773e2e	KVM: x86/mmu: Unexport MMU load/unload functions Unexport the MMU load and unload helpers now that they are no longer used (incorrectly) in vendor code. Opportunistically move the kvm_mmu_sync_roots() declaration into mmu.h, it should not be exposed to vendor code. No functional change intended. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210305011101.3597423-16-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-03-15 04:42:23 -04:00
Sean Christopherson	17e368d94a	KVM: x86/mmu: Set the C-bit in the PDPTRs and LM pseudo-PDPTRs Set the C-bit in SPTEs that are set outside of the normal MMU flows, specifically the PDPDTRs and the handful of special cased "LM root" entries, all of which are shadow paging only. Note, the direct-mapped-root PDPTR handling is needed for the scenario where paging is disabled in the guest, in which case KVM uses a direct mapped MMU even though TDP is disabled. Fixes: `d0ec49d4de` ("kvm/x86/svm: Support Secure Memory Encryption within KVM") Cc: stable@vger.kernel.org Cc: Brijesh Singh <brijesh.singh@amd.com> Cc: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210305011101.3597423-11-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-03-15 04:42:21 -04:00
Sean Christopherson	e49e0b7bf3	KVM: x86/mmu: Fix and unconditionally enable WARNs to detect PAE leaks Exempt NULL PAE roots from the check to detect leaks, since kvm_mmu_free_roots() doesn't set them back to INVALID_PAGE. Stop hiding the WARNs to detect PAE root leaks behind MMU_WARN_ON, the hidden WARNs obviously didn't do their job given the hilarious number of bugs that could lead to PAE roots being leaked, not to mention the above false positive. Opportunistically delete a warning on root_hpa being valid, there's nothing special about 4/5-level shadow pages that warrants a WARN. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210305011101.3597423-9-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-03-15 04:42:20 -04:00
Sean Christopherson	6e0918aec4	KVM: x86/mmu: Check PDPTRs before allocating PAE roots Check the validity of the PDPTRs before allocating any of the PAE roots, otherwise a bad PDPTR will cause KVM to leak any previously allocated roots. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210305011101.3597423-8-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-03-15 04:42:19 -04:00
Sean Christopherson	6e6ec58485	KVM: x86/mmu: Ensure MMU pages are available when allocating roots Hold the mmu_lock for write for the entire duration of allocating and initializing an MMU's roots. This ensures there are MMU pages available and thus prevents root allocations from failing. That in turn fixes a bug where KVM would fail to free valid PAE roots if a one of the later roots failed to allocate. Add a comment to make_mmu_pages_available() to call out that the limit is a soft limit, e.g. KVM will temporarily exceed the threshold if a page fault allocates multiple shadow pages and there was only one page "available". Note, KVM _still_ leaks the PAE roots if the guest PDPTR checks fail. This will be addressed in a future commit. Cc: Ben Gardon <bgardon@google.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210305011101.3597423-7-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-03-15 04:42:18 -04:00
Sean Christopherson	748e52b9b7	KVM: x86/mmu: Allocate pae_root and lm_root pages in dedicated helper Move the on-demand allocation of the pae_root and lm_root pages, used by nested NPT for 32-bit L1s, into a separate helper. This will allow a future patch to hold mmu_lock while allocating the non-special roots so that make_mmu_pages_available() can be checked once at the start of root allocation, and thus avoid having to deal with failure in the middle of root allocation. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210305011101.3597423-6-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-03-15 04:42:17 -04:00
Sean Christopherson	ba0a194ffb	KVM: x86/mmu: Allocate the lm_root before allocating PAE roots Allocate lm_root before the PAE roots so that the PAE roots aren't leaked if the memory allocation for the lm_root happens to fail. Note, KVM can still leak PAE roots if mmu_check_root() fails on a guest's PDPTR, or if mmu_alloc_root() fails due to MMU pages not being available. Those issues will be fixed in future commits. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210305011101.3597423-5-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-03-15 04:42:16 -04:00
Sean Christopherson	b37233c911	KVM: x86/mmu: Capture 'mmu' in a local variable when allocating roots Grab 'mmu' and do s/vcpu->arch.mmu/mmu to shorten line lengths and yield smaller diffs when moving code around in future cleanup without forcing the new code to use the same ugly pattern. No functional change intended. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210305011101.3597423-4-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-03-15 04:42:15 -04:00
Sean Christopherson	04d45551a1	KVM: x86/mmu: Alloc page for PDPTEs when shadowing 32-bit NPT with 64-bit Allocate the so called pae_root page on-demand, along with the lm_root page, when shadowing 32-bit NPT with 64-bit NPT, i.e. when running a 32-bit L1. KVM currently only allocates the page when NPT is disabled, or when L0 is 32-bit (using PAE paging). Note, there is an existing memory leak involving the MMU roots, as KVM fails to free the PAE roots on failure. This will be addressed in a future commit. Fixes: `ee6268ba3a` ("KVM: x86: Skip pae_root shadow allocation if tdp enabled") Fixes: `b6b80c78af` ("KVM: x86/mmu: Allocate PAE root array when using SVM's 32-bit NPT") Cc: stable@vger.kernel.org Reviewed-by: Ben Gardon <bgardon@google.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210305011101.3597423-3-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-03-15 04:42:14 -04:00
David Stevens	4a42d848db	KVM: x86/mmu: Consider the hva in mmu_notifier retry Track the range being invalidated by mmu_notifier and skip page fault retries if the fault address is not affected by the in-progress invalidation. Handle concurrent invalidations by finding the minimal range which includes all ranges being invalidated. Although the combined range may include unrelated addresses and cannot be shrunk as individual invalidation operations complete, it is unlikely the marginal gains of proper range tracking are worth the additional complexity. The primary benefit of this change is the reduction in the likelihood of extreme latency when handing a page fault due to another thread having been preempted while modifying host virtual addresses. Signed-off-by: David Stevens <stevensd@chromium.org> Message-Id: <20210222024522.1751719-3-stevensd@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-02-22 13:16:53 -05:00
Sean Christopherson	5f8a7cf25a	KVM: x86/mmu: Skip mmu_notifier check when handling MMIO page fault Don't retry a page fault due to an mmu_notifier invalidation when handling a page fault for a GPA that did not resolve to a memslot, i.e. an MMIO page fault. Invalidations from the mmu_notifier signal a change in a host virtual address (HVA) mapping; without a memslot, there is no HVA and thus no possibility that the invalidation is relevant to the page fault being handled. Note, the MMIO vs. memslot generation checks handle the case where a pending memslot will create a memslot overlapping the faulting GPA. The mmu_notifier checks are orthogonal to memslot updates. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210222024522.1751719-2-stevensd@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-02-22 13:13:30 -05:00
Sean Christopherson	96ad91ae4e	KVM: x86/mmu: Remove a variety of unnecessary exports Remove several exports from the MMU that are no longer necessary. No functional change intended. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210213005015.1651772-15-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-02-19 03:08:35 -05:00
Sean Christopherson	a1419f8b5b	KVM: x86: Fold "write-protect large" use case into generic write-protect Drop kvm_mmu_slot_largepage_remove_write_access() and refactor its sole caller to use kvm_mmu_slot_remove_write_access(). Remove the now-unused slot_handle_large_level() and slot_handle_all_level() helpers. No functional change intended. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210213005015.1651772-14-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-02-19 03:08:35 -05:00
Sean Christopherson	b6e16ae5d9	KVM: x86/mmu: Don't set dirty bits when disabling dirty logging w/ PML Stop setting dirty bits for MMU pages when dirty logging is disabled for a memslot, as PML is now completely disabled when there are no memslots with dirty logging enabled. This means that spurious PML entries will be created for memslots with dirty logging disabled if at least one other memslot has dirty logging enabled. However, spurious PML entries are already possible since dirty bits are set only when a dirty logging is turned off, i.e. memslots that are never dirty logged will have dirty bits cleared. In the end, it's faster overall to eat a few spurious PML entries in the window where dirty logging is being disabled across all memslots. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210213005015.1651772-13-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-02-19 03:08:35 -05:00
Sean Christopherson	a018eba538	KVM: x86: Move MMU's PML logic to common code Drop the facade of KVM's PML logic being vendor specific and move the bits that aren't truly VMX specific into common x86 code. The MMU logic for dealing with PML is tightly coupled to the feature and to VMX's implementation, bouncing through kvm_x86_ops obfuscates the code without providing any meaningful separation of concerns or encapsulation. No functional change intended. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210213005015.1651772-10-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-02-19 03:08:34 -05:00
Sean Christopherson	6dd03800b1	KVM: x86/mmu: Make dirty log size hook (PML) a value, not a function Store the vendor-specific dirty log size in a variable, there's no need to wrap it in a function since the value is constant after hardware_setup() runs. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210213005015.1651772-9-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-02-19 03:08:33 -05:00
Sean Christopherson	9eba50f8d7	KVM: x86/mmu: Consult max mapping level when zapping collapsible SPTEs When zapping SPTEs in order to rebuild them as huge pages, use the new helper that computes the max mapping level to detect whether or not a SPTE should be zapped. Doing so avoids zapping SPTEs that can't possibly be rebuilt as huge pages, e.g. due to hardware constraints, memslot alignment, etc... This also avoids zapping SPTEs that are still large, e.g. if migration was canceled before write-protected huge pages were shattered to enable dirty logging. Note, such pages are still write-protected at this time, i.e. a page fault VM-Exit will still occur. This will hopefully be addressed in a future patch. Sadly, TDP MMU loses its const on the memslot, but that's a pervasive problem that's been around for quite some time. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210213005015.1651772-6-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-02-19 03:08:28 -05:00
Sean Christopherson	0a234f5dd0	KVM: x86/mmu: Pass the memslot to the rmap callbacks Pass the memslot to the rmap callbacks, it will be used when zapping collapsible SPTEs to verify the memslot is compatible with hugepages before zapping its SPTEs. No functional change intended. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210213005015.1651772-5-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-02-19 03:08:12 -05:00
Sean Christopherson	1b6d9d9ed5	KVM: x86/mmu: Split out max mapping level calculation to helper Factor out the logic for determining the maximum mapping level given a memslot and a gpa. The helper will be used when zapping collapsible SPTEs when disabling dirty logging, e.g. to avoid zapping SPTEs that can't possibly be rebuilt as hugepages. No functional change intended. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210213005015.1651772-4-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-02-19 03:08:11 -05:00
Maciej S. Szmigiero	8f5c44f953	KVM: x86/mmu: Make HVA handler retpoline-friendly When retpolines are enabled they have high overhead in the inner loop inside kvm_handle_hva_range() that iterates over the provided memory area. Let's mark this function and its TDP MMU equivalent __always_inline so compiler will be able to change the call to the actual handler function inside each of them into a direct one. This significantly improves performance on the unmap test on the existing kernel memslot code (tested on a Xeon 8167M machine): 30 slots in use: Test Before After Improvement Unmap 0.0353s 0.0334s 5% Unmap 2M 0.00104s 0.000407s 61% 509 slots in use: Test Before After Improvement Unmap 0.0742s 0.0740s None Unmap 2M 0.00221s 0.00159s 28% Looks like having an indirect call in these functions (and, so, a retpoline) might have interfered with unrolling of the whole loop in the CPU. Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com> Message-Id: <732d3fe9eb68aa08402a638ab0309199fa89ae56.1612810129.git.maciej.szmigiero@oracle.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-02-09 08:42:09 -05:00
Paolo Bonzini	897218ff7c	KVM: x86: compile out TDP MMU on 32-bit systems The TDP MMU assumes that it can do atomic accesses to 64-bit PTEs. Rather than just disabling it, compile it out completely so that it is possible to use for example 64-bit xchg. To limit the number of stubs, wrap all accesses to tdp_mmu_enabled or tdp_mmu_page with a function. Calls to all other functions in tdp_mmu.c are eliminated and do not even reach the linker. Reviewed-by: Sean Christopherson <seanjc@google.com> Tested-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-02-08 14:49:01 -05:00
Sean Christopherson	6f8e65a601	KVM: x86/mmu: Add helper to generate mask of reserved HPA bits Add a helper to generate the mask of reserved PA bits in the host. No functional change intended. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210204000117.3303214-10-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-02-04 09:27:29 -05:00
Sean Christopherson	5b7f575ccd	KVM: x86: Use reserved_gpa_bits to calculate reserved PxE bits Use reserved_gpa_bits, which accounts for exceptions to the maxphyaddr rule, e.g. SEV's C-bit, for the page {table,directory,etc...} entry (PxE) reserved bits checks. For SEV, the C-bit is ignored by hardware when walking pages tables, e.g. the APM states: Note that while the guest may choose to set the C-bit explicitly on instruction pages and page table addresses, the value of this bit is a don't-care in such situations as hardware always performs these as private accesses. Such behavior is expected to hold true for other features that repurpose GPA bits, e.g. KVM could theoretically emulate SME or MKTME, which both allow non-zero repurposed bits in the page tables. Conceptually, KVM should apply reserved GPA checks universally, and any features that do not adhere to the basic rule should be explicitly handled, i.e. if a GPA bit is repurposed but not allowed in page tables for whatever reason. Refactor __reset_rsvds_bits_mask() to take the pre-generated reserved bits mask, and opportunistically clean up its code, e.g. to align lines and comments. Practically speaking, this is change is a likely a glorified nop given the current KVM code base. SEV's C-bit is the only repurposed GPA bit, and KVM doesn't support shadowing encrypted page tables (which is theoretically possible via SEV debug APIs). Cc: Rick Edgecombe <rick.p.edgecombe@intel.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210204000117.3303214-9-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-02-04 09:27:29 -05:00
Ben Gardon	a2855afc7e	KVM: x86/mmu: Allow parallel page faults for the TDP MMU Make the last few changes necessary to enable the TDP MMU to handle page faults in parallel while holding the mmu_lock in read mode. Reviewed-by: Peter Feiner <pfeiner@google.com> Signed-off-by: Ben Gardon <bgardon@google.com> Message-Id: <20210202185734.1680553-24-bgardon@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-02-04 05:27:45 -05:00
Ben Gardon	531810caa9	KVM: x86/mmu: Use an rwlock for the x86 MMU Add a read / write lock to be used in place of the MMU spinlock on x86. The rwlock will enable the TDP MMU to handle page faults, and other operations in parallel in future commits. Reviewed-by: Peter Feiner <pfeiner@google.com> Signed-off-by: Ben Gardon <bgardon@google.com> Message-Id: <20210202185734.1680553-19-bgardon@google.com> [Introduce virt/kvm/mmu_lock.h - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-02-04 05:27:43 -05:00
Ben Gardon	8d1a182ea7	KVM: x86/mmu: Fix braces in kvm_recover_nx_lpages No functional change intended. Fixes: `29cf0f5007` ("kvm: x86/mmu: NX largepage recovery for TDP MMU") Signed-off-by: Ben Gardon <bgardon@google.com> Message-Id: <20210202185734.1680553-10-bgardon@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-02-04 05:27:39 -05:00
Stephen Zhang	805a0f8390	KVM: x86/mmu: Add '__func__' in rmap_printk() Given the common pattern: rmap_printk("%s:"..., __func__,...) we could improve this by adding '__func__' in rmap_printk(). Signed-off-by: Stephen Zhang <stephenzhangzsd@gmail.com> Message-Id: <1611713325-3591-1-git-send-email-stephenzhangzsd@gmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-02-04 05:27:36 -05:00
Jason Baron	b3646477d4	KVM: x86: use static calls to reduce kvm_x86_ops overhead Convert kvm_x86_ops to use static calls. Note that all kvm_x86_ops are covered here except for 'pmu_ops and 'nested ops'. Here are some numbers running cpuid in a loop of 1 million calls averaged over 5 runs, measured in the vm (lower is better). Intel Xeon 3000MHz: \|default \|mitigations=off ------------------------------------- vanilla \|.671s \|.486s static call\|.573s(-15%)\|.458s(-6%) AMD EPYC 2500MHz: \|default \|mitigations=off ------------------------------------- vanilla \|.710s \|.609s static call\|.664s(-6%) \|.609s(0%) Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Sean Christopherson <seanjc@google.com> Signed-off-by: Jason Baron <jbaron@akamai.com> Message-Id: <e057bf1b8a7ad15652df6eeba3f907ae758d3399.1610680941.git.jbaron@akamai.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-02-04 05:27:30 -05:00
Sean Christopherson	c5e2184d15	KVM: x86/mmu: Remove the defunct update_pte() paging hook Remove the update_pte() shadow paging logic, which was obsoleted by commit `4731d4c7a0` ("KVM: MMU: out of sync shadow core"), but never removed. As pointed out by Yu, KVM never write protects leaf page tables for the purposes of shadow paging, and instead marks their associated shadow page as unsync so that the guest can write PTEs at will. The update_pte() path, which predates the unsync logic, optimizes COW scenarios by refreshing leaf SPTEs when they are written, as opposed to zapping the SPTE, restarting the guest, and installing the new SPTE on the subsequent fault. Since KVM no longer write-protects leaf page tables, update_pte() is unreachable and can be dropped. Reported-by: Yu Zhang <yu.c.zhang@intel.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210115004051.4099250-1-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-02-04 05:27:17 -05:00
Sean Christopherson	8fc517267f	KVM: x86: Zap the oldest MMU pages, not the newest Walk the list of MMU pages in reverse in kvm_mmu_zap_oldest_mmu_pages(). The list is FIFO, meaning new pages are inserted at the head and thus the oldest pages are at the tail. Using a "forward" iterator causes KVM to zap MMU pages that were just added, which obliterates guest performance once the max number of shadow MMU pages is reached. Fixes: `6b82ef2c9c` ("KVM: x86/mmu: Batch zap MMU pages when recycling oldest pages") Reported-by: Zdenek Kaspar <zkaspar82@gmail.com> Cc: stable@vger.kernel.org Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210113205030.3481307-1-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-02-04 05:27:15 -05:00
Paolo Bonzini	bc351f0726	Merge branch 'kvm-master' into kvm-next Fixes to get_mmio_spte, destined to 5.10 stable branch.	2021-01-07 18:06:52 -05:00
Sean Christopherson	9aa418792f	KVM: x86/mmu: Optimize not-present/MMIO SPTE check in get_mmio_spte() Check only the terminal leaf for a "!PRESENT \|\| MMIO" SPTE when looking for reserved bits on valid, non-MMIO SPTEs. The get_walk() helpers terminate their walks if a not-present or MMIO SPTE is encountered, i.e. the non-terminal SPTEs have already been verified to be regular SPTEs. This eliminates an extra check-and-branch in a relatively hot loop. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20201218003139.2167891-5-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-01-07 18:00:27 -05:00
Sean Christopherson	dde81f9477	KVM: x86/mmu: Use raw level to index into MMIO walks' sptes array Bump the size of the sptes array by one and use the raw level of the SPTE to index into the sptes array. Using the SPTE level directly improves readability by eliminating the need to reason out why the level is being adjusted when indexing the array. The array is on the stack and is not explicitly initialized; bumping its size is nothing more than a superficial adjustment to the stack frame. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20201218003139.2167891-4-seanjc@google.com> Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-01-07 18:00:26 -05:00
Sean Christopherson	39b4d43e60	KVM: x86/mmu: Get root level from walkers when retrieving MMIO SPTE Get the so called "root" level from the low level shadow page table walkers instead of manually attempting to calculate it higher up the stack, e.g. in get_mmio_spte(). When KVM is using PAE shadow paging, the starting level of the walk, from the callers perspective, is not the CR3 root but rather the PDPTR "root". Checking for reserved bits from the CR3 root causes get_mmio_spte() to consume uninitialized stack data due to indexing into sptes[] for a level that was not filled by get_walk(). This can result in false positives and/or negatives depending on what garbage happens to be on the stack. Opportunistically nuke a few extra newlines. Fixes: `95fb5b0258` ("kvm: x86/mmu: Support MMIO in the TDP MMU") Reported-by: Richard Herbert <rherbert@sympatico.ca> Cc: Ben Gardon <bgardon@google.com> Cc: stable@vger.kernel.org Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20201218003139.2167891-3-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-01-07 18:00:24 -05:00
Sean Christopherson	2aa078932f	KVM: x86/mmu: Use -1 to flag an undefined spte in get_mmio_spte() Return -1 from the get_walk() helpers if the shadow walk doesn't fill at least one spte, which can theoretically happen if the walk hits a not-present PDPTR. Returning the root level in such a case will cause get_mmio_spte() to return garbage (uninitialized stack data). In practice, such a scenario should be impossible as KVM shouldn't get a reserved-bit page fault with a not-present PDPTR. Note, using mmu->root_level in get_walk() is wrong for other reasons, too, but that's now a moot point. Fixes: `95fb5b0258` ("kvm: x86/mmu: Support MMIO in the TDP MMU") Cc: Ben Gardon <bgardon@google.com> Cc: stable@vger.kernel.org Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20201218003139.2167891-2-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-01-07 18:00:23 -05:00
Vitaly Kuznetsov	9a2a0d3ca1	kvm: x86/mmu: Fix get_mmio_spte() on CPUs supporting 5-level PT Commit `95fb5b0258` ("kvm: x86/mmu: Support MMIO in the TDP MMU") caused the following WARNING on an Intel Ice Lake CPU: get_mmio_spte: detect reserved bits on spte, addr 0xb80a0, dump hierarchy: ------ spte 0xb80a0 level 5. ------ spte 0xfcd210107 level 4. ------ spte 0x1004c40107 level 3. ------ spte 0x1004c41107 level 2. ------ spte 0x1db00000000b83b6 level 1. WARNING: CPU: 109 PID: 10254 at arch/x86/kvm/mmu/mmu.c:3569 kvm_mmu_page_fault.cold.150+0x54/0x22f [kvm] ... Call Trace: ? kvm_io_bus_get_first_dev+0x55/0x110 [kvm] vcpu_enter_guest+0xaa1/0x16a0 [kvm] ? vmx_get_cs_db_l_bits+0x17/0x30 [kvm_intel] ? skip_emulated_instruction+0xaa/0x150 [kvm_intel] kvm_arch_vcpu_ioctl_run+0xca/0x520 [kvm] The guest triggering this crashes. Note, this happens with the traditional MMU and EPT enabled, not with the newly introduced TDP MMU. Turns out, there was a subtle change in the above mentioned commit. Previously, walk_shadow_page_get_mmio_spte() was setting 'root' to 'iterator.level' which is returned by shadow_walk_init() and this equals to 'vcpu->arch.mmu->shadow_root_level'. Now, get_mmio_spte() sets it to 'int root = vcpu->arch.mmu->root_level'. The difference between 'root_level' and 'shadow_root_level' on CPUs supporting 5-level page tables is that in some case we don't want to use 5-level, in particular when 'cpuid_maxphyaddr(vcpu) <= 48' kvm_mmu_get_tdp_level() returns '4'. In case upper layer is not used, the corresponding SPTE will fail '__is_rsvd_bits_set()' check. Revert to using 'shadow_root_level'. Fixes: `95fb5b0258` ("kvm: x86/mmu: Support MMIO in the TDP MMU") Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20201126110206.2118959-1-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-11-27 11:14:27 -05:00
Peter Xu	044c59c409	KVM: Don't allocate dirty bitmap if dirty ring is enabled Because kvm dirty rings and kvm dirty log is used in an exclusive way, Let's avoid creating the dirty_bitmap when kvm dirty ring is enabled. At the meantime, since the dirty_bitmap will be conditionally created now, we can't use it as a sign of "whether this memory slot enabled dirty tracking". Change users like that to check against the kvm memory slot flags. Note that there still can be chances where the kvm memory slot got its dirty_bitmap allocated, _if_ the memory slots are created before enabling of the dirty rings and at the same time with the dirty tracking capability enabled, they'll still with the dirty_bitmap. However it should not hurt much (e.g., the bitmaps will always be freed if they are there), and the real users normally won't trigger this because dirty bit tracking flag should in most cases only be applied to kvm slots only before migration starts, that should be far latter than kvm initializes (VM starts). Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <20201001012226.5868-1-peterx@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-11-15 09:49:16 -05:00
Peter Xu	fb04a1eddb	KVM: X86: Implement ring-based dirty memory tracking This patch is heavily based on previous work from Lei Cao <lei.cao@stratus.com> and Paolo Bonzini <pbonzini@redhat.com>. [1] KVM currently uses large bitmaps to track dirty memory. These bitmaps are copied to userspace when userspace queries KVM for its dirty page information. The use of bitmaps is mostly sufficient for live migration, as large parts of memory are be dirtied from one log-dirty pass to another. However, in a checkpointing system, the number of dirty pages is small and in fact it is often bounded---the VM is paused when it has dirtied a pre-defined number of pages. Traversing a large, sparsely populated bitmap to find set bits is time-consuming, as is copying the bitmap to user-space. A similar issue will be there for live migration when the guest memory is huge while the page dirty procedure is trivial. In that case for each dirty sync we need to pull the whole dirty bitmap to userspace and analyse every bit even if it's mostly zeros. The preferred data structure for above scenarios is a dense list of guest frame numbers (GFN). This patch series stores the dirty list in kernel memory that can be memory mapped into userspace to allow speedy harvesting. This patch enables dirty ring for X86 only. However it should be easily extended to other archs as well. [1] https://patchwork.kernel.org/patch/10471409/ Signed-off-by: Lei Cao <lei.cao@stratus.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <20201001012222.5767-1-peterx@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-11-15 09:49:15 -05:00
Li RongQing	c6c4f961cb	KVM: x86/mmu: fix counting of rmap entries in pte_list_add Fix an off-by-one style bug in pte_list_add() where it failed to account the last full set of SPTEs, i.e. when desc->sptes is full and desc->more is NULL. Merge the two "PTE_LIST_EXT-1" checks as part of the fix to avoid an extra comparison. Signed-off-by: Li RongQing <lirongqing@baidu.com> Reviewed-by: Sean Christopherson <sean.j.christopherson@intel.com> Message-Id: <1601196297-24104-1-git-send-email-lirongqing@baidu.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-11-08 04:41:26 -05:00
Paolo Bonzini	8a967d655e	KVM: x86: replace static const variables with macros Even though the compiler is able to replace static const variables with their value, it will warn about them being unused when Linux is built with W=1. Use good old macros instead, this is not C++. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-10-30 13:39:55 -04:00
Ben Gardon	29cf0f5007	kvm: x86/mmu: NX largepage recovery for TDP MMU When KVM maps a largepage backed region at a lower level in order to make it executable (i.e. NX large page shattering), it reduces the TLB performance of that region. In order to avoid making this degradation permanent, KVM must periodically reclaim shattered NX largepages by zapping them and allowing them to be rebuilt in the page fault handler. With this patch, the TDP MMU does not respect KVM's rate limiting on reclaim. It traverses the entire TDP structure every time. This will be addressed in a future patch. Tested by running kvm-unit-tests and KVM selftests on an Intel Haswell machine. This series introduced no new failures. This series can be viewed in Gerrit at: https://linux-review.googlesource.com/c/virt/kvm/kvm/+/2538 Signed-off-by: Ben Gardon <bgardon@google.com> Message-Id: <20201014182700.2888246-21-bgardon@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-10-23 03:42:16 -04:00
Ben Gardon	daa5b6c123	kvm: x86/mmu: Don't clear write flooding count for direct roots Direct roots don't have a write flooding count because the guest can't affect that paging structure. Thus there's no need to clear the write flooding count on a fast CR3 switch for direct roots. Tested by running kvm-unit-tests and KVM selftests on an Intel Haswell machine. This series introduced no new failures. This series can be viewed in Gerrit at: https://linux-review.googlesource.com/c/virt/kvm/kvm/+/2538 Signed-off-by: Ben Gardon <bgardon@google.com> Message-Id: <20201014182700.2888246-20-bgardon@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-10-23 03:42:15 -04:00
Ben Gardon	95fb5b0258	kvm: x86/mmu: Support MMIO in the TDP MMU In order to support MMIO, KVM must be able to walk the TDP paging structures to find mappings for a given GFN. Support this walk for the TDP MMU. Tested by running kvm-unit-tests and KVM selftests on an Intel Haswell machine. This series introduced no new failures. This series can be viewed in Gerrit at: https://linux-review.googlesource.com/c/virt/kvm/kvm/+/2538 v2: Thanks to Dan Carpenter and kernel test robot for finding that root was used uninitialized in get_mmio_spte. Signed-off-by: Ben Gardon <bgardon@google.com> Reported-by: kernel test robot <lkp@intel.com> Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Message-Id: <20201014182700.2888246-19-bgardon@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-10-23 03:42:15 -04:00
Ben Gardon	46044f72c3	kvm: x86/mmu: Support write protection for nesting in tdp MMU To support nested virtualization, KVM will sometimes need to write protect pages which are part of a shadowed paging structure or are not writable in the shadowed paging structure. Add a function to write protect GFN mappings for this purpose. Tested by running kvm-unit-tests and KVM selftests on an Intel Haswell machine. This series introduced no new failures. This series can be viewed in Gerrit at: https://linux-review.googlesource.com/c/virt/kvm/kvm/+/2538 Signed-off-by: Ben Gardon <bgardon@google.com> Message-Id: <20201014182700.2888246-18-bgardon@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-10-23 03:42:14 -04:00
Ben Gardon	1488199856	kvm: x86/mmu: Support disabling dirty logging for the tdp MMU Dirty logging ultimately breaks down MMU mappings to 4k granularity. When dirty logging is no longer needed, these granaular mappings represent a useless performance penalty. When dirty logging is disabled, search the paging structure for mappings that could be re-constituted into a large page mapping. Zap those mappings so that they can be faulted in again at a higher mapping level. Tested by running kvm-unit-tests and KVM selftests on an Intel Haswell machine. This series introduced no new failures. This series can be viewed in Gerrit at: https://linux-review.googlesource.com/c/virt/kvm/kvm/+/2538 Signed-off-by: Ben Gardon <bgardon@google.com> Message-Id: <20201014182700.2888246-17-bgardon@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-10-23 03:42:14 -04:00
Ben Gardon	a6a0b05da9	kvm: x86/mmu: Support dirty logging for the TDP MMU Dirty logging is a key feature of the KVM MMU and must be supported by the TDP MMU. Add support for both the write protection and PML dirty logging modes. Tested by running kvm-unit-tests and KVM selftests on an Intel Haswell machine. This series introduced no new failures. This series can be viewed in Gerrit at: https://linux-review.googlesource.com/c/virt/kvm/kvm/+/2538 Signed-off-by: Ben Gardon <bgardon@google.com> Message-Id: <20201014182700.2888246-16-bgardon@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-10-23 03:42:13 -04:00
Ben Gardon	1d8dd6b3f1	kvm: x86/mmu: Support changed pte notifier in tdp MMU In order to interoperate correctly with the rest of KVM and other Linux subsystems, the TDP MMU must correctly handle various MMU notifiers. Add a hook and handle the change_pte MMU notifier. Tested by running kvm-unit-tests and KVM selftests on an Intel Haswell machine. This series introduced no new failures. This series can be viewed in Gerrit at: https://linux-review.googlesource.com/c/virt/kvm/kvm/+/2538 Signed-off-by: Ben Gardon <bgardon@google.com> Message-Id: <20201014182700.2888246-15-bgardon@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-10-23 03:42:12 -04:00

... 5 6 7 8 9 ...

513 Commits