linux

mirror of https://github.com/torvalds/linux.git synced 2024-11-17 01:22:07 +00:00

Author	SHA1	Message	Date
Nicolas Kaiser	9611c18777	KVM: fix typo in copyright notice Fix typo in copyright notice. Signed-off-by: Nicolas Kaiser <nikai@nikai.net> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-10-24 10:53:14 +02:00
Xiao Guangrong	3377078027	KVM: MMU: move access code parsing to FNAME(walk_addr) function Move access code parsing from caller site to FNAME(walk_addr) function Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:53:14 +02:00
Xiao Guangrong	20bd40dc64	KVM: MMU: cleanup for error mask set while walk guest page table Small cleanup for set page fault error code Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:53:10 +02:00
Joerg Roedel	2d48a985c7	KVM: MMU: Track NX state in struct kvm_mmu With Nested Paging emulation the NX state between the two MMU contexts may differ. To make sure that always the right fault error code is recorded this patch moves the NX state into struct kvm_mmu so that the code can distinguish between L1 and L2 NX state. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:52:44 +02:00
Joerg Roedel	d41d1895eb	KVM: MMU: Introduce kvm_pdptr_read_mmu This function is implemented to load the pdptr pointers of the currently running guest (l1 or l2 guest). Therefore it takes care about the current paging mode and can read pdptrs out of l2 guest physical memory. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:52:42 +02:00
Joerg Roedel	2329d46d21	KVM: MMU: Make walk_addr_generic capable for two-level walking This patch uses kvm_read_guest_page_tdp to make the walk_addr_generic functions suitable for two-level page table walking. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:52:38 +02:00
Joerg Roedel	6539e738f6	KVM: MMU: Implement nested gva_to_gpa functions This patch adds the functions to do a nested l2_gva to l1_gpa page table walk. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:52:36 +02:00
Joerg Roedel	1e301feb07	KVM: MMU: Introduce generic walk_addr function This is the first patch in the series towards a generic walk_addr implementation which could walk two-dimensional page tables in the end. In this first step the walk_addr function is renamed into walk_addr_generic which takes a mmu context as an additional parameter. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:52:33 +02:00
Joerg Roedel	8df25a328a	KVM: MMU: Track page fault data in struct vcpu This patch introduces a struct with two new fields in vcpu_arch for x86: * fault.address * fault.error_code This will be used to correctly propagate page faults back into the guest when we could have either an ordinary page fault or a nested page fault. In the case of a nested page fault the fault-address is different from the original address that should be walked. So we need to keep track about the real fault-address. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:52:33 +02:00
Joerg Roedel	3241f22da8	KVM: MMU: Let is_rsvd_bits_set take mmu context instead of vcpu This patch changes is_rsvd_bits_set() function prototype to take only a kvm_mmu context instead of a full vcpu. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:52:32 +02:00
Joerg Roedel	5777ed340d	KVM: MMU: Introduce get_cr3 function pointer This function pointer in the MMU context is required to implement Nested Nested Paging. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:52:31 +02:00
Joerg Roedel	957446afce	KVM: MMU: Check for root_level instead of long mode The walk_addr function checks for !is_long_mode in its 64 bit version. But what is meant here is a check for pae paging. Change the condition to really check for pae paging so that it also works with nested nested paging. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:52:27 +02:00
Xiao Guangrong	8b1fe17cc7	KVM: MMU: support disable/enable mmu audit dynamicly Add a r/w module parameter named 'mmu_audit', it can control audit enable/disable: enable: echo 1 > /sys/module/kvm/parameters/mmu_audit disable: echo 0 > /sys/module/kvm/parameters/mmu_audit This patch not change the logic Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:51:56 +02:00
Xiao Guangrong	bc32ce2152	KVM: MMU: fix wrong not write protected sp report The audit code reports some sp not write protected in current code, it's just the bug in audit_write_protection(), since: - the invalid sp not need write protected - using uninitialize local variable('gfn') - call kvm_mmu_audit() out of mmu_lock's protection Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:51:47 +02:00
Xiao Guangrong	189be38db3	KVM: MMU: combine guest pte read between fetch and pte prefetch Combine guest pte read between guest pte check in the fetch path and pte prefetch Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-10-24 10:51:28 +02:00
Xiao Guangrong	957ed9effd	KVM: MMU: prefetch ptes when intercepted guest #PF Support prefetch ptes when intercept guest #PF, avoid to #PF by later access If we meet any failure in the prefetch path, we will exit it and not try other ptes to avoid become heavy path Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-10-24 10:51:27 +02:00
Xiao Guangrong	fa1de2bfc0	KVM: MMU: add missing reserved bits check in speculative path In the speculative path, we should check guest pte's reserved bits just as the real processor does Reported-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-08-02 06:40:56 +03:00
Avi Kivity	24157aaf83	KVM: MMU: Eliminate redundant temporaries in FNAME(fetch) 'level' and 'sptep' are aliases for 'interator.level' and 'iterator.sptep', no need for them. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-08-02 06:40:48 +03:00
Avi Kivity	5991b33237	KVM: MMU: Validate all gptes during fetch, not just those used for new pages Currently, when we fetch an spte, we only verify that gptes match those that the walker saw if we build new shadow pages for them. However, this misses the following race: vcpu1 vcpu2 walk change gpte walk instantiate sp fetch existing sp Fix by validating every gpte, regardless of whether it is used for building a new sp or not. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-08-02 06:40:47 +03:00
Avi Kivity	0b3c933302	KVM: MMU: Simplify spte fetch() function Partition the function into three sections: - fetching indirect shadow pages (host_level > guest_level) - fetching direct shadow pages (page_level < host_level <= guest_level) - the final spte (page_level == host_level) Instead of the current spaghetti. A slight change from the original code is that we call validate_direct_spte() more often: previously we called it only for gw->level, now we also call it for lower levels. The change should have no effect. [xiao: fix regression caused by validate_direct_spte() called too late] Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-08-02 06:40:45 +03:00
Avi Kivity	39c8c672a1	KVM: MMU: Add gpte_valid() helper Move the code to check whether a gpte has changed since we fetched it into a helper. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-08-02 06:40:44 +03:00
Avi Kivity	a357bd229c	KVM: MMU: Add validate_direct_spte() helper Add a helper to verify that a direct shadow page is valid wrt the required access permissions; drop the page if it is not valid. Reviewed-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-08-02 06:40:43 +03:00
Avi Kivity	a3aa51cfaa	KVM: MMU: Add drop_large_spte() helper To clarify spte fetching code, move large spte handling into a helper. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-08-02 06:40:42 +03:00
Avi Kivity	32ef26a359	KVM: MMU: Add link_shadow_page() helper To simplify the process of fetching an spte, add a helper that links a shadow page to an spte. Reviewed-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-08-02 06:40:40 +03:00
Avi Kivity	f59c1d2ded	KVM: MMU: Keep going on permission error Real hardware disregards permission errors when computing page fault error code bit 0 (page present). Do the same. Reviewed-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-08-02 06:40:30 +03:00
Avi Kivity	b0eeec29fe	KVM: MMU: Only indicate a fetch fault in page fault error code if nx is enabled Bit 4 of the page fault error code is set only if EFER.NX is set. Reviewed-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-08-02 06:40:29 +03:00
Avi Kivity	be38d276b0	KVM: MMU: Introduce drop_spte() When we call rmap_remove(), we (almost) always immediately follow it by an __set_spte() to a nonpresent pte. Since we need to perform the two operations atomically, to avoid losing the dirty and accessed bits, introduce a helper drop_spte() and convert all call sites. The operation is still nonatomic at this point. Signed-off-by: Avi Kivity <avi@redhat.com>	2010-08-02 06:40:17 +03:00
Xiao Guangrong	84754cd8fc	KVM: MMU: cleanup FNAME(fetch)() functions Cleanup this function that we are already get the direct sp's access Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-08-01 10:47:26 +03:00
Xiao Guangrong	9e7b0e7fba	KVM: MMU: fix direct sp's access corrupted If the mapping is writable but the dirty flag is not set, we will find the read-only direct sp and setup the mapping, then if the write #PF occur, we will mark this mapping writable in the read-only direct sp, now, other real read-only mapping will happily write it without #PF. It may hurt guest's COW Fixed by re-install the mapping when write #PF occur. Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-08-01 10:47:25 +03:00
Xiao Guangrong	5fd5387c89	KVM: MMU: fix conflict access permissions in direct sp In no-direct mapping, we mark sp is 'direct' when we mapping the guest's larger page, but its access is encoded form upper page-struct entire not include the last mapping, it will cause access conflict. For example, have this mapping: [W] / PDE1 -> |---| P[W] | | LPA \ PDE2 -> |---| [R] P have two children, PDE1 and PDE2, both PDE1 and PDE2 mapping the same lage page(LPA). The P's access is WR, PDE1's access is WR, PDE2's access is RO(just consider read-write permissions here) When guest access PDE1, we will create a direct sp for LPA, the sp's access is from P, is W, then we will mark the ptes is W in this sp. Then, guest access PDE2, we will find LPA's shadow page, is the same as PDE's, and mark the ptes is RO. So, if guest access PDE1, the incorrect #PF is occured. Fixed by encode the last mapping access into direct shadow page Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-08-01 10:47:23 +03:00
Avi Kivity	a1f4d39500	KVM: Remove memory alias support As advertised in feature-removal-schedule.txt. Equivalent support is provided by overlapping memory regions. Signed-off-by: Avi Kivity <avi@redhat.com>	2010-08-01 10:47:00 +03:00
Xiao Guangrong	be71e061d1	KVM: MMU: don't mark pte notrap if it's just sync transient If the sync-sp just sync transient, don't mark its pte notrap Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-08-01 10:46:42 +03:00
Xiao Guangrong	cb83cad2e7	KVM: MMU: cleanup for dirty page judgment Using wrap function to cleanup page dirty judgment Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-08-01 10:46:39 +03:00
Xiao Guangrong	ac3cd03cca	KVM: MMU: rename 'page' and 'shadow_page' to 'sp' Rename 'page' and 'shadow_page' to 'sp' to better fit the context Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-08-01 10:46:38 +03:00
Andi Kleen	a24e809902	KVM: Fix unused but set warnings No real bugs in this one. Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-08-01 10:46:29 +03:00
Lai Jiangshan	3af1817a0d	KVM: MMU: calculate correct gfn for small host pages backing large guest pages In Documentation/kvm/mmu.txt: gfn: Either the guest page table containing the translations shadowed by this page, or the base page frame for linear translations. See role.direct. But in function FNAME(fetch)(), sp->gfn is incorrect when one of following situations occurred: 1) guest is 32bit paging and the guest PDE maps a 4-MByte page (backed by 4k host pages), FNAME(fetch)() miss handling the quadrant. And if guest use pse-36, "table_gfn = gpte_to_gfn(gw->ptes[level - delta]);" is incorrect. 2) guest is long mode paging and the guest PDPTE maps a 1-GByte page (backed by 4k or 2M host pages). So we fix it to suit to the document and suit to the code which requires sp->gfn correct when sp->role.direct=1. We use the goal mapping gfn(gw->gfn) to calculate the base page frame for linear translations, it is simple and easy to be understood. Reported-by: Marcelo Tosatti <mtosatti@redhat.com> Reported-by: Gui Jianfeng <guijianfeng@cn.fujitsu.com> Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-08-01 10:39:21 +03:00
Lai Jiangshan	2032a93d66	KVM: MMU: Don't allocate gfns page for direct mmu pages When sp->role.direct is set, sp->gfns does not contain any essential information, leaf sptes reachable from this sp are for a continuous guest physical memory range (a linear range). So sp->gfns[i] (if it was set) equals to sp->gfn + i. (PT_PAGE_TABLE_LEVEL) Obviously, it is not essential information, we can calculate it when need. It means we don't need sp->gfns when sp->role.direct=1, Thus we can save one page usage for every kvm_mmu_page. Note: Access to sp->gfns must be wrapped by kvm_mmu_page_get_gfn() or kvm_mmu_page_set_gfn(). It is only exposed in FNAME(sync_page). Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-08-01 10:35:52 +03:00
Avi Kivity	221d059d15	KVM: Update Red Hat copyrights Signed-off-by: Avi Kivity <avi@redhat.com>	2010-08-01 10:35:51 +03:00
Xiao Guangrong	f78978aa3a	KVM: MMU: only update unsync page in invlpg path Only unsync pages need updated at invlpg time since other shadow pages are write-protected Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-08-01 10:35:50 +03:00
Xiao Guangrong	f55c3f419a	KVM: MMU: unalias gfn before sp->gfns[] comparison in sync_page sp->gfns[] contain unaliased gfns, but gpte might contain pointer to aliased region. Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-08-01 10:35:46 +03:00
Gui Jianfeng	518c5a05e8	KVM: MMU: Fix debug output error in walk_addr() Fix a debug output error in walk_addr Signed-off-by: Gui Jianfeng <guijianfeng@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-08-01 10:35:39 +03:00
Gui Jianfeng	f3b8c964a9	KVM: MMU: mark page table dirty when a pte is actually modified Sometime cmpxchg_gpte doesn't modify gpte, in such case, don't mark page table page as dirty. Signed-off-by: Gui Jianfeng <guijianfeng@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-08-01 10:35:38 +03:00
Huang Ying	bf998156d2	KVM: Avoid killing userspace through guest SRAO MCE on unmapped pages In common cases, guest SRAO MCE will cause corresponding poisoned page be un-mapped and SIGBUS be sent to QEMU-KVM, then QEMU-KVM will relay the MCE to guest OS. But it is reported that if the poisoned page is accessed in guest after unmapping and before MCE is relayed to guest OS, userspace will be killed. The reason is as follows. Because poisoned page has been un-mapped, guest access will cause guest exit and kvm_mmu_page_fault will be called. kvm_mmu_page_fault can not get the poisoned page for fault address, so kernel and user space MMIO processing is tried in turn. In user MMIO processing, poisoned page is accessed again, then userspace is killed by force_sig_info. To fix the bug, kvm_mmu_page_fault send HWPOISON signal to QEMU-KVM and do not try kernel and user space MMIO processing for poisoned page. [xiao: fix warning introduced by avi] Reported-by: Max Asbock <masbock@linux.vnet.ibm.com> Signed-off-by: Huang Ying <ying.huang@intel.com> Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-08-01 10:35:26 +03:00
Xiao Guangrong	6aa0b9dec5	KVM: MMU: fix conflict access permissions in direct sp In no-direct mapping, we mark sp is 'direct' when we mapping the guest's larger page, but its access is encoded form upper page-struct entire not include the last mapping, it will cause access conflict. For example, have this mapping: [W] / PDE1 -> |---| P[W] | | LPA \ PDE2 -> |---| [R] P have two children, PDE1 and PDE2, both PDE1 and PDE2 mapping the same lage page(LPA). The P's access is WR, PDE1's access is WR, PDE2's access is RO(just consider read-write permissions here) When guest access PDE1, we will create a direct sp for LPA, the sp's access is from P, is W, then we will mark the ptes is W in this sp. Then, guest access PDE2, we will find LPA's shadow page, is the same as PDE's, and mark the ptes is RO. So, if guest access PDE1, the incorrect #PF is occured. Fixed by encode the last mapping access into direct shadow page Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-07-23 09:07:04 +03:00
Xiao Guangrong	884a0ff0b6	KVM: MMU: cleanup invlpg code Using is_last_spte() to cleanup invlpg code Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-05-19 11:36:28 +03:00
Xiao Guangrong	22c9b2d166	KVM: MMU: fix for calculating gpa in invlpg code If the guest is 32-bit, we should use 'quadrant' to adjust gpa offset Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-05-19 11:36:25 +03:00
Gui Jianfeng	814a59d207	KVM: MMU: Make use of is_large_pte() in walker Make use of is_large_pte() instead of checking PT_PAGE_SIZE_MASK bit directly. Signed-off-by: Gui Jianfeng <guijianfeng@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-05-17 12:18:07 +03:00
Gui Jianfeng	51fb60d81b	KVM: MMU: Move sync_page() first pte address calculation out of loop Move first pte address calculation out of loop to save some cycles. Signed-off-by: Gui Jianfeng <guijianfeng@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-05-17 12:18:06 +03:00
Xiao Guangrong	24222c2fec	KVM: MMU: remove unnecessary NX check in walk_addr After is_rsvd_bits_set() checks, EFER.NXE must be enabled if NX bit is seted Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-05-17 12:17:30 +03:00
Avi Kivity	08e850c653	KVM: MMU: Reinstate pte prefetch on invlpg Commit `fb341f57` removed the pte prefetch on guest invlpg, citing guest races. However, the SDM is adamant that prefetch is allowed: "The processor may create entries in paging-structure caches for translations required for prefetches and for accesses that are a result of speculative execution that would never actually occur in the executed code path." And, in fact, there was a race in the prefetch code: we picked up the pte without the mmu lock held, so an older invlpg could install the pte over a newer invlpg. Reinstate the prefetch logic, but this time note whether another invlpg has executed using a counter. If a race occured, do not install the pte. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-05-17 12:15:43 +03:00

109 Commits