Commit Graph

585 Commits

Author SHA1 Message Date
Ilias Stamatis
1ab9287add KVM: X86: Add vendor callbacks for writing the TSC multiplier
Currently vmx_vcpu_load_vmcs() writes the TSC_MULTIPLIER field of the
VMCS every time the VMCS is loaded. Instead of doing this, set this
field from common code on initialization and whenever the scaling ratio
changes.

Additionally remove vmx->current_tsc_ratio. This field is redundant as
vcpu->arch.tsc_scaling_ratio already tracks the current TSC scaling
ratio. The vmx->current_tsc_ratio field is only used for avoiding
unnecessary writes but it is no longer needed after removing the code
from the VMCS load path.
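
For illustration, a sketch of the resulting split (the VMX side is a plausible shape, not the verbatim patch):

    /* vmx: the callback merely writes the field; no per-load rewrite */
    static void vmx_write_tsc_multiplier(struct kvm_vcpu *vcpu, u64 multiplier)
    {
            vmcs_write64(TSC_MULTIPLIER, multiplier);
    }

    /* common code invokes it on init and whenever the ratio changes */
    if (kvm_has_tsc_control)
            static_call(kvm_x86_write_tsc_multiplier)(vcpu,
                            vcpu->arch.tsc_scaling_ratio);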

Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Ilias Stamatis <ilstam@amazon.com>
Message-Id: <20210607105438.16541-1-ilstam@amazon.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-06-17 13:09:29 -04:00
Ilias Stamatis
edcfe54058 KVM: X86: Move write_l1_tsc_offset() logic to common code and rename it
The write_l1_tsc_offset() callback has a misleading name. It does not
set L1's TSC offset; rather, it updates the current TSC offset, which
might be different if a nested guest is executing. Additionally, both
the vmx and svm implementations use the same logic for calculating the
current TSC offset before writing it to hardware.

Rename the function and move the common logic to the caller. The vmx/svm
specific code now merely sets the given offset to the corresponding
hardware structure.
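
A sketch of the reworked common-code path (helper names assumed from the series):

    static void kvm_vcpu_write_tsc_offset(struct kvm_vcpu *vcpu, u64 l1_offset)
    {
            vcpu->arch.l1_tsc_offset = l1_offset;

            if (is_guest_mode(vcpu))
                    vcpu->arch.tsc_offset = kvm_calc_nested_tsc_offset(
                            l1_offset,
                            static_call(kvm_x86_get_l2_tsc_offset)(vcpu),
                            static_call(kvm_x86_get_l2_tsc_multiplier)(vcpu));
            else
                    vcpu->arch.tsc_offset = l1_offset;

            /* vendor code now merely writes the given offset to hardware */
            static_call(kvm_x86_write_tsc_offset)(vcpu, vcpu->arch.tsc_offset);
    }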

Signed-off-by: Ilias Stamatis <ilstam@amazon.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20210526184418.28881-9-ilstam@amazon.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-06-17 13:09:29 -04:00
Ilias Stamatis
307a94c721 KVM: X86: Add functions for retrieving L2 TSC fields from common code
In order to implement as much of the nested TSC scaling logic as
possible in common code, we need these vendor callbacks for retrieving
the TSC offset and the TSC multiplier that L1 has set for L2.
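
On the VMX side, for example, the getters would simply hand back what L1 established in vmcs12 (a sketch, not the verbatim patch):

    static u64 vmx_get_l2_tsc_offset(struct kvm_vcpu *vcpu)
    {
            struct vmcs12 *vmcs12 = get_vmcs12(vcpu);

            if (nested_cpu_has(vmcs12, CPU_BASED_USE_TSC_OFFSETTING))
                    return vmcs12->tsc_offset;
            return 0;
    }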

Signed-off-by: Ilias Stamatis <ilstam@amazon.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20210526184418.28881-7-ilstam@amazon.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-06-17 13:09:28 -04:00
Alper Gun
934002cd66 KVM: SVM: Call SEV Guest Decommission if ASID binding fails
Send the SEV_CMD_DECOMMISSION command to PSP firmware if ASID binding
fails. If a failure happens after a successful LAUNCH_START command,
a decommission command should be executed; otherwise the guest context
is left unfreed inside the AMD SP. Once the firmware no longer has
memory to allocate more SEV guest contexts, the LAUNCH_START command
will begin to fail with SEV_RET_RESOURCE_LIMIT.

The existing code calls decommission inside sev_unbind_asid, but that
path is not reached if a failure happens before guest activation
succeeds: if sev_bind_asid fails, decommission is never called. PSP
firmware has a limit on the number of guests, so if ASID binding fails
many times, the firmware will not have resources to create another
guest context.
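
A sketch of the fix (helper shape assumed; sev_guest_decommission() is the PSP driver entry point):

    static void sev_decommission(unsigned int handle)
    {
            struct sev_data_decommission decommission;

            if (!handle)
                    return;

            decommission.handle = handle;
            sev_guest_decommission(&decommission, NULL);
    }

    /* in sev_launch_start(), after a successful LAUNCH_START: */
    if (sev_bind_asid(kvm, start.handle, error)) {
            sev_decommission(start.handle);  /* don't leak the firmware context */
            goto e_free_session;
    }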

Cc: stable@vger.kernel.org
Fixes: 59414c9892 ("KVM: SVM: Add support for KVM_SEV_LAUNCH_START command")
Reported-by: Peter Gonda <pgonda@google.com>
Signed-off-by: Alper Gun <alpergun@google.com>
Reviewed-by: Marc Orr <marcorr@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20210610174604.2554090-1-alpergun@google.com>
2021-06-11 11:52:48 -04:00
ChenXiaoSong
02ffbe6351 KVM: SVM: fix doc warnings
Fix kernel-doc warnings:

arch/x86/kvm/svm/avic.c:233: warning: Function parameter or member 'activate' not described in 'avic_update_access_page'
arch/x86/kvm/svm/avic.c:233: warning: Function parameter or member 'kvm' not described in 'avic_update_access_page'
arch/x86/kvm/svm/avic.c:781: warning: Function parameter or member 'e' not described in 'get_pi_vcpu_info'
arch/x86/kvm/svm/avic.c:781: warning: Function parameter or member 'kvm' not described in 'get_pi_vcpu_info'
arch/x86/kvm/svm/avic.c:781: warning: Function parameter or member 'svm' not described in 'get_pi_vcpu_info'
arch/x86/kvm/svm/avic.c:781: warning: Function parameter or member 'vcpu_info' not described in 'get_pi_vcpu_info'
arch/x86/kvm/svm/avic.c:1009: warning: This comment starts with '/**', but isn't a kernel-doc comment. Refer Documentation/doc-guide/kernel-doc.rst
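
For reference, the kernel-doc shape these warnings ask for looks like the following (parameter descriptions here are illustrative, not taken from the patch):

    /**
     * avic_update_access_page - activate or deactivate the AVIC access page
     * @kvm:      the VM whose APIC access page is updated
     * @activate: true to activate the page, false to deactivate it
     */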

Signed-off-by: ChenXiaoSong <chenxiaosong2@huawei.com>
Message-Id: <20210609122217.2967131-1-chenxiaosong2@huawei.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-06-10 07:46:54 -04:00
Linus Torvalds
2f673816b2 Bugfixes, including a TLB flush fix that affects processors
without nested page tables.
 -----BEGIN PGP SIGNATURE-----
 
 iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmDAVpQUHHBib256aW5p
 QHJlZGhhdC5jb20ACgkQv/vSX3jHroNkOgf9F97eFxAdod3/wbW9EbsUPR5bMTLE
 +R6Hmvw+yCm/W2cycVGdCSh1BEKNuZN/XfHln2cYVfVr6ndog58A4Y0urFAhTROv
 IHs8TCA5biQitoZ716l88ExOitnqJiSmMhGex969+zm1Lb9MQo1KA/zxERlqCi3s
 Pfcxb6I8VbD9LEb6NaQdDgQoslJo1tzhe9gGYAYrpMOZujpj1RPeIOZIfeII0MP/
 g14/JSar8cXc9QJ6zbiKn8HhpmzGJnaIsyFFL2RMIBlKvxsnpOU6VmisLTL9407o
 P246Vq59BM8pdRCVUW9W9hLr2ho8lmi+ZYXASCm+qfn8cLaHyRCqSK56ZQ==
 =nW43
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull kvm fixes from Paolo Bonzini:
 "Bugfixes, including a TLB flush fix that affects processors without
  nested page tables"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  kvm: fix previous commit for 32-bit builds
  kvm: avoid speculation-based attacks from out-of-range memslot accesses
  KVM: x86: Unload MMU on guest TLB flush if TDP disabled to force MMU sync
  KVM: x86: Ensure liveliness of nested VM-Enter fail tracepoint message
  selftests: kvm: Add support for customized slot0 memory size
  KVM: selftests: introduce P47V64 for s390x
  KVM: x86: Ensure PV TLB flush tracepoint reflects KVM behavior
  KVM: X86: MMU: Use the correct inherited permissions to get shadow page
  KVM: LAPIC: Write 0 to TMICT should also cancel vmx-preemption timer
  KVM: SVM: Fix SEV SEND_START session length & SEND_UPDATE_DATA query length after commit 238eca821c
2021-06-09 13:09:57 -07:00
Ashish Kalra
4f13d471e5 KVM: SVM: Fix SEV SEND_START session length & SEND_UPDATE_DATA query length after commit 238eca821c
Commit 238eca821c ("KVM: SVM: Allocate SEV command structures on local stack")
uses the local stack to allocate the structures used to communicate with the
PSP, which were previously being kzalloc'ed. This breaks SEV live migration
when computing the SEND_START session length and the SEND_UPDATE_DATA query
lengths: the session_len, hdr_len and trans_len fields are not zeroed for
these commands before issuing the SEV firmware API call, so the firmware
returns an incorrect session length and update-data header or trans length.

Also, the SEV firmware API returns the SEV_RET_INVALID_LEN firmware error for
these length-query API calls, and the return value and the firmware error
need to be passed to userspace as is, so the return check in the KVM code
needs to be removed.
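
The fix boils down to clearing the on-stack structures before the length-query calls, e.g. (a sketch):

    struct sev_data_send_start data;

    /* stack memory isn't zeroed the way kzalloc'ed memory was; clear it
     * so the firmware sees session_len == 0 and performs a length query */
    memset(&data, 0, sizeof(data));
    data.handle = sev->handle;
    ret = sev_issue_cmd(kvm, SEV_CMD_SEND_START, &data, &argp->error);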

Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
Message-Id: <20210607061532.27459-1-Ashish.Kalra@amd.com>
Fixes: 238eca821c ("KVM: SVM: Allocate SEV command structures on local stack")
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-06-08 12:21:55 -04:00
Linus Torvalds
224478289c ARM fixes:
* Another state update on exit to userspace fix
 
 * Prevent the creation of mixed 32/64 VMs
 
 * Fix regression with irqbypass not restarting the guest on failed connect
 
 * Fix regression with debug register decoding resulting in overlapping access
 
 * Commit exception state on exit to userspace
 
 * Fix the MMU notifier return values
 
 * Add missing 'static' qualifiers in the new host stage-2 code
 
 x86 fixes:
 * fix guest missed wakeup with assigned devices
 
 * fix WARN reported by syzkaller
 
 * do not use BIT() in UAPI headers
 
 * make the kvm_amd.avic parameter bool
 
 PPC fixes:
 * make halt polling heuristics consistent with other architectures
 
 selftests:
 * various fixes
 
 * new performance selftest memslot_perf_test
 
 * test UFFD minor faults in demand_paging_test
 -----BEGIN PGP SIGNATURE-----
 
 iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmCyF0MUHHBib256aW5p
 QHJlZGhhdC5jb20ACgkQv/vSX3jHroOHSgf/Q4Hm5e12Bj2xJy6A+iShnrbbT8PW
 hcIIOA7zGWXfjVYcBV7anbj7CcpzfIz0otcRBABa5mkhj+fb3YmPEb0EzCPi4Hru
 zxpcpB2w7W7WtUOIKe2EmaT+4Pk6/iLcfr8UMHMqx460akE9OmIg10QNWai3My/3
 RIOeakSckBI9e/1TQZbxH66dsLwCT0lLco7i7AWHdFxkzUQyoA34HX5pczOCBsO5
 3nXH+/txnRVhqlcyzWLVVGVzFqmpHtBqkIInDOXfUqIoxo/gOhOgF1QdMUEKomxn
 5ZFXlL5IXNtr+7yiI67iHX7CWkGZE9oJ04TgPHn6LR6wRnVvc3JInzcB5Q==
 =ollO
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull KVM fixes from Paolo Bonzini:
 "ARM fixes:

   - Another state update on exit to userspace fix

   - Prevent the creation of mixed 32/64 VMs

   - Fix regression with irqbypass not restarting the guest on failed
     connect

   - Fix regression with debug register decoding resulting in
     overlapping access

   - Commit exception state on exit to userspace

   - Fix the MMU notifier return values

   - Add missing 'static' qualifiers in the new host stage-2 code

  x86 fixes:

   - fix guest missed wakeup with assigned devices

   - fix WARN reported by syzkaller

   - do not use BIT() in UAPI headers

   - make the kvm_amd.avic parameter bool

  PPC fixes:

   - make halt polling heuristics consistent with other architectures

  selftests:

   - various fixes

   - new performance selftest memslot_perf_test

   - test UFFD minor faults in demand_paging_test"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (44 commits)
  selftests: kvm: fix overlapping addresses in memslot_perf_test
  KVM: X86: Kill off ctxt->ud
  KVM: X86: Fix warning caused by stale emulation context
  KVM: X86: Use kvm_get_linear_rip() in single-step and #DB/#BP interception
  KVM: x86/mmu: Fix comment mentioning skip_4k
  KVM: VMX: update vcpu posted-interrupt descriptor when assigning device
  KVM: rename KVM_REQ_PENDING_TIMER to KVM_REQ_UNBLOCK
  KVM: x86: add start_assignment hook to kvm_x86_ops
  KVM: LAPIC: Narrow the timer latency between wait_lapic_expire and world switch
  selftests: kvm: do only 1 memslot_perf_test run by default
  KVM: X86: Use _BITUL() macro in UAPI headers
  KVM: selftests: add shared hugetlbfs backing source type
  KVM: selftests: allow using UFFD minor faults for demand paging
  KVM: selftests: create alias mappings when using shared memory
  KVM: selftests: add shmem backing source type
  KVM: selftests: refactor vm_mem_backing_src_type flags
  KVM: selftests: allow different backing source types
  KVM: selftests: compute correct demand paging size
  KVM: selftests: simplify setup_demand_paging error handling
  KVM: selftests: Print a message if /dev/kvm is missing
  ...
2021-05-29 06:02:25 -10:00
Paolo Bonzini
28a4aa1160 KVM: SVM: make the avic parameter a bool
Make it consistent with kvm_intel.enable_apicv.

Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-05-24 13:10:27 -04:00
Vitaly Kuznetsov
778a136e48 KVM: SVM: Drop unneeded CONFIG_X86_LOCAL_APIC check
AVIC dependency on CONFIG_X86_LOCAL_APIC is dead code since
commit e42eef4ba3 ("KVM: add X86_LOCAL_APIC dependency").

Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20210518144339.1987982-2-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Sean Christopherson <seanjc@google.com>
2021-05-24 18:47:39 +02:00
Paolo Bonzini
a4345a7cec KVM/arm64 fixes for 5.13, take #1
- Fix regression with irqbypass not restarting the guest on failed connect
 - Fix regression with debug register decoding resulting in overlapping access
 - Commit exception state on exit to usrspace
 - Fix the MMU notifier return values
 - Add missing 'static' qualifiers in the new host stage-2 code
 -----BEGIN PGP SIGNATURE-----
 
 iQJDBAABCgAtFiEEn9UcU+C1Yxj9lZw9I9DQutE9ekMFAmCfmUoPHG1hekBrZXJu
 ZWwub3JnAAoJECPQ0LrRPXpDei8QAMOWMA9wFTydsMTyRwDDZzD9i3Vg4bYlTdj1
 1C1FiHHGL37t44coo1eHtnydWBuhxhhwDHWQE8owFbDHyOnPzEX+NwhmJ4gVlUW5
 51aSxfPgXzKiv17WyncqZO9SfA5/RFyA/C2gRq9/fMr/7CpQJjqrvdQXaWh4kPVa
 9jFMVd1sCDUPd5c9Jyxd42CmVZjg6mCorOKaEwlI7NZkulRBlFW21A5y+M57sGTF
 RLIuQcggFJaG17kZN4p6v55Yoclt8O4xVbDv8SZV3vO1gjpaF1LtXdsmAKvbDZrZ
 lEtdumPHyD1maFhwXQFMOyvOgEaRhlhiNaTgKUOyX2LgeW1utCiYO/KwysflZvIC
 oLsfx3x+G0nSxa+MWGL9m52Hrt4yyscfbKfBg6nqJB+AqD3teH20xfsEUHTEuYkW
 kEgeWcJcWkadL5+ngs6S4PwFr88NyVBdUAagNd5VXE/KFhxCcr4B9oOXk5WdOaMi
 ZvLG5IQfIH6k3w+h2wR2WSoxYwltriZ3PwrPIeJ2Se33bK15xtQy1k/IIqvZP/oK
 0xxRVoY+nwuru0QZGwyI7zCFFvZzEKOXJ3qzJ2NeQxoTBky/e0bvUwnU8gXLXGPM
 lx2Gzw6t+xlTfcF9oIaQq7WlOsrC7Zr4uiTurZGLZKWklso9tLdzW35zmdN6D3qx
 sP2LC4iv
 =57tg
 -----END PGP SIGNATURE-----

Merge tag 'kvmarm-fixes-5.13-1' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD

KVM/arm64 fixes for 5.13, take #1

- Fix regression with irqbypass not restarting the guest on failed connect
- Fix regression with debug register decoding resulting in overlapping access
- Commit exception state on exit to userspace
- Fix the MMU notifier return values
- Add missing 'static' qualifiers in the new host stage-2 code
2021-05-17 09:55:12 +02:00
Linus Torvalds
ccb013c29d - Enable -Wundef for the compressed kernel build stage
- Reorganize SEV code to streamline and simplify future development
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmCg1XQACgkQEsHwGGHe
 VUpRKA//dwzDD1QU16JucfhgFlv/9OTm48ukSwAb9lZjDEy4H1CtVL3xEHFd7L3G
 LJp0LTW+OQf0/0aGlQp/cP6sBF6G9Bf4mydx70Id4SyCQt8eZDodB+ZOOWbeteWq
 p92fJPbX8CzAglutbE+3v/MD8CCAllTiLZnJZPVj4Kux2/wF6EryDgF1+rb5q8jp
 ObTT9817mHVwWVUYzbgceZtd43IocOlKZRmF1qivwScMGylQTe1wfMjunpD5pVt8
 Zg4UDNknNfYduqpaG546E6e1zerGNaJK7SHnsuzHRUVU5icNqtgBk061CehP9Ksq
 DvYXLUl4xF16j6xJAqIZPNrBkJGdQf4q1g5x2FiBm7rSQU5owzqh5rkVk4EBFFzn
 UtzeXpqbStbsZHXycyxBNdq2HXxkFPf2NXZ+bkripPg+DifOGots1uwvAft+6iAE
 GudK6qxAvr8phR1cRyy6BahGtgOStXbZYEz0ZdU6t7qFfZMz+DomD5Jimj0kAe6B
 s6ras5xm8q3/Py87N/KNjKtSEpgsHv/7F+idde7ODtHhpRL5HCBqhkZOSRkMMZqI
 ptX1oSTvBXwRKyi5x9YhkKHUFqfFSUTfJhiRFCWK+IEAv3Y7SipJtfkqxRbI6fEV
 FfCeueKDDdViBtseaRceVLJ8Tlr6Qjy27fkPPTqJpthqPpCdoZ0=
 =ENfF
 -----END PGP SIGNATURE-----

Merge tag 'x86_urgent_for_v5.13_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 fixes from Borislav Petkov:
 "The three SEV commits are not really urgent material. But we figured
  since getting them in now will avoid a huge amount of conflicts
  between future SEV changes touching tip, the kvm and probably other
  trees, sending them to you now would be best.

  The idea is that the tip, kvm etc branches for 5.14 will all base
  on top of -rc2 and thus everything will be peachy. What is more,
  those changes are purely mechanical, just moving defines around, so
  they should be fine to go now (famous last words).

  Summary:

   - Enable -Wundef for the compressed kernel build stage

   - Reorganize SEV code to streamline and simplify future development"

* tag 'x86_urgent_for_v5.13_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/boot/compressed: Enable -Wundef
  x86/msr: Rename MSR_K8_SYSCFG to MSR_AMD64_SYSCFG
  x86/sev: Move GHCB MSR protocol and NAE definitions in a common header
  x86/sev-es: Rename sev-es.{ch} to sev.{ch}
2021-05-16 09:31:06 -07:00
Linus Torvalds
0aa099a312 * Lots of bug fixes.
* Fix virtualization of RDPID
 
 * Virtualization of DR6_BUS_LOCK, which on bare metal is new in
   the 5.13 merge window
 
 * More nested virtualization migration fixes (nSVM and eVMCS)
 
 * Fix for KVM guest hibernation
 
 * Fix for warning in SEV-ES SRCU usage
 
 * Block KVM from loading on AMD machines with 5-level page tables,
   due to the APM not mentioning how host CR4.LA57 exactly impacts
   the guest.
 -----BEGIN PGP SIGNATURE-----
 
 iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmCZWwgUHHBib256aW5p
 QHJlZGhhdC5jb20ACgkQv/vSX3jHroOE9wgAk7Io8cuvnhC9ogVqzZWrPweWqFg8
 fJcPMB584JRnMqYHBVYbkTPGe8SsCHKR2MKsNdc4cEP111cyr3suWsxOdmjJn58i
 7ahy6PcKx7wWeWwEt7O599l6CeoX5XB9ExvA6eiXAv7iZeOJHFa+Ny2GlWgauy6Y
 DELryEomx1r4IUkZaSR+2fYjzvOWTXQixwU/jwx8NcTJz0DrzknzLE7XOciPBfn0
 t0Q2rCXdL2nF1uPksZbntx8Qoa6t6GDVIyrH/ZCPQYJtAX6cjxNAh3zwCe+hMnOd
 fW8ntBH1nZRiNnberA4IICAzqnUokgPWdKBrZT2ntWHBK+aqxXHznrlPJA==
 =e+gD
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull kvm fixes from Paolo Bonzini:

 - Lots of bug fixes.

 - Fix virtualization of RDPID

 - Virtualization of DR6_BUS_LOCK, which on bare metal is new to this
   release

 - More nested virtualization migration fixes (nSVM and eVMCS)

 - Fix for KVM guest hibernation

 - Fix for warning in SEV-ES SRCU usage

 - Block KVM from loading on AMD machines with 5-level page tables, due
   to the APM not mentioning how host CR4.LA57 exactly impacts the
   guest.

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (48 commits)
  KVM: SVM: Move GHCB unmapping to fix RCU warning
  KVM: SVM: Invert user pointer casting in SEV {en,de}crypt helpers
  kvm: Cap halt polling at kvm->max_halt_poll_ns
  tools/kvm_stat: Fix documentation typo
  KVM: x86: Prevent deadlock against tk_core.seq
  KVM: x86: Cancel pvclock_gtod_work on module removal
  KVM: x86: Prevent KVM SVM from loading on kernels with 5-level paging
  KVM: X86: Expose bus lock debug exception to guest
  KVM: X86: Add support for the emulation of DR6_BUS_LOCK bit
  KVM: PPC: Book3S HV: Fix conversion to gfn-based MMU notifier callbacks
  KVM: x86: Hide RDTSCP and RDPID if MSR_TSC_AUX probing failed
  KVM: x86: Tie Intel and AMD behavior for MSR_TSC_AUX to guest CPU model
  KVM: x86: Move uret MSR slot management to common x86
  KVM: x86: Export the number of uret MSRs to vendor modules
  KVM: VMX: Disable loading of TSX_CTRL MSR the more conventional way
  KVM: VMX: Use common x86's uret MSR list as the one true list
  KVM: VMX: Use flag to indicate "active" uret MSRs instead of sorting list
  KVM: VMX: Configure list of user return MSRs at module init
  KVM: x86: Add support for RDPID without RDTSCP
  KVM: SVM: Probe and load MSR_TSC_AUX regardless of RDTSCP support in host
  ...
2021-05-10 12:30:45 -07:00
Brijesh Singh
059e5c321a x86/msr: Rename MSR_K8_SYSCFG to MSR_AMD64_SYSCFG
The SYSCFG MSR continued being updated beyond the K8 family; drop the K8
name from it.

Suggested-by: Borislav Petkov <bp@alien8.de>
Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Joerg Roedel <jroedel@suse.de>
Link: https://lkml.kernel.org/r/20210427111636.1207-4-brijesh.singh@amd.com
2021-05-10 07:51:38 +02:00
Brijesh Singh
b81fc74d53 x86/sev: Move GHCB MSR protocol and NAE definitions in a common header
The guest and the hypervisor contain separate macros to get and set
the GHCB MSR protocol and NAE event fields. Consolidate the GHCB
protocol definitions and helper macros in one place.

Leave the supported protocol version define in separate files to keep
the flexibility for the guest and the hypervisor to support different
GHCB versions in the same release.

There is no functional change intended.

Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Joerg Roedel <jroedel@suse.de>
Link: https://lkml.kernel.org/r/20210427111636.1207-3-brijesh.singh@amd.com
2021-05-10 07:46:39 +02:00
Tom Lendacky
ce7ea0cfdc KVM: SVM: Move GHCB unmapping to fix RCU warning
When an SEV-ES guest is running, the GHCB is unmapped as part of the
vCPU run support. However, kvm_vcpu_unmap() triggers an RCU dereference
warning with CONFIG_PROVE_LOCKING=y because the SRCU lock is released
before invoking the vCPU run support.

Move the GHCB unmapping into the prepare_guest_switch callback, which is
invoked while still holding the SRCU lock, eliminating the RCU dereference
warning.
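
A sketch of the move (callback per the message; the unmap helper name is assumed):

    static void svm_prepare_guest_switch(struct kvm_vcpu *vcpu)
    {
            struct vcpu_svm *svm = to_svm(vcpu);

            /* SRCU is still held here, unlike in the vCPU run path, so
             * the unmap can safely dereference the memslots */
            if (sev_es_guest(vcpu->kvm))
                    sev_es_unmap_ghcb(svm);
    }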

Fixes: 291bd20d5d ("KVM: SVM: Add initial support for a VMGEXIT VMEXIT")
Reported-by: Borislav Petkov <bp@alien8.de>
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Message-Id: <b2f9b79d15166f2c3e4375c0d9bc3268b7696455.1620332081.git.thomas.lendacky@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-05-07 06:06:23 -04:00
Sean Christopherson
368340a3c7 KVM: SVM: Invert user pointer casting in SEV {en,de}crypt helpers
Invert the user pointer params for SEV's helpers for encrypting and
decrypting guest memory so that they take a pointer and cast to an
unsigned long as necessary, as opposed to doing the opposite.  Tagging a
non-pointer as __user is confusing and weird since a cast of some form
needs to occur to actually access the user data.  This also fixes Sparse
warnings triggered by directly consuming the unsigned longs, which are
"noderef" due to the __user tag.

Cc: Brijesh Singh <brijesh.singh@amd.com>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: Ashish Kalra <ashish.kalra@amd.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210506231542.2331138-1-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-05-07 06:06:22 -04:00
Sean Christopherson
03ca4589fa KVM: x86: Prevent KVM SVM from loading on kernels with 5-level paging
Disallow loading KVM SVM if 5-level paging is supported.  In theory, NPT
for L1 should simply work, but there are unknowns with respect to how the
guest's MAXPHYADDR will be handled by hardware.

Nested NPT is more problematic, as running an L1 VMM that is using
2-level page tables requires stacking single-entry PDP and PML4 tables in
KVM's NPT for L2, as there are no equivalent entries in L1's NPT to
shadow.  Barring hardware magic, for 5-level paging, KVM would need to
stack another layer to handle PML5.

Opportunistically rename the lm_root pointer, which is used for the
aforementioned stacking when shadowing 2-level L1 NPT, to pml4_root to
call out that it's specifically for PML4.
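
The guard itself is tiny; something along these lines in the module init path (placement and message are assumptions):

    if (pgtable_l5_enabled()) {
            pr_err("KVM doesn't yet support 5-level paging on AMD SVM\n");
            return -ENODEV;
    }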

Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210505204221.1934471-1-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-05-07 06:06:21 -04:00
Sean Christopherson
61a05d444d KVM: x86: Tie Intel and AMD behavior for MSR_TSC_AUX to guest CPU model
Squish the Intel and AMD emulation of MSR_TSC_AUX together and tie it to
the guest CPU model instead of the host CPU behavior.  While not strictly
necessary to avoid guest breakage, emulating cross-vendor "architecture"
will provide consistent behavior for the guest, e.g. WRMSR fault behavior
won't change if the vCPU is migrated to a host with divergent behavior.

Note, the "new" kvm_is_supported_user_return_msr() checks do not add new
functionality on either SVM or VMX.  On SVM, the equivalent was
"tsc_aux_uret_slot < 0", and on VMX the check was buried in the
vmx_find_uret_msr() call at the find_uret_msr label.
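
Conceptually, the common MSR handler now gates MSR_TSC_AUX on the guest CPUID (a sketch):

    case MSR_TSC_AUX:
            if (!kvm_is_supported_user_return_msr(MSR_TSC_AUX))
                    return 1;

            if (!msr_info->host_initiated &&
                !guest_cpuid_has(vcpu, X86_FEATURE_RDTSCP) &&
                !guest_cpuid_has(vcpu, X86_FEATURE_RDPID))
                    return 1;  /* surfaces as #GP for guest-initiated accesses */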

Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210504171734.1434054-15-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-05-07 06:06:19 -04:00
Sean Christopherson
e5fda4bbad KVM: x86: Move uret MSR slot management to common x86
Now that SVM and VMX both probe MSRs before "defining" user return slots
for them, consolidate the code for probe+define into common x86 and
eliminate the odd behavior of having the vendor code define the slot for
a given MSR.
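
A sketch of the consolidated probe+define helper (the exact shape is an assumption; the commit only describes the consolidation):

    int kvm_add_user_return_msr(u32 msr)
    {
            if (kvm_nr_uret_msrs >= KVM_MAX_NR_USER_RETURN_MSRS)
                    return -1;

            /* probe: don't define a slot for an MSR the host lacks */
            if (kvm_probe_user_return_msr(msr))
                    return -1;

            kvm_uret_msrs_list[kvm_nr_uret_msrs] = msr;
            return kvm_nr_uret_msrs++;  /* slot index handed to vendor code */
    }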

Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210504171734.1434054-14-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-05-07 06:06:19 -04:00
Sean Christopherson
36fa06f9ff KVM: x86: Add support for RDPID without RDTSCP
Allow userspace to enable RDPID for a guest without also enabling RDTSCP.
Aside from checking for RDPID support in the obvious flows, VMX also needs
to set ENABLE_RDTSCP=1 when RDPID is exposed.

For the record, there is no known scenario where enabling RDPID without
RDTSCP is desirable.  But, both AMD and Intel architectures allow for the
condition, i.e. this is purely to make KVM more architecturally accurate.

Fixes: 41cd02c6f7 ("kvm: x86: Expose RDPID in KVM_GET_SUPPORTED_CPUID")
Cc: stable@vger.kernel.org
Reported-by: Reiji Watanabe <reijiw@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210504171734.1434054-8-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-05-07 06:06:17 -04:00
Sean Christopherson
0caa0a77c2 KVM: SVM: Probe and load MSR_TSC_AUX regardless of RDTSCP support in host
Probe MSR_TSC_AUX whether or not RDTSCP is supported in the host, and
if probing succeeds, load the guest's MSR_TSC_AUX into hardware prior to
VMRUN.  Because SVM doesn't support interception of RDPID, RDPID cannot
be disallowed in the guest (without resorting to binary translation).
Leaving the host's MSR_TSC_AUX in hardware would leak the host's value to
the guest if RDTSCP is not supported.

Note, there is also a kernel bug that prevents leaking the host's value.
The host kernel initializes MSR_TSC_AUX if and only if RDTSCP is
supported, even though the vDSO usage consumes MSR_TSC_AUX via RDPID.
I.e. if RDTSCP is not supported, there is no host value to leak.  But,
if/when the host kernel bug is fixed, KVM would start leaking MSR_TSC_AUX
in the case where hardware supports RDPID but RDTSCP is unavailable for
whatever reason.

Probing MSR_TSC_AUX will also allow consolidating the probe and define
logic in common x86, and will make it simpler to condition the existence
of MSR_TSC_AUX (from the guest's perspective) on RDTSCP *or* RDPID.

Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210504171734.1434054-7-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-05-07 06:06:17 -04:00
Sean Christopherson
3b195ac926 KVM: SVM: Inject #UD on RDTSCP when it should be disabled in the guest
Intercept RDTSCP to inject #UD if RDTSCP is disabled in the guest.

Note, SVM does not support intercepting RDPID.  Unlike VMX's
ENABLE_RDTSCP control, RDTSCP interception does not apply to RDPID.  This
is a benign virtualization hole as the host kernel (incorrectly) sets
MSR_TSC_AUX if RDTSCP is supported, and KVM loads the guest's MSR_TSC_AUX
into hardware if RDTSCP is supported in the host, i.e. KVM will not leak
the host's MSR_TSC_AUX to the guest.

But, when the kernel bug is fixed, KVM will start leaking the host's
MSR_TSC_AUX if RDPID is supported in hardware, but RDTSCP isn't available
for whatever reason.  This leak will be remedied in a future commit.
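
Conceptually the new intercept handler is (a sketch; handler name and fall-back path assumed):

    static int rdtscp_interception(struct kvm_vcpu *vcpu)
    {
            if (!guest_cpuid_has(vcpu, X86_FEATURE_RDTSCP)) {
                    kvm_queue_exception(vcpu, UD_VECTOR);
                    return 1;
            }
            return kvm_emulate_instruction(vcpu, 0);
    }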

Fixes: 46896c73c1 ("KVM: svm: add support for RDTSCP")
Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210504171734.1434054-4-seanjc@google.com>
Reviewed-by: Jim Mattson <jmattson@google.com>
Reviewed-by: Reiji Watanabe <reijiw@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-05-07 06:06:15 -04:00
Maxim Levitsky
809c79137a KVM: nSVM: remove a warning about vmcb01 VM exit reason
While in most cases the exit reason stored in VMCB01 will be
SVM_EXIT_VMRUN when KVM returns to it, on the first VM exit after a
nested migration this field can contain anything, since the VM entry
happened before the migration.

Remove this warning to avoid the false positive.

Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20210504143936.1644378-3-mlevitsk@redhat.com>
Fixes: 9a7de6ecc3 ("KVM: nSVM: If VMRUN is single-stepped, queue the #DB intercept in nested_svm_vmexit()")
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-05-07 06:06:14 -04:00
Maxim Levitsky
063ab16c14 KVM: nSVM: always restore the L1's GIF on migration
While L1's GIF is usually set while L2 runs, and nested state is
usually loaded via migration after a vCPU reset (which also sets L1's
GIF to true), neither is guaranteed.

Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20210504143936.1644378-2-mlevitsk@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-05-07 06:06:14 -04:00
Sean Christopherson
bc908e091b KVM: x86: Consolidate guest enter/exit logic to common helpers
Move the enter/exit logic in {svm,vmx}_vcpu_enter_exit() to common
helpers.  Opportunistically update the somewhat stale comment about the
updates needing to occur immediately after VM-Exit.

No functional change intended.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20210505002735.1684165-9-seanjc@google.com
2021-05-05 22:54:12 +02:00
Wanpeng Li
1604571401 KVM: x86: Defer vtime accounting 'til after IRQ handling
Defer the call to account guest time until after servicing any IRQ(s)
that happened in the guest or immediately after VM-Exit.  Tick-based
accounting of vCPU time relies on PF_VCPU being set when the tick IRQ
handler runs, and IRQs are blocked throughout the main sequence of
vcpu_enter_guest(), including the call into vendor code to actually
enter and exit the guest.

This fixes a bug where reported guest time remains '0', even when
running an infinite loop in the guest:

  https://bugzilla.kernel.org/show_bug.cgi?id=209831
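
The resulting ordering in vcpu_enter_guest() looks roughly like this (a sketch):

    /* open a brief IRQ window so any tick that fired while the guest ran
     * is serviced while PF_VCPU is still set */
    kvm_before_interrupt(vcpu);
    local_irq_enable();
    ++vcpu->stat.exits;  /* ensures IRQs are unblocked past any STI shadow */
    local_irq_disable();
    kvm_after_interrupt(vcpu);

    /* only now drop PF_VCPU and account guest time */
    vtime_account_guest_exit();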

Fixes: 87fa7f3e98 ("x86/kvm: Move context tracking where it belongs")
Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Co-developed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20210505002735.1684165-4-seanjc@google.com
2021-05-05 22:54:11 +02:00
Colin Ian King
8899a5fc7d KVM: x86: Fix potential fput on a null source_kvm_file
The fget can potentially return null, so the fput on the error return
path can cause a null pointer dereference. Fix this by checking for
a null source_kvm_file before doing a fput.
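
The fix is the standard guarded-put pattern (a sketch):

    source_kvm_file = fget(source_fd);
    if (!file_is_kvm(source_kvm_file)) {
            ret = -EBADF;
            goto e_source_put;
    }
    /* ... */
e_source_put:
    if (source_kvm_file)            /* fget() can return NULL */
            fput(source_kvm_file);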

Addresses-Coverity: ("Dereference null return")
Fixes: 54526d1fd5 ("KVM: x86: Support KVM VMs sharing SEV context")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Message-Id: <20210430170303.131924-1-colin.king@canonical.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-05-03 11:25:40 -04:00
Maxim Levitsky
9d290e1643 KVM: nSVM: leave the guest mode prior to loading a nested state
This allows KVM to load the nested state more than once without
warnings.

Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20210503125446.1353307-4-mlevitsk@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-05-03 11:25:38 -04:00
Maxim Levitsky
c74ad08f33 KVM: nSVM: fix few bugs in the vmcb02 caching logic
* Define and use an invalid GPA (all ones) as the init value of the
  last and current nested vmcb physical addresses.

* Reset the current vmcb12 gpa to the invalid value when leaving
  nested mode, similar to what is done on nested vmexit.

* Reset the last seen vmcb12 address when disabling nested SVM, as
  the tracking relies on vmcb02 fields which are freed at that point.

Fixes: 4995a3685f ("KVM: SVM: Use a separate vmcb for the nested L2 guest")

Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20210503125446.1353307-3-mlevitsk@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-05-03 11:25:37 -04:00
Maxim Levitsky
deee59bacb KVM: nSVM: fix a typo in svm_leave_nested
When forcibly leaving nested mode, we should switch to vmcb01.

Fixes: 4995a3685f ("KVM: SVM: Use a separate vmcb for the nested L2 guest")

Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20210503125446.1353307-2-mlevitsk@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-05-03 11:25:36 -04:00
Linus Torvalds
152d32aa84 ARM:
- Stage-2 isolation for the host kernel when running in protected mode
 
 - Guest SVE support when running in nVHE mode
 
 - Force W^X hypervisor mappings in nVHE mode
 
 - ITS save/restore for guests using direct injection with GICv4.1
 
 - nVHE panics now produce readable backtraces
 
 - Guest support for PTP using the ptp_kvm driver
 
 - Performance improvements in the S2 fault handler
 
 x86:
 
 - Optimizations and cleanup of nested SVM code
 
 - AMD: Support for virtual SPEC_CTRL
 
 - Optimizations of the new MMU code: fast invalidation,
   zap under read lock, enable/disable dirty page logging under
   read lock
 
 - /dev/kvm API for AMD SEV live migration (guest API coming soon)
 
 - support SEV virtual machines sharing the same encryption context
 
 - support SGX in virtual machines
 
 - add a few more statistics
 
 - improved directed yield heuristics
 
 - Lots and lots of cleanups
 
 Generic:
 
 - Rework of MMU notifier interface, simplifying and optimizing
 the architecture-specific code
 
 - Some selftests improvements
 -----BEGIN PGP SIGNATURE-----
 
 iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmCJ13kUHHBib256aW5p
 QHJlZGhhdC5jb20ACgkQv/vSX3jHroM1HAgAqzPxEtiTPTFeFJV5cnPPJ3dFoFDK
 y/juZJUQ1AOtvuWzzwuf175ewkv9vfmtG6rVohpNSkUlJYeoc6tw7n8BTTzCVC1b
 c/4Dnrjeycr6cskYlzaPyV6MSgjSv5gfyj1LA5UEM16LDyekmaynosVWY5wJhju+
 Bnyid8l8Utgz+TLLYogfQJQECCrsU0Wm//n+8TWQgLf1uuiwshU5JJe7b43diJrY
 +2DX+8p9yWXCTz62sCeDWNahUv8AbXpMeJ8uqZPYcN1P0gSEUGu8xKmLOFf9kR7b
 M4U1Gyz8QQbjd2lqnwiWIkvRLX6gyGVbq2zH0QbhUe5gg3qGUX7JjrhdDQ==
 =AXUi
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull kvm updates from Paolo Bonzini:
 "This is a large update by KVM standards, including AMD PSP (Platform
  Security Processor, aka "AMD Secure Technology") and ARM CoreSight
  (debug and trace) changes.

  ARM:

   - CoreSight: Add support for ETE and TRBE

   - Stage-2 isolation for the host kernel when running in protected
     mode

   - Guest SVE support when running in nVHE mode

   - Force W^X hypervisor mappings in nVHE mode

   - ITS save/restore for guests using direct injection with GICv4.1

   - nVHE panics now produce readable backtraces

   - Guest support for PTP using the ptp_kvm driver

   - Performance improvements in the S2 fault handler

  x86:

   - AMD PSP driver changes

   - Optimizations and cleanup of nested SVM code

   - AMD: Support for virtual SPEC_CTRL

   - Optimizations of the new MMU code: fast invalidation, zap under
     read lock, enable/disable dirty page logging under read lock

   - /dev/kvm API for AMD SEV live migration (guest API coming soon)

   - support SEV virtual machines sharing the same encryption context

   - support SGX in virtual machines

   - add a few more statistics

   - improved directed yield heuristics

   - Lots and lots of cleanups

  Generic:

   - Rework of MMU notifier interface, simplifying and optimizing the
     architecture-specific code

   - a handful of "Get rid of oprofile leftovers" patches

   - Some selftests improvements"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (379 commits)
  KVM: selftests: Speed up set_memory_region_test
  selftests: kvm: Fix the check of return value
  KVM: x86: Take advantage of kvm_arch_dy_has_pending_interrupt()
  KVM: SVM: Skip SEV cache flush if no ASIDs have been used
  KVM: SVM: Remove an unnecessary prototype declaration of sev_flush_asids()
  KVM: SVM: Drop redundant svm_sev_enabled() helper
  KVM: SVM: Move SEV VMCB tracking allocation to sev.c
  KVM: SVM: Explicitly check max SEV ASID during sev_hardware_setup()
  KVM: SVM: Unconditionally invoke sev_hardware_teardown()
  KVM: SVM: Enable SEV/SEV-ES functionality by default (when supported)
  KVM: SVM: Condition sev_enabled and sev_es_enabled on CONFIG_KVM_AMD_SEV=y
  KVM: SVM: Append "_enabled" to module-scoped SEV/SEV-ES control variables
  KVM: SEV: Mask CPUID[0x8000001F].eax according to supported features
  KVM: SVM: Move SEV module params/variables to sev.c
  KVM: SVM: Disable SEV/SEV-ES if NPT is disabled
  KVM: SVM: Free sev_asid_bitmap during init if SEV setup fails
  KVM: SVM: Zero out the VMCB array used to track SEV ASID association
  x86/sev: Drop redundant and potentially misleading 'sev_enabled'
  KVM: x86: Move reverse CPUID helpers to separate header file
  KVM: x86: Rename GPR accessors to make mode-aware variants the defaults
  ...
2021-05-01 10:14:08 -07:00
Linus Torvalds
55e6be657b Merge branch 'for-5.13' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup
Pull cgroup changes from Tejun Heo:
 "The only notable change is Vipin's new misc cgroup controller.

  This implements generic support for resources which can be controlled
  by simply counting and limiting the number of resource instances, i.e.
  there's X number of these on the system and this cgroup subtree can
  have up to Y of those.

  The first user is the address space IDs used for virtual machine
  memory encryption and expected future usages are similar - niche
  hardware features with concrete resource limits and simple usage
  models"

* 'for-5.13' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
  cgroup: use tsk->in_iowait instead of delayacct_is_task_waiting_on_io()
  cgroup/cpuset: fix typos in comments
  cgroup: misc: mark dummy misc_cg_res_total_usage() static inline
  svm/sev: Register SEV and SEV-ES ASIDs to the misc controller
  cgroup: Miscellaneous cgroup documentation.
  cgroup: Add misc cgroup controller
2021-04-27 18:47:42 -07:00
Linus Torvalds
ea5bc7b977 Trivial cleanups and fixes all over the place.
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmCGmYIACgkQEsHwGGHe
 VUr45w/8CSXr7MXaFBj4To0hTWJXSZyF6YGqlZOSJXFcFh4cWTNwfVOoFaV47aDo
 +HsCNTkGENcKhLrDUWDRiG/Uo46jxtOtl1vhq7U4pGemSYH871XWOKfb5k5XNMwn
 /uhaHMI4aEfd6bUFnF518NeyRIsD0BdqFj4tB7RbAiyFwdETDX9Tkj/uBKnQ4zon
 4tEDoXgThuK5YKK9zVQg5pa7aFp2zg1CAdX/WzBkS8BHVBPXSV0CF97AJYQOM/V+
 lUHv+BN3wp97GYHPQMPsbkNr8IuFoe2mIvikwjxg8iOFpzEU1G1u09XV9R+PXByX
 LclFTRqK/2uU5hJlcsBiKfUuidyErYMRYImbMAOREt2w0ogWVu2zQ7HkjVve25h1
 sQPwPudbAt6STbqRxvpmB3yoV4TCYwnF91FcWgEy+rcEK2BDsHCnScA45TsK5I1C
 kGR1K17pHXprgMZFPveH+LgxewB6smDv+HllxQdSG67LhMJXcs2Epz0TsN8VsXw8
 dlD3lGReK+5qy9FTgO7mY0xhiXGz1IbEdAPU4eRBgih13puu03+jqgMaMabvBWKD
 wax+BWJUrPtetwD5fBPhlS/XdJDnd8Mkv2xsf//+wT0s4p+g++l1APYxeB8QEehm
 Pd7Mvxm4GvQkfE13QEVIPYQRIXCMH/e9qixtY5SHUZDBVkUyFM0=
 =bO1i
 -----END PGP SIGNATURE-----

Merge tag 'x86_cleanups_for_v5.13' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull misc x86 cleanups from Borislav Petkov:
 "Trivial cleanups and fixes all over the place"

* tag 'x86_cleanups_for_v5.13' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  MAINTAINERS: Remove me from IDE/ATAPI section
  x86/pat: Do not compile stubbed functions when X86_PAT is off
  x86/asm: Ensure asm/proto.h can be included stand-alone
  x86/platform/intel/quark: Fix incorrect kernel-doc comment syntax in files
  x86/msr: Make locally used functions static
  x86/cacheinfo: Remove unneeded dead-store initialization
  x86/process/64: Move cpu_current_top_of_stack out of TSS
  tools/turbostat: Unmark non-kernel-doc comment
  x86/syscalls: Fix -Wmissing-prototypes warnings from COND_SYSCALL()
  x86/fpu/math-emu: Fix function cast warning
  x86/msr: Fix wr/rdmsr_safe_regs_on_cpu() prototypes
  x86: Fix various typos in comments, take #2
  x86: Remove unusual Unicode characters from comments
  x86/kaslr: Return boolean values from a function returning bool
  x86: Fix various typos in comments
  x86/setup: Remove unused RESERVE_BRK_ARRAY()
  stacktrace: Move documentation for arch_stack_walk_reliable() to header
  x86: Remove duplicate TSC DEADLINE MSR definitions
2021-04-26 09:25:47 -07:00
Sean Christopherson
469bb32b68 KVM: SVM: Skip SEV cache flush if no ASIDs have been used
Skip SEV's expensive WBINVD and DF_FLUSH if there are no SEV ASIDs
waiting to be reclaimed, e.g. if SEV was never used.  This "fixes" an
issue where the DF_FLUSH fails during hardware teardown if the original
SEV_INIT failed.  Ideally, SEV wouldn't be marked as enabled in KVM if
SEV_INIT fails, but that's a problem for another day.
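
Conceptually (a sketch; the bitmap and helper names follow the surrounding SEV code):

    /* in sev_hardware_teardown(): skip WBINVD + DF_FLUSH when nothing is
     * awaiting reclamation, e.g. because SEV_INIT failed or never ran */
    if (!bitmap_empty(sev_reclaim_asid_bitmap, max_sev_asid))
            sev_flush_asids();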

Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210422021125.3417167-16-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-04-26 05:27:17 -04:00
Sean Christopherson
82b7ae0481 KVM: SVM: Remove an unnecessary prototype declaration of sev_flush_asids()
Remove the forward declaration of sev_flush_asids(), which is only a few
lines above the function itself.

No functional change intended.

Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com>
Reviewed-by: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210422021125.3417167-15-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-04-26 05:27:17 -04:00
Sean Christopherson
a5c1c5aad6 KVM: SVM: Drop redundant svm_sev_enabled() helper
Replace calls to svm_sev_enabled() with direct checks on sev_enabled, or
in the case of svm_mem_enc_op, simply drop the call to svm_sev_enabled().
This effectively replaces checks against a valid max_sev_asid with checks
against sev_enabled.  sev_enabled is forced off by sev_hardware_setup()
if max_sev_asid is invalid, all call sites are guaranteed to run after
sev_hardware_setup(), and all of the checks care about SEV being fully
enabled (as opposed to intentionally handling the scenario where
max_sev_asid is valid but SEV enabling fails due to OOM).

Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com>
Reviewed-by: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210422021125.3417167-14-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-04-26 05:27:17 -04:00
Sean Christopherson
b95c221cac KVM: SVM: Move SEV VMCB tracking allocation to sev.c
Move the allocation of the SEV VMCB array to sev.c to help pave the way
toward encapsulating SEV enabling wholly within sev.c.

No functional change intended.

Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com>
Reviewed-by: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210422021125.3417167-13-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-04-26 05:27:17 -04:00
Sean Christopherson
8cb756b7bd KVM: SVM: Explicitly check max SEV ASID during sev_hardware_setup()
Query max_sev_asid directly after setting it instead of bouncing through
its wrapper, svm_sev_enabled().  Using the wrapper is unnecessary
obfuscation.

No functional change intended.

Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com>
Reviewed-by: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210422021125.3417167-12-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-04-26 05:27:16 -04:00
Sean Christopherson
4cafd0c572 KVM: SVM: Unconditionally invoke sev_hardware_teardown()
Remove the redundant svm_sev_enabled() check when calling
sev_hardware_teardown(), the teardown helper itself does the check.
Removing the check from svm.c will eventually allow dropping
svm_sev_enabled() entirely.

No functional change intended.

Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com>
Reviewed-by: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210422021125.3417167-11-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-04-26 05:27:16 -04:00
Sean Christopherson
6c2c7bf580 KVM: SVM: Enable SEV/SEV-ES functionality by default (when supported)
Enable the 'sev' and 'sev_es' module params by default instead of having
them conditioned on CONFIG_AMD_MEM_ENCRYPT_ACTIVE_BY_DEFAULT.  The extra
Kconfig is pointless as KVM SEV/SEV-ES support is already controlled via
CONFIG_KVM_AMD_SEV, and CONFIG_AMD_MEM_ENCRYPT_ACTIVE_BY_DEFAULT has the
unfortunate side effect of enabling all the SEV-ES _guest_ code due to
it being dependent on CONFIG_AMD_MEM_ENCRYPT=y.

Cc: Borislav Petkov <bp@suse.de>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210422021125.3417167-10-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-04-26 05:27:16 -04:00
Sean Christopherson
a479c33484 KVM: SVM: Condition sev_enabled and sev_es_enabled on CONFIG_KVM_AMD_SEV=y
Define sev_enabled and sev_es_enabled as 'false' and explicitly #ifdef
out all of sev_hardware_setup() if CONFIG_KVM_AMD_SEV=n.  This kills
three birds at once:

  - Makes sev_enabled and sev_es_enabled off by default if
    CONFIG_KVM_AMD_SEV=n.  Previously, they could be on by default if
    CONFIG_AMD_MEM_ENCRYPT_ACTIVE_BY_DEFAULT=y, regardless of KVM SEV
    support.

  - Hides the sev and sev_es modules params when CONFIG_KVM_AMD_SEV=n.

  - Resolves a false positive -Wnonnull in __sev_recycle_asids() that is
    currently masked by the equivalent IS_ENABLED(CONFIG_KVM_AMD_SEV)
    check in svm_sev_enabled(), which will be dropped in a future patch.

Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com>
Reviewed-by: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210422021125.3417167-9-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-04-26 05:27:16 -04:00
Sean Christopherson
8d364a0792 KVM: SVM: Append "_enabled" to module-scoped SEV/SEV-ES control variables
Rename sev and sev_es to sev_enabled and sev_es_enabled respectively to
better align with other KVM terminology, and to avoid pseudo-shadowing
when the variables are moved to sev.c in a future patch ('sev' is often
used for local struct kvm_sev_info pointers).

No functional change intended.

Acked-by: Tom Lendacky <thomas.lendacky@amd.com>
Reviewed-by: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210422021125.3417167-8-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-04-26 05:27:15 -04:00
Paolo Bonzini
d9db0fd6c5 KVM: SEV: Mask CPUID[0x8000001F].eax according to supported features
Add a reverse-CPUID entry for the memory encryption word, 0x8000001F.EAX,
and use it to override the supported CPUID flags reported to userspace.
Masking the reported CPUID flags avoids over-reporting KVM support, e.g.
without the mask a SEV-SNP capable CPU may incorrectly advertise SNP
support to userspace.

Clear SEV/SEV-ES if their corresponding module parameters are disabled,
and clear the memory encryption leaf completely if SEV is not fully
supported in KVM.  Advertise SME_COHERENT in addition to SEV and SEV-ES,
as the guest can use SME_COHERENT to avoid CLFLUSH operations.

Explicitly omit SME and VM_PAGE_FLUSH from the reporting.  These features
are used by KVM, but are not exposed to the guest, e.g. guest access to
related MSRs will fault.
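
A sketch of the masking in kvm_set_cpu_caps() (flag set per the description above):

    /* CPUID 0x8000001F.EAX: advertise only what KVM actually supports */
    kvm_cpu_cap_mask(CPUID_8000_001F_EAX,
            0 /* SME */ | F(SEV) | 0 /* VM_PAGE_FLUSH */ | F(SEV_ES) |
            F(SME_COHERENT));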

Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Co-developed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210422021125.3417167-6-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-04-26 05:27:15 -04:00
Sean Christopherson
e8126bdaf1 KVM: SVM: Move SEV module params/variables to sev.c
Unconditionally invoke sev_hardware_setup() when configuring SVM and
handle clearing the module params/variable 'sev' and 'sev_es' in
sev_hardware_setup().  This allows making said variables static within
sev.c and reduces the odds of a collision with guest code, e.g. the guest
side of things has already laid claim to 'sev_enabled'.

Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com>
Reviewed-by: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210422021125.3417167-5-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-04-26 05:27:15 -04:00
Sean Christopherson
fa13680f56 KVM: SVM: Disable SEV/SEV-ES if NPT is disabled
Disable SEV and SEV-ES if NPT is disabled.  While the APM doesn't clearly
state that NPT is mandatory, it's alluded to by:

  The guest page tables, managed by the guest, may mark data memory pages
  as either private or shared, thus allowing selected pages to be shared
  outside the guest.

And practically speaking, shadow paging can't work since KVM can't read
the guest's page tables.
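
In sev_hardware_setup() this amounts to a one-line gate (a sketch):

    if (!sev_enabled || !npt_enabled)
            goto out;   /* leave SEV/SEV-ES disabled when NPT is off */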

Fixes: e9df094289 ("KVM: SVM: Add sev module_param")
Cc: Brijesh Singh <brijesh.singh@amd.com>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210422021125.3417167-4-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-04-26 05:27:14 -04:00
Sean Christopherson
f31b88b35f KVM: SVM: Free sev_asid_bitmap during init if SEV setup fails
Free sev_asid_bitmap if the reclaim bitmap allocation fails, otherwise
KVM will unnecessarily keep the bitmap when SEV is not fully enabled.

Freeing the page is also necessary to avoid introducing a bug when a
future patch eliminates svm_sev_enabled() in favor of using the global
'sev' flag directly.  While svm_sev_enabled() checks max_sev_asid,
which is non-zero even if KVM setup fails, 'sev' will be true if and only
if KVM setup fully succeeds.

Fixes: 33af3a7ef9 ("KVM: SVM: Reduce WBINVD/DF_FLUSH invocations")
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210422021125.3417167-3-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-04-26 05:27:14 -04:00
Sean Christopherson
3b1902b87b KVM: SVM: Zero out the VMCB array used to track SEV ASID association
Zero out the array of VMCB pointers so that pre_sev_run() won't see
garbage when querying the array to detect when an SEV ASID is being
associated with a new VMCB.  In practice, reading random values is all
but guaranteed to be benign as a false negative (which is extremely
unlikely on its own) can only happen on CPU0 on the first VMRUN and would
only cause KVM to skip the ASID flush.  For anything bad to happen, a
previous instance of KVM would have to exit without flushing the ASID,
_and_ KVM would have to not flush the ASID at any time while building the
new SEV guest.
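
I.e., allocate the per-CPU array zeroed (a sketch):

    /* kcalloc() zeroes the array, so pre_sev_run() never sees garbage */
    sd->sev_vmcbs = kcalloc(max_sev_asid + 1, sizeof(void *), GFP_KERNEL);
    if (!sd->sev_vmcbs)
            return -ENOMEM;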

Cc: Borislav Petkov <bp@suse.de>
Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com>
Reviewed-by: Brijesh Singh <brijesh.singh@amd.com>
Fixes: 70cd94e60c ("KVM: SVM: VMRUN should use associated ASID when SEV is enabled")
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210422021125.3417167-2-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-04-26 05:27:14 -04:00
Sean Christopherson
27b4a9c454 KVM: x86: Rename GPR accessors to make mode-aware variants the defaults
Append raw to the direct variants of kvm_register_read/write(), and
drop the "l" from the mode-aware variants.  I.e. make the mode-aware
variants the default, and make the direct variants scary sounding so as
to discourage use.  Accessing the full 64-bit values irrespective of
mode is rarely the desired behavior.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210422022128.3464144-10-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-04-26 05:27:13 -04:00
Sean Christopherson
bc9eff67fc KVM: SVM: Use default rAX size for INVLPGA emulation
Drop bits 63:32 of RAX when grabbing the address for INVLPGA emulation
outside of 64-bit mode to make KVM's emulation slightly less wrong.  The
address for INVLPGA is determined by the effective address size, i.e.
it's not hardcoded to 64/32 bits for a given mode.  Add a FIXME to call
out that the emulation is wrong.

Opportunistically tweak the ASID handling to make it clear that it's
defined by ECX, not rCX.

Per the APM:
   The portion of rAX used to form the address is determined by the
   effective address size (current execution mode and optional address
   size prefix). The ASID is taken from ECX.
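
The reworked emulation, roughly (a sketch):

    static int invlpga_interception(struct kvm_vcpu *vcpu)
    {
            gva_t gva = kvm_rax_read(vcpu);
            u32 asid = kvm_rcx_read(vcpu);  /* ASID comes from ECX, not rCX */

            /* FIXME: an address size prefix can override this as well */
            if (!is_64_bit_mode(vcpu))
                    gva = (u32)gva;

            trace_kvm_invlpga(to_svm(vcpu)->vmcb->save.rip, asid, gva);

            kvm_mmu_invlpg(vcpu, gva);
            return kvm_skip_emulated_instruction(vcpu);
    }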

Fixes: ff092385e8 ("KVM: SVM: Implement INVLPGA")
Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210422022128.3464144-9-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-04-26 05:27:13 -04:00
Sean Christopherson
0884335a2e KVM: SVM: Truncate GPR value for DR and CR accesses in !64-bit mode
Drop bits 63:32 on loads/stores to/from DRs and CRs when the vCPU is not
in 64-bit mode.  The APM states bits 63:32 are dropped for both DRs and
CRs:

  In 64-bit mode, the operand size is fixed at 64 bits without the need
  for a REX prefix. In non-64-bit mode, the operand size is fixed at 32
  bits and the upper 32 bits of the destination are forced to 0.
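
With a mode-aware accessor the truncation is uniform (sketch of the helper):

    static inline u64 kvm_register_read(struct kvm_vcpu *vcpu, int reg)
    {
            u64 val = kvm_register_read_raw(vcpu, reg);

            /* outside 64-bit mode the upper 32 bits are architecturally 0 */
            return is_64_bit_mode(vcpu) ? val : (u32)val;
    }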

Fixes: 7ff76d58a9 ("KVM: SVM: enhance MOV CR intercept handler")
Fixes: cae3797a46 ("KVM: SVM: enhance mov DR intercept handler")
Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210422022128.3464144-4-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-04-26 05:27:11 -04:00
Sean Christopherson
844d69c26d KVM: SVM: Delay restoration of host MSR_TSC_AUX until return to userspace
Use KVM's "user return MSRs" framework to defer restoring the host's
MSR_TSC_AUX until the CPU returns to userspace.  Add/improve comments to
clarify why MSR_TSC_AUX is intercepted on both RDMSR and WRMSR, and why
it's safe for KVM to keep the guest's value loaded even if KVM is
scheduled out.
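
I.e., on the switch to the guest (a sketch):

    /* load the guest value now; the host value is restored lazily, on
     * return to userspace, via the user return MSR framework */
    if (likely(tsc_aux_uret_slot >= 0))
            kvm_set_user_return_msr(tsc_aux_uret_slot, svm->tsc_aux, -1ull);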

Cc: Reiji Watanabe <reijiw@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210423223404.3860547-5-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-04-26 05:26:13 -04:00
Sean Christopherson
dbd6127375 KVM: SVM: Clear MSR_TSC_AUX[63:32] on write
Force clear bits 63:32 of MSR_TSC_AUX on write to emulate current AMD
CPUs, which completely ignore the upper 32 bits, including dropping them
on write.  Emulating AMD hardware will also allow migrating a vCPU from
AMD hardware to Intel hardware without requiring userspace to manually
clear the upper bits, which are reserved on Intel hardware.

Presumably, MSR_TSC_AUX[63:32] are intended to be reserved on AMD, but
sadly the APM doesn't say _anything_ about those bits in the context of
MSR access.  The RDTSCP entry simply states that RCX contains bits 31:0
of the MSR, zero extended.  And even worse is that the RDPID description
implies that it can consume all 64 bits of the MSR:

  RDPID reads the value of TSC_AUX MSR used by the RDTSCP instruction
  into the specified destination register. Normal operand size prefixes
  do not apply and the update is either 32 bit or 64 bit based on the
  current mode.

Emulate current hardware behavior to give KVM the best odds of playing
nice with whatever the behavior of future AMD CPUs happens to be.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210423223404.3860547-3-seanjc@google.com>
[Fix broken patch. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-04-26 05:24:43 -04:00
Sean Christopherson
6f2b296aa6 KVM: SVM: Inject #GP on guest MSR_TSC_AUX accesses if RDTSCP unsupported
Inject #GP on guest accesses to MSR_TSC_AUX if RDTSCP is unsupported in
the guest's CPUID model.

Fixes: 46896c73c1 ("KVM: svm: add support for RDTSCP")
Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210423223404.3860547-2-seanjc@google.com>
Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-04-26 05:20:36 -04:00
Paolo Bonzini
fd49e8ee70 Merge branch 'kvm-sev-cgroup' into HEAD 2021-04-22 13:19:01 -04:00
Sean Christopherson
238eca821c KVM: SVM: Allocate SEV command structures on local stack
Use the local stack to "allocate" the structures used to communicate with
the PSP.  The largest struct used by KVM, sev_data_launch_secret, clocks
in at 52 bytes, well within the realm of reasonable stack usage.  The
smallest structs are a mere 4 bytes, i.e. the pointer for the allocation
is larger than the allocation itself.

Now that the PSP driver plays nice with vmalloc pointers, putting the
data on a virtually mapped stack (CONFIG_VMAP_STACK=y) will not cause
explosions.
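
I.e., replacing the old pattern (a sketch):

    /* before */
    struct sev_data_launch_secret *data = kzalloc(sizeof(*data), GFP_KERNEL);

    /* after: at most 52 bytes, fine even on a CONFIG_VMAP_STACK stack */
    struct sev_data_launch_secret data;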

Cc: Brijesh Singh <brijesh.singh@amd.com>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210406224952.4177376-9-seanjc@google.com>
Reviewed-by: Brijesh Singh <brijesh.singh@amd.com>
Acked-by: Tom Lendacky <thomas.lendacky@amd.com>
[Apply same treatment to PSP migration commands. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-04-21 12:20:07 -04:00
Brijesh Singh
6a443def87 KVM: SVM: Add KVM_SEV_RECEIVE_FINISH command
The command finalizes the guest receiving process and makes the SEV
guest ready for execution.

Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Borislav Petkov <bp@suse.de>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: x86@kernel.org
Cc: kvm@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Reviewed-by: Steve Rutherford <srutherford@google.com>
Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
Message-Id: <d08914dc259644de94e29b51c3b68a13286fc5a3.1618498113.git.ashish.kalra@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-04-21 12:20:05 -04:00
Brijesh Singh
15fb7de1a7 KVM: SVM: Add KVM_SEV_RECEIVE_UPDATE_DATA command
The command is used for copying the incoming buffer into the
SEV guest memory space.

Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Borislav Petkov <bp@suse.de>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: x86@kernel.org
Cc: kvm@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Reviewed-by: Steve Rutherford <srutherford@google.com>
Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
Message-Id: <c5d0e3e719db7bb37ea85d79ed4db52e9da06257.1618498113.git.ashish.kalra@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-04-21 12:20:05 -04:00
Brijesh Singh
af43cbbf95 KVM: SVM: Add support for KVM_SEV_RECEIVE_START command
The command is used to create the encryption context for an incoming
SEV guest. The encryption context can be later used by the hypervisor
to import the incoming data into the SEV guest memory space.

Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Borislav Petkov <bp@suse.de>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: x86@kernel.org
Cc: kvm@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Reviewed-by: Steve Rutherford <srutherford@google.com>
Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
Message-Id: <c7400111ed7458eee01007c4d8d57cdf2cbb0fc2.1618498113.git.ashish.kalra@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-04-21 12:20:04 -04:00
Steve Rutherford
5569e2e7a6 KVM: SVM: Add support for KVM_SEV_SEND_CANCEL command
After completion of SEND_START, but before SEND_FINISH, the source VMM can
issue the SEND_CANCEL command to stop a migration. This is necessary so
that a cancelled migration can restart with a new target later.

Reviewed-by: Nathan Tempelman <natet@google.com>
Reviewed-by: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Steve Rutherford <srutherford@google.com>
Message-Id: <20210412194408.2458827-1-srutherford@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-04-21 12:20:04 -04:00
Brijesh Singh
fddecf6a23 KVM: SVM: Add KVM_SEV_SEND_FINISH command
The command is used to finalize the encryption context created with the
KVM_SEV_SEND_START command.

Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Borislav Petkov <bp@suse.de>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: x86@kernel.org
Cc: kvm@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Reviewed-by: Steve Rutherford <srutherford@google.com>
Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
Message-Id: <5082bd6a8539d24bc55a1dd63a1b341245bb168f.1618498113.git.ashish.kalra@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-04-21 12:20:04 -04:00
Brijesh Singh
d3d1af85e2 KVM: SVM: Add KVM_SEV_SEND_UPDATE_DATA command
The command is used for encrypting the guest memory region using the encryption
context created with KVM_SEV_SEND_START.

Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Borislav Petkov <bp@suse.de>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: x86@kernel.org
Cc: kvm@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Reviewed-by: Steve Rutherford <srutherford@google.com>
Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
Message-Id: <d6a6ea740b0c668b30905ae31eac5ad7da048bb3.1618498113.git.ashish.kalra@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-04-21 12:20:03 -04:00
Brijesh Singh
4cfdd47d6d KVM: SVM: Add KVM_SEV_SEND_START command
The command is used to create an outgoing SEV guest encryption context.

Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Borislav Petkov <bp@suse.de>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: x86@kernel.org
Cc: kvm@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Reviewed-by: Steve Rutherford <srutherford@google.com>
Reviewed-by: Venu Busireddy <venu.busireddy@oracle.com>
Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
Message-Id: <2f1686d0164e0f1b3d6a41d620408393e0a48376.1618498113.git.ashish.kalra@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-04-21 12:20:03 -04:00
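
A hedged userspace sketch of driving the command through the KVM uapi
(struct layout per include/uapi/linux/kvm.h; error handling omitted):

    struct kvm_sev_send_start start = {};  /* zeroed lengths query sizes */
    struct kvm_sev_cmd cmd = {
        .id = KVM_SEV_SEND_START,
        .data = (__u64)(unsigned long)&start,
        .sev_fd = sev_fd,                  /* open fd for /dev/sev */
    };

    ioctl(vm_fd, KVM_MEMORY_ENCRYPT_OP, &cmd);
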
Nathan Tempelman
54526d1fd5 KVM: x86: Support KVM VMs sharing SEV context
Add a capability for userspace to mirror SEV encryption context from
one vm to another. On our side, this is intended to support a
Migration Helper vCPU, but it can also be used generically to support
other in-guest workloads scheduled by the host. The intention is for
the primary guest and the mirror to have nearly identical memslots.

The primary benefits of this are that:
1) The VMs do not share KVM contexts (think APIC/MSRs/etc), so they
can't accidentally clobber each other.
2) The VMs can have different memory-views, which is necessary for post-copy
migration (the migration vCPUs on the target need to read and write to
pages, when the primary guest would VMEXIT).

This does not change the threat model for AMD SEV. Any memory involved
is still owned by the primary guest and its initial state is still
attested to through the normal SEV_LAUNCH_* flows. If userspace wanted
to circumvent SEV, they could achieve the same effect by simply attaching
a vCPU to the primary VM.
This patch deliberately leaves userspace in charge of the memslots for the
mirror, as it already has the power to mess with them in the primary guest.

This patch does not support SEV-ES (much less SNP), as it does not
handle handing off attested VMSAs to the mirror.

For additional context, we need a Migration Helper because SEV PSP
migration is far too slow for our live migration on its own. Using
an in-guest migrator lets us speed this up significantly.

Signed-off-by: Nathan Tempelman <natet@google.com>
Message-Id: <20210408223214.2582277-1-natet@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-04-21 12:20:02 -04:00
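
A sketch of how userspace would request mirroring with the capability this
patch adds (illustrative only):

    struct kvm_enable_cap cap = {
        .cap = KVM_CAP_VM_COPY_ENC_CONTEXT_FROM,
        .args[0] = primary_vm_fd,   /* VM whose SEV context is mirrored */
    };

    /* Issued on the mirror VM, before its vCPUs are created. */
    ioctl(mirror_vm_fd, KVM_ENABLE_CAP, &cap);
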
Krish Sadhukhan
ee695f22b5 nSVM: Check addresses of MSR and IO permission maps
According to section "Canonicalization and Consistency Checks" in APM vol 2,
the following guest state is illegal:

    "The MSR or IOIO intercept tables extend to a physical address that
     is greater than or equal to the maximum supported physical address."

Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Krish Sadhukhan <krish.sadhukhan@oracle.com>
Message-Id: <20210412215611.110095-5-krish.sadhukhan@oracle.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-04-21 12:20:01 -04:00
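
A minimal sketch of the consistency check (helper names approximate; the
MSRPM_SIZE/IOPM_SIZE #defines come from the next entry below):

    static bool nested_svm_check_bitmap_pa(struct kvm_vcpu *vcpu, u64 pa,
                                           u32 size)
    {
        u64 last_pa = PAGE_ALIGN(pa) + size - 1;

        /* Both ends of the table must be legal guest physical addresses. */
        return kvm_vcpu_is_legal_gpa(vcpu, pa) &&
               kvm_vcpu_is_legal_gpa(vcpu, last_pa);
    }
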
Krish Sadhukhan
47903dc10e KVM: SVM: Define actual size of IOPM and MSRPM tables
Define the actual size of the IOPM and MSRPM tables so that it can be used
when initializing them and when checking the consistency of their physical
address.
These #defines are placed in svm.h so that they can be shared.

Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Krish Sadhukhan <krish.sadhukhan@oracle.com>
Message-Id: <20210412215611.110095-2-krish.sadhukhan@oracle.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-04-20 04:18:56 -04:00
Sean Christopherson
44f1b5586d KVM: SVM: Enhance and clean up the vmcb tracking comment in pre_svm_run()
Explicitly document why a vmcb must be marked dirty and assigned a new
asid when it will be run on a different cpu.  The "what" is relatively
obvious, whereas the "why" requires reading the APM and/or KVM code.

Opportunistically remove a spurious period and several unnecessary
newlines in the comment.

No functional change intended.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210406171811.4043363-5-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-04-20 04:18:50 -04:00
Sean Christopherson
554cf31474 KVM: SVM: Add a comment to clarify what vcpu_svm.vmcb points at
Add a comment above the declaration of vcpu_svm.vmcb to call out that it
is simply a shorthand for current_vmcb->ptr.  The myriad accesses to
svm->vmcb are quite confusing without this crucial detail.

No functional change intended.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210406171811.4043363-4-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-04-20 04:18:49 -04:00
Sean Christopherson
d1788191fd KVM: SVM: Drop vcpu_svm.vmcb_pa
Remove vmcb_pa from vcpu_svm and simply read current_vmcb->pa directly in
the one path where it is consumed.  Unlike svm->vmcb, use of the current
vmcb's address is very limited, as evidenced by the fact that its use
can be trimmed to a single dereference.

Opportunistically add a comment about using vmcb01 for VMLOAD/VMSAVE, as at
first glance using vmcb01 instead of vmcb_pa looks wrong.

No functional change intended.

Cc: Maxim Levitsky <mlevitsk@redhat.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210406171811.4043363-3-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-04-20 04:18:49 -04:00
Sean Christopherson
17e5e964ee KVM: SVM: Don't set current_vmcb->cpu when switching vmcb
Do not update the new vmcb's last-run cpu when switching to a different
vmcb.  If the vCPU is migrated between its last run and a vmcb switch,
e.g. for nested VM-Exit, then setting the cpu without marking the vmcb
dirty will lead to KVM running the vCPU on a different physical cpu with
stale clean bit settings.

                          vcpu->cpu    current_vmcb->cpu    hardware
  pre_svm_run()           cpu0         cpu0                 cpu0,clean
  kvm_arch_vcpu_load()    cpu1         cpu0                 cpu0,clean
  svm_switch_vmcb()       cpu1         cpu1                 cpu0,clean
  pre_svm_run()           cpu1         cpu1                 kaboom

Simply delete the offending code; unlike VMX, which needs to update the
cpu at switch time due to the need to do VMPTRLD, SVM only cares about
which cpu last ran the vCPU.

Fixes: af18fa775d ("KVM: nSVM: Track the physical cpu of the vmcb vmrun through the vmcb")
Cc: Cathy Avery <cavery@redhat.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210406171811.4043363-2-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-04-20 04:18:49 -04:00
Tom Lendacky
a3ba26ecfb KVM: SVM: Make sure GHCB is mapped before updating
Access to the GHCB is mainly in the VMGEXIT path and it is known that the
GHCB will be mapped. But there are two paths where it is possible the GHCB
might not be mapped.

The sev_vcpu_deliver_sipi_vector() routine will update the GHCB to inform
the caller of the AP Reset Hold NAE event that a SIPI has been delivered.
However, if a SIPI is performed without a corresponding AP Reset Hold,
then the GHCB might not be mapped (depending on the previous VMEXIT),
which will result in a NULL pointer dereference.

The svm_complete_emulated_msr() routine will update the GHCB to inform
the caller of a RDMSR/WRMSR operation about any errors. While it is likely
that the GHCB will be mapped in this situation, add a safeguard
in this path to be certain a NULL pointer dereference is not encountered.

Fixes: f1c6366e30 ("KVM: SVM: Add required changes to support intercepts under SEV-ES")
Fixes: 647daca25d ("KVM: SVM: Add support for booting APs in an SEV-ES guest")
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Cc: stable@vger.kernel.org
Message-Id: <a5d3ebb600a91170fc88599d5a575452b3e31036.1617979121.git.thomas.lendacky@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-04-19 18:04:47 -04:00
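
A sketch of the kind of guard added (GHCB field name as in the struct
vcpu_svm of this era; simplified):

    /* The GHCB may not be mapped, e.g. SIPI without an AP Reset Hold. */
    if (WARN_ON_ONCE(!svm->ghcb))
        return;

    /* Only now is it safe to write the result back for the guest. */
    ghcb_set_sw_exit_info_2(svm->ghcb, 1);
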
Maxim Levitsky
4020da3b9f KVM: x86: pending exceptions must not be blocked by an injected event
Injected interrupts/NMIs should not block a pending exception, but rather
be either lost if the nested hypervisor doesn't intercept the pending
exception (as in stock x86), or be delivered in the
exitintinfo/IDT_VECTORING_INFO field, as part of a VMexit that
corresponds to the pending exception.

The only reason for an exception to be blocked is when a nested run is
pending (and that can't really happen currently, but it is still worth
checking for).

Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20210401143817.1030695-2-mlevitsk@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-04-17 08:31:02 -04:00
Maxim Levitsky
232f75d3b4 KVM: nSVM: call nested_svm_load_cr3 on nested state load
While KVM's MMU should be fully reset by loading of nested CR0/CR3/CR4
by KVM_SET_SREGS, we are not in nested mode yet when we do it and therefore
only root_mmu is reset.

On regular nested entries we call nested_svm_load_cr3 which both updates
the guest's CR3 in the MMU when it is needed, and it also initializes
the mmu again which makes it initialize the walk_mmu as well when nested
paging is enabled in both host and guest.

Since we don't call nested_svm_load_cr3 on nested state load,
the walk_mmu can be left uninitialized, which can lead to a NULL pointer
dereference while accessing it if we happen to get a nested page fault
right after entering the nested guest first time after the migration and
we decide to emulate it, which leads to the emulator trying to access
walk_mmu->gva_to_gpa which is NULL.

Therefore we should call this function on nested state load as well.

Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20210401141814.1029036-3-mlevitsk@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-04-17 08:31:01 -04:00
Maxim Levitsky
adc2a23734 KVM: nSVM: improve SYSENTER emulation on AMD
Currently, to support Intel->AMD migration, if the CPU vendor is
GenuineIntel, we emulate the full 64-bit value for the
MSR_IA32_SYSENTER_{EIP|ESP} MSRs, and we also emulate the sysenter/sysexit
instructions in long mode.

(The emulator does still refuse to emulate sysenter in 64-bit mode, on the
grounds that the code for that wasn't tested and likely has no users.)

However, when virtual vmload/vmsave is enabled, the vmload instruction
updates these 32-bit MSRs without triggering their MSR intercept, leaving
stale values in KVM's shadow copy of these MSRs, which relies on the
intercept to stay up to date.

Fix/optimize this by doing the following:

1. Enable the MSR intercepts for SYSENTER MSRs iff vendor=GenuineIntel
   (This is both a tiny optimization and also ensures that in case
   the guest cpu vendor is AMD, the msrs will be 32 bit wide as
   AMD defined).

2. Store only high 32 bit part of these msrs on interception and combine
   it with hardware msr value on intercepted read/writes
   iff vendor=GenuineIntel.

3. Disable vmload/vmsave virtualization if vendor=GenuineIntel.
   (It is somewhat insane to set vendor=GenuineIntel and still enable
   SVM for the guest but well whatever).
   Then zero the high 32 bit parts when kvm intercepts and emulates vmload.

Thanks a lot to Paolo Bonzini for helping me with fixing this in the most
correct way.

This patch fixes nested migration of 32 bit nested guests, that was
broken because incorrect cached values of SYSENTER msrs were stored in
the migration stream if L1 changed these msrs with
vmload prior to L2 entry.

Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20210401111928.996871-3-mlevitsk@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-04-17 08:30:59 -04:00
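
A sketch of item 2 above on the read side (the sysenter_eip_hi field
follows the patch's naming; simplified):

    case MSR_IA32_SYSENTER_EIP:
        msr_info->data = (u32)svm->vmcb01.ptr->save.sysenter_eip;
        /*
         * Hardware only keeps 32 bits; stitch the stashed high half
         * back in when the guest claims to be an Intel CPU.
         */
        if (guest_cpuid_is_intel(vcpu))
            msr_info->data |= (u64)svm->sysenter_eip_hi << 32;
        break;
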
Sean Christopherson
eba04b20e4 KVM: x86: Account a variety of miscellaneous allocations
Switch to GFP_KERNEL_ACCOUNT for a handful of allocations that are
clearly associated with a single task/VM.

Note, there are a several SEV allocations that aren't accounted, but
those can (hopefully) be fixed by using the local stack for memory.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210331023025.2485960-3-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-04-17 08:30:58 -04:00
Sean Christopherson
8727906fde KVM: SVM: Do not allow SEV/SEV-ES initialization after vCPUs are created
Reject KVM_SEV_INIT and KVM_SEV_ES_INIT if they are attempted after one
or more vCPUs have been created.  KVM assumes a VM is tagged SEV/SEV-ES
prior to vCPU creation, e.g. init_vmcb() needs to mark the VMCB as SEV
enabled, and svm_create_vcpu() needs to allocate the VMSA.  At best,
creating vCPUs before SEV/SEV-ES init will lead to unexpected errors
and/or behavior, and at worst it will crash the host, e.g.
sev_launch_update_vmsa() will dereference a null svm->vmsa pointer.

Fixes: 1654efcbc4 ("KVM: SVM: Add KVM_SEV_INIT command")
Fixes: ad73109ae7 ("KVM: SVM: Provide support to launch and run an SEV-ES guest")
Cc: stable@vger.kernel.org
Cc: Brijesh Singh <brijesh.singh@amd.com>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210331031936.2495277-4-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-04-17 08:30:58 -04:00
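
The essence of the fix is a single guard early in the SEV init path; a
sketch:

    /* SEV/SEV-ES must be enabled before any vCPU exists. */
    if (kvm->created_vcpus)
        return -EINVAL;
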
Sean Christopherson
9fa1521daa KVM: SVM: Do not set sev->es_active until KVM_SEV_ES_INIT completes
Set sev->es_active only after the guts of KVM_SEV_ES_INIT succeeds.  If
the command fails, e.g. because SEV is already active or there are no
available ASIDs, then es_active will be left set even though the VM is
not fully SEV-ES capable.

Refactor the code so that "es_active" is passed on the stack instead of
being prematurely shoved into sev_info, both to avoid having to unwind
sev_info and so that it's more obvious what actually consumes es_active
in sev_guest_init() and its helpers.

Fixes: ad73109ae7 ("KVM: SVM: Provide support to launch and run an SEV-ES guest")
Cc: stable@vger.kernel.org
Cc: Brijesh Singh <brijesh.singh@amd.com>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210331031936.2495277-3-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-04-17 08:30:57 -04:00
Sean Christopherson
c36b16d29f KVM: SVM: Use online_vcpus, not created_vcpus, to iterate over vCPUs
Use the kvm_for_each_vcpu() helper to iterate over vCPUs when encrypting
VMSAs for SEV, which effectively switches to use online_vcpus instead of
created_vcpus.  This fixes a possible null-pointer dereference as
created_vcpus does not guarantee a vCPU exists, since it is updated at
the very beginning of KVM_CREATE_VCPU.  created_vcpus exists to allow the
bulk of vCPU creation to run in parallel, while still correctly
restricting the max number of vCPUs.

Fixes: ad73109ae7 ("KVM: SVM: Provide support to launch and run an SEV-ES guest")
Cc: stable@vger.kernel.org
Cc: Brijesh Singh <brijesh.singh@amd.com>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210331031936.2495277-2-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-04-17 08:30:57 -04:00
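
A sketch of the iteration this switches to (kvm_for_each_vcpu() walks
online_vcpus, so every visited vCPU is fully created; the per-vCPU helper
name is assumed from sev.c):

    struct kvm_vcpu *vcpu;
    int i, ret;

    kvm_for_each_vcpu(i, vcpu, kvm) {
        /* Sync and encrypt this vCPU's VMSA; vmsa is never NULL here. */
        ret = sev_es_sync_vmsa(to_svm(vcpu));
        if (ret)
            return ret;
    }
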
Krish Sadhukhan
9a7de6ecc3 KVM: nSVM: If VMRUN is single-stepped, queue the #DB intercept in nested_svm_vmexit()
According to APM, the #DB intercept for a single-stepped VMRUN must happen
after the completion of that instruction, when the guest does #VMEXIT to
the host. However, in the current implementation of KVM, the #DB intercept
for a single-stepped VMRUN happens after the completion of the instruction
that follows the VMRUN instruction. When the #DB intercept handler is
invoked, it shows the RIP of the instruction that follows VMRUN, instead
of VMRUN itself. This is an incorrect RIP as far as single-stepping VMRUN
is concerned.

This patch fixes the problem by checking, in nested_svm_vmexit(), for the
condition that the VMRUN instruction is being single-stepped and if so,
queues the pending #DB intercept so that the #DB is accounted for before
we execute L1's next instruction.

Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Krish Sadhukhan <krish.sadhukhan@oracle.com>
Message-Id: <20210323175006.73249-2-krish.sadhukhan@oracle.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-04-17 08:30:52 -04:00
Vipin Sharma
7aef27f0b2 svm/sev: Register SEV and SEV-ES ASIDs to the misc controller
Secure Encrypted Virtualization (SEV) and Secure Encrypted
Virtualization - Encrypted State (SEV-ES) ASIDs are used to encrypt KVM
guests on AMD platforms. These ASIDs are available in limited quantities
on a host.

Register their capacity and usage to the misc controller for tracking
via cgroups.

Signed-off-by: Vipin Sharma <vipinsh@google.com>
Reviewed-by: David Rientjes <rientjes@google.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2021-04-04 13:34:46 -04:00
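
A sketch of the two touch points (API names per
include/linux/misc_cgroup.h):

    /* At SEV setup: advertise how many ASIDs this host has. */
    misc_cg_set_capacity(MISC_CG_RES_SEV, sev_asid_count);
    misc_cg_set_capacity(MISC_CG_RES_SEV_ES, sev_es_asid_count);

    /*
     * When a guest claims an ASID: charge one unit to its cgroup,
     * and uncharge it again when the ASID is freed.
     */
    ret = misc_cg_try_charge(type, sev->misc_cg, 1);
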
Paolo Bonzini
cb9b6a1b19 Merge branch 'kvm-fix-svm-races' into HEAD 2021-04-01 05:19:48 -04:00
Paolo Bonzini
6ebae23c07 Merge branch 'kvm-fix-svm-races' into kvm-master 2021-04-01 05:14:05 -04:00
Paolo Bonzini
3c346c0c60 KVM: SVM: ensure that EFER.SVME is set when running nested guest or on nested vmexit
Fixing nested_vmcb_check_save to avoid all TOC/TOU races
is a bit harder in released kernels, so do the bare minimum
by preventing EFER.SVME from being cleared.  This is problematic
because svm_set_efer frees the data structures for nested
virtualization if EFER.SVME is cleared.

Also check that EFER.SVME remains set after a nested vmexit;
clearing it could happen if the bit is zero in the save area
that is passed to KVM_SET_NESTED_STATE (the save area of the
nested state corresponds to the nested hypervisor's state
and is restored on the next nested vmexit).

Cc: stable@vger.kernel.org
Fixes: 2fcf4876ad ("KVM: nSVM: implement on demand allocation of the nested state")
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-04-01 05:11:35 -04:00
Paolo Bonzini
a58d9166a7 KVM: SVM: load control fields from VMCB12 before checking them
Avoid races between check and use of the nested VMCB controls.  This
for example ensures that the VMRUN intercept is always reflected to the
nested hypervisor, instead of being processed by the host.  Without this
patch, it is possible to end up with svm->nested.hsave pointing to
the MSR permission bitmap for nested guests.

This bug is CVE-2021-29657.

Reported-by: Felix Wilhelm <fwilhelm@google.com>
Cc: stable@vger.kernel.org
Fixes: 2fcf4876ad ("KVM: nSVM: implement on demand allocation of the nested state")
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-04-01 05:09:31 -04:00
Vitaly Kuznetsov
1973cadd4c KVM: x86/vPMU: Forbid writing to MSR_F15H_PERF MSRs when guest doesn't have X86_FEATURE_PERFCTR_CORE
MSR_F15H_PERF_CTL0-5, MSR_F15H_PERF_CTR0-5 MSRs are only available when
X86_FEATURE_PERFCTR_CORE CPUID bit was exposed to the guest. KVM, however,
allows these MSRs unconditionally because kvm_pmu_is_valid_msr() ->
amd_msr_idx_to_pmc() check always passes and because kvm_pmu_set_msr() ->
amd_pmu_set_msr() doesn't fail.

In case of a counter (CTRn), no big harm is done as we only increase
the internal PMC's value, but in case of an eventsel (CTLn), we go deep into
perf internals with a non-existing counter.

Note, kvm_get_msr_common() just returns '0' when these MSRs don't exist
and this also seems to contradict architectural behavior which is #GP
(I did check one old Opteron host) but changing this status quo is a bit
scarier.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20210323084515.1346540-1-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-03-30 13:07:10 -04:00
Ingo Molnar
163b099146 x86: Fix various typos in comments, take #2
Fix another ~42 single-word typos in arch/x86/ code comments,
missed a few in the first pass, in particular in .S files.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: linux-kernel@vger.kernel.org
2021-03-21 23:50:28 +01:00
Ingo Molnar
d9f6e12fb0 x86: Fix various typos in comments
Fix ~144 single-word typos in arch/x86/ code comments.

Doing this in a single commit should reduce the churn.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: linux-kernel@vger.kernel.org
2021-03-18 15:31:53 +01:00
Sean Christopherson
4a98623d5d KVM: x86/mmu: Mark the PAE roots as decrypted for shadow paging
Set the PAE roots used as decrypted to play nice with SME when KVM is
using shadow paging.  Explicitly skip setting the C-bit when loading
CR3 for PAE shadow paging, even though it's completely ignored by the
CPU.  The extra documentation is nice to have.

Note, there are several subtleties at play with NPT.  In addition to
legacy shadow paging, the PAE roots are used for SVM's NPT when either
KVM is 32-bit (uses PAE paging) or KVM is 64-bit and shadowing 32-bit
NPT.  However, 32-bit Linux, and thus KVM, doesn't support SME.  And
64-bit KVM can happily set the C-bit in CR3.  This also means that
keeping __sme_set(root) for 32-bit KVM when NPT is enabled is
conceptually wrong, but functionally ok since SME is 64-bit only.
Leave it as is to avoid unnecessary pollution.

Fixes: d0ec49d4de ("kvm/x86/svm: Support Secure Memory Encryption within KVM")
Cc: stable@vger.kernel.org
Cc: Brijesh Singh <brijesh.singh@amd.com>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210309224207.1218275-5-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-03-15 04:44:08 -04:00
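
A sketch of the idea (set_memory_decrypted() is the kernel's SME/SEV
page-attribute helper; the KVM call site is simplified):

    u64 *pae_root = (void *)get_zeroed_page(GFP_KERNEL_ACCOUNT);

    /*
     * With shadow paging the CPU reads the PAE roots with the C-bit
     * clear, so the backing page must be mapped decrypted under SME.
     */
    set_memory_decrypted((unsigned long)pae_root, 1);
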
Sean Christopherson
e83bc09caf KVM: x86: Get active PCID only when writing a CR3 value
Retrieve the active PCID only when writing a guest CR3 value, i.e. don't
get the PCID when using EPT or NPT.  The PCID is especially problematic
for EPT as the bits have a different meaning, and so the PCID must be
manually stripped, which is annoying and unnecessary.  And on VMX,
getting the active PCID also involves reading the guest's CR3 and
CR4.PCIDE, i.e. may add pointless VMREADs.

Opportunistically rename the pgd/pgd_level params to root_hpa and
root_level to better reflect their new roles.  Keep the function names,
as "load the guest PGD" is still accurate/correct.

Last, and probably least, pass root_hpa as a hpa_t/u64 instead of an
unsigned long.  The EPTP holds a 64-bit value, even in 32-bit mode, so
in theory EPT could support HIGHMEM for 32-bit KVM.  Never mind that
doing so would require changing the MMU page allocators and reworking
the MMU to use kmap().

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210305183123.3978098-2-seanjc@google.com>
Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-03-15 04:43:56 -04:00
Uros Bizjak
7531b47c8a KVM/SVM: Move vmenter.S exception fixups out of line
Avoid jump by moving exception fixups out of line.

Cc: Sean Christopherson <seanjc@google.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
Message-Id: <20210226125621.111723-1-ubizjak@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-03-15 04:43:56 -04:00
Sean Christopherson
8120337a4c KVM: x86/mmu: Stop using software available bits to denote MMIO SPTEs
Stop tagging MMIO SPTEs with specific available bits and instead detect
MMIO SPTEs by checking for their unique SPTE value.  The value is
guaranteed to be unique on shadow paging and NPT as setting reserved
physical address bits on any other type of SPTE would constitute a KVM
bug.  Ditto for EPT, as creating a WX non-MMIO SPTE would also be a bug.

Note, this approach is also future-compatible with TDX, which will need
to reflect MMIO EPT violations as #VEs into the guest.  To create an EPT
violation instead of a misconfig, TDX EPTs will need to have RWX=0.  But,
MMIO SPTEs will also be the only case where KVM clears SUPPRESS_VE, so
MMIO SPTEs will still be guaranteed to have a unique value within a given
MMU context.

The main motivation is to make it easier to reason about which types of
SPTEs use which available bits.  As a happy side effect, this frees up
two more bits for storing the MMIO generation.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210225204749.1512652-11-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-03-15 04:43:41 -04:00
Cathy Avery
8173396e94 KVM: nSVM: Optimize vmcb12 to vmcb02 save area copies
Use the vmcb12 control clean field to determine which vmcb12.save
registers were marked dirty in order to minimize register copies
when switching from L1 to L2. Those vmcb12 registers marked as dirty need
to be copied to L0's vmcb02 as they will be used to update the vmcb
state cache for the L2 VMRUN.  In the case where we have a different
vmcb12 from the last L2 VMRUN all vmcb12.save registers must be
copied over to L2.save.

Tested:
kvm-unit-tests
kvm selftests
Fedora L1 L2

Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Cathy Avery <cavery@redhat.com>
Message-Id: <20210301200844.2000-1-cavery@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-03-15 04:43:26 -04:00
Babu Moger
d00b99c514 KVM: SVM: Add support for Virtual SPEC_CTRL
Newer AMD processors have a feature to virtualize the use of the
SPEC_CTRL MSR. Presence of this feature is indicated via CPUID
function 0x8000000A_EDX[20]: GuestSpecCtrl. Hypervisors are not
required to enable this feature since it is automatically enabled on
processors that support it.

A hypervisor may wish to impose speculation controls on guest
execution or a guest may want to impose its own speculation controls.
Therefore, the processor implements both host and guest
versions of SPEC_CTRL.

When in host mode, the host SPEC_CTRL value is in effect and writes
update only the host version of SPEC_CTRL. On a VMRUN, the processor
loads the guest version of SPEC_CTRL from the VMCB. When the guest
writes SPEC_CTRL, only the guest version is updated. On a VMEXIT,
the guest version is saved into the VMCB and the processor returns
to only using the host SPEC_CTRL for speculation control. The guest
SPEC_CTRL is located at offset 0x2E0 in the VMCB.

The effective SPEC_CTRL setting is the guest SPEC_CTRL setting or'ed
with the hypervisor SPEC_CTRL setting. This allows the hypervisor to
ensure a minimum SPEC_CTRL if desired.

This support also fixes an issue where a guest may sometimes see an
inconsistent value for the SPEC_CTRL MSR on processors that support
this feature. With the current SPEC_CTRL support, the first write to
SPEC_CTRL is intercepted and the virtualized version of the SPEC_CTRL
MSR is not updated. When the guest reads back the SPEC_CTRL MSR, it
will be 0x0, instead of the actual expected value. There isn’t a
security concern here, because the host SPEC_CTRL value is or’ed with
the Guest SPEC_CTRL value to generate the effective SPEC_CTRL value.
KVM writes with the guest's virtualized SPEC_CTRL value to SPEC_CTRL
MSR just before the VMRUN, so it will always have the actual value
even though it doesn’t appear that way in the guest. The guest will
only see the proper value for the SPEC_CTRL register if the guest was
to write to the SPEC_CTRL register again. With Virtual SPEC_CTRL
support, the save area spec_ctrl is properly saved and restored.
So, the guest will always see the proper value when it is read back.

Signed-off-by: Babu Moger <babu.moger@amd.com>
Message-Id: <161188100955.28787.11816849358413330720.stgit@bmoger-ubuntu>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-03-15 04:43:25 -04:00
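
A sketch of the VMRUN-side handling described above (feature flag and
save-area field names follow the patch):

    /*
     * With GuestSpecCtrl, hardware swaps SPEC_CTRL at VMRUN/VMEXIT via
     * the VMCB save area; otherwise KVM writes the MSR around the world
     * switch itself.
     */
    if (static_cpu_has(X86_FEATURE_V_SPEC_CTRL))
        svm->vmcb->save.spec_ctrl = svm->spec_ctrl;
    else
        x86_spec_ctrl_set_guest(svm->spec_ctrl, svm->virt_spec_ctrl);
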
Maxim Levitsky
cc3ed80ae6 KVM: nSVM: always use vmcb01 for vmsave/vmload of guest state
This avoids copying these fields between vmcb01 and vmcb02 on nested
guest entry/exit.

Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-03-15 04:43:24 -04:00
Paolo Bonzini
fb0c4a4fee KVM: SVM: move VMLOAD/VMSAVE to C code
Thanks to the new macros that handle exception handling for SVM
instructions, it is easier to just do the VMLOAD/VMSAVE in C.
This is safe, as shown by the fact that the host reload is
already done outside the assembly source.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-03-15 04:43:23 -04:00
Sean Christopherson
c8781feaf1 KVM: SVM: Skip intercepted PAUSE instructions after emulation
Skip PAUSE after interception to avoid unnecessarily re-executing the
instruction in the guest, e.g. after regaining control post-yield.
This is a benign bug as KVM disables PAUSE interception if filtering is
off, including the case where pause_filter_count is set to zero.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210205005750.3841462-10-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-03-15 04:43:22 -04:00
Sean Christopherson
32c23c7d52 KVM: SVM: Don't manually emulate RDPMC if nrips=0
Remove bizarre code that causes KVM to run RDPMC through the emulator
when nrips is disabled.  Accelerated emulation of RDPMC doesn't rely on
any additional data from the VMCB, and SVM has generic handling for
updating RIP to skip instructions when nrips is disabled.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210205005750.3841462-9-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-03-15 04:43:21 -04:00
Sean Christopherson
c483c45471 KVM: x86: Move RDPMC emulation to common code
Move the entirety of the accelerated RDPMC emulation to x86.c, and assign
the common handler directly to the exit handler array for VMX.  SVM has
bizarre nrips behavior that prevents it from directly invoking the common
handler.  The nrips goofiness will be addressed in a future patch.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210205005750.3841462-8-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-03-15 04:43:20 -04:00
Sean Christopherson
5ff3a351f6 KVM: x86: Move trivial instruction-based exit handlers to common code
Move the trivial exit handlers, e.g. for instructions that KVM
"emulates" as nops, to common x86 code.  Assign the common handlers
directly to the exit handler arrays and drop the vendor trampolines.

Opportunistically use pr_warn_once() where appropriate.

No functional change intended.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210205005750.3841462-7-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-03-15 04:43:19 -04:00
Sean Christopherson
92f9895c14 KVM: x86: Move XSETBV emulation to common code
Move the entirety of XSETBV emulation to x86.c, and assign the
function directly to both VMX's and SVM's exit handlers, i.e. drop the
unnecessary trampolines.

No functional change intended.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210205005750.3841462-6-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-03-15 04:43:18 -04:00
Sean Christopherson
2ac636a6ea KVM: nSVM: Add VMLOAD/VMSAVE helper to deduplicate code
Add another helper layer for VMLOAD+VMSAVE, the code is identical except
for the one line that determines which VMCB is the source and which is
the destination.

No functional change intended.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210205005750.3841462-5-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-03-15 04:43:17 -04:00
Sean Christopherson
3a87c7e0d1 KVM: nSVM: Add helper to synthesize nested VM-Exit without collateral
Add a helper to consolidate boilerplate for nested VM-Exits that don't
provide any data in exit_info_*.

No functional change intended.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210302174515.2812275-3-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-03-15 04:43:16 -04:00
Sean Christopherson
cb6a32c2b8 KVM: x86: Handle triple fault in L2 without killing L1
Synthesize a nested VM-Exit if L2 triggers an emulated triple fault
instead of exiting to userspace, which likely will kill L1.  Any flow
that does KVM_REQ_TRIPLE_FAULT is suspect, but the most common scenario
for L2 killing L1 is if L0 (KVM) intercepts a contributory exception that
is _not_ intercepted by L1.  E.g. if KVM is intercepting #GPs for the
VMware backdoor, a #GP that occurs in L2 while vectoring an injected #DF
will cause KVM to emulate triple fault.

Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Jim Mattson <jmattson@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210302174515.2812275-2-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-03-15 04:43:15 -04:00
Paolo Bonzini
6312975417 KVM: SVM: Pass struct kvm_vcpu to exit handlers (and many, many other places)
Refactor the svm_exit_handlers API to pass @vcpu instead of @svm to
allow directly invoking common x86 exit handlers (in a future patch).
Opportunistically convert an absurd number of instances of 'svm->vcpu'
to direct uses of 'vcpu' to avoid pointless casting.

No functional change intended.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210205005750.3841462-4-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-03-15 04:43:09 -04:00
Paolo Bonzini
2a32a77cef KVM: SVM: merge update_cr0_intercept into svm_set_cr0
The logic of update_cr0_intercept is pointlessly complicated.
All svm_set_cr0 does is compute the effective cr0 and compare it with
the guest value.

Inlining the function and simplifying the condition
clarifies what it is doing.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-03-15 04:42:38 -04:00
Sean Christopherson
11f0cbf0c6 KVM: nSVM: Trace VM-Enter consistency check failures
Use trace_kvm_nested_vmenter_failed() and its macro magic to trace
consistency check failures on nested VMRUN.  Tracing such failures by
running the buggy VMM as a KVM guest is often the only way to get a
precise explanation of why VMRUN failed.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210204000117.3303214-13-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-03-15 04:42:37 -04:00
Krish Sadhukhan
6906e06db9 KVM: nSVM: Add missing checks for reserved bits to svm_set_nested_state()
The path for SVM_SET_NESTED_STATE needs to have the same checks for the CPU
registers, as we have in the VMRUN path for a nested guest. This patch adds
those missing checks to svm_set_nested_state().

Suggested-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Krish Sadhukhan <krish.sadhukhan@oracle.com>
Message-Id: <20201006190654.32305-3-krish.sadhukhan@oracle.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-03-15 04:42:35 -04:00
Paolo Bonzini
c08f390a75 KVM: nSVM: only copy L1 non-VMLOAD/VMSAVE data in svm_set_nested_state()
The VMLOAD/VMSAVE data is not taken from userspace, since it will
not be restored on VMEXIT (it will be copied from VMCB02 to VMCB01).
For clarity, replace the wholesale copy of the VMCB save area
with a copy of that state only.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-03-15 04:42:34 -04:00
Paolo Bonzini
4bb170a543 KVM: nSVM: do not mark all VMCB02 fields dirty on nested vmexit
Since L1 and L2 now use different VMCBs, most of the fields remain the
same in VMCB02 from one L2 run to the next.  Since KVM itself is not
looking at VMCB12's clean field, for now not much can be optimized.
However, in the future we could avoid more copies if the VMCB12's SEG
and DT sections are clean.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-03-15 04:42:33 -04:00
Paolo Bonzini
7ca62d1322 KVM: nSVM: do not mark all VMCB01 fields dirty on nested vmexit
Since L1 and L2 now use different VMCBs, most of the fields remain
the same from one L1 run to the next.  svm_set_cr0 and other functions
called by nested_svm_vmexit already take care of clearing the
corresponding clean bits; only the TSC offset is special.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-03-15 04:42:32 -04:00
Paolo Bonzini
7c3ecfcd31 KVM: nSVM: do not copy vmcb01->control blindly to vmcb02->control
Most fields were going to be overwritten by vmcb12 control fields, or
do not matter at all because they are filled by the processor on vmexit.
Therefore, we need not copy them from vmcb01 to vmcb02 on vmentry.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-03-15 04:42:32 -04:00
Paolo Bonzini
9e8f0fbfff KVM: nSVM: rename functions and variables according to vmcbXY nomenclature
Now that SVM is using a separate vmcb01 and vmcb02 (and also uses the vmcb12
naming) we can give clearer names to functions that write to and read
from those VMCBs.  Likewise, variables and parameters can be renamed
from nested_vmcb to vmcb12.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-03-15 04:42:31 -04:00
Cathy Avery
193015adf4 KVM: nSVM: Track the ASID generation of the vmcb vmrun through the vmcb
This patch moves the asid_generation from the vcpu to the vmcb
in order to track the ASID generation that was active the last
time the vmcb was run. If sd->asid_generation changes between
two runs, the old ASID is invalid and must be changed.

Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Cathy Avery <cavery@redhat.com>
Message-Id: <20210112164313.4204-3-cavery@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-03-15 04:42:30 -04:00
Cathy Avery
af18fa775d KVM: nSVM: Track the physical cpu of the vmcb vmrun through the vmcb
This patch moves the physical cpu tracking from the vcpu
to the vmcb in svm_switch_vmcb. If either vmcb01 or vmcb02
changes physical cpus from one vmrun to the next, the vmcb's
previous cpu is preserved for comparison with the current
cpu and the vmcb is marked dirty if different. This prevents
the processor from using old cached data for a vmcb that may
have been updated on a prior run on a different processor.

It also moves the physical cpu check from svm_vcpu_load
to pre_svm_run as the check only needs to be done at run.

Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Cathy Avery <cavery@redhat.com>
Message-Id: <20210112164313.4204-2-cavery@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-03-15 04:42:29 -04:00
Cathy Avery
4995a3685f KVM: SVM: Use a separate vmcb for the nested L2 guest
svm->vmcb will now point to a separate vmcb for L1 (not nested) or L2
(nested).

The main advantages are removing get_host_vmcb and hsave, in favor of
concepts that are shared with VMX.

We no longer need to stash the L1 registers in hsave while L2
runs, but we need to copy the VMLOAD/VMSAVE registers from VMCB01 to
VMCB02 and back.  This more or less has the same cost, but code-wise
nested_svm_vmloadsave can be reused.

This patch omits several optimizations that are possible:

- for simplicity there is some wholesale copying of vmcb.control areas
which can go away.

- we should be able to better use the VMCB01 and VMCB02 clean bits.

- another possibility is to always use VMCB01 for VMLOAD and VMSAVE,
thus avoiding the copy of VMLOAD/VMSAVE registers from VMCB01 to
VMCB02 and back.

Tested:
kvm-unit-tests
kvm self tests
Loaded fedora nested guest on fedora

Signed-off-by: Cathy Avery <cavery@redhat.com>
Message-Id: <20201011184818.3609-3-cavery@redhat.com>
[Fix conflicts; keep VMCB02 G_PAT up to date whenever guest writes the
 PAT MSR; do not copy CR4 over from VMCB01 as it is not needed anymore; add
 a few more comments. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-03-15 04:42:28 -04:00
Sean Christopherson
6d1b867d04 KVM: SVM: Don't strip the C-bit from CR2 on #PF interception
Don't strip the C-bit from the faulting address on an intercepted #PF,
the address is a virtual address, not a physical address.

Fixes: 0ede79e132 ("KVM: SVM: Clear C-bit from the page fault address")
Cc: stable@vger.kernel.org
Cc: Brijesh Singh <brijesh.singh@amd.com>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210305011101.3597423-13-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-03-15 04:42:26 -04:00
Dongli Zhang
43c11d91fb KVM: x86: to track if L1 is running L2 VM
The new per-cpu stat 'nested_run' is introduced in order to track if an L1
VM is running or has been used to run an L2 VM.

An example of the usage of 'nested_run' is to help the host administrator
easily track if any L1 VM has been used to run an L2 VM. Suppose there is
an issue that may happen with nested virtualization; the administrator will
be able to easily narrow down and confirm whether the issue is due to
nested virtualization via 'nested_run'. For example, whether a fix like
commit 88dddc11a8 ("KVM: nVMX: do not use dangling shadow VMCS after
guest reset") is required.

Cc: Joe Jin <joe.jin@oracle.com>
Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com>
Message-Id: <20210305225747.7682-1-dongli.zhang@oracle.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-03-15 04:28:02 -04:00
Sean Christopherson
99840a7545 KVM: SVM: Connect 'npt' module param to KVM's internal 'npt_enabled'
Directly connect the 'npt' param to the 'npt_enabled' variable so that
runtime adjustments to npt_enabled are reflected in sysfs.  Move the
!PAE restriction to a runtime check to ensure NPT is forced off if the
host is using 2-level paging, and add a comment explicitly stating why
NPT requires a 64-bit kernel or a kernel with PAE enabled.

Opportunistically switch the param to octal permissions.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210305021637.3768573-1-seanjc@google.com>
Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-03-05 08:33:15 -05:00
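
A sketch of the direct hookup (octal permissions as the commit notes; the
PAE restriction becomes a runtime check):

    static bool npt_enabled = true;
    module_param_named(npt, npt_enabled, bool, 0444);

    /*
     * At hardware setup: NPT walks PAE/64-bit format tables, so it
     * cannot work on a host using 2-level (32-bit non-PAE) paging.
     */
    if (!IS_ENABLED(CONFIG_X86_64) && !IS_ENABLED(CONFIG_X86_PAE))
        npt_enabled = false;
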
Babu Moger
9e46f6c6c9 KVM: SVM: Clear the CR4 register on reset
This problem was reported on an SVM guest while executing kexec.
Kexec fails to load the new kernel when the PCID feature is enabled.

When kexec starts loading the new kernel, it starts the process by
resetting the vCPU's and then bringing each vCPU online one by one.
The vCPU reset is supposed to reset all the register states before the
vCPUs are brought online. However, the CR4 register is not reset during
this process. If this register is already set up during the last boot,
all the flags can remain intact. The X86_CR4_PCIDE bit can only be
enabled in long mode. So, it must be enabled much later in SMP
initialization.  Having the X86_CR4_PCIDE bit set during SMP boot can
cause boot failures.

Fix the issue by resetting the CR4 register in init_vmcb().

Signed-off-by: Babu Moger <babu.moger@amd.com>
Message-Id: <161471109108.30811.6392805173629704166.stgit@bmoger-ubuntu>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-03-02 14:39:11 -05:00
Sean Christopherson
2df8d3807c KVM: SVM: Fix nested VM-Exit on #GP interception handling
Fix the interpretation of nested_svm_vmexit()'s return value when
synthesizing a nested VM-Exit after intercepting an SVM instruction while
L2 was running.  The helper returns '0' on success, whereas a return
value of '0' in the exit handler path means "exit to userspace".  The
incorrect return value causes KVM to exit to userspace without filling
the run state, e.g. QEMU logs "KVM: unknown exit, hardware reason 0".

Fixes: 14c2bf81fc ("KVM: SVM: Fix #GP handling for doubly-nested virtualization")
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210224005627.657028-1-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-02-25 05:13:05 -05:00
Paolo Bonzini
d2df592fd8 KVM: nSVM: prepare guest save area while is_guest_mode is true
Right now, enter_svm_guest_mode is calling nested_prepare_vmcb_save and
nested_prepare_vmcb_control.  This results in is_guest_mode being false
until the end of nested_prepare_vmcb_control.

This is a problem because nested_prepare_vmcb_save can in turn cause
changes to the intercepts and these have to be applied to the "host VMCB"
(stored in svm->nested.hsave) and then merged with the VMCB12 intercepts
into svm->vmcb.

In particular, without this change we forget to set the CR0 read and CR0
write intercepts when running a real mode L2 guest with NPT disabled.
The guest is therefore able to see the CR0.PG bit that KVM sets to
enable "paged real mode".  This patch fixes the svm.flat mode_switch
test case with npt=0.  There are no other problematic calls in
nested_prepare_vmcb_save.

is_guest_mode has been set at the end since commit 06fc777269
("KVM: SVM: Activate nested state only when guest state is complete",
2010-04-25).  However, back then KVM didn't grab a different VMCB
when updating the intercepts; it had already copied/merged L1's stuff
to L0's VMCB, and then updated L0's VMCB regardless of is_nested().
Later recalc_intercepts was introduced in commit 384c636843
("KVM: SVM: Add function to recalculate intercept masks", 2011-01-12).
This introduced the bug, because recalc_intercepts now throws away
the intercept manipulations that svm_set_cr0 had done in the meanwhile
to svm->vmcb.

[1] https://lore.kernel.org/kvm/1266493115-28386-1-git-send-email-joerg.roedel@amd.com/

Reviewed-by: Sean Christopherson <seanjc@google.com>
Tested-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-02-22 13:11:56 -05:00
Paolo Bonzini
a04aead144 KVM: nSVM: fix running nested guests when npt=0
In case of npt=0 on host, nSVM needs the same .inject_page_fault tweak
as VMX has, to make sure that shadow mmu faults are injected as vmexits.

It is not clear why this is needed at all, but for now keep the same
code as VMX and we'll fix it for both.

Based on a patch by Maxim Levitsky <mlevitsk@redhat.com>.

Fixes: 7c86663b68 ("KVM: nSVM: inject exceptions via svm_check_nested_events")
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-02-18 07:33:30 -05:00
Maxim Levitsky
954f419ba8 KVM: nSVM: move nested vmrun tracepoint to enter_svm_guest_mode
This way trace will capture all the nested mode entries
(including entries after migration, and from smm)

Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20210217145718.1217358-3-mlevitsk@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-02-18 07:33:30 -05:00
Sean Christopherson
e420333422 KVM: x86: Advertise INVPCID by default
Advertise INVPCID by default (if supported by the host kernel) instead
of having both SVM and VMX opt in.  INVPCID was opt in when it was a
VMX only feature so that KVM wouldn't prematurely advertise support
if/when it showed up in the kernel on AMD hardware.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210212003411.1102677-3-seanjc@google.com>
Reviewed-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-02-18 07:33:29 -05:00
Sean Christopherson
0a8ed2eaac KVM: SVM: Intercept INVPCID when it's disabled to inject #UD
Intercept INVPCID if it's disabled in the guest, even when using NPT,
as KVM needs to inject #UD in this case.

Fixes: 4407a797e9 ("KVM: SVM: Enable INVPCID feature on AMD")
Cc: Babu Moger <babu.moger@amd.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210212003411.1102677-2-seanjc@google.com>
Reviewed-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-02-18 07:33:28 -05:00
Wei Yongjun
2e215216d6 KVM: SVM: Make symbol 'svm_gp_erratum_intercept' static
The sparse tool complains as follows:

arch/x86/kvm/svm/svm.c:204:6: warning:
 symbol 'svm_gp_erratum_intercept' was not declared. Should it be static?

This symbol is not used outside of svm.c, so this
commit marks it static.

Fixes: 82a11e9c6f ("KVM: SVM: Add emulation support for #GP triggered by SVM instructions")
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Message-Id: <20210210075958.1096317-1-weiyongjun1@huawei.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-02-11 08:02:08 -05:00
Paolo Bonzini
996ff5429e KVM: x86: move kvm_inject_gp up from kvm_set_dr to callers
Push the injection of #GP up to the callers, so that they can just use
kvm_complete_insn_gp. __kvm_set_dr is pretty much what the callers can use
together with kvm_complete_insn_gp, so rename it to kvm_set_dr and drop
the old kvm_set_dr wrapper.

This also allows nested VMX code, which really wanted to use __kvm_set_dr,
to use the right function.

While at it, remove the kvm_require_dr() check from the SVM interception.
The APM states:

  All normal exception checks take precedence over the SVM intercepts.

which includes the CR4.DE=1 #UD.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-02-09 08:17:07 -05:00
Sean Christopherson
6f7a343987 KVM: SVM: Remove an unnecessary forward declaration
Drop a defunct forward declaration of svm_complete_interrupts().

No functional change intended.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210205005750.3841462-3-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-02-09 08:17:06 -05:00
Sean Christopherson
e6c804a848 KVM: SVM: Move AVIC vCPU kicking snippet to helper function
Add a helper function to handle kicking non-running vCPUs when sending
virtual IPIs.  A future patch will change SVM's interception functions
to take @vcpu instead of @svm, at which point declaring and modifying
'vcpu' in a case statement is confusing, and potentially dangerous.

No functional change intended.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210205005750.3841462-2-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-02-09 08:17:06 -05:00
Sean Christopherson
ca29e14506 KVM: x86: SEV: Treat C-bit as legal GPA bit regardless of vCPU mode
Rename cr3_lm_rsvd_bits to reserved_gpa_bits, and use it for all GPA
legality checks.  AMD's APM states:

  If the C-bit is an address bit, this bit is masked from the guest
  physical address when it is translated through the nested page tables.

Thus, any access that can conceivably be run through NPT should ignore
the C-bit when checking for validity.

For features that KVM emulates in software, e.g. MTRRs, there is no
clear direction in the APM for how the C-bit should be handled.  For
such cases, follow the SME behavior inasmuch as possible, since SEV is
essentially a VM-specific variant of SME.  For SME, the APM states:

  In this case the upper physical address bits are treated as reserved
  when the feature is enabled except where otherwise indicated.

Collecting the various relevant SME snippets in the APM and cross-
referencing the omissions with Linux kernel code, this leaves MTRRs and
APIC_BASE as the only flows that KVM emulates that should _not_ ignore
the C-bit.

Note, this means the reserved bit checks in the page tables are
technically broken.  This will be remedied in a future patch.

Although the page table checks are technically broken, in practice, it's
all but guaranteed to be irrelevant.  NPT is required for SEV, i.e.
shadowing page tables isn't needed in the common case.  Theoretically,
the checks could be in play for nested NPT, but it's extremely unlikely
that anyone is running nested VMs on SEV, as doing so would require L1
to expose sensitive data to L0, e.g. the entire VMCB.  And if anyone is
running nested VMs, L0 can't read the guest's encrypted memory, i.e. L1
would need to put its NPT in shared memory, in which case the C-bit will
never be set.  Or, L1 could use shadow paging, but again, if L0 needs to
read page tables, e.g. to load PDPTRs, the memory can't be encrypted if
L1 has any expectation of L0 doing the right thing.

Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210204000117.3303214-8-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-02-04 09:27:29 -05:00
Sean Christopherson
bbc2c63ddd KVM: nSVM: Use common GPA helper to check for illegal CR3
Replace an open coded check for an invalid CR3 with its equivalent
helper.

No functional change intended.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210204000117.3303214-7-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-02-04 09:27:28 -05:00
Sean Christopherson
2732be9023 KVM: nSVM: Don't strip host's C-bit from guest's CR3 when reading PDPTRs
Don't clear the SME C-bit when reading a guest PDPTR, as the GPA (CR3) is
in the guest domain.

Barring a bizarre paravirtual use case, this is likely a benign bug.  SME
is not emulated by KVM, loading SEV guest PDPTRs is doomed as KVM can't
use the correct key to read guest memory, and setting guest MAXPHYADDR
higher than the host, i.e. overlapping the C-bit, would cause faults in
the guest.

Note, for SEV guests, stripping the C-bit is technically aligned with CPU
behavior, but for KVM it's the greater of two evils.  Because KVM doesn't
have access to the guest's encryption key, ignoring the C-bit would at
best result in KVM reading garbage.  By keeping the C-bit, KVM will
fail its read (unless userspace creates a memslot with the C-bit set).
The guest will still undoubtedly die, as KVM will use '0' for the PDPTR
value, but that's preferable to interpreting encrypted data as a PDPTR.

Fixes: d0ec49d4de ("kvm/x86/svm: Support Secure Memory Encryption within KVM")
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210204000117.3303214-3-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-02-04 09:27:27 -05:00
Paolo Bonzini
bbefd4fc8f KVM: x86: move kvm_inject_gp up from kvm_set_xcr to callers
Push the injection of #GP up to the callers, so that they can just use
kvm_complete_insn_gp.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-02-04 05:27:37 -05:00
Krish Sadhukhan
04548ed020 KVM: SVM: Replace hard-coded value with #define
Replace the hard-coded value for bit 1 in EFLAGS with the available
#define.
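
The change amounts to a one-line substitution (sketch):

  save->rflags = X86_EFLAGS_FIXED;   /* bit 1; previously the literal 2 */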

Signed-off-by: Krish Sadhukhan <krish.sadhukhan@oracle.com>
Message-Id: <20210203012842.101447-2-krish.sadhukhan@oracle.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-02-04 05:27:35 -05:00
Michael Roth
a7fc06dd2f KVM: SVM: use .prepare_guest_switch() to handle CPU register save/setup
Currently we save host state like user-visible host MSRs, and do some
initial guest register setup for MSR_TSC_AUX and MSR_AMD64_TSC_RATIO
in svm_vcpu_load(). Defer this until just before we enter the guest by
moving the handling to kvm_x86_ops.prepare_guest_switch() similarly to
how it is done for the VMX implementation.

Additionally, since handling of saving/restoring host user MSRs is the
same both with/without SEV-ES enabled, move that handling to common
code.

Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Michael Roth <michael.roth@amd.com>
Message-Id: <20210202190126.2185715-4-michael.roth@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-02-04 05:27:35 -05:00
Michael Roth
553cc15f6e KVM: SVM: remove unneeded fields from host_save_users_msrs
Now that the set of host user MSRs that need to be individually
saved/restored is the same with/without SEV-ES, we can drop the
.sev_es_restored flag and just iterate through the list unconditionally
for both cases. A subsequent patch can then move these loops to a
common path.

Signed-off-by: Michael Roth <michael.roth@amd.com>
Message-Id: <20210202190126.2185715-3-michael.roth@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-02-04 05:27:34 -05:00
Michael Roth
e79b91bb3c KVM: SVM: use vmsave/vmload for saving/restoring additional host state
Using a guest workload which simply issues 'hlt' in a tight loop to
generate VMEXITs, it was observed (on a recent EPYC processor) that a
significant amount of the VMEXIT overhead measured on the host was the
result of MSR reads/writes in svm_vcpu_load/svm_vcpu_put according to
perf:

  67.49%--kvm_arch_vcpu_ioctl_run
          |
          |--23.13%--vcpu_put
          |          kvm_arch_vcpu_put
          |          |
          |          |--21.31%--native_write_msr
          |          |
          |           --1.27%--svm_set_cr4
          |
          |--16.11%--vcpu_load
          |          |
          |           --15.58%--kvm_arch_vcpu_load
          |                     |
          |                     |--13.97%--svm_set_cr4
          |                     |          |
          |                     |          |--12.64%--native_read_msr

Most of these MSRs relate to 'syscall'/'sysenter' and segment bases, and
can be saved/restored using 'vmsave'/'vmload' instructions rather than
explicit MSR reads/writes. In doing so there is a significant reduction
in the svm_vcpu_load/svm_vcpu_put overhead measured for the above
workload:

  50.92%--kvm_arch_vcpu_ioctl_run
          |
          |--19.28%--disable_nmi_singlestep
          |
          |--13.68%--vcpu_load
          |          kvm_arch_vcpu_load
          |          |
          |          |--9.19%--svm_set_cr4
          |          |          |
          |          |           --6.44%--native_read_msr
          |          |
          |           --3.55%--native_write_msr
          |
          |--6.05%--kvm_inject_nmi
          |--2.80%--kvm_sev_es_mmio_read
          |--2.19%--vcpu_put
          |          |
          |           --1.25%--kvm_arch_vcpu_put
          |                     native_write_msr

Quantifying this further, if we look at the raw cycle counts for a
normal iteration of the above workload (according to 'rdtscp'),
kvm_arch_vcpu_ioctl_run() takes ~4600 cycles from start to finish with
the current behavior. Using 'vmsave'/'vmload', this is reduced to
~2800 cycles, a savings of 39%.

While this approach doesn't seem to manifest in any noticeable
improvement for more realistic workloads like UnixBench, netperf, and
kernel builds, likely due to their exit paths generally involving IO
with comparatively high latencies, it does improve overall overhead
of KVM_RUN significantly, which may still be noticeable for certain
situations. It also simplifies some aspects of the code.

With this change, explicit save/restore is no longer needed for the
following host MSRs, since they are documented[1] as being part of the
VMCB State Save Area:

  MSR_STAR, MSR_LSTAR, MSR_CSTAR,
  MSR_SYSCALL_MASK, MSR_KERNEL_GS_BASE,
  MSR_IA32_SYSENTER_CS,
  MSR_IA32_SYSENTER_ESP,
  MSR_IA32_SYSENTER_EIP,
  MSR_FS_BASE, MSR_GS_BASE

and only the following MSR needs individual handling in
svm_vcpu_put/svm_vcpu_load:

  MSR_TSC_AUX

We could drop the host_save_user_msrs array/loop and instead handle
MSR read/write of MSR_TSC_AUX directly, but we leave that for now as
a potential follow-up.

Since 'vmsave'/'vmload' also handles the LDTR and FS/GS segment
registers (and associated hidden state)[2], some of the code
previously used to handle this is no longer needed, so we drop it
as well.

The first public release of the SVM spec[3] also documents the same
handling for the host state in question, so we make these changes
unconditionally.

Also worth noting is that we 'vmsave' to the same page that is
subsequently used by 'vmrun' to record some additional host state. This
is okay, since, in accordance with the spec[2], the additional state
written to the page by 'vmrun' does not overwrite any fields written by
'vmsave'. This has also been confirmed through testing (for the above
CPU, at least).
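
A simplified sketch of the mechanism (the real code is split across
svm_vcpu_load/svm_vcpu_put and vmenter.S):

  /* VMSAVE stores the syscall/sysenter MSRs plus FS/GS/LDTR/TR
   * (including hidden segment state) to the save area addressed by
   * RAX; VMLOAD is the inverse, replacing per-MSR rdmsr/wrmsr. */
  asm volatile("vmsave %%rax"
               : : "a" (__sme_page_pa(sd->save_area)) : "memory");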

[1] AMD64 Architecture Programmer's Manual, Rev 3.33, Volume 2, Appendix B, Table B-2
[2] AMD64 Architecture Programmer's Manual, Rev 3.31, Volume 3, Chapter 4, VMSAVE/VMLOAD
[3] Secure Virtual Machine Architecture Reference Manual, Rev 3.01

Suggested-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Michael Roth <michael.roth@amd.com>
Message-Id: <20210202190126.2185715-2-michael.roth@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-02-04 05:27:34 -05:00
Sean Christopherson
35a7831912 KVM: SVM: Use asm goto to handle unexpected #UD on SVM instructions
Add svm_asm*() macros, a la the existing vmx_asm*() macros, to handle
faults on SVM instructions instead of using the generic __ex(), a.k.a.
__kvm_handle_fault_on_reboot().  Using asm goto generates slightly
better code as it eliminates the in-line JMP+CALL sequences that are
needed by __kvm_handle_fault_on_reboot() to avoid triggering BUG()
from fixup (which generates bad stack traces).

Using SVM specific macros also drops the last user of __ex() and the
last asm linkage to kvm_spurious_fault(), and adds a helper for
VMSAVE, which may gain an addition call site in the future (as part
of optimizing the SVM context switching).
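
The rough shape of such a macro (simplified, not verbatim from the
patch):

  #define svm_asm1(insn, op1, clobber...)                        \
  do {                                                           \
          asm goto("1: " __stringify(insn) " %0\n\t"             \
                   _ASM_EXTABLE(1b, %l[fault])                   \
                   :: op1 : clobber : fault);                    \
          return;                                                \
  fault:                                                         \
          kvm_spurious_fault();                                  \
  } while (0)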

Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20201231002702.2223707-8-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-02-04 05:27:33 -05:00
Jason Baron
b6a7cc3544 KVM: X86: prepend vmx/svm prefix to additional kvm_x86_ops functions
A subsequent patch introduces macros in preparation for simplifying the
definition for vmx_x86_ops and svm_x86_ops. Making the naming more uniform
expands the coverage of the macros. Add vmx/svm prefix to the following
functions: update_exception_bitmap(), enable_nmi_window(),
enable_irq_window(), update_cr8_intercept() and enable_smi_window().

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Sean Christopherson <seanjc@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Jason Baron <jbaron@akamai.com>
Message-Id: <ed594696f8e2c2b2bfc747504cee9bbb2a269300.1610680941.git.jbaron@akamai.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-02-04 05:27:29 -05:00
Wei Huang
14c2bf81fc KVM: SVM: Fix #GP handling for doubly-nested virtualization
In the case of nested on nested (L0, L1, L2 are all hypervisors),
we do not support emulation of the vVMLOAD/VMSAVE feature; instead,
the L0 hypervisor can inject the proper #VMEXIT to inform L1 of what
is happening, so that L1 can avoid invoking the #GP workaround.  For
this reason we turn on the guest VM's X86_FEATURE_SVME_ADDR_CHK bit
for KVM running inside a VM, to receive the notification and change
behavior.

Similarly, we check whether the vcpu is in guest mode before emulating
the vmware-backdoor instructions. In the nested-on-nested case, we
let the guest handle it.

Co-developed-by: Bandan Das <bsd@redhat.com>
Signed-off-by: Bandan Das <bsd@redhat.com>
Signed-off-by: Wei Huang <wei.huang2@amd.com>
Tested-by: Maxim Levitsky <mlevitsk@redhat.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20210126081831.570253-5-wei.huang2@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-02-04 05:27:28 -05:00
Wei Huang
3b9c723ed7 KVM: SVM: Add support for SVM instruction address check change
New AMD CPUs have a change that checks #VMEXIT intercept on special SVM
instructions before checking their EAX against reserved memory regions.
This change is indicated by CPUID_0x8000000A_EDX[28]. If it is 1, #VMEXIT
is triggered before #GP. KVM doesn't need to intercept and emulate #GP
faults as #GP is supposed to be triggered.

Co-developed-by: Bandan Das <bsd@redhat.com>
Signed-off-by: Bandan Das <bsd@redhat.com>
Signed-off-by: Wei Huang <wei.huang2@amd.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20210126081831.570253-4-wei.huang2@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-02-04 05:27:28 -05:00
Bandan Das
82a11e9c6f KVM: SVM: Add emulation support for #GP triggered by SVM instructions
While running SVM related instructions (VMRUN/VMSAVE/VMLOAD), some AMD
CPUs check EAX against reserved memory regions (e.g. SMM memory on host)
before checking VMCB's instruction intercept. If EAX falls into such
memory areas, #GP is triggered before VMEXIT. This causes problems under
nested virtualization. To solve this problem, KVM needs to trap #GP and
check the instructions triggering #GP. For VM execution instructions,
KVM emulates these instructions.

Co-developed-by: Wei Huang <wei.huang2@amd.com>
Signed-off-by: Wei Huang <wei.huang2@amd.com>
Signed-off-by: Bandan Das <bsd@redhat.com>
Message-Id: <20210126081831.570253-3-wei.huang2@amd.com>
[Conditionally enable #GP intercept. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-02-04 05:27:28 -05:00
Chenyi Qiang
9a3ecd5e2a KVM: X86: Rename DR6_INIT to DR6_ACTIVE_LOW
DR6_INIT contains the 1-reserved bits as well as the bit that is cleared
to 0 when the condition (e.g. RTM) happens. The value can be used to
initialize dr6 and also be the XOR mask between the #DB exit
qualification (or payload) and DR6.

Since DR6_INIT is used as an initial value only once, rename it to
DR6_ACTIVE_LOW and apply it in other places, which makes the incoming
bus lock debug exception changes simpler.
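
A sketch of the renamed constant and its two roles (bit values per the
SDM; DR6_FIXED_1 per the note below):

  #define DR6_ACTIVE_LOW  0xffff0ff0UL      /* formerly DR6_INIT */
  #define DR6_FIXED_1     (DR6_ACTIVE_LOW & ~DR6_VOLATILE)

  vcpu->arch.dr6 = DR6_ACTIVE_LOW;          /* role 1: initial value */
  payload = dr6 ^ DR6_ACTIVE_LOW;           /* role 2: XOR mask between
                                               #DB payload and DR6 */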

Signed-off-by: Chenyi Qiang <chenyi.qiang@intel.com>
Message-Id: <20210202090433.13441-2-chenyi.qiang@intel.com>
[Define DR6_FIXED_1 from DR6_ACTIVE_LOW and DR6_VOLATILE. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-02-04 05:27:27 -05:00
Brijesh Singh
2c07ded064 KVM/SVM: add support for SEV attestation command
SEV FW version >= 0.23 added a new command that can be used to query
the attestation report containing the SHA-256 digest of the guest memory
encrypted through the KVM_SEV_LAUNCH_UPDATE_{DATA, VMSA} commands; the
report is signed with the Platform Endorsement Key (PEK).

See the SEV FW API spec section 6.8 for more details.

Note that there already exists a command (KVM_SEV_LAUNCH_MEASURE) that
can be used to get the SHA-256 digest. The main difference between
KVM_SEV_LAUNCH_MEASURE and KVM_SEV_ATTESTATION_REPORT is that the latter
can be called while the guest is running, and the measurement value is
signed with the PEK.
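
A hypothetical userspace sketch of issuing the new command (field names
are recalled from the uapi header and should be treated as illustrative):

  struct kvm_sev_attestation_report report = {
          .uaddr = (unsigned long)report_buf,  /* output buffer */
          .len   = report_buf_len,
          /* .mnonce (guest-supplied nonce) elided */
  };
  struct kvm_sev_cmd cmd = {
          .id   = KVM_SEV_ATTESTATION_REPORT,
          .data = (unsigned long)&report,
  };

  ioctl(vm_fd, KVM_MEMORY_ENCRYPT_OP, &cmd);   /* issued on the VM fd */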

Cc: James Bottomley <jejb@linux.ibm.com>
Cc: Tom Lendacky <Thomas.Lendacky@amd.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Sean Christopherson <seanjc@google.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: John Allen <john.allen@amd.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: linux-crypto@vger.kernel.org
Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com>
Acked-by: David Rientjes <rientjes@google.com>
Tested-by: James Bottomley <jejb@linux.ibm.com>
Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Message-Id: <20210104151749.30248-1-brijesh.singh@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-02-04 05:27:20 -05:00
Paolo Bonzini
c1c35cf78b KVM: x86: cleanup CR3 reserved bits checks
If not in long mode, the low bits of CR3 are reserved but not enforced to
be zero, so remove those checks.  If in long mode, however, the MBZ bits
extend down to the highest physical address bit of the guest, excluding
the encryption bit.

Make the checks consistent with the above, and match them between
nested_vmcb_checks and KVM_SET_SREGS.

Cc: stable@vger.kernel.org
Fixes: 761e416934 ("KVM: nSVM: Check that MBZ bits in CR3 and CR4 are not set on vmrun of nested guests")
Fixes: a780a3ea62 ("KVM: X86: Fix reserved bits check for MOV to CR3")
Reviewed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-02-03 04:30:38 -05:00
Sean Christopherson
ccd85d90ce KVM: SVM: Treat SVM as unsupported when running as an SEV guest
Don't let KVM load when running as an SEV guest, regardless of what
CPUID says.  Memory is encrypted with a key that is not accessible to
the host (L0), thus it's impossible for L0 to emulate SVM, e.g. it'll
see garbage when reading the VMCB.

Technically, KVM could decrypt all memory that needs to be accessible to
L0 and use shadow paging so that L0 does not need to shadow NPT, but
exposing such information to L0 largely defeats the purpose of running as
an SEV guest.  This can always be revisited if someone comes up with a
use case for running VMs inside SEV guests.
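
A sketch of the resulting check during hardware setup:

  /* Refuse to load: L0 cannot read our encrypted memory, so SVM
   * cannot work regardless of what CPUID advertises. */
  if (sev_active()) {
          pr_info("KVM is unsupported when running as an SEV guest\n");
          return false;
  }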

Note, VMLOAD, VMRUN, etc... will also #GP on GPAs with C-bit set, i.e. KVM
is doomed even if the SEV guest is debuggable and the hypervisor is willing
to decrypt the VMCB.  This may or may not be fixed on CPUs that have the
SVME_ADDR_CHK fix.

Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210202212017.2486595-1-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-02-03 04:30:37 -05:00
Peter Gonda
19a23da539 Fix unsynchronized access to sev members through svm_register_enc_region
Grab kvm->lock before pinning memory when registering an encrypted
region; sev_pin_memory() relies on kvm->lock being held to ensure
correctness when checking and updating the number of pinned pages.

Add a lockdep assertion to help prevent future regressions.
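
A simplified sketch of the pattern (error handling trimmed):

  mutex_lock(&kvm->lock);
  region->pages = sev_pin_memory(kvm, range->addr, range->size,
                                 &region->npages, 1);
  if (!region->pages) {
          ret = -ENOMEM;
          mutex_unlock(&kvm->lock);   /* unlock added on the error path */
          goto e_free;
  }

  /* ... and inside sev_pin_memory(), the new assertion: */
  lockdep_assert_held(&kvm->lock);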

Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: Brijesh Singh <brijesh.singh@amd.com>
Cc: Sean Christopherson <seanjc@google.com>
Cc: x86@kernel.org
Cc: kvm@vger.kernel.org
Cc: stable@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Fixes: 1e80fdc09d ("KVM: SVM: Pin guest memory when SEV is active")
Signed-off-by: Peter Gonda <pgonda@google.com>

V2
 - Fix up patch description
 - Correct file paths svm.c -> sev.c
 - Add unlock of kvm->lock on sev_pin_memory error

V1
 - https://lore.kernel.org/kvm/20210126185431.1824530-1-pgonda@google.com/

Message-Id: <20210127161524.2832400-1-pgonda@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-01-28 13:03:14 -05:00
Paolo Bonzini
9a78e15802 KVM: x86: allow KVM_REQ_GET_NESTED_STATE_PAGES outside guest mode for VMX
VMX also uses KVM_REQ_GET_NESTED_STATE_PAGES for the Hyper-V eVMCS,
which may need to be loaded outside guest mode.  Therefore we cannot
WARN in that case.

However, that part of nested_get_vmcs12_pages is _not_ needed at
vmentry time.  Split it out of KVM_REQ_GET_NESTED_STATE_PAGES handling,
so that both vmentry and migration (and in the latter case, independent
of is_guest_mode) do the parts that are needed.

Cc: <stable@vger.kernel.org> # 5.10.x: f2c7ef3ba: KVM: nSVM: cancel KVM_REQ_GET_NESTED_STATE_PAGES
Cc: <stable@vger.kernel.org> # 5.10.x
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-01-25 18:54:09 -05:00
Sean Christopherson
250091409a KVM: SVM: Unconditionally sync GPRs to GHCB on VMRUN of SEV-ES guest
Drop the per-GPR dirty checks when synchronizing GPRs to the GHCB, the
GPRs' dirty bits are set from time zero and never cleared, i.e. will
always be seen as dirty.  The obvious alternative would be to clear
the dirty bits when appropriate, but removing the dirty checks is
desirable as it allows reverting GPR dirty+available tracking, which
adds overhead to all flavors of x86 VMs.

Note, unconditionally writing the GPRs in the GHCB is tacitly allowed
by the GHCB spec, which allows the hypervisor (or guest) to provide
unnecessary info; it's the guest's responsibility to consume only what
it needs (the hypervisor is untrusted after all).

  The guest and hypervisor can supply additional state if desired but
  must not rely on that additional state being provided.
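
A sketch of the simplified sync (only the registers the GHCB protocol
carries):

  /* Always write these; per the spec the guest must ignore anything
   * it did not ask for. */
  ghcb_set_rax(ghcb, vcpu->arch.regs[VCPU_REGS_RAX]);
  ghcb_set_rbx(ghcb, vcpu->arch.regs[VCPU_REGS_RBX]);
  ghcb_set_rcx(ghcb, vcpu->arch.regs[VCPU_REGS_RCX]);
  ghcb_set_rdx(ghcb, vcpu->arch.regs[VCPU_REGS_RDX]);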

Cc: Brijesh Singh <brijesh.singh@amd.com>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Fixes: 291bd20d5d ("KVM: SVM: Add initial support for a VMGEXIT VMEXIT")
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210122235049.3107620-2-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-01-25 18:52:09 -05:00
Lorenzo Brescia
d95df95106 kvm: tracing: Fix unmatched kvm_entry and kvm_exit events
On VMX, if we exit and then re-enter immediately without leaving
the vmx_vcpu_run() function, the kvm_entry event is not logged.
That means we will see one (or more) kvm_exit, without its (their)
corresponding kvm_entry, as shown here:

 CPU-1979 [002] 89.871187: kvm_entry: vcpu 1
 CPU-1979 [002] 89.871218: kvm_exit:  reason MSR_WRITE
 CPU-1979 [002] 89.871259: kvm_exit:  reason MSR_WRITE

It also seems possible for a kvm_entry event to be logged, but then
we leave vmx_vcpu_run() right away (if vmx->emulation_required is
true). In this case, we will have a spurious kvm_entry event in the
trace.

Fix these situations by moving trace_kvm_entry() inside vmx_vcpu_run()
(where trace_kvm_exit() already is).

A trace obtained with this patch applied looks like this:

 CPU-14295 [000] 8388.395387: kvm_entry: vcpu 0
 CPU-14295 [000] 8388.395392: kvm_exit:  reason MSR_WRITE
 CPU-14295 [000] 8388.395393: kvm_entry: vcpu 0
 CPU-14295 [000] 8388.395503: kvm_exit:  reason EXTERNAL_INTERRUPT

Of course, not calling trace_kvm_entry() in common x86 code any
longer means that we need to adjust the SVM side of things too.
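
A sketch of the resulting placement (vmx shown; the svm side is
analogous):

  static fastpath_t vmx_vcpu_run(struct kvm_vcpu *vcpu)
  {
          /* moved here from common code so every (re)entry is traced
           * next to its matching kvm_exit event */
          trace_kvm_entry(vcpu);

          /* ... enter the guest ... */
  }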

Signed-off-by: Lorenzo Brescia <lorenzo.brescia@edu.unito.it>
Signed-off-by: Dario Faggioli <dfaggioli@suse.com>
Message-Id: <160873470698.11652.13483635328769030605.stgit@Wayrath>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-01-25 18:52:08 -05:00
Tom Lendacky
647daca25d KVM: SVM: Add support for booting APs in an SEV-ES guest
Typically under KVM, an AP is booted using the INIT-SIPI-SIPI sequence,
where the guest vCPU register state is updated and then VMRUN is issued
on the vCPU to begin execution of the AP. For an SEV-ES guest, this
won't work because
the guest register state is encrypted.

Following the GHCB specification, the hypervisor must not alter the guest
register state, so KVM must track an AP/vCPU boot. Should the guest want
to park the AP, it must use the AP Reset Hold exit event in place of, for
example, a HLT loop.

First AP boot (first INIT-SIPI-SIPI sequence):
  Execute the AP (vCPU) as it was initialized and measured by the SEV-ES
  support. It is up to the guest to transfer control of the AP to the
  proper location.

Subsequent AP boot:
  KVM will expect to receive an AP Reset Hold exit event indicating that
  the vCPU is being parked and will require an INIT-SIPI-SIPI sequence to
  awaken it. When the AP Reset Hold exit event is received, KVM will place
  the vCPU into a simulated HLT mode. Upon receiving the INIT-SIPI-SIPI
  sequence, KVM will make the vCPU runnable. It is again up to the guest
  to then transfer control of the AP to the proper location.

  To differentiate between an actual HLT and an AP Reset Hold, a new MP
  state is introduced, KVM_MP_STATE_AP_RESET_HOLD, which the vCPU is
  placed in upon receiving the AP Reset Hold exit event. Additionally, to
  communicate the AP Reset Hold exit event up to userspace (if needed), a
  new exit reason is introduced, KVM_EXIT_AP_RESET_HOLD.

A new x86 ops function is introduced, vcpu_deliver_sipi_vector, in order
to accomplish AP booting. For VMX, vcpu_deliver_sipi_vector is set to the
original SIPI delivery function, kvm_vcpu_deliver_sipi_vector(). SVM adds
a new function that, for non SEV-ES guests, invokes the original SIPI
delivery function, kvm_vcpu_deliver_sipi_vector(), but for SEV-ES guests,
implements the logic above.
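
A sketch of the VMGEXIT side of this (simplified; the exit-code name
follows the GHCB spec):

  case SVM_VMGEXIT_AP_HLT_LOOP:
          /* Park the vCPU in KVM_MP_STATE_AP_RESET_HOLD until an
           * INIT-SIPI-SIPI sequence makes it runnable again. */
          ret = kvm_emulate_ap_reset_hold(&svm->vcpu);
          break;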

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Message-Id: <e8fbebe8eb161ceaabdad7c01a5859a78b424d5e.1609791600.git.thomas.lendacky@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-01-07 18:11:37 -05:00
Maxim Levitsky
f2c7ef3ba9 KVM: nSVM: cancel KVM_REQ_GET_NESTED_STATE_PAGES on nested vmexit
It is possible to exit nested guest mode, entered by
svm_set_nested_state, prior to the first VM entry into it (e.g. due to
a pending event), if the nested run was not pending during the
migration.

In this case we must not switch to the nested msr permission bitmap.
Also add a warning to catch similar cases in the future.

Fixes: a7d5c7ce41 ("KVM: nSVM: delay MSR permission processing to first nested VM run")

Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20210107093854.882483-2-mlevitsk@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-01-07 18:11:35 -05:00
Maxim Levitsky
56fe28de8c KVM: nSVM: mark vmcb as dirty when forcibly leaving the guest mode
We overwrite most of the vmcb fields while doing so, so we must
mark it as dirty.

Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20210107093854.882483-5-mlevitsk@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-01-07 18:11:34 -05:00
Maxim Levitsky
81f76adad5 KVM: nSVM: correctly restore nested_run_pending on migration
The code to store it on migration exists, but no code was restoring it.

One of the side effects of fixing this is that L1->L2 injected events
are no longer lost when migration happens with nested run pending.

Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20210107093854.882483-3-mlevitsk@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-01-07 18:11:33 -05:00
Uros Bizjak
52782d5b63 KVM/SVM: Remove leftover __svm_vcpu_run prototype from svm.c
Commit 16809ecdc1 moved the __svm_vcpu_run prototype to svm.h,
but forgot to remove the original from svm.c.

Fixes: 16809ecdc1 ("KVM: SVM: Provide an updated VMRUN invocation for SEV-ES guests")
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
Message-Id: <20201220200339.65115-1-ubizjak@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-01-07 18:07:28 -05:00
Nathan Chancellor
f65cf84ee7 KVM: SVM: Add register operand to vmsave call in sev_es_vcpu_load
When using LLVM's integrated assembler (LLVM_IAS=1) while building
x86_64_defconfig + CONFIG_KVM=y + CONFIG_KVM_AMD=y, the following build
error occurs:

 $ make LLVM=1 LLVM_IAS=1 arch/x86/kvm/svm/sev.o
 arch/x86/kvm/svm/sev.c:2004:15: error: too few operands for instruction
         asm volatile(__ex("vmsave") : : "a" (__sme_page_pa(sd->save_area)) : "memory");
                      ^
 arch/x86/kvm/svm/sev.c:28:17: note: expanded from macro '__ex'
 #define __ex(x) __kvm_handle_fault_on_reboot(x)
                 ^
 ./arch/x86/include/asm/kvm_host.h:1646:10: note: expanded from macro '__kvm_handle_fault_on_reboot'
         "666: \n\t"                                                     \
                 ^
 <inline asm>:2:2: note: instantiated into assembly here
         vmsave
         ^
 1 error generated.

This happens because LLVM currently does not support calling vmsave
without the fixed register operand (%rax for 64-bit and %eax for
32-bit). This will be fixed in LLVM 12, but the kernel currently
supports LLVM 10.0.1 and newer, so this needs to be handled.

Add the proper register using the _ASM_AX macro, which matches the
vmsave call in vmenter.S.
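
The call with the explicit register operand looks like this:

  asm volatile(__ex("vmsave %%" _ASM_AX)
               : : "a" (__sme_page_pa(sd->save_area)) : "memory");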

Fixes: 861377730a ("KVM: SVM: Provide support for SEV-ES vCPU loading")
Link: https://reviews.llvm.org/D93524
Link: https://github.com/ClangBuiltLinux/linux/issues/1216
Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
Message-Id: <20201219063711.3526947-1-natechancellor@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-01-07 18:07:27 -05:00
Paolo Bonzini
bc351f0726 Merge branch 'kvm-master' into kvm-next
Fixes to get_mmio_spte, destined to 5.10 stable branch.
2021-01-07 18:06:52 -05:00
Paolo Bonzini
d45f89f743 KVM: SVM: fix 32-bit compilation
VCPU_REGS_R8...VCPU_REGS_R15 are not defined on 32-bit x86,
so cull them from the synchronization of the VMSA.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-12-16 13:08:21 -05:00
Tom Lendacky
8640ca588b KVM: SVM: Add AP_JUMP_TABLE support in prep for AP booting
The GHCB specification requires the hypervisor to save the address of an
AP Jump Table so that, for example, vCPUs that have been parked by UEFI
can be started by the OS. Provide support for the AP Jump Table set/get
exit code.

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-12-15 12:47:53 -05:00
Tom Lendacky
ad73109ae7 KVM: SVM: Provide support to launch and run an SEV-ES guest
An SEV-ES guest is started by invoking a new SEV initialization ioctl,
KVM_SEV_ES_INIT. This identifies the guest as an SEV-ES guest, which is
used to drive the appropriate ASID allocation, VMSA encryption, etc.

Before being able to run an SEV-ES vCPU, the vCPU VMSA must be encrypted
and measured. This is done using the LAUNCH_UPDATE_VMSA command after all
calls to LAUNCH_UPDATE_DATA have been performed, but before LAUNCH_MEASURE
has been performed. In order to establish the encrypted VMSA, the current
(traditional) VMSA and the GPRs are synced to the page that will hold the
encrypted VMSA and then LAUNCH_UPDATE_VMSA is invoked. The vCPU is then
marked as having protected guest state.

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Message-Id: <e9643245adb809caf3a87c09997926d2f3d6ff41.1607620209.git.thomas.lendacky@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-12-15 05:21:00 -05:00
Tom Lendacky
16809ecdc1 KVM: SVM: Provide an updated VMRUN invocation for SEV-ES guests
The run sequence is different for an SEV-ES guest compared to a legacy or
even an SEV guest. The guest vCPU register state of an SEV-ES guest will
be restored on VMRUN and saved on VMEXIT. There is no need to restore the
guest registers directly and through VMLOAD before VMRUN and no need to
save the guest registers directly and through VMSAVE on VMEXIT.

Update the svm_vcpu_run() function to skip register state saving and
restoring and provide an alternative function for running an SEV-ES guest
in vmenter.S

Additionally, certain host state is restored across an SEV-ES VMRUN. As
a result certain register states are not required to be restored upon
VMEXIT (e.g. FS, GS, etc.), so only do that if the guest is not an SEV-ES
guest.

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Message-Id: <fb1c66d32f2194e171b95fc1a8affd6d326e10c1.1607620209.git.thomas.lendacky@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-12-15 05:20:59 -05:00
Tom Lendacky
861377730a KVM: SVM: Provide support for SEV-ES vCPU loading
An SEV-ES vCPU has additional VMCB vCPU load/put requirements. SEV-ES
hardware will restore certain registers on VMEXIT, but not save them on
VMRUN (see Table B-3 and Table B-4 of the AMD64 APM Volume 2), so make the
following changes:

General vCPU load changes:
  - During vCPU loading, perform a VMSAVE to the per-CPU SVM save area and
    save the current values of XCR0, XSS and PKRU to the per-CPU SVM save
    area as these registers will be restored on VMEXIT.

General vCPU put changes:
  - Do not attempt to restore registers that SEV-ES hardware has already
    restored on VMEXIT.

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Message-Id: <019390e9cb5e93cd73014fa5a040c17d42588733.1607620209.git.thomas.lendacky@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-12-15 05:20:59 -05:00
Tom Lendacky
376c6d2850 KVM: SVM: Provide support for SEV-ES vCPU creation/loading
An SEV-ES vCPU has additional VMCB initialization requirements for
vCPU creation and vCPU load/put. These include:

General VMCB initialization changes:
  - Set a VMCB control bit to enable SEV-ES support on the vCPU.
  - Set the VMCB encrypted VM save area address.
  - CRx registers are part of the encrypted register state and cannot be
    updated. Remove the CRx register read and write intercepts and replace
    them with CRx register write traps to track the CRx register values.
  - Certain MSR values are part of the encrypted register state and cannot
    be updated. Remove certain MSR intercepts (EFER, CR_PAT, etc.).
  - Remove the #GP intercept (no support for "enable_vmware_backdoor").
  - Remove the XSETBV intercept since the hypervisor cannot modify XCR0.

General vCPU creation changes:
  - Set the initial GHCB gpa value as per the GHCB specification.

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Message-Id: <3a8aef366416eddd5556dfa3fdc212aafa1ad0a2.1607620209.git.thomas.lendacky@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-12-15 05:20:58 -05:00
Tom Lendacky
80675b3ad4 KVM: SVM: Update ASID allocation to support SEV-ES guests
SEV and SEV-ES guests each have dedicated ASID ranges. Update the ASID
allocation routine to return an ASID in the respective range.

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Message-Id: <d7aed505e31e3954268b2015bb60a1486269c780.1607620209.git.thomas.lendacky@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-12-15 05:20:57 -05:00
Tom Lendacky
85ca8be938 KVM: SVM: Set the encryption mask for the SVM host save area
The SVM host save area is used to restore some host state on VMEXIT of an
SEV-ES guest. After allocating the save area, clear it and add the
encryption mask to the SVM host save area physical address that is
programmed into the VM_HSAVE_PA MSR.

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Message-Id: <b77aa28af6d7f1a0cb545959e08d6dc75e0c3cba.1607620209.git.thomas.lendacky@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-12-15 05:20:57 -05:00
Tom Lendacky
4444dfe405 KVM: SVM: Add NMI support for an SEV-ES guest
The GHCB specification defines how NMIs are to be handled for an SEV-ES
guest. To detect the completion of an NMI the hypervisor must not
intercept the IRET instruction (because a #VC while running the NMI will
issue an IRET) and, instead, must receive an NMI Complete exit event from
the guest.

Update the KVM support for detecting the completion of NMIs in the guest
to follow the GHCB specification. When an SEV-ES guest is active, the
IRET instruction will no longer be intercepted. Now, when the NMI Complete
exit event is received, the iret_interception() function will be called
to simulate the completion of the NMI.
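
A sketch of the intercept-setup side (simplified):

  /* For SEV-ES guests the NMI window is closed by the NMI Complete
   * VMGEXIT instead of an intercepted IRET. */
  if (!sev_es_guest(svm->vcpu.kvm))
          svm_set_intercept(svm, INTERCEPT_IRET);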

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Message-Id: <5ea3dd69b8d4396cefdc9048ebc1ab7caa70a847.1607620209.git.thomas.lendacky@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-12-15 05:20:56 -05:00
Tom Lendacky
ed02b21309 KVM: SVM: Guest FPU state save/restore not needed for SEV-ES guest
The guest FPU state is automatically restored on VMRUN and saved on VMEXIT
by the hardware, so there is no reason to do this in KVM. Eliminate the
allocation of the guest_fpu save area and key off that to skip operations
related to the guest FPU state.

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Message-Id: <173e429b4d0d962c6a443c4553ffdaf31b7665a4.1607620209.git.thomas.lendacky@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-12-15 05:20:56 -05:00
Tom Lendacky
5719455fbd KVM: SVM: Do not report support for SMM for an SEV-ES guest
SEV-ES guests do not currently support SMM. Update the has_emulated_msr()
kvm_x86_ops function to take a struct kvm parameter so that the capability
can be reported at a VM level.

Since this op is also called during KVM initialization and before a struct
kvm instance is available, comments will be added to each implementation
of has_emulated_msr() to indicate the kvm parameter can be null.

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Message-Id: <75de5138e33b945d2fb17f81ae507bda381808e3.1607620209.git.thomas.lendacky@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-12-15 05:20:55 -05:00
Tom Lendacky
d1949b93c6 KVM: SVM: Add support for CR8 write traps for an SEV-ES guest
For SEV-ES guests, the interception of control register write access
is not recommended. Control register interception occurs prior to the
control register being modified and the hypervisor is unable to modify
the control register itself because the register is located in the
encrypted register state.

SEV-ES guests introduce new control register write traps. These traps
provide intercept support of a control register write after the control
register has been modified. The new control register value is provided in
the VMCB EXITINFO1 field, allowing the hypervisor to track the setting
of the guest control registers.

Add support to track the value of the guest CR8 register using the control
register write trap so that the hypervisor understands the guest operating
mode.

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Message-Id: <5a01033f4c8b3106ca9374b7cadf8e33da852df1.1607620209.git.thomas.lendacky@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-12-15 05:20:54 -05:00
Tom Lendacky
5b51cb1316 KVM: SVM: Add support for CR4 write traps for an SEV-ES guest
For SEV-ES guests, the interception of control register write access
is not recommended. Control register interception occurs prior to the
control register being modified and the hypervisor is unable to modify
the control register itself because the register is located in the
encrypted register state.

SEV-ES guests introduce new control register write traps. These traps
provide intercept support of a control register write after the control
register has been modified. The new control register value is provided in
the VMCB EXITINFO1 field, allowing the hypervisor to track the setting
of the guest control registers.

Add support to track the value of the guest CR4 register using the control
register write trap so that the hypervisor understands the guest operating
mode.

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Message-Id: <c3880bf2db8693aa26f648528fbc6e967ab46e25.1607620209.git.thomas.lendacky@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-12-15 05:20:53 -05:00
Tom Lendacky
f27ad38aac KVM: SVM: Add support for CR0 write traps for an SEV-ES guest
For SEV-ES guests, the interception of control register write access
is not recommended. Control register interception occurs prior to the
control register being modified and the hypervisor is unable to modify
the control register itself because the register is located in the
encrypted register state.

SEV-ES support introduces new control register write traps. These traps
provide intercept support of a control register write after the control
register has been modified. The new control register value is provided in
the VMCB EXITINFO1 field, allowing the hypervisor to track the setting
of the guest control registers.

Add support to track the value of the guest CR0 register using the control
register write trap so that the hypervisor understands the guest operating
mode.

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Message-Id: <182c9baf99df7e40ad9617ff90b84542705ef0d7.1607620209.git.thomas.lendacky@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-12-15 05:20:52 -05:00
Tom Lendacky
2985afbcdb KVM: SVM: Add support for EFER write traps for an SEV-ES guest
For SEV-ES guests, the interception of EFER write access is not
recommended. EFER interception occurs prior to EFER being modified and
the hypervisor is unable to modify EFER itself because the register is
located in the encrypted register state.

SEV-ES support introduces a new EFER write trap. This trap provides
intercept support of an EFER write after it has been modified. The new
EFER value is provided in the VMCB EXITINFO1 field, allowing the
hypervisor to track the setting of the guest EFER.

Add support to track the value of the guest EFER value using the EFER
write trap so that the hypervisor understands the guest operating mode.
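
A simplified sketch of handling the trap (the real handler also
sanitizes bits such as EFER.SVME):

  case SVM_EXIT_EFER_WRITE_TRAP:
          /* The write already happened; EXITINFO1 carries the new
           * value, so just record it -- nothing to emulate or skip. */
          svm_set_efer(&svm->vcpu, svm->vmcb->control.exit_info_1);
          break;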

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Message-Id: <8993149352a3a87cd0625b3b61bfd31ab28977e1.1607620209.git.thomas.lendacky@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-12-15 05:20:51 -05:00
Tom Lendacky
7ed9abfe8e KVM: SVM: Support string IO operations for an SEV-ES guest
For an SEV-ES guest, string-based port IO is performed to a shared
(un-encrypted) page so that both the hypervisor and guest can read or
write to it and each see the contents.

For string-based port IO operations, invoke SEV-ES specific routines that
can complete the operation using common KVM port IO support.

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Message-Id: <9d61daf0ffda496703717218f415cdc8fd487100.1607620209.git.thomas.lendacky@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-12-15 05:20:51 -05:00
Tom Lendacky
8f423a80d2 KVM: SVM: Support MMIO for an SEV-ES guest
For an SEV-ES guest, MMIO is performed to a shared (un-encrypted) page
so that both the hypervisor and guest can read or write to it and each
see the contents.

The GHCB specification provides software-defined VMGEXIT exit codes to
indicate a request for an MMIO read or an MMIO write. Add support to
recognize the MMIO requests and invoke SEV-ES specific routines that
can complete the MMIO operation. These routines use common KVM support
to complete the MMIO operation.

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Message-Id: <af8de55127d5bcc3253d9b6084a0144c12307d4d.1607620209.git.thomas.lendacky@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-12-15 05:20:50 -05:00
Tom Lendacky
59e38b58de KVM: SVM: Create trace events for VMGEXIT MSR protocol processing
Add trace events for entry to and exit from VMGEXIT MSR protocol
processing. The vCPU will be common for the trace events. The MSR
protocol processing is guided by the GHCB GPA in the VMCB, so the GHCB
GPA will represent the input and output values for the entry and exit
events, respectively. Additionally, the exit event will contain the
return code for the event.

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Message-Id: <c5b3b440c3e0db43ff2fc02813faa94fa54896b0.1607620209.git.thomas.lendacky@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-12-15 05:20:49 -05:00
Tom Lendacky
d523ab6ba2 KVM: SVM: Create trace events for VMGEXIT processing
Add trace events for entry to and exit from VMGEXIT processing. The vCPU
id and the exit reason will be common for the trace events. The exit info
fields will represent the input and output values for the entry and exit
events, respectively.

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Message-Id: <25357dca49a38372e8f483753fb0c1c2a70a6898.1607620209.git.thomas.lendacky@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-12-15 05:20:49 -05:00
Tom Lendacky
e1d71116b6 KVM: SVM: Add support for SEV-ES GHCB MSR protocol function 0x100
The GHCB specification defines a GHCB MSR protocol using the lower
12-bits of the GHCB MSR (in the hypervisor this corresponds to the
GHCB GPA field in the VMCB).

Function 0x100 is a request for termination of the guest. The guest has
encountered some situation for which it has requested to be terminated.
The GHCB MSR value contains the reason for the request.

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Message-Id: <f3a1f7850c75b6ea4101e15bbb4a3af1a203f1dc.1607620209.git.thomas.lendacky@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-12-15 05:20:48 -05:00
Tom Lendacky
d36946679e KVM: SVM: Add support for SEV-ES GHCB MSR protocol function 0x004
The GHCB specification defines a GHCB MSR protocol using the lower
12-bits of the GHCB MSR (in the hypervisor this corresponds to the
GHCB GPA field in the VMCB).

Function 0x004 is a request for CPUID information. Only a single CPUID
result register can be sent per invocation, so the protocol defines the
register that is requested. The GHCB MSR value is set to the CPUID
register value as per the specification via the VMCB GHCB GPA field.
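
A sketch of the MSR-protocol dispatch (constants illustrative; the real
code uses named GHCB_MSR_* macros):

  u64 ghcb_info = control->ghcb_gpa & 0xfffULL;   /* low 12 bits */

  switch (ghcb_info) {
  case 0x004:     /* CPUID request */
          /* run the requested leaf, then put the selected result
           * register back into the GHCB GPA field for the guest */
          break;
  }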

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Message-Id: <fd7ee347d3936e484c06e9001e340bf6387092cd.1607620209.git.thomas.lendacky@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-12-15 05:20:48 -05:00
Tom Lendacky
1edc14599e KVM: SVM: Add support for SEV-ES GHCB MSR protocol function 0x002
The GHCB specification defines a GHCB MSR protocol using the lower
12-bits of the GHCB MSR (in the hypervisor this corresponds to the
GHCB GPA field in the VMCB).

Function 0x002 is a request to set the GHCB MSR value to the SEV INFO as
per the specification via the VMCB GHCB GPA field.

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Message-Id: <c23c163a505290a0d1b9efc4659b838c8c902cbc.1607620209.git.thomas.lendacky@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-12-15 05:20:47 -05:00
Tom Lendacky
291bd20d5d KVM: SVM: Add initial support for a VMGEXIT VMEXIT
SEV-ES adds a new VMEXIT reason code, VMGEXIT. Initial support for a
VMGEXIT includes mapping the GHCB based on the guest GPA, which is
obtained from a new VMCB field, and then validating the required inputs
for the VMGEXIT exit reason.

Since many of the VMGEXIT exit reasons correspond to existing VMEXIT
reasons, the information from the GHCB is copied into the VMCB control
exit code areas and KVM register areas. The standard exit handlers are
invoked, similar to standard VMEXIT processing. Before restarting the
vCPU, the GHCB is updated with any registers that have been updated by
the hypervisor.

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Message-Id: <c6a4ed4294a369bd75c44d03bd7ce0f0c3840e50.1607620209.git.thomas.lendacky@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-12-15 05:20:47 -05:00
Tom Lendacky
e9093fd492 KVM: SVM: Prepare for SEV-ES exit handling in the sev.c file
This is a pre-patch to consolidate some exit handling code into callable
functions. Follow-on patches for SEV-ES exit handling will then be able
to use them from the sev.c file.

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Message-Id: <5b8b0ffca8137f3e1e257f83df9f5c881c8a96a3.1607620209.git.thomas.lendacky@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-12-15 05:20:47 -05:00
Tom Lendacky
8164a5ffe4 KVM: SVM: Cannot re-initialize the VMCB after shutdown with SEV-ES
When a SHUTDOWN VMEXIT is encountered, normally the VMCB is re-initialized
so that the guest can be re-launched. But when a guest is running as an
SEV-ES guest, the VMSA cannot be re-initialized because it has been
encrypted. For now, just return -EINVAL to prevent a possible attempt at
a guest reset.

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Message-Id: <aa6506000f6f3a574de8dbcdab0707df844cb00c.1607620209.git.thomas.lendacky@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-12-15 05:20:46 -05:00
Tom Lendacky
bc624d9f1b KVM: SVM: Do not allow instruction emulation under SEV-ES
When a guest is running as an SEV-ES guest, it is not possible to emulate
instructions. Add support to prevent instruction emulation.

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Message-Id: <f6355ea3024fda0a3eb5eb99c6b62dca10d792bd.1607620209.git.thomas.lendacky@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-12-15 05:20:46 -05:00
Tom Lendacky
8d4846b9b1 KVM: SVM: Prevent debugging under SEV-ES
Since the guest register state of an SEV-ES guest is encrypted, debugging
is not supported. Update the code to prevent guest debugging when the
guest has protected state.

Additionally, an SEV-ES guest must only and always intercept DR7 reads and
writes. Update set_dr_intercepts() and clr_dr_intercepts() to account for
this.

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Message-Id: <8db966fa2f9803d6454ce773863025d0e2e7f3cc.1607620209.git.thomas.lendacky@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-12-15 05:20:46 -05:00
Tom Lendacky
f1c6366e30 KVM: SVM: Add required changes to support intercepts under SEV-ES
When a guest is running under SEV-ES, the hypervisor cannot access the
guest register state. There are numerous places in the KVM code where
certain registers are accessed that are not allowed to be accessed (e.g.
RIP, CR0, etc). Add checks to prevent register accesses and add intercept
update support at various points within the KVM code.

Also, when handling a VMGEXIT, exceptions are passed back through the
GHCB. Since the RDMSR/WRMSR intercepts (may) inject a #GP on error,
update the SVM intercepts to handle this for SEV-ES guests.

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
[Redo MSR part using the .complete_emulated_msr callback. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-12-15 05:20:45 -05:00
Paolo Bonzini
f9a4d62176 KVM: x86: introduce complete_emulated_msr callback
This will be used by SEV-ES to inject MSR failure via the GHCB.

Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-12-15 05:20:34 -05:00
Tom Lendacky
add5e2f045 KVM: SVM: Add support for the SEV-ES VMSA
Allocate a page during vCPU creation to be used as the encrypted VM save
area (VMSA) for the SEV-ES guest. Provide a flag in the kvm_vcpu_arch
structure that indicates whether the guest state is protected.

When freeing a VMSA page that has been encrypted, the cache contents must
be flushed using the MSR_AMD64_VM_PAGE_FLUSH MSR before freeing the page.
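
A sketch of the flush (MSR encoding per the APM: physical address in
the upper bits, ASID in the lower; simplified, no error handling):

  /* Flush the encrypted cache lines of the VMSA page using the key
   * identified by the guest's ASID, before the page is freed. */
  wrmsrl(MSR_AMD64_VM_PAGE_FLUSH,
         page_to_phys(vmsa_page) | sev_get_asid(vcpu->kvm));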

[ i386 build warnings ]
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Message-Id: <fde272b17eec804f3b9db18c131262fe074015c5.1607620209.git.thomas.lendacky@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-12-14 11:09:32 -05:00
Tom Lendacky
916391a2d1 KVM: SVM: Add support for SEV-ES capability in KVM
Add support to KVM for determining if a system is capable of supporting
SEV-ES as well as determining if a guest is an SEV-ES guest.

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Message-Id: <e66792323982c822350e40c7a1cf67ea2978a70b.1607620209.git.thomas.lendacky@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-12-14 11:09:31 -05:00
Tom Lendacky
9d4747d023 KVM: SVM: Remove the call to sev_platform_status() during setup
When both KVM support and the CCP driver are built into the kernel instead
of as modules, KVM initialization can happen before CCP initialization. As
a result, sev_platform_status() will return a failure when it is called
from sev_hardware_setup(), even though this isn't really an error condition.

Since sev_platform_status() doesn't need to be called at this time anyway,
remove the invocation from sev_hardware_setup().

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Message-Id: <618380488358b56af558f2682203786f09a49483.1607620209.git.thomas.lendacky@amd.com>
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-12-14 11:09:30 -05:00
Uros Bizjak
3f1a18b9fa KVM/VMX/SVM: Move kvm_machine_check function to x86.h
Move kvm_machine_check to x86.h to avoid two exact copies
of the same function in vmx.c and svm.c.

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
Message-Id: <20201029135600.122392-1-ubizjak@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-12-14 11:09:29 -05:00
Paolo Bonzini
39485ed95d KVM: x86: reinstate vendor-agnostic check on SPEC_CTRL cpuid bits
Until commit e7c587da12 ("x86/speculation: Use synthetic bits for
IBRS/IBPB/STIBP"), KVM was testing both Intel and AMD CPUID bits before
allowing the guest to write MSR_IA32_SPEC_CTRL and MSR_IA32_PRED_CMD.
Testing only Intel bits on VMX processors, or only AMD bits on SVM
processors, fails if the guests are created with the "opposite" vendor
as the host.

While at it, also tweak the host CPU check to use the vendor-agnostic
feature bit X86_FEATURE_IBPB, since we only care about the availability
of the MSR on the host here and not about specific CPUID bits.

Fixes: e7c587da12 ("x86/speculation: Use synthetic bits for IBRS/IBPB/STIBP")
Cc: stable@vger.kernel.org
Reported-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-12-11 19:05:13 -05:00
Jacob Xu
a2b2d4bf50 kvm: svm: de-allocate svm_cpu_data for all cpus in svm_cpu_uninit()
The cpu arg for svm_cpu_uninit() was previously ignored, resulting in
the per-cpu structure svm_cpu_data not being de-allocated for all cpus.

Signed-off-by: Jacob Xu <jacobhxu@google.com>
Message-Id: <20201203205939.1783969-1-jacobhxu@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-12-04 03:47:58 -05:00
Paolo Bonzini
dee734a7de KVM: x86: adjust SEV for commit 7e8e6eed75
Since the ASID is now stored in svm->asid, pre_sev_run should also place
it there and not directly in the VMCB control area.

Reported-by: Ashish Kalra <Ashish.Kalra@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-12-03 12:38:39 -05:00
Paolo Bonzini
8cce12b3c8 KVM: nSVM: set fixed bits by hand
SVM generally ignores fixed-1 bits.  Set them manually so that we
do not end up, by mistake, without those bits set in struct kvm_vcpu;
it is part of the userspace API that KVM always returns values with
those bits set.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-11-27 12:46:36 -05:00
Chen Zhou
054409ab25 KVM: SVM: fix error return code in svm_create_vcpu()
Return a negative error code from the error handling case instead
of 0 in svm_create_vcpu(), as done elsewhere in this function.

Fixes: f4c847a956 ("KVM: SVM: refactor msr permission bitmap allocation")
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Chen Zhou <chenzhou10@huawei.com>
Message-Id: <20201117025426.167824-1-chenzhou10@huawei.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-11-17 02:40:08 -05:00
Ashish Kalra
854c57f02b KVM: SVM: Fix offset computation bug in __sev_dbg_decrypt().
Fix offset computation in __sev_dbg_decrypt() to include the
source paddr before it is rounded down to be aligned to 16 bytes
as required by the SEV API. This fixes incorrect guest memory dumps
observed when using the qemu monitor.

Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
Message-Id: <20201110224205.29444-1-Ashish.Kalra@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-11-16 13:59:09 -05:00
Paolo Bonzini
dc924b0624 KVM: SVM: check CR4 changes against vcpu->arch
Similarly to what vmx/vmx.c does, use vcpu->arch.cr4 to check if CR4
bits PGE, PKE and OSXSAVE have changed.  When switching between VMCB01
and VMCB02, CPUID has to be adjusted every time if CR4.PKE or CR4.OSXSAVE
change; without this patch, instead, CR4 would be checked against the
previous value for L2 on vmentry, and against the previous value for
L1 on vmexit, and CPUID would not be updated.
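
A sketch of the change in svm_set_cr4() (simplified):

  unsigned long old_cr4 = vcpu->arch.cr4;  /* not vmcb->save.cr4, which
                                              differs between VMCB01/02 */

  /* later, after the new value is committed: */
  if ((cr4 ^ old_cr4) & (X86_CR4_OSXSAVE | X86_CR4_PKE))
          kvm_update_cpuid_runtime(vcpu);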

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-11-16 13:14:22 -05:00
Cathy Avery
7e8e6eed75 KVM: SVM: Move asid to vcpu_svm
KVM does not have separate ASIDs for L1 and L2; either the nested
hypervisor and nested guests share a single ASID, or on older processors
the ASID is used only to implement TLB flushing.

Either way, ASIDs are handled at the VM level.  In preparation
for having different VMCBs passed to VMLOAD/VMRUN/VMSAVE for L1 and
L2, store the current ASID to struct vcpu_svm and only move it to
the VMCB in svm_vcpu_run.  This way, TLB flushes can be applied
no matter which VMCB will be active during the next svm_vcpu_run.
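
A sketch of the run-time sync (close to the patch, but simplified):

  /* svm->asid always holds the current ASID; copy it into whichever
   * VMCB is active only when actually entering the guest. */
  if (svm->vmcb->control.asid != svm->asid) {
          svm->vmcb->control.asid = svm->asid;
          vmcb_mark_dirty(svm->vmcb, VMCB_ASID);
  }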

Signed-off-by: Cathy Avery <cavery@redhat.com>
Message-Id: <20201011184818.3609-2-cavery@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-11-16 13:14:21 -05:00
Jim Mattson
2259c17f01 kvm: x86: Sink cpuid update into vendor-specific set_cr4 functions
On emulated VM-entry and VM-exit, update the CPUID bits that reflect
CR4.OSXSAVE and CR4.PKE.

This fixes a bug where the CPUID bits could continue to reflect L2 CR4
values after emulated VM-exit to L1. It also fixes a related bug where
the CPUID bits could continue to reflect L1 CR4 values after emulated
VM-entry to L2. The latter bug is mainly relevant to SVM, wherein
CPUID is not a required intercept. However, it could also be relevant
to VMX, because the code to conditionally update these CPUID bits
assumes that the guest CPUID and the guest CR4 are always in sync.

Fixes: 8eb3f87d90 ("KVM: nVMX: fix guest CR4 loading when emulating L2 to L1 exit")
Fixes: 2acf923e38 ("KVM: VMX: Enable XSAVE/XRSTOR for guest")
Fixes: b9baba8614 ("KVM, pkeys: expose CPUID/CR4 to guest")
Reported-by: Abhiroop Dabral <adabral@paloaltonetworks.com>
Signed-off-by: Jim Mattson <jmattson@google.com>
Reviewed-by: Ricardo Koller <ricarkol@google.com>
Reviewed-by: Peter Shier <pshier@google.com>
Cc: Haozhong Zhang <haozhong.zhang@intel.com>
Cc: Dexuan Cui <dexuan.cui@intel.com>
Cc: Huaitong Han <huaitong.han@intel.com>
Message-Id: <20201029170648.483210-1-jmattson@google.com>
2020-11-15 09:49:18 -05:00
Peter Xu
ff5a983cbb KVM: X86: Don't track dirty for KVM_SET_[TSS_ADDR|IDENTITY_MAP_ADDR]
Originally, there were three code paths that could dirty a page without
vcpu context for X86:

  - init_rmode_identity_map
  - init_rmode_tss
  - kvmgt_rw_gpa

init_rmode_identity_map and init_rmode_tss will be set up on the
destination VM no matter what (and the guest cannot even see them), so
it does not make sense to track them at all.

To do this, allow __x86_set_memory_region() to return the userspace
address that was just allocated to the caller.  Then in both of the
functions we write directly to the userspace address instead of
calling the kvm_write_*() APIs.

Another trivial change is that we don't need to explicitly clear the
identity page table root in init_rmode_identity_map() because we
will write the whole page with 4M huge page entries anyway.

Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20201001012044.5151-4-peterx@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-11-15 09:49:12 -05:00
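
A minimal sketch of the resulting pattern (a fragment; the surrounding
variables and error handling are simplified):

    /* __x86_set_memory_region() now hands the userspace address of the
     * freshly allocated region back to the caller. */
    void __user *hva = __x86_set_memory_region(kvm, id, gpa, size);

    if (IS_ERR(hva))
        return PTR_ERR(hva);
    if (copy_to_user(hva, data, len))  /* direct write: nothing is dirtied */
        return -EFAULT;
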
Sean Christopherson
ee69c92bac KVM: x86: Return bool instead of int for CR4 and SREGS validity checks
Rework the common CR4 and SREGS checks to return a bool instead of an
int, i.e. true/false instead of 0/-EINVAL, and add "is" to the name to
clarify the polarity of the return value (which is effectively inverted
by this change).

No functional change intended.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20201007014417.29276-6-sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-11-15 09:49:08 -05:00
Sean Christopherson
c2fe3cd460 KVM: x86: Move vendor CR4 validity check to dedicated kvm_x86_ops hook
Split out VMX's checks on CR4.VMXE to a dedicated hook, .is_valid_cr4(),
and invoke the new hook from kvm_valid_cr4().  This fixes an issue where
KVM_SET_SREGS would return success while failing to actually set CR4.

Fixing the issue by explicitly checking kvm_x86_ops.set_cr4()'s return
in __set_sregs() is not a viable option as KVM has already stuffed a
variety of vCPU state.

Note, kvm_valid_cr4() and is_valid_cr4() have different return types and
inverted semantics.  This will be remedied in a future patch.

Fixes: 5e1746d620 ("KVM: nVMX: Allow setting the VMXE bit in CR4")
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20201007014417.29276-5-sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-11-15 09:49:07 -05:00
Sean Christopherson
311a06593b KVM: SVM: Drop VMXE check from svm_set_cr4()
Drop svm_set_cr4()'s explicit check on CR4.VMXE now that common x86 handles
the check by incorporating VMXE into the CR4 reserved bits, via
kvm_cpu_caps.  SVM obviously does not set X86_FEATURE_VMX.

No functional change intended.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20201007014417.29276-4-sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-11-15 09:49:07 -05:00
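
A minimal self-contained sketch of the reserved-bit approach, reduced to
a single feature for brevity:

    #include <stdbool.h>
    #include <stdint.h>

    #define CR4_VMXE (1ULL << 13)

    /* A CR4 bit is reserved unless the matching CPUID feature is exposed;
     * SVM never exposes VMX, so CR4.VMXE is rejected with no vendor code. */
    static bool cr4_reserved_ok(uint64_t cr4, bool guest_has_vmx)
    {
        uint64_t reserved = guest_has_vmx ? 0 : CR4_VMXE; /* plus others */
        return (cr4 & reserved) == 0;
    }
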
Babu Moger
96308b0661 KVM: SVM: Update cr3_lm_rsvd_bits for AMD SEV guests
For AMD SEV guests, update the cr3_lm_rsvd_bits to mask
the memory encryption bit in reserved bits.

Signed-off-by: Babu Moger <babu.moger@amd.com>
Message-Id: <160521948301.32054.5783800787423231162.stgit@bmoger-ubuntu>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-11-13 06:31:14 -05:00
Linus Torvalds
f9a705ad1c ARM:
- New page table code for both hypervisor and guest stage-2
 - Introduction of a new EL2-private host context
 - Allow EL2 to have its own private per-CPU variables
 - Support of PMU event filtering
 - Complete rework of the Spectre mitigation
 
 PPC:
 - Fix for running nested guests with in-kernel IRQ chip
 - Fix race condition causing occasional host hard lockup
 - Minor cleanups and bugfixes
 
 x86:
 - allow trapping unknown MSRs to userspace
 - allow userspace to force #GP on specific MSRs
 - INVPCID support on AMD
 - nested AMD cleanup, on demand allocation of nested SVM state
 - hide PV MSRs and hypercalls for features not enabled in CPUID
 - new test for MSR_IA32_TSC writes from host and guest
 - cleanups: MMU, CPUID, shared MSRs
 - LAPIC latency optimizations and bugfixes
 
 For x86, also included in this pull request is a new alternative and
 (in the future) more scalable implementation of extended page tables
 that does not need a reverse map from guest physical addresses to
 host physical addresses.  For now it is disabled by default because
 it is still lacking a few of the existing MMU's bells and whistles.
 However it is a very solid piece of work and it is already available
 for people to hammer on it.
 -----BEGIN PGP SIGNATURE-----
 
 iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAl+S8dsUHHBib256aW5p
 QHJlZGhhdC5jb20ACgkQv/vSX3jHroM40Af+M46NJmuS5rcwFfybvK/c42KT6svX
 Co1NrZDwzSQ2mMy3WQzH9qeLvb+nbY4sT3n5BPNPNsT+aIDPOTDt//qJ2/Ip9UUs
 tRNea0MAR96JWLE7MSeeRxnTaQIrw/AAZC0RXFzZvxcgytXwdqBExugw4im+b+dn
 Dcz8QxX1EkwT+4lTm5HC0hKZAuo4apnK1QkqCq4SdD2QVJ1YE6+z7pgj4wX7xitr
 STKD6q/Yt/0ndwqS0GSGbyg0jy6mE620SN6isFRkJYwqfwLJci6KnqvEK67EcNMu
 qeE017K+d93yIVC46/6TfVHzLR/D1FpQ8LZ16Yl6S13OuGIfAWBkQZtPRg==
 =AD6a
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull KVM updates from Paolo Bonzini:
 "For x86, there is a new alternative and (in the future) more scalable
  implementation of extended page tables that does not need a reverse
  map from guest physical addresses to host physical addresses.

  For now it is disabled by default because it is still lacking a few of
  the existing MMU's bells and whistles. However it is a very solid
  piece of work and it is already available for people to hammer on it.

  Other updates:

  ARM:
   - New page table code for both hypervisor and guest stage-2
   - Introduction of a new EL2-private host context
   - Allow EL2 to have its own private per-CPU variables
   - Support of PMU event filtering
   - Complete rework of the Spectre mitigation

  PPC:
   - Fix for running nested guests with in-kernel IRQ chip
   - Fix race condition causing occasional host hard lockup
   - Minor cleanups and bugfixes

  x86:
   - allow trapping unknown MSRs to userspace
   - allow userspace to force #GP on specific MSRs
   - INVPCID support on AMD
   - nested AMD cleanup, on demand allocation of nested SVM state
   - hide PV MSRs and hypercalls for features not enabled in CPUID
   - new test for MSR_IA32_TSC writes from host and guest
   - cleanups: MMU, CPUID, shared MSRs
   - LAPIC latency optimizations and bugfixes"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (232 commits)
  kvm: x86/mmu: NX largepage recovery for TDP MMU
  kvm: x86/mmu: Don't clear write flooding count for direct roots
  kvm: x86/mmu: Support MMIO in the TDP MMU
  kvm: x86/mmu: Support write protection for nesting in tdp MMU
  kvm: x86/mmu: Support disabling dirty logging for the tdp MMU
  kvm: x86/mmu: Support dirty logging for the TDP MMU
  kvm: x86/mmu: Support changed pte notifier in tdp MMU
  kvm: x86/mmu: Add access tracking for tdp_mmu
  kvm: x86/mmu: Support invalidate range MMU notifier for TDP MMU
  kvm: x86/mmu: Allocate struct kvm_mmu_pages for all pages in TDP MMU
  kvm: x86/mmu: Add TDP MMU PF handler
  kvm: x86/mmu: Remove disallowed_hugepage_adjust shadow_walk_iterator arg
  kvm: x86/mmu: Support zapping SPTEs in the TDP MMU
  KVM: Cache as_id in kvm_memory_slot
  kvm: x86/mmu: Add functions to handle changed TDP SPTEs
  kvm: x86/mmu: Allocate and free TDP MMU roots
  kvm: x86/mmu: Init / Uninit the TDP MMU
  kvm: x86/mmu: Introduce tdp_iter
  KVM: mmu: extract spte.h and spte.c
  KVM: mmu: Separate updating a PTE from kvm_set_pte_rmapp
  ...
2020-10-23 11:17:56 -07:00
Paolo Bonzini
c0623f5e5d Merge branch 'kvm-fixes' into 'next'
Pick up bugfixes from 5.9, otherwise various tests fail.
2020-10-21 18:05:58 -04:00
Suravee Suthikulpanit
f6426ab9c9 KVM: SVM: Initialize prev_ga_tag before use
The function amd_ir_set_vcpu_affinity makes use of the parameter struct
amd_iommu_pi_data.prev_ga_tag to determine if it should delete struct
amd_iommu_pi_data from a list when not running in AVIC mode.

However, prev_ga_tag is initialized only when AVIC is enabled. A non-zero
uninitialized value can cause an unintended code path to be taken, one that
ends up using struct vcpu_svm.ir_list and ir_list_lock without their having
been initialized (they are intended only for the AVIC case).

This triggers a NULL pointer dereference bug in the function svm_ir_list_del
with the following call trace:

    svm_update_pi_irte+0x3c2/0x550 [kvm_amd]
    ? proc_create_single_data+0x41/0x50
    kvm_arch_irq_bypass_add_producer+0x40/0x60 [kvm]
    __connect+0x5f/0xb0 [irqbypass]
    irq_bypass_register_producer+0xf8/0x120 [irqbypass]
    vfio_msi_set_vector_signal+0x1de/0x2d0 [vfio_pci]
    vfio_msi_set_block+0x77/0xe0 [vfio_pci]
    vfio_pci_set_msi_trigger+0x25c/0x2f0 [vfio_pci]
    vfio_pci_set_irqs_ioctl+0x88/0xb0 [vfio_pci]
    vfio_pci_ioctl+0x2ea/0xed0 [vfio_pci]
    ? alloc_file_pseudo+0xa5/0x100
    vfio_device_fops_unl_ioctl+0x26/0x30 [vfio]
    ? vfio_device_fops_unl_ioctl+0x26/0x30 [vfio]
    __x64_sys_ioctl+0x96/0xd0
    do_syscall_64+0x37/0x80
    entry_SYSCALL_64_after_hwframe+0x44/0xa9

Therefore, initialize prev_ga_tag to zero before use. This should be safe
because ga_tag value 0 is invalid (see function avic_vm_init).

Fixes: dfa20099e2 ("KVM: SVM: Refactor AVIC vcpu initialization into avic_init_vcpu()")
Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Message-Id: <20201003232707.4662-1-suravee.suthikulpanit@amd.com>
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-10-21 17:48:49 -04:00
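
A minimal sketch of the fix (a fragment from the affected path):

    struct amd_iommu_pi_data pi;

    /* 0 is never a valid ga_tag (see avic_vm_init), so the !AVIC path
     * will not walk the uninitialized ir_list. */
    pi.prev_ga_tag = 0;
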
Maxim Levitsky
2fcf4876ad KVM: nSVM: implement on demand allocation of the nested state
This way we don't waste memory on VMs that don't use nested
virtualization, even when the host has enabled it for them.

Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20201001112954.6258-5-mlevitsk@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-10-21 17:48:48 -04:00
Maxim Levitsky
72f211ecaa KVM: x86: allow kvm_x86_ops.set_efer to return an error value
This will be used to signal an error to userspace in case the
vendor code fails while handling this MSR (e.g. -ENOMEM).

Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20201001112954.6258-4-mlevitsk@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-10-21 17:48:48 -04:00
Linus Torvalds
da9803dfd3 This feature enhances the current guest memory encryption support
called SEV by also encrypting the guest register state, making the
 registers inaccessible to the hypervisor by en-/decrypting them on world
 switches. Thus, it adds additional protection to Linux guests against
 exfiltration, control flow and rollback attacks.
 
 With SEV-ES, the guest is in full control of what registers the
 hypervisor can access. This is provided by a guest-host exchange
 mechanism based on a new exception vector called VMM Communication
 Exception (#VC), a new instruction called VMGEXIT and a shared
 Guest-Host Communication Block which is a decrypted page shared between
 the guest and the hypervisor.
 
 Intercepts to the hypervisor become #VC exceptions in an SEV-ES guest so
 in order for that exception mechanism to work, the early x86 init code
 needed to be made able to handle exceptions, which, in itself, brings
 a bunch of very nice cleanups and improvements to the early boot code
 like an early page fault handler, allowing for on-demand building of the
 identity mapping. With that, !KASLR configurations do not use the EFI
 page table anymore but switch to a kernel-controlled one.
 
 The main part of this series adds the support for that new exchange
 mechanism. The goal has been to keep this as much as possible
 separate from the core x86 code by concentrating the machinery in two
 SEV-ES-specific files:
 
  arch/x86/kernel/sev-es-shared.c
  arch/x86/kernel/sev-es.c
 
 Other interaction with core x86 code has been kept at minimum and behind
 static keys to minimize the performance impact on !SEV-ES setups.
 
 Work by Joerg Roedel and Thomas Lendacky and others.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAl+FiKYACgkQEsHwGGHe
 VUqS5BAAlh5mKwtxXMyFyAIHa5tpsgDjbecFzy1UVmZyxN0JHLlM3NLmb+K52drY
 PiWjNNMi/cFMFazkuLFHuY0poBWrZml8zRS/mExKgUJC6EtguS9FQnRE9xjDBoWQ
 gOTSGJWEzT5wnFqo8qHwlC2CDCSF1hfL8ks3cUFW2tCWus4F9pyaMSGfFqD224rg
 Lh/8+arDMSIKE4uH0cm7iSuyNpbobId0l5JNDfCEFDYRigQZ6pZsQ9pbmbEpncs4
 rmjDvBA5eHDlNMXq0ukqyrjxWTX4ZLBOBvuLhpyssSXnnu2T+Tcxg09+ZSTyJAe0
 LyC9Wfo0v78JASXMAdeH9b1d1mRYNMqjvnBItNQoqweoqUXWz7kvgxCOp6b/G4xp
 cX5YhB6BprBW2DXL45frMRT/zX77UkEKYc5+0IBegV2xfnhRsjqQAQaWLIksyEaX
 nz9/C6+1Sr2IAv271yykeJtY6gtlRjg/usTlYpev+K0ghvGvTmuilEiTltjHrso1
 XAMbfWHQGSd61LNXofvx/GLNfGBisS6dHVHwtkayinSjXNdWxI6w9fhbWVjQ+y2V
 hOF05lmzaJSG5kPLrsFHFqm2YcxOmsWkYYDBHvtmBkMZSf5B+9xxDv97Uy9NETcr
 eSYk//TEkKQqVazfCQS/9LSm0MllqKbwNO25sl0Tw2k6PnheO2g=
 =toqi
 -----END PGP SIGNATURE-----

Merge tag 'x86_seves_for_v5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 SEV-ES support from Borislav Petkov:
 "SEV-ES enhances the current guest memory encryption support called SEV
  by also encrypting the guest register state, making the registers
  inaccessible to the hypervisor by en-/decrypting them on world
  switches. Thus, it adds additional protection to Linux guests against
  exfiltration, control flow and rollback attacks.

  With SEV-ES, the guest is in full control of what registers the
  hypervisor can access. This is provided by a guest-host exchange
  mechanism based on a new exception vector called VMM Communication
  Exception (#VC), a new instruction called VMGEXIT and a shared
  Guest-Host Communication Block which is a decrypted page shared
  between the guest and the hypervisor.

  Intercepts to the hypervisor become #VC exceptions in an SEV-ES guest
  so in order for that exception mechanism to work, the early x86 init
  code needed to be made able to handle exceptions, which, in itself,
  brings a bunch of very nice cleanups and improvements to the early
  boot code like an early page fault handler, allowing for on-demand
  building of the identity mapping. With that, !KASLR configurations do
  not use the EFI page table anymore but switch to a kernel-controlled
  one.

  The main part of this series adds the support for that new exchange
  mechanism. The goal has been to keep this as much as possible separate
  from the core x86 code by concentrating the machinery in two
  SEV-ES-specific files:

    arch/x86/kernel/sev-es-shared.c
    arch/x86/kernel/sev-es.c

  Other interaction with core x86 code has been kept at minimum and
  behind static keys to minimize the performance impact on !SEV-ES
  setups.

  Work by Joerg Roedel and Thomas Lendacky and others"

* tag 'x86_seves_for_v5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (73 commits)
  x86/sev-es: Use GHCB accessor for setting the MMIO scratch buffer
  x86/sev-es: Check required CPU features for SEV-ES
  x86/efi: Add GHCB mappings when SEV-ES is active
  x86/sev-es: Handle NMI State
  x86/sev-es: Support CPU offline/online
  x86/head/64: Don't call verify_cpu() on starting APs
  x86/smpboot: Load TSS and getcpu GDT entry before loading IDT
  x86/realmode: Setup AP jump table
  x86/realmode: Add SEV-ES specific trampoline entry point
  x86/vmware: Add VMware-specific handling for VMMCALL under SEV-ES
  x86/kvm: Add KVM-specific VMMCALL handling under SEV-ES
  x86/paravirt: Allow hypervisor-specific VMMCALL handling under SEV-ES
  x86/sev-es: Handle #DB Events
  x86/sev-es: Handle #AC Events
  x86/sev-es: Handle VMMCALL Events
  x86/sev-es: Handle MWAIT/MWAITX Events
  x86/sev-es: Handle MONITOR/MONITORX Events
  x86/sev-es: Handle INVD Events
  x86/sev-es: Handle RDPMC Events
  x86/sev-es: Handle RDTSC(P) Events
  ...
2020-10-14 10:21:34 -07:00
Linus Torvalds
6873139ed0 objtool changes for v5.10:
- Most of the changes are cleanups and reorganization to make the objtool code
    more arch-agnostic. This is in preparation for non-x86 support.
 
 Fixes:
 
  - KASAN fixes.
  - Handle unreachable trap after call to noreturn functions better.
  - Ignore unreachable fake jumps.
  - Misc smaller fixes & cleanups.
 
 Signed-off-by: Ingo Molnar <mingo@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAl+FgwIRHG1pbmdvQGtl
 cm5lbC5vcmcACgkQEnMQ0APhK1juGw/6A6goA5/HHapM965yG1eY/rTLp3eIbcma
 1ZbkUsP0YfT6wVUzw/sOeZzKNOwOq1FuMfkjuH2KcnlxlcMekIaKvLk8uauW4igM
 hbFGuuZfZ0An5ka9iQ1W6HGdsuD3vVlN1w/kxdWk0c3lJCVQSTxdCfzF8fuF3gxX
 lF3Bc1D/ZFcHIHT/hu/jeIUCgCYpD3qZDjQJBScSwVthZC+Fw6weLLGp2rKDaCao
 HhSQft6MUfDrUKfH3LBIUNPRPCOrHo5+AX6BXxLXJVxqlwO/YU3e0GMwSLedMtBy
 TASWo7/9GAp+wNNZe8EliyTKrfC3sLxN1QImfjuojxbBVXx/YQ/ToTt9fVGpF4Y+
 XhhRFv9520v1tS2wPHIgQGwbh7EWG6mdrmo10RAs/31ViONPrbEZ4WmcA08b/5FY
 KEkOVb18yfmDVzVZPpSc+HpIFkppEBOf7wPg27Bj3RTZmzIl/y+rKSnxROpsJsWb
 R6iov7SFVET14lHl1G7tPNXfqRaS7HaOQIj3rSUyAP0ZfX+yIupVJp32dc6Ofg8b
 SddUCwdIHoFdUNz4Y9csUCrewtCVJbxhV4MIdv0GpWbrgSw96RFZgetaH+6mGRpj
 0Kh6M1eC3irDbhBuarWUBAr2doPAq4iOUeQU36Q6YSAbCs83Ws2uKOWOHoFBVwCH
 uSKT0wqqG+E=
 =KX5o
 -----END PGP SIGNATURE-----

Merge tag 'objtool-core-2020-10-13' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull objtool updates from Ingo Molnar:
 "Most of the changes are cleanups and reorganization to make the
  objtool code more arch-agnostic. This is in preparation for non-x86
  support.

  Other changes:

   - KASAN fixes

   - Handle unreachable trap after call to noreturn functions better

   - Ignore unreachable fake jumps

   - Misc smaller fixes & cleanups"

* tag 'objtool-core-2020-10-13' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (21 commits)
  perf build: Allow nested externs to enable BUILD_BUG() usage
  objtool: Allow nested externs to enable BUILD_BUG()
  objtool: Permit __kasan_check_{read,write} under UACCESS
  objtool: Ignore unreachable trap after call to noreturn functions
  objtool: Handle calling non-function symbols in other sections
  objtool: Ignore unreachable fake jumps
  objtool: Remove useless tests before save_reg()
  objtool: Decode unwind hint register depending on architecture
  objtool: Make unwind hint definitions available to other architectures
  objtool: Only include valid definitions depending on source file type
  objtool: Rename frame.h -> objtool.h
  objtool: Refactor jump table code to support other architectures
  objtool: Make relocation in alternative handling arch dependent
  objtool: Abstract alternative special case handling
  objtool: Move macros describing structures to arch-dependent code
  objtool: Make sync-check consider the target architecture
  objtool: Group headers to check in a single list
  objtool: Define 'struct orc_entry' only when needed
  objtool: Skip ORC entry creation for non-text sections
  objtool: Move ORC logic out of check()
  ...
2020-10-14 10:13:37 -07:00
Linus Torvalds
92a0610b6a * Add support for hardware-enforced cache coherency on AMD which
obviates the need to flush cachelines before changing the PTE encryption
 bit, by Krish Sadhukhan.
 
 * Add Centaur initialization support for families >= 7, by Tony W
 Wang-oc.
 
 * Add a feature flag for, and expose TSX suspend load tracking feature
 to KVM, by Cathy Zhang.
 
 * Emulate SLDT and STR so that Windows programs don't crash on UMIP
 machines, by Brendan Shanks and Ricardo Neri.
 
 * Use the new SERIALIZE insn on Intel hardware which supports it, by
 Ricardo Neri.
 
 * Misc cleanups and fixes.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAl+EKA4ACgkQEsHwGGHe
 VUq9/RAAtneeyhaM3Erjj+4gLi383ml7n8hlAixZ7x1T5mkV7B5rKtNQ99AYQTbp
 lbA7qu5mfsEqKeu2WY13IMqHKgfgGXPPUumCTYjgQlC6UoSAMjgtGGCDodqBE8i5
 iIq6gp59t4aMn1ltkzNMS0hLGX0i3xA6cBHPB6EVdJhQgaX4oVVs1dL8IbXumj3q
 NX5qIcaTN/f/IC3EZuCDwZGKUpRF+PWTbD+k02jfhS2crvG/LHJgtisQJ1Eu/ccz
 yKfiwCBYS7FcuDTxiJDchz1sXD25dgBmBG26voIukSIPbysAPF7O1MULGvKsztFV
 W/6+VMC+KGUs0ACQHhFl5ALXA73zAJskjzNzRRuduYM0Mr2yAckVet2usicnt/Cp
 lpmvOpeCjDTPuBQTs0cR9TWjXFeinUkQJOAEcqv9Wh1OKQShZUAJ1jpHwZiDCnhY
 kzOhq9GAgKNXxcqcTQD8mIap2/GKIppIxAVb7vPxDQfUhUz/60o0eF3cMkeaa216
 31Bnf4h+XtJPoJkDOhI8XrPLw6c3KRWP3i3IoBj+raLiylwzzrIczf/7CBgHoIsa
 fEwuM0PUDVurY38VMRlj1dMFBSFw8U7JqKYyvXKwB3KFeyX7SGZDLmdlvhsRTq20
 HJepCVldKZvjDq1zvRFyx/TsZQnoDHsIyv5lAC/EKE3S0/XRg2c=
 =zXC1
 -----END PGP SIGNATURE-----

Merge tag 'x86_cpu_for_v5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 cpu updates from Borislav Petkov:

 - Add support for hardware-enforced cache coherency on AMD which
   obviates the need to flush cachelines before changing the PTE
   encryption bit (Krish Sadhukhan)

 - Add Centaur initialization support for families >= 7 (Tony W Wang-oc)

 - Add a feature flag for, and expose TSX suspend load tracking feature
   to KVM (Cathy Zhang)

 - Emulate SLDT and STR so that Windows programs don't crash on UMIP
   machines (Brendan Shanks and Ricardo Neri)

 - Use the new SERIALIZE insn on Intel hardware which supports it
   (Ricardo Neri)

 - Misc cleanups and fixes

* tag 'x86_cpu_for_v5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  KVM: SVM: Don't flush cache if hardware enforces cache coherency across encryption domains
  x86/mm/pat: Don't flush cache if hardware enforces cache coherency across encryption domnains
  x86/cpu: Add hardware-enforced cache coherency as a CPUID feature
  x86/cpu/centaur: Add Centaur family >=7 CPUs initialization support
  x86/cpu/centaur: Replace two-condition switch-case with an if statement
  x86/kvm: Expose TSX Suspend Load Tracking feature
  x86/cpufeatures: Enumerate TSX suspend load address tracking instructions
  x86/umip: Add emulation/spoofing for SLDT and STR instructions
  x86/cpu: Fix typos and improve the comments in sync_core()
  x86/cpu: Use XGETBV and XSETBV mnemonics in fpu/internal.h
  x86/cpu: Use SERIALIZE in sync_core() when available
2020-10-12 10:24:40 -07:00
Paolo Bonzini
a7d5c7ce41 KVM: nSVM: delay MSR permission processing to first nested VM run
Allow userspace to set up the memory map after KVM_SET_NESTED_STATE;
to do so, move the call to nested_svm_vmrun_msrpm inside the
KVM_REQ_GET_NESTED_STATE_PAGES handler (which is currently
not used by nSVM).  This is similar to what VMX does already.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-09-28 07:59:30 -04:00
Alexander Graf
fd6fa73d13 KVM: x86: SVM: Prevent MSR passthrough when MSR access is denied
We will soon introduce the concept of MSRs that may not be handled in
kernel space. Some MSRs are directly passed through to the guest, effectively
making them handled by KVM from user space's point of view.

This patch introduces all logic required to ensure that MSRs that
user space wants trapped are not marked as direct access for guests.

Signed-off-by: Alexander Graf <graf@amazon.com>
Message-Id: <20200925143422.21718-6-graf@amazon.com>
[Make terminology a bit more similar to VMX. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-09-28 07:58:06 -04:00
Aaron Lewis
476c9bd8e9 KVM: x86: Prepare MSR bitmaps for userspace tracked MSRs
Prepare vmx and svm for a subsequent change that ensures the MSR permission
bitmap is set to allow an MSR that userspace is tracking to force a vmx_vmexit
in the guest.

Signed-off-by: Aaron Lewis <aaronlewis@google.com>
Reviewed-by: Oliver Upton <oupton@google.com>
[agraf: rebase, adapt SVM scheme to nested changes that came in between]
Signed-off-by: Alexander Graf <graf@amazon.com>
Message-Id: <20200925143422.21718-5-graf@amazon.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-09-28 07:58:05 -04:00
Sean Christopherson
cc167bd7ee KVM: x86: Use common definition for kvm_nested_vmexit tracepoint
Use the newly introduced TRACE_EVENT_KVM_EXIT to define the guts of
kvm_nested_vmexit so that it captures and prints the same information as
kvm_exit.  This has the bonus side effect of fixing the interrupt info
and error code printing for the case where they're invalid, e.g. if the
exit was a failed VM-Entry.  This also sets the stage for retrieving
EXIT_QUALIFICATION and VM_EXIT_INTR_INFO in nested_vmx_reflect_vmexit()
if and only if the VM-Exit is being routed to L1.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200923201349.16097-7-sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-09-28 07:57:52 -04:00
Sean Christopherson
235ba74f00 KVM: x86: Add intr/vectoring info and error code to kvm_exit tracepoint
Extend the kvm_exit tracepoint to align it with kvm_nested_vmexit in
terms of what information is captured.  On SVM, add interrupt info and
error code, while on VMX it adds IDT vectoring and error code.  This
sets the stage for macrofying the kvm_exit tracepoint definition so that
it can be reused for kvm_nested_vmexit without loss of information.

Opportunistically stuff a zero for VM_EXIT_INTR_INFO if the VM-Enter
failed, as the field is guaranteed to be invalid.  Note, it'd be
possible to further filter the interrupt/exception fields based on the
VM-Exit reason, but the helper is intended only for tracepoints, i.e.
an extra VMREAD or two is a non-issue, the failed VM-Enter case is just
low hanging fruit.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200923201349.16097-5-sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-09-28 07:57:51 -04:00
Sean Christopherson
a9d7d76c66 KVM: x86: Read guest RIP from within the kvm_nested_vmexit tracepoint
Use kvm_rip_read() to read the guest's RIP for the nested VM-Exit
tracepoint instead of having the caller pass in an argument.  Params
that are passed into a tracepoint are evaluated even if the tracepoint
is disabled, i.e. passing in RIP for VMX incurs a VMREAD and retpoline
to retrieve a value that may never be used, e.g. if the exit is due to a
hardware interrupt.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200923201349.16097-3-sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-09-28 07:57:50 -04:00
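
A minimal sketch of the idea, as a cut-down TRACE_EVENT: moving the read
into TP_fast_assign defers it until the event is actually enabled.

    TRACE_EVENT(kvm_nested_vmexit_sketch,
        TP_PROTO(struct kvm_vcpu *vcpu, u32 exit_reason),
        TP_ARGS(vcpu, exit_reason),
        TP_STRUCT__entry(
            __field(__u64, rip)
            __field(__u32, exit_reason)
        ),
        TP_fast_assign(
            __entry->rip         = kvm_rip_read(vcpu); /* deferred VMREAD */
            __entry->exit_reason = exit_reason;
        ),
        TP_printk("rip 0x%llx reason %u", __entry->rip, __entry->exit_reason)
    );
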
Paolo Bonzini
14e3dd8d25 KVM: SEV: shorten comments around sev_clflush_pages
Very similar content is present in four comments in sev.c.  Unfortunately
there are small differences that make it harder to place the comment
in sev_clflush_pages itself, but at least we can make it more concise.

Suggested-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-09-28 07:57:24 -04:00
Cfir Cohen
50085beee8 KVM: SVM: Mark SEV launch secret pages as dirty.
The LAUNCH_SECRET command performs encryption of the
launch secret memory contents. Mark pinned pages as
dirty before unpinning them.
This matches the logic in sev_launch_update_data().

Signed-off-by: Cfir Cohen <cfir@google.com>
Message-Id: <20200808003746.66687-1-cfir@google.com>
Reviewed-by: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-09-28 07:57:24 -04:00
Krish Sadhukhan
fb0f33fdef KVM: nSVM: CR3 MBZ bits are only 63:52
Commit 761e416934 created a wrong mask for the
CR3 MBZ bits. According to APM vol 2, only the upper 12 bits are MBZ.

Fixes: 761e416934 ("KVM: nSVM: Check that MBZ bits in CR3 and CR4 are not set on vmrun of nested guests", 2020-07-08)
Signed-off-by: Krish Sadhukhan <krish.sadhukhan@oracle.com>
Message-Id: <20200829004824.4577-2-krish.sadhukhan@oracle.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-09-28 07:57:23 -04:00
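
A minimal self-contained sketch of the corrected mask, covering exactly
bits 63:52:

    #include <stdint.h>

    #define CR3_MBZ_63_52 (~0ULL << 52)   /* 0xfff0000000000000 */

    static int nested_cr3_ok(uint64_t cr3)
    {
        return (cr3 & CR3_MBZ_63_52) == 0; /* only the top 12 bits are MBZ */
    }
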
Haiwei Li
95b28ac9db KVM: SVM: Add tracepoint for cr_interception
Add trace_kvm_cr_write and trace_kvm_cr_read for svm.

Signed-off-by: Haiwei Li <lihaiwei@tencent.com>
Message-Id: <f3031602-db3b-c4fe-b719-d402663b0a2b@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-09-28 07:57:21 -04:00
Wanpeng Li
4e810adb53 KVM: SVM: Analyze is_guest_mode() in svm_vcpu_run()
Check is_guest_mode() in svm_vcpu_run() instead of in
svm_exit_handlers_fastpath(), in conformity with the VMX version.

Suggested-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
Message-Id: <1600066548-4343-1-git-send-email-wanpengli@tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-09-28 07:57:21 -04:00
Sean Christopherson
09e3e2a1cc KVM: x86: Add kvm_x86_ops hook to short circuit emulation
Replace the existing kvm_x86_ops.need_emulation_on_page_fault() with a
more generic is_emulatable(), and unconditionally call the new function
in x86_emulate_instruction().

KVM will use the generic hook to support multiple security related
technologies that prevent emulation in one way or another.  Similar to
the existing AMD #NPF case where emulation of the current instruction is
not possible due to lack of information, AMD's SEV-ES and Intel's SGX
and TDX will introduce scenarios where emulation is impossible due to
the guest's register state being inaccessible.  And again similar to the
existing #NPF case, emulation can be initiated by kvm_mmu_page_fault(),
i.e. outside of the control of vendor-specific code.

While the cause and architecturally visible behavior of the various
cases are different, e.g. SGX will inject a #UD, AMD #NPF is a clean
resume or complete shutdown, and SEV-ES and TDX "return" an error, the
impact on the common emulation code is identical: KVM must stop
emulation immediately and resume the guest.

Query is_emulatable() in handle_ud() as well so that the
force_emulation_prefix code doesn't incorrectly modify RIP before
calling emulate_instruction() in the absurdly unlikely scenario that
KVM encounters forced emulation in conjunction with "do not emulate".

Cc: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200915232702.15945-1-sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-09-28 07:57:19 -04:00
Haiwei Li
ae5a2a39e4 KVM: SVM: use __GFP_ZERO instead of clear_page()
Use __GFP_ZERO when calling alloc_page().

Signed-off-by: Haiwei Li <lihaiwei@tencent.com>
Message-Id: <20200916083621.5512-1-lihaiwei.kernel@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-09-28 07:57:19 -04:00
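
A minimal sketch of the pattern (a fragment, assuming GFP_KERNEL_ACCOUNT
as the base flags):

    /* before: allocate, then clear by hand */
    page = alloc_page(GFP_KERNEL_ACCOUNT);
    clear_page(page_address(page));

    /* after: let the allocator hand back a zeroed page */
    page = alloc_page(GFP_KERNEL_ACCOUNT | __GFP_ZERO);
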
Babu Moger
4407a797e9 KVM: SVM: Enable INVPCID feature on AMD
The following intercept bit has been added to support VMEXIT
for the INVPCID instruction:
Code    Name            Cause
A2h     VMEXIT_INVPCID  INVPCID instruction

The following bit has been added to the VMCB layout control area
to control intercept of INVPCID:
Byte Offset     Bit(s)    Function
14h             2         intercept INVPCID

Enable the interception when the guest is running with shadow paging
enabled, and handle the TLB flush based on the INVPCID instruction
type.

For guests with nested page table (NPT) support, the INVPCID
instruction runs natively; KVM does not need to do any special
handling in this case.

AMD documentation for INVPCID feature is available at "AMD64
Architecture Programmer’s Manual Volume 2: System Programming,
Pub. 24593 Rev. 3.34(or later)"

The documentation can be obtained at the links below:
Link: https://www.amd.com/system/files/TechDocs/24593.pdf
Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537

Signed-off-by: Babu Moger <babu.moger@amd.com>
Reviewed-by: Jim Mattson <jmattson@google.com>
Message-Id: <159985255929.11252.17346684135277453258.stgit@bmoger-ubuntu>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-09-28 07:57:17 -04:00
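
A minimal sketch of the enable logic (a fragment; the intercept is only
wanted under shadow paging):

    /* With NPT, INVPCID runs natively and needs no exit handling. */
    if (!npt_enabled)
        svm_set_intercept(svm, INTERCEPT_INVPCID);
    else
        svm_clr_intercept(svm, INTERCEPT_INVPCID);
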
Babu Moger
830bd71f2c KVM: SVM: Remove set_cr_intercept, clr_cr_intercept and is_cr_intercept
Remove set_cr_intercept, clr_cr_intercept and is_cr_intercept. Instead
call the generic svm_set_intercept, svm_clr_intercept and svm_is_intercept
for all CR intercepts.

Signed-off-by: Babu Moger <babu.moger@amd.com>
Reviewed-by: Jim Mattson <jmattson@google.com>
Message-Id: <159985253016.11252.16945893859439811480.stgit@bmoger-ubuntu>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-09-28 07:57:15 -04:00
Babu Moger
4c44e8d6c1 KVM: SVM: Add new intercept word in vmcb_control_area
New intercept bits have been added in the vmcb control area to support
a few more interceptions. Here are some of them:
 - INTERCEPT_INVLPGB,
 - INTERCEPT_INVLPGB_ILLEGAL,
 - INTERCEPT_INVPCID,
 - INTERCEPT_MCOMMIT,
 - INTERCEPT_TLBSYNC,

Add a new intercept word in vmcb_control_area to support these instructions.
Also update kvm_nested_vmrun trace function to support the new addition.

AMD documentation for these instructions is available at "AMD64
Architecture Programmer’s Manual Volume 2: System Programming, Pub. 24593
Rev. 3.34(or later)"

The documentation can be obtained at the links below:
Link: https://www.amd.com/system/files/TechDocs/24593.pdf
Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537

Signed-off-by: Babu Moger <babu.moger@amd.com>
Reviewed-by: Jim Mattson <jmattson@google.com>
Message-Id: <159985251547.11252.16994139329949066945.stgit@bmoger-ubuntu>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-09-28 07:57:15 -04:00
Babu Moger
c62e2e94b9 KVM: SVM: Modify 64 bit intercept field to two 32 bit vectors
Convert all the intercepts to one array of 32 bit vectors in
vmcb_control_area. This makes future intercept vector additions
easy. Also update the trace functions.

Signed-off-by: Babu Moger <babu.moger@amd.com>
Reviewed-by: Jim Mattson <jmattson@google.com>
Message-Id: <159985250813.11252.5736581193881040525.stgit@bmoger-ubuntu>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-09-28 07:57:15 -04:00
Babu Moger
9780d51dc2 KVM: SVM: Modify intercept_exceptions to generic intercepts
Modify intercept_exceptions to generic intercepts in vmcb_control_area. Use
the generic vmcb_set_intercept, vmcb_clr_intercept and vmcb_is_intercept to
set/clear/test the intercept_exceptions bits.

Signed-off-by: Babu Moger <babu.moger@amd.com>
Reviewed-by: Jim Mattson <jmattson@google.com>
Message-Id: <159985250037.11252.1361972528657052410.stgit@bmoger-ubuntu>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-09-28 07:57:14 -04:00
Babu Moger
30abaa8838 KVM: SVM: Change intercept_dr to generic intercepts
Modify intercept_dr to generic intercepts in vmcb_control_area. Use
the generic vmcb_set_intercept, vmcb_clr_intercept and vmcb_is_intercept
to set/clear/test the intercept_dr bits.

Signed-off-by: Babu Moger <babu.moger@amd.com>
Reviewed-by: Jim Mattson <jmattson@google.com>
Message-Id: <159985249255.11252.10000868032136333355.stgit@bmoger-ubuntu>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-09-28 07:57:14 -04:00
Babu Moger
03bfeeb988 KVM: SVM: Change intercept_cr to generic intercepts
Change intercept_cr to generic intercepts in vmcb_control_area.
Use the new vmcb_set_intercept, vmcb_clr_intercept and vmcb_is_intercept
where applicable.

Signed-off-by: Babu Moger <babu.moger@amd.com>
Reviewed-by: Jim Mattson <jmattson@google.com>
Message-Id: <159985248506.11252.9081085950784508671.stgit@bmoger-ubuntu>
[Change constant names. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-09-28 07:57:13 -04:00
Babu Moger
c45ad7229d KVM: SVM: Introduce vmcb_(set_intercept/clr_intercept/_is_intercept)
This is in preparation for future intercept vector additions.

Add new functions vmcb_set_intercept, vmcb_clr_intercept and vmcb_is_intercept
using the kernel APIs __set_bit, __clear_bit and test_bit respectively.

Signed-off-by: Babu Moger <babu.moger@amd.com>
Reviewed-by: Jim Mattson <jmattson@google.com>
Message-Id: <159985247876.11252.16039238014239824460.stgit@bmoger-ubuntu>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-09-28 07:57:13 -04:00
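
A minimal sketch of the three helpers, assuming control->intercepts is
the array of intercept words introduced by this series:

    static inline void vmcb_set_intercept(struct vmcb_control_area *control,
                                          u32 bit)
    {
        __set_bit(bit, (unsigned long *)&control->intercepts);
    }

    static inline void vmcb_clr_intercept(struct vmcb_control_area *control,
                                          u32 bit)
    {
        __clear_bit(bit, (unsigned long *)&control->intercepts);
    }

    static inline bool vmcb_is_intercept(struct vmcb_control_area *control,
                                         u32 bit)
    {
        return test_bit(bit, (unsigned long *)&control->intercepts);
    }
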
Babu Moger
a90c1ed9f1 KVM: nSVM: Remove unused field
host_intercept_exceptions is not used anywhere. Clean it up.

Signed-off-by: Babu Moger <babu.moger@amd.com>
Reviewed-by: Jim Mattson <jmattson@google.com>
Message-Id: <159985252277.11252.8819848322175521354.stgit@bmoger-ubuntu>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-09-28 07:57:12 -04:00
Maxim Levitsky
8d22b90e94 KVM: SVM: refactor exit labels in svm_create_vcpu
The kernel coding style suggests not using labels like error1, error2.

Suggested-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20200827171145.374620-6-mlevitsk@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-09-28 07:57:12 -04:00
Maxim Levitsky
0681de1b83 KVM: SVM: use __GFP_ZERO instead of clear_page
Another small refactoring.

Suggested-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20200827171145.374620-5-mlevitsk@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-09-28 07:57:11 -04:00
Maxim Levitsky
f4c847a956 KVM: SVM: refactor msr permission bitmap allocation
Replace svm_vcpu_init_msrpm with svm_vcpu_alloc_msrpm, which also allocates
the MSR bitmap, and add svm_vcpu_free_msrpm to free it.

This will be used later to move the nested MSR permission bitmap allocation
to nested.c.

Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20200827171145.374620-4-mlevitsk@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-09-28 07:57:11 -04:00
Maxim Levitsky
0dd16b5b0c KVM: nSVM: rename nested vmcb to vmcb12
This is to be more consistent with VMX, and to support the
upcoming addition of vmcb02.

Hopefully no functional changes.

Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20200827171145.374620-3-mlevitsk@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-09-28 07:57:10 -04:00
Maxim Levitsky
1feaba144c KVM: SVM: rename a variable in the svm_create_vcpu
The 'page' variable is meant to hold the vcpu's vmcb, so name it as
such to avoid confusion.

Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Reviewed-by: Jim Mattson <jmattson@google.com>
Message-Id: <20200827171145.374620-2-mlevitsk@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-09-28 07:57:10 -04:00
Wanpeng Li
010fd37fdd KVM: LAPIC: Reduce world switch latency caused by timer_advance_ns
All the checks in lapic_timer_int_injected() and __kvm_wait_lapic_expire(),
and the calls to these functions, waste CPU cycles when the timer mode is
not tscdeadline. We can observe ~1.3% world-switch time overhead with
kvm-unit-tests/vmexit.flat vmcall testing on an AMD server. This patch
reduces the world-switch latency caused by the timer_advance_ns feature
when the timer mode is not tscdeadline by simply moving the check against
apic->lapic_timer.expired_tscdeadline much earlier.

Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
Message-Id: <1599731444-3525-7-git-send-email-wanpengli@tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-09-28 07:57:10 -04:00
Vitaly Kuznetsov
d5cd6f3401 KVM: nSVM: Avoid freeing uninitialized pointers in svm_set_nested_state()
The save and ctl pointers are passed uninitialized to kfree() when
svm_set_nested_state() follows the 'goto out_set_gif' path. While the
issue could've been fixed by initializing these on-stack variables to
NULL, it seems preferable to eliminate the 'out_set_gif' label completely, as
it is not actually a failure path and duplicating a single svm_set_gif()
call doesn't look too bad.

 [ bp: Drop obscure Addresses-Coverity: tag. ]

Fixes: 6ccbd29ade ("KVM: SVM: nested: Don't allocate VMCB structures on stack")
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Reported-by: Joerg Roedel <jroedel@suse.de>
Reported-by: Colin King <colin.king@canonical.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Sean Christopherson <sean.j.christopherson@intel.com>
Acked-by: Joerg Roedel <jroedel@suse.de>
Tested-by: Tom Lendacky <thomas.lendacky@amd.com>
Link: https://lkml.kernel.org/r/20200914133725.650221-1-vkuznets@redhat.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-09-28 07:56:43 -04:00
Linus Torvalds
7c7ec3226f Five small fixes. The nested migration bug will be fixed
with a better API in 5.10 or 5.11; for now this is a fix
 that works with existing userspace but keeps the current
 ugly API.
 -----BEGIN PGP SIGNATURE-----
 
 iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAl9ufLMUHHBib256aW5p
 QHJlZGhhdC5jb20ACgkQv/vSX3jHroMe9AgAgU3YQ2SktkqEOXjHMLqCH5Y3PKFI
 S2anYpoKlH36Q6kzoqtkCj0GVagvdh5+Envz3I/tMdhv3Y/JgZaX1wHAe4cUl9BT
 VyoiDBTWkhYRmpUbLYA8AtmgxQw1Hp8srH86rnvVGmLG6zdAa/rgUAKiQgT688Ej
 CQvF5H7Zi3viPo2rInNSkgTIgewduqSWkwJ6+h4AQMmNJpbRaeZs45yMYyyu/FIi
 hUazy7Rwk2vkWcuTd/sqH9b9y3VCYpN9juRaehEiK8qxXT3ydTU4Tub25BHmvXdr
 dx5pShG4P3nAGnfV1qKAemyQcY7sjfMieqN1F3QcsRcxqZgySUm11o2JRw==
 =sHsX
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull more kvm fixes from Paolo Bonzini:
 "Five small fixes.

  The nested migration bug will be fixed with a better API in 5.10 or
  5.11; for now this is a fix that works with existing userspace but
  keeps the current ugly API"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  KVM: SVM: Add a dedicated INVD intercept routine
  KVM: x86: Reset MMU context if guest toggles CR4.SMAP or CR4.PKE
  KVM: x86: fix MSR_IA32_TSC read for nested migration
  selftests: kvm: Fix assert failure in single-step test
  KVM: x86: VMX: Make smaller physical guest address space support user-configurable
2020-09-25 17:15:19 -07:00
Tom Lendacky
4bb05f3048 KVM: SVM: Add a dedicated INVD intercept routine
The INVD instruction intercept performs emulation. Emulation can't be done
on an SEV guest because the guest memory is encrypted.

Provide a dedicated intercept routine for the INVD intercept. And since
the instruction is emulated as a NOP, just skip it instead.

Fixes: 1654efcbc4 ("KVM: SVM: Add KVM_SEV_INIT command")
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Message-Id: <a0b9a19ffa7fef86a3cc700c7ea01cb2731e04e5.1600972918.git.thomas.lendacky@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-09-25 13:27:35 -04:00
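
A minimal sketch of such a NOP-style handler (signature simplified):

    static int invd_interception(struct vcpu_svm *svm)
    {
        /* INVD is emulated as a NOP: skip the instruction instead of
         * invoking the emulator, which cannot read SEV guest memory. */
        return kvm_skip_emulated_instruction(&svm->vcpu);
    }
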
Paolo Bonzini
bf3c0e5e71 Merge branch 'x86-seves-for-paolo' of https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip into HEAD 2020-09-22 06:43:17 -04:00
Krish Sadhukhan
e1ebb2b490 KVM: SVM: Don't flush cache if hardware enforces cache coherency across encryption domains
In some hardware implementations, coherency between the encrypted and
unencrypted mappings of the same physical page in a VM is enforced. In
such a system, it is not required for software to flush the VM's page
from all CPU caches in the system prior to changing the value of the
C-bit for the page.

So check that bit before flushing the cache.

Signed-off-by: Krish Sadhukhan <krish.sadhukhan@oracle.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Link: https://lkml.kernel.org/r/20200917212038.5090-4-krish.sadhukhan@oracle.com
2020-09-19 20:46:59 +02:00
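
A minimal sketch of the gate (a fragment, assuming the feature flag
introduced by this series):

    /* Skip the flush when hardware keeps the encrypted and unencrypted
     * mappings of a page coherent. */
    if (boot_cpu_has(X86_FEATURE_SME_COHERENT))
        return;

    clflush_cache_range(vaddr, PAGE_SIZE);
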
Linus Torvalds
84b1349972 ARM:
- Multiple stolen time fixes, with a new capability to match x86
 - Fix for hugetlbfs mappings when PUD and PMD are the same level
 - Fix for hugetlbfs mappings when PTE mappings are enforced
   (dirty logging, for example)
 - Fix tracing output of 64bit values
 
 x86:
 - nSVM state restore fixes
 - Async page fault fixes
 - Lots of small fixes everywhere
 -----BEGIN PGP SIGNATURE-----
 
 iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAl9dM5kUHHBib256aW5p
 QHJlZGhhdC5jb20ACgkQv/vSX3jHroM+Iwf+LbISO7ccpPMK1kKtOeug/jZv+xQA
 sVaBGRzYo+k2e0XtV8E8IV4N30FBtYSwXsbBKkMAoy2FpmMebgDWDQ7xspb6RJMS
 /y8t1iqPwdOaLIkUkgc7UihSTlZm05Es3f3q6uZ9+oaM4Fe+V7xWzTUX4Oy89JO7
 KcQsTD7pMqS4bfZGADK781ITR/WPgCi0aYx5s6dcwcZAQXhb1K1UKEjB8OGKnjUh
 jliReJtxRA16rjF+S5aJ7L07Ce/ksrfwkI4NXJ4GxW+lyOfVNdSBJUBaZt1m7G2M
 1We5+i5EjKCjuxmgtUUUfVdazpj1yl+gBGT7KKkLte9T9WZdXyDnixAbvg==
 =OFb3
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull kvm fixes from Paolo Bonzini:
 "A bit on the bigger side, mostly due to me being on vacation, then
  busy, then on parental leave, but there's nothing worrisome.

  ARM:
   - Multiple stolen time fixes, with a new capability to match x86
   - Fix for hugetlbfs mappings when PUD and PMD are the same level
   - Fix for hugetlbfs mappings when PTE mappings are enforced (dirty
     logging, for example)
   - Fix tracing output of 64bit values

  x86:
   - nSVM state restore fixes
   - Async page fault fixes
   - Lots of small fixes everywhere"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (25 commits)
  KVM: emulator: more strict rsm checks.
  KVM: nSVM: more strict SMM checks when returning to nested guest
  SVM: nSVM: setup nested msr permission bitmap on nested state load
  SVM: nSVM: correctly restore GIF on vmexit from nesting after migration
  x86/kvm: don't forget to ACK async PF IRQ
  x86/kvm: properly use DEFINE_IDTENTRY_SYSVEC() macro
  KVM: VMX: Don't freeze guest when event delivery causes an APIC-access exit
  KVM: SVM: avoid emulation with stale next_rip
  KVM: x86: always allow writing '0' to MSR_KVM_ASYNC_PF_EN
  KVM: SVM: Periodically schedule when unregistering regions on destroy
  KVM: MIPS: Change the definition of kvm type
  kvm x86/mmu: use KVM_REQ_MMU_SYNC to sync when needed
  KVM: nVMX: Fix the update value of nested load IA32_PERF_GLOBAL_CTRL control
  KVM: fix memory leak in kvm_io_bus_unregister_dev()
  KVM: Check the allocation of pv cpu mask
  KVM: nVMX: Update VMCS02 when L2 PAE PDPTE updates detected
  KVM: arm64: Update page shift if stage 2 block mapping not supported
  KVM: arm64: Fix address truncation in traces
  KVM: arm64: Do not try to map PUDs when they are folded into PMD
  arm64/x86: KVM: Introduce steal-time cap
  ...
2020-09-13 08:34:47 -07:00
Maxim Levitsky
3ebb5d2617 KVM: nSVM: more strict SMM checks when returning to nested guest
* Check that the guest is a 64-bit guest; otherwise the SVM-related fields
  in the SMM state area are not defined.

* If the SMM area indicates that SMM interrupted a running guest,
  check that EFER.SVME, which is also saved in this area, is set; otherwise
  the guest might have tampered with the SMM save area, and so indicate an
  emulation failure, which should triple fault the guest.

* Check that the guest CPUID supports SVM (due to the same issue as above).

Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20200827162720.278690-4-mlevitsk@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-09-12 12:21:43 -04:00
Maxim Levitsky
772b81bb2f SVM: nSVM: setup nested msr permission bitmap on nested state load
This code was missing and was forcing L2 to run with L1's MSR
permission bitmap.

Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20200827162720.278690-3-mlevitsk@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-09-12 12:20:53 -04:00
Maxim Levitsky
9883764ad0 SVM: nSVM: correctly restore GIF on vmexit from nesting after migration
Currently the code in svm_set_nested_state copies the current vmcb control
area to the L1 control area (hsave->control), under the assumption that it
mostly reflects the defaults that KVM chose, and that QEMU later overrides
these defaults with L2 state using standard KVM interfaces, like
KVM_SET_REGS.

However nested GIF (which is an AMD-specific thing) is true by default,
and it is copied to the hsave area as such.

This alone is not a big deal, since on VMexit GIF is always set to false,
regardless of what it was on VM entry.  However, in nested_svm_vmexit we
first set GIF to false, but then overwrite the control fields with the
values from the hsave area (including the nested GIF field itself, if
GIF virtualization is enabled).

Now on a normal vm entry this is not a problem, since GIF is usually false
prior to normal vm entry, and this is the value that is copied to hsave
and then restored; but this is not always the case when the nested state
is loaded, as explained above.

To fix this issue, move svm_set_gif after we restore the L1 control
state in nested_svm_vmexit, so that even with a wrong GIF in the
saved L1 control area, we still clear GIF as the spec says.

Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20200827162720.278690-2-mlevitsk@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-09-12 12:19:06 -04:00
Wanpeng Li
e42c68281b KVM: SVM: avoid emulation with stale next_rip
svm->next_rip is reset in svm_vcpu_run() only after calling
svm_exit_handlers_fastpath(), which will cause SVM's
skip_emulated_instruction() to write a stale RIP.

We can move svm_exit_handlers_fastpath towards the end of
svm_vcpu_run().  To align VMX with SVM, keep svm_complete_interrupts()
close as well.

Suggested-by: Sean Christopherson <sean.j.christopherson@intel.com>
Cc: Paul K. <kronenpj@kronenpj.dyndns.org>
Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
[Also move vmcb_mark_all_clean before any possible write to the VMCB.
 - Paolo]
2020-09-12 02:19:23 -04:00