ARM: Cleanups and corner case fixes

PPC: Bugfixes
 
 x86:
 * Support for mapping DAX areas with large nested page table entries.
 * Cleanups and bugfixes here too.  A particularly important one is
 a fix for FPU load when the thread has TIF_NEED_FPU_LOAD.  There is
 also a race condition which could be exploited from guest userspace
 to attack the guest kernel, for which the embargo expired today.
 * Fast path for IPI delivery vmexits, shaving about 200 clock cycles
 from IPI latency.
 * Protect against "Spectre-v1/L1TF" (bring data into the cache via
 speculative out-of-bounds accesses, then use L1TF on the sibling
 hyperthread to read it), which unfortunately is an even bigger
 whack-a-mole game than Spectre v1.
 
 Sean continues his mission to rewrite KVM.  In addition to a sizable
 number of x86 patches, this time he contributed a pretty large refactoring
 of vCPU creation that affects all architectures but should not have any
 visible effect.
 
 s390 will come next week together with some more x86 patches.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQEcBAABAgAGBQJeMxtCAAoJEL/70l94x66DQxIIAJv9hMmXLQHGFnUMskjGErR6
 DCLSC0YRdRMwE50CerblyJtGsMwGsPyHZwvZxoAceKJ9w0Yay9cyaoJ87ItBgHoY
 ce0HrqIUYqRSJ/F8WH2lSzkzMBr839rcmqw8p1tt4D5DIsYnxHGWwRaaP+5M/1KQ
 YKFu3Hea4L00U339iIuDkuA+xgz92LIbsn38svv5fxHhPAyWza0rDEYHNgzMKuoF
 IakLf5+RrBFAh6ZuhYWQQ44uxjb+uQa9pVmcqYzzTd5t1g4PV5uXtlJKesHoAvik
 Eba8IEUJn+HgQJjhp3YxQYuLeWOwRF3bwOiZ578MlJ4OPfYXMtbdlqCQANHOcGk=
 =H/q1
 -----END PGP SIGNATURE-----

Merge tag 'kvm-5.6-1' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull KVM updates from Paolo Bonzini:
 "This is the first batch of KVM changes.

  ARM:
   - cleanups and corner case fixes.

  PPC:
   - Bugfixes

  x86:
   - Support for mapping DAX areas with large nested page table entries.

   - Cleanups and bugfixes here too. A particularly important one is a
     fix for FPU load when the thread has TIF_NEED_FPU_LOAD. There is
     also a race condition which could be exploited from guest userspace
     to attack the guest kernel, for which the embargo expired today.

   - Fast path for IPI delivery vmexits, shaving about 200 clock cycles
     from IPI latency.

   - Protect against "Spectre-v1/L1TF" (bring data into the cache via
     speculative out-of-bounds accesses, then use L1TF on the sibling
     hyperthread to read it), which unfortunately is an even bigger
     whack-a-mole game than Spectre v1.

  Sean continues his mission to rewrite KVM. In addition to a sizable
  number of x86 patches, this time he contributed a pretty large
  refactoring of vCPU creation that affects all architectures but should
  not have any visible effect.

  s390 will come next week together with some more x86 patches"
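
The "Spectre-v1/L1TF" hardening called out in the x86 notes above boils
down to clamping guest-controlled array indices before any speculatively
executed load, so that a mispredicted bounds check cannot pull
out-of-bounds data into the L1 cache for an L1TF attacker on the sibling
hyperthread to read. A minimal sketch of that pattern using the kernel's
array_index_nospec() helper; the table type and field names below are
invented for illustration and are not structures touched by this series:

#include <linux/nospec.h>
#include <linux/types.h>

/* Hypothetical lookup table, for illustration only. */
struct demo_table {
	u32 nr_entries;
	u64 entries[64];
};

static u64 demo_lookup(struct demo_table *t, u32 idx)
{
	if (idx >= t->nr_entries)
		return 0;

	/*
	 * Clamp idx to [0, nr_entries) even if the bounds check above is
	 * bypassed speculatively, so no out-of-bounds cache line is brought
	 * in for the sibling hyperthread to leak via L1TF.
	 */
	idx = array_index_nospec(idx, t->nr_entries);
	return t->entries[idx];
}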

* tag 'kvm-5.6-1' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (204 commits)
  x86/KVM: Clean up host's steal time structure
  x86/KVM: Make sure KVM_VCPU_FLUSH_TLB flag is not missed
  x86/kvm: Cache gfn to pfn translation
  x86/kvm: Introduce kvm_(un)map_gfn()
  x86/kvm: Be careful not to clear KVM_VCPU_FLUSH_TLB bit
  KVM: PPC: Book3S PR: Fix -Werror=return-type build failure
  KVM: PPC: Book3S HV: Release lock on page-out failure path
  KVM: arm64: Treat emulated TVAL TimerValue as a signed 32-bit integer
  KVM: arm64: pmu: Only handle supported event counters
  KVM: arm64: pmu: Fix chained SW_INCR counters
  KVM: arm64: pmu: Don't mark a counter as chained if the odd one is disabled
  KVM: arm64: pmu: Don't increment SW_INCR if PMCR.E is unset
  KVM: x86: Use a typedef for fastop functions
  KVM: X86: Add 'else' to unify fastop and execute call path
  KVM: x86: inline memslot_valid_for_gpte
  KVM: x86/mmu: Use huge pages for DAX-backed files
  KVM: x86/mmu: Remove lpage_is_disallowed() check from set_spte()
  KVM: x86/mmu: Fold max_mapping_level() into kvm_mmu_hugepage_adjust()
  KVM: x86/mmu: Zap any compound page when collapsing sptes
  KVM: x86/mmu: Remove obsolete gfn restoration in FNAME(fetch)
  ...
Linus Torvalds 2020-01-31 09:30:41 -08:00
commit e813e65038
92 changed files with 2400 additions and 2058 deletions


@ -948,6 +948,66 @@ Use cases
up its internal state for this virtual machine.
H_SVM_INIT_ABORT
----------------
Abort the process of securing an SVM.
Syntax
~~~~~~
.. code-block:: c
uint64_t hypercall(const uint64_t H_SVM_INIT_ABORT)
Return values
~~~~~~~~~~~~~
One of the following values:
* H_PARAMETER on successfully cleaning up the state,
Hypervisor will return this value to the
**guest**, to indicate that the underlying
UV_ESM ultracall failed.
* H_STATE if called after a VM has gone secure (i.e
H_SVM_INIT_DONE hypercall was successful).
* H_UNSUPPORTED if called from a wrong context (e.g. from a
normal VM).
Description
~~~~~~~~~~~
Abort the process of securing a virtual machine. This call must
be made after a prior call to ``H_SVM_INIT_START`` hypercall and
before a call to ``H_SVM_INIT_DONE``.
On entry into this hypercall the non-volatile GPRs and FPRs are
expected to contain the values they had at the time the VM issued
the UV_ESM ultracall. Further ``SRR0`` is expected to contain the
address of the instruction after the ``UV_ESM`` ultracall and ``SRR1``
the MSR value with which to return to the VM.
This hypercall will cleanup any partial state that was established for
the VM since the prior ``H_SVM_INIT_START`` hypercall, including paging
out pages that were paged-into secure memory, and issue the
``UV_SVM_TERMINATE`` ultracall to terminate the VM.
After the partial state is cleaned up, control returns to the VM
(**not Ultravisor**), at the address specified in ``SRR0`` with the
MSR values set to the value in ``SRR1``.
Use cases
~~~~~~~~~
If after a successful call to ``H_SVM_INIT_START``, the Ultravisor
encounters an error while securing a virtual machine, either due
to lack of resources or because the VM's security information could
not be validated, Ultravisor informs the Hypervisor about it.
Hypervisor should use this call to clean up any internal state for
this virtual machine and return to the VM.
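
The hypervisor-side handler added later in this series (the
kvmppc_h_svm_init_abort() hunk in the Book3S HV uvmem changes further down)
implements exactly the return-value rules listed above; a condensed sketch
of that state machine, with the per-memslot page-out loop reduced to a
comment:

unsigned long kvmppc_h_svm_init_abort(struct kvm *kvm)
{
	/* Only legal between H_SVM_INIT_START and H_SVM_INIT_DONE. */
	if (!(kvm->arch.secure_guest & KVMPPC_SECURE_INIT_START))
		return H_UNSUPPORTED;	/* e.g. called from a normal VM */

	if (kvm->arch.secure_guest & KVMPPC_SECURE_INIT_DONE)
		return H_STATE;		/* the VM has already gone secure */

	/* ... page out everything that was paged into secure memory ... */

	kvm->arch.secure_guest = 0;
	uv_svm_terminate(kvm->arch.lpid);

	/* Returned to the guest to signal that UV_ESM failed. */
	return H_PARAMETER;
}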
H_SVM_PAGE_IN
-------------


@ -2196,6 +2196,15 @@ arm64 CCSIDR registers are demultiplexed by CSSELR value:
arm64 system registers have the following id bit patterns:
0x6030 0000 0013 <op0:2> <op1:3> <crn:4> <crm:4> <op2:3>
WARNING:
Two system register IDs do not follow the specified pattern. These
are KVM_REG_ARM_TIMER_CVAL and KVM_REG_ARM_TIMER_CNT, which map to
system registers CNTV_CVAL_EL0 and CNTVCT_EL0 respectively. These
two had their values accidentally swapped, which means TIMER_CVAL is
derived from the register encoding for CNTVCT_EL0 and TIMER_CNT is
derived from the register encoding for CNTV_CVAL_EL0. As this is
API, it must remain this way.
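
In practice this means userspace must use the KVM_REG_ARM_TIMER_*
constants exactly as defined rather than re-deriving them from the
CNTV_CVAL_EL0 or CNTVCT_EL0 encodings. An illustrative KVM_GET_ONE_REG
call, assuming a vcpu file descriptor that has already been created:

#include <linux/kvm.h>
#include <stdint.h>
#include <sys/ioctl.h>

/* Read the virtual counter through the (historically swapped) register ID. */
static int read_timer_cnt(int vcpu_fd, uint64_t *cnt)
{
	struct kvm_one_reg reg = {
		.id   = KVM_REG_ARM_TIMER_CNT,	/* use the #define, not the encoding */
		.addr = (uint64_t)(uintptr_t)cnt,
	};

	return ioctl(vcpu_fd, KVM_GET_ONE_REG, &reg);
}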
arm64 firmware pseudo-registers have the following bit pattern:
0x6030 0000 0014 <regno:16>


@ -9,18 +9,29 @@
#include <linux/kvm_host.h>
#include <asm/kvm_asm.h>
#include <asm/kvm_mmio.h>
#include <asm/kvm_arm.h>
#include <asm/cputype.h>
/* arm64 compatibility macros */
#define PSR_AA32_MODE_FIQ FIQ_MODE
#define PSR_AA32_MODE_SVC SVC_MODE
#define PSR_AA32_MODE_ABT ABT_MODE
#define PSR_AA32_MODE_UND UND_MODE
#define PSR_AA32_T_BIT PSR_T_BIT
#define PSR_AA32_F_BIT PSR_F_BIT
#define PSR_AA32_I_BIT PSR_I_BIT
#define PSR_AA32_A_BIT PSR_A_BIT
#define PSR_AA32_E_BIT PSR_E_BIT
#define PSR_AA32_IT_MASK PSR_IT_MASK
#define PSR_AA32_GE_MASK 0x000f0000
#define PSR_AA32_DIT_BIT 0x00200000
#define PSR_AA32_PAN_BIT 0x00400000
#define PSR_AA32_SSBS_BIT 0x00800000
#define PSR_AA32_Q_BIT PSR_Q_BIT
#define PSR_AA32_V_BIT PSR_V_BIT
#define PSR_AA32_C_BIT PSR_C_BIT
#define PSR_AA32_Z_BIT PSR_Z_BIT
#define PSR_AA32_N_BIT PSR_N_BIT
unsigned long *vcpu_reg(struct kvm_vcpu *vcpu, u8 reg_num);
@ -41,6 +52,11 @@ static inline void vcpu_write_spsr(struct kvm_vcpu *vcpu, unsigned long v)
*__vcpu_spsr(vcpu) = v;
}
static inline unsigned long host_spsr_to_spsr32(unsigned long spsr)
{
return spsr;
}
static inline unsigned long vcpu_get_reg(struct kvm_vcpu *vcpu,
u8 reg_num)
{
@ -182,6 +198,11 @@ static inline bool kvm_vcpu_dabt_issext(struct kvm_vcpu *vcpu)
return kvm_vcpu_get_hsr(vcpu) & HSR_SSE;
}
static inline bool kvm_vcpu_dabt_issf(const struct kvm_vcpu *vcpu)
{
return false;
}
static inline int kvm_vcpu_dabt_get_rd(struct kvm_vcpu *vcpu)
{
return (kvm_vcpu_get_hsr(vcpu) & HSR_SRT_MASK) >> HSR_SRT_SHIFT;
@ -198,7 +219,7 @@ static inline bool kvm_vcpu_dabt_is_cm(struct kvm_vcpu *vcpu)
}
/* Get Access Size from a data abort */
static inline int kvm_vcpu_dabt_get_as(struct kvm_vcpu *vcpu)
static inline unsigned int kvm_vcpu_dabt_get_as(struct kvm_vcpu *vcpu)
{
switch ((kvm_vcpu_get_hsr(vcpu) >> 22) & 0x3) {
case 0:
@ -209,7 +230,7 @@ static inline int kvm_vcpu_dabt_get_as(struct kvm_vcpu *vcpu)
return 4;
default:
kvm_err("Hardware is weird: SAS 0b11 is reserved\n");
return -EFAULT;
return 4;
}
}


@ -14,7 +14,6 @@
#include <asm/cputype.h>
#include <asm/kvm.h>
#include <asm/kvm_asm.h>
#include <asm/kvm_mmio.h>
#include <asm/fpstate.h>
#include <kvm/arm_arch_timer.h>
@ -202,9 +201,6 @@ struct kvm_vcpu_arch {
/* Don't run the guest (internal implementation need) */
bool pause;
/* IO related fields */
struct kvm_decode mmio_decode;
/* Cache some mmu pages needed inside spinlock regions */
struct kvm_mmu_memory_cache mmu_page_cache;
@ -284,8 +280,6 @@ int kvm_arm_copy_reg_indices(struct kvm_vcpu *vcpu, u64 __user *indices);
int kvm_age_hva(struct kvm *kvm, unsigned long start, unsigned long end);
int kvm_test_age_hva(struct kvm *kvm, unsigned long hva);
struct kvm_vcpu *kvm_arm_get_running_vcpu(void);
struct kvm_vcpu __percpu **kvm_get_running_vcpus(void);
void kvm_arm_halt_guest(struct kvm *kvm);
void kvm_arm_resume_guest(struct kvm *kvm);
@ -300,6 +294,14 @@ int handle_exit(struct kvm_vcpu *vcpu, struct kvm_run *run,
static inline void handle_exit_early(struct kvm_vcpu *vcpu, struct kvm_run *run,
int exception_index) {}
/* MMIO helpers */
void kvm_mmio_write_buf(void *buf, unsigned int len, unsigned long data);
unsigned long kvm_mmio_read_buf(const void *buf, unsigned int len);
int kvm_handle_mmio_return(struct kvm_vcpu *vcpu, struct kvm_run *run);
int io_mem_abort(struct kvm_vcpu *vcpu, struct kvm_run *run,
phys_addr_t fault_ipa);
static inline void __cpu_init_hyp_mode(phys_addr_t pgd_ptr,
unsigned long hyp_stack_ptr,
unsigned long vector_ptr)
@ -363,9 +365,9 @@ struct kvm_vcpu *kvm_mpidr_to_vcpu(struct kvm *kvm, unsigned long mpidr);
static inline bool kvm_arch_requires_vhe(void) { return false; }
static inline void kvm_arch_hardware_unsetup(void) {}
static inline void kvm_arch_sync_events(struct kvm *kvm) {}
static inline void kvm_arch_vcpu_uninit(struct kvm_vcpu *vcpu) {}
static inline void kvm_arch_sched_in(struct kvm_vcpu *vcpu, int cpu) {}
static inline void kvm_arch_vcpu_block_finish(struct kvm_vcpu *vcpu) {}
static inline void kvm_arm_vcpu_destroy(struct kvm_vcpu *vcpu) {}
static inline void kvm_arm_init_debug(void) {}
static inline void kvm_arm_setup_debug(struct kvm_vcpu *vcpu) {}


@ -10,6 +10,7 @@
#include <linux/compiler.h>
#include <linux/kvm_host.h>
#include <asm/cp15.h>
#include <asm/kvm_arm.h>
#include <asm/vfp.h>
#define __hyp_text __section(.hyp.text) notrace


@ -1,26 +0,0 @@
/* SPDX-License-Identifier: GPL-2.0-only */
/*
* Copyright (C) 2012 - Virtual Open Systems and Columbia University
* Author: Christoffer Dall <c.dall@virtualopensystems.com>
*/
#ifndef __ARM_KVM_MMIO_H__
#define __ARM_KVM_MMIO_H__
#include <linux/kvm_host.h>
#include <asm/kvm_asm.h>
#include <asm/kvm_arm.h>
struct kvm_decode {
unsigned long rt;
bool sign_extend;
};
void kvm_mmio_write_buf(void *buf, unsigned int len, unsigned long data);
unsigned long kvm_mmio_read_buf(const void *buf, unsigned int len);
int kvm_handle_mmio_return(struct kvm_vcpu *vcpu, struct kvm_run *run);
int io_mem_abort(struct kvm_vcpu *vcpu, struct kvm_run *run,
phys_addr_t fault_ipa);
#endif /* __ARM_KVM_MMIO_H__ */


@ -34,11 +34,6 @@ struct kvm_stats_debugfs_item debugfs_entries[] = {
{ NULL }
};
int kvm_arch_vcpu_setup(struct kvm_vcpu *vcpu)
{
return 0;
}
static u64 core_reg_offset_from_id(u64 id)
{
return id & ~(KVM_REG_ARCH_MASK | KVM_REG_SIZE_MASK | KVM_REG_ARM_CORE);


@ -17,7 +17,6 @@
#include <asm/esr.h>
#include <asm/kvm_arm.h>
#include <asm/kvm_hyp.h>
#include <asm/kvm_mmio.h>
#include <asm/ptrace.h>
#include <asm/cputype.h>
#include <asm/virt.h>
@ -219,6 +218,38 @@ static inline void vcpu_write_spsr(struct kvm_vcpu *vcpu, unsigned long v)
vcpu_gp_regs(vcpu)->spsr[KVM_SPSR_EL1] = v;
}
/*
* The layout of SPSR for an AArch32 state is different when observed from an
* AArch64 SPSR_ELx or an AArch32 SPSR_*. This function generates the AArch32
* view given an AArch64 view.
*
* In ARM DDI 0487E.a see:
*
* - The AArch64 view (SPSR_EL2) in section C5.2.18, page C5-426
* - The AArch32 view (SPSR_abt) in section G8.2.126, page G8-6256
* - The AArch32 view (SPSR_und) in section G8.2.132, page G8-6280
*
* Which show the following differences:
*
* | Bit | AA64 | AA32 | Notes |
* +-----+------+------+-----------------------------|
* | 24 | DIT | J | J is RES0 in ARMv8 |
* | 21 | SS | DIT | SS doesn't exist in AArch32 |
*
* ... and all other bits are (currently) common.
*/
static inline unsigned long host_spsr_to_spsr32(unsigned long spsr)
{
const unsigned long overlap = BIT(24) | BIT(21);
unsigned long dit = !!(spsr & PSR_AA32_DIT_BIT);
spsr &= ~overlap;
spsr |= dit << 21;
return spsr;
}
static inline bool vcpu_mode_priv(const struct kvm_vcpu *vcpu)
{
u32 mode;
@ -283,6 +314,11 @@ static inline bool kvm_vcpu_dabt_issext(const struct kvm_vcpu *vcpu)
return !!(kvm_vcpu_get_hsr(vcpu) & ESR_ELx_SSE);
}
static inline bool kvm_vcpu_dabt_issf(const struct kvm_vcpu *vcpu)
{
return !!(kvm_vcpu_get_hsr(vcpu) & ESR_ELx_SF);
}
static inline int kvm_vcpu_dabt_get_rd(const struct kvm_vcpu *vcpu)
{
return (kvm_vcpu_get_hsr(vcpu) & ESR_ELx_SRT_MASK) >> ESR_ELx_SRT_SHIFT;
@ -304,7 +340,7 @@ static inline bool kvm_vcpu_dabt_is_cm(const struct kvm_vcpu *vcpu)
return !!(kvm_vcpu_get_hsr(vcpu) & ESR_ELx_CM);
}
static inline int kvm_vcpu_dabt_get_as(const struct kvm_vcpu *vcpu)
static inline unsigned int kvm_vcpu_dabt_get_as(const struct kvm_vcpu *vcpu)
{
return 1 << ((kvm_vcpu_get_hsr(vcpu) & ESR_ELx_SAS) >> ESR_ELx_SAS_SHIFT);
}


@ -24,7 +24,6 @@
#include <asm/fpsimd.h>
#include <asm/kvm.h>
#include <asm/kvm_asm.h>
#include <asm/kvm_mmio.h>
#include <asm/thread_info.h>
#define __KVM_HAVE_ARCH_INTC_INITIALIZED
@ -53,7 +52,7 @@ int kvm_arm_init_sve(void);
int __attribute_const__ kvm_target_cpu(void);
int kvm_reset_vcpu(struct kvm_vcpu *vcpu);
void kvm_arch_vcpu_uninit(struct kvm_vcpu *vcpu);
void kvm_arm_vcpu_destroy(struct kvm_vcpu *vcpu);
int kvm_arch_vm_ioctl_check_extension(struct kvm *kvm, long ext);
void __extended_idmap_trampoline(phys_addr_t boot_pgd, phys_addr_t idmap_start);
@ -325,9 +324,6 @@ struct kvm_vcpu_arch {
/* Don't run the guest (internal implementation need) */
bool pause;
/* IO related fields */
struct kvm_decode mmio_decode;
/* Cache some mmu pages needed inside spinlock regions */
struct kvm_mmu_memory_cache mmu_page_cache;
@ -446,8 +442,6 @@ int kvm_set_spte_hva(struct kvm *kvm, unsigned long hva, pte_t pte);
int kvm_age_hva(struct kvm *kvm, unsigned long start, unsigned long end);
int kvm_test_age_hva(struct kvm *kvm, unsigned long hva);
struct kvm_vcpu *kvm_arm_get_running_vcpu(void);
struct kvm_vcpu * __percpu *kvm_get_running_vcpus(void);
void kvm_arm_halt_guest(struct kvm *kvm);
void kvm_arm_resume_guest(struct kvm *kvm);
@ -491,6 +485,14 @@ int handle_exit(struct kvm_vcpu *vcpu, struct kvm_run *run,
void handle_exit_early(struct kvm_vcpu *vcpu, struct kvm_run *run,
int exception_index);
/* MMIO helpers */
void kvm_mmio_write_buf(void *buf, unsigned int len, unsigned long data);
unsigned long kvm_mmio_read_buf(const void *buf, unsigned int len);
int kvm_handle_mmio_return(struct kvm_vcpu *vcpu, struct kvm_run *run);
int io_mem_abort(struct kvm_vcpu *vcpu, struct kvm_run *run,
phys_addr_t fault_ipa);
int kvm_perf_init(void);
int kvm_perf_teardown(void);


@ -1,29 +0,0 @@
/* SPDX-License-Identifier: GPL-2.0-only */
/*
* Copyright (C) 2012 - Virtual Open Systems and Columbia University
* Author: Christoffer Dall <c.dall@virtualopensystems.com>
*/
#ifndef __ARM64_KVM_MMIO_H__
#define __ARM64_KVM_MMIO_H__
#include <linux/kvm_host.h>
#include <asm/kvm_arm.h>
/*
* This is annoying. The mmio code requires this, even if we don't
* need any decoding. To be fixed.
*/
struct kvm_decode {
unsigned long rt;
bool sign_extend;
};
void kvm_mmio_write_buf(void *buf, unsigned int len, unsigned long data);
unsigned long kvm_mmio_read_buf(const void *buf, unsigned int len);
int kvm_handle_mmio_return(struct kvm_vcpu *vcpu, struct kvm_run *run);
int io_mem_abort(struct kvm_vcpu *vcpu, struct kvm_run *run,
phys_addr_t fault_ipa);
#endif /* __ARM64_KVM_MMIO_H__ */


@ -62,6 +62,7 @@
#define PSR_AA32_I_BIT 0x00000080
#define PSR_AA32_A_BIT 0x00000100
#define PSR_AA32_E_BIT 0x00000200
#define PSR_AA32_PAN_BIT 0x00400000
#define PSR_AA32_SSBS_BIT 0x00800000
#define PSR_AA32_DIT_BIT 0x01000000
#define PSR_AA32_Q_BIT 0x08000000


@ -220,10 +220,18 @@ struct kvm_vcpu_events {
#define KVM_REG_ARM_PTIMER_CVAL ARM64_SYS_REG(3, 3, 14, 2, 2)
#define KVM_REG_ARM_PTIMER_CNT ARM64_SYS_REG(3, 3, 14, 0, 1)
/* EL0 Virtual Timer Registers */
/*
* EL0 Virtual Timer Registers
*
* WARNING:
* KVM_REG_ARM_TIMER_CVAL and KVM_REG_ARM_TIMER_CNT are not defined
* with the appropriate register encodings. Their values have been
* accidentally swapped. As this is set API, the definitions here
* must be used, rather than ones derived from the encodings.
*/
#define KVM_REG_ARM_TIMER_CTL ARM64_SYS_REG(3, 3, 14, 3, 1)
#define KVM_REG_ARM_TIMER_CNT ARM64_SYS_REG(3, 3, 14, 3, 2)
#define KVM_REG_ARM_TIMER_CVAL ARM64_SYS_REG(3, 3, 14, 0, 2)
#define KVM_REG_ARM_TIMER_CNT ARM64_SYS_REG(3, 3, 14, 3, 2)
/* KVM-as-firmware specific pseudo-registers */
#define KVM_REG_ARM_FW (0x0014 << KVM_REG_ARM_COPROC_SHIFT)


@ -49,6 +49,7 @@
#define PSR_SSBS_BIT 0x00001000
#define PSR_PAN_BIT 0x00400000
#define PSR_UAO_BIT 0x00800000
#define PSR_DIT_BIT 0x01000000
#define PSR_V_BIT 0x10000000
#define PSR_C_BIT 0x20000000
#define PSR_Z_BIT 0x40000000


@ -47,11 +47,6 @@ struct kvm_stats_debugfs_item debugfs_entries[] = {
{ NULL }
};
int kvm_arch_vcpu_setup(struct kvm_vcpu *vcpu)
{
return 0;
}
static bool core_reg_offset_is_vreg(u64 off)
{
return off >= KVM_REG_ARM_CORE_REG(fp_regs.vregs) &&


@ -51,7 +51,7 @@
* u64 __guest_enter(struct kvm_vcpu *vcpu,
* struct kvm_cpu_context *host_ctxt);
*/
ENTRY(__guest_enter)
SYM_FUNC_START(__guest_enter)
// x0: vcpu
// x1: host context
// x2-x17: clobbered by macros
@ -100,9 +100,8 @@ alternative_else_nop_endif
// Do not touch any register after this!
eret
sb
ENDPROC(__guest_enter)
ENTRY(__guest_exit)
SYM_INNER_LABEL(__guest_exit, SYM_L_GLOBAL)
// x0: return code
// x1: vcpu
// x2-x29,lr: vcpu regs
@ -195,4 +194,4 @@ abort_guest_exit_end:
msr spsr_el2, x4
orr x0, x0, x5
1: ret
ENDPROC(__guest_exit)
SYM_FUNC_END(__guest_enter)


@ -14,9 +14,6 @@
#include <asm/kvm_emulate.h>
#include <asm/esr.h>
#define PSTATE_FAULT_BITS_64 (PSR_MODE_EL1h | PSR_A_BIT | PSR_F_BIT | \
PSR_I_BIT | PSR_D_BIT)
#define CURRENT_EL_SP_EL0_VECTOR 0x0
#define CURRENT_EL_SP_ELx_VECTOR 0x200
#define LOWER_EL_AArch64_VECTOR 0x400
@ -50,6 +47,69 @@ static u64 get_except_vector(struct kvm_vcpu *vcpu, enum exception_type type)
return vcpu_read_sys_reg(vcpu, VBAR_EL1) + exc_offset + type;
}
/*
* When an exception is taken, most PSTATE fields are left unchanged in the
* handler. However, some are explicitly overridden (e.g. M[4:0]). Luckily all
* of the inherited bits have the same position in the AArch64/AArch32 SPSR_ELx
* layouts, so we don't need to shuffle these for exceptions from AArch32 EL0.
*
* For the SPSR_ELx layout for AArch64, see ARM DDI 0487E.a page C5-429.
* For the SPSR_ELx layout for AArch32, see ARM DDI 0487E.a page C5-426.
*
* Here we manipulate the fields in order of the AArch64 SPSR_ELx layout, from
* MSB to LSB.
*/
static unsigned long get_except64_pstate(struct kvm_vcpu *vcpu)
{
unsigned long sctlr = vcpu_read_sys_reg(vcpu, SCTLR_EL1);
unsigned long old, new;
old = *vcpu_cpsr(vcpu);
new = 0;
new |= (old & PSR_N_BIT);
new |= (old & PSR_Z_BIT);
new |= (old & PSR_C_BIT);
new |= (old & PSR_V_BIT);
// TODO: TCO (if/when ARMv8.5-MemTag is exposed to guests)
new |= (old & PSR_DIT_BIT);
// PSTATE.UAO is set to zero upon any exception to AArch64
// See ARM DDI 0487E.a, page D5-2579.
// PSTATE.PAN is unchanged unless SCTLR_ELx.SPAN == 0b0
// SCTLR_ELx.SPAN is RES1 when ARMv8.1-PAN is not implemented
// See ARM DDI 0487E.a, page D5-2578.
new |= (old & PSR_PAN_BIT);
if (!(sctlr & SCTLR_EL1_SPAN))
new |= PSR_PAN_BIT;
// PSTATE.SS is set to zero upon any exception to AArch64
// See ARM DDI 0487E.a, page D2-2452.
// PSTATE.IL is set to zero upon any exception to AArch64
// See ARM DDI 0487E.a, page D1-2306.
// PSTATE.SSBS is set to SCTLR_ELx.DSSBS upon any exception to AArch64
// See ARM DDI 0487E.a, page D13-3258
if (sctlr & SCTLR_ELx_DSSBS)
new |= PSR_SSBS_BIT;
// PSTATE.BTYPE is set to zero upon any exception to AArch64
// See ARM DDI 0487E.a, pages D1-2293 to D1-2294.
new |= PSR_D_BIT;
new |= PSR_A_BIT;
new |= PSR_I_BIT;
new |= PSR_F_BIT;
new |= PSR_MODE_EL1h;
return new;
}
static void inject_abt64(struct kvm_vcpu *vcpu, bool is_iabt, unsigned long addr)
{
unsigned long cpsr = *vcpu_cpsr(vcpu);
@ -59,7 +119,7 @@ static void inject_abt64(struct kvm_vcpu *vcpu, bool is_iabt, unsigned long addr
vcpu_write_elr_el1(vcpu, *vcpu_pc(vcpu));
*vcpu_pc(vcpu) = get_except_vector(vcpu, except_type_sync);
*vcpu_cpsr(vcpu) = PSTATE_FAULT_BITS_64;
*vcpu_cpsr(vcpu) = get_except64_pstate(vcpu);
vcpu_write_spsr(vcpu, cpsr);
vcpu_write_sys_reg(vcpu, addr, FAR_EL1);
@ -94,7 +154,7 @@ static void inject_undef64(struct kvm_vcpu *vcpu)
vcpu_write_elr_el1(vcpu, *vcpu_pc(vcpu));
*vcpu_pc(vcpu) = get_except_vector(vcpu, except_type_sync);
*vcpu_cpsr(vcpu) = PSTATE_FAULT_BITS_64;
*vcpu_cpsr(vcpu) = get_except64_pstate(vcpu);
vcpu_write_spsr(vcpu, cpsr);
/*


@ -204,7 +204,7 @@ bool kvm_arm_vcpu_is_finalized(struct kvm_vcpu *vcpu)
return true;
}
void kvm_arch_vcpu_uninit(struct kvm_vcpu *vcpu)
void kvm_arm_vcpu_destroy(struct kvm_vcpu *vcpu)
{
kfree(vcpu->arch.sve_state);
}


@ -13,52 +13,46 @@
#include <asm/kvm_mmu.h>
/*
* The LSB of the random hyp VA tag or 0 if no randomization is used.
* The LSB of the HYP VA tag
*/
static u8 tag_lsb;
/*
* The random hyp VA tag value with the region bit if hyp randomization is used
* The HYP VA tag value with the region bit
*/
static u64 tag_val;
static u64 va_mask;
/*
* We want to generate a hyp VA with the following format (with V ==
* vabits_actual):
*
* 63 ... V | V-1 | V-2 .. tag_lsb | tag_lsb - 1 .. 0
* ---------------------------------------------------------
* | 0000000 | hyp_va_msb | random tag | kern linear VA |
* |--------- tag_val -----------|----- va_mask ---|
*
* which does not conflict with the idmap regions.
*/
__init void kvm_compute_layout(void)
{
phys_addr_t idmap_addr = __pa_symbol(__hyp_idmap_text_start);
u64 hyp_va_msb;
int kva_msb;
/* Where is my RAM region? */
hyp_va_msb = idmap_addr & BIT(vabits_actual - 1);
hyp_va_msb ^= BIT(vabits_actual - 1);
kva_msb = fls64((u64)phys_to_virt(memblock_start_of_DRAM()) ^
tag_lsb = fls64((u64)phys_to_virt(memblock_start_of_DRAM()) ^
(u64)(high_memory - 1));
if (kva_msb == (vabits_actual - 1)) {
/*
* No space in the address, let's compute the mask so
* that it covers (vabits_actual - 1) bits, and the region
* bit. The tag stays set to zero.
*/
va_mask = BIT(vabits_actual - 1) - 1;
va_mask |= hyp_va_msb;
} else {
/*
* We do have some free bits to insert a random tag.
* Hyp VAs are now created from kernel linear map VAs
* using the following formula (with V == vabits_actual):
*
* 63 ... V | V-1 | V-2 .. tag_lsb | tag_lsb - 1 .. 0
* ---------------------------------------------------------
* | 0000000 | hyp_va_msb | random tag | kern linear VA |
*/
tag_lsb = kva_msb;
va_mask = GENMASK_ULL(tag_lsb - 1, 0);
tag_val = get_random_long() & GENMASK_ULL(vabits_actual - 2, tag_lsb);
tag_val |= hyp_va_msb;
tag_val >>= tag_lsb;
va_mask = GENMASK_ULL(tag_lsb - 1, 0);
tag_val = hyp_va_msb;
if (tag_lsb != (vabits_actual - 1)) {
/* We have some free bits to insert a random tag. */
tag_val |= get_random_long() & GENMASK_ULL(vabits_actual - 2, tag_lsb);
}
tag_val >>= tag_lsb;
}
static u32 compute_instruction(int n, u32 rd, u32 rn)
@ -117,11 +111,11 @@ void __init kvm_update_va_mask(struct alt_instr *alt,
* VHE doesn't need any address translation, let's NOP
* everything.
*
* Alternatively, if we don't have any spare bits in
* the address, NOP everything after masking that
* kernel VA.
* Alternatively, if the tag is zero (because the layout
* dictates it and we don't have any spare bits in the
* address), NOP everything after masking the kernel VA.
*/
if (has_vhe() || (!tag_lsb && i > 0)) {
if (has_vhe() || (!tag_val && i > 0)) {
updptr[i] = cpu_to_le32(aarch64_insn_gen_nop());
continue;
}


@ -156,7 +156,7 @@ void kvm_mips_free_vcpus(struct kvm *kvm)
struct kvm_vcpu *vcpu;
kvm_for_each_vcpu(i, vcpu, kvm) {
kvm_arch_vcpu_free(vcpu);
kvm_vcpu_destroy(vcpu);
}
mutex_lock(&kvm->lock);
@ -280,25 +280,27 @@ static inline void dump_handler(const char *symbol, void *start, void *end)
pr_debug("\tEND(%s)\n", symbol);
}
struct kvm_vcpu *kvm_arch_vcpu_create(struct kvm *kvm, unsigned int id)
int kvm_arch_vcpu_precreate(struct kvm *kvm, unsigned int id)
{
return 0;
}
int kvm_arch_vcpu_create(struct kvm_vcpu *vcpu)
{
int err, size;
void *gebase, *p, *handler, *refill_start, *refill_end;
int i;
struct kvm_vcpu *vcpu = kzalloc(sizeof(struct kvm_vcpu), GFP_KERNEL);
if (!vcpu) {
err = -ENOMEM;
goto out;
}
err = kvm_vcpu_init(vcpu, kvm, id);
kvm_debug("kvm @ %p: create cpu %d at %p\n",
vcpu->kvm, vcpu->vcpu_id, vcpu);
err = kvm_mips_callbacks->vcpu_init(vcpu);
if (err)
goto out_free_cpu;
return err;
kvm_debug("kvm @ %p: create cpu %d at %p\n", kvm, id, vcpu);
hrtimer_init(&vcpu->arch.comparecount_timer, CLOCK_MONOTONIC,
HRTIMER_MODE_REL);
vcpu->arch.comparecount_timer.function = kvm_mips_comparecount_wakeup;
/*
* Allocate space for host mode exception handlers that handle
@ -313,7 +315,7 @@ struct kvm_vcpu *kvm_arch_vcpu_create(struct kvm *kvm, unsigned int id)
if (!gebase) {
err = -ENOMEM;
goto out_uninit_cpu;
goto out_uninit_vcpu;
}
kvm_debug("Allocated %d bytes for KVM Exception Handlers @ %p\n",
ALIGN(size, PAGE_SIZE), gebase);
@ -392,38 +394,33 @@ struct kvm_vcpu *kvm_arch_vcpu_create(struct kvm *kvm, unsigned int id)
vcpu->arch.last_sched_cpu = -1;
vcpu->arch.last_exec_cpu = -1;
return vcpu;
/* Initial guest state */
err = kvm_mips_callbacks->vcpu_setup(vcpu);
if (err)
goto out_free_commpage;
return 0;
out_free_commpage:
kfree(vcpu->arch.kseg0_commpage);
out_free_gebase:
kfree(gebase);
out_uninit_cpu:
kvm_vcpu_uninit(vcpu);
out_free_cpu:
kfree(vcpu);
out:
return ERR_PTR(err);
out_uninit_vcpu:
kvm_mips_callbacks->vcpu_uninit(vcpu);
return err;
}
void kvm_arch_vcpu_free(struct kvm_vcpu *vcpu)
void kvm_arch_vcpu_destroy(struct kvm_vcpu *vcpu)
{
hrtimer_cancel(&vcpu->arch.comparecount_timer);
kvm_vcpu_uninit(vcpu);
kvm_mips_dump_stats(vcpu);
kvm_mmu_free_memory_caches(vcpu);
kfree(vcpu->arch.guest_ebase);
kfree(vcpu->arch.kseg0_commpage);
kfree(vcpu);
}
void kvm_arch_vcpu_destroy(struct kvm_vcpu *vcpu)
{
kvm_arch_vcpu_free(vcpu);
kvm_mips_callbacks->vcpu_uninit(vcpu);
}
int kvm_arch_vcpu_ioctl_set_guest_debug(struct kvm_vcpu *vcpu,
@ -1233,37 +1230,12 @@ static enum hrtimer_restart kvm_mips_comparecount_wakeup(struct hrtimer *timer)
return kvm_mips_count_timeout(vcpu);
}
int kvm_arch_vcpu_init(struct kvm_vcpu *vcpu)
{
int err;
err = kvm_mips_callbacks->vcpu_init(vcpu);
if (err)
return err;
hrtimer_init(&vcpu->arch.comparecount_timer, CLOCK_MONOTONIC,
HRTIMER_MODE_REL);
vcpu->arch.comparecount_timer.function = kvm_mips_comparecount_wakeup;
return 0;
}
void kvm_arch_vcpu_uninit(struct kvm_vcpu *vcpu)
{
kvm_mips_callbacks->vcpu_uninit(vcpu);
}
int kvm_arch_vcpu_ioctl_translate(struct kvm_vcpu *vcpu,
struct kvm_translation *tr)
{
return 0;
}
/* Initial guest state */
int kvm_arch_vcpu_setup(struct kvm_vcpu *vcpu)
{
return kvm_mips_callbacks->vcpu_setup(vcpu);
}
static void kvm_mips_set_c0_status(void)
{
u32 status = read_c0_status();


@ -350,6 +350,7 @@
#define H_SVM_PAGE_OUT 0xEF04
#define H_SVM_INIT_START 0xEF08
#define H_SVM_INIT_DONE 0xEF0C
#define H_SVM_INIT_ABORT 0xEF14
/* Values for 2nd argument to H_SET_MODE */
#define H_SET_MODE_RESOURCE_SET_CIABR 1


@ -19,8 +19,9 @@ unsigned long kvmppc_h_svm_page_out(struct kvm *kvm,
unsigned long kvmppc_h_svm_init_start(struct kvm *kvm);
unsigned long kvmppc_h_svm_init_done(struct kvm *kvm);
int kvmppc_send_page_to_uv(struct kvm *kvm, unsigned long gfn);
unsigned long kvmppc_h_svm_init_abort(struct kvm *kvm);
void kvmppc_uvmem_drop_pages(const struct kvm_memory_slot *free,
struct kvm *kvm);
struct kvm *kvm, bool skip_page_out);
#else
static inline int kvmppc_uvmem_init(void)
{
@ -62,6 +63,11 @@ static inline unsigned long kvmppc_h_svm_init_done(struct kvm *kvm)
return H_UNSUPPORTED;
}
static inline unsigned long kvmppc_h_svm_init_abort(struct kvm *kvm)
{
return H_UNSUPPORTED;
}
static inline int kvmppc_send_page_to_uv(struct kvm *kvm, unsigned long gfn)
{
return -EFAULT;
@ -69,6 +75,6 @@ static inline int kvmppc_send_page_to_uv(struct kvm *kvm, unsigned long gfn)
static inline void
kvmppc_uvmem_drop_pages(const struct kvm_memory_slot *free,
struct kvm *kvm) { }
struct kvm *kvm, bool skip_page_out) { }
#endif /* CONFIG_PPC_UV */
#endif /* __ASM_KVM_BOOK3S_UVMEM_H__ */


@ -278,6 +278,7 @@ struct kvm_resize_hpt;
/* Flag values for kvm_arch.secure_guest */
#define KVMPPC_SECURE_INIT_START 0x1 /* H_SVM_INIT_START has been called */
#define KVMPPC_SECURE_INIT_DONE 0x2 /* H_SVM_INIT_DONE completed */
#define KVMPPC_SECURE_INIT_ABORT 0x4 /* H_SVM_INIT_ABORT issued */
struct kvm_arch {
unsigned int lpid;


@ -119,8 +119,7 @@ extern int kvmppc_xlate(struct kvm_vcpu *vcpu, ulong eaddr,
enum xlate_instdata xlid, enum xlate_readwrite xlrw,
struct kvmppc_pte *pte);
extern struct kvm_vcpu *kvmppc_core_vcpu_create(struct kvm *kvm,
unsigned int id);
extern int kvmppc_core_vcpu_create(struct kvm_vcpu *vcpu);
extern void kvmppc_core_vcpu_free(struct kvm_vcpu *vcpu);
extern int kvmppc_core_vcpu_setup(struct kvm_vcpu *vcpu);
extern int kvmppc_core_check_processor_compat(void);
@ -274,7 +273,7 @@ struct kvmppc_ops {
void (*inject_interrupt)(struct kvm_vcpu *vcpu, int vec, u64 srr1_flags);
void (*set_msr)(struct kvm_vcpu *vcpu, u64 msr);
int (*vcpu_run)(struct kvm_run *run, struct kvm_vcpu *vcpu);
struct kvm_vcpu *(*vcpu_create)(struct kvm *kvm, unsigned int id);
int (*vcpu_create)(struct kvm_vcpu *vcpu);
void (*vcpu_free)(struct kvm_vcpu *vcpu);
int (*check_requests)(struct kvm_vcpu *vcpu);
int (*get_dirty_log)(struct kvm *kvm, struct kvm_dirty_log *log);


@ -471,11 +471,6 @@ int kvmppc_load_last_inst(struct kvm_vcpu *vcpu,
}
EXPORT_SYMBOL_GPL(kvmppc_load_last_inst);
int kvm_arch_vcpu_setup(struct kvm_vcpu *vcpu)
{
return 0;
}
int kvmppc_subarch_vcpu_init(struct kvm_vcpu *vcpu)
{
return 0;
@ -789,9 +784,9 @@ void kvmppc_decrementer_func(struct kvm_vcpu *vcpu)
kvm_vcpu_kick(vcpu);
}
struct kvm_vcpu *kvmppc_core_vcpu_create(struct kvm *kvm, unsigned int id)
int kvmppc_core_vcpu_create(struct kvm_vcpu *vcpu)
{
return kvm->arch.kvm_ops->vcpu_create(kvm, id);
return vcpu->kvm->arch.kvm_ops->vcpu_create(vcpu);
}
void kvmppc_core_vcpu_free(struct kvm_vcpu *vcpu)


@ -284,7 +284,7 @@ static long kvmppc_virtmode_do_h_enter(struct kvm *kvm, unsigned long flags,
/* Protect linux PTE lookup from page table destruction */
rcu_read_lock_sched(); /* this disables preemption too */
ret = kvmppc_do_h_enter(kvm, flags, pte_index, pteh, ptel,
current->mm->pgd, false, pte_idx_ret);
kvm->mm->pgd, false, pte_idx_ret);
rcu_read_unlock_sched();
if (ret == H_TOO_HARD) {
/* this can't happen */
@ -573,7 +573,7 @@ int kvmppc_book3s_hv_page_fault(struct kvm_run *run, struct kvm_vcpu *vcpu,
is_ci = false;
pfn = 0;
page = NULL;
mm = current->mm;
mm = kvm->mm;
pte_size = PAGE_SIZE;
writing = (dsisr & DSISR_ISSTORE) != 0;
/* If writing != 0, then the HPTE must allow writing, if we get here */


@ -1102,7 +1102,7 @@ void kvmppc_radix_flush_memslot(struct kvm *kvm,
unsigned int shift;
if (kvm->arch.secure_guest & KVMPPC_SECURE_INIT_START)
kvmppc_uvmem_drop_pages(memslot, kvm);
kvmppc_uvmem_drop_pages(memslot, kvm, true);
if (kvm->arch.secure_guest & KVMPPC_SECURE_INIT_DONE)
return;


@ -253,10 +253,11 @@ static int kvm_spapr_tce_release(struct inode *inode, struct file *filp)
}
}
account_locked_vm(kvm->mm,
kvmppc_stt_pages(kvmppc_tce_pages(stt->size)), false);
kvm_put_kvm(stt->kvm);
account_locked_vm(current->mm,
kvmppc_stt_pages(kvmppc_tce_pages(stt->size)), false);
call_rcu(&stt->rcu, release_spapr_tce_table);
return 0;
@ -272,6 +273,7 @@ long kvm_vm_ioctl_create_spapr_tce(struct kvm *kvm,
{
struct kvmppc_spapr_tce_table *stt = NULL;
struct kvmppc_spapr_tce_table *siter;
struct mm_struct *mm = kvm->mm;
unsigned long npages, size = args->size;
int ret = -ENOMEM;
@ -280,7 +282,7 @@ long kvm_vm_ioctl_create_spapr_tce(struct kvm *kvm,
return -EINVAL;
npages = kvmppc_tce_pages(size);
ret = account_locked_vm(current->mm, kvmppc_stt_pages(npages), true);
ret = account_locked_vm(mm, kvmppc_stt_pages(npages), true);
if (ret)
return ret;
@ -326,7 +328,7 @@ long kvm_vm_ioctl_create_spapr_tce(struct kvm *kvm,
kfree(stt);
fail_acct:
account_locked_vm(current->mm, kvmppc_stt_pages(npages), false);
account_locked_vm(mm, kvmppc_stt_pages(npages), false);
return ret;
}


@ -1091,6 +1091,9 @@ int kvmppc_pseries_do_hcall(struct kvm_vcpu *vcpu)
case H_SVM_INIT_DONE:
ret = kvmppc_h_svm_init_done(vcpu->kvm);
break;
case H_SVM_INIT_ABORT:
ret = kvmppc_h_svm_init_abort(vcpu->kvm);
break;
default:
return RESUME_HOST;
@ -2271,22 +2274,16 @@ static void debugfs_vcpu_init(struct kvm_vcpu *vcpu, unsigned int id)
}
#endif /* CONFIG_KVM_BOOK3S_HV_EXIT_TIMING */
static struct kvm_vcpu *kvmppc_core_vcpu_create_hv(struct kvm *kvm,
unsigned int id)
static int kvmppc_core_vcpu_create_hv(struct kvm_vcpu *vcpu)
{
struct kvm_vcpu *vcpu;
int err;
int core;
struct kvmppc_vcore *vcore;
struct kvm *kvm;
unsigned int id;
err = -ENOMEM;
vcpu = kmem_cache_zalloc(kvm_vcpu_cache, GFP_KERNEL);
if (!vcpu)
goto out;
err = kvm_vcpu_init(vcpu, kvm, id);
if (err)
goto free_vcpu;
kvm = vcpu->kvm;
id = vcpu->vcpu_id;
vcpu->arch.shared = &vcpu->arch.shregs;
#ifdef CONFIG_KVM_BOOK3S_PR_POSSIBLE
@ -2368,7 +2365,7 @@ static struct kvm_vcpu *kvmppc_core_vcpu_create_hv(struct kvm *kvm,
mutex_unlock(&kvm->lock);
if (!vcore)
goto free_vcpu;
return err;
spin_lock(&vcore->lock);
++vcore->num_threads;
@ -2383,12 +2380,7 @@ static struct kvm_vcpu *kvmppc_core_vcpu_create_hv(struct kvm *kvm,
debugfs_vcpu_init(vcpu, id);
return vcpu;
free_vcpu:
kmem_cache_free(kvm_vcpu_cache, vcpu);
out:
return ERR_PTR(err);
return 0;
}
static int kvmhv_set_smt_mode(struct kvm *kvm, unsigned long smt_mode,
@ -2442,8 +2434,6 @@ static void kvmppc_core_vcpu_free_hv(struct kvm_vcpu *vcpu)
unpin_vpa(vcpu->kvm, &vcpu->arch.slb_shadow);
unpin_vpa(vcpu->kvm, &vcpu->arch.vpa);
spin_unlock(&vcpu->arch.vpa_update_lock);
kvm_vcpu_uninit(vcpu);
kmem_cache_free(kvm_vcpu_cache, vcpu);
}
static int kvmppc_core_check_requests_hv(struct kvm_vcpu *vcpu)
@ -4285,7 +4275,7 @@ static int kvmppc_vcpu_run_hv(struct kvm_run *run, struct kvm_vcpu *vcpu)
user_vrsave = mfspr(SPRN_VRSAVE);
vcpu->arch.wqp = &vcpu->arch.vcore->wq;
vcpu->arch.pgdir = current->mm->pgd;
vcpu->arch.pgdir = kvm->mm->pgd;
vcpu->arch.state = KVMPPC_VCPU_BUSY_IN_HOST;
do {
@ -4640,14 +4630,14 @@ static int kvmppc_hv_setup_htab_rma(struct kvm_vcpu *vcpu)
/* Look up the VMA for the start of this memory slot */
hva = memslot->userspace_addr;
down_read(&current->mm->mmap_sem);
vma = find_vma(current->mm, hva);
down_read(&kvm->mm->mmap_sem);
vma = find_vma(kvm->mm, hva);
if (!vma || vma->vm_start > hva || (vma->vm_flags & VM_IO))
goto up_out;
psize = vma_kernel_pagesize(vma);
up_read(&current->mm->mmap_sem);
up_read(&kvm->mm->mmap_sem);
/* We can handle 4k, 64k or 16M pages in the VRMA */
if (psize >= 0x1000000)
@ -4680,7 +4670,7 @@ static int kvmppc_hv_setup_htab_rma(struct kvm_vcpu *vcpu)
return err;
up_out:
up_read(&current->mm->mmap_sem);
up_read(&kvm->mm->mmap_sem);
goto out_srcu;
}
@ -5477,7 +5467,7 @@ static int kvmhv_svm_off(struct kvm *kvm)
continue;
kvm_for_each_memslot(memslot, slots) {
kvmppc_uvmem_drop_pages(memslot, kvm);
kvmppc_uvmem_drop_pages(memslot, kvm, true);
uv_unregister_mem_slot(kvm->arch.lpid, memslot->id);
}
}


@ -258,7 +258,7 @@ unsigned long kvmppc_h_svm_init_done(struct kvm *kvm)
* QEMU page table with normal PTEs from newly allocated pages.
*/
void kvmppc_uvmem_drop_pages(const struct kvm_memory_slot *free,
struct kvm *kvm)
struct kvm *kvm, bool skip_page_out)
{
int i;
struct kvmppc_uvmem_page_pvt *pvt;
@ -276,7 +276,7 @@ void kvmppc_uvmem_drop_pages(const struct kvm_memory_slot *free,
uvmem_page = pfn_to_page(uvmem_pfn);
pvt = uvmem_page->zone_device_data;
pvt->skip_page_out = true;
pvt->skip_page_out = skip_page_out;
mutex_unlock(&kvm->arch.uvmem_lock);
pfn = gfn_to_pfn(kvm, gfn);
@ -286,6 +286,34 @@ void kvmppc_uvmem_drop_pages(const struct kvm_memory_slot *free,
}
}
unsigned long kvmppc_h_svm_init_abort(struct kvm *kvm)
{
int srcu_idx;
struct kvm_memory_slot *memslot;
/*
* Expect to be called only after INIT_START and before INIT_DONE.
* If INIT_DONE was completed, use normal VM termination sequence.
*/
if (!(kvm->arch.secure_guest & KVMPPC_SECURE_INIT_START))
return H_UNSUPPORTED;
if (kvm->arch.secure_guest & KVMPPC_SECURE_INIT_DONE)
return H_STATE;
srcu_idx = srcu_read_lock(&kvm->srcu);
kvm_for_each_memslot(memslot, kvm_memslots(kvm))
kvmppc_uvmem_drop_pages(memslot, kvm, false);
srcu_read_unlock(&kvm->srcu, srcu_idx);
kvm->arch.secure_guest = 0;
uv_svm_terminate(kvm->arch.lpid);
return H_PARAMETER;
}
/*
* Get a free device PFN from the pool
*
@ -543,7 +571,7 @@ kvmppc_svm_page_out(struct vm_area_struct *vma, unsigned long start,
ret = migrate_vma_setup(&mig);
if (ret)
return ret;
goto out;
spage = migrate_pfn_to_page(*mig.src);
if (!spage || !(*mig.src & MIGRATE_PFN_MIGRATE))


@ -1744,21 +1744,17 @@ static int kvmppc_set_one_reg_pr(struct kvm_vcpu *vcpu, u64 id,
return r;
}
static struct kvm_vcpu *kvmppc_core_vcpu_create_pr(struct kvm *kvm,
unsigned int id)
static int kvmppc_core_vcpu_create_pr(struct kvm_vcpu *vcpu)
{
struct kvmppc_vcpu_book3s *vcpu_book3s;
struct kvm_vcpu *vcpu;
int err = -ENOMEM;
unsigned long p;
int err;
vcpu = kmem_cache_zalloc(kvm_vcpu_cache, GFP_KERNEL);
if (!vcpu)
goto out;
err = -ENOMEM;
vcpu_book3s = vzalloc(sizeof(struct kvmppc_vcpu_book3s));
if (!vcpu_book3s)
goto free_vcpu;
goto out;
vcpu->arch.book3s = vcpu_book3s;
#ifdef CONFIG_KVM_BOOK3S_32_HANDLER
@ -1768,14 +1764,9 @@ static struct kvm_vcpu *kvmppc_core_vcpu_create_pr(struct kvm *kvm,
goto free_vcpu3s;
#endif
err = kvm_vcpu_init(vcpu, kvm, id);
if (err)
goto free_shadow_vcpu;
err = -ENOMEM;
p = __get_free_page(GFP_KERNEL|__GFP_ZERO);
if (!p)
goto uninit_vcpu;
goto free_shadow_vcpu;
vcpu->arch.shared = (void *)p;
#ifdef CONFIG_PPC_BOOK3S_64
/* Always start the shared struct in native endian mode */
@ -1806,22 +1797,20 @@ static struct kvm_vcpu *kvmppc_core_vcpu_create_pr(struct kvm *kvm,
err = kvmppc_mmu_init(vcpu);
if (err < 0)
goto uninit_vcpu;
goto free_shared_page;
return vcpu;
return 0;
uninit_vcpu:
kvm_vcpu_uninit(vcpu);
free_shared_page:
free_page((unsigned long)vcpu->arch.shared);
free_shadow_vcpu:
#ifdef CONFIG_KVM_BOOK3S_32_HANDLER
kfree(vcpu->arch.shadow_vcpu);
free_vcpu3s:
#endif
vfree(vcpu_book3s);
free_vcpu:
kmem_cache_free(kvm_vcpu_cache, vcpu);
out:
return ERR_PTR(err);
return err;
}
static void kvmppc_core_vcpu_free_pr(struct kvm_vcpu *vcpu)
@ -1829,12 +1818,10 @@ static void kvmppc_core_vcpu_free_pr(struct kvm_vcpu *vcpu)
struct kvmppc_vcpu_book3s *vcpu_book3s = to_book3s(vcpu);
free_page((unsigned long)vcpu->arch.shared & PAGE_MASK);
kvm_vcpu_uninit(vcpu);
#ifdef CONFIG_KVM_BOOK3S_32_HANDLER
kfree(vcpu->arch.shadow_vcpu);
#endif
vfree(vcpu_book3s);
kmem_cache_free(kvm_vcpu_cache, vcpu);
}
static int kvmppc_vcpu_run_pr(struct kvm_run *kvm_run, struct kvm_vcpu *vcpu)
@ -2030,6 +2017,7 @@ static int kvm_vm_ioctl_get_smmu_info_pr(struct kvm *kvm,
{
/* We should not get called */
BUG();
return 0;
}
#endif /* CONFIG_PPC64 */


@ -631,7 +631,7 @@ static int kvmppc_xive_native_set_queue_config(struct kvmppc_xive *xive,
srcu_idx = srcu_read_lock(&kvm->srcu);
gfn = gpa_to_gfn(kvm_eq.qaddr);
page_size = kvm_host_page_size(kvm, gfn);
page_size = kvm_host_page_size(vcpu, gfn);
if (1ull << kvm_eq.qshift > page_size) {
srcu_read_unlock(&kvm->srcu, srcu_idx);
pr_warn("Incompatible host page size %lx!\n", page_size);


@ -775,7 +775,7 @@ int kvmppc_vcpu_run(struct kvm_run *kvm_run, struct kvm_vcpu *vcpu)
debug = current->thread.debug;
current->thread.debug = vcpu->arch.dbg_reg;
vcpu->arch.pgdir = current->mm->pgd;
vcpu->arch.pgdir = vcpu->kvm->mm->pgd;
kvmppc_fix_ee_before_entry();
ret = __kvmppc_vcpu_run(kvm_run, vcpu);
@ -1377,36 +1377,6 @@ static void kvmppc_set_tsr(struct kvm_vcpu *vcpu, u32 new_tsr)
update_timer_ints(vcpu);
}
/* Initial guest state: 16MB mapping 0 -> 0, PC = 0, MSR = 0, R1 = 16MB */
int kvm_arch_vcpu_setup(struct kvm_vcpu *vcpu)
{
int i;
int r;
vcpu->arch.regs.nip = 0;
vcpu->arch.shared->pir = vcpu->vcpu_id;
kvmppc_set_gpr(vcpu, 1, (16<<20) - 8); /* -8 for the callee-save LR slot */
kvmppc_set_msr(vcpu, 0);
#ifndef CONFIG_KVM_BOOKE_HV
vcpu->arch.shadow_msr = MSR_USER | MSR_IS | MSR_DS;
vcpu->arch.shadow_pid = 1;
vcpu->arch.shared->msr = 0;
#endif
/* Eye-catching numbers so we know if the guest takes an interrupt
* before it's programmed its own IVPR/IVORs. */
vcpu->arch.ivpr = 0x55550000;
for (i = 0; i < BOOKE_IRQPRIO_MAX; i++)
vcpu->arch.ivor[i] = 0x7700 | i * 4;
kvmppc_init_timing_stats(vcpu);
r = kvmppc_core_vcpu_setup(vcpu);
kvmppc_sanity_check(vcpu);
return r;
}
int kvmppc_subarch_vcpu_init(struct kvm_vcpu *vcpu)
{
/* setup watchdog timer once */
@ -2114,9 +2084,40 @@ int kvmppc_core_init_vm(struct kvm *kvm)
return kvm->arch.kvm_ops->init_vm(kvm);
}
struct kvm_vcpu *kvmppc_core_vcpu_create(struct kvm *kvm, unsigned int id)
int kvmppc_core_vcpu_create(struct kvm_vcpu *vcpu)
{
return kvm->arch.kvm_ops->vcpu_create(kvm, id);
int i;
int r;
r = vcpu->kvm->arch.kvm_ops->vcpu_create(vcpu);
if (r)
return r;
/* Initial guest state: 16MB mapping 0 -> 0, PC = 0, MSR = 0, R1 = 16MB */
vcpu->arch.regs.nip = 0;
vcpu->arch.shared->pir = vcpu->vcpu_id;
kvmppc_set_gpr(vcpu, 1, (16<<20) - 8); /* -8 for the callee-save LR slot */
kvmppc_set_msr(vcpu, 0);
#ifndef CONFIG_KVM_BOOKE_HV
vcpu->arch.shadow_msr = MSR_USER | MSR_IS | MSR_DS;
vcpu->arch.shadow_pid = 1;
vcpu->arch.shared->msr = 0;
#endif
/* Eye-catching numbers so we know if the guest takes an interrupt
* before it's programmed its own IVPR/IVORs. */
vcpu->arch.ivpr = 0x55550000;
for (i = 0; i < BOOKE_IRQPRIO_MAX; i++)
vcpu->arch.ivor[i] = 0x7700 | i * 4;
kvmppc_init_timing_stats(vcpu);
r = kvmppc_core_vcpu_setup(vcpu);
if (r)
vcpu->kvm->arch.kvm_ops->vcpu_free(vcpu);
kvmppc_sanity_check(vcpu);
return r;
}
void kvmppc_core_vcpu_free(struct kvm_vcpu *vcpu)


@ -433,31 +433,16 @@ static int kvmppc_set_one_reg_e500(struct kvm_vcpu *vcpu, u64 id,
return r;
}
static struct kvm_vcpu *kvmppc_core_vcpu_create_e500(struct kvm *kvm,
unsigned int id)
static int kvmppc_core_vcpu_create_e500(struct kvm_vcpu *vcpu)
{
struct kvmppc_vcpu_e500 *vcpu_e500;
struct kvm_vcpu *vcpu;
int err;
BUILD_BUG_ON_MSG(offsetof(struct kvmppc_vcpu_e500, vcpu) != 0,
"struct kvm_vcpu must be at offset 0 for arch usercopy region");
BUILD_BUG_ON(offsetof(struct kvmppc_vcpu_e500, vcpu) != 0);
vcpu_e500 = to_e500(vcpu);
vcpu_e500 = kmem_cache_zalloc(kvm_vcpu_cache, GFP_KERNEL);
if (!vcpu_e500) {
err = -ENOMEM;
goto out;
}
vcpu = &vcpu_e500->vcpu;
err = kvm_vcpu_init(vcpu, kvm, id);
if (err)
goto free_vcpu;
if (kvmppc_e500_id_table_alloc(vcpu_e500) == NULL) {
err = -ENOMEM;
goto uninit_vcpu;
}
if (kvmppc_e500_id_table_alloc(vcpu_e500) == NULL)
return -ENOMEM;
err = kvmppc_e500_tlb_init(vcpu_e500);
if (err)
@ -469,18 +454,13 @@ static struct kvm_vcpu *kvmppc_core_vcpu_create_e500(struct kvm *kvm,
goto uninit_tlb;
}
return vcpu;
return 0;
uninit_tlb:
kvmppc_e500_tlb_uninit(vcpu_e500);
uninit_id:
kvmppc_e500_id_table_free(vcpu_e500);
uninit_vcpu:
kvm_vcpu_uninit(vcpu);
free_vcpu:
kmem_cache_free(kvm_vcpu_cache, vcpu_e500);
out:
return ERR_PTR(err);
return err;
}
static void kvmppc_core_vcpu_free_e500(struct kvm_vcpu *vcpu)
@ -490,8 +470,6 @@ static void kvmppc_core_vcpu_free_e500(struct kvm_vcpu *vcpu)
free_page((unsigned long)vcpu->arch.shared);
kvmppc_e500_tlb_uninit(vcpu_e500);
kvmppc_e500_id_table_free(vcpu_e500);
kvm_vcpu_uninit(vcpu);
kmem_cache_free(kvm_vcpu_cache, vcpu_e500);
}
static int kvmppc_core_init_vm_e500(struct kvm *kvm)


@ -301,30 +301,20 @@ static int kvmppc_set_one_reg_e500mc(struct kvm_vcpu *vcpu, u64 id,
return r;
}
static struct kvm_vcpu *kvmppc_core_vcpu_create_e500mc(struct kvm *kvm,
unsigned int id)
static int kvmppc_core_vcpu_create_e500mc(struct kvm_vcpu *vcpu)
{
struct kvmppc_vcpu_e500 *vcpu_e500;
struct kvm_vcpu *vcpu;
int err;
vcpu_e500 = kmem_cache_zalloc(kvm_vcpu_cache, GFP_KERNEL);
if (!vcpu_e500) {
err = -ENOMEM;
goto out;
}
vcpu = &vcpu_e500->vcpu;
BUILD_BUG_ON(offsetof(struct kvmppc_vcpu_e500, vcpu) != 0);
vcpu_e500 = to_e500(vcpu);
/* Invalid PIR value -- this LPID dosn't have valid state on any cpu */
vcpu->arch.oldpir = 0xffffffff;
err = kvm_vcpu_init(vcpu, kvm, id);
if (err)
goto free_vcpu;
err = kvmppc_e500_tlb_init(vcpu_e500);
if (err)
goto uninit_vcpu;
return err;
vcpu->arch.shared = (void *)__get_free_page(GFP_KERNEL | __GFP_ZERO);
if (!vcpu->arch.shared) {
@ -332,17 +322,11 @@ static struct kvm_vcpu *kvmppc_core_vcpu_create_e500mc(struct kvm *kvm,
goto uninit_tlb;
}
return vcpu;
return 0;
uninit_tlb:
kvmppc_e500_tlb_uninit(vcpu_e500);
uninit_vcpu:
kvm_vcpu_uninit(vcpu);
free_vcpu:
kmem_cache_free(kvm_vcpu_cache, vcpu_e500);
out:
return ERR_PTR(err);
return err;
}
static void kvmppc_core_vcpu_free_e500mc(struct kvm_vcpu *vcpu)
@ -351,8 +335,6 @@ static void kvmppc_core_vcpu_free_e500mc(struct kvm_vcpu *vcpu)
free_page((unsigned long)vcpu->arch.shared);
kvmppc_e500_tlb_uninit(vcpu_e500);
kvm_vcpu_uninit(vcpu);
kmem_cache_free(kvm_vcpu_cache, vcpu_e500);
}
static int kvmppc_core_init_vm_e500mc(struct kvm *kvm)


@ -73,7 +73,6 @@ int kvmppc_emulate_loadstore(struct kvm_vcpu *vcpu)
{
struct kvm_run *run = vcpu->run;
u32 inst;
int ra, rs, rt;
enum emulation_result emulated = EMULATE_FAIL;
int advance = 1;
struct instruction_op op;
@ -85,10 +84,6 @@ int kvmppc_emulate_loadstore(struct kvm_vcpu *vcpu)
if (emulated != EMULATE_DONE)
return emulated;
ra = get_ra(inst);
rs = get_rs(inst);
rt = get_rt(inst);
vcpu->arch.mmio_vsx_copy_nums = 0;
vcpu->arch.mmio_vsx_offset = 0;
vcpu->arch.mmio_copy_type = KVMPPC_VSX_COPY_NONE;


@ -475,7 +475,7 @@ void kvm_arch_destroy_vm(struct kvm *kvm)
#endif
kvm_for_each_vcpu(i, vcpu, kvm)
kvm_arch_vcpu_free(vcpu);
kvm_vcpu_destroy(vcpu);
mutex_lock(&kvm->lock);
for (i = 0; i < atomic_read(&kvm->online_vcpus); i++)
@ -720,22 +720,55 @@ void kvm_arch_flush_shadow_memslot(struct kvm *kvm,
kvmppc_core_flush_memslot(kvm, slot);
}
struct kvm_vcpu *kvm_arch_vcpu_create(struct kvm *kvm, unsigned int id)
int kvm_arch_vcpu_precreate(struct kvm *kvm, unsigned int id)
{
return 0;
}
static enum hrtimer_restart kvmppc_decrementer_wakeup(struct hrtimer *timer)
{
struct kvm_vcpu *vcpu;
vcpu = kvmppc_core_vcpu_create(kvm, id);
if (!IS_ERR(vcpu)) {
vcpu->arch.wqp = &vcpu->wq;
kvmppc_create_vcpu_debugfs(vcpu, id);
}
return vcpu;
vcpu = container_of(timer, struct kvm_vcpu, arch.dec_timer);
kvmppc_decrementer_func(vcpu);
return HRTIMER_NORESTART;
}
int kvm_arch_vcpu_create(struct kvm_vcpu *vcpu)
{
int err;
hrtimer_init(&vcpu->arch.dec_timer, CLOCK_REALTIME, HRTIMER_MODE_ABS);
vcpu->arch.dec_timer.function = kvmppc_decrementer_wakeup;
vcpu->arch.dec_expires = get_tb();
#ifdef CONFIG_KVM_EXIT_TIMING
mutex_init(&vcpu->arch.exit_timing_lock);
#endif
err = kvmppc_subarch_vcpu_init(vcpu);
if (err)
return err;
err = kvmppc_core_vcpu_create(vcpu);
if (err)
goto out_vcpu_uninit;
vcpu->arch.wqp = &vcpu->wq;
kvmppc_create_vcpu_debugfs(vcpu, vcpu->vcpu_id);
return 0;
out_vcpu_uninit:
kvmppc_mmu_destroy(vcpu);
kvmppc_subarch_vcpu_uninit(vcpu);
return err;
}
void kvm_arch_vcpu_postcreate(struct kvm_vcpu *vcpu)
{
}
void kvm_arch_vcpu_free(struct kvm_vcpu *vcpu)
void kvm_arch_vcpu_destroy(struct kvm_vcpu *vcpu)
{
/* Make sure we're not using the vcpu anymore */
hrtimer_cancel(&vcpu->arch.dec_timer);
@ -758,11 +791,9 @@ void kvm_arch_vcpu_free(struct kvm_vcpu *vcpu)
}
kvmppc_core_vcpu_free(vcpu);
}
void kvm_arch_vcpu_destroy(struct kvm_vcpu *vcpu)
{
kvm_arch_vcpu_free(vcpu);
kvmppc_mmu_destroy(vcpu);
kvmppc_subarch_vcpu_uninit(vcpu);
}
int kvm_cpu_has_pending_timer(struct kvm_vcpu *vcpu)
@ -770,37 +801,6 @@ int kvm_cpu_has_pending_timer(struct kvm_vcpu *vcpu)
return kvmppc_core_pending_dec(vcpu);
}
static enum hrtimer_restart kvmppc_decrementer_wakeup(struct hrtimer *timer)
{
struct kvm_vcpu *vcpu;
vcpu = container_of(timer, struct kvm_vcpu, arch.dec_timer);
kvmppc_decrementer_func(vcpu);
return HRTIMER_NORESTART;
}
int kvm_arch_vcpu_init(struct kvm_vcpu *vcpu)
{
int ret;
hrtimer_init(&vcpu->arch.dec_timer, CLOCK_REALTIME, HRTIMER_MODE_ABS);
vcpu->arch.dec_timer.function = kvmppc_decrementer_wakeup;
vcpu->arch.dec_expires = get_tb();
#ifdef CONFIG_KVM_EXIT_TIMING
mutex_init(&vcpu->arch.exit_timing_lock);
#endif
ret = kvmppc_subarch_vcpu_init(vcpu);
return ret;
}
void kvm_arch_vcpu_uninit(struct kvm_vcpu *vcpu)
{
kvmppc_mmu_destroy(vcpu);
kvmppc_subarch_vcpu_uninit(vcpu);
}
void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
{
#ifdef CONFIG_BOOKE


@ -914,7 +914,6 @@ extern int kvm_s390_gisc_unregister(struct kvm *kvm, u32 gisc);
static inline void kvm_arch_hardware_disable(void) {}
static inline void kvm_arch_sync_events(struct kvm *kvm) {}
static inline void kvm_arch_vcpu_uninit(struct kvm_vcpu *vcpu) {}
static inline void kvm_arch_sched_in(struct kvm_vcpu *vcpu, int cpu) {}
static inline void kvm_arch_free_memslot(struct kvm *kvm,
struct kvm_memory_slot *free, struct kvm_memory_slot *dont) {}


@ -2530,9 +2530,6 @@ void kvm_arch_vcpu_destroy(struct kvm_vcpu *vcpu)
if (vcpu->kvm->arch.use_cmma)
kvm_s390_vcpu_unsetup_cmma(vcpu);
free_page((unsigned long)(vcpu->arch.sie_block));
kvm_vcpu_uninit(vcpu);
kmem_cache_free(kvm_vcpu_cache, vcpu);
}
static void kvm_free_vcpus(struct kvm *kvm)
@ -2541,7 +2538,7 @@ static void kvm_free_vcpus(struct kvm *kvm)
struct kvm_vcpu *vcpu;
kvm_for_each_vcpu(i, vcpu, kvm)
kvm_arch_vcpu_destroy(vcpu);
kvm_vcpu_destroy(vcpu);
mutex_lock(&kvm->lock);
for (i = 0; i < atomic_read(&kvm->online_vcpus); i++)
@ -2703,39 +2700,6 @@ static int sca_can_add_vcpu(struct kvm *kvm, unsigned int id)
return rc == 0 && id < KVM_S390_ESCA_CPU_SLOTS;
}
int kvm_arch_vcpu_init(struct kvm_vcpu *vcpu)
{
vcpu->arch.pfault_token = KVM_S390_PFAULT_TOKEN_INVALID;
kvm_clear_async_pf_completion_queue(vcpu);
vcpu->run->kvm_valid_regs = KVM_SYNC_PREFIX |
KVM_SYNC_GPRS |
KVM_SYNC_ACRS |
KVM_SYNC_CRS |
KVM_SYNC_ARCH0 |
KVM_SYNC_PFAULT;
kvm_s390_set_prefix(vcpu, 0);
if (test_kvm_facility(vcpu->kvm, 64))
vcpu->run->kvm_valid_regs |= KVM_SYNC_RICCB;
if (test_kvm_facility(vcpu->kvm, 82))
vcpu->run->kvm_valid_regs |= KVM_SYNC_BPBC;
if (test_kvm_facility(vcpu->kvm, 133))
vcpu->run->kvm_valid_regs |= KVM_SYNC_GSCB;
if (test_kvm_facility(vcpu->kvm, 156))
vcpu->run->kvm_valid_regs |= KVM_SYNC_ETOKEN;
/* fprs can be synchronized via vrs, even if the guest has no vx. With
* MACHINE_HAS_VX, (load|store)_fpu_regs() will work with vrs format.
*/
if (MACHINE_HAS_VX)
vcpu->run->kvm_valid_regs |= KVM_SYNC_VRS;
else
vcpu->run->kvm_valid_regs |= KVM_SYNC_FPRS;
if (kvm_is_ucontrol(vcpu->kvm))
return __kvm_ucontrol_vcpu_init(vcpu);
return 0;
}
/* needs disabled preemption to protect from TOD sync and vcpu_load/put */
static void __start_cpu_timer_accounting(struct kvm_vcpu *vcpu)
{
@ -2962,7 +2926,7 @@ static void kvm_s390_vcpu_setup_model(struct kvm_vcpu *vcpu)
vcpu->arch.sie_block->fac = (u32)(u64) model->fac_list;
}
int kvm_arch_vcpu_setup(struct kvm_vcpu *vcpu)
static int kvm_s390_vcpu_setup(struct kvm_vcpu *vcpu)
{
int rc = 0;
@ -3035,26 +2999,22 @@ int kvm_arch_vcpu_setup(struct kvm_vcpu *vcpu)
return rc;
}
struct kvm_vcpu *kvm_arch_vcpu_create(struct kvm *kvm,
unsigned int id)
int kvm_arch_vcpu_precreate(struct kvm *kvm, unsigned int id)
{
struct kvm_vcpu *vcpu;
struct sie_page *sie_page;
int rc = -EINVAL;
if (!kvm_is_ucontrol(kvm) && !sca_can_add_vcpu(kvm, id))
goto out;
return -EINVAL;
return 0;
}
rc = -ENOMEM;
vcpu = kmem_cache_zalloc(kvm_vcpu_cache, GFP_KERNEL);
if (!vcpu)
goto out;
int kvm_arch_vcpu_create(struct kvm_vcpu *vcpu)
{
struct sie_page *sie_page;
int rc;
BUILD_BUG_ON(sizeof(struct sie_page) != 4096);
sie_page = (struct sie_page *) get_zeroed_page(GFP_KERNEL);
if (!sie_page)
goto out_free_cpu;
return -ENOMEM;
vcpu->arch.sie_block = &sie_page->sie_block;
vcpu->arch.sie_block->itdba = (unsigned long) &sie_page->itdb;
@ -3063,27 +3023,59 @@ struct kvm_vcpu *kvm_arch_vcpu_create(struct kvm *kvm,
vcpu->arch.sie_block->mso = 0;
vcpu->arch.sie_block->msl = sclp.hamax;
vcpu->arch.sie_block->icpua = id;
vcpu->arch.sie_block->icpua = vcpu->vcpu_id;
spin_lock_init(&vcpu->arch.local_int.lock);
vcpu->arch.sie_block->gd = (u32)(u64)kvm->arch.gisa_int.origin;
vcpu->arch.sie_block->gd = (u32)(u64)vcpu->kvm->arch.gisa_int.origin;
if (vcpu->arch.sie_block->gd && sclp.has_gisaf)
vcpu->arch.sie_block->gd |= GISA_FORMAT1;
seqcount_init(&vcpu->arch.cputm_seqcount);
rc = kvm_vcpu_init(vcpu, kvm, id);
if (rc)
goto out_free_sie_block;
VM_EVENT(kvm, 3, "create cpu %d at 0x%pK, sie block at 0x%pK", id, vcpu,
vcpu->arch.sie_block);
trace_kvm_s390_create_vcpu(id, vcpu, vcpu->arch.sie_block);
vcpu->arch.pfault_token = KVM_S390_PFAULT_TOKEN_INVALID;
kvm_clear_async_pf_completion_queue(vcpu);
vcpu->run->kvm_valid_regs = KVM_SYNC_PREFIX |
KVM_SYNC_GPRS |
KVM_SYNC_ACRS |
KVM_SYNC_CRS |
KVM_SYNC_ARCH0 |
KVM_SYNC_PFAULT;
kvm_s390_set_prefix(vcpu, 0);
if (test_kvm_facility(vcpu->kvm, 64))
vcpu->run->kvm_valid_regs |= KVM_SYNC_RICCB;
if (test_kvm_facility(vcpu->kvm, 82))
vcpu->run->kvm_valid_regs |= KVM_SYNC_BPBC;
if (test_kvm_facility(vcpu->kvm, 133))
vcpu->run->kvm_valid_regs |= KVM_SYNC_GSCB;
if (test_kvm_facility(vcpu->kvm, 156))
vcpu->run->kvm_valid_regs |= KVM_SYNC_ETOKEN;
/* fprs can be synchronized via vrs, even if the guest has no vx. With
* MACHINE_HAS_VX, (load|store)_fpu_regs() will work with vrs format.
*/
if (MACHINE_HAS_VX)
vcpu->run->kvm_valid_regs |= KVM_SYNC_VRS;
else
vcpu->run->kvm_valid_regs |= KVM_SYNC_FPRS;
return vcpu;
if (kvm_is_ucontrol(vcpu->kvm)) {
rc = __kvm_ucontrol_vcpu_init(vcpu);
if (rc)
goto out_free_sie_block;
}
VM_EVENT(vcpu->kvm, 3, "create cpu %d at 0x%pK, sie block at 0x%pK",
vcpu->vcpu_id, vcpu, vcpu->arch.sie_block);
trace_kvm_s390_create_vcpu(vcpu->vcpu_id, vcpu, vcpu->arch.sie_block);
rc = kvm_s390_vcpu_setup(vcpu);
if (rc)
goto out_ucontrol_uninit;
return 0;
out_ucontrol_uninit:
if (kvm_is_ucontrol(vcpu->kvm))
gmap_remove(vcpu->arch.gmap);
out_free_sie_block:
free_page((unsigned long)(vcpu->arch.sie_block));
out_free_cpu:
kmem_cache_free(kvm_vcpu_cache, vcpu);
out:
return ERR_PTR(rc);
return rc;
}
int kvm_arch_vcpu_runnable(struct kvm_vcpu *vcpu)


@ -222,6 +222,10 @@ struct x86_emulate_ops {
bool (*get_cpuid)(struct x86_emulate_ctxt *ctxt, u32 *eax, u32 *ebx,
u32 *ecx, u32 *edx, bool check_limit);
bool (*guest_has_long_mode)(struct x86_emulate_ctxt *ctxt);
bool (*guest_has_movbe)(struct x86_emulate_ctxt *ctxt);
bool (*guest_has_fxsr)(struct x86_emulate_ctxt *ctxt);
void (*set_nmi_mask)(struct x86_emulate_ctxt *ctxt, bool masked);
unsigned (*get_hflags)(struct x86_emulate_ctxt *ctxt);


@ -175,6 +175,11 @@ enum {
VCPU_SREG_LDTR,
};
enum exit_fastpath_completion {
EXIT_FASTPATH_NONE,
EXIT_FASTPATH_SKIP_EMUL_INS,
};
#include <asm/kvm_emulate.h>
#define KVM_NR_MEM_OBJS 40
@ -378,12 +383,12 @@ struct kvm_mmu {
void (*set_cr3)(struct kvm_vcpu *vcpu, unsigned long root);
unsigned long (*get_cr3)(struct kvm_vcpu *vcpu);
u64 (*get_pdptr)(struct kvm_vcpu *vcpu, int index);
int (*page_fault)(struct kvm_vcpu *vcpu, gva_t gva, u32 err,
int (*page_fault)(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa, u32 err,
bool prefault);
void (*inject_page_fault)(struct kvm_vcpu *vcpu,
struct x86_exception *fault);
gpa_t (*gva_to_gpa)(struct kvm_vcpu *vcpu, gva_t gva, u32 access,
struct x86_exception *exception);
gpa_t (*gva_to_gpa)(struct kvm_vcpu *vcpu, gpa_t gva_or_gpa,
u32 access, struct x86_exception *exception);
gpa_t (*translate_gpa)(struct kvm_vcpu *vcpu, gpa_t gpa, u32 access,
struct x86_exception *exception);
int (*sync_page)(struct kvm_vcpu *vcpu,
@ -606,7 +611,7 @@ struct kvm_vcpu_arch {
* Paging state of an L2 guest (used for nested npt)
*
* This context will save all necessary information to walk page tables
* of the an L2 guest. This context is only initialized for page table
* of an L2 guest. This context is only initialized for page table
* walking and not for faulting since we never handle l2 page faults on
* the host.
*/
@ -685,10 +690,10 @@ struct kvm_vcpu_arch {
bool pvclock_set_guest_stopped_request;
struct {
u8 preempted;
u64 msr_val;
u64 last_steal;
struct gfn_to_hva_cache stime;
struct kvm_steal_time steal;
struct gfn_to_pfn_cache cache;
} st;
u64 tsc_offset;
@ -1022,6 +1027,11 @@ struct kvm_lapic_irq {
bool msi_redir_hint;
};
static inline u16 kvm_lapic_irq_dest_mode(bool dest_mode_logical)
{
return dest_mode_logical ? APIC_DEST_LOGICAL : APIC_DEST_PHYSICAL;
}
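The new kvm_lapic_irq_dest_mode() helper above converts the single destination-mode bit found in I/O APIC and MSI routing entries into the APIC_DEST_* constants used by kvm_apic_match_dest(). A minimal user-space sketch of the same mapping follows; the constant values are assumed from apicdef.h and are not part of this diff.

#include <stdio.h>
#include <stdint.h>

/* Assumed values, mirroring arch/x86/include/asm/apicdef.h. */
#define APIC_DEST_PHYSICAL 0x00000
#define APIC_DEST_LOGICAL  0x00800

static inline uint16_t irq_dest_mode(int dest_mode_logical)
{
	return dest_mode_logical ? APIC_DEST_LOGICAL : APIC_DEST_PHYSICAL;
}

int main(void)
{
	printf("physical: %#x\n", irq_dest_mode(0));
	printf("logical:  %#x\n", irq_dest_mode(1));
	return 0;
}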
struct kvm_x86_ops {
int (*cpu_has_kvm_support)(void); /* __init */
int (*disabled_by_bios)(void); /* __init */
@ -1040,7 +1050,7 @@ struct kvm_x86_ops {
void (*vm_destroy)(struct kvm *kvm);
/* Create, but do not attach this VCPU */
struct kvm_vcpu *(*vcpu_create)(struct kvm *kvm, unsigned id);
int (*vcpu_create)(struct kvm_vcpu *vcpu);
void (*vcpu_free)(struct kvm_vcpu *vcpu);
void (*vcpu_reset)(struct kvm_vcpu *vcpu, bool init_event);
@ -1090,7 +1100,8 @@ struct kvm_x86_ops {
void (*tlb_flush_gva)(struct kvm_vcpu *vcpu, gva_t addr);
void (*run)(struct kvm_vcpu *vcpu);
int (*handle_exit)(struct kvm_vcpu *vcpu);
int (*handle_exit)(struct kvm_vcpu *vcpu,
enum exit_fastpath_completion exit_fastpath);
int (*skip_emulated_instruction)(struct kvm_vcpu *vcpu);
void (*set_interrupt_shadow)(struct kvm_vcpu *vcpu, int mask);
u32 (*get_interrupt_shadow)(struct kvm_vcpu *vcpu);
@ -1140,11 +1151,13 @@ struct kvm_x86_ops {
int (*check_intercept)(struct kvm_vcpu *vcpu,
struct x86_instruction_info *info,
enum x86_intercept_stage stage);
void (*handle_exit_irqoff)(struct kvm_vcpu *vcpu);
void (*handle_exit_irqoff)(struct kvm_vcpu *vcpu,
enum exit_fastpath_completion *exit_fastpath);
bool (*mpx_supported)(void);
bool (*xsaves_supported)(void);
bool (*umip_emulated)(void);
bool (*pt_supported)(void);
bool (*pku_supported)(void);
int (*check_nested_events)(struct kvm_vcpu *vcpu, bool external_intr);
void (*request_immediate_exit)(struct kvm_vcpu *vcpu);
@ -1468,7 +1481,7 @@ void kvm_vcpu_deactivate_apicv(struct kvm_vcpu *vcpu);
int kvm_emulate_hypercall(struct kvm_vcpu *vcpu);
int kvm_mmu_page_fault(struct kvm_vcpu *vcpu, gva_t gva, u64 error_code,
int kvm_mmu_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa, u64 error_code,
void *insn, int insn_len);
void kvm_mmu_invlpg(struct kvm_vcpu *vcpu, gva_t gva);
void kvm_mmu_invpcid_gva(struct kvm_vcpu *vcpu, gva_t gva, unsigned long pcid);
@ -1614,7 +1627,6 @@ void __kvm_request_immediate_exit(struct kvm_vcpu *vcpu);
int kvm_is_in_guest(void);
int __x86_set_memory_region(struct kvm *kvm, int id, gpa_t gpa, u32 size);
int x86_set_memory_region(struct kvm *kvm, int id, gpa_t gpa, u32 size);
bool kvm_vcpu_is_reset_bsp(struct kvm_vcpu *vcpu);
bool kvm_vcpu_is_bsp(struct kvm_vcpu *vcpu);


@ -566,6 +566,10 @@ static inline void update_page_count(int level, unsigned long pages) { }
extern pte_t *lookup_address(unsigned long address, unsigned int *level);
extern pte_t *lookup_address_in_pgd(pgd_t *pgd, unsigned long address,
unsigned int *level);
struct mm_struct;
extern pte_t *lookup_address_in_mm(struct mm_struct *mm, unsigned long address,
unsigned int *level);
extern pmd_t *lookup_pmd_address(unsigned long address);
extern phys_addr_t slow_virt_to_phys(void *__address);
extern int __init kernel_map_pages_in_pgd(pgd_t *pgd, u64 pfn,


@ -22,8 +22,8 @@
/*
* Definitions of Primary Processor-Based VM-Execution Controls.
*/
#define CPU_BASED_VIRTUAL_INTR_PENDING VMCS_CONTROL_BIT(VIRTUAL_INTR_PENDING)
#define CPU_BASED_USE_TSC_OFFSETING VMCS_CONTROL_BIT(TSC_OFFSETTING)
#define CPU_BASED_INTR_WINDOW_EXITING VMCS_CONTROL_BIT(VIRTUAL_INTR_PENDING)
#define CPU_BASED_USE_TSC_OFFSETTING VMCS_CONTROL_BIT(TSC_OFFSETTING)
#define CPU_BASED_HLT_EXITING VMCS_CONTROL_BIT(HLT_EXITING)
#define CPU_BASED_INVLPG_EXITING VMCS_CONTROL_BIT(INVLPG_EXITING)
#define CPU_BASED_MWAIT_EXITING VMCS_CONTROL_BIT(MWAIT_EXITING)
@ -34,7 +34,7 @@
#define CPU_BASED_CR8_LOAD_EXITING VMCS_CONTROL_BIT(CR8_LOAD_EXITING)
#define CPU_BASED_CR8_STORE_EXITING VMCS_CONTROL_BIT(CR8_STORE_EXITING)
#define CPU_BASED_TPR_SHADOW VMCS_CONTROL_BIT(VIRTUAL_TPR)
#define CPU_BASED_VIRTUAL_NMI_PENDING VMCS_CONTROL_BIT(VIRTUAL_NMI_PENDING)
#define CPU_BASED_NMI_WINDOW_EXITING VMCS_CONTROL_BIT(VIRTUAL_NMI_PENDING)
#define CPU_BASED_MOV_DR_EXITING VMCS_CONTROL_BIT(MOV_DR_EXITING)
#define CPU_BASED_UNCOND_IO_EXITING VMCS_CONTROL_BIT(UNCOND_IO_EXITING)
#define CPU_BASED_USE_IO_BITMAPS VMCS_CONTROL_BIT(USE_IO_BITMAPS)


@ -33,7 +33,7 @@
#define EXIT_REASON_TRIPLE_FAULT 2
#define EXIT_REASON_INIT_SIGNAL 3
#define EXIT_REASON_PENDING_INTERRUPT 7
#define EXIT_REASON_INTERRUPT_WINDOW 7
#define EXIT_REASON_NMI_WINDOW 8
#define EXIT_REASON_TASK_SWITCH 9
#define EXIT_REASON_CPUID 10
@ -94,7 +94,7 @@
{ EXIT_REASON_EXTERNAL_INTERRUPT, "EXTERNAL_INTERRUPT" }, \
{ EXIT_REASON_TRIPLE_FAULT, "TRIPLE_FAULT" }, \
{ EXIT_REASON_INIT_SIGNAL, "INIT_SIGNAL" }, \
{ EXIT_REASON_PENDING_INTERRUPT, "PENDING_INTERRUPT" }, \
{ EXIT_REASON_INTERRUPT_WINDOW, "INTERRUPT_WINDOW" }, \
{ EXIT_REASON_NMI_WINDOW, "NMI_WINDOW" }, \
{ EXIT_REASON_TASK_SWITCH, "TASK_SWITCH" }, \
{ EXIT_REASON_CPUID, "CPUID" }, \


@ -62,7 +62,7 @@ u64 kvm_supported_xcr0(void)
return xcr0;
}
#define F(x) bit(X86_FEATURE_##x)
#define F feature_bit
int kvm_update_cpuid(struct kvm_vcpu *vcpu)
{
@ -281,8 +281,9 @@ out:
return r;
}
static void cpuid_mask(u32 *word, int wordnum)
static __always_inline void cpuid_mask(u32 *word, int wordnum)
{
reverse_cpuid_check(wordnum);
*word &= boot_cpu_data.x86_capability[wordnum];
}
@ -352,6 +353,7 @@ static inline void do_cpuid_7_mask(struct kvm_cpuid_entry2 *entry, int index)
unsigned f_umip = kvm_x86_ops->umip_emulated() ? F(UMIP) : 0;
unsigned f_intel_pt = kvm_x86_ops->pt_supported() ? F(INTEL_PT) : 0;
unsigned f_la57;
unsigned f_pku = kvm_x86_ops->pku_supported() ? F(PKU) : 0;
/* cpuid 7.0.ebx */
const u32 kvm_cpuid_7_0_ebx_x86_features =
@ -363,7 +365,7 @@ static inline void do_cpuid_7_mask(struct kvm_cpuid_entry2 *entry, int index)
/* cpuid 7.0.ecx*/
const u32 kvm_cpuid_7_0_ecx_x86_features =
F(AVX512VBMI) | F(LA57) | F(PKU) | 0 /*OSPKE*/ | F(RDPID) |
F(AVX512VBMI) | F(LA57) | 0 /*PKU*/ | 0 /*OSPKE*/ | F(RDPID) |
F(AVX512_VPOPCNTDQ) | F(UMIP) | F(AVX512_VBMI2) | F(GFNI) |
F(VAES) | F(VPCLMULQDQ) | F(AVX512_VNNI) | F(AVX512_BITALG) |
F(CLDEMOTE) | F(MOVDIRI) | F(MOVDIR64B) | 0 /*WAITPKG*/;
@ -392,6 +394,7 @@ static inline void do_cpuid_7_mask(struct kvm_cpuid_entry2 *entry, int index)
/* Set LA57 based on hardware capability. */
entry->ecx |= f_la57;
entry->ecx |= f_umip;
entry->ecx |= f_pku;
/* PKU is not yet implemented for shadow paging. */
if (!tdp_enabled || !boot_cpu_has(X86_FEATURE_OSPKE))
entry->ecx &= ~F(PKU);


@ -53,15 +53,46 @@ static const struct cpuid_reg reverse_cpuid[] = {
[CPUID_7_ECX] = { 7, 0, CPUID_ECX},
[CPUID_8000_0007_EBX] = {0x80000007, 0, CPUID_EBX},
[CPUID_7_EDX] = { 7, 0, CPUID_EDX},
[CPUID_7_1_EAX] = { 7, 1, CPUID_EAX},
};
/*
* Reverse CPUID and its derivatives can only be used for hardware-defined
* feature words, i.e. words whose bits directly correspond to a CPUID leaf.
* Retrieving a feature bit or masking guest CPUID from a Linux-defined word
* is nonsensical as the bit number/mask is an arbitrary software-defined value
* and can't be used by KVM to query/control guest capabilities. And obviously
* the leaf being queried must have an entry in the lookup table.
*/
static __always_inline void reverse_cpuid_check(unsigned x86_leaf)
{
BUILD_BUG_ON(x86_leaf == CPUID_LNX_1);
BUILD_BUG_ON(x86_leaf == CPUID_LNX_2);
BUILD_BUG_ON(x86_leaf == CPUID_LNX_3);
BUILD_BUG_ON(x86_leaf == CPUID_LNX_4);
BUILD_BUG_ON(x86_leaf >= ARRAY_SIZE(reverse_cpuid));
BUILD_BUG_ON(reverse_cpuid[x86_leaf].function == 0);
}
/*
* Retrieve the bit mask from an X86_FEATURE_* definition. Features contain
* the hardware defined bit number (stored in bits 4:0) and a software defined
* "word" (stored in bits 31:5). The word is used to index into arrays of
* bit masks that hold the per-cpu feature capabilities, e.g. this_cpu_has().
*/
static __always_inline u32 __feature_bit(int x86_feature)
{
reverse_cpuid_check(x86_feature / 32);
return 1 << (x86_feature & 31);
}
#define feature_bit(name) __feature_bit(X86_FEATURE_##name)
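A stand-alone illustration of the encoding described in the comment above: an X86_FEATURE_* value carries the bit position in its low 5 bits and the capability-word index in the remaining bits, so __feature_bit() only needs a shift and a mask. The feature value used here is invented for the demo, not a real X86_FEATURE_* constant.

#include <stdio.h>
#include <stdint.h>

#define DEMO_FEATURE ((7 * 32) + 5)	/* word 7, bit 5 (made up) */

static inline uint32_t demo_feature_bit(unsigned int x86_feature)
{
	return 1u << (x86_feature & 31);	/* bit within the 32-bit word */
}

int main(void)
{
	unsigned int word = DEMO_FEATURE / 32;	/* which capability word */

	printf("word index = %u, mask = %#x\n", word, demo_feature_bit(DEMO_FEATURE));
	/* Prints: word index = 7, mask = 0x20 */
	return 0;
}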
static __always_inline struct cpuid_reg x86_feature_cpuid(unsigned x86_feature)
{
unsigned x86_leaf = x86_feature / 32;
BUILD_BUG_ON(x86_leaf >= ARRAY_SIZE(reverse_cpuid));
BUILD_BUG_ON(reverse_cpuid[x86_leaf].function == 0);
reverse_cpuid_check(x86_leaf);
return reverse_cpuid[x86_leaf];
}
@ -93,15 +124,11 @@ static __always_inline bool guest_cpuid_has(struct kvm_vcpu *vcpu, unsigned x86_
{
int *reg;
if (x86_feature == X86_FEATURE_XSAVE &&
!static_cpu_has(X86_FEATURE_XSAVE))
return false;
reg = guest_cpuid_get_register(vcpu, x86_feature);
if (!reg)
return false;
return *reg & bit(x86_feature);
return *reg & __feature_bit(x86_feature);
}
static __always_inline void guest_cpuid_clear(struct kvm_vcpu *vcpu, unsigned x86_feature)
@ -110,7 +137,7 @@ static __always_inline void guest_cpuid_clear(struct kvm_vcpu *vcpu, unsigned x8
reg = guest_cpuid_get_register(vcpu, x86_feature);
if (reg)
*reg &= ~bit(x86_feature);
*reg &= ~__feature_bit(x86_feature);
}
static inline bool guest_cpuid_is_amd(struct kvm_vcpu *vcpu)


@ -22,6 +22,7 @@
#include "kvm_cache_regs.h"
#include <asm/kvm_emulate.h>
#include <linux/stringify.h>
#include <asm/fpu/api.h>
#include <asm/debugreg.h>
#include <asm/nospec-branch.h>
@ -310,7 +311,9 @@ static void invalidate_registers(struct x86_emulate_ctxt *ctxt)
#define ON64(x)
#endif
static int fastop(struct x86_emulate_ctxt *ctxt, void (*fop)(struct fastop *));
typedef void (*fastop_t)(struct fastop *);
static int fastop(struct x86_emulate_ctxt *ctxt, fastop_t fop);
#define __FOP_FUNC(name) \
".align " __stringify(FASTOP_SIZE) " \n\t" \
@ -1075,8 +1078,23 @@ static void fetch_register_operand(struct operand *op)
}
}
static void read_sse_reg(struct x86_emulate_ctxt *ctxt, sse128_t *data, int reg)
static void emulator_get_fpu(void)
{
fpregs_lock();
fpregs_assert_state_consistent();
if (test_thread_flag(TIF_NEED_FPU_LOAD))
switch_fpu_return();
}
static void emulator_put_fpu(void)
{
fpregs_unlock();
}
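A sketch of how a hypothetical emulator helper would use the two guards above: emulator_get_fpu() takes the fpregs lock and reloads the guest FPU state if TIF_NEED_FPU_LOAD is set, the register is touched, and emulator_put_fpu() drops the lock. This mirrors the read_mmx_reg()/read_sse_reg() call sites below and is not an addition to the emulator; it only compiles in this kernel context.

/*
 * Hypothetical call site: read MMX register 0 safely even when the task's
 * FPU registers are not currently loaded (TIF_NEED_FPU_LOAD is set).
 */
static void example_read_mm0(u64 *data)
{
	emulator_get_fpu();		/* fpregs_lock() + lazy FPU restore */
	asm volatile("movq %%mm0, %0" : "=m"(*data));
	emulator_put_fpu();		/* fpregs_unlock() */
}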
static void read_sse_reg(sse128_t *data, int reg)
{
emulator_get_fpu();
switch (reg) {
case 0: asm("movdqa %%xmm0, %0" : "=m"(*data)); break;
case 1: asm("movdqa %%xmm1, %0" : "=m"(*data)); break;
@ -1098,11 +1116,12 @@ static void read_sse_reg(struct x86_emulate_ctxt *ctxt, sse128_t *data, int reg)
#endif
default: BUG();
}
emulator_put_fpu();
}
static void write_sse_reg(struct x86_emulate_ctxt *ctxt, sse128_t *data,
int reg)
static void write_sse_reg(sse128_t *data, int reg)
{
emulator_get_fpu();
switch (reg) {
case 0: asm("movdqa %0, %%xmm0" : : "m"(*data)); break;
case 1: asm("movdqa %0, %%xmm1" : : "m"(*data)); break;
@ -1124,10 +1143,12 @@ static void write_sse_reg(struct x86_emulate_ctxt *ctxt, sse128_t *data,
#endif
default: BUG();
}
emulator_put_fpu();
}
static void read_mmx_reg(struct x86_emulate_ctxt *ctxt, u64 *data, int reg)
static void read_mmx_reg(u64 *data, int reg)
{
emulator_get_fpu();
switch (reg) {
case 0: asm("movq %%mm0, %0" : "=m"(*data)); break;
case 1: asm("movq %%mm1, %0" : "=m"(*data)); break;
@ -1139,10 +1160,12 @@ static void read_mmx_reg(struct x86_emulate_ctxt *ctxt, u64 *data, int reg)
case 7: asm("movq %%mm7, %0" : "=m"(*data)); break;
default: BUG();
}
emulator_put_fpu();
}
static void write_mmx_reg(struct x86_emulate_ctxt *ctxt, u64 *data, int reg)
static void write_mmx_reg(u64 *data, int reg)
{
emulator_get_fpu();
switch (reg) {
case 0: asm("movq %0, %%mm0" : : "m"(*data)); break;
case 1: asm("movq %0, %%mm1" : : "m"(*data)); break;
@ -1154,6 +1177,7 @@ static void write_mmx_reg(struct x86_emulate_ctxt *ctxt, u64 *data, int reg)
case 7: asm("movq %0, %%mm7" : : "m"(*data)); break;
default: BUG();
}
emulator_put_fpu();
}
static int em_fninit(struct x86_emulate_ctxt *ctxt)
@ -1161,7 +1185,9 @@ static int em_fninit(struct x86_emulate_ctxt *ctxt)
if (ctxt->ops->get_cr(ctxt, 0) & (X86_CR0_TS | X86_CR0_EM))
return emulate_nm(ctxt);
emulator_get_fpu();
asm volatile("fninit");
emulator_put_fpu();
return X86EMUL_CONTINUE;
}
@ -1172,7 +1198,9 @@ static int em_fnstcw(struct x86_emulate_ctxt *ctxt)
if (ctxt->ops->get_cr(ctxt, 0) & (X86_CR0_TS | X86_CR0_EM))
return emulate_nm(ctxt);
emulator_get_fpu();
asm volatile("fnstcw %0": "+m"(fcw));
emulator_put_fpu();
ctxt->dst.val = fcw;
@ -1186,7 +1214,9 @@ static int em_fnstsw(struct x86_emulate_ctxt *ctxt)
if (ctxt->ops->get_cr(ctxt, 0) & (X86_CR0_TS | X86_CR0_EM))
return emulate_nm(ctxt);
emulator_get_fpu();
asm volatile("fnstsw %0": "+m"(fsw));
emulator_put_fpu();
ctxt->dst.val = fsw;
@ -1205,7 +1235,7 @@ static void decode_register_operand(struct x86_emulate_ctxt *ctxt,
op->type = OP_XMM;
op->bytes = 16;
op->addr.xmm = reg;
read_sse_reg(ctxt, &op->vec_val, reg);
read_sse_reg(&op->vec_val, reg);
return;
}
if (ctxt->d & Mmx) {
@ -1256,7 +1286,7 @@ static int decode_modrm(struct x86_emulate_ctxt *ctxt,
op->type = OP_XMM;
op->bytes = 16;
op->addr.xmm = ctxt->modrm_rm;
read_sse_reg(ctxt, &op->vec_val, ctxt->modrm_rm);
read_sse_reg(&op->vec_val, ctxt->modrm_rm);
return rc;
}
if (ctxt->d & Mmx) {
@ -1833,10 +1863,10 @@ static int writeback(struct x86_emulate_ctxt *ctxt, struct operand *op)
op->bytes * op->count);
break;
case OP_XMM:
write_sse_reg(ctxt, &op->vec_val, op->addr.xmm);
write_sse_reg(&op->vec_val, op->addr.xmm);
break;
case OP_MM:
write_mmx_reg(ctxt, &op->mm_val, op->addr.mm);
write_mmx_reg(&op->mm_val, op->addr.mm);
break;
case OP_NONE:
/* no writeback */
@ -2348,12 +2378,7 @@ static int em_lseg(struct x86_emulate_ctxt *ctxt)
static int emulator_has_longmode(struct x86_emulate_ctxt *ctxt)
{
#ifdef CONFIG_X86_64
u32 eax, ebx, ecx, edx;
eax = 0x80000001;
ecx = 0;
ctxt->ops->get_cpuid(ctxt, &eax, &ebx, &ecx, &edx, false);
return edx & bit(X86_FEATURE_LM);
return ctxt->ops->guest_has_long_mode(ctxt);
#else
return false;
#endif
@ -3618,18 +3643,11 @@ static int em_mov(struct x86_emulate_ctxt *ctxt)
return X86EMUL_CONTINUE;
}
#define FFL(x) bit(X86_FEATURE_##x)
static int em_movbe(struct x86_emulate_ctxt *ctxt)
{
u32 ebx, ecx, edx, eax = 1;
u16 tmp;
/*
* Check MOVBE is set in the guest-visible CPUID leaf.
*/
ctxt->ops->get_cpuid(ctxt, &eax, &ebx, &ecx, &edx, false);
if (!(ecx & FFL(MOVBE)))
if (!ctxt->ops->guest_has_movbe(ctxt))
return emulate_ud(ctxt);
switch (ctxt->op_bytes) {
@ -4027,10 +4045,7 @@ static int em_movsxd(struct x86_emulate_ctxt *ctxt)
static int check_fxsr(struct x86_emulate_ctxt *ctxt)
{
u32 eax = 1, ebx, ecx = 0, edx;
ctxt->ops->get_cpuid(ctxt, &eax, &ebx, &ecx, &edx, false);
if (!(edx & FFL(FXSR)))
if (!ctxt->ops->guest_has_fxsr(ctxt))
return emulate_ud(ctxt);
if (ctxt->ops->get_cr(ctxt, 0) & (X86_CR0_TS | X86_CR0_EM))
@ -4092,8 +4107,12 @@ static int em_fxsave(struct x86_emulate_ctxt *ctxt)
if (rc != X86EMUL_CONTINUE)
return rc;
emulator_get_fpu();
rc = asm_safe("fxsave %[fx]", , [fx] "+m"(fx_state));
emulator_put_fpu();
if (rc != X86EMUL_CONTINUE)
return rc;
@ -4136,6 +4155,8 @@ static int em_fxrstor(struct x86_emulate_ctxt *ctxt)
if (rc != X86EMUL_CONTINUE)
return rc;
emulator_get_fpu();
if (size < __fxstate_size(16)) {
rc = fxregs_fixup(&fx_state, size);
if (rc != X86EMUL_CONTINUE)
@ -4151,6 +4172,8 @@ static int em_fxrstor(struct x86_emulate_ctxt *ctxt)
rc = asm_safe("fxrstor %[fx]", : [fx] "m"(fx_state));
out:
emulator_put_fpu();
return rc;
}
@ -5210,16 +5233,28 @@ int x86_decode_insn(struct x86_emulate_ctxt *ctxt, void *insn, int insn_len)
ctxt->ad_bytes = def_ad_bytes ^ 6;
break;
case 0x26: /* ES override */
has_seg_override = true;
ctxt->seg_override = VCPU_SREG_ES;
break;
case 0x2e: /* CS override */
has_seg_override = true;
ctxt->seg_override = VCPU_SREG_CS;
break;
case 0x36: /* SS override */
has_seg_override = true;
ctxt->seg_override = VCPU_SREG_SS;
break;
case 0x3e: /* DS override */
has_seg_override = true;
ctxt->seg_override = (ctxt->b >> 3) & 3;
ctxt->seg_override = VCPU_SREG_DS;
break;
case 0x64: /* FS override */
has_seg_override = true;
ctxt->seg_override = VCPU_SREG_FS;
break;
case 0x65: /* GS override */
has_seg_override = true;
ctxt->seg_override = ctxt->b & 7;
ctxt->seg_override = VCPU_SREG_GS;
break;
case 0x40 ... 0x4f: /* REX */
if (mode != X86EMUL_MODE_PROT64)
@ -5303,10 +5338,15 @@ done_prefixes:
}
break;
case Escape:
if (ctxt->modrm > 0xbf)
opcode = opcode.u.esc->high[ctxt->modrm - 0xc0];
else
if (ctxt->modrm > 0xbf) {
size_t size = ARRAY_SIZE(opcode.u.esc->high);
u32 index = array_index_nospec(
ctxt->modrm - 0xc0, size);
opcode = opcode.u.esc->high[index];
} else {
opcode = opcode.u.esc->op[(ctxt->modrm >> 3) & 7];
}
break;
case InstrDual:
if ((ctxt->modrm >> 6) == 3)
@ -5448,7 +5488,9 @@ static int flush_pending_x87_faults(struct x86_emulate_ctxt *ctxt)
{
int rc;
emulator_get_fpu();
rc = asm_safe("fwait");
emulator_put_fpu();
if (unlikely(rc != X86EMUL_CONTINUE))
return emulate_exception(ctxt, MF_VECTOR, 0, false);
@ -5456,14 +5498,13 @@ static int flush_pending_x87_faults(struct x86_emulate_ctxt *ctxt)
return X86EMUL_CONTINUE;
}
static void fetch_possible_mmx_operand(struct x86_emulate_ctxt *ctxt,
struct operand *op)
static void fetch_possible_mmx_operand(struct operand *op)
{
if (op->type == OP_MM)
read_mmx_reg(ctxt, &op->mm_val, op->addr.mm);
read_mmx_reg(&op->mm_val, op->addr.mm);
}
static int fastop(struct x86_emulate_ctxt *ctxt, void (*fop)(struct fastop *))
static int fastop(struct x86_emulate_ctxt *ctxt, fastop_t fop)
{
ulong flags = (ctxt->eflags & EFLAGS_MASK) | X86_EFLAGS_IF;
@ -5539,10 +5580,10 @@ int x86_emulate_insn(struct x86_emulate_ctxt *ctxt)
* Now that we know the fpu is exception safe, we can fetch
* operands from it.
*/
fetch_possible_mmx_operand(ctxt, &ctxt->src);
fetch_possible_mmx_operand(ctxt, &ctxt->src2);
fetch_possible_mmx_operand(&ctxt->src);
fetch_possible_mmx_operand(&ctxt->src2);
if (!(ctxt->d & Mov))
fetch_possible_mmx_operand(ctxt, &ctxt->dst);
fetch_possible_mmx_operand(&ctxt->dst);
}
if (unlikely(emul_flags & X86EMUL_GUEST_MASK) && ctxt->intercept) {
@ -5641,14 +5682,10 @@ special_insn:
ctxt->eflags &= ~X86_EFLAGS_RF;
if (ctxt->execute) {
if (ctxt->d & Fastop) {
void (*fop)(struct fastop *) = (void *)ctxt->execute;
rc = fastop(ctxt, fop);
if (rc != X86EMUL_CONTINUE)
goto done;
goto writeback;
}
rc = ctxt->execute(ctxt);
if (ctxt->d & Fastop)
rc = fastop(ctxt, (fastop_t)ctxt->execute);
else
rc = ctxt->execute(ctxt);
if (rc != X86EMUL_CONTINUE)
goto done;
goto writeback;


@ -33,6 +33,7 @@
#include <trace/events/kvm.h>
#include "trace.h"
#include "irq.h"
#define KVM_HV_MAX_SPARSE_VCPU_SET_BITS DIV_ROUND_UP(KVM_MAX_VCPUS, 64)
@ -809,11 +810,12 @@ static int kvm_hv_msr_get_crash_data(struct kvm_vcpu *vcpu,
u32 index, u64 *pdata)
{
struct kvm_hv *hv = &vcpu->kvm->arch.hyperv;
size_t size = ARRAY_SIZE(hv->hv_crash_param);
if (WARN_ON_ONCE(index >= ARRAY_SIZE(hv->hv_crash_param)))
if (WARN_ON_ONCE(index >= size))
return -EINVAL;
*pdata = hv->hv_crash_param[index];
*pdata = hv->hv_crash_param[array_index_nospec(index, size)];
return 0;
}
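The change above clamps the crash-MSR index with array_index_nospec() after the bounds check, so a mispredicted branch cannot be used to index past hv_crash_param; this is part of the Spectre-v1/L1TF hardening mentioned in the pull request. Below is a user-space sketch of the same "check, then clamp" idiom, with a simplified mask-based clamp standing in for the kernel's array_index_nospec(); the real helper is more careful, this only shows the idea.

#include <stdio.h>
#include <stdint.h>
#include <stddef.h>

/* Yields index when index < size, 0 otherwise, without a data-dependent branch. */
static inline size_t clamp_index_nospec(size_t index, size_t size)
{
	size_t mask = (size_t)0 - (index < size);	/* ~0 if in range, else 0 */
	return index & mask;
}

int main(void)
{
	uint64_t crash_param[5] = { 1, 2, 3, 4, 5 };
	size_t index = 3;

	if (index >= 5)			/* architectural bounds check */
		return 1;
	printf("%llu\n", (unsigned long long)
	       crash_param[clamp_index_nospec(index, 5)]);
	return 0;
}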
@ -852,11 +854,12 @@ static int kvm_hv_msr_set_crash_data(struct kvm_vcpu *vcpu,
u32 index, u64 data)
{
struct kvm_hv *hv = &vcpu->kvm->arch.hyperv;
size_t size = ARRAY_SIZE(hv->hv_crash_param);
if (WARN_ON_ONCE(index >= ARRAY_SIZE(hv->hv_crash_param)))
if (WARN_ON_ONCE(index >= size))
return -EINVAL;
hv->hv_crash_param[index] = data;
hv->hv_crash_param[array_index_nospec(index, size)] = data;
return 0;
}
@ -1058,7 +1061,7 @@ static int kvm_hv_set_msr_pw(struct kvm_vcpu *vcpu, u32 msr, u64 data,
return 1;
break;
default:
vcpu_unimpl(vcpu, "Hyper-V uhandled wrmsr: 0x%x data 0x%llx\n",
vcpu_unimpl(vcpu, "Hyper-V unhandled wrmsr: 0x%x data 0x%llx\n",
msr, data);
return 1;
}
@ -1121,7 +1124,7 @@ static int kvm_hv_set_msr(struct kvm_vcpu *vcpu, u32 msr, u64 data, bool host)
return 1;
/*
* Clear apic_assist portion of f(struct hv_vp_assist_page
* Clear apic_assist portion of struct hv_vp_assist_page
* only, there can be valuable data in the rest which needs
* to be preserved e.g. on migration.
*/
@ -1178,7 +1181,7 @@ static int kvm_hv_set_msr(struct kvm_vcpu *vcpu, u32 msr, u64 data, bool host)
return 1;
break;
default:
vcpu_unimpl(vcpu, "Hyper-V uhandled wrmsr: 0x%x data 0x%llx\n",
vcpu_unimpl(vcpu, "Hyper-V unhandled wrmsr: 0x%x data 0x%llx\n",
msr, data);
return 1;
}


@ -460,10 +460,14 @@ static int picdev_write(struct kvm_pic *s,
switch (addr) {
case 0x20:
case 0x21:
pic_lock(s);
pic_ioport_write(&s->pics[0], addr, data);
pic_unlock(s);
break;
case 0xa0:
case 0xa1:
pic_lock(s);
pic_ioport_write(&s->pics[addr >> 7], addr, data);
pic_ioport_write(&s->pics[1], addr, data);
pic_unlock(s);
break;
case 0x4d0:


@ -36,6 +36,7 @@
#include <linux/io.h>
#include <linux/slab.h>
#include <linux/export.h>
#include <linux/nospec.h>
#include <asm/processor.h>
#include <asm/page.h>
#include <asm/current.h>
@ -68,13 +69,14 @@ static unsigned long ioapic_read_indirect(struct kvm_ioapic *ioapic,
default:
{
u32 redir_index = (ioapic->ioregsel - 0x10) >> 1;
u64 redir_content;
u64 redir_content = ~0ULL;
if (redir_index < IOAPIC_NUM_PINS)
redir_content =
ioapic->redirtbl[redir_index].bits;
else
redir_content = ~0ULL;
if (redir_index < IOAPIC_NUM_PINS) {
u32 index = array_index_nospec(
redir_index, IOAPIC_NUM_PINS);
redir_content = ioapic->redirtbl[index].bits;
}
result = (ioapic->ioregsel & 0x1) ?
(redir_content >> 32) & 0xffffffff :
@ -108,8 +110,9 @@ static void __rtc_irq_eoi_tracking_restore_one(struct kvm_vcpu *vcpu)
union kvm_ioapic_redirect_entry *e;
e = &ioapic->redirtbl[RTC_GSI];
if (!kvm_apic_match_dest(vcpu, NULL, 0, e->fields.dest_id,
e->fields.dest_mode))
if (!kvm_apic_match_dest(vcpu, NULL, APIC_DEST_NOSHORT,
e->fields.dest_id,
kvm_lapic_irq_dest_mode(!!e->fields.dest_mode)))
return;
new_val = kvm_apic_pending_eoi(vcpu, e->fields.vector);
@ -188,7 +191,7 @@ static int ioapic_set_irq(struct kvm_ioapic *ioapic, unsigned int irq,
/*
* Return 0 for coalesced interrupts; for edge-triggered interrupts,
* this only happens if a previous edge has not been delivered due
* do masking. For level interrupts, the remote_irr field tells
* to masking. For level interrupts, the remote_irr field tells
* us if the interrupt is waiting for an EOI.
*
* RTC is special: it is edge-triggered, but userspace likes to know
@ -250,8 +253,10 @@ void kvm_ioapic_scan_entry(struct kvm_vcpu *vcpu, ulong *ioapic_handled_vectors)
if (e->fields.trig_mode == IOAPIC_LEVEL_TRIG ||
kvm_irq_has_notifier(ioapic->kvm, KVM_IRQCHIP_IOAPIC, index) ||
index == RTC_GSI) {
if (kvm_apic_match_dest(vcpu, NULL, 0,
e->fields.dest_id, e->fields.dest_mode) ||
u16 dm = kvm_lapic_irq_dest_mode(!!e->fields.dest_mode);
if (kvm_apic_match_dest(vcpu, NULL, APIC_DEST_NOSHORT,
e->fields.dest_id, dm) ||
kvm_apic_pending_eoi(vcpu, e->fields.vector))
__set_bit(e->fields.vector,
ioapic_handled_vectors);
@ -292,6 +297,7 @@ static void ioapic_write_indirect(struct kvm_ioapic *ioapic, u32 val)
if (index >= IOAPIC_NUM_PINS)
return;
index = array_index_nospec(index, IOAPIC_NUM_PINS);
e = &ioapic->redirtbl[index];
mask_before = e->fields.mask;
/* Preserve read-only fields */
@ -327,11 +333,12 @@ static void ioapic_write_indirect(struct kvm_ioapic *ioapic, u32 val)
if (e->fields.delivery_mode == APIC_DM_FIXED) {
struct kvm_lapic_irq irq;
irq.shorthand = 0;
irq.shorthand = APIC_DEST_NOSHORT;
irq.vector = e->fields.vector;
irq.delivery_mode = e->fields.delivery_mode << 8;
irq.dest_id = e->fields.dest_id;
irq.dest_mode = e->fields.dest_mode;
irq.dest_mode =
kvm_lapic_irq_dest_mode(!!e->fields.dest_mode);
bitmap_zero(&vcpu_bitmap, 16);
kvm_bitmap_or_dest_vcpus(ioapic->kvm, &irq,
&vcpu_bitmap);
@ -343,7 +350,9 @@ static void ioapic_write_indirect(struct kvm_ioapic *ioapic, u32 val)
* keep ioapic_handled_vectors synchronized.
*/
irq.dest_id = old_dest_id;
irq.dest_mode = old_dest_mode;
irq.dest_mode =
kvm_lapic_irq_dest_mode(
!!e->fields.dest_mode);
kvm_bitmap_or_dest_vcpus(ioapic->kvm, &irq,
&vcpu_bitmap);
}
@ -369,11 +378,11 @@ static int ioapic_service(struct kvm_ioapic *ioapic, int irq, bool line_status)
irqe.dest_id = entry->fields.dest_id;
irqe.vector = entry->fields.vector;
irqe.dest_mode = entry->fields.dest_mode;
irqe.dest_mode = kvm_lapic_irq_dest_mode(!!entry->fields.dest_mode);
irqe.trig_mode = entry->fields.trig_mode;
irqe.delivery_mode = entry->fields.delivery_mode << 8;
irqe.level = 1;
irqe.shorthand = 0;
irqe.shorthand = APIC_DEST_NOSHORT;
irqe.msi_redir_hint = false;
if (irqe.trig_mode == IOAPIC_EDGE_TRIG)


@ -116,9 +116,6 @@ static inline int ioapic_in_kernel(struct kvm *kvm)
}
void kvm_rtc_eoi_tracking_restore_one(struct kvm_vcpu *vcpu);
bool kvm_apic_match_dest(struct kvm_vcpu *vcpu, struct kvm_lapic *source,
int short_hand, unsigned int dest, int dest_mode);
int kvm_apic_compare_prio(struct kvm_vcpu *vcpu1, struct kvm_vcpu *vcpu2);
void kvm_ioapic_update_eoi(struct kvm_vcpu *vcpu, int vector,
int trigger_mode);
int kvm_ioapic_init(struct kvm *kvm);
@ -126,9 +123,6 @@ void kvm_ioapic_destroy(struct kvm *kvm);
int kvm_ioapic_set_irq(struct kvm_ioapic *ioapic, int irq, int irq_source_id,
int level, bool line_status);
void kvm_ioapic_clear_all(struct kvm_ioapic *ioapic, int irq_source_id);
int kvm_irq_delivery_to_apic(struct kvm *kvm, struct kvm_lapic *src,
struct kvm_lapic_irq *irq,
struct dest_map *dest_map);
void kvm_get_ioapic(struct kvm *kvm, struct kvm_ioapic_state *state);
void kvm_set_ioapic(struct kvm *kvm, struct kvm_ioapic_state *state);
void kvm_ioapic_scan_entry(struct kvm_vcpu *vcpu,


@ -113,5 +113,8 @@ int apic_has_pending_timer(struct kvm_vcpu *vcpu);
int kvm_setup_default_irq_routing(struct kvm *kvm);
int kvm_setup_empty_irq_routing(struct kvm *kvm);
int kvm_irq_delivery_to_apic(struct kvm *kvm, struct kvm_lapic *src,
struct kvm_lapic_irq *irq,
struct dest_map *dest_map);
#endif


@ -52,15 +52,15 @@ int kvm_irq_delivery_to_apic(struct kvm *kvm, struct kvm_lapic *src,
unsigned long dest_vcpu_bitmap[BITS_TO_LONGS(KVM_MAX_VCPUS)];
unsigned int dest_vcpus = 0;
if (irq->dest_mode == 0 && irq->dest_id == 0xff &&
kvm_lowest_prio_delivery(irq)) {
if (kvm_irq_delivery_to_apic_fast(kvm, src, irq, &r, dest_map))
return r;
if (irq->dest_mode == APIC_DEST_PHYSICAL &&
irq->dest_id == 0xff && kvm_lowest_prio_delivery(irq)) {
printk(KERN_INFO "kvm: apic: phys broadcast and lowest prio\n");
irq->delivery_mode = APIC_DM_FIXED;
}
if (kvm_irq_delivery_to_apic_fast(kvm, src, irq, &r, dest_map))
return r;
memset(dest_vcpu_bitmap, 0, sizeof(dest_vcpu_bitmap));
kvm_for_each_vcpu(i, vcpu, kvm) {
@ -114,13 +114,14 @@ void kvm_set_msi_irq(struct kvm *kvm, struct kvm_kernel_irq_routing_entry *e,
irq->dest_id |= MSI_ADDR_EXT_DEST_ID(e->msi.address_hi);
irq->vector = (e->msi.data &
MSI_DATA_VECTOR_MASK) >> MSI_DATA_VECTOR_SHIFT;
irq->dest_mode = (1 << MSI_ADDR_DEST_MODE_SHIFT) & e->msi.address_lo;
irq->dest_mode = kvm_lapic_irq_dest_mode(
!!((1 << MSI_ADDR_DEST_MODE_SHIFT) & e->msi.address_lo));
irq->trig_mode = (1 << MSI_DATA_TRIGGER_SHIFT) & e->msi.data;
irq->delivery_mode = e->msi.data & 0x700;
irq->msi_redir_hint = ((e->msi.address_lo
& MSI_ADDR_REDIRECTION_LOWPRI) > 0);
irq->level = 1;
irq->shorthand = 0;
irq->shorthand = APIC_DEST_NOSHORT;
}
EXPORT_SYMBOL_GPL(kvm_set_msi_irq);
@ -416,7 +417,8 @@ void kvm_scan_ioapic_routes(struct kvm_vcpu *vcpu,
kvm_set_msi_irq(vcpu->kvm, entry, &irq);
if (irq.level && kvm_apic_match_dest(vcpu, NULL, 0,
if (irq.level &&
kvm_apic_match_dest(vcpu, NULL, APIC_DEST_NOSHORT,
irq.dest_id, irq.dest_mode))
__set_bit(irq.vector, ioapic_handled_vectors);
}


@ -56,9 +56,6 @@
#define APIC_VERSION (0x14UL | ((KVM_APIC_LVT_NUM - 1) << 16))
#define LAPIC_MMIO_LENGTH (1 << 12)
/* followed define is not in apicdef.h */
#define APIC_SHORT_MASK 0xc0000
#define APIC_DEST_NOSHORT 0x0
#define APIC_DEST_MASK 0x800
#define MAX_APIC_VECTOR 256
#define APIC_VECTORS_PER_REG 32
@ -792,13 +789,13 @@ static u32 kvm_apic_mda(struct kvm_vcpu *vcpu, unsigned int dest_id,
}
bool kvm_apic_match_dest(struct kvm_vcpu *vcpu, struct kvm_lapic *source,
int short_hand, unsigned int dest, int dest_mode)
int shorthand, unsigned int dest, int dest_mode)
{
struct kvm_lapic *target = vcpu->arch.apic;
u32 mda = kvm_apic_mda(vcpu, dest, source, target);
ASSERT(target);
switch (short_hand) {
switch (shorthand) {
case APIC_DEST_NOSHORT:
if (dest_mode == APIC_DEST_PHYSICAL)
return kvm_apic_match_physical_addr(target, mda);
@ -967,12 +964,12 @@ bool kvm_irq_delivery_to_apic_fast(struct kvm *kvm, struct kvm_lapic *src,
}
/*
* This routine tries to handler interrupts in posted mode, here is how
* This routine tries to handle interrupts in posted mode, here is how
* it deals with different cases:
* - For single-destination interrupts, handle it in posted mode
* - Else if vector hashing is enabled and it is a lowest-priority
* interrupt, handle it in posted mode and use the following mechanism
* to find the destinaiton vCPU.
* to find the destination vCPU.
* 1. For lowest-priority interrupts, store all the possible
* destination vCPUs in an array.
* 2. Use "guest vector % max number of destination vCPUs" to find
@ -1151,7 +1148,7 @@ void kvm_bitmap_or_dest_vcpus(struct kvm *kvm, struct kvm_lapic_irq *irq,
if (!kvm_apic_present(vcpu))
continue;
if (!kvm_apic_match_dest(vcpu, NULL,
irq->delivery_mode,
irq->shorthand,
irq->dest_id,
irq->dest_mode))
continue;
@ -1574,9 +1571,9 @@ static void kvm_apic_inject_pending_timer_irqs(struct kvm_lapic *apic)
struct kvm_timer *ktimer = &apic->lapic_timer;
kvm_apic_local_deliver(apic, APIC_LVTT);
if (apic_lvtt_tscdeadline(apic))
if (apic_lvtt_tscdeadline(apic)) {
ktimer->tscdeadline = 0;
if (apic_lvtt_oneshot(apic)) {
} else if (apic_lvtt_oneshot(apic)) {
ktimer->tscdeadline = 0;
ktimer->target_expiration = 0;
}
@ -1963,15 +1960,20 @@ int kvm_lapic_reg_write(struct kvm_lapic *apic, u32 reg, u32 val)
case APIC_LVTTHMR:
case APIC_LVTPC:
case APIC_LVT1:
case APIC_LVTERR:
case APIC_LVTERR: {
/* TODO: Check vector */
size_t size;
u32 index;
if (!kvm_apic_sw_enabled(apic))
val |= APIC_LVT_MASKED;
val &= apic_lvt_mask[(reg - APIC_LVTT) >> 4];
size = ARRAY_SIZE(apic_lvt_mask);
index = array_index_nospec(
(reg - APIC_LVTT) >> 4, size);
val &= apic_lvt_mask[index];
kvm_lapic_set_reg(apic, reg, val);
break;
}
case APIC_LVTT:
if (!kvm_apic_sw_enabled(apic))
@ -2373,14 +2375,13 @@ int kvm_apic_has_interrupt(struct kvm_vcpu *vcpu)
int kvm_apic_accept_pic_intr(struct kvm_vcpu *vcpu)
{
u32 lvt0 = kvm_lapic_get_reg(vcpu->arch.apic, APIC_LVT0);
int r = 0;
if (!kvm_apic_hw_enabled(vcpu->arch.apic))
r = 1;
return 1;
if ((lvt0 & APIC_LVT_MASKED) == 0 &&
GET_APIC_DELIVERY_MODE(lvt0) == APIC_MODE_EXTINT)
r = 1;
return r;
return 1;
return 0;
}
void kvm_inject_apic_timer_irqs(struct kvm_vcpu *vcpu)


@ -10,8 +10,9 @@
#define KVM_APIC_SIPI 1
#define KVM_APIC_LVT_NUM 6
#define KVM_APIC_SHORT_MASK 0xc0000
#define KVM_APIC_DEST_MASK 0x800
#define APIC_SHORT_MASK 0xc0000
#define APIC_DEST_NOSHORT 0x0
#define APIC_DEST_MASK 0x800
#define APIC_BUS_CYCLE_NS 1
#define APIC_BUS_FREQUENCY (1000000000ULL / APIC_BUS_CYCLE_NS)
@ -82,8 +83,8 @@ int kvm_lapic_reg_write(struct kvm_lapic *apic, u32 reg, u32 val);
int kvm_lapic_reg_read(struct kvm_lapic *apic, u32 offset, int len,
void *data);
bool kvm_apic_match_dest(struct kvm_vcpu *vcpu, struct kvm_lapic *source,
int short_hand, unsigned int dest, int dest_mode);
int shorthand, unsigned int dest, int dest_mode);
int kvm_apic_compare_prio(struct kvm_vcpu *vcpu1, struct kvm_vcpu *vcpu2);
bool __kvm_apic_update_irr(u32 *pir, void *regs, int *max_irr);
bool kvm_apic_update_irr(struct kvm_vcpu *vcpu, u32 *pir, int *max_irr);
void kvm_apic_update_ppr(struct kvm_vcpu *vcpu);


@ -418,22 +418,24 @@ static inline bool is_access_track_spte(u64 spte)
* requires a full MMU zap). The flag is instead explicitly queried when
* checking for MMIO spte cache hits.
*/
#define MMIO_SPTE_GEN_MASK GENMASK_ULL(18, 0)
#define MMIO_SPTE_GEN_MASK GENMASK_ULL(17, 0)
#define MMIO_SPTE_GEN_LOW_START 3
#define MMIO_SPTE_GEN_LOW_END 11
#define MMIO_SPTE_GEN_LOW_MASK GENMASK_ULL(MMIO_SPTE_GEN_LOW_END, \
MMIO_SPTE_GEN_LOW_START)
#define MMIO_SPTE_GEN_HIGH_START 52
#define MMIO_SPTE_GEN_HIGH_END 61
#define MMIO_SPTE_GEN_HIGH_START PT64_SECOND_AVAIL_BITS_SHIFT
#define MMIO_SPTE_GEN_HIGH_END 62
#define MMIO_SPTE_GEN_HIGH_MASK GENMASK_ULL(MMIO_SPTE_GEN_HIGH_END, \
MMIO_SPTE_GEN_HIGH_START)
static u64 generation_mmio_spte_mask(u64 gen)
{
u64 mask;
WARN_ON(gen & ~MMIO_SPTE_GEN_MASK);
BUILD_BUG_ON((MMIO_SPTE_GEN_HIGH_MASK | MMIO_SPTE_GEN_LOW_MASK) & SPTE_SPECIAL_MASK);
mask = (gen << MMIO_SPTE_GEN_LOW_START) & MMIO_SPTE_GEN_LOW_MASK;
mask |= (gen << MMIO_SPTE_GEN_HIGH_START) & MMIO_SPTE_GEN_HIGH_MASK;
@ -444,8 +446,6 @@ static u64 get_mmio_spte_generation(u64 spte)
{
u64 gen;
spte &= ~shadow_mmio_mask;
gen = (spte & MMIO_SPTE_GEN_LOW_MASK) >> MMIO_SPTE_GEN_LOW_START;
gen |= (spte & MMIO_SPTE_GEN_HIGH_MASK) >> MMIO_SPTE_GEN_HIGH_START;
return gen;
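The generation is now an 18-bit value split across two software-available SPTE bit ranges: bits 11:3 (from the constants above) and a high half ending at bit 62. The round-trip demo below restates that packing in plain C; the position of the high half (bit 54, i.e. PT64_SECOND_AVAIL_BITS_SHIFT) is an assumption of this sketch, and the arithmetic is a simplified restatement rather than a copy of the GENMASK_ULL()-based kernel macros.

#include <stdio.h>
#include <stdint.h>
#include <assert.h>

#define GEN_BITS	18
#define LOW_SHIFT	3
#define LOW_BITS	9
#define HIGH_SHIFT	54	/* assumed start of the high half */
#define HIGH_BITS	(GEN_BITS - LOW_BITS)

static uint64_t pack_gen(uint64_t gen)
{
	uint64_t low  = (gen & ((1ull << LOW_BITS) - 1)) << LOW_SHIFT;
	uint64_t high = ((gen >> LOW_BITS) & ((1ull << HIGH_BITS) - 1)) << HIGH_SHIFT;
	return low | high;
}

static uint64_t unpack_gen(uint64_t spte)
{
	uint64_t low  = (spte >> LOW_SHIFT)  & ((1ull << LOW_BITS)  - 1);
	uint64_t high = (spte >> HIGH_SHIFT) & ((1ull << HIGH_BITS) - 1);
	return low | (high << LOW_BITS);
}

int main(void)
{
	uint64_t gen = 0x2abcd;	/* any 18-bit value */

	assert(unpack_gen(pack_gen(gen)) == gen);
	printf("gen %#llx round-trips\n", (unsigned long long)gen);
	return 0;
}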
@ -538,16 +538,20 @@ EXPORT_SYMBOL_GPL(kvm_mmu_set_mask_ptes);
static u8 kvm_get_shadow_phys_bits(void)
{
/*
* boot_cpu_data.x86_phys_bits is reduced when MKTME is detected
* in CPU detection code, but MKTME treats those reduced bits as
* 'keyID' thus they are not reserved bits. Therefore for MKTME
* we should still return physical address bits reported by CPUID.
* boot_cpu_data.x86_phys_bits is reduced when MKTME or SME are detected
* in CPU detection code, but the processor treats those reduced bits as
* 'keyID' thus they are not reserved bits. Therefore KVM needs to look at
* the physical address bits reported by CPUID.
*/
if (!boot_cpu_has(X86_FEATURE_TME) ||
WARN_ON_ONCE(boot_cpu_data.extended_cpuid_level < 0x80000008))
return boot_cpu_data.x86_phys_bits;
if (likely(boot_cpu_data.extended_cpuid_level >= 0x80000008))
return cpuid_eax(0x80000008) & 0xff;
return cpuid_eax(0x80000008) & 0xff;
/*
* Quite weird to have VMX or SVM but not MAXPHYADDR; probably a VM with
* custom CPUID. Proceed with whatever the kernel found since these features
* aren't virtualizable (SME/SEV also require CPUIDs higher than 0x80000008).
*/
return boot_cpu_data.x86_phys_bits;
}
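The rewritten helper above prefers CPUID leaf 0x80000008 EAX[7:0] (MAXPHYADDR) whenever that leaf exists, because MKTME and SME shrink boot_cpu_data.x86_phys_bits even though the stolen bits are key IDs rather than reserved bits. A user-space sketch of the same lookup, using the compiler-provided <cpuid.h> (GCC/Clang); the 36-bit fallback is an arbitrary value chosen for the demo.

#include <stdio.h>
#include <cpuid.h>

int main(void)
{
	unsigned int eax, ebx, ecx, edx;
	unsigned int phys_bits = 36;	/* assumed fallback for the demo */

	/* __get_cpuid() returns 0 if the requested leaf is not supported. */
	if (__get_cpuid(0x80000008, &eax, &ebx, &ecx, &edx))
		phys_bits = eax & 0xff;

	printf("physical address bits: %u\n", phys_bits);
	return 0;
}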
static void kvm_mmu_reset_all_pte_masks(void)
@ -1260,56 +1264,6 @@ static void unaccount_huge_nx_page(struct kvm *kvm, struct kvm_mmu_page *sp)
list_del(&sp->lpage_disallowed_link);
}
static bool __mmu_gfn_lpage_is_disallowed(gfn_t gfn, int level,
struct kvm_memory_slot *slot)
{
struct kvm_lpage_info *linfo;
if (slot) {
linfo = lpage_info_slot(gfn, slot, level);
return !!linfo->disallow_lpage;
}
return true;
}
static bool mmu_gfn_lpage_is_disallowed(struct kvm_vcpu *vcpu, gfn_t gfn,
int level)
{
struct kvm_memory_slot *slot;
slot = kvm_vcpu_gfn_to_memslot(vcpu, gfn);
return __mmu_gfn_lpage_is_disallowed(gfn, level, slot);
}
static int host_mapping_level(struct kvm *kvm, gfn_t gfn)
{
unsigned long page_size;
int i, ret = 0;
page_size = kvm_host_page_size(kvm, gfn);
for (i = PT_PAGE_TABLE_LEVEL; i <= PT_MAX_HUGEPAGE_LEVEL; ++i) {
if (page_size >= KVM_HPAGE_SIZE(i))
ret = i;
else
break;
}
return ret;
}
static inline bool memslot_valid_for_gpte(struct kvm_memory_slot *slot,
bool no_dirty_log)
{
if (!slot || slot->flags & KVM_MEMSLOT_INVALID)
return false;
if (no_dirty_log && slot->dirty_bitmap)
return false;
return true;
}
static struct kvm_memory_slot *
gfn_to_memslot_dirty_bitmap(struct kvm_vcpu *vcpu, gfn_t gfn,
bool no_dirty_log)
@ -1317,40 +1271,14 @@ gfn_to_memslot_dirty_bitmap(struct kvm_vcpu *vcpu, gfn_t gfn,
struct kvm_memory_slot *slot;
slot = kvm_vcpu_gfn_to_memslot(vcpu, gfn);
if (!memslot_valid_for_gpte(slot, no_dirty_log))
slot = NULL;
if (!slot || slot->flags & KVM_MEMSLOT_INVALID)
return NULL;
if (no_dirty_log && slot->dirty_bitmap)
return NULL;
return slot;
}
static int mapping_level(struct kvm_vcpu *vcpu, gfn_t large_gfn,
bool *force_pt_level)
{
int host_level, level, max_level;
struct kvm_memory_slot *slot;
if (unlikely(*force_pt_level))
return PT_PAGE_TABLE_LEVEL;
slot = kvm_vcpu_gfn_to_memslot(vcpu, large_gfn);
*force_pt_level = !memslot_valid_for_gpte(slot, true);
if (unlikely(*force_pt_level))
return PT_PAGE_TABLE_LEVEL;
host_level = host_mapping_level(vcpu->kvm, large_gfn);
if (host_level == PT_PAGE_TABLE_LEVEL)
return host_level;
max_level = min(kvm_x86_ops->get_lpage_level(), host_level);
for (level = PT_DIRECTORY_LEVEL; level <= max_level; ++level)
if (__mmu_gfn_lpage_is_disallowed(large_gfn, level, slot))
break;
return level - 1;
}
/*
* About rmap_head encoding:
*
@ -1410,7 +1338,7 @@ pte_list_desc_remove_entry(struct kvm_rmap_head *rmap_head,
if (j != 0)
return;
if (!prev_desc && !desc->more)
rmap_head->val = (unsigned long)desc->sptes[0];
rmap_head->val = 0;
else
if (prev_desc)
prev_desc->more = desc->more;
@ -1525,7 +1453,7 @@ struct rmap_iterator {
/*
* Iteration must be started by this function. This should also be used after
* removing/dropping sptes from the rmap link because in such cases the
* information in the itererator may not be valid.
* information in the iterator may not be valid.
*
* Returns sptep if found, NULL otherwise.
*/
@ -2899,6 +2827,26 @@ static bool prepare_zap_oldest_mmu_page(struct kvm *kvm,
return kvm_mmu_prepare_zap_page(kvm, sp, invalid_list);
}
static int make_mmu_pages_available(struct kvm_vcpu *vcpu)
{
LIST_HEAD(invalid_list);
if (likely(kvm_mmu_available_pages(vcpu->kvm) >= KVM_MIN_FREE_MMU_PAGES))
return 0;
while (kvm_mmu_available_pages(vcpu->kvm) < KVM_REFILL_PAGES) {
if (!prepare_zap_oldest_mmu_page(vcpu->kvm, &invalid_list))
break;
++vcpu->kvm->stat.mmu_recycled;
}
kvm_mmu_commit_zap_page(vcpu->kvm, &invalid_list);
if (!kvm_mmu_available_pages(vcpu->kvm))
return -ENOSPC;
return 0;
}
/*
* Changing the number of mmu pages allocated to the vm
* Note: if goal_nr_mmu_pages is too small, you will get dead lock
@ -3099,17 +3047,6 @@ static int set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
spte |= (u64)pfn << PAGE_SHIFT;
if (pte_access & ACC_WRITE_MASK) {
/*
* Other vcpu creates new sp in the window between
* mapping_level() and acquiring mmu-lock. We can
* allow guest to retry the access, the mapping can
* be fixed if guest refault.
*/
if (level > PT_PAGE_TABLE_LEVEL &&
mmu_gfn_lpage_is_disallowed(vcpu, gfn, level))
goto done;
spte |= PT_WRITABLE_MASK | SPTE_MMU_WRITEABLE;
/*
@ -3141,7 +3078,6 @@ static int set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
set_pte:
if (mmu_spte_update(sptep, spte))
ret |= SET_SPTE_NEED_REMOTE_TLB_FLUSH;
done:
return ret;
}
@ -3294,6 +3230,83 @@ static void direct_pte_prefetch(struct kvm_vcpu *vcpu, u64 *sptep)
__direct_pte_prefetch(vcpu, sp, sptep);
}
static int host_pfn_mapping_level(struct kvm_vcpu *vcpu, gfn_t gfn,
kvm_pfn_t pfn, struct kvm_memory_slot *slot)
{
unsigned long hva;
pte_t *pte;
int level;
BUILD_BUG_ON(PT_PAGE_TABLE_LEVEL != (int)PG_LEVEL_4K ||
PT_DIRECTORY_LEVEL != (int)PG_LEVEL_2M ||
PT_PDPE_LEVEL != (int)PG_LEVEL_1G);
if (!PageCompound(pfn_to_page(pfn)) && !kvm_is_zone_device_pfn(pfn))
return PT_PAGE_TABLE_LEVEL;
/*
* Note, using the already-retrieved memslot and __gfn_to_hva_memslot()
* is not solely for performance, it's also necessary to avoid the
* "writable" check in __gfn_to_hva_many(), which will always fail on
* read-only memslots due to gfn_to_hva() assuming writes. Earlier
* page fault steps have already verified the guest isn't writing a
* read-only memslot.
*/
hva = __gfn_to_hva_memslot(slot, gfn);
pte = lookup_address_in_mm(vcpu->kvm->mm, hva, &level);
if (unlikely(!pte))
return PT_PAGE_TABLE_LEVEL;
return level;
}
static int kvm_mmu_hugepage_adjust(struct kvm_vcpu *vcpu, gfn_t gfn,
int max_level, kvm_pfn_t *pfnp)
{
struct kvm_memory_slot *slot;
struct kvm_lpage_info *linfo;
kvm_pfn_t pfn = *pfnp;
kvm_pfn_t mask;
int level;
if (unlikely(max_level == PT_PAGE_TABLE_LEVEL))
return PT_PAGE_TABLE_LEVEL;
if (is_error_noslot_pfn(pfn) || kvm_is_reserved_pfn(pfn))
return PT_PAGE_TABLE_LEVEL;
slot = gfn_to_memslot_dirty_bitmap(vcpu, gfn, true);
if (!slot)
return PT_PAGE_TABLE_LEVEL;
max_level = min(max_level, kvm_x86_ops->get_lpage_level());
for ( ; max_level > PT_PAGE_TABLE_LEVEL; max_level--) {
linfo = lpage_info_slot(gfn, slot, max_level);
if (!linfo->disallow_lpage)
break;
}
if (max_level == PT_PAGE_TABLE_LEVEL)
return PT_PAGE_TABLE_LEVEL;
level = host_pfn_mapping_level(vcpu, gfn, pfn, slot);
if (level == PT_PAGE_TABLE_LEVEL)
return level;
level = min(level, max_level);
/*
* mmu_notifier_retry() was successful and mmu_lock is held, so
* the pmd can't be split from under us.
*/
mask = KVM_PAGES_PER_HPAGE(level) - 1;
VM_BUG_ON((gfn & mask) != (pfn & mask));
*pfnp = pfn & ~mask;
return level;
}
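The alignment math at the end of kvm_mmu_hugepage_adjust() is worth spelling out: mask is the number of 4KiB pages per huge page minus one, the VM_BUG_ON checks that gfn and pfn share the same offset within the huge page, and the pfn is then rounded down to the huge-page base. A small arithmetic sketch with made-up frame numbers:

#include <stdio.h>
#include <stdint.h>
#include <assert.h>

int main(void)
{
	uint64_t pages_per_hpage = 512;		/* 2MiB / 4KiB */
	uint64_t mask = pages_per_hpage - 1;	/* 0x1ff */
	uint64_t gfn = 0x12345;
	uint64_t pfn = 0xabd45;			/* same low 9 bits as gfn */

	assert((gfn & mask) == (pfn & mask));	/* offsets within the huge page agree */
	printf("huge-page base pfn = %#llx\n",
	       (unsigned long long)(pfn & ~mask));	/* prints 0xabc00 */
	return 0;
}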
static void disallowed_hugepage_adjust(struct kvm_shadow_walk_iterator it,
gfn_t gfn, kvm_pfn_t *pfnp, int *levelp)
{
@ -3318,18 +3331,20 @@ static void disallowed_hugepage_adjust(struct kvm_shadow_walk_iterator it,
}
static int __direct_map(struct kvm_vcpu *vcpu, gpa_t gpa, int write,
int map_writable, int level, kvm_pfn_t pfn,
bool prefault, bool lpage_disallowed)
int map_writable, int max_level, kvm_pfn_t pfn,
bool prefault, bool account_disallowed_nx_lpage)
{
struct kvm_shadow_walk_iterator it;
struct kvm_mmu_page *sp;
int ret;
int level, ret;
gfn_t gfn = gpa >> PAGE_SHIFT;
gfn_t base_gfn = gfn;
if (!VALID_PAGE(vcpu->arch.mmu->root_hpa))
if (WARN_ON(!VALID_PAGE(vcpu->arch.mmu->root_hpa)))
return RET_PF_RETRY;
level = kvm_mmu_hugepage_adjust(vcpu, gfn, max_level, &pfn);
trace_kvm_mmu_spte_requested(gpa, level, pfn);
for_each_shadow_entry(vcpu, gpa, it) {
/*
@ -3348,7 +3363,7 @@ static int __direct_map(struct kvm_vcpu *vcpu, gpa_t gpa, int write,
it.level - 1, true, ACC_ALL);
link_shadow_page(vcpu, it.sptep, sp);
if (lpage_disallowed)
if (account_disallowed_nx_lpage)
account_huge_nx_page(vcpu->kvm, sp);
}
}
@ -3384,45 +3399,6 @@ static int kvm_handle_bad_page(struct kvm_vcpu *vcpu, gfn_t gfn, kvm_pfn_t pfn)
return -EFAULT;
}
static void transparent_hugepage_adjust(struct kvm_vcpu *vcpu,
gfn_t gfn, kvm_pfn_t *pfnp,
int *levelp)
{
kvm_pfn_t pfn = *pfnp;
int level = *levelp;
/*
* Check if it's a transparent hugepage. If this would be an
* hugetlbfs page, level wouldn't be set to
* PT_PAGE_TABLE_LEVEL and there would be no adjustment done
* here.
*/
if (!is_error_noslot_pfn(pfn) && !kvm_is_reserved_pfn(pfn) &&
!kvm_is_zone_device_pfn(pfn) && level == PT_PAGE_TABLE_LEVEL &&
PageTransCompoundMap(pfn_to_page(pfn)) &&
!mmu_gfn_lpage_is_disallowed(vcpu, gfn, PT_DIRECTORY_LEVEL)) {
unsigned long mask;
/*
* mmu_notifier_retry was successful and we hold the
* mmu_lock here, so the pmd can't become splitting
* from under us, and in turn
* __split_huge_page_refcount() can't run from under
* us and we can safely transfer the refcount from
* PG_tail to PG_head as we switch the pfn to tail to
* head.
*/
*levelp = level = PT_DIRECTORY_LEVEL;
mask = KVM_PAGES_PER_HPAGE(level) - 1;
VM_BUG_ON((gfn & mask) != (pfn & mask));
if (pfn & mask) {
kvm_release_pfn_clean(pfn);
pfn &= ~mask;
kvm_get_pfn(pfn);
*pfnp = pfn;
}
}
}
static bool handle_abnormal_pfn(struct kvm_vcpu *vcpu, gva_t gva, gfn_t gfn,
kvm_pfn_t pfn, unsigned access, int *ret_val)
{
@ -3528,7 +3504,7 @@ static bool is_access_allowed(u32 fault_err_code, u64 spte)
* - true: let the vcpu to access on the same address again.
* - false: let the real page fault path to fix it.
*/
static bool fast_page_fault(struct kvm_vcpu *vcpu, gva_t gva, int level,
static bool fast_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
u32 error_code)
{
struct kvm_shadow_walk_iterator iterator;
@ -3537,9 +3513,6 @@ static bool fast_page_fault(struct kvm_vcpu *vcpu, gva_t gva, int level,
u64 spte = 0ull;
uint retry_count = 0;
if (!VALID_PAGE(vcpu->arch.mmu->root_hpa))
return false;
if (!page_fault_can_be_fast(error_code))
return false;
@ -3548,9 +3521,8 @@ static bool fast_page_fault(struct kvm_vcpu *vcpu, gva_t gva, int level,
do {
u64 new_spte;
for_each_shadow_entry_lockless(vcpu, gva, iterator, spte)
if (!is_shadow_present_pte(spte) ||
iterator.level < level)
for_each_shadow_entry_lockless(vcpu, cr2_or_gpa, iterator, spte)
if (!is_shadow_present_pte(spte))
break;
sp = page_header(__pa(iterator.sptep));
@ -3626,71 +3598,13 @@ static bool fast_page_fault(struct kvm_vcpu *vcpu, gva_t gva, int level,
} while (true);
trace_fast_page_fault(vcpu, gva, error_code, iterator.sptep,
trace_fast_page_fault(vcpu, cr2_or_gpa, error_code, iterator.sptep,
spte, fault_handled);
walk_shadow_page_lockless_end(vcpu);
return fault_handled;
}
static bool try_async_pf(struct kvm_vcpu *vcpu, bool prefault, gfn_t gfn,
gva_t gva, kvm_pfn_t *pfn, bool write, bool *writable);
static int make_mmu_pages_available(struct kvm_vcpu *vcpu);
static int nonpaging_map(struct kvm_vcpu *vcpu, gva_t v, u32 error_code,
gfn_t gfn, bool prefault)
{
int r;
int level;
bool force_pt_level;
kvm_pfn_t pfn;
unsigned long mmu_seq;
bool map_writable, write = error_code & PFERR_WRITE_MASK;
bool lpage_disallowed = (error_code & PFERR_FETCH_MASK) &&
is_nx_huge_page_enabled();
force_pt_level = lpage_disallowed;
level = mapping_level(vcpu, gfn, &force_pt_level);
if (likely(!force_pt_level)) {
/*
* This path builds a PAE pagetable - so we can map
* 2mb pages at maximum. Therefore check if the level
* is larger than that.
*/
if (level > PT_DIRECTORY_LEVEL)
level = PT_DIRECTORY_LEVEL;
gfn &= ~(KVM_PAGES_PER_HPAGE(level) - 1);
}
if (fast_page_fault(vcpu, v, level, error_code))
return RET_PF_RETRY;
mmu_seq = vcpu->kvm->mmu_notifier_seq;
smp_rmb();
if (try_async_pf(vcpu, prefault, gfn, v, &pfn, write, &map_writable))
return RET_PF_RETRY;
if (handle_abnormal_pfn(vcpu, v, gfn, pfn, ACC_ALL, &r))
return r;
r = RET_PF_RETRY;
spin_lock(&vcpu->kvm->mmu_lock);
if (mmu_notifier_retry(vcpu->kvm, mmu_seq))
goto out_unlock;
if (make_mmu_pages_available(vcpu) < 0)
goto out_unlock;
if (likely(!force_pt_level))
transparent_hugepage_adjust(vcpu, gfn, &pfn, &level);
r = __direct_map(vcpu, v, write, map_writable, level, pfn,
prefault, false);
out_unlock:
spin_unlock(&vcpu->kvm->mmu_lock);
kvm_release_pfn_clean(pfn);
return r;
}
static void mmu_free_root_page(struct kvm *kvm, hpa_t *root_hpa,
struct list_head *invalid_list)
{
@ -3981,7 +3895,7 @@ void kvm_mmu_sync_roots(struct kvm_vcpu *vcpu)
}
EXPORT_SYMBOL_GPL(kvm_mmu_sync_roots);
static gpa_t nonpaging_gva_to_gpa(struct kvm_vcpu *vcpu, gva_t vaddr,
static gpa_t nonpaging_gva_to_gpa(struct kvm_vcpu *vcpu, gpa_t vaddr,
u32 access, struct x86_exception *exception)
{
if (exception)
@ -3989,7 +3903,7 @@ static gpa_t nonpaging_gva_to_gpa(struct kvm_vcpu *vcpu, gva_t vaddr,
return vaddr;
}
static gpa_t nonpaging_gva_to_gpa_nested(struct kvm_vcpu *vcpu, gva_t vaddr,
static gpa_t nonpaging_gva_to_gpa_nested(struct kvm_vcpu *vcpu, gpa_t vaddr,
u32 access,
struct x86_exception *exception)
{
@ -4001,20 +3915,14 @@ static gpa_t nonpaging_gva_to_gpa_nested(struct kvm_vcpu *vcpu, gva_t vaddr,
static bool
__is_rsvd_bits_set(struct rsvd_bits_validate *rsvd_check, u64 pte, int level)
{
int bit7 = (pte >> 7) & 1, low6 = pte & 0x3f;
int bit7 = (pte >> 7) & 1;
return (pte & rsvd_check->rsvd_bits_mask[bit7][level-1]) |
((rsvd_check->bad_mt_xwr & (1ull << low6)) != 0);
return pte & rsvd_check->rsvd_bits_mask[bit7][level-1];
}
static bool is_rsvd_bits_set(struct kvm_mmu *mmu, u64 gpte, int level)
static bool __is_bad_mt_xwr(struct rsvd_bits_validate *rsvd_check, u64 pte)
{
return __is_rsvd_bits_set(&mmu->guest_rsvd_check, gpte, level);
}
static bool is_shadow_zero_bits_set(struct kvm_mmu *mmu, u64 spte, int level)
{
return __is_rsvd_bits_set(&mmu->shadow_zero_check, spte, level);
return rsvd_check->bad_mt_xwr & BIT_ULL(pte & 0x3f);
}
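The reserved-bit walk is now split into two pieces: __is_rsvd_bits_set() indexes rsvd_bits_mask by PTE bit 7 and the level, while __is_bad_mt_xwr() looks up the PTE's low 6 bits (memtype plus X/W/R) in a 64-bit bitmap of illegal combinations; callers combine them with a bitwise OR to avoid an extra branch. A stand-alone sketch of the bitmap-indexed check is below; the bitmap contents are invented for the demo and do not reflect real EPT rules.

#include <stdio.h>
#include <stdint.h>

/* One bit per possible value of the PTE's low 6 bits; set bits are illegal. */
static int is_bad_mt_xwr(uint64_t bad_mt_xwr, uint64_t pte)
{
	return (bad_mt_xwr >> (pte & 0x3f)) & 1;
}

int main(void)
{
	uint64_t bad_mt_xwr = 0;

	bad_mt_xwr |= 1ull << 0x28;		/* mark one combination illegal */
	printf("pte low bits 0x28 -> %s\n",
	       is_bad_mt_xwr(bad_mt_xwr, 0x28) ? "reserved" : "ok");
	printf("pte low bits 0x07 -> %s\n",
	       is_bad_mt_xwr(bad_mt_xwr, 0x07) ? "reserved" : "ok");
	return 0;
}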
static bool mmio_info_in_cache(struct kvm_vcpu *vcpu, u64 addr, bool direct)
@ -4038,11 +3946,11 @@ walk_shadow_page_get_mmio_spte(struct kvm_vcpu *vcpu, u64 addr, u64 *sptep)
{
struct kvm_shadow_walk_iterator iterator;
u64 sptes[PT64_ROOT_MAX_LEVEL], spte = 0ull;
struct rsvd_bits_validate *rsvd_check;
int root, leaf;
bool reserved = false;
if (!VALID_PAGE(vcpu->arch.mmu->root_hpa))
goto exit;
rsvd_check = &vcpu->arch.mmu->shadow_zero_check;
walk_shadow_page_lockless_begin(vcpu);
@ -4058,8 +3966,13 @@ walk_shadow_page_get_mmio_spte(struct kvm_vcpu *vcpu, u64 addr, u64 *sptep)
if (!is_shadow_present_pte(spte))
break;
reserved |= is_shadow_zero_bits_set(vcpu->arch.mmu, spte,
iterator.level);
/*
* Use a bitwise-OR instead of a logical-OR to aggregate the
* reserved bit and EPT's invalid memtype/XWR checks to avoid
* adding a Jcc in the loop.
*/
reserved |= __is_bad_mt_xwr(rsvd_check, spte) |
__is_rsvd_bits_set(rsvd_check, spte, iterator.level);
}
walk_shadow_page_lockless_end(vcpu);
@ -4073,7 +3986,7 @@ walk_shadow_page_get_mmio_spte(struct kvm_vcpu *vcpu, u64 addr, u64 *sptep)
root--;
}
}
exit:
*sptep = spte;
return reserved;
}
@ -4137,9 +4050,6 @@ static void shadow_page_table_clear_flood(struct kvm_vcpu *vcpu, gva_t addr)
struct kvm_shadow_walk_iterator iterator;
u64 spte;
if (!VALID_PAGE(vcpu->arch.mmu->root_hpa))
return;
walk_shadow_page_lockless_begin(vcpu);
for_each_shadow_entry_lockless(vcpu, addr, iterator, spte) {
clear_sp_write_flooding_count(iterator.sptep);
@ -4149,29 +4059,8 @@ static void shadow_page_table_clear_flood(struct kvm_vcpu *vcpu, gva_t addr)
walk_shadow_page_lockless_end(vcpu);
}
static int nonpaging_page_fault(struct kvm_vcpu *vcpu, gva_t gva,
u32 error_code, bool prefault)
{
gfn_t gfn = gva >> PAGE_SHIFT;
int r;
pgprintk("%s: gva %lx error %x\n", __func__, gva, error_code);
if (page_fault_handle_page_track(vcpu, error_code, gfn))
return RET_PF_EMULATE;
r = mmu_topup_memory_caches(vcpu);
if (r)
return r;
MMU_WARN_ON(!VALID_PAGE(vcpu->arch.mmu->root_hpa));
return nonpaging_map(vcpu, gva & PAGE_MASK,
error_code, gfn, prefault);
}
static int kvm_arch_setup_async_pf(struct kvm_vcpu *vcpu, gva_t gva, gfn_t gfn)
static int kvm_arch_setup_async_pf(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
gfn_t gfn)
{
struct kvm_arch_async_pf arch;
@ -4180,11 +4069,13 @@ static int kvm_arch_setup_async_pf(struct kvm_vcpu *vcpu, gva_t gva, gfn_t gfn)
arch.direct_map = vcpu->arch.mmu->direct_map;
arch.cr3 = vcpu->arch.mmu->get_cr3(vcpu);
return kvm_setup_async_pf(vcpu, gva, kvm_vcpu_gfn_to_hva(vcpu, gfn), &arch);
return kvm_setup_async_pf(vcpu, cr2_or_gpa,
kvm_vcpu_gfn_to_hva(vcpu, gfn), &arch);
}
static bool try_async_pf(struct kvm_vcpu *vcpu, bool prefault, gfn_t gfn,
gva_t gva, kvm_pfn_t *pfn, bool write, bool *writable)
gpa_t cr2_or_gpa, kvm_pfn_t *pfn, bool write,
bool *writable)
{
struct kvm_memory_slot *slot;
bool async;
@ -4204,12 +4095,12 @@ static bool try_async_pf(struct kvm_vcpu *vcpu, bool prefault, gfn_t gfn,
return false; /* *pfn has correct page already */
if (!prefault && kvm_can_do_async_pf(vcpu)) {
trace_kvm_try_async_get_page(gva, gfn);
trace_kvm_try_async_get_page(cr2_or_gpa, gfn);
if (kvm_find_async_pf_gfn(vcpu, gfn)) {
trace_kvm_async_pf_doublefault(gva, gfn);
trace_kvm_async_pf_doublefault(cr2_or_gpa, gfn);
kvm_make_request(KVM_REQ_APF_HALT, vcpu);
return true;
} else if (kvm_arch_setup_async_pf(vcpu, gva, gfn))
} else if (kvm_arch_setup_async_pf(vcpu, cr2_or_gpa, gfn))
return true;
}
@ -4217,11 +4108,77 @@ static bool try_async_pf(struct kvm_vcpu *vcpu, bool prefault, gfn_t gfn,
return false;
}
static int direct_page_fault(struct kvm_vcpu *vcpu, gpa_t gpa, u32 error_code,
bool prefault, int max_level, bool is_tdp)
{
bool write = error_code & PFERR_WRITE_MASK;
bool exec = error_code & PFERR_FETCH_MASK;
bool lpage_disallowed = exec && is_nx_huge_page_enabled();
bool map_writable;
gfn_t gfn = gpa >> PAGE_SHIFT;
unsigned long mmu_seq;
kvm_pfn_t pfn;
int r;
if (page_fault_handle_page_track(vcpu, error_code, gfn))
return RET_PF_EMULATE;
r = mmu_topup_memory_caches(vcpu);
if (r)
return r;
if (lpage_disallowed)
max_level = PT_PAGE_TABLE_LEVEL;
if (fast_page_fault(vcpu, gpa, error_code))
return RET_PF_RETRY;
mmu_seq = vcpu->kvm->mmu_notifier_seq;
smp_rmb();
if (try_async_pf(vcpu, prefault, gfn, gpa, &pfn, write, &map_writable))
return RET_PF_RETRY;
if (handle_abnormal_pfn(vcpu, is_tdp ? 0 : gpa, gfn, pfn, ACC_ALL, &r))
return r;
r = RET_PF_RETRY;
spin_lock(&vcpu->kvm->mmu_lock);
if (mmu_notifier_retry(vcpu->kvm, mmu_seq))
goto out_unlock;
if (make_mmu_pages_available(vcpu) < 0)
goto out_unlock;
r = __direct_map(vcpu, gpa, write, map_writable, max_level, pfn,
prefault, is_tdp && lpage_disallowed);
out_unlock:
spin_unlock(&vcpu->kvm->mmu_lock);
kvm_release_pfn_clean(pfn);
return r;
}
static int nonpaging_page_fault(struct kvm_vcpu *vcpu, gpa_t gpa,
u32 error_code, bool prefault)
{
pgprintk("%s: gva %lx error %x\n", __func__, gpa, error_code);
/* This path builds a PAE pagetable, we can map 2mb pages at maximum. */
return direct_page_fault(vcpu, gpa & PAGE_MASK, error_code, prefault,
PT_DIRECTORY_LEVEL, false);
}
int kvm_handle_page_fault(struct kvm_vcpu *vcpu, u64 error_code,
u64 fault_address, char *insn, int insn_len)
{
int r = 1;
#ifndef CONFIG_X86_64
/* A 64-bit CR2 should be impossible on 32-bit KVM. */
if (WARN_ON_ONCE(fault_address >> 32))
return -EFAULT;
#endif
vcpu->arch.l1tf_flush_l1d = true;
switch (vcpu->arch.apf.host_apf_reason) {
default:
@ -4249,76 +4206,23 @@ int kvm_handle_page_fault(struct kvm_vcpu *vcpu, u64 error_code,
}
EXPORT_SYMBOL_GPL(kvm_handle_page_fault);
static bool
check_hugepage_cache_consistency(struct kvm_vcpu *vcpu, gfn_t gfn, int level)
{
int page_num = KVM_PAGES_PER_HPAGE(level);
gfn &= ~(page_num - 1);
return kvm_mtrr_check_gfn_range_consistency(vcpu, gfn, page_num);
}
static int tdp_page_fault(struct kvm_vcpu *vcpu, gva_t gpa, u32 error_code,
static int tdp_page_fault(struct kvm_vcpu *vcpu, gpa_t gpa, u32 error_code,
bool prefault)
{
kvm_pfn_t pfn;
int r;
int level;
bool force_pt_level;
gfn_t gfn = gpa >> PAGE_SHIFT;
unsigned long mmu_seq;
int write = error_code & PFERR_WRITE_MASK;
bool map_writable;
bool lpage_disallowed = (error_code & PFERR_FETCH_MASK) &&
is_nx_huge_page_enabled();
int max_level;
MMU_WARN_ON(!VALID_PAGE(vcpu->arch.mmu->root_hpa));
for (max_level = PT_MAX_HUGEPAGE_LEVEL;
max_level > PT_PAGE_TABLE_LEVEL;
max_level--) {
int page_num = KVM_PAGES_PER_HPAGE(max_level);
gfn_t base = (gpa >> PAGE_SHIFT) & ~(page_num - 1);
if (page_fault_handle_page_track(vcpu, error_code, gfn))
return RET_PF_EMULATE;
r = mmu_topup_memory_caches(vcpu);
if (r)
return r;
force_pt_level =
lpage_disallowed ||
!check_hugepage_cache_consistency(vcpu, gfn, PT_DIRECTORY_LEVEL);
level = mapping_level(vcpu, gfn, &force_pt_level);
if (likely(!force_pt_level)) {
if (level > PT_DIRECTORY_LEVEL &&
!check_hugepage_cache_consistency(vcpu, gfn, level))
level = PT_DIRECTORY_LEVEL;
gfn &= ~(KVM_PAGES_PER_HPAGE(level) - 1);
if (kvm_mtrr_check_gfn_range_consistency(vcpu, base, page_num))
break;
}
if (fast_page_fault(vcpu, gpa, level, error_code))
return RET_PF_RETRY;
mmu_seq = vcpu->kvm->mmu_notifier_seq;
smp_rmb();
if (try_async_pf(vcpu, prefault, gfn, gpa, &pfn, write, &map_writable))
return RET_PF_RETRY;
if (handle_abnormal_pfn(vcpu, 0, gfn, pfn, ACC_ALL, &r))
return r;
r = RET_PF_RETRY;
spin_lock(&vcpu->kvm->mmu_lock);
if (mmu_notifier_retry(vcpu->kvm, mmu_seq))
goto out_unlock;
if (make_mmu_pages_available(vcpu) < 0)
goto out_unlock;
if (likely(!force_pt_level))
transparent_hugepage_adjust(vcpu, gfn, &pfn, &level);
r = __direct_map(vcpu, gpa, write, map_writable, level, pfn,
prefault, lpage_disallowed);
out_unlock:
spin_unlock(&vcpu->kvm->mmu_lock);
kvm_release_pfn_clean(pfn);
return r;
return direct_page_fault(vcpu, gpa, error_code, prefault,
max_level, true);
}
static void nonpaging_init_context(struct kvm_vcpu *vcpu,
@ -5496,47 +5400,30 @@ int kvm_mmu_unprotect_page_virt(struct kvm_vcpu *vcpu, gva_t gva)
}
EXPORT_SYMBOL_GPL(kvm_mmu_unprotect_page_virt);
static int make_mmu_pages_available(struct kvm_vcpu *vcpu)
{
LIST_HEAD(invalid_list);
if (likely(kvm_mmu_available_pages(vcpu->kvm) >= KVM_MIN_FREE_MMU_PAGES))
return 0;
while (kvm_mmu_available_pages(vcpu->kvm) < KVM_REFILL_PAGES) {
if (!prepare_zap_oldest_mmu_page(vcpu->kvm, &invalid_list))
break;
++vcpu->kvm->stat.mmu_recycled;
}
kvm_mmu_commit_zap_page(vcpu->kvm, &invalid_list);
if (!kvm_mmu_available_pages(vcpu->kvm))
return -ENOSPC;
return 0;
}
int kvm_mmu_page_fault(struct kvm_vcpu *vcpu, gva_t cr2, u64 error_code,
int kvm_mmu_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa, u64 error_code,
void *insn, int insn_len)
{
int r, emulation_type = 0;
bool direct = vcpu->arch.mmu->direct_map;
if (WARN_ON(!VALID_PAGE(vcpu->arch.mmu->root_hpa)))
return RET_PF_RETRY;
/* With shadow page tables, fault_address contains a GVA or nGPA. */
if (vcpu->arch.mmu->direct_map) {
vcpu->arch.gpa_available = true;
vcpu->arch.gpa_val = cr2;
vcpu->arch.gpa_val = cr2_or_gpa;
}
r = RET_PF_INVALID;
if (unlikely(error_code & PFERR_RSVD_MASK)) {
r = handle_mmio_page_fault(vcpu, cr2, direct);
r = handle_mmio_page_fault(vcpu, cr2_or_gpa, direct);
if (r == RET_PF_EMULATE)
goto emulate;
}
if (r == RET_PF_INVALID) {
r = vcpu->arch.mmu->page_fault(vcpu, cr2,
r = vcpu->arch.mmu->page_fault(vcpu, cr2_or_gpa,
lower_32_bits(error_code),
false);
WARN_ON(r == RET_PF_INVALID);
@ -5556,7 +5443,7 @@ int kvm_mmu_page_fault(struct kvm_vcpu *vcpu, gva_t cr2, u64 error_code,
*/
if (vcpu->arch.mmu->direct_map &&
(error_code & PFERR_NESTED_GUEST_PAGE) == PFERR_NESTED_GUEST_PAGE) {
kvm_mmu_unprotect_page(vcpu->kvm, gpa_to_gfn(cr2));
kvm_mmu_unprotect_page(vcpu->kvm, gpa_to_gfn(cr2_or_gpa));
return 1;
}
@ -5571,7 +5458,7 @@ int kvm_mmu_page_fault(struct kvm_vcpu *vcpu, gva_t cr2, u64 error_code,
* explicitly shadowing L1's page tables, i.e. unprotecting something
* for L1 isn't going to magically fix whatever issue caused L2 to fail.
*/
if (!mmio_info_in_cache(vcpu, cr2, direct) && !is_guest_mode(vcpu))
if (!mmio_info_in_cache(vcpu, cr2_or_gpa, direct) && !is_guest_mode(vcpu))
emulation_type = EMULTYPE_ALLOW_RETRY;
emulate:
/*
@ -5586,7 +5473,7 @@ emulate:
return 1;
}
return x86_emulate_instruction(vcpu, cr2, emulation_type, insn,
return x86_emulate_instruction(vcpu, cr2_or_gpa, emulation_type, insn,
insn_len);
}
EXPORT_SYMBOL_GPL(kvm_mmu_page_fault);
@ -6015,8 +5902,8 @@ restart:
* mapping if the indirect sp has level = 1.
*/
if (sp->role.direct && !kvm_is_reserved_pfn(pfn) &&
!kvm_is_zone_device_pfn(pfn) &&
PageTransCompoundMap(pfn_to_page(pfn))) {
(kvm_is_zone_device_pfn(pfn) ||
PageCompound(pfn_to_page(pfn)))) {
pte_list_remove(rmap_head, sptep);
if (kvm_available_flush_tlb_with_range())
@ -6249,7 +6136,7 @@ static void kvm_set_mmio_spte_mask(void)
* If reserved bit is not supported, clear the present bit to disable
* mmio page fault.
*/
if (IS_ENABLED(CONFIG_X86_64) && shadow_phys_bits == 52)
if (shadow_phys_bits == 52)
mask &= ~1ull;
kvm_mmu_set_mmio_spte_mask(mask, mask, ACC_WRITE_MASK | ACC_USER_MASK);


@ -128,6 +128,21 @@ static inline int FNAME(is_present_gpte)(unsigned long pte)
#endif
}
static bool FNAME(is_bad_mt_xwr)(struct rsvd_bits_validate *rsvd_check, u64 gpte)
{
#if PTTYPE != PTTYPE_EPT
return false;
#else
return __is_bad_mt_xwr(rsvd_check, gpte);
#endif
}
static bool FNAME(is_rsvd_bits_set)(struct kvm_mmu *mmu, u64 gpte, int level)
{
return __is_rsvd_bits_set(&mmu->guest_rsvd_check, gpte, level) ||
FNAME(is_bad_mt_xwr)(&mmu->guest_rsvd_check, gpte);
}
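For readers unfamiliar with the FNAME() convention used throughout this file: paging_tmpl.h is a template that mmu.c includes several times with different PTTYPE values, and FNAME() pastes the page-table flavor into each function name, so the one template produces paging64_, paging32_ and ept_ variants of every walker. A rough, from-memory sketch of the mechanism (not a verbatim quote of the sources):

/* paging_tmpl.h (sketch) */
#if PTTYPE == 64
#define FNAME(name) paging##64_##name
#elif PTTYPE == 32
#define FNAME(name) paging##32_##name
#elif PTTYPE == PTTYPE_EPT
#define FNAME(name) ept_##name
#endif

/* mmu.c (sketch): each include stamps out one full set of walkers. */
#define PTTYPE 64
#include "paging_tmpl.h"
#undef PTTYPE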
static int FNAME(cmpxchg_gpte)(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
pt_element_t __user *ptep_user, unsigned index,
pt_element_t orig_pte, pt_element_t new_pte)
@ -175,9 +190,6 @@ static bool FNAME(prefetch_invalid_gpte)(struct kvm_vcpu *vcpu,
struct kvm_mmu_page *sp, u64 *spte,
u64 gpte)
{
if (is_rsvd_bits_set(vcpu->arch.mmu, gpte, PT_PAGE_TABLE_LEVEL))
goto no_present;
if (!FNAME(is_present_gpte)(gpte))
goto no_present;
@ -186,6 +198,9 @@ static bool FNAME(prefetch_invalid_gpte)(struct kvm_vcpu *vcpu,
!(gpte & PT_GUEST_ACCESSED_MASK))
goto no_present;
if (FNAME(is_rsvd_bits_set)(vcpu->arch.mmu, gpte, PT_PAGE_TABLE_LEVEL))
goto no_present;
return false;
no_present:
@ -291,11 +306,11 @@ static inline unsigned FNAME(gpte_pkeys)(struct kvm_vcpu *vcpu, u64 gpte)
}
/*
* Fetch a guest pte for a guest virtual address
* Fetch a guest pte for a guest virtual address, or for an L2's GPA.
*/
static int FNAME(walk_addr_generic)(struct guest_walker *walker,
struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
gva_t addr, u32 access)
gpa_t addr, u32 access)
{
int ret;
pt_element_t pte;
@ -400,7 +415,7 @@ retry_walk:
if (unlikely(!FNAME(is_present_gpte)(pte)))
goto error;
if (unlikely(is_rsvd_bits_set(mmu, pte, walker->level))) {
if (unlikely(FNAME(is_rsvd_bits_set)(mmu, pte, walker->level))) {
errcode = PFERR_RSVD_MASK | PFERR_PRESENT_MASK;
goto error;
}
@ -496,7 +511,7 @@ error:
}
static int FNAME(walk_addr)(struct guest_walker *walker,
struct kvm_vcpu *vcpu, gva_t addr, u32 access)
struct kvm_vcpu *vcpu, gpa_t addr, u32 access)
{
return FNAME(walk_addr_generic)(walker, vcpu, vcpu->arch.mmu, addr,
access);
@ -611,17 +626,17 @@ static void FNAME(pte_prefetch)(struct kvm_vcpu *vcpu, struct guest_walker *gw,
* If the guest tries to write a write-protected page, we need to
* emulate this operation, return 1 to indicate this case.
*/
static int FNAME(fetch)(struct kvm_vcpu *vcpu, gva_t addr,
static int FNAME(fetch)(struct kvm_vcpu *vcpu, gpa_t addr,
struct guest_walker *gw,
int write_fault, int hlevel,
int write_fault, int max_level,
kvm_pfn_t pfn, bool map_writable, bool prefault,
bool lpage_disallowed)
{
struct kvm_mmu_page *sp = NULL;
struct kvm_shadow_walk_iterator it;
unsigned direct_access, access = gw->pt_access;
int top_level, ret;
gfn_t gfn, base_gfn;
int top_level, hlevel, ret;
gfn_t base_gfn = gw->gfn;
direct_access = gw->pte_access;
@ -637,7 +652,7 @@ static int FNAME(fetch)(struct kvm_vcpu *vcpu, gva_t addr,
if (FNAME(gpte_changed)(vcpu, gw, top_level))
goto out_gpte_changed;
if (!VALID_PAGE(vcpu->arch.mmu->root_hpa))
if (WARN_ON(!VALID_PAGE(vcpu->arch.mmu->root_hpa)))
goto out_gpte_changed;
for (shadow_walk_init(&it, vcpu, addr);
@ -666,12 +681,7 @@ static int FNAME(fetch)(struct kvm_vcpu *vcpu, gva_t addr,
link_shadow_page(vcpu, it.sptep, sp);
}
/*
* FNAME(page_fault) might have clobbered the bottom bits of
* gw->gfn, restore them from the virtual address.
*/
gfn = gw->gfn | ((addr & PT_LVL_OFFSET_MASK(gw->level)) >> PAGE_SHIFT);
base_gfn = gfn;
hlevel = kvm_mmu_hugepage_adjust(vcpu, gw->gfn, max_level, &pfn);
trace_kvm_mmu_spte_requested(addr, gw->level, pfn);
@ -682,9 +692,9 @@ static int FNAME(fetch)(struct kvm_vcpu *vcpu, gva_t addr,
* We cannot overwrite existing page tables with an NX
* large page, as the leaf could be executable.
*/
disallowed_hugepage_adjust(it, gfn, &pfn, &hlevel);
disallowed_hugepage_adjust(it, gw->gfn, &pfn, &hlevel);
base_gfn = gfn & ~(KVM_PAGES_PER_HPAGE(it.level) - 1);
base_gfn = gw->gfn & ~(KVM_PAGES_PER_HPAGE(it.level) - 1);
if (it.level == hlevel)
break;
@ -765,7 +775,7 @@ FNAME(is_self_change_mapping)(struct kvm_vcpu *vcpu,
* Returns: 1 if we need to emulate the instruction, 0 otherwise, or
* a negative value on error.
*/
static int FNAME(page_fault)(struct kvm_vcpu *vcpu, gva_t addr, u32 error_code,
static int FNAME(page_fault)(struct kvm_vcpu *vcpu, gpa_t addr, u32 error_code,
bool prefault)
{
int write_fault = error_code & PFERR_WRITE_MASK;
@ -773,12 +783,11 @@ static int FNAME(page_fault)(struct kvm_vcpu *vcpu, gva_t addr, u32 error_code,
struct guest_walker walker;
int r;
kvm_pfn_t pfn;
int level = PT_PAGE_TABLE_LEVEL;
unsigned long mmu_seq;
bool map_writable, is_self_change_mapping;
bool lpage_disallowed = (error_code & PFERR_FETCH_MASK) &&
is_nx_huge_page_enabled();
bool force_pt_level = lpage_disallowed;
int max_level;
pgprintk("%s: addr %lx err %x\n", __func__, addr, error_code);
@ -818,14 +827,10 @@ static int FNAME(page_fault)(struct kvm_vcpu *vcpu, gva_t addr, u32 error_code,
is_self_change_mapping = FNAME(is_self_change_mapping)(vcpu,
&walker, user_fault, &vcpu->arch.write_fault_to_shadow_pgtable);
if (walker.level >= PT_DIRECTORY_LEVEL && !is_self_change_mapping) {
level = mapping_level(vcpu, walker.gfn, &force_pt_level);
if (likely(!force_pt_level)) {
level = min(walker.level, level);
walker.gfn = walker.gfn & ~(KVM_PAGES_PER_HPAGE(level) - 1);
}
} else
force_pt_level = true;
if (lpage_disallowed || is_self_change_mapping)
max_level = PT_PAGE_TABLE_LEVEL;
else
max_level = walker.level;
mmu_seq = vcpu->kvm->mmu_notifier_seq;
smp_rmb();
@ -865,10 +870,8 @@ static int FNAME(page_fault)(struct kvm_vcpu *vcpu, gva_t addr, u32 error_code,
kvm_mmu_audit(vcpu, AUDIT_PRE_PAGE_FAULT);
if (make_mmu_pages_available(vcpu) < 0)
goto out_unlock;
if (!force_pt_level)
transparent_hugepage_adjust(vcpu, walker.gfn, &pfn, &level);
r = FNAME(fetch)(vcpu, addr, &walker, write_fault,
level, pfn, map_writable, prefault, lpage_disallowed);
r = FNAME(fetch)(vcpu, addr, &walker, write_fault, max_level, pfn,
map_writable, prefault, lpage_disallowed);
kvm_mmu_audit(vcpu, AUDIT_POST_PAGE_FAULT);
out_unlock:
@ -945,18 +948,19 @@ static void FNAME(invlpg)(struct kvm_vcpu *vcpu, gva_t gva, hpa_t root_hpa)
spin_unlock(&vcpu->kvm->mmu_lock);
}
static gpa_t FNAME(gva_to_gpa)(struct kvm_vcpu *vcpu, gva_t vaddr, u32 access,
/* Note, @addr is a GPA when gva_to_gpa() translates an L2 GPA to an L1 GPA. */
static gpa_t FNAME(gva_to_gpa)(struct kvm_vcpu *vcpu, gpa_t addr, u32 access,
struct x86_exception *exception)
{
struct guest_walker walker;
gpa_t gpa = UNMAPPED_GVA;
int r;
r = FNAME(walk_addr)(&walker, vcpu, vaddr, access);
r = FNAME(walk_addr)(&walker, vcpu, addr, access);
if (r) {
gpa = gfn_to_gpa(walker.gfn);
gpa |= vaddr & ~PAGE_MASK;
gpa |= addr & ~PAGE_MASK;
} else if (exception)
*exception = walker.fault;
@ -964,7 +968,8 @@ static gpa_t FNAME(gva_to_gpa)(struct kvm_vcpu *vcpu, gva_t vaddr, u32 access,
}
#if PTTYPE != PTTYPE_EPT
static gpa_t FNAME(gva_to_gpa_nested)(struct kvm_vcpu *vcpu, gva_t vaddr,
/* Note, gva_to_gpa_nested() is only used to translate L2 GVAs. */
static gpa_t FNAME(gva_to_gpa_nested)(struct kvm_vcpu *vcpu, gpa_t vaddr,
u32 access,
struct x86_exception *exception)
{
@ -972,6 +977,11 @@ static gpa_t FNAME(gva_to_gpa_nested)(struct kvm_vcpu *vcpu, gva_t vaddr,
gpa_t gpa = UNMAPPED_GVA;
int r;
#ifndef CONFIG_X86_64
/* A 64-bit GVA should be impossible on 32-bit KVM. */
WARN_ON_ONCE(vaddr >> 32);
#endif
r = FNAME(walk_addr_nested)(&walker, vcpu, vaddr, access);
if (r) {


@ -249,13 +249,13 @@ TRACE_EVENT(
TRACE_EVENT(
fast_page_fault,
TP_PROTO(struct kvm_vcpu *vcpu, gva_t gva, u32 error_code,
TP_PROTO(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa, u32 error_code,
u64 *sptep, u64 old_spte, bool retry),
TP_ARGS(vcpu, gva, error_code, sptep, old_spte, retry),
TP_ARGS(vcpu, cr2_or_gpa, error_code, sptep, old_spte, retry),
TP_STRUCT__entry(
__field(int, vcpu_id)
__field(gva_t, gva)
__field(gpa_t, cr2_or_gpa)
__field(u32, error_code)
__field(u64 *, sptep)
__field(u64, old_spte)
@ -265,7 +265,7 @@ TRACE_EVENT(
TP_fast_assign(
__entry->vcpu_id = vcpu->vcpu_id;
__entry->gva = gva;
__entry->cr2_or_gpa = cr2_or_gpa;
__entry->error_code = error_code;
__entry->sptep = sptep;
__entry->old_spte = old_spte;
@ -273,9 +273,9 @@ TRACE_EVENT(
__entry->retry = retry;
),
TP_printk("vcpu %d gva %lx error_code %s sptep %p old %#llx"
TP_printk("vcpu %d gva %llx error_code %s sptep %p old %#llx"
" new %llx spurious %d fixed %d", __entry->vcpu_id,
__entry->gva, __print_flags(__entry->error_code, "|",
__entry->cr2_or_gpa, __print_flags(__entry->error_code, "|",
kvm_mmu_trace_pferr_flags), __entry->sptep,
__entry->old_spte, __entry->new_spte,
__spte_satisfied(old_spte), __spte_satisfied(new_spte)


@ -192,11 +192,15 @@ static bool fixed_msr_to_seg_unit(u32 msr, int *seg, int *unit)
break;
case MSR_MTRRfix16K_80000 ... MSR_MTRRfix16K_A0000:
*seg = 1;
*unit = msr - MSR_MTRRfix16K_80000;
*unit = array_index_nospec(
msr - MSR_MTRRfix16K_80000,
MSR_MTRRfix16K_A0000 - MSR_MTRRfix16K_80000 + 1);
break;
case MSR_MTRRfix4K_C0000 ... MSR_MTRRfix4K_F8000:
*seg = 2;
*unit = msr - MSR_MTRRfix4K_C0000;
*unit = array_index_nospec(
msr - MSR_MTRRfix4K_C0000,
MSR_MTRRfix4K_F8000 - MSR_MTRRfix4K_C0000 + 1);
break;
default:
return false;
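This hunk, like the PMU and exit-handler hunks later in the series, applies the same Spectre-v1 hardening recipe: keep the bounds check, then clamp the possibly guest-controlled index with array_index_nospec() so a mispredicted branch cannot be used to pull out-of-bounds data into the cache (the first half of the Spectre-v1/L1TF gadget described in the cover letter). A minimal sketch of the pattern; the function and variable names are illustrative, only array_index_nospec() itself is the real kernel helper:

#include <linux/nospec.h>

static u64 read_counter(const u64 *counters, unsigned int nr_counters,
			unsigned int idx)
{
	if (idx >= nr_counters)
		return 0;
	/*
	 * array_index_nospec() evaluates to idx when idx < nr_counters and
	 * to 0 under misspeculation, so the load below cannot speculatively
	 * read beyond the array.
	 */
	return counters[array_index_nospec(idx, nr_counters)];
}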


@ -2,6 +2,8 @@
#ifndef __KVM_X86_PMU_H
#define __KVM_X86_PMU_H
#include <linux/nospec.h>
#define vcpu_to_pmu(vcpu) (&(vcpu)->arch.pmu)
#define pmu_to_vcpu(pmu) (container_of((pmu), struct kvm_vcpu, arch.pmu))
#define pmc_to_pmu(pmc) (&(pmc)->vcpu->arch.pmu)
@ -102,8 +104,12 @@ static inline bool kvm_valid_perf_global_ctrl(struct kvm_pmu *pmu,
static inline struct kvm_pmc *get_gp_pmc(struct kvm_pmu *pmu, u32 msr,
u32 base)
{
if (msr >= base && msr < base + pmu->nr_arch_gp_counters)
return &pmu->gp_counters[msr - base];
if (msr >= base && msr < base + pmu->nr_arch_gp_counters) {
u32 index = array_index_nospec(msr - base,
pmu->nr_arch_gp_counters);
return &pmu->gp_counters[index];
}
return NULL;
}
@ -113,8 +119,12 @@ static inline struct kvm_pmc *get_fixed_pmc(struct kvm_pmu *pmu, u32 msr)
{
int base = MSR_CORE_PERF_FIXED_CTR0;
if (msr >= base && msr < base + pmu->nr_arch_fixed_counters)
return &pmu->fixed_counters[msr - base];
if (msr >= base && msr < base + pmu->nr_arch_fixed_counters) {
u32 index = array_index_nospec(msr - base,
pmu->nr_arch_fixed_counters);
return &pmu->fixed_counters[index];
}
return NULL;
}


@ -1307,6 +1307,47 @@ static void shrink_ple_window(struct kvm_vcpu *vcpu)
}
}
/*
* The default MMIO mask is a single bit (excluding the present bit),
* which could conflict with the memory encryption bit. Check for
* memory encryption support and override the default MMIO mask if
* memory encryption is enabled.
*/
static __init void svm_adjust_mmio_mask(void)
{
unsigned int enc_bit, mask_bit;
u64 msr, mask;
/* If there is no memory encryption support, use existing mask */
if (cpuid_eax(0x80000000) < 0x8000001f)
return;
/* If memory encryption is not enabled, use existing mask */
rdmsrl(MSR_K8_SYSCFG, msr);
if (!(msr & MSR_K8_SYSCFG_MEM_ENCRYPT))
return;
enc_bit = cpuid_ebx(0x8000001f) & 0x3f;
mask_bit = boot_cpu_data.x86_phys_bits;
/* Increment the mask bit if it is the same as the encryption bit */
if (enc_bit == mask_bit)
mask_bit++;
/*
* If the mask bit location is below 52, then some bits above the
* physical addressing limit will always be reserved, so use the
* rsvd_bits() function to generate the mask. This mask, along with
* the present bit, will be used to generate a page fault with
* PFERR.RSVD = 1.
*
* If the mask bit location is 52 (or above), then clear the mask.
*/
mask = (mask_bit < 52) ? rsvd_bits(mask_bit, 51) | PT_PRESENT_MASK : 0;
kvm_mmu_set_mmio_spte_mask(mask, mask, PT_WRITABLE_MASK | PT_USER_MASK);
}
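To make the arithmetic above concrete with purely illustrative numbers: on a part reporting 48 physical address bits with the encryption bit at position 47, mask_bit stays 48, so the MMIO mask becomes rsvd_bits(48, 51) | PT_PRESENT_MASK, i.e. physical-address bits 48-51 plus the present bit, which still produces a reserved-bit (PFERR.RSVD) page fault. On a hypothetical part whose encryption bit coincided with bit 51 of a 51-bit address space, mask_bit would be bumped to 52 and the mask collapses to 0, disabling the reserved-bit MMIO trick entirely.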
static __init int svm_hardware_setup(void)
{
int cpu;
@ -1361,6 +1402,8 @@ static __init int svm_hardware_setup(void)
}
}
svm_adjust_mmio_mask();
for_each_possible_cpu(cpu) {
r = svm_cpu_init(cpu);
if (r)
@ -2144,7 +2187,7 @@ static int avic_init_vcpu(struct vcpu_svm *svm)
return ret;
}
static struct kvm_vcpu *svm_create_vcpu(struct kvm *kvm, unsigned int id)
static int svm_create_vcpu(struct kvm_vcpu *vcpu)
{
struct vcpu_svm *svm;
struct page *page;
@ -2153,39 +2196,13 @@ static struct kvm_vcpu *svm_create_vcpu(struct kvm *kvm, unsigned int id)
struct page *nested_msrpm_pages;
int err;
BUILD_BUG_ON_MSG(offsetof(struct vcpu_svm, vcpu) != 0,
"struct kvm_vcpu must be at offset 0 for arch usercopy region");
svm = kmem_cache_zalloc(kvm_vcpu_cache, GFP_KERNEL_ACCOUNT);
if (!svm) {
err = -ENOMEM;
goto out;
}
svm->vcpu.arch.user_fpu = kmem_cache_zalloc(x86_fpu_cache,
GFP_KERNEL_ACCOUNT);
if (!svm->vcpu.arch.user_fpu) {
printk(KERN_ERR "kvm: failed to allocate kvm userspace's fpu\n");
err = -ENOMEM;
goto free_partial_svm;
}
svm->vcpu.arch.guest_fpu = kmem_cache_zalloc(x86_fpu_cache,
GFP_KERNEL_ACCOUNT);
if (!svm->vcpu.arch.guest_fpu) {
printk(KERN_ERR "kvm: failed to allocate vcpu's fpu\n");
err = -ENOMEM;
goto free_user_fpu;
}
err = kvm_vcpu_init(&svm->vcpu, kvm, id);
if (err)
goto free_svm;
BUILD_BUG_ON(offsetof(struct vcpu_svm, vcpu) != 0);
svm = to_svm(vcpu);
err = -ENOMEM;
page = alloc_page(GFP_KERNEL_ACCOUNT);
if (!page)
goto uninit;
goto out;
msrpm_pages = alloc_pages(GFP_KERNEL_ACCOUNT, MSRPM_ALLOC_ORDER);
if (!msrpm_pages)
@ -2222,9 +2239,9 @@ static struct kvm_vcpu *svm_create_vcpu(struct kvm *kvm, unsigned int id)
svm->asid_generation = 0;
init_vmcb(svm);
svm_init_osvw(&svm->vcpu);
svm_init_osvw(vcpu);
return &svm->vcpu;
return 0;
free_page4:
__free_page(hsave_page);
@ -2234,16 +2251,8 @@ free_page2:
__free_pages(msrpm_pages, MSRPM_ALLOC_ORDER);
free_page1:
__free_page(page);
uninit:
kvm_vcpu_uninit(&svm->vcpu);
free_svm:
kmem_cache_free(x86_fpu_cache, svm->vcpu.arch.guest_fpu);
free_user_fpu:
kmem_cache_free(x86_fpu_cache, svm->vcpu.arch.user_fpu);
free_partial_svm:
kmem_cache_free(kvm_vcpu_cache, svm);
out:
return ERR_PTR(err);
return err;
}
static void svm_clear_current_vmcb(struct vmcb *vmcb)
@ -2269,10 +2278,6 @@ static void svm_free_vcpu(struct kvm_vcpu *vcpu)
__free_pages(virt_to_page(svm->msrpm), MSRPM_ALLOC_ORDER);
__free_page(virt_to_page(svm->nested.hsave));
__free_pages(virt_to_page(svm->nested.msrpm), MSRPM_ALLOC_ORDER);
kvm_vcpu_uninit(vcpu);
kmem_cache_free(x86_fpu_cache, svm->vcpu.arch.user_fpu);
kmem_cache_free(x86_fpu_cache, svm->vcpu.arch.guest_fpu);
kmem_cache_free(kvm_vcpu_cache, svm);
}
static void svm_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
@ -4281,12 +4286,10 @@ static int svm_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr)
!guest_cpuid_has(vcpu, X86_FEATURE_AMD_SSBD))
return 1;
/* The STIBP bit doesn't fault even if it's not advertised */
if (data & ~(SPEC_CTRL_IBRS | SPEC_CTRL_STIBP | SPEC_CTRL_SSBD))
if (data & ~kvm_spec_ctrl_valid_bits(vcpu))
return 1;
svm->spec_ctrl = data;
if (!data)
break;
@ -4310,13 +4313,12 @@ static int svm_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr)
if (data & ~PRED_CMD_IBPB)
return 1;
if (!boot_cpu_has(X86_FEATURE_AMD_IBPB))
return 1;
if (!data)
break;
wrmsrl(MSR_IA32_PRED_CMD, PRED_CMD_IBPB);
if (is_guest_mode(vcpu))
break;
set_msr_interception(svm->msrpm, MSR_IA32_PRED_CMD, 0, 1);
break;
case MSR_AMD64_VIRT_SPEC_CTRL:
@ -4519,9 +4521,9 @@ static int avic_incomplete_ipi_interception(struct vcpu_svm *svm)
*/
kvm_for_each_vcpu(i, vcpu, kvm) {
bool m = kvm_apic_match_dest(vcpu, apic,
icrl & KVM_APIC_SHORT_MASK,
icrl & APIC_SHORT_MASK,
GET_APIC_DEST_FIELD(icrh),
icrl & KVM_APIC_DEST_MASK);
icrl & APIC_DEST_MASK);
if (m && !avic_vcpu_is_running(vcpu))
kvm_vcpu_wake_up(vcpu);
@ -4935,7 +4937,8 @@ static void svm_get_exit_info(struct kvm_vcpu *vcpu, u64 *info1, u64 *info2)
*info2 = control->exit_info_2;
}
static int handle_exit(struct kvm_vcpu *vcpu)
static int handle_exit(struct kvm_vcpu *vcpu,
enum exit_fastpath_completion exit_fastpath)
{
struct vcpu_svm *svm = to_svm(vcpu);
struct kvm_run *kvm_run = vcpu->run;
@ -4993,7 +4996,10 @@ static int handle_exit(struct kvm_vcpu *vcpu)
__func__, svm->vmcb->control.exit_int_info,
exit_code);
if (exit_code >= ARRAY_SIZE(svm_exit_handlers)
if (exit_fastpath == EXIT_FASTPATH_SKIP_EMUL_INS) {
kvm_skip_emulated_instruction(vcpu);
return 1;
} else if (exit_code >= ARRAY_SIZE(svm_exit_handlers)
|| !svm_exit_handlers[exit_code]) {
vcpu_unimpl(vcpu, "svm: unexpected exit reason 0x%x\n", exit_code);
dump_vmcb(vcpu);
@ -5913,6 +5919,7 @@ static void svm_cpuid_update(struct kvm_vcpu *vcpu)
struct vcpu_svm *svm = to_svm(vcpu);
vcpu->arch.xsaves_enabled = guest_cpuid_has(vcpu, X86_FEATURE_XSAVE) &&
boot_cpu_has(X86_FEATURE_XSAVE) &&
boot_cpu_has(X86_FEATURE_XSAVES);
/* Update nrips enabled cache */
@ -5924,14 +5931,14 @@ static void svm_cpuid_update(struct kvm_vcpu *vcpu)
guest_cpuid_clear(vcpu, X86_FEATURE_X2APIC);
}
#define F(x) bit(X86_FEATURE_##x)
#define F feature_bit
static void svm_set_supported_cpuid(u32 func, struct kvm_cpuid_entry2 *entry)
{
switch (func) {
case 0x1:
if (avic)
entry->ecx &= ~bit(X86_FEATURE_X2APIC);
entry->ecx &= ~F(X2APIC);
break;
case 0x80000001:
if (nested)
@ -6001,6 +6008,11 @@ static bool svm_has_wbinvd_exit(void)
return true;
}
static bool svm_pku_supported(void)
{
return false;
}
#define PRE_EX(exit) { .exit_code = (exit), \
.stage = X86_ICPT_PRE_EXCEPT, }
#define POST_EX(exit) { .exit_code = (exit), \
@ -6186,9 +6198,12 @@ out:
return ret;
}
static void svm_handle_exit_irqoff(struct kvm_vcpu *vcpu)
static void svm_handle_exit_irqoff(struct kvm_vcpu *vcpu,
enum exit_fastpath_completion *exit_fastpath)
{
if (!is_guest_mode(vcpu) &&
to_svm(vcpu)->vmcb->control.exit_code == EXIT_REASON_MSR_WRITE)
*exit_fastpath = handle_fastpath_set_msr_irqoff(vcpu);
}
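Both this SVM hook and its VMX counterpart later in the series feed the new IPI fast path from the cover letter: handle_exit_irqoff() runs with interrupts still disabled right after the VM-exit and, for an un-nested WRMSR exit, asks the common code to complete the write immediately; the regular handle_exit() then only has to skip the emulated instruction when it sees EXIT_FASTPATH_SKIP_EMUL_INS. A rough, from-memory sketch of what the shared helper does, simplified and not a verbatim copy of x86.c:

/* Sketch only, simplified from the actual common helper. */
enum exit_fastpath_completion handle_fastpath_set_msr_irqoff(struct kvm_vcpu *vcpu)
{
	u32 msr = kvm_rcx_read(vcpu);		/* WRMSR index is in RCX */
	u64 data = kvm_read_edx_eax(vcpu);	/* payload is in EDX:EAX */

	/* Only x2APIC ICR writes, i.e. IPIs, take the fast path. */
	if (msr != (APIC_BASE_MSR + (APIC_ICR >> 4)))
		return EXIT_FASTPATH_NONE;

	/* Deliver the IPI now; anything unusual falls back to the slow path. */
	if (handle_fastpath_set_x2apic_icr_irqoff(vcpu, data))
		return EXIT_FASTPATH_NONE;

	return EXIT_FASTPATH_SKIP_EMUL_INS;
}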
static void svm_sched_in(struct kvm_vcpu *vcpu, int cpu)
@ -7341,6 +7356,7 @@ static struct kvm_x86_ops svm_x86_ops __ro_after_init = {
.xsaves_supported = svm_xsaves_supported,
.umip_emulated = svm_umip_emulated,
.pt_supported = svm_pt_supported,
.pku_supported = svm_pku_supported,
.set_supported_cpuid = svm_set_supported_cpuid,


@ -145,6 +145,11 @@ static inline bool vmx_umip_emulated(void)
SECONDARY_EXEC_DESC;
}
static inline bool vmx_pku_supported(void)
{
return boot_cpu_has(X86_FEATURE_PKU);
}
static inline bool cpu_has_vmx_rdtscp(void)
{
return vmcs_config.cpu_based_2nd_exec_ctrl &


@ -350,17 +350,12 @@ int nested_enable_evmcs(struct kvm_vcpu *vcpu,
uint16_t *vmcs_version)
{
struct vcpu_vmx *vmx = to_vmx(vcpu);
bool evmcs_already_enabled = vmx->nested.enlightened_vmcs_enabled;
vmx->nested.enlightened_vmcs_enabled = true;
if (vmcs_version)
*vmcs_version = nested_get_evmcs_version(vcpu);
/* We don't support disabling the feature for simplicity. */
if (evmcs_already_enabled)
return 0;
vmx->nested.msrs.pinbased_ctls_high &= ~EVMCS1_UNSUPPORTED_PINCTRL;
vmx->nested.msrs.entry_ctls_high &= ~EVMCS1_UNSUPPORTED_VMENTRY_CTRL;
vmx->nested.msrs.exit_ctls_high &= ~EVMCS1_UNSUPPORTED_VMEXIT_CTRL;


@ -28,16 +28,6 @@ module_param(nested_early_check, bool, S_IRUGO);
failed; \
})
#define SET_MSR_OR_WARN(vcpu, idx, data) \
({ \
bool failed = kvm_set_msr(vcpu, idx, data); \
if (failed) \
pr_warn_ratelimited( \
"%s cannot write MSR (0x%x, 0x%llx)\n", \
__func__, idx, data); \
failed; \
})
/*
* Hyper-V requires all of these, so mark them as supported even though
* they are just treated the same as all-context.
@ -2172,8 +2162,8 @@ static void prepare_vmcs02_early(struct vcpu_vmx *vmx, struct vmcs12 *vmcs12)
* EXEC CONTROLS
*/
exec_control = vmx_exec_control(vmx); /* L0's desires */
exec_control &= ~CPU_BASED_VIRTUAL_INTR_PENDING;
exec_control &= ~CPU_BASED_VIRTUAL_NMI_PENDING;
exec_control &= ~CPU_BASED_INTR_WINDOW_EXITING;
exec_control &= ~CPU_BASED_NMI_WINDOW_EXITING;
exec_control &= ~CPU_BASED_TPR_SHADOW;
exec_control |= vmcs12->cpu_based_vm_exec_control;
@ -2550,8 +2540,8 @@ static int prepare_vmcs02(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12,
vcpu->arch.walk_mmu->inject_page_fault = vmx_inject_page_fault_nested;
if ((vmcs12->vm_entry_controls & VM_ENTRY_LOAD_IA32_PERF_GLOBAL_CTRL) &&
SET_MSR_OR_WARN(vcpu, MSR_CORE_PERF_GLOBAL_CTRL,
vmcs12->guest_ia32_perf_global_ctrl))
WARN_ON_ONCE(kvm_set_msr(vcpu, MSR_CORE_PERF_GLOBAL_CTRL,
vmcs12->guest_ia32_perf_global_ctrl)))
return -EINVAL;
kvm_rsp_write(vcpu, vmcs12->guest_rsp);
@ -2566,7 +2556,7 @@ static int nested_vmx_check_nmi_controls(struct vmcs12 *vmcs12)
return -EINVAL;
if (CC(!nested_cpu_has_virtual_nmis(vmcs12) &&
nested_cpu_has(vmcs12, CPU_BASED_VIRTUAL_NMI_PENDING)))
nested_cpu_has(vmcs12, CPU_BASED_NMI_WINDOW_EXITING)))
return -EINVAL;
return 0;
@ -2823,7 +2813,6 @@ static int nested_vmx_check_host_state(struct kvm_vcpu *vcpu,
CC(vmcs12->host_ss_selector == 0 && !ia32e))
return -EINVAL;
#ifdef CONFIG_X86_64
if (CC(is_noncanonical_address(vmcs12->host_fs_base, vcpu)) ||
CC(is_noncanonical_address(vmcs12->host_gs_base, vcpu)) ||
CC(is_noncanonical_address(vmcs12->host_gdtr_base, vcpu)) ||
@ -2831,7 +2820,6 @@ static int nested_vmx_check_host_state(struct kvm_vcpu *vcpu,
CC(is_noncanonical_address(vmcs12->host_tr_base, vcpu)) ||
CC(is_noncanonical_address(vmcs12->host_rip, vcpu)))
return -EINVAL;
#endif
/*
* If the load IA32_EFER VM-exit control is 1, bits reserved in the
@ -2899,6 +2887,10 @@ static int nested_vmx_check_guest_state(struct kvm_vcpu *vcpu,
CC(!nested_guest_cr4_valid(vcpu, vmcs12->guest_cr4)))
return -EINVAL;
if ((vmcs12->vm_entry_controls & VM_ENTRY_LOAD_DEBUG_CONTROLS) &&
CC(!kvm_dr7_valid(vmcs12->guest_dr7)))
return -EINVAL;
if ((vmcs12->vm_entry_controls & VM_ENTRY_LOAD_IA32_PAT) &&
CC(!kvm_pat_valid(vmcs12->guest_ia32_pat)))
return -EINVAL;
@ -3048,9 +3040,6 @@ static int nested_vmx_check_vmentry_hw(struct kvm_vcpu *vcpu)
return 0;
}
static inline bool nested_vmx_prepare_msr_bitmap(struct kvm_vcpu *vcpu,
struct vmcs12 *vmcs12);
static bool nested_get_vmcs12_pages(struct kvm_vcpu *vcpu)
{
struct vmcs12 *vmcs12 = get_vmcs12(vcpu);
@ -3183,7 +3172,7 @@ enum nvmx_vmentry_status nested_vmx_enter_non_root_mode(struct kvm_vcpu *vcpu,
u32 exit_qual;
evaluate_pending_interrupts = exec_controls_get(vmx) &
(CPU_BASED_VIRTUAL_INTR_PENDING | CPU_BASED_VIRTUAL_NMI_PENDING);
(CPU_BASED_INTR_WINDOW_EXITING | CPU_BASED_NMI_WINDOW_EXITING);
if (likely(!evaluate_pending_interrupts) && kvm_vcpu_apicv_active(vcpu))
evaluate_pending_interrupts |= vmx_has_apicv_interrupt(vcpu);
@ -3230,7 +3219,7 @@ enum nvmx_vmentry_status nested_vmx_enter_non_root_mode(struct kvm_vcpu *vcpu,
}
enter_guest_mode(vcpu);
if (vmcs12->cpu_based_vm_exec_control & CPU_BASED_USE_TSC_OFFSETING)
if (vmcs12->cpu_based_vm_exec_control & CPU_BASED_USE_TSC_OFFSETTING)
vcpu->arch.tsc_offset += vmcs12->tsc_offset;
if (prepare_vmcs02(vcpu, vmcs12, &exit_qual))
@ -3294,7 +3283,7 @@ enum nvmx_vmentry_status nested_vmx_enter_non_root_mode(struct kvm_vcpu *vcpu,
* 26.7 "VM-entry failures during or after loading guest state".
*/
vmentry_fail_vmexit_guest_mode:
if (vmcs12->cpu_based_vm_exec_control & CPU_BASED_USE_TSC_OFFSETING)
if (vmcs12->cpu_based_vm_exec_control & CPU_BASED_USE_TSC_OFFSETTING)
vcpu->arch.tsc_offset -= vmcs12->tsc_offset;
leave_guest_mode(vcpu);
@ -3407,8 +3396,8 @@ static int nested_vmx_run(struct kvm_vcpu *vcpu, bool launch)
*/
if ((vmcs12->guest_activity_state == GUEST_ACTIVITY_HLT) &&
!(vmcs12->vm_entry_intr_info_field & INTR_INFO_VALID_MASK) &&
!(vmcs12->cpu_based_vm_exec_control & CPU_BASED_VIRTUAL_NMI_PENDING) &&
!((vmcs12->cpu_based_vm_exec_control & CPU_BASED_VIRTUAL_INTR_PENDING) &&
!(vmcs12->cpu_based_vm_exec_control & CPU_BASED_NMI_WINDOW_EXITING) &&
!((vmcs12->cpu_based_vm_exec_control & CPU_BASED_INTR_WINDOW_EXITING) &&
(vmcs12->guest_rflags & X86_EFLAGS_IF))) {
vmx->nested.nested_run_pending = 0;
return kvm_vcpu_halt(vcpu);
@ -3427,7 +3416,7 @@ vmentry_failed:
/*
* On a nested exit from L2 to L1, vmcs12.guest_cr0 might not be up-to-date
* because L2 may have changed some cr0 bits directly (CRO_GUEST_HOST_MASK).
* because L2 may have changed some cr0 bits directly (CR0_GUEST_HOST_MASK).
* This function returns the new value we should put in vmcs12.guest_cr0.
* It's not enough to just return the vmcs02 GUEST_CR0. Rather,
* 1. Bits that neither L0 nor L1 trapped, were set directly by L2 and are now
@ -3999,8 +3988,8 @@ static void load_vmcs12_host_state(struct kvm_vcpu *vcpu,
vcpu->arch.pat = vmcs12->host_ia32_pat;
}
if (vmcs12->vm_exit_controls & VM_EXIT_LOAD_IA32_PERF_GLOBAL_CTRL)
SET_MSR_OR_WARN(vcpu, MSR_CORE_PERF_GLOBAL_CTRL,
vmcs12->host_ia32_perf_global_ctrl);
WARN_ON_ONCE(kvm_set_msr(vcpu, MSR_CORE_PERF_GLOBAL_CTRL,
vmcs12->host_ia32_perf_global_ctrl));
/* Set L1 segment info according to Intel SDM
27.5.2 Loading Host Segment and Descriptor-Table Registers */
@ -4209,7 +4198,7 @@ void nested_vmx_vmexit(struct kvm_vcpu *vcpu, u32 exit_reason,
if (nested_cpu_has_preemption_timer(vmcs12))
hrtimer_cancel(&to_vmx(vcpu)->nested.preemption_timer);
if (vmcs12->cpu_based_vm_exec_control & CPU_BASED_USE_TSC_OFFSETING)
if (vmcs12->cpu_based_vm_exec_control & CPU_BASED_USE_TSC_OFFSETTING)
vcpu->arch.tsc_offset -= vmcs12->tsc_offset;
if (likely(!vmx->fail)) {
@ -4751,36 +4740,32 @@ static int handle_vmresume(struct kvm_vcpu *vcpu)
static int handle_vmread(struct kvm_vcpu *vcpu)
{
unsigned long field;
u64 field_value;
struct vmcs12 *vmcs12 = is_guest_mode(vcpu) ? get_shadow_vmcs12(vcpu)
: get_vmcs12(vcpu);
unsigned long exit_qualification = vmcs_readl(EXIT_QUALIFICATION);
u32 vmx_instruction_info = vmcs_read32(VMX_INSTRUCTION_INFO);
int len;
gva_t gva = 0;
struct vmcs12 *vmcs12;
u32 instr_info = vmcs_read32(VMX_INSTRUCTION_INFO);
struct vcpu_vmx *vmx = to_vmx(vcpu);
struct x86_exception e;
unsigned long field;
u64 value;
gva_t gva = 0;
short offset;
int len;
if (!nested_vmx_check_permission(vcpu))
return 1;
if (to_vmx(vcpu)->nested.current_vmptr == -1ull)
/*
* In VMX non-root operation, when the VMCS-link pointer is -1ull,
* any VMREAD sets the ALU flags for VMfailInvalid.
*/
if (vmx->nested.current_vmptr == -1ull ||
(is_guest_mode(vcpu) &&
get_vmcs12(vcpu)->vmcs_link_pointer == -1ull))
return nested_vmx_failInvalid(vcpu);
if (!is_guest_mode(vcpu))
vmcs12 = get_vmcs12(vcpu);
else {
/*
* When vmcs->vmcs_link_pointer is -1ull, any VMREAD
* to shadowed-field sets the ALU flags for VMfailInvalid.
*/
if (get_vmcs12(vcpu)->vmcs_link_pointer == -1ull)
return nested_vmx_failInvalid(vcpu);
vmcs12 = get_shadow_vmcs12(vcpu);
}
/* Decode instruction info and find the field to read */
field = kvm_register_readl(vcpu, (((vmx_instruction_info) >> 28) & 0xf));
field = kvm_register_readl(vcpu, (((instr_info) >> 28) & 0xf));
offset = vmcs_field_to_offset(field);
if (offset < 0)
@ -4790,25 +4775,26 @@ static int handle_vmread(struct kvm_vcpu *vcpu)
if (!is_guest_mode(vcpu) && is_vmcs12_ext_field(field))
copy_vmcs02_to_vmcs12_rare(vcpu, vmcs12);
/* Read the field, zero-extended to a u64 field_value */
field_value = vmcs12_read_any(vmcs12, field, offset);
/* Read the field, zero-extended to a u64 value */
value = vmcs12_read_any(vmcs12, field, offset);
/*
* Now copy part of this value to register or memory, as requested.
* Note that the number of bits actually copied is 32 or 64 depending
* on the guest's mode (32 or 64 bit), not on the given field's length.
*/
if (vmx_instruction_info & (1u << 10)) {
kvm_register_writel(vcpu, (((vmx_instruction_info) >> 3) & 0xf),
field_value);
if (instr_info & BIT(10)) {
kvm_register_writel(vcpu, (((instr_info) >> 3) & 0xf), value);
} else {
len = is_64_bit_mode(vcpu) ? 8 : 4;
if (get_vmx_mem_address(vcpu, exit_qualification,
vmx_instruction_info, true, len, &gva))
instr_info, true, len, &gva))
return 1;
/* _system ok, nested_vmx_check_permission has verified cpl=0 */
if (kvm_write_guest_virt_system(vcpu, gva, &field_value, len, &e))
if (kvm_write_guest_virt_system(vcpu, gva, &value, len, &e)) {
kvm_inject_page_fault(vcpu, &e);
return 1;
}
}
return nested_vmx_succeed(vcpu);
@ -4840,46 +4826,58 @@ static bool is_shadow_field_ro(unsigned long field)
static int handle_vmwrite(struct kvm_vcpu *vcpu)
{
unsigned long field;
int len;
gva_t gva;
struct vcpu_vmx *vmx = to_vmx(vcpu);
struct vmcs12 *vmcs12 = is_guest_mode(vcpu) ? get_shadow_vmcs12(vcpu)
: get_vmcs12(vcpu);
unsigned long exit_qualification = vmcs_readl(EXIT_QUALIFICATION);
u32 vmx_instruction_info = vmcs_read32(VMX_INSTRUCTION_INFO);
u32 instr_info = vmcs_read32(VMX_INSTRUCTION_INFO);
struct vcpu_vmx *vmx = to_vmx(vcpu);
struct x86_exception e;
unsigned long field;
short offset;
gva_t gva;
int len;
/* The value to write might be 32 or 64 bits, depending on L1's long
/*
* The value to write might be 32 or 64 bits, depending on L1's long
* mode, and eventually we need to write that into a field of several
* possible lengths. The code below first zero-extends the value to 64
* bit (field_value), and then copies only the appropriate number of
* bit (value), and then copies only the appropriate number of
* bits into the vmcs12 field.
*/
u64 field_value = 0;
struct x86_exception e;
struct vmcs12 *vmcs12;
short offset;
u64 value = 0;
if (!nested_vmx_check_permission(vcpu))
return 1;
if (vmx->nested.current_vmptr == -1ull)
/*
* In VMX non-root operation, when the VMCS-link pointer is -1ull,
* any VMWRITE sets the ALU flags for VMfailInvalid.
*/
if (vmx->nested.current_vmptr == -1ull ||
(is_guest_mode(vcpu) &&
get_vmcs12(vcpu)->vmcs_link_pointer == -1ull))
return nested_vmx_failInvalid(vcpu);
if (vmx_instruction_info & (1u << 10))
field_value = kvm_register_readl(vcpu,
(((vmx_instruction_info) >> 3) & 0xf));
if (instr_info & BIT(10))
value = kvm_register_readl(vcpu, (((instr_info) >> 3) & 0xf));
else {
len = is_64_bit_mode(vcpu) ? 8 : 4;
if (get_vmx_mem_address(vcpu, exit_qualification,
vmx_instruction_info, false, len, &gva))
instr_info, false, len, &gva))
return 1;
if (kvm_read_guest_virt(vcpu, gva, &field_value, len, &e)) {
if (kvm_read_guest_virt(vcpu, gva, &value, len, &e)) {
kvm_inject_page_fault(vcpu, &e);
return 1;
}
}
field = kvm_register_readl(vcpu, (((instr_info) >> 28) & 0xf));
offset = vmcs_field_to_offset(field);
if (offset < 0)
return nested_vmx_failValid(vcpu,
VMXERR_UNSUPPORTED_VMCS_COMPONENT);
field = kvm_register_readl(vcpu, (((vmx_instruction_info) >> 28) & 0xf));
/*
* If the vCPU supports "VMWRITE to any supported field in the
* VMCS," then the "read-only" fields are actually read/write.
@ -4889,29 +4887,12 @@ static int handle_vmwrite(struct kvm_vcpu *vcpu)
return nested_vmx_failValid(vcpu,
VMXERR_VMWRITE_READ_ONLY_VMCS_COMPONENT);
if (!is_guest_mode(vcpu)) {
vmcs12 = get_vmcs12(vcpu);
/*
* Ensure vmcs12 is up-to-date before any VMWRITE that dirties
* vmcs12, else we may crush a field or consume a stale value.
*/
if (!is_shadow_field_rw(field))
copy_vmcs02_to_vmcs12_rare(vcpu, vmcs12);
} else {
/*
* When vmcs->vmcs_link_pointer is -1ull, any VMWRITE
* to shadowed-field sets the ALU flags for VMfailInvalid.
*/
if (get_vmcs12(vcpu)->vmcs_link_pointer == -1ull)
return nested_vmx_failInvalid(vcpu);
vmcs12 = get_shadow_vmcs12(vcpu);
}
offset = vmcs_field_to_offset(field);
if (offset < 0)
return nested_vmx_failValid(vcpu,
VMXERR_UNSUPPORTED_VMCS_COMPONENT);
/*
* Ensure vmcs12 is up-to-date before any VMWRITE that dirties
* vmcs12, else we may crush a field or consume a stale value.
*/
if (!is_guest_mode(vcpu) && !is_shadow_field_rw(field))
copy_vmcs02_to_vmcs12_rare(vcpu, vmcs12);
/*
* Some Intel CPUs intentionally drop the reserved bits of the AR byte
@ -4922,9 +4903,9 @@ static int handle_vmwrite(struct kvm_vcpu *vcpu)
* the stripped down value, L2 sees the full value as stored by KVM).
*/
if (field >= GUEST_ES_AR_BYTES && field <= GUEST_TR_AR_BYTES)
field_value &= 0x1f0ff;
value &= 0x1f0ff;
vmcs12_write_any(vmcs12, field, offset, field_value);
vmcs12_write_any(vmcs12, field, offset, value);
/*
* Do not track vmcs12 dirty-state if in guest-mode as we actually
@ -4941,7 +4922,7 @@ static int handle_vmwrite(struct kvm_vcpu *vcpu)
preempt_disable();
vmcs_load(vmx->vmcs01.shadow_vmcs);
__vmcs_writel(field, field_value);
__vmcs_writel(field, value);
vmcs_clear(vmx->vmcs01.shadow_vmcs);
vmcs_load(vmx->loaded_vmcs->vmcs);
@ -5524,10 +5505,10 @@ bool nested_vmx_exit_reflected(struct kvm_vcpu *vcpu, u32 exit_reason)
return false;
case EXIT_REASON_TRIPLE_FAULT:
return true;
case EXIT_REASON_PENDING_INTERRUPT:
return nested_cpu_has(vmcs12, CPU_BASED_VIRTUAL_INTR_PENDING);
case EXIT_REASON_INTERRUPT_WINDOW:
return nested_cpu_has(vmcs12, CPU_BASED_INTR_WINDOW_EXITING);
case EXIT_REASON_NMI_WINDOW:
return nested_cpu_has(vmcs12, CPU_BASED_VIRTUAL_NMI_PENDING);
return nested_cpu_has(vmcs12, CPU_BASED_NMI_WINDOW_EXITING);
case EXIT_REASON_TASK_SWITCH:
return true;
case EXIT_REASON_CPUID:
@ -6015,8 +5996,8 @@ void nested_vmx_setup_ctls_msrs(struct nested_vmx_msrs *msrs, u32 ept_caps,
msrs->procbased_ctls_low =
CPU_BASED_ALWAYSON_WITHOUT_TRUE_MSR;
msrs->procbased_ctls_high &=
CPU_BASED_VIRTUAL_INTR_PENDING |
CPU_BASED_VIRTUAL_NMI_PENDING | CPU_BASED_USE_TSC_OFFSETING |
CPU_BASED_INTR_WINDOW_EXITING |
CPU_BASED_NMI_WINDOW_EXITING | CPU_BASED_USE_TSC_OFFSETTING |
CPU_BASED_HLT_EXITING | CPU_BASED_INVLPG_EXITING |
CPU_BASED_MWAIT_EXITING | CPU_BASED_CR3_LOAD_EXITING |
CPU_BASED_CR3_STORE_EXITING |


@ -86,10 +86,14 @@ static unsigned intel_find_arch_event(struct kvm_pmu *pmu,
static unsigned intel_find_fixed_event(int idx)
{
if (idx >= ARRAY_SIZE(fixed_pmc_events))
u32 event;
size_t size = ARRAY_SIZE(fixed_pmc_events);
if (idx >= size)
return PERF_COUNT_HW_MAX;
return intel_arch_events[fixed_pmc_events[idx]].event_type;
event = fixed_pmc_events[array_index_nospec(idx, size)];
return intel_arch_events[event].event_type;
}
/* check if a PMC is enabled by comparing it with global_ctrl bits. */
@ -130,16 +134,20 @@ static struct kvm_pmc *intel_rdpmc_ecx_to_pmc(struct kvm_vcpu *vcpu,
struct kvm_pmu *pmu = vcpu_to_pmu(vcpu);
bool fixed = idx & (1u << 30);
struct kvm_pmc *counters;
unsigned int num_counters;
idx &= ~(3u << 30);
if (!fixed && idx >= pmu->nr_arch_gp_counters)
if (fixed) {
counters = pmu->fixed_counters;
num_counters = pmu->nr_arch_fixed_counters;
} else {
counters = pmu->gp_counters;
num_counters = pmu->nr_arch_gp_counters;
}
if (idx >= num_counters)
return NULL;
if (fixed && idx >= pmu->nr_arch_fixed_counters)
return NULL;
counters = fixed ? pmu->fixed_counters : pmu->gp_counters;
*mask &= pmu->counter_bitmask[fixed ? KVM_PMC_FIXED : KVM_PMC_GP];
return &counters[idx];
return &counters[array_index_nospec(idx, num_counters)];
}
static bool intel_is_valid_msr(struct kvm_vcpu *vcpu, u32 msr)


@ -23,12 +23,12 @@ BUILD_BUG_ON(1)
*
* When adding or removing fields here, note that shadowed
* fields must always be synced by prepare_vmcs02, not just
* prepare_vmcs02_full.
* prepare_vmcs02_rare.
*/
/*
* Keeping the fields ordered by size is an attempt at improving
* branch prediction in vmcs_read_any and vmcs_write_any.
* branch prediction in vmcs12_read_any and vmcs12_write_any.
*/
/* 16-bits */


@ -1057,6 +1057,12 @@ static unsigned long segment_base(u16 selector)
}
#endif
static inline bool pt_can_write_msr(struct vcpu_vmx *vmx)
{
return (pt_mode == PT_MODE_HOST_GUEST) &&
!(vmx->pt_desc.guest.ctl & RTIT_CTL_TRACEEN);
}
static inline void pt_load_msr(struct pt_ctx *ctx, u32 addr_range)
{
u32 i;
@ -1716,7 +1722,7 @@ static u64 vmx_read_l1_tsc_offset(struct kvm_vcpu *vcpu)
struct vmcs12 *vmcs12 = get_vmcs12(vcpu);
if (is_guest_mode(vcpu) &&
(vmcs12->cpu_based_vm_exec_control & CPU_BASED_USE_TSC_OFFSETING))
(vmcs12->cpu_based_vm_exec_control & CPU_BASED_USE_TSC_OFFSETTING))
return vcpu->arch.tsc_offset - vmcs12->tsc_offset;
return vcpu->arch.tsc_offset;
@ -1734,7 +1740,7 @@ static u64 vmx_write_l1_tsc_offset(struct kvm_vcpu *vcpu, u64 offset)
* to the newly set TSC to get L2's TSC.
*/
if (is_guest_mode(vcpu) &&
(vmcs12->cpu_based_vm_exec_control & CPU_BASED_USE_TSC_OFFSETING))
(vmcs12->cpu_based_vm_exec_control & CPU_BASED_USE_TSC_OFFSETTING))
g_tsc_offset = vmcs12->tsc_offset;
trace_kvm_write_tsc_offset(vcpu->vcpu_id,
@ -1773,8 +1779,6 @@ static int vmx_get_msr_feature(struct kvm_msr_entry *msr)
default:
return 1;
}
return 0;
}
/*
@ -1916,7 +1920,7 @@ static int vmx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
}
/*
* Writes msr value into into the appropriate "register".
* Writes msr value into the appropriate "register".
* Returns 0 on success, non-0 otherwise.
* Assumes vcpu_load() was already called.
*/
@ -1994,12 +1998,10 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
!guest_cpuid_has(vcpu, X86_FEATURE_SPEC_CTRL))
return 1;
/* The STIBP bit doesn't fault even if it's not advertised */
if (data & ~(SPEC_CTRL_IBRS | SPEC_CTRL_STIBP | SPEC_CTRL_SSBD))
if (data & ~kvm_spec_ctrl_valid_bits(vcpu))
return 1;
vmx->spec_ctrl = data;
if (!data)
break;
@ -2010,7 +2012,7 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
*
* For nested:
* The handling of the MSR bitmap for L2 guests is done in
* nested_vmx_merge_msr_bitmap. We should not touch the
* nested_vmx_prepare_msr_bitmap. We should not touch the
* vmcs02.msr_bitmap here since it gets completely overwritten
* in the merging. We update the vmcs01 here for L1 as well
* since it will end up touching the MSR anyway now.
@ -2033,7 +2035,8 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
if (data & ~PRED_CMD_IBPB)
return 1;
if (!boot_cpu_has(X86_FEATURE_SPEC_CTRL))
return 1;
if (!data)
break;
@ -2046,7 +2049,7 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
*
* For nested:
* The handling of the MSR bitmap for L2 guests is done in
* nested_vmx_merge_msr_bitmap. We should not touch the
* nested_vmx_prepare_msr_bitmap. We should not touch the
* vmcs02.msr_bitmap here since it gets completely overwritten
* in the merging.
*/
@ -2104,47 +2107,50 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
pt_update_intercept_for_msr(vmx);
break;
case MSR_IA32_RTIT_STATUS:
if ((pt_mode != PT_MODE_HOST_GUEST) ||
(vmx->pt_desc.guest.ctl & RTIT_CTL_TRACEEN) ||
(data & MSR_IA32_RTIT_STATUS_MASK))
if (!pt_can_write_msr(vmx))
return 1;
if (data & MSR_IA32_RTIT_STATUS_MASK)
return 1;
vmx->pt_desc.guest.status = data;
break;
case MSR_IA32_RTIT_CR3_MATCH:
if ((pt_mode != PT_MODE_HOST_GUEST) ||
(vmx->pt_desc.guest.ctl & RTIT_CTL_TRACEEN) ||
!intel_pt_validate_cap(vmx->pt_desc.caps,
PT_CAP_cr3_filtering))
if (!pt_can_write_msr(vmx))
return 1;
if (!intel_pt_validate_cap(vmx->pt_desc.caps,
PT_CAP_cr3_filtering))
return 1;
vmx->pt_desc.guest.cr3_match = data;
break;
case MSR_IA32_RTIT_OUTPUT_BASE:
if ((pt_mode != PT_MODE_HOST_GUEST) ||
(vmx->pt_desc.guest.ctl & RTIT_CTL_TRACEEN) ||
(!intel_pt_validate_cap(vmx->pt_desc.caps,
PT_CAP_topa_output) &&
!intel_pt_validate_cap(vmx->pt_desc.caps,
PT_CAP_single_range_output)) ||
(data & MSR_IA32_RTIT_OUTPUT_BASE_MASK))
if (!pt_can_write_msr(vmx))
return 1;
if (!intel_pt_validate_cap(vmx->pt_desc.caps,
PT_CAP_topa_output) &&
!intel_pt_validate_cap(vmx->pt_desc.caps,
PT_CAP_single_range_output))
return 1;
if (data & MSR_IA32_RTIT_OUTPUT_BASE_MASK)
return 1;
vmx->pt_desc.guest.output_base = data;
break;
case MSR_IA32_RTIT_OUTPUT_MASK:
if ((pt_mode != PT_MODE_HOST_GUEST) ||
(vmx->pt_desc.guest.ctl & RTIT_CTL_TRACEEN) ||
(!intel_pt_validate_cap(vmx->pt_desc.caps,
PT_CAP_topa_output) &&
!intel_pt_validate_cap(vmx->pt_desc.caps,
PT_CAP_single_range_output)))
if (!pt_can_write_msr(vmx))
return 1;
if (!intel_pt_validate_cap(vmx->pt_desc.caps,
PT_CAP_topa_output) &&
!intel_pt_validate_cap(vmx->pt_desc.caps,
PT_CAP_single_range_output))
return 1;
vmx->pt_desc.guest.output_mask = data;
break;
case MSR_IA32_RTIT_ADDR0_A ... MSR_IA32_RTIT_ADDR3_B:
if (!pt_can_write_msr(vmx))
return 1;
index = msr_info->index - MSR_IA32_RTIT_ADDR0_A;
if ((pt_mode != PT_MODE_HOST_GUEST) ||
(vmx->pt_desc.guest.ctl & RTIT_CTL_TRACEEN) ||
(index >= 2 * intel_pt_validate_cap(vmx->pt_desc.caps,
PT_CAP_num_address_ranges)))
if (index >= 2 * intel_pt_validate_cap(vmx->pt_desc.caps,
PT_CAP_num_address_ranges))
return 1;
if (is_noncanonical_address(data, vcpu))
return 1;
if (index % 2)
vmx->pt_desc.guest.addr_b[index / 2] = data;
@ -2322,7 +2328,7 @@ static __init int setup_vmcs_config(struct vmcs_config *vmcs_conf,
CPU_BASED_CR3_STORE_EXITING |
CPU_BASED_UNCOND_IO_EXITING |
CPU_BASED_MOV_DR_EXITING |
CPU_BASED_USE_TSC_OFFSETING |
CPU_BASED_USE_TSC_OFFSETTING |
CPU_BASED_MWAIT_EXITING |
CPU_BASED_MONITOR_EXITING |
CPU_BASED_INVLPG_EXITING |
@ -2657,8 +2663,6 @@ static void enter_pmode(struct kvm_vcpu *vcpu)
vmx->rmode.vm86_active = 0;
vmx_segment_cache_clear(vmx);
vmx_set_segment(vcpu, &vmx->rmode.segs[VCPU_SREG_TR], VCPU_SREG_TR);
flags = vmcs_readl(GUEST_RFLAGS);
@ -3444,7 +3448,7 @@ out:
static int init_rmode_identity_map(struct kvm *kvm)
{
struct kvm_vmx *kvm_vmx = to_kvm_vmx(kvm);
int i, idx, r = 0;
int i, r = 0;
kvm_pfn_t identity_map_pfn;
u32 tmp;
@ -3452,7 +3456,7 @@ static int init_rmode_identity_map(struct kvm *kvm)
mutex_lock(&kvm->slots_lock);
if (likely(kvm_vmx->ept_identity_pagetable_done))
goto out2;
goto out;
if (!kvm_vmx->ept_identity_map_addr)
kvm_vmx->ept_identity_map_addr = VMX_EPT_IDENTITY_PAGETABLE_ADDR;
@ -3461,9 +3465,8 @@ static int init_rmode_identity_map(struct kvm *kvm)
r = __x86_set_memory_region(kvm, IDENTITY_PAGETABLE_PRIVATE_MEMSLOT,
kvm_vmx->ept_identity_map_addr, PAGE_SIZE);
if (r < 0)
goto out2;
goto out;
idx = srcu_read_lock(&kvm->srcu);
r = kvm_clear_guest_page(kvm, identity_map_pfn, 0, PAGE_SIZE);
if (r < 0)
goto out;
@ -3479,9 +3482,6 @@ static int init_rmode_identity_map(struct kvm *kvm)
kvm_vmx->ept_identity_pagetable_done = true;
out:
srcu_read_unlock(&kvm->srcu, idx);
out2:
mutex_unlock(&kvm->slots_lock);
return r;
}
@ -4009,6 +4009,7 @@ static void vmx_compute_secondary_exec_control(struct vcpu_vmx *vmx)
if (vmx_xsaves_supported()) {
/* Exposing XSAVES only when XSAVE is exposed */
bool xsaves_enabled =
boot_cpu_has(X86_FEATURE_XSAVE) &&
guest_cpuid_has(vcpu, X86_FEATURE_XSAVE) &&
guest_cpuid_has(vcpu, X86_FEATURE_XSAVES);
@ -4319,7 +4320,7 @@ static void vmx_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event)
static void enable_irq_window(struct kvm_vcpu *vcpu)
{
exec_controls_setbit(to_vmx(vcpu), CPU_BASED_VIRTUAL_INTR_PENDING);
exec_controls_setbit(to_vmx(vcpu), CPU_BASED_INTR_WINDOW_EXITING);
}
static void enable_nmi_window(struct kvm_vcpu *vcpu)
@ -4330,7 +4331,7 @@ static void enable_nmi_window(struct kvm_vcpu *vcpu)
return;
}
exec_controls_setbit(to_vmx(vcpu), CPU_BASED_VIRTUAL_NMI_PENDING);
exec_controls_setbit(to_vmx(vcpu), CPU_BASED_NMI_WINDOW_EXITING);
}
static void vmx_inject_irq(struct kvm_vcpu *vcpu)
@ -4455,8 +4456,11 @@ static int vmx_set_tss_addr(struct kvm *kvm, unsigned int addr)
if (enable_unrestricted_guest)
return 0;
ret = x86_set_memory_region(kvm, TSS_PRIVATE_MEMSLOT, addr,
PAGE_SIZE * 3);
mutex_lock(&kvm->slots_lock);
ret = __x86_set_memory_region(kvm, TSS_PRIVATE_MEMSLOT, addr,
PAGE_SIZE * 3);
mutex_unlock(&kvm->slots_lock);
if (ret)
return ret;
to_kvm_vmx(kvm)->tss_addr = addr;
@ -4938,7 +4942,7 @@ static int handle_tpr_below_threshold(struct kvm_vcpu *vcpu)
static int handle_interrupt_window(struct kvm_vcpu *vcpu)
{
exec_controls_clearbit(to_vmx(vcpu), CPU_BASED_VIRTUAL_INTR_PENDING);
exec_controls_clearbit(to_vmx(vcpu), CPU_BASED_INTR_WINDOW_EXITING);
kvm_make_request(KVM_REQ_EVENT, vcpu);
@ -5151,7 +5155,7 @@ static int handle_ept_misconfig(struct kvm_vcpu *vcpu)
static int handle_nmi_window(struct kvm_vcpu *vcpu)
{
WARN_ON_ONCE(!enable_vnmi);
exec_controls_clearbit(to_vmx(vcpu), CPU_BASED_VIRTUAL_NMI_PENDING);
exec_controls_clearbit(to_vmx(vcpu), CPU_BASED_NMI_WINDOW_EXITING);
++vcpu->stat.nmi_window_exits;
kvm_make_request(KVM_REQ_EVENT, vcpu);
@ -5172,7 +5176,7 @@ static int handle_invalid_guest_state(struct kvm_vcpu *vcpu)
WARN_ON_ONCE(vmx->emulation_required && vmx->nested.nested_run_pending);
intr_window_requested = exec_controls_get(vmx) &
CPU_BASED_VIRTUAL_INTR_PENDING;
CPU_BASED_INTR_WINDOW_EXITING;
while (vmx->emulation_required && count-- != 0) {
if (intr_window_requested && vmx_interrupt_allowed(vcpu))
@ -5496,7 +5500,7 @@ static int (*kvm_vmx_exit_handlers[])(struct kvm_vcpu *vcpu) = {
[EXIT_REASON_CPUID] = kvm_emulate_cpuid,
[EXIT_REASON_MSR_READ] = kvm_emulate_rdmsr,
[EXIT_REASON_MSR_WRITE] = kvm_emulate_wrmsr,
[EXIT_REASON_PENDING_INTERRUPT] = handle_interrupt_window,
[EXIT_REASON_INTERRUPT_WINDOW] = handle_interrupt_window,
[EXIT_REASON_HLT] = kvm_emulate_halt,
[EXIT_REASON_INVD] = handle_invd,
[EXIT_REASON_INVLPG] = handle_invlpg,
@ -5783,7 +5787,8 @@ void dump_vmcs(void)
* The guest has exited. See if we can fix it or if we need userspace
* assistance.
*/
static int vmx_handle_exit(struct kvm_vcpu *vcpu)
static int vmx_handle_exit(struct kvm_vcpu *vcpu,
enum exit_fastpath_completion exit_fastpath)
{
struct vcpu_vmx *vmx = to_vmx(vcpu);
u32 exit_reason = vmx->exit_reason;
@ -5869,34 +5874,44 @@ static int vmx_handle_exit(struct kvm_vcpu *vcpu)
}
}
if (exit_reason < kvm_vmx_max_exit_handlers
&& kvm_vmx_exit_handlers[exit_reason]) {
#ifdef CONFIG_RETPOLINE
if (exit_reason == EXIT_REASON_MSR_WRITE)
return kvm_emulate_wrmsr(vcpu);
else if (exit_reason == EXIT_REASON_PREEMPTION_TIMER)
return handle_preemption_timer(vcpu);
else if (exit_reason == EXIT_REASON_PENDING_INTERRUPT)
return handle_interrupt_window(vcpu);
else if (exit_reason == EXIT_REASON_EXTERNAL_INTERRUPT)
return handle_external_interrupt(vcpu);
else if (exit_reason == EXIT_REASON_HLT)
return kvm_emulate_halt(vcpu);
else if (exit_reason == EXIT_REASON_EPT_MISCONFIG)
return handle_ept_misconfig(vcpu);
#endif
return kvm_vmx_exit_handlers[exit_reason](vcpu);
} else {
vcpu_unimpl(vcpu, "vmx: unexpected exit reason 0x%x\n",
exit_reason);
dump_vmcs();
vcpu->run->exit_reason = KVM_EXIT_INTERNAL_ERROR;
vcpu->run->internal.suberror =
KVM_INTERNAL_ERROR_UNEXPECTED_EXIT_REASON;
vcpu->run->internal.ndata = 1;
vcpu->run->internal.data[0] = exit_reason;
return 0;
if (exit_fastpath == EXIT_FASTPATH_SKIP_EMUL_INS) {
kvm_skip_emulated_instruction(vcpu);
return 1;
}
if (exit_reason >= kvm_vmx_max_exit_handlers)
goto unexpected_vmexit;
#ifdef CONFIG_RETPOLINE
if (exit_reason == EXIT_REASON_MSR_WRITE)
return kvm_emulate_wrmsr(vcpu);
else if (exit_reason == EXIT_REASON_PREEMPTION_TIMER)
return handle_preemption_timer(vcpu);
else if (exit_reason == EXIT_REASON_INTERRUPT_WINDOW)
return handle_interrupt_window(vcpu);
else if (exit_reason == EXIT_REASON_EXTERNAL_INTERRUPT)
return handle_external_interrupt(vcpu);
else if (exit_reason == EXIT_REASON_HLT)
return kvm_emulate_halt(vcpu);
else if (exit_reason == EXIT_REASON_EPT_MISCONFIG)
return handle_ept_misconfig(vcpu);
#endif
exit_reason = array_index_nospec(exit_reason,
kvm_vmx_max_exit_handlers);
if (!kvm_vmx_exit_handlers[exit_reason])
goto unexpected_vmexit;
return kvm_vmx_exit_handlers[exit_reason](vcpu);
unexpected_vmexit:
vcpu_unimpl(vcpu, "vmx: unexpected exit reason 0x%x\n", exit_reason);
dump_vmcs();
vcpu->run->exit_reason = KVM_EXIT_INTERNAL_ERROR;
vcpu->run->internal.suberror =
KVM_INTERNAL_ERROR_UNEXPECTED_EXIT_REASON;
vcpu->run->internal.ndata = 1;
vcpu->run->internal.data[0] = exit_reason;
return 0;
}
/*
@ -6217,7 +6232,8 @@ static void handle_external_interrupt_irqoff(struct kvm_vcpu *vcpu)
}
STACK_FRAME_NON_STANDARD(handle_external_interrupt_irqoff);
static void vmx_handle_exit_irqoff(struct kvm_vcpu *vcpu)
static void vmx_handle_exit_irqoff(struct kvm_vcpu *vcpu,
enum exit_fastpath_completion *exit_fastpath)
{
struct vcpu_vmx *vmx = to_vmx(vcpu);
@ -6225,6 +6241,9 @@ static void vmx_handle_exit_irqoff(struct kvm_vcpu *vcpu)
handle_external_interrupt_irqoff(vcpu);
else if (vmx->exit_reason == EXIT_REASON_EXCEPTION_NMI)
handle_exception_nmi_irqoff(vmx);
else if (!is_guest_mode(vcpu) &&
vmx->exit_reason == EXIT_REASON_MSR_WRITE)
*exit_fastpath = handle_fastpath_set_msr_irqoff(vcpu);
}
static bool vmx_has_emulated_msr(int index)
@ -6633,60 +6652,31 @@ static void vmx_free_vcpu(struct kvm_vcpu *vcpu)
free_vpid(vmx->vpid);
nested_vmx_free_vcpu(vcpu);
free_loaded_vmcs(vmx->loaded_vmcs);
kvm_vcpu_uninit(vcpu);
kmem_cache_free(x86_fpu_cache, vmx->vcpu.arch.user_fpu);
kmem_cache_free(x86_fpu_cache, vmx->vcpu.arch.guest_fpu);
kmem_cache_free(kvm_vcpu_cache, vmx);
}
static struct kvm_vcpu *vmx_create_vcpu(struct kvm *kvm, unsigned int id)
static int vmx_create_vcpu(struct kvm_vcpu *vcpu)
{
int err;
struct vcpu_vmx *vmx;
unsigned long *msr_bitmap;
int i, cpu;
int i, cpu, err;
BUILD_BUG_ON_MSG(offsetof(struct vcpu_vmx, vcpu) != 0,
"struct kvm_vcpu must be at offset 0 for arch usercopy region");
vmx = kmem_cache_zalloc(kvm_vcpu_cache, GFP_KERNEL_ACCOUNT);
if (!vmx)
return ERR_PTR(-ENOMEM);
vmx->vcpu.arch.user_fpu = kmem_cache_zalloc(x86_fpu_cache,
GFP_KERNEL_ACCOUNT);
if (!vmx->vcpu.arch.user_fpu) {
printk(KERN_ERR "kvm: failed to allocate kvm userspace's fpu\n");
err = -ENOMEM;
goto free_partial_vcpu;
}
vmx->vcpu.arch.guest_fpu = kmem_cache_zalloc(x86_fpu_cache,
GFP_KERNEL_ACCOUNT);
if (!vmx->vcpu.arch.guest_fpu) {
printk(KERN_ERR "kvm: failed to allocate vcpu's fpu\n");
err = -ENOMEM;
goto free_user_fpu;
}
vmx->vpid = allocate_vpid();
err = kvm_vcpu_init(&vmx->vcpu, kvm, id);
if (err)
goto free_vcpu;
BUILD_BUG_ON(offsetof(struct vcpu_vmx, vcpu) != 0);
vmx = to_vmx(vcpu);
err = -ENOMEM;
vmx->vpid = allocate_vpid();
/*
* If PML is turned on, failure on enabling PML just results in failure
* of creating the vcpu, therefore we can simplify PML logic (by
* avoiding dealing with cases, such as enabling PML partially on vcpus
* for the guest, etc.
* for the guest), etc.
*/
if (enable_pml) {
vmx->pml_pg = alloc_page(GFP_KERNEL_ACCOUNT | __GFP_ZERO);
if (!vmx->pml_pg)
goto uninit_vcpu;
goto free_vpid;
}
BUILD_BUG_ON(ARRAY_SIZE(vmx_msr_index) != NR_SHARED_MSRS);
@ -6731,7 +6721,7 @@ static struct kvm_vcpu *vmx_create_vcpu(struct kvm *kvm, unsigned int id)
vmx_disable_intercept_for_msr(msr_bitmap, MSR_IA32_SYSENTER_CS, MSR_TYPE_RW);
vmx_disable_intercept_for_msr(msr_bitmap, MSR_IA32_SYSENTER_ESP, MSR_TYPE_RW);
vmx_disable_intercept_for_msr(msr_bitmap, MSR_IA32_SYSENTER_EIP, MSR_TYPE_RW);
if (kvm_cstate_in_guest(kvm)) {
if (kvm_cstate_in_guest(vcpu->kvm)) {
vmx_disable_intercept_for_msr(msr_bitmap, MSR_CORE_C1_RES, MSR_TYPE_R);
vmx_disable_intercept_for_msr(msr_bitmap, MSR_CORE_C3_RESIDENCY, MSR_TYPE_R);
vmx_disable_intercept_for_msr(msr_bitmap, MSR_CORE_C6_RESIDENCY, MSR_TYPE_R);
@ -6741,19 +6731,19 @@ static struct kvm_vcpu *vmx_create_vcpu(struct kvm *kvm, unsigned int id)
vmx->loaded_vmcs = &vmx->vmcs01;
cpu = get_cpu();
vmx_vcpu_load(&vmx->vcpu, cpu);
vmx->vcpu.cpu = cpu;
vmx_vcpu_load(vcpu, cpu);
vcpu->cpu = cpu;
init_vmcs(vmx);
vmx_vcpu_put(&vmx->vcpu);
vmx_vcpu_put(vcpu);
put_cpu();
if (cpu_need_virtualize_apic_accesses(&vmx->vcpu)) {
err = alloc_apic_access_page(kvm);
if (cpu_need_virtualize_apic_accesses(vcpu)) {
err = alloc_apic_access_page(vcpu->kvm);
if (err)
goto free_vmcs;
}
if (enable_ept && !enable_unrestricted_guest) {
err = init_rmode_identity_map(kvm);
err = init_rmode_identity_map(vcpu->kvm);
if (err)
goto free_vmcs;
}
@ -6761,7 +6751,7 @@ static struct kvm_vcpu *vmx_create_vcpu(struct kvm *kvm, unsigned int id)
if (nested)
nested_vmx_setup_ctls_msrs(&vmx->nested.msrs,
vmx_capability.ept,
kvm_vcpu_apicv_active(&vmx->vcpu));
kvm_vcpu_apicv_active(vcpu));
else
memset(&vmx->nested.msrs, 0, sizeof(vmx->nested.msrs));
@ -6779,22 +6769,15 @@ static struct kvm_vcpu *vmx_create_vcpu(struct kvm *kvm, unsigned int id)
vmx->ept_pointer = INVALID_PAGE;
return &vmx->vcpu;
return 0;
free_vmcs:
free_loaded_vmcs(vmx->loaded_vmcs);
free_pml:
vmx_destroy_pml_buffer(vmx);
uninit_vcpu:
kvm_vcpu_uninit(&vmx->vcpu);
free_vcpu:
free_vpid:
free_vpid(vmx->vpid);
kmem_cache_free(x86_fpu_cache, vmx->vcpu.arch.guest_fpu);
free_user_fpu:
kmem_cache_free(x86_fpu_cache, vmx->vcpu.arch.user_fpu);
free_partial_vcpu:
kmem_cache_free(kvm_vcpu_cache, vmx);
return ERR_PTR(err);
return err;
}
#define L1TF_MSG_SMT "L1TF CPU bug present and SMT on, data leak possible. See CVE-2018-3646 and https://www.kernel.org/doc/html/latest/admin-guide/hw-vuln/l1tf.html for details.\n"
@ -6946,28 +6929,28 @@ static void nested_vmx_cr_fixed1_bits_update(struct kvm_vcpu *vcpu)
} while (0)
entry = kvm_find_cpuid_entry(vcpu, 0x1, 0);
cr4_fixed1_update(X86_CR4_VME, edx, bit(X86_FEATURE_VME));
cr4_fixed1_update(X86_CR4_PVI, edx, bit(X86_FEATURE_VME));
cr4_fixed1_update(X86_CR4_TSD, edx, bit(X86_FEATURE_TSC));
cr4_fixed1_update(X86_CR4_DE, edx, bit(X86_FEATURE_DE));
cr4_fixed1_update(X86_CR4_PSE, edx, bit(X86_FEATURE_PSE));
cr4_fixed1_update(X86_CR4_PAE, edx, bit(X86_FEATURE_PAE));
cr4_fixed1_update(X86_CR4_MCE, edx, bit(X86_FEATURE_MCE));
cr4_fixed1_update(X86_CR4_PGE, edx, bit(X86_FEATURE_PGE));
cr4_fixed1_update(X86_CR4_OSFXSR, edx, bit(X86_FEATURE_FXSR));
cr4_fixed1_update(X86_CR4_OSXMMEXCPT, edx, bit(X86_FEATURE_XMM));
cr4_fixed1_update(X86_CR4_VMXE, ecx, bit(X86_FEATURE_VMX));
cr4_fixed1_update(X86_CR4_SMXE, ecx, bit(X86_FEATURE_SMX));
cr4_fixed1_update(X86_CR4_PCIDE, ecx, bit(X86_FEATURE_PCID));
cr4_fixed1_update(X86_CR4_OSXSAVE, ecx, bit(X86_FEATURE_XSAVE));
cr4_fixed1_update(X86_CR4_VME, edx, feature_bit(VME));
cr4_fixed1_update(X86_CR4_PVI, edx, feature_bit(VME));
cr4_fixed1_update(X86_CR4_TSD, edx, feature_bit(TSC));
cr4_fixed1_update(X86_CR4_DE, edx, feature_bit(DE));
cr4_fixed1_update(X86_CR4_PSE, edx, feature_bit(PSE));
cr4_fixed1_update(X86_CR4_PAE, edx, feature_bit(PAE));
cr4_fixed1_update(X86_CR4_MCE, edx, feature_bit(MCE));
cr4_fixed1_update(X86_CR4_PGE, edx, feature_bit(PGE));
cr4_fixed1_update(X86_CR4_OSFXSR, edx, feature_bit(FXSR));
cr4_fixed1_update(X86_CR4_OSXMMEXCPT, edx, feature_bit(XMM));
cr4_fixed1_update(X86_CR4_VMXE, ecx, feature_bit(VMX));
cr4_fixed1_update(X86_CR4_SMXE, ecx, feature_bit(SMX));
cr4_fixed1_update(X86_CR4_PCIDE, ecx, feature_bit(PCID));
cr4_fixed1_update(X86_CR4_OSXSAVE, ecx, feature_bit(XSAVE));
entry = kvm_find_cpuid_entry(vcpu, 0x7, 0);
cr4_fixed1_update(X86_CR4_FSGSBASE, ebx, bit(X86_FEATURE_FSGSBASE));
cr4_fixed1_update(X86_CR4_SMEP, ebx, bit(X86_FEATURE_SMEP));
cr4_fixed1_update(X86_CR4_SMAP, ebx, bit(X86_FEATURE_SMAP));
cr4_fixed1_update(X86_CR4_PKE, ecx, bit(X86_FEATURE_PKU));
cr4_fixed1_update(X86_CR4_UMIP, ecx, bit(X86_FEATURE_UMIP));
cr4_fixed1_update(X86_CR4_LA57, ecx, bit(X86_FEATURE_LA57));
cr4_fixed1_update(X86_CR4_FSGSBASE, ebx, feature_bit(FSGSBASE));
cr4_fixed1_update(X86_CR4_SMEP, ebx, feature_bit(SMEP));
cr4_fixed1_update(X86_CR4_SMAP, ebx, feature_bit(SMAP));
cr4_fixed1_update(X86_CR4_PKE, ecx, feature_bit(PKU));
cr4_fixed1_update(X86_CR4_UMIP, ecx, feature_bit(UMIP));
cr4_fixed1_update(X86_CR4_LA57, ecx, feature_bit(LA57));
#undef cr4_fixed1_update
}
@ -7101,7 +7084,7 @@ static void vmx_cpuid_update(struct kvm_vcpu *vcpu)
static void vmx_set_supported_cpuid(u32 func, struct kvm_cpuid_entry2 *entry)
{
if (func == 1 && nested)
entry->ecx |= bit(X86_FEATURE_VMX);
entry->ecx |= feature_bit(VMX);
}
static void vmx_request_immediate_exit(struct kvm_vcpu *vcpu)
@ -7843,6 +7826,7 @@ static struct kvm_x86_ops vmx_x86_ops __ro_after_init = {
.xsaves_supported = vmx_xsaves_supported,
.umip_emulated = vmx_umip_emulated,
.pt_supported = vmx_pt_supported,
.pku_supported = vmx_pku_supported,
.request_immediate_exit = vmx_request_immediate_exit,


@ -93,6 +93,8 @@ u64 __read_mostly efer_reserved_bits = ~((u64)(EFER_SCE | EFER_LME | EFER_LMA));
static u64 __read_mostly efer_reserved_bits = ~((u64)EFER_SCE);
#endif
static u64 __read_mostly cr4_reserved_bits = CR4_RESERVED_BITS;
#define VM_STAT(x, ...) offsetof(struct kvm, stat.x), KVM_STAT_VM, ## __VA_ARGS__
#define VCPU_STAT(x, ...) offsetof(struct kvm_vcpu, stat.x), KVM_STAT_VCPU, ## __VA_ARGS__
@ -879,30 +881,44 @@ int kvm_set_xcr(struct kvm_vcpu *vcpu, u32 index, u64 xcr)
}
EXPORT_SYMBOL_GPL(kvm_set_xcr);
#define __cr4_reserved_bits(__cpu_has, __c) \
({ \
u64 __reserved_bits = CR4_RESERVED_BITS; \
\
if (!__cpu_has(__c, X86_FEATURE_XSAVE)) \
__reserved_bits |= X86_CR4_OSXSAVE; \
if (!__cpu_has(__c, X86_FEATURE_SMEP)) \
__reserved_bits |= X86_CR4_SMEP; \
if (!__cpu_has(__c, X86_FEATURE_SMAP)) \
__reserved_bits |= X86_CR4_SMAP; \
if (!__cpu_has(__c, X86_FEATURE_FSGSBASE)) \
__reserved_bits |= X86_CR4_FSGSBASE; \
if (!__cpu_has(__c, X86_FEATURE_PKU)) \
__reserved_bits |= X86_CR4_PKE; \
if (!__cpu_has(__c, X86_FEATURE_LA57)) \
__reserved_bits |= X86_CR4_LA57; \
__reserved_bits; \
})
static u64 kvm_host_cr4_reserved_bits(struct cpuinfo_x86 *c)
{
u64 reserved_bits = __cr4_reserved_bits(cpu_has, c);
if (cpuid_ecx(0x7) & feature_bit(LA57))
reserved_bits &= ~X86_CR4_LA57;
if (kvm_x86_ops->umip_emulated())
reserved_bits &= ~X86_CR4_UMIP;
return reserved_bits;
}
static int kvm_valid_cr4(struct kvm_vcpu *vcpu, unsigned long cr4)
{
if (cr4 & CR4_RESERVED_BITS)
if (cr4 & cr4_reserved_bits)
return -EINVAL;
if (!guest_cpuid_has(vcpu, X86_FEATURE_XSAVE) && (cr4 & X86_CR4_OSXSAVE))
return -EINVAL;
if (!guest_cpuid_has(vcpu, X86_FEATURE_SMEP) && (cr4 & X86_CR4_SMEP))
return -EINVAL;
if (!guest_cpuid_has(vcpu, X86_FEATURE_SMAP) && (cr4 & X86_CR4_SMAP))
return -EINVAL;
if (!guest_cpuid_has(vcpu, X86_FEATURE_FSGSBASE) && (cr4 & X86_CR4_FSGSBASE))
return -EINVAL;
if (!guest_cpuid_has(vcpu, X86_FEATURE_PKU) && (cr4 & X86_CR4_PKE))
return -EINVAL;
if (!guest_cpuid_has(vcpu, X86_FEATURE_LA57) && (cr4 & X86_CR4_LA57))
return -EINVAL;
if (!guest_cpuid_has(vcpu, X86_FEATURE_UMIP) && (cr4 & X86_CR4_UMIP))
if (cr4 & __cr4_reserved_bits(guest_cpuid_has, vcpu))
return -EINVAL;
return 0;
@ -1047,9 +1063,11 @@ static u64 kvm_dr6_fixed(struct kvm_vcpu *vcpu)
static int __kvm_set_dr(struct kvm_vcpu *vcpu, int dr, unsigned long val)
{
size_t size = ARRAY_SIZE(vcpu->arch.db);
switch (dr) {
case 0 ... 3:
vcpu->arch.db[dr] = val;
vcpu->arch.db[array_index_nospec(dr, size)] = val;
if (!(vcpu->guest_debug & KVM_GUESTDBG_USE_HW_BP))
vcpu->arch.eff_db[dr] = val;
break;
@ -1064,7 +1082,7 @@ static int __kvm_set_dr(struct kvm_vcpu *vcpu, int dr, unsigned long val)
case 5:
/* fall through */
default: /* 7 */
if (val & 0xffffffff00000000ULL)
if (!kvm_dr7_valid(val))
return -1; /* #GP */
vcpu->arch.dr7 = (val & DR7_VOLATILE) | DR7_FIXED_1;
kvm_update_dr7(vcpu);
@ -1086,9 +1104,11 @@ EXPORT_SYMBOL_GPL(kvm_set_dr);
int kvm_get_dr(struct kvm_vcpu *vcpu, int dr, unsigned long *val)
{
size_t size = ARRAY_SIZE(vcpu->arch.db);
switch (dr) {
case 0 ... 3:
*val = vcpu->arch.db[dr];
*val = vcpu->arch.db[array_index_nospec(dr, size)];
break;
case 4:
/* fall through */
@ -1212,6 +1232,7 @@ static const u32 emulated_msrs_all[] = {
MSR_MISC_FEATURES_ENABLES,
MSR_AMD64_VIRT_SPEC_CTRL,
MSR_IA32_POWER_CTL,
MSR_IA32_UCODE_REV,
/*
* The following list leaves out MSRs whose values are determined
@ -1525,6 +1546,49 @@ int kvm_emulate_wrmsr(struct kvm_vcpu *vcpu)
}
EXPORT_SYMBOL_GPL(kvm_emulate_wrmsr);
/*
* The fast path for frequent and performance-sensitive wrmsr emulation,
* i.e. the sending of an IPI.  Handling the IPI send early in the VM-Exit
* flow reduces the latency of a virtual IPI by avoiding the expensive bits
* of transitioning from guest to host, e.g. reacquiring KVM's SRCU lock.
* In contrast, the other MSR cases must be handled after interrupts are
* enabled on the host.
*/
static int handle_fastpath_set_x2apic_icr_irqoff(struct kvm_vcpu *vcpu, u64 data)
{
if (lapic_in_kernel(vcpu) && apic_x2apic_mode(vcpu->arch.apic) &&
((data & APIC_DEST_MASK) == APIC_DEST_PHYSICAL) &&
((data & APIC_MODE_MASK) == APIC_DM_FIXED)) {
kvm_lapic_set_reg(vcpu->arch.apic, APIC_ICR2, (u32)(data >> 32));
return kvm_lapic_reg_write(vcpu->arch.apic, APIC_ICR, (u32)data);
}
return 1;
}
enum exit_fastpath_completion handle_fastpath_set_msr_irqoff(struct kvm_vcpu *vcpu)
{
u32 msr = kvm_rcx_read(vcpu);
u64 data = kvm_read_edx_eax(vcpu);
int ret = 0;
switch (msr) {
case APIC_BASE_MSR + (APIC_ICR >> 4):
ret = handle_fastpath_set_x2apic_icr_irqoff(vcpu, data);
break;
default:
return EXIT_FASTPATH_NONE;
}
if (!ret) {
trace_kvm_msr_write(msr, data);
return EXIT_FASTPATH_SKIP_EMUL_INS;
}
return EXIT_FASTPATH_NONE;
}
EXPORT_SYMBOL_GPL(handle_fastpath_set_msr_irqoff);
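
Note: a minimal, hypothetical sketch of how a vendor exit handler could consume this fast path. The corresponding vmx.c hunk is not part of this excerpt, so the function name and the EXIT_REASON_MSR_WRITE dispatch below are illustrative rather than the literal upstream code.

/* Hypothetical caller, invoked while interrupts are still disabled after VM-Exit. */
static enum exit_fastpath_completion
example_handle_exit_irqoff(struct kvm_vcpu *vcpu, u32 exit_reason)
{
        /* Only WRMSR exits have a fast path here; everything else takes the slow path. */
        if (exit_reason == EXIT_REASON_MSR_WRITE)
                return handle_fastpath_set_msr_irqoff(vcpu);

        return EXIT_FASTPATH_NONE;
}

On EXIT_FASTPATH_SKIP_EMUL_INS the caller is expected to skip the emulated WRMSR itself instead of taking the full handle_exit() path.
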
/*
* Adapt set_msr() to msr_io()'s calling convention
*/
@ -2485,7 +2549,10 @@ static int set_msr_mce(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
default:
if (msr >= MSR_IA32_MC0_CTL &&
msr < MSR_IA32_MCx_CTL(bank_num)) {
u32 offset = msr - MSR_IA32_MC0_CTL;
u32 offset = array_index_nospec(
msr - MSR_IA32_MC0_CTL,
MSR_IA32_MCx_CTL(bank_num) - MSR_IA32_MC0_CTL);
/* only 0 or all 1s can be written to IA32_MCi_CTL
* some Linux kernels though clear bit 10 in bank 4 to
* workaround a BIOS/GART TBL issue on AMD K8s, ignore
@ -2581,45 +2648,47 @@ static void kvm_vcpu_flush_tlb(struct kvm_vcpu *vcpu, bool invalidate_gpa)
static void record_steal_time(struct kvm_vcpu *vcpu)
{
struct kvm_host_map map;
struct kvm_steal_time *st;
if (!(vcpu->arch.st.msr_val & KVM_MSR_ENABLED))
return;
if (unlikely(kvm_read_guest_cached(vcpu->kvm, &vcpu->arch.st.stime,
&vcpu->arch.st.steal, sizeof(struct kvm_steal_time))))
/* -EAGAIN is returned in atomic context so we can just return. */
if (kvm_map_gfn(vcpu, vcpu->arch.st.msr_val >> PAGE_SHIFT,
&map, &vcpu->arch.st.cache, false))
return;
st = map.hva +
offset_in_page(vcpu->arch.st.msr_val & KVM_STEAL_VALID_BITS);
/*
* Doing a TLB flush here, on the guest's behalf, can avoid
* expensive IPIs.
*/
trace_kvm_pv_tlb_flush(vcpu->vcpu_id,
vcpu->arch.st.steal.preempted & KVM_VCPU_FLUSH_TLB);
if (xchg(&vcpu->arch.st.steal.preempted, 0) & KVM_VCPU_FLUSH_TLB)
st->preempted & KVM_VCPU_FLUSH_TLB);
if (xchg(&st->preempted, 0) & KVM_VCPU_FLUSH_TLB)
kvm_vcpu_flush_tlb(vcpu, false);
if (vcpu->arch.st.steal.version & 1)
vcpu->arch.st.steal.version += 1; /* first time write, random junk */
vcpu->arch.st.preempted = 0;
vcpu->arch.st.steal.version += 1;
if (st->version & 1)
st->version += 1; /* first time write, random junk */
kvm_write_guest_cached(vcpu->kvm, &vcpu->arch.st.stime,
&vcpu->arch.st.steal, sizeof(struct kvm_steal_time));
st->version += 1;
smp_wmb();
vcpu->arch.st.steal.steal += current->sched_info.run_delay -
st->steal += current->sched_info.run_delay -
vcpu->arch.st.last_steal;
vcpu->arch.st.last_steal = current->sched_info.run_delay;
kvm_write_guest_cached(vcpu->kvm, &vcpu->arch.st.stime,
&vcpu->arch.st.steal, sizeof(struct kvm_steal_time));
smp_wmb();
vcpu->arch.st.steal.version += 1;
st->version += 1;
kvm_write_guest_cached(vcpu->kvm, &vcpu->arch.st.stime,
&vcpu->arch.st.steal, sizeof(struct kvm_steal_time));
kvm_unmap_gfn(vcpu, &map, &vcpu->arch.st.cache, true, false);
}
int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
@ -2786,11 +2855,6 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
if (data & KVM_STEAL_RESERVED_MASK)
return 1;
if (kvm_gfn_to_hva_cache_init(vcpu->kvm, &vcpu->arch.st.stime,
data & KVM_STEAL_VALID_BITS,
sizeof(struct kvm_steal_time)))
return 1;
vcpu->arch.st.msr_val = data;
if (!(data & KVM_MSR_ENABLED))
@ -2926,7 +2990,10 @@ static int get_msr_mce(struct kvm_vcpu *vcpu, u32 msr, u64 *pdata, bool host)
default:
if (msr >= MSR_IA32_MC0_CTL &&
msr < MSR_IA32_MCx_CTL(bank_num)) {
u32 offset = msr - MSR_IA32_MC0_CTL;
u32 offset = array_index_nospec(
msr - MSR_IA32_MC0_CTL,
MSR_IA32_MCx_CTL(bank_num) - MSR_IA32_MC0_CTL);
data = vcpu->arch.mce_banks[offset];
break;
}
@ -3458,10 +3525,6 @@ void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
kvm_x86_ops->vcpu_load(vcpu, cpu);
fpregs_assert_state_consistent();
if (test_thread_flag(TIF_NEED_FPU_LOAD))
switch_fpu_return();
/* Apply any externally detected TSC adjustments (due to suspend) */
if (unlikely(vcpu->arch.tsc_offset_adjustment)) {
adjust_tsc_offset_host(vcpu, vcpu->arch.tsc_offset_adjustment);
@ -3501,15 +3564,25 @@ void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
static void kvm_steal_time_set_preempted(struct kvm_vcpu *vcpu)
{
struct kvm_host_map map;
struct kvm_steal_time *st;
if (!(vcpu->arch.st.msr_val & KVM_MSR_ENABLED))
return;
vcpu->arch.st.steal.preempted = KVM_VCPU_PREEMPTED;
if (vcpu->arch.st.preempted)
return;
kvm_write_guest_offset_cached(vcpu->kvm, &vcpu->arch.st.stime,
&vcpu->arch.st.steal.preempted,
offsetof(struct kvm_steal_time, preempted),
sizeof(vcpu->arch.st.steal.preempted));
if (kvm_map_gfn(vcpu, vcpu->arch.st.msr_val >> PAGE_SHIFT, &map,
&vcpu->arch.st.cache, true))
return;
st = map.hva +
offset_in_page(vcpu->arch.st.msr_val & KVM_STEAL_VALID_BITS);
st->preempted = vcpu->arch.st.preempted = KVM_VCPU_PREEMPTED;
kvm_unmap_gfn(vcpu, &map, &vcpu->arch.st.cache, true, true);
}
void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
@ -4660,9 +4733,6 @@ static int kvm_vm_ioctl_reinject(struct kvm *kvm,
{
struct kvm_pit *pit = kvm->arch.vpit;
if (!pit)
return -ENXIO;
/* pit->pit_state.lock was overloaded to prevent userspace from getting
* an inconsistent state after running multiple KVM_REINJECT_CONTROL
* ioctls in parallel. Use a separate lock if that ioctl isn't rare.
@ -5029,6 +5099,9 @@ set_identity_unlock:
r = -EFAULT;
if (copy_from_user(&control, argp, sizeof(control)))
goto out;
r = -ENXIO;
if (!kvm->arch.vpit)
goto out;
r = kvm_vm_ioctl_reinject(kvm, &control);
break;
}
@ -6186,6 +6259,21 @@ static bool emulator_get_cpuid(struct x86_emulate_ctxt *ctxt,
return kvm_cpuid(emul_to_vcpu(ctxt), eax, ebx, ecx, edx, check_limit);
}
static bool emulator_guest_has_long_mode(struct x86_emulate_ctxt *ctxt)
{
return guest_cpuid_has(emul_to_vcpu(ctxt), X86_FEATURE_LM);
}
static bool emulator_guest_has_movbe(struct x86_emulate_ctxt *ctxt)
{
return guest_cpuid_has(emul_to_vcpu(ctxt), X86_FEATURE_MOVBE);
}
static bool emulator_guest_has_fxsr(struct x86_emulate_ctxt *ctxt)
{
return guest_cpuid_has(emul_to_vcpu(ctxt), X86_FEATURE_FXSR);
}
static ulong emulator_read_gpr(struct x86_emulate_ctxt *ctxt, unsigned reg)
{
return kvm_register_read(emul_to_vcpu(ctxt), reg);
@ -6263,6 +6351,9 @@ static const struct x86_emulate_ops emulate_ops = {
.fix_hypercall = emulator_fix_hypercall,
.intercept = emulator_intercept,
.get_cpuid = emulator_get_cpuid,
.guest_has_long_mode = emulator_guest_has_long_mode,
.guest_has_movbe = emulator_guest_has_movbe,
.guest_has_fxsr = emulator_guest_has_fxsr,
.set_nmi_mask = emulator_set_nmi_mask,
.get_hflags = emulator_get_hflags,
.set_hflags = emulator_set_hflags,
@ -6379,11 +6470,11 @@ static int handle_emulation_failure(struct kvm_vcpu *vcpu, int emulation_type)
return 1;
}
static bool reexecute_instruction(struct kvm_vcpu *vcpu, gva_t cr2,
static bool reexecute_instruction(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
bool write_fault_to_shadow_pgtable,
int emulation_type)
{
gpa_t gpa = cr2;
gpa_t gpa = cr2_or_gpa;
kvm_pfn_t pfn;
if (!(emulation_type & EMULTYPE_ALLOW_RETRY))
@ -6397,7 +6488,7 @@ static bool reexecute_instruction(struct kvm_vcpu *vcpu, gva_t cr2,
* Write permission should be allowed since only
* write access need to be emulated.
*/
gpa = kvm_mmu_gva_to_gpa_write(vcpu, cr2, NULL);
gpa = kvm_mmu_gva_to_gpa_write(vcpu, cr2_or_gpa, NULL);
/*
* If the mapping is invalid in guest, let cpu retry
@ -6454,10 +6545,10 @@ static bool reexecute_instruction(struct kvm_vcpu *vcpu, gva_t cr2,
}
static bool retry_instruction(struct x86_emulate_ctxt *ctxt,
unsigned long cr2, int emulation_type)
gpa_t cr2_or_gpa, int emulation_type)
{
struct kvm_vcpu *vcpu = emul_to_vcpu(ctxt);
unsigned long last_retry_eip, last_retry_addr, gpa = cr2;
unsigned long last_retry_eip, last_retry_addr, gpa = cr2_or_gpa;
last_retry_eip = vcpu->arch.last_retry_eip;
last_retry_addr = vcpu->arch.last_retry_addr;
@ -6486,14 +6577,14 @@ static bool retry_instruction(struct x86_emulate_ctxt *ctxt,
if (x86_page_table_writing_insn(ctxt))
return false;
if (ctxt->eip == last_retry_eip && last_retry_addr == cr2)
if (ctxt->eip == last_retry_eip && last_retry_addr == cr2_or_gpa)
return false;
vcpu->arch.last_retry_eip = ctxt->eip;
vcpu->arch.last_retry_addr = cr2;
vcpu->arch.last_retry_addr = cr2_or_gpa;
if (!vcpu->arch.mmu->direct_map)
gpa = kvm_mmu_gva_to_gpa_write(vcpu, cr2, NULL);
gpa = kvm_mmu_gva_to_gpa_write(vcpu, cr2_or_gpa, NULL);
kvm_mmu_unprotect_page(vcpu->kvm, gpa_to_gfn(gpa));
@ -6639,11 +6730,8 @@ static bool is_vmware_backdoor_opcode(struct x86_emulate_ctxt *ctxt)
return false;
}
int x86_emulate_instruction(struct kvm_vcpu *vcpu,
unsigned long cr2,
int emulation_type,
void *insn,
int insn_len)
int x86_emulate_instruction(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
int emulation_type, void *insn, int insn_len)
{
int r;
struct x86_emulate_ctxt *ctxt = &vcpu->arch.emulate_ctxt;
@ -6689,8 +6777,9 @@ int x86_emulate_instruction(struct kvm_vcpu *vcpu,
kvm_queue_exception(vcpu, UD_VECTOR);
return 1;
}
if (reexecute_instruction(vcpu, cr2, write_fault_to_spt,
emulation_type))
if (reexecute_instruction(vcpu, cr2_or_gpa,
write_fault_to_spt,
emulation_type))
return 1;
if (ctxt->have_exception) {
/*
@ -6724,7 +6813,7 @@ int x86_emulate_instruction(struct kvm_vcpu *vcpu,
return 1;
}
if (retry_instruction(ctxt, cr2, emulation_type))
if (retry_instruction(ctxt, cr2_or_gpa, emulation_type))
return 1;
/* this is needed for vmware backdoor interface to work since it
@ -6736,7 +6825,7 @@ int x86_emulate_instruction(struct kvm_vcpu *vcpu,
restart:
/* Save the faulting GPA (cr2) in the address field */
ctxt->exception.address = cr2;
ctxt->exception.address = cr2_or_gpa;
r = x86_emulate_insn(ctxt);
@ -6744,7 +6833,7 @@ restart:
return 1;
if (r == EMULATION_FAILED) {
if (reexecute_instruction(vcpu, cr2, write_fault_to_spt,
if (reexecute_instruction(vcpu, cr2_or_gpa, write_fault_to_spt,
emulation_type))
return 1;
@ -7357,8 +7446,8 @@ static void kvm_pv_kick_cpu_op(struct kvm *kvm, unsigned long flags, int apicid)
{
struct kvm_lapic_irq lapic_irq;
lapic_irq.shorthand = 0;
lapic_irq.dest_mode = 0;
lapic_irq.shorthand = APIC_DEST_NOSHORT;
lapic_irq.dest_mode = APIC_DEST_PHYSICAL;
lapic_irq.level = 0;
lapic_irq.dest_id = apicid;
lapic_irq.msi_redir_hint = false;
@ -7997,6 +8086,7 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
bool req_int_win =
dm_request_for_irq_injection(vcpu) &&
kvm_cpu_accept_dm_intr(vcpu);
enum exit_fastpath_completion exit_fastpath = EXIT_FASTPATH_NONE;
bool req_immediate_exit = false;
@ -8198,8 +8288,9 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
trace_kvm_entry(vcpu->vcpu_id);
guest_enter_irqoff();
/* The preempt notifier should have taken care of the FPU already. */
WARN_ON_ONCE(test_thread_flag(TIF_NEED_FPU_LOAD));
fpregs_assert_state_consistent();
if (test_thread_flag(TIF_NEED_FPU_LOAD))
switch_fpu_return();
if (unlikely(vcpu->arch.switch_db_regs)) {
set_debugreg(0, 7);
@ -8243,7 +8334,7 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
vcpu->mode = OUTSIDE_GUEST_MODE;
smp_wmb();
kvm_x86_ops->handle_exit_irqoff(vcpu);
kvm_x86_ops->handle_exit_irqoff(vcpu, &exit_fastpath);
/*
* Consume any pending interrupts, including the possible source of
@ -8287,7 +8378,7 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
kvm_lapic_sync_from_vapic(vcpu);
vcpu->arch.gpa_available = false;
r = kvm_x86_ops->handle_exit(vcpu);
r = kvm_x86_ops->handle_exit(vcpu, exit_fastpath);
return r;
cancel_injection:
@ -8471,12 +8562,26 @@ static int complete_emulated_mmio(struct kvm_vcpu *vcpu)
return 0;
}
static void kvm_save_current_fpu(struct fpu *fpu)
{
/*
* If the target FPU state is not resident in the CPU registers, just
* memcpy() from current, else save CPU state directly to the target.
*/
if (test_thread_flag(TIF_NEED_FPU_LOAD))
memcpy(&fpu->state, &current->thread.fpu.state,
fpu_kernel_xstate_size);
else
copy_fpregs_to_fpstate(fpu);
}
/* Swap (qemu) user FPU context for the guest FPU context. */
static void kvm_load_guest_fpu(struct kvm_vcpu *vcpu)
{
fpregs_lock();
copy_fpregs_to_fpstate(vcpu->arch.user_fpu);
kvm_save_current_fpu(vcpu->arch.user_fpu);
/* PKRU is separately restored in kvm_x86_ops->run. */
__copy_kernel_to_fpregs(&vcpu->arch.guest_fpu->state,
~XFEATURE_MASK_PKRU);
@ -8492,7 +8597,8 @@ static void kvm_put_guest_fpu(struct kvm_vcpu *vcpu)
{
fpregs_lock();
copy_fpregs_to_fpstate(vcpu->arch.guest_fpu);
kvm_save_current_fpu(vcpu->arch.guest_fpu);
copy_kernel_to_fpregs(&vcpu->arch.user_fpu->state);
fpregs_mark_activate();
@ -8714,6 +8820,8 @@ int kvm_arch_vcpu_ioctl_get_mpstate(struct kvm_vcpu *vcpu,
struct kvm_mp_state *mp_state)
{
vcpu_load(vcpu);
if (kvm_mpx_supported())
kvm_load_guest_fpu(vcpu);
kvm_apic_accept_events(vcpu);
if (vcpu->arch.mp_state == KVM_MP_STATE_HALTED &&
@ -8722,6 +8830,8 @@ int kvm_arch_vcpu_ioctl_get_mpstate(struct kvm_vcpu *vcpu,
else
mp_state->mp_state = vcpu->arch.mp_state;
if (kvm_mpx_supported())
kvm_put_guest_fpu(vcpu);
vcpu_put(vcpu);
return 0;
}
@ -9082,33 +9192,90 @@ static void fx_init(struct kvm_vcpu *vcpu)
vcpu->arch.cr0 |= X86_CR0_ET;
}
void kvm_arch_vcpu_free(struct kvm_vcpu *vcpu)
int kvm_arch_vcpu_precreate(struct kvm *kvm, unsigned int id)
{
void *wbinvd_dirty_mask = vcpu->arch.wbinvd_dirty_mask;
kvmclock_reset(vcpu);
kvm_x86_ops->vcpu_free(vcpu);
free_cpumask_var(wbinvd_dirty_mask);
}
struct kvm_vcpu *kvm_arch_vcpu_create(struct kvm *kvm,
unsigned int id)
{
struct kvm_vcpu *vcpu;
if (kvm_check_tsc_unstable() && atomic_read(&kvm->online_vcpus) != 0)
printk_once(KERN_WARNING
"kvm: SMP vm created on host with unstable TSC; "
"guest TSC will not be reliable\n");
pr_warn_once("kvm: SMP vm created on host with unstable TSC; "
"guest TSC will not be reliable\n");
vcpu = kvm_x86_ops->vcpu_create(kvm, id);
return vcpu;
return 0;
}
int kvm_arch_vcpu_setup(struct kvm_vcpu *vcpu)
int kvm_arch_vcpu_create(struct kvm_vcpu *vcpu)
{
struct page *page;
int r;
vcpu->arch.emulate_ctxt.ops = &emulate_ops;
if (!irqchip_in_kernel(vcpu->kvm) || kvm_vcpu_is_reset_bsp(vcpu))
vcpu->arch.mp_state = KVM_MP_STATE_RUNNABLE;
else
vcpu->arch.mp_state = KVM_MP_STATE_UNINITIALIZED;
kvm_set_tsc_khz(vcpu, max_tsc_khz);
r = kvm_mmu_create(vcpu);
if (r < 0)
return r;
if (irqchip_in_kernel(vcpu->kvm)) {
vcpu->arch.apicv_active = kvm_x86_ops->get_enable_apicv(vcpu->kvm);
r = kvm_create_lapic(vcpu, lapic_timer_advance_ns);
if (r < 0)
goto fail_mmu_destroy;
} else
static_key_slow_inc(&kvm_no_apic_vcpu);
r = -ENOMEM;
page = alloc_page(GFP_KERNEL | __GFP_ZERO);
if (!page)
goto fail_free_lapic;
vcpu->arch.pio_data = page_address(page);
vcpu->arch.mce_banks = kzalloc(KVM_MAX_MCE_BANKS * sizeof(u64) * 4,
GFP_KERNEL_ACCOUNT);
if (!vcpu->arch.mce_banks)
goto fail_free_pio_data;
vcpu->arch.mcg_cap = KVM_MAX_MCE_BANKS;
if (!zalloc_cpumask_var(&vcpu->arch.wbinvd_dirty_mask,
GFP_KERNEL_ACCOUNT))
goto fail_free_mce_banks;
vcpu->arch.user_fpu = kmem_cache_zalloc(x86_fpu_cache,
GFP_KERNEL_ACCOUNT);
if (!vcpu->arch.user_fpu) {
pr_err("kvm: failed to allocate userspace's fpu\n");
goto free_wbinvd_dirty_mask;
}
vcpu->arch.guest_fpu = kmem_cache_zalloc(x86_fpu_cache,
GFP_KERNEL_ACCOUNT);
if (!vcpu->arch.guest_fpu) {
pr_err("kvm: failed to allocate vcpu's fpu\n");
goto free_user_fpu;
}
fx_init(vcpu);
vcpu->arch.guest_xstate_size = XSAVE_HDR_SIZE + XSAVE_HDR_OFFSET;
vcpu->arch.maxphyaddr = cpuid_query_maxphyaddr(vcpu);
vcpu->arch.pat = MSR_IA32_CR_PAT_DEFAULT;
kvm_async_pf_hash_reset(vcpu);
kvm_pmu_init(vcpu);
vcpu->arch.pending_external_vector = -1;
vcpu->arch.preempted_in_kernel = false;
kvm_hv_vcpu_init(vcpu);
r = kvm_x86_ops->vcpu_create(vcpu);
if (r)
goto free_guest_fpu;
vcpu->arch.arch_capabilities = kvm_get_arch_capabilities();
vcpu->arch.msr_platform_info = MSR_PLATFORM_INFO_CPUID_FAULT;
kvm_vcpu_mtrr_init(vcpu);
@ -9117,6 +9284,22 @@ int kvm_arch_vcpu_setup(struct kvm_vcpu *vcpu)
kvm_init_mmu(vcpu, false);
vcpu_put(vcpu);
return 0;
free_guest_fpu:
kmem_cache_free(x86_fpu_cache, vcpu->arch.guest_fpu);
free_user_fpu:
kmem_cache_free(x86_fpu_cache, vcpu->arch.user_fpu);
free_wbinvd_dirty_mask:
free_cpumask_var(vcpu->arch.wbinvd_dirty_mask);
fail_free_mce_banks:
kfree(vcpu->arch.mce_banks);
fail_free_pio_data:
free_page((unsigned long)vcpu->arch.pio_data);
fail_free_lapic:
kvm_free_lapic(vcpu);
fail_mmu_destroy:
kvm_mmu_destroy(vcpu);
return r;
}
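
Note: for readers following the cross-architecture vCPU-creation refactoring, the common code in virt/kvm/kvm_main.c is not shown in this excerpt. The outline below is a simplified sketch, under that caveat, of the order in which the hooks above are now driven; it is not the literal upstream function.

/* Simplified outline of the common creation path after the refactoring. */
static int example_create_vcpu(struct kvm *kvm, u32 id)
{
        struct kvm_vcpu *vcpu;
        int r;

        r = kvm_arch_vcpu_precreate(kvm, id);   /* cheap arch-specific checks */
        if (r)
                return r;

        vcpu = kmem_cache_zalloc(kvm_vcpu_cache, GFP_KERNEL);   /* common allocation */
        if (!vcpu)
                return -ENOMEM;

        /* ... common initialization of the vcpu struct and its run page ... */

        r = kvm_arch_vcpu_create(vcpu);         /* arch sets up MMU, LAPIC, FPU, ... */
        if (r) {
                kmem_cache_free(kvm_vcpu_cache, vcpu);
                return r;
        }

        kvm_arch_vcpu_postcreate(vcpu);
        return 0;       /* the real code also installs the vcpu fd and vcpus[] entry */
}
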
void kvm_arch_vcpu_postcreate(struct kvm_vcpu *vcpu)
@ -9149,13 +9332,29 @@ void kvm_arch_vcpu_postcreate(struct kvm_vcpu *vcpu)
void kvm_arch_vcpu_destroy(struct kvm_vcpu *vcpu)
{
vcpu->arch.apf.msr_val = 0;
struct gfn_to_pfn_cache *cache = &vcpu->arch.st.cache;
int idx;
vcpu_load(vcpu);
kvm_mmu_unload(vcpu);
vcpu_put(vcpu);
kvm_release_pfn(cache->pfn, cache->dirty, cache);
kvmclock_reset(vcpu);
kvm_x86_ops->vcpu_free(vcpu);
free_cpumask_var(vcpu->arch.wbinvd_dirty_mask);
kmem_cache_free(x86_fpu_cache, vcpu->arch.user_fpu);
kmem_cache_free(x86_fpu_cache, vcpu->arch.guest_fpu);
kvm_hv_vcpu_uninit(vcpu);
kvm_pmu_destroy(vcpu);
kfree(vcpu->arch.mce_banks);
kvm_free_lapic(vcpu);
idx = srcu_read_lock(&vcpu->kvm->srcu);
kvm_mmu_destroy(vcpu);
srcu_read_unlock(&vcpu->kvm->srcu, idx);
free_page((unsigned long)vcpu->arch.pio_data);
if (!lapic_in_kernel(vcpu))
static_key_slow_dec(&kvm_no_apic_vcpu);
}
void kvm_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event)
@ -9171,7 +9370,6 @@ void kvm_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event)
vcpu->arch.nmi_injected = false;
kvm_clear_interrupt_queue(vcpu);
kvm_clear_exception_queue(vcpu);
vcpu->arch.exception.pending = false;
memset(vcpu->arch.db, 0, sizeof(vcpu->arch.db));
kvm_update_dr0123(vcpu);
@ -9347,6 +9545,8 @@ int kvm_arch_hardware_setup(void)
if (r != 0)
return r;
cr4_reserved_bits = kvm_host_cr4_reserved_bits(&boot_cpu_data);
if (kvm_has_tsc_control) {
/*
* Make sure the user can only configure tsc_khz values that
@ -9375,6 +9575,13 @@ void kvm_arch_hardware_unsetup(void)
int kvm_arch_check_processor_compat(void)
{
struct cpuinfo_x86 *c = &cpu_data(smp_processor_id());
WARN_ON(!irqs_disabled());
if (kvm_host_cr4_reserved_bits(c) != cr4_reserved_bits)
return -EIO;
return kvm_x86_ops->check_processor_compatibility();
}
@ -9392,98 +9599,6 @@ bool kvm_vcpu_is_bsp(struct kvm_vcpu *vcpu)
struct static_key kvm_no_apic_vcpu __read_mostly;
EXPORT_SYMBOL_GPL(kvm_no_apic_vcpu);
int kvm_arch_vcpu_init(struct kvm_vcpu *vcpu)
{
struct page *page;
int r;
vcpu->arch.emulate_ctxt.ops = &emulate_ops;
if (!irqchip_in_kernel(vcpu->kvm) || kvm_vcpu_is_reset_bsp(vcpu))
vcpu->arch.mp_state = KVM_MP_STATE_RUNNABLE;
else
vcpu->arch.mp_state = KVM_MP_STATE_UNINITIALIZED;
page = alloc_page(GFP_KERNEL | __GFP_ZERO);
if (!page) {
r = -ENOMEM;
goto fail;
}
vcpu->arch.pio_data = page_address(page);
kvm_set_tsc_khz(vcpu, max_tsc_khz);
r = kvm_mmu_create(vcpu);
if (r < 0)
goto fail_free_pio_data;
if (irqchip_in_kernel(vcpu->kvm)) {
vcpu->arch.apicv_active = kvm_x86_ops->get_enable_apicv(vcpu->kvm);
r = kvm_create_lapic(vcpu, lapic_timer_advance_ns);
if (r < 0)
goto fail_mmu_destroy;
} else
static_key_slow_inc(&kvm_no_apic_vcpu);
vcpu->arch.mce_banks = kzalloc(KVM_MAX_MCE_BANKS * sizeof(u64) * 4,
GFP_KERNEL_ACCOUNT);
if (!vcpu->arch.mce_banks) {
r = -ENOMEM;
goto fail_free_lapic;
}
vcpu->arch.mcg_cap = KVM_MAX_MCE_BANKS;
if (!zalloc_cpumask_var(&vcpu->arch.wbinvd_dirty_mask,
GFP_KERNEL_ACCOUNT)) {
r = -ENOMEM;
goto fail_free_mce_banks;
}
fx_init(vcpu);
vcpu->arch.guest_xstate_size = XSAVE_HDR_SIZE + XSAVE_HDR_OFFSET;
vcpu->arch.maxphyaddr = cpuid_query_maxphyaddr(vcpu);
vcpu->arch.pat = MSR_IA32_CR_PAT_DEFAULT;
kvm_async_pf_hash_reset(vcpu);
kvm_pmu_init(vcpu);
vcpu->arch.pending_external_vector = -1;
vcpu->arch.preempted_in_kernel = false;
kvm_hv_vcpu_init(vcpu);
return 0;
fail_free_mce_banks:
kfree(vcpu->arch.mce_banks);
fail_free_lapic:
kvm_free_lapic(vcpu);
fail_mmu_destroy:
kvm_mmu_destroy(vcpu);
fail_free_pio_data:
free_page((unsigned long)vcpu->arch.pio_data);
fail:
return r;
}
void kvm_arch_vcpu_uninit(struct kvm_vcpu *vcpu)
{
int idx;
kvm_hv_vcpu_uninit(vcpu);
kvm_pmu_destroy(vcpu);
kfree(vcpu->arch.mce_banks);
kvm_free_lapic(vcpu);
idx = srcu_read_lock(&vcpu->kvm->srcu);
kvm_mmu_destroy(vcpu);
srcu_read_unlock(&vcpu->kvm->srcu, idx);
free_page((unsigned long)vcpu->arch.pio_data);
if (!lapic_in_kernel(vcpu))
static_key_slow_dec(&kvm_no_apic_vcpu);
}
void kvm_arch_sched_in(struct kvm_vcpu *vcpu, int cpu)
{
struct kvm_pmu *pmu = vcpu_to_pmu(vcpu);
@ -9558,7 +9673,7 @@ static void kvm_free_vcpus(struct kvm *kvm)
kvm_unload_vcpu_mmu(vcpu);
}
kvm_for_each_vcpu(i, vcpu, kvm)
kvm_arch_vcpu_free(vcpu);
kvm_vcpu_destroy(vcpu);
mutex_lock(&kvm->lock);
for (i = 0; i < atomic_read(&kvm->online_vcpus); i++)
@ -9627,18 +9742,6 @@ int __x86_set_memory_region(struct kvm *kvm, int id, gpa_t gpa, u32 size)
}
EXPORT_SYMBOL_GPL(__x86_set_memory_region);
int x86_set_memory_region(struct kvm *kvm, int id, gpa_t gpa, u32 size)
{
int r;
mutex_lock(&kvm->slots_lock);
r = __x86_set_memory_region(kvm, id, gpa, size);
mutex_unlock(&kvm->slots_lock);
return r;
}
EXPORT_SYMBOL_GPL(x86_set_memory_region);
void kvm_arch_pre_destroy_vm(struct kvm *kvm)
{
kvm_mmu_pre_destroy_vm(kvm);
@ -9652,9 +9755,13 @@ void kvm_arch_destroy_vm(struct kvm *kvm)
* unless the memory map has changed due to process exit
* or fd copying.
*/
x86_set_memory_region(kvm, APIC_ACCESS_PAGE_PRIVATE_MEMSLOT, 0, 0);
x86_set_memory_region(kvm, IDENTITY_PAGETABLE_PRIVATE_MEMSLOT, 0, 0);
x86_set_memory_region(kvm, TSS_PRIVATE_MEMSLOT, 0, 0);
mutex_lock(&kvm->slots_lock);
__x86_set_memory_region(kvm, APIC_ACCESS_PAGE_PRIVATE_MEMSLOT,
0, 0);
__x86_set_memory_region(kvm, IDENTITY_PAGETABLE_PRIVATE_MEMSLOT,
0, 0);
__x86_set_memory_region(kvm, TSS_PRIVATE_MEMSLOT, 0, 0);
mutex_unlock(&kvm->slots_lock);
}
if (kvm_x86_ops->vm_destroy)
kvm_x86_ops->vm_destroy(kvm);
@ -9758,11 +9865,18 @@ out_free:
void kvm_arch_memslots_updated(struct kvm *kvm, u64 gen)
{
struct kvm_vcpu *vcpu;
int i;
/*
* memslots->generation has been incremented.
* mmio generation may have reached its maximum value.
*/
kvm_mmu_invalidate_mmio_sptes(kvm, gen);
/* Force re-initialization of steal_time cache */
kvm_for_each_vcpu(i, vcpu, kvm)
kvm_vcpu_kick(vcpu);
}
int kvm_arch_prepare_memory_region(struct kvm *kvm,
@ -9792,7 +9906,7 @@ static void kvm_mmu_slot_apply_flags(struct kvm *kvm,
*
* The reason is, in case of PML, we need to set D-bit for any slots
* with dirty logging disabled in order to eliminate unnecessary GPA
* logging in PML buffer (and potential PML buffer full VMEXT). This
* logging in PML buffer (and potential PML buffer full VMEXIT). This
* guarantees leaving PML enabled during guest's lifetime won't have
* any additional overhead from PML when guest is running with dirty
* logging disabled for memory slots.
@ -10014,7 +10128,7 @@ void kvm_arch_async_page_ready(struct kvm_vcpu *vcpu, struct kvm_async_pf *work)
work->arch.cr3 != vcpu->arch.mmu->get_cr3(vcpu))
return;
vcpu->arch.mmu->page_fault(vcpu, work->gva, 0, true);
vcpu->arch.mmu->page_fault(vcpu, work->cr2_or_gpa, 0, true);
}
static inline u32 kvm_async_pf_hash_fn(gfn_t gfn)
@ -10127,7 +10241,7 @@ void kvm_arch_async_page_not_present(struct kvm_vcpu *vcpu,
{
struct x86_exception fault;
trace_kvm_async_pf_not_present(work->arch.token, work->gva);
trace_kvm_async_pf_not_present(work->arch.token, work->cr2_or_gpa);
kvm_add_async_pf_gfn(vcpu, work->arch.gfn);
if (kvm_can_deliver_async_pf(vcpu) &&
@ -10162,7 +10276,7 @@ void kvm_arch_async_page_present(struct kvm_vcpu *vcpu,
work->arch.token = ~0; /* broadcast wakeup */
else
kvm_del_async_pf_gfn(vcpu, work->arch.gfn);
trace_kvm_async_pf_ready(work->arch.token, work->gva);
trace_kvm_async_pf_ready(work->arch.token, work->cr2_or_gpa);
if (vcpu->arch.apf.msr_val & KVM_ASYNC_PF_ENABLED &&
!apf_get_user(vcpu, &val)) {
@ -10284,7 +10398,6 @@ bool kvm_vector_hashing_enabled(void)
{
return vector_hashing;
}
EXPORT_SYMBOL_GPL(kvm_vector_hashing_enabled);
bool kvm_arch_no_poll(struct kvm_vcpu *vcpu)
{
@ -10292,6 +10405,28 @@ bool kvm_arch_no_poll(struct kvm_vcpu *vcpu)
}
EXPORT_SYMBOL_GPL(kvm_arch_no_poll);
u64 kvm_spec_ctrl_valid_bits(struct kvm_vcpu *vcpu)
{
uint64_t bits = SPEC_CTRL_IBRS | SPEC_CTRL_STIBP | SPEC_CTRL_SSBD;
/* The STIBP bit doesn't fault even if it's not advertised */
if (!guest_cpuid_has(vcpu, X86_FEATURE_SPEC_CTRL) &&
!guest_cpuid_has(vcpu, X86_FEATURE_AMD_IBRS))
bits &= ~(SPEC_CTRL_IBRS | SPEC_CTRL_STIBP);
if (!boot_cpu_has(X86_FEATURE_SPEC_CTRL) &&
!boot_cpu_has(X86_FEATURE_AMD_IBRS))
bits &= ~(SPEC_CTRL_IBRS | SPEC_CTRL_STIBP);
if (!guest_cpuid_has(vcpu, X86_FEATURE_SPEC_CTRL_SSBD) &&
!guest_cpuid_has(vcpu, X86_FEATURE_AMD_SSBD))
bits &= ~SPEC_CTRL_SSBD;
if (!boot_cpu_has(X86_FEATURE_SPEC_CTRL_SSBD) &&
!boot_cpu_has(X86_FEATURE_AMD_SSBD))
bits &= ~SPEC_CTRL_SSBD;
return bits;
}
EXPORT_SYMBOL_GPL(kvm_spec_ctrl_valid_bits);
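
Note: the vendor-side consumers of kvm_spec_ctrl_valid_bits() are not in this excerpt. The snippet below is an illustrative sketch of the intended use in a guest WRMSR(IA32_SPEC_CTRL) handler, not the exact vmx/svm hunk.

/* Illustrative reserved-bit check for a guest write to IA32_SPEC_CTRL. */
static int example_set_spec_ctrl(struct kvm_vcpu *vcpu, u64 data)
{
        if (data & ~kvm_spec_ctrl_valid_bits(vcpu))
                return 1;       /* inject #GP: reserved or unadvertised bit set */

        /* ... store the value in the vendor-specific spec_ctrl field ... */
        return 0;
}
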
EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_exit);
EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_fast_mmio);


@ -144,11 +144,6 @@ static inline bool is_pae_paging(struct kvm_vcpu *vcpu)
return !is_long_mode(vcpu) && is_pae(vcpu) && is_paging(vcpu);
}
static inline u32 bit(int bitno)
{
return 1 << (bitno & 31);
}
static inline u8 vcpu_virt_addr_bits(struct kvm_vcpu *vcpu)
{
return kvm_read_cr4_bits(vcpu, X86_CR4_LA57) ? 57 : 48;
@ -166,21 +161,13 @@ static inline u64 get_canonical(u64 la, u8 vaddr_bits)
static inline bool is_noncanonical_address(u64 la, struct kvm_vcpu *vcpu)
{
#ifdef CONFIG_X86_64
return get_canonical(la, vcpu_virt_addr_bits(vcpu)) != la;
#else
return false;
#endif
}
static inline bool emul_is_noncanonical_address(u64 la,
struct x86_emulate_ctxt *ctxt)
{
#ifdef CONFIG_X86_64
return get_canonical(la, ctxt_virt_addr_bits(ctxt)) != la;
#else
return false;
#endif
}
static inline void vcpu_cache_mmio_info(struct kvm_vcpu *vcpu,
@ -289,8 +276,9 @@ int kvm_mtrr_get_msr(struct kvm_vcpu *vcpu, u32 msr, u64 *pdata);
bool kvm_mtrr_check_gfn_range_consistency(struct kvm_vcpu *vcpu, gfn_t gfn,
int page_num);
bool kvm_vector_hashing_enabled(void);
int x86_emulate_instruction(struct kvm_vcpu *vcpu, unsigned long cr2,
int x86_emulate_instruction(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
int emulation_type, void *insn, int insn_len);
enum exit_fastpath_completion handle_fastpath_set_msr_irqoff(struct kvm_vcpu *vcpu);
#define KVM_SUPPORTED_XCR0 (XFEATURE_MASK_FP | XFEATURE_MASK_SSE \
| XFEATURE_MASK_YMM | XFEATURE_MASK_BNDREGS \
@ -369,7 +357,14 @@ static inline bool kvm_pat_valid(u64 data)
return (data | ((data & 0x0202020202020202ull) << 1)) == data;
}
static inline bool kvm_dr7_valid(unsigned long data)
{
/* Bits [63:32] are reserved */
return !(data >> 32);
}
void kvm_load_guest_xsave_state(struct kvm_vcpu *vcpu);
void kvm_load_host_xsave_state(struct kvm_vcpu *vcpu);
u64 kvm_spec_ctrl_valid_bits(struct kvm_vcpu *vcpu);
#endif


@ -618,6 +618,17 @@ pte_t *lookup_address(unsigned long address, unsigned int *level)
}
EXPORT_SYMBOL_GPL(lookup_address);
/*
* Lookup the page table entry for a virtual address in a given mm. Return a
* pointer to the entry and the level of the mapping.
*/
pte_t *lookup_address_in_mm(struct mm_struct *mm, unsigned long address,
unsigned int *level)
{
return lookup_address_in_pgd(pgd_offset(mm, address), address, level);
}
EXPORT_SYMBOL_GPL(lookup_address_in_mm);
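
Note: this export exists so that KVM's MMU can size nested mappings from the host page tables (the DAX huge-page support mentioned in the cover letter). Below is a minimal, hypothetical caller, assuming only the signature above; the PG_LEVEL_* return convention mirrors the x86 enum pg_level.

/* Hypothetical example: query the host mapping level backing a user address. */
static int example_host_mapping_level(struct mm_struct *mm, unsigned long hva)
{
        unsigned int level;
        pte_t *pte = lookup_address_in_mm(mm, hva, &level);

        if (!pte || !pte_present(*pte))
                return PG_LEVEL_4K;     /* nothing usable mapped; assume the smallest size */

        return level;                   /* PG_LEVEL_4K, PG_LEVEL_2M or PG_LEVEL_1G */
}
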
static pte_t *_lookup_address_cpa(struct cpa_data *cpa, unsigned long address,
unsigned int *level)
{


@ -154,15 +154,6 @@ static inline void guest_exit_irqoff(void)
}
#endif /* CONFIG_VIRT_CPU_ACCOUNTING_GEN */
static inline void guest_enter(void)
{
unsigned long flags;
local_irq_save(flags);
guest_enter_irqoff();
local_irq_restore(flags);
}
static inline void guest_exit(void)
{
unsigned long flags;


@ -160,6 +160,7 @@ extern unsigned long thp_get_unmapped_area(struct file *filp,
extern void prep_transhuge_page(struct page *page);
extern void free_transhuge_page(struct page *page);
bool is_transparent_hugepage(struct page *page);
bool can_split_huge_page(struct page *page, int *pextra_pins);
int split_huge_page_to_list(struct page *page, struct list_head *list);
@ -308,6 +309,11 @@ static inline bool transhuge_vma_suitable(struct vm_area_struct *vma,
static inline void prep_transhuge_page(struct page *page) {}
static inline bool is_transparent_hugepage(struct page *page)
{
return false;
}
#define transparent_hugepage_flags 0UL
#define thp_get_unmapped_area NULL


@ -157,8 +157,6 @@ static inline bool is_error_page(struct page *page)
#define KVM_USERSPACE_IRQ_SOURCE_ID 0
#define KVM_IRQFD_RESAMPLE_IRQ_SOURCE_ID 1
extern struct kmem_cache *kvm_vcpu_cache;
extern struct mutex kvm_lock;
extern struct list_head vm_list;
@ -204,7 +202,7 @@ struct kvm_async_pf {
struct list_head queue;
struct kvm_vcpu *vcpu;
struct mm_struct *mm;
gva_t gva;
gpa_t cr2_or_gpa;
unsigned long addr;
struct kvm_arch_async_pf arch;
bool wakeup_all;
@ -212,8 +210,8 @@ struct kvm_async_pf {
void kvm_clear_async_pf_completion_queue(struct kvm_vcpu *vcpu);
void kvm_check_async_pf_completion(struct kvm_vcpu *vcpu);
int kvm_setup_async_pf(struct kvm_vcpu *vcpu, gva_t gva, unsigned long hva,
struct kvm_arch_async_pf *arch);
int kvm_setup_async_pf(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
unsigned long hva, struct kvm_arch_async_pf *arch);
int kvm_async_pf_wakeup_all(struct kvm_vcpu *vcpu);
#endif
@ -579,8 +577,7 @@ static inline int kvm_vcpu_get_idx(struct kvm_vcpu *vcpu)
memslot < slots->memslots + KVM_MEM_SLOTS_NUM && memslot->npages;\
memslot++)
int kvm_vcpu_init(struct kvm_vcpu *vcpu, struct kvm *kvm, unsigned id);
void kvm_vcpu_uninit(struct kvm_vcpu *vcpu);
void kvm_vcpu_destroy(struct kvm_vcpu *vcpu);
void vcpu_load(struct kvm_vcpu *vcpu);
void vcpu_put(struct kvm_vcpu *vcpu);
@ -723,10 +720,9 @@ void kvm_set_pfn_dirty(kvm_pfn_t pfn);
void kvm_set_pfn_accessed(kvm_pfn_t pfn);
void kvm_get_pfn(kvm_pfn_t pfn);
void kvm_release_pfn(kvm_pfn_t pfn, bool dirty, struct gfn_to_pfn_cache *cache);
int kvm_read_guest_page(struct kvm *kvm, gfn_t gfn, void *data, int offset,
int len);
int kvm_read_guest_atomic(struct kvm *kvm, gpa_t gpa, void *data,
unsigned long len);
int kvm_read_guest(struct kvm *kvm, gpa_t gpa, void *data, unsigned long len);
int kvm_read_guest_cached(struct kvm *kvm, struct gfn_to_hva_cache *ghc,
void *data, unsigned long len);
@ -767,7 +763,7 @@ int kvm_clear_guest_page(struct kvm *kvm, gfn_t gfn, int offset, int len);
int kvm_clear_guest(struct kvm *kvm, gpa_t gpa, unsigned long len);
struct kvm_memory_slot *gfn_to_memslot(struct kvm *kvm, gfn_t gfn);
bool kvm_is_visible_gfn(struct kvm *kvm, gfn_t gfn);
unsigned long kvm_host_page_size(struct kvm *kvm, gfn_t gfn);
unsigned long kvm_host_page_size(struct kvm_vcpu *vcpu, gfn_t gfn);
void mark_page_dirty(struct kvm *kvm, gfn_t gfn);
struct kvm_memslots *kvm_vcpu_memslots(struct kvm_vcpu *vcpu);
@ -775,8 +771,12 @@ struct kvm_memory_slot *kvm_vcpu_gfn_to_memslot(struct kvm_vcpu *vcpu, gfn_t gfn
kvm_pfn_t kvm_vcpu_gfn_to_pfn_atomic(struct kvm_vcpu *vcpu, gfn_t gfn);
kvm_pfn_t kvm_vcpu_gfn_to_pfn(struct kvm_vcpu *vcpu, gfn_t gfn);
int kvm_vcpu_map(struct kvm_vcpu *vcpu, gpa_t gpa, struct kvm_host_map *map);
int kvm_map_gfn(struct kvm_vcpu *vcpu, gfn_t gfn, struct kvm_host_map *map,
struct gfn_to_pfn_cache *cache, bool atomic);
struct page *kvm_vcpu_gfn_to_page(struct kvm_vcpu *vcpu, gfn_t gfn);
void kvm_vcpu_unmap(struct kvm_vcpu *vcpu, struct kvm_host_map *map, bool dirty);
int kvm_unmap_gfn(struct kvm_vcpu *vcpu, struct kvm_host_map *map,
struct gfn_to_pfn_cache *cache, bool dirty, bool atomic);
unsigned long kvm_vcpu_gfn_to_hva(struct kvm_vcpu *vcpu, gfn_t gfn);
unsigned long kvm_vcpu_gfn_to_hva_prot(struct kvm_vcpu *vcpu, gfn_t gfn, bool *writable);
int kvm_vcpu_read_guest_page(struct kvm_vcpu *vcpu, gfn_t gfn, void *data, int offset,
@ -867,16 +867,12 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run);
int kvm_arch_init(void *opaque);
void kvm_arch_exit(void);
int kvm_arch_vcpu_init(struct kvm_vcpu *vcpu);
void kvm_arch_vcpu_uninit(struct kvm_vcpu *vcpu);
void kvm_arch_sched_in(struct kvm_vcpu *vcpu, int cpu);
void kvm_arch_vcpu_free(struct kvm_vcpu *vcpu);
void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu);
void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu);
struct kvm_vcpu *kvm_arch_vcpu_create(struct kvm *kvm, unsigned int id);
int kvm_arch_vcpu_setup(struct kvm_vcpu *vcpu);
int kvm_arch_vcpu_precreate(struct kvm *kvm, unsigned int id);
int kvm_arch_vcpu_create(struct kvm_vcpu *vcpu);
void kvm_arch_vcpu_postcreate(struct kvm_vcpu *vcpu);
void kvm_arch_vcpu_destroy(struct kvm_vcpu *vcpu);
@ -982,10 +978,10 @@ void kvm_arch_destroy_vm(struct kvm *kvm);
void kvm_arch_sync_events(struct kvm *kvm);
int kvm_cpu_has_pending_timer(struct kvm_vcpu *vcpu);
void kvm_vcpu_kick(struct kvm_vcpu *vcpu);
bool kvm_is_reserved_pfn(kvm_pfn_t pfn);
bool kvm_is_zone_device_pfn(kvm_pfn_t pfn);
bool kvm_is_transparent_hugepage(kvm_pfn_t pfn);
struct kvm_irq_ack_notifier {
struct hlist_node link;
@ -1109,9 +1105,8 @@ enum kvm_stat_kind {
};
struct kvm_stat_data {
int offset;
int mode;
struct kvm *kvm;
struct kvm_stats_debugfs_item *dbgfs_item;
};
struct kvm_stats_debugfs_item {
@ -1120,6 +1115,10 @@ struct kvm_stats_debugfs_item {
enum kvm_stat_kind kind;
int mode;
};
#define KVM_DBGFS_GET_MODE(dbgfs_item) \
((dbgfs_item)->mode ? (dbgfs_item)->mode : 0644)
extern struct kvm_stats_debugfs_item debugfs_entries[];
extern struct dentry *kvm_debugfs_dir;
@ -1342,6 +1341,9 @@ static inline void kvm_vcpu_set_dy_eligible(struct kvm_vcpu *vcpu, bool val)
}
#endif /* CONFIG_HAVE_KVM_CPU_RELAX_INTERCEPT */
struct kvm_vcpu *kvm_get_running_vcpu(void);
struct kvm_vcpu __percpu **kvm_get_running_vcpus(void);
#ifdef CONFIG_HAVE_KVM_IRQ_BYPASS
bool kvm_arch_has_irq_bypass(void);
int kvm_arch_irq_bypass_add_producer(struct irq_bypass_consumer *,


@ -18,7 +18,7 @@ struct kvm_memslots;
enum kvm_mr_change;
#include <asm/types.h>
#include <linux/types.h>
/*
* Address types:
@ -51,4 +51,11 @@ struct gfn_to_hva_cache {
struct kvm_memory_slot *memslot;
};
struct gfn_to_pfn_cache {
u64 generation;
gfn_t gfn;
kvm_pfn_t pfn;
bool dirty;
};
#endif /* __KVM_TYPES_H__ */


@ -527,6 +527,17 @@ void prep_transhuge_page(struct page *page)
set_compound_page_dtor(page, TRANSHUGE_PAGE_DTOR);
}
bool is_transparent_hugepage(struct page *page)
{
if (!PageCompound(page))
return false;
page = compound_head(page);
return is_huge_zero_page(page) ||
page[1].compound_dtor == TRANSHUGE_PAGE_DTOR;
}
EXPORT_SYMBOL_GPL(is_transparent_hugepage);
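
Note: the kvm_is_transparent_hugepage() wrapper declared in the kvm_host.h hunk above is not shown in this excerpt. The sketch below approximates what such a wrapper looks like in virt/kvm/kvm_main.c; treat it as an approximation, not the exact upstream body.

/* Approximate sketch of the KVM wrapper around the new mm helper. */
bool kvm_is_transparent_hugepage(kvm_pfn_t pfn)
{
        struct page *page = pfn_to_page(pfn);

        if (!PageTransCompoundMap(page))
                return false;

        return is_transparent_hugepage(compound_head(page));
}

This is what lets the arm/mmu.c hunk further below replace the open-coded PageHuge()/PageTransCompoundMap() test with a single call.
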
static unsigned long __thp_get_unmapped_area(struct file *filp,
unsigned long addr, unsigned long len,
loff_t off, unsigned long flags, unsigned long size)


@ -33,7 +33,7 @@
#define EXIT_REASON_TRIPLE_FAULT 2
#define EXIT_REASON_INIT_SIGNAL 3
#define EXIT_REASON_PENDING_INTERRUPT 7
#define EXIT_REASON_INTERRUPT_WINDOW 7
#define EXIT_REASON_NMI_WINDOW 8
#define EXIT_REASON_TASK_SWITCH 9
#define EXIT_REASON_CPUID 10
@ -94,7 +94,7 @@
{ EXIT_REASON_EXTERNAL_INTERRUPT, "EXTERNAL_INTERRUPT" }, \
{ EXIT_REASON_TRIPLE_FAULT, "TRIPLE_FAULT" }, \
{ EXIT_REASON_INIT_SIGNAL, "INIT_SIGNAL" }, \
{ EXIT_REASON_PENDING_INTERRUPT, "PENDING_INTERRUPT" }, \
{ EXIT_REASON_INTERRUPT_WINDOW, "INTERRUPT_WINDOW" }, \
{ EXIT_REASON_NMI_WINDOW, "NMI_WINDOW" }, \
{ EXIT_REASON_TASK_SWITCH, "TASK_SWITCH" }, \
{ EXIT_REASON_CPUID, "CPUID" }, \


@ -270,6 +270,7 @@ class ArchX86(Arch):
def __init__(self, exit_reasons):
self.sc_perf_evt_open = 298
self.ioctl_numbers = IOCTL_NUMBERS
self.exit_reason_field = 'exit_reason'
self.exit_reasons = exit_reasons
def debugfs_is_child(self, field):
@ -289,6 +290,7 @@ class ArchPPC(Arch):
# numbers depend on the wordsize.
char_ptr_size = ctypes.sizeof(ctypes.c_char_p)
self.ioctl_numbers['SET_FILTER'] = 0x80002406 | char_ptr_size << 16
self.exit_reason_field = 'exit_nr'
self.exit_reasons = {}
def debugfs_is_child(self, field):
@ -300,6 +302,7 @@ class ArchA64(Arch):
def __init__(self):
self.sc_perf_evt_open = 241
self.ioctl_numbers = IOCTL_NUMBERS
self.exit_reason_field = 'esr_ec'
self.exit_reasons = AARCH64_EXIT_REASONS
def debugfs_is_child(self, field):
@ -311,6 +314,7 @@ class ArchS390(Arch):
def __init__(self):
self.sc_perf_evt_open = 331
self.ioctl_numbers = IOCTL_NUMBERS
self.exit_reason_field = None
self.exit_reasons = None
def debugfs_is_child(self, field):
@ -541,8 +545,8 @@ class TracepointProvider(Provider):
"""
filters = {}
filters['kvm_userspace_exit'] = ('reason', USERSPACE_EXIT_REASONS)
if ARCH.exit_reasons:
filters['kvm_exit'] = ('exit_reason', ARCH.exit_reasons)
if ARCH.exit_reason_field and ARCH.exit_reasons:
filters['kvm_exit'] = (ARCH.exit_reason_field, ARCH.exit_reasons)
return filters
def _get_available_fields(self):


@ -18,8 +18,8 @@
/*
* Definitions of Primary Processor-Based VM-Execution Controls.
*/
#define CPU_BASED_VIRTUAL_INTR_PENDING 0x00000004
#define CPU_BASED_USE_TSC_OFFSETING 0x00000008
#define CPU_BASED_INTR_WINDOW_EXITING 0x00000004
#define CPU_BASED_USE_TSC_OFFSETTING 0x00000008
#define CPU_BASED_HLT_EXITING 0x00000080
#define CPU_BASED_INVLPG_EXITING 0x00000200
#define CPU_BASED_MWAIT_EXITING 0x00000400
@ -30,7 +30,7 @@
#define CPU_BASED_CR8_LOAD_EXITING 0x00080000
#define CPU_BASED_CR8_STORE_EXITING 0x00100000
#define CPU_BASED_TPR_SHADOW 0x00200000
#define CPU_BASED_VIRTUAL_NMI_PENDING 0x00400000
#define CPU_BASED_NMI_WINDOW_EXITING 0x00400000
#define CPU_BASED_MOV_DR_EXITING 0x00800000
#define CPU_BASED_UNCOND_IO_EXITING 0x01000000
#define CPU_BASED_USE_IO_BITMAPS 0x02000000
@ -103,7 +103,7 @@
#define EXIT_REASON_EXCEPTION_NMI 0
#define EXIT_REASON_EXTERNAL_INTERRUPT 1
#define EXIT_REASON_TRIPLE_FAULT 2
#define EXIT_REASON_PENDING_INTERRUPT 7
#define EXIT_REASON_INTERRUPT_WINDOW 7
#define EXIT_REASON_NMI_WINDOW 8
#define EXIT_REASON_TASK_SWITCH 9
#define EXIT_REASON_CPUID 10


@ -98,7 +98,7 @@ static void l1_guest_code(struct vmx_pages *vmx_pages)
prepare_vmcs(vmx_pages, l2_guest_code,
&l2_guest_stack[L2_GUEST_STACK_SIZE]);
control = vmreadz(CPU_BASED_VM_EXEC_CONTROL);
control |= CPU_BASED_USE_MSR_BITMAPS | CPU_BASED_USE_TSC_OFFSETING;
control |= CPU_BASED_USE_MSR_BITMAPS | CPU_BASED_USE_TSC_OFFSETTING;
vmwrite(CPU_BASED_VM_EXEC_CONTROL, control);
vmwrite(TSC_OFFSET, TSC_OFFSET_VALUE);


@ -10,10 +10,15 @@
* Author: Christoffer Dall <c.dall@virtualopensystems.com>
*/
#include <linux/bits.h>
#include <linux/kvm_host.h>
#include <asm/kvm_emulate.h>
#include <asm/kvm_hyp.h>
#define DFSR_FSC_EXTABT_LPAE 0x10
#define DFSR_FSC_EXTABT_nLPAE 0x08
#define DFSR_LPAE BIT(9)
/*
* Table taken from ARMv8 ARM DDI0487B-B, table G1-10.
*/
@ -28,25 +33,115 @@ static const u8 return_offsets[8][2] = {
[7] = { 4, 4 }, /* FIQ, unused */
};
/*
* When an exception is taken, most CPSR fields are left unchanged in the
* handler. However, some are explicitly overridden (e.g. M[4:0]).
*
* The SPSR/SPSR_ELx layouts differ, and the below is intended to work with
* either format. Note: SPSR.J bit doesn't exist in SPSR_ELx, but this bit was
* obsoleted by the ARMv7 virtualization extensions and is RES0.
*
* For the SPSR layout seen from AArch32, see:
* - ARM DDI 0406C.d, page B1-1148
* - ARM DDI 0487E.a, page G8-6264
*
* For the SPSR_ELx layout for AArch32 seen from AArch64, see:
* - ARM DDI 0487E.a, page C5-426
*
* Here we manipulate the fields in order of the AArch32 SPSR_ELx layout, from
* MSB to LSB.
*/
static unsigned long get_except32_cpsr(struct kvm_vcpu *vcpu, u32 mode)
{
u32 sctlr = vcpu_cp15(vcpu, c1_SCTLR);
unsigned long old, new;
old = *vcpu_cpsr(vcpu);
new = 0;
new |= (old & PSR_AA32_N_BIT);
new |= (old & PSR_AA32_Z_BIT);
new |= (old & PSR_AA32_C_BIT);
new |= (old & PSR_AA32_V_BIT);
new |= (old & PSR_AA32_Q_BIT);
// CPSR.IT[7:0] are set to zero upon any exception
// See ARM DDI 0487E.a, section G1.12.3
// See ARM DDI 0406C.d, section B1.8.3
new |= (old & PSR_AA32_DIT_BIT);
// CPSR.SSBS is set to SCTLR.DSSBS upon any exception
// See ARM DDI 0487E.a, page G8-6244
if (sctlr & BIT(31))
new |= PSR_AA32_SSBS_BIT;
// CPSR.PAN is unchanged unless SCTLR.SPAN == 0b0
// SCTLR.SPAN is RES1 when ARMv8.1-PAN is not implemented
// See ARM DDI 0487E.a, page G8-6246
new |= (old & PSR_AA32_PAN_BIT);
if (!(sctlr & BIT(23)))
new |= PSR_AA32_PAN_BIT;
// SS does not exist in AArch32, so ignore
// CPSR.IL is set to zero upon any exception
// See ARM DDI 0487E.a, page G1-5527
new |= (old & PSR_AA32_GE_MASK);
// CPSR.IT[7:0] are set to zero upon any exception
// See prior comment above
// CPSR.E is set to SCTLR.EE upon any exception
// See ARM DDI 0487E.a, page G8-6245
// See ARM DDI 0406C.d, page B4-1701
if (sctlr & BIT(25))
new |= PSR_AA32_E_BIT;
// CPSR.A is unchanged upon an exception to Undefined, Supervisor
// CPSR.A is set upon an exception to other modes
// See ARM DDI 0487E.a, pages G1-5515 to G1-5516
// See ARM DDI 0406C.d, page B1-1182
new |= (old & PSR_AA32_A_BIT);
if (mode != PSR_AA32_MODE_UND && mode != PSR_AA32_MODE_SVC)
new |= PSR_AA32_A_BIT;
// CPSR.I is set upon any exception
// See ARM DDI 0487E.a, pages G1-5515 to G1-5516
// See ARM DDI 0406C.d, page B1-1182
new |= PSR_AA32_I_BIT;
// CPSR.F is set upon an exception to FIQ
// CPSR.F is unchanged upon an exception to other modes
// See ARM DDI 0487E.a, pages G1-5515 to G1-5516
// See ARM DDI 0406C.d, page B1-1182
new |= (old & PSR_AA32_F_BIT);
if (mode == PSR_AA32_MODE_FIQ)
new |= PSR_AA32_F_BIT;
// CPSR.T is set to SCTLR.TE upon any exception
// See ARM DDI 0487E.a, page G8-5514
// See ARM DDI 0406C.d, page B1-1181
if (sctlr & BIT(30))
new |= PSR_AA32_T_BIT;
new |= mode;
return new;
}
static void prepare_fault32(struct kvm_vcpu *vcpu, u32 mode, u32 vect_offset)
{
unsigned long cpsr;
unsigned long new_spsr_value = *vcpu_cpsr(vcpu);
bool is_thumb = (new_spsr_value & PSR_AA32_T_BIT);
unsigned long spsr = *vcpu_cpsr(vcpu);
bool is_thumb = (spsr & PSR_AA32_T_BIT);
u32 return_offset = return_offsets[vect_offset >> 2][is_thumb];
u32 sctlr = vcpu_cp15(vcpu, c1_SCTLR);
cpsr = mode | PSR_AA32_I_BIT;
if (sctlr & (1 << 30))
cpsr |= PSR_AA32_T_BIT;
if (sctlr & (1 << 25))
cpsr |= PSR_AA32_E_BIT;
*vcpu_cpsr(vcpu) = cpsr;
*vcpu_cpsr(vcpu) = get_except32_cpsr(vcpu, mode);
/* Note: These now point to the banked copies */
vcpu_write_spsr(vcpu, new_spsr_value);
vcpu_write_spsr(vcpu, host_spsr_to_spsr32(spsr));
*vcpu_reg32(vcpu, 14) = *vcpu_pc(vcpu) + return_offset;
/* Branch to exception vector */
@ -84,16 +179,18 @@ static void inject_abt32(struct kvm_vcpu *vcpu, bool is_pabt,
fsr = &vcpu_cp15(vcpu, c5_DFSR);
}
prepare_fault32(vcpu, PSR_AA32_MODE_ABT | PSR_AA32_A_BIT, vect_offset);
prepare_fault32(vcpu, PSR_AA32_MODE_ABT, vect_offset);
*far = addr;
/* Give the guest an IMPLEMENTATION DEFINED exception */
is_lpae = (vcpu_cp15(vcpu, c2_TTBCR) >> 31);
if (is_lpae)
*fsr = 1 << 9 | 0x34;
else
*fsr = 0x14;
if (is_lpae) {
*fsr = DFSR_LPAE | DFSR_FSC_EXTABT_LPAE;
} else {
/* no need to shuffle FS[4] into DFSR[10] as it's 0 */
*fsr = DFSR_FSC_EXTABT_nLPAE;
}
}
void kvm_inject_dabt32(struct kvm_vcpu *vcpu, unsigned long addr)


@ -805,6 +805,7 @@ static u64 kvm_arm_timer_read(struct kvm_vcpu *vcpu,
switch (treg) {
case TIMER_REG_TVAL:
val = timer->cnt_cval - kvm_phys_timer_read() + timer->cntvoff;
val &= lower_32_bits(val);
break;
case TIMER_REG_CTL:
@ -850,7 +851,7 @@ static void kvm_arm_timer_write(struct kvm_vcpu *vcpu,
{
switch (treg) {
case TIMER_REG_TVAL:
timer->cnt_cval = kvm_phys_timer_read() - timer->cntvoff + val;
timer->cnt_cval = kvm_phys_timer_read() - timer->cntvoff + (s32)val;
break;
case TIMER_REG_CTL:
@ -1022,7 +1023,7 @@ static bool timer_irqs_are_valid(struct kvm_vcpu *vcpu)
bool kvm_arch_timer_get_input_level(int vintid)
{
struct kvm_vcpu *vcpu = kvm_arm_get_running_vcpu();
struct kvm_vcpu *vcpu = kvm_get_running_vcpu();
struct arch_timer_context *timer;
if (vintid == vcpu_vtimer(vcpu)->irq.irq)


@ -20,8 +20,6 @@
#include <linux/irqbypass.h>
#include <linux/sched/stat.h>
#include <trace/events/kvm.h>
#include <kvm/arm_pmu.h>
#include <kvm/arm_psci.h>
#define CREATE_TRACE_POINTS
#include "trace.h"
@ -51,9 +49,6 @@ __asm__(".arch_extension virt");
DEFINE_PER_CPU(kvm_host_data_t, kvm_host_data);
static DEFINE_PER_CPU(unsigned long, kvm_arm_hyp_stack_page);
/* Per-CPU variable containing the currently running vcpu. */
static DEFINE_PER_CPU(struct kvm_vcpu *, kvm_arm_running_vcpu);
/* The VMID used in the VTTBR */
static atomic64_t kvm_vmid_gen = ATOMIC64_INIT(1);
static u32 kvm_next_vmid;
@ -62,31 +57,8 @@ static DEFINE_SPINLOCK(kvm_vmid_lock);
static bool vgic_present;
static DEFINE_PER_CPU(unsigned char, kvm_arm_hardware_enabled);
static void kvm_arm_set_running_vcpu(struct kvm_vcpu *vcpu)
{
__this_cpu_write(kvm_arm_running_vcpu, vcpu);
}
DEFINE_STATIC_KEY_FALSE(userspace_irqchip_in_use);
/**
* kvm_arm_get_running_vcpu - get the vcpu running on the current CPU.
* Must be called from non-preemptible context
*/
struct kvm_vcpu *kvm_arm_get_running_vcpu(void)
{
return __this_cpu_read(kvm_arm_running_vcpu);
}
/**
* kvm_arm_get_running_vcpus - get the per-CPU array of currently running vcpus.
*/
struct kvm_vcpu * __percpu *kvm_get_running_vcpus(void)
{
return &kvm_arm_running_vcpu;
}
int kvm_arch_vcpu_should_kick(struct kvm_vcpu *vcpu)
{
return kvm_vcpu_exiting_guest_mode(vcpu) == IN_GUEST_MODE;
@ -194,7 +166,7 @@ void kvm_arch_destroy_vm(struct kvm *kvm)
for (i = 0; i < KVM_MAX_VCPUS; ++i) {
if (kvm->vcpus[i]) {
kvm_arch_vcpu_free(kvm->vcpus[i]);
kvm_vcpu_destroy(kvm->vcpus[i]);
kvm->vcpus[i] = NULL;
}
}
@ -279,49 +251,46 @@ void kvm_arch_free_vm(struct kvm *kvm)
vfree(kvm);
}
struct kvm_vcpu *kvm_arch_vcpu_create(struct kvm *kvm, unsigned int id)
int kvm_arch_vcpu_precreate(struct kvm *kvm, unsigned int id)
{
if (irqchip_in_kernel(kvm) && vgic_initialized(kvm))
return -EBUSY;
if (id >= kvm->arch.max_vcpus)
return -EINVAL;
return 0;
}
int kvm_arch_vcpu_create(struct kvm_vcpu *vcpu)
{
int err;
struct kvm_vcpu *vcpu;
if (irqchip_in_kernel(kvm) && vgic_initialized(kvm)) {
err = -EBUSY;
goto out;
}
/* Force users to call KVM_ARM_VCPU_INIT */
vcpu->arch.target = -1;
bitmap_zero(vcpu->arch.features, KVM_VCPU_MAX_FEATURES);
if (id >= kvm->arch.max_vcpus) {
err = -EINVAL;
goto out;
}
/* Set up the timer */
kvm_timer_vcpu_init(vcpu);
vcpu = kmem_cache_zalloc(kvm_vcpu_cache, GFP_KERNEL);
if (!vcpu) {
err = -ENOMEM;
goto out;
}
kvm_pmu_vcpu_init(vcpu);
err = kvm_vcpu_init(vcpu, kvm, id);
kvm_arm_reset_debug_ptr(vcpu);
kvm_arm_pvtime_vcpu_init(&vcpu->arch);
err = kvm_vgic_vcpu_init(vcpu);
if (err)
goto free_vcpu;
return err;
err = create_hyp_mappings(vcpu, vcpu + 1, PAGE_HYP);
if (err)
goto vcpu_uninit;
return vcpu;
vcpu_uninit:
kvm_vcpu_uninit(vcpu);
free_vcpu:
kmem_cache_free(kvm_vcpu_cache, vcpu);
out:
return ERR_PTR(err);
return create_hyp_mappings(vcpu, vcpu + 1, PAGE_HYP);
}
void kvm_arch_vcpu_postcreate(struct kvm_vcpu *vcpu)
{
}
void kvm_arch_vcpu_free(struct kvm_vcpu *vcpu)
void kvm_arch_vcpu_destroy(struct kvm_vcpu *vcpu)
{
if (vcpu->arch.has_run_once && unlikely(!irqchip_in_kernel(vcpu->kvm)))
static_branch_dec(&userspace_irqchip_in_use);
@ -329,13 +298,8 @@ void kvm_arch_vcpu_free(struct kvm_vcpu *vcpu)
kvm_mmu_free_memory_caches(vcpu);
kvm_timer_vcpu_terminate(vcpu);
kvm_pmu_vcpu_destroy(vcpu);
kvm_vcpu_uninit(vcpu);
kmem_cache_free(kvm_vcpu_cache, vcpu);
}
void kvm_arch_vcpu_destroy(struct kvm_vcpu *vcpu)
{
kvm_arch_vcpu_free(vcpu);
kvm_arm_vcpu_destroy(vcpu);
}
int kvm_cpu_has_pending_timer(struct kvm_vcpu *vcpu)
@ -368,24 +332,6 @@ void kvm_arch_vcpu_unblocking(struct kvm_vcpu *vcpu)
preempt_enable();
}
int kvm_arch_vcpu_init(struct kvm_vcpu *vcpu)
{
/* Force users to call KVM_ARM_VCPU_INIT */
vcpu->arch.target = -1;
bitmap_zero(vcpu->arch.features, KVM_VCPU_MAX_FEATURES);
/* Set up the timer */
kvm_timer_vcpu_init(vcpu);
kvm_pmu_vcpu_init(vcpu);
kvm_arm_reset_debug_ptr(vcpu);
kvm_arm_pvtime_vcpu_init(&vcpu->arch);
return kvm_vgic_vcpu_init(vcpu);
}
void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
{
int *last_ran;
@ -406,7 +352,6 @@ void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
vcpu->cpu = cpu;
vcpu->arch.host_cpu_context = &cpu_data->host_ctxt;
kvm_arm_set_running_vcpu(vcpu);
kvm_vgic_load(vcpu);
kvm_timer_vcpu_load(vcpu);
kvm_vcpu_load_sysregs(vcpu);
@ -432,8 +377,6 @@ void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
kvm_vcpu_pmu_restore_host(vcpu);
vcpu->cpu = -1;
kvm_arm_set_running_vcpu(NULL);
}
static void vcpu_power_off(struct kvm_vcpu *vcpu)
@ -1537,7 +1480,6 @@ static void teardown_hyp_mode(void)
free_hyp_pgds();
for_each_possible_cpu(cpu)
free_page(per_cpu(kvm_arm_hyp_stack_page, cpu));
hyp_cpu_pm_exit();
}
/**
@ -1751,6 +1693,7 @@ int kvm_arch_init(void *opaque)
return 0;
out_hyp:
hyp_cpu_pm_exit();
if (!in_hyp_mode)
teardown_hyp_mode();
out_err:


@ -5,7 +5,6 @@
*/
#include <linux/kvm_host.h>
#include <asm/kvm_mmio.h>
#include <asm/kvm_emulate.h>
#include <trace/events/kvm.h>
@ -92,23 +91,23 @@ int kvm_handle_mmio_return(struct kvm_vcpu *vcpu, struct kvm_run *run)
vcpu->mmio_needed = 0;
if (!run->mmio.is_write) {
len = run->mmio.len;
if (len > sizeof(unsigned long))
return -EINVAL;
if (!kvm_vcpu_dabt_iswrite(vcpu)) {
len = kvm_vcpu_dabt_get_as(vcpu);
data = kvm_mmio_read_buf(run->mmio.data, len);
if (vcpu->arch.mmio_decode.sign_extend &&
if (kvm_vcpu_dabt_issext(vcpu) &&
len < sizeof(unsigned long)) {
mask = 1U << ((len * 8) - 1);
data = (data ^ mask) - mask;
}
if (!kvm_vcpu_dabt_issf(vcpu))
data = data & 0xffffffff;
trace_kvm_mmio(KVM_TRACE_MMIO_READ, len, run->mmio.phys_addr,
&data);
data = vcpu_data_host_to_guest(vcpu, data, len);
vcpu_set_reg(vcpu, vcpu->arch.mmio_decode.rt, data);
vcpu_set_reg(vcpu, kvm_vcpu_dabt_get_rd(vcpu), data);
}
/*
@ -120,33 +119,6 @@ int kvm_handle_mmio_return(struct kvm_vcpu *vcpu, struct kvm_run *run)
return 0;
}
static int decode_hsr(struct kvm_vcpu *vcpu, bool *is_write, int *len)
{
unsigned long rt;
int access_size;
bool sign_extend;
if (kvm_vcpu_dabt_iss1tw(vcpu)) {
/* page table accesses IO mem: tell guest to fix its TTBR */
kvm_inject_dabt(vcpu, kvm_vcpu_get_hfar(vcpu));
return 1;
}
access_size = kvm_vcpu_dabt_get_as(vcpu);
if (unlikely(access_size < 0))
return access_size;
*is_write = kvm_vcpu_dabt_iswrite(vcpu);
sign_extend = kvm_vcpu_dabt_issext(vcpu);
rt = kvm_vcpu_dabt_get_rd(vcpu);
*len = access_size;
vcpu->arch.mmio_decode.sign_extend = sign_extend;
vcpu->arch.mmio_decode.rt = rt;
return 0;
}
int io_mem_abort(struct kvm_vcpu *vcpu, struct kvm_run *run,
phys_addr_t fault_ipa)
{
@ -158,15 +130,10 @@ int io_mem_abort(struct kvm_vcpu *vcpu, struct kvm_run *run,
u8 data_buf[8];
/*
* Prepare MMIO operation. First decode the syndrome data we get
* from the CPU. Then try if some in-kernel emulation feels
* responsible, otherwise let user space do its magic.
* No valid syndrome? Ask userspace for help if it has
* volunteered to do so, and bail out otherwise.
*/
if (kvm_vcpu_dabt_isvalid(vcpu)) {
ret = decode_hsr(vcpu, &is_write, &len);
if (ret)
return ret;
} else {
if (!kvm_vcpu_dabt_isvalid(vcpu)) {
if (vcpu->kvm->arch.return_nisv_io_abort_to_user) {
run->exit_reason = KVM_EXIT_ARM_NISV;
run->arm_nisv.esr_iss = kvm_vcpu_dabt_iss_nisv_sanitized(vcpu);
@ -178,7 +145,20 @@ int io_mem_abort(struct kvm_vcpu *vcpu, struct kvm_run *run,
return -ENOSYS;
}
rt = vcpu->arch.mmio_decode.rt;
/* Page table accesses IO mem: tell guest to fix its TTBR */
if (kvm_vcpu_dabt_iss1tw(vcpu)) {
kvm_inject_dabt(vcpu, kvm_vcpu_get_hfar(vcpu));
return 1;
}
/*
* Prepare MMIO operation. First decode the syndrome data we get
* from the CPU. Then try if some in-kernel emulation feels
* responsible, otherwise let user space do its magic.
*/
is_write = kvm_vcpu_dabt_iswrite(vcpu);
len = kvm_vcpu_dabt_get_as(vcpu);
rt = kvm_vcpu_dabt_get_rd(vcpu);
if (is_write) {
data = vcpu_data_guest_to_host(vcpu, vcpu_get_reg(vcpu, rt),


@ -14,7 +14,6 @@
#include <asm/cacheflush.h>
#include <asm/kvm_arm.h>
#include <asm/kvm_mmu.h>
#include <asm/kvm_mmio.h>
#include <asm/kvm_ras.h>
#include <asm/kvm_asm.h>
#include <asm/kvm_emulate.h>
@ -1377,14 +1376,8 @@ static bool transparent_hugepage_adjust(kvm_pfn_t *pfnp, phys_addr_t *ipap)
{
kvm_pfn_t pfn = *pfnp;
gfn_t gfn = *ipap >> PAGE_SHIFT;
struct page *page = pfn_to_page(pfn);
/*
* PageTransCompoundMap() returns true for THP and
* hugetlbfs. Make sure the adjustment is done only for THP
* pages.
*/
if (!PageHuge(page) && PageTransCompoundMap(page)) {
if (kvm_is_transparent_hugepage(pfn)) {
unsigned long mask;
/*
* The address we faulted on is backed by a transparent huge
@ -1596,16 +1589,8 @@ static void invalidate_icache_guest_page(kvm_pfn_t pfn, unsigned long size)
__invalidate_icache_guest_page(pfn, size);
}
static void kvm_send_hwpoison_signal(unsigned long address,
struct vm_area_struct *vma)
static void kvm_send_hwpoison_signal(unsigned long address, short lsb)
{
short lsb;
if (is_vm_hugetlb_page(vma))
lsb = huge_page_shift(hstate_vma(vma));
else
lsb = PAGE_SHIFT;
send_sig_mceerr(BUS_MCEERR_AR, (void __user *)address, lsb, current);
}
@ -1678,6 +1663,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
struct kvm *kvm = vcpu->kvm;
struct kvm_mmu_memory_cache *memcache = &vcpu->arch.mmu_page_cache;
struct vm_area_struct *vma;
short vma_shift;
kvm_pfn_t pfn;
pgprot_t mem_type = PAGE_S2;
bool logging_active = memslot_is_logging(memslot);
@ -1701,7 +1687,12 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
return -EFAULT;
}
vma_pagesize = vma_kernel_pagesize(vma);
if (is_vm_hugetlb_page(vma))
vma_shift = huge_page_shift(hstate_vma(vma));
else
vma_shift = PAGE_SHIFT;
vma_pagesize = 1ULL << vma_shift;
if (logging_active ||
(vma->vm_flags & VM_PFNMAP) ||
!fault_supports_stage2_huge_mapping(memslot, hva, vma_pagesize)) {
@ -1741,7 +1732,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
pfn = gfn_to_pfn_prot(kvm, gfn, write_fault, &writable);
if (pfn == KVM_PFN_ERR_HWPOISON) {
kvm_send_hwpoison_signal(hva, vma);
kvm_send_hwpoison_signal(hva, vma_shift);
return 0;
}
if (is_error_noslot_pfn(pfn))
@ -2147,7 +2138,8 @@ int kvm_test_age_hva(struct kvm *kvm, unsigned long hva)
if (!kvm->arch.pgd)
return 0;
trace_kvm_test_age_hva(hva);
return handle_hva_to_gpa(kvm, hva, hva, kvm_test_age_hva_handler, NULL);
return handle_hva_to_gpa(kvm, hva, hva + PAGE_SIZE,
kvm_test_age_hva_handler, NULL);
}
void kvm_mmu_free_memory_caches(struct kvm_vcpu *vcpu)


@ -13,14 +13,14 @@
static int kvm_is_in_guest(void)
{
return kvm_arm_get_running_vcpu() != NULL;
return kvm_get_running_vcpu() != NULL;
}
static int kvm_is_user_mode(void)
{
struct kvm_vcpu *vcpu;
vcpu = kvm_arm_get_running_vcpu();
vcpu = kvm_get_running_vcpu();
if (vcpu)
return !vcpu_mode_priv(vcpu);
@ -32,7 +32,7 @@ static unsigned long kvm_get_guest_ip(void)
{
struct kvm_vcpu *vcpu;
vcpu = kvm_arm_get_running_vcpu();
vcpu = kvm_get_running_vcpu();
if (vcpu)
return *vcpu_pc(vcpu);


@ -15,6 +15,8 @@
#include <kvm/arm_vgic.h>
static void kvm_pmu_create_perf_event(struct kvm_vcpu *vcpu, u64 select_idx);
static void kvm_pmu_update_pmc_chained(struct kvm_vcpu *vcpu, u64 select_idx);
static void kvm_pmu_stop_counter(struct kvm_vcpu *vcpu, struct kvm_pmc *pmc);
#define PERF_ATTR_CFG1_KVM_PMU_CHAINED 0x1
@ -75,6 +77,13 @@ static struct kvm_pmc *kvm_pmu_get_canonical_pmc(struct kvm_pmc *pmc)
return pmc;
}
static struct kvm_pmc *kvm_pmu_get_alternate_pmc(struct kvm_pmc *pmc)
{
if (kvm_pmu_idx_is_high_counter(pmc->idx))
return pmc - 1;
else
return pmc + 1;
}
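
The canonical/alternate helpers above come down to even/odd index arithmetic: a chained pair is always an even (low) counter plus the next odd (high) counter, and each pair owns one bit in the chained bitmap. A standalone sketch of that arithmetic (illustrative only; the kernel helpers operate on struct kvm_pmc pointers rather than raw indices):

#include <stdio.h>

static int canonical_idx(int idx) { return idx & ~1; }	/* the even, low half */
static int alternate_idx(int idx) { return idx ^ 1; }	/* the other half of the pair */
static int chain_bit(int idx)     { return idx >> 1; }	/* bit in the chained bitmap */

int main(void)
{
	for (int idx = 0; idx < 6; idx++)
		printf("counter %d: canonical %d, alternate %d, chained-bitmap bit %d\n",
		       idx, canonical_idx(idx), alternate_idx(idx), chain_bit(idx));
	return 0;
}
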
/**
* kvm_pmu_idx_has_chain_evtype - determine if the event type is chain
@ -238,10 +247,11 @@ void kvm_pmu_vcpu_init(struct kvm_vcpu *vcpu)
*/
void kvm_pmu_vcpu_reset(struct kvm_vcpu *vcpu)
{
int i;
unsigned long mask = kvm_pmu_valid_counter_mask(vcpu);
struct kvm_pmu *pmu = &vcpu->arch.pmu;
int i;
for (i = 0; i < ARMV8_PMU_MAX_COUNTERS; i++)
for_each_set_bit(i, &mask, 32)
kvm_pmu_stop_counter(vcpu, &pmu->pmc[i]);
bitmap_zero(vcpu->arch.pmu.chained, ARMV8_PMU_MAX_COUNTER_PAIRS);
@ -294,15 +304,9 @@ void kvm_pmu_enable_counter_mask(struct kvm_vcpu *vcpu, u64 val)
pmc = &pmu->pmc[i];
/*
* For high counters of chained events we must recreate the
* perf event with the long (64bit) attribute set.
*/
if (kvm_pmu_pmc_is_chained(pmc) &&
kvm_pmu_idx_is_high_counter(i)) {
kvm_pmu_create_perf_event(vcpu, i);
continue;
}
/* A change in the enable state may affect the chain state */
kvm_pmu_update_pmc_chained(vcpu, i);
kvm_pmu_create_perf_event(vcpu, i);
/* At this point, pmc must be the canonical */
if (pmc->perf_event) {
@ -335,15 +339,9 @@ void kvm_pmu_disable_counter_mask(struct kvm_vcpu *vcpu, u64 val)
pmc = &pmu->pmc[i];
/*
* For high counters of chained events we must recreate the
* perf event with the long (64bit) attribute unset.
*/
if (kvm_pmu_pmc_is_chained(pmc) &&
kvm_pmu_idx_is_high_counter(i)) {
kvm_pmu_create_perf_event(vcpu, i);
continue;
}
/* A change in the enable state may affect the chain state */
kvm_pmu_update_pmc_chained(vcpu, i);
kvm_pmu_create_perf_event(vcpu, i);
/* At this point, pmc must be the canonical */
if (pmc->perf_event)
@ -480,25 +478,45 @@ static void kvm_pmu_perf_overflow(struct perf_event *perf_event,
*/
void kvm_pmu_software_increment(struct kvm_vcpu *vcpu, u64 val)
{
struct kvm_pmu *pmu = &vcpu->arch.pmu;
int i;
u64 type, enable, reg;
if (val == 0)
if (!(__vcpu_sys_reg(vcpu, PMCR_EL0) & ARMV8_PMU_PMCR_E))
return;
enable = __vcpu_sys_reg(vcpu, PMCNTENSET_EL0);
/* Weed out disabled counters */
val &= __vcpu_sys_reg(vcpu, PMCNTENSET_EL0);
for (i = 0; i < ARMV8_PMU_CYCLE_IDX; i++) {
u64 type, reg;
if (!(val & BIT(i)))
continue;
type = __vcpu_sys_reg(vcpu, PMEVTYPER0_EL0 + i)
& ARMV8_PMU_EVTYPE_EVENT;
if ((type == ARMV8_PMUV3_PERFCTR_SW_INCR)
&& (enable & BIT(i))) {
reg = __vcpu_sys_reg(vcpu, PMEVCNTR0_EL0 + i) + 1;
/* PMSWINC only applies to ... SW_INC! */
type = __vcpu_sys_reg(vcpu, PMEVTYPER0_EL0 + i);
type &= ARMV8_PMU_EVTYPE_EVENT;
if (type != ARMV8_PMUV3_PERFCTR_SW_INCR)
continue;
/* increment this even SW_INC counter */
reg = __vcpu_sys_reg(vcpu, PMEVCNTR0_EL0 + i) + 1;
reg = lower_32_bits(reg);
__vcpu_sys_reg(vcpu, PMEVCNTR0_EL0 + i) = reg;
if (reg) /* no overflow on the low part */
continue;
if (kvm_pmu_pmc_is_chained(&pmu->pmc[i])) {
/* increment the high counter */
reg = __vcpu_sys_reg(vcpu, PMEVCNTR0_EL0 + i + 1) + 1;
reg = lower_32_bits(reg);
__vcpu_sys_reg(vcpu, PMEVCNTR0_EL0 + i) = reg;
if (!reg)
__vcpu_sys_reg(vcpu, PMOVSSET_EL0) |= BIT(i);
__vcpu_sys_reg(vcpu, PMEVCNTR0_EL0 + i + 1) = reg;
if (!reg) /* mark overflow on the high counter */
__vcpu_sys_reg(vcpu, PMOVSSET_EL0) |= BIT(i + 1);
} else {
/* mark overflow on low counter */
__vcpu_sys_reg(vcpu, PMOVSSET_EL0) |= BIT(i);
}
}
}
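
The rewritten loop above makes each software increment behave like a real 32-bit counter: the low half wraps at 2^32 and either carries into the high half of a chained pair or raises its own overflow bit. A standalone model of that propagation (illustrative only; the uint32_t fields play the role of lower_32_bits(), and ovsset stands in for PMOVSSET_EL0):

#include <stdio.h>
#include <stdint.h>

struct demo_pair {
	uint32_t lo, hi;	/* even/odd counters of one pair */
	int chained;
	uint32_t ovsset;	/* one overflow bit per counter */
};

static void sw_incr(struct demo_pair *p, int lo_idx)
{
	p->lo += 1;
	if (p->lo)			/* no overflow on the low part */
		return;
	if (p->chained) {
		p->hi += 1;		/* carry into the high counter */
		if (!p->hi)		/* overflow of the full 64-bit pair */
			p->ovsset |= 1u << (lo_idx + 1);
	} else {
		p->ovsset |= 1u << lo_idx;
	}
}

int main(void)
{
	struct demo_pair p = { .lo = 0xffffffff, .hi = 0, .chained = 1 };

	sw_incr(&p, 0);
	printf("lo=%#x hi=%#x ovsset=%#x\n", p.lo, p.hi, p.ovsset);	/* lo=0 hi=0x1 ovsset=0 */
	return 0;
}
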
@ -510,10 +528,9 @@ void kvm_pmu_software_increment(struct kvm_vcpu *vcpu, u64 val)
*/
void kvm_pmu_handle_pmcr(struct kvm_vcpu *vcpu, u64 val)
{
u64 mask;
unsigned long mask = kvm_pmu_valid_counter_mask(vcpu);
int i;
mask = kvm_pmu_valid_counter_mask(vcpu);
if (val & ARMV8_PMU_PMCR_E) {
kvm_pmu_enable_counter_mask(vcpu,
__vcpu_sys_reg(vcpu, PMCNTENSET_EL0) & mask);
@ -525,7 +542,7 @@ void kvm_pmu_handle_pmcr(struct kvm_vcpu *vcpu, u64 val)
kvm_pmu_set_counter_value(vcpu, ARMV8_PMU_CYCLE_IDX, 0);
if (val & ARMV8_PMU_PMCR_P) {
for (i = 0; i < ARMV8_PMU_CYCLE_IDX; i++)
for_each_set_bit(i, &mask, 32)
kvm_pmu_set_counter_value(vcpu, i, 0);
}
}
@ -582,15 +599,14 @@ static void kvm_pmu_create_perf_event(struct kvm_vcpu *vcpu, u64 select_idx)
counter = kvm_pmu_get_pair_counter_value(vcpu, pmc);
if (kvm_pmu_idx_has_chain_evtype(vcpu, pmc->idx)) {
if (kvm_pmu_pmc_is_chained(pmc)) {
/**
* The initial sample period (overflow count) of an event. For
* chained counters we only support overflow interrupts on the
* high counter.
*/
attr.sample_period = (-counter) & GENMASK(63, 0);
if (kvm_pmu_counter_is_enabled(vcpu, pmc->idx + 1))
attr.config1 |= PERF_ATTR_CFG1_KVM_PMU_CHAINED;
attr.config1 |= PERF_ATTR_CFG1_KVM_PMU_CHAINED;
event = perf_event_create_kernel_counter(&attr, -1, current,
kvm_pmu_perf_overflow,
@ -621,25 +637,33 @@ static void kvm_pmu_create_perf_event(struct kvm_vcpu *vcpu, u64 select_idx)
* @select_idx: The number of selected counter
*
* Update the chained bitmap based on the event type written in the
* typer register.
* typer register and the enable state of the odd register.
*/
static void kvm_pmu_update_pmc_chained(struct kvm_vcpu *vcpu, u64 select_idx)
{
struct kvm_pmu *pmu = &vcpu->arch.pmu;
struct kvm_pmc *pmc = &pmu->pmc[select_idx];
struct kvm_pmc *pmc = &pmu->pmc[select_idx], *canonical_pmc;
bool new_state, old_state;
if (kvm_pmu_idx_has_chain_evtype(vcpu, pmc->idx)) {
old_state = kvm_pmu_pmc_is_chained(pmc);
new_state = kvm_pmu_idx_has_chain_evtype(vcpu, pmc->idx) &&
kvm_pmu_counter_is_enabled(vcpu, pmc->idx | 0x1);
if (old_state == new_state)
return;
canonical_pmc = kvm_pmu_get_canonical_pmc(pmc);
kvm_pmu_stop_counter(vcpu, canonical_pmc);
if (new_state) {
/*
* During promotion from !chained to chained we must ensure
* the adjacent counter is stopped and its event destroyed
*/
if (!kvm_pmu_pmc_is_chained(pmc))
kvm_pmu_stop_counter(vcpu, pmc);
kvm_pmu_stop_counter(vcpu, kvm_pmu_get_alternate_pmc(pmc));
set_bit(pmc->idx >> 1, vcpu->arch.pmu.chained);
} else {
clear_bit(pmc->idx >> 1, vcpu->arch.pmu.chained);
return;
}
clear_bit(pmc->idx >> 1, vcpu->arch.pmu.chained);
}
/**


@ -839,9 +839,8 @@ static int vgic_its_cmd_handle_discard(struct kvm *kvm, struct vgic_its *its,
u32 event_id = its_cmd_get_id(its_cmd);
struct its_ite *ite;
ite = find_ite(its, device_id, event_id);
if (ite && ite->collection) {
if (ite && its_is_collection_mapped(ite->collection)) {
/*
* Though the spec talks about removing the pending state, we
* don't bother here since we clear the ITTE anyway and the
@ -2475,7 +2474,8 @@ static int vgic_its_restore_cte(struct vgic_its *its, gpa_t gpa, int esz)
target_addr = (u32)(val >> KVM_ITS_CTE_RDBASE_SHIFT);
coll_id = val & KVM_ITS_CTE_ICID_MASK;
if (target_addr >= atomic_read(&kvm->online_vcpus))
if (target_addr != COLLECTION_NOT_MAPPED &&
target_addr >= atomic_read(&kvm->online_vcpus))
return -EINVAL;
collection = find_collection(its, coll_id);


@ -414,8 +414,11 @@ static unsigned long vgic_mmio_read_pendbase(struct kvm_vcpu *vcpu,
gpa_t addr, unsigned int len)
{
struct vgic_cpu *vgic_cpu = &vcpu->arch.vgic_cpu;
u64 value = vgic_cpu->pendbaser;
return extract_bytes(vgic_cpu->pendbaser, addr & 7, len);
value &= ~GICR_PENDBASER_PTZ;
return extract_bytes(value, addr & 7, len);
}
static void vgic_mmio_write_pendbase(struct kvm_vcpu *vcpu,


@ -190,15 +190,6 @@ unsigned long vgic_mmio_read_pending(struct kvm_vcpu *vcpu,
* value later will give us the same value as we update the per-CPU variable
* in the preempt notifier handlers.
*/
static struct kvm_vcpu *vgic_get_mmio_requester_vcpu(void)
{
struct kvm_vcpu *vcpu;
preempt_disable();
vcpu = kvm_arm_get_running_vcpu();
preempt_enable();
return vcpu;
}
/* Must be called with irq->irq_lock held */
static void vgic_hw_irq_spending(struct kvm_vcpu *vcpu, struct vgic_irq *irq,
@ -221,7 +212,7 @@ void vgic_mmio_write_spending(struct kvm_vcpu *vcpu,
gpa_t addr, unsigned int len,
unsigned long val)
{
bool is_uaccess = !vgic_get_mmio_requester_vcpu();
bool is_uaccess = !kvm_get_running_vcpu();
u32 intid = VGIC_ADDR_TO_INTID(addr, 1);
int i;
unsigned long flags;
@ -274,7 +265,7 @@ void vgic_mmio_write_cpending(struct kvm_vcpu *vcpu,
gpa_t addr, unsigned int len,
unsigned long val)
{
bool is_uaccess = !vgic_get_mmio_requester_vcpu();
bool is_uaccess = !kvm_get_running_vcpu();
u32 intid = VGIC_ADDR_TO_INTID(addr, 1);
int i;
unsigned long flags;
@ -335,7 +326,7 @@ static void vgic_mmio_change_active(struct kvm_vcpu *vcpu, struct vgic_irq *irq,
bool active)
{
unsigned long flags;
struct kvm_vcpu *requester_vcpu = vgic_get_mmio_requester_vcpu();
struct kvm_vcpu *requester_vcpu = kvm_get_running_vcpu();
raw_spin_lock_irqsave(&irq->irq_lock, flags);


@ -98,11 +98,6 @@ extern struct kvm_io_device_ops kvm_io_gic_ops;
.uaccess_write = uwr, \
}
int kvm_vgic_register_mmio_region(struct kvm *kvm, struct kvm_vcpu *vcpu,
struct vgic_register_region *reg_desc,
struct vgic_io_device *region,
int nr_irqs, bool offset_private);
unsigned long vgic_data_mmio_bus_to_host(const void *val, unsigned int len);
void vgic_data_host_to_mmio_bus(void *buf, unsigned int len,


@ -17,21 +17,6 @@
#include "async_pf.h"
#include <trace/events/kvm.h>
static inline void kvm_async_page_present_sync(struct kvm_vcpu *vcpu,
struct kvm_async_pf *work)
{
#ifdef CONFIG_KVM_ASYNC_PF_SYNC
kvm_arch_async_page_present(vcpu, work);
#endif
}
static inline void kvm_async_page_present_async(struct kvm_vcpu *vcpu,
struct kvm_async_pf *work)
{
#ifndef CONFIG_KVM_ASYNC_PF_SYNC
kvm_arch_async_page_present(vcpu, work);
#endif
}
static struct kmem_cache *async_pf_cache;
int kvm_async_pf_init(void)
@ -64,7 +49,7 @@ static void async_pf_execute(struct work_struct *work)
struct mm_struct *mm = apf->mm;
struct kvm_vcpu *vcpu = apf->vcpu;
unsigned long addr = apf->addr;
gva_t gva = apf->gva;
gpa_t cr2_or_gpa = apf->cr2_or_gpa;
int locked = 1;
might_sleep();
@ -80,7 +65,8 @@ static void async_pf_execute(struct work_struct *work)
if (locked)
up_read(&mm->mmap_sem);
kvm_async_page_present_sync(vcpu, apf);
if (IS_ENABLED(CONFIG_KVM_ASYNC_PF_SYNC))
kvm_arch_async_page_present(vcpu, apf);
spin_lock(&vcpu->async_pf.lock);
list_add_tail(&apf->link, &vcpu->async_pf.done);
@ -92,7 +78,7 @@ static void async_pf_execute(struct work_struct *work)
* this point
*/
trace_kvm_async_pf_completed(addr, gva);
trace_kvm_async_pf_completed(addr, cr2_or_gpa);
if (swq_has_sleeper(&vcpu->wq))
swake_up_one(&vcpu->wq);
@ -157,7 +143,8 @@ void kvm_check_async_pf_completion(struct kvm_vcpu *vcpu)
spin_unlock(&vcpu->async_pf.lock);
kvm_arch_async_page_ready(vcpu, work);
kvm_async_page_present_async(vcpu, work);
if (!IS_ENABLED(CONFIG_KVM_ASYNC_PF_SYNC))
kvm_arch_async_page_present(vcpu, work);
list_del(&work->queue);
vcpu->async_pf.queued--;
@ -165,8 +152,8 @@ void kvm_check_async_pf_completion(struct kvm_vcpu *vcpu)
}
}
int kvm_setup_async_pf(struct kvm_vcpu *vcpu, gva_t gva, unsigned long hva,
struct kvm_arch_async_pf *arch)
int kvm_setup_async_pf(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
unsigned long hva, struct kvm_arch_async_pf *arch)
{
struct kvm_async_pf *work;
@ -185,7 +172,7 @@ int kvm_setup_async_pf(struct kvm_vcpu *vcpu, gva_t gva, unsigned long hva,
work->wakeup_all = false;
work->vcpu = vcpu;
work->gva = gva;
work->cr2_or_gpa = cr2_or_gpa;
work->addr = hva;
work->arch = *arch;
work->mm = current->mm;
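
The async page fault change above drops the #ifdef'd kvm_async_page_present_{sync,async} stubs in favour of IS_ENABLED(CONFIG_KVM_ASYNC_PF_SYNC) checks, so both branches are always compiled and type-checked and the dead one is simply optimized away. A simplified userspace stand-in for that pattern (CONFIG_DEMO_SYNC, the IS_ENABLED macro below and the helpers are illustrative, not the kernel's definitions):

#include <stdio.h>

#ifndef CONFIG_DEMO_SYNC
#define CONFIG_DEMO_SYNC 0		/* build with -DCONFIG_DEMO_SYNC=1 to flip */
#endif
#define IS_ENABLED(option) (option)	/* simplified stand-in for the kernel macro */

static void page_present(const char *when)
{
	printf("page present notification: %s\n", when);
}

static void worker_finished(void)
{
	if (IS_ENABLED(CONFIG_DEMO_SYNC))	/* always compiled, possibly dead */
		page_present("synchronously, from the worker");
}

static void completion_check(void)
{
	if (!IS_ENABLED(CONFIG_DEMO_SYNC))
		page_present("later, from the vcpu completion path");
}

int main(void)
{
	worker_finished();
	completion_check();
	return 0;
}
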


@ -104,16 +104,16 @@ static cpumask_var_t cpus_hardware_enabled;
static int kvm_usage_count;
static atomic_t hardware_enable_failed;
struct kmem_cache *kvm_vcpu_cache;
EXPORT_SYMBOL_GPL(kvm_vcpu_cache);
static struct kmem_cache *kvm_vcpu_cache;
static __read_mostly struct preempt_ops kvm_preempt_ops;
static DEFINE_PER_CPU(struct kvm_vcpu *, kvm_running_vcpu);
struct dentry *kvm_debugfs_dir;
EXPORT_SYMBOL_GPL(kvm_debugfs_dir);
static int kvm_debugfs_num_entries;
static const struct file_operations *stat_fops_per_vm[];
static const struct file_operations stat_fops_per_vm;
static long kvm_vcpu_ioctl(struct file *file, unsigned int ioctl,
unsigned long arg);
@ -191,12 +191,24 @@ bool kvm_is_reserved_pfn(kvm_pfn_t pfn)
return true;
}
bool kvm_is_transparent_hugepage(kvm_pfn_t pfn)
{
struct page *page = pfn_to_page(pfn);
if (!PageTransCompoundMap(page))
return false;
return is_transparent_hugepage(compound_head(page));
}
/*
* Switches to specified vcpu, until a matching vcpu_put()
*/
void vcpu_load(struct kvm_vcpu *vcpu)
{
int cpu = get_cpu();
__this_cpu_write(kvm_running_vcpu, vcpu);
preempt_notifier_register(&vcpu->preempt_notifier);
kvm_arch_vcpu_load(vcpu, cpu);
put_cpu();
@ -208,6 +220,7 @@ void vcpu_put(struct kvm_vcpu *vcpu)
preempt_disable();
kvm_arch_vcpu_put(vcpu);
preempt_notifier_unregister(&vcpu->preempt_notifier);
__this_cpu_write(kvm_running_vcpu, NULL);
preempt_enable();
}
EXPORT_SYMBOL_GPL(vcpu_put);
@ -322,11 +335,8 @@ void kvm_reload_remote_mmus(struct kvm *kvm)
kvm_make_all_cpus_request(kvm, KVM_REQ_MMU_RELOAD);
}
int kvm_vcpu_init(struct kvm_vcpu *vcpu, struct kvm *kvm, unsigned id)
static void kvm_vcpu_init(struct kvm_vcpu *vcpu, struct kvm *kvm, unsigned id)
{
struct page *page;
int r;
mutex_init(&vcpu->mutex);
vcpu->cpu = -1;
vcpu->kvm = kvm;
@ -338,42 +348,28 @@ int kvm_vcpu_init(struct kvm_vcpu *vcpu, struct kvm *kvm, unsigned id)
vcpu->pre_pcpu = -1;
INIT_LIST_HEAD(&vcpu->blocked_vcpu_list);
page = alloc_page(GFP_KERNEL | __GFP_ZERO);
if (!page) {
r = -ENOMEM;
goto fail;
}
vcpu->run = page_address(page);
kvm_vcpu_set_in_spin_loop(vcpu, false);
kvm_vcpu_set_dy_eligible(vcpu, false);
vcpu->preempted = false;
vcpu->ready = false;
r = kvm_arch_vcpu_init(vcpu);
if (r < 0)
goto fail_free_run;
return 0;
fail_free_run:
free_page((unsigned long)vcpu->run);
fail:
return r;
preempt_notifier_init(&vcpu->preempt_notifier, &kvm_preempt_ops);
}
EXPORT_SYMBOL_GPL(kvm_vcpu_init);
void kvm_vcpu_uninit(struct kvm_vcpu *vcpu)
void kvm_vcpu_destroy(struct kvm_vcpu *vcpu)
{
kvm_arch_vcpu_destroy(vcpu);
/*
* no need for rcu_read_lock as VCPU_RUN is the only place that
* will change the vcpu->pid pointer and on uninit all file
* descriptors are already gone.
* No need for rcu_read_lock as VCPU_RUN is the only place that changes
* the vcpu->pid pointer, and at destruction time all file descriptors
* are already gone.
*/
put_pid(rcu_dereference_protected(vcpu->pid, 1));
kvm_arch_vcpu_uninit(vcpu);
free_page((unsigned long)vcpu->run);
kmem_cache_free(kvm_vcpu_cache, vcpu);
}
EXPORT_SYMBOL_GPL(kvm_vcpu_uninit);
EXPORT_SYMBOL_GPL(kvm_vcpu_destroy);
#if defined(CONFIG_MMU_NOTIFIER) && defined(KVM_ARCH_WANT_MMU_NOTIFIER)
static inline struct kvm *mmu_notifier_to_kvm(struct mmu_notifier *mn)
@ -650,11 +646,11 @@ static int kvm_create_vm_debugfs(struct kvm *kvm, int fd)
return -ENOMEM;
stat_data->kvm = kvm;
stat_data->offset = p->offset;
stat_data->mode = p->mode ? p->mode : 0644;
stat_data->dbgfs_item = p;
kvm->debugfs_stat_data[p - debugfs_entries] = stat_data;
debugfs_create_file(p->name, stat_data->mode, kvm->debugfs_dentry,
stat_data, stat_fops_per_vm[p->kind]);
debugfs_create_file(p->name, KVM_DBGFS_GET_MODE(p),
kvm->debugfs_dentry, stat_data,
&stat_fops_per_vm);
}
return 0;
}
@ -964,7 +960,7 @@ static struct kvm_memslots *install_new_memslots(struct kvm *kvm,
/*
* Increment the new memslot generation a second time, dropping the
* update in-progress flag and incrementing then generation based on
* update in-progress flag and incrementing the generation based on
* the number of address spaces. This provides a unique and easily
* identifiable generation number while the memslots are in flux.
*/
@ -1117,7 +1113,7 @@ int __kvm_set_memory_region(struct kvm *kvm,
*
* validation of sp->gfn happens in:
* - gfn_to_hva (kvm_read_guest, gfn_to_pfn)
* - kvm_is_visible_gfn (mmu_check_roots)
* - kvm_is_visible_gfn (mmu_check_root)
*/
kvm_arch_flush_shadow_memslot(kvm, slot);
@ -1406,14 +1402,14 @@ bool kvm_is_visible_gfn(struct kvm *kvm, gfn_t gfn)
}
EXPORT_SYMBOL_GPL(kvm_is_visible_gfn);
unsigned long kvm_host_page_size(struct kvm *kvm, gfn_t gfn)
unsigned long kvm_host_page_size(struct kvm_vcpu *vcpu, gfn_t gfn)
{
struct vm_area_struct *vma;
unsigned long addr, size;
size = PAGE_SIZE;
addr = gfn_to_hva(kvm, gfn);
addr = kvm_vcpu_gfn_to_hva_prot(vcpu, gfn, NULL);
if (kvm_is_error_hva(addr))
return PAGE_SIZE;
@ -1519,7 +1515,7 @@ static inline int check_user_page_hwpoison(unsigned long addr)
/*
* The fast path to get the writable pfn which will be stored in @pfn,
* true indicates success, otherwise false is returned. It's also the
* only part that runs if we can are in atomic context.
* only part that runs if we can in atomic context.
*/
static bool hva_to_pfn_fast(unsigned long addr, bool write_fault,
bool *writable, kvm_pfn_t *pfn)
@ -1821,26 +1817,72 @@ struct page *gfn_to_page(struct kvm *kvm, gfn_t gfn)
}
EXPORT_SYMBOL_GPL(gfn_to_page);
static int __kvm_map_gfn(struct kvm_memory_slot *slot, gfn_t gfn,
struct kvm_host_map *map)
void kvm_release_pfn(kvm_pfn_t pfn, bool dirty, struct gfn_to_pfn_cache *cache)
{
if (pfn == 0)
return;
if (cache)
cache->pfn = cache->gfn = 0;
if (dirty)
kvm_release_pfn_dirty(pfn);
else
kvm_release_pfn_clean(pfn);
}
static void kvm_cache_gfn_to_pfn(struct kvm_memory_slot *slot, gfn_t gfn,
struct gfn_to_pfn_cache *cache, u64 gen)
{
kvm_release_pfn(cache->pfn, cache->dirty, cache);
cache->pfn = gfn_to_pfn_memslot(slot, gfn);
cache->gfn = gfn;
cache->dirty = false;
cache->generation = gen;
}
static int __kvm_map_gfn(struct kvm_memslots *slots, gfn_t gfn,
struct kvm_host_map *map,
struct gfn_to_pfn_cache *cache,
bool atomic)
{
kvm_pfn_t pfn;
void *hva = NULL;
struct page *page = KVM_UNMAPPED_PAGE;
struct kvm_memory_slot *slot = __gfn_to_memslot(slots, gfn);
u64 gen = slots->generation;
if (!map)
return -EINVAL;
pfn = gfn_to_pfn_memslot(slot, gfn);
if (cache) {
if (!cache->pfn || cache->gfn != gfn ||
cache->generation != gen) {
if (atomic)
return -EAGAIN;
kvm_cache_gfn_to_pfn(slot, gfn, cache, gen);
}
pfn = cache->pfn;
} else {
if (atomic)
return -EAGAIN;
pfn = gfn_to_pfn_memslot(slot, gfn);
}
if (is_error_noslot_pfn(pfn))
return -EINVAL;
if (pfn_valid(pfn)) {
page = pfn_to_page(pfn);
hva = kmap(page);
if (atomic)
hva = kmap_atomic(page);
else
hva = kmap(page);
#ifdef CONFIG_HAS_IOMEM
} else {
} else if (!atomic) {
hva = memremap(pfn_to_hpa(pfn), PAGE_SIZE, MEMREMAP_WB);
} else {
return -EINVAL;
#endif
}
@ -1855,14 +1897,25 @@ static int __kvm_map_gfn(struct kvm_memory_slot *slot, gfn_t gfn,
return 0;
}
int kvm_map_gfn(struct kvm_vcpu *vcpu, gfn_t gfn, struct kvm_host_map *map,
struct gfn_to_pfn_cache *cache, bool atomic)
{
return __kvm_map_gfn(kvm_memslots(vcpu->kvm), gfn, map,
cache, atomic);
}
EXPORT_SYMBOL_GPL(kvm_map_gfn);
int kvm_vcpu_map(struct kvm_vcpu *vcpu, gfn_t gfn, struct kvm_host_map *map)
{
return __kvm_map_gfn(kvm_vcpu_gfn_to_memslot(vcpu, gfn), gfn, map);
return __kvm_map_gfn(kvm_vcpu_memslots(vcpu), gfn, map,
NULL, false);
}
EXPORT_SYMBOL_GPL(kvm_vcpu_map);
void kvm_vcpu_unmap(struct kvm_vcpu *vcpu, struct kvm_host_map *map,
bool dirty)
static void __kvm_unmap_gfn(struct kvm_memory_slot *memslot,
struct kvm_host_map *map,
struct gfn_to_pfn_cache *cache,
bool dirty, bool atomic)
{
if (!map)
return;
@ -1870,23 +1923,45 @@ void kvm_vcpu_unmap(struct kvm_vcpu *vcpu, struct kvm_host_map *map,
if (!map->hva)
return;
if (map->page != KVM_UNMAPPED_PAGE)
kunmap(map->page);
if (map->page != KVM_UNMAPPED_PAGE) {
if (atomic)
kunmap_atomic(map->hva);
else
kunmap(map->page);
}
#ifdef CONFIG_HAS_IOMEM
else
else if (!atomic)
memunmap(map->hva);
else
WARN_ONCE(1, "Unexpected unmapping in atomic context");
#endif
if (dirty) {
kvm_vcpu_mark_page_dirty(vcpu, map->gfn);
kvm_release_pfn_dirty(map->pfn);
} else {
kvm_release_pfn_clean(map->pfn);
}
if (dirty)
mark_page_dirty_in_slot(memslot, map->gfn);
if (cache)
cache->dirty |= dirty;
else
kvm_release_pfn(map->pfn, dirty, NULL);
map->hva = NULL;
map->page = NULL;
}
int kvm_unmap_gfn(struct kvm_vcpu *vcpu, struct kvm_host_map *map,
struct gfn_to_pfn_cache *cache, bool dirty, bool atomic)
{
__kvm_unmap_gfn(gfn_to_memslot(vcpu->kvm, map->gfn), map,
cache, dirty, atomic);
return 0;
}
EXPORT_SYMBOL_GPL(kvm_unmap_gfn);
void kvm_vcpu_unmap(struct kvm_vcpu *vcpu, struct kvm_host_map *map, bool dirty)
{
__kvm_unmap_gfn(kvm_vcpu_gfn_to_memslot(vcpu, map->gfn), map, NULL,
dirty, false);
}
EXPORT_SYMBOL_GPL(kvm_vcpu_unmap);
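
The new kvm_map_gfn()/kvm_unmap_gfn() pair above lets a caller keep a gfn_to_pfn_cache across invocations, so repeated accesses to the same guest page skip the gfn-to-pfn lookup, and the atomic variants only succeed when that cache is already warm. A hypothetical in-kernel caller, sketched against the signatures shown here (the function name, gfn and value are illustrative, not part of the API):

#include <linux/kvm_host.h>

/* Write a 32-bit value into a guest page, reusing 'cache' across calls. */
static int demo_update_guest_word(struct kvm_vcpu *vcpu, gfn_t gfn,
				  struct gfn_to_pfn_cache *cache, u32 value)
{
	struct kvm_host_map map;

	if (kvm_map_gfn(vcpu, gfn, &map, cache, false))
		return -EFAULT;

	*(u32 *)map.hva = value;	/* touch the mapped guest page */

	/* dirty=true marks the page dirty; atomic=false matches the map */
	kvm_unmap_gfn(vcpu, &map, cache, true, false);
	return 0;
}

With a non-NULL cache, the underlying pfn is only released when the cache is refreshed or explicitly dropped via kvm_release_pfn(), which is what makes the atomic map path viable.
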
struct page *kvm_vcpu_gfn_to_page(struct kvm_vcpu *vcpu, gfn_t gfn)
@ -1931,11 +2006,8 @@ EXPORT_SYMBOL_GPL(kvm_release_pfn_dirty);
void kvm_set_pfn_dirty(kvm_pfn_t pfn)
{
if (!kvm_is_reserved_pfn(pfn) && !kvm_is_zone_device_pfn(pfn)) {
struct page *page = pfn_to_page(pfn);
SetPageDirty(page);
}
if (!kvm_is_reserved_pfn(pfn) && !kvm_is_zone_device_pfn(pfn))
SetPageDirty(pfn_to_page(pfn));
}
EXPORT_SYMBOL_GPL(kvm_set_pfn_dirty);
@ -2051,17 +2123,6 @@ static int __kvm_read_guest_atomic(struct kvm_memory_slot *slot, gfn_t gfn,
return 0;
}
int kvm_read_guest_atomic(struct kvm *kvm, gpa_t gpa, void *data,
unsigned long len)
{
gfn_t gfn = gpa >> PAGE_SHIFT;
struct kvm_memory_slot *slot = gfn_to_memslot(kvm, gfn);
int offset = offset_in_page(gpa);
return __kvm_read_guest_atomic(slot, gfn, data, offset, len);
}
EXPORT_SYMBOL_GPL(kvm_read_guest_atomic);
int kvm_vcpu_read_guest_atomic(struct kvm_vcpu *vcpu, gpa_t gpa,
void *data, unsigned long len)
{
@ -2158,33 +2219,36 @@ static int __kvm_gfn_to_hva_cache_init(struct kvm_memslots *slots,
gfn_t end_gfn = (gpa + len - 1) >> PAGE_SHIFT;
gfn_t nr_pages_needed = end_gfn - start_gfn + 1;
gfn_t nr_pages_avail;
int r = start_gfn <= end_gfn ? 0 : -EINVAL;
ghc->gpa = gpa;
/* Update ghc->generation before performing any error checks. */
ghc->generation = slots->generation;
ghc->len = len;
ghc->hva = KVM_HVA_ERR_BAD;
if (start_gfn > end_gfn) {
ghc->hva = KVM_HVA_ERR_BAD;
return -EINVAL;
}
/*
* If the requested region crosses two memslots, we still
* verify that the entire region is valid here.
*/
while (!r && start_gfn <= end_gfn) {
for ( ; start_gfn <= end_gfn; start_gfn += nr_pages_avail) {
ghc->memslot = __gfn_to_memslot(slots, start_gfn);
ghc->hva = gfn_to_hva_many(ghc->memslot, start_gfn,
&nr_pages_avail);
if (kvm_is_error_hva(ghc->hva))
r = -EFAULT;
start_gfn += nr_pages_avail;
return -EFAULT;
}
/* Use the slow path for cross page reads and writes. */
if (!r && nr_pages_needed == 1)
if (nr_pages_needed == 1)
ghc->hva += offset;
else
ghc->memslot = NULL;
return r;
ghc->gpa = gpa;
ghc->len = len;
return 0;
}
int kvm_gfn_to_hva_cache_init(struct kvm *kvm, struct gfn_to_hva_cache *ghc,
@ -2205,15 +2269,17 @@ int kvm_write_guest_offset_cached(struct kvm *kvm, struct gfn_to_hva_cache *ghc,
BUG_ON(len + offset > ghc->len);
if (slots->generation != ghc->generation)
__kvm_gfn_to_hva_cache_init(slots, ghc, ghc->gpa, ghc->len);
if (unlikely(!ghc->memslot))
return kvm_write_guest(kvm, gpa, data, len);
if (slots->generation != ghc->generation) {
if (__kvm_gfn_to_hva_cache_init(slots, ghc, ghc->gpa, ghc->len))
return -EFAULT;
}
if (kvm_is_error_hva(ghc->hva))
return -EFAULT;
if (unlikely(!ghc->memslot))
return kvm_write_guest(kvm, gpa, data, len);
r = __copy_to_user((void __user *)ghc->hva + offset, data, len);
if (r)
return -EFAULT;
@ -2238,15 +2304,17 @@ int kvm_read_guest_cached(struct kvm *kvm, struct gfn_to_hva_cache *ghc,
BUG_ON(len > ghc->len);
if (slots->generation != ghc->generation)
__kvm_gfn_to_hva_cache_init(slots, ghc, ghc->gpa, ghc->len);
if (unlikely(!ghc->memslot))
return kvm_read_guest(kvm, ghc->gpa, data, len);
if (slots->generation != ghc->generation) {
if (__kvm_gfn_to_hva_cache_init(slots, ghc, ghc->gpa, ghc->len))
return -EFAULT;
}
if (kvm_is_error_hva(ghc->hva))
return -EFAULT;
if (unlikely(!ghc->memslot))
return kvm_read_guest(kvm, ghc->gpa, data, len);
r = __copy_from_user(data, (void __user *)ghc->hva, len);
if (r)
return -EFAULT;
@ -2718,6 +2786,7 @@ static int kvm_vm_ioctl_create_vcpu(struct kvm *kvm, u32 id)
{
int r;
struct kvm_vcpu *vcpu;
struct page *page;
if (id >= KVM_MAX_VCPU_ID)
return -EINVAL;
@ -2731,17 +2800,29 @@ static int kvm_vm_ioctl_create_vcpu(struct kvm *kvm, u32 id)
kvm->created_vcpus++;
mutex_unlock(&kvm->lock);
vcpu = kvm_arch_vcpu_create(kvm, id);
if (IS_ERR(vcpu)) {
r = PTR_ERR(vcpu);
r = kvm_arch_vcpu_precreate(kvm, id);
if (r)
goto vcpu_decrement;
vcpu = kmem_cache_zalloc(kvm_vcpu_cache, GFP_KERNEL);
if (!vcpu) {
r = -ENOMEM;
goto vcpu_decrement;
}
preempt_notifier_init(&vcpu->preempt_notifier, &kvm_preempt_ops);
BUILD_BUG_ON(sizeof(struct kvm_run) > PAGE_SIZE);
page = alloc_page(GFP_KERNEL | __GFP_ZERO);
if (!page) {
r = -ENOMEM;
goto vcpu_free;
}
vcpu->run = page_address(page);
r = kvm_arch_vcpu_setup(vcpu);
kvm_vcpu_init(vcpu, kvm, id);
r = kvm_arch_vcpu_create(vcpu);
if (r)
goto vcpu_destroy;
goto vcpu_free_run_page;
kvm_create_vcpu_debugfs(vcpu);
@ -2778,8 +2859,11 @@ static int kvm_vm_ioctl_create_vcpu(struct kvm *kvm, u32 id)
unlock_vcpu_destroy:
mutex_unlock(&kvm->lock);
debugfs_remove_recursive(vcpu->debugfs_dentry);
vcpu_destroy:
kvm_arch_vcpu_destroy(vcpu);
vcpu_free_run_page:
free_page((unsigned long)vcpu->run);
vcpu_free:
kmem_cache_free(kvm_vcpu_cache, vcpu);
vcpu_decrement:
mutex_lock(&kvm->lock);
kvm->created_vcpus--;
@ -4013,8 +4097,9 @@ static int kvm_debugfs_open(struct inode *inode, struct file *file,
return -ENOENT;
if (simple_attr_open(inode, file, get,
stat_data->mode & S_IWUGO ? set : NULL,
fmt)) {
KVM_DBGFS_GET_MODE(stat_data->dbgfs_item) & 0222
? set : NULL,
fmt)) {
kvm_put_kvm(stat_data->kvm);
return -ENOMEM;
}
@ -4033,105 +4118,111 @@ static int kvm_debugfs_release(struct inode *inode, struct file *file)
return 0;
}
static int vm_stat_get_per_vm(void *data, u64 *val)
static int kvm_get_stat_per_vm(struct kvm *kvm, size_t offset, u64 *val)
{
struct kvm_stat_data *stat_data = (struct kvm_stat_data *)data;
*val = *(ulong *)((void *)stat_data->kvm + stat_data->offset);
*val = *(ulong *)((void *)kvm + offset);
return 0;
}
static int vm_stat_clear_per_vm(void *data, u64 val)
static int kvm_clear_stat_per_vm(struct kvm *kvm, size_t offset)
{
struct kvm_stat_data *stat_data = (struct kvm_stat_data *)data;
if (val)
return -EINVAL;
*(ulong *)((void *)stat_data->kvm + stat_data->offset) = 0;
*(ulong *)((void *)kvm + offset) = 0;
return 0;
}
static int vm_stat_get_per_vm_open(struct inode *inode, struct file *file)
{
__simple_attr_check_format("%llu\n", 0ull);
return kvm_debugfs_open(inode, file, vm_stat_get_per_vm,
vm_stat_clear_per_vm, "%llu\n");
}
static const struct file_operations vm_stat_get_per_vm_fops = {
.owner = THIS_MODULE,
.open = vm_stat_get_per_vm_open,
.release = kvm_debugfs_release,
.read = simple_attr_read,
.write = simple_attr_write,
.llseek = no_llseek,
};
static int vcpu_stat_get_per_vm(void *data, u64 *val)
static int kvm_get_stat_per_vcpu(struct kvm *kvm, size_t offset, u64 *val)
{
int i;
struct kvm_stat_data *stat_data = (struct kvm_stat_data *)data;
struct kvm_vcpu *vcpu;
*val = 0;
kvm_for_each_vcpu(i, vcpu, stat_data->kvm)
*val += *(u64 *)((void *)vcpu + stat_data->offset);
kvm_for_each_vcpu(i, vcpu, kvm)
*val += *(u64 *)((void *)vcpu + offset);
return 0;
}
static int vcpu_stat_clear_per_vm(void *data, u64 val)
static int kvm_clear_stat_per_vcpu(struct kvm *kvm, size_t offset)
{
int i;
struct kvm_stat_data *stat_data = (struct kvm_stat_data *)data;
struct kvm_vcpu *vcpu;
kvm_for_each_vcpu(i, vcpu, kvm)
*(u64 *)((void *)vcpu + offset) = 0;
return 0;
}
static int kvm_stat_data_get(void *data, u64 *val)
{
int r = -EFAULT;
struct kvm_stat_data *stat_data = (struct kvm_stat_data *)data;
switch (stat_data->dbgfs_item->kind) {
case KVM_STAT_VM:
r = kvm_get_stat_per_vm(stat_data->kvm,
stat_data->dbgfs_item->offset, val);
break;
case KVM_STAT_VCPU:
r = kvm_get_stat_per_vcpu(stat_data->kvm,
stat_data->dbgfs_item->offset, val);
break;
}
return r;
}
static int kvm_stat_data_clear(void *data, u64 val)
{
int r = -EFAULT;
struct kvm_stat_data *stat_data = (struct kvm_stat_data *)data;
if (val)
return -EINVAL;
kvm_for_each_vcpu(i, vcpu, stat_data->kvm)
*(u64 *)((void *)vcpu + stat_data->offset) = 0;
switch (stat_data->dbgfs_item->kind) {
case KVM_STAT_VM:
r = kvm_clear_stat_per_vm(stat_data->kvm,
stat_data->dbgfs_item->offset);
break;
case KVM_STAT_VCPU:
r = kvm_clear_stat_per_vcpu(stat_data->kvm,
stat_data->dbgfs_item->offset);
break;
}
return 0;
return r;
}
static int vcpu_stat_get_per_vm_open(struct inode *inode, struct file *file)
static int kvm_stat_data_open(struct inode *inode, struct file *file)
{
__simple_attr_check_format("%llu\n", 0ull);
return kvm_debugfs_open(inode, file, vcpu_stat_get_per_vm,
vcpu_stat_clear_per_vm, "%llu\n");
return kvm_debugfs_open(inode, file, kvm_stat_data_get,
kvm_stat_data_clear, "%llu\n");
}
static const struct file_operations vcpu_stat_get_per_vm_fops = {
.owner = THIS_MODULE,
.open = vcpu_stat_get_per_vm_open,
static const struct file_operations stat_fops_per_vm = {
.owner = THIS_MODULE,
.open = kvm_stat_data_open,
.release = kvm_debugfs_release,
.read = simple_attr_read,
.write = simple_attr_write,
.llseek = no_llseek,
};
static const struct file_operations *stat_fops_per_vm[] = {
[KVM_STAT_VCPU] = &vcpu_stat_get_per_vm_fops,
[KVM_STAT_VM] = &vm_stat_get_per_vm_fops,
.read = simple_attr_read,
.write = simple_attr_write,
.llseek = no_llseek,
};
static int vm_stat_get(void *_offset, u64 *val)
{
unsigned offset = (long)_offset;
struct kvm *kvm;
struct kvm_stat_data stat_tmp = {.offset = offset};
u64 tmp_val;
*val = 0;
mutex_lock(&kvm_lock);
list_for_each_entry(kvm, &vm_list, vm_list) {
stat_tmp.kvm = kvm;
vm_stat_get_per_vm((void *)&stat_tmp, &tmp_val);
kvm_get_stat_per_vm(kvm, offset, &tmp_val);
*val += tmp_val;
}
mutex_unlock(&kvm_lock);
@ -4142,15 +4233,13 @@ static int vm_stat_clear(void *_offset, u64 val)
{
unsigned offset = (long)_offset;
struct kvm *kvm;
struct kvm_stat_data stat_tmp = {.offset = offset};
if (val)
return -EINVAL;
mutex_lock(&kvm_lock);
list_for_each_entry(kvm, &vm_list, vm_list) {
stat_tmp.kvm = kvm;
vm_stat_clear_per_vm((void *)&stat_tmp, 0);
kvm_clear_stat_per_vm(kvm, offset);
}
mutex_unlock(&kvm_lock);
@ -4163,14 +4252,12 @@ static int vcpu_stat_get(void *_offset, u64 *val)
{
unsigned offset = (long)_offset;
struct kvm *kvm;
struct kvm_stat_data stat_tmp = {.offset = offset};
u64 tmp_val;
*val = 0;
mutex_lock(&kvm_lock);
list_for_each_entry(kvm, &vm_list, vm_list) {
stat_tmp.kvm = kvm;
vcpu_stat_get_per_vm((void *)&stat_tmp, &tmp_val);
kvm_get_stat_per_vcpu(kvm, offset, &tmp_val);
*val += tmp_val;
}
mutex_unlock(&kvm_lock);
@ -4181,15 +4268,13 @@ static int vcpu_stat_clear(void *_offset, u64 val)
{
unsigned offset = (long)_offset;
struct kvm *kvm;
struct kvm_stat_data stat_tmp = {.offset = offset};
if (val)
return -EINVAL;
mutex_lock(&kvm_lock);
list_for_each_entry(kvm, &vm_list, vm_list) {
stat_tmp.kvm = kvm;
vcpu_stat_clear_per_vm((void *)&stat_tmp, 0);
kvm_clear_stat_per_vcpu(kvm, offset);
}
mutex_unlock(&kvm_lock);
@ -4262,9 +4347,8 @@ static void kvm_init_debug(void)
kvm_debugfs_num_entries = 0;
for (p = debugfs_entries; p->name; ++p, kvm_debugfs_num_entries++) {
int mode = p->mode ? p->mode : 0644;
debugfs_create_file(p->name, mode, kvm_debugfs_dir,
(void *)(long)p->offset,
debugfs_create_file(p->name, KVM_DBGFS_GET_MODE(p),
kvm_debugfs_dir, (void *)(long)p->offset,
stat_fops[p->kind]);
}
}
@ -4304,8 +4388,8 @@ static void kvm_sched_in(struct preempt_notifier *pn, int cpu)
WRITE_ONCE(vcpu->preempted, false);
WRITE_ONCE(vcpu->ready, false);
__this_cpu_write(kvm_running_vcpu, vcpu);
kvm_arch_sched_in(vcpu, cpu);
kvm_arch_vcpu_load(vcpu, cpu);
}
@ -4319,6 +4403,25 @@ static void kvm_sched_out(struct preempt_notifier *pn,
WRITE_ONCE(vcpu->ready, true);
}
kvm_arch_vcpu_put(vcpu);
__this_cpu_write(kvm_running_vcpu, NULL);
}
/**
* kvm_get_running_vcpu - get the vcpu running on the current CPU.
* Thanks to preempt notifiers, this can also be called from
* preemptible context.
*/
struct kvm_vcpu *kvm_get_running_vcpu(void)
{
return __this_cpu_read(kvm_running_vcpu);
}
/**
* kvm_get_running_vcpus - get the per-CPU array of currently running vcpus.
*/
struct kvm_vcpu * __percpu *kvm_get_running_vcpus(void)
{
return &kvm_running_vcpu;
}
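
A minimal caller sketch, mirroring the perf callbacks earlier in this series: any per-CPU code can now ask which vCPU, if any, is loaded on the current CPU without architecture-specific plumbing (the function name is illustrative):

#include <linux/kvm_host.h>

static bool demo_cpu_is_running_guest_vcpu(void)
{
	/* NULL outside vcpu_load()/vcpu_put(), i.e. no vCPU on this CPU */
	return kvm_get_running_vcpu() != NULL;
}
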
static void check_processor_compat(void *rtn)


@ -85,6 +85,7 @@ int irq_bypass_register_producer(struct irq_bypass_producer *producer)
{
struct irq_bypass_producer *tmp;
struct irq_bypass_consumer *consumer;
int ret;
if (!producer->token)
return -EINVAL;
@ -98,20 +99,16 @@ int irq_bypass_register_producer(struct irq_bypass_producer *producer)
list_for_each_entry(tmp, &producers, node) {
if (tmp->token == producer->token) {
mutex_unlock(&lock);
module_put(THIS_MODULE);
return -EBUSY;
ret = -EBUSY;
goto out_err;
}
}
list_for_each_entry(consumer, &consumers, node) {
if (consumer->token == producer->token) {
int ret = __connect(producer, consumer);
if (ret) {
mutex_unlock(&lock);
module_put(THIS_MODULE);
return ret;
}
ret = __connect(producer, consumer);
if (ret)
goto out_err;
break;
}
}
@ -121,6 +118,10 @@ int irq_bypass_register_producer(struct irq_bypass_producer *producer)
mutex_unlock(&lock);
return 0;
out_err:
mutex_unlock(&lock);
module_put(THIS_MODULE);
return ret;
}
EXPORT_SYMBOL_GPL(irq_bypass_register_producer);
@ -179,6 +180,7 @@ int irq_bypass_register_consumer(struct irq_bypass_consumer *consumer)
{
struct irq_bypass_consumer *tmp;
struct irq_bypass_producer *producer;
int ret;
if (!consumer->token ||
!consumer->add_producer || !consumer->del_producer)
@ -193,20 +195,16 @@ int irq_bypass_register_consumer(struct irq_bypass_consumer *consumer)
list_for_each_entry(tmp, &consumers, node) {
if (tmp->token == consumer->token || tmp == consumer) {
mutex_unlock(&lock);
module_put(THIS_MODULE);
return -EBUSY;
ret = -EBUSY;
goto out_err;
}
}
list_for_each_entry(producer, &producers, node) {
if (producer->token == consumer->token) {
int ret = __connect(producer, consumer);
if (ret) {
mutex_unlock(&lock);
module_put(THIS_MODULE);
return ret;
}
ret = __connect(producer, consumer);
if (ret)
goto out_err;
break;
}
}
@ -216,6 +214,10 @@ int irq_bypass_register_consumer(struct irq_bypass_consumer *consumer)
mutex_unlock(&lock);
return 0;
out_err:
mutex_unlock(&lock);
module_put(THIS_MODULE);
return ret;
}
EXPORT_SYMBOL_GPL(irq_bypass_register_consumer);