* ARM: support for SVE and Pointer Authentication in guests, PMU improvements

* POWER: support for direct access to the POWER9 XIVE interrupt controller,
 memory and performance optimizations.
 
 * x86: support for accessing memory not backed by struct page, fixes and refactoring
 
 * Generic: dirty page tracking improvements
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQEcBAABAgAGBQJc3qV/AAoJEL/70l94x66Dn3QH/jX1Bn0P/RZAIt4w0SySklSg
 PqxUKDyBQqB9vN9Qeb9jWXAKPH2CtM3+up/rz7oRnBWp7qA6vXcC/R/QJYAvzdXE
 nklsR/oYCsflR1KdlVYuDvvPCPP2fLBU5zfN83OsaBQ8fNRkm3gN+N5XQ2SbXbLy
 Mo9tybS4otY201UAC96e8N0ipwwyCRpDneQpLcl+F5nH3RBt63cVbs04O+70MXn7
 eT4I+8K3+Go7LATzT8hglD21D/7uvE31qQb6yr5L33IfhU4GB51RZzBXTNaAdY8n
 hT1rMrRkAMAFWYZPQDfoMadjWU3i5DIfstKjDxOr9oTfuOEp5Z+GvJwvVnUDg1I=
 =D0+p
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull KVM updates from Paolo Bonzini:
 "ARM:
   - support for SVE and Pointer Authentication in guests
   - PMU improvements

  POWER:
   - support for direct access to the POWER9 XIVE interrupt controller
   - memory and performance optimizations

  x86:
   - support for accessing memory not backed by struct page
   - fixes and refactoring

  Generic:
   - dirty page tracking improvements"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (155 commits)
  kvm: fix compilation on aarch64
  Revert "KVM: nVMX: Expose RDPMC-exiting only when guest supports PMU"
  kvm: x86: Fix L1TF mitigation for shadow MMU
  KVM: nVMX: Disable intercept for FS/GS base MSRs in vmcs02 when possible
  KVM: PPC: Book3S: Remove useless checks in 'release' method of KVM device
  KVM: PPC: Book3S HV: XIVE: Fix spelling mistake "acessing" -> "accessing"
  KVM: PPC: Book3S HV: Make sure to load LPID for radix VCPUs
  kvm: nVMX: Set nested_run_pending in vmx_set_nested_state after checks complete
  tests: kvm: Add tests for KVM_SET_NESTED_STATE
  KVM: nVMX: KVM_SET_NESTED_STATE - Tear down old EVMCS state before setting new state
  tests: kvm: Add tests for KVM_CAP_MAX_VCPUS and KVM_CAP_MAX_CPU_ID
  tests: kvm: Add tests to .gitignore
  KVM: Introduce KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2
  KVM: Fix kvm_clear_dirty_log_protect off-by-(minus-)one
  KVM: Fix the bitmap range to copy during clear dirty
  KVM: arm64: Fix ptrauth ID register masking logic
  KVM: x86: use direct accessors for RIP and RSP
  KVM: VMX: Use accessors for GPRs outside of dedicated caching logic
  KVM: x86: Omit caching logic for always-available GPRs
  kvm, x86: Properly check whether a pfn is an MMIO or not
  ...
This commit is contained in:
Linus Torvalds 2019-05-17 10:33:30 -07:00
commit 0ef0fd3515
91 changed files with 5580 additions and 971 deletions

View File

@ -0,0 +1,85 @@
Perf Event Attributes
=====================
Author: Andrew Murray <andrew.murray@arm.com>
Date: 2019-03-06
exclude_user
------------
This attribute excludes userspace.
Userspace always runs at EL0 and thus this attribute will exclude EL0.
exclude_kernel
--------------
This attribute excludes the kernel.
The kernel runs at EL2 with VHE and EL1 without. Guest kernels always run
at EL1.
For the host this attribute will exclude EL1 and additionally EL2 on a VHE
system.
For the guest this attribute will exclude EL1. Please note that EL2 is
never counted within a guest.
exclude_hv
----------
This attribute excludes the hypervisor.
For a VHE host this attribute is ignored as we consider the host kernel to
be the hypervisor.
For a non-VHE host this attribute will exclude EL2 as we consider the
hypervisor to be any code that runs at EL2 which is predominantly used for
guest/host transitions.
For the guest this attribute has no effect. Please note that EL2 is
never counted within a guest.
exclude_host / exclude_guest
----------------------------
These attributes exclude the KVM host and guest, respectively.
The KVM host may run at EL0 (userspace), EL1 (non-VHE kernel) and EL2 (VHE
kernel or non-VHE hypervisor).
The KVM guest may run at EL0 (userspace) and EL1 (kernel).
Due to the overlapping exception levels between host and guests we cannot
exclusively rely on the PMU's hardware exception filtering - therefore we
must enable/disable counting on the entry and exit to the guest. This is
performed differently on VHE and non-VHE systems.
For non-VHE systems we exclude EL2 for exclude_host - upon entering and
exiting the guest we disable/enable the event as appropriate based on the
exclude_host and exclude_guest attributes.
For VHE systems we exclude EL1 for exclude_guest and exclude both EL0,EL2
for exclude_host. Upon entering and exiting the guest we modify the event
to include/exclude EL0 as appropriate based on the exclude_host and
exclude_guest attributes.
The statements above also apply when these attributes are used within a
non-VHE guest however please note that EL2 is never counted within a guest.
Accuracy
--------
On non-VHE hosts we enable/disable counters on the entry/exit of host/guest
transition at EL2 - however there is a period of time between
enabling/disabling the counters and entering/exiting the guest. We are
able to eliminate counters counting host events on the boundaries of guest
entry/exit when counting guest events by filtering out EL2 for
exclude_host. However when using !exclude_hv there is a small blackout
window at the guest entry/exit where host events are not captured.
On VHE systems there are no blackout windows.

View File

@ -87,7 +87,21 @@ used to get and set the keys for a thread.
Virtualization
--------------
Pointer authentication is not currently supported in KVM guests. KVM
will mask the feature bits from ID_AA64ISAR1_EL1, and attempted use of
the feature will result in an UNDEFINED exception being injected into
the guest.
Pointer authentication is enabled in KVM guest when each virtual cpu is
initialised by passing flags KVM_ARM_VCPU_PTRAUTH_[ADDRESS/GENERIC] and
requesting these two separate cpu features to be enabled. The current KVM
guest implementation works by enabling both features together, so both
these userspace flags are checked before enabling pointer authentication.
The separate userspace flag will allow to have no userspace ABI changes
if support is added in the future to allow these two features to be
enabled independently of one another.
As Arm Architecture specifies that Pointer Authentication feature is
implemented along with the VHE feature so KVM arm64 ptrauth code relies
on VHE mode to be present.
Additionally, when these vcpu feature flags are not set then KVM will
filter out the Pointer Authentication system key registers from
KVM_GET/SET_REG_* ioctls and mask those features from cpufeature ID
register. Any attempt to use the Pointer Authentication instructions will
result in an UNDEFINED exception being injected into the guest.

View File

@ -69,23 +69,6 @@ by and on behalf of the VM's process may not be freed/unaccounted when
the VM is shut down.
It is important to note that althought VM ioctls may only be issued from
the process that created the VM, a VM's lifecycle is associated with its
file descriptor, not its creator (process). In other words, the VM and
its resources, *including the associated address space*, are not freed
until the last reference to the VM's file descriptor has been released.
For example, if fork() is issued after ioctl(KVM_CREATE_VM), the VM will
not be freed until both the parent (original) process and its child have
put their references to the VM's file descriptor.
Because a VM's resources are not freed until the last reference to its
file descriptor is released, creating additional references to a VM via
via fork(), dup(), etc... without careful consideration is strongly
discouraged and may have unwanted side effects, e.g. memory allocated
by and on behalf of the VM's process may not be freed/unaccounted when
the VM is shut down.
3. Extensions
-------------
@ -347,7 +330,7 @@ They must be less than the value that KVM_CHECK_EXTENSION returns for
the KVM_CAP_MULTI_ADDRESS_SPACE capability.
The bits in the dirty bitmap are cleared before the ioctl returns, unless
KVM_CAP_MANUAL_DIRTY_LOG_PROTECT is enabled. For more information,
KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2 is enabled. For more information,
see the description of the capability.
4.9 KVM_SET_MEMORY_ALIAS
@ -1117,9 +1100,8 @@ struct kvm_userspace_memory_region {
This ioctl allows the user to create, modify or delete a guest physical
memory slot. Bits 0-15 of "slot" specify the slot id and this value
should be less than the maximum number of user memory slots supported per
VM. The maximum allowed slots can be queried using KVM_CAP_NR_MEMSLOTS,
if this capability is supported by the architecture. Slots may not
overlap in guest physical address space.
VM. The maximum allowed slots can be queried using KVM_CAP_NR_MEMSLOTS.
Slots may not overlap in guest physical address space.
If KVM_CAP_MULTI_ADDRESS_SPACE is available, bits 16-31 of "slot"
specifies the address space which is being modified. They must be
@ -1901,6 +1883,12 @@ Architectures: all
Type: vcpu ioctl
Parameters: struct kvm_one_reg (in)
Returns: 0 on success, negative value on failure
Errors:
 ENOENT:   no such register
 EINVAL:   invalid register ID, or no such register
 EPERM:    (arm64) register access not allowed before vcpu finalization
(These error codes are indicative only: do not rely on a specific error
code being returned in a specific situation.)
struct kvm_one_reg {
__u64 id;
@ -1985,6 +1973,7 @@ registers, find a list below:
PPC | KVM_REG_PPC_TLB3PS | 32
PPC | KVM_REG_PPC_EPTCFG | 32
PPC | KVM_REG_PPC_ICP_STATE | 64
PPC | KVM_REG_PPC_VP_STATE | 128
PPC | KVM_REG_PPC_TB_OFFSET | 64
PPC | KVM_REG_PPC_SPMC1 | 32
PPC | KVM_REG_PPC_SPMC2 | 32
@ -2137,6 +2126,37 @@ contains elements ranging from 32 to 128 bits. The index is a 32bit
value in the kvm_regs structure seen as a 32bit array.
0x60x0 0000 0010 <index into the kvm_regs struct:16>
Specifically:
Encoding Register Bits kvm_regs member
----------------------------------------------------------------
0x6030 0000 0010 0000 X0 64 regs.regs[0]
0x6030 0000 0010 0002 X1 64 regs.regs[1]
...
0x6030 0000 0010 003c X30 64 regs.regs[30]
0x6030 0000 0010 003e SP 64 regs.sp
0x6030 0000 0010 0040 PC 64 regs.pc
0x6030 0000 0010 0042 PSTATE 64 regs.pstate
0x6030 0000 0010 0044 SP_EL1 64 sp_el1
0x6030 0000 0010 0046 ELR_EL1 64 elr_el1
0x6030 0000 0010 0048 SPSR_EL1 64 spsr[KVM_SPSR_EL1] (alias SPSR_SVC)
0x6030 0000 0010 004a SPSR_ABT 64 spsr[KVM_SPSR_ABT]
0x6030 0000 0010 004c SPSR_UND 64 spsr[KVM_SPSR_UND]
0x6030 0000 0010 004e SPSR_IRQ 64 spsr[KVM_SPSR_IRQ]
0x6060 0000 0010 0050 SPSR_FIQ 64 spsr[KVM_SPSR_FIQ]
0x6040 0000 0010 0054 V0 128 fp_regs.vregs[0] (*)
0x6040 0000 0010 0058 V1 128 fp_regs.vregs[1] (*)
...
0x6040 0000 0010 00d0 V31 128 fp_regs.vregs[31] (*)
0x6020 0000 0010 00d4 FPSR 32 fp_regs.fpsr
0x6020 0000 0010 00d5 FPCR 32 fp_regs.fpcr
(*) These encodings are not accepted for SVE-enabled vcpus. See
KVM_ARM_VCPU_INIT.
The equivalent register content can be accessed via bits [127:0] of
the corresponding SVE Zn registers instead for vcpus that have SVE
enabled (see below).
arm64 CCSIDR registers are demultiplexed by CSSELR value:
0x6020 0000 0011 00 <csselr:8>
@ -2146,6 +2166,64 @@ arm64 system registers have the following id bit patterns:
arm64 firmware pseudo-registers have the following bit pattern:
0x6030 0000 0014 <regno:16>
arm64 SVE registers have the following bit patterns:
0x6080 0000 0015 00 <n:5> <slice:5> Zn bits[2048*slice + 2047 : 2048*slice]
0x6050 0000 0015 04 <n:4> <slice:5> Pn bits[256*slice + 255 : 256*slice]
0x6050 0000 0015 060 <slice:5> FFR bits[256*slice + 255 : 256*slice]
0x6060 0000 0015 ffff KVM_REG_ARM64_SVE_VLS pseudo-register
Access to register IDs where 2048 * slice >= 128 * max_vq will fail with
ENOENT. max_vq is the vcpu's maximum supported vector length in 128-bit
quadwords: see (**) below.
These registers are only accessible on vcpus for which SVE is enabled.
See KVM_ARM_VCPU_INIT for details.
In addition, except for KVM_REG_ARM64_SVE_VLS, these registers are not
accessible until the vcpu's SVE configuration has been finalized
using KVM_ARM_VCPU_FINALIZE(KVM_ARM_VCPU_SVE). See KVM_ARM_VCPU_INIT
and KVM_ARM_VCPU_FINALIZE for more information about this procedure.
KVM_REG_ARM64_SVE_VLS is a pseudo-register that allows the set of vector
lengths supported by the vcpu to be discovered and configured by
userspace. When transferred to or from user memory via KVM_GET_ONE_REG
or KVM_SET_ONE_REG, the value of this register is of type
__u64[KVM_ARM64_SVE_VLS_WORDS], and encodes the set of vector lengths as
follows:
__u64 vector_lengths[KVM_ARM64_SVE_VLS_WORDS];
if (vq >= SVE_VQ_MIN && vq <= SVE_VQ_MAX &&
((vector_lengths[(vq - KVM_ARM64_SVE_VQ_MIN) / 64] >>
((vq - KVM_ARM64_SVE_VQ_MIN) % 64)) & 1))
/* Vector length vq * 16 bytes supported */
else
/* Vector length vq * 16 bytes not supported */
(**) The maximum value vq for which the above condition is true is
max_vq. This is the maximum vector length available to the guest on
this vcpu, and determines which register slices are visible through
this ioctl interface.
(See Documentation/arm64/sve.txt for an explanation of the "vq"
nomenclature.)
KVM_REG_ARM64_SVE_VLS is only accessible after KVM_ARM_VCPU_INIT.
KVM_ARM_VCPU_INIT initialises it to the best set of vector lengths that
the host supports.
Userspace may subsequently modify it if desired until the vcpu's SVE
configuration is finalized using KVM_ARM_VCPU_FINALIZE(KVM_ARM_VCPU_SVE).
Apart from simply removing all vector lengths from the host set that
exceed some value, support for arbitrarily chosen sets of vector lengths
is hardware-dependent and may not be available. Attempting to configure
an invalid set of vector lengths via KVM_SET_ONE_REG will fail with
EINVAL.
After the vcpu's SVE configuration is finalized, further attempts to
write this register will fail with EPERM.
MIPS registers are mapped using the lower 32 bits. The upper 16 of that is
the register group type:
@ -2198,6 +2276,12 @@ Architectures: all
Type: vcpu ioctl
Parameters: struct kvm_one_reg (in and out)
Returns: 0 on success, negative value on failure
Errors include:
 ENOENT:   no such register
 EINVAL:   invalid register ID, or no such register
 EPERM:    (arm64) register access not allowed before vcpu finalization
(These error codes are indicative only: do not rely on a specific error
code being returned in a specific situation.)
This ioctl allows to receive the value of a single register implemented
in a vcpu. The register to read is indicated by the "id" field of the
@ -2690,6 +2774,49 @@ Possible features:
- KVM_ARM_VCPU_PMU_V3: Emulate PMUv3 for the CPU.
Depends on KVM_CAP_ARM_PMU_V3.
- KVM_ARM_VCPU_PTRAUTH_ADDRESS: Enables Address Pointer authentication
for arm64 only.
Depends on KVM_CAP_ARM_PTRAUTH_ADDRESS.
If KVM_CAP_ARM_PTRAUTH_ADDRESS and KVM_CAP_ARM_PTRAUTH_GENERIC are
both present, then both KVM_ARM_VCPU_PTRAUTH_ADDRESS and
KVM_ARM_VCPU_PTRAUTH_GENERIC must be requested or neither must be
requested.
- KVM_ARM_VCPU_PTRAUTH_GENERIC: Enables Generic Pointer authentication
for arm64 only.
Depends on KVM_CAP_ARM_PTRAUTH_GENERIC.
If KVM_CAP_ARM_PTRAUTH_ADDRESS and KVM_CAP_ARM_PTRAUTH_GENERIC are
both present, then both KVM_ARM_VCPU_PTRAUTH_ADDRESS and
KVM_ARM_VCPU_PTRAUTH_GENERIC must be requested or neither must be
requested.
- KVM_ARM_VCPU_SVE: Enables SVE for the CPU (arm64 only).
Depends on KVM_CAP_ARM_SVE.
Requires KVM_ARM_VCPU_FINALIZE(KVM_ARM_VCPU_SVE):
* After KVM_ARM_VCPU_INIT:
- KVM_REG_ARM64_SVE_VLS may be read using KVM_GET_ONE_REG: the
initial value of this pseudo-register indicates the best set of
vector lengths possible for a vcpu on this host.
* Before KVM_ARM_VCPU_FINALIZE(KVM_ARM_VCPU_SVE):
- KVM_RUN and KVM_GET_REG_LIST are not available;
- KVM_GET_ONE_REG and KVM_SET_ONE_REG cannot be used to access
the scalable archietctural SVE registers
KVM_REG_ARM64_SVE_ZREG(), KVM_REG_ARM64_SVE_PREG() or
KVM_REG_ARM64_SVE_FFR;
- KVM_REG_ARM64_SVE_VLS may optionally be written using
KVM_SET_ONE_REG, to modify the set of vector lengths available
for the vcpu.
* After KVM_ARM_VCPU_FINALIZE(KVM_ARM_VCPU_SVE):
- the KVM_REG_ARM64_SVE_VLS pseudo-register is immutable, and can
no longer be written using KVM_SET_ONE_REG.
4.83 KVM_ARM_PREFERRED_TARGET
@ -3809,7 +3936,7 @@ to I/O ports.
4.117 KVM_CLEAR_DIRTY_LOG (vm ioctl)
Capability: KVM_CAP_MANUAL_DIRTY_LOG_PROTECT
Capability: KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2
Architectures: x86, arm, arm64, mips
Type: vm ioctl
Parameters: struct kvm_dirty_log (in)
@ -3842,10 +3969,10 @@ the address space for which you want to return the dirty bitmap.
They must be less than the value that KVM_CHECK_EXTENSION returns for
the KVM_CAP_MULTI_ADDRESS_SPACE capability.
This ioctl is mostly useful when KVM_CAP_MANUAL_DIRTY_LOG_PROTECT
This ioctl is mostly useful when KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2
is enabled; for more information, see the description of the capability.
However, it can always be used as long as KVM_CHECK_EXTENSION confirms
that KVM_CAP_MANUAL_DIRTY_LOG_PROTECT is present.
that KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2 is present.
4.118 KVM_GET_SUPPORTED_HV_CPUID
@ -3904,6 +4031,40 @@ number of valid entries in the 'entries' array, which is then filled.
'index' and 'flags' fields in 'struct kvm_cpuid_entry2' are currently reserved,
userspace should not expect to get any particular value there.
4.119 KVM_ARM_VCPU_FINALIZE
Architectures: arm, arm64
Type: vcpu ioctl
Parameters: int feature (in)
Returns: 0 on success, -1 on error
Errors:
EPERM: feature not enabled, needs configuration, or already finalized
EINVAL: feature unknown or not present
Recognised values for feature:
arm64 KVM_ARM_VCPU_SVE (requires KVM_CAP_ARM_SVE)
Finalizes the configuration of the specified vcpu feature.
The vcpu must already have been initialised, enabling the affected feature, by
means of a successful KVM_ARM_VCPU_INIT call with the appropriate flag set in
features[].
For affected vcpu features, this is a mandatory step that must be performed
before the vcpu is fully usable.
Between KVM_ARM_VCPU_INIT and KVM_ARM_VCPU_FINALIZE, the feature may be
configured by use of ioctls such as KVM_SET_ONE_REG. The exact configuration
that should be performaned and how to do it are feature-dependent.
Other calls that depend on a particular feature being finalized, such as
KVM_RUN, KVM_GET_REG_LIST, KVM_GET_ONE_REG and KVM_SET_ONE_REG, will fail with
-EPERM unless the feature has already been finalized by means of a
KVM_ARM_VCPU_FINALIZE call.
See KVM_ARM_VCPU_INIT for details of vcpu features that require finalization
using this ioctl.
5. The kvm_run structure
------------------------
@ -4505,6 +4666,15 @@ struct kvm_sync_regs {
struct kvm_vcpu_events events;
};
6.75 KVM_CAP_PPC_IRQ_XIVE
Architectures: ppc
Target: vcpu
Parameters: args[0] is the XIVE device fd
args[1] is the XIVE CPU number (server ID) for this vcpu
This capability connects the vcpu to an in-kernel XIVE device.
7. Capabilities that can be enabled on VMs
------------------------------------------
@ -4798,7 +4968,7 @@ and injected exceptions.
* For the new DR6 bits, note that bit 16 is set iff the #DB exception
will clear DR6.RTM.
7.18 KVM_CAP_MANUAL_DIRTY_LOG_PROTECT
7.18 KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2
Architectures: x86, arm, arm64, mips
Parameters: args[0] whether feature should be enabled or not
@ -4821,6 +4991,11 @@ while userspace can see false reports of dirty pages. Manual reprotection
helps reducing this time, improving guest performance and reducing the
number of dirty log false positives.
KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2 was previously available under the name
KVM_CAP_MANUAL_DIRTY_LOG_PROTECT, but the implementation had bugs that make
it hard or impossible to use it correctly. The availability of
KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2 signals that those bugs are fixed.
Userspace should not try to use KVM_CAP_MANUAL_DIRTY_LOG_PROTECT.
8. Other capabilities.
----------------------

View File

@ -141,7 +141,8 @@ struct kvm_s390_vm_cpu_subfunc {
u8 pcc[16]; # valid with Message-Security-Assist-Extension 4
u8 ppno[16]; # valid with Message-Security-Assist-Extension 5
u8 kma[16]; # valid with Message-Security-Assist-Extension 8
u8 reserved[1808]; # reserved for future instructions
u8 kdsa[16]; # valid with Message-Security-Assist-Extension 9
u8 reserved[1792]; # reserved for future instructions
};
Parameters: address of a buffer to load the subfunction blocks from.

View File

@ -0,0 +1,197 @@
POWER9 eXternal Interrupt Virtualization Engine (XIVE Gen1)
==========================================================
Device types supported:
KVM_DEV_TYPE_XIVE POWER9 XIVE Interrupt Controller generation 1
This device acts as a VM interrupt controller. It provides the KVM
interface to configure the interrupt sources of a VM in the underlying
POWER9 XIVE interrupt controller.
Only one XIVE instance may be instantiated. A guest XIVE device
requires a POWER9 host and the guest OS should have support for the
XIVE native exploitation interrupt mode. If not, it should run using
the legacy interrupt mode, referred as XICS (POWER7/8).
* Device Mappings
The KVM device exposes different MMIO ranges of the XIVE HW which
are required for interrupt management. These are exposed to the
guest in VMAs populated with a custom VM fault handler.
1. Thread Interrupt Management Area (TIMA)
Each thread has an associated Thread Interrupt Management context
composed of a set of registers. These registers let the thread
handle priority management and interrupt acknowledgment. The most
important are :
- Interrupt Pending Buffer (IPB)
- Current Processor Priority (CPPR)
- Notification Source Register (NSR)
They are exposed to software in four different pages each proposing
a view with a different privilege. The first page is for the
physical thread context and the second for the hypervisor. Only the
third (operating system) and the fourth (user level) are exposed the
guest.
2. Event State Buffer (ESB)
Each source is associated with an Event State Buffer (ESB) with
either a pair of even/odd pair of pages which provides commands to
manage the source: to trigger, to EOI, to turn off the source for
instance.
3. Device pass-through
When a device is passed-through into the guest, the source
interrupts are from a different HW controller (PHB4) and the ESB
pages exposed to the guest should accommadate this change.
The passthru_irq helpers, kvmppc_xive_set_mapped() and
kvmppc_xive_clr_mapped() are called when the device HW irqs are
mapped into or unmapped from the guest IRQ number space. The KVM
device extends these helpers to clear the ESB pages of the guest IRQ
number being mapped and then lets the VM fault handler repopulate.
The handler will insert the ESB page corresponding to the HW
interrupt of the device being passed-through or the initial IPI ESB
page if the device has being removed.
The ESB remapping is fully transparent to the guest and the OS
device driver. All handling is done within VFIO and the above
helpers in KVM-PPC.
* Groups:
1. KVM_DEV_XIVE_GRP_CTRL
Provides global controls on the device
Attributes:
1.1 KVM_DEV_XIVE_RESET (write only)
Resets the interrupt controller configuration for sources and event
queues. To be used by kexec and kdump.
Errors: none
1.2 KVM_DEV_XIVE_EQ_SYNC (write only)
Sync all the sources and queues and mark the EQ pages dirty. This
to make sure that a consistent memory state is captured when
migrating the VM.
Errors: none
2. KVM_DEV_XIVE_GRP_SOURCE (write only)
Initializes a new source in the XIVE device and mask it.
Attributes:
Interrupt source number (64-bit)
The kvm_device_attr.addr points to a __u64 value:
bits: | 63 .... 2 | 1 | 0
values: | unused | level | type
- type: 0:MSI 1:LSI
- level: assertion level in case of an LSI.
Errors:
-E2BIG: Interrupt source number is out of range
-ENOMEM: Could not create a new source block
-EFAULT: Invalid user pointer for attr->addr.
-ENXIO: Could not allocate underlying HW interrupt
3. KVM_DEV_XIVE_GRP_SOURCE_CONFIG (write only)
Configures source targeting
Attributes:
Interrupt source number (64-bit)
The kvm_device_attr.addr points to a __u64 value:
bits: | 63 .... 33 | 32 | 31 .. 3 | 2 .. 0
values: | eisn | mask | server | priority
- priority: 0-7 interrupt priority level
- server: CPU number chosen to handle the interrupt
- mask: mask flag (unused)
- eisn: Effective Interrupt Source Number
Errors:
-ENOENT: Unknown source number
-EINVAL: Not initialized source number
-EINVAL: Invalid priority
-EINVAL: Invalid CPU number.
-EFAULT: Invalid user pointer for attr->addr.
-ENXIO: CPU event queues not configured or configuration of the
underlying HW interrupt failed
-EBUSY: No CPU available to serve interrupt
4. KVM_DEV_XIVE_GRP_EQ_CONFIG (read-write)
Configures an event queue of a CPU
Attributes:
EQ descriptor identifier (64-bit)
The EQ descriptor identifier is a tuple (server, priority) :
bits: | 63 .... 32 | 31 .. 3 | 2 .. 0
values: | unused | server | priority
The kvm_device_attr.addr points to :
struct kvm_ppc_xive_eq {
__u32 flags;
__u32 qshift;
__u64 qaddr;
__u32 qtoggle;
__u32 qindex;
__u8 pad[40];
};
- flags: queue flags
KVM_XIVE_EQ_ALWAYS_NOTIFY (required)
forces notification without using the coalescing mechanism
provided by the XIVE END ESBs.
- qshift: queue size (power of 2)
- qaddr: real address of queue
- qtoggle: current queue toggle bit
- qindex: current queue index
- pad: reserved for future use
Errors:
-ENOENT: Invalid CPU number
-EINVAL: Invalid priority
-EINVAL: Invalid flags
-EINVAL: Invalid queue size
-EINVAL: Invalid queue address
-EFAULT: Invalid user pointer for attr->addr.
-EIO: Configuration of the underlying HW failed
5. KVM_DEV_XIVE_GRP_SOURCE_SYNC (write only)
Synchronize the source to flush event notifications
Attributes:
Interrupt source number (64-bit)
Errors:
-ENOENT: Unknown source number
-EINVAL: Not initialized source number
* VCPU state
The XIVE IC maintains VP interrupt state in an internal structure
called the NVT. When a VP is not dispatched on a HW processor
thread, this structure can be updated by HW if the VP is the target
of an event notification.
It is important for migration to capture the cached IPB from the NVT
as it synthesizes the priorities of the pending interrupts. We
capture a bit more to report debug information.
KVM_REG_PPC_VP_STATE (2 * 64bits)
bits: | 63 .... 32 | 31 .... 0 |
values: | TIMA word0 | TIMA word1 |
bits: | 127 .......... 64 |
values: | unused |
* Migration:
Saving the state of a VM using the XIVE native exploitation mode
should follow a specific sequence. When the VM is stopped :
1. Mask all sources (PQ=01) to stop the flow of events.
2. Sync the XIVE device with the KVM control KVM_DEV_XIVE_EQ_SYNC to
flush any in-flight event notification and to stabilize the EQs. At
this stage, the EQ pages are marked dirty to make sure they are
transferred in the migration sequence.
3. Capture the state of the source targeting, the EQs configuration
and the state of thread interrupt context registers.
Restore is similar :
1. Restore the EQ configuration. As targeting depends on it.
2. Restore targeting
3. Restore the thread interrupt contexts
4. Restore the source states
5. Let the vCPU run

View File

@ -343,4 +343,6 @@ static inline unsigned long vcpu_data_host_to_guest(struct kvm_vcpu *vcpu,
}
}
static inline void vcpu_ptrauth_setup_lazy(struct kvm_vcpu *vcpu) {}
#endif /* __ARM_KVM_EMULATE_H__ */

View File

@ -19,6 +19,7 @@
#ifndef __ARM_KVM_HOST_H__
#define __ARM_KVM_HOST_H__
#include <linux/errno.h>
#include <linux/types.h>
#include <linux/kvm_types.h>
#include <asm/cputype.h>
@ -53,6 +54,8 @@
DECLARE_STATIC_KEY_FALSE(userspace_irqchip_in_use);
static inline int kvm_arm_init_sve(void) { return 0; }
u32 *kvm_vcpu_reg(struct kvm_vcpu *vcpu, u8 reg_num, u32 mode);
int __attribute_const__ kvm_target_cpu(void);
int kvm_reset_vcpu(struct kvm_vcpu *vcpu);
@ -150,9 +153,13 @@ struct kvm_cpu_context {
u32 cp15[NR_CP15_REGS];
};
typedef struct kvm_cpu_context kvm_cpu_context_t;
struct kvm_host_data {
struct kvm_cpu_context host_ctxt;
};
static inline void kvm_init_host_cpu_context(kvm_cpu_context_t *cpu_ctxt,
typedef struct kvm_host_data kvm_host_data_t;
static inline void kvm_init_host_cpu_context(struct kvm_cpu_context *cpu_ctxt,
int cpu)
{
/* The host's MPIDR is immutable, so let's set it up at boot time */
@ -182,7 +189,7 @@ struct kvm_vcpu_arch {
struct kvm_vcpu_fault_info fault;
/* Host FP context */
kvm_cpu_context_t *host_cpu_context;
struct kvm_cpu_context *host_cpu_context;
/* VGIC state */
struct vgic_cpu vgic_cpu;
@ -361,6 +368,9 @@ static inline void kvm_arch_vcpu_load_fp(struct kvm_vcpu *vcpu) {}
static inline void kvm_arch_vcpu_ctxsync_fp(struct kvm_vcpu *vcpu) {}
static inline void kvm_arch_vcpu_put_fp(struct kvm_vcpu *vcpu) {}
static inline void kvm_vcpu_pmu_restore_guest(struct kvm_vcpu *vcpu) {}
static inline void kvm_vcpu_pmu_restore_host(struct kvm_vcpu *vcpu) {}
static inline void kvm_arm_vhe_guest_enter(void) {}
static inline void kvm_arm_vhe_guest_exit(void) {}
@ -409,4 +419,14 @@ static inline int kvm_arm_setup_stage2(struct kvm *kvm, unsigned long type)
return 0;
}
static inline int kvm_arm_vcpu_finalize(struct kvm_vcpu *vcpu, int feature)
{
return -EINVAL;
}
static inline bool kvm_arm_vcpu_is_finalized(struct kvm_vcpu *vcpu)
{
return true;
}
#endif /* __ARM_KVM_HOST_H__ */

View File

@ -1341,6 +1341,7 @@ menu "ARMv8.3 architectural features"
config ARM64_PTR_AUTH
bool "Enable support for pointer authentication"
default y
depends on !KVM || ARM64_VHE
help
Pointer authentication (part of the ARMv8.3 Extensions) provides
instructions for signing and authenticating pointers against secret
@ -1354,8 +1355,9 @@ config ARM64_PTR_AUTH
context-switched along with the process.
The feature is detected at runtime. If the feature is not present in
hardware it will not be advertised to userspace nor will it be
enabled.
hardware it will not be advertised to userspace/KVM guest nor will it
be enabled. However, KVM guest also require VHE mode and hence
CONFIG_ARM64_VHE=y option to use this feature.
endmenu

View File

@ -24,10 +24,13 @@
#ifndef __ASSEMBLY__
#include <linux/bitmap.h>
#include <linux/build_bug.h>
#include <linux/bug.h>
#include <linux/cache.h>
#include <linux/init.h>
#include <linux/stddef.h>
#include <linux/types.h>
#if defined(__KERNEL__) && defined(CONFIG_COMPAT)
/* Masks for extracting the FPSR and FPCR from the FPSCR */
@ -56,7 +59,8 @@ extern void fpsimd_restore_current_state(void);
extern void fpsimd_update_current_state(struct user_fpsimd_state const *state);
extern void fpsimd_bind_task_to_cpu(void);
extern void fpsimd_bind_state_to_cpu(struct user_fpsimd_state *state);
extern void fpsimd_bind_state_to_cpu(struct user_fpsimd_state *state,
void *sve_state, unsigned int sve_vl);
extern void fpsimd_flush_task_state(struct task_struct *target);
extern void fpsimd_flush_cpu_state(void);
@ -87,6 +91,29 @@ extern void sve_kernel_enable(const struct arm64_cpu_capabilities *__unused);
extern u64 read_zcr_features(void);
extern int __ro_after_init sve_max_vl;
extern int __ro_after_init sve_max_virtualisable_vl;
extern __ro_after_init DECLARE_BITMAP(sve_vq_map, SVE_VQ_MAX);
/*
* Helpers to translate bit indices in sve_vq_map to VQ values (and
* vice versa). This allows find_next_bit() to be used to find the
* _maximum_ VQ not exceeding a certain value.
*/
static inline unsigned int __vq_to_bit(unsigned int vq)
{
return SVE_VQ_MAX - vq;
}
static inline unsigned int __bit_to_vq(unsigned int bit)
{
return SVE_VQ_MAX - bit;
}
/* Ensure vq >= SVE_VQ_MIN && vq <= SVE_VQ_MAX before calling this function */
static inline bool sve_vq_available(unsigned int vq)
{
return test_bit(__vq_to_bit(vq), sve_vq_map);
}
#ifdef CONFIG_ARM64_SVE

View File

@ -108,7 +108,8 @@ extern u32 __kvm_get_mdcr_el2(void);
.endm
.macro get_host_ctxt reg, tmp
hyp_adr_this_cpu \reg, kvm_host_cpu_state, \tmp
hyp_adr_this_cpu \reg, kvm_host_data, \tmp
add \reg, \reg, #HOST_DATA_CONTEXT
.endm
.macro get_vcpu_ptr vcpu, ctxt

View File

@ -98,6 +98,22 @@ static inline void vcpu_set_wfe_traps(struct kvm_vcpu *vcpu)
vcpu->arch.hcr_el2 |= HCR_TWE;
}
static inline void vcpu_ptrauth_enable(struct kvm_vcpu *vcpu)
{
vcpu->arch.hcr_el2 |= (HCR_API | HCR_APK);
}
static inline void vcpu_ptrauth_disable(struct kvm_vcpu *vcpu)
{
vcpu->arch.hcr_el2 &= ~(HCR_API | HCR_APK);
}
static inline void vcpu_ptrauth_setup_lazy(struct kvm_vcpu *vcpu)
{
if (vcpu_has_ptrauth(vcpu))
vcpu_ptrauth_disable(vcpu);
}
static inline unsigned long vcpu_get_vsesr(struct kvm_vcpu *vcpu)
{
return vcpu->arch.vsesr_el2;

View File

@ -22,9 +22,13 @@
#ifndef __ARM64_KVM_HOST_H__
#define __ARM64_KVM_HOST_H__
#include <linux/bitmap.h>
#include <linux/types.h>
#include <linux/jump_label.h>
#include <linux/kvm_types.h>
#include <linux/percpu.h>
#include <asm/arch_gicv3.h>
#include <asm/barrier.h>
#include <asm/cpufeature.h>
#include <asm/daifflags.h>
#include <asm/fpsimd.h>
@ -45,7 +49,7 @@
#define KVM_MAX_VCPUS VGIC_V3_MAX_CPUS
#define KVM_VCPU_MAX_FEATURES 4
#define KVM_VCPU_MAX_FEATURES 7
#define KVM_REQ_SLEEP \
KVM_ARCH_REQ_FLAGS(0, KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP)
@ -54,8 +58,12 @@
DECLARE_STATIC_KEY_FALSE(userspace_irqchip_in_use);
extern unsigned int kvm_sve_max_vl;
int kvm_arm_init_sve(void);
int __attribute_const__ kvm_target_cpu(void);
int kvm_reset_vcpu(struct kvm_vcpu *vcpu);
void kvm_arch_vcpu_uninit(struct kvm_vcpu *vcpu);
int kvm_arch_vm_ioctl_check_extension(struct kvm *kvm, long ext);
void __extended_idmap_trampoline(phys_addr_t boot_pgd, phys_addr_t idmap_start);
@ -117,6 +125,7 @@ enum vcpu_sysreg {
SCTLR_EL1, /* System Control Register */
ACTLR_EL1, /* Auxiliary Control Register */
CPACR_EL1, /* Coprocessor Access Control */
ZCR_EL1, /* SVE Control */
TTBR0_EL1, /* Translation Table Base Register 0 */
TTBR1_EL1, /* Translation Table Base Register 1 */
TCR_EL1, /* Translation Control Register */
@ -152,6 +161,18 @@ enum vcpu_sysreg {
PMSWINC_EL0, /* Software Increment Register */
PMUSERENR_EL0, /* User Enable Register */
/* Pointer Authentication Registers in a strict increasing order. */
APIAKEYLO_EL1,
APIAKEYHI_EL1,
APIBKEYLO_EL1,
APIBKEYHI_EL1,
APDAKEYLO_EL1,
APDAKEYHI_EL1,
APDBKEYLO_EL1,
APDBKEYHI_EL1,
APGAKEYLO_EL1,
APGAKEYHI_EL1,
/* 32bit specific registers. Keep them at the end of the range */
DACR32_EL2, /* Domain Access Control Register */
IFSR32_EL2, /* Instruction Fault Status Register */
@ -212,7 +233,17 @@ struct kvm_cpu_context {
struct kvm_vcpu *__hyp_running_vcpu;
};
typedef struct kvm_cpu_context kvm_cpu_context_t;
struct kvm_pmu_events {
u32 events_host;
u32 events_guest;
};
struct kvm_host_data {
struct kvm_cpu_context host_ctxt;
struct kvm_pmu_events pmu_events;
};
typedef struct kvm_host_data kvm_host_data_t;
struct vcpu_reset_state {
unsigned long pc;
@ -223,6 +254,8 @@ struct vcpu_reset_state {
struct kvm_vcpu_arch {
struct kvm_cpu_context ctxt;
void *sve_state;
unsigned int sve_max_vl;
/* HYP configuration */
u64 hcr_el2;
@ -255,7 +288,7 @@ struct kvm_vcpu_arch {
struct kvm_guest_debug_arch external_debug_state;
/* Pointer to host CPU context */
kvm_cpu_context_t *host_cpu_context;
struct kvm_cpu_context *host_cpu_context;
struct thread_info *host_thread_info; /* hyp VA */
struct user_fpsimd_state *host_fpsimd_state; /* hyp VA */
@ -318,12 +351,40 @@ struct kvm_vcpu_arch {
bool sysregs_loaded_on_cpu;
};
/* Pointer to the vcpu's SVE FFR for sve_{save,load}_state() */
#define vcpu_sve_pffr(vcpu) ((void *)((char *)((vcpu)->arch.sve_state) + \
sve_ffr_offset((vcpu)->arch.sve_max_vl)))
#define vcpu_sve_state_size(vcpu) ({ \
size_t __size_ret; \
unsigned int __vcpu_vq; \
\
if (WARN_ON(!sve_vl_valid((vcpu)->arch.sve_max_vl))) { \
__size_ret = 0; \
} else { \
__vcpu_vq = sve_vq_from_vl((vcpu)->arch.sve_max_vl); \
__size_ret = SVE_SIG_REGS_SIZE(__vcpu_vq); \
} \
\
__size_ret; \
})
/* vcpu_arch flags field values: */
#define KVM_ARM64_DEBUG_DIRTY (1 << 0)
#define KVM_ARM64_FP_ENABLED (1 << 1) /* guest FP regs loaded */
#define KVM_ARM64_FP_HOST (1 << 2) /* host FP regs loaded */
#define KVM_ARM64_HOST_SVE_IN_USE (1 << 3) /* backup for host TIF_SVE */
#define KVM_ARM64_HOST_SVE_ENABLED (1 << 4) /* SVE enabled for EL0 */
#define KVM_ARM64_GUEST_HAS_SVE (1 << 5) /* SVE exposed to guest */
#define KVM_ARM64_VCPU_SVE_FINALIZED (1 << 6) /* SVE config completed */
#define KVM_ARM64_GUEST_HAS_PTRAUTH (1 << 7) /* PTRAUTH exposed to guest */
#define vcpu_has_sve(vcpu) (system_supports_sve() && \
((vcpu)->arch.flags & KVM_ARM64_GUEST_HAS_SVE))
#define vcpu_has_ptrauth(vcpu) ((system_supports_address_auth() || \
system_supports_generic_auth()) && \
((vcpu)->arch.flags & KVM_ARM64_GUEST_HAS_PTRAUTH))
#define vcpu_gp_regs(v) (&(v)->arch.ctxt.gp_regs)
@ -432,9 +493,9 @@ void kvm_set_sei_esr(struct kvm_vcpu *vcpu, u64 syndrome);
struct kvm_vcpu *kvm_mpidr_to_vcpu(struct kvm *kvm, unsigned long mpidr);
DECLARE_PER_CPU(kvm_cpu_context_t, kvm_host_cpu_state);
DECLARE_PER_CPU(kvm_host_data_t, kvm_host_data);
static inline void kvm_init_host_cpu_context(kvm_cpu_context_t *cpu_ctxt,
static inline void kvm_init_host_cpu_context(struct kvm_cpu_context *cpu_ctxt,
int cpu)
{
/* The host's MPIDR is immutable, so let's set it up at boot time */
@ -452,8 +513,8 @@ static inline void __cpu_init_hyp_mode(phys_addr_t pgd_ptr,
* kernel's mapping to the linear mapping, and store it in tpidr_el2
* so that we can use adr_l to access per-cpu variables in EL2.
*/
u64 tpidr_el2 = ((u64)this_cpu_ptr(&kvm_host_cpu_state) -
(u64)kvm_ksym_ref(kvm_host_cpu_state));
u64 tpidr_el2 = ((u64)this_cpu_ptr(&kvm_host_data) -
(u64)kvm_ksym_ref(kvm_host_data));
/*
* Call initialization code, and switch to the full blown HYP code.
@ -491,9 +552,10 @@ static inline bool kvm_arch_requires_vhe(void)
return false;
}
void kvm_arm_vcpu_ptrauth_trap(struct kvm_vcpu *vcpu);
static inline void kvm_arch_hardware_unsetup(void) {}
static inline void kvm_arch_sync_events(struct kvm *kvm) {}
static inline void kvm_arch_vcpu_uninit(struct kvm_vcpu *vcpu) {}
static inline void kvm_arch_sched_in(struct kvm_vcpu *vcpu, int cpu) {}
static inline void kvm_arch_vcpu_block_finish(struct kvm_vcpu *vcpu) {}
@ -516,11 +578,28 @@ void kvm_arch_vcpu_load_fp(struct kvm_vcpu *vcpu);
void kvm_arch_vcpu_ctxsync_fp(struct kvm_vcpu *vcpu);
void kvm_arch_vcpu_put_fp(struct kvm_vcpu *vcpu);
static inline bool kvm_pmu_counter_deferred(struct perf_event_attr *attr)
{
return (!has_vhe() && attr->exclude_host);
}
#ifdef CONFIG_KVM /* Avoid conflicts with core headers if CONFIG_KVM=n */
static inline int kvm_arch_vcpu_run_pid_change(struct kvm_vcpu *vcpu)
{
return kvm_arch_vcpu_run_map_fp(vcpu);
}
void kvm_set_pmu_events(u32 set, struct perf_event_attr *attr);
void kvm_clr_pmu_events(u32 clr);
void __pmu_switch_to_host(struct kvm_cpu_context *host_ctxt);
bool __pmu_switch_to_guest(struct kvm_cpu_context *host_ctxt);
void kvm_vcpu_pmu_restore_guest(struct kvm_vcpu *vcpu);
void kvm_vcpu_pmu_restore_host(struct kvm_vcpu *vcpu);
#else
static inline void kvm_set_pmu_events(u32 set, struct perf_event_attr *attr) {}
static inline void kvm_clr_pmu_events(u32 clr) {}
#endif
static inline void kvm_arm_vhe_guest_enter(void)
@ -594,4 +673,10 @@ void kvm_arch_free_vm(struct kvm *kvm);
int kvm_arm_setup_stage2(struct kvm *kvm, unsigned long type);
int kvm_arm_vcpu_finalize(struct kvm_vcpu *vcpu, int feature);
bool kvm_arm_vcpu_is_finalized(struct kvm_vcpu *vcpu);
#define kvm_arm_vcpu_sve_finalized(vcpu) \
((vcpu)->arch.flags & KVM_ARM64_VCPU_SVE_FINALIZED)
#endif /* __ARM64_KVM_HOST_H__ */

View File

@ -149,7 +149,6 @@ void __debug_switch_to_host(struct kvm_vcpu *vcpu);
void __fpsimd_save_state(struct user_fpsimd_state *fp_regs);
void __fpsimd_restore_state(struct user_fpsimd_state *fp_regs);
bool __fpsimd_enabled(void);
void activate_traps_vhe_load(struct kvm_vcpu *vcpu);
void deactivate_traps_vhe_put(void);

View File

@ -0,0 +1,111 @@
/* SPDX-License-Identifier: GPL-2.0 */
/* arch/arm64/include/asm/kvm_ptrauth.h: Guest/host ptrauth save/restore
* Copyright 2019 Arm Limited
* Authors: Mark Rutland <mark.rutland@arm.com>
* Amit Daniel Kachhap <amit.kachhap@arm.com>
*/
#ifndef __ASM_KVM_PTRAUTH_H
#define __ASM_KVM_PTRAUTH_H
#ifdef __ASSEMBLY__
#include <asm/sysreg.h>
#ifdef CONFIG_ARM64_PTR_AUTH
#define PTRAUTH_REG_OFFSET(x) (x - CPU_APIAKEYLO_EL1)
/*
* CPU_AP*_EL1 values exceed immediate offset range (512) for stp
* instruction so below macros takes CPU_APIAKEYLO_EL1 as base and
* calculates the offset of the keys from this base to avoid an extra add
* instruction. These macros assumes the keys offsets follow the order of
* the sysreg enum in kvm_host.h.
*/
.macro ptrauth_save_state base, reg1, reg2
mrs_s \reg1, SYS_APIAKEYLO_EL1
mrs_s \reg2, SYS_APIAKEYHI_EL1
stp \reg1, \reg2, [\base, #PTRAUTH_REG_OFFSET(CPU_APIAKEYLO_EL1)]
mrs_s \reg1, SYS_APIBKEYLO_EL1
mrs_s \reg2, SYS_APIBKEYHI_EL1
stp \reg1, \reg2, [\base, #PTRAUTH_REG_OFFSET(CPU_APIBKEYLO_EL1)]
mrs_s \reg1, SYS_APDAKEYLO_EL1
mrs_s \reg2, SYS_APDAKEYHI_EL1
stp \reg1, \reg2, [\base, #PTRAUTH_REG_OFFSET(CPU_APDAKEYLO_EL1)]
mrs_s \reg1, SYS_APDBKEYLO_EL1
mrs_s \reg2, SYS_APDBKEYHI_EL1
stp \reg1, \reg2, [\base, #PTRAUTH_REG_OFFSET(CPU_APDBKEYLO_EL1)]
mrs_s \reg1, SYS_APGAKEYLO_EL1
mrs_s \reg2, SYS_APGAKEYHI_EL1
stp \reg1, \reg2, [\base, #PTRAUTH_REG_OFFSET(CPU_APGAKEYLO_EL1)]
.endm
.macro ptrauth_restore_state base, reg1, reg2
ldp \reg1, \reg2, [\base, #PTRAUTH_REG_OFFSET(CPU_APIAKEYLO_EL1)]
msr_s SYS_APIAKEYLO_EL1, \reg1
msr_s SYS_APIAKEYHI_EL1, \reg2
ldp \reg1, \reg2, [\base, #PTRAUTH_REG_OFFSET(CPU_APIBKEYLO_EL1)]
msr_s SYS_APIBKEYLO_EL1, \reg1
msr_s SYS_APIBKEYHI_EL1, \reg2
ldp \reg1, \reg2, [\base, #PTRAUTH_REG_OFFSET(CPU_APDAKEYLO_EL1)]
msr_s SYS_APDAKEYLO_EL1, \reg1
msr_s SYS_APDAKEYHI_EL1, \reg2
ldp \reg1, \reg2, [\base, #PTRAUTH_REG_OFFSET(CPU_APDBKEYLO_EL1)]
msr_s SYS_APDBKEYLO_EL1, \reg1
msr_s SYS_APDBKEYHI_EL1, \reg2
ldp \reg1, \reg2, [\base, #PTRAUTH_REG_OFFSET(CPU_APGAKEYLO_EL1)]
msr_s SYS_APGAKEYLO_EL1, \reg1
msr_s SYS_APGAKEYHI_EL1, \reg2
.endm
/*
* Both ptrauth_switch_to_guest and ptrauth_switch_to_host macros will
* check for the presence of one of the cpufeature flag
* ARM64_HAS_ADDRESS_AUTH_ARCH or ARM64_HAS_ADDRESS_AUTH_IMP_DEF and
* then proceed ahead with the save/restore of Pointer Authentication
* key registers.
*/
.macro ptrauth_switch_to_guest g_ctxt, reg1, reg2, reg3
alternative_if ARM64_HAS_ADDRESS_AUTH_ARCH
b 1000f
alternative_else_nop_endif
alternative_if_not ARM64_HAS_ADDRESS_AUTH_IMP_DEF
b 1001f
alternative_else_nop_endif
1000:
ldr \reg1, [\g_ctxt, #(VCPU_HCR_EL2 - VCPU_CONTEXT)]
and \reg1, \reg1, #(HCR_API | HCR_APK)
cbz \reg1, 1001f
add \reg1, \g_ctxt, #CPU_APIAKEYLO_EL1
ptrauth_restore_state \reg1, \reg2, \reg3
1001:
.endm
.macro ptrauth_switch_to_host g_ctxt, h_ctxt, reg1, reg2, reg3
alternative_if ARM64_HAS_ADDRESS_AUTH_ARCH
b 2000f
alternative_else_nop_endif
alternative_if_not ARM64_HAS_ADDRESS_AUTH_IMP_DEF
b 2001f
alternative_else_nop_endif
2000:
ldr \reg1, [\g_ctxt, #(VCPU_HCR_EL2 - VCPU_CONTEXT)]
and \reg1, \reg1, #(HCR_API | HCR_APK)
cbz \reg1, 2001f
add \reg1, \g_ctxt, #CPU_APIAKEYLO_EL1
ptrauth_save_state \reg1, \reg2, \reg3
add \reg1, \h_ctxt, #CPU_APIAKEYLO_EL1
ptrauth_restore_state \reg1, \reg2, \reg3
isb
2001:
.endm
#else /* !CONFIG_ARM64_PTR_AUTH */
.macro ptrauth_switch_to_guest g_ctxt, reg1, reg2, reg3
.endm
.macro ptrauth_switch_to_host g_ctxt, h_ctxt, reg1, reg2, reg3
.endm
#endif /* CONFIG_ARM64_PTR_AUTH */
#endif /* __ASSEMBLY__ */
#endif /* __ASM_KVM_PTRAUTH_H */

View File

@ -454,6 +454,9 @@
#define SYS_ICH_LR14_EL2 __SYS__LR8_EL2(6)
#define SYS_ICH_LR15_EL2 __SYS__LR8_EL2(7)
/* VHE encodings for architectural EL0/1 system registers */
#define SYS_ZCR_EL12 sys_reg(3, 5, 1, 2, 0)
/* Common SCTLR_ELx flags. */
#define SCTLR_ELx_DSSBS (_BITUL(44))
#define SCTLR_ELx_ENIA (_BITUL(31))

View File

@ -35,6 +35,7 @@
#include <linux/psci.h>
#include <linux/types.h>
#include <asm/ptrace.h>
#include <asm/sve_context.h>
#define __KVM_HAVE_GUEST_DEBUG
#define __KVM_HAVE_IRQ_LINE
@ -102,6 +103,9 @@ struct kvm_regs {
#define KVM_ARM_VCPU_EL1_32BIT 1 /* CPU running a 32bit VM */
#define KVM_ARM_VCPU_PSCI_0_2 2 /* CPU uses PSCI v0.2 */
#define KVM_ARM_VCPU_PMU_V3 3 /* Support guest PMUv3 */
#define KVM_ARM_VCPU_SVE 4 /* enable SVE for this CPU */
#define KVM_ARM_VCPU_PTRAUTH_ADDRESS 5 /* VCPU uses address authentication */
#define KVM_ARM_VCPU_PTRAUTH_GENERIC 6 /* VCPU uses generic authentication */
struct kvm_vcpu_init {
__u32 target;
@ -226,6 +230,45 @@ struct kvm_vcpu_events {
KVM_REG_ARM_FW | ((r) & 0xffff))
#define KVM_REG_ARM_PSCI_VERSION KVM_REG_ARM_FW_REG(0)
/* SVE registers */
#define KVM_REG_ARM64_SVE (0x15 << KVM_REG_ARM_COPROC_SHIFT)
/* Z- and P-regs occupy blocks at the following offsets within this range: */
#define KVM_REG_ARM64_SVE_ZREG_BASE 0
#define KVM_REG_ARM64_SVE_PREG_BASE 0x400
#define KVM_REG_ARM64_SVE_FFR_BASE 0x600
#define KVM_ARM64_SVE_NUM_ZREGS __SVE_NUM_ZREGS
#define KVM_ARM64_SVE_NUM_PREGS __SVE_NUM_PREGS
#define KVM_ARM64_SVE_MAX_SLICES 32
#define KVM_REG_ARM64_SVE_ZREG(n, i) \
(KVM_REG_ARM64 | KVM_REG_ARM64_SVE | KVM_REG_ARM64_SVE_ZREG_BASE | \
KVM_REG_SIZE_U2048 | \
(((n) & (KVM_ARM64_SVE_NUM_ZREGS - 1)) << 5) | \
((i) & (KVM_ARM64_SVE_MAX_SLICES - 1)))
#define KVM_REG_ARM64_SVE_PREG(n, i) \
(KVM_REG_ARM64 | KVM_REG_ARM64_SVE | KVM_REG_ARM64_SVE_PREG_BASE | \
KVM_REG_SIZE_U256 | \
(((n) & (KVM_ARM64_SVE_NUM_PREGS - 1)) << 5) | \
((i) & (KVM_ARM64_SVE_MAX_SLICES - 1)))
#define KVM_REG_ARM64_SVE_FFR(i) \
(KVM_REG_ARM64 | KVM_REG_ARM64_SVE | KVM_REG_ARM64_SVE_FFR_BASE | \
KVM_REG_SIZE_U256 | \
((i) & (KVM_ARM64_SVE_MAX_SLICES - 1)))
#define KVM_ARM64_SVE_VQ_MIN __SVE_VQ_MIN
#define KVM_ARM64_SVE_VQ_MAX __SVE_VQ_MAX
/* Vector lengths pseudo-register: */
#define KVM_REG_ARM64_SVE_VLS (KVM_REG_ARM64 | KVM_REG_ARM64_SVE | \
KVM_REG_SIZE_U512 | 0xffff)
#define KVM_ARM64_SVE_VLS_WORDS \
((KVM_ARM64_SVE_VQ_MAX - KVM_ARM64_SVE_VQ_MIN) / 64 + 1)
/* Device Control API: ARM VGIC */
#define KVM_DEV_ARM_VGIC_GRP_ADDR 0
#define KVM_DEV_ARM_VGIC_GRP_DIST_REGS 1

View File

@ -125,9 +125,16 @@ int main(void)
DEFINE(VCPU_CONTEXT, offsetof(struct kvm_vcpu, arch.ctxt));
DEFINE(VCPU_FAULT_DISR, offsetof(struct kvm_vcpu, arch.fault.disr_el1));
DEFINE(VCPU_WORKAROUND_FLAGS, offsetof(struct kvm_vcpu, arch.workaround_flags));
DEFINE(VCPU_HCR_EL2, offsetof(struct kvm_vcpu, arch.hcr_el2));
DEFINE(CPU_GP_REGS, offsetof(struct kvm_cpu_context, gp_regs));
DEFINE(CPU_APIAKEYLO_EL1, offsetof(struct kvm_cpu_context, sys_regs[APIAKEYLO_EL1]));
DEFINE(CPU_APIBKEYLO_EL1, offsetof(struct kvm_cpu_context, sys_regs[APIBKEYLO_EL1]));
DEFINE(CPU_APDAKEYLO_EL1, offsetof(struct kvm_cpu_context, sys_regs[APDAKEYLO_EL1]));
DEFINE(CPU_APDBKEYLO_EL1, offsetof(struct kvm_cpu_context, sys_regs[APDBKEYLO_EL1]));
DEFINE(CPU_APGAKEYLO_EL1, offsetof(struct kvm_cpu_context, sys_regs[APGAKEYLO_EL1]));
DEFINE(CPU_USER_PT_REGS, offsetof(struct kvm_regs, regs));
DEFINE(HOST_CONTEXT_VCPU, offsetof(struct kvm_cpu_context, __hyp_running_vcpu));
DEFINE(HOST_DATA_CONTEXT, offsetof(struct kvm_host_data, host_ctxt));
#endif
#ifdef CONFIG_CPU_PM
DEFINE(CPU_CTX_SP, offsetof(struct cpu_suspend_ctx, sp));

View File

@ -1913,7 +1913,7 @@ static void verify_sve_features(void)
unsigned int len = zcr & ZCR_ELx_LEN_MASK;
if (len < safe_len || sve_verify_vq_map()) {
pr_crit("CPU%d: SVE: required vector length(s) missing\n",
pr_crit("CPU%d: SVE: vector length support mismatch\n",
smp_processor_id());
cpu_die_early();
}

View File

@ -18,6 +18,7 @@
*/
#include <linux/bitmap.h>
#include <linux/bitops.h>
#include <linux/bottom_half.h>
#include <linux/bug.h>
#include <linux/cache.h>
@ -48,6 +49,7 @@
#include <asm/sigcontext.h>
#include <asm/sysreg.h>
#include <asm/traps.h>
#include <asm/virt.h>
#define FPEXC_IOF (1 << 0)
#define FPEXC_DZF (1 << 1)
@ -119,6 +121,8 @@
*/
struct fpsimd_last_state_struct {
struct user_fpsimd_state *st;
void *sve_state;
unsigned int sve_vl;
};
static DEFINE_PER_CPU(struct fpsimd_last_state_struct, fpsimd_last_state);
@ -130,14 +134,23 @@ static int sve_default_vl = -1;
/* Maximum supported vector length across all CPUs (initially poisoned) */
int __ro_after_init sve_max_vl = SVE_VL_MIN;
/* Set of available vector lengths, as vq_to_bit(vq): */
static __ro_after_init DECLARE_BITMAP(sve_vq_map, SVE_VQ_MAX);
int __ro_after_init sve_max_virtualisable_vl = SVE_VL_MIN;
/*
* Set of available vector lengths,
* where length vq encoded as bit __vq_to_bit(vq):
*/
__ro_after_init DECLARE_BITMAP(sve_vq_map, SVE_VQ_MAX);
/* Set of vector lengths present on at least one cpu: */
static __ro_after_init DECLARE_BITMAP(sve_vq_partial_map, SVE_VQ_MAX);
static void __percpu *efi_sve_state;
#else /* ! CONFIG_ARM64_SVE */
/* Dummy declaration for code that will be optimised out: */
extern __ro_after_init DECLARE_BITMAP(sve_vq_map, SVE_VQ_MAX);
extern __ro_after_init DECLARE_BITMAP(sve_vq_partial_map, SVE_VQ_MAX);
extern void __percpu *efi_sve_state;
#endif /* ! CONFIG_ARM64_SVE */
@ -235,14 +248,15 @@ static void task_fpsimd_load(void)
*/
void fpsimd_save(void)
{
struct user_fpsimd_state *st = __this_cpu_read(fpsimd_last_state.st);
struct fpsimd_last_state_struct const *last =
this_cpu_ptr(&fpsimd_last_state);
/* set by fpsimd_bind_task_to_cpu() or fpsimd_bind_state_to_cpu() */
WARN_ON(!in_softirq() && !irqs_disabled());
if (!test_thread_flag(TIF_FOREIGN_FPSTATE)) {
if (system_supports_sve() && test_thread_flag(TIF_SVE)) {
if (WARN_ON(sve_get_vl() != current->thread.sve_vl)) {
if (WARN_ON(sve_get_vl() != last->sve_vl)) {
/*
* Can't save the user regs, so current would
* re-enter user with corrupt state.
@ -252,31 +266,14 @@ void fpsimd_save(void)
return;
}
sve_save_state(sve_pffr(&current->thread), &st->fpsr);
sve_save_state((char *)last->sve_state +
sve_ffr_offset(last->sve_vl),
&last->st->fpsr);
} else
fpsimd_save_state(st);
fpsimd_save_state(last->st);
}
}
/*
* Helpers to translate bit indices in sve_vq_map to VQ values (and
* vice versa). This allows find_next_bit() to be used to find the
* _maximum_ VQ not exceeding a certain value.
*/
static unsigned int vq_to_bit(unsigned int vq)
{
return SVE_VQ_MAX - vq;
}
static unsigned int bit_to_vq(unsigned int bit)
{
if (WARN_ON(bit >= SVE_VQ_MAX))
bit = SVE_VQ_MAX - 1;
return SVE_VQ_MAX - bit;
}
/*
* All vector length selection from userspace comes through here.
* We're on a slow path, so some sanity-checks are included.
@ -298,8 +295,8 @@ static unsigned int find_supported_vector_length(unsigned int vl)
vl = max_vl;
bit = find_next_bit(sve_vq_map, SVE_VQ_MAX,
vq_to_bit(sve_vq_from_vl(vl)));
return sve_vl_from_vq(bit_to_vq(bit));
__vq_to_bit(sve_vq_from_vl(vl)));
return sve_vl_from_vq(__bit_to_vq(bit));
}
#ifdef CONFIG_SYSCTL
@ -550,7 +547,6 @@ int sve_set_vector_length(struct task_struct *task,
local_bh_disable();
fpsimd_save();
set_thread_flag(TIF_FOREIGN_FPSTATE);
}
fpsimd_flush_task_state(task);
@ -624,12 +620,6 @@ int sve_get_current_vl(void)
return sve_prctl_status(0);
}
/*
* Bitmap for temporary storage of the per-CPU set of supported vector lengths
* during secondary boot.
*/
static DECLARE_BITMAP(sve_secondary_vq_map, SVE_VQ_MAX);
static void sve_probe_vqs(DECLARE_BITMAP(map, SVE_VQ_MAX))
{
unsigned int vq, vl;
@ -644,40 +634,82 @@ static void sve_probe_vqs(DECLARE_BITMAP(map, SVE_VQ_MAX))
write_sysreg_s(zcr | (vq - 1), SYS_ZCR_EL1); /* self-syncing */
vl = sve_get_vl();
vq = sve_vq_from_vl(vl); /* skip intervening lengths */
set_bit(vq_to_bit(vq), map);
set_bit(__vq_to_bit(vq), map);
}
}
/*
* Initialise the set of known supported VQs for the boot CPU.
* This is called during kernel boot, before secondary CPUs are brought up.
*/
void __init sve_init_vq_map(void)
{
sve_probe_vqs(sve_vq_map);
bitmap_copy(sve_vq_partial_map, sve_vq_map, SVE_VQ_MAX);
}
/*
* If we haven't committed to the set of supported VQs yet, filter out
* those not supported by the current CPU.
* This function is called during the bring-up of early secondary CPUs only.
*/
void sve_update_vq_map(void)
{
sve_probe_vqs(sve_secondary_vq_map);
bitmap_and(sve_vq_map, sve_vq_map, sve_secondary_vq_map, SVE_VQ_MAX);
DECLARE_BITMAP(tmp_map, SVE_VQ_MAX);
sve_probe_vqs(tmp_map);
bitmap_and(sve_vq_map, sve_vq_map, tmp_map, SVE_VQ_MAX);
bitmap_or(sve_vq_partial_map, sve_vq_partial_map, tmp_map, SVE_VQ_MAX);
}
/* Check whether the current CPU supports all VQs in the committed set */
/*
* Check whether the current CPU supports all VQs in the committed set.
* This function is called during the bring-up of late secondary CPUs only.
*/
int sve_verify_vq_map(void)
{
int ret = 0;
DECLARE_BITMAP(tmp_map, SVE_VQ_MAX);
unsigned long b;
sve_probe_vqs(sve_secondary_vq_map);
bitmap_andnot(sve_secondary_vq_map, sve_vq_map, sve_secondary_vq_map,
SVE_VQ_MAX);
if (!bitmap_empty(sve_secondary_vq_map, SVE_VQ_MAX)) {
sve_probe_vqs(tmp_map);
bitmap_complement(tmp_map, tmp_map, SVE_VQ_MAX);
if (bitmap_intersects(tmp_map, sve_vq_map, SVE_VQ_MAX)) {
pr_warn("SVE: cpu%d: Required vector length(s) missing\n",
smp_processor_id());
ret = -EINVAL;
return -EINVAL;
}
return ret;
if (!IS_ENABLED(CONFIG_KVM) || !is_hyp_mode_available())
return 0;
/*
* For KVM, it is necessary to ensure that this CPU doesn't
* support any vector length that guests may have probed as
* unsupported.
*/
/* Recover the set of supported VQs: */
bitmap_complement(tmp_map, tmp_map, SVE_VQ_MAX);
/* Find VQs supported that are not globally supported: */
bitmap_andnot(tmp_map, tmp_map, sve_vq_map, SVE_VQ_MAX);
/* Find the lowest such VQ, if any: */
b = find_last_bit(tmp_map, SVE_VQ_MAX);
if (b >= SVE_VQ_MAX)
return 0; /* no mismatches */
/*
* Mismatches above sve_max_virtualisable_vl are fine, since
* no guest is allowed to configure ZCR_EL2.LEN to exceed this:
*/
if (sve_vl_from_vq(__bit_to_vq(b)) <= sve_max_virtualisable_vl) {
pr_warn("SVE: cpu%d: Unsupported vector length(s) present\n",
smp_processor_id());
return -EINVAL;
}
return 0;
}
static void __init sve_efi_setup(void)
@ -744,6 +776,8 @@ u64 read_zcr_features(void)
void __init sve_setup(void)
{
u64 zcr;
DECLARE_BITMAP(tmp_map, SVE_VQ_MAX);
unsigned long b;
if (!system_supports_sve())
return;
@ -753,8 +787,8 @@ void __init sve_setup(void)
* so sve_vq_map must have at least SVE_VQ_MIN set.
* If something went wrong, at least try to patch it up:
*/
if (WARN_ON(!test_bit(vq_to_bit(SVE_VQ_MIN), sve_vq_map)))
set_bit(vq_to_bit(SVE_VQ_MIN), sve_vq_map);
if (WARN_ON(!test_bit(__vq_to_bit(SVE_VQ_MIN), sve_vq_map)))
set_bit(__vq_to_bit(SVE_VQ_MIN), sve_vq_map);
zcr = read_sanitised_ftr_reg(SYS_ZCR_EL1);
sve_max_vl = sve_vl_from_vq((zcr & ZCR_ELx_LEN_MASK) + 1);
@ -772,11 +806,31 @@ void __init sve_setup(void)
*/
sve_default_vl = find_supported_vector_length(64);
bitmap_andnot(tmp_map, sve_vq_partial_map, sve_vq_map,
SVE_VQ_MAX);
b = find_last_bit(tmp_map, SVE_VQ_MAX);
if (b >= SVE_VQ_MAX)
/* No non-virtualisable VLs found */
sve_max_virtualisable_vl = SVE_VQ_MAX;
else if (WARN_ON(b == SVE_VQ_MAX - 1))
/* No virtualisable VLs? This is architecturally forbidden. */
sve_max_virtualisable_vl = SVE_VQ_MIN;
else /* b + 1 < SVE_VQ_MAX */
sve_max_virtualisable_vl = sve_vl_from_vq(__bit_to_vq(b + 1));
if (sve_max_virtualisable_vl > sve_max_vl)
sve_max_virtualisable_vl = sve_max_vl;
pr_info("SVE: maximum available vector length %u bytes per vector\n",
sve_max_vl);
pr_info("SVE: default vector length %u bytes per vector\n",
sve_default_vl);
/* KVM decides whether to support mismatched systems. Just warn here: */
if (sve_max_virtualisable_vl < sve_max_vl)
pr_warn("SVE: unvirtualisable vector lengths present\n");
sve_efi_setup();
}
@ -816,12 +870,11 @@ asmlinkage void do_sve_acc(unsigned int esr, struct pt_regs *regs)
local_bh_disable();
fpsimd_save();
fpsimd_to_sve(current);
/* Force ret_to_user to reload the registers: */
fpsimd_flush_task_state(current);
set_thread_flag(TIF_FOREIGN_FPSTATE);
fpsimd_to_sve(current);
if (test_and_set_thread_flag(TIF_SVE))
WARN_ON(1); /* SVE access shouldn't have trapped */
@ -894,9 +947,9 @@ void fpsimd_flush_thread(void)
local_bh_disable();
fpsimd_flush_task_state(current);
memset(&current->thread.uw.fpsimd_state, 0,
sizeof(current->thread.uw.fpsimd_state));
fpsimd_flush_task_state(current);
if (system_supports_sve()) {
clear_thread_flag(TIF_SVE);
@ -933,8 +986,6 @@ void fpsimd_flush_thread(void)
current->thread.sve_vl_onexec = 0;
}
set_thread_flag(TIF_FOREIGN_FPSTATE);
local_bh_enable();
}
@ -974,6 +1025,8 @@ void fpsimd_bind_task_to_cpu(void)
this_cpu_ptr(&fpsimd_last_state);
last->st = &current->thread.uw.fpsimd_state;
last->sve_state = current->thread.sve_state;
last->sve_vl = current->thread.sve_vl;
current->thread.fpsimd_cpu = smp_processor_id();
if (system_supports_sve()) {
@ -987,7 +1040,8 @@ void fpsimd_bind_task_to_cpu(void)
}
}
void fpsimd_bind_state_to_cpu(struct user_fpsimd_state *st)
void fpsimd_bind_state_to_cpu(struct user_fpsimd_state *st, void *sve_state,
unsigned int sve_vl)
{
struct fpsimd_last_state_struct *last =
this_cpu_ptr(&fpsimd_last_state);
@ -995,6 +1049,8 @@ void fpsimd_bind_state_to_cpu(struct user_fpsimd_state *st)
WARN_ON(!in_softirq() && !irqs_disabled());
last->st = st;
last->sve_state = sve_state;
last->sve_vl = sve_vl;
}
/*
@ -1043,12 +1099,29 @@ void fpsimd_update_current_state(struct user_fpsimd_state const *state)
/*
* Invalidate live CPU copies of task t's FPSIMD state
*
* This function may be called with preemption enabled. The barrier()
* ensures that the assignment to fpsimd_cpu is visible to any
* preemption/softirq that could race with set_tsk_thread_flag(), so
* that TIF_FOREIGN_FPSTATE cannot be spuriously re-cleared.
*
* The final barrier ensures that TIF_FOREIGN_FPSTATE is seen set by any
* subsequent code.
*/
void fpsimd_flush_task_state(struct task_struct *t)
{
t->thread.fpsimd_cpu = NR_CPUS;
barrier();
set_tsk_thread_flag(t, TIF_FOREIGN_FPSTATE);
barrier();
}
/*
* Invalidate any task's FPSIMD state that is present on this cpu.
* This function must be called with softirqs disabled.
*/
void fpsimd_flush_cpu_state(void)
{
__this_cpu_write(fpsimd_last_state.st, NULL);

View File

@ -26,6 +26,7 @@
#include <linux/acpi.h>
#include <linux/clocksource.h>
#include <linux/kvm_host.h>
#include <linux/of.h>
#include <linux/perf/arm_pmu.h>
#include <linux/platform_device.h>
@ -528,12 +529,21 @@ static inline int armv8pmu_enable_counter(int idx)
static inline void armv8pmu_enable_event_counter(struct perf_event *event)
{
struct perf_event_attr *attr = &event->attr;
int idx = event->hw.idx;
u32 counter_bits = BIT(ARMV8_IDX_TO_COUNTER(idx));
armv8pmu_enable_counter(idx);
if (armv8pmu_event_is_chained(event))
armv8pmu_enable_counter(idx - 1);
isb();
counter_bits |= BIT(ARMV8_IDX_TO_COUNTER(idx - 1));
kvm_set_pmu_events(counter_bits, attr);
/* We rely on the hypervisor switch code to enable guest counters */
if (!kvm_pmu_counter_deferred(attr)) {
armv8pmu_enable_counter(idx);
if (armv8pmu_event_is_chained(event))
armv8pmu_enable_counter(idx - 1);
}
}
static inline int armv8pmu_disable_counter(int idx)
@ -546,11 +556,21 @@ static inline int armv8pmu_disable_counter(int idx)
static inline void armv8pmu_disable_event_counter(struct perf_event *event)
{
struct hw_perf_event *hwc = &event->hw;
struct perf_event_attr *attr = &event->attr;
int idx = hwc->idx;
u32 counter_bits = BIT(ARMV8_IDX_TO_COUNTER(idx));
if (armv8pmu_event_is_chained(event))
armv8pmu_disable_counter(idx - 1);
armv8pmu_disable_counter(idx);
counter_bits |= BIT(ARMV8_IDX_TO_COUNTER(idx - 1));
kvm_clr_pmu_events(counter_bits);
/* We rely on the hypervisor switch code to disable guest counters */
if (!kvm_pmu_counter_deferred(attr)) {
if (armv8pmu_event_is_chained(event))
armv8pmu_disable_counter(idx - 1);
armv8pmu_disable_counter(idx);
}
}
static inline int armv8pmu_enable_intens(int idx)
@ -827,14 +847,23 @@ static int armv8pmu_set_event_filter(struct hw_perf_event *event,
* with other architectures (x86 and Power).
*/
if (is_kernel_in_hyp_mode()) {
if (!attr->exclude_kernel)
if (!attr->exclude_kernel && !attr->exclude_host)
config_base |= ARMV8_PMU_INCLUDE_EL2;
} else {
if (attr->exclude_kernel)
if (attr->exclude_guest)
config_base |= ARMV8_PMU_EXCLUDE_EL1;
if (!attr->exclude_hv)
if (attr->exclude_host)
config_base |= ARMV8_PMU_EXCLUDE_EL0;
} else {
if (!attr->exclude_hv && !attr->exclude_host)
config_base |= ARMV8_PMU_INCLUDE_EL2;
}
/*
* Filter out !VHE kernels and guest kernels
*/
if (attr->exclude_kernel)
config_base |= ARMV8_PMU_EXCLUDE_EL1;
if (attr->exclude_user)
config_base |= ARMV8_PMU_EXCLUDE_EL0;
@ -864,6 +893,9 @@ static void armv8pmu_reset(void *info)
armv8pmu_disable_intens(idx);
}
/* Clear the counters we flip at guest entry/exit */
kvm_clr_pmu_events(U32_MAX);
/*
* Initialize & Reset PMNC. Request overflow interrupt for
* 64 bit cycle counter but cheat in armv8pmu_write_counter().

View File

@ -296,11 +296,6 @@ static int restore_sve_fpsimd_context(struct user_ctxs *user)
*/
fpsimd_flush_task_state(current);
barrier();
/* From now, fpsimd_thread_switch() won't clear TIF_FOREIGN_FPSTATE */
set_thread_flag(TIF_FOREIGN_FPSTATE);
barrier();
/* From now, fpsimd_thread_switch() won't touch thread.sve_state */
sve_alloc(current);

View File

@ -17,7 +17,7 @@ kvm-$(CONFIG_KVM_ARM_HOST) += $(KVM)/arm/psci.o $(KVM)/arm/perf.o
kvm-$(CONFIG_KVM_ARM_HOST) += inject_fault.o regmap.o va_layout.o
kvm-$(CONFIG_KVM_ARM_HOST) += hyp.o hyp-init.o handle_exit.o
kvm-$(CONFIG_KVM_ARM_HOST) += guest.o debug.o reset.o sys_regs.o sys_regs_generic_v8.o
kvm-$(CONFIG_KVM_ARM_HOST) += vgic-sys-reg-v3.o fpsimd.o
kvm-$(CONFIG_KVM_ARM_HOST) += vgic-sys-reg-v3.o fpsimd.o pmu.o
kvm-$(CONFIG_KVM_ARM_HOST) += $(KVM)/arm/aarch32.o
kvm-$(CONFIG_KVM_ARM_HOST) += $(KVM)/arm/vgic/vgic.o

View File

@ -9,6 +9,7 @@
#include <linux/sched.h>
#include <linux/thread_info.h>
#include <linux/kvm_host.h>
#include <asm/fpsimd.h>
#include <asm/kvm_asm.h>
#include <asm/kvm_host.h>
#include <asm/kvm_mmu.h>
@ -85,9 +86,12 @@ void kvm_arch_vcpu_ctxsync_fp(struct kvm_vcpu *vcpu)
WARN_ON_ONCE(!irqs_disabled());
if (vcpu->arch.flags & KVM_ARM64_FP_ENABLED) {
fpsimd_bind_state_to_cpu(&vcpu->arch.ctxt.gp_regs.fp_regs);
fpsimd_bind_state_to_cpu(&vcpu->arch.ctxt.gp_regs.fp_regs,
vcpu->arch.sve_state,
vcpu->arch.sve_max_vl);
clear_thread_flag(TIF_FOREIGN_FPSTATE);
clear_thread_flag(TIF_SVE);
update_thread_flag(TIF_SVE, vcpu_has_sve(vcpu));
}
}
@ -100,14 +104,21 @@ void kvm_arch_vcpu_ctxsync_fp(struct kvm_vcpu *vcpu)
void kvm_arch_vcpu_put_fp(struct kvm_vcpu *vcpu)
{
unsigned long flags;
bool host_has_sve = system_supports_sve();
bool guest_has_sve = vcpu_has_sve(vcpu);
local_irq_save(flags);
if (vcpu->arch.flags & KVM_ARM64_FP_ENABLED) {
u64 *guest_zcr = &vcpu->arch.ctxt.sys_regs[ZCR_EL1];
/* Clean guest FP state to memory and invalidate cpu view */
fpsimd_save();
fpsimd_flush_cpu_state();
} else if (system_supports_sve()) {
if (guest_has_sve)
*guest_zcr = read_sysreg_s(SYS_ZCR_EL12);
} else if (host_has_sve) {
/*
* The FPSIMD/SVE state in the CPU has not been touched, and we
* have SVE (and VHE): CPACR_EL1 (alias CPTR_EL2) has been

View File

@ -19,18 +19,25 @@
* along with this program. If not, see <http://www.gnu.org/licenses/>.
*/
#include <linux/bits.h>
#include <linux/errno.h>
#include <linux/err.h>
#include <linux/nospec.h>
#include <linux/kvm_host.h>
#include <linux/module.h>
#include <linux/stddef.h>
#include <linux/string.h>
#include <linux/vmalloc.h>
#include <linux/fs.h>
#include <kvm/arm_psci.h>
#include <asm/cputype.h>
#include <linux/uaccess.h>
#include <asm/fpsimd.h>
#include <asm/kvm.h>
#include <asm/kvm_emulate.h>
#include <asm/kvm_coproc.h>
#include <asm/kvm_host.h>
#include <asm/sigcontext.h>
#include "trace.h"
@ -52,12 +59,19 @@ int kvm_arch_vcpu_setup(struct kvm_vcpu *vcpu)
return 0;
}
static bool core_reg_offset_is_vreg(u64 off)
{
return off >= KVM_REG_ARM_CORE_REG(fp_regs.vregs) &&
off < KVM_REG_ARM_CORE_REG(fp_regs.fpsr);
}
static u64 core_reg_offset_from_id(u64 id)
{
return id & ~(KVM_REG_ARCH_MASK | KVM_REG_SIZE_MASK | KVM_REG_ARM_CORE);
}
static int validate_core_offset(const struct kvm_one_reg *reg)
static int validate_core_offset(const struct kvm_vcpu *vcpu,
const struct kvm_one_reg *reg)
{
u64 off = core_reg_offset_from_id(reg->id);
int size;
@ -89,11 +103,19 @@ static int validate_core_offset(const struct kvm_one_reg *reg)
return -EINVAL;
}
if (KVM_REG_SIZE(reg->id) == size &&
IS_ALIGNED(off, size / sizeof(__u32)))
return 0;
if (KVM_REG_SIZE(reg->id) != size ||
!IS_ALIGNED(off, size / sizeof(__u32)))
return -EINVAL;
return -EINVAL;
/*
* The KVM_REG_ARM64_SVE regs must be used instead of
* KVM_REG_ARM_CORE for accessing the FPSIMD V-registers on
* SVE-enabled vcpus:
*/
if (vcpu_has_sve(vcpu) && core_reg_offset_is_vreg(off))
return -EINVAL;
return 0;
}
static int get_core_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg)
@ -115,7 +137,7 @@ static int get_core_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg)
(off + (KVM_REG_SIZE(reg->id) / sizeof(__u32))) >= nr_regs)
return -ENOENT;
if (validate_core_offset(reg))
if (validate_core_offset(vcpu, reg))
return -EINVAL;
if (copy_to_user(uaddr, ((u32 *)regs) + off, KVM_REG_SIZE(reg->id)))
@ -140,7 +162,7 @@ static int set_core_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg)
(off + (KVM_REG_SIZE(reg->id) / sizeof(__u32))) >= nr_regs)
return -ENOENT;
if (validate_core_offset(reg))
if (validate_core_offset(vcpu, reg))
return -EINVAL;
if (KVM_REG_SIZE(reg->id) > sizeof(tmp))
@ -183,6 +205,239 @@ out:
return err;
}
#define vq_word(vq) (((vq) - SVE_VQ_MIN) / 64)
#define vq_mask(vq) ((u64)1 << ((vq) - SVE_VQ_MIN) % 64)
static bool vq_present(
const u64 (*const vqs)[KVM_ARM64_SVE_VLS_WORDS],
unsigned int vq)
{
return (*vqs)[vq_word(vq)] & vq_mask(vq);
}
static int get_sve_vls(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg)
{
unsigned int max_vq, vq;
u64 vqs[KVM_ARM64_SVE_VLS_WORDS];
if (!vcpu_has_sve(vcpu))
return -ENOENT;
if (WARN_ON(!sve_vl_valid(vcpu->arch.sve_max_vl)))
return -EINVAL;
memset(vqs, 0, sizeof(vqs));
max_vq = sve_vq_from_vl(vcpu->arch.sve_max_vl);
for (vq = SVE_VQ_MIN; vq <= max_vq; ++vq)
if (sve_vq_available(vq))
vqs[vq_word(vq)] |= vq_mask(vq);
if (copy_to_user((void __user *)reg->addr, vqs, sizeof(vqs)))
return -EFAULT;
return 0;
}
static int set_sve_vls(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg)
{
unsigned int max_vq, vq;
u64 vqs[KVM_ARM64_SVE_VLS_WORDS];
if (!vcpu_has_sve(vcpu))
return -ENOENT;
if (kvm_arm_vcpu_sve_finalized(vcpu))
return -EPERM; /* too late! */
if (WARN_ON(vcpu->arch.sve_state))
return -EINVAL;
if (copy_from_user(vqs, (const void __user *)reg->addr, sizeof(vqs)))
return -EFAULT;
max_vq = 0;
for (vq = SVE_VQ_MIN; vq <= SVE_VQ_MAX; ++vq)
if (vq_present(&vqs, vq))
max_vq = vq;
if (max_vq > sve_vq_from_vl(kvm_sve_max_vl))
return -EINVAL;
/*
* Vector lengths supported by the host can't currently be
* hidden from the guest individually: instead we can only set a
* maxmium via ZCR_EL2.LEN. So, make sure the available vector
* lengths match the set requested exactly up to the requested
* maximum:
*/
for (vq = SVE_VQ_MIN; vq <= max_vq; ++vq)
if (vq_present(&vqs, vq) != sve_vq_available(vq))
return -EINVAL;
/* Can't run with no vector lengths at all: */
if (max_vq < SVE_VQ_MIN)
return -EINVAL;
/* vcpu->arch.sve_state will be alloc'd by kvm_vcpu_finalize_sve() */
vcpu->arch.sve_max_vl = sve_vl_from_vq(max_vq);
return 0;
}
#define SVE_REG_SLICE_SHIFT 0
#define SVE_REG_SLICE_BITS 5
#define SVE_REG_ID_SHIFT (SVE_REG_SLICE_SHIFT + SVE_REG_SLICE_BITS)
#define SVE_REG_ID_BITS 5
#define SVE_REG_SLICE_MASK \
GENMASK(SVE_REG_SLICE_SHIFT + SVE_REG_SLICE_BITS - 1, \
SVE_REG_SLICE_SHIFT)
#define SVE_REG_ID_MASK \
GENMASK(SVE_REG_ID_SHIFT + SVE_REG_ID_BITS - 1, SVE_REG_ID_SHIFT)
#define SVE_NUM_SLICES (1 << SVE_REG_SLICE_BITS)
#define KVM_SVE_ZREG_SIZE KVM_REG_SIZE(KVM_REG_ARM64_SVE_ZREG(0, 0))
#define KVM_SVE_PREG_SIZE KVM_REG_SIZE(KVM_REG_ARM64_SVE_PREG(0, 0))
/*
* Number of register slices required to cover each whole SVE register.
* NOTE: Only the first slice every exists, for now.
* If you are tempted to modify this, you must also rework sve_reg_to_region()
* to match:
*/
#define vcpu_sve_slices(vcpu) 1
/* Bounds of a single SVE register slice within vcpu->arch.sve_state */
struct sve_state_reg_region {
unsigned int koffset; /* offset into sve_state in kernel memory */
unsigned int klen; /* length in kernel memory */
unsigned int upad; /* extra trailing padding in user memory */
};
/*
* Validate SVE register ID and get sanitised bounds for user/kernel SVE
* register copy
*/
static int sve_reg_to_region(struct sve_state_reg_region *region,
struct kvm_vcpu *vcpu,
const struct kvm_one_reg *reg)
{
/* reg ID ranges for Z- registers */
const u64 zreg_id_min = KVM_REG_ARM64_SVE_ZREG(0, 0);
const u64 zreg_id_max = KVM_REG_ARM64_SVE_ZREG(SVE_NUM_ZREGS - 1,
SVE_NUM_SLICES - 1);
/* reg ID ranges for P- registers and FFR (which are contiguous) */
const u64 preg_id_min = KVM_REG_ARM64_SVE_PREG(0, 0);
const u64 preg_id_max = KVM_REG_ARM64_SVE_FFR(SVE_NUM_SLICES - 1);
unsigned int vq;
unsigned int reg_num;
unsigned int reqoffset, reqlen; /* User-requested offset and length */
unsigned int maxlen; /* Maxmimum permitted length */
size_t sve_state_size;
const u64 last_preg_id = KVM_REG_ARM64_SVE_PREG(SVE_NUM_PREGS - 1,
SVE_NUM_SLICES - 1);
/* Verify that the P-regs and FFR really do have contiguous IDs: */
BUILD_BUG_ON(KVM_REG_ARM64_SVE_FFR(0) != last_preg_id + 1);
/* Verify that we match the UAPI header: */
BUILD_BUG_ON(SVE_NUM_SLICES != KVM_ARM64_SVE_MAX_SLICES);
reg_num = (reg->id & SVE_REG_ID_MASK) >> SVE_REG_ID_SHIFT;
if (reg->id >= zreg_id_min && reg->id <= zreg_id_max) {
if (!vcpu_has_sve(vcpu) || (reg->id & SVE_REG_SLICE_MASK) > 0)
return -ENOENT;
vq = sve_vq_from_vl(vcpu->arch.sve_max_vl);
reqoffset = SVE_SIG_ZREG_OFFSET(vq, reg_num) -
SVE_SIG_REGS_OFFSET;
reqlen = KVM_SVE_ZREG_SIZE;
maxlen = SVE_SIG_ZREG_SIZE(vq);
} else if (reg->id >= preg_id_min && reg->id <= preg_id_max) {
if (!vcpu_has_sve(vcpu) || (reg->id & SVE_REG_SLICE_MASK) > 0)
return -ENOENT;
vq = sve_vq_from_vl(vcpu->arch.sve_max_vl);
reqoffset = SVE_SIG_PREG_OFFSET(vq, reg_num) -
SVE_SIG_REGS_OFFSET;
reqlen = KVM_SVE_PREG_SIZE;
maxlen = SVE_SIG_PREG_SIZE(vq);
} else {
return -EINVAL;
}
sve_state_size = vcpu_sve_state_size(vcpu);
if (WARN_ON(!sve_state_size))
return -EINVAL;
region->koffset = array_index_nospec(reqoffset, sve_state_size);
region->klen = min(maxlen, reqlen);
region->upad = reqlen - region->klen;
return 0;
}
static int get_sve_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg)
{
int ret;
struct sve_state_reg_region region;
char __user *uptr = (char __user *)reg->addr;
/* Handle the KVM_REG_ARM64_SVE_VLS pseudo-reg as a special case: */
if (reg->id == KVM_REG_ARM64_SVE_VLS)
return get_sve_vls(vcpu, reg);
/* Try to interpret reg ID as an architectural SVE register... */
ret = sve_reg_to_region(&region, vcpu, reg);
if (ret)
return ret;
if (!kvm_arm_vcpu_sve_finalized(vcpu))
return -EPERM;
if (copy_to_user(uptr, vcpu->arch.sve_state + region.koffset,
region.klen) ||
clear_user(uptr + region.klen, region.upad))
return -EFAULT;
return 0;
}
static int set_sve_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg)
{
int ret;
struct sve_state_reg_region region;
const char __user *uptr = (const char __user *)reg->addr;
/* Handle the KVM_REG_ARM64_SVE_VLS pseudo-reg as a special case: */
if (reg->id == KVM_REG_ARM64_SVE_VLS)
return set_sve_vls(vcpu, reg);
/* Try to interpret reg ID as an architectural SVE register... */
ret = sve_reg_to_region(&region, vcpu, reg);
if (ret)
return ret;
if (!kvm_arm_vcpu_sve_finalized(vcpu))
return -EPERM;
if (copy_from_user(vcpu->arch.sve_state + region.koffset, uptr,
region.klen))
return -EFAULT;
return 0;
}
int kvm_arch_vcpu_ioctl_get_regs(struct kvm_vcpu *vcpu, struct kvm_regs *regs)
{
return -EINVAL;
@ -193,9 +448,37 @@ int kvm_arch_vcpu_ioctl_set_regs(struct kvm_vcpu *vcpu, struct kvm_regs *regs)
return -EINVAL;
}
static unsigned long num_core_regs(void)
static int copy_core_reg_indices(const struct kvm_vcpu *vcpu,
u64 __user *uindices)
{
return sizeof(struct kvm_regs) / sizeof(__u32);
unsigned int i;
int n = 0;
const u64 core_reg = KVM_REG_ARM64 | KVM_REG_SIZE_U64 | KVM_REG_ARM_CORE;
for (i = 0; i < sizeof(struct kvm_regs) / sizeof(__u32); i++) {
/*
* The KVM_REG_ARM64_SVE regs must be used instead of
* KVM_REG_ARM_CORE for accessing the FPSIMD V-registers on
* SVE-enabled vcpus:
*/
if (vcpu_has_sve(vcpu) && core_reg_offset_is_vreg(i))
continue;
if (uindices) {
if (put_user(core_reg | i, uindices))
return -EFAULT;
uindices++;
}
n++;
}
return n;
}
static unsigned long num_core_regs(const struct kvm_vcpu *vcpu)
{
return copy_core_reg_indices(vcpu, NULL);
}
/**
@ -251,6 +534,67 @@ static int get_timer_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg)
return copy_to_user(uaddr, &val, KVM_REG_SIZE(reg->id)) ? -EFAULT : 0;
}
static unsigned long num_sve_regs(const struct kvm_vcpu *vcpu)
{
const unsigned int slices = vcpu_sve_slices(vcpu);
if (!vcpu_has_sve(vcpu))
return 0;
/* Policed by KVM_GET_REG_LIST: */
WARN_ON(!kvm_arm_vcpu_sve_finalized(vcpu));
return slices * (SVE_NUM_PREGS + SVE_NUM_ZREGS + 1 /* FFR */)
+ 1; /* KVM_REG_ARM64_SVE_VLS */
}
static int copy_sve_reg_indices(const struct kvm_vcpu *vcpu,
u64 __user *uindices)
{
const unsigned int slices = vcpu_sve_slices(vcpu);
u64 reg;
unsigned int i, n;
int num_regs = 0;
if (!vcpu_has_sve(vcpu))
return 0;
/* Policed by KVM_GET_REG_LIST: */
WARN_ON(!kvm_arm_vcpu_sve_finalized(vcpu));
/*
* Enumerate this first, so that userspace can save/restore in
* the order reported by KVM_GET_REG_LIST:
*/
reg = KVM_REG_ARM64_SVE_VLS;
if (put_user(reg, uindices++))
return -EFAULT;
++num_regs;
for (i = 0; i < slices; i++) {
for (n = 0; n < SVE_NUM_ZREGS; n++) {
reg = KVM_REG_ARM64_SVE_ZREG(n, i);
if (put_user(reg, uindices++))
return -EFAULT;
num_regs++;
}
for (n = 0; n < SVE_NUM_PREGS; n++) {
reg = KVM_REG_ARM64_SVE_PREG(n, i);
if (put_user(reg, uindices++))
return -EFAULT;
num_regs++;
}
reg = KVM_REG_ARM64_SVE_FFR(i);
if (put_user(reg, uindices++))
return -EFAULT;
num_regs++;
}
return num_regs;
}
/**
* kvm_arm_num_regs - how many registers do we present via KVM_GET_ONE_REG
*
@ -258,8 +602,15 @@ static int get_timer_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg)
*/
unsigned long kvm_arm_num_regs(struct kvm_vcpu *vcpu)
{
return num_core_regs() + kvm_arm_num_sys_reg_descs(vcpu)
+ kvm_arm_get_fw_num_regs(vcpu) + NUM_TIMER_REGS;
unsigned long res = 0;
res += num_core_regs(vcpu);
res += num_sve_regs(vcpu);
res += kvm_arm_num_sys_reg_descs(vcpu);
res += kvm_arm_get_fw_num_regs(vcpu);
res += NUM_TIMER_REGS;
return res;
}
/**
@ -269,23 +620,25 @@ unsigned long kvm_arm_num_regs(struct kvm_vcpu *vcpu)
*/
int kvm_arm_copy_reg_indices(struct kvm_vcpu *vcpu, u64 __user *uindices)
{
unsigned int i;
const u64 core_reg = KVM_REG_ARM64 | KVM_REG_SIZE_U64 | KVM_REG_ARM_CORE;
int ret;
for (i = 0; i < sizeof(struct kvm_regs) / sizeof(__u32); i++) {
if (put_user(core_reg | i, uindices))
return -EFAULT;
uindices++;
}
ret = copy_core_reg_indices(vcpu, uindices);
if (ret < 0)
return ret;
uindices += ret;
ret = copy_sve_reg_indices(vcpu, uindices);
if (ret < 0)
return ret;
uindices += ret;
ret = kvm_arm_copy_fw_reg_indices(vcpu, uindices);
if (ret)
if (ret < 0)
return ret;
uindices += kvm_arm_get_fw_num_regs(vcpu);
ret = copy_timer_indices(vcpu, uindices);
if (ret)
if (ret < 0)
return ret;
uindices += NUM_TIMER_REGS;
@ -298,12 +651,11 @@ int kvm_arm_get_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg)
if ((reg->id & ~KVM_REG_SIZE_MASK) >> 32 != KVM_REG_ARM64 >> 32)
return -EINVAL;
/* Register group 16 means we want a core register. */
if ((reg->id & KVM_REG_ARM_COPROC_MASK) == KVM_REG_ARM_CORE)
return get_core_reg(vcpu, reg);
if ((reg->id & KVM_REG_ARM_COPROC_MASK) == KVM_REG_ARM_FW)
return kvm_arm_get_fw_reg(vcpu, reg);
switch (reg->id & KVM_REG_ARM_COPROC_MASK) {
case KVM_REG_ARM_CORE: return get_core_reg(vcpu, reg);
case KVM_REG_ARM_FW: return kvm_arm_get_fw_reg(vcpu, reg);
case KVM_REG_ARM64_SVE: return get_sve_reg(vcpu, reg);
}
if (is_timer_reg(reg->id))
return get_timer_reg(vcpu, reg);
@ -317,12 +669,11 @@ int kvm_arm_set_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg)
if ((reg->id & ~KVM_REG_SIZE_MASK) >> 32 != KVM_REG_ARM64 >> 32)
return -EINVAL;
/* Register group 16 means we set a core register. */
if ((reg->id & KVM_REG_ARM_COPROC_MASK) == KVM_REG_ARM_CORE)
return set_core_reg(vcpu, reg);
if ((reg->id & KVM_REG_ARM_COPROC_MASK) == KVM_REG_ARM_FW)
return kvm_arm_set_fw_reg(vcpu, reg);
switch (reg->id & KVM_REG_ARM_COPROC_MASK) {
case KVM_REG_ARM_CORE: return set_core_reg(vcpu, reg);
case KVM_REG_ARM_FW: return kvm_arm_set_fw_reg(vcpu, reg);
case KVM_REG_ARM64_SVE: return set_sve_reg(vcpu, reg);
}
if (is_timer_reg(reg->id))
return set_timer_reg(vcpu, reg);

View File

@ -173,20 +173,40 @@ static int handle_sve(struct kvm_vcpu *vcpu, struct kvm_run *run)
return 1;
}
#define __ptrauth_save_key(regs, key) \
({ \
regs[key ## KEYLO_EL1] = read_sysreg_s(SYS_ ## key ## KEYLO_EL1); \
regs[key ## KEYHI_EL1] = read_sysreg_s(SYS_ ## key ## KEYHI_EL1); \
})
/*
* Handle the guest trying to use a ptrauth instruction, or trying to access a
* ptrauth register.
*/
void kvm_arm_vcpu_ptrauth_trap(struct kvm_vcpu *vcpu)
{
struct kvm_cpu_context *ctxt;
if (vcpu_has_ptrauth(vcpu)) {
vcpu_ptrauth_enable(vcpu);
ctxt = vcpu->arch.host_cpu_context;
__ptrauth_save_key(ctxt->sys_regs, APIA);
__ptrauth_save_key(ctxt->sys_regs, APIB);
__ptrauth_save_key(ctxt->sys_regs, APDA);
__ptrauth_save_key(ctxt->sys_regs, APDB);
__ptrauth_save_key(ctxt->sys_regs, APGA);
} else {
kvm_inject_undefined(vcpu);
}
}
/*
* Guest usage of a ptrauth instruction (which the guest EL1 did not turn into
* a NOP).
*/
static int kvm_handle_ptrauth(struct kvm_vcpu *vcpu, struct kvm_run *run)
{
/*
* We don't currently support ptrauth in a guest, and we mask the ID
* registers to prevent well-behaved guests from trying to make use of
* it.
*
* Inject an UNDEF, as if the feature really isn't present.
*/
kvm_inject_undefined(vcpu);
kvm_arm_vcpu_ptrauth_trap(vcpu);
return 1;
}

View File

@ -24,6 +24,7 @@
#include <asm/kvm_arm.h>
#include <asm/kvm_asm.h>
#include <asm/kvm_mmu.h>
#include <asm/kvm_ptrauth.h>
#define CPU_GP_REG_OFFSET(x) (CPU_GP_REGS + x)
#define CPU_XREG_OFFSET(x) CPU_GP_REG_OFFSET(CPU_USER_PT_REGS + 8*x)
@ -64,6 +65,13 @@ ENTRY(__guest_enter)
add x18, x0, #VCPU_CONTEXT
// Macro ptrauth_switch_to_guest format:
// ptrauth_switch_to_guest(guest cxt, tmp1, tmp2, tmp3)
// The below macro to restore guest keys is not implemented in C code
// as it may cause Pointer Authentication key signing mismatch errors
// when this feature is enabled for kernel code.
ptrauth_switch_to_guest x18, x0, x1, x2
// Restore guest regs x0-x17
ldp x0, x1, [x18, #CPU_XREG_OFFSET(0)]
ldp x2, x3, [x18, #CPU_XREG_OFFSET(2)]
@ -118,6 +126,13 @@ ENTRY(__guest_exit)
get_host_ctxt x2, x3
// Macro ptrauth_switch_to_guest format:
// ptrauth_switch_to_host(guest cxt, host cxt, tmp1, tmp2, tmp3)
// The below macro to save/restore keys is not implemented in C code
// as it may cause Pointer Authentication key signing mismatch errors
// when this feature is enabled for kernel code.
ptrauth_switch_to_host x1, x2, x3, x4, x5
// Now restore the host regs
restore_callee_saved_regs x2

View File

@ -100,7 +100,10 @@ static void activate_traps_vhe(struct kvm_vcpu *vcpu)
val = read_sysreg(cpacr_el1);
val |= CPACR_EL1_TTA;
val &= ~CPACR_EL1_ZEN;
if (!update_fp_enabled(vcpu)) {
if (update_fp_enabled(vcpu)) {
if (vcpu_has_sve(vcpu))
val |= CPACR_EL1_ZEN;
} else {
val &= ~CPACR_EL1_FPEN;
__activate_traps_fpsimd32(vcpu);
}
@ -317,16 +320,48 @@ static bool __hyp_text __populate_fault_info(struct kvm_vcpu *vcpu)
return true;
}
static bool __hyp_text __hyp_switch_fpsimd(struct kvm_vcpu *vcpu)
/* Check for an FPSIMD/SVE trap and handle as appropriate */
static bool __hyp_text __hyp_handle_fpsimd(struct kvm_vcpu *vcpu)
{
struct user_fpsimd_state *host_fpsimd = vcpu->arch.host_fpsimd_state;
bool vhe, sve_guest, sve_host;
u8 hsr_ec;
if (has_vhe())
write_sysreg(read_sysreg(cpacr_el1) | CPACR_EL1_FPEN,
cpacr_el1);
else
if (!system_supports_fpsimd())
return false;
if (system_supports_sve()) {
sve_guest = vcpu_has_sve(vcpu);
sve_host = vcpu->arch.flags & KVM_ARM64_HOST_SVE_IN_USE;
vhe = true;
} else {
sve_guest = false;
sve_host = false;
vhe = has_vhe();
}
hsr_ec = kvm_vcpu_trap_get_class(vcpu);
if (hsr_ec != ESR_ELx_EC_FP_ASIMD &&
hsr_ec != ESR_ELx_EC_SVE)
return false;
/* Don't handle SVE traps for non-SVE vcpus here: */
if (!sve_guest)
if (hsr_ec != ESR_ELx_EC_FP_ASIMD)
return false;
/* Valid trap. Switch the context: */
if (vhe) {
u64 reg = read_sysreg(cpacr_el1) | CPACR_EL1_FPEN;
if (sve_guest)
reg |= CPACR_EL1_ZEN;
write_sysreg(reg, cpacr_el1);
} else {
write_sysreg(read_sysreg(cptr_el2) & ~(u64)CPTR_EL2_TFP,
cptr_el2);
}
isb();
@ -335,21 +370,28 @@ static bool __hyp_text __hyp_switch_fpsimd(struct kvm_vcpu *vcpu)
* In the SVE case, VHE is assumed: it is enforced by
* Kconfig and kvm_arch_init().
*/
if (system_supports_sve() &&
(vcpu->arch.flags & KVM_ARM64_HOST_SVE_IN_USE)) {
if (sve_host) {
struct thread_struct *thread = container_of(
host_fpsimd,
vcpu->arch.host_fpsimd_state,
struct thread_struct, uw.fpsimd_state);
sve_save_state(sve_pffr(thread), &host_fpsimd->fpsr);
sve_save_state(sve_pffr(thread),
&vcpu->arch.host_fpsimd_state->fpsr);
} else {
__fpsimd_save_state(host_fpsimd);
__fpsimd_save_state(vcpu->arch.host_fpsimd_state);
}
vcpu->arch.flags &= ~KVM_ARM64_FP_HOST;
}
__fpsimd_restore_state(&vcpu->arch.ctxt.gp_regs.fp_regs);
if (sve_guest) {
sve_load_state(vcpu_sve_pffr(vcpu),
&vcpu->arch.ctxt.gp_regs.fp_regs.fpsr,
sve_vq_from_vl(vcpu->arch.sve_max_vl) - 1);
write_sysreg_s(vcpu->arch.ctxt.sys_regs[ZCR_EL1], SYS_ZCR_EL12);
} else {
__fpsimd_restore_state(&vcpu->arch.ctxt.gp_regs.fp_regs);
}
/* Skip restoring fpexc32 for AArch64 guests */
if (!(read_sysreg(hcr_el2) & HCR_RW))
@ -385,10 +427,10 @@ static bool __hyp_text fixup_guest_exit(struct kvm_vcpu *vcpu, u64 *exit_code)
* and restore the guest context lazily.
* If FP/SIMD is not implemented, handle the trap and inject an
* undefined instruction exception to the guest.
* Similarly for trapped SVE accesses.
*/
if (system_supports_fpsimd() &&
kvm_vcpu_trap_get_class(vcpu) == ESR_ELx_EC_FP_ASIMD)
return __hyp_switch_fpsimd(vcpu);
if (__hyp_handle_fpsimd(vcpu))
return true;
if (!__populate_fault_info(vcpu))
return true;
@ -524,6 +566,7 @@ int __hyp_text __kvm_vcpu_run_nvhe(struct kvm_vcpu *vcpu)
{
struct kvm_cpu_context *host_ctxt;
struct kvm_cpu_context *guest_ctxt;
bool pmu_switch_needed;
u64 exit_code;
/*
@ -543,6 +586,8 @@ int __hyp_text __kvm_vcpu_run_nvhe(struct kvm_vcpu *vcpu)
host_ctxt->__hyp_running_vcpu = vcpu;
guest_ctxt = &vcpu->arch.ctxt;
pmu_switch_needed = __pmu_switch_to_guest(host_ctxt);
__sysreg_save_state_nvhe(host_ctxt);
__activate_vm(kern_hyp_va(vcpu->kvm));
@ -589,6 +634,9 @@ int __hyp_text __kvm_vcpu_run_nvhe(struct kvm_vcpu *vcpu)
*/
__debug_switch_to_host(vcpu);
if (pmu_switch_needed)
__pmu_switch_to_host(host_ctxt);
/* Returning to host will clear PSR.I, remask PMR if needed */
if (system_uses_irq_prio_masking())
gic_write_pmr(GIC_PRIO_IRQOFF);

239
arch/arm64/kvm/pmu.c Normal file
View File

@ -0,0 +1,239 @@
// SPDX-License-Identifier: GPL-2.0
/*
* Copyright 2019 Arm Limited
* Author: Andrew Murray <Andrew.Murray@arm.com>
*/
#include <linux/kvm_host.h>
#include <linux/perf_event.h>
#include <asm/kvm_hyp.h>
/*
* Given the perf event attributes and system type, determine
* if we are going to need to switch counters at guest entry/exit.
*/
static bool kvm_pmu_switch_needed(struct perf_event_attr *attr)
{
/**
* With VHE the guest kernel runs at EL1 and the host at EL2,
* where user (EL0) is excluded then we have no reason to switch
* counters.
*/
if (has_vhe() && attr->exclude_user)
return false;
/* Only switch if attributes are different */
return (attr->exclude_host != attr->exclude_guest);
}
/*
* Add events to track that we may want to switch at guest entry/exit
* time.
*/
void kvm_set_pmu_events(u32 set, struct perf_event_attr *attr)
{
struct kvm_host_data *ctx = this_cpu_ptr(&kvm_host_data);
if (!kvm_pmu_switch_needed(attr))
return;
if (!attr->exclude_host)
ctx->pmu_events.events_host |= set;
if (!attr->exclude_guest)
ctx->pmu_events.events_guest |= set;
}
/*
* Stop tracking events
*/
void kvm_clr_pmu_events(u32 clr)
{
struct kvm_host_data *ctx = this_cpu_ptr(&kvm_host_data);
ctx->pmu_events.events_host &= ~clr;
ctx->pmu_events.events_guest &= ~clr;
}
/**
* Disable host events, enable guest events
*/
bool __hyp_text __pmu_switch_to_guest(struct kvm_cpu_context *host_ctxt)
{
struct kvm_host_data *host;
struct kvm_pmu_events *pmu;
host = container_of(host_ctxt, struct kvm_host_data, host_ctxt);
pmu = &host->pmu_events;
if (pmu->events_host)
write_sysreg(pmu->events_host, pmcntenclr_el0);
if (pmu->events_guest)
write_sysreg(pmu->events_guest, pmcntenset_el0);
return (pmu->events_host || pmu->events_guest);
}
/**
* Disable guest events, enable host events
*/
void __hyp_text __pmu_switch_to_host(struct kvm_cpu_context *host_ctxt)
{
struct kvm_host_data *host;
struct kvm_pmu_events *pmu;
host = container_of(host_ctxt, struct kvm_host_data, host_ctxt);
pmu = &host->pmu_events;
if (pmu->events_guest)
write_sysreg(pmu->events_guest, pmcntenclr_el0);
if (pmu->events_host)
write_sysreg(pmu->events_host, pmcntenset_el0);
}
#define PMEVTYPER_READ_CASE(idx) \
case idx: \
return read_sysreg(pmevtyper##idx##_el0)
#define PMEVTYPER_WRITE_CASE(idx) \
case idx: \
write_sysreg(val, pmevtyper##idx##_el0); \
break
#define PMEVTYPER_CASES(readwrite) \
PMEVTYPER_##readwrite##_CASE(0); \
PMEVTYPER_##readwrite##_CASE(1); \
PMEVTYPER_##readwrite##_CASE(2); \
PMEVTYPER_##readwrite##_CASE(3); \
PMEVTYPER_##readwrite##_CASE(4); \
PMEVTYPER_##readwrite##_CASE(5); \
PMEVTYPER_##readwrite##_CASE(6); \
PMEVTYPER_##readwrite##_CASE(7); \
PMEVTYPER_##readwrite##_CASE(8); \
PMEVTYPER_##readwrite##_CASE(9); \
PMEVTYPER_##readwrite##_CASE(10); \
PMEVTYPER_##readwrite##_CASE(11); \
PMEVTYPER_##readwrite##_CASE(12); \
PMEVTYPER_##readwrite##_CASE(13); \
PMEVTYPER_##readwrite##_CASE(14); \
PMEVTYPER_##readwrite##_CASE(15); \
PMEVTYPER_##readwrite##_CASE(16); \
PMEVTYPER_##readwrite##_CASE(17); \
PMEVTYPER_##readwrite##_CASE(18); \
PMEVTYPER_##readwrite##_CASE(19); \
PMEVTYPER_##readwrite##_CASE(20); \
PMEVTYPER_##readwrite##_CASE(21); \
PMEVTYPER_##readwrite##_CASE(22); \
PMEVTYPER_##readwrite##_CASE(23); \
PMEVTYPER_##readwrite##_CASE(24); \
PMEVTYPER_##readwrite##_CASE(25); \
PMEVTYPER_##readwrite##_CASE(26); \
PMEVTYPER_##readwrite##_CASE(27); \
PMEVTYPER_##readwrite##_CASE(28); \
PMEVTYPER_##readwrite##_CASE(29); \
PMEVTYPER_##readwrite##_CASE(30)
/*
* Read a value direct from PMEVTYPER<idx> where idx is 0-30
* or PMCCFILTR_EL0 where idx is ARMV8_PMU_CYCLE_IDX (31).
*/
static u64 kvm_vcpu_pmu_read_evtype_direct(int idx)
{
switch (idx) {
PMEVTYPER_CASES(READ);
case ARMV8_PMU_CYCLE_IDX:
return read_sysreg(pmccfiltr_el0);
default:
WARN_ON(1);
}
return 0;
}
/*
* Write a value direct to PMEVTYPER<idx> where idx is 0-30
* or PMCCFILTR_EL0 where idx is ARMV8_PMU_CYCLE_IDX (31).
*/
static void kvm_vcpu_pmu_write_evtype_direct(int idx, u32 val)
{
switch (idx) {
PMEVTYPER_CASES(WRITE);
case ARMV8_PMU_CYCLE_IDX:
write_sysreg(val, pmccfiltr_el0);
break;
default:
WARN_ON(1);
}
}
/*
* Modify ARMv8 PMU events to include EL0 counting
*/
static void kvm_vcpu_pmu_enable_el0(unsigned long events)
{
u64 typer;
u32 counter;
for_each_set_bit(counter, &events, 32) {
typer = kvm_vcpu_pmu_read_evtype_direct(counter);
typer &= ~ARMV8_PMU_EXCLUDE_EL0;
kvm_vcpu_pmu_write_evtype_direct(counter, typer);
}
}
/*
* Modify ARMv8 PMU events to exclude EL0 counting
*/
static void kvm_vcpu_pmu_disable_el0(unsigned long events)
{
u64 typer;
u32 counter;
for_each_set_bit(counter, &events, 32) {
typer = kvm_vcpu_pmu_read_evtype_direct(counter);
typer |= ARMV8_PMU_EXCLUDE_EL0;
kvm_vcpu_pmu_write_evtype_direct(counter, typer);
}
}
/*
* On VHE ensure that only guest events have EL0 counting enabled
*/
void kvm_vcpu_pmu_restore_guest(struct kvm_vcpu *vcpu)
{
struct kvm_cpu_context *host_ctxt;
struct kvm_host_data *host;
u32 events_guest, events_host;
if (!has_vhe())
return;
host_ctxt = vcpu->arch.host_cpu_context;
host = container_of(host_ctxt, struct kvm_host_data, host_ctxt);
events_guest = host->pmu_events.events_guest;
events_host = host->pmu_events.events_host;
kvm_vcpu_pmu_enable_el0(events_guest);
kvm_vcpu_pmu_disable_el0(events_host);
}
/*
* On VHE ensure that only host events have EL0 counting enabled
*/
void kvm_vcpu_pmu_restore_host(struct kvm_vcpu *vcpu)
{
struct kvm_cpu_context *host_ctxt;
struct kvm_host_data *host;
u32 events_guest, events_host;
if (!has_vhe())
return;
host_ctxt = vcpu->arch.host_cpu_context;
host = container_of(host_ctxt, struct kvm_host_data, host_ctxt);
events_guest = host->pmu_events.events_guest;
events_host = host->pmu_events.events_host;
kvm_vcpu_pmu_enable_el0(events_host);
kvm_vcpu_pmu_disable_el0(events_guest);
}

View File

@ -20,20 +20,26 @@
*/
#include <linux/errno.h>
#include <linux/kernel.h>
#include <linux/kvm_host.h>
#include <linux/kvm.h>
#include <linux/hw_breakpoint.h>
#include <linux/slab.h>
#include <linux/string.h>
#include <linux/types.h>
#include <kvm/arm_arch_timer.h>
#include <asm/cpufeature.h>
#include <asm/cputype.h>
#include <asm/fpsimd.h>
#include <asm/ptrace.h>
#include <asm/kvm_arm.h>
#include <asm/kvm_asm.h>
#include <asm/kvm_coproc.h>
#include <asm/kvm_emulate.h>
#include <asm/kvm_mmu.h>
#include <asm/virt.h>
/* Maximum phys_shift supported for any VM on this host */
static u32 kvm_ipa_limit;
@ -92,6 +98,14 @@ int kvm_arch_vm_ioctl_check_extension(struct kvm *kvm, long ext)
case KVM_CAP_ARM_VM_IPA_SIZE:
r = kvm_ipa_limit;
break;
case KVM_CAP_ARM_SVE:
r = system_supports_sve();
break;
case KVM_CAP_ARM_PTRAUTH_ADDRESS:
case KVM_CAP_ARM_PTRAUTH_GENERIC:
r = has_vhe() && system_supports_address_auth() &&
system_supports_generic_auth();
break;
default:
r = 0;
}
@ -99,13 +113,148 @@ int kvm_arch_vm_ioctl_check_extension(struct kvm *kvm, long ext)
return r;
}
unsigned int kvm_sve_max_vl;
int kvm_arm_init_sve(void)
{
if (system_supports_sve()) {
kvm_sve_max_vl = sve_max_virtualisable_vl;
/*
* The get_sve_reg()/set_sve_reg() ioctl interface will need
* to be extended with multiple register slice support in
* order to support vector lengths greater than
* SVE_VL_ARCH_MAX:
*/
if (WARN_ON(kvm_sve_max_vl > SVE_VL_ARCH_MAX))
kvm_sve_max_vl = SVE_VL_ARCH_MAX;
/*
* Don't even try to make use of vector lengths that
* aren't available on all CPUs, for now:
*/
if (kvm_sve_max_vl < sve_max_vl)
pr_warn("KVM: SVE vector length for guests limited to %u bytes\n",
kvm_sve_max_vl);
}
return 0;
}
static int kvm_vcpu_enable_sve(struct kvm_vcpu *vcpu)
{
if (!system_supports_sve())
return -EINVAL;
/* Verify that KVM startup enforced this when SVE was detected: */
if (WARN_ON(!has_vhe()))
return -EINVAL;
vcpu->arch.sve_max_vl = kvm_sve_max_vl;
/*
* Userspace can still customize the vector lengths by writing
* KVM_REG_ARM64_SVE_VLS. Allocation is deferred until
* kvm_arm_vcpu_finalize(), which freezes the configuration.
*/
vcpu->arch.flags |= KVM_ARM64_GUEST_HAS_SVE;
return 0;
}
/*
* Finalize vcpu's maximum SVE vector length, allocating
* vcpu->arch.sve_state as necessary.
*/
static int kvm_vcpu_finalize_sve(struct kvm_vcpu *vcpu)
{
void *buf;
unsigned int vl;
vl = vcpu->arch.sve_max_vl;
/*
* Resposibility for these properties is shared between
* kvm_arm_init_arch_resources(), kvm_vcpu_enable_sve() and
* set_sve_vls(). Double-check here just to be sure:
*/
if (WARN_ON(!sve_vl_valid(vl) || vl > sve_max_virtualisable_vl ||
vl > SVE_VL_ARCH_MAX))
return -EIO;
buf = kzalloc(SVE_SIG_REGS_SIZE(sve_vq_from_vl(vl)), GFP_KERNEL);
if (!buf)
return -ENOMEM;
vcpu->arch.sve_state = buf;
vcpu->arch.flags |= KVM_ARM64_VCPU_SVE_FINALIZED;
return 0;
}
int kvm_arm_vcpu_finalize(struct kvm_vcpu *vcpu, int feature)
{
switch (feature) {
case KVM_ARM_VCPU_SVE:
if (!vcpu_has_sve(vcpu))
return -EINVAL;
if (kvm_arm_vcpu_sve_finalized(vcpu))
return -EPERM;
return kvm_vcpu_finalize_sve(vcpu);
}
return -EINVAL;
}
bool kvm_arm_vcpu_is_finalized(struct kvm_vcpu *vcpu)
{
if (vcpu_has_sve(vcpu) && !kvm_arm_vcpu_sve_finalized(vcpu))
return false;
return true;
}
void kvm_arch_vcpu_uninit(struct kvm_vcpu *vcpu)
{
kfree(vcpu->arch.sve_state);
}
static void kvm_vcpu_reset_sve(struct kvm_vcpu *vcpu)
{
if (vcpu_has_sve(vcpu))
memset(vcpu->arch.sve_state, 0, vcpu_sve_state_size(vcpu));
}
static int kvm_vcpu_enable_ptrauth(struct kvm_vcpu *vcpu)
{
/* Support ptrauth only if the system supports these capabilities. */
if (!has_vhe())
return -EINVAL;
if (!system_supports_address_auth() ||
!system_supports_generic_auth())
return -EINVAL;
/*
* For now make sure that both address/generic pointer authentication
* features are requested by the userspace together.
*/
if (!test_bit(KVM_ARM_VCPU_PTRAUTH_ADDRESS, vcpu->arch.features) ||
!test_bit(KVM_ARM_VCPU_PTRAUTH_GENERIC, vcpu->arch.features))
return -EINVAL;
vcpu->arch.flags |= KVM_ARM64_GUEST_HAS_PTRAUTH;
return 0;
}
/**
* kvm_reset_vcpu - sets core registers and sys_regs to reset value
* @vcpu: The VCPU pointer
*
* This function finds the right table above and sets the registers on
* the virtual CPU struct to their architecturally defined reset
* values.
* values, except for registers whose reset is deferred until
* kvm_arm_vcpu_finalize().
*
* Note: This function can be called from two paths: The KVM_ARM_VCPU_INIT
* ioctl or as part of handling a request issued by another VCPU in the PSCI
@ -131,6 +280,22 @@ int kvm_reset_vcpu(struct kvm_vcpu *vcpu)
if (loaded)
kvm_arch_vcpu_put(vcpu);
if (!kvm_arm_vcpu_sve_finalized(vcpu)) {
if (test_bit(KVM_ARM_VCPU_SVE, vcpu->arch.features)) {
ret = kvm_vcpu_enable_sve(vcpu);
if (ret)
goto out;
}
} else {
kvm_vcpu_reset_sve(vcpu);
}
if (test_bit(KVM_ARM_VCPU_PTRAUTH_ADDRESS, vcpu->arch.features) ||
test_bit(KVM_ARM_VCPU_PTRAUTH_GENERIC, vcpu->arch.features)) {
if (kvm_vcpu_enable_ptrauth(vcpu))
goto out;
}
switch (vcpu->arch.target) {
default:
if (test_bit(KVM_ARM_VCPU_EL1_32BIT, vcpu->arch.features)) {

View File

@ -695,6 +695,7 @@ static bool access_pmcr(struct kvm_vcpu *vcpu, struct sys_reg_params *p,
val |= p->regval & ARMV8_PMU_PMCR_MASK;
__vcpu_sys_reg(vcpu, PMCR_EL0) = val;
kvm_pmu_handle_pmcr(vcpu, val);
kvm_vcpu_pmu_restore_guest(vcpu);
} else {
/* PMCR.P & PMCR.C are RAZ */
val = __vcpu_sys_reg(vcpu, PMCR_EL0)
@ -850,6 +851,7 @@ static bool access_pmu_evtyper(struct kvm_vcpu *vcpu, struct sys_reg_params *p,
if (p->is_write) {
kvm_pmu_set_counter_event_type(vcpu, p->regval, idx);
__vcpu_sys_reg(vcpu, reg) = p->regval & ARMV8_PMU_EVTYPE_MASK;
kvm_vcpu_pmu_restore_guest(vcpu);
} else {
p->regval = __vcpu_sys_reg(vcpu, reg) & ARMV8_PMU_EVTYPE_MASK;
}
@ -875,6 +877,7 @@ static bool access_pmcnten(struct kvm_vcpu *vcpu, struct sys_reg_params *p,
/* accessing PMCNTENSET_EL0 */
__vcpu_sys_reg(vcpu, PMCNTENSET_EL0) |= val;
kvm_pmu_enable_counter(vcpu, val);
kvm_vcpu_pmu_restore_guest(vcpu);
} else {
/* accessing PMCNTENCLR_EL0 */
__vcpu_sys_reg(vcpu, PMCNTENSET_EL0) &= ~val;
@ -1007,6 +1010,37 @@ static bool access_pmuserenr(struct kvm_vcpu *vcpu, struct sys_reg_params *p,
{ SYS_DESC(SYS_PMEVTYPERn_EL0(n)), \
access_pmu_evtyper, reset_unknown, (PMEVTYPER0_EL0 + n), }
static bool trap_ptrauth(struct kvm_vcpu *vcpu,
struct sys_reg_params *p,
const struct sys_reg_desc *rd)
{
kvm_arm_vcpu_ptrauth_trap(vcpu);
/*
* Return false for both cases as we never skip the trapped
* instruction:
*
* - Either we re-execute the same key register access instruction
* after enabling ptrauth.
* - Or an UNDEF is injected as ptrauth is not supported/enabled.
*/
return false;
}
static unsigned int ptrauth_visibility(const struct kvm_vcpu *vcpu,
const struct sys_reg_desc *rd)
{
return vcpu_has_ptrauth(vcpu) ? 0 : REG_HIDDEN_USER | REG_HIDDEN_GUEST;
}
#define __PTRAUTH_KEY(k) \
{ SYS_DESC(SYS_## k), trap_ptrauth, reset_unknown, k, \
.visibility = ptrauth_visibility}
#define PTRAUTH_KEY(k) \
__PTRAUTH_KEY(k ## KEYLO_EL1), \
__PTRAUTH_KEY(k ## KEYHI_EL1)
static bool access_arch_timer(struct kvm_vcpu *vcpu,
struct sys_reg_params *p,
const struct sys_reg_desc *r)
@ -1044,25 +1078,20 @@ static bool access_arch_timer(struct kvm_vcpu *vcpu,
}
/* Read a sanitised cpufeature ID register by sys_reg_desc */
static u64 read_id_reg(struct sys_reg_desc const *r, bool raz)
static u64 read_id_reg(const struct kvm_vcpu *vcpu,
struct sys_reg_desc const *r, bool raz)
{
u32 id = sys_reg((u32)r->Op0, (u32)r->Op1,
(u32)r->CRn, (u32)r->CRm, (u32)r->Op2);
u64 val = raz ? 0 : read_sanitised_ftr_reg(id);
if (id == SYS_ID_AA64PFR0_EL1) {
if (val & (0xfUL << ID_AA64PFR0_SVE_SHIFT))
kvm_debug("SVE unsupported for guests, suppressing\n");
if (id == SYS_ID_AA64PFR0_EL1 && !vcpu_has_sve(vcpu)) {
val &= ~(0xfUL << ID_AA64PFR0_SVE_SHIFT);
} else if (id == SYS_ID_AA64ISAR1_EL1) {
const u64 ptrauth_mask = (0xfUL << ID_AA64ISAR1_APA_SHIFT) |
(0xfUL << ID_AA64ISAR1_API_SHIFT) |
(0xfUL << ID_AA64ISAR1_GPA_SHIFT) |
(0xfUL << ID_AA64ISAR1_GPI_SHIFT);
if (val & ptrauth_mask)
kvm_debug("ptrauth unsupported for guests, suppressing\n");
val &= ~ptrauth_mask;
} else if (id == SYS_ID_AA64ISAR1_EL1 && !vcpu_has_ptrauth(vcpu)) {
val &= ~((0xfUL << ID_AA64ISAR1_APA_SHIFT) |
(0xfUL << ID_AA64ISAR1_API_SHIFT) |
(0xfUL << ID_AA64ISAR1_GPA_SHIFT) |
(0xfUL << ID_AA64ISAR1_GPI_SHIFT));
}
return val;
@ -1078,7 +1107,7 @@ static bool __access_id_reg(struct kvm_vcpu *vcpu,
if (p->is_write)
return write_to_read_only(vcpu, p, r);
p->regval = read_id_reg(r, raz);
p->regval = read_id_reg(vcpu, r, raz);
return true;
}
@ -1100,6 +1129,81 @@ static int reg_from_user(u64 *val, const void __user *uaddr, u64 id);
static int reg_to_user(void __user *uaddr, const u64 *val, u64 id);
static u64 sys_reg_to_index(const struct sys_reg_desc *reg);
/* Visibility overrides for SVE-specific control registers */
static unsigned int sve_visibility(const struct kvm_vcpu *vcpu,
const struct sys_reg_desc *rd)
{
if (vcpu_has_sve(vcpu))
return 0;
return REG_HIDDEN_USER | REG_HIDDEN_GUEST;
}
/* Visibility overrides for SVE-specific ID registers */
static unsigned int sve_id_visibility(const struct kvm_vcpu *vcpu,
const struct sys_reg_desc *rd)
{
if (vcpu_has_sve(vcpu))
return 0;
return REG_HIDDEN_USER;
}
/* Generate the emulated ID_AA64ZFR0_EL1 value exposed to the guest */
static u64 guest_id_aa64zfr0_el1(const struct kvm_vcpu *vcpu)
{
if (!vcpu_has_sve(vcpu))
return 0;
return read_sanitised_ftr_reg(SYS_ID_AA64ZFR0_EL1);
}
static bool access_id_aa64zfr0_el1(struct kvm_vcpu *vcpu,
struct sys_reg_params *p,
const struct sys_reg_desc *rd)
{
if (p->is_write)
return write_to_read_only(vcpu, p, rd);
p->regval = guest_id_aa64zfr0_el1(vcpu);
return true;
}
static int get_id_aa64zfr0_el1(struct kvm_vcpu *vcpu,
const struct sys_reg_desc *rd,
const struct kvm_one_reg *reg, void __user *uaddr)
{
u64 val;
if (WARN_ON(!vcpu_has_sve(vcpu)))
return -ENOENT;
val = guest_id_aa64zfr0_el1(vcpu);
return reg_to_user(uaddr, &val, reg->id);
}
static int set_id_aa64zfr0_el1(struct kvm_vcpu *vcpu,
const struct sys_reg_desc *rd,
const struct kvm_one_reg *reg, void __user *uaddr)
{
const u64 id = sys_reg_to_index(rd);
int err;
u64 val;
if (WARN_ON(!vcpu_has_sve(vcpu)))
return -ENOENT;
err = reg_from_user(&val, uaddr, id);
if (err)
return err;
/* This is what we mean by invariant: you can't change it. */
if (val != guest_id_aa64zfr0_el1(vcpu))
return -EINVAL;
return 0;
}
/*
* cpufeature ID register user accessors
*
@ -1107,16 +1211,18 @@ static u64 sys_reg_to_index(const struct sys_reg_desc *reg);
* are stored, and for set_id_reg() we don't allow the effective value
* to be changed.
*/
static int __get_id_reg(const struct sys_reg_desc *rd, void __user *uaddr,
static int __get_id_reg(const struct kvm_vcpu *vcpu,
const struct sys_reg_desc *rd, void __user *uaddr,
bool raz)
{
const u64 id = sys_reg_to_index(rd);
const u64 val = read_id_reg(rd, raz);
const u64 val = read_id_reg(vcpu, rd, raz);
return reg_to_user(uaddr, &val, id);
}
static int __set_id_reg(const struct sys_reg_desc *rd, void __user *uaddr,
static int __set_id_reg(const struct kvm_vcpu *vcpu,
const struct sys_reg_desc *rd, void __user *uaddr,
bool raz)
{
const u64 id = sys_reg_to_index(rd);
@ -1128,7 +1234,7 @@ static int __set_id_reg(const struct sys_reg_desc *rd, void __user *uaddr,
return err;
/* This is what we mean by invariant: you can't change it. */
if (val != read_id_reg(rd, raz))
if (val != read_id_reg(vcpu, rd, raz))
return -EINVAL;
return 0;
@ -1137,25 +1243,25 @@ static int __set_id_reg(const struct sys_reg_desc *rd, void __user *uaddr,
static int get_id_reg(struct kvm_vcpu *vcpu, const struct sys_reg_desc *rd,
const struct kvm_one_reg *reg, void __user *uaddr)
{
return __get_id_reg(rd, uaddr, false);
return __get_id_reg(vcpu, rd, uaddr, false);
}
static int set_id_reg(struct kvm_vcpu *vcpu, const struct sys_reg_desc *rd,
const struct kvm_one_reg *reg, void __user *uaddr)
{
return __set_id_reg(rd, uaddr, false);
return __set_id_reg(vcpu, rd, uaddr, false);
}
static int get_raz_id_reg(struct kvm_vcpu *vcpu, const struct sys_reg_desc *rd,
const struct kvm_one_reg *reg, void __user *uaddr)
{
return __get_id_reg(rd, uaddr, true);
return __get_id_reg(vcpu, rd, uaddr, true);
}
static int set_raz_id_reg(struct kvm_vcpu *vcpu, const struct sys_reg_desc *rd,
const struct kvm_one_reg *reg, void __user *uaddr)
{
return __set_id_reg(rd, uaddr, true);
return __set_id_reg(vcpu, rd, uaddr, true);
}
static bool access_ctr(struct kvm_vcpu *vcpu, struct sys_reg_params *p,
@ -1343,7 +1449,7 @@ static const struct sys_reg_desc sys_reg_descs[] = {
ID_SANITISED(ID_AA64PFR1_EL1),
ID_UNALLOCATED(4,2),
ID_UNALLOCATED(4,3),
ID_UNALLOCATED(4,4),
{ SYS_DESC(SYS_ID_AA64ZFR0_EL1), access_id_aa64zfr0_el1, .get_user = get_id_aa64zfr0_el1, .set_user = set_id_aa64zfr0_el1, .visibility = sve_id_visibility },
ID_UNALLOCATED(4,5),
ID_UNALLOCATED(4,6),
ID_UNALLOCATED(4,7),
@ -1380,10 +1486,17 @@ static const struct sys_reg_desc sys_reg_descs[] = {
{ SYS_DESC(SYS_SCTLR_EL1), access_vm_reg, reset_val, SCTLR_EL1, 0x00C50078 },
{ SYS_DESC(SYS_CPACR_EL1), NULL, reset_val, CPACR_EL1, 0 },
{ SYS_DESC(SYS_ZCR_EL1), NULL, reset_val, ZCR_EL1, 0, .visibility = sve_visibility },
{ SYS_DESC(SYS_TTBR0_EL1), access_vm_reg, reset_unknown, TTBR0_EL1 },
{ SYS_DESC(SYS_TTBR1_EL1), access_vm_reg, reset_unknown, TTBR1_EL1 },
{ SYS_DESC(SYS_TCR_EL1), access_vm_reg, reset_val, TCR_EL1, 0 },
PTRAUTH_KEY(APIA),
PTRAUTH_KEY(APIB),
PTRAUTH_KEY(APDA),
PTRAUTH_KEY(APDB),
PTRAUTH_KEY(APGA),
{ SYS_DESC(SYS_AFSR0_EL1), access_vm_reg, reset_unknown, AFSR0_EL1 },
{ SYS_DESC(SYS_AFSR1_EL1), access_vm_reg, reset_unknown, AFSR1_EL1 },
{ SYS_DESC(SYS_ESR_EL1), access_vm_reg, reset_unknown, ESR_EL1 },
@ -1924,6 +2037,12 @@ static void perform_access(struct kvm_vcpu *vcpu,
{
trace_kvm_sys_access(*vcpu_pc(vcpu), params, r);
/* Check for regs disabled by runtime config */
if (sysreg_hidden_from_guest(vcpu, r)) {
kvm_inject_undefined(vcpu);
return;
}
/*
* Not having an accessor means that we have configured a trap
* that we don't know how to handle. This certainly qualifies
@ -2435,6 +2554,10 @@ int kvm_arm_sys_reg_get_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg
if (!r)
return get_invariant_sys_reg(reg->id, uaddr);
/* Check for regs disabled by runtime config */
if (sysreg_hidden_from_user(vcpu, r))
return -ENOENT;
if (r->get_user)
return (r->get_user)(vcpu, r, reg, uaddr);
@ -2456,6 +2579,10 @@ int kvm_arm_sys_reg_set_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg
if (!r)
return set_invariant_sys_reg(reg->id, uaddr);
/* Check for regs disabled by runtime config */
if (sysreg_hidden_from_user(vcpu, r))
return -ENOENT;
if (r->set_user)
return (r->set_user)(vcpu, r, reg, uaddr);
@ -2512,7 +2639,8 @@ static bool copy_reg_to_user(const struct sys_reg_desc *reg, u64 __user **uind)
return true;
}
static int walk_one_sys_reg(const struct sys_reg_desc *rd,
static int walk_one_sys_reg(const struct kvm_vcpu *vcpu,
const struct sys_reg_desc *rd,
u64 __user **uind,
unsigned int *total)
{
@ -2523,6 +2651,9 @@ static int walk_one_sys_reg(const struct sys_reg_desc *rd,
if (!(rd->reg || rd->get_user))
return 0;
if (sysreg_hidden_from_user(vcpu, rd))
return 0;
if (!copy_reg_to_user(rd, uind))
return -EFAULT;
@ -2551,9 +2682,9 @@ static int walk_sys_regs(struct kvm_vcpu *vcpu, u64 __user *uind)
int cmp = cmp_sys_reg(i1, i2);
/* target-specific overrides generic entry. */
if (cmp <= 0)
err = walk_one_sys_reg(i1, &uind, &total);
err = walk_one_sys_reg(vcpu, i1, &uind, &total);
else
err = walk_one_sys_reg(i2, &uind, &total);
err = walk_one_sys_reg(vcpu, i2, &uind, &total);
if (err)
return err;

View File

@ -64,8 +64,15 @@ struct sys_reg_desc {
const struct kvm_one_reg *reg, void __user *uaddr);
int (*set_user)(struct kvm_vcpu *vcpu, const struct sys_reg_desc *rd,
const struct kvm_one_reg *reg, void __user *uaddr);
/* Return mask of REG_* runtime visibility overrides */
unsigned int (*visibility)(const struct kvm_vcpu *vcpu,
const struct sys_reg_desc *rd);
};
#define REG_HIDDEN_USER (1 << 0) /* hidden from userspace ioctls */
#define REG_HIDDEN_GUEST (1 << 1) /* hidden from guest */
static inline void print_sys_reg_instr(const struct sys_reg_params *p)
{
/* Look, we even formatted it for you to paste into the table! */
@ -102,6 +109,24 @@ static inline void reset_val(struct kvm_vcpu *vcpu, const struct sys_reg_desc *r
__vcpu_sys_reg(vcpu, r->reg) = r->val;
}
static inline bool sysreg_hidden_from_guest(const struct kvm_vcpu *vcpu,
const struct sys_reg_desc *r)
{
if (likely(!r->visibility))
return false;
return r->visibility(vcpu, r) & REG_HIDDEN_GUEST;
}
static inline bool sysreg_hidden_from_user(const struct kvm_vcpu *vcpu,
const struct sys_reg_desc *r)
{
if (likely(!r->visibility))
return false;
return r->visibility(vcpu, r) & REG_HIDDEN_USER;
}
static inline int cmp_sys_reg(const struct sys_reg_desc *i1,
const struct sys_reg_desc *i2)
{

View File

@ -201,6 +201,8 @@ struct kvmppc_spapr_tce_iommu_table {
struct kref kref;
};
#define TCES_PER_PAGE (PAGE_SIZE / sizeof(u64))
struct kvmppc_spapr_tce_table {
struct list_head list;
struct kvm *kvm;
@ -210,6 +212,7 @@ struct kvmppc_spapr_tce_table {
u64 offset; /* in pages */
u64 size; /* window size in pages */
struct list_head iommu_tables;
struct mutex alloc_lock;
struct page *pages[0];
};
@ -222,6 +225,7 @@ extern struct kvm_device_ops kvm_xics_ops;
struct kvmppc_xive;
struct kvmppc_xive_vcpu;
extern struct kvm_device_ops kvm_xive_ops;
extern struct kvm_device_ops kvm_xive_native_ops;
struct kvmppc_passthru_irqmap;
@ -312,7 +316,11 @@ struct kvm_arch {
#endif
#ifdef CONFIG_KVM_XICS
struct kvmppc_xics *xics;
struct kvmppc_xive *xive;
struct kvmppc_xive *xive; /* Current XIVE device in use */
struct {
struct kvmppc_xive *native;
struct kvmppc_xive *xics_on_xive;
} xive_devices;
struct kvmppc_passthru_irqmap *pimap;
#endif
struct kvmppc_ops *kvm_ops;
@ -449,6 +457,7 @@ struct kvmppc_passthru_irqmap {
#define KVMPPC_IRQ_DEFAULT 0
#define KVMPPC_IRQ_MPIC 1
#define KVMPPC_IRQ_XICS 2 /* Includes a XIVE option */
#define KVMPPC_IRQ_XIVE 3 /* XIVE native exploitation mode */
#define MMIO_HPTE_CACHE_SIZE 4

View File

@ -197,10 +197,6 @@ extern struct kvmppc_spapr_tce_table *kvmppc_find_table(
(iommu_tce_check_ioba((stt)->page_shift, (stt)->offset, \
(stt)->size, (ioba), (npages)) ? \
H_PARAMETER : H_SUCCESS)
extern long kvmppc_tce_to_ua(struct kvm *kvm, unsigned long tce,
unsigned long *ua, unsigned long **prmap);
extern void kvmppc_tce_put(struct kvmppc_spapr_tce_table *tt,
unsigned long idx, unsigned long tce);
extern long kvmppc_h_put_tce(struct kvm_vcpu *vcpu, unsigned long liobn,
unsigned long ioba, unsigned long tce);
extern long kvmppc_h_put_tce_indirect(struct kvm_vcpu *vcpu,
@ -273,6 +269,7 @@ union kvmppc_one_reg {
u64 addr;
u64 length;
} vpaval;
u64 xive_timaval[2];
};
struct kvmppc_ops {
@ -480,6 +477,9 @@ extern void kvm_hv_vm_activated(void);
extern void kvm_hv_vm_deactivated(void);
extern bool kvm_hv_mode_active(void);
extern void kvmppc_check_need_tlb_flush(struct kvm *kvm, int pcpu,
struct kvm_nested_guest *nested);
#else
static inline void __init kvm_cma_reserve(void)
{}
@ -594,6 +594,22 @@ extern int kvmppc_xive_set_icp(struct kvm_vcpu *vcpu, u64 icpval);
extern int kvmppc_xive_set_irq(struct kvm *kvm, int irq_source_id, u32 irq,
int level, bool line_status);
extern void kvmppc_xive_push_vcpu(struct kvm_vcpu *vcpu);
static inline int kvmppc_xive_enabled(struct kvm_vcpu *vcpu)
{
return vcpu->arch.irq_type == KVMPPC_IRQ_XIVE;
}
extern int kvmppc_xive_native_connect_vcpu(struct kvm_device *dev,
struct kvm_vcpu *vcpu, u32 cpu);
extern void kvmppc_xive_native_cleanup_vcpu(struct kvm_vcpu *vcpu);
extern void kvmppc_xive_native_init_module(void);
extern void kvmppc_xive_native_exit_module(void);
extern int kvmppc_xive_native_get_vp(struct kvm_vcpu *vcpu,
union kvmppc_one_reg *val);
extern int kvmppc_xive_native_set_vp(struct kvm_vcpu *vcpu,
union kvmppc_one_reg *val);
#else
static inline int kvmppc_xive_set_xive(struct kvm *kvm, u32 irq, u32 server,
u32 priority) { return -1; }
@ -617,6 +633,21 @@ static inline int kvmppc_xive_set_icp(struct kvm_vcpu *vcpu, u64 icpval) { retur
static inline int kvmppc_xive_set_irq(struct kvm *kvm, int irq_source_id, u32 irq,
int level, bool line_status) { return -ENODEV; }
static inline void kvmppc_xive_push_vcpu(struct kvm_vcpu *vcpu) { }
static inline int kvmppc_xive_enabled(struct kvm_vcpu *vcpu)
{ return 0; }
static inline int kvmppc_xive_native_connect_vcpu(struct kvm_device *dev,
struct kvm_vcpu *vcpu, u32 cpu) { return -EBUSY; }
static inline void kvmppc_xive_native_cleanup_vcpu(struct kvm_vcpu *vcpu) { }
static inline void kvmppc_xive_native_init_module(void) { }
static inline void kvmppc_xive_native_exit_module(void) { }
static inline int kvmppc_xive_native_get_vp(struct kvm_vcpu *vcpu,
union kvmppc_one_reg *val)
{ return 0; }
static inline int kvmppc_xive_native_set_vp(struct kvm_vcpu *vcpu,
union kvmppc_one_reg *val)
{ return -ENOENT; }
#endif /* CONFIG_KVM_XIVE */
#if defined(CONFIG_PPC_POWERNV) && defined(CONFIG_KVM_BOOK3S_64_HANDLER)
@ -665,6 +696,8 @@ long kvmppc_h_clear_ref(struct kvm_vcpu *vcpu, unsigned long flags,
unsigned long pte_index);
long kvmppc_h_clear_mod(struct kvm_vcpu *vcpu, unsigned long flags,
unsigned long pte_index);
long kvmppc_rm_h_page_init(struct kvm_vcpu *vcpu, unsigned long flags,
unsigned long dest, unsigned long src);
long kvmppc_hpte_hv_fault(struct kvm_vcpu *vcpu, unsigned long addr,
unsigned long slb_v, unsigned int status, bool data);
unsigned long kvmppc_rm_h_xirr(struct kvm_vcpu *vcpu);

View File

@ -23,6 +23,7 @@
* same offset regardless of where the code is executing
*/
extern void __iomem *xive_tima;
extern unsigned long xive_tima_os;
/*
* Offset in the TM area of our current execution level (provided by
@ -73,6 +74,8 @@ struct xive_q {
u32 esc_irq;
atomic_t count;
atomic_t pending_count;
u64 guest_qaddr;
u32 guest_qshift;
};
/* Global enable flags for the XIVE support */

View File

@ -482,6 +482,8 @@ struct kvm_ppc_cpu_char {
#define KVM_REG_PPC_ICP_PPRI_SHIFT 16 /* pending irq priority */
#define KVM_REG_PPC_ICP_PPRI_MASK 0xff
#define KVM_REG_PPC_VP_STATE (KVM_REG_PPC | KVM_REG_SIZE_U128 | 0x8d)
/* Device control API: PPC-specific devices */
#define KVM_DEV_MPIC_GRP_MISC 1
#define KVM_DEV_MPIC_BASE_ADDR 0 /* 64-bit */
@ -677,4 +679,48 @@ struct kvm_ppc_cpu_char {
#define KVM_XICS_PRESENTED (1ULL << 43)
#define KVM_XICS_QUEUED (1ULL << 44)
/* POWER9 XIVE Native Interrupt Controller */
#define KVM_DEV_XIVE_GRP_CTRL 1
#define KVM_DEV_XIVE_RESET 1
#define KVM_DEV_XIVE_EQ_SYNC 2
#define KVM_DEV_XIVE_GRP_SOURCE 2 /* 64-bit source identifier */
#define KVM_DEV_XIVE_GRP_SOURCE_CONFIG 3 /* 64-bit source identifier */
#define KVM_DEV_XIVE_GRP_EQ_CONFIG 4 /* 64-bit EQ identifier */
#define KVM_DEV_XIVE_GRP_SOURCE_SYNC 5 /* 64-bit source identifier */
/* Layout of 64-bit XIVE source attribute values */
#define KVM_XIVE_LEVEL_SENSITIVE (1ULL << 0)
#define KVM_XIVE_LEVEL_ASSERTED (1ULL << 1)
/* Layout of 64-bit XIVE source configuration attribute values */
#define KVM_XIVE_SOURCE_PRIORITY_SHIFT 0
#define KVM_XIVE_SOURCE_PRIORITY_MASK 0x7
#define KVM_XIVE_SOURCE_SERVER_SHIFT 3
#define KVM_XIVE_SOURCE_SERVER_MASK 0xfffffff8ULL
#define KVM_XIVE_SOURCE_MASKED_SHIFT 32
#define KVM_XIVE_SOURCE_MASKED_MASK 0x100000000ULL
#define KVM_XIVE_SOURCE_EISN_SHIFT 33
#define KVM_XIVE_SOURCE_EISN_MASK 0xfffffffe00000000ULL
/* Layout of 64-bit EQ identifier */
#define KVM_XIVE_EQ_PRIORITY_SHIFT 0
#define KVM_XIVE_EQ_PRIORITY_MASK 0x7
#define KVM_XIVE_EQ_SERVER_SHIFT 3
#define KVM_XIVE_EQ_SERVER_MASK 0xfffffff8ULL
/* Layout of EQ configuration values (64 bytes) */
struct kvm_ppc_xive_eq {
__u32 flags;
__u32 qshift;
__u64 qaddr;
__u32 qtoggle;
__u32 qindex;
__u8 pad[40];
};
#define KVM_XIVE_EQ_ALWAYS_NOTIFY 0x00000001
#define KVM_XIVE_TIMA_PAGE_OFFSET 0
#define KVM_XIVE_ESB_PAGE_OFFSET 4
#endif /* __LINUX_KVM_POWERPC_H */

View File

@ -94,7 +94,7 @@ endif
kvm-book3s_64-objs-$(CONFIG_KVM_XICS) += \
book3s_xics.o
kvm-book3s_64-objs-$(CONFIG_KVM_XIVE) += book3s_xive.o
kvm-book3s_64-objs-$(CONFIG_KVM_XIVE) += book3s_xive.o book3s_xive_native.o
kvm-book3s_64-objs-$(CONFIG_SPAPR_TCE_IOMMU) += book3s_64_vio.o
kvm-book3s_64-module-objs := \

View File

@ -651,6 +651,18 @@ int kvmppc_get_one_reg(struct kvm_vcpu *vcpu, u64 id,
*val = get_reg_val(id, kvmppc_xics_get_icp(vcpu));
break;
#endif /* CONFIG_KVM_XICS */
#ifdef CONFIG_KVM_XIVE
case KVM_REG_PPC_VP_STATE:
if (!vcpu->arch.xive_vcpu) {
r = -ENXIO;
break;
}
if (xive_enabled())
r = kvmppc_xive_native_get_vp(vcpu, val);
else
r = -ENXIO;
break;
#endif /* CONFIG_KVM_XIVE */
case KVM_REG_PPC_FSCR:
*val = get_reg_val(id, vcpu->arch.fscr);
break;
@ -724,6 +736,18 @@ int kvmppc_set_one_reg(struct kvm_vcpu *vcpu, u64 id,
r = kvmppc_xics_set_icp(vcpu, set_reg_val(id, *val));
break;
#endif /* CONFIG_KVM_XICS */
#ifdef CONFIG_KVM_XIVE
case KVM_REG_PPC_VP_STATE:
if (!vcpu->arch.xive_vcpu) {
r = -ENXIO;
break;
}
if (xive_enabled())
r = kvmppc_xive_native_set_vp(vcpu, val);
else
r = -ENXIO;
break;
#endif /* CONFIG_KVM_XIVE */
case KVM_REG_PPC_FSCR:
vcpu->arch.fscr = set_reg_val(id, *val);
break;
@ -891,6 +915,17 @@ void kvmppc_core_destroy_vm(struct kvm *kvm)
kvmppc_rtas_tokens_free(kvm);
WARN_ON(!list_empty(&kvm->arch.spapr_tce_tables));
#endif
#ifdef CONFIG_KVM_XICS
/*
* Free the XIVE devices which are not directly freed by the
* device 'release' method
*/
kfree(kvm->arch.xive_devices.native);
kvm->arch.xive_devices.native = NULL;
kfree(kvm->arch.xive_devices.xics_on_xive);
kvm->arch.xive_devices.xics_on_xive = NULL;
#endif /* CONFIG_KVM_XICS */
}
int kvmppc_h_logical_ci_load(struct kvm_vcpu *vcpu)
@ -1050,6 +1085,9 @@ static int kvmppc_book3s_init(void)
if (xics_on_xive()) {
kvmppc_xive_init_module();
kvm_register_device_ops(&kvm_xive_ops, KVM_DEV_TYPE_XICS);
kvmppc_xive_native_init_module();
kvm_register_device_ops(&kvm_xive_native_ops,
KVM_DEV_TYPE_XIVE);
} else
#endif
kvm_register_device_ops(&kvm_xics_ops, KVM_DEV_TYPE_XICS);
@ -1060,8 +1098,10 @@ static int kvmppc_book3s_init(void)
static void kvmppc_book3s_exit(void)
{
#ifdef CONFIG_KVM_XICS
if (xics_on_xive())
if (xics_on_xive()) {
kvmppc_xive_exit_module();
kvmppc_xive_native_exit_module();
}
#endif
#ifdef CONFIG_KVM_BOOK3S_32_HANDLER
kvmppc_book3s_exit_pr();

View File

@ -228,11 +228,33 @@ static void release_spapr_tce_table(struct rcu_head *head)
unsigned long i, npages = kvmppc_tce_pages(stt->size);
for (i = 0; i < npages; i++)
__free_page(stt->pages[i]);
if (stt->pages[i])
__free_page(stt->pages[i]);
kfree(stt);
}
static struct page *kvm_spapr_get_tce_page(struct kvmppc_spapr_tce_table *stt,
unsigned long sttpage)
{
struct page *page = stt->pages[sttpage];
if (page)
return page;
mutex_lock(&stt->alloc_lock);
page = stt->pages[sttpage];
if (!page) {
page = alloc_page(GFP_KERNEL | __GFP_ZERO);
WARN_ON_ONCE(!page);
if (page)
stt->pages[sttpage] = page;
}
mutex_unlock(&stt->alloc_lock);
return page;
}
static vm_fault_t kvm_spapr_tce_fault(struct vm_fault *vmf)
{
struct kvmppc_spapr_tce_table *stt = vmf->vma->vm_file->private_data;
@ -241,7 +263,10 @@ static vm_fault_t kvm_spapr_tce_fault(struct vm_fault *vmf)
if (vmf->pgoff >= kvmppc_tce_pages(stt->size))
return VM_FAULT_SIGBUS;
page = stt->pages[vmf->pgoff];
page = kvm_spapr_get_tce_page(stt, vmf->pgoff);
if (!page)
return VM_FAULT_OOM;
get_page(page);
vmf->page = page;
return 0;
@ -296,7 +321,6 @@ long kvm_vm_ioctl_create_spapr_tce(struct kvm *kvm,
struct kvmppc_spapr_tce_table *siter;
unsigned long npages, size = args->size;
int ret = -ENOMEM;
int i;
if (!args->size || args->page_shift < 12 || args->page_shift > 34 ||
(args->offset + args->size > (ULLONG_MAX >> args->page_shift)))
@ -318,14 +342,9 @@ long kvm_vm_ioctl_create_spapr_tce(struct kvm *kvm,
stt->offset = args->offset;
stt->size = size;
stt->kvm = kvm;
mutex_init(&stt->alloc_lock);
INIT_LIST_HEAD_RCU(&stt->iommu_tables);
for (i = 0; i < npages; i++) {
stt->pages[i] = alloc_page(GFP_KERNEL | __GFP_ZERO);
if (!stt->pages[i])
goto fail;
}
mutex_lock(&kvm->lock);
/* Check this LIOBN hasn't been previously allocated */
@ -352,17 +371,28 @@ long kvm_vm_ioctl_create_spapr_tce(struct kvm *kvm,
if (ret >= 0)
return ret;
fail:
for (i = 0; i < npages; i++)
if (stt->pages[i])
__free_page(stt->pages[i]);
kfree(stt);
fail_acct:
kvmppc_account_memlimit(kvmppc_stt_pages(npages), false);
return ret;
}
static long kvmppc_tce_to_ua(struct kvm *kvm, unsigned long tce,
unsigned long *ua)
{
unsigned long gfn = tce >> PAGE_SHIFT;
struct kvm_memory_slot *memslot;
memslot = search_memslots(kvm_memslots(kvm), gfn);
if (!memslot)
return -EINVAL;
*ua = __gfn_to_hva_memslot(memslot, gfn) |
(tce & ~(PAGE_MASK | TCE_PCI_READ | TCE_PCI_WRITE));
return 0;
}
static long kvmppc_tce_validate(struct kvmppc_spapr_tce_table *stt,
unsigned long tce)
{
@ -378,7 +408,7 @@ static long kvmppc_tce_validate(struct kvmppc_spapr_tce_table *stt,
if (iommu_tce_check_gpa(stt->page_shift, gpa))
return H_TOO_HARD;
if (kvmppc_tce_to_ua(stt->kvm, tce, &ua, NULL))
if (kvmppc_tce_to_ua(stt->kvm, tce, &ua))
return H_TOO_HARD;
list_for_each_entry_rcu(stit, &stt->iommu_tables, next) {
@ -397,6 +427,36 @@ static long kvmppc_tce_validate(struct kvmppc_spapr_tce_table *stt,
return H_SUCCESS;
}
/*
* Handles TCE requests for emulated devices.
* Puts guest TCE values to the table and expects user space to convert them.
* Cannot fail so kvmppc_tce_validate must be called before it.
*/
static void kvmppc_tce_put(struct kvmppc_spapr_tce_table *stt,
unsigned long idx, unsigned long tce)
{
struct page *page;
u64 *tbl;
unsigned long sttpage;
idx -= stt->offset;
sttpage = idx / TCES_PER_PAGE;
page = stt->pages[sttpage];
if (!page) {
/* We allow any TCE, not just with read|write permissions */
if (!tce)
return;
page = kvm_spapr_get_tce_page(stt, sttpage);
if (!page)
return;
}
tbl = page_to_virt(page);
tbl[idx % TCES_PER_PAGE] = tce;
}
static void kvmppc_clear_tce(struct mm_struct *mm, struct iommu_table *tbl,
unsigned long entry)
{
@ -551,7 +611,7 @@ long kvmppc_h_put_tce(struct kvm_vcpu *vcpu, unsigned long liobn,
dir = iommu_tce_direction(tce);
if ((dir != DMA_NONE) && kvmppc_tce_to_ua(vcpu->kvm, tce, &ua, NULL)) {
if ((dir != DMA_NONE) && kvmppc_tce_to_ua(vcpu->kvm, tce, &ua)) {
ret = H_PARAMETER;
goto unlock_exit;
}
@ -612,7 +672,7 @@ long kvmppc_h_put_tce_indirect(struct kvm_vcpu *vcpu,
return ret;
idx = srcu_read_lock(&vcpu->kvm->srcu);
if (kvmppc_tce_to_ua(vcpu->kvm, tce_list, &ua, NULL)) {
if (kvmppc_tce_to_ua(vcpu->kvm, tce_list, &ua)) {
ret = H_TOO_HARD;
goto unlock_exit;
}
@ -647,7 +707,7 @@ long kvmppc_h_put_tce_indirect(struct kvm_vcpu *vcpu,
}
tce = be64_to_cpu(tce);
if (kvmppc_tce_to_ua(vcpu->kvm, tce, &ua, NULL))
if (kvmppc_tce_to_ua(vcpu->kvm, tce, &ua))
return H_PARAMETER;
list_for_each_entry_lockless(stit, &stt->iommu_tables, next) {

View File

@ -66,8 +66,6 @@
#endif
#define TCES_PER_PAGE (PAGE_SIZE / sizeof(u64))
/*
* Finds a TCE table descriptor by LIOBN.
*
@ -88,6 +86,25 @@ struct kvmppc_spapr_tce_table *kvmppc_find_table(struct kvm *kvm,
EXPORT_SYMBOL_GPL(kvmppc_find_table);
#ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE
static long kvmppc_rm_tce_to_ua(struct kvm *kvm, unsigned long tce,
unsigned long *ua, unsigned long **prmap)
{
unsigned long gfn = tce >> PAGE_SHIFT;
struct kvm_memory_slot *memslot;
memslot = search_memslots(kvm_memslots_raw(kvm), gfn);
if (!memslot)
return -EINVAL;
*ua = __gfn_to_hva_memslot(memslot, gfn) |
(tce & ~(PAGE_MASK | TCE_PCI_READ | TCE_PCI_WRITE));
if (prmap)
*prmap = &memslot->arch.rmap[gfn - memslot->base_gfn];
return 0;
}
/*
* Validates TCE address.
* At the moment flags and page mask are validated.
@ -111,7 +128,7 @@ static long kvmppc_rm_tce_validate(struct kvmppc_spapr_tce_table *stt,
if (iommu_tce_check_gpa(stt->page_shift, gpa))
return H_PARAMETER;
if (kvmppc_tce_to_ua(stt->kvm, tce, &ua, NULL))
if (kvmppc_rm_tce_to_ua(stt->kvm, tce, &ua, NULL))
return H_TOO_HARD;
list_for_each_entry_lockless(stit, &stt->iommu_tables, next) {
@ -129,7 +146,6 @@ static long kvmppc_rm_tce_validate(struct kvmppc_spapr_tce_table *stt,
return H_SUCCESS;
}
#endif /* CONFIG_KVM_BOOK3S_HV_POSSIBLE */
/* Note on the use of page_address() in real mode,
*
@ -161,13 +177,9 @@ static u64 *kvmppc_page_address(struct page *page)
/*
* Handles TCE requests for emulated devices.
* Puts guest TCE values to the table and expects user space to convert them.
* Called in both real and virtual modes.
* Cannot fail so kvmppc_tce_validate must be called before it.
*
* WARNING: This will be called in real-mode on HV KVM and virtual
* mode on PR KVM
* Cannot fail so kvmppc_rm_tce_validate must be called before it.
*/
void kvmppc_tce_put(struct kvmppc_spapr_tce_table *stt,
static void kvmppc_rm_tce_put(struct kvmppc_spapr_tce_table *stt,
unsigned long idx, unsigned long tce)
{
struct page *page;
@ -175,35 +187,48 @@ void kvmppc_tce_put(struct kvmppc_spapr_tce_table *stt,
idx -= stt->offset;
page = stt->pages[idx / TCES_PER_PAGE];
/*
* page must not be NULL in real mode,
* kvmppc_rm_ioba_validate() must have taken care of this.
*/
WARN_ON_ONCE_RM(!page);
tbl = kvmppc_page_address(page);
tbl[idx % TCES_PER_PAGE] = tce;
}
EXPORT_SYMBOL_GPL(kvmppc_tce_put);
long kvmppc_tce_to_ua(struct kvm *kvm, unsigned long tce,
unsigned long *ua, unsigned long **prmap)
/*
* TCEs pages are allocated in kvmppc_rm_tce_put() which won't be able to do so
* in real mode.
* Check if kvmppc_rm_tce_put() can succeed in real mode, i.e. a TCEs page is
* allocated or not required (when clearing a tce entry).
*/
static long kvmppc_rm_ioba_validate(struct kvmppc_spapr_tce_table *stt,
unsigned long ioba, unsigned long npages, bool clearing)
{
unsigned long gfn = tce >> PAGE_SHIFT;
struct kvm_memory_slot *memslot;
unsigned long i, idx, sttpage, sttpages;
unsigned long ret = kvmppc_ioba_validate(stt, ioba, npages);
memslot = search_memslots(kvm_memslots(kvm), gfn);
if (!memslot)
return -EINVAL;
if (ret)
return ret;
/*
* clearing==true says kvmppc_rm_tce_put won't be allocating pages
* for empty tces.
*/
if (clearing)
return H_SUCCESS;
*ua = __gfn_to_hva_memslot(memslot, gfn) |
(tce & ~(PAGE_MASK | TCE_PCI_READ | TCE_PCI_WRITE));
idx = (ioba >> stt->page_shift) - stt->offset;
sttpage = idx / TCES_PER_PAGE;
sttpages = _ALIGN_UP(idx % TCES_PER_PAGE + npages, TCES_PER_PAGE) /
TCES_PER_PAGE;
for (i = sttpage; i < sttpage + sttpages; ++i)
if (!stt->pages[i])
return H_TOO_HARD;
#ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE
if (prmap)
*prmap = &memslot->arch.rmap[gfn - memslot->base_gfn];
#endif
return 0;
return H_SUCCESS;
}
EXPORT_SYMBOL_GPL(kvmppc_tce_to_ua);
#ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE
static long iommu_tce_xchg_rm(struct mm_struct *mm, struct iommu_table *tbl,
unsigned long entry, unsigned long *hpa,
enum dma_data_direction *direction)
@ -381,7 +406,7 @@ long kvmppc_rm_h_put_tce(struct kvm_vcpu *vcpu, unsigned long liobn,
if (!stt)
return H_TOO_HARD;
ret = kvmppc_ioba_validate(stt, ioba, 1);
ret = kvmppc_rm_ioba_validate(stt, ioba, 1, tce == 0);
if (ret != H_SUCCESS)
return ret;
@ -390,7 +415,7 @@ long kvmppc_rm_h_put_tce(struct kvm_vcpu *vcpu, unsigned long liobn,
return ret;
dir = iommu_tce_direction(tce);
if ((dir != DMA_NONE) && kvmppc_tce_to_ua(vcpu->kvm, tce, &ua, NULL))
if ((dir != DMA_NONE) && kvmppc_rm_tce_to_ua(vcpu->kvm, tce, &ua, NULL))
return H_PARAMETER;
entry = ioba >> stt->page_shift;
@ -409,7 +434,7 @@ long kvmppc_rm_h_put_tce(struct kvm_vcpu *vcpu, unsigned long liobn,
}
}
kvmppc_tce_put(stt, entry, tce);
kvmppc_rm_tce_put(stt, entry, tce);
return H_SUCCESS;
}
@ -480,7 +505,7 @@ long kvmppc_rm_h_put_tce_indirect(struct kvm_vcpu *vcpu,
if (tce_list & (SZ_4K - 1))
return H_PARAMETER;
ret = kvmppc_ioba_validate(stt, ioba, npages);
ret = kvmppc_rm_ioba_validate(stt, ioba, npages, false);
if (ret != H_SUCCESS)
return ret;
@ -492,7 +517,7 @@ long kvmppc_rm_h_put_tce_indirect(struct kvm_vcpu *vcpu,
*/
struct mm_iommu_table_group_mem_t *mem;
if (kvmppc_tce_to_ua(vcpu->kvm, tce_list, &ua, NULL))
if (kvmppc_rm_tce_to_ua(vcpu->kvm, tce_list, &ua, NULL))
return H_TOO_HARD;
mem = mm_iommu_lookup_rm(vcpu->kvm->mm, ua, IOMMU_PAGE_SIZE_4K);
@ -508,7 +533,7 @@ long kvmppc_rm_h_put_tce_indirect(struct kvm_vcpu *vcpu,
* We do not require memory to be preregistered in this case
* so lock rmap and do __find_linux_pte_or_hugepte().
*/
if (kvmppc_tce_to_ua(vcpu->kvm, tce_list, &ua, &rmap))
if (kvmppc_rm_tce_to_ua(vcpu->kvm, tce_list, &ua, &rmap))
return H_TOO_HARD;
rmap = (void *) vmalloc_to_phys(rmap);
@ -542,7 +567,7 @@ long kvmppc_rm_h_put_tce_indirect(struct kvm_vcpu *vcpu,
unsigned long tce = be64_to_cpu(((u64 *)tces)[i]);
ua = 0;
if (kvmppc_tce_to_ua(vcpu->kvm, tce, &ua, NULL))
if (kvmppc_rm_tce_to_ua(vcpu->kvm, tce, &ua, NULL))
return H_PARAMETER;
list_for_each_entry_lockless(stit, &stt->iommu_tables, next) {
@ -557,7 +582,7 @@ long kvmppc_rm_h_put_tce_indirect(struct kvm_vcpu *vcpu,
}
}
kvmppc_tce_put(stt, entry + i, tce);
kvmppc_rm_tce_put(stt, entry + i, tce);
}
unlock_exit:
@ -583,7 +608,7 @@ long kvmppc_rm_h_stuff_tce(struct kvm_vcpu *vcpu,
if (!stt)
return H_TOO_HARD;
ret = kvmppc_ioba_validate(stt, ioba, npages);
ret = kvmppc_rm_ioba_validate(stt, ioba, npages, tce_value == 0);
if (ret != H_SUCCESS)
return ret;
@ -610,7 +635,7 @@ long kvmppc_rm_h_stuff_tce(struct kvm_vcpu *vcpu,
}
for (i = 0; i < npages; ++i, ioba += (1ULL << stt->page_shift))
kvmppc_tce_put(stt, ioba >> stt->page_shift, tce_value);
kvmppc_rm_tce_put(stt, ioba >> stt->page_shift, tce_value);
return H_SUCCESS;
}
@ -635,6 +660,10 @@ long kvmppc_h_get_tce(struct kvm_vcpu *vcpu, unsigned long liobn,
idx = (ioba >> stt->page_shift) - stt->offset;
page = stt->pages[idx / TCES_PER_PAGE];
if (!page) {
vcpu->arch.regs.gpr[4] = 0;
return H_SUCCESS;
}
tbl = (u64 *)page_address(page);
vcpu->arch.regs.gpr[4] = tbl[idx % TCES_PER_PAGE];

View File

@ -750,7 +750,7 @@ static bool kvmppc_doorbell_pending(struct kvm_vcpu *vcpu)
/*
* Ensure that the read of vcore->dpdes comes after the read
* of vcpu->doorbell_request. This barrier matches the
* smb_wmb() in kvmppc_guest_entry_inject().
* smp_wmb() in kvmppc_guest_entry_inject().
*/
smp_rmb();
vc = vcpu->arch.vcore;
@ -802,6 +802,80 @@ static int kvmppc_h_set_mode(struct kvm_vcpu *vcpu, unsigned long mflags,
}
}
/* Copy guest memory in place - must reside within a single memslot */
static int kvmppc_copy_guest(struct kvm *kvm, gpa_t to, gpa_t from,
unsigned long len)
{
struct kvm_memory_slot *to_memslot = NULL;
struct kvm_memory_slot *from_memslot = NULL;
unsigned long to_addr, from_addr;
int r;
/* Get HPA for from address */
from_memslot = gfn_to_memslot(kvm, from >> PAGE_SHIFT);
if (!from_memslot)
return -EFAULT;
if ((from + len) >= ((from_memslot->base_gfn + from_memslot->npages)
<< PAGE_SHIFT))
return -EINVAL;
from_addr = gfn_to_hva_memslot(from_memslot, from >> PAGE_SHIFT);
if (kvm_is_error_hva(from_addr))
return -EFAULT;
from_addr |= (from & (PAGE_SIZE - 1));
/* Get HPA for to address */
to_memslot = gfn_to_memslot(kvm, to >> PAGE_SHIFT);
if (!to_memslot)
return -EFAULT;
if ((to + len) >= ((to_memslot->base_gfn + to_memslot->npages)
<< PAGE_SHIFT))
return -EINVAL;
to_addr = gfn_to_hva_memslot(to_memslot, to >> PAGE_SHIFT);
if (kvm_is_error_hva(to_addr))
return -EFAULT;
to_addr |= (to & (PAGE_SIZE - 1));
/* Perform copy */
r = raw_copy_in_user((void __user *)to_addr, (void __user *)from_addr,
len);
if (r)
return -EFAULT;
mark_page_dirty(kvm, to >> PAGE_SHIFT);
return 0;
}
static long kvmppc_h_page_init(struct kvm_vcpu *vcpu, unsigned long flags,
unsigned long dest, unsigned long src)
{
u64 pg_sz = SZ_4K; /* 4K page size */
u64 pg_mask = SZ_4K - 1;
int ret;
/* Check for invalid flags (H_PAGE_SET_LOANED covers all CMO flags) */
if (flags & ~(H_ICACHE_INVALIDATE | H_ICACHE_SYNCHRONIZE |
H_ZERO_PAGE | H_COPY_PAGE | H_PAGE_SET_LOANED))
return H_PARAMETER;
/* dest (and src if copy_page flag set) must be page aligned */
if ((dest & pg_mask) || ((flags & H_COPY_PAGE) && (src & pg_mask)))
return H_PARAMETER;
/* zero and/or copy the page as determined by the flags */
if (flags & H_COPY_PAGE) {
ret = kvmppc_copy_guest(vcpu->kvm, dest, src, pg_sz);
if (ret < 0)
return H_PARAMETER;
} else if (flags & H_ZERO_PAGE) {
ret = kvm_clear_guest(vcpu->kvm, dest, pg_sz);
if (ret < 0)
return H_PARAMETER;
}
/* We can ignore the remaining flags */
return H_SUCCESS;
}
static int kvm_arch_vcpu_yield_to(struct kvm_vcpu *target)
{
struct kvmppc_vcore *vcore = target->arch.vcore;
@ -1004,6 +1078,11 @@ int kvmppc_pseries_do_hcall(struct kvm_vcpu *vcpu)
if (nesting_enabled(vcpu->kvm))
ret = kvmhv_copy_tofrom_guest_nested(vcpu);
break;
case H_PAGE_INIT:
ret = kvmppc_h_page_init(vcpu, kvmppc_get_gpr(vcpu, 4),
kvmppc_get_gpr(vcpu, 5),
kvmppc_get_gpr(vcpu, 6));
break;
default:
return RESUME_HOST;
}
@ -1048,6 +1127,7 @@ static int kvmppc_hcall_impl_hv(unsigned long cmd)
case H_IPOLL:
case H_XIRR_X:
#endif
case H_PAGE_INIT:
return 1;
}
@ -2505,37 +2585,6 @@ static void kvmppc_prepare_radix_vcpu(struct kvm_vcpu *vcpu, int pcpu)
}
}
static void kvmppc_radix_check_need_tlb_flush(struct kvm *kvm, int pcpu,
struct kvm_nested_guest *nested)
{
cpumask_t *need_tlb_flush;
int lpid;
if (!cpu_has_feature(CPU_FTR_HVMODE))
return;
if (cpu_has_feature(CPU_FTR_ARCH_300))
pcpu &= ~0x3UL;
if (nested) {
lpid = nested->shadow_lpid;
need_tlb_flush = &nested->need_tlb_flush;
} else {
lpid = kvm->arch.lpid;
need_tlb_flush = &kvm->arch.need_tlb_flush;
}
mtspr(SPRN_LPID, lpid);
isync();
smp_mb();
if (cpumask_test_cpu(pcpu, need_tlb_flush)) {
radix__local_flush_tlb_lpid_guest(lpid);
/* Clear the bit after the TLB flush */
cpumask_clear_cpu(pcpu, need_tlb_flush);
}
}
static void kvmppc_start_thread(struct kvm_vcpu *vcpu, struct kvmppc_vcore *vc)
{
int cpu;
@ -3229,19 +3278,11 @@ static noinline void kvmppc_run_core(struct kvmppc_vcore *vc)
for (sub = 0; sub < core_info.n_subcores; ++sub)
spin_unlock(&core_info.vc[sub]->lock);
if (kvm_is_radix(vc->kvm)) {
/*
* Do we need to flush the process scoped TLB for the LPAR?
*
* On POWER9, individual threads can come in here, but the
* TLB is shared between the 4 threads in a core, hence
* invalidating on one thread invalidates for all.
* Thus we make all 4 threads use the same bit here.
*
* Hash must be flushed in realmode in order to use tlbiel.
*/
kvmppc_radix_check_need_tlb_flush(vc->kvm, pcpu, NULL);
}
guest_enter_irqoff();
srcu_idx = srcu_read_lock(&vc->kvm->srcu);
this_cpu_disable_ftrace();
/*
* Interrupts will be enabled once we get into the guest,
@ -3249,19 +3290,14 @@ static noinline void kvmppc_run_core(struct kvmppc_vcore *vc)
*/
trace_hardirqs_on();
guest_enter_irqoff();
srcu_idx = srcu_read_lock(&vc->kvm->srcu);
this_cpu_disable_ftrace();
trap = __kvmppc_vcore_entry();
trace_hardirqs_off();
this_cpu_enable_ftrace();
srcu_read_unlock(&vc->kvm->srcu, srcu_idx);
trace_hardirqs_off();
set_irq_happened(trap);
spin_lock(&vc->lock);
@ -3514,6 +3550,7 @@ int kvmhv_p9_guest_entry(struct kvm_vcpu *vcpu, u64 time_limit,
#ifdef CONFIG_ALTIVEC
load_vr_state(&vcpu->arch.vr);
#endif
mtspr(SPRN_VRSAVE, vcpu->arch.vrsave);
mtspr(SPRN_DSCR, vcpu->arch.dscr);
mtspr(SPRN_IAMR, vcpu->arch.iamr);
@ -3605,6 +3642,7 @@ int kvmhv_p9_guest_entry(struct kvm_vcpu *vcpu, u64 time_limit,
#ifdef CONFIG_ALTIVEC
store_vr_state(&vcpu->arch.vr);
#endif
vcpu->arch.vrsave = mfspr(SPRN_VRSAVE);
if (cpu_has_feature(CPU_FTR_TM) ||
cpu_has_feature(CPU_FTR_P9_TM_HV_ASSIST))
@ -3970,7 +4008,7 @@ int kvmhv_run_single_vcpu(struct kvm_run *kvm_run,
unsigned long lpcr)
{
int trap, r, pcpu;
int srcu_idx;
int srcu_idx, lpid;
struct kvmppc_vcore *vc;
struct kvm *kvm = vcpu->kvm;
struct kvm_nested_guest *nested = vcpu->arch.nested;
@ -4046,8 +4084,12 @@ int kvmhv_run_single_vcpu(struct kvm_run *kvm_run,
vc->vcore_state = VCORE_RUNNING;
trace_kvmppc_run_core(vc, 0);
if (cpu_has_feature(CPU_FTR_HVMODE))
kvmppc_radix_check_need_tlb_flush(kvm, pcpu, nested);
if (cpu_has_feature(CPU_FTR_HVMODE)) {
lpid = nested ? nested->shadow_lpid : kvm->arch.lpid;
mtspr(SPRN_LPID, lpid);
isync();
kvmppc_check_need_tlb_flush(kvm, pcpu, nested);
}
trace_hardirqs_on();
guest_enter_irqoff();

View File

@ -805,3 +805,60 @@ void kvmppc_guest_entry_inject_int(struct kvm_vcpu *vcpu)
vcpu->arch.doorbell_request = 0;
}
}
static void flush_guest_tlb(struct kvm *kvm)
{
unsigned long rb, set;
rb = PPC_BIT(52); /* IS = 2 */
if (kvm_is_radix(kvm)) {
/* R=1 PRS=1 RIC=2 */
asm volatile(PPC_TLBIEL(%0, %4, %3, %2, %1)
: : "r" (rb), "i" (1), "i" (1), "i" (2),
"r" (0) : "memory");
for (set = 1; set < kvm->arch.tlb_sets; ++set) {
rb += PPC_BIT(51); /* increment set number */
/* R=1 PRS=1 RIC=0 */
asm volatile(PPC_TLBIEL(%0, %4, %3, %2, %1)
: : "r" (rb), "i" (1), "i" (1), "i" (0),
"r" (0) : "memory");
}
} else {
for (set = 0; set < kvm->arch.tlb_sets; ++set) {
/* R=0 PRS=0 RIC=0 */
asm volatile(PPC_TLBIEL(%0, %4, %3, %2, %1)
: : "r" (rb), "i" (0), "i" (0), "i" (0),
"r" (0) : "memory");
rb += PPC_BIT(51); /* increment set number */
}
}
asm volatile("ptesync": : :"memory");
}
void kvmppc_check_need_tlb_flush(struct kvm *kvm, int pcpu,
struct kvm_nested_guest *nested)
{
cpumask_t *need_tlb_flush;
/*
* On POWER9, individual threads can come in here, but the
* TLB is shared between the 4 threads in a core, hence
* invalidating on one thread invalidates for all.
* Thus we make all 4 threads use the same bit.
*/
if (cpu_has_feature(CPU_FTR_ARCH_300))
pcpu = cpu_first_thread_sibling(pcpu);
if (nested)
need_tlb_flush = &nested->need_tlb_flush;
else
need_tlb_flush = &kvm->arch.need_tlb_flush;
if (cpumask_test_cpu(pcpu, need_tlb_flush)) {
flush_guest_tlb(kvm);
/* Clear the bit after the TLB flush */
cpumask_clear_cpu(pcpu, need_tlb_flush);
}
}
EXPORT_SYMBOL_GPL(kvmppc_check_need_tlb_flush);

View File

@ -13,6 +13,7 @@
#include <linux/hugetlb.h>
#include <linux/module.h>
#include <linux/log2.h>
#include <linux/sizes.h>
#include <asm/trace.h>
#include <asm/kvm_ppc.h>
@ -867,6 +868,149 @@ long kvmppc_h_clear_mod(struct kvm_vcpu *vcpu, unsigned long flags,
return ret;
}
static int kvmppc_get_hpa(struct kvm_vcpu *vcpu, unsigned long gpa,
int writing, unsigned long *hpa,
struct kvm_memory_slot **memslot_p)
{
struct kvm *kvm = vcpu->kvm;
struct kvm_memory_slot *memslot;
unsigned long gfn, hva, pa, psize = PAGE_SHIFT;
unsigned int shift;
pte_t *ptep, pte;
/* Find the memslot for this address */
gfn = gpa >> PAGE_SHIFT;
memslot = __gfn_to_memslot(kvm_memslots_raw(kvm), gfn);
if (!memslot || (memslot->flags & KVM_MEMSLOT_INVALID))
return H_PARAMETER;
/* Translate to host virtual address */
hva = __gfn_to_hva_memslot(memslot, gfn);
/* Try to find the host pte for that virtual address */
ptep = __find_linux_pte(vcpu->arch.pgdir, hva, NULL, &shift);
if (!ptep)
return H_TOO_HARD;
pte = kvmppc_read_update_linux_pte(ptep, writing);
if (!pte_present(pte))
return H_TOO_HARD;
/* Convert to a physical address */
if (shift)
psize = 1UL << shift;
pa = pte_pfn(pte) << PAGE_SHIFT;
pa |= hva & (psize - 1);
pa |= gpa & ~PAGE_MASK;
if (hpa)
*hpa = pa;
if (memslot_p)
*memslot_p = memslot;
return H_SUCCESS;
}
static long kvmppc_do_h_page_init_zero(struct kvm_vcpu *vcpu,
unsigned long dest)
{
struct kvm_memory_slot *memslot;
struct kvm *kvm = vcpu->kvm;
unsigned long pa, mmu_seq;
long ret = H_SUCCESS;
int i;
/* Used later to detect if we might have been invalidated */
mmu_seq = kvm->mmu_notifier_seq;
smp_rmb();
ret = kvmppc_get_hpa(vcpu, dest, 1, &pa, &memslot);
if (ret != H_SUCCESS)
return ret;
/* Check if we've been invalidated */
raw_spin_lock(&kvm->mmu_lock.rlock);
if (mmu_notifier_retry(kvm, mmu_seq)) {
ret = H_TOO_HARD;
goto out_unlock;
}
/* Zero the page */
for (i = 0; i < SZ_4K; i += L1_CACHE_BYTES, pa += L1_CACHE_BYTES)
dcbz((void *)pa);
kvmppc_update_dirty_map(memslot, dest >> PAGE_SHIFT, PAGE_SIZE);
out_unlock:
raw_spin_unlock(&kvm->mmu_lock.rlock);
return ret;
}
static long kvmppc_do_h_page_init_copy(struct kvm_vcpu *vcpu,
unsigned long dest, unsigned long src)
{
unsigned long dest_pa, src_pa, mmu_seq;
struct kvm_memory_slot *dest_memslot;
struct kvm *kvm = vcpu->kvm;
long ret = H_SUCCESS;
/* Used later to detect if we might have been invalidated */
mmu_seq = kvm->mmu_notifier_seq;
smp_rmb();
ret = kvmppc_get_hpa(vcpu, dest, 1, &dest_pa, &dest_memslot);
if (ret != H_SUCCESS)
return ret;
ret = kvmppc_get_hpa(vcpu, src, 0, &src_pa, NULL);
if (ret != H_SUCCESS)
return ret;
/* Check if we've been invalidated */
raw_spin_lock(&kvm->mmu_lock.rlock);
if (mmu_notifier_retry(kvm, mmu_seq)) {
ret = H_TOO_HARD;
goto out_unlock;
}
/* Copy the page */
memcpy((void *)dest_pa, (void *)src_pa, SZ_4K);
kvmppc_update_dirty_map(dest_memslot, dest >> PAGE_SHIFT, PAGE_SIZE);
out_unlock:
raw_spin_unlock(&kvm->mmu_lock.rlock);
return ret;
}
long kvmppc_rm_h_page_init(struct kvm_vcpu *vcpu, unsigned long flags,
unsigned long dest, unsigned long src)
{
struct kvm *kvm = vcpu->kvm;
u64 pg_mask = SZ_4K - 1; /* 4K page size */
long ret = H_SUCCESS;
/* Don't handle radix mode here, go up to the virtual mode handler */
if (kvm_is_radix(kvm))
return H_TOO_HARD;
/* Check for invalid flags (H_PAGE_SET_LOANED covers all CMO flags) */
if (flags & ~(H_ICACHE_INVALIDATE | H_ICACHE_SYNCHRONIZE |
H_ZERO_PAGE | H_COPY_PAGE | H_PAGE_SET_LOANED))
return H_PARAMETER;
/* dest (and src if copy_page flag set) must be page aligned */
if ((dest & pg_mask) || ((flags & H_COPY_PAGE) && (src & pg_mask)))
return H_PARAMETER;
/* zero and/or copy the page as determined by the flags */
if (flags & H_COPY_PAGE)
ret = kvmppc_do_h_page_init_copy(vcpu, dest, src);
else if (flags & H_ZERO_PAGE)
ret = kvmppc_do_h_page_init_zero(vcpu, dest);
/* We can ignore the other flags */
return ret;
}
void kvmppc_invalidate_hpte(struct kvm *kvm, __be64 *hptep,
unsigned long pte_index)
{

View File

@ -589,11 +589,8 @@ kvmppc_hv_entry:
1:
#endif
/* Use cr7 as an indication of radix mode */
ld r5, HSTATE_KVM_VCORE(r13)
ld r9, VCORE_KVM(r5) /* pointer to struct kvm */
lbz r0, KVM_RADIX(r9)
cmpwi cr7, r0, 0
/*
* POWER7/POWER8 host -> guest partition switch code.
@ -616,9 +613,6 @@ kvmppc_hv_entry:
cmpwi r6,0
bne 10f
/* Radix has already switched LPID and flushed core TLB */
bne cr7, 22f
lwz r7,KVM_LPID(r9)
BEGIN_FTR_SECTION
ld r6,KVM_SDR1(r9)
@ -630,41 +624,13 @@ END_FTR_SECTION_IFCLR(CPU_FTR_ARCH_300)
mtspr SPRN_LPID,r7
isync
/* See if we need to flush the TLB. Hash has to be done in RM */
lhz r6,PACAPACAINDEX(r13) /* test_bit(cpu, need_tlb_flush) */
BEGIN_FTR_SECTION
/*
* On POWER9, individual threads can come in here, but the
* TLB is shared between the 4 threads in a core, hence
* invalidating on one thread invalidates for all.
* Thus we make all 4 threads use the same bit here.
*/
clrrdi r6,r6,2
END_FTR_SECTION_IFSET(CPU_FTR_ARCH_300)
clrldi r7,r6,64-6 /* extract bit number (6 bits) */
srdi r6,r6,6 /* doubleword number */
sldi r6,r6,3 /* address offset */
add r6,r6,r9
addi r6,r6,KVM_NEED_FLUSH /* dword in kvm->arch.need_tlb_flush */
li r8,1
sld r8,r8,r7
ld r7,0(r6)
and. r7,r7,r8
beq 22f
/* Flush the TLB of any entries for this LPID */
lwz r0,KVM_TLB_SETS(r9)
mtctr r0
li r7,0x800 /* IS field = 0b10 */
ptesync
li r0,0 /* RS for P9 version of tlbiel */
28: tlbiel r7 /* On P9, rs=0, RIC=0, PRS=0, R=0 */
addi r7,r7,0x1000
bdnz 28b
ptesync
23: ldarx r7,0,r6 /* clear the bit after TLB flushed */
andc r7,r7,r8
stdcx. r7,0,r6
bne 23b
/* See if we need to flush the TLB. */
mr r3, r9 /* kvm pointer */
lhz r4, PACAPACAINDEX(r13) /* physical cpu number */
li r5, 0 /* nested vcpu pointer */
bl kvmppc_check_need_tlb_flush
nop
ld r5, HSTATE_KVM_VCORE(r13)
/* Add timebase offset onto timebase */
22: ld r8,VCORE_TB_OFFSET(r5)
@ -980,17 +946,27 @@ ALT_FTR_SECTION_END_IFCLR(CPU_FTR_ARCH_300)
#ifdef CONFIG_KVM_XICS
/* We are entering the guest on that thread, push VCPU to XIVE */
ld r10, HSTATE_XIVE_TIMA_PHYS(r13)
cmpldi cr0, r10, 0
beq no_xive
ld r11, VCPU_XIVE_SAVED_STATE(r4)
li r9, TM_QW1_OS
lwz r8, VCPU_XIVE_CAM_WORD(r4)
li r7, TM_QW1_OS + TM_WORD2
mfmsr r0
andi. r0, r0, MSR_DR /* in real mode? */
beq 2f
ld r10, HSTATE_XIVE_TIMA_VIRT(r13)
cmpldi cr1, r10, 0
beq cr1, no_xive
eieio
stdx r11,r9,r10
stwx r8,r7,r10
b 3f
2: ld r10, HSTATE_XIVE_TIMA_PHYS(r13)
cmpldi cr1, r10, 0
beq cr1, no_xive
eieio
stdcix r11,r9,r10
lwz r11, VCPU_XIVE_CAM_WORD(r4)
li r9, TM_QW1_OS + TM_WORD2
stwcix r11,r9,r10
li r9, 1
stwcix r8,r7,r10
3: li r9, 1
stb r9, VCPU_XIVE_PUSHED(r4)
eieio
@ -1009,12 +985,16 @@ ALT_FTR_SECTION_END_IFCLR(CPU_FTR_ARCH_300)
* on, we mask it.
*/
lbz r0, VCPU_XIVE_ESC_ON(r4)
cmpwi r0,0
beq 1f
ld r10, VCPU_XIVE_ESC_RADDR(r4)
cmpwi cr1, r0,0
beq cr1, 1f
li r9, XIVE_ESB_SET_PQ_01
beq 4f /* in real mode? */
ld r10, VCPU_XIVE_ESC_VADDR(r4)
ldx r0, r10, r9
b 5f
4: ld r10, VCPU_XIVE_ESC_RADDR(r4)
ldcix r0, r10, r9
sync
5: sync
/* We have a possible subtle race here: The escalation interrupt might
* have fired and be on its way to the host queue while we mask it,
@ -2292,7 +2272,7 @@ hcall_real_table:
#endif
.long 0 /* 0x24 - H_SET_SPRG0 */
.long DOTSYM(kvmppc_h_set_dabr) - hcall_real_table
.long 0 /* 0x2c */
.long DOTSYM(kvmppc_rm_h_page_init) - hcall_real_table
.long 0 /* 0x30 */
.long 0 /* 0x34 */
.long 0 /* 0x38 */

View File

@ -166,7 +166,8 @@ static irqreturn_t xive_esc_irq(int irq, void *data)
return IRQ_HANDLED;
}
static int xive_attach_escalation(struct kvm_vcpu *vcpu, u8 prio)
int kvmppc_xive_attach_escalation(struct kvm_vcpu *vcpu, u8 prio,
bool single_escalation)
{
struct kvmppc_xive_vcpu *xc = vcpu->arch.xive_vcpu;
struct xive_q *q = &xc->queues[prio];
@ -185,7 +186,7 @@ static int xive_attach_escalation(struct kvm_vcpu *vcpu, u8 prio)
return -EIO;
}
if (xc->xive->single_escalation)
if (single_escalation)
name = kasprintf(GFP_KERNEL, "kvm-%d-%d",
vcpu->kvm->arch.lpid, xc->server_num);
else
@ -217,7 +218,7 @@ static int xive_attach_escalation(struct kvm_vcpu *vcpu, u8 prio)
* interrupt, thus leaving it effectively masked after
* it fires once.
*/
if (xc->xive->single_escalation) {
if (single_escalation) {
struct irq_data *d = irq_get_irq_data(xc->esc_virq[prio]);
struct xive_irq_data *xd = irq_data_get_irq_handler_data(d);
@ -291,7 +292,8 @@ static int xive_check_provisioning(struct kvm *kvm, u8 prio)
continue;
rc = xive_provision_queue(vcpu, prio);
if (rc == 0 && !xive->single_escalation)
xive_attach_escalation(vcpu, prio);
kvmppc_xive_attach_escalation(vcpu, prio,
xive->single_escalation);
if (rc)
return rc;
}
@ -342,7 +344,7 @@ static int xive_try_pick_queue(struct kvm_vcpu *vcpu, u8 prio)
return atomic_add_unless(&q->count, 1, max) ? 0 : -EBUSY;
}
static int xive_select_target(struct kvm *kvm, u32 *server, u8 prio)
int kvmppc_xive_select_target(struct kvm *kvm, u32 *server, u8 prio)
{
struct kvm_vcpu *vcpu;
int i, rc;
@ -380,11 +382,6 @@ static int xive_select_target(struct kvm *kvm, u32 *server, u8 prio)
return -EBUSY;
}
static u32 xive_vp(struct kvmppc_xive *xive, u32 server)
{
return xive->vp_base + kvmppc_pack_vcpu_id(xive->kvm, server);
}
static u8 xive_lock_and_mask(struct kvmppc_xive *xive,
struct kvmppc_xive_src_block *sb,
struct kvmppc_xive_irq_state *state)
@ -430,8 +427,8 @@ static u8 xive_lock_and_mask(struct kvmppc_xive *xive,
*/
if (xd->flags & OPAL_XIVE_IRQ_MASK_VIA_FW) {
xive_native_configure_irq(hw_num,
xive_vp(xive, state->act_server),
MASKED, state->number);
kvmppc_xive_vp(xive, state->act_server),
MASKED, state->number);
/* set old_p so we can track if an H_EOI was done */
state->old_p = true;
state->old_q = false;
@ -486,8 +483,8 @@ static void xive_finish_unmask(struct kvmppc_xive *xive,
*/
if (xd->flags & OPAL_XIVE_IRQ_MASK_VIA_FW) {
xive_native_configure_irq(hw_num,
xive_vp(xive, state->act_server),
state->act_priority, state->number);
kvmppc_xive_vp(xive, state->act_server),
state->act_priority, state->number);
/* If an EOI is needed, do it here */
if (!state->old_p)
xive_vm_source_eoi(hw_num, xd);
@ -535,7 +532,7 @@ static int xive_target_interrupt(struct kvm *kvm,
* priority. The count for that new target will have
* already been incremented.
*/
rc = xive_select_target(kvm, &server, prio);
rc = kvmppc_xive_select_target(kvm, &server, prio);
/*
* We failed to find a target ? Not much we can do
@ -563,7 +560,7 @@ static int xive_target_interrupt(struct kvm *kvm,
kvmppc_xive_select_irq(state, &hw_num, NULL);
return xive_native_configure_irq(hw_num,
xive_vp(xive, server),
kvmppc_xive_vp(xive, server),
prio, state->number);
}
@ -849,7 +846,8 @@ int kvmppc_xive_set_icp(struct kvm_vcpu *vcpu, u64 icpval)
/*
* We can't update the state of a "pushed" VCPU, but that
* shouldn't happen.
* shouldn't happen because the vcpu->mutex makes running a
* vcpu mutually exclusive with doing one_reg get/set on it.
*/
if (WARN_ON(vcpu->arch.xive_pushed))
return -EIO;
@ -940,6 +938,13 @@ int kvmppc_xive_set_mapped(struct kvm *kvm, unsigned long guest_irq,
/* Turn the IPI hard off */
xive_vm_esb_load(&state->ipi_data, XIVE_ESB_SET_PQ_01);
/*
* Reset ESB guest mapping. Needed when ESB pages are exposed
* to the guest in XIVE native mode
*/
if (xive->ops && xive->ops->reset_mapped)
xive->ops->reset_mapped(kvm, guest_irq);
/* Grab info about irq */
state->pt_number = hw_irq;
state->pt_data = irq_data_get_irq_handler_data(host_data);
@ -951,7 +956,7 @@ int kvmppc_xive_set_mapped(struct kvm *kvm, unsigned long guest_irq,
* which is fine for a never started interrupt.
*/
xive_native_configure_irq(hw_irq,
xive_vp(xive, state->act_server),
kvmppc_xive_vp(xive, state->act_server),
state->act_priority, state->number);
/*
@ -1025,9 +1030,17 @@ int kvmppc_xive_clr_mapped(struct kvm *kvm, unsigned long guest_irq,
state->pt_number = 0;
state->pt_data = NULL;
/*
* Reset ESB guest mapping. Needed when ESB pages are exposed
* to the guest in XIVE native mode
*/
if (xive->ops && xive->ops->reset_mapped) {
xive->ops->reset_mapped(kvm, guest_irq);
}
/* Reconfigure the IPI */
xive_native_configure_irq(state->ipi_number,
xive_vp(xive, state->act_server),
kvmppc_xive_vp(xive, state->act_server),
state->act_priority, state->number);
/*
@ -1049,7 +1062,7 @@ int kvmppc_xive_clr_mapped(struct kvm *kvm, unsigned long guest_irq,
}
EXPORT_SYMBOL_GPL(kvmppc_xive_clr_mapped);
static void kvmppc_xive_disable_vcpu_interrupts(struct kvm_vcpu *vcpu)
void kvmppc_xive_disable_vcpu_interrupts(struct kvm_vcpu *vcpu)
{
struct kvmppc_xive_vcpu *xc = vcpu->arch.xive_vcpu;
struct kvm *kvm = vcpu->kvm;
@ -1083,14 +1096,35 @@ static void kvmppc_xive_disable_vcpu_interrupts(struct kvm_vcpu *vcpu)
arch_spin_unlock(&sb->lock);
}
}
/* Disable vcpu's escalation interrupt */
if (vcpu->arch.xive_esc_on) {
__raw_readq((void __iomem *)(vcpu->arch.xive_esc_vaddr +
XIVE_ESB_SET_PQ_01));
vcpu->arch.xive_esc_on = false;
}
/*
* Clear pointers to escalation interrupt ESB.
* This is safe because the vcpu->mutex is held, preventing
* any other CPU from concurrently executing a KVM_RUN ioctl.
*/
vcpu->arch.xive_esc_vaddr = 0;
vcpu->arch.xive_esc_raddr = 0;
}
void kvmppc_xive_cleanup_vcpu(struct kvm_vcpu *vcpu)
{
struct kvmppc_xive_vcpu *xc = vcpu->arch.xive_vcpu;
struct kvmppc_xive *xive = xc->xive;
struct kvmppc_xive *xive = vcpu->kvm->arch.xive;
int i;
if (!kvmppc_xics_enabled(vcpu))
return;
if (!xc)
return;
pr_devel("cleanup_vcpu(cpu=%d)\n", xc->server_num);
/* Ensure no interrupt is still routed to that VP */
@ -1129,6 +1163,10 @@ void kvmppc_xive_cleanup_vcpu(struct kvm_vcpu *vcpu)
}
/* Free the VP */
kfree(xc);
/* Cleanup the vcpu */
vcpu->arch.irq_type = KVMPPC_IRQ_DEFAULT;
vcpu->arch.xive_vcpu = NULL;
}
int kvmppc_xive_connect_vcpu(struct kvm_device *dev,
@ -1146,7 +1184,7 @@ int kvmppc_xive_connect_vcpu(struct kvm_device *dev,
}
if (xive->kvm != vcpu->kvm)
return -EPERM;
if (vcpu->arch.irq_type)
if (vcpu->arch.irq_type != KVMPPC_IRQ_DEFAULT)
return -EBUSY;
if (kvmppc_xive_find_server(vcpu->kvm, cpu)) {
pr_devel("Duplicate !\n");
@ -1166,7 +1204,7 @@ int kvmppc_xive_connect_vcpu(struct kvm_device *dev,
xc->xive = xive;
xc->vcpu = vcpu;
xc->server_num = cpu;
xc->vp_id = xive_vp(xive, cpu);
xc->vp_id = kvmppc_xive_vp(xive, cpu);
xc->mfrr = 0xff;
xc->valid = true;
@ -1219,7 +1257,8 @@ int kvmppc_xive_connect_vcpu(struct kvm_device *dev,
if (xive->qmap & (1 << i)) {
r = xive_provision_queue(vcpu, i);
if (r == 0 && !xive->single_escalation)
xive_attach_escalation(vcpu, i);
kvmppc_xive_attach_escalation(
vcpu, i, xive->single_escalation);
if (r)
goto bail;
} else {
@ -1234,7 +1273,7 @@ int kvmppc_xive_connect_vcpu(struct kvm_device *dev,
}
/* If not done above, attach priority 0 escalation */
r = xive_attach_escalation(vcpu, 0);
r = kvmppc_xive_attach_escalation(vcpu, 0, xive->single_escalation);
if (r)
goto bail;
@ -1485,8 +1524,8 @@ static int xive_get_source(struct kvmppc_xive *xive, long irq, u64 addr)
return 0;
}
static struct kvmppc_xive_src_block *xive_create_src_block(struct kvmppc_xive *xive,
int irq)
struct kvmppc_xive_src_block *kvmppc_xive_create_src_block(
struct kvmppc_xive *xive, int irq)
{
struct kvm *kvm = xive->kvm;
struct kvmppc_xive_src_block *sb;
@ -1509,6 +1548,7 @@ static struct kvmppc_xive_src_block *xive_create_src_block(struct kvmppc_xive *x
for (i = 0; i < KVMPPC_XICS_IRQ_PER_ICS; i++) {
sb->irq_state[i].number = (bid << KVMPPC_XICS_ICS_SHIFT) | i;
sb->irq_state[i].eisn = 0;
sb->irq_state[i].guest_priority = MASKED;
sb->irq_state[i].saved_priority = MASKED;
sb->irq_state[i].act_priority = MASKED;
@ -1565,7 +1605,7 @@ static int xive_set_source(struct kvmppc_xive *xive, long irq, u64 addr)
sb = kvmppc_xive_find_source(xive, irq, &idx);
if (!sb) {
pr_devel("No source, creating source block...\n");
sb = xive_create_src_block(xive, irq);
sb = kvmppc_xive_create_src_block(xive, irq);
if (!sb) {
pr_devel("Failed to create block...\n");
return -ENOMEM;
@ -1789,7 +1829,7 @@ static void kvmppc_xive_cleanup_irq(u32 hw_num, struct xive_irq_data *xd)
xive_cleanup_irq_data(xd);
}
static void kvmppc_xive_free_sources(struct kvmppc_xive_src_block *sb)
void kvmppc_xive_free_sources(struct kvmppc_xive_src_block *sb)
{
int i;
@ -1810,16 +1850,55 @@ static void kvmppc_xive_free_sources(struct kvmppc_xive_src_block *sb)
}
}
static void kvmppc_xive_free(struct kvm_device *dev)
/*
* Called when device fd is closed. kvm->lock is held.
*/
static void kvmppc_xive_release(struct kvm_device *dev)
{
struct kvmppc_xive *xive = dev->private;
struct kvm *kvm = xive->kvm;
struct kvm_vcpu *vcpu;
int i;
int was_ready;
pr_devel("Releasing xive device\n");
debugfs_remove(xive->dentry);
if (kvm)
kvm->arch.xive = NULL;
/*
* Clearing mmu_ready temporarily while holding kvm->lock
* is a way of ensuring that no vcpus can enter the guest
* until we drop kvm->lock. Doing kick_all_cpus_sync()
* ensures that any vcpu executing inside the guest has
* exited the guest. Once kick_all_cpus_sync() has finished,
* we know that no vcpu can be executing the XIVE push or
* pull code, or executing a XICS hcall.
*
* Since this is the device release function, we know that
* userspace does not have any open fd referring to the
* device. Therefore there can not be any of the device
* attribute set/get functions being executed concurrently,
* and similarly, the connect_vcpu and set/clr_mapped
* functions also cannot be being executed.
*/
was_ready = kvm->arch.mmu_ready;
kvm->arch.mmu_ready = 0;
kick_all_cpus_sync();
/*
* We should clean up the vCPU interrupt presenters first.
*/
kvm_for_each_vcpu(i, vcpu, kvm) {
/*
* Take vcpu->mutex to ensure that no one_reg get/set ioctl
* (i.e. kvmppc_xive_[gs]et_icp) can be done concurrently.
*/
mutex_lock(&vcpu->mutex);
kvmppc_xive_cleanup_vcpu(vcpu);
mutex_unlock(&vcpu->mutex);
}
kvm->arch.xive = NULL;
/* Mask and free interrupts */
for (i = 0; i <= xive->max_sbid; i++) {
@ -1832,11 +1911,47 @@ static void kvmppc_xive_free(struct kvm_device *dev)
if (xive->vp_base != XIVE_INVALID_VP)
xive_native_free_vp_block(xive->vp_base);
kvm->arch.mmu_ready = was_ready;
/*
* A reference of the kvmppc_xive pointer is now kept under
* the xive_devices struct of the machine for reuse. It is
* freed when the VM is destroyed for now until we fix all the
* execution paths.
*/
kfree(xive);
kfree(dev);
}
/*
* When the guest chooses the interrupt mode (XICS legacy or XIVE
* native), the VM will switch of KVM device. The previous device will
* be "released" before the new one is created.
*
* Until we are sure all execution paths are well protected, provide a
* fail safe (transitional) method for device destruction, in which
* the XIVE device pointer is recycled and not directly freed.
*/
struct kvmppc_xive *kvmppc_xive_get_device(struct kvm *kvm, u32 type)
{
struct kvmppc_xive **kvm_xive_device = type == KVM_DEV_TYPE_XIVE ?
&kvm->arch.xive_devices.native :
&kvm->arch.xive_devices.xics_on_xive;
struct kvmppc_xive *xive = *kvm_xive_device;
if (!xive) {
xive = kzalloc(sizeof(*xive), GFP_KERNEL);
*kvm_xive_device = xive;
} else {
memset(xive, 0, sizeof(*xive));
}
return xive;
}
/*
* Create a XICS device with XIVE backend. kvm->lock is held.
*/
static int kvmppc_xive_create(struct kvm_device *dev, u32 type)
{
struct kvmppc_xive *xive;
@ -1845,7 +1960,7 @@ static int kvmppc_xive_create(struct kvm_device *dev, u32 type)
pr_devel("Creating xive for partition\n");
xive = kzalloc(sizeof(*xive), GFP_KERNEL);
xive = kvmppc_xive_get_device(kvm, type);
if (!xive)
return -ENOMEM;
@ -1883,6 +1998,43 @@ static int kvmppc_xive_create(struct kvm_device *dev, u32 type)
return 0;
}
int kvmppc_xive_debug_show_queues(struct seq_file *m, struct kvm_vcpu *vcpu)
{
struct kvmppc_xive_vcpu *xc = vcpu->arch.xive_vcpu;
unsigned int i;
for (i = 0; i < KVMPPC_XIVE_Q_COUNT; i++) {
struct xive_q *q = &xc->queues[i];
u32 i0, i1, idx;
if (!q->qpage && !xc->esc_virq[i])
continue;
seq_printf(m, " [q%d]: ", i);
if (q->qpage) {
idx = q->idx;
i0 = be32_to_cpup(q->qpage + idx);
idx = (idx + 1) & q->msk;
i1 = be32_to_cpup(q->qpage + idx);
seq_printf(m, "T=%d %08x %08x...\n", q->toggle,
i0, i1);
}
if (xc->esc_virq[i]) {
struct irq_data *d = irq_get_irq_data(xc->esc_virq[i]);
struct xive_irq_data *xd =
irq_data_get_irq_handler_data(d);
u64 pq = xive_vm_esb_load(xd, XIVE_ESB_GET);
seq_printf(m, "E:%c%c I(%d:%llx:%llx)",
(pq & XIVE_ESB_VAL_P) ? 'P' : 'p',
(pq & XIVE_ESB_VAL_Q) ? 'Q' : 'q',
xc->esc_virq[i], pq, xd->eoi_page);
seq_puts(m, "\n");
}
}
return 0;
}
static int xive_debug_show(struct seq_file *m, void *private)
{
@ -1908,7 +2060,6 @@ static int xive_debug_show(struct seq_file *m, void *private)
kvm_for_each_vcpu(i, vcpu, kvm) {
struct kvmppc_xive_vcpu *xc = vcpu->arch.xive_vcpu;
unsigned int i;
if (!xc)
continue;
@ -1918,33 +2069,8 @@ static int xive_debug_show(struct seq_file *m, void *private)
xc->server_num, xc->cppr, xc->hw_cppr,
xc->mfrr, xc->pending,
xc->stat_rm_h_xirr, xc->stat_vm_h_xirr);
for (i = 0; i < KVMPPC_XIVE_Q_COUNT; i++) {
struct xive_q *q = &xc->queues[i];
u32 i0, i1, idx;
if (!q->qpage && !xc->esc_virq[i])
continue;
seq_printf(m, " [q%d]: ", i);
if (q->qpage) {
idx = q->idx;
i0 = be32_to_cpup(q->qpage + idx);
idx = (idx + 1) & q->msk;
i1 = be32_to_cpup(q->qpage + idx);
seq_printf(m, "T=%d %08x %08x... \n", q->toggle, i0, i1);
}
if (xc->esc_virq[i]) {
struct irq_data *d = irq_get_irq_data(xc->esc_virq[i]);
struct xive_irq_data *xd = irq_data_get_irq_handler_data(d);
u64 pq = xive_vm_esb_load(xd, XIVE_ESB_GET);
seq_printf(m, "E:%c%c I(%d:%llx:%llx)",
(pq & XIVE_ESB_VAL_P) ? 'P' : 'p',
(pq & XIVE_ESB_VAL_Q) ? 'Q' : 'q',
xc->esc_virq[i], pq, xd->eoi_page);
seq_printf(m, "\n");
}
}
kvmppc_xive_debug_show_queues(m, vcpu);
t_rm_h_xirr += xc->stat_rm_h_xirr;
t_rm_h_ipoll += xc->stat_rm_h_ipoll;
@ -1999,7 +2125,7 @@ struct kvm_device_ops kvm_xive_ops = {
.name = "kvm-xive",
.create = kvmppc_xive_create,
.init = kvmppc_xive_init,
.destroy = kvmppc_xive_free,
.release = kvmppc_xive_release,
.set_attr = xive_set_attr,
.get_attr = xive_get_attr,
.has_attr = xive_has_attr,

View File

@ -12,6 +12,13 @@
#ifdef CONFIG_KVM_XICS
#include "book3s_xics.h"
/*
* The XIVE Interrupt source numbers are within the range 0 to
* KVMPPC_XICS_NR_IRQS.
*/
#define KVMPPC_XIVE_FIRST_IRQ 0
#define KVMPPC_XIVE_NR_IRQS KVMPPC_XICS_NR_IRQS
/*
* State for one guest irq source.
*
@ -54,6 +61,9 @@ struct kvmppc_xive_irq_state {
bool saved_p;
bool saved_q;
u8 saved_scan_prio;
/* Xive native */
u32 eisn; /* Guest Effective IRQ number */
};
/* Select the "right" interrupt (IPI vs. passthrough) */
@ -84,6 +94,11 @@ struct kvmppc_xive_src_block {
struct kvmppc_xive_irq_state irq_state[KVMPPC_XICS_IRQ_PER_ICS];
};
struct kvmppc_xive;
struct kvmppc_xive_ops {
int (*reset_mapped)(struct kvm *kvm, unsigned long guest_irq);
};
struct kvmppc_xive {
struct kvm *kvm;
@ -122,6 +137,10 @@ struct kvmppc_xive {
/* Flags */
u8 single_escalation;
struct kvmppc_xive_ops *ops;
struct address_space *mapping;
struct mutex mapping_lock;
};
#define KVMPPC_XIVE_Q_COUNT 8
@ -198,6 +217,11 @@ static inline struct kvmppc_xive_src_block *kvmppc_xive_find_source(struct kvmpp
return xive->src_blocks[bid];
}
static inline u32 kvmppc_xive_vp(struct kvmppc_xive *xive, u32 server)
{
return xive->vp_base + kvmppc_pack_vcpu_id(xive->kvm, server);
}
/*
* Mapping between guest priorities and host priorities
* is as follow.
@ -248,5 +272,18 @@ extern int (*__xive_vm_h_ipi)(struct kvm_vcpu *vcpu, unsigned long server,
extern int (*__xive_vm_h_cppr)(struct kvm_vcpu *vcpu, unsigned long cppr);
extern int (*__xive_vm_h_eoi)(struct kvm_vcpu *vcpu, unsigned long xirr);
/*
* Common Xive routines for XICS-over-XIVE and XIVE native
*/
void kvmppc_xive_disable_vcpu_interrupts(struct kvm_vcpu *vcpu);
int kvmppc_xive_debug_show_queues(struct seq_file *m, struct kvm_vcpu *vcpu);
struct kvmppc_xive_src_block *kvmppc_xive_create_src_block(
struct kvmppc_xive *xive, int irq);
void kvmppc_xive_free_sources(struct kvmppc_xive_src_block *sb);
int kvmppc_xive_select_target(struct kvm *kvm, u32 *server, u8 prio);
int kvmppc_xive_attach_escalation(struct kvm_vcpu *vcpu, u8 prio,
bool single_escalation);
struct kvmppc_xive *kvmppc_xive_get_device(struct kvm *kvm, u32 type);
#endif /* CONFIG_KVM_XICS */
#endif /* _KVM_PPC_BOOK3S_XICS_H */

File diff suppressed because it is too large Load Diff

View File

@ -130,25 +130,15 @@ static u32 GLUE(X_PFX,scan_interrupts)(struct kvmppc_xive_vcpu *xc,
*/
prio = ffs(pending) - 1;
/*
* If the most favoured prio we found pending is less
* favored (or equal) than a pending IPI, we return
* the IPI instead.
*
* Note: If pending was 0 and mfrr is 0xff, we will
* not spurriously take an IPI because mfrr cannot
* then be smaller than cppr.
*/
if (prio >= xc->mfrr && xc->mfrr < xc->cppr) {
prio = xc->mfrr;
hirq = XICS_IPI;
/* Don't scan past the guest cppr */
if (prio >= xc->cppr || prio > 7) {
if (xc->mfrr < xc->cppr) {
prio = xc->mfrr;
hirq = XICS_IPI;
}
break;
}
/* Don't scan past the guest cppr */
if (prio >= xc->cppr || prio > 7)
break;
/* Grab queue and pointers */
q = &xc->queues[prio];
idx = q->idx;
@ -184,9 +174,12 @@ skip_ipi:
* been set and another occurrence of the IPI will trigger.
*/
if (hirq == XICS_IPI || (prio == 0 && !qpage)) {
if (scan_type == scan_fetch)
if (scan_type == scan_fetch) {
GLUE(X_PFX,source_eoi)(xc->vp_ipi,
&xc->vp_ipi_data);
q->idx = idx;
q->toggle = toggle;
}
/* Loop back on same queue with updated idx/toggle */
#ifdef XIVE_RUNTIME_CHECKS
WARN_ON(hirq && hirq != XICS_IPI);
@ -199,32 +192,41 @@ skip_ipi:
if (hirq == XICS_DUMMY)
goto skip_ipi;
/* Clear the pending bit if the queue is now empty */
if (!hirq) {
pending &= ~(1 << prio);
/*
* Check if the queue count needs adjusting due to
* interrupts being moved away.
*/
if (atomic_read(&q->pending_count)) {
int p = atomic_xchg(&q->pending_count, 0);
if (p) {
#ifdef XIVE_RUNTIME_CHECKS
WARN_ON(p > atomic_read(&q->count));
#endif
atomic_sub(p, &q->count);
}
}
}
/*
* If the most favoured prio we found pending is less
* favored (or equal) than a pending IPI, we return
* the IPI instead.
*/
if (prio >= xc->mfrr && xc->mfrr < xc->cppr) {
prio = xc->mfrr;
hirq = XICS_IPI;
break;
}
/* If fetching, update queue pointers */
if (scan_type == scan_fetch) {
q->idx = idx;
q->toggle = toggle;
}
/* Something found, stop searching */
if (hirq)
break;
/* Clear the pending bit on the now empty queue */
pending &= ~(1 << prio);
/*
* Check if the queue count needs adjusting due to
* interrupts being moved away.
*/
if (atomic_read(&q->pending_count)) {
int p = atomic_xchg(&q->pending_count, 0);
if (p) {
#ifdef XIVE_RUNTIME_CHECKS
WARN_ON(p > atomic_read(&q->count));
#endif
atomic_sub(p, &q->count);
}
}
}
/* If we are just taking a "peek", do nothing else */

View File

@ -570,6 +570,16 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
case KVM_CAP_PPC_GET_CPU_CHAR:
r = 1;
break;
#ifdef CONFIG_KVM_XIVE
case KVM_CAP_PPC_IRQ_XIVE:
/*
* We need XIVE to be enabled on the platform (implies
* a POWER9 processor) and the PowerNV platform, as
* nested is not yet supported.
*/
r = xive_enabled() && !!cpu_has_feature(CPU_FTR_HVMODE);
break;
#endif
case KVM_CAP_PPC_ALLOC_HTAB:
r = hv_enabled;
@ -644,9 +654,6 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
else
r = num_online_cpus();
break;
case KVM_CAP_NR_MEMSLOTS:
r = KVM_USER_MEM_SLOTS;
break;
case KVM_CAP_MAX_VCPUS:
r = KVM_MAX_VCPUS;
break;
@ -753,6 +760,9 @@ void kvm_arch_vcpu_free(struct kvm_vcpu *vcpu)
else
kvmppc_xics_free_icp(vcpu);
break;
case KVMPPC_IRQ_XIVE:
kvmppc_xive_native_cleanup_vcpu(vcpu);
break;
}
kvmppc_core_vcpu_free(vcpu);
@ -1941,6 +1951,30 @@ static int kvm_vcpu_ioctl_enable_cap(struct kvm_vcpu *vcpu,
break;
}
#endif /* CONFIG_KVM_XICS */
#ifdef CONFIG_KVM_XIVE
case KVM_CAP_PPC_IRQ_XIVE: {
struct fd f;
struct kvm_device *dev;
r = -EBADF;
f = fdget(cap->args[0]);
if (!f.file)
break;
r = -ENXIO;
if (!xive_enabled())
break;
r = -EPERM;
dev = kvm_device_from_filp(f.file);
if (dev)
r = kvmppc_xive_native_connect_vcpu(dev, vcpu,
cap->args[1]);
fdput(f);
break;
}
#endif /* CONFIG_KVM_XIVE */
#ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE
case KVM_CAP_PPC_FWNMI:
r = -EINVAL;

View File

@ -521,6 +521,9 @@ u32 xive_native_default_eq_shift(void)
}
EXPORT_SYMBOL_GPL(xive_native_default_eq_shift);
unsigned long xive_tima_os;
EXPORT_SYMBOL_GPL(xive_tima_os);
bool __init xive_native_init(void)
{
struct device_node *np;
@ -573,6 +576,14 @@ bool __init xive_native_init(void)
for_each_possible_cpu(cpu)
kvmppc_set_xive_tima(cpu, r.start, tima);
/* Resource 2 is OS window */
if (of_address_to_resource(np, 2, &r)) {
pr_err("Failed to get thread mgmnt area resource\n");
return false;
}
xive_tima_os = r.start;
/* Grab size of provisionning pages */
xive_parse_provisioning(np);

View File

@ -28,6 +28,7 @@
#define CPACF_KMCTR 0xb92d /* MSA4 */
#define CPACF_PRNO 0xb93c /* MSA5 */
#define CPACF_KMA 0xb929 /* MSA8 */
#define CPACF_KDSA 0xb93a /* MSA9 */
/*
* En/decryption modifier bits

View File

@ -278,6 +278,7 @@ struct kvm_s390_sie_block {
#define ECD_HOSTREGMGMT 0x20000000
#define ECD_MEF 0x08000000
#define ECD_ETOKENF 0x02000000
#define ECD_ECC 0x00200000
__u32 ecd; /* 0x01c8 */
__u8 reserved1cc[18]; /* 0x01cc */
__u64 pp; /* 0x01de */
@ -312,6 +313,7 @@ struct kvm_vcpu_stat {
u64 halt_successful_poll;
u64 halt_attempted_poll;
u64 halt_poll_invalid;
u64 halt_no_poll_steal;
u64 halt_wakeup;
u64 instruction_lctl;
u64 instruction_lctlg;

View File

@ -152,7 +152,10 @@ struct kvm_s390_vm_cpu_subfunc {
__u8 pcc[16]; /* with MSA4 */
__u8 ppno[16]; /* with MSA5 */
__u8 kma[16]; /* with MSA8 */
__u8 reserved[1808];
__u8 kdsa[16]; /* with MSA9 */
__u8 sortl[32]; /* with STFLE.150 */
__u8 dfltcc[32]; /* with STFLE.151 */
__u8 reserved[1728];
};
/* kvm attributes for crypto */

View File

@ -30,6 +30,7 @@ config KVM
select HAVE_KVM_IRQFD
select HAVE_KVM_IRQ_ROUTING
select HAVE_KVM_INVALID_WAKEUPS
select HAVE_KVM_NO_POLL
select SRCU
select KVM_VFIO
---help---

View File

@ -14,6 +14,7 @@
#include <linux/kvm_host.h>
#include <linux/hrtimer.h>
#include <linux/mmu_context.h>
#include <linux/nospec.h>
#include <linux/signal.h>
#include <linux/slab.h>
#include <linux/bitmap.h>
@ -2307,6 +2308,7 @@ static struct s390_io_adapter *get_io_adapter(struct kvm *kvm, unsigned int id)
{
if (id >= MAX_S390_IO_ADAPTERS)
return NULL;
id = array_index_nospec(id, MAX_S390_IO_ADAPTERS);
return kvm->arch.adapters[id];
}
@ -2320,8 +2322,13 @@ static int register_io_adapter(struct kvm_device *dev,
(void __user *)attr->addr, sizeof(adapter_info)))
return -EFAULT;
if ((adapter_info.id >= MAX_S390_IO_ADAPTERS) ||
(dev->kvm->arch.adapters[adapter_info.id] != NULL))
if (adapter_info.id >= MAX_S390_IO_ADAPTERS)
return -EINVAL;
adapter_info.id = array_index_nospec(adapter_info.id,
MAX_S390_IO_ADAPTERS);
if (dev->kvm->arch.adapters[adapter_info.id] != NULL)
return -EINVAL;
adapter = kzalloc(sizeof(*adapter), GFP_KERNEL);

View File

@ -75,6 +75,7 @@ struct kvm_stats_debugfs_item debugfs_entries[] = {
{ "halt_successful_poll", VCPU_STAT(halt_successful_poll) },
{ "halt_attempted_poll", VCPU_STAT(halt_attempted_poll) },
{ "halt_poll_invalid", VCPU_STAT(halt_poll_invalid) },
{ "halt_no_poll_steal", VCPU_STAT(halt_no_poll_steal) },
{ "halt_wakeup", VCPU_STAT(halt_wakeup) },
{ "instruction_lctlg", VCPU_STAT(instruction_lctlg) },
{ "instruction_lctl", VCPU_STAT(instruction_lctl) },
@ -177,6 +178,11 @@ static int hpage;
module_param(hpage, int, 0444);
MODULE_PARM_DESC(hpage, "1m huge page backing support");
/* maximum percentage of steal time for polling. >100 is treated like 100 */
static u8 halt_poll_max_steal = 10;
module_param(halt_poll_max_steal, byte, 0644);
MODULE_PARM_DESC(hpage, "Maximum percentage of steal time to allow polling");
/*
* For now we handle at most 16 double words as this is what the s390 base
* kernel handles and stores in the prefix page. If we ever need to go beyond
@ -321,6 +327,22 @@ static inline int plo_test_bit(unsigned char nr)
return cc == 0;
}
static inline void __insn32_query(unsigned int opcode, u8 query[32])
{
register unsigned long r0 asm("0") = 0; /* query function */
register unsigned long r1 asm("1") = (unsigned long) query;
asm volatile(
/* Parameter regs are ignored */
" .insn rrf,%[opc] << 16,2,4,6,0\n"
: "=m" (*query)
: "d" (r0), "a" (r1), [opc] "i" (opcode)
: "cc");
}
#define INSN_SORTL 0xb938
#define INSN_DFLTCC 0xb939
static void kvm_s390_cpu_feat_init(void)
{
int i;
@ -368,6 +390,16 @@ static void kvm_s390_cpu_feat_init(void)
__cpacf_query(CPACF_KMA, (cpacf_mask_t *)
kvm_s390_available_subfunc.kma);
if (test_facility(155)) /* MSA9 */
__cpacf_query(CPACF_KDSA, (cpacf_mask_t *)
kvm_s390_available_subfunc.kdsa);
if (test_facility(150)) /* SORTL */
__insn32_query(INSN_SORTL, kvm_s390_available_subfunc.sortl);
if (test_facility(151)) /* DFLTCC */
__insn32_query(INSN_DFLTCC, kvm_s390_available_subfunc.dfltcc);
if (MACHINE_HAS_ESOP)
allow_cpu_feat(KVM_S390_VM_CPU_FEAT_ESOP);
/*
@ -513,9 +545,6 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
else if (sclp.has_esca && sclp.has_64bscao)
r = KVM_S390_ESCA_CPU_SLOTS;
break;
case KVM_CAP_NR_MEMSLOTS:
r = KVM_USER_MEM_SLOTS;
break;
case KVM_CAP_S390_COW:
r = MACHINE_HAS_ESOP;
break;
@ -657,6 +686,14 @@ int kvm_vm_ioctl_enable_cap(struct kvm *kvm, struct kvm_enable_cap *cap)
set_kvm_facility(kvm->arch.model.fac_mask, 135);
set_kvm_facility(kvm->arch.model.fac_list, 135);
}
if (test_facility(148)) {
set_kvm_facility(kvm->arch.model.fac_mask, 148);
set_kvm_facility(kvm->arch.model.fac_list, 148);
}
if (test_facility(152)) {
set_kvm_facility(kvm->arch.model.fac_mask, 152);
set_kvm_facility(kvm->arch.model.fac_list, 152);
}
r = 0;
} else
r = -EINVAL;
@ -1323,6 +1360,19 @@ static int kvm_s390_set_processor_subfunc(struct kvm *kvm,
VM_EVENT(kvm, 3, "SET: guest KMA subfunc 0x%16.16lx.%16.16lx",
((unsigned long *) &kvm->arch.model.subfuncs.kma)[0],
((unsigned long *) &kvm->arch.model.subfuncs.kma)[1]);
VM_EVENT(kvm, 3, "SET: guest KDSA subfunc 0x%16.16lx.%16.16lx",
((unsigned long *) &kvm->arch.model.subfuncs.kdsa)[0],
((unsigned long *) &kvm->arch.model.subfuncs.kdsa)[1]);
VM_EVENT(kvm, 3, "SET: guest SORTL subfunc 0x%16.16lx.%16.16lx.%16.16lx.%16.16lx",
((unsigned long *) &kvm->arch.model.subfuncs.sortl)[0],
((unsigned long *) &kvm->arch.model.subfuncs.sortl)[1],
((unsigned long *) &kvm->arch.model.subfuncs.sortl)[2],
((unsigned long *) &kvm->arch.model.subfuncs.sortl)[3]);
VM_EVENT(kvm, 3, "SET: guest DFLTCC subfunc 0x%16.16lx.%16.16lx.%16.16lx.%16.16lx",
((unsigned long *) &kvm->arch.model.subfuncs.dfltcc)[0],
((unsigned long *) &kvm->arch.model.subfuncs.dfltcc)[1],
((unsigned long *) &kvm->arch.model.subfuncs.dfltcc)[2],
((unsigned long *) &kvm->arch.model.subfuncs.dfltcc)[3]);
return 0;
}
@ -1491,6 +1541,19 @@ static int kvm_s390_get_processor_subfunc(struct kvm *kvm,
VM_EVENT(kvm, 3, "GET: guest KMA subfunc 0x%16.16lx.%16.16lx",
((unsigned long *) &kvm->arch.model.subfuncs.kma)[0],
((unsigned long *) &kvm->arch.model.subfuncs.kma)[1]);
VM_EVENT(kvm, 3, "GET: guest KDSA subfunc 0x%16.16lx.%16.16lx",
((unsigned long *) &kvm->arch.model.subfuncs.kdsa)[0],
((unsigned long *) &kvm->arch.model.subfuncs.kdsa)[1]);
VM_EVENT(kvm, 3, "GET: guest SORTL subfunc 0x%16.16lx.%16.16lx.%16.16lx.%16.16lx",
((unsigned long *) &kvm->arch.model.subfuncs.sortl)[0],
((unsigned long *) &kvm->arch.model.subfuncs.sortl)[1],
((unsigned long *) &kvm->arch.model.subfuncs.sortl)[2],
((unsigned long *) &kvm->arch.model.subfuncs.sortl)[3]);
VM_EVENT(kvm, 3, "GET: guest DFLTCC subfunc 0x%16.16lx.%16.16lx.%16.16lx.%16.16lx",
((unsigned long *) &kvm->arch.model.subfuncs.dfltcc)[0],
((unsigned long *) &kvm->arch.model.subfuncs.dfltcc)[1],
((unsigned long *) &kvm->arch.model.subfuncs.dfltcc)[2],
((unsigned long *) &kvm->arch.model.subfuncs.dfltcc)[3]);
return 0;
}
@ -1546,6 +1609,19 @@ static int kvm_s390_get_machine_subfunc(struct kvm *kvm,
VM_EVENT(kvm, 3, "GET: host KMA subfunc 0x%16.16lx.%16.16lx",
((unsigned long *) &kvm_s390_available_subfunc.kma)[0],
((unsigned long *) &kvm_s390_available_subfunc.kma)[1]);
VM_EVENT(kvm, 3, "GET: host KDSA subfunc 0x%16.16lx.%16.16lx",
((unsigned long *) &kvm_s390_available_subfunc.kdsa)[0],
((unsigned long *) &kvm_s390_available_subfunc.kdsa)[1]);
VM_EVENT(kvm, 3, "GET: host SORTL subfunc 0x%16.16lx.%16.16lx.%16.16lx.%16.16lx",
((unsigned long *) &kvm_s390_available_subfunc.sortl)[0],
((unsigned long *) &kvm_s390_available_subfunc.sortl)[1],
((unsigned long *) &kvm_s390_available_subfunc.sortl)[2],
((unsigned long *) &kvm_s390_available_subfunc.sortl)[3]);
VM_EVENT(kvm, 3, "GET: host DFLTCC subfunc 0x%16.16lx.%16.16lx.%16.16lx.%16.16lx",
((unsigned long *) &kvm_s390_available_subfunc.dfltcc)[0],
((unsigned long *) &kvm_s390_available_subfunc.dfltcc)[1],
((unsigned long *) &kvm_s390_available_subfunc.dfltcc)[2],
((unsigned long *) &kvm_s390_available_subfunc.dfltcc)[3]);
return 0;
}
@ -2817,6 +2893,25 @@ void kvm_arch_vcpu_postcreate(struct kvm_vcpu *vcpu)
vcpu->arch.enabled_gmap = vcpu->arch.gmap;
}
static bool kvm_has_pckmo_subfunc(struct kvm *kvm, unsigned long nr)
{
if (test_bit_inv(nr, (unsigned long *)&kvm->arch.model.subfuncs.pckmo) &&
test_bit_inv(nr, (unsigned long *)&kvm_s390_available_subfunc.pckmo))
return true;
return false;
}
static bool kvm_has_pckmo_ecc(struct kvm *kvm)
{
/* At least one ECC subfunction must be present */
return kvm_has_pckmo_subfunc(kvm, 32) ||
kvm_has_pckmo_subfunc(kvm, 33) ||
kvm_has_pckmo_subfunc(kvm, 34) ||
kvm_has_pckmo_subfunc(kvm, 40) ||
kvm_has_pckmo_subfunc(kvm, 41);
}
static void kvm_s390_vcpu_crypto_setup(struct kvm_vcpu *vcpu)
{
/*
@ -2829,13 +2924,19 @@ static void kvm_s390_vcpu_crypto_setup(struct kvm_vcpu *vcpu)
vcpu->arch.sie_block->crycbd = vcpu->kvm->arch.crypto.crycbd;
vcpu->arch.sie_block->ecb3 &= ~(ECB3_AES | ECB3_DEA);
vcpu->arch.sie_block->eca &= ~ECA_APIE;
vcpu->arch.sie_block->ecd &= ~ECD_ECC;
if (vcpu->kvm->arch.crypto.apie)
vcpu->arch.sie_block->eca |= ECA_APIE;
/* Set up protected key support */
if (vcpu->kvm->arch.crypto.aes_kw)
if (vcpu->kvm->arch.crypto.aes_kw) {
vcpu->arch.sie_block->ecb3 |= ECB3_AES;
/* ecc is also wrapped with AES key */
if (kvm_has_pckmo_ecc(vcpu->kvm))
vcpu->arch.sie_block->ecd |= ECD_ECC;
}
if (vcpu->kvm->arch.crypto.dea_kw)
vcpu->arch.sie_block->ecb3 |= ECB3_DEA;
}
@ -3068,6 +3169,17 @@ static void kvm_gmap_notifier(struct gmap *gmap, unsigned long start,
}
}
bool kvm_arch_no_poll(struct kvm_vcpu *vcpu)
{
/* do not poll with more than halt_poll_max_steal percent of steal time */
if (S390_lowcore.avg_steal_timer * 100 / (TICK_USEC << 12) >=
halt_poll_max_steal) {
vcpu->stat.halt_no_poll_steal++;
return true;
}
return false;
}
int kvm_arch_vcpu_should_kick(struct kvm_vcpu *vcpu)
{
/* kvm common code refers to this, but never calls it */

View File

@ -288,7 +288,9 @@ static int shadow_crycb(struct kvm_vcpu *vcpu, struct vsie_page *vsie_page)
const u32 crycb_addr = crycbd_o & 0x7ffffff8U;
unsigned long *b1, *b2;
u8 ecb3_flags;
u32 ecd_flags;
int apie_h;
int apie_s;
int key_msk = test_kvm_facility(vcpu->kvm, 76);
int fmt_o = crycbd_o & CRYCB_FORMAT_MASK;
int fmt_h = vcpu->arch.sie_block->crycbd & CRYCB_FORMAT_MASK;
@ -297,7 +299,8 @@ static int shadow_crycb(struct kvm_vcpu *vcpu, struct vsie_page *vsie_page)
scb_s->crycbd = 0;
apie_h = vcpu->arch.sie_block->eca & ECA_APIE;
if (!apie_h && (!key_msk || fmt_o == CRYCB_FORMAT0))
apie_s = apie_h & scb_o->eca;
if (!apie_s && (!key_msk || (fmt_o == CRYCB_FORMAT0)))
return 0;
if (!crycb_addr)
@ -308,7 +311,7 @@ static int shadow_crycb(struct kvm_vcpu *vcpu, struct vsie_page *vsie_page)
((crycb_addr + 128) & PAGE_MASK))
return set_validity_icpt(scb_s, 0x003CU);
if (apie_h && (scb_o->eca & ECA_APIE)) {
if (apie_s) {
ret = setup_apcb(vcpu, &vsie_page->crycb, crycb_addr,
vcpu->kvm->arch.crypto.crycb,
fmt_o, fmt_h);
@ -320,7 +323,8 @@ static int shadow_crycb(struct kvm_vcpu *vcpu, struct vsie_page *vsie_page)
/* we may only allow it if enabled for guest 2 */
ecb3_flags = scb_o->ecb3 & vcpu->arch.sie_block->ecb3 &
(ECB3_AES | ECB3_DEA);
if (!ecb3_flags)
ecd_flags = scb_o->ecd & vcpu->arch.sie_block->ecd & ECD_ECC;
if (!ecb3_flags && !ecd_flags)
goto end;
/* copy only the wrapping keys */
@ -329,6 +333,7 @@ static int shadow_crycb(struct kvm_vcpu *vcpu, struct vsie_page *vsie_page)
return set_validity_icpt(scb_s, 0x0035U);
scb_s->ecb3 |= ecb3_flags;
scb_s->ecd |= ecd_flags;
/* xor both blocks in one run */
b1 = (unsigned long *) vsie_page->crycb.dea_wrapping_key_mask;
@ -339,7 +344,7 @@ static int shadow_crycb(struct kvm_vcpu *vcpu, struct vsie_page *vsie_page)
end:
switch (ret) {
case -EINVAL:
return set_validity_icpt(scb_s, 0x0020U);
return set_validity_icpt(scb_s, 0x0022U);
case -EFAULT:
return set_validity_icpt(scb_s, 0x0035U);
case -EACCES:

View File

@ -93,6 +93,9 @@ static struct facility_def facility_defs[] = {
131, /* enhanced-SOP 2 and side-effect */
139, /* multiple epoch facility */
146, /* msa extension 8 */
150, /* enhanced sort */
151, /* deflate conversion */
155, /* msa extension 9 */
-1 /* END */
}
},

View File

@ -2384,7 +2384,11 @@ static int handle_pmi_common(struct pt_regs *regs, u64 status)
*/
if (__test_and_clear_bit(55, (unsigned long *)&status)) {
handled++;
intel_pt_interrupt();
if (unlikely(perf_guest_cbs && perf_guest_cbs->is_in_guest() &&
perf_guest_cbs->handle_intel_pt_intr))
perf_guest_cbs->handle_intel_pt_intr();
else
intel_pt_interrupt();
}
/*

View File

@ -10,6 +10,7 @@ extern struct e820_table *e820_table_firmware;
extern unsigned long pci_mem_start;
extern bool e820__mapped_raw_any(u64 start, u64 end, enum e820_type type);
extern bool e820__mapped_any(u64 start, u64 end, enum e820_type type);
extern bool e820__mapped_all(u64 start, u64 end, enum e820_type type);

View File

@ -470,6 +470,7 @@ struct kvm_pmu {
u64 global_ovf_ctrl;
u64 counter_bitmask[2];
u64 global_ctrl_mask;
u64 global_ovf_ctrl_mask;
u64 reserved_bits;
u8 version;
struct kvm_pmc gp_counters[INTEL_PMC_MAX_GENERIC];
@ -781,6 +782,9 @@ struct kvm_vcpu_arch {
/* Flush the L1 Data cache for L1TF mitigation on VMENTER */
bool l1tf_flush_l1d;
/* AMD MSRC001_0015 Hardware Configuration */
u64 msr_hwcr;
};
struct kvm_lpage_info {
@ -1168,7 +1172,8 @@ struct kvm_x86_ops {
uint32_t guest_irq, bool set);
void (*apicv_post_state_restore)(struct kvm_vcpu *vcpu);
int (*set_hv_timer)(struct kvm_vcpu *vcpu, u64 guest_deadline_tsc);
int (*set_hv_timer)(struct kvm_vcpu *vcpu, u64 guest_deadline_tsc,
bool *expired);
void (*cancel_hv_timer)(struct kvm_vcpu *vcpu);
void (*setup_mce)(struct kvm_vcpu *vcpu);

View File

@ -789,6 +789,14 @@
#define MSR_CORE_PERF_GLOBAL_CTRL 0x0000038f
#define MSR_CORE_PERF_GLOBAL_OVF_CTRL 0x00000390
/* PERF_GLOBAL_OVF_CTL bits */
#define MSR_CORE_PERF_GLOBAL_OVF_CTRL_TRACE_TOPA_PMI_BIT 55
#define MSR_CORE_PERF_GLOBAL_OVF_CTRL_TRACE_TOPA_PMI (1ULL << MSR_CORE_PERF_GLOBAL_OVF_CTRL_TRACE_TOPA_PMI_BIT)
#define MSR_CORE_PERF_GLOBAL_OVF_CTRL_OVF_BUF_BIT 62
#define MSR_CORE_PERF_GLOBAL_OVF_CTRL_OVF_BUF (1ULL << MSR_CORE_PERF_GLOBAL_OVF_CTRL_OVF_BUF_BIT)
#define MSR_CORE_PERF_GLOBAL_OVF_CTRL_COND_CHGD_BIT 63
#define MSR_CORE_PERF_GLOBAL_OVF_CTRL_COND_CHGD (1ULL << MSR_CORE_PERF_GLOBAL_OVF_CTRL_COND_CHGD_BIT)
/* Geode defined MSRs */
#define MSR_GEODE_BUSCONT_CONF0 0x00001900

View File

@ -73,12 +73,13 @@ EXPORT_SYMBOL(pci_mem_start);
* This function checks if any part of the range <start,end> is mapped
* with type.
*/
bool e820__mapped_any(u64 start, u64 end, enum e820_type type)
static bool _e820__mapped_any(struct e820_table *table,
u64 start, u64 end, enum e820_type type)
{
int i;
for (i = 0; i < e820_table->nr_entries; i++) {
struct e820_entry *entry = &e820_table->entries[i];
for (i = 0; i < table->nr_entries; i++) {
struct e820_entry *entry = &table->entries[i];
if (type && entry->type != type)
continue;
@ -88,6 +89,17 @@ bool e820__mapped_any(u64 start, u64 end, enum e820_type type)
}
return 0;
}
bool e820__mapped_raw_any(u64 start, u64 end, enum e820_type type)
{
return _e820__mapped_any(e820_table_firmware, start, end, type);
}
EXPORT_SYMBOL_GPL(e820__mapped_raw_any);
bool e820__mapped_any(u64 start, u64 end, enum e820_type type)
{
return _e820__mapped_any(e820_table, start, end, type);
}
EXPORT_SYMBOL_GPL(e820__mapped_any);
/*

View File

@ -963,13 +963,13 @@ int kvm_emulate_cpuid(struct kvm_vcpu *vcpu)
if (cpuid_fault_enabled(vcpu) && !kvm_require_cpl(vcpu, 0))
return 1;
eax = kvm_register_read(vcpu, VCPU_REGS_RAX);
ecx = kvm_register_read(vcpu, VCPU_REGS_RCX);
eax = kvm_rax_read(vcpu);
ecx = kvm_rcx_read(vcpu);
kvm_cpuid(vcpu, &eax, &ebx, &ecx, &edx, true);
kvm_register_write(vcpu, VCPU_REGS_RAX, eax);
kvm_register_write(vcpu, VCPU_REGS_RBX, ebx);
kvm_register_write(vcpu, VCPU_REGS_RCX, ecx);
kvm_register_write(vcpu, VCPU_REGS_RDX, edx);
kvm_rax_write(vcpu, eax);
kvm_rbx_write(vcpu, ebx);
kvm_rcx_write(vcpu, ecx);
kvm_rdx_write(vcpu, edx);
return kvm_skip_emulated_instruction(vcpu);
}
EXPORT_SYMBOL_GPL(kvm_emulate_cpuid);

View File

@ -1535,10 +1535,10 @@ static void kvm_hv_hypercall_set_result(struct kvm_vcpu *vcpu, u64 result)
longmode = is_64_bit_mode(vcpu);
if (longmode)
kvm_register_write(vcpu, VCPU_REGS_RAX, result);
kvm_rax_write(vcpu, result);
else {
kvm_register_write(vcpu, VCPU_REGS_RDX, result >> 32);
kvm_register_write(vcpu, VCPU_REGS_RAX, result & 0xffffffff);
kvm_rdx_write(vcpu, result >> 32);
kvm_rax_write(vcpu, result & 0xffffffff);
}
}
@ -1611,18 +1611,18 @@ int kvm_hv_hypercall(struct kvm_vcpu *vcpu)
longmode = is_64_bit_mode(vcpu);
if (!longmode) {
param = ((u64)kvm_register_read(vcpu, VCPU_REGS_RDX) << 32) |
(kvm_register_read(vcpu, VCPU_REGS_RAX) & 0xffffffff);
ingpa = ((u64)kvm_register_read(vcpu, VCPU_REGS_RBX) << 32) |
(kvm_register_read(vcpu, VCPU_REGS_RCX) & 0xffffffff);
outgpa = ((u64)kvm_register_read(vcpu, VCPU_REGS_RDI) << 32) |
(kvm_register_read(vcpu, VCPU_REGS_RSI) & 0xffffffff);
param = ((u64)kvm_rdx_read(vcpu) << 32) |
(kvm_rax_read(vcpu) & 0xffffffff);
ingpa = ((u64)kvm_rbx_read(vcpu) << 32) |
(kvm_rcx_read(vcpu) & 0xffffffff);
outgpa = ((u64)kvm_rdi_read(vcpu) << 32) |
(kvm_rsi_read(vcpu) & 0xffffffff);
}
#ifdef CONFIG_X86_64
else {
param = kvm_register_read(vcpu, VCPU_REGS_RCX);
ingpa = kvm_register_read(vcpu, VCPU_REGS_RDX);
outgpa = kvm_register_read(vcpu, VCPU_REGS_R8);
param = kvm_rcx_read(vcpu);
ingpa = kvm_rdx_read(vcpu);
outgpa = kvm_r8_read(vcpu);
}
#endif

View File

@ -9,6 +9,34 @@
(X86_CR4_PVI | X86_CR4_DE | X86_CR4_PCE | X86_CR4_OSFXSR \
| X86_CR4_OSXMMEXCPT | X86_CR4_LA57 | X86_CR4_PGE)
#define BUILD_KVM_GPR_ACCESSORS(lname, uname) \
static __always_inline unsigned long kvm_##lname##_read(struct kvm_vcpu *vcpu)\
{ \
return vcpu->arch.regs[VCPU_REGS_##uname]; \
} \
static __always_inline void kvm_##lname##_write(struct kvm_vcpu *vcpu, \
unsigned long val) \
{ \
vcpu->arch.regs[VCPU_REGS_##uname] = val; \
}
BUILD_KVM_GPR_ACCESSORS(rax, RAX)
BUILD_KVM_GPR_ACCESSORS(rbx, RBX)
BUILD_KVM_GPR_ACCESSORS(rcx, RCX)
BUILD_KVM_GPR_ACCESSORS(rdx, RDX)
BUILD_KVM_GPR_ACCESSORS(rbp, RBP)
BUILD_KVM_GPR_ACCESSORS(rsi, RSI)
BUILD_KVM_GPR_ACCESSORS(rdi, RDI)
#ifdef CONFIG_X86_64
BUILD_KVM_GPR_ACCESSORS(r8, R8)
BUILD_KVM_GPR_ACCESSORS(r9, R9)
BUILD_KVM_GPR_ACCESSORS(r10, R10)
BUILD_KVM_GPR_ACCESSORS(r11, R11)
BUILD_KVM_GPR_ACCESSORS(r12, R12)
BUILD_KVM_GPR_ACCESSORS(r13, R13)
BUILD_KVM_GPR_ACCESSORS(r14, R14)
BUILD_KVM_GPR_ACCESSORS(r15, R15)
#endif
static inline unsigned long kvm_register_read(struct kvm_vcpu *vcpu,
enum kvm_reg reg)
{
@ -37,6 +65,16 @@ static inline void kvm_rip_write(struct kvm_vcpu *vcpu, unsigned long val)
kvm_register_write(vcpu, VCPU_REGS_RIP, val);
}
static inline unsigned long kvm_rsp_read(struct kvm_vcpu *vcpu)
{
return kvm_register_read(vcpu, VCPU_REGS_RSP);
}
static inline void kvm_rsp_write(struct kvm_vcpu *vcpu, unsigned long val)
{
kvm_register_write(vcpu, VCPU_REGS_RSP, val);
}
static inline u64 kvm_pdptr_read(struct kvm_vcpu *vcpu, int index)
{
might_sleep(); /* on svm */
@ -83,8 +121,8 @@ static inline ulong kvm_read_cr4(struct kvm_vcpu *vcpu)
static inline u64 kvm_read_edx_eax(struct kvm_vcpu *vcpu)
{
return (kvm_register_read(vcpu, VCPU_REGS_RAX) & -1u)
| ((u64)(kvm_register_read(vcpu, VCPU_REGS_RDX) & -1u) << 32);
return (kvm_rax_read(vcpu) & -1u)
| ((u64)(kvm_rdx_read(vcpu) & -1u) << 32);
}
static inline void enter_guest_mode(struct kvm_vcpu *vcpu)

View File

@ -1454,7 +1454,7 @@ static void apic_timer_expired(struct kvm_lapic *apic)
if (swait_active(q))
swake_up_one(q);
if (apic_lvtt_tscdeadline(apic))
if (apic_lvtt_tscdeadline(apic) || ktimer->hv_timer_in_use)
ktimer->expired_tscdeadline = ktimer->tscdeadline;
}
@ -1696,37 +1696,42 @@ static void cancel_hv_timer(struct kvm_lapic *apic)
static bool start_hv_timer(struct kvm_lapic *apic)
{
struct kvm_timer *ktimer = &apic->lapic_timer;
int r;
struct kvm_vcpu *vcpu = apic->vcpu;
bool expired;
WARN_ON(preemptible());
if (!kvm_x86_ops->set_hv_timer)
return false;
if (!apic_lvtt_period(apic) && atomic_read(&ktimer->pending))
return false;
if (!ktimer->tscdeadline)
return false;
r = kvm_x86_ops->set_hv_timer(apic->vcpu, ktimer->tscdeadline);
if (r < 0)
if (kvm_x86_ops->set_hv_timer(vcpu, ktimer->tscdeadline, &expired))
return false;
ktimer->hv_timer_in_use = true;
hrtimer_cancel(&ktimer->timer);
/*
* Also recheck ktimer->pending, in case the sw timer triggered in
* the window. For periodic timer, leave the hv timer running for
* simplicity, and the deadline will be recomputed on the next vmexit.
* To simplify handling the periodic timer, leave the hv timer running
* even if the deadline timer has expired, i.e. rely on the resulting
* VM-Exit to recompute the periodic timer's target expiration.
*/
if (!apic_lvtt_period(apic) && (r || atomic_read(&ktimer->pending))) {
if (r)
if (!apic_lvtt_period(apic)) {
/*
* Cancel the hv timer if the sw timer fired while the hv timer
* was being programmed, or if the hv timer itself expired.
*/
if (atomic_read(&ktimer->pending)) {
cancel_hv_timer(apic);
} else if (expired) {
apic_timer_expired(apic);
return false;
cancel_hv_timer(apic);
}
}
trace_kvm_hv_timer_state(apic->vcpu->vcpu_id, true);
trace_kvm_hv_timer_state(vcpu->vcpu_id, ktimer->hv_timer_in_use);
return true;
}
@ -1750,8 +1755,13 @@ static void start_sw_timer(struct kvm_lapic *apic)
static void restart_apic_timer(struct kvm_lapic *apic)
{
preempt_disable();
if (!apic_lvtt_period(apic) && atomic_read(&apic->lapic_timer.pending))
goto out;
if (!start_hv_timer(apic))
start_sw_timer(apic);
out:
preempt_enable();
}

View File

@ -44,6 +44,7 @@
#include <asm/page.h>
#include <asm/pat.h>
#include <asm/cmpxchg.h>
#include <asm/e820/api.h>
#include <asm/io.h>
#include <asm/vmx.h>
#include <asm/kvm_page_track.h>
@ -487,16 +488,24 @@ static void kvm_mmu_reset_all_pte_masks(void)
* If the CPU has 46 or less physical address bits, then set an
* appropriate mask to guard against L1TF attacks. Otherwise, it is
* assumed that the CPU is not vulnerable to L1TF.
*
* Some Intel CPUs address the L1 cache using more PA bits than are
* reported by CPUID. Use the PA width of the L1 cache when possible
* to achieve more effective mitigation, e.g. if system RAM overlaps
* the most significant bits of legal physical address space.
*/
low_phys_bits = boot_cpu_data.x86_phys_bits;
if (boot_cpu_data.x86_phys_bits <
shadow_nonpresent_or_rsvd_mask = 0;
low_phys_bits = boot_cpu_data.x86_cache_bits;
if (boot_cpu_data.x86_cache_bits <
52 - shadow_nonpresent_or_rsvd_mask_len) {
shadow_nonpresent_or_rsvd_mask =
rsvd_bits(boot_cpu_data.x86_phys_bits -
rsvd_bits(boot_cpu_data.x86_cache_bits -
shadow_nonpresent_or_rsvd_mask_len,
boot_cpu_data.x86_phys_bits - 1);
boot_cpu_data.x86_cache_bits - 1);
low_phys_bits -= shadow_nonpresent_or_rsvd_mask_len;
}
} else
WARN_ON_ONCE(boot_cpu_has_bug(X86_BUG_L1TF));
shadow_nonpresent_or_rsvd_lower_gfn_mask =
GENMASK_ULL(low_phys_bits - 1, PAGE_SHIFT);
}
@ -2892,7 +2901,9 @@ static bool kvm_is_mmio_pfn(kvm_pfn_t pfn)
*/
(!pat_enabled() || pat_pfn_immune_to_uc_mtrr(pfn));
return true;
return !e820__mapped_raw_any(pfn_to_hpa(pfn),
pfn_to_hpa(pfn + 1) - 1,
E820_TYPE_RAM);
}
/* Bits which may be returned by set_spte() */

View File

@ -48,11 +48,6 @@ static bool msr_mtrr_valid(unsigned msr)
return false;
}
static bool valid_pat_type(unsigned t)
{
return t < 8 && (1 << t) & 0xf3; /* 0, 1, 4, 5, 6, 7 */
}
static bool valid_mtrr_type(unsigned t)
{
return t < 8 && (1 << t) & 0x73; /* 0, 1, 4, 5, 6 */
@ -67,10 +62,7 @@ bool kvm_mtrr_valid(struct kvm_vcpu *vcpu, u32 msr, u64 data)
return false;
if (msr == MSR_IA32_CR_PAT) {
for (i = 0; i < 8; i++)
if (!valid_pat_type((data >> (i * 8)) & 0xff))
return false;
return true;
return kvm_pat_valid(data);
} else if (msr == MSR_MTRRdefType) {
if (data & ~0xcff)
return false;

View File

@ -141,15 +141,35 @@ static int FNAME(cmpxchg_gpte)(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
struct page *page;
npages = get_user_pages_fast((unsigned long)ptep_user, 1, FOLL_WRITE, &page);
/* Check if the user is doing something meaningless. */
if (unlikely(npages != 1))
return -EFAULT;
if (likely(npages == 1)) {
table = kmap_atomic(page);
ret = CMPXCHG(&table[index], orig_pte, new_pte);
kunmap_atomic(table);
table = kmap_atomic(page);
ret = CMPXCHG(&table[index], orig_pte, new_pte);
kunmap_atomic(table);
kvm_release_page_dirty(page);
} else {
struct vm_area_struct *vma;
unsigned long vaddr = (unsigned long)ptep_user & PAGE_MASK;
unsigned long pfn;
unsigned long paddr;
kvm_release_page_dirty(page);
down_read(&current->mm->mmap_sem);
vma = find_vma_intersection(current->mm, vaddr, vaddr + PAGE_SIZE);
if (!vma || !(vma->vm_flags & VM_PFNMAP)) {
up_read(&current->mm->mmap_sem);
return -EFAULT;
}
pfn = ((vaddr - vma->vm_start) >> PAGE_SHIFT) + vma->vm_pgoff;
paddr = pfn << PAGE_SHIFT;
table = memremap(paddr, PAGE_SIZE, MEMREMAP_WB);
if (!table) {
up_read(&current->mm->mmap_sem);
return -EFAULT;
}
ret = CMPXCHG(&table[index], orig_pte, new_pte);
memunmap(table);
up_read(&current->mm->mmap_sem);
}
return (ret != orig_pte);
}

View File

@ -2091,7 +2091,7 @@ static void svm_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event)
init_vmcb(svm);
kvm_cpuid(vcpu, &eax, &dummy, &dummy, &dummy, true);
kvm_register_write(vcpu, VCPU_REGS_RDX, eax);
kvm_rdx_write(vcpu, eax);
if (kvm_vcpu_apicv_active(vcpu) && !init_event)
avic_update_vapic_bar(svm, APIC_DEFAULT_PHYS_BASE);
@ -3071,32 +3071,6 @@ static inline bool nested_svm_nmi(struct vcpu_svm *svm)
return false;
}
static void *nested_svm_map(struct vcpu_svm *svm, u64 gpa, struct page **_page)
{
struct page *page;
might_sleep();
page = kvm_vcpu_gfn_to_page(&svm->vcpu, gpa >> PAGE_SHIFT);
if (is_error_page(page))
goto error;
*_page = page;
return kmap(page);
error:
kvm_inject_gp(&svm->vcpu, 0);
return NULL;
}
static void nested_svm_unmap(struct page *page)
{
kunmap(page);
kvm_release_page_dirty(page);
}
static int nested_svm_intercept_ioio(struct vcpu_svm *svm)
{
unsigned port, size, iopm_len;
@ -3299,10 +3273,11 @@ static inline void copy_vmcb_control_area(struct vmcb *dst_vmcb, struct vmcb *fr
static int nested_svm_vmexit(struct vcpu_svm *svm)
{
int rc;
struct vmcb *nested_vmcb;
struct vmcb *hsave = svm->nested.hsave;
struct vmcb *vmcb = svm->vmcb;
struct page *page;
struct kvm_host_map map;
trace_kvm_nested_vmexit_inject(vmcb->control.exit_code,
vmcb->control.exit_info_1,
@ -3311,9 +3286,14 @@ static int nested_svm_vmexit(struct vcpu_svm *svm)
vmcb->control.exit_int_info_err,
KVM_ISA_SVM);
nested_vmcb = nested_svm_map(svm, svm->nested.vmcb, &page);
if (!nested_vmcb)
rc = kvm_vcpu_map(&svm->vcpu, gfn_to_gpa(svm->nested.vmcb), &map);
if (rc) {
if (rc == -EINVAL)
kvm_inject_gp(&svm->vcpu, 0);
return 1;
}
nested_vmcb = map.hva;
/* Exit Guest-Mode */
leave_guest_mode(&svm->vcpu);
@ -3408,16 +3388,16 @@ static int nested_svm_vmexit(struct vcpu_svm *svm)
} else {
(void)kvm_set_cr3(&svm->vcpu, hsave->save.cr3);
}
kvm_register_write(&svm->vcpu, VCPU_REGS_RAX, hsave->save.rax);
kvm_register_write(&svm->vcpu, VCPU_REGS_RSP, hsave->save.rsp);
kvm_register_write(&svm->vcpu, VCPU_REGS_RIP, hsave->save.rip);
kvm_rax_write(&svm->vcpu, hsave->save.rax);
kvm_rsp_write(&svm->vcpu, hsave->save.rsp);
kvm_rip_write(&svm->vcpu, hsave->save.rip);
svm->vmcb->save.dr7 = 0;
svm->vmcb->save.cpl = 0;
svm->vmcb->control.exit_int_info = 0;
mark_all_dirty(svm->vmcb);
nested_svm_unmap(page);
kvm_vcpu_unmap(&svm->vcpu, &map, true);
nested_svm_uninit_mmu_context(&svm->vcpu);
kvm_mmu_reset_context(&svm->vcpu);
@ -3483,7 +3463,7 @@ static bool nested_vmcb_checks(struct vmcb *vmcb)
}
static void enter_svm_guest_mode(struct vcpu_svm *svm, u64 vmcb_gpa,
struct vmcb *nested_vmcb, struct page *page)
struct vmcb *nested_vmcb, struct kvm_host_map *map)
{
if (kvm_get_rflags(&svm->vcpu) & X86_EFLAGS_IF)
svm->vcpu.arch.hflags |= HF_HIF_MASK;
@ -3516,9 +3496,9 @@ static void enter_svm_guest_mode(struct vcpu_svm *svm, u64 vmcb_gpa,
kvm_mmu_reset_context(&svm->vcpu);
svm->vmcb->save.cr2 = svm->vcpu.arch.cr2 = nested_vmcb->save.cr2;
kvm_register_write(&svm->vcpu, VCPU_REGS_RAX, nested_vmcb->save.rax);
kvm_register_write(&svm->vcpu, VCPU_REGS_RSP, nested_vmcb->save.rsp);
kvm_register_write(&svm->vcpu, VCPU_REGS_RIP, nested_vmcb->save.rip);
kvm_rax_write(&svm->vcpu, nested_vmcb->save.rax);
kvm_rsp_write(&svm->vcpu, nested_vmcb->save.rsp);
kvm_rip_write(&svm->vcpu, nested_vmcb->save.rip);
/* In case we don't even reach vcpu_run, the fields are not updated */
svm->vmcb->save.rax = nested_vmcb->save.rax;
@ -3567,7 +3547,7 @@ static void enter_svm_guest_mode(struct vcpu_svm *svm, u64 vmcb_gpa,
svm->vmcb->control.pause_filter_thresh =
nested_vmcb->control.pause_filter_thresh;
nested_svm_unmap(page);
kvm_vcpu_unmap(&svm->vcpu, map, true);
/* Enter Guest-Mode */
enter_guest_mode(&svm->vcpu);
@ -3587,17 +3567,23 @@ static void enter_svm_guest_mode(struct vcpu_svm *svm, u64 vmcb_gpa,
static bool nested_svm_vmrun(struct vcpu_svm *svm)
{
int rc;
struct vmcb *nested_vmcb;
struct vmcb *hsave = svm->nested.hsave;
struct vmcb *vmcb = svm->vmcb;
struct page *page;
struct kvm_host_map map;
u64 vmcb_gpa;
vmcb_gpa = svm->vmcb->save.rax;
nested_vmcb = nested_svm_map(svm, svm->vmcb->save.rax, &page);
if (!nested_vmcb)
rc = kvm_vcpu_map(&svm->vcpu, gfn_to_gpa(vmcb_gpa), &map);
if (rc) {
if (rc == -EINVAL)
kvm_inject_gp(&svm->vcpu, 0);
return false;
}
nested_vmcb = map.hva;
if (!nested_vmcb_checks(nested_vmcb)) {
nested_vmcb->control.exit_code = SVM_EXIT_ERR;
@ -3605,7 +3591,7 @@ static bool nested_svm_vmrun(struct vcpu_svm *svm)
nested_vmcb->control.exit_info_1 = 0;
nested_vmcb->control.exit_info_2 = 0;
nested_svm_unmap(page);
kvm_vcpu_unmap(&svm->vcpu, &map, true);
return false;
}
@ -3649,7 +3635,7 @@ static bool nested_svm_vmrun(struct vcpu_svm *svm)
copy_vmcb_control_area(hsave, vmcb);
enter_svm_guest_mode(svm, vmcb_gpa, nested_vmcb, page);
enter_svm_guest_mode(svm, vmcb_gpa, nested_vmcb, &map);
return true;
}
@ -3673,21 +3659,26 @@ static void nested_svm_vmloadsave(struct vmcb *from_vmcb, struct vmcb *to_vmcb)
static int vmload_interception(struct vcpu_svm *svm)
{
struct vmcb *nested_vmcb;
struct page *page;
struct kvm_host_map map;
int ret;
if (nested_svm_check_permissions(svm))
return 1;
nested_vmcb = nested_svm_map(svm, svm->vmcb->save.rax, &page);
if (!nested_vmcb)
ret = kvm_vcpu_map(&svm->vcpu, gpa_to_gfn(svm->vmcb->save.rax), &map);
if (ret) {
if (ret == -EINVAL)
kvm_inject_gp(&svm->vcpu, 0);
return 1;
}
nested_vmcb = map.hva;
svm->next_rip = kvm_rip_read(&svm->vcpu) + 3;
ret = kvm_skip_emulated_instruction(&svm->vcpu);
nested_svm_vmloadsave(nested_vmcb, svm->vmcb);
nested_svm_unmap(page);
kvm_vcpu_unmap(&svm->vcpu, &map, true);
return ret;
}
@ -3695,21 +3686,26 @@ static int vmload_interception(struct vcpu_svm *svm)
static int vmsave_interception(struct vcpu_svm *svm)
{
struct vmcb *nested_vmcb;
struct page *page;
struct kvm_host_map map;
int ret;
if (nested_svm_check_permissions(svm))
return 1;
nested_vmcb = nested_svm_map(svm, svm->vmcb->save.rax, &page);
if (!nested_vmcb)
ret = kvm_vcpu_map(&svm->vcpu, gpa_to_gfn(svm->vmcb->save.rax), &map);
if (ret) {
if (ret == -EINVAL)
kvm_inject_gp(&svm->vcpu, 0);
return 1;
}
nested_vmcb = map.hva;
svm->next_rip = kvm_rip_read(&svm->vcpu) + 3;
ret = kvm_skip_emulated_instruction(&svm->vcpu);
nested_svm_vmloadsave(svm->vmcb, nested_vmcb);
nested_svm_unmap(page);
kvm_vcpu_unmap(&svm->vcpu, &map, true);
return ret;
}
@ -3791,11 +3787,11 @@ static int invlpga_interception(struct vcpu_svm *svm)
{
struct kvm_vcpu *vcpu = &svm->vcpu;
trace_kvm_invlpga(svm->vmcb->save.rip, kvm_register_read(&svm->vcpu, VCPU_REGS_RCX),
kvm_register_read(&svm->vcpu, VCPU_REGS_RAX));
trace_kvm_invlpga(svm->vmcb->save.rip, kvm_rcx_read(&svm->vcpu),
kvm_rax_read(&svm->vcpu));
/* Let's treat INVLPGA the same as INVLPG (can be optimized!) */
kvm_mmu_invlpg(vcpu, kvm_register_read(&svm->vcpu, VCPU_REGS_RAX));
kvm_mmu_invlpg(vcpu, kvm_rax_read(&svm->vcpu));
svm->next_rip = kvm_rip_read(&svm->vcpu) + 3;
return kvm_skip_emulated_instruction(&svm->vcpu);
@ -3803,7 +3799,7 @@ static int invlpga_interception(struct vcpu_svm *svm)
static int skinit_interception(struct vcpu_svm *svm)
{
trace_kvm_skinit(svm->vmcb->save.rip, kvm_register_read(&svm->vcpu, VCPU_REGS_RAX));
trace_kvm_skinit(svm->vmcb->save.rip, kvm_rax_read(&svm->vcpu));
kvm_queue_exception(&svm->vcpu, UD_VECTOR);
return 1;
@ -3817,7 +3813,7 @@ static int wbinvd_interception(struct vcpu_svm *svm)
static int xsetbv_interception(struct vcpu_svm *svm)
{
u64 new_bv = kvm_read_edx_eax(&svm->vcpu);
u32 index = kvm_register_read(&svm->vcpu, VCPU_REGS_RCX);
u32 index = kvm_rcx_read(&svm->vcpu);
if (kvm_set_xcr(&svm->vcpu, index, new_bv) == 0) {
svm->next_rip = kvm_rip_read(&svm->vcpu) + 3;
@ -4213,7 +4209,7 @@ static int svm_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
static int rdmsr_interception(struct vcpu_svm *svm)
{
u32 ecx = kvm_register_read(&svm->vcpu, VCPU_REGS_RCX);
u32 ecx = kvm_rcx_read(&svm->vcpu);
struct msr_data msr_info;
msr_info.index = ecx;
@ -4225,10 +4221,8 @@ static int rdmsr_interception(struct vcpu_svm *svm)
} else {
trace_kvm_msr_read(ecx, msr_info.data);
kvm_register_write(&svm->vcpu, VCPU_REGS_RAX,
msr_info.data & 0xffffffff);
kvm_register_write(&svm->vcpu, VCPU_REGS_RDX,
msr_info.data >> 32);
kvm_rax_write(&svm->vcpu, msr_info.data & 0xffffffff);
kvm_rdx_write(&svm->vcpu, msr_info.data >> 32);
svm->next_rip = kvm_rip_read(&svm->vcpu) + 2;
return kvm_skip_emulated_instruction(&svm->vcpu);
}
@ -4422,7 +4416,7 @@ static int svm_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr)
static int wrmsr_interception(struct vcpu_svm *svm)
{
struct msr_data msr;
u32 ecx = kvm_register_read(&svm->vcpu, VCPU_REGS_RCX);
u32 ecx = kvm_rcx_read(&svm->vcpu);
u64 data = kvm_read_edx_eax(&svm->vcpu);
msr.data = data;
@ -6236,7 +6230,7 @@ static int svm_pre_leave_smm(struct kvm_vcpu *vcpu, const char *smstate)
{
struct vcpu_svm *svm = to_svm(vcpu);
struct vmcb *nested_vmcb;
struct page *page;
struct kvm_host_map map;
u64 guest;
u64 vmcb;
@ -6244,10 +6238,10 @@ static int svm_pre_leave_smm(struct kvm_vcpu *vcpu, const char *smstate)
vmcb = GET_SMSTATE(u64, smstate, 0x7ee0);
if (guest) {
nested_vmcb = nested_svm_map(svm, vmcb, &page);
if (!nested_vmcb)
if (kvm_vcpu_map(&svm->vcpu, gpa_to_gfn(vmcb), &map) == -EINVAL)
return 1;
enter_svm_guest_mode(svm, vmcb, nested_vmcb, page);
nested_vmcb = map.hva;
enter_svm_guest_mode(svm, vmcb, nested_vmcb, &map);
}
return 0;
}

View File

@ -2,6 +2,8 @@
#ifndef __KVM_X86_VMX_CAPS_H
#define __KVM_X86_VMX_CAPS_H
#include <asm/vmx.h>
#include "lapic.h"
extern bool __read_mostly enable_vpid;

View File

@ -193,10 +193,8 @@ static inline void nested_release_evmcs(struct kvm_vcpu *vcpu)
if (!vmx->nested.hv_evmcs)
return;
kunmap(vmx->nested.hv_evmcs_page);
kvm_release_page_dirty(vmx->nested.hv_evmcs_page);
kvm_vcpu_unmap(vcpu, &vmx->nested.hv_evmcs_map, true);
vmx->nested.hv_evmcs_vmptr = -1ull;
vmx->nested.hv_evmcs_page = NULL;
vmx->nested.hv_evmcs = NULL;
}
@ -229,16 +227,9 @@ static void free_nested(struct kvm_vcpu *vcpu)
kvm_release_page_dirty(vmx->nested.apic_access_page);
vmx->nested.apic_access_page = NULL;
}
if (vmx->nested.virtual_apic_page) {
kvm_release_page_dirty(vmx->nested.virtual_apic_page);
vmx->nested.virtual_apic_page = NULL;
}
if (vmx->nested.pi_desc_page) {
kunmap(vmx->nested.pi_desc_page);
kvm_release_page_dirty(vmx->nested.pi_desc_page);
vmx->nested.pi_desc_page = NULL;
vmx->nested.pi_desc = NULL;
}
kvm_vcpu_unmap(vcpu, &vmx->nested.virtual_apic_map, true);
kvm_vcpu_unmap(vcpu, &vmx->nested.pi_desc_map, true);
vmx->nested.pi_desc = NULL;
kvm_mmu_free_roots(vcpu, &vcpu->arch.guest_mmu, KVM_MMU_ROOTS_ALL);
@ -519,39 +510,19 @@ static inline bool nested_vmx_prepare_msr_bitmap(struct kvm_vcpu *vcpu,
struct vmcs12 *vmcs12)
{
int msr;
struct page *page;
unsigned long *msr_bitmap_l1;
unsigned long *msr_bitmap_l0 = to_vmx(vcpu)->nested.vmcs02.msr_bitmap;
/*
* pred_cmd & spec_ctrl are trying to verify two things:
*
* 1. L0 gave a permission to L1 to actually passthrough the MSR. This
* ensures that we do not accidentally generate an L02 MSR bitmap
* from the L12 MSR bitmap that is too permissive.
* 2. That L1 or L2s have actually used the MSR. This avoids
* unnecessarily merging of the bitmap if the MSR is unused. This
* works properly because we only update the L01 MSR bitmap lazily.
* So even if L0 should pass L1 these MSRs, the L01 bitmap is only
* updated to reflect this when L1 (or its L2s) actually write to
* the MSR.
*/
bool pred_cmd = !msr_write_intercepted_l01(vcpu, MSR_IA32_PRED_CMD);
bool spec_ctrl = !msr_write_intercepted_l01(vcpu, MSR_IA32_SPEC_CTRL);
struct kvm_host_map *map = &to_vmx(vcpu)->nested.msr_bitmap_map;
/* Nothing to do if the MSR bitmap is not in use. */
if (!cpu_has_vmx_msr_bitmap() ||
!nested_cpu_has(vmcs12, CPU_BASED_USE_MSR_BITMAPS))
return false;
if (!nested_cpu_has_virt_x2apic_mode(vmcs12) &&
!pred_cmd && !spec_ctrl)
if (kvm_vcpu_map(vcpu, gpa_to_gfn(vmcs12->msr_bitmap), map))
return false;
page = kvm_vcpu_gpa_to_page(vcpu, vmcs12->msr_bitmap);
if (is_error_page(page))
return false;
msr_bitmap_l1 = (unsigned long *)kmap(page);
msr_bitmap_l1 = (unsigned long *)map->hva;
/*
* To keep the control flow simple, pay eight 8-byte writes (sixteen
@ -592,20 +563,42 @@ static inline bool nested_vmx_prepare_msr_bitmap(struct kvm_vcpu *vcpu,
}
}
if (spec_ctrl)
/* KVM unconditionally exposes the FS/GS base MSRs to L1. */
nested_vmx_disable_intercept_for_msr(msr_bitmap_l1, msr_bitmap_l0,
MSR_FS_BASE, MSR_TYPE_RW);
nested_vmx_disable_intercept_for_msr(msr_bitmap_l1, msr_bitmap_l0,
MSR_GS_BASE, MSR_TYPE_RW);
nested_vmx_disable_intercept_for_msr(msr_bitmap_l1, msr_bitmap_l0,
MSR_KERNEL_GS_BASE, MSR_TYPE_RW);
/*
* Checking the L0->L1 bitmap is trying to verify two things:
*
* 1. L0 gave a permission to L1 to actually passthrough the MSR. This
* ensures that we do not accidentally generate an L02 MSR bitmap
* from the L12 MSR bitmap that is too permissive.
* 2. That L1 or L2s have actually used the MSR. This avoids
* unnecessarily merging of the bitmap if the MSR is unused. This
* works properly because we only update the L01 MSR bitmap lazily.
* So even if L0 should pass L1 these MSRs, the L01 bitmap is only
* updated to reflect this when L1 (or its L2s) actually write to
* the MSR.
*/
if (!msr_write_intercepted_l01(vcpu, MSR_IA32_SPEC_CTRL))
nested_vmx_disable_intercept_for_msr(
msr_bitmap_l1, msr_bitmap_l0,
MSR_IA32_SPEC_CTRL,
MSR_TYPE_R | MSR_TYPE_W);
if (pred_cmd)
if (!msr_write_intercepted_l01(vcpu, MSR_IA32_PRED_CMD))
nested_vmx_disable_intercept_for_msr(
msr_bitmap_l1, msr_bitmap_l0,
MSR_IA32_PRED_CMD,
MSR_TYPE_W);
kunmap(page);
kvm_release_page_clean(page);
kvm_vcpu_unmap(vcpu, &to_vmx(vcpu)->nested.msr_bitmap_map, false);
return true;
}
@ -613,20 +606,20 @@ static inline bool nested_vmx_prepare_msr_bitmap(struct kvm_vcpu *vcpu,
static void nested_cache_shadow_vmcs12(struct kvm_vcpu *vcpu,
struct vmcs12 *vmcs12)
{
struct kvm_host_map map;
struct vmcs12 *shadow;
struct page *page;
if (!nested_cpu_has_shadow_vmcs(vmcs12) ||
vmcs12->vmcs_link_pointer == -1ull)
return;
shadow = get_shadow_vmcs12(vcpu);
page = kvm_vcpu_gpa_to_page(vcpu, vmcs12->vmcs_link_pointer);
memcpy(shadow, kmap(page), VMCS12_SIZE);
if (kvm_vcpu_map(vcpu, gpa_to_gfn(vmcs12->vmcs_link_pointer), &map))
return;
kunmap(page);
kvm_release_page_clean(page);
memcpy(shadow, map.hva, VMCS12_SIZE);
kvm_vcpu_unmap(vcpu, &map, false);
}
static void nested_flush_cached_shadow_vmcs12(struct kvm_vcpu *vcpu,
@ -930,7 +923,7 @@ static int nested_vmx_load_cr3(struct kvm_vcpu *vcpu, unsigned long cr3, bool ne
if (cr3 != kvm_read_cr3(vcpu) || (!nested_ept && pdptrs_changed(vcpu))) {
if (!nested_cr3_valid(vcpu, cr3)) {
*entry_failure_code = ENTRY_FAIL_DEFAULT;
return 1;
return -EINVAL;
}
/*
@ -941,7 +934,7 @@ static int nested_vmx_load_cr3(struct kvm_vcpu *vcpu, unsigned long cr3, bool ne
!nested_ept) {
if (!load_pdptrs(vcpu, vcpu->arch.walk_mmu, cr3)) {
*entry_failure_code = ENTRY_FAIL_PDPTE;
return 1;
return -EINVAL;
}
}
}
@ -1794,13 +1787,11 @@ static int nested_vmx_handle_enlightened_vmptrld(struct kvm_vcpu *vcpu,
nested_release_evmcs(vcpu);
vmx->nested.hv_evmcs_page = kvm_vcpu_gpa_to_page(
vcpu, assist_page.current_nested_vmcs);
if (unlikely(is_error_page(vmx->nested.hv_evmcs_page)))
if (kvm_vcpu_map(vcpu, gpa_to_gfn(assist_page.current_nested_vmcs),
&vmx->nested.hv_evmcs_map))
return 0;
vmx->nested.hv_evmcs = kmap(vmx->nested.hv_evmcs_page);
vmx->nested.hv_evmcs = vmx->nested.hv_evmcs_map.hva;
/*
* Currently, KVM only supports eVMCS version 1
@ -2373,19 +2364,19 @@ static int prepare_vmcs02(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12,
*/
if (vmx->emulation_required) {
*entry_failure_code = ENTRY_FAIL_DEFAULT;
return 1;
return -EINVAL;
}
/* Shadow page tables on either EPT or shadow page tables. */
if (nested_vmx_load_cr3(vcpu, vmcs12->guest_cr3, nested_cpu_has_ept(vmcs12),
entry_failure_code))
return 1;
return -EINVAL;
if (!enable_ept)
vcpu->arch.walk_mmu->inject_page_fault = vmx_inject_page_fault_nested;
kvm_register_write(vcpu, VCPU_REGS_RSP, vmcs12->guest_rsp);
kvm_register_write(vcpu, VCPU_REGS_RIP, vmcs12->guest_rip);
kvm_rsp_write(vcpu, vmcs12->guest_rsp);
kvm_rip_write(vcpu, vmcs12->guest_rip);
return 0;
}
@ -2589,11 +2580,19 @@ static int nested_check_vm_entry_controls(struct kvm_vcpu *vcpu,
return 0;
}
/*
* Checks related to Host Control Registers and MSRs
*/
static int nested_check_host_control_regs(struct kvm_vcpu *vcpu,
struct vmcs12 *vmcs12)
static int nested_vmx_check_controls(struct kvm_vcpu *vcpu,
struct vmcs12 *vmcs12)
{
if (nested_check_vm_execution_controls(vcpu, vmcs12) ||
nested_check_vm_exit_controls(vcpu, vmcs12) ||
nested_check_vm_entry_controls(vcpu, vmcs12))
return -EINVAL;
return 0;
}
static int nested_vmx_check_host_state(struct kvm_vcpu *vcpu,
struct vmcs12 *vmcs12)
{
bool ia32e;
@ -2606,6 +2605,10 @@ static int nested_check_host_control_regs(struct kvm_vcpu *vcpu,
is_noncanonical_address(vmcs12->host_ia32_sysenter_eip, vcpu))
return -EINVAL;
if ((vmcs12->vm_exit_controls & VM_EXIT_LOAD_IA32_PAT) &&
!kvm_pat_valid(vmcs12->host_ia32_pat))
return -EINVAL;
/*
* If the load IA32_EFER VM-exit control is 1, bits reserved in the
* IA32_EFER MSR must be 0 in the field for that register. In addition,
@ -2624,6 +2627,32 @@ static int nested_check_host_control_regs(struct kvm_vcpu *vcpu,
return 0;
}
static int nested_vmx_check_vmcs_link_ptr(struct kvm_vcpu *vcpu,
struct vmcs12 *vmcs12)
{
int r = 0;
struct vmcs12 *shadow;
struct kvm_host_map map;
if (vmcs12->vmcs_link_pointer == -1ull)
return 0;
if (!page_address_valid(vcpu, vmcs12->vmcs_link_pointer))
return -EINVAL;
if (kvm_vcpu_map(vcpu, gpa_to_gfn(vmcs12->vmcs_link_pointer), &map))
return -EINVAL;
shadow = map.hva;
if (shadow->hdr.revision_id != VMCS12_REVISION ||
shadow->hdr.shadow_vmcs != nested_cpu_has_shadow_vmcs(vmcs12))
r = -EINVAL;
kvm_vcpu_unmap(vcpu, &map, false);
return r;
}
/*
* Checks related to Guest Non-register State
*/
@ -2636,53 +2665,9 @@ static int nested_check_guest_non_reg_state(struct vmcs12 *vmcs12)
return 0;
}
static int nested_vmx_check_vmentry_prereqs(struct kvm_vcpu *vcpu,
struct vmcs12 *vmcs12)
{
if (nested_check_vm_execution_controls(vcpu, vmcs12) ||
nested_check_vm_exit_controls(vcpu, vmcs12) ||
nested_check_vm_entry_controls(vcpu, vmcs12))
return VMXERR_ENTRY_INVALID_CONTROL_FIELD;
if (nested_check_host_control_regs(vcpu, vmcs12))
return VMXERR_ENTRY_INVALID_HOST_STATE_FIELD;
if (nested_check_guest_non_reg_state(vmcs12))
return VMXERR_ENTRY_INVALID_CONTROL_FIELD;
return 0;
}
static int nested_vmx_check_vmcs_link_ptr(struct kvm_vcpu *vcpu,
struct vmcs12 *vmcs12)
{
int r;
struct page *page;
struct vmcs12 *shadow;
if (vmcs12->vmcs_link_pointer == -1ull)
return 0;
if (!page_address_valid(vcpu, vmcs12->vmcs_link_pointer))
return -EINVAL;
page = kvm_vcpu_gpa_to_page(vcpu, vmcs12->vmcs_link_pointer);
if (is_error_page(page))
return -EINVAL;
r = 0;
shadow = kmap(page);
if (shadow->hdr.revision_id != VMCS12_REVISION ||
shadow->hdr.shadow_vmcs != nested_cpu_has_shadow_vmcs(vmcs12))
r = -EINVAL;
kunmap(page);
kvm_release_page_clean(page);
return r;
}
static int nested_vmx_check_vmentry_postreqs(struct kvm_vcpu *vcpu,
struct vmcs12 *vmcs12,
u32 *exit_qual)
static int nested_vmx_check_guest_state(struct kvm_vcpu *vcpu,
struct vmcs12 *vmcs12,
u32 *exit_qual)
{
bool ia32e;
@ -2690,11 +2675,15 @@ static int nested_vmx_check_vmentry_postreqs(struct kvm_vcpu *vcpu,
if (!nested_guest_cr0_valid(vcpu, vmcs12->guest_cr0) ||
!nested_guest_cr4_valid(vcpu, vmcs12->guest_cr4))
return 1;
return -EINVAL;
if ((vmcs12->vm_entry_controls & VM_ENTRY_LOAD_IA32_PAT) &&
!kvm_pat_valid(vmcs12->guest_ia32_pat))
return -EINVAL;
if (nested_vmx_check_vmcs_link_ptr(vcpu, vmcs12)) {
*exit_qual = ENTRY_FAIL_VMCS_LINK_PTR;
return 1;
return -EINVAL;
}
/*
@ -2713,13 +2702,16 @@ static int nested_vmx_check_vmentry_postreqs(struct kvm_vcpu *vcpu,
ia32e != !!(vmcs12->guest_ia32_efer & EFER_LMA) ||
((vmcs12->guest_cr0 & X86_CR0_PG) &&
ia32e != !!(vmcs12->guest_ia32_efer & EFER_LME)))
return 1;
return -EINVAL;
}
if ((vmcs12->vm_entry_controls & VM_ENTRY_LOAD_BNDCFGS) &&
(is_noncanonical_address(vmcs12->guest_bndcfgs & PAGE_MASK, vcpu) ||
(vmcs12->guest_bndcfgs & MSR_IA32_BNDCFGS_RSVD)))
return 1;
(is_noncanonical_address(vmcs12->guest_bndcfgs & PAGE_MASK, vcpu) ||
(vmcs12->guest_bndcfgs & MSR_IA32_BNDCFGS_RSVD)))
return -EINVAL;
if (nested_check_guest_non_reg_state(vmcs12))
return -EINVAL;
return 0;
}
@ -2832,6 +2824,7 @@ static void nested_get_vmcs12_pages(struct kvm_vcpu *vcpu)
{
struct vmcs12 *vmcs12 = get_vmcs12(vcpu);
struct vcpu_vmx *vmx = to_vmx(vcpu);
struct kvm_host_map *map;
struct page *page;
u64 hpa;
@ -2864,20 +2857,14 @@ static void nested_get_vmcs12_pages(struct kvm_vcpu *vcpu)
}
if (nested_cpu_has(vmcs12, CPU_BASED_TPR_SHADOW)) {
if (vmx->nested.virtual_apic_page) { /* shouldn't happen */
kvm_release_page_dirty(vmx->nested.virtual_apic_page);
vmx->nested.virtual_apic_page = NULL;
}
page = kvm_vcpu_gpa_to_page(vcpu, vmcs12->virtual_apic_page_addr);
map = &vmx->nested.virtual_apic_map;
/*
* If translation failed, VM entry will fail because
* prepare_vmcs02 set VIRTUAL_APIC_PAGE_ADDR to -1ull.
*/
if (!is_error_page(page)) {
vmx->nested.virtual_apic_page = page;
hpa = page_to_phys(vmx->nested.virtual_apic_page);
vmcs_write64(VIRTUAL_APIC_PAGE_ADDR, hpa);
if (!kvm_vcpu_map(vcpu, gpa_to_gfn(vmcs12->virtual_apic_page_addr), map)) {
vmcs_write64(VIRTUAL_APIC_PAGE_ADDR, pfn_to_hpa(map->pfn));
} else if (nested_cpu_has(vmcs12, CPU_BASED_CR8_LOAD_EXITING) &&
nested_cpu_has(vmcs12, CPU_BASED_CR8_STORE_EXITING) &&
!nested_cpu_has2(vmcs12, SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES)) {
@ -2898,26 +2885,15 @@ static void nested_get_vmcs12_pages(struct kvm_vcpu *vcpu)
}
if (nested_cpu_has_posted_intr(vmcs12)) {
if (vmx->nested.pi_desc_page) { /* shouldn't happen */
kunmap(vmx->nested.pi_desc_page);
kvm_release_page_dirty(vmx->nested.pi_desc_page);
vmx->nested.pi_desc_page = NULL;
vmx->nested.pi_desc = NULL;
vmcs_write64(POSTED_INTR_DESC_ADDR, -1ull);
map = &vmx->nested.pi_desc_map;
if (!kvm_vcpu_map(vcpu, gpa_to_gfn(vmcs12->posted_intr_desc_addr), map)) {
vmx->nested.pi_desc =
(struct pi_desc *)(((void *)map->hva) +
offset_in_page(vmcs12->posted_intr_desc_addr));
vmcs_write64(POSTED_INTR_DESC_ADDR,
pfn_to_hpa(map->pfn) + offset_in_page(vmcs12->posted_intr_desc_addr));
}
page = kvm_vcpu_gpa_to_page(vcpu, vmcs12->posted_intr_desc_addr);
if (is_error_page(page))
return;
vmx->nested.pi_desc_page = page;
vmx->nested.pi_desc = kmap(vmx->nested.pi_desc_page);
vmx->nested.pi_desc =
(struct pi_desc *)((void *)vmx->nested.pi_desc +
(unsigned long)(vmcs12->posted_intr_desc_addr &
(PAGE_SIZE - 1)));
vmcs_write64(POSTED_INTR_DESC_ADDR,
page_to_phys(vmx->nested.pi_desc_page) +
(unsigned long)(vmcs12->posted_intr_desc_addr &
(PAGE_SIZE - 1)));
}
if (nested_vmx_prepare_msr_bitmap(vcpu, vmcs12))
vmcs_set_bits(CPU_BASED_VM_EXEC_CONTROL,
@ -3000,7 +2976,7 @@ int nested_vmx_enter_non_root_mode(struct kvm_vcpu *vcpu, bool from_vmentry)
return -1;
}
if (nested_vmx_check_vmentry_postreqs(vcpu, vmcs12, &exit_qual))
if (nested_vmx_check_guest_state(vcpu, vmcs12, &exit_qual))
goto vmentry_fail_vmexit;
}
@ -3145,9 +3121,11 @@ static int nested_vmx_run(struct kvm_vcpu *vcpu, bool launch)
launch ? VMXERR_VMLAUNCH_NONCLEAR_VMCS
: VMXERR_VMRESUME_NONLAUNCHED_VMCS);
ret = nested_vmx_check_vmentry_prereqs(vcpu, vmcs12);
if (ret)
return nested_vmx_failValid(vcpu, ret);
if (nested_vmx_check_controls(vcpu, vmcs12))
return nested_vmx_failValid(vcpu, VMXERR_ENTRY_INVALID_CONTROL_FIELD);
if (nested_vmx_check_host_state(vcpu, vmcs12))
return nested_vmx_failValid(vcpu, VMXERR_ENTRY_INVALID_HOST_STATE_FIELD);
/*
* We're finally done with prerequisite checking, and can start with
@ -3310,11 +3288,12 @@ static void vmx_complete_nested_posted_interrupt(struct kvm_vcpu *vcpu)
max_irr = find_last_bit((unsigned long *)vmx->nested.pi_desc->pir, 256);
if (max_irr != 256) {
vapic_page = kmap(vmx->nested.virtual_apic_page);
vapic_page = vmx->nested.virtual_apic_map.hva;
if (!vapic_page)
return;
__kvm_apic_update_irr(vmx->nested.pi_desc->pir,
vapic_page, &max_irr);
kunmap(vmx->nested.virtual_apic_page);
status = vmcs_read16(GUEST_INTR_STATUS);
if ((u8)max_irr > ((u8)status & 0xff)) {
status &= ~0xff;
@ -3425,8 +3404,8 @@ static void sync_vmcs12(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12)
vmcs12->guest_cr0 = vmcs12_guest_cr0(vcpu, vmcs12);
vmcs12->guest_cr4 = vmcs12_guest_cr4(vcpu, vmcs12);
vmcs12->guest_rsp = kvm_register_read(vcpu, VCPU_REGS_RSP);
vmcs12->guest_rip = kvm_register_read(vcpu, VCPU_REGS_RIP);
vmcs12->guest_rsp = kvm_rsp_read(vcpu);
vmcs12->guest_rip = kvm_rip_read(vcpu);
vmcs12->guest_rflags = vmcs_readl(GUEST_RFLAGS);
vmcs12->guest_es_selector = vmcs_read16(GUEST_ES_SELECTOR);
@ -3609,8 +3588,8 @@ static void load_vmcs12_host_state(struct kvm_vcpu *vcpu,
vcpu->arch.efer &= ~(EFER_LMA | EFER_LME);
vmx_set_efer(vcpu, vcpu->arch.efer);
kvm_register_write(vcpu, VCPU_REGS_RSP, vmcs12->host_rsp);
kvm_register_write(vcpu, VCPU_REGS_RIP, vmcs12->host_rip);
kvm_rsp_write(vcpu, vmcs12->host_rsp);
kvm_rip_write(vcpu, vmcs12->host_rip);
vmx_set_rflags(vcpu, X86_EFLAGS_FIXED);
vmx_set_interrupt_shadow(vcpu, 0);
@ -3955,16 +3934,9 @@ void nested_vmx_vmexit(struct kvm_vcpu *vcpu, u32 exit_reason,
kvm_release_page_dirty(vmx->nested.apic_access_page);
vmx->nested.apic_access_page = NULL;
}
if (vmx->nested.virtual_apic_page) {
kvm_release_page_dirty(vmx->nested.virtual_apic_page);
vmx->nested.virtual_apic_page = NULL;
}
if (vmx->nested.pi_desc_page) {
kunmap(vmx->nested.pi_desc_page);
kvm_release_page_dirty(vmx->nested.pi_desc_page);
vmx->nested.pi_desc_page = NULL;
vmx->nested.pi_desc = NULL;
}
kvm_vcpu_unmap(vcpu, &vmx->nested.virtual_apic_map, true);
kvm_vcpu_unmap(vcpu, &vmx->nested.pi_desc_map, true);
vmx->nested.pi_desc = NULL;
/*
* We are now running in L2, mmu_notifier will force to reload the
@ -4260,7 +4232,7 @@ static int handle_vmon(struct kvm_vcpu *vcpu)
{
int ret;
gpa_t vmptr;
struct page *page;
uint32_t revision;
struct vcpu_vmx *vmx = to_vmx(vcpu);
const u64 VMXON_NEEDED_FEATURES = FEATURE_CONTROL_LOCKED
| FEATURE_CONTROL_VMXON_ENABLED_OUTSIDE_SMX;
@ -4306,21 +4278,13 @@ static int handle_vmon(struct kvm_vcpu *vcpu)
* Note - IA32_VMX_BASIC[48] will never be 1 for the nested case;
* which replaces physical address width with 32
*/
if (!PAGE_ALIGNED(vmptr) || (vmptr >> cpuid_maxphyaddr(vcpu)))
if (!page_address_valid(vcpu, vmptr))
return nested_vmx_failInvalid(vcpu);
page = kvm_vcpu_gpa_to_page(vcpu, vmptr);
if (is_error_page(page))
if (kvm_read_guest(vcpu->kvm, vmptr, &revision, sizeof(revision)) ||
revision != VMCS12_REVISION)
return nested_vmx_failInvalid(vcpu);
if (*(u32 *)kmap(page) != VMCS12_REVISION) {
kunmap(page);
kvm_release_page_clean(page);
return nested_vmx_failInvalid(vcpu);
}
kunmap(page);
kvm_release_page_clean(page);
vmx->nested.vmxon_ptr = vmptr;
ret = enter_vmx_operation(vcpu);
if (ret)
@ -4377,7 +4341,7 @@ static int handle_vmclear(struct kvm_vcpu *vcpu)
if (nested_vmx_get_vmptr(vcpu, &vmptr))
return 1;
if (!PAGE_ALIGNED(vmptr) || (vmptr >> cpuid_maxphyaddr(vcpu)))
if (!page_address_valid(vcpu, vmptr))
return nested_vmx_failValid(vcpu,
VMXERR_VMCLEAR_INVALID_ADDRESS);
@ -4385,7 +4349,7 @@ static int handle_vmclear(struct kvm_vcpu *vcpu)
return nested_vmx_failValid(vcpu,
VMXERR_VMCLEAR_VMXON_POINTER);
if (vmx->nested.hv_evmcs_page) {
if (vmx->nested.hv_evmcs_map.hva) {
if (vmptr == vmx->nested.hv_evmcs_vmptr)
nested_release_evmcs(vcpu);
} else {
@ -4584,7 +4548,7 @@ static int handle_vmptrld(struct kvm_vcpu *vcpu)
if (nested_vmx_get_vmptr(vcpu, &vmptr))
return 1;
if (!PAGE_ALIGNED(vmptr) || (vmptr >> cpuid_maxphyaddr(vcpu)))
if (!page_address_valid(vcpu, vmptr))
return nested_vmx_failValid(vcpu,
VMXERR_VMPTRLD_INVALID_ADDRESS);
@ -4597,11 +4561,10 @@ static int handle_vmptrld(struct kvm_vcpu *vcpu)
return 1;
if (vmx->nested.current_vmptr != vmptr) {
struct kvm_host_map map;
struct vmcs12 *new_vmcs12;
struct page *page;
page = kvm_vcpu_gpa_to_page(vcpu, vmptr);
if (is_error_page(page)) {
if (kvm_vcpu_map(vcpu, gpa_to_gfn(vmptr), &map)) {
/*
* Reads from an unbacked page return all 1s,
* which means that the 32 bits located at the
@ -4611,12 +4574,13 @@ static int handle_vmptrld(struct kvm_vcpu *vcpu)
return nested_vmx_failValid(vcpu,
VMXERR_VMPTRLD_INCORRECT_VMCS_REVISION_ID);
}
new_vmcs12 = kmap(page);
new_vmcs12 = map.hva;
if (new_vmcs12->hdr.revision_id != VMCS12_REVISION ||
(new_vmcs12->hdr.shadow_vmcs &&
!nested_cpu_has_vmx_shadow_vmcs(vcpu))) {
kunmap(page);
kvm_release_page_clean(page);
kvm_vcpu_unmap(vcpu, &map, false);
return nested_vmx_failValid(vcpu,
VMXERR_VMPTRLD_INCORRECT_VMCS_REVISION_ID);
}
@ -4628,8 +4592,7 @@ static int handle_vmptrld(struct kvm_vcpu *vcpu)
* cached.
*/
memcpy(vmx->nested.cached_vmcs12, new_vmcs12, VMCS12_SIZE);
kunmap(page);
kvm_release_page_clean(page);
kvm_vcpu_unmap(vcpu, &map, false);
set_current_vmptr(vmx, vmptr);
}
@ -4804,7 +4767,7 @@ static int handle_invvpid(struct kvm_vcpu *vcpu)
static int nested_vmx_eptp_switching(struct kvm_vcpu *vcpu,
struct vmcs12 *vmcs12)
{
u32 index = vcpu->arch.regs[VCPU_REGS_RCX];
u32 index = kvm_rcx_read(vcpu);
u64 address;
bool accessed_dirty;
struct kvm_mmu *mmu = vcpu->arch.walk_mmu;
@ -4850,7 +4813,7 @@ static int handle_vmfunc(struct kvm_vcpu *vcpu)
{
struct vcpu_vmx *vmx = to_vmx(vcpu);
struct vmcs12 *vmcs12;
u32 function = vcpu->arch.regs[VCPU_REGS_RAX];
u32 function = kvm_rax_read(vcpu);
/*
* VMFUNC is only supported for nested guests, but we always enable the
@ -4936,7 +4899,7 @@ static bool nested_vmx_exit_handled_io(struct kvm_vcpu *vcpu,
static bool nested_vmx_exit_handled_msr(struct kvm_vcpu *vcpu,
struct vmcs12 *vmcs12, u32 exit_reason)
{
u32 msr_index = vcpu->arch.regs[VCPU_REGS_RCX];
u32 msr_index = kvm_rcx_read(vcpu);
gpa_t bitmap;
if (!nested_cpu_has(vmcs12, CPU_BASED_USE_MSR_BITMAPS))
@ -5373,9 +5336,6 @@ static int vmx_set_nested_state(struct kvm_vcpu *vcpu,
if (kvm_state->format != 0)
return -EINVAL;
if (kvm_state->flags & KVM_STATE_NESTED_EVMCS)
nested_enable_evmcs(vcpu, NULL);
if (!nested_vmx_allowed(vcpu))
return kvm_state->vmx.vmxon_pa == -1ull ? 0 : -EINVAL;
@ -5417,6 +5377,9 @@ static int vmx_set_nested_state(struct kvm_vcpu *vcpu,
if (kvm_state->vmx.vmxon_pa == -1ull)
return 0;
if (kvm_state->flags & KVM_STATE_NESTED_EVMCS)
nested_enable_evmcs(vcpu, NULL);
vmx->nested.vmxon_ptr = kvm_state->vmx.vmxon_pa;
ret = enter_vmx_operation(vcpu);
if (ret)
@ -5460,9 +5423,6 @@ static int vmx_set_nested_state(struct kvm_vcpu *vcpu,
if (!(kvm_state->flags & KVM_STATE_NESTED_GUEST_MODE))
return 0;
vmx->nested.nested_run_pending =
!!(kvm_state->flags & KVM_STATE_NESTED_RUN_PENDING);
if (nested_cpu_has_shadow_vmcs(vmcs12) &&
vmcs12->vmcs_link_pointer != -1ull) {
struct vmcs12 *shadow_vmcs12 = get_shadow_vmcs12(vcpu);
@ -5480,14 +5440,20 @@ static int vmx_set_nested_state(struct kvm_vcpu *vcpu,
return -EINVAL;
}
if (nested_vmx_check_vmentry_prereqs(vcpu, vmcs12) ||
nested_vmx_check_vmentry_postreqs(vcpu, vmcs12, &exit_qual))
if (nested_vmx_check_controls(vcpu, vmcs12) ||
nested_vmx_check_host_state(vcpu, vmcs12) ||
nested_vmx_check_guest_state(vcpu, vmcs12, &exit_qual))
return -EINVAL;
vmx->nested.dirty_vmcs12 = true;
vmx->nested.nested_run_pending =
!!(kvm_state->flags & KVM_STATE_NESTED_RUN_PENDING);
ret = nested_vmx_enter_non_root_mode(vcpu, false);
if (ret)
if (ret) {
vmx->nested.nested_run_pending = 0;
return -EINVAL;
}
return 0;
}

View File

@ -227,7 +227,7 @@ static int intel_pmu_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
}
break;
case MSR_CORE_PERF_GLOBAL_OVF_CTRL:
if (!(data & (pmu->global_ctrl_mask & ~(3ull<<62)))) {
if (!(data & pmu->global_ovf_ctrl_mask)) {
if (!msr_info->host_initiated)
pmu->global_status &= ~data;
pmu->global_ovf_ctrl = data;
@ -297,6 +297,12 @@ static void intel_pmu_refresh(struct kvm_vcpu *vcpu)
pmu->global_ctrl = ((1ull << pmu->nr_arch_gp_counters) - 1) |
(((1ull << pmu->nr_arch_fixed_counters) - 1) << INTEL_PMC_IDX_FIXED);
pmu->global_ctrl_mask = ~pmu->global_ctrl;
pmu->global_ovf_ctrl_mask = pmu->global_ctrl_mask
& ~(MSR_CORE_PERF_GLOBAL_OVF_CTRL_OVF_BUF |
MSR_CORE_PERF_GLOBAL_OVF_CTRL_COND_CHGD);
if (kvm_x86_ops->pt_supported())
pmu->global_ovf_ctrl_mask &=
~MSR_CORE_PERF_GLOBAL_OVF_CTRL_TRACE_TOPA_PMI;
entry = kvm_find_cpuid_entry(vcpu, 7, 0);
if (entry &&

View File

@ -1692,6 +1692,9 @@ static int vmx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
case MSR_IA32_SYSENTER_ESP:
msr_info->data = vmcs_readl(GUEST_SYSENTER_ESP);
break;
case MSR_IA32_POWER_CTL:
msr_info->data = vmx->msr_ia32_power_ctl;
break;
case MSR_IA32_BNDCFGS:
if (!kvm_mpx_supported() ||
(!msr_info->host_initiated &&
@ -1822,6 +1825,9 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
case MSR_IA32_SYSENTER_ESP:
vmcs_writel(GUEST_SYSENTER_ESP, data);
break;
case MSR_IA32_POWER_CTL:
vmx->msr_ia32_power_ctl = data;
break;
case MSR_IA32_BNDCFGS:
if (!kvm_mpx_supported() ||
(!msr_info->host_initiated &&
@ -1891,7 +1897,7 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
break;
case MSR_IA32_CR_PAT:
if (vmcs_config.vmentry_ctrl & VM_ENTRY_LOAD_IA32_PAT) {
if (!kvm_mtrr_valid(vcpu, MSR_IA32_CR_PAT, data))
if (!kvm_pat_valid(data))
return 1;
vmcs_write64(GUEST_IA32_PAT, data);
vcpu->arch.pat = data;
@ -2288,7 +2294,6 @@ static __init int setup_vmcs_config(struct vmcs_config *vmcs_conf,
min |= VM_EXIT_HOST_ADDR_SPACE_SIZE;
#endif
opt = VM_EXIT_LOAD_IA32_PERF_GLOBAL_CTRL |
VM_EXIT_SAVE_IA32_PAT |
VM_EXIT_LOAD_IA32_PAT |
VM_EXIT_LOAD_IA32_EFER |
VM_EXIT_CLEAR_BNDCFGS |
@ -3619,14 +3624,13 @@ static bool vmx_guest_apic_has_interrupt(struct kvm_vcpu *vcpu)
if (WARN_ON_ONCE(!is_guest_mode(vcpu)) ||
!nested_cpu_has_vid(get_vmcs12(vcpu)) ||
WARN_ON_ONCE(!vmx->nested.virtual_apic_page))
WARN_ON_ONCE(!vmx->nested.virtual_apic_map.gfn))
return false;
rvi = vmx_get_rvi();
vapic_page = kmap(vmx->nested.virtual_apic_page);
vapic_page = vmx->nested.virtual_apic_map.hva;
vppr = *((u32 *)(vapic_page + APIC_PROCPRI));
kunmap(vmx->nested.virtual_apic_page);
return ((rvi & 0xf0) > (vppr & 0xf0));
}
@ -4827,7 +4831,7 @@ static int handle_cpuid(struct kvm_vcpu *vcpu)
static int handle_rdmsr(struct kvm_vcpu *vcpu)
{
u32 ecx = vcpu->arch.regs[VCPU_REGS_RCX];
u32 ecx = kvm_rcx_read(vcpu);
struct msr_data msr_info;
msr_info.index = ecx;
@ -4840,18 +4844,16 @@ static int handle_rdmsr(struct kvm_vcpu *vcpu)
trace_kvm_msr_read(ecx, msr_info.data);
/* FIXME: handling of bits 32:63 of rax, rdx */
vcpu->arch.regs[VCPU_REGS_RAX] = msr_info.data & -1u;
vcpu->arch.regs[VCPU_REGS_RDX] = (msr_info.data >> 32) & -1u;
kvm_rax_write(vcpu, msr_info.data & -1u);
kvm_rdx_write(vcpu, (msr_info.data >> 32) & -1u);
return kvm_skip_emulated_instruction(vcpu);
}
static int handle_wrmsr(struct kvm_vcpu *vcpu)
{
struct msr_data msr;
u32 ecx = vcpu->arch.regs[VCPU_REGS_RCX];
u64 data = (vcpu->arch.regs[VCPU_REGS_RAX] & -1u)
| ((u64)(vcpu->arch.regs[VCPU_REGS_RDX] & -1u) << 32);
u32 ecx = kvm_rcx_read(vcpu);
u64 data = kvm_read_edx_eax(vcpu);
msr.data = data;
msr.index = ecx;
@ -4922,7 +4924,7 @@ static int handle_wbinvd(struct kvm_vcpu *vcpu)
static int handle_xsetbv(struct kvm_vcpu *vcpu)
{
u64 new_bv = kvm_read_edx_eax(vcpu);
u32 index = kvm_register_read(vcpu, VCPU_REGS_RCX);
u32 index = kvm_rcx_read(vcpu);
if (kvm_set_xcr(vcpu, index, new_bv) == 0)
return kvm_skip_emulated_instruction(vcpu);
@ -5723,8 +5725,16 @@ void dump_vmcs(void)
if (secondary_exec_control & SECONDARY_EXEC_TSC_SCALING)
pr_err("TSC Multiplier = 0x%016llx\n",
vmcs_read64(TSC_MULTIPLIER));
if (cpu_based_exec_ctrl & CPU_BASED_TPR_SHADOW)
pr_err("TPR Threshold = 0x%02x\n", vmcs_read32(TPR_THRESHOLD));
if (cpu_based_exec_ctrl & CPU_BASED_TPR_SHADOW) {
if (secondary_exec_control & SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY) {
u16 status = vmcs_read16(GUEST_INTR_STATUS);
pr_err("SVI|RVI = %02x|%02x ", status >> 8, status & 0xff);
}
pr_cont("TPR Threshold = 0x%02x\n", vmcs_read32(TPR_THRESHOLD));
if (secondary_exec_control & SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES)
pr_err("APIC-access addr = 0x%016llx ", vmcs_read64(APIC_ACCESS_ADDR));
pr_cont("virt-APIC addr = 0x%016llx\n", vmcs_read64(VIRTUAL_APIC_PAGE_ADDR));
}
if (pin_based_exec_ctrl & PIN_BASED_POSTED_INTR)
pr_err("PostedIntrVec = 0x%02x\n", vmcs_read16(POSTED_INTR_NV));
if ((secondary_exec_control & SECONDARY_EXEC_ENABLE_EPT))
@ -6856,30 +6866,6 @@ static void nested_vmx_entry_exit_ctls_update(struct kvm_vcpu *vcpu)
}
}
static bool guest_cpuid_has_pmu(struct kvm_vcpu *vcpu)
{
struct kvm_cpuid_entry2 *entry;
union cpuid10_eax eax;
entry = kvm_find_cpuid_entry(vcpu, 0xa, 0);
if (!entry)
return false;
eax.full = entry->eax;
return (eax.split.version_id > 0);
}
static void nested_vmx_procbased_ctls_update(struct kvm_vcpu *vcpu)
{
struct vcpu_vmx *vmx = to_vmx(vcpu);
bool pmu_enabled = guest_cpuid_has_pmu(vcpu);
if (pmu_enabled)
vmx->nested.msrs.procbased_ctls_high |= CPU_BASED_RDPMC_EXITING;
else
vmx->nested.msrs.procbased_ctls_high &= ~CPU_BASED_RDPMC_EXITING;
}
static void update_intel_pt_cfg(struct kvm_vcpu *vcpu)
{
struct vcpu_vmx *vmx = to_vmx(vcpu);
@ -6968,7 +6954,6 @@ static void vmx_cpuid_update(struct kvm_vcpu *vcpu)
if (nested_vmx_allowed(vcpu)) {
nested_vmx_cr_fixed1_bits_update(vcpu);
nested_vmx_entry_exit_ctls_update(vcpu);
nested_vmx_procbased_ctls_update(vcpu);
}
if (boot_cpu_has(X86_FEATURE_INTEL_PT) &&
@ -7028,7 +7013,8 @@ static inline int u64_shl_div_u64(u64 a, unsigned int shift,
return 0;
}
static int vmx_set_hv_timer(struct kvm_vcpu *vcpu, u64 guest_deadline_tsc)
static int vmx_set_hv_timer(struct kvm_vcpu *vcpu, u64 guest_deadline_tsc,
bool *expired)
{
struct vcpu_vmx *vmx;
u64 tscl, guest_tscl, delta_tsc, lapic_timer_advance_cycles;
@ -7051,10 +7037,9 @@ static int vmx_set_hv_timer(struct kvm_vcpu *vcpu, u64 guest_deadline_tsc)
/* Convert to host delta tsc if tsc scaling is enabled */
if (vcpu->arch.tsc_scaling_ratio != kvm_default_tsc_scaling_ratio &&
u64_shl_div_u64(delta_tsc,
delta_tsc && u64_shl_div_u64(delta_tsc,
kvm_tsc_scaling_ratio_frac_bits,
vcpu->arch.tsc_scaling_ratio,
&delta_tsc))
vcpu->arch.tsc_scaling_ratio, &delta_tsc))
return -ERANGE;
/*
@ -7067,7 +7052,8 @@ static int vmx_set_hv_timer(struct kvm_vcpu *vcpu, u64 guest_deadline_tsc)
return -ERANGE;
vmx->hv_deadline_tsc = tscl + delta_tsc;
return delta_tsc == 0;
*expired = !delta_tsc;
return 0;
}
static void vmx_cancel_hv_timer(struct kvm_vcpu *vcpu)
@ -7104,9 +7090,7 @@ static int vmx_write_pml_buffer(struct kvm_vcpu *vcpu)
{
struct vmcs12 *vmcs12;
struct vcpu_vmx *vmx = to_vmx(vcpu);
gpa_t gpa;
struct page *page = NULL;
u64 *pml_address;
gpa_t gpa, dst;
if (is_guest_mode(vcpu)) {
WARN_ON_ONCE(vmx->nested.pml_full);
@ -7126,15 +7110,13 @@ static int vmx_write_pml_buffer(struct kvm_vcpu *vcpu)
}
gpa = vmcs_read64(GUEST_PHYSICAL_ADDRESS) & ~0xFFFull;
dst = vmcs12->pml_address + sizeof(u64) * vmcs12->guest_pml_index;
page = kvm_vcpu_gpa_to_page(vcpu, vmcs12->pml_address);
if (is_error_page(page))
if (kvm_write_guest_page(vcpu->kvm, gpa_to_gfn(dst), &gpa,
offset_in_page(dst), sizeof(gpa)))
return 0;
pml_address = kmap(page);
pml_address[vmcs12->guest_pml_index--] = gpa;
kunmap(page);
kvm_release_page_clean(page);
vmcs12->guest_pml_index--;
}
return 0;

View File

@ -142,8 +142,11 @@ struct nested_vmx {
* pointers, so we must keep them pinned while L2 runs.
*/
struct page *apic_access_page;
struct page *virtual_apic_page;
struct page *pi_desc_page;
struct kvm_host_map virtual_apic_map;
struct kvm_host_map pi_desc_map;
struct kvm_host_map msr_bitmap_map;
struct pi_desc *pi_desc;
bool pi_pending;
u16 posted_intr_nv;
@ -169,7 +172,7 @@ struct nested_vmx {
} smm;
gpa_t hv_evmcs_vmptr;
struct page *hv_evmcs_page;
struct kvm_host_map hv_evmcs_map;
struct hv_enlightened_vmcs *hv_evmcs;
};
@ -257,6 +260,8 @@ struct vcpu_vmx {
unsigned long host_debugctlmsr;
u64 msr_ia32_power_ctl;
/*
* Only bits masked by msr_ia32_feature_control_valid_bits can be set in
* msr_ia32_feature_control. FEATURE_CONTROL_LOCKED is always included

View File

@ -1100,15 +1100,15 @@ EXPORT_SYMBOL_GPL(kvm_get_dr);
bool kvm_rdpmc(struct kvm_vcpu *vcpu)
{
u32 ecx = kvm_register_read(vcpu, VCPU_REGS_RCX);
u32 ecx = kvm_rcx_read(vcpu);
u64 data;
int err;
err = kvm_pmu_rdpmc(vcpu, ecx, &data);
if (err)
return err;
kvm_register_write(vcpu, VCPU_REGS_RAX, (u32)data);
kvm_register_write(vcpu, VCPU_REGS_RDX, data >> 32);
kvm_rax_write(vcpu, (u32)data);
kvm_rdx_write(vcpu, data >> 32);
return err;
}
EXPORT_SYMBOL_GPL(kvm_rdpmc);
@ -1174,6 +1174,9 @@ static u32 emulated_msrs[] = {
MSR_PLATFORM_INFO,
MSR_MISC_FEATURES_ENABLES,
MSR_AMD64_VIRT_SPEC_CTRL,
MSR_IA32_POWER_CTL,
MSR_K7_HWCR,
};
static unsigned num_emulated_msrs;
@ -1262,31 +1265,49 @@ static int do_get_msr_feature(struct kvm_vcpu *vcpu, unsigned index, u64 *data)
return 0;
}
static bool __kvm_valid_efer(struct kvm_vcpu *vcpu, u64 efer)
{
if (efer & EFER_FFXSR && !guest_cpuid_has(vcpu, X86_FEATURE_FXSR_OPT))
return false;
if (efer & EFER_SVME && !guest_cpuid_has(vcpu, X86_FEATURE_SVM))
return false;
if (efer & (EFER_LME | EFER_LMA) &&
!guest_cpuid_has(vcpu, X86_FEATURE_LM))
return false;
if (efer & EFER_NX && !guest_cpuid_has(vcpu, X86_FEATURE_NX))
return false;
return true;
}
bool kvm_valid_efer(struct kvm_vcpu *vcpu, u64 efer)
{
if (efer & efer_reserved_bits)
return false;
if (efer & EFER_FFXSR && !guest_cpuid_has(vcpu, X86_FEATURE_FXSR_OPT))
return false;
if (efer & EFER_SVME && !guest_cpuid_has(vcpu, X86_FEATURE_SVM))
return false;
return true;
return __kvm_valid_efer(vcpu, efer);
}
EXPORT_SYMBOL_GPL(kvm_valid_efer);
static int set_efer(struct kvm_vcpu *vcpu, u64 efer)
static int set_efer(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
{
u64 old_efer = vcpu->arch.efer;
u64 efer = msr_info->data;
if (!kvm_valid_efer(vcpu, efer))
return 1;
if (efer & efer_reserved_bits)
return false;
if (is_paging(vcpu)
&& (vcpu->arch.efer & EFER_LME) != (efer & EFER_LME))
return 1;
if (!msr_info->host_initiated) {
if (!__kvm_valid_efer(vcpu, efer))
return 1;
if (is_paging(vcpu) &&
(vcpu->arch.efer & EFER_LME) != (efer & EFER_LME))
return 1;
}
efer &= ~EFER_LMA;
efer |= vcpu->arch.efer & EFER_LMA;
@ -2279,6 +2300,18 @@ static void kvmclock_sync_fn(struct work_struct *work)
KVMCLOCK_SYNC_PERIOD);
}
/*
* On AMD, HWCR[McStatusWrEn] controls whether setting MCi_STATUS results in #GP.
*/
static bool can_set_mci_status(struct kvm_vcpu *vcpu)
{
/* McStatusWrEn enabled? */
if (guest_cpuid_is_amd(vcpu))
return !!(vcpu->arch.msr_hwcr & BIT_ULL(18));
return false;
}
static int set_msr_mce(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
{
u64 mcg_cap = vcpu->arch.mcg_cap;
@ -2310,9 +2343,14 @@ static int set_msr_mce(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
if ((offset & 0x3) == 0 &&
data != 0 && (data | (1 << 10)) != ~(u64)0)
return -1;
/* MCi_STATUS */
if (!msr_info->host_initiated &&
(offset & 0x3) == 1 && data != 0)
return -1;
(offset & 0x3) == 1 && data != 0) {
if (!can_set_mci_status(vcpu))
return -1;
}
vcpu->arch.mce_banks[offset] = data;
break;
}
@ -2456,13 +2494,16 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
vcpu->arch.arch_capabilities = data;
break;
case MSR_EFER:
return set_efer(vcpu, data);
return set_efer(vcpu, msr_info);
case MSR_K7_HWCR:
data &= ~(u64)0x40; /* ignore flush filter disable */
data &= ~(u64)0x100; /* ignore ignne emulation enable */
data &= ~(u64)0x8; /* ignore TLB cache disable */
data &= ~(u64)0x40000; /* ignore Mc status write enable */
if (data != 0) {
/* Handle McStatusWrEn */
if (data == BIT_ULL(18)) {
vcpu->arch.msr_hwcr = data;
} else if (data != 0) {
vcpu_unimpl(vcpu, "unimplemented HWCR wrmsr: 0x%llx\n",
data);
return 1;
@ -2736,7 +2777,6 @@ int kvm_get_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
case MSR_K8_SYSCFG:
case MSR_K8_TSEG_ADDR:
case MSR_K8_TSEG_MASK:
case MSR_K7_HWCR:
case MSR_VM_HSAVE_PA:
case MSR_K8_INT_PENDING_MSG:
case MSR_AMD64_NB_CFG:
@ -2900,6 +2940,9 @@ int kvm_get_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
case MSR_MISC_FEATURES_ENABLES:
msr_info->data = vcpu->arch.msr_misc_features_enables;
break;
case MSR_K7_HWCR:
msr_info->data = vcpu->arch.msr_hwcr;
break;
default:
if (kvm_pmu_is_valid_msr(vcpu, msr_info->index))
return kvm_pmu_get_msr(vcpu, msr_info->index, &msr_info->data);
@ -3079,9 +3122,6 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
case KVM_CAP_MAX_VCPUS:
r = KVM_MAX_VCPUS;
break;
case KVM_CAP_NR_MEMSLOTS:
r = KVM_USER_MEM_SLOTS;
break;
case KVM_CAP_PV_MMU: /* obsolete */
r = 0;
break;
@ -5521,9 +5561,9 @@ static int emulator_cmpxchg_emulated(struct x86_emulate_ctxt *ctxt,
unsigned int bytes,
struct x86_exception *exception)
{
struct kvm_host_map map;
struct kvm_vcpu *vcpu = emul_to_vcpu(ctxt);
gpa_t gpa;
struct page *page;
char *kaddr;
bool exchanged;
@ -5540,12 +5580,11 @@ static int emulator_cmpxchg_emulated(struct x86_emulate_ctxt *ctxt,
if (((gpa + bytes - 1) & PAGE_MASK) != (gpa & PAGE_MASK))
goto emul_write;
page = kvm_vcpu_gfn_to_page(vcpu, gpa >> PAGE_SHIFT);
if (is_error_page(page))
if (kvm_vcpu_map(vcpu, gpa_to_gfn(gpa), &map))
goto emul_write;
kaddr = kmap_atomic(page);
kaddr += offset_in_page(gpa);
kaddr = map.hva + offset_in_page(gpa);
switch (bytes) {
case 1:
exchanged = CMPXCHG_TYPE(u8, kaddr, old, new);
@ -5562,13 +5601,12 @@ static int emulator_cmpxchg_emulated(struct x86_emulate_ctxt *ctxt,
default:
BUG();
}
kunmap_atomic(kaddr);
kvm_release_page_dirty(page);
kvm_vcpu_unmap(vcpu, &map, true);
if (!exchanged)
return X86EMUL_CMPXCHG_FAILED;
kvm_vcpu_mark_page_dirty(vcpu, gpa >> PAGE_SHIFT);
kvm_page_track_write(vcpu, gpa, new, bytes);
return X86EMUL_CONTINUE;
@ -6558,7 +6596,7 @@ static int complete_fast_pio_out(struct kvm_vcpu *vcpu)
static int kvm_fast_pio_out(struct kvm_vcpu *vcpu, int size,
unsigned short port)
{
unsigned long val = kvm_register_read(vcpu, VCPU_REGS_RAX);
unsigned long val = kvm_rax_read(vcpu);
int ret = emulator_pio_out_emulated(&vcpu->arch.emulate_ctxt,
size, port, &val, 1);
if (ret)
@ -6593,8 +6631,7 @@ static int complete_fast_pio_in(struct kvm_vcpu *vcpu)
}
/* For size less than 4 we merge, else we zero extend */
val = (vcpu->arch.pio.size < 4) ? kvm_register_read(vcpu, VCPU_REGS_RAX)
: 0;
val = (vcpu->arch.pio.size < 4) ? kvm_rax_read(vcpu) : 0;
/*
* Since vcpu->arch.pio.count == 1 let emulator_pio_in_emulated perform
@ -6602,7 +6639,7 @@ static int complete_fast_pio_in(struct kvm_vcpu *vcpu)
*/
emulator_pio_in_emulated(&vcpu->arch.emulate_ctxt, vcpu->arch.pio.size,
vcpu->arch.pio.port, &val, 1);
kvm_register_write(vcpu, VCPU_REGS_RAX, val);
kvm_rax_write(vcpu, val);
return kvm_skip_emulated_instruction(vcpu);
}
@ -6614,12 +6651,12 @@ static int kvm_fast_pio_in(struct kvm_vcpu *vcpu, int size,
int ret;
/* For size less than 4 we merge, else we zero extend */
val = (size < 4) ? kvm_register_read(vcpu, VCPU_REGS_RAX) : 0;
val = (size < 4) ? kvm_rax_read(vcpu) : 0;
ret = emulator_pio_in_emulated(&vcpu->arch.emulate_ctxt, size, port,
&val, 1);
if (ret) {
kvm_register_write(vcpu, VCPU_REGS_RAX, val);
kvm_rax_write(vcpu, val);
return ret;
}
@ -6854,10 +6891,20 @@ static unsigned long kvm_get_guest_ip(void)
return ip;
}
static void kvm_handle_intel_pt_intr(void)
{
struct kvm_vcpu *vcpu = __this_cpu_read(current_vcpu);
kvm_make_request(KVM_REQ_PMI, vcpu);
__set_bit(MSR_CORE_PERF_GLOBAL_OVF_CTRL_TRACE_TOPA_PMI_BIT,
(unsigned long *)&vcpu->arch.pmu.global_status);
}
static struct perf_guest_info_callbacks kvm_guest_cbs = {
.is_in_guest = kvm_is_in_guest,
.is_user_mode = kvm_is_user_mode,
.get_guest_ip = kvm_get_guest_ip,
.handle_intel_pt_intr = kvm_handle_intel_pt_intr,
};
static void kvm_set_mmio_spte_mask(void)
@ -7133,11 +7180,11 @@ int kvm_emulate_hypercall(struct kvm_vcpu *vcpu)
if (kvm_hv_hypercall_enabled(vcpu->kvm))
return kvm_hv_hypercall(vcpu);
nr = kvm_register_read(vcpu, VCPU_REGS_RAX);
a0 = kvm_register_read(vcpu, VCPU_REGS_RBX);
a1 = kvm_register_read(vcpu, VCPU_REGS_RCX);
a2 = kvm_register_read(vcpu, VCPU_REGS_RDX);
a3 = kvm_register_read(vcpu, VCPU_REGS_RSI);
nr = kvm_rax_read(vcpu);
a0 = kvm_rbx_read(vcpu);
a1 = kvm_rcx_read(vcpu);
a2 = kvm_rdx_read(vcpu);
a3 = kvm_rsi_read(vcpu);
trace_kvm_hypercall(nr, a0, a1, a2, a3);
@ -7178,7 +7225,7 @@ int kvm_emulate_hypercall(struct kvm_vcpu *vcpu)
out:
if (!op_64_bit)
ret = (u32)ret;
kvm_register_write(vcpu, VCPU_REGS_RAX, ret);
kvm_rax_write(vcpu, ret);
++vcpu->stat.hypercalls;
return kvm_skip_emulated_instruction(vcpu);
@ -8280,23 +8327,23 @@ static void __get_regs(struct kvm_vcpu *vcpu, struct kvm_regs *regs)
emulator_writeback_register_cache(&vcpu->arch.emulate_ctxt);
vcpu->arch.emulate_regs_need_sync_to_vcpu = false;
}
regs->rax = kvm_register_read(vcpu, VCPU_REGS_RAX);
regs->rbx = kvm_register_read(vcpu, VCPU_REGS_RBX);
regs->rcx = kvm_register_read(vcpu, VCPU_REGS_RCX);
regs->rdx = kvm_register_read(vcpu, VCPU_REGS_RDX);
regs->rsi = kvm_register_read(vcpu, VCPU_REGS_RSI);
regs->rdi = kvm_register_read(vcpu, VCPU_REGS_RDI);
regs->rsp = kvm_register_read(vcpu, VCPU_REGS_RSP);
regs->rbp = kvm_register_read(vcpu, VCPU_REGS_RBP);
regs->rax = kvm_rax_read(vcpu);
regs->rbx = kvm_rbx_read(vcpu);
regs->rcx = kvm_rcx_read(vcpu);
regs->rdx = kvm_rdx_read(vcpu);
regs->rsi = kvm_rsi_read(vcpu);
regs->rdi = kvm_rdi_read(vcpu);
regs->rsp = kvm_rsp_read(vcpu);
regs->rbp = kvm_rbp_read(vcpu);
#ifdef CONFIG_X86_64
regs->r8 = kvm_register_read(vcpu, VCPU_REGS_R8);
regs->r9 = kvm_register_read(vcpu, VCPU_REGS_R9);
regs->r10 = kvm_register_read(vcpu, VCPU_REGS_R10);
regs->r11 = kvm_register_read(vcpu, VCPU_REGS_R11);
regs->r12 = kvm_register_read(vcpu, VCPU_REGS_R12);
regs->r13 = kvm_register_read(vcpu, VCPU_REGS_R13);
regs->r14 = kvm_register_read(vcpu, VCPU_REGS_R14);
regs->r15 = kvm_register_read(vcpu, VCPU_REGS_R15);
regs->r8 = kvm_r8_read(vcpu);
regs->r9 = kvm_r9_read(vcpu);
regs->r10 = kvm_r10_read(vcpu);
regs->r11 = kvm_r11_read(vcpu);
regs->r12 = kvm_r12_read(vcpu);
regs->r13 = kvm_r13_read(vcpu);
regs->r14 = kvm_r14_read(vcpu);
regs->r15 = kvm_r15_read(vcpu);
#endif
regs->rip = kvm_rip_read(vcpu);
@ -8316,23 +8363,23 @@ static void __set_regs(struct kvm_vcpu *vcpu, struct kvm_regs *regs)
vcpu->arch.emulate_regs_need_sync_from_vcpu = true;
vcpu->arch.emulate_regs_need_sync_to_vcpu = false;
kvm_register_write(vcpu, VCPU_REGS_RAX, regs->rax);
kvm_register_write(vcpu, VCPU_REGS_RBX, regs->rbx);
kvm_register_write(vcpu, VCPU_REGS_RCX, regs->rcx);
kvm_register_write(vcpu, VCPU_REGS_RDX, regs->rdx);
kvm_register_write(vcpu, VCPU_REGS_RSI, regs->rsi);
kvm_register_write(vcpu, VCPU_REGS_RDI, regs->rdi);
kvm_register_write(vcpu, VCPU_REGS_RSP, regs->rsp);
kvm_register_write(vcpu, VCPU_REGS_RBP, regs->rbp);
kvm_rax_write(vcpu, regs->rax);
kvm_rbx_write(vcpu, regs->rbx);
kvm_rcx_write(vcpu, regs->rcx);
kvm_rdx_write(vcpu, regs->rdx);
kvm_rsi_write(vcpu, regs->rsi);
kvm_rdi_write(vcpu, regs->rdi);
kvm_rsp_write(vcpu, regs->rsp);
kvm_rbp_write(vcpu, regs->rbp);
#ifdef CONFIG_X86_64
kvm_register_write(vcpu, VCPU_REGS_R8, regs->r8);
kvm_register_write(vcpu, VCPU_REGS_R9, regs->r9);
kvm_register_write(vcpu, VCPU_REGS_R10, regs->r10);
kvm_register_write(vcpu, VCPU_REGS_R11, regs->r11);
kvm_register_write(vcpu, VCPU_REGS_R12, regs->r12);
kvm_register_write(vcpu, VCPU_REGS_R13, regs->r13);
kvm_register_write(vcpu, VCPU_REGS_R14, regs->r14);
kvm_register_write(vcpu, VCPU_REGS_R15, regs->r15);
kvm_r8_write(vcpu, regs->r8);
kvm_r9_write(vcpu, regs->r9);
kvm_r10_write(vcpu, regs->r10);
kvm_r11_write(vcpu, regs->r11);
kvm_r12_write(vcpu, regs->r12);
kvm_r13_write(vcpu, regs->r13);
kvm_r14_write(vcpu, regs->r14);
kvm_r15_write(vcpu, regs->r15);
#endif
kvm_rip_write(vcpu, regs->rip);

View File

@ -345,6 +345,16 @@ static inline void kvm_after_interrupt(struct kvm_vcpu *vcpu)
__this_cpu_write(current_vcpu, NULL);
}
static inline bool kvm_pat_valid(u64 data)
{
if (data & 0xF8F8F8F8F8F8F8F8ull)
return false;
/* 0, 1, 4, 5, 6, 7 are valid values. */
return (data | ((data & 0x0202020202020202ull) << 1)) == data;
}
void kvm_load_guest_xcr0(struct kvm_vcpu *vcpu);
void kvm_put_guest_xcr0(struct kvm_vcpu *vcpu);
#endif

View File

@ -227,6 +227,32 @@ enum {
READING_SHADOW_PAGE_TABLES,
};
#define KVM_UNMAPPED_PAGE ((void *) 0x500 + POISON_POINTER_DELTA)
struct kvm_host_map {
/*
* Only valid if the 'pfn' is managed by the host kernel (i.e. There is
* a 'struct page' for it. When using mem= kernel parameter some memory
* can be used as guest memory but they are not managed by host
* kernel).
* If 'pfn' is not managed by the host kernel, this field is
* initialized to KVM_UNMAPPED_PAGE.
*/
struct page *page;
void *hva;
kvm_pfn_t pfn;
kvm_pfn_t gfn;
};
/*
* Used to check if the mapping is valid or not. Never use 'kvm_host_map'
* directly to check for that.
*/
static inline bool kvm_vcpu_mapped(struct kvm_host_map *map)
{
return !!map->hva;
}
/*
* Sometimes a large or cross-page mmio needs to be broken up into separate
* exits for userspace servicing.
@ -733,7 +759,9 @@ struct kvm_memslots *kvm_vcpu_memslots(struct kvm_vcpu *vcpu);
struct kvm_memory_slot *kvm_vcpu_gfn_to_memslot(struct kvm_vcpu *vcpu, gfn_t gfn);
kvm_pfn_t kvm_vcpu_gfn_to_pfn_atomic(struct kvm_vcpu *vcpu, gfn_t gfn);
kvm_pfn_t kvm_vcpu_gfn_to_pfn(struct kvm_vcpu *vcpu, gfn_t gfn);
int kvm_vcpu_map(struct kvm_vcpu *vcpu, gpa_t gpa, struct kvm_host_map *map);
struct page *kvm_vcpu_gfn_to_page(struct kvm_vcpu *vcpu, gfn_t gfn);
void kvm_vcpu_unmap(struct kvm_vcpu *vcpu, struct kvm_host_map *map, bool dirty);
unsigned long kvm_vcpu_gfn_to_hva(struct kvm_vcpu *vcpu, gfn_t gfn);
unsigned long kvm_vcpu_gfn_to_hva_prot(struct kvm_vcpu *vcpu, gfn_t gfn, bool *writable);
int kvm_vcpu_read_guest_page(struct kvm_vcpu *vcpu, gfn_t gfn, void *data, int offset,
@ -1242,11 +1270,21 @@ struct kvm_device_ops {
*/
void (*destroy)(struct kvm_device *dev);
/*
* Release is an alternative method to free the device. It is
* called when the device file descriptor is closed. Once
* release is called, the destroy method will not be called
* anymore as the device is removed from the device list of
* the VM. kvm->lock is held.
*/
void (*release)(struct kvm_device *dev);
int (*set_attr)(struct kvm_device *dev, struct kvm_device_attr *attr);
int (*get_attr)(struct kvm_device *dev, struct kvm_device_attr *attr);
int (*has_attr)(struct kvm_device *dev, struct kvm_device_attr *attr);
long (*ioctl)(struct kvm_device *dev, unsigned int ioctl,
unsigned long arg);
int (*mmap)(struct kvm_device *dev, struct vm_area_struct *vma);
};
void kvm_device_get(struct kvm_device *dev);
@ -1307,6 +1345,16 @@ static inline bool vcpu_valid_wakeup(struct kvm_vcpu *vcpu)
}
#endif /* CONFIG_HAVE_KVM_INVALID_WAKEUPS */
#ifdef CONFIG_HAVE_KVM_NO_POLL
/* Callback that tells if we must not poll */
bool kvm_arch_no_poll(struct kvm_vcpu *vcpu);
#else
static inline bool kvm_arch_no_poll(struct kvm_vcpu *vcpu)
{
return false;
}
#endif /* CONFIG_HAVE_KVM_NO_POLL */
#ifdef CONFIG_HAVE_KVM_VCPU_ASYNC_IOCTL
long kvm_arch_vcpu_async_ioctl(struct file *filp,
unsigned int ioctl, unsigned long arg);

View File

@ -30,6 +30,7 @@ struct perf_guest_info_callbacks {
int (*is_in_guest)(void);
int (*is_user_mode)(void);
unsigned long (*get_guest_ip)(void);
void (*handle_intel_pt_intr)(void);
};
#ifdef CONFIG_HAVE_HW_BREAKPOINT

View File

@ -986,8 +986,13 @@ struct kvm_ppc_resize_hpt {
#define KVM_CAP_HYPERV_ENLIGHTENED_VMCS 163
#define KVM_CAP_EXCEPTION_PAYLOAD 164
#define KVM_CAP_ARM_VM_IPA_SIZE 165
#define KVM_CAP_MANUAL_DIRTY_LOG_PROTECT 166
#define KVM_CAP_MANUAL_DIRTY_LOG_PROTECT 166 /* Obsolete */
#define KVM_CAP_HYPERV_CPUID 167
#define KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2 168
#define KVM_CAP_PPC_IRQ_XIVE 169
#define KVM_CAP_ARM_SVE 170
#define KVM_CAP_ARM_PTRAUTH_ADDRESS 171
#define KVM_CAP_ARM_PTRAUTH_GENERIC 172
#ifdef KVM_CAP_IRQ_ROUTING
@ -1145,6 +1150,7 @@ struct kvm_dirty_tlb {
#define KVM_REG_SIZE_U256 0x0050000000000000ULL
#define KVM_REG_SIZE_U512 0x0060000000000000ULL
#define KVM_REG_SIZE_U1024 0x0070000000000000ULL
#define KVM_REG_SIZE_U2048 0x0080000000000000ULL
struct kvm_reg_list {
__u64 n; /* number of regs */
@ -1211,6 +1217,8 @@ enum kvm_device_type {
#define KVM_DEV_TYPE_ARM_VGIC_V3 KVM_DEV_TYPE_ARM_VGIC_V3
KVM_DEV_TYPE_ARM_VGIC_ITS,
#define KVM_DEV_TYPE_ARM_VGIC_ITS KVM_DEV_TYPE_ARM_VGIC_ITS
KVM_DEV_TYPE_XIVE,
#define KVM_DEV_TYPE_XIVE KVM_DEV_TYPE_XIVE
KVM_DEV_TYPE_MAX,
};
@ -1434,12 +1442,15 @@ struct kvm_enc_region {
#define KVM_GET_NESTED_STATE _IOWR(KVMIO, 0xbe, struct kvm_nested_state)
#define KVM_SET_NESTED_STATE _IOW(KVMIO, 0xbf, struct kvm_nested_state)
/* Available with KVM_CAP_MANUAL_DIRTY_LOG_PROTECT */
/* Available with KVM_CAP_MANUAL_DIRTY_LOG_PROTECT_2 */
#define KVM_CLEAR_DIRTY_LOG _IOWR(KVMIO, 0xc0, struct kvm_clear_dirty_log)
/* Available with KVM_CAP_HYPERV_CPUID */
#define KVM_GET_SUPPORTED_HV_CPUID _IOWR(KVMIO, 0xc1, struct kvm_cpuid2)
/* Available with KVM_CAP_ARM_SVE */
#define KVM_ARM_VCPU_FINALIZE _IOW(KVMIO, 0xc2, int)
/* Secure Encrypted Virtualization command */
enum sev_cmd_id {
/* Guest initialization commands */

View File

@ -152,7 +152,8 @@ struct kvm_s390_vm_cpu_subfunc {
__u8 pcc[16]; /* with MSA4 */
__u8 ppno[16]; /* with MSA5 */
__u8 kma[16]; /* with MSA8 */
__u8 reserved[1808];
__u8 kdsa[16]; /* with MSA9 */
__u8 reserved[1792];
};
/* kvm attributes for crypto */

View File

@ -1,9 +1,14 @@
/x86_64/cr4_cpuid_sync_test
/x86_64/evmcs_test
/x86_64/hyperv_cpuid
/x86_64/kvm_create_max_vcpus
/x86_64/platform_info_test
/x86_64/set_sregs_test
/x86_64/smm_test
/x86_64/state_test
/x86_64/sync_regs_test
/x86_64/vmx_close_while_nested_test
/x86_64/vmx_set_nested_state_test
/x86_64/vmx_tsc_adjust_test
/x86_64/state_test
/clear_dirty_log_test
/dirty_log_test

View File

@ -20,6 +20,8 @@ TEST_GEN_PROGS_x86_64 += x86_64/evmcs_test
TEST_GEN_PROGS_x86_64 += x86_64/hyperv_cpuid
TEST_GEN_PROGS_x86_64 += x86_64/vmx_close_while_nested_test
TEST_GEN_PROGS_x86_64 += x86_64/smm_test
TEST_GEN_PROGS_x86_64 += x86_64/kvm_create_max_vcpus
TEST_GEN_PROGS_x86_64 += x86_64/vmx_set_nested_state_test
TEST_GEN_PROGS_x86_64 += dirty_log_test
TEST_GEN_PROGS_x86_64 += clear_dirty_log_test

View File

@ -314,7 +314,7 @@ static void run_test(enum vm_guest_mode mode, unsigned long iterations,
#ifdef USE_CLEAR_DIRTY_LOG
struct kvm_enable_cap cap = {};
cap.cap = KVM_CAP_MANUAL_DIRTY_LOG_PROTECT;
cap.cap = KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2;
cap.args[0] = 1;
vm_enable_cap(vm, &cap);
#endif
@ -430,7 +430,7 @@ int main(int argc, char *argv[])
int opt, i;
#ifdef USE_CLEAR_DIRTY_LOG
if (!kvm_check_cap(KVM_CAP_MANUAL_DIRTY_LOG_PROTECT)) {
if (!kvm_check_cap(KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2)) {
fprintf(stderr, "KVM_CLEAR_DIRTY_LOG not available, skipping tests\n");
exit(KSFT_SKIP);
}

View File

@ -118,6 +118,10 @@ void vcpu_events_get(struct kvm_vm *vm, uint32_t vcpuid,
struct kvm_vcpu_events *events);
void vcpu_events_set(struct kvm_vm *vm, uint32_t vcpuid,
struct kvm_vcpu_events *events);
void vcpu_nested_state_get(struct kvm_vm *vm, uint32_t vcpuid,
struct kvm_nested_state *state);
int vcpu_nested_state_set(struct kvm_vm *vm, uint32_t vcpuid,
struct kvm_nested_state *state, bool ignore_error);
const char *exit_reason_str(unsigned int exit_reason);

View File

@ -1250,6 +1250,38 @@ void vcpu_events_set(struct kvm_vm *vm, uint32_t vcpuid,
ret, errno);
}
void vcpu_nested_state_get(struct kvm_vm *vm, uint32_t vcpuid,
struct kvm_nested_state *state)
{
struct vcpu *vcpu = vcpu_find(vm, vcpuid);
int ret;
TEST_ASSERT(vcpu != NULL, "vcpu not found, vcpuid: %u", vcpuid);
ret = ioctl(vcpu->fd, KVM_GET_NESTED_STATE, state);
TEST_ASSERT(ret == 0,
"KVM_SET_NESTED_STATE failed, ret: %i errno: %i",
ret, errno);
}
int vcpu_nested_state_set(struct kvm_vm *vm, uint32_t vcpuid,
struct kvm_nested_state *state, bool ignore_error)
{
struct vcpu *vcpu = vcpu_find(vm, vcpuid);
int ret;
TEST_ASSERT(vcpu != NULL, "vcpu not found, vcpuid: %u", vcpuid);
ret = ioctl(vcpu->fd, KVM_SET_NESTED_STATE, state);
if (!ignore_error) {
TEST_ASSERT(ret == 0,
"KVM_SET_NESTED_STATE failed, ret: %i errno: %i",
ret, errno);
}
return ret;
}
/*
* VM VCPU System Regs Get
*

View File

@ -0,0 +1,70 @@
/*
* kvm_create_max_vcpus
*
* Copyright (C) 2019, Google LLC.
*
* This work is licensed under the terms of the GNU GPL, version 2.
*
* Test for KVM_CAP_MAX_VCPUS and KVM_CAP_MAX_VCPU_ID.
*/
#define _GNU_SOURCE /* for program_invocation_short_name */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include "test_util.h"
#include "kvm_util.h"
#include "asm/kvm.h"
#include "linux/kvm.h"
void test_vcpu_creation(int first_vcpu_id, int num_vcpus)
{
struct kvm_vm *vm;
int i;
printf("Testing creating %d vCPUs, with IDs %d...%d.\n",
num_vcpus, first_vcpu_id, first_vcpu_id + num_vcpus - 1);
vm = vm_create(VM_MODE_P52V48_4K, DEFAULT_GUEST_PHY_PAGES, O_RDWR);
for (i = 0; i < num_vcpus; i++) {
int vcpu_id = first_vcpu_id + i;
/* This asserts that the vCPU was created. */
vm_vcpu_add(vm, vcpu_id, 0, 0);
}
kvm_vm_free(vm);
}
int main(int argc, char *argv[])
{
int kvm_max_vcpu_id = kvm_check_cap(KVM_CAP_MAX_VCPU_ID);
int kvm_max_vcpus = kvm_check_cap(KVM_CAP_MAX_VCPUS);
printf("KVM_CAP_MAX_VCPU_ID: %d\n", kvm_max_vcpu_id);
printf("KVM_CAP_MAX_VCPUS: %d\n", kvm_max_vcpus);
/*
* Upstream KVM prior to 4.8 does not support KVM_CAP_MAX_VCPU_ID.
* Userspace is supposed to use KVM_CAP_MAX_VCPUS as the maximum ID
* in this case.
*/
if (!kvm_max_vcpu_id)
kvm_max_vcpu_id = kvm_max_vcpus;
TEST_ASSERT(kvm_max_vcpu_id >= kvm_max_vcpus,
"KVM_MAX_VCPU_ID (%d) must be at least as large as KVM_MAX_VCPUS (%d).",
kvm_max_vcpu_id, kvm_max_vcpus);
test_vcpu_creation(0, kvm_max_vcpus);
if (kvm_max_vcpu_id > kvm_max_vcpus)
test_vcpu_creation(
kvm_max_vcpu_id - kvm_max_vcpus, kvm_max_vcpus);
return 0;
}

View File

@ -0,0 +1,280 @@
/*
* vmx_set_nested_state_test
*
* Copyright (C) 2019, Google LLC.
*
* This work is licensed under the terms of the GNU GPL, version 2.
*
* This test verifies the integrity of calling the ioctl KVM_SET_NESTED_STATE.
*/
#include "test_util.h"
#include "kvm_util.h"
#include "processor.h"
#include "vmx.h"
#include <errno.h>
#include <linux/kvm.h>
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>
/*
* Mirror of VMCS12_REVISION in arch/x86/kvm/vmx/vmcs12.h. If that value
* changes this should be updated.
*/
#define VMCS12_REVISION 0x11e57ed0
#define VCPU_ID 5
void test_nested_state(struct kvm_vm *vm, struct kvm_nested_state *state)
{
volatile struct kvm_run *run;
vcpu_nested_state_set(vm, VCPU_ID, state, false);
run = vcpu_state(vm, VCPU_ID);
vcpu_run(vm, VCPU_ID);
TEST_ASSERT(run->exit_reason == KVM_EXIT_SHUTDOWN,
"Got exit_reason other than KVM_EXIT_SHUTDOWN: %u (%s),\n",
run->exit_reason,
exit_reason_str(run->exit_reason));
}
void test_nested_state_expect_errno(struct kvm_vm *vm,
struct kvm_nested_state *state,
int expected_errno)
{
volatile struct kvm_run *run;
int rv;
rv = vcpu_nested_state_set(vm, VCPU_ID, state, true);
TEST_ASSERT(rv == -1 && errno == expected_errno,
"Expected %s (%d) from vcpu_nested_state_set but got rv: %i errno: %s (%d)",
strerror(expected_errno), expected_errno, rv, strerror(errno),
errno);
run = vcpu_state(vm, VCPU_ID);
vcpu_run(vm, VCPU_ID);
TEST_ASSERT(run->exit_reason == KVM_EXIT_SHUTDOWN,
"Got exit_reason other than KVM_EXIT_SHUTDOWN: %u (%s),\n",
run->exit_reason,
exit_reason_str(run->exit_reason));
}
void test_nested_state_expect_einval(struct kvm_vm *vm,
struct kvm_nested_state *state)
{
test_nested_state_expect_errno(vm, state, EINVAL);
}
void test_nested_state_expect_efault(struct kvm_vm *vm,
struct kvm_nested_state *state)
{
test_nested_state_expect_errno(vm, state, EFAULT);
}
void set_revision_id_for_vmcs12(struct kvm_nested_state *state,
u32 vmcs12_revision)
{
/* Set revision_id in vmcs12 to vmcs12_revision. */
*(u32 *)(state->data) = vmcs12_revision;
}
void set_default_state(struct kvm_nested_state *state)
{
memset(state, 0, sizeof(*state));
state->flags = KVM_STATE_NESTED_RUN_PENDING |
KVM_STATE_NESTED_GUEST_MODE;
state->format = 0;
state->size = sizeof(*state);
}
void set_default_vmx_state(struct kvm_nested_state *state, int size)
{
memset(state, 0, size);
state->flags = KVM_STATE_NESTED_GUEST_MODE |
KVM_STATE_NESTED_RUN_PENDING |
KVM_STATE_NESTED_EVMCS;
state->format = 0;
state->size = size;
state->vmx.vmxon_pa = 0x1000;
state->vmx.vmcs_pa = 0x2000;
state->vmx.smm.flags = 0;
set_revision_id_for_vmcs12(state, VMCS12_REVISION);
}
void test_vmx_nested_state(struct kvm_vm *vm)
{
/* Add a page for VMCS12. */
const int state_sz = sizeof(struct kvm_nested_state) + getpagesize();
struct kvm_nested_state *state =
(struct kvm_nested_state *)malloc(state_sz);
/* The format must be set to 0. 0 for VMX, 1 for SVM. */
set_default_vmx_state(state, state_sz);
state->format = 1;
test_nested_state_expect_einval(vm, state);
/*
* We cannot virtualize anything if the guest does not have VMX
* enabled.
*/
set_default_vmx_state(state, state_sz);
test_nested_state_expect_einval(vm, state);
/*
* We cannot virtualize anything if the guest does not have VMX
* enabled. We expect KVM_SET_NESTED_STATE to return 0 if vmxon_pa
* is set to -1ull.
*/
set_default_vmx_state(state, state_sz);
state->vmx.vmxon_pa = -1ull;
test_nested_state(vm, state);
/* Enable VMX in the guest CPUID. */
vcpu_set_cpuid(vm, VCPU_ID, kvm_get_supported_cpuid());
/* It is invalid to have vmxon_pa == -1ull and SMM flags non-zero. */
set_default_vmx_state(state, state_sz);
state->vmx.vmxon_pa = -1ull;
state->vmx.smm.flags = 1;
test_nested_state_expect_einval(vm, state);
/* It is invalid to have vmxon_pa == -1ull and vmcs_pa != -1ull. */
set_default_vmx_state(state, state_sz);
state->vmx.vmxon_pa = -1ull;
state->vmx.vmcs_pa = 0;
test_nested_state_expect_einval(vm, state);
/*
* Setting vmxon_pa == -1ull and vmcs_pa == -1ull exits early without
* setting the nested state.
*/
set_default_vmx_state(state, state_sz);
state->vmx.vmxon_pa = -1ull;
state->vmx.vmcs_pa = -1ull;
test_nested_state(vm, state);
/* It is invalid to have vmxon_pa set to a non-page aligned address. */
set_default_vmx_state(state, state_sz);
state->vmx.vmxon_pa = 1;
test_nested_state_expect_einval(vm, state);
/*
* It is invalid to have KVM_STATE_NESTED_SMM_GUEST_MODE and
* KVM_STATE_NESTED_GUEST_MODE set together.
*/
set_default_vmx_state(state, state_sz);
state->flags = KVM_STATE_NESTED_GUEST_MODE |
KVM_STATE_NESTED_RUN_PENDING;
state->vmx.smm.flags = KVM_STATE_NESTED_SMM_GUEST_MODE;
test_nested_state_expect_einval(vm, state);
/*
* It is invalid to have any of the SMM flags set besides:
* KVM_STATE_NESTED_SMM_GUEST_MODE
* KVM_STATE_NESTED_SMM_VMXON
*/
set_default_vmx_state(state, state_sz);
state->vmx.smm.flags = ~(KVM_STATE_NESTED_SMM_GUEST_MODE |
KVM_STATE_NESTED_SMM_VMXON);
test_nested_state_expect_einval(vm, state);
/* Outside SMM, SMM flags must be zero. */
set_default_vmx_state(state, state_sz);
state->flags = 0;
state->vmx.smm.flags = KVM_STATE_NESTED_SMM_GUEST_MODE;
test_nested_state_expect_einval(vm, state);
/* Size must be large enough to fit kvm_nested_state and vmcs12. */
set_default_vmx_state(state, state_sz);
state->size = sizeof(*state);
test_nested_state(vm, state);
/* vmxon_pa cannot be the same address as vmcs_pa. */
set_default_vmx_state(state, state_sz);
state->vmx.vmxon_pa = 0;
state->vmx.vmcs_pa = 0;
test_nested_state_expect_einval(vm, state);
/* The revision id for vmcs12 must be VMCS12_REVISION. */
set_default_vmx_state(state, state_sz);
set_revision_id_for_vmcs12(state, 0);
test_nested_state_expect_einval(vm, state);
/*
* Test that if we leave nesting the state reflects that when we get
* it again.
*/
set_default_vmx_state(state, state_sz);
state->vmx.vmxon_pa = -1ull;
state->vmx.vmcs_pa = -1ull;
state->flags = 0;
test_nested_state(vm, state);
vcpu_nested_state_get(vm, VCPU_ID, state);
TEST_ASSERT(state->size >= sizeof(*state) && state->size <= state_sz,
"Size must be between %d and %d. The size returned was %d.",
sizeof(*state), state_sz, state->size);
TEST_ASSERT(state->vmx.vmxon_pa == -1ull, "vmxon_pa must be -1ull.");
TEST_ASSERT(state->vmx.vmcs_pa == -1ull, "vmcs_pa must be -1ull.");
free(state);
}
int main(int argc, char *argv[])
{
struct kvm_vm *vm;
struct kvm_nested_state state;
struct kvm_cpuid_entry2 *entry = kvm_get_supported_cpuid_entry(1);
if (!kvm_check_cap(KVM_CAP_NESTED_STATE)) {
printf("KVM_CAP_NESTED_STATE not available, skipping test\n");
exit(KSFT_SKIP);
}
/*
* AMD currently does not implement set_nested_state, so for now we
* just early out.
*/
if (!(entry->ecx & CPUID_VMX)) {
fprintf(stderr, "nested VMX not enabled, skipping test\n");
exit(KSFT_SKIP);
}
vm = vm_create_default(VCPU_ID, 0, 0);
/* Passing a NULL kvm_nested_state causes a EFAULT. */
test_nested_state_expect_efault(vm, NULL);
/* 'size' cannot be smaller than sizeof(kvm_nested_state). */
set_default_state(&state);
state.size = 0;
test_nested_state_expect_einval(vm, &state);
/*
* Setting the flags 0xf fails the flags check. The only flags that
* can be used are:
* KVM_STATE_NESTED_GUEST_MODE
* KVM_STATE_NESTED_RUN_PENDING
* KVM_STATE_NESTED_EVMCS
*/
set_default_state(&state);
state.flags = 0xf;
test_nested_state_expect_einval(vm, &state);
/*
* If KVM_STATE_NESTED_RUN_PENDING is set then
* KVM_STATE_NESTED_GUEST_MODE has to be set as well.
*/
set_default_state(&state);
state.flags = KVM_STATE_NESTED_RUN_PENDING;
test_nested_state_expect_einval(vm, &state);
/*
* TODO: When SVM support is added for KVM_SET_NESTED_STATE
* add tests here to support it like VMX.
*/
if (entry->ecx & CPUID_VMX)
test_vmx_nested_state(vm);
kvm_vm_free(vm);
return 0;
}

View File

@ -57,3 +57,6 @@ config HAVE_KVM_VCPU_ASYNC_IOCTL
config HAVE_KVM_VCPU_RUN_PID_CHANGE
bool
config HAVE_KVM_NO_POLL
bool

View File

@ -56,7 +56,7 @@
__asm__(".arch_extension virt");
#endif
DEFINE_PER_CPU(kvm_cpu_context_t, kvm_host_cpu_state);
DEFINE_PER_CPU(kvm_host_data_t, kvm_host_data);
static DEFINE_PER_CPU(unsigned long, kvm_arm_hyp_stack_page);
/* Per-CPU variable containing the currently running vcpu. */
@ -224,9 +224,6 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
case KVM_CAP_MAX_VCPUS:
r = KVM_MAX_VCPUS;
break;
case KVM_CAP_NR_MEMSLOTS:
r = KVM_USER_MEM_SLOTS;
break;
case KVM_CAP_MSI_DEVID:
if (!kvm)
r = -EINVAL;
@ -360,8 +357,10 @@ int kvm_arch_vcpu_init(struct kvm_vcpu *vcpu)
void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
{
int *last_ran;
kvm_host_data_t *cpu_data;
last_ran = this_cpu_ptr(vcpu->kvm->arch.last_vcpu_ran);
cpu_data = this_cpu_ptr(&kvm_host_data);
/*
* We might get preempted before the vCPU actually runs, but
@ -373,18 +372,21 @@ void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
}
vcpu->cpu = cpu;
vcpu->arch.host_cpu_context = this_cpu_ptr(&kvm_host_cpu_state);
vcpu->arch.host_cpu_context = &cpu_data->host_ctxt;
kvm_arm_set_running_vcpu(vcpu);
kvm_vgic_load(vcpu);
kvm_timer_vcpu_load(vcpu);
kvm_vcpu_load_sysregs(vcpu);
kvm_arch_vcpu_load_fp(vcpu);
kvm_vcpu_pmu_restore_guest(vcpu);
if (single_task_running())
vcpu_clear_wfe_traps(vcpu);
else
vcpu_set_wfe_traps(vcpu);
vcpu_ptrauth_setup_lazy(vcpu);
}
void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
@ -393,6 +395,7 @@ void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
kvm_vcpu_put_sysregs(vcpu);
kvm_timer_vcpu_put(vcpu);
kvm_vgic_put(vcpu);
kvm_vcpu_pmu_restore_host(vcpu);
vcpu->cpu = -1;
@ -545,6 +548,9 @@ static int kvm_vcpu_first_run_init(struct kvm_vcpu *vcpu)
if (likely(vcpu->arch.has_run_once))
return 0;
if (!kvm_arm_vcpu_is_finalized(vcpu))
return -EPERM;
vcpu->arch.has_run_once = true;
if (likely(irqchip_in_kernel(kvm))) {
@ -1121,6 +1127,10 @@ long kvm_arch_vcpu_ioctl(struct file *filp,
if (unlikely(!kvm_vcpu_initialized(vcpu)))
break;
r = -EPERM;
if (!kvm_arm_vcpu_is_finalized(vcpu))
break;
r = -EFAULT;
if (copy_from_user(&reg_list, user_list, sizeof(reg_list)))
break;
@ -1174,6 +1184,17 @@ long kvm_arch_vcpu_ioctl(struct file *filp,
return kvm_arm_vcpu_set_events(vcpu, &events);
}
case KVM_ARM_VCPU_FINALIZE: {
int what;
if (!kvm_vcpu_initialized(vcpu))
return -ENOEXEC;
if (get_user(what, (const int __user *)argp))
return -EFAULT;
return kvm_arm_vcpu_finalize(vcpu, what);
}
default:
r = -EINVAL;
}
@ -1554,11 +1575,11 @@ static int init_hyp_mode(void)
}
for_each_possible_cpu(cpu) {
kvm_cpu_context_t *cpu_ctxt;
kvm_host_data_t *cpu_data;
cpu_ctxt = per_cpu_ptr(&kvm_host_cpu_state, cpu);
kvm_init_host_cpu_context(cpu_ctxt, cpu);
err = create_hyp_mappings(cpu_ctxt, cpu_ctxt + 1, PAGE_HYP);
cpu_data = per_cpu_ptr(&kvm_host_data, cpu);
kvm_init_host_cpu_context(&cpu_data->host_ctxt, cpu);
err = create_hyp_mappings(cpu_data, cpu_data + 1, PAGE_HYP);
if (err) {
kvm_err("Cannot map host CPU state: %d\n", err);
@ -1669,6 +1690,10 @@ int kvm_arch_init(void *opaque)
if (err)
return err;
err = kvm_arm_init_sve();
if (err)
return err;
if (!in_hyp_mode) {
err = init_hyp_mode();
if (err)

View File

@ -51,9 +51,9 @@
#include <linux/slab.h>
#include <linux/sort.h>
#include <linux/bsearch.h>
#include <linux/io.h>
#include <asm/processor.h>
#include <asm/io.h>
#include <asm/ioctl.h>
#include <linux/uaccess.h>
#include <asm/pgtable.h>
@ -1135,11 +1135,11 @@ EXPORT_SYMBOL_GPL(kvm_get_dirty_log);
#ifdef CONFIG_KVM_GENERIC_DIRTYLOG_READ_PROTECT
/**
* kvm_get_dirty_log_protect - get a snapshot of dirty pages, and if any pages
* kvm_get_dirty_log_protect - get a snapshot of dirty pages
* and reenable dirty page tracking for the corresponding pages.
* @kvm: pointer to kvm instance
* @log: slot id and address to which we copy the log
* @is_dirty: flag set if any page is dirty
* @flush: true if TLB flush is needed by caller
*
* We need to keep it in mind that VCPU threads can write to the bitmap
* concurrently. So, to avoid losing track of dirty pages we keep the
@ -1224,6 +1224,7 @@ EXPORT_SYMBOL_GPL(kvm_get_dirty_log_protect);
* and reenable dirty page tracking for the corresponding pages.
* @kvm: pointer to kvm instance
* @log: slot id and address from which to fetch the bitmap of dirty pages
* @flush: true if TLB flush is needed by caller
*/
int kvm_clear_dirty_log_protect(struct kvm *kvm,
struct kvm_clear_dirty_log *log, bool *flush)
@ -1251,7 +1252,7 @@ int kvm_clear_dirty_log_protect(struct kvm *kvm,
if (!dirty_bitmap)
return -ENOENT;
n = kvm_dirty_bitmap_bytes(memslot);
n = ALIGN(log->num_pages, BITS_PER_LONG) / 8;
if (log->first_page > memslot->npages ||
log->num_pages > memslot->npages - log->first_page ||
@ -1264,8 +1265,8 @@ int kvm_clear_dirty_log_protect(struct kvm *kvm,
return -EFAULT;
spin_lock(&kvm->mmu_lock);
for (offset = log->first_page,
i = offset / BITS_PER_LONG, n = log->num_pages / BITS_PER_LONG; n--;
for (offset = log->first_page, i = offset / BITS_PER_LONG,
n = DIV_ROUND_UP(log->num_pages, BITS_PER_LONG); n--;
i++, offset += BITS_PER_LONG) {
unsigned long mask = *dirty_bitmap_buffer++;
atomic_long_t *p = (atomic_long_t *) &dirty_bitmap[i];
@ -1742,6 +1743,70 @@ struct page *gfn_to_page(struct kvm *kvm, gfn_t gfn)
}
EXPORT_SYMBOL_GPL(gfn_to_page);
static int __kvm_map_gfn(struct kvm_memory_slot *slot, gfn_t gfn,
struct kvm_host_map *map)
{
kvm_pfn_t pfn;
void *hva = NULL;
struct page *page = KVM_UNMAPPED_PAGE;
if (!map)
return -EINVAL;
pfn = gfn_to_pfn_memslot(slot, gfn);
if (is_error_noslot_pfn(pfn))
return -EINVAL;
if (pfn_valid(pfn)) {
page = pfn_to_page(pfn);
hva = kmap(page);
} else {
hva = memremap(pfn_to_hpa(pfn), PAGE_SIZE, MEMREMAP_WB);
}
if (!hva)
return -EFAULT;
map->page = page;
map->hva = hva;
map->pfn = pfn;
map->gfn = gfn;
return 0;
}
int kvm_vcpu_map(struct kvm_vcpu *vcpu, gfn_t gfn, struct kvm_host_map *map)
{
return __kvm_map_gfn(kvm_vcpu_gfn_to_memslot(vcpu, gfn), gfn, map);
}
EXPORT_SYMBOL_GPL(kvm_vcpu_map);
void kvm_vcpu_unmap(struct kvm_vcpu *vcpu, struct kvm_host_map *map,
bool dirty)
{
if (!map)
return;
if (!map->hva)
return;
if (map->page)
kunmap(map->page);
else
memunmap(map->hva);
if (dirty) {
kvm_vcpu_mark_page_dirty(vcpu, map->gfn);
kvm_release_pfn_dirty(map->pfn);
} else {
kvm_release_pfn_clean(map->pfn);
}
map->hva = NULL;
map->page = NULL;
}
EXPORT_SYMBOL_GPL(kvm_vcpu_unmap);
struct page *kvm_vcpu_gfn_to_page(struct kvm_vcpu *vcpu, gfn_t gfn)
{
kvm_pfn_t pfn;
@ -2255,7 +2320,7 @@ void kvm_vcpu_block(struct kvm_vcpu *vcpu)
u64 block_ns;
start = cur = ktime_get();
if (vcpu->halt_poll_ns) {
if (vcpu->halt_poll_ns && !kvm_arch_no_poll(vcpu)) {
ktime_t stop = ktime_add_ns(ktime_get(), vcpu->halt_poll_ns);
++vcpu->stat.halt_attempted_poll;
@ -2886,6 +2951,16 @@ out:
}
#endif
static int kvm_device_mmap(struct file *filp, struct vm_area_struct *vma)
{
struct kvm_device *dev = filp->private_data;
if (dev->ops->mmap)
return dev->ops->mmap(dev, vma);
return -ENODEV;
}
static int kvm_device_ioctl_attr(struct kvm_device *dev,
int (*accessor)(struct kvm_device *dev,
struct kvm_device_attr *attr),
@ -2930,6 +3005,13 @@ static int kvm_device_release(struct inode *inode, struct file *filp)
struct kvm_device *dev = filp->private_data;
struct kvm *kvm = dev->kvm;
if (dev->ops->release) {
mutex_lock(&kvm->lock);
list_del(&dev->vm_node);
dev->ops->release(dev);
mutex_unlock(&kvm->lock);
}
kvm_put_kvm(kvm);
return 0;
}
@ -2938,6 +3020,7 @@ static const struct file_operations kvm_device_fops = {
.unlocked_ioctl = kvm_device_ioctl,
.release = kvm_device_release,
KVM_COMPAT(kvm_device_ioctl),
.mmap = kvm_device_mmap,
};
struct kvm_device *kvm_device_from_filp(struct file *filp)
@ -3046,7 +3129,7 @@ static long kvm_vm_ioctl_check_extension_generic(struct kvm *kvm, long arg)
case KVM_CAP_CHECK_EXTENSION_VM:
case KVM_CAP_ENABLE_CAP_VM:
#ifdef CONFIG_KVM_GENERIC_DIRTYLOG_READ_PROTECT
case KVM_CAP_MANUAL_DIRTY_LOG_PROTECT:
case KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2:
#endif
return 1;
#ifdef CONFIG_KVM_MMIO
@ -3065,6 +3148,8 @@ static long kvm_vm_ioctl_check_extension_generic(struct kvm *kvm, long arg)
#endif
case KVM_CAP_MAX_VCPU_ID:
return KVM_MAX_VCPU_ID;
case KVM_CAP_NR_MEMSLOTS:
return KVM_USER_MEM_SLOTS;
default:
break;
}
@ -3082,7 +3167,7 @@ static int kvm_vm_ioctl_enable_cap_generic(struct kvm *kvm,
{
switch (cap->cap) {
#ifdef CONFIG_KVM_GENERIC_DIRTYLOG_READ_PROTECT
case KVM_CAP_MANUAL_DIRTY_LOG_PROTECT:
case KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2:
if (cap->flags || (cap->args[0] & ~1))
return -EINVAL;
kvm->manual_dirty_log_protect = cap->args[0];