Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull kvm updates from Paolo Bonzini:
 "S390:
   - ultravisor communication device driver
   - fix TEID on terminating storage key ops

  RISC-V:
   - Added Sv57x4 support for G-stage page table
   - Added range based local HFENCE functions
   - Added remote HFENCE functions based on VCPU requests
   - Added ISA extension registers in ONE_REG interface
   - Updated KVM RISC-V maintainers entry to cover selftests support

  ARM:
   - Add support for the ARMv8.6 WFxT extension
   - Guard pages for the EL2 stacks
   - Trap and emulate AArch32 ID registers to hide unsupported features
   - Ability to select and save/restore the set of hypercalls exposed to the guest
   - Support for PSCI-initiated suspend in collaboration with userspace
   - GICv3 register-based LPI invalidation support
   - Move host PMU event merging into the vcpu data structure
   - GICv3 ITS save/restore fixes
   - The usual set of small-scale cleanups and fixes

  x86:
   - New ioctls to get/set TSC frequency for a whole VM
   - Allow userspace to opt out of hypercall patching
   - Only do MSR filtering for MSRs accessed by rdmsr/wrmsr

  AMD SEV improvements:
   - Add KVM_EXIT_SHUTDOWN metadata for SEV-ES
   - V_TSC_AUX support

  Nested virtualization improvements for AMD:
   - Support for "nested nested" optimizations (nested vVMLOAD/VMSAVE, nested vGIF)
   - Allow AVIC to co-exist with a nested guest running
   - Fixes for LBR virtualizations when a nested guest is running, and nested LBR virtualization support
   - PAUSE filtering for nested hypervisors

  Guest support:
   - Decoupling of vcpu_is_preempted from PV spinlocks"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (199 commits)
  KVM: x86: Fix the intel_pt PMI handling wrongly considered from guest
  KVM: selftests: x86: Sync the new name of the test case to .gitignore
  Documentation: kvm: reorder ARM-specific section about KVM_SYSTEM_EVENT_SUSPEND
  x86, kvm: use correct GFP flags for preemption disabled
  KVM: LAPIC: Drop pending LAPIC timer injection when canceling the timer
  x86/kvm: Alloc dummy async #PF token outside of raw spinlock
  KVM: x86: avoid calling x86 emulator without a decoded instruction
  KVM: SVM: Use kzalloc for sev ioctl interfaces to prevent kernel data leak
  x86/fpu: KVM: Set the base guest FPU uABI size to sizeof(struct kvm_xsave)
  s390/uv_uapi: depend on CONFIG_S390
  KVM: selftests: x86: Fix test failure on arch lbr capable platforms
  KVM: LAPIC: Trace LAPIC timer expiration on every vmentry
  KVM: s390: selftest: Test suppression indication on key prot exception
  KVM: s390: Don't indicate suppression on dirtying, failing memop
  selftests: drivers/s390x: Add uvdevice tests
  drivers/s390/char: Add Ultravisor io device
  MAINTAINERS: Update KVM RISC-V entry to cover selftests support
  RISC-V: KVM: Introduce ISA extension register
  RISC-V: KVM: Cleanup stale TLB entries when host CPU changes
  RISC-V: KVM: Add remote HFENCE functions based on VCPU requests
  ...
commit bf9095424d
@@ -290,6 +290,8 @@ infrastructure:
     +------------------------------+---------+---------+
     | RPRES                        | [7-4]   |    y    |
     +------------------------------+---------+---------+
     | WFXT                         | [3-0]   |    y    |
     +------------------------------+---------+---------+

Appendix I: Example

@@ -297,6 +297,10 @@ HWCAP2_SME_FA64

    Functionality implied by ID_AA64SMFR0_EL1.FA64 == 0b1.

HWCAP2_WFXT

    Functionality implied by ID_AA64ISAR2_EL1.WFXT == 0b0010.

4. Unused AT_HWCAP bits
-----------------------

@@ -982,12 +982,22 @@ memory.
	__u8 pad2[30];
  };

-If the KVM_XEN_HVM_CONFIG_INTERCEPT_HCALL flag is returned from the
-KVM_CAP_XEN_HVM check, it may be set in the flags field of this ioctl.
-This requests KVM to generate the contents of the hypercall page
-automatically; hypercalls will be intercepted and passed to userspace
-through KVM_EXIT_XEN. In this case, all of the blob size and address
-fields must be zero.
+If certain flags are returned from the KVM_CAP_XEN_HVM check, they may
+be set in the flags field of this ioctl:
+
+The KVM_XEN_HVM_CONFIG_INTERCEPT_HCALL flag requests KVM to generate
+the contents of the hypercall page automatically; hypercalls will be
+intercepted and passed to userspace through KVM_EXIT_XEN. In this
+case, all of the blob size and address fields must be zero.

The KVM_XEN_HVM_CONFIG_EVTCHN_SEND flag indicates to KVM that userspace
will always use the KVM_XEN_HVM_EVTCHN_SEND ioctl to deliver event
channel interrupts rather than manipulating the guest's shared_info
structures directly. This, in turn, may allow KVM to enable features
such as intercepting the SCHEDOP_poll hypercall to accelerate PV
spinlock operation for the guest. Userspace may still use the ioctl
to deliver events if it was advertised, even if userspace does not
send this indication that it will always do so.

No other flags are currently valid in the struct kvm_xen_hvm_config.
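A minimal userspace sketch of the flow described above (not part of the patch itself): check which KVM_CAP_XEN_HVM bits the kernel advertises, then opt into hypercall interception with the blob fields zeroed. Error handling is intentionally thin::

   #include <linux/kvm.h>
   #include <string.h>
   #include <sys/ioctl.h>

   static int xen_enable_hcall_intercept(int vm_fd)
   {
           struct kvm_xen_hvm_config cfg;
           int xen_caps = ioctl(vm_fd, KVM_CHECK_EXTENSION, KVM_CAP_XEN_HVM);

           if (xen_caps < 0 || !(xen_caps & KVM_XEN_HVM_CONFIG_INTERCEPT_HCALL))
                   return -1;      /* not advertised; fall back to the MSR-based page */

           memset(&cfg, 0, sizeof(cfg));   /* blob size/address fields must be zero */
           cfg.flags = KVM_XEN_HVM_CONFIG_INTERCEPT_HCALL;
           return ioctl(vm_fd, KVM_XEN_HVM_CONFIG, &cfg);
   }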
@@ -1476,14 +1486,43 @@ Possible values are:
                                 [s390]
   KVM_MP_STATE_LOAD             the vcpu is in a special load/startup state
                                 [s390]
   KVM_MP_STATE_SUSPENDED        the vcpu is in a suspend state and is waiting
                                 for a wakeup event [arm64]
   ==========================    ===============================================

On x86, this ioctl is only useful after KVM_CREATE_IRQCHIP. Without an
in-kernel irqchip, the multiprocessing state must be maintained by userspace on
these architectures.

-For arm64/riscv:
-^^^^^^^^^^^^^^^^
+For arm64:
+^^^^^^^^^^

If a vCPU is in the KVM_MP_STATE_SUSPENDED state, KVM will emulate the
architectural execution of a WFI instruction.

If a wakeup event is recognized, KVM will exit to userspace with a
KVM_SYSTEM_EVENT exit, where the event type is KVM_SYSTEM_EVENT_WAKEUP. If
userspace wants to honor the wakeup, it must set the vCPU's MP state to
KVM_MP_STATE_RUNNABLE. If it does not, KVM will continue to await a wakeup
event in subsequent calls to KVM_RUN.

.. warning::

     If userspace intends to keep the vCPU in a SUSPENDED state, it is
     strongly recommended that userspace take action to suppress the
     wakeup event (such as masking an interrupt). Otherwise, subsequent
     calls to KVM_RUN will immediately exit with a KVM_SYSTEM_EVENT_WAKEUP
     event and inadvertently waste CPU cycles.

     Additionally, if userspace takes action to suppress a wakeup event,
     it is strongly recommended that it also restores the vCPU to its
     original state when the vCPU is made RUNNABLE again. For example,
     if userspace masked a pending interrupt to suppress the wakeup,
     the interrupt should be unmasked before returning control to the
     guest.

For riscv:
^^^^^^^^^^

The only states that are valid are KVM_MP_STATE_STOPPED and
KVM_MP_STATE_RUNNABLE which reflect if the vcpu is paused or not.
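A hedged sketch of the userspace side of the wakeup protocol above: when KVM_RUN returns a KVM_SYSTEM_EVENT_WAKEUP exit, either make the vCPU runnable again or leave it suspended (after suppressing the wakeup source). The vcpu_fd and kvm_run mapping are assumed to be set up elsewhere::

   #include <linux/kvm.h>
   #include <stdbool.h>
   #include <sys/ioctl.h>

   static int handle_wakeup(int vcpu_fd, struct kvm_run *run, bool honor)
   {
           if (run->exit_reason != KVM_EXIT_SYSTEM_EVENT ||
               run->system_event.type != KVM_SYSTEM_EVENT_WAKEUP)
                   return 0;               /* not a wakeup exit */

           if (honor) {
                   struct kvm_mp_state st = { .mp_state = KVM_MP_STATE_RUNNABLE };
                   return ioctl(vcpu_fd, KVM_SET_MP_STATE, &st);
           }
           /*
            * Otherwise suppress/acknowledge the wakeup source here, or the
            * next KVM_RUN will immediately exit with another WAKEUP event.
            */
           return 0;
   }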
@@ -1887,22 +1926,25 @@ the future.
4.55 KVM_SET_TSC_KHZ
--------------------

-:Capability: KVM_CAP_TSC_CONTROL
+:Capability: KVM_CAP_TSC_CONTROL / KVM_CAP_VM_TSC_CONTROL
:Architectures: x86
-:Type: vcpu ioctl
+:Type: vcpu ioctl / vm ioctl
:Parameters: virtual tsc_khz
:Returns: 0 on success, -1 on error

Specifies the tsc frequency for the virtual machine. The unit of the
frequency is KHz.

If the KVM_CAP_VM_TSC_CONTROL capability is advertised, this can also
be used as a vm ioctl to set the initial tsc frequency of subsequently
created vCPUs.

4.56 KVM_GET_TSC_KHZ
--------------------

-:Capability: KVM_CAP_GET_TSC_KHZ
+:Capability: KVM_CAP_GET_TSC_KHZ / KVM_CAP_VM_TSC_CONTROL
:Architectures: x86
-:Type: vcpu ioctl
+:Type: vcpu ioctl / vm ioctl
:Parameters: none
:Returns: virtual tsc-khz on success, negative value on error
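A small sketch of how a VMM might use the new VM-wide variant, falling back to the per-vCPU ioctl when KVM_CAP_VM_TSC_CONTROL is not advertised (tsc_khz is the desired guest frequency in kHz)::

   #include <linux/kvm.h>
   #include <sys/ioctl.h>

   static int set_guest_tsc_khz(int vm_fd, int vcpu_fd, unsigned long tsc_khz)
   {
           /* New kernels: set the default once, before any vCPU is created. */
           if (ioctl(vm_fd, KVM_CHECK_EXTENSION, KVM_CAP_VM_TSC_CONTROL) > 0)
                   return ioctl(vm_fd, KVM_SET_TSC_KHZ, tsc_khz);

           /* Older kernels: configure each vCPU individually. */
           return ioctl(vcpu_fd, KVM_SET_TSC_KHZ, tsc_khz);
   }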
@@ -2601,6 +2643,24 @@ EINVAL.
After the vcpu's SVE configuration is finalized, further attempts to
write this register will fail with EPERM.

arm64 bitmap feature firmware pseudo-registers have the following bit pattern::

  0x6030 0000 0016 <regno:16>

The bitmap feature firmware registers expose the hypercall services that
are available for userspace to configure. The set bits correspond to the
services that are available for the guests to access. By default, KVM
sets all the supported bits during VM initialization. Userspace can
discover the available services via KVM_GET_ONE_REG, and write back the
bitmap corresponding to the features that it wishes guests to see via
KVM_SET_ONE_REG.

Note: These registers are immutable once any of the vCPUs of the VM has
run at least once. A KVM_SET_ONE_REG in such a scenario will return
-EBUSY to userspace.

(See Documentation/virt/kvm/arm/hypercalls.rst for more details.)
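A hedged sketch of the discover-then-restrict flow just described, using KVM_REG_ARM_STD_BMAP as an example. It must run before any vCPU has executed, otherwise the write returns -EBUSY::

   #include <linux/kvm.h>
   #include <stdint.h>
   #include <sys/ioctl.h>

   static int restrict_std_hypercalls(int vcpu_fd, uint64_t allowed_mask)
   {
           uint64_t bmap;
           struct kvm_one_reg reg = {
                   .id   = KVM_REG_ARM_STD_BMAP,
                   .addr = (uint64_t)&bmap,
           };

           if (ioctl(vcpu_fd, KVM_GET_ONE_REG, &reg))      /* discover supported bits */
                   return -1;

           bmap &= allowed_mask;                           /* hide the rest from the guest */
           return ioctl(vcpu_fd, KVM_SET_ONE_REG, &reg);
   }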
MIPS registers are mapped using the lower 32 bits. The upper 16 of that is
the register group type:

@@ -3754,12 +3814,18 @@ in case of KVM_S390_MEMOP_F_CHECK_ONLY), the ioctl returns a positive
error number indicating the type of exception. This exception is also
raised directly at the corresponding VCPU if the flag
KVM_S390_MEMOP_F_INJECT_EXCEPTION is set.
On protection exceptions, unless specified otherwise, the injected
translation-exception identifier (TEID) indicates suppression.

If the KVM_S390_MEMOP_F_SKEY_PROTECTION flag is set, storage key
protection is also in effect and may cause exceptions if accesses are
prohibited given the access key designated by "key"; the valid range is 0..15.
KVM_S390_MEMOP_F_SKEY_PROTECTION is available if KVM_CAP_S390_MEM_OP_EXTENSION
is > 0.
Since the accessed memory may span multiple pages and those pages might have
different storage keys, it is possible that a protection exception occurs
after memory has been modified. In this case, if the exception is injected,
the TEID does not indicate suppression.

Absolute read/write:
^^^^^^^^^^^^^^^^^^^^

@@ -5216,7 +5282,25 @@ have deterministic behavior.
		struct {
			__u64 gfn;
		} shared_info;
		__u64 pad[4];
		struct {
			__u32 send_port;
			__u32 type; /* EVTCHNSTAT_ipi / EVTCHNSTAT_interdomain */
			__u32 flags;
			union {
				struct {
					__u32 port;
					__u32 vcpu;
					__u32 priority;
				} port;
				struct {
					__u32 port; /* Zero for eventfd */
					__s32 fd;
				} eventfd;
				__u32 padding[4];
			} deliver;
		} evtchn;
		__u32 xen_version;
		__u64 pad[8];
	} u;
  };

@@ -5247,6 +5331,30 @@ KVM_XEN_ATTR_TYPE_SHARED_INFO

KVM_XEN_ATTR_TYPE_UPCALL_VECTOR
  Sets the exception vector used to deliver Xen event channel upcalls.
  This is the HVM-wide vector injected directly by the hypervisor
  (not through the local APIC), typically configured by a guest via
  HVM_PARAM_CALLBACK_IRQ.

KVM_XEN_ATTR_TYPE_EVTCHN
  This attribute is available when the KVM_CAP_XEN_HVM ioctl indicates
  support for KVM_XEN_HVM_CONFIG_EVTCHN_SEND features. It configures
  an outbound port number for interception of EVTCHNOP_send requests
  from the guest. A given sending port number may be directed back
  to a specified vCPU (by APIC ID) / port / priority on the guest,
  or to trigger events on an eventfd. The vCPU and priority can be
  changed by setting KVM_XEN_EVTCHN_UPDATE in a subsequent call,
  but other fields cannot change for a given sending port. A port
  mapping is removed by using KVM_XEN_EVTCHN_DEASSIGN in the flags
  field.

KVM_XEN_ATTR_TYPE_XEN_VERSION
  This attribute is available when the KVM_CAP_XEN_HVM ioctl indicates
  support for KVM_XEN_HVM_CONFIG_EVTCHN_SEND features. It configures
  the 32-bit version code returned to the guest when it invokes the
  XENVER_version call; typically (XEN_MAJOR << 16 | XEN_MINOR). PV
  Xen guests will often use this as a dummy hypercall to trigger
  event channel delivery, so responding within the kernel without
  exiting to userspace is beneficial.
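A hedged sketch of programming one such interception with KVM_XEN_ATTR_TYPE_EVTCHN, redirecting guest EVTCHNOP_send on a port to an eventfd. EVTCHNSTAT_interdomain is assumed to come from the Xen public headers carried by the VMM; error handling is omitted::

   #include <linux/kvm.h>
   #include <string.h>
   #include <sys/ioctl.h>

   static int xen_route_port_to_eventfd(int vm_fd, unsigned int guest_port, int efd)
   {
           struct kvm_xen_hvm_attr attr;

           memset(&attr, 0, sizeof(attr));
           attr.type = KVM_XEN_ATTR_TYPE_EVTCHN;
           attr.u.evtchn.send_port = guest_port;
           attr.u.evtchn.type = EVTCHNSTAT_interdomain;   /* from Xen's event_channel.h */
           attr.u.evtchn.deliver.eventfd.port = 0;        /* zero port selects eventfd delivery */
           attr.u.evtchn.deliver.eventfd.fd = efd;

           return ioctl(vm_fd, KVM_XEN_HVM_SET_ATTR, &attr);
   }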
4.127 KVM_XEN_HVM_GET_ATTR
--------------------------

@@ -5258,7 +5366,8 @@ KVM_XEN_ATTR_TYPE_UPCALL_VECTOR
:Returns: 0 on success, < 0 on error

Allows Xen VM attributes to be read. For the structure and types,
-see KVM_XEN_HVM_SET_ATTR above.
+see KVM_XEN_HVM_SET_ATTR above. The KVM_XEN_ATTR_TYPE_EVTCHN
+attribute cannot be read.

4.128 KVM_XEN_VCPU_SET_ATTR
---------------------------

@@ -5285,6 +5394,13 @@ see KVM_XEN_HVM_SET_ATTR above.
			__u64 time_blocked;
			__u64 time_offline;
		} runstate;
		__u32 vcpu_id;
		struct {
			__u32 port;
			__u32 priority;
			__u64 expires_ns;
		} timer;
		__u8 vector;
	} u;
  };

@@ -5326,6 +5442,27 @@ KVM_XEN_VCPU_ATTR_TYPE_RUNSTATE_ADJUST
  or RUNSTATE_offline) to set the current accounted state as of the
  adjusted state_entry_time.

KVM_XEN_VCPU_ATTR_TYPE_VCPU_ID
  This attribute is available when the KVM_CAP_XEN_HVM ioctl indicates
  support for KVM_XEN_HVM_CONFIG_EVTCHN_SEND features. It sets the Xen
  vCPU ID of the given vCPU, to allow timer-related VCPU operations to
  be intercepted by KVM.

KVM_XEN_VCPU_ATTR_TYPE_TIMER
  This attribute is available when the KVM_CAP_XEN_HVM ioctl indicates
  support for KVM_XEN_HVM_CONFIG_EVTCHN_SEND features. It sets the
  event channel port/priority for the VIRQ_TIMER of the vCPU, as well
  as allowing a pending timer to be saved/restored.

KVM_XEN_VCPU_ATTR_TYPE_UPCALL_VECTOR
  This attribute is available when the KVM_CAP_XEN_HVM ioctl indicates
  support for KVM_XEN_HVM_CONFIG_EVTCHN_SEND features. It sets the
  per-vCPU local APIC upcall vector, configured by a Xen guest with
  the HVMOP_set_evtchn_upcall_vector hypercall. This is typically
  used by Windows guests, and is distinct from the HVM-wide upcall
  vector configured with HVM_PARAM_CALLBACK_IRQ.
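A short sketch of setting one of these vCPU attributes, here the Xen vCPU ID, so that timer-related hypercalls can be handled in the kernel; it assumes the KVM_XEN_HVM_CONFIG_EVTCHN_SEND capability bit was seen earlier::

   #include <linux/kvm.h>
   #include <string.h>
   #include <sys/ioctl.h>

   static int xen_set_vcpu_id(int vcpu_fd, unsigned int xen_vcpu_id)
   {
           struct kvm_xen_vcpu_attr attr;

           memset(&attr, 0, sizeof(attr));
           attr.type = KVM_XEN_VCPU_ATTR_TYPE_VCPU_ID;
           attr.u.vcpu_id = xen_vcpu_id;

           return ioctl(vcpu_fd, KVM_XEN_VCPU_SET_ATTR, &attr);
   }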
4.129 KVM_XEN_VCPU_GET_ATTR
---------------------------

@@ -5645,6 +5782,25 @@ enabled with ``arch_prctl()``, but this may change in the future.
The offsets of the state save areas in struct kvm_xsave follow the contents
of CPUID leaf 0xD on the host.

4.135 KVM_XEN_HVM_EVTCHN_SEND
-----------------------------

:Capability: KVM_CAP_XEN_HVM / KVM_XEN_HVM_CONFIG_EVTCHN_SEND
:Architectures: x86
:Type: vm ioctl
:Parameters: struct kvm_irq_routing_xen_evtchn
:Returns: 0 on success, < 0 on error

::

   struct kvm_irq_routing_xen_evtchn {
	__u32 port;
	__u32 vcpu;
	__u32 priority;
   };

This ioctl injects an event channel interrupt directly to the guest vCPU.
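A hedged sketch of invoking this ioctl to deliver a 2-level event channel interrupt; port and vcpu are the guest-visible Xen values::

   #include <linux/kvm.h>
   #include <sys/ioctl.h>

   static int xen_evtchn_send(int vm_fd, unsigned int port, unsigned int vcpu)
   {
           struct kvm_irq_routing_xen_evtchn evt = {
                   .port     = port,
                   .vcpu     = vcpu,
                   .priority = KVM_IRQ_ROUTING_XEN_EVTCHN_PRIO_2LEVEL,
           };

           return ioctl(vm_fd, KVM_XEN_HVM_EVTCHN_SEND, &evt);
   }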
5. The kvm_run structure
========================

@@ -5987,6 +6143,9 @@ should put the acknowledged interrupt vector into the 'epr' field.
  #define KVM_SYSTEM_EVENT_SHUTDOWN       1
  #define KVM_SYSTEM_EVENT_RESET          2
  #define KVM_SYSTEM_EVENT_CRASH          3
  #define KVM_SYSTEM_EVENT_WAKEUP         4
  #define KVM_SYSTEM_EVENT_SUSPEND        5
  #define KVM_SYSTEM_EVENT_SEV_TERM       6
			__u32 type;
			__u32 ndata;
			__u64 data[16];

@@ -6011,6 +6170,13 @@ Valid values for 'type' are:
   has requested a crash condition maintenance. Userspace can choose
   to ignore the request, or to gather VM memory core dump and/or
   reset/shutdown of the VM.
 - KVM_SYSTEM_EVENT_SEV_TERM -- an AMD SEV guest requested termination.
   The guest physical address of the guest's GHCB is stored in `data[0]`.
 - KVM_SYSTEM_EVENT_WAKEUP -- the exiting vCPU is in a suspended state and
   KVM has recognized a wakeup event. Userspace may honor this event by
   marking the exiting vCPU as runnable, or deny it and call KVM_RUN again.
 - KVM_SYSTEM_EVENT_SUSPEND -- the guest has requested a suspension of
   the VM.

If KVM_CAP_SYSTEM_EVENT_DATA is present, the 'data' field can contain
architecture specific information for the system-level event. Only

@@ -6027,6 +6193,32 @@ Previous versions of Linux defined a `flags` member in this struct. The
field is now aliased to `data[0]`. Userspace can assume that it is only
written if ndata is greater than 0.

For arm/arm64:
--------------

KVM_SYSTEM_EVENT_SUSPEND exits are enabled with the
KVM_CAP_ARM_SYSTEM_SUSPEND VM capability. If a guest invokes the PSCI
SYSTEM_SUSPEND function, KVM will exit to userspace with this event
type.

It is the sole responsibility of userspace to implement the PSCI
SYSTEM_SUSPEND call according to ARM DEN0022D.b 5.19 "SYSTEM_SUSPEND".
KVM does not change the vCPU's state before exiting to userspace, so
the call parameters are left in-place in the vCPU registers.

Userspace is _required_ to take action for such an exit. It must
either:

 - Honor the guest request to suspend the VM. Userspace can request
   in-kernel emulation of suspension by setting the calling vCPU's
   state to KVM_MP_STATE_SUSPENDED. Userspace must configure the vCPU's
   state according to the parameters passed to the PSCI function when
   the calling vCPU is resumed. See ARM DEN0022D.b 5.19.1 "Intended use"
   for details on the function parameters.

 - Deny the guest request to suspend the VM. See ARM DEN0022D.b 5.19.2
   "Caller responsibilities" for possible return values.
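A hedged sketch of the "honor" path from the list above, assuming KVM_CAP_ARM_SYSTEM_SUSPEND has been enabled on the VM: on a KVM_SYSTEM_EVENT_SUSPEND exit, request in-kernel suspend emulation by moving the calling vCPU to KVM_MP_STATE_SUSPENDED. Denying the request would instead write a PSCI error code into the vCPU's x0 before re-entering the guest::

   #include <linux/kvm.h>
   #include <sys/ioctl.h>

   static int handle_system_suspend(int vcpu_fd, struct kvm_run *run)
   {
           struct kvm_mp_state st = { .mp_state = KVM_MP_STATE_SUSPENDED };

           if (run->exit_reason != KVM_EXIT_SYSTEM_EVENT ||
               run->system_event.type != KVM_SYSTEM_EVENT_SUSPEND)
                   return 0;

           return ioctl(vcpu_fd, KVM_SET_MP_STATE, &st);
   }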
::

		/* KVM_EXIT_IOAPIC_EOI */

@@ -7147,6 +7339,15 @@ The valid bits in cap.args[0] are:
                                     Additionally, when this quirk is disabled,
                                     KVM clears CPUID.01H:ECX[bit 3] if
                                     IA32_MISC_ENABLE[bit 18] is cleared.

 KVM_X86_QUIRK_FIX_HYPERCALL_INSN    By default, KVM rewrites guest
                                     VMMCALL/VMCALL instructions to match the
                                     vendor's hypercall instruction for the
                                     system. When this quirk is disabled, KVM
                                     will no longer rewrite invalid guest
                                     hypercall instructions. Executing the
                                     incorrect hypercall instruction will
                                     generate a #UD within the guest.
 =================================== ============================================
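A small sketch of how a VMM might opt out of hypercall-instruction patching by disabling the quirk through KVM_CAP_DISABLE_QUIRKS2 (a VM ioctl); afterwards a guest using the wrong vendor's hypercall instruction receives a #UD::

   #include <linux/kvm.h>
   #include <sys/ioctl.h>

   static int disable_hypercall_patching(int vm_fd)
   {
           struct kvm_enable_cap cap = {
                   .cap     = KVM_CAP_DISABLE_QUIRKS2,
                   .args[0] = KVM_X86_QUIRK_FIX_HYPERCALL_INSN,
           };

           return ioctl(vm_fd, KVM_ENABLE_CAP, &cap);
   }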
8. Other capabilities.

@@ -7624,8 +7825,9 @@ PVHVM guests. Valid flags are::

  #define KVM_XEN_HVM_CONFIG_HYPERCALL_MSR	(1 << 0)
  #define KVM_XEN_HVM_CONFIG_INTERCEPT_HCALL	(1 << 1)
  #define KVM_XEN_HVM_CONFIG_SHARED_INFO	(1 << 2)
- #define KVM_XEN_HVM_CONFIG_RUNSTATE		(1 << 2)
- #define KVM_XEN_HVM_CONFIG_EVTCHN_2LEVEL	(1 << 3)
+ #define KVM_XEN_HVM_CONFIG_RUNSTATE		(1 << 3)
+ #define KVM_XEN_HVM_CONFIG_EVTCHN_2LEVEL	(1 << 4)
+ #define KVM_XEN_HVM_CONFIG_EVTCHN_SEND	(1 << 5)

The KVM_XEN_HVM_CONFIG_HYPERCALL_MSR flag indicates that the KVM_XEN_HVM_CONFIG
ioctl is available, for the guest to set its hypercall page.

@@ -7649,6 +7851,14 @@ The KVM_XEN_HVM_CONFIG_EVTCHN_2LEVEL flag indicates that IRQ routing entries
of the type KVM_IRQ_ROUTING_XEN_EVTCHN are supported, with the priority
field set to indicate 2 level event channel delivery.

The KVM_XEN_HVM_CONFIG_EVTCHN_SEND flag indicates that KVM supports
injecting event channel events directly into the guest with the
KVM_XEN_HVM_EVTCHN_SEND ioctl. It also indicates support for the
KVM_XEN_ATTR_TYPE_EVTCHN/XEN_VERSION HVM attributes and the
KVM_XEN_VCPU_ATTR_TYPE_VCPU_ID/TIMER/UPCALL_VECTOR vCPU attributes,
related to event channel delivery, timers, and the XENVER_version
interception.

8.31 KVM_CAP_PPC_MULTITCE
-------------------------

@@ -7736,6 +7946,16 @@ At this time, KVM_PMU_CAP_DISABLE is the only capability. Setting
this capability will disable PMU virtualization for that VM. Usermode
should adjust CPUID leaf 0xA to reflect that the PMU is disabled.

8.36 KVM_CAP_ARM_SYSTEM_SUSPEND
-------------------------------

:Capability: KVM_CAP_ARM_SYSTEM_SUSPEND
:Architectures: arm64
:Type: vm

When enabled, KVM will exit to userspace with KVM_EXIT_SYSTEM_EVENT of
type KVM_SYSTEM_EVENT_SUSPEND to process the guest suspend request.

9. Known KVM API problems
=========================

Documentation/virt/kvm/arm/hypercalls.rst | 138 (new file)
@@ -0,0 +1,138 @@
.. SPDX-License-Identifier: GPL-2.0

=======================
ARM Hypercall Interface
=======================

KVM handles the hypercall services as requested by the guests. New hypercall
services are regularly made available by the ARM specification or by KVM (as
vendor services) if they make sense from a virtualization point of view.

This means that a guest booted on two different versions of KVM can observe
two different "firmware" revisions. This could cause issues if a given guest
is tied to a particular version of a hypercall service, or if a migration
causes a different version to be exposed out of the blue to an unsuspecting
guest.

In order to remedy this situation, KVM exposes a set of "firmware
pseudo-registers" that can be manipulated using the GET/SET_ONE_REG
interface. These registers can be saved/restored by userspace, and set
to a convenient value as required.

The following registers are defined:

* KVM_REG_ARM_PSCI_VERSION:

  KVM implements the PSCI (Power State Coordination Interface)
  specification in order to provide services such as CPU on/off, reset
  and power-off to the guest.

  - Only valid if the vcpu has the KVM_ARM_VCPU_PSCI_0_2 feature set
    (and thus has already been initialized)
  - Returns the current PSCI version on GET_ONE_REG (defaulting to the
    highest PSCI version implemented by KVM and compatible with v0.2)
  - Allows any PSCI version implemented by KVM and compatible with
    v0.2 to be set with SET_ONE_REG
  - Affects the whole VM (even if the register view is per-vcpu)

* KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_1:
  Holds the state of the firmware support to mitigate CVE-2017-5715, as
  offered by KVM to the guest via a HVC call. The workaround is described
  under SMCCC_ARCH_WORKAROUND_1 in [1].

  Accepted values are:

  KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_1_NOT_AVAIL:
    KVM does not offer
    firmware support for the workaround. The mitigation status for the
    guest is unknown.
  KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_1_AVAIL:
    The workaround HVC call is
    available to the guest and required for the mitigation.
  KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_1_NOT_REQUIRED:
    The workaround HVC call
    is available to the guest, but it is not needed on this VCPU.

* KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_2:
  Holds the state of the firmware support to mitigate CVE-2018-3639, as
  offered by KVM to the guest via a HVC call. The workaround is described
  under SMCCC_ARCH_WORKAROUND_2 in [1]_.

  Accepted values are:

  KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_2_NOT_AVAIL:
    A workaround is not
    available. KVM does not offer firmware support for the workaround.
  KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_2_UNKNOWN:
    The workaround state is
    unknown. KVM does not offer firmware support for the workaround.
  KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_2_AVAIL:
    The workaround is available,
    and can be disabled by a vCPU. If
    KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_2_ENABLED is set, it is active for
    this vCPU.
  KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_2_NOT_REQUIRED:
    The workaround is always active on this vCPU or it is not needed.


Bitmap Feature Firmware Registers
---------------------------------

Contrary to the above registers, the following registers expose the
hypercall services in the form of a feature-bitmap to userspace. This
bitmap is translated to the services that are available to the guest.
There is a register defined per service call owner, and it can be accessed
via the GET/SET_ONE_REG interface.

By default, these registers are set with the upper limit of the features
that are supported. This way userspace can discover all the usable
hypercall services via GET_ONE_REG. Userspace can write the desired
bitmap back via SET_ONE_REG. The features for the registers that
are untouched, probably because userspace isn't aware of them, will be
exposed as is to the guest.

Note that KVM will not allow userspace to configure the registers
anymore once any of the vCPUs has run at least once. Instead, it will
return -EBUSY.

The pseudo-firmware bitmap registers are as follows:

* KVM_REG_ARM_STD_BMAP:
  Controls the bitmap of the ARM Standard Secure Service Calls.

  The following bits are accepted:

  Bit-0: KVM_REG_ARM_STD_BIT_TRNG_V1_0:
    The bit represents the services offered under v1.0 of ARM True Random
    Number Generator (TRNG) specification, ARM DEN0098.

* KVM_REG_ARM_STD_HYP_BMAP:
  Controls the bitmap of the ARM Standard Hypervisor Service Calls.

  The following bits are accepted:

  Bit-0: KVM_REG_ARM_STD_HYP_BIT_PV_TIME:
    The bit represents the Paravirtualized Time service as represented by
    ARM DEN0057A.

* KVM_REG_ARM_VENDOR_HYP_BMAP:
  Controls the bitmap of the Vendor specific Hypervisor Service Calls.

  The following bits are accepted:

  Bit-0: KVM_REG_ARM_VENDOR_HYP_BIT_FUNC_FEAT:
    The bit represents the ARM_SMCCC_VENDOR_HYP_KVM_FEATURES_FUNC_ID
    and ARM_SMCCC_VENDOR_HYP_CALL_UID_FUNC_ID function-ids.

  Bit-1: KVM_REG_ARM_VENDOR_HYP_BIT_PTP:
    The bit represents the Precision Time Protocol KVM service.

Errors:

  =======  =============================================================
  -ENOENT  Unknown register accessed.
  -EBUSY   Attempt a 'write' to the register after the VM has started.
  -EINVAL  Invalid bitmap written to the register.
  =======  =============================================================

.. [1] https://developer.arm.com/-/media/developer/pdf/ARM_DEN_0070A_Firmware_interfaces_for_mitigating_CVE-2017-5715.pdf
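Complementing the earlier discovery sketch, a hedged example of restricting one of these bitmaps: hiding the PTP service from the guest by clearing its bit in KVM_REG_ARM_VENDOR_HYP_BMAP before the first vCPU run (a later write would fail with -EBUSY, per the error table above)::

   #include <linux/kvm.h>
   #include <stdint.h>
   #include <sys/ioctl.h>

   static int hide_ptp_service(int vcpu_fd)
   {
           uint64_t bmap;
           struct kvm_one_reg reg = {
                   .id   = KVM_REG_ARM_VENDOR_HYP_BMAP,
                   .addr = (uint64_t)&bmap,
           };

           if (ioctl(vcpu_fd, KVM_GET_ONE_REG, &reg))
                   return -1;

           bmap &= ~(1ULL << KVM_REG_ARM_VENDOR_HYP_BIT_PTP);
           return ioctl(vcpu_fd, KVM_SET_ONE_REG, &reg);
   }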
@@ -8,6 +8,6 @@ ARM
   :maxdepth: 2

   hyp-abi
-  psci
+  hypercalls
   pvtime
   ptp_kvm

@@ -1,77 +0,0 @@
.. SPDX-License-Identifier: GPL-2.0

=========================================
Power State Coordination Interface (PSCI)
=========================================

KVM implements the PSCI (Power State Coordination Interface)
specification in order to provide services such as CPU on/off, reset
and power-off to the guest.

The PSCI specification is regularly updated to provide new features,
and KVM implements these updates if they make sense from a virtualization
point of view.

This means that a guest booted on two different versions of KVM can
observe two different "firmware" revisions. This could cause issues if
a given guest is tied to a particular PSCI revision (unlikely), or if
a migration causes a different PSCI version to be exposed out of the
blue to an unsuspecting guest.

In order to remedy this situation, KVM exposes a set of "firmware
pseudo-registers" that can be manipulated using the GET/SET_ONE_REG
interface. These registers can be saved/restored by userspace, and set
to a convenient value if required.

The following register is defined:

* KVM_REG_ARM_PSCI_VERSION:

  - Only valid if the vcpu has the KVM_ARM_VCPU_PSCI_0_2 feature set
    (and thus has already been initialized)
  - Returns the current PSCI version on GET_ONE_REG (defaulting to the
    highest PSCI version implemented by KVM and compatible with v0.2)
  - Allows any PSCI version implemented by KVM and compatible with
    v0.2 to be set with SET_ONE_REG
  - Affects the whole VM (even if the register view is per-vcpu)

* KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_1:
  Holds the state of the firmware support to mitigate CVE-2017-5715, as
  offered by KVM to the guest via a HVC call. The workaround is described
  under SMCCC_ARCH_WORKAROUND_1 in [1].

  Accepted values are:

  KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_1_NOT_AVAIL:
    KVM does not offer
    firmware support for the workaround. The mitigation status for the
    guest is unknown.
  KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_1_AVAIL:
    The workaround HVC call is
    available to the guest and required for the mitigation.
  KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_1_NOT_REQUIRED:
    The workaround HVC call
    is available to the guest, but it is not needed on this VCPU.

* KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_2:
  Holds the state of the firmware support to mitigate CVE-2018-3639, as
  offered by KVM to the guest via a HVC call. The workaround is described
  under SMCCC_ARCH_WORKAROUND_2 in [1]_.

  Accepted values are:

  KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_2_NOT_AVAIL:
    A workaround is not
    available. KVM does not offer firmware support for the workaround.
  KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_2_UNKNOWN:
    The workaround state is
    unknown. KVM does not offer firmware support for the workaround.
  KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_2_AVAIL:
    The workaround is available,
    and can be disabled by a vCPU. If
    KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_2_ENABLED is set, it is active for
    this vCPU.
  KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_2_NOT_REQUIRED:
    The workaround is always active on this vCPU or it is not needed.

.. [1] https://developer.arm.com/-/media/developer/pdf/ARM_DEN_0070A_Firmware_interfaces_for_mitigating_CVE-2017-5715.pdf

@@ -202,6 +202,10 @@ Shadow pages contain the following information:
    Is 1 if the MMU instance cannot use A/D bits. EPT did not have A/D
    bits before Haswell; shadow EPT page tables also cannot use A/D bits
    if the L1 hypervisor does not enable them.
  role.passthrough:
    The page is not backed by a guest page table, but its first entry
    points to one. This is set if NPT uses 5-level page tables (host
    CR4.LA57=1) and is shadowing L1's 4-level NPT (L1 CR4.LA57=1).
  gfn:
    Either the guest page table containing the translations shadowed by this
    page, or the base page frame for linear translations. See role.direct.

@@ -10830,6 +10830,8 @@ T: git git://github.com/kvm-riscv/linux.git
F:	arch/riscv/include/asm/kvm*
F:	arch/riscv/include/uapi/asm/kvm*
F:	arch/riscv/kvm/
F:	tools/testing/selftests/kvm/*/riscv/
F:	tools/testing/selftests/kvm/riscv/

KERNEL VIRTUAL MACHINE for s390 (KVM/s390)
M:	Christian Borntraeger <borntraeger@linux.ibm.com>

@@ -10844,9 +10846,12 @@ F: Documentation/virt/kvm/s390*
F:	arch/s390/include/asm/gmap.h
F:	arch/s390/include/asm/kvm*
F:	arch/s390/include/uapi/asm/kvm*
F:	arch/s390/include/uapi/asm/uvdevice.h
F:	arch/s390/kernel/uv.c
F:	arch/s390/kvm/
F:	arch/s390/mm/gmap.c
F:	drivers/s390/char/uvdevice.c
F:	tools/testing/selftests/drivers/s390x/uvdevice/
F:	tools/testing/selftests/kvm/*/s390x/
F:	tools/testing/selftests/kvm/s390x/

@@ -16,7 +16,11 @@

#define sev()		asm volatile("sev" : : : "memory")
#define wfe()		asm volatile("wfe" : : : "memory")
#define wfet(val)	asm volatile("msr s0_3_c1_c0_0, %0"	\
				     : : "r" (val) : "memory")
#define wfi()		asm volatile("wfi" : : : "memory")
#define wfit(val)	asm volatile("msr s0_3_c1_c0_1, %0"	\
				     : : "r" (val) : "memory")

#define isb()		asm volatile("isb" : : : "memory")
#define dmb(opt)	asm volatile("dmb " #opt : : : "memory")
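A hedged, kernel-context sketch (not part of the patch) of how the new wfet() wrapper might be used: wait for an event with an absolute CNTVCT_EL0 deadline when the CPU advertises ARM64_HAS_WFXT, and fall back to a plain wfe() otherwise::

   #include <linux/types.h>
   #include <asm/barrier.h>
   #include <asm/cpufeature.h>
   #include <asm/sysreg.h>

   static void wait_for_event_with_deadline(u64 timeout_cycles)
   {
           u64 deadline = read_sysreg(cntvct_el0) + timeout_cycles;

           if (cpus_have_final_cap(ARM64_HAS_WFXT))
                   wfet(deadline);  /* wakes on an event or once CNTVCT passes the deadline */
           else
                   wfe();           /* unbounded wait on CPUs without FEAT_WFxT */
   }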
@@ -118,6 +118,10 @@

#define APPLE_CPU_PART_M1_ICESTORM	0x022
#define APPLE_CPU_PART_M1_FIRESTORM	0x023
#define APPLE_CPU_PART_M1_ICESTORM_PRO	0x024
#define APPLE_CPU_PART_M1_FIRESTORM_PRO	0x025
#define APPLE_CPU_PART_M1_ICESTORM_MAX	0x028
#define APPLE_CPU_PART_M1_FIRESTORM_MAX	0x029

#define MIDR_CORTEX_A53 MIDR_CPU_MODEL(ARM_CPU_IMP_ARM, ARM_CPU_PART_CORTEX_A53)
#define MIDR_CORTEX_A57 MIDR_CPU_MODEL(ARM_CPU_IMP_ARM, ARM_CPU_PART_CORTEX_A57)

@@ -164,6 +168,10 @@
#define MIDR_HISI_TSV110 MIDR_CPU_MODEL(ARM_CPU_IMP_HISI, HISI_CPU_PART_TSV110)
#define MIDR_APPLE_M1_ICESTORM MIDR_CPU_MODEL(ARM_CPU_IMP_APPLE, APPLE_CPU_PART_M1_ICESTORM)
#define MIDR_APPLE_M1_FIRESTORM MIDR_CPU_MODEL(ARM_CPU_IMP_APPLE, APPLE_CPU_PART_M1_FIRESTORM)
#define MIDR_APPLE_M1_ICESTORM_PRO MIDR_CPU_MODEL(ARM_CPU_IMP_APPLE, APPLE_CPU_PART_M1_ICESTORM_PRO)
#define MIDR_APPLE_M1_FIRESTORM_PRO MIDR_CPU_MODEL(ARM_CPU_IMP_APPLE, APPLE_CPU_PART_M1_FIRESTORM_PRO)
#define MIDR_APPLE_M1_ICESTORM_MAX MIDR_CPU_MODEL(ARM_CPU_IMP_APPLE, APPLE_CPU_PART_M1_ICESTORM_MAX)
#define MIDR_APPLE_M1_FIRESTORM_MAX MIDR_CPU_MODEL(ARM_CPU_IMP_APPLE, APPLE_CPU_PART_M1_FIRESTORM_MAX)

/* Fujitsu Erratum 010001 affects A64FX 1.0 and 1.1, (v0r0 and v1r0) */
#define MIDR_FUJITSU_ERRATUM_010001		MIDR_FUJITSU_A64FX

@@ -135,7 +135,10 @@
#define ESR_ELx_CV		(UL(1) << 24)
#define ESR_ELx_COND_SHIFT	(20)
#define ESR_ELx_COND_MASK	(UL(0xF) << ESR_ELx_COND_SHIFT)
-#define ESR_ELx_WFx_ISS_TI	(UL(1) << 0)
+#define ESR_ELx_WFx_ISS_RN	(UL(0x1F) << 5)
+#define ESR_ELx_WFx_ISS_RV	(UL(1) << 2)
+#define ESR_ELx_WFx_ISS_TI	(UL(3) << 0)
+#define ESR_ELx_WFx_ISS_WFxT	(UL(2) << 0)
#define ESR_ELx_WFx_ISS_WFI	(UL(0) << 0)
#define ESR_ELx_WFx_ISS_WFE	(UL(1) << 0)
#define ESR_ELx_xVC_IMM_MASK	((UL(1) << 16) - 1)

@@ -148,7 +151,8 @@
#define DISR_EL1_ESR_MASK	(ESR_ELx_AET | ESR_ELx_EA | ESR_ELx_FSC)

/* ESR value templates for specific events */
-#define ESR_ELx_WFx_MASK	(ESR_ELx_EC_MASK | ESR_ELx_WFx_ISS_TI)
+#define ESR_ELx_WFx_MASK	(ESR_ELx_EC_MASK |			\
+				 (ESR_ELx_WFx_ISS_TI & ~ESR_ELx_WFx_ISS_WFxT))
#define ESR_ELx_WFx_WFI_VAL	((ESR_ELx_EC_WFx << ESR_ELx_EC_SHIFT) |	\
				 ESR_ELx_WFx_ISS_WFI)

@@ -117,6 +117,7 @@
#define KERNEL_HWCAP_SME_B16F32		__khwcap2_feature(SME_B16F32)
#define KERNEL_HWCAP_SME_F32F32		__khwcap2_feature(SME_F32F32)
#define KERNEL_HWCAP_SME_FA64		__khwcap2_feature(SME_FA64)
#define KERNEL_HWCAP_WFXT		__khwcap2_feature(WFXT)

/*
 * This yields a mask that user programs can use to figure out what

@@ -80,11 +80,12 @@
 * FMO:		Override CPSR.F and enable signaling with VF
 * SWIO:	Turn set/way invalidates into set/way clean+invalidate
 * PTW:		Take a stage2 fault if a stage1 walk steps in device memory
 * TID3:	Trap EL1 reads of group 3 ID registers
 */
#define HCR_GUEST_FLAGS (HCR_TSC | HCR_TSW | HCR_TWE | HCR_TWI | HCR_VM | \
			 HCR_BSU_IS | HCR_FB | HCR_TACR | \
			 HCR_AMO | HCR_SWIO | HCR_TIDCP | HCR_RW | HCR_TLOR | \
-			 HCR_FMO | HCR_IMO | HCR_PTW )
+			 HCR_FMO | HCR_IMO | HCR_PTW | HCR_TID3 )
#define HCR_VIRT_EXCP_MASK (HCR_VSE | HCR_VI | HCR_VF)
#define HCR_HOST_NVHE_FLAGS (HCR_RW | HCR_API | HCR_APK | HCR_ATA)
#define HCR_HOST_NVHE_PROTECTED_FLAGS (HCR_HOST_NVHE_FLAGS | HCR_TSC)

@@ -169,6 +169,7 @@ struct kvm_nvhe_init_params {
	unsigned long tcr_el2;
	unsigned long tpidr_el2;
	unsigned long stack_hyp_va;
	unsigned long stack_pa;
	phys_addr_t pgd_pa;
	unsigned long hcr_el2;
	unsigned long vttbr;

@@ -87,13 +87,6 @@ static inline void vcpu_reset_hcr(struct kvm_vcpu *vcpu)

	if (vcpu_el1_is_32bit(vcpu))
		vcpu->arch.hcr_el2 &= ~HCR_RW;
	else
		/*
		 * TID3: trap feature register accesses that we virtualise.
		 * For now this is conditional, since no AArch32 feature regs
		 * are currently virtualised.
		 */
		vcpu->arch.hcr_el2 |= HCR_TID3;

	if (cpus_have_const_cap(ARM64_MISMATCHED_CACHE_TYPE) ||
	    vcpu_el1_is_32bit(vcpu))
@@ -46,6 +46,7 @@
#define KVM_REQ_RECORD_STEAL	KVM_ARCH_REQ(3)
#define KVM_REQ_RELOAD_GICv4	KVM_ARCH_REQ(4)
#define KVM_REQ_RELOAD_PMU	KVM_ARCH_REQ(5)
#define KVM_REQ_SUSPEND		KVM_ARCH_REQ(6)

#define KVM_DIRTY_LOG_MANUAL_CAPS   (KVM_DIRTY_LOG_MANUAL_PROTECT_ENABLE | \
				     KVM_DIRTY_LOG_INITIALLY_SET)

@@ -101,15 +102,25 @@ struct kvm_s2_mmu {
struct kvm_arch_memory_slot {
};

/**
 * struct kvm_smccc_features: Descriptor of the hypercall services exposed to the guests
 *
 * @std_bmap: Bitmap of standard secure service calls
 * @std_hyp_bmap: Bitmap of standard hypervisor service calls
 * @vendor_hyp_bmap: Bitmap of vendor specific hypervisor service calls
 */
struct kvm_smccc_features {
	unsigned long std_bmap;
	unsigned long std_hyp_bmap;
	unsigned long vendor_hyp_bmap;
};

struct kvm_arch {
	struct kvm_s2_mmu mmu;

	/* VTCR_EL2 value for this VM */
	u64    vtcr;

	/* The maximum number of vCPUs depends on the used GIC model */
	int max_vcpus;

	/* Interrupt controller */
	struct vgic_dist	vgic;

@@ -136,6 +147,8 @@ struct kvm_arch {
	 */
#define KVM_ARCH_FLAG_REG_WIDTH_CONFIGURED		3
#define KVM_ARCH_FLAG_EL1_32BIT				4
	/* PSCI SYSTEM_SUSPEND enabled for the guest */
#define KVM_ARCH_FLAG_SYSTEM_SUSPEND_ENABLED		5

	unsigned long flags;

@@ -150,6 +163,9 @@ struct kvm_arch {

	u8 pfr0_csv2;
	u8 pfr0_csv3;

	/* Hypercall features firmware registers' descriptor */
	struct kvm_smccc_features smccc_feat;
};

struct kvm_vcpu_fault_info {

@@ -254,14 +270,8 @@ struct kvm_cpu_context {
	struct kvm_vcpu *__hyp_running_vcpu;
};

struct kvm_pmu_events {
	u32 events_host;
	u32 events_guest;
};

struct kvm_host_data {
	struct kvm_cpu_context host_ctxt;
	struct kvm_pmu_events pmu_events;
};

struct kvm_host_psci_config {

@@ -368,8 +378,8 @@ struct kvm_vcpu_arch {
		u32	mdscr_el1;
	} guest_debug_preserved;

-	/* vcpu power-off state */
-	bool power_off;
+	/* vcpu power state */
+	struct kvm_mp_state mp_state;

	/* Don't run the guest (internal implementation need) */
	bool pause;

@@ -455,6 +465,7 @@ struct kvm_vcpu_arch {
#define KVM_ARM64_FP_FOREIGN_FPSTATE	(1 << 14)
#define KVM_ARM64_ON_UNSUPPORTED_CPU	(1 << 15) /* Physical CPU not in supported_cpus */
#define KVM_ARM64_HOST_SME_ENABLED	(1 << 16) /* SME enabled for EL0 */
#define KVM_ARM64_WFIT			(1 << 17) /* WFIT instruction trapped */

#define KVM_GUESTDBG_VALID_MASK (KVM_GUESTDBG_ENABLE | \
				 KVM_GUESTDBG_USE_SW_BP | \

@@ -687,10 +698,11 @@ int kvm_handle_cp14_64(struct kvm_vcpu *vcpu);
int kvm_handle_cp15_32(struct kvm_vcpu *vcpu);
int kvm_handle_cp15_64(struct kvm_vcpu *vcpu);
int kvm_handle_sys_reg(struct kvm_vcpu *vcpu);
int kvm_handle_cp10_id(struct kvm_vcpu *vcpu);

void kvm_reset_sys_regs(struct kvm_vcpu *vcpu);

-void kvm_sys_reg_table_init(void);
+int kvm_sys_reg_table_init(void);

/* MMIO helpers */
void kvm_mmio_write_buf(void *buf, unsigned int len, unsigned long data);

@@ -799,9 +811,6 @@ void kvm_arch_vcpu_put_debug_state_flags(struct kvm_vcpu *vcpu);
#ifdef CONFIG_KVM
void kvm_set_pmu_events(u32 set, struct perf_event_attr *attr);
void kvm_clr_pmu_events(u32 clr);

void kvm_vcpu_pmu_restore_guest(struct kvm_vcpu *vcpu);
void kvm_vcpu_pmu_restore_host(struct kvm_vcpu *vcpu);
#else
static inline void kvm_set_pmu_events(u32 set, struct perf_event_attr *attr) {}
static inline void kvm_clr_pmu_events(u32 clr) {}

@@ -833,8 +842,6 @@ bool kvm_arm_vcpu_is_finalized(struct kvm_vcpu *vcpu);
#define kvm_has_mte(kvm)					\
	(system_supports_mte() &&				\
	 test_bit(KVM_ARCH_FLAG_MTE_ENABLED, &(kvm)->arch.flags))
#define kvm_vcpu_has_pmu(vcpu)					\
	(test_bit(KVM_ARM_VCPU_PMU_V3, (vcpu)->arch.features))

int kvm_trng_call(struct kvm_vcpu *vcpu);
#ifdef CONFIG_KVM

@@ -845,4 +852,7 @@ void __init kvm_hyp_reserve(void);
static inline void kvm_hyp_reserve(void) { }
#endif

void kvm_arm_vcpu_power_off(struct kvm_vcpu *vcpu);
bool kvm_arm_vcpu_stopped(struct kvm_vcpu *vcpu);

#endif /* __ARM64_KVM_HOST_H__ */

@@ -154,6 +154,9 @@ static __always_inline unsigned long __kern_hyp_va(unsigned long v)
int kvm_share_hyp(void *from, void *to);
void kvm_unshare_hyp(void *from, void *to);
int create_hyp_mappings(void *from, void *to, enum kvm_pgtable_prot prot);
int __create_hyp_mappings(unsigned long start, unsigned long size,
			  unsigned long phys, enum kvm_pgtable_prot prot);
int hyp_alloc_private_va_range(size_t size, unsigned long *haddr);
int create_hyp_io_mappings(phys_addr_t phys_addr, size_t size,
			   void __iomem **kaddr,
			   void __iomem **haddr);

@@ -87,5 +87,6 @@
#define HWCAP2_SME_B16F32	(1 << 28)
#define HWCAP2_SME_F32F32	(1 << 29)
#define HWCAP2_SME_FA64		(1 << 30)
#define HWCAP2_WFXT		(1UL << 31)

#endif /* _UAPI__ASM_HWCAP_H */
@@ -334,6 +334,40 @@ struct kvm_arm_copy_mte_tags {
#define KVM_ARM64_SVE_VLS_WORDS	\
	((KVM_ARM64_SVE_VQ_MAX - KVM_ARM64_SVE_VQ_MIN) / 64 + 1)

/* Bitmap feature firmware registers */
#define KVM_REG_ARM_FW_FEAT_BMAP		(0x0016 << KVM_REG_ARM_COPROC_SHIFT)
#define KVM_REG_ARM_FW_FEAT_BMAP_REG(r)		(KVM_REG_ARM64 | KVM_REG_SIZE_U64 | \
						KVM_REG_ARM_FW_FEAT_BMAP |	   \
						((r) & 0xffff))

#define KVM_REG_ARM_STD_BMAP			KVM_REG_ARM_FW_FEAT_BMAP_REG(0)

enum {
	KVM_REG_ARM_STD_BIT_TRNG_V1_0	= 0,
#ifdef __KERNEL__
	KVM_REG_ARM_STD_BMAP_BIT_COUNT,
#endif
};

#define KVM_REG_ARM_STD_HYP_BMAP		KVM_REG_ARM_FW_FEAT_BMAP_REG(1)

enum {
	KVM_REG_ARM_STD_HYP_BIT_PV_TIME	= 0,
#ifdef __KERNEL__
	KVM_REG_ARM_STD_HYP_BMAP_BIT_COUNT,
#endif
};

#define KVM_REG_ARM_VENDOR_HYP_BMAP		KVM_REG_ARM_FW_FEAT_BMAP_REG(2)

enum {
	KVM_REG_ARM_VENDOR_HYP_BIT_FUNC_FEAT	= 0,
	KVM_REG_ARM_VENDOR_HYP_BIT_PTP		= 1,
#ifdef __KERNEL__
	KVM_REG_ARM_VENDOR_HYP_BMAP_BIT_COUNT,
#endif
};

/* Device Control API: ARM VGIC */
#define KVM_DEV_ARM_VGIC_GRP_ADDR	0
#define KVM_DEV_ARM_VGIC_GRP_DIST_REGS	1

@@ -237,6 +237,7 @@ static const struct arm64_ftr_bits ftr_id_aa64isar2[] = {
	ARM64_FTR_BITS(FTR_VISIBLE_IF_IS_ENABLED(CONFIG_ARM64_PTR_AUTH),
		       FTR_STRICT, FTR_LOWER_SAFE, ID_AA64ISAR2_GPA3_SHIFT, 4, 0),
	ARM64_FTR_BITS(FTR_VISIBLE, FTR_NONSTRICT, FTR_LOWER_SAFE, ID_AA64ISAR2_RPRES_SHIFT, 4, 0),
	ARM64_FTR_BITS(FTR_VISIBLE, FTR_NONSTRICT, FTR_LOWER_SAFE, ID_AA64ISAR2_WFXT_SHIFT, 4, 0),
	ARM64_FTR_END,
};

@@ -2517,6 +2518,17 @@ static const struct arm64_cpu_capabilities arm64_features[] = {
		.cpu_enable = fa64_kernel_enable,
	},
#endif /* CONFIG_ARM64_SME */
	{
		.desc = "WFx with timeout",
		.capability = ARM64_HAS_WFXT,
		.type = ARM64_CPUCAP_SYSTEM_FEATURE,
		.sys_reg = SYS_ID_AA64ISAR2_EL1,
		.sign = FTR_UNSIGNED,
		.field_pos = ID_AA64ISAR2_WFXT_SHIFT,
		.field_width = 4,
		.matches = has_cpuid_feature,
		.min_field_value = ID_AA64ISAR2_WFXT_SUPPORTED,
	},
	{},
};

@@ -2650,6 +2662,7 @@ static const struct arm64_cpu_capabilities arm64_elf_hwcaps[] = {
	HWCAP_CAP(SYS_ID_AA64MMFR0_EL1, ID_AA64MMFR0_ECV_SHIFT, 4, FTR_UNSIGNED, 1, CAP_HWCAP, KERNEL_HWCAP_ECV),
	HWCAP_CAP(SYS_ID_AA64MMFR1_EL1, ID_AA64MMFR1_AFP_SHIFT, 4, FTR_UNSIGNED, 1, CAP_HWCAP, KERNEL_HWCAP_AFP),
	HWCAP_CAP(SYS_ID_AA64ISAR2_EL1, ID_AA64ISAR2_RPRES_SHIFT, 4, FTR_UNSIGNED, 1, CAP_HWCAP, KERNEL_HWCAP_RPRES),
	HWCAP_CAP(SYS_ID_AA64ISAR2_EL1, ID_AA64ISAR2_WFXT_SHIFT, 4, FTR_UNSIGNED, ID_AA64ISAR2_WFXT_SUPPORTED, CAP_HWCAP, KERNEL_HWCAP_WFXT),
#ifdef CONFIG_ARM64_SME
	HWCAP_CAP(SYS_ID_AA64PFR1_EL1, ID_AA64PFR1_SME_SHIFT, 4, FTR_UNSIGNED, ID_AA64PFR1_SME, CAP_HWCAP, KERNEL_HWCAP_SME),
	HWCAP_CAP(SYS_ID_AA64SMFR0_EL1, ID_AA64SMFR0_FA64_SHIFT, 1, FTR_UNSIGNED, ID_AA64SMFR0_FA64, CAP_HWCAP, KERNEL_HWCAP_SME_FA64),

@@ -106,6 +106,7 @@ static const char *const hwcap_str[] = {
	[KERNEL_HWCAP_SME_B16F32]	= "smeb16f32",
	[KERNEL_HWCAP_SME_F32F32]	= "smef32f32",
	[KERNEL_HWCAP_SME_FA64]		= "smefa64",
	[KERNEL_HWCAP_WFXT]		= "wfxt",
};

#ifdef CONFIG_COMPAT

@@ -13,7 +13,7 @@ obj-$(CONFIG_KVM) += hyp/
kvm-y += arm.o mmu.o mmio.o psci.o hypercalls.o pvtime.o \
	 inject_fault.o va_layout.o handle_exit.o \
	 guest.o debug.o reset.o sys_regs.o \
-	 vgic-sys-reg-v3.o fpsimd.o pmu.o pkvm.o \
+	 vgic-sys-reg-v3.o fpsimd.o pkvm.o \
	 arch_timer.o trng.o vmid.o \
	 vgic/vgic.o vgic/vgic-init.o \
	 vgic/vgic-irqfd.o vgic/vgic-v2.o \

@@ -22,7 +22,7 @@ kvm-y += arm.o mmu.o mmio.o psci.o hypercalls.o pvtime.o \
	 vgic/vgic-mmio-v3.o vgic/vgic-kvm-device.o \
	 vgic/vgic-its.o vgic/vgic-debug.o

-kvm-$(CONFIG_HW_PERF_EVENTS)  += pmu-emul.o
+kvm-$(CONFIG_HW_PERF_EVENTS)  += pmu-emul.o pmu.o

always-y := hyp_constants.h hyp-constants.s
@ -208,18 +208,16 @@ static irqreturn_t kvm_arch_timer_handler(int irq, void *dev_id)
|
||||
return IRQ_HANDLED;
|
||||
}
|
||||
|
||||
static u64 kvm_timer_compute_delta(struct arch_timer_context *timer_ctx)
|
||||
static u64 kvm_counter_compute_delta(struct arch_timer_context *timer_ctx,
|
||||
u64 val)
|
||||
{
|
||||
u64 cval, now;
|
||||
u64 now = kvm_phys_timer_read() - timer_get_offset(timer_ctx);
|
||||
|
||||
cval = timer_get_cval(timer_ctx);
|
||||
now = kvm_phys_timer_read() - timer_get_offset(timer_ctx);
|
||||
|
||||
if (now < cval) {
|
||||
if (now < val) {
|
||||
u64 ns;
|
||||
|
||||
ns = cyclecounter_cyc2ns(timecounter->cc,
|
||||
cval - now,
|
||||
val - now,
|
||||
timecounter->mask,
|
||||
&timecounter->frac);
|
||||
return ns;
|
||||
@ -228,6 +226,11 @@ static u64 kvm_timer_compute_delta(struct arch_timer_context *timer_ctx)
|
||||
return 0;
|
||||
}
|
||||
|
||||
static u64 kvm_timer_compute_delta(struct arch_timer_context *timer_ctx)
|
||||
{
|
||||
return kvm_counter_compute_delta(timer_ctx, timer_get_cval(timer_ctx));
|
||||
}
|
||||
|
||||
static bool kvm_timer_irq_can_fire(struct arch_timer_context *timer_ctx)
|
||||
{
|
||||
WARN_ON(timer_ctx && timer_ctx->loaded);
|
||||
@ -236,6 +239,20 @@ static bool kvm_timer_irq_can_fire(struct arch_timer_context *timer_ctx)
|
||||
(ARCH_TIMER_CTRL_IT_MASK | ARCH_TIMER_CTRL_ENABLE)) == ARCH_TIMER_CTRL_ENABLE);
|
||||
}
|
||||
|
||||
static bool vcpu_has_wfit_active(struct kvm_vcpu *vcpu)
|
||||
{
|
||||
return (cpus_have_final_cap(ARM64_HAS_WFXT) &&
|
||||
(vcpu->arch.flags & KVM_ARM64_WFIT));
|
||||
}
|
||||
|
||||
static u64 wfit_delay_ns(struct kvm_vcpu *vcpu)
|
||||
{
|
||||
struct arch_timer_context *ctx = vcpu_vtimer(vcpu);
|
||||
u64 val = vcpu_get_reg(vcpu, kvm_vcpu_sys_get_rt(vcpu));
|
||||
|
||||
return kvm_counter_compute_delta(ctx, val);
|
||||
}
|
||||
|
||||
/*
|
||||
* Returns the earliest expiration time in ns among guest timers.
|
||||
* Note that it will return 0 if none of timers can fire.
|
||||
@ -253,6 +270,9 @@ static u64 kvm_timer_earliest_exp(struct kvm_vcpu *vcpu)
|
||||
min_delta = min(min_delta, kvm_timer_compute_delta(ctx));
|
||||
}
|
||||
|
||||
if (vcpu_has_wfit_active(vcpu))
|
||||
min_delta = min(min_delta, wfit_delay_ns(vcpu));
|
||||
|
||||
/* If none of timers can fire, then return 0 */
|
||||
if (min_delta == ULLONG_MAX)
|
||||
return 0;
|
||||
@ -350,15 +370,9 @@ static bool kvm_timer_should_fire(struct arch_timer_context *timer_ctx)
|
||||
return cval <= now;
|
||||
}
|
||||
|
||||
bool kvm_timer_is_pending(struct kvm_vcpu *vcpu)
|
||||
int kvm_cpu_has_pending_timer(struct kvm_vcpu *vcpu)
|
||||
{
|
||||
struct timer_map map;
|
||||
|
||||
get_timer_map(vcpu, &map);
|
||||
|
||||
return kvm_timer_should_fire(map.direct_vtimer) ||
|
||||
kvm_timer_should_fire(map.direct_ptimer) ||
|
||||
kvm_timer_should_fire(map.emul_ptimer);
|
||||
return vcpu_has_wfit_active(vcpu) && wfit_delay_ns(vcpu) == 0;
|
||||
}
|
||||
|
||||
/*
|
||||
@ -484,7 +498,8 @@ static void kvm_timer_blocking(struct kvm_vcpu *vcpu)
|
||||
*/
|
||||
if (!kvm_timer_irq_can_fire(map.direct_vtimer) &&
|
||||
!kvm_timer_irq_can_fire(map.direct_ptimer) &&
|
||||
!kvm_timer_irq_can_fire(map.emul_ptimer))
|
||||
!kvm_timer_irq_can_fire(map.emul_ptimer) &&
|
||||
!vcpu_has_wfit_active(vcpu))
|
||||
return;
|
||||
|
||||
/*
|
||||
|
@ -97,6 +97,10 @@ int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
|
||||
}
|
||||
mutex_unlock(&kvm->lock);
|
||||
break;
|
||||
case KVM_CAP_ARM_SYSTEM_SUSPEND:
|
||||
r = 0;
|
||||
set_bit(KVM_ARCH_FLAG_SYSTEM_SUSPEND_ENABLED, &kvm->arch.flags);
|
||||
break;
|
||||
default:
|
||||
r = -EINVAL;
|
||||
break;
|
||||
@ -153,9 +157,10 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
|
||||
kvm_vgic_early_init(kvm);
|
||||
|
||||
/* The maximum number of VCPUs is limited by the host's GIC model */
|
||||
kvm->arch.max_vcpus = kvm_arm_default_max_vcpus();
|
||||
kvm->max_vcpus = kvm_arm_default_max_vcpus();
|
||||
|
||||
set_default_spectre(kvm);
|
||||
kvm_arm_init_hypercalls(kvm);
|
||||
|
||||
return ret;
|
||||
out_free_stage2_pgd:
|
||||
@ -210,6 +215,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
|
||||
case KVM_CAP_SET_GUEST_DEBUG:
|
||||
case KVM_CAP_VCPU_ATTRIBUTES:
|
||||
case KVM_CAP_PTP_KVM:
|
||||
case KVM_CAP_ARM_SYSTEM_SUSPEND:
|
||||
r = 1;
|
||||
break;
|
||||
case KVM_CAP_SET_GUEST_DEBUG2:
|
||||
@ -230,7 +236,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
|
||||
case KVM_CAP_MAX_VCPUS:
|
||||
case KVM_CAP_MAX_VCPU_ID:
|
||||
if (kvm)
|
||||
r = kvm->arch.max_vcpus;
|
||||
r = kvm->max_vcpus;
|
||||
else
|
||||
r = kvm_arm_default_max_vcpus();
|
||||
break;
|
||||
@ -306,7 +312,7 @@ int kvm_arch_vcpu_precreate(struct kvm *kvm, unsigned int id)
|
||||
if (irqchip_in_kernel(kvm) && vgic_initialized(kvm))
|
||||
return -EBUSY;
|
||||
|
||||
if (id >= kvm->arch.max_vcpus)
|
||||
if (id >= kvm->max_vcpus)
|
||||
return -EINVAL;
|
||||
|
||||
return 0;
|
||||
@ -356,11 +362,6 @@ void kvm_arch_vcpu_destroy(struct kvm_vcpu *vcpu)
|
||||
kvm_arm_vcpu_destroy(vcpu);
|
||||
}
|
||||
|
||||
int kvm_cpu_has_pending_timer(struct kvm_vcpu *vcpu)
|
||||
{
|
||||
return kvm_timer_is_pending(vcpu);
|
||||
}
|
||||
|
||||
void kvm_arch_vcpu_blocking(struct kvm_vcpu *vcpu)
|
||||
{
|
||||
|
||||
@ -432,20 +433,34 @@ void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
|
||||
vcpu->cpu = -1;
|
||||
}
|
||||
|
||||
static void vcpu_power_off(struct kvm_vcpu *vcpu)
|
||||
void kvm_arm_vcpu_power_off(struct kvm_vcpu *vcpu)
|
||||
{
|
||||
vcpu->arch.power_off = true;
|
||||
vcpu->arch.mp_state.mp_state = KVM_MP_STATE_STOPPED;
|
||||
kvm_make_request(KVM_REQ_SLEEP, vcpu);
|
||||
kvm_vcpu_kick(vcpu);
|
||||
}
|
||||
|
||||
bool kvm_arm_vcpu_stopped(struct kvm_vcpu *vcpu)
|
||||
{
|
||||
return vcpu->arch.mp_state.mp_state == KVM_MP_STATE_STOPPED;
|
||||
}
|
||||
|
||||
static void kvm_arm_vcpu_suspend(struct kvm_vcpu *vcpu)
|
||||
{
|
||||
vcpu->arch.mp_state.mp_state = KVM_MP_STATE_SUSPENDED;
|
||||
kvm_make_request(KVM_REQ_SUSPEND, vcpu);
|
||||
kvm_vcpu_kick(vcpu);
|
||||
}
|
||||
|
||||
static bool kvm_arm_vcpu_suspended(struct kvm_vcpu *vcpu)
|
||||
{
|
||||
return vcpu->arch.mp_state.mp_state == KVM_MP_STATE_SUSPENDED;
|
||||
}
|
||||
|
||||
int kvm_arch_vcpu_ioctl_get_mpstate(struct kvm_vcpu *vcpu,
|
||||
struct kvm_mp_state *mp_state)
|
||||
{
|
||||
if (vcpu->arch.power_off)
|
||||
mp_state->mp_state = KVM_MP_STATE_STOPPED;
|
||||
else
|
||||
mp_state->mp_state = KVM_MP_STATE_RUNNABLE;
|
||||
*mp_state = vcpu->arch.mp_state;
|
||||
|
||||
return 0;
|
||||
}
|
||||
@ -457,10 +472,13 @@ int kvm_arch_vcpu_ioctl_set_mpstate(struct kvm_vcpu *vcpu,
|
||||
|
||||
switch (mp_state->mp_state) {
|
||||
case KVM_MP_STATE_RUNNABLE:
|
||||
vcpu->arch.power_off = false;
|
||||
vcpu->arch.mp_state = *mp_state;
|
||||
break;
|
||||
case KVM_MP_STATE_STOPPED:
|
||||
vcpu_power_off(vcpu);
|
||||
kvm_arm_vcpu_power_off(vcpu);
|
||||
break;
|
||||
case KVM_MP_STATE_SUSPENDED:
|
||||
kvm_arm_vcpu_suspend(vcpu);
|
||||
break;
|
||||
default:
|
||||
ret = -EINVAL;
|
||||
@ -480,7 +498,7 @@ int kvm_arch_vcpu_runnable(struct kvm_vcpu *v)
|
||||
{
|
||||
bool irq_lines = *vcpu_hcr(v) & (HCR_VI | HCR_VF);
|
||||
return ((irq_lines || kvm_vgic_vcpu_pending_irq(v))
|
||||
&& !v->arch.power_off && !v->arch.pause);
|
||||
&& !kvm_arm_vcpu_stopped(v) && !v->arch.pause);
|
||||
}
|
||||
|
||||
bool kvm_arch_vcpu_in_kernel(struct kvm_vcpu *vcpu)
|
||||
@ -592,15 +610,15 @@ void kvm_arm_resume_guest(struct kvm *kvm)
|
||||
}
|
||||
}
|
||||
|
||||
static void vcpu_req_sleep(struct kvm_vcpu *vcpu)
|
||||
static void kvm_vcpu_sleep(struct kvm_vcpu *vcpu)
|
||||
{
|
||||
struct rcuwait *wait = kvm_arch_vcpu_get_wait(vcpu);
|
||||
|
||||
rcuwait_wait_event(wait,
|
||||
(!vcpu->arch.power_off) &&(!vcpu->arch.pause),
|
||||
(!kvm_arm_vcpu_stopped(vcpu)) && (!vcpu->arch.pause),
|
||||
TASK_INTERRUPTIBLE);
|
||||
|
||||
if (vcpu->arch.power_off || vcpu->arch.pause) {
|
||||
if (kvm_arm_vcpu_stopped(vcpu) || vcpu->arch.pause) {
|
||||
/* Awaken to handle a signal, request we sleep again later. */
|
||||
kvm_make_request(KVM_REQ_SLEEP, vcpu);
|
||||
}
|
||||
@@ -639,6 +657,7 @@ void kvm_vcpu_wfi(struct kvm_vcpu *vcpu)
preempt_enable();

kvm_vcpu_halt(vcpu);
vcpu->arch.flags &= ~KVM_ARM64_WFIT;
kvm_clear_request(KVM_REQ_UNHALT, vcpu);

preempt_disable();
@@ -646,11 +665,53 @@ void kvm_vcpu_wfi(struct kvm_vcpu *vcpu)
preempt_enable();
}

static void check_vcpu_requests(struct kvm_vcpu *vcpu)
static int kvm_vcpu_suspend(struct kvm_vcpu *vcpu)
{
if (!kvm_arm_vcpu_suspended(vcpu))
return 1;

kvm_vcpu_wfi(vcpu);

/*
* The suspend state is sticky; we do not leave it until userspace
* explicitly marks the vCPU as runnable. Request that we suspend again
* later.
*/
kvm_make_request(KVM_REQ_SUSPEND, vcpu);

/*
* Check to make sure the vCPU is actually runnable. If so, exit to
* userspace informing it of the wakeup condition.
*/
if (kvm_arch_vcpu_runnable(vcpu)) {
memset(&vcpu->run->system_event, 0, sizeof(vcpu->run->system_event));
vcpu->run->system_event.type = KVM_SYSTEM_EVENT_WAKEUP;
vcpu->run->exit_reason = KVM_EXIT_SYSTEM_EVENT;
return 0;
}

/*
* Otherwise, we were unblocked to process a different event, such as a
* pending signal. Return 1 and allow kvm_arch_vcpu_ioctl_run() to
* process the event.
*/
return 1;
}
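
On the userspace side of this exit the VMM sees KVM_EXIT_SYSTEM_EVENT with the new SUSPEND/WAKEUP types and is expected to mark the vCPU runnable again before re-entering the guest. A rough sketch of that handling, with the surrounding VMM run loop assumed rather than taken from the series:

/*
 * Assumed VMM structure: dispatch the system-event exits introduced above.
 * On WAKEUP, flip the vCPU back to RUNNABLE before the next KVM_RUN.
 */
#include <linux/kvm.h>
#include <sys/ioctl.h>

static int handle_system_event(int vcpu_fd, struct kvm_run *run)
{
    struct kvm_mp_state mp;

    switch (run->system_event.type) {
    case KVM_SYSTEM_EVENT_SUSPEND:
        /* PSCI SYSTEM_SUSPEND: park the VM until a wakeup condition. */
        return 0;
    case KVM_SYSTEM_EVENT_WAKEUP:
        /* A wakeup condition fired while the vCPU was suspended. */
        mp.mp_state = KVM_MP_STATE_RUNNABLE;
        return ioctl(vcpu_fd, KVM_SET_MP_STATE, &mp);
    default:
        return -1;
    }
}
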
|
||||
|
||||
/**
|
||||
* check_vcpu_requests - check and handle pending vCPU requests
|
||||
* @vcpu: the VCPU pointer
|
||||
*
|
||||
* Return: 1 if we should enter the guest
|
||||
* 0 if we should exit to userspace
|
||||
* < 0 if we should exit to userspace, where the return value indicates
|
||||
* an error
|
||||
*/
|
||||
static int check_vcpu_requests(struct kvm_vcpu *vcpu)
|
||||
{
|
||||
if (kvm_request_pending(vcpu)) {
|
||||
if (kvm_check_request(KVM_REQ_SLEEP, vcpu))
|
||||
vcpu_req_sleep(vcpu);
|
||||
kvm_vcpu_sleep(vcpu);
|
||||
|
||||
if (kvm_check_request(KVM_REQ_VCPU_RESET, vcpu))
|
||||
kvm_reset_vcpu(vcpu);
|
||||
@ -675,7 +736,12 @@ static void check_vcpu_requests(struct kvm_vcpu *vcpu)
|
||||
if (kvm_check_request(KVM_REQ_RELOAD_PMU, vcpu))
|
||||
kvm_pmu_handle_pmcr(vcpu,
|
||||
__vcpu_sys_reg(vcpu, PMCR_EL0));
|
||||
|
||||
if (kvm_check_request(KVM_REQ_SUSPEND, vcpu))
|
||||
return kvm_vcpu_suspend(vcpu);
|
||||
}
|
||||
|
||||
return 1;
|
||||
}
|
||||
|
||||
static bool vcpu_mode_is_bad_32bit(struct kvm_vcpu *vcpu)
|
||||
@ -792,7 +858,8 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu)
|
||||
if (!ret)
|
||||
ret = 1;
|
||||
|
||||
check_vcpu_requests(vcpu);
|
||||
if (ret > 0)
|
||||
ret = check_vcpu_requests(vcpu);
|
||||
|
||||
/*
|
||||
* Preparing the interrupts to be injected also
|
||||
@ -816,6 +883,8 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu)
|
||||
|
||||
kvm_vgic_flush_hwstate(vcpu);
|
||||
|
||||
kvm_pmu_update_vcpu_events(vcpu);
|
||||
|
||||
/*
|
||||
* Ensure we set mode to IN_GUEST_MODE after we disable
|
||||
* interrupts and before the final VCPU requests check.
|
||||
@ -1125,9 +1194,9 @@ static int kvm_arch_vcpu_ioctl_vcpu_init(struct kvm_vcpu *vcpu,
|
||||
* Handle the "start in power-off" case.
|
||||
*/
|
||||
if (test_bit(KVM_ARM_VCPU_POWER_OFF, vcpu->arch.features))
|
||||
vcpu_power_off(vcpu);
|
||||
kvm_arm_vcpu_power_off(vcpu);
|
||||
else
|
||||
vcpu->arch.power_off = false;
|
||||
vcpu->arch.mp_state.mp_state = KVM_MP_STATE_RUNNABLE;
|
||||
|
||||
return 0;
|
||||
}
|
||||
@ -1485,7 +1554,6 @@ static void cpu_prepare_hyp_mode(int cpu)
|
||||
tcr |= (idmap_t0sz & GENMASK(TCR_TxSZ_WIDTH - 1, 0)) << TCR_T0SZ_OFFSET;
|
||||
params->tcr_el2 = tcr;
|
||||
|
||||
params->stack_hyp_va = kern_hyp_va(per_cpu(kvm_arm_hyp_stack_page, cpu) + PAGE_SIZE);
|
||||
params->pgd_pa = kvm_mmu_get_httbr();
|
||||
if (is_protected_kvm_enabled())
|
||||
params->hcr_el2 = HCR_HOST_NVHE_PROTECTED_FLAGS;
|
||||
@ -1763,8 +1831,6 @@ static int init_subsystems(void)
|
||||
|
||||
kvm_register_perf_callbacks(NULL);
|
||||
|
||||
kvm_sys_reg_table_init();
|
||||
|
||||
out:
|
||||
if (err || !is_protected_kvm_enabled())
|
||||
on_each_cpu(_kvm_arch_hardware_disable, NULL, 1);
|
||||
@@ -1935,14 +2001,46 @@ static int init_hyp_mode(void)
* Map the Hyp stack pages
*/
for_each_possible_cpu(cpu) {
struct kvm_nvhe_init_params *params = per_cpu_ptr_nvhe_sym(kvm_init_params, cpu);
char *stack_page = (char *)per_cpu(kvm_arm_hyp_stack_page, cpu);
err = create_hyp_mappings(stack_page, stack_page + PAGE_SIZE,
PAGE_HYP);
unsigned long hyp_addr;

/*
* Allocate a contiguous HYP private VA range for the stack
* and guard page. The allocation is also aligned based on
* the order of its size.
*/
err = hyp_alloc_private_va_range(PAGE_SIZE * 2, &hyp_addr);
if (err) {
kvm_err("Cannot allocate hyp stack guard page\n");
goto out_err;
}

/*
* Since the stack grows downwards, map the stack to the page
* at the higher address and leave the lower guard page
* unbacked.
*
* Any valid stack address now has the PAGE_SHIFT bit as 1
* and addresses corresponding to the guard page have the
* PAGE_SHIFT bit as 0 - this is used for overflow detection.
*/
err = __create_hyp_mappings(hyp_addr + PAGE_SIZE, PAGE_SIZE,
__pa(stack_page), PAGE_HYP);
if (err) {
kvm_err("Cannot map hyp stack\n");
goto out_err;
}

/*
* Save the stack PA in nvhe_init_params. This will be needed
* to recreate the stack mapping in protected nVHE mode.
* __hyp_pa() won't do the right thing there, since the stack
* has been mapped in the flexible private VA space.
*/
params->stack_pa = __pa(stack_page);

params->stack_hyp_va = hyp_addr + (2 * PAGE_SIZE);
}
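
A worked example of the overflow check this layout enables, assuming 4KiB pages (PAGE_SHIFT == 12) and a made-up, 8KiB-aligned hyp_addr purely for illustration:

/*
 * hyp_addr is order-1 aligned, so the unbacked guard page
 * [hyp_addr, hyp_addr + 4K) has bit 12 clear, while every address inside
 * the mapped stack page [hyp_addr + 4K, hyp_addr + 8K) has bit 12 set.
 */
#include <stdio.h>

#define PAGE_SHIFT 12UL
#define PAGE_SIZE  (1UL << PAGE_SHIFT)

static int sp_overflowed(unsigned long sp)
{
    /* Mirrors the "tbz x0, #PAGE_SHIFT" test in the hyp entry code. */
    return !(sp & (1UL << PAGE_SHIFT));
}

int main(void)
{
    unsigned long hyp_addr = 0xffff800008a00000UL; /* made-up, 8KiB aligned */
    unsigned long stack_top = hyp_addr + 2 * PAGE_SIZE;

    printf("%d\n", sp_overflowed(stack_top - 16));            /* 0: still on the stack */
    printf("%d\n", sp_overflowed(hyp_addr + PAGE_SIZE - 16));  /* 1: fell into the guard page */
    return 0;
}
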
|
||||
|
||||
for_each_possible_cpu(cpu) {
|
||||
@ -2091,6 +2189,12 @@ int kvm_arch_init(void *opaque)
|
||||
return -ENODEV;
|
||||
}
|
||||
|
||||
err = kvm_sys_reg_table_init();
|
||||
if (err) {
|
||||
kvm_info("Error initializing system register tables");
|
||||
return err;
|
||||
}
|
||||
|
||||
in_hyp_mode = is_kernel_in_hyp_mode();
|
||||
|
||||
if (cpus_have_final_cap(ARM64_WORKAROUND_DEVICE_LOAD_ACQUIRE) ||
|
||||
|
@ -18,7 +18,7 @@
|
||||
#include <linux/string.h>
|
||||
#include <linux/vmalloc.h>
|
||||
#include <linux/fs.h>
|
||||
#include <kvm/arm_psci.h>
|
||||
#include <kvm/arm_hypercalls.h>
|
||||
#include <asm/cputype.h>
|
||||
#include <linux/uaccess.h>
|
||||
#include <asm/fpsimd.h>
|
||||
@ -756,7 +756,9 @@ int kvm_arm_get_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg)
|
||||
|
||||
switch (reg->id & KVM_REG_ARM_COPROC_MASK) {
|
||||
case KVM_REG_ARM_CORE: return get_core_reg(vcpu, reg);
|
||||
case KVM_REG_ARM_FW: return kvm_arm_get_fw_reg(vcpu, reg);
|
||||
case KVM_REG_ARM_FW:
|
||||
case KVM_REG_ARM_FW_FEAT_BMAP:
|
||||
return kvm_arm_get_fw_reg(vcpu, reg);
|
||||
case KVM_REG_ARM64_SVE: return get_sve_reg(vcpu, reg);
|
||||
}
|
||||
|
||||
@ -774,7 +776,9 @@ int kvm_arm_set_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg)
|
||||
|
||||
switch (reg->id & KVM_REG_ARM_COPROC_MASK) {
|
||||
case KVM_REG_ARM_CORE: return set_core_reg(vcpu, reg);
|
||||
case KVM_REG_ARM_FW: return kvm_arm_set_fw_reg(vcpu, reg);
|
||||
case KVM_REG_ARM_FW:
|
||||
case KVM_REG_ARM_FW_FEAT_BMAP:
|
||||
return kvm_arm_set_fw_reg(vcpu, reg);
|
||||
case KVM_REG_ARM64_SVE: return set_sve_reg(vcpu, reg);
|
||||
}
|
||||
|
||||
|
@@ -80,24 +80,51 @@ static int handle_no_fpsimd(struct kvm_vcpu *vcpu)
*
* @vcpu: the vcpu pointer
*
* WFE: Yield the CPU and come back to this vcpu when the scheduler
* WFE[T]: Yield the CPU and come back to this vcpu when the scheduler
* decides to.
* WFI: Simply call kvm_vcpu_halt(), which will halt execution of
* world-switches and schedule other host processes until there is an
* incoming IRQ or FIQ to the VM.
* WFIT: Same as WFI, with a timed wakeup implemented as a background timer
*
* WF{I,E}T can immediately return if the deadline has already expired.
*/
static int kvm_handle_wfx(struct kvm_vcpu *vcpu)
{
if (kvm_vcpu_get_esr(vcpu) & ESR_ELx_WFx_ISS_WFE) {
u64 esr = kvm_vcpu_get_esr(vcpu);

if (esr & ESR_ELx_WFx_ISS_WFE) {
trace_kvm_wfx_arm64(*vcpu_pc(vcpu), true);
vcpu->stat.wfe_exit_stat++;
kvm_vcpu_on_spin(vcpu, vcpu_mode_priv(vcpu));
} else {
trace_kvm_wfx_arm64(*vcpu_pc(vcpu), false);
vcpu->stat.wfi_exit_stat++;
kvm_vcpu_wfi(vcpu);
}

if (esr & ESR_ELx_WFx_ISS_WFxT) {
if (esr & ESR_ELx_WFx_ISS_RV) {
u64 val, now;

now = kvm_arm_timer_get_reg(vcpu, KVM_REG_ARM_TIMER_CNT);
val = vcpu_get_reg(vcpu, kvm_vcpu_sys_get_rt(vcpu));

if (now >= val)
goto out;
} else {
/* Treat WFxT as WFx if RN is invalid */
esr &= ~ESR_ELx_WFx_ISS_WFxT;
}
}

if (esr & ESR_ELx_WFx_ISS_WFE) {
kvm_vcpu_on_spin(vcpu, vcpu_mode_priv(vcpu));
} else {
if (esr & ESR_ELx_WFx_ISS_WFxT)
vcpu->arch.flags |= KVM_ARM64_WFIT;

kvm_vcpu_wfi(vcpu);
}
out:
kvm_incr_pc(vcpu);

return 1;
@ -169,6 +196,7 @@ static exit_handle_fn arm_exit_handlers[] = {
|
||||
[ESR_ELx_EC_CP15_64] = kvm_handle_cp15_64,
|
||||
[ESR_ELx_EC_CP14_MR] = kvm_handle_cp14_32,
|
||||
[ESR_ELx_EC_CP14_LS] = kvm_handle_cp14_load_store,
|
||||
[ESR_ELx_EC_CP10_ID] = kvm_handle_cp10_id,
|
||||
[ESR_ELx_EC_CP14_64] = kvm_handle_cp14_64,
|
||||
[ESR_ELx_EC_HVC32] = handle_hvc,
|
||||
[ESR_ELx_EC_SMC32] = handle_smc,
|
||||
@ -297,13 +325,8 @@ void __noreturn __cold nvhe_hyp_panic_handler(u64 esr, u64 spsr,
|
||||
u64 elr_in_kimg = __phys_to_kimg(elr_phys);
|
||||
u64 hyp_offset = elr_in_kimg - kaslr_offset() - elr_virt;
|
||||
u64 mode = spsr & PSR_MODE_MASK;
|
||||
u64 panic_addr = elr_virt + hyp_offset;
|
||||
|
||||
/*
|
||||
* The nVHE hyp symbols are not included by kallsyms to avoid issues
|
||||
* with aliasing. That means that the symbols cannot be printed with the
|
||||
* "%pS" format specifier, so fall back to the vmlinux address if
|
||||
* there's no better option.
|
||||
*/
|
||||
if (mode != PSR_MODE_EL2t && mode != PSR_MODE_EL2h) {
|
||||
kvm_err("Invalid host exception to nVHE hyp!\n");
|
||||
} else if (ESR_ELx_EC(esr) == ESR_ELx_EC_BRK64 &&
|
||||
@ -323,9 +346,11 @@ void __noreturn __cold nvhe_hyp_panic_handler(u64 esr, u64 spsr,
|
||||
if (file)
|
||||
kvm_err("nVHE hyp BUG at: %s:%u!\n", file, line);
|
||||
else
|
||||
kvm_err("nVHE hyp BUG at: %016llx!\n", elr_virt + hyp_offset);
|
||||
kvm_err("nVHE hyp BUG at: [<%016llx>] %pB!\n", panic_addr,
|
||||
(void *)panic_addr);
|
||||
} else {
|
||||
kvm_err("nVHE hyp panic at: %016llx!\n", elr_virt + hyp_offset);
|
||||
kvm_err("nVHE hyp panic at: [<%016llx>] %pB!\n", panic_addr,
|
||||
(void *)panic_addr);
|
||||
}
|
||||
|
||||
/*
|
||||
|
@ -19,8 +19,10 @@ int hyp_back_vmemmap(phys_addr_t phys, unsigned long size, phys_addr_t back);
|
||||
int pkvm_cpu_set_vector(enum arm64_hyp_spectre_vector slot);
|
||||
int pkvm_create_mappings(void *from, void *to, enum kvm_pgtable_prot prot);
|
||||
int pkvm_create_mappings_locked(void *from, void *to, enum kvm_pgtable_prot prot);
|
||||
unsigned long __pkvm_create_private_mapping(phys_addr_t phys, size_t size,
|
||||
enum kvm_pgtable_prot prot);
|
||||
int __pkvm_create_private_mapping(phys_addr_t phys, size_t size,
|
||||
enum kvm_pgtable_prot prot,
|
||||
unsigned long *haddr);
|
||||
int pkvm_alloc_private_va_range(size_t size, unsigned long *haddr);
|
||||
|
||||
static inline void hyp_vmemmap_range(phys_addr_t phys, unsigned long size,
|
||||
unsigned long *start, unsigned long *end)
|
||||
|
@ -80,7 +80,7 @@ SYM_FUNC_START(__hyp_do_panic)
|
||||
mov lr, #(PSR_F_BIT | PSR_I_BIT | PSR_A_BIT | PSR_D_BIT |\
|
||||
PSR_MODE_EL1h)
|
||||
msr spsr_el2, lr
|
||||
ldr lr, =nvhe_hyp_panic_handler
|
||||
adr_l lr, nvhe_hyp_panic_handler
|
||||
hyp_kimg_va lr, x6
|
||||
msr elr_el2, lr
|
||||
|
||||
@ -125,13 +125,11 @@ alternative_else_nop_endif
|
||||
add sp, sp, #16
|
||||
/*
|
||||
* Compute the idmap address of __kvm_handle_stub_hvc and
|
||||
* jump there. Since we use kimage_voffset, do not use the
|
||||
* HYP VA for __kvm_handle_stub_hvc, but the kernel VA instead
|
||||
* (by loading it from the constant pool).
|
||||
* jump there.
|
||||
*
|
||||
* Preserve x0-x4, which may contain stub parameters.
|
||||
*/
|
||||
ldr x5, =__kvm_handle_stub_hvc
|
||||
adr_l x5, __kvm_handle_stub_hvc
|
||||
hyp_pa x5, x6
|
||||
br x5
|
||||
SYM_FUNC_END(__host_hvc)
|
||||
@ -153,6 +151,18 @@ SYM_FUNC_END(__host_hvc)
|
||||
|
||||
.macro invalid_host_el2_vect
|
||||
.align 7
|
||||
|
||||
/*
|
||||
* Test whether the SP has overflowed, without corrupting a GPR.
|
||||
* nVHE hypervisor stacks are aligned so that the PAGE_SHIFT bit
|
||||
* of SP should always be 1.
|
||||
*/
|
||||
add sp, sp, x0 // sp' = sp + x0
|
||||
sub x0, sp, x0 // x0' = sp' - x0 = (sp + x0) - x0 = sp
|
||||
tbz x0, #PAGE_SHIFT, .L__hyp_sp_overflow\@
|
||||
sub x0, sp, x0 // x0'' = sp' - x0' = (sp + x0) - sp = x0
|
||||
sub sp, sp, x0 // sp'' = sp' - x0 = (sp + x0) - x0 = sp
|
||||
|
||||
/* If a guest is loaded, panic out of it. */
|
||||
stp x0, x1, [sp, #-16]!
|
||||
get_loaded_vcpu x0, x1
|
||||
@ -165,6 +175,18 @@ SYM_FUNC_END(__host_hvc)
|
||||
* been partially clobbered by __host_enter.
|
||||
*/
|
||||
b hyp_panic
|
||||
|
||||
.L__hyp_sp_overflow\@:
|
||||
/*
|
||||
* Reset SP to the top of the stack, to allow handling the hyp_panic.
|
||||
* This corrupts the stack but is ok, since we won't be attempting
|
||||
* any unwinding here.
|
||||
*/
|
||||
ldr_this_cpu x0, kvm_init_params + NVHE_INIT_STACK_HYP_VA, x1
|
||||
mov sp, x0
|
||||
|
||||
b hyp_panic_bad_stack
|
||||
ASM_BUG()
|
||||
.endm
|
||||
|
||||
.macro invalid_host_el1_vect
|
||||
|
@@ -160,7 +160,23 @@ static void handle___pkvm_create_private_mapping(struct kvm_cpu_context *host_ct
DECLARE_REG(size_t, size, host_ctxt, 2);
DECLARE_REG(enum kvm_pgtable_prot, prot, host_ctxt, 3);

cpu_reg(host_ctxt, 1) = __pkvm_create_private_mapping(phys, size, prot);
/*
* __pkvm_create_private_mapping() populates a pointer with the
* hypervisor start address of the allocation.
*
* However, handle___pkvm_create_private_mapping() hypercall crosses the
* EL1/EL2 boundary so the pointer would not be valid in this context.
*
* Instead pass the allocation address as the return value (or return
* ERR_PTR() on failure).
*/
unsigned long haddr;
int err = __pkvm_create_private_mapping(phys, size, prot, &haddr);

if (err)
haddr = (unsigned long)ERR_PTR(err);

cpu_reg(host_ctxt, 1) = haddr;
}
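
The single-register encoding works because the top 4095 values of the address space never correspond to a valid allocation, so an address and a negative errno can share the return register. A self-contained sketch of the same round trip; the helpers only mirror the kernel's ERR_PTR()/PTR_ERR()/IS_ERR_VALUE() idiom and the hyp VA is a made-up value:

#include <stdio.h>

#define MAX_ERRNO       4095UL
#define IS_ERR_VALUE(x) ((unsigned long)(x) >= (unsigned long)-MAX_ERRNO)

static unsigned long err_ptr(long err)   /* ~ ERR_PTR() */
{
    return (unsigned long)err;
}

static long ptr_err(unsigned long val)   /* ~ PTR_ERR() */
{
    return (long)val;
}

int main(void)
{
    unsigned long ok  = 0xffff800008c00000UL;  /* some hyp VA (made up) */
    unsigned long bad = err_ptr(-12);          /* -ENOMEM */

    /* Prints "0 -12": the VA is not an error value, the errno survives. */
    printf("%d %ld\n", IS_ERR_VALUE(ok), IS_ERR_VALUE(bad) ? ptr_err(bad) : 0);
    return 0;
}
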
|
||||
|
||||
static void handle___pkvm_prot_finalize(struct kvm_cpu_context *host_ctxt)
|
||||
|
@@ -37,36 +37,60 @@ static int __pkvm_create_mappings(unsigned long start, unsigned long size,
return err;
}

unsigned long __pkvm_create_private_mapping(phys_addr_t phys, size_t size,
enum kvm_pgtable_prot prot)
/**
* pkvm_alloc_private_va_range - Allocates a private VA range.
* @size: The size of the VA range to reserve.
* @haddr: The hypervisor virtual start address of the allocation.
*
* The private virtual address (VA) range is allocated above __io_map_base
* and aligned based on the order of @size.
*
* Return: 0 on success or negative error code on failure.
*/
int pkvm_alloc_private_va_range(size_t size, unsigned long *haddr)
{
unsigned long base, addr;
int ret = 0;

hyp_spin_lock(&pkvm_pgd_lock);

/* Align the allocation based on the order of its size */
addr = ALIGN(__io_map_base, PAGE_SIZE << get_order(size));

/* The allocated size is always a multiple of PAGE_SIZE */
base = addr + PAGE_ALIGN(size);

/* Are we overflowing on the vmemmap ? */
if (!addr || base > __hyp_vmemmap)
ret = -ENOMEM;
else {
__io_map_base = base;
*haddr = addr;
}

hyp_spin_unlock(&pkvm_pgd_lock);

return ret;
}
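
To make the "aligned based on the order of its size" rule concrete, here is a small standalone demo; get_order() is re-implemented to match the kernel helper's rounding and the starting __io_map_base value is invented for the example:

#include <stdio.h>

#define PAGE_SHIFT  12UL
#define PAGE_SIZE   (1UL << PAGE_SHIFT)
#define ALIGN(x, a) (((x) + ((a) - 1)) & ~((a) - 1))

/* Smallest order such that (1 << order) pages cover size (size > 0). */
static unsigned long get_order(unsigned long size)
{
    unsigned long order = 0;

    size = (size - 1) >> PAGE_SHIFT;
    while (size) {
        order++;
        size >>= 1;
    }
    return order;
}

int main(void)
{
    unsigned long io_map_base = 0x1000000UL + PAGE_SIZE; /* deliberately not 8KiB aligned */

    /* A 2-page (stack + guard) allocation is order 1, so it starts 8KiB aligned. */
    printf("order=%lu start=%#lx\n", get_order(2 * PAGE_SIZE),
           ALIGN(io_map_base, PAGE_SIZE << get_order(2 * PAGE_SIZE)));
    return 0;
}
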
|
||||
|
||||
int __pkvm_create_private_mapping(phys_addr_t phys, size_t size,
|
||||
enum kvm_pgtable_prot prot,
|
||||
unsigned long *haddr)
|
||||
{
|
||||
unsigned long addr;
|
||||
int err;
|
||||
|
||||
hyp_spin_lock(&pkvm_pgd_lock);
|
||||
|
||||
size = PAGE_ALIGN(size + offset_in_page(phys));
|
||||
addr = __io_map_base;
|
||||
__io_map_base += size;
|
||||
err = pkvm_alloc_private_va_range(size, &addr);
|
||||
if (err)
|
||||
return err;
|
||||
|
||||
/* Are we overflowing on the vmemmap ? */
|
||||
if (__io_map_base > __hyp_vmemmap) {
|
||||
__io_map_base -= size;
|
||||
addr = (unsigned long)ERR_PTR(-ENOMEM);
|
||||
goto out;
|
||||
}
|
||||
err = __pkvm_create_mappings(addr, size, phys, prot);
|
||||
if (err)
|
||||
return err;
|
||||
|
||||
err = kvm_pgtable_hyp_map(&pkvm_pgtable, addr, size, phys, prot);
|
||||
if (err) {
|
||||
addr = (unsigned long)ERR_PTR(err);
|
||||
goto out;
|
||||
}
|
||||
|
||||
addr = addr + offset_in_page(phys);
|
||||
out:
|
||||
hyp_spin_unlock(&pkvm_pgd_lock);
|
||||
|
||||
return addr;
|
||||
*haddr = addr + offset_in_page(phys);
|
||||
return err;
|
||||
}
|
||||
|
||||
int pkvm_create_mappings_locked(void *from, void *to, enum kvm_pgtable_prot prot)
|
||||
@ -146,7 +170,8 @@ int pkvm_cpu_set_vector(enum arm64_hyp_spectre_vector slot)
|
||||
int hyp_map_vectors(void)
|
||||
{
|
||||
phys_addr_t phys;
|
||||
void *bp_base;
|
||||
unsigned long bp_base;
|
||||
int ret;
|
||||
|
||||
if (!kvm_system_needs_idmapped_vectors()) {
|
||||
__hyp_bp_vect_base = __bp_harden_hyp_vecs;
|
||||
@ -154,13 +179,12 @@ int hyp_map_vectors(void)
|
||||
}
|
||||
|
||||
phys = __hyp_pa(__bp_harden_hyp_vecs);
|
||||
bp_base = (void *)__pkvm_create_private_mapping(phys,
|
||||
__BP_HARDEN_HYP_VECS_SZ,
|
||||
PAGE_HYP_EXEC);
|
||||
if (IS_ERR_OR_NULL(bp_base))
|
||||
return PTR_ERR(bp_base);
|
||||
ret = __pkvm_create_private_mapping(phys, __BP_HARDEN_HYP_VECS_SZ,
|
||||
PAGE_HYP_EXEC, &bp_base);
|
||||
if (ret)
|
||||
return ret;
|
||||
|
||||
__hyp_bp_vect_base = bp_base;
|
||||
__hyp_bp_vect_base = (void *)bp_base;
|
||||
|
||||
return 0;
|
||||
}
|
||||
|
@ -99,17 +99,42 @@ static int recreate_hyp_mappings(phys_addr_t phys, unsigned long size,
|
||||
return ret;
|
||||
|
||||
for (i = 0; i < hyp_nr_cpus; i++) {
|
||||
struct kvm_nvhe_init_params *params = per_cpu_ptr(&kvm_init_params, i);
|
||||
unsigned long hyp_addr;
|
||||
|
||||
start = (void *)kern_hyp_va(per_cpu_base[i]);
|
||||
end = start + PAGE_ALIGN(hyp_percpu_size);
|
||||
ret = pkvm_create_mappings(start, end, PAGE_HYP);
|
||||
if (ret)
|
||||
return ret;
|
||||
|
||||
end = (void *)per_cpu_ptr(&kvm_init_params, i)->stack_hyp_va;
|
||||
start = end - PAGE_SIZE;
|
||||
ret = pkvm_create_mappings(start, end, PAGE_HYP);
|
||||
/*
|
||||
* Allocate a contiguous HYP private VA range for the stack
|
||||
* and guard page. The allocation is also aligned based on
|
||||
* the order of its size.
|
||||
*/
|
||||
ret = pkvm_alloc_private_va_range(PAGE_SIZE * 2, &hyp_addr);
|
||||
if (ret)
|
||||
return ret;
|
||||
|
||||
/*
|
||||
* Since the stack grows downwards, map the stack to the page
|
||||
* at the higher address and leave the lower guard page
|
||||
* unbacked.
|
||||
*
|
||||
* Any valid stack address now has the PAGE_SHIFT bit as 1
|
||||
* and addresses corresponding to the guard page have the
|
||||
* PAGE_SHIFT bit as 0 - this is used for overflow detection.
|
||||
*/
|
||||
hyp_spin_lock(&pkvm_pgd_lock);
|
||||
ret = kvm_pgtable_hyp_map(&pkvm_pgtable, hyp_addr + PAGE_SIZE,
|
||||
PAGE_SIZE, params->stack_pa, PAGE_HYP);
|
||||
hyp_spin_unlock(&pkvm_pgd_lock);
|
||||
if (ret)
|
||||
return ret;
|
||||
|
||||
/* Update stack_hyp_va to end of the stack's private VA range */
|
||||
params->stack_hyp_va = hyp_addr + (2 * PAGE_SIZE);
|
||||
}
|
||||
|
||||
/*
|
||||
|
@ -150,16 +150,13 @@ static void __hyp_vgic_restore_state(struct kvm_vcpu *vcpu)
|
||||
}
|
||||
}
|
||||
|
||||
/**
|
||||
/*
|
||||
* Disable host events, enable guest events
|
||||
*/
|
||||
static bool __pmu_switch_to_guest(struct kvm_cpu_context *host_ctxt)
|
||||
#ifdef CONFIG_HW_PERF_EVENTS
|
||||
static bool __pmu_switch_to_guest(struct kvm_vcpu *vcpu)
|
||||
{
|
||||
struct kvm_host_data *host;
|
||||
struct kvm_pmu_events *pmu;
|
||||
|
||||
host = container_of(host_ctxt, struct kvm_host_data, host_ctxt);
|
||||
pmu = &host->pmu_events;
|
||||
struct kvm_pmu_events *pmu = &vcpu->arch.pmu.events;
|
||||
|
||||
if (pmu->events_host)
|
||||
write_sysreg(pmu->events_host, pmcntenclr_el0);
|
||||
@ -170,16 +167,12 @@ static bool __pmu_switch_to_guest(struct kvm_cpu_context *host_ctxt)
|
||||
return (pmu->events_host || pmu->events_guest);
|
||||
}
|
||||
|
||||
/**
|
||||
/*
|
||||
* Disable guest events, enable host events
|
||||
*/
|
||||
static void __pmu_switch_to_host(struct kvm_cpu_context *host_ctxt)
|
||||
static void __pmu_switch_to_host(struct kvm_vcpu *vcpu)
|
||||
{
|
||||
struct kvm_host_data *host;
|
||||
struct kvm_pmu_events *pmu;
|
||||
|
||||
host = container_of(host_ctxt, struct kvm_host_data, host_ctxt);
|
||||
pmu = &host->pmu_events;
|
||||
struct kvm_pmu_events *pmu = &vcpu->arch.pmu.events;
|
||||
|
||||
if (pmu->events_guest)
|
||||
write_sysreg(pmu->events_guest, pmcntenclr_el0);
|
||||
@ -187,8 +180,12 @@ static void __pmu_switch_to_host(struct kvm_cpu_context *host_ctxt)
|
||||
if (pmu->events_host)
|
||||
write_sysreg(pmu->events_host, pmcntenset_el0);
|
||||
}
|
||||
#else
|
||||
#define __pmu_switch_to_guest(v) ({ false; })
|
||||
#define __pmu_switch_to_host(v) do {} while (0)
|
||||
#endif
|
||||
|
||||
/**
|
||||
/*
|
||||
* Handler for protected VM MSR, MRS or System instruction execution in AArch64.
|
||||
*
|
||||
* Returns true if the hypervisor has handled the exit, and control should go
|
||||
@ -205,23 +202,6 @@ static bool kvm_handle_pvm_sys64(struct kvm_vcpu *vcpu, u64 *exit_code)
|
||||
kvm_handle_pvm_sysreg(vcpu, exit_code));
|
||||
}
|
||||
|
||||
/**
|
||||
* Handler for protected floating-point and Advanced SIMD accesses.
|
||||
*
|
||||
* Returns true if the hypervisor has handled the exit, and control should go
|
||||
* back to the guest, or false if it hasn't.
|
||||
*/
|
||||
static bool kvm_handle_pvm_fpsimd(struct kvm_vcpu *vcpu, u64 *exit_code)
|
||||
{
|
||||
/* Linux guests assume support for floating-point and Advanced SIMD. */
|
||||
BUILD_BUG_ON(!FIELD_GET(ARM64_FEATURE_MASK(ID_AA64PFR0_FP),
|
||||
PVM_ID_AA64PFR0_ALLOW));
|
||||
BUILD_BUG_ON(!FIELD_GET(ARM64_FEATURE_MASK(ID_AA64PFR0_ASIMD),
|
||||
PVM_ID_AA64PFR0_ALLOW));
|
||||
|
||||
return kvm_hyp_handle_fpsimd(vcpu, exit_code);
|
||||
}
|
||||
|
||||
static const exit_handler_fn hyp_exit_handlers[] = {
|
||||
[0 ... ESR_ELx_EC_MAX] = NULL,
|
||||
[ESR_ELx_EC_CP15_32] = kvm_hyp_handle_cp15_32,
|
||||
@ -237,7 +217,7 @@ static const exit_handler_fn pvm_exit_handlers[] = {
|
||||
[0 ... ESR_ELx_EC_MAX] = NULL,
|
||||
[ESR_ELx_EC_SYS64] = kvm_handle_pvm_sys64,
|
||||
[ESR_ELx_EC_SVE] = kvm_handle_pvm_restricted,
|
||||
[ESR_ELx_EC_FP_ASIMD] = kvm_handle_pvm_fpsimd,
|
||||
[ESR_ELx_EC_FP_ASIMD] = kvm_hyp_handle_fpsimd,
|
||||
[ESR_ELx_EC_IABT_LOW] = kvm_hyp_handle_iabt_low,
|
||||
[ESR_ELx_EC_DABT_LOW] = kvm_hyp_handle_dabt_low,
|
||||
[ESR_ELx_EC_PAC] = kvm_hyp_handle_ptrauth,
|
||||
@ -304,7 +284,7 @@ int __kvm_vcpu_run(struct kvm_vcpu *vcpu)
|
||||
host_ctxt->__hyp_running_vcpu = vcpu;
|
||||
guest_ctxt = &vcpu->arch.ctxt;
|
||||
|
||||
pmu_switch_needed = __pmu_switch_to_guest(host_ctxt);
|
||||
pmu_switch_needed = __pmu_switch_to_guest(vcpu);
|
||||
|
||||
__sysreg_save_state_nvhe(host_ctxt);
|
||||
/*
|
||||
@ -366,7 +346,7 @@ int __kvm_vcpu_run(struct kvm_vcpu *vcpu)
|
||||
__debug_restore_host_buffers_nvhe(vcpu);
|
||||
|
||||
if (pmu_switch_needed)
|
||||
__pmu_switch_to_host(host_ctxt);
|
||||
__pmu_switch_to_host(vcpu);
|
||||
|
||||
/* Returning to host will clear PSR.I, remask PMR if needed */
|
||||
if (system_uses_irq_prio_masking())
|
||||
@ -377,7 +357,7 @@ int __kvm_vcpu_run(struct kvm_vcpu *vcpu)
|
||||
return exit_code;
|
||||
}
|
||||
|
||||
void __noreturn hyp_panic(void)
|
||||
asmlinkage void __noreturn hyp_panic(void)
|
||||
{
|
||||
u64 spsr = read_sysreg_el2(SYS_SPSR);
|
||||
u64 elr = read_sysreg_el2(SYS_ELR);
|
||||
@ -399,6 +379,11 @@ void __noreturn hyp_panic(void)
|
||||
unreachable();
|
||||
}
|
||||
|
||||
asmlinkage void __noreturn hyp_panic_bad_stack(void)
|
||||
{
|
||||
hyp_panic();
|
||||
}
|
||||
|
||||
asmlinkage void kvm_unexpected_el2_exception(void)
|
||||
{
|
||||
return __kvm_unexpected_el2_exception();
|
||||
|
@ -90,9 +90,6 @@ static u64 get_pvm_id_aa64pfr0(const struct kvm_vcpu *vcpu)
|
||||
u64 set_mask = 0;
|
||||
u64 allow_mask = PVM_ID_AA64PFR0_ALLOW;
|
||||
|
||||
if (!vcpu_has_sve(vcpu))
|
||||
allow_mask &= ~ARM64_FEATURE_MASK(ID_AA64PFR0_SVE);
|
||||
|
||||
set_mask |= get_restricted_features_unsigned(id_aa64pfr0_el1_sys_val,
|
||||
PVM_ID_AA64PFR0_RESTRICT_UNSIGNED);
|
||||
|
||||
|
@ -9,6 +9,13 @@
|
||||
#include <kvm/arm_hypercalls.h>
|
||||
#include <kvm/arm_psci.h>
|
||||
|
||||
#define KVM_ARM_SMCCC_STD_FEATURES \
|
||||
GENMASK(KVM_REG_ARM_STD_BMAP_BIT_COUNT - 1, 0)
|
||||
#define KVM_ARM_SMCCC_STD_HYP_FEATURES \
|
||||
GENMASK(KVM_REG_ARM_STD_HYP_BMAP_BIT_COUNT - 1, 0)
|
||||
#define KVM_ARM_SMCCC_VENDOR_HYP_FEATURES \
|
||||
GENMASK(KVM_REG_ARM_VENDOR_HYP_BMAP_BIT_COUNT - 1, 0)
|
||||
|
||||
static void kvm_ptp_get_time(struct kvm_vcpu *vcpu, u64 *val)
|
||||
{
|
||||
struct system_time_snapshot systime_snapshot;
|
||||
@ -58,13 +65,73 @@ static void kvm_ptp_get_time(struct kvm_vcpu *vcpu, u64 *val)
|
||||
val[3] = lower_32_bits(cycles);
|
||||
}
|
||||
|
||||
static bool kvm_hvc_call_default_allowed(u32 func_id)
{
switch (func_id) {
/*
* List of function-ids that are not gated with the bitmapped
* feature firmware registers, and are to be allowed for
* servicing the call by default.
*/
case ARM_SMCCC_VERSION_FUNC_ID:
case ARM_SMCCC_ARCH_FEATURES_FUNC_ID:
return true;
default:
/* PSCI 0.2 and up is in the 0:0x1f range */
if (ARM_SMCCC_OWNER_NUM(func_id) == ARM_SMCCC_OWNER_STANDARD &&
ARM_SMCCC_FUNC_NUM(func_id) <= 0x1f)
return true;

/*
* KVM's PSCI 0.1 doesn't comply with SMCCC, and has
* its own function-id base and range
*/
if (func_id >= KVM_PSCI_FN(0) && func_id <= KVM_PSCI_FN(3))
return true;

return false;
}
}
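
The default-allow test above splits the SMCCC function ID into its owning-entity and function-number fields. A small sketch of the same decomposition, using the field layout behind the kernel's ARM_SMCCC_OWNER_NUM()/ARM_SMCCC_FUNC_NUM() macros; the two sample IDs are standard PSCI/TRNG values quoted only for illustration:

#include <stdio.h>

#define OWNER_STANDARD  4
#define OWNER_NUM(id)   (((id) >> 24) & 0x3f)
#define FUNC_NUM(id)    ((id) & 0xffff)

static int default_allowed(unsigned int id)
{
    /* Same rule as above: standard owner, function number 0..0x1f. */
    return OWNER_NUM(id) == OWNER_STANDARD && FUNC_NUM(id) <= 0x1f;
}

int main(void)
{
    /* PSCI_0_2_FN_CPU_ON: owner 4 (standard), function number 0x3. */
    printf("%d\n", default_allowed(0x84000003));  /* 1: allowed by default */
    /* A TRNG function ID (TRNG_VERSION) is gated by the STD bitmap instead. */
    printf("%d\n", default_allowed(0x84000050));  /* 0: not in the default set */
    return 0;
}
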
|
||||
|
||||
static bool kvm_hvc_call_allowed(struct kvm_vcpu *vcpu, u32 func_id)
|
||||
{
|
||||
struct kvm_smccc_features *smccc_feat = &vcpu->kvm->arch.smccc_feat;
|
||||
|
||||
switch (func_id) {
|
||||
case ARM_SMCCC_TRNG_VERSION:
|
||||
case ARM_SMCCC_TRNG_FEATURES:
|
||||
case ARM_SMCCC_TRNG_GET_UUID:
|
||||
case ARM_SMCCC_TRNG_RND32:
|
||||
case ARM_SMCCC_TRNG_RND64:
|
||||
return test_bit(KVM_REG_ARM_STD_BIT_TRNG_V1_0,
|
||||
&smccc_feat->std_bmap);
|
||||
case ARM_SMCCC_HV_PV_TIME_FEATURES:
|
||||
case ARM_SMCCC_HV_PV_TIME_ST:
|
||||
return test_bit(KVM_REG_ARM_STD_HYP_BIT_PV_TIME,
|
||||
&smccc_feat->std_hyp_bmap);
|
||||
case ARM_SMCCC_VENDOR_HYP_KVM_FEATURES_FUNC_ID:
|
||||
case ARM_SMCCC_VENDOR_HYP_CALL_UID_FUNC_ID:
|
||||
return test_bit(KVM_REG_ARM_VENDOR_HYP_BIT_FUNC_FEAT,
|
||||
&smccc_feat->vendor_hyp_bmap);
|
||||
case ARM_SMCCC_VENDOR_HYP_KVM_PTP_FUNC_ID:
|
||||
return test_bit(KVM_REG_ARM_VENDOR_HYP_BIT_PTP,
|
||||
&smccc_feat->vendor_hyp_bmap);
|
||||
default:
|
||||
return kvm_hvc_call_default_allowed(func_id);
|
||||
}
|
||||
}
|
||||
|
||||
int kvm_hvc_call_handler(struct kvm_vcpu *vcpu)
|
||||
{
|
||||
struct kvm_smccc_features *smccc_feat = &vcpu->kvm->arch.smccc_feat;
|
||||
u32 func_id = smccc_get_function(vcpu);
|
||||
u64 val[4] = {SMCCC_RET_NOT_SUPPORTED};
|
||||
u32 feature;
|
||||
gpa_t gpa;
|
||||
|
||||
if (!kvm_hvc_call_allowed(vcpu, func_id))
|
||||
goto out;
|
||||
|
||||
switch (func_id) {
|
||||
case ARM_SMCCC_VERSION_FUNC_ID:
|
||||
val[0] = ARM_SMCCC_VERSION_1_1;
|
||||
@ -120,7 +187,9 @@ int kvm_hvc_call_handler(struct kvm_vcpu *vcpu)
|
||||
}
|
||||
break;
|
||||
case ARM_SMCCC_HV_PV_TIME_FEATURES:
|
||||
val[0] = SMCCC_RET_SUCCESS;
|
||||
if (test_bit(KVM_REG_ARM_STD_HYP_BIT_PV_TIME,
|
||||
&smccc_feat->std_hyp_bmap))
|
||||
val[0] = SMCCC_RET_SUCCESS;
|
||||
break;
|
||||
}
|
||||
break;
|
||||
@ -139,8 +208,7 @@ int kvm_hvc_call_handler(struct kvm_vcpu *vcpu)
|
||||
val[3] = ARM_SMCCC_VENDOR_HYP_UID_KVM_REG_3;
|
||||
break;
|
||||
case ARM_SMCCC_VENDOR_HYP_KVM_FEATURES_FUNC_ID:
|
||||
val[0] = BIT(ARM_SMCCC_KVM_FUNC_FEATURES);
|
||||
val[0] |= BIT(ARM_SMCCC_KVM_FUNC_PTP);
|
||||
val[0] = smccc_feat->vendor_hyp_bmap;
|
||||
break;
|
||||
case ARM_SMCCC_VENDOR_HYP_KVM_PTP_FUNC_ID:
|
||||
kvm_ptp_get_time(vcpu, val);
|
||||
@ -155,6 +223,259 @@ int kvm_hvc_call_handler(struct kvm_vcpu *vcpu)
|
||||
return kvm_psci_call(vcpu);
|
||||
}
|
||||
|
||||
out:
|
||||
smccc_set_retval(vcpu, val[0], val[1], val[2], val[3]);
|
||||
return 1;
|
||||
}
|
||||
|
||||
static const u64 kvm_arm_fw_reg_ids[] = {
|
||||
KVM_REG_ARM_PSCI_VERSION,
|
||||
KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_1,
|
||||
KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_2,
|
||||
KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_3,
|
||||
KVM_REG_ARM_STD_BMAP,
|
||||
KVM_REG_ARM_STD_HYP_BMAP,
|
||||
KVM_REG_ARM_VENDOR_HYP_BMAP,
|
||||
};
|
||||
|
||||
void kvm_arm_init_hypercalls(struct kvm *kvm)
|
||||
{
|
||||
struct kvm_smccc_features *smccc_feat = &kvm->arch.smccc_feat;
|
||||
|
||||
smccc_feat->std_bmap = KVM_ARM_SMCCC_STD_FEATURES;
|
||||
smccc_feat->std_hyp_bmap = KVM_ARM_SMCCC_STD_HYP_FEATURES;
|
||||
smccc_feat->vendor_hyp_bmap = KVM_ARM_SMCCC_VENDOR_HYP_FEATURES;
|
||||
}
|
||||
|
||||
int kvm_arm_get_fw_num_regs(struct kvm_vcpu *vcpu)
|
||||
{
|
||||
return ARRAY_SIZE(kvm_arm_fw_reg_ids);
|
||||
}
|
||||
|
||||
int kvm_arm_copy_fw_reg_indices(struct kvm_vcpu *vcpu, u64 __user *uindices)
|
||||
{
|
||||
int i;
|
||||
|
||||
for (i = 0; i < ARRAY_SIZE(kvm_arm_fw_reg_ids); i++) {
|
||||
if (put_user(kvm_arm_fw_reg_ids[i], uindices++))
|
||||
return -EFAULT;
|
||||
}
|
||||
|
||||
return 0;
|
||||
}
|
||||
|
||||
#define KVM_REG_FEATURE_LEVEL_MASK GENMASK(3, 0)
|
||||
|
||||
/*
|
||||
* Convert the workaround level into an easy-to-compare number, where higher
|
||||
* values mean better protection.
|
||||
*/
|
||||
static int get_kernel_wa_level(u64 regid)
|
||||
{
|
||||
switch (regid) {
|
||||
case KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_1:
|
||||
switch (arm64_get_spectre_v2_state()) {
|
||||
case SPECTRE_VULNERABLE:
|
||||
return KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_1_NOT_AVAIL;
|
||||
case SPECTRE_MITIGATED:
|
||||
return KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_1_AVAIL;
|
||||
case SPECTRE_UNAFFECTED:
|
||||
return KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_1_NOT_REQUIRED;
|
||||
}
|
||||
return KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_1_NOT_AVAIL;
|
||||
case KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_2:
|
||||
switch (arm64_get_spectre_v4_state()) {
|
||||
case SPECTRE_MITIGATED:
|
||||
/*
|
||||
* As for the hypercall discovery, we pretend we
|
||||
* don't have any FW mitigation if SSBS is there at
|
||||
* all times.
|
||||
*/
|
||||
if (cpus_have_final_cap(ARM64_SSBS))
|
||||
return KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_2_NOT_AVAIL;
|
||||
fallthrough;
|
||||
case SPECTRE_UNAFFECTED:
|
||||
return KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_2_NOT_REQUIRED;
|
||||
case SPECTRE_VULNERABLE:
|
||||
return KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_2_NOT_AVAIL;
|
||||
}
|
||||
break;
|
||||
case KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_3:
|
||||
switch (arm64_get_spectre_bhb_state()) {
|
||||
case SPECTRE_VULNERABLE:
|
||||
return KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_3_NOT_AVAIL;
|
||||
case SPECTRE_MITIGATED:
|
||||
return KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_3_AVAIL;
|
||||
case SPECTRE_UNAFFECTED:
|
||||
return KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_3_NOT_REQUIRED;
|
||||
}
|
||||
return KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_3_NOT_AVAIL;
|
||||
}
|
||||
|
||||
return -EINVAL;
|
||||
}
|
||||
|
||||
int kvm_arm_get_fw_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg)
|
||||
{
|
||||
struct kvm_smccc_features *smccc_feat = &vcpu->kvm->arch.smccc_feat;
|
||||
void __user *uaddr = (void __user *)(long)reg->addr;
|
||||
u64 val;
|
||||
|
||||
switch (reg->id) {
|
||||
case KVM_REG_ARM_PSCI_VERSION:
|
||||
val = kvm_psci_version(vcpu);
|
||||
break;
|
||||
case KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_1:
|
||||
case KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_2:
|
||||
case KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_3:
|
||||
val = get_kernel_wa_level(reg->id) & KVM_REG_FEATURE_LEVEL_MASK;
|
||||
break;
|
||||
case KVM_REG_ARM_STD_BMAP:
|
||||
val = READ_ONCE(smccc_feat->std_bmap);
|
||||
break;
|
||||
case KVM_REG_ARM_STD_HYP_BMAP:
|
||||
val = READ_ONCE(smccc_feat->std_hyp_bmap);
|
||||
break;
|
||||
case KVM_REG_ARM_VENDOR_HYP_BMAP:
|
||||
val = READ_ONCE(smccc_feat->vendor_hyp_bmap);
|
||||
break;
|
||||
default:
|
||||
return -ENOENT;
|
||||
}
|
||||
|
||||
if (copy_to_user(uaddr, &val, KVM_REG_SIZE(reg->id)))
|
||||
return -EFAULT;
|
||||
|
||||
return 0;
|
||||
}
|
||||
|
||||
static int kvm_arm_set_fw_reg_bmap(struct kvm_vcpu *vcpu, u64 reg_id, u64 val)
|
||||
{
|
||||
int ret = 0;
|
||||
struct kvm *kvm = vcpu->kvm;
|
||||
struct kvm_smccc_features *smccc_feat = &kvm->arch.smccc_feat;
|
||||
unsigned long *fw_reg_bmap, fw_reg_features;
|
||||
|
||||
switch (reg_id) {
|
||||
case KVM_REG_ARM_STD_BMAP:
|
||||
fw_reg_bmap = &smccc_feat->std_bmap;
|
||||
fw_reg_features = KVM_ARM_SMCCC_STD_FEATURES;
|
||||
break;
|
||||
case KVM_REG_ARM_STD_HYP_BMAP:
|
||||
fw_reg_bmap = &smccc_feat->std_hyp_bmap;
|
||||
fw_reg_features = KVM_ARM_SMCCC_STD_HYP_FEATURES;
|
||||
break;
|
||||
case KVM_REG_ARM_VENDOR_HYP_BMAP:
|
||||
fw_reg_bmap = &smccc_feat->vendor_hyp_bmap;
|
||||
fw_reg_features = KVM_ARM_SMCCC_VENDOR_HYP_FEATURES;
|
||||
break;
|
||||
default:
|
||||
return -ENOENT;
|
||||
}
|
||||
|
||||
/* Check for unsupported bit */
|
||||
if (val & ~fw_reg_features)
|
||||
return -EINVAL;
|
||||
|
||||
mutex_lock(&kvm->lock);
|
||||
|
||||
if (test_bit(KVM_ARCH_FLAG_HAS_RAN_ONCE, &kvm->arch.flags) &&
|
||||
val != *fw_reg_bmap) {
|
||||
ret = -EBUSY;
|
||||
goto out;
|
||||
}
|
||||
|
||||
WRITE_ONCE(*fw_reg_bmap, val);
|
||||
out:
|
||||
mutex_unlock(&kvm->lock);
|
||||
return ret;
|
||||
}
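
From userspace these bitmaps are ordinary firmware registers, so a VMM can shrink the advertised feature set with KVM_SET_ONE_REG before the first KVM_RUN (afterwards the -EBUSY path above kicks in). A hedged sketch, assuming vcpu_fd is an open vCPU descriptor and that KVM_REG_ARM_STD_BMAP from the uapi headers added by this series is visible via <linux/kvm.h>:

#include <linux/kvm.h>
#include <sys/ioctl.h>

static int drop_std_features(int vcpu_fd)
{
    __u64 bmap = 0; /* clear every KVM_REG_ARM_STD_BMAP bit, including TRNG */
    struct kvm_one_reg reg = {
        .id   = KVM_REG_ARM_STD_BMAP,
        .addr = (__u64)(unsigned long)&bmap,
    };

    /* Must happen before the VM has run, or the kernel returns -EBUSY. */
    return ioctl(vcpu_fd, KVM_SET_ONE_REG, &reg);
}
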
|
||||
|
||||
int kvm_arm_set_fw_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg)
|
||||
{
|
||||
void __user *uaddr = (void __user *)(long)reg->addr;
|
||||
u64 val;
|
||||
int wa_level;
|
||||
|
||||
if (copy_from_user(&val, uaddr, KVM_REG_SIZE(reg->id)))
|
||||
return -EFAULT;
|
||||
|
||||
switch (reg->id) {
|
||||
case KVM_REG_ARM_PSCI_VERSION:
|
||||
{
|
||||
bool wants_02;
|
||||
|
||||
wants_02 = test_bit(KVM_ARM_VCPU_PSCI_0_2, vcpu->arch.features);
|
||||
|
||||
switch (val) {
|
||||
case KVM_ARM_PSCI_0_1:
|
||||
if (wants_02)
|
||||
return -EINVAL;
|
||||
vcpu->kvm->arch.psci_version = val;
|
||||
return 0;
|
||||
case KVM_ARM_PSCI_0_2:
|
||||
case KVM_ARM_PSCI_1_0:
|
||||
case KVM_ARM_PSCI_1_1:
|
||||
if (!wants_02)
|
||||
return -EINVAL;
|
||||
vcpu->kvm->arch.psci_version = val;
|
||||
return 0;
|
||||
}
|
||||
break;
|
||||
}
|
||||
|
||||
case KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_1:
|
||||
case KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_3:
|
||||
if (val & ~KVM_REG_FEATURE_LEVEL_MASK)
|
||||
return -EINVAL;
|
||||
|
||||
if (get_kernel_wa_level(reg->id) < val)
|
||||
return -EINVAL;
|
||||
|
||||
return 0;
|
||||
|
||||
case KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_2:
|
||||
if (val & ~(KVM_REG_FEATURE_LEVEL_MASK |
|
||||
KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_2_ENABLED))
|
||||
return -EINVAL;
|
||||
|
||||
/* The enabled bit must not be set unless the level is AVAIL. */
|
||||
if ((val & KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_2_ENABLED) &&
|
||||
(val & KVM_REG_FEATURE_LEVEL_MASK) != KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_2_AVAIL)
|
||||
return -EINVAL;
|
||||
|
||||
/*
|
||||
* Map all the possible incoming states to the only two we
|
||||
* really want to deal with.
|
||||
*/
|
||||
switch (val & KVM_REG_FEATURE_LEVEL_MASK) {
|
||||
case KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_2_NOT_AVAIL:
|
||||
case KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_2_UNKNOWN:
|
||||
wa_level = KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_2_NOT_AVAIL;
|
||||
break;
|
||||
case KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_2_AVAIL:
|
||||
case KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_2_NOT_REQUIRED:
|
||||
wa_level = KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_2_NOT_REQUIRED;
|
||||
break;
|
||||
default:
|
||||
return -EINVAL;
|
||||
}
|
||||
|
||||
/*
|
||||
* We can deal with NOT_AVAIL on NOT_REQUIRED, but not the
|
||||
* other way around.
|
||||
*/
|
||||
if (get_kernel_wa_level(reg->id) < wa_level)
|
||||
return -EINVAL;
|
||||
|
||||
return 0;
|
||||
case KVM_REG_ARM_STD_BMAP:
|
||||
case KVM_REG_ARM_STD_HYP_BMAP:
|
||||
case KVM_REG_ARM_VENDOR_HYP_BMAP:
|
||||
return kvm_arm_set_fw_reg_bmap(vcpu, reg->id, val);
|
||||
default:
|
||||
return -ENOENT;
|
||||
}
|
||||
|
||||
return -EINVAL;
|
||||
}
|
||||
|
@ -258,8 +258,8 @@ static bool kvm_host_owns_hyp_mappings(void)
|
||||
return true;
|
||||
}
|
||||
|
||||
static int __create_hyp_mappings(unsigned long start, unsigned long size,
|
||||
unsigned long phys, enum kvm_pgtable_prot prot)
|
||||
int __create_hyp_mappings(unsigned long start, unsigned long size,
|
||||
unsigned long phys, enum kvm_pgtable_prot prot)
|
||||
{
|
||||
int err;
|
||||
|
||||
@ -457,23 +457,22 @@ int create_hyp_mappings(void *from, void *to, enum kvm_pgtable_prot prot)
|
||||
return 0;
|
||||
}
|
||||
|
||||
static int __create_hyp_private_mapping(phys_addr_t phys_addr, size_t size,
|
||||
unsigned long *haddr,
|
||||
enum kvm_pgtable_prot prot)
|
||||
|
||||
/**
|
||||
* hyp_alloc_private_va_range - Allocates a private VA range.
|
||||
* @size: The size of the VA range to reserve.
|
||||
* @haddr: The hypervisor virtual start address of the allocation.
|
||||
*
|
||||
* The private virtual address (VA) range is allocated below io_map_base
|
||||
* and aligned based on the order of @size.
|
||||
*
|
||||
* Return: 0 on success or negative error code on failure.
|
||||
*/
|
||||
int hyp_alloc_private_va_range(size_t size, unsigned long *haddr)
|
||||
{
|
||||
unsigned long base;
|
||||
int ret = 0;
|
||||
|
||||
if (!kvm_host_owns_hyp_mappings()) {
|
||||
base = kvm_call_hyp_nvhe(__pkvm_create_private_mapping,
|
||||
phys_addr, size, prot);
|
||||
if (IS_ERR_OR_NULL((void *)base))
|
||||
return PTR_ERR((void *)base);
|
||||
*haddr = base;
|
||||
|
||||
return 0;
|
||||
}
|
||||
|
||||
mutex_lock(&kvm_hyp_pgd_mutex);
|
||||
|
||||
/*
|
||||
@ -484,8 +483,10 @@ static int __create_hyp_private_mapping(phys_addr_t phys_addr, size_t size,
|
||||
*
|
||||
* The allocated size is always a multiple of PAGE_SIZE.
|
||||
*/
|
||||
size = PAGE_ALIGN(size + offset_in_page(phys_addr));
|
||||
base = io_map_base - size;
|
||||
base = io_map_base - PAGE_ALIGN(size);
|
||||
|
||||
/* Align the allocation based on the order of its size */
|
||||
base = ALIGN_DOWN(base, PAGE_SIZE << get_order(size));
|
||||
|
||||
/*
|
||||
* Verify that BIT(VA_BITS - 1) hasn't been flipped by
|
||||
@ -495,19 +496,40 @@ static int __create_hyp_private_mapping(phys_addr_t phys_addr, size_t size,
|
||||
if ((base ^ io_map_base) & BIT(VA_BITS - 1))
|
||||
ret = -ENOMEM;
|
||||
else
|
||||
io_map_base = base;
|
||||
*haddr = io_map_base = base;
|
||||
|
||||
mutex_unlock(&kvm_hyp_pgd_mutex);
|
||||
|
||||
if (ret)
|
||||
goto out;
|
||||
return ret;
|
||||
}
|
||||
|
||||
ret = __create_hyp_mappings(base, size, phys_addr, prot);
|
||||
if (ret)
|
||||
goto out;
|
||||
static int __create_hyp_private_mapping(phys_addr_t phys_addr, size_t size,
|
||||
unsigned long *haddr,
|
||||
enum kvm_pgtable_prot prot)
|
||||
{
|
||||
unsigned long addr;
|
||||
int ret = 0;
|
||||
|
||||
*haddr = base + offset_in_page(phys_addr);
|
||||
out:
|
||||
if (!kvm_host_owns_hyp_mappings()) {
|
||||
addr = kvm_call_hyp_nvhe(__pkvm_create_private_mapping,
|
||||
phys_addr, size, prot);
|
||||
if (IS_ERR_VALUE(addr))
|
||||
return addr;
|
||||
*haddr = addr;
|
||||
|
||||
return 0;
|
||||
}
|
||||
|
||||
size = PAGE_ALIGN(size + offset_in_page(phys_addr));
|
||||
ret = hyp_alloc_private_va_range(size, &addr);
|
||||
if (ret)
|
||||
return ret;
|
||||
|
||||
ret = __create_hyp_mappings(addr, size, phys_addr, prot);
|
||||
if (ret)
|
||||
return ret;
|
||||
|
||||
*haddr = addr + offset_in_page(phys_addr);
|
||||
return ret;
|
||||
}
|
||||
|
||||
|
@ -774,8 +774,7 @@ void kvm_host_pmu_init(struct arm_pmu *pmu)
|
||||
{
|
||||
struct arm_pmu_entry *entry;
|
||||
|
||||
if (pmu->pmuver == 0 || pmu->pmuver == ID_AA64DFR0_PMUVER_IMP_DEF ||
|
||||
is_protected_kvm_enabled())
|
||||
if (pmu->pmuver == 0 || pmu->pmuver == ID_AA64DFR0_PMUVER_IMP_DEF)
|
||||
return;
|
||||
|
||||
mutex_lock(&arm_pmus_lock);
|
||||
|
@ -5,7 +5,8 @@
|
||||
*/
|
||||
#include <linux/kvm_host.h>
|
||||
#include <linux/perf_event.h>
|
||||
#include <asm/kvm_hyp.h>
|
||||
|
||||
static DEFINE_PER_CPU(struct kvm_pmu_events, kvm_pmu_events);
|
||||
|
||||
/*
|
||||
* Given the perf event attributes and system type, determine
|
||||
@ -25,21 +26,26 @@ static bool kvm_pmu_switch_needed(struct perf_event_attr *attr)
|
||||
return (attr->exclude_host != attr->exclude_guest);
|
||||
}
|
||||
|
||||
struct kvm_pmu_events *kvm_get_pmu_events(void)
|
||||
{
|
||||
return this_cpu_ptr(&kvm_pmu_events);
|
||||
}
|
||||
|
||||
/*
|
||||
* Add events to track that we may want to switch at guest entry/exit
|
||||
* time.
|
||||
*/
|
||||
void kvm_set_pmu_events(u32 set, struct perf_event_attr *attr)
|
||||
{
|
||||
struct kvm_host_data *ctx = this_cpu_ptr_hyp_sym(kvm_host_data);
|
||||
struct kvm_pmu_events *pmu = kvm_get_pmu_events();
|
||||
|
||||
if (!kvm_arm_support_pmu_v3() || !ctx || !kvm_pmu_switch_needed(attr))
|
||||
if (!kvm_arm_support_pmu_v3() || !pmu || !kvm_pmu_switch_needed(attr))
|
||||
return;
|
||||
|
||||
if (!attr->exclude_host)
|
||||
ctx->pmu_events.events_host |= set;
|
||||
pmu->events_host |= set;
|
||||
if (!attr->exclude_guest)
|
||||
ctx->pmu_events.events_guest |= set;
|
||||
pmu->events_guest |= set;
|
||||
}
|
||||
|
||||
/*
|
||||
@ -47,13 +53,13 @@ void kvm_set_pmu_events(u32 set, struct perf_event_attr *attr)
|
||||
*/
|
||||
void kvm_clr_pmu_events(u32 clr)
|
||||
{
|
||||
struct kvm_host_data *ctx = this_cpu_ptr_hyp_sym(kvm_host_data);
|
||||
struct kvm_pmu_events *pmu = kvm_get_pmu_events();
|
||||
|
||||
if (!kvm_arm_support_pmu_v3() || !ctx)
|
||||
if (!kvm_arm_support_pmu_v3() || !pmu)
|
||||
return;
|
||||
|
||||
ctx->pmu_events.events_host &= ~clr;
|
||||
ctx->pmu_events.events_guest &= ~clr;
|
||||
pmu->events_host &= ~clr;
|
||||
pmu->events_guest &= ~clr;
|
||||
}
|
||||
|
||||
#define PMEVTYPER_READ_CASE(idx) \
|
||||
@ -169,16 +175,16 @@ static void kvm_vcpu_pmu_disable_el0(unsigned long events)
|
||||
*/
|
||||
void kvm_vcpu_pmu_restore_guest(struct kvm_vcpu *vcpu)
|
||||
{
|
||||
struct kvm_host_data *host;
|
||||
struct kvm_pmu_events *pmu;
|
||||
u32 events_guest, events_host;
|
||||
|
||||
if (!kvm_arm_support_pmu_v3() || !has_vhe())
|
||||
return;
|
||||
|
||||
preempt_disable();
|
||||
host = this_cpu_ptr_hyp_sym(kvm_host_data);
|
||||
events_guest = host->pmu_events.events_guest;
|
||||
events_host = host->pmu_events.events_host;
|
||||
pmu = kvm_get_pmu_events();
|
||||
events_guest = pmu->events_guest;
|
||||
events_host = pmu->events_host;
|
||||
|
||||
kvm_vcpu_pmu_enable_el0(events_guest);
|
||||
kvm_vcpu_pmu_disable_el0(events_host);
|
||||
@ -190,15 +196,15 @@ void kvm_vcpu_pmu_restore_guest(struct kvm_vcpu *vcpu)
|
||||
*/
|
||||
void kvm_vcpu_pmu_restore_host(struct kvm_vcpu *vcpu)
|
||||
{
|
||||
struct kvm_host_data *host;
|
||||
struct kvm_pmu_events *pmu;
|
||||
u32 events_guest, events_host;
|
||||
|
||||
if (!kvm_arm_support_pmu_v3() || !has_vhe())
|
||||
return;
|
||||
|
||||
host = this_cpu_ptr_hyp_sym(kvm_host_data);
|
||||
events_guest = host->pmu_events.events_guest;
|
||||
events_host = host->pmu_events.events_host;
|
||||
pmu = kvm_get_pmu_events();
|
||||
events_guest = pmu->events_guest;
|
||||
events_host = pmu->events_host;
|
||||
|
||||
kvm_vcpu_pmu_enable_el0(events_host);
|
||||
kvm_vcpu_pmu_disable_el0(events_guest);
|
||||
|
@ -51,13 +51,6 @@ static unsigned long kvm_psci_vcpu_suspend(struct kvm_vcpu *vcpu)
|
||||
return PSCI_RET_SUCCESS;
|
||||
}
|
||||
|
||||
static void kvm_psci_vcpu_off(struct kvm_vcpu *vcpu)
|
||||
{
|
||||
vcpu->arch.power_off = true;
|
||||
kvm_make_request(KVM_REQ_SLEEP, vcpu);
|
||||
kvm_vcpu_kick(vcpu);
|
||||
}
|
||||
|
||||
static inline bool kvm_psci_valid_affinity(struct kvm_vcpu *vcpu,
|
||||
unsigned long affinity)
|
||||
{
|
||||
@ -83,7 +76,7 @@ static unsigned long kvm_psci_vcpu_on(struct kvm_vcpu *source_vcpu)
|
||||
*/
|
||||
if (!vcpu)
|
||||
return PSCI_RET_INVALID_PARAMS;
|
||||
if (!vcpu->arch.power_off) {
|
||||
if (!kvm_arm_vcpu_stopped(vcpu)) {
|
||||
if (kvm_psci_version(source_vcpu) != KVM_ARM_PSCI_0_1)
|
||||
return PSCI_RET_ALREADY_ON;
|
||||
else
|
||||
@ -107,12 +100,12 @@ static unsigned long kvm_psci_vcpu_on(struct kvm_vcpu *source_vcpu)
|
||||
kvm_make_request(KVM_REQ_VCPU_RESET, vcpu);
|
||||
|
||||
/*
|
||||
* Make sure the reset request is observed if the change to
|
||||
* power_off is observed.
|
||||
* Make sure the reset request is observed if the RUNNABLE mp_state is
|
||||
* observed.
|
||||
*/
|
||||
smp_wmb();
|
||||
|
||||
vcpu->arch.power_off = false;
|
||||
vcpu->arch.mp_state.mp_state = KVM_MP_STATE_RUNNABLE;
|
||||
kvm_vcpu_wake_up(vcpu);
|
||||
|
||||
return PSCI_RET_SUCCESS;
|
||||
@ -150,7 +143,7 @@ static unsigned long kvm_psci_vcpu_affinity_info(struct kvm_vcpu *vcpu)
|
||||
mpidr = kvm_vcpu_get_mpidr_aff(tmp);
|
||||
if ((mpidr & target_affinity_mask) == target_affinity) {
|
||||
matching_cpus++;
|
||||
if (!tmp->arch.power_off)
|
||||
if (!kvm_arm_vcpu_stopped(tmp))
|
||||
return PSCI_0_2_AFFINITY_LEVEL_ON;
|
||||
}
|
||||
}
|
||||
@ -176,7 +169,7 @@ static void kvm_prepare_system_event(struct kvm_vcpu *vcpu, u32 type, u64 flags)
|
||||
* re-initialized.
|
||||
*/
|
||||
kvm_for_each_vcpu(i, tmp, vcpu->kvm)
|
||||
tmp->arch.power_off = true;
|
||||
tmp->arch.mp_state.mp_state = KVM_MP_STATE_STOPPED;
|
||||
kvm_make_all_cpus_request(vcpu->kvm, KVM_REQ_SLEEP);
|
||||
|
||||
memset(&vcpu->run->system_event, 0, sizeof(vcpu->run->system_event));
|
||||
@ -202,6 +195,15 @@ static void kvm_psci_system_reset2(struct kvm_vcpu *vcpu)
|
||||
KVM_SYSTEM_EVENT_RESET_FLAG_PSCI_RESET2);
|
||||
}
|
||||
|
||||
static void kvm_psci_system_suspend(struct kvm_vcpu *vcpu)
|
||||
{
|
||||
struct kvm_run *run = vcpu->run;
|
||||
|
||||
memset(&run->system_event, 0, sizeof(vcpu->run->system_event));
|
||||
run->system_event.type = KVM_SYSTEM_EVENT_SUSPEND;
|
||||
run->exit_reason = KVM_EXIT_SYSTEM_EVENT;
|
||||
}
|
||||
|
||||
static void kvm_psci_narrow_to_32bit(struct kvm_vcpu *vcpu)
|
||||
{
|
||||
int i;
|
||||
@ -245,7 +247,7 @@ static int kvm_psci_0_2_call(struct kvm_vcpu *vcpu)
|
||||
val = kvm_psci_vcpu_suspend(vcpu);
|
||||
break;
|
||||
case PSCI_0_2_FN_CPU_OFF:
|
||||
kvm_psci_vcpu_off(vcpu);
|
||||
kvm_arm_vcpu_power_off(vcpu);
|
||||
val = PSCI_RET_SUCCESS;
|
||||
break;
|
||||
case PSCI_0_2_FN_CPU_ON:
|
||||
@ -305,9 +307,10 @@ static int kvm_psci_0_2_call(struct kvm_vcpu *vcpu)
|
||||
|
||||
static int kvm_psci_1_x_call(struct kvm_vcpu *vcpu, u32 minor)
|
||||
{
|
||||
unsigned long val = PSCI_RET_NOT_SUPPORTED;
|
||||
u32 psci_fn = smccc_get_function(vcpu);
|
||||
struct kvm *kvm = vcpu->kvm;
|
||||
u32 arg;
|
||||
unsigned long val;
|
||||
int ret = 1;
|
||||
|
||||
switch(psci_fn) {
|
||||
@ -320,6 +323,8 @@ static int kvm_psci_1_x_call(struct kvm_vcpu *vcpu, u32 minor)
|
||||
if (val)
|
||||
break;
|
||||
|
||||
val = PSCI_RET_NOT_SUPPORTED;
|
||||
|
||||
switch(arg) {
|
||||
case PSCI_0_2_FN_PSCI_VERSION:
|
||||
case PSCI_0_2_FN_CPU_SUSPEND:
|
||||
@ -336,18 +341,32 @@ static int kvm_psci_1_x_call(struct kvm_vcpu *vcpu, u32 minor)
|
||||
case ARM_SMCCC_VERSION_FUNC_ID:
|
||||
val = 0;
|
||||
break;
|
||||
case PSCI_1_0_FN_SYSTEM_SUSPEND:
|
||||
case PSCI_1_0_FN64_SYSTEM_SUSPEND:
|
||||
if (test_bit(KVM_ARCH_FLAG_SYSTEM_SUSPEND_ENABLED, &kvm->arch.flags))
|
||||
val = 0;
|
||||
break;
|
||||
case PSCI_1_1_FN_SYSTEM_RESET2:
|
||||
case PSCI_1_1_FN64_SYSTEM_RESET2:
|
||||
if (minor >= 1) {
|
||||
if (minor >= 1)
|
||||
val = 0;
|
||||
break;
|
||||
}
|
||||
fallthrough;
|
||||
default:
|
||||
val = PSCI_RET_NOT_SUPPORTED;
|
||||
break;
|
||||
}
|
||||
break;
|
||||
case PSCI_1_0_FN_SYSTEM_SUSPEND:
|
||||
kvm_psci_narrow_to_32bit(vcpu);
|
||||
fallthrough;
|
||||
case PSCI_1_0_FN64_SYSTEM_SUSPEND:
|
||||
/*
|
||||
* Return directly to userspace without changing the vCPU's
|
||||
* registers. Userspace depends on reading the SMCCC parameters
|
||||
* to implement SYSTEM_SUSPEND.
|
||||
*/
|
||||
if (test_bit(KVM_ARCH_FLAG_SYSTEM_SUSPEND_ENABLED, &kvm->arch.flags)) {
|
||||
kvm_psci_system_suspend(vcpu);
|
||||
return 0;
|
||||
}
|
||||
break;
|
||||
case PSCI_1_1_FN_SYSTEM_RESET2:
|
||||
kvm_psci_narrow_to_32bit(vcpu);
|
||||
fallthrough;
|
||||
@ -365,7 +384,7 @@ static int kvm_psci_1_x_call(struct kvm_vcpu *vcpu, u32 minor)
|
||||
val = PSCI_RET_INVALID_PARAMS;
|
||||
break;
|
||||
}
|
||||
fallthrough;
|
||||
break;
|
||||
default:
|
||||
return kvm_psci_0_2_call(vcpu);
|
||||
}
|
||||
@ -382,7 +401,7 @@ static int kvm_psci_0_1_call(struct kvm_vcpu *vcpu)
|
||||
|
||||
switch (psci_fn) {
|
||||
case KVM_PSCI_FN_CPU_OFF:
|
||||
kvm_psci_vcpu_off(vcpu);
|
||||
kvm_arm_vcpu_power_off(vcpu);
|
||||
val = PSCI_RET_SUCCESS;
|
||||
break;
|
||||
case KVM_PSCI_FN_CPU_ON:
|
||||
@ -437,186 +456,3 @@ int kvm_psci_call(struct kvm_vcpu *vcpu)
|
||||
return -EINVAL;
|
||||
}
|
||||
}
|
||||
|
||||
int kvm_arm_get_fw_num_regs(struct kvm_vcpu *vcpu)
|
||||
{
|
||||
return 4; /* PSCI version and three workaround registers */
|
||||
}
|
||||
|
||||
int kvm_arm_copy_fw_reg_indices(struct kvm_vcpu *vcpu, u64 __user *uindices)
|
||||
{
|
||||
if (put_user(KVM_REG_ARM_PSCI_VERSION, uindices++))
|
||||
return -EFAULT;
|
||||
|
||||
if (put_user(KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_1, uindices++))
|
||||
return -EFAULT;
|
||||
|
||||
if (put_user(KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_2, uindices++))
|
||||
return -EFAULT;
|
||||
|
||||
if (put_user(KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_3, uindices++))
|
||||
return -EFAULT;
|
||||
|
||||
return 0;
|
||||
}
|
||||
|
||||
#define KVM_REG_FEATURE_LEVEL_WIDTH 4
|
||||
#define KVM_REG_FEATURE_LEVEL_MASK (BIT(KVM_REG_FEATURE_LEVEL_WIDTH) - 1)
|
||||
|
||||
/*
|
||||
* Convert the workaround level into an easy-to-compare number, where higher
|
||||
* values mean better protection.
|
||||
*/
|
||||
static int get_kernel_wa_level(u64 regid)
|
||||
{
|
||||
switch (regid) {
|
||||
case KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_1:
|
||||
switch (arm64_get_spectre_v2_state()) {
|
||||
case SPECTRE_VULNERABLE:
|
||||
return KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_1_NOT_AVAIL;
|
||||
case SPECTRE_MITIGATED:
|
||||
return KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_1_AVAIL;
|
||||
case SPECTRE_UNAFFECTED:
|
||||
return KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_1_NOT_REQUIRED;
|
||||
}
|
||||
return KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_1_NOT_AVAIL;
|
||||
case KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_2:
|
||||
switch (arm64_get_spectre_v4_state()) {
|
||||
case SPECTRE_MITIGATED:
|
||||
/*
|
||||
* As for the hypercall discovery, we pretend we
|
||||
* don't have any FW mitigation if SSBS is there at
|
||||
* all times.
|
||||
*/
|
||||
if (cpus_have_final_cap(ARM64_SSBS))
|
||||
return KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_2_NOT_AVAIL;
|
||||
fallthrough;
|
||||
case SPECTRE_UNAFFECTED:
|
||||
return KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_2_NOT_REQUIRED;
|
||||
case SPECTRE_VULNERABLE:
|
||||
return KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_2_NOT_AVAIL;
|
||||
}
|
||||
break;
|
||||
case KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_3:
|
||||
switch (arm64_get_spectre_bhb_state()) {
|
||||
case SPECTRE_VULNERABLE:
|
||||
return KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_3_NOT_AVAIL;
|
||||
case SPECTRE_MITIGATED:
|
||||
return KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_3_AVAIL;
|
||||
case SPECTRE_UNAFFECTED:
|
||||
return KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_3_NOT_REQUIRED;
|
||||
}
|
||||
return KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_3_NOT_AVAIL;
|
||||
}
|
||||
|
||||
return -EINVAL;
|
||||
}
|
||||
|
||||
int kvm_arm_get_fw_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg)
|
||||
{
|
||||
void __user *uaddr = (void __user *)(long)reg->addr;
|
||||
u64 val;
|
||||
|
||||
switch (reg->id) {
|
||||
case KVM_REG_ARM_PSCI_VERSION:
|
||||
val = kvm_psci_version(vcpu);
|
||||
break;
|
||||
case KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_1:
|
||||
case KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_2:
|
||||
case KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_3:
|
||||
val = get_kernel_wa_level(reg->id) & KVM_REG_FEATURE_LEVEL_MASK;
|
||||
break;
|
||||
default:
|
||||
return -ENOENT;
|
||||
}
|
||||
|
||||
if (copy_to_user(uaddr, &val, KVM_REG_SIZE(reg->id)))
|
||||
return -EFAULT;
|
||||
|
||||
return 0;
|
||||
}
|
||||
|
||||
int kvm_arm_set_fw_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg)
|
||||
{
|
||||
void __user *uaddr = (void __user *)(long)reg->addr;
|
||||
u64 val;
|
||||
int wa_level;
|
||||
|
||||
if (copy_from_user(&val, uaddr, KVM_REG_SIZE(reg->id)))
|
||||
return -EFAULT;
|
||||
|
||||
switch (reg->id) {
|
||||
case KVM_REG_ARM_PSCI_VERSION:
|
||||
{
|
||||
bool wants_02;
|
||||
|
||||
wants_02 = test_bit(KVM_ARM_VCPU_PSCI_0_2, vcpu->arch.features);
|
||||
|
||||
switch (val) {
|
||||
case KVM_ARM_PSCI_0_1:
|
||||
if (wants_02)
|
||||
return -EINVAL;
|
||||
vcpu->kvm->arch.psci_version = val;
|
||||
return 0;
|
||||
case KVM_ARM_PSCI_0_2:
|
||||
case KVM_ARM_PSCI_1_0:
|
||||
case KVM_ARM_PSCI_1_1:
|
||||
if (!wants_02)
|
||||
return -EINVAL;
|
||||
vcpu->kvm->arch.psci_version = val;
|
||||
return 0;
|
||||
}
|
||||
break;
|
||||
}
|
||||
|
||||
case KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_1:
|
||||
case KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_3:
|
||||
if (val & ~KVM_REG_FEATURE_LEVEL_MASK)
|
||||
return -EINVAL;
|
||||
|
||||
if (get_kernel_wa_level(reg->id) < val)
|
||||
return -EINVAL;
|
||||
|
||||
return 0;
|
||||
|
||||
case KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_2:
|
||||
if (val & ~(KVM_REG_FEATURE_LEVEL_MASK |
|
||||
KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_2_ENABLED))
|
||||
return -EINVAL;
|
||||
|
||||
/* The enabled bit must not be set unless the level is AVAIL. */
|
||||
if ((val & KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_2_ENABLED) &&
|
||||
(val & KVM_REG_FEATURE_LEVEL_MASK) != KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_2_AVAIL)
|
||||
return -EINVAL;
|
||||
|
||||
/*
|
||||
* Map all the possible incoming states to the only two we
|
||||
* really want to deal with.
|
||||
*/
|
||||
switch (val & KVM_REG_FEATURE_LEVEL_MASK) {
|
||||
case KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_2_NOT_AVAIL:
|
||||
case KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_2_UNKNOWN:
|
||||
wa_level = KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_2_NOT_AVAIL;
|
||||
break;
|
||||
case KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_2_AVAIL:
|
||||
case KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_2_NOT_REQUIRED:
|
||||
wa_level = KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_2_NOT_REQUIRED;
|
||||
break;
|
||||
default:
|
||||
return -EINVAL;
|
||||
}
|
||||
|
||||
/*
|
||||
* We can deal with NOT_AVAIL on NOT_REQUIRED, but not the
|
||||
* other way around.
|
||||
*/
|
||||
if (get_kernel_wa_level(reg->id) < wa_level)
|
||||
return -EINVAL;
|
||||
|
||||
return 0;
|
||||
default:
|
||||
return -ENOENT;
|
||||
}
|
||||
|
||||
return -EINVAL;
|
||||
}
|
||||
|
@ -1145,6 +1145,8 @@ static u64 read_id_reg(const struct kvm_vcpu *vcpu,
|
||||
if (!vcpu_has_ptrauth(vcpu))
|
||||
val &= ~(ARM64_FEATURE_MASK(ID_AA64ISAR2_APA3) |
|
||||
ARM64_FEATURE_MASK(ID_AA64ISAR2_GPA3));
|
||||
if (!cpus_have_final_cap(ARM64_HAS_WFXT))
|
||||
val &= ~ARM64_FEATURE_MASK(ID_AA64ISAR2_WFXT);
|
||||
break;
|
||||
case SYS_ID_AA64DFR0_EL1:
|
||||
/* Limit debug to ARMv8.0 */
|
||||
@ -2020,20 +2022,22 @@ static const struct sys_reg_desc cp14_64_regs[] = {
|
||||
{ Op1( 0), CRm( 2), .access = trap_raz_wi },
|
||||
};
|
||||
|
||||
#define CP15_PMU_SYS_REG(_map, _Op1, _CRn, _CRm, _Op2) \
|
||||
AA32(_map), \
|
||||
Op1(_Op1), CRn(_CRn), CRm(_CRm), Op2(_Op2), \
|
||||
.visibility = pmu_visibility
|
||||
|
||||
/* Macro to expand the PMEVCNTRn register */
|
||||
#define PMU_PMEVCNTR(n) \
|
||||
/* PMEVCNTRn */ \
|
||||
{ Op1(0), CRn(0b1110), \
|
||||
CRm((0b1000 | (((n) >> 3) & 0x3))), Op2(((n) & 0x7)), \
|
||||
access_pmu_evcntr }
|
||||
{ CP15_PMU_SYS_REG(DIRECT, 0, 0b1110, \
|
||||
(0b1000 | (((n) >> 3) & 0x3)), ((n) & 0x7)), \
|
||||
.access = access_pmu_evcntr }
|
||||
|
||||
/* Macro to expand the PMEVTYPERn register */
|
||||
#define PMU_PMEVTYPER(n) \
|
||||
/* PMEVTYPERn */ \
|
||||
{ Op1(0), CRn(0b1110), \
|
||||
CRm((0b1100 | (((n) >> 3) & 0x3))), Op2(((n) & 0x7)), \
|
||||
access_pmu_evtyper }
|
||||
|
||||
{ CP15_PMU_SYS_REG(DIRECT, 0, 0b1110, \
|
||||
(0b1100 | (((n) >> 3) & 0x3)), ((n) & 0x7)), \
|
||||
.access = access_pmu_evtyper }
|
||||
/*
|
||||
* Trapped cp15 registers. TTBR0/TTBR1 get a double encoding,
|
||||
* depending on the way they are accessed (as a 32bit or a 64bit
|
||||
@ -2073,25 +2077,25 @@ static const struct sys_reg_desc cp15_regs[] = {
|
||||
{ Op1( 0), CRn( 7), CRm(14), Op2( 2), access_dcsw },
|
||||
|
||||
/* PMU */
|
||||
{ Op1( 0), CRn( 9), CRm(12), Op2( 0), access_pmcr },
|
||||
{ Op1( 0), CRn( 9), CRm(12), Op2( 1), access_pmcnten },
|
||||
{ Op1( 0), CRn( 9), CRm(12), Op2( 2), access_pmcnten },
|
||||
{ Op1( 0), CRn( 9), CRm(12), Op2( 3), access_pmovs },
|
||||
{ Op1( 0), CRn( 9), CRm(12), Op2( 4), access_pmswinc },
|
||||
{ Op1( 0), CRn( 9), CRm(12), Op2( 5), access_pmselr },
|
||||
{ AA32(LO), Op1( 0), CRn( 9), CRm(12), Op2( 6), access_pmceid },
|
||||
{ AA32(LO), Op1( 0), CRn( 9), CRm(12), Op2( 7), access_pmceid },
|
||||
{ Op1( 0), CRn( 9), CRm(13), Op2( 0), access_pmu_evcntr },
|
||||
{ Op1( 0), CRn( 9), CRm(13), Op2( 1), access_pmu_evtyper },
|
||||
{ Op1( 0), CRn( 9), CRm(13), Op2( 2), access_pmu_evcntr },
|
||||
{ Op1( 0), CRn( 9), CRm(14), Op2( 0), access_pmuserenr },
|
||||
{ Op1( 0), CRn( 9), CRm(14), Op2( 1), access_pminten },
|
||||
{ Op1( 0), CRn( 9), CRm(14), Op2( 2), access_pminten },
|
||||
{ Op1( 0), CRn( 9), CRm(14), Op2( 3), access_pmovs },
|
||||
{ AA32(HI), Op1( 0), CRn( 9), CRm(14), Op2( 4), access_pmceid },
|
||||
{ AA32(HI), Op1( 0), CRn( 9), CRm(14), Op2( 5), access_pmceid },
|
||||
{ CP15_PMU_SYS_REG(DIRECT, 0, 9, 12, 0), .access = access_pmcr },
|
||||
{ CP15_PMU_SYS_REG(DIRECT, 0, 9, 12, 1), .access = access_pmcnten },
|
||||
{ CP15_PMU_SYS_REG(DIRECT, 0, 9, 12, 2), .access = access_pmcnten },
|
||||
{ CP15_PMU_SYS_REG(DIRECT, 0, 9, 12, 3), .access = access_pmovs },
|
||||
{ CP15_PMU_SYS_REG(DIRECT, 0, 9, 12, 4), .access = access_pmswinc },
|
||||
{ CP15_PMU_SYS_REG(DIRECT, 0, 9, 12, 5), .access = access_pmselr },
|
||||
{ CP15_PMU_SYS_REG(LO, 0, 9, 12, 6), .access = access_pmceid },
|
||||
{ CP15_PMU_SYS_REG(LO, 0, 9, 12, 7), .access = access_pmceid },
|
||||
{ CP15_PMU_SYS_REG(DIRECT, 0, 9, 13, 0), .access = access_pmu_evcntr },
|
||||
{ CP15_PMU_SYS_REG(DIRECT, 0, 9, 13, 1), .access = access_pmu_evtyper },
|
||||
{ CP15_PMU_SYS_REG(DIRECT, 0, 9, 13, 2), .access = access_pmu_evcntr },
|
||||
{ CP15_PMU_SYS_REG(DIRECT, 0, 9, 14, 0), .access = access_pmuserenr },
|
||||
{ CP15_PMU_SYS_REG(DIRECT, 0, 9, 14, 1), .access = access_pminten },
|
||||
{ CP15_PMU_SYS_REG(DIRECT, 0, 9, 14, 2), .access = access_pminten },
|
||||
{ CP15_PMU_SYS_REG(DIRECT, 0, 9, 14, 3), .access = access_pmovs },
|
||||
{ CP15_PMU_SYS_REG(HI, 0, 9, 14, 4), .access = access_pmceid },
|
||||
{ CP15_PMU_SYS_REG(HI, 0, 9, 14, 5), .access = access_pmceid },
|
||||
/* PMMIR */
|
||||
{ Op1( 0), CRn( 9), CRm(14), Op2( 6), trap_raz_wi },
|
||||
{ CP15_PMU_SYS_REG(DIRECT, 0, 9, 14, 6), .access = trap_raz_wi },
|
||||
|
||||
/* PRRR/MAIR0 */
|
||||
{ AA32(LO), Op1( 0), CRn(10), CRm( 2), Op2( 0), access_vm_reg, NULL, MAIR_EL1 },
|
||||
@ -2176,7 +2180,7 @@ static const struct sys_reg_desc cp15_regs[] = {
|
||||
PMU_PMEVTYPER(29),
|
||||
PMU_PMEVTYPER(30),
|
||||
/* PMCCFILTR */
|
||||
{ Op1(0), CRn(14), CRm(15), Op2(7), access_pmu_evtyper },
|
||||
{ CP15_PMU_SYS_REG(DIRECT, 0, 14, 15, 7), .access = access_pmu_evtyper },
|
||||
|
||||
{ Op1(1), CRn( 0), CRm( 0), Op2(0), access_ccsidr },
|
||||
{ Op1(1), CRn( 0), CRm( 0), Op2(1), access_clidr },
|
||||
@ -2185,7 +2189,7 @@ static const struct sys_reg_desc cp15_regs[] = {
|
||||
|
||||
static const struct sys_reg_desc cp15_64_regs[] = {
|
||||
{ Op1( 0), CRn( 0), CRm( 2), Op2( 0), access_vm_reg, NULL, TTBR0_EL1 },
|
||||
{ Op1( 0), CRn( 0), CRm( 9), Op2( 0), access_pmu_evcntr },
|
||||
{ CP15_PMU_SYS_REG(DIRECT, 0, 0, 9, 0), .access = access_pmu_evcntr },
|
||||
{ Op1( 0), CRn( 0), CRm(12), Op2( 0), access_gic_sgi }, /* ICC_SGI1R */
|
||||
{ Op1( 1), CRn( 0), CRm( 2), Op2( 0), access_vm_reg, NULL, TTBR1_EL1 },
|
||||
{ Op1( 1), CRn( 0), CRm(12), Op2( 0), access_gic_sgi }, /* ICC_ASGI1R */
|
||||
@ -2193,25 +2197,24 @@ static const struct sys_reg_desc cp15_64_regs[] = {
|
||||
{ SYS_DESC(SYS_AARCH32_CNTP_CVAL), access_arch_timer },
|
||||
};
|
||||
|
||||
static int check_sysreg_table(const struct sys_reg_desc *table, unsigned int n,
			      bool is_32)
static bool check_sysreg_table(const struct sys_reg_desc *table, unsigned int n,
			       bool is_32)
{
	unsigned int i;

	for (i = 0; i < n; i++) {
		if (!is_32 && table[i].reg && !table[i].reset) {
			kvm_err("sys_reg table %p entry %d has lacks reset\n",
				table, i);
			return 1;
			kvm_err("sys_reg table %pS entry %d lacks reset\n", &table[i], i);
			return false;
		}

		if (i && cmp_sys_reg(&table[i-1], &table[i]) >= 0) {
			kvm_err("sys_reg table %p out of order (%d)\n", table, i - 1);
			return 1;
			kvm_err("sys_reg table %pS entry %d out of order\n", &table[i - 1], i - 1);
			return false;
		}
	}

	return 0;
	return true;
}
|
||||
|
||||
int kvm_handle_cp14_load_store(struct kvm_vcpu *vcpu)
|
||||
@ -2252,27 +2255,27 @@ static void perform_access(struct kvm_vcpu *vcpu,
|
||||
* @table: array of trap descriptors
|
||||
* @num: size of the trap descriptor array
|
||||
*
|
||||
* Return 0 if the access has been handled, and -1 if not.
|
||||
* Return true if the access has been handled, false if not.
|
||||
*/
|
||||
static int emulate_cp(struct kvm_vcpu *vcpu,
|
||||
struct sys_reg_params *params,
|
||||
const struct sys_reg_desc *table,
|
||||
size_t num)
|
||||
static bool emulate_cp(struct kvm_vcpu *vcpu,
|
||||
struct sys_reg_params *params,
|
||||
const struct sys_reg_desc *table,
|
||||
size_t num)
|
||||
{
|
||||
const struct sys_reg_desc *r;
|
||||
|
||||
if (!table)
|
||||
return -1; /* Not handled */
|
||||
return false; /* Not handled */
|
||||
|
||||
r = find_reg(params, table, num);
|
||||
|
||||
if (r) {
|
||||
perform_access(vcpu, params, r);
|
||||
return 0;
|
||||
return true;
|
||||
}
|
||||
|
||||
/* Not handled */
|
||||
return -1;
|
||||
return false;
|
||||
}
|
||||
|
||||
static void unhandled_cp_access(struct kvm_vcpu *vcpu,
|
||||
@ -2336,7 +2339,7 @@ static int kvm_handle_cp_64(struct kvm_vcpu *vcpu,
|
||||
* potential register operation in the case of a read and return
|
||||
* with success.
|
||||
*/
|
||||
if (!emulate_cp(vcpu, ¶ms, global, nr_global)) {
|
||||
if (emulate_cp(vcpu, ¶ms, global, nr_global)) {
|
||||
/* Split up the value between registers for the read side */
|
||||
if (!params.is_write) {
|
||||
vcpu_set_reg(vcpu, Rt, lower_32_bits(params.regval));
|
||||
@ -2350,34 +2353,144 @@ static int kvm_handle_cp_64(struct kvm_vcpu *vcpu,
|
||||
return 1;
|
||||
}
|
||||
|
||||
static bool emulate_sys_reg(struct kvm_vcpu *vcpu, struct sys_reg_params *params);

/*
 * The CP10 ID registers are architecturally mapped to AArch64 feature
 * registers. Abuse that fact so we can rely on the AArch64 handler for accesses
 * from AArch32.
 */
static bool kvm_esr_cp10_id_to_sys64(u64 esr, struct sys_reg_params *params)
{
	u8 reg_id = (esr >> 10) & 0xf;
	bool valid;

	params->is_write = ((esr & 1) == 0);
	params->Op0 = 3;
	params->Op1 = 0;
	params->CRn = 0;
	params->CRm = 3;

	/* CP10 ID registers are read-only */
	valid = !params->is_write;

	switch (reg_id) {
	/* MVFR0 */
	case 0b0111:
		params->Op2 = 0;
		break;
	/* MVFR1 */
	case 0b0110:
		params->Op2 = 1;
		break;
	/* MVFR2 */
	case 0b0101:
		params->Op2 = 2;
		break;
	default:
		valid = false;
	}

	if (valid)
		return true;

	kvm_pr_unimpl("Unhandled cp10 register %s: %u\n",
		      params->is_write ? "write" : "read", reg_id);
	return false;
}
|
||||
|
||||
/**
|
||||
* kvm_handle_cp10_id() - Handles a VMRS trap on guest access to a 'Media and
|
||||
* VFP Register' from AArch32.
|
||||
* @vcpu: The vCPU pointer
|
||||
*
|
||||
* MVFR{0-2} are architecturally mapped to the AArch64 MVFR{0-2}_EL1 registers.
|
||||
* Work out the correct AArch64 system register encoding and reroute to the
|
||||
* AArch64 system register emulation.
|
||||
*/
|
||||
int kvm_handle_cp10_id(struct kvm_vcpu *vcpu)
|
||||
{
|
||||
int Rt = kvm_vcpu_sys_get_rt(vcpu);
|
||||
u64 esr = kvm_vcpu_get_esr(vcpu);
|
||||
struct sys_reg_params params;
|
||||
|
||||
/* UNDEF on any unhandled register access */
|
||||
if (!kvm_esr_cp10_id_to_sys64(esr, ¶ms)) {
|
||||
kvm_inject_undefined(vcpu);
|
||||
return 1;
|
||||
}
|
||||
|
||||
if (emulate_sys_reg(vcpu, ¶ms))
|
||||
vcpu_set_reg(vcpu, Rt, params.regval);
|
||||
|
||||
return 1;
|
||||
}
|
||||
|
||||
/**
|
||||
* kvm_emulate_cp15_id_reg() - Handles an MRC trap on a guest CP15 access where
|
||||
* CRn=0, which corresponds to the AArch32 feature
|
||||
* registers.
|
||||
* @vcpu: the vCPU pointer
|
||||
* @params: the system register access parameters.
|
||||
*
|
||||
* Our cp15 system register tables do not enumerate the AArch32 feature
|
||||
* registers. Conveniently, our AArch64 table does, and the AArch32 system
|
||||
* register encoding can be trivially remapped into the AArch64 for the feature
|
||||
* registers: Append op0=3, leaving op1, CRn, CRm, and op2 the same.
|
||||
*
|
||||
* According to DDI0487G.b G7.3.1, paragraph "Behavior of VMSAv8-32 32-bit
|
||||
* System registers with (coproc=0b1111, CRn==c0)", read accesses from this
|
||||
* range are either UNKNOWN or RES0. Rerouting remains architectural as we
|
||||
* treat undefined registers in this range as RAZ.
|
||||
*/
|
||||
static int kvm_emulate_cp15_id_reg(struct kvm_vcpu *vcpu,
|
||||
struct sys_reg_params *params)
|
||||
{
|
||||
int Rt = kvm_vcpu_sys_get_rt(vcpu);
|
||||
|
||||
/* Treat impossible writes to RO registers as UNDEFINED */
|
||||
if (params->is_write) {
|
||||
unhandled_cp_access(vcpu, params);
|
||||
return 1;
|
||||
}
|
||||
|
||||
params->Op0 = 3;
|
||||
|
||||
/*
|
||||
* All registers where CRm > 3 are known to be UNKNOWN/RAZ from AArch32.
|
||||
* Avoid conflicting with future expansion of AArch64 feature registers
|
||||
* and simply treat them as RAZ here.
|
||||
*/
|
||||
if (params->CRm > 3)
|
||||
params->regval = 0;
|
||||
else if (!emulate_sys_reg(vcpu, params))
|
||||
return 1;
|
||||
|
||||
vcpu_set_reg(vcpu, Rt, params->regval);
|
||||
return 1;
|
||||
}
|
||||
|
||||
/**
|
||||
* kvm_handle_cp_32 -- handles a mrc/mcr trap on a guest CP14/CP15 access
|
||||
* @vcpu: The VCPU pointer
|
||||
* @run: The kvm_run struct
|
||||
*/
|
||||
static int kvm_handle_cp_32(struct kvm_vcpu *vcpu,
|
||||
struct sys_reg_params *params,
|
||||
const struct sys_reg_desc *global,
|
||||
size_t nr_global)
|
||||
{
|
||||
struct sys_reg_params params;
|
||||
u64 esr = kvm_vcpu_get_esr(vcpu);
|
||||
int Rt = kvm_vcpu_sys_get_rt(vcpu);
|
||||
|
||||
params.CRm = (esr >> 1) & 0xf;
|
||||
params.regval = vcpu_get_reg(vcpu, Rt);
|
||||
params.is_write = ((esr & 1) == 0);
|
||||
params.CRn = (esr >> 10) & 0xf;
|
||||
params.Op0 = 0;
|
||||
params.Op1 = (esr >> 14) & 0x7;
|
||||
params.Op2 = (esr >> 17) & 0x7;
|
||||
params->regval = vcpu_get_reg(vcpu, Rt);
|
||||
|
||||
if (!emulate_cp(vcpu, ¶ms, global, nr_global)) {
|
||||
if (!params.is_write)
|
||||
vcpu_set_reg(vcpu, Rt, params.regval);
|
||||
if (emulate_cp(vcpu, params, global, nr_global)) {
|
||||
if (!params->is_write)
|
||||
vcpu_set_reg(vcpu, Rt, params->regval);
|
||||
return 1;
|
||||
}
|
||||
|
||||
unhandled_cp_access(vcpu, ¶ms);
|
||||
unhandled_cp_access(vcpu, params);
|
||||
return 1;
|
||||
}
|
||||
|
||||
@ -2388,7 +2501,20 @@ int kvm_handle_cp15_64(struct kvm_vcpu *vcpu)
|
||||
|
||||
int kvm_handle_cp15_32(struct kvm_vcpu *vcpu)
|
||||
{
|
||||
return kvm_handle_cp_32(vcpu, cp15_regs, ARRAY_SIZE(cp15_regs));
|
||||
struct sys_reg_params params;
|
||||
|
||||
params = esr_cp1x_32_to_params(kvm_vcpu_get_esr(vcpu));
|
||||
|
||||
/*
|
||||
* Certain AArch32 ID registers are handled by rerouting to the AArch64
|
||||
* system register table. Registers in the ID range where CRm=0 are
|
||||
* excluded from this scheme as they do not trivially map into AArch64
|
||||
* system register encodings.
|
||||
*/
|
||||
if (params.Op1 == 0 && params.CRn == 0 && params.CRm)
|
||||
return kvm_emulate_cp15_id_reg(vcpu, ¶ms);
|
||||
|
||||
return kvm_handle_cp_32(vcpu, ¶ms, cp15_regs, ARRAY_SIZE(cp15_regs));
|
||||
}
|
||||
|
||||
int kvm_handle_cp14_64(struct kvm_vcpu *vcpu)
|
||||
@ -2398,7 +2524,11 @@ int kvm_handle_cp14_64(struct kvm_vcpu *vcpu)
|
||||
|
||||
int kvm_handle_cp14_32(struct kvm_vcpu *vcpu)
|
||||
{
|
||||
return kvm_handle_cp_32(vcpu, cp14_regs, ARRAY_SIZE(cp14_regs));
|
||||
struct sys_reg_params params;
|
||||
|
||||
params = esr_cp1x_32_to_params(kvm_vcpu_get_esr(vcpu));
|
||||
|
||||
return kvm_handle_cp_32(vcpu, ¶ms, cp14_regs, ARRAY_SIZE(cp14_regs));
|
||||
}
|
||||
|
||||
static bool is_imp_def_sys_reg(struct sys_reg_params *params)
|
||||
@ -2407,7 +2537,14 @@ static bool is_imp_def_sys_reg(struct sys_reg_params *params)
|
||||
return params->Op0 == 3 && (params->CRn & 0b1011) == 0b1011;
|
||||
}
|
||||
|
||||
static int emulate_sys_reg(struct kvm_vcpu *vcpu,
|
||||
/**
|
||||
* emulate_sys_reg - Emulate a guest access to an AArch64 system register
|
||||
* @vcpu: The VCPU pointer
|
||||
* @params: Decoded system register parameters
|
||||
*
|
||||
* Return: true if the system register access was successful, false otherwise.
|
||||
*/
|
||||
static bool emulate_sys_reg(struct kvm_vcpu *vcpu,
|
||||
struct sys_reg_params *params)
|
||||
{
|
||||
const struct sys_reg_desc *r;
|
||||
@ -2416,7 +2553,10 @@ static int emulate_sys_reg(struct kvm_vcpu *vcpu,
|
||||
|
||||
if (likely(r)) {
|
||||
perform_access(vcpu, params, r);
|
||||
} else if (is_imp_def_sys_reg(params)) {
|
||||
return true;
|
||||
}
|
||||
|
||||
if (is_imp_def_sys_reg(params)) {
|
||||
kvm_inject_undefined(vcpu);
|
||||
} else {
|
||||
print_sys_reg_msg(params,
|
||||
@ -2424,7 +2564,7 @@ static int emulate_sys_reg(struct kvm_vcpu *vcpu,
|
||||
*vcpu_pc(vcpu), *vcpu_cpsr(vcpu));
|
||||
kvm_inject_undefined(vcpu);
|
||||
}
|
||||
return 1;
|
||||
return false;
|
||||
}
|
||||
|
||||
/**
|
||||
@ -2452,18 +2592,18 @@ int kvm_handle_sys_reg(struct kvm_vcpu *vcpu)
|
||||
struct sys_reg_params params;
|
||||
unsigned long esr = kvm_vcpu_get_esr(vcpu);
|
||||
int Rt = kvm_vcpu_sys_get_rt(vcpu);
|
||||
int ret;
|
||||
|
||||
trace_kvm_handle_sys_reg(esr);
|
||||
|
||||
params = esr_sys64_to_params(esr);
|
||||
params.regval = vcpu_get_reg(vcpu, Rt);
|
||||
|
||||
ret = emulate_sys_reg(vcpu, ¶ms);
|
||||
if (!emulate_sys_reg(vcpu, ¶ms))
|
||||
return 1;
|
||||
|
||||
if (!params.is_write)
|
||||
vcpu_set_reg(vcpu, Rt, params.regval);
|
||||
return ret;
|
||||
return 1;
|
||||
}
|
||||
|
||||
/******************************************************************************
|
||||
@ -2866,18 +3006,22 @@ int kvm_arm_copy_sys_reg_indices(struct kvm_vcpu *vcpu, u64 __user *uindices)
|
||||
return write_demux_regids(uindices);
|
||||
}
|
||||
|
||||
void kvm_sys_reg_table_init(void)
|
||||
int kvm_sys_reg_table_init(void)
|
||||
{
|
||||
bool valid = true;
|
||||
unsigned int i;
|
||||
struct sys_reg_desc clidr;
|
||||
|
||||
/* Make sure tables are unique and in order. */
|
||||
BUG_ON(check_sysreg_table(sys_reg_descs, ARRAY_SIZE(sys_reg_descs), false));
|
||||
BUG_ON(check_sysreg_table(cp14_regs, ARRAY_SIZE(cp14_regs), true));
|
||||
BUG_ON(check_sysreg_table(cp14_64_regs, ARRAY_SIZE(cp14_64_regs), true));
|
||||
BUG_ON(check_sysreg_table(cp15_regs, ARRAY_SIZE(cp15_regs), true));
|
||||
BUG_ON(check_sysreg_table(cp15_64_regs, ARRAY_SIZE(cp15_64_regs), true));
|
||||
BUG_ON(check_sysreg_table(invariant_sys_regs, ARRAY_SIZE(invariant_sys_regs), false));
|
||||
valid &= check_sysreg_table(sys_reg_descs, ARRAY_SIZE(sys_reg_descs), false);
|
||||
valid &= check_sysreg_table(cp14_regs, ARRAY_SIZE(cp14_regs), true);
|
||||
valid &= check_sysreg_table(cp14_64_regs, ARRAY_SIZE(cp14_64_regs), true);
|
||||
valid &= check_sysreg_table(cp15_regs, ARRAY_SIZE(cp15_regs), true);
|
||||
valid &= check_sysreg_table(cp15_64_regs, ARRAY_SIZE(cp15_64_regs), true);
|
||||
valid &= check_sysreg_table(invariant_sys_regs, ARRAY_SIZE(invariant_sys_regs), false);
|
||||
|
||||
if (!valid)
|
||||
return -EINVAL;
|
||||
|
||||
/* We abuse the reset function to overwrite the table itself. */
|
||||
for (i = 0; i < ARRAY_SIZE(invariant_sys_regs); i++)
|
||||
@ -2900,4 +3044,6 @@ void kvm_sys_reg_table_init(void)
|
||||
break;
|
||||
/* Clear all higher bits. */
|
||||
cache_levels &= (1 << (i*3))-1;
|
||||
|
||||
return 0;
|
||||
}
|
||||
|
@@ -35,12 +35,19 @@ struct sys_reg_params {
	  .Op2 = ((esr) >> 17) & 0x7,			\
	  .is_write = !((esr) & 1) })

#define esr_cp1x_32_to_params(esr)				\
	((struct sys_reg_params){ .Op1 = ((esr) >> 14) & 0x7,	\
	  .CRn = ((esr) >> 10) & 0xf,				\
	  .CRm = ((esr) >> 1) & 0xf,				\
	  .Op2 = ((esr) >> 17) & 0x7,				\
	  .is_write = !((esr) & 1) })

struct sys_reg_desc {
	/* Sysreg string for debug */
	const char *name;

	enum {
		AA32_ZEROHIGH,
		AA32_DIRECT,
		AA32_LO,
		AA32_HI,
	} aarch32_map;
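/*
 * Illustrative sketch, not part of the merge: decoding the same ESR fields that
 * esr_cp1x_32_to_params() extracts, for a hypothetical 32-bit CP14/CP15 trap.
 * The field positions come from the macro above; the standalone struct and the
 * example ESR value are assumptions made for the example.
 */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

struct cp_params {
	uint8_t Op1, CRn, CRm, Op2;
	bool is_write;
};

static struct cp_params decode_cp1x_32(uint64_t esr)
{
	struct cp_params p = {
		.Op1      = (esr >> 14) & 0x7,
		.CRn      = (esr >> 10) & 0xf,
		.CRm      = (esr >> 1) & 0xf,
		.Op2      = (esr >> 17) & 0x7,
		.is_write = !(esr & 1),	/* bit 0 clear means a write (MCR) */
	};
	return p;
}

int main(void)
{
	/* Hypothetical ESR value; only the fields decoded above matter here. */
	struct cp_params p = decode_cp1x_32(0x0002ff01);

	printf("Op1=%u CRn=%u CRm=%u Op2=%u %s\n",
	       p.Op1, p.CRn, p.CRm, p.Op2, p.is_write ? "write" : "read");
	return 0;
}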
@ -98,11 +98,11 @@ int kvm_vgic_create(struct kvm *kvm, u32 type)
|
||||
ret = 0;
|
||||
|
||||
if (type == KVM_DEV_TYPE_ARM_VGIC_V2)
|
||||
kvm->arch.max_vcpus = VGIC_V2_MAX_CPUS;
|
||||
kvm->max_vcpus = VGIC_V2_MAX_CPUS;
|
||||
else
|
||||
kvm->arch.max_vcpus = VGIC_V3_MAX_CPUS;
|
||||
kvm->max_vcpus = VGIC_V3_MAX_CPUS;
|
||||
|
||||
if (atomic_read(&kvm->online_vcpus) > kvm->arch.max_vcpus) {
|
||||
if (atomic_read(&kvm->online_vcpus) > kvm->max_vcpus) {
|
||||
ret = -E2BIG;
|
||||
goto out_unlock;
|
||||
}
|
||||
@ -319,7 +319,12 @@ int vgic_init(struct kvm *kvm)
|
||||
|
||||
vgic_debug_init(kvm);
|
||||
|
||||
dist->implementation_rev = 2;
|
||||
/*
|
||||
* If userspace didn't set the GIC implementation revision,
|
||||
* default to the latest and greatest. You know want it.
|
||||
*/
|
||||
if (!dist->implementation_rev)
|
||||
dist->implementation_rev = KVM_VGIC_IMP_REV_LATEST;
|
||||
dist->initialized = true;
|
||||
|
||||
out:
|
||||
|
@ -683,7 +683,7 @@ int vgic_its_resolve_lpi(struct kvm *kvm, struct vgic_its *its,
|
||||
if (!vcpu)
|
||||
return E_ITS_INT_UNMAPPED_INTERRUPT;
|
||||
|
||||
if (!vcpu->arch.vgic_cpu.lpis_enabled)
|
||||
if (!vgic_lpis_enabled(vcpu))
|
||||
return -EBUSY;
|
||||
|
||||
vgic_its_cache_translation(kvm, its, devid, eventid, ite->irq);
|
||||
@ -894,6 +894,18 @@ static int vgic_its_cmd_handle_movi(struct kvm *kvm, struct vgic_its *its,
|
||||
return update_affinity(ite->irq, vcpu);
|
||||
}
|
||||
|
||||
static bool __is_visible_gfn_locked(struct vgic_its *its, gpa_t gpa)
|
||||
{
|
||||
gfn_t gfn = gpa >> PAGE_SHIFT;
|
||||
int idx;
|
||||
bool ret;
|
||||
|
||||
idx = srcu_read_lock(&its->dev->kvm->srcu);
|
||||
ret = kvm_is_visible_gfn(its->dev->kvm, gfn);
|
||||
srcu_read_unlock(&its->dev->kvm->srcu, idx);
|
||||
return ret;
|
||||
}
|
||||
|
||||
/*
|
||||
* Check whether an ID can be stored into the corresponding guest table.
|
||||
* For a direct table this is pretty easy, but gets a bit nasty for
|
||||
@ -908,9 +920,7 @@ static bool vgic_its_check_id(struct vgic_its *its, u64 baser, u32 id,
|
||||
u64 indirect_ptr, type = GITS_BASER_TYPE(baser);
|
||||
phys_addr_t base = GITS_BASER_ADDR_48_to_52(baser);
|
||||
int esz = GITS_BASER_ENTRY_SIZE(baser);
|
||||
int index, idx;
|
||||
gfn_t gfn;
|
||||
bool ret;
|
||||
int index;
|
||||
|
||||
switch (type) {
|
||||
case GITS_BASER_TYPE_DEVICE:
|
||||
@ -933,12 +943,11 @@ static bool vgic_its_check_id(struct vgic_its *its, u64 baser, u32 id,
|
||||
return false;
|
||||
|
||||
addr = base + id * esz;
|
||||
gfn = addr >> PAGE_SHIFT;
|
||||
|
||||
if (eaddr)
|
||||
*eaddr = addr;
|
||||
|
||||
goto out;
|
||||
return __is_visible_gfn_locked(its, addr);
|
||||
}
|
||||
|
||||
/* calculate and check the index into the 1st level */
|
||||
@ -964,27 +973,42 @@ static bool vgic_its_check_id(struct vgic_its *its, u64 baser, u32 id,
|
||||
/* Find the address of the actual entry */
|
||||
index = id % (SZ_64K / esz);
|
||||
indirect_ptr += index * esz;
|
||||
gfn = indirect_ptr >> PAGE_SHIFT;
|
||||
|
||||
if (eaddr)
|
||||
*eaddr = indirect_ptr;
|
||||
|
||||
out:
|
||||
idx = srcu_read_lock(&its->dev->kvm->srcu);
|
||||
ret = kvm_is_visible_gfn(its->dev->kvm, gfn);
|
||||
srcu_read_unlock(&its->dev->kvm->srcu, idx);
|
||||
return ret;
|
||||
return __is_visible_gfn_locked(its, indirect_ptr);
|
||||
}
|
||||
|
||||
/*
|
||||
* Check whether an event ID can be stored in the corresponding Interrupt
|
||||
* Translation Table, which starts at device->itt_addr.
|
||||
*/
|
||||
static bool vgic_its_check_event_id(struct vgic_its *its, struct its_device *device,
|
||||
u32 event_id)
|
||||
{
|
||||
const struct vgic_its_abi *abi = vgic_its_get_abi(its);
|
||||
int ite_esz = abi->ite_esz;
|
||||
gpa_t gpa;
|
||||
|
||||
/* max table size is: BIT_ULL(device->num_eventid_bits) * ite_esz */
|
||||
if (event_id >= BIT_ULL(device->num_eventid_bits))
|
||||
return false;
|
||||
|
||||
gpa = device->itt_addr + event_id * ite_esz;
|
||||
return __is_visible_gfn_locked(its, gpa);
|
||||
}
|
||||
|
||||
/*
|
||||
* Add a new collection into the ITS collection table.
|
||||
* Returns 0 on success, and a negative error value for generic errors.
|
||||
*/
|
||||
static int vgic_its_alloc_collection(struct vgic_its *its,
|
||||
struct its_collection **colp,
|
||||
u32 coll_id)
|
||||
{
|
||||
struct its_collection *collection;
|
||||
|
||||
if (!vgic_its_check_id(its, its->baser_coll_table, coll_id, NULL))
|
||||
return E_ITS_MAPC_COLLECTION_OOR;
|
||||
|
||||
collection = kzalloc(sizeof(*collection), GFP_KERNEL_ACCOUNT);
|
||||
if (!collection)
|
||||
return -ENOMEM;
|
||||
@ -1061,7 +1085,7 @@ static int vgic_its_cmd_handle_mapi(struct kvm *kvm, struct vgic_its *its,
|
||||
if (!device)
|
||||
return E_ITS_MAPTI_UNMAPPED_DEVICE;
|
||||
|
||||
if (event_id >= BIT_ULL(device->num_eventid_bits))
|
||||
if (!vgic_its_check_event_id(its, device, event_id))
|
||||
return E_ITS_MAPTI_ID_OOR;
|
||||
|
||||
if (its_cmd_get_command(its_cmd) == GITS_CMD_MAPTI)
|
||||
@ -1078,7 +1102,12 @@ static int vgic_its_cmd_handle_mapi(struct kvm *kvm, struct vgic_its *its,
|
||||
|
||||
collection = find_collection(its, coll_id);
|
||||
if (!collection) {
|
||||
int ret = vgic_its_alloc_collection(its, &collection, coll_id);
|
||||
int ret;
|
||||
|
||||
if (!vgic_its_check_id(its, its->baser_coll_table, coll_id, NULL))
|
||||
return E_ITS_MAPC_COLLECTION_OOR;
|
||||
|
||||
ret = vgic_its_alloc_collection(its, &collection, coll_id);
|
||||
if (ret)
|
||||
return ret;
|
||||
new_coll = collection;
|
||||
@ -1233,6 +1262,10 @@ static int vgic_its_cmd_handle_mapc(struct kvm *kvm, struct vgic_its *its,
|
||||
if (!collection) {
|
||||
int ret;
|
||||
|
||||
if (!vgic_its_check_id(its, its->baser_coll_table,
|
||||
coll_id, NULL))
|
||||
return E_ITS_MAPC_COLLECTION_OOR;
|
||||
|
||||
ret = vgic_its_alloc_collection(its, &collection,
|
||||
coll_id);
|
||||
if (ret)
|
||||
@ -1272,6 +1305,11 @@ static int vgic_its_cmd_handle_clear(struct kvm *kvm, struct vgic_its *its,
|
||||
return 0;
|
||||
}
|
||||
|
||||
int vgic_its_inv_lpi(struct kvm *kvm, struct vgic_irq *irq)
|
||||
{
|
||||
return update_lpi_config(kvm, irq, NULL, true);
|
||||
}
|
||||
|
||||
/*
|
||||
* The INV command syncs the configuration bits from the memory table.
|
||||
* Must be called with the its_lock mutex held.
|
||||
@ -1288,7 +1326,41 @@ static int vgic_its_cmd_handle_inv(struct kvm *kvm, struct vgic_its *its,
|
||||
if (!ite)
|
||||
return E_ITS_INV_UNMAPPED_INTERRUPT;
|
||||
|
||||
return update_lpi_config(kvm, ite->irq, NULL, true);
|
||||
return vgic_its_inv_lpi(kvm, ite->irq);
|
||||
}
|
||||
|
||||
/**
 * vgic_its_invall - invalidate all LPIs targetting a given vcpu
 * @vcpu: the vcpu for which the RD is targetted by an invalidation
 *
 * Contrary to the INVALL command, this targets a RD instead of a
 * collection, and we don't need to hold the its_lock, since no ITS is
 * involved here.
 */
int vgic_its_invall(struct kvm_vcpu *vcpu)
{
	struct kvm *kvm = vcpu->kvm;
	int irq_count, i = 0;
	u32 *intids;

	irq_count = vgic_copy_lpi_list(kvm, vcpu, &intids);
	if (irq_count < 0)
		return irq_count;

	for (i = 0; i < irq_count; i++) {
		struct vgic_irq *irq = vgic_get_irq(kvm, NULL, intids[i]);
		if (!irq)
			continue;
		update_lpi_config(kvm, irq, vcpu, false);
		vgic_put_irq(kvm, irq);
	}

	kfree(intids);

	if (vcpu->arch.vgic_cpu.vgic_v3.its_vpe.its_vm)
		its_invall_vpe(&vcpu->arch.vgic_cpu.vgic_v3.its_vpe);

	return 0;
}
|
||||
|
||||
/*
|
||||
@ -1305,32 +1377,13 @@ static int vgic_its_cmd_handle_invall(struct kvm *kvm, struct vgic_its *its,
|
||||
u32 coll_id = its_cmd_get_collection(its_cmd);
|
||||
struct its_collection *collection;
|
||||
struct kvm_vcpu *vcpu;
|
||||
struct vgic_irq *irq;
|
||||
u32 *intids;
|
||||
int irq_count, i;
|
||||
|
||||
collection = find_collection(its, coll_id);
|
||||
if (!its_is_collection_mapped(collection))
|
||||
return E_ITS_INVALL_UNMAPPED_COLLECTION;
|
||||
|
||||
vcpu = kvm_get_vcpu(kvm, collection->target_addr);
|
||||
|
||||
irq_count = vgic_copy_lpi_list(kvm, vcpu, &intids);
|
||||
if (irq_count < 0)
|
||||
return irq_count;
|
||||
|
||||
for (i = 0; i < irq_count; i++) {
|
||||
irq = vgic_get_irq(kvm, NULL, intids[i]);
|
||||
if (!irq)
|
||||
continue;
|
||||
update_lpi_config(kvm, irq, vcpu, false);
|
||||
vgic_put_irq(kvm, irq);
|
||||
}
|
||||
|
||||
kfree(intids);
|
||||
|
||||
if (vcpu->arch.vgic_cpu.vgic_v3.its_vpe.its_vm)
|
||||
its_invall_vpe(&vcpu->arch.vgic_cpu.vgic_v3.its_vpe);
|
||||
vgic_its_invall(vcpu);
|
||||
|
||||
return 0;
|
||||
}
|
||||
@ -2175,6 +2228,9 @@ static int vgic_its_restore_ite(struct vgic_its *its, u32 event_id,
|
||||
if (!collection)
|
||||
return -EINVAL;
|
||||
|
||||
if (!vgic_its_check_event_id(its, dev, event_id))
|
||||
return -EINVAL;
|
||||
|
||||
ite = vgic_its_alloc_ite(dev, collection, event_id);
|
||||
if (IS_ERR(ite))
|
||||
return PTR_ERR(ite);
|
||||
@ -2183,8 +2239,10 @@ static int vgic_its_restore_ite(struct vgic_its *its, u32 event_id,
|
||||
vcpu = kvm_get_vcpu(kvm, collection->target_addr);
|
||||
|
||||
irq = vgic_add_lpi(kvm, lpi_id, vcpu);
|
||||
if (IS_ERR(irq))
|
||||
if (IS_ERR(irq)) {
|
||||
its_free_ite(kvm, ite);
|
||||
return PTR_ERR(irq);
|
||||
}
|
||||
ite->irq = irq;
|
||||
|
||||
return offset;
|
||||
@ -2296,6 +2354,7 @@ static int vgic_its_restore_dte(struct vgic_its *its, u32 id,
|
||||
void *ptr, void *opaque)
|
||||
{
|
||||
struct its_device *dev;
|
||||
u64 baser = its->baser_device_table;
|
||||
gpa_t itt_addr;
|
||||
u8 num_eventid_bits;
|
||||
u64 entry = *(u64 *)ptr;
|
||||
@ -2316,6 +2375,9 @@ static int vgic_its_restore_dte(struct vgic_its *its, u32 id,
|
||||
/* dte entry is valid */
|
||||
offset = (entry & KVM_ITS_DTE_NEXT_MASK) >> KVM_ITS_DTE_NEXT_SHIFT;
|
||||
|
||||
if (!vgic_its_check_id(its, baser, id, NULL))
|
||||
return -EINVAL;
|
||||
|
||||
dev = vgic_its_alloc_device(its, id, itt_addr, num_eventid_bits);
|
||||
if (IS_ERR(dev))
|
||||
return PTR_ERR(dev);
|
||||
@ -2445,6 +2507,9 @@ static int vgic_its_restore_device_tables(struct vgic_its *its)
|
||||
if (ret > 0)
|
||||
ret = 0;
|
||||
|
||||
if (ret < 0)
|
||||
vgic_its_free_device_list(its->dev->kvm, its);
|
||||
|
||||
return ret;
|
||||
}
|
||||
|
||||
@ -2461,6 +2526,11 @@ static int vgic_its_save_cte(struct vgic_its *its,
|
||||
return kvm_write_guest_lock(its->dev->kvm, gpa, &val, esz);
|
||||
}
|
||||
|
||||
/*
|
||||
* Restore a collection entry into the ITS collection table.
|
||||
* Return +1 on success, 0 if the entry was invalid (which should be
|
||||
* interpreted as end-of-table), and a negative error value for generic errors.
|
||||
*/
|
||||
static int vgic_its_restore_cte(struct vgic_its *its, gpa_t gpa, int esz)
|
||||
{
|
||||
struct its_collection *collection;
|
||||
@ -2487,6 +2557,10 @@ static int vgic_its_restore_cte(struct vgic_its *its, gpa_t gpa, int esz)
|
||||
collection = find_collection(its, coll_id);
|
||||
if (collection)
|
||||
return -EEXIST;
|
||||
|
||||
if (!vgic_its_check_id(its, its->baser_coll_table, coll_id, NULL))
|
||||
return -EINVAL;
|
||||
|
||||
ret = vgic_its_alloc_collection(its, &collection, coll_id);
|
||||
if (ret)
|
||||
return ret;
|
||||
@ -2566,6 +2640,9 @@ static int vgic_its_restore_collection_table(struct vgic_its *its)
|
||||
if (ret > 0)
|
||||
return 0;
|
||||
|
||||
if (ret < 0)
|
||||
vgic_its_free_collection_list(its->dev->kvm, its);
|
||||
|
||||
return ret;
|
||||
}
|
||||
|
||||
@ -2597,7 +2674,10 @@ static int vgic_its_restore_tables_v0(struct vgic_its *its)
|
||||
if (ret)
|
||||
return ret;
|
||||
|
||||
return vgic_its_restore_device_tables(its);
|
||||
ret = vgic_its_restore_device_tables(its);
|
||||
if (ret)
|
||||
vgic_its_free_collection_list(its->dev->kvm, its);
|
||||
return ret;
|
||||
}
|
||||
|
||||
static int vgic_its_commit_v0(struct vgic_its *its)
|
||||
|
@ -73,9 +73,13 @@ static int vgic_mmio_uaccess_write_v2_misc(struct kvm_vcpu *vcpu,
|
||||
gpa_t addr, unsigned int len,
|
||||
unsigned long val)
|
||||
{
|
||||
struct vgic_dist *dist = &vcpu->kvm->arch.vgic;
|
||||
u32 reg;
|
||||
|
||||
switch (addr & 0x0c) {
|
||||
case GIC_DIST_IIDR:
|
||||
if (val != vgic_mmio_read_v2_misc(vcpu, addr, len))
|
||||
reg = vgic_mmio_read_v2_misc(vcpu, addr, len);
|
||||
if ((reg ^ val) & ~GICD_IIDR_REVISION_MASK)
|
||||
return -EINVAL;
|
||||
|
||||
/*
|
||||
@ -87,8 +91,16 @@ static int vgic_mmio_uaccess_write_v2_misc(struct kvm_vcpu *vcpu,
|
||||
* migration from old kernels to new kernels with legacy
|
||||
* userspace.
|
||||
*/
|
||||
vcpu->kvm->arch.vgic.v2_groups_user_writable = true;
|
||||
return 0;
|
||||
reg = FIELD_GET(GICD_IIDR_REVISION_MASK, reg);
|
||||
switch (reg) {
|
||||
case KVM_VGIC_IMP_REV_2:
|
||||
case KVM_VGIC_IMP_REV_3:
|
||||
vcpu->kvm->arch.vgic.v2_groups_user_writable = true;
|
||||
dist->implementation_rev = reg;
|
||||
return 0;
|
||||
default:
|
||||
return -EINVAL;
|
||||
}
|
||||
}
|
||||
|
||||
vgic_mmio_write_v2_misc(vcpu, addr, len, val);
|
||||
|
@ -155,13 +155,27 @@ static int vgic_mmio_uaccess_write_v3_misc(struct kvm_vcpu *vcpu,
|
||||
unsigned long val)
|
||||
{
|
||||
struct vgic_dist *dist = &vcpu->kvm->arch.vgic;
|
||||
u32 reg;
|
||||
|
||||
switch (addr & 0x0c) {
|
||||
case GICD_TYPER2:
|
||||
case GICD_IIDR:
|
||||
if (val != vgic_mmio_read_v3_misc(vcpu, addr, len))
|
||||
return -EINVAL;
|
||||
return 0;
|
||||
case GICD_IIDR:
|
||||
reg = vgic_mmio_read_v3_misc(vcpu, addr, len);
|
||||
if ((reg ^ val) & ~GICD_IIDR_REVISION_MASK)
|
||||
return -EINVAL;
|
||||
|
||||
reg = FIELD_GET(GICD_IIDR_REVISION_MASK, reg);
|
||||
switch (reg) {
|
||||
case KVM_VGIC_IMP_REV_2:
|
||||
case KVM_VGIC_IMP_REV_3:
|
||||
dist->implementation_rev = reg;
|
||||
return 0;
|
||||
default:
|
||||
return -EINVAL;
|
||||
}
|
||||
case GICD_CTLR:
|
||||
/* Not a GICv4.1? No HW SGIs */
|
||||
if (!kvm_vgic_global_state.has_gicv4_1)
|
||||
@ -221,34 +235,58 @@ static void vgic_mmio_write_irouter(struct kvm_vcpu *vcpu,
|
||||
vgic_put_irq(vcpu->kvm, irq);
|
||||
}
|
||||
|
||||
bool vgic_lpis_enabled(struct kvm_vcpu *vcpu)
|
||||
{
|
||||
struct vgic_cpu *vgic_cpu = &vcpu->arch.vgic_cpu;
|
||||
|
||||
return atomic_read(&vgic_cpu->ctlr) == GICR_CTLR_ENABLE_LPIS;
|
||||
}
|
||||
|
||||
static unsigned long vgic_mmio_read_v3r_ctlr(struct kvm_vcpu *vcpu,
|
||||
gpa_t addr, unsigned int len)
|
||||
{
|
||||
struct vgic_cpu *vgic_cpu = &vcpu->arch.vgic_cpu;
|
||||
unsigned long val;
|
||||
|
||||
return vgic_cpu->lpis_enabled ? GICR_CTLR_ENABLE_LPIS : 0;
|
||||
val = atomic_read(&vgic_cpu->ctlr);
|
||||
if (vgic_get_implementation_rev(vcpu) >= KVM_VGIC_IMP_REV_3)
|
||||
val |= GICR_CTLR_IR | GICR_CTLR_CES;
|
||||
|
||||
return val;
|
||||
}
|
||||
|
||||
|
||||
static void vgic_mmio_write_v3r_ctlr(struct kvm_vcpu *vcpu,
|
||||
gpa_t addr, unsigned int len,
|
||||
unsigned long val)
|
||||
{
|
||||
struct vgic_cpu *vgic_cpu = &vcpu->arch.vgic_cpu;
|
||||
bool was_enabled = vgic_cpu->lpis_enabled;
|
||||
u32 ctlr;
|
||||
|
||||
if (!vgic_has_its(vcpu->kvm))
|
||||
return;
|
||||
|
||||
vgic_cpu->lpis_enabled = val & GICR_CTLR_ENABLE_LPIS;
|
||||
if (!(val & GICR_CTLR_ENABLE_LPIS)) {
|
||||
/*
|
||||
* Don't disable if RWP is set, as there already an
|
||||
* ongoing disable. Funky guest...
|
||||
*/
|
||||
ctlr = atomic_cmpxchg_acquire(&vgic_cpu->ctlr,
|
||||
GICR_CTLR_ENABLE_LPIS,
|
||||
GICR_CTLR_RWP);
|
||||
if (ctlr != GICR_CTLR_ENABLE_LPIS)
|
||||
return;
|
||||
|
||||
if (was_enabled && !vgic_cpu->lpis_enabled) {
|
||||
vgic_flush_pending_lpis(vcpu);
|
||||
vgic_its_invalidate_cache(vcpu->kvm);
|
||||
}
|
||||
atomic_set_release(&vgic_cpu->ctlr, 0);
|
||||
} else {
|
||||
ctlr = atomic_cmpxchg_acquire(&vgic_cpu->ctlr, 0,
|
||||
GICR_CTLR_ENABLE_LPIS);
|
||||
if (ctlr != 0)
|
||||
return;
|
||||
|
||||
if (!was_enabled && vgic_cpu->lpis_enabled)
|
||||
vgic_enable_lpis(vcpu);
|
||||
}
|
||||
}
|
||||
|
||||
static bool vgic_mmio_vcpu_rdist_is_last(struct kvm_vcpu *vcpu)
|
||||
@ -478,11 +516,10 @@ static void vgic_mmio_write_propbase(struct kvm_vcpu *vcpu,
|
||||
unsigned long val)
|
||||
{
|
||||
struct vgic_dist *dist = &vcpu->kvm->arch.vgic;
|
||||
struct vgic_cpu *vgic_cpu = &vcpu->arch.vgic_cpu;
|
||||
u64 old_propbaser, propbaser;
|
||||
|
||||
/* Storing a value with LPIs already enabled is undefined */
|
||||
if (vgic_cpu->lpis_enabled)
|
||||
if (vgic_lpis_enabled(vcpu))
|
||||
return;
|
||||
|
||||
do {
|
||||
@ -513,7 +550,7 @@ static void vgic_mmio_write_pendbase(struct kvm_vcpu *vcpu,
|
||||
u64 old_pendbaser, pendbaser;
|
||||
|
||||
/* Storing a value with LPIs already enabled is undefined */
|
||||
if (vgic_cpu->lpis_enabled)
|
||||
if (vgic_lpis_enabled(vcpu))
|
||||
return;
|
||||
|
||||
do {
|
||||
@ -525,6 +562,63 @@ static void vgic_mmio_write_pendbase(struct kvm_vcpu *vcpu,
|
||||
pendbaser) != old_pendbaser);
|
||||
}
|
||||
|
||||
static unsigned long vgic_mmio_read_sync(struct kvm_vcpu *vcpu,
|
||||
gpa_t addr, unsigned int len)
|
||||
{
|
||||
return !!atomic_read(&vcpu->arch.vgic_cpu.syncr_busy);
|
||||
}
|
||||
|
||||
static void vgic_set_rdist_busy(struct kvm_vcpu *vcpu, bool busy)
|
||||
{
|
||||
if (busy) {
|
||||
atomic_inc(&vcpu->arch.vgic_cpu.syncr_busy);
|
||||
smp_mb__after_atomic();
|
||||
} else {
|
||||
smp_mb__before_atomic();
|
||||
atomic_dec(&vcpu->arch.vgic_cpu.syncr_busy);
|
||||
}
|
||||
}
|
||||
|
||||
static void vgic_mmio_write_invlpi(struct kvm_vcpu *vcpu,
|
||||
gpa_t addr, unsigned int len,
|
||||
unsigned long val)
|
||||
{
|
||||
struct vgic_irq *irq;
|
||||
|
||||
/*
|
||||
* If the guest wrote only to the upper 32bit part of the
|
||||
* register, drop the write on the floor, as it is only for
|
||||
* vPEs (which we don't support for obvious reasons).
|
||||
*
|
||||
* Also discard the access if LPIs are not enabled.
|
||||
*/
|
||||
if ((addr & 4) || !vgic_lpis_enabled(vcpu))
|
||||
return;
|
||||
|
||||
vgic_set_rdist_busy(vcpu, true);
|
||||
|
||||
irq = vgic_get_irq(vcpu->kvm, NULL, lower_32_bits(val));
|
||||
if (irq) {
|
||||
vgic_its_inv_lpi(vcpu->kvm, irq);
|
||||
vgic_put_irq(vcpu->kvm, irq);
|
||||
}
|
||||
|
||||
vgic_set_rdist_busy(vcpu, false);
|
||||
}
|
||||
|
||||
static void vgic_mmio_write_invall(struct kvm_vcpu *vcpu,
|
||||
gpa_t addr, unsigned int len,
|
||||
unsigned long val)
|
||||
{
|
||||
/* See vgic_mmio_write_invlpi() for the early return rationale */
|
||||
if ((addr & 4) || !vgic_lpis_enabled(vcpu))
|
||||
return;
|
||||
|
||||
vgic_set_rdist_busy(vcpu, true);
|
||||
vgic_its_invall(vcpu);
|
||||
vgic_set_rdist_busy(vcpu, false);
|
||||
}
|
||||
|
||||
/*
|
||||
* The GICv3 per-IRQ registers are split to control PPIs and SGIs in the
|
||||
* redistributors, while SPIs are covered by registers in the distributor
|
||||
@ -630,6 +724,15 @@ static const struct vgic_register_region vgic_v3_rd_registers[] = {
|
||||
REGISTER_DESC_WITH_LENGTH(GICR_PENDBASER,
|
||||
vgic_mmio_read_pendbase, vgic_mmio_write_pendbase, 8,
|
||||
VGIC_ACCESS_64bit | VGIC_ACCESS_32bit),
|
||||
REGISTER_DESC_WITH_LENGTH(GICR_INVLPIR,
|
||||
vgic_mmio_read_raz, vgic_mmio_write_invlpi, 8,
|
||||
VGIC_ACCESS_64bit | VGIC_ACCESS_32bit),
|
||||
REGISTER_DESC_WITH_LENGTH(GICR_INVALLR,
|
||||
vgic_mmio_read_raz, vgic_mmio_write_invall, 8,
|
||||
VGIC_ACCESS_64bit | VGIC_ACCESS_32bit),
|
||||
REGISTER_DESC_WITH_LENGTH(GICR_SYNCR,
|
||||
vgic_mmio_read_sync, vgic_mmio_write_wi, 4,
|
||||
VGIC_ACCESS_32bit),
|
||||
REGISTER_DESC_WITH_LENGTH(GICR_IDREGS,
|
||||
vgic_mmio_read_v3_idregs, vgic_mmio_write_wi, 48,
|
||||
VGIC_ACCESS_32bit),
|
||||
|
@ -612,6 +612,10 @@ early_param("kvm-arm.vgic_v4_enable", early_gicv4_enable);
|
||||
static const struct midr_range broken_seis[] = {
|
||||
MIDR_ALL_VERSIONS(MIDR_APPLE_M1_ICESTORM),
|
||||
MIDR_ALL_VERSIONS(MIDR_APPLE_M1_FIRESTORM),
|
||||
MIDR_ALL_VERSIONS(MIDR_APPLE_M1_ICESTORM_PRO),
|
||||
MIDR_ALL_VERSIONS(MIDR_APPLE_M1_FIRESTORM_PRO),
|
||||
MIDR_ALL_VERSIONS(MIDR_APPLE_M1_ICESTORM_MAX),
|
||||
MIDR_ALL_VERSIONS(MIDR_APPLE_M1_FIRESTORM_MAX),
|
||||
{},
|
||||
};
|
||||
|
||||
|
@ -98,6 +98,11 @@
|
||||
#define DEBUG_SPINLOCK_BUG_ON(p)
|
||||
#endif
|
||||
|
||||
static inline u32 vgic_get_implementation_rev(struct kvm_vcpu *vcpu)
|
||||
{
|
||||
return vcpu->kvm->arch.vgic.implementation_rev;
|
||||
}
|
||||
|
||||
/* Requires the irq_lock to be held by the caller. */
|
||||
static inline bool irq_is_pending(struct vgic_irq *irq)
|
||||
{
|
||||
@ -308,6 +313,7 @@ static inline bool vgic_dist_overlap(struct kvm *kvm, gpa_t base, size_t size)
|
||||
(base < d->vgic_dist_base + KVM_VGIC_V3_DIST_SIZE);
|
||||
}
|
||||
|
||||
bool vgic_lpis_enabled(struct kvm_vcpu *vcpu);
|
||||
int vgic_copy_lpi_list(struct kvm *kvm, struct kvm_vcpu *vcpu, u32 **intid_ptr);
|
||||
int vgic_its_resolve_lpi(struct kvm *kvm, struct vgic_its *its,
|
||||
u32 devid, u32 eventid, struct vgic_irq **irq);
|
||||
@ -317,6 +323,10 @@ void vgic_lpi_translation_cache_init(struct kvm *kvm);
|
||||
void vgic_lpi_translation_cache_destroy(struct kvm *kvm);
|
||||
void vgic_its_invalidate_cache(struct kvm *kvm);
|
||||
|
||||
/* GICv4.1 MMIO interface */
|
||||
int vgic_its_inv_lpi(struct kvm *kvm, struct vgic_irq *irq);
|
||||
int vgic_its_invall(struct kvm_vcpu *vcpu);
|
||||
|
||||
bool vgic_supports_direct_msis(struct kvm *kvm);
|
||||
int vgic_v4_init(struct kvm *kvm);
|
||||
void vgic_v4_teardown(struct kvm *kvm);
|
||||
|
@@ -27,7 +27,17 @@ void __delay(unsigned long cycles)
{
	cycles_t start = get_cycles();

	if (arch_timer_evtstrm_available()) {
	if (cpus_have_const_cap(ARM64_HAS_WFXT)) {
		u64 end = start + cycles;

		/*
		 * Start with WFIT. If an interrupt makes us resume
		 * early, use a WFET loop to complete the delay.
		 */
		wfit(end);
		while ((get_cycles() - start) < cycles)
			wfet(end);
	} else if (arch_timer_evtstrm_available()) {
		const cycles_t timer_evt_period =
			USECS_TO_CYCLES(ARCH_TIMER_EVT_STREAM_PERIOD_US);
|
||||
|
||||
|
@ -38,6 +38,7 @@ HAS_STAGE2_FWB
|
||||
HAS_SYSREG_GIC_CPUIF
|
||||
HAS_TLB_RANGE
|
||||
HAS_VIRT_HOST_EXTN
|
||||
HAS_WFXT
|
||||
HW_DBM
|
||||
KVM_PROTECTED_MODE
|
||||
MISMATCHED_CACHE_TYPE
|
||||
|
@ -117,6 +117,7 @@
|
||||
#define HGATP_MODE_SV32X4 _AC(1, UL)
|
||||
#define HGATP_MODE_SV39X4 _AC(8, UL)
|
||||
#define HGATP_MODE_SV48X4 _AC(9, UL)
|
||||
#define HGATP_MODE_SV57X4 _AC(10, UL)
|
||||
|
||||
#define HGATP32_MODE_SHIFT 31
|
||||
#define HGATP32_VMID_SHIFT 22
|
||||
|
@ -12,12 +12,12 @@
|
||||
#include <linux/types.h>
|
||||
#include <linux/kvm.h>
|
||||
#include <linux/kvm_types.h>
|
||||
#include <linux/spinlock.h>
|
||||
#include <asm/csr.h>
|
||||
#include <asm/kvm_vcpu_fp.h>
|
||||
#include <asm/kvm_vcpu_timer.h>
|
||||
|
||||
#define KVM_MAX_VCPUS \
|
||||
((HGATP_VMID_MASK >> HGATP_VMID_SHIFT) + 1)
|
||||
#define KVM_MAX_VCPUS 1024
|
||||
|
||||
#define KVM_HALT_POLL_NS_DEFAULT 500000
|
||||
|
||||
@ -27,6 +27,31 @@
|
||||
KVM_ARCH_REQ_FLAGS(0, KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP)
|
||||
#define KVM_REQ_VCPU_RESET KVM_ARCH_REQ(1)
|
||||
#define KVM_REQ_UPDATE_HGATP KVM_ARCH_REQ(2)
|
||||
#define KVM_REQ_FENCE_I \
|
||||
KVM_ARCH_REQ_FLAGS(3, KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP)
|
||||
#define KVM_REQ_HFENCE_GVMA_VMID_ALL KVM_REQ_TLB_FLUSH
|
||||
#define KVM_REQ_HFENCE_VVMA_ALL \
|
||||
KVM_ARCH_REQ_FLAGS(4, KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP)
|
||||
#define KVM_REQ_HFENCE \
|
||||
KVM_ARCH_REQ_FLAGS(5, KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP)
|
||||
|
||||
enum kvm_riscv_hfence_type {
|
||||
KVM_RISCV_HFENCE_UNKNOWN = 0,
|
||||
KVM_RISCV_HFENCE_GVMA_VMID_GPA,
|
||||
KVM_RISCV_HFENCE_VVMA_ASID_GVA,
|
||||
KVM_RISCV_HFENCE_VVMA_ASID_ALL,
|
||||
KVM_RISCV_HFENCE_VVMA_GVA,
|
||||
};
|
||||
|
||||
struct kvm_riscv_hfence {
|
||||
enum kvm_riscv_hfence_type type;
|
||||
unsigned long asid;
|
||||
unsigned long order;
|
||||
gpa_t addr;
|
||||
gpa_t size;
|
||||
};
|
||||
|
||||
#define KVM_RISCV_VCPU_MAX_HFENCE 64
|
||||
|
||||
struct kvm_vm_stat {
|
||||
struct kvm_vm_stat_generic generic;
|
||||
@ -54,10 +79,10 @@ struct kvm_vmid {
|
||||
};
|
||||
|
||||
struct kvm_arch {
|
||||
/* stage2 vmid */
|
||||
/* G-stage vmid */
|
||||
struct kvm_vmid vmid;
|
||||
|
||||
/* stage2 page table */
|
||||
/* G-stage page table */
|
||||
pgd_t *pgd;
|
||||
phys_addr_t pgd_phys;
|
||||
|
||||
@ -141,6 +166,9 @@ struct kvm_vcpu_arch {
|
||||
/* VCPU ran at least once */
|
||||
bool ran_atleast_once;
|
||||
|
||||
/* Last Host CPU on which Guest VCPU exited */
|
||||
int last_exit_cpu;
|
||||
|
||||
/* ISA feature bits (similar to MISA) */
|
||||
unsigned long isa;
|
||||
|
||||
@ -179,6 +207,12 @@ struct kvm_vcpu_arch {
|
||||
/* VCPU Timer */
|
||||
struct kvm_vcpu_timer timer;
|
||||
|
||||
/* HFENCE request queue */
|
||||
spinlock_t hfence_lock;
|
||||
unsigned long hfence_head;
|
||||
unsigned long hfence_tail;
|
||||
struct kvm_riscv_hfence hfence_queue[KVM_RISCV_VCPU_MAX_HFENCE];
|
||||
|
||||
/* MMIO instruction details */
|
||||
struct kvm_mmio_decode mmio_decode;
|
||||
|
||||
@ -201,27 +235,71 @@ static inline void kvm_arch_sched_in(struct kvm_vcpu *vcpu, int cpu) {}
|
||||
|
||||
#define KVM_ARCH_WANT_MMU_NOTIFIER
|
||||
|
||||
void __kvm_riscv_hfence_gvma_vmid_gpa(unsigned long gpa_divby_4,
|
||||
unsigned long vmid);
|
||||
void __kvm_riscv_hfence_gvma_vmid(unsigned long vmid);
|
||||
void __kvm_riscv_hfence_gvma_gpa(unsigned long gpa_divby_4);
|
||||
void __kvm_riscv_hfence_gvma_all(void);
|
||||
#define KVM_RISCV_GSTAGE_TLB_MIN_ORDER 12
|
||||
|
||||
int kvm_riscv_stage2_map(struct kvm_vcpu *vcpu,
|
||||
void kvm_riscv_local_hfence_gvma_vmid_gpa(unsigned long vmid,
|
||||
gpa_t gpa, gpa_t gpsz,
|
||||
unsigned long order);
|
||||
void kvm_riscv_local_hfence_gvma_vmid_all(unsigned long vmid);
|
||||
void kvm_riscv_local_hfence_gvma_gpa(gpa_t gpa, gpa_t gpsz,
|
||||
unsigned long order);
|
||||
void kvm_riscv_local_hfence_gvma_all(void);
|
||||
void kvm_riscv_local_hfence_vvma_asid_gva(unsigned long vmid,
|
||||
unsigned long asid,
|
||||
unsigned long gva,
|
||||
unsigned long gvsz,
|
||||
unsigned long order);
|
||||
void kvm_riscv_local_hfence_vvma_asid_all(unsigned long vmid,
|
||||
unsigned long asid);
|
||||
void kvm_riscv_local_hfence_vvma_gva(unsigned long vmid,
|
||||
unsigned long gva, unsigned long gvsz,
|
||||
unsigned long order);
|
||||
void kvm_riscv_local_hfence_vvma_all(unsigned long vmid);
|
||||
|
||||
void kvm_riscv_local_tlb_sanitize(struct kvm_vcpu *vcpu);
|
||||
|
||||
void kvm_riscv_fence_i_process(struct kvm_vcpu *vcpu);
|
||||
void kvm_riscv_hfence_gvma_vmid_all_process(struct kvm_vcpu *vcpu);
|
||||
void kvm_riscv_hfence_vvma_all_process(struct kvm_vcpu *vcpu);
|
||||
void kvm_riscv_hfence_process(struct kvm_vcpu *vcpu);
|
||||
|
||||
void kvm_riscv_fence_i(struct kvm *kvm,
|
||||
unsigned long hbase, unsigned long hmask);
|
||||
void kvm_riscv_hfence_gvma_vmid_gpa(struct kvm *kvm,
|
||||
unsigned long hbase, unsigned long hmask,
|
||||
gpa_t gpa, gpa_t gpsz,
|
||||
unsigned long order);
|
||||
void kvm_riscv_hfence_gvma_vmid_all(struct kvm *kvm,
|
||||
unsigned long hbase, unsigned long hmask);
|
||||
void kvm_riscv_hfence_vvma_asid_gva(struct kvm *kvm,
|
||||
unsigned long hbase, unsigned long hmask,
|
||||
unsigned long gva, unsigned long gvsz,
|
||||
unsigned long order, unsigned long asid);
|
||||
void kvm_riscv_hfence_vvma_asid_all(struct kvm *kvm,
|
||||
unsigned long hbase, unsigned long hmask,
|
||||
unsigned long asid);
|
||||
void kvm_riscv_hfence_vvma_gva(struct kvm *kvm,
|
||||
unsigned long hbase, unsigned long hmask,
|
||||
unsigned long gva, unsigned long gvsz,
|
||||
unsigned long order);
|
||||
void kvm_riscv_hfence_vvma_all(struct kvm *kvm,
|
||||
unsigned long hbase, unsigned long hmask);
|
||||
|
||||
int kvm_riscv_gstage_map(struct kvm_vcpu *vcpu,
|
||||
struct kvm_memory_slot *memslot,
|
||||
gpa_t gpa, unsigned long hva, bool is_write);
|
||||
int kvm_riscv_stage2_alloc_pgd(struct kvm *kvm);
|
||||
void kvm_riscv_stage2_free_pgd(struct kvm *kvm);
|
||||
void kvm_riscv_stage2_update_hgatp(struct kvm_vcpu *vcpu);
|
||||
void kvm_riscv_stage2_mode_detect(void);
|
||||
unsigned long kvm_riscv_stage2_mode(void);
|
||||
int kvm_riscv_stage2_gpa_bits(void);
|
||||
int kvm_riscv_gstage_alloc_pgd(struct kvm *kvm);
|
||||
void kvm_riscv_gstage_free_pgd(struct kvm *kvm);
|
||||
void kvm_riscv_gstage_update_hgatp(struct kvm_vcpu *vcpu);
|
||||
void kvm_riscv_gstage_mode_detect(void);
|
||||
unsigned long kvm_riscv_gstage_mode(void);
|
||||
int kvm_riscv_gstage_gpa_bits(void);
|
||||
|
||||
void kvm_riscv_stage2_vmid_detect(void);
|
||||
unsigned long kvm_riscv_stage2_vmid_bits(void);
|
||||
int kvm_riscv_stage2_vmid_init(struct kvm *kvm);
|
||||
bool kvm_riscv_stage2_vmid_ver_changed(struct kvm_vmid *vmid);
|
||||
void kvm_riscv_stage2_vmid_update(struct kvm_vcpu *vcpu);
|
||||
void kvm_riscv_gstage_vmid_detect(void);
|
||||
unsigned long kvm_riscv_gstage_vmid_bits(void);
|
||||
int kvm_riscv_gstage_vmid_init(struct kvm *kvm);
|
||||
bool kvm_riscv_gstage_vmid_ver_changed(struct kvm_vmid *vmid);
|
||||
void kvm_riscv_gstage_vmid_update(struct kvm_vcpu *vcpu);
|
||||
|
||||
void __kvm_riscv_unpriv_trap(void);
|
||||
|
||||
|
@ -82,6 +82,23 @@ struct kvm_riscv_timer {
|
||||
__u64 state;
|
||||
};
|
||||
|
||||
/*
 * ISA extension IDs specific to KVM. This is not the same as the host ISA
 * extension IDs as that is internal to the host and should not be exposed
 * to the guest. This should always be contiguous to keep the mapping simple
 * in KVM implementation.
 */
enum KVM_RISCV_ISA_EXT_ID {
	KVM_RISCV_ISA_EXT_A = 0,
	KVM_RISCV_ISA_EXT_C,
	KVM_RISCV_ISA_EXT_D,
	KVM_RISCV_ISA_EXT_F,
	KVM_RISCV_ISA_EXT_H,
	KVM_RISCV_ISA_EXT_I,
	KVM_RISCV_ISA_EXT_M,
	KVM_RISCV_ISA_EXT_MAX,
};

/* Possible states for kvm_riscv_timer */
#define KVM_RISCV_TIMER_STATE_OFF	0
#define KVM_RISCV_TIMER_STATE_ON	1
@@ -123,6 +140,9 @@ struct kvm_riscv_timer {
#define KVM_REG_RISCV_FP_D_REG(name)	\
	(offsetof(struct __riscv_d_ext_state, name) / sizeof(__u64))

/* ISA Extension registers are mapped as type 7 */
#define KVM_REG_RISCV_ISA_EXT		(0x07 << KVM_REG_RISCV_TYPE_SHIFT)

#endif

#endif /* __LINUX_KVM_RISCV_H */
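/*
 * Illustrative sketch, not part of the merge: probing one of the new ISA
 * extension pseudo-registers from userspace. The register id is composed the
 * usual ONE_REG way (arch | size | type | number); treat the exact layout and
 * the meaning of the returned value as assumptions and check the KVM API
 * documentation for the authoritative description.
 */
#include <stdint.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

static int riscv_isa_ext_state(int vcpu_fd, uint64_t ext_id)
{
	unsigned long val = 0;
	struct kvm_one_reg reg = {
		.id   = KVM_REG_RISCV | KVM_REG_SIZE_ULONG |
			KVM_REG_RISCV_ISA_EXT | ext_id,	/* e.g. KVM_RISCV_ISA_EXT_H */
		.addr = (uint64_t)(unsigned long)&val,
	};

	if (ioctl(vcpu_fd, KVM_GET_ONE_REG, &reg) < 0)
		return -1;

	return val != 0;	/* non-zero: extension exposed to the guest */
}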
|
||||
|
@ -89,13 +89,13 @@ int kvm_arch_init(void *opaque)
|
||||
return -ENODEV;
|
||||
}
|
||||
|
||||
kvm_riscv_stage2_mode_detect();
|
||||
kvm_riscv_gstage_mode_detect();
|
||||
|
||||
kvm_riscv_stage2_vmid_detect();
|
||||
kvm_riscv_gstage_vmid_detect();
|
||||
|
||||
kvm_info("hypervisor extension available\n");
|
||||
|
||||
switch (kvm_riscv_stage2_mode()) {
|
||||
switch (kvm_riscv_gstage_mode()) {
|
||||
case HGATP_MODE_SV32X4:
|
||||
str = "Sv32x4";
|
||||
break;
|
||||
@ -105,12 +105,15 @@ int kvm_arch_init(void *opaque)
|
||||
case HGATP_MODE_SV48X4:
|
||||
str = "Sv48x4";
|
||||
break;
|
||||
case HGATP_MODE_SV57X4:
|
||||
str = "Sv57x4";
|
||||
break;
|
||||
default:
|
||||
return -ENODEV;
|
||||
}
|
||||
kvm_info("using %s G-stage page table format\n", str);
|
||||
|
||||
kvm_info("VMID %ld bits available\n", kvm_riscv_stage2_vmid_bits());
|
||||
kvm_info("VMID %ld bits available\n", kvm_riscv_gstage_vmid_bits());
|
||||
|
||||
return 0;
|
||||
}
|
||||
|
@ -18,53 +18,52 @@
|
||||
#include <asm/csr.h>
|
||||
#include <asm/page.h>
|
||||
#include <asm/pgtable.h>
|
||||
#include <asm/sbi.h>
|
||||
|
||||
#ifdef CONFIG_64BIT
|
||||
static unsigned long stage2_mode = (HGATP_MODE_SV39X4 << HGATP_MODE_SHIFT);
|
||||
static unsigned long stage2_pgd_levels = 3;
|
||||
#define stage2_index_bits 9
|
||||
static unsigned long gstage_mode = (HGATP_MODE_SV39X4 << HGATP_MODE_SHIFT);
|
||||
static unsigned long gstage_pgd_levels = 3;
|
||||
#define gstage_index_bits 9
|
||||
#else
|
||||
static unsigned long stage2_mode = (HGATP_MODE_SV32X4 << HGATP_MODE_SHIFT);
|
||||
static unsigned long stage2_pgd_levels = 2;
|
||||
#define stage2_index_bits 10
|
||||
static unsigned long gstage_mode = (HGATP_MODE_SV32X4 << HGATP_MODE_SHIFT);
|
||||
static unsigned long gstage_pgd_levels = 2;
|
||||
#define gstage_index_bits 10
|
||||
#endif
|
||||
|
||||
#define stage2_pgd_xbits 2
|
||||
#define stage2_pgd_size (1UL << (HGATP_PAGE_SHIFT + stage2_pgd_xbits))
|
||||
#define stage2_gpa_bits (HGATP_PAGE_SHIFT + \
|
||||
(stage2_pgd_levels * stage2_index_bits) + \
|
||||
stage2_pgd_xbits)
|
||||
#define stage2_gpa_size ((gpa_t)(1ULL << stage2_gpa_bits))
|
||||
#define gstage_pgd_xbits 2
|
||||
#define gstage_pgd_size (1UL << (HGATP_PAGE_SHIFT + gstage_pgd_xbits))
|
||||
#define gstage_gpa_bits (HGATP_PAGE_SHIFT + \
|
||||
(gstage_pgd_levels * gstage_index_bits) + \
|
||||
gstage_pgd_xbits)
|
||||
#define gstage_gpa_size ((gpa_t)(1ULL << gstage_gpa_bits))
|
||||
|
||||
#define stage2_pte_leaf(__ptep) \
|
||||
#define gstage_pte_leaf(__ptep) \
|
||||
(pte_val(*(__ptep)) & (_PAGE_READ | _PAGE_WRITE | _PAGE_EXEC))
|
||||
|
||||
static inline unsigned long stage2_pte_index(gpa_t addr, u32 level)
|
||||
static inline unsigned long gstage_pte_index(gpa_t addr, u32 level)
|
||||
{
|
||||
unsigned long mask;
|
||||
unsigned long shift = HGATP_PAGE_SHIFT + (stage2_index_bits * level);
|
||||
unsigned long shift = HGATP_PAGE_SHIFT + (gstage_index_bits * level);
|
||||
|
||||
if (level == (stage2_pgd_levels - 1))
|
||||
mask = (PTRS_PER_PTE * (1UL << stage2_pgd_xbits)) - 1;
|
||||
if (level == (gstage_pgd_levels - 1))
|
||||
mask = (PTRS_PER_PTE * (1UL << gstage_pgd_xbits)) - 1;
|
||||
else
|
||||
mask = PTRS_PER_PTE - 1;
|
||||
|
||||
return (addr >> shift) & mask;
|
||||
}
|
||||
|
||||
static inline unsigned long stage2_pte_page_vaddr(pte_t pte)
|
||||
static inline unsigned long gstage_pte_page_vaddr(pte_t pte)
|
||||
{
|
||||
return (unsigned long)pfn_to_virt(pte_val(pte) >> _PAGE_PFN_SHIFT);
|
||||
}
|
||||
|
||||
static int stage2_page_size_to_level(unsigned long page_size, u32 *out_level)
|
||||
static int gstage_page_size_to_level(unsigned long page_size, u32 *out_level)
|
||||
{
|
||||
u32 i;
|
||||
unsigned long psz = 1UL << 12;
|
||||
|
||||
for (i = 0; i < stage2_pgd_levels; i++) {
|
||||
if (page_size == (psz << (i * stage2_index_bits))) {
|
||||
for (i = 0; i < gstage_pgd_levels; i++) {
|
||||
if (page_size == (psz << (i * gstage_index_bits))) {
|
||||
*out_level = i;
|
||||
return 0;
|
||||
}
|
||||
@ -73,27 +72,39 @@ static int stage2_page_size_to_level(unsigned long page_size, u32 *out_level)
|
||||
return -EINVAL;
|
||||
}
|
||||
|
||||
static int stage2_level_to_page_size(u32 level, unsigned long *out_pgsize)
|
||||
static int gstage_level_to_page_order(u32 level, unsigned long *out_pgorder)
|
||||
{
|
||||
if (stage2_pgd_levels < level)
|
||||
if (gstage_pgd_levels < level)
|
||||
return -EINVAL;
|
||||
|
||||
*out_pgsize = 1UL << (12 + (level * stage2_index_bits));
|
||||
|
||||
*out_pgorder = 12 + (level * gstage_index_bits);
|
||||
return 0;
|
||||
}
|
||||
|
||||
static bool stage2_get_leaf_entry(struct kvm *kvm, gpa_t addr,
|
||||
static int gstage_level_to_page_size(u32 level, unsigned long *out_pgsize)
|
||||
{
|
||||
int rc;
|
||||
unsigned long page_order = PAGE_SHIFT;
|
||||
|
||||
rc = gstage_level_to_page_order(level, &page_order);
|
||||
if (rc)
|
||||
return rc;
|
||||
|
||||
*out_pgsize = BIT(page_order);
|
||||
return 0;
|
||||
}
|
||||
|
||||
static bool gstage_get_leaf_entry(struct kvm *kvm, gpa_t addr,
|
||||
pte_t **ptepp, u32 *ptep_level)
|
||||
{
|
||||
pte_t *ptep;
|
||||
u32 current_level = stage2_pgd_levels - 1;
|
||||
u32 current_level = gstage_pgd_levels - 1;
|
||||
|
||||
*ptep_level = current_level;
|
||||
ptep = (pte_t *)kvm->arch.pgd;
|
||||
ptep = &ptep[stage2_pte_index(addr, current_level)];
|
||||
ptep = &ptep[gstage_pte_index(addr, current_level)];
|
||||
while (ptep && pte_val(*ptep)) {
|
||||
if (stage2_pte_leaf(ptep)) {
|
||||
if (gstage_pte_leaf(ptep)) {
|
||||
*ptep_level = current_level;
|
||||
*ptepp = ptep;
|
||||
return true;
|
||||
@ -102,8 +113,8 @@ static bool stage2_get_leaf_entry(struct kvm *kvm, gpa_t addr,
|
||||
if (current_level) {
|
||||
current_level--;
|
||||
*ptep_level = current_level;
|
||||
ptep = (pte_t *)stage2_pte_page_vaddr(*ptep);
|
||||
ptep = &ptep[stage2_pte_index(addr, current_level)];
|
||||
ptep = (pte_t *)gstage_pte_page_vaddr(*ptep);
|
||||
ptep = &ptep[gstage_pte_index(addr, current_level)];
|
||||
} else {
|
||||
ptep = NULL;
|
||||
}
|
||||
@ -112,38 +123,30 @@ static bool stage2_get_leaf_entry(struct kvm *kvm, gpa_t addr,
|
||||
return false;
|
||||
}
|
||||
|
||||
static void stage2_remote_tlb_flush(struct kvm *kvm, u32 level, gpa_t addr)
|
||||
static void gstage_remote_tlb_flush(struct kvm *kvm, u32 level, gpa_t addr)
|
||||
{
|
||||
unsigned long size = PAGE_SIZE;
|
||||
struct kvm_vmid *vmid = &kvm->arch.vmid;
|
||||
unsigned long order = PAGE_SHIFT;
|
||||
|
||||
if (stage2_level_to_page_size(level, &size))
|
||||
if (gstage_level_to_page_order(level, &order))
|
||||
return;
|
||||
addr &= ~(size - 1);
|
||||
addr &= ~(BIT(order) - 1);
|
||||
|
||||
/*
|
||||
* TODO: Instead of cpu_online_mask, we should only target CPUs
|
||||
* where the Guest/VM is running.
|
||||
*/
|
||||
preempt_disable();
|
||||
sbi_remote_hfence_gvma_vmid(cpu_online_mask, addr, size,
|
||||
READ_ONCE(vmid->vmid));
|
||||
preempt_enable();
|
||||
kvm_riscv_hfence_gvma_vmid_gpa(kvm, -1UL, 0, addr, BIT(order), order);
|
||||
}
|
||||
|
||||
static int stage2_set_pte(struct kvm *kvm, u32 level,
|
||||
static int gstage_set_pte(struct kvm *kvm, u32 level,
|
||||
struct kvm_mmu_memory_cache *pcache,
|
||||
gpa_t addr, const pte_t *new_pte)
|
||||
{
|
||||
u32 current_level = stage2_pgd_levels - 1;
|
||||
u32 current_level = gstage_pgd_levels - 1;
|
||||
pte_t *next_ptep = (pte_t *)kvm->arch.pgd;
|
||||
pte_t *ptep = &next_ptep[stage2_pte_index(addr, current_level)];
|
||||
pte_t *ptep = &next_ptep[gstage_pte_index(addr, current_level)];
|
||||
|
||||
if (current_level < level)
|
||||
return -EINVAL;
|
||||
|
||||
while (current_level != level) {
|
||||
if (stage2_pte_leaf(ptep))
|
||||
if (gstage_pte_leaf(ptep))
|
||||
return -EEXIST;
|
||||
|
||||
if (!pte_val(*ptep)) {
|
||||
@ -155,23 +158,23 @@ static int stage2_set_pte(struct kvm *kvm, u32 level,
|
||||
*ptep = pfn_pte(PFN_DOWN(__pa(next_ptep)),
|
||||
__pgprot(_PAGE_TABLE));
|
||||
} else {
|
||||
if (stage2_pte_leaf(ptep))
|
||||
if (gstage_pte_leaf(ptep))
|
||||
return -EEXIST;
|
||||
next_ptep = (pte_t *)stage2_pte_page_vaddr(*ptep);
|
||||
next_ptep = (pte_t *)gstage_pte_page_vaddr(*ptep);
|
||||
}
|
||||
|
||||
current_level--;
|
||||
ptep = &next_ptep[stage2_pte_index(addr, current_level)];
|
||||
ptep = &next_ptep[gstage_pte_index(addr, current_level)];
|
||||
}
|
||||
|
||||
*ptep = *new_pte;
|
||||
if (stage2_pte_leaf(ptep))
|
||||
stage2_remote_tlb_flush(kvm, current_level, addr);
|
||||
if (gstage_pte_leaf(ptep))
|
||||
gstage_remote_tlb_flush(kvm, current_level, addr);
|
||||
|
||||
return 0;
|
||||
}
|
||||
|
||||
static int stage2_map_page(struct kvm *kvm,
|
||||
static int gstage_map_page(struct kvm *kvm,
|
||||
struct kvm_mmu_memory_cache *pcache,
|
||||
gpa_t gpa, phys_addr_t hpa,
|
||||
unsigned long page_size,
|
||||
@ -182,7 +185,7 @@ static int stage2_map_page(struct kvm *kvm,
|
||||
pte_t new_pte;
|
||||
pgprot_t prot;
|
||||
|
||||
ret = stage2_page_size_to_level(page_size, &level);
|
||||
ret = gstage_page_size_to_level(page_size, &level);
|
||||
if (ret)
|
||||
return ret;
|
||||
|
||||
@ -193,9 +196,9 @@ static int stage2_map_page(struct kvm *kvm,
|
||||
* PTE so that software can update these bits.
|
||||
*
|
||||
* We support both options mentioned above. To achieve this, we
|
||||
* always set 'A' and 'D' PTE bits at time of creating stage2
|
||||
* always set 'A' and 'D' PTE bits at time of creating G-stage
|
||||
* mapping. To support KVM dirty page logging with both options
|
||||
* mentioned above, we will write-protect stage2 PTEs to track
|
||||
* mentioned above, we will write-protect G-stage PTEs to track
|
||||
* dirty pages.
|
||||
*/
|
||||
|
||||
@ -213,24 +216,24 @@ static int stage2_map_page(struct kvm *kvm,
|
||||
new_pte = pfn_pte(PFN_DOWN(hpa), prot);
|
||||
new_pte = pte_mkdirty(new_pte);
|
||||
|
||||
return stage2_set_pte(kvm, level, pcache, gpa, &new_pte);
|
||||
return gstage_set_pte(kvm, level, pcache, gpa, &new_pte);
|
||||
}
|
||||
|
||||
enum stage2_op {
|
||||
STAGE2_OP_NOP = 0, /* Nothing */
|
||||
STAGE2_OP_CLEAR, /* Clear/Unmap */
|
||||
STAGE2_OP_WP, /* Write-protect */
|
||||
enum gstage_op {
|
||||
GSTAGE_OP_NOP = 0, /* Nothing */
|
||||
GSTAGE_OP_CLEAR, /* Clear/Unmap */
|
||||
GSTAGE_OP_WP, /* Write-protect */
|
||||
};
|
||||
|
||||
static void stage2_op_pte(struct kvm *kvm, gpa_t addr,
|
||||
pte_t *ptep, u32 ptep_level, enum stage2_op op)
|
||||
static void gstage_op_pte(struct kvm *kvm, gpa_t addr,
|
||||
pte_t *ptep, u32 ptep_level, enum gstage_op op)
|
||||
{
|
||||
int i, ret;
|
||||
pte_t *next_ptep;
|
||||
u32 next_ptep_level;
|
||||
unsigned long next_page_size, page_size;
|
||||
|
||||
ret = stage2_level_to_page_size(ptep_level, &page_size);
|
||||
ret = gstage_level_to_page_size(ptep_level, &page_size);
|
||||
if (ret)
|
||||
return;
|
||||
|
||||
@ -239,31 +242,31 @@ static void stage2_op_pte(struct kvm *kvm, gpa_t addr,
|
||||
if (!pte_val(*ptep))
|
||||
return;
|
||||
|
||||
if (ptep_level && !stage2_pte_leaf(ptep)) {
|
||||
next_ptep = (pte_t *)stage2_pte_page_vaddr(*ptep);
|
||||
if (ptep_level && !gstage_pte_leaf(ptep)) {
|
||||
next_ptep = (pte_t *)gstage_pte_page_vaddr(*ptep);
|
||||
next_ptep_level = ptep_level - 1;
|
||||
ret = stage2_level_to_page_size(next_ptep_level,
|
||||
ret = gstage_level_to_page_size(next_ptep_level,
|
||||
&next_page_size);
|
||||
if (ret)
|
||||
return;
|
||||
|
||||
if (op == STAGE2_OP_CLEAR)
|
||||
if (op == GSTAGE_OP_CLEAR)
|
||||
set_pte(ptep, __pte(0));
|
||||
for (i = 0; i < PTRS_PER_PTE; i++)
|
||||
stage2_op_pte(kvm, addr + i * next_page_size,
|
||||
gstage_op_pte(kvm, addr + i * next_page_size,
|
||||
&next_ptep[i], next_ptep_level, op);
|
||||
if (op == STAGE2_OP_CLEAR)
|
||||
if (op == GSTAGE_OP_CLEAR)
|
||||
put_page(virt_to_page(next_ptep));
|
||||
} else {
|
||||
if (op == STAGE2_OP_CLEAR)
|
||||
if (op == GSTAGE_OP_CLEAR)
|
||||
set_pte(ptep, __pte(0));
|
||||
else if (op == STAGE2_OP_WP)
|
||||
else if (op == GSTAGE_OP_WP)
|
||||
set_pte(ptep, __pte(pte_val(*ptep) & ~_PAGE_WRITE));
|
||||
stage2_remote_tlb_flush(kvm, ptep_level, addr);
|
||||
gstage_remote_tlb_flush(kvm, ptep_level, addr);
|
||||
}
|
||||
}
|
||||
|
||||
static void stage2_unmap_range(struct kvm *kvm, gpa_t start,
|
||||
static void gstage_unmap_range(struct kvm *kvm, gpa_t start,
|
||||
gpa_t size, bool may_block)
|
||||
{
|
||||
int ret;
|
||||
@ -274,9 +277,9 @@ static void stage2_unmap_range(struct kvm *kvm, gpa_t start,
|
||||
gpa_t addr = start, end = start + size;
|
||||
|
||||
while (addr < end) {
|
||||
found_leaf = stage2_get_leaf_entry(kvm, addr,
|
||||
found_leaf = gstage_get_leaf_entry(kvm, addr,
|
||||
&ptep, &ptep_level);
|
||||
ret = stage2_level_to_page_size(ptep_level, &page_size);
|
||||
ret = gstage_level_to_page_size(ptep_level, &page_size);
|
||||
if (ret)
|
||||
break;
|
||||
|
||||
@ -284,8 +287,8 @@ static void stage2_unmap_range(struct kvm *kvm, gpa_t start,
|
||||
goto next;
|
||||
|
||||
if (!(addr & (page_size - 1)) && ((end - addr) >= page_size))
|
||||
stage2_op_pte(kvm, addr, ptep,
|
||||
ptep_level, STAGE2_OP_CLEAR);
|
||||
gstage_op_pte(kvm, addr, ptep,
|
||||
ptep_level, GSTAGE_OP_CLEAR);
|
||||
|
||||
next:
|
||||
addr += page_size;
|
||||
@ -299,7 +302,7 @@ next:
|
||||
}
|
||||
}
|
||||
|
||||
static void stage2_wp_range(struct kvm *kvm, gpa_t start, gpa_t end)
|
||||
static void gstage_wp_range(struct kvm *kvm, gpa_t start, gpa_t end)
|
||||
{
|
||||
int ret;
|
||||
pte_t *ptep;
|
||||
@ -309,9 +312,9 @@ static void stage2_wp_range(struct kvm *kvm, gpa_t start, gpa_t end)
|
||||
unsigned long page_size;
|
||||
|
||||
while (addr < end) {
|
||||
found_leaf = stage2_get_leaf_entry(kvm, addr,
|
||||
found_leaf = gstage_get_leaf_entry(kvm, addr,
|
||||
&ptep, &ptep_level);
|
||||
ret = stage2_level_to_page_size(ptep_level, &page_size);
|
||||
ret = gstage_level_to_page_size(ptep_level, &page_size);
|
||||
if (ret)
|
||||
break;
|
||||
|
||||
@ -319,15 +322,15 @@ static void stage2_wp_range(struct kvm *kvm, gpa_t start, gpa_t end)
|
||||
goto next;
|
||||
|
||||
if (!(addr & (page_size - 1)) && ((end - addr) >= page_size))
|
||||
stage2_op_pte(kvm, addr, ptep,
|
||||
ptep_level, STAGE2_OP_WP);
|
||||
gstage_op_pte(kvm, addr, ptep,
|
||||
ptep_level, GSTAGE_OP_WP);
|
||||
|
||||
next:
|
||||
addr += page_size;
|
||||
}
|
||||
}
|
||||
|
||||
static void stage2_wp_memory_region(struct kvm *kvm, int slot)
|
||||
static void gstage_wp_memory_region(struct kvm *kvm, int slot)
|
||||
{
|
||||
struct kvm_memslots *slots = kvm_memslots(kvm);
|
||||
struct kvm_memory_slot *memslot = id_to_memslot(slots, slot);
|
||||
@ -335,12 +338,12 @@ static void stage2_wp_memory_region(struct kvm *kvm, int slot)
|
||||
phys_addr_t end = (memslot->base_gfn + memslot->npages) << PAGE_SHIFT;
|
||||
|
||||
spin_lock(&kvm->mmu_lock);
|
||||
stage2_wp_range(kvm, start, end);
|
||||
gstage_wp_range(kvm, start, end);
|
||||
spin_unlock(&kvm->mmu_lock);
|
||||
kvm_flush_remote_tlbs(kvm);
|
||||
}
|
||||
|
||||
static int stage2_ioremap(struct kvm *kvm, gpa_t gpa, phys_addr_t hpa,
|
||||
static int gstage_ioremap(struct kvm *kvm, gpa_t gpa, phys_addr_t hpa,
|
||||
unsigned long size, bool writable)
|
||||
{
|
||||
pte_t pte;
|
||||
@ -361,12 +364,12 @@ static int stage2_ioremap(struct kvm *kvm, gpa_t gpa, phys_addr_t hpa,
|
||||
if (!writable)
|
||||
pte = pte_wrprotect(pte);
|
||||
|
||||
ret = kvm_mmu_topup_memory_cache(&pcache, stage2_pgd_levels);
|
||||
ret = kvm_mmu_topup_memory_cache(&pcache, gstage_pgd_levels);
|
||||
if (ret)
|
||||
goto out;
|
||||
|
||||
spin_lock(&kvm->mmu_lock);
|
||||
ret = stage2_set_pte(kvm, 0, &pcache, addr, &pte);
|
||||
ret = gstage_set_pte(kvm, 0, &pcache, addr, &pte);
|
||||
spin_unlock(&kvm->mmu_lock);
|
||||
if (ret)
|
||||
goto out;
|
||||
@ -388,7 +391,7 @@ void kvm_arch_mmu_enable_log_dirty_pt_masked(struct kvm *kvm,
|
||||
phys_addr_t start = (base_gfn + __ffs(mask)) << PAGE_SHIFT;
|
||||
phys_addr_t end = (base_gfn + __fls(mask) + 1) << PAGE_SHIFT;
|
||||
|
||||
stage2_wp_range(kvm, start, end);
|
||||
gstage_wp_range(kvm, start, end);
|
||||
}
|
||||
|
||||
void kvm_arch_sync_dirty_log(struct kvm *kvm, struct kvm_memory_slot *memslot)
|
||||
@ -411,7 +414,7 @@ void kvm_arch_memslots_updated(struct kvm *kvm, u64 gen)
|
||||
|
||||
void kvm_arch_flush_shadow_all(struct kvm *kvm)
|
||||
{
|
||||
kvm_riscv_stage2_free_pgd(kvm);
|
||||
kvm_riscv_gstage_free_pgd(kvm);
|
||||
}
|
||||
|
||||
void kvm_arch_flush_shadow_memslot(struct kvm *kvm,
|
||||
@ -421,7 +424,7 @@ void kvm_arch_flush_shadow_memslot(struct kvm *kvm,
|
||||
phys_addr_t size = slot->npages << PAGE_SHIFT;
|
||||
|
||||
spin_lock(&kvm->mmu_lock);
|
||||
stage2_unmap_range(kvm, gpa, size, false);
|
||||
gstage_unmap_range(kvm, gpa, size, false);
|
||||
spin_unlock(&kvm->mmu_lock);
|
||||
}
|
||||
|
||||
@ -436,7 +439,7 @@ void kvm_arch_commit_memory_region(struct kvm *kvm,
|
||||
* the memory slot is write protected.
|
||||
*/
|
||||
if (change != KVM_MR_DELETE && new->flags & KVM_MEM_LOG_DIRTY_PAGES)
|
||||
stage2_wp_memory_region(kvm, new->id);
|
||||
gstage_wp_memory_region(kvm, new->id);
|
||||
}
|
||||
|
||||
int kvm_arch_prepare_memory_region(struct kvm *kvm,
|
||||
@ -458,7 +461,7 @@ int kvm_arch_prepare_memory_region(struct kvm *kvm,
|
||||
* space addressable by the KVM guest GPA space.
|
||||
*/
|
||||
if ((new->base_gfn + new->npages) >=
|
||||
(stage2_gpa_size >> PAGE_SHIFT))
|
||||
(gstage_gpa_size >> PAGE_SHIFT))
|
||||
return -EFAULT;
|
||||
|
||||
hva = new->userspace_addr;
|
||||
@ -514,7 +517,7 @@ int kvm_arch_prepare_memory_region(struct kvm *kvm,
|
||||
goto out;
|
||||
}
|
||||
|
||||
ret = stage2_ioremap(kvm, gpa, pa,
|
||||
ret = gstage_ioremap(kvm, gpa, pa,
|
||||
vm_end - vm_start, writable);
|
||||
if (ret)
|
||||
break;
|
||||
@ -527,7 +530,7 @@ int kvm_arch_prepare_memory_region(struct kvm *kvm,
|
||||
|
||||
spin_lock(&kvm->mmu_lock);
|
||||
if (ret)
|
||||
stage2_unmap_range(kvm, base_gpa, size, false);
|
||||
gstage_unmap_range(kvm, base_gpa, size, false);
|
||||
spin_unlock(&kvm->mmu_lock);
|
||||
|
||||
out:
|
||||
@ -540,7 +543,7 @@ bool kvm_unmap_gfn_range(struct kvm *kvm, struct kvm_gfn_range *range)
|
||||
if (!kvm->arch.pgd)
|
||||
return false;
|
||||
|
||||
stage2_unmap_range(kvm, range->start << PAGE_SHIFT,
|
||||
gstage_unmap_range(kvm, range->start << PAGE_SHIFT,
|
||||
(range->end - range->start) << PAGE_SHIFT,
|
||||
range->may_block);
|
||||
return false;
|
||||
@ -556,10 +559,10 @@ bool kvm_set_spte_gfn(struct kvm *kvm, struct kvm_gfn_range *range)
|
||||
|
||||
WARN_ON(range->end - range->start != 1);
|
||||
|
||||
ret = stage2_map_page(kvm, NULL, range->start << PAGE_SHIFT,
|
||||
ret = gstage_map_page(kvm, NULL, range->start << PAGE_SHIFT,
|
||||
__pfn_to_phys(pfn), PAGE_SIZE, true, true);
|
||||
if (ret) {
|
||||
kvm_debug("Failed to map stage2 page (error %d)\n", ret);
|
||||
kvm_debug("Failed to map G-stage page (error %d)\n", ret);
|
||||
return true;
|
||||
}
|
||||
|
||||
@ -577,7 +580,7 @@ bool kvm_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range)
|
||||
|
||||
WARN_ON(size != PAGE_SIZE && size != PMD_SIZE && size != PGDIR_SIZE);
|
||||
|
||||
if (!stage2_get_leaf_entry(kvm, range->start << PAGE_SHIFT,
|
||||
if (!gstage_get_leaf_entry(kvm, range->start << PAGE_SHIFT,
|
||||
&ptep, &ptep_level))
|
||||
return false;
|
||||
|
||||
@ -595,14 +598,14 @@ bool kvm_test_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range)
|
||||
|
||||
WARN_ON(size != PAGE_SIZE && size != PMD_SIZE && size != PGDIR_SIZE);
|
||||
|
||||
if (!stage2_get_leaf_entry(kvm, range->start << PAGE_SHIFT,
|
||||
if (!gstage_get_leaf_entry(kvm, range->start << PAGE_SHIFT,
|
||||
&ptep, &ptep_level))
|
||||
return false;
|
||||
|
||||
return pte_young(*ptep);
|
||||
}
|
||||
|
||||
int kvm_riscv_stage2_map(struct kvm_vcpu *vcpu,
|
||||
int kvm_riscv_gstage_map(struct kvm_vcpu *vcpu,
|
||||
struct kvm_memory_slot *memslot,
|
||||
gpa_t gpa, unsigned long hva, bool is_write)
|
||||
{
|
||||
@ -648,9 +651,9 @@ int kvm_riscv_stage2_map(struct kvm_vcpu *vcpu,
|
||||
}
|
||||
|
||||
/* We need minimum second+third level pages */
|
||||
ret = kvm_mmu_topup_memory_cache(pcache, stage2_pgd_levels);
|
||||
ret = kvm_mmu_topup_memory_cache(pcache, gstage_pgd_levels);
|
||||
if (ret) {
|
||||
kvm_err("Failed to topup stage2 cache\n");
|
||||
kvm_err("Failed to topup G-stage cache\n");
|
||||
return ret;
|
||||
}
|
||||
|
||||
@ -680,15 +683,15 @@ int kvm_riscv_stage2_map(struct kvm_vcpu *vcpu,
|
||||
if (writeable) {
|
||||
kvm_set_pfn_dirty(hfn);
|
||||
mark_page_dirty(kvm, gfn);
|
||||
ret = stage2_map_page(kvm, pcache, gpa, hfn << PAGE_SHIFT,
|
||||
ret = gstage_map_page(kvm, pcache, gpa, hfn << PAGE_SHIFT,
|
||||
vma_pagesize, false, true);
|
||||
} else {
|
||||
ret = stage2_map_page(kvm, pcache, gpa, hfn << PAGE_SHIFT,
|
||||
ret = gstage_map_page(kvm, pcache, gpa, hfn << PAGE_SHIFT,
|
||||
vma_pagesize, true, true);
|
||||
}
|
||||
|
||||
if (ret)
|
||||
kvm_err("Failed to map in stage2\n");
|
||||
kvm_err("Failed to map in G-stage\n");
|
||||
|
||||
out_unlock:
|
||||
spin_unlock(&kvm->mmu_lock);
|
||||
@ -697,7 +700,7 @@ out_unlock:
|
||||
return ret;
|
||||
}
|
||||
|
||||
int kvm_riscv_stage2_alloc_pgd(struct kvm *kvm)
|
||||
int kvm_riscv_gstage_alloc_pgd(struct kvm *kvm)
|
||||
{
|
||||
struct page *pgd_page;
|
||||
|
||||
@ -707,7 +710,7 @@ int kvm_riscv_stage2_alloc_pgd(struct kvm *kvm)
|
||||
}
|
||||
|
||||
pgd_page = alloc_pages(GFP_KERNEL | __GFP_ZERO,
|
||||
get_order(stage2_pgd_size));
|
||||
get_order(gstage_pgd_size));
|
||||
if (!pgd_page)
|
||||
return -ENOMEM;
|
||||
kvm->arch.pgd = page_to_virt(pgd_page);
|
||||
@ -716,13 +719,13 @@ int kvm_riscv_stage2_alloc_pgd(struct kvm *kvm)
|
||||
return 0;
|
||||
}
|
||||
|
||||
void kvm_riscv_stage2_free_pgd(struct kvm *kvm)
|
||||
void kvm_riscv_gstage_free_pgd(struct kvm *kvm)
|
||||
{
|
||||
void *pgd = NULL;
|
||||
|
||||
spin_lock(&kvm->mmu_lock);
|
||||
if (kvm->arch.pgd) {
|
||||
stage2_unmap_range(kvm, 0UL, stage2_gpa_size, false);
|
||||
gstage_unmap_range(kvm, 0UL, gstage_gpa_size, false);
|
||||
pgd = READ_ONCE(kvm->arch.pgd);
|
||||
kvm->arch.pgd = NULL;
|
||||
kvm->arch.pgd_phys = 0;
|
||||
@ -730,12 +733,12 @@ void kvm_riscv_stage2_free_pgd(struct kvm *kvm)
|
||||
spin_unlock(&kvm->mmu_lock);
|
||||
|
||||
if (pgd)
|
||||
free_pages((unsigned long)pgd, get_order(stage2_pgd_size));
|
||||
free_pages((unsigned long)pgd, get_order(gstage_pgd_size));
|
||||
}
|
||||
|
||||
void kvm_riscv_stage2_update_hgatp(struct kvm_vcpu *vcpu)
|
||||
void kvm_riscv_gstage_update_hgatp(struct kvm_vcpu *vcpu)
|
||||
{
|
||||
unsigned long hgatp = stage2_mode;
|
||||
unsigned long hgatp = gstage_mode;
|
||||
struct kvm_arch *k = &vcpu->kvm->arch;
|
||||
|
||||
hgatp |= (READ_ONCE(k->vmid.vmid) << HGATP_VMID_SHIFT) &
|
||||
@ -744,31 +747,40 @@ void kvm_riscv_stage2_update_hgatp(struct kvm_vcpu *vcpu)
|
||||
|
||||
csr_write(CSR_HGATP, hgatp);
|
||||
|
||||
if (!kvm_riscv_stage2_vmid_bits())
|
||||
__kvm_riscv_hfence_gvma_all();
|
||||
if (!kvm_riscv_gstage_vmid_bits())
|
||||
kvm_riscv_local_hfence_gvma_all();
|
||||
}
|
||||
|
||||
void kvm_riscv_stage2_mode_detect(void)
|
||||
void kvm_riscv_gstage_mode_detect(void)
|
||||
{
|
||||
#ifdef CONFIG_64BIT
|
||||
/* Try Sv48x4 stage2 mode */
|
||||
/* Try Sv57x4 G-stage mode */
|
||||
csr_write(CSR_HGATP, HGATP_MODE_SV57X4 << HGATP_MODE_SHIFT);
|
||||
if ((csr_read(CSR_HGATP) >> HGATP_MODE_SHIFT) == HGATP_MODE_SV57X4) {
|
||||
gstage_mode = (HGATP_MODE_SV57X4 << HGATP_MODE_SHIFT);
|
||||
gstage_pgd_levels = 5;
|
||||
goto skip_sv48x4_test;
|
||||
}
|
||||
|
||||
/* Try Sv48x4 G-stage mode */
|
||||
csr_write(CSR_HGATP, HGATP_MODE_SV48X4 << HGATP_MODE_SHIFT);
|
||||
if ((csr_read(CSR_HGATP) >> HGATP_MODE_SHIFT) == HGATP_MODE_SV48X4) {
|
||||
stage2_mode = (HGATP_MODE_SV48X4 << HGATP_MODE_SHIFT);
|
||||
stage2_pgd_levels = 4;
|
||||
gstage_mode = (HGATP_MODE_SV48X4 << HGATP_MODE_SHIFT);
|
||||
gstage_pgd_levels = 4;
|
||||
}
|
||||
csr_write(CSR_HGATP, 0);
|
||||
skip_sv48x4_test:
|
||||
|
||||
__kvm_riscv_hfence_gvma_all();
|
||||
csr_write(CSR_HGATP, 0);
|
||||
kvm_riscv_local_hfence_gvma_all();
|
||||
#endif
|
||||
}
|
||||
|
||||
unsigned long kvm_riscv_stage2_mode(void)
|
||||
unsigned long kvm_riscv_gstage_mode(void)
|
||||
{
|
||||
return stage2_mode >> HGATP_MODE_SHIFT;
|
||||
return gstage_mode >> HGATP_MODE_SHIFT;
|
||||
}
|
||||
|
||||
int kvm_riscv_stage2_gpa_bits(void)
|
||||
int kvm_riscv_gstage_gpa_bits(void)
|
||||
{
|
||||
return stage2_gpa_bits;
|
||||
return gstage_gpa_bits;
|
||||
}
|
||||
|
@ -1,74 +0,0 @@
|
||||
/* SPDX-License-Identifier: GPL-2.0 */
|
||||
/*
|
||||
* Copyright (C) 2019 Western Digital Corporation or its affiliates.
|
||||
*
|
||||
* Authors:
|
||||
* Anup Patel <anup.patel@wdc.com>
|
||||
*/
|
||||
|
||||
#include <linux/linkage.h>
|
||||
#include <asm/asm.h>
|
||||
|
||||
.text
|
||||
.altmacro
|
||||
.option norelax
|
||||
|
||||
/*
|
||||
* Instruction encoding of hfence.gvma is:
|
||||
* HFENCE.GVMA rs1, rs2
|
||||
* HFENCE.GVMA zero, rs2
|
||||
* HFENCE.GVMA rs1
|
||||
* HFENCE.GVMA
|
||||
*
|
||||
* rs1!=zero and rs2!=zero ==> HFENCE.GVMA rs1, rs2
|
||||
* rs1==zero and rs2!=zero ==> HFENCE.GVMA zero, rs2
|
||||
* rs1!=zero and rs2==zero ==> HFENCE.GVMA rs1
|
||||
* rs1==zero and rs2==zero ==> HFENCE.GVMA
|
||||
*
|
||||
* Instruction encoding of HFENCE.GVMA is:
|
||||
* 0110001 rs2(5) rs1(5) 000 00000 1110011
|
||||
*/
|
||||
|
||||
ENTRY(__kvm_riscv_hfence_gvma_vmid_gpa)
|
||||
/*
|
||||
* rs1 = a0 (GPA >> 2)
|
||||
* rs2 = a1 (VMID)
|
||||
* HFENCE.GVMA a0, a1
|
||||
* 0110001 01011 01010 000 00000 1110011
|
||||
*/
|
||||
.word 0x62b50073
|
||||
ret
|
||||
ENDPROC(__kvm_riscv_hfence_gvma_vmid_gpa)
|
||||
|
||||
ENTRY(__kvm_riscv_hfence_gvma_vmid)
|
||||
/*
|
||||
* rs1 = zero
|
||||
* rs2 = a0 (VMID)
|
||||
* HFENCE.GVMA zero, a0
|
||||
* 0110001 01010 00000 000 00000 1110011
|
||||
*/
|
||||
.word 0x62a00073
|
||||
ret
|
||||
ENDPROC(__kvm_riscv_hfence_gvma_vmid)
|
||||
|
||||
ENTRY(__kvm_riscv_hfence_gvma_gpa)
|
||||
/*
|
||||
* rs1 = a0 (GPA >> 2)
|
||||
* rs2 = zero
|
||||
* HFENCE.GVMA a0
|
||||
* 0110001 00000 01010 000 00000 1110011
|
||||
*/
|
||||
.word 0x62050073
|
||||
ret
|
||||
ENDPROC(__kvm_riscv_hfence_gvma_gpa)
|
||||
|
||||
ENTRY(__kvm_riscv_hfence_gvma_all)
|
||||
/*
|
||||
* rs1 = zero
|
||||
* rs2 = zero
|
||||
* HFENCE.GVMA
|
||||
* 0110001 00000 00000 000 00000 1110011
|
||||
*/
|
||||
.word 0x62000073
|
||||
ret
|
||||
ENDPROC(__kvm_riscv_hfence_gvma_all)
|
arch/riscv/kvm/tlb.c (new file, 461 lines added)
@@ -0,0 +1,461 @@
|
||||
// SPDX-License-Identifier: GPL-2.0
|
||||
/*
|
||||
* Copyright (c) 2022 Ventana Micro Systems Inc.
|
||||
*/
|
||||
|
||||
#include <linux/bitmap.h>
|
||||
#include <linux/cpumask.h>
|
||||
#include <linux/errno.h>
|
||||
#include <linux/err.h>
|
||||
#include <linux/module.h>
|
||||
#include <linux/smp.h>
|
||||
#include <linux/kvm_host.h>
|
||||
#include <asm/cacheflush.h>
|
||||
#include <asm/csr.h>
|
||||
|
||||
/*
|
||||
* Instruction encoding of hfence.gvma is:
|
||||
* HFENCE.GVMA rs1, rs2
|
||||
* HFENCE.GVMA zero, rs2
|
||||
* HFENCE.GVMA rs1
|
||||
* HFENCE.GVMA
|
||||
*
|
||||
* rs1!=zero and rs2!=zero ==> HFENCE.GVMA rs1, rs2
|
||||
* rs1==zero and rs2!=zero ==> HFENCE.GVMA zero, rs2
|
||||
* rs1!=zero and rs2==zero ==> HFENCE.GVMA rs1
|
||||
* rs1==zero and rs2==zero ==> HFENCE.GVMA
|
||||
*
|
||||
* Instruction encoding of HFENCE.GVMA is:
|
||||
* 0110001 rs2(5) rs1(5) 000 00000 1110011
|
||||
*/
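The .word encodings used in the helpers below follow directly from the bit layout above (funct7 | rs2 | rs1 | funct3 | rd | opcode). As a sanity check, here is a small standalone helper, not part of the patch, that reproduces them from register numbers (a0 is x10 and a1 is x11 in the RISC-V ABI):

#include <stdint.h>

/* HFENCE.GVMA: funct7=0110001, funct3=000, rd=00000, opcode=1110011 */
static inline uint32_t hfence_gvma_insn(uint32_t rs1, uint32_t rs2)
{
	return 0x62000073u | (rs2 << 20) | (rs1 << 15);
}

/*
 * hfence_gvma_insn(10, 11) == 0x62b50073   HFENCE.GVMA a0, a1
 * hfence_gvma_insn(0, 10)  == 0x62a00073   HFENCE.GVMA zero, a0
 * hfence_gvma_insn(10, 0)  == 0x62050073   HFENCE.GVMA a0
 * hfence_gvma_insn(0, 0)   == 0x62000073   HFENCE.GVMA
 */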
|
||||
|
||||
void kvm_riscv_local_hfence_gvma_vmid_gpa(unsigned long vmid,
|
||||
gpa_t gpa, gpa_t gpsz,
|
||||
unsigned long order)
|
||||
{
|
||||
gpa_t pos;
|
||||
|
||||
if (PTRS_PER_PTE < (gpsz >> order)) {
|
||||
kvm_riscv_local_hfence_gvma_vmid_all(vmid);
|
||||
return;
|
||||
}
|
||||
|
||||
for (pos = gpa; pos < (gpa + gpsz); pos += BIT(order)) {
|
||||
/*
|
||||
* rs1 = a0 (GPA >> 2)
|
||||
* rs2 = a1 (VMID)
|
||||
* HFENCE.GVMA a0, a1
|
||||
* 0110001 01011 01010 000 00000 1110011
|
||||
*/
|
||||
asm volatile ("srli a0, %0, 2\n"
|
||||
"add a1, %1, zero\n"
|
||||
".word 0x62b50073\n"
|
||||
:: "r" (pos), "r" (vmid)
|
||||
: "a0", "a1", "memory");
|
||||
}
|
||||
}
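The PTRS_PER_PTE guard above caps how many per-granule fences are issued before giving up and flushing the whole VMID. Assuming 4 KiB granules and PTRS_PER_PTE == 512 on RV64 (an assumption about the configuration, not stated in this hunk):

/* gpsz >> order > 512  <=>  gpsz > 512 * 4 KiB = 2 MiB,
 * so ranges above 2 MiB take the single kvm_riscv_local_hfence_gvma_vmid_all()
 * flush instead of looping HFENCE.GVMA one 4 KiB granule at a time.
 */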
|
||||
|
||||
void kvm_riscv_local_hfence_gvma_vmid_all(unsigned long vmid)
|
||||
{
|
||||
/*
|
||||
* rs1 = zero
|
||||
* rs2 = a0 (VMID)
|
||||
* HFENCE.GVMA zero, a0
|
||||
* 0110001 01010 00000 000 00000 1110011
|
||||
*/
|
||||
asm volatile ("add a0, %0, zero\n"
|
||||
".word 0x62a00073\n"
|
||||
:: "r" (vmid) : "a0", "memory");
|
||||
}
|
||||
|
||||
void kvm_riscv_local_hfence_gvma_gpa(gpa_t gpa, gpa_t gpsz,
|
||||
unsigned long order)
|
||||
{
|
||||
gpa_t pos;
|
||||
|
||||
if (PTRS_PER_PTE < (gpsz >> order)) {
|
||||
kvm_riscv_local_hfence_gvma_all();
|
||||
return;
|
||||
}
|
||||
|
||||
for (pos = gpa; pos < (gpa + gpsz); pos += BIT(order)) {
|
||||
/*
|
||||
* rs1 = a0 (GPA >> 2)
|
||||
* rs2 = zero
|
||||
* HFENCE.GVMA a0
|
||||
* 0110001 00000 01010 000 00000 1110011
|
||||
*/
|
||||
asm volatile ("srli a0, %0, 2\n"
|
||||
".word 0x62050073\n"
|
||||
:: "r" (pos) : "a0", "memory");
|
||||
}
|
||||
}
|
||||
|
||||
void kvm_riscv_local_hfence_gvma_all(void)
|
||||
{
|
||||
/*
|
||||
* rs1 = zero
|
||||
* rs2 = zero
|
||||
* HFENCE.GVMA
|
||||
* 0110001 00000 00000 000 00000 1110011
|
||||
*/
|
||||
asm volatile (".word 0x62000073" ::: "memory");
|
||||
}
|
||||
|
||||
/*
|
||||
* Instruction encoding of hfence.gvma is:
|
||||
* HFENCE.VVMA rs1, rs2
|
||||
* HFENCE.VVMA zero, rs2
|
||||
* HFENCE.VVMA rs1
|
||||
* HFENCE.VVMA
|
||||
*
|
||||
* rs1!=zero and rs2!=zero ==> HFENCE.VVMA rs1, rs2
|
||||
* rs1==zero and rs2!=zero ==> HFENCE.VVMA zero, rs2
|
||||
* rs1!=zero and rs2==zero ==> HFENCE.VVMA rs1
|
||||
* rs1==zero and rs2==zero ==> HFENCE.VVMA
|
||||
*
|
||||
* Instruction encoding of HFENCE.VVMA is:
|
||||
* 0010001 rs2(5) rs1(5) 000 00000 1110011
|
||||
*/
|
||||
|
||||
void kvm_riscv_local_hfence_vvma_asid_gva(unsigned long vmid,
|
||||
unsigned long asid,
|
||||
unsigned long gva,
|
||||
unsigned long gvsz,
|
||||
unsigned long order)
|
||||
{
|
||||
unsigned long pos, hgatp;
|
||||
|
||||
if (PTRS_PER_PTE < (gvsz >> order)) {
|
||||
kvm_riscv_local_hfence_vvma_asid_all(vmid, asid);
|
||||
return;
|
||||
}
|
||||
|
||||
hgatp = csr_swap(CSR_HGATP, vmid << HGATP_VMID_SHIFT);
|
||||
|
||||
for (pos = gva; pos < (gva + gvsz); pos += BIT(order)) {
|
||||
/*
|
||||
* rs1 = a0 (GVA)
|
||||
* rs2 = a1 (ASID)
|
||||
* HFENCE.VVMA a0, a1
|
||||
* 0010001 01011 01010 000 00000 1110011
|
||||
*/
|
||||
asm volatile ("add a0, %0, zero\n"
|
||||
"add a1, %1, zero\n"
|
||||
".word 0x22b50073\n"
|
||||
:: "r" (pos), "r" (asid)
|
||||
: "a0", "a1", "memory");
|
||||
}
|
||||
|
||||
csr_write(CSR_HGATP, hgatp);
|
||||
}
|
||||
|
||||
void kvm_riscv_local_hfence_vvma_asid_all(unsigned long vmid,
|
||||
unsigned long asid)
|
||||
{
|
||||
unsigned long hgatp;
|
||||
|
||||
hgatp = csr_swap(CSR_HGATP, vmid << HGATP_VMID_SHIFT);
|
||||
|
||||
/*
|
||||
* rs1 = zero
|
||||
* rs2 = a0 (ASID)
|
||||
* HFENCE.VVMA zero, a0
|
||||
* 0010001 01010 00000 000 00000 1110011
|
||||
*/
|
||||
asm volatile ("add a0, %0, zero\n"
|
||||
".word 0x22a00073\n"
|
||||
:: "r" (asid) : "a0", "memory");
|
||||
|
||||
csr_write(CSR_HGATP, hgatp);
|
||||
}
|
||||
|
||||
void kvm_riscv_local_hfence_vvma_gva(unsigned long vmid,
|
||||
unsigned long gva, unsigned long gvsz,
|
||||
unsigned long order)
|
||||
{
|
||||
unsigned long pos, hgatp;
|
||||
|
||||
if (PTRS_PER_PTE < (gvsz >> order)) {
|
||||
kvm_riscv_local_hfence_vvma_all(vmid);
|
||||
return;
|
||||
}
|
||||
|
||||
hgatp = csr_swap(CSR_HGATP, vmid << HGATP_VMID_SHIFT);
|
||||
|
||||
for (pos = gva; pos < (gva + gvsz); pos += BIT(order)) {
|
||||
/*
|
||||
* rs1 = a0 (GVA)
|
||||
* rs2 = zero
|
||||
* HFENCE.VVMA a0
|
||||
* 0010001 00000 01010 000 00000 1110011
|
||||
*/
|
||||
asm volatile ("add a0, %0, zero\n"
|
||||
".word 0x22050073\n"
|
||||
:: "r" (pos) : "a0", "memory");
|
||||
}
|
||||
|
||||
csr_write(CSR_HGATP, hgatp);
|
||||
}
|
||||
|
||||
void kvm_riscv_local_hfence_vvma_all(unsigned long vmid)
|
||||
{
|
||||
unsigned long hgatp;
|
||||
|
||||
hgatp = csr_swap(CSR_HGATP, vmid << HGATP_VMID_SHIFT);
|
||||
|
||||
/*
|
||||
* rs1 = zero
|
||||
* rs2 = zero
|
||||
* HFENCE.VVMA
|
||||
* 0010001 00000 00000 000 00000 1110011
|
||||
*/
|
||||
asm volatile (".word 0x22000073" ::: "memory");
|
||||
|
||||
csr_write(CSR_HGATP, hgatp);
|
||||
}
|
||||
|
||||
void kvm_riscv_local_tlb_sanitize(struct kvm_vcpu *vcpu)
|
||||
{
|
||||
unsigned long vmid;
|
||||
|
||||
if (!kvm_riscv_gstage_vmid_bits() ||
|
||||
vcpu->arch.last_exit_cpu == vcpu->cpu)
|
||||
return;
|
||||
|
||||
/*
|
||||
* On RISC-V platforms with hardware VMID support, we share same
|
||||
* VMID for all VCPUs of a particular Guest/VM. This means we might
|
||||
* have stale G-stage TLB entries on the current Host CPU due to
|
||||
* some other VCPU of the same Guest which ran previously on the
|
||||
* current Host CPU.
|
||||
*
|
||||
* To cleanup stale TLB entries, we simply flush all G-stage TLB
|
||||
* entries by VMID whenever underlying Host CPU changes for a VCPU.
|
||||
*/
|
||||
|
||||
vmid = READ_ONCE(vcpu->kvm->arch.vmid.vmid);
|
||||
kvm_riscv_local_hfence_gvma_vmid_all(vmid);
|
||||
}
|
||||
|
||||
void kvm_riscv_fence_i_process(struct kvm_vcpu *vcpu)
|
||||
{
|
||||
local_flush_icache_all();
|
||||
}
|
||||
|
||||
void kvm_riscv_hfence_gvma_vmid_all_process(struct kvm_vcpu *vcpu)
|
||||
{
|
||||
struct kvm_vmid *vmid;
|
||||
|
||||
vmid = &vcpu->kvm->arch.vmid;
|
||||
kvm_riscv_local_hfence_gvma_vmid_all(READ_ONCE(vmid->vmid));
|
||||
}
|
||||
|
||||
void kvm_riscv_hfence_vvma_all_process(struct kvm_vcpu *vcpu)
|
||||
{
|
||||
struct kvm_vmid *vmid;
|
||||
|
||||
vmid = &vcpu->kvm->arch.vmid;
|
||||
kvm_riscv_local_hfence_vvma_all(READ_ONCE(vmid->vmid));
|
||||
}
|
||||
|
||||
static bool vcpu_hfence_dequeue(struct kvm_vcpu *vcpu,
|
||||
struct kvm_riscv_hfence *out_data)
|
||||
{
|
||||
bool ret = false;
|
||||
struct kvm_vcpu_arch *varch = &vcpu->arch;
|
||||
|
||||
spin_lock(&varch->hfence_lock);
|
||||
|
||||
if (varch->hfence_queue[varch->hfence_head].type) {
|
||||
memcpy(out_data, &varch->hfence_queue[varch->hfence_head],
|
||||
sizeof(*out_data));
|
||||
varch->hfence_queue[varch->hfence_head].type = 0;
|
||||
|
||||
varch->hfence_head++;
|
||||
if (varch->hfence_head == KVM_RISCV_VCPU_MAX_HFENCE)
|
||||
varch->hfence_head = 0;
|
||||
|
||||
ret = true;
|
||||
}
|
||||
|
||||
spin_unlock(&varch->hfence_lock);
|
||||
|
||||
return ret;
|
||||
}
|
||||
|
||||
static bool vcpu_hfence_enqueue(struct kvm_vcpu *vcpu,
|
||||
const struct kvm_riscv_hfence *data)
|
||||
{
|
||||
bool ret = false;
|
||||
struct kvm_vcpu_arch *varch = &vcpu->arch;
|
||||
|
||||
spin_lock(&varch->hfence_lock);
|
||||
|
||||
if (!varch->hfence_queue[varch->hfence_tail].type) {
|
||||
memcpy(&varch->hfence_queue[varch->hfence_tail],
|
||||
data, sizeof(*data));
|
||||
|
||||
varch->hfence_tail++;
|
||||
if (varch->hfence_tail == KVM_RISCV_VCPU_MAX_HFENCE)
|
||||
varch->hfence_tail = 0;
|
||||
|
||||
ret = true;
|
||||
}
|
||||
|
||||
spin_unlock(&varch->hfence_lock);
|
||||
|
||||
return ret;
|
||||
}
|
||||
|
||||
void kvm_riscv_hfence_process(struct kvm_vcpu *vcpu)
|
||||
{
|
||||
struct kvm_riscv_hfence d = { 0 };
|
||||
struct kvm_vmid *v = &vcpu->kvm->arch.vmid;
|
||||
|
||||
while (vcpu_hfence_dequeue(vcpu, &d)) {
|
||||
switch (d.type) {
|
||||
case KVM_RISCV_HFENCE_UNKNOWN:
|
||||
break;
|
||||
case KVM_RISCV_HFENCE_GVMA_VMID_GPA:
|
||||
kvm_riscv_local_hfence_gvma_vmid_gpa(
|
||||
READ_ONCE(v->vmid),
|
||||
d.addr, d.size, d.order);
|
||||
break;
|
||||
case KVM_RISCV_HFENCE_VVMA_ASID_GVA:
|
||||
kvm_riscv_local_hfence_vvma_asid_gva(
|
||||
READ_ONCE(v->vmid), d.asid,
|
||||
d.addr, d.size, d.order);
|
||||
break;
|
||||
case KVM_RISCV_HFENCE_VVMA_ASID_ALL:
|
||||
kvm_riscv_local_hfence_vvma_asid_all(
|
||||
READ_ONCE(v->vmid), d.asid);
|
||||
break;
|
||||
case KVM_RISCV_HFENCE_VVMA_GVA:
|
||||
kvm_riscv_local_hfence_vvma_gva(
|
||||
READ_ONCE(v->vmid),
|
||||
d.addr, d.size, d.order);
|
||||
break;
|
||||
default:
|
||||
break;
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
static void make_xfence_request(struct kvm *kvm,
|
||||
unsigned long hbase, unsigned long hmask,
|
||||
unsigned int req, unsigned int fallback_req,
|
||||
const struct kvm_riscv_hfence *data)
|
||||
{
|
||||
unsigned long i;
|
||||
struct kvm_vcpu *vcpu;
|
||||
unsigned int actual_req = req;
|
||||
DECLARE_BITMAP(vcpu_mask, KVM_MAX_VCPUS);
|
||||
|
||||
bitmap_clear(vcpu_mask, 0, KVM_MAX_VCPUS);
|
||||
kvm_for_each_vcpu(i, vcpu, kvm) {
|
||||
if (hbase != -1UL) {
|
||||
if (vcpu->vcpu_id < hbase)
|
||||
continue;
|
||||
if (!(hmask & (1UL << (vcpu->vcpu_id - hbase))))
|
||||
continue;
|
||||
}
|
||||
|
||||
bitmap_set(vcpu_mask, i, 1);
|
||||
|
||||
if (!data || !data->type)
|
||||
continue;
|
||||
|
||||
/*
|
||||
* Enqueue hfence data to VCPU hfence queue. If we don't
|
||||
* have space in the VCPU hfence queue then fall back to
|
||||
* a more conservative hfence request.
|
||||
*/
|
||||
if (!vcpu_hfence_enqueue(vcpu, data))
|
||||
actual_req = fallback_req;
|
||||
}
|
||||
|
||||
kvm_make_vcpus_request_mask(kvm, actual_req, vcpu_mask);
|
||||
}
|
||||
|
||||
void kvm_riscv_fence_i(struct kvm *kvm,
|
||||
unsigned long hbase, unsigned long hmask)
|
||||
{
|
||||
make_xfence_request(kvm, hbase, hmask, KVM_REQ_FENCE_I,
|
||||
KVM_REQ_FENCE_I, NULL);
|
||||
}
|
||||
|
||||
void kvm_riscv_hfence_gvma_vmid_gpa(struct kvm *kvm,
|
||||
unsigned long hbase, unsigned long hmask,
|
||||
gpa_t gpa, gpa_t gpsz,
|
||||
unsigned long order)
|
||||
{
|
||||
struct kvm_riscv_hfence data;
|
||||
|
||||
data.type = KVM_RISCV_HFENCE_GVMA_VMID_GPA;
|
||||
data.asid = 0;
|
||||
data.addr = gpa;
|
||||
data.size = gpsz;
|
||||
data.order = order;
|
||||
make_xfence_request(kvm, hbase, hmask, KVM_REQ_HFENCE,
|
||||
KVM_REQ_HFENCE_GVMA_VMID_ALL, &data);
|
||||
}
|
||||
|
||||
void kvm_riscv_hfence_gvma_vmid_all(struct kvm *kvm,
|
||||
unsigned long hbase, unsigned long hmask)
|
||||
{
|
||||
make_xfence_request(kvm, hbase, hmask, KVM_REQ_HFENCE_GVMA_VMID_ALL,
|
||||
KVM_REQ_HFENCE_GVMA_VMID_ALL, NULL);
|
||||
}
|
||||
|
||||
void kvm_riscv_hfence_vvma_asid_gva(struct kvm *kvm,
|
||||
unsigned long hbase, unsigned long hmask,
|
||||
unsigned long gva, unsigned long gvsz,
|
||||
unsigned long order, unsigned long asid)
|
||||
{
|
||||
struct kvm_riscv_hfence data;
|
||||
|
||||
data.type = KVM_RISCV_HFENCE_VVMA_ASID_GVA;
|
||||
data.asid = asid;
|
||||
data.addr = gva;
|
||||
data.size = gvsz;
|
||||
data.order = order;
|
||||
make_xfence_request(kvm, hbase, hmask, KVM_REQ_HFENCE,
|
||||
KVM_REQ_HFENCE_VVMA_ALL, &data);
|
||||
}
|
||||
|
||||
void kvm_riscv_hfence_vvma_asid_all(struct kvm *kvm,
|
||||
unsigned long hbase, unsigned long hmask,
|
||||
unsigned long asid)
|
||||
{
|
||||
struct kvm_riscv_hfence data;
|
||||
|
||||
data.type = KVM_RISCV_HFENCE_VVMA_ASID_ALL;
|
||||
data.asid = asid;
|
||||
data.addr = data.size = data.order = 0;
|
||||
make_xfence_request(kvm, hbase, hmask, KVM_REQ_HFENCE,
|
||||
KVM_REQ_HFENCE_VVMA_ALL, &data);
|
||||
}
|
||||
|
||||
void kvm_riscv_hfence_vvma_gva(struct kvm *kvm,
|
||||
unsigned long hbase, unsigned long hmask,
|
||||
unsigned long gva, unsigned long gvsz,
|
||||
unsigned long order)
|
||||
{
|
||||
struct kvm_riscv_hfence data;
|
||||
|
||||
data.type = KVM_RISCV_HFENCE_VVMA_GVA;
|
||||
data.asid = 0;
|
||||
data.addr = gva;
|
||||
data.size = gvsz;
|
||||
data.order = order;
|
||||
make_xfence_request(kvm, hbase, hmask, KVM_REQ_HFENCE,
|
||||
KVM_REQ_HFENCE_VVMA_ALL, &data);
|
||||
}
|
||||
|
||||
void kvm_riscv_hfence_vvma_all(struct kvm *kvm,
|
||||
unsigned long hbase, unsigned long hmask)
|
||||
{
|
||||
make_xfence_request(kvm, hbase, hmask, KVM_REQ_HFENCE_VVMA_ALL,
|
||||
KVM_REQ_HFENCE_VVMA_ALL, NULL);
|
||||
}
|
@ -67,6 +67,8 @@ static void kvm_riscv_reset_vcpu(struct kvm_vcpu *vcpu)
|
||||
if (loaded)
|
||||
kvm_arch_vcpu_put(vcpu);
|
||||
|
||||
vcpu->arch.last_exit_cpu = -1;
|
||||
|
||||
memcpy(csr, reset_csr, sizeof(*csr));
|
||||
|
||||
memcpy(cntx, reset_cntx, sizeof(*cntx));
|
||||
@ -78,6 +80,10 @@ static void kvm_riscv_reset_vcpu(struct kvm_vcpu *vcpu)
|
||||
WRITE_ONCE(vcpu->arch.irqs_pending, 0);
|
||||
WRITE_ONCE(vcpu->arch.irqs_pending_mask, 0);
|
||||
|
||||
vcpu->arch.hfence_head = 0;
|
||||
vcpu->arch.hfence_tail = 0;
|
||||
memset(vcpu->arch.hfence_queue, 0, sizeof(vcpu->arch.hfence_queue));
|
||||
|
||||
/* Reset the guest CSRs for hotplug usecase */
|
||||
if (loaded)
|
||||
kvm_arch_vcpu_load(vcpu, smp_processor_id());
|
||||
@ -101,6 +107,9 @@ int kvm_arch_vcpu_create(struct kvm_vcpu *vcpu)
|
||||
/* Setup ISA features available to VCPU */
|
||||
vcpu->arch.isa = riscv_isa_extension_base(NULL) & KVM_RISCV_ISA_ALLOWED;
|
||||
|
||||
/* Setup VCPU hfence queue */
|
||||
spin_lock_init(&vcpu->arch.hfence_lock);
|
||||
|
||||
/* Setup reset state of shadow SSTATUS and HSTATUS CSRs */
|
||||
cntx = &vcpu->arch.guest_reset_context;
|
||||
cntx->sstatus = SR_SPP | SR_SPIE;
|
||||
@ -137,7 +146,7 @@ void kvm_arch_vcpu_destroy(struct kvm_vcpu *vcpu)
|
||||
/* Cleanup VCPU timer */
|
||||
kvm_riscv_vcpu_timer_deinit(vcpu);
|
||||
|
||||
/* Free unused pages pre-allocated for Stage2 page table mappings */
|
||||
/* Free unused pages pre-allocated for G-stage page table mappings */
|
||||
kvm_mmu_free_memory_cache(&vcpu->arch.mmu_page_cache);
|
||||
}
|
||||
|
||||
@ -365,6 +374,101 @@ static int kvm_riscv_vcpu_set_reg_csr(struct kvm_vcpu *vcpu,
|
||||
return 0;
|
||||
}
|
||||
|
||||
/* Mapping between KVM ISA Extension ID & Host ISA extension ID */
|
||||
static unsigned long kvm_isa_ext_arr[] = {
|
||||
RISCV_ISA_EXT_a,
|
||||
RISCV_ISA_EXT_c,
|
||||
RISCV_ISA_EXT_d,
|
||||
RISCV_ISA_EXT_f,
|
||||
RISCV_ISA_EXT_h,
|
||||
RISCV_ISA_EXT_i,
|
||||
RISCV_ISA_EXT_m,
|
||||
};
|
||||
|
||||
static int kvm_riscv_vcpu_get_reg_isa_ext(struct kvm_vcpu *vcpu,
|
||||
const struct kvm_one_reg *reg)
|
||||
{
|
||||
unsigned long __user *uaddr =
|
||||
(unsigned long __user *)(unsigned long)reg->addr;
|
||||
unsigned long reg_num = reg->id & ~(KVM_REG_ARCH_MASK |
|
||||
KVM_REG_SIZE_MASK |
|
||||
KVM_REG_RISCV_ISA_EXT);
|
||||
unsigned long reg_val = 0;
|
||||
unsigned long host_isa_ext;
|
||||
|
||||
if (KVM_REG_SIZE(reg->id) != sizeof(unsigned long))
|
||||
return -EINVAL;
|
||||
|
||||
if (reg_num >= KVM_RISCV_ISA_EXT_MAX || reg_num >= ARRAY_SIZE(kvm_isa_ext_arr))
|
||||
return -EINVAL;
|
||||
|
||||
host_isa_ext = kvm_isa_ext_arr[reg_num];
|
||||
if (__riscv_isa_extension_available(&vcpu->arch.isa, host_isa_ext))
|
||||
reg_val = 1; /* Mark the given extension as available */
|
||||
|
||||
if (copy_to_user(uaddr, &reg_val, KVM_REG_SIZE(reg->id)))
|
||||
return -EFAULT;
|
||||
|
||||
return 0;
|
||||
}
|
||||
|
||||
static int kvm_riscv_vcpu_set_reg_isa_ext(struct kvm_vcpu *vcpu,
|
||||
const struct kvm_one_reg *reg)
|
||||
{
|
||||
unsigned long __user *uaddr =
|
||||
(unsigned long __user *)(unsigned long)reg->addr;
|
||||
unsigned long reg_num = reg->id & ~(KVM_REG_ARCH_MASK |
|
||||
KVM_REG_SIZE_MASK |
|
||||
KVM_REG_RISCV_ISA_EXT);
|
||||
unsigned long reg_val;
|
||||
unsigned long host_isa_ext;
|
||||
unsigned long host_isa_ext_mask;
|
||||
|
||||
if (KVM_REG_SIZE(reg->id) != sizeof(unsigned long))
|
||||
return -EINVAL;
|
||||
|
||||
if (reg_num >= KVM_RISCV_ISA_EXT_MAX || reg_num >= ARRAY_SIZE(kvm_isa_ext_arr))
|
||||
return -EINVAL;
|
||||
|
||||
if (copy_from_user(&reg_val, uaddr, KVM_REG_SIZE(reg->id)))
|
||||
return -EFAULT;
|
||||
|
||||
host_isa_ext = kvm_isa_ext_arr[reg_num];
|
||||
if (!__riscv_isa_extension_available(NULL, host_isa_ext))
|
||||
return -EOPNOTSUPP;
|
||||
|
||||
if (host_isa_ext >= RISCV_ISA_EXT_BASE &&
|
||||
host_isa_ext < RISCV_ISA_EXT_MAX) {
|
||||
/*
|
||||
* Multi-letter ISA extension. Currently there is no provision
|
||||
* to enable/disable the multi-letter ISA extensions for guests.
|
||||
* Return success if the request is to enable any ISA extension
|
||||
* that is available in the hardware.
|
||||
* Return -EOPNOTSUPP otherwise.
|
||||
*/
|
||||
if (!reg_val)
|
||||
return -EOPNOTSUPP;
|
||||
else
|
||||
return 0;
|
||||
}
|
||||
|
||||
/* Single letter base ISA extension */
|
||||
if (!vcpu->arch.ran_atleast_once) {
|
||||
host_isa_ext_mask = BIT_MASK(host_isa_ext);
|
||||
if (!reg_val && (host_isa_ext_mask & KVM_RISCV_ISA_DISABLE_ALLOWED))
|
||||
vcpu->arch.isa &= ~host_isa_ext_mask;
|
||||
else
|
||||
vcpu->arch.isa |= host_isa_ext_mask;
|
||||
vcpu->arch.isa &= riscv_isa_extension_base(NULL);
|
||||
vcpu->arch.isa &= KVM_RISCV_ISA_ALLOWED;
|
||||
kvm_riscv_vcpu_fp_reset(vcpu);
|
||||
} else {
|
||||
return -EOPNOTSUPP;
|
||||
}
|
||||
|
||||
return 0;
|
||||
}
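For context, these ISA_EXT registers are exercised from userspace through the regular ONE_REG ioctls. A rough sketch follows; the reg-id composition mirrors the masking in the two handlers above but is an assumption here, not something this hunk spells out:

#include <linux/kvm.h>
#include <sys/ioctl.h>

/* Hypothetical helper: enable or disable one single-letter ISA extension on a
 * not-yet-run vcpu fd (the set handler above rejects changes after first run).
 * KVM_REG_SIZE_U64 matches sizeof(unsigned long) on RV64.
 */
static int set_isa_ext(int vcpu_fd, __u64 ext_id, unsigned long enable)
{
	unsigned long val = enable;
	struct kvm_one_reg reg = {
		.id   = KVM_REG_RISCV | KVM_REG_SIZE_U64 |
			KVM_REG_RISCV_ISA_EXT | ext_id,
		.addr = (unsigned long)&val,
	};

	return ioctl(vcpu_fd, KVM_SET_ONE_REG, &reg);
}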
|
||||
|
||||
static int kvm_riscv_vcpu_set_reg(struct kvm_vcpu *vcpu,
|
||||
const struct kvm_one_reg *reg)
|
||||
{
|
||||
@ -382,6 +486,8 @@ static int kvm_riscv_vcpu_set_reg(struct kvm_vcpu *vcpu,
|
||||
else if ((reg->id & KVM_REG_RISCV_TYPE_MASK) == KVM_REG_RISCV_FP_D)
|
||||
return kvm_riscv_vcpu_set_reg_fp(vcpu, reg,
|
||||
KVM_REG_RISCV_FP_D);
|
||||
else if ((reg->id & KVM_REG_RISCV_TYPE_MASK) == KVM_REG_RISCV_ISA_EXT)
|
||||
return kvm_riscv_vcpu_set_reg_isa_ext(vcpu, reg);
|
||||
|
||||
return -EINVAL;
|
||||
}
|
||||
@ -403,6 +509,8 @@ static int kvm_riscv_vcpu_get_reg(struct kvm_vcpu *vcpu,
|
||||
else if ((reg->id & KVM_REG_RISCV_TYPE_MASK) == KVM_REG_RISCV_FP_D)
|
||||
return kvm_riscv_vcpu_get_reg_fp(vcpu, reg,
|
||||
KVM_REG_RISCV_FP_D);
|
||||
else if ((reg->id & KVM_REG_RISCV_TYPE_MASK) == KVM_REG_RISCV_ISA_EXT)
|
||||
return kvm_riscv_vcpu_get_reg_isa_ext(vcpu, reg);
|
||||
|
||||
return -EINVAL;
|
||||
}
|
||||
@ -635,7 +743,7 @@ void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
|
||||
csr_write(CSR_HVIP, csr->hvip);
|
||||
csr_write(CSR_VSATP, csr->vsatp);
|
||||
|
||||
kvm_riscv_stage2_update_hgatp(vcpu);
|
||||
kvm_riscv_gstage_update_hgatp(vcpu);
|
||||
|
||||
kvm_riscv_vcpu_timer_restore(vcpu);
|
||||
|
||||
@ -690,10 +798,23 @@ static void kvm_riscv_check_vcpu_requests(struct kvm_vcpu *vcpu)
|
||||
kvm_riscv_reset_vcpu(vcpu);
|
||||
|
||||
if (kvm_check_request(KVM_REQ_UPDATE_HGATP, vcpu))
|
||||
kvm_riscv_stage2_update_hgatp(vcpu);
|
||||
kvm_riscv_gstage_update_hgatp(vcpu);
|
||||
|
||||
if (kvm_check_request(KVM_REQ_TLB_FLUSH, vcpu))
|
||||
__kvm_riscv_hfence_gvma_all();
|
||||
if (kvm_check_request(KVM_REQ_FENCE_I, vcpu))
|
||||
kvm_riscv_fence_i_process(vcpu);
|
||||
|
||||
/*
|
||||
* The generic KVM_REQ_TLB_FLUSH is the same as
|
||||
* KVM_REQ_HFENCE_GVMA_VMID_ALL
|
||||
*/
|
||||
if (kvm_check_request(KVM_REQ_HFENCE_GVMA_VMID_ALL, vcpu))
|
||||
kvm_riscv_hfence_gvma_vmid_all_process(vcpu);
|
||||
|
||||
if (kvm_check_request(KVM_REQ_HFENCE_VVMA_ALL, vcpu))
|
||||
kvm_riscv_hfence_vvma_all_process(vcpu);
|
||||
|
||||
if (kvm_check_request(KVM_REQ_HFENCE, vcpu))
|
||||
kvm_riscv_hfence_process(vcpu);
|
||||
}
|
||||
}
|
||||
|
||||
@ -715,6 +836,7 @@ static void noinstr kvm_riscv_vcpu_enter_exit(struct kvm_vcpu *vcpu)
|
||||
{
|
||||
guest_state_enter_irqoff();
|
||||
__kvm_riscv_switch_to(&vcpu->arch);
|
||||
vcpu->arch.last_exit_cpu = vcpu->cpu;
|
||||
guest_state_exit_irqoff();
|
||||
}
|
||||
|
||||
@ -762,7 +884,7 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu)
|
||||
/* Check conditions before entering the guest */
|
||||
cond_resched();
|
||||
|
||||
kvm_riscv_stage2_vmid_update(vcpu);
|
||||
kvm_riscv_gstage_vmid_update(vcpu);
|
||||
|
||||
kvm_riscv_check_vcpu_requests(vcpu);
|
||||
|
||||
@ -800,7 +922,7 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu)
|
||||
kvm_riscv_update_hvip(vcpu);
|
||||
|
||||
if (ret <= 0 ||
|
||||
kvm_riscv_stage2_vmid_ver_changed(&vcpu->kvm->arch.vmid) ||
|
||||
kvm_riscv_gstage_vmid_ver_changed(&vcpu->kvm->arch.vmid) ||
|
||||
kvm_request_pending(vcpu)) {
|
||||
vcpu->mode = OUTSIDE_GUEST_MODE;
|
||||
local_irq_enable();
|
||||
@ -809,6 +931,14 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu)
|
||||
continue;
|
||||
}
|
||||
|
||||
/*
|
||||
* Clean up stale TLB entries
|
||||
*
|
||||
* Note: This should be done after G-stage VMID has been
|
||||
* updated using kvm_riscv_gstage_vmid_ver_changed()
|
||||
*/
|
||||
kvm_riscv_local_tlb_sanitize(vcpu);
|
||||
|
||||
guest_timing_enter_irqoff();
|
||||
|
||||
kvm_riscv_vcpu_enter_exit(vcpu);
|
||||
|
@ -412,7 +412,7 @@ static int emulate_store(struct kvm_vcpu *vcpu, struct kvm_run *run,
|
||||
return 0;
|
||||
}
|
||||
|
||||
static int stage2_page_fault(struct kvm_vcpu *vcpu, struct kvm_run *run,
|
||||
static int gstage_page_fault(struct kvm_vcpu *vcpu, struct kvm_run *run,
|
||||
struct kvm_cpu_trap *trap)
|
||||
{
|
||||
struct kvm_memory_slot *memslot;
|
||||
@ -440,7 +440,7 @@ static int stage2_page_fault(struct kvm_vcpu *vcpu, struct kvm_run *run,
|
||||
};
|
||||
}
|
||||
|
||||
ret = kvm_riscv_stage2_map(vcpu, memslot, fault_addr, hva,
|
||||
ret = kvm_riscv_gstage_map(vcpu, memslot, fault_addr, hva,
|
||||
(trap->scause == EXC_STORE_GUEST_PAGE_FAULT) ? true : false);
|
||||
if (ret < 0)
|
||||
return ret;
|
||||
@ -686,7 +686,7 @@ int kvm_riscv_vcpu_exit(struct kvm_vcpu *vcpu, struct kvm_run *run,
|
||||
case EXC_LOAD_GUEST_PAGE_FAULT:
|
||||
case EXC_STORE_GUEST_PAGE_FAULT:
|
||||
if (vcpu->arch.guest_context.hstatus & HSTATUS_SPV)
|
||||
ret = stage2_page_fault(vcpu, run, trap);
|
||||
ret = gstage_page_fault(vcpu, run, trap);
|
||||
break;
|
||||
case EXC_SUPERVISOR_SYSCALL:
|
||||
if (vcpu->arch.guest_context.hstatus & HSTATUS_SPV)
|
||||
|
@ -81,43 +81,41 @@ static int kvm_sbi_ext_rfence_handler(struct kvm_vcpu *vcpu, struct kvm_run *run
|
||||
struct kvm_cpu_trap *utrap, bool *exit)
|
||||
{
|
||||
int ret = 0;
|
||||
unsigned long i;
|
||||
struct cpumask cm;
|
||||
struct kvm_vcpu *tmp;
|
||||
struct kvm_cpu_context *cp = &vcpu->arch.guest_context;
|
||||
unsigned long hmask = cp->a0;
|
||||
unsigned long hbase = cp->a1;
|
||||
unsigned long funcid = cp->a6;
|
||||
|
||||
cpumask_clear(&cm);
|
||||
kvm_for_each_vcpu(i, tmp, vcpu->kvm) {
|
||||
if (hbase != -1UL) {
|
||||
if (tmp->vcpu_id < hbase)
|
||||
continue;
|
||||
if (!(hmask & (1UL << (tmp->vcpu_id - hbase))))
|
||||
continue;
|
||||
}
|
||||
if (tmp->cpu < 0)
|
||||
continue;
|
||||
cpumask_set_cpu(tmp->cpu, &cm);
|
||||
}
|
||||
|
||||
switch (funcid) {
|
||||
case SBI_EXT_RFENCE_REMOTE_FENCE_I:
|
||||
ret = sbi_remote_fence_i(&cm);
|
||||
kvm_riscv_fence_i(vcpu->kvm, hbase, hmask);
|
||||
break;
|
||||
case SBI_EXT_RFENCE_REMOTE_SFENCE_VMA:
|
||||
ret = sbi_remote_hfence_vvma(&cm, cp->a2, cp->a3);
|
||||
if (cp->a2 == 0 && cp->a3 == 0)
|
||||
kvm_riscv_hfence_vvma_all(vcpu->kvm, hbase, hmask);
|
||||
else
|
||||
kvm_riscv_hfence_vvma_gva(vcpu->kvm, hbase, hmask,
|
||||
cp->a2, cp->a3, PAGE_SHIFT);
|
||||
break;
|
||||
case SBI_EXT_RFENCE_REMOTE_SFENCE_VMA_ASID:
|
||||
ret = sbi_remote_hfence_vvma_asid(&cm, cp->a2,
|
||||
cp->a3, cp->a4);
|
||||
if (cp->a2 == 0 && cp->a3 == 0)
|
||||
kvm_riscv_hfence_vvma_asid_all(vcpu->kvm,
|
||||
hbase, hmask, cp->a4);
|
||||
else
|
||||
kvm_riscv_hfence_vvma_asid_gva(vcpu->kvm,
|
||||
hbase, hmask,
|
||||
cp->a2, cp->a3,
|
||||
PAGE_SHIFT, cp->a4);
|
||||
break;
|
||||
case SBI_EXT_RFENCE_REMOTE_HFENCE_GVMA:
|
||||
case SBI_EXT_RFENCE_REMOTE_HFENCE_GVMA_VMID:
|
||||
case SBI_EXT_RFENCE_REMOTE_HFENCE_VVMA:
|
||||
case SBI_EXT_RFENCE_REMOTE_HFENCE_VVMA_ASID:
|
||||
/* TODO: implement for nested hypervisor case */
|
||||
/*
|
||||
* Until nested virtualization is implemented, the
|
||||
* SBI HFENCE calls should be treated as NOPs
|
||||
*/
|
||||
break;
|
||||
default:
|
||||
ret = -EOPNOTSUPP;
|
||||
}
|
||||
|
@ -23,7 +23,6 @@ static int kvm_sbi_ext_v01_handler(struct kvm_vcpu *vcpu, struct kvm_run *run,
|
||||
int i, ret = 0;
|
||||
u64 next_cycle;
|
||||
struct kvm_vcpu *rvcpu;
|
||||
struct cpumask cm;
|
||||
struct kvm *kvm = vcpu->kvm;
|
||||
struct kvm_cpu_context *cp = &vcpu->arch.guest_context;
|
||||
|
||||
@ -80,19 +79,29 @@ static int kvm_sbi_ext_v01_handler(struct kvm_vcpu *vcpu, struct kvm_run *run,
|
||||
if (utrap->scause)
|
||||
break;
|
||||
|
||||
cpumask_clear(&cm);
|
||||
for_each_set_bit(i, &hmask, BITS_PER_LONG) {
|
||||
rvcpu = kvm_get_vcpu_by_id(vcpu->kvm, i);
|
||||
if (rvcpu->cpu < 0)
|
||||
continue;
|
||||
cpumask_set_cpu(rvcpu->cpu, &cm);
|
||||
}
|
||||
if (cp->a7 == SBI_EXT_0_1_REMOTE_FENCE_I)
|
||||
ret = sbi_remote_fence_i(&cm);
|
||||
else if (cp->a7 == SBI_EXT_0_1_REMOTE_SFENCE_VMA)
|
||||
ret = sbi_remote_hfence_vvma(&cm, cp->a1, cp->a2);
|
||||
else
|
||||
ret = sbi_remote_hfence_vvma_asid(&cm, cp->a1, cp->a2, cp->a3);
|
||||
kvm_riscv_fence_i(vcpu->kvm, 0, hmask);
|
||||
else if (cp->a7 == SBI_EXT_0_1_REMOTE_SFENCE_VMA) {
|
||||
if (cp->a1 == 0 && cp->a2 == 0)
|
||||
kvm_riscv_hfence_vvma_all(vcpu->kvm,
|
||||
0, hmask);
|
||||
else
|
||||
kvm_riscv_hfence_vvma_gva(vcpu->kvm,
|
||||
0, hmask,
|
||||
cp->a1, cp->a2,
|
||||
PAGE_SHIFT);
|
||||
} else {
|
||||
if (cp->a1 == 0 && cp->a2 == 0)
|
||||
kvm_riscv_hfence_vvma_asid_all(vcpu->kvm,
|
||||
0, hmask,
|
||||
cp->a3);
|
||||
else
|
||||
kvm_riscv_hfence_vvma_asid_gva(vcpu->kvm,
|
||||
0, hmask,
|
||||
cp->a1, cp->a2,
|
||||
PAGE_SHIFT,
|
||||
cp->a3);
|
||||
}
|
||||
break;
|
||||
default:
|
||||
ret = -EINVAL;
|
||||
|
@ -31,13 +31,13 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
|
||||
{
|
||||
int r;
|
||||
|
||||
r = kvm_riscv_stage2_alloc_pgd(kvm);
|
||||
r = kvm_riscv_gstage_alloc_pgd(kvm);
|
||||
if (r)
|
||||
return r;
|
||||
|
||||
r = kvm_riscv_stage2_vmid_init(kvm);
|
||||
r = kvm_riscv_gstage_vmid_init(kvm);
|
||||
if (r) {
|
||||
kvm_riscv_stage2_free_pgd(kvm);
|
||||
kvm_riscv_gstage_free_pgd(kvm);
|
||||
return r;
|
||||
}
|
||||
|
||||
@ -75,7 +75,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
|
||||
r = KVM_USER_MEM_SLOTS;
|
||||
break;
|
||||
case KVM_CAP_VM_GPA_BITS:
|
||||
r = kvm_riscv_stage2_gpa_bits();
|
||||
r = kvm_riscv_gstage_gpa_bits();
|
||||
break;
|
||||
default:
|
||||
r = 0;
|
||||
|
@ -11,16 +11,16 @@
|
||||
#include <linux/errno.h>
|
||||
#include <linux/err.h>
|
||||
#include <linux/module.h>
|
||||
#include <linux/smp.h>
|
||||
#include <linux/kvm_host.h>
|
||||
#include <asm/csr.h>
|
||||
#include <asm/sbi.h>
|
||||
|
||||
static unsigned long vmid_version = 1;
|
||||
static unsigned long vmid_next;
|
||||
static unsigned long vmid_bits;
|
||||
static DEFINE_SPINLOCK(vmid_lock);
|
||||
|
||||
void kvm_riscv_stage2_vmid_detect(void)
|
||||
void kvm_riscv_gstage_vmid_detect(void)
|
||||
{
|
||||
unsigned long old;
|
||||
|
||||
@ -33,19 +33,19 @@ void kvm_riscv_stage2_vmid_detect(void)
|
||||
csr_write(CSR_HGATP, old);
|
||||
|
||||
/* We polluted local TLB so flush all guest TLB */
|
||||
__kvm_riscv_hfence_gvma_all();
|
||||
kvm_riscv_local_hfence_gvma_all();
|
||||
|
||||
/* We don't use VMID bits if they are not sufficient */
|
||||
if ((1UL << vmid_bits) < num_possible_cpus())
|
||||
vmid_bits = 0;
|
||||
}
|
||||
|
||||
unsigned long kvm_riscv_stage2_vmid_bits(void)
|
||||
unsigned long kvm_riscv_gstage_vmid_bits(void)
|
||||
{
|
||||
return vmid_bits;
|
||||
}
|
||||
|
||||
int kvm_riscv_stage2_vmid_init(struct kvm *kvm)
|
||||
int kvm_riscv_gstage_vmid_init(struct kvm *kvm)
|
||||
{
|
||||
/* Mark the initial VMID and VMID version invalid */
|
||||
kvm->arch.vmid.vmid_version = 0;
|
||||
@ -54,7 +54,7 @@ int kvm_riscv_stage2_vmid_init(struct kvm *kvm)
|
||||
return 0;
|
||||
}
|
||||
|
||||
bool kvm_riscv_stage2_vmid_ver_changed(struct kvm_vmid *vmid)
|
||||
bool kvm_riscv_gstage_vmid_ver_changed(struct kvm_vmid *vmid)
|
||||
{
|
||||
if (!vmid_bits)
|
||||
return false;
|
||||
@ -63,13 +63,18 @@ bool kvm_riscv_stage2_vmid_ver_changed(struct kvm_vmid *vmid)
|
||||
READ_ONCE(vmid_version));
|
||||
}
|
||||
|
||||
void kvm_riscv_stage2_vmid_update(struct kvm_vcpu *vcpu)
|
||||
static void __local_hfence_gvma_all(void *info)
|
||||
{
|
||||
kvm_riscv_local_hfence_gvma_all();
|
||||
}
|
||||
|
||||
void kvm_riscv_gstage_vmid_update(struct kvm_vcpu *vcpu)
|
||||
{
|
||||
unsigned long i;
|
||||
struct kvm_vcpu *v;
|
||||
struct kvm_vmid *vmid = &vcpu->kvm->arch.vmid;
|
||||
|
||||
if (!kvm_riscv_stage2_vmid_ver_changed(vmid))
|
||||
if (!kvm_riscv_gstage_vmid_ver_changed(vmid))
|
||||
return;
|
||||
|
||||
spin_lock(&vmid_lock);
|
||||
@ -78,7 +83,7 @@ void kvm_riscv_stage2_vmid_update(struct kvm_vcpu *vcpu)
|
||||
* We need to re-check the vmid_version here to see whether
* another vcpu has already allocated a valid vmid for this vm.
|
||||
*/
|
||||
if (!kvm_riscv_stage2_vmid_ver_changed(vmid)) {
|
||||
if (!kvm_riscv_gstage_vmid_ver_changed(vmid)) {
|
||||
spin_unlock(&vmid_lock);
|
||||
return;
|
||||
}
|
||||
@ -96,12 +101,13 @@ void kvm_riscv_stage2_vmid_update(struct kvm_vcpu *vcpu)
|
||||
* instances is invalid and we have to force VMID re-assignment
* for all Guest instances. The Guest instances that were not
* running will automatically pick up new VMIDs because they will
* call kvm_riscv_stage2_vmid_update() whenever they enter
* call kvm_riscv_gstage_vmid_update() whenever they enter
* the in-kernel run loop. For Guest instances that are already
|
||||
* running, we force VM exits on all host CPUs using IPI and
|
||||
* flush all Guest TLBs.
|
||||
*/
|
||||
sbi_remote_hfence_gvma(cpu_online_mask, 0, 0);
|
||||
on_each_cpu_mask(cpu_online_mask, __local_hfence_gvma_all,
|
||||
NULL, 1);
|
||||
}
|
||||
|
||||
vmid->vmid = vmid_next;
|
||||
@ -112,7 +118,7 @@ void kvm_riscv_stage2_vmid_update(struct kvm_vcpu *vcpu)
|
||||
|
||||
spin_unlock(&vmid_lock);
|
||||
|
||||
/* Request stage2 page table update for all VCPUs */
|
||||
/* Request G-stage page table update for all VCPUs */
|
||||
kvm_for_each_vcpu(i, v, vcpu->kvm)
|
||||
kvm_make_request(KVM_REQ_UPDATE_HGATP, v);
|
||||
}
|
||||
|
@ -2,7 +2,7 @@
|
||||
/*
|
||||
* Ultravisor Interfaces
|
||||
*
|
||||
* Copyright IBM Corp. 2019
|
||||
* Copyright IBM Corp. 2019, 2022
|
||||
*
|
||||
* Author(s):
|
||||
* Vasily Gorbik <gor@linux.ibm.com>
|
||||
@ -52,6 +52,7 @@
|
||||
#define UVC_CMD_UNPIN_PAGE_SHARED 0x0342
|
||||
#define UVC_CMD_SET_SHARED_ACCESS 0x1000
|
||||
#define UVC_CMD_REMOVE_SHARED_ACCESS 0x1001
|
||||
#define UVC_CMD_RETR_ATTEST 0x1020
|
||||
|
||||
/* Bits in installed uv calls */
|
||||
enum uv_cmds_inst {
|
||||
@ -76,6 +77,7 @@ enum uv_cmds_inst {
|
||||
BIT_UVC_CMD_UNSHARE_ALL = 20,
|
||||
BIT_UVC_CMD_PIN_PAGE_SHARED = 21,
|
||||
BIT_UVC_CMD_UNPIN_PAGE_SHARED = 22,
|
||||
BIT_UVC_CMD_RETR_ATTEST = 28,
|
||||
};
|
||||
|
||||
enum uv_feat_ind {
|
||||
@ -219,6 +221,25 @@ struct uv_cb_share {
|
||||
u64 reserved28;
|
||||
} __packed __aligned(8);
|
||||
|
||||
/* Retrieve Attestation Measurement */
|
||||
struct uv_cb_attest {
|
||||
struct uv_cb_header header; /* 0x0000 */
|
||||
u64 reserved08[2]; /* 0x0008 */
|
||||
u64 arcb_addr; /* 0x0018 */
|
||||
u64 cont_token; /* 0x0020 */
|
||||
u8 reserved28[6]; /* 0x0028 */
|
||||
u16 user_data_len; /* 0x002e */
|
||||
u8 user_data[256]; /* 0x0030 */
|
||||
u32 reserved130[3]; /* 0x0130 */
|
||||
u32 meas_len; /* 0x013c */
|
||||
u64 meas_addr; /* 0x0140 */
|
||||
u8 config_uid[16]; /* 0x0148 */
|
||||
u32 reserved158; /* 0x0158 */
|
||||
u32 add_data_len; /* 0x015c */
|
||||
u64 add_data_addr; /* 0x0160 */
|
||||
u64 reserved168[4]; /* 0x0168 */
|
||||
} __packed __aligned(8);
|
||||
|
||||
static inline int __uv_call(unsigned long r1, unsigned long r2)
|
||||
{
|
||||
int cc;
|
||||
|
arch/s390/include/uapi/asm/uvdevice.h (new file, 51 lines added)
@@ -0,0 +1,51 @@
|
||||
/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
|
||||
/*
|
||||
* Copyright IBM Corp. 2022
|
||||
* Author(s): Steffen Eiden <seiden@linux.ibm.com>
|
||||
*/
|
||||
#ifndef __S390_ASM_UVDEVICE_H
|
||||
#define __S390_ASM_UVDEVICE_H
|
||||
|
||||
#include <linux/types.h>
|
||||
|
||||
struct uvio_ioctl_cb {
|
||||
__u32 flags;
|
||||
__u16 uv_rc; /* UV header rc value */
|
||||
__u16 uv_rrc; /* UV header rrc value */
|
||||
__u64 argument_addr; /* Userspace address of uvio argument */
|
||||
__u32 argument_len;
|
||||
__u8 reserved14[0x40 - 0x14]; /* must be zero */
|
||||
};
|
||||
|
||||
#define UVIO_ATT_USER_DATA_LEN 0x100
|
||||
#define UVIO_ATT_UID_LEN 0x10
|
||||
struct uvio_attest {
|
||||
__u64 arcb_addr; /* 0x0000 */
|
||||
__u64 meas_addr; /* 0x0008 */
|
||||
__u64 add_data_addr; /* 0x0010 */
|
||||
__u8 user_data[UVIO_ATT_USER_DATA_LEN]; /* 0x0018 */
|
||||
__u8 config_uid[UVIO_ATT_UID_LEN]; /* 0x0118 */
|
||||
__u32 arcb_len; /* 0x0128 */
|
||||
__u32 meas_len; /* 0x012c */
|
||||
__u32 add_data_len; /* 0x0130 */
|
||||
__u16 user_data_len; /* 0x0134 */
|
||||
__u16 reserved136; /* 0x0136 */
|
||||
};
|
||||
|
||||
/*
|
||||
* The following max values define an upper length for the IOCTL in/out buffers.
|
||||
* However, they do not represent the maximum the Ultravisor allows which is
|
||||
* often way smaller. By allowing larger buffer sizes we hopefully do not need
|
||||
* to update the code with every machine update. It is therefore possible for
|
||||
* userspace to request more memory than actually used by kernel/UV.
|
||||
*/
|
||||
#define UVIO_ATT_ARCB_MAX_LEN 0x100000
|
||||
#define UVIO_ATT_MEASUREMENT_MAX_LEN 0x8000
|
||||
#define UVIO_ATT_ADDITIONAL_MAX_LEN 0x8000
|
||||
|
||||
#define UVIO_DEVICE_NAME "uv"
|
||||
#define UVIO_TYPE_UVC 'u'
|
||||
|
||||
#define UVIO_IOCTL_ATT _IOWR(UVIO_TYPE_UVC, 0x01, struct uvio_ioctl_cb)
|
||||
|
||||
#endif /* __S390_ASM_UVDEVICE_H */
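To show how this uapi is meant to be driven, here is a rough userspace sketch. The /dev/uv node name is inferred from UVIO_DEVICE_NAME, and the overall flow is an assumption based on the fields above rather than anything documented in this header:

#include <fcntl.h>
#include <string.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <asm/uvdevice.h>

/* Hypothetical sketch: submit one retrieve-attestation request. */
static int uv_attest(struct uvio_attest *att)
{
	struct uvio_ioctl_cb cb;
	int fd, rc;

	fd = open("/dev/uv", O_RDWR);		/* assumed node for UVIO_DEVICE_NAME */
	if (fd < 0)
		return -1;

	memset(&cb, 0, sizeof(cb));		/* reserved14 must be zero */
	cb.argument_addr = (__u64)(unsigned long)att;
	cb.argument_len = sizeof(*att);

	rc = ioctl(fd, UVIO_IOCTL_ATT, &cb);
	/* on success, cb.uv_rc / cb.uv_rrc carry the Ultravisor return codes */

	close(fd);
	return rc;
}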
|
@ -491,8 +491,8 @@ enum prot_type {
|
||||
PROT_TYPE_IEP = 4,
|
||||
};
|
||||
|
||||
static int trans_exc(struct kvm_vcpu *vcpu, int code, unsigned long gva,
|
||||
u8 ar, enum gacc_mode mode, enum prot_type prot)
|
||||
static int trans_exc_ending(struct kvm_vcpu *vcpu, int code, unsigned long gva, u8 ar,
|
||||
enum gacc_mode mode, enum prot_type prot, bool terminate)
|
||||
{
|
||||
struct kvm_s390_pgm_info *pgm = &vcpu->arch.pgm;
|
||||
struct trans_exc_code_bits *tec;
|
||||
@ -520,6 +520,11 @@ static int trans_exc(struct kvm_vcpu *vcpu, int code, unsigned long gva,
|
||||
tec->b61 = 1;
|
||||
break;
|
||||
}
|
||||
if (terminate) {
|
||||
tec->b56 = 0;
|
||||
tec->b60 = 0;
|
||||
tec->b61 = 0;
|
||||
}
|
||||
fallthrough;
|
||||
case PGM_ASCE_TYPE:
|
||||
case PGM_PAGE_TRANSLATION:
|
||||
@ -552,6 +557,12 @@ static int trans_exc(struct kvm_vcpu *vcpu, int code, unsigned long gva,
|
||||
return code;
|
||||
}
|
||||
|
||||
static int trans_exc(struct kvm_vcpu *vcpu, int code, unsigned long gva, u8 ar,
|
||||
enum gacc_mode mode, enum prot_type prot)
|
||||
{
|
||||
return trans_exc_ending(vcpu, code, gva, ar, mode, prot, false);
|
||||
}
|
||||
|
||||
static int get_vcpu_asce(struct kvm_vcpu *vcpu, union asce *asce,
|
||||
unsigned long ga, u8 ar, enum gacc_mode mode)
|
||||
{
|
||||
@ -1109,8 +1120,11 @@ int access_guest_with_key(struct kvm_vcpu *vcpu, unsigned long ga, u8 ar,
|
||||
data += fragment_len;
|
||||
ga = kvm_s390_logical_to_effective(vcpu, ga + fragment_len);
|
||||
}
|
||||
if (rc > 0)
|
||||
rc = trans_exc(vcpu, rc, ga, ar, mode, prot);
|
||||
if (rc > 0) {
|
||||
bool terminate = (mode == GACC_STORE) && (idx > 0);
|
||||
|
||||
rc = trans_exc_ending(vcpu, rc, ga, ar, mode, prot, terminate);
|
||||
}
|
||||
out_unlock:
|
||||
if (need_ipte_lock)
|
||||
ipte_unlock(vcpu);
|
||||
|
@ -407,6 +407,7 @@
|
||||
#define X86_FEATURE_SEV (19*32+ 1) /* AMD Secure Encrypted Virtualization */
|
||||
#define X86_FEATURE_VM_PAGE_FLUSH (19*32+ 2) /* "" VM Page Flush MSR is supported */
|
||||
#define X86_FEATURE_SEV_ES (19*32+ 3) /* AMD Secure Encrypted Virtualization - Encrypted State */
|
||||
#define X86_FEATURE_V_TSC_AUX (19*32+ 9) /* "" Virtual TSC_AUX */
|
||||
#define X86_FEATURE_SME_COHERENT (19*32+10) /* "" AMD hardware-enforced cache coherency */
|
||||
|
||||
/*
|
||||
|
@ -127,6 +127,7 @@ KVM_X86_OP_OPTIONAL(migrate_timers)
|
||||
KVM_X86_OP(msr_filter_changed)
|
||||
KVM_X86_OP(complete_emulated_msr)
|
||||
KVM_X86_OP(vcpu_deliver_sipi_vector)
|
||||
KVM_X86_OP_OPTIONAL_RET0(vcpu_get_apicv_inhibit_reasons);
|
||||
|
||||
#undef KVM_X86_OP
|
||||
#undef KVM_X86_OP_OPTIONAL
|
||||
|
arch/x86/include/asm/kvm-x86-pmu-ops.h (new file, 31 lines)
@@ -0,0 +1,31 @@
|
||||
/* SPDX-License-Identifier: GPL-2.0 */
|
||||
#if !defined(KVM_X86_PMU_OP) || !defined(KVM_X86_PMU_OP_OPTIONAL)
|
||||
BUILD_BUG_ON(1)
|
||||
#endif
|
||||
|
||||
/*
|
||||
* KVM_X86_PMU_OP() and KVM_X86_PMU_OP_OPTIONAL() are used to help generate
|
||||
* both DECLARE/DEFINE_STATIC_CALL() invocations and
|
||||
* "static_call_update()" calls.
|
||||
*
|
||||
* KVM_X86_PMU_OP_OPTIONAL() can be used for those functions that can have
|
||||
* a NULL definition, for example if "static_call_cond()" will be used
|
||||
* at the call sites.
|
||||
*/
|
||||
KVM_X86_PMU_OP(pmc_perf_hw_id)
|
||||
KVM_X86_PMU_OP(pmc_is_enabled)
|
||||
KVM_X86_PMU_OP(pmc_idx_to_pmc)
|
||||
KVM_X86_PMU_OP(rdpmc_ecx_to_pmc)
|
||||
KVM_X86_PMU_OP(msr_idx_to_pmc)
|
||||
KVM_X86_PMU_OP(is_valid_rdpmc_ecx)
|
||||
KVM_X86_PMU_OP(is_valid_msr)
|
||||
KVM_X86_PMU_OP(get_msr)
|
||||
KVM_X86_PMU_OP(set_msr)
|
||||
KVM_X86_PMU_OP(refresh)
|
||||
KVM_X86_PMU_OP(init)
|
||||
KVM_X86_PMU_OP(reset)
|
||||
KVM_X86_PMU_OP_OPTIONAL(deliver_pmi)
|
||||
KVM_X86_PMU_OP_OPTIONAL(cleanup)
|
||||
|
||||
#undef KVM_X86_PMU_OP
|
||||
#undef KVM_X86_PMU_OP_OPTIONAL
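The intended consumption pattern mirrors the existing kvm-x86-ops.h scheme. The snippet below is an illustrative sketch of the declaration side, not a copy of the declarations used elsewhere in this series (it assumes linux/static_call.h is available in the including header):

/* Declaration side: one static call per PMU hook. The header #undefs
 * both macros at its end, so nothing needs to be cleaned up here. */
#define KVM_X86_PMU_OP(func) \
	DECLARE_STATIC_CALL(kvm_x86_pmu_##func, *(((struct kvm_pmu_ops *)0)->func));
#define KVM_X86_PMU_OP_OPTIONAL KVM_X86_PMU_OP
#include <asm/kvm-x86-pmu-ops.h>

/* Call sites then use static_call(kvm_x86_pmu_<func>)(...), or
 * static_call_cond() for the hooks declared as optional. */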
|
@ -281,11 +281,11 @@ struct kvm_kernel_irq_routing_entry;
|
||||
/*
|
||||
* kvm_mmu_page_role tracks the properties of a shadow page (where shadow page
|
||||
* also includes TDP pages) to determine whether or not a page can be used in
|
||||
* the given MMU context. This is a subset of the overall kvm_mmu_role to
|
||||
* the given MMU context. This is a subset of the overall kvm_cpu_role to
|
||||
* minimize the size of kvm_memory_slot.arch.gfn_track, i.e. allows allocating
|
||||
* 2 bytes per gfn instead of 4 bytes per gfn.
|
||||
*
|
||||
* Indirect upper-level shadow pages are tracked for write-protection via
|
||||
* Upper-level shadow pages having gptes are tracked for write-protection via
|
||||
* gfn_track. As above, gfn_track is a 16 bit counter, so KVM must not create
|
||||
* more than 2^16-1 upper-level shadow pages at a single gfn, otherwise
|
||||
* gfn_track will overflow and explosions will ensue.
|
||||
@ -331,7 +331,8 @@ union kvm_mmu_page_role {
|
||||
unsigned smap_andnot_wp:1;
|
||||
unsigned ad_disabled:1;
|
||||
unsigned guest_mode:1;
|
||||
unsigned :6;
|
||||
unsigned passthrough:1;
|
||||
unsigned :5;
|
||||
|
||||
/*
|
||||
* This is left at the top of the word so that
|
||||
@ -367,8 +368,6 @@ union kvm_mmu_extended_role {
|
||||
struct {
|
||||
unsigned int valid:1;
|
||||
unsigned int execonly:1;
|
||||
unsigned int cr0_pg:1;
|
||||
unsigned int cr4_pae:1;
|
||||
unsigned int cr4_pse:1;
|
||||
unsigned int cr4_pke:1;
|
||||
unsigned int cr4_smap:1;
|
||||
@ -378,7 +377,7 @@ union kvm_mmu_extended_role {
|
||||
};
|
||||
};
|
||||
|
||||
union kvm_mmu_role {
|
||||
union kvm_cpu_role {
|
||||
u64 as_u64;
|
||||
struct {
|
||||
union kvm_mmu_page_role base;
|
||||
@ -438,19 +437,8 @@ struct kvm_mmu {
|
||||
struct kvm_mmu_page *sp);
|
||||
void (*invlpg)(struct kvm_vcpu *vcpu, gva_t gva, hpa_t root_hpa);
|
||||
struct kvm_mmu_root_info root;
|
||||
union kvm_mmu_role mmu_role;
|
||||
u8 root_level;
|
||||
u8 shadow_root_level;
|
||||
u8 ept_ad;
|
||||
bool direct_map;
|
||||
struct kvm_mmu_root_info prev_roots[KVM_MMU_NUM_PREV_ROOTS];
|
||||
|
||||
/*
|
||||
* Bitmap; bit set = permission fault
|
||||
* Byte index: page fault error code [4:1]
|
||||
* Bit index: pte permissions in ACC_* format
|
||||
*/
|
||||
u8 permissions[16];
|
||||
union kvm_cpu_role cpu_role;
|
||||
union kvm_mmu_page_role root_role;
|
||||
|
||||
/*
|
||||
* The pkru_mask indicates if protection key checks are needed. It
|
||||
@ -460,6 +448,15 @@ struct kvm_mmu {
|
||||
*/
|
||||
u32 pkru_mask;
|
||||
|
||||
struct kvm_mmu_root_info prev_roots[KVM_MMU_NUM_PREV_ROOTS];
|
||||
|
||||
/*
|
||||
* Bitmap; bit set = permission fault
|
||||
* Byte index: page fault error code [4:1]
|
||||
* Bit index: pte permissions in ACC_* format
|
||||
*/
|
||||
u8 permissions[16];
|
||||
|
||||
u64 *pae_root;
|
||||
u64 *pml4_root;
|
||||
u64 *pml5_root;
|
||||
@ -607,16 +604,21 @@ struct kvm_vcpu_hv {
|
||||
struct kvm_vcpu_xen {
|
||||
u64 hypercall_rip;
|
||||
u32 current_runstate;
|
||||
bool vcpu_info_set;
|
||||
bool vcpu_time_info_set;
|
||||
bool runstate_set;
|
||||
struct gfn_to_hva_cache vcpu_info_cache;
|
||||
struct gfn_to_hva_cache vcpu_time_info_cache;
|
||||
struct gfn_to_hva_cache runstate_cache;
|
||||
u8 upcall_vector;
|
||||
struct gfn_to_pfn_cache vcpu_info_cache;
|
||||
struct gfn_to_pfn_cache vcpu_time_info_cache;
|
||||
struct gfn_to_pfn_cache runstate_cache;
|
||||
u64 last_steal;
|
||||
u64 runstate_entry_time;
|
||||
u64 runstate_times[4];
|
||||
unsigned long evtchn_pending_sel;
|
||||
u32 vcpu_id; /* The Xen / ACPI vCPU ID */
|
||||
u32 timer_virq;
|
||||
u64 timer_expires; /* In guest epoch */
|
||||
atomic_t timer_pending;
|
||||
struct hrtimer timer;
|
||||
int poll_evtchn;
|
||||
struct timer_list poll_timer;
|
||||
};
|
||||
|
||||
struct kvm_vcpu_arch {
|
||||
@ -753,8 +755,7 @@ struct kvm_vcpu_arch {
|
||||
gpa_t time;
|
||||
struct pvclock_vcpu_time_info hv_clock;
|
||||
unsigned int hw_tsc_khz;
|
||||
struct gfn_to_hva_cache pv_time;
|
||||
bool pv_time_enabled;
|
||||
struct gfn_to_pfn_cache pv_time;
|
||||
/* set guest stopped flag in pvclock flags field */
|
||||
bool pvclock_set_guest_stopped_request;
|
||||
|
||||
@ -1024,9 +1025,12 @@ struct msr_bitmap_range {
|
||||
|
||||
/* Xen emulation context */
|
||||
struct kvm_xen {
|
||||
u32 xen_version;
|
||||
bool long_mode;
|
||||
u8 upcall_vector;
|
||||
struct gfn_to_pfn_cache shinfo_cache;
|
||||
struct idr evtchn_ports;
|
||||
unsigned long poll_mask[BITS_TO_LONGS(KVM_MAX_VCPUS)];
|
||||
};
|
||||
|
||||
enum kvm_irqchip_mode {
|
||||
@ -1119,6 +1123,8 @@ struct kvm_arch {
|
||||
u64 cur_tsc_generation;
|
||||
int nr_vcpus_matched_tsc;
|
||||
|
||||
u32 default_tsc_khz;
|
||||
|
||||
seqcount_raw_spinlock_t pvclock_sc;
|
||||
bool use_master_clock;
|
||||
u64 master_kernel_ns;
|
||||
@ -1263,7 +1269,12 @@ struct kvm_vm_stat {
|
||||
|
||||
struct kvm_vcpu_stat {
|
||||
struct kvm_vcpu_stat_generic generic;
|
||||
u64 pf_taken;
|
||||
u64 pf_fixed;
|
||||
u64 pf_emulate;
|
||||
u64 pf_spurious;
|
||||
u64 pf_fast;
|
||||
u64 pf_mmio_spte_created;
|
||||
u64 pf_guest;
|
||||
u64 tlb_flush;
|
||||
u64 invlpg;
|
||||
@ -1455,8 +1466,6 @@ struct kvm_x86_ops {
|
||||
int cpu_dirty_log_size;
|
||||
void (*update_cpu_dirty_logging)(struct kvm_vcpu *vcpu);
|
||||
|
||||
/* pmu operations of sub-arch */
|
||||
const struct kvm_pmu_ops *pmu_ops;
|
||||
const struct kvm_x86_nested_ops *nested_ops;
|
||||
|
||||
void (*vcpu_blocking)(struct kvm_vcpu *vcpu);
|
||||
@ -1499,11 +1508,18 @@ struct kvm_x86_ops {
|
||||
int (*complete_emulated_msr)(struct kvm_vcpu *vcpu, int err);
|
||||
|
||||
void (*vcpu_deliver_sipi_vector)(struct kvm_vcpu *vcpu, u8 vector);
|
||||
|
||||
/*
|
||||
* Returns vCPU specific APICv inhibit reasons
|
||||
*/
|
||||
unsigned long (*vcpu_get_apicv_inhibit_reasons)(struct kvm_vcpu *vcpu);
|
||||
};
|
||||
|
||||
struct kvm_x86_nested_ops {
|
||||
void (*leave_nested)(struct kvm_vcpu *vcpu);
|
||||
int (*check_events)(struct kvm_vcpu *vcpu);
|
||||
bool (*handle_page_fault_workaround)(struct kvm_vcpu *vcpu,
|
||||
struct x86_exception *fault);
|
||||
bool (*hv_timer_pending)(struct kvm_vcpu *vcpu);
|
||||
void (*triple_fault)(struct kvm_vcpu *vcpu);
|
||||
int (*get_state)(struct kvm_vcpu *vcpu,
|
||||
@ -1528,6 +1544,7 @@ struct kvm_x86_init_ops {
|
||||
unsigned int (*handle_intel_pt_intr)(void);
|
||||
|
||||
struct kvm_x86_ops *runtime_ops;
|
||||
struct kvm_pmu_ops *pmu_ops;
|
||||
};
|
||||
|
||||
struct kvm_arch_async_pf {
|
||||
@ -1549,20 +1566,6 @@ extern struct kvm_x86_ops kvm_x86_ops;
|
||||
#define KVM_X86_OP_OPTIONAL_RET0 KVM_X86_OP
|
||||
#include <asm/kvm-x86-ops.h>
|
||||
|
||||
static inline void kvm_ops_static_call_update(void)
|
||||
{
|
||||
#define __KVM_X86_OP(func) \
|
||||
static_call_update(kvm_x86_##func, kvm_x86_ops.func);
|
||||
#define KVM_X86_OP(func) \
|
||||
WARN_ON(!kvm_x86_ops.func); __KVM_X86_OP(func)
|
||||
#define KVM_X86_OP_OPTIONAL __KVM_X86_OP
|
||||
#define KVM_X86_OP_OPTIONAL_RET0(func) \
|
||||
static_call_update(kvm_x86_##func, (void *)kvm_x86_ops.func ? : \
|
||||
(void *)__static_call_return0);
|
||||
#include <asm/kvm-x86-ops.h>
|
||||
#undef __KVM_X86_OP
|
||||
}
|
||||
|
||||
#define __KVM_HAVE_ARCH_VM_ALLOC
|
||||
static inline struct kvm *kvm_arch_alloc_vm(void)
|
||||
{
|
||||
@ -1800,6 +1803,7 @@ gpa_t kvm_mmu_gva_to_gpa_system(struct kvm_vcpu *vcpu, gva_t gva,
|
||||
struct x86_exception *exception);
|
||||
|
||||
bool kvm_apicv_activated(struct kvm *kvm);
|
||||
bool kvm_vcpu_apicv_activated(struct kvm_vcpu *vcpu);
|
||||
void kvm_vcpu_update_apicv(struct kvm_vcpu *vcpu);
|
||||
void __kvm_set_or_clear_apicv_inhibit(struct kvm *kvm,
|
||||
enum kvm_apicv_inhibit reason, bool set);
|
||||
@ -1989,6 +1993,7 @@ int memslot_rmap_alloc(struct kvm_memory_slot *slot, unsigned long npages);
|
||||
KVM_X86_QUIRK_CD_NW_CLEARED | \
|
||||
KVM_X86_QUIRK_LAPIC_MMIO_HOLE | \
|
||||
KVM_X86_QUIRK_OUT_7E_INC_RIP | \
|
||||
KVM_X86_QUIRK_MISC_ENABLE_NO_MWAIT)
|
||||
KVM_X86_QUIRK_MISC_ENABLE_NO_MWAIT | \
|
||||
KVM_X86_QUIRK_FIX_HYPERCALL_INSN)
|
||||
|
||||
#endif /* _ASM_X86_KVM_HOST_H */
|
||||
|
@ -382,6 +382,103 @@ do { \
|
||||
|
||||
#endif // CONFIG_CC_HAS_ASM_GOTO_OUTPUT
|
||||
|
||||
#ifdef CONFIG_CC_HAS_ASM_GOTO_TIED_OUTPUT
|
||||
#define __try_cmpxchg_user_asm(itype, ltype, _ptr, _pold, _new, label) ({ \
|
||||
bool success; \
|
||||
__typeof__(_ptr) _old = (__typeof__(_ptr))(_pold); \
|
||||
__typeof__(*(_ptr)) __old = *_old; \
|
||||
__typeof__(*(_ptr)) __new = (_new); \
|
||||
asm_volatile_goto("\n" \
|
||||
"1: " LOCK_PREFIX "cmpxchg"itype" %[new], %[ptr]\n"\
|
||||
_ASM_EXTABLE_UA(1b, %l[label]) \
|
||||
: CC_OUT(z) (success), \
|
||||
[ptr] "+m" (*_ptr), \
|
||||
[old] "+a" (__old) \
|
||||
: [new] ltype (__new) \
|
||||
: "memory" \
|
||||
: label); \
|
||||
if (unlikely(!success)) \
|
||||
*_old = __old; \
|
||||
likely(success); })
|
||||
|
||||
#ifdef CONFIG_X86_32
|
||||
#define __try_cmpxchg64_user_asm(_ptr, _pold, _new, label) ({ \
|
||||
bool success; \
|
||||
__typeof__(_ptr) _old = (__typeof__(_ptr))(_pold); \
|
||||
__typeof__(*(_ptr)) __old = *_old; \
|
||||
__typeof__(*(_ptr)) __new = (_new); \
|
||||
asm_volatile_goto("\n" \
|
||||
"1: " LOCK_PREFIX "cmpxchg8b %[ptr]\n" \
|
||||
_ASM_EXTABLE_UA(1b, %l[label]) \
|
||||
: CC_OUT(z) (success), \
|
||||
"+A" (__old), \
|
||||
[ptr] "+m" (*_ptr) \
|
||||
: "b" ((u32)__new), \
|
||||
"c" ((u32)((u64)__new >> 32)) \
|
||||
: "memory" \
|
||||
: label); \
|
||||
if (unlikely(!success)) \
|
||||
*_old = __old; \
|
||||
likely(success); })
|
||||
#endif // CONFIG_X86_32
|
||||
#else // !CONFIG_CC_HAS_ASM_GOTO_TIED_OUTPUT
|
||||
#define __try_cmpxchg_user_asm(itype, ltype, _ptr, _pold, _new, label) ({ \
|
||||
int __err = 0; \
|
||||
bool success; \
|
||||
__typeof__(_ptr) _old = (__typeof__(_ptr))(_pold); \
|
||||
__typeof__(*(_ptr)) __old = *_old; \
|
||||
__typeof__(*(_ptr)) __new = (_new); \
|
||||
asm volatile("\n" \
|
||||
"1: " LOCK_PREFIX "cmpxchg"itype" %[new], %[ptr]\n"\
|
||||
CC_SET(z) \
|
||||
"2:\n" \
|
||||
_ASM_EXTABLE_TYPE_REG(1b, 2b, EX_TYPE_EFAULT_REG, \
|
||||
%[errout]) \
|
||||
: CC_OUT(z) (success), \
|
||||
[errout] "+r" (__err), \
|
||||
[ptr] "+m" (*_ptr), \
|
||||
[old] "+a" (__old) \
|
||||
: [new] ltype (__new) \
|
||||
: "memory", "cc"); \
|
||||
if (unlikely(__err)) \
|
||||
goto label; \
|
||||
if (unlikely(!success)) \
|
||||
*_old = __old; \
|
||||
likely(success); })
|
||||
|
||||
#ifdef CONFIG_X86_32
|
||||
/*
|
||||
* Unlike the normal CMPXCHG, hardcode ECX for both success/fail and error.
|
||||
* There are only six GPRs available and four (EAX, EBX, ECX, and EDX) are
|
||||
* hardcoded by CMPXCHG8B, leaving only ESI and EDI. If the compiler uses
|
||||
* both ESI and EDI for the memory operand, compilation will fail if the error
|
||||
* is an input+output as there will be no register available for input.
|
||||
*/
|
||||
#define __try_cmpxchg64_user_asm(_ptr, _pold, _new, label) ({ \
|
||||
int __result; \
|
||||
__typeof__(_ptr) _old = (__typeof__(_ptr))(_pold); \
|
||||
__typeof__(*(_ptr)) __old = *_old; \
|
||||
__typeof__(*(_ptr)) __new = (_new); \
|
||||
asm volatile("\n" \
|
||||
"1: " LOCK_PREFIX "cmpxchg8b %[ptr]\n" \
|
||||
"mov $0, %%ecx\n\t" \
|
||||
"setz %%cl\n" \
|
||||
"2:\n" \
|
||||
_ASM_EXTABLE_TYPE_REG(1b, 2b, EX_TYPE_EFAULT_REG, %%ecx) \
|
||||
: [result]"=c" (__result), \
|
||||
"+A" (__old), \
|
||||
[ptr] "+m" (*_ptr) \
|
||||
: "b" ((u32)__new), \
|
||||
"c" ((u32)((u64)__new >> 32)) \
|
||||
: "memory", "cc"); \
|
||||
if (unlikely(__result < 0)) \
|
||||
goto label; \
|
||||
if (unlikely(!__result)) \
|
||||
*_old = __old; \
|
||||
likely(__result); })
|
||||
#endif // CONFIG_X86_32
|
||||
#endif // CONFIG_CC_HAS_ASM_GOTO_TIED_OUTPUT
|
||||
|
||||
/* FIXME: this hack is definitely wrong -AK */
|
||||
struct __large_struct { unsigned long buf[100]; };
|
||||
#define __m(x) (*(struct __large_struct __user *)(x))
|
||||
@ -474,6 +571,51 @@ do { \
|
||||
} while (0)
|
||||
#endif // CONFIG_CC_HAS_ASM_GOTO_OUTPUT
|
||||
|
||||
extern void __try_cmpxchg_user_wrong_size(void);
|
||||
|
||||
#ifndef CONFIG_X86_32
|
||||
#define __try_cmpxchg64_user_asm(_ptr, _oldp, _nval, _label) \
|
||||
__try_cmpxchg_user_asm("q", "r", (_ptr), (_oldp), (_nval), _label)
|
||||
#endif
|
||||
|
||||
/*
|
||||
* Force the pointer to u<size> to match the size expected by the asm helper.
|
||||
* clang/LLVM compiles all cases and only discards the unused paths after
|
||||
* processing errors, which breaks i386 if the pointer is an 8-byte value.
|
||||
*/
|
||||
#define unsafe_try_cmpxchg_user(_ptr, _oldp, _nval, _label) ({ \
|
||||
bool __ret; \
|
||||
__chk_user_ptr(_ptr); \
|
||||
switch (sizeof(*(_ptr))) { \
|
||||
case 1: __ret = __try_cmpxchg_user_asm("b", "q", \
|
||||
(__force u8 *)(_ptr), (_oldp), \
|
||||
(_nval), _label); \
|
||||
break; \
|
||||
case 2: __ret = __try_cmpxchg_user_asm("w", "r", \
|
||||
(__force u16 *)(_ptr), (_oldp), \
|
||||
(_nval), _label); \
|
||||
break; \
|
||||
case 4: __ret = __try_cmpxchg_user_asm("l", "r", \
|
||||
(__force u32 *)(_ptr), (_oldp), \
|
||||
(_nval), _label); \
|
||||
break; \
|
||||
case 8: __ret = __try_cmpxchg64_user_asm((__force u64 *)(_ptr), (_oldp),\
|
||||
(_nval), _label); \
|
||||
break; \
|
||||
default: __try_cmpxchg_user_wrong_size(); \
|
||||
} \
|
||||
__ret; })
|
||||
|
||||
/* "Returns" 0 on success, 1 on failure, -EFAULT if the access faults. */
|
||||
#define __try_cmpxchg_user(_ptr, _oldp, _nval, _label) ({ \
|
||||
int __ret = -EFAULT; \
|
||||
__uaccess_begin_nospec(); \
|
||||
__ret = !unsafe_try_cmpxchg_user(_ptr, _oldp, _nval, _label); \
|
||||
_label: \
|
||||
__uaccess_end(); \
|
||||
__ret; \
|
||||
})
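A hedged usage sketch for the new primitive; the caller below is hypothetical (the real conversion appears in the paging_tmpl.h hunk later in this diff), and the -EAGAIN policy is just this example's choice:

static int update_guest_pte(u64 __user *ptep_user, u64 *orig_pte, u64 new_pte)
{
	int ret = __try_cmpxchg_user(ptep_user, orig_pte, new_pte, fault);

	if (ret < 0)	/* -EFAULT: the user access faulted */
		return ret;
	if (ret)	/* 1: compare failed, *orig_pte now holds the current value */
		return -EAGAIN;
	return 0;	/* 0: the value was swapped atomically */
}

Note that the final argument is only a label token; __try_cmpxchg_user() defines and handles that label internally, so the caller does not declare it.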
|
||||
|
||||
/*
|
||||
* We want the unsafe accessors to always be inlined and use
|
||||
* the error labels - thus the macro games.
|
||||
|
@ -543,16 +543,14 @@ enum vm_entry_failure_code {
|
||||
#define EPT_VIOLATION_ACC_READ_BIT 0
|
||||
#define EPT_VIOLATION_ACC_WRITE_BIT 1
|
||||
#define EPT_VIOLATION_ACC_INSTR_BIT 2
|
||||
#define EPT_VIOLATION_READABLE_BIT 3
|
||||
#define EPT_VIOLATION_WRITABLE_BIT 4
|
||||
#define EPT_VIOLATION_EXECUTABLE_BIT 5
|
||||
#define EPT_VIOLATION_RWX_SHIFT 3
|
||||
#define EPT_VIOLATION_GVA_IS_VALID_BIT 7
|
||||
#define EPT_VIOLATION_GVA_TRANSLATED_BIT 8
|
||||
#define EPT_VIOLATION_ACC_READ (1 << EPT_VIOLATION_ACC_READ_BIT)
|
||||
#define EPT_VIOLATION_ACC_WRITE (1 << EPT_VIOLATION_ACC_WRITE_BIT)
|
||||
#define EPT_VIOLATION_ACC_INSTR (1 << EPT_VIOLATION_ACC_INSTR_BIT)
|
||||
#define EPT_VIOLATION_READABLE (1 << EPT_VIOLATION_READABLE_BIT)
|
||||
#define EPT_VIOLATION_WRITABLE (1 << EPT_VIOLATION_WRITABLE_BIT)
|
||||
#define EPT_VIOLATION_EXECUTABLE (1 << EPT_VIOLATION_EXECUTABLE_BIT)
|
||||
#define EPT_VIOLATION_RWX_MASK (VMX_EPT_RWX_MASK << EPT_VIOLATION_RWX_SHIFT)
|
||||
#define EPT_VIOLATION_GVA_IS_VALID (1 << EPT_VIOLATION_GVA_IS_VALID_BIT)
|
||||
#define EPT_VIOLATION_GVA_TRANSLATED (1 << EPT_VIOLATION_GVA_TRANSLATED_BIT)
|
||||
|
||||
/*
|
||||
|
@ -428,11 +428,12 @@ struct kvm_sync_regs {
|
||||
struct kvm_vcpu_events events;
|
||||
};
|
||||
|
||||
#define KVM_X86_QUIRK_LINT0_REENABLED (1 << 0)
|
||||
#define KVM_X86_QUIRK_CD_NW_CLEARED (1 << 1)
|
||||
#define KVM_X86_QUIRK_LAPIC_MMIO_HOLE (1 << 2)
|
||||
#define KVM_X86_QUIRK_OUT_7E_INC_RIP (1 << 3)
|
||||
#define KVM_X86_QUIRK_MISC_ENABLE_NO_MWAIT (1 << 4)
|
||||
#define KVM_X86_QUIRK_LINT0_REENABLED (1 << 0)
|
||||
#define KVM_X86_QUIRK_CD_NW_CLEARED (1 << 1)
|
||||
#define KVM_X86_QUIRK_LAPIC_MMIO_HOLE (1 << 2)
|
||||
#define KVM_X86_QUIRK_OUT_7E_INC_RIP (1 << 3)
|
||||
#define KVM_X86_QUIRK_MISC_ENABLE_NO_MWAIT (1 << 4)
|
||||
#define KVM_X86_QUIRK_FIX_HYPERCALL_INSN (1 << 5)
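A hypothetical userspace sketch of opting out of hypercall-instruction patching by disabling the new quirk. It assumes the KVM_CAP_DISABLE_QUIRKS2 capability from the same series, which is not visible in this hunk:

#include <sys/ioctl.h>
#include <linux/kvm.h>

static int disable_hypercall_patching(int vm_fd)
{
	struct kvm_enable_cap cap = {
		.cap = KVM_CAP_DISABLE_QUIRKS2,	/* assumed capability, not shown here */
		.args = { KVM_X86_QUIRK_FIX_HYPERCALL_INSN },
	};

	return ioctl(vm_fd, KVM_ENABLE_CAP, &cap);
}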
|
||||
|
||||
#define KVM_STATE_NESTED_FORMAT_VMX 0
|
||||
#define KVM_STATE_NESTED_FORMAT_SVM 1
|
||||
|
@ -5,7 +5,7 @@
|
||||
|
||||
#include <asm/ia32.h>
|
||||
|
||||
#if defined(CONFIG_KVM_GUEST) && defined(CONFIG_PARAVIRT_SPINLOCKS)
|
||||
#if defined(CONFIG_KVM_GUEST)
|
||||
#include <asm/kvm_para.h>
|
||||
#endif
|
||||
|
||||
@ -20,7 +20,7 @@ int main(void)
|
||||
BLANK();
|
||||
#endif
|
||||
|
||||
#if defined(CONFIG_KVM_GUEST) && defined(CONFIG_PARAVIRT_SPINLOCKS)
|
||||
#if defined(CONFIG_KVM_GUEST)
|
||||
OFFSET(KVM_STEAL_TIME_preempted, kvm_steal_time, preempted);
|
||||
BLANK();
|
||||
#endif
|
||||
|
@ -14,6 +14,8 @@
|
||||
#include <asm/traps.h>
|
||||
#include <asm/irq_regs.h>
|
||||
|
||||
#include <uapi/asm/kvm.h>
|
||||
|
||||
#include <linux/hardirq.h>
|
||||
#include <linux/pkeys.h>
|
||||
#include <linux/vmalloc.h>
|
||||
@ -232,7 +234,20 @@ bool fpu_alloc_guest_fpstate(struct fpu_guest *gfpu)
|
||||
gfpu->fpstate = fpstate;
|
||||
gfpu->xfeatures = fpu_user_cfg.default_features;
|
||||
gfpu->perm = fpu_user_cfg.default_features;
|
||||
gfpu->uabi_size = fpu_user_cfg.default_size;
|
||||
|
||||
/*
|
||||
* KVM sets the FP+SSE bits in the XSAVE header when copying FPU state
|
||||
* to userspace, even when XSAVE is unsupported, so that restoring FPU
|
||||
* state on a different CPU that does support XSAVE can cleanly load
|
||||
* the incoming state using its natural XSAVE. In other words, KVM's
|
||||
* uABI size may be larger than this host's default size. Conversely,
|
||||
* the default size should never be larger than KVM's base uABI size;
|
||||
* all features that can expand the uABI size must be opt-in.
|
||||
*/
|
||||
gfpu->uabi_size = sizeof(struct kvm_xsave);
|
||||
if (WARN_ON_ONCE(fpu_user_cfg.default_size > gfpu->uabi_size))
|
||||
gfpu->uabi_size = fpu_user_cfg.default_size;
|
||||
|
||||
fpu_init_guest_permissions(gfpu);
|
||||
|
||||
return true;
|
||||
|
@ -191,7 +191,7 @@ void kvm_async_pf_task_wake(u32 token)
|
||||
{
|
||||
u32 key = hash_32(token, KVM_TASK_SLEEP_HASHBITS);
|
||||
struct kvm_task_sleep_head *b = &async_pf_sleepers[key];
|
||||
struct kvm_task_sleep_node *n;
|
||||
struct kvm_task_sleep_node *n, *dummy = NULL;
|
||||
|
||||
if (token == ~0) {
|
||||
apf_task_wake_all();
|
||||
@ -203,28 +203,41 @@ again:
|
||||
n = _find_apf_task(b, token);
|
||||
if (!n) {
|
||||
/*
|
||||
* async PF was not yet handled.
|
||||
* Add dummy entry for the token.
|
||||
* Async #PF not yet handled, add a dummy entry for the token.
|
||||
* Allocating the token must be done outside of the raw lock
|
||||
* as the allocator is preemptible on PREEMPT_RT kernels.
|
||||
*/
|
||||
n = kzalloc(sizeof(*n), GFP_ATOMIC);
|
||||
if (!n) {
|
||||
/*
|
||||
* Allocation failed! Busy wait while other cpu
|
||||
* handles async PF.
|
||||
*/
|
||||
if (!dummy) {
|
||||
raw_spin_unlock(&b->lock);
|
||||
cpu_relax();
|
||||
dummy = kzalloc(sizeof(*dummy), GFP_ATOMIC);
|
||||
|
||||
/*
|
||||
* Continue looping on allocation failure, eventually
|
||||
* the async #PF will be handled and allocating a new
|
||||
* node will be unnecessary.
|
||||
*/
|
||||
if (!dummy)
|
||||
cpu_relax();
|
||||
|
||||
/*
|
||||
* Recheck for async #PF completion before enqueueing
|
||||
* the dummy token to avoid duplicate list entries.
|
||||
*/
|
||||
goto again;
|
||||
}
|
||||
n->token = token;
|
||||
n->cpu = smp_processor_id();
|
||||
init_swait_queue_head(&n->wq);
|
||||
hlist_add_head(&n->link, &b->list);
|
||||
dummy->token = token;
|
||||
dummy->cpu = smp_processor_id();
|
||||
init_swait_queue_head(&dummy->wq);
|
||||
hlist_add_head(&dummy->link, &b->list);
|
||||
dummy = NULL;
|
||||
} else {
|
||||
apf_task_wake_one(n);
|
||||
}
|
||||
raw_spin_unlock(&b->lock);
|
||||
return;
|
||||
|
||||
/* A dummy token might be allocated and ultimately not used. */
|
||||
if (dummy)
|
||||
kfree(dummy);
|
||||
}
|
||||
EXPORT_SYMBOL_GPL(kvm_async_pf_task_wake);
|
||||
|
||||
@ -765,6 +778,42 @@ static void kvm_crash_shutdown(struct pt_regs *regs)
|
||||
}
|
||||
#endif
|
||||
|
||||
#if defined(CONFIG_X86_32) || !defined(CONFIG_SMP)
|
||||
bool __kvm_vcpu_is_preempted(long cpu);
|
||||
|
||||
__visible bool __kvm_vcpu_is_preempted(long cpu)
|
||||
{
|
||||
struct kvm_steal_time *src = &per_cpu(steal_time, cpu);
|
||||
|
||||
return !!(src->preempted & KVM_VCPU_PREEMPTED);
|
||||
}
|
||||
PV_CALLEE_SAVE_REGS_THUNK(__kvm_vcpu_is_preempted);
|
||||
|
||||
#else
|
||||
|
||||
#include <asm/asm-offsets.h>
|
||||
|
||||
extern bool __raw_callee_save___kvm_vcpu_is_preempted(long);
|
||||
|
||||
/*
|
||||
* Hand-optimize version for x86-64 to avoid 8 64-bit register saving and
|
||||
* restoring to/from the stack.
|
||||
*/
|
||||
asm(
|
||||
".pushsection .text;"
|
||||
".global __raw_callee_save___kvm_vcpu_is_preempted;"
|
||||
".type __raw_callee_save___kvm_vcpu_is_preempted, @function;"
|
||||
"__raw_callee_save___kvm_vcpu_is_preempted:"
|
||||
ASM_ENDBR
|
||||
"movq __per_cpu_offset(,%rdi,8), %rax;"
|
||||
"cmpb $0, " __stringify(KVM_STEAL_TIME_preempted) "+steal_time(%rax);"
|
||||
"setne %al;"
|
||||
ASM_RET
|
||||
".size __raw_callee_save___kvm_vcpu_is_preempted, .-__raw_callee_save___kvm_vcpu_is_preempted;"
|
||||
".popsection");
|
||||
|
||||
#endif
|
||||
|
||||
static void __init kvm_guest_init(void)
|
||||
{
|
||||
int i;
|
||||
@ -777,6 +826,9 @@ static void __init kvm_guest_init(void)
|
||||
if (kvm_para_has_feature(KVM_FEATURE_STEAL_TIME)) {
|
||||
has_steal_clock = 1;
|
||||
static_call_update(pv_steal_clock, kvm_steal_clock);
|
||||
|
||||
pv_ops.lock.vcpu_is_preempted =
|
||||
PV_CALLEE_SAVE(__kvm_vcpu_is_preempted);
|
||||
}
|
||||
|
||||
if (kvm_para_has_feature(KVM_FEATURE_PV_EOI))
|
||||
@ -1018,40 +1070,6 @@ static void kvm_wait(u8 *ptr, u8 val)
|
||||
}
|
||||
}
|
||||
|
||||
#ifdef CONFIG_X86_32
|
||||
__visible bool __kvm_vcpu_is_preempted(long cpu)
|
||||
{
|
||||
struct kvm_steal_time *src = &per_cpu(steal_time, cpu);
|
||||
|
||||
return !!(src->preempted & KVM_VCPU_PREEMPTED);
|
||||
}
|
||||
PV_CALLEE_SAVE_REGS_THUNK(__kvm_vcpu_is_preempted);
|
||||
|
||||
#else
|
||||
|
||||
#include <asm/asm-offsets.h>
|
||||
|
||||
extern bool __raw_callee_save___kvm_vcpu_is_preempted(long);
|
||||
|
||||
/*
|
||||
* Hand-optimize version for x86-64 to avoid 8 64-bit register saving and
|
||||
* restoring to/from the stack.
|
||||
*/
|
||||
asm(
|
||||
".pushsection .text;"
|
||||
".global __raw_callee_save___kvm_vcpu_is_preempted;"
|
||||
".type __raw_callee_save___kvm_vcpu_is_preempted, @function;"
|
||||
"__raw_callee_save___kvm_vcpu_is_preempted:"
|
||||
ASM_ENDBR
|
||||
"movq __per_cpu_offset(,%rdi,8), %rax;"
|
||||
"cmpb $0, " __stringify(KVM_STEAL_TIME_preempted) "+steal_time(%rax);"
|
||||
"setne %al;"
|
||||
ASM_RET
|
||||
".size __raw_callee_save___kvm_vcpu_is_preempted, .-__raw_callee_save___kvm_vcpu_is_preempted;"
|
||||
".popsection");
|
||||
|
||||
#endif
|
||||
|
||||
/*
|
||||
* Setup pv_lock_ops to exploit KVM_FEATURE_PV_UNHALT if present.
|
||||
*/
|
||||
@ -1095,10 +1113,6 @@ void __init kvm_spinlock_init(void)
|
||||
pv_ops.lock.wait = kvm_wait;
|
||||
pv_ops.lock.kick = kvm_kick_cpu;
|
||||
|
||||
if (kvm_para_has_feature(KVM_FEATURE_STEAL_TIME)) {
|
||||
pv_ops.lock.vcpu_is_preempted =
|
||||
PV_CALLEE_SAVE(__kvm_vcpu_is_preempted);
|
||||
}
|
||||
/*
|
||||
* When PV spinlock is enabled which is preferred over
|
||||
* virt_spin_lock(), virt_spin_lock_key's value is meaningless.
|
||||
|
@ -239,7 +239,7 @@ static void __init kvmclock_init_mem(void)
|
||||
|
||||
static int __init kvm_setup_vsyscall_timeinfo(void)
|
||||
{
|
||||
if (!kvm_para_available() || !kvmclock)
|
||||
if (!kvm_para_available() || !kvmclock || nopv)
|
||||
return 0;
|
||||
|
||||
kvmclock_init_mem();
|
||||
|
@ -252,7 +252,6 @@ int kvm_pic_read_irq(struct kvm *kvm)
|
||||
*/
|
||||
irq2 = 7;
|
||||
intno = s->pics[1].irq_base + irq2;
|
||||
irq = irq2 + 8;
|
||||
} else
|
||||
intno = s->pics[0].irq_base + irq;
|
||||
} else {
|
||||
|
@ -22,10 +22,14 @@
|
||||
*/
|
||||
int kvm_cpu_has_pending_timer(struct kvm_vcpu *vcpu)
|
||||
{
|
||||
if (lapic_in_kernel(vcpu))
|
||||
return apic_has_pending_timer(vcpu);
|
||||
int r = 0;
|
||||
|
||||
return 0;
|
||||
if (lapic_in_kernel(vcpu))
|
||||
r = apic_has_pending_timer(vcpu);
|
||||
if (kvm_xen_timer_enabled(vcpu))
|
||||
r += kvm_xen_has_pending_timer(vcpu);
|
||||
|
||||
return r;
|
||||
}
|
||||
EXPORT_SYMBOL(kvm_cpu_has_pending_timer);
|
||||
|
||||
@ -143,6 +147,8 @@ void kvm_inject_pending_timer_irqs(struct kvm_vcpu *vcpu)
|
||||
{
|
||||
if (lapic_in_kernel(vcpu))
|
||||
kvm_inject_apic_timer_irqs(vcpu);
|
||||
if (kvm_xen_timer_enabled(vcpu))
|
||||
kvm_xen_inject_timer_irqs(vcpu);
|
||||
}
|
||||
EXPORT_SYMBOL_GPL(kvm_inject_pending_timer_irqs);
|
||||
|
||||
|
@ -181,7 +181,7 @@ int kvm_arch_set_irq_inatomic(struct kvm_kernel_irq_routing_entry *e,
|
||||
if (!level)
|
||||
return -1;
|
||||
|
||||
return kvm_xen_set_evtchn_fast(e, kvm);
|
||||
return kvm_xen_set_evtchn_fast(&e->xen_evtchn, kvm);
|
||||
#endif
|
||||
default:
|
||||
break;
|
||||
|
@ -1548,6 +1548,7 @@ static void cancel_apic_timer(struct kvm_lapic *apic)
|
||||
if (apic->lapic_timer.hv_timer_in_use)
|
||||
cancel_hv_timer(apic);
|
||||
preempt_enable();
|
||||
atomic_set(&apic->lapic_timer.pending, 0);
|
||||
}
|
||||
|
||||
static void apic_update_lvtt(struct kvm_lapic *apic)
|
||||
@ -1648,10 +1649,10 @@ static void __kvm_wait_lapic_expire(struct kvm_vcpu *vcpu)
|
||||
tsc_deadline = apic->lapic_timer.expired_tscdeadline;
|
||||
apic->lapic_timer.expired_tscdeadline = 0;
|
||||
guest_tsc = kvm_read_l1_tsc(vcpu, rdtsc());
|
||||
apic->lapic_timer.advance_expire_delta = guest_tsc - tsc_deadline;
|
||||
trace_kvm_wait_lapic_expire(vcpu->vcpu_id, guest_tsc - tsc_deadline);
|
||||
|
||||
if (lapic_timer_advance_dynamic) {
|
||||
adjust_lapic_timer_advance(vcpu, apic->lapic_timer.advance_expire_delta);
|
||||
adjust_lapic_timer_advance(vcpu, guest_tsc - tsc_deadline);
|
||||
/*
|
||||
* If the timer fired early, reread the TSC to account for the
|
||||
* overhead of the above adjustment to avoid waiting longer
|
||||
|
@ -38,7 +38,6 @@ struct kvm_timer {
|
||||
u64 tscdeadline;
|
||||
u64 expired_tscdeadline;
|
||||
u32 timer_advance_ns;
|
||||
s64 advance_expire_delta;
|
||||
atomic_t pending; /* accumulated triggered timers */
|
||||
bool hv_timer_in_use;
|
||||
};
|
||||
|
@ -89,7 +89,27 @@ static inline gfn_t kvm_mmu_max_gfn(void)
|
||||
return (1ULL << (max_gpa_bits - PAGE_SHIFT)) - 1;
|
||||
}
|
||||
|
||||
static inline u8 kvm_get_shadow_phys_bits(void)
|
||||
{
|
||||
/*
|
||||
* boot_cpu_data.x86_phys_bits is reduced when MKTME or SME are detected
|
||||
* in CPU detection code, but the processor treats those reduced bits as
|
||||
* 'keyID' thus they are not reserved bits. Therefore KVM needs to look at
|
||||
* the physical address bits reported by CPUID.
|
||||
*/
|
||||
if (likely(boot_cpu_data.extended_cpuid_level >= 0x80000008))
|
||||
return cpuid_eax(0x80000008) & 0xff;
|
||||
|
||||
/*
|
||||
* Quite weird to have VMX or SVM but not MAXPHYADDR; probably a VM with
|
||||
* custom CPUID. Proceed with whatever the kernel found since these features
|
||||
* aren't virtualizable (SME/SEV also require CPUIDs higher than 0x80000008).
|
||||
*/
|
||||
return boot_cpu_data.x86_phys_bits;
|
||||
}
|
||||
|
||||
void kvm_mmu_set_mmio_spte_mask(u64 mmio_value, u64 mmio_mask, u64 access_mask);
|
||||
void kvm_mmu_set_me_spte_mask(u64 me_value, u64 me_mask);
|
||||
void kvm_mmu_set_ept_masks(bool has_ad_bits, bool has_exec_only);
|
||||
|
||||
void kvm_init_mmu(struct kvm_vcpu *vcpu);
|
||||
@ -138,94 +158,7 @@ static inline void kvm_mmu_load_pgd(struct kvm_vcpu *vcpu)
|
||||
return;
|
||||
|
||||
static_call(kvm_x86_load_mmu_pgd)(vcpu, root_hpa,
|
||||
vcpu->arch.mmu->shadow_root_level);
|
||||
}
|
||||
|
||||
struct kvm_page_fault {
|
||||
/* arguments to kvm_mmu_do_page_fault. */
|
||||
const gpa_t addr;
|
||||
const u32 error_code;
|
||||
const bool prefetch;
|
||||
|
||||
/* Derived from error_code. */
|
||||
const bool exec;
|
||||
const bool write;
|
||||
const bool present;
|
||||
const bool rsvd;
|
||||
const bool user;
|
||||
|
||||
/* Derived from mmu and global state. */
|
||||
const bool is_tdp;
|
||||
const bool nx_huge_page_workaround_enabled;
|
||||
|
||||
/*
|
||||
* Whether a >4KB mapping can be created or is forbidden due to NX
|
||||
* hugepages.
|
||||
*/
|
||||
bool huge_page_disallowed;
|
||||
|
||||
/*
|
||||
* Maximum page size that can be created for this fault; input to
|
||||
* FNAME(fetch), __direct_map and kvm_tdp_mmu_map.
|
||||
*/
|
||||
u8 max_level;
|
||||
|
||||
/*
|
||||
* Page size that can be created based on the max_level and the
|
||||
* page size used by the host mapping.
|
||||
*/
|
||||
u8 req_level;
|
||||
|
||||
/*
|
||||
* Page size that will be created based on the req_level and
|
||||
* huge_page_disallowed.
|
||||
*/
|
||||
u8 goal_level;
|
||||
|
||||
/* Shifted addr, or result of guest page table walk if addr is a gva. */
|
||||
gfn_t gfn;
|
||||
|
||||
/* The memslot containing gfn. May be NULL. */
|
||||
struct kvm_memory_slot *slot;
|
||||
|
||||
/* Outputs of kvm_faultin_pfn. */
|
||||
kvm_pfn_t pfn;
|
||||
hva_t hva;
|
||||
bool map_writable;
|
||||
};
|
||||
|
||||
int kvm_tdp_page_fault(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault);
|
||||
|
||||
extern int nx_huge_pages;
|
||||
static inline bool is_nx_huge_page_enabled(void)
|
||||
{
|
||||
return READ_ONCE(nx_huge_pages);
|
||||
}
|
||||
|
||||
static inline int kvm_mmu_do_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
|
||||
u32 err, bool prefetch)
|
||||
{
|
||||
struct kvm_page_fault fault = {
|
||||
.addr = cr2_or_gpa,
|
||||
.error_code = err,
|
||||
.exec = err & PFERR_FETCH_MASK,
|
||||
.write = err & PFERR_WRITE_MASK,
|
||||
.present = err & PFERR_PRESENT_MASK,
|
||||
.rsvd = err & PFERR_RSVD_MASK,
|
||||
.user = err & PFERR_USER_MASK,
|
||||
.prefetch = prefetch,
|
||||
.is_tdp = likely(vcpu->arch.mmu->page_fault == kvm_tdp_page_fault),
|
||||
.nx_huge_page_workaround_enabled = is_nx_huge_page_enabled(),
|
||||
|
||||
.max_level = KVM_MAX_HUGEPAGE_LEVEL,
|
||||
.req_level = PG_LEVEL_4K,
|
||||
.goal_level = PG_LEVEL_4K,
|
||||
};
|
||||
#ifdef CONFIG_RETPOLINE
|
||||
if (fault.is_tdp)
|
||||
return kvm_tdp_page_fault(vcpu, &fault);
|
||||
#endif
|
||||
return vcpu->arch.mmu->page_fault(vcpu, &fault);
|
||||
vcpu->arch.mmu->root_role.level);
|
||||
}
|
||||
|
||||
/*
|
||||
|
File diff suppressed because it is too large
@ -140,9 +140,72 @@ void kvm_flush_remote_tlbs_with_address(struct kvm *kvm,
|
||||
u64 start_gfn, u64 pages);
|
||||
unsigned int pte_list_count(struct kvm_rmap_head *rmap_head);
|
||||
|
||||
extern int nx_huge_pages;
|
||||
static inline bool is_nx_huge_page_enabled(void)
|
||||
{
|
||||
return READ_ONCE(nx_huge_pages);
|
||||
}
|
||||
|
||||
struct kvm_page_fault {
|
||||
/* arguments to kvm_mmu_do_page_fault. */
|
||||
const gpa_t addr;
|
||||
const u32 error_code;
|
||||
const bool prefetch;
|
||||
|
||||
/* Derived from error_code. */
|
||||
const bool exec;
|
||||
const bool write;
|
||||
const bool present;
|
||||
const bool rsvd;
|
||||
const bool user;
|
||||
|
||||
/* Derived from mmu and global state. */
|
||||
const bool is_tdp;
|
||||
const bool nx_huge_page_workaround_enabled;
|
||||
|
||||
/*
|
||||
* Whether a >4KB mapping can be created or is forbidden due to NX
|
||||
* hugepages.
|
||||
*/
|
||||
bool huge_page_disallowed;
|
||||
|
||||
/*
|
||||
* Maximum page size that can be created for this fault; input to
|
||||
* FNAME(fetch), __direct_map and kvm_tdp_mmu_map.
|
||||
*/
|
||||
u8 max_level;
|
||||
|
||||
/*
|
||||
* Page size that can be created based on the max_level and the
|
||||
* page size used by the host mapping.
|
||||
*/
|
||||
u8 req_level;
|
||||
|
||||
/*
|
||||
* Page size that will be created based on the req_level and
|
||||
* huge_page_disallowed.
|
||||
*/
|
||||
u8 goal_level;
|
||||
|
||||
/* Shifted addr, or result of guest page table walk if addr is a gva. */
|
||||
gfn_t gfn;
|
||||
|
||||
/* The memslot containing gfn. May be NULL. */
|
||||
struct kvm_memory_slot *slot;
|
||||
|
||||
/* Outputs of kvm_faultin_pfn. */
|
||||
kvm_pfn_t pfn;
|
||||
hva_t hva;
|
||||
bool map_writable;
|
||||
};
|
||||
|
||||
int kvm_tdp_page_fault(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault);
|
||||
|
||||
/*
|
||||
* Return values of handle_mmio_page_fault, mmu.page_fault, and fast_page_fault().
|
||||
* Return values of handle_mmio_page_fault(), mmu.page_fault(), fast_page_fault(),
|
||||
* and of course kvm_mmu_do_page_fault().
|
||||
*
|
||||
* RET_PF_CONTINUE: So far, so good, keep handling the page fault.
|
||||
* RET_PF_RETRY: let CPU fault again on the address.
|
||||
* RET_PF_EMULATE: mmio page fault, emulate the instruction directly.
|
||||
* RET_PF_INVALID: the spte is invalid, let the real page fault path update it.
|
||||
@ -151,15 +214,71 @@ unsigned int pte_list_count(struct kvm_rmap_head *rmap_head);
|
||||
*
|
||||
* Any names added to this enum should be exported to userspace for use in
|
||||
* tracepoints via TRACE_DEFINE_ENUM() in mmutrace.h
|
||||
*
|
||||
* Note, all values must be greater than or equal to zero so as not to encroach
|
||||
* on -errno return values. Somewhat arbitrarily use '0' for CONTINUE, which
|
||||
* will allow for efficient machine code when checking for CONTINUE, e.g.
|
||||
* "TEST %rax, %rax, JNZ", as all "stop!" values are non-zero.
|
||||
*/
|
||||
enum {
|
||||
RET_PF_RETRY = 0,
|
||||
RET_PF_CONTINUE = 0,
|
||||
RET_PF_RETRY,
|
||||
RET_PF_EMULATE,
|
||||
RET_PF_INVALID,
|
||||
RET_PF_FIXED,
|
||||
RET_PF_SPURIOUS,
|
||||
};
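Illustrative only: because RET_PF_CONTINUE is zero, the caller pattern this comment has in mind collapses to a single test-and-branch, as in the page-fault paths converted later in this diff:

	r = kvm_faultin_pfn(vcpu, fault);
	if (r != RET_PF_CONTINUE)	/* any non-zero value means "stop" */
		return r;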
|
||||
|
||||
static inline int kvm_mmu_do_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
|
||||
u32 err, bool prefetch)
|
||||
{
|
||||
struct kvm_page_fault fault = {
|
||||
.addr = cr2_or_gpa,
|
||||
.error_code = err,
|
||||
.exec = err & PFERR_FETCH_MASK,
|
||||
.write = err & PFERR_WRITE_MASK,
|
||||
.present = err & PFERR_PRESENT_MASK,
|
||||
.rsvd = err & PFERR_RSVD_MASK,
|
||||
.user = err & PFERR_USER_MASK,
|
||||
.prefetch = prefetch,
|
||||
.is_tdp = likely(vcpu->arch.mmu->page_fault == kvm_tdp_page_fault),
|
||||
.nx_huge_page_workaround_enabled = is_nx_huge_page_enabled(),
|
||||
|
||||
.max_level = KVM_MAX_HUGEPAGE_LEVEL,
|
||||
.req_level = PG_LEVEL_4K,
|
||||
.goal_level = PG_LEVEL_4K,
|
||||
};
|
||||
int r;
|
||||
|
||||
/*
|
||||
* Async #PF "faults", a.k.a. prefetch faults, are not faults from the
|
||||
* guest perspective and have already been counted at the time of the
|
||||
* original fault.
|
||||
*/
|
||||
if (!prefetch)
|
||||
vcpu->stat.pf_taken++;
|
||||
|
||||
if (IS_ENABLED(CONFIG_RETPOLINE) && fault.is_tdp)
|
||||
r = kvm_tdp_page_fault(vcpu, &fault);
|
||||
else
|
||||
r = vcpu->arch.mmu->page_fault(vcpu, &fault);
|
||||
|
||||
/*
|
||||
* Similar to above, prefetch faults aren't truly spurious, and the
|
||||
* async #PF path doesn't do emulation. Do count faults that are fixed
|
||||
* by the async #PF handler though, otherwise they'll never be counted.
|
||||
*/
|
||||
if (r == RET_PF_FIXED)
|
||||
vcpu->stat.pf_fixed++;
|
||||
else if (prefetch)
|
||||
;
|
||||
else if (r == RET_PF_EMULATE)
|
||||
vcpu->stat.pf_emulate++;
|
||||
else if (r == RET_PF_SPURIOUS)
|
||||
vcpu->stat.pf_spurious++;
|
||||
return r;
|
||||
}
|
||||
|
||||
int kvm_mmu_max_mapping_level(struct kvm *kvm,
|
||||
const struct kvm_memory_slot *slot, gfn_t gfn,
|
||||
kvm_pfn_t pfn, int max_level);
|
||||
|
@ -54,6 +54,7 @@
|
||||
{ PFERR_RSVD_MASK, "RSVD" }, \
|
||||
{ PFERR_FETCH_MASK, "F" }
|
||||
|
||||
TRACE_DEFINE_ENUM(RET_PF_CONTINUE);
|
||||
TRACE_DEFINE_ENUM(RET_PF_RETRY);
|
||||
TRACE_DEFINE_ENUM(RET_PF_EMULATE);
|
||||
TRACE_DEFINE_ENUM(RET_PF_INVALID);
|
||||
|
@ -63,7 +63,7 @@
|
||||
#define PT_LEVEL_BITS PT64_LEVEL_BITS
|
||||
#define PT_GUEST_DIRTY_SHIFT 9
|
||||
#define PT_GUEST_ACCESSED_SHIFT 8
|
||||
#define PT_HAVE_ACCESSED_DIRTY(mmu) ((mmu)->ept_ad)
|
||||
#define PT_HAVE_ACCESSED_DIRTY(mmu) (!(mmu)->cpu_role.base.ad_disabled)
|
||||
#ifdef CONFIG_X86_64
|
||||
#define CMPXCHG "cmpxchgq"
|
||||
#endif
|
||||
@ -144,42 +144,6 @@ static bool FNAME(is_rsvd_bits_set)(struct kvm_mmu *mmu, u64 gpte, int level)
|
||||
FNAME(is_bad_mt_xwr)(&mmu->guest_rsvd_check, gpte);
|
||||
}
|
||||
|
||||
static int FNAME(cmpxchg_gpte)(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
|
||||
pt_element_t __user *ptep_user, unsigned index,
|
||||
pt_element_t orig_pte, pt_element_t new_pte)
|
||||
{
|
||||
signed char r;
|
||||
|
||||
if (!user_access_begin(ptep_user, sizeof(pt_element_t)))
|
||||
return -EFAULT;
|
||||
|
||||
#ifdef CMPXCHG
|
||||
asm volatile("1:" LOCK_PREFIX CMPXCHG " %[new], %[ptr]\n"
|
||||
"setnz %b[r]\n"
|
||||
"2:"
|
||||
_ASM_EXTABLE_TYPE_REG(1b, 2b, EX_TYPE_EFAULT_REG, %k[r])
|
||||
: [ptr] "+m" (*ptep_user),
|
||||
[old] "+a" (orig_pte),
|
||||
[r] "=q" (r)
|
||||
: [new] "r" (new_pte)
|
||||
: "memory");
|
||||
#else
|
||||
asm volatile("1:" LOCK_PREFIX "cmpxchg8b %[ptr]\n"
|
||||
"setnz %b[r]\n"
|
||||
"2:"
|
||||
_ASM_EXTABLE_TYPE_REG(1b, 2b, EX_TYPE_EFAULT_REG, %k[r])
|
||||
: [ptr] "+m" (*ptep_user),
|
||||
[old] "+A" (orig_pte),
|
||||
[r] "=q" (r)
|
||||
: [new_lo] "b" ((u32)new_pte),
|
||||
[new_hi] "c" ((u32)(new_pte >> 32))
|
||||
: "memory");
|
||||
#endif
|
||||
|
||||
user_access_end();
|
||||
return r;
|
||||
}
|
||||
|
||||
static bool FNAME(prefetch_invalid_gpte)(struct kvm_vcpu *vcpu,
|
||||
struct kvm_mmu_page *sp, u64 *spte,
|
||||
u64 gpte)
|
||||
@ -187,7 +151,7 @@ static bool FNAME(prefetch_invalid_gpte)(struct kvm_vcpu *vcpu,
|
||||
if (!FNAME(is_present_gpte)(gpte))
|
||||
goto no_present;
|
||||
|
||||
/* if accessed bit is not supported prefetch non accessed gpte */
|
||||
/* Prefetch only accessed entries (unless A/D bits are disabled). */
|
||||
if (PT_HAVE_ACCESSED_DIRTY(vcpu->arch.mmu) &&
|
||||
!(gpte & PT_GUEST_ACCESSED_MASK))
|
||||
goto no_present;
|
||||
@ -278,7 +242,7 @@ static int FNAME(update_accessed_dirty_bits)(struct kvm_vcpu *vcpu,
|
||||
if (unlikely(!walker->pte_writable[level - 1]))
|
||||
continue;
|
||||
|
||||
ret = FNAME(cmpxchg_gpte)(vcpu, mmu, ptep_user, index, orig_pte, pte);
|
||||
ret = __try_cmpxchg_user(ptep_user, &orig_pte, pte, fault);
|
||||
if (ret)
|
||||
return ret;
|
||||
|
||||
@ -317,7 +281,7 @@ static inline bool FNAME(is_last_gpte)(struct kvm_mmu *mmu,
|
||||
* is not reserved and does not indicate a large page at this level,
|
||||
* so clear PT_PAGE_SIZE_MASK in gpte if that is the case.
|
||||
*/
|
||||
gpte &= level - (PT32_ROOT_LEVEL + mmu->mmu_role.ext.cr4_pse);
|
||||
gpte &= level - (PT32_ROOT_LEVEL + mmu->cpu_role.ext.cr4_pse);
|
||||
#endif
|
||||
/*
|
||||
* PG_LEVEL_4K always terminates. The RHS has bit 7 set
|
||||
@ -355,7 +319,7 @@ static int FNAME(walk_addr_generic)(struct guest_walker *walker,
|
||||
|
||||
trace_kvm_mmu_pagetable_walk(addr, access);
|
||||
retry_walk:
|
||||
walker->level = mmu->root_level;
|
||||
walker->level = mmu->cpu_role.base.level;
|
||||
pte = mmu->get_guest_pgd(vcpu);
|
||||
have_ad = PT_HAVE_ACCESSED_DIRTY(mmu);
|
||||
|
||||
@ -515,14 +479,21 @@ error:
|
||||
* The other bits are set to 0.
|
||||
*/
|
||||
if (!(errcode & PFERR_RSVD_MASK)) {
|
||||
vcpu->arch.exit_qualification &= 0x180;
|
||||
vcpu->arch.exit_qualification &= (EPT_VIOLATION_GVA_IS_VALID |
|
||||
EPT_VIOLATION_GVA_TRANSLATED);
|
||||
if (write_fault)
|
||||
vcpu->arch.exit_qualification |= EPT_VIOLATION_ACC_WRITE;
|
||||
if (user_fault)
|
||||
vcpu->arch.exit_qualification |= EPT_VIOLATION_ACC_READ;
|
||||
if (fetch_fault)
|
||||
vcpu->arch.exit_qualification |= EPT_VIOLATION_ACC_INSTR;
|
||||
vcpu->arch.exit_qualification |= (pte_access & 0x7) << 3;
|
||||
|
||||
/*
|
||||
* Note, pte_access holds the raw RWX bits from the EPTE, not
|
||||
* ACC_*_MASK flags!
|
||||
*/
|
||||
vcpu->arch.exit_qualification |= (pte_access & VMX_EPT_RWX_MASK) <<
|
||||
EPT_VIOLATION_RWX_SHIFT;
|
||||
}
|
||||
#endif
|
||||
walker->fault.address = addr;
|
||||
@ -650,7 +621,7 @@ static int FNAME(fetch)(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault,
|
||||
WARN_ON_ONCE(gw->gfn != base_gfn);
|
||||
direct_access = gw->pte_access;
|
||||
|
||||
top_level = vcpu->arch.mmu->root_level;
|
||||
top_level = vcpu->arch.mmu->cpu_role.base.level;
|
||||
if (top_level == PT32E_ROOT_LEVEL)
|
||||
top_level = PT32_ROOT_LEVEL;
|
||||
/*
|
||||
@ -752,7 +723,6 @@ static int FNAME(fetch)(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault,
|
||||
return ret;
|
||||
|
||||
FNAME(pte_prefetch)(vcpu, gw, it.sptep);
|
||||
++vcpu->stat.pf_fixed;
|
||||
return ret;
|
||||
|
||||
out_gpte_changed:
|
||||
@ -867,10 +837,12 @@ static int FNAME(page_fault)(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault
|
||||
mmu_seq = vcpu->kvm->mmu_notifier_seq;
|
||||
smp_rmb();
|
||||
|
||||
if (kvm_faultin_pfn(vcpu, fault, &r))
|
||||
r = kvm_faultin_pfn(vcpu, fault);
|
||||
if (r != RET_PF_CONTINUE)
|
||||
return r;
|
||||
|
||||
if (handle_abnormal_pfn(vcpu, fault, walker.pte_access, &r))
|
||||
r = handle_abnormal_pfn(vcpu, fault, walker.pte_access);
|
||||
if (r != RET_PF_CONTINUE)
|
||||
return r;
|
||||
|
||||
/*
|
||||
@ -1017,7 +989,7 @@ static gpa_t FNAME(gva_to_gpa)(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
|
||||
*/
|
||||
static int FNAME(sync_page)(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp)
|
||||
{
|
||||
union kvm_mmu_page_role mmu_role = vcpu->arch.mmu->mmu_role.base;
|
||||
union kvm_mmu_page_role root_role = vcpu->arch.mmu->root_role;
|
||||
int i;
|
||||
bool host_writable;
|
||||
gpa_t first_pte_gpa;
|
||||
@ -1036,6 +1008,7 @@ static int FNAME(sync_page)(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp)
|
||||
.level = 0xf,
|
||||
.access = 0x7,
|
||||
.quadrant = 0x3,
|
||||
.passthrough = 0x1,
|
||||
};
|
||||
|
||||
/*
|
||||
@ -1045,7 +1018,7 @@ static int FNAME(sync_page)(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp)
|
||||
* reserved bits checks will be wrong, etc...
|
||||
*/
|
||||
if (WARN_ON_ONCE(sp->role.direct ||
|
||||
(sp->role.word ^ mmu_role.word) & ~sync_role_ign.word))
|
||||
(sp->role.word ^ root_role.word) & ~sync_role_ign.word))
|
||||
return -1;
|
||||
|
||||
first_pte_gpa = FNAME(get_level1_sp_gpa)(sp);
|
||||
|
@ -19,7 +19,7 @@
|
||||
#include <asm/memtype.h>
|
||||
#include <asm/vmx.h>
|
||||
|
||||
static bool __read_mostly enable_mmio_caching = true;
|
||||
bool __read_mostly enable_mmio_caching = true;
|
||||
module_param_named(mmio_caching, enable_mmio_caching, bool, 0444);
|
||||
|
||||
u64 __read_mostly shadow_host_writable_mask;
|
||||
@ -33,6 +33,7 @@ u64 __read_mostly shadow_mmio_value;
|
||||
u64 __read_mostly shadow_mmio_mask;
|
||||
u64 __read_mostly shadow_mmio_access_mask;
|
||||
u64 __read_mostly shadow_present_mask;
|
||||
u64 __read_mostly shadow_me_value;
|
||||
u64 __read_mostly shadow_me_mask;
|
||||
u64 __read_mostly shadow_acc_track_mask;
|
||||
|
||||
@ -167,8 +168,8 @@ bool make_spte(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp,
|
||||
else
|
||||
pte_access &= ~ACC_WRITE_MASK;
|
||||
|
||||
if (!kvm_is_mmio_pfn(pfn))
|
||||
spte |= shadow_me_mask;
|
||||
if (shadow_me_value && !kvm_is_mmio_pfn(pfn))
|
||||
spte |= shadow_me_value;
|
||||
|
||||
spte |= (u64)pfn << PAGE_SHIFT;
|
||||
|
||||
@ -284,7 +285,7 @@ u64 make_nonleaf_spte(u64 *child_pt, bool ad_disabled)
|
||||
u64 spte = SPTE_MMU_PRESENT_MASK;
|
||||
|
||||
spte |= __pa(child_pt) | shadow_present_mask | PT_WRITABLE_MASK |
|
||||
shadow_user_mask | shadow_x_mask | shadow_me_mask;
|
||||
shadow_user_mask | shadow_x_mask | shadow_me_value;
|
||||
|
||||
if (ad_disabled)
|
||||
spte |= SPTE_TDP_AD_DISABLED_MASK;
|
||||
@ -310,25 +311,6 @@ u64 kvm_mmu_changed_pte_notifier_make_spte(u64 old_spte, kvm_pfn_t new_pfn)
|
||||
return new_spte;
|
||||
}
|
||||
|
||||
static u8 kvm_get_shadow_phys_bits(void)
|
||||
{
|
||||
/*
|
||||
* boot_cpu_data.x86_phys_bits is reduced when MKTME or SME are detected
|
||||
* in CPU detection code, but the processor treats those reduced bits as
|
||||
* 'keyID' thus they are not reserved bits. Therefore KVM needs to look at
|
||||
* the physical address bits reported by CPUID.
|
||||
*/
|
||||
if (likely(boot_cpu_data.extended_cpuid_level >= 0x80000008))
|
||||
return cpuid_eax(0x80000008) & 0xff;
|
||||
|
||||
/*
|
||||
* Quite weird to have VMX or SVM but not MAXPHYADDR; probably a VM with
|
||||
* custom CPUID. Proceed with whatever the kernel found since these features
|
||||
* aren't virtualizable (SME/SEV also require CPUIDs higher than 0x80000008).
|
||||
*/
|
||||
return boot_cpu_data.x86_phys_bits;
|
||||
}
|
||||
|
||||
u64 mark_spte_for_access_track(u64 spte)
|
||||
{
|
||||
if (spte_ad_enabled(spte))
|
||||
@ -379,12 +361,26 @@ void kvm_mmu_set_mmio_spte_mask(u64 mmio_value, u64 mmio_mask, u64 access_mask)
|
||||
WARN_ON(mmio_value && (REMOVED_SPTE & mmio_mask) == mmio_value))
|
||||
mmio_value = 0;
|
||||
|
||||
if (!mmio_value)
|
||||
enable_mmio_caching = false;
|
||||
|
||||
shadow_mmio_value = mmio_value;
|
||||
shadow_mmio_mask = mmio_mask;
|
||||
shadow_mmio_access_mask = access_mask;
|
||||
}
|
||||
EXPORT_SYMBOL_GPL(kvm_mmu_set_mmio_spte_mask);
|
||||
|
||||
void kvm_mmu_set_me_spte_mask(u64 me_value, u64 me_mask)
|
||||
{
|
||||
/* shadow_me_value must be a subset of shadow_me_mask */
|
||||
if (WARN_ON(me_value & ~me_mask))
|
||||
me_value = me_mask = 0;
|
||||
|
||||
shadow_me_value = me_value;
|
||||
shadow_me_mask = me_mask;
|
||||
}
|
||||
EXPORT_SYMBOL_GPL(kvm_mmu_set_me_spte_mask);
|
||||
|
||||
void kvm_mmu_set_ept_masks(bool has_ad_bits, bool has_exec_only)
|
||||
{
|
||||
shadow_user_mask = VMX_EPT_READABLE_MASK;
|
||||
@ -394,8 +390,6 @@ void kvm_mmu_set_ept_masks(bool has_ad_bits, bool has_exec_only)
|
||||
shadow_x_mask = VMX_EPT_EXECUTABLE_MASK;
|
||||
shadow_present_mask = has_exec_only ? 0ull : VMX_EPT_READABLE_MASK;
|
||||
shadow_acc_track_mask = VMX_EPT_RWX_MASK;
|
||||
shadow_me_mask = 0ull;
|
||||
|
||||
shadow_host_writable_mask = EPT_SPTE_HOST_WRITABLE;
|
||||
shadow_mmu_writable_mask = EPT_SPTE_MMU_WRITABLE;
|
||||
|
||||
@ -446,7 +440,8 @@ void kvm_mmu_reset_all_pte_masks(void)
|
||||
shadow_x_mask = 0;
|
||||
shadow_present_mask = PT_PRESENT_MASK;
|
||||
shadow_acc_track_mask = 0;
|
||||
shadow_me_mask = sme_me_mask;
|
||||
shadow_me_mask = 0;
|
||||
shadow_me_value = 0;
|
||||
|
||||
shadow_host_writable_mask = DEFAULT_SPTE_HOST_WRITABLE;
|
||||
shadow_mmu_writable_mask = DEFAULT_SPTE_MMU_WRITABLE;
|
||||
|
@ -5,6 +5,8 @@
|
||||
|
||||
#include "mmu_internal.h"
|
||||
|
||||
extern bool __read_mostly enable_mmio_caching;
|
||||
|
||||
/*
|
||||
* A MMU present SPTE is backed by actual memory and may or may not be present
|
||||
* in hardware. E.g. MMIO SPTEs are not considered present. Use bit 11, as it
|
||||
@ -149,6 +151,7 @@ extern u64 __read_mostly shadow_mmio_value;
|
||||
extern u64 __read_mostly shadow_mmio_mask;
|
||||
extern u64 __read_mostly shadow_mmio_access_mask;
|
||||
extern u64 __read_mostly shadow_present_mask;
|
||||
extern u64 __read_mostly shadow_me_value;
|
||||
extern u64 __read_mostly shadow_me_mask;
|
||||
|
||||
/*
|
||||
@ -204,7 +207,7 @@ extern u64 __read_mostly shadow_nonpresent_or_rsvd_lower_gfn_mask;
|
||||
static inline bool is_mmio_spte(u64 spte)
|
||||
{
|
||||
return (spte & shadow_mmio_mask) == shadow_mmio_value &&
|
||||
likely(shadow_mmio_value);
|
||||
likely(enable_mmio_caching);
|
||||
}
|
||||
|
||||
static inline bool is_shadow_present_pte(u64 pte)
|
||||
@ -212,6 +215,17 @@ static inline bool is_shadow_present_pte(u64 pte)
|
||||
return !!(pte & SPTE_MMU_PRESENT_MASK);
|
||||
}
|
||||
|
||||
/*
|
||||
* Returns true if A/D bits are supported in hardware and are enabled by KVM.
|
||||
* When enabled, KVM uses A/D bits for all non-nested MMUs. Because L1 can
|
||||
* disable A/D bits in EPTP12, SP and SPTE variants are needed to handle the
|
||||
* scenario where KVM is using A/D bits for L1, but not L2.
|
||||
*/
|
||||
static inline bool kvm_ad_enabled(void)
|
||||
{
|
||||
return !!shadow_accessed_mask;
|
||||
}
|
||||
|
||||
static inline bool sp_ad_disabled(struct kvm_mmu_page *sp)
|
||||
{
|
||||
return sp->role.ad_disabled;
|
||||
|
@ -310,7 +310,7 @@ static void tdp_mmu_init_child_sp(struct kvm_mmu_page *child_sp,
|
||||
|
||||
hpa_t kvm_tdp_mmu_get_vcpu_root_hpa(struct kvm_vcpu *vcpu)
|
||||
{
|
||||
union kvm_mmu_page_role role = vcpu->arch.mmu->mmu_role.base;
|
||||
union kvm_mmu_page_role role = vcpu->arch.mmu->root_role;
|
||||
struct kvm *kvm = vcpu->kvm;
|
||||
struct kvm_mmu_page *root;
|
||||
|
||||
@ -1100,6 +1100,7 @@ static int tdp_mmu_map_handle_target_level(struct kvm_vcpu *vcpu,
|
||||
|
||||
/* If a MMIO SPTE is installed, the MMIO will need to be emulated. */
|
||||
if (unlikely(is_mmio_spte(new_spte))) {
|
||||
vcpu->stat.pf_mmio_spte_created++;
|
||||
trace_mark_mmio_spte(rcu_dereference(iter->sptep), iter->gfn,
|
||||
new_spte);
|
||||
ret = RET_PF_EMULATE;
|
||||
@ -1108,13 +1109,6 @@ static int tdp_mmu_map_handle_target_level(struct kvm_vcpu *vcpu,
|
||||
rcu_dereference(iter->sptep));
|
||||
}
|
||||
|
||||
/*
|
||||
* Increase pf_fixed in both RET_PF_EMULATE and RET_PF_FIXED to be
|
||||
* consistent with legacy MMU behavior.
|
||||
*/
|
||||
if (ret != RET_PF_SPURIOUS)
|
||||
vcpu->stat.pf_fixed++;
|
||||
|
||||
return ret;
|
||||
}
|
||||
|
||||
@ -1136,7 +1130,7 @@ static int tdp_mmu_link_sp(struct kvm *kvm, struct tdp_iter *iter,
|
||||
struct kvm_mmu_page *sp, bool account_nx,
|
||||
bool shared)
|
||||
{
|
||||
u64 spte = make_nonleaf_spte(sp->spt, !shadow_accessed_mask);
|
||||
u64 spte = make_nonleaf_spte(sp->spt, !kvm_ad_enabled());
|
||||
int ret = 0;
|
||||
|
||||
if (shared) {
|
||||
@ -1859,7 +1853,7 @@ int kvm_tdp_mmu_get_walk(struct kvm_vcpu *vcpu, u64 addr, u64 *sptes,
|
||||
gfn_t gfn = addr >> PAGE_SHIFT;
|
||||
int leaf = -1;
|
||||
|
||||
*root_level = vcpu->arch.mmu->shadow_root_level;
|
||||
*root_level = vcpu->arch.mmu->root_role.level;
|
||||
|
||||
tdp_mmu_for_each_pte(iter, mmu, gfn, gfn + 1) {
|
||||
leaf = iter.level;
|
||||
|
@ -49,6 +49,32 @@
|
||||
* * AMD: [0 .. AMD64_NUM_COUNTERS-1] <=> gp counters
|
||||
*/
|
||||
|
||||
static struct kvm_pmu_ops kvm_pmu_ops __read_mostly;
|
||||
|
||||
#define KVM_X86_PMU_OP(func) \
|
||||
DEFINE_STATIC_CALL_NULL(kvm_x86_pmu_##func, \
|
||||
*(((struct kvm_pmu_ops *)0)->func));
|
||||
#define KVM_X86_PMU_OP_OPTIONAL KVM_X86_PMU_OP
|
||||
#include <asm/kvm-x86-pmu-ops.h>
|
||||
|
||||
void kvm_pmu_ops_update(const struct kvm_pmu_ops *pmu_ops)
|
||||
{
|
||||
memcpy(&kvm_pmu_ops, pmu_ops, sizeof(kvm_pmu_ops));
|
||||
|
||||
#define __KVM_X86_PMU_OP(func) \
|
||||
static_call_update(kvm_x86_pmu_##func, kvm_pmu_ops.func);
|
||||
#define KVM_X86_PMU_OP(func) \
|
||||
WARN_ON(!kvm_pmu_ops.func); __KVM_X86_PMU_OP(func)
|
||||
#define KVM_X86_PMU_OP_OPTIONAL __KVM_X86_PMU_OP
|
||||
#include <asm/kvm-x86-pmu-ops.h>
|
||||
#undef __KVM_X86_PMU_OP
|
||||
}
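A hedged sketch of the registration flow this enables: a vendor module hands its kvm_pmu_ops to common code through kvm_x86_init_ops (extended earlier in this diff), and the static calls are patched once during setup. All example_* names are placeholders, not real kernel symbols:

/* Vendor side (placeholder callbacks): */
static struct kvm_pmu_ops example_pmu_ops = {
	.pmc_perf_hw_id	= example_pmc_perf_hw_id,
	.pmc_is_enabled	= example_pmc_is_enabled,
	.pmc_idx_to_pmc	= example_pmc_idx_to_pmc,
	/* ... remaining mandatory hooks ... */
};

static struct kvm_x86_init_ops example_init_ops __initdata = {
	.runtime_ops	= &example_x86_ops,
	.pmu_ops	= &example_pmu_ops,
};

/* Common side (sketch): invoked once while the vendor module is set up. */
static inline void example_hardware_setup(struct kvm_x86_init_ops *ops)
{
	kvm_pmu_ops_update(ops->pmu_ops);
}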
|
||||
|
||||
static inline bool pmc_is_enabled(struct kvm_pmc *pmc)
|
||||
{
|
||||
return static_call(kvm_x86_pmu_pmc_is_enabled)(pmc);
|
||||
}
|
||||
|
||||
static void kvm_pmi_trigger_fn(struct irq_work *irq_work)
|
||||
{
|
||||
struct kvm_pmu *pmu = container_of(irq_work, struct kvm_pmu, irq_work);
|
||||
@ -216,7 +242,7 @@ void reprogram_gp_counter(struct kvm_pmc *pmc, u64 eventsel)
|
||||
ARCH_PERFMON_EVENTSEL_CMASK |
HSW_IN_TX |
HSW_IN_TX_CHECKPOINTED))) {
-		config = kvm_x86_ops.pmu_ops->pmc_perf_hw_id(pmc);
+		config = static_call(kvm_x86_pmu_pmc_perf_hw_id)(pmc);
if (config != PERF_COUNT_HW_MAX)
type = PERF_TYPE_HARDWARE;
}
@@ -266,7 +292,7 @@ void reprogram_fixed_counter(struct kvm_pmc *pmc, u8 ctrl, int idx)

pmc->current_config = (u64)ctrl;
pmc_reprogram_counter(pmc, PERF_TYPE_HARDWARE,
-			      kvm_x86_ops.pmu_ops->pmc_perf_hw_id(pmc),
+			      static_call(kvm_x86_pmu_pmc_perf_hw_id)(pmc),
!(en_field & 0x2), /* exclude user */
!(en_field & 0x1), /* exclude kernel */
pmi);
@@ -275,7 +301,7 @@ EXPORT_SYMBOL_GPL(reprogram_fixed_counter);

void reprogram_counter(struct kvm_pmu *pmu, int pmc_idx)
{
-	struct kvm_pmc *pmc = kvm_x86_ops.pmu_ops->pmc_idx_to_pmc(pmu, pmc_idx);
+	struct kvm_pmc *pmc = static_call(kvm_x86_pmu_pmc_idx_to_pmc)(pmu, pmc_idx);

if (!pmc)
return;
@@ -297,7 +323,7 @@ void kvm_pmu_handle_event(struct kvm_vcpu *vcpu)
int bit;

for_each_set_bit(bit, pmu->reprogram_pmi, X86_PMC_IDX_MAX) {
-		struct kvm_pmc *pmc = kvm_x86_ops.pmu_ops->pmc_idx_to_pmc(pmu, bit);
+		struct kvm_pmc *pmc = static_call(kvm_x86_pmu_pmc_idx_to_pmc)(pmu, bit);

if (unlikely(!pmc || !pmc->perf_event)) {
clear_bit(bit, pmu->reprogram_pmi);
@@ -319,7 +345,7 @@ void kvm_pmu_handle_event(struct kvm_vcpu *vcpu)
/* check if idx is a valid index to access PMU */
bool kvm_pmu_is_valid_rdpmc_ecx(struct kvm_vcpu *vcpu, unsigned int idx)
{
-	return kvm_x86_ops.pmu_ops->is_valid_rdpmc_ecx(vcpu, idx);
+	return static_call(kvm_x86_pmu_is_valid_rdpmc_ecx)(vcpu, idx);
}

bool is_vmware_backdoor_pmc(u32 pmc_idx)
@@ -369,7 +395,7 @@ int kvm_pmu_rdpmc(struct kvm_vcpu *vcpu, unsigned idx, u64 *data)
if (is_vmware_backdoor_pmc(idx))
return kvm_pmu_rdpmc_vmware(vcpu, idx, data);

-	pmc = kvm_x86_ops.pmu_ops->rdpmc_ecx_to_pmc(vcpu, idx, &mask);
+	pmc = static_call(kvm_x86_pmu_rdpmc_ecx_to_pmc)(vcpu, idx, &mask);
if (!pmc)
return 1;

@@ -385,22 +411,21 @@ int kvm_pmu_rdpmc(struct kvm_vcpu *vcpu, unsigned idx, u64 *data)
void kvm_pmu_deliver_pmi(struct kvm_vcpu *vcpu)
{
if (lapic_in_kernel(vcpu)) {
-		if (kvm_x86_ops.pmu_ops->deliver_pmi)
-			kvm_x86_ops.pmu_ops->deliver_pmi(vcpu);
+		static_call_cond(kvm_x86_pmu_deliver_pmi)(vcpu);
kvm_apic_local_deliver(vcpu->arch.apic, APIC_LVTPC);
}
}

bool kvm_pmu_is_valid_msr(struct kvm_vcpu *vcpu, u32 msr)
{
-	return kvm_x86_ops.pmu_ops->msr_idx_to_pmc(vcpu, msr) ||
-		kvm_x86_ops.pmu_ops->is_valid_msr(vcpu, msr);
+	return static_call(kvm_x86_pmu_msr_idx_to_pmc)(vcpu, msr) ||
+		static_call(kvm_x86_pmu_is_valid_msr)(vcpu, msr);
}

static void kvm_pmu_mark_pmc_in_use(struct kvm_vcpu *vcpu, u32 msr)
{
struct kvm_pmu *pmu = vcpu_to_pmu(vcpu);
-	struct kvm_pmc *pmc = kvm_x86_ops.pmu_ops->msr_idx_to_pmc(vcpu, msr);
+	struct kvm_pmc *pmc = static_call(kvm_x86_pmu_msr_idx_to_pmc)(vcpu, msr);

if (pmc)
__set_bit(pmc->idx, pmu->pmc_in_use);
@@ -408,13 +433,13 @@ static void kvm_pmu_mark_pmc_in_use(struct kvm_vcpu *vcpu, u32 msr)

int kvm_pmu_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
{
-	return kvm_x86_ops.pmu_ops->get_msr(vcpu, msr_info);
+	return static_call(kvm_x86_pmu_get_msr)(vcpu, msr_info);
}

int kvm_pmu_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
{
kvm_pmu_mark_pmc_in_use(vcpu, msr_info->index);
-	return kvm_x86_ops.pmu_ops->set_msr(vcpu, msr_info);
+	return static_call(kvm_x86_pmu_set_msr)(vcpu, msr_info);
}

/* refresh PMU settings. This function generally is called when underlying
@@ -423,7 +448,7 @@ int kvm_pmu_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
*/
void kvm_pmu_refresh(struct kvm_vcpu *vcpu)
{
-	kvm_x86_ops.pmu_ops->refresh(vcpu);
+	static_call(kvm_x86_pmu_refresh)(vcpu);
}

void kvm_pmu_reset(struct kvm_vcpu *vcpu)
@@ -431,7 +456,7 @@ void kvm_pmu_reset(struct kvm_vcpu *vcpu)
struct kvm_pmu *pmu = vcpu_to_pmu(vcpu);

irq_work_sync(&pmu->irq_work);
-	kvm_x86_ops.pmu_ops->reset(vcpu);
+	static_call(kvm_x86_pmu_reset)(vcpu);
}

void kvm_pmu_init(struct kvm_vcpu *vcpu)
@@ -439,7 +464,7 @@ void kvm_pmu_init(struct kvm_vcpu *vcpu)
struct kvm_pmu *pmu = vcpu_to_pmu(vcpu);

memset(pmu, 0, sizeof(*pmu));
-	kvm_x86_ops.pmu_ops->init(vcpu);
+	static_call(kvm_x86_pmu_init)(vcpu);
init_irq_work(&pmu->irq_work, kvm_pmi_trigger_fn);
pmu->event_count = 0;
pmu->need_cleanup = false;
@@ -471,14 +496,13 @@ void kvm_pmu_cleanup(struct kvm_vcpu *vcpu)
pmu->pmc_in_use, X86_PMC_IDX_MAX);

for_each_set_bit(i, bitmask, X86_PMC_IDX_MAX) {
-		pmc = kvm_x86_ops.pmu_ops->pmc_idx_to_pmc(pmu, i);
+		pmc = static_call(kvm_x86_pmu_pmc_idx_to_pmc)(pmu, i);

if (pmc && pmc->perf_event && !pmc_speculative_in_use(pmc))
pmc_stop_counter(pmc);
}

-	if (kvm_x86_ops.pmu_ops->cleanup)
-		kvm_x86_ops.pmu_ops->cleanup(vcpu);
+	static_call_cond(kvm_x86_pmu_cleanup)(vcpu);

bitmap_zero(pmu->pmc_in_use, X86_PMC_IDX_MAX);
}
@@ -508,7 +532,7 @@ static inline bool eventsel_match_perf_hw_id(struct kvm_pmc *pmc,
unsigned int config;

pmc->eventsel &= (ARCH_PERFMON_EVENTSEL_EVENT | ARCH_PERFMON_EVENTSEL_UMASK);
-	config = kvm_x86_ops.pmu_ops->pmc_perf_hw_id(pmc);
+	config = static_call(kvm_x86_pmu_pmc_perf_hw_id)(pmc);
pmc->eventsel = old_eventsel;
return config == perf_hw_id;
}
@@ -536,7 +560,7 @@ void kvm_pmu_trigger_event(struct kvm_vcpu *vcpu, u64 perf_hw_id)
int i;

for_each_set_bit(i, pmu->all_valid_pmc_idx, X86_PMC_IDX_MAX) {
-		pmc = kvm_x86_ops.pmu_ops->pmc_idx_to_pmc(pmu, i);
+		pmc = static_call(kvm_x86_pmu_pmc_idx_to_pmc)(pmu, i);

if (!pmc || !pmc_is_enabled(pmc) || !pmc_speculative_in_use(pmc))
continue;
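The pmu.c hunks above all follow the same pattern: each indirect kvm_x86_ops.pmu_ops->hook() call becomes a static call. As a rough hand-written sketch of that pattern (the real code generates the declarations from a macro list; the names below mirror this diff, but the stand-alone form is illustrative):

    /* Declare one static-call key per PMU hook, initially patched to a NOP. */
    DEFINE_STATIC_CALL_NULL(kvm_x86_pmu_refresh,
    			*(((struct kvm_pmu_ops *)NULL)->refresh));

    /* Once the vendor ops table is known, retarget the call site. */
    static_call_update(kvm_x86_pmu_refresh, kvm_pmu_ops.refresh);

    /* Call sites become direct branches instead of retpoline-guarded
     * indirect calls through the ops struct: */
    static_call(kvm_x86_pmu_refresh)(vcpu);		/* mandatory hook */
    static_call_cond(kvm_x86_pmu_deliver_pmi)(vcpu);	/* optional hook, no-op if unset */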
@@ -39,6 +39,8 @@ struct kvm_pmu_ops {
void (*cleanup)(struct kvm_vcpu *vcpu);
};

+void kvm_pmu_ops_update(const struct kvm_pmu_ops *pmu_ops);
+
static inline u64 pmc_bitmask(struct kvm_pmc *pmc)
{
struct kvm_pmu *pmu = pmc_to_pmu(pmc);
@@ -86,11 +88,6 @@ static inline bool pmc_is_fixed(struct kvm_pmc *pmc)
return pmc->type == KVM_PMC_FIXED;
}

-static inline bool pmc_is_enabled(struct kvm_pmc *pmc)
-{
-	return kvm_x86_ops.pmu_ops->pmc_is_enabled(pmc);
-}
-
static inline bool kvm_valid_perf_global_ctrl(struct kvm_pmu *pmu,
u64 data)
{
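The new kvm_pmu_ops_update() declaration is what later allows amd_pmu_ops to become __initdata: the vendor table is copied into static calls once at init and never dereferenced again at runtime. A minimal sketch of what such an updater could look like (the actual implementation presumably walks the hooks via a generated macro list; this hand-rolled version is an assumption, not code from this series):

    static struct kvm_pmu_ops kvm_pmu_ops __read_mostly;

    void kvm_pmu_ops_update(const struct kvm_pmu_ops *pmu_ops)
    {
    	/* Keep a private copy; the vendor struct may live in __initdata. */
    	memcpy(&kvm_pmu_ops, pmu_ops, sizeof(kvm_pmu_ops));

    	/* Retarget one static call per hook (in reality one line per hook,
    	 * generated rather than written out by hand). */
    	static_call_update(kvm_x86_pmu_refresh, kvm_pmu_ops.refresh);
    	static_call_update(kvm_x86_pmu_reset, kvm_pmu_ops.reset);
    	static_call_update(kvm_x86_pmu_init, kvm_pmu_ops.init);
    }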
@ -165,9 +165,8 @@ free_avic:
|
||||
return err;
|
||||
}
|
||||
|
||||
void avic_init_vmcb(struct vcpu_svm *svm)
|
||||
void avic_init_vmcb(struct vcpu_svm *svm, struct vmcb *vmcb)
|
||||
{
|
||||
struct vmcb *vmcb = svm->vmcb;
|
||||
struct kvm_svm *kvm_svm = to_kvm_svm(svm->vcpu.kvm);
|
||||
phys_addr_t bpa = __sme_set(page_to_phys(svm->avic_backing_page));
|
||||
phys_addr_t lpa = __sme_set(page_to_phys(kvm_svm->avic_logical_id_table_page));
|
||||
@ -285,11 +284,77 @@ void avic_ring_doorbell(struct kvm_vcpu *vcpu)
|
||||
put_cpu();
|
||||
}
|
||||
|
||||
static void avic_kick_target_vcpus(struct kvm *kvm, struct kvm_lapic *source,
|
||||
u32 icrl, u32 icrh)
|
||||
/*
|
||||
* A fast-path version of avic_kick_target_vcpus(), which attempts to match
|
||||
* destination APIC ID to vCPU without looping through all vCPUs.
|
||||
*/
|
||||
static int avic_kick_target_vcpus_fast(struct kvm *kvm, struct kvm_lapic *source,
|
||||
u32 icrl, u32 icrh, u32 index)
|
||||
{
|
||||
u32 dest, apic_id;
|
||||
struct kvm_vcpu *vcpu;
|
||||
int dest_mode = icrl & APIC_DEST_MASK;
|
||||
int shorthand = icrl & APIC_SHORT_MASK;
|
||||
struct kvm_svm *kvm_svm = to_kvm_svm(kvm);
|
||||
u32 *avic_logical_id_table = page_address(kvm_svm->avic_logical_id_table_page);
|
||||
|
||||
if (shorthand != APIC_DEST_NOSHORT)
|
||||
return -EINVAL;
|
||||
|
||||
/*
|
||||
* The AVIC incomplete IPI #vmexit info provides index into
|
||||
* the physical APIC ID table, which can be used to derive
|
||||
* guest physical APIC ID.
|
||||
*/
|
||||
if (dest_mode == APIC_DEST_PHYSICAL) {
|
||||
apic_id = index;
|
||||
} else {
|
||||
if (!apic_x2apic_mode(source)) {
|
||||
/* For xAPIC logical mode, the index is for logical APIC table. */
|
||||
apic_id = avic_logical_id_table[index] & 0x1ff;
|
||||
} else {
|
||||
return -EINVAL;
|
||||
}
|
||||
}
|
||||
|
||||
/*
|
||||
* Assume the vCPU ID is the same as the physical APIC ID,
* and use it to retrieve the target vCPU.
|
||||
*/
|
||||
vcpu = kvm_get_vcpu_by_id(kvm, apic_id);
|
||||
if (!vcpu)
|
||||
return -EINVAL;
|
||||
|
||||
if (apic_x2apic_mode(vcpu->arch.apic))
|
||||
dest = icrh;
|
||||
else
|
||||
dest = GET_APIC_DEST_FIELD(icrh);
|
||||
|
||||
/*
|
||||
* Try matching the destination APIC ID with the vCPU.
|
||||
*/
|
||||
if (kvm_apic_match_dest(vcpu, source, shorthand, dest, dest_mode)) {
|
||||
vcpu->arch.apic->irr_pending = true;
|
||||
svm_complete_interrupt_delivery(vcpu,
|
||||
icrl & APIC_MODE_MASK,
|
||||
icrl & APIC_INT_LEVELTRIG,
|
||||
icrl & APIC_VECTOR_MASK);
|
||||
return 0;
|
||||
}
|
||||
|
||||
return -EINVAL;
|
||||
}
|
||||
|
||||
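For reference, the dest value computed in the fast path above depends on the APIC mode: in x2APIC the high dword of the ICR is the full 32-bit destination ID, while in xAPIC the destination sits in the top byte of the high dword, which is what GET_APIC_DEST_FIELD() extracts. A small illustrative helper (not part of this patch) that mirrors that logic:

    /* Illustrative only: mirrors the icrh handling in avic_kick_target_vcpus_fast(). */
    static u32 icr_dest_id(u32 icrh, bool x2apic)
    {
    	/* x2APIC: ICR[63:32] is the 32-bit destination ID itself. */
    	if (x2apic)
    		return icrh;

    	/* xAPIC: destination field is bits 24-31 of the high dword. */
    	return GET_APIC_DEST_FIELD(icrh);
    }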
static void avic_kick_target_vcpus(struct kvm *kvm, struct kvm_lapic *source,
|
||||
u32 icrl, u32 icrh, u32 index)
|
||||
{
|
||||
unsigned long i;
|
||||
struct kvm_vcpu *vcpu;
|
||||
|
||||
if (!avic_kick_target_vcpus_fast(kvm, source, icrl, icrh, index))
|
||||
return;
|
||||
|
||||
trace_kvm_avic_kick_vcpu_slowpath(icrh, icrl, index);
|
||||
|
||||
/*
|
||||
* Wake any target vCPUs that are blocking, i.e. waiting for a wake
|
||||
@ -316,7 +381,7 @@ int avic_incomplete_ipi_interception(struct kvm_vcpu *vcpu)
|
||||
u32 icrh = svm->vmcb->control.exit_info_1 >> 32;
|
||||
u32 icrl = svm->vmcb->control.exit_info_1;
|
||||
u32 id = svm->vmcb->control.exit_info_2 >> 32;
|
||||
u32 index = svm->vmcb->control.exit_info_2 & 0xFF;
|
||||
u32 index = svm->vmcb->control.exit_info_2 & 0x1FF;
|
||||
struct kvm_lapic *apic = vcpu->arch.apic;
|
||||
|
||||
trace_kvm_avic_incomplete_ipi(vcpu->vcpu_id, icrh, icrl, id, index);
|
||||
@ -343,7 +408,7 @@ int avic_incomplete_ipi_interception(struct kvm_vcpu *vcpu)
|
||||
* set the appropriate IRR bits on the valid target
|
||||
* vcpus. So, we just need to kick the appropriate vcpu.
|
||||
*/
|
||||
avic_kick_target_vcpus(vcpu->kvm, apic, icrl, icrh);
|
||||
avic_kick_target_vcpus(vcpu->kvm, apic, icrl, icrh, index);
|
||||
break;
|
||||
case AVIC_IPI_FAILURE_INVALID_TARGET:
|
||||
break;
|
||||
@ -357,6 +422,13 @@ int avic_incomplete_ipi_interception(struct kvm_vcpu *vcpu)
|
||||
return 1;
|
||||
}
|
||||
|
||||
unsigned long avic_vcpu_get_apicv_inhibit_reasons(struct kvm_vcpu *vcpu)
|
||||
{
|
||||
if (is_guest_mode(vcpu))
|
||||
return APICV_INHIBIT_REASON_NESTED;
|
||||
return 0;
|
||||
}
|
||||
|
||||
static u32 *avic_get_logical_id_entry(struct kvm_vcpu *vcpu, u32 ldr, bool flat)
|
||||
{
|
||||
struct kvm_svm *kvm_svm = to_kvm_svm(vcpu->kvm);
|
||||
|
@ -36,40 +36,45 @@ static void nested_svm_inject_npf_exit(struct kvm_vcpu *vcpu,
|
||||
struct x86_exception *fault)
|
||||
{
|
||||
struct vcpu_svm *svm = to_svm(vcpu);
|
||||
struct vmcb *vmcb = svm->vmcb;
|
||||
|
||||
if (svm->vmcb->control.exit_code != SVM_EXIT_NPF) {
|
||||
if (vmcb->control.exit_code != SVM_EXIT_NPF) {
|
||||
/*
|
||||
* TODO: track the cause of the nested page fault, and
|
||||
* correctly fill in the high bits of exit_info_1.
|
||||
*/
|
||||
svm->vmcb->control.exit_code = SVM_EXIT_NPF;
|
||||
svm->vmcb->control.exit_code_hi = 0;
|
||||
svm->vmcb->control.exit_info_1 = (1ULL << 32);
|
||||
svm->vmcb->control.exit_info_2 = fault->address;
|
||||
vmcb->control.exit_code = SVM_EXIT_NPF;
|
||||
vmcb->control.exit_code_hi = 0;
|
||||
vmcb->control.exit_info_1 = (1ULL << 32);
|
||||
vmcb->control.exit_info_2 = fault->address;
|
||||
}
|
||||
|
||||
svm->vmcb->control.exit_info_1 &= ~0xffffffffULL;
|
||||
svm->vmcb->control.exit_info_1 |= fault->error_code;
|
||||
vmcb->control.exit_info_1 &= ~0xffffffffULL;
|
||||
vmcb->control.exit_info_1 |= fault->error_code;
|
||||
|
||||
nested_svm_vmexit(svm);
|
||||
}
|
||||
|
||||
static void svm_inject_page_fault_nested(struct kvm_vcpu *vcpu, struct x86_exception *fault)
|
||||
static bool nested_svm_handle_page_fault_workaround(struct kvm_vcpu *vcpu,
|
||||
struct x86_exception *fault)
|
||||
{
|
||||
struct vcpu_svm *svm = to_svm(vcpu);
|
||||
WARN_ON(!is_guest_mode(vcpu));
|
||||
struct vcpu_svm *svm = to_svm(vcpu);
|
||||
struct vmcb *vmcb = svm->vmcb;
|
||||
|
||||
WARN_ON(!is_guest_mode(vcpu));
|
||||
|
||||
if (vmcb12_is_intercept(&svm->nested.ctl,
|
||||
INTERCEPT_EXCEPTION_OFFSET + PF_VECTOR) &&
|
||||
!svm->nested.nested_run_pending) {
|
||||
svm->vmcb->control.exit_code = SVM_EXIT_EXCP_BASE + PF_VECTOR;
|
||||
svm->vmcb->control.exit_code_hi = 0;
|
||||
svm->vmcb->control.exit_info_1 = fault->error_code;
|
||||
svm->vmcb->control.exit_info_2 = fault->address;
|
||||
nested_svm_vmexit(svm);
|
||||
} else {
|
||||
kvm_inject_page_fault(vcpu, fault);
|
||||
}
|
||||
!WARN_ON_ONCE(svm->nested.nested_run_pending)) {
|
||||
vmcb->control.exit_code = SVM_EXIT_EXCP_BASE + PF_VECTOR;
|
||||
vmcb->control.exit_code_hi = 0;
|
||||
vmcb->control.exit_info_1 = fault->error_code;
|
||||
vmcb->control.exit_info_2 = fault->address;
|
||||
nested_svm_vmexit(svm);
|
||||
return true;
|
||||
}
|
||||
|
||||
return false;
|
||||
}
|
||||
|
||||
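nested_svm_handle_page_fault_workaround() now returns bool and no longer injects the fault itself when the workaround does not apply; presumably the common x86 code owns that fallback. A hedged sketch of what such a caller could look like (the actual call site is not part of this excerpt, and the function name here is made up for illustration):

    /* Illustrative caller, assuming the nested_ops hook wired up later in this diff. */
    static void inject_emulated_page_fault(struct kvm_vcpu *vcpu,
    				       struct x86_exception *fault)
    {
    	if (is_guest_mode(vcpu) &&
    	    kvm_x86_ops.nested_ops->handle_page_fault_workaround(vcpu, fault))
    		return;		/* reflected to L1 as a #PF vmexit */

    	kvm_inject_page_fault(vcpu, fault);
    }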
static u64 nested_svm_get_tdp_pdptr(struct kvm_vcpu *vcpu, int index)
|
||||
@ -121,6 +126,20 @@ static void nested_svm_uninit_mmu_context(struct kvm_vcpu *vcpu)
|
||||
vcpu->arch.walk_mmu = &vcpu->arch.root_mmu;
|
||||
}
|
||||
|
||||
static bool nested_vmcb_needs_vls_intercept(struct vcpu_svm *svm)
|
||||
{
|
||||
if (!svm->v_vmload_vmsave_enabled)
|
||||
return true;
|
||||
|
||||
if (!nested_npt_enabled(svm))
|
||||
return true;
|
||||
|
||||
if (!(svm->nested.ctl.virt_ext & VIRTUAL_VMLOAD_VMSAVE_ENABLE_MASK))
|
||||
return true;
|
||||
|
||||
return false;
|
||||
}
|
||||
|
||||
void recalc_intercepts(struct vcpu_svm *svm)
|
||||
{
|
||||
struct vmcb_control_area *c, *h;
|
||||
@ -162,8 +181,17 @@ void recalc_intercepts(struct vcpu_svm *svm)
|
||||
if (!intercept_smi)
|
||||
vmcb_clr_intercept(c, INTERCEPT_SMI);
|
||||
|
||||
vmcb_set_intercept(c, INTERCEPT_VMLOAD);
|
||||
vmcb_set_intercept(c, INTERCEPT_VMSAVE);
|
||||
if (nested_vmcb_needs_vls_intercept(svm)) {
|
||||
/*
|
||||
* If the virtual VMLOAD/VMSAVE is not enabled for the L2,
|
||||
* we must intercept these instructions to correctly
|
||||
* emulate them in case L1 doesn't intercept them.
|
||||
*/
|
||||
vmcb_set_intercept(c, INTERCEPT_VMLOAD);
|
||||
vmcb_set_intercept(c, INTERCEPT_VMSAVE);
|
||||
} else {
|
||||
WARN_ON(!(c->virt_ext & VIRTUAL_VMLOAD_VMSAVE_ENABLE_MASK));
|
||||
}
|
||||
}
|
||||
|
||||
/*
|
||||
@ -413,6 +441,10 @@ void nested_sync_control_from_vmcb02(struct vcpu_svm *svm)
|
||||
*/
|
||||
mask &= ~V_IRQ_MASK;
|
||||
}
|
||||
|
||||
if (nested_vgif_enabled(svm))
|
||||
mask |= V_GIF_MASK;
|
||||
|
||||
svm->nested.ctl.int_ctl &= ~mask;
|
||||
svm->nested.ctl.int_ctl |= svm->vmcb->control.int_ctl & mask;
|
||||
}
|
||||
@ -454,11 +486,6 @@ static void nested_save_pending_event_to_vmcb12(struct vcpu_svm *svm,
|
||||
vmcb12->control.exit_int_info = exit_int_info;
|
||||
}
|
||||
|
||||
static inline bool nested_npt_enabled(struct vcpu_svm *svm)
|
||||
{
|
||||
return svm->nested.ctl.nested_ctl & SVM_NESTED_CTL_NP_ENABLE;
|
||||
}
|
||||
|
||||
static void nested_svm_transition_tlb_flush(struct kvm_vcpu *vcpu)
|
||||
{
|
||||
/*
|
||||
@ -515,6 +542,8 @@ void nested_vmcb02_compute_g_pat(struct vcpu_svm *svm)
|
||||
static void nested_vmcb02_prepare_save(struct vcpu_svm *svm, struct vmcb *vmcb12)
|
||||
{
|
||||
bool new_vmcb12 = false;
|
||||
struct vmcb *vmcb01 = svm->vmcb01.ptr;
|
||||
struct vmcb *vmcb02 = svm->nested.vmcb02.ptr;
|
||||
|
||||
nested_vmcb02_compute_g_pat(svm);
|
||||
|
||||
@ -526,18 +555,18 @@ static void nested_vmcb02_prepare_save(struct vcpu_svm *svm, struct vmcb *vmcb12
|
||||
}
|
||||
|
||||
if (unlikely(new_vmcb12 || vmcb_is_dirty(vmcb12, VMCB_SEG))) {
|
||||
svm->vmcb->save.es = vmcb12->save.es;
|
||||
svm->vmcb->save.cs = vmcb12->save.cs;
|
||||
svm->vmcb->save.ss = vmcb12->save.ss;
|
||||
svm->vmcb->save.ds = vmcb12->save.ds;
|
||||
svm->vmcb->save.cpl = vmcb12->save.cpl;
|
||||
vmcb_mark_dirty(svm->vmcb, VMCB_SEG);
|
||||
vmcb02->save.es = vmcb12->save.es;
|
||||
vmcb02->save.cs = vmcb12->save.cs;
|
||||
vmcb02->save.ss = vmcb12->save.ss;
|
||||
vmcb02->save.ds = vmcb12->save.ds;
|
||||
vmcb02->save.cpl = vmcb12->save.cpl;
|
||||
vmcb_mark_dirty(vmcb02, VMCB_SEG);
|
||||
}
|
||||
|
||||
if (unlikely(new_vmcb12 || vmcb_is_dirty(vmcb12, VMCB_DT))) {
|
||||
svm->vmcb->save.gdtr = vmcb12->save.gdtr;
|
||||
svm->vmcb->save.idtr = vmcb12->save.idtr;
|
||||
vmcb_mark_dirty(svm->vmcb, VMCB_DT);
|
||||
vmcb02->save.gdtr = vmcb12->save.gdtr;
|
||||
vmcb02->save.idtr = vmcb12->save.idtr;
|
||||
vmcb_mark_dirty(vmcb02, VMCB_DT);
|
||||
}
|
||||
|
||||
kvm_set_rflags(&svm->vcpu, vmcb12->save.rflags | X86_EFLAGS_FIXED);
|
||||
@ -554,47 +583,59 @@ static void nested_vmcb02_prepare_save(struct vcpu_svm *svm, struct vmcb *vmcb12
|
||||
kvm_rip_write(&svm->vcpu, vmcb12->save.rip);
|
||||
|
||||
/* In case we don't even reach vcpu_run, the fields are not updated */
|
||||
svm->vmcb->save.rax = vmcb12->save.rax;
|
||||
svm->vmcb->save.rsp = vmcb12->save.rsp;
|
||||
svm->vmcb->save.rip = vmcb12->save.rip;
|
||||
vmcb02->save.rax = vmcb12->save.rax;
|
||||
vmcb02->save.rsp = vmcb12->save.rsp;
|
||||
vmcb02->save.rip = vmcb12->save.rip;
|
||||
|
||||
/* These bits will be set properly on the first execution when new_vmcb12 is true */
|
||||
if (unlikely(new_vmcb12 || vmcb_is_dirty(vmcb12, VMCB_DR))) {
|
||||
svm->vmcb->save.dr7 = svm->nested.save.dr7 | DR7_FIXED_1;
|
||||
vmcb02->save.dr7 = svm->nested.save.dr7 | DR7_FIXED_1;
|
||||
svm->vcpu.arch.dr6 = svm->nested.save.dr6 | DR6_ACTIVE_LOW;
|
||||
vmcb_mark_dirty(svm->vmcb, VMCB_DR);
|
||||
vmcb_mark_dirty(vmcb02, VMCB_DR);
|
||||
}
|
||||
|
||||
if (unlikely(svm->lbrv_enabled && (svm->nested.ctl.virt_ext & LBR_CTL_ENABLE_MASK))) {
|
||||
/*
|
||||
* Reserved bits of DEBUGCTL are ignored. Be consistent with
|
||||
* svm_set_msr's definition of reserved bits.
|
||||
*/
|
||||
svm_copy_lbrs(vmcb02, vmcb12);
|
||||
vmcb02->save.dbgctl &= ~DEBUGCTL_RESERVED_BITS;
|
||||
svm_update_lbrv(&svm->vcpu);
|
||||
|
||||
} else if (unlikely(vmcb01->control.virt_ext & LBR_CTL_ENABLE_MASK)) {
|
||||
svm_copy_lbrs(vmcb02, vmcb01);
|
||||
}
|
||||
}
|
||||
|
||||
static void nested_vmcb02_prepare_control(struct vcpu_svm *svm)
|
||||
{
|
||||
const u32 int_ctl_vmcb01_bits =
|
||||
V_INTR_MASKING_MASK | V_GIF_MASK | V_GIF_ENABLE_MASK;
|
||||
|
||||
const u32 int_ctl_vmcb12_bits = V_TPR_MASK | V_IRQ_INJECTION_BITS_MASK;
|
||||
u32 int_ctl_vmcb01_bits = V_INTR_MASKING_MASK;
|
||||
u32 int_ctl_vmcb12_bits = V_TPR_MASK | V_IRQ_INJECTION_BITS_MASK;
|
||||
|
||||
struct kvm_vcpu *vcpu = &svm->vcpu;
|
||||
struct vmcb *vmcb01 = svm->vmcb01.ptr;
|
||||
struct vmcb *vmcb02 = svm->nested.vmcb02.ptr;
|
||||
|
||||
/*
|
||||
* Filled at exit: exit_code, exit_code_hi, exit_info_1, exit_info_2,
|
||||
* exit_int_info, exit_int_info_err, next_rip, insn_len, insn_bytes.
|
||||
*/
|
||||
|
||||
/*
|
||||
* Also covers avic_vapic_bar, avic_backing_page, avic_logical_id,
|
||||
* avic_physical_id.
|
||||
*/
|
||||
WARN_ON(kvm_apicv_activated(svm->vcpu.kvm));
|
||||
if (svm->vgif_enabled && (svm->nested.ctl.int_ctl & V_GIF_ENABLE_MASK))
|
||||
int_ctl_vmcb12_bits |= (V_GIF_MASK | V_GIF_ENABLE_MASK);
|
||||
else
|
||||
int_ctl_vmcb01_bits |= (V_GIF_MASK | V_GIF_ENABLE_MASK);
|
||||
|
||||
/* Copied from vmcb01. msrpm_base can be overwritten later. */
|
||||
svm->vmcb->control.nested_ctl = svm->vmcb01.ptr->control.nested_ctl;
|
||||
svm->vmcb->control.iopm_base_pa = svm->vmcb01.ptr->control.iopm_base_pa;
|
||||
svm->vmcb->control.msrpm_base_pa = svm->vmcb01.ptr->control.msrpm_base_pa;
|
||||
vmcb02->control.nested_ctl = vmcb01->control.nested_ctl;
|
||||
vmcb02->control.iopm_base_pa = vmcb01->control.iopm_base_pa;
|
||||
vmcb02->control.msrpm_base_pa = vmcb01->control.msrpm_base_pa;
|
||||
|
||||
/* Done at vmrun: asid. */
|
||||
|
||||
/* Also overwritten later if necessary. */
|
||||
svm->vmcb->control.tlb_ctl = TLB_CONTROL_DO_NOTHING;
|
||||
vmcb02->control.tlb_ctl = TLB_CONTROL_DO_NOTHING;
|
||||
|
||||
/* nested_cr3. */
|
||||
if (nested_npt_enabled(svm))
|
||||
@ -605,21 +646,53 @@ static void nested_vmcb02_prepare_control(struct vcpu_svm *svm)
|
||||
svm->nested.ctl.tsc_offset,
|
||||
svm->tsc_ratio_msr);
|
||||
|
||||
svm->vmcb->control.tsc_offset = vcpu->arch.tsc_offset;
|
||||
vmcb02->control.tsc_offset = vcpu->arch.tsc_offset;
|
||||
|
||||
if (svm->tsc_ratio_msr != kvm_default_tsc_scaling_ratio) {
|
||||
WARN_ON(!svm->tsc_scaling_enabled);
|
||||
nested_svm_update_tsc_ratio_msr(vcpu);
|
||||
}
|
||||
|
||||
svm->vmcb->control.int_ctl =
|
||||
vmcb02->control.int_ctl =
|
||||
(svm->nested.ctl.int_ctl & int_ctl_vmcb12_bits) |
|
||||
(svm->vmcb01.ptr->control.int_ctl & int_ctl_vmcb01_bits);
|
||||
(vmcb01->control.int_ctl & int_ctl_vmcb01_bits);
|
||||
|
||||
svm->vmcb->control.int_vector = svm->nested.ctl.int_vector;
|
||||
svm->vmcb->control.int_state = svm->nested.ctl.int_state;
|
||||
svm->vmcb->control.event_inj = svm->nested.ctl.event_inj;
|
||||
svm->vmcb->control.event_inj_err = svm->nested.ctl.event_inj_err;
|
||||
vmcb02->control.int_vector = svm->nested.ctl.int_vector;
|
||||
vmcb02->control.int_state = svm->nested.ctl.int_state;
|
||||
vmcb02->control.event_inj = svm->nested.ctl.event_inj;
|
||||
vmcb02->control.event_inj_err = svm->nested.ctl.event_inj_err;
|
||||
|
||||
vmcb02->control.virt_ext = vmcb01->control.virt_ext &
|
||||
LBR_CTL_ENABLE_MASK;
|
||||
if (svm->lbrv_enabled)
|
||||
vmcb02->control.virt_ext |=
|
||||
(svm->nested.ctl.virt_ext & LBR_CTL_ENABLE_MASK);
|
||||
|
||||
if (!nested_vmcb_needs_vls_intercept(svm))
|
||||
vmcb02->control.virt_ext |= VIRTUAL_VMLOAD_VMSAVE_ENABLE_MASK;
|
||||
|
||||
if (kvm_pause_in_guest(svm->vcpu.kvm)) {
|
||||
/* use guest values since host doesn't use them */
|
||||
vmcb02->control.pause_filter_count =
|
||||
svm->pause_filter_enabled ?
|
||||
svm->nested.ctl.pause_filter_count : 0;
|
||||
|
||||
vmcb02->control.pause_filter_thresh =
|
||||
svm->pause_threshold_enabled ?
|
||||
svm->nested.ctl.pause_filter_thresh : 0;
|
||||
|
||||
} else if (!vmcb12_is_intercept(&svm->nested.ctl, INTERCEPT_PAUSE)) {
|
||||
/* use host values when guest doesn't use them */
|
||||
vmcb02->control.pause_filter_count = vmcb01->control.pause_filter_count;
|
||||
vmcb02->control.pause_filter_thresh = vmcb01->control.pause_filter_thresh;
|
||||
} else {
|
||||
/*
|
||||
* Intercept every PAUSE otherwise and
|
||||
* ignore both host and guest values
|
||||
*/
|
||||
vmcb02->control.pause_filter_count = 0;
|
||||
vmcb02->control.pause_filter_thresh = 0;
|
||||
}
|
||||
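To summarize the three-way PAUSE filter choice above in one place (a restatement of the code for readability, not new behavior):

    /*
     * pause_filter_count / pause_filter_thresh placed into vmcb02:
     *   - KVM's own PAUSE handling is off (kvm_pause_in_guest()):
     *       take L1's values if it advertises the feature, else 0
     *   - L1 does not intercept PAUSE:
     *       keep the host (vmcb01) values so KVM's PLE logic still works for L2
     *   - otherwise:
     *       zero both fields and intercept every PAUSE
     */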
|
||||
nested_svm_transition_tlb_flush(vcpu);
|
||||
|
||||
@ -680,14 +753,14 @@ int enter_svm_guest_mode(struct kvm_vcpu *vcpu, u64 vmcb12_gpa,
|
||||
if (ret)
|
||||
return ret;
|
||||
|
||||
if (!npt_enabled)
|
||||
vcpu->arch.mmu->inject_page_fault = svm_inject_page_fault_nested;
|
||||
|
||||
if (!from_vmrun)
|
||||
kvm_make_request(KVM_REQ_GET_NESTED_STATE_PAGES, vcpu);
|
||||
|
||||
svm_set_gif(svm, true);
|
||||
|
||||
if (kvm_vcpu_apicv_active(vcpu))
|
||||
kvm_make_request(KVM_REQ_APICV_UPDATE, vcpu);
|
||||
|
||||
return 0;
|
||||
}
|
||||
|
||||
@ -698,6 +771,7 @@ int nested_svm_vmrun(struct kvm_vcpu *vcpu)
|
||||
struct vmcb *vmcb12;
|
||||
struct kvm_host_map map;
|
||||
u64 vmcb12_gpa;
|
||||
struct vmcb *vmcb01 = svm->vmcb01.ptr;
|
||||
|
||||
if (!svm->nested.hsave_msr) {
|
||||
kvm_inject_gp(vcpu, 0);
|
||||
@ -741,14 +815,14 @@ int nested_svm_vmrun(struct kvm_vcpu *vcpu)
|
||||
* Since vmcb01 is not in use, we can use it to store some of the L1
|
||||
* state.
|
||||
*/
|
||||
svm->vmcb01.ptr->save.efer = vcpu->arch.efer;
|
||||
svm->vmcb01.ptr->save.cr0 = kvm_read_cr0(vcpu);
|
||||
svm->vmcb01.ptr->save.cr4 = vcpu->arch.cr4;
|
||||
svm->vmcb01.ptr->save.rflags = kvm_get_rflags(vcpu);
|
||||
svm->vmcb01.ptr->save.rip = kvm_rip_read(vcpu);
|
||||
vmcb01->save.efer = vcpu->arch.efer;
|
||||
vmcb01->save.cr0 = kvm_read_cr0(vcpu);
|
||||
vmcb01->save.cr4 = vcpu->arch.cr4;
|
||||
vmcb01->save.rflags = kvm_get_rflags(vcpu);
|
||||
vmcb01->save.rip = kvm_rip_read(vcpu);
|
||||
|
||||
if (!npt_enabled)
|
||||
svm->vmcb01.ptr->save.cr3 = kvm_read_cr3(vcpu);
|
||||
vmcb01->save.cr3 = kvm_read_cr3(vcpu);
|
||||
|
||||
svm->nested.nested_run_pending = 1;
|
||||
|
||||
@ -814,14 +888,12 @@ void svm_copy_vmloadsave_state(struct vmcb *to_vmcb, struct vmcb *from_vmcb)
|
||||
int nested_svm_vmexit(struct vcpu_svm *svm)
|
||||
{
|
||||
struct kvm_vcpu *vcpu = &svm->vcpu;
|
||||
struct vmcb *vmcb01 = svm->vmcb01.ptr;
|
||||
struct vmcb *vmcb02 = svm->nested.vmcb02.ptr;
|
||||
struct vmcb *vmcb12;
|
||||
struct vmcb *vmcb = svm->vmcb;
|
||||
struct kvm_host_map map;
|
||||
int rc;
|
||||
|
||||
/* Triple faults in L2 should never escape. */
|
||||
WARN_ON_ONCE(kvm_check_request(KVM_REQ_TRIPLE_FAULT, vcpu));
|
||||
|
||||
rc = kvm_vcpu_map(vcpu, gpa_to_gfn(svm->nested.vmcb12_gpa), &map);
|
||||
if (rc) {
|
||||
if (rc == -EINVAL)
|
||||
@ -843,57 +915,68 @@ int nested_svm_vmexit(struct vcpu_svm *svm)
|
||||
|
||||
/* Give the current vmcb to the guest */
|
||||
|
||||
vmcb12->save.es = vmcb->save.es;
|
||||
vmcb12->save.cs = vmcb->save.cs;
|
||||
vmcb12->save.ss = vmcb->save.ss;
|
||||
vmcb12->save.ds = vmcb->save.ds;
|
||||
vmcb12->save.gdtr = vmcb->save.gdtr;
|
||||
vmcb12->save.idtr = vmcb->save.idtr;
|
||||
vmcb12->save.es = vmcb02->save.es;
|
||||
vmcb12->save.cs = vmcb02->save.cs;
|
||||
vmcb12->save.ss = vmcb02->save.ss;
|
||||
vmcb12->save.ds = vmcb02->save.ds;
|
||||
vmcb12->save.gdtr = vmcb02->save.gdtr;
|
||||
vmcb12->save.idtr = vmcb02->save.idtr;
|
||||
vmcb12->save.efer = svm->vcpu.arch.efer;
|
||||
vmcb12->save.cr0 = kvm_read_cr0(vcpu);
|
||||
vmcb12->save.cr3 = kvm_read_cr3(vcpu);
|
||||
vmcb12->save.cr2 = vmcb->save.cr2;
|
||||
vmcb12->save.cr2 = vmcb02->save.cr2;
|
||||
vmcb12->save.cr4 = svm->vcpu.arch.cr4;
|
||||
vmcb12->save.rflags = kvm_get_rflags(vcpu);
|
||||
vmcb12->save.rip = kvm_rip_read(vcpu);
|
||||
vmcb12->save.rsp = kvm_rsp_read(vcpu);
|
||||
vmcb12->save.rax = kvm_rax_read(vcpu);
|
||||
vmcb12->save.dr7 = vmcb->save.dr7;
|
||||
vmcb12->save.dr7 = vmcb02->save.dr7;
|
||||
vmcb12->save.dr6 = svm->vcpu.arch.dr6;
|
||||
vmcb12->save.cpl = vmcb->save.cpl;
|
||||
vmcb12->save.cpl = vmcb02->save.cpl;
|
||||
|
||||
vmcb12->control.int_state = vmcb->control.int_state;
|
||||
vmcb12->control.exit_code = vmcb->control.exit_code;
|
||||
vmcb12->control.exit_code_hi = vmcb->control.exit_code_hi;
|
||||
vmcb12->control.exit_info_1 = vmcb->control.exit_info_1;
|
||||
vmcb12->control.exit_info_2 = vmcb->control.exit_info_2;
|
||||
vmcb12->control.int_state = vmcb02->control.int_state;
|
||||
vmcb12->control.exit_code = vmcb02->control.exit_code;
|
||||
vmcb12->control.exit_code_hi = vmcb02->control.exit_code_hi;
|
||||
vmcb12->control.exit_info_1 = vmcb02->control.exit_info_1;
|
||||
vmcb12->control.exit_info_2 = vmcb02->control.exit_info_2;
|
||||
|
||||
if (vmcb12->control.exit_code != SVM_EXIT_ERR)
|
||||
nested_save_pending_event_to_vmcb12(svm, vmcb12);
|
||||
|
||||
if (svm->nrips_enabled)
|
||||
vmcb12->control.next_rip = vmcb->control.next_rip;
|
||||
vmcb12->control.next_rip = vmcb02->control.next_rip;
|
||||
|
||||
vmcb12->control.int_ctl = svm->nested.ctl.int_ctl;
|
||||
vmcb12->control.tlb_ctl = svm->nested.ctl.tlb_ctl;
|
||||
vmcb12->control.event_inj = svm->nested.ctl.event_inj;
|
||||
vmcb12->control.event_inj_err = svm->nested.ctl.event_inj_err;
|
||||
|
||||
if (!kvm_pause_in_guest(vcpu->kvm) && vmcb02->control.pause_filter_count)
|
||||
vmcb01->control.pause_filter_count = vmcb02->control.pause_filter_count;
|
||||
|
||||
nested_svm_copy_common_state(svm->nested.vmcb02.ptr, svm->vmcb01.ptr);
|
||||
|
||||
svm_switch_vmcb(svm, &svm->vmcb01);
|
||||
|
||||
if (unlikely(svm->lbrv_enabled && (svm->nested.ctl.virt_ext & LBR_CTL_ENABLE_MASK))) {
|
||||
svm_copy_lbrs(vmcb12, vmcb02);
|
||||
svm_update_lbrv(vcpu);
|
||||
} else if (unlikely(vmcb01->control.virt_ext & LBR_CTL_ENABLE_MASK)) {
|
||||
svm_copy_lbrs(vmcb01, vmcb02);
|
||||
svm_update_lbrv(vcpu);
|
||||
}
|
||||
|
||||
/*
|
||||
* On vmexit the GIF is set to false and
|
||||
* no event can be injected in L1.
|
||||
*/
|
||||
svm_set_gif(svm, false);
|
||||
svm->vmcb->control.exit_int_info = 0;
|
||||
vmcb01->control.exit_int_info = 0;
|
||||
|
||||
svm->vcpu.arch.tsc_offset = svm->vcpu.arch.l1_tsc_offset;
|
||||
if (svm->vmcb->control.tsc_offset != svm->vcpu.arch.tsc_offset) {
|
||||
svm->vmcb->control.tsc_offset = svm->vcpu.arch.tsc_offset;
|
||||
vmcb_mark_dirty(svm->vmcb, VMCB_INTERCEPTS);
|
||||
if (vmcb01->control.tsc_offset != svm->vcpu.arch.tsc_offset) {
|
||||
vmcb01->control.tsc_offset = svm->vcpu.arch.tsc_offset;
|
||||
vmcb_mark_dirty(vmcb01, VMCB_INTERCEPTS);
|
||||
}
|
||||
|
||||
if (svm->tsc_ratio_msr != kvm_default_tsc_scaling_ratio) {
|
||||
@ -907,13 +990,13 @@ int nested_svm_vmexit(struct vcpu_svm *svm)
|
||||
/*
|
||||
* Restore processor state that had been saved in vmcb01
|
||||
*/
|
||||
kvm_set_rflags(vcpu, svm->vmcb->save.rflags);
|
||||
svm_set_efer(vcpu, svm->vmcb->save.efer);
|
||||
svm_set_cr0(vcpu, svm->vmcb->save.cr0 | X86_CR0_PE);
|
||||
svm_set_cr4(vcpu, svm->vmcb->save.cr4);
|
||||
kvm_rax_write(vcpu, svm->vmcb->save.rax);
|
||||
kvm_rsp_write(vcpu, svm->vmcb->save.rsp);
|
||||
kvm_rip_write(vcpu, svm->vmcb->save.rip);
|
||||
kvm_set_rflags(vcpu, vmcb01->save.rflags);
|
||||
svm_set_efer(vcpu, vmcb01->save.efer);
|
||||
svm_set_cr0(vcpu, vmcb01->save.cr0 | X86_CR0_PE);
|
||||
svm_set_cr4(vcpu, vmcb01->save.cr4);
|
||||
kvm_rax_write(vcpu, vmcb01->save.rax);
|
||||
kvm_rsp_write(vcpu, vmcb01->save.rsp);
|
||||
kvm_rip_write(vcpu, vmcb01->save.rip);
|
||||
|
||||
svm->vcpu.arch.dr7 = DR7_FIXED_1;
|
||||
kvm_update_dr7(&svm->vcpu);
|
||||
@ -931,7 +1014,7 @@ int nested_svm_vmexit(struct vcpu_svm *svm)
|
||||
|
||||
nested_svm_uninit_mmu_context(vcpu);
|
||||
|
||||
rc = nested_svm_load_cr3(vcpu, svm->vmcb->save.cr3, false, true);
|
||||
rc = nested_svm_load_cr3(vcpu, vmcb01->save.cr3, false, true);
|
||||
if (rc)
|
||||
return 1;
|
||||
|
||||
@ -949,9 +1032,16 @@ int nested_svm_vmexit(struct vcpu_svm *svm)
|
||||
* right now so that it can be accounted for before we execute
|
||||
* L1's next instruction.
|
||||
*/
|
||||
if (unlikely(svm->vmcb->save.rflags & X86_EFLAGS_TF))
|
||||
if (unlikely(vmcb01->save.rflags & X86_EFLAGS_TF))
|
||||
kvm_queue_exception(&(svm->vcpu), DB_VECTOR);
|
||||
|
||||
/*
|
||||
* Un-inhibit the AVIC right away, so that other vCPUs can start
|
||||
* to benefit from it right away.
|
||||
*/
|
||||
if (kvm_apicv_activated(vcpu->kvm))
|
||||
kvm_vcpu_update_apicv(vcpu);
|
||||
|
||||
return 0;
|
||||
}
|
||||
|
||||
@ -1162,12 +1252,13 @@ static bool nested_exit_on_exception(struct vcpu_svm *svm)
|
||||
static void nested_svm_inject_exception_vmexit(struct vcpu_svm *svm)
|
||||
{
|
||||
unsigned int nr = svm->vcpu.arch.exception.nr;
|
||||
struct vmcb *vmcb = svm->vmcb;
|
||||
|
||||
svm->vmcb->control.exit_code = SVM_EXIT_EXCP_BASE + nr;
|
||||
svm->vmcb->control.exit_code_hi = 0;
|
||||
vmcb->control.exit_code = SVM_EXIT_EXCP_BASE + nr;
|
||||
vmcb->control.exit_code_hi = 0;
|
||||
|
||||
if (svm->vcpu.arch.exception.has_error_code)
|
||||
svm->vmcb->control.exit_info_1 = svm->vcpu.arch.exception.error_code;
|
||||
vmcb->control.exit_info_1 = svm->vcpu.arch.exception.error_code;
|
||||
|
||||
/*
|
||||
* EXITINFO2 is undefined for all exception intercepts other
|
||||
@ -1175,11 +1266,11 @@ static void nested_svm_inject_exception_vmexit(struct vcpu_svm *svm)
|
||||
*/
|
||||
if (nr == PF_VECTOR) {
|
||||
if (svm->vcpu.arch.exception.nested_apf)
|
||||
svm->vmcb->control.exit_info_2 = svm->vcpu.arch.apf.nested_apf_token;
|
||||
vmcb->control.exit_info_2 = svm->vcpu.arch.apf.nested_apf_token;
|
||||
else if (svm->vcpu.arch.exception.has_payload)
|
||||
svm->vmcb->control.exit_info_2 = svm->vcpu.arch.exception.payload;
|
||||
vmcb->control.exit_info_2 = svm->vcpu.arch.exception.payload;
|
||||
else
|
||||
svm->vmcb->control.exit_info_2 = svm->vcpu.arch.cr2;
|
||||
vmcb->control.exit_info_2 = svm->vcpu.arch.cr2;
|
||||
} else if (nr == DB_VECTOR) {
|
||||
/* See inject_pending_event. */
|
||||
kvm_deliver_exception_payload(&svm->vcpu);
|
||||
@ -1567,6 +1658,7 @@ static bool svm_get_nested_state_pages(struct kvm_vcpu *vcpu)
|
||||
struct kvm_x86_nested_ops svm_nested_ops = {
|
||||
.leave_nested = svm_leave_nested,
|
||||
.check_events = svm_check_nested_events,
|
||||
.handle_page_fault_workaround = nested_svm_handle_page_fault_workaround,
|
||||
.triple_fault = nested_svm_triple_fault,
|
||||
.get_nested_state_pages = svm_get_nested_state_pages,
|
||||
.get_state = svm_get_nested_state,
|
||||
|
@@ -342,7 +342,7 @@ static void amd_pmu_reset(struct kvm_vcpu *vcpu)
}
}

-struct kvm_pmu_ops amd_pmu_ops = {
+struct kvm_pmu_ops amd_pmu_ops __initdata = {
.pmc_perf_hw_id = amd_pmc_perf_hw_id,
.pmc_is_enabled = amd_pmc_is_enabled,
.pmc_idx_to_pmc = amd_pmc_idx_to_pmc,
@@ -688,7 +688,7 @@ static int sev_launch_measure(struct kvm *kvm, struct kvm_sev_cmd *argp)
if (params.len > SEV_FW_BLOB_MAX_SIZE)
return -EINVAL;

-	blob = kmalloc(params.len, GFP_KERNEL_ACCOUNT);
+	blob = kzalloc(params.len, GFP_KERNEL_ACCOUNT);
if (!blob)
return -ENOMEM;

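The kmalloc-to-kzalloc (and GFP_KERNEL | __GFP_ZERO) conversions in this file share one motive: these buffers are later copied back to userspace, and if the SEV firmware fills fewer than params.len bytes the remainder must not leak stale kernel memory. A minimal sketch of the pattern, with the firmware call stubbed out (names are illustrative, not the exact helpers from this file):

    /* Illustrative only: the shape shared by sev_launch_measure() and friends. */
    static int copy_fw_blob_to_user(struct kvm *kvm, void __user *uaddr, u32 len)
    {
    	void *blob = kzalloc(len, GFP_KERNEL_ACCOUNT);	/* zeroed up front */
    	int ret;

    	if (!blob)
    		return -ENOMEM;

    	/*
    	 * The firmware may write fewer than len bytes into blob; because the
    	 * buffer started out zeroed, the tail that reaches userspace below is
    	 * zeros rather than old heap contents.
    	 */
    	ret = 0;				/* stand-in for the real SEV command */
    	if (copy_to_user(uaddr, blob, len))
    		ret = -EFAULT;

    	kfree(blob);
    	return ret;
    }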
@ -808,7 +808,7 @@ static int __sev_dbg_decrypt_user(struct kvm *kvm, unsigned long paddr,
|
||||
if (!IS_ALIGNED(dst_paddr, 16) ||
|
||||
!IS_ALIGNED(paddr, 16) ||
|
||||
!IS_ALIGNED(size, 16)) {
|
||||
tpage = (void *)alloc_page(GFP_KERNEL);
|
||||
tpage = (void *)alloc_page(GFP_KERNEL | __GFP_ZERO);
|
||||
if (!tpage)
|
||||
return -ENOMEM;
|
||||
|
||||
@ -1094,7 +1094,7 @@ static int sev_get_attestation_report(struct kvm *kvm, struct kvm_sev_cmd *argp)
|
||||
if (params.len > SEV_FW_BLOB_MAX_SIZE)
|
||||
return -EINVAL;
|
||||
|
||||
blob = kmalloc(params.len, GFP_KERNEL_ACCOUNT);
|
||||
blob = kzalloc(params.len, GFP_KERNEL_ACCOUNT);
|
||||
if (!blob)
|
||||
return -ENOMEM;
|
||||
|
||||
@ -1176,7 +1176,7 @@ static int sev_send_start(struct kvm *kvm, struct kvm_sev_cmd *argp)
|
||||
return -EINVAL;
|
||||
|
||||
/* allocate the memory to hold the session data blob */
|
||||
session_data = kmalloc(params.session_len, GFP_KERNEL_ACCOUNT);
|
||||
session_data = kzalloc(params.session_len, GFP_KERNEL_ACCOUNT);
|
||||
if (!session_data)
|
||||
return -ENOMEM;
|
||||
|
||||
@ -1300,11 +1300,11 @@ static int sev_send_update_data(struct kvm *kvm, struct kvm_sev_cmd *argp)
|
||||
|
||||
/* allocate memory for header and transport buffer */
|
||||
ret = -ENOMEM;
|
||||
hdr = kmalloc(params.hdr_len, GFP_KERNEL_ACCOUNT);
|
||||
hdr = kzalloc(params.hdr_len, GFP_KERNEL_ACCOUNT);
|
||||
if (!hdr)
|
||||
goto e_unpin;
|
||||
|
||||
trans_data = kmalloc(params.trans_len, GFP_KERNEL_ACCOUNT);
|
||||
trans_data = kzalloc(params.trans_len, GFP_KERNEL_ACCOUNT);
|
||||
if (!trans_data)
|
||||
goto e_free_hdr;
|
||||
|
||||
@ -2769,8 +2769,12 @@ static int sev_handle_vmgexit_msr_protocol(struct vcpu_svm *svm)
|
||||
pr_info("SEV-ES guest requested termination: %#llx:%#llx\n",
|
||||
reason_set, reason_code);
|
||||
|
||||
ret = -EINVAL;
|
||||
break;
|
||||
vcpu->run->exit_reason = KVM_EXIT_SYSTEM_EVENT;
|
||||
vcpu->run->system_event.type = KVM_SYSTEM_EVENT_SEV_TERM;
|
||||
vcpu->run->system_event.ndata = 1;
|
||||
vcpu->run->system_event.data[0] = control->ghcb_gpa;
|
||||
|
||||
return 0;
|
||||
}
|
||||
default:
|
||||
/* Error, keep GHCB MSR value as-is */
|
||||
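With the hunk above, a guest-requested termination is forwarded to userspace as a system event instead of failing KVM_RUN. A hedged sketch of how a VMM might consume it (field names follow the uapi used here; the message text is mine):

    /* Illustrative VMM-side handling after KVM_RUN returns. */
    switch (run->exit_reason) {
    case KVM_EXIT_SYSTEM_EVENT:
    	if (run->system_event.type == KVM_SYSTEM_EVENT_SEV_TERM) {
    		/* data[0] carries the GHCB GPA reported by the guest. */
    		fprintf(stderr, "SEV-ES guest requested termination, GHCB GPA %#llx\n",
    			(unsigned long long)run->system_event.data[0]);
    		exit(1);
    	}
    	break;
    }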
@ -2953,6 +2957,14 @@ void sev_es_init_vmcb(struct vcpu_svm *svm)
|
||||
set_msr_interception(vcpu, svm->msrpm, MSR_IA32_LASTBRANCHTOIP, 1, 1);
|
||||
set_msr_interception(vcpu, svm->msrpm, MSR_IA32_LASTINTFROMIP, 1, 1);
|
||||
set_msr_interception(vcpu, svm->msrpm, MSR_IA32_LASTINTTOIP, 1, 1);
|
||||
|
||||
if (boot_cpu_has(X86_FEATURE_V_TSC_AUX) &&
|
||||
(guest_cpuid_has(&svm->vcpu, X86_FEATURE_RDTSCP) ||
|
||||
guest_cpuid_has(&svm->vcpu, X86_FEATURE_RDPID))) {
|
||||
set_msr_interception(vcpu, svm->msrpm, MSR_TSC_AUX, 1, 1);
|
||||
if (guest_cpuid_has(&svm->vcpu, X86_FEATURE_RDTSCP))
|
||||
svm_clr_intercept(svm, INTERCEPT_RDTSCP);
|
||||
}
|
||||
}
|
||||
|
||||
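The new block above lets an SEV-ES guest read IA32_TSC_AUX without exiting when the CPU supports V_TSC_AUX: the MSR intercept is dropped, and RDTSCP is no longer intercepted either. From the guest's point of view it is just the usual RDTSCP flow, e.g.:

    #include <x86intrin.h>

    /* Guest-side view: read the TSC and the per-CPU cookie in IA32_TSC_AUX. */
    static unsigned long long read_tsc_and_aux(unsigned int *aux)
    {
    	return __rdtscp(aux);	/* no exit to the hypervisor once V_TSC_AUX is in use */
    }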
void sev_es_vcpu_reset(struct vcpu_svm *svm)
|
||||
|
@ -62,8 +62,6 @@ MODULE_DEVICE_TABLE(x86cpu, svm_cpu_id);
|
||||
#define SEG_TYPE_LDT 2
|
||||
#define SEG_TYPE_BUSY_TSS16 3
|
||||
|
||||
#define DEBUGCTL_RESERVED_BITS (~(0x3fULL))
|
||||
|
||||
static bool erratum_383_found __read_mostly;
|
||||
|
||||
u32 msrpm_offsets[MSRPM_OFFSETS] __read_mostly;
|
||||
@ -101,6 +99,7 @@ static const struct svm_direct_access_msrs {
|
||||
{ .index = MSR_EFER, .always = false },
|
||||
{ .index = MSR_IA32_CR_PAT, .always = false },
|
||||
{ .index = MSR_AMD64_SEV_ES_GHCB, .always = true },
|
||||
{ .index = MSR_TSC_AUX, .always = false },
|
||||
{ .index = MSR_INVALID, .always = false },
|
||||
};
|
||||
|
||||
@ -172,7 +171,7 @@ static int vls = true;
|
||||
module_param(vls, int, 0444);
|
||||
|
||||
/* enable/disable Virtual GIF */
|
||||
static int vgif = true;
|
||||
int vgif = true;
|
||||
module_param(vgif, int, 0444);
|
||||
|
||||
/* enable/disable LBR virtualization */
|
||||
@ -189,6 +188,9 @@ module_param(tsc_scaling, int, 0444);
|
||||
static bool avic;
|
||||
module_param(avic, bool, 0444);
|
||||
|
||||
static bool force_avic;
|
||||
module_param_unsafe(force_avic, bool, 0444);
|
||||
|
||||
bool __read_mostly dump_invalid_vmcb;
|
||||
module_param(dump_invalid_vmcb, bool, 0644);
|
||||
|
||||
@ -790,6 +792,17 @@ static void init_msrpm_offsets(void)
|
||||
}
|
||||
}
|
||||
|
||||
void svm_copy_lbrs(struct vmcb *to_vmcb, struct vmcb *from_vmcb)
|
||||
{
|
||||
to_vmcb->save.dbgctl = from_vmcb->save.dbgctl;
|
||||
to_vmcb->save.br_from = from_vmcb->save.br_from;
|
||||
to_vmcb->save.br_to = from_vmcb->save.br_to;
|
||||
to_vmcb->save.last_excp_from = from_vmcb->save.last_excp_from;
|
||||
to_vmcb->save.last_excp_to = from_vmcb->save.last_excp_to;
|
||||
|
||||
vmcb_mark_dirty(to_vmcb, VMCB_LBR);
|
||||
}
|
||||
|
||||
static void svm_enable_lbrv(struct kvm_vcpu *vcpu)
|
||||
{
|
||||
struct vcpu_svm *svm = to_svm(vcpu);
|
||||
@ -799,6 +812,10 @@ static void svm_enable_lbrv(struct kvm_vcpu *vcpu)
|
||||
set_msr_interception(vcpu, svm->msrpm, MSR_IA32_LASTBRANCHTOIP, 1, 1);
|
||||
set_msr_interception(vcpu, svm->msrpm, MSR_IA32_LASTINTFROMIP, 1, 1);
|
||||
set_msr_interception(vcpu, svm->msrpm, MSR_IA32_LASTINTTOIP, 1, 1);
|
||||
|
||||
/* Move the LBR msrs to the vmcb02 so that the guest can see them. */
|
||||
if (is_guest_mode(vcpu))
|
||||
svm_copy_lbrs(svm->vmcb, svm->vmcb01.ptr);
|
||||
}
|
||||
|
||||
static void svm_disable_lbrv(struct kvm_vcpu *vcpu)
|
||||
@ -810,6 +827,67 @@ static void svm_disable_lbrv(struct kvm_vcpu *vcpu)
|
||||
set_msr_interception(vcpu, svm->msrpm, MSR_IA32_LASTBRANCHTOIP, 0, 0);
|
||||
set_msr_interception(vcpu, svm->msrpm, MSR_IA32_LASTINTFROMIP, 0, 0);
|
||||
set_msr_interception(vcpu, svm->msrpm, MSR_IA32_LASTINTTOIP, 0, 0);
|
||||
|
||||
/*
|
||||
* Move the LBR msrs back to the vmcb01 to avoid copying them
|
||||
* on nested guest entries.
|
||||
*/
|
||||
if (is_guest_mode(vcpu))
|
||||
svm_copy_lbrs(svm->vmcb01.ptr, svm->vmcb);
|
||||
}
|
||||
|
||||
static int svm_get_lbr_msr(struct vcpu_svm *svm, u32 index)
|
||||
{
|
||||
/*
|
||||
* If the LBR virtualization is disabled, the LBR msrs are always
|
||||
* kept in the vmcb01 to avoid copying them on nested guest entries.
|
||||
*
|
||||
* If nested, and the LBR virtualization is enabled/disabled, the msrs
|
||||
* are moved between the vmcb01 and vmcb02 as needed.
|
||||
*/
|
||||
struct vmcb *vmcb =
|
||||
(svm->vmcb->control.virt_ext & LBR_CTL_ENABLE_MASK) ?
|
||||
svm->vmcb : svm->vmcb01.ptr;
|
||||
|
||||
switch (index) {
|
||||
case MSR_IA32_DEBUGCTLMSR:
|
||||
return vmcb->save.dbgctl;
|
||||
case MSR_IA32_LASTBRANCHFROMIP:
|
||||
return vmcb->save.br_from;
|
||||
case MSR_IA32_LASTBRANCHTOIP:
|
||||
return vmcb->save.br_to;
|
||||
case MSR_IA32_LASTINTFROMIP:
|
||||
return vmcb->save.last_excp_from;
|
||||
case MSR_IA32_LASTINTTOIP:
|
||||
return vmcb->save.last_excp_to;
|
||||
default:
|
||||
KVM_BUG(false, svm->vcpu.kvm,
|
||||
"%s: Unknown MSR 0x%x", __func__, index);
|
||||
return 0;
|
||||
}
|
||||
}
|
||||
|
||||
void svm_update_lbrv(struct kvm_vcpu *vcpu)
|
||||
{
|
||||
struct vcpu_svm *svm = to_svm(vcpu);
|
||||
|
||||
bool enable_lbrv = svm_get_lbr_msr(svm, MSR_IA32_DEBUGCTLMSR) &
|
||||
DEBUGCTLMSR_LBR;
|
||||
|
||||
bool current_enable_lbrv = !!(svm->vmcb->control.virt_ext &
|
||||
LBR_CTL_ENABLE_MASK);
|
||||
|
||||
if (unlikely(is_guest_mode(vcpu) && svm->lbrv_enabled))
|
||||
if (unlikely(svm->nested.ctl.virt_ext & LBR_CTL_ENABLE_MASK))
|
||||
enable_lbrv = true;
|
||||
|
||||
if (enable_lbrv == current_enable_lbrv)
|
||||
return;
|
||||
|
||||
if (enable_lbrv)
|
||||
svm_enable_lbrv(vcpu);
|
||||
else
|
||||
svm_disable_lbrv(vcpu);
|
||||
}
|
||||
|
||||
void disable_nmi_singlestep(struct vcpu_svm *svm)
|
||||
@ -831,6 +909,9 @@ static void grow_ple_window(struct kvm_vcpu *vcpu)
|
||||
struct vmcb_control_area *control = &svm->vmcb->control;
|
||||
int old = control->pause_filter_count;
|
||||
|
||||
if (kvm_pause_in_guest(vcpu->kvm) || !old)
|
||||
return;
|
||||
|
||||
control->pause_filter_count = __grow_ple_window(old,
|
||||
pause_filter_count,
|
||||
pause_filter_count_grow,
|
||||
@ -849,6 +930,9 @@ static void shrink_ple_window(struct kvm_vcpu *vcpu)
|
||||
struct vmcb_control_area *control = &svm->vmcb->control;
|
||||
int old = control->pause_filter_count;
|
||||
|
||||
if (kvm_pause_in_guest(vcpu->kvm) || !old)
|
||||
return;
|
||||
|
||||
control->pause_filter_count =
|
||||
__shrink_ple_window(old,
|
||||
pause_filter_count,
|
||||
@ -960,6 +1044,8 @@ static inline void init_vmcb_after_set_cpuid(struct kvm_vcpu *vcpu)
|
||||
|
||||
set_msr_interception(vcpu, svm->msrpm, MSR_IA32_SYSENTER_EIP, 0, 0);
|
||||
set_msr_interception(vcpu, svm->msrpm, MSR_IA32_SYSENTER_ESP, 0, 0);
|
||||
|
||||
svm->v_vmload_vmsave_enabled = false;
|
||||
} else {
|
||||
/*
|
||||
* If hardware supports Virtual VMLOAD VMSAVE then enable it
|
||||
@ -979,8 +1065,9 @@ static inline void init_vmcb_after_set_cpuid(struct kvm_vcpu *vcpu)
|
||||
static void init_vmcb(struct kvm_vcpu *vcpu)
|
||||
{
|
||||
struct vcpu_svm *svm = to_svm(vcpu);
|
||||
struct vmcb_control_area *control = &svm->vmcb->control;
|
||||
struct vmcb_save_area *save = &svm->vmcb->save;
|
||||
struct vmcb *vmcb = svm->vmcb01.ptr;
|
||||
struct vmcb_control_area *control = &vmcb->control;
|
||||
struct vmcb_save_area *save = &vmcb->save;
|
||||
|
||||
svm_set_intercept(svm, INTERCEPT_CR0_READ);
|
||||
svm_set_intercept(svm, INTERCEPT_CR3_READ);
|
||||
@ -1104,7 +1191,7 @@ static void init_vmcb(struct kvm_vcpu *vcpu)
|
||||
set_msr_interception(vcpu, svm->msrpm, MSR_IA32_SPEC_CTRL, 1, 1);
|
||||
|
||||
if (kvm_vcpu_apicv_active(vcpu))
|
||||
avic_init_vmcb(svm);
|
||||
avic_init_vmcb(svm, vmcb);
|
||||
|
||||
if (vgif) {
|
||||
svm_clr_intercept(svm, INTERCEPT_STGI);
|
||||
@ -1122,10 +1209,10 @@ static void init_vmcb(struct kvm_vcpu *vcpu)
|
||||
}
|
||||
}
|
||||
|
||||
svm_hv_init_vmcb(svm->vmcb);
|
||||
svm_hv_init_vmcb(vmcb);
|
||||
init_vmcb_after_set_cpuid(vcpu);
|
||||
|
||||
vmcb_mark_all_dirty(svm->vmcb);
|
||||
vmcb_mark_all_dirty(vmcb);
|
||||
|
||||
enable_gif(svm);
|
||||
}
|
||||
@ -1380,7 +1467,7 @@ static void svm_set_vintr(struct vcpu_svm *svm)
|
||||
/*
|
||||
* The following fields are ignored when AVIC is enabled
|
||||
*/
|
||||
WARN_ON(kvm_apicv_activated(svm->vcpu.kvm));
|
||||
WARN_ON(kvm_vcpu_apicv_activated(&svm->vcpu));
|
||||
|
||||
svm_set_intercept(svm, INTERCEPT_VINTR);
|
||||
|
||||
@ -2142,7 +2229,7 @@ void svm_set_gif(struct vcpu_svm *svm, bool value)
|
||||
* Likewise, clear the VINTR intercept, we will set it
|
||||
* again while processing KVM_REQ_EVENT if needed.
|
||||
*/
|
||||
if (vgif_enabled(svm))
|
||||
if (vgif)
|
||||
svm_clr_intercept(svm, INTERCEPT_STGI);
|
||||
if (svm_is_intercept(svm, INTERCEPT_VINTR))
|
||||
svm_clear_vintr(svm);
|
||||
@ -2160,7 +2247,7 @@ void svm_set_gif(struct vcpu_svm *svm, bool value)
|
||||
* in use, we still rely on the VINTR intercept (rather than
|
||||
* STGI) to detect an open interrupt window.
|
||||
*/
|
||||
if (!vgif_enabled(svm))
|
||||
if (!vgif)
|
||||
svm_clear_vintr(svm);
|
||||
}
|
||||
}
|
||||
@ -2575,25 +2662,12 @@ static int svm_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
|
||||
case MSR_TSC_AUX:
|
||||
msr_info->data = svm->tsc_aux;
|
||||
break;
|
||||
/*
|
||||
* Nobody will change the following 5 values in the VMCB so we can
|
||||
* safely return them on rdmsr. They will always be 0 until LBRV is
|
||||
* implemented.
|
||||
*/
|
||||
case MSR_IA32_DEBUGCTLMSR:
|
||||
msr_info->data = svm->vmcb->save.dbgctl;
|
||||
break;
|
||||
case MSR_IA32_LASTBRANCHFROMIP:
|
||||
msr_info->data = svm->vmcb->save.br_from;
|
||||
break;
|
||||
case MSR_IA32_LASTBRANCHTOIP:
|
||||
msr_info->data = svm->vmcb->save.br_to;
|
||||
break;
|
||||
case MSR_IA32_LASTINTFROMIP:
|
||||
msr_info->data = svm->vmcb->save.last_excp_from;
|
||||
break;
|
||||
case MSR_IA32_LASTINTTOIP:
|
||||
msr_info->data = svm->vmcb->save.last_excp_to;
|
||||
msr_info->data = svm_get_lbr_msr(svm, msr_info->index);
|
||||
break;
|
||||
case MSR_VM_HSAVE_PA:
|
||||
msr_info->data = svm->nested.hsave_msr;
|
||||
@ -2839,12 +2913,13 @@ static int svm_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr)
|
||||
if (data & DEBUGCTL_RESERVED_BITS)
|
||||
return 1;
|
||||
|
||||
svm->vmcb->save.dbgctl = data;
|
||||
vmcb_mark_dirty(svm->vmcb, VMCB_LBR);
|
||||
if (data & (1ULL<<0))
|
||||
svm_enable_lbrv(vcpu);
|
||||
if (svm->vmcb->control.virt_ext & LBR_CTL_ENABLE_MASK)
|
||||
svm->vmcb->save.dbgctl = data;
|
||||
else
|
||||
svm_disable_lbrv(vcpu);
|
||||
svm->vmcb01.ptr->save.dbgctl = data;
|
||||
|
||||
svm_update_lbrv(vcpu);
|
||||
|
||||
break;
|
||||
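The rewritten MSR_IA32_DEBUGCTLMSR handler above now just records the value in whichever VMCB currently owns the LBR state and lets svm_update_lbrv() decide whether LBR virtualization needs to be switched on or off. From the guest side the trigger is simply bit 0 of DEBUGCTL (DEBUGCTLMSR_LBR), e.g.:

    /* Guest-side sketch: turning last-branch recording on and back off. */
    wrmsrl(MSR_IA32_DEBUGCTLMSR, DEBUGCTLMSR_LBR);	/* KVM enables LBR virtualization */
    /* ... sample MSR_IA32_LASTBRANCHFROMIP / MSR_IA32_LASTBRANCHTOIP ... */
    wrmsrl(MSR_IA32_DEBUGCTLMSR, 0);		/* KVM disables it again */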
case MSR_VM_HSAVE_PA:
|
||||
/*
|
||||
@ -2901,9 +2976,16 @@ static int interrupt_window_interception(struct kvm_vcpu *vcpu)
|
||||
svm_clear_vintr(to_svm(vcpu));
|
||||
|
||||
/*
|
||||
* For AVIC, the only reason to end up here is ExtINTs.
|
||||
* If not running nested, for AVIC, the only reason to end up here is ExtINTs.
|
||||
* In this case AVIC was temporarily disabled for
|
||||
* requesting the IRQ window and we have to re-enable it.
|
||||
*
|
||||
* If running nested, still remove the VM wide AVIC inhibit to
|
||||
* support the case in which the interrupt window was requested when the
|
||||
* vCPU was not running nested.
|
||||
|
||||
* All vCPUs that still run nested will keep their
|
||||
* AVIC inhibited due to the per-vCPU AVIC inhibition.
|
||||
*/
|
||||
kvm_clear_apicv_inhibit(vcpu->kvm, APICV_INHIBIT_REASON_IRQWIN);
|
||||
|
||||
@ -2914,7 +2996,6 @@ static int interrupt_window_interception(struct kvm_vcpu *vcpu)
|
||||
static int pause_interception(struct kvm_vcpu *vcpu)
|
||||
{
|
||||
bool in_kernel;
|
||||
|
||||
/*
|
||||
* CPL is not made available for an SEV-ES guest, therefore
|
||||
* vcpu->arch.preempted_in_kernel can never be true. Just
|
||||
@ -2922,8 +3003,7 @@ static int pause_interception(struct kvm_vcpu *vcpu)
|
||||
*/
|
||||
in_kernel = !sev_es_guest(vcpu->kvm) && svm_get_cpl(vcpu) == 0;
|
||||
|
||||
if (!kvm_pause_in_guest(vcpu->kvm))
|
||||
grow_ple_window(vcpu);
|
||||
grow_ple_window(vcpu);
|
||||
|
||||
kvm_vcpu_on_spin(vcpu, in_kernel);
|
||||
return kvm_skip_emulated_instruction(vcpu);
|
||||
@ -3496,14 +3576,20 @@ static void svm_enable_irq_window(struct kvm_vcpu *vcpu)
|
||||
* enabled, the STGI interception will not occur. Enable the irq
|
||||
* window under the assumption that the hardware will set the GIF.
|
||||
*/
|
||||
if (vgif_enabled(svm) || gif_set(svm)) {
|
||||
if (vgif || gif_set(svm)) {
|
||||
/*
|
||||
* IRQ window is not needed when AVIC is enabled,
|
||||
* unless we have pending ExtINT since it cannot be injected
|
||||
* via AVIC. In such case, we need to temporarily disable AVIC,
|
||||
* via AVIC. In such case, KVM needs to temporarily disable AVIC,
|
||||
* and fallback to injecting IRQ via V_IRQ.
|
||||
*
|
||||
* If running nested, AVIC is already locally inhibited
|
||||
* on this vCPU, therefore there is no need to request
|
||||
* the VM wide AVIC inhibition.
|
||||
*/
|
||||
kvm_set_apicv_inhibit(vcpu->kvm, APICV_INHIBIT_REASON_IRQWIN);
|
||||
if (!is_guest_mode(vcpu))
|
||||
kvm_set_apicv_inhibit(vcpu->kvm, APICV_INHIBIT_REASON_IRQWIN);
|
||||
|
||||
svm_set_vintr(svm);
|
||||
}
|
||||
}
|
||||
@ -3516,7 +3602,7 @@ static void svm_enable_nmi_window(struct kvm_vcpu *vcpu)
|
||||
return; /* IRET will cause a vm exit */
|
||||
|
||||
if (!gif_set(svm)) {
|
||||
if (vgif_enabled(svm))
|
||||
if (vgif)
|
||||
svm_set_intercept(svm, INTERCEPT_STGI);
|
||||
return; /* STGI will cause a vm exit */
|
||||
}
|
||||
@ -3865,7 +3951,7 @@ static void svm_load_mmu_pgd(struct kvm_vcpu *vcpu, hpa_t root_hpa,
|
||||
hv_track_root_tdp(vcpu, root_hpa);
|
||||
|
||||
cr3 = vcpu->arch.cr3;
|
||||
} else if (vcpu->arch.mmu->shadow_root_level >= PT64_ROOT_4LEVEL) {
|
||||
} else if (vcpu->arch.mmu->root_role.level >= PT64_ROOT_4LEVEL) {
|
||||
cr3 = __sme_set(root_hpa) | kvm_get_active_pcid(vcpu);
|
||||
} else {
|
||||
/* PCID in the guest should be impossible with a 32-bit MMU. */
|
||||
@ -3946,6 +4032,17 @@ static void svm_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
|
||||
guest_cpuid_has(vcpu, X86_FEATURE_NRIPS);
|
||||
|
||||
svm->tsc_scaling_enabled = tsc_scaling && guest_cpuid_has(vcpu, X86_FEATURE_TSCRATEMSR);
|
||||
svm->lbrv_enabled = lbrv && guest_cpuid_has(vcpu, X86_FEATURE_LBRV);
|
||||
|
||||
svm->v_vmload_vmsave_enabled = vls && guest_cpuid_has(vcpu, X86_FEATURE_V_VMSAVE_VMLOAD);
|
||||
|
||||
svm->pause_filter_enabled = kvm_cpu_cap_has(X86_FEATURE_PAUSEFILTER) &&
|
||||
guest_cpuid_has(vcpu, X86_FEATURE_PAUSEFILTER);
|
||||
|
||||
svm->pause_threshold_enabled = kvm_cpu_cap_has(X86_FEATURE_PFTHRESHOLD) &&
|
||||
guest_cpuid_has(vcpu, X86_FEATURE_PFTHRESHOLD);
|
||||
|
||||
svm->vgif_enabled = vgif && guest_cpuid_has(vcpu, X86_FEATURE_VGIF);
|
||||
|
||||
svm_recalc_instruction_intercepts(vcpu, svm);
|
||||
|
||||
@ -3963,13 +4060,6 @@ static void svm_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
|
||||
*/
|
||||
if (guest_cpuid_has(vcpu, X86_FEATURE_X2APIC))
|
||||
kvm_set_apicv_inhibit(kvm, APICV_INHIBIT_REASON_X2APIC);
|
||||
|
||||
/*
|
||||
* Currently, AVIC does not work with nested virtualization.
|
||||
* So, we disable AVIC when cpuid for SVM is set in the L1 guest.
|
||||
*/
|
||||
if (nested && guest_cpuid_has(vcpu, X86_FEATURE_SVM))
|
||||
kvm_set_apicv_inhibit(kvm, APICV_INHIBIT_REASON_NESTED);
|
||||
}
|
||||
init_vmcb_after_set_cpuid(vcpu);
|
||||
}
|
||||
@ -4224,7 +4314,7 @@ static int svm_enter_smm(struct kvm_vcpu *vcpu, char *smstate)
|
||||
svm->vmcb->save.rsp = vcpu->arch.regs[VCPU_REGS_RSP];
|
||||
svm->vmcb->save.rip = vcpu->arch.regs[VCPU_REGS_RIP];
|
||||
|
||||
ret = nested_svm_vmexit(svm);
|
||||
ret = nested_svm_simple_vmexit(svm, SVM_EXIT_SW);
|
||||
if (ret)
|
||||
return ret;
|
||||
|
||||
@ -4321,7 +4411,7 @@ static void svm_enable_smi_window(struct kvm_vcpu *vcpu)
|
||||
struct vcpu_svm *svm = to_svm(vcpu);
|
||||
|
||||
if (!gif_set(svm)) {
|
||||
if (vgif_enabled(svm))
|
||||
if (vgif)
|
||||
svm_set_intercept(svm, INTERCEPT_STGI);
|
||||
/* STGI will cause a vm exit */
|
||||
} else {
|
||||
@ -4605,7 +4695,6 @@ static struct kvm_x86_ops svm_x86_ops __initdata = {
|
||||
|
||||
.sched_in = svm_sched_in,
|
||||
|
||||
.pmu_ops = &amd_pmu_ops,
|
||||
.nested_ops = &svm_nested_ops,
|
||||
|
||||
.deliver_interrupt = svm_deliver_interrupt,
|
||||
@ -4633,6 +4722,7 @@ static struct kvm_x86_ops svm_x86_ops __initdata = {
|
||||
.complete_emulated_msr = svm_complete_emulated_msr,
|
||||
|
||||
.vcpu_deliver_sipi_vector = svm_vcpu_deliver_sipi_vector,
|
||||
.vcpu_get_apicv_inhibit_reasons = avic_vcpu_get_apicv_inhibit_reasons,
|
||||
};
|
||||
|
||||
/*
|
||||
@ -4696,6 +4786,20 @@ static __init void svm_set_cpu_caps(void)
|
||||
if (tsc_scaling)
|
||||
kvm_cpu_cap_set(X86_FEATURE_TSCRATEMSR);
|
||||
|
||||
if (vls)
|
||||
kvm_cpu_cap_set(X86_FEATURE_V_VMSAVE_VMLOAD);
|
||||
if (lbrv)
|
||||
kvm_cpu_cap_set(X86_FEATURE_LBRV);
|
||||
|
||||
if (boot_cpu_has(X86_FEATURE_PAUSEFILTER))
|
||||
kvm_cpu_cap_set(X86_FEATURE_PAUSEFILTER);
|
||||
|
||||
if (boot_cpu_has(X86_FEATURE_PFTHRESHOLD))
|
||||
kvm_cpu_cap_set(X86_FEATURE_PFTHRESHOLD);
|
||||
|
||||
if (vgif)
|
||||
kvm_cpu_cap_set(X86_FEATURE_VGIF);
|
||||
|
||||
/* Nested VM can receive #VMEXIT instead of triggering #GP */
|
||||
kvm_cpu_cap_set(X86_FEATURE_SVME_ADDR_CHK);
|
||||
}
|
||||
@ -4789,6 +4893,9 @@ static __init int svm_hardware_setup(void)
|
||||
get_npt_level(), PG_LEVEL_1G);
|
||||
pr_info("kvm: Nested Paging %sabled\n", npt_enabled ? "en" : "dis");
|
||||
|
||||
/* Setup shadow_me_value and shadow_me_mask */
|
||||
kvm_mmu_set_me_spte_mask(sme_me_mask, sme_me_mask);
|
||||
|
||||
/* Note, SEV setup consumes npt_enabled. */
|
||||
sev_hardware_setup();
|
||||
|
||||
@ -4807,15 +4914,20 @@ static __init int svm_hardware_setup(void)
|
||||
nrips = false;
|
||||
}
|
||||
|
||||
enable_apicv = avic = avic && npt_enabled && boot_cpu_has(X86_FEATURE_AVIC);
|
||||
enable_apicv = avic = avic && npt_enabled && (boot_cpu_has(X86_FEATURE_AVIC) || force_avic);
|
||||
|
||||
if (enable_apicv) {
|
||||
pr_info("AVIC enabled\n");
|
||||
if (!boot_cpu_has(X86_FEATURE_AVIC)) {
|
||||
pr_warn("AVIC is not supported in CPUID but force enabled");
|
||||
pr_warn("Your system might crash and burn");
|
||||
} else
|
||||
pr_info("AVIC enabled\n");
|
||||
|
||||
amd_iommu_register_ga_log_notifier(&avic_ga_log_notifier);
|
||||
} else {
|
||||
svm_x86_ops.vcpu_blocking = NULL;
|
||||
svm_x86_ops.vcpu_unblocking = NULL;
|
||||
svm_x86_ops.vcpu_get_apicv_inhibit_reasons = NULL;
|
||||
}
|
||||
|
||||
if (vls) {
|
||||
@ -4880,6 +4992,7 @@ static struct kvm_x86_init_ops svm_init_ops __initdata = {
|
||||
.check_processor_compatibility = svm_check_processor_compat,
|
||||
|
||||
.runtime_ops = &svm_x86_ops,
|
||||
.pmu_ops = &amd_pmu_ops,
|
||||
};
|
||||
|
||||
static int __init svm_init(void)
|
||||
|
@ -29,10 +29,11 @@
|
||||
#define IOPM_SIZE PAGE_SIZE * 3
|
||||
#define MSRPM_SIZE PAGE_SIZE * 2
|
||||
|
||||
#define MAX_DIRECT_ACCESS_MSRS 20
|
||||
#define MAX_DIRECT_ACCESS_MSRS 21
|
||||
#define MSRPM_OFFSETS 16
|
||||
extern u32 msrpm_offsets[MSRPM_OFFSETS] __read_mostly;
|
||||
extern bool npt_enabled;
|
||||
extern int vgif;
|
||||
extern bool intercept_smi;
|
||||
|
||||
/*
|
||||
@ -231,9 +232,14 @@ struct vcpu_svm {
|
||||
unsigned int3_injected;
|
||||
unsigned long int3_rip;
|
||||
|
||||
/* cached guest cpuid flags for faster access */
|
||||
/* optional nested SVM features that are enabled for this guest */
|
||||
bool nrips_enabled : 1;
|
||||
bool tsc_scaling_enabled : 1;
|
||||
bool v_vmload_vmsave_enabled : 1;
|
||||
bool lbrv_enabled : 1;
|
||||
bool pause_filter_enabled : 1;
|
||||
bool pause_threshold_enabled : 1;
|
||||
bool vgif_enabled : 1;
|
||||
|
||||
u32 ldr_reg;
|
||||
u32 dfr_reg;
|
||||
@ -452,44 +458,70 @@ static inline bool svm_is_intercept(struct vcpu_svm *svm, int bit)
|
||||
return vmcb_is_intercept(&svm->vmcb->control, bit);
|
||||
}
|
||||
|
||||
static inline bool vgif_enabled(struct vcpu_svm *svm)
|
||||
static inline bool nested_vgif_enabled(struct vcpu_svm *svm)
|
||||
{
|
||||
return !!(svm->vmcb->control.int_ctl & V_GIF_ENABLE_MASK);
|
||||
return svm->vgif_enabled && (svm->nested.ctl.int_ctl & V_GIF_ENABLE_MASK);
|
||||
}
|
||||
|
||||
static inline struct vmcb *get_vgif_vmcb(struct vcpu_svm *svm)
|
||||
{
|
||||
if (!vgif)
|
||||
return NULL;
|
||||
|
||||
if (is_guest_mode(&svm->vcpu) && !nested_vgif_enabled(svm))
|
||||
return svm->nested.vmcb02.ptr;
|
||||
else
|
||||
return svm->vmcb01.ptr;
|
||||
}
|
||||
|
||||
static inline void enable_gif(struct vcpu_svm *svm)
|
||||
{
|
||||
if (vgif_enabled(svm))
|
||||
svm->vmcb->control.int_ctl |= V_GIF_MASK;
|
||||
struct vmcb *vmcb = get_vgif_vmcb(svm);
|
||||
|
||||
if (vmcb)
|
||||
vmcb->control.int_ctl |= V_GIF_MASK;
|
||||
else
|
||||
svm->vcpu.arch.hflags |= HF_GIF_MASK;
|
||||
}
|
||||
|
||||
static inline void disable_gif(struct vcpu_svm *svm)
|
||||
{
|
||||
if (vgif_enabled(svm))
|
||||
svm->vmcb->control.int_ctl &= ~V_GIF_MASK;
|
||||
struct vmcb *vmcb = get_vgif_vmcb(svm);
|
||||
|
||||
if (vmcb)
|
||||
vmcb->control.int_ctl &= ~V_GIF_MASK;
|
||||
else
|
||||
svm->vcpu.arch.hflags &= ~HF_GIF_MASK;
|
||||
}
|
||||
|
||||
static inline bool gif_set(struct vcpu_svm *svm)
|
||||
{
|
||||
if (vgif_enabled(svm))
|
||||
return !!(svm->vmcb->control.int_ctl & V_GIF_MASK);
|
||||
struct vmcb *vmcb = get_vgif_vmcb(svm);
|
||||
|
||||
if (vmcb)
|
||||
return !!(vmcb->control.int_ctl & V_GIF_MASK);
|
||||
else
|
||||
return !!(svm->vcpu.arch.hflags & HF_GIF_MASK);
|
||||
}
|
||||
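Stated once as a comment, the net effect of get_vgif_vmcb() on the helpers above (this is a restatement of the code shown here, not additional behavior):

    /*
     * Which VMCB carries the V_GIF bit that enable_gif()/disable_gif()/gif_set()
     * operate on, per get_vgif_vmcb():
     *   - vgif module param off:                    none; fall back to HF_GIF_MASK
     *   - L2 running, nested vGIF not in use:       vmcb02
     *   - otherwise (L1, or L2 with nested vGIF):   vmcb01
     */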
|
||||
static inline bool nested_npt_enabled(struct vcpu_svm *svm)
|
||||
{
|
||||
return svm->nested.ctl.nested_ctl & SVM_NESTED_CTL_NP_ENABLE;
|
||||
}
|
||||
|
||||
/* svm.c */
|
||||
#define MSR_INVALID 0xffffffffU
|
||||
|
||||
#define DEBUGCTL_RESERVED_BITS (~(0x3fULL))
|
||||
|
||||
extern bool dump_invalid_vmcb;
|
||||
|
||||
u32 svm_msrpm_offset(u32 msr);
|
||||
u32 *svm_vcpu_alloc_msrpm(void);
|
||||
void svm_vcpu_init_msrpm(struct kvm_vcpu *vcpu, u32 *msrpm);
|
||||
void svm_vcpu_free_msrpm(u32 *msrpm);
|
||||
void svm_copy_lbrs(struct vmcb *to_vmcb, struct vmcb *from_vmcb);
|
||||
void svm_update_lbrv(struct kvm_vcpu *vcpu);
|
||||
|
||||
int svm_set_efer(struct kvm_vcpu *vcpu, u64 efer);
|
||||
void svm_set_cr0(struct kvm_vcpu *vcpu, unsigned long cr0);
|
||||
@ -574,7 +606,7 @@ extern struct kvm_x86_nested_ops svm_nested_ops;
|
||||
int avic_ga_log_notifier(u32 ga_tag);
|
||||
void avic_vm_destroy(struct kvm *kvm);
|
||||
int avic_vm_init(struct kvm *kvm);
|
||||
void avic_init_vmcb(struct vcpu_svm *svm);
|
||||
void avic_init_vmcb(struct vcpu_svm *svm, struct vmcb *vmcb);
|
||||
int avic_incomplete_ipi_interception(struct kvm_vcpu *vcpu);
|
||||
int avic_unaccelerated_access_interception(struct kvm_vcpu *vcpu);
|
||||
int avic_init_vcpu(struct vcpu_svm *svm);
|
||||
@ -592,6 +624,7 @@ int avic_pi_update_irte(struct kvm *kvm, unsigned int host_irq,
|
||||
void avic_vcpu_blocking(struct kvm_vcpu *vcpu);
|
||||
void avic_vcpu_unblocking(struct kvm_vcpu *vcpu);
|
||||
void avic_ring_doorbell(struct kvm_vcpu *vcpu);
|
||||
unsigned long avic_vcpu_get_apicv_inhibit_reasons(struct kvm_vcpu *vcpu);
|
||||
|
||||
/* sev.c */
|
||||
|
||||
|
@@ -1459,6 +1459,26 @@ TRACE_EVENT(kvm_avic_ga_log,
__entry->vmid, __entry->vcpuid)
);

+TRACE_EVENT(kvm_avic_kick_vcpu_slowpath,
+	TP_PROTO(u32 icrh, u32 icrl, u32 index),
+	TP_ARGS(icrh, icrl, index),
+
+	TP_STRUCT__entry(
+		__field(u32, icrh)
+		__field(u32, icrl)
+		__field(u32, index)
+	),
+
+	TP_fast_assign(
+		__entry->icrh = icrh;
+		__entry->icrl = icrl;
+		__entry->index = index;
+	),
+
+	TP_printk("icrh:icrl=%#08x:%08x, index=%u",
+		  __entry->icrh, __entry->icrl, __entry->index)
+);
+
 TRACE_EVENT(kvm_hv_timer_state,
 	    TP_PROTO(unsigned int vcpu_id, unsigned int hv_timer_in_use),
 	    TP_ARGS(vcpu_id, hv_timer_in_use),
@@ -476,24 +476,23 @@ static int nested_vmx_check_exception(struct kvm_vcpu *vcpu, unsigned long *exit
 	return 0;
 }
 
 
-static void vmx_inject_page_fault_nested(struct kvm_vcpu *vcpu,
-		struct x86_exception *fault)
+static bool nested_vmx_handle_page_fault_workaround(struct kvm_vcpu *vcpu,
+						    struct x86_exception *fault)
 {
 	struct vmcs12 *vmcs12 = get_vmcs12(vcpu);
 
 	WARN_ON(!is_guest_mode(vcpu));
 
 	if (nested_vmx_is_page_fault_vmexit(vmcs12, fault->error_code) &&
-	    !to_vmx(vcpu)->nested.nested_run_pending) {
+	    !WARN_ON_ONCE(to_vmx(vcpu)->nested.nested_run_pending)) {
 		vmcs12->vm_exit_intr_error_code = fault->error_code;
 		nested_vmx_vmexit(vcpu, EXIT_REASON_EXCEPTION_NMI,
 				  PF_VECTOR | INTR_TYPE_HARD_EXCEPTION |
 				  INTR_INFO_DELIVER_CODE_MASK | INTR_INFO_VALID_MASK,
 				  fault->address);
-	} else {
-		kvm_inject_page_fault(vcpu, fault);
+		return true;
 	}
+	return false;
 }
 
 static int nested_vmx_check_io_bitmap_controls(struct kvm_vcpu *vcpu,
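Besides the rename, the helper's contract changes: instead of injecting the fault into L2 itself on the fallback path, it now only reflects qualifying faults to L1 and tells the caller, via its return value, whether it did so. A minimal stand-alone sketch of that calling convention follows; the types, names and the fallback caller are invented for illustration and do not mirror the actual common-x86 call site.

#include <stdbool.h>
#include <stdio.h>

/* Toy stand-ins; the real code operates on struct kvm_vcpu / x86_exception. */
struct mock_fault { unsigned long address; unsigned int error_code; };
struct mock_vcpu  { bool l1_wants_pf_vmexit; };

/* Shape of the workaround hook: return true if the fault was turned into a
 * nested VM-exit to L1, false if the caller still has to inject it into L2. */
static bool mock_handle_page_fault_workaround(struct mock_vcpu *vcpu,
					      struct mock_fault *fault)
{
	if (vcpu->l1_wants_pf_vmexit) {
		printf("reflecting #PF at %#lx to L1 as a VM-exit\n", fault->address);
		return true;
	}
	return false;
}

static void mock_inject_page_fault(struct mock_vcpu *vcpu, struct mock_fault *fault)
{
	(void)vcpu;
	printf("injecting #PF at %#lx into L2\n", fault->address);
}

int main(void)
{
	struct mock_vcpu vcpu = { .l1_wants_pf_vmexit = false };
	struct mock_fault fault = { .address = 0xdeadbeef, .error_code = 0x2 };

	/* Caller pattern: try the workaround first, fall back to injection. */
	if (!mock_handle_page_fault_workaround(&vcpu, &fault))
		mock_inject_page_fault(&vcpu, &fault);
	return 0;
}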
@@ -2614,9 +2613,6 @@ static int prepare_vmcs02(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12,
 		vmcs_write64(GUEST_PDPTR3, vmcs12->guest_pdptr3);
 	}
 
-	if (!enable_ept)
-		vcpu->arch.walk_mmu->inject_page_fault = vmx_inject_page_fault_nested;
-
 	if ((vmcs12->vm_entry_controls & VM_ENTRY_LOAD_IA32_PERF_GLOBAL_CTRL) &&
 	    WARN_ON_ONCE(kvm_set_msr(vcpu, MSR_CORE_PERF_GLOBAL_CTRL,
 				     vmcs12->guest_ia32_perf_global_ctrl))) {
@@ -3695,12 +3691,34 @@ vmcs12_guest_cr4(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12)
 }
 
 static void vmcs12_save_pending_event(struct kvm_vcpu *vcpu,
-				      struct vmcs12 *vmcs12)
+				      struct vmcs12 *vmcs12,
+				      u32 vm_exit_reason, u32 exit_intr_info)
 {
 	u32 idt_vectoring;
 	unsigned int nr;
 
-	if (vcpu->arch.exception.injected) {
+	/*
+	 * Per the SDM, VM-Exits due to double and triple faults are never
+	 * considered to occur during event delivery, even if the double/triple
+	 * fault is the result of an escalating vectoring issue.
+	 *
+	 * Note, the SDM qualifies the double fault behavior with "The original
+	 * event results in a double-fault exception".  It's unclear why the
+	 * qualification exists since exits due to double fault can occur only
+	 * while vectoring a different exception (injected events are never
+	 * subject to interception), i.e. there's _always_ an original event.
+	 *
+	 * The SDM also uses NMI as a confusing example for the "original event
+	 * causes the VM exit directly" clause.  NMI isn't special in any way,
+	 * the same rule applies to all events that cause an exit directly.
+	 * NMI is an odd choice for the example because NMIs can only occur on
+	 * instruction boundaries, i.e. they _can't_ occur during vectoring.
+	 */
+	if ((u16)vm_exit_reason == EXIT_REASON_TRIPLE_FAULT ||
+	    ((u16)vm_exit_reason == EXIT_REASON_EXCEPTION_NMI &&
+	     is_double_fault(exit_intr_info))) {
+		vmcs12->idt_vectoring_info_field = 0;
+	} else if (vcpu->arch.exception.injected) {
 		nr = vcpu->arch.exception.nr;
 		idt_vectoring = nr | VECTORING_INFO_VALID_MASK;
 
@@ -3733,6 +3751,8 @@ static void vmcs12_save_pending_event(struct kvm_vcpu *vcpu,
 			idt_vectoring |= INTR_TYPE_EXT_INTR;
 
 		vmcs12->idt_vectoring_info_field = idt_vectoring;
+	} else {
+		vmcs12->idt_vectoring_info_field = 0;
 	}
 }
 
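The long SDM comment above boils down to a single predicate: skip synthesizing IDT-vectoring info when the exit itself consumed the original event as a double or triple fault. Below is a stand-alone sketch of that predicate; the constants and the is_double_fault() stand-in are simplified mocks, not the kernel definitions.

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Mock exit reasons and vector values; the real ones live in the VMX headers. */
#define MOCK_EXIT_REASON_EXCEPTION_NMI 0u
#define MOCK_EXIT_REASON_TRIPLE_FAULT  2u
#define MOCK_DF_VECTOR                 8u
#define MOCK_INTR_INFO_VECTOR_MASK     0xffu

static bool mock_is_double_fault(uint32_t exit_intr_info)
{
	return (exit_intr_info & MOCK_INTR_INFO_VECTOR_MASK) == MOCK_DF_VECTOR;
}

/* True when the exit "swallowed" the original event, so no IDT-vectoring
 * info should be reported to L1 (mirrors the condition in the hunk above). */
static bool mock_exit_consumed_original_event(uint32_t vm_exit_reason,
					      uint32_t exit_intr_info)
{
	return (uint16_t)vm_exit_reason == MOCK_EXIT_REASON_TRIPLE_FAULT ||
	       ((uint16_t)vm_exit_reason == MOCK_EXIT_REASON_EXCEPTION_NMI &&
		mock_is_double_fault(exit_intr_info));
}

int main(void)
{
	/* Exception exit that vectored into a #DF: report no vectoring info. */
	printf("#DF exit: %d\n",
	       mock_exit_consumed_original_event(MOCK_EXIT_REASON_EXCEPTION_NMI,
						 MOCK_DF_VECTOR | (1u << 31)));
	/* Some other exit reason (48 here is a made-up value). */
	printf("other exit: %d\n", mock_exit_consumed_original_event(48, 0));
	return 0;
}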
@@ -4202,12 +4222,12 @@ static void prepare_vmcs12(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12,
 	if (to_vmx(vcpu)->exit_reason.enclave_mode)
 		vmcs12->vm_exit_reason |= VMX_EXIT_REASONS_SGX_ENCLAVE_MODE;
 	vmcs12->exit_qualification = exit_qualification;
-	vmcs12->vm_exit_intr_info = exit_intr_info;
-
-	vmcs12->idt_vectoring_info_field = 0;
-	vmcs12->vm_exit_instruction_len = vmcs_read32(VM_EXIT_INSTRUCTION_LEN);
-	vmcs12->vmx_instruction_info = vmcs_read32(VMX_INSTRUCTION_INFO);
 
+	/*
+	 * On VM-Exit due to a failed VM-Entry, the VMCS isn't marked launched
+	 * and only EXIT_REASON and EXIT_QUALIFICATION are updated, all other
+	 * exit info fields are unmodified.
+	 */
 	if (!(vmcs12->vm_exit_reason & VMX_EXIT_REASONS_FAILED_VMENTRY)) {
 		vmcs12->launch_state = 1;
 
@@ -4219,7 +4239,12 @@ static void prepare_vmcs12(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12,
 		 * Transfer the event that L0 or L1 may wanted to inject into
 		 * L2 to IDT_VECTORING_INFO_FIELD.
 		 */
-		vmcs12_save_pending_event(vcpu, vmcs12);
+		vmcs12_save_pending_event(vcpu, vmcs12,
+					  vm_exit_reason, exit_intr_info);
+
+		vmcs12->vm_exit_intr_info = exit_intr_info;
+		vmcs12->vm_exit_instruction_len = vmcs_read32(VM_EXIT_INSTRUCTION_LEN);
+		vmcs12->vmx_instruction_info = vmcs_read32(VMX_INSTRUCTION_INFO);
 
 		/*
 		 * According to spec, there's no need to store the guest's
@@ -4518,9 +4543,6 @@ void nested_vmx_vmexit(struct kvm_vcpu *vcpu, u32 vm_exit_reason,
 	/* trying to cancel vmlaunch/vmresume is a bug */
 	WARN_ON_ONCE(vmx->nested.nested_run_pending);
 
-	/* Similarly, triple faults in L2 should never escape. */
-	WARN_ON_ONCE(kvm_check_request(KVM_REQ_TRIPLE_FAULT, vcpu));
-
 	if (kvm_check_request(KVM_REQ_GET_NESTED_STATE_PAGES, vcpu)) {
 		/*
 		 * KVM_REQ_GET_NESTED_STATE_PAGES is also used to map
@@ -6809,6 +6831,7 @@ __init int nested_vmx_hardware_setup(int (*exit_handlers[])(struct kvm_vcpu *))
 struct kvm_x86_nested_ops vmx_nested_ops = {
 	.leave_nested = vmx_leave_nested,
 	.check_events = vmx_check_nested_events,
+	.handle_page_fault_workaround = nested_vmx_handle_page_fault_workaround,
 	.hv_timer_pending = nested_vmx_preemption_timer_pending,
 	.triple_fault = nested_vmx_triple_fault,
 	.get_state = vmx_get_nested_state,
@@ -719,7 +719,7 @@ static void intel_pmu_cleanup(struct kvm_vcpu *vcpu)
 	intel_pmu_release_guest_lbr_event(vcpu);
 }
 
-struct kvm_pmu_ops intel_pmu_ops = {
+struct kvm_pmu_ops intel_pmu_ops __initdata = {
 	.pmc_perf_hw_id = intel_pmc_perf_hw_id,
 	.pmc_is_enabled = intel_pmc_is_enabled,
 	.pmc_idx_to_pmc = intel_pmc_idx_to_pmc,
@@ -202,16 +202,17 @@ void vmx_vcpu_pi_put(struct kvm_vcpu *vcpu)
 void pi_wakeup_handler(void)
 {
 	int cpu = smp_processor_id();
+	struct list_head *wakeup_list = &per_cpu(wakeup_vcpus_on_cpu, cpu);
+	raw_spinlock_t *spinlock = &per_cpu(wakeup_vcpus_on_cpu_lock, cpu);
 	struct vcpu_vmx *vmx;
 
-	raw_spin_lock(&per_cpu(wakeup_vcpus_on_cpu_lock, cpu));
-	list_for_each_entry(vmx, &per_cpu(wakeup_vcpus_on_cpu, cpu),
-			    pi_wakeup_list) {
+	raw_spin_lock(spinlock);
+	list_for_each_entry(vmx, wakeup_list, pi_wakeup_list) {
 
 		if (pi_test_on(&vmx->pi_desc))
 			kvm_vcpu_wake_up(&vmx->vcpu);
 	}
-	raw_spin_unlock(&per_cpu(wakeup_vcpus_on_cpu_lock, cpu));
+	raw_spin_unlock(spinlock);
 }
 
 void __init pi_init_cpu(int cpu)
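The change above is a pure cleanup: the per-CPU list and its lock are looked up once and kept in locals before the walk. A stand-alone user-space sketch of the same shape follows; the per-CPU array, mutex and vcpu struct are invented stand-ins for per_cpu(), raw_spinlock_t and the kernel's intrusive list.

#include <pthread.h>
#include <stdio.h>

#define MOCK_NR_CPUS 4

struct mock_vcpu { int id; int notification_pending; struct mock_vcpu *next; };

static struct mock_vcpu *mock_wakeup_list[MOCK_NR_CPUS];
static pthread_mutex_t mock_wakeup_lock[MOCK_NR_CPUS] = {
	PTHREAD_MUTEX_INITIALIZER, PTHREAD_MUTEX_INITIALIZER,
	PTHREAD_MUTEX_INITIALIZER, PTHREAD_MUTEX_INITIALIZER,
};

static void mock_wakeup_handler(int cpu)
{
	/* Hoist the per-CPU lookups into locals, as the patch does. */
	struct mock_vcpu **wakeup_list = &mock_wakeup_list[cpu];
	pthread_mutex_t *lock = &mock_wakeup_lock[cpu];
	struct mock_vcpu *v;

	pthread_mutex_lock(lock);
	for (v = *wakeup_list; v; v = v->next) {
		if (v->notification_pending)
			printf("waking vcpu %d\n", v->id);
	}
	pthread_mutex_unlock(lock);
}

int main(void)
{
	struct mock_vcpu v1 = { .id = 1, .notification_pending = 1 };
	struct mock_vcpu v0 = { .id = 0, .notification_pending = 0, .next = &v1 };

	mock_wakeup_list[2] = &v0;
	mock_wakeup_handler(2);
	return 0;
}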
@@ -311,7 +312,7 @@ int vmx_pi_update_irte(struct kvm *kvm, unsigned int host_irq,
 			continue;
 		}
 
-		vcpu_info.pi_desc_addr = __pa(&to_vmx(vcpu)->pi_desc);
+		vcpu_info.pi_desc_addr = __pa(vcpu_to_pi_desc(vcpu));
 		vcpu_info.vector = irq.vector;
 
 		trace_kvm_pi_irte_update(host_irq, vcpu->vcpu_id, e->gsi,
Some files were not shown because too many files have changed in this diff.