.. SPDX-License-Identifier: GPL-2.0

===========================================================
POWER9 eXternal Interrupt Virtualization Engine (XIVE Gen1)
===========================================================

Device types supported:
  - KVM_DEV_TYPE_XIVE     POWER9 XIVE Interrupt Controller generation 1

This device acts as a VM interrupt controller. It provides the KVM
interface to configure the interrupt sources of a VM in the underlying
POWER9 XIVE interrupt controller.

Only one XIVE instance may be instantiated. A guest XIVE device
requires a POWER9 host, and the guest OS should support the XIVE
native exploitation interrupt mode. If it does not, it should run
using the legacy interrupt mode, referred to as XICS (POWER7/8).

* Device Mappings

  The KVM device exposes different MMIO ranges of the XIVE HW which
  are required for interrupt management. These are exposed to the
  guest in VMAs populated with a custom VM fault handler.

  1. Thread Interrupt Management Area (TIMA)

  Each thread has an associated Thread Interrupt Management context
  composed of a set of registers. These registers let the thread
  handle priority management and interrupt acknowledgment. The most
  important are:

  - Interrupt Pending Buffer     (IPB)
  - Current Processor Priority   (CPPR)
  - Notification Source Register (NSR)

  They are exposed to software in four different pages, each offering
  a view with a different privilege level. The first page is for the
  physical thread context and the second for the hypervisor. Only the
  third (operating system) and the fourth (user level) are exposed to
  the guest.

  2. Event State Buffer (ESB)

  Each source is associated with an Event State Buffer (ESB), exposed
  as an even/odd pair of pages which provide commands to manage the
  source: for instance to trigger it, to EOI it, or to turn it off.

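The even/odd page pairing can be illustrated with a little offset arithmetic. This is a minimal sketch only: the 64 KiB page size, the assumption that the pairs of consecutive sources are laid out back to back, and all helper names are hypothetical and not taken from this document.

```c
#include <stdint.h>

/* Assumption: 64 KiB ESB pages, one even/odd pair per source,
 * pairs laid out consecutively by guest IRQ number. */
#define ESB_PAGE_SHIFT_SKETCH 16u

/* Byte offset of the even page of the pair belonging to a guest IRQ. */
static inline uint64_t esb_even_page_offset(uint32_t girq)
{
    return ((uint64_t)girq * 2) << ESB_PAGE_SHIFT_SKETCH;
}

/* Recover the guest IRQ number from a byte offset into the ESB range. */
static inline uint32_t esb_offset_to_girq(uint64_t offset)
{
    return (uint32_t)((offset >> ESB_PAGE_SHIFT_SKETCH) / 2);
}

/* Returns 1 if the offset falls in the odd page of a pair, else 0. */
static inline int esb_offset_is_odd_page(uint64_t offset)
{
    return (int)((offset >> ESB_PAGE_SHIFT_SKETCH) & 1);
}
```

Under these assumptions, a fault handler could derive both the source and the page of the pair from the faulting offset alone.
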
  3. Device pass-through

  When a device is passed through into the guest, the source
  interrupts come from a different HW controller (PHB4) and the ESB
  pages exposed to the guest should accommodate this change.

  The passthru_irq helpers, kvmppc_xive_set_mapped() and
  kvmppc_xive_clr_mapped(), are called when the device HW irqs are
  mapped into or unmapped from the guest IRQ number space. The KVM
  device extends these helpers to clear the ESB pages of the guest IRQ
  number being mapped and then lets the VM fault handler repopulate.
  The handler will insert the ESB page corresponding to the HW
  interrupt of the device being passed through, or the initial IPI ESB
  page if the device has been removed.

  The ESB remapping is fully transparent to the guest and the OS
  device driver. All handling is done within VFIO and the above
  helpers in KVM-PPC.

* Groups:

1. KVM_DEV_XIVE_GRP_CTRL
     Provides global controls on the device

  Attributes:
    1.1 KVM_DEV_XIVE_RESET (write only)
    Resets the interrupt controller configuration for sources and event
    queues. To be used by kexec and kdump.

    Errors: none

    1.2 KVM_DEV_XIVE_EQ_SYNC (write only)
    Syncs all the sources and queues and marks the EQ pages dirty. This
    is to make sure that a consistent memory state is captured when
    migrating the VM.

    Errors: none

    1.3 KVM_DEV_XIVE_NR_SERVERS (write only)
    The kvm_device_attr.addr points to a __u32 value which is the number of
    interrupt server numbers (i.e. the highest possible vcpu id plus one).

    Errors:

      =======  ==========================================
      -EINVAL  Value greater than KVM_MAX_VCPU_IDS.
      -EFAULT  Invalid user pointer for attr->addr.
      -EBUSY   A vCPU is already connected to the device.
      =======  ==========================================

2. KVM_DEV_XIVE_GRP_SOURCE (write only)
     Initializes a new source in the XIVE device and masks it.

  Attributes:
    Interrupt source number  (64-bit)

  The kvm_device_attr.addr points to a __u64 value::

    bits:     | 63   ....  2 |   1   |   0
    values:   |    unused    | level | type

  - type:  0:MSI 1:LSI
  - level: assertion level in case of an LSI.

  Errors:

    =======  ==========================================
    -E2BIG   Interrupt source number is out of range
    -ENOMEM  Could not create a new source block
    -EFAULT  Invalid user pointer for attr->addr.
    -ENXIO   Could not allocate underlying HW interrupt
    =======  ==========================================

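As an illustration of the __u64 layout above, the following minimal sketch packs the type and level bits. The macro and function names are hypothetical and used only for this example; real code should use the definitions from the kernel uapi headers.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names for the two documented bits. */
#define XIVE_SRC_TYPE_LSI_SKETCH (1ull << 0) /* bit 0: 0 = MSI, 1 = LSI */
#define XIVE_SRC_LEVEL_SKETCH    (1ull << 1) /* bit 1: LSI assertion level */

/* Build the value that kvm_device_attr.addr would point to. */
static inline uint64_t xive_source_value(bool lsi, bool asserted)
{
    uint64_t val = 0;

    if (lsi) {
        val |= XIVE_SRC_TYPE_LSI_SKETCH;
        /* The level bit is only meaningful for an LSI. */
        if (asserted)
            val |= XIVE_SRC_LEVEL_SKETCH;
    }
    return val;
}
```

The level argument is deliberately ignored for an MSI, since the layout only defines it for level-sensitive sources.
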
3. KVM_DEV_XIVE_GRP_SOURCE_CONFIG (write only)
     Configures source targeting

  Attributes:
    Interrupt source number  (64-bit)

  The kvm_device_attr.addr points to a __u64 value::

    bits:     | 63   ....  33 |  32  | 31 .. 3 |  2 .. 0
    values:   |    eisn       | mask |  server | priority

  - priority: 0-7 interrupt priority level
  - server: CPU number chosen to handle the interrupt
  - mask: mask flag (unused)
  - eisn: Effective Interrupt Source Number

  Errors:

    =======  =======================================================
    -ENOENT  Unknown source number
    -EINVAL  Source number not initialized
    -EINVAL  Invalid priority
    -EINVAL  Invalid CPU number.
    -EFAULT  Invalid user pointer for attr->addr.
    -ENXIO   CPU event queues not configured or configuration of the
             underlying HW interrupt failed
    -EBUSY   No CPU available to serve interrupt
    =======  =======================================================

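The source configuration layout above can be packed and unpacked with plain shifts and masks. This is a minimal sketch with hypothetical helper names; it only mirrors the documented bit positions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Pack priority (bits 2..0), server (31..3), mask (32), eisn (63..33). */
static inline uint64_t xive_source_config(uint8_t priority, uint32_t server,
                                          bool masked, uint32_t eisn)
{
    return (uint64_t)(priority & 0x7u) |
           ((uint64_t)(server & 0x1fffffffu) << 3) |
           ((uint64_t)masked << 32) |
           ((uint64_t)(eisn & 0x7fffffffu) << 33);
}

/* Field extractors for a value read back from a saved configuration. */
static inline uint8_t xive_cfg_priority(uint64_t val)
{
    return (uint8_t)(val & 0x7u);
}

static inline uint32_t xive_cfg_server(uint64_t val)
{
    return (uint32_t)((val >> 3) & 0x1fffffffu);
}

static inline uint32_t xive_cfg_eisn(uint64_t val)
{
    return (uint32_t)(val >> 33);
}
```

For example, priority 5 on server 2 with eisn 0x10 packs to 0x2000000015.
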
4. KVM_DEV_XIVE_GRP_EQ_CONFIG (read-write)
     Configures an event queue of a CPU

  Attributes:
    EQ descriptor identifier (64-bit)

  The EQ descriptor identifier is a tuple (server, priority)::

    bits:     | 63   ....  32 | 31 .. 3 |  2 .. 0
    values:   |    unused     |  server | priority

  The kvm_device_attr.addr points to::

    struct kvm_ppc_xive_eq {
        __u32 flags;
        __u32 qshift;
        __u64 qaddr;
        __u32 qtoggle;
        __u32 qindex;
        __u8  pad[40];
    };

  - flags: queue flags

    KVM_XIVE_EQ_ALWAYS_NOTIFY (required)
        forces notification without using the coalescing mechanism
        provided by the XIVE END ESBs.

  - qshift: queue size (power of 2)
  - qaddr: real address of queue
  - qtoggle: current queue toggle bit
  - qindex: current queue index
  - pad: reserved for future use

  Errors:

    =======  =========================================
    -ENOENT  Invalid CPU number
    -EINVAL  Invalid priority
    -EINVAL  Invalid flags
    -EINVAL  Invalid queue size
    -EINVAL  Invalid queue address
    -EFAULT  Invalid user pointer for attr->addr.
    -EIO     Configuration of the underlying HW failed
    =======  =========================================

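To make the EQ identifier and descriptor concrete, here is a minimal sketch that builds both. The struct is a local mirror of the layout shown above, and its name, the helper names and the flag value are assumptions for illustration; real code should include the kernel uapi header instead.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical local mirror of struct kvm_ppc_xive_eq (same field order). */
struct xive_eq_sketch {
    uint32_t flags;
    uint32_t qshift;
    uint64_t qaddr;
    uint32_t qtoggle;
    uint32_t qindex;
    uint8_t  pad[40];
};

/* Assumed flag value, for illustration only; use the uapi definition. */
#define EQ_ALWAYS_NOTIFY_SKETCH 0x1u

/* EQ identifier: server in bits 31..3, priority in bits 2..0. */
static inline uint64_t xive_eq_id(uint32_t server, uint8_t priority)
{
    return ((uint64_t)(server & 0x1fffffffu) << 3) | (priority & 0x7u);
}

/* Fill a descriptor; the reserved pad bytes are zeroed by the memset. */
static inline void xive_eq_fill(struct xive_eq_sketch *eq, uint64_t qaddr,
                                uint32_t qshift, uint32_t qtoggle,
                                uint32_t qindex)
{
    memset(eq, 0, sizeof(*eq));
    eq->flags = EQ_ALWAYS_NOTIFY_SKETCH; /* documented as required */
    eq->qshift = qshift;                 /* queue size is 1 << qshift bytes */
    eq->qaddr = qaddr;
    eq->qtoggle = qtoggle;
    eq->qindex = qindex;
}
```

With this field order the descriptor occupies 64 bytes, and the identifier for server 4 at priority 6 is 0x26.
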
5. KVM_DEV_XIVE_GRP_SOURCE_SYNC (write only)
     Synchronizes the source to flush event notifications

  Attributes:
    Interrupt source number  (64-bit)

  Errors:

    =======  =============================
    -ENOENT  Unknown source number
    -EINVAL  Source number not initialized
    =======  =============================

* VCPU state

  The XIVE IC maintains VP interrupt state in an internal structure
  called the NVT. When a VP is not dispatched on a HW processor
  thread, this structure can be updated by HW if the VP is the target
  of an event notification.

  It is important for migration to capture the cached IPB from the NVT
  as it synthesizes the priorities of the pending interrupts. We
  capture a bit more to report debug information.

  KVM_REG_PPC_VP_STATE (2 * 64bits)::

    bits:     |  63  ....  32  |  31  ....  0  |
    values:   |   TIMA word0   |   TIMA word1  |
    bits:     | 127       ..........       64  |
    values:   |            unused              |

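The VP state layout above can be split into its two TIMA words as follows. This is a minimal sketch with hypothetical type and function names; it only handles the doubleword holding bits 63..0, since bits 127..64 are unused.

```c
#include <stdint.h>

/* Hypothetical container for the two captured TIMA words. */
struct xive_vp_words {
    uint32_t word0; /* bits 63..32 */
    uint32_t word1; /* bits 31..0 */
};

/* Split the doubleword holding bits 63..0 of the VP state. */
static inline struct xive_vp_words xive_vp_state_split(uint64_t low)
{
    struct xive_vp_words w;

    w.word0 = (uint32_t)(low >> 32);
    w.word1 = (uint32_t)(low & 0xffffffffu);
    return w;
}

/* Rebuild the doubleword from the two words, e.g. on restore. */
static inline uint64_t xive_vp_state_join(uint32_t word0, uint32_t word1)
{
    return ((uint64_t)word0 << 32) | word1;
}
```
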
* Migration:

  Saving the state of a VM using the XIVE native exploitation mode
  should follow a specific sequence. When the VM is stopped:

  1. Mask all sources (PQ=01) to stop the flow of events.

  2. Sync the XIVE device with the KVM control KVM_DEV_XIVE_EQ_SYNC to
  flush any in-flight event notification and to stabilize the EQs. At
  this stage, the EQ pages are marked dirty to make sure they are
  transferred in the migration sequence.

  3. Capture the state of the source targeting, the EQ configuration
  and the state of the thread interrupt context registers.

  Restore is similar:

  1. Restore the EQ configuration first, as targeting depends on it.
  2. Restore targeting
  3. Restore the thread interrupt contexts
  4. Restore the source states
  5. Let the vCPU run