POWER9 eXternal Interrupt Virtualization Engine (XIVE Gen1)
===========================================================

Device types supported:
  KVM_DEV_TYPE_XIVE     POWER9 XIVE Interrupt Controller generation 1

This device acts as a VM interrupt controller. It provides the KVM
interface to configure the interrupt sources of a VM in the underlying
POWER9 XIVE interrupt controller.

Only one XIVE instance may be instantiated. A guest XIVE device
requires a POWER9 host and the guest OS should have support for the
XIVE native exploitation interrupt mode. If not, it should run using
the legacy interrupt mode, referred to as XICS (POWER7/8).

* Device Mappings

  The KVM device exposes different MMIO ranges of the XIVE HW which
  are required for interrupt management. These are exposed to the
  guest in VMAs populated with a custom VM fault handler.

  1. Thread Interrupt Management Area (TIMA)

  Each thread has an associated Thread Interrupt Management context
  composed of a set of registers. These registers let the thread
  handle priority management and interrupt acknowledgment. The most
  important are:

      - Interrupt Pending Buffer     (IPB)
      - Current Processor Priority   (CPPR)
      - Notification Source Register (NSR)

  They are exposed to software in four different pages, each offering
  a view with a different privilege level. The first page is for the
  physical thread context and the second for the hypervisor. Only the
  third (operating system) and the fourth (user level) are exposed to
  the guest.

  2. Event State Buffer (ESB)

  Each source is associated with an Event State Buffer (ESB) with
  an even/odd pair of pages which provide commands to manage the
  source: to trigger, to EOI, or to turn off the source, for
  instance.

  3. Device pass-through

  When a device is passed through into the guest, the source
  interrupts are from a different HW controller (PHB4) and the ESB
  pages exposed to the guest should accommodate this change.

  The passthru_irq helpers, kvmppc_xive_set_mapped() and
  kvmppc_xive_clr_mapped(), are called when the device HW irqs are
  mapped into or unmapped from the guest IRQ number space. The KVM
  device extends these helpers to clear the ESB pages of the guest IRQ
  number being mapped and then lets the VM fault handler repopulate
  them. The handler will insert the ESB page corresponding to the HW
  interrupt of the device being passed through, or the initial IPI ESB
  page if the device has been removed.

  The ESB remapping is fully transparent to the guest and the OS
  device driver. All handling is done within VFIO and the above
  helpers in KVM-PPC.

* Groups:

  1. KVM_DEV_XIVE_GRP_CTRL
  Provides global controls on the device
  Attributes:
    1.1 KVM_DEV_XIVE_RESET (write only)
    Resets the interrupt controller configuration for sources and event
    queues. To be used by kexec and kdump.
    Errors: none

    1.2 KVM_DEV_XIVE_EQ_SYNC (write only)
    Syncs all the sources and queues and marks the EQ pages dirty. This
    is to make sure that a consistent memory state is captured when
    migrating the VM.
    Errors: none

  2. KVM_DEV_XIVE_GRP_SOURCE (write only)
  Initializes a new source in the XIVE device and masks it.
  Attributes:
    Interrupt source number  (64-bit)
  The kvm_device_attr.addr points to a __u64 value:
  bits:     | 63   ....  2 |   1   |   0
  values:   |    unused    | level | type
  - type:  0:MSI 1:LSI
  - level: assertion level in case of an LSI.
  Errors:
    -E2BIG:  Interrupt source number is out of range
    -ENOMEM: Could not create a new source block
    -EFAULT: Invalid user pointer for attr->addr.
    -ENXIO:  Could not allocate underlying HW interrupt

  3. KVM_DEV_XIVE_GRP_SOURCE_CONFIG (write only)
  Configures source targeting
  Attributes:
    Interrupt source number  (64-bit)
  The kvm_device_attr.addr points to a __u64 value:
  bits:     | 63   ....  33 |  32  | 31 .. 3 |  2 .. 0
  values:   |    eisn       | mask |  server | priority
  - priority: 0-7 interrupt priority level
  - server: CPU number chosen to handle the interrupt
  - mask: mask flag (unused)
  - eisn: Effective Interrupt Source Number
  Errors:
    -ENOENT: Unknown source number
    -EINVAL: Not initialized source number
    -EINVAL: Invalid priority
    -EINVAL: Invalid CPU number.
    -EFAULT: Invalid user pointer for attr->addr.
    -ENXIO:  CPU event queues not configured or configuration of the
             underlying HW interrupt failed
    -EBUSY:  No CPU available to serve interrupt

  4. KVM_DEV_XIVE_GRP_EQ_CONFIG (read-write)
  Configures an event queue of a CPU
  Attributes:
    EQ descriptor identifier (64-bit)
  The EQ descriptor identifier is a tuple (server, priority):
  bits:     | 63   ....  32 | 31 .. 3 |  2 .. 0
  values:   |    unused     |  server | priority
  The kvm_device_attr.addr points to:
    struct kvm_ppc_xive_eq {
	__u32 flags;
	__u32 qshift;
	__u64 qaddr;
	__u32 qtoggle;
	__u32 qindex;
	__u8  pad[40];
    };
  - flags: queue flags
    KVM_XIVE_EQ_ALWAYS_NOTIFY (required)
	forces notification without using the coalescing mechanism
	provided by the XIVE END ESBs.
  - qshift: queue size (power of 2)
  - qaddr: real address of queue
  - qtoggle: current queue toggle bit
  - qindex: current queue index
  - pad: reserved for future use
  Errors:
    -ENOENT: Invalid CPU number
    -EINVAL: Invalid priority
    -EINVAL: Invalid flags
    -EINVAL: Invalid queue size
    -EINVAL: Invalid queue address
    -EFAULT: Invalid user pointer for attr->addr.
    -EIO:    Configuration of the underlying HW failed

  5. KVM_DEV_XIVE_GRP_SOURCE_SYNC (write only)
  Synchronizes the source to flush event notifications
  Attributes:
    Interrupt source number  (64-bit)
  Errors:
    -ENOENT: Unknown source number
    -EINVAL: Not initialized source number

* VCPU state

  The XIVE IC maintains VP interrupt state in an internal structure
  called the NVT. When a VP is not dispatched on a HW processor
  thread, this structure can be updated by HW if the VP is the target
  of an event notification.

  It is important for migration to capture the cached IPB from the NVT
  as it synthesizes the priorities of the pending interrupts. We
  capture a bit more to report debug information.

  KVM_REG_PPC_VP_STATE (2 * 64bits)
  bits:     |  63  ....  32  |  31  ....  0  |
  values:   |   TIMA word0   |   TIMA word1  |
  bits:     | 127       ..........       64  |
  values:   |            unused              |

* Migration:

  Saving the state of a VM using the XIVE native exploitation mode
  should follow a specific sequence. When the VM is stopped:

  1. Mask all sources (PQ=01) to stop the flow of events.

  2. Sync the XIVE device with the KVM control KVM_DEV_XIVE_EQ_SYNC to
  flush any in-flight event notification and to stabilize the EQs. At
  this stage, the EQ pages are marked dirty to make sure they are
  transferred in the migration sequence.

  3. Capture the state of the source targeting, the EQs configuration
  and the state of thread interrupt context registers.

  Restore is similar:

  1. Restore the EQ configuration, as targeting depends on it.
  2. Restore targeting
  3. Restore the thread interrupt contexts
  4. Restore the source states
  5. Let the vCPU run
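
* Encoding example:

  The bit layouts given in the Groups and VCPU state sections above
  can be turned into small packing helpers. The following is a minimal
  sketch only, not the kernel's or any VMM's implementation: all
  function names and masks below are hypothetical and derived solely
  from the bit tables in this document. The resulting __u64 values
  are what kvm_device_attr.addr (source, source config) or
  kvm_device_attr.attr (EQ descriptor identifier) would carry.

```c
#include <assert.h>
#include <stdint.h>

/* All names below are illustrative; only the bit positions come from
 * the tables in this document. */

/* KVM_DEV_XIVE_GRP_SOURCE value: bit 0 = type (0:MSI 1:LSI),
 * bit 1 = assertion level, meaningful for an LSI only. */
static uint64_t xive_source_attr(int is_lsi, int level)
{
	uint64_t v = 0;

	if (is_lsi) {
		v |= 1ULL << 0;
		if (level)
			v |= 1ULL << 1;
	}
	return v;
}

/* KVM_DEV_XIVE_GRP_SOURCE_CONFIG value:
 * priority in bits 2..0, server in bits 31..3,
 * mask flag in bit 32, eisn in bits 63..33. */
static uint64_t xive_source_config(uint32_t priority, uint32_t server,
				   int masked, uint32_t eisn)
{
	return ((uint64_t)(priority & 0x7)) |
	       ((uint64_t)(server & 0x1fffffff) << 3) |
	       ((uint64_t)(masked ? 1 : 0) << 32) |
	       ((uint64_t)(eisn & 0x7fffffff) << 33);
}

/* KVM_DEV_XIVE_GRP_EQ_CONFIG identifier: (server, priority) tuple,
 * priority in bits 2..0, server in bits 31..3, upper bits unused. */
static uint64_t xive_eq_id(uint32_t priority, uint32_t server)
{
	return ((uint64_t)(priority & 0x7)) |
	       ((uint64_t)(server & 0x1fffffff) << 3);
}

/* KVM_REG_PPC_VP_STATE, first 64-bit half:
 * TIMA word0 in bits 63..32, TIMA word1 in bits 31..0. */
static uint32_t xive_vp_word0(uint64_t vp_state_lo)
{
	return (uint32_t)(vp_state_lo >> 32);
}

static uint32_t xive_vp_word1(uint64_t vp_state_lo)
{
	return (uint32_t)(vp_state_lo & 0xffffffffULL);
}

/* Round-trip check of the helpers against hand-computed values. */
static void xive_encoding_selfcheck(void)
{
	assert(xive_source_attr(1, 1) == 0x3);
	assert(xive_source_config(5, 2, 0, 0x10) == 0x2000000015ULL);
	assert(xive_eq_id(7, 1) == 0xf);
	assert(xive_vp_word0(0x1122334455667788ULL) == 0x11223344u);
	assert(xive_vp_word1(0x1122334455667788ULL) == 0x55667788u);
}
```

  The helpers mask each field to its documented width rather than
  rejecting out-of-range input; range checking (for instance, a
  priority above 7) is left to the caller and to the errors the
  device returns.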