# Protected Virtual Machine Firmware

In the context of the [Android Virtualization Framework][AVF], a hypervisor
(_e.g._ [pKVM]) enforces full memory isolation between its virtual machines
(VMs) and the host. As a result, the host is only allowed to access memory that
has been explicitly shared back by a VM. Such _protected VMs_ (“pVMs”) are
therefore able to manipulate secrets without being at risk of an attacker
stealing them by compromising the Android host.

As pVMs are started dynamically by a _virtual machine manager_ (“VMM”) running
as a host process and as pVMs must not trust the host (see [_Why
AVF?_][why-avf]), the virtual machine it configures can't be trusted either.
Furthermore, even though the isolation mentioned above allows pVMs to protect
their secrets from the host, it does not help with provisioning them during
boot. In particular, the threat model would prohibit the host from ever having
access to those secrets, preventing the VMM from passing them to the pVM.

To address these concerns the hypervisor securely loads the pVM firmware
(“pvmfw”) in the pVM from a protected memory region (this prevents the host or
any pVM from tampering with it), setting it as the entry point of the virtual
machine. As a result, pvmfw becomes the very first code that gets executed in
the pVM, allowing it to validate the environment and abort the boot sequence if
necessary. This process takes place whenever the VMM places a VM in protected
mode and can’t be prevented by the host.

Given the threat model, pvmfw is not allowed to trust the devices or device
layout provided by the virtual platform it is running on as those are configured
by the VMM. Instead, it performs all the necessary checks to ensure that the pVM
was set up as expected. For functional purposes, the interface with the
hypervisor, although trusted, is also validated.

Once it has been determined that the platform can be trusted, pvmfw derives
unique secrets for the guest through the [_Boot Certificate Chain_][BCC]
("BCC", see [Open Profile for DICE][open-dice]) that can be used to prove the
identity of the pVM to local and remote actors. If any operation or check fails,
or in case of a missing prerequisite, pvmfw will abort the boot process of the
pVM, effectively preventing non-compliant pVMs and/or guests from running.
Otherwise, it hands over the pVM to the guest kernel by jumping to its first
instruction, similarly to a bootloader.

pvmfw currently only supports AArch64.

[AVF]: https://source.android.com/docs/core/virtualization
[why-avf]: https://source.android.com/docs/core/virtualization/whyavf
[BCC]: https://pigweed.googlesource.com/open-dice/+/master/src/android/README.md
[pKVM]: https://source.android.com/docs/core/virtualization/architecture#hypervisor
[open-dice]: https://pigweed.googlesource.com/open-dice/+/refs/heads/main/docs/specification.md

## Integration

### pvmfw Loading

When running pKVM, the physical memory from which the hypervisor loads pvmfw
into guest address space is not initially populated by the hypervisor itself.
Instead, it receives a pre-loaded memory region from a trusted pvmfw loader and
only then becomes responsible for protecting it. As a result, the hypervisor is
kept generic (beyond AVF) and small as it is not expected (nor necessary) for it
to know how to interpret or obtain the content of that region.

#### Android Bootloader (ABL) Support

Starting in Android T, the `PRODUCT_BUILD_PVMFW_IMAGE` build variable controls
the generation of `pvmfw.img`, a new [ABL partition][ABL-part] containing the
pvmfw binary (sometimes called "`pvmfw.bin`") and following the internal format
of the [`boot`][boot-img] partition, intended to be verified and loaded by ABL
on AVF-compatible devices.

Once ABL has verified the `pvmfw.img` chained static partition, the contained
[`boot.img` header][boot-img] may be used to obtain the size of the `pvmfw.bin`
image (recorded in the `kernel_size` field), as ABL already does for the kernel
itself. In accordance with the header format, the `kernel_size` bytes of the
partition following the header will be the `pvmfw.bin` image.

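For illustration, locating `pvmfw.bin` inside the partition can be sketched in Rust as below. This is a minimal sketch rather than ABL code: it assumes a version 3 or 4 `boot.img` header (fixed 4096-byte page, header padded to one page), reads fields as little-endian, and the function names are hypothetical.

```rust
/// Magic shared by every `boot.img` header version.
const BOOT_MAGIC: &[u8; 8] = b"ANDROID!";
/// Assumption: the partition uses a v3 or v4 header, whose page size is fixed.
const BOOT_IMAGE_PAGE_SIZE: usize = 4096;

/// Hypothetical helper: reads a little-endian `u32` at `offset`, if in bounds.
fn read_u32_le(data: &[u8], offset: usize) -> Option<u32> {
    let b = data.get(offset..offset.checked_add(4)?)?;
    Some(u32::from_le_bytes([b[0], b[1], b[2], b[3]]))
}

/// Hypothetical helper: returns the `pvmfw.bin` payload of an already-verified
/// `pvmfw.img` partition, or `None` if the header is unsupported or truncated.
fn pvmfw_bin(partition: &[u8]) -> Option<&[u8]> {
    if partition.get(..8)? != BOOT_MAGIC {
        return None;
    }
    // `kernel_size` immediately follows the magic in all header versions.
    let kernel_size = read_u32_le(partition, 8)? as usize;
    // `header_version` is at offset 40 in all header versions.
    let header_version = read_u32_le(partition, 40)?;
    if !(3..=4).contains(&header_version) {
        // v0-v2 headers carry a variable `page_size`; not handled here.
        return None;
    }
    // The payload starts on the first page after the header.
    let start = BOOT_IMAGE_PAGE_SIZE;
    partition.get(start..start.checked_add(kernel_size)?)
}
```

Reading `kernel_size` this way mirrors what the paragraph above describes for the kernel of a regular `boot.img`.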
Note that when it gets executed in the context of a pVM, `pvmfw` expects to have
been loaded at a 4KiB-aligned intermediate physical address (IPA), so if ABL
loads the `pvmfw.bin` image without respecting this alignment, it is the
responsibility of the hypervisor to either reject the image or copy it into
guest address space with the right alignment.

To support pKVM, ABL is expected to describe the region using a reserved memory
device tree node where both address and size have been properly aligned to the
page size used by the hypervisor. This single region must include both the pvmfw
binary image and its configuration data (see below). For example, the following
node describes a region of size `0x40000` at address `0x80000000`:

```
reserved-memory {
    ...
    pkvm_guest_firmware {
        compatible = "linux,pkvm-guest-firmware-memory";
        reg = <0x0 0x80000000 0x40000>;
        no-map;
    };
};
```

[ABL-part]: https://source.android.com/docs/core/architecture/bootloader/partitions
[boot-img]: https://source.android.com/docs/core/architecture/bootloader/boot-image-header

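A loader might validate the region before emitting such a node, along the lines of the sketch below. `HYP_PAGE_SIZE` (4KiB here) and `check_firmware_region` are hypothetical names chosen for illustration; the real page size is whatever the hypervisor uses.

```rust
/// Assumption: the hypervisor uses 4KiB pages; adjust to the actual value.
const HYP_PAGE_SIZE: u64 = 4096;

/// Hypothetical check on a candidate `pkvm_guest_firmware` region: base and
/// size must be page-aligned and the region must hold `payload_len` bytes
/// (the pvmfw binary plus its appended configuration data).
fn check_firmware_region(base: u64, size: u64, payload_len: u64) -> Result<(), &'static str> {
    if base % HYP_PAGE_SIZE != 0 {
        return Err("region address is not page-aligned");
    }
    if size % HYP_PAGE_SIZE != 0 {
        return Err("region size is not page-aligned");
    }
    if payload_len > size {
        return Err("pvmfw and its configuration data do not fit in the region");
    }
    base.checked_add(size).ok_or("region wraps around the address space")?;
    Ok(())
}
```

With the example node above, `check_firmware_region(0x8000_0000, 0x40000, len)` succeeds for any `len` up to `0x40000`.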
### Configuration Data

As part of the process of loading pvmfw, the loader (typically the Android
Bootloader, "ABL") is expected to pass device-specific pvmfw configuration data
by appending it to the pvmfw binary and including it in the region passed to the
hypervisor. As a result, the hypervisor will give the same protection to this
data as it does to pvmfw and will transparently load it in guest memory, making
it available to pvmfw at runtime. This enables pvmfw to be kept device-agnostic,
simplifying its adoption and distribution as a centralized signed binary, while
also being able to support device-specific details.

The configuration data will be read by pvmfw at the next 4KiB boundary from the
end of its loaded binary. Even though pvmfw is position-independent, it is
expected to have been loaded at a 4KiB boundary as well. As a result, the
location of the configuration data is implicitly passed to pvmfw and known to it
at build time.

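In other words, the configuration data sits at the binary size rounded up to the next 4KiB multiple, relative to the load address. A loader could compute that offset with a minimal helper like the hypothetical one below (the names are illustrative, not taken from pvmfw):

```rust
/// Alignment of the configuration data relative to pvmfw's load address.
const CONFIG_ALIGN: usize = 4096;

/// Rounds `n` up to the next multiple of `align`, which must be a power of two.
fn align_up(n: usize, align: usize) -> usize {
    debug_assert!(align.is_power_of_two());
    (n + align - 1) & !(align - 1)
}

/// Hypothetical helper: offset of the configuration data from the (4KiB-aligned)
/// load address, given the size of the loaded `pvmfw.bin` image.
fn config_offset(pvmfw_bin_size: usize) -> usize {
    align_up(pvmfw_bin_size, CONFIG_ALIGN)
}
```

For example, a `0x12345`-byte `pvmfw.bin` places the configuration data at offset `0x13000`, while a binary that already ends on a 4KiB boundary is followed directly by it.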
#### Configuration Data Format

The configuration data is described using the following [header]:

```
+===============================+
|           pvmfw.bin           |
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~+
|  (Padding to 4KiB alignment)  |
+===============================+ <-- HEAD
|     Magic (= 0x666d7670)      |
+-------------------------------+
|            Version            |
+-------------------------------+
|  Total Size = (TAIL - HEAD)   |
+-------------------------------+
|             Flags             |
+-------------------------------+
|           [Entry 0]           |
|  offset = (FIRST - HEAD)      |
|  size = (FIRST_END - FIRST)   |
+-------------------------------+
|           [Entry 1]           |
|  offset = (SECOND - HEAD)     |
|  size = (SECOND_END - SECOND) |
+-------------------------------+
|              ...              |
+-------------------------------+
|           [Entry n]           |
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~+
| (Padding to 8-byte alignment) |
+===============================+ <-- FIRST
|       {First blob: BCC}       |
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~+ <-- FIRST_END
| (Padding to 8-byte alignment) |
+===============================+ <-- SECOND
|       {Second blob: DP}       |
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~+ <-- SECOND_END
| (Padding to 8-byte alignment) |
+===============================+
|              ...              |
+===============================+ <-- TAIL
```

The version number is encoded from a `major.minor` pair as follows:

```
((major << 16) | (minor & 0xffff))
```

It defines the format of the header (which may change between major versions),
its size and, in particular, the expected number of appended blobs. Each blob is
referred to by its position in the entry array and may be mandatory or optional
(as defined by this specification), where missing entries are denoted by a zero
size. It is therefore not allowed to trim missing optional entries from the end
of the array. The header uses the endianness of the virtual machine.

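The version encoding can be sketched as a pair of helpers. The names are hypothetical, and `major` is assumed to fit in the upper 16 bits of the 32-bit value.

```rust
/// Hypothetical helper: packs `major.minor` as `(major << 16) | (minor & 0xffff)`.
fn encode_version(major: u16, minor: u16) -> u32 {
    (u32::from(major) << 16) | (u32::from(minor) & 0xffff)
}

/// Hypothetical helper: splits an encoded version back into `(major, minor)`.
fn decode_version(version: u32) -> (u16, u16) {
    ((version >> 16) as u16, (version & 0xffff) as u16)
}
```

Version 1.0 is therefore stored as `0x0001_0000`; since the header layout may change between major versions, a reader would typically compare only the major part against what it supports.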
The header format itself is agnostic of the internal format of the individual
blobs it refers to. In version 1.0, it describes two blobs:

- entry 0 must point to a valid BCC Handover (see below)
- entry 1 may point to a [DTBO] to be applied to the pVM device tree

[header]: src/config.rs
[DTBO]: https://android.googlesource.com/platform/external/dtc/+/refs/heads/master/Documentation/dt-object-internal.txt

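A rough sketch of how a consumer could parse a version 1.x header follows. It assumes, for illustration only, that the four fixed fields and each entry's `offset` and `size` are 32-bit little-endian values (AArch64 guests being little-endian); the authoritative layout is the one in [header]. All names are hypothetical.

```rust
/// Configuration header magic; stored little-endian it reads "pvmf".
const MAGIC: u32 = 0x666d_7670;
/// Number of entries described by version 1.x of the header.
const V1_ENTRY_COUNT: usize = 2;
/// Assumed size of the fixed fields: magic, version, total size, flags.
const FIXED_FIELDS_SIZE: usize = 16;

/// Hypothetical parsed view of the configuration data.
#[derive(Debug)]
struct Config<'a> {
    version: u32,
    flags: u32,
    bcc: &'a [u8],          // entry 0: mandatory
    dtbo: Option<&'a [u8]>, // entry 1: optional (zero size when missing)
}

fn le_u32(data: &[u8], offset: usize) -> Option<u32> {
    let b = data.get(offset..offset.checked_add(4)?)?;
    Some(u32::from_le_bytes([b[0], b[1], b[2], b[3]]))
}

/// Hypothetical parser; `buf` starts at HEAD.
fn parse_config(buf: &[u8]) -> Result<Config<'_>, &'static str> {
    let field = |offset: usize| le_u32(buf, offset).ok_or("truncated header");
    if field(0)? != MAGIC {
        return Err("bad magic");
    }
    let version = field(4)?;
    if version >> 16 != 1 {
        return Err("unsupported major version");
    }
    let total_size = field(8)? as usize;
    let flags = field(12)?;
    // Offsets are relative to HEAD and blobs must end before TAIL.
    let region = buf.get(..total_size).ok_or("total size exceeds buffer")?;
    let mut blobs = [None; V1_ENTRY_COUNT];
    for (i, blob) in blobs.iter_mut().enumerate() {
        let entry = FIXED_FIELDS_SIZE + 8 * i;
        let offset = field(entry)? as usize;
        let size = field(entry + 4)? as usize;
        // A zero size marks a missing entry.
        if size != 0 {
            let end = offset.checked_add(size).ok_or("entry overflows")?;
            *blob = Some(region.get(offset..end).ok_or("entry out of bounds")?);
        }
    }
    let bcc = blobs[0].ok_or("missing mandatory BCC entry")?;
    Ok(Config { version, flags, bcc, dtbo: blobs[1] })
}
```

Rejecting an unknown major version up front reflects the rule that the header format may change between major versions.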
#### Virtual Platform Boot Certificate Chain Handover

The format of the BCC entry mentioned above, compatible with the
[`BccHandover`][BccHandover] defined by the Open Profile for DICE reference
implementation, is described by the following [CDDL][CDDL]:

```
PvmfwBccHandover = {
  1 : bstr .size 32,     ; CDI_Attest
  2 : bstr .size 32,     ; CDI_Seal
  3 : Bcc,               ; Certificate chain
}
```

and contains the _Compound Device Identifiers_ ("CDIs"), used to derive the
next-stage secret, and a certificate chain, intended for pVM attestation. Note
that it differs from the `BccHandover` defined by the specification in that its
`Bcc` field is mandatory (while optional in the original).

Devices that fully implement DICE should provide a certificate rooted at the
Unique Device Secret (UDS) in a boot stage preceding the pvmfw loader (typically
ABL), so that the loader receives a valid `BccHandover` that can be passed to
[`BccHandoverMainFlow`][BccHandoverMainFlow] along with the inputs described
below.

Otherwise, as an intermediate step towards supporting DICE throughout the
software stack of the device, incomplete implementations may root the BCC at the
pvmfw loader, using an arbitrary constant as initial CDI. The pvmfw loader can
easily do so by:

1. Building a BCC-less `BccHandover` using CBOR operations
   ([example][Trusty-BCC]) and containing the constant CDIs
1. Passing the resulting `BccHandover` to `BccHandoverMainFlow` as described
   above

The recommended DICE inputs at this stage are:

- **Code**: hash of the pvmfw image, hypervisor (`boot.img`), and other target
  code relevant to the secure execution of pvmfw (_e.g._ `vendor_boot.img`)
- **Configuration Data**: any extra input relevant to pvmfw security
- **Authority Data**: must cover all the public keys used to sign and verify the
  code contributing to the **Code** input
- **Mode Decision**: set according to the [specification][dice-mode]. In
  particular, it should only be `Normal` if secure boot is being properly
  enforced (_e.g._ locked device in [Android Verified Boot][AVB])
- **Hidden Inputs**: Factory Reset Secret (FRS, stored in tamper-evident
  storage and changed on every factory reset) or a similar value that changes as
  part of the device lifecycle (_e.g._ reset)

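As an example of the **Mode Decision** input, one possible loader-side policy is sketched below. The `BootState` fields are hypothetical stand-ins for however a device tracks its lock and debug state, and reporting every non-`Normal` case as `Debug` is an assumption; the numeric mode values are those of the Open Profile for DICE specification.

```rust
/// DICE mode values from the Open Profile for DICE specification.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
#[repr(u8)]
enum DiceMode {
    NotInitialized = 0,
    Normal = 1,
    Debug = 2,
    Maintenance = 3,
}

/// Hypothetical summary of the device state known to the pvmfw loader.
struct BootState {
    secure_boot_enforced: bool, // e.g. locked device under AVB
    debug_enabled: bool,
}

/// Hypothetical policy: `Normal` only when secure boot is enforced and no
/// debug feature is enabled; every other case is reported as `Debug`.
fn mode_decision(state: &BootState) -> DiceMode {
    if state.secure_boot_enforced && !state.debug_enabled {
        DiceMode::Normal
    } else {
        DiceMode::Debug
    }
}
```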
The resulting `BccHandover` is then used by pvmfw in a similar way to derive
another [DICE layer][Layering], passed to the guest through a `/reserved-memory`
device tree node marked as [`compatible="google,open-dice"`][dice-dt].

[AVB]: https://source.android.com/docs/security/features/verifiedboot/boot-flow
[BccHandover]: https://pigweed.googlesource.com/open-dice/+/825e3beb6c/src/android/bcc.c#260
[BccHandoverMainFlow]: https://pigweed.googlesource.com/open-dice/+/825e3beb6c/src/android/bcc.c#199
[CDDL]: https://datatracker.ietf.org/doc/rfc8610
[dice-mode]: https://pigweed.googlesource.com/open-dice/+/refs/heads/main/docs/specification.md#Mode-Value-Details
[dice-dt]: https://www.kernel.org/doc/Documentation/devicetree/bindings/reserved-memory/google%2Copen-dice.yaml
[Layering]: https://pigweed.googlesource.com/open-dice/+/refs/heads/main/docs/specification.md#layering-details
[Trusty-BCC]: https://android.googlesource.com/trusty/lib/+/1696be0a8f3a7103/lib/hwbcc/common/swbcc.c#554

#### pVM Device Tree Overlay

The config header can provide a DTBO to be overlaid on top of the baseline
device tree from crosvm.

The DTBO may contain debug policies. Debug policies MUST NOT be provided for
locked devices for security reasons.

Here is an example of a DTBO:

```
/ {
    fragment@avf {
        target-path = "/";

        __overlay__ {
            avf {
                /* your debug policy here */
            };
        };
    };
}; /* end of avf */
```

When providing a DTBO, the host bootloader should apply it to the host OS's
device tree and also pass it to `pvmfw` through its config header. Both
`virtualizationmanager` and `pvmfw` will then prepare for debugging features.

For details about device tree properties for debug policies, see
[microdroid's debugging policy guide](../microdroid/README.md#option-1-running-microdroid-on-avf-debug-policy-configured-device).

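The locked-device rule could be enforced by the loader with a check like the following hypothetical sketch. Parsing the DTBO is out of scope here, so whether it carries a debug policy is passed in as a flag; failing loudly rather than silently dropping the DTBO is a design choice made for illustration.

```rust
/// Hypothetical loader-side guard run before appending a DTBO to the pvmfw
/// configuration data (and applying it to the host device tree).
fn dtbo_to_provide<'a>(
    device_locked: bool,
    dtbo: Option<&'a [u8]>,
    dtbo_has_debug_policy: bool, // assumed to come from the loader's own DT parsing
) -> Result<Option<&'a [u8]>, &'static str> {
    match dtbo {
        // Debug policies must never be provided on a locked device.
        Some(_) if device_locked && dtbo_has_debug_policy => {
            Err("debug policies must not be provided on locked devices")
        }
        other => Ok(other),
    }
}
```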
### Platform Requirements

pvmfw is intended to run in a virtualized environment according to the `crosvm`
[memory layout][crosvm-mem] for protected VMs and so it expects to have been
loaded at address `0x7fc0_0000` and uses the 2MiB region at address
`0x7fe0_0000` as scratch memory. It makes use of the virtual PCI bus to obtain a
virtio interface to the host and prints its logs through the 16550 UART (address
`0x3f8`).

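These addresses fit together as sketched below: the 2MiB scratch region ends exactly at `0x8000_0000` (assumed here to be where guest RAM starts in the crosvm layout), which leaves 2MiB between the load address and the scratch region. The constant names, the derived `MAX_PVMFW_SIZE` and the `in_scratch` helper are illustrative, not pvmfw's own.

```rust
/// Guest physical addresses from the crosvm protected VM layout.
const PVMFW_LOAD_ADDR: u64 = 0x7fc0_0000;
const SCRATCH_ADDR: u64 = 0x7fe0_0000;
const SCRATCH_SIZE: u64 = 2 << 20; // 2MiB
/// 16550 UART used for pvmfw logs.
const UART_BASE: u64 = 0x3f8;

/// Derived, not documented: the space between the load address and the
/// scratch region, i.e. the most pvmfw and its configuration data can occupy.
const MAX_PVMFW_SIZE: u64 = SCRATCH_ADDR - PVMFW_LOAD_ADDR;

/// Hypothetical helper: whether a guest physical address is in scratch memory.
fn in_scratch(addr: u64) -> bool {
    (SCRATCH_ADDR..SCRATCH_ADDR + SCRATCH_SIZE).contains(&addr)
}
```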
At boot, pvmfw discovers the running hypervisor in order to select the
appropriate hypervisor calls to share/unshare memory, mark IPA regions as MMIO,
obtain trusted true entropy, and reboot the virtual machine. In particular, it
makes use of the following hypervisor calls:

- Arm [SMC Calling Convention][smccc] v1.1 or above:

    - `SMCCC_VERSION`
    - Vendor Specific Hypervisor Service Call UID Query

- Arm [Power State Coordination Interface][psci] v1.0 or above:

    - `PSCI_VERSION`
    - `PSCI_FEATURES`
    - `PSCI_SYSTEM_RESET`
    - `PSCI_SYSTEM_SHUTDOWN`

- Arm [True Random Number Generator Firmware Interface][smccc-trng] v1.0:

    - `TRNG_VERSION`
    - `TRNG_FEATURES`
    - `TRNG_RND`

- When running under KVM, the pKVM-specific hypervisor interface must provide:

    - `MEMINFO` (function ID `0xc6000002`)
    - `MEM_SHARE` (function ID `0xc6000003`)
    - `MEM_UNSHARE` (function ID `0xc6000004`)
    - `MMIO_GUARD_INFO` (function ID `0xc6000005`)
    - `MMIO_GUARD_ENROLL` (function ID `0xc6000006`)
    - `MMIO_GUARD_MAP` (function ID `0xc6000007`)
    - `MMIO_GUARD_UNMAP` (function ID `0xc6000008`)

[crosvm-mem]: https://crosvm.dev/book/appendix/memory_layout.html
[psci]: https://developer.arm.com/documentation/den0022
[smccc]: https://developer.arm.com/documentation/den0028
[smccc-trng]: https://developer.arm.com/documentation/den0098

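The pKVM-specific function IDs above follow the SMCCC encoding: bit 31 marks a fast call, bit 30 selects the SMC64 convention, bits 29:24 hold the Owning Entity Number (6 for Vendor Specific Hypervisor Service Calls) and the low 16 bits hold the function number. A small decoder sketch, with hypothetical names:

```rust
/// Fields of an SMCCC function ID (hypothetical representation).
#[derive(Debug, PartialEq, Eq)]
struct FunctionId {
    fast_call: bool, // bit 31
    smc64: bool,     // bit 30
    owner: u8,       // bits 29:24, Owning Entity Number
    number: u16,     // bits 15:0
}

/// Owning Entity Number of Vendor Specific Hypervisor Service Calls.
const OEN_VENDOR_HYP: u8 = 6;

fn decode_function_id(id: u32) -> FunctionId {
    FunctionId {
        fast_call: id & (1 << 31) != 0,
        smc64: id & (1 << 30) != 0,
        owner: ((id >> 24) & 0x3f) as u8,
        number: (id & 0xffff) as u16,
    }
}
```

For instance, `MEM_SHARE` (`0xc6000003`) decodes as an SMC64 fast call to the vendor hypervisor service with function number 3, whereas `PSCI_VERSION` (`0x84000000`) is an SMC32 fast call owned by the Standard Secure Service (OEN 4).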
## Booting Protected Virtual Machines

### Boot Protocol

As the hypervisor makes pvmfw the entry point of the VM, the initial value of
the registers it receives is configured by the VMM and is expected to follow the
[Linux ABI] _i.e._

- x0 = physical address of device tree blob (dtb) in system RAM.
- x1 = 0 (reserved for future use)
- x2 = 0 (reserved for future use)
- x3 = 0 (reserved for future use)

Images to be verified, which have been loaded to guest memory by the VMM prior
to booting the VM, are described to pvmfw using the device tree (x0):

- the kernel in the `/config` DT node _e.g._

    ```
    / {
        config {
            kernel-address = <0x80200000>;
            kernel-size = <0x1000000>;
        };
    };
    ```

- the (optional) ramdisk in the standard `/chosen` node _e.g._

    ```
    / {
        chosen {
            linux,initrd-start = <0x82000000>;
            linux,initrd-end = <0x82800000>;
        };
    };
    ```

[Linux ABI]: https://www.kernel.org/doc/Documentation/arm64/booting.txt

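The register and image conventions of the boot protocol suggest a few sanity checks before verification. The sketch below is hypothetical (pvmfw's actual checks may differ): it assumes the kernel and ramdisk addresses have already been read from `/config` and `/chosen`, and treats `linux,initrd-end` as exclusive.

```rust
/// Registers handed to pvmfw by the VMM, per the Linux arm64 boot ABI.
struct EntryRegs {
    x0: u64, // device tree address
    x1: u64,
    x2: u64,
    x3: u64,
}

/// Half-open guest physical range `[start, end)`.
struct Range {
    start: u64,
    end: u64,
}

impl Range {
    fn overlaps(&self, other: &Range) -> bool {
        self.start < other.end && other.start < self.end
    }
}

/// Hypothetical pre-verification checks; returns the device tree address.
fn check_entry(
    regs: &EntryRegs,
    kernel_addr: u64,
    kernel_size: u64,
    initrd: Option<(u64, u64)>, // (linux,initrd-start, linux,initrd-end)
) -> Result<u64, &'static str> {
    if regs.x1 != 0 || regs.x2 != 0 || regs.x3 != 0 {
        return Err("x1-x3 must be zero");
    }
    let end = kernel_addr.checked_add(kernel_size).ok_or("kernel range overflows")?;
    let kernel = Range { start: kernel_addr, end };
    if let Some((start, end)) = initrd {
        if end < start {
            return Err("initrd end precedes its start");
        }
        if kernel.overlaps(&Range { start, end }) {
            return Err("kernel and initrd overlap");
        }
    }
    Ok(regs.x0)
}
```

With the example nodes above, the kernel spans `0x80200000..0x81200000` and the ramdisk `0x82000000..0x82800000`, so the check passes.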
### Handover ABI

After verifying the guest kernel, pvmfw boots it using the Linux ABI described
above. It uses the device tree to pass the following:

- a reserved memory node containing the produced BCC:

    ```
    / {
        reserved-memory {
            #address-cells = <0x02>;
            #size-cells = <0x02>;
            ranges;
            dice {
                compatible = "google,open-dice";
                no-map;
                reg = <0x0 0x7fe0000>, <0x0 0x1000>;
            };
        };
    };
    ```

- the `/chosen/avf,new-instance` flag, set when pvmfw generated a new secret
  (_i.e._ the pVM instance was booted for the first time). This should be used
  by the next stages to ensure that an attacker isn't trying to force new
  secrets to be generated by one stage, in isolation.

- the `/chosen/avf,strict-boot` flag, which is always set and can be used by
  guests to enable extra validation.

### Guest Image Signing

pvmfw verifies the guest kernel image (loaded by the VMM) by re-using tools and
formats introduced by Android Verified Boot. In particular, it expects the
kernel region (see `/config/kernel-{address,size}` described above) to contain
an appended VBMeta structure, which can be generated as follows:

```
avbtool add_hash_footer --image <kernel.bin> \
    --partition_name boot \
    --dynamic_partition_size \
    --key $KEY
```

In cases where a ramdisk is required by the guest, pvmfw must also verify it. To
do so, it must be covered by a hash descriptor in the VBMeta of the kernel:

```
cp <initrd.bin> /tmp/
avbtool add_hash_footer --image /tmp/<initrd.bin> \
    --partition_name $INITRD_NAME \
    --dynamic_partition_size \
    --key $KEY
avbtool add_hash_footer --image <kernel.bin> \
    --partition_name boot \
    --dynamic_partition_size \
    --include_descriptor_from_image /tmp/<initrd.bin> \
    --key $KEY
```

Note that the `/tmp/<initrd.bin>` file is only created to temporarily hold the
hash descriptor to be added to the kernel footer and that the unsigned
`<initrd.bin>` should be passed to the VMM when booting a pVM.

The name of the AVB "partition" for the ramdisk (`$INITRD_NAME`) can be used by
the signer to specify if pvmfw must consider the guest to be debuggable
(`initrd_debug`) or not (`initrd_normal`), which will be reflected in the
certificate of the guest and will affect the secrets being provisioned.

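The mapping from ramdisk partition name to debuggability can be summarized as follows. This is a hypothetical sketch: the enum and function names are illustrative, and rejecting any other name is an assumption made here rather than documented behavior.

```rust
/// Hypothetical classification of the guest derived from the ramdisk descriptor.
#[derive(Debug, PartialEq, Eq)]
enum GuestDebuggability {
    Debuggable,
    NonDebuggable,
}

/// Maps the AVB "partition" name of the ramdisk hash descriptor to whether the
/// guest should be treated as debuggable; unknown names yield `None`.
fn debuggability(initrd_partition_name: &str) -> Option<GuestDebuggability> {
    match initrd_partition_name {
        "initrd_debug" => Some(GuestDebuggability::Debuggable),
        "initrd_normal" => Some(GuestDebuggability::NonDebuggable),
        _ => None, // assumption: any other name is rejected
    }
}
```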
If pVM guest kernels are built and/or packaged using the Android Build system,
it is recommended to perform the signing described above through an
`avb_add_hash_footer` Soong module (see [how we sign the Microdroid
kernel][soong-udroid]).

[soong-udroid]: https://cs.android.com/android/platform/superproject/+/master:packages/modules/Virtualization/microdroid/Android.bp;l=427;drc=ca0049be4d84897b8c9956924cfae506773103eb
436