# vhost-user protocol extensions: sleep/wake/snapshot/restore

WORK IN PROGRESS

Documentation for the vhost-user protocol extensions added to crosvm as part of the snapshot-restore
project. Written in the style of https://qemu-project.gitlab.io/qemu/interop/vhost-user.html so that
we can send it upstream as a proposal.

These extensions might be redundant with the VHOST_USER_PROTOCOL_F_DEVICE_STATE features recently
added to the spec.

## Protocol features

TODO: Include a protocol feature for backends to advertise snapshotting support.

## Front-end message types

### VHOST_USER_SLEEP

id: 1000 (temporary)

equivalent ioctl: N/A

request payload: N/A

reply payload: i8

Backend should stop all active queues. If the backend interacts with resources on the host, e.g. if
it writes to a socket, it is expected that all activity with those resources stops before the
VHOST_USER_SLEEP response is sent. This requirement allows other host side processes to snapshot
their own state without the risk of race conditions. For example, if a virtio-blk flushed pending
writes after VHOST_USER_SLEEP, then a disk image snapshot taken by the VMM could be missing data.

The first byte of the response should be 1 to indicate success or 0 to indicate failure.

### VHOST_USER_WAKE

id: 1001 (temporary)

equivalent ioctl: N/A

request payload: N/A

reply payload: i8

Backend should start all active queues and may restart any interactions with host side resources.

The first byte of the response should be 1 to indicate success or 0 to indicate failure.

### VHOST_USER_SNAPSHOT

id: 1002 (temporary)

equivalent ioctl: N/A

request payload: N/A

reply payload: i8, followed by (payload size - 1) bytes of opaque snapshot data

Backend should create a snapshot of all state needed to perform a restore.

The first byte of the response should be 1 to indicate success or 0 to indicate failure. The rest of
the response is the snapshot bytes, which are opaque from the perspective of the frontend.

### VHOST_USER_RESTORE

id: 1003 (temporary)

equivalent ioctl: N/A

request payload: (payload size) bytes of opaque snapshot data

reply payload: i8

Backend should restore itself to the state of the snapshot provided in the request payload. The
request will contain the exact same bytes returned from a previous VHOST_USER_SNAPSHOT request.

The frontend must send the VHOST_USER_SET_MEM_TABLE request before VHOST_USER_RESTORE so that the
backend has enough information to perform the vring restore.

The event file descriptors for adding buffers to the vrings (normally passed via
VHOST_USER_SET_VRING_KICK) are included in the ancillary data. The index of the file descriptor in
the ancillary data is the index of the queue it belongs to.

The one-byte response should be 1 to indicate success or 0 to indicate failure.

## Snapshot-Restore

TODO: write an overview for the feature

### Frontend

Snapshot sequence:

1. Frontend connects to vhost-user devices.
1. ... proceed as usual ...
1. For each vhost-user device
   - Frontend sends VHOST_USER_SLEEP request.
1. For each vhost-user device
   - Frontend sends VHOST_USER_SNAPSHOT request and saves the response payload somewhere.
1. For each vhost-user device
   - Frontend sends VHOST_USER_WAKE request.
1. ... proceed as usual ...

Restore sequence:

1. Frontend connects to vhost-user devices.
1. For each vhost-user device
   - Frontend sends VHOST_USER_SLEEP request.
1. For each vhost-user device
   - Frontend sends VHOST_USER_SET_MEM_TABLE request.
   - For every queue that was active at the time of snapshotting, frontend sends a
     VHOST_USER_SET_VRING_CALL request for that queue.
   - Frontend sends VHOST_USER_RESTORE request.
1. For each vhost-user device
   - Frontend sends VHOST_USER_WAKE request.
1. ... proceed as usual ...

### Backend

TODO: anything interesting to write here?
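
As a rough illustration of the message semantics and the frontend sequences described in this
document, the Python sketch below models each backend as an in-memory object instead of a process
behind a vhost-user socket. It is not crosvm's implementation. `ToyBackend`, `split_reply`,
`snapshot_all`, and `restore_all` are hypothetical names, the `mem_table_set` flag merely stands in
for the VHOST_USER_SET_MEM_TABLE and VHOST_USER_SET_VRING_CALL requests, and the choice to fail
VHOST_USER_SNAPSHOT while awake is an assumption of the sketch, not something the text above
requires.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Temporary message ids listed in this document.
VHOST_USER_SLEEP = 1000
VHOST_USER_WAKE = 1001
VHOST_USER_SNAPSHOT = 1002
VHOST_USER_RESTORE = 1003

SUCCESS = b"\x01"  # first reply byte 1 means success
FAILURE = b"\x00"  # first reply byte 0 means failure


@dataclass
class ToyBackend:
    """Hypothetical in-memory stand-in for a vhost-user backend."""
    state: bytes = b""           # opaque device state
    asleep: bool = False         # True between SLEEP and WAKE
    mem_table_set: bool = False  # stands in for SET_MEM_TABLE having been sent

    def handle(self, msg_id: int, payload: bytes = b"") -> bytes:
        if msg_id == VHOST_USER_SLEEP:
            self.asleep = True   # a real backend stops queues and host I/O here
            return SUCCESS
        if msg_id == VHOST_USER_WAKE:
            self.asleep = False
            return SUCCESS
        if msg_id == VHOST_USER_SNAPSHOT:
            # Assumption of this sketch: snapshotting an awake backend fails.
            return SUCCESS + self.state if self.asleep else FAILURE
        if msg_id == VHOST_USER_RESTORE:
            # The memory table must be known before a restore can succeed.
            if not self.mem_table_set:
                return FAILURE
            self.state = payload
            return SUCCESS
        return FAILURE


def split_reply(reply: bytes) -> Tuple[bool, bytes]:
    """Split a reply into (success flag, remaining opaque bytes)."""
    if not reply:
        raise ValueError("reply must contain at least the status byte")
    return reply[0] == 1, reply[1:]


def _expect_ok(reply: bytes) -> bytes:
    ok, rest = split_reply(reply)
    if not ok:
        raise RuntimeError("backend reported failure")
    return rest


def snapshot_all(backends: List[ToyBackend]) -> List[bytes]:
    """Sleep every device, then snapshot every device, then wake every device."""
    for b in backends:
        _expect_ok(b.handle(VHOST_USER_SLEEP))
    snapshots = [_expect_ok(b.handle(VHOST_USER_SNAPSHOT)) for b in backends]
    for b in backends:
        _expect_ok(b.handle(VHOST_USER_WAKE))
    return snapshots


def restore_all(backends: List[ToyBackend], snapshots: List[bytes]) -> None:
    """Sleep every device, restore each from its snapshot, then wake every device."""
    for b in backends:
        _expect_ok(b.handle(VHOST_USER_SLEEP))
    for b, snap in zip(backends, snapshots):
        b.mem_table_set = True  # placeholder for SET_MEM_TABLE / SET_VRING_CALL
        _expect_ok(b.handle(VHOST_USER_RESTORE, snap))
    for b in backends:
        _expect_ok(b.handle(VHOST_USER_WAKE))
```

Calling `snapshot_all` on a list of backends and passing the result to `restore_all` on a fresh list
should leave each new backend holding the bytes its counterpart produced. Sleeping every device
before snapshotting any of them mirrors the ordering in the sequences above.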