// Copyright 2021-2024 The Khronos Group Inc.
//
// SPDX-License-Identifier: CC-BY-4.0

= VK_EXT_host_image_copy
:toc: left
:refpage: https://www.khronos.org/registry/vulkan/specs/1.3-extensions/man/html/
:sectnums:

This document identifies inefficiencies in image data initialization and proposes an extension to address them.

== Problem Statement

Copying data to optimal-layout images in Vulkan requires staging the data in a buffer first, and using the GPU to perform the copy.
Similarly, copying data out of an optimal-layout image requires a copy to a buffer.
This restriction can cause a number of inefficiencies in certain scenarios.

Take initializing an image for the purpose of sampling as an example, where the source of data is a file.
The application has to load the data to memory (one copy), then initialize the buffer (second copy) and finally copy over to the image (third copy).
Applications can remove one copy from the above scenario by creating and memory mapping the buffer first and loading the image data from disk directly into the buffer.
This is not always possible, for example because the streaming and graphics subsystems of a game engine are independent, or in the case of layering, because the layer is given a pointer to the data which is already loaded from disk.

The extra copy through a buffer is not just a performance cost, though.
The buffer that is allocated for the image copy is at least as big as the image itself, and lives for a short duration until the copy is confirmed to be done.
When an application performs a large number of image initializations at the same time, such as a game loading assets, it will momentarily have twice as much memory allocated for its images (the images themselves and their staging buffers), greatly increasing its peak memory usage.
This can lead to out-of-memory errors on some devices.
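
As a rough illustration of the arithmetic above, the following sketch compares peak memory when every image has a live staging buffer against copying from host memory directly. The function names are hypothetical, only image and staging allocations are counted, and each staging buffer is assumed to be exactly as large as its image (the text only says "at least as big").

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: total bytes occupied by the images themselves. */
uint64_t total_image_bytes(const uint64_t *image_sizes, size_t count)
{
    assert(image_sizes != NULL || count == 0);
    uint64_t total = 0;
    for (size_t i = 0; i < count; ++i) {
        total += image_sizes[i];
    }
    return total;
}

/* Peak usage when all staging buffers are alive at the same time.
 * Assumption: each staging buffer is exactly the size of its image. */
uint64_t peak_bytes_with_staging(const uint64_t *image_sizes, size_t count)
{
    return 2 * total_image_bytes(image_sizes, count);
}

/* Peak usage when data is copied from host memory straight into the images. */
uint64_t peak_bytes_with_host_copy(const uint64_t *image_sizes, size_t count)
{
    return total_image_bytes(image_sizes, count);
}
```

For a batch of 4 MiB, 16 MiB and 1 MiB images, the staging path peaks at 42 MiB under these assumptions, against 21 MiB without staging buffers.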

This document proposes an extension that allows image data to be copied from/to host memory directly, obviating the need to perform the copy through a buffer and saving memory.
While copying to an optimal layout image on the CPU has its own costs, this extension can still lead to better performance by allowing the CPU to perform some copies in parallel with the GPU.

== Proposal

An extension is proposed to address this issue.
The extension's API is designed to be similar to buffer-image and image-image copies.

Introduced by this API are:

Features, advertising whether the implementation supports host->image, image->host and image->image copies:

[source,c]
----
typedef struct VkPhysicalDeviceHostImageCopyFeaturesEXT {
    VkStructureType    sType;
    void*              pNext;
    VkBool32           hostImageCopy;
} VkPhysicalDeviceHostImageCopyFeaturesEXT;
----

Query of which layouts can be used in to-image and from-image copies:

[source,c]
----
typedef struct VkPhysicalDeviceHostImageCopyPropertiesEXT {
    VkStructureType    sType;
    void*              pNext;
    uint32_t           copySrcLayoutCount;
    VkImageLayout*     pCopySrcLayouts;
    uint32_t           copyDstLayoutCount;
    VkImageLayout*     pCopyDstLayouts;
    uint8_t            optimalTilingLayoutUUID[VK_UUID_SIZE];
    VkBool32           identicalMemoryTypeRequirements;
} VkPhysicalDeviceHostImageCopyPropertiesEXT;
----

In the above, `optimalTilingLayoutUUID` can be used to ensure compatible data layouts between memory and images when using `VK_HOST_IMAGE_COPY_MEMCPY_EXT` in the commands below.
`identicalMemoryTypeRequirements` specifies whether using `VK_IMAGE_USAGE_HOST_TRANSFER_BIT_EXT` leaves the memory type requirements of the image unaffected (`VK_TRUE`) or may change them (`VK_FALSE`).
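
One way to use `optimalTilingLayoutUUID` is to store it next to data that was saved in the implementation's swizzled layout, and to compare it before reusing that data with `VK_HOST_IMAGE_COPY_MEMCPY_EXT`. The sketch below assumes such a cache exists: `CachedImageBlob` and `cached_blob_is_compatible` are hypothetical names, and `UUID_SIZE` stands in for `VK_UUID_SIZE` (16) so that the example compiles without the Vulkan headers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define UUID_SIZE 16 /* stand-in for VK_UUID_SIZE */

/* Hypothetical header stored alongside swizzled texel data. */
typedef struct CachedImageBlob {
    uint8_t  layout_uuid[UUID_SIZE]; /* optimalTilingLayoutUUID at save time */
    uint64_t data_size;              /* bytes of swizzled data that follow */
} CachedImageBlob;

/* Returns true if the cached data was saved under the same
 * optimalTilingLayoutUUID as the one the current device reports. */
bool cached_blob_is_compatible(const CachedImageBlob *blob,
                               const uint8_t device_uuid[UUID_SIZE])
{
    assert(blob != NULL && device_uuid != NULL);
    return memcmp(blob->layout_uuid, device_uuid, UUID_SIZE) == 0;
}
```

If the comparison fails, the application would presumably fall back to a regular (non-memcpy) copy from linear source data.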

Defining regions to copy to an image:

[source,c]
----
typedef struct VkCopyMemoryToImageInfoEXT {
    VkStructureType               sType;
    void*                         pNext;
    VkHostImageCopyFlagsEXT       flags;
    VkImage                       dstImage;
    VkImageLayout                 dstImageLayout;
    uint32_t                      regionCount;
    const VkMemoryToImageCopyEXT* pRegions;
} VkCopyMemoryToImageInfoEXT;
----

In the above, `flags` may be `VK_HOST_IMAGE_COPY_MEMCPY_EXT`, in which case the data in host memory should have the same swizzling layout as the image.
This is mainly useful for embedded systems where this swizzling is known and well defined outside of Vulkan.

Defining regions to copy from an image:

[source,c]
----
typedef struct VkCopyImageToMemoryInfoEXT {
    VkStructureType               sType;
    void*                         pNext;
    VkHostImageCopyFlagsEXT       flags;
    VkImage                       srcImage;
    VkImageLayout                 srcImageLayout;
    uint32_t                      regionCount;
    const VkImageToMemoryCopyEXT* pRegions;
} VkCopyImageToMemoryInfoEXT;
----

In the above, `flags` may be `VK_HOST_IMAGE_COPY_MEMCPY_EXT`, in which case the data in host memory will have the same swizzling layout as the image.

Defining regions to copy between images:

[source,c]
----
typedef struct VkCopyImageToImageInfoEXT {
    VkStructureType               sType;
    void*                         pNext;
    VkHostImageCopyFlagsEXT       flags;
    VkImage                       srcImage;
    VkImageLayout                 srcImageLayout;
    VkImage                       dstImage;
    VkImageLayout                 dstImageLayout;
    uint32_t                      regionCount;
    const VkImageCopy2*           pRegions;
} VkCopyImageToImageInfoEXT;
----

In the above, `flags` may be `VK_HOST_IMAGE_COPY_MEMCPY_EXT`, in which case data is copied between images with no swizzling layout considerations.
Current limitations on source and destination images necessarily lead to raw copies between images, so this flag is currently redundant for image to image copies.

Defining the copy regions themselves:

[source,c]
----
typedef struct VkMemoryToImageCopyEXT {
    VkStructureType             sType;
    void*                       pNext;
    const void*                 pHostPointer;
    uint32_t                    memoryRowLength;
    uint32_t                    memoryImageHeight;
    VkImageSubresourceLayers    imageSubresource;
    VkOffset3D                  imageOffset;
    VkExtent3D                  imageExtent;
} VkMemoryToImageCopyEXT;

typedef struct VkImageToMemoryCopyEXT {
    VkStructureType             sType;
    void*                       pNext;
    void*                       pHostPointer;
    uint32_t                    memoryRowLength;
    uint32_t                    memoryImageHeight;
    VkImageSubresourceLayers    imageSubresource;
    VkOffset3D                  imageOffset;
    VkExtent3D                  imageExtent;
} VkImageToMemoryCopyEXT;
----
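
The host-side addressing of a region is not spelled out in this document. Assuming it follows the buffer-image copy rules that this API is modeled on (a `memoryRowLength` or `memoryImageHeight` of zero means tightly packed according to `imageExtent`), the byte offset of a texel relative to `pHostPointer` could be computed as sketched below. The sketch covers uncompressed formats in copies that do not use `VK_HOST_IMAGE_COPY_MEMCPY_EXT`; `HostRegion` is a hypothetical stand-in for the relevant fields, and `texel_size` is supplied by the caller.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the addressing fields of VkMemoryToImageCopyEXT. */
typedef struct HostRegion {
    uint32_t memoryRowLength;   /* texels per row in host memory, 0 = tight */
    uint32_t memoryImageHeight; /* rows per slice in host memory, 0 = tight */
    uint32_t extentWidth;       /* imageExtent.width */
    uint32_t extentHeight;      /* imageExtent.height */
    uint32_t extentDepth;       /* imageExtent.depth */
} HostRegion;

/* Byte offset from pHostPointer of texel (x, y, z) inside the region,
 * for an uncompressed format with texel_size bytes per texel. */
uint64_t host_texel_offset(const HostRegion *r, uint32_t texel_size,
                           uint32_t x, uint32_t y, uint32_t z)
{
    assert(x < r->extentWidth && y < r->extentHeight && z < r->extentDepth);
    uint64_t row_length = r->memoryRowLength ? r->memoryRowLength : r->extentWidth;
    uint64_t image_height = r->memoryImageHeight ? r->memoryImageHeight : r->extentHeight;
    return ((z * image_height + y) * row_length + x) * texel_size;
}

/* Smallest host allocation covering the region: the offset of the last
 * texel plus the size of that texel. */
uint64_t host_region_size(const HostRegion *r, uint32_t texel_size)
{
    return host_texel_offset(r, texel_size, r->extentWidth - 1,
                             r->extentHeight - 1, r->extentDepth - 1) + texel_size;
}
```

For a tightly packed 4x4 region of 4-byte texels this gives 64 bytes, while a `memoryRowLength` of 8 grows the required size to 112 bytes because each row is padded in host memory.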

The following functions perform the actual copy:

[source,c]
----
VkResult vkCopyMemoryToImageEXT(VkDevice device, const VkCopyMemoryToImageInfoEXT* pCopyMemoryToImageInfo);
VkResult vkCopyImageToMemoryEXT(VkDevice device, const VkCopyImageToMemoryInfoEXT* pCopyImageToMemoryInfo);
VkResult vkCopyImageToImageEXT(VkDevice device, const VkCopyImageToImageInfoEXT* pCopyImageToImageInfo);
----

Images that are used by these copy commands must have the `VK_IMAGE_USAGE_HOST_TRANSFER_BIT_EXT` usage bit set.

Additionally, to avoid having to submit a command just to transition the image to the correct layout, the following function is introduced to do the layout transition on the host.
The allowed layouts are limited to serve this purpose without requiring implementations to implement complex layout transitions.

[source,c]
----
typedef struct VkHostImageLayoutTransitionInfoEXT {
    VkStructureType            sType;
    void*                      pNext;
    VkImage                    image;
    VkImageLayout              oldLayout;
    VkImageLayout              newLayout;
    VkImageSubresourceRange    subresourceRange;
} VkHostImageLayoutTransitionInfoEXT;

VkResult vkTransitionImageLayoutEXT(VkDevice device, uint32_t transitionCount, const VkHostImageLayoutTransitionInfoEXT *pTransitions);
----

The allowed values for `oldLayout` are:

- `VK_IMAGE_LAYOUT_UNDEFINED`
- `VK_IMAGE_LAYOUT_PREINITIALIZED`
- Layouts in `VkPhysicalDeviceHostImageCopyPropertiesEXT::pCopySrcLayouts`

The allowed values for `newLayout` are:

- Layouts in `VkPhysicalDeviceHostImageCopyPropertiesEXT::pCopyDstLayouts`.
  - This list always includes `VK_IMAGE_LAYOUT_GENERAL`

---

When `VK_HOST_IMAGE_COPY_MEMCPY_EXT` is used in copies to or from an image with `VK_IMAGE_TILING_OPTIMAL`, the application may need to query the memory size needed for the copy.
The link:{refpage}vkGetImageSubresourceLayout2EXT.html[vkGetImageSubresourceLayout2EXT] function can be used for this purpose:

[source,c]
----
void vkGetImageSubresourceLayout2EXT(
    VkDevice                       device,
    VkImage                        image,
    const VkImageSubresource2EXT*  pSubresource,
    VkSubresourceLayout2EXT*       pLayout);
----

The memory size in bytes needed for copies using `VK_HOST_IMAGE_COPY_MEMCPY_EXT` can be retrieved by chaining `VkSubresourceHostMemcpySizeEXT` to `pLayout`:

[source,c]
----
typedef struct VkSubresourceHostMemcpySizeEXT {
    VkStructureType            sType;
    void*                      pNext;
    VkDeviceSize               size;
} VkSubresourceHostMemcpySizeEXT;
----

=== Querying support

To determine if a format supports host image copies, `VK_FORMAT_FEATURE_2_HOST_IMAGE_TRANSFER_BIT_EXT` is added.

=== Required formats

All color formats that support sampling are required to support
`VK_FORMAT_FEATURE_2_HOST_IMAGE_TRANSFER_BIT_EXT`, with some exceptions for externally defined formats:

- DRM format modifiers
- Android hardware buffers

=== Limitations

Images in optimal layout are often swizzled non-linearly.
When copying between images and buffers, the GPU can perform the swizzling and address translations in hardware.
When copying between images and host memory, however, the CPU needs to perform this swizzling.
As a result:

- The implementation may decide to use a simpler and less efficient layout for the image data when `VK_IMAGE_USAGE_HOST_TRANSFER_BIT_EXT` is specified.
  - If `optimalDeviceAccess` is set, however (see below), the implementation indicates that the memory layout
    is equivalent to an image that does not enable `VK_IMAGE_USAGE_HOST_TRANSFER_BIT_EXT` from a performance perspective
    and applications can assume that host image copy is just as efficient as using device copies for resources which are
    accessed many times on device.
  - Equivalent performance is only expected within a specific memory type, however.
    On a discrete GPU for example, non-device-local memory is expected to be slower to access than device-local memory.
- The copy on the CPU may indeed be slower than the double-copy through a buffer due to the above swizzling logic.

Additionally, to perform the copy, the implementation must be able to map the image's memory, which may limit the memory types the image can be allocated from.

It is therefore recommended that developers measure performance and decide whether this extension results in a performance gain or loss in their application.
Unless specifically recommended on a platform, it is _not_ generally recommended for applications to perform all image copies through this extension.

=== Querying performance characteristics

[source,c]
----
typedef struct VkHostImageCopyDevicePerformanceQueryEXT {
    VkStructureType    sType;
    void*              pNext;
    VkBool32           optimalDeviceAccess;
    VkBool32           identicalMemoryLayout;
} VkHostImageCopyDevicePerformanceQueryEXT;
----

This struct can be chained as an output struct in `vkGetPhysicalDeviceImageFormatProperties2`.
Given certain image creation flags, it is important for applications to know if using `VK_IMAGE_USAGE_HOST_TRANSFER_BIT_EXT`
has an adverse effect on device performance.

This query cannot be a format feature flag, since image creation information can affect this query.
For example, an image that is only created with `VK_IMAGE_USAGE_SAMPLED_BIT` and `VK_IMAGE_USAGE_TRANSFER_DST_BIT`
might not have compression at all on some implementations, but adding `VK_IMAGE_USAGE_COLOR_ATTACHMENT_BIT` would change this query.
Other implementations may want to use compression even for `VK_IMAGE_USAGE_TRANSFER_DST_BIT`.

`identicalMemoryLayout` is intended for the gray area where the image is just swizzled in a slightly different pattern to aid host access,
but is fundamentally similar to non-host image copy paths, such that performance is unlikely to change in any meaningful way
except in pathological situations.
The inclusion of this field gives more leeway to implementations that would like to
set `optimalDeviceAccess` for an image without having to guarantee 100% identical memory layout, and allows applications to choose host image copies
in that case, knowing that performance is not sacrificed.

As a baseline, block-compressed formats are required to set `optimalDeviceAccess` to `VK_TRUE`.
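
An application might fold the two query results into an upload policy along the following lines. This is a sketch of one possible policy rather than a recommendation from this proposal: `PerfQuery`, `UploadPath` and `choose_upload_path` are hypothetical names, and the `bool` members stand in for the `VkBool32` fields of `VkHostImageCopyDevicePerformanceQueryEXT`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical mirror of the two query results. */
typedef struct PerfQuery {
    bool optimalDeviceAccess;
    bool identicalMemoryLayout;
} PerfQuery;

typedef enum UploadPath {
    UPLOAD_HOST_COPY,    /* host-transfer usage is not expected to slow device access */
    UPLOAD_MEASURE_FIRST /* device access may suffer: profile, or keep the staging path */
} UploadPath;

UploadPath choose_upload_path(const PerfQuery *query)
{
    assert(query != NULL);
    /* Either an identical layout or one reported as equally fast on the
     * device means the usage bit should not cost device performance. */
    if (query->identicalMemoryLayout || query->optimalDeviceAccess) {
        return UPLOAD_HOST_COPY;
    }
    return UPLOAD_MEASURE_FIRST;
}
```

The policy only reflects device-side access cost. Whether the CPU-side copy itself beats a staged copy still has to be measured, as the Limitations section notes.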
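
== Issues

=== RESOLVED: Should other layouts be allowed in `VkHostImageLayoutTransitionInfoEXT`?

Specifying `VK_IMAGE_USAGE_HOST_TRANSFER_BIT_EXT` effectively puts the image in a physical layout where `VK_IMAGE_LAYOUT_GENERAL` performs similarly to the `OPTIMAL` layouts for that image.
Therefore, it was deemed unnecessary to allow other layouts, as they provide no performance benefit.
In practice, especially for read-only textures, a host-transferred image in the `VK_IMAGE_LAYOUT_GENERAL` layout could be just as efficient as an image transitioned to `VK_IMAGE_LAYOUT_SHADER_READ_ONLY_OPTIMAL`.
`VkHostImageCopyDevicePerformanceQueryEXT` can be used to query whether using `VK_IMAGE_USAGE_HOST_TRANSFER_BIT_EXT` can be detrimental to performance.
If it is, performance measurements are recommended to ensure the gains from this extension outweigh the potential losses.

=== RESOLVED: Should queue family ownership transfers be supported on the host as well?

As long as the allowed layouts are limited to the ones specified above, the actual physical layout of the image will not vary between queue families, and so queue family ownership transfers are currently unnecessary.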
285If it is, performance measurements are recommended to ensure the gains from this extension outperform the potential losses.
286
287=== RESOLVED: Should queue family ownership transfers be supported on the host as well?
288
289As long as the allowed layouts are limited to the ones specified above, the actual physical layout of the image will not vary between queue families, and so queue family ownership transfers are currently unnecessary.
290