// Copyright 2021-2024 The Khronos Group Inc.
//
// SPDX-License-Identifier: CC-BY-4.0

= VK_EXT_host_image_copy
:toc: left
:refpage: https://www.khronos.org/registry/vulkan/specs/1.3-extensions/man/html/
:sectnums:

This document identifies inefficiencies with image data initialization and proposes an extension to improve it.

== Problem Statement

Copying data to optimal-layout images in Vulkan requires staging the data in a buffer first, and using the GPU to perform the copy.
Similarly, copying data out of an optimal-layout image requires a copy to a buffer.
This restriction can cause a number of inefficiencies in certain scenarios.

Take initializing an image for the purpose of sampling as an example, where the source of data is a file.
The application has to load the data to memory (one copy), then initialize the buffer (second copy) and finally copy over to the image (third copy).
Applications can remove one copy from the above scenario by creating and memory mapping the buffer first and loading the image data from disk directly into the buffer.
This is not always possible, for example because the streaming and graphics subsystems of a game engine are independent, or in the case of layering, because the layer is given a pointer to the data which is already loaded from disk.

The extra copy through a buffer is not just a performance cost, though.
The buffer that is allocated for the image copy is at least as big as the image itself, and lives for a short duration until the copy is confirmed to be done.
When an application performs a large number of image initializations at the same time, such as a game loading assets, it will momentarily have twice as much memory allocated for its images (the images themselves and their staging buffers), greatly increasing its peak memory usage.
This can lead to out-of-memory errors on some devices.

This document proposes an extension that allows image data to be copied from/to host memory directly, obviating the need to perform the copy through a buffer and saving memory.
While copying to an optimal layout image on the CPU has its own costs, this extension can still lead to better performance by allowing the CPU to perform some copies in parallel with the GPU.

== Proposal

An extension is proposed to address this issue.
The extension's API is designed to be similar to buffer-image and image-image copies.

Introduced by this API are:

Features, advertising whether the implementation supports host->image, image->host and image->image copies:

[source,c]
----
typedef struct VkPhysicalDeviceHostImageCopyFeaturesEXT {
    VkStructureType sType;
    void* pNext;
    VkBool32 hostImageCopy;
} VkPhysicalDeviceHostImageCopyFeaturesEXT;
----

Query of which layouts can be used in to-image and from-image copies:

[source,c]
----
typedef struct VkPhysicalDeviceHostImageCopyPropertiesEXT {
    VkStructureType sType;
    void* pNext;
    uint32_t copySrcLayoutCount;
    VkImageLayout* pCopySrcLayouts;
    uint32_t copyDstLayoutCount;
    VkImageLayout* pCopyDstLayouts;
    uint8_t optimalTilingLayoutUUID[VK_UUID_SIZE];
    VkBool32 identicalMemoryTypeRequirements;
} VkPhysicalDeviceHostImageCopyPropertiesEXT;
----

In the above, `optimalTilingLayoutUUID` can be used to ensure compatible data layouts between memory and images when using `VK_HOST_IMAGE_COPY_MEMCPY_EXT` in the below commands.
`identicalMemoryTypeRequirements` specifies whether using `VK_IMAGE_USAGE_HOST_TRANSFER_BIT_EXT` may affect the memory type requirements of the image or not.
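
As a minimal sketch of how `optimalTilingLayoutUUID` might be used, assume an application ships image data that was pre-swizzled offline and tagged with the UUID of the layout it was produced for.
Before requesting `VK_HOST_IMAGE_COPY_MEMCPY_EXT`, it can compare the stored UUID with the one reported by the device.
The helper name `layout_uuid_matches`, the asset-tagging scheme, and the local `LAYOUT_UUID_SIZE` constant (standing in for `VK_UUID_SIZE`, which is 16) are assumptions made for illustration and are not part of the extension.

[source,c]
----
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Stand-in for VK_UUID_SIZE (16) so the sketch compiles without Vulkan headers. */
#define LAYOUT_UUID_SIZE 16

/* Hypothetical helper: returns true when data tagged with asset_uuid was
 * produced for the layout the device reports in optimalTilingLayoutUUID. */
static bool layout_uuid_matches(const uint8_t asset_uuid[LAYOUT_UUID_SIZE],
                                const uint8_t device_uuid[LAYOUT_UUID_SIZE])
{
    /* UUIDs are opaque byte strings, so a bytewise comparison is sufficient. */
    return memcmp(asset_uuid, device_uuid, LAYOUT_UUID_SIZE) == 0;
}
----

When the UUIDs differ, the pre-swizzled data cannot be assumed compatible, and the application would need to supply data in the regular layout without `VK_HOST_IMAGE_COPY_MEMCPY_EXT`.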

Defining regions to copy to an image:

[source,c]
----
typedef struct VkCopyMemoryToImageInfoEXT {
    VkStructureType sType;
    void* pNext;
    VkHostImageCopyFlagsEXT flags;
    VkImage dstImage;
    VkImageLayout dstImageLayout;
    uint32_t regionCount;
    const VkMemoryToImageCopyEXT* pRegions;
} VkCopyMemoryToImageInfoEXT;
----

In the above, `flags` may be `VK_HOST_IMAGE_COPY_MEMCPY_EXT`, in which case the data in host memory should have the same swizzling layout as the image.
This is mainly useful for embedded systems where this swizzling is known and well defined outside of Vulkan.

Defining regions to copy from an image:

[source,c]
----
typedef struct VkCopyImageToMemoryInfoEXT {
    VkStructureType sType;
    void* pNext;
    VkHostImageCopyFlagsEXT flags;
    VkImage srcImage;
    VkImageLayout srcImageLayout;
    uint32_t regionCount;
    const VkImageToMemoryCopyEXT* pRegions;
} VkCopyImageToMemoryInfoEXT;
----

In the above, `flags` may be `VK_HOST_IMAGE_COPY_MEMCPY_EXT`, in which case the data in host memory will have the same swizzling layout as the image.

Defining regions to copy between images:

[source,c]
----
typedef struct VkCopyImageToImageInfoEXT {
    VkStructureType sType;
    void* pNext;
    VkHostImageCopyFlagsEXT flags;
    VkImage srcImage;
    VkImageLayout srcImageLayout;
    VkImage dstImage;
    VkImageLayout dstImageLayout;
    uint32_t regionCount;
    const VkImageCopy2* pRegions;
} VkCopyImageToImageInfoEXT;
----

In the above, `flags` may be `VK_HOST_IMAGE_COPY_MEMCPY_EXT`, in which case data is copied between images with no swizzling layout considerations.
Current limitations on source and destination images necessarily lead to raw copies between images, so this flag is currently redundant for image to image copies.

Defining the copy regions themselves:

[source,c]
----
typedef struct VkMemoryToImageCopyEXT {
    VkStructureType sType;
    void* pNext;
    const void* pHostPointer;
    uint32_t memoryRowLength;
    uint32_t memoryImageHeight;
    VkImageSubresourceLayers imageSubresource;
    VkOffset3D imageOffset;
    VkExtent3D imageExtent;
} VkMemoryToImageCopyEXT;

typedef struct VkImageToMemoryCopyEXT {
    VkStructureType sType;
    void* pNext;
    void* pHostPointer;
    uint32_t memoryRowLength;
    uint32_t memoryImageHeight;
    VkImageSubresourceLayers imageSubresource;
    VkOffset3D imageOffset;
    VkExtent3D imageExtent;
} VkImageToMemoryCopyEXT;
----

The following functions perform the actual copy:

[source,c]
----
VkResult vkCopyMemoryToImageEXT(VkDevice device, const VkCopyMemoryToImageInfoEXT* pCopyMemoryToImageInfo);
VkResult vkCopyImageToMemoryEXT(VkDevice device, const VkCopyImageToMemoryInfoEXT* pCopyImageToMemoryInfo);
VkResult vkCopyImageToImageEXT(VkDevice device, const VkCopyImageToImageInfoEXT* pCopyImageToImageInfo);
----

Images that are used by these copy commands must have the `VK_IMAGE_USAGE_HOST_TRANSFER_BIT_EXT` usage bit set.

Additionally, to avoid having to submit a command just to transition the image to the correct layout, the following function is introduced to do the layout transition on the host.
The allowed layouts are limited to serve this purpose without requiring implementations to implement complex layout transitions.

[source,c]
----
typedef struct VkHostImageLayoutTransitionInfoEXT {
    VkStructureType sType;
    void* pNext;
    VkImage image;
    VkImageLayout oldLayout;
    VkImageLayout newLayout;
    VkImageSubresourceRange subresourceRange;
} VkHostImageLayoutTransitionInfoEXT;

VkResult vkTransitionImageLayoutEXT(VkDevice device, uint32_t transitionCount, const VkHostImageLayoutTransitionInfoEXT *pTransitions);
----

The allowed values for `oldLayout` are:

- `VK_IMAGE_LAYOUT_UNDEFINED`
- `VK_IMAGE_LAYOUT_PREINITIALIZED`
- Layouts in `VkPhysicalDeviceHostImageCopyPropertiesEXT::pCopySrcLayouts`

The allowed values for `newLayout` are:

- Layouts in `VkPhysicalDeviceHostImageCopyPropertiesEXT::pCopyDstLayouts`.
  - This list always includes `VK_IMAGE_LAYOUT_GENERAL`

---

When `VK_HOST_IMAGE_COPY_MEMCPY_EXT` is used in copies to or from an image with `VK_IMAGE_TILING_OPTIMAL`, the application may need to query the memory size needed for the copy.
The link:{refpage}vkGetImageSubresourceLayout2EXT.html[vkGetImageSubresourceLayout2EXT] function can be used for this purpose:

[source,c]
----
void vkGetImageSubresourceLayout2EXT(
    VkDevice device,
    VkImage image,
    const VkImageSubresource2EXT* pSubresource,
    VkSubresourceLayout2EXT* pLayout);
----

The memory size in bytes needed for copies using `VK_HOST_IMAGE_COPY_MEMCPY_EXT` can be retrieved by chaining `VkSubresourceHostMemcpySizeEXT` to `pLayout`:

[source,c]
----
typedef struct VkSubresourceHostMemcpySizeEXT {
    VkStructureType sType;
    void* pNext;
    VkDeviceSize size;
} VkSubresourceHostMemcpySizeEXT;
----

=== Querying support

To determine if a format supports host image copies, `VK_FORMAT_FEATURE_2_HOST_IMAGE_TRANSFER_BIT_EXT` is added.

=== Required formats

All color formats that support sampling are required to support
`VK_FORMAT_FEATURE_2_HOST_IMAGE_TRANSFER_BIT_EXT`, with some exceptions for externally defined formats:

- DRM format modifiers
- Android hardware buffers

=== Limitations

Images in optimal layout are often swizzled non-linearly.
When copying between images and buffers, the GPU can perform the swizzling and address translations in hardware.
When copying between images and host memory however, the CPU needs to perform this swizzling.
As a result:

- The implementation may decide to use a simpler and less efficient layout for the image data when `VK_IMAGE_USAGE_HOST_TRANSFER_BIT_EXT` is specified.
  - If `optimalDeviceAccess` is set however (see below), the implementation indicates that the memory layout
    is equivalent to an image that does not enable `VK_IMAGE_USAGE_HOST_TRANSFER_BIT_EXT` from a performance perspective
    and applications can assume that host image copy is just as efficient as using device copies for resources which are
    accessed many times on device.
  - Equivalent performance is only expected within a specific memory type however.
    On a discrete GPU for example, non-device local memory is expected to be slower to access than device-local memory.
- The copy on the CPU may indeed be slower than the double-copy through a buffer due to the above swizzling logic.

Additionally, to perform the copy, the implementation must be able to map the image's memory, which may limit the memory type the image can be allocated from.

It is therefore recommended that developers measure performance and decide whether this extension results in a performance gain or loss in their application.
Unless specifically recommended on a platform, it is _not_ generally recommended for applications to perform all image copies through this extension.
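
To illustrate why host-side swizzling costs more than a plain `memcpy`, the following sketch copies a row-major image into a Z-order (Morton) arrangement, computing a destination index for every texel.
Actual optimal-tiling layouts are implementation-specific and opaque; Z-order is used here only as a well-known stand-in, and the function names, the 32-bit texel size, and the square power-of-two image are assumptions made for illustration.

[source,c]
----
#include <stddef.h>
#include <stdint.h>

/* Spread the low 16 bits of v so that bit i moves to bit 2*i. */
static uint32_t spread_bits16(uint32_t v)
{
    v &= 0x0000FFFFu;
    v = (v | (v << 8)) & 0x00FF00FFu;
    v = (v | (v << 4)) & 0x0F0F0F0Fu;
    v = (v | (v << 2)) & 0x33333333u;
    v = (v | (v << 1)) & 0x55555555u;
    return v;
}

/* Z-order index of texel (x, y): x bits in even positions, y bits in odd ones. */
static uint32_t morton_index(uint32_t x, uint32_t y)
{
    return spread_bits16(x) | (spread_bits16(y) << 1);
}

/* Copy a tightly packed, row-major image of 32-bit texels into a Z-order
 * destination. Assumes a square image whose side is a power of two, so the
 * Morton indices exactly cover [0, size * size). */
static void copy_linear_to_morton(const uint32_t *src, uint32_t *dst, uint32_t size)
{
    for (uint32_t y = 0; y < size; ++y) {
        for (uint32_t x = 0; x < size; ++x) {
            /* One index computation per texel instead of one memcpy per row. */
            dst[morton_index(x, y)] = src[(size_t)y * size + x];
        }
    }
}
----

In this sketch neighbouring texels of a source row land at scattered destination indices, which is the kind of per-texel address translation that the GPU performs in hardware and the CPU has to perform in software.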

=== Querying performance characteristics

[source,c]
----
typedef struct VkHostImageCopyDevicePerformanceQueryEXT {
    VkStructureType sType;
    void* pNext;
    VkBool32 optimalDeviceAccess;
    VkBool32 identicalMemoryLayout;
} VkHostImageCopyDevicePerformanceQueryEXT;
----

This struct can be chained as an output struct in `vkGetPhysicalDeviceImageFormatProperties2`.
Given certain image creation flags, it is important for applications to know if using `VK_IMAGE_USAGE_HOST_TRANSFER_BIT_EXT`
has an adverse effect on device performance.

This query cannot be a format feature flag, since image creation information can affect this query.
For example, an image that is only created with `VK_IMAGE_USAGE_SAMPLED_BIT` and `VK_IMAGE_USAGE_TRANSFER_DST_BIT`
might not have compression at all on some implementations, but adding `VK_IMAGE_USAGE_COLOR_ATTACHMENT_BIT` would change this query.
Other implementations may want to use compression even for `VK_IMAGE_USAGE_TRANSFER_DST_BIT`.

`identicalMemoryLayout` is intended for the gray area where the image is just swizzled in a slightly different pattern to aid host access,
but is fundamentally similar to non-host image copy paths, such that it is unlikely that performance changes in any meaningful way
except in pathological situations.
The inclusion of this field gives more leeway to implementations that would like to
set `optimalDeviceAccess` for an image without having to guarantee 100% identical memory layout, and allows applications to choose host image copies
in that case, knowing that performance is not sacrificed.

As a baseline, block-compressed formats are required to set `optimalDeviceAccess` to `VK_TRUE`.

== Issues

=== RESOLVED: Should other layouts be allowed in `VkHostImageLayoutTransitionInfoEXT`?

Specifying `VK_IMAGE_USAGE_HOST_TRANSFER_BIT_EXT` effectively puts the image in a physical layout where `VK_IMAGE_LAYOUT_GENERAL` performs similarly to the `OPTIMAL` layouts for that image.
Therefore, it was deemed unnecessary to allow other layouts, as they provide no performance benefit.
In practice, especially for read-only textures, a host-transferred image in the `VK_IMAGE_LAYOUT_GENERAL` layout could be just as efficient as an image transitioned to `VK_IMAGE_LAYOUT_SHADER_READ_ONLY_OPTIMAL`.
`VkHostImageCopyDevicePerformanceQueryEXT` can be used to query whether using `VK_IMAGE_USAGE_HOST_TRANSFER_BIT_EXT` can be detrimental to performance.
If it is, performance measurements are recommended to ensure the gains from this extension outweigh the potential losses.

=== RESOLVED: Should queue family ownership transfers be supported on the host as well?

As long as the allowed layouts are limited to the ones specified above, the actual physical layout of the image will not vary between queue families, and so queue family ownership transfers are currently unnecessary.