// Copyright 2021-2024 The Khronos Group Inc.
//
// SPDX-License-Identifier: CC-BY-4.0

= VK_EXT_descriptor_buffer
:toc: left
:refpage: https://registry.khronos.org/vulkan/specs/1.3-extensions/man/html/
:sectnums:

This document outlines a proposal to make the management of descriptor memory more explicit, allowing descriptors to be present in buffer memory so that the data and memory can be managed alongside other buffer objects.

== Problem Statement

With more “bindless” models of descriptor management, applications are ever increasing the number of descriptors that end up in descriptor sets.
Managing allocations this large, and ensuring they end up in device local memory for fast access, is becoming an increasingly awkward problem to manage in the driver.
Developers moving to Vulkan are starting to hit bottlenecks that they simply don’t encounter on other platforms.

In other scenarios, making sure descriptors *do not* end up in device memory is important.
Copying descriptors in Vulkan is considered rather esoteric, but it is a fairly common strategy in other APIs and implementing a similar style in Vulkan can lead to problems.
There is no hint to let an implementation know that a descriptor set will only be used for purposes of copying (i.e. staging buffer).
If a descriptor set is mapped to device local memory (BAR) or uncached memory, reading from the descriptor set on the host can have a catastrophic effect on performance.
On top of this, some applications rely on being able to copy tens of thousands of individual descriptors every frame.
The overhead to set up this many calls to `vkUpdateDescriptorSets` is not ideal.

In contrast to this, developers are managing uploads for other large resources (e.g. images, buffers) in application code and generally doing a good job of it – typically this is not identified as a problem area.
Developers approaching Vulkan are often confused by the way in which descriptor pools work - and several have made requests to manage things more explicitly.
The key things that we’ve had requests for are (relevant Vulkan issues in brackets):

 * Explicit allocation management
 * Better mapping to DirectX 12
 * Host-only descriptor pools
 * GPU descriptor updates

== Solution Space

There are several more-or-less invasive options that could work here:

 . Add relevant flags and other information to descriptor pools
 . Like 1, but enable memory binding for descriptor pools
 . Bypass descriptor pools, and allow direct creation and memory binding for descriptor sets
 . Bypass descriptor sets, and use descriptor set layouts in buffers
 . Bypass descriptor set layouts, and use blobs of memory in buffers that shaders access with explicit layouts

link:{refpage}VK_VALVE_mutable_descriptor_type.html[VK_VALVE_mutable_descriptor_type] includes support for option 1,
through the use of `VK_DESCRIPTOR_SET_LAYOUT_CREATE_HOST_ONLY_POOL_BIT_VALVE` and `VK_DESCRIPTOR_POOL_CREATE_HOST_ONLY_BIT_VALVE`.
However, this does not fully solve the problem of memory management since we can only *avoid* allocating device memory for descriptors.
Being able to control where shader-accessible descriptors are allocated is still unavailable to applications.

Option 2 attempts to redefine what a descriptor pool is, and it would seem like a very awkward abstraction.
The whole point of the descriptor pool is to allocate and manage memory on the behalf of the application.

Options 3 and 4 are similarly invasive, but move descriptor pools out of the way, making things a lot clearer.
The major downside to this is that it potentially blocks out older implementations; however this is likely the same set of implementations that wouldn’t see a benefit from this proposal anyway (i.e. “non-bindless” hardware).

Option 4 has the advantage of having a smaller surface area than option 3 and allows applications to use existing buffer management functions in both Vulkan and in their own code.
Being able to use buffers directly means that applications are in control of where the memory is allocated and can control if memory is:

 * Host-only (plain malloc)
 * Host-only but shader-visible (`VkDeviceMemory` with `HOST_VISIBLE_BIT`)
 * Device local and shader-visible (resizable BAR on discrete GPUs, unified memory on integrated)
 * Device local only (GPU copies descriptors)

Option 5 is more invasive than Option 4 and requires shader-side changes.

In order to keep the required changes in this extension to the API only, the extra steps in Option 5 are deferred to a future planned extension, and this proposal focuses on Option 4.


== Proposal

=== Modelling a descriptor set as memory

Descriptors in Vulkan as it stands are generally considered quite abstract.
They do not have a size, and when creating descriptor pools it is only specified how many descriptors can be allocated.

This abstraction is removed by the proposal and it assumes that a `VkDescriptorSetLayout` can be expressed as a list of binding points with a known:

 * Byte offset
 * Element size
 * Number of elements tightly packed

The element size depends on the descriptor type and is a property of the physical device.

Implementations are free to control the byte offset, and so can freely repack descriptors for optimal memory access.
For exact control over byte offsets for different descriptors, descriptor indexing should be used, since arrays have guaranteed packing.
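
The model above can be sketched in plain C. This is only an illustration: the `BindingSlot` struct and both helpers are hypothetical names rather than part of the extension, and in practice the offsets and sizes would be reported by the implementation, not chosen by the application.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of one binding in a descriptor set layout.
 * None of these names belong to the Vulkan API. */
typedef struct BindingSlot {
    uint64_t byte_offset;   /* chosen by the implementation */
    uint64_t element_size;  /* depends on descriptor type and device */
    uint32_t element_count; /* tightly packed array elements */
} BindingSlot;

/* Byte offset of array element `index` inside the set's memory,
 * relying on the guaranteed tight packing of arrays. */
static uint64_t binding_element_offset(const BindingSlot *slot, uint32_t index)
{
    assert(index < slot->element_count);
    return slot->byte_offset + (uint64_t)index * slot->element_size;
}

/* Lower bound on the set size: the furthest byte touched by any binding.
 * A real implementation may report a larger size (for example, padding). */
static uint64_t layout_size(const BindingSlot *slots, size_t slot_count)
{
    uint64_t end = 0;
    for (size_t i = 0; i < slot_count; ++i) {
        uint64_t slot_end = slots[i].byte_offset +
                            slots[i].element_size * slots[i].element_count;
        if (slot_end > end) {
            end = slot_end;
        }
    }
    return end;
}
```

Note that the bindings are not assumed to be stored in binding-number order, which reflects the freedom implementations have to repack them.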

If we think in terms of `VkDescriptorPool` with this model, an implementation of that could be something like an arena allocator where size is derived from the descriptor counts,
and a `VkDescriptorSet` with `VkDescriptorSetLayout` just allocates a certain number of bytes from the pool.
This is essentially the same model as `VkBuffer` and `VkImage` allocation.

When we call `vkCmdBindDescriptorSets`, what we are really doing is binding a buffer of a certain size.
The shader compiler looks at `VkPipelineLayout` and based on the `DescriptorSet` and `Binding` decorations, it can look up that a descriptor can be read from the bound descriptor set at a specific offset.

As link:{refpage}VK_EXT_descriptor_indexing.html[VK_EXT_descriptor_indexing] is required, its descriptor limits apply.

==== Next level update-after-bind

With descriptors being modelled as buffer memory, we remove all pretense of the implementation being able to consume descriptors when recording the command buffer.
In the Vulkan 1.0 descriptor model, descriptors must be valid when descriptor sets are bound and remain valid, which means implementations are free to consume the descriptors, repack them, and so on if they desire.
With descriptor indexing, the `UPDATE_AFTER_BIND_BIT` and `PARTIALLY_BOUND_BIT` flags imply a buffer-like model where descriptors must not be consumed unless dynamically used by shaders.
With descriptor buffers, this model is implied and it is not allowed to specify a descriptor set layout being both update-after-bind and descriptor buffer capable.

As descriptors can be updated in the GPU timeline, descriptor buffers go a bit further than update-after-bind.
In the existing update-after-bind model, descriptors can only be consumed correctly if they were written before queue submits.

==== Dropping support for abstract descriptor types

Some descriptor types are a bit more abstract in nature.
Dynamic uniform buffers and dynamic storage buffers, for example, have a component that does not consume descriptor memory and instead functions more like push constants.
Descriptor types which cannot be expressed in terms of descriptors in memory are not supported with descriptor buffers,
but rapidly changing descriptors can be replaced with existing alternatives such as:

 * Push constants
 * Buffer device addresses placed in push constants
 * Push descriptors

Update-after-bind has similar restrictions already.

==== One buffer, many offsets

While binding descriptor sets as memory is possible on a wide range of hardware, descriptors are still considered "special" memory by many implementations, and it may not be possible to bind many different buffers at the same time.
Some possible restrictions can be:

 * Limited address space for descriptors
 * Descriptor sets are accessed with offset from one or more base pointers

In Vulkan, applications are guaranteed at least 4 descriptor sets, but many implementations go beyond this.
At the same time, it might not be possible to bind that many different descriptor buffers.

In D3D12 for example, this problem manifests itself as `ID3D12GraphicsCommandList::SetDescriptorHeaps()`.

Similarly, this extension will work on a model where applications allocate large descriptor buffers, and bind those buffers to the command buffer.
From there, descriptor sets are expressed as offsets into the bound buffers.

It is expected that changing a descriptor buffer binding is a fairly heavy operation on some implementations and should be avoided.
Changing offsets, however, is very efficient.

A limited address space can be expressed with special memory types that allocate from a dedicated address space region.

==== No mixing and matching descriptor buffers and older model

The implication of descriptor buffers is that applications will now take more control over which descriptor buffers are bound to a command buffer.
Without descriptor buffers, this is something implementations were able to hide from applications, so it is not possible to mix and match these models in one draw or dispatch.
It is possible to mix and match the two models in different draws or dispatches, but it is equivalent to changing the descriptor buffer bindings and should be avoided if possible.

In terms of state invalidation, whenever a descriptor buffer offset is bound, it invalidates all bindings for descriptor sets and vice versa.

=== Putting Descriptors in Memory

This extension introduces new commands to put shader-accessible descriptors directly in memory.
Properties of descriptor set layouts may vary based on enabled device features, so new device-level functions are added to query the properties of layouts.
These calls are invariant across the lifetime of the device, and between link:{refpage}VkDevice.html[VkDevice] objects created from the same physical device(s), with the same creation parameters.

[source,c]
----
void vkGetDescriptorSetLayoutSizeEXT(
    VkDevice                device,
    VkDescriptorSetLayout   layout,
    VkDeviceSize*           pLayoutSizeInBytes);

void vkGetDescriptorSetLayoutBindingOffsetEXT(
    VkDevice                device,
    VkDescriptorSetLayout   layout,
    uint32_t                binding,
    VkDeviceSize*           pOffset);
----

Applications are responsible for writing data into memory, but the application does not control the memory location directly – descriptor set layouts dictate where each descriptor lives, so that the shader interface continues to work as-is with set and binding numbers.

The size and offset of descriptors are exposed to applications, so they know how to copy them into memory.
This is important since applications are free to copy descriptors on the device itself.

The sizes for different descriptor types are defined in the properties: `samplerDescriptorSize`, `combinedImageSamplerDescriptorSize`, `sampledImageDescriptorSize`, `storageImageDescriptorSize`, `uniformTexelBufferDescriptorSize`, `robustUniformTexelBufferDescriptorSize`, `storageTexelBufferDescriptorSize`, `robustStorageTexelBufferDescriptorSize`, `uniformBufferDescriptorSize`, `robustUniformBufferDescriptorSize`, `storageBufferDescriptorSize`, `robustStorageBufferDescriptorSize`, `inputAttachmentDescriptorSize`, `accelerationStructureDescriptorSize`, `combinedImageSamplerDensityMapDescriptorSize`.

Descriptor arrays have guaranteed packing, such that each element of an array for a given binding has an offset from that binding’s base offset equal to the size of the descriptor multiplied by the array index.
Bindings can be moved around as the driver sees fit, but variable-sized descriptor arrays must be packed at the end.

For use cases where layouts contain a variable-sized descriptor count, the size returned reflects the upper bound described in the descriptor set layout.
The size actually required for a descriptor set layout with a variable-sized descriptor array can be obtained by adding the product of the number of descriptors that are actually used and the size of the descriptor to the offset of the variable-sized binding.

Descriptor set layouts used for this purpose must be created with a new create flag:

[source,c]
----
VK_DESCRIPTOR_SET_LAYOUT_CREATE_DESCRIPTOR_BUFFER_BIT_EXT = 0x00000010
----

Layouts created with this flag must not be used to create a link:{refpage}VkDescriptorSet.html[VkDescriptorSet] and must not include dynamic uniform buffers or dynamic storage buffers.
Applications can achieve the same dynamic offsetting by either updating a descriptor buffer, using push constants, or by using push descriptors.
The blob of memory corresponding to a descriptor is obtained from resource views directly.
How applications get that data into device memory is entirely up to them, but the offset must match that obtained from the layout.

[source,c]
----
typedef struct VkDescriptorAddressInfoEXT {
    VkStructureType    sType;
    const void*        pNext;
    VkDeviceAddress    address;
    VkDeviceSize       range;
    VkFormat           format;
} VkDescriptorAddressInfoEXT;

typedef union VkDescriptorDataEXT {
    const VkSampler*                     pSampler;
    const VkDescriptorImageInfo*         pCombinedImageSampler;
    const VkDescriptorImageInfo*         pInputAttachmentImage;
    const VkDescriptorImageInfo*         pSampledImage;
    const VkDescriptorImageInfo*         pStorageImage;
    const VkDescriptorAddressInfoEXT*    pUniformTexelBuffer;
    const VkDescriptorAddressInfoEXT*    pStorageTexelBuffer;
    const VkDescriptorAddressInfoEXT*    pUniformBuffer;
    const VkDescriptorAddressInfoEXT*    pStorageBuffer;
    VkDeviceAddress                      accelerationStructure;
} VkDescriptorDataEXT;

typedef struct VkDescriptorGetInfoEXT {
    VkStructureType        sType;
    const void*            pNext;
    VkDescriptorType       type;
    VkDescriptorDataEXT    data;
} VkDescriptorGetInfoEXT;

void vkGetDescriptorEXT(
    VkDevice                         device,
    const VkDescriptorGetInfoEXT*    pCreateInfo,
    size_t                           dataSize,
    void*                            pDescriptor);
----

These APIs extract raw descriptor blob data from objects. The data obtained from these calls can be freely copied around.
Note that these calls do not know anything about descriptor set layouts. It is the application's responsibility to write descriptors to a suitable location.

A notable change here is that there is no longer any need for link:{refpage}VkBufferView.html[VkBufferView] objects.
Texel buffers are built from buffer device addresses and format instead.
This improvement is motivated by DX12 portability.
In some use cases, texel buffers are linearly allocated and having to create and manage a large number of unique view objects is problematic.
With descriptor buffers, this style of API is now feasible in Vulkan.

A similar improvement is that uniform buffers and storage buffers also take buffer device addresses.

Acceleration structure descriptors are also built from device addresses, or handles retrieved from `vkGetAccelerationStructureHandleNV` when using `VkAccelerationStructureNV` objects.

Inline uniform buffers do not have a descriptor data getter API associated with them.
Instead, the descriptor data is copied directly into the buffer offset obtained by `vkGetDescriptorSetLayoutBindingOffsetEXT`.
As the name suggests, inline uniform buffers are embedded into the descriptor set itself.

As descriptors are now in regular memory, drivers cannot hide copies of immutable samplers that end up in descriptor sets from the application.
As such, applications are required to provide these samplers as if they were not provided immutably.
These samplers must have identical parameters to the immutable samplers in the descriptor set layout.
Alternatively, applications can use dedicated descriptor sets for immutable samplers that do not require app-managed memory, by <<Embedded Immutable Samplers,embedding them in a special descriptor set>>.

If the `descriptorBufferImageLayoutIgnored` feature is enabled, the `imageLayout` in link:{refpage}VkDescriptorImageInfo.html[VkDescriptorImageInfo] is ignored, otherwise it specifies the layout that the descriptor will be used with.
`type` must not be `VK_DESCRIPTOR_TYPE_UNIFORM_BUFFER_DYNAMIC` or `VK_DESCRIPTOR_TYPE_STORAGE_BUFFER_DYNAMIC`.
`format` in `VkDescriptorAddressInfoEXT` is ignored for non-texel buffers.
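
To make the placement rule concrete, the sketch below copies an already retrieved descriptor blob into host-visible descriptor buffer memory. It assumes the blob came from `vkGetDescriptorEXT`, the binding offset from `vkGetDescriptorSetLayoutBindingOffsetEXT`, and that `set_offset` is wherever the application decided to place the set inside its buffer. The function `write_descriptor` and its parameter names are hypothetical, and plain integers stand in for Vulkan types so the arithmetic can be read in isolation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper, not part of the extension.
 * Copies one descriptor blob to
 *   set_offset + binding_offset + array_element * descriptor_size
 * inside a mapped descriptor buffer.
 * Returns 0 on success, -1 if the write would fall outside the buffer. */
static int write_descriptor(uint8_t *mapped, size_t mapped_size,
                            size_t set_offset, size_t binding_offset,
                            size_t array_element, size_t descriptor_size,
                            const void *blob)
{
    assert(mapped != NULL && blob != NULL);

    size_t dst = set_offset + binding_offset + array_element * descriptor_size;
    if (dst > mapped_size || descriptor_size > mapped_size - dst) {
        return -1; /* refuse to write past the end of the mapping */
    }
    memcpy(mapped + dst, blob, descriptor_size);
    return 0;
}
```

The same arithmetic applies when the copy is performed on the device instead of the host; only the mechanism that moves the bytes changes.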

The `combinedImageSamplerDescriptorSingleArray` property indicates that the implementation does not require an array of `VK_DESCRIPTOR_TYPE_COMBINED_IMAGE_SAMPLER` descriptors to be written into a descriptor buffer as an array of image descriptors, immediately followed by an array of sampler descriptors. If `VK_FALSE`, applications are expected to write the first `sampledImageDescriptorSize` bytes of the data returned through `pDescriptor` to the first array, and the remaining `samplerDescriptorSize` bytes of the data to the second array.
On these implementations, variable descriptor counts of combined image samplers may be supported, but it is not useful as the descriptor set size must assume the upper bound.


==== Embedded Immutable Samplers

Immutable samplers can be embedded into descriptor layouts, allowing them to be bound without disturbing descriptor buffer bindings or requiring device memory backing.
Descriptor set layouts must be created with a new flag for this purpose:

[source,c]
----
VK_DESCRIPTOR_SET_LAYOUT_CREATE_EMBEDDED_IMMUTABLE_SAMPLERS_BIT_EXT = 0x00000020
----

When this flag is used, this set layout can only contain descriptor bindings with a `descriptorType` of `VK_DESCRIPTOR_TYPE_SAMPLER`, a `descriptorCount` of `1` (i.e. not arrayed), and a valid `VkSampler` used in `pImmutableSamplers`.
Note that arrays of immutable samplers are not supported, as implementations typically need these in memory to allow dynamic indexing - whereas no device memory is directly associated with these sets.


=== Pipeline creation

To use pipelines with descriptor buffers a new `VkPipelineCreateFlag` must be used:

[source,c]
----
VK_PIPELINE_CREATE_DESCRIPTOR_BUFFER_BIT_EXT = 0x20000000
----

=== Descriptor Binding

Descriptor buffers are bound to the command buffer directly (similar to vertex buffers).

[source,c]
----
typedef struct VkDescriptorBufferBindingPushDescriptorBufferHandleEXT {
    VkStructureType    sType;
    const void*        pNext;
    VkBuffer           buffer;
} VkDescriptorBufferBindingPushDescriptorBufferHandleEXT;

typedef struct VkDescriptorBufferBindingInfoEXT {
    VkStructureType       sType;
    const void*           pNext;
    VkDeviceAddress       address;
    VkBufferUsageFlags    usage;
} VkDescriptorBufferBindingInfoEXT;

void vkCmdBindDescriptorBuffersEXT(
    VkCommandBuffer                            commandBuffer,
    uint32_t                                   bufferCount,
    const VkDescriptorBufferBindingInfoEXT*    pBindingInfos);
----

Unlike binding descriptor sets, there’s no invalidating going on with this binding – a buffer remains bound and is interpreted by a pipeline in the manner the pipeline expects, irrespective of what layout was used to construct the buffer for each set.

There must be no more than `maxSamplerDescriptorBufferBindings` descriptor buffers containing sampler descriptor data bound.
Such buffers must be created with `VK_BUFFER_USAGE_SAMPLER_DESCRIPTOR_BUFFER_BIT_EXT`.

There must be no more than `maxResourceDescriptorBufferBindings` descriptor buffers containing resource descriptors bound.
Such buffers must be created with `VK_BUFFER_USAGE_RESOURCE_DESCRIPTOR_BUFFER_BIT_EXT`.

If a buffer contains both usage flags, it counts once against both limits.

If the `bufferlessPushDescriptors` property is `VK_FALSE` and a buffer contains the `VK_BUFFER_USAGE_PUSH_DESCRIPTORS_DESCRIPTOR_BUFFER_BIT_EXT` usage flag, a `VkDescriptorBufferBindingPushDescriptorBufferHandleEXT` structure must be added to the `pNext` chain of `VkDescriptorBufferBindingInfoEXT`.

`bufferCount` must be less than or equal to `maxDescriptorBufferBindings`.

Any previously bound buffers at binding points greater than or equal to `bufferCount` are unbound.
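
The binding limits above could be checked on the application side before recording a bind. The following is a minimal sketch under stated assumptions: the two usage-bit constants reuse the values listed in this proposal under shortened names, while `DescriptorBufferLimits` and `bindings_within_limits` are illustrative only and not part of the extension.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Values as listed in this proposal; names shortened for the sketch. */
#define USAGE_SAMPLER_DESCRIPTOR_BUFFER  0x00200000u
#define USAGE_RESOURCE_DESCRIPTOR_BUFFER 0x00400000u

/* Hypothetical mirror of the three relevant device properties. */
typedef struct DescriptorBufferLimits {
    uint32_t max_bindings;          /* maxDescriptorBufferBindings */
    uint32_t max_sampler_bindings;  /* maxSamplerDescriptorBufferBindings */
    uint32_t max_resource_bindings; /* maxResourceDescriptorBufferBindings */
} DescriptorBufferLimits;

/* Returns true if a set of buffers with the given usage flags may be bound
 * together. A buffer with both flags counts once against both limits.
 * Embedded immutable sampler sets are not modelled here. */
static bool bindings_within_limits(const uint32_t *usages, uint32_t count,
                                   const DescriptorBufferLimits *limits)
{
    assert(limits != NULL);
    if (count > limits->max_bindings) {
        return false;
    }
    uint32_t samplers = 0;
    uint32_t resources = 0;
    for (uint32_t i = 0; i < count; ++i) {
        if (usages[i] & USAGE_SAMPLER_DESCRIPTOR_BUFFER) {
            ++samplers;
        }
        if (usages[i] & USAGE_RESOURCE_DESCRIPTOR_BUFFER) {
            ++resources;
        }
    }
    return samplers <= limits->max_sampler_bindings &&
           resources <= limits->max_resource_bindings;
}
```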

Each entry in `pBindingInfos` contains the device address of a descriptor buffer and the usage flags that the buffer was created with.

Changing buffers may be an expensive operation and should be done infrequently (if ever).

The maximum available range of each binding to a shader is `maxSamplerDescriptorBufferRange` and/or `maxResourceDescriptorBufferRange`.

The `samplerDescriptorBufferAddressSpaceSize`, `resourceDescriptorBufferAddressSpaceSize`, and `descriptorBufferAddressSpaceSize` properties
give the upper bound for the total amount of address space used for descriptor buffers.

Buffers used for this purpose need to be created with new usage flags:

[source,c]
----
VK_BUFFER_USAGE_SAMPLER_DESCRIPTOR_BUFFER_BIT_EXT = 0x00200000
VK_BUFFER_USAGE_RESOURCE_DESCRIPTOR_BUFFER_BIT_EXT = 0x00400000
----

`VK_BUFFER_USAGE_SAMPLER_DESCRIPTOR_BUFFER_BIT_EXT` specifies that the buffer will be used to contain sampler descriptors when bound as a descriptor buffer.
`VK_BUFFER_USAGE_RESOURCE_DESCRIPTOR_BUFFER_BIT_EXT` specifies that the buffer will be used to contain resource descriptors, i.e. non-sampler descriptors, when bound as a descriptor buffer.
Buffers containing `VK_DESCRIPTOR_TYPE_COMBINED_IMAGE_SAMPLER` descriptors must have been created with both `VK_BUFFER_USAGE_SAMPLER_DESCRIPTOR_BUFFER_BIT_EXT` and `VK_BUFFER_USAGE_RESOURCE_DESCRIPTOR_BUFFER_BIT_EXT`.

Each descriptor set is associated with a buffer and an offset into that buffer which can be set by:

[source,c]
----
void vkCmdSetDescriptorBufferOffsetsEXT(
    VkCommandBuffer        commandBuffer,
    VkPipelineBindPoint    pipelineBindPoint,
    VkPipelineLayout       layout,
    uint32_t               firstSet,
    uint32_t               setCount,
    const uint32_t*        pBufferIndices,
    const VkDeviceSize*    pOffsets);
----

`vkCmdSetDescriptorBufferOffsetsEXT` causes the sets numbered [firstSet..
firstSet+setCount-1] to use the bindings stored in the buffer bound at `pBufferIndices[i]` at an offset of `pOffsets[i]` for subsequent bound pipeline commands set by `pipelineBindPoint`. Any bindings that were previously applied via these sets, or calls to `vkCmdBindDescriptorSets`, are no longer valid. Calling `vkCmdBindDescriptorSets` invalidates bindings previously applied via `vkCmdSetDescriptorBufferOffsetsEXT`.

Setting offsets should be a cheap operation and can be performed frequently.
The offsets must be aligned to `descriptorBufferOffsetAlignment`.

<<Embedded Immutable Samplers,Embedded immutable samplers>> can be bound using:

[source,c]
-----
void vkCmdBindDescriptorBufferEmbeddedSamplersEXT(
    VkCommandBuffer        commandBuffer,
    VkPipelineBindPoint    pipelineBindPoint,
    VkPipelineLayout       layout,
    uint32_t               set);
-----

`vkCmdBindDescriptorBufferEmbeddedSamplersEXT` binds the embedded immutable samplers in `layout` at set index `set` to the same set in the command buffer.
Set bindings are invalidated in the same manner as they are for `vkCmdSetDescriptorBufferOffsetsEXT`.
The `VkDescriptorSetLayout` at index `set` of `layout` must have been created with the `VK_DESCRIPTOR_SET_LAYOUT_CREATE_EMBEDDED_IMMUTABLE_SAMPLERS_BIT_EXT` bit.
There must be no more than `maxEmbeddedImmutableSamplerBindings` embedded immutable sampler sets bound.
As in DX12, there is a limit to how many unique embedded immutable samplers may be alive in a device at any one point; this limit is designed to match DX12.


=== Descriptor Updates

As descriptors are just a blob of memory, descriptor updates can be performed by any operation on either the host or device that can access memory, enabling a form of GPU descriptor update.
Descriptor buffer reads can be synchronized using a new access bit in the relevant shader stage:

[source,c]
----
VK_ACCESS_2_DESCRIPTOR_BUFFER_READ_BIT_EXT = 0x20000000000ULL
----

Note that host writes are implicitly made visible to all stages in `vkQueueSubmit`, so this access flag is only relevant when performing GPU-side updates of descriptors.

If the `allowSamplerImageViewPostSubmitCreation` property is `VK_FALSE` there are special requirements for when descriptor data for `VkSampler` or `VkImageView` objects can be used.
Those objects must have been created before any `vkQueueSubmit` (or `vkQueueSubmit2`) call that executes a command buffer which accesses descriptor data for them.

For example, if `allowSamplerImageViewPostSubmitCreation` is `VK_FALSE`, this is disallowed:

* Call `vkQueueSubmit()` which is waiting for a timeline semaphore
* Create a `VkImageView`
* Update the descriptor buffer used by the previous submission from the host using the descriptor data of the new `VkImageView`
* Signal the semaphore from the host

=== Push descriptors

Combining descriptor buffers with push descriptors is supported if the `descriptorBufferPushDescriptors` feature bit is set.

To support push descriptors on certain implementations, an additional buffer usage flag is added:

[source,c]
----
VK_BUFFER_USAGE_PUSH_DESCRIPTORS_DESCRIPTOR_BUFFER_BIT_EXT = 0x04000000
----

If the application desires to use push descriptors and descriptor buffers together,
a descriptor set layout must be declared with `VK_DESCRIPTOR_SET_LAYOUT_CREATE_PUSH_DESCRIPTOR_BIT_KHR` and `VK_DESCRIPTOR_SET_LAYOUT_CREATE_DESCRIPTOR_BUFFER_BIT_EXT` bits set.

If the `bufferlessPushDescriptors` property is `VK_FALSE`, there are special requirements for using push descriptors with descriptor buffers.
`VK_BUFFER_USAGE_PUSH_DESCRIPTORS_DESCRIPTOR_BUFFER_BIT_EXT` is a special buffer flag which is required for certain implementations in order for push descriptors to interoperate with descriptor buffers.
When pushing descriptors using this kind of set layout, it is required that a descriptor buffer is bound to the command buffer with the `VK_BUFFER_USAGE_PUSH_DESCRIPTORS_DESCRIPTOR_BUFFER_BIT_EXT` usage flag.
The intention here is that implementations can reserve scratch space in descriptor buffers for the purposes of dealing with push descriptors.
The mechanics here are highly magical and implementation defined in nature, and it is considered too burdensome to expect applications to deal with them.

Binding a buffer that was created with `VK_BUFFER_USAGE_PUSH_DESCRIPTORS_DESCRIPTOR_BUFFER_BIT_EXT` requires the application to record any current push descriptors again.

=== Capture/Replay

When creating a resource with the capture/replay feature enabled, an opaque handle can be obtained which can be passed into creation calls in a future replay, causing descriptors to be created with the same data.

New flags are to be supplied when creating buffers, images, image views, samplers, and acceleration structures that will be captured/replayed:

[source,c]
----
VK_BUFFER_CREATE_DESCRIPTOR_BUFFER_CAPTURE_REPLAY_BIT_EXT = 0x00000020
VK_IMAGE_CREATE_DESCRIPTOR_BUFFER_CAPTURE_REPLAY_BIT_EXT = 0x00010000
VK_IMAGE_VIEW_CREATE_DESCRIPTOR_BUFFER_CAPTURE_REPLAY_BIT_EXT = 0x00000004
VK_SAMPLER_CREATE_DESCRIPTOR_BUFFER_CAPTURE_REPLAY_BIT_EXT = 0x00000008
VK_ACCELERATION_STRUCTURE_CREATE_DESCRIPTOR_BUFFER_CAPTURE_REPLAY_BIT_EXT = 0x00000008
----

There are separate commands to get opaque data for each of these object types:

[source,c]
----
VkResult vkGetBufferOpaqueCaptureDescriptorDataEXT(
    VkDevice                                       device,
    const VkBufferCaptureDescriptorDataInfoEXT*    pInfo,
    void*                                          pData);

typedef struct VkBufferCaptureDescriptorDataInfoEXT {
    VkStructureType    sType;
    const void*        pNext;
    VkBuffer           buffer;
} VkBufferCaptureDescriptorDataInfoEXT;

VkResult vkGetImageOpaqueCaptureDescriptorDataEXT(
    VkDevice                                      device,
    const VkImageCaptureDescriptorDataInfoEXT*    pInfo,
    void*                                         pData);

typedef struct VkImageCaptureDescriptorDataInfoEXT {
    VkStructureType    sType;
    const void*        pNext;
    VkImage            image;
} VkImageCaptureDescriptorDataInfoEXT;

VkResult vkGetImageViewOpaqueCaptureDescriptorDataEXT(
    VkDevice                                          device,
    const VkImageViewCaptureDescriptorDataInfoEXT*    pInfo,
    void*                                             pData);

typedef struct VkImageViewCaptureDescriptorDataInfoEXT {
    VkStructureType    sType;
    const void*        pNext;
    VkImageView        imageView;
} VkImageViewCaptureDescriptorDataInfoEXT;

VkResult vkGetSamplerOpaqueCaptureDescriptorDataEXT(
    VkDevice                                        device,
    const VkSamplerCaptureDescriptorDataInfoEXT*    pInfo,
    void*                                           pData);

typedef struct VkSamplerCaptureDescriptorDataInfoEXT {
    VkStructureType    sType;
    const void*        pNext;
    VkSampler          sampler;
} VkSamplerCaptureDescriptorDataInfoEXT;

VkResult
vkGetAccelerationStructureOpaqueCaptureDescriptorDataEXT(
    VkDevice                                                      device,
    const VkAccelerationStructureCaptureDescriptorDataInfoEXT*    pInfo,
    void*                                                         pData);

typedef struct VkAccelerationStructureCaptureDescriptorDataInfoEXT {
    VkStructureType               sType;
    const void*                   pNext;
    VkAccelerationStructureKHR    accelerationStructure;
    VkAccelerationStructureNV     accelerationStructureNV;
} VkAccelerationStructureCaptureDescriptorDataInfoEXT;
----

Once queried, this data must be provided when the buffer, image, image view, sampler, or acceleration structure is created on replay, in a similar manner to buffer device address capture/replay, by chaining the following structure to the creation call:

[source,c]
----
typedef struct VkOpaqueCaptureDescriptorDataCreateInfoEXT {
    VkStructureType    sType;
    const void*        pNext;
    const void*        opaqueCaptureDescriptorData;
} VkOpaqueCaptureDescriptorDataCreateInfoEXT;
----

In each case, the capture data is sized to the `bufferCaptureReplayDescriptorDataSize`, `imageCaptureReplayDescriptorDataSize`, `imageViewCaptureReplayDescriptorDataSize`, `samplerCaptureReplayDescriptorDataSize`, or `accelerationStructureCaptureReplayDescriptorDataSize` limits as appropriate.

In addition, link:{refpage}vkGetDeviceMemoryOpaqueCaptureAddress.html[vkGetDeviceMemoryOpaqueCaptureAddress] must be used to capture the opaque address and replay it with link:{refpage}VkMemoryOpaqueCaptureAddressAllocateInfo.html[VkMemoryOpaqueCaptureAddressAllocateInfo], for any memory used by resources with these handles.


=== Device Features

The following features are exposed:

[source,c]
----
typedef struct VkPhysicalDeviceDescriptorBufferFeaturesEXT {
    VkStructureType    sType;
    void*              pNext;
    VkBool32           descriptorBuffer;
    VkBool32           descriptorBufferCaptureReplay;
    VkBool32           descriptorBufferImageLayoutIgnored;
    VkBool32           descriptorBufferPushDescriptors;
} VkPhysicalDeviceDescriptorBufferFeaturesEXT;
----

If the `descriptorBuffer` feature is enabled, link:{refpage}VK_AMD_shader_fragment_mask.html[VK_AMD_shader_fragment_mask] must not be enabled.
If the `descriptorBufferImageLayoutIgnored` feature is enabled, the image layout provided when getting a descriptor is ignored.
The `descriptorBufferCaptureReplay` feature is primarily for capture replay tools, and allows opaque data to be captured and replayed, allowing the same descriptor handles to be used on replay.
If the `descriptorBufferPushDescriptors` feature is enabled, push descriptors can be used with descriptor buffers.
538 539 540=== Device Properties 541 542The following properties are exposed: 543 544[source,c] 545---- 546typedef struct VkPhysicalDeviceDescriptorBufferPropertiesEXT { 547 VkStructureType sType; 548 void* pNext; 549 VkBool32 combinedImageSamplerDescriptorSingleArray; 550 VkBool32 bufferlessPushDescriptors; 551 VkBool32 allowSamplerImageViewPostSubmitCreation; 552 VkDeviceSize descriptorBufferOffsetAlignment; 553 uint32_t maxDescriptorBufferBindings; 554 uint32_t maxResourceDescriptorBufferBindings; 555 uint32_t maxSamplerDescriptorBufferBindings; 556 uint32_t maxEmbeddedImmutableSamplerBindings; 557 uint32_t maxEmbeddedImmutableSamplers; 558 size_t bufferCaptureReplayDescriptorDataSize; 559 size_t imageCaptureReplayDescriptorDataSize; 560 size_t imageViewCaptureReplayDescriptorDataSize; 561 size_t samplerCaptureReplayDescriptorDataSize; 562 size_t accelerationStructureCaptureReplayDescriptorDataSize; 563 size_t samplerDescriptorSize; 564 size_t combinedImageSamplerDescriptorSize; 565 size_t sampledImageDescriptorSize; 566 size_t storageImageDescriptorSize; 567 size_t uniformTexelBufferDescriptorSize; 568 size_t robustUniformTexelBufferDescriptorSize; 569 size_t storageTexelBufferDescriptorSize; 570 size_t robustStorageTexelBufferDescriptorSize; 571 size_t uniformBufferDescriptorSize; 572 size_t robustUniformBufferDescriptorSize; 573 size_t storageBufferDescriptorSize; 574 size_t robustStorageBufferDescriptorSize; 575 size_t inputAttachmentDescriptorSize; 576 size_t accelerationStructureDescriptorSize; 577 VkDeviceSize maxSamplerDescriptorBufferRange; 578 VkDeviceSize maxResourceDescriptorBufferRange; 579 VkDeviceSize samplerDescriptorBufferAddressSpaceSize; 580 VkDeviceSize resourceDescriptorBufferAddressSpaceSize; 581 VkDeviceSize descriptorBufferAddressSpaceSize; 582} VkPhysicalDeviceDescriptorBufferPropertiesEXT; 583---- 584 585* `descriptorBufferOffsetAlignment` describes the alignment required, in bytes, when setting offsets into the descriptor buffer. 
* `combinedImageSamplerDescriptorSingleArray` indicates that the implementation does not require an array of `VK_DESCRIPTOR_TYPE_COMBINED_IMAGE_SAMPLER` descriptors to be written into a descriptor buffer as an array of image descriptors, immediately followed by an array of sampler descriptors.
* `bufferlessPushDescriptors` indicates that the implementation does not require a buffer created with `VK_BUFFER_USAGE_PUSH_DESCRIPTORS_DESCRIPTOR_BUFFER_BIT_EXT` to be bound when using push descriptors.
* `allowSamplerImageViewPostSubmitCreation` indicates that the implementation does not restrict when the `VkSampler` or `VkImageView` objects used to retrieve descriptor data can be created in relation to command buffer submission. If this value is `VK_FALSE`, then the application must create any `VkSampler` or `VkImageView` objects whose descriptor data is accessed during the execution of a command buffer, before the `vkQueueSubmit` (or `vkQueueSubmit2`) call that submits that command buffer.
* `maxDescriptorBufferBindings` defines the maximum total number of descriptor buffers and embedded immutable sampler sets that can be bound.
* `maxResourceDescriptorBufferBindings` defines the maximum number of resource descriptor buffers that can be bound.
* `maxSamplerDescriptorBufferBindings` defines the maximum number of sampler descriptor buffers that can be bound.
* `maxEmbeddedImmutableSamplerBindings` defines the maximum number of embedded immutable sampler sets that can be bound.
* `maxEmbeddedImmutableSamplers` describes the maximum number of unique immutable samplers in descriptor set layouts created with `VK_DESCRIPTOR_SET_LAYOUT_CREATE_EMBEDDED_IMMUTABLE_SAMPLERS_BIT_EXT`, and pipeline layouts created from them, which can simultaneously exist on a device.
594* `bufferCaptureReplayDescriptorDataSize`, `imageCaptureReplayDescriptorDataSize`, `imageViewCaptureReplayDescriptorDataSize`, `samplerCaptureReplayDescriptorDataSize`, and `accelerationStructureCaptureReplayDescriptorDataSize` define the maximum size, in bytes, of the opaque data used for capture replay with each respective object type. 595* `samplerDescriptorSize` describes the size, in bytes, of a VK_DESCRIPTOR_TYPE_SAMPLER descriptor. 596* `combinedImageSamplerDescriptorSize` describes the size, in bytes, of a VK_DESCRIPTOR_TYPE_COMBINED_IMAGE_SAMPLER descriptor. 597* `sampledImageDescriptorSize` describes the size, in bytes, of a VK_DESCRIPTOR_TYPE_SAMPLED_IMAGE descriptor. 598* `storageImageDescriptorSize` describes the size, in bytes, of a VK_DESCRIPTOR_TYPE_STORAGE_IMAGE descriptor. 599* `uniformTexelBufferDescriptorSize` describes the size, in bytes, of a VK_DESCRIPTOR_TYPE_UNIFORM_TEXEL_BUFFER descriptor. 600* `robustUniformTexelBufferDescriptorSize` describes the size, in bytes, of a VK_DESCRIPTOR_TYPE_UNIFORM_TEXEL_BUFFER descriptor when robust buffer access is enabled. 601* `storageTexelBufferDescriptorSize` describes the size, in bytes, of a VK_DESCRIPTOR_TYPE_STORAGE_TEXEL_BUFFER descriptor. 602* `robustStorageTexelBufferDescriptorSize` describes the size, in bytes, of a VK_DESCRIPTOR_TYPE_STORAGE_TEXEL_BUFFER descriptor when robust buffer access is enabled. 603* `uniformBufferDescriptorSize` describes the size, in bytes, of a VK_DESCRIPTOR_TYPE_UNIFORM_BUFFER descriptor. 604* `robustUniformBufferDescriptorSize` describes the size, in bytes, of a VK_DESCRIPTOR_TYPE_UNIFORM_BUFFER descriptor when robust buffer access is enabled. 605* `storageBufferDescriptorSize` describes the size, in bytes, of a VK_DESCRIPTOR_TYPE_STORAGE_BUFFER descriptor. 606* `robustStorageBufferDescriptorSize` describes the size, in bytes, of a VK_DESCRIPTOR_TYPE_STORAGE_BUFFER descriptor when robust buffer access is enabled. 
607* `inputAttachmentDescriptorSize` describes the size, in bytes, of a VK_DESCRIPTOR_TYPE_INPUT_ATTACHMENT descriptor. 608* `accelerationStructureDescriptorSize` describes the size, in bytes, of a VK_DESCRIPTOR_TYPE_ACCELERATION_STRUCTURE_KHR/VK_DESCRIPTOR_TYPE_ACCELERATION_STRUCTURE_NV descriptor. 609* `maxSamplerDescriptorBufferRange` describes the accessible range, in bytes, of a sampler buffer when bound. 610* `maxResourceDescriptorBufferRange` describes the accessible range, in bytes, of a resource buffer when bound. 611* `samplerDescriptorBufferAddressSpaceSize` describes the total amount of address space available, in bytes, for descriptor buffers containing samplers. 612* `resourceDescriptorBufferAddressSpaceSize` describes the total amount of address space available, in bytes, for descriptor buffers containing resources. 613* `descriptorBufferAddressSpaceSize` describes the total amount of address space available, in bytes, for all descriptor buffers. 614 615If link:{refpage}VK_VALVE_mutable_descriptor_type.html[VK_VALVE_mutable_descriptor_type] is used, 616a descriptor is considered to be a union of all the enabled types, so the size of a descriptor is the maximum of all enabled types. 617 618[source,c] 619---- 620typedef struct VkPhysicalDeviceDescriptorBufferDensityMapPropertiesEXT { 621 VkStructureType sType; 622 void* pNext; 623 size_t combinedImageSamplerDensityMapDescriptorSize; 624} VkPhysicalDeviceDescriptorBufferDensityMapPropertiesEXT; 625---- 626 627* `combinedImageSamplerDensityMapDescriptorSize` describes the size, in bytes, of a VK_DESCRIPTOR_TYPE_COMBINED_IMAGE_SAMPLER descriptor when using the VK_SAMPLER_CREATE_SUBSAMPLED_BIT_EXT flag of the link:{refpage}VK_EXT_fragment_density_map.html[VK_EXT_fragment_density_map] extension. 628 629== Mapping to DirectX® 12 Descriptor Heaps 630 631In DirectX 12 (DX12), descriptors are allocated into descriptor heaps, which work almost completely differently to anything currently in Vulkan. 
632This extension aims to reduce one aspect of the divergence between the two. 633Below is a rough description of the mapping from DX12 to this extension. 634Applications looking to port between the two APIs will likely have more information available than the DX12 API provides, and can likely take shortcuts (highlighted where possible). 635This doesn’t solve the overall limits for object counts, and so it’s not possible to trivially emulate every corner of the DX12 API. 636 637 638=== Descriptor Heap Creation 639 640DX12 has the following command to create a heap: 641 642[source,c] 643---- 644typedef struct D3D12_DESCRIPTOR_HEAP_DESC { 645 D3D12_DESCRIPTOR_HEAP_TYPE Type; 646 UINT NumDescriptors; 647 D3D12_DESCRIPTOR_HEAP_FLAGS Flags; 648 UINT NodeMask; 649} D3D12_DESCRIPTOR_HEAP_DESC; 650 651HRESULT CreateDescriptorHeap( 652 const D3D12_DESCRIPTOR_HEAP_DESC *pDescriptorHeapDesc, 653 REFIID riid, 654 void **ppvHeap 655); 656---- 657 658Implementing the equivalent functionality in Vulkan would mean the following operations: 659 660 * Create a `VkDescriptorSetLayout` with `VK_DESCRIPTOR_BINDING_VARIABLE_DESCRIPTOR_COUNT_BIT`. The count would be up to 1000000 for resources, and 2048 for samplers. 661 ** If link:{refpage}VK_VALVE_mutable_descriptor_type.html[VK_VALVE_mutable_descriptor_type] is supported, we only need one descriptor set layout which supports all descriptor types for the heap type. 662 ** Otherwise, there are two alternatives: 663 *** Create up to 6 descriptor set layouts of the relevant descriptor types the application cares about (`STORAGE_BUFFER`, `UNIFORM_BUFFER`, `SAMPLED_IMAGE`, `STORAGE_IMAGE`, `UNIFORM_TEXEL_BUFFER`, `STORAGE_TEXEL_BUFFER`). 664 *** Create one descriptor set layout with 6 fixed-size arrays instead of using variable descriptor counts. This means `NumDescriptors` is effectively ignored. 665 * Create a `VkBuffer`, size equal to `NumDescriptors` multiplied by the descriptor size within it, and its device mask set per `NodeMask`. 
 * If `Flags` includes `D3D12_DESCRIPTOR_HEAP_FLAG_SHADER_VISIBLE`, allocate `DEVICE_LOCAL` memory.
 ** If this memory can be both `DEVICE_LOCAL` and `HOST_VISIBLE`, then it can be mapped directly and the mapping used as the heap CPU pointer.
 ** Otherwise, `HOST_VISIBLE` staging memory should be allocated for a parallel buffer.
    Copying from this staging buffer to the main descriptor buffer should be done at each submit where the staging buffer has been modified.
 * If `Flags` does not include `D3D12_DESCRIPTOR_HEAP_FLAG_SHADER_VISIBLE`, allocate `HOST_VISIBLE` memory that can be used for staging copies to `DEVICE_LOCAL` memory.
 ** Alternatively, plain `malloc` can be used if descriptor copies are implemented as `memcpy`.
 * Copying descriptors à la `CopyDescriptorsSimple()` is implemented with either `memcpy` or staging copies.

This model would support the full TIER_3 resource binding feature in DX12 and shader model 6.6 direct heap access, but can be simplified a lot for applications with DX11-style binding models.

=== Descriptor Creation

Unlike DX12, Vulkan (and this extension) requires view objects and sampler objects to exist and have their lifetimes managed by the application.
These objects need to be kept alive for the descriptor itself to be valid.
How this is managed precisely is going to depend on the application’s usage patterns, though link:https://github.com/HansKristian-Work/vkd3d-proton[vkd3d-proton] suggests one viable option.
The scheme used by vkd3d-proton involves keeping a hash map of the views associated with each resource object (or the device for samplers), using creation parameters as a key, so that their lifetime is tied to the underlying resource and they can be reused.
When actually creating the UAV/SRV/Sampler, the object should be looked up in the relevant hash map, and created there if necessary.
The descriptor itself is then written directly to the provided CPU pointer.
Note that `VkBufferView` objects are not used and have been replaced by an explicit address, range, and format.
This is very important since applications have a tendency to linearly allocate texel buffers and might end up rapidly creating these views at different offsets.
If applications were forced to hold on to all unique `VkBufferView` objects, things would get out of hand quickly.
vkd3d-proton currently works around this problem by quantizing the texel buffer offset and range, and instead performs offset/range checks per access in shaders to keep the number of objects low, which is obviously not desirable.

For image views on the other hand, the number of unique views in flight per resource tends to be constrained and manageable.
In terms of performance characteristics, creating SRVs and UAVs is already far more expensive in DX12 than copying descriptors.
The style observed in most DX12 applications is that view objects are created in non-shader visible heaps, which are then streamed into shader visible heaps.

=== Descriptor Heap Queries

Descriptor heaps provide methods to query the “start” pointer for the descriptor heap on both the CPU and GPU.

[source,c]
----
D3D12_CPU_DESCRIPTOR_HANDLE GetCPUDescriptorHandleForHeapStart();
D3D12_GPU_DESCRIPTOR_HANDLE GetGPUDescriptorHandleForHeapStart();
UINT GetDescriptorHandleIncrementSize(
    D3D12_DESCRIPTOR_HEAP_TYPE DescriptorHeapType
);
----

`GetGPUDescriptorHandleForHeapStart` should be the `VkDeviceAddress` for the device-local buffer.
`GetCPUDescriptorHandleForHeapStart` should be the mapped host address for the host-visible buffer.
`GetDescriptorHandleIncrementSize` should be the size of the largest descriptor possible in the buffer.

However, this model can break down fairly quickly if the descriptor set layout is more complicated.
711When more than one descriptor array is used to emulate the union-style descriptor heap of DX12, 712it is not possible to provide a unique pointer to host memory that is suitable for copying. 713 714An engine abstraction that takes descriptor heap and offset separately is much easier to implement overall and avoids all these pitfalls. 715 716=== Descriptor Copies 717 718D3D12-style descriptor copies can be performed using `memcpy` on the host-visible descriptor buffer memory, 719but applications need to make sure the memory that is being read from is cached on the host. 720Alternatively, it is possible to use staging buffer copies. 721 722=== Descriptor Binding 723 724Binding descriptors to shaders in DX12 consists of two operations: setting the descriptor heaps, and setting tables as offsets into those heaps. 725 726`SetDescriptorHeaps` allows applications to set one sampler heap, and one CBV/SRV/UAV heap (containing other resources). 727This command should straightforwardly map to `vkCmdBindDescriptorBuffersEXT`, with each heap being bound as a separate buffer. 728 729`Set{Graphics|Compute}RootDescriptorTable` allows applications to set various offsets to the descriptor heap, to be more or less used like descriptor sets in Vulkan. 730This command will map fairly directly to `vkCmdSetDescriptorBufferOffsetsEXT`, but if implementing DX12 root signatures natively, this approach will not work easily. 731The core assumption of DX12 is that the heap is a big array and a table offset should be seen more as an index offset into that big array. 732`descriptorBufferOffsetAlignment` might be larger than one descriptor, so binding at the desired offset might not be possible. 733Descriptor buffer offsets are better suited for suballocating individual descriptor sets rather than slicing existing descriptor sets. 
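
The alignment problem described above can be made concrete with a little arithmetic.
The following is only a sketch under stated assumptions: `TableOffset` and `split_table_offset` are hypothetical names, the descriptor size and `descriptorBufferOffsetAlignment` are passed in as plain integers, and what an application does with a non-zero remainder is left open.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical result type: a DX12-style table start split into a part that
 * can be bound as a descriptor buffer offset and a part that cannot. */
typedef struct TableOffset {
    uint64_t bindOffset; /* largest suitably aligned byte offset <= table start */
    uint64_t remainder;  /* bytes between bindOffset and the table start */
} TableOffset;

/* firstDescriptor is the DX12-style index of the first descriptor in the table. */
TableOffset split_table_offset(uint32_t firstDescriptor,
                               uint64_t descriptorSize,
                               uint64_t offsetAlignment)
{
    assert(descriptorSize != 0 && offsetAlignment != 0);

    uint64_t byteOffset = (uint64_t)firstDescriptor * descriptorSize;

    TableOffset result;
    result.remainder  = byteOffset % offsetAlignment;
    result.bindOffset = byteOffset - result.remainder;
    return result;
}
```

With a 32-byte descriptor and a 64-byte offset alignment, a table starting at descriptor 3 yields a non-zero remainder, so it cannot be bound at its exact start, whereas a table starting at descriptor 4 can.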

An engine abstraction can decide to take this into account when allocating descriptor sets:

 * In the DX12 path, a root signature has N tables, each of which needs M descriptors allocated.
 * In the Vulkan path, a "root signature" translates to a `VkPipelineLayout`, which in turn translates to N `VkDescriptorSetLayout`s which require M bytes in the descriptor buffer each.

If native DX12 root signature compatibility is required however, the suggested implementation is to bind the heap in its entirety with a single `vkCmdSetDescriptorBufferOffsetsEXT` call using an offset of 0.
The shader declares global unsized arrays and from there we can implement shader model 6.6 by just indexing into the descriptor array directly.
For older models, descriptor table offsets can translate to u32 push constants that add an extra offset, meaning that we promote legacy root signatures to shader model 6.6.
This is a fairly invasive process and it is only expected that translation layers would go to this length.

== Porting existing Vulkan applications

Porting an existing Vulkan application to the new API should require minimal additional code, and ideally should allow the removal of older code.

Applications should be uploading descriptors in the exact same manner they upload other resource data (e.g. new textures, constants, etc.).
All advice about how to upload resources (e.g. use staging buffers, use the DMA queue asynchronously, etc.) applies in the exact same manner for descriptors as it does for anything else.

When porting an application then, the aim should not be to create a new separate path for descriptor uploads, but to directly hook into existing resource upload paths.
This amortises the cost of descriptor uploads with other data uploads and reduces the amount of code dedicated to descriptor management.
Any improvements to data uploads then automatically apply to descriptor uploads.
755For strategies where resizable BAR or unified memory can be used, none of this is necessary and uploading descriptors becomes `memcpy`. 756 757For descriptor management, pools are removed. Instead of allocating descriptor sets from pools, applications can instead allocate from a custom allocator, which is backed by a big descriptor buffer. 758The size to allocate for a set would be obtained from `vkGetDescriptorSetLayoutSizeEXT` and alignment from `descriptorBufferOffsetAlignment`. 759A linear or arena allocator would be a good match for this. 760 761Instead of updating descriptor sets with `vkUpdateDescriptorSets`, `vkGetDescriptorEXT` could point directly to the mapped descriptor buffer, or a scratch buffer can be used and copied later. 762 763== Example 764 765This example intends to show: 766 767 * How to create descriptor set layouts 768 * How to use immutable samplers with descriptor buffers 769 * How to use embedded immutable samplers 770 * How to use push descriptors 771 * How to allocate enough descriptor buffer memory 772 * How to bind ranges of descriptor buffers to descriptor sets 773 774[source,c] 775---- 776VkSampler immutableSamplers[4]; // Create these somehow. 777 778// When using descriptor buffers, it is generally a good idea to separate out samplers and resources into separate sets, 779// since descriptor buffers containing samplers might be very limited in size. 
const VkDescriptorSetLayoutBinding setLayout0[] =
{
    {
        0,                                        // binding
        VK_DESCRIPTOR_TYPE_SAMPLED_IMAGE,         // descriptorType
        2,                                        // descriptorCount
        VK_SHADER_STAGE_FRAGMENT_BIT,             // stageFlags
        NULL                                      // pImmutableSamplers
    },
    {
        1,                                        // binding
        VK_DESCRIPTOR_TYPE_UNIFORM_TEXEL_BUFFER,  // descriptorType
        2,                                        // descriptorCount
        VK_SHADER_STAGE_FRAGMENT_BIT,             // stageFlags
        NULL                                      // pImmutableSamplers
    }
};

const VkDescriptorSetLayoutBinding setLayout1[] =
{
    {
        0,                                        // binding
        VK_DESCRIPTOR_TYPE_SAMPLER,               // descriptorType
        2,                                        // descriptorCount
        VK_SHADER_STAGE_FRAGMENT_BIT,             // stageFlags
        &immutableSamplers[0],                    // pImmutableSamplers
    },
    {
        1,                                        // binding
        VK_DESCRIPTOR_TYPE_SAMPLER,               // descriptorType
        2,                                        // descriptorCount
        VK_SHADER_STAGE_FRAGMENT_BIT,             // stageFlags
        NULL,                                     // pImmutableSamplers
    }
};

const VkDescriptorSetLayoutBinding setLayout2[] =
{
    // Binding for a single storage buffer descriptor; this layout is used for the push descriptor set.
    {
        0,                                        // binding
        VK_DESCRIPTOR_TYPE_STORAGE_BUFFER,        // descriptorType
        1,                                        // descriptorCount
        VK_SHADER_STAGE_FRAGMENT_BIT,             // stageFlags
        NULL                                      // pImmutableSamplers
    }
};

// Embedded immutable samplers are internally allocated and we do not need to allocate anything.
const VkDescriptorSetLayoutBinding setLayout3[] =
{
    {
        0,                                        // binding
        VK_DESCRIPTOR_TYPE_SAMPLER,               // descriptorType
        1,                                        // descriptorCount
        VK_SHADER_STAGE_FRAGMENT_BIT,             // stageFlags
        &immutableSamplers[2],                    // pImmutableSamplers
    },
    {
        1,                                        // binding
        VK_DESCRIPTOR_TYPE_SAMPLER,               // descriptorType
        1,                                        // descriptorCount
        VK_SHADER_STAGE_FRAGMENT_BIT,             // stageFlags
        &immutableSamplers[3],                    // pImmutableSamplers
    }
};

// Descriptor set layouts are created as normal, but we use the descriptor buffer flag on the set layouts.
VkDescriptorSetLayout layout0 =
    create_descriptor_set_layout({ .flags = VK_DESCRIPTOR_SET_LAYOUT_CREATE_DESCRIPTOR_BUFFER_BIT_EXT, .pBindings = setLayout0, .bindingCount = 2 });
VkDescriptorSetLayout layout1 =
    create_descriptor_set_layout({ .flags = VK_DESCRIPTOR_SET_LAYOUT_CREATE_DESCRIPTOR_BUFFER_BIT_EXT, .pBindings = setLayout1, .bindingCount = 2 });
VkDescriptorSetLayout layout2 =
    create_descriptor_set_layout({ .flags =
        VK_DESCRIPTOR_SET_LAYOUT_CREATE_DESCRIPTOR_BUFFER_BIT_EXT |
        VK_DESCRIPTOR_SET_LAYOUT_CREATE_PUSH_DESCRIPTOR_BIT_KHR,
        .pBindings = setLayout2, .bindingCount = 1 });
VkDescriptorSetLayout layout3 =
    create_descriptor_set_layout({ .flags =
        VK_DESCRIPTOR_SET_LAYOUT_CREATE_DESCRIPTOR_BUFFER_BIT_EXT |
        VK_DESCRIPTOR_SET_LAYOUT_CREATE_EMBEDDED_IMMUTABLE_SAMPLERS_BIT_EXT,
        .pBindings = setLayout3, .bindingCount = 2 });

// Use 5 descriptor set layouts, mostly here to demonstrate how multiple sets can refer to one descriptor buffer.
// Also, use embedded sampler sets and push descriptors for completeness.
VkPipelineLayout layout = create_pipeline_layout({ .layouts = { layout0, layout0, layout1, layout2, layout3 }});

// Query how big the descriptor set layout is.
VkDeviceSize layoutSizes[2];
vkGetDescriptorSetLayoutSizeEXT(device, layout0, &layoutSizes[0]);
vkGetDescriptorSetLayoutSizeEXT(device, layout1, &layoutSizes[1]);

// Align the descriptor set size so it is suitable for suballocation within a descriptor buffer.
layoutSizes[0] = align(layoutSizes[0], props.descriptorBufferOffsetAlignment);
layoutSizes[1] = align(layoutSizes[1], props.descriptorBufferOffsetAlignment);

// Query individual offsets into the descriptor set.
VkDeviceSize layoutOffsets[2][2];
vkGetDescriptorSetLayoutBindingOffsetEXT(device, layout0, 0, &layoutOffsets[0][0]);
vkGetDescriptorSetLayoutBindingOffsetEXT(device, layout0, 1, &layoutOffsets[0][1]);
vkGetDescriptorSetLayoutBindingOffsetEXT(device, layout1, 0, &layoutOffsets[1][0]);
vkGetDescriptorSetLayoutBindingOffsetEXT(device, layout1, 1, &layoutOffsets[1][1]);

#define SET_COUNT 64

// Allocate the equivalent of a big descriptor pool.
// The size is arbitrary; it should be large enough to hold all descriptors used by the application.
// For this sample, we allocate the smallest possible descriptor buffer for the number of sets we need.
// The most compatible thing to do is 1 resource buffer, 1 sampler buffer.
Buffer resourceBuffer = create_buffer({
    .size = layoutSizes[0] * 2 * SET_COUNT,
    .usage = VK_BUFFER_USAGE_RESOURCE_DESCRIPTOR_BUFFER_BIT_EXT |
             (props.bufferlessPushDescriptors ? 0 : VK_BUFFER_USAGE_PUSH_DESCRIPTORS_DESCRIPTOR_BUFFER_BIT_EXT),
    .properties = VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT | VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT });

Buffer samplerBuffer = create_buffer({
    .size = layoutSizes[1] * SET_COUNT,
    .usage = VK_BUFFER_USAGE_SAMPLER_DESCRIPTOR_BUFFER_BIT_EXT,
    .properties = VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT | VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT });

const VkDescriptorBufferBindingPushDescriptorBufferHandleEXT push_descriptor_buffer_handle = {
    VK_STRUCTURE_TYPE_DESCRIPTOR_BUFFER_BINDING_PUSH_DESCRIPTOR_BUFFER_HANDLE_EXT, NULL, resourceBuffer.handle};

const VkDescriptorBufferBindingInfoEXT binding_infos[2] = {
    { VK_STRUCTURE_TYPE_DESCRIPTOR_BUFFER_BINDING_INFO_EXT, (props.bufferlessPushDescriptors ? NULL : &push_descriptor_buffer_handle),
      resourceBuffer.deviceAddress,
      VK_BUFFER_USAGE_RESOURCE_DESCRIPTOR_BUFFER_BIT_EXT | (props.bufferlessPushDescriptors ? 0 : VK_BUFFER_USAGE_PUSH_DESCRIPTORS_DESCRIPTOR_BUFFER_BIT_EXT) },
    { VK_STRUCTURE_TYPE_DESCRIPTOR_BUFFER_BINDING_INFO_EXT, NULL, samplerBuffer.deviceAddress,
      VK_BUFFER_USAGE_SAMPLER_DESCRIPTOR_BUFFER_BIT_EXT }
};

// Bind the descriptor buffers once; from here, we will offset into the buffers for different descriptor sets.
vkCmdBindDescriptorBuffersEXT(cmd, VK_PIPELINE_BIND_POINT_GRAPHICS, 0, 2, binding_infos);

// Allocate these somehow, not particularly important to this example.
VkImageView views[SET_COUNT][2][2];
VkSampler samplers[SET_COUNT][2];
VkDeviceAddress bufferAddressTexelBuffer;

// No buffers are associated with embedded immutable samplers. This maps to DX12 static samplers.
// There is no vkCmdBindPipelineLayout(), so this is the way to do it in Vulkan.
vkCmdBindDescriptorBufferEmbeddedSamplersEXT(cmd, VK_PIPELINE_BIND_POINT_GRAPHICS, layout, 4);

for (int i = 0; i < SET_COUNT; i++)
{
    // This refers to the buffers we bound in vkCmdBindDescriptorBuffersEXT.
    // Allocate descriptor sets linearly.
    const uint32_t bufferIndices[] = { 0, 0, 1 };
    const VkDeviceSize offsets[] = { 2 * i * layoutSizes[0], (2 * i + 1) * layoutSizes[0], i * layoutSizes[1] };

    // Set 0: Resource set pulled from buffer 0
    // Set 1: Resource set pulled from buffer 0
    // Set 2: Sampler set pulled from buffer 1
    // Set 3: Push descriptors
    // Set 4: Embedded samplers

    vkCmdSetDescriptorBufferOffsetsEXT(cmd, VK_PIPELINE_BIND_POINT_GRAPHICS, layout, 0, 3,
                                       bufferIndices, offsets);

    VkWriteDescriptorSet ssbo_write = { /* Fill in as desired, details not interesting here.
    */ };
    vkCmdPushDescriptorSetKHR(cmd, VK_PIPELINE_BIND_POINT_GRAPHICS, layout, 3, 1, &ssbo_write);

    VkDescriptorImageInfo image_info = {};
    VkDescriptorAddressInfoEXT addr_info = { VK_STRUCTURE_TYPE_DESCRIPTOR_ADDRESS_INFO_EXT };
    VkDescriptorGetInfoEXT info = { VK_STRUCTURE_TYPE_DESCRIPTOR_GET_INFO_EXT };

    for (int j = 0; j < 2; j++)
    {
        info.type = VK_DESCRIPTOR_TYPE_SAMPLED_IMAGE;
        info.data.pSampledImage = &image_info;
        // If descriptorBufferImageLayoutIgnored is enabled, this is ignored, convenient!
        image_info.imageLayout = VK_IMAGE_LAYOUT_SHADER_READ_ONLY_OPTIMAL;

        // Offset is based on the offset of the set within the buffer + the binding offset within the descriptor set layout we queried earlier.
        // For array indexing, use the descriptor size from the physical device properties.
        // set j, binding 0, element k
        for (int k = 0; k < 2; k++)
        {
            image_info.imageView = views[i][j][k];
            vkGetDescriptorEXT(device, &info, props.sampledImageDescriptorSize,
                               resourceBuffer.hostPointer + offsets[j] + layoutOffsets[0][0] + k * props.sampledImageDescriptorSize);
        }

        // set j, binding 1, element k
        info.type = VK_DESCRIPTOR_TYPE_UNIFORM_TEXEL_BUFFER;
        info.data.pUniformTexelBuffer = &addr_info;
        for (int k = 0; k < 2; k++)
        {
            addr_info.range = 1024;
            addr_info.address = bufferAddressTexelBuffer + (4 * i + 2 * j + k) * addr_info.range;
            // No VkBufferView needed, how convenient!
            addr_info.format = VK_FORMAT_R8G8B8A8_UNORM;
            vkGetDescriptorEXT(device, &info, props.uniformTexelBufferDescriptorSize,
                               resourceBuffer.hostPointer + offsets[j] + layoutOffsets[0][1] + k * props.uniformTexelBufferDescriptorSize);
        }
    }

    // For immutable samplers, we have to emit the buffer payload.
    // In practice, the immutable samplers must work even if implementation just ignores pImmutableSamplers.
    info.type = VK_DESCRIPTOR_TYPE_SAMPLER;
    // set 2, binding 0, element k
    for (int k = 0; k < 2; k++)
    {
        info.data.pSampler = &immutableSamplers[k];
        vkGetDescriptorEXT(device, &info, props.samplerDescriptorSize,
                           samplerBuffer.hostPointer + offsets[2] + layoutOffsets[1][0] + k * props.samplerDescriptorSize);
    }

    // set 2, binding 1, element k
    for (int k = 0; k < 2; k++)
    {
        info.data.pSampler = &samplers[i][k];
        vkGetDescriptorEXT(device, &info, props.samplerDescriptorSize,
                           samplerBuffer.hostPointer + offsets[2] + layoutOffsets[1][1] + k * props.samplerDescriptorSize);
    }

    vkCmdDraw(...);
}
----

== Issues

=== RESOLVED: How do immutable samplers work?

There may be cases where a driver needs immutable samplers stored as part of the descriptor, rather than solely existing as a part of the pipeline.
With descriptor sets, this could be hidden from the application as the driver controlled how writes were performed – not so with this API.
To fix this, samplers must be used to populate these descriptor bindings as if they were not immutable, and they must have been created with identical parameters.

For parity with DX12, a special kind of descriptor set, embedded immutable samplers, is supported as an alternative that follows DX12 restrictions.

=== RESOLVED: Should we support dynamic buffers?

No, these have very specialized support paths in some drivers, and end up being more pain than it’s worth to support.
Applications can achieve the same using device addresses in push constants, or pipelined descriptor buffer updates.


=== UNRESOLVED: How does this interact with descriptor set invalidation?

There’s some extra complication with whether descriptor set layouts work with buffers or sets (`VK_DESCRIPTOR_SET_LAYOUT_CREATE_DESCRIPTOR_BUFFER_BIT_EXT`) that will need sorting.
This shouldn’t be too difficult and will likely just be along the lines of invalidating sets that don’t match in this regard when binding a new pipeline layout, but it’s too much detail for this design document.


=== RESOLVED: Should `vkGetDescriptorOffset` take an `arrayOffset` parameter, or should we make guarantees about how arrays work?

Guarantees about how arrays work make it much easier to work with GPU-side updates, as they avoid having to either add a “get offset” shader intrinsic, or for apps to keep a mapping when doing GPU copies.


=== RESOLVED: Now that descriptors are in regular memory, should there be a limit on the size of “inline uniforms”?

We should allow developers to put as many constants into descriptor buffers as they want, thus removing the limit, at least when it interacts with this extension.
This is likely to remove an indirection compared to putting these in a uniform buffer.
Potentially we might want to at least have it match the uniform buffer limit rather than being independent.


=== RESOLVED: Why are view objects required when DX12 has no such requirement?

DX12 has dedicated heap objects which allow implementations to hide a lot of implementation detail behind them; without them, some vendors rely on view objects to store metadata.
Introducing heaps to Vulkan as-is was too complex alongside the other changes in this extension, when the primary goal is to enable explicit memory management, rather than precise DX12 compatibility.
If this turns out to be a significant problem, a future extension could be developed to bridge this gap.


=== RESOLVED: Should `vkGetDescriptorEXT` / `vkGetDescriptorSetLayoutBindingOffsetEXT` be arrayed?

No – there is no reason why pulling this loop into the driver should provide any benefit.


=== RESOLVED: Should we support combined image/sampler descriptors with this extension?

While some consider these deprecated, removing them would prevent some applications being able to port to this extension.
Additionally, YCbCr support currently _relies_ on this descriptor type, which is required on some platforms.
It might be possible to remove that requirement in the YCbCr feature, but it is a lot of work for a fairly low payoff.


=== RESOLVED: How does this interact with variable descriptor count?

The variable flag is allowed; `vkGetDescriptorSetLayoutSizeEXT` returns a size assuming the maximum descriptor count will be used, but developers are free to use the set with a buffer sized for a smaller number of descriptors. The exception to this is when `combinedImageSamplerDescriptorSingleArray` is `VK_FALSE` and the binding contains `VK_DESCRIPTOR_TYPE_COMBINED_IMAGE_SAMPLER` descriptors; in this case the image and sampler descriptors are still arranged in the descriptor buffer as though the maximum number of descriptors is used, and so the buffer must be sized accordingly.


=== RESOLVED: Should we require descriptors to be retrieved for `NULL_HANDLE` or is `memset(0)` sufficient?

Some vendors use non-zero values for null descriptors, so applications can retrieve these using `VK_NULL_HANDLE` with `vkGetDescriptorEXT`.
For descriptor types which take buffer device addresses, a `0` address is used instead.

=== RESOLVED: How can YCbCr descriptors be obtained?

YCbCr descriptors can have multiple descriptors associated with them; applications must allow for this space.
`VkSamplerYcbcrConversionImageFormatProperties::combinedImageSamplerDescriptorCount` determines how many descriptors each image format requires.
When calling `vkGetDescriptorEXT` for a YCbCr combined descriptor, applications must provide a pointer to enough memory for this many combined image sampler descriptors, and factor this in when copying descriptors.
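
The space requirement for a YCbCr combined descriptor reduces to a single multiplication, sketched below.
`ycbcr_descriptor_bytes` is a hypothetical helper, not part of the extension; its two arguments stand in for `combinedImageSamplerDescriptorSize` and `combinedImageSamplerDescriptorCount`, and the count is assumed to be at least 1.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bytes to reserve (and to copy) for one YCbCr combined image sampler. */
size_t ycbcr_descriptor_bytes(size_t combinedImageSamplerDescriptorSize,
                              uint32_t combinedImageSamplerDescriptorCount)
{
    /* Assumption: the reported count is never zero. */
    assert(combinedImageSamplerDescriptorCount >= 1);
    /* Guard against overflow in the multiplication below. */
    assert(combinedImageSamplerDescriptorSize <=
           SIZE_MAX / combinedImageSamplerDescriptorCount);

    return combinedImageSamplerDescriptorSize * combinedImageSamplerDescriptorCount;
}
```

The same byte count would apply when copying such a descriptor with `memcpy`, since all of its constituent descriptors have to move together.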


=== RESOLVED: How should we expect capture/replay tooling (e.g. RenderDoc/vktrace) to use this?

A capture replay bit on image/buffer creation will be added to enable descriptors to be reused between runs.
This allows capture tools to capture the buffer data as bound, and replay with the same descriptors, rather than attempting to do a mapping.
Some sort of GPU feedback is still desirable on capture to determine which handles are accessed, but this will be similar to the situation with descriptor indexing.


=== RESOLVED: On some platforms, descriptor sets occupy a 4GB range, allowing the set pointer to be 32-bit, rather than 64-bit. How can this be guaranteed for descriptor buffers?

This could be done a number of ways – e.g. having unique memory types that guarantee allocation in a 4GB range.


=== RESOLVED: Should the alignment be separate from the size?

No - the alignment of a descriptor is always the size of the descriptor.


=== RESOLVED: What is the fast path for constant data in this new model? Previously most vendors have recommended dynamic UBOs as a fast path, but those go away in this extension.

How quickly data gets into a shader is mostly determined by the number of indirections and by cache behavior.
Static accesses with fewer indirections and minimal memory model interactions (e.g. read-only and not `NonPrivate`) will be fastest.
Push constants should be favored for small amounts of data.
For larger amounts of data, applications should favor allocating buffers and putting data into those buffers according to whichever of the below API mechanisms is most straightforward for their use case, with some potential degradation at each step.

 * Push constants
 * Pointer to data in push constants
 * Inline uniform data in descriptor buffers
 * Push descriptors
 * Uniform buffer in descriptor memory
 * Storage buffer in descriptor memory

The order listed above does not necessarily hold for all IHVs.


=== RESOLVED: Should applications be able to mix sets and buffers?

Originally the intention was to support this, but at least one vendor cannot support this natively.


=== RESOLVED: Should we use buffer device addresses for the buffer arguments?

Buffer parameters in recent extensions have been using device address arguments, so this extension aims to be consistent.
Part of the reason for using device addresses is that the base address can be modified with a single pointer argument instead of object + offset.
However, this extension explicitly uses a separate command for setting the offset dynamically, distinct from the command that sets the base address, so that the application can set the base address statically.
Having the base address specified with a device address is still useful for consistency though.


=== RESOLVED: How does this interact with VK_EXT_pipeline_robustness?

There is no way to request robust and non-robust descriptors separately, or specify robust/non-robust descriptors in the set layout, so if the `robustBufferAccess` feature is enabled then robust descriptors are always used.