// Copyright 2021-2024 The Khronos Group Inc.
//
// SPDX-License-Identifier: CC-BY-4.0

= VK_EXT_descriptor_buffer
:toc: left
:refpage: https://registry.khronos.org/vulkan/specs/1.3-extensions/man/html/
:sectnums:

This document outlines a proposal to make the management of descriptor memory more explicit by allowing descriptors to be present in buffer memory, so that the data and memory can be managed alongside other buffer objects.

== Problem Statement

With more “bindless” models of descriptor management, applications are ever increasing the number of descriptors that end up in descriptor sets.
Managing allocations this large, and ensuring they end up in device local memory for fast access, is becoming an increasingly awkward problem to manage in the driver.
Developers moving to Vulkan are starting to hit bottlenecks that they simply don’t encounter on other platforms.

In other scenarios, making sure descriptors *do not* end up in device memory is important.
Copying descriptors in Vulkan is considered rather esoteric, but it is a fairly common strategy in other APIs and implementing a similar style in Vulkan can lead to problems.
There is no hint to let an implementation know that a descriptor set will only be used for purposes of copying (i.e. as a staging buffer).
If a descriptor set is mapped to device local memory (BAR) or uncached memory, reading from the descriptor set on the host can have a catastrophic effect on performance.
On top of this, some applications rely on being able to copy tens of thousands of individual descriptors every frame.
The overhead to set up this many calls to `vkUpdateDescriptorSets` is not ideal.

In contrast to this, developers are managing uploads for other large resources (e.g. images, buffers) in application code and generally doing a good job of it – typically this is not identified as a problem area.
Developers approaching Vulkan are often confused by the way in which descriptor pools work - and several have made requests to manage things more explicitly.
The key things that we’ve had requests for are (relevant Vulkan issues in brackets):

  * Explicit allocation management
  * Better mapping to DirectX 12
  * Host-only descriptor pools
  * GPU descriptor updates

== Solution Space

There are several more-or-less invasive options that could work here:

  . Add relevant flags and other information to descriptor pools
  . Like 1, but enable memory binding for descriptor pools
  . Bypass descriptor pools, and allow direct creation and memory binding for descriptor sets
  . Bypass descriptor sets, and use descriptor set layouts in buffers
  . Bypass descriptor set layouts, and use blobs of memory in buffers that shaders access with explicit layouts

link:{refpage}VK_VALVE_mutable_descriptor_type.html[VK_VALVE_mutable_descriptor_type] includes support for option 1,
through the use of `VK_DESCRIPTOR_SET_LAYOUT_CREATE_HOST_ONLY_POOL_BIT_VALVE` and `VK_DESCRIPTOR_POOL_CREATE_HOST_ONLY_BIT_VALVE`.
However, this does not fully solve the problem of memory management since we can only *avoid* allocating device memory for descriptors.
Being able to control where shader-accessible descriptors are allocated is still unavailable to applications.

Option 2 attempts to redefine what a descriptor pool is, and it would seem like a very awkward abstraction.
The whole point of the descriptor pool is to allocate and manage memory on behalf of the application.

Options 3 and 4 are similarly invasive, but move descriptor pools out of the way, making things a lot clearer.
The major downside to this is that it potentially blocks out older implementations; however this is likely the same set of implementations that wouldn’t see a benefit from this proposal anyway (i.e. “non-bindless” hardware).

Option 4 has the advantage of having a smaller surface area than option 3 and allows applications to use existing buffer management functions in both Vulkan and in their own code.
Being able to use buffers directly means that applications are in control of where the memory is allocated and can control if memory is:

 * Host-only (plain malloc)
 * Host-only but shader-visible (`VkDeviceMemory` with `HOST_VISIBLE_BIT`)
 * Device local and shader-visible (resizable BAR on discrete GPUs, unified memory on integrated)
 * Device local only (GPU copies descriptors)

Option 5 is more invasive than Option 4 and requires shader-side changes.

In order to keep the required changes in this extension to the API only, the extra steps in Option 5 are deferred to a future planned extension, and this proposal focuses on Option 4.


== Proposal

=== Modelling a descriptor set as memory

Descriptors in Vulkan as it stands are generally considered quite abstract.
They do not have a size, and when creating descriptor pools it is only specified how many descriptors can be allocated.

This proposal removes that abstraction and assumes that a `VkDescriptorSetLayout` can be expressed as a list of binding points with a known:

 * Byte offset
 * Element size
 * Number of elements tightly packed

The element size depends on the descriptor type and is a property of the physical device.

Implementations are free to control the byte offset, and so can freely repack descriptors for optimal memory access.
For exact control over byte offsets for different descriptors, descriptor indexing should be used, since arrays have guaranteed packing.

If we think in terms of `VkDescriptorPool` with this model, an implementation of that could be something like an arena allocator where size is derived from the descriptor counts,
and a `VkDescriptorSet` with `VkDescriptorSetLayout` just allocates a certain number of bytes from the pool.
This is essentially the same model as `VkBuffer` and `VkImage` allocation.
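
The arena view of a descriptor pool described above can be sketched in a few lines of plain C.
This is only an illustrative model, not how any driver is known to work: `DescriptorArena`, `arena_capacity`, and `arena_alloc_set` are hypothetical names, and the per-type sizes are made-up inputs rather than values queried from a device.

[source,c]
----
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model: a descriptor pool as a bump (arena) allocator. */
typedef struct DescriptorArena {
    uint64_t capacity; /* total bytes, derived from the descriptor counts */
    uint64_t used;     /* bytes handed out to sets so far */
} DescriptorArena;

/* Pool size = sum over descriptor types of (count * per-type size). */
static uint64_t arena_capacity(const uint32_t *counts,
                               const uint64_t *sizes,
                               size_t typeCount)
{
    uint64_t total = 0;
    for (size_t i = 0; i < typeCount; ++i)
        total += (uint64_t)counts[i] * sizes[i];
    return total;
}

/* "Allocating a set" reserves layoutSize bytes from the arena.
 * Returns the byte offset of the set, or UINT64_MAX if the pool is full. */
static uint64_t arena_alloc_set(DescriptorArena *arena, uint64_t layoutSize)
{
    assert(arena != NULL);
    if (layoutSize > arena->capacity - arena->used)
        return UINT64_MAX;
    uint64_t offset = arena->used;
    arena->used += layoutSize;
    return offset;
}
----

With descriptor buffers, the application owns this bookkeeping itself instead of delegating it to the pool.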

When we call `vkCmdBindDescriptorSets`, what we are really doing is binding a buffer of a certain size.
The shader compiler looks at `VkPipelineLayout` and based on the `DescriptorSet` and `Binding` decorations, it can look up that a descriptor can be read from the bound descriptor set at a specific offset.

As link:{refpage}VK_EXT_descriptor_indexing.html[VK_EXT_descriptor_indexing] is required, its descriptor limits apply.

==== Next level update-after-bind

With descriptors being modelled as buffer memory, we remove all pretense of the implementation being able to consume descriptors when recording the command buffer.
In the Vulkan 1.0 descriptor model, descriptors must be valid when descriptor sets are bound and remain valid, which means implementations are free to consume the descriptors, repack them, and so on if they desire.
With descriptor indexing, the `UPDATE_AFTER_BIND_BIT` and `PARTIALLY_BOUND_BIT` flags imply a buffer-like model where descriptors must not be consumed unless dynamically used by shaders.
With descriptor buffers, this model is implied, and a descriptor set layout must not be specified as both update-after-bind and descriptor buffer capable.

As descriptors can be updated in the GPU timeline, descriptor buffers go a bit further than update-after-bind.
In the existing update-after-bind model, descriptors can only be consumed correctly if they were written before queue submits.

==== Dropping support for abstract descriptor types

Some descriptor types are a bit more abstract in nature.
Dynamic uniform buffers and dynamic storage buffers, for example, have a component that does not consume descriptor memory, but functions more like push constants.
Descriptor types which cannot be expressed in terms of descriptors in memory are not supported with descriptor buffers,
but rapidly changing descriptors can be replaced with existing alternatives such as:

 * Push constants
 * Place buffer device address in push constants
 * Push descriptors

Update-after-bind has similar restrictions already.

==== One buffer, many offsets

While binding descriptor sets as memory is possible on a wide range of hardware, descriptors are still considered "special" memory by many implementations, and it may not be possible to bind many different buffers at the same time.
Possible restrictions include:

 * Limited address space for descriptors
 * Descriptor sets are accessed with offset from one or more base pointers

In Vulkan, applications are guaranteed at least 4 descriptor sets, but many implementations go beyond this.
At the same time, it might not be possible to bind that many different descriptor buffers.

In D3D12 for example, this problem manifests itself as `ID3D12GraphicsCommandList::SetDescriptorHeaps()`.

Similarly, this extension will work on a model where applications allocate large descriptor buffers, and bind those buffers to the command buffer.
From there, descriptor sets are expressed as offsets into the bound buffers.

It is expected that changing a descriptor buffer binding is a fairly heavy operation on some implementations and should be avoided.
Changing offsets, however, is very efficient.

A limited address space can be expressed with special memory types that allocate from a dedicated address space region.

==== No mixing and matching descriptor buffers and older model

The implication of descriptor buffers is that applications will now take more control over which descriptor buffers are bound to a command buffer.
Without descriptor buffers, this is something implementations were able to hide from applications, so it is not possible to mix and match these models in one draw or dispatch.
It is possible to mix and match the two models in different draws or dispatches, but it is equivalent to changing the descriptor buffer bindings and should be avoided if possible.

In terms of state invalidation, whenever a descriptor buffer offset is bound, it invalidates all bindings for descriptor sets and vice versa.
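
The invalidation rule above can be pictured as a small state model.
The following is a sketch only: `BindState`, `SetSource`, and the `model_*` functions are hypothetical names for illustration and do not correspond to any Vulkan API; the limit of four sets is likewise just the guaranteed minimum mentioned earlier.

[source,c]
----
#include <assert.h>
#include <stdint.h>

#define SKETCH_MAX_SETS 4u /* illustrative; real limits come from the device */

typedef enum SetSource {
    SET_UNBOUND = 0,
    SET_FROM_DESCRIPTOR_SET, /* bound through the older descriptor set model */
    SET_FROM_BUFFER_OFFSET   /* bound as an offset into a descriptor buffer */
} SetSource;

typedef struct BindState {
    SetSource sets[SKETCH_MAX_SETS];
} BindState;

/* Drops every binding that was made through the other model. */
static void invalidate_all_except(BindState *state, SetSource keep)
{
    for (uint32_t i = 0; i < SKETCH_MAX_SETS; ++i) {
        if (state->sets[i] != keep)
            state->sets[i] = SET_UNBOUND;
    }
}

/* Models binding a descriptor set: all buffer-offset bindings become invalid. */
static void model_bind_descriptor_set(BindState *state, uint32_t set)
{
    assert(set < SKETCH_MAX_SETS);
    invalidate_all_except(state, SET_FROM_DESCRIPTOR_SET);
    state->sets[set] = SET_FROM_DESCRIPTOR_SET;
}

/* Models setting a buffer offset: all descriptor set bindings become invalid. */
static void model_set_buffer_offset(BindState *state, uint32_t set)
{
    assert(set < SKETCH_MAX_SETS);
    invalidate_all_except(state, SET_FROM_BUFFER_OFFSET);
    state->sets[set] = SET_FROM_BUFFER_OFFSET;
}
----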

=== Putting Descriptors in Memory

This extension introduces new commands to put shader-accessible descriptors directly in memory.
Properties of descriptor set layouts may vary based on enabled device features, so new device-level functions are added to query the properties of layouts.
These calls are invariant across the lifetime of the device, and between link:{refpage}VkDevice.html[VkDevice] objects created from the same physical device(s), with the same creation parameters.

[source,c]
----
void vkGetDescriptorSetLayoutSizeEXT(
    VkDevice                                    device,
    VkDescriptorSetLayout                       layout,
    VkDeviceSize*                               pLayoutSizeInBytes);

void vkGetDescriptorSetLayoutBindingOffsetEXT(
    VkDevice                                    device,
    VkDescriptorSetLayout                       layout,
    uint32_t                                    binding,
    VkDeviceSize*                               pOffset);
----

Applications are responsible for writing data into memory, but the application does not control the memory location directly – descriptor set layouts dictate where each descriptor lives, so that the shader interface continues to work as-is with set and binding numbers.

The sizes and offsets of descriptors are exposed to applications, so they know how to copy them into memory.
This is important since applications are free to copy descriptors on the device itself.

The sizes for different descriptor types are defined in the properties: `samplerDescriptorSize`, `combinedImageSamplerDescriptorSize`, `sampledImageDescriptorSize`, `storageImageDescriptorSize`, `uniformTexelBufferDescriptorSize`, `robustUniformTexelBufferDescriptorSize`, `storageTexelBufferDescriptorSize`, `robustStorageTexelBufferDescriptorSize`, `uniformBufferDescriptorSize`, `robustUniformBufferDescriptorSize`, `storageBufferDescriptorSize`, `robustStorageBufferDescriptorSize`, `inputAttachmentDescriptorSize`, `accelerationStructureDescriptorSize`, `combinedImageSamplerDensityMapDescriptorSize`.

Descriptor arrays have guaranteed packing, such that each element of an array for a given binding has an offset from that binding’s base offset equal to the size of the descriptor multiplied by the array index.
Bindings can be moved around as the driver sees fit, but variable-sized descriptor arrays must be packed at the end.

For use cases where layouts contain a variable-sized descriptor count, the size returned reflects the upper bound described in the descriptor set layout.
The size required for a descriptor set layout with a variable-sized descriptor array can be obtained by adding the product of the number of descriptors that are actually used and the size of the descriptor to the offset of that binding.
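
The packing rules above reduce to two small pieces of arithmetic, sketched here in plain C.
The function names are hypothetical, and the sketch assumes the variable-sized array is the last binding in the layout, so that the required size is the binding offset plus the bytes actually used.
In a real application, the offsets would come from `vkGetDescriptorSetLayoutBindingOffsetEXT` and the sizes from the device properties.

[source,c]
----
#include <assert.h>
#include <stdint.h>

/* Byte offset of array element `index` within a binding: arrays are tightly
 * packed, so elements sit at multiples of the descriptor size. */
static uint64_t descriptor_element_offset(uint64_t bindingOffset,
                                          uint64_t descriptorSize,
                                          uint32_t index)
{
    return bindingOffset + (uint64_t)index * descriptorSize;
}

/* Bytes needed for a set whose trailing binding is a variable-sized array
 * of which only `usedCount` (out of at most `maxCount`) descriptors are used. */
static uint64_t variable_layout_size(uint64_t variableBindingOffset,
                                     uint64_t descriptorSize,
                                     uint32_t usedCount,
                                     uint32_t maxCount)
{
    assert(usedCount <= maxCount);
    return variableBindingOffset + (uint64_t)usedCount * descriptorSize;
}
----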

Descriptor set layouts used for this purpose must be created with a new create flag:

[source,c]
----
VK_DESCRIPTOR_SET_LAYOUT_CREATE_DESCRIPTOR_BUFFER_BIT_EXT = 0x00000010
----

Layouts created with this flag must not be used to create a link:{refpage}VkDescriptorSet.html[VkDescriptorSet] and must not include dynamic uniform buffers or dynamic storage buffers.
Applications can achieve the same dynamic offsetting by either updating a descriptor buffer, using push constants, or by using push descriptors.
The blob of memory corresponding to a descriptor is obtained from resource views directly.
How applications get that data into device memory is entirely up to them, but the offset must match that obtained from the layout.

[source,c]
----
typedef struct VkDescriptorAddressInfoEXT {
    VkStructureType                                 sType;
    const void*                                     pNext;
    VkDeviceAddress                                 address;
    VkDeviceSize                                    range;
    VkFormat                                        format;
} VkDescriptorAddressInfoEXT;

typedef union VkDescriptorDataEXT {
    const VkSampler*                                pSampler;
    const VkDescriptorImageInfo*                    pCombinedImageSampler;
    const VkDescriptorImageInfo*                    pInputAttachmentImage;
    const VkDescriptorImageInfo*                    pSampledImage;
    const VkDescriptorImageInfo*                    pStorageImage;
    const VkDescriptorAddressInfoEXT*               pUniformTexelBuffer;
    const VkDescriptorAddressInfoEXT*               pStorageTexelBuffer;
    const VkDescriptorAddressInfoEXT*               pUniformBuffer;
    const VkDescriptorAddressInfoEXT*               pStorageBuffer;
    VkDeviceAddress                                 accelerationStructure;
} VkDescriptorDataEXT;

typedef struct VkDescriptorGetInfoEXT {
    VkStructureType                                 sType;
    const void*                                     pNext;
    VkDescriptorType                                type;
    VkDescriptorDataEXT                             data;
} VkDescriptorGetInfoEXT;

void vkGetDescriptorEXT(
    VkDevice                                        device,
    const VkDescriptorGetInfoEXT*                   pCreateInfo,
    size_t                                          dataSize,
    void*                                           pDescriptor);
----

These APIs extract raw descriptor blob data from objects.
The data obtained from these calls can be freely copied around.
Note that these calls do not know anything about descriptor set layouts.
It is the application's responsibility to write descriptors to a suitable location.

A notable change here is that there is no longer any need for link:{refpage}VkBufferView.html[VkBufferView] objects.
Texel buffers are built from buffer device addresses and format instead.
This improvement is motivated by DX12 portability.
In some use cases, texel buffers are linearly allocated and having to create and manage a large number of unique view objects is problematic.
With descriptor buffers, this style of API is now feasible in Vulkan.

A similar improvement is that uniform buffers and storage buffers also take buffer device addresses.

Acceleration structure descriptors are also built from device addresses, or handles retrieved from `vkGetAccelerationStructureHandleNV` when using `VkAccelerationStructureNV` objects.

Inline uniform buffers do not have a descriptor data getter API associated with them.
Instead, the descriptor data is copied directly into the buffer offset obtained by `vkGetDescriptorSetLayoutBindingOffsetEXT`.
As the name suggests, inline uniform buffers are embedded into the descriptor set itself.
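
As a rough illustration of the inline uniform buffer case, the copy is an ordinary memory write at the binding offset.
The helper below is hypothetical and assumes `setBase` points at host-visible memory holding the start of the set, with `bindingOffset` being the value reported for the inline uniform binding.

[source,c]
----
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Copies inline uniform data straight into the descriptor set memory.
 * No descriptor blob is fetched first: the data itself is the payload. */
static void write_inline_uniform_data(uint8_t *setBase,
                                      uint64_t bindingOffset,
                                      const void *data,
                                      size_t dataSize)
{
    assert(setBase != NULL && data != NULL);
    memcpy(setBase + bindingOffset, data, dataSize);
}
----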

As descriptors are now in regular memory, drivers cannot hide copies of immutable samplers that end up in descriptor sets from the application.
As such, applications are required to provide these samplers as if they were not provided immutably.
These samplers must have identical parameters to the immutable samplers in the descriptor set layout.
Alternatively, applications can use dedicated descriptor sets for immutable samplers that do not require app-managed memory, by <<Embedded Immutable Samplers,embedding them in a special descriptor set>>.

If the `descriptorBufferImageLayoutIgnored` feature is enabled, the `imageLayout` in link:{refpage}VkDescriptorImageInfo.html[VkDescriptorImageInfo] is ignored, otherwise it specifies the layout that the descriptor will be used with.
`type` must not be `VK_DESCRIPTOR_TYPE_UNIFORM_BUFFER_DYNAMIC` or `VK_DESCRIPTOR_TYPE_STORAGE_BUFFER_DYNAMIC`.
`format` in `VkDescriptorAddressInfoEXT` is ignored for non-texel buffers.

The `combinedImageSamplerDescriptorSingleArray` property indicates that the implementation does not require an array of `VK_DESCRIPTOR_TYPE_COMBINED_IMAGE_SAMPLER` descriptors to be written into a descriptor buffer as an array of image descriptors, immediately followed by an array of sampler descriptors.
If the property is `VK_FALSE`, applications are expected to write the first `sampledImageDescriptorSize` bytes of the data returned through `pDescriptor` to the first array, and the remaining `samplerDescriptorSize` bytes of the data to the second array.
On these implementations, variable descriptor counts of combined image samplers may be supported, but it is not useful as the descriptor set size must assume the upper bound.
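
For the `VK_FALSE` case, the split write might look like the following sketch.
The helper name and parameters are hypothetical; `blob` stands for the bytes returned through `pDescriptor`, and `imageSize` and `samplerSize` stand for `sampledImageDescriptorSize` and `samplerDescriptorSize`.
The sketch assumes the sampler array starts immediately after `arrayCount` image descriptors, as the text describes.

[source,c]
----
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Writes one combined image sampler into a binding laid out as
 * [image 0 .. image N-1][sampler 0 .. sampler N-1]. */
static void write_split_combined_image_sampler(uint8_t *bindingBase,
                                               uint32_t arrayCount,
                                               uint32_t index,
                                               const uint8_t *blob,
                                               size_t imageSize,
                                               size_t samplerSize)
{
    assert(index < arrayCount);
    /* The first imageSize bytes of the blob go to the image array. */
    uint8_t *imageDst = bindingBase + (size_t)index * imageSize;
    /* The remaining samplerSize bytes go to the sampler array that follows. */
    uint8_t *samplerDst = bindingBase + (size_t)arrayCount * imageSize
                        + (size_t)index * samplerSize;
    memcpy(imageDst, blob, imageSize);
    memcpy(samplerDst, blob + imageSize, samplerSize);
}
----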


==== Embedded Immutable Samplers

Immutable samplers can be embedded into descriptor layouts, allowing them to be bound without disturbing descriptor buffer bindings or requiring device memory backing.
Descriptor set layouts must be created with a new flag for this purpose:

[source,c]
----
VK_DESCRIPTOR_SET_LAYOUT_CREATE_EMBEDDED_IMMUTABLE_SAMPLERS_BIT_EXT = 0x00000020
----

When this flag is used, this set layout can only contain descriptor bindings with a `descriptorType` of `VK_DESCRIPTOR_TYPE_SAMPLER`, a `descriptorCount` of `1` (i.e. not arrayed), and a valid `VkSampler` used in `pImmutableSamplers`.
Note that arrays of immutable samplers are not supported, as implementations typically need these in memory to allow dynamic indexing - whereas no device memory is directly associated with these sets.


=== Pipeline creation

To use pipelines with descriptor buffers, a new `VkPipelineCreateFlag` must be used:

[source,c]
----
VK_PIPELINE_CREATE_DESCRIPTOR_BUFFER_BIT_EXT = 0x20000000
----

=== Descriptor Binding

Descriptor buffers are bound to the command buffer directly (similar to vertex buffers).

[source,c]
----
typedef struct VkDescriptorBufferBindingPushDescriptorBufferHandleEXT {
    VkStructureType                             sType;
    const void*                                 pNext;
    VkBuffer                                    buffer;
} VkDescriptorBufferBindingPushDescriptorBufferHandleEXT;

typedef struct VkDescriptorBufferBindingInfoEXT {
    VkStructureType                             sType;
    const void*                                 pNext;
    VkDeviceAddress                             address;
    VkBufferUsageFlags                          usage;
} VkDescriptorBufferBindingInfoEXT;

void vkCmdBindDescriptorBuffersEXT(
    VkCommandBuffer                             commandBuffer,
    uint32_t                                    bufferCount,
    const VkDescriptorBufferBindingInfoEXT*     pBindingInfos);
----

Unlike binding descriptor sets, there’s no invalidating going on with this binding – a buffer remains bound and is interpreted by a pipeline in the manner the pipeline expects, irrespective of what layout was used to construct the buffer for each set.

There must be no more than `maxSamplerDescriptorBufferBindings` descriptor buffers containing sampler descriptor data bound.
Such buffers must be created with `VK_BUFFER_USAGE_SAMPLER_DESCRIPTOR_BUFFER_BIT_EXT`.

There must be no more than `maxResourceDescriptorBufferBindings` descriptor buffers containing resource descriptors bound.
Such buffers must be bound with `VK_BUFFER_USAGE_RESOURCE_DESCRIPTOR_BUFFER_BIT_EXT`.

If a buffer contains both usage flags, it counts once against both limits.

If the `bufferlessPushDescriptors` property is `VK_FALSE` and a buffer contains the `VK_BUFFER_USAGE_PUSH_DESCRIPTORS_DESCRIPTOR_BUFFER_BIT_EXT` usage flag, a `VkDescriptorBufferBindingPushDescriptorBufferHandleEXT` structure must be added to the `pNext` chain of `VkDescriptorBufferBindingInfoEXT`.

`bufferCount` must be less than or equal to `maxDescriptorBufferBindings`.
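
The three binding limits above can be checked with a few lines of bookkeeping.
This is a sketch under stated assumptions: the `SKETCH_*` macros are local stand-ins carrying the same values as the usage flags defined by this proposal, and `bindings_within_limits` is a hypothetical helper, not part of the API.

[source,c]
----
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Local stand-ins for the sampler and resource descriptor buffer usage bits. */
#define SKETCH_SAMPLER_BIT  0x00200000u
#define SKETCH_RESOURCE_BIT 0x00400000u

/* Returns true if a list of buffer usages respects all three limits.
 * A buffer created with both bits counts once against each limit. */
static bool bindings_within_limits(const uint32_t *usages,
                                   uint32_t bufferCount,
                                   uint32_t maxSamplerBindings,
                                   uint32_t maxResourceBindings,
                                   uint32_t maxBindings)
{
    assert(usages != NULL || bufferCount == 0);
    if (bufferCount > maxBindings)
        return false;

    uint32_t samplerBuffers = 0;
    uint32_t resourceBuffers = 0;
    for (uint32_t i = 0; i < bufferCount; ++i) {
        if (usages[i] & SKETCH_SAMPLER_BIT)
            ++samplerBuffers;
        if (usages[i] & SKETCH_RESOURCE_BIT)
            ++resourceBuffers;
    }
    return samplerBuffers <= maxSamplerBindings &&
           resourceBuffers <= maxResourceBindings;
}
----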

Any previously bound buffers at binding points greater than or equal to `bufferCount` are unbound.

Each entry in `pBindingInfos` contains the device address of a descriptor buffer and the usage flags that the buffer was created with.

Changing buffers may be an expensive operation and should be done infrequently (if ever).

The maximum available range of each binding to a shader is `maxSamplerDescriptorBufferRange` and/or `maxResourceDescriptorBufferRange`.

The `samplerDescriptorBufferAddressSpaceSize`, `resourceDescriptorBufferAddressSpaceSize`, and `descriptorBufferAddressSpaceSize` properties
give the upper bound for the total amount of address space used for descriptor buffers.

Buffers used for this purpose need to be created with new usage flags:

[source,c]
----
VK_BUFFER_USAGE_SAMPLER_DESCRIPTOR_BUFFER_BIT_EXT  = 0x00200000
VK_BUFFER_USAGE_RESOURCE_DESCRIPTOR_BUFFER_BIT_EXT = 0x00400000
----

`VK_BUFFER_USAGE_SAMPLER_DESCRIPTOR_BUFFER_BIT_EXT` specifies that the buffer will be used to contain sampler descriptors when bound as a descriptor buffer.
`VK_BUFFER_USAGE_RESOURCE_DESCRIPTOR_BUFFER_BIT_EXT` specifies that the buffer will be used to contain resource descriptors, i.e. non-sampler descriptors, when bound as a descriptor buffer.
Buffers containing `VK_DESCRIPTOR_TYPE_COMBINED_IMAGE_SAMPLER` descriptors must have been created with both `VK_BUFFER_USAGE_SAMPLER_DESCRIPTOR_BUFFER_BIT_EXT` and `VK_BUFFER_USAGE_RESOURCE_DESCRIPTOR_BUFFER_BIT_EXT`.

Each descriptor set is associated with a buffer and an offset into that buffer which can be set by:

[source,c]
----
void vkCmdSetDescriptorBufferOffsetsEXT(
    VkCommandBuffer                             commandBuffer,
    VkPipelineBindPoint                         pipelineBindPoint,
    VkPipelineLayout                            layout,
    uint32_t                                    firstSet,
    uint32_t                                    setCount,
    const uint32_t*                             pBufferIndices,
    const VkDeviceSize*                         pOffsets);
----

`vkCmdSetDescriptorBufferOffsetsEXT` causes each set `firstSet + i`, for `i` from 0 to `setCount - 1`, to use the bindings stored in the buffer bound at index `pBufferIndices[i]`, at an offset of `pOffsets[i]`, for subsequent commands on the bind point given by `pipelineBindPoint`.
Any bindings that were previously applied via these sets, or calls to `vkCmdBindDescriptorSets`, are no longer valid.
Calling `vkCmdBindDescriptorSets` invalidates bindings previously applied via `vkCmdSetDescriptorBufferOffsetsEXT`.

Setting offsets should be a cheap operation and can be performed frequently.
The offsets must be aligned to `descriptorBufferOffsetAlignment`.
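
One way an application might lay out several sets in a single descriptor buffer while respecting this alignment is sketched below.
The helper names are hypothetical; `alignment` stands for `descriptorBufferOffsetAlignment`, and each entry of `setSizes` stands for a size obtained from `vkGetDescriptorSetLayoutSizeEXT`.
The rounding uses division so that it does not assume the alignment is a power of two.

[source,c]
----
#include <assert.h>
#include <stdint.h>

/* Rounds value up to the next multiple of alignment (any non-zero value). */
static uint64_t align_up(uint64_t value, uint64_t alignment)
{
    assert(alignment != 0);
    return ((value + alignment - 1) / alignment) * alignment;
}

/* Places `count` sets back to back, aligning each start offset.
 * Writes the offsets to outOffsets and returns the total bytes consumed. */
static uint64_t place_sets(const uint64_t *setSizes,
                           uint32_t count,
                           uint64_t alignment,
                           uint64_t *outOffsets)
{
    uint64_t cursor = 0;
    for (uint32_t i = 0; i < count; ++i) {
        cursor = align_up(cursor, alignment);
        outOffsets[i] = cursor;
        cursor += setSizes[i];
    }
    return cursor;
}
----

The resulting offsets are the kind of values that would be passed through `pOffsets`.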

<<Embedded Immutable Samplers,Embedded immutable samplers>> can be bound using:

[source,c]
----
void vkCmdBindDescriptorBufferEmbeddedSamplersEXT(
    VkCommandBuffer                             commandBuffer,
    VkPipelineBindPoint                         pipelineBindPoint,
    VkPipelineLayout                            layout,
    uint32_t                                    set);
----

`vkCmdBindDescriptorBufferEmbeddedSamplersEXT` binds the embedded immutable samplers in `layout` at set index `set` to the same set in the command buffer.
Set bindings are invalidated in the same manner as they are for `vkCmdSetDescriptorBufferOffsetsEXT`.
The `VkDescriptorSetLayout` at index `set` of `layout` must have been created with the `VK_DESCRIPTOR_SET_LAYOUT_CREATE_EMBEDDED_IMMUTABLE_SAMPLERS_BIT_EXT` bit.
There must be no more than `maxEmbeddedImmutableSamplerBindings` embedded immutable sampler sets bound.
As in DX12, there is a limit to how many unique embedded immutable samplers may be alive in a device at any one point; this limit is designed to match DX12.


=== Descriptor Updates

As descriptors are just a blob of memory, descriptor updates can be performed by any operation on either the host or device that can access memory, enabling a form of GPU descriptor update.
Descriptor buffer reads can be synchronized using a new access bit in the relevant shader stage:

[source,c]
----
VK_ACCESS_2_DESCRIPTOR_BUFFER_READ_BIT_EXT = 0x20000000000ULL
----

Note that host writes are implicitly made visible to all stages in `vkQueueSubmit`, so this access flag is only relevant when performing GPU-side updates of descriptors.

If the `allowSamplerImageViewPostSubmitCreation` property is `VK_FALSE`, there are special requirements for when descriptor data for `VkSampler` or `VkImageView` objects can be used.
Those objects must have been created before any `vkQueueSubmit` (or `vkQueueSubmit2`) call that executes a command buffer which accesses descriptor data for them.

For example, if `allowSamplerImageViewPostSubmitCreation` is `VK_FALSE`, this is disallowed:

* Call `vkQueueSubmit()` which is waiting for a timeline semaphore
* Create a `VkImageView`
* Update the descriptor buffer used by the previous submission from the host using the descriptor data of the new `VkImageView`
* Signal the semaphore from the host

=== Push descriptors

Using descriptor buffers together with push descriptors is supported if the `descriptorBufferPushDescriptors` feature bit is set.

To support push descriptors on certain implementations, an additional buffer usage flag is added:

[source,c]
----
VK_BUFFER_USAGE_PUSH_DESCRIPTORS_DESCRIPTOR_BUFFER_BIT_EXT = 0x04000000
----

If the application desires to use push descriptors and descriptor buffers together,
a descriptor set layout must be declared with `VK_DESCRIPTOR_SET_LAYOUT_CREATE_PUSH_DESCRIPTOR_BIT_KHR` and `VK_DESCRIPTOR_SET_LAYOUT_CREATE_DESCRIPTOR_BUFFER_BIT_EXT` bits set.

If the `bufferlessPushDescriptors` property is `VK_FALSE`, there are special requirements for using push descriptors with descriptor buffers.
`VK_BUFFER_USAGE_PUSH_DESCRIPTORS_DESCRIPTOR_BUFFER_BIT_EXT` is a special buffer flag which is required for certain implementations in order for push descriptors to interoperate with descriptor buffers.
When pushing descriptors using this kind of set layout, it is required that a descriptor buffer is bound to the command buffer with the `VK_BUFFER_USAGE_PUSH_DESCRIPTORS_DESCRIPTOR_BUFFER_BIT_EXT` usage flag.
The intention here is that implementations can reserve scratch space in descriptor buffers for the purposes of dealing with push descriptors.
The mechanics here are highly magical and implementation-defined in nature, and it is considered too burdensome to expect applications to deal with them.

Binding a buffer that was created with `VK_BUFFER_USAGE_PUSH_DESCRIPTORS_DESCRIPTOR_BUFFER_BIT_EXT` requires the application to record any current push descriptors again.
425
426=== Capture/Replay
427
428When creating a resource with the capture/replay feature enabled, an opaque handle can be obtained which can be passed into creation calls in a future replay, causing descriptors to be created with the same data.
429
430New flags to be supplied when creating buffers, images, and samplers to be captured/replayed:
431
432[source,c]
433----
434VK_BUFFER_CREATE_DESCRIPTOR_BUFFER_CAPTURE_REPLAY_BIT_EXT                 = 0x00000020
435VK_IMAGE_CREATE_DESCRIPTOR_BUFFER_CAPTURE_REPLAY_BIT_EXT                  = 0x00010000
436VK_IMAGE_VIEW_CREATE_DESCRIPTOR_BUFFER_CAPTURE_REPLAY_BIT_EXT             = 0x00000004
437VK_SAMPLER_CREATE_DESCRIPTOR_BUFFER_CAPTURE_REPLAY_BIT_EXT                = 0x00000008
438VK_ACCELERATION_STRUCTURE_CREATE_DESCRIPTOR_BUFFER_CAPTURE_REPLAY_BIT_EXT = 0x00000008
439----
440
441There are separate commands to get opaque data for buffers, images, and samplers:
442
443[source,c]
444----
445VkResult vkGetBufferOpaqueCaptureDescriptorDataEXT(
446    VkDevice                                    device,
447    const VkBufferCaptureDescriptorDataInfoEXT* pInfo,
448    void*                                       pData);
449
450typedef struct VkBufferCaptureDescriptorDataInfoEXT {
451    VkStructureType    sType;
452    const void*        pNext;
453    VkBuffer           buffer;
454} VkBufferCaptureDescriptorDataInfoEXT;
455
456VkResult vkGetImageOpaqueCaptureDescriptorDataEXT(
457    VkDevice                                   device,
458    const VkImageCaptureDescriptorDataInfoEXT* pInfo,
459    void*                                      pData);
460
461typedef struct VkImageCaptureDescriptorDataInfoEXT {
462    VkStructureType    sType;
463    const void*        pNext;
464    VkImage            image;
465} VkImageCaptureDescriptorDataInfoEXT;
466
467VkResult vkGetImageViewOpaqueCaptureDescriptorDataEXT(
468    VkDevice                                       device,
469    const VkImageViewCaptureDescriptorDataInfoEXT* pInfo,
470    void*                                          pData);
471
472typedef struct VkImageViewCaptureDescriptorDataInfoEXT {
473    VkStructureType    sType;
474    const void*        pNext;
475    VkImageView        imageView;
476} VkImageViewCaptureDescriptorDataInfoEXT;
477
478VkResult vkGetSamplerOpaqueCaptureDescriptorDataEXT(
479    VkDevice                                     device,
480    const VkSamplerCaptureDescriptorDataInfoEXT* pInfo,
481    void*                                        pData);
482
483typedef struct VkSamplerCaptureDescriptorDataInfoEXT {
484    VkStructureType    sType;
485    const void*        pNext;
486    VkSampler          sampler;
487} VkSamplerCaptureDescriptorDataInfoEXT;
488
489VkResult vkGetAccelerationStructureOpaqueCaptureDescriptorDataEXT(
490    VkDevice                                                   device,
491    const VkAccelerationStructureCaptureDescriptorDataInfoEXT* pInfo,
492    void*                                                      pData);
493
494typedef struct VkAccelerationStructureCaptureDescriptorDataInfoEXT {
495    VkStructureType                  sType;
496    const void*                      pNext;
497    VkAccelerationStructureKHR       accelerationStructure;
498    VkAccelerationStructureNV        accelerationStructureNV;
499} VkAccelerationStructureCaptureDescriptorDataInfoEXT;
500----
501
Once queried, this data must be provided when the buffer, image, image view, sampler, or acceleration structure is created on replay, in a similar manner to buffer device address capture and replay, by chaining the following structure to the corresponding create info structure:
503
504[source,c]
505----
506typedef struct VkOpaqueCaptureDescriptorDataCreateInfoEXT {
507    VkStructureType    sType;
508    const void*        pNext;
509    const void*        opaqueCaptureDescriptorData;
510} VkOpaqueCaptureDescriptorDataCreateInfoEXT;
511----
512
In each case, the capture data is sized to the `bufferCaptureReplayDescriptorDataSize`, `imageCaptureReplayDescriptorDataSize`, `imageViewCaptureReplayDescriptorDataSize`, `samplerCaptureReplayDescriptorDataSize`, or `accelerationStructureCaptureReplayDescriptorDataSize` limit, as appropriate.
514
515In addition, link:{refpage}vkGetDeviceMemoryOpaqueCaptureAddress.html[vkGetDeviceMemoryOpaqueCaptureAddress] must be used to capture the opaque address and replay it with link:{refpage}VkMemoryOpaqueCaptureAddressAllocateInfo.html[VkMemoryOpaqueCaptureAddressAllocateInfo], for any memory used by resources with these handles.
516
517
518=== Device Features
519
520The following features are exposed:
521
522[source,c]
523----
524typedef struct VkPhysicalDeviceDescriptorBufferFeaturesEXT {
525    VkStructureType    sType;
526    void*              pNext;
527    VkBool32           descriptorBuffer;
528    VkBool32           descriptorBufferCaptureReplay;
529    VkBool32           descriptorBufferImageLayoutIgnored;
530    VkBool32           descriptorBufferPushDescriptors;
531} VkPhysicalDeviceDescriptorBufferFeaturesEXT;
532----
533
534If the `descriptorBuffer` feature is enabled, link:{refpage}VK_AMD_shader_fragment_mask.html[VK_AMD_shader_fragment_mask] must not be enabled.
535If the `descriptorBufferImageLayoutIgnored` feature is enabled, the image layout provided when getting a descriptor is ignored.
536The `descriptorBufferCaptureReplay` feature is primarily for capture replay tools, and allows opaque data to be captured and replayed, allowing the same descriptor handles to be used on replay.
If the `descriptorBufferPushDescriptors` feature is enabled, push descriptors can be used with descriptor buffers.
538
539
540=== Device Properties
541
542The following properties are exposed:
543
544[source,c]
545----
546typedef struct VkPhysicalDeviceDescriptorBufferPropertiesEXT {
547    VkStructureType    sType;
548    void*              pNext;
549    VkBool32           combinedImageSamplerDescriptorSingleArray;
550    VkBool32           bufferlessPushDescriptors;
551    VkBool32           allowSamplerImageViewPostSubmitCreation;
552    VkDeviceSize       descriptorBufferOffsetAlignment;
553    uint32_t           maxDescriptorBufferBindings;
554    uint32_t           maxResourceDescriptorBufferBindings;
555    uint32_t           maxSamplerDescriptorBufferBindings;
556    uint32_t           maxEmbeddedImmutableSamplerBindings;
557    uint32_t           maxEmbeddedImmutableSamplers;
558    size_t             bufferCaptureReplayDescriptorDataSize;
559    size_t             imageCaptureReplayDescriptorDataSize;
560    size_t             imageViewCaptureReplayDescriptorDataSize;
561    size_t             samplerCaptureReplayDescriptorDataSize;
562    size_t             accelerationStructureCaptureReplayDescriptorDataSize;
563    size_t             samplerDescriptorSize;
564    size_t             combinedImageSamplerDescriptorSize;
565    size_t             sampledImageDescriptorSize;
566    size_t             storageImageDescriptorSize;
567    size_t             uniformTexelBufferDescriptorSize;
568    size_t             robustUniformTexelBufferDescriptorSize;
569    size_t             storageTexelBufferDescriptorSize;
570    size_t             robustStorageTexelBufferDescriptorSize;
571    size_t             uniformBufferDescriptorSize;
572    size_t             robustUniformBufferDescriptorSize;
573    size_t             storageBufferDescriptorSize;
574    size_t             robustStorageBufferDescriptorSize;
575    size_t             inputAttachmentDescriptorSize;
576    size_t             accelerationStructureDescriptorSize;
577    VkDeviceSize       maxSamplerDescriptorBufferRange;
578    VkDeviceSize       maxResourceDescriptorBufferRange;
579    VkDeviceSize       samplerDescriptorBufferAddressSpaceSize;
580    VkDeviceSize       resourceDescriptorBufferAddressSpaceSize;
581    VkDeviceSize       descriptorBufferAddressSpaceSize;
582} VkPhysicalDeviceDescriptorBufferPropertiesEXT;
583----
584
585* `descriptorBufferOffsetAlignment` describes the alignment required, in bytes, when setting offsets into the descriptor buffer.
586* `combinedImageSamplerDescriptorSingleArray` indicates that the implementation does not require an array of `VK_DESCRIPTOR_TYPE_COMBINED_IMAGE_SAMPLER` descriptors to be written into a descriptor buffer as an array of image descriptors, immediately followed by an array of sampler descriptors.
587* `bufferlessPushDescriptors` indicates that the implementation does not require a buffer created with `VK_BUFFER_USAGE_PUSH_DESCRIPTORS_DESCRIPTOR_BUFFER_BIT_EXT` to be bound when using push descriptors.
588* `allowSamplerImageViewPostSubmitCreation` indicates that the implementation does not restrict when the `VkSampler` or `VkImageView` objects used to retrieve descriptor data can be created in relation to command buffer submission. If this value is `VK_FALSE`, then the application must create any `VkSampler` or `VkImageView` objects whose descriptor data is accessed during the execution of a command buffer, before the `vkQueueSubmit` (or `vkQueueSubmit2`) call that submits that command buffer.
589* `maxDescriptorBufferBindings` defines the maximum total number of descriptor buffers and embedded immutable sampler sets that can be bound.
590* `maxResourceDescriptorBufferBindings` defines the maximum number of resource descriptor buffers that can be bound.
591* `maxSamplerDescriptorBufferBindings` defines the maximum number of sampler descriptor buffers that can be bound.
* `maxEmbeddedImmutableSamplerBindings` defines the maximum number of embedded immutable sampler sets that can be bound.
593* `maxEmbeddedImmutableSamplers` describes the maximum number of unique immutable samplers in descriptor set layouts created with `VK_DESCRIPTOR_SET_LAYOUT_CREATE_EMBEDDED_IMMUTABLE_SAMPLERS_BIT_EXT`, and pipeline layouts created from them, which can simultaneously exist on a device.
594* `bufferCaptureReplayDescriptorDataSize`, `imageCaptureReplayDescriptorDataSize`, `imageViewCaptureReplayDescriptorDataSize`, `samplerCaptureReplayDescriptorDataSize`, and `accelerationStructureCaptureReplayDescriptorDataSize` define the maximum size, in bytes, of the opaque data used for capture replay with each respective object type.
* `samplerDescriptorSize` describes the size, in bytes, of a `VK_DESCRIPTOR_TYPE_SAMPLER` descriptor.
* `combinedImageSamplerDescriptorSize` describes the size, in bytes, of a `VK_DESCRIPTOR_TYPE_COMBINED_IMAGE_SAMPLER` descriptor.
* `sampledImageDescriptorSize` describes the size, in bytes, of a `VK_DESCRIPTOR_TYPE_SAMPLED_IMAGE` descriptor.
* `storageImageDescriptorSize` describes the size, in bytes, of a `VK_DESCRIPTOR_TYPE_STORAGE_IMAGE` descriptor.
* `uniformTexelBufferDescriptorSize` describes the size, in bytes, of a `VK_DESCRIPTOR_TYPE_UNIFORM_TEXEL_BUFFER` descriptor.
* `robustUniformTexelBufferDescriptorSize` describes the size, in bytes, of a `VK_DESCRIPTOR_TYPE_UNIFORM_TEXEL_BUFFER` descriptor when robust buffer access is enabled.
* `storageTexelBufferDescriptorSize` describes the size, in bytes, of a `VK_DESCRIPTOR_TYPE_STORAGE_TEXEL_BUFFER` descriptor.
* `robustStorageTexelBufferDescriptorSize` describes the size, in bytes, of a `VK_DESCRIPTOR_TYPE_STORAGE_TEXEL_BUFFER` descriptor when robust buffer access is enabled.
* `uniformBufferDescriptorSize` describes the size, in bytes, of a `VK_DESCRIPTOR_TYPE_UNIFORM_BUFFER` descriptor.
* `robustUniformBufferDescriptorSize` describes the size, in bytes, of a `VK_DESCRIPTOR_TYPE_UNIFORM_BUFFER` descriptor when robust buffer access is enabled.
* `storageBufferDescriptorSize` describes the size, in bytes, of a `VK_DESCRIPTOR_TYPE_STORAGE_BUFFER` descriptor.
* `robustStorageBufferDescriptorSize` describes the size, in bytes, of a `VK_DESCRIPTOR_TYPE_STORAGE_BUFFER` descriptor when robust buffer access is enabled.
* `inputAttachmentDescriptorSize` describes the size, in bytes, of a `VK_DESCRIPTOR_TYPE_INPUT_ATTACHMENT` descriptor.
* `accelerationStructureDescriptorSize` describes the size, in bytes, of a `VK_DESCRIPTOR_TYPE_ACCELERATION_STRUCTURE_KHR` or `VK_DESCRIPTOR_TYPE_ACCELERATION_STRUCTURE_NV` descriptor.
609* `maxSamplerDescriptorBufferRange` describes the accessible range, in bytes, of a sampler buffer when bound.
610* `maxResourceDescriptorBufferRange` describes the accessible range, in bytes, of a resource buffer when bound.
611* `samplerDescriptorBufferAddressSpaceSize` describes the total amount of address space available, in bytes, for descriptor buffers containing samplers.
612* `resourceDescriptorBufferAddressSpaceSize` describes the total amount of address space available, in bytes, for descriptor buffers containing resources.
613* `descriptorBufferAddressSpaceSize` describes the total amount of address space available, in bytes, for all descriptor buffers.
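
Because buffer descriptors have separate robust and non-robust sizes, an application has to pick the right property when computing strides. The following is a minimal sketch of that selection. `BufferDescriptorSizes`, `BufferDescriptorKind`, and `buffer_descriptor_size` are hypothetical names, and the struct is only a stand-in for the matching fields of `VkPhysicalDeviceDescriptorBufferPropertiesEXT` so that the sketch compiles without the Vulkan headers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for four of the sizes reported in
 * VkPhysicalDeviceDescriptorBufferPropertiesEXT (assumption: real code
 * would read these from the queried properties structure). */
typedef struct BufferDescriptorSizes {
    size_t uniformBufferDescriptorSize;
    size_t robustUniformBufferDescriptorSize;
    size_t storageBufferDescriptorSize;
    size_t robustStorageBufferDescriptorSize;
} BufferDescriptorSizes;

/* Hypothetical enum; real code would switch on VkDescriptorType. */
typedef enum BufferDescriptorKind {
    BUFFER_DESCRIPTOR_UNIFORM,
    BUFFER_DESCRIPTOR_STORAGE
} BufferDescriptorKind;

/* Returns the number of bytes one descriptor of the given kind occupies,
 * using the robust variant when robust buffer access is enabled. */
size_t buffer_descriptor_size(const BufferDescriptorSizes *sizes,
                              BufferDescriptorKind kind,
                              bool robustBufferAccess)
{
    assert(sizes != NULL);
    switch (kind) {
    case BUFFER_DESCRIPTOR_UNIFORM:
        return robustBufferAccess ? sizes->robustUniformBufferDescriptorSize
                                  : sizes->uniformBufferDescriptorSize;
    case BUFFER_DESCRIPTOR_STORAGE:
        return robustBufferAccess ? sizes->robustStorageBufferDescriptorSize
                                  : sizes->storageBufferDescriptorSize;
    }
    return 0; /* not reached for valid kinds */
}
```

The same pattern would extend to the texel buffer sizes, which also come in robust and non-robust variants.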
614
615If link:{refpage}VK_VALVE_mutable_descriptor_type.html[VK_VALVE_mutable_descriptor_type] is used,
616a descriptor is considered to be a union of all the enabled types, so the size of a descriptor is the maximum of all enabled types.
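
As a rough illustration of the union rule above, the size to reserve for a mutable descriptor could be computed as the maximum over the sizes of its enabled types. The helper name `mutable_descriptor_size` and its plain-array input are assumptions made for this sketch; the individual sizes would come from the relevant `*DescriptorSize` properties.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not part of the extension): returns the size, in
 * bytes, to reserve for one mutable descriptor, given the sizes of each
 * enabled descriptor type. Returns 0 when no types are given. */
size_t mutable_descriptor_size(const size_t *enabledTypeSizes, size_t count)
{
    assert(enabledTypeSizes != NULL || count == 0);

    size_t maxSize = 0;
    for (size_t i = 0; i < count; i++) {
        if (enabledTypeSizes[i] > maxSize) {
            maxSize = enabledTypeSizes[i];
        }
    }
    return maxSize;
}
```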
617
618[source,c]
619----
620typedef struct VkPhysicalDeviceDescriptorBufferDensityMapPropertiesEXT {
621    VkStructureType    sType;
622    void*              pNext;
623    size_t             combinedImageSamplerDensityMapDescriptorSize;
624} VkPhysicalDeviceDescriptorBufferDensityMapPropertiesEXT;
625----
626
* `combinedImageSamplerDensityMapDescriptorSize` describes the size, in bytes, of a `VK_DESCRIPTOR_TYPE_COMBINED_IMAGE_SAMPLER` descriptor when using the `VK_SAMPLER_CREATE_SUBSAMPLED_BIT_EXT` flag of the link:{refpage}VK_EXT_fragment_density_map.html[VK_EXT_fragment_density_map] extension.
628
629== Mapping to DirectX® 12 Descriptor Heaps
630
631In DirectX 12 (DX12), descriptors are allocated into descriptor heaps, which work almost completely differently to anything currently in Vulkan.
632This extension aims to reduce one aspect of the divergence between the two.
633Below is a rough description of the mapping from DX12 to this extension.
634Applications looking to port between the two APIs will likely have more information available than the DX12 API provides, and can likely take shortcuts (highlighted where possible).
635This doesn’t solve the overall limits for object counts, and so it’s not possible to trivially emulate every corner of the DX12 API.
636
637
638=== Descriptor Heap Creation
639
640DX12 has the following command to create a heap:
641
642[source,c]
643----
644typedef struct D3D12_DESCRIPTOR_HEAP_DESC {
645  D3D12_DESCRIPTOR_HEAP_TYPE  Type;
646  UINT                        NumDescriptors;
647  D3D12_DESCRIPTOR_HEAP_FLAGS Flags;
648  UINT                        NodeMask;
649} D3D12_DESCRIPTOR_HEAP_DESC;
650
651HRESULT CreateDescriptorHeap(
652  const D3D12_DESCRIPTOR_HEAP_DESC *pDescriptorHeapDesc,
653  REFIID                           riid,
654  void                             **ppvHeap
655);
656----
657
658Implementing the equivalent functionality in Vulkan would mean the following operations:
659
660  * Create a `VkDescriptorSetLayout` with `VK_DESCRIPTOR_BINDING_VARIABLE_DESCRIPTOR_COUNT_BIT`. The count would be up to 1000000 for resources, and 2048 for samplers.
661  ** If link:{refpage}VK_VALVE_mutable_descriptor_type.html[VK_VALVE_mutable_descriptor_type] is supported, we only need one descriptor set layout which supports all descriptor types for the heap type.
662  ** Otherwise, there are two alternatives:
663  *** Create up to 6 descriptor set layouts of the relevant descriptor types the application cares about (`STORAGE_BUFFER`, `UNIFORM_BUFFER`, `SAMPLED_IMAGE`, `STORAGE_IMAGE`, `UNIFORM_TEXEL_BUFFER`, `STORAGE_TEXEL_BUFFER`).
664  *** Create one descriptor set layout with 6 fixed-size arrays instead of using variable descriptor counts. This means `NumDescriptors` is effectively ignored.
665  * Create a `VkBuffer`, size equal to `NumDescriptors` multiplied by the descriptor size within it, and its device mask set per `NodeMask`.
666  * If `Flags` includes `D3D12_DESCRIPTOR_HEAP_FLAG_SHADER_VISIBLE`, allocate `DEVICE_LOCAL` memory.
667  ** If this memory can be `DEVICE_LOCAL` and `HOST_VISIBLE`, then that can be mapped directly for the CPU pointer and used as the heap CPU pointer.
668  ** Otherwise, `HOST_VISIBLE` staging memory should be allocated for a parallel buffer.
669     Copying from this staging buffer to the main descriptor buffer should be done at each submit where the staging buffer has been modified.
  * If `Flags` does not include `D3D12_DESCRIPTOR_HEAP_FLAG_SHADER_VISIBLE`, allocate `HOST_VISIBLE` memory that can be used for staging copies to `DEVICE_LOCAL` memory.
  ** Alternatively, plain `malloc` can be used if descriptor copies are implemented as `memcpy`.
  * Copying descriptors in the style of `CopyDescriptorsSimple()` is implemented with either `memcpy` or staging copies.
673
674This model would support the full TIER_3 resource binding feature in DX12 and shader model 6.6 direct heap access, but can be simplified a lot for applications with DX11-style binding models.
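
The sizing and memory decisions in the list above can be sketched as two small helpers. This is only an outline under stated assumptions: `HeapBacking`, `heap_buffer_size`, and `choose_heap_backing` are hypothetical names, and the non-shader-visible case is collapsed into a single "host only" choice that covers both the `HOST_VISIBLE` and the `malloc` options.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical classification of how an emulated heap is backed. */
typedef enum HeapBacking {
    HEAP_BACKING_HOST_ONLY,           /* not shader visible: host memory used as a copy source */
    HEAP_BACKING_DEVICE_LOCAL_MAPPED, /* DEVICE_LOCAL and HOST_VISIBLE: map and write directly */
    HEAP_BACKING_DEVICE_LOCAL_STAGED  /* DEVICE_LOCAL only: write through a HOST_VISIBLE staging buffer */
} HeapBacking;

/* Buffer size for a heap: NumDescriptors multiplied by the descriptor size. */
uint64_t heap_buffer_size(uint32_t numDescriptors, uint64_t descriptorSize)
{
    assert(descriptorSize != 0);
    assert(numDescriptors <= UINT64_MAX / descriptorSize); /* no overflow */
    return (uint64_t)numDescriptors * descriptorSize;
}

/* shaderVisible mirrors D3D12_DESCRIPTOR_HEAP_FLAG_SHADER_VISIBLE.
 * deviceLocalHostVisibleAvailable says whether a memory type that is both
 * DEVICE_LOCAL and HOST_VISIBLE can be used for the buffer. */
HeapBacking choose_heap_backing(bool shaderVisible,
                                bool deviceLocalHostVisibleAvailable)
{
    if (!shaderVisible) {
        return HEAP_BACKING_HOST_ONLY;
    }
    return deviceLocalHostVisibleAvailable ? HEAP_BACKING_DEVICE_LOCAL_MAPPED
                                           : HEAP_BACKING_DEVICE_LOCAL_STAGED;
}
```

With the staged backing, the copy from the staging buffer to the main descriptor buffer would happen at each submit where the staging buffer was modified, as described in the list.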
675
676=== Descriptor Creation
677
678Unlike DX12, Vulkan (and this extension) requires view objects and sampler objects to exist and have their lifetimes managed by the application.
679These objects need to be kept alive for the descriptor itself to be valid.
680How this is managed precisely is going to depend on the application’s usage patterns, though link:https://github.com/HansKristian-Work/vkd3d-proton[vkd3d-proton] suggests one viable option.
681The scheme used by vkd3d-proton involves keeping a hash map of the views associated with each resource object (or the device for samplers), using creation parameters as a key, so that their lifetime is tied to the underlying resource and can be reused.
682When actually creating the UAV/SRV/Sampler, the object should be looked up in the relevant hash map, and created there if necessary.
683The descriptor itself is then written directly to the provided CPU pointer.
Note that `VkBufferView` objects are not used and have been replaced by an explicit address, range, and format.
This is very important since applications have a tendency to linearly allocate texel buffers and might end up rapidly creating these views at different offsets.
If applications were forced to hold on to all unique `VkBufferView` objects, things would get out of hand quickly.
687vkd3d-proton currently works around this problem by quantizing the texel buffer offset and range, and instead performs offset/range checks per access in shaders to keep the number of objects low, which is obviously not desirable.
688
689For image views on the other hand, the number of unique views in flight per resource tends to be constrained and manageable.
690In terms of performance characteristics, creating SRVs and UAVs is already far more expensive in DX12 than copying descriptors.
691The style observed in most DX12 applications is that view objects are created in non-shader visible heaps, which are then streamed into shader visible heaps.
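
A per-resource view cache of the kind described above could look roughly like the following. This is a simplified sketch and not vkd3d-proton's implementation: the key fields, the fixed capacity, the open-addressing scheme, and all names (`ViewKey`, `ViewCache`, `view_cache_lookup`, `view_cache_insert`) are assumptions, and `ViewHandle` stands in for `VkImageView`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define VIEW_CACHE_CAPACITY 64u /* fixed power-of-two capacity, for brevity */

/* Hypothetical key: the creation parameters that identify a view. */
typedef struct ViewKey {
    uint32_t format;
    uint32_t baseMipLevel;
    uint32_t levelCount;
    uint32_t baseArrayLayer;
    uint32_t layerCount;
} ViewKey;

typedef uint64_t ViewHandle; /* stand-in for VkImageView */

/* One cache per resource; zero-initialise before use. Entries are never
 * removed individually, the cache lives and dies with its resource. */
typedef struct ViewCache {
    ViewKey    keys[VIEW_CACHE_CAPACITY];
    ViewHandle views[VIEW_CACHE_CAPACITY];
    bool       used[VIEW_CACHE_CAPACITY];
} ViewCache;

/* FNV-1a style mixing, applied per field rather than per byte. */
static uint32_t view_key_hash(const ViewKey *key)
{
    const uint32_t fields[5] = { key->format, key->baseMipLevel, key->levelCount,
                                 key->baseArrayLayer, key->layerCount };
    uint32_t hash = 2166136261u;
    for (int i = 0; i < 5; i++) {
        hash = (hash ^ fields[i]) * 16777619u;
    }
    return hash;
}

static bool view_key_equal(const ViewKey *a, const ViewKey *b)
{
    return a->format == b->format &&
           a->baseMipLevel == b->baseMipLevel &&
           a->levelCount == b->levelCount &&
           a->baseArrayLayer == b->baseArrayLayer &&
           a->layerCount == b->layerCount;
}

/* Returns true and writes *outView if a view with these parameters exists. */
bool view_cache_lookup(const ViewCache *cache, const ViewKey *key, ViewHandle *outView)
{
    assert(cache != NULL && key != NULL && outView != NULL);
    uint32_t start = view_key_hash(key) & (VIEW_CACHE_CAPACITY - 1u);
    for (uint32_t probe = 0; probe < VIEW_CACHE_CAPACITY; probe++) {
        uint32_t slot = (start + probe) & (VIEW_CACHE_CAPACITY - 1u);
        if (!cache->used[slot]) {
            return false; /* an empty slot ends the probe sequence */
        }
        if (view_key_equal(&cache->keys[slot], key)) {
            *outView = cache->views[slot];
            return true;
        }
    }
    return false;
}

/* Records a newly created view; returns false if the table is full. */
bool view_cache_insert(ViewCache *cache, const ViewKey *key, ViewHandle view)
{
    assert(cache != NULL && key != NULL);
    uint32_t start = view_key_hash(key) & (VIEW_CACHE_CAPACITY - 1u);
    for (uint32_t probe = 0; probe < VIEW_CACHE_CAPACITY; probe++) {
        uint32_t slot = (start + probe) & (VIEW_CACHE_CAPACITY - 1u);
        if (!cache->used[slot] || view_key_equal(&cache->keys[slot], key)) {
            cache->keys[slot] = *key;
            cache->views[slot] = view;
            cache->used[slot] = true;
            return true;
        }
    }
    return false;
}
```

On a lookup miss, the caller would create the view, insert it, and then write the descriptor to the provided CPU pointer. All cached views would be destroyed together with the owning resource.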
692
693=== Descriptor Heap Queries
694
695Descriptor heaps provide methods to query the “start” pointer for the descriptor heap on both the CPU and GPU.
696
697[source,c]
698----
699D3D12_CPU_DESCRIPTOR_HANDLE GetCPUDescriptorHandleForHeapStart();
700D3D12_GPU_DESCRIPTOR_HANDLE GetGPUDescriptorHandleForHeapStart();
701UINT GetDescriptorHandleIncrementSize(
702  D3D12_DESCRIPTOR_HEAP_TYPE DescriptorHeapType
703);
704----
705
706`GetGPUDescriptorHandleForHeapStart` should be the `VkDeviceAddress` for the device-local buffer.
707`GetCPUDescriptorHandleForHeapStart` should be the mapped host address for the host-visible buffer.
708`GetDescriptorHandleIncrementSize` should be the size of the largest descriptor possible in the buffer.
709
However, this model can break down fairly quickly if the descriptor set layout is more complicated.
711When more than one descriptor array is used to emulate the union-style descriptor heap of DX12,
712it is not possible to provide a unique pointer to host memory that is suitable for copying.
713
714An engine abstraction that takes descriptor heap and offset separately is much easier to implement overall and avoids all these pitfalls.
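
A minimal sketch of such a heap-plus-index abstraction is shown below, assuming the simple case of a single descriptor array per heap. `DescriptorHeap`, `heap_host_pointer`, and `heap_device_address` are hypothetical names, and the `uint64_t` device address stands in for a `VkDeviceAddress`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical engine-side heap description. */
typedef struct DescriptorHeap {
    uint8_t *hostBase;       /* mapped host pointer to the start of the heap */
    uint64_t deviceAddress;  /* device address of the descriptor buffer */
    size_t   incrementSize;  /* size of the largest descriptor in the heap */
    uint32_t numDescriptors; /* number of descriptor slots */
} DescriptorHeap;

/* Host address of slot `index`, used when writing or copying descriptors. */
uint8_t *heap_host_pointer(const DescriptorHeap *heap, uint32_t index)
{
    assert(index < heap->numDescriptors);
    return heap->hostBase + (size_t)index * heap->incrementSize;
}

/* Device address of slot `index`, for code that needs a GPU-side location. */
uint64_t heap_device_address(const DescriptorHeap *heap, uint32_t index)
{
    assert(index < heap->numDescriptors);
    return heap->deviceAddress + (uint64_t)index * heap->incrementSize;
}
```

Because callers pass the heap and the index separately, the implementation remains free to resolve an index into a different array per descriptor type if the layout requires it.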
715
716=== Descriptor Copies
717
718D3D12-style descriptor copies can be performed using `memcpy` on the host-visible descriptor buffer memory,
719but applications need to make sure the memory that is being read from is cached on the host.
720Alternatively, it is possible to use staging buffer copies.
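
A host-side copy in the spirit of `CopyDescriptorsSimple()` might be sketched as follows. The function name and parameters are assumptions for illustration. The sketch further assumes that both heaps use the same descriptor size, that the source memory is cached on the host, and that the two ranges do not overlap.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Copies `count` descriptors of `descriptorSize` bytes each from slot
 * `srcIndex` of the source heap to slot `dstIndex` of the destination
 * heap. Both pointers are host addresses of the start of each heap. */
void copy_descriptors_simple(uint8_t *dstHeap, uint32_t dstIndex,
                             const uint8_t *srcHeap, uint32_t srcIndex,
                             uint32_t count, size_t descriptorSize)
{
    assert(dstHeap != NULL && srcHeap != NULL);
    memcpy(dstHeap + (size_t)dstIndex * descriptorSize,
           srcHeap + (size_t)srcIndex * descriptorSize,
           (size_t)count * descriptorSize);
}
```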
721
722=== Descriptor Binding
723
724Binding descriptors to shaders in DX12 consists of two operations: setting the descriptor heaps, and setting tables as offsets into those heaps.
725
726`SetDescriptorHeaps` allows applications to set one sampler heap, and one CBV/SRV/UAV heap (containing other resources).
727This command should straightforwardly map to `vkCmdBindDescriptorBuffersEXT`, with each heap being bound as a separate buffer.
728
729`Set{Graphics|Compute}RootDescriptorTable` allows applications to set various offsets to the descriptor heap, to be more or less used like descriptor sets in Vulkan.
730This command will map fairly directly to `vkCmdSetDescriptorBufferOffsetsEXT`, but if implementing DX12 root signatures natively, this approach will not work easily.
731The core assumption of DX12 is that the heap is a big array and a table offset should be seen more as an index offset into that big array.
732`descriptorBufferOffsetAlignment` might be larger than one descriptor, so binding at the desired offset might not be possible.
733Descriptor buffer offsets are better suited for suballocating individual descriptor sets rather than slicing existing descriptor sets.
734
735An engine abstraction can decide to take this into account when allocating descriptor sets:
736
 * In the DX12 path, a root signature has N tables, each of which needs M descriptors allocated.
 * In the Vulkan path, a "root signature" translates to a `VkPipelineLayout`, which in turn translates to N `VkDescriptorSetLayout` objects, each of which requires M bytes in the descriptor buffer.
739
If native DX12 root signature compatibility is required, however, the suggested implementation is to bind the heap in its entirety with a single `vkCmdSetDescriptorBufferOffsetsEXT` call using an offset of 0.
741The shader declares global unsized arrays and from there we can implement shader model 6.6 by just indexing into the descriptor array directly.
742For older models, descriptor table offsets can translate to u32 push constants that add an extra offset, meaning that we promote legacy root signatures to shader model 6.6.
743This is a fairly invasive process and it is only expected that translation layers would go to this length.
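
For the legacy path, a translation layer needs to turn a descriptor table's GPU handle into an index that can be passed as a `u32` push constant. One possible shape for that conversion is sketched below; the function name and the plain integer handle representation are assumptions, and how the shader consumes the value is left to the translation layer.

```c
#include <assert.h>
#include <stdint.h>

/* Converts a descriptor table GPU handle into a descriptor index relative
 * to the start of the heap. The result would be written as a push constant
 * and added to the shader's index into the global descriptor array. */
uint32_t table_offset_to_descriptor_index(uint64_t tableHandle,
                                          uint64_t heapStart,
                                          uint64_t incrementSize)
{
    assert(incrementSize != 0);
    assert(tableHandle >= heapStart);

    uint64_t byteOffset = tableHandle - heapStart;
    assert(byteOffset % incrementSize == 0); /* handle must sit on a slot boundary */

    uint64_t index = byteOffset / incrementSize;
    assert(index <= UINT32_MAX);
    return (uint32_t)index;
}
```

Working in indices rather than byte offsets sidesteps `descriptorBufferOffsetAlignment`, since the heap itself is always bound at offset 0.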
744
745== Porting existing Vulkan applications
746
747Porting an existing Vulkan application to the new API should require minimal additional code, and ideally should allow the removal of older code.
748
749Applications should be uploading descriptors in the exact same manner they upload other resource data (e.g. new textures, constants, etc.).
All advice about how to upload resources (e.g. use staging buffers, use the DMA queue asynchronously, etc.) applies in the exact same manner for descriptors as it does for anything else.
751
752When porting an application then, the aim should not be to create a new separate path for descriptor uploads, but to directly hook into existing resource upload paths.
753This amortises the cost of descriptor uploads with other data uploads and reduces the amount of code dedicated to descriptor management.
754Any improvements to data uploads then automatically apply to descriptor uploads.
755For strategies where resizable BAR or unified memory can be used, none of this is necessary and uploading descriptors becomes `memcpy`.
756
757For descriptor management, pools are removed. Instead of allocating descriptor sets from pools, applications can instead allocate from a custom allocator, which is backed by a big descriptor buffer.
758The size to allocate for a set would be obtained from `vkGetDescriptorSetLayoutSizeEXT` and alignment from `descriptorBufferOffsetAlignment`.
759A linear or arena allocator would be a good match for this.
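
A linear allocator over a descriptor buffer can be very small. The sketch below tracks byte offsets only and leaves buffer creation aside. `DescriptorArena`, `align_up`, `arena_allocate`, and `arena_reset` are hypothetical names; `setSize` is assumed to come from `vkGetDescriptorSetLayoutSizeEXT` and `alignment` from `descriptorBufferOffsetAlignment`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical linear allocator state for one descriptor buffer. */
typedef struct DescriptorArena {
    uint64_t capacity;  /* size of the descriptor buffer, in bytes */
    uint64_t head;      /* first unused byte */
    uint64_t alignment; /* descriptorBufferOffsetAlignment */
} DescriptorArena;

/* Rounds `value` up to the next multiple of `alignment`. Uses division so
 * that it does not rely on the alignment being a power of two. */
uint64_t align_up(uint64_t value, uint64_t alignment)
{
    assert(alignment != 0);
    return ((value + alignment - 1) / alignment) * alignment;
}

/* Reserves `setSize` bytes at a suitably aligned offset. Returns false,
 * leaving the arena unchanged, if the buffer is exhausted. */
bool arena_allocate(DescriptorArena *arena, uint64_t setSize, uint64_t *outOffset)
{
    uint64_t offset = align_up(arena->head, arena->alignment);
    if (offset > arena->capacity || setSize > arena->capacity - offset) {
        return false;
    }
    *outOffset = offset;
    arena->head = offset + setSize;
    return true;
}

/* Releases every allocation at once, e.g. once the GPU has finished
 * with the frame that used them. */
void arena_reset(DescriptorArena *arena)
{
    arena->head = 0;
}
```

The returned offset is what would be passed to `vkCmdSetDescriptorBufferOffsetsEXT`, and also added to the mapped pointer when writing descriptors.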
760
761Instead of updating descriptor sets with `vkUpdateDescriptorSets`, `vkGetDescriptorEXT` could point directly to the mapped descriptor buffer, or a scratch buffer can be used and copied later.
762
763== Example
764
765This example intends to show:
766
767 * How to create descriptor set layouts
768 * How to use immutable samplers with descriptor buffers
769 * How to use embedded immutable samplers
770 * How to use push descriptors
771 * How to allocate enough descriptor buffer memory
772 * How to bind ranges of descriptor buffers to descriptor sets
773
774[source,c]
775----
776VkSampler immutableSamplers[4]; // Create these somehow.
777
778// When using descriptor buffers, it is generally a good idea to separate out samplers and resources into separate sets,
779// since descriptor buffers containing samplers might be very limited in size.
780const VkDescriptorSetLayoutBinding setLayout0[] =
781{
782    {
783        0,                                      // binding
784        VK_DESCRIPTOR_TYPE_SAMPLED_IMAGE,       // descriptorType
785        2,                                      // descriptorCount
786        VK_SHADER_STAGE_FRAGMENT_BIT,           // stageFlags
787        NULL                                    // pImmutableSamplers
788    },
789    {
790        1,                                       // binding
791        VK_DESCRIPTOR_TYPE_UNIFORM_TEXEL_BUFFER, // descriptorType
792        2,                                       // descriptorCount
793        VK_SHADER_STAGE_FRAGMENT_BIT,            // stageFlags
794        NULL                                     // pImmutableSamplers
795    }
796};
797
798const VkDescriptorSetLayoutBinding setLayout1[] =
799{
800    {
801        0,                                      // binding
802        VK_DESCRIPTOR_TYPE_SAMPLER,             // descriptorType
803        2,                                      // descriptorCount
804        VK_SHADER_STAGE_FRAGMENT_BIT,           // stageFlags
805        &immutableSamplers[0],                  // pImmutableSamplers
806    },
807    {
808        1,                                       // binding
809        VK_DESCRIPTOR_TYPE_SAMPLER,              // descriptorType
810        2,                                       // descriptorCount
811        VK_SHADER_STAGE_FRAGMENT_BIT,            // stageFlags
812        NULL,
813    }
814};
815
816const VkDescriptorSetLayoutBinding setLayout2[] =
817{
    // binding to a single storage buffer descriptor
819    {
820        0,                                      // binding
821        VK_DESCRIPTOR_TYPE_STORAGE_BUFFER,      // descriptorType
822        1,                                      // descriptorCount
823        VK_SHADER_STAGE_FRAGMENT_BIT,           // stageFlags
824        NULL                                    // pImmutableSamplers
825    }
826};
827
828// Embedded immutable samplers are internally allocated and we do not need to allocate anything.
829const VkDescriptorSetLayoutBinding setLayout3[] =
830{
831    {
832        0,                                      // binding
833        VK_DESCRIPTOR_TYPE_SAMPLER,             // descriptorType
834        1,                                      // descriptorCount
835        VK_SHADER_STAGE_FRAGMENT_BIT,           // stageFlags
836        &immutableSamplers[2],                  // pImmutableSamplers
837    },
838    {
839        1,                                       // binding
840        VK_DESCRIPTOR_TYPE_SAMPLER,              // descriptorType
841        1,                                       // descriptorCount
842        VK_SHADER_STAGE_FRAGMENT_BIT,            // stageFlags
843        &immutableSamplers[3],                   // pImmutableSamplers
844    }
845};
846
847// Descriptor set layouts are created as normal, but we use the descriptor buffer flag on the set layouts.
848VkDescriptorSetLayout layout0 =
849    create_descriptor_set_layout({ .flags = VK_DESCRIPTOR_SET_LAYOUT_CREATE_DESCRIPTOR_BUFFER_BIT_EXT, .pBindings = setLayout0, .bindingCount = 2 });
850VkDescriptorSetLayout layout1 =
851    create_descriptor_set_layout({ .flags = VK_DESCRIPTOR_SET_LAYOUT_CREATE_DESCRIPTOR_BUFFER_BIT_EXT, .pBindings = setLayout1, .bindingCount = 2 });
852VkDescriptorSetLayout layout2 =
853    create_descriptor_set_layout({ .flags =
854            VK_DESCRIPTOR_SET_LAYOUT_CREATE_DESCRIPTOR_BUFFER_BIT_EXT |
855            VK_DESCRIPTOR_SET_LAYOUT_CREATE_PUSH_DESCRIPTOR_BIT_KHR,
856        .pBindings = setLayout2, .bindingCount = 1 });
857VkDescriptorSetLayout layout3 =
858    create_descriptor_set_layout({ .flags =
859            VK_DESCRIPTOR_SET_LAYOUT_CREATE_DESCRIPTOR_BUFFER_BIT_EXT |
860            VK_DESCRIPTOR_SET_LAYOUT_CREATE_EMBEDDED_IMMUTABLE_SAMPLERS_BIT_EXT,
861        .pBindings = setLayout3, .bindingCount = 2 });
862
863// Use 5 descriptor set layouts, mostly here to demonstrate how multiple sets can refer to one descriptor buffer.
// Also, use embedded sampler sets and push descriptors for completeness.
865VkPipelineLayout layout = create_pipeline_layout({ .layouts = { layout0, layout0, layout1, layout2, layout3 }});
866
867// Query how big the descriptor set layout is.
868VkDeviceSize layoutSizes[2];
869vkGetDescriptorSetLayoutSizeEXT(device, layout0, &layoutSizes[0]);
870vkGetDescriptorSetLayoutSizeEXT(device, layout1, &layoutSizes[1]);
871
872// Align the descriptor set size so it is suitable for suballocation within a descriptor buffer.
873layoutSizes[0] = align(layoutSizes[0], props.descriptorBufferOffsetAlignment);
874layoutSizes[1] = align(layoutSizes[1], props.descriptorBufferOffsetAlignment);
875
876// Query individual offsets into the descriptor set.
877VkDeviceSize layoutOffsets[2][2];
878vkGetDescriptorSetLayoutBindingOffsetEXT(device, layout0, 0, &layoutOffsets[0][0]);
879vkGetDescriptorSetLayoutBindingOffsetEXT(device, layout0, 1, &layoutOffsets[0][1]);
880vkGetDescriptorSetLayoutBindingOffsetEXT(device, layout1, 0, &layoutOffsets[1][0]);
881vkGetDescriptorSetLayoutBindingOffsetEXT(device, layout1, 1, &layoutOffsets[1][1]);
882
883#define SET_COUNT 64
884
885// Allocate the equivalent of a big descriptor pool.
886// The size is arbitrary and should be large and be able to hold all descriptors used by app,
887// for this sample, we allocate the smallest possible descriptor buffer for the number of sets we need.
888// The most compatible thing to do is 1 resource buffer, 1 sampler buffer.
889Buffer resourceBuffer = create_buffer({
890    .size = layoutSizes[0] * 2 * SET_COUNT,
891    .usage = VK_BUFFER_USAGE_RESOURCE_DESCRIPTOR_BUFFER_BIT_EXT |
        (props.bufferlessPushDescriptors ? 0 : VK_BUFFER_USAGE_PUSH_DESCRIPTORS_DESCRIPTOR_BUFFER_BIT_EXT),
893    .properties = VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT | VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT });
894
895Buffer samplerBuffer = create_buffer({
896    .size = layoutSizes[1] * SET_COUNT,
897    .usage = VK_BUFFER_USAGE_SAMPLER_DESCRIPTOR_BUFFER_BIT_EXT,
898    .properties = VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT | VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT });
899
900const VkDescriptorBufferBindingPushDescriptorBufferHandleEXT push_descriptor_buffer_handle = {
901    VK_STRUCTURE_TYPE_DESCRIPTOR_BUFFER_BINDING_PUSH_DESCRIPTOR_BUFFER_HANDLE_EXT, NULL, resourceBuffer.handle};
902
903const VkDescriptorBufferBindingInfoEXT binding_infos[2] = {
904    { VK_STRUCTURE_TYPE_DESCRIPTOR_BUFFER_BINDING_INFO_EXT, (props.bufferlessPushDescriptors ? NULL : &push_descriptor_buffer_handle),
905        resourceBuffer.deviceAddress,
        VK_BUFFER_USAGE_RESOURCE_DESCRIPTOR_BUFFER_BIT_EXT | (props.bufferlessPushDescriptors ? 0 : VK_BUFFER_USAGE_PUSH_DESCRIPTORS_DESCRIPTOR_BUFFER_BIT_EXT) },
907    { VK_STRUCTURE_TYPE_DESCRIPTOR_BUFFER_BINDING_INFO_EXT, NULL, samplerBuffer.deviceAddress,
908        VK_BUFFER_USAGE_SAMPLER_DESCRIPTOR_BUFFER_BIT_EXT }
909};
910
911// Bind the descriptor buffers once, from here, we will offset into the buffer for different descriptor sets.
912vkCmdBindDescriptorBuffersEXT(cmd, VK_PIPELINE_BIND_POINT_GRAPHICS, 0, 2, binding_infos);
913
914// Allocate these somehow, not particularly important to this example.
915VkImageView views[SET_COUNT][2][2];
916VkSampler samplers[SET_COUNT][2];
917VkDeviceAddress bufferAddressTexelBuffer;
918
919// No buffers are associated with embedded immutable samplers. This maps to DX12 static samplers.
920// There is no vkCmdBindPipelineLayout(), so this is the way to do it in Vulkan.
921vkCmdBindDescriptorBufferEmbeddedSamplersEXT(cmd, VK_PIPELINE_BIND_POINT_GRAPHICS, layout, 4);
922
923for (int i = 0; i < SET_COUNT; i++)
924{
925    // This refers to the buffers we bound in vkCmdBindDescriptorBuffersEXT.
926    // Allocate descriptor sets linearly.
927    const uint32_t bufferIndices[] = { 0, 0, 1 };
928    const VkDeviceSize offsets[] = { 2 * i * layoutSizes[0], (2 * i + 1) * layoutSizes[0], i * layoutSizes[1] };
929
930    // Set 0: Resource set pulled from buffer 0
931    // Set 1: Resource set pulled from buffer 0
932    // Set 2: Sampler set pulled from buffer 1
933    // Set 3: Push descriptors
934    // Set 4: Embedded samplers
935
936    vkCmdSetDescriptorBufferOffsetsEXT(cmd, VK_PIPELINE_BIND_POINT_GRAPHICS, layout, 0, 3,
937        bufferIndices, offsets);
938
939    VkWriteDescriptorSet ssbo_write = { /* Fill in as desired, details not interesting here. */ };
940    vkCmdPushDescriptorSetKHR(cmd, VK_PIPELINE_BIND_POINT_GRAPHICS, layout, 3, 1, &ssbo_write);
941
942    VkDescriptorImageInfo image_info = {};
943    VkDescriptorAddressInfoEXT addr_info = { VK_STRUCTURE_TYPE_DESCRIPTOR_ADDRESS_INFO_EXT };
944    VkDescriptorGetInfoEXT info = { VK_STRUCTURE_TYPE_DESCRIPTOR_GET_INFO_EXT };
945
946    for (int j = 0; j < 2; j++)
947    {
948        info.type = VK_DESCRIPTOR_TYPE_SAMPLED_IMAGE;
        info.data.pSampledImage = &image_info;
950        // If descriptorBufferImageLayoutIgnored is enabled, this is ignored, convenient!
951        image_info.imageLayout = VK_IMAGE_LAYOUT_SHADER_READ_ONLY_OPTIMAL;
952
953        // Offset is based on the binding offset + the offset within the descriptor set layout we queried earlier.
954        // For array indexing, use the descriptor size from physical device property.
955        // set j, binding 0, element k
956        for (int k = 0; k < 2; k++)
957        {
958            image_info.imageView = views[i][j][k];
959            vkGetDescriptorEXT(device, &info, props.sampledImageDescriptorSize,
960            resourceBuffer.hostPointer + offsets[j] + layoutOffsets[0][0] + k * props.sampledImageDescriptorSize);
961        }
962
963        // set j, binding 1, element k
964        info.type = VK_DESCRIPTOR_TYPE_UNIFORM_TEXEL_BUFFER;
        info.data.pUniformTexelBuffer = &addr_info;
966        for (int k = 0; k < 2; k++)
967        {
968            addr_info.range = 1024;
969            addr_info.address = bufferAddressTexelBuffer + (4 * i + 2 * j + k) * addr_info.range;
970            // No VkBufferView needed, how convenient!
971            addr_info.format = VK_FORMAT_R8G8B8A8_UNORM;
972            vkGetDescriptorEXT(device, &info, props.uniformTexelBufferDescriptorSize,
973            resourceBuffer.hostPointer + offsets[j] + layoutOffsets[0][1] + k * props.uniformTexelBufferDescriptorSize);
974        }
975    }
976
977    // For immutable samplers, we have to emit the buffer payload.
978    // In practice, the immutable samplers must work even if implementation just ignores pImmutableSamplers.
979    info.type = VK_DESCRIPTOR_TYPE_SAMPLER;
980    // set 2, binding 0, element k
981    for (int k = 0; k < 2; k++)
982    {
983        info.data.pSampler = &immutableSamplers[k];
984        vkGetDescriptorEXT(device, &info, props.samplerDescriptorSize,
985        samplerBuffer.hostPointer + offsets[2] + layoutOffsets[1][0] + k * props.samplerDescriptorSize);
986    }
987
988    // set 2, binding 1, element k
989    for (int k = 0; k < 2; k++)
990    {
991        info.data.pSampler = &samplers[i][k];
992        vkGetDescriptorEXT(device, &info, props.samplerDescriptorSize,
993        samplerBuffer.hostPointer + offsets[2] + layoutOffsets[1][1] + k * props.samplerDescriptorSize);
994    }
995
996    vkCmdDraw(...);
997}
998----
999
== Issues

=== RESOLVED: How do immutable samplers work?

There may be cases where a driver needs immutable samplers stored as part of the descriptor, rather than solely existing as a part of the pipeline.
With descriptor sets, this could be hidden from the application as the driver controlled how writes were performed – not so with this API.
To fix this, samplers must be used to populate these descriptor bindings as if they were not immutable, and they must have been created with identical parameters.

For parity with DX12, a special kind of descriptor set, one with embedded immutable samplers, is supported as an alternative that follows DX12 restrictions.

=== RESOLVED: Should we support dynamic buffers?

No, these have very specialized support paths in some drivers, and end up being more pain than they’re worth to support.
Applications can achieve the same using device addresses in push constants, or pipelined descriptor buffer updates.

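As a rough illustration of the first alternative, the sketch below computes a per-draw buffer device address on the host and packs it into a push constant block, which plays the role a dynamic offset used to play.
The `PerDrawPushData` struct and the `make_per_draw_push_data` helper are hypothetical names, and the base address is assumed to have been obtained from `vkGetBufferDeviceAddress`.

[source,c]
----
#include <assert.h>
#include <stdint.h>

// Hypothetical push constant block: a single 64-bit buffer device address.
typedef struct PerDrawPushData {
    uint64_t constantsAddress;
} PerDrawPushData;

// Emulates a dynamic offset: rather than rebinding with a new offset,
// compute the address of this draw's slice and hand it to the shader.
// Assumption: each draw owns a slice of strideBytes inside one large buffer.
static PerDrawPushData make_per_draw_push_data(uint64_t baseAddress,
                                               uint64_t strideBytes,
                                               uint32_t drawIndex)
{
    assert(baseAddress != 0); // a zero address would denote a null buffer

    PerDrawPushData data;
    data.constantsAddress = baseAddress + strideBytes * (uint64_t)drawIndex;
    return data;
}
----

The resulting struct would then be passed to `vkCmdPushConstants` before each draw, with the shader reading its constants through that address.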

=== UNRESOLVED: How does this interact with descriptor set invalidation?

There’s some extra complication with whether descriptor set layouts work with buffers or sets (`VK_DESCRIPTOR_SET_LAYOUT_CREATE_DESCRIPTOR_BUFFER_BIT_EXT`) that will need sorting.
Shouldn’t be too difficult and will likely just be along the lines of invalidating sets that don’t match in this regard when binding a new pipeline layout, but it’s too much detail for this design document.


=== RESOLVED: Should `vkGetDescriptorOffset` take an `arrayOffset` parameter, or should we make guarantees about how arrays work?

Guarantees about how arrays work make it much easier to work with GPU-side updates, as it avoids having to either add a “get offset” shader intrinsic, or for apps to keep a mapping when doing GPU copies.

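The practical consequence is that the location of any array element can be computed with plain arithmetic, on the host or in a shader.
The sketch below mirrors the addressing used in the example code earlier in this document; the helper name is hypothetical, and it assumes the binding offset was queried with `vkGetDescriptorSetLayoutBindingOffsetEXT` and the descriptor size was read from `VkPhysicalDeviceDescriptorBufferPropertiesEXT`.
It does not cover combined image sampler bindings when `combinedImageSamplerDescriptorSingleArray` is `VK_FALSE`.

[source,c]
----
#include <assert.h>
#include <stddef.h>

// Byte offset of element `elementIndex` of an arrayed binding, relative to
// the start of the descriptor buffer.
//   setOffset      - offset of the descriptor set within the buffer
//   bindingOffset  - offset of the binding within the set layout
//   descriptorSize - size of one descriptor of the binding's type
static size_t descriptor_array_element_offset(size_t setOffset,
                                              size_t bindingOffset,
                                              size_t descriptorSize,
                                              size_t elementIndex)
{
    assert(descriptorSize > 0);
    // Array elements are tightly packed, one descriptor size apart.
    return setOffset + bindingOffset + elementIndex * descriptorSize;
}
----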

=== RESOLVED: Now that descriptors are in regular memory, should there be a limit on the size of “inline uniforms”?

We should allow developers to put as many constants into descriptor buffers as they want, thus removing the limit, at least when it interacts with this extension.
This is likely to remove an indirection compared to putting these in a uniform buffer.
Potentially we might want to at least have it match the uniform buffer limit rather than being independent.


=== RESOLVED: Why are view objects required when DX12 has no such requirement?

DX12 has dedicated heap objects which allow implementations to hide a lot of implementation detail behind them; without them, some vendors rely on view objects to store metadata.
Introducing heaps to Vulkan as-is was too complex alongside the other changes in this extension, when the primary goal is to enable explicit memory management, rather than precise DX12 compatibility.
If this turns out to be a significant problem, a future extension could be developed to bridge this gap.


=== RESOLVED: Should `vkGetDescriptorEXT` / `vkGetDescriptorSetLayoutBindingOffsetEXT` be arrayed?

No – there is no reason why pulling this loop into the driver should provide any benefit.


=== RESOLVED: Should we support combined image/sampler descriptors with this extension?

While some consider these deprecated, removing them would prevent some applications from porting to this extension.
Additionally, YCbCr support currently _relies_ on this descriptor type, which is required on some platforms.
It might be possible to remove that requirement in the YCbCr feature, but it is a lot of work for a fairly low payoff.


=== RESOLVED: How does this interact with variable descriptor count?

The variable flag is allowed; `vkGetDescriptorSetLayoutSizeEXT` returns a size assuming the maximum number of descriptors will be used, but developers are free to use the set with a buffer sized for a smaller number of descriptors.
The exception to this is when `combinedImageSamplerDescriptorSingleArray` is `VK_FALSE` and the binding contains `VK_DESCRIPTOR_TYPE_COMBINED_IMAGE_SAMPLER` descriptors; in this case the image and sampler descriptors are still arranged in the descriptor buffer as though the maximum number of descriptors is used, and so the buffer must be sized accordingly.

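One way to size a buffer for a set that uses fewer descriptors than the declared maximum is sketched below.
The helper name and its parameters are hypothetical; it assumes the offset of the variable-count binding came from `vkGetDescriptorSetLayoutBindingOffsetEXT`, the full size came from `vkGetDescriptorSetLayoutSizeEXT`, and the variable-count binding is the last one in the layout.

[source,c]
----
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

// Bytes needed for a set whose variable-count binding holds `usedCount`
// descriptors instead of the declared maximum.
//   fullLayoutSize            - size reported for the maximum count
//   variableBindingOffset     - offset of the variable-count binding
//   descriptorSize            - size of one descriptor in that binding
//   splitCombinedImageSampler - true when the binding holds combined image
//                               samplers and combinedImageSamplerDescriptorSingleArray
//                               is VK_FALSE
static size_t variable_count_set_size(size_t fullLayoutSize,
                                      size_t variableBindingOffset,
                                      size_t descriptorSize,
                                      size_t usedCount,
                                      bool splitCombinedImageSampler)
{
    // Split image/sampler arrays are laid out for the maximum count,
    // so no space can be saved in that case.
    if (splitCombinedImageSampler) {
        return fullLayoutSize;
    }

    size_t size = variableBindingOffset + usedCount * descriptorSize;
    assert(size <= fullLayoutSize); // usedCount must not exceed the maximum
    return size;
}
----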

=== RESOLVED: Should we require descriptors to be retrieved for `NULL_HANDLE` or is `memset(0)` sufficient?

Some vendors use non-zero values for null descriptors, so applications can retrieve these using `VK_NULL_HANDLE` with `vkGetDescriptorEXT`.
For descriptor types which take buffer device addresses, a `0` address is used instead.

=== RESOLVED: How can YCbCr descriptors be obtained?

YCbCr descriptors can have multiple descriptors associated with them; applications must allow for this space.
`VkSamplerYcbcrConversionImageFormatProperties::combinedImageSamplerDescriptorCount` determines how many descriptors each image format requires.
When calling `vkGetDescriptorEXT` for a YCbCr combined descriptor, applications must provide a pointer to enough memory for this many combined image sampler descriptors, and factor this in when copying descriptors.

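The space to reserve can be worked out as in the sketch below, offered as an illustration rather than a definitive rule.
The helper name is hypothetical; the per-descriptor size is assumed to be `combinedImageSamplerDescriptorSize` from `VkPhysicalDeviceDescriptorBufferPropertiesEXT`, and the count is the value reported for the image format.

[source,c]
----
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Bytes to reserve (and to copy) for one YCbCr combined image sampler.
//   descriptorSize  - size of a single combined image sampler descriptor
//   descriptorCount - combinedImageSamplerDescriptorCount for the format
static size_t ycbcr_descriptor_bytes(size_t descriptorSize,
                                     uint32_t descriptorCount)
{
    assert(descriptorSize > 0);
    assert(descriptorCount >= 1); // every format needs at least one descriptor
    return descriptorSize * (size_t)descriptorCount;
}
----

The same byte count applies when copying such a descriptor from one buffer location to another.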

=== RESOLVED: How should we expect capture/replay tooling (e.g. RenderDoc/vktrace) to use this?

A capture/replay bit on image/buffer creation will be added to enable descriptors to be reused between runs.
This allows capture tools to capture the buffer data as bound, and replay with the same descriptors, rather than attempting to do a mapping.
Some sort of GPU feedback is still desirable on capture to determine which handles are accessed, but this will be similar to the situation with descriptor indexing.


=== RESOLVED: On some platforms, descriptor sets occupy a 4GB range, allowing the set pointer to be 32-bit, rather than 64-bit. How can this be guaranteed for descriptor buffers?

This could be done a number of ways – e.g. having unique memory types that guarantee allocation in a 4GB range.


=== RESOLVED: Should the alignment be separate from the size?

No - the alignment of a descriptor is always the size of the descriptor.


=== RESOLVED: What is the fast path for constant data in this new model? Previously most vendors have recommended dynamic UBOs as a fast path, but those go away in this extension.

Getting data into a shader quickly is mostly dominated by the number of indirections and by cache behavior.
Static accesses with fewer indirections and minimal memory model interactions (e.g. read-only and not `NonPrivate`) will be fastest.
Push constants should be favored for small amounts of data.
For larger amounts of data, applications should favor allocating buffers and putting data into those buffers using whichever of the API mechanisms below is most straightforward for their use case, with some potential degradation at each step down the list.

  * Push constants
  * Pointer to data in push constants
  * Inline uniform data in descriptor buffers
  * Push descriptors
  * Uniform buffer in descriptor memory
  * Storage buffer in descriptor memory

The order listed above does not necessarily hold for all IHVs.


=== RESOLVED: Should applications be able to mix sets and buffers?

Originally the intention was to support this, but at least one vendor cannot support this natively.


=== RESOLVED: Should we use buffer device addresses for the buffer arguments?

Buffer parameters in recent extensions have been using device address arguments, so this extension aims to be consistent.
Part of the reason for that convention, though, is that the base address can be modified with a single pointer argument instead of object + offset.
However, this extension explicitly uses a separate command for setting the offset dynamically, distinct from the one that sets the base address, so that the application can set the base address statically.
Specifying the base address as a device address is still useful for consistency, though.


=== RESOLVED: How does this interact with VK_EXT_pipeline_robustness?

There is no way to request robust and non-robust descriptors separately, or to specify robust/non-robust descriptors in the set layout, so if the `robustBufferAccess` feature is enabled then robust descriptors are always used.
