include::meta/VK_NVX_device_generated_commands.txt[]

*Last Modified Date*::
    2017-07-25
*Contributors*::
  - Pierre Boudier, NVIDIA
  - Christoph Kubisch, NVIDIA
  - Mathias Schott, NVIDIA
  - Jeff Bolz, NVIDIA
  - Eric Werness, NVIDIA
  - Detlef Roettger, NVIDIA
  - Daniel Koch, NVIDIA
  - Chris Hebert, NVIDIA

This extension allows the device to generate a number of critical commands
for command buffers.

When rendering a large number of objects, the device can be leveraged to
implement a number of critical functions, like updating matrices, or
implementing occlusion culling, frustum culling, front-to-back sorting, etc.
Implementing those on the device does not require any special extension,
since an application is free to define its own data structures and process
them using shaders.

However, if the application desires to quickly kick off the rendering of the
final stream of objects, then unextended Vulkan forces the application to
read back the processed stream and issue graphics commands from the host.
For very large scenes, the synchronization overhead and the cost of
generating the command buffer can become the bottleneck.
This extension allows an application to generate a device-side stream of
state changes and commands, and convert it efficiently into a command buffer
without having to read it back on the host.

Furthermore, it allows incremental changes to such command buffers by
manipulating only partial sections of a command stream, for example
pipeline bindings.
Unextended Vulkan requires re-creation of entire command buffers in such a
scenario, or updates synchronized on the host.

The intended usage for this extension is for the application to:

  * create its objects as in unextended Vulkan
  * create a slink:VkObjectTableNVX, and register the various Vulkan objects
    that are needed to evaluate the input parameters.
  * create a slink:VkIndirectCommandsLayoutNVX, which lists the
    slink:VkIndirectCommandsTokenTypeNVX it wants to dynamically change as
    an atomic command sequence.
    This step likely involves some internal device code compilation, since
    the intent is for the GPU to generate the command buffer in the
    pipeline.
  * fill the input buffers with the data for each of the inputs it needs.
    Each input is an array that will be filled with an index in the object
    table, instead of using CPU pointers.
  * set up a target secondary command buffer
  * reserve command buffer space via flink:vkCmdReserveSpaceForCommandsNVX
    in a target command buffer at the position where the generated commands
    are to be executed.
  * call flink:vkCmdProcessCommandsNVX to create the actual device commands
    for all sequences based on the array contents into a provided target
    command buffer.
  * execute the target command buffer like a regular secondary command
    buffer

For each draw/dispatch, the following can be specified:

  * a different pipeline state object
  * a number of descriptor sets, with dynamic offsets
  * a number of vertex buffer bindings, with an optional dynamic offset
  * a different index buffer, with an optional dynamic offset

Applications should: register a small number of objects, and use dynamic
offsets whenever possible.

While the GPU can be faster than a CPU at generating the commands, the
generation may not happen asynchronously; therefore the primary use-case is
generating "`less`" total work (occlusion culling, classification to use
specialized shaders, etc.).

=== New Object Types

  * slink:VkObjectTableNVX
  * slink:VkIndirectCommandsLayoutNVX

=== New Flag Types

  * elink:VkIndirectCommandsLayoutUsageFlagsNVX
  * elink:VkObjectEntryUsageFlagsNVX

=== New Enum Constants

Extending elink:VkStructureType:

  ** ename:VK_STRUCTURE_TYPE_OBJECT_TABLE_CREATE_INFO_NVX
  ** ename:VK_STRUCTURE_TYPE_INDIRECT_COMMANDS_LAYOUT_CREATE_INFO_NVX
  ** ename:VK_STRUCTURE_TYPE_CMD_PROCESS_COMMANDS_INFO_NVX
  ** ename:VK_STRUCTURE_TYPE_CMD_RESERVE_SPACE_FOR_COMMANDS_INFO_NVX
  ** ename:VK_STRUCTURE_TYPE_DEVICE_GENERATED_COMMANDS_LIMITS_NVX
  ** ename:VK_STRUCTURE_TYPE_DEVICE_GENERATED_COMMANDS_FEATURES_NVX

Extending elink:VkPipelineStageFlagBits:

  ** ename:VK_PIPELINE_STAGE_COMMAND_PROCESS_BIT_NVX

Extending elink:VkAccessFlagBits:

  ** ename:VK_ACCESS_COMMAND_PROCESS_READ_BIT_NVX
  ** ename:VK_ACCESS_COMMAND_PROCESS_WRITE_BIT_NVX

=== New Enums

  * elink:VkIndirectCommandsLayoutUsageFlagBitsNVX
  * elink:VkIndirectCommandsTokenTypeNVX
  * elink:VkObjectEntryUsageFlagBitsNVX
  * elink:VkObjectEntryTypeNVX

=== New Structures

  * slink:VkDeviceGeneratedCommandsFeaturesNVX
  * slink:VkDeviceGeneratedCommandsLimitsNVX
  * slink:VkIndirectCommandsTokenNVX
  * slink:VkIndirectCommandsLayoutTokenNVX
  * slink:VkIndirectCommandsLayoutCreateInfoNVX
  * slink:VkCmdProcessCommandsInfoNVX
  * slink:VkCmdReserveSpaceForCommandsInfoNVX
  * slink:VkObjectTableCreateInfoNVX
  * slink:VkObjectTableEntryNVX
  * slink:VkObjectTablePipelineEntryNVX
  * slink:VkObjectTableDescriptorSetEntryNVX
  * slink:VkObjectTableVertexBufferEntryNVX
  * slink:VkObjectTableIndexBufferEntryNVX
  * slink:VkObjectTablePushConstantEntryNVX

=== New Functions

  * flink:vkCmdProcessCommandsNVX
  * flink:vkCmdReserveSpaceForCommandsNVX
  * flink:vkCreateIndirectCommandsLayoutNVX
  * flink:vkDestroyIndirectCommandsLayoutNVX
  * flink:vkCreateObjectTableNVX
  * flink:vkDestroyObjectTableNVX
  * flink:vkRegisterObjectsNVX
  * flink:vkUnregisterObjectsNVX
  * flink:vkGetPhysicalDeviceGeneratedCommandsPropertiesNVX

=== Issues

1) How to name this extension?

*RESOLVED*: `VK_NVX_device_generated_commands`

As usual, one of the hardest issues ;)

Alternatives: `VK_gpu_commands`, `VK_execute_commands`,
`VK_device_commands`, `VK_device_execute_commands`, `VK_device_execute`,
`VK_device_created_commands`, `VK_device_recorded_commands`,
`VK_device_generated_commands`

2) Should we use serial tokens or redundant sequence description?

Similarly to slink:VkPipeline, signatures are the most likely to be
adoptable across vendors.
They also benefit from being processable in parallel.

3) How to name sequence description?

stext:ExecuteCommandSignature is a bit long.
Maybe just stext:ExecuteSignature, or, following Vulkan nomenclature more
closely: slink:VkIndirectCommandsLayoutNVX.

4) Do we want to provide code:indirectCommands inputs with layout or at
code:indirectCommands time?

Separate layout from data as Vulkan does.
Provide full flexibility for code:indirectCommands.

5) Should the input be provided as SoA or AoS?

It is desirable for the application to reuse the list of objects and render
them with some kind of an override.
This can be done by just selecting a different input for a push constant or
a descriptor set, if they are defined as independent arrays.
If the data were interleaved, this would not be as easy.

Allowing input divisors can also reduce the conservative command buffer
allocation.

6) How do we know the size of the GPU command buffer generated by
flink:vkCmdProcessCommandsNVX?

pname:maxSequencesCount can give an upper estimate, even if the actual count
is sourced from the GPU buffer at (buffer, countOffset).
As such pname:maxSequencesCount must always be set correctly.

Developers are encouraged to make good use of the
slink:VkIndirectCommandsLayoutNVX's ptext:pTokens[].divisor, as divisors
allow less conservative storage costs.
Pipeline changes on a per-draw basis in particular can be costly
memory-wise.

7) How to deal with dynamic offsets in DescriptorSets?

Maybe an additional token etext:VK_EXECUTE_DESCRIPTOR_SET_OFFSET_COMMAND_NVX
that works for a "`single dynamic buffer`" descriptor set and then uses (32
bit tableEntry + 32bit offset).

Added a dynamicCount field; the input is variable sized.

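A variable-sized descriptor set input of this kind can be sketched as follows.
The layout used here, one 32-bit table index followed by `dynamicCount` 32-bit offsets per element, is an assumption extrapolated from the "`32 bit tableEntry + 32bit offset`" idea above; the helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed element layout for a descriptor set input:
 *   uint32_t tableIndex;
 *   uint32_t offsets[dynamicCount];
 * so the element size grows with dynamicCount. */
size_t sketch_descriptor_set_stride(uint32_t dynamicCount)
{
    return sizeof(uint32_t) * ((size_t)dynamicCount + 1u);
}

/* Pointer to element i in a tightly packed input array of such elements.
 * element[0] is the table index, element[1..dynamicCount] the offsets. */
const uint32_t *sketch_descriptor_set_element(const uint32_t *input,
                                              uint32_t dynamicCount,
                                              uint32_t i)
{
    assert(input != NULL);
    return input + (size_t)i * ((size_t)dynamicCount + 1u);
}
```

With `dynamicCount` equal to 0 the element degenerates to a bare table index, which matches the inputs used by tokens that have no dynamic offsets.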
8) Should we allow updates to the object table, similar to DescriptorSet?

Desired, yes: people may change "`material`" shaders and not want to
recreate the entire register table.
However, the developer must ensure not to overwrite a registered objectIndex
while it is still being used.

9) Should we allow dynamic state changes?

Seems a bit excessive for "`per-draw`" type of scenario, but GPU could
partition work itself with viewport/scissor...

10) How do we allow re-using already "`filled`" code:indirectCommands
buffers?

Just use a slink:VkCommandBuffer for the output, and it can be reused
easily.

11) How portable should such re-use be?

Same as a secondary command buffer.

12) Should sequenceOrdered be part of IndirectCommandsLayout or
flink:vkCmdProcessCommandsNVX?

Seems better for IndirectCommandsLayout, as that is when most heavy lifting
in terms of internal device code generation is done.

13) Under which conditions is flink:vkCmdProcessCommandsNVX legal?

Options:

a) on the host command buffer like a regular draw call

b) flink:vkCmdProcessCommandsNVX makes use of
   slink:VkCommandBufferBeginInfo and serves as flink:vkBeginCommandBuffer /
   flink:vkEndCommandBuffer implicitly.

c) The pname:targetCommandBuffer must be inside the "`begin`" state already
   at the moment of being passed.
   This very likely suggests a new slink:VkCommandBufferUsageFlags
   etext:VK_COMMAND_BUFFER_USAGE_DEVICE_GENERATED_BIT.

d) The pname:targetCommandBuffer must reserve space via a new function.

Used a) and d).

14) What if different pipelines have different DescriptorSetLayouts at a
certain set unit that mismatches in code:token.dynamicCount?

Considered legal, as long as the maximum dynamic count of all used
DescriptorSetLayouts is provided.

15) Should we add "`strides`" to input arrays, so that "`Array of
Structures`" type setups can be supported more easily?

Maybe provide a usage flag for packed tokens stream (all inputs from same
buffer, implicit stride).

No, given that performance in testing was worse.

16) Should we allow re-using the target command buffer directly, without
need to reset command buffer?

YES: new API flink:vkCmdReserveSpaceForCommandsNVX.

17) Is flink:vkCmdProcessCommandsNVX copying the input data or referencing
it?

There are multiple implementations possible:

  * one could have some emulation code that parses the inputs, and generates
    an output command buffer, therefore copying the inputs.
  * one could just reference the inputs, and have the processing done in
    pipe at execution time.

If the data is mandated to be copied, then it puts a penalty on
implementations that could process the inputs directly in pipe.
If the data is "`referenced`", then it allows both types of implementation.

The inputs are "`referenced`", and should not be modified after the call to
flink:vkCmdProcessCommandsNVX and until after the rendering of the target
command buffer is finished.

18) Why is this +NVX+ and not +NV+?

To allow early experimentation and feedback.
We expect that a version with a refined design as a multi-vendor variant
will follow up.

19) Should we make the availability for each token type a device limit?

Only distinguish between graphics/compute for now; further splitting up may
lead to too much fragmentation.

20) When can the pname:objectTable be modified?

Similar to the other inputs for flink:vkCmdProcessCommandsNVX, only when all
device access via flink:vkCmdProcessCommandsNVX or execution of target
command buffer has completed can an object at a given objectIndex be
unregistered or re-registered again.

21) Which buffer usage flags are required for the buffers referenced by
flink:vkCmdProcessCommandsNVX?

Reuse the existing ename:VK_BUFFER_USAGE_INDIRECT_BUFFER_BIT for:

  * slink:VkCmdProcessCommandsInfoNVX::pname:sequencesCountBuffer
  * slink:VkCmdProcessCommandsInfoNVX::pname:sequencesIndexBuffer
  * slink:VkIndirectCommandsTokenNVX::pname:buffer

22) In which pipeline stage does the device generated command expansion
happen?

flink:vkCmdProcessCommandsNVX is treated as if it occurs in a separate
logical pipeline from either graphics or compute, and that pipeline only
includes ename:VK_PIPELINE_STAGE_TOP_OF_PIPE_BIT, a new stage
ename:VK_PIPELINE_STAGE_COMMAND_PROCESS_BIT_NVX, and
ename:VK_PIPELINE_STAGE_BOTTOM_OF_PIPE_BIT.
This new stage has two corresponding new access types,
ename:VK_ACCESS_COMMAND_PROCESS_READ_BIT_NVX and
ename:VK_ACCESS_COMMAND_PROCESS_WRITE_BIT_NVX, used to synchronize reading
the buffer inputs and writing the command buffer memory output.
The output written in the target command buffer is considered to be consumed
by the ename:VK_PIPELINE_STAGE_DRAW_INDIRECT_BIT pipeline stage.

Thus, to synchronize from writing the input buffers to executing
flink:vkCmdProcessCommandsNVX, use:

  * pname:dstStageMask = ename:VK_PIPELINE_STAGE_COMMAND_PROCESS_BIT_NVX
  * pname:dstAccessMask = ename:VK_ACCESS_COMMAND_PROCESS_READ_BIT_NVX

To synchronize from executing flink:vkCmdProcessCommandsNVX to executing the
generated commands, use:

  * pname:srcStageMask = ename:VK_PIPELINE_STAGE_COMMAND_PROCESS_BIT_NVX
  * pname:srcAccessMask = ename:VK_ACCESS_COMMAND_PROCESS_WRITE_BIT_NVX
  * pname:dstStageMask = ename:VK_PIPELINE_STAGE_DRAW_INDIRECT_BIT
  * pname:dstAccessMask = ename:VK_ACCESS_INDIRECT_COMMAND_READ_BIT

When flink:vkCmdProcessCommandsNVX is used with a pname:targetCommandBuffer
of `NULL`, the generated commands are immediately executed and there is
implicit synchronization between generation and execution.

23) What if most token data is "`static`", but we frequently want to render
a subsection?

Added "`sequencesIndexBuffer`".
This makes it easier to sort and filter what should actually be processed.

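How a device-sourced count and an index buffer could select the sequences to process can be sketched on the host.
The sketch assumes that the effective count is the smaller of `maxSequencesCount` and the value read from the count buffer, and that an index buffer, when present, redirects sequence `s` to `indexBuffer[s]`; both behaviours and the function name `sketch_collect_sequences` are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Writes the sequence indices that would be processed into `out` (which
 * must hold at least maxSequencesCount entries) and returns how many
 * there are.
 *   countBuffer: optional device-written count, NULL if unused
 *   indexBuffer: optional indirection list, NULL if unused; must hold at
 *                least as many entries as the effective count */
uint32_t sketch_collect_sequences(uint32_t maxSequencesCount,
                                  const uint32_t *countBuffer,
                                  const uint32_t *indexBuffer,
                                  uint32_t *out)
{
    assert(out != NULL);
    uint32_t count = maxSequencesCount;
    if (countBuffer != NULL && *countBuffer < count) {
        count = *countBuffer; /* never exceed the reserved upper bound */
    }
    for (uint32_t s = 0; s < count; ++s) {
        out[s] = (indexBuffer != NULL) ? indexBuffer[s] : s;
    }
    return count;
}
```

In this model a culling pass only has to write a compacted list of surviving indices and a count, while the mostly static token data stays untouched.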
=== Example Code

Open-source samples illustrating the usage of the extension can be found at
the following locations:

https://github.com/nvpro-samples/gl_vk_threaded_cadscene/blob/master/doc/vulkan_nvxdevicegenerated.md

https://github.com/NVIDIAGameWorks/GraphicsSamples/tree/master/samples/vk10-kepler/BasicDeviceGeneratedCommandsVk

[source,c]
---------------------------------------------------

  // set up the secondary command buffer
    vkBeginCommandBuffer(generatedCmdBuffer, &beginInfo);
    ... setup its state as usual

  // insert the reservation (there can only be one per command buffer)
  // at the position the generated calls should be filled into
    VkCmdReserveSpaceForCommandsInfoNVX reserveInfo = { VK_STRUCTURE_TYPE_CMD_RESERVE_SPACE_FOR_COMMANDS_INFO_NVX };
    reserveInfo.objectTable = objectTable;
    reserveInfo.indirectCommandsLayout = deviceGeneratedLayout;
    reserveInfo.maxSequencesCount = myCount;
    vkCmdReserveSpaceForCommandsNVX(generatedCmdBuffer, &reserveInfo);

    vkEndCommandBuffer(generatedCmdBuffer);

  // trigger the generation at some point in another primary command buffer
    VkCmdProcessCommandsInfoNVX processInfo = { VK_STRUCTURE_TYPE_CMD_PROCESS_COMMANDS_INFO_NVX };
    processInfo.objectTable = objectTable;
    processInfo.indirectCommandsLayout = deviceGeneratedLayout;
    processInfo.maxSequencesCount = myCount;
    // set the target of the generation (if null we would directly execute with mainCmd)
    processInfo.targetCommandBuffer = generatedCmdBuffer;
    // provide input data
    processInfo.indirectCommandsTokenCount = 3;
    processInfo.pIndirectCommandsTokens = myTokens;

  // If you modify the input buffer data referenced by VkCmdProcessCommandsInfoNVX,
  // ensure you have added the appropriate barriers prior to the generation process.
  // When regenerating the content of the same reserved space, ensure prior operations have completed.

    VkMemoryBarrier memoryBarrier = { VK_STRUCTURE_TYPE_MEMORY_BARRIER };
    memoryBarrier.srcAccessMask = ...;
    memoryBarrier.dstAccessMask = VK_ACCESS_COMMAND_PROCESS_READ_BIT_NVX;

    vkCmdPipelineBarrier(mainCmd,
                         /*srcStageMask*/VK_PIPELINE_STAGE_ALL_COMMANDS_BIT,
                         /*dstStageMask*/VK_PIPELINE_STAGE_COMMAND_PROCESS_BIT_NVX,
                         /*dependencyFlags*/0,
                         /*memoryBarrierCount*/1,
                         /*pMemoryBarriers*/&memoryBarrier,
                         ...);

    vkCmdProcessCommandsNVX(mainCmd, &processInfo);
    ...
  // ensure the processing that modifies the command-buffer content has completed,
  // then execute the secondary command buffer

    memoryBarrier.srcAccessMask = VK_ACCESS_COMMAND_PROCESS_WRITE_BIT_NVX;
    memoryBarrier.dstAccessMask = VK_ACCESS_INDIRECT_COMMAND_READ_BIT;

    vkCmdPipelineBarrier(mainCmd,
                         /*srcStageMask*/VK_PIPELINE_STAGE_COMMAND_PROCESS_BIT_NVX,
                         /*dstStageMask*/VK_PIPELINE_STAGE_DRAW_INDIRECT_BIT,
                         /*dependencyFlags*/0,
                         /*memoryBarrierCount*/1,
                         /*pMemoryBarriers*/&memoryBarrier,
                         ...);
    vkCmdExecuteCommands(mainCmd, 1, &generatedCmdBuffer);

---------------------------------------------------

=== Version History

 * Revision 3, 2017-07-25 (Chris Hebert)
   - Correction to specification of dynamicCount for push_constant token in
     VkIndirectCommandsLayoutNVX.
     Stride was incorrectly computed as dynamicCount was not treated as byte
     size.
 * Revision 2, 2017-06-01 (Christoph Kubisch)
   - header compatibility break: add missing _TYPE to
     VkIndirectCommandsTokenTypeNVX and VkObjectEntryTypeNVX enums to follow
     Vulkan naming convention
   - behavior clarification: only allow a single work provoking token per
     sequence when creating a slink:VkIndirectCommandsLayoutNVX
 * Revision 1, 2016-10-31 (Christoph Kubisch)
   - Initial draft