// Copyright (c) 2020 NVIDIA Corporation
//
// SPDX-License-Identifier: CC-BY-4.0

include::{generated}/meta/{refprefix}VK_NV_device_generated_commands.adoc[]

=== Other Extension Metadata

*Last Modified Date*::
    2020-02-20
*Interactions and External Dependencies*::
  - This extension requires Vulkan 1.1
  - This extension requires `VK_EXT_buffer_device_address` or
    `VK_KHR_buffer_device_address` or Vulkan 1.2 for the ability to bind
    vertex and index buffers on the device.
  - This extension interacts with `VK_NV_mesh_shader`.
    If the latter extension is not supported, remove the command token to
    initiate mesh tasks drawing in this extension.
*Contributors*::
  - Christoph Kubisch, NVIDIA
  - Pierre Boudier, NVIDIA
  - Jeff Bolz, NVIDIA
  - Eric Werness, NVIDIA
  - Yuriy O'Donnell, Epic Games
  - Baldur Karlsson, Valve
  - Mathias Schott, NVIDIA
  - Tyson Smith, NVIDIA
  - Ingo Esser, NVIDIA

=== Description

This extension allows the device to generate a number of critical graphics
commands for command buffers.

When rendering a large number of objects, the device can be leveraged to
implement a number of critical functions, like updating matrices, or
implementing occlusion culling, frustum culling, front to back sorting, etc.
Implementing those on the device does not require any special extension,
since an application is free to define its own data structures, and just
process them using shaders.

However, if the application desires to quickly kick off the rendering of the
final stream of objects, then unextended Vulkan forces the application to
read back the processed stream and issue graphics commands from the host.
For very large scenes, the synchronization overhead and cost to generate the
command buffer can become the bottleneck.
This extension allows an application to generate a device-side stream of
state changes and commands, and convert it efficiently into a command buffer
without having to read it back to the host.

Furthermore, it allows incremental changes to such command buffers by
manipulating only partial sections of a command stream -- for example
pipeline bindings.
Unextended Vulkan requires re-creation of entire command buffers in such a
scenario, or updates synchronized on the host.

The intended usage for this extension is for the application to:

  * create sname:VkBuffer objects and retrieve physical addresses from them
    via flink:vkGetBufferDeviceAddressEXT
  * create a graphics pipeline using
    sname:VkGraphicsPipelineShaderGroupsCreateInfoNV for the ability to
    change shaders on the device.
  * create a slink:VkIndirectCommandsLayoutNV, which lists the
    elink:VkIndirectCommandsTokenTypeNV it wants to dynamically execute as
    an atomic command sequence.
    This step likely involves some internal device code compilation, since
    the intent is for the GPU to generate the command buffer in the
    pipeline.
  * fill the input stream buffers with the data for each of the inputs it
    needs.
    Each input is an array that will be filled with token-dependent data.
  * set up a preprocess sname:VkBuffer that uses memory according to the
    information retrieved via
    flink:vkGetGeneratedCommandsMemoryRequirementsNV.
  * optionally preprocess the generated content using
    flink:vkCmdPreprocessGeneratedCommandsNV, for example on an asynchronous
    compute queue, or for the purpose of re-using the data in multiple
    executions.
  * call flink:vkCmdExecuteGeneratedCommandsNV to create and execute the
    actual device commands for all sequences based on the inputs provided.

For each draw in a sequence, the following can be specified:

  * a different shader group
  * a number of vertex buffer bindings
  * a different index buffer, with an optional dynamic offset and index type
  * a number of different push constants
  * a flag that encodes the primitive winding

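As an illustration of the per-draw inputs listed above, the following C
sketch lays out the data of one sequence as a single interleaved record.
All `Example` names are hypothetical stand-ins: their fields are assumed to
mirror the indirect command structures listed in the interfaces of this
extension, and both the token order and the 4-byte alignment check are
assumptions made for the example only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins; field layouts are assumed to mirror the
 * extension's indirect command structures. */
typedef uint64_t ExampleDeviceAddress;

typedef struct { uint32_t groupIndex; } ExampleBindShaderGroup;
typedef struct { uint32_t data; } ExampleSetStateFlags;

typedef struct {
    ExampleDeviceAddress bufferAddress;
    uint32_t size;
    uint32_t indexType;
} ExampleBindIndexBuffer;

typedef struct {
    ExampleDeviceAddress bufferAddress;
    uint32_t size;
    uint32_t stride;
} ExampleBindVertexBuffer;

typedef struct {
    uint32_t indexCount;
    uint32_t instanceCount;
    uint32_t firstIndex;
    int32_t  vertexOffset;
    uint32_t firstInstance;
} ExampleDrawIndexed;

/* One sequence: every token's data for a single draw, stored back to back. */
typedef struct {
    ExampleBindShaderGroup  shaderGroup;      /* which shader group to use */
    ExampleSetStateFlags    stateFlags;       /* e.g. primitive winding    */
    ExampleBindIndexBuffer  indexBuffer;      /* index buffer binding      */
    ExampleBindVertexBuffer vertexBuffer;     /* one vertex buffer binding */
    float                   pushConstants[4]; /* application-defined data  */
    ExampleDrawIndexed      draw;             /* the draw itself           */
} ExampleSequence;

/* Assumption for this sketch: token data must start on a 4-byte boundary. */
static_assert(offsetof(ExampleSequence, draw) % 4 == 0,
              "draw token is assumed to need 4-byte alignment");
```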
While the GPU can generate the commands faster than a CPU, the generation
does not happen asynchronously to the device.
The primary use case is therefore generating "`less`" total work (occlusion
culling, classification to use specialized shaders, etc.).

include::{generated}/interfaces/VK_NV_device_generated_commands.adoc[]

=== Issues

1) How to name this extension?

`VK_NV_device_generated_commands`

As usual, one of the hardest issues ;)

Alternatives: `VK_gpu_commands`, `VK_execute_commands`,
`VK_device_commands`, `VK_device_execute_commands`, `VK_device_execute`,
`VK_device_created_commands`, `VK_device_recorded_commands`,
`VK_device_generated_commands`, `VK_indirect_generated_commands`

2) Should we use a serial stateful token stream or stateless sequence
descriptions?

Similarly to slink:VkPipeline, fixed layouts are the most likely to be
adoptable across vendors.
They also benefit from being processable in parallel.
This is a different design choice compared to the serial command stream
generated through `GL_NV_command_list`.

3) How to name a sequence description?

`VkIndirectCommandsLayout` as in the NVX extension predecessor.

Alternative: `VkGeneratedCommandsLayout`

4) Do we want to provide code:indirectCommands inputs with layout or at
code:indirectCommands time?

Separate layout from data as Vulkan does.
Provide full flexibility for code:indirectCommands.

5) Should the input be provided as SoA or AoS?

Both ways are desirable.
AoS can provide portability to other APIs and is easier to set up, while SoA
allows individual inputs to be updated in a cache-efficient manner while the
others remain static.

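The difference between the two arrangements can be sketched as an address
computation.
The C fragment below assumes that every token names the input stream it
lives in plus a byte offset, and that every stream has its own per-sequence
stride; `ExampleToken` and `example_token_byte_offset` are hypothetical
names for this illustration, not part of the API.
With a single stream the data is AoS; with one stream per token it is SoA.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical description of where one token's data lives. */
typedef struct {
    uint32_t stream; /* index of the input stream holding the token      */
    uint32_t offset; /* byte offset of the token within one stream entry */
} ExampleToken;

/* Byte offset of a token's data for a given sequence, relative to the
 * start of the stream the token belongs to. */
static uint64_t example_token_byte_offset(const ExampleToken *token,
                                          const uint32_t *streamStrides,
                                          uint32_t streamCount,
                                          uint32_t sequence)
{
    assert(token->stream < streamCount);
    return (uint64_t)sequence * streamStrides[token->stream] + token->offset;
}
```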
6) How do we make developers aware of the memory requirements of
implementation-dependent data used for the generated commands?

Make the API explicit and introduce a `preprocess` slink:VkBuffer.
Developers have to allocate it using
flink:vkGetGeneratedCommandsMemoryRequirementsNV.

In the NVX version the requirements were hidden implicitly as part of the
command buffer reservation process, however as the memory requirements can
be substantial, we want to give developers the ability to budget the memory
themselves.
Lowering `maxSequencesCount` reduces the memory consumption.
Furthermore, reuse of the memory is possible, for example for doing explicit
preprocessing and execution in a ping-pong fashion.

The actual buffer size is implementation-dependent and may be zero, i.e. not
always required.

When making use of Graphics Shader Groups, the programs should behave
similarly with regard to vertex inputs, clipping and culling outputs of the
geometry stage, as well as sample shading behavior in fragment shaders, to
reduce the amount of the worst-case memory approximation.

7) Should we allow additional per-sequence dynamic state changes?

Yes.

Introduced a lightweight indirect state flag
elink:VkIndirectStateFlagBitsNV.
So far only switching front face winding state is exposed.
Especially in CAD/DCC, mirrored transforms that require such changes are
common, and similar flexibility is given in the ray tracing instance
description.

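To illustrate why mirrored transforms need this flag, the C sketch below
derives a per-sequence state value from the sign of a transform's
determinant: a negative determinant mirrors the geometry and therefore
reverses the apparent winding.
`EXAMPLE_STATE_FLAG_FRONTFACE` and both function names are hypothetical; the
bit that the application actually writes depends on how the state flags
token is set up in the layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bit used in this sketch to request flipped winding. */
#define EXAMPLE_STATE_FLAG_FRONTFACE 0x1u

/* Determinant of a row-major 3x3 matrix (the linear part of a transform). */
static float example_det3(const float m[9])
{
    return m[0] * (m[4] * m[8] - m[5] * m[7])
         - m[1] * (m[3] * m[8] - m[5] * m[6])
         + m[2] * (m[3] * m[7] - m[4] * m[6]);
}

/* State value to store for one sequence: request the winding flip only
 * when the object's transform mirrors the geometry. */
static uint32_t example_state_flags(const float linearPart[9])
{
    assert(linearPart != NULL);
    return example_det3(linearPart) < 0.0f ? EXAMPLE_STATE_FLAG_FRONTFACE : 0u;
}
```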
The flag could be extended further, for example to switch between
primitive-lists or -strips, or make other state modifications.

Furthermore, as new tokens can be added easily, a future extension could add
the ability to change any elink:VkDynamicState.

8) How do we allow re-using already "`generated`" code:indirectCommands?

Expose a `preprocessBuffer` to reuse implementation-dependent data.
Set `isPreprocessed` to true in flink:vkCmdExecuteGeneratedCommandsNV.

9) Under which conditions is flink:vkCmdExecuteGeneratedCommandsNV legal?

It behaves like a regular draw call command.

10) Is flink:vkCmdPreprocessGeneratedCommandsNV copying the input data or
referencing it?

There are multiple implementations possible:

  * one could have some emulation code that parses the inputs, and generates
    an output command buffer, therefore copying the inputs.
  * one could just reference the inputs, and have the processing done in
    pipe at execution time.

If the data is mandated to be copied, then it puts a penalty on
implementations that could process the inputs directly in pipe.
If the data is "`referenced`", then it allows both types of implementation.

The inputs are "`referenced`", and must: not be modified until the call to
flink:vkCmdExecuteGeneratedCommandsNV has completed.

11) Which buffer usage flags are required for the buffers referenced by
sname:VkGeneratedCommandsInfoNV?

Reuse existing ename:VK_BUFFER_USAGE_INDIRECT_BUFFER_BIT

  * slink:VkGeneratedCommandsInfoNV::pname:preprocessBuffer
  * slink:VkGeneratedCommandsInfoNV::pname:sequencesCountBuffer
  * slink:VkGeneratedCommandsInfoNV::pname:sequencesIndexBuffer
  * slink:VkIndirectCommandsStreamNV::pname:buffer

12) In which pipeline stage does the device generated command expansion
happen?

flink:vkCmdPreprocessGeneratedCommandsNV is treated as if it occurs in a
separate logical pipeline from either graphics or compute, and that pipeline
only includes ename:VK_PIPELINE_STAGE_TOP_OF_PIPE_BIT, a new stage
ename:VK_PIPELINE_STAGE_COMMAND_PREPROCESS_BIT_NV, and
ename:VK_PIPELINE_STAGE_BOTTOM_OF_PIPE_BIT.
This new stage has two corresponding new access types,
ename:VK_ACCESS_COMMAND_PREPROCESS_READ_BIT_NV and
ename:VK_ACCESS_COMMAND_PREPROCESS_WRITE_BIT_NV, used to synchronize reading
the buffer inputs and writing the preprocess memory output.

The generated output written in the preprocess buffer memory by
flink:vkCmdExecuteGeneratedCommandsNV is considered to be consumed by the
ename:VK_PIPELINE_STAGE_DRAW_INDIRECT_BIT pipeline stage.

Thus, to synchronize from writing the input buffers to preprocessing via
flink:vkCmdPreprocessGeneratedCommandsNV, use:

  * pname:dstStageMask = ename:VK_PIPELINE_STAGE_COMMAND_PREPROCESS_BIT_NV
  * pname:dstAccessMask = ename:VK_ACCESS_COMMAND_PREPROCESS_READ_BIT_NV

To synchronize from flink:vkCmdPreprocessGeneratedCommandsNV to executing
the generated commands by flink:vkCmdExecuteGeneratedCommandsNV, use:

  * pname:srcStageMask = ename:VK_PIPELINE_STAGE_COMMAND_PREPROCESS_BIT_NV
  * pname:srcAccessMask = ename:VK_ACCESS_COMMAND_PREPROCESS_WRITE_BIT_NV
  * pname:dstStageMask = ename:VK_PIPELINE_STAGE_DRAW_INDIRECT_BIT
  * pname:dstAccessMask = ename:VK_ACCESS_INDIRECT_COMMAND_READ_BIT

When flink:vkCmdExecuteGeneratedCommandsNV is used with a
pname:isPreprocessed of `VK_FALSE`, the generated commands are implicitly
preprocessed, therefore one only needs to synchronize the inputs via:

  * pname:dstStageMask = ename:VK_PIPELINE_STAGE_DRAW_INDIRECT_BIT
  * pname:dstAccessMask = ename:VK_ACCESS_INDIRECT_COMMAND_READ_BIT

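The choice of destination scope for the input buffers can be summarized in
a small helper.
The sketch below uses hypothetical `Example` enumerants in place of the real
Vulkan stage and access bits, so that it stays self-contained; each one is
named after the bit it stands for, and the numeric values carry no meaning.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the pipeline stage bits named above. */
typedef enum {
    EXAMPLE_STAGE_COMMAND_PREPROCESS, /* ..._COMMAND_PREPROCESS_BIT_NV */
    EXAMPLE_STAGE_DRAW_INDIRECT       /* ..._DRAW_INDIRECT_BIT         */
} ExampleStage;

/* Hypothetical stand-ins for the access bits named above. */
typedef enum {
    EXAMPLE_ACCESS_COMMAND_PREPROCESS_READ, /* ..._PREPROCESS_READ_BIT_NV  */
    EXAMPLE_ACCESS_INDIRECT_COMMAND_READ    /* ..._INDIRECT_COMMAND_READ_BIT */
} ExampleAccess;

typedef struct {
    ExampleStage  dstStage;
    ExampleAccess dstAccess;
} ExampleInputSync;

/* Destination scope of the barrier that makes writes to the input buffers
 * visible. With an explicit preprocess call the inputs are first read by
 * the preprocess stage; otherwise the execute call preprocesses implicitly
 * and the inputs are read like indirect draw data. */
static ExampleInputSync example_input_sync(bool explicitPreprocess)
{
    ExampleInputSync sync;
    if (explicitPreprocess) {
        sync.dstStage  = EXAMPLE_STAGE_COMMAND_PREPROCESS;
        sync.dstAccess = EXAMPLE_ACCESS_COMMAND_PREPROCESS_READ;
    } else {
        sync.dstStage  = EXAMPLE_STAGE_DRAW_INDIRECT;
        sync.dstAccess = EXAMPLE_ACCESS_INDIRECT_COMMAND_READ;
    }
    return sync;
}
```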
13) What if most token data is "`static`", but we frequently want to render
a subsection?

Added "`sequencesIndexBuffer`".
This makes it easier to sort and filter what should actually be executed.

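A host-side emulation of this indirection might look like the following C
sketch.
It assumes that the index buffer holds one `uint32_t` sequence index per
executed sequence and that, without an index buffer, sequences are executed
in order; `example_resolve_sequences` is a hypothetical helper, not an API
function.

```c
#include <stddef.h>
#include <stdint.h>

/* Writes the sequence indices that would be executed, in execution order,
 * into outSequences and returns how many were written.
 * sequencesIndex may be NULL, meaning that no index buffer is used. */
static uint32_t example_resolve_sequences(const uint32_t *sequencesIndex,
                                          uint32_t sequencesCount,
                                          uint32_t *outSequences)
{
    for (uint32_t i = 0; i < sequencesCount; ++i) {
        /* Static token data stays in place; only the small index list
         * changes when a different subset or order is wanted. */
        outSequences[i] = (sequencesIndex != NULL) ? sequencesIndex[i] : i;
    }
    return sequencesCount;
}
```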
14) What are the changes compared to the previous NVX extension?

  * Compute dispatch support was removed (was never implemented in drivers).
    There are different approaches to how dispatching from the device should
    work, hence we defer this to a future extension.
  * The `ObjectTableNVX` was replaced by using physical buffer addresses and
    introducing Shader Groups for the graphics pipeline.
  * Fewer state changes are possible overall, but the important operations
    are still there (reduces complexity of implementation).
  * The API was redesigned so all inputs must be passed at both
    preprocessing and execution time (this was implicit in NVX, now it is
    explicit).
  * The reservation of intermediate command space is now mandatory and
    explicit through a preprocess buffer.
  * The elink:VkIndirectStateFlagBitsNV were introduced.

15) When porting from other APIs, their indirect buffers may use different
    enums, for example for index buffer types.
    How to solve this?

Added "`pIndexTypeValues`" to map custom `uint32_t` values to corresponding
ename:VkIndexType.

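The mapping can be pictured as two parallel arrays, one holding the custom
values found in the indirect buffer and one holding the index type each of
them stands for.
The C sketch below is a host-side illustration only: `ExampleIndexType` and
`example_map_index_type` are hypothetical, and the custom values 57 and 42
are the `DXGI_FORMAT_R16_UINT` and `DXGI_FORMAT_R32_UINT` values that an
application ported from Direct3D 12 might already store.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the index types of the target API. */
typedef enum {
    EXAMPLE_INDEX_TYPE_UINT16,
    EXAMPLE_INDEX_TYPE_UINT32
} ExampleIndexType;

/* Looks up the raw value read from the indirect buffer in the parallel
 * arrays. Returns false if the value has no mapping. */
static bool example_map_index_type(const ExampleIndexType *indexTypes,
                                   const uint32_t *indexTypeValues,
                                   uint32_t indexTypeCount,
                                   uint32_t rawValue,
                                   ExampleIndexType *outType)
{
    for (uint32_t i = 0; i < indexTypeCount; ++i) {
        if (indexTypeValues[i] == rawValue) {
            *outType = indexTypes[i];
            return true;
        }
    }
    return false;
}
```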
16) Do we need more shader group state overrides?

The NVX version allowed all PSO states to be different, however as the goal
is not to replace all state setup, but to focus on high-frequency state
changes for drawing lots of objects, we reduced the amount of state
overrides.
Especially VkPipelineLayout as well as VkRenderPass configuration should be
left static, the rest is still open for discussion.

The current focus is just to allow VertexInput changes as well as shaders,
while all shader groups use the same shader stages.

Too much flexibility will increase the test coverage requirement as well.
However, further extensions could allow more dynamic state as well.

17) Do we need more detailed physical device feature queries/enables?

An EXT version would need detailed implementor feedback to come up with a
good set of features.
Please contact us if you are interested, we are happy to make more features
optional, or add further restrictions to reduce the minimum feature set of
an EXT.

18) Is there an interaction with VK_KHR_pipeline_library planned?

Yes, a future version of this extension will detail the interaction, once
VK_KHR_pipeline_library is no longer provisional.

=== Example Code

Open-Source samples illustrating the usage of the extension can be found at
the following location (may not yet exist at time of writing):

https://github.com/nvpro-samples/vk_device_generated_cmds


=== Version History

  * Revision 1, 2020-02-20 (Christoph Kubisch)
  ** Initial version
  * Revision 2, 2020-03-09 (Christoph Kubisch)
  ** Remove VK_EXT_debug_report interactions
  * Revision 3, 2020-03-09 (Christoph Kubisch)
  ** Fix naming VkPhysicalDeviceGenerated to VkPhysicalDeviceDeviceGenerated