Name

    NV_command_list

Name Strings

    GL_NV_command_list

Contact

    Pierre Boudier, NVIDIA (pboudier 'at' nvidia.com)
    Christoph Kubisch, NVIDIA (ckubisch 'at' nvidia.com)
    Tristan Lorach, NVIDIA (tlorach 'at' nvidia.com)

Contributors

    Jeff Bolz, NVIDIA
    Corentin Wallez, NVIDIA
    Markus Tavenrath, NVIDIA
    Mark Kilgard, NVIDIA
    Joseph Emmons, NVIDIA
    Thomas Ludwig, MAXON

Status

    Shipping with NVIDIA driver release 347.88 (March 2015)

Version

    Last Modified Date:     November 3, 2015
    Revision:               6

Number

    OpenGL Extension #477

Dependencies

    This extension interacts with NV_vertex_buffer_unified_memory.

    This extension interacts with NV_uniform_buffer_unified_memory.

    This extension interacts with NV_parameter_buffer_object.

    This extension interacts with ARB_robust_buffer_access_behavior.

    This extension interacts with NV_bindless_texture and
    ARB_bindless_texture.

    This extension interacts with NV_shader_buffer_load.

    This extension interacts with ARB_shader_draw_parameters.

    The extension is written against the OpenGL 4.4 Specification,
    Compatibility Profile.

Overview

    This extension adds a few new features designed to provide very low
    overhead batching and replay of rendering commands and state changes:

    - A state object, which stores a pre-validated representation of the
      state of (almost) the entire pipeline.

    - A more flexible and extensible MultiDrawIndirect (MDI) type of
      mechanism, using a token-based command stream that can set up
      binding state and emit draw calls.

    - A set of functions to execute a list of the token-based command
      streams with state object changes interleaved with the streams.

    - Command lists enabling compilation and reuse of sequences of command
      streams and state object changes.

    Because state objects reflect the state of the entire pipeline, it is
    expected that they can be pre-validated and executed efficiently.
    It is also expected that when state objects are combined into a command
    list, the command list can diff consecutive state objects to produce a
    reduced/optimized set of state changes specific to that transition.

    The token-based command stream can also be stored in regular buffer
    objects and therefore be modified by the server itself. This allows
    more complex work creation than the original MDI approach, which was
    limited to emitting draw calls only.

New Procedures and Functions

    void CreateStatesNV(sizei n, uint *states);
    void DeleteStatesNV(sizei n, const uint *states);
    boolean IsStateNV(uint state);
    void StateCaptureNV(uint state, enum mode);

    uint GetCommandHeaderNV(enum tokenID, uint size);
    ushort GetStageIndexNV(enum shadertype);

    void DrawCommandsNV(enum primitiveMode, uint buffer,
                        const intptr* indirects, const sizei* sizes,
                        uint count);
    void DrawCommandsAddressNV(enum primitiveMode,
                        const uint64* indirects, const sizei* sizes,
                        uint count);
    void DrawCommandsStatesNV(uint buffer,
                        const intptr* indirects, const sizei* sizes,
                        const uint* states, const uint* fbos, uint count);
    void DrawCommandsStatesAddressNV(
                        const uint64* indirects, const sizei* sizes,
                        const uint* states, const uint* fbos, uint count);

    void CreateCommandListsNV(sizei n, uint *lists);
    void DeleteCommandListsNV(sizei n, const uint *lists);
    boolean IsCommandListNV(uint list);
    void ListDrawCommandsStatesClientNV(uint list, uint segment,
                        const void** indirects, const sizei* sizes,
                        const uint* states, const uint* fbos, uint count);
    void CommandListSegmentsNV(uint list, uint segments);
    void CompileCommandListNV(uint list);
    void CallCommandListNV(uint list);

New Tokens

    Used in DrawCommandsStates buffer formats, in GetCommandHeaderNV to
    return the header:

        TERMINATE_SEQUENCE_COMMAND_NV           0x0000
        NOP_COMMAND_NV                          0x0001
        DRAW_ELEMENTS_COMMAND_NV                0x0002
        DRAW_ARRAYS_COMMAND_NV                  0x0003
        DRAW_ELEMENTS_STRIP_COMMAND_NV          0x0004
        DRAW_ARRAYS_STRIP_COMMAND_NV            0x0005
        DRAW_ELEMENTS_INSTANCED_COMMAND_NV      0x0006
        DRAW_ARRAYS_INSTANCED_COMMAND_NV        0x0007
        ELEMENT_ADDRESS_COMMAND_NV              0x0008
        ATTRIBUTE_ADDRESS_COMMAND_NV            0x0009
        UNIFORM_ADDRESS_COMMAND_NV              0x000a
        BLEND_COLOR_COMMAND_NV                  0x000b
        STENCIL_REF_COMMAND_NV                  0x000c
        LINE_WIDTH_COMMAND_NV                   0x000d
        POLYGON_OFFSET_COMMAND_NV               0x000e
        ALPHA_REF_COMMAND_NV                    0x000f
        VIEWPORT_COMMAND_NV                     0x0010
        SCISSOR_COMMAND_NV                      0x0011
        FRONT_FACE_COMMAND_NV                   0x0012

Additions to Chapter 5 of the OpenGL 4.4 (Compatibility) Specification
(Shared Objects and Multiple Contexts)

    Add state objects and command lists to the set of objects that cannot
    be shared between contexts.

Additions to Chapter 7 of the OpenGL 4.4 (Compatibility) Specification
(Programs and Shaders)

    Modify Section 7.12.2, Shader Memory Access Synchronization

    (modify list of barrier bits)

    * COMMAND_BARRIER_BIT: Command data sourced from buffer objects by
      Draw*Indirect, DispatchComputeIndirect and DrawCommands*NV commands
      after the barrier will reflect data written by shaders prior to the
      barrier. The buffer objects affected by this bit are derived from
      the DRAW_INDIRECT_BUFFER and DISPATCH_INDIRECT_BUFFER bindings, or
      from the arguments passed to DrawCommands*NV.

Additions to Chapter 10 of the OpenGL 4.4 (Compatibility) Specification
(Drawing Commands)

    Add a new Section 10.X (Indirect Draw Commands With State Changes)

    Add a new subsection 10.X.1 (State Objects)

    The current state of the rendering pipeline can be captured into a
    state object for later reuse with a new set of drawing commands.

    The name space for state objects is the unsigned integers, with zero
    reserved.

    The command:

      void CreateStatesNV(sizei n, uint *states);

    returns <n> previously unused state object names in <states>, and
    creates a state object in the initial state for each name.

    State objects are deleted by calling

      void DeleteStatesNV(sizei n, const uint *states);

    <states> contains <n> names of state objects to be deleted. Once a
    state object is deleted it has no contents and its name is again unused.
    Unused names in <states> are silently ignored, as is the value zero.

    All the states that can be set via DrawCommandsStatesNV (as defined in
    Section 10.X.2) are excluded from the captured state and will be
    inherited from the most recent commands or GL context state. Binding
    state is, however, never inherited from GL context, only from commands.

    The command

      void StateCaptureNV(uint state, enum basicmode);

    captures the current state of the rendering pipeline into the object
    indicated by <state>. <basicmode> indicates the basic Begin mode that
    this state object must be used with; see Table 10.X.1.2 for
    compatibility between primitive modes and basic modes.

    Table 10.X.1.2 (Primitive mode compatibility)

        basic primitive mode    | compatible primitive mode
        ---------------------------------------------------------------
        POINTS                  | POINTS
        LINES                   | LINES
                                | LINE_STRIP
                                | LINE_LOOP
        TRIANGLES               | TRIANGLES
                                | TRIANGLE_STRIP
                                | TRIANGLE_FAN
        QUADS                   | QUADS
                                | QUAD_STRIP
        PATCHES                 | PATCHES
        LINES_ADJACENCY         | LINES_ADJACENCY
                                | LINE_STRIP_ADJACENCY
        TRIANGLES_ADJACENCY     | TRIANGLES_ADJACENCY
                                | TRIANGLE_STRIP_ADJACENCY

    This rendering state includes:

    - Vertex attribute enable state, formats, types, relative offsets and
      strides.
    - Primitive state such as primitive restart and patch parameters,
      provoking vertex.
    - Immediate vertex attribute values as provided by glVertexAttrib* or
      glVertexAttribI*
    - All active program binaries except compute (either from the active
      program pipeline or from UseProgram) with their current subroutine
      configuration.
    - Rasterization, multisample fragment operation, depth, stencil, and
      blending state.
    - Rasterization state such as stippling and polygon modes and offsets.
    - Viewport, scissor, and depth range state.
    - Framebuffer attachment configuration: attachment state including
      attachment formats, drawbuffer state, and target/layer information,
      but not including actual attachments or sizes of attachments (these
      are stored separately).
    - Framebuffer attachment textures (but not their residency state).

    It does NOT include:

    - Bound vertex buffers or vertex unified addresses, or their offsets,
      or bound index buffers/addresses.
    - Other program-related bindings, such as shader storage buffers,
      atomic counter buffers, texture and sampler bindings.
    - Default-block uniform values from active programs
    - Blending constant color, front and back stencil reference values,
      alpha test threshold.
    - Polygon offset values.
    - Viewport and scissor rectangle for viewport index zero.

    Essentially all state that can be manipulated by the commands listed
    in 10.X.2 (Drawing with Commands) is excluded from the state capture.

    INVALID_ENUM is generated if <basicmode> is not a basic primitive mode,
    as listed in Table 10.X.1.2.

    INVALID_OPERATION is generated if the default framebuffer is bound as
    either draw or read buffer.

    INVALID_OPERATION is generated if transform feedback is enabled.

    INVALID_OPERATION is generated if occlusion query is enabled.

    INVALID_OPERATION is generated if the current active program or program
    pipeline makes use of SHADER_STORAGE_BUFFER, ATOMIC_COUNTER_BUFFER or
    has uniforms defined in the default uniform-block, or uniforms
    inheriting from fixed function state (gl_ModelView etc.).

    INVALID_OPERATION is generated if the current active program or program
    pipeline uses uniform blocks that did not have the "commandBindableNV"
    flag set (see "Modifications to the OpenGL Shading Language
    Specification" section).

    INVALID_OPERATION is generated if neither program, nor program pipeline
    objects are actively used.
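    Table 10.X.1.2 can be read as a many-to-one mapping from primitive
    modes to basic modes. The following is a minimal sketch of that
    mapping in C, not part of the extension: the helper names
    basic_mode_for and mode_is_compatible are hypothetical, and the enum
    values are the standard OpenGL primitive mode constants.

```c
#include <assert.h>
#include <stdbool.h>

/* Standard OpenGL primitive mode values. */
enum {
    GL_POINTS = 0x0000, GL_LINES = 0x0001, GL_LINE_LOOP = 0x0002,
    GL_LINE_STRIP = 0x0003, GL_TRIANGLES = 0x0004,
    GL_TRIANGLE_STRIP = 0x0005, GL_TRIANGLE_FAN = 0x0006,
    GL_QUADS = 0x0007, GL_QUAD_STRIP = 0x0008,
    GL_LINES_ADJACENCY = 0x000A, GL_LINE_STRIP_ADJACENCY = 0x000B,
    GL_TRIANGLES_ADJACENCY = 0x000C, GL_TRIANGLE_STRIP_ADJACENCY = 0x000D,
    GL_PATCHES = 0x000E
};

/* Hypothetical helper: returns the basic mode (left column of
 * Table 10.X.1.2) for a primitive mode, or -1 if the value is not
 * listed in the table. */
static int basic_mode_for(int mode)
{
    switch (mode) {
    case GL_POINTS:                   return GL_POINTS;
    case GL_LINES:
    case GL_LINE_STRIP:
    case GL_LINE_LOOP:                return GL_LINES;
    case GL_TRIANGLES:
    case GL_TRIANGLE_STRIP:
    case GL_TRIANGLE_FAN:             return GL_TRIANGLES;
    case GL_QUADS:
    case GL_QUAD_STRIP:               return GL_QUADS;
    case GL_PATCHES:                  return GL_PATCHES;
    case GL_LINES_ADJACENCY:
    case GL_LINE_STRIP_ADJACENCY:     return GL_LINES_ADJACENCY;
    case GL_TRIANGLES_ADJACENCY:
    case GL_TRIANGLE_STRIP_ADJACENCY: return GL_TRIANGLES_ADJACENCY;
    default:                          return -1;
    }
}

/* Hypothetical helper: true if <mode> may be drawn with a state object
 * captured with <basic>. */
static bool mode_is_compatible(int basic, int mode)
{
    int b = basic_mode_for(mode);
    /* Invariant of the table: every basic mode maps to itself. */
    assert(b == -1 || basic_mode_for(b) == b);
    return b != -1 && b == basic;
}
```

    With this mapping, a state object captured with LINES accepts
    LINE_LOOP draws, while passing LINE_STRIP as the basic mode is
    rejected because it is not a basic mode.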
    Add a new subsection 10.X.2 (Drawing with Commands)

      void DrawCommandsNV(enum mode, uint buffer,
                          const intptr* indirects, const sizei* sizes,
                          uint count);

      void DrawCommandsAddressNV(enum mode,
                          const uint64* indirects, const sizei* sizes,
                          uint count);

    These commands accept arrays of buffer addresses (either an array of
    offsets <indirects> into a buffer named by <buffer>, or an array of GPU
    addresses <indirects>), and an array of sequence lengths in <sizes>.
    All arrays have <count> entries.

    The current binding state of vertex, element and uniform buffers is
    not used; these bindings must be set via commands within the buffer.
    Other state is inherited from the current OpenGL context.

    INVALID_ENUM is generated if <mode> is not an accepted value.

    INVALID_VALUE is generated if <buffer> is not a valid buffer object.

    INVALID_OPERATION is generated if a geometry shader is active and
    <mode> is incompatible with the input primitive type of the geometry
    shader in the currently installed program object.

    INVALID_OPERATION is generated if the default (zero) frame buffer
    object is currently bound as DRAW_FRAMEBUFFER; a non-zero frame buffer
    object is required.

    DrawCommandsNV and DrawCommandsAddressNV are equivalent to:

      Save current GL state;
      enum indexType = UNSIGNED_SHORT;
      for (uint i = 0; i < count; i++) {
        uint64 address = address computed from <buffer>+<indirects>[i];
        indexType = DrawCommandSequenceNV(<mode>, indexType, address, sizes[i]);
      }
      Restore current GL state;

    The command:

      enum DrawCommandSequenceNV(enum mode, enum indexType,
                                 void *address, sizei size);

    does not exist in the GL, but is used to describe functionality in the
    rest of this section. DrawCommandSequenceNV is a flexible and
    extensible command that executes simple state changes and draw
    commands based on a tokenized format. The loop above illustrates that
    the state changes from one invocation will influence the next. All
    rendering is performed as if the client states for
    VERTEX_ATTRIB_ARRAY_UNIFIED_NV, ELEMENT_ARRAY_UNIFIED_NV and
    UNIFORM_BUFFER_UNIFIED_NV are enabled.
    It is defined by the following pseudo code, tokens, and structures:

    Table 10.X.2 (Token values and command structure names)

        tokenID                              | Command
        ---------------------------------------------------------------------
        TERMINATE_SEQUENCE_COMMAND_NV        | TerminateSequenceCommandNV
        NOP_COMMAND_NV                       | NOPCommandNV
        DRAW_ELEMENTS_COMMAND_NV             | DrawElementsCommandNV
        DRAW_ARRAYS_COMMAND_NV               | DrawArraysCommandNV
        DRAW_ELEMENTS_STRIP_COMMAND_NV       | DrawElementsCommandNV
        DRAW_ARRAYS_STRIP_COMMAND_NV         | DrawArraysCommandNV
        DRAW_ELEMENTS_INSTANCED_COMMAND_NV   | DrawElementsInstancedCommandNV
        DRAW_ARRAYS_INSTANCED_COMMAND_NV     | DrawArraysInstancedCommandNV
        ELEMENT_ADDRESS_COMMAND_NV           | ElementAddressCommandNV
        ATTRIBUTE_ADDRESS_COMMAND_NV         | AttributeAddressCommandNV
        UNIFORM_ADDRESS_COMMAND_NV           | UniformAddressCommandNV
        BLEND_COLOR_COMMAND_NV               | BlendColorCommandNV
        STENCIL_REF_COMMAND_NV               | StencilRefCommandNV
        LINE_WIDTH_COMMAND_NV                | LineWidthCommandNV
        POLYGON_OFFSET_COMMAND_NV            | PolygonOffsetCommandNV
        ALPHA_REF_COMMAND_NV                 | AlphaRefCommandNV
        VIEWPORT_COMMAND_NV                  | ViewportCommandNV
        SCISSOR_COMMAND_NV                   | ScissorCommandNV
        FRONT_FACE_COMMAND_NV                | FrontFaceCommandNV

    Tight packing is used for all structures.

      typedef struct {
        uint  header;
      } TerminateSequenceCommandNV;

      typedef struct {
        uint  header;
      } NOPCommandNV;

      typedef struct {
        uint  header;
        uint  count;
        uint  firstIndex;
        uint  baseVertex;
      } DrawElementsCommandNV;

      typedef struct {
        uint  header;
        uint  count;
        uint  first;
      } DrawArraysCommandNV;

      typedef struct {
        uint  header;
        uint  mode;
        uint  count;
        uint  instanceCount;
        uint  firstIndex;
        uint  baseVertex;
        uint  baseInstance;
      } DrawElementsInstancedCommandNV;

      typedef struct {
        uint  header;
        uint  mode;
        uint  count;
        uint  instanceCount;
        uint  first;
        uint  baseInstance;
      } DrawArraysInstancedCommandNV;

      typedef struct {
        uint  header;
        uint  addressLo;
        uint  addressHi;
        uint  typeSizeInByte;
      } ElementAddressCommandNV;

      typedef struct {
        uint  header;
        uint  index;
        uint  addressLo;
        uint  addressHi;
      } AttributeAddressCommandNV;

      typedef struct {
        uint    header;
        ushort  index;
        ushort  stage;
        uint    addressLo;
        uint    addressHi;
      } UniformAddressCommandNV;

      typedef struct {
        uint  header;
        float red;
        float green;
        float blue;
        float alpha;
      } BlendColorCommandNV;

      typedef struct {
        uint  header;
        uint  frontStencilRef;
        uint  backStencilRef;
      } StencilRefCommandNV;

      typedef struct {
        uint  header;
        float lineWidth;
      } LineWidthCommandNV;

      typedef struct {
        uint  header;
        float scale;
        float bias;
      } PolygonOffsetCommandNV;

      typedef struct {
        uint  header;
        float alphaRef;
      } AlphaRefCommandNV;

      typedef struct {
        uint  header;
        uint  x;
        uint  y;
        uint  width;
        uint  height;
      } ViewportCommandNV; // only ViewportIndex 0

      typedef struct {
        uint  header;
        uint  x;
        uint  y;
        uint  width;
        uint  height;
      } ScissorCommandNV; // only ViewportIndex 0

      typedef struct {
        uint  header;
        uint  frontFace; // 0 for CW, 1 for CCW
      } FrontFaceCommandNV;

      enum DrawCommandSequenceNV(enum mode, enum indexType,
                                 void *address, sizei size)
      {
        enum modeStrip;
        if      (mode == TRIANGLES)           modeStrip = TRIANGLE_STRIP;
        else if (mode == LINES)               modeStrip = LINE_STRIP;
        else if (mode == LINES_ADJACENCY)     modeStrip = LINE_STRIP_ADJACENCY;
        else if (mode == TRIANGLES_ADJACENCY) modeStrip = TRIANGLE_STRIP_ADJACENCY;
        else if (mode == QUADS)               modeStrip = QUAD_STRIP;
        else                                  modeStrip = mode;

        enum modeSpecial;
        if      (mode == LINES)               modeSpecial = LINE_LOOP;
        else if (mode == TRIANGLES)           modeSpecial = TRIANGLE_FAN;
        else                                  modeSpecial = mode;

        void *current = address;
        while (current != (ubyte *)address + size) {
          uint header = *(uint*)current;

          switch (GetTokenType(header)) {
          case TERMINATE_SEQUENCE_COMMAND_NV:
            {
              return indexType;
            }
            break;
          case NOP_COMMAND_NV:
            break;
          case DRAW_ELEMENTS_COMMAND_NV:
            {
              DrawElementsCommandNV* cmd = (DrawElementsCommandNV*)current;
              DrawElementsBaseVertex(mode, cmd->count, indexType,
                (void*)(cmd->firstIndex * sizeofindextype),
                cmd->baseVertex);
            }
            break;
          case DRAW_ARRAYS_COMMAND_NV:
            {
              DrawArraysCommandNV* cmd = (DrawArraysCommandNV*)current;
              DrawArrays(mode, cmd->first, cmd->count);
            }
            break;
          case DRAW_ELEMENTS_STRIP_COMMAND_NV:
            {
              DrawElementsCommandNV* cmd = (DrawElementsCommandNV*)current;
              DrawElementsBaseVertex(modeStrip, cmd->count, indexType,
                (void*)(cmd->firstIndex * sizeofindextype),
                cmd->baseVertex);
            }
            break;
          case DRAW_ARRAYS_STRIP_COMMAND_NV:
            {
              DrawArraysCommandNV* cmd = (DrawArraysCommandNV*)current;
              DrawArrays(modeStrip, cmd->first, cmd->count);
            }
            break;
          case DRAW_ELEMENTS_INSTANCED_COMMAND_NV:
            {
              DrawElementsInstancedCommandNV* cmd =
                (DrawElementsInstancedCommandNV*)current;
              // undefined behavior if (cmd->mode != mode &&
              //   cmd->mode != modeStrip && cmd->mode != modeSpecial)
              DrawElementsIndirect(cmd->mode, indexType, &cmd->count);
            }
            break;
          case DRAW_ARRAYS_INSTANCED_COMMAND_NV:
            {
              DrawArraysInstancedCommandNV* cmd =
                (DrawArraysInstancedCommandNV*)current;
              // undefined behavior if (cmd->mode != mode &&
              //   cmd->mode != modeStrip && cmd->mode != modeSpecial)
              DrawArraysIndirect(cmd->mode, &cmd->count);
            }
            break;
          case ELEMENT_ADDRESS_COMMAND_NV:
            {
              ElementAddressCommandNV* cmd = (ElementAddressCommandNV*)current;
              switch (cmd->typeSizeInByte) {
                case 1: indexType = UNSIGNED_BYTE;  break;
                case 2: indexType = UNSIGNED_SHORT; break;
                case 4: indexType = UNSIGNED_INT;   break;
              }
              BufferAddressRangeNV(ELEMENT_ARRAY_ADDRESS_NV, 0,
                uint64(cmd->addressLo) | (uint64(cmd->addressHi)<<32),
                0x7FFFFFFF);
            }
            break;
          case ATTRIBUTE_ADDRESS_COMMAND_NV:
            {
              AttributeAddressCommandNV* cmd = (AttributeAddressCommandNV*)current;
              BufferAddressRangeNV(VERTEX_ATTRIB_ARRAY_ADDRESS_NV, cmd->index,
                uint64(cmd->addressLo) | (uint64(cmd->addressHi)<<32),
                0x7FFFFFFF);
            }
            break;
          case UNIFORM_ADDRESS_COMMAND_NV:
            {
              UniformAddressCommandNV* cmd = (UniformAddressCommandNV*)current;
              BufferAddressRangeNV(UNIFORM_BUFFER_ADDRESS_NV, cmd->index,
                uint64(cmd->addressLo) | (uint64(cmd->addressHi)<<32),
                0x10000);
            }
            break;
          case BLEND_COLOR_COMMAND_NV:
            {
              BlendColorCommandNV* cmd = (BlendColorCommandNV*)current;
              BlendColor(cmd->red, cmd->green, cmd->blue, cmd->alpha);
            }
            break;
          case STENCIL_REF_COMMAND_NV:
            {
              StencilRefCommandNV* cmd = (StencilRefCommandNV*)current;
              StencilFuncSeparate(FRONT, asIs, cmd->frontStencilRef, asIs);
              StencilFuncSeparate(BACK,  asIs, cmd->backStencilRef,  asIs);
            }
            break;
          case LINE_WIDTH_COMMAND_NV:
            {
              LineWidthCommandNV* cmd = (LineWidthCommandNV*)current;
              LineWidth(cmd->lineWidth);
            }
            break;
          case POLYGON_OFFSET_COMMAND_NV:
            {
              PolygonOffsetCommandNV* cmd = (PolygonOffsetCommandNV*)current;
              PolygonOffset(cmd->scale, cmd->bias);
            }
            break;
          case ALPHA_REF_COMMAND_NV:
            {
              AlphaRefCommandNV* cmd = (AlphaRefCommandNV*)current;
              AlphaFunc(asIs, cmd->alphaRef);
            }
            break;
          case VIEWPORT_COMMAND_NV:
            {
              ViewportCommandNV* cmd = (ViewportCommandNV*)current;
              Viewport(cmd->x, cmd->y, cmd->width, cmd->height);
            }
            break;
          case SCISSOR_COMMAND_NV:
            {
              ScissorCommandNV* cmd = (ScissorCommandNV*)current;
              Scissor(cmd->x, cmd->y, cmd->width, cmd->height);
            }
            break;
          case FRONT_FACE_COMMAND_NV:
            {
              FrontFaceCommandNV* cmd = (FrontFaceCommandNV*)current;
              // matches the structure definition: 0 for CW, 1 for CCW
              FrontFace(cmd->frontFace ? CCW : CW);
            }
            break;
          }
          current = (ubyte *)current + GetTokenSize(header);
        }
        return indexType;
      }

    The commands called by DrawCommandSequenceNV might not generate their
    usual errors. Providing erroneous data as parameters, or generating
    state that would normally create errors when executed by the server,
    can produce undefined results and may cause program termination.
    The residency of all resources referenced directly (buffer addresses
    inside tokens) or indirectly (texture handles inside uniform buffer
    objects) must be managed explicitly.

    (XXX should we add something similar to CheckFramebufferStatus? for
    debugging, that tests the content in software and throws error + offset
    into buffer triggering the error)

    All BufferAddressRangeNV calls issued by DrawCommandSequenceNV take
    effect regardless of whether the corresponding client state is enabled.

      uint GetCommandHeaderNV(enum tokenID, uint size)

    Returns the encoded 32bit header value for a given command; the
    returned value is implementation specific.
    The <size> is only provided as a basic consistency check, since the
    size of each structure is fixed and no padding is allowed. The value
    is the sum of the sizes of the header and the command-specific
    structure.

    INVALID_ENUM is generated if <tokenID> is not one of the values listed
    under Table 10.X.2.

    INVALID_VALUE is generated if <size> does not match the fixed size of
    a command defined by the spec.

      ushort GetStageIndexNV(enum shadertype)

    Returns the 16bit value for a specific shader stage; the returned
    value is implementation specific. The value is to be used with the
    stage field within UniformAddressCommandNV tokens.

    Add a new subsection 10.X.3 (Drawing with Commands and State Objects)

    State objects may be used in rendering with the commands:

      void DrawCommandsStatesNV(uint buffer,
                          const intptr* indirects, const sizei* sizes,
                          const uint* states, const uint* fbos, uint count);

      void DrawCommandsStatesAddressNV(
                          const uint64* indirects, const sizei* sizes,
                          const uint* states, const uint* fbos, uint count);

    These commands accept arrays of buffer addresses (either an array of
    offsets <indirects> into a buffer named by <buffer>, or an array of GPU
    addresses <indirects>), an array of sequence lengths in <sizes>, and an
    array of state object names in <states>, of which all names must be
    non-zero. Frame buffer object names are stored in <fbos> and can be
    either zero or non-zero. All arrays have <count> entries.
    The residency of textures used as attachment inside the state object's
    captured fbo or the passed fbo must be managed explicitly.

    INVALID_VALUE is generated if one entry of <states> is zero.

    INVALID_OPERATION is generated if the fbo configuration from <fbos>
    mismatches the configuration inside the corresponding state object from
    <states>.
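    The two error conditions above can be pictured as a pre-flight check
    over the parallel arrays. The following C sketch is an illustration
    under stated assumptions, not the driver's implementation: the
    FboConfig, Fbo and StateObj types, their fields, and check_draw_args
    are all hypothetical stand-ins for driver-internal data. The only
    behavior taken from this specification is that a state name of zero
    is an INVALID_VALUE error, that an fbo name of zero selects the
    state object's captured fbo, and that a non-zero fbo must match the
    captured configuration (formats and draw buffer setup, but not
    attachment sizes).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of the fbo "configuration" a state object
 * captures: formats and draw buffer setup, deliberately no sizes. */
typedef struct {
    unsigned colorFormats[4];
    unsigned depthStencilFormat;
    unsigned drawBufferCount;
} FboConfig;

typedef struct { unsigned name; FboConfig config;    } Fbo;      /* name 0: use captured fbo */
typedef struct { unsigned name; FboConfig fboConfig; } StateObj;

enum CheckResult { CHECK_OK, CHECK_INVALID_VALUE, CHECK_INVALID_OPERATION };

static int same_config(const FboConfig *a, const FboConfig *b)
{
    if (a->drawBufferCount != b->drawBufferCount) return 0;
    if (a->depthStencilFormat != b->depthStencilFormat) return 0;
    for (size_t i = 0; i < 4; i++)
        if (a->colorFormats[i] != b->colorFormats[i]) return 0;
    return 1;
}

/* Hypothetical pre-flight check mirroring the two errors listed for
 * DrawCommandsStatesNV; returns the first error found. */
static enum CheckResult check_draw_args(const StateObj *states,
                                        const Fbo *fbos, size_t count)
{
    assert(count == 0 || (states != NULL && fbos != NULL));
    for (size_t i = 0; i < count; i++) {
        if (states[i].name == 0)
            return CHECK_INVALID_VALUE;        /* zero state name */
        if (fbos[i].name != 0 &&
            !same_config(&fbos[i].config, &states[i].fboConfig))
            return CHECK_INVALID_OPERATION;    /* incompatible fbo */
    }
    return CHECK_OK;
}
```

    An application could run a check of this kind in debug builds before
    submitting, since errors inside the token stream itself are not
    guaranteed to be reported.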
    DrawCommandsStatesNV and DrawCommandsStatesAddressNV are equivalent to:

      Save current GL state;
      enum indexType = UNSIGNED_SHORT;
      for (uint i = 0; i < count; i++) {
        fbo = LookupFbo(fbos[i]);
        stateObject = LookupStateObject(states[i]);

        if (i == 0) {
          Set full state captured by stateObject;
        }
        else {
          Set difference of state going from <states>[i-1] to current
          stateObject;
        }

        if (fbo == 0) {
          BindFramebuffer(FRAMEBUFFER, stateObject.fbo.name);
        }
        else if (stateObject.fbo.configuration == fbo.configuration) {
          // The configuration excludes attachment textures and size
          // information, however includes attached texture formats and
          // other state (see StateCaptureNV).
          BindFramebuffer(FRAMEBUFFER, fbo.name);
        }
        else {
          // Only compatible fbo states can be used.
          generate ERROR INVALID_OPERATION;
          return;
        }

        enum mode = primitive mode from stateObject
        uint64 address = address computed from <buffer>+<indirects>[i];
        indexType = DrawCommandSequenceNV(mode, indexType, address, sizes[i]);
      }
      Restore current GL state;

    where LookupFbo and LookupStateObject return the driver's internal fbo
    and stateObject object, stateObject.fbo is the driver's fbo state
    object, and fbo.configuration and fbo.name are the current
    configuration of a fbo and the fbo's name respectively.

    Add a new section 10.X.4 (Command Lists)

    A list of DrawCommandsStates* commands may be compiled into a command
    list, for further optimization and efficient reuse.

    The name space for command lists is the unsigned integers, with zero
    reserved.

    The command:

      void CreateCommandListsNV(sizei n, uint *lists);

    returns <n> previously unused command list names in <lists>, and
    creates a command list in the initial state for each name.

    Command lists are deleted by calling

      void DeleteCommandListsNV(sizei n, const uint *lists);

    <lists> contains <n> names of command lists to be deleted. Once a
    command list is deleted it has no contents and its name is again
    unused. Unused names in <lists> are silently ignored, as is the value
    zero.
    The command

      void CommandListSegmentsNV(uint list, uint segments);

    indicates that <list> will have <segments> number of segments, each of
    which is a list of command sequences that it enqueues. This must be
    called before any commands are enqueued. In the initial state, a
    command list has a single segment.

    A command list's initial state allows it to enqueue commands, but not
    to be executed. The following command can be enqueued:

      void ListDrawCommandsStatesClientNV(uint list, uint segment,
                          const void** indirects, const sizei* sizes,
                          const uint* states, const uint* fbos, uint count);

    A list has multiple segments and each segment enqueues an ordered list
    of command sequences. This command enqueues the equivalent of the
    DrawCommandsStatesNV commands into the list indicated by <list> on the
    segment indicated by <segment>, except that the sequence data is copied
    from the sequences pointed to by the <indirects> pointer. The
    <indirects> pointer should point to a list of size <count> of pointers,
    each of which should point to a command sequence.

    The pre-validated state from <states> is saved into the command list,
    rather than a reference to the state object (i.e. the state objects or
    fbos could be deleted and the command list would be unaffected). This
    includes native GPU addresses for all textures indirectly referenced
    through the fbos passed or state objects' fbos attachments. Therefore
    a recompile of the command list is required if such referenced textures
    change their allocation (for example due to resizing), as well as
    explicit management of the residency of the textures prior to
    CallCommandListNV.

    ListDrawCommandsStatesClientNV performs a by-value copy of the
    indirect data based on the provided client-side pointers. In this case
    the content is fully immutable, while the buffer-based versions can
    change the content of the buffers at any later time.

    The command

      void CompileCommandListNV(uint list);

    makes the list indicated by <list> switch from allowing collection of
    commands to allowing its execution.
    At this time, the implementation may generate optimized commands to
    transition between states as efficiently as possible.

    Lists may be executed with the command

      void CallCommandListNV(uint list);

    This executes the command list indicated by <list>, which operates as
    if the DrawCommandsStates* commands were replayed in the order they
    were enqueued on each segment, starting from segment zero and
    proceeding to the maximum segment. All buffer or texture resources'
    residency must be managed explicitly, including texture attachments of
    the effective fbos during list enqueuing.

Modifications to the OpenGL Shading Language Specification, Version 4.40

    Including the following line in a shader can be used to control the
    language features described in this extension:

      #extension GL_NV_command_list : <behavior>

    where <behavior> is as specified in section 3.3.

    New preprocessor #defines are added to the OpenGL Shading Language:

      #define GL_NV_command_list 1

    Modify Section 4.4.5, "Uniform and Shader Storage Block Layout
    Qualifiers"

    (modify first paragraph, p.78)

    Layout qualifiers can be used for uniform and shader storage blocks,
    but not for non-block uniform declarations. The layout qualifier
    identifiers (and shared keyword) for uniform and shader storage blocks
    are

      layout-qualifier-id
        shared
        packed
        std140
        std430
        row_major
        column_major
        binding = integer-constant-expression
        offset = integer-constant-expression
        align = integer-constant-expression
        commandBindableNV

    (add paragraph prior to "When multiple arguments", p. 80)

    The commandBindableNV qualifier enables the associated uniform block
    to be updated via UniformAddressCommandNVs when executing
    DrawCommandsStatesNV. When commandBindableNV is enabled the binding
    identifier must be provided for each block; only its value will
    correspond with the index field of a UniformAddressCommandNV.
    A link-time error is generated if an index is greater than or equal to
    MAX_PROGRAM_PARAMETER_BUFFER_BINDINGS_NV.
    Changing the binding point by the OpenGL API may not influence this
    associated index value and may cause UniformAddressCommandNVs to have
    undefined behavior.

Dependencies on OpenGL 4.4 (Core Profile)

    If only the core profile of OpenGL 4.4 is supported, references to
    functionality deprecated by OpenGL 3.0 (built-in input/output/uniform
    variables corresponding to fixed-function vertex attributes,
    fixed-function vertex and fragment processing) should be removed
    and/or replaced with functionality supported in the core profile.

    In such an environment, the QUADS primitive type is not supported by
    the StateCaptureNV function. StateCaptureNV will also ignore all
    references to deprecated state such as line stippling.

    The ALPHA_REF_COMMAND_NV is not allowed to be used, therefore
    GetCommandHeaderNV will return an error if the token enum is passed.

Interactions with NV_shader_buffer_load

    The GPU addresses used in ELEMENT_ADDRESS_COMMAND_NV,
    ATTRIBUTE_ADDRESS_COMMAND_NV and UNIFORM_ADDRESS_COMMAND_NV can be
    queried via the API provided in this extension. Furthermore the same
    API must be used to ensure residency of such buffers when draw
    commands using such addresses are issued.

Interactions with NV_bindless_texture or ARB_bindless_texture

    Residency of fbo attachment textures referenced in state objects or
    command lists must be managed explicitly using the API provided by
    either of these extensions.

Interactions with NV_parameter_buffer_object

    The UNIFORM_ADDRESS_COMMAND_NV described in 10.X.2 (Drawing with
    Commands) will affect the PROGRAM_PARAMETER_BUFFER of the target stage
    defined within the command token.

Interactions with ARB_robust_buffer_access_behavior

    The buffer setups performed by ELEMENT_ADDRESS_COMMAND_NV,
    ATTRIBUTE_ADDRESS_COMMAND_NV and UNIFORM_ADDRESS_COMMAND_NV do not
    provide the required buffer ranges for robust buffer access. Therefore
    draw calls executed under this type of buffer setup will not respect
    the robust buffer access rules.
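    The address tokens named in the interaction sections above carry a
    64-bit GPU address as two 32-bit halves (addressLo, addressHi). The
    following C sketch shows one way an application might fill a
    UniformAddressCommandNV, assuming the header was obtained from
    GetCommandHeaderNV, the stage value from GetStageIndexNV, and the
    address from NV_shader_buffer_load (GetBufferParameterui64vNV with
    BUFFER_GPU_ADDRESS_NV, after MakeBufferResidentNV). The helper names
    make_uniform_address and uniform_address_of are hypothetical, and
    the fixed-width types stand in for the GL uint/ushort types.

```c
#include <assert.h>
#include <stdint.h>

/* Layout as given in Table 10.X.2, with GL types replaced by
 * fixed-width C types for a self-contained example. */
typedef struct {
    uint32_t header;
    uint16_t index;
    uint16_t stage;
    uint32_t addressLo;
    uint32_t addressHi;
} UniformAddressCommandNV;

/* Hypothetical helper: fills a token from values that the application
 * is assumed to have queried from the GL beforehand. */
static UniformAddressCommandNV make_uniform_address(uint32_t header,
                                                    uint16_t index,
                                                    uint16_t stage,
                                                    uint64_t gpuAddress)
{
    UniformAddressCommandNV cmd;
    /* Tight packing is required: 4 + 2 + 2 + 4 + 4 bytes. */
    assert(sizeof(UniformAddressCommandNV) == 16);
    cmd.header    = header;
    cmd.index     = index;   /* binding value of the commandBindableNV block */
    cmd.stage     = stage;
    cmd.addressLo = (uint32_t)(gpuAddress & 0xFFFFFFFFu);
    cmd.addressHi = (uint32_t)(gpuAddress >> 32);
    return cmd;
}

/* Recombines the halves the same way the DrawCommandSequenceNV pseudo
 * code does before calling BufferAddressRangeNV. */
static uint64_t uniform_address_of(const UniformAddressCommandNV *cmd)
{
    return (uint64_t)cmd->addressLo | ((uint64_t)cmd->addressHi << 32);
}
```

    The same split applies to ElementAddressCommandNV and
    AttributeAddressCommandNV, which differ only in their remaining
    fields.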
Interactions with ARB_shader_draw_parameters

    The drawing operations performed through this extension will not
    support setting of the built-in GLSL values that were added by
    ARB_shader_draw_parameters (gl_BaseInstanceARB, gl_BaseVertexARB,
    gl_DrawIDARB). Accessing these variables will result in undefined
    values.

Additions to the AGL/GLX/WGL Specifications

    None.

GLX Protocol

    None.

Errors

New State

    None.

Issues

    1) What motivates the design?

      The primary goal is to be able to reuse pre-validated command
      buffers. Other APIs and proposals have addressed this with various
      incarnations of command lists or state objects, but a recurring
      problem is that interactions between various stages of the pipeline
      prevent this prevalidation and reuse. These interactions are often
      hardware-specific (and differ from vendor to vendor or even
      generation to generation) and new interactions are introduced by new
      features that were not imagined when the prevalidation scheme was
      proposed.

      We attempt to address this by having a monolithic state object that
      encompasses (almost) the entire state of the pipeline. This should
      provide enough information for all implementations to do any needed
      cross-validation. We try to create these in a way that minimizes the
      new API footprint - since we want ALL state (including any added in
      the future), we just capture it from the current state of the
      context.

      We expect that a captured state object will be represented as a list
      of commands to send to the GPU. While that list of commands may be
      fairly large, it is also well-suited to filtering redundant changes
      when switching from one state object to another (filtering may occur
      on the GPU, or by some processing on the CPU). We anticipate that
      filtering will be applied when compiling a command list, but it is
      likely that some (perhaps less aggressive) filtering will also occur
      in unlisted DrawCommandsStates commands.

    2) Should binding state be captured?

      Binding state should not be captured, for multiple reasons.
      The memory management performed by the driver as part of legacy
      command execution is expensive and not well-suited for the
      prevalidation of commands. This can be replaced by explicit bindless
      memory management APIs (e.g. Make*Resident).

      Resource bindings also require behind-the-scenes management of
      internal GPU structures like texture handles. Again, this can be
      replaced by the bindless APIs.

    3) What FBO state should be captured?

      We definitely want to capture enough information to be able to do
      any state-based recompiles of the fragment shader, which would
      include drawbuffer state and format state. However, it is not
      desirable to have all properties of the FBO be captured, e.g. if
      attachment width/height were captured then state objects could
      become invalid if the window shape changed.

      RESOLVED: state objects reference the FBO configuration, but passing
      other compatible FBOs during rendering is possible. Furthermore the
      VIEWPORT_COMMAND_NV allows setting the appropriate viewport state.

    4) Can UBOs be accessed? How?

      RESOLVED: We want to encourage the "first level of the scene graph"
      information read by shaders to be accessed with fast UBO memory
      accesses. UNIFORM_ADDRESS_COMMAND_NV provides this mechanism.

    5) What about Compute?

      Compute does not have the same complex state interactions that the
      graphics pipeline has, so it is not included in this extension.

    6) What dynamic state should be allowed?

      There are some state values which are pretty much raw
      integer/floating point data, where requiring a unique state object
      for each value would drastically bloat the number of state objects
      needed and break batching. We allow for a few such values to be set
      in the token command buffer rather than in the state object. The
      current list is motivated by similar state in other APIs, and may
      not be complete.

    7) What are the "segments" in command lists?

      These are multiple "starting points" for appending commands to the
      list, which are ultimately replayed in order by segments.
        This may be useful to build a multipass rendering algorithm with
        only a single traversal of the scene graph.

    8) When are state objects consumed into the list?

        This could either occur as the command is appended to the list, or
        during CompileCommandListNV.

        RESOLVED: At ListDrawCommandsStatesClientNV time.

    9) Do we want to have multiple modes in the same dispatch?

        RESOLVED: yes, state objects with different modes can be used,
        allowing fast transitioning between those. Furthermore, it is
        possible to mix LINES/LINE_STRIP/LINE_LOOP or
        TRIANGLES/TRIANGLE_STRIP/TRIANGLE_FAN and others using the same
        state object, as long as their base primitive mode is the same.

    10) Do we want to allow mixing DrawArrays and DrawElements in the same
        dispatch?

        RESOLVED: yes.

    11) What happens if the token buffer is modified while it is being
        dispatched?

        RESOLVED: there is no guarantee of coherency, so the behavior is
        undefined.

    12) I would like to change states in the middle; how do I do this?

        RESOLVED: you can select a new state object or state tokens, but
        you cannot change state in the indirect buffer itself.

    13) Is the token buffer multithread safe; does it scale?

        RESOLVED: yes. It is trivial to allocate a token buffer per thread,
        and then submit them in the main thread sequentially. Since the
        implementation is not involved when the application writes to them,
        the only thread safety requirements are in the application itself.

        Command lists and state objects are, however, currently not
        shareable between contexts. As rendering is much more efficient
        now, the main dispatching thread can spend the time on preparing
        state objects prior to drawing. The cost of glStateCaptureNV is no
        worse than a classic API draw call, and because of temporal
        coherence few states are expected to be "new" from frame to frame;
        cached states can be reused instead.

    14) Can I reuse a token buffer multiple times?

        RESOLVED: yes.

    15) Should we use a fixed length decoding or at the very least a size
        in the header?
        RESOLVED: fixed length is used. As a basic consistency check, the
        size is also passed to header generation. The NOP command can be
        used to pad structures to custom sizes.

    16) Can I do buffer updates in a single DrawCommands call?

        RESOLVED: NO. Updating memory in general requires synchronization,
        and having lots of updates inside a single DrawCommands would
        become a performance bottleneck.

    17) I want to implement some occlusion scheme and skip some of the
        draws; how do I do this?

        RESOLVED: this extension does not offer a conditional render
        facility, but this can be implemented by using NOP or preferably
        TERMINATE_SEQUENCE commands in the stream.

    18) I want to implement some level of detail scheme; is that possible?

        RESOLVED: you can use NOP or TERMINATE_SEQUENCE to skip the levels
        of detail that you don't want to draw.

    19) Why can't I just get a token to change the state, and avoid
        specifying lists of state and indirect buffers?

        RESOLVED: Getting a token to specify a state switch implies that
        the application would have access to a virtual address of state
        changes. This would potentially open a security issue, since part
        of the validation may involve a complex sequence of programming.

    20) Instead of void** which means all commands must be stored in one
        buffer, could GLuint64** be used when
        EnableClientState(DRAW_INDIRECT_UNIFIED_NV) is set? This would
        allow managing different command buffers independently.

        RESOLVED: a separate Address command was added.

    21) How big can each indirect command list's buffer size be?

        RESOLVED: no limit required.

    22) How to retrieve the "index" within UniformAddressCommandNV, or is
        that the GL binding point?

        RESOLVED: added commandBindableNV layout qualifier in GLSL for
        uniform blocks to ensure fixed binding unit. Also added stage value
        to command.

    23) In what condition is state that was modified by tokens left after
        the dispatch call?

        RESOLVED: state is reset.
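As a non-normative illustration of issues 15, 17, and 18, the following C sketch shows one possible way an application could skip draws by patching a token stream: a culled draw token is overwritten with header-only NOP tokens, or the rest of the stream is cut off with a TERMINATE_SEQUENCE token. The sketch runs entirely on the CPU and rests on several assumptions that are not part of this extension: mock_header is a hypothetical stand-in for GetCommandHeaderNV (the real header encoding is implementation-defined and must be queried from the driver), the NopCommand and DrawArraysCommand layouts are assumed, and count_draws is a mock decoder used only to observe the effect of the patching.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Token IDs as listed under "New Tokens". */
enum {
    TERMINATE_SEQUENCE_COMMAND_NV = 0x0000,
    NOP_COMMAND_NV                = 0x0001,
    DRAW_ARRAYS_COMMAND_NV        = 0x0003
};

/* ASSUMPTION: hypothetical stand-in for GetCommandHeaderNV(tokenID, size).
   Real headers are opaque and must be queried from the implementation. */
static uint32_t mock_header(uint32_t token_id, uint32_t size)
{
    return (size << 16) | token_id;
}

/* ASSUMPTION: layouts chosen for illustration (header first, tightly packed). */
typedef struct { uint32_t header; } NopCommand;
typedef struct { uint32_t header; uint32_t count; uint32_t first; } DrawArraysCommand;

/* Append one DrawArrays token; returns the new write offset in bytes. */
static size_t push_draw_arrays(uint8_t *stream, size_t offset,
                               uint32_t count, uint32_t first)
{
    DrawArraysCommand cmd;
    cmd.header = mock_header(DRAW_ARRAYS_COMMAND_NV, (uint32_t)sizeof cmd);
    cmd.count  = count;
    cmd.first  = first;
    memcpy(stream + offset, &cmd, sizeof cmd);
    return offset + sizeof cmd;
}

/* Skip one token by filling its bytes with header-only NOP tokens. */
static void overwrite_with_nops(uint8_t *stream, size_t offset, size_t size)
{
    const uint32_t nop = mock_header(NOP_COMMAND_NV, (uint32_t)sizeof(NopCommand));
    assert(size % sizeof nop == 0);
    for (size_t i = 0; i < size; i += sizeof nop)
        memcpy(stream + offset + i, &nop, sizeof nop);
}

/* End the sequence early: tokens at and after this offset are ignored. */
static void terminate_at(uint8_t *stream, size_t offset)
{
    const uint32_t term = mock_header(TERMINATE_SEQUENCE_COMMAND_NV,
                                      (uint32_t)sizeof(uint32_t));
    memcpy(stream + offset, &term, sizeof term);
}

/* CPU-side mock decoder that understands only mock_header's encoding.
   It counts the draw tokens a consumer of the stream would reach. */
static unsigned count_draws(const uint8_t *stream, size_t size)
{
    unsigned draws = 0;
    size_t offset = 0;
    while (offset + sizeof(uint32_t) <= size) {
        uint32_t header;
        memcpy(&header, stream + offset, sizeof header);
        const uint32_t id  = header & 0xFFFFu;
        const uint32_t len = header >> 16;
        if (id == TERMINATE_SEQUENCE_COMMAND_NV || len == 0)
            break;
        if (id == DRAW_ARRAYS_COMMAND_NV)
            ++draws;
        offset += len;
    }
    return draws;
}
```

In this sketch, a stream holding three draw tokens is built with three calls to push_draw_arrays. Calling overwrite_with_nops on the byte range of the second token removes that draw while keeping all later offsets valid, and calling terminate_at at the same offset drops that draw and everything after it. In a real application the patched buffer would then be submitted through DrawCommandsNV or a related entry point rather than walked on the CPU.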
    24) What does working with this extension look like?

        You will find related samples at https://github.com/nvpro-samples

    25) How can I use textures, images, shader storage or atomic counter
        buffers in combination with state objects?

        Textures and images are covered via NV/ARB_bindless_texture; you
        can store their handles inside uniform buffers. Shader storage and
        atomic counter buffers are currently not directly exposed; however,
        NV_gpu_shader5 allows storing pointers to such buffers inside
        uniform buffers as well. Atomic counters can be replaced by regular
        atomic increments.

        Alternatively use DrawCommandsNV or DrawCommandsAddressNV, which
        do support any GLSL program with these resource bindings, as well
        as default-block uniforms.

Revision History

    Rev.  Date       Author    Changes
    ----  ---------  --------  -----------------------------------------
     6    11/3/2015  ckubisch  Rephrase what stateobjects capture and
                               what not
     5    8/17/2015  ckubisch  correct errors for DrawCommandsNV and
                               DrawCommandsAddressNV rendering to default
                               framebuffer is not allowed.
                               Clarify which state is inherited (updated
                               Issue 25).
     4    6/18/2015  ckubisch  Add missing interaction with
                               ARB_shader_draw_parameters
     3    5/27/2015  jemmons   Multiple minor fixes and clarifications
     2    4/16/2015  pboudier  Fix incorrect type (size_t is now sizei)
                               in ListDrawCommandsStatesClientNV
     1               pboudier  concept
                     jbolz     base spec
                     ckubisch  detailed spec
                     mjk       Internal revisions