// Copyright 2015-2022 The Khronos Group Inc. // // SPDX-License-Identifier: CC-BY-4.0 [[shaders]] = Shaders A shader specifies programmable operations that execute for each vertex, control point, tessellated vertex, primitive, fragment, or workgroup in the corresponding stage(s) of the graphics and compute pipelines. Graphics pipelines include vertex shader execution as a result of <>, followed, if enabled, by tessellation control and evaluation shaders operating on <>, geometry shaders, if enabled, operating on primitives, and fragment shaders, if present, operating on fragments generated by <>. In this specification, vertex, tessellation control, tessellation evaluation and geometry shaders are collectively referred to as <>s and occur in the logical pipeline before rasterization. The fragment shader occurs logically after rasterization. Only the compute shader stage is included in a compute pipeline. Compute shaders operate on compute invocations in a workgroup. Shaders can: read from input variables, and read from and write to output variables. Input and output variables can: be used to transfer data between shader stages, or to allow the shader to interact with values that exist in the execution environment. Similarly, the execution environment provides constants describing capabilities. Shader variables are associated with execution environment-provided inputs and outputs using _built-in_ decorations in the shader. The available decorations for each stage are documented in the following subsections. [[shader-modules]] == Shader Modules [open,refpage='VkShaderModule',desc='Opaque handle to a shader module object',type='handles'] -- _Shader modules_ contain _shader code_ and one or more entry points. Shaders are selected from a shader module by specifying an entry point as part of <> creation. The stages of a pipeline can: use shaders that come from different modules. The shader code defining a shader module must: be in the SPIR-V format, as described by the <> appendix. Shader modules are represented by sname:VkShaderModule handles: include::{generated}/api/handles/VkShaderModule.adoc[] -- [open,refpage='vkCreateShaderModule',desc='Creates a new shader module object',type='protos'] -- To create a shader module, call: include::{generated}/api/protos/vkCreateShaderModule.adoc[] * pname:device is the logical device that creates the shader module. * pname:pCreateInfo is a pointer to a slink:VkShaderModuleCreateInfo structure. * pname:pAllocator controls host memory allocation as described in the <> chapter. * pname:pShaderModule is a pointer to a slink:VkShaderModule handle in which the resulting shader module object is returned. Once a shader module has been created, any entry points it contains can: be used in pipeline shader stages as described in <> and <>. ifdef::VK_EXT_graphics_pipeline_libraries[] If the <> feature is enabled, shader module creation can: be omitted entirely. Instead, applications should: provide the slink:VkShaderModuleCreateInfo structure directly to pipeline creation by chaining it to slink:VkPipelineShaderStageCreateInfo. This avoids the overhead of creating and managing an additional object.
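As a non-normative illustration of this pattern, the following sketch chains a slink:VkShaderModuleCreateInfo structure to a slink:VkPipelineShaderStageCreateInfo instead of creating a separate module; the variables `spirvWords` and `spirvSizeInBytes` are placeholders for a valid SPIR-V binary held by the application:

[source,c++]
----
// Informal sketch: provide SPIR-V directly at pipeline creation time by
// chaining VkShaderModuleCreateInfo to the stage description, instead of
// creating a separate VkShaderModule object.
VkShaderModuleCreateInfo moduleInfo = {};
moduleInfo.sType    = VK_STRUCTURE_TYPE_SHADER_MODULE_CREATE_INFO;
moduleInfo.codeSize = spirvSizeInBytes;   // size of the SPIR-V binary, in bytes
moduleInfo.pCode    = spirvWords;         // pointer to the SPIR-V words

VkPipelineShaderStageCreateInfo stageInfo = {};
stageInfo.sType  = VK_STRUCTURE_TYPE_PIPELINE_SHADER_STAGE_CREATE_INFO;
stageInfo.pNext  = &moduleInfo;           // shader code chained directly
stageInfo.stage  = VK_SHADER_STAGE_VERTEX_BIT;
stageInfo.module = VK_NULL_HANDLE;        // no VkShaderModule object is used
stageInfo.pName  = "main";
----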
endif::VK_EXT_graphics_pipeline_libraries[] .Valid Usage **** ifdef::VK_EXT_validation_cache[] * [[VUID-vkCreateShaderModule-pCreateInfo-06904]] If pname:pCreateInfo is not `NULL`, pname:pCreateInfo->pNext must: be `NULL` or a pointer to a slink:VkShaderModuleValidationCacheCreateInfoEXT structure endif::VK_EXT_validation_cache[] ifndef::VK_EXT_validation_cache[] * [[VUID-vkCreateShaderModule-pCreateInfo-06905]] If pname:pCreateInfo is not `NULL`, pname:pCreateInfo->pNext must: be `NULL` endif::VK_EXT_validation_cache[] **** include::{generated}/validity/protos/vkCreateShaderModule.adoc[] -- [open,refpage='VkShaderModuleCreateInfo',desc='Structure specifying parameters of a newly created shader module',type='structs'] -- The sname:VkShaderModuleCreateInfo structure is defined as: include::{generated}/api/structs/VkShaderModuleCreateInfo.adoc[] * pname:sType is the type of this structure. * pname:pNext is `NULL` or a pointer to a structure extending this structure. * pname:flags is reserved for future use. * pname:codeSize is the size, in bytes, of the code pointed to by pname:pCode. * pname:pCode is a pointer to code that is used to create the shader module. The type and format of the code is determined from the content of the memory addressed by pname:pCode. .Valid Usage **** * [[VUID-VkShaderModuleCreateInfo-codeSize-01085]] pname:codeSize must: be greater than 0 ifndef::VK_NV_glsl_shader[] * [[VUID-VkShaderModuleCreateInfo-codeSize-01086]] pname:codeSize must: be a multiple of 4 * [[VUID-VkShaderModuleCreateInfo-pCode-01087]] pname:pCode must: point to valid SPIR-V code, formatted and packed as described by the <> * [[VUID-VkShaderModuleCreateInfo-pCode-01088]] pname:pCode must: adhere to the validation rules described by the <> section of the <> appendix endif::VK_NV_glsl_shader[] ifdef::VK_NV_glsl_shader[] * [[VUID-VkShaderModuleCreateInfo-pCode-01376]] If pname:pCode is a pointer to SPIR-V code, pname:codeSize must: be a multiple of 4 * [[VUID-VkShaderModuleCreateInfo-pCode-01377]] pname:pCode must: point to either valid SPIR-V code, formatted and packed as described by the <> or valid GLSL code which must: be written to the `GL_KHR_vulkan_glsl` extension specification * [[VUID-VkShaderModuleCreateInfo-pCode-01378]] If pname:pCode is a pointer to SPIR-V code, that code must: adhere to the validation rules described by the <> section of the <> appendix * [[VUID-VkShaderModuleCreateInfo-pCode-01379]] If pname:pCode is a pointer to GLSL code, it must: be valid GLSL code written to the `GL_KHR_vulkan_glsl` GLSL extension specification endif::VK_NV_glsl_shader[] * [[VUID-VkShaderModuleCreateInfo-pCode-01089]] pname:pCode must: declare the code:Shader capability for SPIR-V code * [[VUID-VkShaderModuleCreateInfo-pCode-01090]] pname:pCode must: not declare any capability that is not supported by the API, as described by the <> section of the <> appendix * [[VUID-VkShaderModuleCreateInfo-pCode-01091]] If pname:pCode declares any of the capabilities listed in the <> appendix, one of the corresponding requirements must: be satisfied * [[VUID-VkShaderModuleCreateInfo-pCode-04146]] pname:pCode must: not declare any SPIR-V extension that is not supported by the API, as described by the <> section of the <> appendix * [[VUID-VkShaderModuleCreateInfo-pCode-04147]] If pname:pCode declares any of the SPIR-V extensions listed in the <> appendix, one of the corresponding requirements must: be satisfied **** include::{generated}/validity/structs/VkShaderModuleCreateInfo.adoc[] -- 
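As a non-normative example of the structure described above, the following sketch creates a shader module from a SPIR-V binary already loaded into host memory; `device`, `spirvWords`, and `spirvSizeInBytes` are placeholders supplied by the application, and error handling is omitted:

[source,c++]
----
// Informal sketch: create a VkShaderModule from SPIR-V words in memory.
// codeSize is expressed in bytes and must be a multiple of 4; pCode points
// to the first SPIR-V word.
VkShaderModuleCreateInfo createInfo = {};
createInfo.sType    = VK_STRUCTURE_TYPE_SHADER_MODULE_CREATE_INFO;
createInfo.pNext    = NULL;
createInfo.flags    = 0;
createInfo.codeSize = spirvSizeInBytes;
createInfo.pCode    = spirvWords;         // const uint32_t*

VkShaderModule shaderModule = VK_NULL_HANDLE;
VkResult result = vkCreateShaderModule(device, &createInfo, NULL, &shaderModule);
// On success, entry points in shaderModule can be referenced from
// VkPipelineShaderStageCreateInfo; the module is freed later with
// vkDestroyShaderModule(device, shaderModule, NULL).
----

The resulting module can then be referenced from slink:VkPipelineShaderStageCreateInfo during pipeline creation.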
[open,refpage='VkShaderModuleCreateFlags',desc='Reserved for future use',type='flags'] -- include::{generated}/api/flags/VkShaderModuleCreateFlags.adoc[] tname:VkShaderModuleCreateFlags is a bitmask type for setting a mask, but is currently reserved for future use. -- ifdef::VK_EXT_validation_cache[] include::{chapters}/VK_EXT_validation_cache/shader-module-validation-cache.adoc[] endif::VK_EXT_validation_cache[] [open,refpage='vkDestroyShaderModule',desc='Destroy a shader module',type='protos'] -- To destroy a shader module, call: include::{generated}/api/protos/vkDestroyShaderModule.adoc[] * pname:device is the logical device that destroys the shader module. * pname:shaderModule is the handle of the shader module to destroy. * pname:pAllocator controls host memory allocation as described in the <> chapter. A shader module can: be destroyed while pipelines created using its shaders are still in use. .Valid Usage **** * [[VUID-vkDestroyShaderModule-shaderModule-01092]] If sname:VkAllocationCallbacks were provided when pname:shaderModule was created, a compatible set of callbacks must: be provided here * [[VUID-vkDestroyShaderModule-shaderModule-01093]] If no sname:VkAllocationCallbacks were provided when pname:shaderModule was created, pname:pAllocator must: be `NULL` **** include::{generated}/validity/protos/vkDestroyShaderModule.adoc[] -- ifdef::VK_EXT_shader_module_identifier[] [[shaders-identifiers]] == Shader Module Identifiers [open,refpage='vkGetShaderModuleIdentifierEXT',desc='Query a unique identifier for a shader module',type='protos'] -- Shader modules have unique identifiers associated with them. To query an implementation-provided identifier, call: include::{generated}/api/protos/vkGetShaderModuleIdentifierEXT.adoc[] * pname:device is the logical device that created the shader module. * pname:shaderModule is the handle of the shader module. * pname:pIdentifier is a pointer to the returned slink:VkShaderModuleIdentifierEXT. The identifier returned by the implementation must: only depend on pname:shaderIdentifierAlgorithmUUID and information provided in the slink:VkShaderModuleCreateInfo which created pname:shaderModule. The implementation may: return equal identifiers for two different slink:VkShaderModuleCreateInfo structures if the difference does not affect pipeline compilation. Identifiers are only meaningful on different slink:VkDevice objects if the device the identifier was queried from had the same <> as the device consuming the identifier. .Valid Usage **** * [[VUID-vkGetShaderModuleIdentifierEXT-shaderModuleIdentifier-06884]] The <> feature must: be enabled **** include::{generated}/validity/protos/vkGetShaderModuleIdentifierEXT.adoc[] -- [open,refpage='vkGetShaderModuleCreateInfoIdentifierEXT',desc='Query a unique identifier for a shader module create info',type='protos'] -- slink:VkShaderModuleCreateInfo structures have unique identifiers associated with them. To query an implementation-provided identifier, call: include::{generated}/api/protos/vkGetShaderModuleCreateInfoIdentifierEXT.adoc[] * pname:device is the logical device that can: create a slink:VkShaderModule from pname:pCreateInfo. * pname:pCreateInfo is a pointer to a slink:VkShaderModuleCreateInfo structure. * pname:pIdentifier is a pointer to the returned slink:VkShaderModuleIdentifierEXT. The identifier returned by the implementation must: only depend on pname:shaderIdentifierAlgorithmUUID and information provided in the slink:VkShaderModuleCreateInfo.
The implementation may: return equal identifiers for two different slink:VkShaderModuleCreateInfo structures if the difference does not affect pipeline compilation. Identifiers are only meaningful on different slink:VkDevice objects if the device the identifier was queried from had the same <> as the device consuming the identifier. The identifier returned by the implementation in flink:vkGetShaderModuleCreateInfoIdentifierEXT must: be equal to the identifier returned by flink:vkGetShaderModuleIdentifierEXT given equivalent definitions of slink:VkShaderModuleCreateInfo and any chained pname:pNext structures. .Valid Usage **** * [[VUID-vkGetShaderModuleCreateInfoIdentifierEXT-shaderModuleIdentifier-06885]] The <> feature must: be enabled **** include::{generated}/validity/protos/vkGetShaderModuleCreateInfoIdentifierEXT.adoc[] -- [open,refpage='VkShaderModuleIdentifierEXT',desc='A unique identifier for a shader module',type='structs'] -- slink:VkShaderModuleIdentifierEXT represents a shader module identifier returned by the implementation. include::{generated}/api/structs/VkShaderModuleIdentifierEXT.adoc[] * pname:sType is the type of this structure. * pname:pNext is `NULL` or a pointer to a structure extending this structure. * pname:identifierSize is the size, in bytes, of valid data returned in pname:identifier. * pname:identifier is a buffer of opaque data specifying an identifier. Any returned values beyond the first pname:identifierSize bytes are undefined:. Implementations must: return an pname:identifierSize greater than 0, and less than or equal to ename:VK_MAX_SHADER_MODULE_IDENTIFIER_SIZE_EXT. Two identifiers are considered equal if pname:identifierSize is equal and the first pname:identifierSize bytes of pname:identifier compare equal. Implementations may: return a different pname:identifierSize for different modules. Implementations should: ensure that pname:identifierSize is large enough to uniquely define a shader module. include::{generated}/validity/structs/VkShaderModuleIdentifierEXT.adoc[] -- [open,refpage='VK_MAX_SHADER_MODULE_IDENTIFIER_SIZE_EXT',desc='Maximum length of a shader module identifier',type='consts'] -- ename:VK_MAX_SHADER_MODULE_IDENTIFIER_SIZE_EXT is the maximum length, in bytes, of a shader module identifier, as returned in slink:VkShaderModuleIdentifierEXT::pname:identifierSize. include::{generated}/api/enums/VK_MAX_SHADER_MODULE_IDENTIFIER_SIZE_EXT.adoc[] -- endif::VK_EXT_shader_module_identifier[] [[shaders-execution]] == Shader Execution At each stage of the pipeline, multiple invocations of a shader may: execute simultaneously. Further, invocations of a single shader produced as the result of different commands may: execute simultaneously. The relative execution order of invocations of the same shader type is undefined:. Shader invocations may: complete in a different order than that in which the primitives they originated from were drawn or dispatched by the application. However, fragment shader outputs are written to attachments in <>. The relative execution order of invocations of different shader types is largely undefined:. However, when invoking a shader whose inputs are generated from a previous pipeline stage, the shader invocations from the previous stage are guaranteed to have executed far enough to generate input values for all required inputs. [[shaders-termination]] === Shader Termination A shader invocation that is _terminated_ has finished executing instructions.
Executing code:OpReturn in the entry point, or executing code:OpTerminateInvocation in any function will terminate an invocation. Implementations may: also terminate a shader invocation when code:OpKill is executed in any function; otherwise it becomes a <>. In addition to the above conditions, <> are terminated when all non-helper invocations in the same <> either terminate or become <> via ifdef::VK_EXT_shader_demote_to_helper_invocation[] code:OpDemoteToHelperInvocationEXT or endif::VK_EXT_shader_demote_to_helper_invocation[] code:OpKill. A shader stage for a given command completes execution when all invocations for that stage have terminated. [[shaders-execution-memory-ordering]] == Shader Memory Access Ordering The order in which image or buffer memory is read or written by shaders is largely undefined:. For some shader types (vertex, tessellation evaluation, and in some cases, fragment), even the number of shader invocations that may: perform loads and stores is undefined:. In particular, the following rules apply: * <> and <> shaders will be invoked at least once for each unique vertex, as defined in those sections. * <> shaders will be invoked zero or more times, as defined in that section. * The relative execution order of invocations of the same shader type is undefined:. A store issued by a shader when working on primitive B might complete prior to a store for primitive A, even if primitive A is specified prior to primitive B. This applies even to fragment shaders; while fragment shader outputs are always written to the framebuffer in <>, stores executed by fragment shader invocations are not. * The relative execution order of invocations of different shader types is largely undefined:. [NOTE] .Note ==== The above limitations on shader invocation order make some forms of synchronization between shader invocations within a single set of primitives unimplementable. For example, having one invocation poll memory written by another invocation assumes that the other invocation has been launched and will complete its writes in finite time. ==== ifdef::VK_VERSION_1_2,VK_KHR_vulkan_memory_model[] The <> appendix defines the terminology and rules for how to correctly communicate between shader invocations, such as when a write is <> a read, and what constitutes a <>. Applications must: not cause a data race. endif::VK_VERSION_1_2,VK_KHR_vulkan_memory_model[] ifndef::VK_VERSION_1_2,VK_KHR_vulkan_memory_model[] Stores issued to different memory locations within a single shader invocation may: not be visible to other invocations, or may: not become visible in the order they were performed. The code:OpMemoryBarrier instruction can: be used to provide stronger ordering of reads and writes performed by a single invocation. code:OpMemoryBarrier guarantees that any memory transactions issued by the shader invocation prior to the instruction complete prior to the memory transactions issued after the instruction. Memory barriers are needed for algorithms that require multiple invocations to access the same memory and require the operations to be performed in a partially-defined relative order. For example, if one shader invocation does a series of writes, followed by an code:OpMemoryBarrier instruction, followed by another write, then the results of the series of writes before the barrier become visible to other shader invocations at a time earlier or equal to when the results of the final write become visible to those invocations. 
In practice, this means that another invocation that sees the results of the final write would also see the previous writes. Without the memory barrier, the final write may: be visible before the previous writes. Writes that are the result of shader stores through a variable decorated with code:Coherent automatically have available writes to the same buffer, buffer view, or image view made visible to them, and are themselves automatically made available to access by the same buffer, buffer view, or image view. Reads that are the result of shader loads through a variable decorated with code:Coherent automatically have available writes to the same buffer, buffer view, or image view made visible to them. The order in which coherent writes to different locations become available is undefined:, unless enforced by a memory barrier instruction or other memory dependency. [NOTE] .Note ==== Explicit memory dependencies must: still be used to guarantee availability and visibility for access via other buffers, buffer views, or image views. ==== The built-in atomic memory transaction instructions can: be used to read and write a given memory address atomically. While built-in atomic functions issued by multiple shader invocations are executed in undefined: order relative to each other, these functions perform both a read and a write of a memory address and guarantee that no other memory transaction will write to the underlying memory between the read and write. Atomic operations ensure automatic availability and visibility for writes and reads in the same way as those to code:Coherent variables. [NOTE] .Note ==== Memory accesses performed on different resource descriptors with the same memory backing may: not be well-defined even with the code:Coherent decoration or via atomics, due to things such as image layouts or ownership of the resource, as described in the <> chapter. ==== [NOTE] .Note ==== Atomics allow shaders to use shared global addresses for mutual exclusion or as counters, among other uses. ==== endif::VK_VERSION_1_2,VK_KHR_vulkan_memory_model[] The SPIR-V *SubgroupMemory*, *CrossWorkgroupMemory*, and *AtomicCounterMemory* memory semantics are ignored. Sequentially consistent atomics and barriers are not supported and *SequentiallyConsistent* is treated as *AcquireRelease*. *SequentiallyConsistent* should: not be used. [[shaders-inputs]] == Shader Inputs and Outputs Data is passed into and out of shaders using variables with input or output storage class, respectively. User-defined inputs and outputs are connected between stages by matching their code:Location decorations. Additionally, data can: be provided by or communicated to special functions provided by the execution environment using code:BuiltIn decorations. In many cases, the same code:BuiltIn decoration can: be used in multiple shader stages with similar meaning. The specific behavior of variables decorated as code:BuiltIn is documented in the following sections. ifdef::VK_NV_mesh_shader,VK_EXT_mesh_shader[] [[shaders-task]] == Task Shaders Task shaders operate in conjunction with mesh shaders to produce a collection of primitives that will be processed by subsequent stages of the graphics pipeline. Their primary purpose is to create a variable number of subsequent mesh shader invocations. Task shaders are invoked via the execution of the <> pipeline. The task shader has no fixed-function inputs other than variables identifying the specific workgroup and invocation.
ifdef::VK_NV_mesh_shader[] In the code:TaskNV {ExecutionModel} the number of mesh shader workgroups to create is specified via a code:TaskCountNV decorated output variable. endif::VK_NV_mesh_shader[] ifdef::VK_EXT_mesh_shader[] In the code:TaskEXT {ExecutionModel} the number of mesh shader workgroups to create is specified via the code:OpEmitMeshTasksEXT instruction. endif::VK_EXT_mesh_shader[] The task shader can write additional outputs to task memory, which can be read by all of the mesh shader workgroups it created. === Task Shader Execution Task workloads are formed from groups of work items called workgroups and processed by the task shader in the current graphics pipeline. A workgroup is a collection of shader invocations that execute the same shader, potentially in parallel. Task shaders execute in _global workgroups_ which are divided into a number of _local workgroups_ with a size that can: be set by assigning a value to the code:LocalSize ifdef::VK_VERSION_1_3,VK_KHR_maintenance4[or code:LocalSizeId] execution mode or via an object decorated by the code:WorkgroupSize decoration. An invocation within a local workgroup can: share data with other members of the local workgroup through shared variables and issue memory and control flow barriers to synchronize with other members of the local workgroup. ifdef::VK_EXT_mesh_shader[] ifdef::VK_VERSION_1_1,VK_KHR_multiview[] If the subpass includes multiple views in its view mask, a Task shader using code:TaskEXT {ExecutionModel} may: be invoked separately for each view. endif::VK_VERSION_1_1,VK_KHR_multiview[] endif::VK_EXT_mesh_shader[] [[shaders-mesh]] == Mesh Shaders Mesh shaders operate in workgroups to produce a collection of primitives that will be processed by subsequent stages of the graphics pipeline. Each workgroup emits zero or more output primitives and the group of vertices and their associated data required for each output primitive. Mesh shaders are invoked via the execution of the <> pipeline. The only inputs available to the mesh shader are variables identifying the specific workgroup and invocation and, if applicable, any outputs written to task memory by the task shader that spawned the mesh shader's workgroup. The mesh shader can operate without a task shader as well. The invocations of the mesh shader workgroup write an output mesh, comprising a set of primitives with per-primitive attributes, a set of vertices with per-vertex attributes, and an array of indices identifying the mesh vertices that belong to each primitive. The primitives of this mesh are then processed by subsequent graphics pipeline stages, where the outputs of the mesh shader form an interface with the fragment shader. === Mesh Shader Execution Mesh workloads are formed from groups of work items called workgroups and processed by the mesh shader in the current graphics pipeline. A workgroup is a collection of shader invocations that execute the same shader, potentially in parallel. Mesh shaders execute in _global workgroups_ which are divided into a number of _local workgroups_ with a size that can: be set by assigning a value to the code:LocalSize ifdef::VK_VERSION_1_3,VK_KHR_maintenance4[or code:LocalSizeId] execution mode or via an object decorated by the code:WorkgroupSize decoration. An invocation within a local workgroup can: share data with other members of the local workgroup through shared variables and issue memory and control flow barriers to synchronize with other members of the local workgroup. 
The _global workgroups_ may be generated explicitly via the API, or implicitly through the task shader's work creation mechanism. endif::VK_NV_mesh_shader,VK_EXT_mesh_shader[] ifdef::VK_EXT_mesh_shader[] ifdef::VK_VERSION_1_1,VK_KHR_multiview[] If the subpass includes multiple views in its view mask, a Mesh shader using code:MeshEXT {ExecutionModel} may: be invoked separately for each view. endif::VK_VERSION_1_1,VK_KHR_multiview[] endif::VK_EXT_mesh_shader[] [[shaders-vertex]] == Vertex Shaders Each vertex shader invocation operates on one vertex and its associated <> data, and outputs one vertex and associated data. ifndef::VK_NV_mesh_shader,VK_EXT_mesh_shader[] Graphics pipelines must: include a vertex shader, and the vertex shader stage is always the first shader stage in the graphics pipeline. endif::VK_NV_mesh_shader,VK_EXT_mesh_shader[] ifdef::VK_NV_mesh_shader,VK_EXT_mesh_shader[] Graphics pipelines using primitive shading must: include a vertex shader, and the vertex shader stage is always the first shader stage in the graphics pipeline. endif::VK_NV_mesh_shader,VK_EXT_mesh_shader[] [[shaders-vertex-execution]] === Vertex Shader Execution A vertex shader must: be executed at least once for each vertex specified by a drawing command. ifdef::VK_VERSION_1_1,VK_KHR_multiview[] If the subpass includes multiple views in its view mask, the shader may: be invoked separately for each view. endif::VK_VERSION_1_1,VK_KHR_multiview[] During execution, the shader is presented with the index of the vertex and instance for which it has been invoked. Input variables declared in the vertex shader are filled by the implementation with the values of vertex attributes associated with the invocation being executed. If the same vertex is specified multiple times in a drawing command (e.g. by including the same index value multiple times in an index buffer), the implementation may: reuse the results of vertex shading if it can statically determine that the vertex shader invocations will produce identical results. [NOTE] .Note ==== It is implementation-dependent when and if results of vertex shading are reused, and thus how many times the vertex shader will be executed. This is also true if the vertex shader contains stores or atomic operations (see <>). ==== [[shaders-tessellation-control]] == Tessellation Control Shaders The tessellation control shader is used to read an input patch provided by the application and to produce an output patch. Each tessellation control shader invocation operates on an input patch (after all control points in the patch are processed by a vertex shader) and its associated data, and outputs a single control point of the output patch and its associated data, and can: also output additional per-patch data. The input patch is sized according to the pname:patchControlPoints member of slink:VkPipelineTessellationStateCreateInfo, as part of input assembly. ifdef::VK_EXT_extended_dynamic_state2[] The input patch can also be dynamically sized with the pname:patchControlPoints parameter of flink:vkCmdSetPatchControlPointsEXT. [open,refpage='vkCmdSetPatchControlPointsEXT',desc='Specify the number of control points per patch dynamically for a command buffer',type='protos'] -- To <> the number of control points per patch, call: include::{generated}/api/protos/vkCmdSetPatchControlPointsEXT.adoc[] * pname:commandBuffer is the command buffer into which the command will be recorded. * pname:patchControlPoints specifies the number of control points per patch.
This command sets the number of control points per patch for subsequent drawing commands when the graphics pipeline is created with ename:VK_DYNAMIC_STATE_PATCH_CONTROL_POINTS_EXT set in slink:VkPipelineDynamicStateCreateInfo::pname:pDynamicStates. Otherwise, this state is specified by the slink:VkPipelineTessellationStateCreateInfo::pname:patchControlPoints value used to create the currently active pipeline. .Valid Usage **** * [[VUID-vkCmdSetPatchControlPointsEXT-None-04873]] The <> feature must: be enabled * [[VUID-vkCmdSetPatchControlPointsEXT-patchControlPoints-04874]] pname:patchControlPoints must: be greater than zero and less than or equal to sname:VkPhysicalDeviceLimits::pname:maxTessellationPatchSize **** include::{generated}/validity/protos/vkCmdSetPatchControlPointsEXT.adoc[] -- endif::VK_EXT_extended_dynamic_state2[] The size of the output patch is controlled by the code:OpExecutionMode code:OutputVertices specified in the tessellation control or tessellation evaluation shaders, which must: be specified in at least one of the shaders. The size of the input and output patches must: each be greater than zero and less than or equal to sname:VkPhysicalDeviceLimits::pname:maxTessellationPatchSize. [[shaders-tessellation-control-execution]] === Tessellation Control Shader Execution A tessellation control shader is invoked at least once for each _output_ vertex in a patch. ifdef::VK_VERSION_1_1,VK_KHR_multiview[] If the subpass includes multiple views in its view mask, the shader may: be invoked separately for each view. endif::VK_VERSION_1_1,VK_KHR_multiview[] Inputs to the tessellation control shader are generated by the vertex shader. Each invocation of the tessellation control shader can: read the attributes of any incoming vertices and their associated data. The invocations corresponding to a given patch execute logically in parallel, with undefined: relative execution order. However, the code:OpControlBarrier instruction can: be used to provide limited control of the execution order by synchronizing invocations within a patch, effectively dividing tessellation control shader execution into a set of phases. Tessellation control shaders will read undefined: values if one invocation reads a per-vertex or per-patch output written by another invocation at any point during the same phase, or if two invocations attempt to write different values to the same per-patch output in a single phase. [[shaders-tessellation-evaluation]] == Tessellation Evaluation Shaders The Tessellation Evaluation Shader operates on an input patch of control points and their associated data, and a single input barycentric coordinate indicating the invocation's relative position within the subdivided patch, and outputs a single vertex and its associated data. [[shaders-tessellation-evaluation-execution]] === Tessellation Evaluation Shader Execution A tessellation evaluation shader is invoked at least once for each unique vertex generated by the tessellator. ifdef::VK_VERSION_1_1,VK_KHR_multiview[] If the subpass includes multiple views in its view mask, the shader may: be invoked separately for each view. endif::VK_VERSION_1_1,VK_KHR_multiview[] [[shaders-geometry]] == Geometry Shaders The geometry shader operates on a group of vertices and their associated data assembled from a single input primitive, and emits zero or more output primitives and the group of vertices and their associated data required for each output primitive. 
[[shaders-geometry-execution]] === Geometry Shader Execution A geometry shader is invoked at least once for each primitive produced by the tessellation stages, or at least once for each primitive generated by <> when tessellation is not in use. A shader can request that the geometry shader runs multiple <>. A geometry shader is invoked at least once for each instance. ifdef::VK_VERSION_1_1,VK_KHR_multiview[] If the subpass includes multiple views in its view mask, the shader may: be invoked separately for each view. endif::VK_VERSION_1_1,VK_KHR_multiview[] [[shaders-fragment]] == Fragment Shaders Fragment shaders are invoked as a <> in a graphics pipeline. Each fragment shader invocation operates on a single fragment and its associated data. With few exceptions, fragment shaders do not have access to any data associated with other fragments and are considered to execute in isolation of fragment shader invocations associated with other fragments. [[shaders-compute]] == Compute Shaders Compute shaders are invoked via flink:vkCmdDispatch and flink:vkCmdDispatchIndirect commands. In general, they have access to similar resources as shader stages executing as part of a graphics pipeline. Compute workloads are formed from groups of work items called workgroups and processed by the compute shader in the current compute pipeline. A workgroup is a collection of shader invocations that execute the same shader, potentially in parallel. Compute shaders execute in _global workgroups_ which are divided into a number of _local workgroups_ with a size that can: be set by assigning a value to the code:LocalSize ifdef::VK_VERSION_1_3,VK_KHR_maintenance4[or code:LocalSizeId] execution mode or via an object decorated by the code:WorkgroupSize decoration. An invocation within a local workgroup can: share data with other members of the local workgroup through shared variables and issue memory and control flow barriers to synchronize with other members of the local workgroup. ifdef::VK_NV_ray_tracing,VK_KHR_ray_tracing_pipeline[] [[shaders-raytracing-shaders]] [[shaders-ray-generation]] == Ray Generation Shaders A ray generation shader is similar to a compute shader. Its main purpose is to execute ray tracing queries using code:OpTraceRayKHR instructions and process the results. [[shaders-ray-generation-execution]] === Ray Generation Shader Execution One ray generation shader is executed per ray tracing dispatch. Its location in the shader binding table (see <> for details) is passed directly into ifdef::VK_KHR_ray_tracing_pipeline[] flink:vkCmdTraceRaysKHR using the pname:pRaygenShaderBindingTable parameter endif::VK_KHR_ray_tracing_pipeline[] ifdef::VK_KHR_ray_tracing_pipeline+VK_KHR_ray_tracing_pipeline[or] ifdef::VK_NV_ray_tracing[] flink:vkCmdTraceRaysNV using the pname:raygenShaderBindingTableBuffer and pname:raygenShaderBindingOffset parameters endif::VK_NV_ray_tracing[] . [[shaders-intersection]] == Intersection Shaders Intersection shaders enable the implementation of arbitrary, application defined geometric primitives. An intersection shader for a primitive is executed whenever its axis-aligned bounding box is hit by a ray. Like other ray tracing shader domains, an intersection shader operates on a single ray at a time. It also operates on a single primitive at a time. It is therefore the purpose of an intersection shader to compute the ray-primitive intersections and report them. To report an intersection, the shader calls the code:OpReportIntersectionKHR instruction. 
An intersection shader communicates with any-hit and closest hit shaders by generating attribute values that they can: read. Intersection shaders cannot: read or modify the ray payload. [[shaders-intersection-execution]] === Intersection Shader Execution The order in which intersections are found along a ray, and therefore the order in which intersection shaders are executed, is unspecified. The intersection shader of the closest AABB which intersects the ray is guaranteed to be executed at some point during traversal, unless the ray is forcibly terminated. [[shaders-any-hit]] == Any-Hit Shaders The any-hit shader is executed after the intersection shader reports an intersection that lies within the current [eq]#[t~min~,t~max~]# of the ray. The main use of any-hit shaders is to programmatically decide whether or not an intersection will be accepted. The intersection will be accepted unless the shader calls the code:OpIgnoreIntersectionKHR instruction. Any-hit shaders have read-only access to the attributes generated by the corresponding intersection shader, and can: read or modify the ray payload. [[shaders-any-hit-execution]] === Any-Hit Shader Execution The order in which intersections are found along a ray, and therefore the order in which any-hit shaders are executed, is unspecified. The any-hit shader of the closest hit is guaranteed to be executed at some point during traversal, unless the ray is forcibly terminated. [[shaders-closest-hit]] == Closest Hit Shaders Closest hit shaders have read-only access to the attributes generated by the corresponding intersection shader, and can: read or modify the ray payload. They also have access to a number of system-generated values. Closest hit shaders can: call code:OpTraceRayKHR to recursively trace rays. [[shaders-closest-hit-execution]] === Closest Hit Shader Execution Exactly one closest hit shader is executed when traversal is finished and an intersection has been found and accepted. [[shaders-miss]] == Miss Shaders Miss shaders can: access the ray payload and can: trace new rays through the code:OpTraceRayKHR instruction, but cannot: access attributes since they are not associated with an intersection. [[shaders-miss-execution]] === Miss Shader Execution A miss shader is executed instead of a closest hit shader if no intersection was found during traversal. [[shaders-callable]] == Callable Shaders Callable shaders can: access a callable payload that works similarly to ray payloads to do subroutine work. [[shaders-callable-execution]] === Callable Shader Execution A callable shader is executed by calling code:OpExecuteCallableKHR from an allowed shader stage. endif::VK_NV_ray_tracing,VK_KHR_ray_tracing_pipeline[] [[shaders-interpolation-decorations]] == Interpolation Decorations Variables in the code:Input storage class in a fragment shader's interface are interpolated from the values specified by the primitive being rasterized. [NOTE] .Note ==== Interpolation decorations can be present on input and output variables in pre-rasterization shaders but have no effect on the interpolation performed. ifdef::VK_EXT_graphics_pipeline_libraries[] However, when linking graphics pipeline libraries, if the <> limit is not supported, interpolation qualifiers do need to match between the fragment shader input and the last pre-rasterization shader output. endif::VK_EXT_graphics_pipeline_libraries[] ==== An undecorated input variable will be interpolated with perspective-correct interpolation according to the primitive type being rasterized.
<> and <> are interpolated in the same way as the primitive's clip coordinates. If the code:NoPerspective decoration is present, linear interpolation is instead used for <> and <>. For points, as there is only a single vertex, input values are never interpolated and instead take the value written for the single vertex. If the code:Flat decoration is present on an input variable, the value is not interpolated, and instead takes its value directly from the <>. Fragment shader inputs that are signed or unsigned integers, integer vectors, or any double-precision floating-point type must: be decorated with code:Flat. Interpolation of input variables is performed at an implementation-defined position within the fragment area being shaded. The position is further constrained as follows: * If the code:Centroid decoration is used, the interpolation position used for the variable must: also fall within the bounds of the primitive being rasterized. * If the code:Sample decoration is used, the interpolation position used for the variable must: be at the position of the sample being shaded by the current fragment shader invocation. * If a sample count of 1 is used, the interpolation position must: be at the center of the fragment area. [NOTE] .Note ==== As code:Centroid restricts the possible interpolation position to the covered area of the primitive, the position can be forced to vary between neighboring fragments when it otherwise would not. Derivatives calculated based on these differing locations can produce inconsistent results compared to undecorated inputs. It is recommended that input variables used in derivative calculations not be decorated with code:Centroid. ==== ifdef::VK_NV_fragment_shader_barycentric,VK_KHR_fragment_shader_barycentric[] [[shaders-interpolation-decorations-pervertexkhr]] If the code:PerVertexKHR decoration is present on an input variable, the value is not interpolated, and instead values from all input vertices are available in an array. Each index of the array corresponds to one of the vertices of the primitive that produced the fragment. endif::VK_NV_fragment_shader_barycentric,VK_KHR_fragment_shader_barycentric[] ifdef::VK_AMD_shader_explicit_vertex_parameter[] If the code:CustomInterpAMD decoration is present on an input variable, the value cannot: be accessed directly; instead the extended instruction code:InterpolateAtVertexAMD must: be used to obtain values from the input vertices. endif::VK_AMD_shader_explicit_vertex_parameter[] [[shaders-staticuse]] == Static Use A SPIR-V module declares a global object in memory using the code:OpVariable instruction, which results in a pointer code:x to that object. A specific entry point in a SPIR-V module is said to _statically use_ that object if that entry point's call tree contains a function containing an instruction with code:x as an code:id operand. Static use is not used to control the behavior of variables with code:Input and code:Output storage. The effects of those variables are applied based only on whether they are present in a shader entry point's interface. [[shaders-scope]] == Scope A _scope_ describes a set of shader invocations, where each such set is a _scope instance_. Each invocation belongs to one or more scope instances, but belongs to no more than one scope instance for each scope. The operations available between invocations in a given scope instance vary, with smaller scopes generally able to perform more operations, and with greater efficiency.
[[shaders-scope-cross-device]] === Cross Device All invocations executed in a Vulkan instance fall into a single _cross device scope instance_. Whilst the code:CrossDevice scope is defined in SPIR-V, it is disallowed in Vulkan. API <> commands can: be used to communicate between devices. [[shaders-scope-device]] === Device All invocations executed on a single device form a _device scope instance_. ifdef::VK_VERSION_1_2,VK_KHR_vulkan_memory_model[] If the <> and <> features are enabled, this scope is represented in SPIR-V by the code:Device code:Scope, which can: be used as a code:Memory code:Scope for barrier and atomic operations. ifdef::VK_KHR_shader_clock[] If both the <> and <> features are enabled, using the code:Device code:Scope with the code:OpReadClockKHR instruction will read from a clock that is consistent across invocations in the same device scope instance. endif::VK_KHR_shader_clock[] endif::VK_VERSION_1_2,VK_KHR_vulkan_memory_model[] There is no method to synchronize the execution of these invocations within SPIR-V, and this can: only be done with API synchronization primitives. ifdef::VK_VERSION_1_1,VK_KHR_device_group[] Invocations executing on different devices in a device group operate in separate device scope instances. endif::VK_VERSION_1_1,VK_KHR_device_group[] ifndef::VK_VERSION_1_2,VK_KHR_vulkan_memory_model[] The scope only extends to the queue family, not the whole device. endif::VK_VERSION_1_2,VK_KHR_vulkan_memory_model[] [[shaders-scope-queue-family]] === Queue Family Invocations executed by queues in a given queue family form a _queue family scope instance_. This scope is identified in SPIR-V as the ifdef::VK_VERSION_1_2,VK_KHR_vulkan_memory_model[] code:QueueFamily code:Scope if the <> feature is enabled, or if not, the endif::VK_VERSION_1_2,VK_KHR_vulkan_memory_model[] code:Device code:Scope, which can: be used as a code:Memory code:Scope for barrier and atomic operations. ifdef::VK_KHR_shader_clock[] If the <> feature is enabled, ifdef::VK_VERSION_1_2,VK_KHR_vulkan_memory_model[] but the <> feature is not enabled, endif::VK_VERSION_1_2,VK_KHR_vulkan_memory_model[] using the code:Device code:Scope with the code:OpReadClockKHR instruction will read from a clock that is consistent across invocations in the same queue family scope instance. endif::VK_KHR_shader_clock[] There is no method to synchronize the execution of these invocations within SPIR-V, and this can: only be done with API synchronization primitives. Each invocation in a queue family scope instance must: be in the same <>. [[shaders-scope-command]] === Command Any shader invocations executed as the result of a single command such as flink:vkCmdDispatch or flink:vkCmdDraw form a _command scope instance_. For indirect drawing commands with pname:drawCount greater than one, invocations from separate draws are in separate command scope instances. ifdef::VK_KHR_ray_tracing_pipeline,VK_NV_ray_tracing[] For ray tracing shaders, an invocation group is an implementation-dependent subset of the set of shader invocations of a given shader stage which are produced by a single trace rays command. endif::VK_KHR_ray_tracing_pipeline,VK_NV_ray_tracing[] There is no specific code:Scope for communication across invocations in a command scope instance. As this has a clear boundary at the API level, coordination here can: be performed in the API, rather than in SPIR-V. Each invocation in a command scope instance must: be in the same <>. 
For shaders without defined <>, this set of invocations forms an _invocation group_ as defined in the <>. [[shaders-scope-primitive]] === Primitive Any fragment shader invocations executed as the result of rasterization of a single primitive form a _primitive scope instance_. There is no specific code:Scope for communication across invocations in a primitive scope instance. Any generated <> are included in this scope instance. Each invocation in a primitive scope instance must: be in the same <>. Any input variables decorated with code:Flat are uniform within a primitive scope instance. // intentionally no VK_NV_ray_tracing here since this scope does not exist there ifdef::VK_KHR_ray_tracing_pipeline[] [[shaders-scope-shadercall]] === Shader Call Any <> invocations that are executed in one or more ray tracing execution models form a _shader call scope instance_. The code:ShaderCallKHR code:Scope can be used as code:Memory code:Scope for barrier and atomic operations. Each invocation in a shader call scope instance must: be in the same <>. endif::VK_KHR_ray_tracing_pipeline[] [[shaders-scope-workgroup]] === Workgroup A _local workgroup_ is a set of invocations that can synchronize and share data with each other using memory in the code:Workgroup storage class. The code:Workgroup code:Scope can be used as both an code:Execution code:Scope and code:Memory code:Scope for barrier and atomic operations. Each invocation in a local workgroup must: be in the same <>. Only ifdef::VK_NV_mesh_shader,VK_EXT_mesh_shader[] task, mesh, and endif::VK_NV_mesh_shader,VK_EXT_mesh_shader[] compute shaders have defined workgroups - other shader types cannot: use workgroup functionality. For shaders that have defined workgroups, this set of invocations forms an _invocation group_ as defined in the <>. ifdef::VK_VERSION_1_1[] [[shaders-scope-subgroup]] === Subgroup A _subgroup_ (see the subsection "`Control Flow`" of section 2 of the SPIR-V 1.3 Revision 1 specification) is a set of invocations that can synchronize and share data with each other efficiently. The code:Subgroup code:Scope can be used as both an code:Execution code:Scope and code:Memory code:Scope for barrier and atomic operations. Other <> allow the use of <> with subgroup scope. ifdef::VK_KHR_shader_clock[] If the <> feature is enabled, using the code:Subgroup code:Scope with the code:OpReadClockKHR instruction will read from a clock that is consistent across invocations in the same subgroup. endif::VK_KHR_shader_clock[] For <>, each invocation in a subgroup must: be in the same <>. In other shader stages, each invocation in a subgroup must: be in the same <>. Only <> have defined subgroups. endif::VK_VERSION_1_1[] [[shaders-scope-quad]] === Quad A _quad scope instance_ is formed of four shader invocations. In a fragment shader, each invocation in a quad scope instance is formed of invocations in neighboring framebuffer locations [eq]#(x~i~, y~i~)#, where: * [eq]#i# is the index of the invocation within the scope instance. * [eq]#w# and [eq]#h# are the number of pixels the fragment covers in the [eq]#x# and [eq]#y# axes. * [eq]#w# and [eq]#h# are identical for all participating invocations. * [eq]#(x~0~) = (x~1~ - w) = (x~2~) = (x~3~ - w)# * [eq]#(y~0~) = (y~1~) = (y~2~ - h) = (y~3~ - h)# * Each invocation has the same layer and sample indices. 
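[NOTE]
.Note
====
For example, for fragments covering a single pixel ([eq]#w = h = 1#), a quad scope instance is made up of the invocations at the four neighboring framebuffer locations [eq]#(x~0~, y~0~)#, [eq]#(x~0~ + 1, y~0~)#, [eq]#(x~0~, y~0~ + 1)#, and [eq]#(x~0~ + 1, y~0~ + 1)#, with [eq]#i# equal to 0, 1, 2, and 3, respectively.
====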
ifdef::VK_NV_compute_shader_derivatives[] In a compute shader, if the code:DerivativeGroupQuadsNV execution mode is specified, each invocation in a quad scope instance is formed of invocations with adjacent local invocation IDs [eq]#(x~i~, y~i~)#, where: * [eq]#i# is the index of the invocation within the quad scope instance. * [eq]#(x~0~) = (x~1~ - 1) = (x~2~) = (x~3~ - 1)# * [eq]#(y~0~) = (y~1~) = (y~2~ - 1) = (y~3~ - 1)# * [eq]#x~0~# and [eq]#y~0~# are integer multiples of 2. * Each invocation has the same [eq]#z# coordinate. In a compute shader, if the code:DerivativeGroupLinearNV execution mode is specified, each invocation in a quad scope instance is formed of invocations with adjacent local invocation indices [eq]#(l~i~)#, where: * [eq]#i# is the index of the invocation within the quad scope instance. * [eq]#(l~0~) = (l~1~ - 1) = (l~2~ - 2) = (l~3~ - 3)# * [eq]#l~0~# is an integer multiple of 4. endif::VK_NV_compute_shader_derivatives[] ifdef::VK_VERSION_1_1[] In all shaders, each invocation in a quad scope instance is formed of invocations in adjacent subgroup invocation indices [eq]#(s~i~)#, where: * [eq]#i# is the index of the invocation within the quad scope instance. * [eq]#(s~0~) = (s~1~ - 1) = (s~2~ - 2) = (s~3~ - 3)# * [eq]#s~0~# is an integer multiple of 4. Each invocation in a quad scope instance must: be in the same <>. endif::VK_VERSION_1_1[] ifndef::VK_VERSION_1_1[] The specific set of invocations that make up a quad scope instance in other shader stages is undefined:. endif::VK_VERSION_1_1[] In a fragment shader, each invocation in a quad scope instance must: be in the same <>. ifndef::VK_VERSION_1_1[] For <>, each invocation in a quad scope instance must: be in the same <>. In other shader stages, each invocation in a quad scope instance must: be in the same <>. endif::VK_VERSION_1_1[] Fragment ifdef::VK_NV_compute_shader_derivatives,VK_VERSION_1_1[] and compute endif::VK_NV_compute_shader_derivatives,VK_VERSION_1_1[] shaders have defined quad scope instances. ifdef::VK_VERSION_1_1[] If the <> limit is supported, any <> also have defined quad scope instances. endif::VK_VERSION_1_1[] ifdef::VK_EXT_fragment_shader_interlock[] [[shaders-scope-fragment-interlock]] === Fragment Interlock A _fragment interlock scope instance_ is formed of fragment shader invocations based on their framebuffer locations [eq]#(x,y,layer,sample)#, executed by commands inside a single <>. The specific set of invocations included varies based on the execution mode as follows: * If the code:SampleInterlockOrderedEXT or code:SampleInterlockUnorderedEXT execution modes are used, only invocations with identical framebuffer locations [eq]#(x,y,layer,sample)# are included. * If the code:PixelInterlockOrderedEXT or code:PixelInterlockUnorderedEXT execution modes are used, fragments with different sample ids are also included. ifdef::VK_NV_shading_rate_image,VK_KHR_fragment_shading_rate[] * If the code:ShadingRateInterlockOrderedEXT or code:ShadingRateInterlockUnorderedEXT execution modes are used, fragments from neighbouring framebuffer locations are also included. The ifdef::VK_NV_shading_rate_image[<>] ifdef::VK_KHR_fragment_shading_rate+VK_NV_shading_rate_image[or] ifdef::VK_KHR_fragment_shading_rate[<>] determines these fragments. endif::VK_NV_shading_rate_image,VK_KHR_fragment_shading_rate[] Only fragment shaders with one of the above execution modes have defined fragment interlock scope instances. 
There is no specific code:Scope value for communication across invocations in a fragment interlock scope instance. However, this is implicitly used as a memory scope by code:OpBeginInvocationInterlockEXT and code:OpEndInvocationInterlockEXT. Each invocation in a fragment interlock scope instance must: be in the same <>. endif::VK_EXT_fragment_shader_interlock[] [[shaders-scope-invocation]] === Invocation The smallest _scope_ is a single invocation; this is represented by the code:Invocation code:Scope in SPIR-V. Fragment shader invocations must: be in a <>. ifdef::VK_EXT_fragment_shader_interlock[] Invocations in <> must: be in a <>. endif::VK_EXT_fragment_shader_interlock[] Invocations in <> must: be in a <>. ifdef::VK_VERSION_1_1[] Invocations in <> must: be in a <>. endif::VK_VERSION_1_1[] Invocations in <> must: be in a <>. All invocations in all stages must: be in a <>. ifdef::VK_VERSION_1_1[] [[shaders-group-operations]] == Group Operations _Group operations_ are executed by multiple invocations within a <>; with each invocation involved in calculating the result. This provides a mechanism for efficient communication between invocations in a particular scope instance. Group operations all take a code:Scope defining the desired <> to operate within. Only the code:Subgroup scope can: be used for these operations; the <> limit defines which types of operation can: be used. [[shaders-group-operations-basic]] === Basic Group Operations Basic group operations include the use of code:OpGroupNonUniformElect, code:OpControlBarrier, code:OpMemoryBarrier, and atomic operations. code:OpGroupNonUniformElect can: be used to choose a single invocation to perform a task for the whole group. Only the invocation with the lowest id in the group will return code:true. The <> appendix defines the operation of barriers and atomics. [[shaders-group-operations-vote]] === Vote Group Operations The vote group operations allow invocations within a group to compare values across a group. The types of votes enabled are: * Do all active group invocations agree that an expression is true? * Do any active group invocations evaluate an expression to true? * Do all active group invocations have the same value of an expression? [NOTE] .Note ==== These operations are useful in combination with control flow in that they allow for developers to check whether conditions match across the group and choose potentially faster code-paths in these cases. ==== [[shaders-group-operations-arithmetic]] === Arithmetic Group Operations The arithmetic group operations allow invocations to perform scans and reductions across a group. The operators supported are add, mul, min, max, and, or, xor. For reductions, every invocation in a group will obtain the cumulative result of these operators applied to all values in the group. For exclusive scans, each invocation in a group will obtain the cumulative result of these operators applied to all values in invocations with a lower index in the group. Inclusive scans are identical to exclusive scans, except the cumulative result includes the operator applied to the value in the current invocation. The order in which these operators are applied is implementation-dependent. [[shaders-group-operations-ballot]] === Ballot Group Operations The ballot group operations allow invocations to perform more complex votes across the group. The ballot functionality allows all invocations within a group to provide a boolean value and get as a result what each invocation provided as their boolean value. 
The broadcast functionality allows values to be broadcast from an invocation to all other invocations within the group. [[shaders-group-operations-shuffle]] === Shuffle Group Operations The shuffle group operations allow invocations to read values from other invocations within a group. [[shaders-group-operations-shuffle-relative]] === Shuffle Relative Group Operations The shuffle relative group operations allow invocations to read values from other invocations within the group relative to the current invocation in the group. The relative operations supported allow data to be shifted up and down through the invocations within a group. [[shaders-group-operations-clustered]] === Clustered Group Operations The clustered group operations allow invocations to perform an operation among partitions of a group, such that the operation is only performed within the group invocations within a partition. The partitions for clustered group operations are consecutive power-of-two size groups of invocations and the cluster size must: be known at pipeline creation time. The operations supported are add, mul, min, max, and, or, xor. [[shaders-quad-operations]] == Quad Group Operations Quad group operations (code:OpGroupNonUniformQuad*) are a specialized type of <> that only operate on <>. Whilst these instructions do include a code:Scope parameter, this scope is always overridden; only the <> is included in its execution scope. Fragment shaders that statically execute quad group operations must: launch sufficient invocations to ensure their correct operation; additional <> are launched for framebuffer locations not covered by rasterized fragments if necessary. The index used to select participating invocations is [eq]#i#, as described for a <>, defined as the _quad index_ in the <>. For code:OpGroupNonUniformQuadBroadcast this value is equal to code:Index. For code:OpGroupNonUniformQuadSwap, it is equal to the implicit code:Index used by each participating invocation. endif::VK_VERSION_1_1[] [[shaders-derivative-operations]] == Derivative Operations Derivative operations calculate the partial derivative for an expression [eq]#P# as a function of an invocation's [eq]#x# and [eq]#y# coordinates. Derivative operations operate on a set of invocations known as a _derivative group_ as defined in the <>. A derivative group is equivalent to ifdef::VK_NV_compute_shader_derivatives[] the <> for a compute shader invocation, or endif::VK_NV_compute_shader_derivatives[] the <> for a fragment shader invocation. Derivatives are calculated assuming that [eq]#P# is piecewise linear and continuous within the derivative group. All dynamic instances of explicit derivative instructions (code:OpDPdx*, code:OpDPdy*, and code:OpFwidth*) must: be executed in control flow that is uniform within a derivative group. For other derivative operations, results are undefined: if a dynamic instance is executed in control flow that is not uniform within the derivative group. Fragment shaders that statically execute derivative operations must: launch sufficient invocations to ensure their correct operation; additional <> are launched for framebuffer locations not covered by rasterized fragments if necessary. ifdef::VK_NV_compute_shader_derivatives[] [NOTE] .Note ==== In a compute shader, it is the application's responsibility to ensure that sufficient invocations are launched. 
==== endif::VK_NV_compute_shader_derivatives[] Derivative operations calculate their results as the difference between the result of [eq]#P# across invocations in the quad. For fine derivative operations (code:OpDPdxFine and code:OpDPdyFine), the values of [eq]#DPdx(P~i~)# are calculated as {empty}:: [eq]#DPdx(P~0~) = DPdx(P~1~) = P~1~ - P~0~# {empty}:: [eq]#DPdx(P~2~) = DPdx(P~3~) = P~3~ - P~2~# and the values of [eq]#DPdy(P~i~)# are calculated as {empty}:: [eq]#DPdy(P~0~) = DPdy(P~2~) = P~2~ - P~0~# {empty}:: [eq]#DPdy(P~1~) = DPdy(P~3~) = P~3~ - P~1~# where [eq]#i# is the index of each invocation as described in <>. Coarse derivative operations (code:OpDPdxCoarse and code:OpDPdyCoarse) calculate their results in roughly the same manner, but may: only calculate two values instead of four (one for each of [eq]#DPdx# and [eq]#DPdy#), reusing the same result no matter the originating invocation. If an implementation does this, it should: use the fine derivative calculations described for [eq]#P~0~#. [NOTE] .Note ==== Derivative values are calculated between fragments rather than pixels. If the fragment shader invocations involved in the calculation cover multiple pixels, these operations cover a wider area, resulting in larger derivative values. This in turn will result in a coarser level of detail being selected for image sampling operations using derivatives. Applications may want to account for this when using multi-pixel fragments; if pixel derivatives are desired, applications should use explicit derivative operations and divide the results by the size of the fragment in each dimension as follows: {empty}:: [eq]#DPdx(P~n~)' = DPdx(P~n~) / w# {empty}:: [eq]#DPdy(P~n~)' = DPdy(P~n~) / h# where [eq]#w# and [eq]#h# are the width and height of the fragments in the quad, and [eq]#DPdx(P~n~)'# and [eq]#DPdy(P~n~)'# are the pixel derivatives. ==== The results for code:OpDPdx and code:OpDPdy may: be calculated as either fine or coarse derivatives, with implementations favouring the most efficient approach. Implementations must: choose coarse or fine consistently between the two. Executing code:OpFwidthFine, code:OpFwidthCoarse, or code:OpFwidth is equivalent to executing the corresponding code:OpDPdx* and code:OpDPdy* instructions, taking the absolute value of the results, and summing them. Executing an code:OpImage*Sample*ImplicitLod instruction is equivalent to executing code:OpDPdx(code:Coordinate) and code:OpDPdy(code:Coordinate), and passing the results as the code:Grad operands code:dx and code:dy. [NOTE] .Note ==== It is expected that using the code:ImplicitLod variants of sampling functions will be substantially more efficient than using the code:ExplicitLod variants with explicitly generated derivatives. ==== [[shaders-helper-invocations]] == Helper Invocations When performing <> ifdef::VK_VERSION_1_1[] or <> endif::VK_VERSION_1_1[] operations in a fragment shader, additional invocations may: be spawned in order to ensure correct results. These additional invocations are known as _helper invocations_ and can: be identified by a non-zero value in the code:HelperInvocation built-in. Stores and atomics performed by helper invocations must: not have any effect on memory except for the code:Function, code:Private, and code:Output storage classes, and values returned by atomic instructions in helper invocations are undefined:. [NOTE] .Note ==== While storage to code:Output storage class has an effect even in helper invocations, it does not mean that helper invocations have an effect on the framebuffer.
code:Output variables in fragment shaders can be read from as well, and they behave more like code:Private variables for the duration of the shader invocation. ==== For <> other than <> ifdef::VK_VERSION_1_1[] and <> endif::VK_VERSION_1_1[] operations, helper invocations may: be treated as inactive even if they would be considered otherwise active. ifdef::VK_VERSION_1_3,VK_EXT_shader_demote_to_helper_invocation[] Helper invocations may: become permanently inactive if all invocations in a quad scope instance become helper invocations. endif::VK_VERSION_1_3,VK_EXT_shader_demote_to_helper_invocation[] ifdef::VK_NV_cooperative_matrix[] == Cooperative Matrices A _cooperative matrix_ type is a SPIR-V type where the storage for and computations performed on the matrix are spread across the invocations in a scope instance. These types give the implementation freedom in how to optimize matrix multiplies. SPIR-V defines the types and instructions, but does not specify rules about what sizes/combinations are valid, and it is expected that different implementations may: support different sizes. [open,refpage='vkGetPhysicalDeviceCooperativeMatrixPropertiesNV',desc='Returns properties describing what cooperative matrix types are supported',type='protos'] -- To enumerate the supported cooperative matrix types and operations, call: include::{generated}/api/protos/vkGetPhysicalDeviceCooperativeMatrixPropertiesNV.adoc[] * pname:physicalDevice is the physical device. * pname:pPropertyCount is a pointer to an integer related to the number of cooperative matrix properties available or queried. * pname:pProperties is either `NULL` or a pointer to an array of slink:VkCooperativeMatrixPropertiesNV structures. If pname:pProperties is `NULL`, then the number of cooperative matrix properties available is returned in pname:pPropertyCount. Otherwise, pname:pPropertyCount must: point to a variable set by the user to the number of elements in the pname:pProperties array, and on return the variable is overwritten with the number of structures actually written to pname:pProperties. If pname:pPropertyCount is less than the number of cooperative matrix properties available, at most pname:pPropertyCount structures will be written, and ename:VK_INCOMPLETE will be returned instead of ename:VK_SUCCESS, to indicate that not all the available cooperative matrix properties were returned. include::{generated}/validity/protos/vkGetPhysicalDeviceCooperativeMatrixPropertiesNV.adoc[] -- [open,refpage='VkCooperativeMatrixPropertiesNV',desc='Structure specifying cooperative matrix properties',type='structs'] -- Each sname:VkCooperativeMatrixPropertiesNV structure describes a single supported combination of types for a matrix multiply/add operation (code:OpCooperativeMatrixMulAddNV). The multiply can: be described in terms of the following variables and types (in SPIR-V pseudocode): [source,c] ~~~~ %A is of type OpTypeCooperativeMatrixNV %AType %scope %MSize %KSize %B is of type OpTypeCooperativeMatrixNV %BType %scope %KSize %NSize %C is of type OpTypeCooperativeMatrixNV %CType %scope %MSize %NSize %D is of type OpTypeCooperativeMatrixNV %DType %scope %MSize %NSize %D = %A * %B + %C // using OpCooperativeMatrixMulAddNV ~~~~ A matrix multiply with these dimensions is known as an _MxNxK_ matrix multiply. The sname:VkCooperativeMatrixPropertiesNV structure is defined as: include::{generated}/api/structs/VkCooperativeMatrixPropertiesNV.adoc[] * pname:sType is the type of this structure. 
* pname:pNext is `NULL` or a pointer to a structure extending this structure. * pname:MSize is the number of rows in matrices A, C, and D. * pname:KSize is the number of columns in matrix A and rows in matrix B. * pname:NSize is the number of columns in matrices B, C, D. * pname:AType is the component type of matrix A, of type elink:VkComponentTypeNV. * pname:BType is the component type of matrix B, of type elink:VkComponentTypeNV. * pname:CType is the component type of matrix C, of type elink:VkComponentTypeNV. * pname:DType is the component type of matrix D, of type elink:VkComponentTypeNV. * pname:scope is the scope of all the matrix types, of type elink:VkScopeNV. If some types are preferred over other types (e.g. for performance), they should: appear earlier in the list enumerated by flink:vkGetPhysicalDeviceCooperativeMatrixPropertiesNV. At least one entry in the list must: have power of two values for all of pname:MSize, pname:KSize, and pname:NSize. include::{generated}/validity/structs/VkCooperativeMatrixPropertiesNV.adoc[] -- [open,refpage='VkScopeNV',desc='Specify SPIR-V scope',type='enums'] -- Possible values for elink:VkScopeNV include: include::{generated}/api/enums/VkScopeNV.adoc[] * ename:VK_SCOPE_DEVICE_NV corresponds to SPIR-V code:Device scope. * ename:VK_SCOPE_WORKGROUP_NV corresponds to SPIR-V code:Workgroup scope. * ename:VK_SCOPE_SUBGROUP_NV corresponds to SPIR-V code:Subgroup scope. * ename:VK_SCOPE_QUEUE_FAMILY_NV corresponds to SPIR-V code:QueueFamily scope. All enum values match the corresponding SPIR-V value. -- [open,refpage='VkComponentTypeNV',desc='Specify SPIR-V cooperative matrix component type',type='enums'] -- Possible values for elink:VkComponentTypeNV include: include::{generated}/api/enums/VkComponentTypeNV.adoc[] * ename:VK_COMPONENT_TYPE_FLOAT16_NV corresponds to SPIR-V code:OpTypeFloat 16. * ename:VK_COMPONENT_TYPE_FLOAT32_NV corresponds to SPIR-V code:OpTypeFloat 32. * ename:VK_COMPONENT_TYPE_FLOAT64_NV corresponds to SPIR-V code:OpTypeFloat 64. * ename:VK_COMPONENT_TYPE_SINT8_NV corresponds to SPIR-V code:OpTypeInt 8 1. * ename:VK_COMPONENT_TYPE_SINT16_NV corresponds to SPIR-V code:OpTypeInt 16 1. * ename:VK_COMPONENT_TYPE_SINT32_NV corresponds to SPIR-V code:OpTypeInt 32 1. * ename:VK_COMPONENT_TYPE_SINT64_NV corresponds to SPIR-V code:OpTypeInt 64 1. * ename:VK_COMPONENT_TYPE_UINT8_NV corresponds to SPIR-V code:OpTypeInt 8 0. * ename:VK_COMPONENT_TYPE_UINT16_NV corresponds to SPIR-V code:OpTypeInt 16 0. * ename:VK_COMPONENT_TYPE_UINT32_NV corresponds to SPIR-V code:OpTypeInt 32 0. * ename:VK_COMPONENT_TYPE_UINT64_NV corresponds to SPIR-V code:OpTypeInt 64 0. -- endif::VK_NV_cooperative_matrix[] ifdef::VK_EXT_validation_cache[] [[shaders-validation-cache]] == Validation Cache [open,refpage='VkValidationCacheEXT',desc='Opaque handle to a validation cache object',type='handles'] -- Validation cache objects allow the result of internal validation to be reused, both within a single application run and between multiple runs. Reuse within a single run is achieved by passing the same validation cache object when creating supported Vulkan objects. Reuse across runs of an application is achieved by retrieving validation cache contents in one run of an application, saving the contents, and using them to preinitialize a validation cache on a subsequent run. The contents of the validation cache objects are managed by the validation layers. 
Applications can: manage the host memory consumed by a validation cache object and control the amount of data retrieved from a validation cache object. Validation cache objects are represented by sname:VkValidationCacheEXT handles: include::{generated}/api/handles/VkValidationCacheEXT.adoc[] -- [open,refpage='vkCreateValidationCacheEXT',desc='Creates a new validation cache',type='protos'] -- To create validation cache objects, call: include::{generated}/api/protos/vkCreateValidationCacheEXT.adoc[] * pname:device is the logical device that creates the validation cache object. * pname:pCreateInfo is a pointer to a slink:VkValidationCacheCreateInfoEXT structure containing the initial parameters for the validation cache object. * pname:pAllocator controls host memory allocation as described in the <> chapter. * pname:pValidationCache is a pointer to a slink:VkValidationCacheEXT handle in which the resulting validation cache object is returned. [NOTE] .Note ==== Applications can: track and manage the total host memory size of a validation cache object using the pname:pAllocator. Applications can: limit the amount of data retrieved from a validation cache object in fname:vkGetValidationCacheDataEXT. Implementations should: not internally limit the total number of entries added to a validation cache object or the total host memory consumed. ==== Once created, a validation cache can: be passed to the fname:vkCreateShaderModule command by adding this object to the slink:VkShaderModuleCreateInfo structure's pname:pNext chain. If a slink:VkShaderModuleValidationCacheCreateInfoEXT object is included in the slink:VkShaderModuleCreateInfo::pname:pNext chain, and its pname:validationCache field is not dlink:VK_NULL_HANDLE, the implementation will query it for possible reuse opportunities and update it with new content. The use of the validation cache object in these commands is internally synchronized, and the same validation cache object can: be used in multiple threads simultaneously. [NOTE] .Note ==== Implementations should: make every effort to limit any critical sections to the actual accesses to the cache, which is expected to be significantly shorter than the duration of the fname:vkCreateShaderModule command. ==== include::{generated}/validity/protos/vkCreateValidationCacheEXT.adoc[] -- [open,refpage='VkValidationCacheCreateInfoEXT',desc='Structure specifying parameters of a newly created validation cache',type='structs'] -- The sname:VkValidationCacheCreateInfoEXT structure is defined as: include::{generated}/api/structs/VkValidationCacheCreateInfoEXT.adoc[] * pname:sType is the type of this structure. * pname:pNext is `NULL` or a pointer to a structure extending this structure. * pname:flags is reserved for future use. * pname:initialDataSize is the number of bytes in pname:pInitialData. If pname:initialDataSize is zero, the validation cache will initially be empty. * pname:pInitialData is a pointer to previously retrieved validation cache data. If the validation cache data is incompatible (as defined below) with the device, the validation cache will be initially empty. If pname:initialDataSize is zero, pname:pInitialData is ignored. 
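For example, an application might preinitialize a validation cache with data saved by a previous run, as in the following non-normative sketch; the code:load_cache_file helper is a hypothetical placeholder for the application's own persistence mechanism, and error handling is omitted for brevity.

[source,c]
~~~~
#include <vulkan/vulkan.h>
#include <stdlib.h>

/* Hypothetical helper: returns previously saved cache data, or NULL
 * (with *size set to zero) if no saved data is available. */
void *load_cache_file(const char *path, size_t *size);

VkValidationCacheEXT create_preinitialized_cache(VkDevice device, const char *path)
{
    /* VK_EXT_validation_cache commands are obtained through vkGetDeviceProcAddr */
    PFN_vkCreateValidationCacheEXT pfnCreateValidationCacheEXT =
        (PFN_vkCreateValidationCacheEXT)vkGetDeviceProcAddr(device, "vkCreateValidationCacheEXT");

    size_t dataSize = 0;
    void  *data     = load_cache_file(path, &dataSize);

    VkValidationCacheCreateInfoEXT createInfo = {
        .sType           = VK_STRUCTURE_TYPE_VALIDATION_CACHE_CREATE_INFO_EXT,
        .pNext           = NULL,
        .flags           = 0,
        .initialDataSize = dataSize, /* zero if no saved data was found */
        .pInitialData    = data,     /* ignored when initialDataSize is zero */
    };

    /* If the saved data is incompatible with the device or layer version,
     * the validation cache is simply created empty. */
    VkValidationCacheEXT cache = VK_NULL_HANDLE;
    pfnCreateValidationCacheEXT(device, &createInfo, NULL, &cache);

    free(data);
    return cache;
}
~~~~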
.Valid Usage **** * [[VUID-VkValidationCacheCreateInfoEXT-initialDataSize-01534]] If pname:initialDataSize is not `0`, it must: be equal to the size of pname:pInitialData, as returned by fname:vkGetValidationCacheDataEXT when pname:pInitialData was originally retrieved * [[VUID-VkValidationCacheCreateInfoEXT-initialDataSize-01535]] If pname:initialDataSize is not `0`, pname:pInitialData must: have been retrieved from a previous call to fname:vkGetValidationCacheDataEXT **** include::{generated}/validity/structs/VkValidationCacheCreateInfoEXT.adoc[] -- [open,refpage='VkValidationCacheCreateFlagsEXT',desc='Reserved for future use',type='flags'] -- include::{generated}/api/flags/VkValidationCacheCreateFlagsEXT.adoc[] tname:VkValidationCacheCreateFlagsEXT is a bitmask type for setting a mask, but is currently reserved for future use. -- [open,refpage='vkMergeValidationCachesEXT',desc='Combine the data stores of validation caches',type='protos'] -- Validation cache objects can: be merged using the command: include::{generated}/api/protos/vkMergeValidationCachesEXT.adoc[] * pname:device is the logical device that owns the validation cache objects. * pname:dstCache is the handle of the validation cache to merge results into. * pname:srcCacheCount is the length of the pname:pSrcCaches array. * pname:pSrcCaches is a pointer to an array of validation cache handles, which will be merged into pname:dstCache. The previous contents of pname:dstCache are included after the merge. [NOTE] .Note ==== The details of the merge operation are implementation-dependent, but implementations should: merge the contents of the specified validation caches and prune duplicate entries. ==== .Valid Usage **** * [[VUID-vkMergeValidationCachesEXT-dstCache-01536]] pname:dstCache must: not appear in the list of source caches **** include::{generated}/validity/protos/vkMergeValidationCachesEXT.adoc[] -- [open,refpage='vkGetValidationCacheDataEXT',desc='Get the data store from a validation cache',type='protos'] -- Data can: be retrieved from a validation cache object using the command: include::{generated}/api/protos/vkGetValidationCacheDataEXT.adoc[] * pname:device is the logical device that owns the validation cache. * pname:validationCache is the validation cache to retrieve data from. * pname:pDataSize is a pointer to a value related to the amount of data in the validation cache, as described below. * pname:pData is either `NULL` or a pointer to a buffer. If pname:pData is `NULL`, then the maximum size of the data that can: be retrieved from the validation cache, in bytes, is returned in pname:pDataSize. Otherwise, pname:pDataSize must: point to a variable set by the user to the size of the buffer, in bytes, pointed to by pname:pData, and on return the variable is overwritten with the amount of data actually written to pname:pData. If pname:pDataSize is less than the maximum size that can: be retrieved by the validation cache, at most pname:pDataSize bytes will be written to pname:pData, and fname:vkGetValidationCacheDataEXT will return ename:VK_INCOMPLETE instead of ename:VK_SUCCESS, to indicate that not all of the validation cache was returned. Any data written to pname:pData is valid and can: be provided as the pname:pInitialData member of the slink:VkValidationCacheCreateInfoEXT structure passed to fname:vkCreateValidationCacheEXT. Two calls to fname:vkGetValidationCacheDataEXT with the same parameters must: retrieve the same data unless a command that modifies the contents of the cache is called between them. 
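As a non-normative sketch of this two-call retrieval pattern, an application might save the cache contents for reuse in a later run as follows; the code:save_cache_file helper is a hypothetical placeholder and error handling is abbreviated.

[source,c]
~~~~
#include <vulkan/vulkan.h>
#include <stdlib.h>

/* Hypothetical helper that persists the retrieved blob for a future run */
void save_cache_file(const char *path, const void *data, size_t size);

void save_validation_cache(VkDevice device, VkValidationCacheEXT cache, const char *path)
{
    PFN_vkGetValidationCacheDataEXT pfnGetValidationCacheDataEXT =
        (PFN_vkGetValidationCacheDataEXT)vkGetDeviceProcAddr(device, "vkGetValidationCacheDataEXT");

    /* First call: query the maximum size of the data that can be retrieved */
    size_t dataSize = 0;
    if (pfnGetValidationCacheDataEXT(device, cache, &dataSize, NULL) != VK_SUCCESS || dataSize == 0)
        return;

    /* Second call: retrieve the data itself, beginning with the header described below */
    void *data = malloc(dataSize);
    if (data != NULL &&
        pfnGetValidationCacheDataEXT(device, cache, &dataSize, data) == VK_SUCCESS)
        save_cache_file(path, data, dataSize);

    free(data);
}
~~~~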
[[validation-cache-header]] Applications can: store the data retrieved from the validation cache, and use these data, possibly in a future run of the application, to populate new validation cache objects. The results of validation, however, may: depend on the vendor ID, device ID, driver version, and other details of the device. To enable applications to detect when previously retrieved data is incompatible with the device, the initial bytes written to pname:pData must: be a header consisting of the following members: .Layout for validation cache header version ename:VK_VALIDATION_CACHE_HEADER_VERSION_ONE_EXT [width="85%",cols="8%,21%,71%",options="header"] |==== | Offset | Size | Meaning | 0 | 4 | length in bytes of the entire validation cache header written as a stream of bytes, with the least significant byte first | 4 | 4 | a elink:VkValidationCacheHeaderVersionEXT value written as a stream of bytes, with the least significant byte first | 8 | ename:VK_UUID_SIZE | a layer commit ID expressed as a UUID, which uniquely identifies the version of the validation layers used to generate these validation results |==== The first four bytes encode the length of the entire validation cache header, in bytes. This value includes all fields in the header including the validation cache version field and the size of the length field. The next four bytes encode the validation cache version, as described for elink:VkValidationCacheHeaderVersionEXT. A consumer of the validation cache should: use the cache version to interpret the remainder of the cache header. If pname:pDataSize is less than what is necessary to store this header, nothing will be written to pname:pData and zero will be written to pname:pDataSize. include::{generated}/validity/protos/vkGetValidationCacheDataEXT.adoc[] -- [open,refpage='VkValidationCacheHeaderVersionEXT',desc='Encode validation cache version',type='enums',xrefs='vkCreateValidationCacheEXT vkGetValidationCacheDataEXT'] -- Possible values of the second group of four bytes in the header returned by flink:vkGetValidationCacheDataEXT, encoding the validation cache version, are: include::{generated}/api/enums/VkValidationCacheHeaderVersionEXT.adoc[] * ename:VK_VALIDATION_CACHE_HEADER_VERSION_ONE_EXT specifies version one of the validation cache. -- [open,refpage='vkDestroyValidationCacheEXT',desc='Destroy a validation cache object',type='protos'] -- To destroy a validation cache, call: include::{generated}/api/protos/vkDestroyValidationCacheEXT.adoc[] * pname:device is the logical device that destroys the validation cache object. * pname:validationCache is the handle of the validation cache to destroy. * pname:pAllocator controls host memory allocation as described in the <> chapter. .Valid Usage **** * [[VUID-vkDestroyValidationCacheEXT-validationCache-01537]] If sname:VkAllocationCallbacks were provided when pname:validationCache was created, a compatible set of callbacks must: be provided here * [[VUID-vkDestroyValidationCacheEXT-validationCache-01538]] If no sname:VkAllocationCallbacks were provided when pname:validationCache was created, pname:pAllocator must: be `NULL` **** include::{generated}/validity/protos/vkDestroyValidationCacheEXT.adoc[] -- endif::VK_EXT_validation_cache[]