// Copyright 2015-2021 The Khronos Group, Inc.
//
// SPDX-License-Identifier: CC-BY-4.0

[[shaders]]
= Shaders

A shader specifies programmable operations that execute for each vertex,
control point, tessellated vertex, primitive, fragment, or workgroup in the
corresponding stage(s) of the graphics and compute pipelines.

Graphics pipelines include vertex shader execution as a result of
<<drawing,primitive assembly>>, followed, if enabled, by tessellation
control and evaluation shaders operating on <<drawing-patch-lists,patches>>,
geometry shaders, if enabled, operating on primitives, and fragment shaders,
if present, operating on fragments generated by <<primsrast,Rasterization>>.
In this specification, vertex, tessellation control, tessellation evaluation
and geometry shaders are collectively referred to as
<<pipeline-graphics-subsets-pre-rasterization,pre-rasterization shader
stage>>s and occur in the logical pipeline before rasterization.
The fragment shader occurs logically after rasterization.

Only the compute shader stage is included in a compute pipeline.
Compute shaders operate on compute invocations in a workgroup.

Shaders can: read from input variables, and read from and write to output
variables.
Input and output variables can: be used to transfer data between shader
stages, or to allow the shader to interact with values that exist in the
execution environment.
Similarly, the execution environment provides constants describing
capabilities.

Shader variables are associated with execution environment-provided inputs
and outputs using _built-in_ decorations in the shader.
The available decorations for each stage are documented in the following
subsections.


[[shader-modules]]
== Shader Modules

[open,refpage='VkShaderModule',desc='Opaque handle to a shader module object',type='handles']
--
_Shader modules_ contain _shader code_ and one or more entry points.
Shaders are selected from a shader module by specifying an entry point as
part of <<pipelines,pipeline>> creation.
The stages of a pipeline can: use shaders that come from different modules.
The shader code defining a shader module must: be in the SPIR-V format, as
described by the <<spirvenv,Vulkan Environment for SPIR-V>> appendix.

Shader modules are represented by sname:VkShaderModule handles:

include::{generated}/api/handles/VkShaderModule.txt[]
--

[open,refpage='vkCreateShaderModule',desc='Creates a new shader module object',type='protos']
--
To create a shader module, call:

include::{generated}/api/protos/vkCreateShaderModule.txt[]

  * pname:device is the logical device that creates the shader module.
  * pname:pCreateInfo is a pointer to a slink:VkShaderModuleCreateInfo
    structure.
  * pname:pAllocator controls host memory allocation as described in the
    <<memory-allocation, Memory Allocation>> chapter.
  * pname:pShaderModule is a pointer to a slink:VkShaderModule handle in
    which the resulting shader module object is returned.

Once a shader module has been created, any entry points it contains can: be
used in pipeline shader stages as described in <<pipelines-compute,Compute
Pipelines>> and <<pipelines-graphics,Graphics Pipelines>>.

include::{generated}/validity/protos/vkCreateShaderModule.txt[]
--
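
[NOTE]
.Note
====
The following sketch is informative and not part of the normative text.
It shows one way an application might create a shader module from a SPIR-V
binary and then reference one of its entry points in a pipeline shader
stage; the parameter names (`spirvWords`, `spirvSizeInBytes`) and the use of
the `"main"` entry point are hypothetical.

[source,c]
----
#include <vulkan/vulkan.h>

// Informal sketch: create a shader module from a SPIR-V binary and describe
// a vertex shader stage that uses its "main" entry point.
// spirvWords/spirvSizeInBytes are assumed to hold valid SPIR-V code.
static VkResult createVertexStage(VkDevice device,
                                  const uint32_t* spirvWords,
                                  size_t spirvSizeInBytes,
                                  VkPipelineShaderStageCreateInfo* outStage)
{
    const VkShaderModuleCreateInfo createInfo = {
        .sType    = VK_STRUCTURE_TYPE_SHADER_MODULE_CREATE_INFO,
        .codeSize = spirvSizeInBytes,  // size in bytes, must be a multiple of 4
        .pCode    = spirvWords,        // pointer to the SPIR-V words
    };

    VkShaderModule module;
    VkResult result = vkCreateShaderModule(device, &createInfo, NULL, &module);
    if (result != VK_SUCCESS)
        return result;

    // The entry point is selected here, at pipeline creation time, by name.
    const VkPipelineShaderStageCreateInfo stage = {
        .sType  = VK_STRUCTURE_TYPE_PIPELINE_SHADER_STAGE_CREATE_INFO,
        .stage  = VK_SHADER_STAGE_VERTEX_BIT,
        .module = module,
        .pName  = "main",              // name of an OpEntryPoint in the module
    };
    *outStage = stage;
    return VK_SUCCESS;
}
----
====
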
[open,refpage='VkShaderModuleCreateInfo',desc='Structure specifying parameters of a newly created shader module',type='structs']
--
The sname:VkShaderModuleCreateInfo structure is defined as:

include::{generated}/api/structs/VkShaderModuleCreateInfo.txt[]

  * pname:sType is the type of this structure.
  * pname:pNext is `NULL` or a pointer to a structure extending this
    structure.
  * pname:flags is reserved for future use.
  * pname:codeSize is the size, in bytes, of the code pointed to by
    pname:pCode.
  * pname:pCode is a pointer to code that is used to create the shader
    module.
    The type and format of the code is determined from the content of the
    memory addressed by pname:pCode.

.Valid Usage
****
  * [[VUID-VkShaderModuleCreateInfo-codeSize-01085]]
    pname:codeSize must: be greater than 0
ifndef::VK_NV_glsl_shader[]
  * [[VUID-VkShaderModuleCreateInfo-codeSize-01086]]
    pname:codeSize must: be a multiple of 4
  * [[VUID-VkShaderModuleCreateInfo-pCode-01087]]
    pname:pCode must: point to valid SPIR-V code, formatted and packed as
    described by the <<spirv-spec,Khronos SPIR-V Specification>>
  * [[VUID-VkShaderModuleCreateInfo-pCode-01088]]
    pname:pCode must: adhere to the validation rules described by the
    <<spirvenv-module-validation, Validation Rules within a Module>> section
    of the <<spirvenv-capabilities,SPIR-V Environment>> appendix
endif::VK_NV_glsl_shader[]
ifdef::VK_NV_glsl_shader[]
  * [[VUID-VkShaderModuleCreateInfo-pCode-01376]]
    If pname:pCode is a pointer to SPIR-V code, pname:codeSize must: be a
    multiple of 4
  * [[VUID-VkShaderModuleCreateInfo-pCode-01377]]
    pname:pCode must: point to either valid SPIR-V code, formatted and
    packed as described by the <<spirv-spec,Khronos SPIR-V Specification>>
    or valid GLSL code which must: be written to the `GL_KHR_vulkan_glsl`
    extension specification
  * [[VUID-VkShaderModuleCreateInfo-pCode-01378]]
    If pname:pCode is a pointer to SPIR-V code, that code must: adhere to
    the validation rules described by the <<spirvenv-module-validation,
    Validation Rules within a Module>> section of the
    <<spirvenv-capabilities,SPIR-V Environment>> appendix
  * [[VUID-VkShaderModuleCreateInfo-pCode-01379]]
    If pname:pCode is a pointer to GLSL code, it must: be valid GLSL code
    written to the `GL_KHR_vulkan_glsl` GLSL extension specification
endif::VK_NV_glsl_shader[]
  * [[VUID-VkShaderModuleCreateInfo-pCode-01089]]
    pname:pCode must: declare the code:Shader capability for SPIR-V code
  * [[VUID-VkShaderModuleCreateInfo-pCode-01090]]
    pname:pCode must: not declare any capability that is not supported by
    the API, as described by the <<spirvenv-module-validation,
    Capabilities>> section of the <<spirvenv-capabilities,SPIR-V
    Environment>> appendix
  * [[VUID-VkShaderModuleCreateInfo-pCode-01091]]
    If pname:pCode declares any of the capabilities listed in the
    <<spirvenv-capabilities-table,SPIR-V Environment>> appendix, one of the
    corresponding requirements must: be satisfied
  * [[VUID-VkShaderModuleCreateInfo-pCode-04146]]
    pname:pCode must: not declare any SPIR-V extension that is not supported
    by the API, as described by the <<spirvenv-extensions, Extension>>
    section of the <<spirvenv-capabilities,SPIR-V Environment>> appendix
  * [[VUID-VkShaderModuleCreateInfo-pCode-04147]]
    If pname:pCode declares any of the SPIR-V extensions listed in the
    <<spirvenv-extensions-table,SPIR-V Environment>> appendix, one of the
    corresponding requirements must: be satisfied
****

include::{generated}/validity/structs/VkShaderModuleCreateInfo.txt[]
--
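
[NOTE]
.Note
====
As an informal illustration of the valid usage rules above, an application
loading a SPIR-V binary from a file might sanity-check it before passing it
to flink:vkCreateShaderModule.
The check below is only a sketch; the first word of a SPIR-V module is the
magic number 0x07230203, and pname:codeSize is expressed in bytes even
though pname:pCode is typed as a pointer to 32-bit words.

[source,c]
----
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

// Informal sketch: basic sanity checks on a buffer that is about to be used
// as VkShaderModuleCreateInfo::pCode / codeSize.
static bool looksLikeSpirv(const uint32_t* words, size_t sizeInBytes)
{
    if (sizeInBytes == 0 || (sizeInBytes % 4) != 0)
        return false;                 // codeSize must be a non-zero multiple of 4
    return words[0] == 0x07230203u;   // SPIR-V magic number, first word of the module
}
----
====
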
[open,refpage='VkShaderModuleCreateFlags',desc='Reserved for future use',type='flags']
--
include::{generated}/api/flags/VkShaderModuleCreateFlags.txt[]

tname:VkShaderModuleCreateFlags is a bitmask type for setting a mask, but is
currently reserved for future use.
--

ifdef::VK_EXT_validation_cache[]
include::{chapters}/VK_EXT_validation_cache/shader-module-validation-cache.txt[]
endif::VK_EXT_validation_cache[]


[open,refpage='vkDestroyShaderModule',desc='Destroy a shader module',type='protos']
--
To destroy a shader module, call:

include::{generated}/api/protos/vkDestroyShaderModule.txt[]

  * pname:device is the logical device that destroys the shader module.
  * pname:shaderModule is the handle of the shader module to destroy.
  * pname:pAllocator controls host memory allocation as described in the
    <<memory-allocation, Memory Allocation>> chapter.

A shader module can: be destroyed while pipelines created using its shaders
are still in use.

.Valid Usage
****
  * [[VUID-vkDestroyShaderModule-shaderModule-01092]]
    If sname:VkAllocationCallbacks were provided when pname:shaderModule was
    created, a compatible set of callbacks must: be provided here
  * [[VUID-vkDestroyShaderModule-shaderModule-01093]]
    If no sname:VkAllocationCallbacks were provided when pname:shaderModule
    was created, pname:pAllocator must: be `NULL`
****

include::{generated}/validity/protos/vkDestroyShaderModule.txt[]
--


[[shaders-execution]]
== Shader Execution

At each stage of the pipeline, multiple invocations of a shader may: execute
simultaneously.
Further, invocations of a single shader produced as the result of different
commands may: execute simultaneously.
The relative execution order of invocations of the same shader type is
undefined:.
Shader invocations may: complete in a different order than that in which the
primitives they originated from were drawn or dispatched by the application.
However, fragment shader outputs are written to attachments in
<<primsrast-order,rasterization order>>.

The relative execution order of invocations of different shader types is
largely undefined:.
However, when invoking a shader whose inputs are generated from a previous
pipeline stage, the shader invocations from the previous stage are
guaranteed to have executed far enough to generate input values for all
required inputs.


[[shaders-execution-memory-ordering]]
== Shader Memory Access Ordering

The order in which image or buffer memory is read or written by shaders is
largely undefined:.
For some shader types (vertex, tessellation evaluation, and in some cases,
fragment), even the number of shader invocations that may: perform loads and
stores is undefined:.

In particular, the following rules apply:

  * <<shaders-vertex-execution,Vertex>> and
    <<shaders-tessellation-evaluation-execution,tessellation evaluation>>
    shaders will be invoked at least once for each unique vertex, as defined
    in those sections.
  * <<fragops-shader,Fragment>> shaders will be invoked zero or more times,
    as defined in that section.
  * The relative execution order of invocations of the same shader type is
    undefined:.
    A store issued by a shader when working on primitive B might complete
    prior to a store for primitive A, even if primitive A is specified prior
    to primitive B. This applies even to fragment shaders; while fragment
    shader outputs are always written to the framebuffer in
    <<primsrast-order, rasterization order>>, stores executed by fragment
    shader invocations are not.
  * The relative execution order of invocations of different shader types is
    largely undefined:.

[NOTE]
.Note
====
The above limitations on shader invocation order make some forms of
synchronization between shader invocations within a single set of primitives
unimplementable.
For example, having one invocation poll memory written by another invocation
assumes that the other invocation has been launched and will complete its
writes in finite time.
====

ifdef::VK_VERSION_1_2,VK_KHR_vulkan_memory_model[]

The <<memory-model,Memory Model>> appendix defines the terminology and rules
for how to correctly communicate between shader invocations, such as when a
write is <<memory-model-visible-to,Visible-To>> a read, and what constitutes
a <<memory-model-access-data-race,Data Race>>.

Applications must: not cause a data race.

endif::VK_VERSION_1_2,VK_KHR_vulkan_memory_model[]

ifndef::VK_VERSION_1_2,VK_KHR_vulkan_memory_model[]

Stores issued to different memory locations within a single shader
invocation may: not be visible to other invocations, or may: not become
visible in the order they were performed.

The code:OpMemoryBarrier instruction can: be used to provide stronger
ordering of reads and writes performed by a single invocation.
code:OpMemoryBarrier guarantees that any memory transactions issued by the
shader invocation prior to the instruction complete prior to the memory
transactions issued after the instruction.
Memory barriers are needed for algorithms that require multiple invocations
to access the same memory and require the operations to be performed in a
partially-defined relative order.
For example, if one shader invocation does a series of writes, followed by
an code:OpMemoryBarrier instruction, followed by another write, then the
results of the series of writes before the barrier become visible to other
shader invocations no later than when the results of the final write become
visible to those invocations.
In practice, this means that another invocation that sees the results of the
final write would also see the previous writes.
Without the memory barrier, the final write may: be visible before the
previous writes.

Writes that are the result of shader stores through a variable decorated
with code:Coherent automatically have available writes to the same buffer,
buffer view, or image view made visible to them, and are themselves
automatically made available to access by the same buffer, buffer view, or
image view.
Reads that are the result of shader loads through a variable decorated with
code:Coherent automatically have available writes to the same buffer, buffer
view, or image view made visible to them.
The order that coherent writes to different locations become available is
undefined:, unless enforced by a memory barrier instruction or other memory
dependency.
[NOTE]
.Note
====
Explicit memory dependencies must: still be used to guarantee availability
and visibility for access via other buffers, buffer views, or image views.
====

The built-in atomic memory transaction instructions can: be used to read and
write a given memory address atomically.
While built-in atomic functions issued by multiple shader invocations are
executed in undefined: order relative to each other, these functions perform
both a read and a write of a memory address and guarantee that no other
memory transaction will write to the underlying memory between the read and
write.
Atomic operations ensure automatic availability and visibility for writes
and reads in the same way as those to code:Coherent variables.

[NOTE]
.Note
====
Memory accesses performed on different resource descriptors with the same
memory backing may: not be well-defined even with the code:Coherent
decoration or via atomics, due to things such as image layouts or ownership
of the resource - as described in the <<synchronization, Synchronization and
Cache Control>> chapter.
====

[NOTE]
.Note
====
Atomics allow shaders to use shared global addresses for mutual exclusion or
as counters, among other uses.
====

endif::VK_VERSION_1_2,VK_KHR_vulkan_memory_model[]

The SPIR-V *SubgroupMemory*, *CrossWorkgroupMemory*, and
*AtomicCounterMemory* memory semantics are ignored.
Sequentially consistent atomics and barriers are not supported and
*SequentiallyConsistent* is treated as *AcquireRelease*.
*SequentiallyConsistent* should: not be used.


[[shaders-inputs]]
== Shader Inputs and Outputs

Data is passed into and out of shaders using variables with input or output
storage class, respectively.
User-defined inputs and outputs are connected between stages by matching
their code:Location decorations.
Additionally, data can: be provided by or communicated to special functions
provided by the execution environment using code:BuiltIn decorations.

In many cases, the same code:BuiltIn decoration can: be used in multiple
shader stages with similar meaning.
The specific behavior of variables decorated as code:BuiltIn is documented
in the following sections.


ifdef::VK_NV_mesh_shader[]
[[shaders-task]]
== Task Shaders

Task shaders operate in conjunction with the mesh shaders to produce a
collection of primitives that will be processed by subsequent stages of the
graphics pipeline.
Their primary purpose is to create a variable number of subsequent mesh
shader invocations.

Task shaders are invoked via the execution of the
<<drawing-mesh-shading,programmable mesh shading>> pipeline.

The task shader has no fixed-function inputs other than variables
identifying the specific workgroup and invocation.
The only fixed output of the task shader is a task count, identifying the
number of mesh shader workgroups to create.
The task shader can write additional outputs to task memory, which can be
read by all of the mesh shader workgroups it created.


=== Task Shader Execution

Task workloads are formed from groups of work items called workgroups and
processed by the task shader in the current graphics pipeline.
A workgroup is a collection of shader invocations that execute the same
shader, potentially in parallel.
Task shaders execute in _global workgroups_ which are divided into a number
of _local workgroups_ with a size that can: be set by assigning a value to
the code:LocalSize
ifdef::VK_KHR_maintenance4[or code:LocalSizeId]
execution mode or via an object decorated by the code:WorkgroupSize
decoration.
An invocation within a local workgroup can: share data with other members of
the local workgroup through shared variables and issue memory and control
flow barriers to synchronize with other members of the local workgroup.


[[shaders-mesh]]
== Mesh Shaders

Mesh shaders operate in workgroups to produce a collection of primitives
that will be processed by subsequent stages of the graphics pipeline.
Each workgroup emits zero or more output primitives and the group of
vertices and their associated data required for each output primitive.

Mesh shaders are invoked via the execution of the
<<drawing-mesh-shading,programmable mesh shading>> pipeline.

The only inputs available to the mesh shader are variables identifying the
specific workgroup and invocation and, if applicable, any outputs written to
task memory by the task shader that spawned the mesh shader's workgroup.
The mesh shader can operate without a task shader as well.

The invocations of the mesh shader workgroup write an output mesh,
comprising a set of primitives with per-primitive attributes, a set of
vertices with per-vertex attributes, and an array of indices identifying the
mesh vertices that belong to each primitive.
The primitives of this mesh are then processed by subsequent graphics
pipeline stages, where the outputs of the mesh shader form an interface with
the fragment shader.


=== Mesh Shader Execution

Mesh workloads are formed from groups of work items called workgroups and
processed by the mesh shader in the current graphics pipeline.
A workgroup is a collection of shader invocations that execute the same
shader, potentially in parallel.
Mesh shaders execute in _global workgroups_ which are divided into a number
of _local workgroups_ with a size that can: be set by assigning a value to
the code:LocalSize
ifdef::VK_KHR_maintenance4[or code:LocalSizeId]
execution mode or via an object decorated by the code:WorkgroupSize
decoration.
An invocation within a local workgroup can: share data with other members of
the local workgroup through shared variables and issue memory and control
flow barriers to synchronize with other members of the local workgroup.

The _global workgroups_ may be generated explicitly via the API, or
implicitly through the task shader's work creation mechanism.
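
[NOTE]
.Note
====
As an informal illustration (not part of the normative text), the explicit
path launches task or mesh shader global workgroups directly from a draw
command recorded in a command buffer; the task count and first task values
below are arbitrary examples.

[source,c]
----
#include <vulkan/vulkan.h>

// Informal sketch: launch 64 task shader workgroups (or, if the bound mesh
// shading pipeline has no task shader, 64 mesh shader workgroups), starting
// at task index 0. The command comes from VK_NV_mesh_shader and is
// typically obtained via vkGetDeviceProcAddr.
static void recordMeshDraw(VkCommandBuffer commandBuffer)
{
    vkCmdDrawMeshTasksNV(commandBuffer, 64u /*taskCount*/, 0u /*firstTask*/);
}
----
====
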
endif::VK_NV_mesh_shader[]


[[shaders-vertex]]
== Vertex Shaders

Each vertex shader invocation operates on one vertex and its associated
<<fxvertex-attrib,vertex attribute>> data, and outputs one vertex and
associated data.
ifndef::VK_NV_mesh_shader[]
Graphics pipelines must: include a vertex shader, and the vertex shader
stage is always the first shader stage in the graphics pipeline.
endif::VK_NV_mesh_shader[]
ifdef::VK_NV_mesh_shader[]
Graphics pipelines using primitive shading must: include a vertex shader,
and the vertex shader stage is always the first shader stage in the graphics
pipeline.
endif::VK_NV_mesh_shader[]


[[shaders-vertex-execution]]
=== Vertex Shader Execution

A vertex shader must: be executed at least once for each vertex specified by
a drawing command.
ifdef::VK_VERSION_1_1,VK_KHR_multiview[]
If the subpass includes multiple views in its view mask, the shader may: be
invoked separately for each view.
endif::VK_VERSION_1_1,VK_KHR_multiview[]
During execution, the shader is presented with the index of the vertex and
instance for which it has been invoked.
Input variables declared in the vertex shader are filled by the
implementation with the values of vertex attributes associated with the
invocation being executed.

If the same vertex is specified multiple times in a drawing command (e.g. by
including the same index value multiple times in an index buffer) the
implementation may: reuse the results of vertex shading if it can statically
determine that the vertex shader invocations will produce identical results.

[NOTE]
.Note
====
It is implementation-dependent when and if results of vertex shading are
reused, and thus how many times the vertex shader will be executed.
This is true also if the vertex shader contains stores or atomic operations
(see <<features-vertexPipelineStoresAndAtomics,
pname:vertexPipelineStoresAndAtomics>>).
====


[[shaders-tessellation-control]]
== Tessellation Control Shaders

The tessellation control shader is used to read an input patch provided by
the application and to produce an output patch.
Each tessellation control shader invocation operates on an input patch
(after all control points in the patch are processed by a vertex shader) and
its associated data, and outputs a single control point of the output patch
and its associated data, and can: also output additional per-patch data.
The input patch is sized according to the pname:patchControlPoints member of
slink:VkPipelineTessellationStateCreateInfo, as part of input assembly.

ifdef::VK_EXT_extended_dynamic_state2[]
The input patch can also be dynamically sized with the
pname:patchControlPoints parameter of flink:vkCmdSetPatchControlPointsEXT.

[open,refpage='vkCmdSetPatchControlPointsEXT',desc='Specify the number of control points per patch dynamically for a command buffer',type='protos']
--
To <<pipelines-dynamic-state, dynamically set>> the number of control points
per patch, call:

include::{generated}/api/protos/vkCmdSetPatchControlPointsEXT.txt[]

  * pname:commandBuffer is the command buffer into which the command will be
    recorded.
  * pname:patchControlPoints specifies the number of control points per
    patch.

This command sets the number of control points per patch for subsequent
drawing commands when the graphics pipeline is created with
ename:VK_DYNAMIC_STATE_PATCH_CONTROL_POINTS_EXT set in
slink:VkPipelineDynamicStateCreateInfo::pname:pDynamicStates.
Otherwise, this state is specified by the
slink:VkPipelineTessellationStateCreateInfo::pname:patchControlPoints value
used to create the currently active pipeline.

.Valid Usage
****
  * [[VUID-vkCmdSetPatchControlPointsEXT-None-04873]]
    The <<features-extendedDynamicState2PatchControlPoints,
    extendedDynamicState2PatchControlPoints>> feature must: be enabled
  * [[VUID-vkCmdSetPatchControlPointsEXT-patchControlPoints-04874]]
    pname:patchControlPoints must: be greater than zero and less than or
    equal to sname:VkPhysicalDeviceLimits::pname:maxTessellationPatchSize
****

include::{generated}/validity/protos/vkCmdSetPatchControlPointsEXT.txt[]
--
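
[NOTE]
.Note
====
The following sketch is informative only.
It shows a pipeline declaring the patch control point count as dynamic
state, with the count then set at record time; the surrounding pipeline
setup is omitted and the chosen value of 4 control points is arbitrary.

[source,c]
----
#include <vulkan/vulkan.h>

// Informal sketch: leave the patch control point count dynamic at pipeline
// creation, then set it while recording. Requires the
// extendedDynamicState2PatchControlPoints feature; the command comes from
// VK_EXT_extended_dynamic_state2 and is typically obtained via
// vkGetDeviceProcAddr.
static const VkDynamicState dynamicStates[] = {
    VK_DYNAMIC_STATE_PATCH_CONTROL_POINTS_EXT,
};

static const VkPipelineDynamicStateCreateInfo dynamicStateInfo = {
    .sType             = VK_STRUCTURE_TYPE_PIPELINE_DYNAMIC_STATE_CREATE_INFO,
    .dynamicStateCount = 1,
    .pDynamicStates    = dynamicStates,
};
// dynamicStateInfo would be referenced by VkGraphicsPipelineCreateInfo::pDynamicState.

static void recordPatchDraw(VkCommandBuffer commandBuffer)
{
    // Subsequent patch-list draws consume 4 control points per patch.
    vkCmdSetPatchControlPointsEXT(commandBuffer, 4u);
    vkCmdDraw(commandBuffer, 4u /*vertexCount*/, 1u, 0u, 0u);
}
----
====
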
endif::VK_EXT_extended_dynamic_state2[]

The size of the output patch is controlled by the code:OpExecutionMode
code:OutputVertices specified in the tessellation control or tessellation
evaluation shaders, which must: be specified in at least one of the shaders.
The size of the input and output patches must: each be greater than zero and
less than or equal to
sname:VkPhysicalDeviceLimits::pname:maxTessellationPatchSize.


[[shaders-tessellation-control-execution]]
=== Tessellation Control Shader Execution

A tessellation control shader is invoked at least once for each _output_
vertex in a patch.
ifdef::VK_VERSION_1_1,VK_KHR_multiview[]
If the subpass includes multiple views in its view mask, the shader may: be
invoked separately for each view.
endif::VK_VERSION_1_1,VK_KHR_multiview[]

Inputs to the tessellation control shader are generated by the vertex
shader.
Each invocation of the tessellation control shader can: read the attributes
of any incoming vertices and their associated data.
The invocations corresponding to a given patch execute logically in
parallel, with undefined: relative execution order.
However, the code:OpControlBarrier instruction can: be used to provide
limited control of the execution order by synchronizing invocations within a
patch, effectively dividing tessellation control shader execution into a set
of phases.
Tessellation control shaders will read undefined: values if one invocation
reads a per-vertex or per-patch output written by another invocation at any
point during the same phase, or if two invocations attempt to write
different values to the same per-patch output in a single phase.


[[shaders-tessellation-evaluation]]
== Tessellation Evaluation Shaders

The tessellation evaluation shader operates on an input patch of control
points and their associated data, and a single input barycentric coordinate
indicating the invocation's relative position within the subdivided patch,
and outputs a single vertex and its associated data.


[[shaders-tessellation-evaluation-execution]]
=== Tessellation Evaluation Shader Execution

A tessellation evaluation shader is invoked at least once for each unique
vertex generated by the tessellator.
ifdef::VK_VERSION_1_1,VK_KHR_multiview[]
If the subpass includes multiple views in its view mask, the shader may: be
invoked separately for each view.
endif::VK_VERSION_1_1,VK_KHR_multiview[]


[[shaders-geometry]]
== Geometry Shaders

The geometry shader operates on a group of vertices and their associated
data assembled from a single input primitive, and emits zero or more output
primitives and the group of vertices and their associated data required for
each output primitive.
[[shaders-geometry-execution]]
=== Geometry Shader Execution

A geometry shader is invoked at least once for each primitive produced by
the tessellation stages, or at least once for each primitive generated by
<<drawing,primitive assembly>> when tessellation is not in use.
A shader can request that the geometry shader runs multiple
<<geometry-invocations, instances>>.
A geometry shader is invoked at least once for each instance.
ifdef::VK_VERSION_1_1,VK_KHR_multiview[]
If the subpass includes multiple views in its view mask, the shader may: be
invoked separately for each view.
endif::VK_VERSION_1_1,VK_KHR_multiview[]


[[shaders-fragment]]
== Fragment Shaders

Fragment shaders are invoked as a <<fragops-shader, fragment operation>> in
a graphics pipeline.
Each fragment shader invocation operates on a single fragment and its
associated data.
With few exceptions, fragment shaders do not have access to any data
associated with other fragments and are considered to execute in isolation
of fragment shader invocations associated with other fragments.


[[shaders-compute]]
== Compute Shaders

Compute shaders are invoked via flink:vkCmdDispatch and
flink:vkCmdDispatchIndirect commands.
In general, they have access to similar resources as shader stages executing
as part of a graphics pipeline.

Compute workloads are formed from groups of work items called workgroups and
processed by the compute shader in the current compute pipeline.
A workgroup is a collection of shader invocations that execute the same
shader, potentially in parallel.
Compute shaders execute in _global workgroups_ which are divided into a
number of _local workgroups_ with a size that can: be set by assigning a
value to the code:LocalSize
ifdef::VK_KHR_maintenance4[or code:LocalSizeId]
execution mode or via an object decorated by the code:WorkgroupSize
decoration.
An invocation within a local workgroup can: share data with other members of
the local workgroup through shared variables and issue memory and control
flow barriers to synchronize with other members of the local workgroup.
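
[NOTE]
.Note
====
As an informal, non-normative illustration, the total number of compute
shader invocations is the product of the global workgroup counts passed to
the dispatch command and the local workgroup size declared by the shader.
The 8x8x1 local size and 1920x1080 problem size below are arbitrary example
values.

[source,c]
----
#include <vulkan/vulkan.h>

// Informal sketch: dispatch enough 8x8x1 local workgroups to cover a
// 1920x1080 grid, rounding up so every element is covered.
// The bound compute pipeline's shader is assumed to declare LocalSize 8 8 1.
static void recordDispatch(VkCommandBuffer commandBuffer)
{
    const uint32_t localSizeX = 8u, localSizeY = 8u;
    const uint32_t width = 1920u, height = 1080u;

    vkCmdDispatch(commandBuffer,
                  (width  + localSizeX - 1u) / localSizeX,   // groupCountX = 240
                  (height + localSizeY - 1u) / localSizeY,   // groupCountY = 135
                  1u);                                       // groupCountZ
}
----
====
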
ifdef::VK_NV_ray_tracing,VK_KHR_ray_tracing_pipeline[]
[[shaders-raytracing-shaders]]
[[shaders-ray-generation]]
== Ray Generation Shaders

A ray generation shader is similar to a compute shader.
Its main purpose is to execute ray tracing queries using code:OpTraceRayKHR
instructions and process the results.


[[shaders-ray-generation-execution]]
=== Ray Generation Shader Execution

One ray generation shader is executed per ray tracing dispatch.
Its location in the shader binding table (see <<shader-binding-table,Shader
Binding Table>> for details) is passed directly into flink:vkCmdTraceRaysKHR
using the pname:pRaygenShaderBindingTable parameter.


[[shaders-intersection]]
== Intersection Shaders

Intersection shaders enable the implementation of arbitrary,
application-defined geometric primitives.
An intersection shader for a primitive is executed whenever its axis-aligned
bounding box is hit by a ray.

Like other ray tracing shader domains, an intersection shader operates on a
single ray at a time.
It also operates on a single primitive at a time.
It is therefore the purpose of an intersection shader to compute the
ray-primitive intersections and report them.
To report an intersection, the shader calls the code:OpReportIntersectionKHR
instruction.

An intersection shader communicates with any-hit and closest hit shaders by
generating attribute values that they can: read.
Intersection shaders cannot: read or modify the ray payload.


[[shaders-intersection-execution]]
=== Intersection Shader Execution

The order in which intersections are found along a ray, and therefore the
order in which intersection shaders are executed, is unspecified.

The intersection shader of the closest AABB which intersects the ray is
guaranteed to be executed at some point during traversal, unless the ray is
forcibly terminated.


[[shaders-any-hit]]
== Any-Hit Shaders

The any-hit shader is executed after the intersection shader reports an
intersection that lies within the current [eq]#[t~min~,t~max~]# of the ray.
The main use of any-hit shaders is to programmatically decide whether or not
an intersection will be accepted.
The intersection will be accepted unless the shader calls the
code:OpIgnoreIntersectionKHR instruction.
Any-hit shaders have read-only access to the attributes generated by the
corresponding intersection shader, and can: read or modify the ray payload.


[[shaders-any-hit-execution]]
=== Any-Hit Shader Execution

The order in which intersections are found along a ray, and therefore the
order in which any-hit shaders are executed, is unspecified.

The any-hit shader of the closest hit is guaranteed to be executed at some
point during traversal, unless the ray is forcibly terminated.


[[shaders-closest-hit]]
== Closest Hit Shaders

Closest hit shaders have read-only access to the attributes generated by the
corresponding intersection shader, and can: read or modify the ray payload.
They also have access to a number of system-generated values.
Closest hit shaders can: call code:OpTraceRayKHR to recursively trace rays.


[[shaders-closest-hit-execution]]
=== Closest Hit Shader Execution

Exactly one closest hit shader is executed when traversal is finished and an
intersection has been found and accepted.


[[shaders-miss]]
== Miss Shaders

Miss shaders can: access the ray payload and can: trace new rays through the
code:OpTraceRayKHR instruction, but cannot: access attributes since they are
not associated with an intersection.


[[shaders-miss-execution]]
=== Miss Shader Execution

A miss shader is executed instead of a closest hit shader if no intersection
was found during traversal.


[[shaders-callable]]
== Callable Shaders

Callable shaders can: access a callable payload that works similarly to ray
payloads to do subroutine work.


[[shaders-callable-execution]]
=== Callable Shader Execution

A callable shader is executed by calling code:OpExecuteCallableKHR from an
allowed shader stage.
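
[NOTE]
.Note
====
The following non-normative sketch ties the ray tracing stages above
together: a trace command selects the single ray generation shader through
the ray generation region of the shader binding table, while the miss, hit,
and callable regions select the remaining stages.
The device addresses, strides, and sizes in the regions are assumed to have
been set up when the shader binding table buffer was built, and the
1920x1080x1 dispatch size is an arbitrary example.

[source,c]
----
#include <vulkan/vulkan.h>

// Informal sketch: launch a 1920x1080 ray tracing dispatch.
// Each VkStridedDeviceAddressRegionKHR is assumed to point at the
// corresponding group of shader group handles in a shader binding table.
static void recordTraceRays(VkCommandBuffer commandBuffer,
                            const VkStridedDeviceAddressRegionKHR* raygenRegion,
                            const VkStridedDeviceAddressRegionKHR* missRegion,
                            const VkStridedDeviceAddressRegionKHR* hitRegion,
                            const VkStridedDeviceAddressRegionKHR* callableRegion)
{
    vkCmdTraceRaysKHR(commandBuffer,
                      raygenRegion,    // one ray generation shader per dispatch
                      missRegion,
                      hitRegion,       // intersection/any-hit/closest hit groups
                      callableRegion,
                      1920u, 1080u, 1u);
}
----
====
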
endif::VK_NV_ray_tracing,VK_KHR_ray_tracing_pipeline[]


[[shaders-interpolation-decorations]]
== Interpolation Decorations

Interpolation decorations control the behavior of attribute interpolation in
the fragment shader stage.
Interpolation decorations can: be applied to code:Input storage class
variables in the fragment shader stage's interface, and control the
interpolation behavior of those variables.

Inputs that could be interpolated can: be decorated by at most one of the
following decorations:

  * code:Flat: no interpolation
  * code:NoPerspective: linear interpolation (for
    <<line_linear_interpolation,lines>> and
    <<triangle_linear_interpolation,polygons>>)
ifdef::VK_NV_fragment_shader_barycentric[]
  * code:PerVertexNV: values fetched from a shader-specified vertex of the
    primitive
endif::VK_NV_fragment_shader_barycentric[]

Fragment input variables decorated with neither code:Flat nor
code:NoPerspective use perspective-correct interpolation (for
<<line_perspective_interpolation,lines>> and
<<triangle_perspective_interpolation,polygons>>).

The presence and type of interpolation is controlled by the above
interpolation decorations as well as the auxiliary decorations code:Centroid
and code:Sample.

A variable decorated with code:Flat will not be interpolated.
Instead, it will have the same value for every fragment within a triangle.
This value will come from a single <<vertexpostproc-flatshading,provoking
vertex>>.
A variable decorated with code:Flat can: also be decorated with
code:Centroid or code:Sample, which will mean the same thing as decorating
it only as code:Flat.

For fragment shader input variables decorated with neither code:Centroid nor
code:Sample, the assigned variable may: be interpolated anywhere within the
fragment and a single value may: be assigned to each sample within the
fragment.

If a fragment shader input is decorated with code:Centroid, a single value
may: be assigned to that variable for all samples in the fragment, but that
value must: be interpolated to a location that lies in both the fragment and
in the primitive being rendered, including any of the fragment's samples
covered by the primitive.
Because the location at which the variable is interpolated may: be different
in neighboring fragments, and derivatives may: be computed by computing
differences between neighboring fragments, derivatives of centroid-sampled
inputs may: be less accurate than those for non-centroid interpolated
variables.
ifdef::VK_EXT_post_depth_coverage[]
The code:PostDepthCoverage execution mode does not affect the determination
of the centroid location.
endif::VK_EXT_post_depth_coverage[]

If a fragment shader input is decorated with code:Sample, a separate value
must: be assigned to that variable for each covered sample in the fragment,
and that value must: be sampled at the location of the individual sample.
When pname:rasterizationSamples is ename:VK_SAMPLE_COUNT_1_BIT, the fragment
center must: be used for code:Centroid, code:Sample, and undecorated
attribute interpolation.

Fragment shader inputs that are signed or unsigned integers, integer
vectors, or any double-precision floating-point type must: be decorated with
code:Flat.

ifdef::VK_AMD_shader_explicit_vertex_parameter[]
When the `apiext:VK_AMD_shader_explicit_vertex_parameter` device extension
is enabled, inputs can: also be decorated with the code:CustomInterpAMD
interpolation decoration, including fragment shader inputs that are signed
or unsigned integers, integer vectors, or any double-precision
floating-point type.
Inputs decorated with code:CustomInterpAMD can: only be accessed by the
extended instruction code:InterpolateAtVertexAMD, which allows accessing the
value of the input for individual vertices of the primitive.
endif::VK_AMD_shader_explicit_vertex_parameter[]

ifdef::VK_NV_fragment_shader_barycentric[]
[[shaders-interpolation-decorations-pervertexnv]]
When the pname:fragmentShaderBarycentric feature is enabled, inputs can:
also be decorated with the code:PerVertexNV interpolation decoration,
including fragment shader inputs that are signed or unsigned integers,
integer vectors, or any double-precision floating-point type.
Inputs decorated with code:PerVertexNV can: only be accessed using an extra
array dimension, where the extra index identifies one of the vertices of the
primitive that produced the fragment.
endif::VK_NV_fragment_shader_barycentric[]


[[shaders-staticuse]]
== Static Use

A SPIR-V module declares a global object in memory using the code:OpVariable
instruction, which results in a pointer code:x to that object.
A specific entry point in a SPIR-V module is said to _statically use_ that
object if that entry point's call tree contains a function containing a
memory instruction or image instruction with code:x as an code:id operand.
See the "`Memory Instructions`" and "`Image Instructions`" subsections of
section 3 "`Binary Form`" of the SPIR-V specification for the complete list
of SPIR-V memory instructions.

Static use is not used to control the behavior of variables with code:Input
and code:Output storage.
The effects of those variables are applied based only on whether they are
present in a shader entry point's interface.


[[shaders-scope]]
== Scope

A _scope_ describes a set of shader invocations, where each such set is a
_scope instance_.
Each invocation belongs to one or more scope instances, but belongs to no
more than one scope instance for each scope.

The operations available between invocations in a given scope instance vary,
with smaller scopes generally able to perform more operations, and with
greater efficiency.


[[shaders-scope-cross-device]]
=== Cross Device

All invocations executed in a Vulkan instance fall into a single _cross
device scope instance_.

Whilst the code:CrossDevice scope is defined in SPIR-V, it is disallowed in
Vulkan.
API <<synchronization, synchronization>> commands can: be used to
communicate between devices.


[[shaders-scope-device]]
=== Device

All invocations executed on a single device form a _device scope instance_.

ifdef::VK_VERSION_1_2,VK_KHR_vulkan_memory_model[]
If the <<features-vulkanMemoryModel,pname:vulkanMemoryModel>> and
<<features-vulkanMemoryModelDeviceScope,
pname:vulkanMemoryModelDeviceScope>> features are enabled, this scope is
represented in SPIR-V by the code:Device code:Scope, which can: be used as a
code:Memory code:Scope for barrier and atomic operations.

ifdef::VK_KHR_shader_clock[]
If both the <<features-shaderDeviceClock, pname:shaderDeviceClock>> and
<<features-vulkanMemoryModelDeviceScope,
pname:vulkanMemoryModelDeviceScope>> features are enabled, using the
code:Device code:Scope with the code:OpReadClockKHR instruction will read
from a clock that is consistent across invocations in the same device scope
instance.
endif::VK_KHR_shader_clock[]
endif::VK_VERSION_1_2,VK_KHR_vulkan_memory_model[]

There is no method to synchronize the execution of these invocations within
SPIR-V, and this can: only be done with API synchronization primitives.

ifdef::VK_VERSION_1_1,VK_KHR_device_group[]
Invocations executing on different devices in a device group operate in
separate device scope instances.
endif::VK_VERSION_1_1,VK_KHR_device_group[]

ifndef::VK_VERSION_1_2,VK_KHR_vulkan_memory_model[]
The scope only extends to the queue family, not the whole device.
endif::VK_VERSION_1_2,VK_KHR_vulkan_memory_model[]


[[shaders-scope-queue-family]]
=== Queue Family

Invocations executed by queues in a given queue family form a _queue family
scope instance_.

This scope is identified in SPIR-V as the
ifdef::VK_VERSION_1_2,VK_KHR_vulkan_memory_model[]
code:QueueFamily code:Scope if the
<<features-vulkanMemoryModel,pname:vulkanMemoryModel>> feature is enabled,
or if not, the
endif::VK_VERSION_1_2,VK_KHR_vulkan_memory_model[]
code:Device code:Scope, which can: be used as a code:Memory code:Scope for
barrier and atomic operations.

ifdef::VK_KHR_shader_clock[]
If the <<features-shaderDeviceClock, pname:shaderDeviceClock>> feature is
enabled,
ifdef::VK_VERSION_1_2,VK_KHR_vulkan_memory_model[]
but the <<features-vulkanMemoryModelDeviceScope,
pname:vulkanMemoryModelDeviceScope>> feature is not enabled,
endif::VK_VERSION_1_2,VK_KHR_vulkan_memory_model[]
using the code:Device code:Scope with the code:OpReadClockKHR instruction
will read from a clock that is consistent across invocations in the same
queue family scope instance.
endif::VK_KHR_shader_clock[]

There is no method to synchronize the execution of these invocations within
SPIR-V, and this can: only be done with API synchronization primitives.

Each invocation in a queue family scope instance must: be in the same
<<shaders-scope-device, device scope instance>>.


[[shaders-scope-command]]
=== Command

Any shader invocations executed as the result of a single command such as
flink:vkCmdDispatch or flink:vkCmdDraw form a _command scope instance_.
For indirect drawing commands with pname:drawCount greater than one,
invocations from separate draws are in separate command scope instances.
ifdef::VK_KHR_ray_tracing_pipeline,VK_NV_ray_tracing[]
For ray tracing shaders, an invocation group is an implementation-dependent
subset of the set of shader invocations of a given shader stage which are
produced by a single trace rays command.
endif::VK_KHR_ray_tracing_pipeline,VK_NV_ray_tracing[]

There is no specific code:Scope for communication across invocations in a
command scope instance.
As this has a clear boundary at the API level, coordination here can: be
performed in the API, rather than in SPIR-V.

Each invocation in a command scope instance must: be in the same
<<shaders-scope-queue-family, queue family scope instance>>.

For shaders without defined <<shaders-scope-workgroup, workgroups>>, this
set of invocations forms an _invocation group_ as defined in the
<<spirv-spec,SPIR-V specification>>.


[[shaders-scope-primitive]]
=== Primitive

Any fragment shader invocations executed as the result of rasterization of a
single primitive form a _primitive scope instance_.
There is no specific code:Scope for communication across invocations in a
primitive scope instance.

Any generated <<shaders-helper-invocations, helper invocations>> are
included in this scope instance.

Each invocation in a primitive scope instance must: be in the same
<<shaders-scope-command, command scope instance>>.

Any input variables decorated with code:Flat are uniform within a primitive
scope instance.


// intentionally no VK_NV_ray_tracing here since this scope does not exist there
ifdef::VK_KHR_ray_tracing_pipeline[]
[[shaders-scope-shadercall]]
=== Shader Call

Any <<shader-call-related,shader-call-related>> invocations that are
executed in one or more ray tracing execution models form a _shader call
scope instance_.

The code:ShaderCallKHR code:Scope can be used as a code:Memory code:Scope
for barrier and atomic operations.

Each invocation in a shader call scope instance must: be in the same
<<shaders-scope-queue-family, queue family scope instance>>.
endif::VK_KHR_ray_tracing_pipeline[]


[[shaders-scope-workgroup]]
=== Workgroup

A _local workgroup_ is a set of invocations that can synchronize and share
data with each other using memory in the code:Workgroup storage class.

The code:Workgroup code:Scope can be used as both an code:Execution
code:Scope and code:Memory code:Scope for barrier and atomic operations.

Each invocation in a local workgroup must: be in the same
<<shaders-scope-command, command scope instance>>.

Only
ifdef::VK_NV_mesh_shader[]
task, mesh, and
endif::VK_NV_mesh_shader[]
compute shaders have defined workgroups - other shader types cannot: use
workgroup functionality.
For shaders that have defined workgroups, this set of invocations forms an
_invocation group_ as defined in the <<spirv-spec,SPIR-V specification>>.


ifdef::VK_VERSION_1_1[]
[[shaders-scope-subgroup]]
=== Subgroup

A _subgroup_ (see the subsection "`Control Flow`" of section 2 of the SPIR-V
1.3 Revision 1 specification) is a set of invocations that can synchronize
and share data with each other efficiently.

The code:Subgroup code:Scope can be used as both an code:Execution
code:Scope and code:Memory code:Scope for barrier and atomic operations.
Other <<VkSubgroupFeatureFlagBits, subgroup features>> allow the use of
<<shaders-group-operations, group operations>> with subgroup scope.

ifdef::VK_KHR_shader_clock[]
If the <<features-shaderSubgroupClock, pname:shaderSubgroupClock>> feature
is enabled, using the code:Subgroup code:Scope with the code:OpReadClockKHR
instruction will read from a clock that is consistent across invocations in
the same subgroup.
endif::VK_KHR_shader_clock[]

For <<shaders-scope-workgroup, shaders that have defined workgroups>>, each
invocation in a subgroup must: be in the same <<shaders-scope-workgroup,
local workgroup>>.

In other shader stages, each invocation in a subgroup must: be in the same
<<shaders-scope-device, device scope instance>>.

Only <<limits-subgroup-supportedStages, shader stages that support subgroup
operations>> have defined subgroups.
endif::VK_VERSION_1_1[]


[[shaders-scope-quad]]
=== Quad

A _quad scope instance_ is formed of four shader invocations.
In a fragment shader, a quad scope instance is formed of invocations in
neighboring framebuffer locations [eq]#(x~i~, y~i~)#, where:

  * [eq]#i# is the index of the invocation within the scope instance.
  * [eq]#w# and [eq]#h# are the number of pixels the fragment covers in the
    [eq]#x# and [eq]#y# axes.
  * [eq]#w# and [eq]#h# are identical for all participating invocations.
  * [eq]#(x~0~) = (x~1~ - w) = (x~2~) = (x~3~ - w)#
  * [eq]#(y~0~) = (y~1~) = (y~2~ - h) = (y~3~ - h)#
  * Each invocation has the same layer and sample indices.

ifdef::VK_NV_compute_shader_derivatives[]
In a compute shader, if the code:DerivativeGroupQuadsNV execution mode is
specified, a quad scope instance is formed of invocations with adjacent
local invocation IDs [eq]#(x~i~, y~i~)#, where:

  * [eq]#i# is the index of the invocation within the quad scope instance.
  * [eq]#(x~0~) = (x~1~ - 1) = (x~2~) = (x~3~ - 1)#
  * [eq]#(y~0~) = (y~1~) = (y~2~ - 1) = (y~3~ - 1)#
  * [eq]#x~0~# and [eq]#y~0~# are integer multiples of 2.
  * Each invocation has the same [eq]#z# coordinate.

In a compute shader, if the code:DerivativeGroupLinearNV execution mode is
specified, a quad scope instance is formed of invocations with adjacent
local invocation indices [eq]#(l~i~)#, where:

  * [eq]#i# is the index of the invocation within the quad scope instance.
  * [eq]#(l~0~) = (l~1~ - 1) = (l~2~ - 2) = (l~3~ - 3)#
  * [eq]#l~0~# is an integer multiple of 4.

endif::VK_NV_compute_shader_derivatives[]

ifdef::VK_VERSION_1_1[]
In all shaders, a quad scope instance is formed of invocations with adjacent
subgroup invocation indices [eq]#(s~i~)#, where:

  * [eq]#i# is the index of the invocation within the quad scope instance.
  * [eq]#(s~0~) = (s~1~ - 1) = (s~2~ - 2) = (s~3~ - 3)#
  * [eq]#s~0~# is an integer multiple of 4.

Each invocation in a quad scope instance must: be in the same
<<shaders-scope-subgroup, subgroup>>.
endif::VK_VERSION_1_1[]

ifndef::VK_VERSION_1_1[]
The specific set of invocations that make up a quad scope instance in other
shader stages is undefined:.
endif::VK_VERSION_1_1[]

In a fragment shader, each invocation in a quad scope instance must: be in
the same <<shaders-scope-primitive, primitive scope instance>>.

ifndef::VK_VERSION_1_1[]
For <<shaders-scope-workgroup, shaders that have defined workgroups>>, each
invocation in a quad scope instance must: be in the same
<<shaders-scope-workgroup, local workgroup>>.

In other shader stages, each invocation in a quad scope instance must: be in
the same <<shaders-scope-device, device scope instance>>.
endif::VK_VERSION_1_1[]

Fragment
ifdef::VK_NV_compute_shader_derivatives,VK_VERSION_1_1[]
and compute
endif::VK_NV_compute_shader_derivatives,VK_VERSION_1_1[]
shaders have defined quad scope instances.
ifdef::VK_VERSION_1_1[]
If the <<limits-subgroup-quadOperationsInAllStages,
pname:quadOperationsInAllStages>> limit is supported, any
<<limits-subgroup-supportedStages, shader stages that support subgroup
operations>> also have defined quad scope instances.
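
[NOTE]
.Note
====
As a non-normative illustration of the index mappings above, the following
helper computes, for a given subgroup invocation index, the subgroup
invocation indices of all four members of its quad scope instance.

[source,c]
----
#include <stdint.h>

// Informal sketch: invocations s0..s3 of a quad have consecutive subgroup
// invocation indices starting at a multiple of 4, so the quad containing
// invocation s starts at s rounded down to a multiple of 4.
static void quadMembers(uint32_t s, uint32_t members[4])
{
    const uint32_t s0 = s & ~3u;   // s0 is an integer multiple of 4
    for (uint32_t i = 0; i < 4; ++i)
        members[i] = s0 + i;       // s_i = s_0 + i
}
----
====
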
endif::VK_VERSION_1_1[]


ifdef::VK_EXT_fragment_shader_interlock[]
[[shaders-scope-fragment-interlock]]
=== Fragment Interlock

A _fragment interlock scope instance_ is formed of fragment shader
invocations based on their framebuffer locations [eq]#(x,y,layer,sample)#,
executed by commands inside a single <<renderpass,subpass>>.

The specific set of invocations included varies based on the execution mode
as follows:

  * If the code:SampleInterlockOrderedEXT or
    code:SampleInterlockUnorderedEXT execution modes are used, only
    invocations with identical framebuffer locations
    [eq]#(x,y,layer,sample)# are included.
  * If the code:PixelInterlockOrderedEXT or code:PixelInterlockUnorderedEXT
    execution modes are used, fragments with different sample ids are also
    included.
ifdef::VK_NV_shading_rate_image,VK_KHR_fragment_shading_rate[]
  * If the code:ShadingRateInterlockOrderedEXT or
    code:ShadingRateInterlockUnorderedEXT execution modes are used,
    fragments from neighboring framebuffer locations are also included, as
    <<primsrast-shading-rate-image,determined by the shading rate>>.
endif::VK_NV_shading_rate_image,VK_KHR_fragment_shading_rate[]

Only fragment shaders with one of the above execution modes have defined
fragment interlock scope instances.

There is no specific code:Scope value for communication across invocations
in a fragment interlock scope instance.
However, this is implicitly used as a memory scope by
code:OpBeginInvocationInterlockEXT and code:OpEndInvocationInterlockEXT.

Each invocation in a fragment interlock scope instance must: be in the same
<<shaders-scope-queue-family, queue family scope instance>>.
endif::VK_EXT_fragment_shader_interlock[]


[[shaders-scope-invocation]]
=== Invocation

The smallest _scope_ is a single invocation; this is represented by the
code:Invocation code:Scope in SPIR-V.

Fragment shader invocations must: be in a <<shaders-scope-primitive,
primitive scope instance>>.

ifdef::VK_EXT_fragment_shader_interlock[]
Invocations in <<shaders-scope-fragment-interlock, fragment shaders that
have a defined fragment interlock scope>> must: be in a
<<shaders-scope-fragment-interlock, fragment interlock scope instance>>.
endif::VK_EXT_fragment_shader_interlock[]

Invocations in <<shaders-scope-workgroup, shaders that have defined
workgroups>> must: be in a <<shaders-scope-workgroup, local workgroup>>.

ifdef::VK_VERSION_1_1[]
Invocations in <<shaders-scope-subgroup, shaders that have a defined
subgroup scope>> must: be in a <<shaders-scope-subgroup, subgroup>>.
endif::VK_VERSION_1_1[]

Invocations in <<shaders-scope-quad, shaders that have a defined quad
scope>> must: be in a <<shaders-scope-quad, quad scope instance>>.

All invocations in all stages must: be in a <<shaders-scope-command,command
scope instance>>.


ifdef::VK_VERSION_1_1[]
[[shaders-group-operations]]
== Group Operations

_Group operations_ are executed by multiple invocations within a
<<shaders-scope, scope instance>>, with each invocation involved in
calculating the result.
This provides a mechanism for efficient communication between invocations in
a particular scope instance.

Group operations all take a code:Scope defining the desired
<<shaders-scope,scope instance>> to operate within.
Only the code:Subgroup scope can: be used for these operations; the
<<limits-subgroupSupportedOperations, pname:subgroupSupportedOperations>>
limit defines which types of operation can: be used.
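
[NOTE]
.Note
====
As an informal illustration, an application can query which classes of group
operation, and which shader stages, are supported at subgroup scope through
slink:VkPhysicalDeviceSubgroupProperties; the check below for arithmetic
operations in compute shaders is just an example.

[source,c]
----
#include <stdbool.h>
#include <vulkan/vulkan.h>

// Informal sketch: query subgroup properties (Vulkan 1.1) and test whether
// arithmetic group operations are supported in compute shaders.
static bool supportsComputeSubgroupArithmetic(VkPhysicalDevice physicalDevice)
{
    VkPhysicalDeviceSubgroupProperties subgroupProperties = {
        .sType = VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_SUBGROUP_PROPERTIES,
    };
    VkPhysicalDeviceProperties2 properties2 = {
        .sType = VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_PROPERTIES_2,
        .pNext = &subgroupProperties,
    };
    vkGetPhysicalDeviceProperties2(physicalDevice, &properties2);

    return (subgroupProperties.supportedStages & VK_SHADER_STAGE_COMPUTE_BIT) &&
           (subgroupProperties.supportedOperations & VK_SUBGROUP_FEATURE_ARITHMETIC_BIT);
}
----
====
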
[[shaders-group-operations-basic]]
=== Basic Group Operations

Basic group operations include the use of code:OpGroupNonUniformElect,
code:OpControlBarrier, code:OpMemoryBarrier, and atomic operations.

code:OpGroupNonUniformElect can: be used to choose a single invocation to
perform a task for the whole group.
Only the invocation with the lowest id in the group will return code:true.

The <<memory-model,Memory Model>> appendix defines the operation of barriers
and atomics.


[[shaders-group-operations-vote]]
=== Vote Group Operations

The vote group operations allow invocations within a group to compare values
across a group.
The types of votes enabled are:

  * Do all active group invocations agree that an expression is true?
  * Do any active group invocations evaluate an expression to true?
  * Do all active group invocations have the same value of an expression?

[NOTE]
.Note
====
These operations are useful in combination with control flow in that they
allow developers to check whether conditions match across the group and
choose potentially faster code-paths in these cases.
====


[[shaders-group-operations-arithmetic]]
=== Arithmetic Group Operations

The arithmetic group operations allow invocations to perform scans and
reductions across a group.
The operators supported are add, mul, min, max, and, or, xor.

For reductions, every invocation in a group will obtain the cumulative
result of these operators applied to all values in the group.
For exclusive scans, each invocation in a group will obtain the cumulative
result of these operators applied to all values in invocations with a lower
index in the group.
Inclusive scans are identical to exclusive scans, except the cumulative
result includes the operator applied to the value in the current invocation.

The order in which these operators are applied is implementation-dependent.


[[shaders-group-operations-ballot]]
=== Ballot Group Operations

The ballot group operations allow invocations to perform more complex votes
across the group.
The ballot functionality allows all invocations within a group to provide a
boolean value and to retrieve the boolean value that each invocation
provided.
The broadcast functionality allows values to be broadcast from an invocation
to all other invocations within the group.


[[shaders-group-operations-shuffle]]
=== Shuffle Group Operations

The shuffle group operations allow invocations to read values from other
invocations within a group.


[[shaders-group-operations-shuffle-relative]]
=== Shuffle Relative Group Operations

The shuffle relative group operations allow invocations to read values from
other invocations within the group relative to the current invocation in the
group.
The relative operations supported allow data to be shifted up and down
through the invocations within a group.


[[shaders-group-operations-clustered]]
=== Clustered Group Operations

The clustered group operations allow invocations to perform an operation
among partitions of a group, such that the operation is only performed
among the invocations within a partition.
The partitions for clustered group operations are consecutive power-of-two
size groups of invocations, and the cluster size must: be known at pipeline
creation time.
The operations supported are add, mul, min, max, and, or, xor.


[[shaders-quad-operations]]
== Quad Group Operations

Quad group operations (code:OpGroupNonUniformQuad*) are a specialized type
of <<shaders-group-operations, group operations>> that only operate on
<<shaders-scope-quad, quad scope instances>>.
Whilst these instructions do include a code:Scope parameter, this scope is
always overridden; only the <<shaders-scope-quad, quad scope instance>> is
included in its execution scope.

Fragment shaders that statically execute quad group operations must: launch
sufficient invocations to ensure their correct operation; additional
<<shaders-helper-invocations, helper invocations>> are launched for
framebuffer locations not covered by rasterized fragments if necessary.

The index used to select participating invocations is [eq]#i#, as described
for a <<shaders-scope-quad, quad scope instance>>, defined as the _quad
index_ in the <<spirv-spec,SPIR-V specification>>.

For code:OpGroupNonUniformQuadBroadcast, this value is equal to code:Index.
For code:OpGroupNonUniformQuadSwap, it is equal to the implicit code:Index
used by each participating invocation.
endif::VK_VERSION_1_1[]


[[shaders-derivative-operations]]
== Derivative Operations

Derivative operations calculate the partial derivative for an expression
[eq]#P# as a function of an invocation's [eq]#x# and [eq]#y# coordinates.

Derivative operations operate on a set of invocations known as a
_derivative group_ as defined in the <<spirv-spec,SPIR-V specification>>.
A derivative group is equivalent to
ifdef::VK_NV_compute_shader_derivatives[]
the <<shaders-scope-quad, quad scope instance>> for a compute shader
invocation, or
endif::VK_NV_compute_shader_derivatives[]
the <<shaders-scope-primitive, primitive scope instance>> for a fragment
shader invocation.

Derivatives are calculated assuming that [eq]#P# is piecewise linear and
continuous within the derivative group.
All dynamic instances of explicit derivative instructions (code:OpDPdx*,
code:OpDPdy*, and code:OpFwidth*) must: be executed in control flow that is
uniform within a derivative group.
For other derivative operations, results are undefined: if a dynamic
instance is executed in control flow that is not uniform within the
derivative group.

Fragment shaders that statically execute derivative operations must: launch
sufficient invocations to ensure their correct operation; additional
<<shaders-helper-invocations, helper invocations>> are launched for
framebuffer locations not covered by rasterized fragments if necessary.

ifdef::VK_NV_compute_shader_derivatives[]
[NOTE]
.Note
====
In a compute shader, it is the application's responsibility to ensure that
sufficient invocations are launched.
====
endif::VK_NV_compute_shader_derivatives[]

Derivative operations calculate their results as the difference between the
results of [eq]#P# across invocations in the quad.
For fine derivative operations (code:OpDPdxFine and code:OpDPdyFine), the
values of [eq]#DPdx(P~i~)# are calculated as

  {empty}:: [eq]#DPdx(P~0~) = DPdx(P~1~) = P~1~ - P~0~#
  {empty}:: [eq]#DPdx(P~2~) = DPdx(P~3~) = P~3~ - P~2~#

and the values of [eq]#DPdy(P~i~)# are calculated as

  {empty}:: [eq]#DPdy(P~0~) = DPdy(P~2~) = P~2~ - P~0~#
  {empty}:: [eq]#DPdy(P~1~) = DPdy(P~3~) = P~3~ - P~1~#

where [eq]#i# is the index of each invocation as described in
<<shaders-scope-quad>>.

Coarse derivative operations (code:OpDPdxCoarse and code:OpDPdyCoarse)
calculate their results in roughly the same manner, but may: only calculate
two values instead of four (one for each of [eq]#DPdx# and [eq]#DPdy#),
reusing the same result no matter the originating invocation.
If an implementation does this, it should: use the fine derivative
calculations described for [eq]#P~0~#.

[NOTE]
.Note
====
Derivative values are calculated between fragments rather than pixels.
If the fragment shader invocations involved in the calculation cover
multiple pixels, these operations cover a wider area, resulting in larger
derivative values.
This in turn will result in a coarser level of detail being selected for
image sampling operations using derivatives.

Applications may want to account for this when using multi-pixel fragments;
if pixel derivatives are desired, applications should use explicit
derivative operations and divide the results by the size of the fragment in
each dimension as follows:

  {empty}:: [eq]#DPdx(P~n~)' = DPdx(P~n~) / w#
  {empty}:: [eq]#DPdy(P~n~)' = DPdy(P~n~) / h#

where [eq]#w# and [eq]#h# are the size of the fragments in the quad, and
[eq]#DPdx(P~n~)'# and [eq]#DPdy(P~n~)'# are the pixel derivatives.
====

The results for code:OpDPdx and code:OpDPdy may: be calculated as either
fine or coarse derivatives, with implementations favoring the most
efficient approach.
Implementations must: choose coarse or fine consistently between the two.

Executing code:OpFwidthFine, code:OpFwidthCoarse, or code:OpFwidth is
equivalent to executing the corresponding code:OpDPdx* and code:OpDPdy*
instructions, taking the absolute value of the results, and summing them.

Executing an code:OpImage*Sample*ImplicitLod instruction is equivalent to
executing code:OpDPdx(code:Coordinate) and code:OpDPdy(code:Coordinate), and
passing the results as the code:Grad operands code:dx and code:dy.

[NOTE]
.Note
====
It is expected that using the code:ImplicitLod variants of sampling
functions will be substantially more efficient than using the
code:ExplicitLod variants with explicitly generated derivatives.
====


[[shaders-helper-invocations]]
== Helper Invocations

When performing <<shaders-derivative-operations, derivative>>
ifdef::VK_VERSION_1_1[]
or <<shaders-quad-operations, quad group>>
endif::VK_VERSION_1_1[]
operations in a fragment shader, additional invocations may: be spawned in
order to ensure correct results.
These additional invocations are known as _helper invocations_ and can: be
identified by a non-zero value in the code:HelperInvocation built-in.
Stores and atomics performed by helper invocations must: not have any
effect on memory, and values returned by atomic instructions in helper
invocations are undefined:.

For <<shaders-group-operations, group operations>> other than
<<shaders-derivative-operations, derivative>>
ifdef::VK_VERSION_1_1[]
and <<shaders-quad-operations, quad group>>
endif::VK_VERSION_1_1[]
operations, helper invocations may: be treated as inactive even if they
would otherwise be considered active.

ifdef::VK_EXT_shader_demote_to_helper_invocation[]
Helper invocations may: become permanently inactive if all invocations in a
quad scope instance become helper invocations.
endif::VK_EXT_shader_demote_to_helper_invocation[]


ifdef::VK_NV_cooperative_matrix[]
== Cooperative Matrices

A _cooperative matrix_ type is a SPIR-V type where the storage for, and
computations performed on, the matrix are spread across the invocations in
a scope instance.
These types give the implementation freedom in how to optimize matrix
multiplies.

SPIR-V defines the types and instructions, but does not specify rules about
which sizes and combinations are valid, and it is expected that different
implementations may: support different sizes.

[open,refpage='vkGetPhysicalDeviceCooperativeMatrixPropertiesNV',desc='Returns properties describing what cooperative matrix types are supported',type='protos']
--
To enumerate the supported cooperative matrix types and operations, call:

include::{generated}/api/protos/vkGetPhysicalDeviceCooperativeMatrixPropertiesNV.txt[]

 * pname:physicalDevice is the physical device.
 * pname:pPropertyCount is a pointer to an integer related to the number of
   cooperative matrix properties available or queried.
 * pname:pProperties is either `NULL` or a pointer to an array of
   slink:VkCooperativeMatrixPropertiesNV structures.

If pname:pProperties is `NULL`, then the number of cooperative matrix
properties available is returned in pname:pPropertyCount.
Otherwise, pname:pPropertyCount must: point to a variable set by the user to
the number of elements in the pname:pProperties array, and on return the
variable is overwritten with the number of structures actually written to
pname:pProperties.
If pname:pPropertyCount is less than the number of cooperative matrix
properties available, at most pname:pPropertyCount structures will be
written, and ename:VK_INCOMPLETE will be returned instead of
ename:VK_SUCCESS, to indicate that not all the available cooperative matrix
properties were returned.

include::{generated}/validity/protos/vkGetPhysicalDeviceCooperativeMatrixPropertiesNV.txt[]
--

[open,refpage='VkCooperativeMatrixPropertiesNV',desc='Structure specifying cooperative matrix properties',type='structs']
--
Each sname:VkCooperativeMatrixPropertiesNV structure describes a single
supported combination of types for a matrix multiply/add operation
(code:OpCooperativeMatrixMulAddNV).
The multiply can: be described in terms of the following variables and types
(in SPIR-V pseudocode):

[source,c]
~~~~
 %A is of type OpTypeCooperativeMatrixNV %AType %scope %MSize %KSize
 %B is of type OpTypeCooperativeMatrixNV %BType %scope %KSize %NSize
 %C is of type OpTypeCooperativeMatrixNV %CType %scope %MSize %NSize
 %D is of type OpTypeCooperativeMatrixNV %DType %scope %MSize %NSize

 %D = %A * %B + %C // using OpCooperativeMatrixMulAddNV
~~~~

A matrix multiply with these dimensions is known as an _MxNxK_ matrix
multiply.

The sname:VkCooperativeMatrixPropertiesNV structure is defined as:

include::{generated}/api/structs/VkCooperativeMatrixPropertiesNV.txt[]

 * pname:sType is the type of this structure.
 * pname:pNext is `NULL` or a pointer to a structure extending this
   structure.
 * pname:MSize is the number of rows in matrices A, C, and D.
 * pname:KSize is the number of columns in matrix A and rows in matrix B.
 * pname:NSize is the number of columns in matrices B, C, and D.
 * pname:AType is the component type of matrix A, of type
   elink:VkComponentTypeNV.
 * pname:BType is the component type of matrix B, of type
   elink:VkComponentTypeNV.
 * pname:CType is the component type of matrix C, of type
   elink:VkComponentTypeNV.
 * pname:DType is the component type of matrix D, of type
   elink:VkComponentTypeNV.
 * pname:scope is the scope of all the matrix types, of type
   elink:VkScopeNV.

If some types are preferred over other types (e.g. for performance), they
should: appear earlier in the list enumerated by
flink:vkGetPhysicalDeviceCooperativeMatrixPropertiesNV.

At least one entry in the list must: have power-of-two values for all of
pname:MSize, pname:KSize, and pname:NSize.

include::{generated}/validity/structs/VkCooperativeMatrixPropertiesNV.txt[]
--

[open,refpage='VkScopeNV',desc='Specify SPIR-V scope',type='enums']
--
Possible values for elink:VkScopeNV include:

include::{generated}/api/enums/VkScopeNV.txt[]

 * ename:VK_SCOPE_DEVICE_NV corresponds to SPIR-V code:Device scope.
 * ename:VK_SCOPE_WORKGROUP_NV corresponds to SPIR-V code:Workgroup scope.
 * ename:VK_SCOPE_SUBGROUP_NV corresponds to SPIR-V code:Subgroup scope.
 * ename:VK_SCOPE_QUEUE_FAMILY_NV corresponds to SPIR-V code:QueueFamily
   scope.

All enum values match the corresponding SPIR-V value.
--

[open,refpage='VkComponentTypeNV',desc='Specify SPIR-V cooperative matrix component type',type='enums']
--
Possible values for elink:VkComponentTypeNV include:

include::{generated}/api/enums/VkComponentTypeNV.txt[]

 * ename:VK_COMPONENT_TYPE_FLOAT16_NV corresponds to SPIR-V
   code:OpTypeFloat 16.
 * ename:VK_COMPONENT_TYPE_FLOAT32_NV corresponds to SPIR-V
   code:OpTypeFloat 32.
 * ename:VK_COMPONENT_TYPE_FLOAT64_NV corresponds to SPIR-V
   code:OpTypeFloat 64.
 * ename:VK_COMPONENT_TYPE_SINT8_NV corresponds to SPIR-V
   code:OpTypeInt 8 1.
 * ename:VK_COMPONENT_TYPE_SINT16_NV corresponds to SPIR-V
   code:OpTypeInt 16 1.
 * ename:VK_COMPONENT_TYPE_SINT32_NV corresponds to SPIR-V
   code:OpTypeInt 32 1.
 * ename:VK_COMPONENT_TYPE_SINT64_NV corresponds to SPIR-V
   code:OpTypeInt 64 1.
 * ename:VK_COMPONENT_TYPE_UINT8_NV corresponds to SPIR-V
   code:OpTypeInt 8 0.
 * ename:VK_COMPONENT_TYPE_UINT16_NV corresponds to SPIR-V
   code:OpTypeInt 16 0.
 * ename:VK_COMPONENT_TYPE_UINT32_NV corresponds to SPIR-V
   code:OpTypeInt 32 0.
 * ename:VK_COMPONENT_TYPE_UINT64_NV corresponds to SPIR-V
   code:OpTypeInt 64 0.
--
endif::VK_NV_cooperative_matrix[]


ifdef::VK_EXT_validation_cache[]
[[shaders-validation-cache]]
== Validation Cache

[open,refpage='VkValidationCacheEXT',desc='Opaque handle to a validation cache object',type='handles']
--
Validation cache objects allow the result of internal validation to be
reused, both within a single application run and between multiple runs.
Reuse within a single run is achieved by passing the same validation cache
object when creating supported Vulkan objects.
Reuse across runs of an application is achieved by retrieving validation
cache contents in one run of an application, saving the contents, and using
them to preinitialize a validation cache on a subsequent run.
The contents of the validation cache objects are managed by the validation
layers.
Applications can: manage the host memory consumed by a validation cache
object and control the amount of data retrieved from a validation cache
object.

Validation cache objects are represented by sname:VkValidationCacheEXT
handles:

include::{generated}/api/handles/VkValidationCacheEXT.txt[]
--

[open,refpage='vkCreateValidationCacheEXT',desc='Creates a new validation cache',type='protos']
--
To create validation cache objects, call:

include::{generated}/api/protos/vkCreateValidationCacheEXT.txt[]

 * pname:device is the logical device that creates the validation cache
   object.
 * pname:pCreateInfo is a pointer to a slink:VkValidationCacheCreateInfoEXT
   structure containing the initial parameters for the validation cache
   object.
 * pname:pAllocator controls host memory allocation as described in the
   <<memory-allocation, Memory Allocation>> chapter.
 * pname:pValidationCache is a pointer to a slink:VkValidationCacheEXT
   handle in which the resulting validation cache object is returned.

[NOTE]
.Note
====
Applications can: track and manage the total host memory size of a
validation cache object using the pname:pAllocator.
Applications can: limit the amount of data retrieved from a validation
cache object in fname:vkGetValidationCacheDataEXT.
Implementations should: not internally limit the total number of entries
added to a validation cache object or the total host memory consumed.
====

Once created, a validation cache can: be passed to the
fname:vkCreateShaderModule command by adding this object to the
slink:VkShaderModuleCreateInfo structure's pname:pNext chain.
If a slink:VkShaderModuleValidationCacheCreateInfoEXT object is included in
the slink:VkShaderModuleCreateInfo::pname:pNext chain, and its
pname:validationCache field is not dlink:VK_NULL_HANDLE, the implementation
will query it for possible reuse opportunities and update it with new
content.
The use of the validation cache object in these commands is internally
synchronized, and the same validation cache object can: be used in multiple
threads simultaneously.
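
[NOTE]
.Note
====
As an informal illustration, the following sketch shows one way an
application might chain a validation cache into shader module creation as
described above.
The function name and parameters are illustrative, and the SPIR-V code is
assumed to have been loaded elsewhere.

[source,c]
~~~~
#include <stddef.h>
#include <stdint.h>
#include <vulkan/vulkan.h>

// Create a shader module, passing an existing validation cache through the
// VkShaderModuleCreateInfo::pNext chain so that previously cached
// validation results can be reused.
static VkResult createShaderModuleWithValidationCache(
    VkDevice device,
    VkValidationCacheEXT validationCache,
    const uint32_t *spirvCode,   /* SPIR-V code words */
    size_t codeSize,             /* size in bytes, a multiple of 4 */
    VkShaderModule *pShaderModule)
{
    VkShaderModuleValidationCacheCreateInfoEXT validationCacheInfo = {
        .sType = VK_STRUCTURE_TYPE_SHADER_MODULE_VALIDATION_CACHE_CREATE_INFO_EXT,
        .validationCache = validationCache,
    };
    VkShaderModuleCreateInfo createInfo = {
        .sType = VK_STRUCTURE_TYPE_SHADER_MODULE_CREATE_INFO,
        .pNext = &validationCacheInfo,
        .codeSize = codeSize,
        .pCode = spirvCode,
    };
    return vkCreateShaderModule(device, &createInfo, NULL, pShaderModule);
}
~~~~
====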

[NOTE]
.Note
====
Implementations should: make every effort to limit any critical sections to
the actual accesses to the cache, which are expected to be significantly
shorter than the duration of the fname:vkCreateShaderModule command.
====

include::{generated}/validity/protos/vkCreateValidationCacheEXT.txt[]
--

[open,refpage='VkValidationCacheCreateInfoEXT',desc='Structure specifying parameters of a newly created validation cache',type='structs']
--
The sname:VkValidationCacheCreateInfoEXT structure is defined as:

include::{generated}/api/structs/VkValidationCacheCreateInfoEXT.txt[]

 * pname:sType is the type of this structure.
 * pname:pNext is `NULL` or a pointer to a structure extending this
   structure.
 * pname:flags is reserved for future use.
 * pname:initialDataSize is the number of bytes in pname:pInitialData.
   If pname:initialDataSize is zero, the validation cache will initially be
   empty.
 * pname:pInitialData is a pointer to previously retrieved validation cache
   data.
   If the validation cache data is incompatible (as defined below) with the
   device, the validation cache will be initially empty.
   If pname:initialDataSize is zero, pname:pInitialData is ignored.

.Valid Usage
****
 * [[VUID-VkValidationCacheCreateInfoEXT-initialDataSize-01534]]
   If pname:initialDataSize is not `0`, it must: be equal to the size of
   pname:pInitialData, as returned by fname:vkGetValidationCacheDataEXT
   when pname:pInitialData was originally retrieved
 * [[VUID-VkValidationCacheCreateInfoEXT-initialDataSize-01535]]
   If pname:initialDataSize is not `0`, pname:pInitialData must: have been
   retrieved from a previous call to fname:vkGetValidationCacheDataEXT
****

include::{generated}/validity/structs/VkValidationCacheCreateInfoEXT.txt[]
--

[open,refpage='VkValidationCacheCreateFlagsEXT',desc='Reserved for future use',type='flags']
--
include::{generated}/api/flags/VkValidationCacheCreateFlagsEXT.txt[]

tname:VkValidationCacheCreateFlagsEXT is a bitmask type for setting a mask,
but is currently reserved for future use.
--

[open,refpage='vkMergeValidationCachesEXT',desc='Combine the data stores of validation caches',type='protos']
--
Validation cache objects can: be merged using the command:

include::{generated}/api/protos/vkMergeValidationCachesEXT.txt[]

 * pname:device is the logical device that owns the validation cache
   objects.
 * pname:dstCache is the handle of the validation cache to merge results
   into.
 * pname:srcCacheCount is the length of the pname:pSrcCaches array.
 * pname:pSrcCaches is a pointer to an array of validation cache handles,
   which will be merged into pname:dstCache.
   The previous contents of pname:dstCache are included after the merge.

[NOTE]
.Note
====
The details of the merge operation are implementation-dependent, but
implementations should: merge the contents of the specified validation
caches and prune duplicate entries.
====

.Valid Usage
****
 * [[VUID-vkMergeValidationCachesEXT-dstCache-01536]]
   pname:dstCache must: not appear in the list of source caches
****

include::{generated}/validity/protos/vkMergeValidationCachesEXT.txt[]
--

[open,refpage='vkGetValidationCacheDataEXT',desc='Get the data store from a validation cache',type='protos']
--
Data can: be retrieved from a validation cache object using the command:

include::{generated}/api/protos/vkGetValidationCacheDataEXT.txt[]

 * pname:device is the logical device that owns the validation cache.
 * pname:validationCache is the validation cache to retrieve data from.
 * pname:pDataSize is a pointer to a value related to the amount of data in
   the validation cache, as described below.
 * pname:pData is either `NULL` or a pointer to a buffer.

If pname:pData is `NULL`, then the maximum size of the data that can: be
retrieved from the validation cache, in bytes, is returned in
pname:pDataSize.
Otherwise, pname:pDataSize must: point to a variable set by the user to the
size of the buffer, in bytes, pointed to by pname:pData, and on return the
variable is overwritten with the amount of data actually written to
pname:pData.
If pname:pDataSize is less than the maximum size that can: be retrieved by
the validation cache, at most pname:pDataSize bytes will be written to
pname:pData, and fname:vkGetValidationCacheDataEXT will return
ename:VK_INCOMPLETE instead of ename:VK_SUCCESS, to indicate that not all of
the validation cache was returned.

Any data written to pname:pData is valid and can: be provided as the
pname:pInitialData member of the slink:VkValidationCacheCreateInfoEXT
structure passed to fname:vkCreateValidationCacheEXT.

Two calls to fname:vkGetValidationCacheDataEXT with the same parameters
must: retrieve the same data unless a command that modifies the contents of
the cache is called between them.

[[validation-cache-header]]
Applications can: store the data retrieved from the validation cache, and
use this data, possibly in a future run of the application, to populate new
validation cache objects.
The results of validation, however, may: depend on the vendor ID, device
ID, driver version, and other details of the device.
To enable applications to detect when previously retrieved data is
incompatible with the device, the initial bytes written to pname:pData must:
be a header consisting of the following members:

.Layout for validation cache header version ename:VK_VALIDATION_CACHE_HEADER_VERSION_ONE_EXT
[width="85%",cols="8%,21%,71%",options="header"]
|====
| Offset | Size                | Meaning
| 0      | 4                   | length in bytes of the entire validation
                                 cache header written as a stream of bytes,
                                 with the least significant byte first
| 4      | 4                   | a elink:VkValidationCacheHeaderVersionEXT
                                 value written as a stream of bytes, with
                                 the least significant byte first
| 8      | ename:VK_UUID_SIZE  | a layer commit ID expressed as a UUID,
                                 which uniquely identifies the version of
                                 the validation layers used to generate
                                 these validation results
|====

The first four bytes encode the length of the entire validation cache
header, in bytes.
This value includes all fields in the header, including the validation
cache version field and the size of the length field.
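
[NOTE]
.Note
====
As an informal illustration, the following sketch shows one way an
application might check a previously saved blob against this header layout
before passing it as pname:pInitialData.
The helper names are illustrative, and the expected layer commit ID is
assumed to have been obtained by the application, e.g. from the header of
data retrieved in the current environment.

[source,c]
~~~~
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <vulkan/vulkan.h>

// Read a 32-bit value stored with the least significant byte first.
static uint32_t readLe32(const uint8_t *bytes)
{
    return (uint32_t)bytes[0] |
           ((uint32_t)bytes[1] << 8) |
           ((uint32_t)bytes[2] << 16) |
           ((uint32_t)bytes[3] << 24);
}

// Check that saved validation cache data begins with a version-one header
// and was produced by the expected validation layer commit.
static bool validationCacheDataCompatible(
    const uint8_t *data, size_t dataSize,
    const uint8_t expectedLayerCommitId[VK_UUID_SIZE])
{
    const size_t headerSize = 8 + VK_UUID_SIZE;

    if (dataSize < headerSize)
        return false;

    uint32_t headerLength = readLe32(data + 0);
    uint32_t headerVersion = readLe32(data + 4);

    if (headerLength < headerSize || headerLength > dataSize)
        return false;
    if (headerVersion != VK_VALIDATION_CACHE_HEADER_VERSION_ONE_EXT)
        return false;

    return memcmp(data + 8, expectedLayerCommitId, VK_UUID_SIZE) == 0;
}
~~~~
====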

The next four bytes encode the validation cache version, as described for
elink:VkValidationCacheHeaderVersionEXT.
A consumer of the validation cache should: use the cache version to
interpret the remainder of the cache header.

If pname:pDataSize is less than what is necessary to store this header,
nothing will be written to pname:pData and zero will be written to
pname:pDataSize.

include::{generated}/validity/protos/vkGetValidationCacheDataEXT.txt[]
--

[open,refpage='VkValidationCacheHeaderVersionEXT',desc='Encode validation cache version',type='enums',xrefs='vkCreateValidationCacheEXT vkGetValidationCacheDataEXT']
--
Possible values of the second group of four bytes in the header returned by
flink:vkGetValidationCacheDataEXT, encoding the validation cache version,
are:

include::{generated}/api/enums/VkValidationCacheHeaderVersionEXT.txt[]

 * ename:VK_VALIDATION_CACHE_HEADER_VERSION_ONE_EXT specifies version one
   of the validation cache.
--

[open,refpage='vkDestroyValidationCacheEXT',desc='Destroy a validation cache object',type='protos']
--
To destroy a validation cache, call:

include::{generated}/api/protos/vkDestroyValidationCacheEXT.txt[]

 * pname:device is the logical device that destroys the validation cache
   object.
 * pname:validationCache is the handle of the validation cache to destroy.
 * pname:pAllocator controls host memory allocation as described in the
   <<memory-allocation, Memory Allocation>> chapter.

.Valid Usage
****
 * [[VUID-vkDestroyValidationCacheEXT-validationCache-01537]]
   If sname:VkAllocationCallbacks were provided when pname:validationCache
   was created, a compatible set of callbacks must: be provided here
 * [[VUID-vkDestroyValidationCacheEXT-validationCache-01538]]
   If no sname:VkAllocationCallbacks were provided when
   pname:validationCache was created, pname:pAllocator must: be `NULL`
****

include::{generated}/validity/protos/vkDestroyValidationCacheEXT.txt[]
--
endif::VK_EXT_validation_cache[]