1// Copyright 2015-2022 The Khronos Group Inc.
2//
3// SPDX-License-Identifier: CC-BY-4.0
4
5[[shaders]]
6= Shaders
7
8A shader specifies programmable operations that execute for each vertex,
9control point, tessellated vertex, primitive, fragment, or workgroup in the
10corresponding stage(s) of the graphics and compute pipelines.
11
12Graphics pipelines include vertex shader execution as a result of
13<<drawing,primitive assembly>>, followed, if enabled, by tessellation
14control and evaluation shaders operating on <<drawing-patch-lists,patches>>,
15geometry shaders, if enabled, operating on primitives, and fragment shaders,
16if present, operating on fragments generated by <<primsrast,Rasterization>>.
17In this specification, vertex, tessellation control, tessellation evaluation
18and geometry shaders are collectively referred to as
19<<pipelines-graphics-subsets-pre-rasterization,pre-rasterization shader
20stage>>s and occur in the logical pipeline before rasterization.
21The fragment shader occurs logically after rasterization.
22
23Only the compute shader stage is included in a compute pipeline.
24Compute shaders operate on compute invocations in a workgroup.
25
26Shaders can: read from input variables, and read from and write to output
27variables.
28Input and output variables can: be used to transfer data between shader
29stages, or to allow the shader to interact with values that exist in the
30execution environment.
31Similarly, the execution environment provides constants describing
32capabilities.
33
34Shader variables are associated with execution environment-provided inputs
35and outputs using _built-in_ decorations in the shader.
36The available decorations for each stage are documented in the following
37subsections.
38
39
40[[shader-modules]]
41== Shader Modules
42
43[open,refpage='VkShaderModule',desc='Opaque handle to a shader module object',type='handles']
44--
45_Shader modules_ contain _shader code_ and one or more entry points.
46Shaders are selected from a shader module by specifying an entry point as
47part of <<pipelines,pipeline>> creation.
48The stages of a pipeline can: use shaders that come from different modules.
49The shader code defining a shader module must: be in the SPIR-V format, as
50described by the <<spirvenv,Vulkan Environment for SPIR-V>> appendix.
51
52Shader modules are represented by sname:VkShaderModule handles:
53
54include::{generated}/api/handles/VkShaderModule.adoc[]
55--
56
57[open,refpage='vkCreateShaderModule',desc='Creates a new shader module object',type='protos']
58--
59To create a shader module, call:
60
61include::{generated}/api/protos/vkCreateShaderModule.adoc[]
62
63  * pname:device is the logical device that creates the shader module.
64  * pname:pCreateInfo is a pointer to a slink:VkShaderModuleCreateInfo
65    structure.
66  * pname:pAllocator controls host memory allocation as described in the
67    <<memory-allocation, Memory Allocation>> chapter.
68  * pname:pShaderModule is a pointer to a slink:VkShaderModule handle in
69    which the resulting shader module object is returned.
70
71Once a shader module has been created, any entry points it contains can: be
72used in pipeline shader stages as described in <<pipelines-compute,Compute
73Pipelines>> and <<pipelines-graphics,Graphics Pipelines>>.
74
75ifdef::VK_EXT_graphics_pipeline_libraries[]
76If the <<features-graphicsPipelineLibrary, pname:graphicsPipelineLibrary>>
77feature is enabled, shader module creation can: be omitted entirely.
Instead, applications should: provide the slink:VkShaderModuleCreateInfo
structure directly to pipeline creation by chaining it to
slink:VkPipelineShaderStageCreateInfo.
81This avoids the overhead of creating and managing an additional object.
82endif::VK_EXT_graphics_pipeline_libraries[]
83
84.Valid Usage
85****
86ifdef::VK_EXT_validation_cache[]
87  * [[VUID-vkCreateShaderModule-pCreateInfo-06904]]
88    If pname:pCreateInfo is not `NULL`, pname:pCreateInfo->pNext must: be
89    `NULL` or a pointer to a
90    slink:VkShaderModuleValidationCacheCreateInfoEXT structure
91endif::VK_EXT_validation_cache[]
92ifndef::VK_EXT_validation_cache[]
93  * [[VUID-vkCreateShaderModule-pCreateInfo-06905]]
94    If pname:pCreateInfo is not `NULL`, pname:pCreateInfo->pNext must: be
95    `NULL`
96endif::VK_EXT_validation_cache[]
97****
98
99include::{generated}/validity/protos/vkCreateShaderModule.adoc[]
100--
101
102[open,refpage='VkShaderModuleCreateInfo',desc='Structure specifying parameters of a newly created shader module',type='structs']
103--
104The sname:VkShaderModuleCreateInfo structure is defined as:
105
106include::{generated}/api/structs/VkShaderModuleCreateInfo.adoc[]
107
108  * pname:sType is the type of this structure.
109  * pname:pNext is `NULL` or a pointer to a structure extending this
110    structure.
111  * pname:flags is reserved for future use.
112  * pname:codeSize is the size, in bytes, of the code pointed to by
113    pname:pCode.
114  * pname:pCode is a pointer to code that is used to create the shader
115    module.
116    The type and format of the code is determined from the content of the
117    memory addressed by pname:pCode.
118
119.Valid Usage
120****
121  * [[VUID-VkShaderModuleCreateInfo-codeSize-01085]]
122    pname:codeSize must: be greater than 0
123ifndef::VK_NV_glsl_shader[]
124  * [[VUID-VkShaderModuleCreateInfo-codeSize-01086]]
125    pname:codeSize must: be a multiple of 4
126  * [[VUID-VkShaderModuleCreateInfo-pCode-01087]]
127    pname:pCode must: point to valid SPIR-V code, formatted and packed as
128    described by the <<spirv-spec,Khronos SPIR-V Specification>>
129  * [[VUID-VkShaderModuleCreateInfo-pCode-01088]]
130    pname:pCode must: adhere to the validation rules described by the
131    <<spirvenv-module-validation, Validation Rules within a Module>> section
132    of the <<spirvenv-capabilities,SPIR-V Environment>> appendix
133endif::VK_NV_glsl_shader[]
134ifdef::VK_NV_glsl_shader[]
135  * [[VUID-VkShaderModuleCreateInfo-pCode-01376]]
136    If pname:pCode is a pointer to SPIR-V code, pname:codeSize must: be a
137    multiple of 4
138  * [[VUID-VkShaderModuleCreateInfo-pCode-01377]]
139    pname:pCode must: point to either valid SPIR-V code, formatted and
140    packed as described by the <<spirv-spec,Khronos SPIR-V Specification>>
141    or valid GLSL code which must: be written to the `GL_KHR_vulkan_glsl`
142    extension specification
143  * [[VUID-VkShaderModuleCreateInfo-pCode-01378]]
144    If pname:pCode is a pointer to SPIR-V code, that code must: adhere to
145    the validation rules described by the <<spirvenv-module-validation,
146    Validation Rules within a Module>> section of the
147    <<spirvenv-capabilities,SPIR-V Environment>> appendix
148  * [[VUID-VkShaderModuleCreateInfo-pCode-01379]]
149    If pname:pCode is a pointer to GLSL code, it must: be valid GLSL code
150    written to the `GL_KHR_vulkan_glsl` GLSL extension specification
151endif::VK_NV_glsl_shader[]
152  * [[VUID-VkShaderModuleCreateInfo-pCode-01089]]
153    pname:pCode must: declare the code:Shader capability for SPIR-V code
154  * [[VUID-VkShaderModuleCreateInfo-pCode-01090]]
155    pname:pCode must: not declare any capability that is not supported by
156    the API, as described by the <<spirvenv-module-validation,
157    Capabilities>> section of the <<spirvenv-capabilities,SPIR-V
158    Environment>> appendix
159  * [[VUID-VkShaderModuleCreateInfo-pCode-01091]]
160    If pname:pCode declares any of the capabilities listed in the
161    <<spirvenv-capabilities-table, SPIR-V Environment>> appendix, one of the
162    corresponding requirements must: be satisfied
163  * [[VUID-VkShaderModuleCreateInfo-pCode-04146]]
164    pname:pCode must: not declare any SPIR-V extension that is not supported
165    by the API, as described by the <<spirvenv-extensions, Extension>>
166    section of the <<spirvenv-capabilities,SPIR-V Environment>> appendix
167  * [[VUID-VkShaderModuleCreateInfo-pCode-04147]]
168    If pname:pCode declares any of the SPIR-V extensions listed in the
169    <<spirvenv-extensions-table,SPIR-V Environment>> appendix, one of the
170    corresponding requirements must: be satisfied
171****
172
173include::{generated}/validity/structs/VkShaderModuleCreateInfo.adoc[]
174--
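
The size rules above allow a cheap host-side pre-check before calling
flink:vkCreateShaderModule.
The following is a minimal sketch and not part of the API: the helper name
`spirv_blob_looks_plausible` is hypothetical, it assumes SPIR-V input only
(no `VK_NV_glsl_shader`), and it assumes the words are stored in host byte
order.
The value `0x07230203` is the magic number that begins every SPIR-V module
according to the SPIR-V specification.
Passing this check does not make a module valid; it only rejects data that
would certainly violate the size rules.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* First word of every SPIR-V module, per the SPIR-V specification. */
#define SPIRV_MAGIC 0x07230203u

/* Hypothetical helper: returns true if (code, code_size) passes the cheap
 * structural checks. It does NOT perform SPIR-V validation. */
bool spirv_blob_looks_plausible(const uint32_t *code, size_t code_size)
{
    if (code == NULL || code_size == 0) {
        return false; /* codeSize must be greater than 0 */
    }
    if (code_size % 4 != 0) {
        return false; /* codeSize must be a multiple of 4 */
    }
    /* Assumption: words are in host byte order. At this point code_size
     * is at least 4, so reading the first word is in bounds. */
    return code[0] == SPIRV_MAGIC;
}
```

A loader could call such a helper on file contents and skip module creation
when it returns false, leaving full validation to offline tooling or
validation layers.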
175
176[open,refpage='VkShaderModuleCreateFlags',desc='Reserved for future use',type='flags']
177--
178include::{generated}/api/flags/VkShaderModuleCreateFlags.adoc[]
179
180tname:VkShaderModuleCreateFlags is a bitmask type for setting a mask, but is
181currently reserved for future use.
182--
183
184ifdef::VK_EXT_validation_cache[]
185include::{chapters}/VK_EXT_validation_cache/shader-module-validation-cache.adoc[]
186endif::VK_EXT_validation_cache[]
187
188
189[open,refpage='vkDestroyShaderModule',desc='Destroy a shader module',type='protos']
190--
191To destroy a shader module, call:
192
193include::{generated}/api/protos/vkDestroyShaderModule.adoc[]
194
195  * pname:device is the logical device that destroys the shader module.
196  * pname:shaderModule is the handle of the shader module to destroy.
197  * pname:pAllocator controls host memory allocation as described in the
198    <<memory-allocation, Memory Allocation>> chapter.
199
200A shader module can: be destroyed while pipelines created using its shaders
201are still in use.
202
203.Valid Usage
204****
205  * [[VUID-vkDestroyShaderModule-shaderModule-01092]]
206    If sname:VkAllocationCallbacks were provided when pname:shaderModule was
207    created, a compatible set of callbacks must: be provided here
208  * [[VUID-vkDestroyShaderModule-shaderModule-01093]]
209    If no sname:VkAllocationCallbacks were provided when pname:shaderModule
210    was created, pname:pAllocator must: be `NULL`
211****
212
213include::{generated}/validity/protos/vkDestroyShaderModule.adoc[]
214--
215
216
217ifdef::VK_EXT_shader_module_identifier[]
218[[shaders-identifiers]]
219== Shader Module Identifiers
220
221[open,refpage='vkGetShaderModuleIdentifierEXT',desc='Query a unique identifier for a shader module',type='protos']
222--
223Shader modules have unique identifiers associated with them.
To query an implementation-provided identifier, call:
225
226include::{generated}/api/protos/vkGetShaderModuleIdentifierEXT.adoc[]
227
228  * pname:device is the logical device that created the shader module.
229  * pname:shaderModule is the handle of the shader module.
230  * pname:pIdentifier is a pointer to the returned
231    slink:VkShaderModuleIdentifierEXT.
232
The identifier returned by the implementation must: only depend on
pname:shaderModuleIdentifierAlgorithmUUID and information provided in the
slink:VkShaderModuleCreateInfo which created pname:shaderModule.
236The implementation may: return equal identifiers for two different
237slink:VkShaderModuleCreateInfo structures if the difference does not affect
238pipeline compilation.
239Identifiers are only meaningful on different slink:VkDevice objects if the
240device the identifier was queried from had the same
241<<limits-shaderModuleIdentifierAlgorithmUUID,
242pname:shaderModuleIdentifierAlgorithmUUID>> as the device consuming the
243identifier.
244
245.Valid Usage
246****
247  * [[VUID-vkGetShaderModuleIdentifierEXT-shaderModuleIdentifier-06884]]
    The <<features-shaderModuleIdentifier, pname:shaderModuleIdentifier>>
    feature must: be enabled
250****
251
252include::{generated}/validity/protos/vkGetShaderModuleIdentifierEXT.adoc[]
253--
254
255[open,refpage='vkGetShaderModuleCreateInfoIdentifierEXT',desc='Query a unique identifier for a shader module create info',type='protos']
256--
257slink:VkShaderModuleCreateInfo structures have unique identifiers associated
258with them.
To query an implementation-provided identifier, call:
260
261include::{generated}/api/protos/vkGetShaderModuleCreateInfoIdentifierEXT.adoc[]
262
263  * pname:device is the logical device that can: create a
264    slink:VkShaderModule from pname:pCreateInfo.
265  * pname:pCreateInfo is a pointer to a slink:VkShaderModuleCreateInfo
266    structure.
267  * pname:pIdentifier is a pointer to the returned
268    slink:VkShaderModuleIdentifierEXT.
269
The identifier returned by the implementation must: only depend on
pname:shaderModuleIdentifierAlgorithmUUID and information provided in the
slink:VkShaderModuleCreateInfo.
273The implementation may: return equal identifiers for two different
274slink:VkShaderModuleCreateInfo structures if the difference does not affect
275pipeline compilation.
276Identifiers are only meaningful on different slink:VkDevice objects if the
277device the identifier was queried from had the same
278<<limits-shaderModuleIdentifierAlgorithmUUID,
279pname:shaderModuleIdentifierAlgorithmUUID>> as the device consuming the
280identifier.
281
282The identifier returned by the implementation in
283flink:vkGetShaderModuleCreateInfoIdentifierEXT must: be equal to the
284identifier returned by flink:vkGetShaderModuleIdentifierEXT given equivalent
285definitions of slink:VkShaderModuleCreateInfo and any chained pname:pNext
286structures.
287
288.Valid Usage
289****
290  * [[VUID-vkGetShaderModuleCreateInfoIdentifierEXT-shaderModuleIdentifier-06885]]
    The <<features-shaderModuleIdentifier, pname:shaderModuleIdentifier>>
    feature must: be enabled
293****
294
295include::{generated}/validity/protos/vkGetShaderModuleCreateInfoIdentifierEXT.adoc[]
296--
297
298[open,refpage='VkShaderModuleIdentifierEXT',desc='A unique identifier for a shader module',type='structs']
299--
300slink:VkShaderModuleIdentifierEXT represents a shader module identifier
301returned by the implementation.
302
303include::{generated}/api/structs/VkShaderModuleIdentifierEXT.adoc[]
304
305  * pname:sType is the type of this structure.
306  * pname:pNext is `NULL` or a pointer to a structure extending this
307    structure.
308  * pname:identifierSize is the size, in bytes, of valid data returned in
309    pname:identifier.
310  * pname:identifier is a buffer of opaque data specifying an identifier.
311
312Any returned values beyond the first pname:identifierSize bytes are
313undefined:.
Implementations must: return an pname:identifierSize greater than 0 and
less than or equal to ename:VK_MAX_SHADER_MODULE_IDENTIFIER_SIZE_EXT.
316
317Two identifiers are considered equal if pname:identifierSize is equal and
318the first pname:identifierSize bytes of pname:identifier compare equal.
319
320Implementations may: return a different pname:identifierSize for different
321modules.
322Implementations should: ensure that pname:identifierSize is large enough to
323uniquely define a shader module.
324
325include::{generated}/validity/structs/VkShaderModuleIdentifierEXT.adoc[]
326--
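
The equality rule above can be sketched in C as follows.
This is an illustration only: `ShaderModuleId` is a hypothetical stand-in
that mirrors just the pname:identifierSize and pname:identifier members, and
`MAX_IDENTIFIER_SIZE` is an assumed stand-in for
ename:VK_MAX_SHADER_MODULE_IDENTIFIER_SIZE_EXT.
An application using the real headers would compare
slink:VkShaderModuleIdentifierEXT structures in the same way.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Assumed stand-in for VK_MAX_SHADER_MODULE_IDENTIFIER_SIZE_EXT. */
#define MAX_IDENTIFIER_SIZE 32u

/* Hypothetical mirror of the two data members described above. */
typedef struct ShaderModuleId {
    uint32_t identifierSize;
    uint8_t  identifier[MAX_IDENTIFIER_SIZE];
} ShaderModuleId;

/* Two identifiers are equal if the sizes match and the first
 * identifierSize bytes match. Bytes past identifierSize are undefined
 * and are deliberately not compared. */
bool shader_module_ids_equal(const ShaderModuleId *a, const ShaderModuleId *b)
{
    if (a->identifierSize != b->identifierSize) {
        return false;
    }
    if (a->identifierSize == 0 || a->identifierSize > MAX_IDENTIFIER_SIZE) {
        return false; /* outside the range an implementation may return */
    }
    return memcmp(a->identifier, b->identifier, a->identifierSize) == 0;
}
```

Comparing the whole array with a single `memcmp` over the maximum size would
be incorrect, because the trailing bytes may differ between two otherwise
equal identifiers.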
327
328[open,refpage='VK_MAX_SHADER_MODULE_IDENTIFIER_SIZE_EXT',desc='Maximum length of a shader module identifier',type='consts']
329--
ename:VK_MAX_SHADER_MODULE_IDENTIFIER_SIZE_EXT is the maximum length, in
bytes, of a shader module identifier; the actual length is returned in
slink:VkShaderModuleIdentifierEXT::pname:identifierSize.
333
334include::{generated}/api/enums/VK_MAX_SHADER_MODULE_IDENTIFIER_SIZE_EXT.adoc[]
335--
336endif::VK_EXT_shader_module_identifier[]
337
338
339[[shaders-execution]]
340== Shader Execution
341
342At each stage of the pipeline, multiple invocations of a shader may: execute
343simultaneously.
344Further, invocations of a single shader produced as the result of different
345commands may: execute simultaneously.
346The relative execution order of invocations of the same shader type is
347undefined:.
348Shader invocations may: complete in a different order than that in which the
349primitives they originated from were drawn or dispatched by the application.
350However, fragment shader outputs are written to attachments in
351<<primsrast-order,rasterization order>>.
352
353The relative execution order of invocations of different shader types is
354largely undefined:.
355However, when invoking a shader whose inputs are generated from a previous
356pipeline stage, the shader invocations from the previous stage are
357guaranteed to have executed far enough to generate input values for all
358required inputs.
359
360
361[[shaders-termination]]
362=== Shader Termination
363
364A shader invocation that is _terminated_ has finished executing
365instructions.
366
367Executing code:OpReturn in the entry point, or executing
368code:OpTerminateInvocation in any function will terminate an invocation.
369Implementations may: also terminate a shader invocation when code:OpKill is
370executed in any function; otherwise it becomes a
371<<shaders-helper-invocations, helper invocation>>.
372
373In addition to the above conditions, <<shaders-helper-invocations,helper
374invocations>> are terminated when all non-helper invocations in the same
375<<shaders-derivative-operations,derivative group>> either terminate or
376become <<shaders-helper-invocations,helper invocations>> via
377ifdef::VK_EXT_shader_demote_to_helper_invocation[]
378code:OpDemoteToHelperInvocationEXT or
379endif::VK_EXT_shader_demote_to_helper_invocation[]
380code:OpKill.
381
382A shader stage for a given command completes execution when all invocations
383for that stage have terminated.
384
385
386[[shaders-execution-memory-ordering]]
387== Shader Memory Access Ordering
388
389The order in which image or buffer memory is read or written by shaders is
390largely undefined:.
391For some shader types (vertex, tessellation evaluation, and in some cases,
392fragment), even the number of shader invocations that may: perform loads and
393stores is undefined:.
394
395In particular, the following rules apply:
396
397  * <<shaders-vertex-execution,Vertex>> and
398    <<shaders-tessellation-evaluation-execution,tessellation evaluation>>
399    shaders will be invoked at least once for each unique vertex, as defined
400    in those sections.
401  * <<fragops-shader,Fragment>> shaders will be invoked zero or more times,
402    as defined in that section.
403  * The relative execution order of invocations of the same shader type is
404    undefined:.
405    A store issued by a shader when working on primitive B might complete
406    prior to a store for primitive A, even if primitive A is specified prior
407    to primitive B. This applies even to fragment shaders; while fragment
408    shader outputs are always written to the framebuffer in
409    <<primsrast-order, rasterization order>>, stores executed by fragment
410    shader invocations are not.
411  * The relative execution order of invocations of different shader types is
412    largely undefined:.
413
414[NOTE]
415.Note
416====
417The above limitations on shader invocation order make some forms of
418synchronization between shader invocations within a single set of primitives
419unimplementable.
420For example, having one invocation poll memory written by another invocation
421assumes that the other invocation has been launched and will complete its
422writes in finite time.
423====
424
425ifdef::VK_VERSION_1_2,VK_KHR_vulkan_memory_model[]
426
427The <<memory-model,Memory Model>> appendix defines the terminology and rules
428for how to correctly communicate between shader invocations, such as when a
429write is <<memory-model-visible-to,Visible-To>> a read, and what constitutes
430a <<memory-model-access-data-race,Data Race>>.
431
432Applications must: not cause a data race.
433
434endif::VK_VERSION_1_2,VK_KHR_vulkan_memory_model[]
435
436ifndef::VK_VERSION_1_2,VK_KHR_vulkan_memory_model[]
437
438Stores issued to different memory locations within a single shader
439invocation may: not be visible to other invocations, or may: not become
440visible in the order they were performed.
441
442The code:OpMemoryBarrier instruction can: be used to provide stronger
443ordering of reads and writes performed by a single invocation.
444code:OpMemoryBarrier guarantees that any memory transactions issued by the
445shader invocation prior to the instruction complete prior to the memory
446transactions issued after the instruction.
447Memory barriers are needed for algorithms that require multiple invocations
448to access the same memory and require the operations to be performed in a
449partially-defined relative order.
450For example, if one shader invocation does a series of writes, followed by
451an code:OpMemoryBarrier instruction, followed by another write, then the
452results of the series of writes before the barrier become visible to other
453shader invocations at a time earlier or equal to when the results of the
454final write become visible to those invocations.
In practice, this means that another invocation that sees the results of the
final write would also see the previous writes.
457Without the memory barrier, the final write may: be visible before the
458previous writes.
459
460Writes that are the result of shader stores through a variable decorated
461with code:Coherent automatically have available writes to the same buffer,
462buffer view, or image view made visible to them, and are themselves
463automatically made available to access by the same buffer, buffer view, or
464image view.
465Reads that are the result of shader loads through a variable decorated with
466code:Coherent automatically have available writes to the same buffer, buffer
467view, or image view made visible to them.
468The order that coherent writes to different locations become available is
469undefined:, unless enforced by a memory barrier instruction or other memory
470dependency.
471
472[NOTE]
473.Note
474====
475Explicit memory dependencies must: still be used to guarantee availability
476and visibility for access via other buffers, buffer views, or image views.
477====
478
479The built-in atomic memory transaction instructions can: be used to read and
480write a given memory address atomically.
481While built-in atomic functions issued by multiple shader invocations are
482executed in undefined: order relative to each other, these functions perform
483both a read and a write of a memory address and guarantee that no other
484memory transaction will write to the underlying memory between the read and
485write.
486Atomic operations ensure automatic availability and visibility for writes
487and reads in the same way as those to code:Coherent variables.
488
489[NOTE]
490.Note
491====
492Memory accesses performed on different resource descriptors with the same
493memory backing may: not be well-defined even with the code:Coherent
494decoration or via atomics, due to things such as image layouts or ownership
495of the resource - as described in the <<synchronization, Synchronization and
496Cache Control>> chapter.
497====
498
499[NOTE]
500.Note
501====
502Atomics allow shaders to use shared global addresses for mutual exclusion or
503as counters, among other uses.
504====
505
506endif::VK_VERSION_1_2,VK_KHR_vulkan_memory_model[]
507
508The SPIR-V *SubgroupMemory*, *CrossWorkgroupMemory*, and
509*AtomicCounterMemory* memory semantics are ignored.
510Sequentially consistent atomics and barriers are not supported and
511*SequentiallyConsistent* is treated as *AcquireRelease*.
512*SequentiallyConsistent* should: not be used.
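
The two rules above can be modeled as a transformation on a SPIR-V Memory
Semantics mask.
The sketch below is illustrative only and does not describe how any
implementation is required to work: the helper name
`effective_memory_semantics` and the `SEM_` constant names are hypothetical,
while the bit values are those assigned to the corresponding Memory
Semantics bits by the SPIR-V specification.

```c
#include <stdint.h>

/* Memory Semantics bit values as defined by the SPIR-V specification. */
enum {
    SEM_ACQUIRE_RELEASE         = 0x0008,
    SEM_SEQUENTIALLY_CONSISTENT = 0x0010,
    SEM_SUBGROUP_MEMORY         = 0x0080,
    SEM_CROSS_WORKGROUP_MEMORY  = 0x0200,
    SEM_ATOMIC_COUNTER_MEMORY   = 0x0400
};

/* Hypothetical helper: returns the semantics as this section says they
 * are interpreted. */
uint32_t effective_memory_semantics(uint32_t semantics)
{
    /* SubgroupMemory, CrossWorkgroupMemory and AtomicCounterMemory are
     * ignored. */
    semantics &= ~(uint32_t)(SEM_SUBGROUP_MEMORY |
                             SEM_CROSS_WORKGROUP_MEMORY |
                             SEM_ATOMIC_COUNTER_MEMORY);

    /* SequentiallyConsistent is treated as AcquireRelease. */
    if (semantics & SEM_SEQUENTIALLY_CONSISTENT) {
        semantics &= ~(uint32_t)SEM_SEQUENTIALLY_CONSISTENT;
        semantics |= SEM_ACQUIRE_RELEASE;
    }
    return semantics;
}
```

A shader author can read this as a reason to request *AcquireRelease*
directly, since asking for *SequentiallyConsistent* provides no stronger
guarantee.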
513
514
515[[shaders-inputs]]
516== Shader Inputs and Outputs
517
518Data is passed into and out of shaders using variables with input or output
519storage class, respectively.
520User-defined inputs and outputs are connected between stages by matching
521their code:Location decorations.
522Additionally, data can: be provided by or communicated to special functions
523provided by the execution environment using code:BuiltIn decorations.
524
525In many cases, the same code:BuiltIn decoration can: be used in multiple
526shader stages with similar meaning.
527The specific behavior of variables decorated as code:BuiltIn is documented
528in the following sections.
529
530ifdef::VK_NV_mesh_shader,VK_EXT_mesh_shader[]
531[[shaders-task]]
532== Task Shaders
533
534Task shaders operate in conjunction with the mesh shaders to produce a
535collection of primitives that will be processed by subsequent stages of the
536graphics pipeline.
Their primary purpose is to create a variable number of subsequent mesh
shader invocations.
539
540Task shaders are invoked via the execution of the
541<<drawing-mesh-shading,programmable mesh shading>> pipeline.
542
543The task shader has no fixed-function inputs other than variables
544identifying the specific workgroup and invocation.
545ifdef::VK_NV_mesh_shader[]
546In the code:TaskNV {ExecutionModel} the number of mesh shader workgroups to
547create is specified via a code:TaskCountNV decorated output variable.
548endif::VK_NV_mesh_shader[]
549ifdef::VK_EXT_mesh_shader[]
550In the code:TaskEXT {ExecutionModel} the number of mesh shader workgroups to
551create is specified via the code:OpEmitMeshTasksEXT instruction.
552endif::VK_EXT_mesh_shader[]
553
The task shader can: write additional outputs to task memory, which can: be
read by all of the mesh shader workgroups it created.
556
557
558=== Task Shader Execution
559
560Task workloads are formed from groups of work items called workgroups and
561processed by the task shader in the current graphics pipeline.
562A workgroup is a collection of shader invocations that execute the same
563shader, potentially in parallel.
564Task shaders execute in _global workgroups_ which are divided into a number
565of _local workgroups_ with a size that can: be set by assigning a value to
566the code:LocalSize
567ifdef::VK_VERSION_1_3,VK_KHR_maintenance4[or code:LocalSizeId]
568execution mode or via an object decorated by the code:WorkgroupSize
569decoration.
570An invocation within a local workgroup can: share data with other members of
571the local workgroup through shared variables and issue memory and control
572flow barriers to synchronize with other members of the local workgroup.
573ifdef::VK_EXT_mesh_shader[]
574ifdef::VK_VERSION_1_1,VK_KHR_multiview[]
If the subpass includes multiple views in its view mask, a task shader using
576code:TaskEXT {ExecutionModel} may: be invoked separately for each view.
577endif::VK_VERSION_1_1,VK_KHR_multiview[]
578endif::VK_EXT_mesh_shader[]
579
580
581[[shaders-mesh]]
582== Mesh Shaders
583
584Mesh shaders operate in workgroups to produce a collection of primitives
585that will be processed by subsequent stages of the graphics pipeline.
586Each workgroup emits zero or more output primitives and the group of
587vertices and their associated data required for each output primitive.
588
589Mesh shaders are invoked via the execution of the
590<<drawing-mesh-shading,programmable mesh shading>> pipeline.
591
592The only inputs available to the mesh shader are variables identifying the
593specific workgroup and invocation and, if applicable, any outputs written to
594task memory by the task shader that spawned the mesh shader's workgroup.
The mesh shader can: operate without a task shader as well.
596
597The invocations of the mesh shader workgroup write an output mesh,
598comprising a set of primitives with per-primitive attributes, a set of
599vertices with per-vertex attributes, and an array of indices identifying the
600mesh vertices that belong to each primitive.
601The primitives of this mesh are then processed by subsequent graphics
602pipeline stages, where the outputs of the mesh shader form an interface with
603the fragment shader.
604
605
606=== Mesh Shader Execution
607
608Mesh workloads are formed from groups of work items called workgroups and
609processed by the mesh shader in the current graphics pipeline.
610A workgroup is a collection of shader invocations that execute the same
611shader, potentially in parallel.
612Mesh shaders execute in _global workgroups_ which are divided into a number
613of _local workgroups_ with a size that can: be set by assigning a value to
614the code:LocalSize
615ifdef::VK_VERSION_1_3,VK_KHR_maintenance4[or code:LocalSizeId]
616execution mode or via an object decorated by the code:WorkgroupSize
617decoration.
618An invocation within a local workgroup can: share data with other members of
619the local workgroup through shared variables and issue memory and control
620flow barriers to synchronize with other members of the local workgroup.
621
622The _global workgroups_ may be generated explicitly via the API, or
623implicitly through the task shader's work creation mechanism.
624endif::VK_NV_mesh_shader,VK_EXT_mesh_shader[]
625ifdef::VK_EXT_mesh_shader[]
626ifdef::VK_VERSION_1_1,VK_KHR_multiview[]
If the subpass includes multiple views in its view mask, a mesh shader using
628code:MeshEXT {ExecutionModel} may: be invoked separately for each view.
629endif::VK_VERSION_1_1,VK_KHR_multiview[]
630endif::VK_EXT_mesh_shader[]
631
632
633[[shaders-vertex]]
634== Vertex Shaders
635
636Each vertex shader invocation operates on one vertex and its associated
637<<fxvertex-attrib,vertex attribute>> data, and outputs one vertex and
638associated data.
639ifndef::VK_NV_mesh_shader,VK_EXT_mesh_shader[]
640Graphics pipelines must: include a vertex shader, and the vertex shader
641stage is always the first shader stage in the graphics pipeline.
642endif::VK_NV_mesh_shader,VK_EXT_mesh_shader[]
643ifdef::VK_NV_mesh_shader,VK_EXT_mesh_shader[]
644Graphics pipelines using primitive shading must: include a vertex shader,
645and the vertex shader stage is always the first shader stage in the graphics
646pipeline.
647endif::VK_NV_mesh_shader,VK_EXT_mesh_shader[]
648
649
650[[shaders-vertex-execution]]
651=== Vertex Shader Execution
652
653A vertex shader must: be executed at least once for each vertex specified by
654a drawing command.
655ifdef::VK_VERSION_1_1,VK_KHR_multiview[]
656If the subpass includes multiple views in its view mask, the shader may: be
657invoked separately for each view.
658endif::VK_VERSION_1_1,VK_KHR_multiview[]
659During execution, the shader is presented with the index of the vertex and
660instance for which it has been invoked.
661Input variables declared in the vertex shader are filled by the
662implementation with the values of vertex attributes associated with the
663invocation being executed.
664
665If the same vertex is specified multiple times in a drawing command (e.g. by
666including the same index value multiple times in an index buffer) the
667implementation may: reuse the results of vertex shading if it can statically
668determine that the vertex shader invocations will produce identical results.
669
670[NOTE]
671.Note
672====
673It is implementation-dependent when and if results of vertex shading are
674reused, and thus how many times the vertex shader will be executed.
675This is true also if the vertex shader contains stores or atomic operations
676(see <<features-vertexPipelineStoresAndAtomics,
677pname:vertexPipelineStoresAndAtomics>>).
678====
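
Because reuse is implementation-dependent, an application can only reason
about a lower bound on vertex shader invocations.
The sketch below computes that bound for an indexed draw as the number of
distinct index values.
It is an illustration under stated assumptions: a single instance, a single
view, no primitive restart, and 32-bit indices; the helper name
`count_unique_indices` is hypothetical.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* qsort comparator for uint32_t values. */
static int compare_u32(const void *lhs, const void *rhs)
{
    uint32_t a = *(const uint32_t *)lhs;
    uint32_t b = *(const uint32_t *)rhs;
    return (a > b) - (a < b);
}

/* Hypothetical helper: number of distinct index values, which is the
 * fewest vertex shader invocations possible for one instance and one
 * view. Returns 0 on empty input or allocation failure. */
size_t count_unique_indices(const uint32_t *indices, size_t count)
{
    if (indices == NULL || count == 0) {
        return 0;
    }
    uint32_t *sorted = malloc(count * sizeof *sorted);
    if (sorted == NULL) {
        return 0;
    }
    memcpy(sorted, indices, count * sizeof *sorted);
    qsort(sorted, count, sizeof *sorted, compare_u32);

    size_t unique = 1;
    for (size_t i = 1; i < count; ++i) {
        if (sorted[i] != sorted[i - 1]) {
            ++unique;
        }
    }
    free(sorted);
    return unique;
}
```

For two triangles forming a quad with indices `{0, 1, 2, 2, 1, 3}`, the
helper returns 4.
The actual invocation count is at least that value and depends on how much
reuse the implementation performs, so stores or atomics in the vertex shader
must not rely on an exact count.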
679
680
681[[shaders-tessellation-control]]
682== Tessellation Control Shaders
683
684The tessellation control shader is used to read an input patch provided by
685the application and to produce an output patch.
686Each tessellation control shader invocation operates on an input patch
687(after all control points in the patch are processed by a vertex shader) and
688its associated data, and outputs a single control point of the output patch
689and its associated data, and can: also output additional per-patch data.
690The input patch is sized according to the pname:patchControlPoints member of
691slink:VkPipelineTessellationStateCreateInfo, as part of input assembly.
692
693ifdef::VK_EXT_extended_dynamic_state2[]
The input patch can: also be dynamically sized with the
pname:patchControlPoints parameter of flink:vkCmdSetPatchControlPointsEXT.
696
697[open,refpage='vkCmdSetPatchControlPointsEXT',desc='Specify the number of control points per patch dynamically for a command buffer',type='protos']
698--
699To <<pipelines-dynamic-state, dynamically set>> the number of control points
700per patch, call:
701
702include::{generated}/api/protos/vkCmdSetPatchControlPointsEXT.adoc[]
703
704  * pname:commandBuffer is the command buffer into which the command will be
705    recorded.
706  * pname:patchControlPoints specifies the number of control points per
707    patch.
708
709This command sets the number of control points per patch for subsequent
710drawing commands when the graphics pipeline is created with
711ename:VK_DYNAMIC_STATE_PATCH_CONTROL_POINTS_EXT set in
712slink:VkPipelineDynamicStateCreateInfo::pname:pDynamicStates.
713Otherwise, this state is specified by the
714slink:VkPipelineTessellationStateCreateInfo::pname:patchControlPoints value
715used to create the currently active pipeline.
716
717.Valid Usage
718****
719  * [[VUID-vkCmdSetPatchControlPointsEXT-None-04873]]
720    The <<features-extendedDynamicState2PatchControlPoints,
721    pname:extendedDynamicState2PatchControlPoints>> feature must: be enabled
722  * [[VUID-vkCmdSetPatchControlPointsEXT-patchControlPoints-04874]]
723    pname:patchControlPoints must: be greater than zero and less than or
724    equal to sname:VkPhysicalDeviceLimits::pname:maxTessellationPatchSize
725****
726
727include::{generated}/validity/protos/vkCmdSetPatchControlPointsEXT.adoc[]
728--
729endif::VK_EXT_extended_dynamic_state2[]
730
731The size of the output patch is controlled by the code:OpExecutionMode
732code:OutputVertices specified in the tessellation control or tessellation
733evaluation shaders, which must: be specified in at least one of the shaders.
734The size of the input and output patches must: each be greater than zero and
735less than or equal to
736sname:VkPhysicalDeviceLimits::pname:maxTessellationPatchSize.
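
The range check on patch sizes can be sketched in Python as follows.
The helper name `is_valid_patch_size` is hypothetical, and the limit value of
32 used in the example is only an assumed device value; an application would
query pname:maxTessellationPatchSize from the physical device.

```python
def is_valid_patch_size(control_points, max_tessellation_patch_size):
    """True if a patch size is greater than zero and within the device limit."""
    return 0 < control_points <= max_tessellation_patch_size


# Assumed limit for illustration only; query the real value from the device.
ASSUMED_MAX_PATCH_SIZE = 32

triangle_patch_ok = is_valid_patch_size(3, ASSUMED_MAX_PATCH_SIZE)
empty_patch_ok = is_valid_patch_size(0, ASSUMED_MAX_PATCH_SIZE)
```

The same check applies to both the input patch size and the output patch
size.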
737
738
739[[shaders-tessellation-control-execution]]
740=== Tessellation Control Shader Execution
741
742A tessellation control shader is invoked at least once for each _output_
743vertex in a patch.
744ifdef::VK_VERSION_1_1,VK_KHR_multiview[]
745If the subpass includes multiple views in its view mask, the shader may: be
746invoked separately for each view.
747endif::VK_VERSION_1_1,VK_KHR_multiview[]
748
749Inputs to the tessellation control shader are generated by the vertex
750shader.
751Each invocation of the tessellation control shader can: read the attributes
752of any incoming vertices and their associated data.
753The invocations corresponding to a given patch execute logically in
754parallel, with undefined: relative execution order.
755However, the code:OpControlBarrier instruction can: be used to provide
756limited control of the execution order by synchronizing invocations within a
757patch, effectively dividing tessellation control shader execution into a set
758of phases.
759Tessellation control shaders will read undefined: values if one invocation
760reads a per-vertex or per-patch output written by another invocation at any
761point during the same phase, or if two invocations attempt to write
762different values to the same per-patch output in a single phase.
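
The two hazards described above can be modeled with a small Python sketch.
It assumes a simplified representation in which one phase (the code between
two barriers) is a list of `(invocation, op, output, value)` tuples; the
function name `find_phase_hazards` and the hazard labels are hypothetical.

```python
from collections import defaultdict


def find_phase_hazards(accesses, per_patch_outputs):
    """Report accesses within ONE phase that would yield undefined values.

    accesses: iterable of (invocation, op, output, value), op is "read"/"write".
    per_patch_outputs: set of output names that are per-patch.
    """
    writers = defaultdict(dict)  # output -> {invocation: last value written}
    readers = defaultdict(set)   # output -> invocations that read it
    for invocation, op, output, value in accesses:
        if op == "write":
            writers[output][invocation] = value
        else:
            readers[output].add(invocation)

    hazards = set()
    for output, by_invocation in writers.items():
        # A read of an output written by a different invocation, same phase.
        for reader in readers.get(output, ()):
            if any(writer != reader for writer in by_invocation):
                hazards.add(("read-after-foreign-write", output))
        # Different invocations writing different values to a per-patch output.
        if output in per_patch_outputs and len(set(by_invocation.values())) > 1:
            hazards.add(("conflicting-writes", output))
    return hazards


same_phase = [(0, "write", "level", 4.0), (1, "read", "level", None)]
hazards = find_phase_hazards(same_phase, {"level"})
```

Splitting the write and the read into separate phases, which is what a
barrier does in this model, makes each phase hazard-free when checked on its
own.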
763
764
765[[shaders-tessellation-evaluation]]
766== Tessellation Evaluation Shaders
767
The tessellation evaluation shader operates on an input patch of control
769points and their associated data, and a single input barycentric coordinate
770indicating the invocation's relative position within the subdivided patch,
771and outputs a single vertex and its associated data.
772
773
774[[shaders-tessellation-evaluation-execution]]
775=== Tessellation Evaluation Shader Execution
776
777A tessellation evaluation shader is invoked at least once for each unique
778vertex generated by the tessellator.
779ifdef::VK_VERSION_1_1,VK_KHR_multiview[]
780If the subpass includes multiple views in its view mask, the shader may: be
781invoked separately for each view.
782endif::VK_VERSION_1_1,VK_KHR_multiview[]
783
784
785[[shaders-geometry]]
786== Geometry Shaders
787
788The geometry shader operates on a group of vertices and their associated
789data assembled from a single input primitive, and emits zero or more output
790primitives and the group of vertices and their associated data required for
791each output primitive.
792
793
794[[shaders-geometry-execution]]
795=== Geometry Shader Execution
796
797A geometry shader is invoked at least once for each primitive produced by
798the tessellation stages, or at least once for each primitive generated by
799<<drawing,primitive assembly>> when tessellation is not in use.
800A shader can request that the geometry shader runs multiple
801<<geometry-invocations, instances>>.
802A geometry shader is invoked at least once for each instance.
803ifdef::VK_VERSION_1_1,VK_KHR_multiview[]
804If the subpass includes multiple views in its view mask, the shader may: be
805invoked separately for each view.
806endif::VK_VERSION_1_1,VK_KHR_multiview[]
807
808
809[[shaders-fragment]]
810== Fragment Shaders
811
812Fragment shaders are invoked as a <<fragops-shader, fragment operation>> in
813a graphics pipeline.
814Each fragment shader invocation operates on a single fragment and its
815associated data.
816With few exceptions, fragment shaders do not have access to any data
817associated with other fragments and are considered to execute in isolation
818of fragment shader invocations associated with other fragments.
819
820
821[[shaders-compute]]
822== Compute Shaders
823
824Compute shaders are invoked via flink:vkCmdDispatch and
825flink:vkCmdDispatchIndirect commands.
In general, they have access to resources similar to those available to
shader stages executing as part of a graphics pipeline.
828
829Compute workloads are formed from groups of work items called workgroups and
830processed by the compute shader in the current compute pipeline.
831A workgroup is a collection of shader invocations that execute the same
832shader, potentially in parallel.
833Compute shaders execute in _global workgroups_ which are divided into a
834number of _local workgroups_ with a size that can: be set by assigning a
835value to the code:LocalSize
836ifdef::VK_VERSION_1_3,VK_KHR_maintenance4[or code:LocalSizeId]
837execution mode or via an object decorated by the code:WorkgroupSize
838decoration.
839An invocation within a local workgroup can: share data with other members of
840the local workgroup through shared variables and issue memory and control
841flow barriers to synchronize with other members of the local workgroup.
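
As a rough illustration of how a global workgroup is divided into local
workgroups, the following Python sketch derives a global invocation ID from a
workgroup ID, the local workgroup size, and a local invocation ID, and counts
the invocations in a dispatch.
The function names are hypothetical; the arithmetic mirrors the usual
relationship between the corresponding built-in variables.

```python
def global_invocation_id(workgroup_id, local_size, local_id):
    """Per-axis: workgroup_id * local_size + local_id."""
    return tuple(w * s + l for w, s, l in zip(workgroup_id, local_size, local_id))


def total_invocations(group_count, local_size):
    """Number of invocations in a dispatch of group_count local workgroups."""
    total = 1
    for groups, size in zip(group_count, local_size):
        total *= groups * size
    return total


# An 8x8x1 local workgroup size, dispatched as 4x2x1 workgroups.
gid = global_invocation_id((2, 0, 0), (8, 8, 1), (3, 5, 0))
count = total_invocations((4, 2, 1), (8, 8, 1))
```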
842
843
844ifdef::VK_NV_ray_tracing,VK_KHR_ray_tracing_pipeline[]
845[[shaders-raytracing-shaders]]
846[[shaders-ray-generation]]
847== Ray Generation Shaders
848
849A ray generation shader is similar to a compute shader.
850Its main purpose is to execute ray tracing queries using code:OpTraceRayKHR
851instructions and process the results.
852
853
854[[shaders-ray-generation-execution]]
855=== Ray Generation Shader Execution
856
857One ray generation shader is executed per ray tracing dispatch.
858Its location in the shader binding table (see <<shader-binding-table,Shader
859Binding Table>> for details) is passed directly into
860ifdef::VK_KHR_ray_tracing_pipeline[]
861flink:vkCmdTraceRaysKHR using the pname:pRaygenShaderBindingTable parameter
862endif::VK_KHR_ray_tracing_pipeline[]
ifdef::VK_KHR_ray_tracing_pipeline+VK_NV_ray_tracing[or]
864ifdef::VK_NV_ray_tracing[]
865flink:vkCmdTraceRaysNV using the pname:raygenShaderBindingTableBuffer and
866pname:raygenShaderBindingOffset parameters
867endif::VK_NV_ray_tracing[]
868.
869
870
871[[shaders-intersection]]
872== Intersection Shaders
873
874Intersection shaders enable the implementation of arbitrary, application
875defined geometric primitives.
876An intersection shader for a primitive is executed whenever its axis-aligned
877bounding box is hit by a ray.
878
879Like other ray tracing shader domains, an intersection shader operates on a
880single ray at a time.
881It also operates on a single primitive at a time.
882It is therefore the purpose of an intersection shader to compute the
883ray-primitive intersections and report them.
884To report an intersection, the shader calls the code:OpReportIntersectionKHR
885instruction.
886
An intersection shader communicates with any-hit and closest hit shaders by
generating attribute values that they can: read.
889Intersection shaders cannot: read or modify the ray payload.
890
891
892[[shaders-intersection-execution]]
=== Intersection Shader Execution

The order in which intersections are found along a ray, and therefore the
895order in which intersection shaders are executed, is unspecified.
896
897The intersection shader of the closest AABB which intersects the ray is
898guaranteed to be executed at some point during traversal, unless the ray is
899forcibly terminated.
900
901
902[[shaders-any-hit]]
903== Any-Hit Shaders
904
905The any-hit shader is executed after the intersection shader reports an
906intersection that lies within the current [eq]#[t~min~,t~max~]# of the ray.
907The main use of any-hit shaders is to programmatically decide whether or not
908an intersection will be accepted.
909The intersection will be accepted unless the shader calls the
910code:OpIgnoreIntersectionKHR instruction.
911Any-hit shaders have read-only access to the attributes generated by the
912corresponding intersection shader, and can: read or modify the ray payload.
913
914
915[[shaders-any-hit-execution]]
916=== Any-Hit Shader Execution
917
918The order in which intersections are found along a ray, and therefore the
919order in which any-hit shaders are executed, is unspecified.
920
921The any-hit shader of the closest hit is guaranteed to be executed at some
922point during traversal, unless the ray is forcibly terminated.
923
924
925[[shaders-closest-hit]]
926== Closest Hit Shaders
927
928Closest hit shaders have read-only access to the attributes generated by the
929corresponding intersection shader, and can: read or modify the ray payload.
930They also have access to a number of system-generated values.
931Closest hit shaders can: call code:OpTraceRayKHR to recursively trace rays.
932
933
934[[shaders-closest-hit-execution]]
935=== Closest Hit Shader Execution
936
937Exactly one closest hit shader is executed when traversal is finished and an
938intersection has been found and accepted.
939
940
941[[shaders-miss]]
942== Miss Shaders
943
944Miss shaders can: access the ray payload and can: trace new rays through the
945code:OpTraceRayKHR instruction, but cannot: access attributes since they are
946not associated with an intersection.
947
948
949[[shaders-miss-execution]]
950=== Miss Shader Execution
951
952A miss shader is executed instead of a closest hit shader if no intersection
953was found during traversal.
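
The interaction between any-hit, closest hit, and miss shaders can be
summarized with a deliberately simplified Python model.
It assumes that candidates arrive in arbitrary order, that the closed interval
[eq]#[t~min~,t~max~]# is used as written above, and that accepting a hit
shortens the interval to that hit; ray flags, opacity, and forced termination
are ignored.
The function `resolve_ray` and its `any_hit` callback are hypothetical names.

```python
def resolve_ray(candidates, t_min, t_max, any_hit=None):
    """Decide whether a closest hit or a miss shader would run.

    candidates: iterable of (t, primitive_id), in unspecified order.
    any_hit: optional callable(t, primitive_id) -> bool; returning False
             models the shader ignoring the intersection.
    """
    closest = None
    for t, primitive_id in candidates:
        if not (t_min <= t <= t_max):
            continue  # outside the current ray interval: any-hit is not run
        if any_hit is not None and not any_hit(t, primitive_id):
            continue  # intersection ignored by the any-hit shader
        closest = (t, primitive_id)
        t_max = t  # assumption: an accepted hit shortens the interval
    if closest is not None:
        return ("closest_hit", closest)
    return ("miss", None)


result = resolve_ray([(5.0, "a"), (2.0, "b"), (9.0, "c")], 0.0, 10.0)
```

In this model exactly one of the two outcomes is produced per ray, matching
the rule that a miss shader runs instead of a closest hit shader.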
954
955
956[[shaders-callable]]
957== Callable Shaders
958
959Callable shaders can: access a callable payload that works similarly to ray
960payloads to do subroutine work.
961
962
963[[shaders-callable-execution]]
964=== Callable Shader Execution
965
966A callable shader is executed by calling code:OpExecuteCallableKHR from an
967allowed shader stage.
968
969endif::VK_NV_ray_tracing,VK_KHR_ray_tracing_pipeline[]
970
971
972[[shaders-interpolation-decorations]]
== Interpolation Decorations
974
975Variables in the code:Input storage class in a fragment shader's interface
976are interpolated from the values specified by the primitive being
977rasterized.
978
979[NOTE]
980.Note
981====
982Interpolation decorations can be present on input and output variables in
983pre-rasterization shaders but have no effect on the interpolation performed.
984ifdef::VK_EXT_graphics_pipeline_libraries[]
985However, when linking graphics pipeline libraries, if the
986<<limits-graphicsPipelineLibraryIndependentInterpolationDecoration,
987pname:graphicsPipelineLibraryIndependentInterpolationDecoration>> limit is
988not supported, interpolation qualifiers do need to match between the
989fragment shader input and the last pre-rasterization shader output.
990endif::VK_EXT_graphics_pipeline_libraries[]
991====
992
993An undecorated input variable will be interpolated with perspective-correct
994interpolation according to the primitive type being rasterized.
995<<line_perspective_interpolation,Lines>> and
996<<triangle_perspective_interpolation,polygons>> are interpolated in the same
997way as the primitive's clip coordinates.
998If the code:NoPerspective decoration is present, linear interpolation is
999instead used for <<line_linear_interpolation,lines>> and
1000<<triangle_linear_interpolation,polygons>>.
1001For points, as there is only a single vertex, input values are never
1002interpolated and instead take the value written for the single vertex.
1003
1004If the code:Flat decoration is present on an input variable, the value is
1005not interpolated, and instead takes its value directly from the
1006<<vertexpostproc-flatshading,provoking vertex>>.
1007Fragment shader inputs that are signed or unsigned integers, integer
1008vectors, or any double-precision floating-point type must: be decorated with
1009code:Flat.
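
The difference between the three behaviors can be sketched for a triangle in
Python.
The sketch assumes the usual barycentric formulation, with barycentric weights
`bary`, per-vertex values `values`, and per-vertex clip-space `w` components
`w_clip`; the function name, the mode strings, and the `provoking` index
parameter are all hypothetical.

```python
def interpolate(values, bary, w_clip, mode="perspective", provoking=0):
    """Interpolate one scalar attribute across a triangle.

    mode: "perspective" (undecorated), "noperspective", or "flat".
    provoking: index of the provoking vertex, used only for "flat".
    """
    if mode == "flat":
        return values[provoking]  # no interpolation at all
    if mode == "noperspective":
        return sum(a * f for a, f in zip(bary, values))
    # Perspective-correct: weight each term by 1/w, then renormalize.
    numerator = sum(a * f / w for a, f, w in zip(bary, values, w_clip))
    denominator = sum(a / w for a, w in zip(bary, w_clip))
    return numerator / denominator


values = [0.0, 4.0, 8.0]
bary = [0.5, 0.5, 0.0]
w_clip = [1.0, 3.0, 1.0]

linear = interpolate(values, bary, w_clip, mode="noperspective")
perspective = interpolate(values, bary, w_clip, mode="perspective")
flat = interpolate(values, bary, w_clip, mode="flat", provoking=0)
```

With equal `w` at every vertex the perspective-correct and linear results
coincide; they differ only when `w` varies across the primitive.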
1010
1011Interpolation of input variables is performed at an implementation-defined
1012position within the fragment area being shaded.
1013The position is further constrained as follows:
1014
1015  * If the code:Centroid decoration is used, the interpolation position used
1016    for the variable must: also fall within the bounds of the primitive
1017    being rasterized.
1018  * If the code:Sample decoration is used, the interpolation position used
1019    for the variable must: be at the position of the sample being shaded by
1020    the current fragment shader invocation.
1021  * If a sample count of 1 is used, the interpolation position must: be at
1022    the center of the fragment area.
1023
1024[NOTE]
1025.Note
1026====
1027As code:Centroid restricts the possible interpolation position to the
1028covered area of the primitive, the position can be forced to vary between
1029neighboring fragments when it otherwise would not.
1030Derivatives calculated based on these differing locations can produce
1031inconsistent results compared to undecorated inputs.
1032It is recommended that input variables used in derivative calculations are
1033not decorated with code:Centroid.
1034====
1035
1036ifdef::VK_NV_fragment_shader_barycentric,VK_KHR_fragment_shader_barycentric[]
1037[[shaders-interpolation-decorations-pervertexkhr]]
1038If the code:PerVertexKHR decoration is present on an input variable, the
1039value is not interpolated, and instead values from all input vertices are
1040available in an array.
1041Each index of the array corresponds to one of the vertices of the primitive
1042that produced the fragment.
1043endif::VK_NV_fragment_shader_barycentric,VK_KHR_fragment_shader_barycentric[]
1044
1045ifdef::VK_AMD_shader_explicit_vertex_parameter[]
1046If the code:CustomInterpAMD decoration is present on an input variable, the
1047value cannot: be accessed directly; instead the extended instruction
1048code:InterpolateAtVertexAMD must: be used to obtain values from the input
1049vertices.
1050endif::VK_AMD_shader_explicit_vertex_parameter[]
1051
1052
1053[[shaders-staticuse]]
1054== Static Use
1055
1056A SPIR-V module declares a global object in memory using the code:OpVariable
1057instruction, which results in a pointer code:x to that object.
1058A specific entry point in a SPIR-V module is said to _statically use_ that
object if that entry point's call tree contains a function containing an
instruction with code:x as an code:id operand.
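
The definition amounts to a reachability check over the call tree, which the
following Python sketch illustrates.
It assumes a simplified module representation: `calls` maps each function to
the functions it calls, and `operands` maps each function to the set of ids
its instructions use as operands.
All names here are hypothetical.

```python
def statically_uses(entry_point, calls, operands, variable):
    """True if any function in entry_point's call tree references variable."""
    seen = set()
    stack = [entry_point]
    while stack:
        function = stack.pop()
        if function in seen:
            continue
        seen.add(function)
        if variable in operands.get(function, ()):
            return True
        stack.extend(calls.get(function, ()))
    return False


calls = {"main": {"helper"}, "helper": set(), "unused": set()}
operands = {"helper": {"%x"}, "unused": {"%y"}}

uses_x = statically_uses("main", calls, operands, "%x")
uses_y = statically_uses("main", calls, operands, "%y")
```

The check is purely static: it does not consider whether the instruction is
ever reached at run time.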
1061
1062Static use is not used to control the behavior of variables with code:Input
1063and code:Output storage.
1064The effects of those variables are applied based only on whether they are
1065present in a shader entry point's interface.
1066
1067
1068[[shaders-scope]]
1069== Scope
1070
1071A _scope_ describes a set of shader invocations, where each such set is a
1072_scope instance_.
1073Each invocation belongs to one or more scope instances, but belongs to no
1074more than one scope instance for each scope.
1075
1076The operations available between invocations in a given scope instance vary,
1077with smaller scopes generally able to perform more operations, and with
1078greater efficiency.
1079
1080
1081[[shaders-scope-cross-device]]
1082=== Cross Device
1083
1084All invocations executed in a Vulkan instance fall into a single _cross
1085device scope instance_.
1086
1087Whilst the code:CrossDevice scope is defined in SPIR-V, it is disallowed in
1088Vulkan.
1089API <<synchronization, synchronization>> commands can: be used to
1090communicate between devices.
1091
1092
1093[[shaders-scope-device]]
1094=== Device
1095
1096All invocations executed on a single device form a _device scope instance_.
1097
1098ifdef::VK_VERSION_1_2,VK_KHR_vulkan_memory_model[]
1099If the <<features-vulkanMemoryModel, pname:vulkanMemoryModel>> and
1100<<features-vulkanMemoryModelDeviceScope,
1101pname:vulkanMemoryModelDeviceScope>> features are enabled, this scope is
1102represented in SPIR-V by the code:Device code:Scope, which can: be used as a
1103code:Memory code:Scope for barrier and atomic operations.
1104
1105ifdef::VK_KHR_shader_clock[]
1106If both the <<features-shaderDeviceClock, pname:shaderDeviceClock>> and
1107<<features-vulkanMemoryModelDeviceScope,
1108pname:vulkanMemoryModelDeviceScope>> features are enabled, using the
1109code:Device code:Scope with the code:OpReadClockKHR instruction will read
1110from a clock that is consistent across invocations in the same device scope
1111instance.
1112endif::VK_KHR_shader_clock[]
1113endif::VK_VERSION_1_2,VK_KHR_vulkan_memory_model[]
1114
1115There is no method to synchronize the execution of these invocations within
1116SPIR-V, and this can: only be done with API synchronization primitives.
1117
1118ifdef::VK_VERSION_1_1,VK_KHR_device_group[]
1119Invocations executing on different devices in a device group operate in
1120separate device scope instances.
1121endif::VK_VERSION_1_1,VK_KHR_device_group[]
1122
1123ifndef::VK_VERSION_1_2,VK_KHR_vulkan_memory_model[]
1124The scope only extends to the queue family, not the whole device.
1125endif::VK_VERSION_1_2,VK_KHR_vulkan_memory_model[]
1126
1127
1128[[shaders-scope-queue-family]]
1129=== Queue Family
1130
1131Invocations executed by queues in a given queue family form a _queue family
1132scope instance_.
1133
1134This scope is identified in SPIR-V as the
1135ifdef::VK_VERSION_1_2,VK_KHR_vulkan_memory_model[]
1136code:QueueFamily code:Scope if the <<features-vulkanMemoryModel,
1137pname:vulkanMemoryModel>> feature is enabled, or if not, the
1138endif::VK_VERSION_1_2,VK_KHR_vulkan_memory_model[]
1139code:Device code:Scope, which can: be used as a code:Memory code:Scope for
1140barrier and atomic operations.
1141
1142ifdef::VK_KHR_shader_clock[]
1143If the <<features-shaderDeviceClock, pname:shaderDeviceClock>> feature is
1144enabled,
1145ifdef::VK_VERSION_1_2,VK_KHR_vulkan_memory_model[]
1146but the <<features-vulkanMemoryModelDeviceScope,
1147pname:vulkanMemoryModelDeviceScope>> feature is not enabled,
1148endif::VK_VERSION_1_2,VK_KHR_vulkan_memory_model[]
1149using the code:Device code:Scope with the code:OpReadClockKHR instruction
1150will read from a clock that is consistent across invocations in the same
1151queue family scope instance.
1152endif::VK_KHR_shader_clock[]
1153
1154There is no method to synchronize the execution of these invocations within
1155SPIR-V, and this can: only be done with API synchronization primitives.
1156
1157Each invocation in a queue family scope instance must: be in the same
1158<<shaders-scope-device, device scope instance>>.
1159
1160
1161[[shaders-scope-command]]
1162=== Command
1163
1164Any shader invocations executed as the result of a single command such as
1165flink:vkCmdDispatch or flink:vkCmdDraw form a _command scope instance_.
1166For indirect drawing commands with pname:drawCount greater than one,
1167invocations from separate draws are in separate command scope instances.
1168ifdef::VK_KHR_ray_tracing_pipeline,VK_NV_ray_tracing[]
1169For ray tracing shaders, an invocation group is an implementation-dependent
1170subset of the set of shader invocations of a given shader stage which are
1171produced by a single trace rays command.
1172endif::VK_KHR_ray_tracing_pipeline,VK_NV_ray_tracing[]
1173
1174There is no specific code:Scope for communication across invocations in a
1175command scope instance.
1176As this has a clear boundary at the API level, coordination here can: be
1177performed in the API, rather than in SPIR-V.
1178
1179Each invocation in a command scope instance must: be in the same
<<shaders-scope-queue-family, queue family scope instance>>.
1181
1182For shaders without defined <<shaders-scope-workgroup, workgroups>>, this
1183set of invocations forms an _invocation group_ as defined in the
1184<<spirv-spec,SPIR-V specification>>.
1185
1186
1187[[shaders-scope-primitive]]
1188=== Primitive
1189
1190Any fragment shader invocations executed as the result of rasterization of a
1191single primitive form a _primitive scope instance_.
1192
1193There is no specific code:Scope for communication across invocations in a
1194primitive scope instance.
1195
1196Any generated <<shaders-helper-invocations, helper invocations>> are
1197included in this scope instance.
1198
1199Each invocation in a primitive scope instance must: be in the same
1200<<shaders-scope-command, command scope instance>>.
1201
1202Any input variables decorated with code:Flat are uniform within a primitive
1203scope instance.
1204
1205
1206// intentionally no VK_NV_ray_tracing here since this scope does not exist there
1207ifdef::VK_KHR_ray_tracing_pipeline[]
1208[[shaders-scope-shadercall]]
1209=== Shader Call
1210
1211Any <<shader-call-related,shader-call-related>> invocations that are
1212executed in one or more ray tracing execution models form a _shader call
1213scope instance_.
1214
1215The code:ShaderCallKHR code:Scope can be used as code:Memory code:Scope for
1216barrier and atomic operations.
1217
1218Each invocation in a shader call scope instance must: be in the same
1219<<shaders-scope-queue-family, queue family scope instance>>.
1220endif::VK_KHR_ray_tracing_pipeline[]
1221
1222
1223[[shaders-scope-workgroup]]
1224=== Workgroup
1225
1226A _local workgroup_ is a set of invocations that can synchronize and share
1227data with each other using memory in the code:Workgroup storage class.
1228
1229The code:Workgroup code:Scope can be used as both an code:Execution
1230code:Scope and code:Memory code:Scope for barrier and atomic operations.
1231
1232Each invocation in a local workgroup must: be in the same
1233<<shaders-scope-command, command scope instance>>.
1234
1235Only
1236ifdef::VK_NV_mesh_shader,VK_EXT_mesh_shader[]
1237task, mesh, and
1238endif::VK_NV_mesh_shader,VK_EXT_mesh_shader[]
compute shaders have defined workgroups; other shader types cannot: use
workgroup functionality.
1241For shaders that have defined workgroups, this set of invocations forms an
1242_invocation group_ as defined in the <<spirv-spec,SPIR-V specification>>.
1243
1244
1245ifdef::VK_VERSION_1_1[]
1246[[shaders-scope-subgroup]]
1247=== Subgroup
1248
1249A _subgroup_ (see the subsection "`Control Flow`" of section 2 of the SPIR-V
12501.3 Revision 1 specification) is a set of invocations that can synchronize
1251and share data with each other efficiently.
1252
1253The code:Subgroup code:Scope can be used as both an code:Execution
1254code:Scope and code:Memory code:Scope for barrier and atomic operations.
1255Other <<VkSubgroupFeatureFlagBits, subgroup features>> allow the use of
1256<<shaders-group-operations, group operations>> with subgroup scope.
1257
1258ifdef::VK_KHR_shader_clock[]
1259If the <<features-shaderSubgroupClock, pname:shaderSubgroupClock>> feature
1260is enabled, using the code:Subgroup code:Scope with the code:OpReadClockKHR
1261instruction will read from a clock that is consistent across invocations in
1262the same subgroup.
1263endif::VK_KHR_shader_clock[]
1264
1265For <<shaders-scope-workgroup, shaders that have defined workgroups>>, each
1266invocation in a subgroup must: be in the same <<shaders-scope-workgroup,
1267local workgroup>>.
1268
1269In other shader stages, each invocation in a subgroup must: be in the same
1270<<shaders-scope-device, device scope instance>>.
1271
1272Only <<limits-subgroup-supportedStages, shader stages that support subgroup
1273operations>> have defined subgroups.
1274endif::VK_VERSION_1_1[]
1275
1276
1277[[shaders-scope-quad]]
1278=== Quad
1279
1280A _quad scope instance_ is formed of four shader invocations.
1281
In a fragment shader, a quad scope instance is formed of invocations in
neighboring framebuffer locations [eq]#(x~i~, y~i~)#, where:
1284
1285  * [eq]#i# is the index of the invocation within the scope instance.
1286  * [eq]#w# and [eq]#h# are the number of pixels the fragment covers in the
1287    [eq]#x# and [eq]#y# axes.
1288  * [eq]#w# and [eq]#h# are identical for all participating invocations.
1289  * [eq]#(x~0~) = (x~1~ - w) = (x~2~) = (x~3~ - w)#
1290  * [eq]#(y~0~) = (y~1~) = (y~2~ - h) = (y~3~ - h)#
1291  * Each invocation has the same layer and sample indices.
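
Solving the equalities above for each index gives the four framebuffer
locations of a fragment shader quad, as the following Python sketch shows.
The function name `quad_locations` is hypothetical.

```python
def quad_locations(x0, y0, w=1, h=1):
    """Locations (x_i, y_i) for i = 0..3, given invocation 0 and fragment size."""
    return [
        (x0, y0),          # i = 0
        (x0 + w, y0),      # i = 1: x1 - w = x0, y1 = y0
        (x0, y0 + h),      # i = 2: x2 = x0, y2 - h = y0
        (x0 + w, y0 + h),  # i = 3: x3 - w = x0, y3 - h = y0
    ]


single_pixel_quad = quad_locations(4, 6)
coarse_quad = quad_locations(4, 6, w=2, h=2)
```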
1292
1293ifdef::VK_NV_compute_shader_derivatives[]
In a compute shader, if the code:DerivativeGroupQuadsNV execution mode is
specified, a quad scope instance is formed of invocations with adjacent
local invocation IDs [eq]#(x~i~, y~i~)#, where:
1297
1298  * [eq]#i# is the index of the invocation within the quad scope instance.
1299  * [eq]#(x~0~) = (x~1~ - 1) = (x~2~) = (x~3~ - 1)#
1300  * [eq]#(y~0~) = (y~1~) = (y~2~ - 1) = (y~3~ - 1)#
1301  * [eq]#x~0~# and [eq]#y~0~# are integer multiples of 2.
1302  * Each invocation has the same [eq]#z# coordinate.
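
Under these constraints, the quad containing a given local invocation ID can
be found by rounding [eq]#x# and [eq]#y# down to a multiple of 2.
The Python sketch below illustrates this; `quad_of_local_id` is a hypothetical
helper name.

```python
def quad_of_local_id(local_id):
    """Return ((x0, y0, z), i): the quad's first invocation and this one's index."""
    x, y, z = local_id
    x0 = x - x % 2  # x0 is an integer multiple of 2
    y0 = y - y % 2  # y0 is an integer multiple of 2
    index = (x - x0) + 2 * (y - y0)
    return (x0, y0, z), index


origin, index = quad_of_local_id((5, 2, 0))
```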
1303
In a compute shader, if the code:DerivativeGroupLinearNV execution mode is
specified, a quad scope instance is formed of invocations with adjacent
local invocation indices [eq]#(l~i~)#, where:
1307
1308  * [eq]#i# is the index of the invocation within the quad scope instance.
1309  * [eq]#(l~0~) = (l~1~ - 1) = (l~2~ - 2) = (l~3~ - 3)#
1310  * [eq]#l~0~# is an integer multiple of 4.
1311
1312endif::VK_NV_compute_shader_derivatives[]
1313
1314ifdef::VK_VERSION_1_1[]
In all shaders, a quad scope instance is formed of invocations with
adjacent subgroup invocation indices [eq]#(s~i~)#, where:
1317
1318  * [eq]#i# is the index of the invocation within the quad scope instance.
1319  * [eq]#(s~0~) = (s~1~ - 1) = (s~2~ - 2) = (s~3~ - 3)#
1320  * [eq]#s~0~# is an integer multiple of 4.
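
In other words, consecutive runs of four subgroup invocation indices form a
quad.
A minimal Python sketch, with the hypothetical helper name `quad_members`:

```python
def quad_members(s):
    """Return (members, i) for the quad containing subgroup invocation index s."""
    s0 = s - s % 4  # s0 is an integer multiple of 4
    return [s0, s0 + 1, s0 + 2, s0 + 3], s - s0


members, position = quad_members(6)
```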
1321
1322Each invocation in a quad scope instance must: be in the same
1323<<shaders-scope-subgroup, subgroup>>.
1324endif::VK_VERSION_1_1[]
1325
1326ifndef::VK_VERSION_1_1[]
1327The specific set of invocations that make up a quad scope instance in other
1328shader stages is undefined:.
1329endif::VK_VERSION_1_1[]
1330
1331In a fragment shader, each invocation in a quad scope instance must: be in
1332the same <<shaders-scope-primitive, primitive scope instance>>.
1333
1334ifndef::VK_VERSION_1_1[]
1335For <<shaders-scope-workgroup, shaders that have defined workgroups>>, each
1336invocation in a quad scope instance must: be in the same
1337<<shaders-scope-workgroup, local workgroup>>.
1338
1339In other shader stages, each invocation in a quad scope instance must: be in
1340the same <<shaders-scope-device, device scope instance>>.
1341endif::VK_VERSION_1_1[]
1342
1343Fragment
1344ifdef::VK_NV_compute_shader_derivatives,VK_VERSION_1_1[]
1345and compute
1346endif::VK_NV_compute_shader_derivatives,VK_VERSION_1_1[]
1347shaders have defined quad scope instances.
1348ifdef::VK_VERSION_1_1[]
1349If the <<limits-subgroup-quadOperationsInAllStages,
1350pname:quadOperationsInAllStages>> limit is supported, any
1351<<limits-subgroup-supportedStages, shader stages that support subgroup
1352operations>> also have defined quad scope instances.
1353endif::VK_VERSION_1_1[]
1354
1355
1356ifdef::VK_EXT_fragment_shader_interlock[]
1357[[shaders-scope-fragment-interlock]]
1358=== Fragment Interlock
1359
1360A _fragment interlock scope instance_ is formed of fragment shader
1361invocations based on their framebuffer locations [eq]#(x,y,layer,sample)#,
1362executed by commands inside a single <<renderpass,subpass>>.
1363
1364The specific set of invocations included varies based on the execution mode
1365as follows:
1366
1367  * If the code:SampleInterlockOrderedEXT or
1368    code:SampleInterlockUnorderedEXT execution modes are used, only
1369    invocations with identical framebuffer locations
1370    [eq]#(x,y,layer,sample)# are included.
1371  * If the code:PixelInterlockOrderedEXT or code:PixelInterlockUnorderedEXT
1372    execution modes are used, fragments with different sample ids are also
1373    included.
1374ifdef::VK_NV_shading_rate_image,VK_KHR_fragment_shading_rate[]
1375  * If the code:ShadingRateInterlockOrderedEXT or
1376    code:ShadingRateInterlockUnorderedEXT execution modes are used,
    fragments from neighboring framebuffer locations are also included.
1378    The
1379ifdef::VK_NV_shading_rate_image[<<primsrast-shading-rate-image, shading rate image>>]
1380ifdef::VK_KHR_fragment_shading_rate+VK_NV_shading_rate_image[or]
1381ifdef::VK_KHR_fragment_shading_rate[<<primsrast-fragment-shading-rate, fragment shading rate>>]
1382    determines these fragments.
1383endif::VK_NV_shading_rate_image,VK_KHR_fragment_shading_rate[]
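
For the sample and pixel interlock modes, the grouping can be sketched in
Python as a choice of grouping key.
The sketch assumes all invocations come from commands in the same subpass, and
it leaves out the shading rate modes because their grouping depends on the
shading rate in use.
The helper names are hypothetical; the mode strings are the execution mode
names listed above.

```python
from collections import defaultdict

SAMPLE_MODES = {"SampleInterlockOrderedEXT", "SampleInterlockUnorderedEXT"}
PIXEL_MODES = {"PixelInterlockOrderedEXT", "PixelInterlockUnorderedEXT"}


def interlock_key(location, mode):
    """Key identifying the fragment interlock scope instance of a location."""
    x, y, layer, sample = location
    if mode in SAMPLE_MODES:
        return (x, y, layer, sample)  # identical locations only
    if mode in PIXEL_MODES:
        return (x, y, layer)          # different sample ids share an instance
    raise ValueError(f"mode not modeled in this sketch: {mode}")


def interlock_instances(locations, mode):
    """Group (x, y, layer, sample) locations into scope instances."""
    groups = defaultdict(list)
    for location in locations:
        groups[interlock_key(location, mode)].append(location)
    return dict(groups)


locations = [(3, 4, 0, 0), (3, 4, 0, 1), (5, 4, 0, 0)]
per_sample = interlock_instances(locations, "SampleInterlockOrderedEXT")
per_pixel = interlock_instances(locations, "PixelInterlockOrderedEXT")
```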
1384
1385Only fragment shaders with one of the above execution modes have defined
1386fragment interlock scope instances.
1387
1388There is no specific code:Scope value for communication across invocations
1389in a fragment interlock scope instance.
1390However, this is implicitly used as a memory scope by
1391code:OpBeginInvocationInterlockEXT and code:OpEndInvocationInterlockEXT.
1392
1393Each invocation in a fragment interlock scope instance must: be in the same
1394<<shaders-scope-queue-family, queue family scope instance>>.
1395endif::VK_EXT_fragment_shader_interlock[]
1396
1397
1398[[shaders-scope-invocation]]
1399=== Invocation
1400
1401The smallest _scope_ is a single invocation; this is represented by the
1402code:Invocation code:Scope in SPIR-V.
1403
1404Fragment shader invocations must: be in a <<shaders-scope-primitive,
1405primitive scope instance>>.
1406
1407ifdef::VK_EXT_fragment_shader_interlock[]
1408Invocations in <<shaders-scope-fragment-interlock, fragment shaders that
1409have a defined fragment interlock scope>> must: be in a
1410<<shaders-scope-fragment-interlock, fragment interlock scope instance>>.
1411endif::VK_EXT_fragment_shader_interlock[]
1412
1413Invocations in <<shaders-scope-workgroup, shaders that have defined
1414workgroups>> must: be in a <<shaders-scope-workgroup, local workgroup>>.
1415
1416ifdef::VK_VERSION_1_1[]
1417Invocations in <<shaders-scope-subgroup, shaders that have a defined
1418subgroup scope>> must: be in a <<shaders-scope-subgroup, subgroup>>.
1419endif::VK_VERSION_1_1[]
1420
1421Invocations in <<shaders-scope-quad, shaders that have a defined quad
1422scope>> must: be in a <<shaders-scope-quad, quad scope instance>>.
1423
1424All invocations in all stages must: be in a <<shaders-scope-command,command
1425scope instance>>.
1426
1427
1428ifdef::VK_VERSION_1_1[]
1429[[shaders-group-operations]]
1430== Group Operations
1431
_Group operations_ are executed by multiple invocations within a
<<shaders-scope, scope instance>>, with each invocation involved in
calculating the result.
1435This provides a mechanism for efficient communication between invocations in
1436a particular scope instance.

Group operations all take a code:Scope defining the desired
<<shaders-scope,scope instance>> to operate within.
Only the code:Subgroup scope can: be used for these operations; the
<<limits-subgroupSupportedOperations, pname:subgroupSupportedOperations>>
limit defines which types of operation can: be used.


[[shaders-group-operations-basic]]
=== Basic Group Operations

Basic group operations include the use of code:OpGroupNonUniformElect,
code:OpControlBarrier, code:OpMemoryBarrier, and atomic operations.

code:OpGroupNonUniformElect can: be used to choose a single invocation to
perform a task for the whole group.
Only the invocation with the lowest id in the group will return code:true.
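
As an informal illustration, the election rule can be modeled on the host in
C.
This is only a sketch under stated assumptions: the `elect` helper and the
`active_mask` bitmask (bit _i_ set when invocation _i_ is active) are
hypothetical names, a group is assumed to hold at most 32 invocations, and
the election is assumed to consider active invocations only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Returns true only for the active invocation with the lowest id.
 * Hypothetical host-side model; not how an implementation is built. */
static bool elect(uint32_t active_mask, unsigned id)
{
    assert(id < 32u);
    if ((active_mask & ((uint32_t)1u << id)) == 0u) {
        return false; /* an inactive invocation is never elected */
    }
    /* Isolate the lowest set bit and compare it with this invocation's bit. */
    uint32_t lowest = active_mask & (~active_mask + 1u);
    return lowest == ((uint32_t)1u << id);
}
```

With an active mask of `0xC`, invocations 2 and 3 are active and only
invocation 2 is elected in this model.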

The <<memory-model,Memory Model>> appendix defines the operation of barriers
and atomics.


[[shaders-group-operations-vote]]
=== Vote Group Operations

The vote group operations allow invocations within a group to compare values
across a group.
The types of votes enabled are:

  * Do all active group invocations agree that an expression is true?
  * Do any active group invocations evaluate an expression to true?
  * Do all active group invocations have the same value of an expression?

[NOTE]
.Note
====
These operations are useful in combination with control flow in that they
allow developers to check whether conditions match across the group and
choose potentially faster code-paths in these cases.
====
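
The three vote types can be sketched as a host-side C model.
The `Invocation` structure and the `vote_*` helpers below are hypothetical
names introduced for illustration; the sketch assumes that inactive
invocations are simply skipped and that a vote over no active invocations
is vacuously true for the "all" forms.

```c
#include <stdbool.h>
#include <stddef.h>

/* One entry per invocation in the group (hypothetical model). */
typedef struct {
    bool active; /* inactive invocations do not take part in the vote */
    int  value;  /* the expression being compared */
} Invocation;

/* Do all active invocations evaluate the expression to true (non-zero)? */
static bool vote_all(const Invocation *g, size_t n)
{
    for (size_t i = 0; i < n; ++i) {
        if (g[i].active && g[i].value == 0) return false;
    }
    return true;
}

/* Does any active invocation evaluate the expression to true? */
static bool vote_any(const Invocation *g, size_t n)
{
    for (size_t i = 0; i < n; ++i) {
        if (g[i].active && g[i].value != 0) return true;
    }
    return false;
}

/* Do all active invocations hold the same value of the expression? */
static bool vote_all_equal(const Invocation *g, size_t n)
{
    bool have_first = false;
    int first = 0;
    for (size_t i = 0; i < n; ++i) {
        if (!g[i].active) continue;
        if (!have_first) { first = g[i].value; have_first = true; }
        else if (g[i].value != first) return false;
    }
    return true;
}
```

A shader might use the "all" form to take a cheaper code path only when
every active invocation in the group satisfies the same condition.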


[[shaders-group-operations-arithmetic]]
=== Arithmetic Group Operations

The arithmetic group operations allow invocations to perform scans and
reductions across a group.
The operators supported are add, mul, min, max, and, or, xor.

For reductions, every invocation in a group will obtain the cumulative
result of these operators applied to all values in the group.
For exclusive scans, each invocation in a group will obtain the cumulative
result of these operators applied to all values in invocations with a lower
index in the group.
Inclusive scans are identical to exclusive scans, except the cumulative
result includes the operator applied to the value in the current invocation.

The order in which these operators are applied is implementation-dependent.
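
The difference between a reduction, an exclusive scan, and an inclusive scan
can be shown with a small host-side C model using the add operator.
The function names are hypothetical, the group is represented as a plain
array indexed by invocation, all invocations are assumed active, and the
exclusive scan is assumed to start from the additive identity (zero).
The sketch applies the operator in index order, which is only one of the
orders an implementation is free to choose; with integers the result does
not depend on that order.

```c
#include <stddef.h>

/* Reduction: every invocation obtains the sum of all values in the group. */
static void group_reduce_add(const int *in, int *out, size_t n)
{
    int total = 0;
    for (size_t i = 0; i < n; ++i) total += in[i];
    for (size_t i = 0; i < n; ++i) out[i] = total;
}

/* Exclusive scan: invocation i obtains the sum of values at indices < i. */
static void group_exclusive_scan_add(const int *in, int *out, size_t n)
{
    int running = 0; /* identity of the add operator */
    for (size_t i = 0; i < n; ++i) {
        out[i] = running;
        running += in[i];
    }
}

/* Inclusive scan: as above, but the invocation's own value is included. */
static void group_inclusive_scan_add(const int *in, int *out, size_t n)
{
    int running = 0;
    for (size_t i = 0; i < n; ++i) {
        running += in[i];
        out[i] = running;
    }
}
```

For the input values 3, 1, 4, 1 the reduction yields 9 in every invocation,
the exclusive scan yields 0, 3, 4, 8, and the inclusive scan yields
3, 4, 8, 9.
The `in` and `out` arrays are assumed not to overlap.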


[[shaders-group-operations-ballot]]
=== Ballot Group Operations

The ballot group operations allow invocations to perform more complex votes
across the group.
The ballot functionality allows all invocations within a group to provide a
boolean value and receive, as a result, the boolean value that each
invocation provided.
The broadcast functionality allows values to be broadcast from an invocation
to all other invocations within the group.
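
A host-side C model can make the two ideas concrete.
In this sketch the ballot result is packed into a 32-bit mask, which assumes
a group of at most 32 invocations, and an inactive invocation is assumed to
contribute a zero bit.
`group_ballot` and `group_broadcast` are hypothetical helper names, not
SPIR-V instructions or API entry points.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Ballot: bit i of the result is set when invocation i is active and
 * provided a true boolean value. */
static uint32_t group_ballot(const bool *predicate, const bool *active,
                             size_t n)
{
    assert(n <= 32u); /* the sketch packs the result into 32 bits */
    uint32_t mask = 0u;
    for (size_t i = 0; i < n; ++i) {
        if (active[i] && predicate[i]) {
            mask |= (uint32_t)1u << i;
        }
    }
    return mask;
}

/* Broadcast: every invocation obtains the value held by `source`. */
static void group_broadcast(const int *in, int *out, size_t n, size_t source)
{
    assert(source < n);
    const int value = in[source];
    for (size_t i = 0; i < n; ++i) out[i] = value;
}
```

Every invocation receives the same mask, so each one can inspect what any
other invocation provided by testing the corresponding bit.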


[[shaders-group-operations-shuffle]]
=== Shuffle Group Operations

The shuffle group operations allow invocations to read values from other
invocations within a group.


[[shaders-group-operations-shuffle-relative]]
=== Shuffle Relative Group Operations

The shuffle relative group operations allow invocations to read values from
other invocations within the group relative to the current invocation in the
group.
The relative operations supported allow data to be shifted up and down
through the invocations within a group.
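
Shifting data up and down can be sketched in C as reads at a fixed offset
from the current invocation's index.
The helper names are hypothetical.
This section does not say what an invocation obtains when the offset points
outside the group, so the sketch makes an arbitrary choice, labeled in the
code: such an invocation keeps its own value.

```c
#include <stddef.h>

/* Shift up: invocation i reads the value held by invocation i - delta. */
static void group_shuffle_up(const int *in, int *out, size_t n, size_t delta)
{
    for (size_t i = 0; i < n; ++i) {
        /* Assumption: out-of-range reads fall back to the own value. */
        out[i] = (i >= delta) ? in[i - delta] : in[i];
    }
}

/* Shift down: invocation i reads the value held by invocation i + delta. */
static void group_shuffle_down(const int *in, int *out, size_t n, size_t delta)
{
    for (size_t i = 0; i < n; ++i) {
        /* Assumption: out-of-range reads fall back to the own value. */
        out[i] = (delta < n - i) ? in[i + delta] : in[i];
    }
}
```

With values 10, 20, 30, 40 and a delta of 1, shifting up gives
10, 10, 20, 30 and shifting down gives 20, 30, 40, 40 in this model.
The `in` and `out` arrays are assumed not to overlap.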


[[shaders-group-operations-clustered]]
=== Clustered Group Operations

The clustered group operations allow invocations to perform an operation
among partitions of a group, such that the operation is performed only
among the invocations within a partition.
The partitions for clustered group operations are consecutive power-of-two
size groups of invocations and the cluster size must: be known at pipeline
creation time.
The operations supported are add, mul, min, max, and, or, xor.
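
A clustered reduction can be modeled in C as an independent reduction over
each consecutive partition.
The `clustered_reduce_add` helper is a hypothetical name, all invocations
are assumed active, and the cluster size is passed as an ordinary argument
here even though a real shader must fix it by pipeline creation time.

```c
#include <assert.h>
#include <stddef.h>

/* Each invocation obtains the sum of the values in its own cluster only. */
static void clustered_reduce_add(const int *in, int *out, size_t n,
                                 size_t cluster_size)
{
    /* The cluster size must be a non-zero power of two. */
    assert(cluster_size != 0u && (cluster_size & (cluster_size - 1u)) == 0u);

    for (size_t base = 0; base < n; base += cluster_size) {
        size_t end = (n - base < cluster_size) ? n : base + cluster_size;
        int total = 0;
        for (size_t i = base; i < end; ++i) total += in[i];
        for (size_t i = base; i < end; ++i) out[i] = total;
    }
}
```

For the values 1 through 8 and a cluster size of 4, the first four
invocations obtain 10 and the last four obtain 26.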


[[shaders-quad-operations]]
== Quad Group Operations

Quad group operations (code:OpGroupNonUniformQuad*) are a specialized type
of <<shaders-group-operations, group operations>> that only operate on
<<shaders-scope-quad, quad scope instances>>.
Whilst these instructions do include a code:Scope parameter, this scope is
always overridden; only the <<shaders-scope-quad, quad scope instance>> is
included in its execution scope.

Fragment shaders that statically execute quad group operations must: launch
sufficient invocations to ensure their correct operation; additional
<<shaders-helper-invocations, helper invocations>> are launched for
framebuffer locations not covered by rasterized fragments if necessary.

The index used to select participating invocations is [eq]#i#, as described
for a <<shaders-scope-quad, quad scope instance>>, defined as the _quad
index_ in the <<spirv-spec,SPIR-V specification>>.

For code:OpGroupNonUniformQuadBroadcast this value is equal to code:Index.
For code:OpGroupNonUniformQuadSwap, it is equal to the implicit code:Index
used by each participating invocation.
endif::VK_VERSION_1_1[]


[[shaders-derivative-operations]]
== Derivative Operations

Derivative operations calculate the partial derivative for an expression
[eq]#P# as a function of an invocation's [eq]#x# and [eq]#y# coordinates.

Derivative operations operate on a set of invocations known as a _derivative
group_ as defined in the <<spirv-spec,SPIR-V specification>>.
A derivative group is equivalent to
ifdef::VK_NV_compute_shader_derivatives[]
the <<shaders-scope-quad, quad scope instance>> for a compute shader
invocation, or
endif::VK_NV_compute_shader_derivatives[]
the <<shaders-scope-primitive, primitive scope instance>> for a fragment
shader invocation.

Derivatives are calculated assuming that [eq]#P# is piecewise linear and
continuous within the derivative group.
All dynamic instances of explicit derivative instructions (code:OpDPdx*,
code:OpDPdy*, and code:OpFwidth*) must: be executed in control flow that is
uniform within a derivative group.
For other derivative operations, results are undefined: if a dynamic
instance is executed in control flow that is not uniform within the
derivative group.

Fragment shaders that statically execute derivative operations must: launch
sufficient invocations to ensure their correct operation; additional
<<shaders-helper-invocations, helper invocations>> are launched for
framebuffer locations not covered by rasterized fragments if necessary.

ifdef::VK_NV_compute_shader_derivatives[]
[NOTE]
.Note
====
In a compute shader, it is the application's responsibility to ensure that
sufficient invocations are launched.
====
endif::VK_NV_compute_shader_derivatives[]

Derivative operations calculate their results as the difference between the
results of [eq]#P# across invocations in the quad.
For fine derivative operations (code:OpDPdxFine and code:OpDPdyFine), the
values of [eq]#DPdx(P~i~)# are calculated as

  {empty}:: [eq]#DPdx(P~0~) = DPdx(P~1~) = P~1~ - P~0~#
  {empty}:: [eq]#DPdx(P~2~) = DPdx(P~3~) = P~3~ - P~2~#

and the values of [eq]#DPdy(P~i~)# are calculated as

  {empty}:: [eq]#DPdy(P~0~) = DPdy(P~2~) = P~2~ - P~0~#
  {empty}:: [eq]#DPdy(P~1~) = DPdy(P~3~) = P~3~ - P~1~#

where [eq]#i# is the index of each invocation as described in
<<shaders-scope-quad>>.
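
The fine derivative formulas above translate directly into a few lines of
C.
This is a host-side sketch only: `QuadDerivatives` and
`quad_derivatives_fine` are hypothetical names, and the array index plays
the role of the quad index [eq]#i#.

```c
/* Fine derivatives for the four invocations of a quad (hypothetical type). */
typedef struct {
    float dpdx[4];
    float dpdy[4];
} QuadDerivatives;

/* p[i] is the value of P in the invocation with quad index i. */
static QuadDerivatives quad_derivatives_fine(const float p[4])
{
    QuadDerivatives d;
    /* DPdx(P0) = DPdx(P1) = P1 - P0;  DPdx(P2) = DPdx(P3) = P3 - P2 */
    d.dpdx[0] = d.dpdx[1] = p[1] - p[0];
    d.dpdx[2] = d.dpdx[3] = p[3] - p[2];
    /* DPdy(P0) = DPdy(P2) = P2 - P0;  DPdy(P1) = DPdy(P3) = P3 - P1 */
    d.dpdy[0] = d.dpdy[2] = p[2] - p[0];
    d.dpdy[1] = d.dpdy[3] = p[3] - p[1];
    return d;
}
```

For quad values 1, 3, 2, 7 the x derivatives are 2, 2, 5, 5 and the y
derivatives are 1, 4, 1, 4, listed by quad index.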

Coarse derivative operations (code:OpDPdxCoarse and code:OpDPdyCoarse)
calculate their results in roughly the same manner, but may: only calculate
two values instead of four (one for each of [eq]#DPdx# and [eq]#DPdy#),
reusing the same result no matter the originating invocation.
If an implementation does this, it should: use the fine derivative
calculations described for [eq]#P~0~#.

[NOTE]
.Note
====
Derivative values are calculated between fragments rather than pixels.
If the fragment shader invocations involved in the calculation cover
multiple pixels, these operations cover a wider area, resulting in larger
derivative values.
This in turn will result in a coarser level of detail being selected for
image sampling operations using derivatives.

Applications may want to account for this when using multi-pixel fragments;
if pixel derivatives are desired, applications should use explicit
derivative operations and divide the results by the size of the fragment in
each dimension as follows:

  {empty}:: [eq]#DPdx(P~n~)' = DPdx(P~n~) / w#
  {empty}:: [eq]#DPdy(P~n~)' = DPdy(P~n~) / h#

where [eq]#w# and [eq]#h# are the size of the fragments in the quad, and
[eq]#DPdx(P~n~)'# and [eq]#DPdy(P~n~)'# are the pixel derivatives.
====

The results for code:OpDPdx and code:OpDPdy may: be calculated as either
fine or coarse derivatives, with implementations favouring the most
efficient approach.
Implementations must: choose coarse or fine consistently between the two.

Executing code:OpFwidthFine, code:OpFwidthCoarse, or code:OpFwidth is
equivalent to executing the corresponding code:OpDPdx* and code:OpDPdy*
instructions, taking the absolute value of the results, and summing them.
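
Expressed as a minimal C sketch, the equivalence reads as follows.
The helper names are hypothetical, and the two derivative values are assumed
to have been produced already by the matching fine, coarse, or unqualified
derivative operations.

```c
/* Absolute value without pulling in the math library. */
static float abs_f32(float x)
{
    return x < 0.0f ? -x : x;
}

/* Fwidth(P) = |DPdx(P)| + |DPdy(P)| */
static float fwidth_from_derivatives(float dpdx, float dpdy)
{
    return abs_f32(dpdx) + abs_f32(dpdy);
}
```

For example, derivatives of -2.0 in x and 0.5 in y give a width of 2.5.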

Executing an code:OpImage*Sample*ImplicitLod instruction is equivalent to
executing code:OpDPdx(code:Coordinate) and code:OpDPdy(code:Coordinate), and
passing the results as the code:Grad operands code:dx and code:dy.

[NOTE]
.Note
====
It is expected that using the code:ImplicitLod variants of sampling
functions will be substantially more efficient than using the
code:ExplicitLod variants with explicitly generated derivatives.
====


[[shaders-helper-invocations]]
== Helper Invocations

When performing <<shaders-derivative-operations, derivative>>
ifdef::VK_VERSION_1_1[]
or <<shaders-quad-operations, quad group>>
endif::VK_VERSION_1_1[]
operations in a fragment shader, additional invocations may: be spawned in
order to ensure correct results.
These additional invocations are known as _helper invocations_ and can: be
identified by a non-zero value in the code:HelperInvocation built-in.
Stores and atomics performed by helper invocations must: not have any effect
on memory except for the code:Function, code:Private and code:Output storage
classes, and values returned by atomic instructions in helper invocations
are undefined:.

[NOTE]
.Note
====
While stores to the code:Output storage class have an effect even in helper
invocations, this does not mean that helper invocations have an effect on
the framebuffer.
code:Output variables in fragment shaders can be read from as well, and they
behave more like code:Private variables for the duration of the shader
invocation.
====

For <<shaders-group-operations, group operations>> other than
<<shaders-derivative-operations, derivative>>
ifdef::VK_VERSION_1_1[]
and <<shaders-quad-operations, quad group>>
endif::VK_VERSION_1_1[]
operations, helper invocations may: be treated as inactive even if they
would be considered otherwise active.

ifdef::VK_VERSION_1_3,VK_EXT_shader_demote_to_helper_invocation[]
Helper invocations may: become permanently inactive if all invocations in a
quad scope instance become helper invocations.
endif::VK_VERSION_1_3,VK_EXT_shader_demote_to_helper_invocation[]


ifdef::VK_NV_cooperative_matrix[]
== Cooperative Matrices

A _cooperative matrix_ type is a SPIR-V type where the storage for and
computations performed on the matrix are spread across the invocations in a
scope instance.
These types give the implementation freedom in how to optimize matrix
multiplies.

SPIR-V defines the types and instructions, but does not specify rules about
what sizes/combinations are valid, and it is expected that different
implementations may: support different sizes.

[open,refpage='vkGetPhysicalDeviceCooperativeMatrixPropertiesNV',desc='Returns properties describing what cooperative matrix types are supported',type='protos']
--
To enumerate the supported cooperative matrix types and operations, call:

include::{generated}/api/protos/vkGetPhysicalDeviceCooperativeMatrixPropertiesNV.adoc[]

  * pname:physicalDevice is the physical device.
  * pname:pPropertyCount is a pointer to an integer related to the number of
    cooperative matrix properties available or queried.
  * pname:pProperties is either `NULL` or a pointer to an array of
    slink:VkCooperativeMatrixPropertiesNV structures.

If pname:pProperties is `NULL`, then the number of cooperative matrix
properties available is returned in pname:pPropertyCount.
Otherwise, pname:pPropertyCount must: point to a variable set by the user to
the number of elements in the pname:pProperties array, and on return the
variable is overwritten with the number of structures actually written to
pname:pProperties.
If pname:pPropertyCount is less than the number of cooperative matrix
properties available, at most pname:pPropertyCount structures will be
written, and ename:VK_INCOMPLETE will be returned instead of
ename:VK_SUCCESS, to indicate that not all the available cooperative matrix
properties were returned.

include::{generated}/validity/protos/vkGetPhysicalDeviceCooperativeMatrixPropertiesNV.adoc[]
--

[open,refpage='VkCooperativeMatrixPropertiesNV',desc='Structure specifying cooperative matrix properties',type='structs']
--
Each sname:VkCooperativeMatrixPropertiesNV structure describes a single
supported combination of types for a matrix multiply/add operation
(code:OpCooperativeMatrixMulAddNV).
The multiply can: be described in terms of the following variables and types
(in SPIR-V pseudocode):

[source,c]
~~~~
    %A is of type OpTypeCooperativeMatrixNV %AType %scope %MSize %KSize
    %B is of type OpTypeCooperativeMatrixNV %BType %scope %KSize %NSize
    %C is of type OpTypeCooperativeMatrixNV %CType %scope %MSize %NSize
    %D is of type OpTypeCooperativeMatrixNV %DType %scope %MSize %NSize

    %D = %A * %B + %C // using OpCooperativeMatrixMulAddNV
~~~~

A matrix multiply with these dimensions is known as an _MxNxK_ matrix
multiply.
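
To make the roles of M, N, and K concrete, the same arithmetic can be
written as a scalar reference loop in C.
This is only an illustrative sketch, not how a cooperative matrix is stored
or computed: the `matrix_mul_add` name is hypothetical, all four matrices
are assumed to be row-major `float` arrays, and a single component type is
used even though the four types may differ.

```c
#include <stddef.h>

/* D = A * B + C, where A is MxK, B is KxN, and C and D are MxN.
 * All matrices are row-major (an assumption of this sketch). */
static void matrix_mul_add(size_t M, size_t N, size_t K,
                           const float *A, const float *B,
                           const float *C, float *D)
{
    for (size_t m = 0; m < M; ++m) {
        for (size_t n = 0; n < N; ++n) {
            float acc = C[m * N + n];
            for (size_t k = 0; k < K; ++k) {
                acc += A[m * K + k] * B[k * N + n];
            }
            D[m * N + n] = acc;
        }
    }
}
```

Calling this with M = 2, N = 2, and K = 3 corresponds to a 2x2x3 matrix
multiply in the naming used above.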

The sname:VkCooperativeMatrixPropertiesNV structure is defined as:

include::{generated}/api/structs/VkCooperativeMatrixPropertiesNV.adoc[]

  * pname:sType is the type of this structure.
  * pname:pNext is `NULL` or a pointer to a structure extending this
    structure.
  * pname:MSize is the number of rows in matrices A, C, and D.
  * pname:KSize is the number of columns in matrix A and rows in matrix B.
  * pname:NSize is the number of columns in matrices B, C, and D.
  * pname:AType is the component type of matrix A, of type
    elink:VkComponentTypeNV.
  * pname:BType is the component type of matrix B, of type
    elink:VkComponentTypeNV.
  * pname:CType is the component type of matrix C, of type
    elink:VkComponentTypeNV.
  * pname:DType is the component type of matrix D, of type
    elink:VkComponentTypeNV.
  * pname:scope is the scope of all the matrix types, of type
    elink:VkScopeNV.

If some types are preferred over other types (e.g. for performance), they
should: appear earlier in the list enumerated by
flink:vkGetPhysicalDeviceCooperativeMatrixPropertiesNV.

At least one entry in the list must: have power of two values for all of
pname:MSize, pname:KSize, and pname:NSize.

include::{generated}/validity/structs/VkCooperativeMatrixPropertiesNV.adoc[]
--

[open,refpage='VkScopeNV',desc='Specify SPIR-V scope',type='enums']
--
Possible values for elink:VkScopeNV include:

include::{generated}/api/enums/VkScopeNV.adoc[]

  * ename:VK_SCOPE_DEVICE_NV corresponds to SPIR-V code:Device scope.
  * ename:VK_SCOPE_WORKGROUP_NV corresponds to SPIR-V code:Workgroup scope.
  * ename:VK_SCOPE_SUBGROUP_NV corresponds to SPIR-V code:Subgroup scope.
  * ename:VK_SCOPE_QUEUE_FAMILY_NV corresponds to SPIR-V code:QueueFamily
    scope.

All enum values match the corresponding SPIR-V value.
--

[open,refpage='VkComponentTypeNV',desc='Specify SPIR-V cooperative matrix component type',type='enums']
--
Possible values for elink:VkComponentTypeNV include:

include::{generated}/api/enums/VkComponentTypeNV.adoc[]

  * ename:VK_COMPONENT_TYPE_FLOAT16_NV corresponds to SPIR-V
    code:OpTypeFloat 16.
  * ename:VK_COMPONENT_TYPE_FLOAT32_NV corresponds to SPIR-V
    code:OpTypeFloat 32.
  * ename:VK_COMPONENT_TYPE_FLOAT64_NV corresponds to SPIR-V
    code:OpTypeFloat 64.
  * ename:VK_COMPONENT_TYPE_SINT8_NV corresponds to SPIR-V code:OpTypeInt 8 1.
  * ename:VK_COMPONENT_TYPE_SINT16_NV corresponds to SPIR-V code:OpTypeInt
    16 1.
  * ename:VK_COMPONENT_TYPE_SINT32_NV corresponds to SPIR-V code:OpTypeInt
    32 1.
  * ename:VK_COMPONENT_TYPE_SINT64_NV corresponds to SPIR-V code:OpTypeInt
    64 1.
  * ename:VK_COMPONENT_TYPE_UINT8_NV corresponds to SPIR-V code:OpTypeInt 8 0.
  * ename:VK_COMPONENT_TYPE_UINT16_NV corresponds to SPIR-V code:OpTypeInt
    16 0.
  * ename:VK_COMPONENT_TYPE_UINT32_NV corresponds to SPIR-V code:OpTypeInt
    32 0.
  * ename:VK_COMPONENT_TYPE_UINT64_NV corresponds to SPIR-V code:OpTypeInt
    64 0.
--
endif::VK_NV_cooperative_matrix[]


ifdef::VK_EXT_validation_cache[]
[[shaders-validation-cache]]
== Validation Cache

[open,refpage='VkValidationCacheEXT',desc='Opaque handle to a validation cache object',type='handles']
--
Validation cache objects allow the result of internal validation to be
reused, both within a single application run and between multiple runs.
Reuse within a single run is achieved by passing the same validation cache
object when creating supported Vulkan objects.
Reuse across runs of an application is achieved by retrieving validation
cache contents in one run of an application, saving the contents, and using
them to preinitialize a validation cache on a subsequent run.
The contents of the validation cache objects are managed by the validation
layers.
Applications can: manage the host memory consumed by a validation cache
object and control the amount of data retrieved from a validation cache
object.

Validation cache objects are represented by sname:VkValidationCacheEXT
handles:

include::{generated}/api/handles/VkValidationCacheEXT.adoc[]
--

[open,refpage='vkCreateValidationCacheEXT',desc='Creates a new validation cache',type='protos']
--
To create validation cache objects, call:

include::{generated}/api/protos/vkCreateValidationCacheEXT.adoc[]

  * pname:device is the logical device that creates the validation cache
    object.
  * pname:pCreateInfo is a pointer to a slink:VkValidationCacheCreateInfoEXT
    structure containing the initial parameters for the validation cache
    object.
  * pname:pAllocator controls host memory allocation as described in the
    <<memory-allocation, Memory Allocation>> chapter.
  * pname:pValidationCache is a pointer to a slink:VkValidationCacheEXT
    handle in which the resulting validation cache object is returned.

[NOTE]
.Note
====
Applications can: track and manage the total host memory size of a
validation cache object using the pname:pAllocator.
Applications can: limit the amount of data retrieved from a validation cache
object in fname:vkGetValidationCacheDataEXT.
Implementations should: not internally limit the total number of entries
added to a validation cache object or the total host memory consumed.
====

Once created, a validation cache can: be passed to the
fname:vkCreateShaderModule command by adding this object to the
slink:VkShaderModuleCreateInfo structure's pname:pNext chain.
If a slink:VkShaderModuleValidationCacheCreateInfoEXT object is included in
the slink:VkShaderModuleCreateInfo::pname:pNext chain, and its
pname:validationCache field is not dlink:VK_NULL_HANDLE, the implementation
will query it for possible reuse opportunities and update it with new
content.
The use of the validation cache object in these commands is internally
synchronized, and the same validation cache object can: be used in multiple
threads simultaneously.

[NOTE]
.Note
====
Implementations should: make every effort to limit any critical sections to
the actual accesses to the cache, which is expected to be significantly
shorter than the duration of the fname:vkCreateShaderModule command.
====

include::{generated}/validity/protos/vkCreateValidationCacheEXT.adoc[]
--

[open,refpage='VkValidationCacheCreateInfoEXT',desc='Structure specifying parameters of a newly created validation cache',type='structs']
--
The sname:VkValidationCacheCreateInfoEXT structure is defined as:

include::{generated}/api/structs/VkValidationCacheCreateInfoEXT.adoc[]

  * pname:sType is the type of this structure.
  * pname:pNext is `NULL` or a pointer to a structure extending this
    structure.
  * pname:flags is reserved for future use.
  * pname:initialDataSize is the number of bytes in pname:pInitialData.
    If pname:initialDataSize is zero, the validation cache will initially be
    empty.
  * pname:pInitialData is a pointer to previously retrieved validation cache
    data.
    If the validation cache data is incompatible (as defined below) with the
    device, the validation cache will be initially empty.
    If pname:initialDataSize is zero, pname:pInitialData is ignored.

.Valid Usage
****
  * [[VUID-VkValidationCacheCreateInfoEXT-initialDataSize-01534]]
    If pname:initialDataSize is not `0`, it must: be equal to the size of
    pname:pInitialData, as returned by fname:vkGetValidationCacheDataEXT
    when pname:pInitialData was originally retrieved
  * [[VUID-VkValidationCacheCreateInfoEXT-initialDataSize-01535]]
    If pname:initialDataSize is not `0`, pname:pInitialData must: have been
    retrieved from a previous call to fname:vkGetValidationCacheDataEXT
****

include::{generated}/validity/structs/VkValidationCacheCreateInfoEXT.adoc[]
--

[open,refpage='VkValidationCacheCreateFlagsEXT',desc='Reserved for future use',type='flags']
--
include::{generated}/api/flags/VkValidationCacheCreateFlagsEXT.adoc[]

tname:VkValidationCacheCreateFlagsEXT is a bitmask type for setting a mask,
but is currently reserved for future use.
--

[open,refpage='vkMergeValidationCachesEXT',desc='Combine the data stores of validation caches',type='protos']
--
Validation cache objects can: be merged using the command:

include::{generated}/api/protos/vkMergeValidationCachesEXT.adoc[]

  * pname:device is the logical device that owns the validation cache
    objects.
  * pname:dstCache is the handle of the validation cache to merge results
    into.
  * pname:srcCacheCount is the length of the pname:pSrcCaches array.
  * pname:pSrcCaches is a pointer to an array of validation cache handles,
    which will be merged into pname:dstCache.
    The previous contents of pname:dstCache are included after the merge.

[NOTE]
.Note
====
The details of the merge operation are implementation-dependent, but
implementations should: merge the contents of the specified validation
caches and prune duplicate entries.
====

.Valid Usage
****
  * [[VUID-vkMergeValidationCachesEXT-dstCache-01536]]
    pname:dstCache must: not appear in the list of source caches
****

include::{generated}/validity/protos/vkMergeValidationCachesEXT.adoc[]
--

[open,refpage='vkGetValidationCacheDataEXT',desc='Get the data store from a validation cache',type='protos']
--
Data can: be retrieved from a validation cache object using the command:

include::{generated}/api/protos/vkGetValidationCacheDataEXT.adoc[]

  * pname:device is the logical device that owns the validation cache.
  * pname:validationCache is the validation cache to retrieve data from.
  * pname:pDataSize is a pointer to a value related to the amount of data in
    the validation cache, as described below.
  * pname:pData is either `NULL` or a pointer to a buffer.

If pname:pData is `NULL`, then the maximum size of the data that can: be
retrieved from the validation cache, in bytes, is returned in
pname:pDataSize.
Otherwise, pname:pDataSize must: point to a variable set by the user to the
size of the buffer, in bytes, pointed to by pname:pData, and on return the
variable is overwritten with the amount of data actually written to
pname:pData.
If pname:pDataSize is less than the maximum size that can: be retrieved from
the validation cache, at most pname:pDataSize bytes will be written to
pname:pData, and fname:vkGetValidationCacheDataEXT will return
ename:VK_INCOMPLETE instead of ename:VK_SUCCESS, to indicate that not all of
the validation cache was returned.

Any data written to pname:pData is valid and can: be provided as the
pname:pInitialData member of the slink:VkValidationCacheCreateInfoEXT
structure passed to fname:vkCreateValidationCacheEXT.

Two calls to fname:vkGetValidationCacheDataEXT with the same parameters
must: retrieve the same data unless a command that modifies the contents of
the cache is called between them.

[[validation-cache-header]]
Applications can: store the data retrieved from the validation cache, and
use these data, possibly in a future run of the application, to populate new
validation cache objects.
The results of validation, however, may: depend on the vendor ID, device ID,
driver version, and other details of the device.
To enable applications to detect when previously retrieved data is
incompatible with the device, the initial bytes written to pname:pData must:
be a header consisting of the following members:

.Layout for validation cache header version ename:VK_VALIDATION_CACHE_HEADER_VERSION_ONE_EXT
[width="85%",cols="8%,21%,71%",options="header"]
|====
| Offset | Size | Meaning
| 0 | 4                    | length in bytes of the entire validation cache header
                             written as a stream of bytes, with the least
                             significant byte first
| 4 | 4                    | a elink:VkValidationCacheHeaderVersionEXT value
                             written as a stream of bytes, with the least
                             significant byte first
| 8 | ename:VK_UUID_SIZE   | a layer commit ID expressed as a UUID, which uniquely
                             identifies the version of the validation layers used
                             to generate these validation results
|====

The first four bytes encode the length of the entire validation cache
header, in bytes.
This value includes all fields in the header including the validation cache
version field and the size of the length field.

The next four bytes encode the validation cache version, as described for
elink:VkValidationCacheHeaderVersionEXT.
A consumer of the validation cache should: use the cache version to
interpret the remainder of the cache header.

If pname:pDataSize is less than what is necessary to store this header,
nothing will be written to pname:pData and zero will be written to
pname:pDataSize.
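
An application that stores cache data on disk might decode the header
before reusing the data.
The following C sketch parses the layout in the table above from a raw byte
buffer without including the Vulkan headers.
`CacheHeader`, `read_u32_le`, and `parse_cache_header` are hypothetical
names, and the two constants mirror ename:VK_UUID_SIZE (16) and
ename:VK_VALIDATION_CACHE_HEADER_VERSION_ONE_EXT (1) locally so that the
example is self-contained.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define CACHE_UUID_SIZE          16u /* mirrors VK_UUID_SIZE */
#define CACHE_HEADER_VERSION_ONE 1u  /* mirrors the version-one enum value */

/* Decoded form of the version-one header (hypothetical type). */
typedef struct {
    uint32_t header_length;
    uint32_t header_version;
    uint8_t  layer_uuid[CACHE_UUID_SIZE];
} CacheHeader;

/* Reads four bytes with the least significant byte first. */
static uint32_t read_u32_le(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Returns false if the buffer is too small or is not a version-one header. */
static bool parse_cache_header(const uint8_t *data, size_t size,
                               CacheHeader *out)
{
    const size_t min_size = 8u + CACHE_UUID_SIZE;
    if (size < min_size) return false;

    out->header_length  = read_u32_le(data);
    out->header_version = read_u32_le(data + 4);
    if (out->header_version != CACHE_HEADER_VERSION_ONE) return false;
    if (out->header_length < min_size || out->header_length > size) return false;

    memcpy(out->layer_uuid, data + 8, CACHE_UUID_SIZE);
    return true;
}
```

After a successful parse, the application would still need to compare
`layer_uuid` with the identifier of the validation layers in use; how that
identifier is obtained is outside the scope of this sketch.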

include::{generated}/validity/protos/vkGetValidationCacheDataEXT.adoc[]
--

[open,refpage='VkValidationCacheHeaderVersionEXT',desc='Encode validation cache version',type='enums',xrefs='vkCreateValidationCacheEXT vkGetValidationCacheDataEXT']
--
Possible values of the second group of four bytes in the header returned by
flink:vkGetValidationCacheDataEXT, encoding the validation cache version,
are:

include::{generated}/api/enums/VkValidationCacheHeaderVersionEXT.adoc[]

  * ename:VK_VALIDATION_CACHE_HEADER_VERSION_ONE_EXT specifies version one
    of the validation cache.
--

[open,refpage='vkDestroyValidationCacheEXT',desc='Destroy a validation cache object',type='protos']
--
To destroy a validation cache, call:

include::{generated}/api/protos/vkDestroyValidationCacheEXT.adoc[]

  * pname:device is the logical device that destroys the validation cache
    object.
  * pname:validationCache is the handle of the validation cache to destroy.
  * pname:pAllocator controls host memory allocation as described in the
    <<memory-allocation, Memory Allocation>> chapter.

.Valid Usage
****
  * [[VUID-vkDestroyValidationCacheEXT-validationCache-01537]]
    If sname:VkAllocationCallbacks were provided when pname:validationCache
    was created, a compatible set of callbacks must: be provided here
  * [[VUID-vkDestroyValidationCacheEXT-validationCache-01538]]
    If no sname:VkAllocationCallbacks were provided when
    pname:validationCache was created, pname:pAllocator must: be `NULL`
****

include::{generated}/validity/protos/vkDestroyValidationCacheEXT.adoc[]
--
endif::VK_EXT_validation_cache[]
