1// Copyright (c) 2015-2018 Khronos Group. This work is licensed under a
2// Creative Commons Attribution 4.0 International License; see
3// http://creativecommons.org/licenses/by/4.0/
4
5[[shaders]]
6= Shaders
7
8A shader specifies programmable operations that execute for each vertex,
9control point, tessellated vertex, primitive, fragment, or workgroup in the
10corresponding stage(s) of the graphics and compute pipelines.
11
12Graphics pipelines include vertex shader execution as a result of
13<<drawing,primitive assembly>>, followed, if enabled, by tessellation
14control and evaluation shaders operating on
15<<drawing-primitive-topologies-patches,patches>>, geometry shaders, if
16enabled, operating on primitives, and fragment shaders, if present,
17operating on fragments generated by <<primsrast,Rasterization>>.
18In this specification, vertex, tessellation control, tessellation evaluation
19and geometry shaders are collectively referred to as vertex processing
20stages and occur in the logical pipeline before rasterization.
21The fragment shader occurs logically after rasterization.
22
23Only the compute shader stage is included in a compute pipeline.
24Compute shaders operate on compute invocations in a workgroup.
25
26Shaders can: read from input variables, and read from and write to output
27variables.
28Input and output variables can: be used to transfer data between shader
29stages, or to allow the shader to interact with values that exist in the
30execution environment.
31Similarly, the execution environment provides constants that describe
32capabilities.
33
34Shader variables are associated with execution environment-provided inputs
35and outputs using _built-in_ decorations in the shader.
36The available decorations for each stage are documented in the following
37subsections.
38
39
40[[shader-modules]]
41== Shader Modules
42
43[open,refpage='VkShaderModule',desc='Opaque handle to a shader module object',type='handles']
44--
45
46_Shader modules_ contain _shader code_ and one or more entry points.
47Shaders are selected from a shader module by specifying an entry point as
48part of <<pipelines,pipeline>> creation.
49The stages of a pipeline can: use shaders that come from different modules.
50The shader code defining a shader module must: be in the SPIR-V format, as
51described by the <<spirvenv,Vulkan Environment for SPIR-V>> appendix.
52
53Shader modules are represented by sname:VkShaderModule handles:
54
55include::../api/handles/VkShaderModule.txt[]
56
57--
58
59[open,refpage='vkCreateShaderModule',desc='Creates a new shader module object',type='protos']
60--
61
62To create a shader module, call:
63
64include::../api/protos/vkCreateShaderModule.txt[]
65
66  * pname:device is the logical device that creates the shader module.
67  * pname:pCreateInfo is a pointer to an instance of the
68    sname:VkShaderModuleCreateInfo structure.
69  * pname:pAllocator controls host memory allocation as described in the
70    <<memory-allocation, Memory Allocation>> chapter.
71  * pname:pShaderModule points to a slink:VkShaderModule handle in which the
72    resulting shader module object is returned.
73
74Once a shader module has been created, any entry points it contains can: be
75used in pipeline shader stages as described in <<pipelines-compute,Compute
76Pipelines>> and <<pipelines-graphics,Graphics Pipelines>>.
77
78ifdef::VK_NV_glsl_shader[]
If the shader stage fails to compile, ename:VK_ERROR_INVALID_SHADER_NV will
be generated and the compile log will be reported back to the application by
`<<VK_EXT_debug_report>>` if enabled.
82endif::VK_NV_glsl_shader[]
83
84include::../validity/protos/vkCreateShaderModule.txt[]
85--
86
87[open,refpage='VkShaderModuleCreateInfo',desc='Structure specifying parameters of a newly created shader module',type='structs']
88--
89
90The sname:VkShaderModuleCreateInfo structure is defined as:
91
92include::../api/structs/VkShaderModuleCreateInfo.txt[]
93
94  * pname:sType is the type of this structure.
95  * pname:pNext is `NULL` or a pointer to an extension-specific structure.
96  * pname:flags is reserved for future use.
97  * pname:codeSize is the size, in bytes, of the code pointed to by
98    pname:pCode.
99  * pname:pCode points to code that is used to create the shader module.
100    The type and format of the code is determined from the content of the
101    memory addressed by pname:pCode.
102
103.Valid Usage
104****
105  * [[VUID-VkShaderModuleCreateInfo-codeSize-01085]]
106    pname:codeSize must: be greater than 0
107ifndef::VK_NV_glsl_shader[]
108  * [[VUID-VkShaderModuleCreateInfo-codeSize-01086]]
109    pname:codeSize must: be a multiple of 4
110  * [[VUID-VkShaderModuleCreateInfo-pCode-01087]]
111    pname:pCode must: point to valid SPIR-V code, formatted and packed as
112    described by the <<spirv-spec,Khronos SPIR-V Specification>>
113  * [[VUID-VkShaderModuleCreateInfo-pCode-01088]]
114    pname:pCode must: adhere to the validation rules described by the
115    <<spirvenv-module-validation, Validation Rules within a Module>> section
116    of the <<spirvenv-capabilities,SPIR-V Environment>> appendix
117endif::VK_NV_glsl_shader[]
118ifdef::VK_NV_glsl_shader[]
119  * [[VUID-VkShaderModuleCreateInfo-pCode-01376]]
120    If pname:pCode points to SPIR-V code, pname:codeSize must: be a multiple
121    of 4
122  * [[VUID-VkShaderModuleCreateInfo-pCode-01377]]
123    pname:pCode must: point to either valid SPIR-V code, formatted and
124    packed as described by the <<spirv-spec,Khronos SPIR-V Specification>>
125    or valid GLSL code which must: be written to the +GL_KHR_vulkan_glsl+
126    extension specification
127  * [[VUID-VkShaderModuleCreateInfo-pCode-01378]]
128    If pname:pCode points to SPIR-V code, that code must: adhere to the
129    validation rules described by the <<spirvenv-module-validation,
130    Validation Rules within a Module>> section of the
131    <<spirvenv-capabilities,SPIR-V Environment>> appendix
132  * [[VUID-VkShaderModuleCreateInfo-pCode-01379]]
133    If pname:pCode points to GLSL code, it must: be valid GLSL code written
134    to the +GL_KHR_vulkan_glsl+ GLSL extension specification
135endif::VK_NV_glsl_shader[]
136  * [[VUID-VkShaderModuleCreateInfo-pCode-01089]]
137    pname:pCode must: declare the code:Shader capability for SPIR-V code
138  * [[VUID-VkShaderModuleCreateInfo-pCode-01090]]
139    pname:pCode must: not declare any capability that is not supported by
140    the API, as described by the <<spirvenv-module-validation,
141    Capabilities>> section of the <<spirvenv-capabilities,SPIR-V
142    Environment>> appendix
143  * [[VUID-VkShaderModuleCreateInfo-pCode-01091]]
144    If pname:pCode declares any of the capabilities listed as optional: in
145    the <<spirvenv-capabilities-table,SPIR-V Environment>> appendix, the
146    corresponding feature(s) must: be enabled.
147****
148
149include::../validity/structs/VkShaderModuleCreateInfo.txt[]
150--
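
The size and format requirements above lend themselves to a cheap host-side pre-check before pname:codeSize and pname:pCode are handed to the implementation.
The following is a minimal sketch, assuming the code is SPIR-V (not GLSL under `VK_NV_glsl_shader`) and is already in host byte order.
The names `check_spirv_blob`, `SpvCheckResult`, and `SPIRV_MAGIC` are hypothetical helpers for illustration and are not part of the Vulkan API; the magic number value is the one defined by the SPIR-V specification.
The sketch does not replace full SPIR-V validation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* First word of every SPIR-V module, per the Khronos SPIR-V Specification. */
#define SPIRV_MAGIC 0x07230203u

/* Hypothetical result codes for this sketch; not part of the Vulkan API. */
typedef enum {
    SPV_CHECK_OK = 0,
    SPV_CHECK_EMPTY,      /* codeSize is 0 */
    SPV_CHECK_UNALIGNED,  /* codeSize is not a multiple of 4 */
    SPV_CHECK_BAD_MAGIC   /* first word is not the SPIR-V magic number */
} SpvCheckResult;

/* Pre-checks the values destined for codeSize (in bytes) and pCode. */
static SpvCheckResult check_spirv_blob(const uint32_t *pCode, size_t codeSize)
{
    assert(pCode != NULL);
    if (codeSize == 0)
        return SPV_CHECK_EMPTY;
    if (codeSize % 4 != 0)
        return SPV_CHECK_UNALIGNED;
    if (pCode[0] != SPIRV_MAGIC)
        return SPV_CHECK_BAD_MAGIC;
    return SPV_CHECK_OK;
}
```

A blob that passes this check can: still violate the remaining valid usage statements, such as declaring an unsupported capability.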
151
152[open,refpage='VkShaderModuleCreateFlags',desc='Reserved for future use',type='enums']
153--
154include::../api/flags/VkShaderModuleCreateFlags.txt[]
155
156sname:VkShaderModuleCreateFlags is a bitmask type for setting a mask, but is
157currently reserved for future use.
158--
159
160ifdef::VK_EXT_validation_cache[]
161include::VK_EXT_validation_cache/shader-module-validation-cache.txt[]
162endif::VK_EXT_validation_cache[]
163
164
[open,refpage='vkDestroyShaderModule',desc='Destroy a shader module',type='protos']
166--
167
168To destroy a shader module, call:
169
170include::../api/protos/vkDestroyShaderModule.txt[]
171
172  * pname:device is the logical device that destroys the shader module.
173  * pname:shaderModule is the handle of the shader module to destroy.
174  * pname:pAllocator controls host memory allocation as described in the
175    <<memory-allocation, Memory Allocation>> chapter.
176
177A shader module can: be destroyed while pipelines created using its shaders
178are still in use.
179
180.Valid Usage
181****
182  * [[VUID-vkDestroyShaderModule-shaderModule-01092]]
183    If sname:VkAllocationCallbacks were provided when pname:shaderModule was
184    created, a compatible set of callbacks must: be provided here
185  * [[VUID-vkDestroyShaderModule-shaderModule-01093]]
186    If no sname:VkAllocationCallbacks were provided when pname:shaderModule
187    was created, pname:pAllocator must: be `NULL`
188****
189
190include::../validity/protos/vkDestroyShaderModule.txt[]
191--
192
193
194[[shaders-execution]]
195== Shader Execution
196
197At each stage of the pipeline, multiple invocations of a shader may: execute
198simultaneously.
199Further, invocations of a single shader produced as the result of different
200commands may: execute simultaneously.
201The relative execution order of invocations of the same shader type is
202undefined.
203Shader invocations may: complete in a different order than that in which the
204primitives they originated from were drawn or dispatched by the application.
205However, fragment shader outputs are written to attachments in
206<<primrast-order,rasterization order>>.
207
208The relative order of invocations of different shader types is largely
209undefined.
210However, when invoking a shader whose inputs are generated from a previous
211pipeline stage, the shader invocations from the previous stage are
212guaranteed to have executed far enough to generate input values for all
213required inputs.
214
215
216[[shaders-execution-memory-ordering]]
217== Shader Memory Access Ordering
218
219The order in which image or buffer memory is read or written by shaders is
220largely undefined.
221For some shader types (vertex, tessellation evaluation, and in some cases,
222fragment), even the number of shader invocations that may: perform loads and
223stores is undefined.
224
225In particular, the following rules apply:
226
227  * <<shaders-vertex-execution,Vertex>> and
228    <<shaders-tessellation-evaluation-execution,tessellation evaluation>>
229    shaders will be invoked at least once for each unique vertex, as defined
230    in those sections.
231  * <<shaders-fragment-execution,Fragment>> shaders will be invoked zero or
232    more times, as defined in that section.
  * The relative order of invocations of the same shader type is undefined.
234    A store issued by a shader when working on primitive B might complete
235    prior to a store for primitive A, even if primitive A is specified prior
236    to primitive B. This applies even to fragment shaders; while fragment
237    shader outputs are always written to the framebuffer in
238    <<primrast-order, rasterization order>>, stores executed by fragment
239    shader invocations are not.
240  * The relative order of invocations of different shader types is largely
241    undefined.
242
243[NOTE]
244.Note
245====
246The above limitations on shader invocation order make some forms of
247synchronization between shader invocations within a single set of primitives
248unimplementable.
249For example, having one invocation poll memory written by another invocation
250assumes that the other invocation has been launched and will complete its
251writes in finite time.
252====
253
254Stores issued to different memory locations within a single shader
255invocation may: not be visible to other invocations, or may: not become
256visible in the order they were performed.
257
258The code:OpMemoryBarrier instruction can: be used to provide stronger
259ordering of reads and writes performed by a single invocation.
260code:OpMemoryBarrier guarantees that any memory transactions issued by the
261shader invocation prior to the instruction complete prior to the memory
262transactions issued after the instruction.
263Memory barriers are needed for algorithms that require multiple invocations
264to access the same memory and require the operations to be performed in a
265partially-defined relative order.
266For example, if one shader invocation does a series of writes, followed by
267an code:OpMemoryBarrier instruction, followed by another write, then the
268results of the series of writes before the barrier become visible to other
269shader invocations at a time earlier or equal to when the results of the
270final write become visible to those invocations.
In practice, this means that another invocation that sees the results of the
final write would also see the previous writes.
273Without the memory barrier, the final write may: be visible before the
274previous writes.
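
The write, barrier, write pattern described above has a close host-side analogue in C11, sketched below.
This is an analogy only: it uses C11 fences and atomics on the CPU, not code:OpMemoryBarrier, and the names `payload`, `ready`, `publish`, and `try_consume` are hypothetical.
It is meant to show the ordering guarantee, namely that a reader which observes the final write also observes the earlier writes.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

static int payload[2];    /* the "series of writes" */
static atomic_int ready;  /* the "final write"; zero-initialized */

/* Writer: earlier stores, then a fence, then the final store. */
static void publish(int a, int b)
{
    payload[0] = a;
    payload[1] = b;
    /* Plays the role of the barrier: orders the stores above before the store below. */
    atomic_thread_fence(memory_order_release);
    atomic_store_explicit(&ready, 1, memory_order_relaxed);
}

/* Reader: if the final store is observed, the earlier stores are observed too. */
static bool try_consume(int *a, int *b)
{
    assert(a != NULL && b != NULL);
    if (atomic_load_explicit(&ready, memory_order_relaxed) == 0)
        return false;  /* final write not visible yet; do not touch payload */
    atomic_thread_fence(memory_order_acquire);
    *a = payload[0];
    *b = payload[1];
    return true;
}
```

Removing the release fence from `publish` would correspond to the case without a memory barrier, where the final write may: become visible before the earlier ones.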
275
276Writes that are the result of shader stores through a variable decorated
277with code:Coherent automatically have available writes to the same buffer,
278buffer view, or image view made visible to them, and are themselves
279automatically made available to access by the same buffer, buffer view, or
280image view.
281Reads that are the result of shader loads through a variable decorated with
282code:Coherent automatically have available writes to the same buffer, buffer
283view, or image view made visible to them.
284The order that coherent writes to different locations become available is
285undefined, unless enforced by a memory barrier instruction or other memory
286dependency.
287
288[NOTE]
289.Note
290====
291Explicit memory dependencies must: still be used to guarantee availability
292and visibility for access via other buffers, buffer views, or image views.
293====
294
295The built-in atomic memory transaction instructions can: be used to read and
296write a given memory address atomically.
297While built-in atomic functions issued by multiple shader invocations are
298executed in undefined order relative to each other, these functions perform
299both a read and a write of a memory address and guarantee that no other
300memory transaction will write to the underlying memory between the read and
301write.
302Atomic operations ensure automatic availability and visibility for writes
303and reads in the same way as those to code:Coherent variables.
304
305[NOTE]
306.Note
307====
308Memory accesses performed on different resource descriptors with the same
309memory backing may: not be well-defined even with the code:Coherent
310decoration or via atomics, due to things such as image layouts or ownership
311of the resource - as described in the <<synchronization, Synchronization and
312Cache Control>> chapter.
313====
314
315[NOTE]
316.Note
317====
318Atomics allow shaders to use shared global addresses for mutual exclusion or
319as counters, among other uses.
320====
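
The two uses named in the note above, counters and mutual exclusion, can be sketched with C11 atomics on the host.
This is an illustrative analogy, not shader code, and `next_slot`, `lock`, `reserve_slot`, `try_lock`, and `unlock` are hypothetical names.
The lock is deliberately a non-blocking try-lock, since polling on another invocation assumes forward progress that shader invocation ordering does not guarantee.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

static atomic_uint next_slot;                /* shared counter, starts at 0 */
static atomic_flag lock = ATOMIC_FLAG_INIT;  /* shared mutual-exclusion flag */

/* Counter use: the read-modify-write is indivisible, so each caller
 * receives a distinct slot regardless of the order callers run in. */
static uint32_t reserve_slot(void)
{
    return atomic_fetch_add(&next_slot, 1u);
}

/* Mutual-exclusion use: returns true only for the caller that flips the flag. */
static bool try_lock(void)
{
    return !atomic_flag_test_and_set(&lock);
}

static void unlock(void)
{
    atomic_flag_clear(&lock);
}
```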
321
322
323[[shaders-inputs]]
324== Shader Inputs and Outputs
325
326Data is passed into and out of shaders using variables with input or output
327storage class, respectively.
328User-defined inputs and outputs are connected between stages by matching
329their code:Location decorations.
330Additionally, data can: be provided by or communicated to special functions
331provided by the execution environment using code:BuiltIn decorations.
332
333In many cases, the same code:BuiltIn decoration can: be used in multiple
334shader stages with similar meaning.
335The specific behavior of variables decorated as code:BuiltIn is documented
336in the following sections.
337
338
339[[shaders-vertex]]
340== Vertex Shaders
341
342Each vertex shader invocation operates on one vertex and its associated
343<<fxvertex-attrib,vertex attribute>> data, and outputs one vertex and
344associated data.
345Graphics pipelines must: include a vertex shader, and the vertex shader
346stage is always the first shader stage in the graphics pipeline.
347
348
349[[shaders-vertex-execution]]
350=== Vertex Shader Execution
351
352A vertex shader must: be executed at least once for each vertex specified by
353a draw command.
354ifdef::VK_VERSION_1_1,VK_KHR_multiview[]
355If the subpass includes multiple views in its view mask, the shader may: be
356invoked separately for each view.
357endif::VK_VERSION_1_1,VK_KHR_multiview[]
358During execution, the shader is presented with the index of the vertex and
359instance for which it has been invoked.
360Input variables declared in the vertex shader are filled by the
361implementation with the values of vertex attributes associated with the
362invocation being executed.
363
364If the same vertex is specified multiple times in a draw command (e.g. by
365including the same index value multiple times in an index buffer) the
366implementation may: reuse the results of vertex shading if it can statically
367determine that the vertex shader invocations will produce identical results.
368
369[NOTE]
370.Note
371====
372It is implementation-dependent when and if results of vertex shading are
373reused, and thus how many times the vertex shader will be executed.
374This is true also if the vertex shader contains stores or atomic operations
375(see <<features-features-vertexPipelineStoresAndAtomics,
376pname:vertexPipelineStoresAndAtomics>>).
377====
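
To make the effect of vertex reuse concrete, the sketch below counts vertex shader invocations for an indexed draw under two extreme assumptions: every repeated index is reused, or none is.
It assumes a single instance and a single view, and the names `VertexInvocationCounts` and `vertex_invocation_counts` are hypothetical.
An actual implementation may: land anywhere between the two counts, and nothing in the text above forbids it from executing the shader more often.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    size_t with_full_reuse;  /* one invocation per unique index */
    size_t without_reuse;    /* one invocation per index in the buffer */
} VertexInvocationCounts;

/* Counts unique indices with a quadratic scan; fine for a small example. */
static VertexInvocationCounts vertex_invocation_counts(const uint32_t *indices,
                                                       size_t count)
{
    VertexInvocationCounts c = { 0, count };
    assert(indices != NULL || count == 0);
    for (size_t i = 0; i < count; ++i) {
        int seen = 0;
        for (size_t j = 0; j < i; ++j) {
            if (indices[j] == indices[i]) {
                seen = 1;
                break;
            }
        }
        if (!seen)
            ++c.with_full_reuse;
    }
    return c;
}
```

For two triangles sharing an edge, indexed as `{0, 1, 2, 2, 1, 3}`, the two counts are 4 and 6, so a vertex shader that increments an atomic counter cannot rely on either value.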
378
379
380[[shaders-tessellation-control]]
381== Tessellation Control Shaders
382
383The tessellation control shader is used to read an input patch provided by
384the application and to produce an output patch.
385Each tessellation control shader invocation operates on an input patch
386(after all control points in the patch are processed by a vertex shader) and
387its associated data, and outputs a single control point of the output patch
388and its associated data, and can: also output additional per-patch data.
389The input patch is sized according to the pname:patchControlPoints member of
390slink:VkPipelineTessellationStateCreateInfo, as part of input assembly.
391The size of the output patch is controlled by the code:OpExecutionMode
392code:OutputVertices specified in the tessellation control or tessellation
393evaluation shaders, which must: be specified in at least one of the shaders.
394The size of the input and output patches must: each be greater than zero and
395less than or equal to
396sname:VkPhysicalDeviceLimits::pname:maxTessellationPatchSize.
397
398
399[[shaders-tessellation-control-execution]]
400=== Tessellation Control Shader Execution
401
402A tessellation control shader is invoked at least once for each _output_
403vertex in a patch.
404ifdef::VK_VERSION_1_1,VK_KHR_multiview[]
405If the subpass includes multiple views in its view mask, the shader may: be
406invoked separately for each view.
407endif::VK_VERSION_1_1,VK_KHR_multiview[]
408
409Inputs to the tessellation control shader are generated by the vertex
410shader.
411Each invocation of the tessellation control shader can: read the attributes
412of any incoming vertices and their associated data.
413The invocations corresponding to a given patch execute logically in
414parallel, with undefined relative execution order.
415However, the code:OpControlBarrier instruction can: be used to provide
416limited control of the execution order by synchronizing invocations within a
417patch, effectively dividing tessellation control shader execution into a set
418of phases.
419Tessellation control shaders will read undefined values if one invocation
420reads a per-vertex or per-patch attribute written by another invocation at
421any point during the same phase, or if two invocations attempt to write
422different values to the same per-patch output in a single phase.
423
424
425[[shaders-tessellation-evaluation]]
426== Tessellation Evaluation Shaders
427
The tessellation evaluation shader operates on an input patch of control
429points and their associated data, and a single input barycentric coordinate
430indicating the invocation's relative position within the subdivided patch,
431and outputs a single vertex and its associated data.
432
433
434[[shaders-tessellation-evaluation-execution]]
435=== Tessellation Evaluation Shader Execution
436
437A tessellation evaluation shader is invoked at least once for each unique
438vertex generated by the tessellator.
439ifdef::VK_VERSION_1_1,VK_KHR_multiview[]
440If the subpass includes multiple views in its view mask, the shader may: be
441invoked separately for each view.
442endif::VK_VERSION_1_1,VK_KHR_multiview[]
443
444
445[[shaders-geometry]]
446== Geometry Shaders
447
448The geometry shader operates on a group of vertices and their associated
449data assembled from a single input primitive, and emits zero or more output
450primitives and the group of vertices and their associated data required for
451each output primitive.
452
453
454[[shaders-geometry-execution]]
455=== Geometry Shader Execution
456
457A geometry shader is invoked at least once for each primitive produced by
458the tessellation stages, or at least once for each primitive generated by
459<<drawing,primitive assembly>> when tessellation is not in use.
460A shader can request that the geometry shader runs multiple
461<<geometry-invocations, instances>>.
462A geometry shader is invoked at least once for each instance.
463ifdef::VK_VERSION_1_1,VK_KHR_multiview[]
464If the subpass includes multiple views in its view mask, the shader may: be
465invoked separately for each view.
466endif::VK_VERSION_1_1,VK_KHR_multiview[]
467
468
469[[shaders-fragment]]
470== Fragment Shaders
471
472Fragment shaders are invoked as the result of rasterization in a graphics
473pipeline.
474Each fragment shader invocation operates on a single fragment and its
475associated data.
476With few exceptions, fragment shaders do not have access to any data
477associated with other fragments and are considered to execute in isolation
478of fragment shader invocations associated with other fragments.
479
480
481[[shaders-fragment-execution]]
482=== Fragment Shader Execution
483
484For each fragment generated by rasterization, a fragment shader may: be
485invoked.
486A fragment shader must: not be invoked if the <<fragops-early,Early
487Per-Fragment Tests>> cause it to have no coverage.
488ifdef::VK_VERSION_1_1,VK_KHR_multiview[]
489If the subpass includes multiple views in its view mask, the shader may: be
490invoked separately for each view.
491endif::VK_VERSION_1_1,VK_KHR_multiview[]
492
493Furthermore, if it is determined that a fragment generated as the result of
494rasterizing a first primitive will have its outputs entirely overwritten by
495a fragment generated as the result of rasterizing a second primitive in the
496same subpass, and the fragment shader used for the fragment has no other
497side effects, then the fragment shader may: not be executed for the fragment
498from the first primitive.
499
500Relative ordering of execution of different fragment shader invocations is
501not defined.
502
503When a primitive (partially or fully) covers a pixel, the number of times
504the fragment shader is invoked is implementation-dependent, but must: obey
505the following constraints:
506
507  * Each covered sample is included in a single fragment shader invocation.
508  * When sample shading is not enabled, there is at least one fragment
509    shader invocation.
510  * When sample shading is enabled, the minimum number of fragment shader
511    invocations is as defined in <<primsrast-sampleshading,Sample Shading>>.
512
513When there is more than one fragment shader invocation per pixel, the
514association of samples to invocations is implementation-dependent.
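
The lower bound in the constraints above can be sketched as a small helper.
This sketch assumes the minimum defined in the Sample Shading section is max(ceil(pname:minSampleShading {times} pname:rasterizationSamples), 1), and that pname:rasterizationSamples is passed as a plain sample count; the function name `min_fragment_invocations` is hypothetical.
It gives only a lower bound per covered pixel; the actual number of invocations remains implementation-dependent.

```c
#include <assert.h>
#include <stdint.h>

/* Lower bound on fragment shader invocations for one covered pixel. */
static uint32_t min_fragment_invocations(int sampleShadingEnable,
                                         float minSampleShading,
                                         uint32_t rasterizationSamples)
{
    assert(minSampleShading >= 0.0f && minSampleShading <= 1.0f);
    if (!sampleShadingEnable)
        return 1u;  /* at least one invocation when sample shading is off */

    float x = minSampleShading * (float)rasterizationSamples;
    uint32_t n = (uint32_t)x;  /* truncates toward zero */
    if ((float)n < x)
        ++n;                   /* round up, i.e. ceil without <math.h> */
    return n > 0u ? n : 1u;
}
```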
515
516In addition to the conditions outlined above for the invocation of a
517fragment shader, a fragment shader invocation may: be produced as a _helper
518invocation_.
519A helper invocation is a fragment shader invocation that is created solely
520for the purposes of evaluating derivatives for use in non-helper fragment
521shader invocations.
522Stores and atomics performed by helper invocations must: not have any effect
523on memory, and values returned by atomic instructions in helper invocations
524are undefined.
525
526
527[[shaders-fragment-earlytest]]
528=== Early Fragment Tests
529
530An explicit control is provided to allow fragment shaders to enable early
531fragment tests.
532If the fragment shader specifies the code:EarlyFragmentTests
533code:OpExecutionMode, the per-fragment tests described in
534<<fragops-early-mode,Early Fragment Test Mode>> are performed prior to
535fragment shader execution.
536Otherwise, they are performed after fragment shader execution.
537
538ifdef::VK_EXT_post_depth_coverage[]
539[[shaders-fragment-earlytest-postdepthcoverage]]
540If the fragment shader additionally specifies the code:PostDepthCoverage
541code:OpExecutionMode, the value of a variable decorated with the
542<<interfaces-builtin-variables-samplemask,code:SampleMask>> built-in
543reflects the coverage after the early fragment tests.
544Otherwise, it reflects the coverage before the early fragment tests.
545endif::VK_EXT_post_depth_coverage[]
546
547[[shaders-compute]]
548== Compute Shaders
549
550Compute shaders are invoked via flink:vkCmdDispatch and
551flink:vkCmdDispatchIndirect commands.
552In general, they have access to similar resources as shader stages executing
553as part of a graphics pipeline.
554
555Compute workloads are formed from groups of work items called workgroups and
556processed by the compute shader in the current compute pipeline.
557A workgroup is a collection of shader invocations that execute the same
558shader, potentially in parallel.
559Compute shaders execute in _global workgroups_ which are divided into a
560number of _local workgroups_ with a size that can: be set by assigning a
561value to the code:LocalSize execution mode or via an object decorated by the
562code:WorkgroupSize decoration.
563An invocation within a local workgroup can: share data with other members of
564the local workgroup through shared variables and issue memory and control
565flow barriers to synchronize with other members of the local workgroup.
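
The relationship between a dispatch, its local workgroups, and individual invocations can be sketched on the host as follows.
The sketch assumes the usual definition of the global invocation ID as the workgroup ID multiplied by the local size plus the local invocation ID; the type `UVec3` and both function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

typedef struct { uint32_t x, y, z; } UVec3;

/* Position of one invocation within the whole dispatch. */
static UVec3 global_invocation_id(UVec3 workgroupId, UVec3 localSize,
                                  UVec3 localId)
{
    /* A local ID is always inside its workgroup. */
    assert(localId.x < localSize.x &&
           localId.y < localSize.y &&
           localId.z < localSize.z);
    UVec3 g = {
        workgroupId.x * localSize.x + localId.x,
        workgroupId.y * localSize.y + localId.y,
        workgroupId.z * localSize.z + localId.z
    };
    return g;
}

/* Total invocations launched by a dispatch of groupCount local workgroups. */
static uint64_t total_invocations(UVec3 groupCount, UVec3 localSize)
{
    uint64_t groups = (uint64_t)groupCount.x * groupCount.y * groupCount.z;
    uint64_t perGroup = (uint64_t)localSize.x * localSize.y * localSize.z;
    return groups * perGroup;
}
```

Only invocations that share a workgroup ID can: communicate through shared variables and workgroup barriers.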
566
567
568[[shaders-interpolation-decorations]]
569== Interpolation Decorations
570
571Interpolation decorations control the behavior of attribute interpolation in
572the fragment shader stage.
573Interpolation decorations can: be applied to code:Input storage class
574variables in the fragment shader stage's interface, and control the
575interpolation behavior of those variables.
576
577Inputs that could be interpolated can: be decorated by at most one of the
578following decorations:
579
580  * code:Flat: no interpolation
581  * code:NoPerspective: linear interpolation (for
582    <<line_linear_interpolation,lines>> and
583    <<triangle_linear_interpolation,polygons>>).
584
585Fragment input variables decorated with neither code:Flat nor
586code:NoPerspective use perspective-correct interpolation (for
587<<line_perspective_interpolation,lines>> and
588<<triangle_perspective_interpolation,polygons>>).
589
590The presence of and type of interpolation is controlled by the above
591interpolation decorations as well as the auxiliary decorations code:Centroid
592and code:Sample.
593
594A variable decorated with code:Flat will not be interpolated.
595Instead, it will have the same value for every fragment within a triangle.
596This value will come from a single <<vertexpostproc-flatshading,provoking
597vertex>>.
598A variable decorated with code:Flat can: also be decorated with
599code:Centroid or code:Sample, which will mean the same thing as decorating
600it only as code:Flat.
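
The difference between the three interpolation behaviors can be illustrated for a line segment with endpoints `a` and `b`.
The sketch below assumes the line interpolation formulas from the rasterization chapter, where `t` is the fragment's position along the segment and `wa`, `wb` are the clip-space w coordinates of the endpoints; the enum `InterpMode`, the function `interpolate_line`, and the `provokingIsA` parameter are hypothetical.

```c
#include <assert.h>

typedef enum {
    INTERP_PERSPECTIVE,     /* neither Flat nor NoPerspective */
    INTERP_NO_PERSPECTIVE,  /* NoPerspective */
    INTERP_FLAT             /* Flat */
} InterpMode;

/* Value of an attribute at parameter t in [0, 1] along the segment a..b. */
static float interpolate_line(InterpMode mode, float t,
                              float fa, float wa, float fb, float wb,
                              int provokingIsA)
{
    assert(t >= 0.0f && t <= 1.0f);
    switch (mode) {
    case INTERP_FLAT:
        /* No interpolation: every fragment gets the provoking vertex value. */
        return provokingIsA ? fa : fb;
    case INTERP_NO_PERSPECTIVE:
        /* Linear in screen space. */
        return (1.0f - t) * fa + t * fb;
    default: {
        /* Perspective-correct: weight each endpoint by 1/w. */
        float num = (1.0f - t) * fa / wa + t * fb / wb;
        float den = (1.0f - t) / wa + t / wb;
        return num / den;
    }
    }
}
```

With `fa = 0`, `fb = 1`, `wa = 1`, `wb = 3`, the midpoint evaluates to 0.5 with linear interpolation and to approximately 0.25 with perspective-correct interpolation.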
601
602For fragment shader input variables decorated with neither code:Centroid nor
603code:Sample, the assigned variable may: be interpolated anywhere within the
604pixel and a single value may: be assigned to each sample within the pixel.
605
606code:Centroid and code:Sample can: be used to control the location and
607frequency of the sampling of the decorated fragment shader input.
608If a fragment shader input is decorated with code:Centroid, a single value
609may: be assigned to that variable for all samples in the pixel, but that
610value must: be interpolated to a location that lies in both the pixel and in
611the primitive being rendered, including any of the pixel's samples covered
612by the primitive.
613Because the location at which the variable is interpolated may: be different
614in neighboring pixels, and derivatives may: be computed by computing
615differences between neighboring pixels, derivatives of centroid-sampled
616inputs may: be less accurate than those for non-centroid interpolated
617variables.
618ifdef::VK_EXT_post_depth_coverage[]
619The <<shaders-fragment-earlytest-postdepthcoverage,code:PostDepthCoverage>>
620execution mode does not affect the determination of the centroid location.
621endif::VK_EXT_post_depth_coverage[]
622If a fragment shader input is decorated with code:Sample, a separate value
623must: be assigned to that variable for each covered sample in the pixel, and
624that value must: be sampled at the location of the individual sample.
625When pname:rasterizationSamples is ename:VK_SAMPLE_COUNT_1_BIT, the pixel
626center must: be used for code:Centroid, code:Sample, and undecorated
627attribute interpolation.
628
629Fragment shader inputs that are signed or unsigned integers, integer
630vectors, or any double-precision floating-point type must: be decorated with
631code:Flat.
632
633ifdef::VK_AMD_shader_explicit_vertex_parameter[]
When the `<<VK_AMD_shader_explicit_vertex_parameter>>` device extension is
enabled, inputs can: also be decorated with the code:CustomInterpAMD
interpolation decoration, including fragment shader inputs that are signed
or unsigned integers, integer vectors, or any double-precision
floating-point type.
Inputs decorated with code:CustomInterpAMD can: only be accessed by the
extended instruction code:InterpolateAtVertexAMD, which allows accessing the
value of the input for individual vertices of the primitive.
642endif::VK_AMD_shader_explicit_vertex_parameter[]
643
644
645[[shaders-staticuse]]
646== Static Use
647
648A SPIR-V module declares a global object in memory using the code:OpVariable
649instruction, which results in a pointer code:x to that object.
650A specific entry point in a SPIR-V module is said to _statically use_ that
651object if that entry point's call tree contains a function that contains a
652memory instruction or image instruction with code:x as an code:id operand.
See the "`Memory Instructions`" and "`Image Instructions`" subsections of
section 3 "`Binary Form`" of the SPIR-V specification for the complete lists
of SPIR-V memory and image instructions.
656
657Static use is not used to control the behavior of variables with code:Input
658and code:Output storage.
659The effects of those variables are applied based only on whether they are
660present in a shader entry point's interface.
661
662[[shaders-invocationgroups]]
== Invocation and Derivative Groups

An _invocation group_ (see the subsection "`Control Flow`" of section 2 of
the SPIR-V specification) for a compute shader is the set of invocations in
a single local workgroup.
For graphics shaders, an invocation group is an implementation-dependent
subset of the set of shader invocations of a given shader stage which are
produced by a single drawing command.
For indirect drawing commands with pname:drawCount greater than one,
invocations from separate draws are in distinct invocation groups.

[NOTE]
.Note
====
Because the partitioning of invocations into invocation groups is
implementation-dependent and not observable, applications generally need to
assume the worst case of all invocations in a draw belonging to a single
invocation group.
====

A _derivative group_ (see the subsection "`Control Flow`" of section 2 of
the SPIR-V 1.00 Revision 4 specification) for a fragment shader is the set
of invocations generated by a single primitive (point, line, or triangle),
including any helper invocations generated by that primitive.
Derivatives are undefined for a sampled image instruction if the instruction
is in flow control that is not uniform across the derivative group.

ifdef::VK_VERSION_1_1[]
[[shaders-subgroup]]
== Subgroups

A _subgroup_ (see the subsection "`Control Flow`" of section 2 of the SPIR-V
1.3 Revision 1 specification) is a set of invocations that can synchronize
and share data with each other efficiently.
An invocation group is partitioned into one or more subgroups.

Subgroup operations are divided into various categories as described in
elink:VkSubgroupFeatureFlagBits.

[[shaders-subgroup-basic]]
=== Basic Subgroup Operations

The basic subgroup operations allow two classes of functionality within
shaders: elect and barrier.
Invocations within a subgroup can: choose a single invocation to perform
some task for the subgroup as a whole using elect.
Invocations within a subgroup can: perform a subgroup barrier to ensure the
ordering of execution or memory accesses within a subgroup.
Barriers can: be performed on buffer memory accesses, code:WorkgroupLocal
memory accesses, and image memory accesses to ensure that any results
written are visible to other invocations within the subgroup.
An code:OpControlBarrier can: also be used to perform a full execution
control barrier.
A full execution control barrier will ensure that each active invocation
within the subgroup reaches a point of execution before any are allowed to
continue.

[[shaders-subgroup-vote]]
=== Vote Subgroup Operations

The vote subgroup operations allow invocations within a subgroup to compare
values across a subgroup.
The types of votes enabled are:

  * Do all active subgroup invocations agree that an expression is true?
  * Do any active subgroup invocations evaluate an expression to true?
  * Do all active subgroup invocations have the same value of an expression?

[NOTE]
.Note
====
These operations are useful in combination with control flow in that they
allow developers to check whether conditions match across the subgroup
and choose potentially faster code-paths in these cases.
====
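
As an informal illustration only, the three questions above can be modeled
on the host in plain C.
The `model_vote_*` names and the `active` array marking which invocations
participate are assumptions made for this sketch; they are not part of
SPIR-V or Vulkan, and real shaders use the subgroup vote instructions
instead.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Host-side model (hypothetical): one array slot per subgroup invocation.
 * active[i] says whether invocation i participates in the vote. */

/* True if every active invocation evaluated its predicate to true. */
bool model_vote_all(const bool *active, const bool *pred, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        if (active[i] && !pred[i])
            return false;
    return true;
}

/* True if at least one active invocation evaluated its predicate to true. */
bool model_vote_any(const bool *active, const bool *pred, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        if (active[i] && pred[i])
            return true;
    return false;
}

/* True if all active invocations hold the same value. */
bool model_vote_all_equal(const bool *active, const uint32_t *value, size_t n)
{
    bool have_first = false;
    uint32_t first = 0;
    for (size_t i = 0; i < n; ++i) {
        if (!active[i])
            continue;
        if (!have_first) {
            first = value[i];
            have_first = true;
        } else if (value[i] != first) {
            return false;
        }
    }
    return true;
}
```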

[[shaders-subgroup-arithmetic]]
=== Arithmetic Subgroup Operations

The arithmetic subgroup operations allow invocations to perform scan and
reduction operations across a subgroup.
For reduction operations, each invocation in a subgroup will obtain the same
result of these arithmetic operations applied across the subgroup.
For scan operations, each invocation in the subgroup will perform an
inclusive or exclusive scan, cumulatively applying the operation across the
invocations in a subgroup in linear order.
The operations supported are add, mul, min, max, and, or, xor.

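The difference between a reduction, an inclusive scan, and an exclusive scan
can be sketched with a host-side model in C for the add operation.
The function names are hypothetical, each array slot stands for one
invocation in linear order, and every invocation is assumed to be active.

```c
#include <stddef.h>
#include <stdint.h>

/* Reduction: every invocation obtains the same total. */
void model_reduce_add(const uint32_t *in, uint32_t *out, size_t n)
{
    uint32_t total = 0;
    for (size_t i = 0; i < n; ++i)
        total += in[i];
    for (size_t i = 0; i < n; ++i)
        out[i] = total;
}

/* Inclusive scan: out[i] = in[0] + ... + in[i]. */
void model_inclusive_scan_add(const uint32_t *in, uint32_t *out, size_t n)
{
    uint32_t running = 0;
    for (size_t i = 0; i < n; ++i) {
        running += in[i];
        out[i] = running;
    }
}

/* Exclusive scan: out[i] = in[0] + ... + in[i - 1]; out[0] is the
 * identity of the operation (0 for add). */
void model_exclusive_scan_add(const uint32_t *in, uint32_t *out, size_t n)
{
    uint32_t running = 0;
    for (size_t i = 0; i < n; ++i) {
        out[i] = running;
        running += in[i];
    }
}
```
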
[[shaders-subgroup-ballot]]
=== Ballot Subgroup Operations

The ballot subgroup operations allow invocations to perform more complex
votes across the subgroup.
The ballot functionality allows each invocation within a subgroup to provide
a boolean value and to obtain, as a result, the boolean value provided by
every invocation.
The broadcast functionality allows values to be broadcast from an invocation
to all other invocations within the subgroup, given that the invocation to
be broadcast from is known at pipeline creation time.

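A rough host-side model of ballot and broadcast in C follows.
The names are hypothetical, and packing the result into a single 32-bit mask
is a simplification chosen for this sketch (it assumes at most 32
invocations; real ballot results are wider).

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Ballot: bit i of the result is set when invocation i is active and
 * provided true. Simplification: only the first 32 invocations fit. */
uint32_t model_ballot(const bool *active, const bool *pred, size_t n)
{
    uint32_t mask = 0;
    for (size_t i = 0; i < n && i < 32; ++i)
        if (active[i] && pred[i])
            mask |= (uint32_t)1 << i;
    return mask;
}

/* Broadcast: every invocation receives the value held by invocation src.
 * The caller must pass src < n. */
void model_broadcast(const uint32_t *in, uint32_t *out, size_t n, size_t src)
{
    for (size_t i = 0; i < n; ++i)
        out[i] = in[src];
}
```
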
[[shaders-subgroup-shuffle]]
=== Shuffle Subgroup Operations

The shuffle subgroup operations allow invocations to read values from other
invocations within a subgroup.

[[shaders-subgroup-shuffle-relative]]
=== Shuffle Relative Subgroup Operations

The shuffle relative subgroup operations allow invocations to read values
from other invocations within the subgroup relative to the current
invocation in the group.
The relative operations supported allow data to be shifted up and down
through the invocations within a subgroup.

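Shifting data up and down can be pictured with the following host-side
model in C.
The function names and the `valid` output array are assumptions of this
sketch: a slot is flagged invalid, and left untouched, when the source
invocation would fall outside the subgroup, since no meaningful value
exists for it in this model.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Shift up: invocation i reads from invocation i - delta. */
void model_shuffle_up(const uint32_t *in, uint32_t *out, bool *valid,
                      size_t n, size_t delta)
{
    for (size_t i = 0; i < n; ++i) {
        valid[i] = i >= delta;
        if (valid[i])
            out[i] = in[i - delta];
    }
}

/* Shift down: invocation i reads from invocation i + delta. */
void model_shuffle_down(const uint32_t *in, uint32_t *out, bool *valid,
                        size_t n, size_t delta)
{
    for (size_t i = 0; i < n; ++i) {
        valid[i] = delta < n - i; /* same as i + delta < n, without overflow */
        if (valid[i])
            out[i] = in[i + delta];
    }
}
```
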
[[shaders-subgroup-clustered]]
=== Clustered Subgroup Operations

The clustered subgroup operations allow invocations to perform an operation
among partitions of a subgroup, such that the operation is performed only
among the subgroup invocations within a partition.
The partitions for clustered subgroup operations are consecutive
power-of-two size groups of invocations and the cluster size must: be known
at pipeline creation time.
The operations supported are add, mul, min, max, and, or, xor.

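A minimal host-side model of a clustered add reduction in C is sketched
below.
The function name is hypothetical and all invocations are assumed to be
active; the power-of-two check mirrors the constraint on the cluster size
stated above.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Clustered add: invocations are grouped into consecutive clusters of
 * cluster_size, and each invocation obtains the sum over its own cluster.
 * Returns false if cluster_size is not a non-zero power of two. */
bool model_clustered_add(const uint32_t *in, uint32_t *out, size_t n,
                         size_t cluster_size)
{
    if (cluster_size == 0 || (cluster_size & (cluster_size - 1)) != 0)
        return false;

    for (size_t base = 0; base < n; base += cluster_size) {
        size_t end = base + cluster_size < n ? base + cluster_size : n;
        uint32_t total = 0;
        for (size_t i = base; i < end; ++i)
            total += in[i];
        for (size_t i = base; i < end; ++i)
            out[i] = total;
    }
    return true;
}
```
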
[[shaders-subgroup-quad]]
=== Quad Subgroup Operations

The quad subgroup operations allow clusters of 4 invocations (a quad) to
share data efficiently with each other.

ifdef::VK_NV_shader_subgroup_partitioned[]

[[shaders-subgroup-partitioned]]
=== Partitioned Subgroup Operations

The partitioned subgroup operations allow invocations to perform an
operation among partitions of a subgroup, such that the operation is
performed only among the subgroup invocations within a partition.
The partitions for partitioned subgroup operations can: group the
invocations into arbitrary subsets and can: be computed at runtime.
The operations supported are add, mul, min, max, and, or, xor.

endif::VK_NV_shader_subgroup_partitioned[]

endif::VK_VERSION_1_1[]

ifdef::VK_EXT_validation_cache[]
[[shaders-validation-cache]]
== Validation Cache

[open,refpage='VkValidationCacheEXT',desc='Opaque handle to a validation cache object',type='handles']
--

Validation cache objects allow the result of internal validation to be
reused, both within a single application run and between multiple runs.
Reuse within a single run is achieved by passing the same validation cache
object when creating supported Vulkan objects.
Reuse across runs of an application is achieved by retrieving validation
cache contents in one run of an application, saving the contents, and using
them to preinitialize a validation cache on a subsequent run.
The contents of the validation cache objects are managed by the validation
layers.
Applications can: manage the host memory consumed by a validation cache
object and control the amount of data retrieved from a validation cache
object.

Validation cache objects are represented by sname:VkValidationCacheEXT
handles:

include::../api/handles/VkValidationCacheEXT.txt[]

--

[open,refpage='vkCreateValidationCacheEXT',desc='Creates a new validation cache',type='protos']
--

To create validation cache objects, call:

include::../api/protos/vkCreateValidationCacheEXT.txt[]

  * pname:device is the logical device that creates the validation cache
    object.
  * pname:pCreateInfo is a pointer to a slink:VkValidationCacheCreateInfoEXT
    structure that contains the initial parameters for the validation cache
    object.
  * pname:pAllocator controls host memory allocation as described in the
    <<memory-allocation, Memory Allocation>> chapter.
  * pname:pValidationCache is a pointer to a slink:VkValidationCacheEXT
    handle in which the resulting validation cache object is returned.

[NOTE]
.Note
====
Applications can: track and manage the total host memory size of a
validation cache object using the pname:pAllocator.
Applications can: limit the amount of data retrieved from a validation cache
object in fname:vkGetValidationCacheDataEXT.
Implementations should: not internally limit the total number of entries
added to a validation cache object or the total host memory consumed.
====

Once created, a validation cache can: be passed to the
fname:vkCreateShaderModule command as part of the
sname:VkShaderModuleCreateInfo pname:pNext chain.
If a sname:VkShaderModuleValidationCacheCreateInfoEXT object is part of the
sname:VkShaderModuleCreateInfo::pname:pNext chain, and its
pname:validationCache field is not dlink:VK_NULL_HANDLE, the implementation
will query it for possible reuse opportunities and update it with new
content.
The use of the validation cache object in these commands is internally
synchronized, and the same validation cache object can: be used in multiple
threads simultaneously.

[NOTE]
.Note
====
Implementations should: make every effort to limit any critical sections to
the actual accesses to the cache, which is expected to be significantly
shorter than the duration of the fname:vkCreateShaderModule command.
====

include::../validity/protos/vkCreateValidationCacheEXT.txt[]
--

[open,refpage='VkValidationCacheCreateInfoEXT',desc='Structure specifying parameters of a newly created validation cache',type='structs']
--

The sname:VkValidationCacheCreateInfoEXT structure is defined as:

include::../api/structs/VkValidationCacheCreateInfoEXT.txt[]

  * pname:sType is the type of this structure.
  * pname:pNext is `NULL` or a pointer to an extension-specific structure.
  * pname:flags is reserved for future use.
  * pname:initialDataSize is the number of bytes in pname:pInitialData.
    If pname:initialDataSize is zero, the validation cache will initially be
    empty.
  * pname:pInitialData is a pointer to previously retrieved validation cache
    data.
    If the validation cache data is incompatible (as defined below) with the
    device, the validation cache will be initially empty.
    If pname:initialDataSize is zero, pname:pInitialData is ignored.

.Valid Usage
****
  * [[VUID-VkValidationCacheCreateInfoEXT-initialDataSize-01534]]
    If pname:initialDataSize is not `0`, it must: be equal to the size of
    pname:pInitialData, as returned by fname:vkGetValidationCacheDataEXT
    when pname:pInitialData was originally retrieved
  * [[VUID-VkValidationCacheCreateInfoEXT-initialDataSize-01535]]
    If pname:initialDataSize is not `0`, pname:pInitialData must: have been
    retrieved from a previous call to fname:vkGetValidationCacheDataEXT
****

include::../validity/structs/VkValidationCacheCreateInfoEXT.txt[]
--

[open,refpage='VkValidationCacheCreateFlagsEXT',desc='Reserved for future use',type='enums']
--
include::../api/flags/VkValidationCacheCreateFlagsEXT.txt[]

sname:VkValidationCacheCreateFlagsEXT is a bitmask type for setting a mask,
but is currently reserved for future use.
--

[open,refpage='vkMergeValidationCachesEXT',desc='Combine the data stores of validation caches',type='protos']
--

Validation cache objects can: be merged using the command:

include::../api/protos/vkMergeValidationCachesEXT.txt[]

  * pname:device is the logical device that owns the validation cache
    objects.
  * pname:dstCache is the handle of the validation cache to merge results
    into.
  * pname:srcCacheCount is the length of the pname:pSrcCaches array.
  * pname:pSrcCaches is an array of validation cache handles, which will be
    merged into pname:dstCache.
    The previous contents of pname:dstCache are included after the merge.

[NOTE]
.Note
====
The details of the merge operation are implementation-dependent, but
implementations should: merge the contents of the specified validation
caches and prune duplicate entries.
====

.Valid Usage
****
  * [[VUID-vkMergeValidationCachesEXT-dstCache-01536]]
    pname:dstCache must: not appear in the list of source caches
****

include::../validity/protos/vkMergeValidationCachesEXT.txt[]
--

[open,refpage='vkGetValidationCacheDataEXT',desc='Get the data store from a validation cache',type='protos']
--

Data can: be retrieved from a validation cache object using the command:

include::../api/protos/vkGetValidationCacheDataEXT.txt[]

  * pname:device is the logical device that owns the validation cache.
  * pname:validationCache is the validation cache to retrieve data from.
  * pname:pDataSize is a pointer to a value related to the amount of data in
    the validation cache, as described below.
  * pname:pData is either `NULL` or a pointer to a buffer.

If pname:pData is `NULL`, then the maximum size of the data that can: be
retrieved from the validation cache, in bytes, is returned in
pname:pDataSize.
Otherwise, pname:pDataSize must: point to a variable set by the user to the
size of the buffer, in bytes, pointed to by pname:pData, and on return the
variable is overwritten with the amount of data actually written to
pname:pData.

If pname:pDataSize is less than the maximum size that can: be retrieved from
the validation cache, at most pname:pDataSize bytes will be written to
pname:pData, and fname:vkGetValidationCacheDataEXT will return
ename:VK_INCOMPLETE.
Any data written to pname:pData is valid and can: be provided as the
pname:pInitialData member of the sname:VkValidationCacheCreateInfoEXT
structure passed to fname:vkCreateValidationCacheEXT.

Two calls to fname:vkGetValidationCacheDataEXT with the same parameters
must: retrieve the same data unless a command that modifies the contents of
the cache is called between them.

[[validation-cache-header]]
Applications can: store the data retrieved from the validation cache, and
use these data, possibly in a future run of the application, to populate new
validation cache objects.
The results of validation, however, may: depend on the vendor ID, device ID,
driver version, and other details of the device.
To enable applications to detect when previously retrieved data is
incompatible with the device, the initial bytes written to pname:pData must:
be a header consisting of the following members:

.Layout for validation cache header version ename:VK_VALIDATION_CACHE_HEADER_VERSION_ONE_EXT
[width="85%",cols="8%,21%,71%",options="header"]
|====
| Offset | Size | Meaning
| 0 | 4                    | length in bytes of the entire validation cache header
                             written as a stream of bytes, with the least
                             significant byte first
| 4 | 4                    | a elink:VkValidationCacheHeaderVersionEXT value
                             written as a stream of bytes, with the least
                             significant byte first
| 8 | ename:VK_UUID_SIZE   | a layer commit ID expressed as a UUID, which uniquely
                             identifies the version of the validation layers used
                             to generate these validation results
|====

The first four bytes encode the length of the entire validation cache
header, in bytes.
This value includes all fields in the header including the validation cache
version field and the size of the length field.

The next four bytes encode the validation cache version, as described for
elink:VkValidationCacheHeaderVersionEXT.
A consumer of the validation cache should: use the cache version to
interpret the remainder of the cache header.

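One way an application might decode this header from previously saved bytes
is sketched below in plain C, without the Vulkan headers.
The `model_*` names are hypothetical; the constants mirror
ename:VK_UUID_SIZE (16) and
ename:VK_VALIDATION_CACHE_HEADER_VERSION_ONE_EXT (1).
Rejecting unknown versions and inconsistent lengths is a policy chosen for
this sketch rather than a requirement stated here.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MODEL_UUID_SIZE 16u         /* mirrors VK_UUID_SIZE */
#define MODEL_HEADER_VERSION_ONE 1u /* mirrors ..._HEADER_VERSION_ONE_EXT */

struct model_cache_header {
    uint32_t length;                /* bytes in the whole header */
    uint32_t version;               /* header version value */
    uint8_t  uuid[MODEL_UUID_SIZE]; /* layer commit ID */
};

/* Decode four bytes stored least significant byte first. */
static uint32_t model_read_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Returns true and fills *out if data begins with a version-one header. */
bool model_parse_cache_header(const uint8_t *data, size_t size,
                              struct model_cache_header *out)
{
    const size_t min_size = 4 + 4 + MODEL_UUID_SIZE;
    if (size < min_size)
        return false;

    out->length = model_read_le32(data);
    out->version = model_read_le32(data + 4);
    if (out->version != MODEL_HEADER_VERSION_ONE ||
        out->length < min_size || out->length > size)
        return false;

    memcpy(out->uuid, data + 8, MODEL_UUID_SIZE);
    return true;
}
```
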
If pname:pDataSize is less than what is necessary to store this header,
nothing will be written to pname:pData and zero will be written to
pname:pDataSize.

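The size and truncation rules described for
fname:vkGetValidationCacheDataEXT can be summarized by a small host-side
model in C.
Everything here is hypothetical: `model_get_data` is not a Vulkan entry
point, `struct model_cache` merely stands in for a cache whose contents
begin with a header, and returning the incomplete status when the buffer
cannot hold the header is an assumption of this sketch.

```c
#include <stddef.h>
#include <string.h>

enum model_result { MODEL_SUCCESS = 0, MODEL_INCOMPLETE = 1 };

/* Hypothetical stand-in for a cache: a byte blob that starts with a header. */
struct model_cache {
    const unsigned char *bytes;
    size_t size;        /* maximum number of bytes that can be retrieved */
    size_t header_size; /* bytes occupied by the header */
};

enum model_result model_get_data(const struct model_cache *cache,
                                 size_t *data_size, void *data)
{
    /* Size query: report the maximum retrievable size. */
    if (data == NULL) {
        *data_size = cache->size;
        return MODEL_SUCCESS;
    }

    /* Buffer too small for the header: write nothing, report zero. */
    if (*data_size < cache->header_size) {
        *data_size = 0;
        return MODEL_INCOMPLETE; /* assumed status for this case */
    }

    /* Copy as much as fits and report the amount actually written. */
    size_t to_copy = *data_size < cache->size ? *data_size : cache->size;
    memcpy(data, cache->bytes, to_copy);
    *data_size = to_copy;
    return to_copy < cache->size ? MODEL_INCOMPLETE : MODEL_SUCCESS;
}
```
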
include::../validity/protos/vkGetValidationCacheDataEXT.txt[]
--

[open,refpage='VkValidationCacheHeaderVersionEXT',desc='Encode validation cache version',type='enums',xrefs='vkCreateValidationCacheEXT vkGetValidationCacheDataEXT']
--
Possible values of the second group of four bytes in the header returned by
flink:vkGetValidationCacheDataEXT, encoding the validation cache version,
are:

include::../api/enums/VkValidationCacheHeaderVersionEXT.txt[]

  * ename:VK_VALIDATION_CACHE_HEADER_VERSION_ONE_EXT specifies version one
    of the validation cache.
--

[open,refpage='vkDestroyValidationCacheEXT',desc='Destroy a validation cache object',type='protos']
--

To destroy a validation cache, call:

include::../api/protos/vkDestroyValidationCacheEXT.txt[]

  * pname:device is the logical device that destroys the validation cache
    object.
  * pname:validationCache is the handle of the validation cache to destroy.
  * pname:pAllocator controls host memory allocation as described in the
    <<memory-allocation, Memory Allocation>> chapter.

.Valid Usage
****
  * [[VUID-vkDestroyValidationCacheEXT-validationCache-01537]]
    If sname:VkAllocationCallbacks were provided when pname:validationCache
    was created, a compatible set of callbacks must: be provided here
  * [[VUID-vkDestroyValidationCacheEXT-validationCache-01538]]
    If no sname:VkAllocationCallbacks were provided when
    pname:validationCache was created, pname:pAllocator must: be `NULL`
****

include::../validity/protos/vkDestroyValidationCacheEXT.txt[]
--
endif::VK_EXT_validation_cache[]
