// Copyright 2015-2022 The Khronos Group Inc.
//
// SPDX-License-Identifier: CC-BY-4.0

[[shaders]]
= Shaders

A shader specifies programmable operations that execute for each vertex,
control point, tessellated vertex, primitive, fragment, or workgroup in the
corresponding stage(s) of the graphics and compute pipelines.

Graphics pipelines include vertex shader execution as a result of
<<drawing,primitive assembly>>, followed, if enabled, by tessellation
control and evaluation shaders operating on <<drawing-patch-lists,patches>>,
geometry shaders, if enabled, operating on primitives, and fragment shaders,
if present, operating on fragments generated by <<primsrast,Rasterization>>.
In this specification, vertex, tessellation control, tessellation evaluation
and geometry shaders are collectively referred to as
<<pipelines-graphics-subsets-pre-rasterization,pre-rasterization shader
stage>>s and occur in the logical pipeline before rasterization.
The fragment shader occurs logically after rasterization.

Only the compute shader stage is included in a compute pipeline.
Compute shaders operate on compute invocations in a workgroup.

Shaders can: read from input variables, and read from and write to output
variables.
Input and output variables can: be used to transfer data between shader
stages, or to allow the shader to interact with values that exist in the
execution environment.
Similarly, the execution environment provides constants describing
capabilities.

Shader variables are associated with execution environment-provided inputs
and outputs using _built-in_ decorations in the shader.
The available decorations for each stage are documented in the following
subsections.


[[shader-modules]]
== Shader Modules

[open,refpage='VkShaderModule',desc='Opaque handle to a shader module object',type='handles']
--
_Shader modules_ contain _shader code_ and one or more entry points.
Shaders are selected from a shader module by specifying an entry point as
part of <<pipelines,pipeline>> creation.
The stages of a pipeline can: use shaders that come from different modules.
The shader code defining a shader module must: be in the SPIR-V format, as
described by the <<spirvenv,Vulkan Environment for SPIR-V>> appendix.

Shader modules are represented by sname:VkShaderModule handles:

include::{generated}/api/handles/VkShaderModule.adoc[]
--

[open,refpage='vkCreateShaderModule',desc='Creates a new shader module object',type='protos']
--
To create a shader module, call:

include::{generated}/api/protos/vkCreateShaderModule.adoc[]

  * pname:device is the logical device that creates the shader module.
  * pname:pCreateInfo is a pointer to a slink:VkShaderModuleCreateInfo
    structure.
  * pname:pAllocator controls host memory allocation as described in the
    <<memory-allocation, Memory Allocation>> chapter.
  * pname:pShaderModule is a pointer to a slink:VkShaderModule handle in
    which the resulting shader module object is returned.

Once a shader module has been created, any entry points it contains can: be
used in pipeline shader stages as described in <<pipelines-compute,Compute
Pipelines>> and <<pipelines-graphics,Graphics Pipelines>>.

ifdef::VK_EXT_graphics_pipeline_libraries[]
If the <<features-graphicsPipelineLibrary, pname:graphicsPipelineLibrary>>
feature is enabled, shader module creation can: be omitted entirely.
Instead, applications should: provide the slink:VkShaderModuleCreateInfo
structure directly to pipeline creation by chaining it to
slink:VkPipelineShaderStageCreateInfo.
This avoids the overhead of creating and managing an additional object.
endif::VK_EXT_graphics_pipeline_libraries[]

.Valid Usage
****
ifdef::VK_EXT_validation_cache[]
  * [[VUID-vkCreateShaderModule-pCreateInfo-06904]]
    If pname:pCreateInfo is not `NULL`, pname:pCreateInfo->pNext must: be
    `NULL` or a pointer to a
    slink:VkShaderModuleValidationCacheCreateInfoEXT structure
endif::VK_EXT_validation_cache[]
ifndef::VK_EXT_validation_cache[]
  * [[VUID-vkCreateShaderModule-pCreateInfo-06905]]
    If pname:pCreateInfo is not `NULL`, pname:pCreateInfo->pNext must: be
    `NULL`
endif::VK_EXT_validation_cache[]
****

include::{generated}/validity/protos/vkCreateShaderModule.adoc[]
--

[open,refpage='VkShaderModuleCreateInfo',desc='Structure specifying parameters of a newly created shader module',type='structs']
--
The sname:VkShaderModuleCreateInfo structure is defined as:

include::{generated}/api/structs/VkShaderModuleCreateInfo.adoc[]

  * pname:sType is the type of this structure.
  * pname:pNext is `NULL` or a pointer to a structure extending this
    structure.
  * pname:flags is reserved for future use.
  * pname:codeSize is the size, in bytes, of the code pointed to by
    pname:pCode.
  * pname:pCode is a pointer to code that is used to create the shader
    module.
    The type and format of the code is determined from the content of the
    memory addressed by pname:pCode.
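
As an illustration only, an application might sanity-check a code blob
against the pname:codeSize and pname:pCode rules before filling in this
structure.
The sketch below is not part of the API: the helper name
`looks_like_spirv` and the `SPIRV_MAGIC` macro are hypothetical, and the
check assumes the blob is SPIR-V in host byte order, so it would reject GLSL
source and byte-swapped modules.
It does not replace full SPIR-V validation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* First word of every SPIR-V module (the SPIR-V magic number). */
#define SPIRV_MAGIC 0x07230203u

/*
 * Hypothetical pre-check mirroring the codeSize rules for SPIR-V code:
 * the size must be greater than 0 and a multiple of 4, and the first
 * word must identify the blob as SPIR-V.
 */
static bool looks_like_spirv(const uint32_t *pCode, size_t codeSize)
{
    if (pCode == NULL || codeSize == 0 || codeSize % 4 != 0) {
        return false;
    }
    return pCode[0] == SPIRV_MAGIC;
}
```

A caller would typically run such a check on data loaded from disk and fall
back to an error path when it fails, rather than passing the data on to
flink:vkCreateShaderModule.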

.Valid Usage
****
  * [[VUID-VkShaderModuleCreateInfo-codeSize-01085]]
    pname:codeSize must: be greater than 0
ifndef::VK_NV_glsl_shader[]
  * [[VUID-VkShaderModuleCreateInfo-codeSize-01086]]
    pname:codeSize must: be a multiple of 4
  * [[VUID-VkShaderModuleCreateInfo-pCode-01087]]
    pname:pCode must: point to valid SPIR-V code, formatted and packed as
    described by the <<spirv-spec,Khronos SPIR-V Specification>>
  * [[VUID-VkShaderModuleCreateInfo-pCode-01088]]
    pname:pCode must: adhere to the validation rules described by the
    <<spirvenv-module-validation, Validation Rules within a Module>> section
    of the <<spirvenv-capabilities,SPIR-V Environment>> appendix
endif::VK_NV_glsl_shader[]
ifdef::VK_NV_glsl_shader[]
  * [[VUID-VkShaderModuleCreateInfo-pCode-01376]]
    If pname:pCode is a pointer to SPIR-V code, pname:codeSize must: be a
    multiple of 4
  * [[VUID-VkShaderModuleCreateInfo-pCode-01377]]
    pname:pCode must: point to either valid SPIR-V code, formatted and
    packed as described by the <<spirv-spec,Khronos SPIR-V Specification>>
    or valid GLSL code which must: be written to the `GL_KHR_vulkan_glsl`
    extension specification
  * [[VUID-VkShaderModuleCreateInfo-pCode-01378]]
    If pname:pCode is a pointer to SPIR-V code, that code must: adhere to
    the validation rules described by the <<spirvenv-module-validation,
    Validation Rules within a Module>> section of the
    <<spirvenv-capabilities,SPIR-V Environment>> appendix
  * [[VUID-VkShaderModuleCreateInfo-pCode-01379]]
    If pname:pCode is a pointer to GLSL code, it must: be valid GLSL code
    written to the `GL_KHR_vulkan_glsl` GLSL extension specification
endif::VK_NV_glsl_shader[]
  * [[VUID-VkShaderModuleCreateInfo-pCode-01089]]
    pname:pCode must: declare the code:Shader capability for SPIR-V code
  * [[VUID-VkShaderModuleCreateInfo-pCode-01090]]
    pname:pCode must: not declare any capability that is not supported
    by the API, as described by the <<spirvenv-module-validation,
    Capabilities>> section of the <<spirvenv-capabilities,SPIR-V
    Environment>> appendix
  * [[VUID-VkShaderModuleCreateInfo-pCode-01091]]
    If pname:pCode declares any of the capabilities listed in the
    <<spirvenv-capabilities-table, SPIR-V Environment>> appendix, one of the
    corresponding requirements must: be satisfied
  * [[VUID-VkShaderModuleCreateInfo-pCode-04146]]
    pname:pCode must: not declare any SPIR-V extension that is not supported
    by the API, as described by the <<spirvenv-extensions, Extension>>
    section of the <<spirvenv-capabilities,SPIR-V Environment>> appendix
  * [[VUID-VkShaderModuleCreateInfo-pCode-04147]]
    If pname:pCode declares any of the SPIR-V extensions listed in the
    <<spirvenv-extensions-table,SPIR-V Environment>> appendix, one of the
    corresponding requirements must: be satisfied
****

include::{generated}/validity/structs/VkShaderModuleCreateInfo.adoc[]
--

[open,refpage='VkShaderModuleCreateFlags',desc='Reserved for future use',type='flags']
--
include::{generated}/api/flags/VkShaderModuleCreateFlags.adoc[]

tname:VkShaderModuleCreateFlags is a bitmask type for setting a mask, but is
currently reserved for future use.
--

ifdef::VK_EXT_validation_cache[]
include::{chapters}/VK_EXT_validation_cache/shader-module-validation-cache.adoc[]
endif::VK_EXT_validation_cache[]


[open,refpage='vkDestroyShaderModule',desc='Destroy a shader module',type='protos']
--
To destroy a shader module, call:

include::{generated}/api/protos/vkDestroyShaderModule.adoc[]

  * pname:device is the logical device that destroys the shader module.
  * pname:shaderModule is the handle of the shader module to destroy.
  * pname:pAllocator controls host memory allocation as described in the
    <<memory-allocation, Memory Allocation>> chapter.

A shader module can: be destroyed while pipelines created using its shaders
are still in use.

.Valid Usage
****
  * [[VUID-vkDestroyShaderModule-shaderModule-01092]]
    If sname:VkAllocationCallbacks were provided when pname:shaderModule was
    created, a compatible set of callbacks must: be provided here
  * [[VUID-vkDestroyShaderModule-shaderModule-01093]]
    If no sname:VkAllocationCallbacks were provided when pname:shaderModule
    was created, pname:pAllocator must: be `NULL`
****

include::{generated}/validity/protos/vkDestroyShaderModule.adoc[]
--


ifdef::VK_EXT_shader_module_identifier[]
[[shaders-identifiers]]
== Shader Module Identifiers

[open,refpage='vkGetShaderModuleIdentifierEXT',desc='Query a unique identifier for a shader module',type='protos']
--
Shader modules have unique identifiers associated with them.
To query an implementation provided identifier, call:

include::{generated}/api/protos/vkGetShaderModuleIdentifierEXT.adoc[]

  * pname:device is the logical device that created the shader module.
  * pname:shaderModule is the handle of the shader module.
  * pname:pIdentifier is a pointer to the returned
    slink:VkShaderModuleIdentifierEXT.

The identifier returned by the implementation must: only depend on
pname:shaderModuleIdentifierAlgorithmUUID and information provided in the
slink:VkShaderModuleCreateInfo which created pname:shaderModule.
The implementation may: return equal identifiers for two different
slink:VkShaderModuleCreateInfo structures if the difference does not affect
pipeline compilation.
Identifiers are only meaningful on different slink:VkDevice objects if the
device the identifier was queried from had the same
<<limits-shaderModuleIdentifierAlgorithmUUID,
pname:shaderModuleIdentifierAlgorithmUUID>> as the device consuming the
identifier.

.Valid Usage
****
  * [[VUID-vkGetShaderModuleIdentifierEXT-shaderModuleIdentifier-06884]]
    <<features-shaderModuleIdentifier, pname:shaderModuleIdentifier>>
    feature must: be enabled
****

include::{generated}/validity/protos/vkGetShaderModuleIdentifierEXT.adoc[]
--

[open,refpage='vkGetShaderModuleCreateInfoIdentifierEXT',desc='Query a unique identifier for a shader module create info',type='protos']
--
slink:VkShaderModuleCreateInfo structures have unique identifiers associated
with them.
To query an implementation provided identifier, call:

include::{generated}/api/protos/vkGetShaderModuleCreateInfoIdentifierEXT.adoc[]

  * pname:device is the logical device that can: create a
    slink:VkShaderModule from pname:pCreateInfo.
  * pname:pCreateInfo is a pointer to a slink:VkShaderModuleCreateInfo
    structure.
  * pname:pIdentifier is a pointer to the returned
    slink:VkShaderModuleIdentifierEXT.

The identifier returned by the implementation must: only depend on
pname:shaderModuleIdentifierAlgorithmUUID and information provided in the
slink:VkShaderModuleCreateInfo.
The implementation may: return equal identifiers for two different
slink:VkShaderModuleCreateInfo structures if the difference does not affect
pipeline compilation.
Identifiers are only meaningful on different slink:VkDevice objects if the
device the identifier was queried from had the same
<<limits-shaderModuleIdentifierAlgorithmUUID,
pname:shaderModuleIdentifierAlgorithmUUID>> as the device consuming the
identifier.

The identifier returned by the implementation in
flink:vkGetShaderModuleCreateInfoIdentifierEXT must: be equal to the
identifier returned by flink:vkGetShaderModuleIdentifierEXT given equivalent
definitions of slink:VkShaderModuleCreateInfo and any chained pname:pNext
structures.

.Valid Usage
****
  * [[VUID-vkGetShaderModuleCreateInfoIdentifierEXT-shaderModuleIdentifier-06885]]
    <<features-shaderModuleIdentifier, pname:shaderModuleIdentifier>>
    feature must: be enabled
****

include::{generated}/validity/protos/vkGetShaderModuleCreateInfoIdentifierEXT.adoc[]
--

[open,refpage='VkShaderModuleIdentifierEXT',desc='A unique identifier for a shader module',type='structs']
--
slink:VkShaderModuleIdentifierEXT represents a shader module identifier
returned by the implementation.

include::{generated}/api/structs/VkShaderModuleIdentifierEXT.adoc[]

  * pname:sType is the type of this structure.
  * pname:pNext is `NULL` or a pointer to a structure extending this
    structure.
  * pname:identifierSize is the size, in bytes, of valid data returned in
    pname:identifier.
  * pname:identifier is a buffer of opaque data specifying an identifier.

Any returned values beyond the first pname:identifierSize bytes are
undefined:.
Implementations must: return an pname:identifierSize greater than 0, and
less than or equal to ename:VK_MAX_SHADER_MODULE_IDENTIFIER_SIZE_EXT.

Two identifiers are considered equal if pname:identifierSize is equal and
the first pname:identifierSize bytes of pname:identifier compare equal.

Implementations may: return a different pname:identifierSize for different
modules.
Implementations should: ensure that pname:identifierSize is large enough to
uniquely define a shader module.

include::{generated}/validity/structs/VkShaderModuleIdentifierEXT.adoc[]
--

[open,refpage='VK_MAX_SHADER_MODULE_IDENTIFIER_SIZE_EXT',desc='Maximum length of a shader module identifier',type='consts']
--
ename:VK_MAX_SHADER_MODULE_IDENTIFIER_SIZE_EXT is the maximum length, in
bytes, of a shader module identifier, as returned in
slink:VkShaderModuleIdentifierEXT::pname:identifierSize.
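
The equality rule for identifiers given above (equal pname:identifierSize,
then a byte-wise comparison of only the first pname:identifierSize bytes)
can be sketched in C as follows.
This is an illustration under stated assumptions, not API text: the
`ShaderModuleIdentifier` type is a hypothetical stand-in that mirrors only
the two relevant members, and `MAX_IDENTIFIER_SIZE` stands in for
ename:VK_MAX_SHADER_MODULE_IDENTIFIER_SIZE_EXT so that the example compiles
without the Vulkan headers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Stand-in for VK_MAX_SHADER_MODULE_IDENTIFIER_SIZE_EXT (assumption:
 * real code would take the constant from the Vulkan headers). */
#define MAX_IDENTIFIER_SIZE 32u

/* Hypothetical mirror of the members used for comparison. */
typedef struct ShaderModuleIdentifier {
    uint32_t identifierSize;
    uint8_t  identifier[MAX_IDENTIFIER_SIZE];
} ShaderModuleIdentifier;

/* Returns true if the two identifiers are equal: same size, and the
 * first identifierSize bytes match. Bytes past identifierSize are
 * undefined and are deliberately never read. */
static bool shader_module_identifiers_equal(const ShaderModuleIdentifier *a,
                                            const ShaderModuleIdentifier *b)
{
    /* A returned size is greater than 0 and no larger than the maximum. */
    assert(a->identifierSize > 0 && a->identifierSize <= MAX_IDENTIFIER_SIZE);
    assert(b->identifierSize > 0 && b->identifierSize <= MAX_IDENTIFIER_SIZE);

    if (a->identifierSize != b->identifierSize) {
        return false;
    }
    return memcmp(a->identifier, b->identifier, a->identifierSize) == 0;
}
```

Comparing the whole fixed-size array instead would be a mistake, because the
trailing bytes may differ between two otherwise equal identifiers.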

include::{generated}/api/enums/VK_MAX_SHADER_MODULE_IDENTIFIER_SIZE_EXT.adoc[]
--
endif::VK_EXT_shader_module_identifier[]


[[shaders-execution]]
== Shader Execution

At each stage of the pipeline, multiple invocations of a shader may: execute
simultaneously.
Further, invocations of a single shader produced as the result of different
commands may: execute simultaneously.
The relative execution order of invocations of the same shader type is
undefined:.
Shader invocations may: complete in a different order than that in which the
primitives they originated from were drawn or dispatched by the application.
However, fragment shader outputs are written to attachments in
<<primsrast-order,rasterization order>>.

The relative execution order of invocations of different shader types is
largely undefined:.
However, when invoking a shader whose inputs are generated from a previous
pipeline stage, the shader invocations from the previous stage are
guaranteed to have executed far enough to generate input values for all
required inputs.


[[shaders-termination]]
=== Shader Termination

A shader invocation that is _terminated_ has finished executing
instructions.

Executing code:OpReturn in the entry point, or executing
code:OpTerminateInvocation in any function will terminate an invocation.
Implementations may: also terminate a shader invocation when code:OpKill is
executed in any function; otherwise it becomes a
<<shaders-helper-invocations, helper invocation>>.

In addition to the above conditions, <<shaders-helper-invocations,helper
invocations>> are terminated when all non-helper invocations in the same
<<shaders-derivative-operations,derivative group>> either terminate or
become <<shaders-helper-invocations,helper invocations>> via
ifdef::VK_EXT_shader_demote_to_helper_invocation[]
code:OpDemoteToHelperInvocationEXT or
endif::VK_EXT_shader_demote_to_helper_invocation[]
code:OpKill.

A shader stage for a given command completes execution when all invocations
for that stage have terminated.


[[shaders-execution-memory-ordering]]
== Shader Memory Access Ordering

The order in which image or buffer memory is read or written by shaders is
largely undefined:.
For some shader types (vertex, tessellation evaluation, and in some cases,
fragment), even the number of shader invocations that may: perform loads and
stores is undefined:.

In particular, the following rules apply:

  * <<shaders-vertex-execution,Vertex>> and
    <<shaders-tessellation-evaluation-execution,tessellation evaluation>>
    shaders will be invoked at least once for each unique vertex, as defined
    in those sections.
  * <<fragops-shader,Fragment>> shaders will be invoked zero or more times,
    as defined in that section.
  * The relative execution order of invocations of the same shader type is
    undefined:.
    A store issued by a shader when working on primitive B might complete
    prior to a store for primitive A, even if primitive A is specified prior
    to primitive B.
    This applies even to fragment shaders; while fragment shader outputs are
    always written to the framebuffer in <<primsrast-order, rasterization
    order>>, stores executed by fragment shader invocations are not.
  * The relative execution order of invocations of different shader types is
    largely undefined:.

[NOTE]
.Note
====
The above limitations on shader invocation order make some forms of
synchronization between shader invocations within a single set of primitives
unimplementable.
For example, having one invocation poll memory written by another invocation
assumes that the other invocation has been launched and will complete its
writes in finite time.
====

ifdef::VK_VERSION_1_2,VK_KHR_vulkan_memory_model[]

The <<memory-model,Memory Model>> appendix defines the terminology and rules
for how to correctly communicate between shader invocations, such as when a
write is <<memory-model-visible-to,Visible-To>> a read, and what constitutes
a <<memory-model-access-data-race,Data Race>>.

Applications must: not cause a data race.

endif::VK_VERSION_1_2,VK_KHR_vulkan_memory_model[]

ifndef::VK_VERSION_1_2,VK_KHR_vulkan_memory_model[]

Stores issued to different memory locations within a single shader
invocation may: not be visible to other invocations, or may: not become
visible in the order they were performed.

The code:OpMemoryBarrier instruction can: be used to provide stronger
ordering of reads and writes performed by a single invocation.
code:OpMemoryBarrier guarantees that any memory transactions issued by the
shader invocation prior to the instruction complete prior to the memory
transactions issued after the instruction.
Memory barriers are needed for algorithms that require multiple invocations
to access the same memory and require the operations to be performed in a
partially-defined relative order.
For example, if one shader invocation does a series of writes, followed by
an code:OpMemoryBarrier instruction, followed by another write, then the
results of the series of writes before the barrier become visible to other
shader invocations at a time earlier or equal to when the results of the
final write become visible to those invocations.
In practice it means that another invocation that sees the results of the
final write would also see the previous writes.
Without the memory barrier, the final write may: be visible before the
previous writes.

Writes that are the result of shader stores through a variable decorated
with code:Coherent automatically have available writes to the same buffer,
buffer view, or image view made visible to them, and are themselves
automatically made available to access by the same buffer, buffer view, or
image view.
Reads that are the result of shader loads through a variable decorated with
code:Coherent automatically have available writes to the same buffer, buffer
view, or image view made visible to them.
The order that coherent writes to different locations become available is
undefined:, unless enforced by a memory barrier instruction or other memory
dependency.

[NOTE]
.Note
====
Explicit memory dependencies must: still be used to guarantee availability
and visibility for access via other buffers, buffer views, or image views.
====

The built-in atomic memory transaction instructions can: be used to read and
write a given memory address atomically.
While built-in atomic functions issued by multiple shader invocations are
executed in undefined: order relative to each other, these functions perform
both a read and a write of a memory address and guarantee that no other
memory transaction will write to the underlying memory between the read and
write.
Atomic operations ensure automatic availability and visibility for writes
and reads in the same way as those to code:Coherent variables.

[NOTE]
.Note
====
Memory accesses performed on different resource descriptors with the same
memory backing may: not be well-defined even with the code:Coherent
decoration or via atomics, due to things such as image layouts or ownership
of the resource - as described in the <<synchronization, Synchronization and
Cache Control>> chapter.
====

[NOTE]
.Note
====
Atomics allow shaders to use shared global addresses for mutual exclusion or
as counters, among other uses.
====

endif::VK_VERSION_1_2,VK_KHR_vulkan_memory_model[]

The SPIR-V *SubgroupMemory*, *CrossWorkgroupMemory*, and
*AtomicCounterMemory* memory semantics are ignored.
Sequentially consistent atomics and barriers are not supported and
*SequentiallyConsistent* is treated as *AcquireRelease*.
*SequentiallyConsistent* should: not be used.


[[shaders-inputs]]
== Shader Inputs and Outputs

Data is passed into and out of shaders using variables with input or output
storage class, respectively.
User-defined inputs and outputs are connected between stages by matching
their code:Location decorations.
Additionally, data can: be provided by or communicated to special functions
provided by the execution environment using code:BuiltIn decorations.

In many cases, the same code:BuiltIn decoration can: be used in multiple
shader stages with similar meaning.
The specific behavior of variables decorated as code:BuiltIn is documented
in the following sections.

ifdef::VK_NV_mesh_shader,VK_EXT_mesh_shader[]
[[shaders-task]]
== Task Shaders

Task shaders operate in conjunction with the mesh shaders to produce a
collection of primitives that will be processed by subsequent stages of the
graphics pipeline.
Their primary purpose is to create a variable number of subsequent mesh
shader invocations.

Task shaders are invoked via the execution of the
<<drawing-mesh-shading,programmable mesh shading>> pipeline.

The task shader has no fixed-function inputs other than variables
identifying the specific workgroup and invocation.
ifdef::VK_NV_mesh_shader[]
In the code:TaskNV {ExecutionModel} the number of mesh shader workgroups to
create is specified via a code:TaskCountNV decorated output variable.
endif::VK_NV_mesh_shader[]
ifdef::VK_EXT_mesh_shader[]
In the code:TaskEXT {ExecutionModel} the number of mesh shader workgroups to
create is specified via the code:OpEmitMeshTasksEXT instruction.
endif::VK_EXT_mesh_shader[]

The task shader can write additional outputs to task memory, which can be
read by all of the mesh shader workgroups it created.


=== Task Shader Execution

Task workloads are formed from groups of work items called workgroups and
processed by the task shader in the current graphics pipeline.
A workgroup is a collection of shader invocations that execute the same
shader, potentially in parallel.
Task shaders execute in _global workgroups_ which are divided into a number
of _local workgroups_ with a size that can: be set by assigning a value to
the code:LocalSize
ifdef::VK_VERSION_1_3,VK_KHR_maintenance4[or code:LocalSizeId]
execution mode or via an object decorated by the code:WorkgroupSize
decoration.
An invocation within a local workgroup can: share data with other members of
the local workgroup through shared variables and issue memory and control
flow barriers to synchronize with other members of the local workgroup.
ifdef::VK_EXT_mesh_shader[]
ifdef::VK_VERSION_1_1,VK_KHR_multiview[]
If the subpass includes multiple views in its view mask, a Task shader using
code:TaskEXT {ExecutionModel} may: be invoked separately for each view.
endif::VK_VERSION_1_1,VK_KHR_multiview[]
endif::VK_EXT_mesh_shader[]


[[shaders-mesh]]
== Mesh Shaders

Mesh shaders operate in workgroups to produce a collection of primitives
that will be processed by subsequent stages of the graphics pipeline.
Each workgroup emits zero or more output primitives and the group of
vertices and their associated data required for each output primitive.

Mesh shaders are invoked via the execution of the
<<drawing-mesh-shading,programmable mesh shading>> pipeline.

The only inputs available to the mesh shader are variables identifying the
specific workgroup and invocation and, if applicable, any outputs written to
task memory by the task shader that spawned the mesh shader's workgroup.
The mesh shader can operate without a task shader as well.

The invocations of the mesh shader workgroup write an output mesh,
comprising a set of primitives with per-primitive attributes, a set of
vertices with per-vertex attributes, and an array of indices identifying the
mesh vertices that belong to each primitive.
The primitives of this mesh are then processed by subsequent graphics
pipeline stages, where the outputs of the mesh shader form an interface with
the fragment shader.


=== Mesh Shader Execution

Mesh workloads are formed from groups of work items called workgroups and
processed by the mesh shader in the current graphics pipeline.
A workgroup is a collection of shader invocations that execute the same
shader, potentially in parallel.
Mesh shaders execute in _global workgroups_ which are divided into a number
of _local workgroups_ with a size that can: be set by assigning a value to
the code:LocalSize
ifdef::VK_VERSION_1_3,VK_KHR_maintenance4[or code:LocalSizeId]
execution mode or via an object decorated by the code:WorkgroupSize
decoration.
An invocation within a local workgroup can: share data with other members of
the local workgroup through shared variables and issue memory and control
flow barriers to synchronize with other members of the local workgroup.

The _global workgroups_ may be generated explicitly via the API, or
implicitly through the task shader's work creation mechanism.
endif::VK_NV_mesh_shader,VK_EXT_mesh_shader[]
ifdef::VK_EXT_mesh_shader[]
ifdef::VK_VERSION_1_1,VK_KHR_multiview[]
If the subpass includes multiple views in its view mask, a Mesh shader using
code:MeshEXT {ExecutionModel} may: be invoked separately for each view.
endif::VK_VERSION_1_1,VK_KHR_multiview[]
endif::VK_EXT_mesh_shader[]


[[shaders-vertex]]
== Vertex Shaders

Each vertex shader invocation operates on one vertex and its associated
<<fxvertex-attrib,vertex attribute>> data, and outputs one vertex and
associated data.
ifndef::VK_NV_mesh_shader,VK_EXT_mesh_shader[]
Graphics pipelines must: include a vertex shader, and the vertex shader
stage is always the first shader stage in the graphics pipeline.
endif::VK_NV_mesh_shader,VK_EXT_mesh_shader[]
ifdef::VK_NV_mesh_shader,VK_EXT_mesh_shader[]
Graphics pipelines using primitive shading must: include a vertex shader,
and the vertex shader stage is always the first shader stage in the graphics
pipeline.
endif::VK_NV_mesh_shader,VK_EXT_mesh_shader[]


[[shaders-vertex-execution]]
=== Vertex Shader Execution

A vertex shader must: be executed at least once for each vertex specified by
a drawing command.
ifdef::VK_VERSION_1_1,VK_KHR_multiview[]
If the subpass includes multiple views in its view mask, the shader may: be
invoked separately for each view.
endif::VK_VERSION_1_1,VK_KHR_multiview[]
During execution, the shader is presented with the index of the vertex and
instance for which it has been invoked.
Input variables declared in the vertex shader are filled by the
implementation with the values of vertex attributes associated with the
invocation being executed.

If the same vertex is specified multiple times in a drawing command (e.g. by
including the same index value multiple times in an index buffer) the
implementation may: reuse the results of vertex shading if it can statically
determine that the vertex shader invocations will produce identical results.

[NOTE]
.Note
====
It is implementation-dependent when and if results of vertex shading are
reused, and thus how many times the vertex shader will be executed.
This is true also if the vertex shader contains stores or atomic operations
(see <<features-vertexPipelineStoresAndAtomics,
pname:vertexPipelineStoresAndAtomics>>).
====


[[shaders-tessellation-control]]
== Tessellation Control Shaders

The tessellation control shader is used to read an input patch provided by
the application and to produce an output patch.
Each tessellation control shader invocation operates on an input patch
(after all control points in the patch are processed by a vertex shader) and
its associated data, and outputs a single control point of the output patch
and its associated data, and can: also output additional per-patch data.
The input patch is sized according to the pname:patchControlPoints member of
slink:VkPipelineTessellationStateCreateInfo, as part of input assembly.

ifdef::VK_EXT_extended_dynamic_state2[]
The input patch can also be dynamically sized with the
pname:patchControlPoints parameter of flink:vkCmdSetPatchControlPointsEXT.

[open,refpage='vkCmdSetPatchControlPointsEXT',desc='Specify the number of control points per patch dynamically for a command buffer',type='protos']
--
To <<pipelines-dynamic-state, dynamically set>> the number of control points
per patch, call:

include::{generated}/api/protos/vkCmdSetPatchControlPointsEXT.adoc[]

  * pname:commandBuffer is the command buffer into which the command will be
    recorded.
  * pname:patchControlPoints specifies the number of control points per
    patch.

This command sets the number of control points per patch for subsequent
drawing commands when the graphics pipeline is created with
ename:VK_DYNAMIC_STATE_PATCH_CONTROL_POINTS_EXT set in
slink:VkPipelineDynamicStateCreateInfo::pname:pDynamicStates.
Otherwise, this state is specified by the
slink:VkPipelineTessellationStateCreateInfo::pname:patchControlPoints value
used to create the currently active pipeline.

.Valid Usage
****
  * [[VUID-vkCmdSetPatchControlPointsEXT-None-04873]]
    The <<features-extendedDynamicState2PatchControlPoints,
    pname:extendedDynamicState2PatchControlPoints>> feature must: be enabled
  * [[VUID-vkCmdSetPatchControlPointsEXT-patchControlPoints-04874]]
    pname:patchControlPoints must: be greater than zero and less than or
    equal to sname:VkPhysicalDeviceLimits::pname:maxTessellationPatchSize
****

include::{generated}/validity/protos/vkCmdSetPatchControlPointsEXT.adoc[]
--
endif::VK_EXT_extended_dynamic_state2[]

The size of the output patch is controlled by the code:OpExecutionMode
code:OutputVertices specified in the tessellation control or tessellation
evaluation shaders, which must: be specified in at least one of the shaders.
The size of the input and output patches must: each be greater than zero and
less than or equal to
sname:VkPhysicalDeviceLimits::pname:maxTessellationPatchSize.
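
The patch size bound stated above is simple enough to check on the
application side before a pipeline is created or dynamic state is recorded.
The following is a minimal sketch of such a check; both function names are
hypothetical, and the limit is passed in as a plain integer on the
assumption that the caller has already queried
sname:VkPhysicalDeviceLimits::pname:maxTessellationPatchSize.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Returns true if a patch size is greater than zero and does not exceed
 * the device limit supplied by the caller. */
static bool patch_control_points_valid(uint32_t patchControlPoints,
                                       uint32_t maxTessellationPatchSize)
{
    return patchControlPoints > 0 &&
           patchControlPoints <= maxTessellationPatchSize;
}

/* Debug-build guard a caller might place just before using the value. */
static void debug_check_patch_control_points(uint32_t patchControlPoints,
                                             uint32_t maxTessellationPatchSize)
{
    assert(patch_control_points_valid(patchControlPoints,
                                      maxTessellationPatchSize));
}
```

The same predicate applies to both the input patch size and the output patch
size, since the text places the same bound on each.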
737 738 739[[shaders-tessellation-control-execution]] 740=== Tessellation Control Shader Execution 741 742A tessellation control shader is invoked at least once for each _output_ 743vertex in a patch. 744ifdef::VK_VERSION_1_1,VK_KHR_multiview[] 745If the subpass includes multiple views in its view mask, the shader may: be 746invoked separately for each view. 747endif::VK_VERSION_1_1,VK_KHR_multiview[] 748 749Inputs to the tessellation control shader are generated by the vertex 750shader. 751Each invocation of the tessellation control shader can: read the attributes 752of any incoming vertices and their associated data. 753The invocations corresponding to a given patch execute logically in 754parallel, with undefined: relative execution order. 755However, the code:OpControlBarrier instruction can: be used to provide 756limited control of the execution order by synchronizing invocations within a 757patch, effectively dividing tessellation control shader execution into a set 758of phases. 759Tessellation control shaders will read undefined: values if one invocation 760reads a per-vertex or per-patch output written by another invocation at any 761point during the same phase, or if two invocations attempt to write 762different values to the same per-patch output in a single phase. 763 764 765[[shaders-tessellation-evaluation]] 766== Tessellation Evaluation Shaders 767 768The Tessellation Evaluation Shader operates on an input patch of control 769points and their associated data, and a single input barycentric coordinate 770indicating the invocation's relative position within the subdivided patch, 771and outputs a single vertex and its associated data. 772 773 774[[shaders-tessellation-evaluation-execution]] 775=== Tessellation Evaluation Shader Execution 776 777A tessellation evaluation shader is invoked at least once for each unique 778vertex generated by the tessellator. 
779ifdef::VK_VERSION_1_1,VK_KHR_multiview[] 780If the subpass includes multiple views in its view mask, the shader may: be 781invoked separately for each view. 782endif::VK_VERSION_1_1,VK_KHR_multiview[] 783 784 785[[shaders-geometry]] 786== Geometry Shaders 787 788The geometry shader operates on a group of vertices and their associated 789data assembled from a single input primitive, and emits zero or more output 790primitives and the group of vertices and their associated data required for 791each output primitive. 792 793 794[[shaders-geometry-execution]] 795=== Geometry Shader Execution 796 797A geometry shader is invoked at least once for each primitive produced by 798the tessellation stages, or at least once for each primitive generated by 799<<drawing,primitive assembly>> when tessellation is not in use. 800A shader can request that the geometry shader runs multiple 801<<geometry-invocations, instances>>. 802A geometry shader is invoked at least once for each instance. 803ifdef::VK_VERSION_1_1,VK_KHR_multiview[] 804If the subpass includes multiple views in its view mask, the shader may: be 805invoked separately for each view. 806endif::VK_VERSION_1_1,VK_KHR_multiview[] 807 808 809[[shaders-fragment]] 810== Fragment Shaders 811 812Fragment shaders are invoked as a <<fragops-shader, fragment operation>> in 813a graphics pipeline. 814Each fragment shader invocation operates on a single fragment and its 815associated data. 816With few exceptions, fragment shaders do not have access to any data 817associated with other fragments and are considered to execute in isolation 818of fragment shader invocations associated with other fragments. 819 820 821[[shaders-compute]] 822== Compute Shaders 823 824Compute shaders are invoked via flink:vkCmdDispatch and 825flink:vkCmdDispatchIndirect commands. 826In general, they have access to similar resources as shader stages executing 827as part of a graphics pipeline. 

Compute workloads are formed from groups of work items called workgroups and
processed by the compute shader in the current compute pipeline.
A workgroup is a collection of shader invocations that execute the same
shader, potentially in parallel.
Compute shaders execute in _global workgroups_, which are divided into a
number of _local workgroups_ with a size that can: be set by assigning a
value to the code:LocalSize
ifdef::VK_VERSION_1_3,VK_KHR_maintenance4[or code:LocalSizeId]
execution mode or via an object decorated by the code:WorkgroupSize
decoration.
An invocation within a local workgroup can: share data with other members of
the local workgroup through shared variables and issue memory and control
flow barriers to synchronize with other members of the local workgroup.


ifdef::VK_NV_ray_tracing,VK_KHR_ray_tracing_pipeline[]
[[shaders-raytracing-shaders]]
[[shaders-ray-generation]]
== Ray Generation Shaders

A ray generation shader is similar to a compute shader.
Its main purpose is to execute ray tracing queries using code:OpTraceRayKHR
instructions and process the results.


[[shaders-ray-generation-execution]]
=== Ray Generation Shader Execution

One ray generation shader is executed per ray tracing dispatch.
Its location in the shader binding table (see <<shader-binding-table,Shader
Binding Table>> for details) is passed directly into
ifdef::VK_KHR_ray_tracing_pipeline[]
flink:vkCmdTraceRaysKHR using the pname:pRaygenShaderBindingTable parameter
endif::VK_KHR_ray_tracing_pipeline[]
ifdef::VK_KHR_ray_tracing_pipeline+VK_NV_ray_tracing[or]
ifdef::VK_NV_ray_tracing[]
flink:vkCmdTraceRaysNV using the pname:raygenShaderBindingTableBuffer and
pname:raygenShaderBindingOffset parameters
endif::VK_NV_ray_tracing[]
.


[[shaders-intersection]]
== Intersection Shaders

Intersection shaders enable the implementation of arbitrary,
application-defined geometric primitives.
An intersection shader for a primitive is executed whenever its axis-aligned
bounding box is hit by a ray.

Like other ray tracing shader domains, an intersection shader operates on a
single ray at a time.
It also operates on a single primitive at a time.
It is therefore the purpose of an intersection shader to compute the
ray-primitive intersections and report them.
To report an intersection, the shader calls the code:OpReportIntersectionKHR
instruction.

An intersection shader communicates with any-hit and closest hit shaders by
generating attribute values that they can: read.
Intersection shaders cannot: read or modify the ray payload.


[[shaders-intersection-execution]]
=== Intersection Shader Execution

The order in which intersections are found along a ray, and therefore the
order in which intersection shaders are executed, is unspecified.

The intersection shader of the closest AABB that intersects the ray is
guaranteed to be executed at some point during traversal, unless the ray is
forcibly terminated.


[[shaders-any-hit]]
== Any-Hit Shaders

The any-hit shader is executed after the intersection shader reports an
intersection that lies within the current [eq]#[t~min~,t~max~]# of the ray.
The main use of any-hit shaders is to programmatically decide whether or not
an intersection will be accepted.
The intersection will be accepted unless the shader calls the
code:OpIgnoreIntersectionKHR instruction.
Any-hit shaders have read-only access to the attributes generated by the
corresponding intersection shader, and can: read or modify the ray payload.
913 914 915[[shaders-any-hit-execution]] 916=== Any-Hit Shader Execution 917 918The order in which intersections are found along a ray, and therefore the 919order in which any-hit shaders are executed, is unspecified. 920 921The any-hit shader of the closest hit is guaranteed to be executed at some 922point during traversal, unless the ray is forcibly terminated. 923 924 925[[shaders-closest-hit]] 926== Closest Hit Shaders 927 928Closest hit shaders have read-only access to the attributes generated by the 929corresponding intersection shader, and can: read or modify the ray payload. 930They also have access to a number of system-generated values. 931Closest hit shaders can: call code:OpTraceRayKHR to recursively trace rays. 932 933 934[[shaders-closest-hit-execution]] 935=== Closest Hit Shader Execution 936 937Exactly one closest hit shader is executed when traversal is finished and an 938intersection has been found and accepted. 939 940 941[[shaders-miss]] 942== Miss Shaders 943 944Miss shaders can: access the ray payload and can: trace new rays through the 945code:OpTraceRayKHR instruction, but cannot: access attributes since they are 946not associated with an intersection. 947 948 949[[shaders-miss-execution]] 950=== Miss Shader Execution 951 952A miss shader is executed instead of a closest hit shader if no intersection 953was found during traversal. 954 955 956[[shaders-callable]] 957== Callable Shaders 958 959Callable shaders can: access a callable payload that works similarly to ray 960payloads to do subroutine work. 961 962 963[[shaders-callable-execution]] 964=== Callable Shader Execution 965 966A callable shader is executed by calling code:OpExecuteCallableKHR from an 967allowed shader stage. 
968 969endif::VK_NV_ray_tracing,VK_KHR_ray_tracing_pipeline[] 970 971 972[[shaders-interpolation-decorations]] 973== Interpolation decorations 974 975Variables in the code:Input storage class in a fragment shader's interface 976are interpolated from the values specified by the primitive being 977rasterized. 978 979[NOTE] 980.Note 981==== 982Interpolation decorations can be present on input and output variables in 983pre-rasterization shaders but have no effect on the interpolation performed. 984ifdef::VK_EXT_graphics_pipeline_libraries[] 985However, when linking graphics pipeline libraries, if the 986<<limits-graphicsPipelineLibraryIndependentInterpolationDecoration, 987pname:graphicsPipelineLibraryIndependentInterpolationDecoration>> limit is 988not supported, interpolation qualifiers do need to match between the 989fragment shader input and the last pre-rasterization shader output. 990endif::VK_EXT_graphics_pipeline_libraries[] 991==== 992 993An undecorated input variable will be interpolated with perspective-correct 994interpolation according to the primitive type being rasterized. 995<<line_perspective_interpolation,Lines>> and 996<<triangle_perspective_interpolation,polygons>> are interpolated in the same 997way as the primitive's clip coordinates. 998If the code:NoPerspective decoration is present, linear interpolation is 999instead used for <<line_linear_interpolation,lines>> and 1000<<triangle_linear_interpolation,polygons>>. 1001For points, as there is only a single vertex, input values are never 1002interpolated and instead take the value written for the single vertex. 1003 1004If the code:Flat decoration is present on an input variable, the value is 1005not interpolated, and instead takes its value directly from the 1006<<vertexpostproc-flatshading,provoking vertex>>. 1007Fragment shader inputs that are signed or unsigned integers, integer 1008vectors, or any double-precision floating-point type must: be decorated with 1009code:Flat. 
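
The three behaviors described above can be compared numerically for the two
endpoints of a line.
The sketch below is illustrative only: the function names are hypothetical,
`t` is assumed to be the parametric position along the line in the range
[0,1], and `wa` and `wb` are assumed to be the clip-space [eq]#w# components
of the endpoints.

```c
/* Perspective-correct interpolation (the default for an undecorated input):
 * each endpoint value is weighted by the reciprocal of its clip w. */
double interp_perspective(double fa, double fb,
                          double wa, double wb, double t)
{
    double numerator = (1.0 - t) * fa / wa + t * fb / wb;
    double denominator = (1.0 - t) / wa + t / wb;
    return numerator / denominator;
}

/* Linear interpolation, as used when NoPerspective is present. */
double interp_noperspective(double fa, double fb, double t)
{
    return (1.0 - t) * fa + t * fb;
}

/* Flat: no interpolation, the provoking vertex's value is used as-is.
 * provoking_is_a selects which endpoint is the provoking vertex. */
double interp_flat(double fa, double fb, int provoking_is_a)
{
    return provoking_is_a ? fa : fb;
}
```

When both endpoints have the same [eq]#w#, the first two functions return the
same value; they differ only when [eq]#w# varies across the primitive.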
1010 1011Interpolation of input variables is performed at an implementation-defined 1012position within the fragment area being shaded. 1013The position is further constrained as follows: 1014 1015 * If the code:Centroid decoration is used, the interpolation position used 1016 for the variable must: also fall within the bounds of the primitive 1017 being rasterized. 1018 * If the code:Sample decoration is used, the interpolation position used 1019 for the variable must: be at the position of the sample being shaded by 1020 the current fragment shader invocation. 1021 * If a sample count of 1 is used, the interpolation position must: be at 1022 the center of the fragment area. 1023 1024[NOTE] 1025.Note 1026==== 1027As code:Centroid restricts the possible interpolation position to the 1028covered area of the primitive, the position can be forced to vary between 1029neighboring fragments when it otherwise would not. 1030Derivatives calculated based on these differing locations can produce 1031inconsistent results compared to undecorated inputs. 1032It is recommended that input variables used in derivative calculations are 1033not decorated with code:Centroid. 1034==== 1035 1036ifdef::VK_NV_fragment_shader_barycentric,VK_KHR_fragment_shader_barycentric[] 1037[[shaders-interpolation-decorations-pervertexkhr]] 1038If the code:PerVertexKHR decoration is present on an input variable, the 1039value is not interpolated, and instead values from all input vertices are 1040available in an array. 1041Each index of the array corresponds to one of the vertices of the primitive 1042that produced the fragment. 
endif::VK_NV_fragment_shader_barycentric,VK_KHR_fragment_shader_barycentric[]

ifdef::VK_AMD_shader_explicit_vertex_parameter[]
If the code:CustomInterpAMD decoration is present on an input variable, the
value cannot: be accessed directly; instead the extended instruction
code:InterpolateAtVertexAMD must: be used to obtain values from the input
vertices.
endif::VK_AMD_shader_explicit_vertex_parameter[]


[[shaders-staticuse]]
== Static Use

A SPIR-V module declares a global object in memory using the code:OpVariable
instruction, which results in a pointer code:x to that object.
A specific entry point in a SPIR-V module is said to _statically use_ that
object if that entry point's call tree contains a function containing an
instruction with code:x as an code:id operand.

Static use is not used to control the behavior of variables with code:Input
and code:Output storage.
The effects of those variables are applied based only on whether they are
present in a shader entry point's interface.


[[shaders-scope]]
== Scope

A _scope_ describes a set of shader invocations, where each such set is a
_scope instance_.
Each invocation belongs to one or more scope instances, but belongs to no
more than one scope instance for each scope.

The operations available between invocations in a given scope instance vary,
with smaller scopes generally able to perform more operations, and with
greater efficiency.


[[shaders-scope-cross-device]]
=== Cross Device

All invocations executed in a Vulkan instance fall into a single _cross
device scope instance_.

Whilst the code:CrossDevice scope is defined in SPIR-V, it is disallowed in
Vulkan.
API <<synchronization, synchronization>> commands can: be used to
communicate between devices.
1091 1092 1093[[shaders-scope-device]] 1094=== Device 1095 1096All invocations executed on a single device form a _device scope instance_. 1097 1098ifdef::VK_VERSION_1_2,VK_KHR_vulkan_memory_model[] 1099If the <<features-vulkanMemoryModel, pname:vulkanMemoryModel>> and 1100<<features-vulkanMemoryModelDeviceScope, 1101pname:vulkanMemoryModelDeviceScope>> features are enabled, this scope is 1102represented in SPIR-V by the code:Device code:Scope, which can: be used as a 1103code:Memory code:Scope for barrier and atomic operations. 1104 1105ifdef::VK_KHR_shader_clock[] 1106If both the <<features-shaderDeviceClock, pname:shaderDeviceClock>> and 1107<<features-vulkanMemoryModelDeviceScope, 1108pname:vulkanMemoryModelDeviceScope>> features are enabled, using the 1109code:Device code:Scope with the code:OpReadClockKHR instruction will read 1110from a clock that is consistent across invocations in the same device scope 1111instance. 1112endif::VK_KHR_shader_clock[] 1113endif::VK_VERSION_1_2,VK_KHR_vulkan_memory_model[] 1114 1115There is no method to synchronize the execution of these invocations within 1116SPIR-V, and this can: only be done with API synchronization primitives. 1117 1118ifdef::VK_VERSION_1_1,VK_KHR_device_group[] 1119Invocations executing on different devices in a device group operate in 1120separate device scope instances. 1121endif::VK_VERSION_1_1,VK_KHR_device_group[] 1122 1123ifndef::VK_VERSION_1_2,VK_KHR_vulkan_memory_model[] 1124The scope only extends to the queue family, not the whole device. 1125endif::VK_VERSION_1_2,VK_KHR_vulkan_memory_model[] 1126 1127 1128[[shaders-scope-queue-family]] 1129=== Queue Family 1130 1131Invocations executed by queues in a given queue family form a _queue family 1132scope instance_. 
1133 1134This scope is identified in SPIR-V as the 1135ifdef::VK_VERSION_1_2,VK_KHR_vulkan_memory_model[] 1136code:QueueFamily code:Scope if the <<features-vulkanMemoryModel, 1137pname:vulkanMemoryModel>> feature is enabled, or if not, the 1138endif::VK_VERSION_1_2,VK_KHR_vulkan_memory_model[] 1139code:Device code:Scope, which can: be used as a code:Memory code:Scope for 1140barrier and atomic operations. 1141 1142ifdef::VK_KHR_shader_clock[] 1143If the <<features-shaderDeviceClock, pname:shaderDeviceClock>> feature is 1144enabled, 1145ifdef::VK_VERSION_1_2,VK_KHR_vulkan_memory_model[] 1146but the <<features-vulkanMemoryModelDeviceScope, 1147pname:vulkanMemoryModelDeviceScope>> feature is not enabled, 1148endif::VK_VERSION_1_2,VK_KHR_vulkan_memory_model[] 1149using the code:Device code:Scope with the code:OpReadClockKHR instruction 1150will read from a clock that is consistent across invocations in the same 1151queue family scope instance. 1152endif::VK_KHR_shader_clock[] 1153 1154There is no method to synchronize the execution of these invocations within 1155SPIR-V, and this can: only be done with API synchronization primitives. 1156 1157Each invocation in a queue family scope instance must: be in the same 1158<<shaders-scope-device, device scope instance>>. 1159 1160 1161[[shaders-scope-command]] 1162=== Command 1163 1164Any shader invocations executed as the result of a single command such as 1165flink:vkCmdDispatch or flink:vkCmdDraw form a _command scope instance_. 1166For indirect drawing commands with pname:drawCount greater than one, 1167invocations from separate draws are in separate command scope instances. 1168ifdef::VK_KHR_ray_tracing_pipeline,VK_NV_ray_tracing[] 1169For ray tracing shaders, an invocation group is an implementation-dependent 1170subset of the set of shader invocations of a given shader stage which are 1171produced by a single trace rays command. 
1172endif::VK_KHR_ray_tracing_pipeline,VK_NV_ray_tracing[] 1173 1174There is no specific code:Scope for communication across invocations in a 1175command scope instance. 1176As this has a clear boundary at the API level, coordination here can: be 1177performed in the API, rather than in SPIR-V. 1178 1179Each invocation in a command scope instance must: be in the same 1180<<shaders-scope-queue-family, queue-family scope instance>>. 1181 1182For shaders without defined <<shaders-scope-workgroup, workgroups>>, this 1183set of invocations forms an _invocation group_ as defined in the 1184<<spirv-spec,SPIR-V specification>>. 1185 1186 1187[[shaders-scope-primitive]] 1188=== Primitive 1189 1190Any fragment shader invocations executed as the result of rasterization of a 1191single primitive form a _primitive scope instance_. 1192 1193There is no specific code:Scope for communication across invocations in a 1194primitive scope instance. 1195 1196Any generated <<shaders-helper-invocations, helper invocations>> are 1197included in this scope instance. 1198 1199Each invocation in a primitive scope instance must: be in the same 1200<<shaders-scope-command, command scope instance>>. 1201 1202Any input variables decorated with code:Flat are uniform within a primitive 1203scope instance. 1204 1205 1206// intentionally no VK_NV_ray_tracing here since this scope does not exist there 1207ifdef::VK_KHR_ray_tracing_pipeline[] 1208[[shaders-scope-shadercall]] 1209=== Shader Call 1210 1211Any <<shader-call-related,shader-call-related>> invocations that are 1212executed in one or more ray tracing execution models form a _shader call 1213scope instance_. 1214 1215The code:ShaderCallKHR code:Scope can be used as code:Memory code:Scope for 1216barrier and atomic operations. 1217 1218Each invocation in a shader call scope instance must: be in the same 1219<<shaders-scope-queue-family, queue family scope instance>>. 
1220endif::VK_KHR_ray_tracing_pipeline[] 1221 1222 1223[[shaders-scope-workgroup]] 1224=== Workgroup 1225 1226A _local workgroup_ is a set of invocations that can synchronize and share 1227data with each other using memory in the code:Workgroup storage class. 1228 1229The code:Workgroup code:Scope can be used as both an code:Execution 1230code:Scope and code:Memory code:Scope for barrier and atomic operations. 1231 1232Each invocation in a local workgroup must: be in the same 1233<<shaders-scope-command, command scope instance>>. 1234 1235Only 1236ifdef::VK_NV_mesh_shader,VK_EXT_mesh_shader[] 1237task, mesh, and 1238endif::VK_NV_mesh_shader,VK_EXT_mesh_shader[] 1239compute shaders have defined workgroups - other shader types cannot: use 1240workgroup functionality. 1241For shaders that have defined workgroups, this set of invocations forms an 1242_invocation group_ as defined in the <<spirv-spec,SPIR-V specification>>. 1243 1244 1245ifdef::VK_VERSION_1_1[] 1246[[shaders-scope-subgroup]] 1247=== Subgroup 1248 1249A _subgroup_ (see the subsection "`Control Flow`" of section 2 of the SPIR-V 12501.3 Revision 1 specification) is a set of invocations that can synchronize 1251and share data with each other efficiently. 1252 1253The code:Subgroup code:Scope can be used as both an code:Execution 1254code:Scope and code:Memory code:Scope for barrier and atomic operations. 1255Other <<VkSubgroupFeatureFlagBits, subgroup features>> allow the use of 1256<<shaders-group-operations, group operations>> with subgroup scope. 1257 1258ifdef::VK_KHR_shader_clock[] 1259If the <<features-shaderSubgroupClock, pname:shaderSubgroupClock>> feature 1260is enabled, using the code:Subgroup code:Scope with the code:OpReadClockKHR 1261instruction will read from a clock that is consistent across invocations in 1262the same subgroup. 
1263endif::VK_KHR_shader_clock[] 1264 1265For <<shaders-scope-workgroup, shaders that have defined workgroups>>, each 1266invocation in a subgroup must: be in the same <<shaders-scope-workgroup, 1267local workgroup>>. 1268 1269In other shader stages, each invocation in a subgroup must: be in the same 1270<<shaders-scope-device, device scope instance>>. 1271 1272Only <<limits-subgroup-supportedStages, shader stages that support subgroup 1273operations>> have defined subgroups. 1274endif::VK_VERSION_1_1[] 1275 1276 1277[[shaders-scope-quad]] 1278=== Quad 1279 1280A _quad scope instance_ is formed of four shader invocations. 1281 1282In a fragment shader, each invocation in a quad scope instance is formed of 1283invocations in neighboring framebuffer locations [eq]#(x~i~, y~i~)#, where: 1284 1285 * [eq]#i# is the index of the invocation within the scope instance. 1286 * [eq]#w# and [eq]#h# are the number of pixels the fragment covers in the 1287 [eq]#x# and [eq]#y# axes. 1288 * [eq]#w# and [eq]#h# are identical for all participating invocations. 1289 * [eq]#(x~0~) = (x~1~ - w) = (x~2~) = (x~3~ - w)# 1290 * [eq]#(y~0~) = (y~1~) = (y~2~ - h) = (y~3~ - h)# 1291 * Each invocation has the same layer and sample indices. 1292 1293ifdef::VK_NV_compute_shader_derivatives[] 1294In a compute shader, if the code:DerivativeGroupQuadsNV execution mode is 1295specified, each invocation in a quad scope instance is formed of invocations 1296with adjacent local invocation IDs [eq]#(x~i~, y~i~)#, where: 1297 1298 * [eq]#i# is the index of the invocation within the quad scope instance. 1299 * [eq]#(x~0~) = (x~1~ - 1) = (x~2~) = (x~3~ - 1)# 1300 * [eq]#(y~0~) = (y~1~) = (y~2~ - 1) = (y~3~ - 1)# 1301 * [eq]#x~0~# and [eq]#y~0~# are integer multiples of 2. 1302 * Each invocation has the same [eq]#z# coordinate. 
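
The relations listed above for code:DerivativeGroupQuadsNV can be inverted to
find, for a given local invocation ID, which quad it belongs to and its index
within that quad.
This is a host-side sketch for illustration only; `QuadPlacement` and
`quad_placement_2d` are hypothetical names, and the [eq]#z# coordinate is
omitted because it is the same for all four invocations.

```c
#include <stdint.h>

typedef struct QuadPlacement {
    uint32_t x0;    /* x coordinate of invocation 0 of the quad (even) */
    uint32_t y0;    /* y coordinate of invocation 0 of the quad (even) */
    uint32_t index; /* index i of this invocation within the quad, 0..3 */
} QuadPlacement;

/* Derives the quad origin and quad index from the x and y components of a
 * local invocation ID, following x0 = x1 - 1 = x2 = x3 - 1 and
 * y0 = y1 = y2 - 1 = y3 - 1 with x0 and y0 multiples of 2. */
QuadPlacement quad_placement_2d(uint32_t x, uint32_t y)
{
    QuadPlacement q;
    q.x0 = x & ~1u;                       /* round down to a multiple of 2 */
    q.y0 = y & ~1u;
    q.index = (x & 1u) | ((y & 1u) << 1); /* bit 0: column, bit 1: row */
    return q;
}
```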
1303 1304In a compute shader, if the code:DerivativeGroupLinearNV execution mode is 1305specified, each invocation in a quad scope instance is formed of invocations 1306with adjacent local invocation indices [eq]#(l~i~)#, where: 1307 1308 * [eq]#i# is the index of the invocation within the quad scope instance. 1309 * [eq]#(l~0~) = (l~1~ - 1) = (l~2~ - 2) = (l~3~ - 3)# 1310 * [eq]#l~0~# is an integer multiple of 4. 1311 1312endif::VK_NV_compute_shader_derivatives[] 1313 1314ifdef::VK_VERSION_1_1[] 1315In all shaders, each invocation in a quad scope instance is formed of 1316invocations in adjacent subgroup invocation indices [eq]#(s~i~)#, where: 1317 1318 * [eq]#i# is the index of the invocation within the quad scope instance. 1319 * [eq]#(s~0~) = (s~1~ - 1) = (s~2~ - 2) = (s~3~ - 3)# 1320 * [eq]#s~0~# is an integer multiple of 4. 1321 1322Each invocation in a quad scope instance must: be in the same 1323<<shaders-scope-subgroup, subgroup>>. 1324endif::VK_VERSION_1_1[] 1325 1326ifndef::VK_VERSION_1_1[] 1327The specific set of invocations that make up a quad scope instance in other 1328shader stages is undefined:. 1329endif::VK_VERSION_1_1[] 1330 1331In a fragment shader, each invocation in a quad scope instance must: be in 1332the same <<shaders-scope-primitive, primitive scope instance>>. 1333 1334ifndef::VK_VERSION_1_1[] 1335For <<shaders-scope-workgroup, shaders that have defined workgroups>>, each 1336invocation in a quad scope instance must: be in the same 1337<<shaders-scope-workgroup, local workgroup>>. 1338 1339In other shader stages, each invocation in a quad scope instance must: be in 1340the same <<shaders-scope-device, device scope instance>>. 1341endif::VK_VERSION_1_1[] 1342 1343Fragment 1344ifdef::VK_NV_compute_shader_derivatives,VK_VERSION_1_1[] 1345and compute 1346endif::VK_NV_compute_shader_derivatives,VK_VERSION_1_1[] 1347shaders have defined quad scope instances. 
1348ifdef::VK_VERSION_1_1[] 1349If the <<limits-subgroup-quadOperationsInAllStages, 1350pname:quadOperationsInAllStages>> limit is supported, any 1351<<limits-subgroup-supportedStages, shader stages that support subgroup 1352operations>> also have defined quad scope instances. 1353endif::VK_VERSION_1_1[] 1354 1355 1356ifdef::VK_EXT_fragment_shader_interlock[] 1357[[shaders-scope-fragment-interlock]] 1358=== Fragment Interlock 1359 1360A _fragment interlock scope instance_ is formed of fragment shader 1361invocations based on their framebuffer locations [eq]#(x,y,layer,sample)#, 1362executed by commands inside a single <<renderpass,subpass>>. 1363 1364The specific set of invocations included varies based on the execution mode 1365as follows: 1366 1367 * If the code:SampleInterlockOrderedEXT or 1368 code:SampleInterlockUnorderedEXT execution modes are used, only 1369 invocations with identical framebuffer locations 1370 [eq]#(x,y,layer,sample)# are included. 1371 * If the code:PixelInterlockOrderedEXT or code:PixelInterlockUnorderedEXT 1372 execution modes are used, fragments with different sample ids are also 1373 included. 1374ifdef::VK_NV_shading_rate_image,VK_KHR_fragment_shading_rate[] 1375 * If the code:ShadingRateInterlockOrderedEXT or 1376 code:ShadingRateInterlockUnorderedEXT execution modes are used, 1377 fragments from neighbouring framebuffer locations are also included. 1378 The 1379ifdef::VK_NV_shading_rate_image[<<primsrast-shading-rate-image, shading rate image>>] 1380ifdef::VK_KHR_fragment_shading_rate+VK_NV_shading_rate_image[or] 1381ifdef::VK_KHR_fragment_shading_rate[<<primsrast-fragment-shading-rate, fragment shading rate>>] 1382 determines these fragments. 1383endif::VK_NV_shading_rate_image,VK_KHR_fragment_shading_rate[] 1384 1385Only fragment shaders with one of the above execution modes have defined 1386fragment interlock scope instances. 
1387 1388There is no specific code:Scope value for communication across invocations 1389in a fragment interlock scope instance. 1390However, this is implicitly used as a memory scope by 1391code:OpBeginInvocationInterlockEXT and code:OpEndInvocationInterlockEXT. 1392 1393Each invocation in a fragment interlock scope instance must: be in the same 1394<<shaders-scope-queue-family, queue family scope instance>>. 1395endif::VK_EXT_fragment_shader_interlock[] 1396 1397 1398[[shaders-scope-invocation]] 1399=== Invocation 1400 1401The smallest _scope_ is a single invocation; this is represented by the 1402code:Invocation code:Scope in SPIR-V. 1403 1404Fragment shader invocations must: be in a <<shaders-scope-primitive, 1405primitive scope instance>>. 1406 1407ifdef::VK_EXT_fragment_shader_interlock[] 1408Invocations in <<shaders-scope-fragment-interlock, fragment shaders that 1409have a defined fragment interlock scope>> must: be in a 1410<<shaders-scope-fragment-interlock, fragment interlock scope instance>>. 1411endif::VK_EXT_fragment_shader_interlock[] 1412 1413Invocations in <<shaders-scope-workgroup, shaders that have defined 1414workgroups>> must: be in a <<shaders-scope-workgroup, local workgroup>>. 1415 1416ifdef::VK_VERSION_1_1[] 1417Invocations in <<shaders-scope-subgroup, shaders that have a defined 1418subgroup scope>> must: be in a <<shaders-scope-subgroup, subgroup>>. 1419endif::VK_VERSION_1_1[] 1420 1421Invocations in <<shaders-scope-quad, shaders that have a defined quad 1422scope>> must: be in a <<shaders-scope-quad, quad scope instance>>. 1423 1424All invocations in all stages must: be in a <<shaders-scope-command,command 1425scope instance>>. 1426 1427 1428ifdef::VK_VERSION_1_1[] 1429[[shaders-group-operations]] 1430== Group Operations 1431 1432_Group operations_ are executed by multiple invocations within a 1433<<shaders-scope, scope instance>>; with each invocation involved in 1434calculating the result. 
1435This provides a mechanism for efficient communication between invocations in 1436a particular scope instance. 1437 1438Group operations all take a code:Scope defining the desired 1439<<shaders-scope,scope instance>> to operate within. 1440Only the code:Subgroup scope can: be used for these operations; the 1441<<limits-subgroupSupportedOperations, pname:subgroupSupportedOperations>> 1442limit defines which types of operation can: be used. 1443 1444 1445[[shaders-group-operations-basic]] 1446=== Basic Group Operations 1447 1448Basic group operations include the use of code:OpGroupNonUniformElect, 1449code:OpControlBarrier, code:OpMemoryBarrier, and atomic operations. 1450 1451code:OpGroupNonUniformElect can: be used to choose a single invocation to 1452perform a task for the whole group. 1453Only the invocation with the lowest id in the group will return code:true. 1454 1455The <<memory-model,Memory Model>> appendix defines the operation of barriers 1456and atomics. 1457 1458 1459[[shaders-group-operations-vote]] 1460=== Vote Group Operations 1461 1462The vote group operations allow invocations within a group to compare values 1463across a group. 1464The types of votes enabled are: 1465 1466 * Do all active group invocations agree that an expression is true? 1467 * Do any active group invocations evaluate an expression to true? 1468 * Do all active group invocations have the same value of an expression? 1469 1470[NOTE] 1471.Note 1472==== 1473These operations are useful in combination with control flow in that they 1474allow for developers to check whether conditions match across the group and 1475choose potentially faster code-paths in these cases. 1476==== 1477 1478 1479[[shaders-group-operations-arithmetic]] 1480=== Arithmetic Group Operations 1481 1482The arithmetic group operations allow invocations to perform scans and 1483reductions across a group. 1484The operators supported are add, mul, min, max, and, or, xor. 
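
As a rough host-side model of what a reduction and the two kinds of scan
return to each invocation, consider the add operator applied to one value per
invocation, stored in an array indexed by invocation index.
The function names below are hypothetical, and the sequential loops are only a
model; integer addition is used so that the order of application does not
affect the result.

```c
#include <stddef.h>
#include <stdint.h>

/* Reduction: every invocation receives the sum of all values. */
void group_add_reduce(const uint32_t *values, uint32_t *results, size_t n)
{
    uint32_t total = 0;
    for (size_t i = 0; i < n; ++i) {
        total += values[i];
    }
    for (size_t i = 0; i < n; ++i) {
        results[i] = total;
    }
}

/* Exclusive scan: invocation i receives the sum of the values held by
 * invocations with a lower index; invocation 0 receives the identity (0). */
void group_add_exclusive_scan(const uint32_t *values, uint32_t *results,
                              size_t n)
{
    uint32_t running = 0;
    for (size_t i = 0; i < n; ++i) {
        results[i] = running;
        running += values[i];
    }
}

/* Inclusive scan: as above, but the invocation's own value is included. */
void group_add_inclusive_scan(const uint32_t *values, uint32_t *results,
                              size_t n)
{
    uint32_t running = 0;
    for (size_t i = 0; i < n; ++i) {
        running += values[i];
        results[i] = running;
    }
}
```

For the inputs 1, 2, 3, 4 this model yields 10 for every invocation in the
reduction, 0, 1, 3, 6 for the exclusive scan, and 1, 3, 6, 10 for the
inclusive scan.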
1485 1486For reductions, every invocation in a group will obtain the cumulative 1487result of these operators applied to all values in the group. 1488For exclusive scans, each invocation in a group will obtain the cumulative 1489result of these operators applied to all values in invocations with a lower 1490index in the group. 1491Inclusive scans are identical to exclusive scans, except the cumulative 1492result includes the operator applied to the value in the current invocation. 1493 1494The order in which these operators are applied is implementation-dependent. 1495 1496 1497[[shaders-group-operations-ballot]] 1498=== Ballot Group Operations 1499 1500The ballot group operations allow invocations to perform more complex votes 1501across the group. 1502The ballot functionality allows all invocations within a group to provide a 1503boolean value and get as a result what each invocation provided as their 1504boolean value. 1505The broadcast functionality allows values to be broadcast from an invocation 1506to all other invocations within the group. 1507 1508 1509[[shaders-group-operations-shuffle]] 1510=== Shuffle Group Operations 1511 1512The shuffle group operations allow invocations to read values from other 1513invocations within a group. 1514 1515 1516[[shaders-group-operations-shuffle-relative]] 1517=== Shuffle Relative Group Operations 1518 1519The shuffle relative group operations allow invocations to read values from 1520other invocations within the group relative to the current invocation in the 1521group. 1522The relative operations supported allow data to be shifted up and down 1523through the invocations within a group. 1524 1525 1526[[shaders-group-operations-clustered]] 1527=== Clustered Group Operations 1528 1529The clustered group operations allow invocations to perform an operation 1530among partitions of a group, such that the operation is only performed 1531within the group invocations within a partition. 
The partitions for clustered group operations are consecutive power-of-two
size groups of invocations and the cluster size must: be known at pipeline
creation time.
The operations supported are add, mul, min, max, and, or, xor.


[[shaders-quad-operations]]
== Quad Group Operations

Quad group operations (code:OpGroupNonUniformQuad*) are a specialized type
of <<shaders-group-operations, group operations>> that only operate on
<<shaders-scope-quad, quad scope instances>>.
Whilst these instructions do include a code:Scope parameter, this scope is
always overridden; only the <<shaders-scope-quad, quad scope instance>> is
included in its execution scope.

Fragment shaders that statically execute quad group operations must: launch
sufficient invocations to ensure their correct operation; additional
<<shaders-helper-invocations, helper invocations>> are launched for
framebuffer locations not covered by rasterized fragments if necessary.

The index used to select participating invocations is [eq]#i#, as described
for a <<shaders-scope-quad, quad scope instance>>, defined as the _quad
index_ in the <<spirv-spec,SPIR-V specification>>.

For code:OpGroupNonUniformQuadBroadcast this value is equal to code:Index.
For code:OpGroupNonUniformQuadSwap, it is equal to the implicit code:Index
used by each participating invocation.
endif::VK_VERSION_1_1[]


[[shaders-derivative-operations]]
== Derivative Operations

Derivative operations calculate the partial derivative for an expression
[eq]#P# as a function of an invocation's [eq]#x# and [eq]#y# coordinates.

Derivative operations operate on a set of invocations known as a _derivative
group_ as defined in the <<spirv-spec,SPIR-V specification>>.
A derivative group is equivalent to
ifdef::VK_NV_compute_shader_derivatives[]
the <<shaders-scope-quad, quad scope instance>> for a compute shader
invocation, or
endif::VK_NV_compute_shader_derivatives[]
the <<shaders-scope-primitive, primitive scope instance>> for a fragment
shader invocation.

Derivatives are calculated assuming that [eq]#P# is piecewise linear and
continuous within the derivative group.
All dynamic instances of explicit derivative instructions (code:OpDPdx*,
code:OpDPdy*, and code:OpFwidth*) must: be executed in control flow that is
uniform within a derivative group.
For other derivative operations, results are undefined: if a dynamic
instance is executed in control flow that is not uniform within the
derivative group.

Fragment shaders that statically execute derivative operations must: launch
sufficient invocations to ensure their correct operation; additional
<<shaders-helper-invocations, helper invocations>> are launched for
framebuffer locations not covered by rasterized fragments if necessary.

ifdef::VK_NV_compute_shader_derivatives[]
[NOTE]
.Note
====
In a compute shader, it is the application's responsibility to ensure that
sufficient invocations are launched.
====
endif::VK_NV_compute_shader_derivatives[]

Derivative operations calculate their results as the difference between the
result of [eq]#P# across invocations in the quad.
For fine derivative operations (code:OpDPdxFine and code:OpDPdyFine), the
values of [eq]#DPdx(P~i~)# are calculated as

  {empty}:: [eq]#DPdx(P~0~) = DPdx(P~1~) = P~1~ - P~0~#
  {empty}:: [eq]#DPdx(P~2~) = DPdx(P~3~) = P~3~ - P~2~#

and the values of [eq]#DPdy(P~i~)# are calculated as

  {empty}:: [eq]#DPdy(P~0~) = DPdy(P~2~) = P~2~ - P~0~#
  {empty}:: [eq]#DPdy(P~1~) = DPdy(P~3~) = P~3~ - P~1~#

where [eq]#i# is the index of each invocation as described in
<<shaders-scope-quad>>.

Coarse derivative operations (code:OpDPdxCoarse and code:OpDPdyCoarse),
calculate their results in roughly the same manner, but may: only calculate
two values instead of four (one for each of [eq]#DPdx# and [eq]#DPdy#),
reusing the same result no matter the originating invocation.
If an implementation does this, it should: use the fine derivative
calculations described for [eq]#P~0~#.

[NOTE]
.Note
====
Derivative values are calculated between fragments rather than pixels.
If the fragment shader invocations involved in the calculation cover
multiple pixels, these operations cover a wider area, resulting in larger
derivative values.
This in turn will result in a coarser level of detail being selected for
image sampling operations using derivatives.

Applications may want to account for this when using multi-pixel fragments;
if pixel derivatives are desired, applications should use explicit
derivative operations and divide the results by the size of the fragment in
each dimension as follows:

  {empty}:: [eq]#DPdx(P~n~)' = DPdx(P~n~) / w#
  {empty}:: [eq]#DPdy(P~n~)' = DPdy(P~n~) / h#

where [eq]#w# and [eq]#h# are the size of the fragments in the quad, and
[eq]#DPdx(P~n~)'# and [eq]#DPdy(P~n~)'# are the pixel derivatives.
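
As a rough sketch only, the fine derivative formulas and the scaling above
can be written in C as follows.
The type and function names are hypothetical, and `P[i]` stands for the
value of [eq]#P# in quad invocation [eq]#i#.

[source,c]
~~~~
/* Hypothetical container for the per-invocation results of one quad. */
typedef struct {
    float dpdx[4];
    float dpdy[4];
} quad_derivatives;

/* Fine derivatives: each row and each column of the quad has its own value. */
static quad_derivatives fine_derivatives(const float P[4])
{
    quad_derivatives d;
    d.dpdx[0] = d.dpdx[1] = P[1] - P[0];
    d.dpdx[2] = d.dpdx[3] = P[3] - P[2];
    d.dpdy[0] = d.dpdy[2] = P[2] - P[0];
    d.dpdy[1] = d.dpdy[3] = P[3] - P[1];
    return d;
}

/* Pixel derivatives: divide by the fragment size (w by h pixels). */
static quad_derivatives pixel_derivatives(const float P[4], float w, float h)
{
    quad_derivatives d = fine_derivatives(P);
    for (int i = 0; i < 4; ++i) {
        d.dpdx[i] /= w;
        d.dpdy[i] /= h;
    }
    return d;
}
~~~~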
====

The results for code:OpDPdx and code:OpDPdy may: be calculated as either
fine or coarse derivatives, with implementations favouring the most
efficient approach.
Implementations must: choose coarse or fine consistently between the two.

Executing code:OpFwidthFine, code:OpFwidthCoarse, or code:OpFwidth is
equivalent to executing the corresponding code:OpDPdx* and code:OpDPdy*
instructions, taking the absolute value of the results, and summing them.

Executing an code:OpImage*Sample*ImplicitLod instruction is equivalent to
executing code:OpDPdx(code:Coordinate) and code:OpDPdy(code:Coordinate), and
passing the results as the code:Grad operands code:dx and code:dy.

[NOTE]
.Note
====
It is expected that using the code:ImplicitLod variants of sampling
functions will be substantially more efficient than using the
code:ExplicitLod variants with explicitly generated derivatives.
====


[[shaders-helper-invocations]]
== Helper Invocations

When performing <<shaders-derivative-operations, derivative>>
ifdef::VK_VERSION_1_1[]
or <<shaders-quad-operations, quad group>>
endif::VK_VERSION_1_1[]
operations in a fragment shader, additional invocations may: be spawned in
order to ensure correct results.
These additional invocations are known as _helper invocations_ and can: be
identified by a non-zero value in the code:HelperInvocation built-in.
Stores and atomics performed by helper invocations must: not have any effect
on memory except for the code:Function, code:Private and code:Output storage
classes, and values returned by atomic instructions in helper invocations
are undefined:.

[NOTE]
.Note
====
While storage to code:Output storage class has an effect even in helper
invocations, it does not mean that helper invocations have an effect on the
framebuffer.
code:Output variables in fragment shaders can be read from as well, and they
behave more like code:Private variables for the duration of the shader
invocation.
====

For <<shaders-group-operations, group operations>> other than
<<shaders-derivative-operations, derivative>>
ifdef::VK_VERSION_1_1[]
and <<shaders-quad-operations, quad group>>
endif::VK_VERSION_1_1[]
operations, helper invocations may: be treated as inactive even if they
would be considered otherwise active.

ifdef::VK_VERSION_1_3,VK_EXT_shader_demote_to_helper_invocation[]
Helper invocations may: become permanently inactive if all invocations in a
quad scope instance become helper invocations.
endif::VK_VERSION_1_3,VK_EXT_shader_demote_to_helper_invocation[]


ifdef::VK_NV_cooperative_matrix[]
== Cooperative Matrices

A _cooperative matrix_ type is a SPIR-V type where the storage for and
computations performed on the matrix are spread across the invocations in a
scope instance.
These types give the implementation freedom in how to optimize matrix
multiplies.

SPIR-V defines the types and instructions, but does not specify rules about
what sizes/combinations are valid, and it is expected that different
implementations may: support different sizes.

[open,refpage='vkGetPhysicalDeviceCooperativeMatrixPropertiesNV',desc='Returns properties describing what cooperative matrix types are supported',type='protos']
--
To enumerate the supported cooperative matrix types and operations, call:

include::{generated}/api/protos/vkGetPhysicalDeviceCooperativeMatrixPropertiesNV.adoc[]

  * pname:physicalDevice is the physical device.
  * pname:pPropertyCount is a pointer to an integer related to the number of
    cooperative matrix properties available or queried.
  * pname:pProperties is either `NULL` or a pointer to an array of
    slink:VkCooperativeMatrixPropertiesNV structures.

If pname:pProperties is `NULL`, then the number of cooperative matrix
properties available is returned in pname:pPropertyCount.
Otherwise, pname:pPropertyCount must: point to a variable set by the user to
the number of elements in the pname:pProperties array, and on return the
variable is overwritten with the number of structures actually written to
pname:pProperties.
If pname:pPropertyCount is less than the number of cooperative matrix
properties available, at most pname:pPropertyCount structures will be
written, and ename:VK_INCOMPLETE will be returned instead of
ename:VK_SUCCESS, to indicate that not all the available cooperative matrix
properties were returned.

include::{generated}/validity/protos/vkGetPhysicalDeviceCooperativeMatrixPropertiesNV.adoc[]
--

[open,refpage='VkCooperativeMatrixPropertiesNV',desc='Structure specifying cooperative matrix properties',type='structs']
--
Each sname:VkCooperativeMatrixPropertiesNV structure describes a single
supported combination of types for a matrix multiply/add operation
(code:OpCooperativeMatrixMulAddNV).
The multiply can: be described in terms of the following variables and types
(in SPIR-V pseudocode):

[source,c]
~~~~
    %A is of type OpTypeCooperativeMatrixNV %AType %scope %MSize %KSize
    %B is of type OpTypeCooperativeMatrixNV %BType %scope %KSize %NSize
    %C is of type OpTypeCooperativeMatrixNV %CType %scope %MSize %NSize
    %D is of type OpTypeCooperativeMatrixNV %DType %scope %MSize %NSize

    %D = %A * %B + %C // using OpCooperativeMatrixMulAddNV
~~~~

A matrix multiply with these dimensions is known as an _MxNxK_ matrix
multiply.
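
As an informal illustration (not part of the API), the arithmetic of an
_MxNxK_ multiply/add can be sketched as a scalar reference in C.
The function name `ref_matrix_mul_add`, the row-major layout, and the use of
`float` for all four component types are assumptions of this sketch; an
actual cooperative matrix spreads storage and computation across the
invocations of the scope instance, which is not modeled here.

[source,c]
~~~~
#include <stddef.h>

/* Scalar reference for D = A * B + C (hypothetical helper, not a Vulkan API).
 * A is M x K, B is K x N, C and D are M x N; all row-major (an assumption). */
static void ref_matrix_mul_add(size_t M, size_t N, size_t K,
                               const float *A, const float *B,
                               const float *C, float *D)
{
    for (size_t m = 0; m < M; ++m) {
        for (size_t n = 0; n < N; ++n) {
            float acc = C[m * N + n];          /* start from the C element */
            for (size_t k = 0; k < K; ++k) {
                acc += A[m * K + k] * B[k * N + n];
            }
            D[m * N + n] = acc;
        }
    }
}
~~~~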

The sname:VkCooperativeMatrixPropertiesNV structure is defined as:

include::{generated}/api/structs/VkCooperativeMatrixPropertiesNV.adoc[]

  * pname:sType is the type of this structure.
  * pname:pNext is `NULL` or a pointer to a structure extending this
    structure.
  * pname:MSize is the number of rows in matrices A, C, and D.
  * pname:KSize is the number of columns in matrix A and rows in matrix B.
  * pname:NSize is the number of columns in matrices B, C, D.
  * pname:AType is the component type of matrix A, of type
    elink:VkComponentTypeNV.
  * pname:BType is the component type of matrix B, of type
    elink:VkComponentTypeNV.
  * pname:CType is the component type of matrix C, of type
    elink:VkComponentTypeNV.
  * pname:DType is the component type of matrix D, of type
    elink:VkComponentTypeNV.
  * pname:scope is the scope of all the matrix types, of type
    elink:VkScopeNV.

If some types are preferred over other types (e.g. for performance), they
should: appear earlier in the list enumerated by
flink:vkGetPhysicalDeviceCooperativeMatrixPropertiesNV.

At least one entry in the list must: have power of two values for all of
pname:MSize, pname:KSize, and pname:NSize.

include::{generated}/validity/structs/VkCooperativeMatrixPropertiesNV.adoc[]
--

[open,refpage='VkScopeNV',desc='Specify SPIR-V scope',type='enums']
--
Possible values for elink:VkScopeNV include:

include::{generated}/api/enums/VkScopeNV.adoc[]

  * ename:VK_SCOPE_DEVICE_NV corresponds to SPIR-V code:Device scope.
  * ename:VK_SCOPE_WORKGROUP_NV corresponds to SPIR-V code:Workgroup scope.
  * ename:VK_SCOPE_SUBGROUP_NV corresponds to SPIR-V code:Subgroup scope.
  * ename:VK_SCOPE_QUEUE_FAMILY_NV corresponds to SPIR-V code:QueueFamily
    scope.

All enum values match the corresponding SPIR-V value.
--

[open,refpage='VkComponentTypeNV',desc='Specify SPIR-V cooperative matrix component type',type='enums']
--
Possible values for elink:VkComponentTypeNV include:

include::{generated}/api/enums/VkComponentTypeNV.adoc[]

  * ename:VK_COMPONENT_TYPE_FLOAT16_NV corresponds to SPIR-V
    code:OpTypeFloat 16.
  * ename:VK_COMPONENT_TYPE_FLOAT32_NV corresponds to SPIR-V
    code:OpTypeFloat 32.
  * ename:VK_COMPONENT_TYPE_FLOAT64_NV corresponds to SPIR-V
    code:OpTypeFloat 64.
  * ename:VK_COMPONENT_TYPE_SINT8_NV corresponds to SPIR-V code:OpTypeInt 8 1.
  * ename:VK_COMPONENT_TYPE_SINT16_NV corresponds to SPIR-V code:OpTypeInt
    16 1.
  * ename:VK_COMPONENT_TYPE_SINT32_NV corresponds to SPIR-V code:OpTypeInt
    32 1.
  * ename:VK_COMPONENT_TYPE_SINT64_NV corresponds to SPIR-V code:OpTypeInt
    64 1.
  * ename:VK_COMPONENT_TYPE_UINT8_NV corresponds to SPIR-V code:OpTypeInt 8 0.
  * ename:VK_COMPONENT_TYPE_UINT16_NV corresponds to SPIR-V code:OpTypeInt
    16 0.
  * ename:VK_COMPONENT_TYPE_UINT32_NV corresponds to SPIR-V code:OpTypeInt
    32 0.
  * ename:VK_COMPONENT_TYPE_UINT64_NV corresponds to SPIR-V code:OpTypeInt
    64 0.
--
endif::VK_NV_cooperative_matrix[]


ifdef::VK_EXT_validation_cache[]
[[shaders-validation-cache]]
== Validation Cache

[open,refpage='VkValidationCacheEXT',desc='Opaque handle to a validation cache object',type='handles']
--
Validation cache objects allow the result of internal validation to be
reused, both within a single application run and between multiple runs.
Reuse within a single run is achieved by passing the same validation cache
object when creating supported Vulkan objects.
Reuse across runs of an application is achieved by retrieving validation
cache contents in one run of an application, saving the contents, and using
them to preinitialize a validation cache on a subsequent run.
The contents of the validation cache objects are managed by the validation
layers.
Applications can: manage the host memory consumed by a validation cache
object and control the amount of data retrieved from a validation cache
object.

Validation cache objects are represented by sname:VkValidationCacheEXT
handles:

include::{generated}/api/handles/VkValidationCacheEXT.adoc[]
--

[open,refpage='vkCreateValidationCacheEXT',desc='Creates a new validation cache',type='protos']
--
To create validation cache objects, call:

include::{generated}/api/protos/vkCreateValidationCacheEXT.adoc[]

  * pname:device is the logical device that creates the validation cache
    object.
  * pname:pCreateInfo is a pointer to a slink:VkValidationCacheCreateInfoEXT
    structure containing the initial parameters for the validation cache
    object.
  * pname:pAllocator controls host memory allocation as described in the
    <<memory-allocation, Memory Allocation>> chapter.
  * pname:pValidationCache is a pointer to a slink:VkValidationCacheEXT
    handle in which the resulting validation cache object is returned.

[NOTE]
.Note
====
Applications can: track and manage the total host memory size of a
validation cache object using the pname:pAllocator.
Applications can: limit the amount of data retrieved from a validation cache
object in fname:vkGetValidationCacheDataEXT.
Implementations should: not internally limit the total number of entries
added to a validation cache object or the total host memory consumed.
====

Once created, a validation cache can: be passed to the
fname:vkCreateShaderModule command by adding this object to the
slink:VkShaderModuleCreateInfo structure's pname:pNext chain.
If a slink:VkShaderModuleValidationCacheCreateInfoEXT object is included in
the slink:VkShaderModuleCreateInfo::pname:pNext chain, and its
pname:validationCache field is not dlink:VK_NULL_HANDLE, the implementation
will query it for possible reuse opportunities and update it with new
content.
The use of the validation cache object in these commands is internally
synchronized, and the same validation cache object can: be used in multiple
threads simultaneously.

[NOTE]
.Note
====
Implementations should: make every effort to limit any critical sections to
the actual accesses to the cache, which is expected to be significantly
shorter than the duration of the fname:vkCreateShaderModule command.
====

include::{generated}/validity/protos/vkCreateValidationCacheEXT.adoc[]
--

[open,refpage='VkValidationCacheCreateInfoEXT',desc='Structure specifying parameters of a newly created validation cache',type='structs']
--
The sname:VkValidationCacheCreateInfoEXT structure is defined as:

include::{generated}/api/structs/VkValidationCacheCreateInfoEXT.adoc[]

  * pname:sType is the type of this structure.
  * pname:pNext is `NULL` or a pointer to a structure extending this
    structure.
  * pname:flags is reserved for future use.
  * pname:initialDataSize is the number of bytes in pname:pInitialData.
    If pname:initialDataSize is zero, the validation cache will initially be
    empty.
  * pname:pInitialData is a pointer to previously retrieved validation cache
    data.
    If the validation cache data is incompatible (as defined below) with the
    device, the validation cache will be initially empty.
    If pname:initialDataSize is zero, pname:pInitialData is ignored.

.Valid Usage
****
  * [[VUID-VkValidationCacheCreateInfoEXT-initialDataSize-01534]]
    If pname:initialDataSize is not `0`, it must: be equal to the size of
    pname:pInitialData, as returned by fname:vkGetValidationCacheDataEXT
    when pname:pInitialData was originally retrieved
  * [[VUID-VkValidationCacheCreateInfoEXT-initialDataSize-01535]]
    If pname:initialDataSize is not `0`, pname:pInitialData must: have been
    retrieved from a previous call to fname:vkGetValidationCacheDataEXT
****

include::{generated}/validity/structs/VkValidationCacheCreateInfoEXT.adoc[]
--

[open,refpage='VkValidationCacheCreateFlagsEXT',desc='Reserved for future use',type='flags']
--
include::{generated}/api/flags/VkValidationCacheCreateFlagsEXT.adoc[]

tname:VkValidationCacheCreateFlagsEXT is a bitmask type for setting a mask,
but is currently reserved for future use.
--

[open,refpage='vkMergeValidationCachesEXT',desc='Combine the data stores of validation caches',type='protos']
--
Validation cache objects can: be merged using the command:

include::{generated}/api/protos/vkMergeValidationCachesEXT.adoc[]

  * pname:device is the logical device that owns the validation cache
    objects.
  * pname:dstCache is the handle of the validation cache to merge results
    into.
  * pname:srcCacheCount is the length of the pname:pSrcCaches array.
  * pname:pSrcCaches is a pointer to an array of validation cache handles,
    which will be merged into pname:dstCache.
    The previous contents of pname:dstCache are included after the merge.

[NOTE]
.Note
====
The details of the merge operation are implementation-dependent, but
implementations should: merge the contents of the specified validation
caches and prune duplicate entries.
====

.Valid Usage
****
  * [[VUID-vkMergeValidationCachesEXT-dstCache-01536]]
    pname:dstCache must: not appear in the list of source caches
****

include::{generated}/validity/protos/vkMergeValidationCachesEXT.adoc[]
--

[open,refpage='vkGetValidationCacheDataEXT',desc='Get the data store from a validation cache',type='protos']
--
Data can: be retrieved from a validation cache object using the command:

include::{generated}/api/protos/vkGetValidationCacheDataEXT.adoc[]

  * pname:device is the logical device that owns the validation cache.
  * pname:validationCache is the validation cache to retrieve data from.
  * pname:pDataSize is a pointer to a value related to the amount of data in
    the validation cache, as described below.
  * pname:pData is either `NULL` or a pointer to a buffer.

If pname:pData is `NULL`, then the maximum size of the data that can: be
retrieved from the validation cache, in bytes, is returned in
pname:pDataSize.
Otherwise, pname:pDataSize must: point to a variable set by the user to the
size of the buffer, in bytes, pointed to by pname:pData, and on return the
variable is overwritten with the amount of data actually written to
pname:pData.
If pname:pDataSize is less than the maximum size that can: be retrieved by
the validation cache, at most pname:pDataSize bytes will be written to
pname:pData, and fname:vkGetValidationCacheDataEXT will return
ename:VK_INCOMPLETE instead of ename:VK_SUCCESS, to indicate that not all of
the validation cache was returned.

Any data written to pname:pData is valid and can: be provided as the
pname:pInitialData member of the slink:VkValidationCacheCreateInfoEXT
structure passed to fname:vkCreateValidationCacheEXT.
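
The size negotiation described above follows a two-call pattern: query the
size, allocate, then retrieve.
The following C sketch models that pattern without the Vulkan headers.
`query_result`, `get_data_fn`, `mock_cache`, `mock_get`, and `fetch_all` are
all hypothetical stand-ins, and `mock_get` only imitates the behavior
described for fname:vkGetValidationCacheDataEXT using an in-memory byte
array.

[source,c]
~~~~
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins for the success and incomplete result codes. */
typedef enum { QUERY_SUCCESS, QUERY_INCOMPLETE } query_result;

/* Same shape as the query described above: with data == NULL, *size
 * receives the maximum retrievable size; otherwise at most *size bytes are
 * written and *size receives the number of bytes written. */
typedef query_result (*get_data_fn)(void *ctx, size_t *size, void *data);

/* In-memory mock of a cache, used only to exercise the pattern. */
typedef struct { const unsigned char *bytes; size_t len; } mock_cache;

static query_result mock_get(void *ctx, size_t *size, void *data)
{
    const mock_cache *c = ctx;
    if (data == NULL) {
        *size = c->len;
        return QUERY_SUCCESS;
    }
    size_t n = *size < c->len ? *size : c->len;
    memcpy(data, c->bytes, n);
    *size = n;
    return n < c->len ? QUERY_INCOMPLETE : QUERY_SUCCESS;
}

/* Two-call retrieval; retries a few times if the data grew between calls.
 * Returns a malloc'ed buffer owned by the caller, or NULL on failure. */
static void *fetch_all(get_data_fn get, void *ctx, size_t *out_size)
{
    *out_size = 0;
    for (int attempt = 0; attempt < 4; ++attempt) {
        size_t size = 0;
        if (get(ctx, &size, NULL) != QUERY_SUCCESS || size == 0) {
            return NULL;
        }
        void *buf = malloc(size);
        if (buf == NULL) {
            return NULL;
        }
        query_result r = get(ctx, &size, buf);
        if (r == QUERY_SUCCESS) {
            *out_size = size;
            return buf;
        }
        free(buf);
        if (r != QUERY_INCOMPLETE) {
            return NULL;
        }
    }
    return NULL;
}
~~~~

The retry loop is a design choice of this sketch: because another thread may
add entries between the size query and the retrieval, an incomplete result
is treated as a reason to query the size again rather than as an error.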

Two calls to fname:vkGetValidationCacheDataEXT with the same parameters
must: retrieve the same data unless a command that modifies the contents of
the cache is called between them.

[[validation-cache-header]]
Applications can: store the data retrieved from the validation cache, and
use these data, possibly in a future run of the application, to populate new
validation cache objects.
The results of validation, however, may: depend on the vendor ID, device ID,
driver version, and other details of the device.
To enable applications to detect when previously retrieved data is
incompatible with the device, the initial bytes written to pname:pData must:
be a header consisting of the following members:

.Layout for validation cache header version ename:VK_VALIDATION_CACHE_HEADER_VERSION_ONE_EXT
[width="85%",cols="8%,21%,71%",options="header"]
|====
| Offset | Size | Meaning
| 0      | 4    | length in bytes of the entire validation cache header
                  written as a stream of bytes, with the least
                  significant byte first
| 4      | 4    | a elink:VkValidationCacheHeaderVersionEXT value
                  written as a stream of bytes, with the least
                  significant byte first
| 8      | ename:VK_UUID_SIZE | a layer commit ID expressed as a UUID, which uniquely
                  identifies the version of the validation layers used
                  to generate these validation results
|====

The first four bytes encode the length of the entire validation cache
header, in bytes.
This value includes all fields in the header including the validation cache
version field and the size of the length field.

The next four bytes encode the validation cache version, as described for
elink:VkValidationCacheHeaderVersionEXT.
A consumer of the validation cache should: use the cache version to
interpret the remainder of the cache header.
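
The header layout above can be decoded portably by reading the two 32-bit
fields byte by byte, least significant byte first.
The following C sketch shows one way to do so; `cache_header`, `read_le32`,
and `parse_cache_header` are hypothetical names, `CACHE_UUID_SIZE` is
assumed to equal ename:VK_UUID_SIZE (16), and the consistency checks on the
length field are a choice of this sketch rather than a requirement stated
here.

[source,c]
~~~~
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define CACHE_UUID_SIZE 16 /* assumed equal to VK_UUID_SIZE */

/* Hypothetical decoded form of the version-one header. */
typedef struct {
    uint32_t header_length;               /* bytes 0..3, little-endian */
    uint32_t header_version;              /* bytes 4..7, little-endian */
    uint8_t  layer_uuid[CACHE_UUID_SIZE]; /* bytes 8..23 */
} cache_header;

/* Read a 32-bit value stored least significant byte first. */
static uint32_t read_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Returns 1 on success, 0 if the data is too short to hold the header or
 * the stored header length is inconsistent with the data size. */
static int parse_cache_header(const uint8_t *data, size_t size,
                              cache_header *out)
{
    const size_t min_size = 8 + CACHE_UUID_SIZE;
    if (data == NULL || out == NULL || size < min_size) {
        return 0;
    }
    out->header_length = read_le32(data);
    out->header_version = read_le32(data + 4);
    if (out->header_length < min_size || out->header_length > size) {
        return 0;
    }
    memcpy(out->layer_uuid, data + 8, CACHE_UUID_SIZE);
    return 1;
}
~~~~

An application could compare `header_version` and `layer_uuid` against the
values it expects before passing stored data as pname:pInitialData, and
start with an empty cache when they differ.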

If pname:pDataSize is less than what is necessary to store this header,
nothing will be written to pname:pData and zero will be written to
pname:pDataSize.

include::{generated}/validity/protos/vkGetValidationCacheDataEXT.adoc[]
--

[open,refpage='VkValidationCacheHeaderVersionEXT',desc='Encode validation cache version',type='enums',xrefs='vkCreateValidationCacheEXT vkGetValidationCacheDataEXT']
--
Possible values of the second group of four bytes in the header returned by
flink:vkGetValidationCacheDataEXT, encoding the validation cache version,
are:

include::{generated}/api/enums/VkValidationCacheHeaderVersionEXT.adoc[]

  * ename:VK_VALIDATION_CACHE_HEADER_VERSION_ONE_EXT specifies version one
    of the validation cache.
--

[open,refpage='vkDestroyValidationCacheEXT',desc='Destroy a validation cache object',type='protos']
--
To destroy a validation cache, call:

include::{generated}/api/protos/vkDestroyValidationCacheEXT.adoc[]

  * pname:device is the logical device that destroys the validation cache
    object.
  * pname:validationCache is the handle of the validation cache to destroy.
  * pname:pAllocator controls host memory allocation as described in the
    <<memory-allocation, Memory Allocation>> chapter.

.Valid Usage
****
  * [[VUID-vkDestroyValidationCacheEXT-validationCache-01537]]
    If sname:VkAllocationCallbacks were provided when pname:validationCache
    was created, a compatible set of callbacks must: be provided here
  * [[VUID-vkDestroyValidationCacheEXT-validationCache-01538]]
    If no sname:VkAllocationCallbacks were provided when
    pname:validationCache was created, pname:pAllocator must: be `NULL`
****

include::{generated}/validity/protos/vkDestroyValidationCacheEXT.adoc[]
--
endif::VK_EXT_validation_cache[]