// Copyright 2017-2021 The Khronos Group, Inc.
//
// SPDX-License-Identifier: CC-BY-4.0

[appendix]
[[memory-model]]
= Memory Model


[[memory-model-agent]]
== Agent

_Operation_ is a general term for any task that is executed on the system.

[NOTE]
.Note
====
An operation is by definition something that is executed.
Thus if an instruction is skipped due to control flow, it does not
constitute an operation.
====

Each operation is executed by a particular _agent_.
Possible agents include each shader invocation, each host thread, and each
fixed-function stage of the pipeline.


[[memory-model-memory-location]]
== Memory Location

A _memory location_ identifies unique storage for 8 bits of data.
Memory operations access a _set of memory locations_ consisting of one or
more memory locations at a time, e.g. an operation accessing a 32-bit
integer in memory would read/write a set of four memory locations.
Memory operations that access whole aggregates may: access any padding bytes
between elements or members, but no padding bytes at the end of the
aggregate.
Two sets of memory locations _overlap_ if the intersection of their sets of
memory locations is non-empty.
A memory operation must: not affect memory at a memory location not within
its set of memory locations.

Memory locations for buffers and images are explicitly allocated in
slink:VkDeviceMemory objects, and are implicitly allocated for SPIR-V
variables in each shader invocation.

ifdef::VK_KHR_workgroup_memory_explicit_layout[]
Variables with code:Workgroup storage class that point to a block-decorated
type share a set of memory locations.
endif::VK_KHR_workgroup_memory_explicit_layout[]


[[memory-model-allocation]]
== Allocation

The values stored in newly allocated memory locations are determined by a
SPIR-V variable's initializer, if present, or else are undefined:.
At the time an allocation is created there have been no
<<memory-model-memory-operation,memory operations>> to any of its memory
locations.
The initialization is not considered to be a memory operation.

[NOTE]
.Note
====
For tessellation control shader output variables, a consequence of
initialization not being considered a memory operation is that some
implementations may need to insert a barrier between the initialization of
the output variables and any reads of those variables.
====


[[memory-model-memory-operation]]
== Memory Operation

For an operation A and memory location M:

  * [[memory-model-access-read]] A _reads_ M if and only if the data stored
    in M is an input to A.
  * [[memory-model-access-write]] A _writes_ M if and only if the data
    output from A is stored to M.
  * [[memory-model-access-access]] A _accesses_ M if and only if it either
    reads or writes (or both) M.

[NOTE]
.Note
====
A write whose value is the same as what was already in those memory
locations is still considered to be a write and has all the same effects.
====
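
The overlap test for the common case of contiguous byte ranges can be
written out directly.
The following informal C++ sketch (the `ByteRange` type and `overlaps`
helper are illustrative inventions, not part of the specification or the
Vulkan API) shows that two sets of memory locations overlap exactly when
their byte intervals intersect:

[source,c++]
----
#include <cstdint>

// Hypothetical helper: a contiguous set of memory locations,
// described by a byte offset and a size in bytes.
struct ByteRange {
    uint64_t offset;  // first memory location in the set
    uint64_t size;    // number of 8-bit memory locations
};

// Two sets of memory locations overlap if their intersection is
// non-empty. For contiguous ranges this is a simple interval test:
// a 32-bit store at offset 4 ({4,4}) overlaps a 32-bit load at
// offset 6 ({6,4}) in the locations {6,7}.
constexpr bool overlaps(ByteRange a, ByteRange b) {
    return a.offset < b.offset + b.size &&
           b.offset < a.offset + a.size;
}

static_assert(overlaps({4, 4}, {6, 4}));   // locations 6 and 7 are shared
static_assert(!overlaps({0, 4}, {4, 4}));  // adjacent, no shared location
----
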
[[memory-model-references]]
== Reference

A _reference_ is an object that a particular agent can: use to access a set
of memory locations.
On the host, a reference is a host virtual address.
On the device, a reference is:

  * The descriptor that a variable is bound to, for variables in Image,
    Uniform, or StorageBuffer storage classes.
    If the variable is an array (or array of arrays, etc.) then each
    element of the array may: be a unique reference.
ifdef::VK_VERSION_1_2,VK_EXT_buffer_device_address,VK_KHR_buffer_device_address[]
  * The address range for a buffer in code:PhysicalStorageBuffer storage
    class, where the base of the address range is queried with
ifndef::VK_VERSION_1_2,VK_KHR_buffer_device_address[]
    flink:vkGetBufferDeviceAddressEXT
endif::VK_VERSION_1_2,VK_KHR_buffer_device_address[]
ifdef::VK_VERSION_1_2,VK_KHR_buffer_device_address[]
    flink:vkGetBufferDeviceAddress
endif::VK_VERSION_1_2,VK_KHR_buffer_device_address[]
    and the length of the range is the size of the buffer.
endif::VK_VERSION_1_2,VK_EXT_buffer_device_address,VK_KHR_buffer_device_address[]
ifdef::VK_KHR_workgroup_memory_explicit_layout[]
  * A single common reference for all variables with code:Workgroup storage
    class that point to a block-decorated type.
  * The variable itself for non-block-decorated type variables in
    code:Workgroup storage class.
endif::VK_KHR_workgroup_memory_explicit_layout[]
  * The variable itself for variables in other storage classes.

Two memory accesses through distinct references may: require availability
and visibility operations as defined
<<memory-model-location-ordered,below>>.


[[memory-model-program-order]]
== Program-Order

A _dynamic instance_ of an instruction is defined in SPIR-V
(https://www.khronos.org/registry/spir-v/specs/unified1/SPIRV.html#DynamicInstance)
as a way of referring to a particular execution of a static instruction.
Program-order is an ordering on dynamic instances of instructions executed
by a single shader invocation:

  * (Basic block): If instructions A and B are in the same basic block, and
    A is listed in the module before B, then the n'th dynamic instance of A
    is program-ordered before the n'th dynamic instance of B.
  * (Branch): The dynamic instance of a branch or switch instruction is
    program-ordered before the dynamic instance of the code:OpLabel
    instruction to which it transfers control.
  * (Call entry): The dynamic instance of an code:OpFunctionCall instruction
    is program-ordered before the dynamic instances of the
    code:OpFunctionParameter instructions and the body of the called
    function.
  * (Call exit): The dynamic instance of the instruction following an
    code:OpFunctionCall instruction is program-ordered after the dynamic
    instance of the return instruction executed by the called function.
  * (Transitive Closure): If dynamic instance A of any instruction is
    program-ordered before dynamic instance B of any instruction and B is
    program-ordered before dynamic instance C of any instruction then A is
    program-ordered before C.
  * (Complete definition): No other dynamic instances are program-ordered.

For instructions executed on the host, the source language defines the
program-order relation (e.g. as "`sequenced-before`").


ifdef::VK_KHR_ray_tracing_pipeline,VK_NV_ray_tracing[]
[[shader-call-related]]
== Shader Call Related

Shader-call-related is an equivalence relation on invocations defined as the
symmetric and transitive closure of:

  * A is shader-call-related to B if A is created by an
    <<ray-tracing-repack,invocation repack>> instruction executed by B.
[[shader-call-order]]
== Shader Call Order

Shader-call-order is a partial order on dynamic instances of instructions
executed by invocations that are shader-call-related:

  * (Program order): If dynamic instance A is program-ordered before B, then
    A is shader-call-ordered before B.
  * (Shader call entry): If A is a dynamic instance of an
    <<ray-tracing-repack,invocation repack>> instruction and B is a dynamic
    instance executed by an invocation that is created by A, then A is
    shader-call-ordered before B.
  * (Shader call exit): If A is a dynamic instance of an
    <<ray-tracing-repack,invocation repack>> instruction, B is the next
    dynamic instance executed by the same invocation, and C is a dynamic
    instance executed by an invocation that is created by A, then C is
    shader-call-ordered before B.
  * (Transitive closure): If A is shader-call-ordered-before B and B is
    shader-call-ordered-before C, then A is shader-call-ordered-before C.
  * (Complete definition): No other dynamic instances are
    shader-call-ordered.
endif::VK_KHR_ray_tracing_pipeline,VK_NV_ray_tracing[]


[[memory-model-scope]]
== Scope

Atomic and barrier instructions include scopes which identify sets of shader
invocations that must: obey the requested ordering and atomicity rules of
the operation, as defined below.

The various scopes are described in detail in <<shaders-scope, the Shaders
chapter>>.


[[memory-model-atomic-operation]]
== Atomic Operation

An _atomic operation_ on the device is any SPIR-V operation whose name
begins with code:OpAtomic.
An atomic operation on the host is any operation performed with an
std::atomic typed object.

Each atomic operation has a memory <<memory-model-scope,scope>> and a
<<memory-model-memory-semantics,semantics>>.
Informally, the scope determines which other agents it is atomic with
respect to, and the <<memory-model-memory-semantics,semantics>> constrains
its ordering against other memory accesses.
Device atomic operations have explicit scopes and semantics.
Each host atomic operation implicitly uses the code:CrossDevice scope, and
uses a memory semantics equivalent to a C++ std::memory_order value of
relaxed, acquire, release, acq_rel, or seq_cst.

Two atomic operations A and B are _potentially-mutually-ordered_ if and only
if all of the following are true:

  * They access the same set of memory locations.
  * They use the same reference.
  * A is in the instance of B's memory scope.
  * B is in the instance of A's memory scope.
  * A and B are not the same operation (irreflexive).

Two atomic operations A and B are _mutually-ordered_ if and only if they are
potentially-mutually-ordered and any of the following are true:

  * A and B are both device operations.
  * A and B are both host operations.
  * A is a device operation, B is a host operation, and the implementation
    supports concurrent host- and device-atomics.

[NOTE]
.Note
====
If two atomic operations are not mutually-ordered, and if their sets of
memory locations overlap, then each must: be synchronized against the other
as if they were non-atomic operations.
====
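
As an informal host-side illustration of these definitions, the following
C++ sketch shows two host atomic operations that are mutually-ordered: they
access the same set of memory locations through the same reference, and
every host atomic operation implicitly has code:CrossDevice scope.
All names here are illustrative:

[source,c++]
----
#include <atomic>
#include <cstdint>
#include <thread>

std::atomic<uint32_t> counter{0};  // one 4-byte set of memory locations

void worker() {
    // A host atomic operation: implicitly CrossDevice scope, with
    // semantics equivalent to a std::memory_order value. Relaxed is
    // sufficient here because only atomicity, not ordering of other
    // accesses, is required.
    counter.fetch_add(1, std::memory_order_relaxed);
}

int main() {
    std::thread t1(worker), t2(worker);
    t1.join();
    t2.join();
    // The two fetch_adds are mutually-ordered: same memory locations,
    // same reference (the address of counter), and each is in the
    // instance of the other's memory scope.
    return counter.load(std::memory_order_relaxed) == 2 ? 0 : 1;
}
----
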
[[memory-model-scoped-modification-order]]
== Scoped Modification Order

For a given atomic write A, all atomic writes that are mutually-ordered with
A occur in an order known as A's _scoped modification order_.
A's scoped modification order relates no other operations.

[NOTE]
.Note
====
Invocations outside the instance of A's memory scope may: observe the values
at A's set of memory locations becoming visible to it in an order that
disagrees with the scoped modification order.
====

[NOTE]
.Note
====
It is valid to have non-atomic operations or atomics in a different scope
instance to the same set of memory locations, as long as they are
synchronized against each other as if they were non-atomic (if they are not,
it is treated as a <<memory-model-access-data-race,data race>>).
That means this definition of A's scoped modification order could include
atomic operations that occur much later, after intervening non-atomics.
That is a bit non-intuitive, but it helps to keep this definition simple and
non-circular.
====


[[memory-model-memory-semantics]]
== Memory Semantics

Non-atomic memory operations, by default, may: be observed by one agent in a
different order than they were written by another agent.

Atomics and some synchronization operations include _memory semantics_,
which are flags that constrain the order in which other memory accesses
(including non-atomic memory accesses and
<<memory-model-availability-visibility,availability and visibility
operations>>) performed by the same agent can: be observed by other agents,
or can: observe accesses by other agents.

Device instructions that include semantics are code:OpAtomic*,
code:OpControlBarrier, code:OpMemoryBarrier, and code:OpMemoryNamedBarrier.
Host instructions that include semantics are some std::atomic methods and
memory fences.

SPIR-V supports the following memory semantics:

  * Relaxed: No constraints on order of other memory accesses.
  * Acquire: A memory read with this semantic performs an _acquire
    operation_.
    A memory barrier with this semantic is an _acquire barrier_.
  * Release: A memory write with this semantic performs a _release
    operation_.
    A memory barrier with this semantic is a _release barrier_.
  * AcquireRelease: A memory read-modify-write operation with this semantic
    performs both an acquire operation and a release operation, and inherits
    the limitations on ordering from both of those operations.
    A memory barrier with this semantic is both a release and acquire
    barrier.

[NOTE]
.Note
====
SPIR-V does not support "`consume`" semantics on the device.
====
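
The host-side analogue of acquire and release operations can be sketched
with C++ atomics.
In this informal example (all names are illustrative), a release write and
an acquire read of the same atomic object pass a non-atomic payload between
host threads; with Relaxed semantics on either side, the final assertion
would not be guaranteed:

[source,c++]
----
#include <atomic>
#include <cassert>
#include <thread>

int payload = 0;                // non-atomic data
std::atomic<bool> flag{false};  // synchronization variable

void producer() {
    payload = 42;                                 // non-atomic write
    flag.store(true, std::memory_order_release);  // release operation
}

void consumer() {
    while (!flag.load(std::memory_order_acquire)) { }  // acquire operation
    // The acquire read observed the release write, so the write to
    // payload is ordered before this read and must be visible here.
    assert(payload == 42);
}

int main() {
    std::thread p(producer), c(consumer);
    p.join();
    c.join();
}
----
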
The memory semantics operand also includes _storage class semantics_ which
indicate which storage classes are constrained by the synchronization.
SPIR-V storage class semantics include:

  * UniformMemory
  * WorkgroupMemory
  * ImageMemory
  * OutputMemory

Each SPIR-V memory operation accesses a single storage class.
Semantics in synchronization operations can include a combination of storage
classes.

The UniformMemory storage class semantic applies to accesses to memory in
the
ifdef::VK_VERSION_1_2,VK_EXT_buffer_device_address,VK_KHR_buffer_device_address[]
PhysicalStorageBuffer,
endif::VK_VERSION_1_2,VK_EXT_buffer_device_address,VK_KHR_buffer_device_address[]
ifdef::VK_KHR_ray_tracing_pipeline,VK_NV_ray_tracing[]
code:ShaderRecordBufferKHR,
endif::VK_KHR_ray_tracing_pipeline,VK_NV_ray_tracing[]
Uniform and StorageBuffer storage classes.
The WorkgroupMemory storage class semantic applies to accesses to memory in
the Workgroup storage class.
The ImageMemory storage class semantic applies to accesses to memory in the
Image storage class.
The OutputMemory storage class semantic applies to accesses to memory in the
Output storage class.

[NOTE]
.Note
====
Informally, these constraints limit how memory operations can be reordered,
and these limits apply not only to the order of accesses as performed in the
agent that executes the instruction, but also to the order the effects of
writes become visible to all other agents within the same instance of the
instruction's memory scope.
====

[NOTE]
.Note
====
Release and acquire operations in different threads can: act as
synchronization operations, to guarantee that writes that happened before
the release are visible after the acquire.
(This is not a formal definition, just an informative forward reference.)
====

[NOTE]
.Note
====
The OutputMemory storage class semantic is only useful in tessellation
control shaders, which is the only execution model where output variables
are shared between invocations.
====

The memory semantics operand can: also include availability and visibility
flags, which apply availability and visibility operations as described in
<<memory-model-availability-visibility,availability and visibility>>.
The availability/visibility flags are:

  * MakeAvailable: Semantics must: be Release or AcquireRelease.
    Performs an availability operation before the release operation or
    barrier.
  * MakeVisible: Semantics must: be Acquire or AcquireRelease.
    Performs a visibility operation after the acquire operation or barrier.

The specifics of these operations are defined in
<<memory-model-availability-visibility-semantics,Availability and Visibility
Semantics>>.

Host atomic operations may: support a different list of memory semantics and
synchronization operations, depending on the host architecture and source
language.


[[memory-model-release-sequence]]
== Release Sequence

After an atomic operation A performs a release operation on a set of memory
locations M, the _release sequence headed by A_ is the longest continuous
subsequence of A's scoped modification order that consists of:

  * the atomic operation A as its first element
  * atomic read-modify-write operations on M by any agent

[NOTE]
.Note
====
The atomics in the last bullet must: be mutually-ordered with A by virtue of
being in A's scoped modification order.
====

[NOTE]
.Note
====
This intentionally omits "`atomic writes to M performed by the same agent
that performed A`", which is present in the corresponding C++ definition.
====
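
An informal host-side sketch of the release-sequence rule, using C++ atomics
(all names are illustrative): the acquire read synchronizes-with the release
store even when the value it observes was produced by an intervening
read-modify-write from a third thread, because that read-modify-write is
part of the release sequence headed by the store:

[source,c++]
----
#include <atomic>
#include <cassert>
#include <thread>

int data = 0;
std::atomic<int> sync{0};

int main() {
    std::thread t1([] {
        data = 1;
        sync.store(1, std::memory_order_release);  // heads a release sequence
    });
    std::thread t2([] {
        // A relaxed RMW by another agent extends the release sequence.
        sync.fetch_add(1, std::memory_order_relaxed);
    });
    std::thread t3([] {
        // Seeing 2 means the read took its value from the fetch_add, an
        // operation in the release sequence headed by the release store,
        // so the store synchronizes-with this read and the write to data
        // is visible.
        if (sync.load(std::memory_order_acquire) == 2) {
            assert(data == 1);
        }
    });
    t1.join(); t2.join(); t3.join();
}
----
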
[[memory-model-synchronizes-with]]
== Synchronizes-With

_Synchronizes-with_ is a relation between operations, where each operation
is either an atomic operation or a memory barrier (aka fence on the host).

If A and B are atomic operations, then A synchronizes-with B if and only if
all of the following are true:

  * A performs a release operation
  * B performs an acquire operation
  * A and B are mutually-ordered
  * B reads a value written by A or by an operation in the release sequence
    headed by A

code:OpControlBarrier, code:OpMemoryBarrier, and code:OpMemoryNamedBarrier
are _memory barrier_ instructions in SPIR-V.

If A is a release barrier and B is an atomic operation that performs an
acquire operation, then A synchronizes-with B if and only if all of the
following are true:

  * there exists an atomic write X (with any memory semantics)
  * A is program-ordered before X
  * X and B are mutually-ordered
  * B reads a value written by X or by an operation in the release sequence
    headed by X
  ** If X is relaxed, it is still considered to head a hypothetical release
     sequence for this rule
  * A and B are in the instance of each other's memory scopes
  * X's storage class is in A's semantics.

If A is an atomic operation that performs a release operation and B is an
acquire barrier, then A synchronizes-with B if and only if all of the
following are true:

  * there exists an atomic read X (with any memory semantics)
  * X is program-ordered before B
  * X and A are mutually-ordered
  * X reads a value written by A or by an operation in the release sequence
    headed by A
  * A and B are in the instance of each other's memory scopes
  * X's storage class is in B's semantics.

If A is a release barrier and B is an acquire barrier, then A
synchronizes-with B if all of the following are true:

  * there exists an atomic write X (with any memory semantics)
  * A is program-ordered before X
  * there exists an atomic read Y (with any memory semantics)
  * Y is program-ordered before B
  * X and Y are mutually-ordered
  * Y reads the value written by X or by an operation in the release
    sequence headed by X
  ** If X is relaxed, it is still considered to head a hypothetical release
     sequence for this rule
  * A and B are in the instance of each other's memory scopes
  * X's and Y's storage class is in A's and B's semantics.
  ** NOTE: X and Y must have the same storage class, because they are
     mutually-ordered.

If A is a release barrier, B is an acquire barrier, and C is a control
barrier (where A can: equal C, and B can: equal C), then A synchronizes-with
B if all of the following are true:

  * A is program-ordered before (or equals) C
  * C is program-ordered before (or equals) B
  * A and B are in the instance of each other's memory scopes
  * A and B are in the instance of C's execution scope

[NOTE]
.Note
====
This is similar to the barrier-barrier synchronization above, but with a
control barrier filling the role of the relaxed atomics.
====
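
The release-barrier/acquire-barrier rule has a direct host analogue in C++
fences.
In this informal sketch (names are illustrative), a release fence
program-ordered before a relaxed atomic write X synchronizes-with an acquire
fence program-ordered after a relaxed atomic read Y that reads the value
written by X:

[source,c++]
----
#include <atomic>
#include <cassert>
#include <thread>

int data = 0;
std::atomic<int> flag{0};

void writer() {
    data = 7;                                              // plain write
    std::atomic_thread_fence(std::memory_order_release);   // release barrier A
    flag.store(1, std::memory_order_relaxed);              // atomic write X
}

void reader() {
    while (flag.load(std::memory_order_relaxed) != 1) { }  // atomic read Y
    std::atomic_thread_fence(std::memory_order_acquire);   // acquire barrier B
    // Y read the value written by X, A is program-ordered before X, and
    // Y is program-ordered before B, so A synchronizes-with B and the
    // write to data is visible here.
    assert(data == 7);
}

int main() {
    std::thread w(writer), r(reader);
    w.join();
    r.join();
}
----
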
ifdef::VK_EXT_fragment_shader_interlock[]

Let F be an ordering of fragment shader invocations, such that invocation
F~1~ is ordered before invocation F~2~ if and only if F~1~ and F~2~ overlap
as described in <<shaders-scope-fragment-interlock,Fragment Shader
Interlock>> and F~1~ executes the interlocked code before F~2~.

If A is an code:OpEndInvocationInterlockEXT instruction and B is an
code:OpBeginInvocationInterlockEXT instruction, then A synchronizes-with B
if the agent that executes A is ordered before the agent that executes B in
F. A and B are both considered to have code:FragmentInterlock memory scope
and semantics of UniformMemory and ImageMemory, and A is considered to have
Release semantics and B is considered to have Acquire semantics.

[NOTE]
.Note
====
code:OpBeginInvocationInterlockEXT and code:OpEndInvocationInterlockEXT do
not perform implicit availability or visibility operations.
Usually, shaders using fragment shader interlock will declare the relevant
resources as `coherent` to get implicit
<<memory-model-instruction-av-vis,per-instruction availability and
visibility operations>>.
====

endif::VK_EXT_fragment_shader_interlock[]

ifdef::VK_KHR_ray_tracing_pipeline,VK_NV_ray_tracing[]
If A is a release barrier and B is an acquire barrier, then A
synchronizes-with B if all of the following are true:

  * A is shader-call-ordered-before B
  * A and B are in the instance of each other's memory scopes

endif::VK_KHR_ray_tracing_pipeline,VK_NV_ray_tracing[]

No other release and acquire barriers synchronize-with each other.


[[memory-model-system-synchronizes-with]]
== System-Synchronizes-With

_System-synchronizes-with_ is a relation between arbitrary operations on the
device or host.
Certain operations system-synchronize-with each other, which informally
means the first operation occurs before the second and that the
synchronization is performed without using application-visible memory
accesses.

If there is an <<synchronization-dependencies-execution,execution
dependency>> between two operations A and B, then the operation in the first
synchronization scope system-synchronizes-with the operation in the second
synchronization scope.

[NOTE]
.Note
====
This covers all Vulkan synchronization primitives, including device
operations executing before a synchronization primitive is signaled, wait
operations happening before subsequent device operations, signal operations
happening before host operations that wait on them, and host operations
happening before flink:vkQueueSubmit.
The list is spread throughout the synchronization chapter, and is not
repeated here.
====

System-synchronizes-with implicitly includes all storage class semantics and
has code:CrossDevice scope.

If A system-synchronizes-with B, we also say A is
_system-synchronized-before_ B and B is _system-synchronized-after_ A.


[[memory-model-non-private]]
== Private vs. Non-Private

By default, non-atomic memory operations are treated as _private_, meaning
such a memory operation is not intended to be used for communication with
other agents.
Memory operations with the NonPrivatePointer/NonPrivateTexel bit set are
treated as _non-private_, and are intended to be used for communication with
other agents.
More precisely, for private memory operations to be
<<memory-model-location-ordered,Location-Ordered>> between distinct agents
requires using system-synchronizes-with rather than shader-based
synchronization.
Private memory operations still obey program-order.

Atomic operations are always considered non-private.


[[memory-model-inter-thread-happens-before]]
== Inter-Thread-Happens-Before

Let SC be a non-empty set of storage class semantics.
Then (using template syntax) operation A _inter-thread-happens-before_<SC>
operation B if and only if any of the following is true:

  * A system-synchronizes-with B
  * A synchronizes-with B, and both A and B have all of SC in their
    semantics
  * A is an operation on memory in a storage class in SC or that has all of
    SC in its semantics, B is a release barrier or release atomic with all
    of SC in its semantics, and A is program-ordered before B
  * A is an acquire barrier or acquire atomic with all of SC in its
    semantics, B is an operation on memory in a storage class in SC or that
    has all of SC in its semantics, and A is program-ordered before B
  * A and B are both host operations and A inter-thread-happens-before B as
    defined in the host language specification
  * A inter-thread-happens-before<SC> some X and X
    inter-thread-happens-before<SC> B


[[memory-model-happens-before]]
== Happens-Before

Operation A _happens-before_ operation B if and only if any of the following
is true:

  * A is program-ordered before B
  * A inter-thread-happens-before<SC> B for some set of storage classes SC

_Happens-after_ is defined similarly.

[NOTE]
.Note
====
Unlike C++, happens-before is not always sufficient for a write to be
visible to a read.
Additional <<memory-model-availability-visibility,availability and
visibility>> operations may: be required for writes to be
<<memory-model-visible-to,visible-to>> other memory accesses.
====

[NOTE]
.Note
====
Happens-before is not transitive, but each of program-order and
inter-thread-happens-before<SC> are transitive.
These can be thought of as covering the "`single-threaded`" case and the
"`multi-threaded`" case, and it is not necessary (and not valid) to form
chains between the two.
====


[[memory-model-availability-visibility]]
== Availability and Visibility

_Availability_ and _visibility_ are states of a write operation, which
(informally) track how far the write has permeated the system, i.e. which
agents and references are able to observe the write.
Availability state is per _memory domain_.
Visibility state is per (agent,reference) pair.
Availability and visibility states are per-memory location for each write.

Memory domains are named according to the agents whose memory accesses use
the domain.
Domains used by shader invocations are organized hierarchically into
multiple smaller memory domains which correspond to the different
<<shaders-scope, scopes>>.
Each memory domain is considered the _dual_ of a scope, and vice versa.
The memory domains defined in Vulkan include:

  * _host_ - accessible by host agents
  * _device_ - accessible by all device agents for a particular device
  * _shader_ - accessible by shader agents for a particular device,
    corresponding to the code:Device scope
  * _queue family instance_ - accessible by shader agents in a single queue
    family, corresponding to the code:QueueFamily scope
ifdef::VK_EXT_fragment_shader_interlock[]
  * _fragment interlock instance_ - accessible by fragment shader agents
    that <<shaders-scope-fragment-interlock,overlap>>, corresponding to the
    code:FragmentInterlock scope
endif::VK_EXT_fragment_shader_interlock[]
ifdef::VK_KHR_ray_tracing_pipeline[]
  * _shader call instance_ - accessible by shader agents that are
    <<shader-call-related,shader-call-related>>, corresponding to the
    code:ShaderCallKHR scope
endif::VK_KHR_ray_tracing_pipeline[]
  * _workgroup instance_ - accessible by shader agents in the same
    workgroup, corresponding to the code:Workgroup scope
  * _subgroup instance_ - accessible by shader agents in the same subgroup,
    corresponding to the code:Subgroup scope

The memory domains are nested in the order listed above,
ifdef::VK_KHR_ray_tracing_pipeline[]
except for the shader call instance domain,
endif::VK_KHR_ray_tracing_pipeline[]
with memory domains later in the list nested in the domains earlier in the
list.
ifdef::VK_KHR_ray_tracing_pipeline[]
The shader call instance domain is at an implementation-dependent location
in the list, and is nested according to that location.
The shader call instance domain is not broader than the queue family
instance domain.
endif::VK_KHR_ray_tracing_pipeline[]

[NOTE]
.Note
====
Memory domains do not correspond to storage classes or to device-local and
host-local slink:VkDeviceMemory allocations; rather, they indicate whether a
write can be made visible only to agents in the same subgroup, same
workgroup,
ifdef::VK_EXT_fragment_shader_interlock[]
overlapping fragment shader invocation,
endif::VK_EXT_fragment_shader_interlock[]
ifdef::VK_KHR_ray_tracing_pipeline[]
shader-call-related ray tracing invocation,
endif::VK_KHR_ray_tracing_pipeline[]
in any shader invocation, or anywhere on the device, or host.
The shader, queue family instance,
ifdef::VK_EXT_fragment_shader_interlock[]
fragment interlock instance,
endif::VK_EXT_fragment_shader_interlock[]
ifdef::VK_KHR_ray_tracing_pipeline[]
shader call instance,
endif::VK_KHR_ray_tracing_pipeline[]
workgroup instance, and subgroup instance domains are only used for
shader-based availability/visibility operations; in other cases writes can
be made available from/visible to the shader via the device domain.
====

_Availability operations_, _visibility operations_, and _memory domain
operations_ alter the state of the write operations that happen-before them,
and which are included in their _source scope_, to be available or visible
to their _destination scope_.

  * For an availability operation, the source scope is a set of
    (agent,reference,memory location) tuples, and the destination scope is a
    set of memory domains.
  * For a memory domain operation, the source scope is a memory domain and
    the destination scope is a memory domain.
  * For a visibility operation, the source scope is a set of memory domains
    and the destination scope is a set of (agent,reference,memory location)
    tuples.

How the scopes are determined depends on the specific operation.
Availability and memory domain operations expand the set of memory domains
to which the write is available.
Visibility operations expand the set of (agent,reference,memory location)
tuples to which the write is visible.

Recall that availability and visibility states are per-memory location, and
let W be a write operation to one or more locations performed by agent A via
reference R. Let L be one of the locations written.
(W,L) (the write W to L) is initially not available to any memory domain
and only visible to (A,R,L).
An availability operation AV that happens-after W and that includes (A,R,L)
in its source scope makes (W,L) _available_ to the memory domains in its
destination scope.

A memory domain operation DOM that happens-after AV and for which (W,L) is
available in the source scope makes (W,L) available in the destination
memory domain.

A visibility operation VIS that happens-after AV (or DOM) and for which
(W,L) is available in any domain in the source scope makes (W,L) _visible_
to all (agent,reference,L) tuples included in its destination scope.

If write W~2~ happens-after W, and their sets of memory locations overlap,
then W will not be available/visible to all agents/references for those
memory locations that overlap (and future AV/DOM/VIS ops cannot revive W's
write to those locations).

Availability, memory domain, and visibility operations are treated like
other non-atomic memory accesses for the purpose of
<<memory-model-memory-semantics,memory semantics>>, meaning they can be
ordered by release-acquire sequences or memory barriers.

An _availability chain_ is a sequence of availability operations to
increasingly broad memory domains, where element N+1 of the chain is
performed in the dual scope instance of the destination memory domain of
element N, and element N happens-before element N+1.
An example is an availability operation with destination scope of the
workgroup instance domain that happens-before an availability operation to
the shader domain performed by an invocation in the same workgroup.
An availability chain AVC that happens-after W and that includes (A,R,L) in
the source scope makes (W,L) _available_ to the memory domains in its final
destination scope.
An availability chain with a single element is just the availability
operation.

Similarly, a _visibility chain_ is a sequence of visibility operations from
increasingly narrow memory domains, where element N of the chain is
performed in the dual scope instance of the source memory domain of element
N+1, and element N happens-before element N+1.
An example is a visibility operation with source scope of the shader domain
that happens-before a visibility operation with source scope of the
workgroup instance domain performed by an invocation in the same workgroup.
A visibility chain VISC that happens-after AVC (or DOM) and for which (W,L)
is available in any domain in the source scope makes (W,L) _visible_ to all
(agent,reference,L) tuples included in its final destination scope.
A visibility chain with a single element is just the visibility operation.
[[memory-model-vulkan-availability-visibility]]
== Availability, Visibility, and Domain Operations

The following operations generate availability, visibility, and domain
operations.
When multiple availability/visibility/domain operations are described, they
are system-synchronized-with each other in the order listed.

An operation that performs a <<synchronization-dependencies-memory,memory
dependency>> generates:

  * If the source access mask includes ename:VK_ACCESS_HOST_WRITE_BIT, then
    the dependency includes a memory domain operation from host domain to
    device domain.
  * An availability operation with source scope of all writes in the first
    <<synchronization-dependencies-access-scopes,access scope>> of the
    dependency and a destination scope of the device domain.
  * A visibility operation with source scope of the device domain and
    destination scope of the second access scope of the dependency.
  * If the destination access mask includes ename:VK_ACCESS_HOST_READ_BIT or
    ename:VK_ACCESS_HOST_WRITE_BIT, then the dependency includes a memory
    domain operation from device domain to host domain.

flink:vkFlushMappedMemoryRanges performs an availability operation, with a
source scope of (agents,references) = (all host threads, all mapped memory
ranges passed to the command), and destination scope of the host domain.

flink:vkInvalidateMappedMemoryRanges performs a visibility operation, with a
source scope of the host domain and a destination scope of
(agents,references) = (all host threads, all mapped memory ranges passed to
the command).

flink:vkQueueSubmit performs a memory domain operation from host to device,
and a visibility operation with source scope of the device domain and
destination scope of all agents and references on the device.
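
For example, making a host write to non-coherent mapped memory available to
device agents combines the operations above.
In this informal sketch, the flush performs the availability operation to
the host domain, and the queue submission performs the host-to-device memory
domain operation and the device-scope visibility operation (the handles and
submit info are assumed to have been created elsewhere; error checking is
omitted):

[source,c++]
----
#include <cstring>
#include <vulkan/vulkan.h>

// Illustrative only: 'device', 'memory', 'queue', and 'submitInfo' are
// assumed to have been created elsewhere, with 'memory' host-visible
// but not host-coherent.
void uploadAndSubmit(VkDevice device, VkDeviceMemory memory,
                     VkQueue queue, const VkSubmitInfo* submitInfo,
                     const void* src, VkDeviceSize size) {
    // Host write through a mapped pointer (a host reference).
    void* ptr = nullptr;
    vkMapMemory(device, memory, 0, size, 0, &ptr);
    memcpy(ptr, src, size);

    // Availability operation: source scope (all host threads, this
    // mapped range), destination scope the host domain.
    VkMappedMemoryRange range{VK_STRUCTURE_TYPE_MAPPED_MEMORY_RANGE};
    range.memory = memory;
    range.offset = 0;
    range.size = VK_WHOLE_SIZE;
    vkFlushMappedMemoryRanges(device, 1, &range);

    // Memory domain operation from host to device, plus a visibility
    // operation to all agents and references on the device.
    vkQueueSubmit(queue, 1, submitInfo, VK_NULL_HANDLE);
}
----
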
[[memory-model-availability-visibility-semantics]]
== Availability and Visibility Semantics

A memory barrier or atomic operation via agent A that includes MakeAvailable
in its semantics performs an availability operation whose source scope
includes agent A and all references in the storage classes in that
instruction's storage class semantics, and all memory locations, and whose
destination scope is a set of memory domains selected as specified below.
The implicit availability operation is program-ordered between the barrier
or atomic and all other operations program-ordered before the barrier or
atomic.

A memory barrier or atomic operation via agent A that includes MakeVisible
in its semantics performs a visibility operation whose source scope is a set
of memory domains selected as specified below, and whose destination scope
includes agent A and all references in the storage classes in that
instruction's storage class semantics, and all memory locations.
The implicit visibility operation is program-ordered between the barrier or
atomic and all other operations program-ordered after the barrier or atomic.

The memory domains are selected based on the memory scope of the instruction
as follows:

  * code:Device scope uses the shader domain
  * code:QueueFamily scope uses the queue family instance domain
ifdef::VK_EXT_fragment_shader_interlock[]
  * code:FragmentInterlock scope uses the fragment interlock instance domain
endif::VK_EXT_fragment_shader_interlock[]
ifdef::VK_KHR_ray_tracing_pipeline[]
  * code:ShaderCallKHR scope uses the shader call instance domain
endif::VK_KHR_ray_tracing_pipeline[]
  * code:Workgroup scope uses the workgroup instance domain
  * code:Subgroup scope uses the subgroup instance domain
  * code:Invocation scope performs no availability/visibility operations

When an availability operation performed by an agent A includes a memory
domain D in its destination scope, where D corresponds to scope instance S,
it also includes the memory domains that correspond to each smaller scope
instance S' that is a subset of S and that includes A. Similarly for
visibility operations.


[[memory-model-instruction-av-vis]]
== Per-Instruction Availability and Visibility Semantics

A memory write instruction that includes MakePointerAvailable, or an image
write instruction that includes MakeTexelAvailable, performs an availability
operation whose source scope includes the agent and reference used to
perform the write and the memory locations written by the instruction, and
whose destination scope is a set of memory domains selected by the Scope
operand as specified in <<memory-model-availability-visibility-semantics,
Availability and Visibility Semantics>>.
The implicit availability operation is program-ordered between the write and
all other operations program-ordered after the write.

A memory read instruction that includes MakePointerVisible, or an image read
instruction that includes MakeTexelVisible, performs a visibility operation
whose source scope is a set of memory domains selected by the Scope operand
as specified in <<memory-model-availability-visibility-semantics,
Availability and Visibility Semantics>>, and whose destination scope
includes the agent and reference used to perform the read and the memory
locations read by the instruction.
The implicit visibility operation is program-ordered between the read and
all other operations program-ordered before the read.

[NOTE]
.Note
====
Although reads with per-instruction visibility only perform visibility
operations from the shader or
ifdef::VK_EXT_fragment_shader_interlock[]
fragment interlock instance or
endif::VK_EXT_fragment_shader_interlock[]
ifdef::VK_KHR_ray_tracing_pipeline[]
shader call instance or
endif::VK_KHR_ray_tracing_pipeline[]
workgroup instance or subgroup instance domain, they will also see writes
that were made visible via the device domain, i.e. those writes previously
performed by non-shader agents and made visible via API commands.
====

[NOTE]
.Note
====
It is expected that all invocations in a subgroup execute on the same
processor with the same path to memory, and thus availability and visibility
operations with subgroup scope can be expected to be "`free`".
====


[[memory-model-location-ordered]]
== Location-Ordered

Let X and Y be memory accesses to overlapping sets of memory locations M,
where X != Y.
Let (A~X~,R~X~) be the agent and reference used for X, and (A~Y~,R~Y~) be
the agent and reference used for Y. For now, let "`->`" denote
happens-before and "`->^rcpo^`" denote the reflexive closure of
program-ordered before.

If D~1~ and D~2~ are different memory domains, then let DOM(D~1~,D~2~) be a
memory domain operation from D~1~ to D~2~.
Otherwise, let DOM(D,D) be a placeholder such that X->DOM(D,D)->Y if and
only if X->Y.

X is _location-ordered_ before Y for a location L in M if and only if any of
the following is true:

  * A~X~ == A~Y~ and R~X~ == R~Y~ and X->Y
  ** NOTE: this case means no availability/visibility operations are
     required when it is the same (agent,reference).

  * X is a read, both X and Y are non-private, and X->Y
  * X is a read, and X (transitively) system-synchronizes-with Y

  * If R~X~ == R~Y~ and A~X~ and A~Y~ access a common memory domain D (e.g.
    are in the same workgroup instance if D is the workgroup instance
    domain), and both X and Y are non-private:
  ** X is a write, Y is a write, AVC(A~X~,R~X~,D,L) is an availability chain
     making (X,L) available to domain D, and X->^rcpo^AVC(A~X~,R~X~,D,L)->Y
  ** X is a write, Y is a read, AVC(A~X~,R~X~,D,L) is an availability chain
     making (X,L) available to domain D, VISC(A~Y~,R~Y~,D,L) is a visibility
     chain making writes to L available in domain D visible to Y, and
     X->^rcpo^AVC(A~X~,R~X~,D,L)->VISC(A~Y~,R~Y~,D,L)->^rcpo^Y
  ** If
     slink:VkPhysicalDeviceVulkanMemoryModelFeatures::pname:vulkanMemoryModelAvailabilityVisibilityChains
     is ename:VK_FALSE, then AVC and VISC must: each only have a single
     element in the chain, in each sub-bullet above.

  * Let D~X~ and D~Y~ each be either the device domain or the host domain,
    depending on whether A~X~ and A~Y~ execute on the device or host:
  ** X is a write and Y is a write, and
     X->AV(A~X~,R~X~,D~X~,L)->DOM(D~X~,D~Y~)->Y
  ** X is a write and Y is a read, and
     X->AV(A~X~,R~X~,D~X~,L)->DOM(D~X~,D~Y~)->VIS(A~Y~,R~Y~,D~Y~,L)->Y

[NOTE]
.Note
====
The final bullet (synchronization through the device/host domain) requires
API-level synchronization operations, since the device/host domains are not
accessible via shader instructions.
And "`device domain`" is not to be confused with "`device scope`", which
synchronizes through the "`shader domain`".
====


[[memory-model-access-data-race]]
== Data Race

Let X and Y be operations that access overlapping sets of memory locations
M, where X != Y, and at least one of X and Y is a write, and X and Y are not
mutually-ordered atomic operations.
If there does not exist a location-ordered relation between X and Y for each
location in M, then there is a _data race_.

Applications must: ensure that no data races occur during the execution of
their application.

[NOTE]
.Note
====
Data races can only occur due to instructions that are actually executed.
For example, an instruction skipped due to control flow must not contribute
to a data race.
====
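
On the host, this definition lines up with the C++ notion of a data race.
The following informal C++ sketch (names are illustrative) describes a pair
of racing accesses, and executes an equivalent pair that does not race
because the accesses are mutually-ordered atomic operations:

[source,c++]
----
#include <atomic>
#include <thread>

int plain = 0;
std::atomic<int> guarded{0};

int main() {
    // Data race (do not do this): if two threads executed
    //   thread 1:  plain = 1;        // write
    //   thread 2:  int r = plain;    // read
    // concurrently, the accesses would overlap, one is a write, and no
    // location-ordering would exist between them for those locations.

    // No data race: the same communication expressed as mutually-ordered
    // atomic operations, which the definition explicitly excludes.
    std::thread w([] { guarded.store(1, std::memory_order_relaxed); });
    std::thread r([] { (void)guarded.load(std::memory_order_relaxed); });
    w.join();
    r.join();
}
----
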
[[memory-model-visible-to]]
== Visible-To

Let X be a write and Y be a read whose sets of memory locations overlap, and
let M be the set of memory locations that overlap.
Let M~2~ be a non-empty subset of M. Then X is _visible-to_ Y for memory
locations M~2~ if and only if all of the following are true:

  * X is location-ordered before Y for each location L in M~2~.
  * There does not exist another write Z to any location L in M~2~ such that
    X is location-ordered before Z for location L and Z is location-ordered
    before Y for location L.

If X is visible-to Y, then Y reads the value written by X for locations
M~2~.

[NOTE]
.Note
====
It is possible for there to be a write between X and Y that overwrites a
subset of the memory locations, but the remaining memory locations (M~2~)
will still be visible-to Y.
====


[[memory-model-acyclicity]]
== Acyclicity

_Reads-from_ is a relation between operations, where the first operation is
a write, the second operation is a read, and the second operation reads the
value written by the first operation.
_From-reads_ is a relation between operations, where the first operation is
a read, the second operation is a write, and the first operation reads a
value written earlier than the second operation in the second operation's
scoped modification order (or the first operation reads from the initial
value, and the second operation is any write to the same locations).

Then the implementation must: guarantee that no cycles exist in the union of
the following relations:

  * location-ordered
  * scoped modification order (over all atomic writes)
  * reads-from
  * from-reads

[NOTE]
.Note
====
This is a "`consistency`" axiom, which informally guarantees that sequences
of operations cannot violate causality.
====


[[memory-model-scoped-modification-order-coherence]]
=== Scoped Modification Order Coherence

Let A and B be mutually-ordered atomic operations, where A is
location-ordered before B. Then the following rules are a consequence of
acyclicity:

  * If A and B are both reads and A does not read the initial value, then
    the write that A takes its value from must: be earlier in its own scoped
    modification order than (or the same as) the write that B takes its
    value from (no cycles between location-order, reads-from, and
    from-reads).
  * If A is a read and B is a write and A does not read the initial value,
    then A must: take its value from a write earlier than B in B's scoped
    modification order (no cycles between location-order, scoped
    modification order, and reads-from).
  * If A is a write and B is a read, then B must: take its value from A or a
    write later than A in A's scoped modification order (no cycles between
    location-order, scoped modification order, and from-reads).
  * If A and B are both writes, then A must: be earlier than B in A's scoped
    modification order (no cycles between location-order and scoped
    modification order).
  * If A is a write and B is a read-modify-write and B reads the value
    written by A, then B comes immediately after A in A's scoped
    modification order (no cycles between scoped modification order and
    from-reads).
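
These rules mirror the coherence guarantees of C++ atomics.
An informal sketch of the read-read case (names are illustrative):

[source,c++]
----
#include <atomic>
#include <thread>

std::atomic<int> x{0};

int main() {
    std::thread writer([] {
        x.store(1, std::memory_order_relaxed);
        x.store(2, std::memory_order_relaxed);  // later in modification order
    });
    int r1 = 0, r2 = 0;
    std::thread reader([&] {
        r1 = x.load(std::memory_order_relaxed);  // A: location-ordered before B
        r2 = x.load(std::memory_order_relaxed);  // B
    });
    writer.join();
    reader.join();
    // Read-read coherence: the outcome r1 == 2 && r2 == 1 is forbidden,
    // because the write A reads from may not be later in the modification
    // order than the write B reads from.
    return (r1 == 2 && r2 == 1) ? 1 : 0;
}
----
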
[[memory-model-shader-io]]
== Shader I/O

If a shader invocation A in a shader stage other than code:Vertex performs a
memory read operation X from an object in storage class
ifdef::VK_KHR_ray_tracing_pipeline,VK_NV_ray_tracing[]
code:CallableDataKHR, code:IncomingCallableDataKHR, code:RayPayloadKHR,
code:HitAttributeKHR, code:IncomingRayPayloadKHR, or
endif::VK_KHR_ray_tracing_pipeline,VK_NV_ray_tracing[]
code:Input, then X is system-synchronized-after all writes to the
corresponding
ifdef::VK_KHR_ray_tracing_pipeline,VK_NV_ray_tracing[]
code:CallableDataKHR, code:IncomingCallableDataKHR, code:RayPayloadKHR,
code:HitAttributeKHR, code:IncomingRayPayloadKHR, or
endif::VK_KHR_ray_tracing_pipeline,VK_NV_ray_tracing[]
code:Output storage variable(s) in the shader invocation(s) that contribute
to generating invocation A, and those writes are all visible-to X.

[NOTE]
.Note
====
It is not necessary for the upstream shader invocations to have completed
execution; they only need to have generated the output that is being read.
====


[[memory-model-deallocation]]
== Deallocation

A call to flink:vkFreeMemory must: happen-after all memory operations on all
memory locations in that slink:VkDeviceMemory object.

[NOTE]
.Note
====
Normally, device memory operations in a given queue are synchronized with
flink:vkFreeMemory by having a host thread wait on a fence signaled by that
queue, and the wait happens-before the call to flink:vkFreeMemory on the
host.
====

The deallocation of SPIR-V variables is managed by the system and
happens-after all operations on those variables.
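
The fence-based pattern described in the note above can be sketched as
follows (informal; the queue, submit info, fence, and memory are assumed to
have been created elsewhere, and error checking is omitted):

[source,c++]
----
#include <cstdint>
#include <vulkan/vulkan.h>

// Illustrative sketch: wait for all device work that accesses 'memory'
// to complete before freeing it, so that vkFreeMemory happens-after all
// memory operations on the allocation.
void submitThenFree(VkDevice device, VkQueue queue,
                    const VkSubmitInfo* submitInfo, VkFence fence,
                    VkDeviceMemory memory) {
    // Device memory operations are enqueued; the fence is signaled when
    // they complete.
    vkQueueSubmit(queue, 1, submitInfo, fence);

    // The fence wait system-synchronizes the device work with the host:
    // the wait happens-before the vkFreeMemory call on the host.
    vkWaitForFences(device, 1, &fence, VK_TRUE, UINT64_MAX);

    vkFreeMemory(device, memory, nullptr);
}
----
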
[[memory-model-informative-descriptions]]
== Descriptions (Informative)

This subsection offers more easily understandable consequences of the memory
model for app/compiler developers.

Let SC be the storage class(es) specified by a release or acquire operation
or barrier.

  * An atomic write with release semantics must not be reordered against any
    read or write to SC that is program-ordered before it (regardless of the
    storage class the atomic is in).

  * An atomic read with acquire semantics must not be reordered against any
    read or write to SC that is program-ordered after it (regardless of the
    storage class the atomic is in).

  * Any write to SC program-ordered after a release barrier must not be
    reordered against any read or write to SC program-ordered before that
    barrier.

  * Any read from SC program-ordered before an acquire barrier must not be
    reordered against any read or write to SC program-ordered after the
    barrier.

A control barrier (even if it has no memory semantics) must not be reordered
against any memory barriers.

This memory model allows memory accesses with and without availability and
visibility operations, as well as atomic operations, all to be performed on
the same memory location.
This is critical for reasoning about memory that is reused in multiple ways,
e.g. across the lifetime of different shader invocations or draw calls.
While GLSL (and legacy SPIR-V) applies the "`coherent`" decoration to
variables (for historical reasons), this model treats each memory access
instruction as having optional implicit availability/visibility operations.
GLSL to SPIR-V compilers should map all (non-atomic) operations on a
coherent variable to Make{Pointer,Texel}{Available,Visible} flags in this
model.

Atomic operations implicitly have availability/visibility operations, and
the scope of those operations is taken from the atomic operation's scope.


[[memory-model-tessellation-output-ordering]]
== Tessellation Output Ordering

For SPIR-V that uses the Vulkan Memory Model, the code:OutputMemory storage
class is used to synchronize accesses to tessellation control output
variables.
For legacy SPIR-V that does not enable the Vulkan Memory Model via
code:OpMemoryModel, tessellation outputs can be ordered using a control
barrier with no particular memory scope or semantics, as defined below.

Let X and Y be memory operations performed by shader invocations A~X~ and
A~Y~.
Operation X is _tessellation-output-ordered_ before operation Y if and only
if all of the following are true:

  * There is a dynamic instance of an code:OpControlBarrier instruction C
    such that X is program-ordered before C in A~X~ and C is program-ordered
    before Y in A~Y~.
  * A~X~ and A~Y~ are in the same instance of C's execution scope.

If shader invocations A~X~ and A~Y~ in the code:TessellationControl
execution model execute memory operations X and Y, respectively, on the
code:Output storage class, and X is tessellation-output-ordered before Y
with a scope of code:Workgroup, then X is location-ordered before Y, and if
X is a write and Y is a read then X is visible-to Y.


ifdef::VK_NV_cooperative_matrix[]

[[memory-model-cooperative-matrix]]
== Cooperative Matrix Memory Access

For each dynamic instance of a cooperative matrix load or store instruction
(code:OpCooperativeMatrixLoadNV or code:OpCooperativeMatrixStoreNV), a
single implementation-dependent invocation within the instance of the
matrix's scope performs a non-atomic load or store (respectively) to each
memory location that is defined to be accessed by the instruction.

endif::VK_NV_cooperative_matrix[]