// Copyright 2017-2021 The Khronos Group, Inc.
//
// SPDX-License-Identifier: CC-BY-4.0

[appendix]
[[memory-model]]
= Memory Model


[[memory-model-agent]]
== Agent

_Operation_ is a general term for any task that is executed on the system.

[NOTE]
.Note
====
An operation is by definition something that is executed.
Thus if an instruction is skipped due to control flow, it does not
constitute an operation.
====

Each operation is executed by a particular _agent_.
Possible agents include each shader invocation, each host thread, and each
fixed-function stage of the pipeline.


[[memory-model-memory-location]]
== Memory Location

A _memory location_ identifies unique storage for 8 bits of data.
Memory operations access a _set of memory locations_ consisting of one or
more memory locations at a time, e.g. an operation accessing a 32-bit
integer in memory would read/write a set of four memory locations.
Memory operations that access whole aggregates may: access any padding bytes
between elements or members, but no padding bytes at the end of the
aggregate.
Two sets of memory locations _overlap_ if the intersection of their sets of
memory locations is non-empty.
A memory operation must: not affect memory at a memory location not within
its set of memory locations.

Memory locations for buffers and images are explicitly allocated in
slink:VkDeviceMemory objects, and are implicitly allocated for SPIR-V
variables in each shader invocation.

ifdef::VK_KHR_workgroup_memory_explicit_layout[]
Variables with code:Workgroup storage class that point to a block-decorated
type share a set of memory locations.
endif::VK_KHR_workgroup_memory_explicit_layout[]


[[memory-model-allocation]]
== Allocation

The values stored in newly allocated memory locations are determined by a
SPIR-V variable's initializer, if present, or else are undefined:.
At the time an allocation is created there have been no
<<memory-model-memory-operation,memory operations>> to any of its memory
locations.
The initialization is not considered to be a memory operation.

[NOTE]
.Note
====
For tessellation control shader output variables, a consequence of
initialization not being considered a memory operation is that some
implementations may need to insert a barrier between the initialization of
the output variables and any reads of those variables.
====


[[memory-model-memory-operation]]
== Memory Operation

For an operation A and memory location M:

  * [[memory-model-access-read]] A _reads_ M if and only if the data stored
    in M is an input to A.
  * [[memory-model-access-write]] A _writes_ M if and only if the data
    output from A is stored to M.
  * [[memory-model-access-access]] A _accesses_ M if and only if it either
    reads or writes (or both) M.

[NOTE]
.Note
====
A write whose value is the same as what was already in those memory
locations is still considered to be a write and has all the same effects.
====


[[memory-model-references]]
== Reference

A _reference_ is an object that a particular agent can: use to access a set
of memory locations.
On the host, a reference is a host virtual address.
On the device, a reference is:

  * The descriptor that a variable is bound to, for variables in
    code:Image, code:Uniform, or code:StorageBuffer storage classes.
    If the variable is an array (or array of arrays, etc.) then each element
    of the array may: be a unique reference.
ifdef::VK_VERSION_1_2,VK_EXT_buffer_device_address,VK_KHR_buffer_device_address[]
  * The address range for a buffer in code:PhysicalStorageBuffer storage
    class, where the base of the address range is queried with
ifndef::VK_VERSION_1_2,VK_KHR_buffer_device_address[]
    flink:vkGetBufferDeviceAddressEXT
endif::VK_VERSION_1_2,VK_KHR_buffer_device_address[]
ifdef::VK_VERSION_1_2,VK_KHR_buffer_device_address[]
    flink:vkGetBufferDeviceAddress
endif::VK_VERSION_1_2,VK_KHR_buffer_device_address[]
    and the length of the range is the size of the buffer.
endif::VK_VERSION_1_2,VK_EXT_buffer_device_address,VK_KHR_buffer_device_address[]
ifdef::VK_KHR_workgroup_memory_explicit_layout[]
  * A single common reference for all variables with code:Workgroup storage
    class that point to a block-decorated type.
  * The variable itself for non-block-decorated type variables in
    code:Workgroup storage class.
endif::VK_KHR_workgroup_memory_explicit_layout[]
  * The variable itself for variables in other storage classes.

Two memory accesses through distinct references may: require availability
and visibility operations as defined
<<memory-model-location-ordered,below>>.


[[memory-model-program-order]]
== Program-Order

A _dynamic instance_ of an instruction is defined in SPIR-V
(https://www.khronos.org/registry/spir-v/specs/unified1/SPIRV.html#DynamicInstance)
as a way of referring to a particular execution of a static instruction.
Program-order is an ordering on dynamic instances of instructions executed
by a single shader invocation:

  * (Basic block): If instructions A and B are in the same basic block, and
    A is listed in the module before B, then the n'th dynamic instance of A
    is program-ordered before the n'th dynamic instance of B.
  * (Branch): The dynamic instance of a branch or switch instruction is
    program-ordered before the dynamic instance of the code:OpLabel
    instruction to which it transfers control.
  * (Call entry): The dynamic instance of an code:OpFunctionCall instruction
    is program-ordered before the dynamic instances of the
    code:OpFunctionParameter instructions and the body of the called
    function.
  * (Call exit): The dynamic instance of the instruction following an
    code:OpFunctionCall instruction is program-ordered after the dynamic
    instance of the return instruction executed by the called function.
  * (Transitive Closure): If dynamic instance A of any instruction is
    program-ordered before dynamic instance B of any instruction and B is
    program-ordered before dynamic instance C of any instruction then A is
    program-ordered before C.
  * (Complete definition): No other dynamic instances are program-ordered.

For instructions executed on the host, the source language defines the
program-order relation (e.g. as "`sequenced-before`").


160
161ifdef::VK_KHR_ray_tracing_pipeline,VK_NV_ray_tracing[]
162[[shader-call-related]]
163== Shader Call Related
164
165Shader-call-related is an equivalence relation on invocations defined as the
166symmetric and transitive closure of:
167
168  * A is shader-call-related to B if A is created by an
169    <<ray-tracing-repack,invocation repack>> instruction executed by B.
170
171
172[[shader-call-order]]
173== Shader Call Order
174
175Shader-call-order is a partial order on dynamic instances of instructions
176executed by invocations that are shader-call-related:
177
178  * (Program order): If dynamic instance A is program-ordered before B, then
179    A is shader-call-ordered before B.
180  * (Shader call entry): If A is a dynamic instance of an
181    <<ray-tracing-repack,invocation repack>> instruction and B is a dynamic
182    instance executed by an invocation that is created by A, then A is
183    shader-call-ordered before B.
184  * (Shader call exit): If A is a dynamic instance of an
185    <<ray-tracing-repack,invocation repack>> instruction, B is the next
186    dynamic instance executed by the same invocation, and C is a dynamic
187    instance executed by an invocation that is created by A, then C is
188    shader-call-ordered before B.
189  * (Transitive closure): If A is shader-call-ordered-before B and B is
190    shader-call-ordered-before C, then A is shader-call-ordered-before C.
191  * (Complete definition): No other dynamic instances are
192    shader-call-ordered.
193endif::VK_KHR_ray_tracing_pipeline,VK_NV_ray_tracing[]
194

[[memory-model-scope]]
== Scope

Atomic and barrier instructions include scopes which identify sets of shader
invocations that must: obey the requested ordering and atomicity rules of
the operation, as defined below.

The various scopes are described in detail in <<shaders-scope, the Shaders
chapter>>.


[[memory-model-atomic-operation]]
== Atomic Operation

An _atomic operation_ on the device is any SPIR-V operation whose name
begins with code:OpAtomic.
An atomic operation on the host is any operation performed with a
code:std::atomic typed object.

Each atomic operation has a memory <<memory-model-scope,scope>> and a
<<memory-model-memory-semantics,semantics>>.
Informally, the scope determines which other agents it is atomic with
respect to, and the <<memory-model-memory-semantics,semantics>> constrains
its ordering against other memory accesses.
Device atomic operations have explicit scopes and semantics.
Each host atomic operation implicitly uses the code:CrossDevice scope, and
uses a memory semantics equivalent to a C++ code:std::memory_order value of
relaxed, acquire, release, acq_rel, or seq_cst.

Two atomic operations A and B are _potentially-mutually-ordered_ if and only
if all of the following are true:

  * They access the same set of memory locations.
  * They use the same reference.
  * A is in the instance of B's memory scope.
  * B is in the instance of A's memory scope.
  * A and B are not the same operation (irreflexive).

Two atomic operations A and B are _mutually-ordered_ if and only if they are
potentially-mutually-ordered and any of the following are true:

  * A and B are both device operations.
  * A and B are both host operations.
  * A is a device operation, B is a host operation, and the implementation
    supports concurrent host- and device-atomics.

[NOTE]
.Note
====
If two atomic operations are not mutually-ordered, and if their sets of
memory locations overlap, then each must: be synchronized against the other
as if they were non-atomic operations.
====


[[memory-model-scoped-modification-order]]
== Scoped Modification Order

For a given atomic write A, all atomic writes that are mutually-ordered with
A occur in an order known as A's _scoped modification order_.
A's scoped modification order relates no other operations.

[NOTE]
.Note
====
Invocations outside the instance of A's memory scope may: observe the values
at A's set of memory locations becoming visible to them in an order that
disagrees with the scoped modification order.
====

[NOTE]
.Note
====
It is valid to have non-atomic operations or atomics in a different scope
instance to the same set of memory locations, as long as they are
synchronized against each other as if they were non-atomic (if they are not,
it is treated as a <<memory-model-access-data-race,data race>>).
That means this definition of A's scoped modification order could include
atomic operations that occur much later, after intervening non-atomics.
That is a bit non-intuitive, but it helps to keep this definition simple and
non-circular.
====


[[memory-model-memory-semantics]]
== Memory Semantics

Non-atomic memory operations, by default, may: be observed by one agent in a
different order than they were written by another agent.

Atomics and some synchronization operations include _memory semantics_,
which are flags that constrain the order in which other memory accesses
(including non-atomic memory accesses and
<<memory-model-availability-visibility,availability and visibility
operations>>) performed by the same agent can: be observed by other agents,
or can: observe accesses by other agents.

Device instructions that include semantics are code:OpAtomic*,
code:OpControlBarrier, code:OpMemoryBarrier, and code:OpMemoryNamedBarrier.
Host instructions that include semantics are some code:std::atomic methods
and memory fences.

SPIR-V supports the following memory semantics:

  * Relaxed: No constraints on order of other memory accesses.
  * Acquire: A memory read with this semantic performs an _acquire
    operation_.
    A memory barrier with this semantic is an _acquire barrier_.
  * Release: A memory write with this semantic performs a _release
    operation_.
    A memory barrier with this semantic is a _release barrier_.
  * AcquireRelease: A memory read-modify-write operation with this semantic
    performs both an acquire operation and a release operation, and inherits
    the limitations on ordering from both of those operations.
    A memory barrier with this semantic is both a release and acquire
    barrier.

[NOTE]
.Note
====
SPIR-V does not support "`consume`" semantics on the device.
====

The memory semantics operand also includes _storage class semantics_ which
indicate which storage classes are constrained by the synchronization.
SPIR-V storage class semantics include:

  * UniformMemory
  * WorkgroupMemory
  * ImageMemory
  * OutputMemory

Each SPIR-V memory operation accesses a single storage class.
Semantics in synchronization operations can include a combination of storage
classes.

The UniformMemory storage class semantic applies to accesses to memory in
the
ifdef::VK_VERSION_1_2,VK_EXT_buffer_device_address,VK_KHR_buffer_device_address[]
code:PhysicalStorageBuffer,
endif::VK_VERSION_1_2,VK_EXT_buffer_device_address,VK_KHR_buffer_device_address[]
ifdef::VK_KHR_ray_tracing_pipeline,VK_NV_ray_tracing[]
code:ShaderRecordBufferKHR,
endif::VK_KHR_ray_tracing_pipeline,VK_NV_ray_tracing[]
code:Uniform, and code:StorageBuffer storage classes.
The WorkgroupMemory storage class semantic applies to accesses to memory in
the code:Workgroup storage class.
The ImageMemory storage class semantic applies to accesses to memory in the
code:Image storage class.
The OutputMemory storage class semantic applies to accesses to memory in the
code:Output storage class.

[NOTE]
.Note
====
Informally, these constraints limit how memory operations can be reordered,
and these limits apply not only to the order of accesses as performed in the
agent that executes the instruction, but also to the order the effects of
writes become visible to all other agents within the same instance of the
instruction's memory scope.
====

[NOTE]
.Note
====
Release and acquire operations in different threads can: act as
synchronization operations, to guarantee that writes that happened before
the release are visible after the acquire.
(This is not a formal definition, just an informative forward reference.)
====

[NOTE]
.Note
====
The OutputMemory storage class semantic is only useful in tessellation
control shaders, which is the only execution model where output variables
are shared between invocations.
====

The memory semantics operand can: also include availability and visibility
flags, which apply availability and visibility operations as described in
<<memory-model-availability-visibility,availability and visibility>>.
The availability/visibility flags are:

  * MakeAvailable: Semantics must: be Release or AcquireRelease.
    Performs an availability operation before the release operation or
    barrier.
  * MakeVisible: Semantics must: be Acquire or AcquireRelease.
    Performs a visibility operation after the acquire operation or barrier.

The specifics of these operations are defined in
<<memory-model-availability-visibility-semantics,Availability and Visibility
Semantics>>.

Host atomic operations may: support a different list of memory semantics and
synchronization operations, depending on the host architecture and source
language.


[[memory-model-release-sequence]]
== Release Sequence

After an atomic operation A performs a release operation on a set of memory
locations M, the _release sequence headed by A_ is the longest continuous
subsequence of A's scoped modification order that consists of:

  * the atomic operation A as its first element
  * atomic read-modify-write operations on M by any agent

[NOTE]
.Note
====
The atomics in the last bullet must: be mutually-ordered with A by virtue of
being in A's scoped modification order.
====

[NOTE]
.Note
====
This intentionally omits "`atomic writes to M performed by the same agent
that performed A`", which is present in the corresponding C++ definition.
====


[[memory-model-synchronizes-with]]
== Synchronizes-With

_Synchronizes-with_ is a relation between operations, where each operation
is either an atomic operation or a memory barrier (also called a fence on
the host).

If A and B are atomic operations, then A synchronizes-with B if and only if
all of the following are true:

  * A performs a release operation
  * B performs an acquire operation
  * A and B are mutually-ordered
  * B reads a value written by A or by an operation in the release sequence
    headed by A

code:OpControlBarrier, code:OpMemoryBarrier, and code:OpMemoryNamedBarrier
are _memory barrier_ instructions in SPIR-V.

If A is a release barrier and B is an atomic operation that performs an
acquire operation, then A synchronizes-with B if and only if all of the
following are true:

  * there exists an atomic write X (with any memory semantics)
  * A is program-ordered before X
  * X and B are mutually-ordered
  * B reads a value written by X or by an operation in the release sequence
    headed by X
  ** If X is relaxed, it is still considered to head a hypothetical release
     sequence for this rule
  * A and B are in the instance of each other's memory scopes
  * X's storage class is in A's semantics.

If A is an atomic operation that performs a release operation and B is an
acquire barrier, then A synchronizes-with B if and only if all of the
following are true:

  * there exists an atomic read X (with any memory semantics)
  * X is program-ordered before B
  * X and A are mutually-ordered
  * X reads a value written by A or by an operation in the release sequence
    headed by A
  * A and B are in the instance of each other's memory scopes
  * X's storage class is in B's semantics.

If A is a release barrier and B is an acquire barrier, then A
synchronizes-with B if all of the following are true:

  * there exists an atomic write X (with any memory semantics)
  * A is program-ordered before X
  * there exists an atomic read Y (with any memory semantics)
  * Y is program-ordered before B
  * X and Y are mutually-ordered
  * Y reads the value written by X or by an operation in the release
    sequence headed by X
  ** If X is relaxed, it is still considered to head a hypothetical release
     sequence for this rule
  * A and B are in the instance of each other's memory scopes
  * X's and Y's storage classes are in A's and B's semantics.
  ** NOTE: X and Y must have the same storage class, because they are
     mutually-ordered.

If A is a release barrier, B is an acquire barrier, and C is a control
barrier (where A can: equal C, and B can: equal C), then A synchronizes-with
B if all of the following are true:

  * A is program-ordered before (or equals) C
  * C is program-ordered before (or equals) B
  * A and B are in the instance of each other's memory scopes
  * A and B are in the instance of C's execution scope

[NOTE]
.Note
====
This is similar to the barrier-barrier synchronization above, but with a
control barrier filling the role of the relaxed atomics.
====

ifdef::VK_EXT_fragment_shader_interlock[]

Let F be an ordering of fragment shader invocations, such that invocation
F~1~ is ordered before invocation F~2~ if and only if F~1~ and F~2~ overlap
as described in <<shaders-scope-fragment-interlock,Fragment Shader
Interlock>> and F~1~ executes the interlocked code before F~2~.

If A is an code:OpEndInvocationInterlockEXT instruction and B is an
code:OpBeginInvocationInterlockEXT instruction, then A synchronizes-with B
if the agent that executes A is ordered before the agent that executes B in
F. A and B are both considered to have code:FragmentInterlock memory scope
and semantics of UniformMemory and ImageMemory, and A is considered to have
Release semantics and B is considered to have Acquire semantics.

[NOTE]
.Note
====
code:OpBeginInvocationInterlockEXT and code:OpEndInvocationInterlockEXT do
not perform implicit availability or visibility operations.
Usually, shaders using fragment shader interlock will declare the relevant
resources as `coherent` to get implicit
<<memory-model-instruction-av-vis,per-instruction availability and
visibility operations>>.
====

endif::VK_EXT_fragment_shader_interlock[]

ifdef::VK_KHR_ray_tracing_pipeline,VK_NV_ray_tracing[]
If A is a release barrier and B is an acquire barrier, then A
synchronizes-with B if all of the following are true:

  * A is shader-call-ordered-before B
  * A and B are in the instance of each other's memory scopes

endif::VK_KHR_ray_tracing_pipeline,VK_NV_ray_tracing[]

No other release and acquire barriers synchronize-with each other.


[[memory-model-system-synchronizes-with]]
== System-Synchronizes-With

_System-synchronizes-with_ is a relation between arbitrary operations on the
device or host.
Certain operations system-synchronize-with each other, which informally
means the first operation occurs before the second and that the
synchronization is performed without using application-visible memory
accesses.

If there is an <<synchronization-dependencies-execution,execution
dependency>> between two operations A and B, then the operation in the first
synchronization scope system-synchronizes-with the operation in the second
synchronization scope.

[NOTE]
.Note
====
This covers all Vulkan synchronization primitives, including device
operations executing before a synchronization primitive is signaled, wait
operations happening before subsequent device operations, signal operations
happening before host operations that wait on them, and host operations
happening before flink:vkQueueSubmit.
The list is spread throughout the synchronization chapter, and is not
repeated here.
====

System-synchronizes-with implicitly includes all storage class semantics and
has code:CrossDevice scope.

If A system-synchronizes-with B, we also say A is
_system-synchronized-before_ B and B is _system-synchronized-after_ A.


[[memory-model-non-private]]
== Private vs. Non-Private

By default, non-atomic memory operations are treated as _private_, meaning
such a memory operation is not intended to be used for communication with
other agents.
Memory operations with the NonPrivatePointer/NonPrivateTexel bit set are
treated as _non-private_, and are intended to be used for communication with
other agents.

More precisely, for private memory operations to be
<<memory-model-location-ordered,Location-Ordered>> between distinct agents
requires using system-synchronizes-with rather than shader-based
synchronization.
Private memory operations still obey program-order.

Atomic operations are always considered non-private.


[[memory-model-inter-thread-happens-before]]
== Inter-Thread-Happens-Before

Let SC be a non-empty set of storage class semantics.
Then (using template syntax) operation A _inter-thread-happens-before_<SC>
operation B if and only if any of the following is true:

  * A system-synchronizes-with B
  * A synchronizes-with B, and both A and B have all of SC in their
    semantics
  * A is an operation on memory in a storage class in SC or that has all of
    SC in its semantics, B is a release barrier or release atomic with all
    of SC in its semantics, and A is program-ordered before B
  * A is an acquire barrier or acquire atomic with all of SC in its
    semantics, B is an operation on memory in a storage class in SC or that
    has all of SC in its semantics, and A is program-ordered before B
  * A and B are both host operations and A inter-thread-happens-before B as
    defined in the host language specification
  * A inter-thread-happens-before<SC> some X and X
    inter-thread-happens-before<SC> B


[[memory-model-happens-before]]
== Happens-Before

Operation A _happens-before_ operation B if and only if any of the following
is true:

  * A is program-ordered before B
  * A inter-thread-happens-before<SC> B for some set of storage classes SC

_Happens-after_ is defined similarly.

[NOTE]
.Note
====
Unlike C++, happens-before is not always sufficient for a write to be
visible to a read.
Additional <<memory-model-availability-visibility,availability and
visibility>> operations may: be required for writes to be
<<memory-model-visible-to,visible-to>> other memory accesses.
====

[NOTE]
.Note
====
Happens-before is not transitive, but each of program-order and
inter-thread-happens-before<SC> is transitive.
These can be thought of as covering the "`single-threaded`" case and the
"`multi-threaded`" case, and it is not necessary (and not valid) to form
chains between the two.
====


[[memory-model-availability-visibility]]
== Availability and Visibility

_Availability_ and _visibility_ are states of a write operation, which
(informally) track how far the write has permeated the system, i.e. which
agents and references are able to observe the write.
Availability state is per _memory domain_.
Visibility state is per (agent,reference) pair.
Availability and visibility states are per-memory location for each write.

Memory domains are named according to the agents whose memory accesses use
the domain.
Domains used by shader invocations are organized hierarchically into
multiple smaller memory domains which correspond to the different
<<shaders-scope, scopes>>.
Each memory domain is considered the _dual_ of a scope, and vice versa.
The memory domains defined in Vulkan include:

  * _host_ - accessible by host agents
  * _device_ - accessible by all device agents for a particular device
  * _shader_ - accessible by shader agents for a particular device,
    corresponding to the code:Device scope
  * _queue family instance_ - accessible by shader agents in a single queue
    family, corresponding to the code:QueueFamily scope.
ifdef::VK_EXT_fragment_shader_interlock[]
  * _fragment interlock instance_ - accessible by fragment shader agents
    that <<shaders-scope-fragment-interlock,overlap>>, corresponding to the
    code:FragmentInterlock scope.
endif::VK_EXT_fragment_shader_interlock[]
ifdef::VK_KHR_ray_tracing_pipeline[]
  * _shader call instance_ - accessible by shader agents that are
    <<shader-call-related,shader-call-related>>, corresponding to the
    code:ShaderCallKHR scope.
endif::VK_KHR_ray_tracing_pipeline[]
  * _workgroup instance_ - accessible by shader agents in the same
    workgroup, corresponding to the code:Workgroup scope.
  * _subgroup instance_ - accessible by shader agents in the same subgroup,
    corresponding to the code:Subgroup scope.

The memory domains are nested in the order listed above,
ifdef::VK_KHR_ray_tracing_pipeline[]
except for the shader call instance domain,
endif::VK_KHR_ray_tracing_pipeline[]
with memory domains later in the list nested in the domains earlier in the
list.
ifdef::VK_KHR_ray_tracing_pipeline[]
The shader call instance domain is at an implementation-dependent location
in the list, and is nested according to that location.
The shader call instance domain is not broader than the queue family
instance domain.
endif::VK_KHR_ray_tracing_pipeline[]

[NOTE]
.Note
====
Memory domains do not correspond to storage classes or to device-local and
host-local slink:VkDeviceMemory allocations; rather, they indicate whether a
write can be made visible only to agents in the same subgroup, same
workgroup,
ifdef::VK_EXT_fragment_shader_interlock[]
overlapping fragment shader invocation,
endif::VK_EXT_fragment_shader_interlock[]
ifdef::VK_KHR_ray_tracing_pipeline[]
shader-call-related ray tracing invocation,
endif::VK_KHR_ray_tracing_pipeline[]
in any shader invocation, anywhere on the device, or on the host.
The shader, queue family instance,
ifdef::VK_EXT_fragment_shader_interlock[]
fragment interlock instance,
endif::VK_EXT_fragment_shader_interlock[]
ifdef::VK_KHR_ray_tracing_pipeline[]
shader call instance,
endif::VK_KHR_ray_tracing_pipeline[]
workgroup instance, and subgroup instance domains are only used for
shader-based availability/visibility operations; in other cases, writes can
be made available from/visible to the shader via the device domain.
====

_Availability operations_, _visibility operations_, and _memory domain
operations_ alter the state of the write operations that happen-before them
and that are included in their _source scope_, making those writes available
or visible to their _destination scope_.

  * For an availability operation, the source scope is a set of
    (agent,reference,memory location) tuples, and the destination scope is a
    set of memory domains.
  * For a memory domain operation, the source scope is a memory domain and
    the destination scope is a memory domain.
  * For a visibility operation, the source scope is a set of memory domains
    and the destination scope is a set of (agent,reference,memory location)
    tuples.

How the scopes are determined depends on the specific operation.
Availability and memory domain operations expand the set of memory domains
to which the write is available.
Visibility operations expand the set of (agent,reference,memory location)
tuples to which the write is visible.

Recall that availability and visibility states are per-memory location, and
let W be a write operation to one or more locations performed by agent A via
reference R. Let L be one of the locations written.
(W,L) (the write W to L) is initially not available to any memory domain and
only visible to (A,R,L).
An availability operation AV that happens-after W and that includes (A,R,L)
in its source scope makes (W,L) _available_ to the memory domains in its
destination scope.

A memory domain operation DOM that happens-after AV and for which (W,L) is
available in the source scope makes (W,L) available in the destination
memory domain.

A visibility operation VIS that happens-after AV (or DOM) and for which
(W,L) is available in any domain in the source scope makes (W,L) _visible_
to all (agent,reference,L) tuples included in its destination scope.

If write W~2~ happens-after W, and their sets of memory locations overlap,
then W will not be available/visible to all agents/references for those
memory locations that overlap (and future AV/DOM/VIS ops cannot revive W's
write to those locations).

Availability, memory domain, and visibility operations are treated like
other non-atomic memory accesses for the purpose of
<<memory-model-memory-semantics,memory semantics>>, meaning they can be
ordered by release-acquire sequences or memory barriers.

An _availability chain_ is a sequence of availability operations to
increasingly broad memory domains, where element N+1 of the chain is
performed in the dual scope instance of the destination memory domain of
element N and element N happens-before element N+1.
An example is an availability operation with destination scope of the
workgroup instance domain that happens-before an availability operation to
the shader domain performed by an invocation in the same workgroup.
An availability chain AVC that happens-after W and that includes (A,R,L) in
the source scope makes (W,L) _available_ to the memory domains in its final
destination scope.
An availability chain with a single element is just the availability
operation.

Similarly, a _visibility chain_ is a sequence of visibility operations from
increasingly narrow memory domains, where element N of the chain is
performed in the dual scope instance of the source memory domain of element
N+1 and element N happens-before element N+1.
An example is a visibility operation with source scope of the shader domain
that happens-before a visibility operation with source scope of the
workgroup instance domain performed by an invocation in the same workgroup.
A visibility chain VISC that happens-after AVC (or DOM) and for which (W,L)
is available in any domain in the source scope makes (W,L) _visible_ to all
(agent,reference,L) tuples included in its final destination scope.
A visibility chain with a single element is just the visibility operation.
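
As a non-normative illustration, an availability chain of length two can be
written in GLSL using the code:GL_KHR_memory_scope_semantics extension.
The buffer layout and invocation indices below are hypothetical; the
happens-before edge between the two chain elements is provided by the
control barrier.

[source,glsl]
----
#version 450
#extension GL_KHR_memory_scope_semantics : require
layout(local_size_x = 64) in;
layout(set = 0, binding = 0) buffer Data { uint data; };

void main() {
    if (gl_LocalInvocationIndex == 0u) {
        data = 1u;
        // Chain element 1: make the write available to the workgroup
        // instance domain.
        memoryBarrier(gl_ScopeWorkgroup, gl_StorageSemanticsBuffer,
                      gl_SemanticsRelease | gl_SemanticsMakeAvailable);
    }
    // Execution barrier: element 1 happens-before element 2.
    barrier();
    if (gl_LocalInvocationIndex == 1u) {
        // Chain element 2, performed by another agent in the same
        // workgroup instance (the dual scope instance of element 1's
        // destination domain): make writes available to the shader domain.
        memoryBarrier(gl_ScopeDevice, gl_StorageSemanticsBuffer,
                      gl_SemanticsRelease | gl_SemanticsMakeAvailable);
    }
}
----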


[[memory-model-vulkan-availability-visibility]]
== Availability, Visibility, and Domain Operations

The following operations generate availability, visibility, and domain
operations.
When multiple availability/visibility/domain operations are described, they
are system-synchronized-with each other in the order listed.

An operation that performs a <<synchronization-dependencies-memory,memory
dependency>> generates:

  * If the source access mask includes ename:VK_ACCESS_HOST_WRITE_BIT, then
    the dependency includes a memory domain operation from host domain to
    device domain.
  * An availability operation with source scope of all writes in the first
    <<synchronization-dependencies-access-scopes,access scope>> of the
    dependency and a destination scope of the device domain.
  * A visibility operation with source scope of the device domain and
    destination scope of the second access scope of the dependency.
  * If the destination access mask includes ename:VK_ACCESS_HOST_READ_BIT or
    ename:VK_ACCESS_HOST_WRITE_BIT, then the dependency includes a memory
    domain operation from device domain to host domain.

flink:vkFlushMappedMemoryRanges performs an availability operation, with a
source scope of (agents,references) = (all host threads, all mapped memory
ranges passed to the command), and destination scope of the host domain.

flink:vkInvalidateMappedMemoryRanges performs a visibility operation, with a
source scope of the host domain and a destination scope of
(agents,references) = (all host threads, all mapped memory ranges passed to
the command).

flink:vkQueueSubmit performs a memory domain operation from host to device,
and a visibility operation with source scope of the device domain and
destination scope of all agents and references on the device.


[[memory-model-availability-visibility-semantics]]
== Availability and Visibility Semantics

A memory barrier or atomic operation via agent A that includes MakeAvailable
in its semantics performs an availability operation whose source scope
includes agent A and all references in the storage classes in that
instruction's storage class semantics, and all memory locations, and whose
destination scope is a set of memory domains selected as specified below.
The implicit availability operation is program-ordered between the barrier
or atomic and all other operations program-ordered before the barrier or
atomic.

A memory barrier or atomic operation via agent A that includes MakeVisible
in its semantics performs a visibility operation whose source scope is a set
of memory domains selected as specified below, and whose destination scope
includes agent A and all references in the storage classes in that
instruction's storage class semantics, and all memory locations.
The implicit visibility operation is program-ordered between the barrier or
atomic and all other operations program-ordered after the barrier or atomic.

The memory domains are selected based on the memory scope of the instruction
as follows:

  * code:Device scope uses the shader domain
  * code:QueueFamily scope uses the queue family instance domain
ifdef::VK_EXT_fragment_shader_interlock[]
  * code:FragmentInterlock scope uses the fragment interlock instance domain
endif::VK_EXT_fragment_shader_interlock[]
ifdef::VK_KHR_ray_tracing_pipeline[]
  * code:ShaderCallKHR scope uses the shader call instance domain
endif::VK_KHR_ray_tracing_pipeline[]
  * code:Workgroup scope uses the workgroup instance domain
  * code:Subgroup scope uses the subgroup instance domain
  * code:Invocation scope performs no availability/visibility operations
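
The scope-to-domain mapping above can be illustrated with the
code:GL_KHR_memory_scope_semantics GLSL extension; this non-normative
sketch assumes a hypothetical buffer binding:

[source,glsl]
----
#version 450
#extension GL_KHR_memory_scope_semantics : require
layout(local_size_x = 64) in;
layout(set = 0, binding = 0) buffer B { uint counter; };

void main() {
    if (gl_LocalInvocationIndex == 0u) {
        counter = 1u;
        // Workgroup scope: the implicit MakeAvailable operation targets
        // the workgroup instance domain.
        memoryBarrier(gl_ScopeWorkgroup, gl_StorageSemanticsBuffer,
                      gl_SemanticsRelease | gl_SemanticsMakeAvailable);
        // Using gl_ScopeDevice here instead would target the shader
        // domain, making the write available device-wide.
    }
}
----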

When an availability operation performed by an agent A includes a memory
domain D in its destination scope, where D corresponds to scope instance S,
it also includes the memory domains that correspond to each smaller scope
instance S' that is a subset of S and that includes A. The same applies to
visibility operations.


[[memory-model-instruction-av-vis]]
== Per-Instruction Availability and Visibility Semantics

A memory write instruction that includes MakePointerAvailable, or an image
write instruction that includes MakeTexelAvailable, performs an availability
operation whose source scope includes the agent and reference used to
perform the write and the memory locations written by the instruction, and
whose destination scope is a set of memory domains selected by the Scope
operand as specified in <<memory-model-availability-visibility-semantics,
Availability and Visibility Semantics>>.
The implicit availability operation is program-ordered between the write and
all other operations program-ordered after the write.

A memory read instruction that includes MakePointerVisible, or an image read
instruction that includes MakeTexelVisible, performs a visibility operation
whose source scope is a set of memory domains selected by the Scope operand
as specified in <<memory-model-availability-visibility-semantics,
Availability and Visibility Semantics>>, and whose destination scope
includes the agent and reference used to perform the read and the memory
locations read by the instruction.
The implicit visibility operation is program-ordered between the read and
all other operations program-ordered before the read.
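
In GLSL with code:GL_KHR_memory_scope_semantics, per-instruction
availability and visibility are typically expressed through coherence
qualifiers rather than explicit operands. A non-normative sketch, with a
hypothetical buffer member name:

[source,glsl]
----
#version 450
#extension GL_KHR_memory_scope_semantics : require
layout(local_size_x = 64) in;
// workgroupcoherent accesses map to SPIR-V loads/stores carrying
// MakePointerVisible/MakePointerAvailable with Workgroup scope.
layout(set = 0, binding = 0) workgroupcoherent buffer B {
    uint shared_value;
};

void main() {
    if (gl_LocalInvocationIndex == 0u) {
        shared_value = 7u;  // write carries MakePointerAvailable(Workgroup)
    }
    barrier();
    uint v = shared_value;  // read carries MakePointerVisible(Workgroup)
}
----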

[NOTE]
.Note
====
Although reads with per-instruction visibility only perform visibility
operations from the shader or
ifdef::VK_EXT_fragment_shader_interlock[]
fragment interlock instance or
endif::VK_EXT_fragment_shader_interlock[]
ifdef::VK_KHR_ray_tracing_pipeline[]
shader call instance or
endif::VK_KHR_ray_tracing_pipeline[]
workgroup instance or subgroup instance domain, they will also see writes
that were made visible via the device domain, i.e. those writes previously
performed by non-shader agents and made visible via API commands.
====

[NOTE]
.Note
====
It is expected that all invocations in a subgroup execute on the same
processor with the same path to memory, and thus availability and visibility
operations with subgroup scope can be expected to be "`free`".
====


[[memory-model-location-ordered]]
== Location-Ordered

Let X and Y be memory accesses to overlapping sets of memory locations M,
where X != Y. Let (A~X~,R~X~) be the agent and reference used for X, and
(A~Y~,R~Y~) be the agent and reference used for Y. Let "`->`" denote
happens-before and "`->^rcpo^`" denote the reflexive closure of
program-ordered before.

If D~1~ and D~2~ are different memory domains, then let DOM(D~1~,D~2~) be a
memory domain operation from D~1~ to D~2~.
Otherwise, let DOM(D,D) be a placeholder such that X->DOM(D,D)->Y if and
only if X->Y.

X is _location-ordered_ before Y for a location L in M if and only if any of
the following is true:

  * A~X~ == A~Y~ and R~X~ == R~Y~ and X->Y
  ** NOTE: this case means no availability/visibility operations are
     required when it is the same (agent,reference).

  * X is a read, both X and Y are non-private, and X->Y
  * X is a read, and X (transitively) system-synchronizes with Y

  * If R~X~ == R~Y~ and A~X~ and A~Y~ access a common memory domain D (e.g.
    are in the same workgroup instance if D is the workgroup instance
    domain), and both X and Y are non-private:
  ** X is a write, Y is a write, AVC(A~X~,R~X~,D,L) is an availability chain
     making (X,L) available to domain D, and X->^rcpo^AVC(A~X~,R~X~,D,L)->Y
  ** X is a write, Y is a read, AVC(A~X~,R~X~,D,L) is an availability chain
     making (X,L) available to domain D, VISC(A~Y~,R~Y~,D,L) is a visibility
     chain making writes to L available in domain D visible to Y, and
     X->^rcpo^AVC(A~X~,R~X~,D,L)->VISC(A~Y~,R~Y~,D,L)->^rcpo^Y
  ** If
     slink:VkPhysicalDeviceVulkanMemoryModelFeatures::pname:vulkanMemoryModelAvailabilityVisibilityChains
     is ename:VK_FALSE, then AVC and VISC must: each only have a single
     element in the chain, in each sub-bullet above.

  * Let D~X~ and D~Y~ each be either the device domain or the host domain,
    depending on whether A~X~ and A~Y~ execute on the device or host:
  ** X is a write and Y is a write, and
     X->AV(A~X~,R~X~,D~X~,L)->DOM(D~X~,D~Y~)->Y
  ** X is a write and Y is a read, and
     X->AV(A~X~,R~X~,D~X~,L)->DOM(D~X~,D~Y~)->VIS(A~Y~,R~Y~,D~Y~,L)->Y

[NOTE]
.Note
====
The final bullet (synchronization through device/host domain) requires
API-level synchronization operations, since the device/host domains are not
accessible via shader instructions.
Also, the "`device domain`" is not to be confused with "`device scope`",
which synchronizes through the "`shader domain`".
====


[[memory-model-access-data-race]]
== Data Race

Let X and Y be operations that access overlapping sets of memory locations
M, where X != Y, and at least one of X and Y is a write, and X and Y are not
mutually-ordered atomic operations.
If there does not exist a location-ordered relation between X and Y for each
location in M, then there is a _data race_.

Applications must: ensure that no data races occur during the execution of
their application.

[NOTE]
.Note
====
Data races can only occur due to instructions that are actually executed.
For example, an instruction skipped due to control flow does not contribute
to a data race.
====


[[memory-model-visible-to]]
== Visible-To

Let X be a write and Y be a read whose sets of memory locations overlap, and
let M be the set of memory locations that overlap.
Let M~2~ be a non-empty subset of M. Then X is _visible-to_ Y for memory
locations M~2~ if and only if all of the following are true:

  * X is location-ordered before Y for each location L in M~2~.
  * There does not exist another write Z to any location L in M~2~ such that
    X is location-ordered before Z for location L and Z is location-ordered
    before Y for location L.

If X is visible-to Y, then Y reads the value written by X for locations
M~2~.

[NOTE]
.Note
====
It is possible for there to be a write between X and Y that overwrites a
subset of the memory locations, but the remaining memory locations (M~2~)
will still be visible-to Y.
====


[[memory-model-acyclicity]]
== Acyclicity

_Reads-from_ is a relation between operations, where the first operation is
a write, the second operation is a read, and the second operation reads the
value written by the first operation.
_From-reads_ is a relation between operations, where the first operation is
a read, the second operation is a write, and the first operation reads a
value written earlier than the second operation in the second operation's
scoped modification order (or the first operation reads from the initial
value, and the second operation is any write to the same locations).

Then the implementation must: guarantee that no cycles exist in the union of
the following relations:

  * location-ordered
  * scoped modification order (over all atomic writes)
  * reads-from
  * from-reads

[NOTE]
.Note
====
This is a "`consistency`" axiom, which informally guarantees that sequences
of operations cannot violate causality.
====


[[memory-model-scoped-modification-order-coherence]]
=== Scoped Modification Order Coherence

Let A and B be mutually-ordered atomic operations, where A is
location-ordered before B. Then the following rules are a consequence of
acyclicity:

  * If A and B are both reads and A does not read the initial value, then
    the write that A takes its value from must: be earlier in its own scoped
    modification order than (or the same as) the write that B takes its
    value from (no cycles between location-order, reads-from, and
    from-reads).
  * If A is a read and B is a write and A does not read the initial value,
    then A must: take its value from a write earlier than B in B's scoped
    modification order (no cycles between location-order, scoped
    modification order, and reads-from).
  * If A is a write and B is a read, then B must: take its value from A or a
    write later than A in A's scoped modification order (no cycles between
    location-order, scoped modification order, and from-reads).
  * If A and B are both writes, then A must: be earlier than B in A's scoped
    modification order (no cycles between location-order and scoped
    modification order).
  * If A is a write and B is a read-modify-write and B reads the value
    written by A, then B comes immediately after A in A's scoped
    modification order (no cycles between scoped modification order and
    from-reads).


[[memory-model-shader-io]]
== Shader I/O

If a shader invocation A in a shader stage other than code:Vertex performs a
memory read operation X from an object in storage class
ifdef::VK_KHR_ray_tracing_pipeline,VK_NV_ray_tracing[]
code:CallableDataKHR, code:IncomingCallableDataKHR, code:RayPayloadKHR,
code:HitAttributeKHR, code:IncomingRayPayloadKHR, or
endif::VK_KHR_ray_tracing_pipeline,VK_NV_ray_tracing[]
code:Input, then X is system-synchronized-after all writes to the
corresponding
ifdef::VK_KHR_ray_tracing_pipeline,VK_NV_ray_tracing[]
code:CallableDataKHR, code:IncomingCallableDataKHR, code:RayPayloadKHR,
code:HitAttributeKHR, code:IncomingRayPayloadKHR, or
endif::VK_KHR_ray_tracing_pipeline,VK_NV_ray_tracing[]
code:Output storage variable(s) in the shader invocation(s) that contribute
to generating invocation A, and those writes are all visible-to X.

[NOTE]
.Note
====
It is not necessary for the upstream shader invocations to have completed
execution; they only need to have generated the output that is being read.
====


[[memory-model-deallocation]]
== Deallocation

A call to flink:vkFreeMemory must: happen-after all memory operations on all
memory locations in that slink:VkDeviceMemory object.

[NOTE]
.Note
====
Normally, device memory operations in a given queue are synchronized with
flink:vkFreeMemory by having a host thread wait on a fence signaled by that
queue, and the wait happens-before the call to flink:vkFreeMemory on the
host.
====

The deallocation of SPIR-V variables is managed by the system and
happens-after all operations on those variables.


[[memory-model-informative-descriptions]]
== Descriptions (Informative)

This subsection describes consequences of the memory model in more easily
understandable terms, for application and compiler developers.

Let SC be the storage class(es) specified by a release or acquire operation
or barrier.

  * An atomic write with release semantics must not be reordered against any
    read or write to SC that is program-ordered before it (regardless of the
    storage class the atomic is in).

  * An atomic read with acquire semantics must not be reordered against any
    read or write to SC that is program-ordered after it (regardless of the
    storage class the atomic is in).

  * Any write to SC program-ordered after a release barrier must not be
    reordered against any read or write to SC program-ordered before that
    barrier.

  * Any read from SC program-ordered before an acquire barrier must not be
    reordered against any read or write to SC program-ordered after the
    barrier.

A control barrier (even if it has no memory semantics) must not be reordered
against any memory barriers.

This memory model allows memory accesses with and without availability and
visibility operations, as well as atomic operations, all to be performed on
the same memory location.
This is critical to allow reasoning about memory that is reused in multiple
ways, e.g. across the lifetime of different shader invocations or draw
calls.
While GLSL (and legacy SPIR-V) applies the "`coherent`" decoration to
variables (for historical reasons), this model treats each memory access
instruction as having optional implicit availability/visibility operations.
GLSL to SPIR-V compilers should map all (non-atomic) operations on a
coherent variable to Make{Pointer,Texel}{Available,Visible} flags in this
model.

Atomic operations implicitly have availability/visibility operations, and
the scope of those operations is taken from the atomic operation's scope.
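
The release/acquire rules above enable the classic message-passing pattern
between invocations in the same workgroup. This non-normative GLSL sketch
uses code:GL_KHR_memory_scope_semantics with a hypothetical buffer layout;
the workgroupcoherent qualifier supplies the per-access availability and
visibility operations:

[source,glsl]
----
#version 450
#extension GL_KHR_memory_scope_semantics : require
layout(local_size_x = 64) in;
layout(set = 0, binding = 0) workgroupcoherent buffer B {
    uint payload;
    uint flag;
};

void main() {
    if (gl_LocalInvocationIndex == 0u) {
        payload = 42u;  // plain (coherent) write to SC = Buffer
        // Release: the payload write cannot be reordered after this store.
        atomicStore(flag, 1u, gl_ScopeWorkgroup, gl_StorageSemanticsBuffer,
                    gl_SemanticsRelease);
    } else {
        // Acquire: once flag == 1 is observed, the payload write is
        // location-ordered before (and visible-to) the read below.
        if (atomicLoad(flag, gl_ScopeWorkgroup, gl_StorageSemanticsBuffer,
                       gl_SemanticsAcquire) == 1u) {
            uint v = payload;  // guaranteed to read 42u
        }
    }
}
----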


[[memory-model-tessellation-output-ordering]]
== Tessellation Output Ordering

For SPIR-V that uses the Vulkan Memory Model, the code:OutputMemory storage
class is used to synchronize accesses to tessellation control output
variables.
For legacy SPIR-V that does not enable the Vulkan Memory Model via
code:OpMemoryModel, tessellation outputs can be ordered using a control
barrier with no particular memory scope or semantics, as defined below.

Let X and Y be memory operations performed by shader invocations A~X~ and
A~Y~.
Operation X is _tessellation-output-ordered_ before operation Y if and only
if all of the following are true:

  * There is a dynamic instance of an code:OpControlBarrier instruction C
    such that X is program-ordered before C in A~X~ and C is program-ordered
    before Y in A~Y~.
  * A~X~ and A~Y~ are in the same instance of C's execution scope.

If shader invocations A~X~ and A~Y~ in the code:TessellationControl
execution model execute memory operations X and Y, respectively, on the
code:Output storage class, and X is tessellation-output-ordered before Y
with a scope of code:Workgroup, then X is location-ordered before Y, and if
X is a write and Y is a read then X is visible-to Y.
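
For example, in a legacy (non-Vulkan-memory-model) GLSL tessellation
control shader, code:barrier() provides the code:OpControlBarrier instance
that tessellation-output-orders one invocation's output write before
another invocation's read; the patch logic here is a non-normative sketch:

[source,glsl]
----
#version 450
layout(vertices = 3) out;

void main() {
    gl_out[gl_InvocationID].gl_Position = gl_in[gl_InvocationID].gl_Position;

    // Each invocation's write above is tessellation-output-ordered before
    // (and therefore visible-to) reads after the barrier.
    barrier();

    if (gl_InvocationID == 0) {
        // Safe to read another invocation's output after the barrier.
        vec4 p1 = gl_out[1].gl_Position;
        gl_TessLevelOuter[0] = 1.0;
        gl_TessLevelOuter[1] = 1.0;
        gl_TessLevelOuter[2] = 1.0;
        gl_TessLevelInner[0] = 1.0;
    }
}
----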


ifdef::VK_NV_cooperative_matrix[]

[[memory-model-cooperative-matrix]]
== Cooperative Matrix Memory Access

For each dynamic instance of a cooperative matrix load or store instruction
(code:OpCooperativeMatrixLoadNV or code:OpCooperativeMatrixStoreNV), a
single implementation-dependent invocation within the instance of the
matrix's scope performs a non-atomic load or store (respectively) to each
memory location that is defined to be accessed by the instruction.

endif::VK_NV_cooperative_matrix[]