Name

    NV_shader_buffer_store

Name Strings

    none (implied by GL_NV_gpu_program5 or GL_NV_gpu_shader5)

Contact

    Pat Brown, NVIDIA Corporation (pbrown 'at' nvidia.com)

Status

    Shipping.

Version

    Last Modified Date:         May 25, 2022
    NVIDIA Revision:            6

Number

    390

Dependencies

    OpenGL 3.0 and GLSL 1.30 are required.

    This extension is written against the OpenGL 3.2 (Compatibility Profile)
    specification, dated July 24, 2009.

    This extension is written against version 1.50.09 of the OpenGL Shading
    Language Specification.

    NV_shader_buffer_load is required.

    NV_gpu_program5 and/or NV_gpu_shader5 is required.

    This extension interacts with EXT_shader_image_load_store.

    This extension interacts with NV_gpu_shader5.

    This extension interacts with NV_gpu_program5.

    This extension interacts with GLSL 4.30, ARB_shader_storage_buffer_object,
    and ARB_compute_shader.

    This extension interacts with OpenGL 4.2.

Overview

    This extension builds upon the mechanisms added by the
    NV_shader_buffer_load extension to allow shaders to perform random-access
    reads to buffer object memory without using dedicated buffer object
    binding points.  Instead, it allows an application to make a buffer
    object resident, query a GPU address (pointer) for the buffer object, and
    then use that address as a pointer in shader code.  This approach allows
    shaders to access a large number of buffer objects without needing to
    repeatedly bind buffers to a limited number of fixed-functionality binding
    points.

    This extension lifts the restriction from NV_shader_buffer_load that
    disallows writes.  In particular, the MakeBufferResidentNV function now
    allows READ_WRITE and WRITE_ONLY access modes, and the shading language is
    extended to allow shaders to write through (GPU address) pointers.
    Additionally, the extension provides built-in functions to perform atomic
    memory transactions to buffer object memory.

    As with the shader writes provided by the EXT_shader_image_load_store
    extension, writes to buffer object memory using this extension are weakly
    ordered to allow for parallel or distributed shader execution.  The
    EXT_shader_image_load_store extension provides mechanisms allowing for
    finer control of memory transaction order, and those mechanisms apply
    equally to buffer object stores using this extension.


New Procedures and Functions

    None.

New Tokens

    Accepted by the <barriers> parameter of MemoryBarrierEXT:

        SHADER_GLOBAL_ACCESS_BARRIER_BIT_NV             0x00000010

    Accepted by the <access> parameter of MakeBufferResidentNV:

        READ_WRITE
        WRITE_ONLY


Additions to Chapter 2 of the OpenGL 3.2 (Compatibility Profile) Specification
(OpenGL Operation)

    Modify Section 2.9, Buffer Objects, p. 46

    (extend the language inserted by NV_shader_buffer_load in its "Append to
     Section 2.9 (p. 45)" to allow READ_WRITE and WRITE_ONLY mappings)

    The data store of a buffer object may be made accessible to the GL
    via shader buffer loads and stores by calling:

        void MakeBufferResidentNV(enum target, enum access);

    <access> may be READ_ONLY, READ_WRITE, or WRITE_ONLY.  If a shader loads
    from a buffer with WRITE_ONLY <access> or stores to a buffer with
    READ_ONLY <access>, the results of that shader operation are undefined and
    may lead to application termination.  <target> may be any of the buffer
    targets accepted by BindBuffer.

    The data store of a buffer object may be made inaccessible to the GL
    via shader buffer loads and stores by calling:

        void MakeBufferNonResidentNV(enum target);


    Modify "Section 2.20.X, Shader Memory Access" introduced by the
    NV_shader_buffer_load specification, to reflect that shaders may store to
    buffer object memory.

    (first paragraph) Shaders may load from or store to buffer object memory
    by dereferencing pointer variables.  ...

    (second paragraph) When a shader dereferences a pointer variable, data are
    read from or written to buffer object memory according to the following
    rules:

    (modify the paragraph after the end of the alignment and stride rules,
    allowing for writes, and also providing rules forbidding reads from
    WRITE_ONLY mappings or vice versa) If a shader reads or writes to a GPU
    memory address that does not correspond to a buffer object made resident
    by MakeBufferResidentNV, the results of the operation are undefined and
    may result in application termination.  If a shader reads from a buffer
    object made resident with an <access> parameter of WRITE_ONLY, or writes
    to a buffer object made resident with an <access> parameter of READ_ONLY,
    the results of the operation are also undefined and may lead to
    application termination.

    Incorporate the contents of "Section 2.14.X, Shader Memory Access" from
    the EXT_shader_image_load_store specification into the same "Shader Memory
    Access" section, with the following edits.

    (modify first paragraph to reference pointers) Shaders may perform
    random-access reads and writes to texture or buffer object memory using
    pointers or with built-in image load, store, and atomic functions, as
    described in the OpenGL Shading Language Specification.  ...

    (add to list of bits in <barriers> in MemoryBarrierEXT)

    - SHADER_GLOBAL_ACCESS_BARRIER_BIT_NV:  Memory accesses using pointers and
        assembly program global loads, stores, and atomics issued after the
        barrier will reflect data written by shaders prior to the barrier.
        Additionally, memory writes using pointers issued after the barrier
        will not execute until memory accesses (loads, stores, texture
        fetches, vertex fetches, etc) initiated prior to the barrier complete.

    (modify second paragraph after the list of <barriers> bits) To allow for
    independent shader threads to communicate by reads and writes to a common
    memory address, pointers and image variables in the OpenGL shading
    language may be declared as "coherent".  Buffer object or texture memory
    accessed through such variables may be cached only if...

    (add to the coherency guidelines)

    - Data written using pointers in one rendering pass and read by the shader
      in a later pass need not use coherent variables or memoryBarrier().
      Calling MemoryBarrierEXT() with the SHADER_GLOBAL_ACCESS_BARRIER_BIT_NV
      set in <barriers> between passes is necessary.


Additions to Chapter 3 of the OpenGL 3.2 (Compatibility Profile) Specification
(Rasterization)

    None.


Additions to Chapter 4 of the OpenGL 3.2 (Compatibility Profile) Specification
(Per-Fragment Operations and the Frame Buffer)

    None.


Additions to Chapter 5 of the OpenGL 3.2 (Compatibility Profile) Specification
(Special Functions)

    None.


Additions to Chapter 6 of the OpenGL 3.2 (Compatibility Profile) Specification
(State and State Requests)

    None.


Additions to Appendix A of the OpenGL 3.2 (Compatibility Profile)
Specification (Invariance)

    None.

Additions to the AGL/GLX/WGL Specifications

    None.

GLX Protocol

    None.


Additions to the OpenGL Shading Language Specification, Version 1.50 (Revision
09)

    Modify Section 4.3.X, Memory Access Qualifiers, as added by
    EXT_shader_image_load_store

    (modify second paragraph) Memory accesses to image and pointer variables
    declared using the "coherent" storage qualifier are performed coherently
    with similar accesses from other shader threads.  ...

    (modify fourth paragraph) Memory accesses to image and pointer variables
    declared using the "volatile" storage qualifier must treat the underlying
    memory as though it could be read or written at any point during shader
    execution by some source other than the executing thread.  ...

    (modify fifth paragraph) Memory accesses to image and pointer variables
    declared using the "restrict" storage qualifier may be compiled assuming
    that the variable used to perform the memory access is the only way to
    access the underlying memory using the shader stage in question.  ...

    (modify sixth paragraph) Memory accesses to image and pointer variables
    declared using the "const" storage qualifier may only read the underlying
    memory, which is treated as read-only.  ...

    (insert after seventh paragraph)

    In pointer variable declarations, the "coherent", "volatile", "restrict",
    and "const" qualifiers can be positioned anywhere in the declaration, and
    may qualify either a pointer or the underlying data being pointed
    to, depending on their position in the declaration.  Each qualifier to the
    right of the basic data type in a declaration is considered to apply to
    whatever type is found immediately to its left; qualifiers to the left of
    the basic type are considered to apply to that basic type.  To interpret
    the meaning of qualifiers in pointer declarations, it is useful to read
    the declaration from right to left as in the following examples.

      int * * const a;     // a is a constant pointer to a pointer to int
      int * volatile * b;  // b is a pointer to a volatile pointer to int
      int const * * c;     // c is a pointer to a pointer to a constant int
      const int * * d;     // d is like c
      int const * const *  // e is a constant pointer to a constant pointer
       const e;            //   to a constant int

    For pointer types, the "restrict" qualifier can be used to qualify
    pointers, but not non-pointer types being pointed to.

      int * restrict a;    // a is a restricted pointer to int
      int restrict * b;    // b qualifies "int" as restricted - illegal

    (modify eighth paragraph) The "coherent", "volatile", and "restrict"
    storage qualifiers may only be used on image and pointer variables, and
    may not be used on variables of any other type.  ...

    (modify last paragraph) The values of image and pointer variables
    qualified with "coherent," "volatile," "restrict", or "const" may not be
    assigned to function parameters or l-values lacking such qualifiers.

    (add examples for the last paragraph)

      int volatile * var1;
      int * var2;
      int * restrict var3;
      var1 = var2;              // OK, adding "volatile" is allowed
      var2 = var3;              // illegal, stripping "restrict" is not


    Modify Section 5.X, Pointer Operations, as added by NV_shader_buffer_load

    (modify second paragraph, allowing storing through pointers) The pointer
    dereference operator ...  The result of a pointer dereference may be used
    as the left-hand side of an assignment.


    Modify Section 8.Y, Shader Memory Functions, as added by
    EXT_shader_image_load_store

    (modify first paragraph) Shaders of all types may read and write the
    contents of textures and buffer objects using pointers and image
    variables.  ...

    (modify description of memoryBarrier) memoryBarrier() can be used to
    control the ordering of memory transactions issued by a shader thread.
    When called, it will wait on the completion of all memory accesses
    resulting from the use of pointers and image variables prior to calling
    the function.  ...

    (add the following paragraphs to the end of the section)

    If multiple threads need to atomically access shared memory addresses
    using pointers, they may do so using the following built-in functions.
    The following atomic memory access functions allow a shader thread to
    read, modify, and write an address in memory in a manner that guarantees
    that no other shader thread can modify the memory between the read and the
    write.  All of these functions read a single data element from memory,
    compute a new value based on the value read from memory and one or more
    other values passed to the function, and write the result back to the
    same memory address.  The value returned to the caller is always the data
    element originally read from memory.

    Syntax:

      uint      atomicAdd(uint *address, uint data);
      int       atomicAdd(int *address, int data);
      uint64_t  atomicAdd(uint64_t *address, uint64_t data);

      uint      atomicMin(uint *address, uint data);
      int       atomicMin(int *address, int data);

      uint      atomicMax(uint *address, uint data);
      int       atomicMax(int *address, int data);

      uint      atomicIncWrap(uint *address, uint wrap);

      uint      atomicDecWrap(uint *address, uint wrap);

      uint      atomicAnd(uint *address, uint data);
      int       atomicAnd(int *address, int data);

      uint      atomicOr(uint *address, uint data);
      int       atomicOr(int *address, int data);

      uint      atomicXor(uint *address, uint data);
      int       atomicXor(int *address, int data);

      uint      atomicExchange(uint *address, uint data);
      int       atomicExchange(int *address, int data);
      uint64_t  atomicExchange(uint64_t *address, uint64_t data);

      uint      atomicCompSwap(uint *address, uint compare, uint data);
      int       atomicCompSwap(int *address, int compare, int data);
      uint64_t  atomicCompSwap(uint64_t *address, uint64_t compare,
                               uint64_t data);

    Description:

    atomicAdd() computes the new value written to <address> by adding the
    value of <data> to the contents of <address>.  This function supports 32-
    and 64-bit unsigned integer operands, and 32-bit signed integer operands.

    atomicMin() computes the new value written to <address> by taking the
    minimum of the value of <data> and the contents of <address>.  This
    function supports 32-bit signed and unsigned integer operands.

    atomicMax() computes the new value written to <address> by taking the
    maximum of the value of <data> and the contents of <address>.  This
    function supports 32-bit signed and unsigned integer operands.

    atomicIncWrap() computes the new value written to <address> by adding one
    to the contents of <address>, and then forcing the result to zero if and
    only if the incremented value is greater than or equal to <wrap>.  This
    function supports only 32-bit unsigned integer operands.

    atomicDecWrap() computes the new value written to <address> by subtracting
    one from the contents of <address>, and then forcing the result to
    <wrap>-1 if the original value read from <address> was either zero or
    greater than <wrap>.  This function supports only 32-bit unsigned integer
    operands.

    atomicAnd() computes the new value written to <address> by performing a
    bitwise and of the value of <data> and the contents of <address>.  This
    function supports 32-bit signed and unsigned integer operands.

    atomicOr() computes the new value written to <address> by performing a
    bitwise or of the value of <data> and the contents of <address>.  This
    function supports 32-bit signed and unsigned integer operands.

    atomicXor() computes the new value written to <address> by performing a
    bitwise exclusive or of the value of <data> and the contents of <address>.
    This function supports 32-bit signed and unsigned integer operands.

    atomicExchange() uses the value of <data> as the value written to
    <address>.  This function supports 32- and 64-bit unsigned integer
    operands and 32-bit signed integer operands.

    atomicCompSwap() compares the value of <compare> and the contents of
    <address>.  If the values are equal, <data> is written to <address>;
    otherwise, the original contents of <address> are preserved.  This
    function supports 32- and 64-bit unsigned integer operands and 32-bit
    signed integer operands.


    Modify Section 9, Shading Language Grammar, p. 105

    !!! TBD:  Add grammar constructs for memory access qualifiers, allowing
        memory access qualifiers before or after the type and the "*"
        characters indicating pointers in a variable declaration.


Dependencies on EXT_shader_image_load_store

    This specification incorporates the memory access ordering and
    synchronization discussion from EXT_shader_image_load_store verbatim.

    If EXT_shader_image_load_store is not supported, this spec should be
    construed to introduce:

      * the shader memory access language from that specification, including
        the MemoryBarrierEXT() command and the tokens accepted by <barriers>
        from that specification;

      * the memoryBarrier() function to the OpenGL shading language
        specification; and

      * the capability and spec language allowing applications to enable early
        depth tests.

Dependencies on NV_gpu_shader5

    This specification requires either NV_gpu_shader5 or NV_gpu_program5.

    If NV_gpu_shader5 is supported, use of the new shading language features
    described in this extension requires

      #extension GL_NV_gpu_shader5 : enable

    If NV_gpu_shader5 is not supported, modifications to the OpenGL Shading
    Language Specification should be removed.

Dependencies on NV_gpu_program5

    If NV_gpu_program5 is supported, the extension provides support for stores
    and atomic memory transactions to buffer object memory.  Stores are
    provided by the STORE opcode; atomics are provided by the ATOM opcode.  No
    "OPTION" line is required for these features, which are implied by
    NV_gpu_program5 program headers such as "!!NVfp5.0".  The operation of
    these opcodes is described in the NV_gpu_program5 extension specification.

    Note that NV_gpu_program5 also supports the LOAD opcode originally
    added by NV_shader_buffer_load and the MEMBAR opcode originally
    provided by EXT_shader_image_load_store.

Dependencies on GLSL 4.30, ARB_shader_storage_buffer_object, and
ARB_compute_shader

    If GLSL 4.30 is supported, add the following atomic memory functions to
    section 8.11 (Atomic Memory Functions) of the GLSL 4.30 specification:

      uint atomicIncWrap(inout uint mem, uint wrap);
      uint atomicDecWrap(inout uint mem, uint wrap);

    with the following documentation

      atomicIncWrap() computes the new value written to <mem> by adding one to
      the contents of <mem>, and then forcing the result to zero if and only
      if the incremented value is greater than or equal to <wrap>.  This
      function supports only 32-bit unsigned integer operands.

      atomicDecWrap() computes the new value written to <mem> by subtracting
      one from the contents of <mem>, and then forcing the result to <wrap>-1
      if the original value read from <mem> was either zero or greater than
      <wrap>.  This function supports only 32-bit unsigned integer operands.

    Additionally, add the following functions to the section:

      uint64_t atomicAdd(inout uint64_t mem, uint64_t data);
      uint64_t atomicExchange(inout uint64_t mem, uint64_t data);
      uint64_t atomicCompSwap(inout uint64_t mem, uint64_t compare,
                              uint64_t data);

    If ARB_shader_storage_buffer_object or ARB_compute_shader is supported,
    make similar edits to the functions documented in the
    ARB_shader_storage_buffer_object extension.

    These functions are available if and only if GL_NV_gpu_shader5 is enabled
    via the "#extension" directive.

Dependencies on OpenGL 4.2

    If OpenGL 4.2 is supported, MemoryBarrierEXT can be replaced with the
    equivalent core function MemoryBarrier.


Errors

    None.

New State

    None.

Issues

    (1) Does MAX_SHADER_BUFFER_ADDRESS_NV still apply?

      RESOLVED:  The primary reason for this limitation to exist was the lack
      of 64-bit integer support in shaders (see issue 15 of
      NV_shader_buffer_load).  Given that this extension is being released at
      the same time as NV_gpu_shader5, which adds 64-bit integer support, it
      is expected that this maximum address will match the maximum address
      supported by the GPU's address space, or will be equal to "~0ULL",
      indicating that any GPU address returned by the GL will be usable in a
      shader.

    (2) What qualifiers should be supported on pointer variables, and how can
        they be used in declarations?

      RESOLVED:  We will support the qualifiers "coherent", "volatile",
      "restrict", and "const" in pointer declarations.  "coherent"
      is taken from EXT_shader_image_load_store and is used to ensure that
      memory accesses from different shader threads are cached coherently
      (i.e., will be able to see each other when complete).  "volatile" and
      "const" behave as in C.

      "restrict" behaves as in the C99 standard, and can be used to indicate
      that no other pointer points to the same underlying data.  This permits
      optimizations that would otherwise be impossible if the compiler has to
      assume that a pair of pointers might end up pointing to the same data.
      For example, in standard C/C++, code like:

        int *a, *b;
        a[0] = b[0] + b[0];
        a[1] = b[0] + b[1];
        a[2] = b[0] + b[2];

      would need to reload b[0] for each assignment because a[0] or a[1]
      might point at the same data as b[0].  With restrict, the compiler can
      assume that b[0] is not modified by any of the instructions and load it
      just once.

    (3) What amount of automatic synchronization is provided for buffer object
        writes through pointers?

      RESOLVED:  Use of MemoryBarrierEXT() is required, and there is no
      automatic synchronization when buffers are bound or unbound.  With
      resident buffers, there are no well-defined binding points in the first
      place -- all resident buffers are effectively "bound".

      Implicit synchronization is difficult, as it might require some
      combination of:

        - tracking which buffers might be written (randomly) in the shader
          itself;

        - assuming that if a shader that performs writes is executed, all
          bytes of all resident buffers could be modified and thus must be
          treated as dirty;

        - idling at the end of each primitive or draw call, so that the
          results of all previous commands are complete.

      Since normal OpenGL operation is pipelined, idling would result in a
      significant performance impact, as pipelining would otherwise allow
      fragment shader execution for draw call N while simultaneously
      performing vertex shader execution for draw call N+1.


Revision History

    Rev.    Date    Author    Changes
    ----  --------  --------  -----------------------------------------
     6    05/25/22  shqxu     Replace the removed MemoryBarrierNV function
                              with MemoryBarrierEXT.  Add an interaction
                              with OpenGL 4.2 supporting MemoryBarrier.

     5    08/13/12  pbrown    Add interaction with OpenGL 4.3 (and related ARB
                              extensions) supporting atomic{Inc,Dec}Wrap and
                              64-bit unsigned integer atomics to shared and
                              shader storage buffer memory.

     4    04/13/10  pbrown    Remove the floating-point version of atomicAdd().

     3    03/23/10  pbrown    Minor cleanups to the dependency sections.
                              Fixed obsolete extension names.  Add an issue
                              on synchronization.

     2    03/16/10  pbrown    Updated memory access qualifiers section
                              (volatile, coherent, restrict, const) for
                              pointers.  Added language to document how
                              these qualifiers work in possibly complicated
                              expressions.

     1              pbrown    Internal revisions.
