Name

    NV_shader_buffer_store

Name Strings

    none (implied by GL_NV_gpu_program5 or GL_NV_gpu_shader5)

Contact

    Pat Brown, NVIDIA Corporation (pbrown 'at' nvidia.com)

Status

    Shipping.

Version

    Last Modified Date:         August 13, 2012
    NVIDIA Revision:            5

Number

    390

Dependencies

    OpenGL 3.0 and GLSL 1.30 are required.

    This extension is written against the OpenGL 3.2 (Compatibility Profile)
    specification, dated July 24, 2009.

    This extension is written against version 1.50.09 of the OpenGL Shading
    Language Specification.
    NV_shader_buffer_load is required.

    NV_gpu_program5 and/or NV_gpu_shader5 is required.

    This extension interacts with EXT_shader_image_load_store.

    This extension interacts with NV_gpu_shader5.

    This extension interacts with NV_gpu_program5.

    This extension interacts with GLSL 4.30, ARB_shader_storage_buffer_object,
    and ARB_compute_shader.
Overview

    This extension builds upon the mechanisms added by the
    NV_shader_buffer_load extension to allow shaders to perform random-access
    reads to buffer object memory without using dedicated buffer object
    binding points.  Instead, it allows an application to make a buffer
    object resident, query a GPU address (pointer) for the buffer object, and
    then use that address as a pointer in shader code.  This approach allows
    shaders to access a large number of buffer objects without needing to
    repeatedly bind buffers to a limited number of fixed-functionality binding
    points.

    This extension lifts the restriction from NV_shader_buffer_load that
    disallows writes.  In particular, the MakeBufferResidentNV function now
    allows READ_WRITE and WRITE_ONLY access modes, and the shading language is
    extended to allow shaders to write through (GPU address) pointers.
    Additionally, the extension provides built-in functions to perform atomic
    memory transactions to buffer object memory.

    As with the shader writes provided by the EXT_shader_image_load_store
    extension, writes to buffer object memory using this extension are weakly
    ordered to allow for parallel or distributed shader execution.  The
    EXT_shader_image_load_store extension provides mechanisms allowing for
    finer control of memory transaction order, and those mechanisms apply
    equally to buffer object stores using this extension.


New Procedures and Functions

    None.

New Tokens

    Accepted by the <barriers> parameter of MemoryBarrierNV:

        SHADER_GLOBAL_ACCESS_BARRIER_BIT_NV             0x00000010

    Accepted by the <access> parameter of MakeBufferResidentNV:

        READ_WRITE
        WRITE_ONLY


Additions to Chapter 2 of the OpenGL 3.2 (Compatibility Profile) Specification
(OpenGL Operation)

    Modify Section 2.9, Buffer Objects, p. 46

    (extend the language inserted by NV_shader_buffer_load in its "Append to
     Section 2.9" (p. 45) to allow READ_WRITE and WRITE_ONLY mappings)
    The data store of a buffer object may be made accessible to the GL
    via shader buffer loads and stores by calling:

        void MakeBufferResidentNV(enum target, enum access);

    <access> may be READ_ONLY, READ_WRITE, or WRITE_ONLY.  If a shader loads
    from a buffer with WRITE_ONLY <access> or stores to a buffer with
    READ_ONLY <access>, the results of that shader operation are undefined and
    may lead to application termination.  <target> may be any of the buffer
    targets accepted by BindBuffer.
    The data store of a buffer object may be made inaccessible to the GL
    via shader buffer loads and stores by calling:

        void MakeBufferNonResidentNV(enum target);


    Modify "Section 2.20.X, Shader Memory Access" introduced by the
    NV_shader_buffer_load specification, to reflect that shaders may store to
    buffer object memory.

    (first paragraph) Shaders may load from or store to buffer object memory
    by dereferencing pointer variables.  ...

    (second paragraph) When a shader dereferences a pointer variable, data are
    read from or written to buffer object memory according to the following
    rules:

    (modify the paragraph after the end of the alignment and stride rules,
    allowing for writes, and also providing rules forbidding reads to
    WRITE_ONLY mappings or vice-versa) If a shader reads or writes to a GPU
    memory address that does not correspond to a buffer object made resident
    by MakeBufferResidentNV, the results of the operation are undefined and
    may result in application termination.  If a shader reads from a buffer
    object made resident with an <access> parameter of WRITE_ONLY, or writes
    to a buffer object made resident with an <access> parameter of READ_ONLY,
    the results of the operation are also undefined and may lead to
    application termination.

    Incorporate the contents of "Section 2.14.X, Shader Memory Access" from
    the EXT_shader_image_load_store specification into the same "Shader Memory
    Access" section, with the following edits.

    (modify first paragraph to reference pointers) Shaders may perform
    random-access reads and writes to texture or buffer object memory using
    pointers or with built-in image load, store, and atomic functions, as
    described in the OpenGL Shading Language Specification.  ...

    (add to list of bits in <barriers> in MemoryBarrierNV)

    - SHADER_GLOBAL_ACCESS_BARRIER_BIT_NV:  Memory accesses using pointers and
        assembly program global loads, stores, and atomics issued after the
        barrier will reflect data written by shaders prior to the barrier.
        Additionally, memory writes using pointers issued after the barrier
        will not execute until memory accesses (loads, stores, texture
        fetches, vertex fetches, etc.) initiated prior to the barrier complete.

    (modify second paragraph after the list of <barriers> bits) To allow for
    independent shader threads to communicate by reads and writes to a common
    memory address, pointers and image variables in the OpenGL shading
    language may be declared as "coherent".  Buffer object or texture memory
    accessed through such variables may be cached only if...

    (add to the coherency guidelines)

    - Data written using pointers in one rendering pass and read by the shader
      in a later pass need not use coherent variables or memoryBarrier().
      Calling MemoryBarrierNV() with the SHADER_GLOBAL_ACCESS_BARRIER_BIT_NV
      set in <barriers> between passes is necessary.


Additions to Chapter 3 of the OpenGL 3.2 (Compatibility Profile) Specification
(Rasterization)

    None.


Additions to Chapter 4 of the OpenGL 3.2 (Compatibility Profile) Specification
(Per-Fragment Operations and the Frame Buffer)

    None.


Additions to Chapter 5 of the OpenGL 3.2 (Compatibility Profile) Specification
(Special Functions)

    None.


Additions to Chapter 6 of the OpenGL 3.2 (Compatibility Profile) Specification
(State and State Requests)

    None.


Additions to Appendix A of the OpenGL 3.2 (Compatibility Profile)
Specification (Invariance)

    None.

Additions to the AGL/GLX/WGL Specifications

    None.

GLX Protocol

    None.


Additions to the OpenGL Shading Language Specification, Version 1.50 (Revision
09)

    Modify Section 4.3.X, Memory Access Qualifiers, as added by
    EXT_shader_image_load_store

    (modify second paragraph) Memory accesses to image and pointer variables
    declared using the "coherent" storage qualifier are performed coherently
    with similar accesses from other shader threads.  ...

    (modify fourth paragraph) Memory accesses to image and pointer variables
    declared using the "volatile" storage qualifier must treat the underlying
    memory as though it could be read or written at any point during shader
    execution by some source other than the executing thread.  ...

    (modify fifth paragraph) Memory accesses to image and pointer variables
    declared using the "restrict" storage qualifier may be compiled assuming
    that the variable used to perform the memory access is the only way to
    access the underlying memory using the shader stage in question.  ...

    (modify sixth paragraph) Memory accesses to image and pointer variables
    declared using the "const" storage qualifier may only read the underlying
    memory, which is treated as read-only.  ...

    (insert after seventh paragraph)

    In pointer variable declarations, the "coherent", "volatile", "restrict",
    and "const" qualifiers can be positioned anywhere in the declaration, and
    may qualify either a pointer or the underlying data being pointed to,
    depending on position in the declaration.  Each qualifier to the
    right of the basic data type in a declaration is considered to apply to
    whatever type is found immediately to its left; qualifiers to the left of
    the basic type are considered to apply to that basic type.  To interpret
    the meaning of qualifiers in pointer declarations, it is useful to read
    the declaration from right to left as in the following examples.

      int * * const a;     // a is a constant pointer to a pointer to int
      int * volatile * b;  // b is a pointer to a volatile pointer to int
      int const * * c;     // c is a pointer to a pointer to a constant int
      const int * * d;     // d is like c
      int const * const *  // e is a constant pointer to a constant pointer
       const e;            //   to a constant int

    For pointer types, the "restrict" qualifier can be used to qualify
    pointers, but not non-pointer types being pointed to.

      int * restrict a;    // a is a restricted pointer to int
      int restrict * b;    // b qualifies "int" as restricted - illegal
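    Because these rules mirror C's, the right-to-left reading can be checked
    against any C compiler.  A minimal sketch (plain C, not GLSL; the names
    are illustrative only):

    ```c
    #include <assert.h>

    /* Plain C analogue of reading pointer declarations right to left:
       each qualifier applies to the type immediately to its left. */
    static int value = 1;
    static int other = 2;

    static int qualifier_demo(void)
    {
        const int *p = &value;   /* p: pointer to const int -- *p is read-only */
        int *const q = &value;   /* q: const pointer to int -- q cannot move   */

        p = &other;              /* legal: p itself is not const */
        *q = 5;                  /* legal: the int q points to is not const */
        return *p + value;       /* 2 + 5 */
    }
    ```

    Writing "*p = 0" or "q = &other" in the sketch above would fail to
    compile, matching the intent of the GLSL rules.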

    (modify eighth paragraph) The "coherent", "volatile", and "restrict"
    storage qualifiers may only be used on image and pointer variables, and
    may not be used on variables of any other type.  ...

    (modify last paragraph) The values of image and pointer variables
    qualified with "coherent," "volatile," "restrict", or "const" may not be
    assigned to function parameters or l-values lacking such qualifiers.

    (add examples for the last paragraph)

      int volatile * var1;
      int * var2;
      int * restrict var3;
      var1 = var2;              // OK, adding "volatile" is allowed
      var2 = var3;              // illegal, stripping "restrict" is not

    Modify Section 5.X, Pointer Operations, as added by NV_shader_buffer_load

    (modify second paragraph, allowing storing through pointers) The pointer
    dereference operator ...  The result of a pointer dereference may be used
    as the left-hand side of an assignment.


    Modify Section 8.Y, Shader Memory Functions, as added by
    EXT_shader_image_load_store

    (modify first paragraph) Shaders of all types may read and write the
    contents of textures and buffer objects using pointers and image
    variables.  ...

    (modify description of memoryBarrier) memoryBarrier() can be used to
    control the ordering of memory transactions issued by a shader thread.
    When called, it will wait on the completion of all memory accesses
    resulting from the use of pointers and image variables prior to calling
    the function.  ...

    (add the following paragraphs to the end of the section)

    If multiple threads need to atomically access shared memory addresses
    using pointers, they may do so using the following built-in functions.
    The following atomic memory access functions allow a shader thread to
    read, modify, and write an address in memory in a manner that guarantees
    that no other shader thread can modify the memory between the read and the
    write.  All of these functions read a single data element from memory,
    compute a new value based on the value read from memory and one or more
    other values passed to the function, and write the result back to the
    same memory address.  The value returned to the caller is always the data
    element originally read from memory.

    Syntax:

      uint      atomicAdd(uint *address, uint data);
      int       atomicAdd(int *address, int data);
      uint64_t  atomicAdd(uint64_t *address, uint64_t data);

      uint      atomicMin(uint *address, uint data);
      int       atomicMin(int *address, int data);

      uint      atomicMax(uint *address, uint data);
      int       atomicMax(int *address, int data);

      uint      atomicIncWrap(uint *address, uint wrap);

      uint      atomicDecWrap(uint *address, uint wrap);

      uint      atomicAnd(uint *address, uint data);
      int       atomicAnd(int *address, int data);

      uint      atomicOr(uint *address, uint data);
      int       atomicOr(int *address, int data);

      uint      atomicXor(uint *address, uint data);
      int       atomicXor(int *address, int data);

      uint      atomicExchange(uint *address, uint data);
      int       atomicExchange(int *address, int data);
      uint64_t  atomicExchange(uint64_t *address, uint64_t data);

      uint      atomicCompSwap(uint *address, uint compare, uint data);
      int       atomicCompSwap(int *address, int compare, int data);
      uint64_t  atomicCompSwap(uint64_t *address, uint64_t compare,
                               uint64_t data);

    Description:

    atomicAdd() computes the new value written to <address> by adding the
    value of <data> to the contents of <address>.  This function supports 32-
    and 64-bit unsigned integer operands, and 32-bit signed integer operands.

    atomicMin() computes the new value written to <address> by taking the
    minimum of the value of <data> and the contents of <address>.  This
    function supports 32-bit signed and unsigned integer operands.

    atomicMax() computes the new value written to <address> by taking the
    maximum of the value of <data> and the contents of <address>.  This
    function supports 32-bit signed and unsigned integer operands.

    atomicIncWrap() computes the new value written to <address> by adding one
    to the contents of <address>, and then forcing the result to zero if and
    only if the incremented value is greater than or equal to <wrap>.  This
    function supports only 32-bit unsigned integer operands.

    atomicDecWrap() computes the new value written to <address> by subtracting
    one from the contents of <address>, and then forcing the result to
    <wrap>-1 if the original value read from <address> was either zero or
    greater than <wrap>.  This function supports only 32-bit unsigned integer
    operands.

    atomicAnd() computes the new value written to <address> by performing a
    bitwise and of the value of <data> and the contents of <address>.  This
    function supports 32-bit signed and unsigned integer operands.

    atomicOr() computes the new value written to <address> by performing a
    bitwise or of the value of <data> and the contents of <address>.  This
    function supports 32-bit signed and unsigned integer operands.

    atomicXor() computes the new value written to <address> by performing a
    bitwise exclusive or of the value of <data> and the contents of <address>.
    This function supports 32-bit signed and unsigned integer operands.

    atomicExchange() uses the value of <data> as the value written to
    <address>.  This function supports 32- and 64-bit unsigned integer
    operands and 32-bit signed integer operands.

    atomicCompSwap() compares the value of <compare> and the contents of
    <address>.  If the values are equal, <data> is written to <address>;
    otherwise, the original contents of <address> are preserved.  This
    function supports 32- and 64-bit unsigned integer operands and 32-bit
    signed integer operands.
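    The return-the-original-value contract of atomicCompSwap() can be
    sketched with C11 atomics (comp_swap is an illustrative name; on failure
    atomic_compare_exchange_strong leaves the originally-read value in its
    expected argument, and on success that argument already equals it):

    ```c
    #include <stdatomic.h>
    #include <stdint.h>

    /* Writes <data> to *address only if *address == compare; in either
       case, returns the value originally read from memory. */
    static uint32_t comp_swap(_Atomic uint32_t *address, uint32_t compare,
                              uint32_t data)
    {
        uint32_t expected = compare;
        atomic_compare_exchange_strong(address, &expected, data);
        return expected;
    }
    ```

    A caller can detect whether the swap happened by comparing the returned
    value against <compare>, which is the usual compare-and-swap loop idiom.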


    Modify Section 9, Shading Language Grammar, p. 105

    !!! TBD:  Add grammar constructs for memory access qualifiers, allowing
        memory access qualifiers before or after the type and the "*"
        characters indicating pointers in a variable declaration.


Dependencies on EXT_shader_image_load_store

    This specification incorporates the memory access ordering and
    synchronization discussion from EXT_shader_image_load_store verbatim.

    If EXT_shader_image_load_store is not supported, this spec should be
    construed to introduce:

      * the shader memory access language from that specification, including
        the MemoryBarrierNV() command and the tokens accepted by <barriers>
        from that specification;

      * the memoryBarrier() function to the OpenGL shading language
        specification; and

      * the capability and spec language allowing applications to enable early
        depth tests.

Dependencies on NV_gpu_shader5

    This specification requires either NV_gpu_shader5 or NV_gpu_program5.

    If NV_gpu_shader5 is supported, use of the new shading language features
    described in this extension requires

      #extension GL_NV_gpu_shader5 : enable

    If NV_gpu_shader5 is not supported, modifications to the OpenGL Shading
    Language Specification should be removed.

Dependencies on NV_gpu_program5

    If NV_gpu_program5 is supported, the extension provides support for stores
    and atomic memory transactions to buffer object memory.  Stores are
    provided by the STORE opcode; atomics are provided by the ATOM opcode.  No
    "OPTION" line is required for these features, which are implied by
    NV_gpu_program5 program headers such as "!!NVfp5.0".  The operation of
    these opcodes is described in the NV_gpu_program5 extension specification.

    Note that NV_gpu_program5 also supports the LOAD opcode originally
    added by NV_shader_buffer_load and the MEMBAR opcode originally
    provided by EXT_shader_image_load_store.


Dependencies on GLSL 4.30, ARB_shader_storage_buffer_object, and
ARB_compute_shader

    If GLSL 4.30 is supported, add the following atomic memory functions to
    section 8.11 (Atomic Memory Functions) of the GLSL 4.30 specification:

      uint atomicIncWrap(inout uint mem, uint wrap);
      uint atomicDecWrap(inout uint mem, uint wrap);

    with the following documentation:

      atomicIncWrap() computes the new value written to <mem> by adding one to
      the contents of <mem>, and then forcing the result to zero if and only
      if the incremented value is greater than or equal to <wrap>.  This
      function supports only 32-bit unsigned integer operands.

      atomicDecWrap() computes the new value written to <mem> by subtracting
      one from the contents of <mem>, and then forcing the result to <wrap>-1
      if the original value read from <mem> was either zero or greater than
      <wrap>.  This function supports only 32-bit unsigned integer operands.

    Additionally, add the following functions to the section:

      uint64_t atomicAdd(inout uint64_t mem, uint64_t data);
      uint64_t atomicExchange(inout uint64_t mem, uint64_t data);
      uint64_t atomicCompSwap(inout uint64_t mem, uint64_t compare,
                              uint64_t data);

    If ARB_shader_storage_buffer_object or ARB_compute_shader is supported,
    make similar edits to the functions documented in the
    ARB_shader_storage_buffer_object extension.

    These functions are available if and only if GL_NV_gpu_shader5 is enabled
    via the "#extension" directive.


Errors

    None.

New State

    None.

Issues

    (1) Does MAX_SHADER_BUFFER_ADDRESS_NV still apply?

      RESOLVED:  The primary reason for this limitation to exist was the lack
      of 64-bit integer support in shaders (see issue 15 of
      NV_shader_buffer_load).  Given that this extension is being released at
      the same time as NV_gpu_shader5, which adds 64-bit integer support, it
      is expected that this maximum address will match the maximum address
      supported by the GPU's address space, or will be equal to "~0ULL",
      indicating that any GPU address returned by the GL will be usable in a
      shader.

    (2) What qualifiers should be supported on pointer variables, and how can
        they be used in declarations?

      RESOLVED:  We will support the qualifiers "coherent", "volatile",
      "restrict", and "const" in pointer declarations.  "coherent" is taken
      from EXT_shader_image_load_store and is used to ensure that memory
      accesses from different shader threads are cached coherently (i.e.,
      will be able to see each other when complete).  "volatile" and "const"
      behave as in C.

      "restrict" behaves as in the C99 standard, and can be used to indicate
      that no other pointer points to the same underlying data.  This permits
      optimizations that would otherwise be impossible if the compiler has to
      assume that a pair of pointers might end up pointing to the same data.
      For example, in standard C/C++, a loop like:

        int *a, *b;
        a[0] = b[0] + b[0];
        a[1] = b[0] + b[1];
        a[2] = b[0] + b[2];

      would need to reload b[0] for each assignment because a[0] or a[1]
      might point at the same data as b[0].  With restrict, the compiler can
      assume that b[0] is not modified by any of the instructions and load it
      just once.
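      The loop above can be made concrete as a C function (sum_first and
      the arrays are illustrative names); with restrict-qualified parameters
      the compiler is free to load b[0] once and keep it in a register:

      ```c
      /* With "restrict", the compiler may assume a and b do not alias, so
         b[0] need not be reloaded after each store to a[].  Calling this
         with overlapping arrays would be undefined behavior. */
      static void sum_first(int *restrict a, const int *restrict b)
      {
          a[0] = b[0] + b[0];
          a[1] = b[0] + b[1];
          a[2] = b[0] + b[2];
      }
      ```

      The same reasoning is what the GLSL "restrict" qualifier enables for
      pointers into buffer object memory.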

    (3) What amount of automatic synchronization is provided for buffer object
        writes through pointers?

      RESOLVED:  Use of MemoryBarrierNV() is required, and there is no
      automatic synchronization when buffers are bound or unbound.  With
      resident buffers, there are no well-defined binding points in the first
      place -- all resident buffers are effectively "bound".

      Implicit synchronization is difficult, as it might require some
      combination of:

        - tracking which buffers might be written (randomly) in the shader
          itself;

        - assuming that if a shader that performs writes is executed, all
          bytes of all resident buffers could be modified and thus must be
          treated as dirty;

        - idling at the end of each primitive or draw call, so that the
          results of all previous commands are complete.

      Since normal OpenGL operation is pipelined, idling would result in a
      significant performance impact, since pipelining would otherwise allow
      fragment shader execution for draw call N while simultaneously
      performing vertex shader execution for draw call N+1.



Revision History

    Rev.    Date    Author    Changes
    ----  --------  --------  -----------------------------------------
     5    08/13/12  pbrown    Add interaction with OpenGL 4.3 (and related ARB
                              extensions) supporting atomic{Inc,Dec}Wrap and
                              64-bit unsigned integer atomics to shared and
                              shader storage buffer memory.

     4    04/13/10  pbrown    Remove the floating-point version of atomicAdd().

     3    03/23/10  pbrown    Minor cleanups to the dependency sections.
                              Fixed obsolete extension names.  Add an issue
                              on synchronization.

     2    03/16/10  pbrown    Updated memory access qualifiers section
                              (volatile, coherent, restrict, const) for
                              pointers.  Added language to document how
                              these qualifiers work in possibly complicated
                              expressions.

     1              pbrown    Internal revisions.