Name

    NV_shader_buffer_store

Name Strings

    none (implied by GL_NV_gpu_program5 or GL_NV_gpu_shader5)

Contact

    Pat Brown, NVIDIA Corporation (pbrown 'at' nvidia.com)

Status

    Shipping.

Version

    Last Modified Date:         May 25, 2022
    NVIDIA Revision:            6

Number

    390

Dependencies

    OpenGL 3.0 and GLSL 1.30 are required.

    This extension is written against the OpenGL 3.2 (Compatibility Profile)
    specification, dated July 24, 2009.

    This extension is written against version 1.50.09 of the OpenGL Shading
    Language Specification.

    NV_shader_buffer_load is required.

    NV_gpu_program5 and/or NV_gpu_shader5 is required.

    This extension interacts with EXT_shader_image_load_store.

    This extension interacts with NV_gpu_shader5.

    This extension interacts with NV_gpu_program5.

    This extension interacts with GLSL 4.30, ARB_shader_storage_buffer_object,
    and ARB_compute_shader.

    This extension interacts with OpenGL 4.2.

Overview

    This extension builds upon the mechanisms added by the
    NV_shader_buffer_load extension to allow shaders to perform random-access
    reads to buffer object memory without using dedicated buffer object
    binding points.  Instead, that extension allows an application to make a
    buffer object resident, query a GPU address (pointer) for the buffer
    object, and then use that address as a pointer in shader code.  This
    approach allows shaders to access a large number of buffer objects
    without needing to repeatedly bind buffers to a limited number of
    fixed-functionality binding points.

    This extension lifts the restriction from NV_shader_buffer_load that
    disallows writes.  In particular, the MakeBufferResidentNV function now
    allows READ_WRITE and WRITE_ONLY access modes, and the shading language
    is extended to allow shaders to write through (GPU address) pointers.
    Additionally, the extension provides built-in functions to perform atomic
    memory transactions to buffer object memory.

    As with the shader writes provided by the EXT_shader_image_load_store
    extension, writes to buffer object memory using this extension are weakly
    ordered to allow for parallel or distributed shader execution.  The
    EXT_shader_image_load_store extension provides mechanisms allowing for
    finer control of memory transaction order, and those mechanisms apply
    equally to buffer object stores using this extension.

New Procedures and Functions

    None.

New Tokens

    Accepted by the <barriers> parameter of MemoryBarrierEXT:

        SHADER_GLOBAL_ACCESS_BARRIER_BIT_NV             0x00000010

    Accepted by the <access> parameter of MakeBufferResidentNV:

        READ_WRITE
        WRITE_ONLY

Additions to Chapter 2 of the OpenGL 3.2 (Compatibility Profile) Specification
(OpenGL Operation)

    Modify Section 2.9, Buffer Objects, p. 46

    (extend the language inserted by NV_shader_buffer_load in its "Append to
    Section 2.9 (p. 45)" edit to allow READ_WRITE and WRITE_ONLY mappings)

    The data store of a buffer object may be made accessible to the GL
    via shader buffer loads and stores by calling:

        void MakeBufferResidentNV(enum target, enum access);

    <access> may be READ_ONLY, READ_WRITE, or WRITE_ONLY.  If a shader loads
    from a buffer with WRITE_ONLY <access> or stores to a buffer with
    READ_ONLY <access>, the results of that shader operation are undefined
    and may lead to application termination.  <target> may be any of the
    buffer targets accepted by BindBuffer.
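As an informative illustration (not part of the specification edits), the following C sketch shows the intended usage pattern.  MakeBufferResidentNV is described above; GetBufferParameterui64vNV and BUFFER_GPU_ADDRESS_NV come from the prerequisite NV_shader_buffer_load extension.  The sketch assumes a current OpenGL context with these extensions available and omits error handling, so it is not compilable standalone.

```c
    /* Informative sketch only -- assumes a buffer object is already
     * bound to the ARRAY_BUFFER target of a current context. */
    GLuint64EXT gpuAddress;

    /* Make the buffer's data store resident with read/write access,
     * as newly permitted by this extension. */
    glMakeBufferResidentNV(GL_ARRAY_BUFFER, GL_READ_WRITE);

    /* Query the buffer's GPU address (from NV_shader_buffer_load); the
     * application can then hand this value to a shader, e.g. via a
     * uniform, for use as a pointer. */
    glGetBufferParameterui64vNV(GL_ARRAY_BUFFER, GL_BUFFER_GPU_ADDRESS_NV,
                                &gpuAddress);
```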
    The data store of a buffer object may be made inaccessible to the GL
    via shader buffer loads and stores by calling:

        void MakeBufferNonResidentNV(enum target);

    Modify "Section 2.20.X, Shader Memory Access" introduced by the
    NV_shader_buffer_load specification, to reflect that shaders may store to
    buffer object memory.

    (first paragraph)  Shaders may load from or store to buffer object
    memory by dereferencing pointer variables. ...

    (second paragraph)  When a shader dereferences a pointer variable, data
    are read from or written to buffer object memory according to the
    following rules:

    (modify the paragraph after the end of the alignment and stride rules,
    allowing for writes, and also providing rules forbidding reads of
    WRITE_ONLY mappings and writes to READ_ONLY mappings)  If a shader reads
    or writes to a GPU memory address that does not correspond to a buffer
    object made resident by MakeBufferResidentNV, the results of the
    operation are undefined and may result in application termination.  If a
    shader reads from a buffer object made resident with an <access>
    parameter of WRITE_ONLY, or writes to a buffer object made resident with
    an <access> parameter of READ_ONLY, the results of the operation are
    also undefined and may lead to application termination.

    Incorporate the contents of "Section 2.14.X, Shader Memory Access" from
    the EXT_shader_image_load_store specification into the same "Shader
    Memory Access" section, with the following edits.

    (modify first paragraph to reference pointers)  Shaders may perform
    random-access reads and writes to texture or buffer object memory using
    pointers or with built-in image load, store, and atomic functions, as
    described in the OpenGL Shading Language Specification. ...
    (add to list of bits in <barriers> in MemoryBarrierEXT)

      - SHADER_GLOBAL_ACCESS_BARRIER_BIT_NV:  Memory accesses using pointers
        and assembly program global loads, stores, and atomics issued after
        the barrier will reflect data written by shaders prior to the
        barrier.  Additionally, memory writes using pointers issued after
        the barrier will not execute until memory accesses (loads, stores,
        texture fetches, vertex fetches, etc.) initiated prior to the
        barrier complete.

    (modify second paragraph after the list of <barriers> bits)  To allow
    independent shader threads to communicate by reads and writes to a
    common memory address, pointers and image variables in the OpenGL
    Shading Language may be declared as "coherent".  Buffer object or
    texture memory accessed through such variables may be cached only if...

    (add to the coherency guidelines)

      - Data written using pointers in one rendering pass and read by the
        shader in a later pass need not use coherent variables or
        memoryBarrier().  Calling MemoryBarrierEXT() with the
        SHADER_GLOBAL_ACCESS_BARRIER_BIT_NV bit set in <barriers> between
        passes is necessary.

Additions to Chapter 3 of the OpenGL 3.2 (Compatibility Profile) Specification
(Rasterization)

    None.

Additions to Chapter 4 of the OpenGL 3.2 (Compatibility Profile) Specification
(Per-Fragment Operations and the Frame Buffer)

    None.

Additions to Chapter 5 of the OpenGL 3.2 (Compatibility Profile) Specification
(Special Functions)

    None.

Additions to Chapter 6 of the OpenGL 3.2 (Compatibility Profile) Specification
(State and State Requests)

    None.

Additions to Appendix A of the OpenGL 3.2 (Compatibility Profile)
Specification (Invariance)

    None.

Additions to the AGL/GLX/WGL Specifications

    None.

GLX Protocol

    None.
Additions to the OpenGL Shading Language Specification, Version 1.50 (Revision
09)

    Modify Section 4.3.X, Memory Access Qualifiers, as added by
    EXT_shader_image_load_store

    (modify second paragraph)  Memory accesses to image and pointer
    variables declared using the "coherent" storage qualifier are performed
    coherently with similar accesses from other shader threads. ...

    (modify fourth paragraph)  Memory accesses to image and pointer
    variables declared using the "volatile" storage qualifier must treat the
    underlying memory as though it could be read or written at any point
    during shader execution by some source other than the executing
    thread. ...

    (modify fifth paragraph)  Memory accesses to image and pointer variables
    declared using the "restrict" storage qualifier may be compiled assuming
    that the variable used to perform the memory access is the only way to
    access the underlying memory using the shader stage in question. ...

    (modify sixth paragraph)  Memory accesses to image and pointer variables
    declared using the "const" storage qualifier may only read the
    underlying memory, which is treated as read-only. ...

    (insert after seventh paragraph)

    In pointer variable declarations, the "coherent", "volatile",
    "restrict", and "const" qualifiers can be positioned anywhere in the
    declaration, and may qualify either a pointer or the underlying data
    being pointed to, depending on their position in the declaration.  Each
    qualifier to the right of the basic data type in a declaration is
    considered to apply to whatever type is found immediately to its left;
    qualifiers to the left of the basic type are considered to apply to that
    basic type.  To interpret the meaning of qualifiers in pointer
    declarations, it is useful to read the declaration from right to left,
    as in the following examples.
        int * * const a;     // a is a constant pointer to a pointer to int
        int * volatile * b;  // b is a pointer to a volatile pointer to int
        int const * * c;     // c is a pointer to a pointer to a constant int
        const int * * d;     // d is like c
        int const * const *  // e is a constant pointer to a constant pointer
          const e;           //   to a constant int

    For pointer types, the "restrict" qualifier can be used to qualify
    pointers, but not non-pointer types being pointed to.

        int * restrict a;    // a is a restricted pointer to int
        int restrict * b;    // b qualifies "int" as restricted - illegal

    (modify eighth paragraph)  The "coherent", "volatile", and "restrict"
    storage qualifiers may only be used on image and pointer variables, and
    may not be used on variables of any other type. ...

    (modify last paragraph)  The values of image and pointer variables
    qualified with "coherent", "volatile", "restrict", or "const" may not be
    assigned to function parameters or l-values lacking such qualifiers.

    (add examples for the last paragraph)

        int volatile * var1;
        int * var2;
        int * restrict var3;
        var1 = var2;         // OK, adding "volatile" is allowed
        var2 = var3;         // illegal, stripping "restrict" is not

    Modify Section 5.X, Pointer Operations, as added by NV_shader_buffer_load

    (modify second paragraph, allowing stores through pointers)  The pointer
    dereference operator ...  The result of a pointer dereference may be
    used as the left-hand side of an assignment.

    Modify Section 8.Y, Shader Memory Functions, as added by
    EXT_shader_image_load_store

    (modify first paragraph)  Shaders of all types may read and write the
    contents of textures and buffer objects using pointers and image
    variables. ...

    (modify description of memoryBarrier)  memoryBarrier() can be used to
    control the ordering of memory transactions issued by a shader thread.
    When called, it will wait on the completion of all memory accesses
    resulting from the use of pointers and image variables prior to calling
    the function. ...

    (add the following paragraphs to the end of the section)

    If multiple threads need to atomically access shared memory addresses
    using pointers, they may do so using the following built-in functions.
    These atomic memory access functions allow a shader thread to read,
    modify, and write an address in memory in a manner that guarantees that
    no other shader thread can modify the memory between the read and the
    write.  All of these functions read a single data element from memory,
    compute a new value based on the value read from memory and one or more
    other values passed to the function, and write the result back to the
    same memory address.  The value returned to the caller is always the
    data element originally read from memory.

    Syntax:

        uint     atomicAdd(uint *address, uint data);
        int      atomicAdd(int *address, int data);
        uint64_t atomicAdd(uint64_t *address, uint64_t data);

        uint     atomicMin(uint *address, uint data);
        int      atomicMin(int *address, int data);

        uint     atomicMax(uint *address, uint data);
        int      atomicMax(int *address, int data);

        uint     atomicIncWrap(uint *address, uint wrap);

        uint     atomicDecWrap(uint *address, uint wrap);

        uint     atomicAnd(uint *address, uint data);
        int      atomicAnd(int *address, int data);

        uint     atomicOr(uint *address, uint data);
        int      atomicOr(int *address, int data);

        uint     atomicXor(uint *address, uint data);
        int      atomicXor(int *address, int data);

        uint     atomicExchange(uint *address, uint data);
        int      atomicExchange(int *address, int data);
        uint64_t atomicExchange(uint64_t *address, uint64_t data);

        uint     atomicCompSwap(uint *address, uint compare, uint data);
        int      atomicCompSwap(int *address, int compare, int data);
        uint64_t atomicCompSwap(uint64_t *address, uint64_t compare,
                                uint64_t data);

    Description:

    atomicAdd() computes the new value written to <address> by adding the
    value of <data> to the contents of <address>.  This function supports
    32- and 64-bit unsigned integer operands and 32-bit signed integer
    operands.

    atomicMin() computes the new value written to <address> by taking the
    minimum of the value of <data> and the contents of <address>.  This
    function supports 32-bit signed and unsigned integer operands.

    atomicMax() computes the new value written to <address> by taking the
    maximum of the value of <data> and the contents of <address>.  This
    function supports 32-bit signed and unsigned integer operands.

    atomicIncWrap() computes the new value written to <address> by adding
    one to the contents of <address>, and then forcing the result to zero if
    and only if the incremented value is greater than or equal to <wrap>.
    This function supports only 32-bit unsigned integer operands.

    atomicDecWrap() computes the new value written to <address> by
    subtracting one from the contents of <address>, and then forcing the
    result to <wrap>-1 if the original value read from <address> was either
    zero or greater than <wrap>.  This function supports only 32-bit
    unsigned integer operands.

    atomicAnd() computes the new value written to <address> by performing a
    bitwise AND of the value of <data> and the contents of <address>.  This
    function supports 32-bit signed and unsigned integer operands.

    atomicOr() computes the new value written to <address> by performing a
    bitwise OR of the value of <data> and the contents of <address>.  This
    function supports 32-bit signed and unsigned integer operands.

    atomicXor() computes the new value written to <address> by performing a
    bitwise exclusive OR of the value of <data> and the contents of
    <address>.
    This function supports 32-bit signed and unsigned integer operands.

    atomicExchange() uses the value of <data> as the value written to
    <address>.  This function supports 32- and 64-bit unsigned integer
    operands and 32-bit signed integer operands.

    atomicCompSwap() compares the value of <compare> and the contents of
    <address>.  If the values are equal, <data> is written to <address>;
    otherwise, the original contents of <address> are preserved.  This
    function supports 32- and 64-bit unsigned integer operands and 32-bit
    signed integer operands.

    Modify Section 9, Shading Language Grammar, p. 105

    !!! TBD:  Add grammar constructs for memory access qualifiers, allowing
        memory access qualifiers before or after the type and the "*"
        characters indicating pointers in a variable declaration.

Dependencies on EXT_shader_image_load_store

    This specification incorporates the memory access ordering and
    synchronization discussion from EXT_shader_image_load_store verbatim.

    If EXT_shader_image_load_store is not supported, this specification
    should be construed to introduce:

      * the shader memory access language from that specification, including
        the MemoryBarrierEXT() command and the tokens accepted by <barriers>
        from that specification;

      * the memoryBarrier() function to the OpenGL Shading Language
        Specification; and

      * the capability and spec language allowing applications to enable
        early depth tests.

Dependencies on NV_gpu_shader5

    This specification requires either NV_gpu_shader5 or NV_gpu_program5.

    If NV_gpu_shader5 is supported, use of the new shading language features
    described in this extension requires

        #extension GL_NV_gpu_shader5 : enable

    If NV_gpu_shader5 is not supported, the modifications to the OpenGL
    Shading Language Specification should be removed.
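As an informative aside, the common read-modify-write pattern shared by the atomic functions of Section 8.Y above can be modeled in plain C.  The sketch below is single-threaded and provides no actual atomicity; the sim_* helper names are hypothetical and exist only to illustrate the data flow: read the original element, compute and store the new value, and return the original.

```c
#include <stdint.h>

/* Single-threaded model of atomicExchange(): <data> becomes the value
 * written to <address>; the original contents are returned. */
static uint32_t sim_atomicExchange(uint32_t *address, uint32_t data)
{
    uint32_t old = *address;
    *address = data;
    return old;
}

/* Single-threaded model of atomicCompSwap(): <data> is written only if
 * the contents of <address> equal <compare>; otherwise the contents are
 * preserved.  The original contents are returned either way. */
static uint32_t sim_atomicCompSwap(uint32_t *address, uint32_t compare,
                                   uint32_t data)
{
    uint32_t old = *address;
    if (old == compare)
        *address = data;
    return old;
}
```

On a GPU, each of these sequences executes atomically with respect to all other shader threads; the C model only shows which value is stored and which is returned.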
Dependencies on NV_gpu_program5

    If NV_gpu_program5 is supported, this extension provides support for
    stores and atomic memory transactions to buffer object memory.  Stores
    are provided by the STORE opcode; atomics are provided by the ATOM
    opcode.  No "OPTION" line is required for these features, which are
    implied by NV_gpu_program5 program headers such as "!!NVfp5.0".  The
    operation of these opcodes is described in the NV_gpu_program5 extension
    specification.

    Note that NV_gpu_program5 also supports the LOAD opcode originally added
    by NV_shader_buffer_load and the MEMBAR opcode originally provided by
    EXT_shader_image_load_store.

Dependencies on GLSL 4.30, ARB_shader_storage_buffer_object, and
ARB_compute_shader

    If GLSL 4.30 is supported, add the following atomic memory functions to
    section 8.11 (Atomic Memory Functions) of the GLSL 4.30 specification:

        uint atomicIncWrap(inout uint mem, uint wrap);
        uint atomicDecWrap(inout uint mem, uint wrap);

    with the following documentation:

    atomicIncWrap() computes the new value written to <mem> by adding one to
    the contents of <mem>, and then forcing the result to zero if and only
    if the incremented value is greater than or equal to <wrap>.  This
    function supports only 32-bit unsigned integer operands.

    atomicDecWrap() computes the new value written to <mem> by subtracting
    one from the contents of <mem>, and then forcing the result to <wrap>-1
    if the original value read from <mem> was either zero or greater than
    <wrap>.  This function supports only 32-bit unsigned integer operands.
    Additionally, add the following functions to the section:

        uint64_t atomicAdd(inout uint64_t mem, uint64_t data);
        uint64_t atomicExchange(inout uint64_t mem, uint64_t data);
        uint64_t atomicCompSwap(inout uint64_t mem, uint64_t compare,
                                uint64_t data);

    If ARB_shader_storage_buffer_object or ARB_compute_shader is supported,
    make similar edits to the functions documented in the
    ARB_shader_storage_buffer_object extension.

    These functions are available if and only if GL_NV_gpu_shader5 is
    enabled via the "#extension" directive.

Dependencies on OpenGL 4.2

    If OpenGL 4.2 is supported, MemoryBarrierEXT can be replaced with the
    equivalent core function MemoryBarrier.

Errors

    None.

New State

    None.

Issues

    (1) Does MAX_SHADER_BUFFER_ADDRESS_NV still apply?

      RESOLVED:  The primary reason for this limitation to exist was the
      lack of 64-bit integer support in shaders (see issue 15 of
      NV_shader_buffer_load).  Given that this extension is being released
      at the same time as NV_gpu_shader5, which adds 64-bit integer support,
      it is expected that this maximum address will match the maximum
      address supported by the GPU's address space, or will be equal to
      "~0ULL", indicating that any GPU address returned by the GL will be
      usable in a shader.

    (2) What qualifiers should be supported on pointer variables, and how
        can they be used in declarations?

      RESOLVED:  We will allow the qualifiers "coherent", "volatile",
      "restrict", and "const" to be used in pointer declarations.
      "coherent" is taken from EXT_shader_image_load_store and is used to
      ensure that memory accesses from different shader threads are cached
      coherently (i.e., will be able to see each other when complete).
      "volatile" and "const" behave as in C.
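As an informative aside, the wrapping rules for atomicIncWrap() and atomicDecWrap() documented above can be modeled in plain C.  The sketch is single-threaded with no real atomicity, and the sim_* names are hypothetical; it only illustrates the arithmetic.

```c
#include <stdint.h>

/* Single-threaded model of atomicIncWrap(): add one, then force the
 * result to zero iff the incremented value reaches or exceeds <wrap>.
 * The original contents are returned. */
static uint32_t sim_atomicIncWrap(uint32_t *mem, uint32_t wrap)
{
    uint32_t old = *mem;
    *mem = (old + 1 >= wrap) ? 0 : old + 1;
    return old;
}

/* Single-threaded model of atomicDecWrap(): subtract one, but force the
 * result to <wrap>-1 if the original value was zero or greater than
 * <wrap>.  The original contents are returned. */
static uint32_t sim_atomicDecWrap(uint32_t *mem, uint32_t wrap)
{
    uint32_t old = *mem;
    *mem = (old == 0 || old > wrap) ? wrap - 1 : old - 1;
    return old;
}
```

Together the two functions cycle a counter through the range [0, wrap-1], which is useful for ring-buffer style indexing in shaders.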
      "restrict" behaves as in the C99 standard and can be used to indicate
      that no other pointer points to the same underlying data.  This
      permits optimizations that would otherwise be impossible if the
      compiler had to assume that a pair of pointers might end up pointing
      to the same data.  For example, in standard C/C++, code like:

        int *a, *b;
        a[0] = b[0] + b[0];
        a[1] = b[0] + b[1];
        a[2] = b[0] + b[2];

      would need to reload b[0] for each assignment, because the stores to
      a[0] and a[1] might modify the data addressed by b[0].  With
      "restrict", the compiler can assume that b[0] is not modified by any
      of the assignments and load it just once.

    (3) What amount of automatic synchronization is provided for buffer
        object writes through pointers?

      RESOLVED:  Use of MemoryBarrierEXT() is required; there is no
      automatic synchronization when buffers are bound or unbound.  With
      resident buffers, there are no well-defined binding points in the
      first place -- all resident buffers are effectively "bound".

      Implicit synchronization is difficult, as it might require some
      combination of:

        - tracking which buffers might be written (randomly) in the shader
          itself;

        - assuming that if a shader that performs writes is executed, all
          bytes of all resident buffers could be modified and thus must be
          treated as dirty;

        - idling at the end of each primitive or draw call, so that the
          results of all previous commands are complete.

      Since normal OpenGL operation is pipelined, idling would result in a
      significant performance impact, since pipelining otherwise allows
      fragment shader execution for draw call N while simultaneously
      performing vertex shader execution for draw call N+1.

Revision History
    Rev.    Date    Author   Changes
    ----  --------  -------- ------------------------------------------------
     6    05/25/22  shqxu    Update to address removal of the function
                             MemoryBarrierNV and its replacement with
                             MemoryBarrierEXT.  Add an interaction with
                             OpenGL 4.2, which supports MemoryBarrier.

     5    08/13/12  pbrown   Add interactions with OpenGL 4.3 (and related
                             ARB extensions) supporting atomic{Inc,Dec}Wrap
                             and 64-bit unsigned integer atomics on shared
                             and shader storage buffer memory.

     4    04/13/10  pbrown   Remove the floating-point version of
                             atomicAdd().

     3    03/23/10  pbrown   Minor cleanups to the dependency sections.
                             Fixed obsolete extension names.  Added an issue
                             on synchronization.

     2    03/16/10  pbrown   Updated the memory access qualifiers section
                             (volatile, coherent, restrict, const) for
                             pointers.  Added language documenting how these
                             qualifiers work in possibly complicated
                             expressions.

     1              pbrown   Internal revisions.