Name

    NV_shader_buffer_store

Name Strings

    none (implied by GL_NV_gpu_program5 or GL_NV_gpu_shader5)

Contact

    Pat Brown, NVIDIA Corporation (pbrown 'at' nvidia.com)

Status

    Shipping.

Version

    Last Modified Date:         August 13, 2012
    NVIDIA Revision:            5

Number

    390

Dependencies

    OpenGL 3.0 and GLSL 1.30 are required.

    This extension is written against the OpenGL 3.2 (Compatibility Profile)
    specification, dated July 24, 2009.

    This extension is written against version 1.50.09 of the OpenGL Shading
    Language Specification.

    NV_shader_buffer_load is required.

    NV_gpu_program5 and/or NV_gpu_shader5 is required.

    This extension interacts with EXT_shader_image_load_store.

    This extension interacts with NV_gpu_shader5.

    This extension interacts with NV_gpu_program5.

    This extension interacts with GLSL 4.30, ARB_shader_storage_buffer_object,
    and ARB_compute_shader.

Overview

    This extension builds upon the mechanisms added by the
    NV_shader_buffer_load extension to allow shaders to perform random-access
    reads to buffer object memory without using dedicated buffer object
    binding points. Instead, NV_shader_buffer_load allows an application to
    make a buffer object resident, query a GPU address (pointer) for the
    buffer object, and then use that address as a pointer in shader code.
    This approach allows shaders to access a large number of buffer objects
    without needing to repeatedly bind buffers to a limited number of
    fixed-functionality binding points.

    This extension lifts the restriction from NV_shader_buffer_load that
    disallows writes. In particular, the MakeBufferResidentNV function now
    allows READ_WRITE and WRITE_ONLY access modes, and the shading language is
    extended to allow shaders to write through (GPU address) pointers.
    Additionally, the extension provides built-in functions to perform atomic
    memory transactions to buffer object memory.

    As with the shader writes provided by the EXT_shader_image_load_store
    extension, writes to buffer object memory using this extension are weakly
    ordered to allow for parallel or distributed shader execution. The
    EXT_shader_image_load_store extension provides mechanisms allowing for
    finer control of memory transaction order, and those mechanisms apply
    equally to buffer object stores using this extension.


New Procedures and Functions

    None.

New Tokens

    Accepted by the <barriers> parameter of MemoryBarrierNV:

        SHADER_GLOBAL_ACCESS_BARRIER_BIT_NV             0x00000010

    Accepted by the <access> parameter of MakeBufferResidentNV:

        READ_WRITE
        WRITE_ONLY


Additions to Chapter 2 of the OpenGL 3.2 (Compatibility Profile) Specification
(OpenGL Operation)

    Modify Section 2.9, Buffer Objects, p. 46

    (extend the language inserted by NV_shader_buffer_load in its "Append to
    Section 2.9 (p. 45)" to allow READ_WRITE and WRITE_ONLY mappings)

    The data store of a buffer object may be made accessible to the GL
    via shader buffer loads and stores by calling:

        void MakeBufferResidentNV(enum target, enum access);

    <access> may be READ_ONLY, READ_WRITE, or WRITE_ONLY. If a shader loads
    from a buffer with WRITE_ONLY <access> or stores to a buffer with
    READ_ONLY <access>, the results of that shader operation are undefined and
    may lead to application termination. <target> may be any of the buffer
    targets accepted by BindBuffer.
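
    The access rule above can be summarized with a small C model. This is
    only an illustrative sketch: the enum and function names are hypothetical
    and are not part of the GL API, and the model says nothing about what an
    implementation actually does on an undefined access. Atomic functions are
    left out because the text above does not state which <access> modes they
    require.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names for illustration only; they mirror the READ_ONLY,
 * READ_WRITE and WRITE_ONLY values of <access> but are not GL tokens. */
typedef enum {
    ACCESS_READ_ONLY,
    ACCESS_READ_WRITE,
    ACCESS_WRITE_ONLY
} resident_access;

typedef enum { OP_LOAD, OP_STORE } shader_op;

/* Returns true when the spec text gives the operation defined results,
 * false when the results are undefined (and may terminate the application). */
bool shader_op_is_defined(resident_access access, shader_op op)
{
    switch (access) {
    case ACCESS_READ_ONLY:  return op == OP_LOAD;   /* stores are undefined */
    case ACCESS_WRITE_ONLY: return op == OP_STORE;  /* loads are undefined  */
    case ACCESS_READ_WRITE: return true;            /* both are defined     */
    }
    return false; /* unknown access value: treat as undefined */
}
```

    For example, shader_op_is_defined(ACCESS_WRITE_ONLY, OP_LOAD) is false,
    matching the statement that loading from a WRITE_ONLY buffer is undefined.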

    The data store of a buffer object may be made inaccessible to the GL
    via shader buffer loads and stores by calling:

        void MakeBufferNonResidentNV(enum target);


    Modify "Section 2.20.X, Shader Memory Access" introduced by the
    NV_shader_buffer_load specification, to reflect that shaders may store to
    buffer object memory.

    (first paragraph) Shaders may load from or store to buffer object memory
    by dereferencing pointer variables. ...

    (second paragraph) When a shader dereferences a pointer variable, data are
    read from or written to buffer object memory according to the following
    rules:

    (modify the paragraph after the end of the alignment and stride rules,
    allowing for writes, and also providing rules forbidding reads to
    WRITE_ONLY mappings or vice-versa) If a shader reads from or writes to a
    GPU memory address that does not correspond to a buffer object made
    resident by MakeBufferResidentNV, the results of the operation are
    undefined and may result in application termination. If a shader reads
    from a buffer object made resident with an <access> parameter of
    WRITE_ONLY, or writes to a buffer object made resident with an <access>
    parameter of READ_ONLY, the results of the operation are also undefined
    and may lead to application termination.

    Incorporate the contents of "Section 2.14.X, Shader Memory Access" from
    the EXT_shader_image_load_store specification into the same "Shader Memory
    Access", with the following edits.

    (modify first paragraph to reference pointers) Shaders may perform
    random-access reads and writes to texture or buffer object memory using
    pointers or with built-in image load, store, and atomic functions, as
    described in the OpenGL Shading Language Specification. ...

    (add to list of bits in <barriers> in MemoryBarrierNV)

    - SHADER_GLOBAL_ACCESS_BARRIER_BIT_NV: Memory accesses using pointers and
      assembly program global loads, stores, and atomics issued after the
      barrier will reflect data written by shaders prior to the barrier.
      Additionally, memory writes using pointers issued after the barrier
      will not execute until memory accesses (loads, stores, texture
      fetches, vertex fetches, etc) initiated prior to the barrier complete.

    (modify second paragraph after the list of <barriers> bits) To allow for
    independent shader threads to communicate by reads and writes to a common
    memory address, pointers and image variables in the OpenGL shading
    language may be declared as "coherent". Buffer object or texture memory
    accessed through such variables may be cached only if...

    (add to the coherency guidelines)

    - Data written using pointers in one rendering pass and read by the shader
      in a later pass need not use coherent variables or memoryBarrier().
      Calling MemoryBarrierNV() with the SHADER_GLOBAL_ACCESS_BARRIER_BIT_NV
      set in <barriers> between passes is necessary.


Additions to Chapter 3 of the OpenGL 3.2 (Compatibility Profile) Specification
(Rasterization)

    None.


Additions to Chapter 4 of the OpenGL 3.2 (Compatibility Profile) Specification
(Per-Fragment Operations and the Frame Buffer)

    None.


Additions to Chapter 5 of the OpenGL 3.2 (Compatibility Profile) Specification
(Special Functions)

    None.


Additions to Chapter 6 of the OpenGL 3.2 (Compatibility Profile) Specification
(State and State Requests)

    None.


Additions to Appendix A of the OpenGL 3.2 (Compatibility Profile)
Specification (Invariance)

    None.

Additions to the AGL/GLX/WGL Specifications

    None.

GLX Protocol

    None.


Additions to the OpenGL Shading Language Specification, Version 1.50
(Revision 09)

    Modify Section 4.3.X, Memory Access Qualifiers, as added by
    EXT_shader_image_load_store

    (modify second paragraph) Memory accesses to image and pointer variables
    declared using the "coherent" storage qualifier are performed coherently
    with similar accesses from other shader threads. ...

    (modify fourth paragraph) Memory accesses to image and pointer variables
    declared using the "volatile" storage qualifier must treat the underlying
    memory as though it could be read or written at any point during shader
    execution by some source other than the executing thread. ...

    (modify fifth paragraph) Memory accesses to image and pointer variables
    declared using the "restrict" storage qualifier may be compiled assuming
    that the variable used to perform the memory access is the only way to
    access the underlying memory using the shader stage in question. ...

    (modify sixth paragraph) Memory accesses to image and pointer variables
    declared using the "const" storage qualifier may only read the underlying
    memory, which is treated as read-only. ...

    (insert after seventh paragraph)

    In pointer variable declarations, the "coherent", "volatile", "restrict",
    and "const" qualifiers can be positioned anywhere in the declaration, and
    may qualify either a pointer or the underlying data being pointed to,
    depending on their position in the declaration. Each qualifier to the
    right of the basic data type in a declaration is considered to apply to
    whatever type is found immediately to its left; qualifiers to the left of
    the basic type are considered to apply to that basic type. To interpret
    the meaning of qualifiers in pointer declarations, it is useful to read
    the declaration from right to left as in the following examples.

      int * * const     a; // a is a constant pointer to a pointer to int
      int * volatile *  b; // b is a pointer to a volatile pointer to int
      int const * *     c; // c is a pointer to a pointer to a constant int
      const int * *     d; // d is like c
      int const * const *  // e is a constant pointer to a constant pointer
        const           e; //   to a constant int

    For pointer types, the "restrict" qualifier can be used to qualify
    pointers, but not non-pointer types being pointed to.

      int * restrict    a; // a is a restricted pointer to int
      int restrict *    b; // b qualifies "int" as restricted - illegal

    (modify eighth paragraph) The "coherent", "volatile", and "restrict"
    storage qualifiers may only be used on image and pointer variables, and
    may not be used on variables of any other type. ...

    (modify last paragraph) The values of image and pointer variables
    qualified with "coherent", "volatile", "restrict", or "const" may not be
    assigned to function parameters or l-values lacking such qualifiers.

    (add examples for the last paragraph)

      int volatile * var1;
      int * var2;
      int * restrict var3;
      var1 = var2;              // OK, adding "volatile" is allowed
      var2 = var3;              // illegal, stripping "restrict" is not


    Modify Section 5.X, Pointer Operations, as added by NV_shader_buffer_load

    (modify second paragraph, allowing storing through pointers) The pointer
    dereference operator ... The result of a pointer dereference may be used
    as the left-hand side of an assignment.


    Modify Section 8.Y, Shader Memory Functions, as added by
    EXT_shader_image_load_store

    (modify first paragraph) Shaders of all types may read and write the
    contents of textures and buffer objects using pointers and image
    variables. ...

    (modify description of memoryBarrier) memoryBarrier() can be used to
    control the ordering of memory transactions issued by a shader thread.
    When called, it will wait on the completion of all memory accesses
    resulting from the use of pointers and image variables prior to calling
    the function. ...

    (add the following paragraphs to the end of the section)

    If multiple threads need to atomically access shared memory addresses
    using pointers, they may do so using the following built-in functions.
    The following atomic memory access functions allow a shader thread to
    read, modify, and write an address in memory in a manner that guarantees
    that no other shader thread can modify the memory between the read and the
    write. All of these functions read a single data element from memory,
    compute a new value based on the value read from memory and one or more
    other values passed to the function, and write the result back to the
    same memory address. The value returned to the caller is always the data
    element originally read from memory.

    Syntax:

      uint      atomicAdd(uint *address, uint data);
      int       atomicAdd(int *address, int data);
      uint64_t  atomicAdd(uint64_t *address, uint64_t data);

      uint      atomicMin(uint *address, uint data);
      int       atomicMin(int *address, int data);

      uint      atomicMax(uint *address, uint data);
      int       atomicMax(int *address, int data);

      uint      atomicIncWrap(uint *address, uint wrap);

      uint      atomicDecWrap(uint *address, uint wrap);

      uint      atomicAnd(uint *address, uint data);
      int       atomicAnd(int *address, int data);

      uint      atomicOr(uint *address, uint data);
      int       atomicOr(int *address, int data);

      uint      atomicXor(uint *address, uint data);
      int       atomicXor(int *address, int data);

      uint      atomicExchange(uint *address, uint data);
      int       atomicExchange(int *address, int data);
      uint64_t  atomicExchange(uint64_t *address, uint64_t data);

      uint      atomicCompSwap(uint *address, uint compare, uint data);
      int       atomicCompSwap(int *address, int compare, int data);
      uint64_t  atomicCompSwap(uint64_t *address, uint64_t compare,
                               uint64_t data);

    Description:

    atomicAdd() computes the new value written to <address> by adding the
    value of <data> to the contents of <address>. This function supports 32-
    and 64-bit unsigned integer operands, and 32-bit signed integer operands.

    atomicMin() computes the new value written to <address> by taking the
    minimum of the value of <data> and the contents of <address>. This
    function supports 32-bit signed and unsigned integer operands.

    atomicMax() computes the new value written to <address> by taking the
    maximum of the value of <data> and the contents of <address>. This
    function supports 32-bit signed and unsigned integer operands.

    atomicIncWrap() computes the new value written to <address> by adding one
    to the contents of <address>, and then forcing the result to zero if and
    only if the incremented value is greater than or equal to <wrap>. This
    function supports only 32-bit unsigned integer operands.

    atomicDecWrap() computes the new value written to <address> by subtracting
    one from the contents of <address>, and then forcing the result to
    <wrap>-1 if the original value read from <address> was either zero or
    greater than <wrap>. This function supports only 32-bit unsigned integer
    operands.

    atomicAnd() computes the new value written to <address> by performing a
    bitwise and of the value of <data> and the contents of <address>. This
    function supports 32-bit signed and unsigned integer operands.

    atomicOr() computes the new value written to <address> by performing a
    bitwise or of the value of <data> and the contents of <address>. This
    function supports 32-bit signed and unsigned integer operands.

    atomicXor() computes the new value written to <address> by performing a
    bitwise exclusive or of the value of <data> and the contents of <address>.
    This function supports 32-bit signed and unsigned integer operands.

    atomicExchange() uses the value of <data> as the value written to
    <address>. This function supports 32- and 64-bit unsigned integer
    operands and 32-bit signed integer operands.

    atomicCompSwap() compares the value of <compare> and the contents of
    <address>. If the values are equal, <data> is written to <address>;
    otherwise, the original contents of <address> are preserved. This
    function supports 32- and 64-bit unsigned integer operands and 32-bit
    signed integer operands.


    Modify Section 9, Shading Language Grammar, p. 105

    !!! TBD: Add grammar constructs for memory access qualifiers, allowing
        memory access qualifiers before or after the type and the "*"
        characters indicating pointers in a variable declaration.


Dependencies on EXT_shader_image_load_store

    This specification incorporates the memory access ordering and
    synchronization discussion from EXT_shader_image_load_store verbatim.

    If EXT_shader_image_load_store is not supported, this spec should be
    construed to introduce:

      * the shader memory access language from that specification, including
        the MemoryBarrierNV() command and the tokens accepted by <barriers>
        from that specification;

      * the memoryBarrier() function to the OpenGL shading language
        specification; and

      * the capability and spec language allowing applications to enable early
        depth tests.

Dependencies on NV_gpu_shader5

    This specification requires either NV_gpu_shader5 or NV_gpu_program5.

    If NV_gpu_shader5 is supported, use of the new shading language features
    described in this extension requires

      #extension GL_NV_gpu_shader5 : enable

    If NV_gpu_shader5 is not supported, modifications to the OpenGL Shading
    Language Specification should be removed.

Dependencies on NV_gpu_program5

    If NV_gpu_program5 is supported, the extension provides support for stores
    and atomic memory transactions to buffer object memory. Stores are
    provided by the STORE opcode; atomics are provided by the ATOM opcode. No
    "OPTION" line is required for these features, which are implied by
    NV_gpu_program5 program headers such as "!!NVfp5.0". The operation of
    these opcodes is described in the NV_gpu_program5 extension specification.

    Note that NV_gpu_program5 also supports the LOAD opcode originally added
    by NV_shader_buffer_load and the MEMBAR opcode originally provided by
    EXT_shader_image_load_store.


Dependencies on GLSL 4.30, ARB_shader_storage_buffer_object, and
ARB_compute_shader

    If GLSL 4.30 is supported, add the following atomic memory functions to
    section 8.11 (Atomic Memory Functions) of the GLSL 4.30 specification:

      uint atomicIncWrap(inout uint mem, uint wrap);
      uint atomicDecWrap(inout uint mem, uint wrap);

    with the following documentation:

      atomicIncWrap() computes the new value written to <mem> by adding one to
      the contents of <mem>, and then forcing the result to zero if and only
      if the incremented value is greater than or equal to <wrap>. This
      function supports only 32-bit unsigned integer operands.

      atomicDecWrap() computes the new value written to <mem> by subtracting
      one from the contents of <mem>, and then forcing the result to <wrap>-1
      if the original value read from <mem> was either zero or greater than
      <wrap>. This function supports only 32-bit unsigned integer operands.
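
    The value computations described for atomicIncWrap() and atomicDecWrap()
    can be sketched as a single-threaded C reference model. This is only an
    illustration under stated assumptions: the model_* function names are
    hypothetical, the model is not atomic, and the behavior for a <wrap> of
    zero (where <wrap>-1 is taken here with 32-bit unsigned wraparound) is an
    assumption that the text above does not address.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of atomicIncWrap(): returns the value originally read.
 * The stored value is (old + 1), forced to zero if and only if the
 * incremented value is greater than or equal to wrap. */
uint32_t model_atomic_inc_wrap(uint32_t *mem, uint32_t wrap)
{
    uint32_t old = *mem;
    uint32_t incremented = old + 1u;  /* 32-bit unsigned arithmetic */
    *mem = (incremented >= wrap) ? 0u : incremented;
    return old;
}

/* Hypothetical model of atomicDecWrap(): returns the value originally read.
 * The stored value is (old - 1), forced to (wrap - 1) if the original value
 * was either zero or greater than wrap. */
uint32_t model_atomic_dec_wrap(uint32_t *mem, uint32_t wrap)
{
    uint32_t old = *mem;
    *mem = (old == 0u || old > wrap) ? wrap - 1u : old - 1u;
    return old;
}

/* Usage: step an index around a ring of four entries. */
void model_wrap_demo(void)
{
    uint32_t index = 2u;
    assert(model_atomic_inc_wrap(&index, 4u) == 2u);  /* index becomes 3 */
    assert(model_atomic_inc_wrap(&index, 4u) == 3u);  /* 4 >= 4, so index becomes 0 */
    assert(index == 0u);
    assert(model_atomic_dec_wrap(&index, 4u) == 0u);  /* old value 0, so index becomes 3 */
    assert(index == 3u);
}
```

    With a <wrap> of 4, repeated increments cycle the stored value through
    0, 1, 2, 3, which is the typical use of these functions for ring-buffer
    style indices.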

    Additionally, add the following functions to the section:

      uint64_t atomicAdd(inout uint64_t mem, uint data);
      uint64_t atomicExchange(inout uint64_t mem, uint data);
      uint64_t atomicCompSwap(inout uint64_t mem, uint64_t compare,
                              uint64_t data);

    If ARB_shader_storage_buffer_object or ARB_compute_shader is supported,
    make similar edits to the functions documented in the
    ARB_shader_storage_buffer_object extension.

    These functions are available if and only if GL_NV_gpu_shader5 is enabled
    via the "#extension" directive.


Errors

    None.

New State

    None.

Issues

    (1) Does MAX_SHADER_BUFFER_ADDRESS_NV still apply?

      RESOLVED: The primary reason for this limitation to exist was the lack
      of 64-bit integer support in shaders (see issue 15 of
      NV_shader_buffer_load). Given that this extension is being released at
      the same time as NV_gpu_shader5 which adds 64-bit integer support, it
      is expected that this maximum address will match the maximum address
      supported by the GPU's address space, or will be equal to "~0ULL"
      indicating that any GPU address returned by the GL will be usable in a
      shader.

    (2) What qualifiers should be supported on pointer variables, and how can
        they be used in declarations?

      RESOLVED: We will support the qualifiers "coherent", "volatile",
      "restrict", and "const" to be used in pointer declarations. "coherent"
      is taken from EXT_shader_image_load_store and is used to ensure that
      memory accesses from different shader threads are cached coherently
      (i.e., will be able to see each other when complete). "volatile" and
      "const" behave as in C.

      "restrict" behaves as in the C99 standard, and can be used to indicate
      that no other pointer points to the same underlying data.
      This permits optimizations that would otherwise be impossible if the
      compiler has to assume that a pair of pointers might end up pointing to
      the same data. For example, in standard C/C++, code like:

        int *a, *b;
        a[0] = b[0] + b[0];
        a[1] = b[0] + b[1];
        a[2] = b[0] + b[2];

      would need to reload b[0] for each assignment because a[0] or a[1]
      might point at the same data as b[0]. With restrict, the compiler can
      assume that b[0] is not modified by any of the instructions and load it
      just once.

    (3) What amount of automatic synchronization is provided for buffer object
        writes through pointers?

      RESOLVED: Use of MemoryBarrierEXT() is required, and there is no
      automatic synchronization when buffers are bound or unbound. With
      resident buffers, there are no well-defined binding points in the first
      place -- all resident buffers are effectively "bound".

      Implicit synchronization is difficult, as it might require some
      combination of:

        - tracking which buffers might be written (randomly) in the shader
          itself;

        - assuming that if a shader that performs writes is executed, all
          bytes of all resident buffers could be modified and thus must be
          treated as dirty;

        - idling at the end of each primitive or draw call, so that the
          results of all previous commands are complete.

      Since normal OpenGL operation is pipelined, idling would result in a
      significant performance impact since pipelining would otherwise allow
      fragment shader execution for draw call N while simultaneously
      performing vertex shader execution for draw call N+1.


Revision History

    Rev.    Date    Author    Changes
    ----  --------  --------  -----------------------------------------
     5    08/13/12  pbrown    Add interaction with OpenGL 4.3 (and related ARB
                              extensions) supporting atomic{Inc,Dec}Wrap and
                              64-bit unsigned integer atomics to shared and
                              shader storage buffer memory.

     4    04/13/10  pbrown    Remove the floating-point version of atomicAdd().

     3    03/23/10  pbrown    Minor cleanups to the dependency sections.
                              Fixed obsolete extension names. Add an issue
                              on synchronization.

     2    03/16/10  pbrown    Updated memory access qualifiers section
                              (volatile, coherent, restrict, const) for
                              pointers. Added language to document how
                              these qualifiers work in possibly complicated
                              expressions.

     1              pbrown    Internal revisions.