Name

    NV_shader_buffer_load

Name Strings

    GL_NV_shader_buffer_load

Contact

    Jeff Bolz, NVIDIA Corporation (jbolz 'at' nvidia.com)

Contributors

    Pat Brown, NVIDIA
    Chris Dodd, NVIDIA
    Mark Kilgard, NVIDIA
    Eric Werness, NVIDIA

Status

    Complete

Version

    Last Modified Date: August 8, 2010
    Author Revision: 8

Number

    379

Dependencies

    Written against the OpenGL 3.0 Specification.

    Written against the GLSL 1.30 Specification (Revision 09).

    This extension interacts with NV_gpu_program4.

Overview

    At a very coarse level, GL has evolved in a way that allows applications to replace many of the original state machine variables with blocks of user-defined data. For example, the current vertex state has been augmented by vertex buffer objects, fixed-function shading state and parameters have been replaced by shaders/programs and constant buffers, etc. Applications switch between coarse sets of state by binding objects to the context or to other container objects (e.g. vertex array objects) instead of manipulating state variables of the context. In terms of the number of GL commands required to draw an object, modern applications are orders of magnitude more efficient than legacy applications, but this explosion of objects bound to other objects has led to a new bottleneck - pointer chasing and CPU L2 cache misses in the driver, and general L2 cache pollution.

    This extension provides a mechanism to read from a flat, 64-bit GPU address space from programs/shaders, to query GPU addresses of buffer objects at the API level, and to bind buffer objects to the context in such a way that they can be accessed via their GPU addresses in any shader stage.

    The intent is that applications can avoid re-binding buffer objects or updating constants between each Draw call and instead simply use a VertexAttrib (or TexCoord, or InstanceID, or...) to "point" to the new object's state. In this way, one of the cheapest "state" updates (from the CPU's point of view) can be used to effect a significant state change in the shader, much as a pointer change can on the CPU. At the same time, this relieves the limits on how many buffer objects can be accessed at once by shaders, and allows these buffer object accesses to be exposed as C-style pointer dereferences in the shading language.

    As a very simple example, imagine packing a group of similar objects' constants into a single buffer object and pointing your program at object <i> by setting "glVertexAttribI1iEXT(attrLoc, i);" and using a shader such as:

        struct MyObjectType {
            mat4x4 modelView;
            vec4 materialPropertyX;
            // etc.
        };
        uniform MyObjectType *allObjects;
        in int objectID; // bound to attrLoc

        ...

        mat4x4 thisObjectsMatrix = allObjects[objectID].modelView;
        // do transform, shading, etc.

    This is beneficial in much the same way that texture arrays allow choosing between similar, but independent, texture maps with a single coordinate identifying which slice of the texture to use. It also resembles instancing, where a lightweight change (incrementing the instance ID) can be used to generate a different and interesting result, but with additional flexibility over instancing because the values are app-controlled and not a single incrementing counter.
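    To tie this shader back to the API, the application queries the buffer's GPU address, makes the buffer resident, and writes the address into the pointer-typed uniform. The sketch below is illustrative rather than normative; <program> is assumed to be a linked program object containing the shader above, and MyObjectTypeCPU is assumed to be a CPU-side mirror of MyObjectType laid out according to the rules in Section 2.20.X:

        // upload the packed per-object constants
        GLuint64EXT objectsAddr;
        BindBuffer(target, objectsBuffer);
        BufferData(target, sizeof(MyObjectTypeCPU)*numObjects, objectData,
                   STATIC_DRAW);

        // query the GPU address and make the buffer resident for shader loads
        GetBufferParameterui64vNV(target, BUFFER_GPU_ADDRESS_NV, &objectsAddr);
        MakeBufferResidentNV(target, READ_ONLY);

        // point the "allObjects" uniform at the buffer's GPU address
        UseProgram(program);
        Uniformui64NV(GetUniformLocation(program, "allObjects"), objectsAddr);

        // per draw, a cheap attribute update selects the object's constants
        VertexAttribI1iEXT(attrLoc, i);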
    Dependent pointer fetches are allowed, so more complex scene graph structures can be built into buffer objects, providing significant new flexibility in the use of shaders. Another simple example, showing something you can't do with existing functionality, is to do dependent fetches into many buffer objects:

        GenBuffers(N, dataBuffers);
        GenBuffers(1, &pointerBuffer);

        GLuint64EXT gpuAddrs[N];
        for (i = 0; i < N; ++i) {
            BindBuffer(target, dataBuffers[i]);
            BufferData(target, size[i], myData[i], STATIC_DRAW);

            // get the address of this buffer and make it resident.
            GetBufferParameterui64vNV(target, BUFFER_GPU_ADDRESS_NV,
                                      &gpuAddrs[i]);
            MakeBufferResidentNV(target, READ_ONLY);
        }

        GLuint64EXT pointerBufferAddr;
        BindBuffer(target, pointerBuffer);
        BufferData(target, sizeof(GLuint64EXT)*N, gpuAddrs, STATIC_DRAW);
        GetBufferParameterui64vNV(target, BUFFER_GPU_ADDRESS_NV,
                                  &pointerBufferAddr);
        MakeBufferResidentNV(target, READ_ONLY);

        // now in the shader, we can use a double indirection
        vec4 **ptrToBuffers = pointerBufferAddr;
        vec4 *ptrToBufferI = ptrToBuffers[i];

    This allows simultaneous access to more buffers than EXT_bindable_uniform (MAX_VERTEX_BINDABLE_UNIFORMS, etc.) and each can be larger than MAX_BINDABLE_UNIFORM_SIZE.

New Procedures and Functions

    void MakeBufferResidentNV(enum target, enum access);
    void MakeBufferNonResidentNV(enum target);
    boolean IsBufferResidentNV(enum target);
    void MakeNamedBufferResidentNV(uint buffer, enum access);
    void MakeNamedBufferNonResidentNV(uint buffer);
    boolean IsNamedBufferResidentNV(uint buffer);

    void GetBufferParameterui64vNV(enum target, enum pname,
                                   uint64EXT *params);
    void GetNamedBufferParameterui64vNV(uint buffer, enum pname,
                                        uint64EXT *params);

    void GetIntegerui64vNV(enum value, uint64EXT *result);

    void Uniformui64NV(int location, uint64EXT value);
    void Uniformui64vNV(int location, sizei count,
                        const uint64EXT *value);
    void GetUniformui64vNV(uint program, int location, uint64EXT *params);
    void ProgramUniformui64NV(uint program, int location, uint64EXT value);
    void ProgramUniformui64vNV(uint program, int location, sizei count,
                               const uint64EXT *value);

New Tokens

    Accepted by the <pname> parameter of GetBufferParameterui64vNV and GetNamedBufferParameterui64vNV:

        BUFFER_GPU_ADDRESS_NV                          0x8F1D

    Returned by the <type> parameter of GetActiveUniform:

        GPU_ADDRESS_NV                                 0x8F34

    Accepted by the <value> parameter of GetIntegerui64vNV:

        MAX_SHADER_BUFFER_ADDRESS_NV                   0x8F35

Additions to Chapter 2 of the OpenGL 3.0 Specification (OpenGL Operation)

    Append to Section 2.9 (p. 45)

    The data store of a buffer object may be made accessible to the GL via shader buffer loads by calling:

        void MakeBufferResidentNV(enum target, enum access);

    <access> may only be READ_ONLY, but is provided for future extensibility to indicate to the driver that the GPU may write to the memory. <target> may be any of the buffer targets accepted by BindBuffer. The error INVALID_OPERATION will be generated if no buffer is bound to <target>, if the buffer bound to <target> is already resident in the current GL context, or if the buffer bound to <target> has no data store.

    While the buffer object is resident, it is legal to use GPU addresses in the range [BUFFER_GPU_ADDRESS, BUFFER_GPU_ADDRESS + BUFFER_SIZE) in any shader stage.
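    For example (a minimal, non-normative sketch), the residency state of a buffer bound to <target> round-trips as follows:

        MakeBufferResidentNV(target, READ_ONLY);
        // IsBufferResidentNV(target) now returns TRUE

        // ... issue draws whose shaders read the buffer via its GPU address ...

        // explicitly release residency once shaders no longer use the address
        // (respecifying the data store with BufferData would also do this
        // implicitly)
        MakeBufferNonResidentNV(target);
        BufferData(target, newSize, newData, STATIC_DRAW);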
    The data store of a buffer object may be made inaccessible to the GL via shader buffer loads by calling:

        void MakeBufferNonResidentNV(enum target);

    A buffer is also made non-resident implicitly as a result of being respecified via BufferData or being deleted. <target> may be any of the buffer targets accepted by BindBuffer. The error INVALID_OPERATION will be generated if no buffer is bound to <target> or if the buffer bound to <target> is not resident in the current GL context.

    The function:

        void GetBufferParameterui64vNV(enum target, enum pname,
                                       uint64EXT *params);

    may be used to query the GPU address of a buffer object's data store. This address remains valid until the buffer object is deleted or the data store is respecified via BufferData. The address "zero" is reserved for convenience, so no buffer object will ever have an address of zero. The error INVALID_OPERATION will be generated if no buffer is bound to <target>, or if the buffer bound to <target> has no data store.

    The functions:

        void MakeNamedBufferResidentNV(uint buffer, enum access);
        void MakeNamedBufferNonResidentNV(uint buffer);
        void GetNamedBufferParameterui64vNV(uint buffer, enum pname,
                                            uint64EXT *params);

    operate identically to the non-"Named" functions except that, rather than using currently bound buffers, they use the buffer object identified by <buffer>. If the buffer object named by the <buffer> parameter has not been previously bound or has been deleted since the last binding, the GL first creates a new state vector, initialized with a zero-sized memory buffer and comprising the state values listed in table 2.6. There is no buffer corresponding to the name zero; these commands generate the INVALID_OPERATION error if the <buffer> parameter is zero.

    Add to Section 2.20.3 (p. 98)

        void Uniformui64NV(int location, uint64EXT value);
        void Uniformui64vNV(int location, sizei count, const uint64EXT *value);

    The Uniformui64NV and Uniformui64vNV commands load one or <count> uint64EXT values, respectively, into a uniform location defined as a GPU_ADDRESS_NV or an array of GPU_ADDRESS_NVs.

    The functions:

        void ProgramUniformui64NV(uint program, int location,
                                  uint64EXT value);
        void ProgramUniformui64vNV(uint program, int location, sizei count,
                                   const uint64EXT *value);

    operate identically to the non-"Program" functions except that, rather than updating the program object currently in use, these "Program" commands update the program object named by the initial <program> parameter.
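    As an informal illustration (not spec text), an application might load an array of GPU addresses into a pointer-typed uniform as follows; the uniform name "allBuffers", the element count, and the addr* values are assumptions:

        // assumed GLSL declaration:  uniform vec4 *allBuffers[4];
        GLuint64EXT addrs[4] = { addr0, addr1, addr2, addr3 };
        ProgramUniformui64vNV(program, GetUniformLocation(program, "allBuffers"),
                              4, addrs);
        // GetActiveUniform reports such uniforms with a <type> of GPU_ADDRESS_NV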
    Insert a new subsection after Section 2.20.4, Shader Execution (Vertex Shaders), p. 103.

    Section 2.20.X, Shader Memory Access

    Shaders may load from buffer object memory by dereferencing pointer variables. Pointer variables are 64-bit unsigned integer values referring to the GPU addresses of data stored in buffer objects made resident by MakeBufferResidentNV. The GPU addresses of such buffer objects may be queried using GetBufferParameterui64vNV with a <pname> of BUFFER_GPU_ADDRESS_NV.

    When a shader dereferences a pointer variable, data are read from buffer object memory according to the following rules:

    - Data of type "bool" are stored in memory as one uint-typed value at the specified GPU address. All non-zero values correspond to true, and zero corresponds to false.

    - Data of type "int" are stored in memory as one int-typed value at the specified GPU address.

    - Data of type "uint" are stored in memory as one uint-typed value at the specified GPU address.

    - Data of type "float" are stored in memory as one float-typed value at the specified GPU address.

    - Vectors with <N> elements with any of the above basic element types are stored in memory as <N> values in consecutive memory locations beginning at the specified GPU address, with components stored in order with the first (X) component at the lowest offset. The data type used for individual components is derived according to the rules for scalar members above.

    - Data with any pointer type are stored in memory as a single 64-bit unsigned integer value at the specified GPU address.

    - Column-major matrices with <C> columns and <R> rows (using the type "mat<C>x<R>", or simply "mat<C>" if <C>==<R>) are treated as an array of <C> floating-point column vectors, each consisting of <R> components. The column vectors will be stored in order, with column zero at the lowest offset. The difference in offsets between consecutive columns of the matrix will be referred to as the column stride, and is constant across the matrix.

    - Row-major matrices with <C> columns and <R> rows (using the type "mat<C>x<R>", or simply "mat<C>" if <C>==<R>) are treated as an array of <R> floating-point row vectors, each consisting of <C> components. The row vectors will be stored in order, with row zero at the lowest offset. The difference in offsets between consecutive rows of the matrix will be referred to as the row stride, and is constant across the matrix.

    - Arrays of scalars, vectors, pointers, and matrices are stored in memory by element order, with array member zero at the lowest offset. The difference in offsets between each pair of elements in the array in basic machine units is referred to as the array stride, and is constant across the entire array.

    For matrix and array variables, the matrix and/or array strides corresponding to the variable may be derived according to the structure layout rules specified immediately below.

    When dereferencing a pointer to a structure, its individual members will be laid out in memory in monotonically increasing order based on their location in the structure declaration. Each structure member has a base offset and a base alignment, from which an aligned offset is computed by rounding the base offset up to the next multiple of the base alignment. The base offset of the first member of a structure is taken from the aligned offset of the structure itself. The base offset of all other structure members is derived by taking the offset of the last basic machine unit consumed by the previous member and adding one. Each structure member is stored in memory at its aligned offset.

      (1) If the member is a scalar consuming <N> basic machine units, the base alignment is <N>.
      (2) If the member is a two- or four-component vector with components consuming <N> basic machine units, the base alignment is 2<N> or 4<N>, respectively.

      (3) If the member is a three-component vector with components consuming <N> basic machine units, the base alignment is 4<N>.

      (4) If the member is an array of scalars or vectors, the base alignment and array stride are set to match the base alignment of a single array element, according to rules (1), (2), and (3). The array may have padding at the end; the base offset of the member following the array is rounded up to the next multiple of the base alignment.

      (5) If the member is a column-major matrix with <C> columns and <R> rows, the matrix is stored identically to an array of <C> column vectors with <R> components each, according to rule (4).

      (6) If the member is an array of <S> column-major matrices with <C> columns and <R> rows, the matrix is stored identically to a row of <S>*<C> column vectors with <R> components each, according to rule (4).

      (7) If the member is a row-major matrix with <C> columns and <R> rows, the matrix is stored identically to an array of <R> row vectors with <C> components each, according to rule (4).

      (8) If the member is an array of <S> row-major matrices with <C> columns and <R> rows, the matrix is stored identically to a row of <S>*<R> row vectors with <C> components each, according to rule (4).

      (9) If the member is a structure, the base alignment of the structure is <N>, where <N> is the largest base alignment value of any of its members. The individual members of this sub-structure are then assigned offsets by applying this set of rules recursively, where the base offset of the first member of the sub-structure is equal to the aligned offset of the structure. The structure may have padding at the end; the base offset of the member following the sub-structure is rounded up to the next multiple of the base alignment of the structure.

      (10) If the member is an array of <S> structures, the <S> elements of the array are laid out in order, according to rule (9).

    If a shader reads from a GPU address that does not correspond to a buffer object made resident by MakeBufferResidentNV, the results of the operation are undefined and may result in application termination.

    Any variable, array element, or structure member accessed using a pointer has a required base alignment, which may be derived according to the structure layout rules above. If a variable, array member, or structure member is accessed using a pointer that is not a multiple of its base alignment, the results of the access will be undefined. To store multiple variables in a single buffer object, an application must ensure that each variable is properly aligned. Storing a single scalar, vector, matrix, array, or structure variable using a pointer set to the base GPU address of a resident buffer object requires no special alignment. The base GPU address of a buffer object is guaranteed to be sufficiently aligned to satisfy the base alignment requirement of any variable, and the layout rules above ensure that individual matrix rows/columns, array elements, and structure members are properly aligned as long as the base pointer meets alignment requirements.
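    As an informal illustration (not part of the spec text), a C-side mirror of a small structure laid out under these rules might look like the following; the structure itself and the explicit padding fields are assumptions derived from rules (1)-(3), (9), and (10):

        // hypothetical GLSL declaration:
        //     struct Light { float intensity; vec3 pos; vec2 uv; };
        typedef struct Light {
            float intensity;    // offset  0-3   (rule 1: base alignment 4)
            float pad0[3];      // offsets 4-15  (vec3 below starts at a multiple of 16)
            float pos[3];       // offset 16-27  (rule 3: base alignment 16)
            float pad1;         // offset 28-31  (vec2 below starts at a multiple of 8)
            float uv[2];        // offset 32-39  (rule 2: base alignment 8)
            float pad2[2];      // offsets 40-47 pad the element out to the array stride
        } Light;                // struct base alignment 16 (rule 9); array stride 48 (rule 10)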
Additions to Chapter 5 of the OpenGL 3.0 Specification (Special Functions)

    Add to Section 5.4, p. 310 (Display Lists)

    Edit the list of commands that are executed immediately when compiling a display list to include MakeBufferResidentNV, MakeBufferNonResidentNV, MakeNamedBufferResidentNV, MakeNamedBufferNonResidentNV, GetBufferParameterui64vNV, GetNamedBufferParameterui64vNV, IsBufferResidentNV, and IsNamedBufferResidentNV.

Additions to Chapter 6 of the OpenGL 3.0 Specification (Querying GL State)

    Add to Section 6.1.11, p. 314 (Pointer, String, and 64-bit Queries)

    The command:

        void GetIntegerui64vNV(enum value, uint64EXT *result);

    obtains 64-bit unsigned integer state variables. Legal values of <value> are only those that specify GetIntegerui64vNV in the state tables in Chapter 6.

    Add to Section 6.1.13, p. 332 (Buffer Object Queries)

    The commands:

        boolean IsBufferResidentNV(enum target);
        boolean IsNamedBufferResidentNV(uint buffer);

    return TRUE if the specified buffer is resident in the current context. The error INVALID_OPERATION will be generated by IsBufferResidentNV if no buffer is bound to <target>. If the buffer object named by the <buffer> parameter of IsNamedBufferResidentNV has not been previously bound or has been deleted since the last binding, the GL first creates a new state vector, initialized with a zero-sized memory buffer and comprising the state values listed in table 2.6. There is no buffer corresponding to the name zero; IsNamedBufferResidentNV generates the INVALID_OPERATION error if the <buffer> parameter is zero.

    Add to Section 6.1.15, p. 337 (Shader and Program Queries)

        void GetUniformui64vNV(uint program, int location, uint64EXT *params);

Additions to Appendix D of the OpenGL 3.0 Specification (Shared Objects and Multiple Contexts)

    Add a new section D.X (Object Use by GPU Address)

    A buffer object's GPU address is valid in all contexts in the share group that the buffer belongs to. A buffer should be made resident in each context that will use it via its GPU address, so that the GL knows the buffer is in use in each command stream.

Additions to the NV_gpu_program4 specification:

    Change Section 2.X.2, Program Grammar

    If a program specifies the NV_shader_buffer_load program option, the following modifications apply to the program grammar:

    Append to <opModifier> list: | "F32" | "F32X2" | "F32X4" | "S8" | "S16" | "S32" | "S32X2" | "S32X4" | "U8" | "U16" | "U32" | "U32X2" | "U32X4".

    Append to <SCALARop> list: | "LOAD".
    Modify Section 2.X.4, Program Execution Environment

    (Add to the set of opcodes in Table X.13)

                   Modifiers
      Instruction  F I C S H D  Out  Inputs  Description
      -----------  - - - - - -  ---  ------  --------------------------------
      LOAD         X X X X - F  v    su      Global load

    (Add to Table X.14, Instruction Modifiers, and to the corresponding description following the table)

      Modifier  Description
      --------  -----------------------------------------------
      F32       Access one 32-bit floating-point value
      F32X2     Access two 32-bit floating-point values
      F32X4     Access four 32-bit floating-point values
      S8        Access one 8-bit signed integer value
      S16       Access one 16-bit signed integer value
      S32       Access one 32-bit signed integer value
      S32X2     Access two 32-bit signed integer values
      S32X4     Access four 32-bit signed integer values
      U8        Access one 8-bit unsigned integer value
      U16       Access one 16-bit unsigned integer value
      U32       Access one 32-bit unsigned integer value
      U32X2     Access two 32-bit unsigned integer values
      U32X4     Access four 32-bit unsigned integer values

    For memory load operations, the "F32", "F32X2", "F32X4", "S8", "S16", "S32", "S32X2", "S32X4", "U8", "U16", "U32", "U32X2", and "U32X4" storage modifiers control how data are loaded from memory. Storage modifiers are supported by the LOAD instruction and are covered in more detail in the description of that instruction. LOAD must specify exactly one of these modifiers, and may not specify any of the base data type modifiers (F, U, S) described above. The base data type of the result vector of a LOAD instruction is trivially derived from the storage modifier.

    Add New Section 2.X.4.5, Program Memory Access

    Programs may load from buffer object memory via the LOAD (global load) instruction.

    Load instructions read 8, 16, 32, 64, or 128 bits of data from a source address to produce a four-component vector, according to the storage modifier specified with the instruction. The storage modifier has three parts:

    - a base data type, "F", "S", or "U", specifying that the instruction fetches floating-point, signed integer, or unsigned integer values, respectively;

    - a component size, specifying that the components fetched by the instruction have 8, 16, or 32 bits; and

    - an optional component count, where "X2" and "X4" indicate that two or four components be fetched, and no count indicates a single-component fetch.

    When the storage modifier specifies that fewer than four components should be fetched, the remaining components are filled with zeroes. When performing a global load (LOAD), the GPU address is specified as an instruction operand.
    Given a GPU address <address> and a storage modifier <modifier>, the memory load can be described by the following code:

      result_t_vec BufferMemoryLoad(char *address, OpModifier modifier)
      {
          result_t_vec result = { 0, 0, 0, 0 };
          switch (modifier) {
          case F32:
              result.x = ((float32_t *)address)[0];
              break;
          case F32X2:
              result.x = ((float32_t *)address)[0];
              result.y = ((float32_t *)address)[1];
              break;
          case F32X4:
              result.x = ((float32_t *)address)[0];
              result.y = ((float32_t *)address)[1];
              result.z = ((float32_t *)address)[2];
              result.w = ((float32_t *)address)[3];
              break;
          case S8:
              result.x = ((int8_t *)address)[0];
              break;
          case S16:
              result.x = ((int16_t *)address)[0];
              break;
          case S32:
              result.x = ((int32_t *)address)[0];
              break;
          case S32X2:
              result.x = ((int32_t *)address)[0];
              result.y = ((int32_t *)address)[1];
              break;
          case S32X4:
              result.x = ((int32_t *)address)[0];
              result.y = ((int32_t *)address)[1];
              result.z = ((int32_t *)address)[2];
              result.w = ((int32_t *)address)[3];
              break;
          case U8:
              result.x = ((uint8_t *)address)[0];
              break;
          case U16:
              result.x = ((uint16_t *)address)[0];
              break;
          case U32:
              result.x = ((uint32_t *)address)[0];
              break;
          case U32X2:
              result.x = ((uint32_t *)address)[0];
              result.y = ((uint32_t *)address)[1];
              break;
          case U32X4:
              result.x = ((uint32_t *)address)[0];
              result.y = ((uint32_t *)address)[1];
              result.z = ((uint32_t *)address)[2];
              result.w = ((uint32_t *)address)[3];
              break;
          }
          return result;
      }

    If a global load accesses a memory address that does not correspond to a buffer object made resident by MakeBufferResidentNV, the results of the operation are undefined and may result in application termination.

    The address used for the buffer memory loads must be aligned to the fetch size corresponding to the storage opcode modifier. For S8 and U8, the offset has no alignment requirements. For S16 and U16, the offset must be a multiple of two basic machine units. For F32, S32, and U32, the offset must be a multiple of four. For F32X2, S32X2, and U32X2, the offset must be a multiple of eight. For F32X4, S32X4, and U32X4, the offset must be a multiple of sixteen. If an offset is not correctly aligned, the values returned by a buffer memory load will be undefined.

    Modify Section 2.X.6, Program Options

    + Shader Buffer Load Support (NV_shader_buffer_load)

    If a program specifies the "NV_shader_buffer_load" option, it may use the LOAD instruction to load data from a resident buffer object given a GPU address.

    Section 2.X.8.Z, LOAD: Global Load

    The LOAD instruction generates a result vector by reading an address from the single unsigned integer scalar operand and fetching data from buffer object memory, as described in Section 2.X.4.5.

      address = ScalarLoad(op0);
      result = BufferMemoryLoad(address, storageModifier);

    LOAD supports no base data type modifiers, but requires exactly one storage modifier. The base data type of the result vector is derived from the storage modifier. The single scalar operand is always interpreted as an unsigned integer.

    The range of GPU addresses supported by the LOAD instruction may be subject to an implementation-dependent limit. If any component fetched by the LOAD instruction corresponds to memory with an address larger than the value of MAX_SHADER_BUFFER_ADDRESS_NV, the value fetched for that component will be undefined.
Modifications to The OpenGL Shading Language Specification, Version 1.30.09

    Modify Section 3.6, Keywords, p. 14

    (add the following to the list of reserved keywords)

        intptr_t
        uintptr_t

    Modify Section 4.1, Basic Types, p. 18

    (add to the basic "Transparent Types" table, p. 18)

        Types      Meaning
        ---------  ----------------------------------------------------------
        intptr_t   a signed integer with the same precision as a pointer
        uintptr_t  an unsigned integer with the same precision as a pointer

    (replace the last paragraph of the section with the following)

    Pointers to any of the transparent types, user-defined structs, or other pointer types are supported.

    Modify Section 4.1.3, Integers, p. 18

    (add to the end of the first paragraph) Signed and unsigned integer variables are fully supported. ... intptr_t and uintptr_t variables have the same number of bits of precision as the native size of a pointer in the underlying implementation.

    (Insert new section immediately before Section 4.1.10, Implicit Conversions, p. 27)

    Section 4.1.X, Pointers

    Pointers are 64-bit unsigned integer values that represent the address of some "global" memory (i.e. memory that is not local to this invocation of a shader). Pointers to any of the transparent types, user-defined structures, or pointer types are supported. Pointers are dereferenced with the operators (*), (->), and ([]), and a variety of operators performing addition and subtraction are supported. There is no mechanism to assign a pointer to the address of a local variable or array, nor is there a mechanism to allocate or free memory from within a shader. There are no function pointers.

    The underlying memory read using pointer variables may also be accessed using OpenGL API commands. To communicate between shaders and other OpenGL API commands, variables read through pointers are arranged in memory in the manner described in Section 2.20.X of the OpenGL Specification.

    Modify Section 4.1.10, Implicit Conversions, p. 27

    (add before the final paragraph of the section, p. 27)

    Pointers to any type may be implicitly converted to pointers to void. Pointers to any type (including void) are never implicitly converted to pointers to any other non-void type.

    Modify Section 5.1, Operators, p. 39

    (add new entries to the precedence table; for a full spec, renumber the new precedence row "3.5" to "4", and renumber all subsequent rows)

        Precedence  Operator Class              Operators  Associativity
        ----------  --------------------------  ---------  -------------
        2           field access from pointer   ->         left to right
        3           pointer dereference         *          right to left
        3.5         typecast                    ()         right to left

    (modify the last paragraph, p. 39, to delete language saying that dereference and typecast operators are not supported)

    There is no address-of operator.
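    A brief, non-normative GLSL illustration of the pointer additions described above and in the sections that follow (the uniform and variable names are arbitrary):

        uniform mat4 *transforms;     // pointer into a resident buffer object

        void example() {
            mat4 m = *transforms;     // dereference reads buffer object memory
            void *p = transforms;     // implicit conversion to pointer-to-void
            // note there is no address-of operator, so pointers cannot refer
            // to shader-local variables such as <m>
        }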
    (Insert new section immediately after Section 5.7, Structure and Array Operations, p. 46)

    Section 5.X, Pointer Operations

    The following operators are allowed to operate on pointer types:

        pointer dereference                 *
        additive                            + -
        array subscript                     []
        arithmetic assignments              += -=
        postfix increment and decrement     ++ --
        prefix increment and decrement      ++ --
        equality                            == !=
        assignment                          =
        field or method selector            ->

    The pointer dereference operator is a unary operator that converts a pointer expression into an l-value designating data of the type pointed to by the pointer expression. The result of a pointer dereference may not be used as the left-hand side of an assignment.

    The pointer binary addition (+) and subtraction (-) operators produce a pointer result from one pointer operand and one scalar signed or unsigned integer operand. For subtraction, the pointer must be the first operand; for addition, the pointer may be either operand. The type of the result is the same type as the pointer operand. A new pointer is computed by adding or subtracting <I>*<S> basic machine units to the value of the pointer operand, where <I> is the integer operand and <S> is the stride that would be derived by applying the rules specified in Section 2.20.X of the OpenGL Specification to an array with elements of the type pointed to by the pointer.

    The binary subtraction (-) operator may also operate on a pair of pointers of identical type. In this operation, the second operand is subtracted from the first, yielding a signed integer result of type <intptr_t>. The result is in units of the type being pointed to. The result is the integer value that would yield the first pointer operand if added to the second pointer operand in the manner described above. If no such integer value exists, the result of the operation is undefined. Pointer subtraction is not supported for pointers to the type <void>.

    The array subscript operator ([]) adds a signed or unsigned integer expression specified inside the brackets to a pointer expression specified to the left of the brackets, and then dereferences the pointer produced by the addition. The array subscript operation "P[i]" is functionally equivalent to "(*(P+i))".

    The add into (+=) and subtract from (-=) operators are binary operations, where the first operand must be one that could be assigned to (an l-value) and the second operand must be a signed or unsigned integer scalar. These operations add the integer operand into or subtract the integer operand from the pointer operand, as defined for pointer addition and subtraction.

    The arithmetic unary operators post- and pre-increment and decrement (-- and ++) operate on pointers. For post- and pre-increment and decrement, the expression must be one that could be assigned to (an l-value). Pre- and post-increment and decrement add or subtract 1 to or from the contents of the expression they operate on, as defined for pointer addition and subtraction. The value of the pre-increment or pre-decrement expression is the resulting value of that modification. The value of the post-increment or post-decrement expression is the value of the expression before modification.

    The equality operators equal (==) and not equal (!=) operate on pointer types and produce a scalar Boolean result. The two operands must either be pointers to the same type, or one of the two operands must point to void. Two pointers are considered equal if and only if they point to the same global memory address.

    The field or method selection operator (->) operates on a pointer to a structure of any type and is used to select a field of the structure pointed to by the pointer. This selector also operates on a pointer to a vector of any type, where the right-hand side of the operator must be a valid string using the vector component selection suffix described in Section 5.5. In both cases, the field or method selection operation "p->s" is functionally equivalent to "((*p).s)".

    Pointer addition and subtraction, including the add into, subtract from, and pre- and post-increment and decrement operators, are not supported on pointers to a void type.

    The assignment operator may be used to update the value of a pointer variable, as described in Section 5.8.
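    An informal GLSL fragment illustrating several of these operators (the structure, uniform, and variable names are illustrative, and the GPU address is assumed to be supplied via Uniformui64NV):

        struct Particle {
            vec4 position;
            vec4 velocity;
        };

        uniform Particle *particles;   // GPU address of a resident buffer
        uniform int count;
        in int index;

        void example() {
            Particle *p = particles + index;              // strides by one Particle
            vec4 pos  = p->position;                      // field selection through a pointer
            vec4 vel  = (*p).velocity;                    // equivalent dereference form
            vec4 next = particles[index + 1].position;    // array subscripting
            bool last = (p + 1 == particles + count);     // pointer equality
        }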
    (Insert after Section 5.10, Vector and Matrix Operations, p. 50)

    Section 5.11, Typecast Operations

    The typecast operator may be used to convert an expression from one type to another, operating in a manner similar to scalar, vector, and matrix constructors. The typecast operator specifies a new data type in parentheses, followed by an expression, as in the following examples:

        float a = (float) 2U;
        vec3 b = (vec3) 1.0;
        vec4 c = (vec4) b;
        mat2 d = (mat2) 1.0;
        mat4 e = (mat4) d;

    For scalar, vector, and matrix data types, the set of typecasts supported is equivalent to the set of single-operand constructors supported, and a typecast operates identically to an equivalent constructor. A scalar expression may be typecast to any scalar, vector, or matrix data type. A vector expression may be typecast to any vector type, except vectors with a larger number of components. Additionally, four-component vector expressions may also be cast to a mat2 type. A matrix expression may be typecast to any other matrix data type.

    Expressions with structure type may only be typecast to a structure of identical type, which has no effect. Typecast operators are not supported for array types.

    Note that the typecast operator takes only a single expression. Unlike constructors, it cannot be used to generate a vector, structure, or matrix from multiple inputs. For example,

        vec3 f = (vec3) (1.0, 2.0, 3.0);

    generates a three-component vector <f>, but all three components are set to 3.0, which is the scalar value of the expression "(1.0, 2.0, 3.0)". The commas in that expression are sequence operators, not list delimiters.

    Additionally, typecast operators may also be used to cast values to a pointer type. In this case, the expression being typecast must be either a pointer (to any type) or a scalar of type intptr_t or uintptr_t.

        vec4 *v4ptr;
        intptr_t iptr;
        vec3 *v3ptr = (vec3 *) v4ptr;
        ivec2 *iv2ptr = (ivec2 *) iptr;

    Note that function call-style constructors are not supported for pointers.

    Add to the end of Section 8.3, Common Functions, p. 72

    (add support for pointer packing functions)

    Syntax:

        void *packPtr(uvec2 a);
        uvec2 unpackPtr(void *a);

    The function packPtr() returns a pointer to void by constructing a 64-bit void pointer from the two 32-bit components of an unsigned integer vector. The first vector component specifies the 32 least significant bits of the pointer; the second component specifies the 32 most significant bits.

    The function unpackPtr() returns a two-component unsigned integer vector built from a 64-bit void pointer. The first component of the vector consists of the 32 least significant bits of the pointer value; the second component consists of the 32 most significant bits.
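    For example (an informal sketch, not spec text), a pointer passed to a shader as two 32-bit components of a generic attribute, as suggested in issue 15, can be reconstructed with packPtr(); the attribute name is illustrative:

        in uvec2 addrAttrib;   // x = low 32 bits, y = high 32 bits of a GPU address

        void example() {
            vec4 *data = (vec4 *) packPtr(addrAttrib);   // rebuild the 64-bit pointer
            vec4 first = data[0];
            uvec2 roundTrip = unpackPtr((void *) data);  // recovers addrAttrib
        }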
    Modify Chapter 9, Shading Language Grammar, p. 92

    (change comment in the grammar disallowing pointer dereferences)

    Change the sentence:

        // Grammar Note: No '*' or '&' unary ops. Pointers are not supported.

    to

        // Grammar Note: No '&' unary op.

Additions to the AGL/EGL/GLX/WGL Specifications

    None

Errors

    INVALID_ENUM is generated by MakeBufferResidentNV if <access> is not READ_ONLY.

    INVALID_ENUM is generated by GetBufferParameterui64vNV if <pname> is not BUFFER_GPU_ADDRESS_NV.

    INVALID_OPERATION is generated by MakeBufferResidentNV, MakeBufferNonResidentNV, IsBufferResidentNV, and GetBufferParameterui64vNV if no buffer is bound to <target>.

    INVALID_OPERATION is generated by MakeBufferResidentNV if the buffer bound to <target> is already resident in the current GL context.

    INVALID_OPERATION is generated by MakeBufferNonResidentNV if the buffer bound to <target> is not resident in the current GL context.

    INVALID_OPERATION is generated by MakeNamedBufferResidentNV if <buffer> is already resident in the current GL context.

    INVALID_OPERATION is generated by MakeNamedBufferNonResidentNV if <buffer> is not resident in the current GL context.

    INVALID_OPERATION is generated by GetBufferParameterui64vNV or MakeBufferResidentNV if the buffer bound to <target> has no data store.

    INVALID_OPERATION is generated by GetNamedBufferParameterui64vNV or MakeNamedBufferResidentNV if <buffer> has no data store.
Examples

    (1) Layout of a complex structure using the rules from the new Section 2.20.X added to the OpenGL spec:

        struct Example {
                            // bytes used            rules
            float a;        // 0-3
            vec2 b;         // 8-15                  1   // bumped to a multiple of 8
            vec3 c;         // 16-27                 1
            struct {
                int d;      // 32-35                 2   // bumped to a multiple of 8 (bvec2)
                bvec2 e;    // 40-47                 1
            } f;
            float g;        // 48-51
            float h[2];     // 52-55 (h[0])          5   // multiple of 4 (float) with no additional padding
                            // 56-59 (h[1])          6   // tightly packed
            mat2x3 i;       // 64-75 (i[0])
                            // 80-91 (i[1])          6   // bumped to a multiple of 16 (vec3)
            struct {
                uvec3 j;    // 96-107  (m[0].j)
                vec2 k;     // 112-119 (m[0].k)      1   // bumped to a multiple of 8 (vec2)
                float l[2]; // 120-123 (m[0].l[0])   1,5 // simply float aligned
                            // 124-127 (m[0].l[1])   6   // tightly packed
                            // 128-139 (m[1].j)
                            // 144-151 (m[1].k)
                            // 152-155 (m[1].l[0])
                            // 156-159 (m[1].l[1])
            } m[2];
        };
        // sizeof(Example) == 160

    (2) Replacing bindable_uniform with an array of pointers:

        #version 120
        #extension GL_NV_shader_buffer_load : require
        #extension GL_EXT_bindable_uniform : require

        in vec4 **ptr;
        in uvec2 whichbuf;

        void main() {
            gl_FrontColor = ptr[whichbuf.x][whichbuf.y];
            gl_Position = ftransform();
        }

    In the GL code, assuming the buffer object setup in the Overview:

        glBindAttribLocation(program, 8, "ptr");
        glBindAttribLocation(program, 9, "whichbuf");
        glLinkProgram(program);
        glBegin(...);
        glVertexAttribI2iEXT(8, (unsigned int)pointerBufferAddr,
                             (unsigned int)(pointerBufferAddr>>32));
        for (i = ...) {
            for (j = ...) {
                glVertexAttribI2iEXT(9, i, j);
                glVertex3f(...);
            }
        }
        glEnd();

New State

    Update Table 6.11, p. 349 (Buffer Object State)

    Get Value              Type  Get Command                Initial Value  Sec  Attribute
    ---------------------  ----  -------------------------  -------------  ---  ---------
    BUFFER_GPU_ADDRESS_NV  Z64+  GetBufferParameterui64vNV  0              2.9  none

    Update Table 6.46, p. 384 (Implementation Dependent Values)

    Get Value                     Type  Get Command        Minimum Value  Sec    Attribute
    ----------------------------  ----  -----------------  -------------  -----  ---------
    MAX_SHADER_BUFFER_ADDRESS_NV  Z64+  GetIntegerui64vNV  0xFFFFFFFF     2.X.2  none

Dependencies on NV_gpu_program4:

    This extension is generally written against the NV_gpu_program4 wording, program grammar, etc., but doesn't have specific dependencies on its functionality.

Issues

    1) Only buffer objects?

       RESOLVED: YES, for now. Buffer objects are unformatted memory and easily mapped to a "pointer"-style shading language.

    2) Should we allow writes?

       RESOLVED: NO, deferred to a later extension. Writes involve specifying many kinds of synchronization primitives. Writes are also a "side effect" which makes program execution "observable" in cases where it may not have otherwise been (e.g. early-Z can kill fragments before shading, or a post-transform cache may prevent vertex program execution).

    3) What happens if an invalid pointer is fetched?

       UNRESOLVED: Unpredictable results, including program termination? Make the driver trap the error and report it (still unpredictable results, but no program termination)? My preference would be to at least report the faulting address (roughly), whether it was a read or a write, and which shader stage faulted.
       I'd like to not terminate the program, but the app has to assume all their data stored in the GL is lost.

    4) What should this extension be named?

       RESOLVED: NV_shader_buffer_load. Rather than trying to choose an overly-general name and naming future extensions "GL_XXX2", let's name this according to the specific functionality it provides.

    5) What are the performance characteristics of buffer loads?

       RESOLVED: Likely somewhere between uniforms and texture fetches, but totally implementation-dependent. Uniforms still serve a purpose for "program locals". Buffer loads may have different caching behavior than either uniforms or texture fetches, but the expectation is that they will be cached reads of memory, and all the common-sense guidelines for maintaining locality of reference apply.

    6) What does MakeBufferResidentNV do? Why not just have a MapBufferGPUNV?

       RESOLVED: Reserving virtual address space only requires knowing the size of the data store, so an explicit MapBufferGPU call isn't necessary. If all GPUs supported demand paging, a GPU address might be sufficient, but without that assumption MakeBufferResidentNV serves as a hint to the driver that it needs to page-lock memory, download the buffer contents into GPU-accessible memory, or perform other similar preparation. MapBufferGPU would also imply that a different address may be returned each time it is mapped, which could be cumbersome for the application to handle.

    7) Is it an error to render while any resident buffer is mapped?

       RESOLVED: No. As the number of attachment points in the context grows, even the existing error check is falling out of favor.

    8) Does MapBuffer stall on pending use of a resident buffer?

       RESOLVED: No. The existing language is:

          "If the GL is able to map the buffer object's data store into the client's address space, MapBuffer returns the pointer value to the data store once all pending operations on that buffer have completed."

       However, since the implementation has no information about how the buffer is used, "all pending operations" amounts to a Finish. In terms of sharing across contexts/threads, ARB_vertex_buffer_object says:

          "How is synchronization enforced when buffer objects are shared by multiple OpenGL contexts?

           RESOLVED: It is generally the clients' responsibility to synchronize modifications made to shared buffer objects."

       So we shouldn't dictate any additional shared-object synchronization. The best we could do is a Finish, but it's not clear that this accomplishes anything for the application, since they can just as easily call Finish themselves; or, if they don't want synchronization, they can use MAP_UNSYNCHRONIZED_BIT. It seems the resolution to this is inconsequential, as GL already provides the tools to achieve either behavior. Hence, don't bother stalling.

       However, if a buffer was previously resident and has since been made non-resident, the implementation should enforce the stalling behavior for those pending operations from before it was made non-resident.

    9) Given issue (8), what are some effective ways to load data into a buffer that is resident?

       RESOLVED: There are several possibilities:

       - BufferSubData.
       - The application may track, using fences, which parts of the buffer are actually in use and update them with CPU writes using MAP_UNSYNCHRONIZED_BIT. This is potentially error-prone, as described in ARB_copy_buffer.

       - CopyBufferSubData. ARB_copy_buffer describes a simple usage example for a single-threaded application. Since this extension is targeted at reducing the CPU bottleneck in the rendering thread, offloading some of the work to other threads may be useful.

         Example with a single Loading thread and Rendering thread:

           Loading thread:
               while (1) {
                   WaitForEvent(something to do);

                   NamedBufferData(tempBuffer, updateSize, NULL, STREAM_DRAW);
                   ptr = MapNamedBuffer(tempBuffer, WRITE_ONLY);
                   // fill ptr
                   UnmapNamedBuffer(tempBuffer);
                   // the buffer could have been filled via BufferData, if
                   // that's more natural.

                   // send tempBuffer name to Rendering thread
               }
           Rendering thread:
               foreach (obj in scene) {
                   if (obj has changed) {
                       // get tempBuffer name from Loading thread

                       NamedCopyBufferSubData(tempBuffer, objBuf, 0, objOffset,
                                              updateSize);
                   }
                   Draw(obj);
               }

         If we further desire to offload the data transfer to another thread, and the implementation supports concurrent data transfers in one context/thread while rendering in another context/thread, this may also be accomplished thusly:

           Loading thread:
               while (1) {
                   WaitForEvent(something to do);

                   NamedBufferData(sysBuffer, updateSize, NULL, STREAM_DRAW);
                   ptr = MapNamedBuffer(sysBuffer, WRITE_ONLY);
                   // fill ptr
                   UnmapNamedBuffer(sysBuffer);

                   NamedBufferData(vidBuffer, updateSize, NULL, STREAM_COPY);
                   // This is a sysmem->vidmem blit.
                   NamedCopyBufferSubData(sysBuffer, vidBuffer, 0, 0, updateSize);
                   SetFence(fenceId, ALL_COMPLETED);

                   // send vidBuffer name and fenceId to Rendering thread

                   // This could have been a BufferSubData directly into
                   // vidBuffer, if that's more natural.
               }
           Rendering thread:
               foreach (obj in scene) {
                   if (obj has changed) {
                       // get vidBuffer name and fenceId from Loading thread

                       // note: there aren't any sharable fences currently, so we
                       // actually need to ask the loading thread when it
                       // has finished.
                       FinishFence(fenceId);

                       // This is hopefully a fast vidmem->vidmem blit.
                       NamedCopyBufferSubData(vidBuffer, objBuffer, 0, objOffset,
                                              updateSize);
                   }
                   Draw(obj);
               }

         In both of these examples, the point at which the data is written to the resident buffer's data store is clearly specified in order with rendering commands. This resolves a whole class of synchronization bugs (Write After Read hazards) that MAP_UNSYNCHRONIZED_BIT is prone to.

    10) What happens if BufferData is called on a buffer that is resident?

        RESOLVED: BufferData is specified to "delete the existing data store", so the GPU address of that data should become invalid. The buffer is therefore made non-resident in the current context.

    11) Should residency be a property of the buffer object, or should a buffer be "made resident to a context"?

        RESOLVED: Made resident to a context.
        If a shared buffer is used in two threads/contexts, it may be difficult for the application to know when the residency state actually changes on the shared object, particularly if there is a large latency between commands being submitted on the client and processed on the server. Allowing the buffer to be made resident to each context individually allows the state to be reliably toggled in order in each command stream. This also allows MakeBufferNonResidentNV to serve as an indication to the GL that the buffer is no longer in use in each command stream.

        This leads to an unfortunate orphaning issue. For example, if the buffer is resident in context A and then deleted in context B, how can the app make it non-resident in context A? Given the name-based object model, it is impossible. It would be complex from an implementation point of view for DeleteBuffers (or BufferData) to either make it non-resident or throw an error if it is resident in some other context.

        An ideal solution would be a (separate) extension that allows the application to increment the refcount on the object and to decrement the refcount without necessarily deleting the object's name. Until such an extension exists, the unsatisfying proposed resolution is that a buffer can be "stuck" resident until the context is deleted. Note that DeleteBuffers should make the buffer non-resident in the context that does the delete, so this problem only applies to rare multi-context corner cases.

    12) Is there any value in requiring an "immutable structure" bit of state to be set in order to query the address?

        RESOLVED: NO. Given that the BufferData behavior is fairly straightforward to specify and implement, it's not clear that this would be useful.

    13) What should the program syntax look like?

        RESOLVED: Support 1-, 2-, and 4-component fetches of float/int/uint types, as well as 8- and 16-bit int/uint fetches, via a new LOAD instruction with a slew of suffixes. Handling 8/16-bit sizes will be useful for high-level languages compiling to the assembly. Addresses are required to be a multiple of the size of the data, as some implementations may require this.

        Other options include a more x86-style pointer dereference ("MOV R0, DWORD PTR[R1];") or a complement to program.local ("MOV R0, program.global[R1];"), but neither of these provides the simple granularity of the explicit type suffixes, and a new instruction is convenient in terms of implementation and avoids muddling the clean definition of MOV.

    14) How does the GL know to invalidate caches when data has changed?

        RESOLVED: Any entry points that can write to buffer objects should trigger the necessary invalidation. A new entry point may only be necessary once there is a way to write to a buffer by GPU address.

    15) Does this extension require 64-bit register/operation support in programs and shaders?

        RESOLVED: NO. At the API level, GPU addresses are always 64-bit values, and when they are stored in uniforms, attribs, parameters, etc., they should always be stored at full precision. However, if programs and shaders don't support 64-bit registers/operations via another programmability extension, then they will need to use only 32 bits.
        On such implementations, the usable address space is therefore limited to 4GB. Such a limit should be reflected in the value of MAX_SHADER_BUFFER_ADDRESS_NV.

        It is expected that GLSL shaders will be compiled in such a way as to generate 64-bit pointers on implementations that support them and 32-bit pointers on implementations that don't, so GLSL shaders written against a 32-bit implementation can be expected to be forward-compatible when run against a 64-bit implementation. The (u)intptr_t types are provided to ease this compatibility.

        Built-in functions are provided to convert pointers to and from a pair of integers. These can be used to pass pointers as two components of a generic attrib, to construct a pointer from an RG32UI texture fetch, or to write a pointer to a fragment shader output.

    16) What assumptions can applications make about the alignment of addresses returned by GetBufferParameterui64vNV?

        RESOLVED: All buffers will begin at an address that is a multiple of 16 bytes.

    17) How can the application guarantee that the layout of a structure on the CPU matches the layout used by the GLSL compiler?

        RESOLVED: Provide a standard set of packing rules designed around naturally aligning simple types. This spec defines pointer fetches in GLSL to use these rules, but does not explicitly guarantee that other extensions (like EXT_bindable_uniform) will use the same packing rules for their buffer object fetches. These packing rules are different from the ARB_uniform_buffer_object rules - in particular, these rules do not require vec4 padding of the array stride.

    18) Is the address space per-context, per-share-group, or global?

        RESOLVED: It is per-share-group. Using addresses from one share group in another share group will cause undefined results.

    19) Is there a risk of using invalid pointers for "killed" fragments, fragments that don't take a certain branch of an "if" block, or fragments whose shader is conceptually never executed due to pixel ownership, stipple, etc.?

        RESOLVED: NO. OpenGL implementations sometimes run fragment programs on "helper" pixels that have no coverage, or continue to run fragment programs on killed pixels, in order to be able to compute sane partial derivatives for fragment program instructions (DDX, DDY) or automatic level-of-detail calculations for texturing. In this approach, derivatives are approximated by computing the difference between a quantity computed for a given fragment at (x,y) and the same quantity computed for a fragment at a neighboring pixel. When a fragment program is executed on a "helper" pixel or a killed pixel, global loads may not be executed, in order to prevent spurious faults. Helper pixels aren't explicitly mentioned in the spec body; instead, partial derivatives are obtained by magic.

        If a fragment program contains a KIL instruction, compilers may not reorder code such that a LOAD instruction is executed before a KIL instruction that logically precedes it in flow control. Once a fragment is killed, subsequent loads should never be executed if they could cause any observable side effects.

        As a result, if a shader uses instructions that explicitly or implicitly do LOD calculations dependent on the result of a global load, those instructions will have undefined results.
    20) How are structures and arrays stored in buffer object memory?

        RESOLVED: Individual structure members and array elements are stored "packed" in memory, subject to an alignment requirement. Structure members are stored according to the order of declaration. Array elements are stored consecutively by element number. Unreferenced structure members or array elements are never eliminated.

        The alignment requirement of individual structure members or array elements is usually equal to the size of the item. For the purposes of this requirement, vector types are treated atomically (i.e., a "vec4" with 32-bit floats will be 16-byte aligned). One exception is that the required alignment of a three-component vector is the same as the required alignment of a four-component vector of the same base type.

    21) How do the memory layout rules relate to the similar layout rules specified for the uniform buffer object (UBO) feature incorporated in OpenGL 3.1?

        RESOLVED: This extension was completed prior to OpenGL 3.1, but the layout rules for this extension and for UBO were developed roughly concurrently. The layout rules here are nearly identical to those for the "std140" layout for uniform blocks. The main difference is that "std140" requires arrays of small types (e.g., "float") to be padded out to vec4 alignment (16B), while this extension does not.

        Note that this extension does NOT allow shaders to use the layout() qualifier added by GLSL 1.40 to achieve fine-grained control of structure or array layout using pointers. A subsequent extension could provide this capability.

    22) Should we provide a mechanism for tighter packing of an array of three-component vectors?

        RESOLVED: This could be desirable, but it won't be provided in this extension. A subsequent extension could support alternate layouts by allowing shaders to use the GLSL 1.40 layout() modifier to qualify pointer types.

        If tight packing of vec3's is strongly required, a three-component array element could be constructed using three single-component loads or by selecting/swizzling components of one or more larger loads. The former technique could be done in GLSL by replacing:

            vec3 *pointer;
            vec3 elementN;
            int n;
            elementN = pointer[n];

        with

            float *pointer;
            vec3 elementN;
            int n;
            elementN = vec3(pointer[n*3], pointer[n*3+1], pointer[n*3+2]);

Revision History

    Rev.  Date      Author     Changes
    ----  --------  ---------  -----------------------------------------
    8     08/06/10  istewart   Modify behavior of named buffer functions
                               to match those of EXT_direct_state_access.
                               Add INVALID_OPERATION error to
                               MakeBufferResidentNV and
                               GetBufferParameterui64vNV if the buffer
                               object has no data store.

    7     06/22/10  pbrown     Document INVALID_OPERATION errors on
                               residency management and query APIs when a
                               non-existent buffer object is referenced,
                               when trying to make an already resident
                               buffer resident, or when trying to make an
                               already non-resident buffer non-resident.

    6     09/21/09  groth      Fix non-conformant DSA function names.

    5     09/10/09  Jon Leech  Add 'const' to the type of the Uniformui64vNV
                               and ProgramUniformui64vNV 'value' argument.
    4     09/09/09  mjk        Fix typos.

    3     08/21/09  pbrown     Add explicit spec language describing the
                               typecast operator implemented here. The
                               previous spec language said it was allowed
                               but didn't say what it did.

    2     08/05/09  pbrown     Update section describing memory layout of
                               variables pointed to; moved to the core
                               specification as with OpenGL 3.1's uniform
                               buffer layout. Added a few issues on memory
                               layout. Explicitly documented the set of
                               operations and implicit conversions allowed
                               on pointers.

    1               jbolz      Internal revisions.