1Name
2
3    NV_shader_buffer_load
4
5Name Strings
6
7    GL_NV_shader_buffer_load
8
9Contact
10
11    Jeff Bolz, NVIDIA Corporation (jbolz 'at' nvidia.com)
12
13Contributors
14
15    Pat Brown, NVIDIA
16    Chris Dodd, NVIDIA
17    Mark Kilgard, NVIDIA
18    Eric Werness, NVIDIA
19
20Status
21
22    Complete
23
24Version
25
26    Last Modified Date:         August 8, 2010
27    Author Revision:            8
28
29Number
30
31    379
32
33Dependencies
34
35    Written against the OpenGL 3.0 Specification.
36
37    Written against the GLSL 1.30 Specification (Revision 09).
38
39    This extension interacts with NV_gpu_program4.
40
41
42Overview
43
44    At a very coarse level, GL has evolved in a way that allows
45    applications to replace many of the original state machine variables
46    with blocks of user-defined data. For example, the current vertex
47    state has been augmented by vertex buffer objects, fixed-function
48    shading state and parameters have been replaced by shaders/programs
49    and constant buffers, etc. Applications switch between coarse sets
50    of state by binding objects to the context or to other container
51    objects (e.g. vertex array objects) instead of manipulating state
52    variables of the context. In terms of the number of GL commands
53    required to draw an object, modern applications are orders of
54    magnitude more efficient than legacy applications, but this explosion
55    of objects bound to other objects has led to a new bottleneck -
56    pointer chasing and CPU L2 cache misses in the driver, and general
57    L2 cache pollution.
58
59    This extension provides a mechanism to read from a flat, 64-bit GPU
60    address space from programs/shaders, to query GPU addresses of buffer
61    objects at the API level, and to bind buffer objects to the context in
62    such a way that they can be accessed via their GPU addresses in any
63    shader stage.
64
65    The intent is that applications can avoid re-binding buffer objects
66    or updating constants between each Draw call and instead simply use
67    a VertexAttrib (or TexCoord, or InstanceID, or...) to "point" to the
68    new object's state. In this way, one of the cheapest "state" updates
69    (from the CPU's point of view) can be used to effect a significant
70    state change in the shader, much as changing a pointer does on
71    the CPU. At the same time, this relieves the limits on how many
72    buffer objects can be accessed at once by shaders, and allows these
73    buffer object accesses to be exposed as C-style pointer dereferences
74    in the shading language.
75
76    As a very simple example, imagine packing a group of similar objects'
77    constants into a single buffer object and pointing your program
78    at object <i> by setting "glVertexAttribI1iEXT(attrLoc, i);"
79    and using a shader as such:
80
81        struct MyObjectType {
82            mat4x4 modelView;
83            vec4 materialPropertyX;
84            // etc.
85        };
86        uniform MyObjectType *allObjects;
87        in int objectID; // bound to attrLoc
88
89        ...
90
91        mat4x4 thisObjectsMatrix = allObjects[objectID].modelView;
92        // do transform, shading, etc.
93
94    This is beneficial in much the same way that texture arrays allow
95    choosing between similar, but independent, texture maps with a single
96    coordinate identifying which slice of the texture to use. It also
97    resembles instancing, where a lightweight change (incrementing the
98    instance ID) can be used to generate a different and interesting
99    result, but with additional flexibility over instancing because the
100    values are app-controlled and not a single incrementing counter.
101
102    Dependent pointer fetches are allowed, so more complex scene graph
103    structures can be built into buffer objects providing significant new
104    flexibility in the use of shaders. Another simple example, showing
105    something you can't do with existing functionality, is to do dependent
106    fetches into many buffer objects:
107
108        GenBuffers(N, dataBuffers);
109        GenBuffers(1, &pointerBuffer);
110
111        GLuint64EXT gpuAddrs[N];
112        for (i = 0; i < N; ++i) {
113            BindBuffer(target, dataBuffers[i]);
114            BufferData(target, size[i], myData[i], STATIC_DRAW);
115
116            // get the address of this buffer and make it resident.
117            GetBufferParameterui64vNV(target, BUFFER_GPU_ADDRESS_NV,
118                                      &gpuAddrs[i]);
119            MakeBufferResidentNV(target, READ_ONLY);
120        }
121
122        GLuint64EXT pointerBufferAddr;
123        BindBuffer(target, pointerBuffer);
124        BufferData(target, sizeof(GLuint64EXT)*N, gpuAddrs, STATIC_DRAW);
125        GetBufferParameterui64vNV(target, BUFFER_GPU_ADDRESS_NV,
126                                  &pointerBufferAddr);
127        MakeBufferResidentNV(target, READ_ONLY);
128
129        // now in the shader, we can use a double indirection
130        vec4 **ptrToBuffers = pointerBufferAddr;
131        vec4 *ptrToBufferI = ptrToBuffers[i];
132
133    This allows simultaneous access to more buffers than
134    EXT_bindable_uniform (MAX_VERTEX_BINDABLE_UNIFORMS, etc.) and each
135    can be larger than MAX_BINDABLE_UNIFORM_SIZE.
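
    The double indirection above can be modeled on the CPU to make the
    access pattern concrete. The following is only a sketch, not GL code:
    the flat byte array model_memory, the helper names, and the addresses
    16, 64, and 128 are all hypothetical stand-ins for GPU memory and for
    the values that BUFFER_GPU_ADDRESS_NV queries would return.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef struct { float x, y, z, w; } vec4;

/* Hypothetical stand-in for GPU memory: a flat byte array indexed by address.
   Address 0 is never handed out, mirroring the reserved "zero" address. */
static unsigned char model_memory[256];

/* Read a 64-bit address stored at `addr` in the model memory. */
static uint64_t load_address(uint64_t addr) {
    uint64_t value;
    assert(addr + sizeof value <= sizeof model_memory);
    memcpy(&value, &model_memory[addr], sizeof value);
    return value;
}

/* Read one vec4 stored at `addr` in the model memory. */
static vec4 load_vec4(uint64_t addr) {
    vec4 value;
    assert(addr + sizeof value <= sizeof model_memory);
    memcpy(&value, &model_memory[addr], sizeof value);
    return value;
}

/* Models ptrToBuffers[i][j]: fetch the i-th buffer address from the pointer
   buffer, then fetch the j-th vec4 from that buffer (a dependent fetch). */
static vec4 fetch_indirect(uint64_t pointer_buffer_addr, unsigned i, unsigned j) {
    uint64_t buffer_addr = load_address(pointer_buffer_addr + sizeof(uint64_t) * i);
    return load_vec4(buffer_addr + sizeof(vec4) * j);
}

/* Fills the model with two one-element "data buffers" and one "pointer
   buffer", and returns the address of the pointer buffer. */
static uint64_t build_example(void) {
    const uint64_t addrs[2] = { 64, 128 };
    const vec4 a = { 1.0f, 2.0f, 3.0f, 4.0f };
    const vec4 b = { 5.0f, 6.0f, 7.0f, 8.0f };
    memcpy(&model_memory[64], &a, sizeof a);
    memcpy(&model_memory[128], &b, sizeof b);
    memcpy(&model_memory[16], addrs, sizeof addrs);
    return 16;
}
```

    In this model, fetch_indirect(build_example(), 1, 0) reads the vec4
    stored in the second data buffer. Only the address of the pointer
    buffer has to be handed to the shader.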
136
137New Procedures and Functions
138
139    void MakeBufferResidentNV(enum target, enum access);
140    void MakeBufferNonResidentNV(enum target);
141    boolean IsBufferResidentNV(enum target);
142    void MakeNamedBufferResidentNV(uint buffer, enum access);
143    void MakeNamedBufferNonResidentNV(uint buffer);
144    boolean IsNamedBufferResidentNV(uint buffer);
145
146    void GetBufferParameterui64vNV(enum target, enum pname,
147                                   uint64EXT *params);
148    void GetNamedBufferParameterui64vNV(uint buffer, enum pname,
149                                        uint64EXT *params);
150
151    void GetIntegerui64vNV(enum value, uint64EXT *result);
152
153    void Uniformui64NV(int location, uint64EXT value);
154    void Uniformui64vNV(int location, sizei count,
155                        const uint64EXT *value);
156    void GetUniformui64vNV(uint program, int location, uint64EXT *params);
157    void ProgramUniformui64NV(uint program, int location, uint64EXT value);
158    void ProgramUniformui64vNV(uint program, int location, sizei count,
159                               const uint64EXT *value);
160
161New Tokens
162
163    Accepted by the <pname> parameter of GetBufferParameterui64vNV,
164    GetNamedBufferParameterui64vNV:
165
166        BUFFER_GPU_ADDRESS_NV                          0x8F1D
167
168    Returned by the <type> parameter of GetActiveUniform:
169
170        GPU_ADDRESS_NV                                 0x8F34
171
172    Accepted by the <value> parameter of GetIntegerui64vNV:
173
174        MAX_SHADER_BUFFER_ADDRESS_NV                   0x8F35
175
176
177Additions to Chapter 2 of the OpenGL 3.0 Specification (OpenGL Operation)
178
179    Append to Section 2.9 (p. 45)
180
181    The data store of a buffer object may be made accessible to the GL
182    via shader buffer loads by calling:
183
184        void MakeBufferResidentNV(enum target, enum access);
185
186    <access> may only be READ_ONLY, but is provided for future
187    extensibility to indicate to the driver that the GPU may write to the
188    memory. <target> may be any of the buffer targets accepted by
189    BindBuffer.  The error INVALID_OPERATION will be generated if no
190    buffer is bound to <target>, if the buffer bound to <target> is
191    already resident in the current GL context, or if the buffer bound to
192    <target> has no data store.
193
194    While the buffer object is resident, it is legal to use GPU addresses
195    in the range [BUFFER_GPU_ADDRESS_NV, BUFFER_GPU_ADDRESS_NV + BUFFER_SIZE)
196    in any shader stage.
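
    An application that hands addresses to shaders may want to validate
    them against this range first. The following is a minimal sketch of
    such a check; the ResidentBuffer record and the function name are
    hypothetical bookkeeping kept by the application, not part of this
    extension.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record an application might keep for each resident buffer. */
typedef struct {
    uint64_t gpu_address;  /* value queried with BUFFER_GPU_ADDRESS_NV */
    uint64_t size;         /* size of the data store in bytes */
} ResidentBuffer;

/* Returns true if the `bytes`-byte access starting at `addr` lies entirely
   inside [gpu_address, gpu_address + size). Written to avoid overflow. */
static bool access_in_range(const ResidentBuffer *buf, uint64_t addr, uint64_t bytes) {
    assert(buf != NULL);
    if (addr < buf->gpu_address) {
        return false;
    }
    uint64_t offset = addr - buf->gpu_address;
    return offset < buf->size && bytes <= buf->size - offset;
}
```

    For a 64-byte buffer, a 16-byte access at offset 48 passes the check,
    while the same access at offset 49 does not, because it would run past
    the end of the data store.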
197
198    The data store of a buffer object may be made inaccessible to the GL
199    via shader buffer loads by calling:
200
201        void MakeBufferNonResidentNV(enum target);
202
203    A buffer is also made non-resident implicitly as a result of being
204    respecified via BufferData or being deleted. <target> may be any of
205    the buffer targets accepted by BindBuffer.  The error
206    INVALID_OPERATION will be generated if no buffer is bound to <target>
207    or if the buffer bound to <target> is not resident in the current
208    GL context.
209
210    The function:
211
212        void GetBufferParameterui64vNV(enum target, enum pname,
213                                       uint64EXT *params);
214
215    may be used to query the GPU address of a buffer object's data store.
216    This address remains valid until the buffer object is deleted or its
217    data store is respecified via BufferData. The address "zero"
218    is reserved for convenience, so no buffer object will ever have an
219    address of zero.  The error INVALID_OPERATION will be generated if no
220    buffer is bound to <target>, or if the buffer bound to <target> has no
221    data store.
222
223    The functions:
224
225        void MakeNamedBufferResidentNV(uint buffer, enum access);
226        void MakeNamedBufferNonResidentNV(uint buffer);
227        void GetNamedBufferParameterui64vNV(uint buffer, enum pname,
228                                            uint64EXT *params);
229
230    operate identically to the non-"Named" functions except that, rather
231    than using currently bound buffers, they use the buffer object
232    identified by <buffer>.  If the buffer object named by the buffer
233    parameter has not been previously bound or has been deleted since the
234    last binding, the GL first creates a new state vector, initialized
235    with a zero-sized memory buffer and comprising the state values listed
236    in table 2.6.  There is no buffer corresponding to the name zero; these
237    commands generate the INVALID_OPERATION error if the buffer parameter is zero.
238
239    Add to Section 2.20.3 (p. 98)
240
241        void Uniformui64NV(int location, uint64EXT value);
242        void Uniformui64vNV(int location, sizei count, const uint64EXT *value);
243
244    The Uniformui64{v}NV commands will load <count> uint64EXT values into
245    a uniform location defined as a GPU_ADDRESS_NV or an array of
246    GPU_ADDRESS_NVs.
247
248    The functions:
249
250        void ProgramUniformui64NV(uint program, int location,
251                                  uint64EXT value);
252        void ProgramUniformui64vNV(uint program, int location, sizei count,
253                                   const uint64EXT *value);
254
255    operate identically to the non-"Program" functions except that,
256    rather than updating the program object currently in use, these
257    "Program" commands update the program object named by the initial
258    program parameter.
259
260
261    Insert a new subsection after Section 2.20.4, Shader Execution (Vertex
262    Shaders), p. 103.
263
264    Section 2.20.X, Shader Memory Access
265
266    Shaders may load from buffer object memory by dereferencing pointer
267    variables.  Pointer variables are 64-bit unsigned integer values referring
268    to the GPU addresses of data stored in buffer objects made resident by
269    MakeBufferResidentNV.  The GPU addresses of such buffer objects may be
270    queried using GetBufferParameterui64vNV with a <pname> of
271    BUFFER_GPU_ADDRESS_NV.
272
273    When a shader dereferences a pointer variable, data are read from buffer
274    object memory according to the following rules:
275
276    - Data of type "bool" are stored in memory as one uint-typed value at the
277      specified GPU address.  All non-zero values correspond to true, and zero
278      corresponds to false.
279
280    - Data of type "int" are stored in memory as one int-typed value at the
281      specified GPU address.
282
283    - Data of type "uint" are stored in memory as one uint-typed value at the
284      specified GPU address.
285
286    - Data of type "float" are stored in memory as one float-typed value at
287      the specified GPU address.
288
289    - Vectors with <N> elements with any of the above basic element types are
290      stored in memory as <N> values in consecutive memory locations beginning
291      at the specified GPU address, with components stored in order with the
292      first (X) component at the lowest offset.  The data type used for
293      individual components is derived according to the rules for scalar
294      members above.
295
296    - Data with any pointer type are stored in memory as a single 64-bit
297      unsigned integer value at the specified GPU address.
298
299    - Column-major matrices with <C> columns and <R> rows (using the type
300      "mat<C>x<R>", or simply "mat<C>" if <C>==<R>) are treated as an array of
301      <C> floating-point column vectors, each consisting of <R> components.
302      The column vectors will be stored in order, with column zero at the
303      lowest offset.  The difference in offsets between consecutive columns of
304      the matrix will be referred to as the column stride, and is constant
305      across the matrix.
306
307    - Row-major matrices with <C> columns and <R> rows (using the type
308      "mat<C>x<R>", or simply "mat<C>" if <C>==<R>) are treated as an array of
309      <R> floating-point row vectors, each consisting of <C> components. The
310      row vectors will be stored in order, with row zero at the lowest offset.
311      The difference in offsets between consecutive rows of the matrix will be
312      referred to as the row stride, and is constant across the matrix.
313
314    - Arrays of scalars, vectors, pointers, and matrices are stored in memory
315      by element order, with array member zero at the lowest offset.  The
316      difference in offsets between each pair of elements in the array in
317      basic machine units is referred to as the array stride, and is constant
318      across the entire array.
319
320    For matrix and array variables, the matrix and/or array strides
321    corresponding to the variable may be derived according to the structure
322    layout rules specified immediately below.
323
324    When dereferencing a pointer to a structure, its individual members will
325    be laid out in memory in monotonically increasing order based on their
326    location in the structure declaration.  Each structure member has a base
327    offset and a base alignment, from which an aligned offset is computed by
328    rounding the base offset up to the next multiple of the base alignment.
329    The base offset of the first member of a structure is taken from the
330    aligned offset of the structure itself.  The base offset of all other
331    structure members is derived by taking the offset of the last basic
332    machine unit consumed by the previous member and adding one.  Each
333    structure member is stored in memory at its aligned offset.
334
335      (1) If the member is a scalar consuming <N> basic machine units, the
336          base alignment is <N>.
337
338      (2) If the member is a two- or four-component vector with components
339          consuming <N> basic machine units, the base alignment is 2<N> or
340          4<N>, respectively.
341
342      (3) If the member is a three-component vector with components consuming
343          <N> basic machine units, the base alignment is 4<N>.
344
345      (4) If the member is an array of scalars or vectors, the base alignment
346          and array stride are set to match the base alignment of a single
347          array element, according to rules (1), (2), and (3). The array may
348          have padding at the end; the base offset of the member following the
349          array is rounded up to the next multiple of the base alignment.
350
351      (5) If the member is a column-major matrix with <C> columns and <R>
352          rows, the matrix is stored identically to an array of <C> column
353          vectors with <R> components each, according to rule (4).
354
355      (6) If the member is an array of <S> column-major matrices with <C>
356          columns and <R> rows, the matrix is stored identically to a row of
357          <S>*<C> column vectors with <R> components each, according to rule
358          (4).
359
360      (7) If the member is a row-major matrix with <C> columns and <R> rows,
361          the matrix is stored identically to an array of <R> row vectors
362          with <C> components each, according to rule (4).
363
364      (8) If the member is an array of <S> row-major matrices with <C> columns
365          and <R> rows, the matrix is stored identically to a row of <S>*<R>
366          row vectors with <C> components each, according to rule (4).
367
368      (9) If the member is a structure, the base alignment of the structure is
369          <N>, where <N> is the largest base alignment value of any of its
370          members.  The individual members of this sub-structure are then
371          assigned offsets by applying this set of rules recursively, where
372          the base offset of the first member of the sub-structure is equal to
373          the aligned offset of the structure. The structure may have padding
374          at the end; the base offset of the member following the
375          sub-structure is rounded up to the next multiple of the base
376          alignment of the structure.
377
378      (10) If the member is an array of <S> structures, the <S> elements of
379           the array are laid out in order, according to rule (9).
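
    The offset computation for scalar, vector, and array members (rules
    (1) through (4), which also cover matrices through rules (5) and (7))
    can be sketched as follows. This is an illustration under stated
    assumptions, not a normative algorithm: the function names are
    hypothetical, a basic machine unit is taken to be one byte, and nested
    structures (rules (9) and (10)) are not handled.

```c
#include <assert.h>
#include <stdint.h>

/* Round `offset` up to the next multiple of `alignment`. */
static uint32_t align_up(uint32_t offset, uint32_t alignment) {
    assert(alignment > 0);
    return (offset + alignment - 1) / alignment * alignment;
}

/* Base alignment of a scalar or vector per rules (1)-(3).
   n = bytes per component, components = 1..4. */
static uint32_t base_alignment(uint32_t n, uint32_t components) {
    assert(components >= 1 && components <= 4);
    if (components == 1) return n;
    if (components == 2) return 2 * n;
    return 4 * n;  /* three- and four-component vectors */
}

/* Place one scalar or vector member. On entry *cursor is the member's base
   offset; on exit it is the base offset of the next member. Returns the
   aligned offset at which the member is stored. */
static uint32_t place_member(uint32_t *cursor, uint32_t n, uint32_t components) {
    uint32_t offset = align_up(*cursor, base_alignment(n, components));
    *cursor = offset + n * components;
    return offset;
}

/* Place an array of `count` scalars or vectors per rule (4): the array
   stride equals the element's base alignment, so trailing padding is
   included. A column-major mat<C>x<R> is place_array(cursor, 4, R, C). */
static uint32_t place_array(uint32_t *cursor, uint32_t n, uint32_t components,
                            uint32_t count) {
    uint32_t stride = base_alignment(n, components);
    uint32_t offset = align_up(*cursor, stride);
    *cursor = offset + stride * count;
    return offset;
}
```

    Applied to a structure declared as { float a; vec3 b; float c; vec2 d; }
    with 4-byte components, these helpers give offsets 0, 16, 28, and 32.
    For the MyObjectType example in the Overview, modelView lands at offset
    0 and materialPropertyX at offset 64.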
380
381    If a shader reads from a GPU address that does not correspond to a buffer
382    object made resident by MakeBufferResidentNV, the results of the operation
383    are undefined and may result in application termination.
384
385    Any variable, array element, or structure member accessed using a pointer
386    has a required base alignment, which may be derived according to the
387    structure layout rules above.  If a variable, array member, or structure
388    member is accessed using a pointer that is not a multiple of its base
389    alignment, the results of the access will be undefined.  To store multiple
390    variables in a single buffer object, an application must ensure that each
391    variable is properly aligned.  Storing a single scalar, vector, matrix,
392    array, or structure variable using a pointer set to the base GPU address
393    of a resident buffer object requires no special alignment.  The base GPU
394    address of a buffer object is guaranteed to be sufficiently aligned to
395    satisfy the base alignment requirement of any variable, and the layout
396    rules above ensure that individual matrix rows/columns, array elements,
397    and structure members are properly aligned as long as the base pointer
398    meets alignment requirements.
399
400
401Additions to Chapter 5 of the OpenGL 3.0 Specification (Special Functions)
402
403    Add to Section 5.4, p. 310 (Display Lists)
404
405    Edit the list of commands that are executed immediately when compiling
406    a display list to include MakeBufferResidentNV,
407    MakeBufferNonResidentNV, MakeNamedBufferResidentNV,
408    MakeNamedBufferNonResidentNV, GetBufferParameterui64vNV,
409    GetNamedBufferParameterui64vNV, IsBufferResidentNV, and
410    IsNamedBufferResidentNV.
411
412Additions to Chapter 6 of the OpenGL 3.0 Specification (Querying GL State)
413
414    Add to Section 6.1.11, p. 314 (Pointer, String, and 64-bit Queries)
415
416    The command:
417
418        void GetIntegerui64vNV(enum value, uint64EXT *result);
419
420    obtains 64-bit unsigned integer state variables. Legal values of
421    <value> are only those that specify GetIntegerui64vNV in the state
422    tables in Chapter 6.
423
424    Add to Section 6.1.13, p. 332 (Buffer Object Queries)
425
426    The commands:
427
428        boolean IsBufferResidentNV(enum target);
429        boolean IsNamedBufferResidentNV(uint buffer);
430
431    return TRUE if the specified buffer is resident in the current context.
432    The error INVALID_OPERATION will be generated by IsBufferResidentNV if no
433    buffer is bound to <target>.  If the buffer object named by the buffer
434    parameter of IsNamedBufferResidentNV has not been previously bound or has
435    been deleted since the last binding, the GL first creates a new state
436    vector, initialized with a zero-sized memory buffer and comprising the
437    state values listed in table 2.6.  There is no buffer corresponding to the
438    name zero; IsNamedBufferResidentNV generates the INVALID_OPERATION error if
439    the buffer parameter is zero.
440
441    Add to Section 6.1.15, p. 337 (Shader and Program Queries)
442
443        void GetUniformui64vNV(uint program, int location, uint64EXT *params);
444
445Additions to Appendix D of the OpenGL 3.0 Specification (Shared Objects and Multiple Contexts)
446
447    Add a new section D.X (Object Use by GPU Address)
448
449    A buffer object's GPU address is valid in all contexts in the share
450    group that the buffer belongs to. A buffer should be made resident in
451    each context that will use it via GPU address, so that the GL knows
452    it is used in each command stream.
453
454Additions to the NV_gpu_program4 specification:
455
456    Change Section 2.X.2, Program Grammar
457
458    If a program specifies the NV_shader_buffer_load program option,
459    the following modifications apply to the program grammar:
460
461    Append to <opModifier> list: | "F32" | "F32X2" | "F32X4" | "S8" | "S16" |
462    "S32" | "S32X2" | "S32X4" | "U8" | "U16" | "U32" | "U32X2" | "U32X4".
463
464    Append to <SCALARop> list: | "LOAD".
465
466    Modify Section 2.X.4, Program Execution Environment
467
468    (Add to the set of opcodes in Table X.13)
469
470                  Modifiers
471      Instruction F I C S H D  Out Inputs    Description
472      ----------- - - - - - -  --- --------  --------------------------------
473      LOAD        X X X X - F  v   su        Global load
474
475
476    (Add to Table X.14, Instruction Modifiers, and to the corresponding
477    description following the table)
478
479      Modifier  Description
480      --------  -----------------------------------------------
481      F32       Access one 32-bit floating-point value
482      F32X2     Access two 32-bit floating-point values
483      F32X4     Access four 32-bit floating-point values
484      S8        Access one 8-bit signed integer value
485      S16       Access one 16-bit signed integer value
486      S32       Access one 32-bit signed integer value
487      S32X2     Access two 32-bit signed integer values
488      S32X4     Access four 32-bit signed integer values
489      U8        Access one 8-bit unsigned integer value
490      U16       Access one 16-bit unsigned integer value
491      U32       Access one 32-bit unsigned integer value
492      U32X2     Access two 32-bit unsigned integer values
493      U32X4     Access four 32-bit unsigned integer values
494
495    For memory load operations, the "F32", "F32X2", "F32X4", "S8", "S16",
496    "S32", "S32X2", "S32X4", "U8", "U16", "U32", "U32X2", and "U32X4" storage
497    modifiers control how data are loaded from memory.  Storage modifiers are
498    supported by the LOAD instruction and are covered in more detail in the
499    description of that instruction.  LOAD must specify exactly one of these
500    modifiers, and may not specify any of the base data type modifiers (F,U,S)
501    described above.  The base data type of the result vector of a LOAD
502    instruction is trivially derived from the storage modifier.
503
504
505    Add New Section 2.X.4.5, Program Memory Access
506
507    Programs may load from buffer object memory via the LOAD (global load)
508    instruction.
509
510    Load instructions read 8, 16, 32, 64, or 128 bits of data from a source
511    address to produce a four-component vector, according to the storage
512    modifier specified with the instruction.  The storage modifier has three
513    parts:
514
515      - a base data type, "F", "S", or "U", specifying that the instruction
516        fetches floating-point, signed integer, or unsigned integer values,
517        respectively;
518
519      - a component size, specifying that the components fetched by the
520        instruction have 8, 16, or 32 bits; and
521
522      - an optional component count, where "X2" and "X4" indicate that two or
523        four components be fetched, and no count indicates a single component
524        fetch.
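
    The three parts of a storage modifier can be separated mechanically.
    The sketch below decodes a modifier string such as "F32X4" into its
    parts; the StorageModifier type and both function names are
    hypothetical, and the accepted set is assumed to be exactly the
    thirteen modifiers listed in Table X.14.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

typedef struct {
    char base;   /* 'F', 'S', or 'U' */
    int  bits;   /* component size: 8, 16, or 32 */
    int  count;  /* component count: 1, 2, or 4 */
} StorageModifier;

/* Decode a storage modifier string. Returns false for anything that is not
   one of the thirteen modifiers in Table X.14. */
static bool parse_storage_modifier(const char *text, StorageModifier *out) {
    assert(text != NULL && out != NULL);
    char base = text[0];
    if (base != 'F' && base != 'S' && base != 'U') return false;

    const char *p = text + 1;
    int bits;
    if (strncmp(p, "8", 1) == 0)       { bits = 8;  p += 1; }
    else if (strncmp(p, "16", 2) == 0) { bits = 16; p += 2; }
    else if (strncmp(p, "32", 2) == 0) { bits = 32; p += 2; }
    else return false;

    int count = 1;
    if (strcmp(p, "X2") == 0)      count = 2;
    else if (strcmp(p, "X4") == 0) count = 4;
    else if (*p != '\0')           return false;

    /* The table lists multi-component forms only for 32-bit components,
       and floating-point only as 32-bit. */
    if (count > 1 && bits != 32) return false;
    if (base == 'F' && bits != 32) return false;

    out->base = base;
    out->bits = bits;
    out->count = count;
    return true;
}

/* Number of bytes read from memory for a decoded modifier. */
static int fetch_bytes(const StorageModifier *m) {
    return m->bits / 8 * m->count;
}
```

    The resulting fetch sizes are 1, 2, 4, 8, or 16 bytes, matching the 8,
    16, 32, 64, and 128 bits of data described above.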
525
526    When the storage modifier specifies that fewer than four components should
527    be fetched, remaining components are filled with zeroes.  When performing
528    a global load (LOAD), the GPU address is specified as an instruction
529    operand.  Given a GPU address <address> and a storage modifier <modifier>,
530    the memory load can be described by the following code:
531
532      result_t_vec BufferMemoryLoad(char *address, OpModifier modifier)
533      {
534        result_t_vec result = { 0, 0, 0, 0 };
535        switch (modifier) {
536        case F32:
537            result.x = ((float32_t *)address)[0];
538            break;
539        case F32X2:
540            result.x = ((float32_t *)address)[0];
541            result.y = ((float32_t *)address)[1];
542            break;
543        case F32X4:
544            result.x = ((float32_t *)address)[0];
545            result.y = ((float32_t *)address)[1];
546            result.z = ((float32_t *)address)[2];
547            result.w = ((float32_t *)address)[3];
548            break;
549        case S8:
550            result.x = ((int8_t *)address)[0];
551            break;
552        case S16:
553            result.x = ((int16_t *)address)[0];
554            break;
555        case S32:
556            result.x = ((int32_t *)address)[0];
557            break;
558        case S32X2:
559            result.x = ((int32_t *)address)[0];
560            result.y = ((int32_t *)address)[1];
561            break;
562        case S32X4:
563            result.x = ((int32_t *)address)[0];
564            result.y = ((int32_t *)address)[1];
565            result.z = ((int32_t *)address)[2];
566            result.w = ((int32_t *)address)[3];
567            break;
568        case U8:
569            result.x = ((uint8_t *)address)[0];
570            break;
571        case U16:
572            result.x = ((uint16_t *)address)[0];
573            break;
574        case U32:
575            result.x = ((uint32_t *)address)[0];
576            break;
577        case U32X2:
578            result.x = ((uint32_t *)address)[0];
579            result.y = ((uint32_t *)address)[1];
580            break;
581        case U32X4:
582            result.x = ((uint32_t *)address)[0];
583            result.y = ((uint32_t *)address)[1];
584            result.z = ((uint32_t *)address)[2];
585            result.w = ((uint32_t *)address)[3];
586            break;
587        }
588        return result;
589      }
590
591    If a global load accesses a memory address that does not correspond to a
592    buffer object made resident by MakeBufferResidentNV, the results of the
593    operation are undefined and may result in application termination.
594
595    The address used for the buffer memory loads must be aligned to the fetch
596    size corresponding to the storage opcode modifier.  For S8 and U8, the
597    offset has no alignment requirements.  For S16 and U16, the offset must be
598    a multiple of two basic machine units.  For F32, S32, and U32, the offset
599    must be a multiple of four.  For F32X2, S32X2, and U32X2, the offset must
600    be a multiple of eight.  For F32X4, S32X4, and U32X4, the offset must be a
601    multiple of sixteen.  If an offset is not correctly aligned, the values
602    returned by a buffer memory load will be undefined.
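
    The alignment rule reduces to requiring that the address be a multiple
    of the fetch size in basic machine units. A minimal sketch of that
    check follows; the function names are hypothetical and a basic machine
    unit is assumed to be one byte.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Fetch size in bytes for a storage modifier with the given component size
   in bits (8, 16, or 32) and component count (1, 2, or 4). */
static unsigned fetch_size(unsigned component_bits, unsigned count) {
    assert(component_bits == 8 || component_bits == 16 || component_bits == 32);
    assert(count == 1 || count == 2 || count == 4);
    return component_bits / 8 * count;
}

/* True if `address` satisfies the alignment requirement of the modifier. */
static bool load_is_aligned(uint64_t address, unsigned component_bits,
                            unsigned count) {
    return address % fetch_size(component_bits, count) == 0;
}
```

    For example, an F32X2 load at an address that is a multiple of four but
    not of eight fails this check, and the values it returns would be
    undefined.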
603
604
605    Modify Section 2.X.6, Program Options
606
607    + Shader Buffer Load Support (NV_shader_buffer_load)
608
609    If a program specifies the "NV_shader_buffer_load" option, it may use the
610    LOAD instruction to load data from a resident buffer object given a GPU
611    address.
612
613
614    Section 2.X.8.Z, LOAD:  Global Load
615
616    The LOAD instruction generates a result vector by reading an address from
617    the single unsigned integer scalar operand and fetching data from buffer
618    object memory, as described in Section 2.X.4.5.
619
620      address = ScalarLoad(op0);
621      result = BufferMemoryLoad(address, storageModifier);
622
623    LOAD supports no base data type modifiers, but requires exactly one
624    storage modifier.  The base data type of the result vector is derived from
625    the storage modifier.  The single scalar operand is always interpreted as
626    an unsigned integer.
627
628    The range of GPU addresses supported by the LOAD instruction may be
629    subject to an implementation-dependent limit.  If any component fetched by
630    the LOAD instruction corresponds to memory with an address larger than the
631    value of MAX_SHADER_BUFFER_ADDRESS_NV, the value fetched for that
632    component will be undefined.
633
634
635Modifications to The OpenGL Shading Language Specification, Version 1.30.09
636
637    Modify Section 3.6, Keywords, p. 14
638
639    (add the following to the list of reserved keywords)
640
641    intptr_t
642    uintptr_t
643
644
645    Modify Section 4.1, Basic Types, p. 18
646
647    (add to the basic "Transparent Types" table, p. 18)
648
649      Types       Meaning
650      --------    ----------------------------------------------------------
651      intptr_t    a signed integer with the same precision as a pointer
652      uintptr_t   an unsigned integer with the same precision as a pointer
653
654    (replace the last paragraph of the section with the following)
655
656    Pointers to any of the transparent types, user-defined structs, or other
657    pointer types are supported.
658
659
660    Modify Section 4.1.3, Integers, p. 18
661
662    (add to the end of the first paragraph) Signed and unsigned integer
663    variables are fully supported.  ... intptr_t and uintptr_t variables have
664    the same number of bits of precision as the native size of a pointer in
665    the underlying implementation.
666
667
668    (Insert new section immediately before Section 4.1.10, Implicit
669    Conversions, p. 27)
670
671    Section 4.1.X, Pointers
672
673    Pointers are 64-bit unsigned integer values that represent the address of
674    some "global" memory (i.e. not local to this invocation of a shader).
675    Pointers to any of the transparent types, user-defined structures, or
676    pointer types are supported.  Pointers are dereferenced with the operators
677    (*), (->), and ([]), and a variety of operators performing addition and
678    subtraction are supported.  There is no mechanism to assign a pointer to
679    the address of a local variable or array, nor is there a mechanism to
680    allocate or free memory from within a shader.  There are no function
681    pointers.
682
683    The underlying memory read using pointer variables may also be accessed
684    using the OpenGL API commands.  To communicate between shaders and other
685    OpenGL API commands, variables read through pointers are arranged in
686    memory in the manner described in Section 2.20.X of the OpenGL
687    Specification.
688
689
690    Modify Section 4.1.10, Implicit Conversions, p. 27
691
692    (add before the final paragraph of the section, p. 27)
693
694    Pointers to any type may be implicitly converted to pointers to void.
695    Pointers to any type (including void) are never implicitly converted to
696    pointers to any other non-void type.
697
698
699    Modify Section 5.1, Operators, p. 39
700
701    (add new entries to the precedence table; for a full spec, renumber the
702    new precedence row "3.5" to "4", and renumber all subsequent rows)
703
704    Precedence  Operator Class               Operators    Associativity
705    ----------  --------------------------   ---------    -------------
706      2         field access from pointer       ->        left to right
707      3         pointer dereference             *         right to left
708      3.5       typecast                        ()        right to left
709
710    (modify the last paragraph, p.39, to delete language saying that
711     dereferences and typecast operators are not supported)
712
713    There is no address-of operator.
714
715
716    (Insert new section immediately after Section 5.7, Structure and Array
717     Operations, p. 46)
718
719    Section 5.X, Pointer Operations
720
721    The following operators are allowed to operate on pointer types:
722
723        pointer dereference                     *
724        additive                                + -
725        array subscript                         []
726        arithmetic assignments                  += -=
727        postfix increment and decrement         ++ --
728        prefix increment and decrement          ++ --
729        equality                                == !=
730        assignment                              =
731        field or method selector                ->
732
733    The pointer dereference operator is a unary operator that converts a
734    pointer expression into an l-value designating data of the type pointed to
735    by the pointer expression.  The result of a pointer dereference may not be
736    used as the left-hand side of an assignment.
737
738    The pointer binary addition (+) and subtraction (-) operators produce a
739    pointer result from one pointer operand and one scalar signed or unsigned
740    integer operand.  For subtraction, the pointer must be the first operand;
741    for addition, the pointer may be either operand.  The type of the result
742    is the same type as the pointer operand.  A new pointer is computed by
743    adding or subtracting <I>*<S> basic machine units to the value of the
744    pointer operand, where <I> is the integer operand and <S> is the stride
745    that would be derived by applying the rules specified in Section 2.20.X of
746    the OpenGL Specification to an array with elements of the type pointed to
747    by the pointer.
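
    As an illustration of the arithmetic above, the following C sketch
    models a GPU address as a 64-bit integer and applies <I>*<S>.  The
    helper name ptr_add and the stride constants are hypothetical and not
    part of this extension; the strides assume 32-bit components and the
    Section 2.20.X rule that a three-component vector is aligned like a
    four-component vector.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical array strides, in basic machine units (bytes), for a few
   pointed-to types with 32-bit components. */
enum {
    STRIDE_FLOAT = 4,
    STRIDE_VEC2  = 8,
    STRIDE_VEC3  = 16,  /* vec3 is aligned like vec4 */
    STRIDE_VEC4  = 16
};

/* Models "P + I": advance address p by i elements of the given stride.
   A negative i models "P - I". */
static uint64_t ptr_add(uint64_t p, int64_t i, uint32_t stride)
{
    assert(stride != 0);
    return p + (uint64_t)(i * (int64_t)stride);
}
```

    Under these assumptions, adding 3 to a vec3 pointer holding 0x1000
    would advance it by 48 basic machine units.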
748
749    The binary subtraction (-) operator may also operate on a pair of pointers
750    of identical type.  In this operation, the second operand is subtracted
751    from the first, yielding a signed integer result of type <intptr_t>.  The
752    result is in units of the type being pointed to.  The result is the
753    integer value that would yield the first pointer operand if added to the
754    second pointer operand in the manner described above.  If no such integer
755    value exists, the result of the operation is undefined.  Pointer
756    subtraction is not supported for pointers to the type <void>.
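
    Pointer difference can be sketched in the same way.  The C helper
    below is hypothetical and only models the rule stated above: it reports
    failure when the byte distance is not a whole number of elements, which
    is the case this section leaves undefined.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Models "A - B" for two pointers of identical type with the given stride.
   Returns false when no element count i satisfies B + i == A. */
static bool ptr_diff(uint64_t a, uint64_t b, uint32_t stride, int64_t *out)
{
    assert(stride != 0 && out != 0);
    /* Reinterpret the wrapped unsigned distance as a signed byte count
       (assumes a two's complement platform). */
    int64_t bytes = (int64_t)(a - b);
    if (bytes % (int64_t)stride != 0)
        return false;
    *out = bytes / (int64_t)stride;
    return true;
}
```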
757
758    The array subscript operator ([]) adds a signed or unsigned integer
759    expression specified inside the brackets to a pointer expression specified
760    to the left of the brackets, and then dereferences the pointer produced by
761    the addition.  The array subscript operation "P[i]" is functionally
762    equivalent to "(*(P+i))".
763
764    The add into (+=) and subtract from (-=) are binary operations, where the
765    first operand must be one that could be assigned to (an l-value) and the
766    second operand must be a signed or unsigned integer scalar.  These
767    operations add the integer operand into or subtract the integer operand
768    from the pointer operand, as defined for pointer addition and subtraction.
769
770    The arithmetic unary operators post- and pre-increment and decrement (--
771    and ++) operate on pointers.  For post- and pre-increment and decrement,
772    the expression must be one that could be assigned to (an l-value).  Pre-
773    and post-increment and decrement add or subtract 1 to the contents of the
774    expression they operate on, as defined for pointer addition and
775    subtraction.  The value of the pre-increment or pre-decrement expression
776    is the resulting value of that modification.  The value of the
777    post-increment or post-decrement expression is the value of the expression
778    before modification.
779
780    The equality operators equal (==) and not equal (!=) operate on pointer
781    types and produce a scalar Boolean result.  The two operands must either
782    be pointers to the same type, or one of the two operands must point to
783    void.  Two pointers are considered equal if and only if they point to the
784    same global memory address.
785
786    The field or method selection operator (->) operates on a pointer to a
787    structure of any type and is used to select a field of the structure
788    pointed to by the pointer.  This selector also operates on a pointer to
789    vector of any type, where the right hand side of the operator must be a
790    valid string using the vector component selection suffix described in
791    Section 5.5.  In both cases, the field or method selection operation
792    "p->s" is functionally equivalent to "((*p).s)".
793
794    Pointer addition and subtraction, including the add into, subtract from,
795    and pre- and post-increment and decrement operators, are not supported on
796    pointers to a void type.
797
798    The assignment operator may be used to update the value of a pointer
799    variable, as described in Section 5.8.
800
801
802    (Insert after Section 5.10, Vector and Matrix Operations, p. 50)
803
804    Section 5.11, Typecast Operations
805
806    The typecast operator may be used to convert an expression from one type
807    to another, operating in a manner similar to scalar, vector, and matrix
808    constructors.  The typecast operator specifies a new data type in
809    parentheses, followed by an expression, as in the following examples:
810
811      float a = (float) 2U;
812      vec3 b = (vec3) 1.0;
813      vec4 c = (vec4) b;
814      mat2 d = (mat2) 1.0;
815      mat4 e = (mat4) d;
816
817    For scalar, vector, and matrix data types, the set of typecasts supported
818    is equivalent to the set of single-operand constructors supported, and a
819    typecast operates identically to an equivalent constructor.  A scalar
820    expression may be typecast to any scalar, vector, or matrix data type.  A
821    vector expression may be typecast to any vector type, except vectors with a
822    larger number of components.  Additionally, four-component vector
823    expressions may also be cast to a mat2 type.  A matrix expression may be
824    typecast to any other matrix data type.
825
826    Expressions with structure type may only be typecast to a structure of
827    identical type, which has no effect.  Typecast operators are not supported
828    for array types.
829
830    Note that the typecast operator takes only a single expression.  Unlike
831    constructors, it cannot be used to generate a vector, structure, or
832    matrix from multiple inputs.  For example,
833
834      vec3 f = (vec3) (1.0, 2.0, 3.0);
835
836    generates a three-component vector <f>.  But all three components
837    are set to 3.0, which is the scalar value of the expression "(1.0, 2.0,
838    3.0)".  The commas in that expression are sequence operators, not list
839    delimiters.
840
841    Additionally, typecast operators may also be used to cast values to a
842    pointer type.  In this case, the expression being typecast must be either
843    a pointer (to any type) or a scalar of type intptr_t or uintptr_t.
844
845      vec4      *v4ptr;
846      intptr_t  iptr;
847      vec3      *v3ptr = (vec3 *) v4ptr;
848      ivec2     *iv2ptr = (ivec2 *) iptr;
849
850    Note that function call-style constructors are not supported for pointers.
851
852
853    Add to the end of Section 8.3, Common Functions, p. 72
854
855    (add support for pointer packing functions)
856
857    Syntax:
858
859      void *packPtr(uvec2 a);
860      uvec2 unpackPtr(void *a);
861
862    The function packPtr() returns a pointer to void by constructing a 64-bit
863    void pointer from the two 32-bit components of an unsigned integer vector.
864    The first vector component specifies the 32 least significant bits of the
865    pointer; the second component specifies the 32 most significant bits.
866
867    The function unpackPtr() returns a two-component unsigned integer vector
868    built from a 64-bit void pointer.  The first component of the vector
869    consists of the 32 least significant bits of the pointer value; the second
870    component consists of the 32 most significant bits.
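
    The bit layout described above can be mirrored on the CPU, for example
    when an application splits a GPU address into two 32-bit values to pass
    to a shader.  The following C sketch is an illustration only; uvec2_t,
    pack_ptr, and unpack_ptr are hypothetical names standing in for the
    GLSL type and built-ins.

```c
#include <assert.h>  /* for callers that check results with assert() */
#include <stdint.h>

/* Hypothetical stand-in for a GLSL uvec2. */
typedef struct { uint32_t x, y; } uvec2_t;

/* x supplies the 32 least significant bits, y the 32 most significant. */
static uint64_t pack_ptr(uvec2_t a)
{
    return ((uint64_t)a.y << 32) | (uint64_t)a.x;
}

/* Inverse of pack_ptr: split a 64-bit address into low and high halves. */
static uvec2_t unpack_ptr(uint64_t p)
{
    uvec2_t r;
    r.x = (uint32_t)(p & 0xFFFFFFFFu);
    r.y = (uint32_t)(p >> 32);
    return r;
}
```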
871
872
873    Modify Chapter 9, Shading Language Grammar, p.92
874
875    (change comment in the grammar disallowing pointer dereferences)
876
877    Change the sentence:
878
879      // Grammar Note: No '*' or '&' unary ops. Pointers are not supported.
880
881    to
882
883      // Grammar Note: No '&' unary.
884
885
886Additions to the AGL/EGL/GLX/WGL Specifications
887
888    None
889
890Errors
891
892    INVALID_ENUM is generated by MakeBufferResidentNV if <access> is not
893    READ_ONLY.
894
895    INVALID_ENUM is generated by GetBufferParameterui64vNV if <pname> is
896    not BUFFER_GPU_ADDRESS_NV.
897
898    INVALID_OPERATION is generated by MakeBufferResidentNV,
899    MakeBufferNonResidentNV, IsBufferResidentNV, and GetBufferParameterui64vNV
900    if no buffer is bound to <target>.
901
902    INVALID_OPERATION is generated by MakeBufferResidentNV if the buffer bound
903    to <target> is already resident in the current GL context.
904
905    INVALID_OPERATION is generated by MakeBufferNonResidentNV if the buffer
906    bound to <target> is not resident in the current GL context.
907
908    INVALID_OPERATION is generated by MakeNamedBufferResidentNV if <buffer> is
909    already resident in the current GL context.
910
911    INVALID_OPERATION is generated by MakeNamedBufferNonResidentNV if <buffer>
912    is not resident in the current GL context.
913
914    INVALID_OPERATION is generated by GetBufferParameterui64vNV or
915    MakeBufferResidentNV if the buffer bound to <target> has no data store.
916
917    INVALID_OPERATION is generated by GetNamedBufferParameterui64vNV or
918    MakeNamedBufferResidentNV if <buffer> has no data store.
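
    The residency errors above can be summarized with a small C model.
    This is a sketch, not driver code: the types and function names are
    hypothetical, and the order in which the checks are made is an
    assumption, since this section does not specify which error takes
    precedence when several conditions hold.

```c
#include <assert.h>  /* for callers that check results with assert() */
#include <stdbool.h>
#include <stddef.h>

typedef enum { ERR_NONE, ERR_INVALID_ENUM, ERR_INVALID_OPERATION } sim_error;
typedef enum { ACCESS_READ_ONLY, ACCESS_READ_WRITE } sim_access;

typedef struct {
    bool has_data_store;
    bool resident;        /* residency in the current context */
} sim_buffer;

/* bound is the buffer bound to <target>, or NULL if none is bound. */
static sim_error make_buffer_resident(sim_buffer *bound, sim_access access)
{
    if (access != ACCESS_READ_ONLY) return ERR_INVALID_ENUM;
    if (bound == NULL)              return ERR_INVALID_OPERATION;
    if (!bound->has_data_store)     return ERR_INVALID_OPERATION;
    if (bound->resident)            return ERR_INVALID_OPERATION;
    bound->resident = true;
    return ERR_NONE;
}

static sim_error make_buffer_non_resident(sim_buffer *bound)
{
    if (bound == NULL)    return ERR_INVALID_OPERATION;
    if (!bound->resident) return ERR_INVALID_OPERATION;
    bound->resident = false;
    return ERR_NONE;
}
```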
919
920Examples
921
922    (1) Layout of a complex structure using the rules from the new Section
923        2.20.X added to the OpenGL spec:
924
925    struct  Example {
926                    // bytes used            rules
927      float a;      //  0-3
928      vec2 b;       //  8-15                 1   // bumped to a multiple of 8
929      vec3 c;       //  16-27                1
930      struct {
931        int d;      //  32-35                2   // bumped to a multiple of 8 (bvec2)
932        bvec2 e;    //  40-47                1
933      } f;
934      float g;      //  48-51
935      float h[2];   //  52-55 (h[0])         5   // multiple of 4 (float) with no additional padding
936                    //  56-59 (h[1])         6   // tightly packed
937      mat2x3 i;     //  64-75 (i[0])
938                    //  80-91 (i[1])         6   // bumped to a multiple of 16 (vec3)
939      struct {
940        uvec3 j;    //   96-107 (m[0].j)
941        vec2 k;     //  112-119 (m[0].k)     1   // bumped to a multiple of 8 (vec2)
942        float l[2]; //  120-123 (m[0].l[0])  1,5 // simply float aligned
943                    //  124-127 (m[0].l[1])  6   // tightly packed
944                    //  128-139 (m[1].j)
945                    //  144-151 (m[1].k)
946                    //  152-155 (m[1].l[0])
947                    //  156-159 (m[1].l[1])
948      } m[2];
949    };
950    // sizeof(Example) == 160
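
    The offsets in example (1) can be reproduced with a short C sketch that
    places each member at the next multiple of its alignment.  The helper
    names and the member table are hypothetical.  Nested aggregates are
    entered with sizes already computed by the same procedure, and mat2x3
    is assumed to occupy two vec3 columns with a 16-byte stride.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Round offset up to the next multiple of a power-of-two alignment. */
static uint32_t align_up(uint32_t offset, uint32_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (offset + alignment - 1) & ~(alignment - 1);
}

typedef struct { const char *name; uint32_t size, alignment; } member_t;

/* Place members in declaration order.  Writes each member's offset and
   returns the total size rounded up to the largest member alignment. */
static uint32_t layout(const member_t *m, size_t n, uint32_t *offsets)
{
    uint32_t cursor = 0, max_align = 1;
    for (size_t k = 0; k < n; ++k) {
        offsets[k] = align_up(cursor, m[k].alignment);
        cursor = offsets[k] + m[k].size;
        if (m[k].alignment > max_align)
            max_align = m[k].alignment;
    }
    return align_up(cursor, max_align);
}

/* Top-level members of Example, sizes and alignments in bytes. */
static const member_t example_members[] = {
    { "a",  4,  4 },  /* float */
    { "b",  8,  8 },  /* vec2 */
    { "c", 12, 16 },  /* vec3, aligned like vec4 */
    { "f", 16,  8 },  /* struct { int d; bvec2 e; } */
    { "g",  4,  4 },  /* float */
    { "h",  8,  4 },  /* float[2], stride 4 */
    { "i", 32, 16 },  /* mat2x3 as two vec3 columns (assumed stride 16) */
    { "m", 64, 16 },  /* struct { uvec3 j; vec2 k; float l[2]; }[2] */
};
```

    Running layout() over example_members yields the member offsets listed
    in the comments of example (1) and a total size of 160.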
951
952    (2) Replacing bindable_uniform with an array of pointers:
953
954        #version 120
955        #extension GL_NV_shader_buffer_load : require
956        #extension GL_EXT_bindable_uniform : require
957
958        in vec4 **ptr;
959        in uvec2 whichbuf;
960
961        void main() {
962            gl_FrontColor = ptr[whichbuf.x][whichbuf.y];
963            gl_Position = ftransform();
964        }
965
966        in the GL code, assuming the buffer object setup in the Overview:
967
968        glBindAttribLocation(program, 8, "ptr");
969        glBindAttribLocation(program, 9, "whichbuf");
970        glLinkProgram(program);
971        glBegin(...);
972        glVertexAttribI2iEXT(8, (unsigned int)pointerBufferAddr,
973                                (unsigned int)(pointerBufferAddr>>32));
974        for (i = ...) {
975            for (j = ...) {
976                glVertexAttribI2iEXT(9, i, j);
977                glVertex3f(...);
978            }
979        }
980        glEnd();
981
982
983New State
984
985    Update Table 6.11, p. 349 (Buffer Object State)
986
987    Get Value                   Type    Get Command                  Initial Value   Sec     Attribute
988    ---------                   ----    -----------                  -------------   ---     ---------
989    BUFFER_GPU_ADDRESS_NV       Z64+    GetBufferParameterui64vNV    0               2.9     none
990
991    Update Table 6.46, p. 384 (Implementation Dependent Values)
992
993    Get Value                   Type    Get Command                  Minimum Value   Sec     Attribute
994    ---------                   ----    -----------                  -------------   ---     ---------
995    MAX_SHADER_BUFFER_ADDRESS_NV Z64+   GetIntegerui64vNV            0xFFFFFFFF      2.X.2   none
996
997Dependencies on NV_gpu_program4:
998
999    This extension is generally written against the NV_gpu_program4
1000    wording, program grammar, etc., but doesn't have specific
1001    dependencies on its functionality.
1002
1003
1004Issues
1005
1006    1) Only buffer objects?
1007
1008    RESOLVED: YES, for now. Buffer objects are unformatted memory and
1009    easily mapped to a "pointer"-style shading language.
1010
1011    2) Should we allow writes?
1012
1013    RESOLVED: NO, deferred to a later extension. Writes involve
1014    specifying many kinds of synchronization primitives. Writes are also
1015    a "side effect" which makes program execution "observable" in cases
1016    where it may not have otherwise been (e.g. early-Z can kill fragments
1017    before shading, or a post-transform cache may prevent vertex program
1018    execution).
1019
1020    3) What happens if an invalid pointer is fetched?
1021
1022    UNRESOLVED: Unpredictable results, including program termination?
1023    Make the driver trap the error and report it (still unpredictable
1024    results, but no program termination)? My preference would be to
1025    at least report the faulting address (roughly), whether it was
1026    a read or a write, and which shader stage faulted. I'd like to not
1027    terminate the program, but the app has to assume all their data
1028    stored in the GL is lost.
1029
1030    4) What should this extension be named?
1031
1032    RESOLVED: NV_shader_buffer_load. Rather than trying to choose an
1033    overly-general name and naming future extensions "GL_XXX2", let's
1034    name this according to the specific functionality it provides.
1035
1036    5) What are the performance characteristics of buffer loads?
1037
1038    RESOLVED: Likely somewhere between uniforms and texture fetches,
1039    but totally implementation-dependent. Uniforms still serve a purpose
1040    for "program locals". Buffer loads may have different caching
1041    behavior than either uniforms or texture fetches, but the expectation
1042    is that they will be cached reads of memory and all the common sense
1043    guidelines to try to maintain locality of reference apply.
1044
1045    6) What does MakeBufferResidentNV do? Why not just have a
1046    MapBufferGPUNV?
1047
1048    RESOLVED: Reserving virtual address space only requires knowing the
1049    size of the data store, so an explicit MapBufferGPU call isn't
1050    necessary. If all GPUs supported demand paging, a GPU address might
1051    be sufficient, but without that assumption MakeBufferResidentNV serves
1052    as a hint to the driver that it needs to page lock memory, download
1053    the buffer contents into GPU-accessible memory, or other similar
1054    preparation. MapBufferGPU would also imply that a different address
1055    may be returned each time it is mapped, which could be cumbersome
1056    for the application to handle.
1057
1058    7) Is it an error to render while any resident buffer is mapped?
1059
1060    RESOLVED: No. As the number of attachment points in the context grows,
1061    even the existing error check is falling out of favor.
1062
1063    8) Does MapBuffer stall on pending use of a resident buffer?
1064
1065    RESOLVED: No. The existing language is:
1066
1067        "If the GL is able to map the buffer object's data store into the
1068         client's address space, MapBuffer returns the pointer value to
1069         the data store once all pending operations on that buffer have
1070         completed."
1071
1072    However, since the implementation has no information about how the
1073    buffer is used, "all pending operations" amounts to a Finish. In
1074    terms of sharing across contexts/threads, ARB_vertex_buffer_object
1075    says:
1076
1077        "How is synchronization enforced when buffer objects are shared by
1078         multiple OpenGL contexts?
1079
1080         RESOLVED: It is generally the clients' responsibility to
1081         synchronize modifications made to shared buffer objects."
1082
1083    So we shouldn't dictate any additional shared object synchronization.
1084    So the best we could do is a Finish, but it's not clear that this
1085    accomplishes anything for the application since they can just as
1086    easily call Finish. Or if they don't want synchronization, they can
1087    use MAP_UNSYNCHRONIZED_BIT. It seems the resolution to this is
1088    inconsequential as GL already provides the tools to achieve either
1089    behavior. Hence, don't bother stalling.
1090
1091    However, if a buffer was previously resident and has since been made
1092    non-resident, the implementation should enforce the stalling
1093    behavior for those pending operations from before it was made non-
1094    resident.
1095
1096    9) Given issue (8), what are some effective ways to load data into
1097    a buffer that is resident?
1098
1099    RESOLVED: There are several possibilities:
1100
1101    - BufferSubData.
1102
1103    - The application may track using Fences which parts of the buffer
1104      are actually in use and update them with CPU writes using
1105      MAP_UNSYNCHRONIZED_BIT. This is potentially error-prone, as
1106      described in ARB_copy_buffer.
1107
1108    - CopyBufferSubData. ARB_copy_buffer describes a simple usage example
1109      for a single-threaded application. Since this extension is targeted
1110      at reducing the CPU bottleneck in the rendering thread, offloading
1111      some of the work to other threads may be useful.
1112
1113      Example with a single Loading thread and Rendering thread:
1114
1115          Loading thread:
1116              while (1) {
1117                  WaitForEvent(something to do);
1118
1119                  NamedBufferData(tempBuffer, updateSize, NULL, STREAM_DRAW);
1120                  ptr = MapNamedBuffer(tempBuffer, WRITE_ONLY);
1121                  // fill ptr
1122                  UnmapNamedBuffer(tempBuffer);
1123                  // the buffer could have been filled via BufferData, if
1124                  // that's more natural.
1125
1126                  // send tempBuffer name to Rendering thread
1127              }
1128          Rendering thread:
1129              foreach (obj in scene) {
1130                  if (obj has changed) {
1131                      // get tempBuffer name from Loading thread
1132
1133                      NamedCopyBufferSubData(tempBuffer, objBuf, 0, objOffset, updateSize);
1134                  }
1135                  Draw(obj);
1136              }
1137
1138      If we further desire to offload the data transfer to another
1139      thread, and the implementation supports concurrent data transfers
1140      in one context/thread while rendering in another context/thread,
1141      this may also be accomplished thusly:
1142
1143          Loading thread:
1144              while (1) {
1145                  WaitForEvent(something to do);
1146
1147                  NamedBufferData(sysBuffer, updateSize, NULL, STREAM_DRAW);
1148                  ptr = MapNamedBuffer(sysBuffer, WRITE_ONLY);
1149                  // fill ptr
1150                  UnmapNamedBuffer(sysBuffer);
1151
1152                  NamedBufferData(vidBuffer, updateSize, NULL, STREAM_COPY);
1153                  // This is a sysmem->vidmem blit.
1154                  NamedCopyBufferSubData(sysBuffer, vidBuffer, 0, 0, updateSize);
1155                  SetFence(fenceId, ALL_COMPLETED);
1156
1157                  // send vidBuffer name and fenceId to Rendering thread
1158
1159                  // This could have been a BufferSubData directly into
1160                  // vidBuffer, if that's more natural.
1161              }
1162          Rendering thread:
1163              foreach (obj in scene) {
1164                  if (obj has changed) {
1165                      // get vidBuffer name and fenceId from Loading thread
1166
1167                      // note: there aren't any sharable fences currently,
1168                      // actually need to ask the loading thread when it
1169                      // has finished.
1170                      FinishFence(fenceId);
1171
1172                      // This is hopefully a fast vidmem->vidmem blit.
1173                      NamedCopyBufferSubData(vidBuffer, objBuffer, 0, objOffset, updateSize);
1174                  }
1175                  Draw(obj);
1176              }
1177
1178      In both of these examples, the point at which the data is written to
1179      the resident buffer's data store is clearly specified in order
1180      with rendering commands. This resolves a whole class of
1181      synchronization bugs (Write After Read hazard) that
1182      MAP_UNSYNCHRONIZED_BIT is prone to.
1183
1184    10) What happens if BufferData is called on a buffer that is resident?
1185
1186    RESOLVED: BufferData is specified to "delete the existing data store",
1187    so the GPU address of that data should become invalid. The buffer is
1188    therefore made non-resident in the current context.
1189
1190    11) Should residency be a property of the buffer object, or should
1191    a buffer be "made resident to a context"?
1192
1193    RESOLVED: Made resident to a context. If a shared buffer is used in
1194    two threads/contexts, it may be difficult for the application to know
1195    when the residency state actually changes on the shared object,
1196    particularly if there is a large latency between commands being
1197    submitted on the client and processed on the server. Allowing the
1198    buffer to be made resident to each context individually allows the
1199    state to be reliably toggled in-order in each command stream. This
1200    also allows MakeBufferNonResidentNV to serve as an indication to the GL
1201    that the buffer is no longer in use in each command stream.
1202
1203    This leads to an unfortunate orphaning issue. For example, if the
1204    buffer is resident in context A and then deleted in context B, how
1205    can the app make it non-resident in context A? Given the name-based
1206    object model, it is impossible. It would be complex from an
1207    implementation point of view for DeleteBuffers (or BufferData) to
1208    either make it non-resident or throw an error if it is resident in
1209    some other context.
1210
1211    An ideal solution would be a (separate) extension that allows the
1212    application to increment the refcount on the object and to decrement
1213    the refcount without necessarily deleting the object's name. Until
1214    such an extension exists, the unsatisfying proposed resolution is that
1215    a buffer can be "stuck" resident until the context is deleted. Note
1216    that DeleteBuffers should make the buffer non-resident in the context
1217    that does the delete, so this problem only applies to rare multi-
1218    context corner cases.
1219
1220    12) Is there any value in requiring an "immutable structure" bit of
1221    state to be set in order to query the address?
1222
1223    RESOLVED: NO. Given that the BufferData behavior is fairly
1224    straightforward to specify and implement, it's not clear that this
1225    would be useful.
1226
1227    13) What should the program syntax look like?
1228
1229    RESOLVED: Support 1-, 2-, 4-vec fetches of float/int/uint types, as
1230    well as 8- and 16-bit int/uint fetches via a new LOAD instruction
1231    with a slew of suffixes. Handling 8/16bit sizes will be useful for
1232    high-level languages compiling to the assembly. Addresses are required
1233    to be a multiple of the size of the data, as some implementations may
1234    require this.
1235
1236    Other options include a more x86-style pointer dereference
1237    ("MOV R0, DWORD PTR[R1];") or a complement to program.local
1238    ("MOV R0, program.global[R1];") but neither of these provides the
1239    simple granularity of the explicit type suffixes, and a new
1240    instruction is convenient in terms of implementation and not muddling
1241    the clean definition of MOV.
1242
1243    14) How does the GL know to invalidate caches when data has changed?
1244
1245    RESOLVED: Any entry points that can write to buffer objects should
1246    trigger the necessary invalidation. A new entry point may only be
1247    necessary once there is a way to write to a buffer by GPU address.
1248
1249    15) Does this extension require 64bit register/operation support in
1250        programs and shaders?
1251
1252    RESOLVED: NO. At the API level, GPU addresses are always 64bit values
1253    and when they are stored in uniforms, attribs, parameters, etc. they
1254    should always be stored at full precision. However, if programs and
1255    shaders don't support 64bit registers/operations via another
1256    programmability extension, then they will need to use only 32 bits.
1257    On such implementations, the usable address space is therefore limited
1258    to 4GB. Such a limit should be reflected in the value of
1259    MAX_SHADER_BUFFER_ADDRESS_NV.
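
    An application that wants to confirm a buffer lies entirely within the
    usable address space could compare its address range against the
    queried limit.  The C helper below is a hypothetical sketch of that
    comparison; it assumes the address, size, and limit have already been
    obtained from the GL.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True when every byte of [addr, addr + size) lies at or below max_addr. */
static bool range_is_addressable(uint64_t addr, uint64_t size,
                                 uint64_t max_addr)
{
    assert(size > 0);
    if (addr > max_addr)
        return false;
    /* Compare against the remaining space to avoid overflow in addr + size. */
    return size - 1 <= max_addr - addr;
}
```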
1260
1261    It is expected that GLSL shaders will be compiled in such a way as to
1262    generate 64bit pointers on implementations that support it and 32bit
1263    pointers on implementations that don't. So GLSL shaders written against
1264    a 32bit implementation can be expected to be forward-compatible when
1265    run against a 64bit implementation. (u)intptr_t types are provided to
1266    ease this compatibility.
1267
1268    Built-in functions are provided to convert pointers to and from a pair
1269    of integers. These can be used to pass pointers as two components of a
1270    generic attrib, to construct a pointer from an RG32UI texture fetch,
1271    or to write a pointer to a fragment shader output.
1272
1273    16) What assumption can applications make about the alignment of
1274    addresses returned by GetBufferParameterui64vNV?
1275
1276    RESOLVED: All buffers will begin at an address that is a multiple of
1277    16 bytes.
1278
1279    17) How can the application guarantee that the layout of a structure
1280        on the CPU matches the layout used by the GLSL compiler?
1281
1282    RESOLVED: Provide a standard set of packing rules designed around
1283    naturally aligning simple types. This spec will define pointer fetches
1284    in GLSL to use these rules, but does not explicitly guarantee that
1285    other extensions (like EXT_bindable_uniform) will use the same packing
1286    rules for their buffer object fetches. These packing rules are
1287    different from the ARB_uniform_buffer_object rules - in particular,
1288    these rules do not require vec4 padding of the array stride.
1289
1290    18) Is the address space per-context, per-share-group, or global?
1291
1292    RESOLVED: It is per-share-group. Using addresses from one share group
1293    in another share group will cause undefined results.
1294
1295    19) Is there risk of using invalid pointers for "killed" fragments,
1296    fragments that don't take a certain branch of an "if" block, or
1297    fragments whose shader is conceptually never executed due to pixel
1298    ownership, stipple, etc.?
1299
1300    RESOLVED: NO. OpenGL implementations sometimes run fragment programs
1301    on "helper" pixels that have no coverage, or continue to run fragment
1302    programs on killed pixels in order to be able to compute sane partial
1303    derivatives for fragment program instructions (DDX, DDY) or automatic
1304    level-of-detail calculations for texturing.  In this approach,
1305    derivatives are approximated by computing the difference in a quantity
1306    computed for a given fragment at (x,y) and a fragment at a neighboring
1307    pixel.  When a fragment program is executed on a "helper" pixel or
1308    killed pixel, global loads may not be executed in order to prevent
1309    spurious faults. Helper pixels aren't explicitly mentioned in the spec
1310    body; instead, partial derivatives are obtained by magic.
1311
1312    If a fragment program contains a KIL instruction, compilers may not
1313    reorder code such that a LOAD instruction is executed before a KIL
1314    instruction that logically precedes it in flow control.  Once a
1315    fragment is killed, subsequent loads should never be executed if they
1316    could cause any observable side effects.
1317
1318    As a result, if a shader uses instructions that explicitly or
1319    implicitly do LOD calculations dependent on the result of a global
1320    load, those instructions will have undefined results.
1321
1322    20) How are structures and arrays stored in buffer object memory?
1323
1324    RESOLVED:  Individual structure members and array elements are stored
1325    "packed" in memory, subject to an alignment requirement.  Structure
1326    members are stored according to the order of declaration.  Array elements
1327    are stored consecutively by element number.  Unreferenced structure
1328    members or array elements are never eliminated.
1329
1330    The alignment requirement of individual structure members or array
1331    elements is usually equal to the size of the item.  For the purposes of
1332    this requirement, vector types are treated atomically (i.e., a "vec4" with
1333    32-bit floats will be 16-byte aligned).  One exception is that the
1334    required alignment of three-component vectors is the same as the required
1335    alignment of a four-component vector of the same base type.

    21) How do the memory layout rules relate to the similar layout rules
    specified for the uniform buffer object (UBO) feature incorporated in
    OpenGL 3.1?

    RESOLVED:  This extension was completed prior to OpenGL 3.1, but the
    layout rules for this extension and for UBO were developed roughly
    concurrently.  The layout rules here are nearly identical to those for the
    "std140" layout for uniform blocks.  The main difference here is that
    "std140" requires arrays of small types (e.g., "float") to be padded out
    to vec4 alignment (16B), while this extension does not.
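
    To make the difference concrete, the following C sketch compares array
    element strides under the rule described here with the "std140" array
    rule.  The function names are hypothetical, and only arrays of scalars
    and vectors are considered.

```c
#include <assert.h>
#include <stddef.h>

/* Element stride for an array of scalars or vectors under the rule
 * described in this extension: elements are stored consecutively and
 * aligned to their own size, with vec3 aligned like vec4. */
static size_t stride_packed(size_t scalar_size, unsigned components)
{
    assert(components >= 1 && components <= 4);
    if (components == 3)
        components = 4;
    return scalar_size * components;
}

/* Element stride under "std140": the element alignment is additionally
 * rounded up to the alignment of a vec4 (16 bytes). */
static size_t stride_std140(size_t scalar_size, unsigned components)
{
    size_t stride = stride_packed(scalar_size, components);
    return (stride + 15) / 16 * 16;
}
```

    With 32-bit floats, a 100-element "float" array would occupy 400 bytes
    under this extension's rule and 1600 bytes under "std140".  Arrays of
    vec3 have a 16-byte stride in both cases.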

    Note that this extension does NOT allow shaders to use the layout()
    qualifier added by GLSL 1.40 to achieve fine-grained control of structure
    or array layout using pointers.  A subsequent extension could provide this
    capability.

    22) Should we provide a mechanism for tighter packing of an array of
    three-component vectors?

    RESOLVED:  This could be desirable, but it won't be provided in this
    extension.  A subsequent extension could support alternate layouts by
    allowing shaders to use the GLSL 1.40 layout() modifier to qualify
    pointer types.

    If tight packing of vec3's is strongly required, a three-component array
    element could be constructed using three single-component loads or by
    selecting/swizzling components of one or more larger loads.  The former
    technique could be done using GLSL by replacing:

      vec3 *pointer;
      vec3 elementN;
      int n;
      elementN = pointer[n];

    with

      float *pointer;
      vec3 elementN;
      int n;
      elementN = vec3(pointer[n*3], pointer[n*3+1], pointer[n*3+2]);
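
    The same indexing can be mimicked on the CPU, for example when an
    application packs vec3 data tightly into a buffer object before a shader
    reads it through a "float *" pointer.  The following C sketch is only an
    illustration: vec3f, load_packed_vec3, offset_tight, and offset_padded
    are hypothetical names, and 32-bit floats are assumed.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical CPU-side stand-in for a GLSL vec3. */
typedef struct { float x, y, z; } vec3f;

/* Read element n from an array of tightly packed three-component
 * vectors using three single-component loads, mirroring the GLSL
 * replacement shown above. */
static vec3f load_packed_vec3(const float *pointer, size_t n)
{
    assert(pointer != NULL);
    vec3f v = { pointer[n * 3], pointer[n * 3 + 1], pointer[n * 3 + 2] };
    return v;
}

/* Byte offset of element n when vec3 elements are tightly packed. */
static size_t offset_tight(size_t n)
{
    return n * 3 * sizeof(float);
}

/* Byte offset of element n when each vec3 is aligned like a vec4,
 * as this extension requires for "vec3 *" pointers. */
static size_t offset_padded(size_t n)
{
    return n * 4 * sizeof(float);
}
```

    Tight packing saves one scalar of storage per element at the cost of
    issuing three loads instead of one.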


Revision History

    Rev.    Date    Author    Changes
    ----  --------  --------  -----------------------------------------
     8    08/06/10  istewart  Modify behavior of named buffer functions
                              to match those of EXT_direct_state_access.
                              Add INVALID_OPERATION error to
                              MakeBufferResidentNV and GetBufferParameterui64vNV
                              if the buffer object has no data store.

     7    06/22/10  pbrown    Document INVALID_OPERATION errors on
                              residency management and query APIs when a
                              non-existent buffer object is referenced,
                              when trying to make an already resident buffer
                              resident, or when trying to make an already
                              non-resident buffer non-resident.

     6    09/21/09  groth     Fix non-conformant DSA function names.

     5    09/10/09  Jon Leech Add 'const' to type of Uniformui64vNV and
                              ProgramUniformui64vNV 'count' argument.

     4    09/09/09  mjk       Fix typos

     3    08/21/09  pbrown    Add explicit spec language describing the
                              typecast operator implemented here.  The
                              previous spec language said it was allowed
                              but didn't say what it did.

     2    08/05/09  pbrown    Update section describing memory layout of
                              variables pointed to; moved to the core
                              specification as with OpenGL 3.1's uniform
                              buffer layout.  Added a few issues on memory
                              layout.  Explicitly documented the set of
                              operations and implicit conversions allowed
                              on pointers.

     1              jbolz     Internal revisions.
