1Name
2
3    EXT_shader_image_load_store
4
5Name Strings
6
7    GL_EXT_shader_image_load_store
8
9Contact
10
11    Jeff Bolz, NVIDIA Corporation (jbolz 'at' nvidia.com)
12    Pat Brown, NVIDIA Corporation (pbrown 'at' nvidia.com)
13
14Contributors
15
16    Barthold Lichtenbelt, NVIDIA
17    Bill Licea-Kane, AMD
18    Eric Werness, NVIDIA
19    Graham Sellers, AMD
20    Greg Roth, NVIDIA
21    Nick Haemel, AMD
22    Pierre Boudier, AMD
23    Piers Daniell, NVIDIA
24
25Status
26
27    Shipping.
28
29Version
30
31    Last Modified Date:         10/16/2013
32    NVIDIA Revision:            7
33
34Number
35
36    386
37
38Dependencies
39
40    This extension is written against the OpenGL 3.2 specification
41    (Compatibility Profile).
42
43    This extension is written against version 1.50 (revision 09) of the OpenGL
44    Shading Language Specification.
45
46    OpenGL 3.0 and GLSL 1.30 are required.
47
48    This extension interacts trivially with OpenGL 3.2 (Core Profile).
49
50    This extension interacts trivially with OpenGL 3.1,
51    ARB_uniform_buffer_object, and EXT_bindable_uniform.
52
53    This extension interacts trivially with ARB_draw_indirect.
54
55    This extension interacts trivially with NV_vertex_buffer_unified_memory.
56
57    This extension interacts trivially with OpenGL 3.2 and
58    ARB_texture_multisample.
59
60    This extension interacts trivially with OpenGL 4.0 and ARB_sample_shading.
61
62    This extension interacts trivially with OpenGL 4.0 and
63    ARB_texture_cube_map_array.
64
65    This extension interacts trivially with OpenGL 3.3 and
66    ARB_texture_rgb10_a2ui.
67
68    This extension interacts trivially with NV_shader_buffer_load.
69
70    This extension interacts trivially with OpenGL 4.0, ARB_gpu_shader5, and
71    NV_gpu_shader5.
72
73    This extension interacts trivially with OpenGL 4.0 and
74    ARB_tessellation_shader.
75
76    This extension interacts trivially with EXT_depth_bounds_test.
77
78    This extension interacts with EXT_separate_shader_objects.
79
80    This extension interacts with NV_gpu_program5.
81
82Overview
83
84    This extension provides GLSL built-in functions allowing shaders to load
85    from, store to, and perform atomic read-modify-write operations to a
86    single level of a texture object from any shader stage.  These built-in
87    functions are named imageLoad(), imageStore(), and imageAtomic*(),
88    respectively, and accept integer texel coordinates to identify the texel
89    accessed.  The extension adds the notion of "image units" to the OpenGL
90    API, to which texture levels are bound for access by the GLSL built-in
91    functions.  To allow shaders to specify the image unit to access, GLSL
92    provides a new set of data types ("image*") similar to samplers.  Each
93    image variable is assigned an integer value to identify an image unit to
94    access, which is specified using Uniform*() APIs in a manner similar to
95    samplers.  For implementations supporting the NV_gpu_program5 extension,
96    assembly language instructions to perform image loads, stores, and atomics
97    are also provided.
98
99    This extension also provides the capability to explicitly enable "early"
100    per-fragment tests, where operations like depth and stencil testing are
101    performed prior to fragment shader execution.  In unextended OpenGL,
102    fragment shaders never have any side effects and implementations can
103    sometimes perform per-fragment tests and discard some fragments prior to
104    executing the fragment shader.  Since this extension allows fragment
105    shaders to write to texture and buffer object memory using the built-in
106    image functions, such optimizations could lead to non-deterministic
107    results.  To avoid this, implementations supporting this extension must not
108    perform such optimizations on shaders having such side effects.  However,
109    enabling early per-fragment tests guarantees that such tests will be
110    performed prior to fragment shader execution, and ensures that image
111    stores and atomics will not be performed by fragment shader invocations
112    where these per-fragment tests fail.
113
114    Finally, this extension provides both a GLSL built-in function and an
115    OpenGL API function allowing applications some control over the ordering
116    of image loads, stores, and atomics relative to other OpenGL pipeline
117    operations accessing the same memory.  Because the extension provides the
118    ability to perform random accesses to texture or buffer object memory,
119    such accesses are not easily tracked by the OpenGL driver.  To avoid the
120    need for heavy-handed synchronization at the driver level, this extension
121    requires manual synchronization.  The MemoryBarrierEXT() OpenGL API
122    function allows applications to specify a bitfield indicating the set of
123    OpenGL API operations to synchronize relative to shader memory access.
124    The memoryBarrier() GLSL built-in function provides a synchronization
125    point within a given shader invocation to ensure that all memory accesses
126    performed prior to the synchronization point complete prior to any started
127    after the synchronization point.
128
129New Procedures and Functions
130
131    void BindImageTextureEXT(uint index, uint texture, int level,
132                             boolean layered, int layer, enum access,
133                             int format);
134
135    void MemoryBarrierEXT(bitfield barriers);
136
137New Tokens
138
139    Accepted by the <pname> parameter of GetBooleanv, GetIntegerv,
140    GetFloatv, and GetDoublev:
141
142        MAX_IMAGE_UNITS_EXT                             0x8F38
143        MAX_COMBINED_IMAGE_UNITS_AND_FRAGMENT_OUTPUTS_EXT 0x8F39
144        MAX_IMAGE_SAMPLES_EXT                           0x906D
145
146    Accepted by the <target> parameter of GetIntegeri_v and GetBooleani_v:
147
148        IMAGE_BINDING_NAME_EXT                          0x8F3A
149        IMAGE_BINDING_LEVEL_EXT                         0x8F3B
150        IMAGE_BINDING_LAYERED_EXT                       0x8F3C
151        IMAGE_BINDING_LAYER_EXT                         0x8F3D
152        IMAGE_BINDING_ACCESS_EXT                        0x8F3E
153        IMAGE_BINDING_FORMAT_EXT                        0x906E
154
155    Accepted by the <barriers> parameter of MemoryBarrierEXT:
156
157        VERTEX_ATTRIB_ARRAY_BARRIER_BIT_EXT             0x00000001
158        ELEMENT_ARRAY_BARRIER_BIT_EXT                   0x00000002
159        UNIFORM_BARRIER_BIT_EXT                         0x00000004
160        TEXTURE_FETCH_BARRIER_BIT_EXT                   0x00000008
161        SHADER_IMAGE_ACCESS_BARRIER_BIT_EXT             0x00000020
162        COMMAND_BARRIER_BIT_EXT                         0x00000040
163        PIXEL_BUFFER_BARRIER_BIT_EXT                    0x00000080
164        TEXTURE_UPDATE_BARRIER_BIT_EXT                  0x00000100
165        BUFFER_UPDATE_BARRIER_BIT_EXT                   0x00000200
166        FRAMEBUFFER_BARRIER_BIT_EXT                     0x00000400
167        TRANSFORM_FEEDBACK_BARRIER_BIT_EXT              0x00000800
168        ATOMIC_COUNTER_BARRIER_BIT_EXT                  0x00001000
169        ALL_BARRIER_BITS_EXT                            0xFFFFFFFF
170
171    Returned by the <type> parameter of GetActiveUniform:
172
173        IMAGE_1D_EXT                                    0x904C
174        IMAGE_2D_EXT                                    0x904D
175        IMAGE_3D_EXT                                    0x904E
176        IMAGE_2D_RECT_EXT                               0x904F
177        IMAGE_CUBE_EXT                                  0x9050
178        IMAGE_BUFFER_EXT                                0x9051
179        IMAGE_1D_ARRAY_EXT                              0x9052
180        IMAGE_2D_ARRAY_EXT                              0x9053
181        IMAGE_CUBE_MAP_ARRAY_EXT                        0x9054
182        IMAGE_2D_MULTISAMPLE_EXT                        0x9055
183        IMAGE_2D_MULTISAMPLE_ARRAY_EXT                  0x9056
184        INT_IMAGE_1D_EXT                                0x9057
185        INT_IMAGE_2D_EXT                                0x9058
186        INT_IMAGE_3D_EXT                                0x9059
187        INT_IMAGE_2D_RECT_EXT                           0x905A
188        INT_IMAGE_CUBE_EXT                              0x905B
189        INT_IMAGE_BUFFER_EXT                            0x905C
190        INT_IMAGE_1D_ARRAY_EXT                          0x905D
191        INT_IMAGE_2D_ARRAY_EXT                          0x905E
192        INT_IMAGE_CUBE_MAP_ARRAY_EXT                    0x905F
193        INT_IMAGE_2D_MULTISAMPLE_EXT                    0x9060
194        INT_IMAGE_2D_MULTISAMPLE_ARRAY_EXT              0x9061
195        UNSIGNED_INT_IMAGE_1D_EXT                       0x9062
196        UNSIGNED_INT_IMAGE_2D_EXT                       0x9063
197        UNSIGNED_INT_IMAGE_3D_EXT                       0x9064
198        UNSIGNED_INT_IMAGE_2D_RECT_EXT                  0x9065
199        UNSIGNED_INT_IMAGE_CUBE_EXT                     0x9066
200        UNSIGNED_INT_IMAGE_BUFFER_EXT                   0x9067
201        UNSIGNED_INT_IMAGE_1D_ARRAY_EXT                 0x9068
202        UNSIGNED_INT_IMAGE_2D_ARRAY_EXT                 0x9069
203        UNSIGNED_INT_IMAGE_CUBE_MAP_ARRAY_EXT           0x906A
204        UNSIGNED_INT_IMAGE_2D_MULTISAMPLE_EXT           0x906B
205        UNSIGNED_INT_IMAGE_2D_MULTISAMPLE_ARRAY_EXT     0x906C
206
207
208Additions to Chapter 2 of the OpenGL 3.2 (Compatibility Profile) Specification
209(OpenGL Operation)
210
211    (Add new types to table 2.13, pp. 96-98)
212
213      Type Name                                    Keyword
214      ------------------------------               -------------------------
215      IMAGE_1D_EXT                                 image1D
216      IMAGE_2D_EXT                                 image2D
217      IMAGE_3D_EXT                                 image3D
218      IMAGE_2D_RECT_EXT                            image2DRect
219      IMAGE_CUBE_EXT                               imageCube
220      IMAGE_BUFFER_EXT                             imageBuffer
221      IMAGE_1D_ARRAY_EXT                           image1DArray
222      IMAGE_2D_ARRAY_EXT                           image2DArray
223      IMAGE_CUBE_MAP_ARRAY_EXT                     imageCubeArray
224      IMAGE_2D_MULTISAMPLE_EXT                     image2DMS
225      IMAGE_2D_MULTISAMPLE_ARRAY_EXT               image2DMSArray
226      INT_IMAGE_1D_EXT                             iimage1D
227      INT_IMAGE_2D_EXT                             iimage2D
228      INT_IMAGE_3D_EXT                             iimage3D
229      INT_IMAGE_2D_RECT_EXT                        iimage2DRect
230      INT_IMAGE_CUBE_EXT                           iimageCube
231      INT_IMAGE_BUFFER_EXT                         iimageBuffer
232      INT_IMAGE_1D_ARRAY_EXT                       iimage1DArray
233      INT_IMAGE_2D_ARRAY_EXT                       iimage2DArray
234      INT_IMAGE_CUBE_MAP_ARRAY_EXT                 iimageCubeArray
235      INT_IMAGE_2D_MULTISAMPLE_EXT                 iimage2DMS
236      INT_IMAGE_2D_MULTISAMPLE_ARRAY_EXT           iimage2DMSArray
237      UNSIGNED_INT_IMAGE_1D_EXT                    uimage1D
238      UNSIGNED_INT_IMAGE_2D_EXT                    uimage2D
239      UNSIGNED_INT_IMAGE_3D_EXT                    uimage3D
240      UNSIGNED_INT_IMAGE_2D_RECT_EXT               uimage2DRect
241      UNSIGNED_INT_IMAGE_CUBE_EXT                  uimageCube
242      UNSIGNED_INT_IMAGE_BUFFER_EXT                uimageBuffer
243      UNSIGNED_INT_IMAGE_1D_ARRAY_EXT              uimage1DArray
244      UNSIGNED_INT_IMAGE_2D_ARRAY_EXT              uimage2DArray
245      UNSIGNED_INT_IMAGE_CUBE_MAP_ARRAY_EXT        uimageCubeArray
246      UNSIGNED_INT_IMAGE_2D_MULTISAMPLE_EXT        uimage2DMS
247      UNSIGNED_INT_IMAGE_2D_MULTISAMPLE_ARRAY_EXT  uimage2DMSArray
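
    The token values listed under New Tokens are contiguous and fall into
    three groups of eleven (floating-point, signed integer, unsigned
    integer), in the same order as the keywords in this table.  The sketch
    below shows one way an application might turn a <type> value returned by
    GetActiveUniform into its keyword.  The helper name image_type_keyword
    and its return convention are hypothetical and are not defined by this
    extension.

```c
#include <stdio.h>
#include <stddef.h>

/* First and last image type tokens, copied from the New Tokens section. */
#define IMAGE_1D_EXT                                0x904Cu
#define UNSIGNED_INT_IMAGE_2D_MULTISAMPLE_ARRAY_EXT 0x906Cu

/* Hypothetical helper: writes the GLSL keyword for an image type token
 * into buf.  Returns 0 on success, or -1 if type is not an image token or
 * buf is too small to hold the keyword. */
static int image_type_keyword(unsigned type, char *buf, size_t size)
{
    /* Three groups (float, int, uint) of eleven dimensionalities each. */
    static const char *const prefix[3] = { "", "i", "u" };
    static const char *const suffix[11] = {
        "image1D", "image2D", "image3D", "image2DRect", "imageCube",
        "imageBuffer", "image1DArray", "image2DArray", "imageCubeArray",
        "image2DMS", "image2DMSArray"
    };
    unsigned offset;
    int written;

    if (type < IMAGE_1D_EXT ||
        type > UNSIGNED_INT_IMAGE_2D_MULTISAMPLE_ARRAY_EXT)
        return -1;

    offset = type - IMAGE_1D_EXT;
    written = snprintf(buf, size, "%s%s",
                       prefix[offset / 11], suffix[offset % 11]);
    return (written < 0 || (size_t)written >= size) ? -1 : 0;
}
```

    The lookup depends only on the numeric layout of the tokens; a plain
    table of all 33 entries would work equally well.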
248
249
250    (Add a new subsection after Section 2.14.5, Samplers, p. 106)
251
252    Section 2.14.X, Images
253
254    Images are special uniforms used in the OpenGL Shading Language to
255    identify a level of a texture to be read or written using image load,
256    store, and atomic built-in functions in the manner described in Section
257    3.9.X.  The value of an image uniform is an integer specifying the image
258    unit accessed.  Image units are numbered beginning at zero, and there is
259    an implementation-dependent number of available image units
260    (MAX_IMAGE_UNITS_EXT).  The error INVALID_VALUE is generated if a
261    Uniform1i{v} call is used to set an image uniform to a value less than
262    zero or greater than or equal to MAX_IMAGE_UNITS_EXT.  Note that image
263    units used for image variables are independent of the texture image
264    units used for sampler variables; the number of units provided by the
265    implementation may differ.  Textures are bound independently and
266    separately to image and texture image units.
267
268    The type of an image variable must match the texture target of the image
269    currently bound to the image unit, otherwise the result of the load/
270    store/atomic operation is undefined (see Section 4.1.X of the OpenGL
271    Shading Language specification for more detail).
272
273    The location of an image variable needs to be queried with
274    GetUniformLocation, just like any uniform variable.  Image values need to
275    be set by calling Uniform1i{v}.  Loading image variables with any of the
276    other Uniform* entry points is not allowed and will result in an
277    INVALID_OPERATION error.
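
    The two error rules for setting image uniforms can be modeled as
    follows.  This is a sketch of the checks only, not driver code: the
    names set_image_uniform, sketch_error, and sketch_entry_point are
    hypothetical, max_image_units stands in for the value queried with
    MAX_IMAGE_UNITS_EXT, and the order in which the two checks are applied
    is an assumption, since the text does not rank the errors.

```c
/* Error values as defined in gl.h; prefixed to mark them as local. */
enum sketch_error {
    SKETCH_NO_ERROR          = 0,
    SKETCH_INVALID_VALUE     = 0x0501,
    SKETCH_INVALID_OPERATION = 0x0502
};

/* Hypothetical tags for the Uniform* entry point the application used. */
enum sketch_entry_point {
    SKETCH_UNIFORM1I,
    SKETCH_UNIFORM1IV,
    SKETCH_UNIFORM_OTHER    /* any other Uniform* command */
};

/* Models setting one image uniform.  On success the image unit is stored
 * in *bound_unit; on error *bound_unit is left unchanged. */
static enum sketch_error set_image_uniform(enum sketch_entry_point entry,
                                           int unit, int max_image_units,
                                           int *bound_unit)
{
    /* Only Uniform1i{v} may load an image variable (assumed to be
     * checked first). */
    if (entry != SKETCH_UNIFORM1I && entry != SKETCH_UNIFORM1IV)
        return SKETCH_INVALID_OPERATION;

    /* Valid image units are 0 .. MAX_IMAGE_UNITS_EXT - 1. */
    if (unit < 0 || unit >= max_image_units)
        return SKETCH_INVALID_VALUE;

    *bound_unit = unit;
    return SKETCH_NO_ERROR;
}
```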
278
279    Unlike samplers, there is no limit on the number of active image variables
280    that may be used by a program or by any particular shader.  However, given
281    that there is an implementation-dependent limit on the number of unique
282    image units, the actual number of images that may be used by all shaders
283    in a program is limited.
284
285
286    (Add a new subsection after Section 2.14.7, Shader Execution, p. 109)
287
288    Section 2.14.X, Shader Memory Access
289
290    Shaders may perform random-access reads and writes to texture or buffer
291    object memory using built-in image load, store, and atomic functions, as
292    described in the OpenGL Shading Language Specification.  The ability to
293    perform such random-access reads and writes in a system that may be highly
294    pipelined results in ordering and synchronization issues discussed in the
295    sections below.
296
297
298    Shader Memory Access Ordering
299
300    The order in which texture or buffer object memory is read or written by
301    shaders is largely undefined.  For some shader types (vertex, tessellation
302    evaluation, and, in some cases, fragment), even the number of shader
303    invocations that might perform loads and stores is undefined.  In
304    particular, the following rules apply:
305
306      * While a vertex or tessellation evaluation shader will be executed at
307        least once for each unique vertex specified by the application (vertex
308        shaders) or generated by the tessellation primitive generator
309        (tessellation evaluation shaders), it may be executed more than once
310        for implementation-dependent reasons.  Additionally, if the same
311        vertex is specified multiple times in a collection of primitives
312        (e.g., repeating an index in DrawElements), the vertex shader might be
313        run only once.
314
315      * For each fragment generated by the GL, the number of fragment shader
316        invocations depends on a number of factors.  If the fragment fails the
317        pixel ownership test (Section 4.1.1), the fragment shader may not be
318        executed.  Otherwise, if the framebuffer has no multisample buffer
319        (SAMPLE_BUFFERS is zero), the fragment shader will be invoked exactly
320        once.  If the fragment shader specifies per-sample shading, the
321        fragment shader will be run once per covered sample.  Otherwise, the
322        number of fragment shader invocations is undefined, but must be in the
323        range [1,<N>], where <N> is the number of samples covered by the
324        fragment.
325
326      * If a fragment shader is invoked to process fragments or samples not
327        covered by a primitive being rasterized to facilitate the
328        approximation of derivatives for texture lookups, stores and atomics
329        have no effect.
330
331      * The relative order of invocations of the same shader type is
332        undefined.  A store issued by a shader when working on primitive B
333        might complete prior to a store for primitive A, even if primitive A
334        is specified prior to primitive B.  This applies even to fragment
335        shaders; while fragment shader outputs are written to the framebuffer
336        in primitive order, stores executed by fragment shader invocations are
337        not.
338
339      * The relative order of invocations of different shader types is largely
340        undefined.  However, when executing a shader whose inputs are
341        generated from a previous programmable stage, the shader invocations
342        from the previous stage are guaranteed to have executed far enough to
343        generate final values for all next-stage inputs.  That implies shader
344        completion for all stages except geometry; geometry shaders are
345        guaranteed only to have executed far enough to emit all needed
346        vertices.
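
    The fragment shader rule above can be summarized as a range of possible
    invocation counts per fragment.  The sketch below is one reading of that
    rule, not normative text.  The names invocation_range and
    fragment_invocations are hypothetical, covered_samples is assumed to be
    at least one, and a fragment that fails the pixel ownership test is
    modeled as having a lower bound of zero because the shader "may not be
    executed".

```c
#include <stdbool.h>

/* Inclusive bounds on fragment shader invocations for one fragment. */
struct invocation_range {
    int min;
    int max;
};

static struct invocation_range
fragment_invocations(bool passes_ownership, int sample_buffers,
                     bool per_sample_shading, int covered_samples)
{
    struct invocation_range r;

    if (sample_buffers == 0) {
        /* No multisample buffer: exactly one invocation. */
        r.min = r.max = 1;
    } else if (per_sample_shading) {
        /* Per-sample shading: once per covered sample. */
        r.min = r.max = covered_samples;
    } else {
        /* Otherwise undefined, but within [1, N]. */
        r.min = 1;
        r.max = covered_samples;
    }

    /* Assumption: an ownership failure only relaxes the lower bound. */
    if (!passes_ownership)
        r.min = 0;

    return r;
}
```

    A shader that counts fragments with an image atomic therefore cannot
    assume one increment per fragment unless the range collapses to a single
    value.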
347
348    The above limitations on shader invocation order also make some forms of
349    synchronization between shader invocations within a single set of
350    primitives unimplementable.  For example, having one invocation poll
351    memory written by another invocation assumes that the other invocation has
352    been launched and can complete its writes.  The only case where such a
353    guarantee is made is when the inputs of one shader invocation are
354    generated from the outputs of a shader invocation in a previous stage.
355
356    Stores issued to different memory locations within a single shader
357    invocation may not be visible to other invocations in the order they were
358    performed.  The built-in function memoryBarrier() may be used to provide
359    stronger ordering of reads and writes performed by a single invocation.
360    Calling memoryBarrier() guarantees that any memory transactions issued by
361    the shader invocation prior to the call complete prior to the memory
362    transactions issued after the call.  Memory barriers may be needed for
363    algorithms that require multiple invocations to access the same memory and
364    require the operations to be performed in a partially-defined
365    relative order.  For example, if one shader invocation does a series of
366    writes, followed by a memoryBarrier() call, followed by another write,
367    then another invocation that sees the results of the final write will also
368    see the previous writes.  Without the memory barrier, the final write may
369    be visible before the previous writes.
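
    The write, barrier, write pattern described above has a close CPU-side
    counterpart, sketched here with C11 atomics as an analogy only.  It is
    not GLSL and does not call memoryBarrier(); the release fence plays the
    role of the barrier between the payload writes and the final "ready"
    write.  The mailbox type and both function names are hypothetical, and
    the acquire fence on the reading side is a requirement of the C11 memory
    model rather than something stated in this section.

```c
#include <stdatomic.h>
#include <stdbool.h>

struct mailbox {
    int        payload[2];  /* the "series of writes" */
    atomic_int ready;       /* the "final write" */
};

/* Producer: payload writes, then a fence, then the final write. */
static void producer_publish(struct mailbox *m, int a, int b)
{
    m->payload[0] = a;
    m->payload[1] = b;
    atomic_thread_fence(memory_order_release);  /* barrier analogue */
    atomic_store_explicit(&m->ready, 1, memory_order_relaxed);
}

/* Consumer: if the final write is visible, the payload is visible too. */
static bool consumer_try_read(struct mailbox *m, int out[2])
{
    if (atomic_load_explicit(&m->ready, memory_order_relaxed) == 0)
        return false;
    atomic_thread_fence(memory_order_acquire);
    out[0] = m->payload[0];
    out[1] = m->payload[1];
    return true;
}
```

    Removing the release fence corresponds to the case in the text where
    the final write may become visible before the earlier writes.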
370
371    The atomic memory transaction built-in functions may be used to read and
372    write a given memory address atomically.  While atomic built-in functions
373    issued by multiple shader invocations are executed in undefined order
374    relative to each other, these functions perform both a read and a write of
375    a memory address and guarantee that no other memory transaction will write
376    to the underlying memory between the read and write.  Atomics allow
377    shaders to use shared global addresses for mutual exclusion or as
378    counters, among other uses.
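
    As an illustration of the counter use case, the following CPU-side
    sketch uses a C11 atomic add in place of an image atomic.  It is an
    analogy under stated assumptions, not shader code: reserve_slots is a
    hypothetical name, and the relaxed memory order reflects only the
    guarantee given above, namely that the read and the write of one address
    are indivisible while the order between callers is undefined.

```c
#include <stdatomic.h>

/* Reserves count consecutive slots from a shared counter and returns the
 * index of the first one.  Concurrent callers receive disjoint ranges
 * because no other write can land between the read and the write. */
static unsigned reserve_slots(atomic_uint *counter, unsigned count)
{
    return atomic_fetch_add_explicit(counter, count, memory_order_relaxed);
}
```

    Which caller receives which range is not defined, matching the
    undefined relative order of atomics issued by different invocations.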
379
380
381    Shader Memory Access Synchronization
382
383    Data written to textures or buffer objects by a shader invocation may
384    eventually be read by other shader invocations, sourced by fixed-function
385    pipeline stages, or read back by the application.  When applications write
386    to buffer objects or textures using API commands such as TexSubImage* or
387    BufferSubData, the GL implementation knows when and where writes occur and
388    can perform implicit synchronization to ensure that operations requested
389    before the update see the original data and that subsequent operations see
390    the modified data.  Without logic to track the target address of each
391    shader instruction performing a store, automatic synchronization of stores
392    performed by a shader invocation would require the GL implementation to
393    make worst-case assumptions at significant performance cost.  To permit
394    cases where textures or buffers may be read or written in different
395    pipeline stages without the overhead of automatic synchronization, buffer
396    object and texture stores performed by shaders are not automatically
397    synchronized with other GL operations using the same memory.
398
399    Explicit synchronization is required to ensure that the effects of buffer
400    and texture data stores performed by shaders will be visible to subsequent
401    operations using the same objects and will not overwrite data still to be
402    read by previously requested operations.  Without manual synchronization,
403    shader stores for a "new" primitive may complete before processing of an
404    "old" primitive completes.  Additionally, stores for an "old" primitive
405    might not be completed before processing of a "new" primitive starts.  The
406    command
407
408        void MemoryBarrierEXT(bitfield barriers)
409
410    defines a barrier ordering the memory transactions issued prior to the
411    command relative to those issued after the barrier.  For the purposes of
412    this ordering, memory transactions performed by shaders are considered to
413    be issued by the rendering command that triggered the execution of the
414    shader.  <barriers> is a bitfield indicating the set of operations that
415    are synchronized with shader stores; the bits used in <barriers> are as
416    follows:
417
418    - VERTEX_ATTRIB_ARRAY_BARRIER_BIT_EXT:  If set, vertex data sourced from
419        buffer objects after the barrier will reflect data written by shaders
420        prior to the barrier.  The set of buffer objects affected by this bit
421        is derived from the buffer object bindings or GPU addresses used for
422        generic vertex attributes (VERTEX_ATTRIB_ARRAY_BUFFER bindings,
423        VERTEX_ATTRIB_ARRAY_ADDRESS from NV_vertex_buffer_unified_memory), as
424        well as those for arrays of named vertex attributes (e.g., vertex,
425        color, normal).
426
427    - ELEMENT_ARRAY_BARRIER_BIT_EXT:  If set, vertex array indices sourced from
428        buffer objects after the barrier will reflect data written by shaders
429        prior to the barrier.  The buffer objects affected by this bit are
430        derived from the ELEMENT_ARRAY_BUFFER binding and the
431        NV_vertex_buffer_unified_memory ELEMENT_ARRAY_ADDRESS address.
432
433    - UNIFORM_BARRIER_BIT_EXT:  Shader uniforms and assembly program parameters
434        sourced from buffer objects after the barrier will reflect data
435        written by shaders prior to the barrier.
436
437    - TEXTURE_FETCH_BARRIER_BIT_EXT:  Texture fetches from shaders, including
438        fetches from buffer object memory via buffer textures, after the
439        barrier will reflect data written by shaders prior to the barrier.
440
441    - SHADER_IMAGE_ACCESS_BARRIER_BIT_EXT:  Memory accesses using shader image
442        load, store, and atomic built-in functions issued after the barrier
443        will reflect data written by shaders prior to the barrier.
444        Additionally, image stores and atomics issued after the barrier will
445        not execute until all memory accesses (e.g., loads, stores, texture
446        fetches, vertex fetches) initiated prior to the barrier complete.
447
448    - COMMAND_BARRIER_BIT_EXT:  Command data sourced from buffer objects by
449        Draw*Indirect commands after the barrier will reflect data written by
450        shaders prior to the barrier.  The buffer objects affected by this bit
451        are derived from the DRAW_INDIRECT_BUFFER_EXT binding and the GPU
452        address DRAW_INDIRECT_ADDRESS_EXT.
453
454    - PIXEL_BUFFER_BARRIER_BIT_EXT:  Reads/writes of buffer objects via the
455        PACK/UNPACK_BUFFER bindings (ReadPixels, TexSubImage, etc.) after the
456        barrier will reflect data written by shaders prior to the barrier.
457        Additionally, buffer object writes issued after the barrier will wait
458        on the completion of all shader writes initiated prior to the barrier.
459
460    - TEXTURE_UPDATE_BARRIER_BIT_EXT:  Writes to a texture via Tex(Sub)Image*,
461        CopyTex(Sub)Image*, CompressedTex(Sub)Image*, and reads via
462        GetTexImage after the barrier will reflect data written by shaders
463        prior to the barrier.  Additionally, texture writes from these
464        commands issued after the barrier will not execute until all shader
465        writes initiated prior to the barrier complete.
466
467    - BUFFER_UPDATE_BARRIER_BIT_EXT:  Reads/writes via Buffer(Sub)Data,
468        MapBuffer(Range), CopyBufferSubData, ProgramBufferParameters, and
469        GetBufferSubData after the barrier will reflect data written by
470        shaders prior to the barrier.  Additionally, writes via these commands
471        issued after the barrier will wait on the completion of all shader
472        writes initiated prior to the barrier.
473
474    - FRAMEBUFFER_BARRIER_BIT_EXT:  Reads and writes via framebuffer object
475        attachments after the barrier will reflect data written by shaders
476        prior to the barrier.  Additionally, framebuffer writes issued after
477        the barrier will wait on the completion of all shader writes issued
478        prior to the barrier.
479
480    - TRANSFORM_FEEDBACK_BARRIER_BIT_EXT:  Writes via transform feedback
481        bindings after the barrier will reflect data written by shaders prior
482        to the barrier.  Additionally, transform feedback writes issued after
483        the barrier will wait on the completion of all shader writes issued
484        prior to the barrier.
485
486    - ATOMIC_COUNTER_BARRIER_BIT_EXT: Accesses to atomic counters after the
487        barrier will reflect writes prior to the barrier.
488
489    If <barriers> is ALL_BARRIER_BITS_EXT, shader memory accesses will be
490    synchronized relative to all the operations described above.
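
    Because <barriers> is a bitfield, applications combine the bits above
    with a bitwise OR.  The sketch below copies the values from the New
    Tokens section and builds two masks.  The function names are
    hypothetical, and no GL context is involved; an application would pass
    the resulting value to MemoryBarrierEXT.

```c
#include <stdint.h>

/* Values copied from the New Tokens section. */
#define VERTEX_ATTRIB_ARRAY_BARRIER_BIT_EXT 0x00000001u
#define ELEMENT_ARRAY_BARRIER_BIT_EXT       0x00000002u
#define UNIFORM_BARRIER_BIT_EXT             0x00000004u
#define TEXTURE_FETCH_BARRIER_BIT_EXT       0x00000008u
#define SHADER_IMAGE_ACCESS_BARRIER_BIT_EXT 0x00000020u
#define COMMAND_BARRIER_BIT_EXT             0x00000040u
#define PIXEL_BUFFER_BARRIER_BIT_EXT        0x00000080u
#define TEXTURE_UPDATE_BARRIER_BIT_EXT      0x00000100u
#define BUFFER_UPDATE_BARRIER_BIT_EXT       0x00000200u
#define FRAMEBUFFER_BARRIER_BIT_EXT         0x00000400u
#define TRANSFORM_FEEDBACK_BARRIER_BIT_EXT  0x00000800u
#define ATOMIC_COUNTER_BARRIER_BIT_EXT      0x00001000u
#define ALL_BARRIER_BITS_EXT                0xFFFFFFFFu

/* Mask for shader stores that are later sourced as vertex attributes and
 * as vertex array indices. */
static uint32_t vertex_pulling_barrier(void)
{
    return VERTEX_ATTRIB_ARRAY_BARRIER_BIT_EXT |
           ELEMENT_ARRAY_BARRIER_BIT_EXT;
}

/* OR of every individually named bit.  Note that 0x00000010 is not
 * assigned in the token list. */
static uint32_t named_barrier_bits(void)
{
    return VERTEX_ATTRIB_ARRAY_BARRIER_BIT_EXT |
           ELEMENT_ARRAY_BARRIER_BIT_EXT |
           UNIFORM_BARRIER_BIT_EXT |
           TEXTURE_FETCH_BARRIER_BIT_EXT |
           SHADER_IMAGE_ACCESS_BARRIER_BIT_EXT |
           COMMAND_BARRIER_BIT_EXT |
           PIXEL_BUFFER_BARRIER_BIT_EXT |
           TEXTURE_UPDATE_BARRIER_BIT_EXT |
           BUFFER_UPDATE_BARRIER_BIT_EXT |
           FRAMEBUFFER_BARRIER_BIT_EXT |
           TRANSFORM_FEEDBACK_BARRIER_BIT_EXT |
           ATOMIC_COUNTER_BARRIER_BIT_EXT;
}
```

    ALL_BARRIER_BITS_EXT is a superset of every named bit, so it is always
    a safe, if potentially more expensive, choice.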
491
492    Implementations may cache buffer object and texture image memory that
493    could be written by shaders in multiple caches; for example, there may be
494    separate caches for texture, vertex fetching, and one or more caches for
495    shader memory accesses.  Implementations are not required to keep these
496    caches coherent with shader memory writes.  Stores issued by one
497    invocation may not be immediately observable by other pipeline stages or
498    other shader invocations because the value stored may remain in a cache
499    local to the processor executing the store, or because data overwritten by
500    the store is still in a cache elsewhere in the system.  When MemoryBarrierEXT
501    is called, the GL flushes and/or invalidates any caches relevant to the
502    operations specified by the <barriers> parameter to ensure consistent
503    ordering of operations across the barrier.
504
505    To allow for independent shader invocations to communicate by reads and
506    writes to a common memory address, image variables in the OpenGL Shading
507    Language may be declared as "coherent".  Buffer object or texture image
508    memory accessed through such variables may be cached only if caches are
509    automatically updated due to stores issued by any other shader invocation.
510    If the same address is accessed using both coherent and non-coherent
511    variables, the accesses using variables declared as coherent will observe
512    the results stored using coherent variables in other invocations.  Using
513    variables declared as "coherent" guarantees only that the results of
514    stores will be immediately visible to shader invocations using
515    similarly-declared variables; calling MemoryBarrierEXT is required to ensure
516    that the stores are visible to other operations.
517
518    The following guidelines may be helpful in choosing when to use coherent
519    memory accesses and when to use barriers.
520
521    - Data that are read-only or constant may be accessed without using
522      coherent variables or calling MemoryBarrierEXT().  Updates to the
523      read-only data via API calls such as BufferSubData will invalidate
524      shader caches implicitly as required.
525
526    - Data that are shared between shader invocations at a fine granularity
527      (e.g., written by one invocation, consumed by another invocation) should
528      use coherent variables to read and write the shared data.
529
530    - Data written by one shader invocation and consumed by other shader
531      invocations launched as a result of its execution ("dependent
532      invocations") should use coherent variables in the producing shader
533      invocation and call memoryBarrier() after the last write.  The consuming
534      shader invocation should also use coherent variables.
535
536    - Data written to image variables in one rendering pass and read by the
537      shader in a later pass need not use coherent variables or
538      memoryBarrier().  Calling MemoryBarrierEXT() with the
539      SHADER_IMAGE_ACCESS_BARRIER_BIT_EXT set in <barriers> between passes is
540      necessary.
541
542    - Data written by the shader in one rendering pass and read by another
543      mechanism (e.g., vertex or index buffer pulling) in a later pass need
544      not use coherent variables or memoryBarrier().  Calling
545      MemoryBarrierEXT() with the appropriate bits set in <barriers> between
546      passes is necessary.
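
    The guidelines above can be condensed into a small decision table.  The
    sketch below is one possible encoding, not part of the extension: the
    enum, the struct, and plan_sync are hypothetical names, and for the
    "another mechanism" case the caller supplies the barrier bits, since the
    text says only that the appropriate bits must be set.

```c
#include <stdbool.h>
#include <stdint.h>

/* Value copied from the New Tokens section. */
#define SHADER_IMAGE_ACCESS_BARRIER_BIT_EXT 0x00000020u

/* One entry per guideline, in the order listed. */
enum sharing_pattern {
    SHARE_READ_ONLY,
    SHARE_FINE_GRAINED,
    SHARE_DEPENDENT_INVOCATIONS,
    SHARE_LATER_PASS_IMAGE,
    SHARE_LATER_PASS_OTHER
};

struct sync_plan {
    bool     coherent_qualifier;   /* declare image variables "coherent" */
    bool     glsl_memory_barrier;  /* call memoryBarrier() in the shader */
    uint32_t api_barrier_bits;     /* <barriers> for MemoryBarrierEXT; 0 = no call */
};

/* consumer_bits is used only for SHARE_LATER_PASS_OTHER. */
static struct sync_plan plan_sync(enum sharing_pattern p, uint32_t consumer_bits)
{
    struct sync_plan plan = { false, false, 0u };

    switch (p) {
    case SHARE_READ_ONLY:
        break;
    case SHARE_FINE_GRAINED:
        plan.coherent_qualifier = true;
        break;
    case SHARE_DEPENDENT_INVOCATIONS:
        plan.coherent_qualifier = true;
        plan.glsl_memory_barrier = true;  /* after the last write */
        break;
    case SHARE_LATER_PASS_IMAGE:
        plan.api_barrier_bits = SHADER_IMAGE_ACCESS_BARRIER_BIT_EXT;
        break;
    case SHARE_LATER_PASS_OTHER:
        plan.api_barrier_bits = consumer_bits;
        break;
    }
    return plan;
}
```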
547
548
549Additions to Chapter 3 of the OpenGL 3.2 (Compatibility Profile) Specification
550(Rasterization)
551
552    (insert new section immediately before Section 3.8, Texturing, p. 210)
553
554    Section 3.X, Early Per-Fragment Tests
555
556    Once fragments are produced by rasterization (sections 3.4 through 3.8), a
557    number of per-fragment operations may be performed prior to fragment
558    shader execution.  If a fragment is discarded during any of these
559    operations, it will not be processed by any subsequent stage, including
560    fragment shader execution.
561
562    Up to six operations are performed on each fragment, in the following
563    order:
564
565      * the pixel ownership test, described in section 4.1.1;
566
567      * the scissor test, described in section 4.1.2;
568
569      * the depth bounds test, described in section 4.1.X (of the
570        EXT_depth_bounds_test specification);
571
572      * the stencil test, described in section 4.1.5;
573
574      * the depth buffer test, described in section 4.1.6; and
575
576      * occlusion query sample counting, described in section 4.1.7.
577
578    The pixel ownership and scissor tests are always performed.
579
580    The other operations are performed if and only if early fragment tests are
581    enabled in the active fragment shader (section 3.12.2).  When early
582    per-fragment operations are enabled, the depth bounds test, stencil test,
583    depth buffer test, and occlusion query sample counting operations are
584    performed prior to fragment shader execution, and the stencil buffer,
585    depth buffer, and occlusion query sample counts will be updated
586    accordingly.  When early per-fragment operations are enabled, these
587    operations will not be performed again after fragment shader execution.
588    When there is no active program, the active program has no fragment
589    shader, or the active program was linked with early fragment tests
590    disabled, these operations are performed only after fragment shader
591    execution, in the order described in chapter 4.
592
593    If early fragment tests are enabled, any depth value computed by the
594    fragment shader has no effect.  Additionally, the depth buffer, stencil
595    buffer, and occlusion query sample counts may be updated even for
596    fragments or samples that would be discarded after fragment shader
597    execution due to per-fragment operations such as alpha-to-coverage or
598    alpha tests.
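
    The ordering rules above can be summarized by a small decision function.
    The following is an informal sketch, not part of the specification; the
    enum and function names are hypothetical and exist only for illustration.

```c
#include <stdbool.h>

/* The six per-fragment operations of Section 3.X, in specification order.
 * These names are illustrative only; they are not GL enums. */
enum frag_op {
    OP_PIXEL_OWNERSHIP,
    OP_SCISSOR,
    OP_DEPTH_BOUNDS,
    OP_STENCIL,
    OP_DEPTH,
    OP_OCCLUSION_COUNT
};

/* Early tests apply only when a program is active, it has a fragment
 * shader, and it was linked with early fragment tests enabled. */
static bool early_tests_active(bool has_program, bool has_fragment_shader,
                               bool linked_with_early_tests)
{
    return has_program && has_fragment_shader && linked_with_early_tests;
}

/* Returns true if <op> runs before fragment shader execution. */
static bool runs_before_shader(enum frag_op op, bool early_active)
{
    /* Pixel ownership and scissor tests are always performed early. */
    if (op == OP_PIXEL_OWNERSHIP || op == OP_SCISSOR)
        return true;
    /* The remaining four move before the shader only with early tests. */
    return early_active;
}
```

    In this model an operation that runs before the shader is not repeated
    afterwards, matching the text above.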
599
600
601    (Add new section after Section 3.9.19, Texture Application, p. 268)
602
603    Section 3.9.X, Texture Image Loads and Stores
604
605    The contents of a texture may be made available for shaders to read and
606    write by binding the texture to one of a collection of image units.  The
607    GL implementation provides an array of image units numbered beginning with
608    zero, with the total number of image units provided given by the
609    implementation-dependent constant MAX_IMAGE_UNITS_EXT.  Unlike texture
610    image units, image units do not have a separate attachment for each
611    texture target; each image unit may have only one texture bound at
612    a time.
613
614    A texture may be bound to an image unit for use by image loads and stores
615    by calling:
616
617        void BindImageTextureEXT(uint index, uint texture, int level,
618                                 boolean layered, int layer, enum access,
619                                 int format);
620
621    where <index> identifies the image unit, <texture> is the name of the
622    texture, and <level> selects a single level of the texture.  If <texture>
623    is zero, <level> is ignored and the texture currently bound to image unit
624    <index> is unbound.  If <index> is less than zero or greater than or equal
625    to MAX_IMAGE_UNITS_EXT, or if <texture> is not the name of an existing
626    texture object, the error INVALID_VALUE is generated.
627
628    If the texture identified by <texture> is a one-dimensional array,
629    two-dimensional array, three-dimensional, cube map, cube map array, or
630    two-dimensional multisample array texture, it is possible to bind either
631    the entire texture level or a single layer or face of the texture level.
632    If <layered> is TRUE, the entire level is bound.  If <layered> is FALSE,
633    only the single layer identified by <layer> will be bound.  When <layered>
634    is FALSE, the single bound layer is treated as a different texture target
635    for image accesses:
636
637      * one-dimensional array texture layers are treated as one-dimensional
638        textures;
639
640      * two-dimensional array, three-dimensional, cube map, and cube map
641        array texture layers are treated as two-dimensional textures; and
642
643      * two-dimensional multisample array textures are treated as
644        two-dimensional multisample textures.
645
646    For cube map textures where <layered> is FALSE, the face is taken by
647    mapping the layer number to a face according to table 4.13.  For cube map
648    array textures where <layered> is FALSE, the selected layer number is
649    mapped to a texture layer and cube face using the following equations and
650    mapping <face> to a face according to table 4.13.
651
652      layer  = floor(layer_orig / 6)
653      face   = layer_orig - (layer * 6)
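
    As an informal illustration of the two equations above, the following C
    sketch splits a cube map array layer number into its array layer and
    face index.  The struct and function names are hypothetical, and the
    sketch assumes a non-negative <layer_orig>, for which integer division
    equals floor().  The face index is then looked up in table 4.13.

```c
/* Result of splitting a cube map array layer number (illustrative type). */
struct cube_layer {
    int layer;  /* array layer: floor(layer_orig / 6) */
    int face;   /* face index 0..5, to be mapped through table 4.13 */
};

/* Applies the equations of Section 3.9.X.  Assumes layer_orig >= 0, so
 * that C integer division matches floor(). */
static struct cube_layer split_cube_array_layer(int layer_orig)
{
    struct cube_layer r;
    r.layer = layer_orig / 6;
    r.face  = layer_orig - (r.layer * 6);
    return r;
}
```

    For example, a <layer> of 13 selects face 1 of array layer 2.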
654
655    <format> specifies the format that the elements of the image will be
656    treated as when doing formatted stores, as described later in this
657    section. This is referred to as the "image unit format". This must be one
658    of the formats listed in Table X.2; otherwise, the error INVALID_VALUE is
659    generated.
660
661    <access> specifies whether the texture bound to the image will be treated
662    as READ_ONLY, WRITE_ONLY, or READ_WRITE.  If a shader reads from an image
663    unit with a texture bound as WRITE_ONLY, or writes to an image unit with a
664    texture bound as READ_ONLY, the results of that shader operation are
665    undefined and may lead to application termination.
666
667    If a texture object bound to one or more image units is deleted by
668    DeleteTextures, it is detached from each such image unit, as though
669    BindImageTextureEXT were called with <index> identifying the image unit and
670    <texture> set to zero.
671
672    When a shader accesses the texture bound to an image unit using a built-in
673    image load, store, or atomic function, it identifies a single texel by
674    providing a one-, two-, or three-dimensional coordinate.  Multisample
675    texture accesses also specify a sample number.  A coordinate vector is
676    mapped to an individual texel tau_i, tau_i_j, or tau_i_j_k according to
677    the target of the texture bound to the image unit using Table X.1.  As
678    noted above, single-layer bindings of array or cube map textures are
679    considered to use a texture target corresponding to the bound layer,
680    rather than that of the full texture.
681
682                                                   face/
683                                          i  j  k  layer
684                                          -- -- -- -----
685        TEXTURE_1D                        x  -  -    -
686        TEXTURE_2D                        x  y  -    -
687        TEXTURE_3D                        x  y  z    -
688        TEXTURE_RECTANGLE                 x  y  -    -
689        TEXTURE_CUBE_MAP                  x  y  -    z
690        TEXTURE_BUFFER                    x  -  -    -
691        TEXTURE_1D_ARRAY                  x  -  -    y
692        TEXTURE_2D_ARRAY                  x  y  -    z
693        TEXTURE_CUBE_MAP_ARRAY            x  y  -    z
694        TEXTURE_2D_MULTISAMPLE            x  y  -    -
695        TEXTURE_2D_MULTISAMPLE_ARRAY      x  y  -    z
696
697        Table X.1, Mapping of image load, store, and atomic texel coordinate
698        components to texel numbers.
699
700    If the texture target has layers or cube map faces, the layer or face
701    number is taken from the <layer> argument of BindImageTextureEXT if the
702    texture is bound with <layered> set to FALSE, or from the coordinate
703    identified by Table X.1 otherwise.  For cube map and cube map array
704    textures with <layered> set to TRUE, the coordinate is mapped to a layer
705    and face in the same manner as the <layer> argument of
706    BindImageTextureEXT.
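
    The selection rule above can be sketched in C as follows.  This is an
    informal model only: the enum, the function names, and the use of -1 as
    a "no face/layer column entry" marker are assumptions made for the
    example, and the further mapping of cube map coordinates to a layer and
    face is left out.

```c
#include <stdbool.h>

/* Texture targets of Table X.1 (illustrative names, not GL enums). */
enum target {
    T_1D, T_2D, T_3D, T_RECTANGLE, T_CUBE_MAP, T_BUFFER,
    T_1D_ARRAY, T_2D_ARRAY, T_CUBE_MAP_ARRAY,
    T_2D_MULTISAMPLE, T_2D_MULTISAMPLE_ARRAY
};

/* Returns which coordinate component (0 = x, 1 = y, 2 = z) appears in
 * the face/layer column of Table X.1, or -1 if the column is empty. */
static int layer_component(enum target t)
{
    switch (t) {
    case T_1D_ARRAY:
        return 1;
    case T_CUBE_MAP:
    case T_2D_ARRAY:
    case T_CUBE_MAP_ARRAY:
    case T_2D_MULTISAMPLE_ARRAY:
        return 2;
    default:
        return -1;
    }
}

/* Returns the face/layer number used for an access: the coordinate from
 * Table X.1 for layered bindings, or the <layer> argument given at bind
 * time otherwise.  Returns -1 when the target has no face/layer entry. */
static int select_layer(enum target t, bool layered, int bound_layer,
                        const int coord[3])
{
    int c = layer_component(t);
    if (c < 0)
        return -1;
    return layered ? coord[c] : bound_layer;
}
```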
707
708    If the individual texel identified for an image load, store, or atomic
709    operation doesn't exist, the access is treated as invalid.  Invalid image
710    loads will return zero.  Invalid image stores will have no effect.
711    Invalid image atomics will not update any texture bound to the image unit
712    and will return zero.  An access is considered invalid if:
713
714      * no texture is bound to the selected image unit;
715
716      * the texture bound to the selected image unit is incomplete;
717
718      * the texture level bound to the image unit is less than the base
719        level or greater than the maximum level of the texture;
720
721      * the texture bound to the image unit is bordered;
722
723      * the internal format of the texture bound to the image unit is not
724        found in Table X.2;
725
726      * the internal format of the texture is incompatible with the specified
727        <format> according to Table X.2;
728
729      * the texture bound to the image unit has layers, is bound with
730        <layered> set to TRUE, and the selected layer or cube map face doesn't
731        exist;
732
733      * the selected texel tau_i, tau_i_j, or tau_i_j_k doesn't exist;
734
735      * the <x>, <y>, or <z> coordinate is not listed in the selected row of
736        Table X.1 and is non-zero;
737
738      * the texture bound to the image unit has layers, is bound with
739        <layered> set to FALSE, and the corresponding coordinate in the
740        face/layer column of Table X.1 is non-zero;
741
742      * the image has more samples than the implementation-dependent value of
743        MAX_IMAGE_SAMPLES_EXT; or
744
745      * the access is a load and the format is not compatible with the
746        "size" layout qualifier of the image uniform.
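
    The behavior of invalid accesses can be illustrated with a CPU-side
    model.  The sketch below covers only two of the conditions listed above
    (no texture bound, and a texel that does not exist) for a hypothetical
    single-level two-dimensional image of 32-bit unsigned texels; the struct
    and function names are invented for the example.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for a 2D R32UI image bound to an image unit.
 * A NULL <texels> pointer models "no texture is bound". */
struct image2d {
    int width;
    int height;
    uint32_t *texels;
};

/* True if the texel tau_i_j exists in the modeled image. */
static bool texel_exists(const struct image2d *img, int x, int y)
{
    return img->texels != NULL &&
           x >= 0 && x < img->width &&
           y >= 0 && y < img->height;
}

/* Invalid image loads return zero. */
static uint32_t sketch_image_load(const struct image2d *img, int x, int y)
{
    if (!texel_exists(img, x, y))
        return 0;
    return img->texels[(size_t)y * (size_t)img->width + (size_t)x];
}

/* Invalid image stores have no effect. */
static void sketch_image_store(struct image2d *img, int x, int y,
                               uint32_t value)
{
    if (!texel_exists(img, x, y))
        return;
    img->texels[(size_t)y * (size_t)img->width + (size_t)x] = value;
}
```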
747
748     For textures with multiple samples per texel, the sample selected for an
749     image load, store, or atomic is undefined if the <sample> coordinate is
750     negative or greater than or equal to the number of samples in the
751     texture.
752
753     If a shader performs an image load, store, or atomic operation using an
754     image variable declared as an array, and if the index used to select an
755     individual array element is negative or greater than or equal to the size
756     of the array, the results of the operation are undefined but may not lead
757     to termination.
758
759     Accesses to textures bound to image units do format conversions based on
760     the <format> argument specified when the image is bound. Loads always
761     return a value as a vec4, ivec4, or uvec4, and stores always take the
762     source data as a vec4, ivec4, or uvec4. Data is converted to/from the
763     specified format as if it were passed through a TexImage2D or GetTexImage
764     command with <format> and <type> as RGBA and FLOAT for vec4 data, with
765     <format> and <type> as RGBA_INTEGER and INT for ivec4 data, or with
766     <format> and <type> as RGBA_INTEGER and UNSIGNED_INT for uvec4 data.
767     Unused components are filled in with (0,0,0,1) (where "1" is either a
768     float or integer depending on the format).
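
     As an informal example of the fill rule for unused components, the
     sketch below expands the components present in a floating-point format
     to the four-component value returned by a load.  The function name is
     hypothetical, and only the float case is shown; the integer cases would
     use an integer 1 in the fourth component.

```c
/* Expands <n> stored components (1 to 4) into a four-component result,
 * filling the components the format lacks from (0, 0, 0, 1). */
static void expand_to_vec4(const float *src, int n, float out[4])
{
    static const float defaults[4] = {0.0f, 0.0f, 0.0f, 1.0f};
    for (int i = 0; i < 4; i++)
        out[i] = (i < n) ? src[i] : defaults[i];
}
```

     Under this model, a load from an RG32F image holding (0.25, 0.5) yields
     (0.25, 0.5, 0, 1).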
769
770     The formats that are supported for image loads are dependent on the
771     layout(size*) qualifier of the image uniform. The following formats
772     are supported for image loads:
773
774     - size1x8: R8I, R8UI
775     - size1x16: R16I, R16UI
776     - size1x32: R32F, R32I, R32UI
777     - size2x32: RG32F, RG32I, RG32UI
778     - size4x32: RGBA32F, RGBA32I, RGBA32UI
779
780     Image stores support all formats in Table X.2.
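
     The list of loadable formats above can be expressed as a lookup table.
     The following C sketch is illustrative only; it identifies qualifiers
     and formats by their names as strings, and the type and function names
     are hypothetical.

```c
#include <stdbool.h>
#include <string.h>

/* One row per "size" layout qualifier, with the formats that support
 * image loads under that qualifier (NULL marks an unused slot). */
struct size_class {
    const char *size;
    const char *formats[3];
};

static const struct size_class load_formats[] = {
    {"size1x8",  {"R8I",     "R8UI",    NULL}},
    {"size1x16", {"R16I",    "R16UI",   NULL}},
    {"size1x32", {"R32F",    "R32I",    "R32UI"}},
    {"size2x32", {"RG32F",   "RG32I",   "RG32UI"}},
    {"size4x32", {"RGBA32F", "RGBA32I", "RGBA32UI"}},
};

/* True if <format> is listed as loadable under the <size> qualifier. */
static bool load_supported(const char *size, const char *format)
{
    size_t rows = sizeof load_formats / sizeof load_formats[0];
    for (size_t i = 0; i < rows; i++) {
        if (strcmp(load_formats[i].size, size) != 0)
            continue;
        for (int j = 0; j < 3 && load_formats[i].formats[j] != NULL; j++)
            if (strcmp(load_formats[i].formats[j], format) == 0)
                return true;
    }
    return false;
}
```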
781
782     Table X.2 specifies how each format is stored in memory, which must be
783     made explicit because a single image can be viewed with multiple formats
784     according to the <format> argument. The "R", "G", "B", and "A" columns
785     indicate which bits of which 32-bit word correspond to that component.
786     For example, an entry of "1[15:0]" indicates that the selected component
787     uses sixteen bits with its most significant bit in bit 15 of the second
788     word of memory and its least significant bit in bit 0. Floating-point
789     textures with 32-bit components are stored using the IEEE standard
790     representation; textures with 10-, 11-, or 16-bit floating-point
791     components are stored according to Sections 2.1.2 and 2.1.3.
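
     The "word[msb:lsb]" notation can be read as a shift-and-mask operation
     on an array of 32-bit words.  The sketch below is an informal reading
     of that notation; the function name is hypothetical, and the raw bits
     are returned without any format conversion.

```c
#include <stdint.h>

/* Returns the raw bits of the component written "word[msb:lsb]" in
 * Table X.2, taken from a texel stored as an array of 32-bit words. */
static uint32_t extract_field(const uint32_t *words, int word,
                              int msb, int lsb)
{
    int width = msb - lsb + 1;
    /* A full-width shift would be undefined in C, so handle 32 apart. */
    uint32_t mask = (width == 32) ? 0xFFFFFFFFu : ((1u << width) - 1u);
    return (words[word] >> lsb) & mask;
}
```

     For an RGB10_A2UI texel, the four components are obtained with
     (0, 9, 0), (0, 19, 10), (0, 29, 20), and (0, 31, 30) as the
     <word>, <msb>, <lsb> arguments.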
792
793     The "equivalence" column of Table X.2 defines a set of equivalence
794     classes for formats, such that if the internal format of a texture level
795     is in the same equivalence class as the <format> argument to
796     BindImageTextureEXT then the image may be viewed with that format.
797     Otherwise, the access is considered invalid as described above.
798
799        Internal format  Equivalence      R        G         B         A
800        ---------------  -----------   -------  -------   -------   -------
801        RGBA32F             4x32       0[31:0]  1[31:0]   2[31:0]   3[31:0]
802        RGBA16F             2x32       0[15:0]  0[31:16]  1[15:0]   1[31:16]
803        RG32F               2x32       0[31:0]  1[31:0]
804        RG16F               1x32       0[15:0]  0[31:16]
805        R11F_G11F_B10F      1x32       0[10:0]  0[21:11]  0[31:22]
806        R32F                1x32       0[31:0]
807        R16F                1x16       0[15:0]
808
809        RGBA32UI            4x32       0[31:0]  1[31:0]   2[31:0]   3[31:0]
810        RGBA16UI            2x32       0[15:0]  0[31:16]  1[15:0]   1[31:16]
811        RGB10_A2UI          1x32       0[9:0]   0[19:10]  0[29:20]  0[31:30]
812        RGBA8UI             1x32       0[7:0]   0[15:8]   0[23:16]  0[31:24]
813        RG32UI              2x32       0[31:0]  1[31:0]
814        RG16UI              1x32       0[15:0]  0[31:16]
815        RG8UI               1x16       0[7:0]   0[15:8]
816        R32UI               1x32       0[31:0]
817        R16UI               1x16       0[15:0]
818        R8UI                1x8        0[7:0]
819
820        RGBA32I             4x32       0[31:0]  1[31:0]   2[31:0]   3[31:0]
821        RGBA16I             2x32       0[15:0]  0[31:16]  1[15:0]   1[31:16]
822        RGBA8I              1x32       0[7:0]   0[15:8]   0[23:16]  0[31:24]
823        RG32I               2x32       0[31:0]  1[31:0]
824        RG16I               1x32       0[15:0]  0[31:16]
825        RG8I                1x16       0[7:0]   0[15:8]
826        R32I                1x32       0[31:0]
827        R16I                1x16       0[15:0]
828        R8I                 1x8        0[7:0]
829
830        RGBA16              2x32       0[15:0]  0[31:16]  1[15:0]   1[31:16]
831        RGB10_A2            1x32       0[9:0]   0[19:10]  0[29:20]  0[31:30]
832        RGBA8               1x32       0[7:0]   0[15:8]   0[23:16]  0[31:24]
833        RG16                1x32       0[15:0]  0[31:16]
834        RG8                 1x16       0[7:0]   0[15:8]
835        R16                 1x16       0[15:0]
836        R8                  1x8        0[7:0]
837
838        RGBA16_SNORM        2x32       0[15:0]  0[31:16]  1[15:0]   1[31:16]
839        RGBA8_SNORM         1x32       0[7:0]   0[15:8]   0[23:16]  0[31:24]
840        RG16_SNORM          1x32       0[15:0]  0[31:16]
841        RG8_SNORM           1x16       0[7:0]   0[15:8]
842        R16_SNORM           1x16       0[15:0]
843        R8_SNORM            1x8        0[7:0]
844
845        Table X.2, Supported texture formats, component packing, and
846        equivalence classes for formatted image accesses.
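
     The equivalence test described before Table X.2 can be sketched as a
     table lookup.  The C code below transcribes the "Equivalence" column of
     the table and compares classes by name; the type and function names are
     hypothetical, and formats are identified by their names as strings
     rather than by GL enum values.

```c
#include <stdbool.h>
#include <string.h>

struct fmt_class { const char *name; const char *cls; };

/* Internal format and equivalence class, transcribed from Table X.2. */
static const struct fmt_class fmt_table[] = {
    {"RGBA32F", "4x32"}, {"RGBA16F", "2x32"}, {"RG32F", "2x32"},
    {"RG16F", "1x32"}, {"R11F_G11F_B10F", "1x32"}, {"R32F", "1x32"},
    {"R16F", "1x16"},
    {"RGBA32UI", "4x32"}, {"RGBA16UI", "2x32"}, {"RGB10_A2UI", "1x32"},
    {"RGBA8UI", "1x32"}, {"RG32UI", "2x32"}, {"RG16UI", "1x32"},
    {"RG8UI", "1x16"}, {"R32UI", "1x32"}, {"R16UI", "1x16"},
    {"R8UI", "1x8"},
    {"RGBA32I", "4x32"}, {"RGBA16I", "2x32"}, {"RGBA8I", "1x32"},
    {"RG32I", "2x32"}, {"RG16I", "1x32"}, {"RG8I", "1x16"},
    {"R32I", "1x32"}, {"R16I", "1x16"}, {"R8I", "1x8"},
    {"RGBA16", "2x32"}, {"RGB10_A2", "1x32"}, {"RGBA8", "1x32"},
    {"RG16", "1x32"}, {"RG8", "1x16"}, {"R16", "1x16"}, {"R8", "1x8"},
    {"RGBA16_SNORM", "2x32"}, {"RGBA8_SNORM", "1x32"},
    {"RG16_SNORM", "1x32"}, {"RG8_SNORM", "1x16"},
    {"R16_SNORM", "1x16"}, {"R8_SNORM", "1x8"},
};

/* Returns the equivalence class of <format>, or NULL if it is not in
 * Table X.2. */
static const char *equivalence_class(const char *format)
{
    size_t rows = sizeof fmt_table / sizeof fmt_table[0];
    for (size_t i = 0; i < rows; i++)
        if (strcmp(fmt_table[i].name, format) == 0)
            return fmt_table[i].cls;
    return NULL;
}

/* True if a texture level with internal format <texture_format> may be
 * viewed through an image unit bound with <unit_format>. */
static bool formats_compatible(const char *texture_format,
                               const char *unit_format)
{
    const char *a = equivalence_class(texture_format);
    const char *b = equivalence_class(unit_format);
    return a != NULL && b != NULL && strcmp(a, b) == 0;
}
```

     Under this model an RGBA8 texture may be bound with a <format> of
     R32UI, since both belong to the 1x32 class.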
847
848     Implementations may support a limited combined number of image units and
849     active fragment shader outputs (section 4.2.1).  A link error will be
850     generated if the number of active image uniforms used in all shaders plus
851     the number of active fragment shader outputs exceeds the implementation-
852     dependent value (MAX_COMBINED_IMAGE_UNITS_AND_FRAGMENT_OUTPUTS_EXT).
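
     The link-time check amounts to comparing a sum against the limit.  A
     minimal sketch, with a hypothetical function name and the limit passed
     in as a parameter instead of being queried from the GL:

```c
#include <stdbool.h>

/* True if linking should fail because the combined count of active image
 * uniforms and active fragment shader outputs exceeds <max_combined>,
 * the value of MAX_COMBINED_IMAGE_UNITS_AND_FRAGMENT_OUTPUTS_EXT. */
static bool exceeds_combined_limit(int active_image_uniforms,
                                   int active_fragment_outputs,
                                   int max_combined)
{
    return active_image_uniforms + active_fragment_outputs > max_combined;
}
```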
853
854
855   Modify Section 3.12.2, Shader Execution, p. 274
856
857   (add new unnumbered subsection at the end of the section, p. 279)
858
859   Early Fragment Tests
860
861   An explicit control is provided to allow fragment shaders to enable early
862   fragment tests.  If the fragment shader specifies the
863   "early_fragment_tests" layout qualifier, the per-fragment tests described
864   in Section 3.X will be performed prior to fragment shader execution.
865   Otherwise, they will be performed after fragment shader execution.
866
867
868Additions to Chapter 4 of the OpenGL 3.2 (Compatibility Profile) Specification
869(Per-Fragment Operations and the Framebuffer)
870
871    None.
872
873
874Additions to Chapter 5 of the OpenGL 3.2 (Compatibility Profile) Specification
875(Special Functions)
876
877    Modify Section 5.4.1, Commands Not Usable In Display Lists (p. 358)
878
879    (add "MemoryBarrierEXT" to the list of commands not allowed in a display
880     list, in the "Buffer objects" paragraph)
881
882
883Additions to Chapter 6 of the OpenGL 3.2 (Compatibility Profile) Specification
884(State and State Requests)
885
886    None.
887
888
889New Implementation Dependent State
890
891                                                        Minimum
892    Get Value                    Type  Get Command      Value      Description             Sec.       Attrib
893    ---------                    ----  -----------      -------    -----------             ----       ------
894    MAX_IMAGE_UNITS_EXT          Z+    GetIntegerv      8          number of units for     3.9.X        -
895                                                                   image load/store/atom
896    MAX_COMBINED_IMAGE_UNITS_    Z+    GetIntegerv      8          limit on active image   3.9.X        -
897      AND_FRAGMENT_OUTPUTS_EXT                                     units + fragment outputs
898    MAX_IMAGE_SAMPLES_EXT        Z     GetIntegerv      0          max allowed samples     3.9.X        -
899                                                                   for a texture level
900                                                                   bound to an image unit
901
902New State
903
904    Add a new Table 6.X, Image Stage (state per image unit)
905
906    Get Value                            Type    Get Command     Initial Value   Sec     Attribute
907    ---------                            ----    -----------     -------------   ---     ---------
908    IMAGE_BINDING_NAME_EXT               8*xZ+   GetIntegeri_v    0             3.9.X      none
909    IMAGE_BINDING_LEVEL_EXT              8*xZ+   GetIntegeri_v    0             3.9.X      none
910    IMAGE_BINDING_LAYERED_EXT            8*xB    GetBooleani_v    FALSE         3.9.X      none
911    IMAGE_BINDING_LAYER_EXT              8*xZ+   GetIntegeri_v    0             3.9.X      none
912    IMAGE_BINDING_ACCESS_EXT             8*xZ3   GetIntegeri_v    READ_ONLY     3.9.X      none
913    IMAGE_BINDING_FORMAT_EXT             8*xZ+   GetIntegeri_v    R8            3.9.X      none
914
915
916Additions to Appendix A of the OpenGL 3.2 (Compatibility Profile)
917Specification (Invariance)
918
919    None.
920
921
922Additions to the AGL/GLX/WGL Specifications
923
924    None.
925
926
927GLX Protocol
928
929    !!! TBD !!!
930
931
932Modifications to the OpenGL Shading Language Specification, Version 1.50
933
934    Including the following line in a shader can be used to control the
935    language features described in this extension:
936
937      #extension GL_EXT_shader_image_load_store : <behavior>
938
939    where <behavior> is as specified in section 3.3.
940
941    New preprocessor #defines are added to the OpenGL Shading Language:
942
943      #define GL_EXT_shader_image_load_store    1
944
945
946    Modify Section 3.6, Keywords, p. 14
947
948    (add the following to the list of keywords, p. 14)
949
950    coherent
951    volatile
952    restrict
953
954    image1D             iimage1D                uimage1D
955    image2D             iimage2D                uimage2D
956    image3D             iimage3D                uimage3D
957    image2DRect         iimage2DRect            uimage2DRect
958    imageCube           iimageCube              uimageCube
959    imageBuffer         iimageBuffer            uimageBuffer
960    image1DArray        iimage1DArray           uimage1DArray
961    image2DArray        iimage2DArray           uimage2DArray
962    imageCubeArray      iimageCubeArray         uimageCubeArray
963    image2DMS           iimage2DMS              uimage2DMS
964    image2DMSArray      iimage2DMSArray         uimage2DMSArray
965
966    (remove from the list of reserved keywords, p. 15)
967
968    volatile
969
970
971    (Insert a new section immediately after Section 4.1.7, Samplers, p. 23)
972
973    Section 4.1.X, Images
974
975    Like samplers, images are opaque handles to one-, two-, or
976    three-dimensional images corresponding to all or a portion of a single
977    level of a texture image bound to an image unit.  There are distinct
978    image variable types for each texture target, and for each of float,
979    integer, and unsigned integer data types.  Image accesses should use
980    an image type that matches the target of the texture whose level is
981    bound to the image unit, or for non-layered bindings of 3D or array
982    images should use the image type that matches the dimensionality of
983    the layer of the image (i.e. a layer of 3D, 2DArray, Cube, or
984    CubeArray should use image2D, a layer of 1DArray should use image1D,
985    and a layer of 2DMSArray should use image2DMS). If the image target type
986    does not match the bound image in this manner, if the data type does not
987    match the bound image, or if the "size" layout qualifier does not match
988    the image unit format as described in Section 3.9.X of the OpenGL
989    Specification, the results of image accesses are undefined but may not
990    include program termination.
991
992    Image variables are used in the image load, store, and atomic functions
993    described in Section 8.X, "Image Functions" to specify an image to access.
994    They can only be declared as function parameters or uniform variables (see
995    Section 4.3.5 "Uniform").  Except for array indexing, structure field
996    selection, and parentheses, images are not allowed to be operands in
997    expressions.  Images may be aggregated into arrays within a shader (using
998    square brackets [ ]) and can be indexed with general integer expressions.
999    The results of accessing an image array with an out-of-bounds index are
1000    undefined.  Images cannot be treated as l-values; hence, they cannot be
1001    used as out or inout function parameters, nor can they be assigned into.
1002    As uniforms, they are initialized only with the OpenGL API; they cannot be
1003    declared with an initializer in a shader.  As function parameters, images
1004    may only be passed to image parameters of matching type.
1005
1006
1007    Modify Section 4.3, Storage Qualifiers, p. 29
1008
1009    (add new qualifiers to the first table, p. 29)
1010
1011        Qualifier       Meaning
1012        ------------    -------------------------------------------------
1013        coherent        memory variable where reads and writes are coherent
1014                        with reads and writes from other shader invocations
1015
1016        volatile        memory variable whose underlying value may be
1017                        changed at any point during shader execution by
1018                        some source other than the current shader invocation
1019
1020        restrict        memory variable where use of that variable is the
1021                        only way to read and write the underlying memory
1022                        in the relevant shader stage
1023
1024
1025    Modify Section 4.3.2, Constant Qualifier (p. 30)
1026
1027    (add after last paragraph of section)
1028
1029    Because image variables cannot be built from constant expressions, the
1030    "const" qualifier may not be used to create a compile-time constant image
1031    variable.  However, the "const" qualifier may be used to declare image
1032    variables whose image data are treated as constant, as described in
1033    Section 4.3.X.
1034
1035
1036    Modify Section 4.3.8.1 (Input Layout Qualifiers), p. 39
1037
1038    Remove "only" from the sentence:
1039
1040    Fragment shaders can have an input layout only for redeclaring the
1041    built-in variable gl_FragCoord...
1042
1043    Add to the end of the section:
1044
1045    Fragment shaders also allow a layout qualifier to be applied to the
1046    "in" qualifier itself.  The only valid layout qualifier is:
1047
1048      layout-qualifier-id
1049        early_fragment_tests
1050
1051    to indicate that fragment tests will be performed before fragment shader
1052    execution, as described in Section 3.12.2 of the OpenGL Specification.
1053    For example,
1054
1055      layout(early_fragment_tests) in;
1056
1057
1058    (Insert immediately after Section 4.3.8.3, Uniform Block Layout
1059     Qualifiers, p. 40)
1060
1061    Section 4.3.8.X, Image Qualifiers
1062
1063    Layout qualifiers can be used for image variable declarations.  The layout
1064    qualifier identifiers for image variable declarations are
1065
1066      layout-qualifier-id
1067        size1x8
1068        size1x16
1069        size1x32
1070        size2x32
1071        size4x32
1072
1073    The "size" identifiers indicate the set of image formats that the image
1074    variable can be used to access.  Only one "size" identifier may be
1075    specified for any variable declaration.  A layout of "size1x8" is illegal
1076    for image variables associated with floating-point data types.
1077
1078    All image variable declarations, including function parameter
1079    declarations, must specify a "size" layout qualifier.  It is an error to
1080    declare an image uniform variable or function parameter without a size
1081    qualifier.
1082
1083
1084    (Insert immediately after Section 4.3.9, Interpolation, p. 42)
1085
1086    Section 4.3.X, Memory Access Qualifiers
1087
1088    The "coherent", "volatile", "restrict", and "const" storage qualifiers can
1089    be specified in image variable declarations to control memory accesses
1090    using the declared variables.
1091
1092    Memory accesses to image variables declared using the "coherent" storage
1093    qualifier are performed coherently with similar accesses from other shader
1094    invocations.  In particular, when reading a variable declared as
1095    "coherent", the values returned will reflect the results of previously
1096    completed writes performed by other shader invocations.  When writing a
1097    variable declared as "coherent", the values written will be reflected in
1098    subsequent coherent reads performed by other shader invocations.  As
1099    described in Section 2.20.X of the OpenGL Specification, shader memory
1100    reads and writes complete in a largely undefined order.  The built-in
1101    function memoryBarrier() can be used if needed to guarantee the completion
1102    and relative ordering of memory accesses performed by a single shader
1103    invocation.
1104
1105    When accessing memory using variables not declared as "coherent", the
1106    memory accessed by a shader may be cached by the implementation to service
1107    future accesses to the same address.  Memory stores may be cached in such
1108    a way that the values written may not be visible to other shader
1109    invocations accessing the same memory.  The implementation may cache the
1110    values fetched by memory reads and return the same values to any shader
1111    invocation accessing the same memory, even if the underlying memory has
1112    been modified since the first memory read.  While variables not declared
1113    as "coherent" may not be useful for communicating between shader
1114    invocations, using non-coherent accesses may result in higher performance.
1115
1116    Memory accesses to image variables declared using the "volatile" storage
1117    qualifier must treat the underlying memory as though it could be read or
1118    written at any point during shader execution by some source other than the
1119    executing shader invocation.  When a volatile variable is read, its value
1120    must be re-fetched from the underlying memory, even if the shader
1121    invocation performing the read had already fetched its value from the same
1122    memory once.  When a volatile variable is written, its value must be
1123    written to the underlying memory, even if the compiler can conclusively
1124    determine that its value will be overwritten by a subsequent write.  Since
1125    the external source reading or writing a "volatile" variable may be
1126    another shader invocation, variables declared as "volatile" are
1127    automatically treated as coherent.
1128
1129    Memory accesses to image variables declared using the "restrict" storage
1130    qualifier may be compiled assuming that the variable used to perform the
1131    memory access is the only way to access the underlying memory using the
1132    shader stage in question.  This allows the compiler to coalesce or reorder
1133    loads and stores using "restrict"-qualified image variables in ways that
1134    wouldn't be permitted for image variables not so qualified, because the
1135    compiler can assume that the underlying image won't be read or written by
1136    other code.  Applications are responsible for ensuring that image memory
1137    referenced by variables qualified with "restrict" will not be referenced
1138    using other variables in the same scope; otherwise, accesses to
1139    "restrict"-qualified variables will have undefined results.
1140
1141    Memory accesses to image variables declared using the "const" storage
1142    qualifier may only read the underlying memory, which is treated as
1143    read-only.  It is an error to pass an image variable qualified with
1144    "const" to imageStore() or imageAtomic*().
1145
1146    In image variable declarations, the "coherent", "volatile", "restrict",
1147    and "const" qualifiers can be positioned anywhere in the declaration,
1148    either before or after the data type of the variable being qualified.
1149    Qualifiers before the type name apply to the image data referenced by the
1150    image variable; qualifiers after the type name apply to the image variable
1151    itself.  It is an error to specify "restrict" prior to the type name, as
1152    "restrict" can only qualify the image variable itself.
1153
1154    The "coherent", "volatile", and "restrict" storage qualifiers may only be
1155    used on image variables, and may not be used on variables of any other
1156    type.  "const" may be used in declarations with non-image variable types,
1157    as described in Section 4.3.2.
1158
1159    The values of variables qualified with "coherent", "volatile", "restrict",
1160    or "const" may not be assigned to function parameters lacking such
1161    qualifiers.  It is legal to add qualifiers in a function call, but not to
1162    remove them.
1163
1164      vec4 funcA(layout(size4x32) image2D restrict a)   { ... }
1165      vec4 funcB(layout(size4x32) image2D a)            { ... }
1166      layout(size4x32) uniform image2D img1;
1167      layout(size4x32) coherent uniform image2D img2;
1168
1169      funcA(img1);              // OK, adding "restrict" is allowed
1170      funcB(img2);              // illegal, stripping "coherent" is not
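
    The add-but-never-remove rule can be modeled as a subset test on
    qualifier sets.  The following C sketch is an informal model, not
    compiler code; the bit names and the function name are hypothetical.

```c
#include <stdbool.h>

/* One bit per memory access qualifier (illustrative values). */
enum {
    Q_COHERENT = 1u << 0,
    Q_VOLATILE = 1u << 1,
    Q_RESTRICT = 1u << 2,
    Q_CONST    = 1u << 3
};

/* A call is legal if every qualifier on the argument is also present on
 * the formal parameter: qualifiers may be added, but not removed. */
static bool call_is_legal(unsigned arg_quals, unsigned param_quals)
{
    return (arg_quals & ~param_quals) == 0;
}
```

    In this model, funcA(img1) above corresponds to
    call_is_legal(0, Q_RESTRICT), which is legal, and funcB(img2)
    corresponds to call_is_legal(Q_COHERENT, 0), which is not.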
1171
1172
1173    (Insert a new numbered section at the end of Chapter 8, Built-in
1174    Functions, p. 69)
1175
1176    Section 8.X, Image Functions
1177
1178    Variables using one of the image data types may be used in the built-in
1179    shader image memory functions defined in this section to read and write
1180    individual texels of a texture.  Each image variable is an integer scalar
1181    that references an image unit, which has a texture image attached.
1182
1183    When image memory functions access memory, an individual texel in the
1184    image is identified using an i, (i,j), or (i,j,k) coordinate corresponding
1185    to the values of <coord>.  For image2DMS and image2DMSArray variables (and
1186    the corresponding int/unsigned int types) corresponding to multisample
1187    textures, each texel may have multiple samples and an individual sample is
1188    identified using the integer <sample> parameter.  The coordinates and
1189    sample number are used to select an individual texel in the manner
1190    described in Section 3.9.X of the OpenGL specification.
1191
1192    Loads and stores support float, integer, and unsigned integer types. The
1193    data types "gimage*" serve as placeholders meaning either "image*",
1194    "iimage*", or "uimage*" in the same way as "gvec" or "gsampler".
1195
1196    The "IMAGE_INFO" in the prototypes below is a placeholder representing
1197    33 separate functions, each for a different type of image variable.  The
1198    "IMAGE_INFO" placeholder is replaced by one of the following argument
1199    lists:
1200
1201        gimage1D image, int coord
1202        gimage2D image, ivec2 coord
1203        gimage3D image, ivec3 coord
1204        gimage2DRect image, ivec2 coord
1205        gimageCube image, ivec3 coord
1206        gimageBuffer image, int coord
1207        gimage1DArray image, ivec2 coord
1208        gimage2DArray image, ivec3 coord
1209        gimageCubeArray image, ivec3 coord
1210        gimage2DMS image, ivec2 coord, int sample
1211        gimage2DMSArray image, ivec3 coord, int sample
1212
1213    (Note that each of the "gimage*" lines represents one of three different
1214    image variable types.)
1215
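The argument lists above can be summarized, non-normatively, as a small lookup: each image dimensionality fixes the number of integer coordinate components and whether a <sample> argument is present, and each of the eleven lines stands for three image types. The C sketch below encodes that reading; the enum and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* One entry per line of the IMAGE_INFO list (hypothetical names). */
enum image_kind {
    IMG_1D, IMG_2D, IMG_3D, IMG_2D_RECT, IMG_CUBE, IMG_BUFFER,
    IMG_1D_ARRAY, IMG_2D_ARRAY, IMG_CUBE_ARRAY, IMG_2DMS, IMG_2DMS_ARRAY,
    IMG_KIND_COUNT
};

/* Number of components in <coord>: int = 1, ivec2 = 2, ivec3 = 3. */
int coord_components(enum image_kind kind)
{
    switch (kind) {
    case IMG_1D:
    case IMG_BUFFER:
        return 1;
    case IMG_2D:
    case IMG_2D_RECT:
    case IMG_1D_ARRAY:
    case IMG_2DMS:
        return 2;
    case IMG_3D:
    case IMG_CUBE:
    case IMG_2D_ARRAY:
    case IMG_CUBE_ARRAY:
    case IMG_2DMS_ARRAY:
        return 3;
    default:
        return 0;   /* not a valid image kind */
    }
}

/* Only the multisample kinds take the extra <sample> argument. */
bool takes_sample(enum image_kind kind)
{
    return kind == IMG_2DMS || kind == IMG_2DMS_ARRAY;
}

/* Each line covers the "image*", "iimage*", and "uimage*" types. */
int image_info_variant_count(void)
{
    return 3 * (int)IMG_KIND_COUNT;
}
```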
1216    Syntax:
1217
1218      gvec4 imageLoad(const IMAGE_INFO);
1219
1220    Description:
1221
1222    Loads the texel at the coordinate <coord> from the image unit specified
1223    by <image>.  For multisample loads, the sample number is given by
1224    <sample>.  When <image>, <coord>, and <sample> identify a valid texel,
1225    the bits used to represent the selected texel in memory are converted to
1226    a vec4, ivec4, or uvec4 in the manner described in Section 3.9.X of the
1227    OpenGL Specification and returned.
1228
1229
1230    Syntax:
1231
1232      void imageStore(IMAGE_INFO, gvec4 data);
1233
1234    Description:
1235
1236    Stores the value of <data> into the texel at the coordinate <coord> from
1237    the image specified by <image>.  For multisample stores, the sample number
1238    is given by <sample>.  When <image>, <coord>, and <sample> identify a
1239    valid texel, the bits used to represent <data> are converted to the format
1240    of the image unit in the manner described in Section 3.9.X of the OpenGL
1241    Specification and stored to the specified texel.
1242
1243
1244    Syntax:
1245
1246      uint      imageAtomicAdd(IMAGE_INFO, uint data);
1247      int       imageAtomicAdd(IMAGE_INFO, int data);
1248
1249      uint      imageAtomicMin(IMAGE_INFO, uint data);
1250      int       imageAtomicMin(IMAGE_INFO, int data);
1251
1252      uint      imageAtomicMax(IMAGE_INFO, uint data);
1253      int       imageAtomicMax(IMAGE_INFO, int data);
1254
1255      uint      imageAtomicIncWrap(IMAGE_INFO, uint wrap);
1256
1257      uint      imageAtomicDecWrap(IMAGE_INFO, uint wrap);
1258
1259      uint      imageAtomicAnd(IMAGE_INFO, uint data);
1260      int       imageAtomicAnd(IMAGE_INFO, int data);
1261
1262      uint      imageAtomicOr(IMAGE_INFO, uint data);
1263      int       imageAtomicOr(IMAGE_INFO, int data);
1264
1265      uint      imageAtomicXor(IMAGE_INFO, uint data);
1266      int       imageAtomicXor(IMAGE_INFO, int data);
1267
1268      uint      imageAtomicExchange(IMAGE_INFO, uint data);
1269      int       imageAtomicExchange(IMAGE_INFO, int data);
1270
1271      uint      imageAtomicCompSwap(IMAGE_INFO, uint compare, uint data);
1272      int       imageAtomicCompSwap(IMAGE_INFO, int compare, int data);
1273
1274    Description:
1275
1276    These functions perform atomic operations on individual texels or samples
1277    of an image variable.  Atomic memory operations read a value from the
1278    selected texel, compute a new value using one of the operations described
1279    below, write the new value to the selected texel, and return the
1280    original value read.  The contents of the texel being updated by the
1281    atomic operation are guaranteed not to be updated by any other image store
1282    or atomic function between the time the original value is read and the
1283    time the new value is written.
1284
1285    As with image load and store functions, <image>, <coord>, and <sample>
1286    specify the individual texel to operate on.  The method for
1287    identifying the individual texel operated on from <image>, <coord>, and
1288    <sample>, and the method for reading and writing the texel are specified
1289    in Section 3.9.X of the OpenGL specification. The format of the image
1290    unit must be in the "1x32" equivalence class in Table X.2 in Section 3.9.X
1291    of the OpenGL specification, otherwise the atomic operation is invalid.
1292
1293    imageAtomicAdd() computes a new value by adding the value of <data> to the
1294    contents of the selected texel.  These functions support 32-bit unsigned
1295    integer operands and 32-bit signed integer operands.
1296
1297    imageAtomicMin() computes a new value by taking the minimum of the value
1298    of <data> and the contents of the selected texel.  These functions support
1299    32-bit signed and unsigned integer operands.
1300
1301    imageAtomicMax() computes a new value by taking the maximum of the value
1302    of <data> and the contents of the selected texel.  These functions support
1303    32-bit signed and unsigned integer operands.
1304
1305    imageAtomicIncWrap() computes a new value by adding one to the contents of
1306    the selected texel, and then forcing the result to zero if and only if the
1307    incremented value is greater than or equal to <wrap>.  These functions
1308    support only 32-bit unsigned integer operands.
1309
1310    imageAtomicDecWrap() computes a new value by subtracting one from the
1311    contents of the selected texel, and then forcing the result to <wrap>-1 if
1312    the original value read from the selected texel was either zero or greater
1313    than <wrap>.  These functions support only 32-bit unsigned integer
1314    operands.
1315
1316    imageAtomicAnd() computes a new value by performing a bitwise and of the
1317    value of <data> and the contents of the selected texel.  These functions
1318    support 32-bit signed and unsigned integer operands.
1319
1320    imageAtomicOr() computes a new value by performing a bitwise or of the
1321    value of <data> and the contents of the selected texel.  These functions
1322    support 32-bit signed and unsigned integer operands.
1323
1324    imageAtomicXor() computes a new value by performing a bitwise exclusive or
1325    of the value of <data> and the contents of the selected texel.  These
1326    functions support 32-bit signed and unsigned integer operands.
1327
1328    imageAtomicExchange() computes a new value by simply copying the value of
1329    <data>.  These functions support 32-bit signed and unsigned integer
1330    operands.
1331
1332    imageAtomicCompSwap() compares the value of <compare> and the contents of
1333    the selected texel.  If the values are equal, the new value is given by
1334    <data>; otherwise, it is taken from the original value loaded from the
1335    texel.  These functions support 32-bit signed and unsigned integer
1336    operands.
1337
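The less obvious operations above can be sketched in C as pure functions that compute the new texel value from the original value. This is a non-normative sketch that follows the prose of this section literally; the function names are hypothetical, atomicity is not modelled, and the results for edge cases such as <wrap> equal to zero are simply what C unsigned arithmetic yields rather than behavior stated by this specification. In every case the built-in function itself returns the original value, not the value computed here.

```c
#include <assert.h>
#include <stdint.h>

/* imageAtomicIncWrap(): add one, then force the result to zero if the
 * incremented value is greater than or equal to wrap. */
uint32_t inc_wrap_new_value(uint32_t original, uint32_t wrap)
{
    uint32_t incremented = original + 1u;
    return (incremented >= wrap) ? 0u : incremented;
}

/* imageAtomicDecWrap(): subtract one, then force the result to wrap-1
 * if the original value was zero or greater than wrap. */
uint32_t dec_wrap_new_value(uint32_t original, uint32_t wrap)
{
    if (original == 0u || original > wrap)
        return wrap - 1u;
    return original - 1u;
}

/* imageAtomicCompSwap(): store data only when the texel equals compare;
 * otherwise the texel keeps its original value. */
uint32_t comp_swap_new_value(uint32_t original, uint32_t compare,
                             uint32_t data)
{
    return (original == compare) ? data : original;
}
```

For example, with <wrap> equal to 4 a texel holding 3 becomes 0 after an increment, and a texel holding 0 becomes 3 after a decrement.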
1338
1339    (Insert another new numbered section at the end of Chapter 8, Built-in
1340    Functions, p. 69)
1341
1342    Section 8.Y, Shader Memory Functions
1343
1344    Shaders of all types may read and write the contents of textures and
1345    buffer objects using image variables.  While the order of reads and writes
1346    within a single shader invocation is well-defined, the relative order of
1347    reads and writes to a single shared memory address from multiple separate
1348    invocations is largely undefined.
1349
1350    Syntax:
1351
1352      void      memoryBarrier(void);
1353
1354    Description:
1355
1356    memoryBarrier() can be used to control the ordering of memory transactions
1357    issued by a shader invocation.  When called, it will wait on the
1358    completion of all memory accesses resulting from the use of image
1359    variables prior to calling the function.  When all memory operations have
1360    been flushed, memoryBarrier() returns to the caller with no other effect.
1361    When this function returns, the results of any memory stores performed
1362    using coherent variables prior to the call will be visible to
1363    any future coherent memory access to the same addresses from other shader
1364    invocations.  In particular, the values written and flushed this way in
1365    one shader stage are guaranteed to be visible to coherent memory accesses
1366    performed by shader invocations in subsequent stages when those
1367    invocations were triggered by the execution of the original shader
1368    invocation (e.g., fragment shader invocations for a primitive resulting
1369    from a particular geometry shader invocation).
1370
1371
1372    Modify Section 9, Shading Language Grammar (p. 105)
1373
1374    !!! TBD:  Add grammar constructs for memory access qualifiers, allowing
1375        memory access qualifiers before or after the type in a variable
1376        declaration.
1377
1378
1379Errors
1380
1381    INVALID_VALUE is generated by Uniform1i{v} if the location refers to an
1382    image variable and the value specified is less than zero or greater than
1383    or equal to MAX_IMAGE_UNITS_EXT.
1384
1385    INVALID_OPERATION is generated by Uniform* functions other than
1386    Uniform1i{v} if the location refers to an image variable.
1387
1388    INVALID_VALUE is generated by BindImageTextureEXT if <index> is less than
1389    zero or greater than or equal to MAX_IMAGE_UNITS_EXT.
1390
1391    INVALID_VALUE is generated by BindImageTextureEXT if <texture> is not the
1392    name of an existing texture object.
1393
1394    INVALID_VALUE is generated by BindImageTextureEXT if <format> is not a
1395    legal format.
1396
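The two Uniform* errors above partition cleanly by entry point, which the following non-normative C sketch makes explicit. The function image_uniform_error() and the SKETCH_* names are hypothetical; the numeric codes are the standard OpenGL values for NO_ERROR, INVALID_VALUE, and INVALID_OPERATION, and max_image_units stands for the value of MAX_IMAGE_UNITS_EXT.

```c
#include <assert.h>
#include <stdbool.h>

#define SKETCH_NO_ERROR          0x0000
#define SKETCH_INVALID_VALUE     0x0501
#define SKETCH_INVALID_OPERATION 0x0502

/* Error generated when a Uniform* command targets a location that
 * refers to an image variable.  is_uniform1i is true for Uniform1i and
 * Uniform1iv and false for every other Uniform* command. */
int image_uniform_error(bool is_uniform1i, int value, int max_image_units)
{
    if (!is_uniform1i)
        return SKETCH_INVALID_OPERATION;
    if (value < 0 || value >= max_image_units)
        return SKETCH_INVALID_VALUE;
    return SKETCH_NO_ERROR;
}
```

The same range test applies to the <index> argument of BindImageTextureEXT.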
1397
1398Dependencies on OpenGL 3.2 (Core Profile)
1399
1400    If only the core profile of OpenGL 3.2 is supported, references to buffer
1401    objects for conventional vertex attributes and to the Begin and RasterPos
1402    commands should be removed.
1403
1404Dependencies on OpenGL 3.1, ARB_uniform_buffer_object, and
1405EXT_bindable_uniform
1406
1407    If OpenGL 3.1, ARB_uniform_buffer_object, and EXT_bindable_uniform are not
1408    supported, references to UNIFORM_BARRIER_BIT_EXT should be removed.
1409
1410Dependencies on ARB_draw_indirect
1411
1412    If ARB_draw_indirect is not supported, references to COMMAND_BARRIER_BIT_EXT
1413    should be removed.
1414
1415Dependencies on NV_vertex_buffer_unified_memory
1416
1417    If NV_vertex_buffer_unified_memory is not supported, references to that
1418    extension and GPU addresses in the discussion of
1419    VERTEX_ATTRIB_ARRAY_BARRIER_BIT_EXT and ELEMENT_ARRAY_BARRIER_BIT_EXT should
1420    be removed.
1421
1422Dependencies on OpenGL 3.2 and ARB_texture_multisample
1423
1424    If OpenGL 3.2 and ARB_texture_multisample are not supported, references to
1425    multisample textures should be removed.
1426
1427Dependencies on OpenGL 4.0 and ARB_sample_shading
1428
1429    If OpenGL 4.0 or ARB_sample_shading is supported, the discussion of the
1430    number of shader invocations for a given fragment in the "Shader Memory
1431    Access" section of the specification should be updated to discuss the
1432    sample shading enable and the minimum sample shading factor provided in
1433    that extension.
1434
1435Dependencies on OpenGL 4.0 and ARB_texture_cube_map_array
1436
1437    If OpenGL 4.0 or ARB_texture_cube_map_array are not supported, references
1438    to cube map array textures should be removed.
1439
1440Dependencies on OpenGL 3.3 and ARB_texture_rgb10_a2ui
1441
1442    If OpenGL 3.3 or ARB_texture_rgb10_a2ui are not supported, references to
1443    the RGB10_A2UI texture format should be removed.
1444
1445Dependencies on NV_shader_buffer_load
1446
1447    If NV_shader_buffer_load is supported, the new section 2.14.X (Shader
1448    Memory Access) should be combined with "Section 2.20.X, Shader Memory
1449    Access" from NV_shader_buffer_load.
1450
1451Dependencies on OpenGL 4.0, ARB_gpu_shader5, and NV_gpu_shader5
1452
1453    If OpenGL 4.0, ARB_gpu_shader5, and NV_gpu_shader5 are not supported, the
1454    modifications to the OpenGL Shading Language Specification should be
1455    removed.
1456
1457Dependencies on OpenGL 4.0 and ARB_tessellation_shader
1458
1459    If OpenGL 4.0 and ARB_tessellation_shader are not supported, references to
1460    tessellation control and evaluation shaders should be removed.
1461
1462Dependencies on EXT_shader_atomic_counters
1463
1464    If EXT_shader_atomic_counters is not supported, remove references to
1465    ATOMIC_COUNTER_BARRIER_BIT_EXT.
1466
1467Dependencies on EXT_depth_bounds_test
1468
1469    If EXT_depth_bounds_test is not supported, references to the depth bounds
1470    test should be removed.
1471
1472Dependencies on EXT_separate_shader_objects
1473
1474    If EXT_separate_shader_objects is supported, early depth tests are enabled
1475    if and only if (a) there is an active program for the fragment shader
1476    stage and (b) the fragment shader in that program enables early depth
1477    tests using a layout qualifier.
1478
1479Dependencies on NV_gpu_program5
1480
1481    If NV_gpu_program5 is supported, the following edits are made to extend
1482    the assembly programming model documented in the NV_gpu_program4 extension
1483    and extended by NV_gpu_program5.  No "OPTION" line is required; the
1484    following capability is implied by NV_gpu_program5 program headers such as
1485    "!!NVfp5.0".
1486
1487    If NV_gpu_program5 is not supported, the contents of this dependencies
1488    section should be ignored.
1489
1490    Section 2.X.2, Program Grammar
1491
1492    (add the following rules to the grammar)
1493
1494      <namingStatement>       ::= IMAGE_statement
1495
1496      <IMAGE_statement>       ::= "IMAGE" <establishName> <imageSingleInit>
1497                                | "IMAGE" <establishName> <optArraySize>
1498                                    <imageMultipleInit>
1499
1500      <imageSingleInit>       ::= "=" <imageUseDS>
1501
1502      <imageMultipleInit>     ::= "=" "{" <imageItemList> "}"
1503
1504      <imageItemList>         ::= <imageUseDM>
1505                                | <imageUseDM> "," <imageItemList>
1506
1507      <imageUseDS>            ::= "image" <arrayMemAbs>
1508
1509      <imageUseDM>            ::= <imageUseDS>
1510                                | "image" <arrayRange>
1511
1512
1513      <instruction>           ::= <ImageInstruction>
1514
1515      <ImageInstruction>      ::= <LOADIMop_instruction>
1516                                | <STOREIMop_instruction>
1517                                | <ATOMIMop_instruction>
1518
1519      <LOADIMop_instruction>  ::= <LOADIMop> <opModifiers> <instResult> ","
1520                                       <instOperandV> "," <imageAccess>
1521
1522      <STOREIMop_instruction> ::= <STOREIMop> <opModifiers> <imageUnit> ","
1523                                       <instOperandV> "," <instOperandV> ","
1524                                       <imageTarget>
1525
1526      <ATOMIMop_instruction>  ::= <ATOMIMop> <opModifiers> <instResult> ","
1527                                       <instOperandV> "," <instOperandV> ","
1528                                       <imageAccess>
1529
1530      <LOADIMop>              ::= "LOADIM"
1531      <STOREIMop>             ::= "STOREIM"
1532      <ATOMIMop>              ::= "ATOMIM"
1533
1534      <imageAccess>           ::= <imageUnit> "," <imageTarget>
1535
1536      <imageUnit>             ::= "image" <arrayMemAbs>
1537                                | <imageVarName> <optArrayMem>
1538
1539      <imageTarget>           ::= "1D"
1540                                | "2D"
1541                                | "3D"
1542                                | "RECT"
1543                                | "CUBE"
1544                                | "BUFFER"
1545                                | "ARRAY1D"
1546                                | "ARRAY2D"
1547                                | "ARRAYCUBE"
1548                                | "2DMS"
1549                                | "ARRAY2DMS"
1550
1551    Section 2.X.3.X, Program Image Variables
1552
1553    Program image variables are used as constants during program execution
1554    and refer to the image objects bound to one or more image units. All
1555    image variables have associated bindings and are read-only during
1556    program execution.  Image variables retain their values across program
1557    invocations, and the set of image units to which they refer is
1558    constant.  The texture object a variable refers to may be changed by
1559    binding a new texture object to the corresponding image unit.  Image
1560    variables may only be used to identify a texture object in image
1561    instructions, and may not be used as operands in any other instruction.
1562    Image variables may be declared explicitly via the <IMAGE_statement>
1563    grammar rule, or implicitly by using an image unit binding in an
1564    instruction.
1565
1566    Image variables may be declared as arrays, but the list of image
1567    units assigned to the array must increase consecutively.
1568
1569      Binding          Components  Underlying State
1570      ---------------  ----------  ------------------------------------------
1571      image[a]             x       image object bound to image unit a
1572      image[a..b]          x       image objects bound to image units a
1573                                     through b
1574
1575      Table X.12.2:  Image Unit Bindings.  <a> and <b> indicate image unit
1576      numbers.
1577
1578    If an image binding matches "image[a]", the image variable is filled
1579    with a single integer referring to image unit <a>.
1580
1581    If an image binding matches "image[a..b]", the image variable is
1582    filled with an array of integers referring to image units <a> through
1583    <b>, inclusive.  A program will fail to compile if <a> or <b> is
1584    negative or greater than or equal to the number of image units
1585    supported, or if <a> is greater than <b>.
1586
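The "image[a..b]" rules above can be sketched, non-normatively, as a validity test followed by filling an array with consecutive image unit numbers. The function names are hypothetical, and num_units stands for the number of image units supported by the implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* A range binding is valid when both ends name existing image units
 * and the range is not reversed. */
bool image_range_valid(int a, int b, int num_units)
{
    if (a < 0 || b < 0)
        return false;
    if (a >= num_units || b >= num_units)
        return false;
    return a <= b;
}

/* Fills units[] with a, a+1, ..., b and returns the element count.
 * Returns 0 when the range is invalid or does not fit in capacity. */
size_t image_range_fill(int a, int b, int num_units,
                        int *units, size_t capacity)
{
    size_t count, i;

    if (!image_range_valid(a, b, num_units))
        return 0;
    count = (size_t)(b - a) + 1u;
    if (count > capacity)
        return 0;
    for (i = 0; i < count; ++i)
        units[i] = a + (int)i;
    return count;
}
```

A single "image[a]" binding is the special case where <a> equals <b>.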
1587
1588    Modify Section 2.X.4, Program Execution Environment
1589
1590      Instr-      Modifiers
1591      uction  V  F I C S H D  Out Inputs    Description
1592      ------- -- - - - - - -  --- --------  --------------------------------
1593      ATOMIM  50 - - X - - -  s   v,vs,i    atomic image operation
1594      LOADIM  50 - - X X - F  v   vs,i      image load
1595      MEMBAR  50 - - - - - -  -   -         memory barrier
1596      STOREIM 50 X X - - - F  -   i,v,vs    image store
1597
1598      ...
1599
1600      The input and output columns describe the formats of the operands and
1601      results of the instruction.
1602
1603        i:  IMAGE variable, read-only
1604
1605
1606    Modify Section 2.X.4.1, Program Instruction Modifiers
1607
1608    (add to Table X.14 of the NV_gpu_program4 specification.)
1609
1610      Modifier  Description
1611      --------  ---------------------------------------------------
1612      COH       Mark LOADIM and STOREIM operations as coherent
1613      VOL       Mark LOADIM and STOREIM operations as volatile
1614
1615    For image load and store operations, the "COH" modifier controls whether
1616    the operation is performed in a manner guaranteed to be coherent with
1617    loads and stores performed by other shader invocations.
1618
1619    For image load and store operations, the "VOL" modifier controls whether
1620    the operation should treat the contents of the image accessed as volatile,
1621    where the underlying image contents may be changed at any point during
1622    shader execution by some source other than the current shader thread.
1623
1624
1625    Section 2.X.8.Z, LOADIM:  Image Load
1626
1627    The LOADIM instruction takes the components of a single signed integer
1628    vector operand and uses them as coordinates to perform an unformatted
1629    image load from the texture bound to the image unit specified by
1630    <imageUnit>. Unformatted loads read the data from memory without
1631    converting from the image unit format, by copying raw bits from memory
1632    to the destination variable according to the bit layouts described in
1633    Table X.2, where word 0 is written to the .x component, word 1 to .y,
1634    etc.
1635
1636    Eleven image targets are supported:  1D, 2D, 3D, RECT, CUBE, BUFFER,
1637    ARRAY1D, ARRAY2D, ARRAYCUBE, 2DMS, and ARRAY2DMS.  The texel coordinate
1638    is a one-, two- or three-dimensional vector, taken from the <x>, <y>, and
1639    <z> components of the operand.  For the 2DMS and ARRAY2DMS targets, the
1640    texel coordinate is a two- or three-dimensional vector, taken from the
1641    <x>, <y>, and <z> components of the operand, and a sample number is taken
1642    from the <w> component of the operand.
1643
1644        coords = VectorLoad(op0);
1645        if (target == 1D || target == BUFFER) {
1646          coords.y = 0;
1647        }
1648        if (target == 1D || target == 2D ||
1649            target == BUFFER || target == RECT ||
1650            target == 2DMS) {
1651          coords.z = 0;
1652        }
1653        if (target != 2DMS && target != ARRAY2DMS) {
1654          coords.w = 0;
1655        }
1656        result = ImageLoad(image, coords);
1657
1658    When an image load uses the "S8", "U8", "S16", "U16", "F32", "S32", or
1659    "U32" storage modifiers, the <x> component of the result contains the
1660    loaded value and the <y>, <z>, and <w> components of the result are zero,
1661    zero, and one (int or float, depending on the type of the opModifier),
1662    respectively. For "S8" and "S16" modifiers, the loaded value is sign-
1663    extended; for "U8" and "U16", the loaded value is zero-extended.  When
1664    an image load uses the "F32X2", "S32X2", or "U32X2" storage modifiers,
1665    the <x> and <y> components of the result contain the loaded values and
1666    the <z> and <w> components of the result are zero and one, respectively.
1667    When an image load uses the "F32X4", "S32X4", or "U32X4" storage
1668    modifiers, all four components of the result contain the loaded values.
1669    If the image load is invalid for any of the reasons described in Section
1670    3.9.X, the result vector will be undefined.
1671
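For the integer single-component modifiers, the expansion described above can be sketched in C as follows. This is a non-normative sketch with hypothetical names: sign_extend() models the "S8" and "S16" cases, zero extension is the plain unsigned value, and scalar_int_result() places the loaded value in <x> and fills <y>, <z>, and <w> with zero, zero, and one. The floating-point case ("F32") fills the same components with 0.0, 0.0, and 1.0 and is not shown.

```c
#include <assert.h>
#include <stdint.h>

struct ivec4 { int32_t x, y, z, w; };

/* Sign-extends the low `bits` bits of raw (bits is 8 or 16). */
int32_t sign_extend(uint32_t raw, unsigned bits)
{
    uint32_t sign_bit = 1u << (bits - 1u);
    uint32_t mask = (sign_bit << 1) - 1u;
    uint32_t value = raw & mask;

    if (value & sign_bit)
        return (int32_t)value - (int32_t)(mask + 1u);
    return (int32_t)value;
}

/* Result vector for a one-component integer load: (value, 0, 0, 1). */
struct ivec4 scalar_int_result(int32_t loaded)
{
    struct ivec4 result = { loaded, 0, 0, 1 };
    return result;
}
```

For example, the raw byte 0xFF loaded with "S8" yields an <x> component of -1, while the same byte loaded with "U8" yields 255.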
1672    LOADIM supports no base data type modifiers, but requires exactly one
1673    storage modifier.  An image load is treated as invalid unless the storage
1674    modifier matches the image unit format, as described in Table X.3.  The
1675    base data type of the result vector is derived from the storage modifier.
1676    The single operand is always interpreted as a signed integer vector.
1677
1678        Data Type    Supported Modifiers
1679        ---------    -------------------
1680          4x32       F32X4, S32X4, U32X4
1681          2x32       F32X2, S32X2, U32X2
1682          1x32       F32,   S32,   U32
1683          1x16              S16,   U16
1684          1x8               S8,    U8
1685
1686      Table X.3, Supported Storage Modifiers.  Unformatted image operations
1687      are considered invalid unless the storage modifier is compatible with
1688      the "Data Type" entry for the image unit format, as described in Table
1689      X.2.
1690
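Table X.3 can be read as a simple compatibility predicate. The C sketch below is a non-normative transcription of the table; the enum and function names are hypothetical, and the data type class of a given image unit format is assumed to come from Table X.2.

```c
#include <assert.h>
#include <stdbool.h>

/* "Data Type" column of Table X.3. */
enum data_type_class { DT_4X32, DT_2X32, DT_1X32, DT_1X16, DT_1X8 };

/* Storage modifiers listed in Table X.3. */
enum storage_modifier {
    MOD_F32X4, MOD_S32X4, MOD_U32X4,
    MOD_F32X2, MOD_S32X2, MOD_U32X2,
    MOD_F32,   MOD_S32,   MOD_U32,
    MOD_S16,   MOD_U16,
    MOD_S8,    MOD_U8
};

/* True when the modifier appears in the table row for the class. */
bool storage_modifier_matches(enum data_type_class dt,
                              enum storage_modifier mod)
{
    switch (dt) {
    case DT_4X32:
        return mod == MOD_F32X4 || mod == MOD_S32X4 || mod == MOD_U32X4;
    case DT_2X32:
        return mod == MOD_F32X2 || mod == MOD_S32X2 || mod == MOD_U32X2;
    case DT_1X32:
        return mod == MOD_F32 || mod == MOD_S32 || mod == MOD_U32;
    case DT_1X16:
        return mod == MOD_S16 || mod == MOD_U16;
    case DT_1X8:
        return mod == MOD_S8 || mod == MOD_U8;
    }
    return false;
}
```

An unformatted access for which this predicate is false is treated as invalid.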
1691
1692    Section 2.X.8.Z, STOREIM:  Image Store
1693
1694    The STOREIM instruction takes the components of the second signed integer
1695    vector operand, uses them as coordinates to perform a formatted or
1696    unformatted image store to the texture bound to the image unit specified
1697    by <imageUnit> using the data specified in the first vector operand.  The
1698    store is performed in the manner described in Section 3.9.X.
1699
1700    Eleven image targets are supported:  1D, 2D, 3D, RECT, CUBE, BUFFER,
1701    ARRAY1D, ARRAY2D, ARRAYCUBE, 2DMS, and ARRAY2DMS.  The texel coordinate
1702    is a one-, two- or three-dimensional vector, taken from the <x>, <y>, and
1703    <z> components of the operand.  For the 2DMS and ARRAY2DMS targets, the
1704    texel coordinate is a two- or three-dimensional vector, taken from the
1705    <x>, <y>, and <z> components of the operand, and a sample number is taken
1706    from the <w> component of the operand.
1707
1708        data = VectorLoad(op0);
1709        coords = VectorLoad(op1);
1710        if (target == 1D || target == BUFFER) {
1711          coords.y = 0;
1712        }
1713        if (target == 1D || target == 2D ||
1714            target == BUFFER || target == RECT ||
1715            target == 2DMS) {
1716          coords.z = 0;
1717        }
1718        if (target != 2DMS && target != ARRAY2DMS) {
1719          coords.w = 0;
1720        }
1721        ImageStore(image, coords, data);
1722
1723    STOREIM supports an optional base data type or storage modifier.  If a
1724    storage modifier is specified, the store is unformatted; otherwise, it is
1725    formatted.  Formatted stores operate as described in Section 3.9.X.
1726    Unformatted stores write the data to memory without converting to the
1727    image unit format, by copying raw bits from the source variable to
1728    memory according to the bit layouts described in Table X.2, where word
1729    0 is taken from the <x> component, word 1 from <y>, etc.
1730
1731    An unformatted image store is treated as invalid unless the
1732    storage modifier matches the image unit format, as described in Table X.3.
1733    When performing an unformatted store using the "S8", "U8", "S16", or
1734    "U16" modifiers, all bits but the least significant eight or sixteen
1735    are dropped as part of the store.  When performing a formatted store,
1736    the first operand will be converted to the image unit format as part
1737    of the store.
1738
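The narrowing applied by unformatted "S8", "U8", "S16", and "U16" stores can be sketched, non-normatively, as a mask that keeps only the least significant bits of the source component. The function name below is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Keeps the least significant `bits` bits of value and drops the rest.
 * bits is 8 or 16 for the narrow modifiers; 32 leaves value unchanged. */
uint32_t truncate_for_store(uint32_t value, unsigned bits)
{
    if (bits >= 32u)
        return value;
    return value & ((1u << bits) - 1u);
}
```

For example, storing 0x12345678 with an 8-bit modifier writes 0x78, and with a 16-bit modifier writes 0x5678.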
1739    The base data type of the first vector operand is derived from the data
1740    type or storage modifier.  The second operand is always interpreted as a
1741    signed integer vector.
1742
1743
1744    Section 2.X.8.Z, ATOMIM:  Image Atomic Memory Operation
1745
1746    The ATOMIM instruction takes the components of the second signed integer
1747    vector operand, uses them as coordinates to perform an unformatted image
1748    load from the texture bound to the image unit specified by <imageUnit>,
1749    performs a computation using the loaded value and the first vector
1750    operand, performs an unformatted store of the result of the computation to
1751    the same texel, and then returns the loaded value in the vector result.
1752    The atomic operation is performed in the manner described in Section
1753    3.9.X.
1754
1755    The ATOMIM instruction has two required instruction modifiers.  The atomic
1756    modifier specifies the type of computation to be performed.  The storage
1757    modifier specifies the size and data type of the operand read from the
1758    image unit and the base data type of the operation used to compute the
1759    value to be written back.
1760
1761      atomic     storage
1762      modifier   modifiers   operation
1763      --------   ---------   --------------------------------------
1764       ADD       U32, S32    compute a sum
1765       MIN       U32, S32    compute minimum
1766       MAX       U32, S32    compute maximum
1767       IWRAP     U32         increment memory, wrapping at operand
1768       DWRAP     U32         decrement memory, wrapping at operand
1769       AND       U32, S32    compute bit-wise AND
1770       OR        U32, S32    compute bit-wise OR
1771       XOR       U32, S32    compute bit-wise XOR
1772       EXCH      U32, S32    exchange memory with operand
1773       CSWAP     U32, S32    compare-and-swap
1774
1775     Table X.4, Supported atomic and storage modifiers for the ATOMIM
1776     instruction.
1777
1778    Not all storage modifiers are supported by ATOMIM, and the set of
1779    modifiers allowed for any given instruction depends on the atomic modifier
1780    specified.  Table X.4 enumerates the set of atomic modifiers supported by
1781    the ATOMIM instruction, and the storage modifiers allowed for each.
1782
1783        tmp0 = VectorLoad(op0);
1784        coords = VectorLoad(op1);
1785        if (target == 1D || target == BUFFER) {
1786          coords.y = 0;
1787        }
1788        if (target == 1D || target == 2D ||
1789            target == BUFFER || target == RECT ||
1790            target == 2DMS) {
1791          coords.z = 0;
1792        }
1793        if (target != 2DMS && target != ARRAY2DMS) {
1794          coords.w = 0;
1795        }
1796        result = ImageLoad(image, coords);
1797        switch (atomicModifier) {
1798        case ADD:
1799          writeval = tmp0.x + result;
1800          break;
1801        case MIN:
1802          writeval = min(tmp0.x, result);
1803          break;
1804        case MAX:
1805          writeval = max(tmp0.x, result);
1806          break;
1807        case IWRAP:
1808          writeval = (result >= tmp0.x) ? 0 : result+1;
1809          break;
1810        case DWRAP:
1811          writeval = (result == 0 || result > tmp0.x) ? tmp0.x : result-1;
1812          break;
1813        case AND:
1814          writeval = tmp0.x & result;
1815          break;
1816        case OR:
1817          writeval = tmp0.x | result;
1818          break;
1819        case XOR:
1820          writeval = tmp0.x ^ result;
1821          break;
1822        case EXCH:
              writeval = tmp0.x;
1823          break;
1824        case CSWAP:
1825          if (result == tmp0.x) {
1826            writeval = tmp0.y;
1827          } else {
1828            writeval = result;
1829          }
1830          break;
1831        }
1832        ImageStore(image, coords, writeval);
1833
1834    ATOMIM performs a scalar atomic operation.  The <y>, <z>, and <w>
1835    components of the result vector are undefined.
1836
1837    ATOMIM supports no base data type modifiers, but requires exactly one
1838    storage and one atomic modifier.  An image atomic is treated as invalid
1839    unless the storage modifier matches the format of the texture bound to the
1840    image unit, as described in Table X.3.  The base data type of the result
1841    and the first operand is derived from the storage modifier.  The second
1842    operand is always interpreted as a signed integer vector.
1843
1844
1845    Section 2.X.8.Z, MEMBAR:  Memory Barrier
1846
1847    The MEMBAR instruction synchronizes memory transactions to ensure that
1848    memory transactions resulting from any instruction executed by the thread
1849    prior to the MEMBAR instruction complete prior to any memory transactions
1850    issued after the instruction.
1851
1852    MEMBAR has no operands and generates no result.
1853
1854    Modify Section 3.9.X, Texture Image Loads and Stores, as added above.
1855
1856    (Add a separate paragraph and table describing how the four-component
1857    coordinate vector used in image load, store, and atomic opcodes is mapped
1858    to individual texels.)
1859
1860    When a program accesses the texture bound to an image unit using the
1861    LOADIM, STOREIM, or ATOMIM opcodes, it provides a four-component
1862    coordinate vector used to select individual texels or samples.  This
1863    (x,y,z,w) vector is used to select an individual texel tau_i, tau_i_j, or
1864    tau_i_j_k according to the target of the texture bound to the image unit
1865    using Table X.5.  As noted above, single-layer bindings of array or cube
1866    map textures are considered to use a texture target corresponding to the
1867    bound layer, rather than that of the full texture.
1868
1869                                                   face/
1870                                          i  j  k  layer sample
1871                                          -- -- -- ----- ------
1872        TEXTURE_1D                        x  -  -    -     -
1873        TEXTURE_2D                        x  y  -    -     -
1874        TEXTURE_3D                        x  y  z    -     -
1875        TEXTURE_RECTANGLE                 x  y  -    -     -
1876        TEXTURE_CUBE_MAP                  x  y  -    z     -
1877        TEXTURE_BUFFER                    x  -  -    -     -
1878        TEXTURE_1D_ARRAY                  x  -  -    z     -
1879        TEXTURE_2D_ARRAY                  x  y  -    z     -
1880        TEXTURE_CUBE_MAP_ARRAY_ARB        x  y  -    z     -
1881        TEXTURE_2D_MULTISAMPLE            x  y  -    -     w
1882        TEXTURE_2D_MULTISAMPLE_ARRAY      x  y  -    z     w
1883
1884        Table X.5, Mapping of image load, store, and atomic texel coordinate
1885        components to texel numbers.
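
    The mapping in Table X.5 can be sketched as a small lookup in C.  This
    is an illustration only:  the enum names, the texel_address struct, and
    map_coordinate() are hypothetical and are not GL tokens or entry points.
    Unused columns ("-" in the table) are reported as zero here purely for
    convenience.

```c
#include <assert.h>

/* Hypothetical target names for illustration; these are not GL tokens. */
typedef enum {
    TGT_1D, TGT_2D, TGT_3D, TGT_RECTANGLE, TGT_CUBE_MAP, TGT_BUFFER,
    TGT_1D_ARRAY, TGT_2D_ARRAY, TGT_CUBE_MAP_ARRAY,
    TGT_2D_MULTISAMPLE, TGT_2D_MULTISAMPLE_ARRAY, TGT_COUNT
} image_target;

/* One field per column of Table X.5. */
typedef struct { int i, j, k, layer, sample; } texel_address;

enum { X = 0, Y = 1, Z = 2, W = 3, NONE = -1 };

/* For each target, the coordinate component that feeds each column. */
static const texel_address table_x5[TGT_COUNT] = {
    [TGT_1D]                   = { X, NONE, NONE, NONE, NONE },
    [TGT_2D]                   = { X, Y,    NONE, NONE, NONE },
    [TGT_3D]                   = { X, Y,    Z,    NONE, NONE },
    [TGT_RECTANGLE]            = { X, Y,    NONE, NONE, NONE },
    [TGT_CUBE_MAP]             = { X, Y,    NONE, Z,    NONE },
    [TGT_BUFFER]               = { X, NONE, NONE, NONE, NONE },
    [TGT_1D_ARRAY]             = { X, NONE, NONE, Z,    NONE },
    [TGT_2D_ARRAY]             = { X, Y,    NONE, Z,    NONE },
    [TGT_CUBE_MAP_ARRAY]       = { X, Y,    NONE, Z,    NONE },
    [TGT_2D_MULTISAMPLE]       = { X, Y,    NONE, NONE, W    },
    [TGT_2D_MULTISAMPLE_ARRAY] = { X, Y,    NONE, Z,    W    },
};

/* Select component c of the coordinate, or 0 if the column is unused. */
static int pick(const int xyzw[4], int c)
{
    return c == NONE ? 0 : xyzw[c];
}

/* Apply Table X.5 to a four-component integer coordinate. */
texel_address map_coordinate(image_target target, const int xyzw[4])
{
    assert((int)target >= 0 && target < TGT_COUNT);
    const texel_address *m = &table_x5[target];
    texel_address a = {
        pick(xyzw, m->i), pick(xyzw, m->j), pick(xyzw, m->k),
        pick(xyzw, m->layer), pick(xyzw, m->sample)
    };
    return a;
}
```

    For example, with the coordinate (5, 6, 7, 3), a
    TEXTURE_2D_MULTISAMPLE_ARRAY binding selects texel (5, 6) of layer 7,
    sample 3, while a TEXTURE_1D_ARRAY binding selects texel 5 of layer 7.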
1886
1887
1888Issues
1889
1890    (1) How are the format and type of the load/store determined?
1891
1892      RESOLVED:  There is a natural desire to load and store using a
1893      canonical 4-vector in the shader with hardware converting to/from a
1894      format compatible with the bound image, to be consistent with how
1895      texture loads and fragment shader outputs currently behave. There is
1896      also good reason to allow some flexibility in the format used for image
1897      accesses being different from the internal format of the texture level.
1898      We allow format conversions to and from any format that image units
1899      support. We make the format be selected when the image is bound to an
1900      image unit, and define which image unit formats can be used for which
1901      texture level internal formats. For example, it is legal to access an
1902      image whose internal format is RGBA8 with an image unit format of
1903      R32UI.
1904
1905    (2) What set of texture formats should be supported for image loads and
1906        stores?
1907
1908      RESOLVED:  We allow textures to be bound to image units if and only if
1909      the implementation supports formatted stores for the texture format.
1910      Any texture formats not explicitly enumerated in this extension may not
1911      be bound to an image unit, although future extensions may add new
1912      formats to the set of supported formats.
1913
1914      In particular, this extension supports one-, two-, and four-component
1915      textures with 8-, 16-, and 32-bit components, including floating-point,
1916      signed integer, unsigned integer, as well as signed and unsigned
1917      normalized formats.  Additionally, a small number of other formats are
1918      supported, including the 11/11/10 RGB format from EXT_packed_float and
1919      10/10/10/2 unsigned normalized RGBA.
1920
1921    (3) Should we generally support image loads and stores for three-component
1922        "RGB" formats?
1923
1924      RESOLVED:  Not in this extension.  If an application needs to perform
1925      image loads and stores on a three-component texture, it could use an
1926      equivalent RGBA format and ignore the alpha component.  The
1927      EXT_texture_swizzle extension could be used to make the values returned
1928      by texture appear identical to an RGB texture, if required.
1929
1930    (4) Should textures be unbound from image units when they are deleted?
1931
1932      RESOLVED:  Yes, this matches behavior of existing bind points.
1933
1934    (5) Should we support image loads and stores for the deprecated LUMINANCE,
1935        LUMINANCE_ALPHA, and ALPHA formats?
1936
1937      RESOLVED:  No, only support the RGBA-style formats. EXT_texture_swizzle
1938      can be used to mimic luminance and alpha if required.
1939
1940    (6) Should we support 64-bit atomics on images?  Should we support atomics
1941        at all on formats with 8-, 16-, 64-, or 128-bit texels?
1942
1943      RESOLVED:  No, we will only support 32-bit atomic operations on images.
1944
1945    (7) How do shader image loads and stores interact with texture
1946        completeness?  What happens if you bind a texture with inconsistent
1947        mipmaps?
1948
1949      RESOLVED:  The image unit is treated as if nothing were bound, where
1950      all accesses are treated as invalid.
1951
1952    (8) What happens if the value passed to Uniform1i to specify the image
1953        unit corresponding to an image variable refers to a non-existent image
1954        unit (i.e., is negative or greater than or equal to the number of
1955        image units supported)?
1956
1957      RESOLVED:  Values referring to invalid image units will be rejected and
1958      produce an INVALID_VALUE error.
1959
1960    (9) Should we provide counting rules for image variable use in different
1961        shaders like we have for samplers?  In particular, there are limits
1962        on the amount of state, the number of active samplers in each shader
1963        stage, and the sum of the active sampler counts in each stage.
1964
1965      RESOLVED:  No.  It was considered sufficient to have just a limit on the
1966      total number of image units in the implementation (i.e., the number of
1967      distinct values that the variable can be set to).
1968
1969    (10) Can this extension be used to load and store values into a buffer
1970         object?  Into a renderbuffer?
1971
1972      RESOLVED:  Yes, indirectly.  The TEXTURE_BUFFER target provided by
1973      OpenGL 3.1 and the EXT_texture_buffer_object extension allows an
1974      application to create a one-dimensional buffer texture using the data
1975      store of a buffer object. This buffer texture may be bound to an image
1976      unit and accessed with an imageBuffer variable in the Shading Language.
1977
1978      This extension adds support for image accesses to multisample textures,
1979      but not renderbuffers. Note that with the ARB_texture_multisample
1980      extension, there is no longer a good reason to use renderbuffers.
1981      Existing 2D or rectangle targets already provided a superset of single-
1982      sample renderbuffer functionality; the new ARB extension provides a
1983      superset of multisample renderbuffer functionality.
1984
1985    (11) What amount of automatic synchronization is provided for image loads
1986         and stores?  In particular, is the use of MemoryBarrierEXT() required
1987         to ensure consistent ordering relative to other GL operations?  Or is
1988         some other mechanism (e.g., unbinding a texture from an image unit
1989         and then binding it to a texture image unit) sufficient?
1990
1991      RESOLVED:  Use of MemoryBarrierEXT is required, and there is no
1992      automatic synchronization when images are bound or unbound.
1993
1994      Implicit synchronization is difficult, as it might require some
1995      combination of:
1996
1997        - tracking which images might be written (randomly) in the shader
1998          itself;
1999
2000        - assuming that if a shader that performs writes is executed, all
2001          texels of all bound images could be modified and thus must be
2002          treated as dirty;
2003
2004        - idling at the end of each primitive or draw call, so that the
2005          results of all previous commands are complete.
2006
2007      Since normal OpenGL operation is pipelined, idling would result in a
2008      significant performance impact since pipelining would otherwise allow
2009      fragment shader execution for draw call N while simultaneously
2010      performing vertex shader execution for draw call N+1.
2011
2012    (12) Should image loads and stores be allowed for all shader types?
2013
2014      RESOLVED:  Yes, it seems useful.
2015
2016      Note that some shader types pose specific implementation complexities
2017      (e.g., reuse of vertices in vertex shaders, number of fragment shader
2018      invocations in multisample modes, relative order of execution within and
2019      between shader groups).  We have explicitly specified several cases
2020      where the invocation count and execution order are undefined.  While
2021      these cases may be a problem for some algorithms, we expect that many
2022      algorithms will not be adversely impacted.
2023
2024    (13) Should an implementation be required to throw INVALID_OPERATION
2025         errors if the dimension of the texture coordinates implied by the
2026         image variable type doesn't match the structure of the texture
2027         level/layer bound to the corresponding image unit?  If not, what
2028         happens in such a mismatch?
2029
2030      RESOLVED:  No.  The results of image accesses are undefined.
2031
2032    (14) Should shader image variable types include a "format" implying the
2033         data type accepted/returned by shader image loads and stores?  For
2034         example, an image variable corresponding to a 2D texture with format
2035         of RGBA32F might have a type "image2Dvec4", with the "vec4"
2036         indicating that the image data lines up with a four-component
2037         floating-point vector.
2038
2039      RESOLVED:  No.  Separate types are provided for float vs. int vs.
2040      unsigned int, but not for each image format.
2041
2042    (15) If shader image variable types include information on the texel
2043         components returned or written by shader image accesses, should an
2044         implementation be required to enforce errors if the variable type is
2045         incompatible with the format of the referenced texture?  If not, or
2046         if the image variable type doesn't include format information, what
2047         happens in case of a mismatch between the texture format and the
2048         shader access format?
2049
2050      RESOLVED:  We aren't including types in the variable that correspond
2051      to the image format, so an error check in the driver is not possible.
2052
2053      If an individual load, store, or atomic uses a data type incompatible
2054      with the texture bound to the image unit, loads will return and stores
2055      will write undefined values.
2056
2057    (16) Is it possible to bind the "default texture" (numbered zero) for a
2058         given texture target to an image unit?
2059
2060      RESOLVED:  No.  Passing zero to BindImageTexture unbinds any texture
2061      currently bound to the selected image unit.  If this ability were
2062      provided, it would also be necessary to provide some mechanism to
2063      specify a texture target because there is a separate default "zero"
2064      texture for each target.
2065
2066      Note that existing framebuffer objects have a similar behavior; default
2067      textures can't be attached to an FBO.
2068
2069    (17) May bordered textures be used with image loads and stores?
2070
2071      RESOLVED:  No.
2072
2073    (18) Should we have defined behavior if invalid coordinates are passed to
2074         an image load, store, or atomic operation?  If so, what happens?
2075
2076      RESOLVED:  Yes.  Loads and atomics with invalid coordinates return
2077      zero, and stores and atomics with invalid coordinates have no effect
2078      on any bound texture.
2079
2080    (19) Should we have a limit on the total number of combined image units
2081         and draw buffers, and if so, what should that be?
2082
2083      RESOLVED:  Yes, some hardware requires this.  A program that exceeds
2084      the limit will fail to link.
2085
2086    (20) What happens if a shader specifies an image store or atomic operation
2087         for killed/discarded pixels?
2088
2089      RESOLVED:  For GLSL shaders that execute a "discard" instruction, any
2090      image stores or atomics performed before executing the discard will
2091      behave normally.  When the "discard" instruction is executed, the shader
2092      invocation will be terminated and will perform no further image store or
2093      atomic operations.
2094
2095      For assembly shaders (NV_gpu_program5) that execute a "KIL" instruction,
2096      any image stores or atomics performed before executing the KIL will
2097      behave normally.  Unlike GLSL's "discard", the "KIL" instruction does
2098      not terminate program invocations.  However, any image store or atomic
2099      operations performed after the KIL instruction do not update memory, and
2100      the value returned by atomic operations is undefined.
2101
2102    (21) When enabling early depth tests in a program, what happens if a
2103         fragment fails one of the tests (e.g., depth test)?
2104
2105      RESOLVED:  The specification indicates that the fragment shader is not
2106      executed.  Implementations might still end up running the fragment
2107      shader for implementation-dependent reasons.  For example, the fragment
2108      shader may be run in order to approximate derivatives for neighboring
2109      pixels that did pass all per-fragment tests.  In these cases,
2110      implementations must guarantee that image stores have no effect.
2111
2112    (22) If implementations run fragment shaders for fragments that aren't
2113         covered by the primitive or fail early depth tests (e.g., "helper
2114         pixels"), how does that interact with stores and atomics?
2115
2116      RESOLVED:  The current OpenGL specification has no formal notion of
2117      "helper" pixels.  In practice, implementations may run fragment shaders
2118      for pixels near the boundaries of rasterized primitives to allow
2119      derivatives to be approximated by differencing.  Typically, these shader
2120      invocations have no effect.  While they may produce outputs, the outputs
2121      for these pixels will be discarded without affecting the framebuffer.
2122      The spec basically treats these shader invocations as though they don't
2123      exist.
2124
2125      If such a shader invocation performs store or atomic operations, we need
2126      to define what happens.  In our definition, stores will have no effect,
2127      atomics will not update memory, and the values returned by atomics will
2128      be undefined.  The fact that these invocations don't affect memory is
2129      consistent with the notion of helper pixel shader invocations not
2130      existing.
2131
2132      However, it is possible to write a fragment shader where flow control
2133      depends on the (undefined) values returned by the atomic.  In this case,
2134      the undefined values returned for helper pixels could result in very
2135      long execution time (appearing to be a hang) or an infinite loop.  To
2136      avoid hangs in such cases, it is possible to use the fragment shader
2137      input sample mask to identify helper pixels:
2138
2139        // If the input sample mask is non-zero, at least one sample is
2140        // covered and the invocation should be treated as a real invocation.
2141        // If the sample mask is zero, nothing is covered and this should be
2142        // treated as a helper pixel.  If more than 32 samples are supported,
2143        // additional words of gl_SampleMaskIn would need to be checked.
2144        if (gl_SampleMaskIn[0] != 0)  {
2145          // "real" pixel, perform atomic operations
2146        } else {
2147          // "helper" pixel, skip atomics
2148        }
2149
2150      It may be desirable to formalize the notion of helper pixels in a future
2151      addition to the shading language.
2152
2153    (23) What API should we use to specify early depth tests?
2154
2155      RESOLVED:  Use a layout qualifier in a fragment shader rather than
2156      having a separate program parameter or other piece of GL state.
2157
2158    (24) For formatted loads where the format doesn't include some component,
2159         what values are filled in? (0,0,0,1)? (0,0,0,0)?
2160
2161      RESOLVED: Prefer (0,0,0,1) to match other APIs.
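
      As a rough C sketch of this rule for floating-point formats, the
      hypothetical helper below expands the components actually present in
      a format to a full four-component vector, taking any missing ones
      from (0,0,0,1).  The function name and signature are assumptions made
      for illustration; integer formats would presumably use the integer
      values 0 and 1 in the same positions.

```c
#include <assert.h>

/* Expand the n components present in a format to a full 4-vector.
   Missing components take the corresponding value from (0,0,0,1). */
void expand_loaded_texel(const float *src, int n, float out[4])
{
    static const float fill[4] = { 0.0f, 0.0f, 0.0f, 1.0f };
    assert(n >= 0 && n <= 4);
    for (int c = 0; c < 4; ++c)
        out[c] = (c < n) ? src[c] : fill[c];
}
```

      Under this sketch, a two-component texel (r, g) is returned to the
      shader as (r, g, 0, 1).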
2162
2163    (25) How does the combined-image-and-fragment-output limit interact with
2164         separate shader objects?  For example, an application may want to
2165         share a single image unit between two shader stages and not have it
2166         count twice against the limit.
2167
2168      RESOLVED:  The known implementations of this extension do not have this
2169      issue, so we chose not to include any spec language.  Perhaps a
2170      Begin-time error could be specified in the future if this limit is
2171      exceeded.
2172
2173    (26) What sort of qualifiers should we provide relevant to memory
2174         referenced by image variables?
2175
2176      RESOLVED:  We will support the qualifiers "coherent", "volatile",
2177      "restrict", and "const" to be used in image variable declarations.
2178
2179      "coherent" is used to ensure that memory accesses from different shader
2180      invocations are cached coherently (i.e., one invocation will be able to
2181      observe writes from another when the other invocation's writes
2182      complete).  This coherence may mean that "coherent"-qualified image
2183      variables perform more slowly than otherwise equivalent unqualified
2184      variables.
2185
2186      "volatile" behaves as in C, and may be needed if an algorithm
2187      requires reading image memory that may be written asynchronously by
2188      other shader invocations.
2189
2190      "restrict" behaves as in the C99 standard, and can be used to indicate
2191      that no other image variable points to the same underlying data.  This
2192      permits optimizations that would otherwise be impossible if the compiler
2193      has to assume that a pair of images might end up pointing to the same
2194      data.  For example, in standard C/C++, code like:
2195
2196        int *a, *b;
2197        a[0] = b[0] + b[0];
2198        a[1] = b[0] + b[1];
2199        a[2] = b[0] + b[2];
2200
2201      would need to reload b[0] for each assignment because a[0] or a[1] might
2202      point at the same data as b[0].  With restrict, the compiler can assume
2203      that b[0] is not modified by any of the instructions and load it just
2204      once.  The same considerations apply to accesses using imageLoad(),
2205      imageStore(), and imageAtomic*() builtins.
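
      The C99 fragment above can be completed into a compilable sketch.
      The function names are hypothetical.  The first version must behave
      correctly even when <a> and <b> overlap; the second uses "restrict",
      so the caller promises that they do not, and the compiler is free to
      load b[0] once.

```c
#include <assert.h>

/* Without restrict: a store through a may change b[0], so each statement
   must observe the current value of b[0]. */
void add_first(int *a, const int *b)
{
    a[0] = b[0] + b[0];
    a[1] = b[0] + b[1];
    a[2] = b[0] + b[2];
}

/* With restrict: the caller guarantees that a and b do not overlap, so
   b[0] may be loaded a single time.  Calling this with overlapping
   arrays is undefined behavior in C99. */
void add_first_restrict(int *restrict a, const int *restrict b)
{
    assert(a != b);   /* cheap sanity check only; not a full overlap test */
    a[0] = b[0] + b[0];
    a[1] = b[0] + b[1];
    a[2] = b[0] + b[2];
}
```

      Calling add_first(c, c) on the array {1, 2, 3} is legal and yields
      {2, 4, 5}, because the store to c[0] is visible to the later reads
      of b[0].  This is exactly the dependency that "restrict" lets the
      compiler ignore.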
2206
2207      "const" behaves as in C, and indicates that the image memory should be
2208      treated as read-only.  Note that the use of "const" in image variable
2209      declarations is different from the normal "const" qualifier, as it
2210      treats the image data referenced by the variable as constant.
2211
2212    (27) How should shaders be able to express qualifiers for image variables?
2213
2214      RESOLVED:  This extension borrows from C/C++ syntax rules where a
2215      qualifier may be specified before or after the type.  For example,
2216
2217        layout(size4x32) const uniform image2D imageVariable;
2218
2219      declares an image uniform whose image data are treated as read-only.  We
2220      permit qualifiers to be provided either before or after the type name
2221      (image2D).  The position of the qualifier is meaningful.  Qualifiers
2222      before the type name apply to the data referenced by the variable.
2223      Qualifiers after the type name apply to the variable itself.
2224
2225      The closest C/C++ equivalent to the declarations above would turn
2226      declarations like:
2227
2228        layout(size4x32) const uniform image2D firstImage;
2229        layout(size4x32) uniform image2D const secondImage;
2230
2231      into:
2232
2233        const struct image2D_data * firstImage;
2234        struct image2D_data * const secondImage;
2235
2236      where "image2D" is replaced with "struct image2D_data *".  In this
2237      model, the former declares <firstImage> to be a pointer to constant
2238      image data.  The latter declares <secondImage> to be a constant pointer
2239      to non-constant image data.
2240
2241      For "coherent", "volatile", and "const", the qualifier should typically
2242      go before the image type.  For "restrict", the qualifier must go after
2243      the image type, since "restrict" applies to the pointer, not the data
2244      being pointed to.
2245
2246      Note that a qualifier could theoretically be specified before and after
2247      the type name, such as:
2248
2249        const image2D const imageVariable;
2250
2251      which would declare <imageVariable> to be constant and to reference
2252      constant image data.  In this extension, declaring an image variable to
2253      be constant isn't meaningful, as such variables can never be used as
2254      l-values.
2255
2256    (28) What is the meaning of "restrict" on a system that might run either
2257         multiple invocations of the same shader simultaneously, or multiple
2258         invocations of different shaders (vertex and fragment)
2259         simultaneously?
2260
2261      RESOLVED:  When an image variable is qualified with "restrict", the only
2262      guarantee is that no other image variable in the same shader invocation
2263      references the same underlying image data.  There is no guarantee that
2264      the same image couldn't be referenced by another invocation of the same
2265      shader, or by an invocation of a different shader.
2266
2267      The main function of "restrict" is to allow compilers to generate more
2268      efficient code for a single shader invocation than it could if it had to
2269      conservatively assume that accesses to other images could touch the same
2270      image data.
2271
2272    (29) What is the purpose of the memoryBarrier() built-in function?
2273
2274      RESOLVED:  The memoryBarrier() function can be used to ensure that,
2275      when another shader invocation or some other observer sees image
2276      memory being written by a shader, the accesses appear in a predictable
2277      order.  For example, consider the following code:
2278
2279        uniform imageBuffer buf1;
2280        uniform imageBuffer buf2;
2281        int offset1, offset2;
2282        vec4 data1, data2;
2283        imageStore(buf1, offset1, data1);
2284        imageStore(buf2, offset2, data2);
2285
2286      This specification doesn't require that writes be committed to memory in
2287      the order specified in the shader.  It is possible that another shader
2288      invocation or some other observer would see <data2> before seeing
2289      <data1>.  If an algorithm involved multiple shader invocations with one
2290      possibly needing to wait on data written by another, observing <data2>
2291      in the second shader would not ensure that <data1> has been written.
2292      However, if memoryBarrier() were used, as in the following code, the
2293      second shader would have such a guarantee.
2294
2295        imageStore(buf1, offset1, data1);
2296        memoryBarrier();
2297        imageStore(buf2, offset2, data2);
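
      A loose CPU-side analogy, offered only as an illustration, is the
      fence-based publish pattern in C11.  In the sketch below the
      hypothetical "mailbox" struct stands in for the two buffers:  the
      payload plays the role of <data1> and the flag plays the role of
      <data2>.  C11 fences are not the definition of memoryBarrier(); the
      sketch only shows why ordering the two stores lets an observer of
      the second store rely on the first.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical mailbox: a payload plus a "ready" flag. */
typedef struct {
    int        payload;   /* plays the role of <data1> */
    atomic_int ready;     /* plays the role of <data2> */
} mailbox;

/* Writer: the release fence orders the payload store before the flag
   store, much as memoryBarrier() separates the two imageStore() calls. */
void publish(mailbox *m, int value)
{
    assert(m != 0);
    m->payload = value;
    atomic_thread_fence(memory_order_release);
    atomic_store_explicit(&m->ready, 1, memory_order_relaxed);
}

/* Reader: returns 1 and fills *out once the flag is visible, else 0. */
int try_consume(mailbox *m, int *out)
{
    assert(m != 0 && out != 0);
    if (atomic_load_explicit(&m->ready, memory_order_relaxed) == 0)
        return 0;
    atomic_thread_fence(memory_order_acquire);
    *out = m->payload;
    return 1;
}
```

      Without the fence in publish(), a reader on another thread could
      see the flag set and still read a stale payload, which mirrors the
      <data2>-before-<data1> hazard described above.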
2298
2299    (30) What happens if the texel identified by the coordinates given to an
2300         image load, store, or atomic built-in doesn't exist?  (i.e.,
2301         coordinates are out of bounds)
2302
2303      RESOLVED:  Image loads return zero.  Stores do not update image
2304      memory.  Atomics do not update image memory and return zero.  These
2305      same considerations apply if no texture is bound to an image unit, if
2306      the texture is incomplete, and under various other conditions.  We
2307      never apply wrap modes on image operations.
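
      These rules can be modeled on the CPU with a minimal C sketch.  The
      image1d struct and the three functions are hypothetical stand-ins
      for a one-dimensional image with 32-bit texels; a null texel pointer
      models an image unit with nothing bound, and the "atomic" here is
      not actually atomic.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 1D image; texels == 0 models "no texture bound". */
typedef struct { uint32_t *texels; int width; } image1d;

static int in_bounds(const image1d *img, int x)
{
    return img->texels != 0 && x >= 0 && x < img->width;
}

/* Invalid loads return zero. */
uint32_t image_load(const image1d *img, int x)
{
    assert(img != 0);
    return in_bounds(img, x) ? img->texels[x] : 0u;
}

/* Invalid stores are dropped; no wrap or clamp is applied. */
void image_store(image1d *img, int x, uint32_t value)
{
    assert(img != 0);
    if (in_bounds(img, x))
        img->texels[x] = value;
}

/* Invalid atomics leave memory untouched and return zero; valid ones
   return the value held before the add. */
uint32_t image_atomic_add(image1d *img, int x, uint32_t value)
{
    assert(img != 0);
    if (!in_bounds(img, x))
        return 0u;
    uint32_t old = img->texels[x];
    img->texels[x] = old + value;
    return old;
}
```

      In this model a coordinate of -1 or <width> is simply rejected; it
      is never wrapped or clamped to a valid texel.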
2308
2309    (31) Why do we have a <format> parameter on BindImageTextureEXT?
2310
2311      RESOLVED:  It allows some amount of bit-casting, to view a texture with
2312      one format using another format.  This parameter allows applications to
2313      work around several limitations of the specification:
2314
2315        * Image loads do not support all formats supported for stores.  In
2316          particular, the only formats supported for loads are 1x8, 1x16,
2317          1x32, 2x32, and 4x32.  Using the <format> parameter allows an
2318          application to view an RGBA8 texture as "R32UI" and examine the
2319          component bits itself.
2320
2321        * Image atomics are single-component 32-bit operations.  The ability
2322          to view some other formats as "size1x32" allows atomic operations to
2323          be done on some multi-component formats, such as RGBA8.
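
      The bit-casting described in both bullets can be sketched in C.
      This sketch ASSUMES that the R component occupies the least
      significant byte of the 32-bit word, followed by G, B, and A; this
      issue does not state the bit layout, so treat the packing as
      hypothetical.  The helper names are likewise invented for
      illustration, and word_add() is a non-atomic stand-in for a 32-bit
      image atomic add that returns the previous value.

```c
#include <assert.h>
#include <stdint.h>

typedef struct { uint8_t r, g, b, a; } rgba8;

/* Assumed packing: R in bits 0-7, G in 8-15, B in 16-23, A in 24-31. */
uint32_t rgba8_to_word(rgba8 t)
{
    return (uint32_t)t.r | ((uint32_t)t.g << 8) |
           ((uint32_t)t.b << 16) | ((uint32_t)t.a << 24);
}

/* Recover the four 8-bit components from the 32-bit view. */
rgba8 word_to_rgba8(uint32_t w)
{
    rgba8 t = { (uint8_t)(w & 0xFFu), (uint8_t)((w >> 8) & 0xFFu),
                (uint8_t)((w >> 16) & 0xFFu), (uint8_t)(w >> 24) };
    return t;
}

/* Non-atomic stand-in for a 32-bit atomic add on the word view. */
uint32_t word_add(uint32_t *texel, uint32_t value)
{
    assert(texel != 0);
    uint32_t old = *texel;
    *texel = old + value;
    return old;
}
```

      Under the assumed packing, adding (1 << 8) to the word increments
      the G component by one.  Because the operation acts on the whole
      32-bit word, an overflow of G would carry into B; per-component
      saturation or wrapping has to be handled by the application.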
2324
2325    (32) Do we support image atomics on multi-component texture formats?
2326
2327      RESOLVED:  Only using the formats in the "size1x32" equivalence class,
2328      and then only as 32-bit scalar integer operations.  Atomics do not
2329      operate on a component-by-component basis in this extension.
2330
2331    (33) What happens if early fragment testing is enabled, the early depth
2332         test passes, and a fragment shader that computes a new depth value is
2333         executed?
2334
2335      RESOLVED:  The depth value produced by the fragment shader has no effect
2336      if early depth and stencil tests are enabled.  The depth value computed
2337      by a fragment shader is used only by the post-fragment shader stencil
2338      and depth tests, and those tests always have no effect when early
2339      fragment tests are enabled.
2340
2341    (34) How do early fragment tests interact with occlusion queries?
2342
2343      RESOLVED:  When early fragment tests are enabled, sample counting for
2344      occlusion queries also happens prior to fragment shader execution.
2345      Enabling early fragment tests can change the overall sample count,
2346      because samples killed by alpha test and alpha to coverage will still be
2347      counted if early fragment tests are enabled.
2348
2349    (35) If we provide support for multiple active program objects (e.g., one
2350         containing a vertex shader, another containing a fragment shader, as
2351         in EXT_separate_shader_objects), how will early fragment tests be
2352         handled?
2353
2354      RESOLVED:  The early fragment test enable should be taken from the
2355      active program object corresponding to the fragment shader stage.
2356
2357    (36) When specifying a coordinate vector to specify a texel for a
2358         TEXTURE_1D_ARRAY target, what coordinate is used to specify the
2359         layer?
2360
2361      RESOLVED:  For GLSL functions, a two-component vector is specified and
2362      the second (y) component is used to select a layer.  When using the
2363      LOADIM, STOREIM, and ATOMIM NV_gpu_program5 assembly opcodes, a
2364      four-component vector is provided and the third (z) component selects
2365      the layer.
2366
2367Revision History
2368
2369    Rev.    Date    Author    Changes
2370    ----  --------  --------  -----------------------------------------
2371     7    10/16/13  pbrown    Update issue (20) to clarify that any image
2372                              stores and atomics issued before a "discard" do
2373                              have an effect.  Update issue (22) to better
2374                              define the behavior of stores and atomics on
2375                              "helper" pixels and to suggest a workaround for
2376                              shaders that need to use values returned by
2377                              atomics (undefined for helper pixels) in flow
2378                              control constructs.
2379
2380     6    12/12/10  pbrown    Fix minor errata reported by spec reviewers
2381                              (bugs 6870 and 6991).
2382
2383     5    09/17/10  pbrown    Clean up the spec language specifying the
2384                              mapping of coordinates to texels according to
2385                              the texture target.  For 1D arrays, GLSL wants
2386                              the layer in the second component of a
2387                              two-component vector while NV_gpu_program5 wants
2388                              it in the third component of a four-component
2389                              vector.  Also clarify that single-layer bindings
2390                              of an array or cube map texture use a target
2391                              appropriate to the bound layer.
2392
2393     4    03/23/10  pbrown    Add interaction with EXT_separate_shader_objects.
2394                              Update issues section to include some issues
2395                              left behind in NV_gpu_shader5 when specs were
2396                              refactored.
2397
2398     3    03/21/10  pbrown    Update spec overview, interactions, and issues
2399                              sections; miscellaneous minor clarifications.
2400
2401     2    03/16/10  pbrown    Add a separate #extension line for this
2402                              extension; needed since it became packaged
2403                              separately from ARB_gpu_shader5.  Added C99-like
2404                              "restrict" qualifier to indicate that an image
2405                              variable won't share underlying image contents
2406                              with any other variable.  Added support for
2407                              "const" qualifiers on images to indicate
2408                              read-only image data.  Added language describing
2409                              the significance of the position of image
2410                              variable qualifiers.  Clarified rules on use of
2411                              image variables as function parameters; adding
2412                              qualifiers is OK, stripping them off is not.
2413                              Updated image layout qualifier section to
2414                              clarify that "size" layout qualifiers are
2415                              required on both uniform and function parameter
2416                              declarations.  Added "const" qualifier on the
2417                              image argument in imageLoad() prototypes.
2418                              Updated extension names in dependency sections.
2419                              Add support for stores to the RGB10_A2 texture
2420                              format from OpenGL 3.3.  Add several issues.
2421
2422     1              jbolz     Internal revisions.
2423