1Name
2
3    ARB_gpu_shader5
4
5Name Strings
6
7    GL_ARB_gpu_shader5
8
9Contact
10
11    Pat Brown, NVIDIA Corporation (pbrown 'at' nvidia.com)
12
13Contributors
14
15    Barthold Lichtenbelt, NVIDIA
16    Bill Licea-Kane, AMD
17    Bruce Merry, ARM
18    Chris Dodd, NVIDIA
19    Eric Werness, NVIDIA
20    Graham Sellers, AMD
21    Greg Roth, NVIDIA
22    Jeff Bolz, NVIDIA
23    Nick Haemel, AMD
24    Pierre Boudier, AMD
25    Piers Daniell, NVIDIA
26
27Notice
28
29    Copyright (c) 2010-2013 The Khronos Group Inc. Copyright terms at
30        http://www.khronos.org/registry/speccopyright.html
31
32Specification Update Policy
33
34    Khronos-approved extension specifications are updated in response to
35    issues and bugs prioritized by the Khronos OpenGL Working Group. For
36    extensions which have been promoted to a core Specification, fixes will
37    first appear in the latest version of that core Specification, and will
38    eventually be backported to the extension document. This policy is
39    described in more detail at
40        https://www.khronos.org/registry/OpenGL/docs/update_policy.php
41
42Status
43
44    Complete. Approved by the ARB at the 2010/01/22 F2F meeting.
45    Approved by the Khronos Board of Promoters on March 10, 2010.
46
47Version
48
49    Version 16, March 30, 2012
50
51Number
52
53    ARB Extension #88
54
55Dependencies
56
57    This extension is written against the OpenGL 3.2 (Compatibility Profile)
58    Specification.
59
60    This extension is written against Version 1.50 (Revision 09) of the OpenGL
61    Shading Language Specification.
62
63    OpenGL 3.2 and GLSL 1.50 are required.
64
65    This extension interacts with ARB_gpu_shader_fp64.
66
67    This extension interacts with NV_gpu_shader5.
68
69    This extension interacts with ARB_sample_shading.
70
71    This extension interacts with ARB_texture_gather.
72
73Overview
74
75    This extension provides a set of new features to the OpenGL Shading
76    Language and related APIs to support capabilities of new GPUs, extending
77    the capabilities of version 1.50 of the OpenGL Shading Language.  Shaders
78    using the new functionality provided by this extension should enable this
79    functionality via the construct
80
81      #extension GL_ARB_gpu_shader5 : require     (or enable)
82
83    This extension provides a variety of new features for all shader types,
84    including:
85
86      * support for indexing into arrays of samplers using non-constant
87        indices, as long as the index doesn't diverge if multiple shader
88        invocations are run in lockstep;
89
90      * extending the uniform block capability of OpenGL 3.1 and 3.2 to allow
91        shaders to index into an array of uniform blocks;
92
93      * support for implicitly converting signed integer types to unsigned
94        types, as well as more general implicit conversion and function
95        overloading infrastructure to support new data types introduced by
96        other extensions;
97
98      * a "precise" qualifier allowing computations to be carried out exactly
99        as specified in the shader source to avoid optimization-induced
100        invariance issues (which might cause cracking in tessellation);
101
102      * new built-in functions supporting:
103
104        * fused floating-point multiply-add operations;
105
106        * splitting a floating-point number into a significand and exponent
107          (frexp), or building a floating-point number from a significand and
108          exponent (ldexp);
109
110        * integer bitfield manipulation, including functions to find the
111          position of the most or least significant set bit, count the number
112          of one bits, and bitfield insertion, extraction, and reversal;
113
114        * packing and unpacking vectors of small fixed-point data types into a
115          larger scalar; and
116
117        * converting floating-point values to or from their integer bit
118          encodings;
119
120      * extending the textureGather() built-in functions provided by
121        ARB_texture_gather:
122
123        * allowing shaders to select any single component of a multi-component
124          texture to produce the gathered 2x2 footprint;
125
126        * allowing shaders to perform a per-sample depth comparison when
127          gathering the 2x2 footprint for shadow sampler types;
128
129        * allowing shaders to use arbitrary offsets computed at run-time to
130          select a 2x2 footprint to gather from; and
131
132        * allowing shaders to use separate independent offsets for each of the
133          four texels returned, instead of requiring a fixed 2x2 footprint.
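
    The integer bit operations and bit-encoding conversions in the list of
    new built-in functions above can be modeled on the CPU.  The C helpers
    below are a minimal sketch for illustration only; their names are
    hypothetical and are not part of this extension, and returning -1 when
    no bit is set is a convention chosen for this sketch.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Reinterpret the bits of a 32-bit float as an unsigned integer. */
uint32_t float_bits_to_uint(float f)
{
    uint32_t u;
    assert(sizeof u == sizeof f);   /* sketch assumes a 32-bit float */
    memcpy(&u, &f, sizeof u);
    return u;
}

/* Reinterpret an unsigned integer bit pattern as a 32-bit float. */
float uint_bits_to_float(uint32_t u)
{
    float f;
    memcpy(&f, &u, sizeof f);
    return f;
}

/* Count the number of one bits. */
int bit_count(uint32_t v)
{
    int n = 0;
    for (; v != 0; v &= v - 1)      /* clears the lowest set bit each pass */
        n++;
    return n;
}

/* Position of the least significant set bit, or -1 if v is zero. */
int find_lsb(uint32_t v)
{
    int pos = 0;
    if (v == 0)
        return -1;
    while ((v & 1u) == 0) {
        v >>= 1;
        pos++;
    }
    return pos;
}

/* Position of the most significant set bit, or -1 if v is zero. */
int find_msb(uint32_t v)
{
    int pos = -1;
    while (v != 0) {
        v >>= 1;
        pos++;
    }
    return pos;
}
```

    For example, the value 0x28 (binary 101000) has two one bits, with the
    least significant at position 3 and the most significant at position 5.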
134
135    This extension also provides some new capabilities for individual
136    shader types, including:
137
138      * support for instanced geometry shaders, where a geometry shader may be
139        run multiple times for each primitive, including a built-in
140        gl_InvocationID to identify the invocation number;
141
142      * support for emitting vertices in a geometry shader where each vertex
143        emitted may be directed independently at a specified vertex stream (as
144        provided by ARB_transform_feedback3), and where each shader output is
145        associated with a stream;
146
147      * support for reading a mask of covered samples in a fragment shader;
148        and
149
150      * support for interpolating a fragment shader input at a programmable
151        offset relative to the pixel center, a programmable sample number, or
152        at the centroid.
153
154IP Status
155
156    No known IP claims.
157
158New Procedures and Functions
159
160    None
161
162New Tokens
163
164    Accepted by the <pname> parameter of GetProgramiv:
165
166        GEOMETRY_SHADER_INVOCATIONS                     0x887F
167
168    Accepted by the <pname> parameter of GetBooleanv, GetIntegerv, GetFloatv,
169    GetDoublev, and GetInteger64v:
170
171        MAX_GEOMETRY_SHADER_INVOCATIONS                 0x8E5A
172        MIN_FRAGMENT_INTERPOLATION_OFFSET               0x8E5B
173        MAX_FRAGMENT_INTERPOLATION_OFFSET               0x8E5C
174        FRAGMENT_INTERPOLATION_OFFSET_BITS              0x8E5D
175        MAX_VERTEX_STREAMS                              0x8E71
176
177    (note:  MAX_GEOMETRY_SHADER_INVOCATIONS,
178     MIN_FRAGMENT_INTERPOLATION_OFFSET, MAX_FRAGMENT_INTERPOLATION_OFFSET, and
179     FRAGMENT_INTERPOLATION_OFFSET_BITS have identical values to corresponding
180     "NV" enums from NV_gpu_program5.  MAX_VERTEX_STREAMS is also defined in
181     ARB_transform_feedback3.)
182
183
184Additions to Chapter 2 of the OpenGL 3.2 (Compatibility Profile) Specification
185(OpenGL Operation)
186
187    Modify Section 2.15.4, Geometry Shader Execution Environment, p. 121
188
189    (add two unnumbered subsections after "Texture Access", p. 122)
190
191    Instanced Geometry Shaders
192
193    For each input primitive received by the geometry shader pipeline stage,
194    the geometry shader may be run once or multiple times.  The number of
195    times a geometry shader should be executed for each input primitive may be
196    specified using a layout qualifier in a geometry shader of a linked
197    program.  If the invocation count is not specified in any layout
198    qualifier, the invocation count will be one.
199
200    Each separate geometry shader invocation is assigned a unique invocation
201    number.  For a geometry shader with <N> invocations, each input primitive
202    spawns <N> invocations, numbered 0 through <N>-1.  The built-in input
203    gl_InvocationID may be used by a geometry shader invocation to determine
204    its invocation number.
205
206    When executing instanced geometry shaders, the output primitives generated
207    from each input primitive are passed to subsequent pipeline stages using
208    the shader invocation number to order the output.  The first primitives
209    received by the subsequent pipeline stages are those emitted by the shader
210    invocation numbered zero, followed by those from the shader invocation
211    numbered one, and so forth.  Additionally, all output primitives generated
212    from a given input primitive are passed to subsequent pipeline stages
213    before any output primitives generated from subsequent input primitives.
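
    The ordering rule above can be sketched as a comparison function in C.
    This is an illustrative model, not driver code:  the structure and
    function names are hypothetical, and ordering by emission index within
    a single invocation is an assumption of the sketch rather than
    something stated in this subsection.

```c
#include <assert.h>

/* One output primitive, tagged with where it came from. */
struct gs_output {
    int input_primitive;  /* index of the input primitive that produced it */
    int invocation;       /* invocation number (gl_InvocationID) */
    int emit_index;       /* emission order within that invocation (assumed) */
};

/*
 * Returns a negative value if <a> reaches later pipeline stages before <b>,
 * zero if they occupy the same position, and a positive value otherwise.
 */
int gs_output_compare(const struct gs_output *a, const struct gs_output *b)
{
    assert(a && b);
    /* All outputs of one input primitive precede those of the next. */
    if (a->input_primitive != b->input_primitive)
        return a->input_primitive - b->input_primitive;
    /* Within one input primitive, invocation 0 comes first, then 1, ... */
    if (a->invocation != b->invocation)
        return a->invocation - b->invocation;
    return a->emit_index - b->emit_index;
}
```

    With this model, a primitive from invocation 2 of input primitive 0
    still precedes every primitive generated from input primitive 1.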
214
215
216    Geometry Shader Vertex Streams
217
218    Geometry shaders may emit primitives to multiple independent vertex
219    streams.  Each vertex emitted by the geometry shader is directed at one of
220    the vertex streams.  As vertices are received on each stream, they are
221    arranged into primitives of the type specified by the geometry shader
222    output primitive type.  The shading language built-in functions
223    EndPrimitive() and EndStreamPrimitive() may be used to end the primitive
224    being assembled on a given vertex stream and start a new empty primitive
225    of the same type.  If an implementation supports <N> vertex streams, the
226    individual streams are numbered 0 through <N>-1.  There is no requirement
227    on the order of the streams to which vertices are emitted, and the number
228    of vertices emitted to each stream may be completely independent, subject
229    only to implementation-dependent output limits.
230
231    The primitives emitted to all vertex streams are passed to the transform
232    feedback stage to be captured and written to buffer objects in the manner
233    specified by the transform feedback state.  The primitives emitted to all
234    streams but stream zero are discarded after transform feedback.
235    Primitives emitted to stream zero are passed to subsequent pipeline stages
236    for clipping, rasterization, and subsequent fragment processing.
237
238    Geometry shaders that emit vertices to multiple vertex streams are
239    currently limited to using only the "points" output primitive type.  A
240    program will fail to link if it includes a geometry shader that calls the
241    EmitStreamVertex() built-in function and has any other output primitive
242    type parameter.
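
    The routing and link rules for vertex streams can be summarized by the
    following C sketch.  The enum and function names are hypothetical and
    exist only to restate the rules above in executable form.

```c
#include <assert.h>
#include <stdbool.h>

enum gs_output_type { GS_POINTS, GS_LINE_STRIP, GS_TRIANGLE_STRIP };

/* Link rule: a shader calling EmitStreamVertex() must output "points". */
bool gs_streams_link_ok(enum gs_output_type type, bool calls_emit_stream_vertex)
{
    return !calls_emit_stream_vertex || type == GS_POINTS;
}

/* Primitives on every supported stream are passed to transform feedback. */
bool stream_reaches_transform_feedback(int stream, int num_streams)
{
    assert(num_streams > 0);
    return stream >= 0 && stream < num_streams;
}

/* Only stream zero continues to clipping and rasterization. */
bool stream_reaches_rasterizer(int stream)
{
    return stream == 0;
}
```

    For an implementation with four streams, stream 3 is captured by
    transform feedback but never rasterized.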
243
244
245Additions to Chapter 3 of the OpenGL 3.2 (Compatibility Profile) Specification
246(Rasterization)
247
248    Modify Section 3.3.1, Multisampling, p. 148
249
250    (add new paragraph at the end of the section, p. 149)
251
252    If MULTISAMPLE is enabled and the current program object includes a
253    fragment shader with one or more input variables qualified with "sample
254    in", the data associated with those variables will be assigned
255    independently.  The values for each sample must be evaluated at the
256    location of the sample.  The data associated with any other variables not
257    qualified with "sample in" need not be evaluated independently for each
258    sample.
259
260
261    Modify ARB_texture_gather, "Changes to Section 3.8.8"
262
263    (extend language describing the operation of textureGather, allowing the
264     new <comp> argument to select any of the four components from a
265     multi-component texel vector)
266
267    The textureGather and textureGatherOffset built-in shader functions...  A
268    four-component vector is then assembled by taking a single component from
269    the swizzled texture source colors of the four texels, in the order
270    T_i0_j1, T_i1_j1, T_i1_j0, and T_i0_j0.  The selected component is
271    identified by the optional <comp> argument, where the values zero, one,
272    two, and three identify the Rs, Gs, Bs, or As component, respectively.  If
273    <comp> is omitted, it is treated as identifying the Rs component.
274    Incomplete textures (section 3.8.10) are considered to return a texture
275    source color of (0,0,0,1) for all four source texels.
276
277    (add further language describing textureGatherOffsets)
278
279    The textureGatherOffsets built-in functions from the OpenGL Shading
280    Language return a vector derived from sampling four texels in the image
281    array of level <level_base>.  For each of the four texel offsets specified
282    by the <offsets> argument, the rules for the LINEAR minification filter
283    are applied to identify a 2x2 texel footprint, from which the single texel
284    T_i0_j0 is selected.  A four-component vector is then assembled by taking
285    a single component from each of the four T_i0_j0 texels in the same manner
286    as for the textureGather function.
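
    The component selection and texel ordering described above can be
    modeled in C as follows.  This is a simplified sketch:  it assumes the
    2x2 footprint indices have already been determined, that texels hold
    already-swizzled source colors, and that the texture is stored
    row-major; the type and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* A texel holding the swizzled source color (Rs, Gs, Bs, As). */
struct texel { float c[4]; };

/*
 * Gather component <comp> from the 2x2 footprint (i0..i1, j0..j1) of a
 * row-major texture that is <width> texels wide.
 */
void gather_component(const struct texel *tex, size_t width,
                      size_t i0, size_t j0, size_t i1, size_t j1,
                      int comp, float out[4])
{
    assert(tex && out);
    assert(comp >= 0 && comp <= 3);
    out[0] = tex[j1 * width + i0].c[comp];  /* T_i0_j1 */
    out[1] = tex[j1 * width + i1].c[comp];  /* T_i1_j1 */
    out[2] = tex[j0 * width + i1].c[comp];  /* T_i1_j0 */
    out[3] = tex[j0 * width + i0].c[comp];  /* T_i0_j0 */
}
```

    A model of textureGatherOffsets would instead locate four separate
    footprints, one per offset, and take the T_i0_j0 texel from each.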
287
288
289    Modify Section 3.12.1, Shader Variables, p. 273
290
291    (insert prior to the last paragraph of the section, p. 274)
292
293    When interpolating built-in and user-defined varying variables, the default
294    screen-space location at which these variables are sampled is defined in
295    previous rasterization sections.  The default location may be overridden by
296    interpolation qualifiers.  When interpolating variables declared using
297    "centroid in", the variable is sampled at a location within the pixel
298    covered by the primitive generating the fragment.  When interpolating
299    variables declared using "sample in" when MULTISAMPLE is enabled, the
300    fragment shader will be invoked separately for each covered sample and the
301    variable will be sampled at the corresponding sample point.
302
303    Additionally, built-in fragment shader functions provide further
304    fine-grained control over interpolation.  The built-in functions
305    interpolateAtCentroid() and interpolateAtSample() will sample variables as
306    though they were declared with the "centroid" or "sample" qualifiers,
307    respectively.  The built-in function interpolateAtOffset() will sample
308    variables at a specified (x,y) offset relative to the center of the pixel.
309    The range and granularity of offsets supported by this function are
310    implementation-dependent.  If either component of the specified offset is
311    less than MIN_FRAGMENT_INTERPOLATION_OFFSET or greater than
312    MAX_FRAGMENT_INTERPOLATION_OFFSET, the position used to interpolate the
313    variable is undefined.  Not all values of <offset> may be supported; x and
314    y offsets may be rounded to fixed-point values with the number of fraction
315    bits given by the implementation-dependent constant
316    FRAGMENT_INTERPOLATION_OFFSET_BITS.
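
    The range check and fixed-point rounding of interpolation offsets can
    be sketched in C.  The function names are hypothetical, and
    round-to-nearest is an assumption of this sketch; the text above only
    states that offsets may be rounded to fixed-point values.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * True if both offset components lie within the supported range, i.e.
 * between MIN_FRAGMENT_INTERPOLATION_OFFSET and
 * MAX_FRAGMENT_INTERPOLATION_OFFSET.  Outside this range the
 * interpolation position is undefined.
 */
bool offset_in_range(double x, double y, double min_offset, double max_offset)
{
    assert(min_offset <= max_offset);
    return x >= min_offset && x <= max_offset &&
           y >= min_offset && y <= max_offset;
}

/*
 * Round one offset component to a multiple of 2^-fraction_bits, where
 * fraction_bits plays the role of FRAGMENT_INTERPOLATION_OFFSET_BITS.
 * Round-to-nearest is assumed here.
 */
double quantize_offset(double v, int fraction_bits)
{
    double scale;
    long steps;
    assert(fraction_bits >= 0 && fraction_bits < 31);
    scale = (double)(1L << fraction_bits);
    steps = (long)(v * scale + (v >= 0.0 ? 0.5 : -0.5));
    return (double)steps / scale;
}
```

    With four fraction bits, for instance, an offset of 0.3 would be
    rounded to 5/16 = 0.3125 under this model.  The actual limits are
    implementation-dependent and would be obtained by querying the tokens
    listed under "New Tokens".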
317
318
319    Modify Section 3.12.2, Shader Execution, p. 274
320
321    (insert prior to the next-to-last paragraph in "Shader Inputs", p. 277)
322
323    The built-in variable gl_SampleMaskIn[] is an integer array holding
324    bitfields indicating the set of fragment samples covered by the primitive
325    corresponding to the fragment shader invocation.  The number of elements
326    in the array is ceil(<s>/32), where <s> is the maximum number of color
327    samples supported by the implementation.  Bit <n> of element <w> in the
328    array is set if and only if the sample numbered <w>*32+<n> is considered
329    covered for this fragment shader invocation.  When rendering to a
330    non-multisample buffer, or if multisample rasterization is disabled, all
331    bits are zero except for bit zero of the first array element.  That bit
332    will be one if the pixel is covered and zero otherwise.  Bits in the
333    sample mask corresponding to covered samples that will be killed due to
334    SAMPLE_COVERAGE or SAMPLE_MASK_NV will not be set (section 4.1.3).  When
335    per-sample shading is active due to the use of a fragment input qualified
336    by "sample", only the bit for the current sample is set in
337    gl_SampleMaskIn.  When OpenGL API state specifies multiple fragment shader
338    invocations for a given fragment, the sample mask for any single fragment
339    shader invocation may specify a subset of the covered samples for the
340    fragment.  In this case, the bit corresponding to each covered sample will
341    be set in exactly one fragment shader invocation.
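
    The array sizing and bit addressing of gl_SampleMaskIn[] can be
    illustrated with a small C model.  The function names are
    hypothetical; the mask array stands in for the values a fragment
    shader invocation would read.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Number of array elements: ceil(s / 32) for s supported color samples. */
int sample_mask_words(int max_samples)
{
    assert(max_samples > 0);
    return (max_samples + 31) / 32;
}

/* Sample w*32+n is covered iff bit n of element w is set. */
bool sample_is_covered(const int32_t *mask, int sample)
{
    uint32_t word;
    assert(mask && sample >= 0);
    word = (uint32_t)mask[sample / 32];
    return ((word >> (sample % 32)) & 1u) != 0;
}
```

    For example, with the two-element mask { 0x5, 0x1 }, samples 0, 2,
    and 32 are covered and all other samples are not.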
342
343
344Additions to Chapter 4 of the OpenGL 3.2 (Compatibility Profile) Specification
345(Per-Fragment Operations and the Frame Buffer)
346
347    None.
348
349Additions to Chapter 5 of the OpenGL 3.2 (Compatibility Profile) Specification
350(Special Functions)
351
352    None.
353
354Additions to Chapter 6 of the OpenGL 3.2 (Compatibility Profile) Specification
355(State and State Requests)
356
357    Modify Section 6.1.16, Shader and Program Queries, p. 384
358
359    (add to long first paragraph, p. 386) ... If <pname> is
360    GEOMETRY_SHADER_INVOCATIONS, the number of geometry shader invocations per
361    primitive will be returned.  If GEOMETRY_VERTICES_OUT,
362    GEOMETRY_INPUT_TYPE, GEOMETRY_OUTPUT_TYPE, or GEOMETRY_SHADER_INVOCATIONS
363    are queried for a program which has not been linked successfully, or which
364    does not contain objects to form a geometry shader, then an
365    INVALID_OPERATION error is generated.
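
    The error behavior of the new query can be modeled without a GL
    context.  In the C sketch below, the program_model structure and the
    query_program function are hypothetical stand-ins for a program object
    and GetProgramiv; the token value is taken from "New Tokens", and the
    error codes are the standard OpenGL values.

```c
#include <assert.h>
#include <stdbool.h>

#define GEOMETRY_SHADER_INVOCATIONS 0x887F  /* from "New Tokens" */
#define NO_ERROR                    0x0000
#define INVALID_ENUM                0x0500
#define INVALID_OPERATION           0x0502

/* Hypothetical stand-in for the relevant program object state. */
struct program_model {
    bool linked;               /* last link succeeded */
    bool has_geometry_shader;  /* program contains a geometry shader */
    int  gs_invocations;       /* invocation count of the linked program */
};

/* Returns the error that would be generated; writes <params> on success. */
unsigned query_program(const struct program_model *p, unsigned pname,
                       int *params)
{
    assert(p && params);
    if (pname != GEOMETRY_SHADER_INVOCATIONS)
        return INVALID_ENUM;       /* other queries are outside this sketch */
    if (!p->linked || !p->has_geometry_shader)
        return INVALID_OPERATION;
    *params = p->gs_invocations;
    return NO_ERROR;
}
```

    Note that <params> is left untouched when an error is returned, which
    mirrors the usual OpenGL convention for failed queries.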
366
367
368Additions to Appendix A of the OpenGL 3.2 (Compatibility Profile)
369Specification (Invariance)
370
371    None.
372
373Additions to the AGL/GLX/WGL Specifications
374
375    None.
376
377Modifications to The OpenGL Shading Language Specification, Version 1.50
378(Revision 09)
379
380    Including the following line in a shader can be used to control the
381    language features described in this extension:
382
383      #extension GL_ARB_gpu_shader5 : <behavior>
384
385    where <behavior> is as specified in section 3.3.
386
387    New preprocessor #defines are added to the OpenGL Shading Language:
388
389      #define GL_ARB_gpu_shader5        1
390
391
392    Modify Section 3.6, Keywords, p. 14
393
394    (add to the keyword list)
395
396      sample
397
398
399    Modify Section 4.1.7, Samplers, p. 23
400
401    (modify 1st paragraph of the section, deleting the restriction requiring
402    constant indexing of sampler arrays but still requiring uniform indexing
403    across invocations) ... Samplers may be aggregated into arrays within a
404    shader (using square brackets [ ]) and can be indexed with general integer
405    expressions.  The results of accessing a sampler array with an
406    out-of-bounds index are undefined. ...
407
408    (add new paragraph restricting the use of general integer expression in
409    sampler array indexing) When indexing an array of samplers, the integer
410    expression used to index the array must be uniform across shader
411    invocations.  If this restriction is not satisfied, the results of
412    accessing the sampler array are undefined.  For the purposes of this
413    uniformity test, the index used for texture lookups performed inside a
414    loop is considered uniform for the <n>th loop iteration if all shader
415    invocations that execute the loop at least <n> times compute the same
416    index on that iteration.  For texture lookups inside a function other than
417    main(), an index is considered uniform if the value is the same for all
418    invocations calling the function from the same point in the caller.  For
419    nested loops and function calls, the uniformity test requires that the
420    index match only those other shader invocations with identical loop
421    iteration counts and function call chains.
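
    The uniformity test can be expressed as a simple check over the
    indices computed by a group of shader invocations.  The C function
    below is a sketch with a hypothetical name; it assumes the caller has
    already collected the indices from exactly those invocations that
    reach the same texture lookup with identical loop iteration counts and
    function call chains.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * True if every invocation in the group computed the same sampler array
 * index.  An empty or single-element group is trivially uniform.
 */
bool sampler_index_is_uniform(const int *indices, size_t count)
{
    size_t k;
    assert(indices || count == 0);
    for (k = 1; k < count; k++) {
        if (indices[k] != indices[0])
            return false;   /* divergent index: results are undefined */
    }
    return true;
}
```

    If the check fails for any such group, the results of accessing the
    sampler array are undefined.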
422
423
424    Modify Section 4.1.10, Implicit Conversions, p. 27
425
426    (modify table of implicit conversions)
427
428                                Can be implicitly
429        Type of expression        converted to
430        ---------------------   -----------------
431        int                     uint, float
432        ivec2                   uvec2, vec2
433        ivec3                   uvec3, vec3
434        ivec4                   uvec4, vec4
435
436        uint                    float
437        uvec2                   vec2
438        uvec3                   vec3
439        uvec4                   vec4
440
441    (modify second paragraph of the section) No implicit conversions are
442    provided to convert from unsigned to signed integer types or from
443    floating-point to integer types.  There are no implicit array or structure
444    conversions.
445
446    (insert before the final paragraph of the section) When performing
447    implicit conversion for binary operators, there may be multiple data types
448    to which the two operands can be converted.  For example, when adding an
449    int value to a uint value, both values can be implicitly converted to uint
450    or to float.  In such cases, a floating-point type is chosen if either
451    operand has a floating-point type.  Otherwise, an unsigned integer type is
452    chosen if either operand has an unsigned integer type.  Otherwise, a
453    signed integer type is chosen.
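
    The conversion table and the selection rule for binary operators can
    be restated as a small C model over scalar base types.  The enum and
    function names are hypothetical; vector types follow the same rules
    component-wise and are omitted from this sketch.

```c
#include <assert.h>
#include <stdbool.h>

enum base_type { T_INT, T_UINT, T_FLOAT };

/* True if <from> can be used where <to> is expected (identity included). */
bool can_convert(enum base_type from, enum base_type to)
{
    if (from == to)
        return true;                           /* no conversion needed */
    if (from == T_INT)
        return to == T_UINT || to == T_FLOAT;  /* int -> uint, float */
    if (from == T_UINT)
        return to == T_FLOAT;                  /* uint -> float */
    return false;                              /* float converts to nothing */
}

/* Type to which both operands of a binary operator are converted. */
enum base_type binary_operand_type(enum base_type a, enum base_type b)
{
    enum base_type result;
    if (a == T_FLOAT || b == T_FLOAT)
        result = T_FLOAT;
    else if (a == T_UINT || b == T_UINT)
        result = T_UINT;
    else
        result = T_INT;
    /* The chosen type is always reachable from both operand types. */
    assert(can_convert(a, result) && can_convert(b, result));
    return result;
}
```

    Under this model, adding an int to a uint yields a uint operation,
    while no implicit path exists from uint back to int.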
454
455
456    Modify Section 4.3, Storage Qualifiers, p. 29
457
458    (add to first table on the page)
459
460      Qualifier         Meaning
461      --------------    ----------------------------------------
462      sample in         linkage with per-sample interpolation
463      sample out        linkage with per-sample interpolation
464
465    (modify third paragraph, p. 29) These interpolation qualifiers may only
466    precede the qualifiers in, centroid in, sample in, out, centroid out, or
467    sample out in a declaration.  ...
468
469
470    Modify Section 4.3.4, Inputs, p. 31
471
472    (modify first paragraph of section) Shader input variables are declared
473    with the in, centroid in, or sample in storage qualifiers. ... Variables
474    declared as in, centroid in, or sample in may not be written to during
475    shader execution. ...
476
477    (modify third paragraph, p. 32) ...  Fragment shader inputs get
478    per-fragment values, typically interpolated from a previous stage's
479    outputs.  They are declared in fragment shaders with the in, centroid in,
480    or sample in storage qualifiers or the deprecated varying and centroid
481    varying storage qualifiers. ...
482
483    (add to examples immediately below)
484
485      sample in vec4 perSampleColor;
486
487
488    Modify Section 4.3.6, Outputs, p. 33
489
490    (modify first paragraph of section) Shader output variables are declared
491    with the out, centroid out, or sample out storage qualifiers. ...
492
493    (modify third paragraph of section) Vertex and geometry output variables
494    output per-vertex data and are declared using the out, centroid out, or
495    sample out storage qualifiers, or the deprecated varying storage
496    qualifier.
497
498    (add to examples immediately below)
499
500      sample out vec4 perSampleColor;
501
502    (modify last paragraph, p. 33) Fragment outputs output per-fragment data
503    and are declared using the out storage qualifier. It is an error to use
504    centroid out or sample out in a fragment shader. ...
505
506
507    Modify Section 4.3.7, Interface Blocks, p. 34
508
509    (modify last paragraph, p. 36, removing the requirement for indexing
510    uniform blocks using constant expressions) For uniform blocks declared as
511    arrays, each individual array element corresponds to a separate buffer
512    object backing one instance of the block.  As the array size indicates the
513    number of buffer objects needed, uniform block array declarations must
514    specify an integral array size.  Arbitrary indices may be used to index a
515    uniform block array; integral constant expressions are not required.  If
516    the index used to access an array of uniform blocks is out-of-bounds, the
517    results of the access are undefined.
518
519
520    Modify Section 4.3.8.1, Input Layout Qualifiers, p. 37
521
522    (modify last paragraph, p. 37, and subsequent paragraphs on p. 38)
523
524    Geometry shaders support input layout qualifiers.  There are two types of
525    layout qualifiers used to specify an input primitive type and an
526    invocation count.  The input primitive type and invocation count
527    qualifiers are allowed only on the interface qualifier in, not on an input
528    block, block member, or variable.
529
530      layout-qualifier-id
531        points
532        lines
533        lines_adjacency
534        triangles
535        triangles_adjacency
536        invocations = integer-constant
537
538    The identifiers "points", "lines", "lines_adjacency", "triangles", and
539    "triangles_adjacency" are used to specify the type of input primitive
540    accepted by the geometry shader, and only one of these is accepted.  At
541    least one geometry shader (compilation unit) in a program must declare an
542    input primitive type, and all geometry shader input primitive type
543    declarations in a program must declare the same type.  It is not required
544    that all geometry shaders in a program declare an input primitive type.
545
546    The identifier "invocations" is used to specify the number of times the
547    geometry shader is invoked for each input primitive received.  Invocation
548    count declarations are optional.  If no invocation count is declared in
549    any geometry shader in the program, the geometry shader will be run once
550    for each input primitive.  If an invocation count is declared, all such
551    declarations must specify the same count.  If a shader specifies an
552    invocation count greater than the implementation-dependent maximum, it
553    will fail to compile.
554
555    For example,
556
557      layout(triangles, invocations=6) in;
558
559    will establish that all inputs to the geometry shader are triangles and
560    that the geometry shader is run six times for each triangle processed.
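
    The rules for invocation count declarations can be checked with a
    routine along the following lines.  This C sketch uses hypothetical
    names and folds both failure cases (mismatched counts and a count
    above the implementation-dependent maximum) into a single failure
    result.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * <decls> holds the count from each "invocations = N" declaration found
 * in the geometry shaders of a program.  On success, *count receives the
 * effective invocation count; on failure, *count is not meaningful.
 */
bool resolve_invocations(const int *decls, size_t n, int max_invocations,
                         int *count)
{
    size_t k;
    assert(count && (decls || n == 0));
    *count = 1;                          /* default when nothing is declared */
    for (k = 0; k < n; k++) {
        if (decls[k] > max_invocations)
            return false;                /* exceeds the implementation limit */
        if (decls[k] != decls[0])
            return false;                /* declarations must agree */
        *count = decls[k];
    }
    return true;
}
```

    With no declarations the count resolves to one, matching the default
    described above.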
561
562    All geometry shader input unsized array declarations ...
563
564
565    Modify Section 4.3.8.2, Output Layout Qualifiers, p. 40
566
567    (modify second and subsequent paragraphs, p. 40)
568
569    Geometry shaders can have output layout qualifiers.  There are three types
570    of output layout qualifiers used to specify an output primitive type, a
571    maximum output vertex count, and per-output stream numbers.  The output
572    primitive type and output vertex count qualifiers are allowed only on the
573    interface qualifier out, not on an output block, block member, or variable
574    declaration.  The output stream number qualifier is allowed on the
575    interface qualifier out, or on output blocks or variable declarations.
576
577    The layout qualifier identifiers for geometry shader outputs are
578
579      layout-qualifier-id
580        points
581        line_strip
582        triangle_strip
583        max_vertices = integer-constant
584        stream = integer-constant
585
586    The identifiers "points", "line_strip", and "triangle_strip" are used to
587    specify the type of output primitive produced by the geometry shader, and
588    only one of these is accepted.  At least one geometry shader (compilation
589    unit) in a program must declare an output primitive type, and all geometry
590    shader output primitive type declarations in a program must declare the
591    same primitive type.  It is not required that all geometry shaders in a
592    program declare an output primitive type.
593
594    The identifier "max_vertices" is used to specify the maximum number of
595    vertices the shader will ever emit in a single invocation.  At least one
596    geometry shader (compilation unit) in a program must declare a maximum
597    output vertex count, and all geometry shader output vertex count
598    declarations in a program must declare the same count.  It is not required
599    that all geometry shaders in a program declare a count.
600
601    In the example,
602
603      layout(triangle_strip, max_vertices = 60) out; // order does not matter
604      layout(max_vertices = 60) out; // redeclaration okay
605      layout(triangle_strip) out; // redeclaration okay
606      layout(points) out; // error, contradicts triangle_strip
607      layout(max_vertices = 30) out; // error, contradicts 60
608
609    all outputs from the geometry shader are triangles and at most 60 vertices
610    will be emitted by the shader.  It is an error for the maximum number of
611    vertices to be greater than gl_MaxGeometryOutputVertices.
612
613    The identifier "stream" is used to specify that a geometry shader output
614    variable or block is associated with a particular vertex stream (numbered
615    beginning with zero).  A default stream number may be declared at global
616    scope by qualifying interface qualifier out as in this example:
617
618      layout(stream = 1) out;
619
620    The stream number specified in such a declaration replaces any previous
621    default and applies to all subsequent block and variable declarations
622    until a new default is established.  The initial default stream number is
623    zero.
624
625    Each output block or non-block output variable is associated with a vertex
626    stream.  If the block or variable is declared with a stream qualifier, it
627    is associated with the specified stream; otherwise, it is associated with
628    the current default stream.  A block member may be declared with a stream
629    qualifier, but the specified stream must match the stream associated with
630    the containing block.  One example:
631
632      layout(stream=1) out;             // default is now stream 1
633      out vec4 var1;                    // var1 gets default stream (1)
634      layout(stream=2) out Block1 {     // "Block1" belongs to stream 2
635        layout(stream=2) vec4 var2;     // redundant block member stream decl
636        layout(stream=3) vec2 var3;     // ILLEGAL (must match block stream)
637        vec3 var4;                      // belongs to stream 2
638      };
639      layout(stream=0) out;             // default is now stream 0
640      out vec4 var5;                    // var5 gets default stream (0)
641      out Block2 {                      // "Block2" gets default stream (0)
642        vec4 var6;
643      };
644      layout(stream=3) out vec4 var7;   // var7 belongs to stream 3
645
646    If a geometry shader output block or variable is declared more than once,
647    all such declarations must associate the variable with the same vertex
648    stream.  If any stream declaration specifies a non-existent stream number,
649    the shader will fail to compile.
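
    The stream association rules can be traced with a small C model of
    the declaration state.  All names below are hypothetical, and the
    maximum stream count is supplied by the caller as a stand-in for the
    implementation-dependent limit.

```c
#include <assert.h>
#include <stdbool.h>

#define NO_STREAM (-1)  /* declaration carries no stream qualifier */

struct stream_state {
    int default_stream;  /* current default, initially zero */
    int max_streams;     /* number of streams the implementation supports */
};

/* Models "layout(stream = s) out;" at global scope. */
bool set_default_stream(struct stream_state *st, int s)
{
    assert(st);
    if (s < 0 || s >= st->max_streams)
        return false;            /* non-existent stream: compile failure */
    st->default_stream = s;
    return true;
}

/* Stream for a block or non-block variable, or -1 if it does not exist. */
int resolve_stream(const struct stream_state *st, int qualifier)
{
    int s;
    assert(st);
    s = (qualifier == NO_STREAM) ? st->default_stream : qualifier;
    return (s >= 0 && s < st->max_streams) ? s : -1;
}

/* A block member may repeat the block's stream but may not change it. */
bool member_stream_ok(int block_stream, int member_qualifier)
{
    return member_qualifier == NO_STREAM || member_qualifier == block_stream;
}
```

    Replaying the example above with four streams, var1 resolves to
    stream 1, Block1 to stream 2, the stream 3 qualifier on var3 is
    rejected, and var5 resolves to stream 0 once the default is reset.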
650
651    Built-in geometry shader outputs are always associated with vertex stream
652    zero.
653
654    Each vertex emitted by the geometry shader is assigned to a specific
655    stream, and the attributes of the emitted vertex are taken from the set of
656    output blocks and variables assigned to the targeted stream.  After each
657    vertex is emitted, the values of all output variables become undefined.
658    Additionally, the output variables associated with each vertex stream may
659    share storage.  Writing to an output variable associated with one stream
660    may overwrite output variables associated with any other stream.  When
661    emitting each vertex, a geometry shader should write to all outputs
662    associated with the stream to which the vertex will be emitted and to no
663    outputs associated with any other stream.
664
665
666    Modify Section 4.3.9, Interpolation, p. 42
667
668    (modify first paragraph of section, add reference to sample in/out) The
669    presence of and type of interpolation is controlled by the storage
670    qualifiers centroid in, sample in, centroid out, and sample out, by the
671    optional interpolation qualifiers smooth, flat, and noperspective, and by
672    default behaviors established through the OpenGL API when no interpolation
673    qualifier is present. ...
674
675    (modify second paragraph) ... A variable may be qualified as flat centroid
676    or flat sample, which will mean the same thing as qualifying it only as
677    flat.
678
679    (replace last paragraph, p. 42)
680
681    When multisample rasterization is disabled, or for fragment shader input
682    variables qualified with neither "centroid in" nor "sample in", the value
683    of the assigned variable may be interpolated anywhere within the pixel and
684    a single value may be assigned to each sample within the pixel, to the
685    extent permitted by the OpenGL Specification.
686
687    When multisample rasterization is enabled, "centroid" and "sample" may be
688    used to control the location and frequency of the sampling of the
689    qualified fragment shader input.  If a fragment shader input is qualified
690    with "centroid", a single value may be assigned to that variable for all
691    samples in the pixel, but that value must be interpolated at a location
692    that lies in both the pixel and in the primitive being rendered, including
693    any of the pixel's samples covered by the primitive.  Because the location
694    at which the variable is sampled may be different in neighboring pixels,
695    derivatives of centroid-sampled inputs may be less accurate than those for
696    non-centroid interpolated variables.  If a fragment shader input is
697    qualified with "sample", a separate value must be assigned to that
698    variable for each covered sample in the pixel, and that value must be
699    sampled at the location of the individual sample.
700
701
702    (Insert before Section 4.7, Order of Qualification, p. 47)
703
704    Section 4.Q, The Precise Qualifier
705
706    Some algorithms may require that floating-point computations be carried
707    out in exactly the manner specified in the source code, even if the
708    implementation supports optimizations that could produce nearly equivalent
709    results with higher performance.  For example, many GL implementations
710    support a "multiply-add" that can compute values such as
711
712      float result = (float(a) * float(b)) + float(c);
713
714    in a single operation.  The result of a floating-point multiply-add may
715    not always be identical to first doing a multiply yielding a
716    floating-point result, and then doing a floating-point add.  By default,
717    implementations are permitted to perform optimizations that effectively
718    modify the order of the operations used to evaluate an expression, even if
719    those optimizations may produce slightly different results relative to
720    unoptimized code.
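
    The difference between the two evaluation strategies can be reproduced on
    a CPU.  The following C sketch is illustrative only and is not part of
    this specification.  The function names are hypothetical, and the fused
    path is approximated by evaluating in double precision, which is exact
    for the inputs discussed below but is not a true fused multiply-add in
    general.

```c
/* Multiply, round the product to float, then add (two roundings). */
float mul_then_add(float a, float b, float c)
{
    volatile float product = a * b;   /* volatile forces rounding to float */
    return product + c;
}

/* Stand-in for a fused multiply-add: a float*float product is exact in
 * double, so the product is not rounded to float before the add. */
float fused_like(float a, float b, float c)
{
    double exact_product = (double)a * (double)b;
    return (float)(exact_product + (double)c);
}
```

    With a = b = 1 + 2^-12 and c = -(1 + 2^-11), the exact product is
    1 + 2^-11 + 2^-24.  Rounding that product to float drops the 2^-24 term,
    so mul_then_add() returns zero while fused_like() returns 2^-24.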
721
722    The qualifier "precise" will ensure that operations contributing to a
723    variable's value are performed in the order and with the precision
724    specified in the source code.  Order of evaluation is determined by
725    operator precedence and parentheses, as described in Section 5.
726    Expressions must be evaluated with a precision consistent with the
727    operation; for example, multiplying two "float" values must produce a
728    single value with "float" precision.  This effectively prohibits the
729    arbitrary use of fused multiply-add operations if the intermediate
730    multiply result is kept at a higher precision.  For example:
731
732      precise out vec4 position;
733
734    declares that computations used to produce the value of "position" must be
735    performed precisely using the order and precision specified.  As with the
736    invariant qualifier (section 4.6.1), the precise qualifier may be used to
737    qualify a built-in or previously declared user-defined variable as being
738    precise:
739
740      out vec3 Color;
741      precise Color;            // make existing Color be precise
742
743    This qualifier will affect the evaluation of expressions used on the
744    right-hand side of an assignment if and only if:
745
746      * the variable assigned to is qualified as "precise"; or
747
748      * the value assigned is used later in the same function, either directly
749        or indirectly, on the right-hand side of an assignment to a variable
750        declared as "precise".
751
752    Expressions computed in a function are treated as precise only if assigned
753    to a variable qualified as "precise" in that same function.  Any other
754    expressions within a function are not automatically treated as precise,
755    even if they are used to determine a value that is returned by the
756    function and directly assigned to a variable qualified as "precise".
757
758    Some examples of the use of "precise" include:
759
760      in vec4 a, b, c, d;
761      precise out vec4 v;
762
763      float func(float e, float f, float g, float h)
764      {
765        return (e*f) + (g*h);            // no special precision
766      }
767
768      float func2(float e, float f, float g, float h)
769      {
770        precise float result = (e*f) + (g*h);  // ensures a precise return value
771        return result;
772      }
773
774      void func3(float i, float j, precise out float k)
775      {
776        k = i * i + j;                   // precise, due to <k> declaration
777      }
778
779      void main(void)
780      {
781        vec3 r = vec3(a * b);           // precise, used to compute v.xyz
782        vec3 s = vec3(c * d);           // precise, used to compute v.xyz
783        v.xyz = r + s;                          // precise
784        v.w = (a.w * b.w) + (c.w * d.w);        // precise
785        v.x = func(a.x, b.x, c.x, d.x);         // values computed in func()
786                                                // are NOT precise
787        v.x = func2(a.x, b.x, c.x, d.x);        // precise!
788        func3(a.x * b.x, c.x * d.x, v.x);       // precise!
789      }
790
791
792    Modify Section 4.7, Order of Qualification, p. 47
793
794    When multiple qualifications are present, they must follow a strict order.
795    This order is as follows:
796
797      precise-qualifier invariant-qualifier interpolation-qualifier storage-qualifier
798         precision-qualifier
799
800
801    Modify Section 5.9, Expressions, p. 57
802
803    (modify bulleted list as follows, adding support for implicit conversion
804    between signed and unsigned types)
805
806    Expressions in the shading language are built from the following:
807
808    * Constants of type bool, int, int64_t, uint, uint64_t, float, all vector
809      types, and all matrix types.
810
811    ...
812
813    * The operator modulus (%) operates on signed or unsigned integer scalars
814      or vectors.  If the fundamental types of the operands do not match, the
815      conversions from Section 4.1.10 "Implicit Conversions" are applied to
816      produce matching types.  ...
817
818
819    Modify Section 6.1, Function Definitions, p. 63
820
821    (modify description of overloading, beginning at the top of p. 64)
822
823     Function names can be overloaded.  The same function name can be used for
824     multiple functions, as long as the parameter types differ.  If a function
825     name is declared twice with the same parameter types, then the return
826     types and all qualifiers must also match, and it is the same function
827     being declared.  For example,
828
829       vec4 f(in vec4 x, out vec4  y);   // (A)
830       vec4 f(in vec4 x, out uvec4 y);   // (B) okay, different argument type
831       vec4 f(in ivec4 x, out uvec4 y);  // (C) okay, different argument type
832
833       int  f(in vec4 x, out ivec4 y);  // error, only return type differs
834       vec4 f(in vec4 x, in  vec4  y);  // error, only qualifier differs
835       vec4 f(const in vec4 x, out vec4 y);  // error, only qualifier differs
836
837     When function calls are resolved, an exact type match for all the
838     arguments is sought.  If an exact match is found, all other functions are
839     ignored, and the exact match is used.  If no exact match is found, then
840     the implicit conversions in Section 4.1.10 (Implicit Conversions) will be
841     applied to find a match.  Mismatched types on input parameters (in or
842     inout or default) must have a conversion from the calling argument type
843     to the formal parameter type.  Mismatched types on output parameters (out
844     or inout) must have a conversion from the formal parameter type to the
845     calling argument type.
846
847     If implicit conversions can be used to find more than one matching
848     function, a single best-matching function is sought.  To determine a best
849     match, the conversions between calling argument and formal parameter
850     types are compared for each function argument and pair of matching
851     functions.  After these comparisons are performed, each pair of matching
852     functions is compared.  A function definition A is considered a better
853     match than function definition B if:
854
855       * for at least one function argument, the conversion for that argument
856         in A is better than the corresponding conversion in B; and
857
858       * there is no function argument for which the conversion in B is better
859         than the corresponding conversion in A.
860
861     If a single function definition is considered a better match than every
862     other matching function definition, it will be used.  Otherwise, a
863     semantic error occurs and the shader will fail to compile.
864
865     To determine whether the conversion for a single argument in one match is
866     better than that for another match, the following rules are applied, in
867     order:
868
869       1. An exact match is better than a match involving any implicit
870          conversion.
871
872       2. A match involving an implicit conversion from float to double is
873          better than a match involving any other implicit conversion.
874
875       3. A match involving an implicit conversion from either int or uint to
876          float is better than a match involving an implicit conversion from
877          either int or uint to double.
878
879     If none of the rules above apply to a particular pair of conversions,
880     neither conversion is considered better than the other.
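
     The ranking rules above can be sketched in C as a pairwise comparison.
     This is an illustrative model only: the Conversion enumeration and both
     function names are hypothetical, and each argument's conversion is
     assumed to have been classified already.

```c
#include <stddef.h>

/* Hypothetical classification of the conversion needed for one argument. */
typedef enum {
    CONV_EXACT,            /* no conversion needed */
    CONV_FLOAT_TO_DOUBLE,
    CONV_INT_TO_FLOAT,     /* int or uint to float */
    CONV_INT_TO_DOUBLE,    /* int or uint to double */
    CONV_OTHER             /* any other implicit conversion */
} Conversion;

/* Nonzero if conversion a is better than conversion b (rules 1-3, in order). */
int conversion_better(Conversion a, Conversion b)
{
    if (a == CONV_EXACT || b == CONV_EXACT)                      /* rule 1 */
        return a == CONV_EXACT && b != CONV_EXACT;
    if (a == CONV_FLOAT_TO_DOUBLE || b == CONV_FLOAT_TO_DOUBLE)  /* rule 2 */
        return a == CONV_FLOAT_TO_DOUBLE && b != CONV_FLOAT_TO_DOUBLE;
    return a == CONV_INT_TO_FLOAT && b == CONV_INT_TO_DOUBLE;    /* rule 3 */
}

/* Nonzero if candidate A is a better match than candidate B over n arguments. */
int match_better(const Conversion *a, const Conversion *b, size_t n)
{
    int a_wins_somewhere = 0;
    for (size_t i = 0; i < n; ++i) {
        if (conversion_better(b[i], a[i]))
            return 0;                  /* B is better on some argument */
        if (conversion_better(a[i], b[i]))
            a_wins_somewhere = 1;
    }
    return a_wins_somewhere;
}
```

     A candidate would be selected only if match_better() holds for it
     against every other matching candidate; otherwise the call is ambiguous.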
881
882     For the function prototypes (A), (B), and (C) above, the following
883     examples show how the rules apply to different sets of calling argument
884     types:
885
886       f(vec4, vec4);        // exact match of vec4 f(in vec4 x, out vec4 y)
887       f(vec4, uvec4);       // exact match of vec4 f(in vec4 x, out uvec4 y)
888       f(vec4, ivec4);       // matched to vec4 f(in vec4 x, out vec4 y)
889                             //   (C) not relevant, can't convert vec4 to
890                             //   ivec4.  (A) better than (B) for 2nd
891                             //   argument (rule 2), same on first argument.
892       f(ivec4, vec4);       // NOT matched.  All three match by implicit
893                             //   conversion.  (C) is better than (A) and (B)
894                             //   on the first argument.  (A) is better than
895                             //   (B) and (C) on the second argument.
896
897
898    Modify Section 7.1, Vertex And Geometry Shader Special Variables, p. 69
899
900    (add to the list of geometry shader special variables, p. 69)
901
902      in int gl_InvocationID;
903
904    (add to the end of the section, p. 71)
905
906    The input variable gl_InvocationID is available in the geometry language
907    and is filled with an integer holding the invocation number associated
908    with the given shader invocation.  If the program is linked to support
909    multiple geometry shader invocations per input primitive, the invocations
910    are numbered 0, 1, 2, ..., <N>-1.  gl_InvocationID is not available in the
911    vertex or fragment language.
912
913
914    Modify Section 7.2, Fragment Shader Special Variables, p. 72
915
916    (add to the list of built-in variables)
917
918      in int gl_SampleMaskIn[];
919
920    The variable gl_SampleMaskIn is an array of integers, each holding a
921    bitfield indicating the set of samples covered by the primitive generating
922    the fragment during multisample rasterization.  The array has ceil(<s>/32)
923    elements, where <s> is the maximum number of color samples supported by
924    the implementation.  Bit <n> of word <w> in the bitfield is set if and
925    only if the sample numbered <w>*32+<n> is considered covered for this
926    fragment shader invocation.
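
    The word and bit indexing described above can be modeled in C as follows.
    This is a minimal sketch with hypothetical function names, and it assumes
    the mask words are available as an array of 32-bit integers.

```c
#include <stdint.h>

/* Number of 32-bit words needed for max_samples coverage bits: ceil(s/32). */
int sample_mask_words(int max_samples)
{
    return (max_samples + 31) / 32;
}

/* Nonzero if the sample numbered `sample` is marked covered in the mask. */
int sample_is_covered(const int32_t *mask, int sample)
{
    int word = sample / 32;   /* <w> */
    int bit  = sample % 32;   /* <n> */
    return (int)(((uint32_t)mask[word] >> bit) & 1u);
}
```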
927
928
929    Modify Section 8.3, Common Functions, p. 84
930
931    (add support for floating-point multiply-add)
932
933    Syntax:
934
935      genType fma(genType a, genType b, genType c);
936
937    The function fma() performs a fused floating-point multiply-add to compute
938    the value a*b+c.  The results of fma() may not be identical to evaluating
939    the expression (a*b)+c, because the computation may be performed in a
940    single operation with intermediate precision different from that used to
941    compute a non-fma() expression.
942
943    The results of fma() are guaranteed to be invariant given fixed inputs
944    <a>, <b>, and <c>, as though the result were taken from a variable
945    declared as "precise".
946
947
948    (add support for single-precision frexp and ldexp functions)
949
950    Syntax:
951
952      genType frexp(genType x, out genIType exp);
953      genType ldexp(genType x, in genIType exp);
954
955    The function frexp() splits each single-precision floating-point number in
956    <x> into a binary significand, a floating-point number in the range [0.5,
957    1.0), and an integral exponent of two, such that:
958
959      x = significand * 2 ^ exponent
960
961    The significand is returned by the function; the exponent is returned in
962    the parameter <exp>.  For a floating-point value of zero, the significand
963    and exponent are both zero.  For a floating-point value that is an
964    infinity or is not a number, the results of frexp() are undefined.
965
966    If the input <x> is a vector, this operation is performed in a
967    component-wise manner; the value returned by the function and the value
968    written to <exp> are vectors with the same number of components as <x>.
969
970    The function ldexp() builds a single-precision floating-point number from
971    each significand component in <x> and the corresponding integral exponent
972    of two in <exp>, returning:
973
974      significand * 2 ^ exponent
975
976    If this product is too large to be represented as a single-precision
977    floating-point value, the result is considered undefined.
978
979    If the input <x> is a vector, this operation is performed in a
980    component-wise manner; the value passed in <exp> and returned by the
981    function are vectors with the same number of components as <x>.
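
    A scalar model of frexp() and ldexp() can be written in C using repeated
    scaling by two.  The sketch below is illustrative only; the function
    names are hypothetical, infinities and NaN are passed through unchanged
    because their results are undefined here, and no attempt is made to
    reproduce a particular rounding of denormal results in ldexp_sketch().

```c
#include <float.h>

/* Split x into a significand in [0.5, 1.0) and a power-of-two exponent.
 * Zero yields a zero significand and a zero exponent. */
float frexp_sketch(float x, int *exponent)
{
    float m = (x < 0.0f) ? -x : x;
    *exponent = 0;
    if (x == 0.0f || x != x || m > FLT_MAX)
        return x;                      /* zero, NaN, or infinity */
    while (m >= 1.0f) { m *= 0.5f; ++*exponent; }
    while (m <  0.5f) { m *= 2.0f; --*exponent; }
    return (x < 0.0f) ? -m : m;
}

/* Rebuild significand * 2^exponent by repeated scaling. */
float ldexp_sketch(float significand, int exponent)
{
    float r = significand;
    for (; exponent > 0; --exponent) r *= 2.0f;
    for (; exponent < 0; ++exponent) r *= 0.5f;
    return r;
}
```

    For example, frexp_sketch(8.0f, &e) returns 0.5 with e set to 4, and
    ldexp_sketch(0.5f, 4) returns 8.0.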
982
983
984    (add support for new integer built-in functions)
985
986    Syntax:
987
988      genIType bitfieldExtract(genIType value, int offset, int bits);
989      genUType bitfieldExtract(genUType value, int offset, int bits);
990
991      genIType bitfieldInsert(genIType base, genIType insert, int offset,
992                              int bits);
993      genUType bitfieldInsert(genUType base, genUType insert, int offset,
994                              int bits);
995
996      genIType bitfieldReverse(genIType value);
997      genUType bitfieldReverse(genUType value);
998
999      genIType bitCount(genIType value);
1000      genIType bitCount(genUType value);
1001
1002      genIType findLSB(genIType value);
1003      genIType findLSB(genUType value);
1004
1005      genIType findMSB(genIType value);
1006      genIType findMSB(genUType value);
1007
1008    The function bitfieldExtract() extracts bits <offset> through
1009    <offset>+<bits>-1 from each component in <value>, returning them in the
1010    least significant bits of the corresponding component of the result.  For
1011    unsigned data types, the most significant bits of the result will be set
1012    to zero.  For signed data types, the most significant bits will be set to
1013    the value of bit <offset>+<bits>-1.  If <bits> is zero, the result will be
1014    zero.  The result will be undefined if <offset> or <bits> is negative, or
1015    if the sum of <offset> and <bits> is greater than the number of bits used
1016    to store the operand.  Note that for vector versions of bitfieldExtract(),
1017    a single pair of <offset> and <bits> values is shared for all components.
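
    A scalar C model of bitfieldExtract() for 32-bit operands might look like
    the sketch below.  The function names are hypothetical, and callers are
    assumed to pass <offset> and <bits> values in the defined range, since
    the result is undefined otherwise.

```c
#include <stdint.h>

/* Unsigned extract: the upper bits of the result are zero. */
uint32_t bitfield_extract_u(uint32_t value, int offset, int bits)
{
    if (bits == 0) return 0u;
    uint32_t mask = (bits == 32) ? 0xFFFFFFFFu : ((1u << bits) - 1u);
    return (value >> offset) & mask;
}

/* Signed extract: the upper bits replicate bit <offset>+<bits>-1. */
int32_t bitfield_extract_i(int32_t value, int offset, int bits)
{
    if (bits == 0) return 0;
    uint32_t field = bitfield_extract_u((uint32_t)value, offset, bits);
    uint32_t sign  = 1u << (bits - 1);
    /* (field ^ sign) - sign sign-extends a <bits>-wide field; the final
     * narrowing assumes the usual two's complement representation. */
    return (int32_t)((field ^ sign) - sign);
}
```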
1018
1019    The function bitfieldInsert() inserts the <bits> least significant bits of
1020    each component of <insert> into the corresponding component of <base>.
1021    The result will have bits numbered <offset> through <offset>+<bits>-1
1022    taken from bits 0 through <bits>-1 of <insert>, and all other bits taken
1023    directly from the corresponding bits of <base>.  If <bits> is zero, the
1024    result will simply be <base>.  The result will be undefined if <offset> or
1025    <bits> is negative, or if the sum of <offset> and <bits> is greater than
1026    the number of bits used to store the operand.  Note that for vector
1027    versions of bitfieldInsert(), a single pair of <offset> and <bits> values
1028    is shared for all components.
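
    The insertion rule can be modeled for a single 32-bit component with the
    C sketch below.  The function name is hypothetical, and <offset> and
    <bits> are assumed to be in the defined range.

```c
#include <stdint.h>

/* Replace bits offset..offset+bits-1 of base with the low bits of insert. */
uint32_t bitfield_insert(uint32_t base, uint32_t insert, int offset, int bits)
{
    if (bits == 0) return base;
    uint32_t low  = (bits == 32) ? 0xFFFFFFFFu : ((1u << bits) - 1u);
    uint32_t mask = low << offset;
    return (base & ~mask) | ((insert << offset) & mask);
}
```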
1029
1030    The function bitfieldReverse() reverses the bits of <value>.  The bit
1031    numbered <n> of the result will be taken from bit (<bits>-1)-<n> of
1032    <value>, where <bits> is the total number of bits used to represent
1033    <value>.
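
    For a 32-bit scalar, the reversal can be sketched in C as a simple loop.
    The function name is hypothetical and the loop favors clarity over speed.

```c
#include <stdint.h>

uint32_t bitfield_reverse(uint32_t value)
{
    uint32_t result = 0u;
    for (int n = 0; n < 32; ++n) {
        /* bit n of the result comes from bit (32-1)-n of value */
        result |= ((value >> (31 - n)) & 1u) << n;
    }
    return result;
}
```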
1034
1035    The function bitCount() returns the number of one bits in the binary
1036    representation of <value>.
1037
1038    The function findLSB() returns the bit number of the least significant one
1039    bit in the binary representation of <value>.  If <value> is zero, -1 will
1040    be returned.
1041
1042    The function findMSB() returns the bit number of the most significant bit
1043    in the binary representation of <value>.  For positive integers, the
1044    result will be the bit number of the most significant one bit.  For
1045    negative integers, the result will be the bit number of the most
1046    significant zero bit.  For a <value> of zero or negative one, -1 will be
1047    returned.
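
    The three bit-scanning functions can be modeled for 32-bit scalars with
    the C sketch below.  The function names are hypothetical, and the signed
    findMSB() case is expressed by scanning the complement of a negative
    input for its most significant one bit.

```c
#include <stdint.h>

int bit_count(uint32_t value)
{
    int count = 0;
    for (; value != 0u; value &= value - 1u)   /* clears the lowest one bit */
        ++count;
    return count;
}

int find_lsb(uint32_t value)
{
    if (value == 0u) return -1;
    int n = 0;
    while (((value >> n) & 1u) == 0u) ++n;
    return n;
}

/* Unsigned findMSB: most significant one bit, or -1 for zero. */
int find_msb_u(uint32_t value)
{
    int n = 31;
    while (n >= 0 && ((value >> n) & 1u) == 0u) --n;
    return n;
}

/* Signed findMSB: for negative inputs, the most significant zero bit. */
int find_msb_i(int32_t value)
{
    uint32_t bits = (uint32_t)value;
    return find_msb_u(value < 0 ? ~bits : bits);
}
```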
1048
1049
1050    (add support for general packing functions)
1051
1052    Syntax:
1053
1054      uint      packUnorm2x16(vec2 v);
1055      uint      packUnorm4x8(vec4 v);
1056      uint      packSnorm4x8(vec4 v);
1057
1058      vec2      unpackUnorm2x16(uint v);
1059      vec4      unpackUnorm4x8(uint v);
1060      vec4      unpackSnorm4x8(uint v);
1061
1062    The functions packUnorm2x16(), packUnorm4x8(), and packSnorm4x8() first
1063    convert each component of a two- or four-component vector of normalized
1064    floating-point values into 8- or 16-bit integer values.  Then, the results
1065    are packed into a 32-bit unsigned integer.  The first component of the
1066    vector will be written to the least significant bits of the output; the
1067    last component will be written to the most significant bits.
1068
1069    The functions unpackUnorm2x16(), unpackUnorm4x8(), and unpackSnorm4x8()
1070    first unpack a single 32-bit unsigned integer into a pair of 16-bit
1071    unsigned integers, four 8-bit unsigned integers, or four 8-bit signed
1072    integers.  Then, each component is converted to a normalized floating-point
1073    value to generate a two- or four-component vector.  The first component of
1074    the vector will be extracted from the least significant bits of the input;
1075    the last component will be extracted from the most significant bits.
1076
1077    The conversion between fixed- and normalized floating-point values will be
1078    performed as below.
1079
1080      function          conversion
1081      ---------------   -----------------------------------------------------
1082      packUnorm2x16     fixed_val = round(clamp(float_val, 0, +1) * 65535.0);
1083      packUnorm4x8      fixed_val = round(clamp(float_val, 0, +1) * 255.0);
1084      packSnorm4x8      fixed_val = round(clamp(float_val, -1, +1) * 127.0);
1085      unpackUnorm2x16   float_val = fixed_val / 65535.0;
1086      unpackUnorm4x8    float_val = fixed_val / 255.0;
1087      unpackSnorm4x8    float_val = clamp(fixed_val / 127.0, -1, +1);
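
    The conversions in the table can be modeled in C as shown below.  This is
    a sketch with hypothetical function names.  It assumes finite inputs and
    rounds halfway cases away from zero, which is only one possible behavior
    for round().

```c
#include <stdint.h>

static float clampf(float v, float lo, float hi)
{
    return v < lo ? lo : (v > hi ? hi : v);
}

/* Round to nearest, halfway cases away from zero (an assumption). */
static int32_t round_nearest(float v)
{
    return (int32_t)(v >= 0.0f ? v + 0.5f : v - 0.5f);
}

uint32_t pack_unorm_2x16(const float v[2])
{
    uint32_t packed = 0u;
    for (int i = 0; i < 2; ++i) {
        float scaled = clampf(v[i], 0.0f, 1.0f) * 65535.0f;
        packed |= (uint32_t)round_nearest(scaled) << (16 * i);
    }
    return packed;
}

uint32_t pack_unorm_4x8(const float v[4])
{
    uint32_t packed = 0u;
    for (int i = 0; i < 4; ++i) {
        float scaled = clampf(v[i], 0.0f, 1.0f) * 255.0f;
        packed |= (uint32_t)round_nearest(scaled) << (8 * i);
    }
    return packed;                     /* component 0 is in the low byte */
}

uint32_t pack_snorm_4x8(const float v[4])
{
    uint32_t packed = 0u;
    for (int i = 0; i < 4; ++i) {
        int32_t fixed = round_nearest(clampf(v[i], -1.0f, 1.0f) * 127.0f);
        packed |= ((uint32_t)fixed & 0xFFu) << (8 * i);
    }
    return packed;
}

void unpack_unorm_4x8(uint32_t p, float out[4])
{
    for (int i = 0; i < 4; ++i)
        out[i] = (float)((p >> (8 * i)) & 0xFFu) / 255.0f;
}

void unpack_snorm_4x8(uint32_t p, float out[4])
{
    for (int i = 0; i < 4; ++i) {
        int32_t byte = (int32_t)((p >> (8 * i)) & 0xFFu);
        if (byte >= 128) byte -= 256;  /* reinterpret as signed 8-bit */
        out[i] = clampf((float)byte / 127.0f, -1.0f, 1.0f);
    }
}
```

    The final clamp in unpack_snorm_4x8() matters only for the byte value
    -128, which would otherwise map slightly below -1.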
1088
1089
1090    (add functions to get/set the bit encoding for floating-point values)
1091
1092    32-bit floating-point data types in the OpenGL shading language are
1093    specified to be encoded according to the IEEE 754 specification for
1094    single-precision floating-point values.  The functions below allow shaders
1095    to convert floating-point values to and from signed or unsigned integers
1096    representing their encoding.
1097
1098    To obtain signed or unsigned integer values holding the encoding of a
1099    floating-point value, use:
1100
1101      genIType floatBitsToInt(genType value);
1102      genUType floatBitsToUint(genType value);
1103
1104    Conversions are done on a component-by-component basis.
1105
1106    To obtain a floating-point value corresponding to a signed or unsigned
1107    integer encoding, use:
1108
1109      genType intBitsToFloat(genIType value);
1110      genType uintBitsToFloat(genUType value);
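
    On a CPU, the same reinterpretation can be sketched in C with memcpy().
    The function names are hypothetical, and the sketch assumes that float is
    a 32-bit IEEE 754 single-precision type on the host.

```c
#include <stdint.h>
#include <string.h>

int32_t float_bits_to_int(float value)
{
    int32_t bits;
    memcpy(&bits, &value, sizeof bits);   /* copy the encoding unchanged */
    return bits;
}

uint32_t float_bits_to_uint(float value)
{
    uint32_t bits;
    memcpy(&bits, &value, sizeof bits);
    return bits;
}

float int_bits_to_float(int32_t bits)
{
    float value;
    memcpy(&value, &bits, sizeof value);
    return value;
}

float uint_bits_to_float(uint32_t bits)
{
    float value;
    memcpy(&value, &bits, sizeof value);
    return value;
}
```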
1111
1112
1113    (support for unsigned integer add/subtract with carry-out)
1114
1115    Syntax:
1116
1117      genUType uaddCarry(genUType x, genUType y, out genUType carry);
1118      genUType usubBorrow(genUType x, genUType y, out genUType borrow);
1119
1120    The function uaddCarry() adds 32-bit unsigned integers or vectors <x> and
1121    <y>, returning the sum modulo 2^32.  The value <carry> is set to zero if
1122    the sum was less than 2^32, or one otherwise.
1123
1124    The function usubBorrow() subtracts the 32-bit unsigned integer or vector
1125    <y> from <x>, returning the difference if non-negative or 2^32 plus the
1126    difference, otherwise.  The value <borrow> is set to zero if x >= y, or
1127    one otherwise.
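
    Scalar C models of both functions follow directly from the wraparound
    behavior of unsigned arithmetic.  The function names below are
    hypothetical.

```c
#include <stdint.h>

uint32_t uadd_carry(uint32_t x, uint32_t y, uint32_t *carry)
{
    uint32_t sum = x + y;           /* unsigned arithmetic wraps modulo 2^32 */
    *carry = (sum < x) ? 1u : 0u;   /* wrapped only if the sum reached 2^32 */
    return sum;
}

uint32_t usub_borrow(uint32_t x, uint32_t y, uint32_t *borrow)
{
    *borrow = (x >= y) ? 0u : 1u;
    return x - y;                   /* wraps to 2^32 + (x - y) when x < y */
}
```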
1128
1129
1130    (support for signed and unsigned multiplies, with 32-bit inputs and a
1131     64-bit result spanning two 32-bit outputs)
1132
1133    Syntax:
1134
1135      void umulExtended(genUType x, genUType y, out genUType msb,
1136                        out genUType lsb);
1137      void imulExtended(genIType x, genIType y, out genIType msb,
1138                        out genIType lsb);
1139
1140    The functions umulExtended() and imulExtended() multiply 32-bit unsigned
1141    or signed integers or vectors <x> and <y>, producing a 64-bit result.  The
1142    32 least significant bits are returned in <lsb>; the 32 most significant
1143    bits are returned in <msb>.
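
    With a 64-bit integer type available, the extended multiplies can be
    sketched in C as below.  The function names are hypothetical, and the
    signed variant assumes the usual two's complement behavior when
    narrowing to int32_t.

```c
#include <stdint.h>

void umul_extended(uint32_t x, uint32_t y, uint32_t *msb, uint32_t *lsb)
{
    uint64_t product = (uint64_t)x * (uint64_t)y;
    *msb = (uint32_t)(product >> 32);
    *lsb = (uint32_t)product;
}

void imul_extended(int32_t x, int32_t y, int32_t *msb, int32_t *lsb)
{
    /* Work on the bit pattern of the signed 64-bit product. */
    uint64_t product = (uint64_t)((int64_t)x * (int64_t)y);
    *msb = (int32_t)(uint32_t)(product >> 32);
    *lsb = (int32_t)(uint32_t)product;
}
```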
1144
1145
1146    Modify Section 8.7, Texture Lookup Functions, p. 91
1147
1148    (extend the basic versions of textureGather from ARB_texture_gather,
1149     allowing for optional component selection in a multi-component texture
1150     and for shadow mapping)
1151
1152    Syntax:
1153      gvec4 textureGather(gsampler2D sampler, vec2 coord[, int comp]);
1154      gvec4 textureGather(gsampler2DArray sampler, vec3 coord[, int comp]);
1155      gvec4 textureGather(gsamplerCube sampler, vec3 coord[, int comp]);
1156      gvec4 textureGather(gsamplerCubeArray sampler, vec4 coord[, int comp]);
1157      gvec4 textureGather(gsampler2DRect sampler, vec2 coord[, int comp]);
1158
1159      vec4 textureGather(sampler2DShadow sampler, vec2 coord, float refZ);
1160      vec4 textureGather(sampler2DArrayShadow sampler, vec3 coord, float refZ);
1161      vec4 textureGather(samplerCubeShadow sampler, vec3 coord, float refZ);
1162      vec4 textureGather(samplerCubeArrayShadow sampler, vec4 coord,
1163                         float refZ);
1164      vec4 textureGather(sampler2DRectShadow sampler, vec2 coord, float refZ);
1165
1166    The textureGather() functions use the texture coordinates given by <coord>
1167    to determine a set of four texels to sample from the texture identified by
1168    <sampler>.  These functions return a four-component vector consisting of
1169    one component from each texel.  If specified, the value of <comp> must be
1170    a constant integer expression with a value of zero, one, two, or three,
1171    identifying the <x>, <y>, <z>, or <w> component of the four-component
1172    vector lookup result for each texel, respectively.  If <comp> is not
1173    specified, the <x> component of each texel will be used to generate the
1174    result vector.  As described in the OpenGL Specification, the vector
1175    selects the post-swizzle component corresponding to <comp> from each of
1176    the four texels, returning:
1177
1178      vec4(T_i0_j1(coord, base).<comp>,
1179           T_i1_j1(coord, base).<comp>,
1180           T_i1_j0(coord, base).<comp>,
1181           T_i0_j0(coord, base).<comp>)
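
    The component ordering of the result can be illustrated with a small
    CPU-side model in C.  Everything in this sketch is an assumption made for
    illustration: the Texel type and function names are hypothetical, the
    texture is a single-level row-major array, coordinates are clamped to the
    edge, no swizzle is applied, and the footprint is chosen as
    floor(u - 0.5) and its neighbor, as for LINEAR filtering.

```c
typedef struct { float c[4]; } Texel;   /* hypothetical RGBA texel */

static int floor_to_int(float v)
{
    int i = (int)v;                     /* truncates toward zero */
    return ((float)i > v) ? i - 1 : i;
}

static int clamp_index(int i, int size)
{
    return i < 0 ? 0 : (i >= size ? size - 1 : i);
}

/* Gather component `comp` from the 2x2 footprint around normalized (s, t). */
void texture_gather_2d(const Texel *texels, int width, int height,
                       float s, float t, int comp, float result[4])
{
    int i0 = floor_to_int(s * (float)width  - 0.5f), i1 = i0 + 1;
    int j0 = floor_to_int(t * (float)height - 0.5f), j1 = j0 + 1;
    i0 = clamp_index(i0, width);   i1 = clamp_index(i1, width);
    j0 = clamp_index(j0, height);  j1 = clamp_index(j1, height);

    result[0] = texels[j1 * width + i0].c[comp];   /* T_i0_j1 */
    result[1] = texels[j1 * width + i1].c[comp];   /* T_i1_j1 */
    result[2] = texels[j0 * width + i1].c[comp];   /* T_i1_j0 */
    result[3] = texels[j0 * width + i0].c[comp];   /* T_i0_j0 */
}
```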
1182
1183    For textureGather() functions using a shadow sampler type, each of the
1184    four texel lookups performs a depth comparison against the depth reference
1185    value passed in <refZ>, and returns the result of that comparison in the
1186    appropriate component of the result vector.  The parameter <comp> used for
1187    component selection is not supported for textureGather() functions with
1188    shadow sampler types.
1189
1190    As with other texture lookup functions, the results of textureGather() are
1191    undefined for shadow samplers if the texture referenced is not a depth
1192    texture or has depth comparisons disabled; or for non-shadow samplers if
1193    the texture referenced is a depth texture with depth comparisons enabled.
1194
1195
1196    (extend the "Offset" versions of textureGather from ARB_texture_gather,
1197     allowing for optional component selection in a multi-component texture,
1198     non-constant offsets, and shadow mapping)
1199
1200    Syntax:
1201      gvec4 textureGatherOffset(gsampler2D sampler, vec2 coord,
1202                                ivec2 offset[, int comp]);
1203      gvec4 textureGatherOffset(gsampler2DArray sampler, vec3 coord,
1204                                ivec2 offset[, int comp]);
1205      gvec4 textureGatherOffset(gsampler2DRect sampler, vec2 coord,
1206                                ivec2 offset[, int comp]);
1207
1208      vec4 textureGatherOffset(sampler2DShadow sampler, vec2 coord,
1209                               float refZ, ivec2 offset);
1210      vec4 textureGatherOffset(sampler2DArrayShadow sampler, vec3 coord,
1211                               float refZ, ivec2 offset);
1212      vec4 textureGatherOffset(sampler2DRectShadow sampler, vec2 coord,
1213                               float refZ, ivec2 offset);
1214
1215    The textureGatherOffset() functions operate identically to
1216    textureGather(), except that the 2-component integer texel offset vector
1217    <offset> is applied as a (u,v) offset to determine the four texels to
1218    sample.  The value <offset> need not be constant; however, a limited range
1219    of offset values are supported.  If any component of <offset> is less than
1220    MIN_PROGRAM_TEXTURE_GATHER_OFFSET_ARB or greater than
1221    MAX_PROGRAM_TEXTURE_GATHER_OFFSET_ARB, the offset applied to the texture
1222    coordinates is undefined.  Note that <offset> does not apply to the layer
1223    coordinate for array textures.
1224
1225
1226    (add new "Offsets" versions of textureGather from ARB_texture_gather,
1227     allowing for optional component selection in a multi-component texture,
1228     separate non-constant offsets for each texel in the footprint, and shadow
1229     mapping)
1230
1231    Syntax:
1232      gvec4 textureGatherOffsets(gsampler2D sampler, vec2 coord,
1233                                 ivec2 offsets[4][, int comp]);
1234      gvec4 textureGatherOffsets(gsampler2DArray sampler, vec3 coord,
1235                                 ivec2 offsets[4][, int comp]);
1236      gvec4 textureGatherOffsets(gsampler2DRect sampler, vec2 coord,
1237                                 ivec2 offsets[4][, int comp]);
1238
1239      vec4 textureGatherOffsets(sampler2DShadow sampler, vec2 coord,
1240                                float refZ, ivec2 offsets[4]);
1241      vec4 textureGatherOffsets(sampler2DArrayShadow sampler, vec3 coord,
1242                                float refZ, ivec2 offsets[4]);
1243      vec4 textureGatherOffsets(sampler2DRectShadow sampler, vec2 coord,
1244                                float refZ, ivec2 offsets[4]);
1245
1246    The textureGatherOffsets() functions operate identically to
1247    textureGather(), except that the array of two-component integer vectors
1248    <offsets> is used to determine the location of the four texels to sample.
1249    Each of the four texels is obtained by applying the corresponding offset
1250    in the four-element array <offsets> as a (u,v) coordinate offset to the
1251    coordinates <coord>, identifying the four-texel LINEAR footprint, and then
1252    selecting the texel T_i0_j0 of that footprint.  The specified values in
1253    <offsets> must be constant.  A limited range of offset values are
1254    supported; the minimum and maximum offset values are
1255    implementation-dependent and given by
1256    MIN_PROGRAM_TEXTURE_GATHER_OFFSET_ARB and
1257    MAX_PROGRAM_TEXTURE_GATHER_OFFSET_ARB, respectively.  Note that <offsets>
1258    does not apply to the layer coordinate for array textures.
1259
1260
1261    Modify Section 8.8, Fragment Processing Functions, p. 101
1262
1263    (add new functions to the end of section, p. 102)
1264
1265    Built-in interpolation functions are available to compute an interpolated
1266    value of a fragment shader input variable at a shader-specified (x,y)
1267    location.  A separate (x,y) location may be used for each invocation of
1268    the built-in function, and those locations may differ from the default
1269    (x,y) location used to produce the default value of the input.
1270
1271      float interpolateAtCentroid(float interpolant);
1272      vec2 interpolateAtCentroid(vec2 interpolant);
1273      vec3 interpolateAtCentroid(vec3 interpolant);
1274      vec4 interpolateAtCentroid(vec4 interpolant);
1275
1276      float interpolateAtSample(float interpolant, int sample);
1277      vec2 interpolateAtSample(vec2 interpolant, int sample);
1278      vec3 interpolateAtSample(vec3 interpolant, int sample);
1279      vec4 interpolateAtSample(vec4 interpolant, int sample);
1280
1281      float interpolateAtOffset(float interpolant, vec2 offset);
1282      vec2 interpolateAtOffset(vec2 interpolant, vec2 offset);
1283      vec3 interpolateAtOffset(vec3 interpolant, vec2 offset);
1284      vec4 interpolateAtOffset(vec4 interpolant, vec2 offset);
1285
1286    The function interpolateAtCentroid() will return the value of the input
1287    varying <interpolant> sampled at a location inside both the pixel and
1288    the primitive being processed.  The value obtained would be the same value
1289    assigned to the input variable if declared with the "centroid" qualifier.
1290
1291    The function interpolateAtSample() will return the value of the input
1292    varying <interpolant> at the location of the sample numbered <sample>.  If
1293    multisample buffers are not available, the input varying will be evaluated
1294    at the center of the pixel.  If the sample number given by <sample> does
1295    not exist, the position used to interpolate the input varying is
1296    undefined.
1297
1298    The function interpolateAtOffset() will return the value of the input
1299    varying <interpolant> sampled at an offset from the center of the pixel
1300    specified by <offset>.  The two floating-point components of <offset>
1301    give the offset in pixels in the x and y directions, respectively.
1302    An offset of (0,0) identifies the center of the pixel.  The range and
1303    granularity of offsets supported by this function are
1304    implementation-dependent.
1305
1306    For all of the interpolation functions, <interpolant> must be an input
1307    variable or an element of an input variable declared as an array.
1308    Component selection operators (e.g., ".xy") may not be used when
1309    specifying <interpolant>.  If <interpolant> is declared with a "flat" or
1310    "centroid" qualifier, the qualifier will have no effect on the
1311    interpolated value.  If <interpolant> is declared with the "noperspective"
1312    qualifier, the interpolated value will be computed without perspective
1313    correction.
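
        The following fragment shader is a minimal sketch of the three
        interpolation functions described above.  The input "texcoord", the
        sampler "tex", and the output "color" are hypothetical names chosen
        for illustration only; the offset used is assumed to lie within the
        implementation-dependent supported range.

```glsl
#version 150
#extension GL_ARB_gpu_shader5 : enable

in vec2 texcoord;          // hypothetical input varying
uniform sampler2D tex;     // hypothetical texture
out vec4 color;

void main()
{
    // Same value "texcoord" would have if declared with "centroid".
    vec2 tcCentroid = interpolateAtCentroid(texcoord);

    // Value at the location of sample 0 (the pixel center if no
    // multisample buffer is available).
    vec2 tcSample0 = interpolateAtSample(texcoord, 0);

    // Value at +0.25 pixels in x and y from the pixel center.
    vec2 tcOffset = interpolateAtOffset(texcoord, vec2(0.25, 0.25));

    // Not allowed: interpolateAtOffset(texcoord.xy, ...), because
    // component selection may not be applied to <interpolant>.

    color = (texture(tex, tcCentroid) +
             texture(tex, tcSample0) +
             texture(tex, tcOffset)) / 3.0;
}
```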
1314
1315
1316    Modify Section 8.10, Geometry Shader Functions, p. 104
1317
1318    (replace the section, using the following more general formulation)
1319
1320    These functions are only available in geometry shaders.
1321
1322    Syntax:
1323
1324        void EmitStreamVertex(int stream);      // Geometry-only
1325        void EndStreamPrimitive(int stream);    // Geometry-only
1326
1327        void EmitVertex();                      // Geometry-only
1328        void EndPrimitive();                    // Geometry-only
1329
1330    Description:
1331
1332    The function EmitStreamVertex() specifies that the vertex being generated
1333    by the geometry shader is completed.  A vertex is added to the current
1334    output primitive in the vertex stream numbered <stream> using the current
1335    values of all output variables associated with <stream>.  The values of
1336    any unwritten output variables associated with <stream> are undefined.
1337    The argument <stream> must be a constant integral expression.  The values
1338    of all output variables (for all output streams) are undefined after
1339    calling EmitStreamVertex().  If a geometry shader invocation has emitted
1340    more vertices than permitted by the output layout qualifier
1341    "max_vertices", the results of calling EmitStreamVertex() are undefined.
1342
1343    The function EmitVertex() is equivalent to calling EmitStreamVertex() with
1344    <stream> set to zero.
1345
1346    The function EndStreamPrimitive() specifies that the current output
1347    primitive for the vertex stream numbered <stream> is completed and that a
1348    new empty output primitive of the same type should be started.  The
1349    argument <stream> must be a constant integral expression.  This function
1350    does not emit a vertex.  If the output layout is declared to be "points",
1351    calling EndPrimitive() or EndStreamPrimitive() is optional.
1352
1353    The function EndPrimitive() is equivalent to calling EndStreamPrimitive()
1354    with <stream> set to zero.
1355
1356    A geometry shader starts with an output primitive containing no vertices
1357    for each stream.  When a geometry shader terminates, the current output
1358    primitive for each vertex stream is automatically completed.  It is not
1359    necessary to call EndPrimitive() or EndStreamPrimitive() for any stream
1360    where the geometry shader writes only a single primitive.
1361
1362    Multiple vertex streams are supported only if the output primitive type is
1363    declared to be "points".  A program will fail to link if it contains a
1364    geometry shader calling EmitStreamVertex() or EndStreamPrimitive() if its
1365    output primitive type is not "points".
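
        As an illustration, a geometry shader writing to two vertex streams
        might look like the sketch below.  The variable names are
        hypothetical, and the "stream" layout qualifier is assumed to be the
        mechanism used to associate each output with a stream.  Note that
        outputs are written immediately before each emit, since all output
        values become undefined after EmitStreamVertex().

```glsl
#version 150
#extension GL_ARB_gpu_shader5 : enable

layout(points) in;
layout(points, max_vertices = 2) out;   // multiple streams require "points"

in vec4 velocity[];                     // hypothetical vertex shader output

layout(stream = 0) out vec4 position0;  // hypothetical stream 0 output
layout(stream = 1) out vec4 velocity1;  // hypothetical stream 1 output

void main()
{
    position0 = gl_in[0].gl_Position;
    EmitStreamVertex(0);                // stream number must be constant

    velocity1 = velocity[0];
    EmitStreamVertex(1);

    // EndStreamPrimitive() is not needed: each stream writes only a
    // single point primitive, which is completed automatically.
}
```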
1366
1367
1368    Modify Section 9, Shading Language Grammar, p. 92
1369
1370    !!! TBD !!!
1371
1372
1373GLX Protocol
1374
1375    None.
1376
1377Dependencies on ARB_gpu_shader_fp64
1378
1379    This extension, ARB_gpu_shader_fp64, and NV_gpu_shader5 all modify the set
1380    of implicit conversions supported in the OpenGL Shading Language.  If more
1381    than one of these extensions is supported, an expression of one type may
1382    be converted to another type if that conversion is allowed by any of these
1383    specifications.
1384
1385    If ARB_gpu_shader_fp64 or a similar extension introducing new data types
1386    is not supported, the function overloading rule in the GLSL specification
1387    preferring promotion of an input parameter from a smaller type to a larger type
1388    is never applicable, as all data types are of the same size.  That rule
1389    and the example referring to "double" should be removed.
1390
1391
1392Dependencies on NV_gpu_shader5
1393
1394    This extension, ARB_gpu_shader_fp64, and NV_gpu_shader5 all modify the set
1395    of implicit conversions supported in the OpenGL Shading Language.  If more
1396    than one of these extensions is supported, an expression of one type may
1397    be converted to another type if that conversion is allowed by any of these
1398    specifications.
1399
1400    This specification and NV_gpu_shader5 both lift the restriction in GLSL
1401    1.50 requiring that indexing in arrays of samplers must be done with
1402    constant expressions.  However, this extension specifies that results are
1403    undefined if the indices would diverge if multiple shader invocations are
1404    run in lockstep.  NV_gpu_shader5 does not impose the non-divergent
1405    indexing requirement.
1406
1407    If NV_gpu_shader5 is supported, integer data types are supported with four
1408    different precisions (8-, 16-, 32-, and 64-bit) and floating-point data
1409    types are supported with three different precisions (16-, 32-, and
1410    64-bit).  The extension adds the following rule for output parameters,
1411    which is similar to the one present in this extension for input
1412    parameters:
1413
1414       5. If the formal parameters in both matches are output parameters, a
1415          conversion from a type with a larger number of bits per component is
1416          better than a conversion from a type with a smaller number of bits
1417          per component.  For example, a conversion from an "int16_t" formal
1418          parameter type to "int" is better than one from an "int8_t" formal
1419          parameter type to "int".
1420
1421    Such a rule is not provided in this extension because there is no
1422    combination of types in this extension and ARB_gpu_shader_fp64 where this
1423    rule has any effect.
1424
1425
1426Dependencies on ARB_sample_shading
1427
1428    This extension builds upon the per-sample shading support provided by
1429    ARB_sample_shading to provide several new capabilities, including:
1430
1431      * the built-in variable gl_SampleMaskIn[] indicates the set of samples
1432        covered by the input primitive corresponding to the fragment shader
1433        invocation; and
1434
1435      * use of the "sample" qualifier on a fragment shader input forces
1436        per-sample shading, and specifies that the value of the input be
1437        evaluated per-sample.
1438
1439    There is no interaction between the extensions, except that shaders using
1440    the features of this extension seem likely to use features from
1441    ARB_sample_shading as well.
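
        A minimal sketch of reading the input sample mask follows.  The input
        "baseColor" and the uniform "sampleCount" (assumed to be set by the
        application to the number of samples in the framebuffer) are
        hypothetical; only the first 32 coverage bits are examined.

```glsl
#version 150
#extension GL_ARB_gpu_shader5 : enable

in vec4 baseColor;          // hypothetical input, interpolated per pixel
uniform int sampleCount;    // hypothetical; supplied by the application
out vec4 color;

void main()
{
    // Bits 0..31 of the coverage of the primitive for this fragment.
    int mask = gl_SampleMaskIn[0];

    // Fraction of samples covered, used here to fade partially
    // covered fragments (an illustrative use only).
    float coverage = float(bitCount(mask)) / float(max(sampleCount, 1));

    color = vec4(baseColor.rgb, baseColor.a * coverage);
}
```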
1442
1443
1444Dependencies on ARB_texture_gather
1445
1446    This extension builds upon the textureGather() built-ins provided by
1447    ARB_texture_gather to provide several new capabilities, including:
1448
1449      * allowing shaders to select any single component of a multi-component
1450        texture to produce the gathered 2x2 footprint;
1451
1452      * allowing shaders to perform a per-sample depth comparison when
1453        gathering the 2x2 footprint for shadow sampler types;
1454
1455      * allowing shaders to use arbitrary offsets computed at run-time to
1456        select a 2x2 footprint to gather from; and
1457
1458      * allowing shaders to use separate independent offsets for each of the
1459        four texels returned, instead of requiring a fixed 2x2 footprint.
1460
1461    Other than the fact that they provide similar functionality, there is no
1462    interaction between the extensions.
1463
1464    Since this extension requires support for gathering from multi-component
1465    textures, the minimum value of MAX_PROGRAM_TEXTURE_GATHER_COMPONENTS_ARB
1466    is increased to 4.
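
        The four capabilities listed above can be sketched as follows.  All
        uniform and input names are hypothetical, and the offsets are assumed
        to be within the range supported by the implementation.

```glsl
#version 150
#extension GL_ARB_gpu_shader5 : enable

uniform sampler2D colorTex;          // hypothetical multi-component texture
uniform sampler2DShadow shadowTex;   // hypothetical depth texture
uniform ivec2 runtimeOffset;         // hypothetical offset chosen at run time

in vec2 uv;
in float refDepth;
out vec4 color;

// Per-texel offsets; these must be constant expressions.
const ivec2 corners[4] = ivec2[4](ivec2(-2, -2), ivec2(2, -2),
                                  ivec2(-2,  2), ivec2(2,  2));

void main()
{
    // Gather the green component (component 1) of the 2x2 footprint.
    vec4 greens = textureGather(colorTex, uv, 1);

    // Gather four depth comparison results against refDepth.
    vec4 lit = textureGather(shadowTex, uv, refDepth);

    // Gather component 0 from a footprint selected by a run-time offset.
    vec4 shifted = textureGatherOffset(colorTex, uv, runtimeOffset);

    // Gather component 0 from four independently offset texels.
    vec4 spread = textureGatherOffsets(colorTex, uv, corners);

    color = vec4(greens.x, lit.x, shifted.x, spread.x);
}
```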
1467
1468
1469Errors
1470
1471    INVALID_OPERATION is generated by GetProgram if <pname> is
1472    GEOMETRY_SHADER_INVOCATIONS and the program has not been linked
1473    successfully, or does not contain objects to form a geometry shader.
1474
1475
1476New State
1477
1478    Add the following state to Table 6.40, Program Object State, p. 378
1479
1480                                                    Initial
1481    Get Value                 Type   Get Command     Value     Description                  Sec.  Attribute
1482    ------------------------- ----  ------------    -------    -------------------------   ------  -------
1483    GEOMETRY_SHADER_           Z+    GetProgramiv      1       number of times a geometry  6.1.16    -
1484      INVOCATIONS                                              shader should be executed
1485                                                               for each input primitive
1486
1487New Implementation Dependent State
1488
1489                                               Min.
1490    Get Value               Type  Get Command  Value  Description                  Sec.      Attrib
1491    ----------------------  ----  -----------  -----  --------------------------   --------  ------
1492    MAX_GEOMETRY_SHADER_     Z+   GetIntegerv   32    maximum supported geometry   2.16.4      -
1493      INVOCATIONS                                     shader invocation count
1494    MIN_FRAGMENT_INTERP-     R    GetFloatv    -0.5   furthest negative offset     3.12.1      -
1495      OLATION_OFFSET                                   for interpolateAtOffset()
1496    MAX_FRAGMENT_INTERP-     R    GetFloatv    +0.5   furthest positive offset     3.12.1      -
1497      OLATION_OFFSET                                   for interpolateAtOffset()
1498    FRAGMENT_INTERPOLATION_  Z+   GetIntegerv    4    subpixel bits for            3.12.1      -
1499      OFFSET_BITS                                      interpolateAtOffset()
1500    MAX_VERTEX_STREAMS       Z+   GetIntegerv    4    total number of vertex       2.16.4      -
1501                                                       streams
1502
1503    (Note:  The minimum value for MAX_PROGRAM_TEXTURE_GATHER_COMPONENTS_ARB,
1504     added by ARB_texture_gather, is increased to 4.)
1505
1506Issues
1507
1508    (1) This extension builds on the capability provided by
1509        ARB_sample_shading, adding a new built-in variable for the input
1510        sample mask.  It seems likely that a shader using this mask might also
1511        want to use one or more ARB_sample_shading built-ins.  Are such
1512        shaders required to include #extension lines for both extensions?
1513
1514      UNRESOLVED:  It would be nice if it wasn't required.
1515
1516    (2) How do the per-sample shading features of this extension interact with
1517        non-multisample rendering?
1518
1519      RESOLVED:  Non-multisample rendering (due to no multisample buffer or
1520      MULTISAMPLE disabled) is treated as single-sample rendering.
1521
1522    (3) This extension lifts the restriction requiring that indices into
1523        samplers be constant expressions, but makes the results undefined if
1524        the indices used would diverge in lockstep execution.  What is this
1525        good for?
1526
1527      RESOLVED:  This allows shaders to index into samplers using integer
1528      uniforms, or with non-divergent values computed at run-time (e.g., loop
1529      counters).  Many implementations of this extension will be SIMD, running
1530      multiple shader invocations at once, and some implementations may have
1531      difficulty with accessing multiple textures in a single SIMD
1532      instruction.
1533
1534      Note that the NV_gpu_shader5 extension similarly lifts the restriction
1535      but does not require non-divergent indexing.
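
          A sketch of the indexing patterns this resolution has in mind is
          shown below.  All names are hypothetical, and "layerCount" and
          "selected" are assumed to be kept within the array bounds by the
          application.

```glsl
#version 150
#extension GL_ARB_gpu_shader5 : enable

uniform sampler2D layers[4];   // hypothetical array of samplers
uniform int layerCount;        // hypothetical; assumed to be in [0, 4]
uniform int selected;          // hypothetical; assumed to be in [0, 3]

in vec2 uv;
flat in int materialId;        // may differ between invocations
out vec4 color;

void main()
{
    vec4 sum = vec4(0.0);

    // Loop counter: identical across invocations because the loop
    // bound is a uniform.
    for (int i = 0; i < layerCount; i++) {
        sum += texture(layers[i], uv);
    }

    // Integer uniform: identical across invocations.
    sum += texture(layers[selected], uv);

    // Undefined with this extension if materialId diverges among
    // invocations run in lockstep:
    //   sum += texture(layers[materialId], uv);

    color = sum;
}
```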
1536
1537    (4) What sort of implicit conversions should we support in this and
1538        related extensions?
1539
1540      RESOLVED:  In GLSL 1.50, we have implicit conversion from "int" and
1541      "uint" to "float", as well as equivalent conversions for vector types.
1542      One of the primary motivations of this feature is to allow constants
1543      that are nominally integer values to be used in floating-point contexts
1544      without requiring special suffixes.  The following code compiles
1545      successfully in GLSL 1.50.
1546
1547        float square(float x) {
1548          return x * x;
1549        }
1550        float f = 0;
1551        float g = f * 2;
1552        float h = square(3);
1553
1554      The same code would fail on GLSL 1.10, because "0", "2", and "3" would
1555      need to be written as "0.0", "2.0", and "3.0", respectively.
1556
1557      This extension adds implicit conversions from "int" to "uint" to allow
1558      for cases like:
1559
1560        uint square(uint x) {
1561          return x * x;
1562        }
1563        uint v = square(2);
1564
1565      This code is legal with this extension, but not in GLSL 1.50 ("2" would
1566      need to be replaced with "2U" or "uint(2)").
1567
1568      ARB_gpu_shader_fp64 adds a new type "double", and we extend existing
1569      implicit conversions to allow for promotion of "int", "uint", and
1570      "float" to "double".
1571
1572      Unlike C/C++, the general rule for implicit conversions in GLSL is that
1573      conversions are unidirectional.  If type A can be implicitly converted
1574      to type B, type B can not be converted to type A.
1575
1576    (5) Increasing the number of available implicit conversions means that
1577        there is the possibility of ambiguities in various operators.  How do
1578        we deal with these cases?
1579
1580      RESOLVED:  For binary operators, the new implicit conversions mean that
1581      there may be multiple ways to resolve an expression.  For example, in
1582      the following declaration
1583
1584        int i;
1585        uint u;
1586
1587      the expression "i+u" could be resolved either by implicitly converting
1588      "i" to "uint", or by implicitly converting both values to either "float"
1589      or "double".  To resolve, we define a set of preferences for a common
1590      data type based on the types of the operands:
1591
1592        - use a floating-point type if either operand is floating-point
1593        - use an unsigned integer type if either operand is unsigned
1594        - use a signed integer type otherwise
1595
1596      If conversions to multiple precisions are supported, the
1597      lowest-precision available data type is preferred (e.g., int*float will
1598      be converted to float*float and not double*double).
1599
1600      These rules should extend naturally if new basic data types are added.
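
          The preferences above can be illustrated with the following
          sketch, written as a vertex shader only so that it is complete;
          the variable names are arbitrary.

```glsl
#version 150
#extension GL_ARB_gpu_shader5 : enable

void main()
{
    int   i = 3;
    uint  u = 2u;
    float f = 0.5;

    uint  a = i + u;    // one operand is unsigned: "i" is converted to uint
    float b = i * f;    // one operand is floating-point: "i" becomes float
    float c = u + f;    // likewise, "u" is converted to float

    // Conversions are unidirectional, so the uint result of "i + u"
    // cannot be assigned to an int without an explicit constructor:
    //   int d = i + u;         // error
    int d = int(i + u);         // explicit conversion is fine

    gl_Position = vec4(float(a), b, c, float(d));
}
```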
1601
1602    (6) Increasing the number of available implicit conversions means that
1603        there is an increased possibility of ambiguity when function
1604        overloading is involved.  Additionally, this and related extensions
1605        add new function overloads.  How do we deal with these cases?
1606
1607      RESOLVED:  The general rule for function overloading in GLSL 1.50 is
1608      that we first check for a function prototype that exactly matches the
1609      parameters passed to a function call.  If no match exists, we check for
1610      prototypes that can be matched by implicit conversions.  If more than
1611      one matching prototype can be matched by conversion, the function call
1612      is considered ambiguous and results in a compilation error.
1613
1614      Unfortunately, when adding new implicit conversions, it is possible for
1615      cases that were formerly unambiguous to become ambiguous.  For backward
1616      compatibility purposes, it would be desirable to ensure that shaders
1617      that succeeded in old language versions should still compile if
1618      "upgraded" to more recent versions/extensions.  However, the new
1619      conversions and overloads might make this more difficult without
1620      modifying other language rules.  For example, the following prototypes
1621      are available for the standard built-in function min() on scalar values
1622      when this extension and ARB_gpu_shader_fp64 are supported:
1623
1624        int     min(int a, int b);
1625        uint    min(uint a, uint b);
1626        float   min(float a, float b);
1627        double  min(double a, double b);
1628
1629      In GLSL 1.50, a function call such as:
1630
1631        float f;
1632        min(f, 1);
1633
1634      would be considered unambiguous because the double-precision version of
1635      min() didn't exist and the call matched only the single-precision
1636      version.  However, with double-precision, implicit conversions can be
1637      used to resolve to either the single- or double-precision versions.
1638
1639      To resolve this issue, we provide a set of rules that can be used to
1640      resolve multiple candidates to a "best match".  The rules for
1641      determining a best match are similar to those for C++ function
1642      overloading, but not exactly the same.  Like C++, these rules compare
1643      the conversions required on an argument-by-argument basis.  A function
1644      prototype A is better than function prototype B if:
1645
1646        - A is better than B for one or more arguments
1647        - B is better than A for no arguments
1648
1649      If a single function prototype is better than all others, that one is
1650      used.  Otherwise, we get the same ambiguity error as on previous GLSL
1651      versions.
1652
1653      As far as argument-by-argument comparisons go, the order of preference
1654      is:
1655
1656        - favor exact matches
1657        - prefer "promotions" (float->double) to other conversions
1658        - prefer conversions from int/uint to float over similar conversion to
1659          double
1660
1661      If none of the rules apply, one match is considered neither better nor
1662      worse than the other.
1663
1664      With these rules, the "min(f,1)" example above resolves to the "float"
1665      version, as is the case in GLSL 1.50.  However, there are other cases
1666      where ambiguity remains.  For example, consider the prototypes:
1667
1668        int f(uint x);
1669        int f(float x);
1670
1671      With GLSL 1.50 rules, "f(3)" would match the floating-point version, as
1672      no implicit conversions existed from "int" to "uint".  With the new
1673      implicit conversions, both prototypes match and neither is preferred.
1674      Because of the ambiguity, "f(3)" would fail to compile with this
1675      extension enabled, but should still compile on implementations
1676      supporting this extension if the extension is not enabled in GLSL source
1677      code.
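
          A shader affected by this ambiguity can avoid it by making the
          argument type explicit.  The sketch below reuses the f()
          prototypes from the example above inside an otherwise arbitrary
          vertex shader.

```glsl
#version 150
#extension GL_ARB_gpu_shader5 : enable

int f(uint x)  { return 1; }
int f(float x) { return 2; }

void main()
{
    float g = 2.5;
    float m = min(g, 1);     // resolves to the "float" version of min()

    // int r = f(3);         // ambiguous with this extension enabled
    int ru = f(3u);          // exact match for f(uint)
    int rf = f(3.0);         // exact match for f(float)
    int rc = f(float(3));    // explicit conversion, exact match for f(float)

    gl_Position = vec4(m, float(ru), float(rf), float(rc));
}
```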
1678
1679    (7) The function overloading rules described in this extension describe
1680        conversions between data types with different sizes.  However, all
1681        existing data types allowing implicit conversion (int, uint, float)
1682        are the same size.  Why do we specify these rules?
1683
1684      RESOLVED:  This extension is specified at the same time as the related
1685      ARB_gpu_shader_fp64 and NV_gpu_shader5 extensions, which do provide such
1686      types.  The rules are specified all in one place here so we don't have
1687      to replicate and extend the rules in the other extensions.  It also
1688      provides the ability to automatically convert from signed to unsigned
1689      integer types, as in the C programming language.
1690
1691    (8) Should we support textureGather() for rectangle textures
1692        (sampler2DRect)?  They aren't in ARB_texture_gather.
1693
1694      RESOLVED:  Yes.
1695
1696    (9) How does the input sample mask interact with the fixed-function
1697        SampleCoverage and SampleMask state?  Will samples be removed from the
1698        input mask if they would be eliminated by these masks in the
1699        per-fragment operations?
1700
1701      UNRESOLVED.
1702
1703    (10) Should we support reading patches as geometry shader inputs, and if
1704         so, where?
1705
1706      RESOLVED:  Not in this extension.  This capability will be provided in
1707      NV_gpu_shader5.
1708
1709    (11) Should we support per-sample interpolation of attributes?  If so,
1710         how?
1711
1712      RESOLVED.  Yes.  When multisample rasterization is enabled, qualifying
1713      one or more fragment shader inputs with "sample" will force per-sample
1714      interpolation of those attributes.  If the same shader includes other
1715      fragment inputs not qualified with sample, those attributes may be
1716      interpolated per-pixel (i.e., all samples get the same values, likely
1717      evaluated at the pixel center).
1718
1719    (12) Should we reserve "sample" as a keyword for per-sample interpolation
1720         qualifiers, or use something more obscure, such as "per_sample"?
1721
1722      RESOLVED:  This extension uses "sample".
1723
1724    (13) What should be the base data type for the bitCount(), findLSB(), and
1725         findMSB() functions -- signed or unsigned integers?
1726
1727      RESOLVED:  These functions will return signed values, with -1 returned
1728      by findLSB/findMSB if no bit is found.  Note that the shading language
1729      supports implicit conversions of signed integers to unsigned, which
1730      makes it easy enough if an unsigned result is desired.
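
          For illustration, the sketch below exercises these functions on
          the value 0x18 (binary 11000).  It is written as a vertex shader
          only so that it is complete; the variable names are arbitrary.

```glsl
#version 150
#extension GL_ARB_gpu_shader5 : enable

void main()
{
    int ones = bitCount(0x18);    // 2 bits are set
    int lsb  = findLSB(0x18);     // 3: lowest set bit
    int msb  = findMSB(0x18u);    // 4: highest set bit
    int none = findLSB(0);        // -1: no bit found

    // An unsigned result is available through implicit conversion.
    // The "not found" value -1 would then read as 0xFFFFFFFFu.
    uint msbUnsigned = findMSB(0x18u);

    gl_Position = vec4(float(ones), float(lsb),
                       float(msb + none), float(msbUnsigned));
}
```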
1731
1732    (14) Why do EmitVertex() and EndPrimitive() begin with capitalized words
1733         while most of the other built-ins start with a lower-case (e.g.,
1734         emitVertex)?  Which precedent should the new per-vertex stream emit
1735         and end primitive functions follow?
1736
1737      RESOLVED:  The inconsistency began with the original functions in
1738      EXT_geometry_shader4; the spec author can't recall the original reasons
1739      (if any).  Regardless, we decided to match the existing functions as
1740      closely as possible and use EmitStreamVertex() and EndStreamPrimitive().
1741
1742    (15) How do the textureGather functions work with sRGB textures?
1743
1744      RESOLVED:  Gamma-correction is applied to the texture source color
1745      before "gathering" and hence applies to all four components, unless the
1746      texture swizzle of the selected component is ALPHA in which case no
1747      gamma-correction is applied.
1748
1749    (16) How should we support arrays of uniform blocks (i.e., multiple blocks
1750         in a group, each backed by a separate buffer object)?
1751
1752      RESOLVED:  We will use instance names in the block definitions, which
1753      can be declared as regular arrays:
1754
1755        uniform UniformData {
1756          vec4 stuff;
1757        } blocks[4];
1758
1759      The four blocks will be referred to as "blocks[0]" through
1760      "blocks[3]" in shader code, and "UniformData[0]" through "UniformData[3]"
1761      in the OpenGL API code.  The block member in this example will be
1762      referred to as "UniformData.stuff" in the API.  A similar approach was
1763      already adopted in GLSL 1.50, where geometry shaders supported arrays of
1764      input blocks that were treated similarly.  Since this spec depends on
1765      GLSL 1.50, little new spec language is required here.
1766
1767    (17) What are instanced geometry shaders useful for?
1768
1769      RESOLVED:  Instanced geometry shaders allow geometry programs that
1770      perform regular operations to run more efficiently.
1771
1772      Consider a simple example of an algorithm that uses geometry shaders to
1773      render primitives to a cube map in a single pass.  Without instanced
1774      geometry shaders, the geometry shader to render triangles to the cube
1775      map would do something like:
1776
1777        for (face = 0; face < 6; face++) {
1778          for (vertex = 0; vertex < 3; vertex++) {
1779            project vertex <vertex> onto face <face>, output position
1780            compute/copy attributes of emitted <vertex> to outputs
1781            output <face> to result.layer
1782            emit the projected vertex
1783          }
1784          end the primitive (next triangle)
1785        }
1786
1787      This algorithm would output 18 vertices per input triangle, three for
1788      each cube face.  The six triangles emitted would be rasterized, one per
1789      face.  Geometry shaders that emit a large number of attributes have
1790      often posed performance challenges, since all the attributes must be
1791      stored somewhere until the emitted primitives are processed.  Large storage
1792      requirements may limit the number of threads that can be run in parallel
1793      and reduce overall performance.
1794
1795      Instanced geometry shaders allow this example to be restructured to run
1796      with six separate invocations, one per face.  Each invocation projects
1797      the triangle to only a single face (identified by the invocation number)
1798      and emits only 3 vertices.  The reduced storage requirements allow more
1799      geometry shader invocations to be run in parallel, with greater overall
1800      efficiency.
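
          A minimal sketch of the restructured shader is shown below.  The
          uniform "faceViewProj" and the input and output names are
          hypothetical; the application is assumed to supply one
          view-projection matrix per cube face.

```glsl
#version 150
#extension GL_ARB_gpu_shader5 : enable

layout(triangles, invocations = 6) in;          // one invocation per face
layout(triangle_strip, max_vertices = 3) out;   // 3 vertices, not 18

uniform mat4 faceViewProj[6];   // hypothetical per-face matrices

in vec3 worldPos[];             // hypothetical vertex shader outputs
in vec4 vertColor[];
out vec4 fragColor;

void main()
{
    int face = gl_InvocationID;  // 0..5 selects the cube face

    for (int v = 0; v < 3; v++) {
        gl_Position = faceViewProj[face] * vec4(worldPos[v], 1.0);
        fragColor   = vertColor[v];
        gl_Layer    = face;      // route the triangle to this face
        EmitVertex();
    }
    EndPrimitive();
}
```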
1801
1802      Additionally, the total number of attributes that can be emitted by a
1803      single geometry shader invocation is limited.  However, for instanced
1804      geometry shaders, that limit applies to each of <N> invocations which
1805      allows for a larger total output.  For example, if the GL implementation
1806      supports only 1024 components of output per invocation, the 18-vertex
1807      algorithm above could emit no more than 56 components per vertex.  The
1808      same algorithm implemented as a 3-vertex 6-invocation geometry program
1809      could theoretically allow for 341 components per vertex.
1810
1811    (18) Should EmitStreamVertex() and EndStreamPrimitive() accept a
1812         non-constant stream number?
1813
1814      RESOLVED:  Not in this extension.  Requiring a constant stream number
1815      for each call simplifies code generation for the compiler.
1816
1817    (19) Are there any restrictions on geometry shaders with multiple output
1818         streams?
1819
1820      RESOLVED:  Yes, such geometry shaders are required to generate points;
1821      line strip and triangle strip outputs are not supported.
1822
1823    (20) Since multi-stream geometry shaders only support points, why does
1824         EndStreamPrimitive() exist?  Neither it nor EndPrimitive() does anything
1825         useful when emitting points.
1826
1827      RESOLVED:  This function was added for completeness, and would be useful
1828      if the requirement for emitting points were lifted by a future
1829      extension.
1830
1831    (21) Should we provide mechanisms allowing shaders to examine or set the
1832         bit representation of floating-point numbers?
1833
1834      RESOLVED:  Yes, we will provide functions to convert single-precision
1835      floats to/from signed and unsigned 32-bit integers.  The
1836      ARB_gpu_shader_fp64 extension will provide similar functionality for
1837      double-precision floats.  We chose to adopt the Java naming convention
1838      here -- converting a single-precision float to/from a signed integer is
1839      accomplished by the functions floatBitsToInt() and intBitsToFloat().
1840
1841      Note that this functionality has also been forked off into a separate
1842      extension (ARB_shader_bit_encoding) that can be exported on
1843      implementations capable of performing such conversions but not capable
1844      of the full feature set of this extension and/or OpenGL 4.0.
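
          As a sketch of how these conversions might be used, the helper
          functions below inspect and clear the sign bit of a
          single-precision value.  The function names are hypothetical and
          the vertex shader wrapper exists only to make the example
          complete.

```glsl
#version 150
#extension GL_ARB_gpu_shader5 : enable

// True if the sign bit of <x> is set (including for -0.0).
bool signBitSet(float x)
{
    return floatBitsToInt(x) < 0;
}

// Absolute value computed by clearing the sign bit.
float absViaBits(float x)
{
    return uintBitsToFloat(floatBitsToUint(x) & 0x7FFFFFFFu);
}

void main()
{
    float v = -1.5;
    float s = signBitSet(v) ? 1.0 : 0.0;
    gl_Position = vec4(absViaBits(v), s, 0.0, 1.0);
}
```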
1845
1846    (22) What is the "precise" qualifier good for?
1847
1848      RESOLVED:  Like "invariant", "precise" provides some invariance
1849      guarantees that are useful for certain algorithms.
1850
1851      With an output position qualified as "invariant", we ensure that if the
1852      same geometry is processed by multiple shaders using the exact same
1853      code, it will be transformed in exactly the same way to ensure that we
1854      have no cracking or flickering in multi-pass algorithms using different
1855      shaders.
1856
1857      With "precise", we ensure that an algorithm can be written to produce
1858      identical results on subtly different inputs.  For example, the order of
1859      vertices visible to a geometry or tessellation shader used to subdivide
1860      primitive edges might present an edge shared between two primitives in
1861      one direction for one primitive and the other direction for the adjacent
1862      primitive.  Even if the weights are identical in the two cases, there
1863      may be cracking if the computations are being done in an order-dependent
1864      manner.  If the position of a new vertex were provided by evaluating the
1865      function f() below with limited-precision floating-point math, it's not
1866      necessarily the case that f(a,b,c) == f(c,b,a) in the following code:
1867
1868          float f(float x, float y, float z)
1869          {
1870            return (x + y) + z;
1871          }
1872
1873      This function f() can be rewritten as follows with "precise" and a
1874      symmetric evaluation order to ensure that f(a,b,c) == f(c,b,a).
1875
1876          float f(float x, float y, float z)
1877          {
1878            // Note that we intentionally compute "(x+z)" instead of "(x+y)"
1879            // here, because that value will be the same when <x> and <z>
1880            // are reversed.
1881            precise float result = (x + z) + y;
1882            return result;
1883          }
1884
1885          (a + c) + b == (c + a) + b
1886
1887      The "precise" qualifier will disable certain optimizations and thus
1888      carries a performance cost.  The cost may be higher than "invariant",
1889      because "invariant" permits optimizations disallowed by "precise" as
1890      long as the compiler ensures that it always optimizes in the exact same
1891      manner.
1892
1893    (23) What computations will be affected by the "precise" qualifier, and
1894         what computations aren't?
1895
1896      RESOLVED:  We will ensure precise computation of any expressions within
1897      a single function used directly or indirectly to produce the value of a
1898      variable qualified as "precise".
1899
1900      We chose not to provide this guarantee across function boundaries, even
1901      if the results of a function are used in the computation of an output
1902      qualified as "precise".  Algorithms requiring the use of "precise" may
1903      have a mix of computations, some required to be precise, some not.  This
1904      function boundary rule may serve to limit the amount of computation
1905      indirectly forced to be precise.
1906
1907      Additionally, the function boundary rule permits non-precise sub-operations in
1908      a computation required to be precise.  For example, a shader might need
1909      to compute a "precise" position by taking a weighted average as in the
1910      following code:
1911
1912        precise vec3 pos = (p[0]*w[0] + p[1]*w[1]) + (p[2]*w[2] + p[3]*w[3]);
1913
1914      However, if the main precision requirement is that the same result be
1915      generated when <p> and <w> are reversed, the following code also gets
1916      the job done, even if posmad() is implemented with multiply-add
1917      operations.
1918
1919        vec3 posmad(vec3 p0, float w0, vec3 p1w1) { return p0*w0+p1w1; }
1920        precise vec3 pos = (posmad(p[0], w[0], p[1]*w[1]) +
1921                            posmad(p[3], w[3], p[2]*w[2]));
1922
1923      To generate precise results within a function, the function arguments
1924      and/or temporaries within the function body should be qualified as
1925      "precise" as needed.
1926
1927      Note that the indirect application of "precise" rules is determined on
1928      an assignment-by-assignment basis, not on a variable-by-variable basis.
1929      In the following perverse example:
1930
1931        float a,b,c,d,e,f;
1932        precise float g;
1933        f = a + b + c;
1934        ...
1935        f = c + d + e;
1936        g = f * 2.0;
1937
1938      The first assignment to <f> need not be treated as "precise", since the
1939      value assigned will have no effect on the final value of the
1940      precise-qualified <g>.  The second assignment to <f> must be evaluated
1941      precisely.  The fact that one assignment to a variable needs to be
1942      treated as precise does not mean that the variable itself is implicitly
1943      treated as "precise".
1944
1945    (24) Are "precise" qualifiers allowed on function arguments?  If so, what
1946         do they mean?  Can a return value for a function be declared as
1947         precise?
1948
1949      RESOLVED:  Yes; the rules permit the use of "precise" on any variable
1950      declaration, including function arguments.  The code
1951
1952        float f(precise in vec4 arg1, precise out vec4 arg2) { ... }
1953
1954      specifies that any expressions used to assign values to <arg1> or <arg2>
1955      within f() will be evaluated in a precise manner.
1956
1957      Expressions used to derive the value passed to the function f() as
1958      <arg1> will be treated as precise according to the normal rules.  The
1959      expression for <arg1> is treated as precise if and only if the function
1960      call is on the right-hand side of an assignment to a variable qualified
1961      as "precise" or is indirectly used in an assignment to such a variable.
1962      It is not automatically treated as precise just because the formal
1963      parameter <arg1> is qualified with "precise".
1964
1965      For the purposes of this rule, variables passed as "out" parameters do
1966      not count as assignments.  Values assigned to an output parameter will
1967      not be evaluated precisely just because the caller provides a variable
1968      qualified as "precise".  When the output parameter itself is qualified
1969      as "precise", precise evaluation of that output is required within the
1970      callee.
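
      The output parameter rule can be sketched as follows, written as a
      vertex shader.  All names here (madPlain, madPrecise, <a>, <b>, <c>,
      <r0>, <r1>) are hypothetical and serve only to contrast the two
      cases.

```glsl
#version 150
#extension GL_ARB_gpu_shader5 : require

// Hypothetical inputs and outputs for illustration.
uniform float a, b, c;
out float r0, r1;

// The output parameter is not qualified, so x*y+z need not be evaluated
// precisely, even when the caller passes a "precise" variable.
void madPlain(float x, float y, float z, out float result)
{
  result = x * y + z;
}

// The output parameter itself is qualified as "precise", so the
// expression assigned to it must be evaluated precisely in the callee.
void madPrecise(float x, float y, float z, precise out float result)
{
  result = x * y + z;
}

void main()
{
  precise float v0;
  precise float v1;
  madPlain(a, b, c, v0);    // <v0> is precise, but the callee's math is not
  madPrecise(a, b, c, v1);  // the callee evaluates x*y+z precisely
  r0 = v0;
  r1 = v1;
  gl_Position = vec4(0.0, 0.0, 0.0, 1.0);
}
```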
1971
1972      We chose not to permit function return values to be qualified as
1973      "precise", though we could have hypothetically allowed code such as:
1974
1975        precise float f(float a, float b, float c) { return (a+b)+c; }
1976
1977      To obtain a precise return value in such a case, use code such as:
1978
1979        float f(float a, float b, float c)
1980        {
1981          precise float result = (a+b) + c;
1982          return result;
1983        }
1984
1985    (25) How does texture gather interact with incomplete textures?
1986
1987      RESOLVED:  For regular texture lookups, incomplete textures are
1988      considered to return a texel value with RGBA components of (0,0,0,1).
1989      For texture gather operations, each texel in the sampled footprint is
1990      considered to have RGBA components of (0,0,0,1).  When using the
1991      textureGather() function to select the R, G, or B component of an
1992      incomplete texture, (0,0,0,0) will be returned.  When selecting the A
1993      component, (1,1,1,1) will be returned.
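
      A fragment shader sketch of this behavior follows.  The sampler
      <tex>, the coordinate <uv>, and the output <color> are hypothetical
      names, and the texture bound to <tex> is assumed to be incomplete.
      The values in the comments restate the resolution above rather than
      measured results.

```glsl
#version 150
#extension GL_ARB_gpu_shader5 : require

uniform sampler2D tex;  // hypothetical; assume its texture is incomplete
in vec2 uv;             // hypothetical texture coordinate
out vec4 color;

void main()
{
  // Each texel in the footprint is treated as (0,0,0,1), so gathering
  // one component returns that component from each of the four texels.
  vec4 r = textureGather(tex, uv, 0);  // R component: (0,0,0,0)
  vec4 a = textureGather(tex, uv, 3);  // A component: (1,1,1,1)
  color = r + a;
}
```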
1994
1995
1996Revision History
1997
1998    Rev.    Date    Author    Changes
1999    ----  --------  --------  -----------------------------------------
2000    16    03/30/12  pbrown    Fix typo in language restricting the use of
2001                              EmitStreamVertex()/EndStreamPrimitive() to
2002                              programs with an output primitive type of
2003                              points, not an input type of points (bug 8371).
2004
2005    15    10/17/11  pbrown    Fix prototypes for textureGather and
2006                              textureGatherOffset to use vec2 coordinates for
2007                              "2DRect" sampler versions (bug 7964).
2008
2009    14    01/27/11  pbrown    Add further clarification on the interaction
2010                              of texture gather and incomplete textures (bug
2011                              7289).
2012
2013    13    09/24/10  pbrown    Clarify the interaction of texture gather
2014                              with swizzle (bug 5910), fixing conflicts
2015                              between API and GLSL spec language.
2016                              Consolidate into one copy in the API
2017                              spec.
2018
2019    12    03/23/10  pbrown    Update issues section, both fixing/numbering
2020                              existing issues and including other issues
2021                              that were left behind in NV_gpu_shader5 when the
2022                              specs were refactored.
2023
2024    11    03/23/10  Jon Leech Describe <offset> to interpolateAtOffset
2025                              without implying it is a constant expression
2026                              (bug 6026).
2027
2028    10    03/07/10  pbrown    Fix typo in an output stream qualifier example.
2029
2030     9    03/05/10  pbrown    Modify function overloading rules to remove
2031                              most preferences when converting between
2032                              two different types.  The only preferences
2033                              that remain are promoting "float" to "double"
2034                              over other conversions, and preferring
2035                              conversion of integers to "float" to converting
2036                              to "double" (bug 5938).
2037
2038     8    01/29/10  pbrown    Update the spec to require that the minimum
2039                              value for MAX_PROGRAM_TEXTURE_GATHER_COMPONENTS
2040                              is 4 (bug 5919).
2041
2042     7    01/21/10  pbrown    Clarify the rules for determining a best match
2043                              if implicit conversions can result in multiple
2044                              matching function prototypes.  Modify the rules
2045                              to pick a best match by comparing pairs of
2046                              functions, and using any function deemed better
2047                              than any other choice.  Modify the argument
2048                              conversion preference rules for overloading to
2049                              disfavor "int" to "uint" conversions, for
2050                              backward compatibility with previous GLSL
2051                              versions.  Add some new discussion of the
2052                              choices involved to the issues section (bug
2053                              5938).
2054
2055     6    01/14/10  pbrown    Minor wording updates from spec reviews.
2056
2057     5    12/10/09  pbrown    Functionality updates from spec review:
2058                              Rename fmad to fma.  Fix error in spec
2059                              language for negative diffs in usubBorrow.
2060
2061     4    12/10/09  pbrown    Convert from EXT to ARB.
2062
2063     3    12/08/09  pbrown    Miscellaneous fixes from spec review:  Added
2064                              missing implementation constants for
2065                              interpolation offset range and granularity;
2066                              added explicit section to OpenGL spec describing
2067                              shader requested interpolation modifiers and
2068                              functions.  Clean up more dangling "ThreadID"
2069                              references.  General typo fixes and language
2070                              clarifications.
2071
2072     2    10/01/09  pbrown    Renamed gl_ThreadID to gl_InvocationID.
2073
2074     1              pbrown    Internal revisions.
2075