Name

    EXT_gpu_shader5

Name Strings

    GL_EXT_gpu_shader5

Contact

    Jon Leech (oddhack 'at' sonic.net)
    Daniel Koch, NVIDIA (dkoch 'at' nvidia.com)

Contributors

    Daniel Koch, NVIDIA (dkoch 'at' nvidia.com)
    Pat Brown, NVIDIA (pbrown 'at' nvidia.com)
    Jesse Hall, Google
    Maurice Ribble, Qualcomm
    Bill Licea-Kane, Qualcomm
    Graham Connor, Imagination
    Ben Bowman, Imagination
    Jonathan Putsman, Imagination
    Marcin Kantoch, Mobica
    Slawomir Grajewski, Intel
    Contributors to ARB_gpu_shader5

Notice

    Copyright (c) 2010-2013 The Khronos Group Inc. Copyright terms at
        http://www.khronos.org/registry/speccopyright.html

    Portions Copyright (c) 2013-2014 NVIDIA Corporation.

Status

    Complete.

Version

    Last Modified Date: March 27, 2015
    Revision: 12

Number

    OpenGL ES Extension #178

Dependencies

    OpenGL ES 3.1 and OpenGL ES Shading Language 3.10 are required.

    This specification is written against the OpenGL ES 3.1 (March 17,
    2014) and OpenGL ES 3.10 Shading Language (March 17, 2014)
    Specifications.

    This extension interacts with EXT_geometry_shader.

Overview

    This extension provides a set of new features to the OpenGL ES Shading
    Language and related APIs to support capabilities of new GPUs, extending
    the capabilities of version 3.10 of the OpenGL ES Shading Language.
    Shaders using the new functionality provided by this extension should
    enable this functionality via the construct

      #extension GL_EXT_gpu_shader5 : require     (or enable)

    This extension provides a variety of new features for all shader types,
    including:

      * support for indexing into arrays of opaque types (samplers and
        atomic counters) using dynamically uniform integer expressions;

      * support for indexing into arrays of images and shader storage blocks
        using only constant integral expressions;

      * extending the uniform block capability to allow shaders to index
        into an array of uniform blocks;

      * a "precise" qualifier allowing computations to be carried out exactly
        as specified in the shader source to avoid optimization-induced
        invariance issues (which might cause cracking in tessellation);

      * new built-in functions supporting:

        * fused floating-point multiply-add operations;

      * extending the textureGather() built-in functions provided by
        OpenGL ES Shading Language 3.10:

        * allowing shaders to use arbitrary offsets computed at run-time to
          select a 2x2 footprint to gather from; and
        * allowing shaders to use separate independent offsets for each of
          the four texels returned, instead of requiring a fixed 2x2
          footprint.

New Procedures and Functions

    None

New Tokens

    None

Additions to the OpenGL ES 3.1 Specification

    Add to the end of section 8.13.2, "Coordinate Wrapping and Texel
    Selection":

    ... texture source color of (0,0,0,1) for all four source texels.

    The textureGatherOffsets built-in shader functions return a vector
    derived from sampling four texels in the image array of level
    <level_base>. For each of the four texel offsets specified by the
    <offsets> argument, the rules for the LINEAR minification filter are
    applied to identify a 2x2 texel footprint, from which the single texel
    T_i0_j0 is selected. A four-component vector is then assembled by taking
    a single component from each of the four T_i0_j0 texels in the same
    manner as for the textureGather function.


Additions to the OpenGL ES Shading Language 3.10 Specification

    Including the following line in a shader can be used to control the
    language features described in this extension:

      #extension GL_EXT_gpu_shader5 : <behavior>

    where <behavior> is as specified in section 3.4.

    A new preprocessor #define is added to the OpenGL ES Shading Language:

      #define GL_EXT_gpu_shader5        1
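
    As a non-normative illustration of the directive and the #define, a
    shader could choose between an extension path and a fallback at
    compile time. This is only a sketch: the names v_color, u_gain,
    u_bias and fragColor are assumptions made for the example and are not
    defined by this specification.

```glsl
#version 310 es
// "enable" asks for the extension but lets compilation continue (with a
// warning) if it is unsupported; "require" would make compilation fail.
#extension GL_EXT_gpu_shader5 : enable

precision mediump float;

in vec3 v_color;        // hypothetical input
uniform vec3 u_gain;    // hypothetical uniforms
uniform vec3 u_bias;
out vec4 fragColor;

void main()
{
#ifdef GL_EXT_gpu_shader5
    // Extension path: fma() is one of the built-ins this extension adds.
    vec3 c = fma(v_color, u_gain, u_bias);
#else
    // Fallback path: ordinary multiply followed by add.
    vec3 c = v_color * u_gain + u_bias;
#endif
    fragColor = vec4(c, 1.0);
}
```

    A shader that cannot work without the extension would more likely use
    "require" and drop the fallback branch.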


    Modifications to Section 3.7 (Keywords)

    Remove "precise" from the list of reserved keywords and add it to the
    list of keywords.

    Remove the last paragraph from section 3.9.3 "Dynamically Uniform
    Expressions" (starting "The definition is not used in this version...")


    Add to the introduction to section 4.1.7, "Opaque Types" on p. 26:

    When aggregated into arrays within a shader, opaque types can only be
    indexed with a dynamically uniform integral expression (see section
    3.9.3) unless otherwise noted; otherwise, results are undefined.
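
    As a non-normative sketch of this rule, the fragment shader below
    indexes an array of samplers with a uniform, which has the same value
    for every invocation and is therefore dynamically uniform. All
    variable names are assumptions made for the example.

```glsl
#version 310 es
#extension GL_EXT_gpu_shader5 : require

precision mediump float;

uniform sampler2D u_layers[4];   // hypothetical array of samplers
uniform int u_layer;             // application keeps this in the range 0..3

in vec2 v_uv;
flat in int v_materialId;        // may differ from one invocation to the next
out vec4 fragColor;

void main()
{
    // Well defined: a uniform is a dynamically uniform integral expression.
    fragColor = texture(u_layers[u_layer], v_uv);

    // Undefined results if enabled: v_materialId is not guaranteed to be
    // dynamically uniform.
    // fragColor = texture(u_layers[v_materialId], v_uv);
}
```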


    Replace the first paragraph of section 4.1.7.1, "Samplers" (removing the
    second sentence) on p. 27:

    Sampler types (e.g., sampler2D) are opaque types, declared and behaving
    as described above for opaque types.

    Sampler variables are ...



    Modify Section 4.3.9 "Interface Blocks", as modified by
    EXT_geometry_shader and EXT_shader_io_blocks:

    (modify the paragraph starting "For uniform or shader storage blocks
    declared as an array", removing the requirement for indexing uniform
    blocks using constant expressions)

    For uniform or shader storage blocks declared as an array, each
    individual array element corresponds to a separate buffer object bind
    range, backing one instance of the block. As the array size indicates
    the number of buffer objects needed, uniform and shader storage block
    array declarations must specify an array size. All indices used to index
    a shader storage block array must be constant integral expressions. A
    uniform block array can only be indexed with a dynamically uniform
    integral expression; otherwise, results are undefined.
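
    The two indexing rules can be contrasted in a short, non-normative
    compute shader sketch. The block names, member names, bindings and
    the uniform u_materialIndex are assumptions made for the example, and
    the application is assumed to bind buffers large enough for every
    invocation that writes to them.

```glsl
#version 310 es
#extension GL_EXT_gpu_shader5 : require

layout(local_size_x = 64) in;

// Each array element is backed by its own buffer object bind range, so
// both declarations must give an array size.
layout(std140, binding = 0) uniform Material {
    vec4 tint;
} u_materials[4];

layout(std430, binding = 0) buffer Result {
    vec4 color[];
} b_results[2];

uniform int u_materialIndex;     // same value for the whole dispatch (0..3)

void main()
{
    uint i = gl_GlobalInvocationID.x;

    // Uniform block array: a dynamically uniform index is allowed.
    vec4 tint = u_materials[u_materialIndex].tint;

    // Shader storage block array: the block index must be a constant
    // integral expression. Indexing the member array inside the block
    // with i is a separate matter and is not restricted by this rule.
    b_results[0].color[i] = tint;
    b_results[1].color[i] = vec4(1.0) - tint;
}
```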


    Add new section 4.9gs5 before section 4.10 "Order of Qualification":

    4.9gs5 The Precise Qualifier

    Some algorithms may require that floating-point computations be carried
    out in exactly the manner specified in the source code, even if the
    implementation supports optimizations that could produce nearly
    equivalent results with higher performance. For example, many GL
    implementations support a "multiply-add" that can compute values such as

      float result = (float(a) * float(b)) + float(c);

    in a single operation. The result of a floating-point multiply-add may
    not always be identical to first doing a multiply yielding a
    floating-point result, and then doing a floating-point add. By default,
    implementations are permitted to perform optimizations that effectively
    modify the order of the operations used to evaluate an expression, even
    if those optimizations may produce slightly different results relative
    to unoptimized code.

    The qualifier "precise" will ensure that operations contributing to a
    variable's value are performed in the order and with the precision
    specified in the source code. Order of evaluation is determined by
    operator precedence and parentheses, as described in Section 5.
    Expressions must be evaluated with a precision consistent with the
    operation; for example, multiplying two "float" values must produce a
    single value with "float" precision. This effectively prohibits the
    arbitrary use of fused multiply-add operations if the intermediate
    multiply result is kept at a higher precision. For example:

      precise out vec4 position;

    declares that computations used to produce the value of "position" must
    be performed precisely using the order and precision specified. As with
    the invariant qualifier (section 4.6.1), the precise qualifier may be
    used to qualify a built-in or previously declared user-defined variable
    as being precise:

      out vec3 Color;
      precise Color;            // make existing Color be precise

    This qualifier will affect the evaluation of expressions used on the
    right-hand side of an assignment if and only if:

      * the variable assigned to is qualified as "precise"; or

      * the value assigned is used later in the same function, either
        directly or indirectly, on the right-hand side of an assignment to
        a variable declared as "precise".

    Expressions computed in a function are treated as precise only if
    assigned to a variable qualified as "precise" in that same function. Any
    other expressions within a function are not automatically treated as
    precise, even if they are used to determine a value that is returned by
    the function and directly assigned to a variable qualified as "precise".

    Some examples of the use of "precise" include:

      in vec4 a, b, c, d;
      precise out vec4 v;

      float func(float e, float f, float g, float h)
      {
        return (e*f) + (g*h);            // no special precision
      }

      float func2(float e, float f, float g, float h)
      {
        // ensures a precise return value
        precise float result = (e*f) + (g*h);
        return result;
      }

      void func3(float i, float j, precise out float k)
      {
        k = i * i + j;                   // precise, due to <k> declaration
      }

      void main(void)
      {
        vec3 r = vec3(a * b);               // precise, used to compute v.xyz
        vec3 s = vec3(c * d);               // precise, used to compute v.xyz
        v.xyz = r + s;                      // precise
        v.w = (a.w * b.w) + (c.w * d.w);    // precise
        v.x = func(a.x, b.x, c.x, d.x);     // values computed in func()
                                            // are NOT precise
        v.x = func2(a.x, b.x, c.x, d.x);    // precise!
        func3(a.x * b.x, c.x * d.x, v.x);   // precise!
      }


    Modify Section 8.3, Common Functions, p. 104

    (add support for floating-point multiply-add)

    Syntax:

      genType fma(genType a, genType b, genType c);

    Computes and returns a * b + c.

    In uses where the return value is eventually consumed by a variable
    declared as precise:

    * fma() is considered a single operation, whereas the expression
      "a*b + c" consumed by a variable declared precise is considered two
      operations.
    * The precision of fma() can differ from the precision of the expression
      "a*b + c".
    * fma() will be computed with the same precision as any other fma()
      consumed by a precise variable, giving invariant results for the same
      input values of a, b, and c.

    Otherwise, in the absence of precise consumption, there are no special
    constraints on the number of operations or difference in precision
    between fma() and the expression "a*b + c".
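
    A non-normative vertex shader sketch of precise consumption follows.
    It assumes the built-in gl_Position may be redeclared as precise, as
    the precise qualifier section permits for built-in variables; the
    attribute and uniform names are assumptions made for the example.

```glsl
#version 310 es
#extension GL_EXT_gpu_shader5 : require

in vec4 a_position;      // hypothetical attribute
uniform vec4 u_scale;    // hypothetical uniforms
uniform vec4 u_offset;

precise gl_Position;     // make the existing built-in output precise

void main()
{
    // Consumed by a precise variable, fma() counts as a single operation
    // and gives invariant results for the same inputs.
    gl_Position = fma(a_position, u_scale, u_offset);

    // The alternative below would count as two operations (a multiply,
    // then an add) and its precision may differ from the fma() form.
    // gl_Position = a_position * u_scale + u_offset;
}
```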


    Modify the table of functions in section 8.9.3 "Texture Gather
    Functions", changing the "Description" column for the existing
    textureGatherOffset functions on p. 127:

    Description

        Perform a texture gather operation as in textureGather offset by
        <offset> as described in textureOffset, except that the <offset> can
        be variable (non-constant) and the implementation-dependent minimum
        and maximum offset values are given by the values of
        MIN_PROGRAM_TEXTURE_GATHER_OFFSET and
        MAX_PROGRAM_TEXTURE_GATHER_OFFSET, respectively.
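
    As a non-normative sketch, a fragment shader could pass an offset
    that is only known at run time. The names u_tex, u_tapOffset, v_uv
    and fragColor are assumptions made for the example, and the
    application is assumed to keep u_tapOffset within the queried
    MIN_PROGRAM_TEXTURE_GATHER_OFFSET and
    MAX_PROGRAM_TEXTURE_GATHER_OFFSET limits.

```glsl
#version 310 es
#extension GL_EXT_gpu_shader5 : require

precision mediump float;

uniform sampler2D u_tex;       // hypothetical texture
uniform ivec2 u_tapOffset;     // run-time offset, in texels

in vec2 v_uv;
out vec4 fragColor;

void main()
{
    // Gather component 0 (red) from the 2x2 footprint located at v_uv
    // shifted by u_tapOffset. The offset is not a constant expression.
    vec4 reds = textureGatherOffset(u_tex, v_uv, u_tapOffset, 0);

    // Average the four gathered values for display.
    fragColor = vec4(dot(reds, vec4(0.25)));
}
```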


    Add new textureGatherOffsets functions to the same table, on p. 127:

    Syntax

        gvec4 textureGatherOffsets(gsampler2D sampler, vec2 P,
                                   ivec2 offsets[4] [, int comp])
        gvec4 textureGatherOffsets(gsampler2DArray sampler, vec3 P,
                                   ivec2 offsets[4] [, int comp])
        vec4 textureGatherOffsets(sampler2DShadow sampler, vec2 P,
                                  float refZ, ivec2 offsets[4])
        vec4 textureGatherOffsets(sampler2DArrayShadow sampler, vec3 P,
                                  float refZ, ivec2 offsets[4])

    Description

        Operate identically to textureGatherOffset except that <offsets> is
        used to determine the location of the four texels to sample. Each of
        the four texels is obtained by applying the corresponding offset in
        <offsets> as a (u,v) coordinate offset to <P>, identifying the
        four-texel linear footprint, and then selecting texel (i0,j0) of
        that footprint. The specified values in <offsets> must be constant
        integral expressions.
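
    A non-normative sketch of a four-tap gather with independent offsets
    is given here. The offsets live in a const array so that they are
    constant integral expressions; the names kTaps, u_tex, v_uv and
    fragColor, and the chosen offset values, are assumptions made for
    the example.

```glsl
#version 310 es
#extension GL_EXT_gpu_shader5 : require

precision mediump float;

uniform sampler2D u_tex;       // hypothetical texture
in vec2 v_uv;
out vec4 fragColor;

// Four independent texel offsets forming a cross around the sample point.
const ivec2 kTaps[4] = ivec2[4](ivec2(-2,  0), ivec2(2, 0),
                                ivec2( 0, -2), ivec2(0, 2));

void main()
{
    // One texel is selected per offset; component 0 (red) of each selected
    // texel is returned in the four-component result.
    vec4 taps = textureGatherOffsets(u_tex, v_uv, kTaps, 0);

    // Average the four taps for display.
    fragColor = vec4(dot(taps, vec4(0.25)));
}
```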

New Implementation Dependent State

    None.

Issues

    Note: These issues apply specifically to the definition of the
    EXT_gpu_shader5 specification, which is based on the OpenGL extension
    ARB_gpu_shader5 as updated in OpenGL 4.x. Resolved issues from
    ARB_gpu_shader5 have been removed, but some remain applicable to this
    extension. ARB_gpu_shader5 can be found in the OpenGL Registry.

    (1) What functionality was removed relative to ARB_gpu_shader5?

      - Instanced geometry support (moved into EXT_geometry_shader)
      - Implicit conversions (moved to EXT_shader_implicit_conversions)
      - Interactions with features not supported by the underlying
        ES 3.1 API and Shading Language, including:
        * interactions with ARB_gpu_shader_fp64 and NV_gpu_shader5,
          including support for double-precision in implicit conversions
          and function overload resolution
        * multiple vertex streams (these require ARB_transform_feedback3)
        * textureGather built-in variants for cube map array and rectangle
          texture samplers.
        * shading language function overloading rules involving the type
          double
      - Functionality already in OpenGL ES 3.00, including packing and
        unpacking of 16-bit types and converting floating-point values to or
        from their integer bit encodings.
      - Functionality already in OpenGL ES 3.10, including
        * splitting and building floating-point numbers from a significand and
          exponent, integer bitfield manipulation, and packing and unpacking
          vectors of 8-bit fixed-point data types.
        * a subset of the textureGather and textureGatherOffset builtins
          (but some textureGather builtins remain in this extension).
      - Functionality already in OES_sample_variables, including support for
        reading a mask of covered samples in a fragment shader.
      - Functionality already in OES_shader_multisample_interpolation,
        including support for interpolating a fragment shader input at a
        programmable offset relative to the pixel center, a programmable
        sample number, or at the centroid.
      - MAX_PROGRAM_TEXTURE_GATHER_COMPONENTS (Issue 8).

    (2) What functionality was changed and added relative to
        ARB_gpu_shader5?

      - Support for indexing into arrays of samplers was extended to all
        opaque types, and the description of allowed indices was rewritten
        in terms of dynamically uniform expressions, as was done when
        ARB_gpu_shader5 was promoted into OpenGL 4.0.
      - The only remaining API interaction is an increase in a
        minimum-maximum value, so no "Changes to the OpenGL ES
        Specification" sections are included above.
      - Arrays of images and shader storage blocks can only be indexed
        with constant integral expressions.

    (3) What should the rules on GLSL suffixing be?

    RESOLVED: "precise" is not a reserved keyword in ESSL 3.00, but it is
    a keyword in GLSL 4.40. ESSL 3.10 updated the reserved keyword list
    to include all keywords used or reserved in GLSL 4.40 (but not otherwise
    used in ES) and thus we can use "precise" in this spec by moving it
    from the reserved keywords section. See bug 11179.

    (4) Are changes to the "Order of Qualification" section needed?

    RESOLVED: No. ESSL 3.10 relaxes the ordering constraints similarly to
    GLSL 4.40, and thus there is no need for modifications to section 4.7
    in 3.00 (4.10 in 3.10) in this extension.

    (5) Are any more changes needed to the descriptions of texture gather?

    Probably not. Bug 11109 suggests cleanup to be applied to both desktop
    API and language specifications to make them cleaner and more
    consistent. The important parts of this cleanup were done in the texture
    gather functionality folded into ES 3.1, although some small language
    tweaks may still be needed.

    (6) Moved to EXT_shader_implicit_conversions Issue 4.

    (7) Should uniform and shader storage blocks be backable with buffer
        object subranges?

    RESOLVED: Yes. The section 4.3.7 "Interface Blocks" language picked up
    from desktop GL allows this (they are called "bind ranges"). This is a
    spec oversight in ES, because BindBufferRange is fully supported in
    OpenGL ES 3.0.

    (8) Where is MAX_PROGRAM_TEXTURE_GATHER_COMPONENTS?

    RESOLVED: It was not added in Core GL because ARB_texture_gather and
    ARB_gpu_shader5 were both added to GL 4.0 and thus the query was
    unneeded. Since OpenGL ES 3.1 also includes texture gather and the
    multi-component gather support from gpu_shader5, the query was also
    unnecessary there and here. Bug 11002.

    (9) Some vendors may not be able to support dynamic indexing
        of arrays of images or shader storage blocks. What should we use
        instead?

    RESOLVED: Only allow 'constant integral expression' instead of
    'dynamically uniform integer expression' for arrays of images or shader
    storage blocks. For images this is done by carving out an exception in the
    general language for opaque types. For shader storage blocks, different
    rules are given for arrays of uniform blocks and arrays of shader storage
    blocks.

Revision History

    Revision 1, 2013/10/27 (Jon Leech)
        - Initial version based on ARB_gpu_shader5

    Revision 2, 2013/11/06 (Jon Leech)
        - Update Issues list with unresolved issues 4-7, which are dependent
          on decisions to be made by the ARB and ES working groups.
        - Remove {un,}packUnorm2x16EXT (already in ESSL 3.00)
        - Match changes to ES 3.1 texture gather language, but still
          reorganize the textureGather functions into their own subsection &
          table. ES 3.1 restored the [, int comp] argument to the functions
          it defined. Removed sampler2DRect variants incorrectly left in.
        - Clean up function overloading example text and opened bug 11178 to
          resolve possible problems with the GLSL 4.40 language this is
          based on.
        - Remove reference to image2DMS, since there is no longer any image
          load/store support for multisample textures in ES 3.1
        - Add issue (8) regarding "bind ranges".

    Revision 3, 2013/11/14 (Jon Leech)
        - Resolve function overloading issue 7, per bug 11178.

    Revision 4, 2013/11/20 (Jon Leech)
        - Sync with ES 3.1 spec language update.
        - Refer to ES 3.1 instead of ES 3plus.

    Revision 5, 2013/11/21 (Daniel Koch)
        - removed implicit conversion language (to a separate document).
        - updated textureGather functions to reflect the shadow gather
          functionality being added in ES 3.1.
        - added issue 9.

    Revision 6, 2013/12/18 (Daniel Koch)
        - minor cleanup
        - added issue 10, restrict arrays of images to const-int-expr

    Revision 7, 2014/02/12 (Daniel Koch)
        - restrict indexing arrays of shader storage blocks to const-int-expr.
        - Resolved issues 4, 5, 8, 9, 10 and supporting edits.

    Revision 8, 2014/03/10 (Jon Leech)
        - Rebase on OpenGL ES 3.1 and change suffix to EXT.
        - Remove textureGather functions already present in the existing
          GLSL-ES 3.10 spec section 8.9.3

    Revision 9, 2014/03/26 (Daniel Koch)
        - update contributors

    Revision 10, 2014/03/28 (Jon Leech)
        - Sync with released ES 3.1 specs. Reflow text.

    Revision 11, 2014/04/01 (Daniel Koch)
        - Update contributors

    Revision 12, 2015/03/27 (Daniel Koch)
        - Add missing function and token sections.
491    Revision 11, 2014/04/01 (Daniel Koch)
492        - Update contributors
493
494    Revision 12, 2015/03/27 (Daniel Koch)
495        - Add missing function and token sections.
496