Name

    OES_gpu_shader5

Name Strings

    GL_OES_gpu_shader5

Contact

    Jon Leech (oddhack 'at' sonic.net)
    Daniel Koch, NVIDIA (dkoch 'at' nvidia.com)

Contributors

    Daniel Koch, NVIDIA (dkoch 'at' nvidia.com)
    Pat Brown, NVIDIA (pbrown 'at' nvidia.com)
    Jesse Hall, Google
    Maurice Ribble, Qualcomm
    Bill Licea-Kane, Qualcomm
    Graham Connor, Imagination
    Ben Bowman, Imagination
    Jonathan Putsman, Imagination
    Marcin Kantoch, Mobica
    Slawomir Grajewski, Intel
    Contributors to ARB_gpu_shader5

Notice

    Copyright (c) 2010-2013 The Khronos Group Inc. Copyright terms at
        http://www.khronos.org/registry/speccopyright.html

    Portions Copyright (c) 2013-2014 NVIDIA Corporation.

Status

    Approved by the OpenGL ES Working Group
    Ratified by the Khronos Board of Promoters on November 7, 2014

Version

    Last Modified Date: March 27, 2015
    Revision: 2

Number

    OpenGL ES Extension #211

Dependencies

    OpenGL ES 3.1 and OpenGL ES Shading Language 3.10 are required.

    This specification is written against the OpenGL ES 3.1 (March 17,
    2014) and OpenGL ES 3.10 Shading Language (March 17, 2014)
    Specifications.

    This extension interacts with OES_geometry_shader.

Overview

    This extension provides a set of new features to the OpenGL ES Shading
    Language and related APIs to support capabilities of new GPUs, extending
    the capabilities of version 3.10 of the OpenGL ES Shading Language.
    Shaders using the new functionality provided by this extension should
    enable this functionality via the construct

      #extension GL_OES_gpu_shader5 : require     (or enable)

    This extension provides a variety of new features for all shader types,
    including:

      * support for indexing into arrays of opaque types (samplers,
        and atomic counters) using dynamically uniform integer expressions;

      * support for indexing into arrays of images and shader storage blocks
        using only constant integral expressions;

      * extending the uniform block capability to allow shaders to index
        into an array of uniform blocks;

      * a "precise" qualifier allowing computations to be carried out exactly
        as specified in the shader source to avoid optimization-induced
        invariance issues (which might cause cracking in tessellation);

      * new built-in functions supporting:

        * fused floating-point multiply-add operations;

      * extending the textureGather() built-in functions provided by
        OpenGL ES Shading Language 3.10:

        * allowing shaders to use arbitrary offsets computed at run-time to
          select a 2x2 footprint to gather from; and
        * allowing shaders to use separate independent offsets for each of
          the four texels returned, instead of requiring a fixed 2x2
          footprint.

New Procedures and Functions

    None

New Tokens

    None

Additions to the OpenGL ES 3.1 Specification

    Add to the end of section 8.13.2, "Coordinate Wrapping and Texel
    Selection":

    ... texture source color of (0,0,0,1) for all four source texels.

    The textureGatherOffsets built-in shader functions return a vector
    derived from sampling four texels in the image array of level
    <level_base>. For each of the four texel offsets specified by the
    <offsets> argument, the rules for the LINEAR minification filter are
    applied to identify a 2x2 texel footprint, from which the single texel
    T_i0_j0 is selected. A four-component vector is then assembled by taking
    a single component from each of the four T_i0_j0 texels in the same
    manner as for the textureGather function.


Additions to the OpenGL ES Shading Language 3.10 Specification

    Including the following line in a shader can be used to control the
    language features described in this extension:

      #extension GL_OES_gpu_shader5 : <behavior>

    where <behavior> is as specified in section 3.4.

    A new preprocessor #define is added to the OpenGL ES Shading Language:

      #define GL_OES_gpu_shader5        1

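    As a non-normative illustration (not part of the specification
    language), a shader might guard its use of the extension with the
    GL_OES_gpu_shader5 define. This is a minimal sketch assuming a
    fragment shader; the output name "fragColor" is hypothetical.

      #version 310 es
      #extension GL_OES_gpu_shader5 : enable

      precision mediump float;

      layout(location = 0) out vec4 fragColor;   // hypothetical output

      void main(void)
      {
      #ifdef GL_OES_gpu_shader5
        // Extension available: the fma() built-in may be used.
        fragColor = vec4(fma(0.5, 0.5, 0.25));
      #else
        // Fallback using only core ESSL 3.10 operations.
        fragColor = vec4(0.5 * 0.5 + 0.25);
      #endif
      }

    Using "enable" rather than "require" lets the shader still compile,
    with a warning, on implementations that lack the extension.
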

    Modifications to Section 3.7 (Keywords)

    Remove "precise" from the list of reserved keywords and add it to the
    list of keywords.

    Remove the last paragraph from section 3.9.3 "Dynamically Uniform
    Expressions" (starting "The definition is not used in this version...")


    Add to the introduction to section 4.1.7, "Opaque Types" on p. 26:

    When aggregated into arrays within a shader, opaque types can only be
    indexed with a dynamically uniform integral expression (see section
    3.9.3) unless otherwise noted; otherwise, results are undefined.

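    As a non-normative illustration of the rule above, the following
    sketch indexes an array of samplers with a uniform variable, which has
    the same value for every invocation and is therefore dynamically
    uniform. The names "layers", "layerIndex", "uv" and "fragColor" are
    hypothetical, and the application is assumed to keep "layerIndex"
    within the array bounds.

      #version 310 es
      #extension GL_OES_gpu_shader5 : require

      precision mediump float;

      uniform sampler2D layers[4];   // hypothetical array of samplers
      uniform int layerIndex;        // assumed to be in the range [0, 3]

      in vec2 uv;
      layout(location = 0) out vec4 fragColor;

      void main(void)
      {
        // The index is identical across all invocations, so the result
        // of the lookup is defined.
        fragColor = texture(layers[layerIndex], uv);
      }

    An index derived from a per-fragment input such as "uv" would not be
    dynamically uniform in general, and the result would be undefined.
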

    Replace the first paragraph of section 4.1.7.1, "Samplers" (removing the
    second sentence) on p. 27:

    Sampler types (e.g., sampler2D) are opaque types, declared and behaving
    as described above for opaque types.

    Sampler variables are ...



    Modify Section 4.3.9 "Interface Blocks", as modified by
    OES_geometry_shader and OES_shader_io_blocks:

    (modify the paragraph starting "For uniform or shader storage blocks
    declared as an array", removing the requirement for indexing uniform
    blocks using constant expressions)

    For uniform or shader storage blocks declared as an array, each
    individual array element corresponds to a separate buffer object bind
    range, backing one instance of the block. As the array size indicates
    the number of buffer objects needed, uniform and shader storage block
    array declarations must specify an array size. All indices used to index
    a shader storage block array must be constant integral expressions. A
    uniform block array can only be indexed with a dynamically uniform
    integral expression; otherwise, results are undefined.

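    As a non-normative illustration of the two indexing rules, the
    following compute shader sketch indexes a uniform block array with a
    uniform variable and a shader storage block array with constants only.
    All block, instance and variable names are hypothetical. The dispatch
    size is assumed to match the length of the "values" arrays, and
    "paramIndex" is assumed to be in the range [0, 3].

      #version 310 es
      #extension GL_OES_gpu_shader5 : require

      layout(local_size_x = 64) in;

      layout(std140, binding = 0) uniform Params {   // hypothetical block
        vec4 scale;
      } params[4];

      layout(std430, binding = 0) buffer Data {      // hypothetical block
        vec4 values[];
      } data[2];

      uniform int paramIndex;   // dynamically uniform by construction

      void main(void)
      {
        uint i = gl_GlobalInvocationID.x;

        // Uniform block array: a dynamically uniform index is allowed.
        vec4 scale = params[paramIndex].scale;

        // Shader storage block array: only constant indices are allowed.
        data[1].values[i] = data[0].values[i] * scale;
      }
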

    Add new section 4.9gs5 before section 4.10 "Order of Qualification":

    4.9gs5 The Precise Qualifier

    Some algorithms may require that floating-point computations be carried
    out in exactly the manner specified in the source code, even if the
    implementation supports optimizations that could produce nearly
    equivalent results with higher performance. For example, many GL
    implementations support a "multiply-add" that can compute values such as

      float result = (float(a) * float(b)) + float(c);

    in a single operation. The result of a floating-point multiply-add may
    not always be identical to first doing a multiply yielding a
    floating-point result, and then doing a floating-point add. By default,
    implementations are permitted to perform optimizations that effectively
    modify the order of the operations used to evaluate an expression, even
    if those optimizations may produce slightly different results relative
    to unoptimized code.

    The qualifier "precise" will ensure that operations contributing to a
    variable's value are performed in the order and with the precision
    specified in the source code. Order of evaluation is determined by
    operator precedence and parentheses, as described in Section 5.
    Expressions must be evaluated with a precision consistent with the
    operation; for example, multiplying two "float" values must produce a
    single value with "float" precision. This effectively prohibits the
    arbitrary use of fused multiply-add operations if the intermediate
    multiply result is kept at a higher precision. For example:

      precise out vec4 position;

    declares that computations used to produce the value of "position" must
    be performed precisely using the order and precision specified. As with
    the invariant qualifier (section 4.6.1), the precise qualifier may be
    used to qualify a built-in or previously declared user-defined variable
    as being precise:

      out vec3 Color;
      precise Color;            // make existing Color be precise

    This qualifier will affect the evaluation of expressions used on the
    right-hand side of an assignment if and only if:

      * the variable assigned to is qualified as "precise"; or

      * the value assigned is used later in the same function, either
        directly or indirectly, on the right-hand side of an assignment to
        a variable declared as "precise".

    Expressions computed in a function are treated as precise only if
    assigned to a variable qualified as "precise" in that same function. Any
    other expressions within a function are not automatically treated as
    precise, even if they are used to determine a value that is returned by
    the function and directly assigned to a variable qualified as "precise".

    Some examples of the use of "precise" include:

      in vec4 a, b, c, d;
      precise out vec4 v;

      float func(float e, float f, float g, float h)
      {
        return (e*f) + (g*h);            // no special precision
      }

      float func2(float e, float f, float g, float h)
      {
        precise float result = (e*f) + (g*h);  // ensures a precise return value
        return result;
      }

      void func3(float i, float j, precise out float k)
      {
        k = i * i + j;                   // precise, due to <k> declaration
      }

      void main(void)
      {
        vec3 r = vec3(a * b);               // precise, used to compute v.xyz
        vec3 s = vec3(c * d);               // precise, used to compute v.xyz
        v.xyz = r + s;                      // precise
        v.w = (a.w * b.w) + (c.w * d.w);    // precise
        v.x = func(a.x, b.x, c.x, d.x);     // values computed in func()
                                            // are NOT precise
        v.x = func2(a.x, b.x, c.x, d.x);    // precise!
        func3(a.x * b.x, c.x * d.x, v.x);   // precise!
      }


    Modify Section 8.3, Common Functions, p. 104

    (add support for floating-point multiply-add)

    Syntax:

      genType fma(genType a, genType b, genType c);

    Computes and returns a * b + c.

    In uses where the return value is eventually consumed by a variable
    declared as precise:

    * fma() is considered a single operation, whereas the expression
      "a*b + c" consumed by a variable declared precise is considered two
      operations.
    * The precision of fma() can differ from the precision of the expression
      "a*b + c".
    * fma() will be computed with the same precision as any other fma()
      consumed by a precise variable, giving invariant results for the same
      input values of a, b, and c.

    Otherwise, in the absence of precise consumption, there are no special
    constraints on the number of operations or difference in precision
    between fma() and the expression "a*b + c".

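    As a non-normative illustration of the distinction above, the
    following fragment (in the style of the "precise" examples, with
    hypothetical variable names) assigns both forms to precise outputs:

      in vec4 a, b, c;
      precise out vec4 p;
      precise out vec4 q;

      void main(void)
      {
        p = a * b + c;      // two operations: a multiply, then an add
        q = fma(a, b, c);   // a single operation; its precision may
                            // differ from that of the expression above
      }

    Because both outputs are precise, <p> and <q> are not required to
    hold identical values, but every precise-consumed fma() with the same
    inputs yields the same result.
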

    Modify the table of functions in section 8.9.3 "Texture Gather
    Functions", changing the "Description" column for the existing
    textureGatherOffset functions on p. 127:

    Description

        Perform a texture gather operation as in textureGather offset by
        <offset> as described in textureOffset, except that the <offset> can
        be variable (non-constant) and the implementation-dependent minimum
        and maximum offset values are given by the values of
        MIN_PROGRAM_TEXTURE_GATHER_OFFSET and
        MAX_PROGRAM_TEXTURE_GATHER_OFFSET, respectively.

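    As a non-normative illustration, the following sketch passes a
    run-time offset to textureGatherOffset. The names "tex", "tapOffset",
    "uv" and "fragColor" are hypothetical, and the application is assumed
    to keep "tapOffset" within the range given by
    MIN_PROGRAM_TEXTURE_GATHER_OFFSET and
    MAX_PROGRAM_TEXTURE_GATHER_OFFSET.

      #version 310 es
      #extension GL_OES_gpu_shader5 : require

      precision mediump float;

      uniform sampler2D tex;       // hypothetical texture
      uniform ivec2 tapOffset;     // hypothetical, set by the application

      in vec2 uv;
      layout(location = 0) out vec4 fragColor;

      void main(void)
      {
        // Gather the red component (comp = 0) of the 2x2 footprint
        // displaced from <uv> by <tapOffset> texels.
        fragColor = textureGatherOffset(tex, uv, tapOffset, 0);
      }
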

    Add new textureGatherOffsets functions to the same table, on p. 127:

    Syntax

        gvec4 textureGatherOffsets(gsampler2D sampler, vec2 P,
                                   ivec2 offsets[4] [, int comp])
        gvec4 textureGatherOffsets(gsampler2DArray sampler, vec3 P,
                                   ivec2 offsets[4] [, int comp])
        vec4 textureGatherOffsets(sampler2DShadow sampler, vec2 P,
                                  float refZ, ivec2 offsets[4])
        vec4 textureGatherOffsets(sampler2DArrayShadow sampler, vec3 P,
                                  float refZ, ivec2 offsets[4])

    Description

        Operate identically to textureGatherOffset except that <offsets> is
        used to determine the location of the four texels to sample. Each of
        the four texels is obtained by applying the corresponding offset in
        <offsets> as a (u,v) coordinate offset to <P>, identifying the
        four-texel linear footprint, and then selecting texel (i0,j0) of
        that footprint. The specified values in <offsets> must be constant
        integral expressions.

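    As a non-normative illustration, the following sketch gathers the
    green component from four independently offset texels. The names
    "tex", "taps", "uv" and "fragColor" are hypothetical; the offsets are
    declared const so that they are constant integral expressions.

      #version 310 es
      #extension GL_OES_gpu_shader5 : require

      precision mediump float;

      uniform sampler2D tex;       // hypothetical texture

      in vec2 uv;
      layout(location = 0) out vec4 fragColor;

      // One offset per returned texel, in texels relative to <uv>.
      const ivec2 taps[4] = ivec2[4](ivec2(-2, -2), ivec2( 2, -2),
                                     ivec2(-2,  2), ivec2( 2,  2));

      void main(void)
      {
        // comp = 1 selects the green component of each selected texel.
        fragColor = textureGatherOffsets(tex, uv, taps, 1);
      }
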
New Implementation Dependent State

    None.

Issues

    Note: These issues apply specifically to the definition of the
    OES_gpu_shader5 specification, which is based on the OpenGL extension
    ARB_gpu_shader5 as updated in OpenGL 4.x. Resolved issues from
    ARB_gpu_shader5 have been removed, but some remain applicable to this
    extension. ARB_gpu_shader5 can be found in the OpenGL Registry.

    (1) What functionality was removed relative to ARB_gpu_shader5?

      - Instanced geometry support (moved into OES_geometry_shader)
      - Implicit conversions (moved to EXT_shader_implicit_conversions)
      - Interactions with features not supported by the underlying
        ES 3.1 API and Shading Language, including:
        * interactions with ARB_gpu_shader_fp64 and NV_gpu_shader5,
          including support for double-precision in implicit conversions
          and function overload resolution
        * multiple vertex streams (these require ARB_transform_feedback3)
        * textureGather built-in variants for cube map array and rectangle
          texture samplers.
        * shading language function overloading rules involving the type
          double
      - Functionality already in OpenGL ES 3.00, including packing and
        unpacking of 16-bit types and converting floating-point values to or
        from their integer bit encodings.
      - Functionality already in OpenGL ES 3.10, including
        * splitting and building floating-point numbers from a significand and
          exponent, integer bitfield manipulation, and packing and unpacking
          vectors of 8-bit fixed-point data types.
        * a subset of the textureGather and textureGatherOffset builtins
          (but some textureGather builtins remain in this extension).
      - Functionality already in OES_sample_variables, including support for
        reading a mask of covered samples in a fragment shader.
      - Functionality already in OES_shader_multisample_interpolation,
        including support for interpolating a fragment shader input at a
        programmable offset relative to the pixel center, a programmable
        sample number, or at the centroid.
      - MAX_PROGRAM_TEXTURE_GATHER_COMPONENTS (Issue 8).

    (2) What functionality was changed and added relative to
        ARB_gpu_shader5?

      - Support for indexing into arrays of samplers was extended to all
        opaque types, and the description of allowed indices was rewritten
        in terms of dynamically uniform expressions, as was done when
        ARB_gpu_shader5 was promoted into OpenGL 4.0.
      - The only remaining API interaction is an increase in a
        minimum-maximum value, so no "Changes to the OpenGL ES Specification"
        sections are included above.
      - Arrays of images and shader storage blocks can only be indexed
        with constant integral expressions.

    (3) What should the rules on GLSL suffixing be?

    RESOLVED: "precise" is not a reserved keyword in ESSL 3.00, but it is
    a keyword in GLSL 4.40. ESSL 3.10 updated the reserved keyword list
    to include all keywords used or reserved in GLSL 4.40 (but not otherwise
    used in ES) and thus we can use "precise" in this spec by moving it
    from the reserved keywords section. See bug 11179.

    (4) Are changes to the "Order of Qualification" section needed?

    RESOLVED: No. ESSL 3.10 relaxes the ordering constraints similarly to
    GLSL 4.40, so there is no need for modifications to section 4.7
    in 3.00 (4.10 in 3.10) in this extension.

    (5) Are any more changes needed to the descriptions of texture gather?

    Probably not. Bug 11109 suggests cleanup to be applied to both desktop
    API and language specifications to make them cleaner and more
    consistent. The important parts of this cleanup were done in the texture
    gather functionality folded into ES 3.1, although some small language
    tweaks may still be needed.

    (6) Moved to EXT_shader_implicit_conversions Issue 4.

    (7) Should uniform and shader storage blocks be backable with buffer
        object subranges?

    RESOLVED: Yes. The section 4.3.7 "Interface Blocks" language picked up
    from desktop GL allows this (they are called "bind ranges"). This is a
    spec oversight in ES, because BindBufferRange is fully supported in
    OpenGL ES 3.0.

    (8) Where is MAX_PROGRAM_TEXTURE_GATHER_COMPONENTS?

    RESOLVED: It was not added in Core GL because ARB_texture_gather and
    ARB_gpu_shader5 were both added to GL 4.0 and thus the query was
    unneeded. Since OpenGL ES 3.1 also includes texture gather and the
    multi-component gather support from gpu_shader5, the query was also
    unnecessary there and here.  Bug 11002.

    (9) Some vendors may not be able to support dynamic indexing
    of arrays of images or shader storage blocks. What should we use instead?

    RESOLVED: Allow only 'constant integral expression' indices instead of
    'dynamically uniform integer expression' indices for arrays of images
    or shader storage blocks. For images this is done by carving out an
    exception in the general language for opaque types. For shader storage
    blocks, different rules are given for arrays of uniform blocks and
    arrays of shader storage blocks.

Revision History

    Rev.    Date      Author    Changes
    ----  ----------  --------- -------------------------------------------------
     1    06/18/2014  dkoch     Initial OES version based on EXT.
                                No functional changes.
     2    03/27/2015  dkoch     Add missing function and token sections.
