Name

    OES_gpu_shader5

Name Strings

    GL_OES_gpu_shader5

Contact

    Jon Leech (oddhack 'at' sonic.net)
    Daniel Koch, NVIDIA (dkoch 'at' nvidia.com)

Contributors

    Daniel Koch, NVIDIA (dkoch 'at' nvidia.com)
    Pat Brown, NVIDIA (pbrown 'at' nvidia.com)
    Jesse Hall, Google
    Maurice Ribble, Qualcomm
    Bill Licea-Kane, Qualcomm
    Graham Connor, Imagination
    Ben Bowman, Imagination
    Jonathan Putsman, Imagination
    Marcin Kantoch, Mobica
    Slawomir Grajewski, Intel
    Contributors to ARB_gpu_shader5

Notice

    Copyright (c) 2010-2013 The Khronos Group Inc. Copyright terms at
        http://www.khronos.org/registry/speccopyright.html

Specification Update Policy

    Khronos-approved extension specifications are updated in response to
    issues and bugs prioritized by the Khronos OpenGL ES Working Group. For
    extensions which have been promoted to a core Specification, fixes will
    first appear in the latest version of that core Specification, and will
    eventually be backported to the extension document. This policy is
    described in more detail at
        https://www.khronos.org/registry/OpenGL/docs/update_policy.php

    Portions Copyright (c) 2013-2014 NVIDIA Corporation.

Status

    Approved by the OpenGL ES Working Group
    Ratified by the Khronos Board of Promoters on November 7, 2014

Version

    Last Modified Date: March 27, 2015
    Revision: 2

Number

    OpenGL ES Extension #211

Dependencies

    OpenGL ES 3.1 and OpenGL ES Shading Language 3.10 are required.

    This specification is written against the OpenGL ES 3.1 (March 17,
    2014) and OpenGL ES 3.10 Shading Language (March 17, 2014)
    Specifications.

    This extension interacts with OES_geometry_shader.

Overview

    This extension provides a set of new features to the OpenGL ES Shading
    Language and related APIs to support capabilities of new GPUs, extending
    the capabilities of version 3.10 of the OpenGL ES Shading Language.
    Shaders using the new functionality provided by this extension should
    enable this functionality via the construct

      #extension GL_OES_gpu_shader5 : require     (or enable)

    This extension provides a variety of new features for all shader types,
    including:

      * support for indexing into arrays of opaque types (samplers
        and atomic counters) using dynamically uniform integer expressions;

      * support for indexing into arrays of images and shader storage blocks
        using only constant integral expressions;

      * extending the uniform block capability to allow shaders to index
        into an array of uniform blocks;

      * a "precise" qualifier allowing computations to be carried out exactly
        as specified in the shader source to avoid optimization-induced
        invariance issues (which might cause cracking in tessellation);

      * new built-in functions supporting:

        * fused floating-point multiply-add operations;

      * extending the textureGather() built-in functions provided by
        OpenGL ES Shading Language 3.10:

        * allowing shaders to use arbitrary offsets computed at run-time to
          select a 2x2 footprint to gather from; and
        * allowing shaders to use separate independent offsets for each of
          the four texels returned, instead of requiring a fixed 2x2
          footprint.

New Procedures and Functions

    None

New Tokens

    None

Additions to the OpenGL ES 3.1 Specification

    Add to the end of section 8.13.2, "Coordinate Wrapping and Texel
    Selection":

    ... texture source color of (0,0,0,1) for all four source texels.

    The textureGatherOffsets built-in shader functions return a vector
    derived from sampling four texels in the image array of level
    <level_base>. For each of the four texel offsets specified by the
    <offsets> argument, the rules for the LINEAR minification filter are
    applied to identify a 2x2 texel footprint, from which the single texel
    T_i0_j0 is selected. A four-component vector is then assembled by taking
    a single component from each of the four T_i0_j0 texels in the same
    manner as for the textureGather function.


Additions to the OpenGL ES Shading Language 3.10 Specification

    Including the following line in a shader can be used to control the
    language features described in this extension:

      #extension GL_OES_gpu_shader5 : <behavior>

    where <behavior> is as specified in section 3.4.

    A new preprocessor #define is added to the OpenGL ES Shading Language:

      #define GL_OES_gpu_shader5        1

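    As an illustration only (not part of the specification language), a
    shader that must also compile where the extension is unavailable might
    test this macro before using the new functionality. The sketch below
    assumes the implementation defines GL_OES_gpu_shader5 whenever it
    supports the extension; the attribute names a, b, and c are
    hypothetical.

      #version 310 es
      #ifdef GL_OES_gpu_shader5
      #extension GL_OES_gpu_shader5 : enable
      #endif

      in vec4 a, b, c;                 // hypothetical vertex attributes

      void main(void)
      {
      #ifdef GL_OES_gpu_shader5
        gl_Position = fma(a, b, c);    // built-in added by this extension
      #else
        gl_Position = a * b + c;       // fallback without the extension
      #endif
      }

    Using "enable" rather than "require" lets compilation continue, with a
    warning, if the extension turns out to be unsupported.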

    Modifications to Section 3.7 (Keywords)

    Remove "precise" from the list of reserved keywords and add it to the
    list of keywords.

    Remove the last paragraph from section 3.9.3 "Dynamically Uniform
    Expressions" (starting "The definition is not used in this version...").


    Add to the introduction to section 4.1.7, "Opaque Types" on p. 26:

    When aggregated into arrays within a shader, opaque types can only be
    indexed with a dynamically uniform integral expression (see section
    3.9.3) unless otherwise noted; otherwise, results are undefined.

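    As an illustration only, the following fragment shader sketch indexes
    a sampler array with a uniform, which has the same value for every
    invocation and is therefore dynamically uniform. All names are
    hypothetical, and layerIndex is assumed to lie in the range [0, 3].

      #version 310 es
      #extension GL_OES_gpu_shader5 : require

      precision mediump float;

      uniform sampler2D layers[4];     // hypothetical sampler array
      uniform int layerIndex;          // same value for every invocation

      in vec2 uv;
      out vec4 color;

      void main(void)
      {
        // A uniform is dynamically uniform, so this index is well defined.
        color = texture(layers[layerIndex], uv);

        // An index derived from a per-fragment input, such as
        // int(uv.x * 4.0), may differ between invocations; indexing the
        // array with it would give undefined results.
      }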

    Replace the first paragraph of section 4.1.7.1, "Samplers" (removing the
    second sentence) on p. 27:

    Sampler types (e.g., sampler2D) are opaque types, declared and behaving
    as described above for opaque types.

    Sampler variables are ...



    Modify Section 4.3.9 "Interface Blocks", as modified by
    OES_geometry_shader and OES_shader_io_blocks:

    (modify the paragraph starting "For uniform or shader storage blocks
    declared as an array", removing the requirement for indexing uniform
    blocks using constant expressions)

    For uniform or shader storage blocks declared as an array, each
    individual array element corresponds to a separate buffer object bind
    range, backing one instance of the block. As the array size indicates
    the number of buffer objects needed, uniform and shader storage block
    array declarations must specify an array size. All indices used to index
    a shader storage block array must be constant integral expressions. A
    uniform block array can only be indexed with a dynamically uniform
    integral expression; otherwise, results are undefined.

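    As an illustration only, the compute shader sketch below contrasts the
    two rules: the uniform block array is indexed with a uniform (a
    dynamically uniform expression), while the shader storage block array
    is indexed only with literal constants. Block, instance, and variable
    names are hypothetical; paramIndex is assumed to lie in [0, 3], and
    the bound buffers are assumed to be large enough for every invocation.

      #version 310 es
      #extension GL_OES_gpu_shader5 : require

      precision highp float;

      layout(local_size_x = 64) in;

      layout(std140, binding = 0) uniform Params {   // hypothetical block
        vec4 scale;
      } params[4];

      layout(std430, binding = 0) buffer Results {   // hypothetical block
        vec4 data[];
      } results[2];

      uniform int paramIndex;          // same value for every invocation

      void main(void)
      {
        uint i = gl_GlobalInvocationID.x;
        vec4 s = params[paramIndex].scale;   // dynamically uniform index
        results[0].data[i] = s;              // constant index
        results[1].data[i] = -s;             // constant index
      }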

    Add new section 4.9gs5 before section 4.10 "Order of Qualification":

    4.9gs5 The Precise Qualifier

    Some algorithms may require that floating-point computations be carried
    out in exactly the manner specified in the source code, even if the
    implementation supports optimizations that could produce nearly
    equivalent results with higher performance. For example, many GL
    implementations support a "multiply-add" that can compute values such as

      float result = (float(a) * float(b)) + float(c);

    in a single operation. The result of a floating-point multiply-add may
    not always be identical to first doing a multiply yielding a
    floating-point result, and then doing a floating-point add. By default,
    implementations are permitted to perform optimizations that effectively
    modify the order of the operations used to evaluate an expression, even
    if those optimizations may produce slightly different results relative
    to unoptimized code.

    The qualifier "precise" will ensure that operations contributing to a
    variable's value are performed in the order and with the precision
    specified in the source code. Order of evaluation is determined by
    operator precedence and parentheses, as described in Section 5.
    Expressions must be evaluated with a precision consistent with the
    operation; for example, multiplying two "float" values must produce a
    single value with "float" precision. This effectively prohibits the
    arbitrary use of fused multiply-add operations if the intermediate
    multiply result is kept at a higher precision. For example:

      precise out vec4 position;

    declares that computations used to produce the value of "position" must
    be performed precisely using the order and precision specified. As with
    the invariant qualifier (section 4.6.1), the precise qualifier may be
    used to qualify a built-in or previously declared user-defined variable
    as being precise:

      out vec3 Color;
      precise Color;            // make existing Color be precise

    This qualifier will affect the evaluation of expressions used on the
    right-hand side of an assignment if and only if:

      * the variable assigned to is qualified as "precise"; or

      * the value assigned is used later in the same function, either
        directly or indirectly, on the right-hand side of an assignment to
        a variable declared as "precise".

    Expressions computed in a function are treated as precise only if
    assigned to a variable qualified as "precise" in that same function. Any
    other expressions within a function are not automatically treated as
    precise, even if they are used to determine a value that is returned by
    the function and directly assigned to a variable qualified as "precise".

    Some examples of the use of "precise" include:

      in vec4 a, b, c, d;
      precise out vec4 v;

      float func(float e, float f, float g, float h)
      {
        return (e*f) + (g*h);            // no special precision
      }

      float func2(float e, float f, float g, float h)
      {
        precise float result = (e*f) + (g*h);  // ensures a precise return value
        return result;
      }

      void func3(float i, float j, precise out float k)
      {
        k = i * i + j;                   // precise, due to <k> declaration
      }

      void main(void)
      {
        vec3 r = vec3(a * b);               // precise, used to compute v.xyz
        vec3 s = vec3(c * d);               // precise, used to compute v.xyz
        v.xyz = r + s;                      // precise
        v.w = (a.w * b.w) + (c.w * d.w);    // precise
        v.x = func(a.x, b.x, c.x, d.x);     // values computed in func()
                                            // are NOT precise
        v.x = func2(a.x, b.x, c.x, d.x);    // precise!
        func3(a.x * b.x, c.x * d.x, v.x);   // precise!
      }


    Modify Section 8.3, Common Functions, p. 104

    (add support for floating-point multiply-add)

    Syntax:

      genType fma(genType a, genType b, genType c);

    Computes and returns a * b + c.

    In uses where the return value is eventually consumed by a variable
    declared as precise:

    * fma() is considered a single operation, whereas the expression
      "a*b + c" consumed by a variable declared precise is considered two
      operations.
    * The precision of fma() can differ from the precision of the expression
      "a*b + c".
    * fma() will be computed with the same precision as any other fma()
      consumed by a precise variable, giving invariant results for the same
      input values of a, b, and c.

    Otherwise, in the absence of precise consumption, there are no special
    constraints on the number of operations or difference in precision
    between fma() and the expression "a*b + c".

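    As an illustration only, the vertex shader sketch below feeds both
    forms into precise outputs. Under the rules above, "fused" is computed
    as a single operation and "split" as a separate multiply and add, so
    the two results may differ in their low-order bits. The names p, q, r,
    fused, and split are hypothetical.

      #version 310 es
      #extension GL_OES_gpu_shader5 : require

      in vec4 p, q, r;                 // hypothetical vertex attributes
      precise out vec4 fused;
      precise out vec4 split;

      void main(void)
      {
        fused = fma(p, q, r);          // one operation
        split = p * q + r;             // two operations: multiply, then add
        gl_Position = p;
      }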

    Modify the table of functions in section 8.9.3 "Texture Gather
    Functions", changing the "Description" column for the existing
    textureGatherOffset functions on p. 127:

    Description

        Perform a texture gather operation as in textureGather offset by
        <offset> as described in textureOffset, except that the <offset> can
        be variable (non-constant) and the implementation-dependent minimum
        and maximum offset values are given by the values of
        MIN_PROGRAM_TEXTURE_GATHER_OFFSET and
        MAX_PROGRAM_TEXTURE_GATHER_OFFSET, respectively.

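    As an illustration only, the fragment shader sketch below passes a
    run-time offset to textureGatherOffset. The names are hypothetical,
    and the application is assumed to keep tapOffset within the range
    given by MIN_PROGRAM_TEXTURE_GATHER_OFFSET and
    MAX_PROGRAM_TEXTURE_GATHER_OFFSET.

      #version 310 es
      #extension GL_OES_gpu_shader5 : require

      precision highp float;

      uniform sampler2D heightMap;     // hypothetical texture
      uniform ivec2 tapOffset;         // set by the application at run time

      in vec2 uv;
      out vec4 heights;

      void main(void)
      {
        // Gathers the first component of the 2x2 footprint located at
        // uv, displaced by tapOffset texels.
        heights = textureGatherOffset(heightMap, uv, tapOffset);
      }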

    Add new textureGatherOffsets functions to the same table, on p. 127:

    Syntax

        gvec4 textureGatherOffsets(gsampler2D sampler, vec2 P,
                                   ivec2 offsets[4] [, int comp])
        gvec4 textureGatherOffsets(gsampler2DArray sampler, vec3 P,
                                   ivec2 offsets[4] [, int comp])
        vec4 textureGatherOffsets(sampler2DShadow sampler, vec2 P,
                                  float refZ, ivec2 offsets[4])
        vec4 textureGatherOffsets(sampler2DArrayShadow sampler, vec3 P,
                                  float refZ, ivec2 offsets[4])

    Description

        Operate identically to textureGatherOffset except that <offsets> is
        used to determine the location of the four texels to sample. Each of
        the four texels is obtained by applying the corresponding offset in
        <offsets> as a (u,v) coordinate offset to <P>, identifying the
        four-texel linear footprint, and then selecting texel (i0,j0) of
        that footprint. The specified values in <offsets> must be constant
        integral expressions.

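    As an illustration only, the fragment shader sketch below gathers the
    first component of four independently offset texels. The names are
    hypothetical, and the sketch assumes that a const-qualified array
    initialized from constant expressions satisfies the requirement that
    the values in <offsets> be constant integral expressions.

      #version 310 es
      #extension GL_OES_gpu_shader5 : require

      precision highp float;

      uniform sampler2D src;           // hypothetical texture

      // Four taps arranged as a diamond around the sample position.
      const ivec2 taps[4] = ivec2[4](ivec2(-1,  0), ivec2( 1,  0),
                                     ivec2( 0, -1), ivec2( 0,  1));

      in vec2 uv;
      out vec4 gathered;

      void main(void)
      {
        // Component i of the result comes from the texel at taps[i].
        gathered = textureGatherOffsets(src, uv, taps, 0);
      }
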
New Implementation Dependent State

    None.

Issues

    Note: These issues apply specifically to the definition of the
    OES_gpu_shader5 specification, which is based on the OpenGL extension
    ARB_gpu_shader5 as updated in OpenGL 4.x. Resolved issues from
    ARB_gpu_shader5 have been removed, but some remain applicable to this
    extension. ARB_gpu_shader5 can be found in the OpenGL Registry.

    (1) What functionality was removed relative to ARB_gpu_shader5?

      - Instanced geometry support (moved into OES_geometry_shader)
      - Implicit conversions (moved to EXT_shader_implicit_conversions)
      - Interactions with features not supported by the underlying
        ES 3.1 API and Shading Language, including:
        * interactions with ARB_gpu_shader_fp64 and NV_gpu_shader5, including
          support for double-precision in implicit conversions and function
          overload resolution
        * multiple vertex streams (these require ARB_transform_feedback3)
        * textureGather built-in variants for cube map array and rectangle
          texture samplers.
        * shading language function overloading rules involving the type
          double
      - Functionality already in OpenGL ES 3.00, including packing and
        unpacking of 16-bit types and converting floating-point values to or
        from their integer bit encodings.
      - Functionality already in OpenGL ES 3.10, including
        * splitting and building floating-point numbers from a significand and
          exponent, integer bitfield manipulation, and packing and unpacking
          vectors of 8-bit fixed-point data types.
        * a subset of the textureGather and textureGatherOffset builtins
          (but some textureGather builtins remain in this extension).
      - Functionality already in OES_sample_variables, including support for
        reading a mask of covered samples in a fragment shader.
      - Functionality already in OES_shader_multisample_interpolation,
        including support for interpolating a fragment shader input at a
        programmable offset relative to the pixel center, a programmable
        sample number, or at the centroid.
      - MAX_PROGRAM_TEXTURE_GATHER_COMPONENTS (Issue 8).

    (2) What functionality was changed and added relative to
        ARB_gpu_shader5?

      - Support for indexing into arrays of samplers was extended to all
        opaque types, and the description of allowed indices was rewritten
        in terms of dynamically uniform expressions, as was done when
        ARB_gpu_shader5 was promoted into OpenGL 4.0.
      - The only remaining API interaction is an increase in a
        minimum-maximum value, so no "Changes to the OpenGL ES Specification"
        sections are included above.
      - Arrays of images and shader storage blocks can only be indexed
        with constant integral expressions.

    (3) What should the rules on GLSL suffixing be?

    RESOLVED: "precise" is not a reserved keyword in ESSL 3.00, but it is
    a keyword in GLSL 4.40. ESSL 3.10 updated the reserved keyword list
    to include all keywords used or reserved in GLSL 4.40 (but not otherwise
    used in ES) and thus we can use "precise" in this spec by moving it
    from the reserved keywords section. See bug 11179.

    (4) Are changes to the "Order of Qualification" section needed?

    RESOLVED: No. ESSL 3.10 relaxes the ordering constraints similarly to
    GLSL 4.40, and thus there is no need for modifications to section 4.7
    in 3.00 (4.10 in 3.10) in this extension.

    (5) Are any more changes needed to the descriptions of texture gather?

    Probably not. Bug 11109 suggests cleanup to be applied to both desktop
    API and language specifications to make them cleaner and more
    consistent. The important parts of this cleanup were done in the texture
    gather functionality folded into ES 3.1, although some small language
    tweaks may still be needed.

    (6) Moved to EXT_shader_implicit_conversions Issue 4.

    (7) Should uniform and shader storage blocks be backable with buffer
        object subranges?

    RESOLVED: Yes. The section 4.3.7 "Interface Blocks" language picked up
    from desktop GL allows this (they are called "bind ranges"). This is a
    spec oversight in ES, because BindBufferRange is fully supported in
    OpenGL ES 3.0.

    (8) Where is MAX_PROGRAM_TEXTURE_GATHER_COMPONENTS?

    RESOLVED: It was not added in Core GL because ARB_texture_gather and
    ARB_gpu_shader5 were both added to GL 4.0 and thus the query was
    unneeded. Since OpenGL ES 3.1 also includes texture gather and the
    multi-component gather support from gpu_shader5, the query was also
    unnecessary there and here. Bug 11002.

    (9) Some vendors may not be able to support dynamic indexing
    of arrays of images or shader storage blocks. What should we use instead?

    RESOLVED: Allow only a 'constant integral expression', instead of a
    'dynamically uniform integer expression', for arrays of images or shader
    storage blocks. For images this is done by carving out an exception in the
    general language for opaque types. For shader storage blocks, different
    rules are given for arrays of uniform blocks and arrays of shader storage
    blocks.

Revision History

    Rev.    Date      Author    Changes
    ----  ----------  --------- -------------------------------------------------
     1    06/18/2014  dkoch     Initial OES version based on EXT.
                                No functional changes.
     2    03/27/2015  dkoch     Add missing function and token sections.
