Name

    OES_gpu_shader5

Name Strings

    GL_OES_gpu_shader5

Contact

    Jon Leech (oddhack 'at' sonic.net)
    Daniel Koch, NVIDIA (dkoch 'at' nvidia.com)

Contributors

    Daniel Koch, NVIDIA (dkoch 'at' nvidia.com)
    Pat Brown, NVIDIA (pbrown 'at' nvidia.com)
    Jesse Hall, Google
    Maurice Ribble, Qualcomm
    Bill Licea-Kane, Qualcomm
    Graham Connor, Imagination
    Ben Bowman, Imagination
    Jonathan Putsman, Imagination
    Marcin Kantoch, Mobica
    Slawomir Grajewski, Intel
    Contributors to ARB_gpu_shader5

Notice

    Copyright (c) 2010-2013 The Khronos Group Inc. Copyright terms at
        http://www.khronos.org/registry/speccopyright.html

    Portions Copyright (c) 2013-2014 NVIDIA Corporation.

Status

    Approved by the OpenGL ES Working Group
    Ratified by the Khronos Board of Promoters on November 7, 2014

Version

    Last Modified Date: March 27, 2015
    Revision: 2

Number

    OpenGL ES Extension #211

Dependencies

    OpenGL ES 3.1 and OpenGL ES Shading Language 3.10 are required.

    This specification is written against the OpenGL ES 3.1 (March 17,
    2014) and OpenGL ES 3.10 Shading Language (March 17, 2014)
    Specifications.

    This extension interacts with OES_geometry_shader.

Overview

    This extension provides a set of new features to the OpenGL ES Shading
    Language and related APIs to support capabilities of new GPUs, extending
    the capabilities of version 3.10 of the OpenGL ES Shading Language.
    Shaders using the new functionality provided by this extension should
    enable this functionality via the construct

      #extension GL_OES_gpu_shader5 : require     (or enable)

    This extension provides a variety of new features for all shader types,
    including:

      * support for indexing into arrays of opaque types (samplers
        and atomic counters) using dynamically uniform integer expressions;

      * support for indexing into arrays of images and shader storage blocks
        using only constant integral expressions;

      * extending the uniform block capability to allow shaders to index
        into an array of uniform blocks;

      * a "precise" qualifier allowing computations to be carried out exactly
        as specified in the shader source to avoid optimization-induced
        invariance issues (which might cause cracking in tessellation);

      * new built-in functions supporting:

        * fused floating-point multiply-add operations;

      * extending the textureGather() built-in functions provided by
        OpenGL ES Shading Language 3.10:

        * allowing shaders to use arbitrary offsets computed at run-time to
          select a 2x2 footprint to gather from; and
        * allowing shaders to use separate independent offsets for each of
          the four texels returned, instead of requiring a fixed 2x2
          footprint.

New Procedures and Functions

    None

New Tokens

    None

Additions to the OpenGL ES 3.1 Specification

    Add to the end of section 8.13.2, "Coordinate Wrapping and Texel
    Selection":

    ... texture source color of (0,0,0,1) for all four source texels.

    The textureGatherOffsets built-in shader functions return a vector
    derived from sampling four texels in the image array of level
    <level_base>. For each of the four texel offsets specified by the
    <offsets> argument, the rules for the LINEAR minification filter are
    applied to identify a 2x2 texel footprint, from which the single texel
    T_i0_j0 is selected.
    A four-component vector is then assembled by taking
    a single component from each of the four T_i0_j0 texels in the same
    manner as for the textureGather function.


Additions to the OpenGL ES Shading Language 3.10 Specification

    Including the following line in a shader can be used to control the
    language features described in this extension:

      #extension GL_OES_gpu_shader5 : <behavior>

    where <behavior> is as specified in section 3.4.

    A new preprocessor #define is added to the OpenGL ES Shading Language:

      #define GL_OES_gpu_shader5        1


    Modifications to Section 3.7 (Keywords)

    Remove "precise" from the list of reserved keywords and add it to the
    list of keywords.

    Remove the last paragraph from section 3.9.3 "Dynamically Uniform
    Expressions" (starting "The definition is not used in this version...")


    Add to the introduction to section 4.1.7, "Opaque Types" on p. 26:

    When aggregated into arrays within a shader, opaque types can only be
    indexed with a dynamically uniform integral expression (see section
    3.9.3) unless otherwise noted; otherwise, results are undefined.


    Replace the first paragraph of section 4.1.7.1, "Samplers" (removing the
    second sentence) on p. 27:

    Sampler types (e.g., sampler2D) are opaque types, declared and behaving
    as described above for opaque types.

    Sampler variables are ...



    Modify Section 4.3.9 "Interface Blocks", as modified by
    OES_geometry_shader and OES_shader_io_blocks:

    (modify the paragraph starting "For uniform or shader storage blocks
    declared as an array", removing the requirement for indexing uniform
    blocks using constant expressions)

    For uniform or shader storage blocks declared as an array, each
    individual array element corresponds to a separate buffer object bind
    range, backing one instance of the block.
    As the array size indicates
    the number of buffer objects needed, uniform and shader storage block
    array declarations must specify an array size. All indices used to index
    a shader storage block array must be constant integral expressions. A
    uniform block array can only be indexed with a dynamically uniform
    integral expression; otherwise, results are undefined.


    Add new section 4.9gs5 before section 4.10 "Order of Qualification":

    4.9gs5 The Precise Qualifier

    Some algorithms may require that floating-point computations be carried
    out in exactly the manner specified in the source code, even if the
    implementation supports optimizations that could produce nearly
    equivalent results with higher performance. For example, many GL
    implementations support a "multiply-add" that can compute values such as

      float result = (float(a) * float(b)) + float(c);

    in a single operation. The result of a floating-point multiply-add may
    not always be identical to first doing a multiply yielding a
    floating-point result, and then doing a floating-point add. By default,
    implementations are permitted to perform optimizations that effectively
    modify the order of the operations used to evaluate an expression, even
    if those optimizations may produce slightly different results relative
    to unoptimized code.

    The qualifier "precise" will ensure that operations contributing to a
    variable's value are performed in the order and with the precision
    specified in the source code. Order of evaluation is determined by
    operator precedence and parentheses, as described in Section 5.
    Expressions must be evaluated with a precision consistent with the
    operation; for example, multiplying two "float" values must produce a
    single value with "float" precision.
    This effectively prohibits the
    arbitrary use of fused multiply-add operations if the intermediate
    multiply result is kept at a higher precision. For example:

      precise out vec4 position;

    declares that computations used to produce the value of "position" must
    be performed precisely using the order and precision specified. As with
    the invariant qualifier (section 4.6.1), the precise qualifier may be
    used to qualify a built-in or previously declared user-defined variable
    as being precise:

      out vec3 Color;
      precise Color;            // make existing Color be precise

    This qualifier will affect the evaluation of expressions used on the
    right-hand side of an assignment if and only if:

      * the variable assigned to is qualified as "precise"; or

      * the value assigned is used later in the same function, either
        directly or indirectly, on the right-hand side of an assignment to a
        variable declared as "precise".

    Expressions computed in a function are treated as precise only if
    assigned to a variable qualified as "precise" in that same function. Any
    other expressions within a function are not automatically treated as
    precise, even if they are used to determine a value that is returned by
    the function and directly assigned to a variable qualified as "precise".

    Some examples of the use of "precise" include:

      in vec4 a, b, c, d;
      precise out vec4 v;

      float func(float e, float f, float g, float h)
      {
        return (e*f) + (g*h);            // no special precision
      }

      float func2(float e, float f, float g, float h)
      {
        precise float result = (e*f) + (g*h);  // ensures a precise return value
        return result;
      }

      void func3(float i, float j, precise out float k)
      {
        k = i * i + j;                   // precise, due to <k> declaration
      }

      void main(void)
      {
        vec3 r = vec3(a * b);             // precise, used to compute v.xyz
        vec3 s = vec3(c * d);             // precise, used to compute v.xyz
        v.xyz = r + s;                    // precise
        v.w = (a.w * b.w) + (c.w * d.w);  // precise
        v.x = func(a.x, b.x, c.x, d.x);   // values computed in func()
                                          // are NOT precise
        v.x = func2(a.x, b.x, c.x, d.x);  // precise!
        func3(a.x * b.x, c.x * d.x, v.x); // precise!
      }


    Modify Section 8.3, Common Functions, p. 104

    (add support for floating-point multiply-add)

      Syntax:

        genType fma(genType a, genType b, genType c);

      Computes and returns a * b + c.

      In uses where the return value is eventually consumed by a variable
      declared as precise:

       * fma() is considered a single operation, whereas the expression
         "a*b + c" consumed by a variable declared precise is considered two
         operations.
       * The precision of fma() can differ from the precision of the expression
         "a*b + c".
       * fma() will be computed with the same precision as any other fma()
         consumed by a precise variable, giving invariant results for the same
         input values of a, b, and c.

      Otherwise, in the absence of precise consumption, there are no special
      constraints on the number of operations or difference in precision
      between fma() and the expression "a*b + c".
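
    As a non-normative illustration of the rules above, the following
    vertex shader sketch computes the same value once with fma() and once
    with the expression "a*b + c", with both results consumed by precise
    outputs. The variable names (a, b, c, fused, unfused) and the attribute
    locations are hypothetical and are not defined by this extension.

```glsl
#version 310 es
#extension GL_OES_gpu_shader5 : require

// Hypothetical vertex attributes; names and locations are illustrative.
layout(location = 0) in vec4 a;
layout(location = 1) in vec4 b;
layout(location = 2) in vec4 c;

// Both outputs are declared precise, so the fma() rules above apply.
precise out vec4 fused;    // result of a single fma() operation
precise out vec4 unfused;  // result of a multiply followed by an add

void main(void)
{
    // Treated as one operation; computed with the same precision as any
    // other fma() consumed by a precise variable.
    fused = fma(a, b, c);

    // Treated as two operations (a multiply, then an add), evaluated in
    // the order and with the precision written here.
    unfused = a * b + c;

    gl_Position = fused;
}
```

    Because the precision of fma() can differ from that of "a*b + c", the
    values of <fused> and <unfused> are not required to match, even for
    identical inputs.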


    Modify the table of functions in section 8.9.3 "Texture Gather
    Functions", changing the "Description" column for the existing
    textureGatherOffset functions on p. 127:

      Description

        Perform a texture gather operation as in textureGather, offset by
        <offset> as described in textureOffset, except that the <offset> can
        be variable (non-constant) and the implementation-dependent minimum
        and maximum offset values are given by the values of
        MIN_PROGRAM_TEXTURE_GATHER_OFFSET and
        MAX_PROGRAM_TEXTURE_GATHER_OFFSET, respectively.


    Add new textureGatherOffsets functions to the same table, on p. 127:

      Syntax

        gvec4 textureGatherOffsets(gsampler2D sampler, vec2 P,
                                   ivec2 offsets[4] [, int comp])
        gvec4 textureGatherOffsets(gsampler2DArray sampler, vec3 P,
                                   ivec2 offsets[4] [, int comp])
        vec4 textureGatherOffsets(sampler2DShadow sampler, vec2 P,
                                  float refZ, ivec2 offsets[4])
        vec4 textureGatherOffsets(sampler2DArrayShadow sampler, vec3 P,
                                  float refZ, ivec2 offsets[4])

      Description

        Operate identically to textureGatherOffset except that <offsets> is
        used to determine the location of the four texels to sample. Each of
        the four texels is obtained by applying the corresponding offset in
        <offsets> as a (u,v) coordinate offset to <P>, identifying the
        four-texel linear footprint, and then selecting texel (i0,j0) of
        that footprint. The specified values in <offsets> must be constant
        integral expressions.

New Implementation Dependent State

    None.

Issues

    Note: These issues apply specifically to the definition of the
    OES_gpu_shader5 specification, which is based on the OpenGL extension
    ARB_gpu_shader5 as updated in OpenGL 4.x. Resolved issues from
    ARB_gpu_shader5 have been removed, but some remain applicable to this
    extension. ARB_gpu_shader5 can be found in the OpenGL Registry.

    (1) What functionality was removed relative to ARB_gpu_shader5?

      - Instanced geometry support (moved into OES_geometry_shader)
      - Implicit conversions (moved to EXT_shader_implicit_conversions)
      - Interactions with features not supported by the underlying
        ES 3.1 API and Shading Language, including:
          * interactions with ARB_gpu_shader_fp64 and NV_gpu_shader5,
            including support for double-precision in implicit conversions
            and function overload resolution
          * multiple vertex streams (these require ARB_transform_feedback3)
          * textureGather built-in variants for cube map array and rectangle
            texture samplers
          * shading language function overloading rules involving the type
            double
      - Functionality already in OpenGL ES 3.00, including packing and
        unpacking of 16-bit types and converting floating-point values to or
        from their integer bit encodings.
      - Functionality already in OpenGL ES 3.10, including
          * splitting and building floating-point numbers from a significand
            and exponent, integer bitfield manipulation, and packing and
            unpacking vectors of 8-bit fixed-point data types.
          * a subset of the textureGather and textureGatherOffset builtins
            (but some textureGather builtins remain in this extension).
      - Functionality already in OES_sample_variables, including support for
        reading a mask of covered samples in a fragment shader.
      - Functionality already in OES_shader_multisample_interpolation,
        including support for interpolating a fragment shader input at a
        programmable offset relative to the pixel center, a programmable
        sample number, or at the centroid.
      - MAX_PROGRAM_TEXTURE_GATHER_COMPONENTS (Issue 8).

    (2) What functionality was changed and added relative to
        ARB_gpu_shader5?

      - Support for indexing into arrays of samplers was extended to all
        opaque types, and the description of allowed indices was rewritten
        in terms of dynamically uniform expressions, as was done when
        ARB_gpu_shader5 was promoted into OpenGL 4.0.
      - The only remaining API interaction is an increase in a
        minimum-maximum value, so no "Changes to the OpenGL ES Specification"
        sections are included above.
      - Arrays of images and shader storage blocks can only be indexed
        with constant integral expressions.

    (3) What should the rules on GLSL suffixing be?

    RESOLVED: "precise" is not a reserved keyword in ESSL 3.00, but it is
    a keyword in GLSL 4.40. ESSL 3.10 updated the reserved keyword list
    to include all keywords used or reserved in GLSL 4.40 (but not otherwise
    used in ES), and thus we can use "precise" in this spec by moving it
    from the reserved keywords section. See bug 11179.

    (4) Are changes to the "Order of Qualification" section needed?

    RESOLVED: No. ESSL 3.10 relaxes the ordering constraints similarly to
    GLSL 4.40, so there is no need for modifications to section 4.7
    in 3.00 (4.10 in 3.10) in this extension.

    (5) Are any more changes needed to the descriptions of texture gather?

    Probably not. Bug 11109 suggests cleanup to be applied to both desktop
    API and language specifications to make them cleaner and more
    consistent. The important parts of this cleanup were done in the texture
    gather functionality folded into ES 3.1, although some small language
    tweaks may still be needed.

    (6) Moved to EXT_shader_implicit_conversions Issue 4.

    (7) Should uniform and shader storage blocks be backable with buffer
        object subranges?

    RESOLVED: Yes. The section 4.3.9 "Interface Blocks" language picked up
    from desktop GL allows this (they are called "bind ranges").
    This is a
    spec oversight in ES, because BindBufferRange is fully supported in
    OpenGL ES 3.0.

    (8) Where is MAX_PROGRAM_TEXTURE_GATHER_COMPONENTS?

    RESOLVED: It was not added in Core GL because ARB_texture_gather and
    ARB_gpu_shader5 were both added to GL 4.0 and thus the query was
    unneeded. Since OpenGL ES 3.1 also includes texture gather and the
    multi-component gather support from gpu_shader5, the query was also
    unnecessary there and here. Bug 11002.

    (9) Some vendors may not be able to support dynamic indexing
        of arrays of images or shader storage blocks. What should we use
        instead?

    RESOLVED: Allow only 'constant integral expression' indices instead of
    'dynamically uniform integer expression' indices for arrays of images or
    shader storage blocks. For images this is done by carving out an
    exception in the general language for opaque types. For shader storage
    blocks, different rules are given for arrays of uniform blocks and
    arrays of shader storage blocks.

Revision History

    Rev.    Date      Author     Changes
    ----  ----------  ---------  -------------------------------------------------
     1    06/18/2014  dkoch      Initial OES version based on EXT.
                                 No functional changes.
     2    03/27/2015  dkoch      Add missing function and token sections.