Name

    OES_gpu_shader5

Name Strings

    GL_OES_gpu_shader5

Contact

    Jon Leech (oddhack 'at' sonic.net)
    Daniel Koch, NVIDIA (dkoch 'at' nvidia.com)

Contributors

    Daniel Koch, NVIDIA (dkoch 'at' nvidia.com)
    Pat Brown, NVIDIA (pbrown 'at' nvidia.com)
    Jesse Hall, Google
    Maurice Ribble, Qualcomm
    Bill Licea-Kane, Qualcomm
    Graham Connor, Imagination
    Ben Bowman, Imagination
    Jonathan Putsman, Imagination
    Marcin Kantoch, Mobica
    Slawomir Grajewski, Intel
    Contributors to ARB_gpu_shader5

Notice

    Copyright (c) 2010-2013 The Khronos Group Inc. Copyright terms at
        http://www.khronos.org/registry/speccopyright.html

Specification Update Policy

    Khronos-approved extension specifications are updated in response to
    issues and bugs prioritized by the Khronos OpenGL ES Working Group. For
    extensions which have been promoted to a core Specification, fixes will
    first appear in the latest version of that core Specification, and will
    eventually be backported to the extension document. This policy is
    described in more detail at
        https://www.khronos.org/registry/OpenGL/docs/update_policy.php

    Portions Copyright (c) 2013-2014 NVIDIA Corporation.

Status

    Approved by the OpenGL ES Working Group
    Ratified by the Khronos Board of Promoters on November 7, 2014

Version

    Last Modified Date: March 27, 2015
    Revision: 2

Number

    OpenGL ES Extension #211

Dependencies

    OpenGL ES 3.1 and OpenGL ES Shading Language 3.10 are required.

    This specification is written against the OpenGL ES 3.1 (March 17,
    2014) and OpenGL ES 3.10 Shading Language (March 17, 2014)
    Specifications.

    This extension interacts with OES_geometry_shader.

Overview

    This extension provides a set of new features to the OpenGL ES Shading
    Language and related APIs to support capabilities of new GPUs, extending
    the capabilities of version 3.10 of the OpenGL ES Shading Language.
    Shaders using the new functionality provided by this extension should
    enable this functionality via the construct

      #extension GL_OES_gpu_shader5 : require     (or enable)

    This extension provides a variety of new features for all shader types,
    including:

      * support for indexing into arrays of opaque types (samplers and
        atomic counters) using dynamically uniform integer expressions;

      * support for indexing into arrays of images and shader storage blocks
        using only constant integral expressions;

      * extending the uniform block capability to allow shaders to index
        into an array of uniform blocks;

      * a "precise" qualifier allowing computations to be carried out exactly
        as specified in the shader source to avoid optimization-induced
        invariance issues (which might cause cracking in tessellation);

      * new built-in functions supporting:

        * fused floating-point multiply-add operations;

      * extending the textureGather() built-in functions provided by
        OpenGL ES Shading Language 3.10:

        * allowing shaders to use arbitrary offsets computed at run-time to
          select a 2x2 footprint to gather from; and
        * allowing shaders to use separate independent offsets for each of
          the four texels returned, instead of requiring a fixed 2x2
          footprint.

New Procedures and Functions

    None

New Tokens

    None

Additions to the OpenGL ES 3.1 Specification

    Add to the end of section 8.13.2, "Coordinate Wrapping and Texel
    Selection":

      ... texture source color of (0,0,0,1) for all four source texels.

      The textureGatherOffsets built-in shader functions return a vector
      derived from sampling four texels in the image array of level
      <level_base>. For each of the four texel offsets specified by the
      <offsets> argument, the rules for the LINEAR minification filter are
      applied to identify a 2x2 texel footprint, from which the single texel
      T_i0_j0 is selected. A four-component vector is then assembled by taking
      a single component from each of the four T_i0_j0 texels in the same
      manner as for the textureGather function.


Additions to the OpenGL ES Shading Language 3.10 Specification

    Including the following line in a shader can be used to control the
    language features described in this extension:

      #extension GL_OES_gpu_shader5 : <behavior>

    where <behavior> is as specified in section 3.4.

    A new preprocessor #define is added to the OpenGL ES Shading Language:

      #define GL_OES_gpu_shader5 1


    Modifications to Section 3.7 (Keywords)

    Remove "precise" from the list of reserved keywords and add it to the
    list of keywords.

    Remove the last paragraph from section 3.9.3 "Dynamically Uniform
    Expressions" (starting "The definition is not used in this version...").


    Add to the introduction to section 4.1.7, "Opaque Types" on p. 26:

      When aggregated into arrays within a shader, opaque types can only be
      indexed with a dynamically uniform integral expression (see section
      3.9.3) unless otherwise noted; otherwise, results are undefined.


    Replace the first paragraph of section 4.1.7.1, "Samplers" (removing the
    second sentence) on p. 27:

      Sampler types (e.g., sampler2D) are opaque types, declared and behaving
      as described above for opaque types.

      Sampler variables are ...
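
    The textureGatherOffsets texel-selection rule added to section 8.13.2
    can be modeled on the CPU. The C code below is only a sketch under
    stated assumptions. The Texture2D struct and the gather_offsets
    function are hypothetical. The model samples a single level and uses
    CLAMP_TO_EDGE wrapping. It also assumes that result component k comes
    from the texel selected by offsets[k], which is our reading of
    "corresponding offset". For each offset it applies the LINEAR footprint
    rule i0 = floor(u + du - 0.5), j0 = floor(v + dv - 0.5) and keeps only
    texel T_i0_j0.

```c
#include <stddef.h>

/* Hypothetical CPU-side texture: one level, width x height texels,
   4 float components per texel, stored row by row. */
typedef struct {
    int width;
    int height;
    const float *texels;  /* width * height * 4 floats */
} Texture2D;

/* floor() for float -> int without libm (assumes x fits in an int). */
static int floor_to_int(float x)
{
    int i = (int)x;                    /* truncates toward zero */
    return (x < (float)i) ? i - 1 : i;
}

/* CLAMP_TO_EDGE wrapping (an assumption; other wrap modes differ). */
static int clamp_to_edge(int i, int size)
{
    return i < 0 ? 0 : (i >= size ? size - 1 : i);
}

/* Models textureGatherOffsets(sampler, P, offsets, comp) for a 2D texture:
   each offsets[k] selects texel T_i0_j0 of its LINEAR footprint, and
   component `comp` of that texel becomes out[k]. */
void gather_offsets(const Texture2D *tex, float s, float t,
                    const int offsets[4][2], int comp, float out[4])
{
    float u = s * (float)tex->width;   /* normalized -> texel space */
    float v = t * (float)tex->height;
    for (int k = 0; k < 4; ++k) {
        int i0 = floor_to_int(u + (float)offsets[k][0] - 0.5f);
        int j0 = floor_to_int(v + (float)offsets[k][1] - 0.5f);
        i0 = clamp_to_edge(i0, tex->width);
        j0 = clamp_to_edge(j0, tex->height);
        out[k] = tex->texels[((size_t)j0 * (size_t)tex->width + (size_t)i0) * 4
                             + (size_t)comp];
    }
}
```

    Take a 4x4 texture sampled at P = (0.5, 0.5). Offset (0,0) then selects
    texel (1,1) and offset (1,0) selects texel (2,1). Each offset moves its
    own footprint independently, unlike the single shared footprint of
    textureGatherOffset.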


    Modify Section 4.3.9 "Interface Blocks", as modified by
    OES_geometry_shader and OES_shader_io_blocks:

    (modify the paragraph starting "For uniform or shader storage blocks
    declared as an array", removing the requirement for indexing uniform
    blocks using constant expressions)

      For uniform or shader storage blocks declared as an array, each
      individual array element corresponds to a separate buffer object bind
      range, backing one instance of the block. As the array size indicates
      the number of buffer objects needed, uniform and shader storage block
      array declarations must specify an array size. All indices used to index
      a shader storage block array must be constant integral expressions. A
      uniform block array can only be indexed with a dynamically uniform
      integral expression, otherwise results are undefined.


    Add new section 4.9gs5 before section 4.10 "Order of Qualification":

      4.9gs5 The Precise Qualifier

      Some algorithms may require that floating-point computations be carried
      out in exactly the manner specified in the source code, even if the
      implementation supports optimizations that could produce nearly
      equivalent results with higher performance. For example, many GL
      implementations support a "multiply-add" that can compute values such as

        float result = (float(a) * float(b)) + float(c);

      in a single operation. The result of a floating-point multiply-add may
      not always be identical to first doing a multiply yielding a
      floating-point result, and then doing a floating-point add. By default,
      implementations are permitted to perform optimizations that effectively
      modify the order of the operations used to evaluate an expression, even
      if those optimizations may produce slightly different results relative
      to unoptimized code.
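
    Reordering operations can change a floating-point result, and this is
    easy to reproduce in C with IEEE single precision. The hypothetical
    functions sum_left and sum_right below add the same three values in two
    association orders. The volatile temporaries are only there to force
    each intermediate to be rounded to float, which loosely mimics what
    "precise" requires of shader code.

```c
/* Evaluates (a + b) + c, rounding the intermediate to float. */
float sum_left(float a, float b, float c)
{
    volatile float ab = a + b;   /* volatile forces a float-rounded temporary */
    return ab + c;
}

/* Evaluates a + (b + c): same terms, different association. */
float sum_right(float a, float b, float c)
{
    volatile float bc = b + c;   /* volatile forces a float-rounded temporary */
    return a + bc;
}
```

    Call both with a = 1e20f, b = -1e20f and c = 1.0f. sum_left returns 1.0f
    because the two large terms cancel first. sum_right returns 0.0f because
    1.0f is absorbed when it is added to -1e20f.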

      The qualifier "precise" will ensure that operations contributing to a
      variable's value are performed in the order and with the precision
      specified in the source code. Order of evaluation is determined by
      operator precedence and parentheses, as described in Section &5.
      Expressions must be evaluated with a precision consistent with the
      operation; for example, multiplying two "float" values must produce a
      single value with "float" precision. This effectively prohibits the
      arbitrary use of fused multiply-add operations if the intermediate
      multiply result is kept at a higher precision. For example:

        precise out vec4 position;

      declares that computations used to produce the value of "position" must
      be performed precisely using the order and precision specified. As with
      the invariant qualifier (section &4.6.1), the precise qualifier may be
      used to qualify a built-in or previously declared user-defined variable
      as being precise:

        out vec3 Color;
        precise Color;              // make existing Color be precise

      This qualifier will affect the evaluation of expressions used on the
      right-hand side of an assignment if and only if:

        * the variable assigned to is qualified as "precise"; or

        * the value assigned is used later in the same function, either
          directly or indirectly, on the right-hand side of an assignment to
          a variable declared as "precise".

      Expressions computed in a function are treated as precise only if
      assigned to a variable qualified as "precise" in that same function. Any
      other expressions within a function are not automatically treated as
      precise, even if they are used to determine a value that is returned by
      the function and directly assigned to a variable qualified as "precise".
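
    In straight-line code, the two conditions above amount to a backward
    pass. The pass starts from assignments to "precise" variables and marks
    every earlier assignment whose value flows into them. The C code below
    is a simplified, hypothetical model of that pass, with hypothetical
    names Stmt and mark_precise. It handles only whole-variable writes and
    at most 32 variables, with no control flow, swizzles or function calls.
    It is not meant to describe how any particular compiler implements the
    rule.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical IR statement: `dest = f(variables in uses)`. Variables are
   small integers (0..31) so a set of variables fits in one bitmask. */
typedef struct {
    int dest;         /* variable written by this assignment */
    unsigned uses;    /* bitmask of variables read on the right-hand side */
} Stmt;

/* Sets is_precise[k] for each of the n statements of one function body.
   precise_vars is the bitmask of variables declared "precise". */
void mark_precise(const Stmt *stmts, size_t n, unsigned precise_vars,
                  bool *is_precise)
{
    unsigned needed = 0;  /* variables whose pending value feeds a precise result */
    for (size_t k = n; k-- > 0;) {          /* walk backward from the end */
        unsigned dest_bit = 1u << stmts[k].dest;
        /* Rule 1: the assigned variable is precise.
           Rule 2: the value is used later, directly or indirectly,
           in an assignment to a precise variable. */
        bool precise = (precise_vars & dest_bit) != 0 || (needed & dest_bit) != 0;
        is_precise[k] = precise;
        if (precise) {
            needed &= ~dest_bit;        /* this statement defines that value */
            needed |= stmts[k].uses;    /* its operands now feed a precise result */
        }
    }
}
```

    As an example, model "r = a*b; t = a*c; s = c*d; v = r + s;" with v
    declared precise. Every statement is marked except the one assigning
    the unused t.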

      Some examples of the use of "precise" include:

        in vec4 a, b, c, d;
        precise out vec4 v;

        float func(float e, float f, float g, float h)
        {
            return (e*f) + (g*h);                  // no special precision
        }

        float func2(float e, float f, float g, float h)
        {
            precise float result = (e*f) + (g*h);  // ensures a precise return value
            return result;
        }

        void func3(float i, float j, precise out float k)
        {
            k = i * i + j;                         // precise, due to <k> declaration
        }

        void main(void)
        {
            vec3 r = vec3(a * b);                  // precise, used to compute v.xyz
            vec3 s = vec3(c * d);                  // precise, used to compute v.xyz
            v.xyz = r + s;                         // precise
            v.w = (a.w * b.w) + (c.w * d.w);       // precise
            v.x = func(a.x, b.x, c.x, d.x);        // values computed in func()
                                                   // are NOT precise
            v.x = func2(a.x, b.x, c.x, d.x);       // precise!
            func3(a.x * b.x, c.x * d.x, v.x);      // precise!
        }


    Modify Section 8.3, Common Functions, p. 104

    (add support for floating-point multiply-add)

      Syntax:

        genType fma(genType a, genType b, genType c);

      Computes and returns a * b + c.

      In uses where the return value is eventually consumed by a variable
      declared as precise:

        * fma() is considered a single operation, whereas the expression
          "a*b + c" consumed by a variable declared precise is considered two
          operations.
        * The precision of fma() can differ from the precision of the
          expression "a*b + c".
        * fma() will be computed with the same precision as any other fma()
          consumed by a precise variable, giving invariant results for the
          same input values of a, b, and c.

      Otherwise, in the absence of precise consumption, there are no special
      constraints on the number of operations or difference in precision
      between fma() and the expression "a*b + c".
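
    The precision difference between fma() and "a*b + c" can be illustrated
    in C. In the hypothetical mul_then_add below, the product is rounded to
    float before the add, as in the two-operation form. mul_add_wide keeps
    the product in double instead. That product is exact for two float
    operands, and the sum is rounded back to float at the end. This matches
    the higher-precision intermediate that the precise qualifier rules out
    for "a*b + c". It is an illustration only, not a correctly rounded
    fused operation for every input.

```c
/* Two roundings: the product is rounded to float, then the sum is rounded. */
float mul_then_add(float a, float b, float c)
{
    volatile float product = a * b;  /* volatile blocks contraction into an FMA */
    return product + c;
}

/* Product kept at higher precision: a float * float product is exact in
   double (24 + 24 significand bits fit in 53), so rounding happens only in
   the final add and the conversion back to float. */
float mul_add_wide(float a, float b, float c)
{
    double product = (double)a * (double)b;
    return (float)(product + (double)c);
}
```

    Take a = b = 1 + 2^-12 and c = -(1 + 2^-11). The exact product is
    1 + 2^-11 + 2^-24, which rounds to 1 + 2^-11 in float. So mul_then_add
    returns 0.0f while mul_add_wide returns 2^-24.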


    Modify the table of functions in section 8.9.3 "Texture Gather
    Functions", changing the "Description" column for the existing
    textureGatherOffset functions on p. 127:

      Description

        Perform a texture gather operation as in textureGather, offset by
        <offset> as described in textureOffset, except that the <offset> can
        be variable (non-constant) and the implementation-dependent minimum
        and maximum offset values are given by the values of
        MIN_PROGRAM_TEXTURE_GATHER_OFFSET and
        MAX_PROGRAM_TEXTURE_GATHER_OFFSET, respectively.


    Add new textureGatherOffsets functions to the same table, on p. 127:

      Syntax

        gvec4 textureGatherOffsets(gsampler2D sampler, vec2 P,
                                   ivec2 offsets[4] [, int comp])
        gvec4 textureGatherOffsets(gsampler2DArray sampler, vec3 P,
                                   ivec2 offsets[4] [, int comp])
        vec4 textureGatherOffsets(sampler2DShadow sampler, vec2 P,
                                  float refZ, ivec2 offsets[4])
        vec4 textureGatherOffsets(sampler2DArrayShadow sampler, vec3 P,
                                  float refZ, ivec2 offsets[4])

      Description

        Operate identically to textureGatherOffset except that <offsets> is
        used to determine the location of the four texels to sample. Each of
        the four texels is obtained by applying the corresponding offset in
        <offsets> as a (u,v) coordinate offset to <P>, identifying the
        four-texel LINEAR footprint, and then selecting texel (i0,j0) of
        that footprint. The specified values in <offsets> must be constant
        integral expressions.

New Implementation Dependent State

    None.

Issues

    Note: These issues apply specifically to the definition of the
    OES_gpu_shader5 specification, which is based on the OpenGL extension
    ARB_gpu_shader5 as updated in OpenGL 4.x. Resolved issues from
    ARB_gpu_shader5 have been removed, but some remain applicable to this
    extension. ARB_gpu_shader5 can be found in the OpenGL Registry.

    (1) What functionality was removed relative to ARB_gpu_shader5?

        - Instanced geometry support (moved into OES_geometry_shader)
        - Implicit conversions (moved to EXT_shader_implicit_conversions)
        - Interactions with features not supported by the underlying
          ES 3.1 API and Shading Language, including:
          * interactions with ARB_gpu_shader_fp64 and NV_gpu_shader,
            including support for double-precision in implicit conversions
            and function overload resolution
          * multiple vertex streams (these require ARB_transform_feedback3)
          * textureGather built-in variants for cube map array and rectangle
            texture samplers
          * shading language function overloading rules involving the type
            double
        - Functionality already in OpenGL ES 3.00, including packing and
          unpacking of 16-bit types and converting floating-point values to
          or from their integer bit encodings.
        - Functionality already in OpenGL ES 3.10, including
          * splitting and building floating-point numbers from a significand
            and exponent, integer bitfield manipulation, and packing and
            unpacking vectors of 8-bit fixed-point data types.
          * a subset of the textureGather and textureGatherOffset builtins
            (but some textureGather builtins remain in this extension).
        - Functionality already in OES_sample_variables, including support
          for reading a mask of covered samples in a fragment shader.
        - Functionality already in OES_shader_multisample_interpolation,
          including support for interpolating a fragment shader input at a
          programmable offset relative to the pixel center, a programmable
          sample number, or at the centroid.
        - MAX_PROGRAM_TEXTURE_GATHER_COMPONENTS (Issue 8).

    (2) What functionality was changed and added relative to
        ARB_gpu_shader5?

        - Support for indexing into arrays of samplers was extended to all
          opaque types, and the description of allowed indices was rewritten
          in terms of dynamically uniform expressions, as was done when
          ARB_gpu_shader5 was promoted into OpenGL 4.0.
        - The only remaining API interaction is an increase in a
          minimum-maximum value, so no "Changes to the OpenGL ES
          Specification" sections are included above.
        - Arrays of images and shader storage blocks can only be indexed
          with constant integral expressions.

    (3) What should the rules on GLSL suffixing be?

        RESOLVED: "precise" is not a reserved keyword in ESSL 3.00, but it is
        a keyword in GLSL 4.40. ESSL 3.10 updated the reserved keyword list
        to include all keywords used or reserved in GLSL 4.40 (but not
        otherwise used in ES), and thus we can use "precise" in this spec by
        moving it from the reserved keywords section. See bug 11179.

    (4) Are changes to the "Order of Qualification" section needed?

        RESOLVED. No. ESSL 3.10 relaxes the ordering constraints similarly to
        GLSL 4.40, so there is no need for modifications to section 4.7 in
        3.00 (4.10 in 3.10) in this extension.

    (5) Are any more changes needed to the descriptions of texture gather?

        Probably not. Bug 11109 suggests cleanup to be applied to both the
        desktop API and language specifications to make them cleaner and
        more consistent. The important parts of this cleanup were done in the
        texture gather functionality folded into ES 3.1, although some small
        language tweaks may still be needed.

    (6) Moved to EXT_shader_implicit_conversions Issue 4.

    (7) Should uniform and shader storage blocks be backable with buffer
        object subranges?

        RESOLVED: Yes. The section 4.3.7 "Interface Blocks" language picked up
        from desktop GL allows this (they are called "bind ranges").
        This is a spec oversight in ES, because BindBufferRange is fully
        supported in OpenGL ES 3.0.

    (8) Where is MAX_PROGRAM_TEXTURE_GATHER_COMPONENTS?

        RESOLVED. It was not added in Core GL because ARB_texture_gather and
        ARB_gpu_shader5 were both added to GL 4.0, and thus the query was
        unneeded. Since OpenGL ES 3.1 also includes texture gather and the
        multi-component gather support from gpu_shader5, the query was also
        unnecessary there and here. Bug 11002.

    (9) Some vendors may not be able to support dynamic indexing of arrays
        of images or shader storage blocks. What should we use instead?

        RESOLVED: Only allow 'constant integral expression' instead of
        'dynamically uniform integer expression' for arrays of images or
        shader storage blocks. For images this is done by carving out an
        exception in the general language for opaque types. For shader
        storage blocks, different rules are given for arrays of uniform
        blocks and arrays of shader storage blocks.

Revision History

    Rev.  Date        Author    Changes
    ----  ----------  --------  -------------------------------------------
    1     06/18/2014  dkoch     Initial OES version based on EXT.
                                No functional changes.
    2     03/27/2015  dkoch     Add missing function and token sections.