Name

    EXT_gpu_shader5

Name Strings

    GL_EXT_gpu_shader5

Contact

    Jon Leech (oddhack 'at' sonic.net)
    Daniel Koch, NVIDIA (dkoch 'at' nvidia.com)

Contributors

    Daniel Koch, NVIDIA (dkoch 'at' nvidia.com)
    Pat Brown, NVIDIA (pbrown 'at' nvidia.com)
    Jesse Hall, Google
    Maurice Ribble, Qualcomm
    Bill Licea-Kane, Qualcomm
    Graham Connor, Imagination
    Ben Bowman, Imagination
    Jonathan Putsman, Imagination
    Marcin Kantoch, Mobica
    Slawomir Grajewski, Intel
    Contributors to ARB_gpu_shader5

Notice

    Copyright (c) 2010-2013 The Khronos Group Inc. Copyright terms at
        http://www.khronos.org/registry/speccopyright.html

    Portions Copyright (c) 2013-2014 NVIDIA Corporation.

Status

    Complete.

Version

    Last Modified Date: March 27, 2015
    Revision: 12

Number

    OpenGL ES Extension #178

Dependencies

    OpenGL ES 3.1 and OpenGL ES Shading Language 3.10 are required.

    This specification is written against the OpenGL ES 3.1 (March 17,
    2014) and OpenGL ES 3.10 Shading Language (March 17, 2014)
    Specifications.

    This extension interacts with EXT_geometry_shader.

Overview

    This extension provides a set of new features to the OpenGL ES Shading
    Language and related APIs to support capabilities of new GPUs, extending
    the capabilities of version 3.10 of the OpenGL ES Shading Language.
    Shaders using the new functionality provided by this extension should
    enable this functionality via the construct

      #extension GL_EXT_gpu_shader5 : require     (or enable)

    This extension provides a variety of new features for all shader types,
    including:

      * support for indexing into arrays of opaque types (samplers
        and atomic counters) using dynamically uniform integer expressions;

      * support for indexing into arrays of images and shader storage blocks
        using only constant integral expressions;

      * extending the uniform block capability to allow shaders to index
        into an array of uniform blocks;

      * a "precise" qualifier allowing computations to be carried out exactly
        as specified in the shader source to avoid optimization-induced
        invariance issues (which might cause cracking in tessellation);

      * new built-in functions supporting:

        * fused floating-point multiply-add operations;

      * extending the textureGather() built-in functions provided by
        OpenGL ES Shading Language 3.10:

        * allowing shaders to use arbitrary offsets computed at run-time to
          select a 2x2 footprint to gather from; and
        * allowing shaders to use separate independent offsets for each of
          the four texels returned, instead of requiring a fixed 2x2
          footprint.

New Procedures and Functions

    None

New Tokens

    None

Additions to the OpenGL ES 3.1 Specification

    Add to the end of section 8.13.2, "Coordinate Wrapping and Texel
    Selection":

    ... texture source color of (0,0,0,1) for all four source texels.

    The textureGatherOffsets built-in shader functions return a vector
    derived from sampling four texels in the image array of level
    <level_base>. For each of the four texel offsets specified by the
    <offsets> argument, the rules for the LINEAR minification filter are
    applied to identify a 2x2 texel footprint, from which the single texel
    T_i0_j0 is selected.
    A four-component vector is then assembled by taking a single component
    from each of the four T_i0_j0 texels in the same manner as for the
    textureGather function.


Additions to the OpenGL ES Shading Language 3.10 Specification

    Including the following line in a shader can be used to control the
    language features described in this extension:

      #extension GL_EXT_gpu_shader5 : <behavior>

    where <behavior> is as specified in section 3.4.

    A new preprocessor #define is added to the OpenGL ES Shading Language:

      #define GL_EXT_gpu_shader5        1


    Modifications to Section 3.7 (Keywords)

    Remove "precise" from the list of reserved keywords and add it to the
    list of keywords.

    Remove the last paragraph from section 3.9.3 "Dynamically Uniform
    Expressions" (starting "The definition is not used in this version...")


    Add to the introduction to section 4.1.7, "Opaque Types" on p. 26:

    When aggregated into arrays within a shader, opaque types can only be
    indexed with a dynamically uniform integral expression (see section
    3.9.3) unless otherwise noted; otherwise, results are undefined.


    Replace the first paragraph of section 4.1.7.1, "Samplers" (removing the
    second sentence) on p. 27:

    Sampler types (e.g., sampler2D) are opaque types, declared and behaving
    as described above for opaque types.

    Sampler variables are ...


    Modify Section 4.3.9 "Interface Blocks", as modified by
    EXT_geometry_shader and EXT_shader_io_blocks:

    (modify the paragraph starting "For uniform or shader storage blocks
    declared as an array", removing the requirement for indexing uniform
    blocks using constant expressions)

    For uniform or shader storage blocks declared as an array, each
    individual array element corresponds to a separate buffer object bind
    range, backing one instance of the block.
    As the array size indicates the number of buffer objects needed, uniform
    and shader storage block array declarations must specify an array size.
    All indices used to index a shader storage block array must be constant
    integral expressions. A uniform block array can only be indexed with a
    dynamically uniform integral expression; otherwise, results are
    undefined.


    Add new section 4.9gs5 before section 4.10 "Order of Qualification":

    4.9gs5 The Precise Qualifier

    Some algorithms may require that floating-point computations be carried
    out in exactly the manner specified in the source code, even if the
    implementation supports optimizations that could produce nearly
    equivalent results with higher performance. For example, many GL
    implementations support a "multiply-add" that can compute values such as

      float result = (float(a) * float(b)) + float(c);

    in a single operation. The result of a floating-point multiply-add may
    not always be identical to first doing a multiply yielding a
    floating-point result, and then doing a floating-point add. By default,
    implementations are permitted to perform optimizations that effectively
    modify the order of the operations used to evaluate an expression, even
    if those optimizations may produce slightly different results relative
    to unoptimized code.

    The qualifier "precise" will ensure that operations contributing to a
    variable's value are performed in the order and with the precision
    specified in the source code. Order of evaluation is determined by
    operator precedence and parentheses, as described in Section 5.
    Expressions must be evaluated with a precision consistent with the
    operation; for example, multiplying two "float" values must produce a
    single value with "float" precision.
    This effectively prohibits the arbitrary use of fused multiply-add
    operations if the intermediate multiply result is kept at a higher
    precision. For example:

      precise out vec4 position;

    declares that computations used to produce the value of "position" must
    be performed precisely using the order and precision specified. As with
    the invariant qualifier (section 4.6.1), the precise qualifier may be
    used to qualify a built-in or previously declared user-defined variable
    as being precise:

      out vec3 Color;
      precise Color;            // make existing Color be precise

    This qualifier will affect the evaluation of expressions used on the
    right-hand side of an assignment if and only if:

      * the variable assigned to is qualified as "precise"; or

      * the value assigned is used later in the same function, either
        directly or indirectly, on the right-hand side of an assignment to a
        variable declared as "precise".

    Expressions computed in a function are treated as precise only if
    assigned to a variable qualified as "precise" in that same function. Any
    other expressions within a function are not automatically treated as
    precise, even if they are used to determine a value that is returned by
    the function and directly assigned to a variable qualified as "precise".
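
    As a non-normative sketch of the rules above, the vertex shader below
    qualifies the built-in gl_Position as precise and contrasts a helper
    whose arithmetic is not covered with one that declares its own precise
    local. The uniform, attribute, and function names (uModel, uViewProj,
    aPosition, toWorldLoose, toWorldPrecise) are hypothetical and chosen
    only for illustration.

```glsl
#version 310 es
#extension GL_EXT_gpu_shader5 : require

// Hypothetical names, not defined by this extension.
uniform mat4 uModel;
uniform mat4 uViewProj;
in vec4 aPosition;

// Qualify the built-in output: expressions in main() that feed it are precise.
precise gl_Position;

// NOT precise: nothing here is assigned to a precise variable in this
// function, even if the caller stores the result in a precise variable.
vec4 toWorldLoose(vec4 p)
{
    return uModel * p;
}

// Precise: the product is assigned to a precise local in this same function.
vec4 toWorldPrecise(vec4 p)
{
    precise vec4 world = uModel * p;
    return world;
}

void main(void)
{
    vec4 world = toWorldPrecise(aPosition);
    gl_Position = uViewProj * world;    // precise, assigned to gl_Position
}
```

    Calling toWorldLoose() instead would leave the model transform free to
    be reordered or fused, which is the situation the "same function" rule
    describes.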

    Some examples of the use of "precise" include:

      in vec4 a, b, c, d;
      precise out vec4 v;

      float func(float e, float f, float g, float h)
      {
        return (e*f) + (g*h);                   // no special precision
      }

      float func2(float e, float f, float g, float h)
      {
        precise float result = (e*f) + (g*h);   // ensures a precise return value
        return result;
      }

      void func3(float i, float j, precise out float k)
      {
        k = i * i + j;                          // precise, due to <k> declaration
      }

      void main(void)
      {
        vec3 r = vec3(a * b);                   // precise, used to compute v.xyz
        vec3 s = vec3(c * d);                   // precise, used to compute v.xyz
        v.xyz = r + s;                          // precise
        v.w = (a.w * b.w) + (c.w * d.w);        // precise
        v.x = func(a.x, b.x, c.x, d.x);         // values computed in func()
                                                // are NOT precise
        v.x = func2(a.x, b.x, c.x, d.x);        // precise!
        func3(a.x * b.x, c.x * d.x, v.x);       // precise!
      }


    Modify Section 8.3, Common Functions, p. 104

    (add support for floating-point multiply-add)

      Syntax:

        genType fma(genType a, genType b, genType c);

      Computes and returns a * b + c.

      In uses where the return value is eventually consumed by a variable
      declared as precise:

        * fma() is considered a single operation, whereas the expression
          "a*b + c" consumed by a variable declared precise is considered two
          operations.
        * The precision of fma() can differ from the precision of the
          expression "a*b + c".
        * fma() will be computed with the same precision as any other fma()
          consumed by a precise variable, giving invariant results for the
          same input values of a, b, and c.

      Otherwise, in the absence of precise consumption, there are no special
      constraints on the number of operations or difference in precision
      between fma() and the expression "a*b + c".
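
    The distinction drawn above can be sketched, non-normatively, with a
    vertex shader that writes both forms to precise outputs. The attribute
    and output names (aValue, aScale, aBias, vFused, vSeparate) are
    hypothetical and exist only for this illustration.

```glsl
#version 310 es
#extension GL_EXT_gpu_shader5 : require

// Hypothetical attribute names, chosen only for this illustration.
in vec4 aValue;
in vec4 aScale;
in vec4 aBias;

// Both outputs are precise, so the fma() rules above apply to what feeds them.
precise out vec4 vFused;
precise out vec4 vSeparate;

void main(void)
{
    // Considered a single operation; the same inputs give the same result
    // as any other fma() that is consumed by a precise variable.
    vFused = fma(aValue, aScale, aBias);

    // Considered two operations: a multiply producing a float result,
    // followed by an add. The result may differ from vFused.
    vSeparate = aValue * aScale + aBias;

    gl_Position = aValue;
}
```

    If neither output were declared precise, an implementation would be
    free to evaluate both statements identically.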


    Modify the table of functions in section 8.9.3 "Texture Gather
    Functions", changing the "Description" column for the existing
    textureGatherOffset functions on p. 127:

      Description

        Perform a texture gather operation as in textureGather, offset by
        <offset> as described in textureOffset, except that the <offset> can
        be variable (non-constant) and the implementation-dependent minimum
        and maximum offset values are given by the values of
        MIN_PROGRAM_TEXTURE_GATHER_OFFSET and
        MAX_PROGRAM_TEXTURE_GATHER_OFFSET, respectively.


    Add new textureGatherOffsets functions to the same table, on p. 127:

      Syntax

        gvec4 textureGatherOffsets(gsampler2D sampler, vec2 P,
                                   ivec2 offsets[4] [, int comp])
        gvec4 textureGatherOffsets(gsampler2DArray sampler, vec3 P,
                                   ivec2 offsets[4] [, int comp])
        vec4 textureGatherOffsets(sampler2DShadow sampler, vec2 P,
                                  float refZ, ivec2 offsets[4])
        vec4 textureGatherOffsets(sampler2DArrayShadow sampler, vec3 P,
                                  float refZ, ivec2 offsets[4])

      Description

        Operate identically to textureGatherOffset except that <offsets> is
        used to determine the location of the four texels to sample. Each of
        the four texels is obtained by applying the corresponding offset in
        <offsets> as a (u,v) coordinate offset to <P>, identifying the
        four-texel linear footprint, and then selecting texel (i0,j0) of
        that footprint. The specified values in <offsets> must be constant
        integral expressions.

New Implementation Dependent State

    None.

Issues

    Note: These issues apply specifically to the definition of the
    EXT_gpu_shader5 specification, which is based on the OpenGL extension
    ARB_gpu_shader5 as updated in OpenGL 4.x. Resolved issues from
    ARB_gpu_shader5 have been removed, but some remain applicable to this
    extension. ARB_gpu_shader5 can be found in the OpenGL Registry.

    (1) What functionality was removed relative to ARB_gpu_shader5?

      - Instanced geometry support (moved into EXT_geometry_shader)
      - Implicit conversions (moved to EXT_shader_implicit_conversions)
      - Interactions with features not supported by the underlying
        ES 3.1 API and Shading Language, including:
          * interactions with ARB_gpu_shader_fp64 and NV_gpu_shader5,
            including support for double-precision in implicit conversions
            and function overload resolution
          * multiple vertex streams (these require ARB_transform_feedback3)
          * textureGather built-in variants for cube map array and rectangle
            texture samplers
          * shading language function overloading rules involving the type
            double
      - Functionality already in OpenGL ES 3.00, including packing and
        unpacking of 16-bit types and converting floating-point values to or
        from their integer bit encodings.
      - Functionality already in OpenGL ES 3.10, including
          * splitting and building floating-point numbers from a significand
            and exponent, integer bitfield manipulation, and packing and
            unpacking vectors of 8-bit fixed-point data types.
          * a subset of the textureGather and textureGatherOffset builtins
            (but some textureGather builtins remain in this extension).
      - Functionality already in OES_sample_variables, including support for
        reading a mask of covered samples in a fragment shader.
      - Functionality already in OES_shader_multisample_interpolation,
        including support for interpolating a fragment shader input at a
        programmable offset relative to the pixel center, a programmable
        sample number, or at the centroid.
      - MAX_PROGRAM_TEXTURE_GATHER_COMPONENTS (Issue 8).

    (2) What functionality was changed and added relative to
        ARB_gpu_shader5?

      - Support for indexing into arrays of samplers was extended to all
        opaque types, and the description of allowed indices was rewritten
        in terms of dynamically uniform expressions, as was done when
        ARB_gpu_shader5 was promoted into OpenGL 4.0.
      - The only remaining API interaction is an increase in a
        minimum-maximum value, so no "Changes to the OpenGL ES Specification"
        sections are included above.
      - Arrays of images and shader storage blocks can only be indexed
        with constant integral expressions.

    (3) What should the rules on GLSL suffixing be?

    RESOLVED: "precise" is not a reserved keyword in ESSL 3.00, but it is
    a keyword in GLSL 4.40. ESSL 3.10 updated the reserved keyword list
    to include all keywords used or reserved in GLSL 4.40 (but not otherwise
    used in ES) and thus we can use "precise" in this spec by moving it
    from the reserved keywords section. See bug 11179.

    (4) Are changes to the "Order of Qualification" section needed?

    RESOLVED: No. ESSL 3.10 relaxes the ordering constraints similarly to
    GLSL 4.40. Thus there is no need for modifications to section 4.7
    in 3.00 (4.10 in 3.10) in this extension.

    (5) Are any more changes needed to the descriptions of texture gather?

    Probably not. Bug 11109 suggests cleanup to be applied to both desktop
    API and language specifications to make them cleaner and more
    consistent. The important parts of this cleanup were done in the texture
    gather functionality folded into ES 3.1, although some small language
    tweaks may still be needed.

    (6) Moved to EXT_shader_implicit_conversions Issue 4.

    (7) Should uniform and shader storage blocks be backable with buffer
        object subranges?

    RESOLVED: Yes. The section 4.3.7 "Interface Blocks" language picked up
    from desktop GL allows this (they are called "bind ranges").
    This is a spec oversight in ES, because BindBufferRange is fully
    supported in OpenGL ES 3.0.

    (8) Where is MAX_PROGRAM_TEXTURE_GATHER_COMPONENTS?

    RESOLVED. It was not added in Core GL because ARB_texture_gather and
    ARB_gpu_shader5 were both added to GL 4.0 and thus the query was
    unneeded. Since OpenGL ES 3.1 also includes texture gather and the
    multi-component gather support from gpu_shader5, the query was also
    unnecessary there and here. Bug 11002.

    (9) Some vendors may not be able to support dynamic indexing
        of arrays of images or shader storage blocks. What should we use
        instead?

    RESOLVED: Only allowing 'constant integral expression' instead of
    'dynamically uniform integer expression' for arrays of images or shader
    storage blocks. For images this is done by carving out an exception in
    the general language for opaque types. For shader storage blocks,
    different rules are given for arrays of uniform blocks and arrays of
    shader storage blocks.

Revision History

    Revision 1, 2013/10/27 (Jon Leech)
      - Initial version based on ARB_gpu_shader5

    Revision 2, 2013/11/06 (Jon Leech)
      - Update Issues list with unresolved issues 4-7, which are dependent
        on decisions to be made by the ARB and ES working groups.
      - Remove {un,}packUnorm2x16EXT (already in ESSL 3.00)
      - Match changes to ES 3.1 texture gather language, but still
        reorganize the textureGather functions into their own subsection &
        table. ES 3.1 restored the [, int comp] argument to the functions
        it defined. Removed sampler2DRect variants incorrectly left in.
      - Clean up function overloading example text and opened bug 11178 to
        resolve possible problems with the GLSL 4.40 language this is
        based on.
      - Remove reference to image2DMS, since there is no longer any image
        load/store support for multisample textures in ES 3.1
      - Add issue (8) regarding "bind ranges".

    Revision 3, 2013/11/14 (Jon Leech)
      - Resolve function overloading issue 7, per bug 11178.

    Revision 4, 2013/11/20 (Jon Leech)
      - Sync with ES 3.1 spec language update.
      - Refer to ES 3.1 instead of ES 3plus.

    Revision 5, 2013/11/21 (Daniel Koch)
      - removed implicit conversion language (to a separate document).
      - updated textureGather functions to reflect the shadow gather
        functionality being added in ES 3.1.
      - added issue 9.

    Revision 6, 2013/12/18 (Daniel Koch)
      - minor cleanup
      - added issue 10, restrict arrays of images to const-int-expr

    Revision 7, 2014/02/12 (Daniel Koch)
      - restrict indexing arrays of shader storage blocks to const-int-expr.
      - Resolved issues 4, 5, 8, 9, 10 and supporting edits.

    Revision 8, 2014/03/10 (Jon Leech)
      - Rebase on OpenGL ES 3.1 and change suffix to EXT.
      - Remove textureGather functions already present in the existing
        GLSL-ES 3.10 spec section 8.9.3

    Revision 9, 2014/03/26 (Daniel Koch)
      - update contributors

    Revision 10, 2014/03/28 (Jon Leech)
      - Sync with released ES 3.1 specs. Reflow text.

    Revision 11, 2014/04/01 (Daniel Koch)
      - Update contributors

    Revision 12, 2015/03/27 (Daniel Koch)
      - Add missing function and token sections.