Name

    ARB_gpu_shader5

Name Strings

    GL_ARB_gpu_shader5

Contact

    Pat Brown, NVIDIA Corporation (pbrown 'at' nvidia.com)

Contributors

    Barthold Lichtenbelt, NVIDIA
    Bill Licea-Kane, AMD
    Bruce Merry, ARM
    Chris Dodd, NVIDIA
    Eric Werness, NVIDIA
    Graham Sellers, AMD
    Greg Roth, NVIDIA
    Jeff Bolz, NVIDIA
    Nick Haemel, AMD
    Pierre Boudier, AMD
    Piers Daniell, NVIDIA

Notice

    Copyright (c) 2010-2013 The Khronos Group Inc. Copyright terms at
        http://www.khronos.org/registry/speccopyright.html

Specification Update Policy

    Khronos-approved extension specifications are updated in response to
    issues and bugs prioritized by the Khronos OpenGL Working Group. For
    extensions which have been promoted to a core Specification, fixes will
    first appear in the latest version of that core Specification, and will
    eventually be backported to the extension document. This policy is
    described in more detail at
        https://www.khronos.org/registry/OpenGL/docs/update_policy.php

Status

    Complete. Approved by the ARB at the 2010/01/22 F2F meeting.
    Approved by the Khronos Board of Promoters on March 10, 2010.

Version

    Version 16, March 30, 2012

Number

    ARB Extension #88

Dependencies

    This extension is written against the OpenGL 3.2 (Compatibility Profile)
    Specification.

    This extension is written against Version 1.50 (Revision 09) of the OpenGL
    Shading Language Specification.

    OpenGL 3.2 and GLSL 1.50 are required.

    This extension interacts with ARB_gpu_shader_fp64.

    This extension interacts with NV_gpu_shader5.

    This extension interacts with ARB_sample_shading.

    This extension interacts with ARB_texture_gather.

Overview

    This extension provides a set of new features to the OpenGL Shading
    Language and related APIs to support capabilities of new GPUs, extending
    the capabilities of version 1.50 of the OpenGL Shading Language. Shaders
    using the new functionality provided by this extension should enable this
    functionality via the construct

      #extension GL_ARB_gpu_shader5 : require     (or enable)

    This extension provides a variety of new features for all shader types,
    including:

      * support for indexing into arrays of samplers using non-constant
        indices, as long as the index doesn't diverge if multiple shader
        invocations are run in lockstep;

      * extending the uniform block capability of OpenGL 3.1 and 3.2 to allow
        shaders to index into an array of uniform blocks;

      * support for implicitly converting signed integer types to unsigned
        types, as well as more general implicit conversion and function
        overloading infrastructure to support new data types introduced by
        other extensions;

      * a "precise" qualifier allowing computations to be carried out exactly
        as specified in the shader source to avoid optimization-induced
        invariance issues (which might cause cracking in tessellation);

      * new built-in functions supporting:

        * fused floating-point multiply-add operations;

        * splitting a floating-point number into a significand and exponent
          (frexp), or building a floating-point number from a significand and
          exponent (ldexp);

        * integer bitfield manipulation, including functions to find the
          position of the most or least significant set bit, count the number
          of one bits, and bitfield insertion, extraction, and reversal;

        * packing and unpacking vectors of small fixed-point data types into a
          larger scalar; and

        * converting floating-point values to or from their integer bit
          encodings;

      * extending the textureGather() built-in functions provided by
        ARB_texture_gather:

        * allowing shaders to select any single component of a multi-component
          texture to produce the gathered 2x2 footprint;

        * allowing shaders to perform a per-sample depth comparison when
          gathering the 2x2 footprint for shadow sampler types;

        * allowing shaders to use arbitrary offsets computed at run-time to
          select a 2x2 footprint to gather from; and

        * allowing shaders to use separate independent offsets for each of the
          four texels returned, instead of requiring a fixed 2x2 footprint.

    This extension also provides some new capabilities for individual
    shader types, including:

      * support for instanced geometry shaders, where a geometry shader may be
        run multiple times for each primitive, including a built-in
        gl_InvocationID to identify the invocation number;

      * support for emitting vertices in a geometry shader where each vertex
        emitted may be directed independently at a specified vertex stream (as
        provided by ARB_transform_feedback3), and where each shader output is
        associated with a stream;

      * support for reading a mask of covered samples in a fragment shader;
        and

      * support for interpolating a fragment shader input at a programmable
        offset relative to the pixel center, a programmable sample number, or
        at the centroid.

IP Status

    No known IP claims.

New Procedures and Functions

    None

New Tokens

    Accepted by the <pname> parameter of GetProgramiv:

        GEOMETRY_SHADER_INVOCATIONS                     0x887F

    Accepted by the <pname> parameter of GetBooleanv, GetIntegerv, GetFloatv,
    GetDoublev, and GetInteger64v:

        MAX_GEOMETRY_SHADER_INVOCATIONS                 0x8E5A
        MIN_FRAGMENT_INTERPOLATION_OFFSET               0x8E5B
        MAX_FRAGMENT_INTERPOLATION_OFFSET               0x8E5C
        FRAGMENT_INTERPOLATION_OFFSET_BITS              0x8E5D
        MAX_VERTEX_STREAMS                              0x8E71

    (note:  MAX_GEOMETRY_SHADER_INVOCATIONS,
    MIN_FRAGMENT_INTERPOLATION_OFFSET, MAX_FRAGMENT_INTERPOLATION_OFFSET, and
    FRAGMENT_INTERPOLATION_OFFSET_BITS have identical values to corresponding
    "NV" enums from NV_gpu_program5.  MAX_VERTEX_STREAMS is also defined in
    ARB_transform_feedback3.)


Additions to Chapter 2 of the OpenGL 3.2 (Compatibility Profile) Specification
(OpenGL Operation)

    Modify Section 2.15.4, Geometry Shader Execution Environment, p. 121

    (add two unnumbered subsections after "Texture Access", p. 122)

    Instanced Geometry Shaders

    For each input primitive received by the geometry shader pipeline stage,
    the geometry shader may be run once or multiple times.  The number of
    times a geometry shader should be executed for each input primitive may be
    specified using a layout qualifier in a geometry shader of a linked
    program.  If the invocation count is not specified in any layout
    qualifier, the invocation count will be one.

    Each separate geometry shader invocation is assigned a unique invocation
    number.  For a geometry shader with <N> invocations, each input primitive
    spawns <N> invocations, numbered 0 through <N>-1.  The built-in input
    variable gl_InvocationID may be used by a geometry shader invocation to
    determine its invocation number.

    When executing instanced geometry shaders, the output primitives generated
    from each input primitive are passed to subsequent pipeline stages using
    the shader invocation number to order the output.  The first primitives
    received by the subsequent pipeline stages are those emitted by the shader
    invocation numbered zero, followed by those from the shader invocation
    numbered one, and so forth.  Additionally, all output primitives generated
    from a given input primitive are passed to subsequent pipeline stages
    before any output primitives generated from subsequent input primitives.


    Geometry Shader Vertex Streams

    Geometry shaders may emit primitives to multiple independent vertex
    streams.  Each vertex emitted by the geometry shader is directed at one of
    the vertex streams.  As vertices are received on each stream, they are
    arranged into primitives of the type specified by the geometry shader
    output primitive type.  The shading language built-in functions
    EndPrimitive() and EndStreamPrimitive() may be used to end the primitive
    being assembled on a given vertex stream and start a new empty primitive
    of the same type.  If an implementation supports <N> vertex streams, the
    individual streams are numbered 0 through <N>-1.  There is no requirement
    on the order of the streams to which vertices are emitted, and the number
    of vertices emitted to each stream may be completely independent, subject
    only to implementation-dependent output limits.

    The primitives emitted to all vertex streams are passed to the transform
    feedback stage to be captured and written to buffer objects in the manner
    specified by the transform feedback state.  The primitives emitted to all
    streams but stream zero are discarded after transform feedback.
    Primitives emitted to stream zero are passed to subsequent pipeline stages
    for clipping, rasterization, and subsequent fragment processing.
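
    The routing of primitives by stream described above can be modeled in a
    few lines of C.  This is only an illustrative sketch, not part of the
    extension: the function name route_points() and its parameters are
    hypothetical, and it assumes the "points" output primitive type so that
    every emitted vertex is a complete primitive.

```c
#include <assert.h>
#include <stddef.h>

/* stream[i] is the vertex stream that the i-th emitted point was directed at.
 * On return, captured[s] holds the number of points that reach the transform
 * feedback stage on stream s.  The return value is the number of points that
 * continue on to clipping and rasterization. */
size_t route_points(const int *stream, size_t count,
                    int num_streams, size_t *captured)
{
    size_t rasterized = 0;

    for (int s = 0; s < num_streams; ++s)
        captured[s] = 0;

    for (size_t i = 0; i < count; ++i) {
        assert(stream[i] >= 0 && stream[i] < num_streams);
        captured[stream[i]]++;      /* all streams reach transform feedback */
        if (stream[i] == 0)
            rasterized++;           /* only stream zero is passed further   */
    }
    return rasterized;
}
```

    Whether a point is actually written to a buffer object still depends on
    the transform feedback state, which this sketch does not model.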

    Geometry shaders that emit vertices to multiple vertex streams are
    currently limited to using only the "points" output primitive type.  A
    program will fail to link if it includes a geometry shader that calls the
    EmitStreamVertex() built-in function and has any other output primitive
    type parameter.


Additions to Chapter 3 of the OpenGL 3.2 (Compatibility Profile) Specification
(Rasterization)

    Modify Section 3.3.1, Multisampling, p. 148

    (add new paragraph at the end of the section, p. 149)

    If MULTISAMPLE is enabled and the current program object includes a
    fragment shader with one or more input variables qualified with "sample
    in", the data associated with those variables will be assigned
    independently.  The values for each sample must be evaluated at the
    location of the sample.  The data associated with any other variables not
    qualified with "sample in" need not be evaluated independently for each
    sample.


    Modify ARB_texture_gather, "Changes to Section 3.8.8"

    (extend language describing the operation of textureGather, allowing the
    new <comp> argument to select any of the four components from a
    multi-component texel vector)

    The textureGather and textureGatherOffset built-in shader functions...  A
    four-component vector is then assembled by taking a single component from
    the swizzled texture source colors of the four texels, in the order
    T_i0_j1, T_i1_j1, T_i1_j0, and T_i0_j0.  The selected component is
    identified by the optional <comp> argument, where the values zero, one,
    two, and three identify the Rs, Gs, Bs, or As component, respectively.  If
    <comp> is omitted, it is treated as identifying the Rs component.
    Incomplete textures (section 3.8.10) are considered to return a texture
    source color of (0,0,0,1) for all four source texels.
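
    The assembly order of the gathered vector is easy to get wrong, so a
    small C model may help.  It is a sketch under stated assumptions: the
    types Texel and Footprint and the function gather_component() are
    hypothetical names introduced here, and the footprint is assumed to hold
    texels that have already been swizzled.

```c
#include <assert.h>

/* One texel after the texture swizzle: c[0..3] = Rs, Gs, Bs, As. */
typedef struct { float c[4]; } Texel;

/* The 2x2 footprint; t[j][i] corresponds to texel T_i<i>_j<j>. */
typedef struct { Texel t[2][2]; } Footprint;

/* Assemble the gathered vector for component comp (0..3). */
void gather_component(const Footprint *fp, int comp, float out[4])
{
    assert(comp >= 0 && comp <= 3);
    out[0] = fp->t[1][0].c[comp];   /* T_i0_j1 */
    out[1] = fp->t[1][1].c[comp];   /* T_i1_j1 */
    out[2] = fp->t[0][1].c[comp];   /* T_i1_j0 */
    out[3] = fp->t[0][0].c[comp];   /* T_i0_j0 */
}
```

    Calling gather_component() with comp equal to zero corresponds to the
    behavior when <comp> is omitted.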

    (add further language describing textureGatherOffsets)

    The textureGatherOffsets built-in functions from the OpenGL Shading
    Language return a vector derived from sampling four texels in the image
    array of level <level_base>.  For each of the four texel offsets specified
    by the <offsets> argument, the rules for the LINEAR minification filter
    are applied to identify a 2x2 texel footprint, from which the single texel
    T_i0_j0 is selected.  A four-component vector is then assembled by taking
    a single component from each of the four T_i0_j0 texels in the same manner
    as for the textureGather function.


    Modify Section 3.12.1, Shader Variables, p. 273

    (insert prior to the last paragraph of the section, p. 274)

    When interpolating built-in and user-defined varying variables, the default
    screen-space location at which these variables are sampled is defined in
    previous rasterization sections.  The default location may be overridden by
    interpolation qualifiers.  When interpolating variables declared using
    "centroid in", the variable is sampled at a location within the pixel
    covered by the primitive generating the fragment.  When interpolating
    variables declared using "sample in" when MULTISAMPLE is enabled, the
    fragment shader will be invoked separately for each covered sample and the
    variable will be sampled at the corresponding sample point.

    Additionally, built-in fragment shader functions provide further
    fine-grained control over interpolation.  The built-in functions
    interpolateAtCentroid() and interpolateAtSample() will sample variables as
    though they were declared with the "centroid" or "sample" qualifiers,
    respectively.  The built-in function interpolateAtOffset() will sample
    variables at a specified (x,y) offset relative to the center of the pixel.
    The range and granularity of offsets supported by this function are
    implementation-dependent.
    If either component of the specified offset is
    less than MIN_FRAGMENT_INTERPOLATION_OFFSET or greater than
    MAX_FRAGMENT_INTERPOLATION_OFFSET, the position used to interpolate the
    variable is undefined.  Not all values of <offset> may be supported; x and
    y offsets may be rounded to fixed-point values with the number of fraction
    bits given by the implementation-dependent constant
    FRAGMENT_INTERPOLATION_OFFSET_BITS.


    Modify Section 3.12.2, Shader Execution, p. 274

    (insert prior to the next-to-last paragraph in "Shader Inputs", p. 277)

    The built-in variable gl_SampleMaskIn[] is an integer array holding
    bitfields indicating the set of fragment samples covered by the primitive
    corresponding to the fragment shader invocation.  The number of elements
    in the array is ceil(<s>/32), where <s> is the maximum number of color
    samples supported by the implementation.  Bit <n> of element <w> in the
    array is set if and only if the sample numbered <w>*32+<n> is considered
    covered for this fragment shader invocation.  When rendering to a
    non-multisample buffer, or if multisample rasterization is disabled, all
    bits are zero except for bit zero of the first array element.  That bit
    will be one if the pixel is covered and zero otherwise.  Bits in the
    sample mask corresponding to covered samples that will be killed due to
    SAMPLE_COVERAGE or SAMPLE_MASK_NV will not be set (section 4.1.3).  When
    per-sample shading is active due to the use of a fragment input qualified
    by "sample", only the bit for the current sample is set in
    gl_SampleMaskIn.  When OpenGL API state specifies multiple fragment shader
    invocations for a given fragment, the sample mask for any single fragment
    shader invocation may specify a subset of the covered samples for the
    fragment.  In this case, the bit corresponding to each covered sample will
    be set in exactly one fragment shader invocation.
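
    The indexing arithmetic for gl_SampleMaskIn[] can be written out as a
    short C sketch.  The helper names sample_mask_words() and
    sample_is_covered() are hypothetical and exist only for illustration;
    the mask is passed as an array of int to mirror the integer array type
    of the built-in variable.

```c
#include <assert.h>
#include <stddef.h>

/* Number of array elements for an implementation supporting at most
 * max_samples color samples: ceil(max_samples / 32). */
size_t sample_mask_words(unsigned max_samples)
{
    return (max_samples + 31u) / 32u;
}

/* Returns 1 if the given sample number is marked covered, 0 otherwise. */
int sample_is_covered(const int *mask, size_t words, unsigned sample)
{
    size_t   w = sample / 32u;      /* element <w> of the array */
    unsigned n = sample % 32u;      /* bit <n> of that element  */

    assert(w < words);
    return (int)(((unsigned)mask[w] >> n) & 1u);
}
```

    For example, with a two-element mask of { 0x5, 0x1 }, samples 0, 2, and
    32 test as covered and sample 1 does not.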


Additions to Chapter 4 of the OpenGL 3.2 (Compatibility Profile) Specification
(Per-Fragment Operations and the Frame Buffer)

    None.

Additions to Chapter 5 of the OpenGL 3.2 (Compatibility Profile) Specification
(Special Functions)

    None.

Additions to Chapter 6 of the OpenGL 3.2 (Compatibility Profile) Specification
(State and State Requests)

    Modify Section 6.1.16, Shader and Program Queries, p. 384

    (add to long first paragraph, p. 386) ... If <pname> is
    GEOMETRY_SHADER_INVOCATIONS, the number of geometry shader invocations per
    primitive will be returned.  If GEOMETRY_VERTICES_OUT,
    GEOMETRY_INPUT_TYPE, GEOMETRY_OUTPUT_TYPE, or GEOMETRY_SHADER_INVOCATIONS
    are queried for a program which has not been linked successfully, or which
    does not contain objects to form a geometry shader, then an
    INVALID_OPERATION error is generated.


Additions to Appendix A of the OpenGL 3.2 (Compatibility Profile)
Specification (Invariance)

    None.

Additions to the AGL/GLX/WGL Specifications

    None.

Modifications to The OpenGL Shading Language Specification, Version 1.50
(Revision 09)

    Including the following line in a shader can be used to control the
    language features described in this extension:

      #extension GL_ARB_gpu_shader5 : <behavior>

    where <behavior> is as specified in section 3.3.

    New preprocessor #defines are added to the OpenGL Shading Language:

      #define GL_ARB_gpu_shader5        1


    Modify Section 3.6, Keywords, p. 14

    (add to the keyword list)

      sample


    Modify Section 4.1.7, Samplers, p. 23

    (modify 1st paragraph of the section, deleting the restriction requiring
    constant indexing of sampler arrays but still requiring uniform indexing
    across invocations) ...
    Samplers may be aggregated into arrays within a
    shader (using square brackets [ ]) and can be indexed with general integer
    expressions.  The results of accessing a sampler array with an
    out-of-bounds index are undefined. ...

    (add new paragraph restricting the use of general integer expression in
    sampler array indexing) When indexing an array of samplers, the integer
    expression used to index the array must be uniform across shader
    invocations.  If this restriction is not satisfied, the results of
    accessing the sampler array are undefined.  For the purposes of this
    uniformity test, the index used for texture lookups performed inside a
    loop is considered uniform for the <n>th loop iteration if all shader
    invocations that execute the loop at least <n> times compute the same
    index on that iteration.  For texture lookups inside a function other than
    main(), an index is considered uniform if the value is the same for all
    invocations calling the function from the same point in the caller.  For
    nested loops and function calls, the uniformity test requires that the
    index match only those other shader invocations with identical loop
    iteration counts and function call chains.


    Modify Section 4.1.10, Implicit Conversions, p. 27

    (modify table of implicit conversions)

                                Can be implicitly
        Type of expression        converted to
        ---------------------   -----------------
        int                     uint, float
        ivec2                   uvec2, vec2
        ivec3                   uvec3, vec3
        ivec4                   uvec4, vec4

        uint                    float
        uvec2                   vec2
        uvec3                   vec3
        uvec4                   vec4

    (modify second paragraph of the section) No implicit conversions are
    provided to convert from unsigned to signed integer types or from
    floating-point to integer types.  There are no implicit array or structure
    conversions.
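
    The scalar rows of the conversion table can be restated as a small C
    predicate.  This is a sketch for illustration only: the ScalarType
    enumeration and the function implicitly_converts() are hypothetical
    names, and the vector rows are assumed to follow the same rule for
    vectors of matching size.

```c
/* Scalar fundamental types covered by the table above. */
typedef enum { T_INT, T_UINT, T_FLOAT } ScalarType;

/* Returns 1 if an expression of type from may be implicitly converted to
 * type to, and 0 otherwise.  Identical types need no conversion and are
 * reported as 0. */
int implicitly_converts(ScalarType from, ScalarType to)
{
    switch (from) {
    case T_INT:
        return to == T_UINT || to == T_FLOAT;
    case T_UINT:
        return to == T_FLOAT;       /* no unsigned-to-signed conversion */
    default:
        return 0;                   /* no floating-point-to-integer one */
    }
}
```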

    (insert before the final paragraph of the section) When performing
    implicit conversion for binary operators, there may be multiple data types
    to which the two operands can be converted.  For example, when adding an
    int value to a uint value, both values can be implicitly converted to uint
    and float.  In such cases, a floating-point type is chosen if either
    operand has a floating-point type.  Otherwise, an unsigned integer type is
    chosen if either operand has an unsigned integer type.  Otherwise, a
    signed integer type is chosen.


    Modify Section 4.3, Storage Qualifiers, p. 29

    (add to first table on the page)

      Qualifier        Meaning
      --------------   ----------------------------------------
      sample in        linkage with per-sample interpolation
      sample out       linkage with per-sample interpolation

    (modify third paragraph, p. 29) These interpolation qualifiers may only
    precede the qualifiers in, centroid in, sample in, out, centroid out, or
    sample out in a declaration. ...


    Modify Section 4.3.4, Inputs, p. 31

    (modify first paragraph of section) Shader input variables are declared
    with the in, centroid in, or sample in storage qualifiers. ... Variables
    declared as in, centroid in, or sample in may not be written to during
    shader execution. ...

    (modify third paragraph, p. 32) ... Fragment shader inputs get
    per-fragment values, typically interpolated from a previous stage's
    outputs.  They are declared in fragment shaders with the in, centroid in,
    or sample in storage qualifiers or the deprecated varying and centroid
    varying storage qualifiers. ...

    (add to examples immediately below)

      sample in vec4 perSampleColor;


    Modify Section 4.3.6, Outputs, p. 33

    (modify first paragraph of section) Shader output variables are declared
    with the out, centroid out, or sample out storage qualifiers. ...

    (modify third paragraph of section) Vertex and geometry output variables
    output per-vertex data and are declared using the out, centroid out, or
    sample out storage qualifiers, or the deprecated varying storage
    qualifier.

    (add to examples immediately below)

      sample out vec4 perSampleColor;

    (modify last paragraph, p. 33) Fragment outputs output per-fragment data
    and are declared using the out storage qualifier.  It is an error to use
    centroid out or sample out in a fragment shader. ...


    Modify Section 4.3.7, Interface Blocks, p. 34

    (modify last paragraph, p. 36, removing the requirement for indexing
    uniform blocks using constant expressions) For uniform blocks declared as
    arrays, each individual array element corresponds to a separate buffer
    object backing one instance of the block.  As the array size indicates the
    number of buffer objects needed, uniform block array declarations must
    specify an integral array size.  Arbitrary indices may be used to index a
    uniform block array; integral constant expressions are not required.  If
    the index used to access an array of uniform blocks is out-of-bounds, the
    results of the access are undefined.


    Modify Section 4.3.8.1, Input Layout Qualifiers, p. 37

    (modify last paragraph, p. 37, and subsequent paragraphs on p. 38)

    Geometry shaders support input layout qualifiers.  There are two types of
    layout qualifiers used to specify an input primitive type and an
    invocation count.  The input primitive type and invocation count
    qualifiers are allowed only on the interface qualifier in, not on an input
    block, block member, or variable.

      layout-qualifier-id
        points
        lines
        lines_adjacency
        triangles
        triangles_adjacency
        invocations = integer-constant

    The identifiers "points", "lines", "lines_adjacency", "triangles", and
    "triangles_adjacency" are used to specify the type of input primitive
    accepted by the geometry shader, and only one of these is accepted.  At
    least one geometry shader (compilation unit) in a program must declare an
    input primitive type, and all geometry shader input primitive type
    declarations in a program must declare the same type.  It is not required
    that all geometry shaders in a program declare an input primitive type.

    The identifier "invocations" is used to specify the number of times the
    geometry shader is invoked for each input primitive received.  Invocation
    count declarations are optional.  If no invocation count is declared in
    any geometry shader in the program, the geometry shader will be run once
    for each input primitive.  If an invocation count is declared, all such
    declarations must specify the same count.  If a shader specifies an
    invocation count greater than the implementation-dependent maximum, it
    will fail to compile.

    For example,

      layout(triangles, invocations=6) in;

    will establish that all inputs to the geometry shader are triangles and
    that the geometry shader is run six times for each triangle processed.

    All geometry shader input unsized array declarations ...


    Modify Section 4.3.8.2, Output Layout Qualifiers, p. 40

    (modify second and subsequent paragraphs, p. 40)

    Geometry shaders can have output layout qualifiers.  There are three types
    of output layout qualifiers used to specify an output primitive type, a
    maximum output vertex count, and per-output stream numbers.
    The output
    primitive type and output vertex count qualifiers are allowed only on the
    interface qualifier out, not on an output block, block member, or variable
    declaration.  The output stream number qualifier is allowed on the
    interface qualifier out, or on output blocks or variable declarations.

    The layout qualifier identifiers for geometry shader outputs are

      layout-qualifier-id
        points
        line_strip
        triangle_strip
        max_vertices = integer-constant
        stream = integer-constant

    The identifiers "points", "line_strip", and "triangle_strip" are used to
    specify the type of output primitive produced by the geometry shader, and
    only one of these is accepted.  At least one geometry shader (compilation
    unit) in a program must declare an output primitive type, and all geometry
    shader output primitive type declarations in a program must declare the
    same primitive type.  It is not required that all geometry shaders in a
    program declare an output primitive type.

    The identifier "max_vertices" is used to specify the maximum number of
    vertices the shader will ever emit in a single invocation.  At least one
    geometry shader (compilation unit) in a program must declare a maximum
    output vertex count, and all geometry shader output vertex count
    declarations in a program must declare the same count.  It is not required
    that all geometry shaders in a program declare a count.

    In the example,

      layout(triangle_strip, max_vertices = 60) out;  // order does not matter
      layout(max_vertices = 60) out;                  // redeclaration okay
      layout(triangle_strip) out;                     // redeclaration okay
      layout(points) out;                // error, contradicts triangle_strip
      layout(max_vertices = 30) out;     // error, contradicts 60

    all outputs from the geometry shader are triangles and at most 60 vertices
    will be emitted by the shader.
    It is an error for the maximum number of
    vertices to be greater than gl_MaxGeometryOutputVertices.

    The identifier "stream" is used to specify that a geometry shader output
    variable or block is associated with a particular vertex stream (numbered
    beginning with zero).  A default stream number may be declared at global
    scope by qualifying interface qualifier out as in this example:

      layout(stream = 1) out;

    The stream number specified in such a declaration replaces any previous
    default and applies to all subsequent block and variable declarations
    until a new default is established.  The initial default stream number is
    zero.

    Each output block or non-block output variable is associated with a vertex
    stream.  If the block or variable is declared with a stream qualifier, it
    is associated with the specified stream; otherwise, it is associated with
    the current default stream.  A block member may be declared with a stream
    qualifier, but the specified stream must match the stream associated with
    the containing block.  One example:

      layout(stream=1) out;             // default is now stream 1
      out vec4 var1;                    // var1 gets default stream (1)
      layout(stream=2) out Block1 {     // "Block1" belongs to stream 2
        layout(stream=2) vec4 var2;     // redundant block member stream decl
        layout(stream=3) vec2 var3;     // ILLEGAL (must match block stream)
        vec3 var4;                      // belongs to stream 2
      };
      layout(stream=0) out;             // default is now stream 0
      out vec4 var5;                    // var5 gets default stream (0)
      out Block2 {                      // "Block2" gets default stream (0)
        vec4 var6;
      };
      layout(stream=3) out vec4 var7;   // var7 belongs to stream 3

    If a geometry shader output block or variable is declared more than once,
    all such declarations must associate the variable with the same vertex
    stream.  If any stream declaration specifies a non-existent stream number,
    the shader will fail to compile.
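
    The default-stream bookkeeping implied by these rules can be sketched in
    C as a tiny state machine.  Nothing here is part of the extension: the
    StreamState type, the NO_STREAM marker, and the function names are
    hypothetical, and the check against the number of supported streams is
    left out for brevity.

```c
#define NO_STREAM (-1)  /* the declaration carries no stream qualifier */

/* Tracks the current default stream while declarations are processed. */
typedef struct { int default_stream; } StreamState;

/* The initial default stream number is zero. */
void stream_state_init(StreamState *s) { s->default_stream = 0; }

/* Models "layout(stream = n) out;" at global scope. */
void set_default_stream(StreamState *s, int n) { s->default_stream = n; }

/* Stream associated with an output block or non-block output variable. */
int resolve_stream(const StreamState *s, int qualifier)
{
    return qualifier == NO_STREAM ? s->default_stream : qualifier;
}

/* A block member's stream qualifier, if present, must match its block. */
int member_stream_ok(int block_stream, int member_qualifier)
{
    return member_qualifier == NO_STREAM || member_qualifier == block_stream;
}
```

    Replaying the example above through these helpers gives stream 1 for
    var1, stream 2 for Block1, a rejected qualifier for var3, stream 0 for
    var5 and Block2, and stream 3 for var7.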

    Built-in geometry shader outputs are always associated with vertex stream
    zero.

    Each vertex emitted by the geometry shader is assigned to a specific
    stream, and the attributes of the emitted vertex are taken from the set of
    output blocks and variables assigned to the targeted stream.  After each
    vertex is emitted, the values of all output variables become undefined.
    Additionally, the output variables associated with each vertex stream may
    share storage.  Writing to an output variable associated with one stream
    may overwrite output variables associated with any other stream.  When
    emitting each vertex, a geometry shader should write to all outputs
    associated with the stream to which the vertex will be emitted and to no
    outputs associated with any other stream.


    Modify Section 4.3.9, Interpolation, p. 42

    (modify first paragraph of section, add reference to sample in/out) The
    presence of and type of interpolation is controlled by the storage
    qualifiers centroid in, sample in, centroid out, and sample out, by the
    optional interpolation qualifiers smooth, flat, and noperspective, and by
    default behaviors established through the OpenGL API when no interpolation
    qualifier is present. ...

    (modify second paragraph) ... A variable may be qualified as flat centroid
    or flat sample, which will mean the same thing as qualifying it only as
    flat.

    (replace last paragraph, p. 42)

    When multisample rasterization is disabled, or for fragment shader input
    variables qualified with neither "centroid in" nor "sample in", the value
    of the assigned variable may be interpolated anywhere within the pixel and
    a single value may be assigned to each sample within the pixel, to the
    extent permitted by the OpenGL Specification.

    When multisample rasterization is enabled, "centroid" and "sample" may be
    used to control the location and frequency of the sampling of the
    qualified fragment shader input.  If a fragment shader input is qualified
    with "centroid", a single value may be assigned to that variable for all
    samples in the pixel, but that value must be interpolated at a location
    that lies in both the pixel and in the primitive being rendered, including
    any of the pixel's samples covered by the primitive.  Because the location
    at which the variable is sampled may be different in neighboring pixels,
    derivatives of centroid-sampled inputs may be less accurate than those for
    non-centroid interpolated variables.  If a fragment shader input is
    qualified with "sample", a separate value must be assigned to that
    variable for each covered sample in the pixel, and that value must be
    sampled at the location of the individual sample.


    (Insert before Section 4.7, Order of Qualification, p. 47)

    Section 4.Q, The Precise Qualifier

    Some algorithms may require that floating-point computations be carried
    out in exactly the manner specified in the source code, even if the
    implementation supports optimizations that could produce nearly equivalent
    results with higher performance.  For example, many GL implementations
    support a "multiply-add" that can compute values such as

      float result = (float(a) * float(b)) + float(c);

    in a single operation.  The result of a floating-point multiply-add may
    not always be identical to first doing a multiply yielding a
    floating-point result, and then doing a floating-point add.  By default,
    implementations are permitted to perform optimizations that effectively
    modify the order of the operations used to evaluate an expression, even if
    those optimizations may produce slightly different results relative to
    unoptimized code.
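
    The difference between the two evaluation strategies can be reproduced
    on a CPU.  The C sketch below is illustrative only: the function names
    are hypothetical, and the single-operation multiply-add is modeled by
    keeping the product in double precision, which is an assumption about
    how such hardware behaves rather than a statement about any particular
    GL implementation.

```c
/* Multiply, round the product to float, then add. */
float mul_then_add(float a, float b, float c)
{
    volatile float product = a * b;  /* volatile forces rounding to float */
    return product + c;
}

/* Multiply-add that keeps the product at higher precision before the add.
 * A double holds the product of two floats exactly. */
float mul_add_wide(float a, float b, float c)
{
    return (float)((double)a * (double)b + (double)c);
}
```

    With a = b = 1 + 2^-12 and c = -(1 + 2^-11), the exact product is
    1 + 2^-11 + 2^-24.  Rounding that product to float discards the 2^-24
    term, so mul_then_add() returns zero, while mul_add_wide() returns
    2^-24.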

    The qualifier "precise" will ensure that operations contributing to a
    variable's value are performed in the order and with the precision
    specified in the source code. Order of evaluation is determined by
    operator precedence and parentheses, as described in Section 5.
    Expressions must be evaluated with a precision consistent with the
    operation; for example, multiplying two "float" values must produce a
    single value with "float" precision. This effectively prohibits the
    arbitrary use of fused multiply-add operations if the intermediate
    multiply result is kept at a higher precision. For example:

      precise out vec4 position;

    declares that computations used to produce the value of "position" must be
    performed precisely using the order and precision specified. As with the
    invariant qualifier (section 4.6.1), the precise qualifier may be used to
    qualify a built-in or previously declared user-defined variable as being
    precise:

      out vec3 Color;
      precise Color;            // make existing Color be precise

    This qualifier will affect the evaluation of expressions used on the
    right-hand side of an assignment if and only if:

      * the variable assigned to is qualified as "precise"; or

      * the value assigned is used later in the same function, either directly
        or indirectly, on the right-hand side of an assignment to a variable
        declared as "precise".

    Expressions computed in a function are treated as precise only if assigned
    to a variable qualified as "precise" in that same function. Any other
    expressions within a function are not automatically treated as precise,
    even if they are used to determine a value that is returned by the
    function and directly assigned to a variable qualified as "precise".

    Some examples of the use of "precise" include:

      in vec4 a, b, c, d;
      precise out vec4 v;

      float func(float e, float f, float g, float h)
      {
        return (e*f) + (g*h);            // no special precision
      }

      float func2(float e, float f, float g, float h)
      {
        precise float result = (e*f) + (g*h);  // ensures a precise return value
        return result;
      }

      void func3(float i, float j, precise out float k)
      {
        k = i * i + j;                   // precise, due to <k> declaration
      }

      void main(void)
      {
        vec3 r = vec3(a * b);            // precise, used to compute v.xyz
        vec3 s = vec3(c * d);            // precise, used to compute v.xyz
        v.xyz = r + s;                   // precise
        v.w = (a.w * b.w) + (c.w * d.w); // precise
        v.x = func(a.x, b.x, c.x, d.x);  // values computed in func()
                                         // are NOT precise
        v.x = func2(a.x, b.x, c.x, d.x); // precise!
        func3(a.x * b.x, c.x * d.x, v.x); // precise!
      }


    Modify Section 4.7, Order of Qualification, p. 47

    When multiple qualifications are present, they must follow a strict order.
    This order is as follows:

      precise-qualifier invariant-qualifier interpolation-qualifier storage-qualifier
        precision-qualifier


    Modify Section 5.9, Expressions, p. 57

    (modify bulleted list as follows, adding support for implicit conversion
    between signed and unsigned types)

    Expressions in the shading language are built from the following:

    * Constants of type bool, int, int64_t, uint, uint64_t, float, all vector
      types, and all matrix types.

    ...

    * The operator modulus (%) operates on signed or unsigned integer scalars
      or vectors. If the fundamental types of the operands do not match, the
      conversions from Section 4.1.10 "Implicit Conversions" are applied to
      produce matching types. ...


    Modify Section 6.1, Function Definitions, p. 63

    (modify description of overloading, beginning at the top of p. 64)

    Function names can be overloaded. The same function name can be used for
    multiple functions, as long as the parameter types differ. If a function
    name is declared twice with the same parameter types, then the return
    types and all qualifiers must also match, and it is the same function
    being declared. For example,

      vec4 f(in vec4 x, out vec4 y);        // (A)
      vec4 f(in vec4 x, out uvec4 y);       // (B) okay, different argument type
      vec4 f(in ivec4 x, out uvec4 y);      // (C) okay, different argument type

      int  f(in vec4 x, out vec4 y);        // error, only return type differs
      vec4 f(in vec4 x, in vec4 y);         // error, only qualifier differs
      vec4 f(const in vec4 x, out vec4 y);  // error, only qualifier differs

    When function calls are resolved, an exact type match for all the
    arguments is sought. If an exact match is found, all other functions are
    ignored, and the exact match is used. If no exact match is found, then
    the implicit conversions in Section 4.1.10 (Implicit Conversions) will be
    applied to find a match. Mismatched types on input parameters (in or
    inout or default) must have a conversion from the calling argument type
    to the formal parameter type. Mismatched types on output parameters (out
    or inout) must have a conversion from the formal parameter type to the
    calling argument type.

    If implicit conversions can be used to find more than one matching
    function, a single best-matching function is sought. To determine a best
    match, the conversions between calling argument and formal parameter
    types are compared for each function argument and pair of matching
    functions. After these comparisons are performed, each pair of matching
    functions is compared. A function definition A is considered a better
    match than function definition B if:

      * for at least one function argument, the conversion for that argument
        in A is better than the corresponding conversion in B; and

      * there is no function argument for which the conversion in B is better
        than the corresponding conversion in A.

    If a single function definition is considered a better match than every
    other matching function definition, it will be used. Otherwise, a
    semantic error occurs and the shader will fail to compile.

    To determine whether the conversion for a single argument in one match is
    better than that for another match, the following rules are applied, in
    order:

      1. An exact match is better than a match involving any implicit
         conversion.

      2. A match involving an implicit conversion from float to double is
         better than a match involving any other implicit conversion.

      3. A match involving an implicit conversion from either int or uint to
         float is better than a match involving an implicit conversion from
         either int or uint to double.

    If none of the rules above apply to a particular pair of conversions,
    neither conversion is considered better than the other.

    For the function prototypes (A), (B), and (C) above, the following
    examples show how the rules apply to different sets of calling argument
    types:

      f(vec4, vec4);        // exact match of vec4 f(in vec4 x, out vec4 y)
      f(vec4, uvec4);       // exact match of vec4 f(in vec4 x, out uvec4 y)
      f(vec4, ivec4);       // matched to vec4 f(in vec4 x, out vec4 y)
                            //   (C) not relevant, can't convert vec4 to
                            //   ivec4. (A) better than (B) for 2nd
                            //   argument (rule 2), same on first argument.
      f(ivec4, vec4);       // NOT matched. All three match by implicit
                            //   conversion. (C) is better than (A) and (B)
                            //   on the first argument. (A) is better than
                            //   (B) and (C) on the second argument.
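
    As a further illustration of rule 1, the following is a minimal sketch
    of a fragment shader with a two-function overload set. The function name
    scale() and the output fragColor are hypothetical and chosen only for
    this example; the comments apply the rules exactly as listed above.

```glsl
#version 150
#extension GL_ARB_gpu_shader5 : require

// Hypothetical overload set, named for illustration only.
uint  scale(uint v,  uint k)  { return v * k; }   // (1)
float scale(float v, float k) { return v * k; }   // (2)

out vec4 fragColor;

void main()
{
    // Calling argument types (uint, int):
    //   (1) exact match on v, implicit int -> uint on k
    //   (2) implicit uint -> float on v, implicit int -> float on k
    // (1) is better on the first argument (rule 1) and (2) is not better
    // on the second argument, so (1) is selected.
    uint a = scale(3u, 2);

    // Calling argument types (int, int): both (1) and (2) need an implicit
    // conversion for every argument, and none of the three rules ranks
    // int -> uint against int -> float. No single best match exists, so
    // this call is expected to fail to compile and is left commented out.
    // float b = scale(1, 2);

    fragColor = vec4(float(a));
}
```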


    Modify Section 7.1, Vertex And Geometry Shader Special Variables, p. 69

    (add to the list of geometry shader special variables, p. 69)

      in int gl_InvocationID;

    (add to the end of the section, p. 71)

    The input variable gl_InvocationID is available in the geometry language
    and is filled with an integer holding the invocation number associated
    with the given shader invocation. If the program is linked to support
    multiple geometry shader invocations per input primitive, the invocations
    are numbered 0, 1, 2, ..., <N>-1. gl_InvocationID is not available in the
    vertex or fragment language.


    Modify Section 7.2, Fragment Shader Special Variables, p. 72

    (add to the list of built-in variables)

      in int gl_SampleMaskIn[];

    The variable gl_SampleMaskIn is an array of integers, each holding a
    bitfield indicating the set of samples covered by the primitive generating
    the fragment during multisample rasterization. The array has ceil(<s>/32)
    elements, where <s> is the maximum number of color samples supported by
    the implementation. Bit <n> of word <w> in the bitfield is set if and
    only if the sample numbered <w>*32+<n> is considered covered for this
    fragment shader invocation.


    Modify Section 8.3, Common Functions, p. 84

    (add support for floating-point multiply-add)

    Syntax:

      genType fma(genType a, genType b, genType c);

    The function fma() performs a fused floating-point multiply-add to compute
    the value a*b+c. The results of fma() may not be identical to evaluating
    the expression (a*b)+c, because the computation may be performed in a
    single operation with intermediate precision different from that used to
    compute a non-fma() expression.
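
    The difference between the two forms can be sketched in a vertex shader
    that computes the same value both ways. The names position, scale, bias,
    and difference are hypothetical; the "precise" local is used on the
    assumption that, per Section 4.Q, it keeps the multiply and add as two
    separately rounded operations.

```glsl
#version 150
#extension GL_ARB_gpu_shader5 : require

in vec3 position;        // hypothetical vertex attribute
uniform vec3 scale;      // hypothetical uniforms
uniform vec3 bias;
out vec3 difference;     // hypothetical output, for inspection only

void main()
{
    // Fused form: may be evaluated as a single operation.
    vec3 fused = fma(position, scale, bias);

    // Non-fused form: "precise" asks for a float multiply followed by a
    // float add, in the order written.
    precise vec3 unfused = (position * scale) + bias;

    // Not guaranteed to be zero; the two forms may round differently.
    difference = fused - unfused;

    gl_Position = vec4(fused, 1.0);
}
```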

    The results of fma() are guaranteed to be invariant given fixed inputs
    <a>, <b>, and <c>, as though the result were taken from a variable
    declared as "precise".


    (add support for single-precision frexp and ldexp functions)

    Syntax:

      genType frexp(genType x, out genIType exp);
      genType ldexp(genType x, in genIType exp);

    The function frexp() splits each single-precision floating-point number in
    <x> into a binary significand, a floating-point number in the range [0.5,
    1.0), and an integral exponent of two, such that:

      x = significand * 2 ^ exponent

    The significand is returned by the function; the exponent is returned in
    the parameter <exp>. For a floating-point value of zero, the significand
    and exponent are both zero. For a floating-point value that is an
    infinity or is not a number, the results of frexp() are undefined.

    If the input <x> is a vector, this operation is performed in a
    component-wise manner; the value returned by the function and the value
    written to <exp> are vectors with the same number of components as <x>.

    The function ldexp() builds a single-precision floating-point number from
    each significand component in <x> and the corresponding integral exponent
    of two in <exp>, returning:

      significand * 2 ^ exponent

    If this product is too large to be represented as a single-precision
    floating-point value, the result is considered undefined.

    If the input <x> is a vector, this operation is performed in a
    component-wise manner; the value passed in <exp> and returned by the
    function are vectors with the same number of components as <x>.
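
    A minimal sketch of a frexp()/ldexp() round trip in a fragment shader
    follows. The input name value and the output name fragColor are
    hypothetical, and the division by 128.0 is only an arbitrary way to make
    the exponent visible in a color channel.

```glsl
#version 150
#extension GL_ARB_gpu_shader5 : require

in float value;          // hypothetical input
out vec4 fragColor;

void main()
{
    int exponent;

    // Split: value == significand * 2^exponent.
    // For example, value = 8.0 gives significand = 0.5 and exponent = 4.
    float significand = frexp(value, exponent);

    // Rebuild the number from its two parts.
    float rebuilt = ldexp(significand, exponent);

    fragColor = vec4(significand, float(exponent) / 128.0, rebuilt, 1.0);
}
```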


    (add support for new integer built-in functions)

    Syntax:

      genIType bitfieldExtract(genIType value, int offset, int bits);
      genUType bitfieldExtract(genUType value, int offset, int bits);

      genIType bitfieldInsert(genIType base, genIType insert, int offset,
                              int bits);
      genUType bitfieldInsert(genUType base, genUType insert, int offset,
                              int bits);

      genIType bitfieldReverse(genIType value);
      genUType bitfieldReverse(genUType value);

      genIType bitCount(genIType value);
      genIType bitCount(genUType value);

      genIType findLSB(genIType value);
      genIType findLSB(genUType value);

      genIType findMSB(genIType value);
      genIType findMSB(genUType value);

    The function bitfieldExtract() extracts bits <offset> through
    <offset>+<bits>-1 from each component in <value>, returning them in the
    least significant bits of the corresponding component of the result. For
    unsigned data types, the most significant bits of the result will be set
    to zero. For signed data types, the most significant bits will be set to
    the value of bit <offset>+<bits>-1. If <bits> is zero, the result will be
    zero. The result will be undefined if <offset> or <bits> is negative, or
    if the sum of <offset> and <bits> is greater than the number of bits used
    to store the operand. Note that for vector versions of bitfieldExtract(),
    a single pair of <offset> and <bits> values is shared for all components.

    The function bitfieldInsert() inserts the <bits> least significant bits of
    each component of <insert> into the corresponding component of <base>.
    The result will have bits numbered <offset> through <offset>+<bits>-1
    taken from bits 0 through <bits>-1 of <insert>, and all other bits taken
    directly from the corresponding bits of <base>. If <bits> is zero, the
    result will simply be <base>. The result will be undefined if <offset> or
    <bits> is negative, or if the sum of <offset> and <bits> is greater than
    the number of bits used to store the operand. Note that for vector
    versions of bitfieldInsert(), a single pair of <offset> and <bits> values
    is shared for all components.

    The function bitfieldReverse() reverses the bits of <value>. The bit
    numbered <n> of the result will be taken from bit (<bits>-1)-<n> of
    <value>, where <bits> is the total number of bits used to represent
    <value>.

    The function bitCount() returns the number of one bits in the binary
    representation of <value>.

    The function findLSB() returns the bit number of the least significant one
    bit in the binary representation of <value>. If <value> is zero, -1 will
    be returned.

    The function findMSB() returns the bit number of the most significant bit
    in the binary representation of <value>. For positive integers, the
    result will be the bit number of the most significant one bit. For
    negative integers, the result will be the bit number of the most
    significant zero bit. For a <value> of zero or negative one, -1 will be
    returned.


    (add support for general packing functions)

    Syntax:

      uint packUnorm2x16(vec2 v);
      uint packUnorm4x8(vec4 v);
      uint packSnorm4x8(vec4 v);

      vec2 unpackUnorm2x16(uint v);
      vec4 unpackUnorm4x8(uint v);
      vec4 unpackSnorm4x8(uint v);

    The functions packUnorm2x16(), packUnorm4x8(), and packSnorm4x8() first
    convert each component of a two- or four-component vector of normalized
    floating-point values into 8- or 16-bit integer values. Then, the results
    are packed into a 32-bit unsigned integer. The first component of the
    vector will be written to the least significant bits of the output; the
    last component will be written to the most significant bits.
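
    The packing order can be checked with bitfieldExtract(), as in the
    following sketch. The names color, word, and channels are hypothetical,
    and the shader assumes an unsigned integer color attachment so that the
    uvec4 output is meaningful.

```glsl
#version 150
#extension GL_ARB_gpu_shader5 : require

in vec4 color;           // hypothetical input with components in [0, 1]
out uvec4 channels;      // hypothetical output; assumes an integer target

void main()
{
    // Four 8-bit fields in one 32-bit word; color.x lands in bits 0..7 and
    // color.w in bits 24..31. For color = (1, 0, 0, 1) the word is
    // 0xFF0000FF.
    uint word = packUnorm4x8(color);

    // Pull each 8-bit field back out of the packed word.
    uint r = bitfieldExtract(word,  0, 8);
    uint g = bitfieldExtract(word,  8, 8);
    uint b = bitfieldExtract(word, 16, 8);
    uint a = bitfieldExtract(word, 24, 8);

    channels = uvec4(r, g, b, a);
}
```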

    The functions unpackUnorm2x16(), unpackUnorm4x8(), and unpackSnorm4x8()
    first unpack a single 32-bit unsigned integer into a pair of 16-bit
    unsigned integers, four 8-bit unsigned integers, or four 8-bit signed
    integers. Then, each component is converted to a normalized floating-point
    value to generate a two- or four-component vector. The first component of
    the vector will be extracted from the least significant bits of the input;
    the last component will be extracted from the most significant bits.

    The conversion between fixed- and normalized floating-point values will be
    performed as below.

      function          conversion
      ---------------   -----------------------------------------------------
      packUnorm2x16     fixed_val = round(clamp(float_val, 0, +1) * 65535.0);
      packUnorm4x8      fixed_val = round(clamp(float_val, 0, +1) * 255.0);
      packSnorm4x8      fixed_val = round(clamp(float_val, -1, +1) * 127.0);
      unpackUnorm2x16   float_val = fixed_val / 65535.0;
      unpackUnorm4x8    float_val = fixed_val / 255.0;
      unpackSnorm4x8    float_val = clamp(fixed_val / 127.0, -1, +1);


    (add functions to get/set the bit encoding for floating-point values)

    32-bit floating-point data types in the OpenGL shading language are
    specified to be encoded according to the IEEE 754 specification for
    single-precision floating-point values. The functions below allow shaders
    to convert floating-point values to and from signed or unsigned integers
    representing their encoding.

    To obtain signed or unsigned integer values holding the encoding of a
    floating-point value, use:

      genIType floatBitsToInt(genType value);
      genUType floatBitsToUint(genType value);

    Conversions are done on a component-by-component basis.
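
    As a sketch of what the encoding looks like, the fragment shader below
    splits a float into the three IEEE 754 single-precision fields. The names
    value and fields are hypothetical, and an unsigned integer color
    attachment is assumed.

```glsl
#version 150
#extension GL_ARB_gpu_shader5 : require

in float value;          // hypothetical input
out uvec4 fields;        // hypothetical output; assumes an integer target

void main()
{
    uint bits = floatBitsToUint(value);

    // IEEE 754 single precision: 1 sign bit, 8 exponent bits, 23 mantissa
    // bits. For value = 1.0, bits is 0x3F800000: sign 0, biased exponent
    // 127, mantissa 0.
    uint signBit  = bitfieldExtract(bits, 31, 1);
    uint exponent = bitfieldExtract(bits, 23, 8);
    uint mantissa = bitfieldExtract(bits,  0, 23);

    fields = uvec4(signBit, exponent, mantissa, bits);
}
```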

    To obtain a floating-point value corresponding to a signed or unsigned
    integer encoding, use:

      genType intBitsToFloat(genIType value);
      genType uintBitsToFloat(genUType value);


    (support for unsigned integer add/subtract with carry-out)

    Syntax:

      genUType uaddCarry(genUType x, genUType y, out genUType carry);
      genUType usubBorrow(genUType x, genUType y, out genUType borrow);

    The function uaddCarry() adds 32-bit unsigned integers or vectors <x> and
    <y>, returning the sum modulo 2^32. The value <carry> is set to zero if
    the sum was less than 2^32, or one otherwise.

    The function usubBorrow() subtracts the 32-bit unsigned integer or vector
    <y> from <x>, returning the difference if non-negative or 2^32 plus the
    difference, otherwise. The value <borrow> is set to zero if x >= y, or
    one otherwise.


    (support for signed and unsigned multiplies, with 32-bit inputs and a
    64-bit result spanning two 32-bit outputs)

    Syntax:

      void umulExtended(genUType x, genUType y, out genUType msb,
                        out genUType lsb);
      void imulExtended(genIType x, genIType y, out genIType msb,
                        out genIType lsb);

    The functions umulExtended() and imulExtended() multiply 32-bit unsigned
    or signed integers or vectors <x> and <y>, producing a 64-bit result. The
    32 least significant bits are returned in <lsb>; the 32 most significant
    bits are returned in <msb>.


    Modify Section 8.7, Texture Lookup Functions, p. 91

    (extend the basic versions of textureGather from ARB_texture_gather,
    allowing for optional component selection in a multi-component texture
    and for shadow mapping)

    Syntax:
      gvec4 textureGather(gsampler2D sampler, vec2 coord[, int comp]);
      gvec4 textureGather(gsampler2DArray sampler, vec3 coord[, int comp]);
      gvec4 textureGather(gsamplerCube sampler, vec3 coord[, int comp]);
      gvec4 textureGather(gsamplerCubeArray sampler, vec4 coord[, int comp]);
      gvec4 textureGather(gsampler2DRect sampler, vec2 coord[, int comp]);

      vec4 textureGather(sampler2DShadow sampler, vec2 coord, float refZ);
      vec4 textureGather(sampler2DArrayShadow sampler, vec3 coord, float refZ);
      vec4 textureGather(samplerCubeShadow sampler, vec3 coord, float refZ);
      vec4 textureGather(samplerCubeArrayShadow sampler, vec4 coord,
                         float refZ);
      vec4 textureGather(sampler2DRectShadow sampler, vec2 coord, float refZ);

    The textureGather() functions use the texture coordinates given by <coord>
    to determine a set of four texels to sample from the texture identified by
    <sampler>. These functions return a four-component vector consisting of
    one component from each texel. If specified, the value of <comp> must be
    a constant integer expression with a value of zero, one, two, or three,
    identifying the <x>, <y>, <z>, or <w> component of the four-component
    vector lookup result for each texel, respectively. If <comp> is not
    specified, the <x> component of each texel will be used to generate the
    result vector. As described in the OpenGL Specification, the vector
    selects the post-swizzle component corresponding to <comp> from each of
    the four texels, returning:

      vec4(T_i0_j1(coord, base).<comp>,
           T_i1_j1(coord, base).<comp>,
           T_i1_j0(coord, base).<comp>,
           T_i0_j0(coord, base).<comp>)

    For textureGather() functions using a shadow sampler type, each of the
    four texel lookups performs a depth comparison against the depth reference
    value passed in <refZ>, and returns the result of that comparison in the
    appropriate component of the result vector. The parameter <comp> used for
    component selection is not supported for textureGather() functions with
    shadow sampler types.

    As with other texture lookup functions, the results of textureGather() are
    undefined for shadow samplers if the texture referenced is not a depth
    texture or has depth comparisons disabled; or for non-shadow samplers if
    the texture referenced is a depth texture with depth comparisons enabled.
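
    The two forms above might be used together as in the following sketch of
    a fragment shader. The names colorTex, shadowTex, uv, refDepth, and
    fragColor are hypothetical. The sketch enables only GL_ARB_gpu_shader5
    and assumes that is sufficient to expose textureGather(); an
    implementation may also require enabling GL_ARB_texture_gather.

```glsl
#version 150
#extension GL_ARB_gpu_shader5 : require

uniform sampler2D colorTex;          // hypothetical multi-component texture
uniform sampler2DShadow shadowTex;   // hypothetical depth texture with
                                     // depth comparison enabled
in vec2 uv;
in float refDepth;
out vec4 fragColor;

void main()
{
    // Component selection: <comp> = 1 gathers the green (y) component of
    // each of the four texels in the footprint.
    vec4 greens = textureGather(colorTex, uv, 1);

    // Shadow form: each component holds one depth comparison result.
    vec4 lit = textureGather(shadowTex, uv, refDepth);

    // Average the four taps of each gather.
    float avgGreen = dot(greens, vec4(0.25));
    float shadow   = dot(lit, vec4(0.25));

    fragColor = vec4(vec3(avgGreen * shadow), 1.0);
}
```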


    (extend the "Offset" versions of textureGather from ARB_texture_gather,
    allowing for optional component selection in a multi-component texture,
    non-constant offsets, and shadow mapping)

    Syntax:
      gvec4 textureGatherOffset(gsampler2D sampler, vec2 coord,
                                ivec2 offset[, int comp]);
      gvec4 textureGatherOffset(gsampler2DArray sampler, vec3 coord,
                                ivec2 offset[, int comp]);
      gvec4 textureGatherOffset(gsampler2DRect sampler, vec2 coord,
                                ivec2 offset[, int comp]);

      vec4 textureGatherOffset(sampler2DShadow sampler, vec2 coord,
                               float refZ, ivec2 offset);
      vec4 textureGatherOffset(sampler2DArrayShadow sampler, vec3 coord,
                               float refZ, ivec2 offset);
      vec4 textureGatherOffset(sampler2DRectShadow sampler, vec2 coord,
                               float refZ, ivec2 offset);

    The textureGatherOffset() functions operate identically to
    textureGather(), except that the 2-component integer texel offset vector
    <offset> is applied as a (u,v) offset to determine the four texels to
    sample. The value <offset> need not be constant; however, a limited range
    of offset values is supported. If any component of <offset> is less than
    MIN_PROGRAM_TEXTURE_GATHER_OFFSET_ARB or greater than
    MAX_PROGRAM_TEXTURE_GATHER_OFFSET_ARB, the offset applied to the texture
    coordinates is undefined. Note that <offset> does not apply to the layer
    coordinate for array textures.
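
    A non-constant offset could be supplied through a uniform, as in this
    sketch. The names heightTex, tapOffset, uv, and fragColor are
    hypothetical, and the application is assumed to keep tapOffset within
    the queried MIN_/MAX_PROGRAM_TEXTURE_GATHER_OFFSET_ARB range.

```glsl
#version 150
#extension GL_ARB_gpu_shader5 : require

uniform sampler2D heightTex;   // hypothetical single-component texture
uniform ivec2 tapOffset;       // run-time offset; assumed to be in range
in vec2 uv;
out vec4 fragColor;

void main()
{
    // Gather the x component of the 2x2 footprint, shifted by tapOffset.
    vec4 h = textureGatherOffset(heightTex, uv, tapOffset);

    // Reduce the four taps to their maximum.
    float highest = max(max(h.x, h.y), max(h.z, h.w));

    fragColor = vec4(vec3(highest), 1.0);
}
```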


    (add new "Offsets" versions of textureGather from ARB_texture_gather,
    allowing for optional component selection in a multi-component texture,
    separate non-constant offsets for each texel in the footprint, and shadow
    mapping)

    Syntax:
      gvec4 textureGatherOffsets(gsampler2D sampler, vec2 coord,
                                 ivec2 offsets[4][, int comp]);
      gvec4 textureGatherOffsets(gsampler2DArray sampler, vec3 coord,
                                 ivec2 offsets[4][, int comp]);
      gvec4 textureGatherOffsets(gsampler2DRect sampler, vec2 coord,
                                 ivec2 offsets[4][, int comp]);

      vec4 textureGatherOffsets(sampler2DShadow sampler, vec2 coord,
                                float refZ, ivec2 offsets[4]);
      vec4 textureGatherOffsets(sampler2DArrayShadow sampler, vec3 coord,
                                float refZ, ivec2 offsets[4]);
      vec4 textureGatherOffsets(sampler2DRectShadow sampler, vec2 coord,
                                float refZ, ivec2 offsets[4]);

    The textureGatherOffsets() functions operate identically to
    textureGather(), except that the array of two-component integer vectors
    <offsets> is used to determine the location of the four texels to sample.
    Each of the four texels is obtained by applying the corresponding offset
    in the four-element array <offsets> as a (u,v) coordinate offset to the
    coordinates <coord>, identifying the four-texel LINEAR footprint, and then
    selecting the texel T_i0_j0 of that footprint. The specified values in
    <offsets> must be constant. A limited range of offset values is
    supported; the minimum and maximum offset values are
    implementation-dependent and given by
    MIN_PROGRAM_TEXTURE_GATHER_OFFSET_ARB and
    MAX_PROGRAM_TEXTURE_GATHER_OFFSET_ARB, respectively. Note that <offsets>
    does not apply to the layer coordinate for array textures.


    Modify Section 8.8, Fragment Processing Functions, p. 101

    (add new functions to the end of section, p. 102)

    Built-in interpolation functions are available to compute an interpolated
    value of a fragment shader input variable at a shader-specified (x,y)
    location. A separate (x,y) location may be used for each invocation of
    the built-in function, and those locations may differ from the default
    (x,y) location used to produce the default value of the input.

      float interpolateAtCentroid(float interpolant);
      vec2 interpolateAtCentroid(vec2 interpolant);
      vec3 interpolateAtCentroid(vec3 interpolant);
      vec4 interpolateAtCentroid(vec4 interpolant);

      float interpolateAtSample(float interpolant, int sample);
      vec2 interpolateAtSample(vec2 interpolant, int sample);
      vec3 interpolateAtSample(vec3 interpolant, int sample);
      vec4 interpolateAtSample(vec4 interpolant, int sample);

      float interpolateAtOffset(float interpolant, vec2 offset);
      vec2 interpolateAtOffset(vec2 interpolant, vec2 offset);
      vec3 interpolateAtOffset(vec3 interpolant, vec2 offset);
      vec4 interpolateAtOffset(vec4 interpolant, vec2 offset);

    The function interpolateAtCentroid() will return the value of the input
    varying <interpolant> sampled at a location inside both the pixel and
    the primitive being processed. The value obtained would be the same value
    assigned to the input variable if declared with the "centroid" qualifier.

    The function interpolateAtSample() will return the value of the input
    varying <interpolant> at the location of the sample numbered <sample>. If
    multisample buffers are not available, the input varying will be evaluated
    at the center of the pixel. If the sample number given by <sample> does
    not exist, the position used to interpolate the input varying is
    undefined.
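
    The following sketch evaluates one input at two locations within the
    same fragment shader invocation. The names color and fragColor are
    hypothetical; sample number 0 is used because it is assumed to exist in
    every configuration.

```glsl
#version 150
#extension GL_ARB_gpu_shader5 : require

in vec4 color;           // hypothetical input, default interpolation
out vec4 fragColor;

void main()
{
    // Value at a location inside both the pixel and the primitive.
    vec4 atCentroid = interpolateAtCentroid(color);

    // Value at the location of sample 0 (the pixel center if no
    // multisample buffers are available).
    vec4 atSample0 = interpolateAtSample(color, 0);

    fragColor = 0.5 * (atCentroid + atSample0);
}
```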

    The function interpolateAtOffset() will return the value of the input
    varying <interpolant> sampled at an offset from the center of the pixel
    specified by <offset>. The two floating-point components of <offset>
    give the offset in pixels in the x and y directions, respectively.
    An offset of (0,0) identifies the center of the pixel. The range and
    granularity of offsets supported by this function are
    implementation-dependent.

    For all of the interpolation functions, <interpolant> must be an input
    variable or an element of an input variable declared as an array.
    Component selection operators (e.g., ".xy") may not be used when
    specifying <interpolant>. If <interpolant> is declared with a "flat" or
    "centroid" qualifier, the qualifier will have no effect on the
    interpolated value. If <interpolant> is declared with the "noperspective"
    qualifier, the interpolated value will be computed without perspective
    correction.


    Modify Section 8.10, Geometry Shader Functions, p. 104

    (replace the section, using the following more general formulation)

    These functions are only available in geometry shaders.

    Syntax:

      void EmitStreamVertex(int stream);      // Geometry-only
      void EndStreamPrimitive(int stream);    // Geometry-only

      void EmitVertex();                      // Geometry-only
      void EndPrimitive();                    // Geometry-only

    Description:

    The function EmitStreamVertex() specifies that the vertex being generated
    by the geometry shader is completed. A vertex is added to the current
    output primitive in the vertex stream numbered <stream> using the current
    values of all output variables associated with <stream>. The values of
    any unwritten output variables associated with <stream> are undefined.
    The argument <stream> must be a constant integral expression. The values
    of all output variables (for all output streams) are undefined after
    calling EmitStreamVertex(). If a geometry shader invocation has emitted
    more vertices than permitted by the output layout qualifier
    "max_vertices", the results of calling EmitStreamVertex() are undefined.

    The function EmitVertex() is equivalent to calling EmitStreamVertex() with
    <stream> set to zero.

    The function EndStreamPrimitive() specifies that the current output
    primitive for the vertex stream numbered <stream> is completed and that a
    new empty output primitive of the same type should be started. The
    argument <stream> must be a constant integral expression. This function
    does not emit a vertex. If the output layout is declared to be "points",
    calling EndPrimitive() is optional.

    The function EndPrimitive() is equivalent to calling EndStreamPrimitive()
    with <stream> set to zero.

    A geometry shader starts with an output primitive containing no vertices
    for each stream. When a geometry shader terminates, the current output
    primitive for each vertex stream is automatically completed. It is not
    necessary to call EndPrimitive() or EndStreamPrimitive() for any stream
    where the geometry shader writes only a single primitive.

    Multiple vertex streams are supported only if the output primitive type is
    declared to be "points". A program will fail to link if it contains a
    geometry shader calling EmitStreamVertex() or EndStreamPrimitive() if its
    output primitive type is not "points".


    Modify Section 9, Shading Language Grammar, p. 92

    !!! TBD !!!


GLX Protocol

    None.

Dependencies on ARB_gpu_shader_fp64

    This extension, ARB_gpu_shader_fp64, and NV_gpu_shader5 all modify the set
    of implicit conversions supported in the OpenGL Shading Language. If more
    than one of these extensions is supported, an expression of one type may
    be converted to another type if that conversion is allowed by any of these
    specifications.

    If ARB_gpu_shader_fp64 or a similar extension introducing new data types
    is not supported, the function overloading rule in the GLSL specification
    preferring promotion of an input parameter from a smaller type to a larger
    type is never applicable, as all data types are of the same size. That
    rule and the example referring to "double" should be removed.


Dependencies on NV_gpu_shader5

    This extension, ARB_gpu_shader_fp64, and NV_gpu_shader5 all modify the set
    of implicit conversions supported in the OpenGL Shading Language. If more
    than one of these extensions is supported, an expression of one type may
    be converted to another type if that conversion is allowed by any of these
    specifications.

    This specification and NV_gpu_shader5 both lift the restriction in GLSL
    1.50 requiring that indexing in arrays of samplers must be done with
    constant expressions. However, this extension specifies that results are
    undefined if the indices would diverge if multiple shader invocations are
    run in lockstep. NV_gpu_shader5 does not impose the non-divergent
    indexing requirement.

    If NV_gpu_shader5 is supported, integer data types are supported with four
    different precisions (8-, 16-, 32-, and 64-bit) and floating-point data
    types are supported with three different precisions (16-, 32-, and
    64-bit). The extension adds the following rule for output parameters,
    which is similar to the one present in this extension for input
    parameters:

      5. If the formal parameters in both matches are output parameters, a
         conversion from a type with a larger number of bits per component is
         better than a conversion from a type with a smaller number of bits
         per component. For example, a conversion from an "int16_t" formal
         parameter type to "int" is better than one from an "int8_t" formal
         parameter type to "int".

    Such a rule is not provided in this extension because there is no
    combination of types in this extension and ARB_gpu_shader_fp64 where this
    rule has any effect.


Dependencies on ARB_sample_shading

    This extension builds upon the per-sample shading support provided by
    ARB_sample_shading to provide several new capabilities, including:

      * the built-in variable gl_SampleMaskIn[] indicates the set of samples
        covered by the input primitive corresponding to the fragment shader
        invocation; and

      * use of the "sample" qualifier on a fragment shader input forces
        per-sample shading, and specifies that the value of the input be
        evaluated per-sample.

    There is no interaction between the extensions, except that shaders using
    the features of this extension seem likely to use features from
    ARB_sample_shading as well.


Dependencies on ARB_texture_gather

    This extension builds upon the textureGather() built-ins provided by
    ARB_texture_gather to provide several new capabilities, including:

      * allowing shaders to select any single component of a multi-component
        texture to produce the gathered 2x2 footprint;

      * allowing shaders to perform a per-sample depth comparison when
        gathering the 2x2 footprint for shadow sampler types;

      * allowing shaders to use arbitrary offsets computed at run-time to
        select a 2x2 footprint to gather from; and

      * allowing shaders to use separate independent offsets for each of the
        four texels returned, instead of requiring a fixed 2x2 footprint.

    Other than the fact that they provide similar functionality, there is no
    interaction between the extensions.

    Since this extension requires support for gathering from multi-component
    textures, the minimum value of MAX_PROGRAM_TEXTURE_GATHER_COMPONENTS_ARB
    is increased to 4.


Errors

    INVALID_OPERATION is generated by GetProgramiv if <pname> is
    GEOMETRY_SHADER_INVOCATIONS and the program has not been linked
    successfully, or does not contain objects to form a geometry shader.


New State

    Add the following state to Table 6.40, Program Object State, p. 378

                                                            Initial
    Get Value                           Type  Get Command   Value    Description                 Sec.    Attribute
    ----------------------------------  ----  ------------  -------  --------------------------  ------  ---------
    GEOMETRY_SHADER_INVOCATIONS         Z+    GetProgramiv  1        number of times a geometry  6.1.16  -
                                                                     shader should be executed
                                                                     for each input primitive

New Implementation Dependent State

                                                            Min.
    Get Value                           Type  Get Command   Value    Description                 Sec.    Attribute
    ----------------------------------  ----  ------------  -------  --------------------------  ------  ---------
    MAX_GEOMETRY_SHADER_INVOCATIONS     Z+    GetIntegerv   32       maximum supported geometry  2.16.4  -
                                                                     shader invocation count
    MIN_FRAGMENT_INTERPOLATION_OFFSET   R     GetFloatv     -0.5     furthest negative offset    3.12.1  -
                                                                     for interpolateAtOffset()
    MAX_FRAGMENT_INTERPOLATION_OFFSET   R     GetFloatv     +0.5     furthest positive offset    3.12.1  -
                                                                     for interpolateAtOffset()
    FRAGMENT_INTERPOLATION_OFFSET_BITS  Z+    GetIntegerv   4        subpixel bits for           3.12.1  -
                                                                     interpolateAtOffset()
    MAX_VERTEX_STREAMS                  Z+    GetIntegerv   4        total number of vertex      2.16.4  -
                                                                     streams

    (Note: The minimum value for MAX_PROGRAM_TEXTURE_GATHER_COMPONENTS_ARB,
    added by ARB_texture_gather, is increased to 4.)

Issues

  (1) This extension builds on the capability provided by
      ARB_sample_shading, adding a new built-in variable for the input
      sample mask. It seems likely that a shader using this mask might also
      want to use one or more ARB_sample_shading built-ins. Are such
      shaders required to include #extension lines for both extensions?

    UNRESOLVED: It would be nice if it wasn't required.

  (2) How do the per-sample shading features of this extension interact with
      non-multisample rendering?

    RESOLVED: Non-multisample rendering (due to no multisample buffer or
    MULTISAMPLE disabled) is treated as single-sample rendering.

  (3) This extension lifts the restriction requiring that indices into
      samplers be constant expressions, but makes the results undefined if
      the indices used would diverge in lockstep execution. What is this
      good for?

    RESOLVED: This allows shaders to index into samplers using integer
    uniforms, or with non-divergent values computed at run-time (e.g., loop
    counters).
    Many implementations of this extension will be SIMD, running
    multiple shader invocations at once, and some implementations may have
    difficulty with accessing multiple textures in a single SIMD
    instruction.

    Note that the NV_gpu_shader5 extension similarly lifts the restriction
    but does not require non-divergent indexing.

  (4) What sort of implicit conversions should we support in this and
      related extensions?

    RESOLVED: In GLSL 1.50, we have implicit conversion from "int" and
    "uint" to "float", as well as equivalent conversions for vector types.
    One of the primary motivations of this feature is to allow constants
    that are nominally integer values to be used in floating-point contexts
    without requiring special suffixes. The following code compiles
    successfully in GLSL 1.50.

      float square(float x) {
        return x * x;
      }
      float f = 0;
      float g = f * 2;
      float h = square(3);

    The same code would fail on GLSL 1.10, because "0", "2", and "3" would
    need to be written as "0.0", "2.0", and "3.0", respectively.

    This extension adds implicit conversions from "int" to "uint" to allow
    for cases like:

      uint square(uint x) {
        return x * x;
      }
      uint v = square(2);

    This code is legal with this extension, but not in GLSL 1.50 ("2" would
    need to be replaced with "2U" or "uint(2)").

    ARB_gpu_shader_fp64 adds a new type "double", and we extend existing
    implicit conversions to allow for promotion of "int", "uint", and
    "float" to "double".

    Unlike C/C++, the general rule for implicit conversions in GLSL is that
    conversions are unidirectional. If type A can be implicitly converted
    to type B, type B cannot be implicitly converted to type A.

  (5) Increasing the number of available implicit conversions means that
      there is the possibility of ambiguities in various operators.
      How do we deal with these cases?

    RESOLVED: For binary operators, the new implicit conversions mean that
    there may be multiple ways to resolve an expression. For example,
    given the declarations

      int i;
      uint u;

    the expression "i+u" could be resolved either by implicitly converting
    "i" to "uint", or by implicitly converting both values to either "float"
    or "double". To resolve, we define a set of preferences for a common
    data type based on the types of the operands:

      - use a floating-point type if either operand is floating-point
      - use an unsigned integer type if either operand is unsigned
      - use a signed integer type otherwise

    If conversions to multiple precisions are supported, the
    lowest-precision available data type is preferred (e.g., int*float will
    be converted to float*float and not double*double).

    These rules should extend naturally if new basic data types are added.

  (6) Increasing the number of available implicit conversions means that
      there is an increased possibility of ambiguity when function
      overloading is involved. Additionally, this and related extensions
      add new function overloads. How do we deal with these cases?

    RESOLVED: The general rule for function overloading in GLSL 1.50 is
    that we first check for a function prototype that exactly matches the
    parameters passed to a function call. If no match exists, we check for
    prototypes that can be matched by implicit conversions. If more than
    one prototype can be matched by conversion, the function call is
    considered ambiguous and results in a compilation error.

    Unfortunately, when adding new implicit conversions, it is possible for
    cases that were formerly unambiguous to become ambiguous.
For backward 1616 compatibility purposes, it would be desirable to ensure that shaders 1617 that succeeded in old language versions should still compile if 1618 "upgraded" to more recent versions/extensions. However, the new 1619 conversions and overloads might make this more difficult without 1620 modifying other language rules. For example, the following prototypes 1621 are available for the standard built-in function min() on scalar values 1622 when this extension and ARB_gpu_shader_fp64 are supported: 1623 1624 int min(int a, int b); 1625 uint min(uint a, uint b); 1626 float min(float a, float b); 1627 double min(double a, double b); 1628 1629 In GLSL 1.50, a function call such as: 1630 1631 float f; 1632 min(f, 1); 1633 1634 would be considered unambiguous because the double-precision version of 1635 min() didn't exist and the call matched only the single-precision 1636 version. However, with double-precision, implicit conversions can be 1637 used to resolve to either the single- or double-precision versions. 1638 1639 To resolve this issue, we provide a set of rules that can be used to 1640 resolve multiple candidates to a "best match". The rules for 1641 determining a best match are similar to those for C++ function 1642 overloading, but not exactly the same. Like C++, these rules compare 1643 the conversions required on an argument-by-argument basis. A function 1644 prototype A is better than function prototype B if: 1645 1646 - A is better than B for one or more arguments 1647 - B is better than A for no arguments 1648 1649 If a single function prototype is better than all others, that one is 1650 used. Otherwise, we get the same ambiguity error as on previous GLSL 1651 versions. 
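    The pairwise comparison described above can be modelled on the host in
    a few lines of C. This is only a sketch of the rule as stated in this
    issue, not compiler code: the names conv_kind, conv_better(),
    prototype_better(), and best_match() are hypothetical, and each
    candidate is assumed to have already been reduced to one conversion
    classification per argument, using the per-argument order of preference
    given in this issue (exact match, then float->double promotion, then
    int/uint->float over int/uint->double).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical classification of the conversion one argument needs. */
typedef enum {
    CONV_EXACT,          /* no conversion required                  */
    CONV_PROMOTION,      /* float -> double                         */
    CONV_INT_TO_FLOAT,   /* int/uint -> float                       */
    CONV_INT_TO_DOUBLE,  /* int/uint -> double                      */
    CONV_OTHER           /* any other implicit conversion (int->uint) */
} conv_kind;

/* Returns 1 if conversion a is preferred over b for the same argument.
 * If no rule applies, neither is better and 0 is returned both ways. */
int conv_better(conv_kind a, conv_kind b)
{
    if (a == b) return 0;
    if (a == CONV_EXACT) return 1;
    if (b == CONV_EXACT) return 0;
    if (a == CONV_PROMOTION) return 1;
    if (a == CONV_INT_TO_FLOAT && b == CONV_INT_TO_DOUBLE) return 1;
    return 0;
}

/* A is better than B if A is better for at least one argument and
 * B is better for no argument. */
int prototype_better(const conv_kind *a, const conv_kind *b, size_t nargs)
{
    int a_wins = 0;
    for (size_t i = 0; i < nargs; i++) {
        if (conv_better(b[i], a[i])) return 0;
        if (conv_better(a[i], b[i])) a_wins = 1;
    }
    return a_wins;
}

/* cands holds ncands rows of nargs classifications each.  Returns the
 * index of the single candidate better than all others, or -1 if the
 * call is ambiguous. */
int best_match(const conv_kind *cands, size_t ncands, size_t nargs)
{
    assert(cands != NULL || ncands == 0);
    for (size_t i = 0; i < ncands; i++) {
        int beats_all = 1;
        for (size_t j = 0; j < ncands; j++) {
            if (j != i &&
                !prototype_better(cands + i * nargs, cands + j * nargs, nargs)) {
                beats_all = 0;
                break;
            }
        }
        if (beats_all) return (int)i;
    }
    return -1;
}
```

    For the call min(f, 1) with a "float" f, the two viable candidates are
    { CONV_EXACT, CONV_INT_TO_FLOAT } for the "float" overload and
    { CONV_PROMOTION, CONV_INT_TO_DOUBLE } for the "double" overload;
    best_match() selects the first. Two single-argument candidates
    classified as CONV_OTHER and CONV_INT_TO_FLOAT yield -1, since neither
    conversion is preferred over the other.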
1652 1653 As far as argument-by-argument comparisons go, the order of preference 1654 is: 1655 1656 - favor exact matches 1657 - prefer "promotions" (float->double) to other conversions 1658 - prefer conversions from int/uint to float over similar conversion to 1659 double 1660 1661 If none of the rules apply, one match is considered neither better nor 1662 worse than the other. 1663 1664 With these rules, the "min(f,1)" example above resolves to the "float" 1665 version, as is the case in GLSL 1.50. However, there are other cases 1666 where ambiguity remains. For example, consider the prototypes: 1667 1668 int f(uint x); 1669 int f(float x); 1670 1671 With GLSL 1.50 rules, "f(3)" would match the floating-point version, as 1672 no implicit conversions existed from "int" to "uint". With the new 1673 implicit conversions, both prototypes match and neither is preferred. 1674 Because of the ambiguity, "f(3)" would fail to compile with this 1675 extension enabled, but should still compile on implementations 1676 supporting this extension if the extension is not enabled in GLSL source 1677 code. 1678 1679 (7) The function overloading rules described in this extension describe 1680 conversions between data types with different sizes, however all 1681 existing data types allowing implicit conversion (int, uint, float) 1682 are the same size? Why do we specify these rules? 1683 1684 RESOLVED: This extension is specified at the same time as the related 1685 ARB_gpu_shader_fp64 and NV_gpu_shader5 extensions, which do provide such 1686 types. The rules are specified all in one place here so we don't have 1687 to replicate and extend the rules in the other extensions. It also 1688 provides the ability to automatically convert from signed to unsigned 1689 integer types, as in the C programming language. 1690 1691 (8) Should we support textureGather() for rectangle textures 1692 (sampler2DRect)? They aren't in ARB_texture_gather. 1693 1694 RESOLVED: Yes. 
1695 1696 (9) How does the input sample mask interact with the fixed-function 1697 SampleCoverage and SampleMask state? Will samples be removed from the 1698 input mask if they would be eliminated by these masks in the 1699 per-fragment operations? 1700 1701 UNRESOLVED. 1702 1703 (10) Should we support reading patches as geometry shader inputs, and if 1704 so, where? 1705 1706 RESOLVED: Not in this extension. This capability will be provided in 1707 NV_gpu_shader5. 1708 1709 (11) Should we support per-sample interpolation of attributes? If so, 1710 how? 1711 1712 RESOLVED. Yes. When multisample rasterization is enabled, qualifying 1713 one or more fragment shader inputs with "sample" will force per-sample 1714 interpolation of those attributes. If the same shader includes other 1715 fragment inputs not qualified with sample, those attributes may be 1716 interpolated per-pixel (i.e., all samples get the same values, likely 1717 evaluated at the pixel center). 1718 1719 (12) Should we reserve "sample" as a keyword for per-sample interpolation 1720 qualifiers, or use something more obscure, such as "per_sample"? 1721 1722 RESOLVED: This extension uses "sample". 1723 1724 (13) What should be the base data type for the bitCount(), findLSB(), and 1725 findMSB() functions -- signed or unsigned integers? 1726 1727 RESOLVED: These functions will return signed values, with -1 returned 1728 by findLSB/findMSB if no bit is found. Note that the shading language 1729 supports implicit conversions of signed integers to unsigned, which 1730 makes it easy enough if an unsigned result is desired. 1731 1732 (14) Why do EmitVertex() and EndPrimitive() begin with capitalized words 1733 while most of the other built-ins start with a lower-case (e.g., 1734 emitVertex)? Which precedent should the new per-vertex stream emit 1735 and end primitive functions follow? 

    RESOLVED: The inconsistency began with the original functions in
    EXT_geometry_shader4; the spec author can't recall the original reasons
    (if any). Regardless, we decided to match the existing functions as
    closely as possible and use EmitStreamVertex() and EndStreamPrimitive().

  (15) How do the textureGather functions work with sRGB textures?

    RESOLVED: Gamma-correction is applied to the texture source color
    before "gathering" and hence applies to all four components, unless the
    texture swizzle of the selected component is ALPHA in which case no
    gamma-correction is applied.

  (16) How should we support arrays of uniform blocks (i.e., multiple blocks
       in a group, each backed by a separate buffer object)?

    RESOLVED: We will use instance names in the block definitions, which
    can be declared as regular arrays:

      uniform UniformData {
        vec4 stuff;
      } blocks[4];

    These four blocks will be referred to as "blocks[0]" through
    "blocks[3]" in shader code, and "UniformData[0]" through "UniformData[3]"
    in the OpenGL API code. The block member in this example will be
    referred to as "UniformData.stuff" in the API. A similar approach was
    already adopted in GLSL 1.50, where geometry shaders supported arrays of
    input blocks that were treated similarly. Since this spec depends on
    GLSL 1.50, little new spec language is required here.

  (17) What are instanced geometry shaders useful for?

    RESOLVED: Instanced geometry shaders allow geometry programs that
    perform regular operations to run more efficiently.

    Consider a simple example of an algorithm that uses geometry shaders to
    render primitives to a cube map in a single pass.
    Without instanced
    geometry shaders, the geometry shader to render triangles to the cube
    map would do something like:

      for (face = 0; face < 6; face++) {
        for (vertex = 0; vertex < 3; vertex++) {
          project vertex <vertex> onto face <face>, output position
          compute/copy attributes of emitted <vertex> to outputs
          output <face> to result.layer
          emit the projected vertex
        }
        end the primitive (next triangle)
      }

    This algorithm would output 18 vertices per input triangle, three for
    each cube face. The six triangles emitted would be rasterized, one per
    face. Geometry shaders that emit a large number of attributes have
    often posed performance challenges, since all the attributes must be
    stored somewhere until the emitted primitives are processed. Large
    storage requirements may limit the number of threads that can be run in
    parallel and reduce overall performance.

    Instanced geometry shaders allow this example to be restructured to run
    with six separate invocations, one per face. Each invocation projects
    the triangle to only a single face (identified by the invocation number)
    and emits only 3 vertices. The reduced storage requirements allow more
    geometry shader invocations to be run in parallel, with greater overall
    efficiency.

    Additionally, the total number of attributes that can be emitted by a
    single geometry shader invocation is limited. However, for instanced
    geometry shaders, that limit applies to each of <N> invocations, which
    allows for a larger total output. For example, if the GL implementation
    supports only 1024 components of output per invocation, the 18-vertex
    algorithm above could emit no more than 56 components per vertex. The
    same algorithm implemented as a 3-vertex 6-invocation geometry program
    could theoretically allow for 341 components per vertex.
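
    The two per-vertex figures in this example follow from dividing the
    per-invocation output limit by the number of vertices a single
    invocation emits, rounding down. A minimal sketch of that arithmetic in
    C; the function name is hypothetical, and 1024 is simply the limit
    assumed in the example above rather than a value queried from a GL
    implementation.

```c
#include <assert.h>

/* Largest number of components each vertex may carry when a single
 * geometry shader invocation emits `vertices` vertices and may write at
 * most `max_components` components in total. */
int max_components_per_vertex(int max_components, int vertices)
{
    assert(max_components >= 0 && vertices > 0);
    return max_components / vertices;   /* integer division rounds down */
}
```

    With a limit of 1024, max_components_per_vertex(1024, 18) gives the
    budget for the single-invocation version and
    max_components_per_vertex(1024, 3) gives the budget for each of the six
    invocations of the instanced version.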

  (18) Should EmitStreamVertex() and EndStreamPrimitive() accept a
       non-constant stream number?

    RESOLVED: Not in this extension. Requiring a constant stream number
    for each call simplifies code generation for the compiler.

  (19) Are there any restrictions on geometry shaders with multiple output
       streams?

    RESOLVED: Yes, such geometry shaders are required to generate points;
    line strip and triangle strip outputs are not supported.

  (20) Since multi-stream geometry shaders only support points, why does
       EndStreamPrimitive() exist? Neither it nor EndPrimitive() does
       anything useful when emitting points.

    RESOLVED: This function was added for completeness, and would be useful
    if the requirement for emitting points were lifted by a future
    extension.

  (21) Should we provide mechanisms allowing shaders to examine or set the
       bit representation of floating-point numbers?

    RESOLVED: Yes, we will provide functions to convert single-precision
    floats to/from signed and unsigned 32-bit integers. The
    ARB_gpu_shader_fp64 extension will provide similar functionality for
    double-precision floats. We chose to adopt the Java naming convention
    here -- converting a single-precision float to/from a signed integer is
    accomplished by the functions floatBitsToInt() and intBitsToFloat().

    Note that this functionality has also been forked off into a separate
    extension (ARB_shader_bit_encoding) that can be exported on
    implementations capable of performing such conversions but not capable
    of the full feature set of this extension and/or OpenGL 4.0.

  (22) What is the "precise" qualifier good for?

    RESOLVED: Like "invariant", "precise" provides some invariance
    guarantees that are useful for certain algorithms.

    With an output position qualified as "invariant", we ensure that if the
    same geometry is processed by multiple shaders using the exact same
    code, it will be transformed in exactly the same way to ensure that we
    have no cracking or flickering in multi-pass algorithms using different
    shaders.

    With "precise", we ensure that an algorithm can be written to produce
    identical results on subtly different inputs. For example, the order of
    vertices visible to a geometry or tessellation shader used to subdivide
    primitive edges might present an edge shared between two primitives in
    one direction for one primitive and the other direction for the adjacent
    primitive. Even if the weights are identical in the two cases, there
    may be cracking if the computations are being done in an order-dependent
    manner. If the position of a new vertex were provided by evaluating the
    function f() below with limited-precision floating-point math, it's not
    necessarily the case that f(a,b,c) == f(c,b,a) in the following code:

      float f(float x, float y, float z)
      {
        return (x + y) + z;
      }

    This function f() can be rewritten as follows with "precise" and a
    symmetric evaluation order to ensure that f(a,b,c) == f(c,b,a).

      float f(float x, float y, float z)
      {
        // Note that we intentionally compute "(x+z)" instead of "(x+y)"
        // here, because that value will be the same when <x> and <z>
        // are reversed.
        precise float result = (x + z) + y;
        return result;
      }

    With this form, f(a,b,c) evaluates (a + c) + b and f(c,b,a) evaluates
    (c + a) + b, which produce identical results.

    The "precise" qualifier will disable certain optimizations and thus
    carries a performance cost. The cost may be higher than "invariant",
    because "invariant" permits optimizations disallowed by "precise" as
    long as the compiler ensures that it always optimizes in the exact same
    manner.
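
    The order dependence discussed in this issue can be reproduced on the
    host with ordinary single-precision arithmetic. The following C sketch
    is an illustration only: C has no "precise" qualifier, so volatile
    temporaries are used here as a stand-in that forces every intermediate
    sum to be rounded to "float" in the order written. The function names
    and input values are hypothetical, chosen so that the rounding is easy
    to follow.

```c
#include <assert.h>

/* Evaluates (x + y) + z in single precision, in exactly that order. */
float sum_in_order(float x, float y, float z)
{
    volatile float t = x + y;   /* rounded to float before the next add */
    volatile float r = t + z;
    return r;
}

/* Symmetric form: (x + z) + y.  Swapping x and z cannot change the
 * result, because x + z and z + x round to the same value. */
float sum_symmetric(float x, float y, float z)
{
    volatile float t = x + z;
    volatile float r = t + y;
    return r;
}

/* Asserts the property the symmetric rewrite is meant to provide.
 * Assumes the sums involved do not produce NaN. */
void check_symmetric(float a, float b, float c)
{
    assert(sum_symmetric(a, b, c) == sum_symmetric(c, b, a));
}
```

    With a = 1e8, b = -1e8, and c = 1, sum_in_order(a, b, c) adds 1 to an
    exact zero, while sum_in_order(c, b, a) first rounds 1 - 1e8 to -1e8
    and loses the 1, so the two orders disagree. sum_symmetric() returns
    the same value for both orders. Note that the symmetric form guarantees
    only that the two results match, not that either is the most accurate
    sum.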
1892 1893 (23) What computations will be affected by the "precise" qualifier, and 1894 what computations aren't? 1895 1896 RESOLVED: We will ensure precise computation of any expressions within 1897 a single function used directly or indirectly to produce the value of a 1898 variable qualified as "precise". 1899 1900 We chose not to provide this guarantee across function boundaries, even 1901 if the results of a function are used in the computation of an output 1902 qualified as "precise". Algorithms requiring the use of "precise" may 1903 have a mix of computations, some required to be precise, some not. This 1904 function boundary rule may serve to limit the amount of computation 1905 indirectly forced to be precise. 1906 1907 Additionally, the subroutine rule permits non-precise sub-operations in 1908 a computation required to be precise. For example, a shader might need 1909 to compute a "precise" position by taking a weighted average as in the 1910 following code: 1911 1912 precise vec3 pos = (p[0]*w[0] + p[1]*w[1]) + (p[2]*w[2] + p[3]*w[3]); 1913 1914 However, if the main precision requirement is that the same result be 1915 generated when <p> and <w> are reversed, the following code also gets 1916 the job done, even if posmad() is implemented with multiply-add 1917 operations. 1918 1919 vec3 posmad(vec3 p0, float w0, vec3 p1w1) { return p0*w0+p1w1; } 1920 precise vec3 pos = (posmad(p[0], w[0], p[1]*w[1]) + 1921 posmad(p[3], w[3], p[2]*w[2])); 1922 1923 To generate precise results within a function, the function arguments 1924 and/or temporaries within the function body should be qualified as 1925 "precise" as needed. 1926 1927 Note that when applying "precise" rules to assignments, indirect 1928 application of this rule applies on an assignment-by-assignment basis. 1929 In the following perverse example: 1930 1931 float a,b,c,d,e,f; 1932 precise float g; 1933 f = a + b + c; 1934 ... 
      f = c + d + e;
      g = f * 2.0;

    The first assignment to <f> need not be treated as "precise", since the
    value assigned will have no effect on the final value of the
    precise-qualified <g>. The second assignment to <f> must be evaluated
    precisely. The fact that one assignment to a variable needs to be
    treated as precise does not mean that the variable itself is implicitly
    treated as "precise".

  (24) Are "precise" qualifiers allowed on function arguments? If so, what
       do they mean? Can a return value for a function be declared as
       precise?

    RESOLVED: Yes; the rules permit the use of "precise" on any variable
    declaration, including function arguments. The code

      float f(precise in vec4 arg1, precise out vec4 arg2) { ... }

    specifies that any expressions used to assign values to <arg1> or <arg2>
    within f() will be evaluated in a precise manner.

    Expressions used to derive the value passed to the function f() as
    <arg1> will be treated as precise according to the normal rules. The
    expression for <arg1> is treated as precise if and only if the function
    call is on the right-hand side of an assignment to a variable qualified
    as "precise" or is indirectly used in an assignment to such a variable.
    It is not automatically treated as precise just because the formal
    parameter <arg1> is qualified with "precise".

    For the purposes of this rule, variables passed as "out" parameters do
    not count as assignments. Values assigned to an output parameter will
    not be evaluated precisely just because the caller provides a variable
    qualified as "precise". When the output parameter itself is qualified
    as "precise", precise evaluation of that output is required within the
    callee.
1971 1972 We chose not to permit function return values to be qualified as 1973 "precise", though we could have hypothetically allowed code such as: 1974 1975 precise float f(float a, float b, float c) { return (a+b)+c; } 1976 1977 To obtain a precise return value in such a case, use code such as: 1978 1979 float f(float a, float b, float c) 1980 { 1981 precise float result = (a+b) + c; 1982 return result; 1983 } 1984 1985 (25) How does texture gather interact with incomplete textures? 1986 1987 RESOLVED: For regular texture lookups, incomplete textures are 1988 considered to return a texel value with RGBA components of (0,0,0,1). 1989 For texture gather operations, each texel in the sampled footprint is 1990 considered to have RGBA components of (0,0,0,1). When using the 1991 textureGather() function to select the R, G, or B component of an 1992 incomplete texture, (0,0,0,0) will be returned. When selecting the A 1993 component, (1,1,1,1) will be returned. 1994 1995 1996Revision History 1997 1998 Rev. Date Author Changes 1999 ---- -------- -------- ----------------------------------------- 2000 16 03/30/12 pbrown Fix typo in language restricting the use of 2001 EmitStreamVertex()/EndStreamPrimitive() to 2002 programs with an output primitive type of 2003 points, not an input type of points (bug 8371). 2004 2005 15 10/17/11 pbrown Fix prototypes for textureGather and 2006 textureGatherOffset to use vec2 coordinates for 2007 "2DRect" sampler versions (bug 7964). 2008 2009 14 01/27/11 pbrown Add further clarification on the interaction 2010 of texture gather and incomplete textures (bug 2011 7289). 2012 2013 13 09/24/10 pbrown Clarify the interaction of texture gather 2014 with swizzle (bug 5910), fixing conflicts 2015 between API and GLSL spec language. 2016 Consolidate into one copy in the API 2017 spec. 
2018 2019 12 03/23/10 pbrown Update issues section, both fixing/numbering 2020 existing issues and including other issues 2021 that were left behind in NV_gpu_shader5 when the 2022 specs were refactored. 2023 2024 11 03/23/10 Jon Leech Describe <offset> to interpolateAtOffset 2025 without implying it is a constant expression 2026 (Bug 6026). 2027 2028 10 03/07/10 pbrown Fix typo in an output stream qualifier example. 2029 2030 9 03/05/10 pbrown Modify function overloading rules to remove 2031 most preferences when converting between 2032 two different types. The only preferences 2033 that remain are promoting "float" to "double" 2034 over other conversions, and preferring 2035 conversion of integers to "float" to converting 2036 to "double" (bug 5938). 2037 2038 8 01/29/10 pbrown Update the spec to require that the minimum 2039 value for MAX_PROGRAM_TEXTURE_GATHER_- 2040 COMPONENTS is 4 (bug 5919). 2041 2042 7 01/21/10 pbrown Clarify the rules for determining a best match 2043 if implicit conversions can result in multiple 2044 matching function prototypes. Modify the rules 2045 to pick a best match by comparing pairs of 2046 functions, and using any function deemed better 2047 than any other choice. Modify the argument 2048 conversion preference rules for overloading to 2049 disfavor "int" to "uint" conversions, for 2050 backward compatibility with previous GLSL 2051 versions. Add some new discussion of the 2052 choices involved to the issues section (bug 2053 5938). 2054 2055 6 01/14/10 pbrown Minor wording updates from spec reviews. 2056 2057 5 12/10/09 pbrown Functionality updates from spec review: 2058 Rename fmad to fma. Fix error in spec 2059 language for negative diffs in usubBorrow. 2060 2061 4 12/10/09 pbrown Convert from EXT to ARB. 
2062 2063 3 12/08/09 pbrown Miscellaneous fixes from spec review: Added 2064 missing implementation constants for 2065 interpolation offset range and granularity; 2066 added explicit section to OpenGL spec describing 2067 shader requested interpolation modifiers and 2068 functions. Clean up more dangling "ThreadID" 2069 references. General typo fixes and language 2070 clarifications. 2071 2072 2 10/01/09 pbrown Renamed gl_ThreadID to gl_InvocationID. 2073 2074 1 pbrown Internal revisions. 2075