Name

    ARB_gpu_shader5

Name Strings

    GL_ARB_gpu_shader5

Contact

    Pat Brown, NVIDIA Corporation (pbrown 'at' nvidia.com)

Contributors

    Barthold Lichtenbelt, NVIDIA
    Bill Licea-Kane, AMD
    Bruce Merry, ARM
    Chris Dodd, NVIDIA
    Eric Werness, NVIDIA
    Graham Sellers, AMD
    Greg Roth, NVIDIA
    Jeff Bolz, NVIDIA
    Nick Haemel, AMD
    Pierre Boudier, AMD
    Piers Daniell, NVIDIA

Notice

    Copyright (c) 2010-2013 The Khronos Group Inc. Copyright terms at
        http://www.khronos.org/registry/speccopyright.html

Status

    Complete. Approved by the ARB at the 2010/01/22 F2F meeting.
    Approved by the Khronos Board of Promoters on March 10, 2010.

Version

    Version 16, March 30, 2012

Number

    ARB Extension #88

Dependencies

    This extension is written against the OpenGL 3.2 (Compatibility Profile) Specification.

    This extension is written against Version 1.50 (Revision 09) of the OpenGL Shading Language Specification.

    OpenGL 3.2 and GLSL 1.50 are required.

    This extension interacts with ARB_gpu_shader_fp64.

    This extension interacts with NV_gpu_shader5.

    This extension interacts with ARB_sample_shading.

    This extension interacts with ARB_texture_gather.

Overview

    This extension provides a set of new features to the OpenGL Shading Language and related APIs to support capabilities of new GPUs, extending the capabilities of version 1.50 of the OpenGL Shading Language.
    Shaders using the new functionality provided by this extension should enable this functionality via the construct

      #extension GL_ARB_gpu_shader5 : require     (or enable)

    This extension provides a variety of new features for all shader types, including:

      * support for indexing into arrays of samplers using non-constant indices, as long as the index doesn't diverge if multiple shader invocations are run in lockstep;

      * extending the uniform block capability of OpenGL 3.1 and 3.2 to allow shaders to index into an array of uniform blocks;

      * support for implicitly converting signed integer types to unsigned types, as well as more general implicit conversion and function overloading infrastructure to support new data types introduced by other extensions;

      * a "precise" qualifier allowing computations to be carried out exactly as specified in the shader source to avoid optimization-induced invariance issues (which might cause cracking in tessellation);

      * new built-in functions supporting:

        * fused floating-point multiply-add operations;

        * splitting a floating-point number into a significand and exponent (frexp), or building a floating-point number from a significand and exponent (ldexp);

        * integer bitfield manipulation, including functions to find the position of the most or least significant set bit, count the number of one bits, and bitfield insertion, extraction, and reversal;

        * packing and unpacking vectors of small fixed-point data types into a larger scalar; and

        * converting floating-point values to or from their integer bit encodings;

      * extending the textureGather() built-in functions provided by ARB_texture_gather:

        * allowing shaders to select any single component of a multi-component texture to produce the gathered 2x2 footprint;

        * allowing shaders to perform a per-sample depth comparison when gathering the 2x2 footprint for shadow sampler types;

        * allowing shaders to use arbitrary offsets computed at run-time to select a 2x2 footprint to gather from; and

        * allowing shaders to use separate independent offsets for each of the four texels returned, instead of requiring a fixed 2x2 footprint.

    This extension also provides some new capabilities for individual shader types, including:

      * support for instanced geometry shaders, where a geometry shader may be run multiple times for each primitive, including a built-in gl_InvocationID to identify the invocation number;

      * support for emitting vertices in a geometry program where each vertex emitted may be directed independently at a specified vertex stream (as provided by ARB_transform_feedback3), and where each shader output is associated with a stream;

      * support for reading a mask of covered samples in a fragment shader; and

      * support for interpolating a fragment shader input at a programmable offset relative to the pixel center, a programmable sample number, or at the centroid.

IP Status

    No known IP claims.

New Procedures and Functions

    None

New Tokens

    Accepted by the <pname> parameter of GetProgramiv:

        GEOMETRY_SHADER_INVOCATIONS                     0x887F

    Accepted by the <pname> parameter of GetBooleanv, GetIntegerv, GetFloatv, GetDoublev, and GetInteger64v:

        MAX_GEOMETRY_SHADER_INVOCATIONS                 0x8E5A
        MIN_FRAGMENT_INTERPOLATION_OFFSET               0x8E5B
        MAX_FRAGMENT_INTERPOLATION_OFFSET               0x8E5C
        FRAGMENT_INTERPOLATION_OFFSET_BITS              0x8E5D
        MAX_VERTEX_STREAMS                              0x8E71

    (note: MAX_GEOMETRY_SHADER_INVOCATIONS, MIN_FRAGMENT_INTERPOLATION_OFFSET, MAX_FRAGMENT_INTERPOLATION_OFFSET, and FRAGMENT_INTERPOLATION_OFFSET_BITS have identical values to corresponding "NV" enums from NV_gpu_program5. MAX_VERTEX_STREAMS is also defined in ARB_transform_feedback3.)

Additions to Chapter 2 of the OpenGL 3.2 (Compatibility Profile) Specification (OpenGL Operation)

    Modify Section 2.15.4, Geometry Shader Execution Environment, p. 121

    (add two unnumbered subsections after "Texture Access", p. 122)

    Instanced Geometry Shaders

    For each input primitive received by the geometry shader pipeline stage, the geometry shader may be run once or multiple times.
    The number of times a geometry shader should be executed for each input primitive may be specified using a layout qualifier in a geometry shader of a linked program. If the invocation count is not specified in any layout qualifier, the invocation count will be one.

    Each separate geometry shader invocation is assigned a unique invocation number. For a geometry shader with <N> invocations, each input primitive spawns <N> invocations, numbered 0 through <N>-1. The built-in input variable gl_InvocationID may be used by a geometry shader invocation to determine its invocation number.

    When executing instanced geometry shaders, the output primitives generated from each input primitive are passed to subsequent pipeline stages using the shader invocation number to order the output. The first primitives received by the subsequent pipeline stages are those emitted by the shader invocation numbered zero, followed by those from the shader invocation numbered one, and so forth. Additionally, all output primitives generated from a given input primitive are passed to subsequent pipeline stages before any output primitives generated from subsequent input primitives.

    Geometry Shader Vertex Streams

    Geometry shaders may emit primitives to multiple independent vertex streams. Each vertex emitted by the geometry shader is directed at one of the vertex streams. As vertices are received on each stream, they are arranged into primitives of the type specified by the geometry shader output primitive type. The shading language built-in functions EndPrimitive() and EndStreamPrimitive() may be used to end the primitive being assembled on a given vertex stream and start a new empty primitive of the same type. If an implementation supports <N> vertex streams, the individual streams are numbered 0 through <N>-1.
    There is no requirement on the order of the streams to which vertices are emitted, and the number of vertices emitted to each stream may be completely independent, subject only to implementation-dependent output limits.

    The primitives emitted to all vertex streams are passed to the transform feedback stage to be captured and written to buffer objects in the manner specified by the transform feedback state. The primitives emitted to all streams but stream zero are discarded after transform feedback. Primitives emitted to stream zero are passed to subsequent pipeline stages for clipping, rasterization, and subsequent fragment processing.

    Geometry shaders that emit vertices to multiple vertex streams are currently limited to using only the "points" output primitive type. A program will fail to link if it includes a geometry shader that calls the EmitStreamVertex() built-in function and has any other output primitive type parameter.

Additions to Chapter 3 of the OpenGL 3.2 (Compatibility Profile) Specification (Rasterization)

    Modify Section 3.3.1, Multisampling, p. 148

    (add new paragraph at the end of the section, p. 149)

    If MULTISAMPLE is enabled and the current program object includes a fragment shader with one or more input variables qualified with "sample in", the data associated with those variables will be assigned independently. The values for each sample must be evaluated at the location of the sample. The data associated with any other variables not qualified with "sample in" need not be evaluated independently for each sample.

    Modify ARB_texture_gather, "Changes to Section 3.8.8"

    (extend language describing the operation of textureGather, allowing the new <comp> argument to select any of the four components from a multi-component texel vector)

    The textureGather and textureGatherOffset built-in shader functions...
    A four-component vector is then assembled by taking a single component from the swizzled texture source colors of the four texels, in the order T_i0_j1, T_i1_j1, T_i1_j0, and T_i0_j0. The selected component is identified by the optional <comp> argument, where the values zero, one, two, and three identify the Rs, Gs, Bs, or As component, respectively. If <comp> is omitted, it is treated as identifying the Rs component. Incomplete textures (section 3.8.10) are considered to return a texture source color of (0,0,0,1) for all four source texels.

    (add further language describing textureGatherOffsets)

    The textureGatherOffsets built-in functions from the OpenGL Shading Language return a vector derived from sampling four texels in the image array of level <level_base>. For each of the four texel offsets specified by the <offsets> argument, the rules for the LINEAR minification filter are applied to identify a 2x2 texel footprint, from which the single texel T_i0_j0 is selected. A four-component vector is then assembled by taking a single component from each of the four T_i0_j0 texels in the same manner as for the textureGather function.

    Modify Section 3.12.1, Shader Variables, p. 273

    (insert prior to the last paragraph of the section, p. 274)

    When interpolating built-in and user-defined varying variables, the default screen-space location at which these variables are sampled is defined in previous rasterization sections. The default location may be overridden by interpolation qualifiers. When interpolating variables declared using "centroid in", the variable is sampled at a location within the pixel covered by the primitive generating the fragment. When interpolating variables declared using "sample in" when MULTISAMPLE is enabled, the fragment shader will be invoked separately for each covered sample and the variable will be sampled at the corresponding sample point.

    Additionally, built-in fragment shader functions provide further fine-grained control over interpolation.
    The built-in functions interpolateAtCentroid() and interpolateAtSample() will sample variables as though they were declared with the "centroid" or "sample" qualifiers, respectively. The built-in function interpolateAtOffset() will sample variables at a specified (x,y) offset relative to the center of the pixel. The range and granularity of offsets supported by this function is implementation-dependent. If either component of the specified offset is less than MIN_FRAGMENT_INTERPOLATION_OFFSET or greater than MAX_FRAGMENT_INTERPOLATION_OFFSET, the position used to interpolate the variable is undefined. Not all values of <offset> may be supported; x and y offsets may be rounded to fixed-point values with the number of fraction bits given by the implementation-dependent constant FRAGMENT_INTERPOLATION_OFFSET_BITS.

    Modify Section 3.12.2, Shader Execution, p. 274

    (insert prior to the next-to-last paragraph in "Shader Inputs", p. 277)

    The built-in variable gl_SampleMaskIn[] is an integer array holding bitfields indicating the set of fragment samples covered by the primitive corresponding to the fragment shader invocation. The number of elements in the array is ceil(<s>/32), where <s> is the maximum number of color samples supported by the implementation. Bit <n> of element <w> in the array is set if and only if the sample numbered <w>*32+<n> is considered covered for this fragment shader invocation. When rendering to a non-multisample buffer, or if multisample rasterization is disabled, all bits are zero except for bit zero of the first array element. That bit will be one if the pixel is covered and zero otherwise. Bits in the sample mask corresponding to covered samples that will be killed due to SAMPLE_COVERAGE or SAMPLE_MASK_NV will not be set (section 4.1.3).

    When per-sample shading is active due to the use of a fragment input qualified by "sample", only the bit for the current sample is set in gl_SampleMaskIn.
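The word and bit layout of gl_SampleMaskIn[] described above can be modeled on the host side. The following is a minimal C sketch, not shader code and not part of this extension; the helper names sample_mask_words() and sample_is_covered() are hypothetical and exist only for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper: number of 32-bit array elements needed to hold
   max_samples sample bits, i.e. ceil(max_samples / 32). */
int sample_mask_words(int max_samples)
{
    assert(max_samples > 0);
    return (max_samples + 31) / 32;
}

/* Hypothetical helper: tests one sample in a mask laid out like
   gl_SampleMaskIn[], where bit n of element w corresponds to the
   sample numbered w*32+n. */
bool sample_is_covered(const int *mask, int num_words, int sample)
{
    assert(sample >= 0 && sample < num_words * 32);
    int w = sample / 32;   /* array element holding the bit */
    int n = sample % 32;   /* bit position within that element */
    return (((unsigned int)mask[w] >> n) & 1u) != 0;
}
```

For a mask of { 0x5, 0x1 }, this sketch reports samples 0, 2, and 32 as covered. The cast to unsigned avoids shifting a negative value when bit 31 of an element is set.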
    When OpenGL API state specifies multiple fragment shader invocations for a given fragment, the sample mask for any single fragment shader invocation may specify a subset of the covered samples for the fragment. In this case, the bit corresponding to each covered sample will be set in exactly one fragment shader invocation.

Additions to Chapter 4 of the OpenGL 3.2 (Compatibility Profile) Specification (Per-Fragment Operations and the Frame Buffer)

    None.

Additions to Chapter 5 of the OpenGL 3.2 (Compatibility Profile) Specification (Special Functions)

    None.

Additions to Chapter 6 of the OpenGL 3.2 (Compatibility Profile) Specification (State and State Requests)

    Modify Section 6.1.16, Shader and Program Queries, p. 384

    (add to long first paragraph, p. 386)

    ... If <pname> is GEOMETRY_SHADER_INVOCATIONS, the number of geometry shader invocations per primitive will be returned. If GEOMETRY_VERTICES_OUT, GEOMETRY_INPUT_TYPE, GEOMETRY_OUTPUT_TYPE, or GEOMETRY_SHADER_INVOCATIONS are queried for a program which has not been linked successfully, or which does not contain objects to form a geometry shader, then an INVALID_OPERATION error is generated.

Additions to Appendix A of the OpenGL 3.2 (Compatibility Profile) Specification (Invariance)

    None.

Additions to the AGL/GLX/WGL Specifications

    None.

Modifications to The OpenGL Shading Language Specification, Version 1.50 (Revision 09)

    Including the following line in a shader can be used to control the language features described in this extension:

      #extension GL_ARB_gpu_shader5 : <behavior>

    where <behavior> is as specified in section 3.3.

    New preprocessor #defines are added to the OpenGL Shading Language:

      #define GL_ARB_gpu_shader5        1

    Modify Section 3.6, Keywords, p. 14

    (add to the keyword list)

      sample

    Modify Section 4.1.7, Samplers, p. 23

    (modify 1st paragraph of the section, deleting the restriction requiring constant indexing of sampler arrays but still requiring uniform indexing across invocations)

    ...
    Samplers may be aggregated into arrays within a shader (using square brackets [ ]) and can be indexed with general integer expressions. The results of accessing a sampler array with an out-of-bounds index are undefined. ...

    (add new paragraph restricting the use of general integer expression in sampler array indexing)

    When indexing an array of samplers, the integer expression used to index the array must be uniform across shader invocations. If this restriction is not satisfied, the results of accessing the sampler array are undefined. For the purposes of this uniformity test, the index used for texture lookups performed inside a loop is considered uniform for the <n>th loop iteration if all shader invocations that execute the loop at least <n> times compute the same index on that iteration. For texture lookups inside a function other than main(), an index is considered uniform if the value is the same for all invocations calling the function from the same point in the caller. For nested loops and function calls, the uniformity test requires that the index match only those other shader invocations with identical loop iteration counts and function call chains.

    Modify Section 4.1.10, Implicit Conversions, p. 27

    (modify table of implicit conversions)

                                Can be implicitly
        Type of expression        converted to
        ---------------------   -----------------
        int                     uint, float
        ivec2                   uvec2, vec2
        ivec3                   uvec3, vec3
        ivec4                   uvec4, vec4

        uint                    float
        uvec2                   vec2
        uvec3                   vec3
        uvec4                   vec4

    (modify second paragraph of the section)

    No implicit conversions are provided to convert from unsigned to signed integer types or from floating-point to integer types. There are no implicit array or structure conversions.

    (insert before the final paragraph of the section)

    When performing implicit conversion for binary operators, there may be multiple data types to which the two operands can be converted. For example, when adding an int value to a uint value, both values can be implicitly converted to uint and float.
    In such cases, a floating-point type is chosen if either operand has a floating-point type. Otherwise, an unsigned integer type is chosen if either operand has an unsigned integer type. Otherwise, a signed integer type is chosen.

    Modify Section 4.3, Storage Qualifiers, p. 29

    (add to first table on the page)

      Qualifier        Meaning
      --------------   ----------------------------------------
      sample in        linkage with per-sample interpolation
      sample out       linkage with per-sample interpolation

    (modify third paragraph, p. 29)

    These interpolation qualifiers may only precede the qualifiers in, centroid in, sample in, out, centroid out, or sample out in a declaration. ...

    Modify Section 4.3.4, Inputs, p. 31

    (modify first paragraph of section)

    Shader input variables are declared with the in, centroid in, or sample in storage qualifiers. ... Variables declared as in, centroid in, or sample in may not be written to during shader execution. ...

    (modify third paragraph, p. 32)

    ... Fragment shader inputs get per-fragment values, typically interpolated from a previous stage's outputs. They are declared in fragment shaders with the in, centroid in, or sample in storage qualifiers or the deprecated varying and centroid varying storage qualifiers. ...

    (add to examples immediately below)

      sample in vec4 perSampleColor;

    Modify Section 4.3.6, Outputs, p. 33

    (modify first paragraph of section)

    Shader output variables are declared with the out, centroid out, or sample out storage qualifiers. ...

    (modify third paragraph of section)

    Vertex and geometry output variables output per-vertex data and are declared using the out, centroid out, or sample out storage qualifiers, or the deprecated varying storage qualifier.

    (add to examples immediately below)

      sample out vec4 perSampleColor;

    (modify last paragraph, p. 33)

    Fragment outputs output per-fragment data and are declared using the out storage qualifier. It is an error to use centroid out or sample out in a fragment shader. ...
    Modify Section 4.3.7, Interface Blocks, p. 34

    (modify last paragraph, p. 36, removing the requirement for indexing uniform blocks using constant expressions)

    For uniform blocks declared as arrays, each individual array element corresponds to a separate buffer object backing one instance of the block. As the array size indicates the number of buffer objects needed, uniform block array declarations must specify an integral array size. Arbitrary indices may be used to index a uniform block array; integral constant expressions are not required. If the index used to access an array of uniform blocks is out-of-bounds, the results of the access are undefined.

    Modify Section 4.3.8.1, Input Layout Qualifiers, p. 37

    (modify last paragraph, p. 37, and subsequent paragraphs on p. 38)

    Geometry shaders support input layout qualifiers. There are two types of layout qualifiers used to specify an input primitive type and an invocation count. The input primitive type and invocation count qualifiers are allowed only on the interface qualifier in, not on an input block, block member, or variable.

      layout-qualifier-id
        points
        lines
        lines_adjacency
        triangles
        triangles_adjacency
        invocations = integer-constant

    The identifiers "points", "lines", "lines_adjacency", "triangles", and "triangles_adjacency" are used to specify the type of input primitive accepted by the geometry shader, and only one of these is accepted. At least one geometry shader (compilation unit) in a program must declare an input primitive type, and all geometry shader input primitive type declarations in a program must declare the same type. It is not required that all geometry shaders in a program declare an input primitive type.

    The identifier "invocations" is used to specify the number of times the geometry shader is invoked for each input primitive received. Invocation count declarations are optional.
    If no invocation count is declared in any geometry shader in the program, the geometry shader will be run once for each input primitive. If an invocation count is declared, all such declarations must specify the same count. If a shader specifies an invocation count greater than the implementation-dependent maximum, it will fail to compile.

    For example,

      layout(triangles, invocations=6) in;

    will establish that all inputs to the geometry shader are triangles and that the geometry shader is run six times for each triangle processed.

    All geometry shader input unsized array declarations ...

    Modify Section 4.3.8.2, Output Layout Qualifiers, p. 40

    (modify second and subsequent paragraphs, p. 40)

    Geometry shaders can have output layout qualifiers. There are three types of output layout qualifiers used to specify an output primitive type, a maximum output vertex count, and per-output stream numbers. The output primitive type and output vertex count qualifiers are allowed only on the interface qualifier out, not on an output block, block member, or variable declaration. The output stream number qualifier is allowed on the interface qualifier out, or on output blocks or variable declarations.

    The layout qualifier identifiers for geometry shader outputs are

      layout-qualifier-id
        points
        line_strip
        triangle_strip
        max_vertices = integer-constant
        stream = integer-constant

    The identifiers "points", "line_strip", and "triangle_strip" are used to specify the type of output primitive produced by the geometry shader, and only one of these is accepted. At least one geometry shader (compilation unit) in a program must declare an output primitive type, and all geometry shader output primitive type declarations in a program must declare the same primitive type. It is not required that all geometry shaders in a program declare an output primitive type.

    The identifier "max_vertices" is used to specify the maximum number of vertices the shader will ever emit in a single invocation.
    At least one geometry shader (compilation unit) in a program must declare a maximum output vertex count, and all geometry shader output vertex count declarations in a program must declare the same count. It is not required that all geometry shaders in a program declare a count.

    In the example,

      layout(triangle_strip, max_vertices = 60) out;  // order does not matter
      layout(max_vertices = 60) out;                  // redeclaration okay
      layout(triangle_strip) out;                     // redeclaration okay
      layout(points) out;                             // error, contradicts triangle_strip
      layout(max_vertices = 30) out;                  // error, contradicts 60

    all outputs from the geometry shader are triangles and at most 60 vertices will be emitted by the shader. It is an error for the maximum number of vertices to be greater than gl_MaxGeometryOutputVertices.

    The identifier "stream" is used to specify that a geometry shader output variable or block is associated with a particular vertex stream (numbered beginning with zero). A default stream number may be declared at global scope by qualifying interface qualifier out as in this example:

      layout(stream = 1) out;

    The stream number specified in such a declaration replaces any previous default and applies to all subsequent block and variable declarations until a new default is established. The initial default stream number is zero.

    Each output block or non-block output variable is associated with a vertex stream. If the block or variable is declared with a stream qualifier, it is associated with the specified stream; otherwise, it is associated with the current default stream. A block member may be declared with a stream qualifier, but the specified stream must match the stream associated with the containing block.
    One example:

      layout(stream=1) out;             // default is now stream 1
      out vec4 var1;                    // var1 gets default stream (1)
      layout(stream=2) out Block1 {     // "Block1" belongs to stream 2
        layout(stream=2) vec4 var2;     // redundant block member stream decl
        layout(stream=3) vec2 var3;     // ILLEGAL (must match block stream)
        vec3 var4;                      // belongs to stream 2
      };
      layout(stream=0) out;             // default is now stream 0
      out vec4 var5;                    // var5 gets default stream (0)
      out Block2 {                      // "Block2" gets default stream (0)
        vec4 var6;
      };
      layout(stream=3) out vec4 var7;   // var7 belongs to stream 3

    If a geometry shader output block or variable is declared more than once, all such declarations must associate the variable with the same vertex stream. If any stream declaration specifies a non-existent stream number, the shader will fail to compile.

    Built-in geometry shader outputs are always associated with vertex stream zero.

    Each vertex emitted by the geometry shader is assigned to a specific stream, and the attributes of the emitted vertex are taken from the set of output blocks and variables assigned to the targeted stream. After each vertex is emitted, the values of all output variables become undefined. Additionally, the output variables associated with each vertex stream may share storage. Writing to an output variable associated with one stream may overwrite output variables associated with any other stream. When emitting each vertex, a geometry shader should write to all outputs associated with the stream to which the vertex will be emitted and to no outputs associated with any other stream.

    Modify Section 4.3.9, Interpolation, p.
42 (modify first paragraph of section, add reference to sample in/out)

    The presence of and type of interpolation is controlled by the storage qualifiers centroid in, sample in, centroid out, and sample out, by the optional interpolation qualifiers smooth, flat, and noperspective, and by default behaviors established through the OpenGL API when no interpolation qualifier is present. ...

    (modify second paragraph)

    ... A variable may be qualified as flat centroid or flat sample, which will mean the same thing as qualifying it only as flat.

    (replace last paragraph, p. 42)

    When multisample rasterization is disabled, or for fragment shader input variables qualified with neither "centroid in" nor "sample in", the value of the assigned variable may be interpolated anywhere within the pixel and a single value may be assigned to each sample within the pixel, to the extent permitted by the OpenGL Specification.

    When multisample rasterization is enabled, "centroid" and "sample" may be used to control the location and frequency of the sampling of the qualified fragment shader input. If a fragment shader input is qualified with "centroid", a single value may be assigned to that variable for all samples in the pixel, but that value must be interpolated at a location that lies in both the pixel and in the primitive being rendered, including any of the pixel's samples covered by the primitive. Because the location at which the variable is sampled may be different in neighboring pixels, derivatives of centroid-sampled inputs may be less accurate than those for non-centroid interpolated variables. If a fragment shader input is qualified with "sample", a separate value must be assigned to that variable for each covered sample in the pixel, and that value must be sampled at the location of the individual sample.

    (Insert before Section 4.7, Order of Qualification, p.
47)

    Section 4.Q, The Precise Qualifier

    Some algorithms may require that floating-point computations be carried out in exactly the manner specified in the source code, even if the implementation supports optimizations that could produce nearly equivalent results with higher performance. For example, many GL implementations support a "multiply-add" that can compute values such as

      float result = (float(a) * float(b)) + float(c);

    in a single operation. The result of a floating-point multiply-add may not always be identical to first doing a multiply yielding a floating-point result, and then doing a floating-point add. By default, implementations are permitted to perform optimizations that effectively modify the order of the operations used to evaluate an expression, even if those optimizations may produce slightly different results relative to unoptimized code.

    The qualifier "precise" will ensure that operations contributing to a variable's value are performed in the order and with the precision specified in the source code. Order of evaluation is determined by operator precedence and parentheses, as described in Section 5. Expressions must be evaluated with a precision consistent with the operation; for example, multiplying two "float" values must produce a single value with "float" precision. This effectively prohibits the arbitrary use of fused multiply-add operations if the intermediate multiply result is kept at a higher precision.

    For example:

      precise out vec4 position;

    declares that computations used to produce the value of "position" must be performed precisely using the order and precision specified.
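To illustrate why the order of operations matters, the following C sketch (host code, not GLSL) evaluates the same three-term sum in two different orders. The function names are hypothetical. The "volatile" temporaries are an assumption made here to force each intermediate result to be rounded to float, which is the behavior "precise" asks of a shader compiler.

```c
/* Evaluates (a + b) + c, the order written in the source expression. */
float sum_left_to_right(float a, float b, float c)
{
    volatile float t = a + b;   /* intermediate rounded to float */
    return t + c;
}

/* Evaluates a + (b + c), an order an optimizer might choose if it
   were free to reassociate the expression. */
float sum_reassociated(float a, float b, float c)
{
    volatile float t = b + c;   /* intermediate rounded to float */
    return a + t;
}
```

With a = 1.0e20f, b = -1.0e20f, and c = 1.0f, the first function returns 1.0f, while the second returns 0.0f because c is absorbed when it is added to the much larger b.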
    As with the invariant qualifier (section 4.6.1), the precise qualifier may be used to qualify a built-in or previously declared user-defined variable as being precise:

      out vec3 Color;
      precise Color;            // make existing Color be precise

    This qualifier will affect the evaluation of expressions used on the right-hand side of an assignment if and only if:

      * the variable assigned to is qualified as "precise"; or

      * the value assigned is used later in the same function, either directly or indirectly, on the right-hand side of an assignment to a variable declared as "precise".

    Expressions computed in a function are treated as precise only if assigned to a variable qualified as "precise" in that same function. Any other expressions within a function are not automatically treated as precise, even if they are used to determine a value that is returned by the function and directly assigned to a variable qualified as "precise".

    Some examples of the use of "precise" include:

      in vec4 a, b, c, d;
      precise out vec4 v;

      float func(float e, float f, float g, float h)
      {
        return (e*f) + (g*h);            // no special precision
      }

      float func2(float e, float f, float g, float h)
      {
        precise float result = (e*f) + (g*h);  // ensures a precise return value
        return result;
      }

      void func3(float i, float j, precise out float k)
      {
        k = i * i + j;                   // precise, due to <k> declaration
      }

      void main(void)
      {
        vec3 r = vec3(a * b);            // precise, used to compute v.xyz
        vec3 s = vec3(c * d);            // precise, used to compute v.xyz
        v.xyz = r + s;                   // precise
        v.w = (a.w * b.w) + (c.w * d.w); // precise
        v.x = func(a.x, b.x, c.x, d.x);  // values computed in func()
                                         // are NOT precise
        v.x = func2(a.x, b.x, c.x, d.x); // precise!
        func3(a.x * b.x, c.x * d.x, v.x); // precise!
      }

    Modify Section 4.7, Order of Qualification, p. 47

    When multiple qualifications are present, they must follow a strict order. This order is as follows:

      precise-qualifier invariant-qualifier interpolation-qualifier storage-qualifier precision-qualifier

    Modify Section 5.9, Expressions, p.
57 (modify bulleted list as follows, adding support for implicit conversion between signed and unsigned types)

    Expressions in the shading language are built from the following:

    * Constants of type bool, int, int64_t, uint, uint64_t, float, all vector types, and all matrix types.

    ...

    * The operator modulus (%) operates on signed or unsigned integer scalars or vectors. If the fundamental types of the operands do not match, the conversions from Section 4.1.10 "Implicit Conversions" are applied to produce matching types. ...

    Modify Section 6.1, Function Definitions, p. 63

    (modify description of overloading, beginning at the top of p. 64)

    Function names can be overloaded. The same function name can be used for multiple functions, as long as the parameter types differ. If a function name is declared twice with the same parameter types, then the return types and all qualifiers must also match, and it is the same function being declared. For example,

      vec4 f(in vec4 x, out vec4  y);        // (A)
      vec4 f(in vec4 x, out uvec4 y);        // (B) okay, different argument type
      vec4 f(in ivec4 x, out uvec4 y);       // (C) okay, different argument type

      int  f(in vec4 x, out ivec4 y);        // error, only return type differs
      vec4 f(in vec4 x, in  vec4 y);         // error, only qualifier differs
      vec4 f(const in vec4 x, out vec4 y);   // error, only qualifier differs

    When function calls are resolved, an exact type match for all the arguments is sought. If an exact match is found, all other functions are ignored, and the exact match is used. If no exact match is found, then the implicit conversions in Section 4.1.10 (Implicit Conversions) will be applied to find a match. Mismatched types on input parameters (in or inout or default) must have a conversion from the calling argument type to the formal parameter type. Mismatched types on output parameters (out or inout) must have a conversion from the formal parameter type to the calling argument type.
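The direction-dependent matching rule above can be sketched as a small C model. This is only an illustration under stated assumptions: it covers the scalar types int, uint, and float from the Section 4.1.10 table as modified by this extension, and the type and function names (BasicType, ParamDir, implicitly_converts, argument_matches) are hypothetical.

```c
#include <stdbool.h>

typedef enum { T_INT, T_UINT, T_FLOAT } BasicType;     /* hypothetical */
typedef enum { DIR_IN, DIR_OUT, DIR_INOUT } ParamDir;  /* hypothetical */

/* Scalar rows of the implicit conversion table:
   int -> uint, float; uint -> float; nothing else. */
bool implicitly_converts(BasicType from, BasicType to)
{
    if (from == T_INT)  return to == T_UINT || to == T_FLOAT;
    if (from == T_UINT) return to == T_FLOAT;
    return false;
}

/* True if a calling argument of type arg may be bound to a formal
   parameter of type param with the given direction. */
bool argument_matches(BasicType arg, BasicType param, ParamDir dir)
{
    if (arg == param)
        return true;                                /* exact match */
    bool in_ok  = implicitly_converts(arg, param);  /* argument -> parameter */
    bool out_ok = implicitly_converts(param, arg);  /* parameter -> argument */
    switch (dir) {
    case DIR_IN:    return in_ok;
    case DIR_OUT:   return out_ok;
    case DIR_INOUT: return in_ok && out_ok;
    }
    return false;
}
```

In this model, an int argument matches a float "in" parameter, and a float argument matches an int "out" parameter, but not the reverse in either case. Because no pair of distinct types in the table converts in both directions, a mismatched "inout" parameter never matches here.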
If implicit conversions can be used to find more than one matching function, a single best-matching function is sought. To determine a best match, the conversions between calling argument and formal parameter types are compared for each function argument and pair of matching functions. After these comparisons are performed, each pair of matching functions is compared. A function definition A is considered a better match than function definition B if:

  * for at least one function argument, the conversion for that argument in A is better than the corresponding conversion in B; and

  * there is no function argument for which the conversion in B is better than the corresponding conversion in A.

If a single function definition is considered a better match than every other matching function definition, it will be used. Otherwise, a semantic error occurs and the shader will fail to compile.

To determine whether the conversion for a single argument in one match is better than that for another match, the following rules are applied, in order:

  1. An exact match is better than a match involving any implicit conversion.

  2. A match involving an implicit conversion from float to double is better than a match involving any other implicit conversion.

  3. A match involving an implicit conversion from either int or uint to float is better than a match involving an implicit conversion from either int or uint to double.

If none of the rules above apply to a particular pair of conversions, neither conversion is considered better than the other.

For the function prototypes (A), (B), and (C) above, the following examples show how the rules apply to different sets of calling argument types:

    f(vec4, vec4);    // exact match of vec4 f(in vec4 x, out vec4 y)
    f(vec4, uvec4);   // exact match of vec4 f(in vec4 x, out uvec4 y)
    f(vec4, ivec4);   // matched to vec4 f(in vec4 x, out vec4 y)
                      //   (C) not relevant, can't convert vec4 to
                      //   ivec4.  (A) better than (B) for 2nd
                      //   argument (rule 2), same on first argument.
    f(ivec4, vec4);   // NOT matched.  All three match by implicit
                      //   conversion.  (C) is better than (A) and (B)
                      //   on the first argument.  (A) is better than
                      //   (B) and (C) on the second argument.

Modify Section 7.1, Vertex And Geometry Shader Special Variables, p. 69

(add to the list of geometry shader special variables, p. 69)

    in int gl_InvocationID;

(add to the end of the section, p. 71)

The input variable gl_InvocationID is available in the geometry language and is filled with an integer holding the invocation number associated with the given shader invocation. If the program is linked to support multiple geometry shader invocations per input primitive, the invocations are numbered 0, 1, 2, ..., <N>-1. gl_InvocationID is not available in the vertex or fragment language.

Modify Section 7.2, Fragment Shader Special Variables, p. 72

(add to the list of built-in variables)

    in int gl_SampleMaskIn[];

The variable gl_SampleMaskIn is an array of integers, each holding a bitfield indicating the set of samples covered by the primitive generating the fragment during multisample rasterization. The array has ceil(<s>/32) elements, where <s> is the maximum number of color samples supported by the implementation. Bit <n> of word <w> in the bitfield is set if and only if the sample numbered <w>*32+<n> is considered covered for this fragment shader invocation.

Modify Section 8.3, Common Functions, p. 84

(add support for floating-point multiply-add)

Syntax:

    genType fma(genType a, genType b, genType c);

The function fma() performs a fused floating-point multiply-add to compute the value a*b+c. The results of fma() may not be identical to evaluating the expression (a*b)+c, because the computation may be performed in a single operation with intermediate precision different from that used to compute a non-fma() expression.
The results of fma() are guaranteed to be invariant given fixed inputs <a>, <b>, and <c>, as though the result were taken from a variable declared as "precise".

(add support for single-precision frexp and ldexp functions)

Syntax:

    genType frexp(genType x, out genIType exp);
    genType ldexp(genType x, in genIType exp);

The function frexp() splits each single-precision floating-point number in <x> into a binary significand, a floating-point number in the range [0.5, 1.0), and an integral exponent of two, such that:

    x = significand * 2 ^ exponent

The significand is returned by the function; the exponent is returned in the parameter <exp>. For a floating-point value of zero, the significand and exponent are both zero. For a floating-point value that is an infinity or is not a number, the results of frexp() are undefined. If the input <x> is a vector, this operation is performed in a component-wise manner; the value returned by the function and the value written to <exp> are vectors with the same number of components as <x>.

The function ldexp() builds a single-precision floating-point number from each significand component in <x> and the corresponding integral exponent of two in <exp>, returning:

    significand * 2 ^ exponent

If this product is too large to be represented as a single-precision floating-point value, the result is considered undefined. If the input <x> is a vector, this operation is performed in a component-wise manner; the value passed in <exp> and returned by the function are vectors with the same number of components as <x>.
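The scalar behavior of frexp() and ldexp() described above lines up with the C standard library functions frexpf() and ldexpf(), which also return a significand with magnitude in [0.5, 1.0) and report zero for both parts when the input is zero. The following is a minimal C sketch of that correspondence, not a statement about how any GLSL implementation computes these functions. The helper names glsl_frexp and glsl_ldexp are hypothetical and exist only for illustration.

```c
#include <math.h>

/* Hypothetical scalar stand-in for the frexp() built-in described above.
   Returns the significand and writes the power-of-two exponent to *exp_out,
   so that x == significand * 2^(*exp_out). For negative x the C library
   returns a negative significand whose magnitude lies in [0.5, 1.0).
   Infinities and NaNs are passed straight to frexpf(); the specification
   text leaves those results undefined. */
static float glsl_frexp(float x, int *exp_out)
{
    return frexpf(x, exp_out);
}

/* Hypothetical scalar stand-in for ldexp(): significand * 2^exponent.
   Results too large for single precision are undefined per the text above;
   ldexpf() happens to return infinity in that case. */
static float glsl_ldexp(float significand, int exponent)
{
    return ldexpf(significand, exponent);
}
```

For example, splitting 10.0 yields a significand of 0.625 and an exponent of 4, and passing that pair back through the ldexp sketch rebuilds 10.0.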
(add support for new integer built-in functions)

Syntax:

    genIType bitfieldExtract(genIType value, int offset, int bits);
    genUType bitfieldExtract(genUType value, int offset, int bits);

    genIType bitfieldInsert(genIType base, genIType insert, int offset, int bits);
    genUType bitfieldInsert(genUType base, genUType insert, int offset, int bits);

    genIType bitfieldReverse(genIType value);
    genUType bitfieldReverse(genUType value);

    genIType bitCount(genIType value);
    genIType bitCount(genUType value);

    genIType findLSB(genIType value);
    genIType findLSB(genUType value);

    genIType findMSB(genIType value);
    genIType findMSB(genUType value);

The function bitfieldExtract() extracts bits <offset> through <offset>+<bits>-1 from each component in <value>, returning them in the least significant bits of the corresponding component of the result. For unsigned data types, the most significant bits of the result will be set to zero. For signed data types, the most significant bits will be set to the value of bit <offset>+<bits>-1. If <bits> is zero, the result will be zero. The result will be undefined if <offset> or <bits> is negative, or if the sum of <offset> and <bits> is greater than the number of bits used to store the operand. Note that for vector versions of bitfieldExtract(), a single pair of <offset> and <bits> values is shared for all components.

The function bitfieldInsert() inserts the <bits> least significant bits of each component of <insert> into the corresponding component of <base>. The result will have bits numbered <offset> through <offset>+<bits>-1 taken from bits 0 through <bits>-1 of <insert>, and all other bits taken directly from the corresponding bits of <base>. If <bits> is zero, the result will simply be <base>. The result will be undefined if <offset> or <bits> is negative, or if the sum of <offset> and <bits> is greater than the number of bits used to store the operand. Note that for vector versions of bitfieldInsert(), a single pair of <offset> and <bits> values is shared for all components.

The function bitfieldReverse() reverses the bits of <value>. The bit numbered <n> of the result will be taken from bit (<bits>-1)-<n> of <value>, where <bits> is the total number of bits used to represent <value>.
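The bitfield rules above can be checked against a small scalar model. The following C sketch assumes 32-bit operands and in-range <offset> and <bits> values (the cases the text leaves undefined are not handled). All function names here are hypothetical helpers for illustration and are not part of GLSL or the OpenGL API.

```c
#include <stdint.h>

/* Mask with the low `bits` bits set; avoids an undefined 32-bit shift. */
static uint32_t low_mask(int bits)
{
    return bits >= 32 ? 0xFFFFFFFFu : ((1u << bits) - 1u);
}

/* Unsigned extract: upper bits of the result are zero. */
static uint32_t bitfield_extract_u(uint32_t value, int offset, int bits)
{
    if (bits == 0) return 0u;
    return (value >> offset) & low_mask(bits);
}

/* Signed extract: upper bits replicate bit offset+bits-1 of value. */
static int32_t bitfield_extract_i(int32_t value, int offset, int bits)
{
    if (bits == 0) return 0;
    uint32_t field = ((uint32_t)value >> offset) & low_mask(bits);
    uint32_t sign  = 1u << (bits - 1);
    /* XOR-then-subtract sign extension, done in wrapping unsigned math;
       the final cast assumes the usual two's complement conversion. */
    return (int32_t)((field ^ sign) - sign);
}

/* Insert the low `bits` bits of `insert` at `offset`, keep the rest of base. */
static uint32_t bitfield_insert(uint32_t base, uint32_t insert,
                                int offset, int bits)
{
    if (bits == 0) return base;
    uint32_t mask = low_mask(bits) << offset;
    return (base & ~mask) | ((insert << offset) & mask);
}

/* Result bit (31 - i) is taken from bit i of value. */
static uint32_t bitfield_reverse(uint32_t value)
{
    uint32_t result = 0u;
    for (int i = 0; i < 32; ++i)
        result |= ((value >> i) & 1u) << (31 - i);
    return result;
}
```

In this model, extracting 4 bits at offset 8 from 0x00000F00 gives 15 for the unsigned variant and -1 for the signed variant, which illustrates the difference in how the most significant bits are filled.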
The function bitCount() returns the number of one bits in the binary representation of <value>.

The function findLSB() returns the bit number of the least significant one bit in the binary representation of <value>. If <value> is zero, -1 will be returned.

The function findMSB() returns the bit number of the most significant bit in the binary representation of <value>. For positive integers, the result will be the bit number of the most significant one bit. For negative integers, the result will be the bit number of the most significant zero bit. For a <value> of zero or negative one, -1 will be returned.

(add support for general packing functions)

Syntax:

    uint packUnorm2x16(vec2 v);
    uint packUnorm4x8(vec4 v);
    uint packSnorm4x8(vec4 v);

    vec2 unpackUnorm2x16(uint v);
    vec4 unpackUnorm4x8(uint v);
    vec4 unpackSnorm4x8(uint v);

The functions packUnorm2x16(), packUnorm4x8(), and packSnorm4x8() first convert each component of a two- or four-component vector of normalized floating-point values into 8- or 16-bit integer values. Then, the results are packed into a 32-bit unsigned integer. The first component of the vector will be written to the least significant bits of the output; the last component will be written to the most significant bits.

The functions unpackUnorm2x16(), unpackUnorm4x8(), and unpackSnorm4x8() first unpack a single 32-bit unsigned integer into a pair of 16-bit unsigned integers, four 8-bit unsigned integers, or four 8-bit signed integers. Then, each component is converted to a normalized floating-point value to generate a two- or four-component vector. The first component of the vector will be extracted from the least significant bits of the input; the last component will be extracted from the most significant bits.

The conversion between fixed- and normalized floating-point values will be performed as below.

    function          conversion
    ---------------   -----------------------------------------------------
    packUnorm2x16     fixed_val = round(clamp(float_val, 0, +1) * 65535.0);
    packUnorm4x8      fixed_val = round(clamp(float_val, 0, +1) * 255.0);
    packSnorm4x8      fixed_val = round(clamp(float_val, -1, +1) * 127.0);
    unpackUnorm2x16   float_val = fixed_val / 65535.0;
    unpackUnorm4x8    float_val = fixed_val / 255.0;
    unpackSnorm4x8    float_val = clamp(fixed_val / 127.0, -1, +1);

(add functions to get/set the bit encoding for floating-point values)

32-bit floating-point data types in the OpenGL shading language are specified to be encoded according to the IEEE 754 specification for single-precision floating-point values. The functions below allow shaders to convert floating-point values to and from signed or unsigned integers representing their encoding.

To obtain signed or unsigned integer values holding the encoding of a floating-point value, use:

    genIType floatBitsToInt(genType value);
    genUType floatBitsToUint(genType value);

Conversions are done on a component-by-component basis.

To obtain a floating-point value corresponding to a signed or unsigned integer encoding, use:

    genType intBitsToFloat(genIType value);
    genType uintBitsToFloat(genUType value);

(support for unsigned integer add/subtract with carry-out)

Syntax:

    genUType uaddCarry(genUType x, genUType y, out genUType carry);
    genUType usubBorrow(genUType x, genUType y, out genUType borrow);

The function uaddCarry() adds 32-bit unsigned integers or vectors <x> and <y>, returning the sum modulo 2^32. The value <carry> is set to zero if the sum was less than 2^32, or one otherwise.

The function usubBorrow() subtracts the 32-bit unsigned integer or vector <y> from <x>, returning the difference if non-negative or 2^32 plus the difference, otherwise. The value <borrow> is set to zero if x >= y, or one otherwise.
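The carry and borrow semantics above map directly onto wrapping unsigned arithmetic. The following is a minimal scalar C sketch of that behavior. The names uadd_carry and usub_borrow are hypothetical helpers chosen for this illustration, and vector forms would simply apply the same logic per component.

```c
#include <stdint.h>

/* Scalar model of uaddCarry(): returns (x + y) mod 2^32 and sets *carry
   to 1 when the true sum is 2^32 or larger, 0 otherwise. */
static uint32_t uadd_carry(uint32_t x, uint32_t y, uint32_t *carry)
{
    uint32_t sum = x + y;            /* unsigned addition wraps modulo 2^32 */
    *carry = (sum < x) ? 1u : 0u;    /* a wrapped sum is smaller than x */
    return sum;
}

/* Scalar model of usubBorrow(): returns x - y when x >= y, otherwise
   2^32 + (x - y); *borrow is 0 when x >= y and 1 otherwise. */
static uint32_t usub_borrow(uint32_t x, uint32_t y, uint32_t *borrow)
{
    *borrow = (x >= y) ? 0u : 1u;
    return x - y;                    /* unsigned subtraction wraps as well */
}
```

One typical use of the carry output is to chain two 32-bit additions into a 64-bit addition: add the low words first, then add the high words plus the carry produced by the low-word sum.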
(support for signed and unsigned multiplies, with 32-bit inputs and a 64-bit result spanning two 32-bit outputs)

Syntax:

    void umulExtended(genUType x, genUType y, out genUType msb, out genUType lsb);
    void imulExtended(genIType x, genIType y, out genIType msb, out genIType lsb);

The functions umulExtended() and imulExtended() multiply 32-bit unsigned or signed integers or vectors <x> and <y>, producing a 64-bit result. The 32 least significant bits are returned in <lsb>; the 32 most significant bits are returned in <msb>.

Modify Section 8.7, Texture Lookup Functions, p. 91

(extend the basic versions of textureGather from ARB_texture_gather, allowing for optional component selection in a multi-component texture and for shadow mapping)

Syntax:

    gvec4 textureGather(gsampler2D sampler, vec2 coord[, int comp]);
    gvec4 textureGather(gsampler2DArray sampler, vec3 coord[, int comp]);
    gvec4 textureGather(gsamplerCube sampler, vec3 coord[, int comp]);
    gvec4 textureGather(gsamplerCubeArray sampler, vec4 coord[, int comp]);
    gvec4 textureGather(gsampler2DRect sampler, vec2 coord[, int comp]);

    vec4 textureGather(sampler2DShadow sampler, vec2 coord, float refZ);
    vec4 textureGather(sampler2DArrayShadow sampler, vec3 coord, float refZ);
    vec4 textureGather(samplerCubeShadow sampler, vec3 coord, float refZ);
    vec4 textureGather(samplerCubeArrayShadow sampler, vec4 coord, float refZ);
    vec4 textureGather(sampler2DRectShadow sampler, vec2 coord, float refZ);

The textureGather() functions use the texture coordinates given by <coord> to determine a set of four texels to sample from the texture identified by <sampler>. These functions return a four-component vector consisting of one component from each texel. If specified, the value of <comp> must be a constant integer expression with a value of zero, one, two, or three, identifying the <x>, <y>, <z>, or <w> component of the four-component vector lookup result for each texel, respectively. If <comp> is not specified, the <x> component of each texel will be used to generate the result vector.
As described in the OpenGL Specification, the vector selects the post-swizzle component corresponding to <comp> from each of the four texels, returning:

    vec4(T_i0_j1(coord, base).<comp>,
         T_i1_j1(coord, base).<comp>,
         T_i1_j0(coord, base).<comp>,
         T_i0_j0(coord, base).<comp>)

For textureGather() functions using a shadow sampler type, each of the four texel lookups performs a depth comparison against the depth reference value passed in <refZ>, and returns the result of that comparison in the appropriate component of the result vector. The parameter <comp> used for component selection is not supported for textureGather() functions with shadow sampler types.

As with other texture lookup functions, the results of textureGather() are undefined for shadow samplers if the texture referenced is not a depth texture or has depth comparisons disabled; or for non-shadow samplers if the texture referenced is a depth texture with depth comparisons enabled.

(extend the "Offset" versions of textureGather from ARB_texture_gather, allowing for optional component selection in a multi-component texture, non-constant offsets, and shadow mapping)

Syntax:

    gvec4 textureGatherOffset(gsampler2D sampler, vec2 coord, ivec2 offset[, int comp]);
    gvec4 textureGatherOffset(gsampler2DArray sampler, vec3 coord, ivec2 offset[, int comp]);
    gvec4 textureGatherOffset(gsampler2DRect sampler, vec2 coord, ivec2 offset[, int comp]);

    vec4 textureGatherOffset(sampler2DShadow sampler, vec2 coord, float refZ, ivec2 offset);
    vec4 textureGatherOffset(sampler2DArrayShadow sampler, vec3 coord, float refZ, ivec2 offset);
    vec4 textureGatherOffset(sampler2DRectShadow sampler, vec2 coord, float refZ, ivec2 offset);

The textureGatherOffset() functions operate identically to textureGather(), except that the 2-component integer texel offset vector <offset> is applied as a (u,v) offset to determine the four texels to sample. The value <offset> need not be constant; however, only a limited range of offset values is supported.
If any component of <offset> is less than MIN_PROGRAM_TEXTURE_GATHER_OFFSET_ARB or greater than MAX_PROGRAM_TEXTURE_GATHER_OFFSET_ARB, the offset applied to the texture coordinates is undefined. Note that <offset> does not apply to the layer coordinate for array textures.

(add new "Offsets" versions of textureGather from ARB_texture_gather, allowing for optional component selection in a multi-component texture, separate non-constant offsets for each texel in the footprint, and shadow mapping)

Syntax:

    gvec4 textureGatherOffsets(gsampler2D sampler, vec2 coord, ivec2 offsets[4][, int comp]);
    gvec4 textureGatherOffsets(gsampler2DArray sampler, vec3 coord, ivec2 offsets[4][, int comp]);
    gvec4 textureGatherOffsets(gsampler2DRect sampler, vec2 coord, ivec2 offsets[4][, int comp]);

    vec4 textureGatherOffsets(sampler2DShadow sampler, vec2 coord, float refZ, ivec2 offsets[4]);
    vec4 textureGatherOffsets(sampler2DArrayShadow sampler, vec3 coord, float refZ, ivec2 offsets[4]);
    vec4 textureGatherOffsets(sampler2DRectShadow sampler, vec2 coord, float refZ, ivec2 offsets[4]);

The textureGatherOffsets() functions operate identically to textureGather(), except that the array of two-component integer vectors <offsets> is used to determine the location of the four texels to sample. Each of the four texels is obtained by applying the corresponding offset in the four-element array <offsets> as a (u,v) coordinate offset to the coordinates <coord>, identifying the four-texel LINEAR footprint, and then selecting the texel T_i0_j0 of that footprint. The specified values in <offsets> must be constant. A limited range of offset values is supported; the minimum and maximum offset values are implementation-dependent and given by MIN_PROGRAM_TEXTURE_GATHER_OFFSET_ARB and MAX_PROGRAM_TEXTURE_GATHER_OFFSET_ARB, respectively. Note that <offsets> does not apply to the layer coordinate for array textures.

Modify Section 8.8, Fragment Processing Functions, p. 101

(add new functions to the end of section, p. 102)

Built-in interpolation functions are available to compute an interpolated value of a fragment shader input variable at a shader-specified (x,y) location. A separate (x,y) location may be used for each invocation of the built-in function, and those locations may differ from the default (x,y) location used to produce the default value of the input.

    float interpolateAtCentroid(float interpolant);
    vec2 interpolateAtCentroid(vec2 interpolant);
    vec3 interpolateAtCentroid(vec3 interpolant);
    vec4 interpolateAtCentroid(vec4 interpolant);

    float interpolateAtSample(float interpolant, int sample);
    vec2 interpolateAtSample(vec2 interpolant, int sample);
    vec3 interpolateAtSample(vec3 interpolant, int sample);
    vec4 interpolateAtSample(vec4 interpolant, int sample);

    float interpolateAtOffset(float interpolant, vec2 offset);
    vec2 interpolateAtOffset(vec2 interpolant, vec2 offset);
    vec3 interpolateAtOffset(vec3 interpolant, vec2 offset);
    vec4 interpolateAtOffset(vec4 interpolant, vec2 offset);

The function interpolateAtCentroid() will return the value of the input varying <interpolant> sampled at a location inside both the pixel and the primitive being processed. The value obtained would be the same value assigned to the input variable if declared with the "centroid" qualifier.

The function interpolateAtSample() will return the value of the input varying <interpolant> at the location of the sample numbered <sample>. If multisample buffers are not available, the input varying will be evaluated at the center of the pixel. If the sample number given by <sample> does not exist, the position used to interpolate the input varying is undefined.

The function interpolateAtOffset() will return the value of the input varying <interpolant> sampled at an offset from the center of the pixel specified by <offset>. The two floating-point components of <offset> give the offset in pixels in the x and y directions, respectively. An offset of (0,0) identifies the center of the pixel.
The range and granularity of offsets supported by this function is implementation-dependent.

For all of the interpolation functions, <interpolant> must be an input variable or an element of an input variable declared as an array. Component selection operators (e.g., ".xy") may not be used when specifying <interpolant>. If <interpolant> is declared with a "flat" or "centroid" qualifier, the qualifier will have no effect on the interpolated value. If <interpolant> is declared with the "noperspective" qualifier, the interpolated value will be computed without perspective correction.

Modify Section 8.10, Geometry Shader Functions, p. 104

(replace the section, using the following more general formulation)

These functions are only available in geometry shaders.

Syntax:

    void EmitStreamVertex(int stream);      // Geometry-only
    void EndStreamPrimitive(int stream);    // Geometry-only

    void EmitVertex();                      // Geometry-only
    void EndPrimitive();                    // Geometry-only

Description:

The function EmitStreamVertex() specifies that the vertex being generated by the geometry shader is completed. A vertex is added to the current output primitive in the vertex stream numbered <stream> using the current values of all output variables associated with <stream>. The values of any unwritten output variables associated with <stream> are undefined. The argument <stream> must be a constant integral expression. The values of all output variables (for all output streams) are undefined after calling EmitStreamVertex(). If a geometry shader invocation has emitted more vertices than permitted by the output layout qualifier "max_vertices", the results of calling EmitStreamVertex() are undefined.

The function EmitVertex() is equivalent to calling EmitStreamVertex() with <stream> set to zero.

The function EndStreamPrimitive() specifies that the current output primitive for the vertex stream numbered <stream> is completed and that a new empty output primitive of the same type should be started. The argument <stream> must be a constant integral expression. This function does not emit a vertex.
If the output layout is declared to be "points", calling EndPrimitive() is optional.

The function EndPrimitive() is equivalent to calling EndStreamPrimitive() with <stream> set to zero.

A geometry shader starts with an output primitive containing no vertices for each stream. When a geometry shader terminates, the current output primitive for each vertex stream is automatically completed. It is not necessary to call EndPrimitive() or EndStreamPrimitive() for any stream where the geometry shader writes only a single primitive.

Multiple vertex streams are supported only if the output primitive type is declared to be "points". A program will fail to link if it contains a geometry shader calling EmitStreamVertex() or EndStreamPrimitive() if its output primitive type is not "points".

Modify Section 9, Shading Language Grammar, p. 92

!!! TBD !!!

GLX Protocol

None.

Dependencies on ARB_gpu_shader_fp64

This extension, ARB_gpu_shader_fp64, and NV_gpu_shader5 all modify the set of implicit conversions supported in the OpenGL Shading Language. If more than one of these extensions is supported, an expression of one type may be converted to another type if that conversion is allowed by any of these specifications.

If ARB_gpu_shader_fp64 or a similar extension introducing new data types is not supported, the function overloading rule in the GLSL specification preferring promotion of an input parameter from a smaller type to a larger type is never applicable, as all data types are of the same size. That rule and the example referring to "double" should be removed.

Dependencies on NV_gpu_shader5

This extension, ARB_gpu_shader_fp64, and NV_gpu_shader5 all modify the set of implicit conversions supported in the OpenGL Shading Language. If more than one of these extensions is supported, an expression of one type may be converted to another type if that conversion is allowed by any of these specifications.
This specification and NV_gpu_shader5 both lift the restriction in GLSL 1.50 requiring that indexing in arrays of samplers must be done with constant expressions. However, this extension specifies that results are undefined if the indices would diverge if multiple shader invocations are run in lockstep. NV_gpu_shader5 does not impose the non-divergent indexing requirement.

If NV_gpu_shader5 is supported, integer data types are supported with four different precisions (8-, 16-, 32-, and 64-bit) and floating-point data types are supported with three different precisions (16-, 32-, and 64-bit). The extension adds the following rule for output parameters, which is similar to the one present in this extension for input parameters:

  5. If the formal parameters in both matches are output parameters, a conversion from a type with a larger number of bits per component is better than a conversion from a type with a smaller number of bits per component. For example, a conversion from an "int16_t" formal parameter type to "int" is better than one from an "int8_t" formal parameter type to "int".

Such a rule is not provided in this extension because there is no combination of types in this extension and ARB_gpu_shader_fp64 where this rule has any effect.

Dependencies on ARB_sample_shading

This extension builds upon the per-sample shading support provided by ARB_sample_shading to provide several new capabilities, including:

  * the built-in variable gl_SampleMaskIn[] indicates the set of samples covered by the input primitive corresponding to the fragment shader invocation; and

  * use of the "sample" qualifier on a fragment shader input forces per-sample shading, and specifies that the value of the input be evaluated per-sample.

There is no interaction between the extensions, except that shaders using the features of this extension seem likely to use features from ARB_sample_shading as well.
Dependencies on ARB_texture_gather

This extension builds upon the textureGather() built-ins provided by ARB_texture_gather to provide several new capabilities, including:

  * allowing shaders to select any single component of a multi-component texture to produce the gathered 2x2 footprint;

  * allowing shaders to perform a per-sample depth comparison when gathering the 2x2 footprint for shadow sampler types;

  * allowing shaders to use arbitrary offsets computed at run-time to select a 2x2 footprint to gather from; and

  * allowing shaders to use separate independent offsets for each of the four texels returned, instead of requiring a fixed 2x2 footprint.

Other than the fact that they provide similar functionality, there is no interaction between the extensions. Since this extension requires support for gathering from multi-component textures, the minimum value of MAX_PROGRAM_TEXTURE_GATHER_COMPONENTS_ARB is increased to 4.

Errors

INVALID_OPERATION is generated by GetProgramiv if <pname> is GEOMETRY_SHADER_INVOCATIONS and the program has not been linked successfully or does not contain objects to form a geometry shader.

New State

Add the following state to Table 6.40, Program Object State, p. 378

                                                     Initial
    Get Value                    Type  Get Command   Value    Description                 Sec.    Attribute
    ---------------------------  ----  ------------  -------  --------------------------  ------  ---------
    GEOMETRY_SHADER_INVOCATIONS  Z+    GetProgramiv  1        number of times a geometry  6.1.16  -
                                                              shader should be executed
                                                              for each input primitive

New Implementation Dependent State

                                                           Min.
    Get Value                           Type  Get Command  Value  Description                 Sec.    Attrib
    ----------------------------------  ----  -----------  -----  --------------------------  ------  ------
    MAX_GEOMETRY_SHADER_INVOCATIONS     Z+    GetIntegerv  32     maximum supported geometry  2.16.4  -
                                                                  shader invocation count
    MIN_FRAGMENT_INTERPOLATION_OFFSET   R     GetFloatv    -0.5   furthest negative offset    3.12.1  -
                                                                  for interpolateAtOffset()
    MAX_FRAGMENT_INTERPOLATION_OFFSET   R     GetFloatv    +0.5   furthest positive offset    3.12.1  -
                                                                  for interpolateAtOffset()
    FRAGMENT_INTERPOLATION_OFFSET_BITS  Z+    GetIntegerv  4      subpixel bits for           3.12.1  -
                                                                  interpolateAtOffset()
    MAX_VERTEX_STREAMS                  Z+    GetIntegerv  4      total number of vertex      2.16.4  -
                                                                  streams

(Note: The minimum value for MAX_PROGRAM_TEXTURE_GATHER_COMPONENTS_ARB, added by ARB_texture_gather, is increased to 4.)

Issues

(1) This extension builds on the capability provided by ARB_sample_shading, adding a new built-in variable for the input sample mask. It seems likely that a shader using this mask might also want to use one or more ARB_sample_shading built-ins. Are such shaders required to include #extension lines for both extensions?

UNRESOLVED: It would be nice if it wasn't required.

(2) How do the per-sample shading features of this extension interact with non-multisample rendering?

RESOLVED: Non-multisample rendering (due to no multisample buffer or MULTISAMPLE disabled) is treated as single-sample rendering.

(3) This extension lifts the restriction requiring that indices into samplers be constant expressions, but makes the results undefined if the indices used would diverge in lockstep execution. What is this good for?

RESOLVED: This allows shaders to index into samplers using integer uniforms, or with non-divergent values computed at run-time (e.g., loop counters). Many implementations of this extension will be SIMD, running multiple shader invocations at once, and some implementations may have difficulty with accessing multiple textures in a single SIMD instruction.
Note that the NV_gpu_shader5 extension similarly lifts the restriction but does not require non-divergent indexing.

(4) What sort of implicit conversions should we support in this and related extensions?

RESOLVED: In GLSL 1.50, we have implicit conversion from "int" and "uint" to "float", as well as equivalent conversions for vector types. One of the primary motivations of this feature is to allow constants that are nominally integer values to be used in floating-point contexts without requiring special suffixes. The following code compiles successfully in GLSL 1.50.

    float square(float x) { return x * x; }
    float f = 0;
    float g = f * 2;
    float h = square(3);

The same code would fail on GLSL 1.1, because "0", "2", and "3" would need to be written as "0.0", "2.0", and "3.0", respectively.

This extension adds implicit conversions from "int" to "uint" to allow for cases like:

    uint square(uint x) { return x * x; }
    uint v = square(2);

This code is legal with this extension, but not in GLSL 1.50 ("2" would need to be replaced with "2U" or "uint(2)").

ARB_gpu_shader_fp64 adds a new type "double", and we extend existing implicit conversions to allow for promotion of "int", "uint", and "float" to "double".

Unlike C/C++, the general rule for implicit conversions in GLSL is that conversions are unidirectional. If type A can be implicitly converted to type B, type B cannot be implicitly converted to type A.

(5) Increasing the number of available implicit conversions means that there is the possibility of ambiguities in various operators. How do we deal with these cases?

RESOLVED: For binary operators, the new implicit conversions mean that there may be multiple ways to resolve an expression. For example, given the declarations

    int i;
    uint u;

the expression "i+u" could be resolved either by implicitly converting "i" to "uint", or by implicitly converting both values to either "float" or "double".
To resolve, we define a set of preferences for a common data type based on the types of the operands:

  - use a floating-point type if either operand is floating-point
  - use an unsigned integer type if either operand is unsigned
  - use a signed integer type otherwise

If conversions to multiple precisions are supported, the lowest-precision available data type is preferred (e.g., int*float will be converted to float*float and not double*double). These rules should extend naturally if new basic data types are added.

(6) Increasing the number of available implicit conversions means that there is an increased possibility of ambiguity when function overloading is involved. Additionally, this and related extensions add new function overloads. How do we deal with these cases?

RESOLVED: The general rule for function overloading in GLSL 1.50 is that we first check for a function prototype that exactly matches the parameters passed to a function call. If no match exists, we check for prototypes that can be matched by implicit conversions. If more than one prototype can be matched by conversion, the function call is considered ambiguous and results in a compilation error.

Unfortunately, when adding new implicit conversions, it is possible for cases that were formerly unambiguous to become ambiguous. For backward compatibility purposes, it would be desirable to ensure that shaders that compiled successfully in old language versions still compile if "upgraded" to more recent versions/extensions. However, the new conversions and overloads might make this more difficult without modifying other language rules.
For example, the following prototypes are available for the standard built-in function min() on scalar values when this extension and ARB_gpu_shader_fp64 are supported: int min(int a, int b); uint min(uint a, uint b); float min(float a, float b); double min(double a, double b); In GLSL 1.50, a function call such as: float f; min(f, 1); would be considered unambiguous because the double-precision version of min() didn't exist and the call matched only the single-precision version. However, with double-precision, implicit conversions can be used to resolve to either the single- or double-precision versions. To resolve this issue, we provide a set of rules that can be used to resolve multiple candidates to a "best match". The rules for determining a best match are similar to those for C++ function overloading, but not exactly the same. Like C++, these rules compare the conversions required on an argument-by-argument basis. A function prototype A is better than function prototype B if: - A is better than B for one or more arguments - B is better than A for no arguments If a single function prototype is better than all others, that one is used. Otherwise, we get the same ambiguity error as on previous GLSL versions. As far as argument-by-argument comparisons go, the order of preference is: - favor exact matches - prefer "promotions" (float->double) to other conversions - prefer conversions from int/uint to float over similar conversion to double If none of the rules apply, one match is considered neither better nor worse than the other. With these rules, the "min(f,1)" example above resolves to the "float" version, as is the case in GLSL 1.50. However, there are other cases where ambiguity remains. For example, consider the prototypes: int f(uint x); int f(float x); With GLSL 1.50 rules, "f(3)" would match the floating-point version, as no implicit conversions existed from "int" to "uint". 
With the new implicit conversions, both prototypes match and neither is preferred. Because of the ambiguity, "f(3)" would fail to compile with this extension enabled, but should still compile on implementations supporting this extension if the extension is not enabled in GLSL source code.

(7) The function overloading rules described in this extension describe conversions between data types with different sizes. However, all existing data types allowing implicit conversion (int, uint, float) are the same size. Why do we specify these rules?

RESOLVED: This extension is specified at the same time as the related ARB_gpu_shader_fp64 and NV_gpu_shader5 extensions, which do provide such types. The rules are specified all in one place here so we don't have to replicate and extend the rules in the other extensions. It also provides the ability to automatically convert from signed to unsigned integer types, as in the C programming language.

(8) Should we support textureGather() for rectangle textures (sampler2DRect)? They aren't in ARB_texture_gather.

RESOLVED: Yes.

(9) How does the input sample mask interact with the fixed-function SampleCoverage and SampleMask state? Will samples be removed from the input mask if they would be eliminated by these masks in the per-fragment operations?

UNRESOLVED.

(10) Should we support reading patches as geometry shader inputs, and if so, where?

RESOLVED: Not in this extension. This capability will be provided in NV_gpu_shader5.

(11) Should we support per-sample interpolation of attributes? If so, how?

RESOLVED: Yes. When multisample rasterization is enabled, qualifying one or more fragment shader inputs with "sample" will force per-sample interpolation of those attributes. If the same shader includes other fragment inputs not qualified with "sample", those attributes may be interpolated per-pixel (i.e., all samples get the same values, likely evaluated at the pixel center).
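The mix of qualified and unqualified inputs described in issue (11) might look like the following fragment shader. This is only a sketch: the input names "detailCoord" and "baseColor", the sampler "detailTex", and the output "fragColor" are hypothetical and are assumed to match declarations elsewhere in the program.

```glsl
#version 150
#extension GL_ARB_gpu_shader5 : require

// Hypothetical inputs; the previous shader stage is assumed to declare
// matching outputs with the same names and types.
sample in vec2 detailCoord;   // interpolated separately for each sample
in vec4 baseColor;            // may be interpolated once per pixel

uniform sampler2D detailTex;  // hypothetical texture

out vec4 fragColor;

void main()
{
    // Only detailCoord is forced to per-sample interpolation; baseColor
    // may take the same value for every sample of the pixel.
    fragColor = baseColor * texture(detailTex, detailCoord);
}
```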
(12) Should we reserve "sample" as a keyword for per-sample interpolation qualifiers, or use something more obscure, such as "per_sample"?

RESOLVED: This extension uses "sample".

(13) What should be the base data type for the bitCount(), findLSB(), and findMSB() functions -- signed or unsigned integers?

RESOLVED: These functions will return signed values, with -1 returned by findLSB/findMSB if no bit is found. Note that the shading language supports implicit conversions of signed integers to unsigned, which makes it easy enough to obtain an unsigned result if one is desired.

(14) Why do EmitVertex() and EndPrimitive() begin with capitalized words while most of the other built-ins start with a lower-case letter (e.g., emitVertex)? Which precedent should the new per-vertex stream emit and end primitive functions follow?

RESOLVED: The inconsistency began with the original functions in EXT_geometry_shader4; the spec author can't recall the original reasons (if any). Regardless, we decided to match the existing functions as closely as possible and use EmitStreamVertex() and EndStreamPrimitive().

(15) How do the textureGather functions work with sRGB textures?

RESOLVED: Gamma-correction is applied to the texture source color before "gathering" and hence applies to all four components, unless the texture swizzle of the selected component is ALPHA in which case no gamma-correction is applied.

(16) How should we support arrays of uniform blocks (i.e., multiple blocks in a group, each backed by a separate buffer object)?

RESOLVED: We will use instance names in the block definitions, which can be declared as regular arrays:

    uniform UniformData {
      vec4 stuff;
    } blocks[4];

The four blocks will be referred to as "blocks[0]" through "blocks[3]" in shader code, and "UniformData[0]" through "UniformData[3]" in the OpenGL API code. The block member in this example will be referred to as "UniformData.stuff" in the API.
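As a rough illustration of how shader code might use the declaration above, the following vertex shader sketch selects one of the four blocks at run time. The uniform "blockIndex" and the input "position" are hypothetical names, and the sketch assumes that a uniform integer is an acceptable index for the block array.

```glsl
#version 150
#extension GL_ARB_gpu_shader5 : require

// Four blocks, each backed by a separate buffer object.
uniform UniformData {
    vec4 stuff;
} blocks[4];

uniform int blockIndex;   // hypothetical; assumed to be in the range [0, 3]

in vec4 position;         // hypothetical vertex attribute

void main()
{
    // Shader code uses the instance name ("blocks"); per the text above,
    // API code refers to the same blocks as "UniformData[0]" through
    // "UniformData[3]".
    gl_Position = position + blocks[blockIndex].stuff;
}
```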
A similar approach was already adopted in GLSL 1.50, where geometry shaders supported arrays of input blocks that were treated similarly. Since this spec depends on GLSL 1.50, little new spec language is required here.

(17) What are instanced geometry shaders useful for?

RESOLVED: Instanced geometry shaders allow geometry programs that perform regular operations to run more efficiently. Consider a simple example of an algorithm that uses geometry shaders to render primitives to a cube map in a single pass. Without instanced geometry shaders, the geometry shader to render triangles to the cube map would do something like:

    for (face = 0; face < 6; face++) {
      for (vertex = 0; vertex < 3; vertex++) {
        project vertex <vertex> onto face <face>, output position
        compute/copy attributes of emitted <vertex> to outputs
        output <face> to result.layer
        emit the projected vertex
      }
      end the primitive (next triangle)
    }

This algorithm would output 18 vertices per input triangle, three for each cube face. The six triangles emitted would be rasterized, one per face. Geometry shaders that emit a large number of attributes have often posed performance challenges, since all the attributes must be stored somewhere until the emitted primitives have been processed. Large storage requirements may limit the number of threads that can be run in parallel and reduce overall performance.

Instanced geometry shaders allow this example to be restructured to run with six separate invocations, one per face. Each invocation projects the triangle to only a single face (identified by the invocation number) and emits only 3 vertices. The reduced storage requirements allow more geometry shader invocations to be run in parallel, with greater overall efficiency.

Additionally, the total number of attributes that can be emitted by a single geometry shader invocation is limited. However, for instanced geometry shaders, that limit applies to each invocation separately, which allows for a larger total output.
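A minimal sketch of the six-invocation restructuring described above is shown here. It assumes hypothetical inputs "worldPos" and "vertColor" from the vertex shader and a hypothetical uniform array "faceViewProj" holding one view-projection matrix per cube face; a real shader would carry whatever attributes the application needs.

```glsl
#version 150
#extension GL_ARB_gpu_shader5 : require

// Run the shader six times per input triangle, once per cube face.
layout(triangles, invocations = 6) in;
layout(triangle_strip, max_vertices = 3) out;

uniform mat4 faceViewProj[6];   // hypothetical per-face matrices

in vec4 worldPos[];             // hypothetical inputs from the vertex shader
in vec4 vertColor[];

out vec4 color;

void main()
{
    // gl_InvocationID identifies the face handled by this invocation.
    for (int v = 0; v < 3; v++) {
        gl_Position = faceViewProj[gl_InvocationID] * worldPos[v];
        color = vertColor[v];
        gl_Layer = gl_InvocationID;
        EmitVertex();
    }
    EndPrimitive();
}
```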
For example, if the GL implementation supports only 1024 components of output per invocation, the 18-vertex algorithm above could emit no more than 56 components per vertex. The same algorithm implemented as a 3-vertex 6-invocation geometry program could theoretically allow for 341 components per vertex.

(18) Should EmitStreamVertex() and EndStreamPrimitive() accept a non-constant stream number?

RESOLVED: Not in this extension. Requiring a constant stream number for each call simplifies code generation for the compiler.

(19) Are there any restrictions on geometry shaders with multiple output streams?

RESOLVED: Yes, such geometry shaders are required to generate points; line strip and triangle strip outputs are not supported.

(20) Since multi-stream geometry shaders only support points, why does EndStreamPrimitive() exist? Neither it nor EndPrimitive() does anything useful when emitting points.

RESOLVED: This function was added for completeness, and would be useful if the requirement for emitting points were lifted by a future extension.

(21) Should we provide mechanisms allowing shaders to examine or set the bit representation of floating-point numbers?

RESOLVED: Yes, we will provide functions to convert single-precision floats to/from signed and unsigned 32-bit integers. The ARB_gpu_shader_fp64 extension will provide similar functionality for double-precision floats. We chose to adopt the Java naming convention here -- converting a single-precision float to/from a signed integer is accomplished by the functions floatBitsToInt() and intBitsToFloat().

Note that this functionality has also been forked off into a separate extension (ARB_shader_bit_encoding) that can be exported on implementations capable of performing such conversions but not capable of the full feature set of this extension and/or OpenGL 4.0.

(22) What is the "precise" qualifier good for?

RESOLVED: Like "invariant", "precise" provides some invariance guarantees that are useful for certain algorithms.
With an output position qualified as "invariant", we ensure that if the same geometry is processed by multiple shaders using the exact same code, it will be transformed in exactly the same way to ensure that we have no cracking or flickering in multi-pass algorithms using different shaders.

With "precise", we ensure that an algorithm can be written to produce identical results on subtly different inputs. For example, the order of vertices visible to a geometry or tessellation shader used to subdivide primitive edges might present an edge shared between two primitives in one direction for one primitive and the other direction for the adjacent primitive. Even if the weights are identical in the two cases, there may be cracking if the computations are being done in an order-dependent manner. If the position of a new vertex were provided by evaluating the function f() below with limited-precision floating-point math, it's not necessarily the case that f(a,b,c) == f(c,b,a) in the following code:

    float f(float x, float y, float z)
    {
      return (x + y) + z;
    }

This function f() can be rewritten as follows with "precise" and a symmetric evaluation order to ensure that f(a,b,c) == f(c,b,a).

    float f(float x, float y, float z)
    {
      // Note that we intentionally compute "(x+z)" instead of "(x+y)"
      // here, because that value will be the same when <x> and <z>
      // are reversed.
      precise float result = (x + z) + y;
      return result;
    }

      (a + b) + c == (c + b) + a

The "precise" qualifier will disable certain optimizations and thus carries a performance cost. The cost may be higher than "invariant", because "invariant" permits optimizations disallowed by "precise" as long as the compiler ensures that it always optimizes in the exact same manner.

(23) What computations will be affected by the "precise" qualifier, and what computations aren't?
RESOLVED: We will ensure precise computation of any expressions within a single function used directly or indirectly to produce the value of a variable qualified as "precise". We chose not to provide this guarantee across function boundaries, even if the results of a function are used in the computation of an output qualified as "precise".

Algorithms requiring the use of "precise" may have a mix of computations, some required to be precise, some not. This function boundary rule may serve to limit the amount of computation indirectly forced to be precise. Additionally, the function boundary rule permits non-precise sub-operations in a computation required to be precise. For example, a shader might need to compute a "precise" position by taking a weighted average as in the following code:

    precise vec3 pos = (p[0]*w[0] + p[1]*w[1]) + (p[2]*w[2] + p[3]*w[3]);

However, if the main precision requirement is that the same result be generated when <p> and <w> are reversed, the following code also gets the job done, even if posmad() is implemented with multiply-add operations.

    vec3 posmad(vec3 p0, float w0, vec3 p1w1)
    {
      return p0*w0+p1w1;
    }

    precise vec3 pos = (posmad(p[0], w[0], p[1]*w[1]) +
                        posmad(p[3], w[3], p[2]*w[2]));

To generate precise results within a function, the function arguments and/or temporaries within the function body should be qualified as "precise" as needed.

Note that when applying "precise" rules to assignments, indirect application of this rule applies on an assignment-by-assignment basis. In the following perverse example:

    float a,b,c,d,e,f;
    precise float g;
    f = a + b + c;
    ...
    f = c + d + e;
    g = f * 2.0;

The first assignment to <f> need not be treated as "precise", since the value assigned will have no effect on the final value of the precise-qualified <g>. The second assignment to <f> must be evaluated precisely. The fact that one assignment to a variable needs to be treated as precise does not mean that the variable itself is implicitly treated as "precise".

(24) Are "precise" qualifiers allowed on function arguments? If so, what do they mean? Can a return value for a function be declared as precise?

RESOLVED: Yes; the rules permit the use of "precise" on any variable declaration, including function arguments. The code

    float f(precise in vec4 arg1, precise out vec4 arg2) { ... }

specifies that any expressions used to assign values to <arg1> or <arg2> within f() will be evaluated in a precise manner.

Expressions used to derive the value passed to the function f() as <arg1> will be treated as precise according to the normal rules. The expression for <arg1> is treated as precise if and only if the function call is on the right-hand side of an assignment to a variable qualified as "precise" or is indirectly used in an assignment to such a variable. It is not automatically treated as precise just because the formal parameter is qualified with "precise".
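As a hedged sketch of a "precise" formal parameter, the vertex shader below qualifies an output parameter so that the expression assigned to it inside the function body is evaluated precisely. The function name "symmetricSum" and the inputs "cornerA", "cornerB", and "cornerC" are hypothetical.

```glsl
#version 150
#extension GL_ARB_gpu_shader5 : require

in vec3 cornerA;   // hypothetical vertex attributes
in vec3 cornerB;
in vec3 cornerC;

// Because <sum> is declared "precise out", the expression assigned to it
// inside this function is evaluated exactly as written.
void symmetricSum(in vec3 x, in vec3 y, in vec3 z, precise out vec3 sum)
{
    // (x + z) is unchanged when <x> and <z> are swapped.
    sum = (x + z) + y;
}

void main()
{
    vec3 p;
    symmetricSum(cornerA, cornerB, cornerC, p);
    gl_Position = vec4(p, 1.0);
}
```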
For the purposes of this rule, variables passed as "out" parameters do not count as assignments. Values assigned to an output parameter will not be evaluated precisely just because the caller provides a variable qualified as "precise". When the output parameter itself is qualified as "precise", precise evaluation of that output is required within the callee.

We chose not to permit function return values to be qualified as "precise", though we could have hypothetically allowed code such as:

    precise float f(float a, float b, float c)
    {
      return (a+b)+c;
    }

To obtain a precise return value in such a case, use code such as:

    float f(float a, float b, float c)
    {
      precise float result = (a+b) + c;
      return result;
    }

(25) How does texture gather interact with incomplete textures?

RESOLVED: For regular texture lookups, incomplete textures are considered to return a texel value with RGBA components of (0,0,0,1). For texture gather operations, each texel in the sampled footprint is considered to have RGBA components of (0,0,0,1). When using the textureGather() function to select the R, G, or B component of an incomplete texture, (0,0,0,0) will be returned. When selecting the A component, (1,1,1,1) will be returned.

Revision History

    Rev.    Date    Author     Changes
    ----  --------  ---------  -----------------------------------------
     16   03/30/12  pbrown     Fix typo in language restricting the use of
                               EmitStreamVertex()/EndStreamPrimitive() to
                               programs with an output primitive type of
                               points, not an input type of points (bug
                               8371).

     15   10/17/11  pbrown     Fix prototypes for textureGather and
                               textureGatherOffset to use vec2 coordinates
                               for "2DRect" sampler versions (bug 7964).

     14   01/27/11  pbrown     Add further clarification on the interaction
                               of texture gather and incomplete textures
                               (bug 7289).

     13   09/24/10  pbrown     Clarify the interaction of texture gather
                               with swizzle (bug 5910), fixing conflicts
                               between API and GLSL spec language.
                               Consolidate into one copy in the API spec.

     12   03/23/10  pbrown     Update issues section, both fixing/numbering
                               existing issues and including other issues
                               that were left behind in NV_gpu_shader5 when
                               the specs were refactored.

     11   03/23/10  Jon Leech  Describe <offset> to interpolateAtOffset
                               without implying it is a constant expression
                               (Bug 6026).

     10   03/07/10  pbrown     Fix typo in an output stream qualifier
                               example.

      9   03/05/10  pbrown     Modify function overloading rules to remove
                               most preferences when converting between two
                               different types. The only preferences that
                               remain are promoting "float" to "double"
                               over other conversions, and preferring
                               conversion of integers to "float" to
                               converting to "double" (bug 5938).

      8   01/29/10  pbrown     Update the spec to require that the minimum
                               value for
                               MAX_PROGRAM_TEXTURE_GATHER_COMPONENTS is 4
                               (bug 5919).

      7   01/21/10  pbrown     Clarify the rules for determining a best
                               match if implicit conversions can result in
                               multiple matching function prototypes.
                               Modify the rules to pick a best match by
                               comparing pairs of functions, and using any
                               function deemed better than any other
                               choice. Modify the argument conversion
                               preference rules for overloading to disfavor
                               "int" to "uint" conversions, for backward
                               compatibility with previous GLSL versions.
                               Add some new discussion of the choices
                               involved to the issues section (bug 5938).

      6   01/14/10  pbrown     Minor wording updates from spec reviews.

      5   12/10/09  pbrown     Functionality updates from spec review:
                               Rename fmad to fma. Fix error in spec
                               language for negative diffs in usubBorrow.

      4   12/10/09  pbrown     Convert from EXT to ARB.

      3   12/08/09  pbrown     Miscellaneous fixes from spec review: Added
                               missing implementation constants for
                               interpolation offset range and granularity;
                               added explicit section to OpenGL spec
                               describing shader requested interpolation
                               modifiers and functions. Clean up more
                               dangling "ThreadID" references. General typo
                               fixes and language clarifications.

      2   10/01/09  pbrown     Renamed gl_ThreadID to gl_InvocationID.

      1             pbrown     Internal revisions.