Name

    AMD_gpu_shader_half_float

Name Strings

    GL_AMD_gpu_shader_half_float

Contact

    Qun Lin, AMD (quentin.lin 'at' amd.com)

Contributors

    Qun Lin, AMD
    Daniel Rakos, AMD
    Donglin Wei, AMD
    Graham Sellers, AMD
    Rex Xu, AMD
    Dominik Witczak, AMD

Status

    Shipping.

Version

    Last Modified Date:         09/21/2016
    Author Revision:            5

Number

    OpenGL Extension #496

Dependencies

    This extension is written against the OpenGL 4.5 (Core Profile)
    Specification.

    This extension is written against version 4.50 of the OpenGL Shading
    Language Specification.

    OpenGL 4.0 and GLSL 4.00 are required.

    This extension interacts with ARB_gpu_shader_int64.

    This extension interacts with AMD_shader_trinary_minmax.

    This extension interacts with AMD_shader_explicit_vertex_parameter.

Overview

    This extension was developed based on the NV_gpu_shader5 extension to
    allow implementations that support half float in shaders to expose the
    feature without the additional requirements that are present in
    NV_gpu_shader5.

    The extension introduces the following features for all shader types:

    * support for half float scalar, vector and matrix data types in
      shaders;

    * new built-in functions to pack and unpack half float types into and
      from a 32-bit unsigned integer;

    * half float support for all existing single float built-in functions,
      including angle functions, exponential functions, common functions,
      geometric functions, matrix functions, etc.

    This extension is designed to be a functional superset of the
    half-precision floating-point support from NV_gpu_shader5 and to keep
    source code compatible with it, thus the new procedures, functions, and
    tokens are identical to those found in that extension.

New Procedures and Functions

    None.

New Tokens

    Returned by the type parameter of GetActiveAttrib, GetActiveUniform, and
    GetTransformFeedbackVarying:

    (The tokens are identical to those defined in NV_gpu_shader5.)

        FLOAT16_NV                  0x8FF8
        FLOAT16_VEC2_NV             0x8FF9
        FLOAT16_VEC3_NV             0x8FFA
        FLOAT16_VEC4_NV             0x8FFB

    (New tokens)

        FLOAT16_MAT2_AMD            0x91C5
        FLOAT16_MAT3_AMD            0x91C6
        FLOAT16_MAT4_AMD            0x91C7
        FLOAT16_MAT2x3_AMD          0x91C8
        FLOAT16_MAT2x4_AMD          0x91C9
        FLOAT16_MAT3x2_AMD          0x91CA
        FLOAT16_MAT3x4_AMD          0x91CB
        FLOAT16_MAT4x2_AMD          0x91CC
        FLOAT16_MAT4x3_AMD          0x91CD

Additions to Chapter 7 of the OpenGL 4.5 (Core Profile) Specification
(Program Objects)

    Modify Section 7.3.1, Program Interfaces

    (add to Table 7.3, OpenGL Shading Language type tokens, p. 108)

    +--------------------+-----------+--------+-----+--------+
    | Type Name Token    | Keyword   | Attrib | Xfb | Buffer |
    +--------------------+-----------+--------+-----+--------+
    | FLOAT16_NV         | float16_t |   *    |  *  |   *    |
    | FLOAT16_VEC2_NV    | f16vec2   |   *    |  *  |   *    |
    | FLOAT16_VEC3_NV    | f16vec3   |   *    |  *  |   *    |
    | FLOAT16_VEC4_NV    | f16vec4   |   *    |  *  |   *    |
    | FLOAT16_MAT2_AMD   | f16mat2   |   *    |  *  |   *    |
    | FLOAT16_MAT3_AMD   | f16mat3   |   *    |  *  |   *    |
    | FLOAT16_MAT4_AMD   | f16mat4   |   *    |  *  |   *    |
    | FLOAT16_MAT2x3_AMD | f16mat2x3 |   *    |  *  |   *    |
    | FLOAT16_MAT2x4_AMD | f16mat2x4 |   *    |  *  |   *    |
    | FLOAT16_MAT3x2_AMD | f16mat3x2 |   *    |  *  |   *    |
    | FLOAT16_MAT3x4_AMD | f16mat3x4 |   *    |  *  |   *    |
    | FLOAT16_MAT4x2_AMD | f16mat4x2 |   *    |  *  |   *    |
    | FLOAT16_MAT4x3_AMD | f16mat4x3 |   *    |  *  |   *    |
    +--------------------+-----------+--------+-----+--------+

    Modify Section 7.6.1, Loading Uniform Variables

    (modify the last paragraph on p. 132)

    The Uniform*f{v} commands will load count sets of one to four
    floating-point values into a uniform defined as a float, a half float, a
    floating-point vector, a half-precision floating-point vector or an array
    of either of these types. Floating-point values are converted to half
    float by the GL for uniforms defined as a half float, a half float vector
    or an array of those.

    Modify Section 7.6.2.1, Uniform Buffer Object Storage

    (modify the first two bullets of the first paragraph on p.
    136)

    * Members of type bool, int, uint, float, float16_t and double are
      respectively extracted from a buffer object by reading a single uint,
      int, uint, float, half float or double value at the specified offset.

    * Vectors with N elements with basic data types of bool, int, uint,
      float, float16_t or double are extracted as N values in consecutive
      memory locations beginning at the specified offset, with components
      stored in order with the first (X) component at the lowest offset. The
      GL data type used for component extraction is derived according to the
      rules for scalar members above.

Additions to Chapter 11 of the OpenGL 4.5 (Core Profile) Specification
(Programmable Vertex Processing)

    Modify Section 11.1.1, Vertex Attributes

    (modify Table 11.2, Generic attributes and vector types used by column
    vectors of matrix variables bound to generic attribute index i. p. 366)

    +----------------------------+------------------------+-------------------------+
    | Data type                  | Column vector type     | Generic attributes used |
    +----------------------------+------------------------+-------------------------+
    | mat2, dmat2, f16mat2       | two-component vector   | i, i + 1                |
    | mat2x3, dmat2x3, f16mat2x3 | three-component vector | i, i + 1                |
    | mat2x4, dmat2x4, f16mat2x4 | four-component vector  | i, i + 1                |
    | mat3x2, dmat3x2, f16mat3x2 | two-component vector   | i, i + 1, i + 2         |
    | mat3, dmat3, f16mat3       | three-component vector | i, i + 1, i + 2         |
    | mat3x4, dmat3x4, f16mat3x4 | four-component vector  | i, i + 1, i + 2         |
    | mat4x2, dmat4x2, f16mat4x2 | two-component vector   | i, i + 1, i + 2, i + 3  |
    | mat4x3, dmat4x3, f16mat4x3 | three-component vector | i, i + 1, i + 2, i + 3  |
    | mat4, dmat4, f16mat4       | four-component vector  | i, i + 1, i + 2, i + 3  |
    +----------------------------+------------------------+-------------------------+

    (modify Table 11.3: Scalar and vector vertex attribute types and
    VertexAttrib* commands used to set the values of the corresponding
    generic attributes. p.
    366)

    +------------------+----------------+
    | Data type        | Command        |
    +------------------+----------------+
    | float, float16_t | VertexAttrib1* |
    | vec2, f16vec2    | VertexAttrib2* |
    | vec3, f16vec3    | VertexAttrib3* |
    | vec4, f16vec4    | VertexAttrib4* |
    +------------------+----------------+

    Modify Section 11.1.2.1, Output Variables

    (modify the last paragraph on p. 374)

    ..., each component of outputs declared as half-precision floating-point
    scalars, vectors, or matrices is considered to consume two basic machine
    units, and each component of any other type ...

Modifications to the OpenGL Shading Language Specification, Version 4.50

    Including the following line in a shader can be used to control the
    language features described in this extension:

        #extension GL_AMD_gpu_shader_half_float : behavior

    where behavior is as specified in section 3.3.

    New preprocessor #defines are added to the OpenGL Shading Language:

        #define GL_AMD_gpu_shader_half_float 1

Additions to Chapter 3 of the OpenGL Shading Language Specification (Basics)

    Modify Section 3.6, Keywords

    (add the following to the list of reserved keywords at p. 18)

        float16_t    f16vec2      f16vec3      f16vec4
        f16mat2      f16mat3      f16mat4
        f16mat2x2    f16mat2x3    f16mat2x4
        f16mat3x2    f16mat3x3    f16mat3x4
        f16mat4x2    f16mat4x3    f16mat4x4

Additions to Chapter 4 of the OpenGL Shading Language Specification
(Variables and Types)

    Modify Section 4.1, Basic Types

    (add to the basic "Transparent Types" table, p.
    23)

    +-----------+------------------------------------------------------------+
    | Type      | Meaning                                                    |
    +-----------+------------------------------------------------------------+
    | float16_t | a half-precision floating-point scalar                     |
    | f16vec2   | a two-component half-precision floating-point vector       |
    | f16vec3   | a three-component half-precision floating-point vector     |
    | f16vec4   | a four-component half-precision floating-point vector      |
    | f16mat2   | a 2x2 half-precision floating-point matrix                 |
    | f16mat3   | a 3x3 half-precision floating-point matrix                 |
    | f16mat4   | a 4x4 half-precision floating-point matrix                 |
    | f16mat2x2 | same as a f16mat2                                          |
    | f16mat2x3 | a half-precision floating-point matrix with 2 columns and  |
    |           | 3 rows                                                     |
    | f16mat2x4 | a half-precision floating-point matrix with 2 columns and  |
    |           | 4 rows                                                     |
    | f16mat3x2 | a half-precision floating-point matrix with 3 columns and  |
    |           | 2 rows                                                     |
    | f16mat3x3 | same as a f16mat3                                          |
    | f16mat3x4 | a half-precision floating-point matrix with 3 columns and  |
    |           | 4 rows                                                     |
    | f16mat4x2 | a half-precision floating-point matrix with 4 columns and  |
    |           | 2 rows                                                     |
    | f16mat4x3 | a half-precision floating-point matrix with 4 columns and  |
    |           | 3 rows                                                     |
    | f16mat4x4 | same as a f16mat4                                          |
    +-----------+------------------------------------------------------------+

    Modify Section 4.1.4, Floating-Point Variables

    (replace first paragraph of the section, p. 29)

    Single-precision, double-precision and half-precision floating point
    variables are available for use in a variety of scalar calculations.
    Generally, the term floating-point will refer to all single-, double- and
    half-precision floating point.
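
    To make the term half-precision concrete at the bit level, the following
    C sketch decodes a 16-bit pattern into a float on the CPU. It is an
    illustration only and not part of this extension: the helper name
    half_bits_to_float is hypothetical, and the layout (1 sign bit, 5 exponent
    bits with bias 15, 10 mantissa bits) is assumed to be the IEEE 754
    binary16 format.

```c
#include <math.h>
#include <stdint.h>

/* Hypothetical helper: decode an IEEE 754 binary16 bit pattern into a float.
   Assumed layout: 1 sign bit, 5 exponent bits (bias 15), 10 mantissa bits. */
static float half_bits_to_float(uint16_t h)
{
    int sign     = (h >> 15) & 0x1;
    int exponent = (h >> 10) & 0x1F;
    int mantissa = h & 0x3FF;
    float value;

    if (exponent == 0) {
        /* Zero or subnormal: mantissa * 2^-24 (no implicit leading one). */
        value = ldexpf((float)mantissa, -24);
    } else if (exponent == 31) {
        /* All exponent bits set: infinity if mantissa is zero, else NaN. */
        value = mantissa ? NAN : INFINITY;
    } else {
        /* Normal: (1024 + mantissa) * 2^(exponent - 15 - 10). */
        value = ldexpf((float)(mantissa + 1024), exponent - 25);
    }
    return sign ? -value : value;
}
```

    Under these assumptions, half_bits_to_float(0x4200) yields 3.0 and
    half_bits_to_float(0x7BFF) yields 65504.0, the largest finite
    half-precision value.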
    Floating-point variables are defined as in the following examples:

        float a, b = 1.5;           // single-precision floating-point
        double c, d = 2.0LF;        // double-precision floating-point
        float16_t e, f = 3.0HF;     // half-precision floating-point

    As an input value to one of the processing units, a single-precision,
    double-precision or half-precision floating-point variable is expected to
    match the corresponding IEEE 754 floating-point definition in terms of
    precision and dynamic range.

    (modify grammar rule for "floating-suffix", p. 30)

    floating-suffix: one of

        f F lf LF hf HF

    (modify the fourth sentence of second paragraph on p. 30)

    When the suffix "lf" or "LF" is present, the literal has type double.
    When the suffix "hf" or "HF" is present, the literal has type float16_t.
    Otherwise, the literal has type float.

    Modify Section 4.1.6, Matrices

    (modify the second sentence in the section, p. 30)

    Matrix types beginning with "mat" have single-precision components,
    matrix types beginning with "dmat" have double-precision components and
    matrix types beginning with "f16mat" have half-precision components.

    Modify Section 4.1.10, Implicit Conversions

    (modify the implicit conversion table on p.
    37)

    +-----------------------+--------------------------------+
    | Type of expression    | Can be implicitly converted to |
    +-----------------------+--------------------------------+
    | int, uint, float16_t  | float                          |
    | ivec2, uvec2, f16vec2 | vec2                           |
    | ivec3, uvec3, f16vec3 | vec3                           |
    | ivec4, uvec4, f16vec4 | vec4                           |
    | f16mat2               | mat2                           |
    | f16mat3               | mat3                           |
    | f16mat4               | mat4                           |
    | f16mat2x3             | mat2x3                         |
    | f16mat2x4             | mat2x4                         |
    | f16mat3x2             | mat3x2                         |
    | f16mat3x4             | mat3x4                         |
    | f16mat4x2             | mat4x2                         |
    | f16mat4x3             | mat4x3                         |
    | int, uint,            | double                         |
    | float, float16_t      |                                |
    | ivec2, uvec2,         | dvec2                          |
    | vec2, f16vec2         |                                |
    | ivec3, uvec3,         | dvec3                          |
    | vec3, f16vec3         |                                |
    | ivec4, uvec4,         | dvec4                          |
    | vec4, f16vec4         |                                |
    | mat2, f16mat2         | dmat2                          |
    | mat3, f16mat3         | dmat3                          |
    | mat4, f16mat4         | dmat4                          |
    | mat2x3, f16mat2x3     | dmat2x3                        |
    | mat2x4, f16mat2x4     | dmat2x4                        |
    | mat3x2, f16mat3x2     | dmat3x2                        |
    | mat3x4, f16mat3x4     | dmat3x4                        |
    | mat4x2, f16mat4x2     | dmat4x2                        |
    | mat4x3, f16mat4x3     | dmat4x3                        |
    +-----------------------+--------------------------------+

    Modify Section 4.4.2.1, Transform Feedback Layout Qualifiers

    (insert after the fourth paragraph in the section on p. 70)

    ... will be a multiple of 8; if applied to an aggregate containing a
    float16_t, the offset must also be a multiple of 2, and the space taken
    in the buffer will be a multiple of 2.

    Modify Section 4.7.1, Range and Precision

    (insert after the first paragraph in the section on p. 85)

    ... and positive and negative zeros. The precision of stored
    half-precision floating-point variables is described in section 2.3.3.2
    "16-Bit Floating-Point Numbers" of the OpenGL Specification.

    The following rules apply to all floating-point operations, including
    single-, double- and half-precision operations: ...

Additions to Chapter 5 of the OpenGL Shading Language Specification
(Operators and Expressions)

    Modify Section 5.4.1, Conversion and Scalar Constructors

    (add after the first list of constructor examples on p.
    97)

        int(float16_t)      // convert a float16_t value to a signed integer
        uint(float16_t)     // convert a float16_t value to an unsigned integer
        bool(float16_t)     // convert a float16_t value to a Boolean
        float(float16_t)    // convert a float16_t value to a float value
        double(float16_t)   // convert a float16_t value to a double value
        float16_t(bool)     // convert a Boolean to a float16_t value
        float16_t(int)      // convert a signed integer to a float16_t value
        float16_t(uint)     // convert an unsigned integer to a float16_t value
        float16_t(float)    // convert a float value to a float16_t value
        float16_t(double)   // convert a double value to a float16_t value

    (modify the first sentence of last paragraph on p. 98)

    ... other arguments. If the basic type (bool, int, float, double, or
    float16_t) of a parameter to a constructor does not match the basic type
    of the object being constructed, the scalar construction rules (above)
    are used to convert the parameters.

Additions to Chapter 6 of the OpenGL Shading Language Specification
(Statements and Structure)

    Modify Section 6.1, Function Definitions

    (replace the second rule in third paragraph on p. 113)

    2. A match involving a conversion from a signed integer, unsigned
       integer, or floating-point type to a similar type having a larger
       number of bits is better than a match involving any other implicit
       conversion.

Additions to Chapter 8 of the OpenGL Shading Language Specification
(Built-in Functions)

    (insert after the sixth sentence of last paragraph on p. 140)

    ... genDType is used as the argument. Where the input arguments (and
    corresponding output) can be float16_t, f16vec2, f16vec3, f16vec4,
    genF16Type is used as the argument.

    Modify Section 8.1, Angle and Trigonometry Functions

    (add to the table of Angle and Trigonometry Functions on p.
    141)

    +------------------------------------------------+----------------------------------------------------+
    | Syntax                                         | Description                                        |
    +------------------------------------------------+----------------------------------------------------+
    | genF16Type radians (genF16Type degrees)        | Converts degrees to radians, i.e., PI/180 *        |
    |                                                | degrees.                                           |
    +------------------------------------------------+----------------------------------------------------+
    | genF16Type degrees (genF16Type radians)        | Converts radians to degrees, i.e., 180/PI *        |
    |                                                | radians.                                           |
    +------------------------------------------------+----------------------------------------------------+
    | genF16Type sin (genF16Type angle)              | The standard trigonometric sine function.          |
    +------------------------------------------------+----------------------------------------------------+
    | genF16Type cos (genF16Type angle)              | The standard trigonometric cosine function.        |
    +------------------------------------------------+----------------------------------------------------+
    | genF16Type tan (genF16Type angle)              | The standard trigonometric tangent.                |
    +------------------------------------------------+----------------------------------------------------+
    | genF16Type asin (genF16Type x)                 | Arc sine. Returns an angle whose sine is x. The    |
    |                                                | range of values returned by this function is       |
    |                                                | [-PI/2, PI/2]. Results are undefined if |x| > 1.   |
    +------------------------------------------------+----------------------------------------------------+
    | genF16Type acos (genF16Type x)                 | Arc cosine. Returns an angle whose cosine is x.    |
    |                                                | The range of values returned by this function is   |
    |                                                | [0, PI]. Results are undefined if |x| > 1.         |
    +------------------------------------------------+----------------------------------------------------+
    | genF16Type atan (genF16Type y, genF16Type x)   | Arc tangent. Returns an angle whose tangent is     |
    |                                                | y/x. The signs of x and y are used to determine    |
    |                                                | what quadrant the angle is in. The range of        |
    |                                                | values returned by this function is [-PI, PI].     |
    |                                                | Results are undefined if x and y are both 0.       |
    +------------------------------------------------+----------------------------------------------------+
    | genF16Type atan (genF16Type y_over_x)          | Arc tangent. Returns an angle whose tangent is     |
    |                                                | y_over_x. The range of values returned by this     |
    |                                                | function is [-PI/2, PI/2].                         |
    +------------------------------------------------+----------------------------------------------------+
    | genF16Type sinh (genF16Type x)                 | Returns the hyperbolic sine function               |
    |                                                | (e^x - e^-x) / 2.                                  |
    +------------------------------------------------+----------------------------------------------------+
    | genF16Type cosh (genF16Type x)                 | Returns the hyperbolic cosine function             |
    |                                                | (e^x + e^-x) / 2.                                  |
    +------------------------------------------------+----------------------------------------------------+
    | genF16Type tanh (genF16Type x)                 | Returns the hyperbolic tangent function            |
    |                                                | sinh(x) / cosh(x).                                 |
    +------------------------------------------------+----------------------------------------------------+
    | genF16Type asinh (genF16Type x)                | Arc hyperbolic sine; returns the inverse of sinh.  |
    +------------------------------------------------+----------------------------------------------------+
    | genF16Type acosh (genF16Type x)                | Arc hyperbolic cosine; returns the non-negative    |
    |                                                | inverse of cosh. Results are undefined if x < 1.   |
    +------------------------------------------------+----------------------------------------------------+
    | genF16Type atanh (genF16Type x)                | Arc hyperbolic tangent; returns the inverse of     |
    |                                                | tanh. Results are undefined if |x| >= 1.           |
    +------------------------------------------------+----------------------------------------------------+

    Modify Section 8.2, Exponential Functions

    (add to the table of Exponential Functions on p.
    143)

    +------------------------------------------------+----------------------------------------------------+
    | Syntax                                         | Description                                        |
    +------------------------------------------------+----------------------------------------------------+
    | genF16Type pow (genF16Type x, genF16Type y)    | Returns x raised to the y power, i.e., x^y.        |
    |                                                | Results are undefined if x < 0.                    |
    |                                                | Results are undefined if x = 0 and y <= 0.         |
    +------------------------------------------------+----------------------------------------------------+
    | genF16Type exp (genF16Type x)                  | Returns the natural exponentiation of x, i.e.,     |
    |                                                | e^x.                                               |
    +------------------------------------------------+----------------------------------------------------+
    | genF16Type log (genF16Type x)                  | Returns the natural logarithm of x, i.e., returns  |
    |                                                | the value y which satisfies the equation x = e^y.  |
    |                                                | Results are undefined if x <= 0.                   |
    +------------------------------------------------+----------------------------------------------------+
    | genF16Type exp2 (genF16Type x)                 | Returns 2 raised to the x power, i.e., 2^x.        |
    +------------------------------------------------+----------------------------------------------------+
    | genF16Type log2 (genF16Type x)                 | Returns the base 2 logarithm of x, i.e., returns   |
    |                                                | the value y which satisfies the equation x = 2^y.  |
    |                                                | Results are undefined if x <= 0.                   |
    +------------------------------------------------+----------------------------------------------------+
    | genF16Type sqrt (genF16Type x)                 | Returns sqrt(x). Results are undefined if x < 0.   |
    +------------------------------------------------+----------------------------------------------------+
    | genF16Type inversesqrt (genF16Type x)          | Returns 1 / sqrt(x). Results are undefined if      |
    |                                                | x <= 0.                                            |
    +------------------------------------------------+----------------------------------------------------+

    Modify Section 8.3, Common Functions

    (add to the table of common functions on p.
    144)

    +-------------------------------------------------+----------------------------------------------------+
    | Syntax                                          | Description                                        |
    +-------------------------------------------------+----------------------------------------------------+
    | genF16Type abs(genF16Type x)                    | Returns x if x >= 0; otherwise it returns -x.      |
    +-------------------------------------------------+----------------------------------------------------+
    | genF16Type sign(genF16Type x)                   | Returns 1.0 if x > 0, 0.0 if x = 0, or -1.0 if     |
    |                                                 | x < 0.                                             |
    +-------------------------------------------------+----------------------------------------------------+
    | genF16Type floor (genF16Type x)                 | Returns a value equal to the nearest integer       |
    |                                                 | that is less than or equal to x.                   |
    +-------------------------------------------------+----------------------------------------------------+
    | genF16Type trunc (genF16Type x)                 | Returns a value equal to the nearest integer to    |
    |                                                 | x whose absolute value is not larger than the      |
    |                                                 | absolute value of x.                               |
    +-------------------------------------------------+----------------------------------------------------+
    | genF16Type round (genF16Type x)                 | Returns a value equal to the nearest integer to    |
    |                                                 | x. The fraction 0.5 will round in a direction      |
    |                                                 | chosen by the implementation, presumably the       |
    |                                                 | direction that is fastest. This includes the       |
    |                                                 | possibility that round(x) returns the same value   |
    |                                                 | as roundEven(x) for all values of x.               |
    +-------------------------------------------------+----------------------------------------------------+
    | genF16Type roundEven (genF16Type x)             | Returns a value equal to the nearest integer to    |
    |                                                 | x. A fractional part of 0.5 will round toward      |
    |                                                 | the nearest even integer. (Both 3.5 and 4.5 for    |
    |                                                 | x will return 4.0.)                                |
    +-------------------------------------------------+----------------------------------------------------+
    | genF16Type ceil (genF16Type x)                  | Returns a value equal to the nearest integer       |
    |                                                 | that is greater than or equal to x.                |
    +-------------------------------------------------+----------------------------------------------------+
    | genF16Type fract (genF16Type x)                 | Returns x - floor(x).                              |
    +-------------------------------------------------+----------------------------------------------------+
    | genF16Type mod (genF16Type x, float16_t y)      | Modulus. Returns x - y * floor(x/y).               |
    | genF16Type mod (genF16Type x, genF16Type y)     |                                                    |
    +-------------------------------------------------+----------------------------------------------------+
    | genF16Type modf(genF16Type x, out genF16Type i) | Returns the fractional part of x and sets i to     |
    |                                                 | the integer part (as a whole number                |
    |                                                 | floating-point value). Both the return value and   |
    |                                                 | the output parameter will have the same sign as    |
    |                                                 | x.                                                 |
    +-------------------------------------------------+----------------------------------------------------+
    | genF16Type min(genF16Type x, genF16Type y)      | Returns y if y < x; otherwise it returns x.        |
    | genF16Type min(genF16Type x, float16_t y)       |                                                    |
    +-------------------------------------------------+----------------------------------------------------+
    | genF16Type max(genF16Type x, genF16Type y)      | Returns y if x < y; otherwise it returns x.        |
    | genF16Type max(genF16Type x, float16_t y)       |                                                    |
    +-------------------------------------------------+----------------------------------------------------+
    | genF16Type clamp(genF16Type x,                  | Returns min(max(x, minVal), maxVal).               |
    |                  genF16Type minVal,             | Results are undefined if minVal > maxVal.          |
    |                  genF16Type maxVal)             |                                                    |
    | genF16Type clamp(genF16Type x,                  |                                                    |
    |                  float16_t minVal,              |                                                    |
    |                  float16_t maxVal)              |                                                    |
    +-------------------------------------------------+----------------------------------------------------+
    | genF16Type mix(genF16Type x,                    | Returns the linear blend of x and y, i.e.,         |
    |                genF16Type y,                    | x * (1 - a) + y * a.                               |
    |                genF16Type a)                    |                                                    |
    | genF16Type mix(genF16Type x,                    |                                                    |
    |                genF16Type y,                    |                                                    |
    |                float16_t a)                     |                                                    |
    +-------------------------------------------------+----------------------------------------------------+
    | genF16Type mix(genF16Type x,                    | Selects which vector each returned component       |
    |                genF16Type y,                    | comes from. For a component of a that is false,    |
    |                genBType a)                      | the corresponding component of x is returned.      |
    |                                                 | For a component of a that is true, the             |
    |                                                 | corresponding component of y is returned.          |
    +-------------------------------------------------+----------------------------------------------------+
    | genF16Type step (genF16Type edge, genF16Type x) | Returns 0.0 if x < edge; otherwise it returns 1.0. |
    | genF16Type step (float16_t edge, genF16Type x)  |                                                    |
    +-------------------------------------------------+----------------------------------------------------+
    | genF16Type smoothstep (genF16Type edge0,        | Returns 0.0 if x <= edge0 and 1.0 if x >= edge1    |
    |                        genF16Type edge1,        | and performs smooth Hermite interpolation between  |
    |                        genF16Type x)            | 0 and 1 when edge0 < x < edge1. This is useful in  |
    | genF16Type smoothstep (float16_t edge0,         | cases where you would want a threshold function    |
    |                        float16_t edge1,         | with a smooth transition. This is equivalent to:   |
    |                        genF16Type x)            |   genF16Type t;                                    |
    |                                                 |   t = clamp((x - edge0) / (edge1 - edge0), 0, 1);  |
    |                                                 |   return t * t * (3 - 2 * t);                      |
    |                                                 | Results are undefined if edge0 >= edge1.           |
    +-------------------------------------------------+----------------------------------------------------+
    | genBType isnan (genF16Type x)                   | Returns true if x holds a NaN. Returns false       |
    |                                                 | otherwise. Always returns false if NaNs are not    |
    |                                                 | implemented.                                       |
    +-------------------------------------------------+----------------------------------------------------+
    | genBType isinf (genF16Type x)                   | Returns true if x holds a positive infinity or     |
    |                                                 | negative infinity. Returns false otherwise.        |
    +-------------------------------------------------+----------------------------------------------------+
    | genF16Type fma (genF16Type a, genF16Type b,     | Computes and returns a * b + c.                    |
    |                 genF16Type c)                   |                                                    |
    +-------------------------------------------------+----------------------------------------------------+
    | genF16Type frexp (genF16Type x,                 | Splits x into a floating-point significand in      |
    |                   out genIType exp)             | the range [0.5, 1.0) and an integral exponent of   |
    |                                                 | two, such that:                                    |
    |                                                 |   x = significand * 2^exp                          |
    |                                                 | The significand is returned by the function and    |
    |                                                 | the exponent is returned in the parameter exp.     |
    |                                                 | For a floating-point value of zero, the            |
    |                                                 | significand and exponent are both zero. For a      |
    |                                                 | floating-point value that is an infinity or is     |
    |                                                 | not a number, the results are undefined.           |
    +-------------------------------------------------+----------------------------------------------------+
    | genF16Type ldexp (genF16Type x,                 | Builds a floating-point number from x and the      |
    |                   in genIType exp)              | corresponding integral exponent of two in exp,     |
    |                                                 | returning:                                         |
    |                                                 |   x * 2^exp                                        |
    |                                                 | If this product is too large to be represented     |
    |                                                 | in the floating-point type, the result is          |
    |                                                 | undefined.                                         |
    +-------------------------------------------------+----------------------------------------------------+

    Modify Section 8.4, Floating-Point Pack and Unpack Functions

    (add to the table of pack and unpack functions on p.
    149)

    +-----------------------------------+------------------------------------------------------+
    | Syntax                            | Description                                          |
    +-----------------------------------+------------------------------------------------------+
    | uint packFloat2x16(f16vec2 v)     | Returns an unsigned 32-bit integer obtained by       |
    |                                   | packing the components of a two-component            |
    |                                   | half-precision floating-point vector. The first      |
    |                                   | vector component specifies the 16 least              |
    |                                   | significant bits; the second component specifies     |
    |                                   | the 16 most significant bits.                        |
    +-----------------------------------+------------------------------------------------------+
    | f16vec2 unpackFloat2x16(uint v)   | Returns a two-component half-precision               |
    |                                   | floating-point vector built from a 32-bit unsigned   |
    |                                   | integer scalar. The first component of the vector    |
    |                                   | contains the 16 least significant bits of the        |
    |                                   | input; the second component contains the 16 most     |
    |                                   | significant bits.                                    |
    +-----------------------------------+------------------------------------------------------+

    Modify Section 8.5, Geometric Functions

    (add to table of geometric functions on p. 152)

    +--------------------------------------------+-----------------------------------------------+
    | Syntax                                     | Description                                   |
    +--------------------------------------------+-----------------------------------------------+
    | float16_t length (genF16Type x)            | Returns the length of vector x, i.e.,         |
    |                                            | sqrt(x[0]*x[0] + x[1]*x[1] + ...)             |
    +--------------------------------------------+-----------------------------------------------+
    | float16_t distance (genF16Type p0,         | Returns the distance between p0 and p1, i.e., |
    |                     genF16Type p1)         | length (p0 - p1)                              |
    +--------------------------------------------+-----------------------------------------------+
    | float16_t dot (genF16Type x, genF16Type y) | Returns the dot product of x and y, i.e.,     |
    |                                            | x[0]*y[0] + x[1]*y[1] + ...                   |
    +--------------------------------------------+-----------------------------------------------+
    | f16vec3 cross (f16vec3 x, f16vec3 y)       | Returns the cross product of x and y, i.e.,   |
    |                                            | |x[1] * y[2] - y[1] * x[2]|                   |
    |                                            | |x[2] * y[0] - y[2] * x[0]|                   |
    |                                            | |x[0] * y[1] - y[0] * x[1]|                   |
    +--------------------------------------------+-----------------------------------------------+
    | genF16Type normalize (genF16Type x)        | Returns a vector in the same direction as x   |
    |                                            | but with a length of 1.                       |
    +--------------------------------------------+-----------------------------------------------+
    | genF16Type faceforward (genF16Type N,      | If dot(Nref, I) < 0 return N, otherwise       |
    |                         genF16Type I,      | return -N.                                    |
    |                         genF16Type Nref)   |                                               |
    +--------------------------------------------+-----------------------------------------------+
    | genF16Type reflect (genF16Type I,          | For the incident vector I and surface         |
    |                     genF16Type N)          | orientation N, returns the reflection         |
    |                                            | direction:                                    |
    |                                            |   I - 2 * dot(N, I) * N                       |
    |                                            | N must already be normalized in order to      |
    |                                            | achieve the desired result.                   |
    +--------------------------------------------+-----------------------------------------------+
    | genF16Type refract (genF16Type I,          | For the incident vector I and surface normal  |
    |                     genF16Type N,          | N, and the ratio of indices of refraction     |
    |                     float16_t eta)         | eta, return the refraction vector. The result |
    |                                            | is computed by                                |
    |                                            |   k = 1.0 - eta * eta *                       |
    |                                            |       (1.0 - dot(N, I) * dot(N, I))           |
    |                                            |   if (k < 0.0)                                |
    |                                            |       return genF16Type(0.0)                  |
    |                                            |   else                                        |
    |                                            |       return eta * I - (eta * dot(N, I)       |
    |                                            |                        + sqrt(k)) * N         |
    |                                            | The input parameters for the incident vector  |
    |                                            | I and the surface normal N must already be    |
    |                                            | normalized to get the desired results.        |
    +--------------------------------------------+-----------------------------------------------+

    Modify Section 8.6, Matrix Functions

    (modify the first paragraph of the section on p.
    154)

    ..., there is a single-precision floating-point version, where all
    arguments and return values are single precision, a double-precision
    floating-point version, where all arguments and return values are double
    precision, and a half-precision floating-point version, where all
    arguments and return values are half precision.

    Modify Section 8.7, Vector Relational Functions

    (add to the table of placeholders at the top of p. 156)

    +-------------+-----------------------------+
    | Placeholder | Specific Types Allowed      |
    +-------------+-----------------------------+
    | f16vec      | f16vec2, f16vec3, f16vec4   |
    +-------------+-----------------------------+

    (add to the table of vector relational functions at the bottom of p. 156)

    +-------------------------------------------+-----------------------------------------------+
    | Syntax                                    | Description                                   |
    +-------------------------------------------+-----------------------------------------------+
    | bvec lessThan(f16vec x, f16vec y)         | Returns the component-wise compare of x < y.  |
    +-------------------------------------------+-----------------------------------------------+
    | bvec lessThanEqual(f16vec x, f16vec y)    | Returns the component-wise compare of x <= y. |
    +-------------------------------------------+-----------------------------------------------+
    | bvec greaterThan(f16vec x, f16vec y)      | Returns the component-wise compare of x > y.  |
    +-------------------------------------------+-----------------------------------------------+
    | bvec greaterThanEqual(f16vec x, f16vec y) | Returns the component-wise compare of x >= y. |
    +-------------------------------------------+-----------------------------------------------+
    | bvec equal(f16vec x, f16vec y)            | Returns the component-wise compare of x == y. |
    +-------------------------------------------+-----------------------------------------------+
    | bvec notEqual(f16vec x, f16vec y)         | Returns the component-wise compare of x != y. |
    +-------------------------------------------+-----------------------------------------------+

    Modify Section 8.13.1, Derivative Functions

    (add to table of derivative functions on p. 181)

    +-------------------------------------------+-----------------------------------------------+
    | Syntax                                    | Description                                   |
    +-------------------------------------------+-----------------------------------------------+
    | genF16Type dFdx (genF16Type p)            | Returns either dFdxFine(p) or dFdxCoarse(p),  |
    |                                           | based on implementation choice, presumably    |
    |                                           | whichever is the faster, or by whichever is   |
    |                                           | selected in the API through                   |
    |                                           | quality-versus-speed hints.                   |
    +-------------------------------------------+-----------------------------------------------+
    | genF16Type dFdy (genF16Type p)            | Returns either dFdyFine(p) or dFdyCoarse(p),  |
    |                                           | based on implementation choice, presumably    |
    |                                           | whichever is the faster, or by whichever is   |
    |                                           | selected in the API through                   |
    |                                           | quality-versus-speed hints.                   |
    +-------------------------------------------+-----------------------------------------------+
    | genF16Type dFdxFine (genF16Type p)        | Returns the partial derivative of p with      |
    |                                           | respect to the window x coordinate. Will use  |
    |                                           | local differencing based on the value of p    |
    |                                           | for the current fragment and its immediate    |
    |                                           | neighbor(s).                                  |
    +-------------------------------------------+-----------------------------------------------+
    | genF16Type dFdyFine (genF16Type p)        | Returns the partial derivative of p with      |
    |                                           | respect to the window y coordinate. Will use  |
    |                                           | local differencing based on the value of p    |
    |                                           | for the current fragment and its immediate    |
    |                                           | neighbor(s).                                  |
    +-------------------------------------------+-----------------------------------------------+
    | genF16Type dFdxCoarse (genF16Type p)      | Returns the partial derivative of p with      |
    |                                           | respect to the window x coordinate. Will use  |
    |                                           | local differencing based on the value of p    |
    |                                           | for the current fragment's neighbors, and     |
    |                                           | will possibly, but not necessarily, include   |
    |                                           | the value of p for the current fragment. That |
    |                                           | is, over a given area, the implementation can |
    |                                           | compute x derivatives in fewer unique         |
    |                                           | locations than would be allowed for           |
    |                                           | dFdxFine(p).                                  |
    +-------------------------------------------+-----------------------------------------------+
    | genF16Type dFdyCoarse (genF16Type p)      | Returns the partial derivative of p with      |
    |                                           | respect to the window y coordinate. Will use  |
    |                                           | local differencing based on the value of p    |
    |                                           | for the current fragment's neighbors, and     |
    |                                           | will possibly, but not necessarily, include   |
    |                                           | the value of p for the current fragment. That |
    |                                           | is, over a given area, the implementation can |
    |                                           | compute y derivatives in fewer unique         |
    |                                           | locations than would be allowed for           |
    |                                           | dFdyFine(p).                                  |
    +-------------------------------------------+-----------------------------------------------+
    | genF16Type fwidth (genF16Type p)          | Returns abs(dFdx(p)) + abs(dFdy(p)).          |
    +-------------------------------------------+-----------------------------------------------+
    | genF16Type fwidthFine (genF16Type p)      | Returns abs(dFdxFine(p)) + abs(dFdyFine(p)).  |
    +-------------------------------------------+-----------------------------------------------+
    | genF16Type fwidthCoarse (genF16Type p)    | Returns abs(dFdxCoarse(p)) +                  |
    |                                           | abs(dFdyCoarse(p)).                           |
    +-------------------------------------------+-----------------------------------------------+

    Modify Section 8.13.2, Interpolation Functions

    (add to table of interpolation functions on p.
180) +-------------------------------------------+-----------------------------------------------+ | Syntax | Description | +-------------------------------------------+-----------------------------------------------+ | genF16Type interpolateAtCentroid ( | Returns the value of the input interpolant | | genF16Type interpolant) | sampled at a location inside both the pixel | | | and the primitive being processed. The value | | | obtained would be the same value assigned to | | | the input variable if declared with the | | | centroid qualifier | +-------------------------------------------+-----------------------------------------------+ | genF16Type interpolateAtSample ( | Returns the value of the input interpolant | | genF16Type interpolant, | variable at the location of sample number | | int sample) | sample. If multisample buffers are not | | | available, the input variable will be | | | evaluated at the center of the pixel. If | | | sample sample does not exist, the position | | | used to interpolate the input variable is | | | undefined. | +-------------------------------------------+-----------------------------------------------+ | genF16Type interpolateAtOffset ( | Returns the value of the input interpolant | | genF16Type interpolant, | variable sampled at an offset from the center | | f16vec2 offset) | of the pixel specified by offset. The two | | | floating-point components of offset, give the | | | offset in pixels in the x and y directions, | | | respectively. An offset of (0, 0) identifies | | | the center of the pixel. The range and | | | granularity of offsets supported by this | | | function isimplementation-dependent. | +-------------------------------------------+-----------------------------------------------+ Modify Section 9, Shading Language Grammar for Core Profile (add to the list of tokens on p. 187) ... 
        FLOAT16 F16VEC2 F16VEC3 F16VEC4
        F16MAT2 F16MAT3 F16MAT4
        F16MAT2X2 F16MAT2X3 F16MAT2X4
        F16MAT3X2 F16MAT3X3 F16MAT3X4
        F16MAT4X2 F16MAT4X3 F16MAT4X4
        ...
        FLOAT16CONSTANT

    (add to the rule of "primary_expression" on p. 188)

        primary_expression:
            ...
            FLOAT16CONSTANT
            ...

    (add to the rule of "type_specifier_nonarray" on p. 195)

        type_specifier_nonarray:
            ...
            FLOAT16
            F16VEC2
            F16VEC3
            F16VEC4
            F16MAT2
            F16MAT3
            F16MAT4
            F16MAT2X2
            F16MAT2X3
            F16MAT2X4
            F16MAT3X2
            F16MAT3X3
            F16MAT3X4
            F16MAT4X2
            F16MAT4X3
            F16MAT4X4
            ...

Dependencies on ARB_gpu_shader_int64

    If the shader enables ARB_gpu_shader_int64, this extension allows
    additional explicit conversions between half-precision floating-point
    types and 64-bit integer types.

    Modify Section 5.4.1, Conversion and Scalar Constructors

    (add after the first list of constructor examples on p. 95)

        int64_t(float16_t)    // convert a float16_t value to a signed 64-bit integer
        uint64_t(float16_t)   // convert a float16_t value to an unsigned 64-bit integer
        float16_t(int64_t)    // convert a signed 64-bit integer to a float16_t value
        float16_t(uint64_t)   // convert an unsigned 64-bit integer to a float16_t value

Dependencies on AMD_shader_trinary_minmax

    If the shader enables AMD_shader_trinary_minmax, this extension adds
    additional common functions.

    Modify Section 8.3, Common Functions

    (add to the table of common functions on p. 144)

    +-------------------------------------------+-----------------------------------------------+
    | Syntax                                    | Description                                   |
    +-------------------------------------------+-----------------------------------------------+
    | genF16Type min3(genF16Type x,             | Returns the per-component minimum value of x, |
    |                 genF16Type y,             | y, and z.                                     |
    |                 genF16Type z)             |                                               |
    +-------------------------------------------+-----------------------------------------------+
    | genF16Type max3(genF16Type x,             | Returns the per-component maximum value of x, |
    |                 genF16Type y,             | y, and z.                                     |
    |                 genF16Type z)             |                                               |
    +-------------------------------------------+-----------------------------------------------+
    | genF16Type mid3(genF16Type x,             | Returns the per-component median value of x,  |
    |                 genF16Type y,             | y, and z.                                     |
    |                 genF16Type z)             |                                               |
    +-------------------------------------------+-----------------------------------------------+

Dependencies on AMD_shader_explicit_vertex_parameter

    If the shader enables AMD_shader_explicit_vertex_parameter, this
    extension adds additional interpolation functions.

    Modify Section 8.13.2, Interpolation Functions

    (add to table of interpolation functions on p. 180)

    +-------------------------------------------+-----------------------------------------------+
    | Syntax                                    | Description                                   |
    +-------------------------------------------+-----------------------------------------------+
    | genF16Type interpolateAtVertexAMD (       | Returns the value of the input <interpolant>  |
    |                 genF16Type interpolant,   | without any interpolation, i.e. the raw       |
    |                 uint vertexIdx)           | output value of the previous shader stage.    |
    |                                           | <vertexIdx> selects for which vertex of the   |
    |                                           | primitive the value of <interpolant> is       |
    |                                           | returned.                                     |
    |                                           |                                               |
    |                                           | This return value is equivalent to            |
    |                                           | interpolating the input <interpolant> using   |
    |                                           | the following set of barycentric coordinates, |
    |                                           | depending on the value of <vertexIdx>:        |
    |                                           |                                               |
    |                                           |   vertexIdx   Barycentric coordinates         |
    |                                           |       0       I=0, J=0, K=1                   |
    |                                           |       1       I=1, J=0, K=0                   |
    |                                           |       2       I=0, J=1, K=0                   |
    |                                           |                                               |
    |                                           | However this order has no association with    |
    |                                           | the vertex order specified by the application |
    |                                           | in the originating draw.                      |
    |                                           |                                               |
    |                                           | The value of <vertexIdx> must be a constant   |
    |                                           | integer expression with a value in the range  |
    |                                           | [0, 2].                                       |
    +-------------------------------------------+-----------------------------------------------+

Errors

    None.

New State

    None.

New Implementation Dependent State

    None.

Issues

    (1) How is the functionality in this extension different from the
        half-precision floating-point types introduced by NV_gpu_shader5?

      RESOLVED: This extension is designed to be source code compatible
      with the half-precision floating-point support in NV_gpu_shader5.
      However, it is a functional superset of that, as it adds the
      following additional features:

        * support for implicit conversions from int, uint and float to
          float16_t.
        * support for overloaded versions of the functions, such as abs,
          sign, min, max, clamp, etc., that accept float16_t type or
          half-precision floating-point type as parameters.

    (2) What should be done to distinguish half-precision floating-point
        constants?

      RESOLVED: We will use "HF" and "hf" to identify half-precision
      floating-point constants.

    (3) Should we introduce a new uniform API to set up float16_t type
        uniforms in the default uniform block?

      RESOLVED: No. float16_t isn't an IEEE standard format and CPUs don't
      support it directly, so most data on the CPU side is stored as
      single- or double-precision floating-point. The functionality of
      Uniform*f{v} is extended to support uniforms with float16_t type in
      this extension.

    (4) Should we support float16_t types as members of uniform blocks,
        shader storage buffer blocks, or as transform feedback varyings?

      RESOLVED: Yes, support all of them. float16_t types will consume two
      basic machine units.
      Some examples:

        struct S {
            float16_t   x;      // rule 1:  align = 2, takes offsets 0-1
            f16vec2     y;      // rule 2:  align = 4, takes offsets 4-7
            f16vec3     z;      // rule 3:  align = 8, takes offsets 8-13
        };

        layout(column_major, std140) uniform B1 {
            float16_t   a;      // rule 1:  align = 2, takes offsets 0-1
            f16vec2     b;      // rule 2:  align = 4, takes offsets 4-7
            f16vec3     c;      // rule 3:  align = 8, takes offsets 8-13
            float16_t   d[2];   // rule 4:  align = 16, array stride = 16,
                                //          takes offsets 16-47
            f16mat2x3   e;      // rule 5:  align = 16, matrix stride = 16,
                                //          takes offsets 48-79
            f16mat2x3   f[2];   // rule 6:  align = 16, matrix stride = 16,
                                //          array stride = 32, f[0] takes
                                //          offsets 80-111, f[1] takes offsets
                                //          112-143
            S           g;      // rule 9:  align = 16, g.x takes offsets
                                //          144-145, g.y takes offsets 148-151,
                                //          g.z takes offsets 152-157
            S           h[2];   // rule 10: align = 16, array stride = 16, h[0]
                                //          takes offsets 160-175, h[1] takes
                                //          offsets 176-191
        };

        layout(row_major, std430) buffer B2 {
            float16_t   o;      // rule 1:  align = 2, takes offsets 0-1
            f16vec2     p;      // rule 2:  align = 4, takes offsets 4-7
            f16vec3     q;      // rule 3:  align = 8, takes offsets 8-13
            float16_t   r[2];   // rule 4:  align = 2, array stride = 2, takes
                                //          offsets 14-17
            f16mat2x3   s;      // rule 7:  align = 4, matrix stride = 4, takes
                                //          offsets 20-31
            f16mat2x3   t[2];   // rule 8:  align = 4, matrix stride = 4, array
                                //          stride = 12, t[0] takes offsets
                                //          32-43, t[1] takes offsets 44-55
            S           u;      // rule 9:  align = 8, u.x takes offsets
                                //          56-57, u.y takes offsets 60-63, u.z
                                //          takes offsets 64-69
            S           v[2];   // rule 10: align = 8, array stride = 16, v[0]
                                //          takes offsets 72-87, v[1] takes
                                //          offsets 88-103
        };

    (5) In the OpenGL ES Shading Language, the format of floating-point
        values in UBOs and SSBOs is always single-precision floating-point,
        regardless of the precision qualifier in the shader. Which format
        should be used for this extension?

      RESOLVED: The format should match the type declared in the shader,
      i.e. if the block member's type is float16_t, the format in the
      buffer is half-precision floating-point, and if the block member's
      type is float, the format is single-precision floating-point. We
      will provide another extension to remain compatible with the ES
      driver's behavior.

Revision History

    Rev.    Date    Author    Changes
    ----  --------  --------  -----------------------------------------

     5    09/21/16  dwitczak  Fixed minor character encoding issues.

     4    08/01/16  rexu      Correct the example of offset calculation
                              for block members.

                              Add limitation of xfb_offset when this
                              qualifier is applied to block members
                              that have float16_t types.

     3    07/11/16  rexu      Clarify that each component of float16_t
                              types consumes two basic machine units.

                              Remove the interaction with NV_gpu_shader5,
                              in that implicit conversions from int, uint
                              and float types to float16_t types are
                              disallowed now.

                              Add new derivative functions: dFdxFine,
                              dFdyFine, dFdxCoarse, dFdyCoarse,
                              fwidthFine, fwidthCoarse.

                              Add the interaction with
                              AMD_shader_trinary_minmax and
                              AMD_shader_explicit_vertex_parameter.

                              Remove two listed issues that are no longer
                              valid for the updated version of this
                              extension.

                              Remove floatBitsToInt and decide to add it
                              when 16-bit integer data type is supported.

     2    07/06/16  rexu      Remove sections that involve half-precision
                              floating-point opaque types.

                              Modify allowed rules of implicit conversion
                              relevant to float16_t types.

                              Add the interaction with
                              ARB_gpu_shader_int64.

                              Remove the modification of the first rule
                              of std140 layout.

                              Provide some examples to demonstrate memory
                              storage layout of uniform blocks and shader
                              storage blocks when they have members of
                              float16_t types.

     1    11/14/13  qlin      Initial revision.
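
    As an informal illustration of the half-precision types, the "hf"
    constant suffix from issue (2), and the half-precision overloads of the
    geometric built-ins specified above, the fragment shader below is a
    minimal sketch only. The input and output names (inNormal, inIncident,
    outColor), the chosen locations, and the constants are hypothetical and
    are not part of this extension; the sketch assumes an implementation
    that exposes GL_AMD_gpu_shader_half_float.

```glsl
#version 450 core
#extension GL_AMD_gpu_shader_half_float : enable

// Hypothetical single-precision inputs and output; names are illustrative only.
layout(location = 0) in vec3 inNormal;
layout(location = 1) in vec3 inIncident;
layout(location = 0) out vec4 outColor;

void main()
{
    // Explicit constructors convert the single-precision inputs to
    // half precision before normalizing.
    f16vec3 N = normalize(f16vec3(inNormal));
    f16vec3 I = normalize(f16vec3(inIncident));

    // The "hf" suffix marks a half-precision floating-point constant.
    float16_t eta = 0.75hf;

    // Half-precision overloads of the geometric built-ins; both expect
    // normalized I and N, as the function table requires.
    f16vec3 R = reflect(I, N);
    f16vec3 T = refract(I, N, eta);

    // Blend the two directions, then widen back to single precision
    // with an explicit constructor for the color output.
    f16vec3 blended = mix(R, T, 0.5hf);
    outColor = vec4(vec3(blended) * 0.5 + 0.5, 1.0);
}
```

    Keeping the interface variables in single precision and converting
    explicitly at the boundaries is a choice made for this sketch; it
    avoids relying on any implicit conversion rule.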