Name

    AMD_gpu_shader_half_float

Name Strings

    GL_AMD_gpu_shader_half_float

Contact

    Qun Lin, AMD (quentin.lin 'at' amd.com)

Contributors

    Qun Lin, AMD
    Daniel Rakos, AMD
    Donglin Wei, AMD
    Graham Sellers, AMD
    Rex Xu, AMD
    Dominik Witczak, AMD

Status

    Shipping.

Version

    Last Modified Date:         09/21/2016
    Author Revision:            5

Number

    OpenGL Extension #496

Dependencies

    This extension is written against the OpenGL 4.5 (Core Profile)
    Specification.

    This extension is written against version 4.50 of the OpenGL Shading
    Language Specification.

    OpenGL 4.0 and GLSL 4.00 are required.

    This extension interacts with ARB_gpu_shader_int64.

    This extension interacts with AMD_shader_trinary_minmax.

    This extension interacts with AMD_shader_explicit_vertex_parameter.

Overview

    This extension was developed based on the NV_gpu_shader5 extension to
    allow implementations that support half float in shaders to expose the
    feature without the additional requirements that are present in
    NV_gpu_shader5.

    The extension introduces the following features for all shader types:

      * support for half float scalar, vector and matrix data types in shaders;

      * new built-in functions to pack and unpack half float types into a
        32-bit integer vector;

      * half float support for all existing single float built-in functions,
        including angle functions, exponential functions, common functions,
        geometric functions, matrix functions, etc.

    This extension is designed to be a functional superset of the half-precision
    floating-point support from NV_gpu_shader5 and to remain source-compatible
    with it. The procedures, functions, and tokens it shares with that extension
    are therefore identical to those found there.


New Procedures and Functions

    None.

New Tokens

    Returned by the <type> parameter of GetActiveAttrib, GetActiveUniform, and
    GetTransformFeedbackVarying:

    (The tokens are identical to those defined in NV_gpu_shader5.)

        FLOAT16_NV                                      0x8FF8
        FLOAT16_VEC2_NV                                 0x8FF9
        FLOAT16_VEC3_NV                                 0x8FFA
        FLOAT16_VEC4_NV                                 0x8FFB

    (New tokens)
        FLOAT16_MAT2_AMD                                0x91C5
        FLOAT16_MAT3_AMD                                0x91C6
        FLOAT16_MAT4_AMD                                0x91C7
        FLOAT16_MAT2x3_AMD                              0x91C8
        FLOAT16_MAT2x4_AMD                              0x91C9
        FLOAT16_MAT3x2_AMD                              0x91CA
        FLOAT16_MAT3x4_AMD                              0x91CB
        FLOAT16_MAT4x2_AMD                              0x91CC
        FLOAT16_MAT4x3_AMD                              0x91CD


Additions to Chapter 7 of the OpenGL 4.5 (Core Profile) Specification
(Program Objects)

    Modify Section 7.3.1, Program Interfaces

    (add to Table 7.3, OpenGL Shading Language type tokens, p. 108)

    +----------------------------+----------------+------+------+------+
    | Type Name Token            | Keyword        |Attrib| Xfb  |Buffer|
    +----------------------------+----------------+------+------+------+
    | FLOAT16_NV                 | float16_t      |  *   |  *   |  *   |
    | FLOAT16_VEC2_NV            | f16vec2        |  *   |  *   |  *   |
    | FLOAT16_VEC3_NV            | f16vec3        |  *   |  *   |  *   |
    | FLOAT16_VEC4_NV            | f16vec4        |  *   |  *   |  *   |
    | FLOAT16_MAT2_AMD           | f16mat2        |  *   |  *   |  *   |
    | FLOAT16_MAT3_AMD           | f16mat3        |  *   |  *   |  *   |
    | FLOAT16_MAT4_AMD           | f16mat4        |  *   |  *   |  *   |
    | FLOAT16_MAT2x3_AMD         | f16mat2x3      |  *   |  *   |  *   |
    | FLOAT16_MAT2x4_AMD         | f16mat2x4      |  *   |  *   |  *   |
    | FLOAT16_MAT3x2_AMD         | f16mat3x2      |  *   |  *   |  *   |
    | FLOAT16_MAT3x4_AMD         | f16mat3x4      |  *   |  *   |  *   |
    | FLOAT16_MAT4x2_AMD         | f16mat4x2      |  *   |  *   |  *   |
    | FLOAT16_MAT4x3_AMD         | f16mat4x3      |  *   |  *   |  *   |
    +----------------------------+----------------+------+------+------+
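
    The rows added to Table 7.3 can be used as a lookup on the application
    side. The following C sketch maps a <type> value, as returned by
    GetActiveUniform, to the keyword column above. The token values are copied
    from the New Tokens section; the GL_ prefix on the names and the helper
    half_type_keyword are assumptions made for illustration and are not
    defined by this extension.

```c
#include <stddef.h>

/* Token values as listed under "New Tokens"; the GL_ prefix is assumed. */
enum {
    GL_FLOAT16_NV         = 0x8FF8,
    GL_FLOAT16_VEC2_NV    = 0x8FF9,
    GL_FLOAT16_VEC3_NV    = 0x8FFA,
    GL_FLOAT16_VEC4_NV    = 0x8FFB,
    GL_FLOAT16_MAT2_AMD   = 0x91C5,
    GL_FLOAT16_MAT3_AMD   = 0x91C6,
    GL_FLOAT16_MAT4_AMD   = 0x91C7,
    GL_FLOAT16_MAT2x3_AMD = 0x91C8,
    GL_FLOAT16_MAT2x4_AMD = 0x91C9,
    GL_FLOAT16_MAT3x2_AMD = 0x91CA,
    GL_FLOAT16_MAT3x4_AMD = 0x91CB,
    GL_FLOAT16_MAT4x2_AMD = 0x91CC,
    GL_FLOAT16_MAT4x3_AMD = 0x91CD
};

/* Hypothetical helper: returns the GLSL keyword for a half float type
 * token, or NULL if the token is not one added by this extension. */
const char *half_type_keyword(unsigned int type)
{
    switch (type) {
    case GL_FLOAT16_NV:         return "float16_t";
    case GL_FLOAT16_VEC2_NV:    return "f16vec2";
    case GL_FLOAT16_VEC3_NV:    return "f16vec3";
    case GL_FLOAT16_VEC4_NV:    return "f16vec4";
    case GL_FLOAT16_MAT2_AMD:   return "f16mat2";
    case GL_FLOAT16_MAT3_AMD:   return "f16mat3";
    case GL_FLOAT16_MAT4_AMD:   return "f16mat4";
    case GL_FLOAT16_MAT2x3_AMD: return "f16mat2x3";
    case GL_FLOAT16_MAT2x4_AMD: return "f16mat2x4";
    case GL_FLOAT16_MAT3x2_AMD: return "f16mat3x2";
    case GL_FLOAT16_MAT3x4_AMD: return "f16mat3x4";
    case GL_FLOAT16_MAT4x2_AMD: return "f16mat4x2";
    case GL_FLOAT16_MAT4x3_AMD: return "f16mat4x3";
    default:                    return NULL;
    }
}
```

    A caller would pass the <type> output of a program query directly; any
    token outside the thirteen above yields NULL so that existing type
    handling can take over.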


    Modify Section 7.6.1, Loading Uniform Variables

    (modify the last paragraph on p. 132)

    The Uniform*f{v} commands will load count sets of one to four
    floating-point values into a uniform defined as a float, a half float, a
    floating-point vector, a half-precision floating-point vector or an array
    of any of these types. Floating-point values are converted to half float
    by the GL for uniforms defined as a half float, a half float vector or an
    array of those.
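
    The conversion to half float is performed by the GL, so applications do
    not need to implement it. For readers who want to predict the stored
    value, the following C sketch converts an IEEE 754 single-precision value
    to 16-bit half float bits. The round-to-nearest-even behavior and the
    function name float_to_half_bits are assumptions; the text above does not
    specify a rounding mode.

```c
#include <stdint.h>
#include <string.h>

/* Sketch: convert a binary32 float to binary16 bits.
 * Assumption: round to nearest, ties to even. */
uint16_t float_to_half_bits(float value)
{
    uint32_t f;
    memcpy(&f, &value, sizeof f);            /* reinterpret the float bits */

    uint32_t sign = (f >> 16) & 0x8000u;
    uint32_t exp  = (f >> 23) & 0xFFu;
    uint32_t man  = f & 0x7FFFFFu;

    if (exp == 0xFFu)                         /* Inf, or NaN kept as quiet NaN */
        return (uint16_t)(sign | 0x7C00u | (man ? 0x0200u : 0u));

    int32_t e = (int32_t)exp - 127 + 15;      /* rebias the exponent */
    if (e >= 0x1F)                            /* too large: becomes Inf */
        return (uint16_t)(sign | 0x7C00u);

    if (e <= 0) {                             /* half subnormal or zero */
        if (e < -10)
            return (uint16_t)sign;            /* underflows to signed zero */
        man |= 0x800000u;                     /* restore the implicit 1 */
        uint32_t shift    = (uint32_t)(14 - e);       /* 14..24 */
        uint32_t half_man = man >> shift;
        uint32_t rem      = man & ((1u << shift) - 1u);
        uint32_t halfway  = 1u << (shift - 1u);
        if (rem > halfway || (rem == halfway && (half_man & 1u)))
            half_man++;                       /* may become smallest normal */
        return (uint16_t)(sign | half_man);
    }

    uint32_t half = ((uint32_t)e << 10) | (man >> 13);
    uint32_t rem  = man & 0x1FFFu;            /* the 13 discarded bits */
    if (rem > 0x1000u || (rem == 0x1000u && (half & 1u)))
        half++;                               /* carry may reach Inf */
    return (uint16_t)(sign | half);
}
```

    Under these assumptions, 1.0 maps to 0x3C00 and 65504.0, the largest
    finite half float, maps to 0x7BFF.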


    Modify Section 7.6.2.1, Uniform Buffer Object Storage

    (modify the first two bullets of the first paragraph on p. 136)

    * Members of type bool, int, uint, float, float16_t and double are respectively
      extracted from a buffer object by reading a single uint, int, uint, float,
      half float or double value at the specified offset.

    * Vectors with N elements with basic data types of bool, int, uint, float,
      float16_t or double are extracted as N values in consecutive memory locations
      beginning at the specified offset, with components stored in order with the
      first (X) component at the lowest offset. The GL data type used for component
      extraction is derived according to the rules for scalar members above.
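
    The two bullets above can be sketched on the CPU as follows: a float16_t
    member is a single 16-bit value at its offset, and an f16vecN member is N
    such values in consecutive locations starting with X. The helper names
    half_bits_to_float and read_f16vec are hypothetical, and the sketch
    assumes the buffer contents use the host byte order.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Decode binary16 bits into a float; every half value is exactly
 * representable in single precision. */
float half_bits_to_float(uint16_t h)
{
    uint32_t sign = (uint32_t)(h & 0x8000u) << 16;
    uint32_t exp  = (h >> 10) & 0x1Fu;
    uint32_t man  = h & 0x3FFu;
    uint32_t f;

    if (exp == 0x1Fu) {                       /* Inf or NaN */
        f = sign | 0x7F800000u | (man << 13);
    } else if (exp != 0) {                    /* normal: rebias 15 -> 127 */
        f = sign | ((exp + 112u) << 23) | (man << 13);
    } else if (man == 0) {                    /* signed zero */
        f = sign;
    } else {                                  /* subnormal: normalize it */
        uint32_t e = 113u;
        while ((man & 0x400u) == 0) {
            man <<= 1;
            e--;
        }
        f = sign | (e << 23) | ((man & 0x3FFu) << 13);
    }

    float out;
    memcpy(&out, &f, sizeof out);
    return out;
}

/* Read an n-component half float vector stored at byte offset `offset`,
 * components in consecutive 2-byte locations with X at the lowest offset. */
void read_f16vec(const unsigned char *buffer, size_t offset, int n, float *out)
{
    for (int i = 0; i < n; i++) {
        uint16_t bits;
        memcpy(&bits, buffer + offset + 2u * (size_t)i, sizeof bits);
        out[i] = half_bits_to_float(bits);
    }
}
```

    Passing n = 1 covers the scalar float16_t case described in the first
    bullet.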


Additions to Chapter 11 of the OpenGL 4.5 (Core Profile) Specification
(Programmable Vertex Processing)

    Modify Section 11.1.1, Vertex Attributes

    (modify Table 11.2, Generic attributes and vector types used by column vectors of
    matrix variables bound to generic attribute index i. p. 366)

    +------------------------------+-------------------------+-----------------------+
    |          Data type           |   Column vector type    |Generic attributes used|
    +------------------------------+-------------------------+-----------------------+
    | mat2, dmat2, f16mat2         | two-component vector    | i, i + 1              |
    | mat2x3, dmat2x3, f16mat2x3   | three-component vector  | i, i + 1              |
    | mat2x4, dmat2x4, f16mat2x4   | four-component vector   | i, i + 1              |
    | mat3x2, dmat3x2, f16mat3x2   | two-component vector    | i, i + 1, i + 2       |
    | mat3, dmat3, f16mat3         | three-component vector  | i, i + 1, i + 2       |
    | mat3x4, dmat3x4, f16mat3x4   | four-component vector   | i, i + 1, i + 2       |
    | mat4x2, dmat4x2, f16mat4x2   | two-component vector    | i, i + 1, i + 2, i + 3|
    | mat4x3, dmat4x3, f16mat4x3   | three-component vector  | i, i + 1, i + 2, i + 3|
    | mat4, dmat4, f16mat4         | four-component vector   | i, i + 1, i + 2, i + 3|
    +------------------------------+-------------------------+-----------------------+
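
    Table 11.2 shows that a half float matrix consumes one generic attribute
    per column, exactly as mat and dmat types do. A minimal C sketch of that
    rule follows; the helper name matrix_attrib_indices is hypothetical.

```c
/* Fills out_indices with the generic attribute indices used by a matrix
 * with `columns` columns bound at `base_index`, and returns how many
 * indices were written. Returns 0 for an invalid column count. */
int matrix_attrib_indices(int base_index, int columns, int out_indices[4])
{
    if (columns < 2 || columns > 4)
        return 0;                     /* GLSL matrices have 2 to 4 columns */
    for (int c = 0; c < columns; c++)
        out_indices[c] = base_index + c;
    return columns;
}
```

    For an f16mat3x4 bound at index 3, the helper reports indices 3, 4 and 5,
    each holding a four-component column vector.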

    (modify Table 11.3: Scalar and vector vertex attribute types and VertexAttrib*
    commands used to set the values of the corresponding generic attributes. p. 366)

    +-------------------+--------------------------+
    |   Data type       |         Command          |
    +-------------------+--------------------------+
    | float, float16_t  | VertexAttrib1*           |
    | vec2, f16vec2     | VertexAttrib2*           |
    | vec3, f16vec3     | VertexAttrib3*           |
    | vec4, f16vec4     | VertexAttrib4*           |
    +-------------------+--------------------------+


    Modify Section 11.1.2.1, Output Variables

    (modify the last paragraph on p. 374)

    ..., each component of outputs declared as half-precision floating-point
    scalars, vectors, or matrices is considered to consume two basic machine
    units, and each component of any other type ...


Modifications to the OpenGL Shading Language Specification, Version 4.50

    Including the following line in a shader can be used to control the
    language features described in this extension:

      #extension GL_AMD_gpu_shader_half_float : <behavior>

    where <behavior> is as specified in section 3.3.

    New preprocessor #defines are added to the OpenGL Shading Language:

      #define GL_AMD_gpu_shader_half_float       1
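
    As an illustration of the directive, the following C sketch embeds a
    small fragment shader as a string, the way an application would pass it
    to ShaderSource. The shader body, the output name color and the accessor
    half_float_fragment_source are assumptions made for this example; the
    shader has not been compiled against a particular driver.

```c
/* GLSL source held as a C string. The "require" behavior makes compilation
 * fail on implementations that do not support the extension. */
static const char *const half_float_fragment_glsl =
    "#version 450 core\n"
    "#extension GL_AMD_gpu_shader_half_float : require\n"
    "\n"
    "layout(location = 0) out vec4 color;\n"
    "\n"
    "void main()\n"
    "{\n"
    "    float16_t gain = 0.5HF;            // HF suffix: float16_t literal\n"
    "    f16vec3 tint = f16vec3(1.0HF, 0.5HF, 0.25HF) * gain;\n"
    "    color = vec4(vec3(tint), 1.0);     // explicit f16vec3 -> vec3\n"
    "}\n";

/* Hypothetical accessor returning the shader text above. */
const char *half_float_fragment_source(void)
{
    return half_float_fragment_glsl;
}
```

    A shader that can fall back to single precision could instead use the
    "enable" behavior and test the GL_AMD_gpu_shader_half_float macro with
    #ifdef.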


Additions to Chapter 3 of the OpenGL Shading Language Specification (Basics)


    Modify Section 3.6, Keywords

    (add the following to the list of reserved keywords at p. 18)

    float16_t f16vec2 f16vec3 f16vec4
    f16mat2  f16mat3  f16mat4
    f16mat2x2 f16mat2x3 f16mat2x4
    f16mat3x2 f16mat3x3 f16mat3x4
    f16mat4x2 f16mat4x3 f16mat4x4


Additions to Chapter 4 of the OpenGL Shading Language Specification
(Variables and Types)


    Modify Section 4.1, Basic Types

    (add to the basic "Transparent Types" table, p. 23)

    +-----------+------------------------------------------------------------+
    | Type      | Meaning                                                    |
    +-----------+------------------------------------------------------------+
    | float16_t | a half-precision floating-point scalar                     |
    | f16vec2   | a two-component half-precision floating-point vector       |
    | f16vec3   | a three-component half-precision floating-point vector     |
    | f16vec4   | a four-component half-precision floating-point vector      |
    | f16mat2   | a 2x2 half-precision floating-point matrix                 |
    | f16mat3   | a 3x3 half-precision floating-point matrix                 |
    | f16mat4   | a 4x4 half-precision floating-point matrix                 |
    | f16mat2x2 | same as a f16mat2                                          |
    | f16mat2x3 | a half-precision floating-point matrix with 2 columns and  |
    |           | 3 rows                                                     |
    | f16mat2x4 | a half-precision floating-point matrix with 2 columns and  |
    |           | 4 rows                                                     |
    | f16mat3x2 | a half-precision floating-point matrix with 3 columns and  |
    |           | 2 rows                                                     |
    | f16mat3x3 | same as a f16mat3                                          |
    | f16mat3x4 | a half-precision floating-point matrix with 3 columns and  |
    |           | 4 rows                                                     |
    | f16mat4x2 | a half-precision floating-point matrix with 4 columns and  |
    |           | 2 rows                                                     |
    | f16mat4x3 | a half-precision floating-point matrix with 4 columns and  |
    |           | 3 rows                                                     |
    | f16mat4x4 | same as a f16mat4                                          |
    +-----------+------------------------------------------------------------+


    Modify Section 4.1.4, Floating-Point Variables

    (replace first paragraph of the section, p. 29)

    Single-precision, double-precision and half-precision floating point variables
    are available for use in a variety of scalar calculations. Generally, the term
    floating-point will refer to all single-, double- and half-precision floating
    point. Floating-point variables are defined as in the following examples:

        float a, b = 1.5;       // single-precision floating-point
        double c, d = 2.0LF;    // double-precision floating-point
        float16_t e, f = 3.0HF; // half-precision floating-point

    As an input value to one of the processing units, a single-precision,
    double-precision or half-precision floating-point variable is expected to
    match the corresponding IEEE 754 floating-point definition in terms of
    precision and dynamic range.

    (modify grammar rule for "floating-suffix", p. 30)

      floating-suffix: one of
        f F lf LF hf HF

    (modify the fourth sentence of second paragraph on p. 30)

    When the suffix "lf" or "LF" is present, the literal has type double. When the
    suffix "hf" or "HF" is present, the literal has type float16_t. Otherwise, the
    literal has type float.


    Modify Section 4.1.6, Matrices

    (modify the second sentence in the section, p. 30)

    Matrix types beginning with "mat" have single-precision components, matrix
    types beginning with "dmat" have double-precision components and matrix types
    beginning with "f16mat" have half-precision components.


    Modify Section 4.1.10, Implicit Conversions

    (modify the implicit conversion table on p. 37)

    +-----------------------+-------------------------------------------------+
    | Type of expression    | Can be implicitly converted to                  |
    +-----------------------+-------------------------------------------------+
    | int, uint, float16_t  | float                                           |
    | ivec2, uvec2, f16vec2 | vec2                                            |
    | ivec3, uvec3, f16vec3 | vec3                                            |
    | ivec4, uvec4, f16vec4 | vec4                                            |
    | f16mat2               | mat2                                            |
    | f16mat3               | mat3                                            |
    | f16mat4               | mat4                                            |
    | f16mat2x3             | mat2x3                                          |
    | f16mat2x4             | mat2x4                                          |
    | f16mat3x2             | mat3x2                                          |
    | f16mat3x4             | mat3x4                                          |
    | f16mat4x2             | mat4x2                                          |
    | f16mat4x3             | mat4x3                                          |
    | int, uint,            | double                                          |
    | float, float16_t      |                                                 |
    | ivec2, uvec2,         | dvec2                                           |
    | vec2, f16vec2         |                                                 |
    | ivec3, uvec3,         | dvec3                                           |
    | vec3, f16vec3         |                                                 |
    | ivec4, uvec4,         | dvec4                                           |
    | vec4, f16vec4         |                                                 |
    | mat2, f16mat2         | dmat2                                           |
    | mat3, f16mat3         | dmat3                                           |
    | mat4, f16mat4         | dmat4                                           |
    | mat2x3, f16mat2x3     | dmat2x3                                         |
    | mat2x4, f16mat2x4     | dmat2x4                                         |
    | mat3x2, f16mat3x2     | dmat3x2                                         |
    | mat3x4, f16mat3x4     | dmat3x4                                         |
    | mat4x2, f16mat4x2     | dmat4x2                                         |
    | mat4x3, f16mat4x3     | dmat4x3                                         |
    +-----------------------+-------------------------------------------------+


    Modify Section 4.4.2.1 Transform Feedback Layout Qualifiers

    (insert after the fourth paragraph in the section on p. 70)

    ... will be a multiple of 8; if applied to an aggregate containing a
    float16_t, the offset must also be a multiple of 2, and the space taken in
    the buffer will be a multiple of 2.
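
    The offset constraints can be checked mechanically. The C sketch below
    combines the rule added here (multiple of 2 for aggregates containing a
    float16_t) with the multiple-of-8 rule for doubles quoted above. It also
    applies the base rule that the offset is a multiple of the size of the
    first component, which is recalled from the unmodified GLSL text rather
    than stated in this extension. The function name xfb_offset_is_valid and
    its parameters are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* Returns true if `offset` satisfies the xfb_offset alignment rules for an
 * aggregate whose first component occupies `first_component_size` bytes. */
bool xfb_offset_is_valid(size_t offset, size_t first_component_size,
                         bool contains_double, bool contains_float16)
{
    /* Base rule (assumed from the unmodified specification text). */
    if (first_component_size == 0 || offset % first_component_size != 0)
        return false;
    /* Aggregates containing a double: offset is a multiple of 8. */
    if (contains_double && offset % 8 != 0)
        return false;
    /* Aggregates containing a float16_t: offset is a multiple of 2. */
    if (contains_float16 && offset % 2 != 0)
        return false;
    return true;
}
```

    For example, an f16vec2 at offset 6 passes the check, while the same
    variable at offset 5 does not.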


    Modify Section 4.7.1 Range and Precision

    (insert after the first paragraph in the section on p. 85)

    ... and positive and negative zeros. The precision of stored
    half-precision floating-point variables is described in section 2.3.3.2
    "16-Bit Floating-Point Numbers" of the OpenGL Specification.

    The following rules apply to all floating-point operations, including
    single-, double- and half-precision operations:...


Additions to Chapter 5 of the OpenGL Shading Language Specification
(Operators and Expressions)


    Modify Section 5.4.1, Conversion and Scalar Constructors

    (add after the first list of constructor examples on p. 97)

      int(float16_t)    // convert a float16_t value to a signed integer
      uint(float16_t)   // convert a float16_t value to an unsigned integer
      bool(float16_t)   // convert a float16_t value to a Boolean
      float(float16_t)  // convert a float16_t value to a float value
      double(float16_t) // convert a float16_t value to a double value
      float16_t(bool)   // convert a Boolean to a float16_t value
      float16_t(int)    // convert a signed integer to a float16_t value
      float16_t(uint)   // convert an unsigned integer to a float16_t value
      float16_t(float)  // convert a float value to a float16_t value
      float16_t(double) // convert a double value to a float16_t value

    (modify the first sentence of last paragraph on p. 98)

    ... other arguments.
    If the basic type (bool, int, float, double, or float16_t) of a parameter to
    a constructor does not match the basic type of the object being constructed,
    the scalar construction rules (above) are used to convert the parameters.


Additions to Chapter 6 of the OpenGL Shading Language Specification
(Statements and Structure)


    Modify Section 6.1, Function Definitions

    (replace the second rule in third paragraph on p. 113)

      2. A match involving a conversion from a signed integer, unsigned
         integer, or floating-point type to a similar type having a larger
         number of bits is better than a match involving any other implicit
         conversion.

Additions to Chapter 8 of the OpenGL Shading Language Specification
(Built-in Functions)

    (insert after the sixth sentence of last paragraph on p. 140)

    ... genDType is used as the argument. Where the input arguments (and
    corresponding output) can be float16_t, f16vec2, f16vec3, f16vec4,
    genF16Type is used as the argument.


    Modify Section 8.1, Angle and Trigonometry Functions

    (add to the table of Angle and Trigonometry Functions on p. 141)

    +------------------------------------------------+----------------------------------------------------+
    | Syntax                                         | Description                                        |
    +------------------------------------------------+----------------------------------------------------+
    | genF16Type radians (genF16Type degrees)        | Converts degrees to radians, i.e., PI/180 *        |
    |                                                | degrees.                                           |
    +------------------------------------------------+----------------------------------------------------+
    | genF16Type degrees (genF16Type radians)        | Converts radians to degrees, i.e., 180/PI *        |
    |                                                | radians.                                           |
    +------------------------------------------------+----------------------------------------------------+
    | genF16Type sin (genF16Type angle)              | The standard trigonometric sine function.          |
    +------------------------------------------------+----------------------------------------------------+
    | genF16Type cos (genF16Type angle)              | The standard trigonometric cosine function.        |
    +------------------------------------------------+----------------------------------------------------+
    | genF16Type tan (genF16Type angle)              | The standard trigonometric tangent.                |
    +------------------------------------------------+----------------------------------------------------+
    | genF16Type asin (genF16Type x)                 | Arc sine. Returns an angle whose sine is x. The    |
    |                                                | range of values returned by this function is       |
    |                                                | [-PI/2, PI/2]. Results are undefined if |x| > 1.   |
    +------------------------------------------------+----------------------------------------------------+
    | genF16Type acos (genF16Type x)                 | Arc cosine. Returns an angle whose cosine is x. The|
    |                                                | range of values returned by this function is       |
    |                                                | [0, PI]. Results are undefined if |x| > 1.         |
    +------------------------------------------------+----------------------------------------------------+
    | genF16Type atan (genF16Type y, genF16Type x)   | Arc tangent. Returns an angle whose tangent is y/x.|
    |                                                | The signs of x and y are used to determine what    |
    |                                                | quadrant the angle is in. The range of values      |
    |                                                | returned by this function is [-PI,PI]. Results are |
    |                                                | undefined if x and y are both 0.                   |
    +------------------------------------------------+----------------------------------------------------+
    | genF16Type atan (genF16Type y_over_x)          | Arc tangent. Returns an angle whose tangent is     |
    |                                                | y_over_x. The range of values returned by this     |
    |                                                | function is [-PI/2, PI/2].                         |
    +------------------------------------------------+----------------------------------------------------+
    | genF16Type sinh (genF16Type x)                 | Returns the hyperbolic sine function               |
    |                                                | (e^x - e^-x) / 2.                                  |
    +------------------------------------------------+----------------------------------------------------+
    | genF16Type cosh (genF16Type x)                 | Returns the hyperbolic cosine function             |
    |                                                | (e^x + e^-x) / 2.                                  |
    +------------------------------------------------+----------------------------------------------------+
    | genF16Type tanh (genF16Type x)                 | Returns the hyperbolic tangent function            |
    |                                                | sinh(x) / cosh(x).                                 |
    +------------------------------------------------+----------------------------------------------------+
    | genF16Type asinh (genF16Type x)                | Arc hyperbolic sine; returns the inverse of sinh.  |
    +------------------------------------------------+----------------------------------------------------+
    | genF16Type acosh (genF16Type x)                | Arc hyperbolic cosine; returns the non-negative    |
    |                                                | inverse of cosh. Results are undefined if x < 1.   |
    +------------------------------------------------+----------------------------------------------------+
    | genF16Type atanh (genF16Type x)                | Arc hyperbolic tangent; returns the inverse of     |
    |                                                | tanh. Results are undefined if |x| >= 1.           |
    +------------------------------------------------+----------------------------------------------------+


    Modify Section 8.2, Exponential Functions

    (add to the table of Exponential Functions on p. 143)

    +------------------------------------------------+----------------------------------------------------+
    | Syntax                                         | Description                                        |
    +------------------------------------------------+----------------------------------------------------+
    | genF16Type pow (genF16Type x, genF16Type y)    | Returns x raised to the y power, i.e., x^y.        |
    |                                                | Results are undefined if x < 0.                    |
    |                                                | Results are undefined if x = 0 and y <= 0.         |
    +------------------------------------------------+----------------------------------------------------+
    | genF16Type exp (genF16Type x)                  | Returns the natural exponentiation of x, i.e., e^x.|
    +------------------------------------------------+----------------------------------------------------+
    | genF16Type log (genF16Type x)                  | Returns the natural logarithm of x, i.e., returns  |
    |                                                | the value y which satisfies the equation x = e^y.  |
    |                                                | Results are undefined if x <= 0.                   |
    +------------------------------------------------+----------------------------------------------------+
    | genF16Type exp2 (genF16Type x)                 | Returns 2 raised to the x power, i.e., 2^x.        |
    +------------------------------------------------+----------------------------------------------------+
    | genF16Type log2 (genF16Type x)                 | Returns the base 2 logarithm of x, i.e., returns   |
    |                                                | the value y which satisfies the equation x = 2^y.  |
    |                                                | Results are undefined if x <= 0.                   |
    +------------------------------------------------+----------------------------------------------------+
    | genF16Type sqrt (genF16Type x)                 | Returns sqrt(x). Results are undefined if x < 0.   |
    +------------------------------------------------+----------------------------------------------------+
    | genF16Type inversesqrt (genF16Type x)          | Returns 1 / sqrt(x). Results are undefined if      |
    |                                                | x <= 0.                                            |
    +------------------------------------------------+----------------------------------------------------+


    Modify Section 8.3, Common Functions

    (add to the table of common functions on p. 144)

    +------------------------------------------------+----------------------------------------------------+
    | Syntax                                         | Description                                        |
    +------------------------------------------------+----------------------------------------------------+
    | genF16Type abs(genF16Type x)                   | Returns x if x >= 0; otherwise it returns -x.      |
    +------------------------------------------------+----------------------------------------------------+
    | genF16Type sign(genF16Type x)                  | Returns 1.0 if x > 0, 0.0 if x = 0, or -1.0 if x < |
    |                                                | 0.                                                 |
    +------------------------------------------------+----------------------------------------------------+
    | genF16Type floor (genF16Type x)                | Returns a value equal to the nearest integer that  |
    |                                                | is less than or equal to x.                        |
    +------------------------------------------------+----------------------------------------------------+
    | genF16Type trunc (genF16Type x)                | Returns a value equal to the nearest integer to x  |
    |                                                | whose absolute value is not larger than the        |
    |                                                | absolute value of x.                               |
    +------------------------------------------------+----------------------------------------------------+
    | genF16Type round (genF16Type x)                | Returns a value equal to the nearest integer to x. |
    |                                                | The fraction 0.5 will round in a direction chosen  |
    |                                                | by the implementation, presumably the direction    |
    |                                                | that is fastest. This includes the possibility     |
    |                                                | that round(x) returns the same value as            |
    |                                                | roundEven(x) for all values of x.                  |
    +------------------------------------------------+----------------------------------------------------+
    | genF16Type roundEven (genF16Type x)            | Returns a value equal to the nearest integer to x. |
    |                                                | A fractional part of 0.5 will round toward the     |
    |                                                | nearest even integer. (Both 3.5 and 4.5 for x will |
    |                                                | return 4.0.)                                       |
    +------------------------------------------------+----------------------------------------------------+
    | genF16Type ceil (genF16Type x)                 | Returns a value equal to the nearest integer that  |
    |                                                | is greater than or equal to x.                     |
    +------------------------------------------------+----------------------------------------------------+
    | genF16Type fract (genF16Type x)                | Returns x - floor(x).                              |
    +------------------------------------------------+----------------------------------------------------+
    | genF16Type mod (genF16Type x, float16_t y)     | Modulus. Returns x - y * floor(x/y).               |
    | genF16Type mod (genF16Type x, genF16Type y)    |                                                    |
    +------------------------------------------------+----------------------------------------------------+
    | genF16Type modf(genF16Type x, out genF16Type i)| Returns the fractional part of x and sets i to the |
    |                                                | integer part (as a whole number floating-point     |
    |                                                | value). Both the return value and the output       |
    |                                                | parameter will have the same sign as x.            |
    +------------------------------------------------+----------------------------------------------------+
    | genF16Type min(genF16Type x,                   | Returns y if y < x; otherwise it returns x.        |
    |                genF16Type y)                   |                                                    |
    | genF16Type min(genF16Type x,                   |                                                    |
    |                float16_t y)                    |                                                    |
    +------------------------------------------------+----------------------------------------------------+
    | genF16Type max(genF16Type x,                   | Returns y if x < y; otherwise it returns x.        |
    |                genF16Type y)                   |                                                    |
    | genF16Type max(genF16Type x,                   |                                                    |
    |                float16_t y)                    |                                                    |
    +------------------------------------------------+----------------------------------------------------+
    | genF16Type clamp(genF16Type x,                 | Returns min(max(x, minVal), maxVal).               |
    |                  genF16Type minVal,            |                                                    |
    |                  genF16Type maxVal)            | Results are undefined if minVal > maxVal.          |
    | genF16Type clamp(genF16Type x,                 |                                                    |
    |                  float16_t minVal,             |                                                    |
    |                  float16_t maxVal)             |                                                    |
    +------------------------------------------------+----------------------------------------------------+
    | genF16Type mix(genF16Type x,                   | Returns the linear blend of x and y, i.e.,         |
    |                genF16Type y,                   | x * (1 - a) + y * a.                               |
    |                genF16Type a)                   |                                                    |
    | genF16Type mix(genF16Type x,                   |                                                    |
    |                genF16Type y,                   |                                                    |
    |                float16_t a)                    |                                                    |
    +------------------------------------------------+----------------------------------------------------+
    | genF16Type mix(genF16Type x,                   | Selects which vector each returned component comes |
    |                genF16Type y,                   | from. For a component of a that is false, the      |
    |                genBType a)                     | corresponding component of x is returned. For a    |
    |                                                | component of a that is true, the corresponding     |
    |                                                | component of y is returned.                        |
    +------------------------------------------------+----------------------------------------------------+
    | genF16Type step (genF16Type edge, genF16Type x)| Returns 0.0 if x < edge; otherwise it returns 1.0. |
    | genF16Type step (float16_t edge, genF16Type x) |                                                    |
    +------------------------------------------------+----------------------------------------------------+
    | genF16Type smoothstep (genF16Type edge0,       | Returns 0.0 if x <= edge0 and 1.0 if x >= edge1    |
    |                        genF16Type edge1,       | and performs smooth Hermite interpolation between 0|
    |                        genF16Type x)           | and 1 when edge0 < x < edge1. This is useful in    |
    | genF16Type smoothstep (float16_t edge0,        | cases where you would want a threshold function    |
    |                        float16_t edge1,        | with a smooth transition. This is equivalent to:   |
    |                        genF16Type x)           |    genF16Type t;                                   |
    |                                                |    t = clamp((x - edge0) / (edge1 - edge0), 0, 1); |
    |                                                |    return t * t * (3 - 2 * t);                     |
    |                                                | Results are undefined if edge0 >= edge1.           |
    +------------------------------------------------+----------------------------------------------------+
    | genBType isnan (genF16Type x)                  | Returns true if x holds a NaN. Returns false       |
    |                                                | otherwise. Always returns false if NaNs are not    |
    |                                                | implemented.                                       |
    +------------------------------------------------+----------------------------------------------------+
    | genBType isinf (genF16Type x)                  | Returns true if x holds a positive infinity or     |
    |                                                | negative infinity. Returns false otherwise.        |
    +------------------------------------------------+----------------------------------------------------+
    | genF16Type fma (genF16Type a, genF16Type b,    | Computes and returns a * b + c.                    |
    |                 genF16Type c)                  |                                                    |
    +------------------------------------------------+----------------------------------------------------+
    | genF16Type frexp (genF16Type x,                | Splits x into a floating-point significand in the  |
    |                   out genIType exp)            | range [0.5, 1.0) and an integral exponent of two,  |
    |                                                | such that:                                         |
    |                                                |    x = significand * 2^exp                         |
    |                                                | The significand is returned by the function and the|
    |                                                | exponent is returned in the parameter exp. For a   |
    |                                                | floating-point value of zero, the significand and  |
    |                                                | exponent are both zero. For a floating-point value |
    |                                                | that is an infinity or is not a number, the results|
    |                                                | are undefined.                                     |
    +------------------------------------------------+----------------------------------------------------+
    | genF16Type ldexp (genF16Type x,                | Builds a floating-point number from x and the      |
    |                   in genIType exp)             | corresponding integral exponent of two in exp,     |
    |                                                | returning:                                         |
    |                                                |    x * 2^exp                                       |
    |                                                | If this product is too large to be represented in  |
    |                                                | the floating-point type, the result is undefined.  |
    +------------------------------------------------+----------------------------------------------------+
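
    As a non-normative illustration of the overloads in the table above, the
    following sketch combines the scalar-edge form of smoothstep with both the
    blending and the selecting forms of mix. The function name blendAndSelect
    and its parameters are hypothetical.

        #version 450
        #extension GL_AMD_gpu_shader_half_float : require

        f16vec3 blendAndSelect(f16vec3 base, f16vec3 tint, float16_t d)
        {
            // genF16Type smoothstep(float16_t, float16_t, genF16Type)
            float16_t t = smoothstep(0.25hf, 0.75hf, d);

            // genF16Type mix(genF16Type, genF16Type, float16_t):
            // linear blend, base * (1 - t) + tint * t.
            f16vec3 blended = mix(base, tint, t);

            // genF16Type mix(genF16Type, genF16Type, genBType):
            // per-component select; where darker is true the result
            // comes from base, otherwise from blended.
            bvec3 darker = lessThan(blended, base);
            return mix(blended, base, darker);
        }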


    Modify Section 8.4, Floating-Point Pack and Unpack Functions

    (add to the table of pack and unpack functions on p. 149)

    +-----------------------------------+------------------------------------------------------+
    | Syntax                            | Description                                          |
    +-----------------------------------+------------------------------------------------------+
    | uint packFloat2x16(f16vec2 v)     | Returns an unsigned 32-bit integer obtained by       |
    |                                   | packing the components of a two-component half-      |
    |                                   | precision floating-point vector, respectively. The   |
    |                                   | first vector component specifies the 16 least        |
    |                                   | significant bits; the second component specifies the |
    |                                   | 16 most significant bits.                            |
    +-----------------------------------+------------------------------------------------------+
    | f16vec2 unpackFloat2x16(uint v)   | Returns a two-component half-precision floating-point|
    |                                   | vector built from a 32-bit unsigned integer scalar,  |
    |                                   | respectively. The first component of the vector      |
    |                                   | contains the 16 least significant bits of the input; |
    |                                   | the second component contains the 16 most            |
    |                                   | significant bits.                                    |
    +-----------------------------------+------------------------------------------------------+
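
    For illustration only, the sketch below round-trips two half-precision
    values through packFloat2x16 and unpackFloat2x16. The function name
    roundTrip is hypothetical, and the bit patterns in the comments assume the
    usual 16-bit encoding with 1 sign bit, 5 exponent bits and 10 mantissa
    bits.

        #version 450
        #extension GL_AMD_gpu_shader_half_float : require

        f16vec2 roundTrip()
        {
            // 1.0 encodes as 0x3C00 and -2.0 as 0xC000. The first component
            // lands in the 16 least significant bits, so the packed word
            // is 0xC0003C00.
            uint bits = packFloat2x16(f16vec2(1.0hf, -2.0hf));

            // Unpacking restores (1.0, -2.0): .x comes from the low half,
            // .y from the high half.
            return unpackFloat2x16(bits);
        }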


    Modify Section 8.5, Geometric Functions

    (add to table of geometric functions on p. 152)

    +-------------------------------------------+-----------------------------------------------+
    | Syntax                                    | Description                                   |
    +-------------------------------------------+-----------------------------------------------+
    | float16_t length (genF16Type x)           | Returns the length of vector x, i.e.,         |
    |                                           | sqrt(x[0]*x[0] + x[1]*x[1] + ...)             |
    +-------------------------------------------+-----------------------------------------------+
    | float16_t distance (genF16Type p0,        | Returns the distance between p0 and p1, i.e., |
    |                     genF16Type p1)        | length (p0 - p1)                              |
    +-------------------------------------------+-----------------------------------------------+
    | float16_t dot (genF16Type x, genF16Type y)| Returns the dot product of x and y, i.e.,     |
    |                                           | x[0]*y[0] + x[1]*y[1] + ...                   |
    +-------------------------------------------+-----------------------------------------------+
    | f16vec3 cross (f16vec3 x, f16vec3 y)      | Returns the cross product of x and y, i.e.,   |
    |                                           | |x[1] * y[2] - y[1] * x[2]|                   |
    |                                           | |x[2] * y[0] - y[2] * x[0]|                   |
    |                                           | |x[0] * y[1] - y[0] * x[1]|                   |
    +-------------------------------------------+-----------------------------------------------+
    | genF16Type normalize (genF16Type x)       | Returns a vector in the same direction as x   |
    |                                           | but with a length of 1.                       |
    +-------------------------------------------+-----------------------------------------------+
    | genF16Type faceforward (genF16Type N,     | If dot(Nref, I) < 0 return N, otherwise return|
    |                         genF16Type I,     | -N.                                           |
    |                         genF16Type Nref)  |                                               |
    +-------------------------------------------+-----------------------------------------------+
    | genF16Type reflect (genF16Type I,         | For the incident vector I and surface         |
    |                     genF16Type N)         | orientation N, returns the reflection         |
    |                                           | direction:                                    |
    |                                           |    I - 2 * dot(N, I) * N                      |
    |                                           | N must already be normalized in order to      |
    |                                           | achieve the desired result.                   |
    +-------------------------------------------+-----------------------------------------------+
    | genF16Type refract (genF16Type I,         | For the incident vector I and surface normal  |
    |                     genF16Type N,         | N, and the ratio of indices of refraction eta,|
    |                     float16_t eta)        | return the refraction vector. The result is   |
    |                                           | computed by                                   |
    |                                           |    k = 1.0 - eta * eta * (1.0 - dot(N, I) *   |
    |                                           |                dot(N, I))                     |
    |                                           |    if (k < 0.0)                               |
    |                                           |        return genF16Type(0.0)                 |
    |                                           |    else                                       |
    |                                           |        return eta * I - (eta * dot(N, I)      |
    |                                           |                          + sqrt(k)) * N       |
    |                                           | The input parameters for the incident vector  |
    |                                           | I and the surface normal N must already be    |
    |                                           | normalized to get the desired results.        |
    +-------------------------------------------+-----------------------------------------------+
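
    The sketch below, which is illustrative only, shows one way the
    half-precision reflect and refract overloads might be combined. The
    function name bendOrBounce is hypothetical, and treating a zero result
    from refract as total internal reflection is an assumption that relies
    on the k < 0.0 branch described above.

        #version 450
        #extension GL_AMD_gpu_shader_half_float : require

        f16vec3 bendOrBounce(f16vec3 I, f16vec3 N, float16_t eta)
        {
            // Both functions expect normalized inputs.
            f16vec3 i = normalize(I);
            f16vec3 n = normalize(N);

            // refract returns genF16Type(0.0) when k < 0.0.
            f16vec3 t = refract(i, n, eta);

            // Fall back to the reflection direction in that case.
            return (dot(t, t) == 0.0hf) ? reflect(i, n) : t;
        }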


    Modify Section 8.6, Matrix Functions

    (modify the first paragraph of the section on p. 154)

    ..., there is a single-precision floating-point version, where all
    arguments and return values are single precision, a double-precision
    floating-point version, where all arguments and return values are double
    precision, and a half-precision floating-point version, where all
    arguments and return values are half precision.


    Modify Section 8.7, Vector Relational Functions

    (add to the table of placeholders at the top of p. 156)

    +-------------+-----------------------------+
    | Placeholder | Specific Types Allowed      |
    +-------------+-----------------------------+
    | f16vec      | f16vec2, f16vec3, f16vec4   |
    +-------------+-----------------------------+

    (add to the table of vector relational functions at the bottom of p. 156)

    +-------------------------------------------+-----------------------------------------------+
    | Syntax                                    | Description                                   |
    +-------------------------------------------+-----------------------------------------------+
    | bvec lessThan(f16vec x, f16vec y)         | Returns the component-wise compare of x < y.  |
    +-------------------------------------------+-----------------------------------------------+
    | bvec lessThanEqual(f16vec x, f16vec y)    | Returns the component-wise compare of x <= y. |
    +-------------------------------------------+-----------------------------------------------+
    | bvec greaterThan(f16vec x, f16vec y)      | Returns the component-wise compare of x > y.  |
    +-------------------------------------------+-----------------------------------------------+
    | bvec greaterThanEqual(f16vec x, f16vec y) | Returns the component-wise compare of x >= y. |
    +-------------------------------------------+-----------------------------------------------+
    | bvec equal(f16vec x, f16vec y)            | Returns the component-wise compare of x == y. |
    +-------------------------------------------+-----------------------------------------------+
    | bvec notEqual(f16vec x, f16vec y)         | Returns the component-wise compare of x != y. |
    +-------------------------------------------+-----------------------------------------------+


    Modify Section 8.13.1, Derivative Functions

    (add to table of derivative functions on p. 181)

    +-------------------------------------------+-----------------------------------------------+
    | Syntax                                    | Description                                   |
    +-------------------------------------------+-----------------------------------------------+
    | genF16Type dFdx (genF16Type p)            | Returns either dFdxFine(p) or dFdxCoarse(p),  |
    |                                           | based on implementation choice, presumably    |
    |                                           | whichever is the faster, or by whichever is   |
    |                                           | selected in the API through                   |
    |                                           | quality-versus-speed hints.                   |
    +-------------------------------------------+-----------------------------------------------+
    | genF16Type dFdy (genF16Type p)            | Returns either dFdyFine(p) or dFdyCoarse(p),  |
    |                                           | based on implementation choice, presumably    |
    |                                           | whichever is the faster, or by whichever is   |
    |                                           | selected in the API through                   |
    |                                           | quality-versus-speed hints.                   |
    +-------------------------------------------+-----------------------------------------------+
    | genF16Type dFdxFine (genF16Type p)        | Returns the partial derivative of p with      |
    |                                           | respect to the window x coordinate. Will use  |
    |                                           | local differencing based on the value of p    |
    |                                           | for the current fragment and its immediate    |
    |                                           | neighbor(s).                                  |
    +-------------------------------------------+-----------------------------------------------+
    | genF16Type dFdyFine (genF16Type p)        | Returns the partial derivative of p with      |
    |                                           | respect to the window y coordinate. Will use  |
    |                                           | local differencing based on the value of p    |
    |                                           | for the current fragment and its immediate    |
    |                                           | neighbor(s).                                  |
    +-------------------------------------------+-----------------------------------------------+
    | genF16Type dFdxCoarse (genF16Type p)      | Returns the partial derivative of p with      |
    |                                           | respect to the window x coordinate. Will use  |
    |                                           | local differencing based on the value of p    |
    |                                           | for the current fragment's neighbors, and     |
    |                                           | will possibly, but not necessarily, include   |
    |                                           | the value of p for the current fragment. That |
    |                                           | is, over a given area, the implementation can |
    |                                           | compute x derivatives in fewer unique         |
    |                                           | locations than would be allowed for           |
    |                                           | dFdxFine(p).                                  |
    +-------------------------------------------+-----------------------------------------------+
    | genF16Type dFdyCoarse (genF16Type p)      | Returns the partial derivative of p with      |
    |                                           | respect to the window y coordinate. Will use  |
    |                                           | local differencing based on the value of p    |
    |                                           | for the current fragment's neighbors, and     |
    |                                           | will possibly, but not necessarily, include   |
    |                                           | the value of p for the current fragment. That |
    |                                           | is, over a given area, the implementation can |
    |                                           | compute y derivatives in fewer unique         |
    |                                           | locations than would be allowed for           |
    |                                           | dFdyFine(p).                                  |
    +-------------------------------------------+-----------------------------------------------+
    | genF16Type fwidth (genF16Type p)          | Returns abs(dFdx(p)) + abs(dFdy(p)).          |
    +-------------------------------------------+-----------------------------------------------+
    | genF16Type fwidthFine (genF16Type p)      | Returns abs(dFdxFine(p)) + abs(dFdyFine(p)).  |
    +-------------------------------------------+-----------------------------------------------+
    | genF16Type fwidthCoarse (genF16Type p)    | Returns abs(dFdxCoarse(p)) +                  |
    |                                           |         abs(dFdyCoarse(p)).                   |
    +-------------------------------------------+-----------------------------------------------+
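
    A common use of fwidth is to size an antialiasing ramp. The fragment
    shader sketch below is illustrative only. The names dist, coverage and
    fragColor are hypothetical, and the lower bound on the filter width is an
    assumption added to keep the two smoothstep edges distinct.

        #version 450
        #extension GL_AMD_gpu_shader_half_float : require

        in float dist;          // hypothetical signed distance to an edge
        out vec4 fragColor;

        float16_t coverage(float16_t d)
        {
            // genF16Type fwidth(genF16Type p):
            // abs(dFdx(p)) + abs(dFdy(p)).
            float16_t w = max(fwidth(d), 0.001hf);

            // smoothstep is undefined if edge0 >= edge1, hence the
            // lower bound on w above.
            return smoothstep(-w, w, d);
        }

        void main()
        {
            float16_t c = coverage(float16_t(dist));
            fragColor = vec4(float(c));
        }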


    Modify Section 8.13.2, Interpolation Functions

    (add to table of interpolation functions on p. 180)

    +-------------------------------------------+-----------------------------------------------+
    | Syntax                                    | Description                                   |
    +-------------------------------------------+-----------------------------------------------+
    | genF16Type interpolateAtCentroid (        | Returns the value of the input interpolant    |
    |            genF16Type interpolant)        | sampled at a location inside both the pixel   |
    |                                           | and the primitive being processed. The value  |
    |                                           | obtained would be the same value assigned to  |
    |                                           | the input variable if declared with the       |
    |                                           | centroid qualifier.                           |
    +-------------------------------------------+-----------------------------------------------+
    | genF16Type interpolateAtSample (          | Returns the value of the input interpolant    |
    |            genF16Type interpolant,        | variable at the location of sample number     |
    |            int        sample)             | sample. If multisample buffers are not        |
    |                                           | available, the input variable will be         |
    |                                           | evaluated at the center of the pixel. If      |
    |                                           | sample sample does not exist, the position    |
    |                                           | used to interpolate the input variable is     |
    |                                           | undefined.                                    |
    +-------------------------------------------+-----------------------------------------------+
    | genF16Type interpolateAtOffset (          | Returns the value of the input interpolant    |
    |            genF16Type interpolant,        | variable sampled at an offset from the center |
    |            f16vec2    offset)             | of the pixel specified by offset. The two     |
    |                                           | floating-point components of offset give the  |
    |                                           | offset in pixels in the x and y directions,   |
    |                                           | respectively. An offset of (0, 0) identifies  |
    |                                           | the center of the pixel. The range and        |
    |                                           | granularity of offsets supported by this      |
    |                                           | function is implementation-dependent.         |
    +-------------------------------------------+-----------------------------------------------+


    Modify Section 9, Shading Language Grammar for Core Profile

    (add to the list of tokens on p. 187)

      ...
      FLOAT16  F16VEC2  F16VEC3  F16VEC4
      F16MAT2 F16MAT3 F16MAT4
      F16MAT2X2 F16MAT2X3 F16MAT2X4
      F16MAT3X2 F16MAT3X3 F16MAT3X4
      F16MAT4X2 F16MAT4X3 F16MAT4X4
      ...
      FLOAT16CONSTANT

    (add to the rule of "primary_expression" on p. 188)

      primary_expression:
        ...
        FLOAT16CONSTANT
        ...

    (add to the rule of "type_specifier_nonarray" on p. 195)

      type_specifier_nonarray:
        ...
          FLOAT16
          F16VEC2
          F16VEC3
          F16VEC4
          F16MAT2
          F16MAT3
          F16MAT4
          F16MAT2X2
          F16MAT2X3
          F16MAT2X4
          F16MAT3X2
          F16MAT3X3
          F16MAT3X4
          F16MAT4X2
          F16MAT4X3
          F16MAT4X4
        ...


Dependencies on ARB_gpu_shader_int64

    If the shader enables ARB_gpu_shader_int64, this extension allows
    additional explicit conversions between half-precision floating-point
    types and 64-bit integer types.

    Modify Section 5.4.1, Conversion and Scalar Constructors

    (add after the first list of constructor examples on p. 95)

      int64_t(float16_t)    // convert a float16_t value to a signed 64-bit integer
      uint64_t(float16_t)   // convert a float16_t value to an unsigned 64-bit integer
      float16_t(int64_t)    // convert a signed 64-bit integer to a float16_t value
      float16_t(uint64_t)   // convert an unsigned 64-bit integer to a float16_t value
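
    A brief sketch of these constructors, for illustration only (the function
    name truncateViaInt64 is hypothetical). Note that the largest finite
    half-precision value is 65504, so most 64-bit integer values have no
    exact float16_t representation.

        #version 450
        #extension GL_AMD_gpu_shader_half_float : require
        #extension GL_ARB_gpu_shader_int64 : require

        float16_t truncateViaInt64(float16_t h)
        {
            // float16_t -> signed 64-bit integer; as with other
            // float-to-integer conversions in the language, the
            // fractional part is dropped.
            int64_t i = int64_t(h);

            // signed 64-bit integer -> float16_t
            return float16_t(i);
        }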


Dependencies on AMD_shader_trinary_minmax

    If the shader enables AMD_shader_trinary_minmax, this extension adds
    additional common functions.

    Modify Section 8.3, Common Functions

    (add to the table of common functions on p. 144)

    +-------------------------------------------+-----------------------------------------------+
    | Syntax                                    | Description                                   |
    +-------------------------------------------+-----------------------------------------------+
    | genF16Type min3(genF16Type x,             | Returns the per-component minimum value of x, |
    |                 genF16Type y,             | y, and z.                                     |
    |                 genF16Type z)             |                                               |
    +-------------------------------------------+-----------------------------------------------+
    | genF16Type max3(genF16Type x,             | Returns the per-component maximum value of x, |
    |                 genF16Type y,             | y, and z.                                     |
    |                 genF16Type z)             |                                               |
    +-------------------------------------------+-----------------------------------------------+
    | genF16Type mid3(genF16Type x,             | Returns the per-component median value of x,  |
    |                 genF16Type y,             | y, and z.                                     |
    |                 genF16Type z)             |                                               |
    +-------------------------------------------+-----------------------------------------------+
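
    As an illustrative sketch, mid3 can express a clamp to a fixed range,
    since the per-component median of x and two bounds lies between those
    bounds. The function name saturateViaMid3 is hypothetical, and the
    equivalence with clamp is stated here only for inputs that are not NaN.

        #version 450
        #extension GL_AMD_gpu_shader_half_float : require
        #extension GL_AMD_shader_trinary_minmax : require

        f16vec3 saturateViaMid3(f16vec3 x)
        {
            // Median of (x, 0.0, 1.0) per component; matches
            // clamp(x, 0.0hf, 1.0hf) for non-NaN x.
            return mid3(x, f16vec3(0.0hf), f16vec3(1.0hf));
        }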


Dependencies on AMD_shader_explicit_vertex_parameter

    If the shader enables AMD_shader_explicit_vertex_parameter, this extension
    adds additional interpolation functions.

    Modify Section 8.13.2, Interpolation Functions

    (add to table of interpolation functions on p. 180)

    +-------------------------------------------+-----------------------------------------------+
    | Syntax                                    | Description                                   |
    +-------------------------------------------+-----------------------------------------------+
    | genF16Type interpolateAtVertexAMD (       | Returns the value of the input <interpolant>  |
    |            genF16Type interpolant,        | without any interpolation, i.e., the raw      |
    |            uint       vertexIdx)          | output value of the previous shader stage.    |
    |                                           | <vertexIdx> selects for which vertex of the   |
    |                                           | primitive the value of <interpolant> is       |
    |                                           | returned.                                     |
    |                                           |                                               |
    |                                           | This return value is equivalent to            |
    |                                           | interpolating the input <interpolant> using   |
    |                                           | the following set of barycentric coordinates, |
    |                                           | depending on the value of <vertexIdx>:        |
    |                                           |                                               |
    |                                           |  vertexIdx    Barycentric coordinates         |
    |                                           |  0            I=0, J=0, K=1                   |
    |                                           |  1            I=1, J=0, K=0                   |
    |                                           |  2            I=0, J=1, K=0                   |
    |                                           |                                               |
    |                                           | However, this order has no association with   |
    |                                           | the vertex order specified by the application |
    |                                           | in the originating draw.                      |
    |                                           |                                               |
    |                                           | The value of <vertexIdx> must be a constant   |
    |                                           | integer expression with a value in the range  |
    |                                           | [0, 2].                                       |
    +-------------------------------------------+-----------------------------------------------+


Errors

    None.

New State

    None.

New Implementation Dependent State

    None.

Issues

    (1) How is the functionality in this extension different from the
        half-precision floating-point types introduced by NV_gpu_shader5?

      RESOLVED: This extension is designed to be source code compatible with
      the half-precision floating-point support in NV_gpu_shader5. However, it
      is a functional superset of that, as it adds the following additional
      features:

        * support for implicit conversions from int, uint and float to float16_t.

        * support for overloaded versions of the functions, such as abs, sign, min,
          max, clamp, etc., that accept float16_t type or half-precision
          floating-point type as parameters.

    (2) What should be done to distinguish half-precision floating-point constants?

      RESOLVED: We will use "HF" and "hf" to identify half-precision
      floating-point constants.
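
      For illustration, declarations using the suffix might look like the
      following sketch. The variable names are hypothetical.

          #version 450
          #extension GL_AMD_gpu_shader_half_float : require

          const float16_t a = 1.5hf;                      // lower-case suffix
          const float16_t b = 2.0HF;                      // upper-case suffix
          const f16vec2   c = f16vec2(0.25hf, 1.0e-2HF);  // exponent, then suffix
          const float     d = 1.5;                        // no suffix: single precision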

    (3) Should we introduce new uniform API to set up float16_t uniforms in
        the default uniform block?

      RESOLVED: No. float16_t isn't an IEEE standard format and CPUs don't
      support it directly, so most data on the CPU side is stored in
      single- or double-precision floating-point form. Instead, the
      functionality of Uniform*f{v} is extended in this extension to support
      uniforms of float16_t type.

    (4) Should we support float16_t types as members of uniform blocks,
        shader storage buffer blocks, or as transform feedback varyings?

      RESOLVED: Yes, support all of them. float16_t types will consume two
      basic machine units. Some examples:

          struct S {

              float16_t  x;     // rule 1:  align = 2, takes offsets 0-1
              f16vec2    y;     // rule 2:  align = 4, takes offsets 4-7
              f16vec3    z;     // rule 3:  align = 8, takes offsets 8-13
          };

          layout(column_major, std140) uniform B1 {

              float16_t  a;     // rule 1:  align = 2, takes offsets 0-1
              f16vec2    b;     // rule 2:  align = 4, takes offsets 4-7
              f16vec3    c;     // rule 3:  align = 8, takes offsets 8-13
              float16_t  d[2];  // rule 4:  align = 16, array stride = 16,
                                //          takes offsets 16-47
              f16mat2x3  e;     // rule 5:  align = 16, matrix stride = 16,
                                //          takes offsets 48-79
              f16mat2x3  f[2];  // rule 6:  align = 16, matrix stride = 16,
                                //          array stride = 32, f[0] takes
                                //          offsets 80-111, f[1] takes offsets
                                //          112-143
              S          g;     // rule 9:  align = 16, g.x takes offsets
                                //          144-145, g.y takes offsets 148-151,
                                //          g.z takes offsets 152-157
              S          h[2];  // rule 10: align = 16, array stride = 16, h[0]
                                //          takes offsets 160-175, h[1] takes
                                //          offsets 176-191
          };

          layout(row_major, std430) buffer B2 {

              float16_t  o;     // rule 1:  align = 2, takes offsets 0-1
              f16vec2    p;     // rule 2:  align = 4, takes offsets 4-7
              f16vec3    q;     // rule 3:  align = 8, takes offsets 8-13
              float16_t  r[2];  // rule 4:  align = 2, array stride = 2, takes
                                //          offsets 14-17
              f16mat2x3  s;     // rule 7:  align = 4, matrix stride = 4, takes
                                //          offsets 20-31
              f16mat2x3  t[2];  // rule 8:  align = 4, matrix stride = 4, array
                                //          stride = 12, t[0] takes offsets
                                //          32-43, t[1] takes offsets 44-55
              S          u;     // rule 9:  align = 8, u.x takes offsets
                                //          56-57, u.y takes offsets 60-63, u.z
                                //          takes offsets 64-69
              S          v[2];  // rule 10: align = 8, array stride = 16, v[0]
                                //          takes offsets 72-87, v[1] takes
                                //          offsets 88-103
          };
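
      The offsets listed for block B2 can be reproduced with a simple
      align-then-advance loop. The C sketch below is illustrative only: the
      member_t type and the helper names are hypothetical, and the alignment
      and size of each member are copied from the comments in the example
      above rather than derived from the layout rules.

```c
#include <stddef.h>

/* Round offset up to the next multiple of align (a power of two). */
static size_t align_up(size_t offset, size_t align)
{
    return (offset + align - 1) & ~(align - 1);
}

/* One block member: base alignment and size in basic machine units. */
typedef struct {
    const char *name;
    size_t      align;
    size_t      size;
} member_t;

/* Members of block B2 (row_major, std430) from the example above. */
static const member_t b2_members[] = {
    { "o", 2,  2 },  /* float16_t                                 */
    { "p", 4,  4 },  /* f16vec2                                   */
    { "q", 8,  6 },  /* f16vec3                                   */
    { "r", 2,  4 },  /* float16_t[2], array stride 2              */
    { "s", 4, 12 },  /* f16mat2x3, three rows with stride 4       */
    { "t", 4, 24 },  /* f16mat2x3[2], array stride 12             */
    { "u", 8, 16 },  /* struct S, 14 units padded to 16           */
    { "v", 8, 32 },  /* S[2], array stride 16                     */
};

/* Store the starting offset of each member in offsets[0..count-1]. */
static void assign_offsets(const member_t *m, size_t count, size_t *offsets)
{
    size_t cursor = 0;  /* first unit not yet used by a member */
    for (size_t i = 0; i < count; ++i) {
        offsets[i] = align_up(cursor, m[i].align);
        cursor = offsets[i] + m[i].size;
    }
}
```

      Calling assign_offsets(b2_members, 8, offsets) yields the starting
      offsets 0, 4, 8, 14, 20, 32, 56 and 72, matching the comments in B2.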

    (5) In the OpenGL ES Shading Language, floating-point data in UBOs and
        SSBOs is always stored as single-precision floating-point, regardless
        of the precision qualifier in the shader. Which format should be used
        for this extension?

      RESOLVED: The format should match the type declared in the shader,
      i.e., if the block member's type is float16_t, the format in the buffer
      is half-precision floating-point, and if the block member's type is
      float, the format is single-precision floating-point. We will provide
      another extension to remain compatible with the ES driver's behavior.
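
      One consequence is that an application holding single-precision data
      on the CPU has to convert it before writing it to a float16_t block
      member. The C sketch below shows one possible conversion to the 16-bit
      encoding (1 sign bit, 5 exponent bits, 10 mantissa bits) with
      round-to-nearest-even. The function name float_to_half is hypothetical
      and is not part of this extension or of the OpenGL API.

```c
#include <stdint.h>
#include <string.h>

/* Convert a single-precision value to a half-precision bit pattern. */
static uint16_t float_to_half(float value)
{
    uint32_t bits;
    memcpy(&bits, &value, sizeof bits);        /* safe type punning */

    uint32_t sign = (bits >> 16) & 0x8000u;
    uint32_t exp  = (bits >> 23) & 0xFFu;
    uint32_t mant = bits & 0x007FFFFFu;

    if (exp == 0xFFu) {
        /* Infinity or NaN; force a mantissa bit so a NaN stays a NaN. */
        return (uint16_t)(sign | 0x7C00u | (mant ? 0x0200u : 0u));
    }

    int32_t e = (int32_t)exp - 127 + 15;       /* re-biased exponent */

    if (e >= 31) {
        return (uint16_t)(sign | 0x7C00u);     /* overflow: infinity */
    }

    if (e <= 0) {
        if (e < -10) {
            return (uint16_t)sign;             /* underflow: signed zero */
        }
        /* Result is a half-precision subnormal. */
        mant |= 0x00800000u;                   /* restore implicit 1 */
        uint32_t shift   = (uint32_t)(14 - e); /* 14..24 */
        uint32_t result  = mant >> shift;
        uint32_t rem     = mant & ((1u << shift) - 1u);
        uint32_t halfway = 1u << (shift - 1u);
        if (rem > halfway || (rem == halfway && (result & 1u))) {
            result++;                          /* round to nearest even */
        }
        return (uint16_t)(sign | result);
    }

    /* Normal number: drop the low 13 mantissa bits with rounding. A carry
     * out of the mantissa correctly increments the exponent. */
    uint32_t result = ((uint32_t)e << 10) | (mant >> 13);
    uint32_t rem    = mant & 0x1FFFu;
    if (rem > 0x1000u || (rem == 0x1000u && (result & 1u))) {
        result++;
    }
    return (uint16_t)(sign | result);
}
```

      The returned 16-bit value can then be copied into the mapped buffer at
      the member's offset, for example at offset 0 for member o of block B2.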


Revision History

    Rev.    Date    Author    Changes
    ----  --------  --------  -----------------------------------------
     5    09/21/16  dwitczak  Fixed minor character encoding issues.

     4    08/01/16  rexu      Correct the example of offset calculation for
                              block members. Add limitation of xfb_offset when
                              this qualifier is applied to block members that
                              have float16_t types.

     3    07/11/16  rexu      Clarify that each component of float16_t types
                              consumes two basic machine units. Remove the
                              interaction with NV_gpu_shader5 in that implicit
                              conversion from int, uint and float types to
                              float16_t types is disallowed now. Add new
                              derivative functions: dFdxFine, dFdyFine,
                              dFdxCoarse, dFdyCoarse, fwidthFine, fwidthCoarse.
                              Add the interaction with AMD_shader_trinary_minmax
                              and AMD_shader_explicit_vertex_parameter. Remove
                              two listed issues that are no longer valid for
                              the updated version of this extension. Remove
                              floatBitsToInt and decide to add it when
                              16-bit integer data type is supported.

     2    07/06/16  rexu      Remove sections that involve half-precision
                              floating-point opaque types. Modify allowed rules
                              of implicit conversion relevant to float16_t
                              types. Add the interaction with
                              ARB_gpu_shader_int64. Remove the modification of
                              the first rule of std140 layout. Provide some
                              examples to demonstrate memory storage layout of
                              uniform blocks and shader storage blocks when
                              they have members of float16_t types.

     1    11/14/13  qlin      Initial revision.