1Name
2
3    AMD_gpu_shader_half_float
4
5Name Strings
6
7    GL_AMD_gpu_shader_half_float
8
9Contact
10
11    Qun Lin, AMD (quentin.lin 'at' amd.com)
12
13Contributors
14
15    Qun Lin, AMD
16    Daniel Rakos, AMD
17    Donglin Wei, AMD
18    Graham Sellers, AMD
19    Rex Xu, AMD
20    Dominik Witczak, AMD
21
22Status
23
24    Shipping.
25
26Version
27
28    Last Modified Date:         09/21/2016
29    Author Revision:            5
30
31Number
32
33    OpenGL Extension #496
34
35Dependencies
36
37    This extension is written against the OpenGL 4.5 (Core Profile)
38    Specification.
39
40    This extension is written against version 4.50 of the OpenGL Shading
41    Language Specification.
42
43    OpenGL 4.0 and GLSL 4.00 are required.
44
45    This extension interacts with ARB_gpu_shader_int64.
46
47    This extension interacts with AMD_shader_trinary_minmax.
48
49    This extension interacts with AMD_shader_explicit_vertex_parameter.
50
51Overview
52
53    This extension was developed based on the NV_gpu_shader5 extension to
54    allow implementations that support half floats in shaders to expose the
55    feature without the additional requirements that are present in
56    NV_gpu_shader5.
57
58    The extension introduces the following features for all shader types:
59
60      * support for half float scalar, vector and matrix data types in shaders;
61
62      * new built-in functions to pack and unpack half float types into a
63        32-bit integer vector;
64
65      * half float support for all existing single float built-in functions,
66        including angle functions, exponential functions, common functions,
67        geometric functions, matrix functions, etc.
68
69    This extension is designed to be a functional superset of the half-precision
70    floating-point support from NV_gpu_shader5 and to keep source code compatible
71    with that, thus the new procedures, functions, and tokens are identical to
72    those found in that extension.
73
74
75New Procedures and Functions
76
77    None.
78
79New Tokens
80
81    Returned by the <type> parameter of GetActiveAttrib, GetActiveUniform, and
82    GetTransformFeedbackVarying:
83
84    (The tokens are identical to those defined in NV_gpu_shader5.)
85
86        FLOAT16_NV                                      0x8FF8
87        FLOAT16_VEC2_NV                                 0x8FF9
88        FLOAT16_VEC3_NV                                 0x8FFA
89        FLOAT16_VEC4_NV                                 0x8FFB
90
91    (New tokens)
92        FLOAT16_MAT2_AMD                                0x91C5
93        FLOAT16_MAT3_AMD                                0x91C6
94        FLOAT16_MAT4_AMD                                0x91C7
95        FLOAT16_MAT2x3_AMD                              0x91C8
96        FLOAT16_MAT2x4_AMD                              0x91C9
97        FLOAT16_MAT3x2_AMD                              0x91CA
98        FLOAT16_MAT3x4_AMD                              0x91CB
99        FLOAT16_MAT4x2_AMD                              0x91CC
100        FLOAT16_MAT4x3_AMD                              0x91CD
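
    As an illustration only, the token values listed above could be consumed
    on the application side as sketched below. The GL_ prefix follows the
    usual header naming convention, and half_type_components is a hypothetical
    helper that is not part of this extension or of OpenGL.

```c
/* Token values copied from the "New Tokens" section of this extension. */
#define GL_FLOAT16_NV          0x8FF8
#define GL_FLOAT16_VEC2_NV     0x8FF9
#define GL_FLOAT16_VEC3_NV     0x8FFA
#define GL_FLOAT16_VEC4_NV     0x8FFB
#define GL_FLOAT16_MAT2_AMD    0x91C5
#define GL_FLOAT16_MAT3_AMD    0x91C6
#define GL_FLOAT16_MAT4_AMD    0x91C7
#define GL_FLOAT16_MAT2x3_AMD  0x91C8
#define GL_FLOAT16_MAT2x4_AMD  0x91C9
#define GL_FLOAT16_MAT3x2_AMD  0x91CA
#define GL_FLOAT16_MAT3x4_AMD  0x91CB
#define GL_FLOAT16_MAT4x2_AMD  0x91CC
#define GL_FLOAT16_MAT4x3_AMD  0x91CD

/* Hypothetical helper: number of float16_t components in a type reported
 * through the <type> parameter. Returns 0 for any other type token. */
static int half_type_components(unsigned int type)
{
    switch (type) {
    case GL_FLOAT16_NV:         return 1;
    case GL_FLOAT16_VEC2_NV:    return 2;
    case GL_FLOAT16_VEC3_NV:    return 3;
    case GL_FLOAT16_VEC4_NV:    return 4;
    case GL_FLOAT16_MAT2_AMD:   return 2 * 2;
    case GL_FLOAT16_MAT3_AMD:   return 3 * 3;
    case GL_FLOAT16_MAT4_AMD:   return 4 * 4;
    case GL_FLOAT16_MAT2x3_AMD: return 2 * 3;  /* 2 columns, 3 rows */
    case GL_FLOAT16_MAT2x4_AMD: return 2 * 4;
    case GL_FLOAT16_MAT3x2_AMD: return 3 * 2;
    case GL_FLOAT16_MAT3x4_AMD: return 3 * 4;
    case GL_FLOAT16_MAT4x2_AMD: return 4 * 2;
    case GL_FLOAT16_MAT4x3_AMD: return 4 * 3;
    default:                    return 0;      /* not a half float type */
    }
}
```

    An application might call such a helper on the type returned by
    GetActiveUniform to size a staging array before loading the uniform.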
101
102
103Additions to Chapter 7 of the OpenGL 4.5 (Core Profile) Specification
104(Program Objects)
105
106    Modify Section 7.3.1, Program Interfaces
107
108    (add to Table 7.3, OpenGL Shading Language type tokens, p. 108)
109
110    +----------------------------+----------------+------+------+------+
111    | Type Name Token            | Keyword        |Attrib| Xfb  |Buffer|
112    +----------------------------+----------------+------+------+------+
113    | FLOAT16_NV                 | float16_t      |  *   |  *   |  *   |
114    | FLOAT16_VEC2_NV            | f16vec2        |  *   |  *   |  *   |
115    | FLOAT16_VEC3_NV            | f16vec3        |  *   |  *   |  *   |
116    | FLOAT16_VEC4_NV            | f16vec4        |  *   |  *   |  *   |
117    | FLOAT16_MAT2_AMD           | f16mat2        |  *   |  *   |  *   |
118    | FLOAT16_MAT3_AMD           | f16mat3        |  *   |  *   |  *   |
119    | FLOAT16_MAT4_AMD           | f16mat4        |  *   |  *   |  *   |
120    | FLOAT16_MAT2x3_AMD         | f16mat2x3      |  *   |  *   |  *   |
121    | FLOAT16_MAT2x4_AMD         | f16mat2x4      |  *   |  *   |  *   |
122    | FLOAT16_MAT3x2_AMD         | f16mat3x2      |  *   |  *   |  *   |
123    | FLOAT16_MAT3x4_AMD         | f16mat3x4      |  *   |  *   |  *   |
124    | FLOAT16_MAT4x2_AMD         | f16mat4x2      |  *   |  *   |  *   |
125    | FLOAT16_MAT4x3_AMD         | f16mat4x3      |  *   |  *   |  *   |
126    +----------------------------+----------------+------+------+------+
127
128
129    Modify Section 7.6.1, Loading Uniform Variables
130
131    (modify the last paragraph on p. 132)
132
133        The Uniform*f{v} commands will load count sets of one to four floating-
134    point values into a uniform defined as a float, a half float, a floating-
135    point vector, a half-precision floating-point vector or an array of either
136    of these types. Floating-point values are converted to half float by the GL
137    for uniforms defined as a half float, a half float vector or an array of
138    those.
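
    The conversion performed by the GL for half float uniforms can be pictured
    with the sketch below. It is an assumption-laden illustration: this
    extension does not state the rounding mode, so round-to-nearest-even is
    assumed here, and float_to_half_bits is a hypothetical name, not a GL
    entry point.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: converts a 32-bit float to 16-bit half float bits.
 * Assumes round-to-nearest-even; the extension text does not specify this. */
static uint16_t float_to_half_bits(float value)
{
    uint32_t f;
    memcpy(&f, &value, sizeof f);          /* reinterpret the float's bits */

    uint16_t sign = (uint16_t)((f >> 16) & 0x8000u);
    uint32_t exp  = (f >> 23) & 0xFFu;
    uint32_t man  = f & 0x007FFFFFu;

    if (exp == 0xFFu)                      /* Inf or NaN */
        return (uint16_t)(sign | 0x7C00u | (man ? 0x0200u : 0u));

    int32_t e = (int32_t)exp - 127 + 15;   /* re-bias the exponent */

    if (e >= 31)                           /* too large: becomes infinity */
        return (uint16_t)(sign | 0x7C00u);

    if (e <= 0) {                          /* result is subnormal or zero */
        if (e < -10)
            return sign;                   /* below half the smallest subnormal */
        man |= 0x00800000u;                /* make the implicit 1 explicit */
        uint32_t shift    = (uint32_t)(14 - e);
        uint32_t half_man = man >> shift;
        uint32_t rem      = man & ((1u << shift) - 1u);
        uint32_t halfway  = 1u << (shift - 1u);
        if (rem > halfway || (rem == halfway && (half_man & 1u)))
            half_man++;                    /* round to nearest, ties to even */
        return (uint16_t)(sign | half_man);
    }

    uint32_t h   = ((uint32_t)e << 10) | (man >> 13);
    uint32_t rem = man & 0x1FFFu;
    if (rem > 0x1000u || (rem == 0x1000u && (h & 1u)))
        h++;                               /* a carry into the exponent is valid */
    return (uint16_t)(sign | h);
}
```

    Values above the largest finite half float (65504) that round up become
    infinity in this sketch; an implementation may behave differently.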
139
140
141    Modify Section 7.6.2.1, Uniform Buffer Object Storage
142
143    (modify the first two bullets of the first paragraph on p. 136)
144
145    * Members of type bool, int, uint, float, float16_t and double are respectively
146      extracted from a buffer object by reading a single uint, int, uint, float,
147      half float or double value at the specified offset.
148
149    * Vectors with N elements with basic data types of bool, int, uint, float,
150      float16_t or double are extracted as N values in consecutive memory locations
151      beginning at the specified offset, with components stored in order with the
152      first (X) component at the lowest offset. The GL data type used for component
153      extraction is derived according to the rules for scalar members above.
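
    The extraction rules above can be sketched on the application side as
    follows. This is a minimal illustration, assuming the buffer contents are
    in host byte order; half_bits_to_float and read_f16vec are hypothetical
    helpers and not part of OpenGL.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: decodes 16-bit half float bits to a 32-bit float. */
static float half_bits_to_float(uint16_t h)
{
    uint32_t sign = (uint32_t)(h & 0x8000u) << 16;
    uint32_t exp  = (h >> 10) & 0x1Fu;
    uint32_t man  = h & 0x03FFu;
    uint32_t f;

    if (exp == 0u) {
        if (man == 0u) {
            f = sign;                              /* signed zero */
        } else {
            uint32_t e = 0u;                       /* normalize a subnormal */
            man <<= 1;
            while (!(man & 0x0400u)) {
                man <<= 1;
                e++;
            }
            man &= 0x03FFu;
            f = sign | ((112u - e) << 23) | (man << 13);
        }
    } else if (exp == 0x1Fu) {
        f = sign | 0x7F800000u | (man << 13);      /* Inf or NaN */
    } else {
        f = sign | ((exp + 112u) << 23) | (man << 13);
    }

    float out;
    memcpy(&out, &f, sizeof out);
    return out;
}

/* Hypothetical helper: reads an n-component half float vector stored at
 * byte `offset`, components in consecutive locations, X at lowest offset. */
static void read_f16vec(const unsigned char *buffer, size_t offset,
                        int n, float *out)
{
    for (int i = 0; i < n; ++i) {
        uint16_t bits;
        memcpy(&bits, buffer + offset + (size_t)i * sizeof bits, sizeof bits);
        out[i] = half_bits_to_float(bits);
    }
}
```

    Each component occupies two bytes, so an f16vec3 read this way spans six
    consecutive bytes starting at the member's offset.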
154
155
156Additions to Chapter 11 of the OpenGL 4.5 (Core Profile) Specification
157(Programmable Vertex Processing)
158
159    Modify Section 11.1.1, Vertex Attributes
160
161    (modify Table 11.2, Generic attributes and vector types used by column vectors of
162    matrix variables bound to generic attribute index i. p. 366)
163
164    +------------------------------+-------------------------+-----------------------+
165    |          Data type           |   Column vector type    |  Generic attributes   |
166    |                              |                         |         used          |
167    +------------------------------+-------------------------+-----------------------+
168    | mat2, dmat2, f16mat2         | two-component vector    | i, i + 1              |
169    | mat2x3, dmat2x3, f16mat2x3   | three-component vector  | i, i + 1              |
170    | mat2x4, dmat2x4, f16mat2x4   | four-component vector   | i, i + 1              |
171    | mat3x2, dmat3x2, f16mat3x2   | two-component vector    | i, i + 1, i + 2       |
172    | mat3, dmat3, f16mat3         | three-component vector  | i, i + 1, i + 2       |
173    | mat3x4, dmat3x4, f16mat3x4   | four-component vector   | i, i + 1, i + 2       |
174    | mat4x2, dmat4x2, f16mat4x2   | two-component vector    | i, i + 1, i + 2, i + 3|
175    | mat4x3, dmat4x3, f16mat4x3   | three-component vector  | i, i + 1, i + 2, i + 3|
176    | mat4, dmat4, f16mat4         | four-component vector   | i, i + 1, i + 2, i + 3|
177    +------------------------------+-------------------------+-----------------------+
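
    Read row by row, the table says that a matrix attribute bound at generic
    attribute index i uses one generic attribute per column, and each of those
    attributes carries as many components as the matrix has rows. The sketch
    below restates that rule; the type and function names are hypothetical
    and do not exist in OpenGL.

```c
/* Hypothetical description of the generic attributes used by a matrix. */
typedef struct {
    int first_attrib;           /* generic attribute index i of column 0 */
    int attrib_count;           /* one generic attribute per column */
    int components_per_attrib;  /* equals the number of rows */
} f16mat_attrib_layout;

/* Hypothetical helper: layout of an f16matCxR (C columns, R rows) bound at
 * generic attribute index base_index, following Table 11.2. */
static f16mat_attrib_layout f16mat_layout(int base_index, int columns, int rows)
{
    f16mat_attrib_layout layout;
    layout.first_attrib = base_index;
    layout.attrib_count = columns;
    layout.components_per_attrib = rows;
    return layout;
}
```

    For example, an f16mat3x4 bound at index 2 would use attributes 2, 3 and
    4, each holding a four-component vector.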
178
179    (modify Table 11.3: Scalar and vector vertex attribute types and VertexAttrib*
180    commands used to set the values of the corresponding generic attributes. p. 366)
181
182    +-------------------+--------------------------+
183    |   Data type       |         Command          |
184    +-------------------+--------------------------+
185    | float, float16_t  | VertexAttrib1*           |
186    | vec2, f16vec2     | VertexAttrib2*           |
187    | vec3, f16vec3     | VertexAttrib3*           |
188    | vec4, f16vec4     | VertexAttrib4*           |
189    +-------------------+--------------------------+
190
191
192    Modify Section 11.1.2.1, Output Variables
193
194    (modify the last paragraph on p. 374)
195
196    ..., each component of outputs declared as half-precision floating-point
197    scalars, vectors, or matrices is considered to consume two basic machine
198    units, and each component of any other type ...
199
200
201Modifications to the OpenGL Shading Language Specification, Version 4.50
202
203    Including the following line in a shader can be used to control the
204    language features described in this extension:
205
206      #extension GL_AMD_gpu_shader_half_float : <behavior>
207
208    where <behavior> is as specified in section 3.3.
209
210    New preprocessor #defines are added to the OpenGL Shading Language:
211
212      #define GL_AMD_gpu_shader_half_float       1
213
214
215Additions to Chapter 3 of the OpenGL Shading Language Specification (Basics)
216
217
218    Modify Section 3.6, Keywords
219
220    (add the following to the list of reserved keywords at p. 18)
221
222    float16_t f16vec2 f16vec3 f16vec4
223    f16mat2  f16mat3  f16mat4
224    f16mat2x2 f16mat2x3 f16mat2x4
225    f16mat3x2 f16mat3x3 f16mat3x4
226    f16mat4x2 f16mat4x3 f16mat4x4
227
228
229Additions to Chapter 4 of the OpenGL Shading Language Specification
230(Variables and Types)
231
232
233    Modify Section 4.1, Basic Types
234
235    (add to the basic "Transparent Types" table, p. 23)
236
237    +-----------+------------------------------------------------------------+
238    | Type      | Meaning                                                    |
239    +-----------+------------------------------------------------------------+
240    | float16_t | a half-precision floating-point scalar                     |
241    | f16vec2   | a two-component half-precision floating-point vector       |
242    | f16vec3   | a three-component half-precision floating-point vector     |
243    | f16vec4   | a four-component half-precision floating-point vector      |
244    | f16mat2   | a 2x2 half-precision floating-point matrix                 |
245    | f16mat3   | a 3x3 half-precision floating-point matrix                 |
246    | f16mat4   | a 4x4 half-precision floating-point matrix                 |
247    | f16mat2x2 | same as a f16mat2                                          |
248    | f16mat2x3 | a half-precision floating-point matrix with 2 columns and  |
249    |           | 3 rows                                                     |
250    | f16mat2x4 | a half-precision floating-point matrix with 2 columns and  |
251    |           | 4 rows                                                     |
252    | f16mat3x2 | a half-precision floating-point matrix with 3 columns and  |
253    |           | 2 rows                                                     |
254    | f16mat3x3 | same as a f16mat3                                          |
255    | f16mat3x4 | a half-precision floating-point matrix with 3 columns and  |
256    |           | 4 rows                                                     |
257    | f16mat4x2 | a half-precision floating-point matrix with 4 columns and  |
258    |           | 2 rows                                                     |
259    | f16mat4x3 | a half-precision floating-point matrix with 4 columns and  |
260    |           | 3 rows                                                     |
261    | f16mat4x4 | same as a f16mat4                                          |
262    +-----------+------------------------------------------------------------+
263
264
265    Modify Section 4.1.4, Floating-Point Variables
266
267    (replace first paragraph of the section, p. 29)
268
269    Single-precision, double-precision and half-precision floating point variables
270    are available for use in a variety of scalar calculations. Generally, the term
271    floating-point will refer to all single-, double- and half-precision floating
272    point. Floating-point variables are defined as in the following examples:
273
274        float a, b = 1.5;       // single-precision floating-point
275        double c, d = 2.0LF;    // double-precision floating-point
276        float16_t e, f = 3.0HF; // half-precision floating-point
277
278    As an input value to one of the processing units, a single-precision, double-
279    precision or half-precision floating-point variable is expected to match the
280    corresponding IEEE 754 floating-point definition in terms of precision and
281    dynamic range.
282
283    (modify grammar rule for "floating-suffix", p. 30)
284
285      floating-suffix: one of
286        f F lf LF hf HF
287
288    (modify the fourth sentence of second paragraph on p. 30)
289
290    When the suffix "lf" or "LF" is present, the literal has type double. When the
291    suffix "hf" or "HF" is present, the literal has type float16_t. Otherwise, the
292    literal has type float.
293
294
295    Modify Section 4.1.6, Matrices
296
297    (modify the second sentence in the section, p. 30)
298
299    Matrix types beginning with "mat" have single-precision components, matrix
300    types beginning with "dmat" have double-precision components and matrix types
301    beginning with "f16mat" have half-precision components.
302
303
304    Modify Section 4.1.10, Implicit Conversions
305
306    (modify the implicit conversion table on p. 37)
307
308    +-----------------------+-------------------------------------------------+
309    | Type of expression    | Can be implicitly converted to                  |
310    +-----------------------+-------------------------------------------------+
311    | int, uint, float16_t  | float                                           |
312    | ivec2, uvec2, f16vec2 | vec2                                            |
313    | ivec3, uvec3, f16vec3 | vec3                                            |
314    | ivec4, uvec4, f16vec4 | vec4                                            |
315    | f16mat2               | mat2                                            |
316    | f16mat3               | mat3                                            |
317    | f16mat4               | mat4                                            |
318    | f16mat2x3             | mat2x3                                          |
319    | f16mat2x4             | mat2x4                                          |
320    | f16mat3x2             | mat3x2                                          |
321    | f16mat3x4             | mat3x4                                          |
322    | f16mat4x2             | mat4x2                                          |
323    | f16mat4x3             | mat4x3                                          |
324    | int, uint,            | double                                          |
325    | float, float16_t      |                                                 |
326    | ivec2, uvec2,         | dvec2                                           |
327    | vec2, f16vec2         |                                                 |
328    | ivec3, uvec3,         | dvec3                                           |
329    | vec3, f16vec3         |                                                 |
330    | ivec4, uvec4,         | dvec4                                           |
331    | vec4, f16vec4         |                                                 |
332    | mat2, f16mat2         | dmat2                                           |
333    | mat3, f16mat3         | dmat3                                           |
334    | mat4, f16mat4         | dmat4                                           |
335    | mat2x3, f16mat2x3     | dmat2x3                                         |
336    | mat2x4, f16mat2x4     | dmat2x4                                         |
337    | mat3x2, f16mat3x2     | dmat3x2                                         |
338    | mat3x4, f16mat3x4     | dmat3x4                                         |
339    | mat4x2, f16mat4x2     | dmat4x2                                         |
340    | mat4x3, f16mat4x3     | dmat4x3                                         |
341    +-----------------------+-------------------------------------------------+
342
343
344    Modify Section 4.4.2.1 Transform Feedback Layout Qualifiers
345
346    (insert after the fourth paragraph in the section on p. 70)
347
348    ... will be a multiple of 8; if applied to an aggregate containing a
349    float16_t, the offset must also be a multiple of 2, and the space taken in
350    the buffer will be a multiple of 2.
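
    The alignment rule quoted above can be checked with a sketch like the one
    below. It models only the two constraints visible in this excerpt (a
    multiple of 8 for aggregates containing a double, a multiple of 2 for
    aggregates containing a float16_t); the other xfb_offset rules of the
    Shading Language Specification are not modelled, and the function name is
    hypothetical.

```c
/* Hypothetical check of an xfb_offset value against the rules quoted in
 * this section. Returns 1 if the offset satisfies them, 0 otherwise. */
static int xfb_offset_satisfies_excerpt(unsigned int offset,
                                        int contains_double,
                                        int contains_float16)
{
    if (contains_double && (offset % 8u) != 0u)
        return 0;   /* aggregates containing a double need a multiple of 8 */
    if (contains_float16 && (offset % 2u) != 0u)
        return 0;   /* aggregates containing a float16_t need a multiple of 2 */
    return 1;
}
```

    An aggregate that contains both a double and a float16_t has to satisfy
    both constraints, so the stricter multiple of 8 governs in that case.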
351
352
353    Modify Section 4.7.1 Range and Precision
354
355    (insert after the first paragraph in the section on p. 85)
356
357    ... and positive and negative zeros. The precision of stored half-
358    precision floating-point variables is described in section 2.3.3.2 "16-Bit
359    Floating-Point Numbers" of the OpenGL Specification.
360
361    The following rules apply to all floating-point operations, including single-,
362    double- and half-precision operations:...
363
364
365Additions to Chapter 5 of the OpenGL Shading Language Specification
366(Operators and Expressions)
367
368
369    Modify Section 5.4.1, Conversion and Scalar Constructors
370
371    (add after the first list of constructor examples on p. 97)
372
373      int(float16_t)    // convert a float16_t value to a signed integer
374      uint(float16_t)   // convert a float16_t value to an unsigned integer
375      bool(float16_t)   // convert a float16_t value to a Boolean
376      float(float16_t)  // convert a float16_t value to a float value
377      double(float16_t) // convert a float16_t value to a double value
378      float16_t(bool)   // convert a Boolean to a float16_t value
379      float16_t(int)    // convert a signed integer to a float16_t value
380      float16_t(uint)   // convert an unsigned integer to a float16_t value
381      float16_t(float)  // convert a float value to a float16_t value
382      float16_t(double) // convert a double value to a float16_t value
383
384    (modify the first sentence of last paragraph on p. 98)
385
386    ... other arguments.
387    If the basic type (bool, int, float, double, or float16_t) of a parameter to
388    a constructor does not match the basic type of the object being constructed,
389    the scalar construction rules (above) are used to convert the parameters.
390
391
392Additions to Chapter 6 of the OpenGL Shading Language Specification
393(Statements and Structure)
394
395
396    Modify Section 6.1, Function Definitions
397
398    (replace the second rule in third paragraph on p. 113)
399
400      2. A match involving a conversion from a signed integer, unsigned
401         integer, or floating-point type to a similar type having a larger
402         number of bits is better than a match involving any other implicit
403         conversion.
404
405Additions to Chapter 8 of the OpenGL Shading Language Specification
406(Built-in Functions)
407
408    (insert after the sixth sentence of last paragraph on p. 140)
409
410    ... genDType is used as the argument. Where the input arguments (and
411    corresponding output) can be float16_t, f16vec2, f16vec3, f16vec4,
412    genF16Type is used as the argument.
413
414
415    Modify Section 8.1, Angle and Trigonometry Functions
416
417    (add to the table of Angle and Trigonometry Functions on p. 141)
418
419    +------------------------------------------------+----------------------------------------------------+
420    | Syntax                                         | Description                                        |
421    +------------------------------------------------+----------------------------------------------------+
422    | genF16Type radians (genF16Type degrees)        | Converts degrees to radians, i.e., PI/180 *        |
423    |                                                | degrees.                                           |
424    +------------------------------------------------+----------------------------------------------------+
425    | genF16Type degrees (genF16Type radians)        | Converts radians to degrees, i.e., 180/PI *        |
426    |                                                | radians.                                           |
427    +------------------------------------------------+----------------------------------------------------+
428    | genF16Type sin (genF16Type angle)              | The standard trigonometric sine function.          |
429    +------------------------------------------------+----------------------------------------------------+
430    | genF16Type cos (genF16Type angle)              | The standard trigonometric cosine function.        |
431    +------------------------------------------------+----------------------------------------------------+
432    | genF16Type tan (genF16Type angle)              | The standard trigonometric tangent.                |
433    +------------------------------------------------+----------------------------------------------------+
434    | genF16Type asin (genF16Type x)                 | Arc sine. Returns an angle whose sine is x. The    |
435    |                                                | range of values returned by this function is       |
436    |                                                | [-PI/2, PI/2]. Results are undefined if |x| > 1.   |
437    +------------------------------------------------+----------------------------------------------------+
438    | genF16Type acos (genF16Type x)                 | Arc cosine. Returns an angle whose cosine is x. The|
439    |                                                | range of values returned by this function is       |
440    |                                                | [0, PI]. Results are undefined if |x| > 1.         |
441    +------------------------------------------------+----------------------------------------------------+
442    | genF16Type atan (genF16Type y, genF16Type x)   | Arc tangent. Returns an angle whose tangent is y/x.|
443    |                                                | The signs of x and y are used to determine what    |
444    |                                                | quadrant the angle is in. The range of values      |
445    |                                                | returned by this function is [-PI,PI]. Results are |
446    |                                                | undefined if x and y are both 0.                   |
447    +------------------------------------------------+----------------------------------------------------+
448    | genF16Type atan (genF16Type y_over_x)          | Arc tangent. Returns an angle whose tangent is     |
449    |                                                | y_over_x. The range of values returned by this     |
450    |                                                | function is [-PI/2, PI/2].                         |
451    +------------------------------------------------+----------------------------------------------------+
452    | genF16Type sinh (genF16Type x)                 | Returns the hyperbolic sine function               |
453    |                                                | (e^x - e^-x) / 2.                                  |
454    +------------------------------------------------+----------------------------------------------------+
455    | genF16Type cosh (genF16Type x)                 | Returns the hyperbolic cosine function             |
456    |                                                | (e^x + e^-x) / 2.                                  |
457    +------------------------------------------------+----------------------------------------------------+
458    | genF16Type tanh (genF16Type x)                 | Returns the hyperbolic tangent function            |
459    |                                                | sinh(x) / cosh(x).                                 |
460    +------------------------------------------------+----------------------------------------------------+
461    | genF16Type asinh (genF16Type x)                | Arc hyperbolic sine; returns the inverse of sinh.  |
462    +------------------------------------------------+----------------------------------------------------+
463    | genF16Type acosh (genF16Type x)                | Arc hyperbolic cosine; returns the non-negative    |
464    |                                                | inverse of cosh. Results are undefined if x < 1.   |
465    +------------------------------------------------+----------------------------------------------------+
466    | genF16Type atanh (genF16Type x)                | Arc hyperbolic tangent; returns the inverse of     |
467    |                                                | tanh. Results are undefined if |x| >= 1.           |
468    +------------------------------------------------+----------------------------------------------------+
469
470
471    Modify Section 8.2, Exponential Functions
472
473    (add to the table of Exponential Functions on p. 143)
474
475    +------------------------------------------------+----------------------------------------------------+
476    | Syntax                                         | Description                                        |
477    +------------------------------------------------+----------------------------------------------------+
478    | genF16Type pow (genF16Type x, genF16Type y)    | Returns x raised to the y power, i.e., x^y.        |
479    |                                                | Results are undefined if x < 0.                    |
480    |                                                | Results are undefined if x = 0 and y <= 0.         |
481    +------------------------------------------------+----------------------------------------------------+
482    | genF16Type exp (genF16Type x)                  | Returns the natural exponentiation of x, i.e., e^x.|
483    +------------------------------------------------+----------------------------------------------------+
484    | genF16Type log (genF16Type x)                  | Returns the natural logarithm of x, i.e., returns  |
485    |                                                | the value y which satisfies the equation x = e^y.  |
486    |                                                | Results are undefined if x <= 0.                   |
487    +------------------------------------------------+----------------------------------------------------+
488    | genF16Type exp2 (genF16Type x)                 | Returns 2 raised to the x power, i.e., 2^x.        |
489    +------------------------------------------------+----------------------------------------------------+
490    | genF16Type log2 (genF16Type x)                 | Returns the base 2 logarithm of x, i.e., returns   |
491    |                                                | the value y which satisfies the equation x = 2^y.  |
492    |                                                | Results are undefined if x <= 0.                   |
493    +------------------------------------------------+----------------------------------------------------+
494    | genF16Type sqrt (genF16Type x)                 | Returns sqrt(x). Results are undefined if x < 0.   |
495    +------------------------------------------------+----------------------------------------------------+
496    | genF16Type inversesqrt (genF16Type x)          | Returns 1 / sqrt(x). Results are undefined if      |
497    |                                                | x <= 0.                                            |
498    +------------------------------------------------+----------------------------------------------------+
499
500
501    Modify Section 8.3, Common Functions
502
503    (add to the table of common functions on p. 144)
504
505    +------------------------------------------------+----------------------------------------------------+
506    | Syntax                                         | Description                                        |
507    +------------------------------------------------+----------------------------------------------------+
508    | genF16Type abs(genF16Type x)                   | Returns x if x >= 0; otherwise it returns -x.      |
509    +------------------------------------------------+----------------------------------------------------+
510    | genF16Type sign(genF16Type x)                  | Returns 1.0 if x > 0, 0.0 if x = 0, or -1.0 if x < |
511    |                                                | 0.                                                 |
512    +------------------------------------------------+----------------------------------------------------+
513    | genF16Type floor (genF16Type x)                | Returns a value equal to the nearest integer that  |
514    |                                                | is less than or equal to x.                        |
515    +------------------------------------------------+----------------------------------------------------+
516    | genF16Type trunc (genF16Type x)                | Returns a value equal to the nearest integer to x  |
517    |                                                | whose absolute value is not larger than the        |
518    |                                                | absolute value of x.                               |
519    +------------------------------------------------+----------------------------------------------------+
520    | genF16Type round (genF16Type x)                | Returns a value equal to the nearest integer to x. |
521    |                                                | The fraction 0.5 will round in a direction chosen  |
522    |                                                | by the implementation, presumably the direction    |
523    |                                                | that is fastest. This includes the possibility     |
524    |                                                | that round(x) returns the same value as            |
525    |                                                | roundEven(x) for all values of x.                  |
526    +------------------------------------------------+----------------------------------------------------+
527    | genF16Type roundEven (genF16Type x)            | Returns a value equal to the nearest integer to x. |
528    |                                                | A fractional part of 0.5 will round toward the     |
529    |                                                | nearest even integer. (Both 3.5 and 4.5 for x will |
530    |                                                | return 4.0.)                                       |
531    +------------------------------------------------+----------------------------------------------------+
532    | genF16Type ceil (genF16Type x)                 | Returns a value equal to the nearest integer that  |
533    |                                                | is greater than or equal to x.                     |
534    +------------------------------------------------+----------------------------------------------------+
535    | genF16Type fract (genF16Type x)                | Returns x - floor(x).                              |
536    +------------------------------------------------+----------------------------------------------------+
537    | genF16Type mod (genF16Type x, float16_t y)     | Modulus. Returns x - y * floor(x/y).               |
538    | genF16Type mod (genF16Type x, genF16Type y)    |                                                    |
539    +------------------------------------------------+----------------------------------------------------+
540    | genF16Type modf(genF16Type x, out genF16Type i)| Returns the fractional part of x and sets i to the |
541    |                                                | integer part (as a whole number floating-point     |
542    |                                                | value). Both the return value and the output       |
543    |                                                | parameter will have the same sign as x.            |
544    +------------------------------------------------+----------------------------------------------------+
545    | genF16Type min(genF16Type x,                   | Returns y if y < x; otherwise it returns x.        |
546    |                genF16Type y)                   |                                                    |
547    | genF16Type min(genF16Type x,                   |                                                    |
548    |                float16_t y)                    |                                                    |
549    +------------------------------------------------+----------------------------------------------------+
550    | genF16Type max(genF16Type x,                   | Returns y if x < y; otherwise it returns x.        |
551    |                genF16Type y)                   |                                                    |
552    | genF16Type max(genF16Type x,                   |                                                    |
553    |                float16_t y)                    |                                                    |
554    +------------------------------------------------+----------------------------------------------------+
555    | genF16Type clamp(genF16Type x,                 | Returns min(max(x, minVal), maxVal).               |
556    |                  genF16Type minVal,            |                                                    |
557    |                  genF16Type maxVal)            | Results are undefined if minVal > maxVal.          |
558    | genF16Type clamp(genF16Type x,                 |                                                    |
559    |                  float16_t minVal,             |                                                    |
560    |                  float16_t maxVal)             |                                                    |
561    +------------------------------------------------+----------------------------------------------------+
562    | genF16Type mix(genF16Type x,                   | Returns the linear blend of x and y, i.e.,         |
563    |                genF16Type y,                   | x * (1 - a) + y * a, when a is floating-point.     |
564    |                genF16Type a)                   | When a is Boolean, selects which vector each       |
565    | genF16Type mix(genF16Type x,                   | returned component comes from. For a component of  |
566    |                genF16Type y,                   | a that is false, the corresponding component of x  |
567    |                float16_t a)                    | is returned. For a component of a that is true,    |
568    | genF16Type mix(genF16Type x,                   | the corresponding component of y is returned.      |
569    |                genF16Type y,                   |                                                    |
570    |                genBType a)                     |                                                    |
571    +------------------------------------------------+----------------------------------------------------+
572    | genF16Type step (genF16Type edge, genF16Type x)| Returns 0.0 if x < edge; otherwise it returns 1.0. |
573    | genF16Type step (float16_t edge, genF16Type x) |                                                    |
574    +------------------------------------------------+----------------------------------------------------+
575    | genF16Type smoothstep (genF16Type edge0,       | Returns 0.0 if x <= edge0 and 1.0 if x >= edge1    |
576    |                        genF16Type edge1,       | and performs smooth Hermite interpolation between 0|
577    |                        genF16Type x)           | and 1 when edge0 < x < edge1. This is useful in    |
578    | genF16Type smoothstep (float16_t edge0,        | cases where you would want a threshold function    |
579    |                        float16_t edge1,        | with a smooth transition. This is equivalent to:   |
580    |                        genF16Type x)           |    genF16Type t;                                   |
581    |                                                |    t = clamp((x - edge0) / (edge1 - edge0), 0, 1); |
582    |                                                |    return t * t * (3 - 2 * t);                     |
583    |                                                |    Results are undefined if edge0 >= edge1.        |
584    +------------------------------------------------+----------------------------------------------------+
585    | genBType isnan (genF16Type x)                  | Returns true if x holds a NaN. Returns false       |
586    |                                                | otherwise. Always returns false if NaNs are not    |
587    |                                                | implemented.                                       |
588    +------------------------------------------------+----------------------------------------------------+
589    | genBType isinf (genF16Type x)                  | Returns true if x holds a positive infinity or     |
590    |                                                | negative infinity. Returns false otherwise.        |
591    +------------------------------------------------+----------------------------------------------------+
592    | genF16Type fma (genF16Type a, genF16Type b,    | Computes and returns a * b + c.                    |
593    |                 genF16Type c)                  |                                                    |
594    +------------------------------------------------+----------------------------------------------------+
595    | genF16Type frexp (genF16Type x,                | Splits x into a floating-point significand in the  |
596    |                   out genIType exp)            | range [0.5, 1.0) and an integral exponent of two,  |
597    |                                                | such that:                                         |
598    |                                                |    x = significand * 2^exp                         |
599    |                                                | The significand is returned by the function and the|
600    |                                                | exponent is returned in the parameter exp. For a   |
601    |                                                | floating-point value of zero, the significand and  |
602    |                                                | exponent are both zero. For a floating-point value |
603    |                                                | that is an infinity or is not a number, the results|
604    |                                                | are undefined.                                     |
605    +------------------------------------------------+----------------------------------------------------+
606    | genF16Type ldexp (genF16Type x,                | Builds a floating-point number from x and the      |
607    |                   in genIType exp)             | corresponding integral exponent of two in exp,     |
608    |                                                | returning:                                         |
609    |                                                |    x* 2^exp                                        |
610    |                                                | If this product is too large to be represented in  |
611    |                                                | the floating-point type, the result is undefined.  |
612    +------------------------------------------------+----------------------------------------------------+
613
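    Non-normative example. The sketch below shows how a few of the
    half-precision overloads listed above might be called from a fragment
    shader. The helper names (softThreshold, roundTrip) and the interface
    variables (inColor, fragColor) are hypothetical and are not defined by
    this extension. Explicit constructors and "hf" literals are used so the
    sketch does not rely on any implicit conversion.

```glsl
#version 450
#extension GL_AMD_gpu_shader_half_float : require

layout(location = 0) in  vec3 inColor;    // hypothetical input
layout(location = 0) out vec4 fragColor;  // hypothetical output

// Uses the smoothstep(float16_t, float16_t, genF16Type) overload.
// Results are undefined if edge0 >= edge1.
f16vec3 softThreshold(f16vec3 v, float16_t edge0, float16_t edge1)
{
    return smoothstep(edge0, edge1, v);
}

// Splits x into significand and exponent, then rebuilds it.
// For finite x this is expected to return x again.
float16_t roundTrip(float16_t x)
{
    int e;
    float16_t m = frexp(x, e);   // x = m * 2^e, m in [0.5, 1.0) or zero
    return ldexp(m, e);
}

void main()
{
    f16vec3 c = softThreshold(f16vec3(inColor), 0.25hf, 0.75hf);
    c.r = roundTrip(c.r);
    fragColor = vec4(vec3(c), 1.0);
}
```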
614
615    Modify Section 8.4, Floating-Point Pack and Unpack Functions
616
617    (add to the table of pack and unpack functions on p. 149)
618
619    +-----------------------------------+------------------------------------------------------+
620    | Syntax                            | Description                                          |
621    +-----------------------------------+------------------------------------------------------+
622    | uint packFloat2x16(f16vec2 v)     | Returns an unsigned 32-bit integer obtained by       |
623    |                                   | packing the components of a two-component half-      |
624    |                                   | precision floating-point vector, respectively. The   |
625    |                                   | first vector component specifies the 16 least        |
626    |                                   | significant bits; the second component specifies the |
627    |                                   | 16 most significant bits.                            |
628    +-----------------------------------+------------------------------------------------------+
629    | f16vec2 unpackFloat2x16(uint v)   | Returns a two-component half-precision floating-point|
630    |                                   | vector built from a 32-bit unsigned integer scalar,  |
631    |                                   | respectively. The first component of the vector      |
632    |                                   | contains the 16 least significant bits of the input; |
633    |                                   | the second component contains the 16 most            |
634    |                                   | significant bits.                                    |
635    +-----------------------------------+------------------------------------------------------+
636
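    Non-normative example. A minimal compute-shader sketch of the pack and
    unpack functions, assuming a hypothetical storage buffer (Packed, data)
    that holds two half-precision values per 32-bit word. The buffer layout,
    binding and work-group size are illustrative assumptions only.

```glsl
#version 450
#extension GL_AMD_gpu_shader_half_float : require

layout(local_size_x = 64) in;

// Hypothetical buffer: each uint stores two half-precision values.
layout(std430, binding = 0) buffer Packed
{
    uint data[];
};

void main()
{
    uint i = gl_GlobalInvocationID.x;
    if (i >= uint(data.length()))
        return;

    // v.x comes from the 16 least significant bits,
    // v.y from the 16 most significant bits.
    f16vec2 v = unpackFloat2x16(data[i]);

    v *= 2.0hf;                  // scale both halves

    // Repack: v.x goes to the low 16 bits, v.y to the high 16 bits.
    data[i] = packFloat2x16(v);
}
```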
637
638    Modify Section 8.5 Geometric Functions
639
640    (add to table of geometric functions on p.152)
641
642    +-------------------------------------------+-----------------------------------------------+
643    | Syntax                                    | Description                                   |
644    +-------------------------------------------+-----------------------------------------------+
645    | float16_t length (genF16Type x)           | Returns the length of vector x, i.e.,         |
646    |                                           | sqrt(x[0]*x[0] + x[1]*x[1] + ...)             |
647    +-------------------------------------------+-----------------------------------------------+
648    | float16_t distance (genF16Type p0,        | Returns the distance between p0 and p1, i.e., |
649    |                     genF16Type p1)        | length (p0 - p1)                              |
650    +-------------------------------------------+-----------------------------------------------+
651    | float16_t dot (genF16Type x, genF16Type y)| Returns the dot product of x and y, i.e.,     |
652    |                                           | x[0]*y[0] + x[1]*y[1] + ...                   |
653    +-------------------------------------------+-----------------------------------------------+
654    | f16vec3 cross (f16vec3 x, f16vec3 y)      | Returns the cross product of x and y, i.e.,   |
655    |                                           | |x[1] * y[2] - y[1] * x[2]|                   |
656    |                                           | |x[2] * y[0] - y[2] * x[0]|                   |
657    |                                           | |x[0] * y[1] - y[0] * x[1]|                   |
658    +-------------------------------------------+-----------------------------------------------+
659    | genF16Type normalize (genF16Type x)       | Returns a vector in the same direction as x   |
660    |                                           | but with a length of 1.                       |
661    +-------------------------------------------+-----------------------------------------------+
662    | genF16Type faceforward (genF16Type N,     | If dot(Nref, I) < 0 return N, otherwise return|
663    |                         genF16Type I,     | -N.                                           |
664    |                         genF16Type Nref)  |                                               |
665    +-------------------------------------------+-----------------------------------------------+
666    | genF16Type reflect (genF16Type I,         | For the incident vector I and surface         |
667    |                     genF16Type N)         | orientation N, returns the reflection         |
668    |                                           | direction:                                    |
669    |                                           |    I - 2 * dot(N, I) * N                      |
670    |                                           | N must already be normalized in order to      |
671    |                                           | achieve the desired result.                   |
672    +-------------------------------------------+-----------------------------------------------+
673    | genF16Type refract (genF16Type I,         | For the incident vector I and surface normal  |
674    |                     genF16Type N,         | N, and the ratio of indices of refraction eta,|
675    |                     float16_t eta)        | return the refraction vector. The result is   |
676    |                                           | computed by                                   |
677    |                                           |    k = 1.0 - eta * eta * (1.0 - dot(N, I) *   |
678    |                                           |                dot(N, I))                     |
679    |                                           |    if (k < 0.0)                               |
680    |                                           |        return genF16Type(0.0)                 |
681    |                                           |    else                                       |
682    |                                           |        return eta * I - (eta * dot(N, I)      |
683    |                                           |                          + sqrt(k)) * N       |
684    |                                           | The input parameters for the incident vector  |
685    |                                           | I and the surface normal N must already be    |
686    |                                           | normalized to get the desired results.        |
687    +-------------------------------------------+-----------------------------------------------+
688
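    Non-normative example. The following sketch combines the half-precision
    normalize, refract, reflect and dot overloads. The helper name (bend) and
    the interface variables are hypothetical. It relies on refract returning
    a zero vector when k < 0.0, as described in the table above.

```glsl
#version 450
#extension GL_AMD_gpu_shader_half_float : require

layout(location = 0) in  vec3 inIncident;  // hypothetical input
layout(location = 1) in  vec3 inNormal;    // hypothetical input
layout(location = 0) out vec4 fragColor;   // hypothetical output

// Refracts I about N, falling back to reflection when refract()
// reports total internal reflection by returning a zero vector.
f16vec3 bend(f16vec3 I, f16vec3 N, float16_t eta)
{
    f16vec3 i = normalize(I);    // both inputs must be normalized
    f16vec3 n = normalize(N);
    f16vec3 t = refract(i, n, eta);
    return (dot(t, t) == 0.0hf) ? reflect(i, n) : t;
}

void main()
{
    f16vec3 dir = bend(f16vec3(inIncident), f16vec3(inNormal), 0.75hf);
    fragColor = vec4(vec3(dir) * 0.5 + 0.5, 1.0);
}
```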
689
690    Modify Section 8.6, Matrix Functions
691
692    (modify the first paragraph of the section on p. 154)
693
694    ..., there is a single-precision floating-point version, where all
695    arguments and return values are single precision, a double-precision
696    floating-point version, where all arguments and return values are double
697    precision, and a half-precision floating-point version, where all
698    arguments and return values are half precision.
699
700
701    Modify Section 8.7, Vector Relational Functions
702
703    (add to the table of placeholders at the top of p. 156)
704
705    +-------------+-----------------------------+
706    | Placeholder | Specific Types Allowed      |
707    +-------------+-----------------------------+
708    | f16vec      | f16vec2, f16vec3, f16vec4   |
709    +-------------+-----------------------------+
710
711    (add to the table of vector relational functions at the bottom of p. 156)
712
713    +-------------------------------------------+-----------------------------------------------+
714    | Syntax                                    | Description                                   |
715    +-------------------------------------------+-----------------------------------------------+
716    | bvec lessThan(f16vec x, f16vec y)         | Returns the component-wise compare of x < y.  |
717    +-------------------------------------------+-----------------------------------------------+
718    | bvec lessThanEqual(f16vec x, f16vec y)    | Returns the component-wise compare of x <= y. |
719    +-------------------------------------------+-----------------------------------------------+
720    | bvec greaterThan(f16vec x, f16vec y)      | Returns the component-wise compare of x > y.  |
721    +-------------------------------------------+-----------------------------------------------+
722    | bvec greaterThanEqual(f16vec x, f16vec y) | Returns the component-wise compare of x >= y. |
723    +-------------------------------------------+-----------------------------------------------+
724    | bvec equal(f16vec x, f16vec y)            | Returns the component-wise compare of x == y. |
725    +-------------------------------------------+-----------------------------------------------+
726    | bvec notEqual(f16vec x, f16vec y)         | Returns the component-wise compare of x != y. |
727    +-------------------------------------------+-----------------------------------------------+
728
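    Non-normative example. A short sketch of a component-wise comparison on
    half-precision vectors, feeding the Boolean result into the genBType
    overload of mix. The helper name (selectSmaller) and the uniforms are
    hypothetical. The uniforms are declared as vec4 and converted explicitly.

```glsl
#version 450
#extension GL_AMD_gpu_shader_half_float : require

layout(location = 0) uniform vec4 uA;      // hypothetical uniform
layout(location = 1) uniform vec4 uB;      // hypothetical uniform
layout(location = 0) out vec4 fragColor;   // hypothetical output

// Picks, per component, whichever of a and b is smaller.
f16vec4 selectSmaller(f16vec4 a, f16vec4 b)
{
    bvec4 aIsLess = lessThan(a, b);   // component-wise a < b
    return mix(b, a, aIsLess);        // true selects a, false selects b
}

void main()
{
    f16vec4 r = selectSmaller(f16vec4(uA), f16vec4(uB));
    fragColor = vec4(r);
}
```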
729
730    Modify Section 8.13.1 Derivative Functions
731
732    (add to table of derivative functions on p. 181)
733
734    +-------------------------------------------+-----------------------------------------------+
735    | Syntax                                    | Description                                   |
736    +-------------------------------------------+-----------------------------------------------+
737    | genF16Type dFdx (genF16Type p)            | Returns either dFdxFine(p) or dFdxCoarse(p),  |
738    |                                           | based on implementation choice, presumably    |
739    |                                           | whichever is the faster, or by whichever is   |
740    |                                           | selected in the API through                   |
741    |                                           | quality-versus-speed hints.                   |
742    +-------------------------------------------+-----------------------------------------------+
743    | genF16Type dFdy (genF16Type p)            | Returns either dFdyFine(p) or dFdyCoarse(p),  |
744    |                                           | based on implementation choice, presumably    |
745    |                                           | whichever is the faster, or by whichever is   |
746    |                                           | selected in the API through                   |
747    |                                           | quality-versus-speed hints.                   |
748    +-------------------------------------------+-----------------------------------------------+
749    | genF16Type dFdxFine (genF16Type p)        | Returns the partial derivative of p with      |
750    |                                           | respect to the window x coordinate. Will use  |
751    |                                           | local differencing based on the value of p    |
752    |                                           | for the current fragment and its immediate    |
753    |                                           | neighbor(s).                                  |
754    +-------------------------------------------+-----------------------------------------------+
755    | genF16Type dFdyFine (genF16Type p)        | Returns the partial derivative of p with      |
756    |                                           | respect to the window y coordinate. Will use  |
757    |                                           | local differencing based on the value of p    |
758    |                                           | for the current fragment and its immediate    |
759    |                                           | neighbor(s).                                  |
760    +-------------------------------------------+-----------------------------------------------+
761    | genF16Type dFdxCoarse (genF16Type p)      | Returns the partial derivative of p with      |
762    |                                           | respect to the window x coordinate. Will use  |
763    |                                           | local differencing based on the value of p    |
764    |                                           | for the current fragment's neighbors, and     |
765    |                                           | will possibly, but not necessarily, include   |
766    |                                           | the value of p for the current fragment. That |
767    |                                           | is, over a given area, the implementation can |
768    |                                           | compute x derivatives in fewer unique         |
769    |                                           | locations than would be allowed for           |
770    |                                           | dFdxFine(p).                                  |
771    +-------------------------------------------+-----------------------------------------------+
772    | genF16Type dFdyCoarse (genF16Type p)      | Returns the partial derivative of p with      |
773    |                                           | respect to the window y coordinate. Will use  |
774    |                                           | local differencing based on the value of p    |
775    |                                           | for the current fragment's neighbors, and     |
776    |                                           | will possibly, but not necessarily, include   |
777    |                                           | the value of p for the current fragment. That |
778    |                                           | is, over a given area, the implementation can |
779    |                                           | compute y derivatives in fewer unique         |
780    |                                           | locations than would be allowed for           |
781    |                                           | dFdyFine(p).                                  |
782    +-------------------------------------------+-----------------------------------------------+
783    | genF16Type fwidth (genF16Type p)          | Returns abs(dFdx(p)) + abs(dFdy(p)).          |
784    +-------------------------------------------+-----------------------------------------------+
785    | genF16Type fwidthFine (genF16Type p)      | Returns abs(dFdxFine(p)) + abs(dFdyFine(p)).  |
786    +-------------------------------------------+-----------------------------------------------+
787    | genF16Type fwidthCoarse (genF16Type p)    | Returns abs(dFdxCoarse(p)) +                  |
788    |                                           |         abs(dFdyCoarse(p)).                   |
789    +-------------------------------------------+-----------------------------------------------+
790
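    Non-normative example. The sketch below uses the half-precision fwidth
    overload to soften the edge of a disc. The input name (uv), the radius
    and the lower bound on the filter width are illustrative assumptions. The
    bound keeps edge0 < edge1 so that smoothstep stays well defined.

```glsl
#version 450
#extension GL_AMD_gpu_shader_half_float : require

layout(location = 0) in  vec2 uv;          // hypothetical input
layout(location = 0) out vec4 fragColor;   // hypothetical output

void main()
{
    // Signed distance to a disc of radius 0.5, held in half precision.
    float16_t d = float16_t(length(uv) - 0.5);

    // Screen-space rate of change of d: abs(dFdx(d)) + abs(dFdy(d)).
    float16_t w = max(fwidth(d), 0.001hf);

    // 1 inside the disc, 0 outside, blended over roughly one pixel.
    float16_t coverage = 1.0hf - smoothstep(-w, w, d);

    fragColor = vec4(vec3(float(coverage)), 1.0);
}
```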
791
792    Modify Section 8.13.2 Interpolation Functions
793
794    (add to table of interpolation functions on p. 180)
795
796    +-------------------------------------------+-----------------------------------------------+
797    | Syntax                                    | Description                                   |
798    +-------------------------------------------+-----------------------------------------------+
799    | genF16Type interpolateAtCentroid (        | Returns the value of the input interpolant    |
800    |            genF16Type interpolant)        | sampled at a location inside both the pixel   |
801    |                                           | and the primitive being processed. The value  |
802    |                                           | obtained would be the same value assigned to  |
803    |                                           | the input variable if declared with the       |
804    |                                           | centroid qualifier.                           |
805    +-------------------------------------------+-----------------------------------------------+
806    | genF16Type interpolateAtSample (          | Returns the value of the input interpolant    |
807    |            genF16Type interpolant,        | variable at the location of sample number     |
808    |            int        sample)             | sample. If multisample buffers are not        |
809    |                                           | available, the input variable will be         |
810    |                                           | evaluated at the center of the pixel. If      |
811    |                                           | sample sample does not exist, the position    |
812    |                                           | used to interpolate the input variable is     |
813    |                                           | undefined.                                    |
814    +-------------------------------------------+-----------------------------------------------+
815    | genF16Type interpolateAtOffset (          | Returns the value of the input interpolant    |
816    |            genF16Type interpolant,        | variable sampled at an offset from the center |
817    |            f16vec2    offset)             | of the pixel specified by offset. The two     |
818    |                                           | floating-point components of offset give the  |
819    |                                           | offset in pixels in the x and y directions,   |
820    |                                           | respectively. An offset of (0, 0) identifies  |
821    |                                           | the center of the pixel. The range and        |
822    |                                           | granularity of offsets supported by this      |
823    |                                           | function is implementation-dependent.         |
824    +-------------------------------------------+-----------------------------------------------+
825
826
827    Modify Section 9, Shading Language Grammar for Core Profile
828
829    (add to the list of tokens on p. 187)
830
831      ...
832      FLOAT16  F16VEC2  F16VEC3  F16VEC4
833      F16MAT2 F16MAT3 F16MAT4
834      F16MAT2X2 F16MAT2X3 F16MAT2X4
835      F16MAT3X2 F16MAT3X3 F16MAT3X4
836      F16MAT4X2 F16MAT4X3 F16MAT4X4
837      ...
838      FLOAT16CONSTANT
839
840    (add to the rule of "primary_expression" on p. 188)
841
842      primary_expression:
843        ...
844        FLOAT16CONSTANT
845        ...
846
847    (add to the rule of "type_specifier_nonarray" on p. 195)
848
849      type_specifier_nonarray:
850        ...
851          FLOAT16
852          F16VEC2
853          F16VEC3
854          F16VEC4
855          F16MAT2
856          F16MAT3
857          F16MAT4
858          F16MAT2X2
859          F16MAT2X3
860          F16MAT2X4
861          F16MAT3X2
862          F16MAT3X3
863          F16MAT3X4
864          F16MAT4X2
865          F16MAT4X3
866          F16MAT4X4
867        ...
868
869
870Dependencies on ARB_gpu_shader_int64
871
872    If the shader enables ARB_gpu_shader_int64, this extension allows
873    additional explicit conversions between half-precision floating-point
874    types and 64-bit integer types.
875
876    Modify Section 5.4.1, Conversion and Scalar Constructors
877
878    (add after the first list of constructor examples on p. 95)
879
880      int64_t(float16_t)    // convert a float16_t value to a signed 64-bit integer
881      uint64_t(float16_t)   // convert a float16_t value to an unsigned 64-bit integer
882      float16_t(int64_t)    // convert a signed 64-bit integer to a float16_t value
883      float16_t(uint64_t)   // convert an unsigned 64-bit integer to a float16_t value
884
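    Non-normative example. A minimal sketch of the four explicit conversions
    listed above, assuming both extensions are enabled in the shader. The
    variable names and the output are hypothetical.

```glsl
#version 450
#extension GL_ARB_gpu_shader_int64 : require
#extension GL_AMD_gpu_shader_half_float : require

layout(location = 0) out vec4 fragColor;   // hypothetical output

void main()
{
    float16_t h = 1024.0hf;

    int64_t  s = int64_t(h);     // float16_t -> signed 64-bit integer
    uint64_t u = uint64_t(h);    // float16_t -> unsigned 64-bit integer

    float16_t fromSigned   = float16_t(s);   // signed 64-bit -> float16_t
    float16_t fromUnsigned = float16_t(u);   // unsigned 64-bit -> float16_t

    fragColor = vec4(float(fromSigned), float(fromUnsigned), 0.0, 1.0);
}
```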
885
886Dependencies on AMD_shader_trinary_minmax
887
888    If the shader enables AMD_shader_trinary_minmax, this extension adds
889    additional common functions.
890
891    Modify Section 8.3, Common Functions
892
893    (add to the table of common functions on p. 144)
894
895    +-------------------------------------------+-----------------------------------------------+
896    | Syntax                                    | Description                                   |
897    +-------------------------------------------+-----------------------------------------------+
898    | genF16Type min3(genF16Type x,             | Returns the per-component minimum value of x, |
899    |                 genF16Type y,             | y, and z.                                     |
900    |                 genF16Type z)             |                                               |
901    +-------------------------------------------+-----------------------------------------------+
902    | genF16Type max3(genF16Type x,             | Returns the per-component maximum value of x, |
903    |                 genF16Type y,             | y, and z.                                     |
904    |                 genF16Type z)             |                                               |
905    +-------------------------------------------+-----------------------------------------------+
906    | genF16Type mid3(genF16Type x,             | Returns the per-component median value of x,  |
907    |                 genF16Type y,             | y, and z.                                     |
908    |                 genF16Type z)             |                                               |
909    +-------------------------------------------+-----------------------------------------------+
910
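    Non-normative example. The sketch below calls the half-precision min3,
    mid3 and max3 overloads on three hypothetical inputs, assuming both
    extensions are enabled. The uniform names and the output are
    illustrative only.

```glsl
#version 450
#extension GL_AMD_shader_trinary_minmax : require
#extension GL_AMD_gpu_shader_half_float : require

layout(location = 0) uniform vec3 uA;      // hypothetical uniform
layout(location = 1) uniform vec3 uB;      // hypothetical uniform
layout(location = 2) uniform vec3 uC;      // hypothetical uniform
layout(location = 0) out vec4 fragColor;   // hypothetical output

void main()
{
    f16vec3 a = f16vec3(uA);
    f16vec3 b = f16vec3(uB);
    f16vec3 c = f16vec3(uC);

    f16vec3 lowest  = min3(a, b, c);   // per-component minimum
    f16vec3 median  = mid3(a, b, c);   // per-component median
    f16vec3 highest = max3(a, b, c);   // per-component maximum

    // Write the median; the spread goes into alpha for inspection.
    float16_t spread = length(highest - lowest);
    fragColor = vec4(vec3(median), float(spread));
}
```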
911
912Dependencies on AMD_shader_explicit_vertex_parameter
913
914    If the shader enables AMD_shader_explicit_vertex_parameter, this extension
915    adds additional interpolation functions.
916
917    Modify Section 8.13.2 Interpolation Functions
918
919    (add to table of interpolation functions on p. 180)
920
921    +-------------------------------------------+-----------------------------------------------+
922    | Syntax                                    | Description                                   |
923    +-------------------------------------------+-----------------------------------------------+
924    | genF16Type interpolateAtVertexAMD (       | Returns the value of the input <interpolant>  |
925    |            genF16Type interpolant,        | without any interpolation, i.e., the raw      |
926    |            uint       vertexIdx)          | output value of the previous shader stage.    |
927    |                                           | <vertexIdx> selects for which vertex of the   |
928    |                                           | primitive the value of <interpolant> is       |
929    |                                           | returned.                                     |
930    |                                           |                                               |
931    |                                           | This return value is equivalent to            |
932    |                                           | interpolating the input <interpolant> using   |
933    |                                           | the following set of barycentric coordinates, |
934    |                                           | depending on the value of <vertexIdx>:        |
935    |                                           |                                               |
936    |                                           |  vertexIdx    Barycentric coordinates         |
937    |                                           |  0            I=0, J=0, K=1                   |
938    |                                           |  1            I=1, J=0, K=0                   |
939    |                                           |  2            I=0, J=1, K=0                   |
940    |                                           |                                               |
941    |                                           | However, this order has no association with   |
942    |                                           | the vertex order specified by the application |
943    |                                           | in the originating draw.                      |
944    |                                           |                                               |
945    |                                           | The value of <vertexIdx> must be a constant   |
946    |                                           | integer expression with a value in the range  |
947    |                                           | [0, 2].                                       |
948    +-------------------------------------------+-----------------------------------------------+
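
    The equivalence described in the table can be illustrated on the CPU
    with a small C sketch. It is an illustration only and not part of the
    extension: the names Barycentric, weights_for_vertex, interpolate and
    interpolate_at_vertex are hypothetical, and the sketch assumes that K
    weights vertex 0, I weights vertex 1 and J weights vertex 2, which is
    the pairing implied by the table.

```c
/* Barycentric weights (I, J, K) as listed in the table. */
typedef struct { float i, j, k; } Barycentric;

/* Returns the weights the table associates with vertexIdx 0, 1 and 2.
 * Out-of-range indices are clamped only to keep this sketch defined;
 * the extension itself requires a constant index in [0, 2]. */
static Barycentric weights_for_vertex(unsigned vertexIdx)
{
    static const Barycentric table[3] = {
        { 0.0f, 0.0f, 1.0f },  /* vertexIdx 0: I=0, J=0, K=1 */
        { 1.0f, 0.0f, 0.0f },  /* vertexIdx 1: I=1, J=0, K=0 */
        { 0.0f, 1.0f, 0.0f },  /* vertexIdx 2: I=0, J=1, K=0 */
    };
    return table[vertexIdx > 2u ? 2u : vertexIdx];
}

/* ASSUMPTION: K weights v[0], I weights v[1] and J weights v[2]. */
static float interpolate(const float v[3], Barycentric w)
{
    return w.k * v[0] + w.i * v[1] + w.j * v[2];
}

/* With one-hot weights the weighted sum collapses to a single
 * per-vertex value, i.e. no interpolation takes place. */
static float interpolate_at_vertex(const float v[3], unsigned vertexIdx)
{
    return interpolate(v, weights_for_vertex(vertexIdx));
}
```

    Under these assumptions, interpolate_at_vertex(v, n) returns v[n]
    unchanged for n in [0, 2], because exactly one weight is 1 and the
    other two are 0.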
949
950
951Errors
952
953    None.
954
955New State
956
957    None.
958
959New Implementation Dependent State
960
961    None.
962
963Issues
964
965    (1) How is the functionality in this extension different from the
966        half-precision floating-point types introduced by NV_gpu_shader5?
967
968      RESOLVED: This extension is designed to be source code compatible with
969      the half-precision floating-point support in NV_gpu_shader5. However, it
970      is a functional superset of that support, as it adds the following
971      features:
972
973        * support for implicit conversions from int, uint and float to float16_t.
974
975        * support for overloaded versions of functions such as abs, sign, min,
976          max and clamp that accept float16_t or other half-precision
977          floating-point types as parameters.
978
979    (2) What should be done to distinguish half-precision floating-point constants?
980
981      RESOLVED: We will use "HF" and "hf" to identify half-precision
982      floating-point constants.
983
984    (3) Should we introduce new uniform API to set up float16_t uniforms in
985        the default uniform block?
986
987      RESOLVED: No. float16_t isn't an IEEE standard format and CPUs don't
988      support it directly, so most data on the CPU side is stored as single-
989      or double-precision floating-point values. Instead, the functionality
990      of Uniform*f{v} is extended in this extension to support uniforms with
991      float16_t types.
992
993    (4) Should we support float16_t types as members of uniform blocks,
994        shader storage buffer blocks, or as transform feedback varyings?
995
996      RESOLVED: Yes, support all of them. Each component of a float16_t type
997      consumes two basic machine units. Some examples:
998
999          struct S {
1000
1001              float16_t  x;     // rule 1:  align = 2, takes offsets 0-1
1002              f16vec2    y;     // rule 2:  align = 4, takes offsets 4-7
1003              f16vec3    z;     // rule 3:  align = 8, takes offsets 8-13
1004          };
1005
1006          layout(column_major, std140) uniform B1 {
1007
1008              float16_t  a;     // rule 1:  align = 2, takes offsets 0-1
1009              f16vec2    b;     // rule 2:  align = 4, takes offsets 4-7
1010              f16vec3    c;     // rule 3:  align = 8, takes offsets 8-13
1011              float16_t  d[2];  // rule 4:  align = 16, array stride = 16,
1012                                //          takes offsets 16-47
1013              f16mat2x3  e;     // rule 5:  align = 16, matrix stride = 16,
1014                                //          takes offsets 48-79
1015              f16mat2x3  f[2];  // rule 6:  align = 16, matrix stride = 16,
1016                                //          array stride = 32, f[0] takes
1017                                //          offsets 80-111, f[1] takes offsets
1018                                //          112-143
1019              S          g;     // rule 9:  align = 16, g.x takes offsets
1020                                //          144-145, g.y takes offsets 148-151,
1021                                //          g.z takes offsets 152-157
1022              S          h[2];  // rule 10: align = 16, array stride = 16, h[0]
1023                                //          takes offsets 160-175, h[1] takes
1024                                //          offsets 176-191
1025          };
1026
1027          layout(row_major, std430) buffer B2 {
1028
1029              float16_t  o;     // rule 1:  align = 2, takes offsets 0-1
1030              f16vec2    p;     // rule 2:  align = 4, takes offsets 4-7
1031              f16vec3    q;     // rule 3:  align = 8, takes offsets 8-13
1032              float16_t  r[2];  // rule 4:  align = 2, array stride = 2, takes
1033                                //          offsets 14-17
1034              f16mat2x3  s;     // rule 7:  align = 4, matrix stride = 4, takes
1035                                //          offsets 20-31
1036              f16mat2x3  t[2];  // rule 8:  align = 4, matrix stride = 4, array
1037                                //          stride = 12, t[0] takes offsets
1038                                //          32-43, t[1] takes offsets 44-55
1039              S          u;     // rule 9:  align = 8, u.x takes offsets
1040                                //          56-57, u.y takes offsets 60-63, u.z
1041                                //          takes offsets 64-69
1042              S          v[2];  // rule 10: align = 8, array stride = 16, v[0]
1043                                //          takes offsets 72-87, v[1] takes
1044                                //          offsets 88-103
1045          };
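
      The scalar, vector and array offsets at the start of B1 and B2 can be
      reproduced with a short C sketch. It is only an illustration under
      stated assumptions: the helper names align_up, f16_base_align, place
      and layout_prefix are hypothetical, and only rules 1 to 4 are
      modelled (std140 rounds array element alignment and stride up to 16,
      std430 does not).

```c
#include <stddef.h>

/* Round offset up to the next multiple of align (align must be non-zero). */
static size_t align_up(size_t offset, size_t align)
{
    return (offset + align - 1) / align * align;
}

/* Base alignment, in basic machine units, of a float16_t scalar
 * (1 component) or an f16vecN (2..4 components): N, 2N, 4N, 4N with
 * N = 2, following rules 1 to 3. */
static size_t f16_base_align(unsigned components)
{
    const size_t n = 2;
    if (components == 1) return n;
    if (components == 2) return 2 * n;
    return 4 * n;
}

/* Place a member at the first aligned offset at or after *cursor,
 * advance *cursor past the member, and return the member's offset. */
static size_t place(size_t *cursor, size_t align, size_t size)
{
    size_t offset = align_up(*cursor, align);
    *cursor = offset + size;
    return offset;
}

/* Offsets of the first four members of B1 (std140 != 0) or B2
 * (std140 == 0): float16_t, f16vec2, f16vec3, float16_t[2].
 * Returns the first free offset after the array. */
static size_t layout_prefix(int std140, size_t offsets[4])
{
    size_t cursor = 0;
    size_t stride;

    offsets[0] = place(&cursor, f16_base_align(1), 2);
    offsets[1] = place(&cursor, f16_base_align(2), 4);
    offsets[2] = place(&cursor, f16_base_align(3), 6);

    /* Rule 4: std140 rounds the element alignment and stride up to 16;
     * std430 keeps the element's own base alignment. */
    stride = std140 ? align_up(f16_base_align(1), 16) : f16_base_align(1);
    offsets[3] = place(&cursor, stride, 2 * stride);
    return cursor;
}
```

      For std140 the sketch yields offsets 0, 4, 8 and 16 with the next
      free offset at 48, matching a, b, c, d and the start of e in B1. For
      std430 it yields 0, 4, 8 and 14 with the next free offset at 18,
      matching o, p, q and r in B2.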
1046
1047    (5) In OpenGL ES Shading Language, floating-point values in UBOs and SSBOs
1048        are always single-precision regardless of the precision qualifier in
1049        the shader. Which format should be used for this extension?
1050
1051      RESOLVED: The format should match the type declared in the shader, i.e.
1052      if the block member's type is float16_t, the format in the buffer is
1053      half-precision floating-point, and if the block member's type is float,
1054      the format is single-precision floating-point. We will provide another
1055      extension to remain compatible with the ES driver's behavior.
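
      Because a float16_t block member is stored in the buffer as a 16-bit
      value, an application holding single-precision data has to convert it
      before writing the buffer. The following C sketch shows one way to do
      that. The function name float_to_half is hypothetical, the 1-5-10
      sign/exponent/mantissa bit layout is the usual 16-bit floating-point
      encoding, and round-to-nearest-even is an assumption of this sketch
      rather than something this extension prescribes for CPU-side code.

```c
#include <stdint.h>
#include <string.h>

/* Convert a 32-bit float to 16-bit half-precision bits (1 sign bit,
 * 5 exponent bits, 10 mantissa bits), rounding to nearest even. */
static uint16_t float_to_half(float value)
{
    uint32_t bits, exponent, mantissa, half, rest;
    uint16_t sign;
    int32_t e;

    memcpy(&bits, &value, sizeof bits);
    sign     = (uint16_t)((bits >> 16) & 0x8000u);
    exponent = (bits >> 23) & 0xFFu;
    mantissa = bits & 0x7FFFFFu;

    if (exponent == 0xFFu)  /* Inf, or NaN (kept as a quiet NaN) */
        return (uint16_t)(sign | 0x7C00u | (mantissa ? 0x0200u : 0u));

    e = (int32_t)exponent - 127 + 15;  /* re-bias the exponent */

    if (e >= 31)            /* too large for half: becomes infinity */
        return (uint16_t)(sign | 0x7C00u);

    if (e <= 0) {           /* result is a half subnormal or zero */
        uint32_t shift, halfway;
        if (e < -10)        /* below half of the smallest subnormal */
            return sign;
        mantissa |= 0x800000u;           /* restore the implicit 1 */
        shift   = (uint32_t)(14 - e);    /* 14..24 */
        half    = mantissa >> shift;
        rest    = mantissa & ((1u << shift) - 1u);
        halfway = 1u << (shift - 1u);
        if (rest > halfway || (rest == halfway && (half & 1u)))
            half++;
        return (uint16_t)(sign | half);
    }

    /* Normal number: drop 13 mantissa bits and round. A carry out of the
     * mantissa correctly bumps the exponent (up to infinity). */
    half = ((uint32_t)e << 10) | (mantissa >> 13);
    rest = mantissa & 0x1FFFu;
    if (rest > 0x1000u || (rest == 0x1000u && (half & 1u)))
        half++;
    return (uint16_t)(sign | half);
}
```

      The returned 16-bit values can then be written at the offsets the
      block layout assigns to the float16_t members.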
1056
1057
1058Revision History
1059
1060    Rev.    Date    Author    Changes
1061    ----  --------  --------  -----------------------------------------
1062     5    09/21/16  dwitczak  Fixed minor character encoding issues.
1063
1064     4    08/01/16  rexu      Correct the example of offset calculation for
1065                              block members. Add limitation of xfb_offset when
1066                              this qualifier is applied to block members that
1067                              have float16_t types.
1068
1069     3    07/11/16  rexu      Clarify that each component of float16_t types
1070                              consumes two basic machine units. Remove the
1071                              interaction with NV_gpu_shader5 in that implicit
1072                              conversions from int, uint and float types to
1073                              float16_t types are disallowed now. Add new
1074                              derivative functions: dFdxFine, dFdyFine,
1075                              dFdxCoarse, dFdyCoarse, fwidthFine, fwidthCoarse.
1076                              Add the interaction with AMD_shader_trinary_minmax
1077                              and AMD_shader_explicit_vertex_parameter. Remove
1078                              two listed issues that are no longer valid for
1079                              the updated version of this extension. Remove
1080                              floatBitsToInt; it is to be added once a
1081                              16-bit integer data type is supported.
1082
1083     2    07/06/16  rexu      Remove sections that involve half-precision
1084                              floating-point opaque types. Modify allowed rules
1085                              of implicit conversion relevant to float16_t
1086                              types. Add the interaction with
1087                              ARB_gpu_shader_int64. Remove the modification of
1088                              the first rule of std140 layout. Provide some
1089                              examples to demonstrate memory storage layout of
1090                              uniform blocks and shader storage blocks when
1091                              they have members of float16_t types.
1092
1093     1    11/14/13  qlin      Initial revision.
1094