Name

    AMD_gpu_shader_half_float

Name Strings

    GL_AMD_gpu_shader_half_float

Contact

    Qun Lin, AMD (quentin.lin 'at' amd.com)

Contributors

    Qun Lin, AMD
    Daniel Rakos, AMD
    Donglin Wei, AMD
    Graham Sellers, AMD
    Rex Xu, AMD
    Dominik Witczak, AMD

Status

    Shipping.

Version

    Last Modified Date: 09/21/2016
    Author Revision: 5

Number

    OpenGL Extension #496

Dependencies

    This extension is written against the OpenGL 4.5 (Core Profile)
    Specification.

    This extension is written against version 4.50 of the OpenGL Shading
    Language Specification.

    OpenGL 4.0 and GLSL 4.00 are required.

    This extension interacts with ARB_gpu_shader_int64.

    This extension interacts with AMD_shader_trinary_minmax.

    This extension interacts with AMD_shader_explicit_vertex_parameter.

Overview

    This extension was developed based on the NV_gpu_shader5 extension to
    allow implementations that support half-precision floating point in
    shaders to expose that feature without the additional requirements
    present in NV_gpu_shader5.

    The extension introduces the following features for all shader types:

      * support for half-precision floating-point scalar, vector, and
        matrix data types in shaders;

      * new built-in functions to pack a two-component half-precision
        floating-point vector into a 32-bit unsigned integer and to unpack
        it again;

      * half-precision support for all existing single-precision
        floating-point built-in functions, including angle, exponential,
        common, geometric, and matrix functions.

    This extension is designed to be a functional superset of the
    half-precision floating-point support in NV_gpu_shader5 and to remain
    source compatible with it; thus, the functions and tokens it shares
    with that extension are identical to those found there.


New Procedures and Functions

    None.

New Tokens

    Returned by the <type> parameter of GetActiveAttrib, GetActiveUniform, and
    GetTransformFeedbackVarying:

    (The tokens are identical to those defined in NV_gpu_shader5.)

        FLOAT16_NV                0x8FF8
        FLOAT16_VEC2_NV           0x8FF9
        FLOAT16_VEC3_NV           0x8FFA
        FLOAT16_VEC4_NV           0x8FFB

    (New tokens)

        FLOAT16_MAT2_AMD          0x91C5
        FLOAT16_MAT3_AMD          0x91C6
        FLOAT16_MAT4_AMD          0x91C7
        FLOAT16_MAT2x3_AMD        0x91C8
        FLOAT16_MAT2x4_AMD        0x91C9
        FLOAT16_MAT3x2_AMD        0x91CA
        FLOAT16_MAT3x4_AMD        0x91CB
        FLOAT16_MAT4x2_AMD        0x91CC
        FLOAT16_MAT4x3_AMD        0x91CD


Additions to Chapter 7 of the OpenGL 4.5 (Core Profile) Specification
(Program Objects)

    Modify Section 7.3.1, Program Interfaces

    (add to Table 7.3, OpenGL Shading Language type tokens, p. 108)

    +----------------------------+----------------+------+------+------+
    | Type Name Token            | Keyword        |Attrib| Xfb  |Buffer|
    +----------------------------+----------------+------+------+------+
    | FLOAT16_NV                 | float16_t      |  *   |  *   |  *   |
    | FLOAT16_VEC2_NV            | f16vec2        |  *   |  *   |  *   |
    | FLOAT16_VEC3_NV            | f16vec3        |  *   |  *   |  *   |
    | FLOAT16_VEC4_NV            | f16vec4        |  *   |  *   |  *   |
    | FLOAT16_MAT2_AMD           | f16mat2        |  *   |  *   |  *   |
    | FLOAT16_MAT3_AMD           | f16mat3        |  *   |  *   |  *   |
    | FLOAT16_MAT4_AMD           | f16mat4        |  *   |  *   |  *   |
    | FLOAT16_MAT2x3_AMD         | f16mat2x3      |  *   |  *   |  *   |
    | FLOAT16_MAT2x4_AMD         | f16mat2x4      |  *   |  *   |  *   |
    | FLOAT16_MAT3x2_AMD         | f16mat3x2      |  *   |  *   |  *   |
    | FLOAT16_MAT3x4_AMD         | f16mat3x4      |  *   |  *   |  *   |
    | FLOAT16_MAT4x2_AMD         | f16mat4x2      |  *   |  *   |  *   |
    | FLOAT16_MAT4x3_AMD         | f16mat4x3      |  *   |  *   |  *   |
    +----------------------------+----------------+------+------+------+


    Modify Section 7.6.1, Loading Uniform Variables

    (modify the last paragraph on p. 132)

    The Uniform*f{v} commands will load count sets of one to four
    floating-point values into a uniform defined as a float, a half float,
    a floating-point vector, a half-precision floating-point vector, or an
    array of any of these types. Floating-point values are converted to
    half float by the GL for uniforms defined as a half float, a half float
    vector, or an array of those.


    Modify Section 7.6.2.1, Uniform Buffer Object Storage

    (modify the first two bullets of the first paragraph on p. 136)

      * Members of type bool, int, uint, float, float16_t, and double are
        respectively extracted from a buffer object by reading a single
        uint, int, uint, float, half float, or double value at the
        specified offset.

      * Vectors with N elements with basic data types of bool, int, uint,
        float, float16_t, or double are extracted as N values in
        consecutive memory locations beginning at the specified offset,
        with components stored in order with the first (X) component at the
        lowest offset. The GL data type used for component extraction is
        derived according to the rules for scalar members above.


Additions to Chapter 11 of the OpenGL 4.5 (Core Profile) Specification
(Programmable Vertex Processing)

    Modify Section 11.1.1, Vertex Attributes

    (modify Table 11.2, Generic attributes and vector types used by column
    vectors of matrix variables bound to generic attribute index i, p. 366)

    +----------------------------+------------------------+-------------------------+
    | Data type                  | Column vector type     | Generic attributes used |
    +----------------------------+------------------------+-------------------------+
    | mat2, dmat2, f16mat2       | two-component vector   | i, i + 1                |
    | mat2x3, dmat2x3, f16mat2x3 | three-component vector | i, i + 1                |
    | mat2x4, dmat2x4, f16mat2x4 | four-component vector  | i, i + 1                |
    | mat3x2, dmat3x2, f16mat3x2 | two-component vector   | i, i + 1, i + 2         |
    | mat3, dmat3, f16mat3       | three-component vector | i, i + 1, i + 2         |
    | mat3x4, dmat3x4, f16mat3x4 | four-component vector  | i, i + 1, i + 2         |
    | mat4x2, dmat4x2, f16mat4x2 | two-component vector   | i, i + 1, i + 2, i + 3  |
    | mat4x3, dmat4x3, f16mat4x3 | three-component vector | i, i + 1, i + 2, i + 3  |
    | mat4, dmat4, f16mat4       | four-component vector  | i, i + 1, i + 2, i + 3  |
    +----------------------------+------------------------+-------------------------+

    (modify Table 11.3, Scalar and vector vertex attribute types and
    VertexAttrib* commands used to set the values of the corresponding
    generic attributes, p. 366)

    +-------------------+--------------------------+
    | Data type         | Command                  |
    +-------------------+--------------------------+
    | float, float16_t  | VertexAttrib1*           |
    | vec2, f16vec2     | VertexAttrib2*           |
    | vec3, f16vec3     | VertexAttrib3*           |
    | vec4, f16vec4     | VertexAttrib4*           |
    +-------------------+--------------------------+


    Modify Section 11.1.2.1, Output Variables

    (modify the last paragraph on p. 374)

    ..., each component of outputs declared as half-precision floating-point
    scalars, vectors, or matrices is considered to consume two basic machine
    units, and each component of any other type ...
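
    The counting rule above can be illustrated with a small host-side
    calculation. The C code below is a sketch under stated assumptions, not
    part of the GL API. The ComponentKind enum, the CapturedOutput struct,
    and xfb_units_per_vertex are hypothetical names. The sketch sums
    per-component sizes only: two basic machine units for half-precision,
    eight for double-precision (the base OpenGL 4.5 rule), and four for
    everything else. It ignores any alignment or padding between captured
    outputs.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical classification of a captured output's component type. */
typedef enum {
    COMPONENT_FLOAT16, /* float16_t, f16vec*, f16mat* */
    COMPONENT_DOUBLE,  /* double, dvec*, dmat* */
    COMPONENT_OTHER    /* every other type, e.g. float, int, uint */
} ComponentKind;

/* One captured output: its component kind and its total component count
 * (for example 3 for an f16vec3 or 6 for an f16mat2x3). */
typedef struct {
    ComponentKind kind;
    size_t components;
} CapturedOutput;

/* Basic machine units consumed by a single component of the given kind. */
static size_t units_per_component(ComponentKind kind)
{
    switch (kind) {
    case COMPONENT_FLOAT16: return 2;
    case COMPONENT_DOUBLE:  return 8;
    default:                return 4;
    }
}

/* Sums the basic machine units consumed by one vertex's captured outputs.
 * Alignment and padding between outputs are deliberately ignored. */
size_t xfb_units_per_vertex(const CapturedOutput *outputs, size_t count)
{
    size_t total = 0;

    assert(outputs != NULL || count == 0);
    for (size_t i = 0; i < count; ++i)
        total += outputs[i].components * units_per_component(outputs[i].kind);
    return total;
}
```

    For example, capturing an f16vec4 and a vec3 per vertex counts
    4 * 2 + 3 * 4 = 20 basic machine units under this rule. The same data
    captured as a vec4 and a vec3 would count 28.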


Modifications to the OpenGL Shading Language Specification, Version 4.50

    Including the following line in a shader can be used to control the
    language features described in this extension:

      #extension GL_AMD_gpu_shader_half_float : <behavior>

    where <behavior> is as specified in section 3.3.

    New preprocessor #defines are added to the OpenGL Shading Language:

      #define GL_AMD_gpu_shader_half_float 1


Additions to Chapter 3 of the OpenGL Shading Language Specification (Basics)


    Modify Section 3.6, Keywords

    (add the following to the list of reserved keywords at p. 18)

      float16_t  f16vec2    f16vec3    f16vec4
      f16mat2    f16mat3    f16mat4
      f16mat2x2  f16mat2x3  f16mat2x4
      f16mat3x2  f16mat3x3  f16mat3x4
      f16mat4x2  f16mat4x3  f16mat4x4


Additions to Chapter 4 of the OpenGL Shading Language Specification
(Variables and Types)


    Modify Section 4.1, Basic Types

    (add to the basic "Transparent Types" table, p. 23)

    +-----------+------------------------------------------------------------+
    | Type      | Meaning                                                    |
    +-----------+------------------------------------------------------------+
    | float16_t | a half-precision floating-point scalar                     |
    | f16vec2   | a two-component half-precision floating-point vector       |
    | f16vec3   | a three-component half-precision floating-point vector     |
    | f16vec4   | a four-component half-precision floating-point vector      |
    | f16mat2   | a 2x2 half-precision floating-point matrix                 |
    | f16mat3   | a 3x3 half-precision floating-point matrix                 |
    | f16mat4   | a 4x4 half-precision floating-point matrix                 |
    | f16mat2x2 | same as f16mat2                                            |
    | f16mat2x3 | a half-precision floating-point matrix with 2 columns and  |
    |           | 3 rows                                                     |
    | f16mat2x4 | a half-precision floating-point matrix with 2 columns and  |
    |           | 4 rows                                                     |
    | f16mat3x2 | a half-precision floating-point matrix with 3 columns and  |
    |           | 2 rows                                                     |
    | f16mat3x3 | same as f16mat3                                            |
    | f16mat3x4 | a half-precision floating-point matrix with 3 columns and  |
    |           | 4 rows                                                     |
    | f16mat4x2 | a half-precision floating-point matrix with 4 columns and  |
    |           | 2 rows                                                     |
    | f16mat4x3 | a half-precision floating-point matrix with 4 columns and  |
    |           | 3 rows                                                     |
    | f16mat4x4 | same as f16mat4                                            |
    +-----------+------------------------------------------------------------+


    Modify Section 4.1.4, Floating-Point Variables

    (replace first paragraph of the section, p. 29)

    Single-precision, double-precision, and half-precision floating-point
    variables are available for use in a variety of scalar calculations.
    Generally, the term floating-point will refer to all single-, double-,
    and half-precision floating point. Floating-point variables are defined
    as in the following examples:

      float a, b = 1.5;         // single-precision floating-point
      double c, d = 2.0LF;      // double-precision floating-point
      float16_t e, f = 3.0HF;   // half-precision floating-point

    As an input value to one of the processing units, a single-precision,
    double-precision, or half-precision floating-point variable is expected
    to match the corresponding IEEE 754 floating-point definition in terms
    of precision and dynamic range.

    (modify grammar rule for "floating-suffix", p. 30)

      floating-suffix: one of
        f F lf LF hf HF

    (modify the fourth sentence of second paragraph on p. 30)

    When the suffix "lf" or "LF" is present, the literal has type double.
    When the suffix "hf" or "HF" is present, the literal has type float16_t.
    Otherwise, the literal has type float.


    Modify Section 4.1.6, Matrices

    (modify the second sentence in the section, p. 30)

    Matrix types beginning with "mat" have single-precision components,
    matrix types beginning with "dmat" have double-precision components, and
    matrix types beginning with "f16mat" have half-precision components.


    Modify Section 4.1.10, Implicit Conversions

    (modify the implicit conversion table on p. 37)

    +-----------------------+-------------------------------------------------+
    | Type of expression    | Can be implicitly converted to                  |
    +-----------------------+-------------------------------------------------+
    | int, uint, float16_t  | float                                           |
    | ivec2, uvec2, f16vec2 | vec2                                            |
    | ivec3, uvec3, f16vec3 | vec3                                            |
    | ivec4, uvec4, f16vec4 | vec4                                            |
    | f16mat2               | mat2                                            |
    | f16mat3               | mat3                                            |
    | f16mat4               | mat4                                            |
    | f16mat2x3             | mat2x3                                          |
    | f16mat2x4             | mat2x4                                          |
    | f16mat3x2             | mat3x2                                          |
    | f16mat3x4             | mat3x4                                          |
    | f16mat4x2             | mat4x2                                          |
    | f16mat4x3             | mat4x3                                          |
    | int, uint,            | double                                          |
    | float, float16_t      |                                                 |
    | ivec2, uvec2,         | dvec2                                           |
    | vec2, f16vec2         |                                                 |
    | ivec3, uvec3,         | dvec3                                           |
    | vec3, f16vec3         |                                                 |
    | ivec4, uvec4,         | dvec4                                           |
    | vec4, f16vec4         |                                                 |
    | mat2, f16mat2         | dmat2                                           |
    | mat3, f16mat3         | dmat3                                           |
    | mat4, f16mat4         | dmat4                                           |
    | mat2x3, f16mat2x3     | dmat2x3                                         |
    | mat2x4, f16mat2x4     | dmat2x4                                         |
    | mat3x2, f16mat3x2     | dmat3x2                                         |
    | mat3x4, f16mat3x4     | dmat3x4                                         |
    | mat4x2, f16mat4x2     | dmat4x2                                         |
    | mat4x3, f16mat4x3     | dmat4x3                                         |
    +-----------------------+-------------------------------------------------+


    Modify Section 4.4.2.1, Transform Feedback Layout Qualifiers

    (insert after the fourth paragraph in the section on p. 70)

    ... will be a multiple of 8; if applied to an aggregate containing a
    float16_t, the offset must also be a multiple of 2, and the space taken
    in the buffer will be a multiple of 2.


    Modify Section 4.7.1, Range and Precision

    (insert after the first paragraph in the section on p. 85)

    ... and positive and negative zeros. The precision of stored
    half-precision floating-point variables is described in section
    2.3.3.2, "16-Bit Floating-Point Numbers", of the OpenGL Specification.
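
    To make the stored format concrete, here is a minimal C sketch that
    widens a half-precision bit pattern into a 32-bit float. It follows the
    16-bit floating-point layout that the OpenGL specification describes in
    that section: one sign bit, a five-bit exponent with bias 15, and a
    ten-bit mantissa. half_bits_to_float is a hypothetical helper, not a GL
    or GLSL function. The sketch assumes the host float is IEEE 754
    binary32.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Widens an IEEE 754 binary16 bit pattern (1 sign bit, 5 exponent bits with
 * bias 15, 10 mantissa bits) to a binary32 float. Every binary16 value,
 * including denormals, is exactly representable in binary32. */
float half_bits_to_float(uint16_t h)
{
    uint32_t sign = (uint32_t)(h & 0x8000u) << 16;
    uint32_t exponent = (uint32_t)(h >> 10) & 0x1Fu;
    uint32_t mantissa = (uint32_t)h & 0x3FFu;
    uint32_t bits;
    float result;

    if (exponent == 0x1Fu) {
        /* Infinity (mantissa == 0) or NaN: all-ones binary32 exponent. */
        bits = sign | 0x7F800000u | (mantissa << 13);
    } else if (exponent != 0) {
        /* Normal number: rebias the exponent from 15 to 127. */
        bits = sign | ((exponent + 112u) << 23) | (mantissa << 13);
    } else if (mantissa == 0) {
        /* Positive or negative zero. */
        bits = sign;
    } else {
        /* Denormal (mantissa * 2^-24): shift until the implicit leading
         * bit appears, lowering the exponent once per shift. */
        exponent = 113u; /* biased binary32 exponent of 2^-14 */
        while ((mantissa & 0x400u) == 0) {
            mantissa <<= 1;
            exponent--;
        }
        bits = sign | (exponent << 23) | ((mantissa & 0x3FFu) << 13);
    }

    /* The sketch assumes float is a 32-bit IEEE 754 binary32 type. */
    assert(sizeof result == sizeof bits);
    memcpy(&result, &bits, sizeof result);
    return result;
}
```

    For example, 0x3555 decodes to 0.333251953125, which is the
    half-precision value nearest to 1/3. 0x7BFF decodes to 65504, the
    largest finite half-precision value. Both show how much narrower the
    precision and range are than single precision.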

    The following rules apply to all floating-point operations, including
    single-, double-, and half-precision operations: ...


Additions to Chapter 5 of the OpenGL Shading Language Specification
(Operators and Expressions)


    Modify Section 5.4.1, Conversion and Scalar Constructors

    (add after the first list of constructor examples on p. 97)

      int(float16_t)      // convert a float16_t value to a signed integer
      uint(float16_t)     // convert a float16_t value to an unsigned integer
      bool(float16_t)     // convert a float16_t value to a Boolean
      float(float16_t)    // convert a float16_t value to a float value
      double(float16_t)   // convert a float16_t value to a double value
      float16_t(bool)     // convert a Boolean to a float16_t value
      float16_t(int)      // convert a signed integer to a float16_t value
      float16_t(uint)     // convert an unsigned integer to a float16_t value
      float16_t(float)    // convert a float value to a float16_t value
      float16_t(double)   // convert a double value to a float16_t value

    (modify the first sentence of last paragraph on p. 98)

    ... other arguments.
    If the basic type (bool, int, float, double, or float16_t) of a parameter
    to a constructor does not match the basic type of the object being
    constructed, the scalar construction rules (above) are used to convert
    the parameters.


Additions to Chapter 6 of the OpenGL Shading Language Specification
(Statements and Structure)


    Modify Section 6.1, Function Definitions

    (replace the second rule in third paragraph on p. 113)

      2. A match involving a conversion from a signed integer, unsigned
         integer, or floating-point type to a similar type having a larger
         number of bits is better than a match involving any other implicit
         conversion.
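
    The replaced rule 2 can be read as a per-argument ranking, sketched
    below in C. This is a hypothetical model, not compiler code:
    - BaseType, conversion_rank, and the helper functions are illustrative
      names.
    - Only scalar base types without 64-bit integers are modelled.
    - The caller is assumed to pass only pairs that GLSL can implicitly
      convert.
    - Only rule 1 and the replaced rule 2 are covered; any later tie-break
      rules from the base specification are not modelled.

```c
#include <assert.h>

/* Hypothetical model of the scalar base types that take part in implicit
 * conversions (64-bit integer types are not modelled). */
typedef enum { TYPE_INT, TYPE_UINT, TYPE_FLOAT16, TYPE_FLOAT, TYPE_DOUBLE } BaseType;

/* Two types are treated as "similar" when they belong to the same family. */
typedef enum { FAMILY_SIGNED, FAMILY_UNSIGNED, FAMILY_FLOATING } TypeFamily;

static TypeFamily family_of(BaseType t)
{
    switch (t) {
    case TYPE_INT:  return FAMILY_SIGNED;
    case TYPE_UINT: return FAMILY_UNSIGNED;
    default:        return FAMILY_FLOATING;
    }
}

static int bits_of(BaseType t)
{
    switch (t) {
    case TYPE_FLOAT16: return 16;
    case TYPE_DOUBLE:  return 64;
    default:           return 32; /* int, uint, float */
    }
}

/* Ranks the conversion of one argument from 'from' to 'to'; lower is better.
 *   0: exact match (rule 1)
 *   1: conversion to a similar type with more bits (replaced rule 2)
 *   2: any other implicit conversion
 * Later tie-break rules of the base specification are not modelled. */
int conversion_rank(BaseType from, BaseType to)
{
    /* None of the implicit conversions in this model narrows a type. */
    assert(bits_of(to) >= bits_of(from));

    if (from == to)
        return 0;
    if (family_of(from) == family_of(to) && bits_of(to) > bits_of(from))
        return 1;
    return 2;
}
```

    Under this model, a float16_t argument passed to a float parameter ranks
    1, just as float to double does. An int argument passed to a float
    parameter ranks 2 because int and float are not similar types.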

Additions to Chapter 8 of the OpenGL Shading Language Specification
(Built-in Functions)

    (insert after the sixth sentence of last paragraph on p. 140)

    ... genDType is used as the argument. Where the input arguments (and
    corresponding output) can be float16_t, f16vec2, f16vec3, or f16vec4,
    genF16Type is used as the argument.


    Modify Section 8.1, Angle and Trigonometry Functions

    (add to the table of Angle and Trigonometry Functions on p. 141)

    +------------------------------------------------+----------------------------------------------------+
    | Syntax                                         | Description                                        |
    +------------------------------------------------+----------------------------------------------------+
    | genF16Type radians (genF16Type degrees)        | Converts degrees to radians, i.e., PI/180 *        |
    |                                                | degrees.                                           |
    +------------------------------------------------+----------------------------------------------------+
    | genF16Type degrees (genF16Type radians)        | Converts radians to degrees, i.e., 180/PI *        |
    |                                                | radians.                                           |
    +------------------------------------------------+----------------------------------------------------+
    | genF16Type sin (genF16Type angle)              | The standard trigonometric sine function.          |
    +------------------------------------------------+----------------------------------------------------+
    | genF16Type cos (genF16Type angle)              | The standard trigonometric cosine function.        |
    +------------------------------------------------+----------------------------------------------------+
    | genF16Type tan (genF16Type angle)              | The standard trigonometric tangent.                |
    +------------------------------------------------+----------------------------------------------------+
    | genF16Type asin (genF16Type x)                 | Arc sine. Returns an angle whose sine is x. The    |
    |                                                | range of values returned by this function is       |
    |                                                | [-PI/2, PI/2]. Results are undefined if |x| > 1.   |
    +------------------------------------------------+----------------------------------------------------+
    | genF16Type acos (genF16Type x)                 | Arc cosine. Returns an angle whose cosine is x.    |
    |                                                | The range of values returned by this function is   |
    |                                                | [0, PI]. Results are undefined if |x| > 1.         |
    +------------------------------------------------+----------------------------------------------------+
    | genF16Type atan (genF16Type y, genF16Type x)   | Arc tangent. Returns an angle whose tangent is     |
    |                                                | y/x. The signs of x and y are used to determine    |
    |                                                | what quadrant the angle is in. The range of        |
    |                                                | values returned by this function is [-PI, PI].     |
    |                                                | Results are undefined if x and y are both 0.       |
    +------------------------------------------------+----------------------------------------------------+
    | genF16Type atan (genF16Type y_over_x)          | Arc tangent. Returns an angle whose tangent is     |
    |                                                | y_over_x. The range of values returned by this     |
    |                                                | function is [-PI/2, PI/2].                         |
    +------------------------------------------------+----------------------------------------------------+
    | genF16Type sinh (genF16Type x)                 | Returns the hyperbolic sine function               |
    |                                                | (e^x - e^-x) / 2.                                  |
    +------------------------------------------------+----------------------------------------------------+
    | genF16Type cosh (genF16Type x)                 | Returns the hyperbolic cosine function             |
    |                                                | (e^x + e^-x) / 2.                                  |
    +------------------------------------------------+----------------------------------------------------+
    | genF16Type tanh (genF16Type x)                 | Returns the hyperbolic tangent function            |
    |                                                | sinh(x) / cosh(x).                                 |
    +------------------------------------------------+----------------------------------------------------+
    | genF16Type asinh (genF16Type x)                | Arc hyperbolic sine; returns the inverse of sinh.  |
    +------------------------------------------------+----------------------------------------------------+
    | genF16Type acosh (genF16Type x)                | Arc hyperbolic cosine; returns the non-negative    |
    |                                                | inverse of cosh. Results are undefined if x < 1.   |
    +------------------------------------------------+----------------------------------------------------+
    | genF16Type atanh (genF16Type x)                | Arc hyperbolic tangent; returns the inverse of     |
    |                                                | tanh. Results are undefined if |x| >= 1.           |
    +------------------------------------------------+----------------------------------------------------+


    Modify Section 8.2, Exponential Functions

    (add to the table of Exponential Functions on p. 143)

    +------------------------------------------------+----------------------------------------------------+
    | Syntax                                         | Description                                        |
    +------------------------------------------------+----------------------------------------------------+
    | genF16Type pow (genF16Type x, genF16Type y)    | Returns x raised to the y power, i.e., x^y.        |
    |                                                | Results are undefined if x < 0.                    |
    |                                                | Results are undefined if x = 0 and y <= 0.         |
    +------------------------------------------------+----------------------------------------------------+
    | genF16Type exp (genF16Type x)                  | Returns the natural exponentiation of x, i.e.,     |
    |                                                | e^x.                                               |
    +------------------------------------------------+----------------------------------------------------+
    | genF16Type log (genF16Type x)                  | Returns the natural logarithm of x, i.e., returns  |
    |                                                | the value y which satisfies the equation x = e^y.  |
    |                                                | Results are undefined if x <= 0.                   |
    +------------------------------------------------+----------------------------------------------------+
    | genF16Type exp2 (genF16Type x)                 | Returns 2 raised to the x power, i.e., 2^x.        |
    +------------------------------------------------+----------------------------------------------------+
    | genF16Type log2 (genF16Type x)                 | Returns the base 2 logarithm of x, i.e., returns   |
    |                                                | the value y which satisfies the equation x = 2^y.  |
    |                                                | Results are undefined if x <= 0.                   |
    +------------------------------------------------+----------------------------------------------------+
    | genF16Type sqrt (genF16Type x)                 | Returns sqrt(x). Results are undefined if x < 0.   |
    +------------------------------------------------+----------------------------------------------------+
    | genF16Type inversesqrt (genF16Type x)          | Returns 1 / sqrt(x). Results are undefined if      |
    |                                                | x <= 0.                                            |
    +------------------------------------------------+----------------------------------------------------+


    Modify Section 8.3, Common Functions

    (add to the table of common functions on p. 144)

    +------------------------------------------------+----------------------------------------------------+
    | Syntax                                         | Description                                        |
    +------------------------------------------------+----------------------------------------------------+
    | genF16Type abs(genF16Type x)                   | Returns x if x >= 0; otherwise it returns -x.      |
    +------------------------------------------------+----------------------------------------------------+
    | genF16Type sign(genF16Type x)                  | Returns 1.0 if x > 0, 0.0 if x = 0, or -1.0 if     |
    |                                                | x < 0.                                             |
    +------------------------------------------------+----------------------------------------------------+
    | genF16Type floor (genF16Type x)                | Returns a value equal to the nearest integer that  |
    |                                                | is less than or equal to x.                        |
    +------------------------------------------------+----------------------------------------------------+
    | genF16Type trunc (genF16Type x)                | Returns a value equal to the nearest integer to x  |
    |                                                | whose absolute value is not larger than the        |
    |                                                | absolute value of x.                               |
    +------------------------------------------------+----------------------------------------------------+
    | genF16Type round (genF16Type x)                | Returns a value equal to the nearest integer to x. |
    |                                                | The fraction 0.5 will round in a direction chosen  |
    |                                                | by the implementation, presumably the direction    |
    |                                                | that is fastest. This includes the possibility     |
    |                                                | that round(x) returns the same value as            |
    |                                                | roundEven(x) for all values of x.                  |
    +------------------------------------------------+----------------------------------------------------+
    | genF16Type roundEven (genF16Type x)            | Returns a value equal to the nearest integer to x. |
    |                                                | A fractional part of 0.5 will round toward the     |
    |                                                | nearest even integer. (Both 3.5 and 4.5 for x will |
    |                                                | return 4.0.)                                       |
    +------------------------------------------------+----------------------------------------------------+
    | genF16Type ceil (genF16Type x)                 | Returns a value equal to the nearest integer that  |
    |                                                | is greater than or equal to x.                     |
    +------------------------------------------------+----------------------------------------------------+
    | genF16Type fract (genF16Type x)                | Returns x - floor(x).                              |
    +------------------------------------------------+----------------------------------------------------+
    | genF16Type mod (genF16Type x, float16_t y)     | Modulus. Returns x - y * floor(x/y).               |
    | genF16Type mod (genF16Type x, genF16Type y)    |                                                    |
    +------------------------------------------------+----------------------------------------------------+
    | genF16Type modf(genF16Type x,                  | Returns the fractional part of x and sets i to the |
    |                 out genF16Type i)              | integer part (as a whole number floating-point     |
    |                                                | value). Both the return value and the output       |
    |                                                | parameter will have the same sign as x.            |
    +------------------------------------------------+----------------------------------------------------+
    | genF16Type min(genF16Type x,                   | Returns y if y < x; otherwise it returns x.        |
    |                genF16Type y)                   |                                                    |
    | genF16Type min(genF16Type x,                   |                                                    |
    |                float16_t y)                    |                                                    |
    +------------------------------------------------+----------------------------------------------------+
    | genF16Type max(genF16Type x,                   | Returns y if x < y; otherwise it returns x.        |
    |                genF16Type y)                   |                                                    |
    | genF16Type max(genF16Type x,                   |                                                    |
    |                float16_t y)                    |                                                    |
    +------------------------------------------------+----------------------------------------------------+
    | genF16Type clamp(genF16Type x,                 | Returns min(max(x, minVal), maxVal).               |
    |                  genF16Type minVal,            |                                                    |
    |                  genF16Type maxVal)            | Results are undefined if minVal > maxVal.          |
    | genF16Type clamp(genF16Type x,                 |                                                    |
    |                  float16_t minVal,             |                                                    |
    |                  float16_t maxVal)             |                                                    |
    +------------------------------------------------+----------------------------------------------------+
    | genF16Type mix(genF16Type x,                   | The first two forms return the linear blend of x   |
    |                genF16Type y,                   | and y, i.e., x * (1 - a) + y * a.                  |
    |                genF16Type a)                   | The genBType form selects which vector each        |
    | genF16Type mix(genF16Type x,                   | returned component comes from. For a component of  |
    |                genF16Type y,                   | a that is false, the corresponding component of x  |
    |                float16_t a)                    | is returned. For a component of a that is true,    |
    | genF16Type mix(genF16Type x,                   | the corresponding component of y is returned.      |
    |                genF16Type y,                   |                                                    |
    |                genBType a)                     |                                                    |
    +------------------------------------------------+----------------------------------------------------+
    | genF16Type step(genF16Type edge, genF16Type x) | Returns 0.0 if x < edge; otherwise it returns 1.0. |
    | genF16Type step(float16_t edge, genF16Type x)  |                                                    |
    +------------------------------------------------+----------------------------------------------------+
    | genF16Type smoothstep(genF16Type edge0,        | Returns 0.0 if x <= edge0 and 1.0 if x >= edge1,   |
    |                      genF16Type edge1,         | and performs smooth Hermite interpolation between  |
    |                      genF16Type x)             | 0 and 1 when edge0 < x < edge1. This is useful in  |
    | genF16Type smoothstep(float16_t edge0,         | cases where you would want a threshold function    |
    |                      float16_t edge1,          | with a smooth transition. This is equivalent to:   |
    |                      genF16Type x)             |   genF16Type t;                                    |
    |                                                |   t = clamp((x - edge0) / (edge1 - edge0), 0, 1);  |
    |                                                |   return t * t * (3 - 2 * t);                      |
    |                                                | Results are undefined if edge0 >= edge1.           |
    +------------------------------------------------+----------------------------------------------------+
    | genBType isnan (genF16Type x)                  | Returns true if x holds a NaN. Returns false       |
    |                                                | otherwise. Always returns false if NaNs are not    |
    |                                                | implemented.                                       |
    +------------------------------------------------+----------------------------------------------------+
    | genBType isinf (genF16Type x)                  | Returns true if x holds a positive infinity or     |
    |                                                | negative infinity. Returns false otherwise.        |
    +------------------------------------------------+----------------------------------------------------+
    | genF16Type fma (genF16Type a, genF16Type b,    | Computes and returns a * b + c.                    |
    |                genF16Type c)                   |                                                    |
    +------------------------------------------------+----------------------------------------------------+
    | genF16Type frexp (genF16Type x,                | Splits x into a floating-point significand in the  |
    |                  out genIType exp)             | range [0.5, 1.0) and an integral exponent of two,  |
    |                                                | such that:                                         |
    |                                                |   x = significand * 2^exp                          |
    |                                                | The significand is returned by the function and    |
    |                                                | the exponent is returned in the parameter exp.     |
    |                                                | For a floating-point value of zero, the            |
    |                                                | significand and exponent are both zero. For a      |
    |                                                | floating-point value that is an infinity or is not |
    |                                                | a number, the results are undefined.               |
    +------------------------------------------------+----------------------------------------------------+
    | genF16Type ldexp (genF16Type x,                | Builds a floating-point number from x and the      |
    |                  in genIType exp)              | corresponding integral exponent of two in exp,     |
    |                                                | returning:                                         |
    |                                                |   x * 2^exp                                        |
    |                                                | If this product is too large to be represented in  |
    |                                                | the floating-point type, the result is undefined.  |
    +------------------------------------------------+----------------------------------------------------+


    Modify Section 8.4, Floating-Point Pack and Unpack Functions

    (add to the table of pack and unpack functions on p. 149)

    +-----------------------------------+------------------------------------------------------+
    | Syntax                            | Description                                          |
    +-----------------------------------+------------------------------------------------------+
    | uint packFloat2x16(f16vec2 v)     | Returns an unsigned 32-bit integer obtained by       |
    |                                   | packing the two components of a half-precision       |
    |                                   | floating-point vector. The first vector component    |
    |                                   | specifies the 16 least significant bits; the second  |
    |                                   | component specifies the 16 most significant bits.    |
    +-----------------------------------+------------------------------------------------------+
    | f16vec2 unpackFloat2x16(uint v)   | Returns a two-component half-precision               |
    |                                   | floating-point vector built from a 32-bit unsigned   |
    |                                   | integer scalar. The first component of the vector    |
    |                                   | contains the 16 least significant bits of the input; |
    |                                   | the second component contains the 16 most            |
    |                                   | significant bits.                                    |
    +-----------------------------------+------------------------------------------------------+


    Modify Section 8.5, Geometric Functions

    (add to table of geometric functions on p. 152)

    +-------------------------------------------+-----------------------------------------------+
    | Syntax                                    | Description                                   |
    +-------------------------------------------+-----------------------------------------------+
    | float16_t length (genF16Type x)           | Returns the length of vector x, i.e.,         |
    |                                           | sqrt(x[0]*x[0] + x[1]*x[1] + ...)             |
    +-------------------------------------------+-----------------------------------------------+
    | float16_t distance (genF16Type p0,        | Returns the distance between p0 and p1, i.e., |
    |                    genF16Type p1)         | length(p0 - p1).                              |
    +-------------------------------------------+-----------------------------------------------+
    | float16_t dot(genF16Type x, genF16Type y) | Returns the dot product of x and y, i.e.,     |
    |                                           | x[0]*y[0] + x[1]*y[1] + ...                   |
    +-------------------------------------------+-----------------------------------------------+
    | f16vec3 cross (f16vec3 x, f16vec3 y)      | Returns the cross product of x and y, i.e.,   |
    |                                           |   |x[1] * y[2] - y[1] * x[2]|                 |
    |                                           |   |x[2] * y[0] - y[2] * x[0]|                 |
    |                                           |   |x[0] * y[1] - y[0] * x[1]|                 |
    +-------------------------------------------+-----------------------------------------------+
    | genF16Type normalize (genF16Type x)       | Returns a vector in the same direction as x   |
    |                                           | but with a length of 1.                       |
    +-------------------------------------------+-----------------------------------------------+
    | genF16Type faceforward (genF16Type N,     | If dot(Nref, I) < 0, returns N; otherwise     |
    |                         genF16Type I,     | returns -N.                                   |
    |                         genF16Type Nref)  |                                               |
    +-------------------------------------------+-----------------------------------------------+
    | genF16Type reflect (genF16Type I,         | For the incident vector I and surface         |
    |                    genF16Type N)          | orientation N, returns the reflection         |
    |                                           | direction:                                    |
    |                                           |   I - 2 * dot(N, I) * N                       |
    |                                           | N must already be normalized in order to      |
    |                                           | achieve the desired result.                   |
    +-------------------------------------------+-----------------------------------------------+
    | genF16Type refract (genF16Type I,         | For the incident vector I and surface normal  |
    |                    genF16Type N,          | N, and the ratio of indices of refraction     |
    |                    float16_t eta)         | eta, returns the refraction vector. The       |
    |                                           | result is computed by:                        |
    |                                           |   k = 1.0 - eta * eta * (1.0 - dot(N, I) *    |
    |                                           |       dot(N, I))                              |
    |                                           |   if (k < 0.0)                                |
    |                                           |       return genF16Type(0.0)                  |
    |                                           |   else                                        |
    |                                           |       return eta * I - (eta * dot(N, I)       |
    |                                           |              + sqrt(k)) * N                   |
    |                                           | The input parameters for the incident vector  |
    |                                           | I and the surface normal N must already be    |
    |                                           | normalized to get the desired results.        |
    +-------------------------------------------+-----------------------------------------------+


    Modify Section 8.6, Matrix Functions

    (modify the first paragraph of the section on p. 154)

    ..., there is a single-precision floating-point version, where all
    arguments and return values are single precision, a double-precision
    floating-point version, where all arguments and return values are double
    precision, and a half-precision floating-point version, where all
    arguments and return values are half precision.


    Modify Section 8.7, Vector Relational Functions

    (add to the table of placeholders at the top of p. 156)

    +-------------+-----------------------------+
    | Placeholder | Specific Types Allowed      |
    +-------------+-----------------------------+
    | f16vec      | f16vec2, f16vec3, f16vec4   |
    +-------------+-----------------------------+

    (add to the table of vector relational functions at the bottom of p. 156)

    +-------------------------------------------+-----------------------------------------------+
    | Syntax                                    | Description                                   |
    +-------------------------------------------+-----------------------------------------------+
    | bvec lessThan(f16vec x, f16vec y)         | Returns the component-wise compare of x < y.  |
    +-------------------------------------------+-----------------------------------------------+
    | bvec lessThanEqual(f16vec x, f16vec y)    | Returns the component-wise compare of x <= y. |
    +-------------------------------------------+-----------------------------------------------+
    | bvec greaterThan(f16vec x, f16vec y)      | Returns the component-wise compare of x > y.  |
    +-------------------------------------------+-----------------------------------------------+
    | bvec greaterThanEqual(f16vec x, f16vec y) | Returns the component-wise compare of x >= y. |
    +-------------------------------------------+-----------------------------------------------+
    | bvec equal(f16vec x, f16vec y)            | Returns the component-wise compare of x == y. |
    +-------------------------------------------+-----------------------------------------------+
    | bvec notEqual(f16vec x, f16vec y)         | Returns the component-wise compare of x != y. |
    +-------------------------------------------+-----------------------------------------------+


    Modify Section 8.13.1, Derivative Functions

    (add to table of derivative functions on p. 181)

    +-------------------------------------------+-----------------------------------------------+
    | Syntax                                    | Description                                   |
    +-------------------------------------------+-----------------------------------------------+
    | genF16Type dFdx (genF16Type p)            | Returns either dFdxFine(p) or dFdxCoarse(p),  |
    |                                           | based on implementation choice, presumably    |
    |                                           | whichever is faster, or by whichever is       |
    |                                           | selected in the API through                   |
    |                                           | quality-versus-speed hints.                   |
    +-------------------------------------------+-----------------------------------------------+
    | genF16Type dFdy (genF16Type p)            | Returns either dFdyFine(p) or dFdyCoarse(p),  |
    |                                           | based on implementation choice, presumably    |
    |                                           | whichever is faster, or by whichever is       |
    |                                           | selected in the API through                   |
    |                                           | quality-versus-speed hints.                   |
    +-------------------------------------------+-----------------------------------------------+
    | genF16Type dFdxFine (genF16Type p)        | Returns the partial derivative of p with      |
    |                                           | respect to the window x coordinate. Will use  |
    |                                           | local differencing based on the value of p    |
    |                                           | for the current fragment and its immediate    |
    |                                           | neighbor(s).                                  |
    +-------------------------------------------+-----------------------------------------------+
    | genF16Type dFdyFine (genF16Type p)        | Returns the partial derivative of p with      |
    |                                           | respect to the window y coordinate. Will use  |
    |                                           | local differencing based on the value of p    |
    |                                           | for the current fragment and its immediate    |
    |                                           | neighbor(s).                                  |
    +-------------------------------------------+-----------------------------------------------+
    | genF16Type dFdxCoarse (genF16Type p)      | Returns the partial derivative of p with      |
    |                                           | respect to the window x coordinate.
Will use | 763 | | local differencing based on the value of p | 764 | | for the current fragment's neighbors, and | 765 | | will possibly, but not necessarily, include | 766 | | the value of p for the current fragment. That | 767 | | is, over a given area, the implementation can | 768 | | x compute derivatives in fewer unique | 769 | | locations than would be allowed for | 770 | | dFdxFine(p). | 771 +-------------------------------------------+-----------------------------------------------+ 772 | genF16Type dFdyCoarse (genF16Type p) | Returns the partial derivative of p with | 773 | | respect to the window y coordinate. Will use | 774 | | local differencing based on the value of p | 775 | | for the current fragment's neighbors, and | 776 | | will possibly, but not necessarily, include | 777 | | the value of p for the current fragment. That | 778 | | is, over a given area, the implementation can | 779 | | compute y derivatives in fewer unique | 780 | | locations than would be allowed for | 781 | | dFdyFine(p). | 782 +-------------------------------------------+-----------------------------------------------+ 783 | genF16Type fwidth (genF16Type p) | Returns abs(dFdx(p)) + abs(dFdy(p)). | 784 +-------------------------------------------+-----------------------------------------------+ 785 | genF16Type fwidthFine (genF16Type p) | Returns abs(dFdxFine(p)) + abs(dFdyFine(p)). | 786 +-------------------------------------------+-----------------------------------------------+ 787 | genF16Type fwidthCoarse (genF16Type p) | Returns abs(dFdxCoarse(p)) + | 788 | | abs(dFdyCoarse(p)). | 789 +-------------------------------------------+-----------------------------------------------+ 790 791 792 Modify Section 8.13.2 Interpolation Functions 793 794 (add to table of interpolation functions on p. 
180) 795 796 +-------------------------------------------+-----------------------------------------------+ 797 | Syntax | Description | 798 +-------------------------------------------+-----------------------------------------------+ 799 | genF16Type interpolateAtCentroid ( | Returns the value of the input interpolant | 800 | genF16Type interpolant) | sampled at a location inside both the pixel | 801 | | and the primitive being processed. The value | 802 | | obtained would be the same value assigned to | 803 | | the input variable if declared with the | 804 | | centroid qualifier | 805 +-------------------------------------------+-----------------------------------------------+ 806 | genF16Type interpolateAtSample ( | Returns the value of the input interpolant | 807 | genF16Type interpolant, | variable at the location of sample number | 808 | int sample) | sample. If multisample buffers are not | 809 | | available, the input variable will be | 810 | | evaluated at the center of the pixel. If | 811 | | sample sample does not exist, the position | 812 | | used to interpolate the input variable is | 813 | | undefined. | 814 +-------------------------------------------+-----------------------------------------------+ 815 | genF16Type interpolateAtOffset ( | Returns the value of the input interpolant | 816 | genF16Type interpolant, | variable sampled at an offset from the center | 817 | f16vec2 offset) | of the pixel specified by offset. The two | 818 | | floating-point components of offset, give the | 819 | | offset in pixels in the x and y directions, | 820 | | respectively. An offset of (0, 0) identifies | 821 | | the center of the pixel. The range and | 822 | | granularity of offsets supported by this | 823 | | function isimplementation-dependent. | 824 +-------------------------------------------+-----------------------------------------------+ 825 826 827 Modify Section 9, Shading Language Grammar for Core Profile 828 829 (add to the list of tokens on p. 
187) 830 831 ... 832 FLOAT16 F16VEC2 F16VEC3 F16VEC4 833 F16MAT2 F16MAT3 F16MAT4 834 F16MAT2X2 FL6MAT2X3 F16MAT2X4 835 F16MAT3X2 F16MAT3X3 F16MAT3X4 836 F16MAT4X2 F16MAT4X3 F16MAT4X4 837 ... 838 FLOAT16CONSTANT 839 840 (add to the rule of "primary_expression" on p. 188) 841 842 primary_expression: 843 ... 844 FLOAT16CONSTANT 845 ... 846 847 (add to the rule of "type_specifier_nonarray" on p. 195) 848 849 type_specifier_nonarray: 850 ... 851 FLOAT16 852 F16VEC2 853 F16VEC3 854 F16VEC4 855 F16MAT2 856 F16MAT3 857 F16MAT4 858 F16MAT2X2 859 FL6MAT2X3 860 F16MAT2X4 861 F16MAT3X2 862 F16MAT3X3 863 F16MAT3X4 864 F16MAT4X2 865 F16MAT4X3 866 F16MAT4X4 867 ... 868 869 870Dependencies on ARB_gpu_shader_int64 871 872 If the shader enables ARB_gpu_shader_int64, this extension allows 873 additional explicit conversions between half-precision floating-point 874 types and 64-bit integer types. 875 876 Modify Section 5.4.1, Conversion and Scalar Constructors 877 878 (add after the first list of constructor examples on p. 95) 879 880 int64_t(float16_t) // convert a float16_t value to a signed 64-bit integer 881 uint64_t(float16_t) // convert a float16_t value to an unsigned 64-bit integer 882 float16_t(int64_t) // convert a signed 64-bit integer to a float16_t value 883 float16_t(uint64_t) // convert an unsigned 64-bit integer to a float16_t value 884 885 886Dependencies on AMD_shader_trinary_minmax 887 888 If the shader enables AMD_shader_trinary_minmax, this extension adds 889 additional common functions. 890 891 Modify Section 8.3, Common Functions 892 893 (add to the table of common functions on p. 144) 894 895 +-------------------------------------------+-----------------------------------------------+ 896 | Syntax | Description | 897 +-------------------------------------------+-----------------------------------------------+ 898 | genF16Type min3(genF16Type x, | Returns the per-component minimum value of x, | 899 | genF16Type y, | y, and z. 
| 900 | genF16Type z) | | 901 +-------------------------------------------+-----------------------------------------------+ 902 | genF16Type max3(genF16Type x, | Returns the per-component maximum value of x, | 903 | genF16Type y, | y, and z. | 904 | genF16Type z) | | 905 +-------------------------------------------+-----------------------------------------------+ 906 | genF16Type mid3(genF16Type x, | Returns the per-component median value of x, | 907 | genF16Type y, | y, and z. | 908 | genF16Type z) | | 909 +-------------------------------------------+-----------------------------------------------+ 910 911 912Dependencies on AMD_shader_explicit_vertex_parameter 913 914 If the shader enables AMD_shader_explicit_vertex_parameter, this extension 915 adds additional interpolation functions. 916 917 Modify Section 8.13.2 Interpolation Functions 918 919 (add to table of interpolation functions on p. 180) 920 921 +-------------------------------------------+-----------------------------------------------+ 922 | Syntax | Description | 923 +-------------------------------------------+-----------------------------------------------+ 924 | genF16Type interpolateAtVertexAMD ( | Returns the value of the input <interpolant> | 925 | genF16Type interpolant, | without any interpolation. i.e. the raw | 926 | uint vertexIdx) | output value of previous shader stage. | 927 | | <vertexIdx> selects for which vertex of the | 928 | | primitive the value of <interpolant> is | 929 | | returned. | 930 | | | 931 | | This return value is equivalent with | 932 | | interpolating the input <interpolant> using | 933 | | the following set of barycentric coordinates, | 934 | | depending on the value of <vertexIdx>: | 935 | | | 936 | | vertexIdx Barycentric coordinates | 937 | | 0 I=0, J=0, K=1 | 938 | | 1 I=1, J=0, K=0 | 939 | | 2 I=0, J=1, K=0 | 940 | | | 941 | | However this order has no association with | 942 | | the vertex order specified by the application | 943 | | in the originating draw. 
| 944 | | | 945 | | The value of <vertexIdx> must be constant | 946 | | integer expression with a value in the range | 947 | | [0, 2]. | 948 +-------------------------------------------+-----------------------------------------------+ 949 950 951Errors 952 953 None. 954 955New State 956 957 None. 958 959New Implementation Dependent State 960 961 None. 962 963Issues 964 965 (1) How the functionality in this extension different than the half_precision 966 floating-point types introduced by NV_gpu_shader5? 967 968 RESOLVED: This extension is designed to be source code compatible with 969 the half-precison floating-point support in NV_gpu_shader5. However, it 970 is a functional superset of that, as it adds the following additional 971 features: 972 973 * support for implicit conversions from int, uint and float to float16_t. 974 975 * support for overloaded versions of the functions, such as abs, sign, min, 976 max, clamp, and etc., that accept float16_t type or half-precision 977 floating-point type as parameters. 978 979 (2) What should be done to distinguish half-precison floating-point constants? 980 981 RESOLVED: We will use "HF" and "hf" to identify half-precision 982 floating-point constants. 983 984 (3) Should we import new uniform API to setup the float16_t type uniform in 985 default uniform block? 986 987 RESOLVED: No. float16_t isn't a IEEE standard format, CPU doesn't support 988 it directly. So most data on CPU side is stored in the form of single- or 989 double-precision floating-point precision floating-point. Uniform*f{v}'s 990 functionality is extended to support uniforms with float16_t type in this 991 extension. 992 993 (4) Should we support float16_t types as members of uniform blocks, 994 shader storage buffer blocks, or as transform feedback varyings? 995 996 RESOLVED: Yes, support all of them. float16_t types will consume two 997 basic machine units. 
Some examples: 998 999 struct S { 1000 1001 float16_t x; // rule 1: align = 2, takes offsets 0-1 1002 f16vec2 y; // rule 2: align = 4, takes offsets 4-7 1003 f16vec3 z; // rule 3: align = 8, takes offsets 8-13 1004 }; 1005 1006 layout(column_major, std140) uniform B1 { 1007 1008 float16_t a; // rule 1: align = 2, takes offsets 0-1 1009 f16vec2 b; // rule 2: align = 4, takes offsets 4-7 1010 f16vec3 c; // rule 3: align = 8, takes offsets 8-13 1011 float16_t d[2]; // rule 4: align = 16, array stride = 16, 1012 // takes offsets 16-47 1013 f16mat2x3 e; // rule 5: align = 16, matrix stride = 16, 1014 // takes offsets 48-79 1015 f16mat2x3 f[2]; // rule 6: align = 16, matrix stride = 16, 1016 // array stride = 32, f[0] takes 1017 // offsets 80-111, f[1] takes offsets 1018 // 112-143 1019 S g; // rule 9: align = 16, g.x takes offsets 1020 // 144-145, g.y takes offsets 148-151, 1021 // g.z takes offsets 152-159 1022 S h[2]; // rule 10: align = 16, array stride = 16, h[0] 1023 // takes offsets 160-175, h[1] takes 1024 // offsets 176-191 1025 }; 1026 1027 layout(row_major, std430) buffer B2 { 1028 1029 float16_t o; // rule 1: align = 2, takes offsets 0-1 1030 f16vec2 p; // rule 2: align = 4, takes offsets 4-7 1031 f16vec3 q; // rule 3: align = 8, takes offsets 8-13 1032 float16_t r[2]; // rule 4: align = 2, array stride = 2, takes 1033 // offsets 14-17 1034 f16mat2x3 s; // rule 7: align = 4, matrix stride = 4, takes 1035 // offsets 20-31 1036 f16mat2x3 t[2]; // rule 8: align = 4, matrix stride = 4, array 1037 // stride = 12, t[0] takes offsets 1038 // 32-43, t[1] takes offsets 44-55 1039 S u; // rule 9: align = 8, u.x takes offsets 1040 // 56-57, u.y takes offsets 60-63, u.z 1041 // takes offsets 64-69 1042 S v[2]; // rule 10: align = 8, array stride = 16, v[0] 1043 // takes offsets 72-87, v[1] takes 1044 // offsets 88-103 1045 }; 1046 1047 (5) In OpenGL ES Shading Language, the format of floating-point in UBO and 1048 SSBO is always single-precision floating-point regardless 
of the precision 1049 qualifier in shader. which format should be used for this extension? 1050 1051 RESOLVED: the format should be equal with the type declaried in shader. 1052 i.e. if the block member's type is float16_t, the format in buffer is 1053 half-precision floating-point. and if the block member's type is float, 1054 the format is single-precision floating-point. we will provide another 1055 extension to keep compatible with ES driver's behavior. 1056 1057 1058Revision History 1059 1060 Rev. Date Author Changes 1061 ---- -------- -------- ----------------------------------------- 1062 5 09/21/16 dwitczak Fixed minor character encoding issues. 1063 1064 4 08/01/16 rexu Correct the example of offset calculation for 1065 block members. Add limitation of xfb_offset when 1066 this qualifier is applied to block members that 1067 have float16_t types. 1068 1069 3 07/11/16 rexu Clarify that each component of float16_t types 1070 consume two basic machine units. Remove the 1071 interaction with NV_gpu_shader5 in that implicit 1072 conversion from int, uint and float types to 1073 float16_t types are disallowed now. Add new 1074 derivative functions: dFdxFine, dFdyFine, 1075 dFdxCoarse, dFdyCoarse, fwidthFine, fwidthCoarse. 1076 Add the interaction with AMD_shader_trinary_minmax 1077 and AMD_shader_explicit_vertex_parameter. Remove 1078 two listed issues that are no longer valid for 1079 the updated version of this extension. Remove 1080 floatBitsToInt and decide to add it when 1081 16-bit integer data type is supported. 1082 1083 2 07/06/16 rexu Remove sections that involve half-precision 1084 floating-point opaque types. Modify allowed rules 1085 of implicit conversion relevant to float16_t 1086 types. Add the interaction with ARB_gpu_shader_ 1087 int64. Remove the modification of the first rule 1088 of std140 layout. 
Provide some examples to 1089 demostrate memory storage layout of uniform 1090 blocks and shader storage blocks when they have 1091 members of float16_t types. 1092 1093 1 11/14/13 qlin Initial revision. 1094