Name

    INTEL_shader_integer_functions2

Name Strings

    GL_INTEL_shader_integer_functions2

Contact

    Ian Romanick <ian.d.romanick@intel.com>

Contributors


Status

    In progress

Version

    Last Modification Date: 11/25/2019
    Revision: 5

Number

    OpenGL Extension #547
    OpenGL ES Extension #323

Dependencies

    This extension is written against the OpenGL 4.6 (Core Profile)
    Specification.

    This extension is written against Version 4.60 (Revision 03) of the OpenGL
    Shading Language Specification.

    GLSL 1.30 (OpenGL), GLSL ES 3.00 (OpenGL ES), or EXT_gpu_shader4 (OpenGL)
    is required.

    This extension interacts with ARB_gpu_shader_int64.

    This extension interacts with AMD_gpu_shader_int16.

    This extension interacts with OpenGL 4.6 and ARB_gl_spirv.

    This extension interacts with EXT_shader_explicit_arithmetic_types.

Overview

    OpenCL and other GPU programming environments provide a number of useful
    functions operating on integer data.  Many of these functions are
    supported by specialized instructions on various GPUs.  Correct GLSL
    implementations for some of these functions are non-trivial.  Recognizing
    open-coded versions of these functions is often impractical.  As a result,
    potential performance improvements go unrealized.

    This extension makes available a number of functions that have specialized
    instruction support on Intel GPUs.

New Procedures and Functions

    None

New Tokens

    None

IP Status

    No known IP claims.

Modifications to the OpenGL Shading Language Specification, Version 4.60

    Including the following line in a shader can be used to control the
    language features described in this extension:

      #extension GL_INTEL_shader_integer_functions2 : <behavior>

    where <behavior> is as specified in section 3.3.
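
    As a usage illustration only, a shader might request the extension with
    the "enable" behavior and test the GL_INTEL_shader_integer_functions2
    preprocessor macro that this extension defines, falling back to core
    built-ins otherwise.  The following is a minimal sketch, assuming a GLSL
    4.60 compute shader; the buffer layout and the helper name leadingZeros
    are hypothetical and not part of this extension.

```glsl
#version 460
// "enable" (rather than "require") lets compilation continue, with a
// warning, on implementations that lack the extension.
#extension GL_INTEL_shader_integer_functions2 : enable

layout(local_size_x = 64) in;

// Hypothetical buffer, used only for illustration.
layout(std430, binding = 0) buffer Data {
    uint values[];
};

// Hypothetical helper: uses the extension built-in when it is available.
uint leadingZeros(uint v)
{
#ifdef GL_INTEL_shader_integer_functions2
    return countLeadingZeros(v);
#else
    // Core fallback: findMSB() returns -1 for zero, so this yields 32.
    return uint(31 - findMSB(v));
#endif
}

void main()
{
    uint i = gl_GlobalInvocationID.x;
    if (i < uint(values.length()))
        values[i] = leadingZeros(values[i]);
}
```

    Using "require" instead of "enable" would turn a missing extension into a
    compile error, which may be preferable when no fallback path exists.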

    New preprocessor #defines are added to the OpenGL Shading Language:

      #define GL_INTEL_shader_integer_functions2        1

Additions to Chapter 8 of the OpenGL Shading Language Specification
(Built-in Functions)

    Modify Section 8.8, Integer Functions

    (add new rows after the existing "findMSB" table row, p. 161)

    genUType countLeadingZeros(genUType value)

    Returns the number of leading 0-bits, starting at the most significant
    bit, in the binary representation of value.  If value is zero, the size in
    bits of the type of value (or of the component type of value, if value is
    a vector) is returned.


    genUType countTrailingZeros(genUType value)

    Returns the number of trailing 0-bits, starting at the least significant
    bit, in the binary representation of value.  If value is zero, the size in
    bits of the type of value (or of the component type of value, if value is
    a vector) is returned.


    genUType absoluteDifference(genUType x, genUType y)
    genUType absoluteDifference(genIType x, genIType y)
    genU64Type absoluteDifference(genU64Type x, genU64Type y)
    genU64Type absoluteDifference(genI64Type x, genI64Type y)
    genU16Type absoluteDifference(genU16Type x, genU16Type y)
    genU16Type absoluteDifference(genI16Type x, genI16Type y)

    Returns |x - y| clamped to the range of the return type (instead of modulo
    overflowing).  Note: the return type of each of these functions is an
    unsigned type of the same bit-size and vector element count.


    genUType addSaturate(genUType x, genUType y)
    genIType addSaturate(genIType x, genIType y)
    genU64Type addSaturate(genU64Type x, genU64Type y)
    genI64Type addSaturate(genI64Type x, genI64Type y)
    genU16Type addSaturate(genU16Type x, genU16Type y)
    genI16Type addSaturate(genI16Type x, genI16Type y)

    Returns x + y clamped to the range of the type of x (instead of modulo
    overflowing).
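
    The semantics described so far can be sketched with core GLSL 4.60
    built-ins.  The helpers below are illustrative only: the names are
    hypothetical, they cover just the 32-bit scalar forms, and they assume
    the wrap-around integer arithmetic that GLSL 4.60 specifies.  They are
    not how any implementation is required to compute these results.

```glsl
// Hypothetical helper names; 32-bit scalar forms only.

uint clzFallback(uint v)
{
    // findMSB() returns -1 when v is zero, so the result is 32.
    return uint(31 - findMSB(v));
}

uint ctzFallback(uint v)
{
    // findLSB() returns -1 when v is zero; converted to uint that is
    // 0xFFFFFFFF, which min() clamps to 32.
    return min(uint(findLSB(v)), 32u);
}

uint absDiffFallback(uint x, uint y)
{
    return max(x, y) - min(x, y);
}

uint absDiffFallback(int x, int y)
{
    // Unsigned subtraction wraps modulo 2^32, which gives the exact
    // distance even when x - y does not fit in an int.
    return uint(max(x, y)) - uint(min(x, y));
}

uint addSatFallback(uint x, uint y)
{
    uint sum = x + y;
    // A wrapped unsigned sum is smaller than either operand.
    return sum < x ? 0xFFFFFFFFu : sum;
}

int addSatFallback(int x, int y)
{
    int sum = x + y;  // keeps the low 32 bits on overflow
    // Overflow occurred if the sign of the wrapped sum differs from the
    // sign of both operands.
    bool overflow = ((x ^ sum) & (y ^ sum)) < 0;
    // Saturate toward the sign shared by the operands.
    int limit = x < 0 ? (-2147483647 - 1) : 2147483647;
    return overflow ? limit : sum;
}
```

    For example, under these definitions addSatFallback(0xFFFFFFF0u, 0x20u)
    saturates to 0xFFFFFFFFu instead of wrapping to 0x10u, and
    absDiffFallback(2147483647, -2147483647 - 1) is 4294967295u, which fits
    only because the return type is unsigned.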


    genUType average(genUType x, genUType y)
    genIType average(genIType x, genIType y)
    genU64Type average(genU64Type x, genU64Type y)
    genI64Type average(genI64Type x, genI64Type y)
    genU16Type average(genU16Type x, genU16Type y)
    genI16Type average(genI16Type x, genI16Type y)

    Returns (x+y) >> 1.  The intermediate sum does not modulo overflow.


    genUType averageRounded(genUType x, genUType y)
    genIType averageRounded(genIType x, genIType y)
    genU64Type averageRounded(genU64Type x, genU64Type y)
    genI64Type averageRounded(genI64Type x, genI64Type y)
    genU16Type averageRounded(genU16Type x, genU16Type y)
    genI16Type averageRounded(genI16Type x, genI16Type y)

    Returns (x+y+1) >> 1.  The intermediate sum does not modulo overflow.


    genUType subtractSaturate(genUType x, genUType y)
    genIType subtractSaturate(genIType x, genIType y)
    genU64Type subtractSaturate(genU64Type x, genU64Type y)
    genI64Type subtractSaturate(genI64Type x, genI64Type y)
    genU16Type subtractSaturate(genU16Type x, genU16Type y)
    genI16Type subtractSaturate(genI16Type x, genI16Type y)

    Returns x - y clamped to the range of the type of x (instead of modulo
    overflowing).


    genUType multiply32x16(genUType x_32_bits, genUType y_16_bits)
    genIType multiply32x16(genIType x_32_bits, genIType y_16_bits)
    genUType multiply32x16(genUType x_32_bits, genU16Type y_16_bits)
    genIType multiply32x16(genIType x_32_bits, genI16Type y_16_bits)

    Returns x * y, where only the (possibly sign-extended) low 16 bits of y
    are used.  In cases where one of the signed operands is known to be in the
    range [-2^15, (2^15)-1], or one of the unsigned operands is known to be in
    the range [0, (2^16)-1], this may provide a higher performance multiply.
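
    The remaining functions can be modeled in core GLSL 4.60 in the same
    spirit.  This is a sketch under stated assumptions rather than a
    reference implementation: the helper names are hypothetical, only the
    32-bit scalar forms are shown, and the averages rely on the identities
    x + y == 2 * (x & y) + (x ^ y) and x + y == 2 * (x | y) - (x ^ y) to
    avoid needing a 33-bit intermediate sum.

```glsl
// Hypothetical helper names; 32-bit scalar forms only.

uint averageFallback(uint x, uint y)
{
    // (x + y) >> 1 without overflow: shared bits plus half the differing bits.
    return (x & y) + ((x ^ y) >> 1);
}

int averageFallback(int x, int y)
{
    // Same identity; >> on an int fills the vacated bits with the sign bit.
    return (x & y) + ((x ^ y) >> 1);
}

uint averageRoundedFallback(uint x, uint y)
{
    // (x + y + 1) >> 1 without overflow.
    return (x | y) - ((x ^ y) >> 1);
}

int averageRoundedFallback(int x, int y)
{
    return (x | y) - ((x ^ y) >> 1);
}

uint subSatFallback(uint x, uint y)
{
    // Subtracting at most x clamps the result at zero.
    return x - min(x, y);
}

int subSatFallback(int x, int y)
{
    int diff = x - y;  // keeps the low 32 bits on overflow
    // Overflow occurred if the operands differ in sign and the wrapped
    // difference has a different sign than x.
    bool overflow = ((x ^ y) & (x ^ diff)) < 0;
    int limit = x < 0 ? (-2147483647 - 1) : 2147483647;
    return overflow ? limit : diff;
}

uint mul32x16Fallback(uint x, uint y)
{
    // Only the low 16 bits of y participate.
    return x * (y & 0xFFFFu);
}

int mul32x16Fallback(int x, int y)
{
    // bitfieldExtract() on an int sign-extends the extracted field.
    return x * bitfieldExtract(y, 0, 16);
}
```

    As a spot check, averageFallback(-1, 0) is -1 and
    averageRoundedFallback(-1, 0) is 0, matching (x+y) >> 1 and (x+y+1) >> 1
    evaluated with an arithmetic shift.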

Interactions with OpenGL 4.6 and ARB_gl_spirv

    If OpenGL 4.6 or ARB_gl_spirv is supported, then
    SPV_INTEL_shader_integer_functions2 must also be supported.

    The IntegerFunctions2INTEL capability is available whenever the
    implementation supports INTEL_shader_integer_functions2.

Interactions with ARB_gpu_shader_int64 and EXT_shader_explicit_arithmetic_types_int64

    If the shader enables only INTEL_shader_integer_functions2 but not
    ARB_gpu_shader_int64 or EXT_shader_explicit_arithmetic_types_int64,
    remove all function overloads that have either genU64Type or genI64Type
    parameters.

Interactions with AMD_gpu_shader_int16 and EXT_shader_explicit_arithmetic_types_int16

    If the shader enables only INTEL_shader_integer_functions2 but not
    AMD_gpu_shader_int16 or EXT_shader_explicit_arithmetic_types_int16,
    remove all function overloads that have either genU16Type or genI16Type
    parameters.

Issues

    1) What should this extension be called?

    RESOLVED: There already exists a MESA_shader_integer_functions extension,
    so this is called INTEL_shader_integer_functions2 to prevent confusion.

    2) How does countLeadingZeros differ from findMSB?

    RESOLVED: countLeadingZeros is only defined for unsigned types, and it is
    equivalent to 32-(findMSB(x)+1).  This corresponds to the clz() function
    in OpenCL and the LZD (leading zero detection) instruction on Intel GPUs.

    3) How does countTrailingZeros differ from findLSB?

    RESOLVED: countTrailingZeros is equivalent to min(genUType(findLSB(x)),
    32).  This corresponds to the ctz() function in OpenCL.

    4) Should 64-bit versions of countLeadingZeros and countTrailingZeros be
    provided?

    RESOLVED: NO.  OpenCL has 64-bit versions of clz() and ctz(), but OpenGL
    does not have 64-bit versions of findMSB() or findLSB() even when
    ARB_gpu_shader_int64 is supported.
    The instructions used to implement countLeadingZeros and
    countTrailingZeros do not natively support 64-bit operands.

    The implementation of 64-bit countLeadingZeros() would be 5 instructions,
    and the implementation of 64-bit countTrailingZeros() would be 7
    instructions.  Neither of these is better than an application developer
    could achieve in GLSL:

        uint countLeadingZeros(uint64_t value)
        {
            // v.x holds the low 32 bits, v.y the high 32 bits.
            uvec2 v = unpackUint2x32(value);

            return v.y == 0u
                ? 32u + countLeadingZeros(v.x) : countLeadingZeros(v.y);
        }

        uint countTrailingZeros(uint64_t value)
        {
            // v.x holds the low 32 bits, v.y the high 32 bits.
            uvec2 v = unpackUint2x32(value);

            return v.x == 0u
                ? 32u + countTrailingZeros(v.y) : countTrailingZeros(v.x);
        }

    5) Should 64-bit versions of the arithmetic functions be provided?

    RESOLVED: NO.  Since recent generations of Intel GPUs have removed
    hardware support for 64-bit integer arithmetic, there doesn't seem to be
    much value in providing 64-bit arithmetic functions.

    6) Should this extension include average()?

    RESOLVED: YES.  average() corresponds to hadd() in OpenCL, and
    averageRounded() corresponds to rhadd() in OpenCL.

    averageRounded() corresponds to the AVG instruction on Intel GPUs.
    average(), on the other hand, does not correspond to a single instruction.
    The signed and unsigned versions may have slightly different
    implementations depending on the specific GPU.  In the worst case, the
    implementation is 4 instructions (e.g., averageRounded(x, y) - ((x ^ y) &
    1)), and in the best case it is 3 instructions.

Revision History

    Rev  Date         Author    Changes
    ---  -----------  --------  ---------------------------------------------
      1  04-Sep-2018  idr       Initial version.
      2  19-Sep-2018  idr       Add interactions with AMD_gpu_shader_int16.
      3  22-Jan-2019  idr       Add interactions with EXT_shader_explicit_arithmetic_types.
      4  14-Nov-2019  idr       Resolve issue #1 and issue #5.
      5  25-Nov-2019  idr       Fix a bunch of typos noticed by @cmarcelo.