Name

    INTEL_shader_integer_functions2

Name Strings

    GL_INTEL_shader_integer_functions2

Contact

    Ian Romanick <ian.d.romanick@intel.com>

Contributors


Status

    In progress

Version

    Last Modification Date: 11/25/2019
    Revision: 5

Number

    OpenGL Extension #547
    OpenGL ES Extension #323

Dependencies

    This extension is written against the OpenGL 4.6 (Core Profile)
    Specification.

    This extension is written against Version 4.60 (Revision 03) of the OpenGL
    Shading Language Specification.

    GLSL 1.30 (OpenGL), GLSL ES 3.00 (OpenGL ES), or EXT_gpu_shader4 (OpenGL)
    is required.

    This extension interacts with ARB_gpu_shader_int64.

    This extension interacts with AMD_gpu_shader_int16.

    This extension interacts with OpenGL 4.6 and ARB_gl_spirv.

    This extension interacts with EXT_shader_explicit_arithmetic_types.

Overview

    OpenCL and other GPU programming environments provide a number of useful
    functions operating on integer data.  Many of these functions are
    supported by specialized instructions on various GPUs.  Correct GLSL
    implementations for some of these functions are non-trivial.  Recognizing
    open-coded versions of these functions is often impractical.  As a result,
    potential performance improvements go unrealized.

    This extension makes available a number of functions that have specialized
    instruction support on Intel GPUs.

New Procedures and Functions

    None

New Tokens

    None

IP Status

    No known IP claims.

Modifications to the OpenGL Shading Language Specification, Version 4.60

    Including the following line in a shader can be used to control the
    language features described in this extension:

      #extension GL_INTEL_shader_integer_functions2 : <behavior>

    where <behavior> is as specified in section 3.3.

    New preprocessor #defines are added to the OpenGL Shading Language:

      #define GL_INTEL_shader_integer_functions2        1

Additions to Chapter 8 of the OpenGL Shading Language Specification
(Built-in Functions)

    Modify Section 8.8, Integer Functions

    (add new rows after the existing "findMSB" table row, p. 161)

    genUType countLeadingZeros(genUType value)

    Returns the number of leading 0-bits, starting at the most significant bit,
    in the binary representation of value.  If value is zero, the size in bits
    of the type of value or component type of value (if value is a vector) will
    be returned.
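
    The behavior described above can be sketched on the CPU as a scalar
    reference model.  This is only an illustration in C, not the extension's
    implementation; the name count_leading_zeros32 is hypothetical and a
    32-bit component is assumed.

```c
#include <stdint.h>

/* Hypothetical reference model (not part of the extension): counts the
 * leading zero bits of a 32-bit value.  Returns 32 when value is zero. */
static uint32_t count_leading_zeros32(uint32_t value)
{
    uint32_t n = 0;

    /* Walk from the most significant bit down until a set bit is found
     * or all 32 bit positions have been examined. */
    for (uint32_t mask = UINT32_C(1) << 31;
         mask != 0 && (value & mask) == 0;
         mask >>= 1) {
        n++;
    }
    return n;
}
```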


    genUType countTrailingZeros(genUType value)

    Returns the number of trailing 0-bits, starting at the least significant bit,
    in the binary representation of value.  If value is zero, the size in bits
    of the type of value or component type of value (if value is a vector) will
    be returned.
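
    A matching CPU-side sketch for the trailing-zero count, again as an
    illustrative C model with a hypothetical name (count_trailing_zeros32)
    and an assumed 32-bit component:

```c
#include <stdint.h>

/* Hypothetical reference model (not part of the extension): counts the
 * trailing zero bits of a 32-bit value.  Returns 32 when value is zero. */
static uint32_t count_trailing_zeros32(uint32_t value)
{
    uint32_t n = 0;

    /* Walk from the least significant bit up until a set bit is found
     * or the mask has been shifted out of the 32-bit range. */
    for (uint32_t mask = 1; mask != 0 && (value & mask) == 0; mask <<= 1)
        n++;
    return n;
}
```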


    genUType absoluteDifference(genUType x, genUType y)
    genUType absoluteDifference(genIType x, genIType y)
    genU64Type absoluteDifference(genU64Type x, genU64Type y)
    genU64Type absoluteDifference(genI64Type x, genI64Type y)
    genU16Type absoluteDifference(genU16Type x, genU16Type y)
    genU16Type absoluteDifference(genI16Type x, genI16Type y)

    Returns |x - y| clamped to the range of the return type (instead of modulo
    overflowing).  Note: the return type of each of these functions is an
    unsigned type of the same bit-size and vector element count.
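
    One way to picture why the return type is unsigned is a scalar C model of
    the 32-bit overloads.  The names below are hypothetical.  For 32-bit
    inputs the magnitude of the difference always fits in an unsigned 32-bit
    result, so this sketch needs no explicit clamp.

```c
#include <stdint.h>

/* Hypothetical reference model of the unsigned 32-bit overload. */
static uint32_t absolute_difference_u32(uint32_t x, uint32_t y)
{
    /* Always subtract the smaller value from the larger one. */
    return x > y ? x - y : y - x;
}

/* Hypothetical reference model of the signed 32-bit overload.  The
 * result is unsigned, so INT32_MIN vs. INT32_MAX yields 0xFFFFFFFF. */
static uint32_t absolute_difference_i32(int32_t x, int32_t y)
{
    /* The subtraction is done on unsigned values, where wraparound is
     * well defined in C and produces the exact magnitude. */
    return x > y ? (uint32_t)x - (uint32_t)y
                 : (uint32_t)y - (uint32_t)x;
}
```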


    genUType addSaturate(genUType x, genUType y)
    genIType addSaturate(genIType x, genIType y)
    genU64Type addSaturate(genU64Type x, genU64Type y)
    genI64Type addSaturate(genI64Type x, genI64Type y)
    genU16Type addSaturate(genU16Type x, genU16Type y)
    genI16Type addSaturate(genI16Type x, genI16Type y)

    Returns x + y clamped to the range of the type of x (instead of modulo
    overflowing).
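
    The clamping behavior can be illustrated with a scalar C model of the
    32-bit overloads.  This is a sketch with hypothetical names; the signed
    version widens to 64 bits as a convenience of the model, which says
    nothing about how hardware performs the operation.

```c
#include <stdint.h>

/* Hypothetical reference model of the unsigned 32-bit overload. */
static uint32_t add_saturate_u32(uint32_t x, uint32_t y)
{
    uint32_t sum = x + y;          /* wraps modulo 2^32 on overflow */

    /* A wrapped sum is smaller than either operand. */
    return sum < x ? UINT32_MAX : sum;
}

/* Hypothetical reference model of the signed 32-bit overload. */
static int32_t add_saturate_i32(int32_t x, int32_t y)
{
    int64_t sum = (int64_t)x + (int64_t)y;   /* cannot overflow */

    if (sum > INT32_MAX)
        return INT32_MAX;
    if (sum < INT32_MIN)
        return INT32_MIN;
    return (int32_t)sum;
}
```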


    genUType average(genUType x, genUType y)
    genIType average(genIType x, genIType y)
    genU64Type average(genU64Type x, genU64Type y)
    genI64Type average(genI64Type x, genI64Type y)
    genU16Type average(genU16Type x, genU16Type y)
    genI16Type average(genI16Type x, genI16Type y)

    Returns (x+y) >> 1.  The intermediate sum does not modulo overflow.
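
    A scalar C model of the 32-bit overloads shows what "does not modulo
    overflow" means in practice.  The names are hypothetical, and the sketch
    assumes that >> on a signed value is an arithmetic shift (rounding
    toward negative infinity), as GLSL defines it.

```c
#include <stdint.h>

/* Hypothetical reference model of the unsigned 32-bit overload. */
static uint32_t average_u32(uint32_t x, uint32_t y)
{
    /* The 64-bit intermediate keeps the carry out of bit 31. */
    return (uint32_t)(((uint64_t)x + (uint64_t)y) >> 1);
}

/* Hypothetical reference model of the signed 32-bit overload. */
static int32_t average_i32(int32_t x, int32_t y)
{
    int64_t sum = (int64_t)x + (int64_t)y;

    /* Clearing the low bit first makes the division exact, so the
     * result rounds toward negative infinity like an arithmetic shift
     * without relying on C's implementation-defined signed >>. */
    return (int32_t)((sum - (sum & 1)) / 2);
}
```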


    genUType averageRounded(genUType x, genUType y)
    genIType averageRounded(genIType x, genIType y)
    genU64Type averageRounded(genU64Type x, genU64Type y)
    genI64Type averageRounded(genI64Type x, genI64Type y)
    genU16Type averageRounded(genU16Type x, genU16Type y)
    genI16Type averageRounded(genI16Type x, genI16Type y)

    Returns (x+y+1) >> 1.  The intermediate sum does not modulo overflow.


    genUType subtractSaturate(genUType x, genUType y)
    genIType subtractSaturate(genIType x, genIType y)
    genU64Type subtractSaturate(genU64Type x, genU64Type y)
    genI64Type subtractSaturate(genI64Type x, genI64Type y)
    genU16Type subtractSaturate(genU16Type x, genU16Type y)
    genI16Type subtractSaturate(genI16Type x, genI16Type y)

    Returns x - y clamped to the range of the type of x (instead of modulo
    overflowing).
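
    For subtraction, the unsigned case clamps at zero and the signed case
    clamps at both ends of the range.  The following C model of the 32-bit
    overloads is a sketch with hypothetical names, not the extension's
    implementation.

```c
#include <stdint.h>

/* Hypothetical reference model of the unsigned 32-bit overload. */
static uint32_t subtract_saturate_u32(uint32_t x, uint32_t y)
{
    /* An unsigned difference can only underflow, so clamp at zero. */
    return x > y ? x - y : 0;
}

/* Hypothetical reference model of the signed 32-bit overload. */
static int32_t subtract_saturate_i32(int32_t x, int32_t y)
{
    int64_t diff = (int64_t)x - (int64_t)y;   /* cannot overflow */

    if (diff > INT32_MAX)
        return INT32_MAX;
    if (diff < INT32_MIN)
        return INT32_MIN;
    return (int32_t)diff;
}
```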


    genUType multiply32x16(genUType x_32_bits, genUType y_16_bits)
    genIType multiply32x16(genIType x_32_bits, genIType y_16_bits)
    genUType multiply32x16(genUType x_32_bits, genU16Type y_16_bits)
    genIType multiply32x16(genIType x_32_bits, genI16Type y_16_bits)

    Returns x * y, where only the (possibly sign-extended) low 16 bits of y
    are used.  In cases where one of the signed operands is known to be in the
    range [-2^15, (2^15)-1] or one of the unsigned operands is known to be in
    the range [0, (2^16)-1], this may provide a higher performance multiply.
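
    The effect of using only the low 16 bits of y can be sketched in C for
    the overloads that take a 32-bit y.  The names are hypothetical.  The
    sketch assumes the product wraps modulo 2^32 like an ordinary GLSL
    integer multiply, which the text above does not state, and it assumes a
    two's complement conversion from uint32_t to int32_t, as mainstream
    compilers provide.

```c
#include <stdint.h>

/* Hypothetical reference model of the unsigned overload: the upper
 * 16 bits of y are ignored. */
static uint32_t multiply32x16_u32(uint32_t x, uint32_t y)
{
    return x * (y & 0xFFFFu);
}

/* Hypothetical reference model of the signed overload: the low 16 bits
 * of y are sign-extended before the multiply. */
static int32_t multiply32x16_i32(int32_t x, int32_t y)
{
    int32_t low = (int32_t)((uint32_t)y & 0xFFFFu);   /* 0 .. 65535 */
    int32_t y16 = (low ^ 0x8000) - 0x8000;            /* -32768 .. 32767 */

    /* Multiply as unsigned so that overflow wraps instead of being
     * undefined behavior in C (assumption: wrapping result). */
    return (int32_t)((uint32_t)x * (uint32_t)y16);
}
```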

Interactions with OpenGL 4.6 and ARB_gl_spirv

    If OpenGL 4.6 or ARB_gl_spirv is supported, then
    SPV_INTEL_shader_integer_functions2 must also be supported.

    The IntegerFunctions2INTEL capability is available whenever the
    implementation supports INTEL_shader_integer_functions2.

Interactions with ARB_gpu_shader_int64 and EXT_shader_explicit_arithmetic_types_int64

    If the shader enables only INTEL_shader_integer_functions2 but not
    ARB_gpu_shader_int64 or EXT_shader_explicit_arithmetic_types_int64,
    remove all function overloads that have either genU64Type or genI64Type
    parameters.

Interactions with AMD_gpu_shader_int16 and EXT_shader_explicit_arithmetic_types_int16

    If the shader enables only INTEL_shader_integer_functions2 but not
    AMD_gpu_shader_int16 or EXT_shader_explicit_arithmetic_types_int16,
    remove all function overloads that have either genU16Type or genI16Type
    parameters.

Issues

    1) What should this extension be called?

    RESOLVED: There already exists a MESA_shader_integer_functions extension,
    so this is called INTEL_shader_integer_functions2 to prevent confusion.

    2) How does countLeadingZeros differ from findMSB?

    RESOLVED: countLeadingZeros is only defined for unsigned types, and it is
    equivalent to 32-(findMSB(x)+1).  This corresponds to the clz() function in
    OpenCL and the LZD (leading zero detection) instruction on Intel GPUs.
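
    The stated equivalence can be checked with a small C model.  Both
    function names are hypothetical; find_msb_u32 models GLSL's findMSB for
    an unsigned 32-bit input, which returns -1 for zero.

```c
#include <stdint.h>

/* Hypothetical model of findMSB for an unsigned 32-bit value: the bit
 * index of the most significant set bit, or -1 if value is zero. */
static int32_t find_msb_u32(uint32_t value)
{
    int32_t pos = -1;

    while (value != 0) {
        pos++;
        value >>= 1;
    }
    return pos;
}

/* countLeadingZeros expressed through findMSB, per the resolution:
 * 32 - (findMSB(x) + 1).  A zero input gives 32 - 0 = 32. */
static uint32_t clz_via_find_msb(uint32_t value)
{
    return (uint32_t)(32 - (find_msb_u32(value) + 1));
}
```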

    3) How does countTrailingZeros differ from findLSB?

    RESOLVED: countTrailingZeros is equivalent to min(genUType(findLSB(x)),
    32).  This corresponds to the ctz() function in OpenCL.
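
    The min() form works because findLSB returns -1 for a zero input, which
    becomes the largest unsigned value when converted.  A C sketch with
    hypothetical names:

```c
#include <stdint.h>

/* Hypothetical model of findLSB for a 32-bit value: the bit index of
 * the least significant set bit, or -1 if value is zero. */
static int32_t find_lsb_u32(uint32_t value)
{
    int32_t pos = 0;

    if (value == 0)
        return -1;
    while ((value & 1u) == 0) {
        value >>= 1;
        pos++;
    }
    return pos;
}

/* countTrailingZeros expressed through findLSB, per the resolution:
 * min(uint(findLSB(x)), 32).  The -1 for zero converts to 0xFFFFFFFF,
 * so the minimum selects 32. */
static uint32_t ctz_via_find_lsb(uint32_t value)
{
    uint32_t lsb = (uint32_t)find_lsb_u32(value);

    return lsb < 32u ? lsb : 32u;
}
```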

    4) Should 64-bit versions of countLeadingZeros and countTrailingZeros be
    provided?

    RESOLVED: NO.  OpenCL has 64-bit versions of clz() and ctz(), but OpenGL
    does not have 64-bit versions of findMSB() or findLSB() even when
    ARB_gpu_shader_int64 is supported.  The instructions used to implement
    countLeadingZeros and countTrailingZeros do not natively support 64-bit
    operands.

    The implementation of 64-bit countLeadingZeros() would be 5 instructions,
    and the implementation of 64-bit countTrailingZeros() would be 7
    instructions.  Neither of these is better than an application developer
    could achieve in GLSL:

        uint countLeadingZeros(uint64_t value)
        {
            // v.x holds the low 32 bits, v.y the high 32 bits.
            uvec2 v = unpackUint2x32(value);

            return v.y == 0
                ? 32 + countLeadingZeros(v.x) : countLeadingZeros(v.y);
        }

        uint countTrailingZeros(uint64_t value)
        {
            // v.x holds the low 32 bits, v.y the high 32 bits.
            uvec2 v = unpackUint2x32(value);

            return v.x == 0
                ? 32 + countTrailingZeros(v.y) : countTrailingZeros(v.x);
        }
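
    The same split into 32-bit halves can be exercised on the CPU.  The C
    sketch below mirrors the GLSL above with hypothetical helper names
    (clz32, ctz32, clz64_split, ctz64_split), so that the zero and
    boundary cases are easy to check.

```c
#include <stdint.h>

/* Hypothetical 32-bit helpers standing in for the built-ins. */
static uint32_t clz32(uint32_t v)
{
    uint32_t n = 0;
    for (uint32_t m = UINT32_C(1) << 31; m != 0 && (v & m) == 0; m >>= 1)
        n++;
    return n;                      /* 32 when v is zero */
}

static uint32_t ctz32(uint32_t v)
{
    uint32_t n = 0;
    for (uint32_t m = 1; m != 0 && (v & m) == 0; m <<= 1)
        n++;
    return n;                      /* 32 when v is zero */
}

/* 64-bit count built from the halves, as in the GLSL listing. */
static uint32_t clz64_split(uint64_t value)
{
    uint32_t lo = (uint32_t)value;          /* v.x in the GLSL code */
    uint32_t hi = (uint32_t)(value >> 32);  /* v.y in the GLSL code */

    return hi == 0 ? 32 + clz32(lo) : clz32(hi);
}

static uint32_t ctz64_split(uint64_t value)
{
    uint32_t lo = (uint32_t)value;
    uint32_t hi = (uint32_t)(value >> 32);

    return lo == 0 ? 32 + ctz32(hi) : ctz32(lo);
}
```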

    5) Should 64-bit versions of the arithmetic functions be provided?

    RESOLVED: NO.  Since recent generations of Intel GPUs have removed
    hardware support for 64-bit integer arithmetic, there doesn't seem to be
    much value in providing 64-bit arithmetic functions.

    6) Should this extension include average()?

    RESOLVED: YES.  average() corresponds to hadd() in OpenCL, and
    averageRounded() corresponds to rhadd() in OpenCL.

    averageRounded() corresponds to the AVG instruction on Intel GPUs.
    average(), on the other hand, does not correspond to a single instruction.
    The signed and unsigned versions may have slightly different
    implementations depending on the specific GPU.  In the worst case, the
    implementation is 4 instructions (e.g., averageRounded(x, y) - ((x ^ y) &
    1)), and in the best case it is 3 instructions.
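
    The worst-case expression relies on the rounded and truncated averages
    differing by one exactly when x + y is odd, that is, when the low bits
    of x and y differ.  A C sketch for unsigned 32-bit values, with
    hypothetical names, compares the derived form against a direct
    computation:

```c
#include <stdint.h>

/* Hypothetical model of averageRounded: (x + y + 1) >> 1 with a wide
 * intermediate so that the sum cannot wrap. */
static uint32_t average_rounded_u32(uint32_t x, uint32_t y)
{
    return (uint32_t)(((uint64_t)x + (uint64_t)y + 1u) >> 1);
}

/* average() derived from averageRounded(), as in the text above:
 * subtract one only when the low bits of x and y differ. */
static uint32_t average_from_rounded_u32(uint32_t x, uint32_t y)
{
    return average_rounded_u32(x, y) - ((x ^ y) & 1u);
}

/* Direct model of average(): (x + y) >> 1 without wrapping. */
static uint32_t average_direct_u32(uint32_t x, uint32_t y)
{
    return (uint32_t)(((uint64_t)x + (uint64_t)y) >> 1);
}
```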

Revision History

    Rev  Date         Author    Changes
    ---  -----------  --------  ---------------------------------------------
      1  04-Sep-2018  idr       Initial version.
      2  19-Sep-2018  idr       Add interactions with AMD_gpu_shader_int16.
      3  22-Jan-2019  idr       Add interactions with EXT_shader_explicit_arithmetic_types.
      4  14-Nov-2019  idr       Resolve issue #1 and issue #5.
      5  25-Nov-2019  idr       Fix a bunch of typos noticed by @cmarcelo.
