Name

    AMD_shader_ballot

Name Strings

    GL_AMD_shader_ballot

Contact

    Qun Lin, AMD (quentin.lin 'at' amd.com)

Contributors

    Qun Lin, AMD
    Graham Sellers, AMD
    Daniel Rakos, AMD
    Rex Xu, AMD
    Dominik Witczak, AMD

Status

    Shipping

Version

    Last Modified Date:         03/28/2018
    Author Revision:            5

Number

    ???

Dependencies

    This extension is written against the OpenGL Shading Language
    Specification, Version 4.50.

    This extension requires ARB_shader_group_vote and ARB_shader_ballot.

    This extension interacts with ARB_gpu_shader_int64.

    This extension interacts with AMD_gpu_shader_half_float.

    This extension interacts with AMD_gpu_shader_int16.

Overview

    The extensions ARB_shader_group_vote and ARB_shader_ballot introduced the
    concept of sub-groups and a set of operations that allow data exchange
    across shader invocations within a sub-group.

    This extension further extends the capabilities of these extensions with
    additional sub-group operations.

IP Status

    None.

New Procedures and Functions

    None.

New Tokens

    None.

Modifications to the OpenGL Shading Language Specification, Version 4.50

    Including the following line in a shader can be used to control the
    language features described in this extension:

      #extension GL_AMD_shader_ballot : <behavior>

    where <behavior> is as specified in section 3.3.

    New preprocessor #defines are added to the OpenGL Shading Language:

      #define GL_AMD_shader_ballot 1

Additions to Chapter 8 of the OpenGL Shading Language (GLSL) Specification,
version 4.50 (Built-in functions)

    Add Section 8.18, Shader Invocation Group Functions

    The <min>, <max>, and <add> group invocation functions combine the values
    of <v> across all active shader invocations in the sub-group using one of
    three group operations, described in the following table:

    Group Operation   Description
    ---------------   ---------------------------------------------------------
    Reduce            A reduction of the values of <v> across the sub-group.

    InclusiveScan     A binary operation with an identity <I> and <n> (where
                      <n> is the size of the sub-group) elements { a[0], a[1],
                      .., a[n-1] } resulting in { a[0], (a[0] op a[1]), ..,
                      (a[0] op a[1] op .. op a[n-1]) }. <op> may be any of
                      <min>, <max>, <add>.

    ExclusiveScan     A binary operation with an identity <I> and <n> (where
                      <n> is the size of the sub-group) elements { a[0], a[1],
                      .., a[n-1] } resulting in { I, a[0], (a[0] op a[1]), ..,
                      (a[0] op a[1] op .. op a[n-2]) }. <op> may be any of
                      <min>, <max>, <add>.
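
    The three group operations can be illustrated with a small host-side
    model. The Python sketch below is not GLSL and is not part of this
    extension. It assumes that every invocation in the sub-group is active and
    represents the sub-group as a list indexed by invocation. The helper names
    group_reduce, inclusive_scan and exclusive_scan are hypothetical.

```python
from functools import reduce
from itertools import accumulate
import operator


def group_reduce(values, op):
    """Reduce: every invocation receives op folded over all values."""
    if not values:
        return []
    total = reduce(op, values)
    return [total] * len(values)


def inclusive_scan(values, op):
    """InclusiveScan: invocation i receives a[0] op a[1] op .. op a[i]."""
    return list(accumulate(values, op))


def exclusive_scan(values, op, identity):
    """ExclusiveScan: invocation 0 receives the identity,
    invocation i receives a[0] op a[1] op .. op a[i-1]."""
    if not values:
        return []
    return [identity] + list(accumulate(values[:-1], op))


# One value of <v> per invocation (assumption: all 8 invocations are active).
lanes = [3, 1, 4, 1, 5, 9, 2, 6]

print(group_reduce(lanes, operator.add))       # [31, 31, 31, 31, 31, 31, 31, 31]
print(inclusive_scan(lanes, operator.add))     # [3, 4, 8, 9, 14, 23, 25, 31]
print(exclusive_scan(lanes, operator.add, 0))  # [0, 3, 4, 8, 9, 14, 23, 25]
```

    In this model the exclusive scan is the inclusive scan shifted by one
    invocation, with the identity filling the first slot.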

    The identity <I> in the group operations <InclusiveScan> and <ExclusiveScan>
    is decided according to the following table:

    Function   Data Type                             Identity
    --------   -----------------------------------   ----------
    Min        32-bit signed integer                 INT_MAX
               64-bit signed integer                 INT64_MAX
               32-bit unsigned integer               UINT_MAX
               64-bit unsigned integer               UINT64_MAX
               16-bit/32-bit/64-bit floating-point   +INF

    Max        32-bit signed integer                 INT_MIN
               64-bit signed integer                 INT64_MIN
               32-bit/64-bit unsigned integer        0
               floating-point                        -INF

    Add        32-bit/64-bit signed integer          0
               32-bit/64-bit unsigned integer        0
               16-bit/32-bit/64-bit floating-point   0
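
    Each identity in the table is the neutral element of its function:
    combining it with any representable value returns that value unchanged.
    The Python sketch below transcribes the table and checks this property on
    the host. It is an illustration only; the data-type labels ("int32",
    "float", and so on) and the helper is_neutral are hypothetical shorthand,
    not GLSL names.

```python
import operator

INT_MAX, INT_MIN = 2**31 - 1, -(2**31)
INT64_MAX, INT64_MIN = 2**63 - 1, -(2**63)
UINT_MAX, UINT64_MAX = 2**32 - 1, 2**64 - 1
INF = float("inf")

# (function, data type) -> identity, transcribed from the table above.
# The data-type labels are hypothetical shorthand for this sketch.
IDENTITY = {
    ("min", "int32"): INT_MAX,
    ("min", "int64"): INT64_MAX,
    ("min", "uint32"): UINT_MAX,
    ("min", "uint64"): UINT64_MAX,
    ("min", "float"): INF,
    ("max", "int32"): INT_MIN,
    ("max", "int64"): INT64_MIN,
    ("max", "uint32"): 0,
    ("max", "uint64"): 0,
    ("max", "float"): -INF,
    ("add", "int32"): 0,
    ("add", "int64"): 0,
    ("add", "uint32"): 0,
    ("add", "uint64"): 0,
    ("add", "float"): 0.0,
}

OPS = {"min": min, "max": max, "add": operator.add}


def is_neutral(func, dtype, sample):
    """True if combining the identity with sample leaves sample unchanged."""
    return OPS[func](IDENTITY[(func, dtype)], sample) == sample


print(is_neutral("min", "int32", -7))    # True
print(is_neutral("max", "float", 3.25))  # True
```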

    +------------------------------------------------------+-----------------------------------------------------------+
    | Syntax                                               | Description                                               |
    +------------------------------------------------------+-----------------------------------------------------------+
    | genType  minInvocationsAMD(genType  v)               | Returns the minimum value of <v> across all active shader |
    | genIType minInvocationsAMD(genIType v)               | invocations in the sub-group with <Reduce> group          |
    | genUType minInvocationsAMD(genUType v)               | operation. These functions must be used in uniform        |
    | genDType minInvocationsAMD(genDType v)               | control flow. These functions operate component-wise.     |
    +------------------------------------------------------+-----------------------------------------------------------+
    | genType  minInvocationsNonUniformAMD(genType  v)     | Returns the minimum value of <v> across all active shader |
    | genIType minInvocationsNonUniformAMD(genIType v)     | invocations in the sub-group with <Reduce> group          |
    | genUType minInvocationsNonUniformAMD(genUType v)     | operation. These functions may be used in non-uniform     |
    | genDType minInvocationsNonUniformAMD(genDType v)     | control flow. These functions operate component-wise.     |
    +------------------------------------------------------+-----------------------------------------------------------+
    | genType  minInvocationsInclusiveScanAMD(genType  v)  | Returns the minimum value of <v> across all active shader |
    | genIType minInvocationsInclusiveScanAMD(genIType v)  | invocations in the sub-group with <InclusiveScan> group   |
    | genUType minInvocationsInclusiveScanAMD(genUType v)  | operation. These functions must be used in uniform        |
    | genDType minInvocationsInclusiveScanAMD(genDType v)  | control flow. These functions operate component-wise.     |
    |                                                      |                                                           |
    |                                                      |                                                           |
    |                                                      |                                                           |
    |                                                      |                                                           |
    +------------------------------------------------------+-----------------------------------------------------------+
    | genType  minInvocationsInclusiveScanNonUniformAMD(   | Returns the minimum value of <v> across all active shader |
    |          genType  v)                                 | invocations in the sub-group with <InclusiveScan> group   |
    | genIType minInvocationsInclusiveScanNonUniformAMD(   | operation. These functions may be used in non-uniform     |
    |          genIType v)                                 | control flow. These functions operate component-wise.     |
    | genUType minInvocationsInclusiveScanNonUniformAMD(   |                                                           |
    |          genUType v)                                 |                                                           |
    | genDType minInvocationsInclusiveScanNonUniformAMD(   |                                                           |
    |          genDType v)                                 |                                                           |
    +------------------------------------------------------+-----------------------------------------------------------+
    | genType  minInvocationsExclusiveScanAMD(genType  v)  | Returns the minimum value of <v> across all active shader |
    | genIType minInvocationsExclusiveScanAMD(genIType v)  | invocations in the sub-group with <ExclusiveScan> group   |
    | genUType minInvocationsExclusiveScanAMD(genUType v)  | operation. These functions must be used in uniform        |
    | genDType minInvocationsExclusiveScanAMD(genDType v)  | control flow. These functions operate component-wise.     |
    |                                                      |                                                           |
    |                                                      |                                                           |
    |                                                      |                                                           |
    |                                                      |                                                           |
    +------------------------------------------------------+-----------------------------------------------------------+
    | genType  minInvocationsExclusiveScanNonUniformAMD(   | Returns the minimum value of <v> across all active shader |
    |          genType  v)                                 | invocations in the sub-group with <ExclusiveScan> group   |
    | genIType minInvocationsExclusiveScanNonUniformAMD(   | operation. These functions may be used in non-uniform     |
    |          genIType v)                                 | control flow. These functions operate component-wise.     |
    | genUType minInvocationsExclusiveScanNonUniformAMD(   |                                                           |
    |          genUType v)                                 |                                                           |
    | genDType minInvocationsExclusiveScanNonUniformAMD(   |                                                           |
    |          genDType v)                                 |                                                           |
    +------------------------------------------------------+-----------------------------------------------------------+
    | genType  maxInvocationsAMD(genType  v)               | Returns the maximum value of <v> across all active shader |
    | genIType maxInvocationsAMD(genIType v)               | invocations in the sub-group with <Reduce> group          |
    | genUType maxInvocationsAMD(genUType v)               | operation. These functions must be used in uniform        |
    | genDType maxInvocationsAMD(genDType v)               | control flow. These functions operate component-wise.     |
    +------------------------------------------------------+-----------------------------------------------------------+
    | genType  maxInvocationsNonUniformAMD(genType  v)     | Returns the maximum value of <v> across all active shader |
    | genIType maxInvocationsNonUniformAMD(genIType v)     | invocations in the sub-group with <Reduce> group          |
    | genUType maxInvocationsNonUniformAMD(genUType v)     | operation. These functions may be used in non-uniform     |
    | genDType maxInvocationsNonUniformAMD(genDType v)     | control flow. These functions operate component-wise.     |
    +------------------------------------------------------+-----------------------------------------------------------+
    | genType  maxInvocationsInclusiveScanAMD(genType  v)  | Returns the maximum value of <v> across all active shader |
    | genIType maxInvocationsInclusiveScanAMD(genIType v)  | invocations in the sub-group with <InclusiveScan> group   |
    | genUType maxInvocationsInclusiveScanAMD(genUType v)  | operation. These functions must be used in uniform        |
    | genDType maxInvocationsInclusiveScanAMD(genDType v)  | control flow. These functions operate component-wise.     |
    |                                                      |                                                           |
    |                                                      |                                                           |
    |                                                      |                                                           |
    |                                                      |                                                           |
    +------------------------------------------------------+-----------------------------------------------------------+
    | genType  maxInvocationsInclusiveScanNonUniformAMD(   | Returns the maximum value of <v> across all active shader |
    |          genType  v)                                 | invocations in the sub-group with <InclusiveScan> group   |
    | genIType maxInvocationsInclusiveScanNonUniformAMD(   | operation. These functions may be used in non-uniform     |
    |          genIType v)                                 | control flow. These functions operate component-wise.     |
    | genUType maxInvocationsInclusiveScanNonUniformAMD(   |                                                           |
    |          genUType v)                                 |                                                           |
    | genDType maxInvocationsInclusiveScanNonUniformAMD(   |                                                           |
    |          genDType v)                                 |                                                           |
    +------------------------------------------------------+-----------------------------------------------------------+
    | genType  maxInvocationsExclusiveScanAMD(genType  v)  | Returns the maximum value of <v> across all active shader |
    | genIType maxInvocationsExclusiveScanAMD(genIType v)  | invocations in the sub-group with <ExclusiveScan> group   |
    | genUType maxInvocationsExclusiveScanAMD(genUType v)  | operation. These functions must be used in uniform        |
    | genDType maxInvocationsExclusiveScanAMD(genDType v)  | control flow. These functions operate component-wise.     |
    |                                                      |                                                           |
    |                                                      |                                                           |
    |                                                      |                                                           |
    |                                                      |                                                           |
    +------------------------------------------------------+-----------------------------------------------------------+
    | genType  maxInvocationsExclusiveScanNonUniformAMD(   | Returns the maximum value of <v> across all active shader |
    |          genType  v)                                 | invocations in the sub-group with <ExclusiveScan> group   |
    | genIType maxInvocationsExclusiveScanNonUniformAMD(   | operation. These functions may be used in non-uniform     |
    |          genIType v)                                 | control flow. These functions operate component-wise.     |
    | genUType maxInvocationsExclusiveScanNonUniformAMD(   |                                                           |
    |          genUType v)                                 |                                                           |
    | genDType maxInvocationsExclusiveScanNonUniformAMD(   |                                                           |
    |          genDType v)                                 |                                                           |
    +------------------------------------------------------+-----------------------------------------------------------+
    | genType  addInvocationsAMD(genType  v)               | Returns the sum of the value of <v> across all active     |
    | genIType addInvocationsAMD(genIType v)               | shader invocations in the sub-group with <Reduce> group   |
    | genUType addInvocationsAMD(genUType v)               | operation. These functions must be used in uniform        |
    | genDType addInvocationsAMD(genDType v)               | control flow. These functions operate component-wise.     |
    +------------------------------------------------------+-----------------------------------------------------------+
    | genType  addInvocationsNonUniformAMD(genType  v)     | Returns the sum of the value of <v> across all active     |
    | genIType addInvocationsNonUniformAMD(genIType v)     | shader invocations in the sub-group with <Reduce> group   |
    | genUType addInvocationsNonUniformAMD(genUType v)     | operation. These functions may be used in non-uniform     |
    | genDType addInvocationsNonUniformAMD(genDType v)     | control flow. These functions operate component-wise.     |
    +------------------------------------------------------+-----------------------------------------------------------+
    | genType  addInvocationsInclusiveScanAMD(genType  v)  | Returns the sum of the value of <v> across all active     |
    | genIType addInvocationsInclusiveScanAMD(genIType v)  | shader invocations in the sub-group with <InclusiveScan>  |
    | genUType addInvocationsInclusiveScanAMD(genUType v)  | group operation. These functions must be used in uniform  |
    | genDType addInvocationsInclusiveScanAMD(genDType v)  | control flow. These functions operate component-wise.     |
    |                                                      |                                                           |
    |                                                      |                                                           |
    |                                                      |                                                           |
    |                                                      |                                                           |
    +------------------------------------------------------+-----------------------------------------------------------+
    | genType  addInvocationsInclusiveScanNonUniformAMD(   | Returns the sum of the value of <v> across all active     |
    |          genType  v)                                 | shader invocations in the sub-group with <InclusiveScan>  |
    | genIType addInvocationsInclusiveScanNonUniformAMD(   | group operation. These functions may be used in           |
    |          genIType v)                                 | non-uniform control flow. These functions operate         |
    | genUType addInvocationsInclusiveScanNonUniformAMD(   | component-wise.                                           |
    |          genUType v)                                 |                                                           |
    | genDType addInvocationsInclusiveScanNonUniformAMD(   |                                                           |
    |          genDType v)                                 |                                                           |
    +------------------------------------------------------+-----------------------------------------------------------+
    | genType  addInvocationsExclusiveScanAMD(genType  v)  | Returns the sum of the value of <v> across all active     |
    | genIType addInvocationsExclusiveScanAMD(genIType v)  | shader invocations in the sub-group with <ExclusiveScan>  |
    | genUType addInvocationsExclusiveScanAMD(genUType v)  | group operation. These functions must be used in uniform  |
    | genDType addInvocationsExclusiveScanAMD(genDType v)  | control flow. These functions operate component-wise.     |
    |                                                      |                                                           |
    |                                                      |                                                           |
    |                                                      |                                                           |
    |                                                      |                                                           |
    +------------------------------------------------------+-----------------------------------------------------------+
    | genType  addInvocationsExclusiveScanNonUniformAMD(   | Returns the sum of the value of <v> across all active     |
    |          genType  v)                                 | shader invocations in the sub-group with <ExclusiveScan>  |
    | genIType addInvocationsExclusiveScanNonUniformAMD(   | group operation. These functions may be used in           |
    |          genIType v)                                 | non-uniform control flow. These functions operate         |
    | genUType addInvocationsExclusiveScanNonUniformAMD(   | component-wise.                                           |
    |          genUType v)                                 |                                                           |
    | genDType addInvocationsExclusiveScanNonUniformAMD(   |                                                           |
    |          genDType v)                                 |                                                           |
    +------------------------------------------------------+-----------------------------------------------------------+
    | genType  swizzleInvocationsAMD(                      | Swizzles data within a group of 4 consecutive invocations |
    |          genType data, uvec4 offset)                 | of the sub-group based on <offset> as described below:    |
    | genIType swizzleInvocationsAMD(                      |                                                           |
    |          genIType data, uvec4 offset)                | for (i = 0; i < gl_SubGroupSizeARB; i+=4) {               |
    | genUType swizzleInvocationsAMD(                      |     dataOut[i+0] = isActive[i+offset.x] ?                 |
    |          genUType data, uvec4 offset)                |                    dataIn[i+offset.x] : 0;                |
    |                                                      |     dataOut[i+1] = isActive[i+offset.y] ?                 |
    |                                                      |                    dataIn[i+offset.y] : 0;                |
    |                                                      |     dataOut[i+2] = isActive[i+offset.z] ?                 |
    |                                                      |                    dataIn[i+offset.z] : 0;                |
    |                                                      |     dataOut[i+3] = isActive[i+offset.w] ?                 |
    |                                                      |                    dataIn[i+offset.w] : 0;                |
    |                                                      | }                                                         |
    |                                                      |                                                           |
    |                                                      | Where:                                                    |
    |                                                      | - isActive[i] tells whether the invocation with the index |
    |                                                      |   <i> is currently active in the sub-group.               |
    |                                                      | - dataIn[i] is the value of <data> for invocation index   |
    |                                                      |   <i>.                                                    |
    |                                                      | - dataOut[i] is the return value of the function for      |
    |                                                      |   invocation index <i>.                                   |
    |                                                      |                                                           |
    |                                                      | Components of <offset> must be constant integer           |
    |                                                      | expressions with values in the range [0, 3].              |
    +------------------------------------------------------+-----------------------------------------------------------+
    | genType  swizzleInvocationsMaskedAMD(                | Swizzles data within a group of 32 consecutive            |
    |          genType data, uvec3 mask)                   | invocations with a limited mask as described below:       |
    | genIType swizzleInvocationsMaskedAMD(                |                                                           |
    |          genIType data, uvec3 mask)                  | for (i = 0; i < gl_SubGroupSizeARB; i++) {                |
    | genUType swizzleInvocationsMaskedAMD(                |     j = (((i & 0x1f) & mask.x) | mask.y) ^ mask.z;        |
    |          genUType data, uvec3 mask)                  |     j |= (i & 0x20); // which group of 32                 |
    |                                                      |     dataOut[i] = isActive[j] ? dataIn[j] : 0;             |
    |                                                      | }                                                         |
    |                                                      |                                                           |
    |                                                      | Where:                                                    |
    |                                                      | - isActive[i] tells whether the invocation with the index |
    |                                                      |   <i> is currently active in the sub-group.               |
    |                                                      | - dataIn[i] is the value of <data> for invocation index   |
    |                                                      |   <i>.                                                    |
    |                                                      | - dataOut[i] is the return value of the function for      |
    |                                                      |   invocation index <i>.                                   |
    |                                                      |                                                           |
    |                                                      | Components of <mask> must be constant integer expressions |
    |                                                      | with values in the range [0, 31].                         |
    +------------------------------------------------------+-----------------------------------------------------------+
    | genType  writeInvocationAMD(                         | Returns <inputValue> for all active invocations in the    |
    |          genType  inputValue,                        | sub-group except for the invocation whose invocation      |
    |          genType  writeValue,                        | index within the sub-group is <invocationIndex> for which |
    |          uint     invocationIndex)                   | <writeValue> is returned as described below:              |
    | genIType writeInvocationAMD(                         |                                                           |
    |          genIType inputValue,                        | for (i = 0; i < gl_SubGroupSizeARB; i++) {                |
    |          genIType writeValue,                        |     out[i] = (i == invocationIndex) ?                     |
    |          uint     invocationIndex)                   |              writeValue : inputValue;                     |
    | genUType writeInvocationAMD(                         | }                                                         |
    |          genUType inputValue,                        |                                                           |
    |          genUType writeValue,                        | Where out[i] is the return value of the function for      |
    |          uint     invocationIndex)                   | invocation index <i>.                                     |
    |                                                      |                                                           |
    |                                                      | <writeValue> and <invocationIndex> must be dynamically    |
    |                                                      | uniform within the sub-group, otherwise the return value  |
    |                                                      | of the function is undefined.                             |
    +------------------------------------------------------+-----------------------------------------------------------+
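
    The pseudocode for the swizzle and write functions in the table above can
    be exercised on the host. The Python sketch below is an illustrative model
    only, not GLSL and not part of this extension. It assumes a sub-group of
    32 or 64 invocations held in a list, and the function names are
    hypothetical. Unlike the hardware, the model also fills in output slots
    for inactive invocations.

```python
def swizzle_invocations(data_in, is_active, offset):
    """Model of swizzleInvocationsAMD: permute within each group of 4."""
    if len(data_in) % 4 or any(not 0 <= o <= 3 for o in offset):
        raise ValueError("need a multiple of 4 lanes and offsets in [0, 3]")
    out = [0] * len(data_in)
    for i in range(0, len(data_in), 4):
        for k in range(4):
            src = i + offset[k]
            # An inactive source invocation contributes 0.
            out[i + k] = data_in[src] if is_active[src] else 0
    return out


def swizzle_invocations_masked(data_in, is_active, mask):
    """Model of swizzleInvocationsMaskedAMD: remap within each group of 32."""
    # Assumption of this sketch: the sub-group has 32 or 64 invocations.
    if len(data_in) not in (32, 64) or any(not 0 <= m <= 31 for m in mask):
        raise ValueError("need 32 or 64 lanes and mask values in [0, 31]")
    mask_x, mask_y, mask_z = mask
    out = [0] * len(data_in)
    for i in range(len(data_in)):
        j = (((i & 0x1F) & mask_x) | mask_y) ^ mask_z
        j |= i & 0x20  # stay within the same group of 32
        out[i] = data_in[j] if is_active[j] else 0
    return out


def write_invocation(input_values, write_value, invocation_index):
    """Model of writeInvocationAMD: one lane gets write_value, the rest keep
    their own input value."""
    return [write_value if i == invocation_index else value
            for i, value in enumerate(input_values)]


lanes = list(range(64))
all_active = [True] * 64

quad_swap = swizzle_invocations(lanes, all_active, (1, 0, 3, 2))
pair_swap = swizzle_invocations_masked(lanes, all_active, (0x1F, 0, 1))
print(quad_swap[:8])                            # [1, 0, 3, 2, 5, 4, 7, 6]
print(pair_swap[:4])                            # [1, 0, 3, 2]
print(write_invocation([10, 20, 30, 40], 99, 2))  # [10, 20, 99, 40]
```

    With the mask (0x1F, 0, 1) the masked swizzle reduces to j = i ^ 1, which
    exchanges each pair of neighbouring invocations. A mask of (0, n, 0)
    broadcasts invocation n of each group of 32 to that whole group.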

Dependencies on ARB_gpu_shader_int64

    If the shader enables ARB_gpu_shader_int64, this extension adds additional
    shader invocation group functions.

    Add Section 8.18, Shader Invocation Group Functions

    +------------------------------------------------------+-----------------------------------------------------------+
    | Syntax                                               | Description                                               |
    +------------------------------------------------------+-----------------------------------------------------------+
    | genI64Type minInvocationsAMD(genI64Type v)           | Returns the minimum value of <v> across all active shader |
    | genU64Type minInvocationsAMD(genU64Type v)           | invocations in the sub-group with <Reduce> group          |
    |                                                      | operation. These functions must be used in uniform        |
    |                                                      | control flow. These functions operate component-wise.     |
    +------------------------------------------------------+-----------------------------------------------------------+
    | genI64Type minInvocationsNonUniformAMD(genI64Type v) | Returns the minimum value of <v> across all active shader |
    | genU64Type minInvocationsNonUniformAMD(genU64Type v) | invocations in the sub-group with <Reduce> group          |
    |                                                      | operation. These functions may be used in non-uniform     |
    |                                                      | control flow. These functions operate component-wise.     |
    +------------------------------------------------------+-----------------------------------------------------------+
    | genI64Type minInvocationsInclusiveScanAMD(           | Returns the minimum value of <v> across all active shader |
    |            genI64Type v)                             | invocations in the sub-group with <InclusiveScan> group   |
    | genU64Type minInvocationsInclusiveScanAMD(           | operation. These functions must be used in uniform        |
    |            genU64Type v)                             | control flow. These functions operate component-wise.     |
    +------------------------------------------------------+-----------------------------------------------------------+
    | genI64Type minInvocationsInclusiveScanNonUniformAMD( | Returns the minimum value of <v> across all active shader |
    |            genI64Type v)                             | invocations in the sub-group with <InclusiveScan> group   |
    | genU64Type minInvocationsInclusiveScanNonUniformAMD( | operation. These functions may be used in non-uniform     |
    |            genU64Type v)                             | control flow. These functions operate component-wise.     |
    +------------------------------------------------------+-----------------------------------------------------------+
    | genI64Type minInvocationsExclusiveScanAMD(           | Returns the minimum value of <v> across all active shader |
    |            genI64Type v)                             | invocations in the sub-group with <ExclusiveScan> group   |
    | genU64Type minInvocationsExclusiveScanAMD(           | operation. These functions must be used in uniform        |
    |            genU64Type v)                             | control flow. These functions operate component-wise.     |
    +------------------------------------------------------+-----------------------------------------------------------+
    | genI64Type minInvocationsExclusiveScanNonUniformAMD( | Returns the minimum value of <v> across all active shader |
    |            genI64Type v)                             | invocations in the sub-group with <ExclusiveScan> group   |
    | genU64Type minInvocationsExclusiveScanNonUniformAMD( | operation. These functions may be used in non-uniform     |
    |            genU64Type v)                             | control flow. These functions operate component-wise.     |
    +------------------------------------------------------+-----------------------------------------------------------+
    | genI64Type maxInvocationsAMD(genI64Type v)           | Returns the maximum value of <v> across all active shader |
    | genU64Type maxInvocationsAMD(genU64Type v)           | invocations in the sub-group with <Reduce> group          |
    |                                                      | operation. These functions must be used in uniform        |
    |                                                      | control flow. These functions operate component-wise.     |
375    +------------------------------------------------------+-----------------------------------------------------------+
376    | genI64Type maxInvocationsNonUniformAMD(genI64Type v) | Returns the maximum value of <v> across all active shader |
377    | genU64Type maxInvocationsNonUniformAMD(genU64Type v) | invocations in the sub-group with <Reduce> group          |
378    |                                                      | operation. These functions could be used in non-uniform   |
379    |                                                      | control flow. These functions operate component-wise.     |
380    +------------------------------------------------------+-----------------------------------------------------------+
381    | genI64Type maxInvocationsInclusiveScanAMD(           | Returns the maximum value of <v> across all active shader |
382    |            genI64Type v)                             | invocations in the sub-group with <InclusiveScan> group   |
383    | genU64Type maxInvocationsInclusiveScanAMD(           | operation. These functions must be used in uniform        |
384    |            genU64Type v)                             | control flow. These functions operate component-wise.     |
385    +------------------------------------------------------+-----------------------------------------------------------+
386    | genI64Type maxInvocationsInclusiveScanNonUniformAMD( | Returns the maximum value of <v> across all active shader |
387    |            genI64Type v)                             | invocations in the sub-group with <InclusiveScan> group   |
388    | genU64Type maxInvocationsInclusiveScanNonUniformAMD( | operation. These functions could be used in non-uniform   |
389    |            genU64Type v)                             | control flow. These functions operate component-wise.     |
390    +------------------------------------------------------+-----------------------------------------------------------+
391    | genI64Type maxInvocationsExclusiveScanAMD(           | Returns the maximum value of <v> across all active shader |
392    |            genI64Type v)                             | invocations in the sub-group with <ExclusiveScan> group   |
393    | genU64Type maxInvocationsExclusiveScanAMD(           | operation. These functions must be used in uniform        |
394    |            genU64Type v)                             | control flow. These functions operate component-wise.     |
395    +------------------------------------------------------+-----------------------------------------------------------+
396    | genI64Type maxInvocationsExclusiveScanNonUniformAMD( | Returns the maximum value of <v> across all active shader |
397    |            genI64Type v)                             | invocations in the sub-group with <ExclusiveScan> group   |
398    | genU64Type maxInvocationsExclusiveScanNonUniformAMD( | operation. These functions could be used in non-uniform   |
399    |            genU64Type v)                             | control flow. These functions operate component-wise.     |
400    +------------------------------------------------------+-----------------------------------------------------------+
401    | genI64Type addInvocationsAMD(genI64Type v)           | Returns the sum of the value of <v> across all active     |
402    | genU64Type addInvocationsAMD(genU64Type v)           | shader invocations in the sub-group with <Reduce> group   |
403    |                                                      | operation. These functions must be used in uniform        |
404    |                                                      | control flow. These functions operate component-wise.     |
405    +------------------------------------------------------+-----------------------------------------------------------+
406    | genI64Type addInvocationsNonUniformAMD(genI64Type v) | Returns the sum of the value of <v> across all active     |
407    | genU64Type addInvocationsNonUniformAMD(genU64Type v) | shader invocations in the sub-group with <Reduce> group   |
408    |                                                      | operation. These functions could be used in non-uniform   |
409    |                                                      | control flow. These functions operate component-wise.     |
410    +------------------------------------------------------+-----------------------------------------------------------+
411    | genI64Type addInvocationsInclusiveScanAMD(           | Returns the sum of the value of <v> across all active     |
412    |            genI64Type v)                             | shader invocations in the sub-group with <InclusiveScan>  |
413    | genU64Type addInvocationsInclusiveScanAMD(           | group operation. These functions must be used in uniform  |
414    |            genU64Type v)                             | control flow. These functions operate component-wise.     |
415    +------------------------------------------------------+-----------------------------------------------------------+
416    | genI64Type addInvocationsInclusiveScanNonUniformAMD( | Returns the sum of the value of <v> across all active     |
417    |            genI64Type v)                             | shader invocations in the sub-group with <InclusiveScan>  |
418    | genU64Type addInvocationsInclusiveScanNonUniformAMD( | group operation. These functions could be used in         |
419    |            genU64Type v)                             | non-uniform control flow. These functions operate         |
420    |                                                      | component-wise.                                           |
421    +------------------------------------------------------+-----------------------------------------------------------+
422    | genI64Type addInvocationsExclusiveScanAMD(           | Returns the sum of the value of <v> across all active     |
423    |            genI64Type v)                             | shader invocations in the sub-group with <ExclusiveScan>  |
424    | genU64Type addInvocationsExclusiveScanAMD(           | group operation. These functions must be used in uniform  |
425    |            genU64Type v)                             | control flow. These functions operate component-wise.     |
426    +------------------------------------------------------+-----------------------------------------------------------+
427    | genI64Type addInvocationsExclusiveScanNonUniformAMD( | Returns the sum of the value of <v> across all active     |
428    |            genI64Type  v)                            | shader invocations in the sub-group with <ExclusiveScan>  |
429    | genU64Type addInvocationsExclusiveScanNonUniformAMD( | group operation. These functions could be used in         |
430    |            genU64Type v)                             | non-uniform control flow. These functions operate         |
431    |                                                      | component-wise.                                           |
432    +------------------------------------------------------+-----------------------------------------------------------+
433    | uint mbcntAMD(uint64_t mask)                         | Returns the bit count of gl_SubGroupLtMaskARB with <mask> |
434    |                                                      | as described below:                                       |
435    |                                                      |                                                           |
436    |                                                      |   bitCount(gl_SubGroupLtMaskARB & mask).                  |
437    +------------------------------------------------------+-----------------------------------------------------------+
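
    The mbcntAMD entry above amounts to a masked population count. The
    following is a minimal CPU-side sketch in C, not driver code. It assumes
    a sub-group of at most 64 invocations and that gl_SubGroupLtMaskARB has
    one bit set for every invocation whose index is lower than the caller's,
    as ARB_shader_ballot describes it. The names lt_mask_model, bit_count64,
    mbcnt_model and the parameter invocation_id are hypothetical and are not
    part of this extension.

```c
#include <stdint.h>

/* Hypothetical stand-in for gl_SubGroupLtMaskARB: bits [0, invocation_id)
 * are set. Assumes invocation_id < 64. */
static uint64_t lt_mask_model(unsigned invocation_id)
{
    return (invocation_id == 0) ? 0 : (~UINT64_C(0) >> (64 - invocation_id));
}

/* Portable population count (clears the lowest set bit each iteration). */
static unsigned bit_count64(uint64_t x)
{
    unsigned n = 0;
    while (x) {
        x &= x - 1;
        ++n;
    }
    return n;
}

/* Models mbcntAMD(mask) as seen by the invocation with the given index. */
static unsigned mbcnt_model(unsigned invocation_id, uint64_t mask)
{
    return bit_count64(lt_mask_model(invocation_id) & mask);
}
```

    If <mask> marks the invocations that want to write an output element,
    the result is the caller's position among those writers. For example,
    with mask 0xB (invocations 0, 1 and 3), mbcnt_model(3, 0xB) is 2.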

Dependencies on AMD_gpu_shader_half_float

    If the shader enables AMD_gpu_shader_half_float, this extension adds
    additional shader invocation group functions.

    Add to Section 8.18, Shader Invocation Group Functions

    +------------------------------------------------------+-----------------------------------------------------------+
    | Syntax                                               | Description                                               |
    +------------------------------------------------------+-----------------------------------------------------------+
    | genF16Type minInvocationsAMD(genF16Type v)           | Returns the minimum value of <v> across all active shader |
    |                                                      | invocations in the sub-group with <Reduce> group          |
    |                                                      | operation. These functions must be used in uniform        |
    |                                                      | control flow. These functions operate component-wise.     |
    +------------------------------------------------------+-----------------------------------------------------------+
    | genF16Type minInvocationsNonUniformAMD(genF16Type v) | Returns the minimum value of <v> across all active shader |
    |                                                      | invocations in the sub-group with <Reduce> group          |
    |                                                      | operation. These functions can be used in non-uniform     |
    |                                                      | control flow. These functions operate component-wise.     |
    +------------------------------------------------------+-----------------------------------------------------------+
    | genF16Type minInvocationsInclusiveScanAMD(           | Returns the minimum value of <v> across all active shader |
    |            genF16Type v)                             | invocations in the sub-group with <InclusiveScan> group   |
    |                                                      | operation. These functions must be used in uniform        |
    |                                                      | control flow. These functions operate component-wise.     |
    +------------------------------------------------------+-----------------------------------------------------------+
    | genF16Type minInvocationsInclusiveScanNonUniformAMD( | Returns the minimum value of <v> across all active shader |
    |            genF16Type v)                             | invocations in the sub-group with <InclusiveScan> group   |
    |                                                      | operation. These functions can be used in non-uniform     |
    |                                                      | control flow. These functions operate component-wise.     |
    +------------------------------------------------------+-----------------------------------------------------------+
    | genF16Type minInvocationsExclusiveScanAMD(           | Returns the minimum value of <v> across all active shader |
    |            genF16Type v)                             | invocations in the sub-group with <ExclusiveScan> group   |
    |                                                      | operation. These functions must be used in uniform        |
    |                                                      | control flow. These functions operate component-wise.     |
    +------------------------------------------------------+-----------------------------------------------------------+
    | genF16Type minInvocationsExclusiveScanNonUniformAMD( | Returns the minimum value of <v> across all active shader |
    |            genF16Type v)                             | invocations in the sub-group with <ExclusiveScan> group   |
    |                                                      | operation. These functions can be used in non-uniform     |
    |                                                      | control flow. These functions operate component-wise.     |
    +------------------------------------------------------+-----------------------------------------------------------+
    | genF16Type maxInvocationsAMD(genF16Type v)           | Returns the maximum value of <v> across all active shader |
    |                                                      | invocations in the sub-group with <Reduce> group          |
    |                                                      | operation. These functions must be used in uniform        |
    |                                                      | control flow. These functions operate component-wise.     |
    +------------------------------------------------------+-----------------------------------------------------------+
    | genF16Type maxInvocationsNonUniformAMD(genF16Type v) | Returns the maximum value of <v> across all active shader |
    |                                                      | invocations in the sub-group with <Reduce> group          |
    |                                                      | operation. These functions can be used in non-uniform     |
    |                                                      | control flow. These functions operate component-wise.     |
    +------------------------------------------------------+-----------------------------------------------------------+
    | genF16Type maxInvocationsInclusiveScanAMD(           | Returns the maximum value of <v> across all active shader |
    |            genF16Type v)                             | invocations in the sub-group with <InclusiveScan> group   |
    |                                                      | operation. These functions must be used in uniform        |
    |                                                      | control flow. These functions operate component-wise.     |
    +------------------------------------------------------+-----------------------------------------------------------+
    | genF16Type maxInvocationsInclusiveScanNonUniformAMD( | Returns the maximum value of <v> across all active shader |
    |            genF16Type v)                             | invocations in the sub-group with <InclusiveScan> group   |
    |                                                      | operation. These functions can be used in non-uniform     |
    |                                                      | control flow. These functions operate component-wise.     |
    +------------------------------------------------------+-----------------------------------------------------------+
    | genF16Type maxInvocationsExclusiveScanAMD(           | Returns the maximum value of <v> across all active shader |
    |            genF16Type v)                             | invocations in the sub-group with <ExclusiveScan> group   |
    |                                                      | operation. These functions must be used in uniform        |
    |                                                      | control flow. These functions operate component-wise.     |
    +------------------------------------------------------+-----------------------------------------------------------+
    | genF16Type maxInvocationsExclusiveScanNonUniformAMD( | Returns the maximum value of <v> across all active shader |
    |            genF16Type v)                             | invocations in the sub-group with <ExclusiveScan> group   |
    |                                                      | operation. These functions can be used in non-uniform     |
    |                                                      | control flow. These functions operate component-wise.     |
    +------------------------------------------------------+-----------------------------------------------------------+
    | genF16Type addInvocationsAMD(genF16Type v)           | Returns the sum of the value of <v> across all active     |
    |                                                      | shader invocations in the sub-group with <Reduce> group   |
    |                                                      | operation. These functions must be used in uniform        |
    |                                                      | control flow. These functions operate component-wise.     |
    +------------------------------------------------------+-----------------------------------------------------------+
    | genF16Type addInvocationsNonUniformAMD(genF16Type v) | Returns the sum of the value of <v> across all active     |
    |                                                      | shader invocations in the sub-group with <Reduce> group   |
    |                                                      | operation. These functions can be used in non-uniform     |
    |                                                      | control flow. These functions operate component-wise.     |
    +------------------------------------------------------+-----------------------------------------------------------+
    | genF16Type addInvocationsInclusiveScanAMD(           | Returns the sum of the value of <v> across all active     |
    |            genF16Type v)                             | shader invocations in the sub-group with <InclusiveScan>  |
    |                                                      | group operation. These functions must be used in uniform  |
    |                                                      | control flow. These functions operate component-wise.     |
    +------------------------------------------------------+-----------------------------------------------------------+
    | genF16Type addInvocationsInclusiveScanNonUniformAMD( | Returns the sum of the value of <v> across all active     |
    |            genF16Type v)                             | shader invocations in the sub-group with <InclusiveScan>  |
    |                                                      | group operation. These functions can be used in           |
    |                                                      | non-uniform control flow. These functions operate         |
    |                                                      | component-wise.                                           |
    +------------------------------------------------------+-----------------------------------------------------------+
    | genF16Type addInvocationsExclusiveScanAMD(           | Returns the sum of the value of <v> across all active     |
    |            genF16Type v)                             | shader invocations in the sub-group with <ExclusiveScan>  |
    |                                                      | group operation. These functions must be used in uniform  |
    |                                                      | control flow. These functions operate component-wise.     |
    +------------------------------------------------------+-----------------------------------------------------------+
    | genF16Type addInvocationsExclusiveScanNonUniformAMD( | Returns the sum of the value of <v> across all active     |
    |            genF16Type v)                             | shader invocations in the sub-group with <ExclusiveScan>  |
    |                                                      | group operation. These functions can be used in           |
    |                                                      | non-uniform control flow. These functions operate         |
    |                                                      | component-wise.                                           |
    +------------------------------------------------------+-----------------------------------------------------------+
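
    To illustrate what "across all active shader invocations" means for the
    <Reduce> rows above, here is a rough CPU-side model in C. It is a sketch
    under stated assumptions: C has no standard 16-bit float type, so float
    stands in for float16_t; the sub-group size of 8, the <active> bit mask
    and the function names are hypothetical; only a scalar <v> is modelled;
    and NaN handling is left out.

```c
#include <stdint.h>

#define SUBGROUP_SIZE 8   /* hypothetical sub-group size for illustration */

/* Models the <Reduce> minimum for a scalar: the smallest v[i] over the
 * invocations whose bit is set in `active`. Every active invocation would
 * receive this same result. Assumes at least one invocation is active. */
static float min_reduce_model(const float v[SUBGROUP_SIZE], uint32_t active)
{
    int have = 0;
    float result = 0.0f;
    for (int i = 0; i < SUBGROUP_SIZE; ++i) {
        if (!(active & (1u << i)))
            continue;               /* inactive invocations contribute nothing */
        if (!have || v[i] < result) {
            result = v[i];
            have = 1;
        }
    }
    return result;
}

/* Same model for the <Reduce> maximum. */
static float max_reduce_model(const float v[SUBGROUP_SIZE], uint32_t active)
{
    int have = 0;
    float result = 0.0f;
    for (int i = 0; i < SUBGROUP_SIZE; ++i) {
        if (!(active & (1u << i)))
            continue;
        if (!have || v[i] > result) {
            result = v[i];
            have = 1;
        }
    }
    return result;
}
```

    With v = {4, -1, 7, 2, 9, 0.5, 3, 8} and all eight invocations active
    (mask 0xFF), the model yields a minimum of -1 and a maximum of 9. With
    mask 0xE5 (invocations 0, 2, 5, 6 and 7 active), the values held by
    invocations 1, 3 and 4 are ignored and the results become 0.5 and 8.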
541
542Dependencies on AMD_gpu_shader_int16
543
544    If the shader enables AMD_gpu_shader_int16, this extension adds
545    additional shader invocation group functions.
546
547    Add Section 8.18, Shader Invocation Group Functions
548
549     +------------------------------------------------------+-----------------------------------------------------------+
550    | Syntax                                               | Description                                               |
551    +------------------------------------------------------+-----------------------------------------------------------+
552    | genI16Type minInvocationsAMD(genI16Type v)           | Returns the minimum value of <v> across all active shader |
553    | genU16Type minInvocationsAMD(genU16Type v)           | invocations in the sub-group with <Reduce> group          |
554    |                                                      | operation. These functions must be used in uniform        |
555    |                                                      | control flow. These functions operate component-wise.     |
556    +------------------------------------------------------+-----------------------------------------------------------+
557    | genI16Type minInvocationsNonUniformAMD(genI16Type v) | Returns the minimum value of <v> across all active shader |
558    | genU16Type minInvocationsNonUniformAMD(genU16Type v) | invocations in the sub-group with <Reduce> group          |
559    |                                                      | operation. These functions could be used in non-uniform   |
560    |                                                      | control flow. These functions operate component-wise.     |
561    +------------------------------------------------------+-----------------------------------------------------------+
562    | genI16Type minInvocationsInclusiveScanAMD(           | Returns the minimum value of <v> across all active shader |
563    |            genI16Type v)                             | invocations in the sub-group with <InclusiveScan> group   |
564    | genU16Type minInvocationsInclusiveScanAMD(           | operation. These functions must be used in uniform        |
565    |            genU16Type v)                             | control flow. These functions operate component-wise.     |
566    +------------------------------------------------------+-----------------------------------------------------------+
567    | genI16Type minInvocationsInclusiveScanNonUniformAMD( | Returns the minimum value of <v> across all active shader |
568    |            genI16Type v)                             | invocations in the sub-group with <InclusiveScan> group   |
569    | genU16Type minInvocationsInclusiveScanNonUniformAMD( | operation. These functions could be used in non-uniform   |
570    |            genU16Type v)                             | control flow. These functions operate component-wise.     |
571    +------------------------------------------------------+-----------------------------------------------------------+
572    | genI16Type minInvocationsExclusiveScanAMD(           | Returns the minimum value of <v> across all active shader |
573    |            genI16Type v)                             | invocations in the sub-group with <ExclusiveScan> group   |
574    | genU16Type minInvocationsExclusiveScanAMD(           | operation. These functions must be used in uniform        |
575    |            genU16Type v)                             | control flow. These functions operate component-wise.     |
576    +------------------------------------------------------+-----------------------------------------------------------+
577    | genI16Type minInvocationsExclusiveScanNonUniformAMD( | Returns the minimum value of <v> across all active shader |
578    |            genI16Type v)                             | invocations in the sub-group with <ExclusiveScan> group   |
579    | genU16Type minInvocationsExclusiveScanNonUniformAMD( | operation. These functions could be used in non-uniform   |
580    |            genU16Type v)                             | control flow. These functions operate component-wise.     |
581    +------------------------------------------------------+-----------------------------------------------------------+
582    | genI16Type maxInvocationsAMD(genI16Type v)           | Returns the maximum value of <v> across all active shader |
583    | genU16Type maxInvocationsAMD(genU16Type v)           | invocations in the sub-group with <Reduce> group          |
584    |                                                      | operation. These functions must be used in uniform        |
585    |                                                      | control flow. These functions operate component-wise.     |
586    +------------------------------------------------------+-----------------------------------------------------------+
587    | genI16Type maxInvocationsNonUniformAMD(genI16Type v) | Returns the maximum value of <v> across all active shader |
588    | genU16Type maxInvocationsNonUniformAMD(genU16Type v) | invocations in the sub-group with <Reduce> group          |
589    |                                                      | operation. These functions could be used in non-uniform   |
590    |                                                      | control flow. These functions operate component-wise.     |
591    +------------------------------------------------------+-----------------------------------------------------------+
592    | genI16Type maxInvocationsInclusiveScanAMD(           | Returns the maximum value of <v> across all active shader |
593    |            genI16Type v)                             | invocations in the sub-group with <InclusiveScan> group   |
594    | genU16Type maxInvocationsInclusiveScanAMD(           | operation. These functions must be used in uniform        |
595    |            genU16Type v)                             | control flow. These functions operate component-wise.     |
596    +------------------------------------------------------+-----------------------------------------------------------+
597    | genI16Type maxInvocationsInclusiveScanNonUniformAMD( | Returns the maximum value of <v> across all active shader |
598    |            genI16Type v)                             | invocations in the sub-group with <InclusiveScan> group   |
599    | genU16Type maxInvocationsInclusiveScanNonUniformAMD( | operation. These functions could be used in non-uniform   |
600    |            genU16Type v)                             | control flow. These functions operate component-wise.     |
601    +------------------------------------------------------+-----------------------------------------------------------+
602    | genI16Type maxInvocationsExclusiveScanAMD(           | Returns the maximum value of <v> across all active shader |
603    |            genI16Type v)                             | invocations in the sub-group with <ExclusiveScan> group   |
604    | genU16Type maxInvocationsExclusiveScanAMD(           | operation. These functions must be used in uniform        |
605    |            genU16Type v)                             | control flow. These functions operate component-wise.     |
606    +------------------------------------------------------+-----------------------------------------------------------+
607    | genI16Type maxInvocationsExclusiveScanNonUniformAMD( | Returns the maximum value of <v> across all active shader |
608    |            genI16Type v)                             | invocations in the sub-group with <ExclusiveScan> group   |
609    | genU16Type maxInvocationsExclusiveScanNonUniformAMD( | operation. These functions could be used in non-uniform   |
610    |            genU16Type v)                             | control flow. These functions operate component-wise.     |
611    +------------------------------------------------------+-----------------------------------------------------------+
    | genI16Type addInvocationsAMD(genI16Type v)           | Returns the sum of the value of <v> across all active     |
    | genU16Type addInvocationsAMD(genU16Type v)           | shader invocations in the sub-group with <Reduce> group   |
    |                                                      | operation. These functions must be used in uniform        |
    |                                                      | control flow. These functions operate component-wise.     |
    +------------------------------------------------------+-----------------------------------------------------------+
    | genI16Type addInvocationsNonUniformAMD(genI16Type v) | Returns the sum of the value of <v> across all active     |
    | genU16Type addInvocationsNonUniformAMD(genU16Type v) | shader invocations in the sub-group with <Reduce> group   |
    |                                                      | operation. These functions can be used in non-uniform     |
    |                                                      | control flow. These functions operate component-wise.     |
    +------------------------------------------------------+-----------------------------------------------------------+
    | genI16Type addInvocationsInclusiveScanAMD(           | Returns the sum of the value of <v> across all active     |
    |            genI16Type v)                             | shader invocations in the sub-group with <InclusiveScan>  |
    | genU16Type addInvocationsInclusiveScanAMD(           | group operation. These functions must be used in uniform  |
    |            genU16Type v)                             | control flow. These functions operate component-wise.     |
    +------------------------------------------------------+-----------------------------------------------------------+
    | genI16Type addInvocationsInclusiveScanNonUniformAMD( | Returns the sum of the value of <v> across all active     |
    |            genI16Type v)                             | shader invocations in the sub-group with <InclusiveScan>  |
    | genU16Type addInvocationsInclusiveScanNonUniformAMD( | group operation. These functions can be used in           |
    |            genU16Type v)                             | non-uniform control flow. These functions operate         |
    |                                                      | component-wise.                                           |
    +------------------------------------------------------+-----------------------------------------------------------+
    | genI16Type addInvocationsExclusiveScanAMD(           | Returns the sum of the value of <v> across all active     |
    |            genI16Type v)                             | shader invocations in the sub-group with <ExclusiveScan>  |
    | genU16Type addInvocationsExclusiveScanAMD(           | group operation. These functions must be used in uniform  |
    |            genU16Type v)                             | control flow. These functions operate component-wise.     |
    +------------------------------------------------------+-----------------------------------------------------------+
    | genI16Type addInvocationsExclusiveScanNonUniformAMD( | Returns the sum of the value of <v> across all active     |
    |            genI16Type v)                             | shader invocations in the sub-group with <ExclusiveScan>  |
    | genU16Type addInvocationsExclusiveScanNonUniformAMD( | group operation. These functions can be used in           |
    |            genU16Type v)                             | non-uniform control flow. These functions operate         |
    |                                                      | component-wise.                                           |
    +------------------------------------------------------+-----------------------------------------------------------+
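
    The <Reduce>, <InclusiveScan> and <ExclusiveScan> group operations
    named in the table above can be illustrated with a small CPU-side model.
    The C sketch below is not part of this extension. The names group_op and
    add_invocations_model are hypothetical, and only the scalar genU16Type
    case is modelled. It assumes that invocations are ordered by index, that
    <ExclusiveScan> yields 0 for the first active invocation, and that 16-bit
    sums wrap modulo 2^16. Results for inactive invocations are not meaningful
    here, so the sketch simply writes 0 for them.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names: this is an illustrative model, not an API of the
 * extension. */
typedef enum {
    GROUP_REDUCE,
    GROUP_INCLUSIVE_SCAN,
    GROUP_EXCLUSIVE_SCAN
} group_op;

/* Models addInvocations*AMD for one sub-group of n invocations.
 * v[i]      : value supplied by invocation i
 * active[i] : whether invocation i takes part in the operation
 * out[i]    : modelled return value for invocation i
 *             (0 is written for inactive invocations as a placeholder) */
static void add_invocations_model(group_op op, const uint16_t *v,
                                  const bool *active, size_t n,
                                  uint16_t *out)
{
    assert(v != NULL && active != NULL && out != NULL);

    /* Sum over all active invocations, used by the Reduce operation.
     * Wrap-around on overflow is an assumption of this sketch. */
    uint16_t total = 0;
    for (size_t i = 0; i < n; ++i) {
        if (active[i]) {
            total = (uint16_t)(total + v[i]);
        }
    }

    /* Running prefix sum over active invocations, in index order. */
    uint16_t running = 0;
    for (size_t i = 0; i < n; ++i) {
        if (!active[i]) {
            out[i] = 0; /* placeholder, see lead-in */
            continue;
        }
        switch (op) {
        case GROUP_REDUCE:
            out[i] = total;
            break;
        case GROUP_INCLUSIVE_SCAN:
            /* Prefix includes the current invocation's value. */
            running = (uint16_t)(running + v[i]);
            out[i] = running;
            break;
        case GROUP_EXCLUSIVE_SCAN:
            /* Prefix excludes the current invocation's value. */
            out[i] = running;
            running = (uint16_t)(running + v[i]);
            break;
        }
    }
}
```

    Under these assumptions, for the values {1, 2, 3, 4} with invocation 2
    inactive, the model gives 7 to every active invocation for <Reduce>,
    {1, 3, -, 7} for <InclusiveScan> and {0, 1, -, 3} for <ExclusiveScan>.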

Additions to the AGL/GLX/WGL Specifications

    None.

GLX Protocol

    None.

Errors

    None.

Issues


Revision History

    Rev.    Date      Author    Changes
    ----  ----------  --------  --------------------------------------------------
    5     03/28/2018  rexu      Add interactions with AMD_gpu_shader_int16. New
                                group invocation functions are added to support
                                16-bit integer types in group operations.

    4     10/19/2016  rexu      Add interactions with ARB_gpu_shader_int64 and
                                AMD_gpu_shader_half_float. New group invocation
                                functions are added to support 64-bit integer
                                types and 16-bit/64-bit floating-point types
                                in group operations. Clarify that <mask> in
                                swizzleInvocationsMaskedAMD() should be a
                                constant integer expression with a value in the
                                range [0, 31].

    3     08/16/2016  rexu      Clarify that minInvocationsAMD, maxInvocationsAMD,
                                addInvocationsAMD, along with their non-uniform
                                versions, operate component-wise rather than on
                                the vector as a whole.

    2     08/11/2016  rexu      Add non-uniform versions of minInvocationsAMD,
                                maxInvocationsAMD, and addInvocationsAMD.
                                Support those operations in non-uniform control
                                flow.

    1     04/21/2016  qlin      Internal revisions.