1Name
2
3    ARB_shader_group_vote
4
5Name Strings
6
7    GL_ARB_shader_group_vote
8
9Contact
10
11    Pat Brown, NVIDIA Corporation (pbrown 'at' nvidia.com)
12
13Contributors
14
15    John Kessenich
16
17Notice
18
19    Copyright (c) 2013 The Khronos Group Inc. Copyright terms at
20        http://www.khronos.org/registry/speccopyright.html
21
22Specification Update Policy
23
24    Khronos-approved extension specifications are updated in response to
25    issues and bugs prioritized by the Khronos OpenGL Working Group. For
26    extensions which have been promoted to a core Specification, fixes will
27    first appear in the latest version of that core Specification, and will
28    eventually be backported to the extension document. This policy is
29    described in more detail at
30        https://www.khronos.org/registry/OpenGL/docs/update_policy.php
31
32Status
33
34    Complete. Approved by the ARB on June 3, 2013.
35    Ratified by the Khronos Board of Promoters on July 19, 2013.
36
37Version
38
39    Last Modified Date:         December 10, 2018
40    Revision:                   7
41
42Number
43
44    ARB Extension #157
45
46Dependencies
47
48    This extension is written against the OpenGL 4.3 (Compatibility Profile)
49    Specification, dated August 6, 2012.
50
51    This extension is written against the OpenGL Shading Language
52    Specification, Version 4.30, Revision 7, dated September 24, 2012.
53
54    OpenGL 4.3 or ARB_compute_shader is required.
55
56    This extension interacts with NV_gpu_shader5.
57
58Overview
59
60    This extension provides new built-in functions to compute the composite of
61    a set of boolean conditions across a group of shader invocations.  These
62    composite results may be used to execute shaders more efficiently on a
63    single-instruction multiple-data (SIMD) processor.  The set of shader
64    invocations across which boolean conditions are evaluated is
65    implementation-dependent, and this extension provides no guarantee over
66    how individual shader invocations are assigned to such sets.  In
67    particular, the set of shader invocations has no necessary relationship
68    with the compute shader workgroup -- a pair of shader invocations
69    in a single compute shader workgroup may end up in different sets used by
70    these built-ins.
71
72    Compute shaders operate on an explicitly specified group of threads (a
73    workgroup), but many implementations of OpenGL 4.3 will even group
74    non-compute shader invocations and execute them in a SIMD fashion.  When
75    executing code like
76
77      if (condition) {
78        result = do_fast_path();
79      } else {
80        result = do_general_path();
81      }
82
83    where <condition> diverges between invocations, a SIMD implementation
84    might first call do_fast_path() for the invocations where <condition> is
85    true and leave the other invocations dormant.  Once do_fast_path()
86    returns, it might call do_general_path() for invocations where <condition>
87    is false and leave the other invocations dormant.  In this case, the
88    shader executes *both* the fast and the general path and might be better
89    off just using the general path for all invocations.
90
91    This extension provides the ability to avoid divergent execution by
92    evaluating a condition across an entire SIMD invocation group using code
93    like:
94
95      if (allInvocationsARB(condition)) {
96        result = do_fast_path();
97      } else {
98        result = do_general_path();
99      }
100
101    The built-in function allInvocationsARB() will return the same value for
102    all invocations in the group, so the group will either execute
103    do_fast_path() or do_general_path(), but never both.  For example, shader
104    code might want to evaluate a complex function iteratively by starting
105    with an approximation of the result and then refining the approximation.
106    Some input values may require a small number of iterations to generate an
107    accurate result (do_fast_path) while others require a larger number
108    (do_general_path).  In another example, shader code might want to evaluate
109    a complex function (do_general_path) that can be greatly simplified when
110    assuming a specific value for one of its inputs (do_fast_path).
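
    The iterative-refinement case might look like the following compute
    shader, given here as a minimal sketch. The buffer blocks, the function
    refine(), the starting estimate, the range test, and both iteration
    counts are assumptions made for illustration; none of them is defined by
    this extension.

```glsl
#version 430
#extension GL_ARB_shader_group_vote : require

layout(local_size_x = 64) in;

// Hypothetical input and output buffers for this sketch.
layout(std430, binding = 0) readonly  buffer InputData  { float x[]; };
layout(std430, binding = 1) writeonly buffer OutputData { float y[]; };

// Assumed iteration counts; a real shader would tune these.
const int FAST_STEPS    = 4;
const int GENERAL_STEPS = 32;

// Refine an estimate of 1/sqrt(v) with <steps> Newton-Raphson iterations.
float refine(float v, float guess, int steps)
{
    for (int i = 0; i < steps; i++) {
        guess = guess * (1.5 - 0.5 * v * guess * guess);
    }
    return guess;
}

void main()
{
    uint idx = gl_GlobalInvocationID.x;
    if (idx >= uint(x.length())) {
        return;  // invocations past the end of the data take no part in the vote
    }

    float v = x[idx];
    // Starting estimate at or below 1/sqrt(v) for v > 0.
    float guess = 1.0 / max(v, 1.0);

    // Inputs near 1.0 are assumed to converge within FAST_STEPS iterations.
    bool easy = (v >= 0.5) && (v <= 2.0);

    // The vote is the same for every active invocation in the group, so the
    // group runs exactly one of the two loops.
    if (allInvocationsARB(easy)) {
        y[idx] = refine(v, guess, FAST_STEPS);
    } else {
        y[idx] = refine(v, guess, GENERAL_STEPS);
    }
}
```

    Both branches compute the same quantity, so which one a group takes
    should affect cost and, at most, the low-order bits of the result.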
111
112New Procedures and Functions
113
114    None.
115
116New Tokens
117
118    None.
119
120Modifications to the OpenGL 4.3 (Compatibility Profile) Specification
121
122    None.
123
124Modifications to the OpenGL Shading Language Specification, Version 4.30
125
126    Including the following line in a shader can be used to control the
127    language features described in this extension:
128
129      #extension GL_ARB_shader_group_vote : <behavior>
130
131    where <behavior> is as specified in section 3.3.
132
133    New preprocessor #defines are added to the OpenGL Shading Language:
134
135      #define GL_ARB_shader_group_vote          1
136
137
138    Modify Chapter 8, Built-in Functions, p. 129
139
140    (insert a new section at the end of the chapter)
141
142    Section 8.18, Shader Invocation Group Functions
143
144    Implementations of the OpenGL Shading Language may optionally group
145    multiple shader invocations for a single shader stage into a single SIMD
146    invocation group, where invocations are assigned to groups in an
147    undefined, implementation-dependent manner.  Shader algorithms on such
148    implementations may benefit from being able to evaluate a composite of
149    boolean values over all active invocations in a group.
150
151    Syntax:
152
153      bool anyInvocationARB(bool value);
154      bool allInvocationsARB(bool value);
155      bool allInvocationsEqualARB(bool value);
156
157    The function anyInvocationARB() returns true if and only if <value> is
158    true for at least one active invocation in the group.
159
160    The function allInvocationsARB() returns true if and only if <value> is
161    true for all active invocations in the group.
162
163    The function allInvocationsEqualARB() returns true if <value> is the same
164    for all active invocations in the group.
165
166    For all of these functions, the same value is returned to all active
167    invocations in the group.
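
    One way to read these definitions together: a set of boolean values is
    all equal exactly when the values are all true or all false. The sketch
    below rebuilds the "all equal" vote from the other two built-ins. It
    assumes the same invocations are active at both calls, and the names
    allEqualFromVotes() and <threshold> are hypothetical.

```glsl
#version 430
#extension GL_ARB_shader_group_vote : require

in float threshold;   // hypothetical interpolated input
out vec4 color;

// Hypothetical helper: expected to agree with allInvocationsEqualARB(value)
// when the same invocations are active at both calls below.
bool allEqualFromVotes(bool value)
{
    bool allTrue = allInvocationsARB(value);
    bool anyTrue = anyInvocationARB(value);
    return allTrue || !anyTrue;   // every value true, or every value false
}

void main()
{
    bool uniformChoice = allEqualFromVotes(threshold > 0.5);
    color = uniformChoice ? vec4(0.0, 1.0, 0.0, 1.0)   // group agrees
                          : vec4(1.0, 0.0, 0.0, 1.0);  // group is split
}
```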
168
169    These functions may be called in conditionally executed code.  In groups
170    where some invocations do not execute the function call, the value
171    returned by the function is not affected by any invocation not calling the
172    function, even when <value> is well-defined for that invocation.
173
174    Since these functions depend on the values of <value> in an undefined
175    group of invocations, the value returned by these functions is largely
176    undefined.  However, anyInvocationARB() is guaranteed to return true if
177    <value> is true, and allInvocationsARB() is guaranteed to return false if
178    <value> is false.
179
180    Since implementations are not required to combine invocations into groups,
181    simply returning <value> for anyInvocationARB() and allInvocationsARB()
182    and returning true for allInvocationsEqualARB() is a legal implementation
183    of these functions.
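
    A shader can apply the same reasoning to run on implementations that do
    not expose the extension. The sketch below is one possible arrangement,
    not a required pattern; the wrapper names voteAny(), voteAll(), and
    voteAllEqual() and the input <weight> are hypothetical.

```glsl
#version 430
// "enable" only warns if the extension is unsupported, so the shader can
// still compile; "require" would produce an error instead.
#extension GL_ARB_shader_group_vote : enable

#ifdef GL_ARB_shader_group_vote
// Extension available: forward to the built-ins.
bool voteAny(bool value)      { return anyInvocationARB(value); }
bool voteAll(bool value)      { return allInvocationsARB(value); }
bool voteAllEqual(bool value) { return allInvocationsEqualARB(value); }
#else
// Fallback: treat each invocation as its own group of one.
bool voteAny(bool value)      { return value; }
bool voteAll(bool value)      { return value; }
bool voteAllEqual(bool value) { return true; }
#endif

in float weight;   // hypothetical interpolated input
out vec4 color;

void main()
{
    color = voteAll(weight > 0.5) ? vec4(1.0) : vec4(0.0);
}
```

    With the fallback, code guarded by voteAll() behaves like ordinary
    per-invocation branching, so it stays correct but gains nothing from the
    vote.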
184
185    For fragment shaders, invocations in a SIMD invocation group may include
186    invocations corresponding to pixels that are covered by a primitive being
187    rasterized, as well as invocations corresponding to neighboring pixels not
188    covered by the primitive.  The invocations for these neighboring "helper"
189    pixels may be created so that differencing can be used to evaluate
190    derivative functions like dFdx() and dFdy() (section 8.13) and implicit
191    derivatives used by texture() and related functions (section 8.9.2).  The
192    value of <value> for such "helper" pixels may affect the value returned by
193    anyInvocationARB(), allInvocationsARB(), and allInvocationsEqualARB().
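
    If a shader does not want helper invocations to veto a fast path, one
    possible workaround is to have them always vote true. This sketch
    assumes GLSL 4.50 or later, where the built-in gl_HelperInvocation is
    available; it is not part of the 4.30 language this extension is written
    against. The input <normal> and the functions shadeFast() and
    shadeGeneral() are hypothetical.

```glsl
#version 450
#extension GL_ARB_shader_group_vote : require

in vec3 normal;    // hypothetical interpolated input
out vec4 color;

// Hypothetical shading paths for this sketch.
vec4 shadeFast()           { return vec4(0.5, 0.5, 0.5, 1.0); }
vec4 shadeGeneral(vec3 n)  { return vec4(normalize(n) * 0.5 + 0.5, 1.0); }

void main()
{
    // Hypothetical fast-path test.
    bool isDegenerate = dot(normal, normal) < 1e-6;

    // Helper invocations contribute "true", so only covered pixels can
    // force the group onto the general path.
    bool takeFast = allInvocationsARB(isDegenerate || gl_HelperInvocation);

    color = takeFast ? shadeFast() : shadeGeneral(normal);
}
```

    Helper invocations then follow the fast path even when their own
    condition is false, so this is only appropriate when nothing computed
    after the branch feeds a derivative.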
194
195Additions to the AGL/EGL/GLX/WGL Specifications
196
197    None.
198
199GLX Protocol
200
201    TBD
202
203Dependencies on NV_gpu_shader5
204
205    The built-in functions defined by this extension provide the same
206    functionality as the anyThreadNV(), allThreadsNV(), allThreadsEqualNV()
207    functions in NV_gpu_shader5 and are implemented identically.
208
209Errors
210
211    None.
212
213New State
214
215    None.
216
217New Implementation Dependent State
218
219    None.
220
221Issues
222
223    (1) Should we provide built-ins exposing a fixed implementation-dependent
224        SIMD workgroup size and/or the "location" of a single invocation
225        within a fixed-size SIMD workgroup?
226
227      RESOLVED:  Not in this extension.
228
229    (2) Should we provide mechanisms for sharing arbitrary data values across
230        SIMD workgroups?
231
232      RESOLVED:  Not in this extension.
233
234      For compute shaders, shared memory may already be used to share values
235      across invocations in a single workgroup.
236
237    (3) Is this capability supported for all shader types or just compute
238        shaders?
239
240      RESOLVED:  All shader types.
241
242    (4) For compute shaders, is there any relationship between the
243        workgroup and the SIMD invocation group across which conditions are
244        evaluated?
245
246      RESOLVED:  No.
247
248    (5) Is there any necessary relationship between SIMD workgroups in this
249        extension and the workgroups for compute shaders?
250
251      RESOLVED:  No.  It is expected that the SIMD workgroups in this
252      extension are relatively small compared to a maximum-sized compute
253      workgroup.  On current NVIDIA GPUs, the SIMD workgroup size will be 32;
254      however, maximum workgroup size (MAX_COMPUTE_WORK_GROUP_INVOCATIONS)
255      for OpenGL 4.3 compute shaders is 1024.
256
257      Perhaps there might be some small value in guaranteeing that a SIMD
258      workgroup doesn't span compute workgroups.  However, it's not clear
259      that there is any specific value in doing so, and having such a
260      restriction could limit parallelism for very small compute workgroups
261      (where one might be able to fit multiple workgroups in a single SIMD
262      workgroup).
263
264    (6) How do the built-in functions work when called in conditionally
265        executed code?
266
267      RESOLVED:  When these functions are called inside flow control, the
268      values for invocations not executing the function call have no effect on
269      the result.  For example, consider this code:
270
271        bool result = false;
272        bool condition1, condition2;
273        if (condition1) {
274          result = allInvocationsARB(condition2);
275        }
276
277      For all invocations where <condition1> is false, the value of <result>
278      will be false because allInvocationsARB() is not called.  For the other
279      invocations, the value of <result> will be true if and only if
280      <condition2> is true for all invocations where <condition1> is also
281      true.  In this similar code:
282
283        if (condition1) {
284          result = allInvocationsARB(condition1);
285        }
286
287      allInvocationsARB() will always return true, since it will only be
288      called by invocations where <condition1> is true.
289
290    (7) What should an implementation do if it groups invocations into SIMD
291        execution groups differently for different shader types?
292
293      RESOLVED:  As specified, there is no requirement of a specific SIMD
294      group size.  Additionally, there is no implementation-dependent constant
295      requiring implementations to expose a single SIMD group size.
296
297      If an implementation has different SIMD group sizes for different
298      shaders, its implementation of the built-in functions could reflect such
299      differences.  Additionally, if an implementation doesn't even support
300      SIMD execution for some shader types, it could simply treat each
301      invocation as its own group.
302
303    (8) Should we provide any query by which an application can discover the
304        SIMD execution group size for a particular implementation?  Or for a
305        particular shader type, if any implementation might behave like the
306        hypothetical one in issue (7)?
307
308      RESOLVED:  No.  Given the limited functionality provided by this
309      extension, it's not clear that there's anything useful applications
310      could do with this information.
311
312    (9) Fragment shaders have built-in functions -- dFdx(), dFdy(), and
313        texture() -- that need to compute derivatives of their inputs in
314        screen space.  These derivatives may be approximated by computing the
315        difference between the value of an input at the pixel in question and
316        a neighboring pixel.  For small or slivery triangles, a pixel may not
317        actually have a neighboring pixel covered by the primitive.  In order
318        to allow for such differencing, implementations may need to create
319        fragment shader invocations for uncovered neighboring pixels -- called
320        "helper pixels".  How do such fragment shader invocations affect the
321        results of invocation group built-ins?
322
323      RESOLVED:  We specify that the results of the built-in functions can be
324      affected by the inputs evaluated for "helper" pixels found in a SIMD
325      execution group.  If a condition is true for all "real" fragment shader
326      invocations but false for some "helper" invocation, it's possible that
327      allInvocationsARB() will return false.
328
329    (10) For certain shading language operations indexing into arrays of
330         resources (samplers, images, atomic counters, uniform blocks, and
331         shader storage blocks), indices must be dynamically uniform to have
332         defined results.  Are the values returned by these new built-in
333         functions considered dynamically uniform?
334
335      RESOLVED:  No.
336
337      As defined, the values returned by these built-in functions should be
338      the same for all invocations in the SIMD execution group that call them.
339      However, for the purposes of some of these operations requiring dynamic
340      uniformity, some implementations may require identical values over a
341      group of invocations larger than a single SIMD execution group.  Since
342      these built-ins produce results that are only identical within a single
343      group, they can't qualify as "dynamically uniform".
344
345      In this code:
346
347        uniform sampler2D samplers[2];
348        bool condition = non_uniform_condition();
349        vec4 texel = texture(samplers[condition ? 1 : 0], ...);
350
351      the sampler accessed is *not* dynamically uniform.  However, in this
352      code:
353
354        bool condition = allInvocationsARB(non_uniform_condition());
355        vec4 texel = texture(samplers[condition ? 1 : 0], ...);
356
357      the value of <condition> will be the same for all invocations in the
358      SIMD execution group, so the index used to access <samplers> will also
359      be the same.  However, if dynamic uniformity requires two SIMD execution
360      groups to have the same value, this wouldn't qualify because a second
361      group could have a different value for <condition>.
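
    A shader that still wants to pick a sampler from a voted condition can
    avoid the question by indexing the array only with constants, which are
    always dynamically uniform. The following is a sketch under stated
    assumptions: the input <uv> and the condition are hypothetical, and
    textureGrad() is used with gradients computed before the branch so that
    no implicit derivative is evaluated inside flow control.

```glsl
#version 430
#extension GL_ARB_shader_group_vote : require

uniform sampler2D samplers[2];
in vec2 uv;        // hypothetical interpolated input
out vec4 texel;

void main()
{
    // Hypothetical per-invocation condition, voted across the group.
    bool condition = allInvocationsARB(uv.x > 0.5);

    // Explicit gradients, computed outside the branch.
    vec2 dx = dFdx(uv);
    vec2 dy = dFdy(uv);

    // Each access uses a constant index, so the index is dynamically
    // uniform regardless of how <condition> varies between groups.
    if (condition) {
        texel = textureGrad(samplers[1], uv, dx, dy);
    } else {
        texel = textureGrad(samplers[0], uv, dx, dy);
    }
}
```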
362
363    (11) Should we provide allInvocationsEqual() that could determine if the
364         value of an integer/floating-point/vector variable is the same for
365         all invocations in a SIMD execution group?
366
367      RESOLVED:  Not in this extension.
368
369    (12) Does the use of built-in functions such as allInvocationsARB() have
370         invariance issues?
371
372      RESOLVED:  Yes.  The assignment of invocations to SIMD execution groups
373      is implementation-dependent, and there is no guarantee that the
374      assignment will be identical when rendering the exact same primitives in
375      a different viewport, or even when rendering the same primitives in the
376      same locations in different frames.  Since the assignment of invocations
377      to groups may vary from frame to frame, the value returned by
378      allInvocationsARB() may also vary from frame to frame.
379
380      If the computations performed when allInvocationsARB() returns true
381      produce results nearly identical to those performed when it returns
382      false, the lack of invariance may result in images that are identical
383      except for least significant bits.  If the computations are not identical,
384      more severe flickering could occur.
385
386    (13) How should we name this extension?
387
388      RESOLVED:  We originally called it ARB_shader_group_operations because
389      we considered a number of other options in addition to evaluating a boolean
390      predicate across a SIMD execution group.  But the final extension is
391      limited to this specific operation, so a more specific name seems
392      appropriate.  We are using the term "vote", as it (like real voting)
393      involves collecting "choices" of multiple entities to generate a single
394      result and then returning the result of that collective choice.
395
396Revision History
397
398    Revision 7, December 10, 2018 (Jon Leech)
399      - Use 'workgroup' consistently throughout (Bug 11723, internal API
400        issue 87).
401
402    Revision 6, May 30, 2013
403      - Mark issue (13) as resolved.
404
405    Revision 5, May 7, 2013
406      - Extend the introduction to include an example of the use of the new
407        built-in functions.
408      - Add explicit language indicating that these functions return the same
409        value for all invocations in a SIMD execution group.
410
411    Revision 4, May 3, 2013
412      - Add some more concrete examples to the introduction illustrating why
413        these functions may be useful.
414      - Rename the extension to ARB_shader_group_vote.
415      - Add spec language indicating that fragment shader "helper" pixels
416        may affect the results of these "vote" functions.
417      - Mark various issues as resolved per working group discussions.
418      - Add issues (11), (12), and (13).
419
420    Revision 3, April 19, 2013
421      - Add #extension infrastructure for this feature, since it will begin as
422        an ARB extension.  Add "ARB" suffixes on the names of the built-in
423        functions.
424      - Add discussion on issue (7) and new issues (8) through (10).
425
426    Revision 2, March 28, 2013
427      - Checkpoint updating some issues for spec review (not done yet).
428
429    Revision 1, January 20, 2013
430      - Initial revision.
431