Name

    ARB_shader_group_vote

Name Strings

    GL_ARB_shader_group_vote

Contact

    Pat Brown, NVIDIA Corporation (pbrown 'at' nvidia.com)

Contributors

    John Kessenich

Notice

    Copyright (c) 2013 The Khronos Group Inc. Copyright terms at
        http://www.khronos.org/registry/speccopyright.html

Specification Update Policy

    Khronos-approved extension specifications are updated in response to issues and bugs prioritized by the Khronos OpenGL Working Group. For extensions which have been promoted to a core Specification, fixes will first appear in the latest version of that core Specification, and will eventually be backported to the extension document. This policy is described in more detail at
        https://www.khronos.org/registry/OpenGL/docs/update_policy.php

Status

    Complete. Approved by the ARB on June 3, 2013.
    Ratified by the Khronos Board of Promoters on July 19, 2013.

Version

    Last Modified Date:         December 10, 2018
    Revision:                   7

Number

    ARB Extension #157

Dependencies

    This extension is written against the OpenGL 4.3 (Compatibility Profile) Specification, dated August 6, 2012.

    This extension is written against the OpenGL Shading Language Specification, Version 4.30, Revision 7, dated September 24, 2012.

    OpenGL 4.3 or ARB_compute_shader is required.

    This extension interacts with NV_gpu_shader5.

Overview

    This extension provides new built-in functions to compute the composite of a set of boolean conditions across a group of shader invocations. These composite results may be used to execute shaders more efficiently on a single-instruction multiple-data (SIMD) processor. The set of shader invocations across which boolean conditions are evaluated is implementation-dependent, and this extension provides no guarantee over how individual shader invocations are assigned to such sets.
    In particular, the set of shader invocations has no necessary relationship with the compute shader workgroup -- a pair of shader invocations in a single compute shader workgroup may end up in different sets used by these built-ins.

    Compute shaders operate on an explicitly specified group of threads (a workgroup), but many implementations of OpenGL 4.3 will even group non-compute shader invocations and execute them in a SIMD fashion. When executing code like

      if (condition) {
        result = do_fast_path();
      } else {
        result = do_general_path();
      }

    where <condition> diverges between invocations, a SIMD implementation might first call do_fast_path() for the invocations where <condition> is true and leave the other invocations dormant. Once do_fast_path() returns, it might call do_general_path() for invocations where <condition> is false and leave the other invocations dormant. In this case, the shader executes *both* the fast and the general path and might be better off just using the general path for all invocations.

    This extension provides the ability to avoid divergent execution by evaluating a condition across an entire SIMD invocation group using code like:

      if (allInvocationsARB(condition)) {
        result = do_fast_path();
      } else {
        result = do_general_path();
      }

    The built-in function allInvocationsARB() will return the same value for all invocations in the group, so the group will either execute do_fast_path() or do_general_path(), but never both. For example, shader code might want to evaluate a complex function iteratively by starting with an approximation of the result and then refining the approximation. Some input values may require a small number of iterations to generate an accurate result (do_fast_path) while others require a larger number (do_general_path). In another example, shader code might want to evaluate a complex function (do_general_path) that can be greatly simplified when assuming a specific value for one of its inputs (do_fast_path).

New Procedures and Functions

    None.

New Tokens

    None.
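As a non-normative illustration of the fast-path/general-path example in the Overview, the following C sketch models which code paths a single SIMD invocation group ends up executing. It is a CPU-side model only, not shader code. The names PATH_FAST, PATH_GENERAL, paths_plain_branch() and paths_voted_branch() are hypothetical and are not defined by this extension. The sketch assumes that every invocation in the group is active and that the group contains at least one invocation.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical bit flags naming the two code paths from the Overview. */
enum { PATH_FAST = 1, PATH_GENERAL = 2 };

/* Models "if (condition) fast else general" for one SIMD group.
 * Returns a bitmask of every path that at least one invocation takes;
 * a divergent group ends up executing both paths. */
unsigned paths_plain_branch(const bool *condition, size_t n)
{
    unsigned paths = 0;
    for (size_t i = 0; i < n; i++)
        paths |= condition[i] ? PATH_FAST : PATH_GENERAL;
    return paths;
}

/* Models "if (allInvocationsARB(condition)) fast else general".
 * The vote result is shared by the whole group, so exactly one path runs. */
unsigned paths_voted_branch(const bool *condition, size_t n)
{
    bool all_true = true;
    for (size_t i = 0; i < n; i++)
        if (!condition[i])
            all_true = false;
    return all_true ? PATH_FAST : PATH_GENERAL;
}
```

In this model, a group whose conditions are {true, false, true, true} executes both paths with the plain branch but only the general path with the voted branch. A group whose conditions are all true takes the fast path in both cases.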
Modifications to the OpenGL 4.3 (Compatibility Profile) Specification

    None.

Modifications to the OpenGL Shading Language Specification, Version 4.30

    Including the following line in a shader can be used to control the language features described in this extension:

      #extension GL_ARB_shader_group_vote : <behavior>

    where <behavior> is as specified in section 3.3.

    New preprocessor #defines are added to the OpenGL Shading Language:

      #define GL_ARB_shader_group_vote 1

    Modify Chapter 8, Built-in Functions, p. 129

    (insert a new section at the end of the chapter)

    Section 8.18, Shader Invocation Group Functions

    Implementations of the OpenGL Shading Language may optionally group multiple shader invocations for a single shader stage into a single SIMD invocation group, where invocations are assigned to groups in an undefined, implementation-dependent manner. Shader algorithms on such implementations may benefit from being able to evaluate a composite of boolean values over all active invocations in a group.

    Syntax:

      bool anyInvocationARB(bool value);
      bool allInvocationsARB(bool value);
      bool allInvocationsEqualARB(bool value);

    The function anyInvocationARB() returns true if and only if <value> is true for at least one active invocation in the group.

    The function allInvocationsARB() returns true if and only if <value> is true for all active invocations in the group.

    The function allInvocationsEqualARB() returns true if <value> is the same for all active invocations in the group.

    For all of these functions, the same value is returned to all active invocations in the group.

    These functions may be called in conditionally executed code. In groups where some invocations do not execute the function call, the value returned by the function is not affected by any invocation not calling the function, even when <value> is well-defined for that invocation.

    Since these functions depend on the values of <value> in an undefined group of invocations, the value returned by these functions is largely undefined.
    However, anyInvocationARB() is guaranteed to return true if <value> is true, and allInvocationsARB() is guaranteed to return false if <value> is false.

    Since implementations are not required to combine invocations into groups, simply returning <value> for anyInvocationARB() and allInvocationsARB() and returning true for allInvocationsEqualARB() is a legal implementation of these functions.

    For fragment shaders, invocations in a SIMD invocation group may include invocations corresponding to pixels that are covered by a primitive being rasterized, as well as invocations corresponding to neighboring pixels not covered by the primitive. The invocations for these neighboring "helper" pixels may be created so that differencing can be used to evaluate derivative functions like dFdx() and dFdy() (section 8.13) and implicit derivatives used by texture() and related functions (section 8.9.2). The value of <value> for such "helper" pixels may affect the value returned by anyInvocationARB(), allInvocationsARB(), and allInvocationsEqualARB().

Additions to the AGL/EGL/GLX/WGL Specifications

    None

GLX Protocol

    TBD

Dependencies on NV_gpu_shader5

    The built-in functions defined by this extension provide the same functionality as the anyThreadNV(), allThreadsNV(), allThreadsEqualNV() functions in NV_gpu_shader5 and are implemented identically.

Errors

    None.

New State

    None.

New Implementation Dependent State

    None.

Issues

    (1) Should we provide built-ins exposing a fixed implementation-dependent SIMD workgroup size and/or the "location" of a single invocation within a fixed-size SIMD workgroup?

      RESOLVED: Not in this extension.

    (2) Should we provide mechanisms for sharing arbitrary data values across SIMD workgroups?

      RESOLVED: Not in this extension. For compute shaders, shared memory may already be used to share values across invocations in a single workgroup.

    (3) Is this capability supported for all shader types or just compute shaders?

      RESOLVED: All shader types.
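As a non-normative aid to reading Section 8.18 above, the following C sketch models the three built-in functions for a single SIMD invocation group on the CPU. The function names group_any(), group_all() and group_all_equal(), and the value[] and active[] arrays, are hypothetical and are not part of this extension. How a real implementation forms groups and evaluates the vote is implementation-dependent; this is only one way to express the stated semantics.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical CPU-side model of one SIMD invocation group.
 *   value[i]  : the <value> argument supplied by invocation i
 *   active[i] : true if invocation i executes the function call
 * Invocations with active[i] == false never influence the result. */

/* Models anyInvocationARB(): true iff some active invocation passes true. */
bool group_any(const bool *value, const bool *active, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (active[i] && value[i])
            return true;
    return false;
}

/* Models allInvocationsARB(): true iff every active invocation passes true. */
bool group_all(const bool *value, const bool *active, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (active[i] && !value[i])
            return false;
    return true;
}

/* Models allInvocationsEqualARB(): true iff the active invocations all
 * pass true or all pass false. */
bool group_all_equal(const bool *value, const bool *active, size_t n)
{
    return group_all(value, active, n) || !group_any(value, active, n);
}
```

With a group of size one, group_any() and group_all() simply return the caller's own value and group_all_equal() returns true, which matches the trivial legal implementation described in Section 8.18. Setting active[i] to false models an invocation that does not execute the call, for example because it did not take the enclosing branch.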
    (4) For compute shaders, is there any relationship between the workgroup and the SIMD invocation group across which conditions are evaluated?

      RESOLVED: No.

    (5) Is there any necessary relationship between SIMD workgroups in this extension and the workgroups for compute shaders?

      RESOLVED: No. It is expected that the SIMD workgroups in this extension are relatively small compared to a maximum-sized compute workgroup. On current NVIDIA GPUs, the SIMD workgroup size will be 32; however, maximum workgroup size (MAX_COMPUTE_WORK_GROUP_INVOCATIONS) for OpenGL 4.3 compute shaders is 1024.

      Perhaps there might be some small value in guaranteeing that a SIMD workgroup doesn't span compute workgroups. However, it's not clear that there is any specific value in doing so, and having such a restriction could limit parallelism for very small compute workgroups (where one might be able to fit multiple workgroups in a single SIMD workgroup).

    (6) How do the built-in functions work when called in conditionally executed code?

      RESOLVED: When these functions are called inside flow control, the value of <value> for invocations not executing the function call has no effect on the result. For example, consider this code:

        bool result = false;
        bool condition1, condition2;
        if (condition1) {
          result = allInvocationsARB(condition2);
        }

      For all invocations where <condition1> is false, the value of <result> will be false because allInvocationsARB() is not called. For the other invocations, the value of <result> will be true if and only if <condition2> is true for all invocations where <condition1> is also true.

      In this similar code:

        if (condition1) {
          result = allInvocationsARB(condition1);
        }

      allInvocationsARB() will always return true, since it will only be called by invocations where <condition1> is true.

    (7) What should an implementation do if it groups invocations into SIMD execution groups differently for different shader types?

      RESOLVED: As specified, there is no requirement of a specific SIMD group size.
      Additionally, there is no implementation-dependent constant requiring implementations to expose a single SIMD group size. If an implementation has different SIMD group sizes for different shaders, its implementation of the built-in functions could reflect such differences. Additionally, if an implementation doesn't even support SIMD execution for some shader types, it could simply treat each invocation as its own group.

    (8) Should we provide any query by which an application can discover the SIMD execution group size for a particular implementation? Or for a particular shader type, if any implementation might behave like the hypothetical one in issue (7)?

      RESOLVED: No. Given the limited functionality provided by this extension, it's not clear that there's anything useful applications could do with this information.

    (9) Fragment shaders have built-in functions -- dFdx(), dFdy(), and texture() -- that need to compute derivatives of their inputs in screen space. These derivatives may be approximated by computing the difference between the value of an input at the pixel in question and a neighboring pixel. For small or slivery triangles, a pixel may not actually have a neighboring pixel covered by the primitive. In order to allow for such differencing, implementations may need to create fragment shader invocations for uncovered neighboring pixels -- called "helper pixels". How do such fragment shader invocations affect the results of invocation group built-ins?

      RESOLVED: We specify that the results of the built-in functions can be affected by the inputs evaluated for "helper" pixels found in a SIMD execution group. If a condition is true for all "real" fragment shader invocations but false for some "helper" invocation, it's possible that allInvocationsARB() will return false.
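The situation described in issue (9) can be sketched with a small, non-normative C model of a fragment shader SIMD group. The struct lane type and the functions vote_all_with_helpers() and vote_all_covered_only() are hypothetical names introduced here for illustration. The model assumes an implementation in which helper invocations do take part in the vote, which the resolution above permits but does not require. The extension does not provide a covered-only vote; that function is included purely for comparison.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one fragment shader invocation in a SIMD group. */
struct lane {
    bool helper; /* true for an uncovered "helper" pixel */
    bool value;  /* the <value> this invocation passes to the vote */
};

/* Models allInvocationsARB() on an implementation where helper
 * invocations participate in the vote like any other invocation. */
bool vote_all_with_helpers(const struct lane *lanes, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (!lanes[i].value)
            return false;
    return true;
}

/* For comparison only: the result if helper invocations were ignored.
 * This extension does not define such a function. */
bool vote_all_covered_only(const struct lane *lanes, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (!lanes[i].helper && !lanes[i].value)
            return false;
    return true;
}
```

For a group holding three covered pixels whose condition is true and one helper pixel whose condition is false, the first function returns false while the second returns true. Shader code therefore should not assume that a vote reflects only the pixels covered by the primitive.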
    (10) For certain shading language operations indexing into arrays of resources (samplers, images, atomic counters, uniform blocks, and shader storage blocks), indices must be dynamically uniform to have defined results. Are the values returned by these new built-in functions considered dynamically uniform?

      RESOLVED: No. As defined, the values returned by these built-in functions should be the same for all invocations in the SIMD execution group that call them. However, for the purposes of some of these operations requiring dynamic uniformity, some implementations may require identical values over a group of invocations larger than a single SIMD execution group. Since these built-ins produce results that are only identical within a single group, they can't qualify as "dynamically uniform".

      In this code:

        uniform sampler2D samplers[2];
        bool condition = non_uniform_condition();
        vec4 texel = texture(samplers[condition ? 1 : 0], ...);

      the sampler accessed is *not* dynamically uniform. However, in this code:

        bool condition = allInvocationsARB(non_uniform_condition());
        vec4 texel = texture(samplers[condition ? 1 : 0], ...);

      the value of <condition> will be the same for all invocations in the SIMD execution group, so the index used to access <samplers> will also be the same. However, if dynamic uniformity requires two SIMD execution groups to have the same value, this wouldn't qualify because a second group could have a different value for <condition>.

    (11) Should we provide allInvocationsEqual() that could determine if the value of an integer/floating-point/vector variable is the same for all invocations in a SIMD execution group?

      RESOLVED: Not in this extension.

    (12) Does the use of built-in functions such as allInvocationsARB() have invariance issues?

      RESOLVED: Yes.
      The assignment of invocations to SIMD execution groups is implementation-dependent, and there is no guarantee that the assignment will be identical when rendering the exact same primitives in a different viewport, or even when rendering the same primitives in the same locations in different frames. Since the assignment of invocations to groups may vary from frame to frame, the value returned by allInvocationsARB() may also vary from frame to frame. If the computations performed when allInvocationsARB() returns true produce results nearly identical to those performed when it returns false, the lack of invariance may result in images that are identical except for least significant bits. If the computations are not identical, more severe flickering could occur.

    (13) How should we name this extension?

      RESOLVED: We originally called it ARB_shader_group_operations, as we considered a number of other options in addition to evaluating a boolean predicate across a SIMD execution group. But the final extension is limited to this specific operation, so a more specific name seems appropriate. We are using the term "vote", as it (like real voting) involves collecting "choices" of multiple entities to generate a single result and then returning the result of that collective choice.

Revision History

    Revision 7, December 10, 2018 (Jon Leech)
      - Use 'workgroup' consistently throughout (Bug 11723, internal API issue 87).

    Revision 6, May 30, 2013
      - Mark issue (13) as resolved.

    Revision 5, May 7, 2013
      - Extend the introduction to include an example of the use of the new built-in functions.
      - Add explicit language indicating that these functions return the same value for all invocations in a SIMD execution group.

    Revision 4, May 3, 2013
      - Add some more concrete examples to the introduction illustrating why these functions may be useful.
      - Rename the extension to ARB_shader_group_vote.
      - Add spec language indicating that fragment shader "helper" pixels may affect the results of these "vote" functions.
      - Mark various issues as resolved per working group discussions.
      - Add issues (11), (12), and (13).

    Revision 3, April 19, 2013
      - Add #extension infrastructure for this feature, since it will begin as an ARB extension. Add "ARB" suffixes on the names of the built-in functions.
      - Add discussion on issue (7) and new issues (8) through (10).

    Revision 2, March 28, 2013
      - Checkpoint updating some issues for spec review (not done yet).

    Revision 1, January 20, 2013
      - Initial revision.