Name

    ARB_shader_group_vote

Name Strings

    GL_ARB_shader_group_vote

Contact

    Pat Brown, NVIDIA Corporation (pbrown 'at' nvidia.com)

Contributors

    John Kessenich

Notice

    Copyright (c) 2013 The Khronos Group Inc. Copyright terms at
        http://www.khronos.org/registry/speccopyright.html

Specification Update Policy

    Khronos-approved extension specifications are updated in response to
    issues and bugs prioritized by the Khronos OpenGL Working Group. For
    extensions which have been promoted to a core Specification, fixes will
    first appear in the latest version of that core Specification, and will
    eventually be backported to the extension document. This policy is
    described in more detail at
        https://www.khronos.org/registry/OpenGL/docs/update_policy.php

Status

    Complete. Approved by the ARB on June 3, 2013.
    Ratified by the Khronos Board of Promoters on July 19, 2013.

Version

    Last Modified Date:         December 10, 2018
    Revision:                   7

Number

    ARB Extension #157

Dependencies

    This extension is written against the OpenGL 4.3 (Compatibility Profile)
    Specification, dated August 6, 2012.

    This extension is written against the OpenGL Shading Language
    Specification, Version 4.30, Revision 7, dated September 24, 2012.

    OpenGL 4.3 or ARB_compute_shader is required.

    This extension interacts with NV_gpu_shader5.

Overview

    This extension provides new built-in functions to compute the composite of
    a set of boolean conditions across a group of shader invocations.  These
    composite results may be used to execute shaders more efficiently on a
    single-instruction multiple-data (SIMD) processor.  The set of shader
    invocations across which boolean conditions are evaluated is
    implementation-dependent, and this extension provides no guarantee over
    how individual shader invocations are assigned to such sets.
    In particular, the set of shader invocations has no necessary
    relationship with the compute shader workgroup -- a pair of shader
    invocations in a single compute shader workgroup may end up in different
    sets used by these built-ins.

    Compute shaders operate on an explicitly specified group of threads (a
    workgroup), but many implementations of OpenGL 4.3 will even group
    non-compute shader invocations and execute them in a SIMD fashion.  When
    executing code like

      if (condition) {
        result = do_fast_path();
      } else {
        result = do_general_path();
      }

    where <condition> diverges between invocations, a SIMD implementation
    might first call do_fast_path() for the invocations where <condition> is
    true and leave the other invocations dormant.  Once do_fast_path()
    returns, it might call do_general_path() for invocations where <condition>
    is false and leave the other invocations dormant.  In this case, the
    shader executes *both* the fast and the general path and might be better
    off just using the general path for all invocations.

    This extension provides the ability to avoid divergent execution by
    evaluating a condition across an entire SIMD invocation group using code
    like:

      if (allInvocationsARB(condition)) {
        result = do_fast_path();
      } else {
        result = do_general_path();
      }

    The built-in function allInvocationsARB() will return the same value for
    all invocations in the group, so the group will either execute
    do_fast_path() or do_general_path(), but never both.  For example, shader
    code might want to evaluate a complex function iteratively by starting
    with an approximation of the result and then refining the approximation.
    Some input values may require a small number of iterations to generate an
    accurate result (do_fast_path) while others require a larger number
    (do_general_path).
    In another example, shader code might want to evaluate a complex function
    (do_general_path) that can be greatly simplified when assuming a specific
    value for one of its inputs (do_fast_path).

New Procedures and Functions

    None.

New Tokens

    None.

Modifications to the OpenGL 4.3 (Compatibility Profile) Specification

    None.

Modifications to the OpenGL Shading Language Specification, Version 4.30

    Including the following line in a shader can be used to control the
    language features described in this extension:

      #extension GL_ARB_shader_group_vote : <behavior>

    where <behavior> is as specified in section 3.3.

    New preprocessor #defines are added to the OpenGL Shading Language:

      #define GL_ARB_shader_group_vote              1


    Modify Chapter 8, Built-in Functions, p. 129

    (insert a new section at the end of the chapter)

    Section 8.18, Shader Invocation Group Functions

    Implementations of the OpenGL Shading Language may optionally group
    multiple shader invocations for a single shader stage into a single SIMD
    invocation group, where invocations are assigned to groups in an
    undefined, implementation-dependent manner.  Shader algorithms on such
    implementations may benefit from being able to evaluate a composite of
    boolean values over all active invocations in a group.

    Syntax:

      bool anyInvocationARB(bool value);
      bool allInvocationsARB(bool value);
      bool allInvocationsEqualARB(bool value);

    The function anyInvocationARB() returns true if and only if <value> is
    true for at least one active invocation in the group.

    The function allInvocationsARB() returns true if and only if <value> is
    true for all active invocations in the group.

    The function allInvocationsEqualARB() returns true if <value> is the same
    for all active invocations in the group.
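
    The three definitions above can be sketched as a small CPU-side model.
    This is an illustration only, written in Python rather than GLSL; the
    function names any_invocation(), all_invocations(), and
    all_invocations_equal() are hypothetical stand-ins for the built-ins, and
    each list is assumed to hold <value> for the active invocations of one
    group.  Real implementations form groups in an undefined manner.

```python
# Illustrative CPU-side model only; not how any GPU implements these built-ins.
# Each list holds <value> for the active invocations of one hypothetical group.

def any_invocation(values):
    """Model of anyInvocationARB(): true iff <value> is true somewhere."""
    return any(values)

def all_invocations(values):
    """Model of allInvocationsARB(): true iff <value> is true everywhere."""
    return all(values)

def all_invocations_equal(values):
    """Model of allInvocationsEqualARB(): true iff every <value> matches."""
    return all(values) or not any(values)

group = [True, False, True, True]
print(any_invocation(group), all_invocations(group), all_invocations_equal(group))
```

    For the mixed group shown, the model reports true for "any" and false for
    both "all" and "all equal".  A group whose members all pass false would
    report false, false, and true respectively.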

    For all of these functions, the same value is returned to all active
    invocations in the group.

    These functions may be called in conditionally executed code.  In groups
    where some invocations do not execute the function call, the value
    returned by the function is not affected by any invocation not calling the
    function, even when <value> is well-defined for that invocation.

    Since these functions depend on the values of <value> in an undefined
    group of invocations, the value returned by these functions is largely
    undefined.  However, anyInvocationARB() is guaranteed to return true if
    <value> is true, and allInvocationsARB() is guaranteed to return false if
    <value> is false.

    Since implementations are not required to combine invocations into groups,
    simply returning <value> for anyInvocationARB() and allInvocationsARB()
    and returning true for allInvocationsEqualARB() is a legal implementation
    of these functions.

    For fragment shaders, invocations in a SIMD invocation group may include
    invocations corresponding to pixels that are covered by a primitive being
    rasterized, as well as invocations corresponding to neighboring pixels not
    covered by the primitive.  The invocations for these neighboring "helper"
    pixels may be created so that differencing can be used to evaluate
    derivative functions like dFdx() and dFdy() (section 8.13) and implicit
    derivatives used by texture() and related functions (section 8.9.2).  The
    value of <value> for such "helper" pixels may affect the value returned by
    anyInvocationARB(), allInvocationsARB(), and allInvocationsEqualARB().
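
    The two guarantees and the trivial legal implementation described in this
    section can be checked with a short Python model.  This is a sketch under
    stated assumptions, not an implementation: vote() and guarantees_hold()
    are hypothetical helper names, and a group is modeled as a plain list of
    the <value> arguments passed by its active invocations.

```python
from itertools import product

def vote(values):
    """Return (any, all, allEqual) for one hypothetical group of invocations."""
    return any(values), all(values), all(values) or not any(values)

def guarantees_hold(values):
    """Check the two per-invocation guarantees for every member of the group."""
    any_result, all_result, _ = vote(values)
    for value in values:
        if value and not any_result:
            return False  # the "any" vote must be true when <value> is true
        if not value and all_result:
            return False  # the "all" vote must be false when <value> is false
    return True

# Exhaustively check every possible group of 1 to 4 active invocations.
checked = all(guarantees_hold(list(values))
              for size in range(1, 5)
              for values in product([False, True], repeat=size))
print(checked)

# A group of one reduces to the trivial implementation: <value>, <value>, true.
for value in (False, True):
    print(value, vote([value]))
```

    In this model a group of one invocation yields exactly the results of the
    trivial implementation, which is why that implementation satisfies the
    definitions no matter how invocations would otherwise be grouped.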

Additions to the AGL/EGL/GLX/WGL Specifications

    None

GLX Protocol

    TBD

Dependencies on NV_gpu_shader5

    The built-in functions defined by this extension provide the same
    functionality as the anyThreadNV(), allThreadsNV(), allThreadsEqualNV()
    functions in NV_gpu_shader5 and are implemented identically.

Errors

    None.

New State

    None.

New Implementation Dependent State

    None.

Issues

    (1) Should we provide built-ins exposing a fixed implementation-dependent
        SIMD workgroup size and/or the "location" of a single invocation
        within a fixed-size SIMD workgroup?

      RESOLVED:  Not in this extension.

    (2) Should we provide mechanisms for sharing arbitrary data values across
        SIMD workgroups?

      RESOLVED:  Not in this extension.

      For compute shaders, shared memory may already be used to share values
      across invocations in a single workgroup.

    (3) Is this capability supported for all shader types or just compute
        shaders?

      RESOLVED:  All shader types.

    (4) For compute shaders, is there any relationship between the
        workgroup and the SIMD invocation group across which conditions are
        evaluated?

      RESOLVED:  No.

    (5) Is there any necessary relationship between SIMD workgroups in this
        extension and the workgroups for compute shaders?

      RESOLVED:  No.  It is expected that the SIMD workgroups in this
      extension are relatively small compared to a maximum-sized compute
      workgroup.  On current NVIDIA GPUs, the SIMD workgroup size will be 32;
      however, maximum workgroup size (MAX_COMPUTE_WORK_GROUP_INVOCATIONS)
      for OpenGL 4.3 compute shaders is 1024.

      Perhaps there might be some small value in guaranteeing that a SIMD
      workgroup doesn't span compute workgroups.
      However, it's not clear that there is any specific value in doing so,
      and having such a restriction could limit parallelism for very small
      compute workgroups (where one might be able to fit multiple workgroups
      in a single SIMD workgroup).

    (6) How do the built-in functions work when called in conditionally
        executed code?

      RESOLVED:  When these functions are called inside flow control, the
      values for invocations not executing the function call have no effect
      on the result.  For example, consider this code:

        bool result = false;
        bool condition1, condition2;
        if (condition1) {
          result = allInvocationsARB(condition2);
        }

      For all invocations where <condition1> is false, the value of <result>
      will be false because allInvocationsARB() is not called.  For the other
      invocations, the value of <result> will be true if and only if
      <condition2> is true for all invocations where <condition1> is also
      true.  In this similar code:

        if (condition1) {
          result = allInvocationsARB(condition1);
        }

      allInvocationsARB() will always return true, since it will only be
      called by invocations where <condition1> is true.

    (7) What should an implementation do if it groups invocations into SIMD
        execution groups differently for different shader types?

      RESOLVED:  As specified, there is no requirement of a specific SIMD
      group size.  Additionally, there is no implementation-dependent constant
      requiring implementations to expose a single SIMD group size.

      If an implementation has different SIMD group sizes for different
      shaders, its implementation of the built-in functions could reflect such
      differences.  Additionally, if an implementation doesn't even support
      SIMD execution for some shader types, it could simply treat each
      invocation as its own group.
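
      The conditional-execution rule from issue (6) can be modeled on the
      CPU.  The Python sketch below is illustrative only: run_group() is a
      hypothetical helper, and the two lists are assumed to hold <condition1>
      and <condition2> for the invocations of one group.

```python
def run_group(condition1, condition2):
    """Model 'if (condition1) result = allInvocationsARB(condition2);'."""
    # Only invocations that enter the branch are active at the call site.
    active = [i for i, taken in enumerate(condition1) if taken]
    vote = all(condition2[i] for i in active)
    # Invocations that skip the branch keep the initial value, false.
    return [vote if taken else False for taken in condition1]

condition1 = [True, True, False, True]
condition2 = [True, True, False, True]

# Invocation 2 has a false <condition2> but never calls the function,
# so it does not affect the vote seen by the other three invocations.
print(run_group(condition1, condition2))

# Voting on the branch condition itself: every caller passes true.
print(run_group(condition1, condition1))
```

      Both calls produce true for invocations 0, 1, and 3 and false for
      invocation 2, matching the two cases discussed in issue (6).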

    (8) Should we provide any query by which an application can discover the
        SIMD execution group size for a particular implementation?  Or for a
        particular shader type, if any implementation might behave like the
        hypothetical one in issue (7)?

      RESOLVED:  No.  Given the limited functionality provided by this
      extension, it's not clear that there's anything useful applications
      could do with this information.

    (9) Fragment shaders have built-in functions -- dFdx(), dFdy(), and
        texture() -- that need to compute derivatives of their inputs in
        screen space.  These derivatives may be approximated by computing the
        difference between the value of an input at the pixel in question and
        a neighboring pixel.  For small or slivery triangles, a pixel may not
        actually have a neighboring pixel covered by the primitive.  In order
        to allow for such differencing, implementations may need to create
        fragment shader invocations for uncovered neighboring pixels -- called
        "helper pixels".  How do such fragment shader invocations affect the
        results of invocation group built-ins?

      RESOLVED:  We specify that the results of the built-in functions can be
      affected by the inputs evaluated for "helper" pixels found in a SIMD
      execution group.  If a condition is true for all "real" fragment shader
      invocations but false for some "helper" invocation, it's possible that
      allInvocationsARB() will return false.

    (10) For certain shading language operations indexing into arrays of
         resources (samplers, images, atomic counters, uniform blocks, and
         shader storage blocks), indices must be dynamically uniform to have
         defined results.  Are the values returned by these new built-in
         functions considered dynamically uniform?

      RESOLVED:  No.

      As defined, the values returned by these built-in functions should be
      the same for all invocations in the SIMD execution group that call them.
      However, for the purposes of some of these operations requiring dynamic
      uniformity, some implementations may require identical values over a
      group of invocations larger than a single SIMD execution group.  Since
      these built-ins produce results that are only identical within a single
      group, they can't qualify as "dynamically uniform".

      In this code:

        uniform sampler2D samplers[2];
        bool condition = non_uniform_condition();
        vec4 texel = texture(samplers[condition ? 1 : 0], ...);

      the sampler accessed is *not* dynamically uniform.  However, in this
      code:

        bool condition = allInvocationsARB(non_uniform_condition());
        vec4 texel = texture(samplers[condition ? 1 : 0], ...);

      the value of <condition> will be the same for all invocations in the
      SIMD execution group, so the index used to access <samplers> will also
      be the same.  However, if dynamic uniformity requires two SIMD execution
      groups to have the same value, this wouldn't qualify because a second
      group could have a different value for <condition>.

    (11) Should we provide allInvocationsEqual() that could determine if the
         value of an integer/floating-point/vector variable is the same for
         all invocations in a SIMD execution group?

      RESOLVED:  Not in this extension.

    (12) Does the use of built-in functions such as allInvocationsARB() have
         invariance issues?

      RESOLVED:  Yes.  The assignment of invocations to SIMD execution groups
      is implementation-dependent, and there is no guarantee that the
      assignment will be identical when rendering the exact same primitives in
      a different viewport, or even when rendering the same primitives in the
      same locations in different frames.  Since the assignment of invocations
      to groups may vary from frame to frame, the value returned by
      allInvocationsARB() may also vary from frame to frame.
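
      The dependence on group assignment can be illustrated with a Python
      model.  This is a sketch under assumptions that the extension does not
      make: all_invocations_per_group() is a hypothetical helper, and it
      partitions invocations into contiguous, fixed-size groups, whereas real
      assignment is undefined and implementation-dependent.

```python
def all_invocations_per_group(values, group_size):
    """Model allInvocationsARB() under one hypothetical group assignment."""
    results = []
    for start in range(0, len(values), group_size):
        group = values[start:start + group_size]
        # Every invocation in a group receives that group's shared result.
        results.extend([all(group)] * len(group))
    return results

# The same eight per-invocation conditions under two different groupings.
values = [True, True, True, False, True, True, True, True]
print(all_invocations_per_group(values, 4))
print(all_invocations_per_group(values, 2))
```

      With groups of four, invocations 0 through 2 receive false because they
      share a group with invocation 3.  With groups of two, invocations 0 and
      1 receive true instead, even though no input changed.  The two groups of
      four also return different values from each other, which is the
      situation that prevents the results from being dynamically uniform in
      issue (10).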

      If the computations performed when allInvocationsARB() returns true
      produce results nearly identical to those performed when it returns
      false, the lack of invariance may result in images that are identical
      except for least significant bits.  If the computations are not
      identical, more severe flickering could occur.

    (13) How should we name this extension?

      RESOLVED:  We originally called it ARB_shader_group_operations, when we
      were considering a number of other options in addition to evaluating a
      boolean predicate across a SIMD execution group.  But the final
      extension is limited to this specific operation, so a more specific name
      seems appropriate.  We are using the term "vote", as it (like real
      voting) involves collecting "choices" of multiple entities to generate a
      single result and then returning the result of that collective choice.

Revision History

    Revision 7, December 10, 2018 (Jon Leech)
      - Use 'workgroup' consistently throughout (Bug 11723, internal API
        issue 87).

    Revision 6, May 30, 2013
      - Mark issue (13) as resolved.

    Revision 5, May 7, 2013
      - Extend the introduction to include an example of the use of the new
        built-in functions.
      - Add explicit language indicating that these functions return the same
        value for all invocations in a SIMD execution group.

    Revision 4, May 3, 2013
      - Add some more concrete examples to the introduction illustrating why
        these functions may be useful.
      - Rename the extension to ARB_shader_group_vote.
      - Add spec language indicating that fragment shader "helper" pixels
        may affect the results of these "vote" functions.
      - Mark various issues as resolved per working group discussions.
      - Add issues (11), (12), and (13).

    Revision 3, April 19, 2013
      - Add #extension infrastructure for this feature, since it will begin as
        an ARB extension.
        Add "ARB" suffixes on the names of the built-in functions.
      - Add discussion on issue (7) and new issues (8) through (10).

    Revision 2, March 28, 2013
      - Checkpoint updating some issues for spec review (not done yet).

    Revision 1, January 20, 2013
      - Initial revision.