Name

    ARB_compute_variable_group_size

Name Strings

    GL_ARB_compute_variable_group_size

Contact

    Pat Brown, NVIDIA Corporation (pbrown 'at' nvidia.com)

Contributors

    Slawomir Grajewski, Intel Corporation
    Jeannot Breton, NVIDIA
    Daniel Koch, NVIDIA

Notice

    Copyright (c) 2013 The Khronos Group Inc. Copyright terms at
        http://www.khronos.org/registry/speccopyright.html

Specification Update Policy

    Khronos-approved extension specifications are updated in response to
    issues and bugs prioritized by the Khronos OpenGL Working Group. For
    extensions which have been promoted to a core Specification, fixes will
    first appear in the latest version of that core Specification, and will
    eventually be backported to the extension document. This policy is
    described in more detail at
        https://www.khronos.org/registry/OpenGL/docs/update_policy.php

Status

    Complete. Approved by the ARB on June 3, 2013.
    Ratified by the Khronos Board of Promoters on July 19, 2013.

Version

    Last Modified Date: December 10, 2018
    Revision: 9

Number

    ARB Extension #153

Dependencies

    This extension is written against the OpenGL 4.3 (Compatibility Profile)
    Specification, dated August 6, 2012.

    This extension is written against the OpenGL Shading Language
    Specification, Version 4.30, Revision 7, dated September 24, 2012.

    OpenGL 4.3 or ARB_compute_shader is required.

    This extension interacts with NV_compute_program5.

Overview

    This extension allows applications to write generic compute shaders that
    operate on workgroups with arbitrary dimensions. Instead of specifying a
    fixed workgroup size in the compute shader, an application can use the
    /local_size_variable/ layout qualifier in the compute shader to indicate
    a variable workgroup size.
    When using such compute shaders, the new command
    DispatchComputeGroupSizeARB must be used to specify both a workgroup
    size and a workgroup count.

    In this extension, compute shaders with fixed group sizes must be
    dispatched by DispatchCompute or DispatchComputeIndirect. Compute
    shaders with variable group sizes must be dispatched via
    DispatchComputeGroupSizeARB. No support is provided in this extension for
    indirect dispatch of compute shaders with a variable group size.

New Procedures and Functions

    void DispatchComputeGroupSizeARB(uint num_groups_x, uint num_groups_y,
                                     uint num_groups_z, uint group_size_x,
                                     uint group_size_y, uint group_size_z);

New Tokens

    Accepted by the <pname> parameter of GetIntegerv, GetBooleanv, GetFloatv,
    GetDoublev and GetInteger64v:

        MAX_COMPUTE_VARIABLE_GROUP_INVOCATIONS_ARB      0x9344
        MAX_COMPUTE_FIXED_GROUP_INVOCATIONS_ARB         0x90EB  (see note)

    Accepted by the <target> parameter of GetIntegeri_v, GetBooleani_v,
    GetFloati_v, GetDoublei_v and GetInteger64i_v:

        MAX_COMPUTE_VARIABLE_GROUP_SIZE_ARB             0x9345
        MAX_COMPUTE_FIXED_GROUP_SIZE_ARB                0x91BF  (see note)

    Note: MAX_COMPUTE_FIXED_GROUP_INVOCATIONS_ARB and
    MAX_COMPUTE_FIXED_GROUP_SIZE_ARB are aliases for the OpenGL 4.3 core enums
    MAX_COMPUTE_WORK_GROUP_INVOCATIONS and MAX_COMPUTE_WORK_GROUP_SIZE,
    respectively.


Modifications to the OpenGL 4.3 (Compatibility Profile) Specification

    Modify Chapter 19, Compute Shaders, p. 585

    (modify second paragraph, p. 585)

    ... One or more workgroups are launched by calling

        void DispatchCompute(uint num_groups_x, uint num_groups_y,
                             uint num_groups_z);

    or

        void DispatchComputeGroupSizeARB(uint num_groups_x, uint num_groups_y,
                                         uint num_groups_z, uint group_size_x,
                                         uint group_size_y, uint group_size_z);

    (modify second paragraph, p. 586)

    For DispatchCompute, the workgroup size in each dimension must be
    specified at compile time in the active program for the compute shader
    stage. The workgroup size is specified using an input layout qualifier
    ...

    (insert after second paragraph, p. 586)

    For DispatchComputeGroupSizeARB, the workgroup size must be specified as
    variable in the active program for the compute shader stage. The group
    size used to execute the compute shader is taken from the <group_size_x>,
    <group_size_y>, and <group_size_z> parameters. For the purposes of the
    COMPUTE_WORK_GROUP_SIZE query, a program without a workgroup size
    specified at compile time will be considered to have a size of zero in
    each dimension.

    (modify the third paragraph, p. 586)

    The maximum size of a workgroup may be determined by calling
    GetIntegeri_v with <index> set to 0, 1, or 2 to retrieve the maximum
    workgroup size in the X, Y, and Z dimensions, respectively. <target>
    should be set to MAX_COMPUTE_FIXED_GROUP_SIZE_ARB for compute shaders
    with fixed group sizes or MAX_COMPUTE_VARIABLE_GROUP_SIZE_ARB for compute
    shaders with variable local group sizes. Furthermore, the maximum number
    of invocations in a single workgroup (i.e., the product of the three
    dimensions) may be determined by calling GetIntegerv with <pname> set to
    MAX_COMPUTE_FIXED_GROUP_INVOCATIONS_ARB for compute shaders with fixed
    group sizes or MAX_COMPUTE_VARIABLE_GROUP_INVOCATIONS_ARB for compute
    shaders with variable group sizes.

    (insert after the first INVALID_OPERATION error in the first error block,
    shared between DispatchCompute and DispatchComputeGroupSizeARB, p. 586)

    An INVALID_OPERATION error is generated by DispatchCompute if the active
    program for the compute shader stage has a variable workgroup size.
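    The dispatch rules above can be checked by an application before it
    issues a dispatch. The following is a minimal C sketch, not part of this
    specification and not an OpenGL API: the helper names are hypothetical,
    and it assumes the caller has already retrieved the program's three
    COMPUTE_WORK_GROUP_SIZE values (for example, with GetProgramiv) and the
    MAX_COMPUTE_VARIABLE_GROUP_SIZE_ARB and
    MAX_COMPUTE_VARIABLE_GROUP_INVOCATIONS_ARB limits. It decides whether a
    program needs DispatchComputeGroupSizeARB and mirrors the INVALID_VALUE
    conditions this extension defines for that command's group size
    arguments.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper. `size` holds the three values a program reports for
 * the COMPUTE_WORK_GROUP_SIZE query (e.g., retrieved with glGetProgramiv).
 * Under this extension, a program with a variable workgroup size reports
 * zero in each dimension; a fixed-size program reports non-zero values. */
static bool uses_variable_group_size(const int32_t size[3])
{
    return size[0] == 0 && size[1] == 0 && size[2] == 0;
}

/* Hypothetical helper mirroring the INVALID_VALUE conditions for
 * DispatchComputeGroupSizeARB. `max_size` holds the three
 * MAX_COMPUTE_VARIABLE_GROUP_SIZE_ARB values and `max_invocations` the
 * MAX_COMPUTE_VARIABLE_GROUP_INVOCATIONS_ARB value, both queried from the
 * current context by the caller. */
static bool variable_group_size_is_valid(const uint32_t group_size[3],
                                         const int32_t max_size[3],
                                         int32_t max_invocations)
{
    if (max_invocations < 1)
        return false; /* no group size can satisfy a non-positive limit */

    uint64_t invocations = 1;
    for (int i = 0; i < 3; ++i) {
        /* Each dimension must be at least 1 and within its own limit. */
        if (group_size[i] == 0 ||
            (int64_t)group_size[i] > (int64_t)max_size[i])
            return false;
        /* Before each multiply the running product is <= max_invocations
         * (< 2^31) and group_size[i] < 2^32, so 64 bits cannot overflow. */
        invocations *= group_size[i];
        if (invocations > (uint64_t)max_invocations)
            return false;
    }
    return true;
}
```

    With the minimum limits required by this extension (512/512/64 per
    dimension and 512 invocations), a 16x16x1 group passes, while a 32x32x1
    group fails the invocation-count check (1024 > 512) and a 1x1x65 group
    fails the Z-dimension check.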

    An INVALID_OPERATION error is generated by DispatchComputeGroupSizeARB if
    the active program for the compute shader stage has a fixed workgroup
    size.

    (insert at the end of the first error block, shared between
    DispatchCompute and DispatchComputeGroupSizeARB, p. 586)

    An INVALID_VALUE error is generated by DispatchComputeGroupSizeARB if any
    of <group_size_x>, <group_size_y>, or <group_size_z> is less than or equal
    to zero or greater than the maximum workgroup size for compute shaders
    with variable group size (MAX_COMPUTE_VARIABLE_GROUP_SIZE_ARB) in the
    corresponding dimension.

    An INVALID_VALUE error is generated by DispatchComputeGroupSizeARB if the
    product of <group_size_x>, <group_size_y>, and <group_size_z> exceeds the
    implementation-dependent maximum workgroup invocation count for compute
    shaders with variable group size
    (MAX_COMPUTE_VARIABLE_GROUP_INVOCATIONS_ARB).

    (insert at the end of the first error block, for DispatchComputeIndirect,
    p. 587)

    An INVALID_OPERATION error is generated if the active program for the
    compute shader stage has a variable workgroup size.


Modifications to the OpenGL Shading Language Specification, Version 4.30

    Including the following line in a shader can be used to control the
    language features described in this extension:

        #extension GL_ARB_compute_variable_group_size : <behavior>

    where <behavior> is as specified in section 3.3.

    New preprocessor #defines are added to the OpenGL Shading Language:

        #define GL_ARB_compute_variable_group_size      1


    Modify Section 4.4.1.4, Compute Shader Inputs (p. 59)

    (add to list of layout qualifiers for compute shader inputs, p. 59)

        layout-qualifier-id
            local_size_x = integer-constant
            local_size_y = integer-constant
            local_size_z = integer-constant
            local_size_variable

    (modify the last paragraph, p. 59)

    The local_size_x, local_size_y, and local_size_z qualifiers are used to
    declare a fixed local group size for the kernel in the first, second...

    (modify the second to last paragraph in the section)

    If the fixed local group size of the shader in any dimension...
    ... If multiple compute shaders attached to a single program object
    declare a fixed local group size, the declarations must be identical;
    otherwise a link-time error results.

    (insert before the last paragraph of the section, p. 60)

    The *local_size_variable* qualifier is used to declare that the local
    group size of the shader is variable, and will be specified using
    arguments to OpenGL API compute dispatch commands. If a compute shader
    including a *local_size_variable* qualifier also declares a fixed local
    group size using the *local_size_x*, *local_size_y*, or *local_size_z*
    qualifiers, a compile-time error results. If one compute shader attached
    to a program declares a variable local group size and a second compute
    shader attached to the same program declares a fixed local group size, a
    link-time error results.

    (modify last paragraph of the section, p. 60, which specified link errors
    if *local_size* layout qualifiers were omitted)

    Furthermore, if a program object contains any compute shaders, at least
    one must contain an input layout qualifier specifying a fixed or variable
    local group size for the program, or a link-time error will occur.


    Modify Section 7.1, Built-In Language Variables, p. 110

    (add to list of compute built-ins, p. 110)

        in    uvec3 gl_NumWorkGroups;        // already exists in 4.30
        const uvec3 gl_WorkGroupSize;        // already exists in 4.30
        in    uvec3 gl_LocalGroupSizeARB;    // new!

    (modify third paragraph, p. 113)

    The built-in constant gl_WorkGroupSize is a compute-shader constant ...
    It is a compile-time error to use gl_WorkGroupSize in a shader that does
    not declare a fixed local group size, or before that shader has declared
    a fixed local group size, using local_size_x, local_size_y, and
    local_size_z. ...

    (insert after third paragraph, p. 113)

    The built-in variable /gl_LocalGroupSizeARB/ is a compute-shader input
    variable containing the workgroup size for the current compute-shader
    workgroup. For compute shaders with a fixed local group size (using
    *local_size_x*, *local_size_y*, or *local_size_z* layout qualifiers), its
    value will be the same as the constant /gl_WorkGroupSize/. For compute
    shaders with a variable local group size (using *local_size_variable*),
    the value of /gl_LocalGroupSizeARB/ will be the workgroup size specified
    in the OpenGL API command dispatching the current compute shader work.

    (modify next-to-last paragraph, p. 113)

    The built-in variable gl_LocalInvocationID ... The possible values for
    this variable range across the workgroup size, i.e., (0,0,0) to
    (gl_LocalGroupSizeARB.x - 1, gl_LocalGroupSizeARB.y - 1,
    gl_LocalGroupSizeARB.z - 1).

    (modify last paragraph, p. 113)

    The built-in variable gl_GlobalInvocationID ... This is computed as:

        gl_GlobalInvocationID = gl_WorkGroupID * gl_LocalGroupSizeARB +
                                gl_LocalInvocationID;


    (modify first paragraph, p. 114)

    The built-in variable gl_LocalInvocationIndex ... This is computed as:

        gl_LocalInvocationIndex =
            gl_LocalInvocationID.z * (gl_LocalGroupSizeARB.x *
                                      gl_LocalGroupSizeARB.y) +
            gl_LocalInvocationID.y * gl_LocalGroupSizeARB.x +
            gl_LocalInvocationID.x;


Additions to the AGL/EGL/GLX/WGL Specifications

    None

GLX Protocol

    TBD

Dependencies on NV_compute_program5

    If NV_compute_program5 is supported, variable workgroup sizes are
    supported for assembly programs.
    Make the following edits to the NV_compute_program5 specification:

    (modify the NV_compute_program5 edits to Section 2.X.3.2, Program
    Attribute Variables)

    If a compute attribute binding matches "invocation.groupsize", the "x",
    "y", and "z" components of the invocation attribute variable are filled
    with the "x", "y", and "z" dimensions, respectively, of the workgroup,
    as specified by the GROUP_SIZE declaration for programs with fixed-size
    workgroups or through the OpenGL API for programs with variable-size
    workgroups. The "w" component of the attribute is undefined.

    (add to section 2.X.6 of the NV_gpu_program4/5 spec, Program Options)

    + Compute Shader Variable Group Size (ARB_compute_variable_group_size)

    If a program specifies the "ARB_compute_variable_group_size" option, it
    supports variable-size workgroups. Compute programs with a variable
    workgroup size must be dispatched with DispatchComputeGroupSizeARB.
    Compute programs with a fixed workgroup size must be dispatched with
    DispatchCompute or DispatchComputeIndirect.

    (modify Section 2.X.7.Y, Compute Program Declarations)

    - Shader Thread Group Size (GROUP_SIZE)

    The GROUP_SIZE statement declares the number of shader threads in a one-,
    two-, or three-dimensional workgroup. The statement must have one to
    three unsigned integer arguments. Each argument must be less than or
    equal to the value of the implementation-dependent limit
    MAX_COMPUTE_LOCAL_WORK_SIZE for its corresponding dimension (X, Y, or Z).
    If the ARB_compute_variable_group_size option is specified, no fixed
    group size may be specified, and a program will fail to load if it
    includes any GROUP_SIZE declaration. If the
    ARB_compute_variable_group_size option is not specified, a program will
    fail to load unless it contains exactly one GROUP_SIZE declaration.
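    Because gl_GlobalInvocationID and gl_LocalInvocationIndex are now
    defined in terms of the runtime input gl_LocalGroupSizeARB, their values
    for a variable-size dispatch depend on the group size passed to
    DispatchComputeGroupSizeARB. The following minimal C model of the
    Section 7.1 formulas above may help when checking expected indices on
    the host. It is only a sketch: the uvec3 struct and the function names
    are hypothetical stand-ins, not OpenGL or GLSL API.

```c
#include <stdint.h>

/* Hypothetical host-side stand-in for the GLSL uvec3 type. */
typedef struct { uint32_t x, y, z; } uvec3;

/* Models gl_GlobalInvocationID = gl_WorkGroupID * gl_LocalGroupSizeARB +
 *                                gl_LocalInvocationID (component-wise). */
static uvec3 global_invocation_id(uvec3 work_group_id,
                                  uvec3 local_group_size,
                                  uvec3 local_invocation_id)
{
    uvec3 id = {
        work_group_id.x * local_group_size.x + local_invocation_id.x,
        work_group_id.y * local_group_size.y + local_invocation_id.y,
        work_group_id.z * local_group_size.z + local_invocation_id.z,
    };
    return id;
}

/* Models gl_LocalInvocationIndex: linearizes gl_LocalInvocationID with x
 * varying fastest, then y, then z. */
static uint32_t local_invocation_index(uvec3 local_group_size,
                                       uvec3 local_invocation_id)
{
    return local_invocation_id.z * (local_group_size.x * local_group_size.y) +
           local_invocation_id.y * local_group_size.x +
           local_invocation_id.x;
}
```

    For example, with a group size of 8x4x2 passed to
    DispatchComputeGroupSizeARB, the invocation with gl_LocalInvocationID
    (3,2,1) in workgroup (2,0,1) has gl_LocalInvocationIndex 51
    (1*32 + 2*8 + 3) and gl_GlobalInvocationID (19,2,3).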

Errors

    An INVALID_OPERATION error is generated by DispatchCompute or
    DispatchComputeIndirect if the active program for the compute shader stage
    has a variable workgroup size.

    An INVALID_OPERATION error is generated by DispatchComputeGroupSizeARB if
    the active program for the compute shader stage has a fixed workgroup
    size.

    An INVALID_VALUE error is generated by DispatchComputeGroupSizeARB if any
    of <group_size_x>, <group_size_y>, or <group_size_z> is less than or equal
    to zero or greater than the maximum workgroup size for compute shaders
    with variable group size (MAX_COMPUTE_VARIABLE_GROUP_SIZE_ARB) in the
    corresponding dimension.

    An INVALID_VALUE error is generated by DispatchComputeGroupSizeARB if the
    product of <group_size_x>, <group_size_y>, and <group_size_z> exceeds the
    implementation-dependent maximum workgroup invocation count for compute
    shaders with variable group size
    (MAX_COMPUTE_VARIABLE_GROUP_INVOCATIONS_ARB).

New State

    None.

New Implementation Dependent State

    Add to Table 23.73 (Implementation Dependent Compute Shader Limits),
    p. 716

                                                   Minimum
    Get Value                 Type  Get Command    Value      Description                    Sec.
    ------------------------- ----- -------------- ---------- ------------------------------ ----
    MAX_COMPUTE_VARIABLE_     3xZ+  GetIntegeri_v  512 (x,y)  maximum local group size for   19
      GROUP_SIZE_ARB                               64 (z)     compute shaders with variable
                                                              group size (per dimension)
    MAX_COMPUTE_VARIABLE_     Z+    GetIntegerv    512        maximum number of invocations  19
      GROUP_INVOCATIONS_ARB                                   in a group for compute shaders
                                                              with variable group size

    In table 23.73, rename entries for "MAX_COMPUTE_WORK_GROUP_SIZE" and
    "MAX_COMPUTE_WORK_GROUP_INVOCATIONS" to use the labels
    "MAX_COMPUTE_FIXED_GROUP_SIZE_ARB" and
    "MAX_COMPUTE_FIXED_GROUP_INVOCATIONS_ARB", respectively.
    Also modify the description of these entries to refer to "compute
    shaders with fixed group size".

Issues

    (1) If a compute shader declares a workgroup size, can it be dispatched
        using OpenGL APIs accepting an explicit workgroup size as part of the
        command? If so, what happens?

      RESOLVED: No. Attempting to do so will generate an INVALID_OPERATION
      error.

      Since the fixed workgroup size may affect the compilation of the shader
      and the value of certain built-in constants, having the OpenGL API
      override the workgroup size baked into the compute shader seems
      suspect. We could conceivably allow an explicit workgroup size in the
      OpenGL API and require that it match the workgroup size baked into the
      compute shader, but doing so seems to be of limited value.

    (2) If a compute shader doesn't declare a workgroup size, can it be
        dispatched using OpenGL APIs that do not accept an explicit workgroup
        size as part of the command? If so, what happens?

      RESOLVED: No. Attempting to do so will generate an INVALID_OPERATION
      error.

      We could theoretically treat this case as allowing OpenGL
      implementations to pick a workgroup size that "works well" on a
      particular piece of hardware. However, that wouldn't resolve the
      question of what the "num_groups" arguments to DispatchCompute would
      mean if the group size were implementation-dependent. One could
      interpret the "num_groups" arguments as specifying the number of
      *invocations* in each dimension, as though the group size were 1x1x1.
      But it's simpler to make this condition an error, as we do for APIs
      attempting to override the group size of a compute shader.

    (3) What new GLSL built-ins should we provide to expose the group size
        specified in the OpenGL API?

      RESOLVED: We will provide a new built-in variable exposing the group
      size specified in the API.
      The name choice is potentially tricky, since we
      now have two different "workgroup size" variables: a previously
      existing constant for the fixed workgroup size and now a second input
      for the variable workgroup size specified in the API. We choose the
      name "gl_LocalGroupSizeARB" here, which seems to fit reasonably well
      with existing inputs such as "gl_LocalInvocationID".

      If we had provided this functionality in the original compute shader
      extension, could we have provided only "gl_LocalGroupSizeARB"? Even
      so, the constant "gl_WorkGroupSize" would still be useful for sizing
      arrays in shaders with a fixed workgroup size. For example, a shader
      might want to declare a shared variable with one instance per
      workgroup invocation, such as:

        shared float shared_values[gl_WorkGroupSize.x * gl_WorkGroupSize.y *
                                   gl_WorkGroupSize.z];

      Such declarations would be illegal using the input
      "gl_LocalGroupSizeARB", because array sizes must be constant
      expressions.

    (4) Do we need to modify the behavior of existing GLSL built-ins for
        compute shaders without an explicit workgroup size?

      RESOLVED: No, not really.

      The constant gl_WorkGroupSize seems like it would be affected by
      omitting an explicit workgroup size. However, it is already an error
      to use gl_WorkGroupSize in a shader before a workgroup size layout
      qualifier is declared, which makes its use illegal in shaders where
      workgroup size layout qualifiers are not declared at all.

      We do need to make minor modifications to the language describing other
      built-in inputs such as gl_LocalInvocationIndex, which are currently
      defined as a function of the constant gl_WorkGroupSize. We modify these
      definitions to use the input gl_LocalGroupSizeARB instead.

    (5) Should we provide a function (e.g.,
        DispatchComputeIndirectGroupSizeARB) that takes both a workgroup
        count and a workgroup size from indirect dispatch buffers?
        If so, what do we do if the workgroup size is not positive or exceeds
        implementation-dependent limits?

      RESOLVED: No, let's leave this out of this extension.

    (6) Is it necessary for compute shaders to include a "#extension"
        directive to enable this extension in order to link successfully
        without a fixed workgroup size?

      RESOLVED: Yes, compute shaders will have to use the
      "local_size_variable" layout qualifier to declare a variable workgroup
      size, and an "#extension" directive is required to be able to use that
      layout qualifier.

      In unextended OpenGL 4.3, we get a link error if no shaders in the
      program exercise an existing language feature (declaring the fixed
      workgroup size). We could have simply removed this error, but the
      general rule for "#extension" is that a user should be able to
      determine whether a shader is legal simply by examining the source
      code.

      Note that it is necessary to use "#extension" to use the new built-in
      input (gl_LocalGroupSizeARB) provided by this extension.

    (7) Do we need different implementation-dependent limits for variable
        group sizes?

      RESOLVED: Yes, some implementations of this extension may require lower
      limits for variable local group sizes. We add new tokens
      MAX_COMPUTE_VARIABLE_GROUP_SIZE_ARB and
      MAX_COMPUTE_VARIABLE_GROUP_INVOCATIONS_ARB to query these limits.
      Implementations must support variable group dimensions of 512/512/64,
      with at least 512 invocations per group. The minimum limits for fixed
      group sizes in unextended OpenGL 4.3 are 1024/1024/64 with at least 1024
      invocations per group.

    (8) Do we need an explicit query to determine if a program with a compute
        shader has a fixed or variable local group size?

      RESOLVED: No.
      The existing COMPUTE_WORK_GROUP_SIZE query will return zero
      in each dimension when using a shader with a variable local group
      size, and will always return non-zero values for shaders with a fixed
      group size.

Revision History

    Revision 9, December 10, 2018 (Jon Leech)
      - Use 'workgroup' consistently throughout (Bug 11723, internal API
        issue 87).

    Revision 8, May 30, 2013 (pbrown)
      - Fix a typo in the MAX_COMPUTE_VARIABLE_GROUP_SIZE_ARB description;
        that limit applies only to shaders with variable group sizes.

    Revision 7, May 30, 2013 (pbrown)
      - Mark issue (8) as resolved.

    Revision 6, May 12, 2013 (JohnK)
      - Editorial things:
        - be more consistent/broader with "fixed local group size" language
          (vs. variable), and related, also bringing in another paragraph
          from the core spec.
        - move spec. more toward using bold layout qualifier ids everywhere
        - few minor typos, other tiny changes

    Revision 5, May 8, 2013
      - Assign enum values for new tokens.
      - Add interaction with NV_compute_program5 assembly programs.

    Revision 4, May 7, 2013
      - Add new implementation limits MAX_COMPUTE_VARIABLE_GROUP_SIZE_ARB and
        MAX_COMPUTE_VARIABLE_GROUP_INVOCATIONS_ARB for compute shaders with
        variable group sizes, with minimum values of 512/512/64 and 512,
        respectively.
      - Add new tokens MAX_COMPUTE_FIXED_GROUP_SIZE_ARB and
        MAX_COMPUTE_FIXED_GROUP_INVOCATIONS_ARB for compute shaders with fixed
        group sizes, which are aliased to existing OpenGL 4.3 tokens
        (MAX_COMPUTE_WORK_GROUP_SIZE and MAX_COMPUTE_WORK_GROUP_INVOCATIONS).

    Revision 3, May 4, 2013
      - Add ARB suffixes for the new entry point (DispatchComputeGroupSizeARB)
        and GLSL built-in variable (gl_LocalGroupSizeARB).
      - Add a missing INVALID_OPERATION error to DispatchComputeIndirect,
        generated when the active compute shader has a variable local group
        size.
      - Add new issue (8) about querying if a program with a compute shader
        has a fixed or variable group size.

    Revision 2, May 3, 2013
      - Modify the spec to accept an explicit layout qualifier
        /local_size_variable/ to specify a compute shader with a variable
        local group size instead of inferring it from the lack of fixed-size
        layout qualifiers.
      - Modify some spec language to refer to the existing and new types of
        compute shaders as having a fixed and variable local group size,
        respectively.
      - Mark various issues as resolved based on working group discussions.
      - Add new issue (7) about different implementation-dependent size limits
        for compute shaders with variable-size workgroups.

    Revision 1, January 20, 2013
      - Initial revision.