Name

    NV_compute_program5

Name Strings

    GL_NV_compute_program5

Contact

    Pat Brown, NVIDIA Corporation (pbrown 'at' nvidia.com)

Status

    Complete

Version

    Last Modified Date:         10/23/2012
    NVIDIA Revision:            2

Number

    421

Dependencies

    OpenGL 4.0 (Core or Compatibility Profile) is required.

    This extension is written against the OpenGL 4.2 Specification
    (Compatibility Profile).

    NV_gpu_program4 and NV_gpu_program5 are required.

    ARB_compute_shader is required.

    This specification interacts with NV_shader_atomic_float.

    This specification interacts with EXT_shader_image_load_store.

Overview

    This extension builds on the ARB_compute_shader extension to provide new
    assembly compute program capability for OpenGL. ARB_compute_shader adds
    the basic functionality, including the ability to dispatch compute work.
    This extension provides the ability to write a compute program in
    assembly, using the same basic syntax and capability set found in the
    NV_gpu_program4 and NV_gpu_program5 extensions.

New Procedures and Functions

    None.

New Tokens

    Accepted by the <cap> parameter of Disable, Enable, and IsEnabled,
    by the <pname> parameter of GetBooleanv, GetIntegerv, GetFloatv,
    and GetDoublev, and by the <target> parameter of ProgramStringARB,
    BindProgramARB, ProgramEnvParameter4[df][v]ARB,
    ProgramLocalParameter4[df][v]ARB, GetProgramEnvParameter[df]vARB,
    GetProgramLocalParameter[df]vARB, GetProgramivARB, and
    GetProgramStringARB:

        COMPUTE_PROGRAM_NV                              0x90FB

    Accepted by the <target> parameter of ProgramBufferParametersfvNV,
    ProgramBufferParametersIivNV, ProgramBufferParametersIuivNV,
    BindBufferRangeNV, BindBufferOffsetNV, BindBufferBaseNV, and BindBuffer,
    and by the <value> parameter of GetIntegerIndexedvEXT:

        COMPUTE_PROGRAM_PARAMETER_BUFFER_NV             0x90FC

    (Note: Various enumerants from ARB_compute_shader will also be used by
    this extension.)

Additions to Chapter 2 of the OpenGL 4.2 (Compatibility Profile) Specification
(OpenGL Operation)

    Modify Section 2.X, GPU Programs, of NV_gpu_program4 (as modified by
    NV_gpu_program5)

    (insert after second paragraph)

    Compute Programs

    Compute programs are used to perform general purpose computations using a
    three-dimensional array of program invocations (threads). The program
    invocations are arranged into work groups, each of which comprises a
    fixed-size, three-dimensional array of program invocations whose size is
    specified by the mandatory GROUP_SIZE declaration. One or more work
    groups are scheduled for execution using the DispatchCompute or
    DispatchComputeIndirect commands.

    Each work group scheduled for execution will launch a separate program
    invocation for each work group member. While the program invocations in a
    work group are launched together, they run independently after launch.
    The BAR (barrier) instruction is available to synchronize program
    invocations; an invocation stops at each BAR instruction until all
    invocations in the work group have executed the BAR instruction. Each
    work group has an optional shared memory allocation (specified by the
    SHARED_MEMORY declaration) that can be read or written by any invocation
    of the work group.

    Unlike other program types, compute program invocations have no inputs or
    outputs interfacing with the rest of the pipeline. Compute programs may
    obtain inputs using mechanisms such as global loads, image loads, atomic
    counter reads, shader storage buffer reads, and program parameters.
    Built-in inputs are also provided to allow a compute program invocation to
    determine its position in the work group, the position of its work group
    in the full dispatch, as well as the work group and full dispatch sizes.
    Compute program results are expected to be written to globally accessible
    memory using mechanisms such as global stores, image stores, atomic
    counters, and shader storage buffers.


    Modify Section 2.X.2, Program Grammar

    (replace third paragraph)

    Compute programs are required to begin with the header string "!!NVcp5.0".
    This header string identifies the subsequent program body as being a
    compute program and indicates that it should be parsed according to the
    base NV_gpu_program5 grammar plus the additions below. Program string
    parsing begins with the character immediately following the header string.
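
    As an informal illustration of the header rule above, the following C
    sketch locates the point at which parsing of the program body would
    begin. The helper name compute_program_body() is hypothetical and is
    not part of this extension or of any GL API; only the header string
    "!!NVcp5.0" comes from the text above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Header string that compute programs are required to begin with. */
static const char NVCP5_HEADER[] = "!!NVcp5.0";

/*
 * Hypothetical helper (an assumption for illustration only): returns a
 * pointer to the character immediately following the header string,
 * or NULL if the program string does not begin with the header.
 */
const char *compute_program_body(const char *program)
{
    size_t header_len = sizeof(NVCP5_HEADER) - 1; /* exclude trailing NUL */

    assert(program != NULL);
    if (strncmp(program, NVCP5_HEADER, header_len) != 0)
        return NULL;
    return program + header_len;
}
```

    A string beginning with a different header, such as a fragment program
    header, yields NULL in this sketch; a real implementation would report
    a program load failure through the usual program error mechanisms.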

    (add the following grammar rules to the NV_gpu_program5 base grammar for
    compute programs)

    <declSequence>          ::= <declaration> <declSequence>

    <instruction>           ::= <SpecialInstruction>

    <opModifier>            ::= "CTA"

    <namingStatement>       ::= <SHARED_statement>

    <SHARED_statement>      ::= "SHARED" <establishName> <sharedSingleInit>
                              | "SHARED" <establishName> <optArraySize>
                                <sharedMultipleInit>

    <sharedSingleInit>      ::= "=" <sharedUseDS>

    <sharedMultipleInit>    ::= "=" "{" <sharedItemList> "}"

    <sharedItemList>        ::= <sharedUseDM>
                              | <sharedUseDM> "," <sharedItemList>

    <sharedUseV>            ::= <sharedVarName> <optArrayMem>

    <sharedUseDS>           ::= <sharedBaseBinding> <arrayMemAbs>

    <sharedUseDM>           ::= <sharedUseDS>
                              | <sharedBaseBinding> <arrayRange>

    <sharedBaseBinding>     ::= "program" "." "sharedmem"

    <SpecialInstruction>    ::= "BAR"
                              | "ATOMS" <opModifiers> <instResult> ","
                                <instOperandV> "," <sharedUseV>
                              | "LDS" <opModifiers> <instResult> ","
                                <sharedUseV>
                              | "STS" <opModifiers> <instOperandV> ","
                                <sharedUseV>

    <declaration>           ::= "GROUP_SIZE" <int>
                              | "GROUP_SIZE" <int> <int>
                              | "GROUP_SIZE" <int> <int> <int>
                              | "SHARED_MEMORY" <int>

    <attribBasic>           ::= "invocation" "." "localid"
                              | "invocation" "." "globalid"
                              | "invocation" "." "groupid"
                              | "invocation" "." "groupcount"
                              | "invocation" "." "groupsize"
                              | "invocation" "." "localindex"


    (add the following subsection to Section 2.X.3.2, Program Attribute
    Variables)

    Compute program attribute variables describe the attributes of the current
    program invocation. Each DispatchCompute command produces a set of
    program invocations arranged as a one-, two-, or three-dimensional array.
    Figure X.1 illustrates a two-dimensional dispatch with a local work group
    size of 8x4, and a total dispatch of 5x4 local workgroups.
    Each individual program invocation has a one-, two-, or
    three-dimensional global coordinate, which can be further decomposed into
    a work group offset (in fixed-size work groups) and a local offset
    relative to the origin of an invocation's work group.

              +-------+-------+-------+-------+-------+
              |       |       | work  |       |       |
              |       |       | group |       |       |
              |       |       | (2,3) |       |       |
       (0,12) +-------+-------+-------+-------+-------+
              |       |       |       |       |       |
              |       |       |       |       |       |
              |       |  *    |       |       |       |
       (0,8)  +-------+-------+-------+-------+-------+
              |       |       |       |       | work  |
              |       |       |       |       | group |
              |       |       |       |       | (4,1) |
       (0,4)  +-------+-------+-------+-------+-------+
              | work  |       |       |       |       |
              | group |       |       |       |       |
              | (0,0) |       |       |       |       |
              +-------+-------+-------+-------+-------+
            (0,0)   (8,0)   (16,0)  (24,0)  (32,0)

      Figure X.1, Compute Dispatch. The single invocation at the location
      labeled "*" has a location (invocation.globalid) of (10,9). The offset
      relative to its local work group (invocation.localid) is (2,1). Its
      local work group has an offset (invocation.groupid) of (1,2), in units
      of work groups.

    The set of available compute program attribute bindings is enumerated in
    Table X.1. All bindings are considered four-component unsigned integer
    vectors with the value of the fourth component undefined.
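
    As an informal illustration of the decomposition shown in Figure X.1,
    the following C sketch splits a global coordinate into a work group
    offset and a local offset, and recombines them. The type uvec3 and the
    function names are hypothetical and chosen for this example only; the
    sketch assumes that the global coordinate equals the work group offset
    times the work group size plus the local offset, as the figure
    suggests.

```c
#include <assert.h>

/* Hypothetical three-component unsigned vector (illustration only). */
typedef struct { unsigned x, y, z; } uvec3;

/* Work group offset (invocation.groupid) for a global coordinate. */
uvec3 group_id_of(uvec3 global_id, uvec3 group_size)
{
    assert(group_size.x > 0 && group_size.y > 0 && group_size.z > 0);
    uvec3 g = { global_id.x / group_size.x,
                global_id.y / group_size.y,
                global_id.z / group_size.z };
    return g;
}

/* Offset within the work group (invocation.localid). */
uvec3 local_id_of(uvec3 global_id, uvec3 group_size)
{
    assert(group_size.x > 0 && group_size.y > 0 && group_size.z > 0);
    uvec3 l = { global_id.x % group_size.x,
                global_id.y % group_size.y,
                global_id.z % group_size.z };
    return l;
}

/* Recombine: globalid = groupid * groupsize + localid, per component. */
uvec3 global_id_of(uvec3 group_id, uvec3 local_id, uvec3 group_size)
{
    uvec3 g = { group_id.x * group_size.x + local_id.x,
                group_id.y * group_size.y + local_id.y,
                group_id.z * group_size.z + local_id.z };
    return g;
}
```

    With the values from Figure X.1 (global coordinate (10,9) and a work
    group size of 8x4, treating the unused "z" dimension as size one), the
    sketch gives a work group offset of (1,2) and a local offset of (2,1).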

      Attribute Binding          Components  Underlying State
      -------------------------  ----------  ------------------------------
      invocation.localid         (x,y,z,-)   offset relative to base of
                                             work group

      invocation.globalid        (x,y,z,-)   offset relative to the base
                                             of the dispatched work

      invocation.groupid         (x,y,z,-)   offset (in groups) of local work
                                             group

      invocation.groupcount      (x,y,z,-)   total local work group count

      invocation.groupsize       (x,y,z,-)   number of invocations in each
                                             dimension of the local work group

      invocation.localindex      (x,-,-,-)   one-dimensional (flattened) index
                                             in local workgroup

      Table X.1, Compute Program Attribute Bindings.

    If a compute attribute binding matches "invocation.localid", the "x", "y",
    and "z" components of the invocation attribute variable are filled with
    the "x", "y", "z" components, respectively, of the offset of the
    invocation relative to the base of its local workgroup. The "w" component
    of the attribute is undefined.

    If a compute attribute binding matches "invocation.globalid", the "x",
    "y", and "z" components of the invocation attribute variable are filled
    with the "x", "y", "z" components, respectively, of the offset of the
    invocation relative to the full compute dispatch. The "w" component of
    the attribute is undefined.

    If a compute attribute binding matches "invocation.groupid", the "x", "y",
    and "z" components of the invocation attribute variable are filled with
    the "x", "y", "z" components, respectively, of the offset of the local
    work group (in groups) relative to the full compute dispatch. The "w"
    component of the attribute is undefined.

    If a compute attribute binding matches "invocation.groupcount", the "x",
    "y", and "z" components of the invocation attribute variable are filled
    with the "x", "y", and "z" dimensions, respectively, in local work groups
    of the full compute dispatch.
    The "w" component of the attribute is undefined.

    If a compute attribute binding matches "invocation.groupsize", the "x",
    "y", and "z" components of the invocation attribute variable are filled
    with the "x", "y", and "z" dimensions, respectively, of the local work
    group, as specified by the GROUP_SIZE declaration. The "w" component of
    the attribute is undefined.

    If a compute attribute binding matches "invocation.localindex", the "x"
    component of the invocation attribute variable is filled with a flattened
    one-dimensional index of the invocation, which is derived as:

      invocation.localid.z * invocation.groupsize.x * invocation.groupsize.y +
      invocation.localid.y * invocation.groupsize.x +
      invocation.localid.x

    The "y", "z", and "w" components of the attribute are undefined.

    For one-dimensional dispatches, the "y" components of
    "invocation.localid", "invocation.globalid", and "invocation.groupid" will
    be zero. For one- and two-dimensional dispatches, the "z" components of
    "invocation.localid", "invocation.globalid", and "invocation.groupid" will
    be zero. The same components of "invocation.groupcount" and
    "invocation.groupsize" will be one in these cases.


    (add the following subsection to section 2.X.3.5, Program Results.)

    Compute programs have no result variables; all shader results must be
    written to memory.


    Add New Section 2.X.3.Y, Compute Program Shared Memory, after Section
    2.X.3.6, Program Parameter Buffers

    Compute program shared memory variables are arrays of basic machine units
    that can be read and written using the LDS and STS instructions.
    Compute program shared memory also supports atomic memory operations using
    the ATOMS instruction. The GL allocates a single block of shared memory
    for each local work group, whose size in basic machine units is specified
    by the "SHARED_MEMORY" statement.
    The contents of compute program shared memory are undefined when program
    execution for the local work group begins and can be changed only by
    using the ATOMS or STS instructions.
    Compute program shared memory variables are shared between all invocations
    of a local work group. Writes performed by one invocation will be visible
    for any reads of the same memory from any other invocation executed after
    the write. Note that the order of reads and writes between different
    invocations in a local work group is largely undefined, although the BAR
    instruction can be used to introduce synchronization points for all
    invocations in a local work group.

    Shared memory variables may only be used as operands in the ATOMS, LDS,
    and STS instructions; they may not be used as results or operands
    in general instructions. Shared memory variables must be declared
    explicitly via the <SHARED_statement> grammar rule. Shared memory
    bindings cannot be used directly in executable instructions.

    Shader storage buffer variables may be declared as arrays, but all
    bindings assigned to the array must use the same binding point(s) and must
    increase consecutively.

      Binding                        Components  Underlying State
      -----------------------------  ----------  -----------------------------
      program.sharedmem[a]           (x,x,x,x)   compute shared memory,
                                                 element a
      program.sharedmem[a..b]        (x,x,x,x)   compute shared memory,
                                                 elements a through b
      program.sharedmem              (x,x,x,x)   compute shared memory,
                                                 all elements

      Table X.3: Shared Memory Bindings. <a> and <b> indicate individual
      elements of shared memory.

    If a shared memory binding matches "program.sharedmem[a]", the shared
    memory variable is associated with basic machine element <a> of compute
    shared memory.

    For shared memory declarations, "program.sharedmem[a..b]" is equivalent to
    specifying elements <a> through <b> of compute shared memory in order.

    For shared memory declarations, "program.sharedmem" is equivalent to
    specifying elements zero through <N>-1 of compute shared memory in order,
    where <N> is the total shared memory size declared by the "SHARED_MEMORY"
    statement.


    Modify Section 2.X.4, Program Execution Environment

    (add to the opcode table)

                  Modifiers
      Instruction F I C S H D  Out Inputs    Description
      ----------- - - - - - -  --- --------  --------------------------------
      ATOMS       - - X - - -  s   v,su      atomic transaction to shared mem
      BAR         - - - - - -  -   -         work group execution barrier
      LDS         - - X X - F  v   su        load from shared memory
      STS         - - - - - -  -   v,su      store to shared memory


    Modify Section 2.X.4.1, Program Instruction Modifiers

      Modifier  Description
      --------  -----------------------------------------------
      CTA       Memory barrier orders only memory transactions
                relative to invocations within local work group

    (add to descriptions of opcode modifiers)

    For the MEMBAR (memory barrier) instruction, the "CTA" modifier specifies
    that memory transactions before and after the barrier are strongly ordered
    as observed by any other shader invocation in the local work group.


    Modify Section 2.X.4.5, Program Memory Access, from NV_gpu_program5

    (add to the end of the first paragraph) ... Additionally programs may load
    from or store to shared memory via the ATOMS (atomic shared memory
    operation), LDS (load from shared memory), and STS (store to shared
    memory) instructions.

    (modify miscellaneous other language referring to "buffer object memory"
    to instead refer to "buffer object and shared memory")

    (add hypothetical built-in functions SharedMemoryLoad() and
    SharedMemoryStore() that behave similarly to BufferMemoryLoad() and
    BufferMemoryStore(), except that they access local work group shared
    memory instead of buffer object memory)


    Add the following subsection to section 2.X.7, Program Declarations

    Section 2.X.7.Y, Compute Program Declarations

    Compute programs support two types of declaration statement, as described
    below.

    - Shader Thread Group Size (GROUP_SIZE)

    The GROUP_SIZE statement declares the number of shader threads in a one-,
    two-, or three-dimensional local work group. The statement must have one
    to three unsigned integer arguments. Each argument must be less than or
    equal to the value of the implementation-dependent limit
    MAX_COMPUTE_LOCAL_WORK_SIZE for its corresponding dimension (X, Y, or Z).
    A program will fail to load unless it contains exactly one GROUP_SIZE
    declaration.


    - Shared Memory Storage Size (SHARED_MEMORY)

    The SHARED_MEMORY statement declares the size of the shared memory, in
    basic machine units, available to the threads of each local work group.
    The SHARED_MEMORY statement is optional, but a program will fail to load
    if it includes multiple SHARED_MEMORY declarations, if it uses the
    ATOMS, LDS, or STS instructions without a SHARED_MEMORY declaration, if
    it uses these instructions with an offset that would access memory
    beyond the declared shared memory size, or if the declared shared
    memory size is greater than the implementation-dependent limit
    MAX_COMPUTE_SHARED_VARIABLE_SIZE.


    (add the following subsection to section 2.X.8, Program Instruction Set.)

    Section 2.X.8.Z, ATOMS: Atomic Memory Operation (Shared Memory)

    The ATOMS instruction performs an atomic memory operation by reading from
    shared memory specified by the second unsigned integer scalar operand,
    computing a new value based on the value read from memory and the first
    (vector) operand, and then writing the result back to the same memory
    address. The memory transaction is atomic, guaranteeing that no other
    write to the memory accessed will occur between the time it is read and
    written by the ATOMS instruction. The result of the ATOMS instruction is
    the scalar value read from memory. The second operand used for the ATOMS
    instruction must correspond to a shared memory variable declared using the
    "SHARED" statement; a program will fail to load if any other type of
    operand is used for the second operand of an ATOMS instruction.

    The ATOMS instruction has two required instruction modifiers. The atomic
    modifier specifies the type of operation to be performed. The storage
    modifier specifies the size and data type of the operand read from memory
    and the base data type of the operation used to compute the value to be
    written to memory.

      atomic     storage
      modifier   modifiers            operation
      --------   ------------------   --------------------------------------
       ADD       U32, S32, U64, F32   compute a sum
       MIN       U32, S32             compute minimum
       MAX       U32, S32             compute maximum
       IWRAP     U32                  increment memory, wrapping at operand
       DWRAP     U32                  decrement memory, wrapping at operand
       AND       U32, S32             compute bit-wise AND
       OR        U32, S32             compute bit-wise OR
       XOR       U32, S32             compute bit-wise XOR
       EXCH      U32, S32, U64, F32   exchange memory with operand
       CSWAP     U32, S32, U64        compare-and-swap

      Table X.Y, Supported atomic and storage modifiers for the ATOMS
      instruction.
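
    As an informal illustration of the IWRAP and DWRAP rows of the table
    above, the following C sketch models the two wrapping operations for
    the U32 storage modifier, following the wrapping rules in the ATOMS
    pseudocode of this section. The function names are hypothetical, and
    the model is single-threaded: it shows only the value written and the
    value returned, not the atomicity that the ATOMS instruction provides.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Single-threaded model of ATOMS.IWRAP.U32 (illustration only).
 * Increments *mem, wrapping to zero once the stored value has reached
 * <operand>.  Returns the value read, as ATOMS does.
 */
uint32_t atoms_iwrap_u32(uint32_t *mem, uint32_t operand)
{
    assert(mem != NULL);
    uint32_t result = *mem;
    *mem = (result >= operand) ? 0u : result + 1u;
    return result;
}

/*
 * Single-threaded model of ATOMS.DWRAP.U32 (illustration only).
 * Decrements *mem, wrapping to <operand> when the stored value is zero
 * or already above <operand>.  Returns the value read.
 */
uint32_t atoms_dwrap_u32(uint32_t *mem, uint32_t operand)
{
    assert(mem != NULL);
    uint32_t result = *mem;
    *mem = (result == 0u || result > operand) ? operand : result - 1u;
    return result;
}
```

    With an operand of 3, repeated IWRAP operations in this model cycle the
    stored value through 0, 1, 2, 3, 0, and repeated DWRAP operations cycle
    it through 3, 2, 1, 0, 3.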

    Not all storage modifiers are supported by ATOMS, and the set of modifiers
    allowed for any given instruction depends on the atomic modifier
    specified. Table X.Y enumerates the set of atomic modifiers supported by
    the ATOMS instruction, and the storage modifiers allowed for each.

      tmp0 = VectorLoad(op0);
      result = SharedMemoryLoad(op1, storageModifier);
      switch (atomicModifier) {
      case ADD:
        writeval = tmp0.x + result;
        break;
      case MIN:
        writeval = min(tmp0.x, result);
        break;
      case MAX:
        writeval = max(tmp0.x, result);
        break;
      case IWRAP:
        writeval = (result >= tmp0.x) ? 0 : result+1;
        break;
      case DWRAP:
        writeval = (result == 0 || result > tmp0.x) ? tmp0.x : result-1;
        break;
      case AND:
        writeval = tmp0.x & result;
        break;
      case OR:
        writeval = tmp0.x | result;
        break;
      case XOR:
        writeval = tmp0.x ^ result;
        break;
      case EXCH:
        writeval = tmp0.x;  // store the operand; the old value is returned
        break;
      case CSWAP:
        if (result == tmp0.x) {
          writeval = tmp0.y;
        } else {
          return result;  // no memory store
        }
        break;
      }
      SharedMemoryStore(op1, writeval, storageModifier);

    ATOMS performs a scalar atomic operation. The <y>, <z>, and <w>
    components of the result vector are undefined.

    ATOMS supports no base data type modifiers, but requires exactly one
    storage modifier. The base data types of the result vector and the first
    (vector) operand are derived from the storage modifier. The second
    operand is always interpreted as a scalar unsigned integer.


    Section 2.X.8.Z, BAR: Execution Barrier

    The BAR instruction synchronizes the execution of compute shader
    invocations within a local work group. When a compute shader invocation
    executes the BAR instruction, it pauses until the same BAR instruction has
    been executed by all invocations in the current local work group.
    Once all invocations have executed the BAR instruction, processing
    continues with the instruction following the BAR instruction.

    There is no compile-time restriction on the locations in a program where
    BAR is allowed. However, BAR instructions are not allowed in divergent
    flow control; if any compute shader invocation in the work group executes
    the BAR instruction, all compute shader invocations must execute the
    instruction. Results of executing a BAR instruction are undefined and can
    result in application hangs and/or program termination if the instruction
    is issued:

      * inside any IF/ELSE/ENDIF block where the results of the condition
        evaluated by the IF instruction are not identical across the work
        group;

      * inside any iteration of a REP/ENDREP block where at least one
        invocation in the work group has skipped to the next iteration using
        the CONT instruction, exited the loop using a BRK or RET instruction,
        or exited the loop due to having completed the requested number of
        loop iterations; or

      * inside any subroutine (including main) where at least one invocation
        in the work group has exited the subroutine using the RET instruction.

    BAR has no operands and generates no result.


    Section 2.X.8.Z, LDS: Load from Shared Memory

    The LDS instruction generates a result vector by fetching data from the
    shared memory for the current local work group identified by the first
    operand, as described in Section 2.X.4.5. The single operand for the LDS
    instruction must correspond to a shader shared memory variable declared
    using the "SHARED" statement; a program will fail to load if any other
    type of operand is used in an LDS instruction.

      result = SharedMemoryLoad(op0, storageModifier);

    LDS supports no base data type modifiers, but requires exactly one storage
    modifier.
    The base data type of the result vector is derived from the storage
    modifier.


    Replace Section 2.X.8.Z, MEMBAR: Memory Barrier, as added by
    EXT_shader_image_load_store

    The MEMBAR instruction synchronizes memory transactions to ensure that
    memory transactions resulting from any instruction executed by the thread
    prior to the MEMBAR instruction complete prior to any memory transactions
    issued after the instruction, as observed by other shader invocations.

    The MEMBAR instruction has one optional instruction modifier. If the CTA
    instruction modifier is specified, memory transactions before and after
    the barrier will be strongly ordered as observed by other shader
    invocations in the same local work group. However, it does not order
    transactions as viewed by any other shader. With the CTA modifier,
    shaders not in the local work group may observe the results of memory
    transactions issued after the MEMBAR instruction before those issued
    before the MEMBAR instruction. If the CTA instruction modifier is not
    specified, all shader invocations will see the results of any memory
    transaction issued before the MEMBAR instruction before those issued after
    the MEMBAR instruction.

    MEMBAR has no operands and generates no result.


    Section 2.X.8.Z, STS: Store to Shared Memory

    The STS instruction writes the contents of the first vector operand to
    shared memory for the current local work group identified by the second
    operand, as described in Section 2.X.4.5. This instruction generates no
    result. The second operand for the STS instruction must correspond to a
    shared memory variable declared using the "SHARED" statement; a program
    will fail to load if any other type of operand is used in an STS
    instruction.

      tmp0 = VectorLoad(op0);
      SharedMemoryStore(op1, tmp0, storageModifier);

    STS supports no base data type modifiers, but requires exactly one storage
    modifier. The base data type of the vector components of the first
    operand is derived from the storage modifier.


Additions to Chapter 3 of the OpenGL 4.2 (Compatibility Profile) Specification
(Rasterization)

    None.

Additions to Chapter 4 of the OpenGL 4.2 (Compatibility Profile) Specification
(Per-Fragment Operations and the Frame Buffer)

    None.

Additions to Chapter 5 of the OpenGL 4.2 (Compatibility Profile) Specification
(Special Functions)

    None.

Additions to Chapter 6 of the OpenGL 4.2 (Compatibility Profile) Specification
(State and State Requests)

    None.

Additions to the AGL/GLX/WGL Specifications

    None.

GLX Protocol

    None.

Dependencies on NV_shader_atomic_float

    If NV_shader_atomic_float is not supported, the ADD and EXCH atomic
    operations in the ATOMS instruction do not support the "F32" storage
    modifier.

Dependencies on EXT_shader_image_load_store

    If EXT_shader_image_load_store is not supported, language describing the
    "CTA" instruction modifier and modifying the MEMBAR instruction (as added
    by EXT_shader_image_load_store) should be removed.

Errors

    None.

New State

    (Modify ARB_vertex_program, Table X.6 -- Program State)

                                                     Initial
    Get Value                   Type    Get Command  Value    Description               Sec.    Attribute
    --------------------------  ------- -----------  -------  ------------------------  ------  ---------
    COMPUTE_PROGRAM_PARAMETER_  Z+      GetIntegerv  0        Active compute program    2.14.1  -
      BUFFER_NV                                               buffer object binding
    COMPUTE_PROGRAM_PARAMETER_  nxZ+    GetInteger-  0        Buffer objects bound for  2.14.1  -
      BUFFER_NV                         IndexedvEXT           compute program use

    Also shares buffer bindings and other state with the ARB_compute_shader
    extension.

New Implementation Dependent State

    None, but shares implementation-dependent state with the
    ARB_compute_shader extension.

Issues

    None.

Revision History

    Rev.    Date    Author    Changes
    ----  --------  --------  --------------------------------------------
     2    10/23/12  pbrown    Remove the restriction forbidding the use of BAR
                              inside potentially divergent flow control.
                              Instead, we will allow BAR to be executed
                              anywhere, but specify undefined results
                              (including hangs or program termination) if the
                              flow control is divergent (bug 9367).

     1              pbrown    Internal spec development.