Name

    NV_compute_program5

Name Strings

    GL_NV_compute_program5

Contact

    Pat Brown, NVIDIA Corporation (pbrown 'at' nvidia.com)

Status

    Complete

Version

    Last Modified Date:         10/23/2012
    NVIDIA Revision:            2

Number

    421

Dependencies

    OpenGL 4.0 (Core or Compatibility Profile) is required.

    This extension is written against the OpenGL 4.2 Specification
    (Compatibility Profile).

    NV_gpu_program4 and NV_gpu_program5 are required.

    ARB_compute_shader is required.

    This specification interacts with NV_shader_atomic_float.

    This specification interacts with EXT_shader_image_load_store.

Overview

    This extension builds on the ARB_compute_shader extension to provide new
    assembly compute program capability for OpenGL.  ARB_compute_shader adds
    the basic functionality, including the ability to dispatch compute work.
    This extension provides the ability to write a compute program in
    assembly, using the same basic syntax and capability set found in the
    NV_gpu_program4 and NV_gpu_program5 extensions.

New Procedures and Functions

    None.

New Tokens

    Accepted by the <cap> parameter of Disable, Enable, and IsEnabled,
    by the <pname> parameter of GetBooleanv, GetIntegerv, GetFloatv,
    and GetDoublev, and by the <target> parameter of ProgramStringARB,
    BindProgramARB, ProgramEnvParameter4[df][v]ARB,
    ProgramLocalParameter4[df][v]ARB, GetProgramEnvParameter[df]vARB,
    GetProgramLocalParameter[df]vARB, GetProgramivARB, and
    GetProgramStringARB:

        COMPUTE_PROGRAM_NV                              0x90FB

    Accepted by the <target> parameter of ProgramBufferParametersfvNV,
    ProgramBufferParametersIivNV, ProgramBufferParametersIuivNV,
    BindBufferRangeNV, BindBufferOffsetNV, BindBufferBaseNV, and BindBuffer,
    and by the <value> parameter of GetIntegerIndexedvEXT:

        COMPUTE_PROGRAM_PARAMETER_BUFFER_NV             0x90FC

    (Note:  Various enumerants from ARB_compute_shader will also be used by
     this extension.)

Additions to Chapter 2 of the OpenGL 4.2 (Compatibility Profile) Specification
(OpenGL Operation)

    Modify Section 2.X, GPU Programs, of NV_gpu_program4 (as modified by
    NV_gpu_program5)

    (insert after second paragraph)

    Compute Programs

    Compute programs are used to perform general purpose computations using a
    three-dimensional array of program invocations (threads).  The compute
    shader invocations are arranged into work groups specified by the
    mandatory GROUP_SIZE declaration, each of which comprises a fixed-size,
    three-dimensional array of program invocations.  One or more work groups
    are scheduled for execution using the DispatchCompute or
    DispatchComputeIndirect commands.

    Each work group scheduled for execution will launch a separate program
    invocation for each work group member.  While the program invocations in a
    work group are launched together, they run independently after launch.
    The BAR (barrier) instruction is available to synchronize program
    invocations; an invocation stops at each BAR instruction until all
    invocations in the work group have executed the BAR instruction.  Each
    work group has an optional shared memory allocation (specified by the
    SHARED_MEMORY declaration) that can be read or written by any invocations
    of the work group.

    Unlike other program types, compute program invocations have no inputs or
    outputs interfacing with the rest of the pipeline.  Compute programs may
    obtain inputs using mechanisms such as global loads, image loads, atomic
    counter reads, shader storage buffer reads, and program parameters.
    Built-in inputs are also provided to allow a compute shader invocation to
    determine its position in the work group, the position of its work group
    in the full dispatch, as well as the work group and full dispatch sizes.
    Compute program results are expected to be written to globally accessible
    memory using mechanisms such as global stores, image stores, atomic
    counters, and shader storage buffers.


    Modify Section 2.X.2, Program Grammar

    (replace third paragraph)

    Compute programs are required to begin with the header string "!!NVcp5.0".
    This header string identifies the subsequent program body as being a
    compute program and indicates that it should be parsed according to the
    base NV_gpu_program5 grammar plus the additions below.  Program string
    parsing begins with the character immediately following the header string.
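
    As an illustration only (not part of the extension language), the header
    rule above might be checked on the application side as sketched below.
    The helper name compute_program_body is hypothetical; only the header
    string itself comes from this specification.

```c
#include <stddef.h>
#include <string.h>

/* Header string required at the start of every compute program. */
static const char NVCP5_HEADER[] = "!!NVcp5.0";

/*
 * Hypothetical helper: returns a pointer to the first character after the
 * header (where program string parsing begins), or NULL if the string does
 * not start with the compute program header.
 */
const char *compute_program_body(const char *program)
{
    size_t header_len = sizeof(NVCP5_HEADER) - 1;  /* exclude the NUL */

    if (program == NULL || strncmp(program, NVCP5_HEADER, header_len) != 0)
        return NULL;
    return program + header_len;
}
```

    A string beginning with a different header is rejected by this sketch
    before any further parsing would take place.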

    (add the following grammar rules to the NV_gpu_program5 base grammar for
     compute programs)

    <declSequence>          ::= <declaration> <declSequence>

    <instruction>           ::= <SpecialInstruction>

    <opModifier>            ::= "CTA"

    <namingStatement>       ::= <SHARED_statement>

    <SHARED_statement>      ::= "SHARED" <establishName> <sharedSingleInit>
                              | "SHARED" <establishName> <optArraySize>
                                <sharedMultipleInit>

    <sharedSingleInit>      ::= "=" <sharedUseDS>

    <sharedMultipleInit>    ::= "=" "{" <sharedItemList> "}"

    <sharedItemList>        ::= <sharedUseDM>
                              | <sharedUseDM> "," <sharedItemList>

    <sharedUseV>            ::= <sharedVarName> <optArrayMem>

    <sharedUseDS>           ::= <sharedBaseBinding> <arrayMemAbs>

    <sharedUseDM>           ::= <sharedUseDS>
                              | <sharedBaseBinding> <arrayRange>

    <sharedBaseBinding>     ::= "program" "." "sharedmem"

    <SpecialInstruction>    ::= "BAR"
                              | "ATOMS" <opModifiers> <instResult> ","
                                <instOperandV> "," <sharedUseV>
                              | "LDS" <opModifiers> <instResult> ","
                                <sharedUseV>
                              | "STS" <opModifiers> <instOperandV> ","
                                <sharedUseV>

    <declaration>           ::= "GROUP_SIZE" <int>
                              | "GROUP_SIZE" <int> <int>
                              | "GROUP_SIZE" <int> <int> <int>
                              | "SHARED_MEMORY" <int>

    <attribBasic>           ::= "invocation" "." "localid"
                              | "invocation" "." "globalid"
                              | "invocation" "." "groupid"
                              | "invocation" "." "groupcount"
                              | "invocation" "." "groupsize"
                              | "invocation" "." "localindex"


    (add the following subsection to Section 2.X.3.2, Program Attribute
     Variables)

    Compute program attribute variables describe the attributes of the current
    program invocation.  Each DispatchCompute command produces a set of
    program invocations arranged as a one-, two-, or three-dimensional array.
    Figure X.1 illustrates a two-dimensional dispatch with a local work group
    size of 8x4, and a total dispatch of 5x4 local work groups.  Each
    individual program invocation has a one-, two-, or three-dimensional
    global coordinate, which can be further decomposed into a work group
    offset (in fixed-size work groups) and a local offset relative to the
    origin of the invocation's work group.

                +-------+-------+-------+-------+-------+
                |       |       | work  |       |       |
                |       |       | group |       |       |
                |       |       | (2,3) |       |       |
         (0,12) +-------+-------+-------+-------+-------+
                |       |       |       |       |       |
                |       |       |       |       |       |
                |       | *     |       |       |       |
          (0,8) +-------+-------+-------+-------+-------+
                |       |       |       |       | work  |
                |       |       |       |       | group |
                |       |       |       |       | (4,1) |
          (0,4) +-------+-------+-------+-------+-------+
                | work  |       |       |       |       |
                | group |       |       |       |       |
                | (0,0) |       |       |       |       |
                +-------+-------+-------+-------+-------+
              (0,0)   (8,0)   (16,0)  (24,0)  (32,0)

      Figure X.1, Compute Dispatch.  The single invocation at the location
      labeled "*" has a location (invocation.globalid) of (10,9).  The offset
      relative to its local work group (invocation.localid) is (2,1).  Its
      local work group has an offset (invocation.groupid) of (1,2), in units
      of work groups.
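
    The decomposition shown in Figure X.1 can be reproduced with ordinary
    integer arithmetic.  The following C sketch is illustrative only; the
    type uvec3 and the function names are hypothetical, and the relation
    globalid = groupid * groupsize + localid is inferred from the figure
    rather than quoted from normative text.

```c
#include <assert.h>

/* Hypothetical three-component unsigned vector (x, y, z). */
typedef struct { unsigned x, y, z; } uvec3;

/* Work group offset, in units of work groups. */
uvec3 group_id(uvec3 globalid, uvec3 groupsize)
{
    assert(groupsize.x > 0 && groupsize.y > 0 && groupsize.z > 0);
    uvec3 g = { globalid.x / groupsize.x,
                globalid.y / groupsize.y,
                globalid.z / groupsize.z };
    return g;
}

/* Offset of the invocation relative to the base of its work group. */
uvec3 local_id(uvec3 globalid, uvec3 groupsize)
{
    assert(groupsize.x > 0 && groupsize.y > 0 && groupsize.z > 0);
    uvec3 l = { globalid.x % groupsize.x,
                globalid.y % groupsize.y,
                globalid.z % groupsize.z };
    return l;
}

/* Recombine a work group offset and a local offset. */
uvec3 global_id(uvec3 groupid, uvec3 localid, uvec3 groupsize)
{
    uvec3 g = { groupid.x * groupsize.x + localid.x,
                groupid.y * groupsize.y + localid.y,
                groupid.z * groupsize.z + localid.z };
    return g;
}
```

    With a group size of (8,4,1), the global coordinate (10,9,0) from the
    figure yields a group offset of (1,2,0) and a local offset of (2,1,0).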

    The set of available compute program attribute bindings is enumerated in
    Table X.1.  All bindings are considered four-component unsigned integer
    vectors with the value of the fourth component undefined.

      Attribute Binding          Components  Underlying State
      -------------------------  ----------  ------------------------------
      invocation.localid         (x,y,z,-)   offset relative to base of
                                             work group

      invocation.globalid        (x,y,z,-)   offset relative to the base
                                             of the dispatched work

      invocation.groupid         (x,y,z,-)   offset (in groups) of local work
                                             group

      invocation.groupcount      (x,y,z,-)   total local work group count

      invocation.groupsize       (x,y,z,-)   number of invocations in each
                                             dimension of the local work group

      invocation.localindex      (x,-,-,-)   one-dimensional (flattened) index
                                             in local work group

      Table X.1, Compute Program Attribute Bindings.

    If a compute attribute binding matches "invocation.localid", the "x", "y",
    and "z" components of the invocation attribute variable are filled with
    the "x", "y", "z" components, respectively, of the offset of the
    invocation relative to the base of its local work group.  The "w"
    component of the attribute is undefined.

    If a compute attribute binding matches "invocation.globalid", the "x",
    "y", and "z" components of the invocation attribute variable are filled
    with the "x", "y", "z" components, respectively, of the offset of the
    invocation relative to the full compute dispatch.  The "w" component of
    the attribute is undefined.

    If a compute attribute binding matches "invocation.groupid", the "x", "y",
    and "z" components of the invocation attribute variable are filled with
    the "x", "y", "z" components, respectively, of the offset of the local
    work group (in groups) relative to the full compute dispatch.  The "w"
    component of the attribute is undefined.

    If a compute attribute binding matches "invocation.groupcount", the "x",
    "y", and "z" components of the invocation attribute variable are filled
    with the "x", "y", and "z" dimensions, respectively, in local work groups
    of the full compute dispatch.  The "w" component of the attribute is
    undefined.

    If a compute attribute binding matches "invocation.groupsize", the "x",
    "y", and "z" components of the invocation attribute variable are filled
    with the "x", "y", and "z" dimensions, respectively, of the local work
    group, as specified by the GROUP_SIZE declaration.  The "w" component of
    the attribute is undefined.

    If a compute attribute binding matches "invocation.localindex", the "x"
    component of the invocation attribute variable is filled with a flattened
    one-dimensional index of the invocation, which is derived as:

      invocation.localid.z * invocation.groupsize.x * invocation.groupsize.y +
      invocation.localid.y * invocation.groupsize.x +
      invocation.localid.x

    The "y", "z", and "w" components of the attribute are undefined.
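
    As a worked illustration of the formula above, the following C sketch
    computes the flattened index from a local offset and a group size.  The
    function name local_index and the array-based parameters are assumptions
    made for this example.

```c
/*
 * Flattened one-dimensional index of an invocation in its local work group.
 * localid and groupsize hold the (x, y, z) components in elements 0, 1, 2.
 */
unsigned local_index(const unsigned localid[3], const unsigned groupsize[3])
{
    return localid[2] * groupsize[0] * groupsize[1] +
           localid[1] * groupsize[0] +
           localid[0];
}
```

    For the invocation marked "*" in Figure X.1 (local offset (2,1,0) in a
    group of size (8,4,1)), this evaluates to 1 * 8 + 2 = 10.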

    For one-dimensional dispatches, the "y" components of
    "invocation.localid", "invocation.globalid", and "invocation.groupid" will
    be zero.  For one- and two-dimensional dispatches, the "z" components of
    "invocation.localid", "invocation.globalid", and "invocation.groupid" will
    be zero.  The same components of "invocation.groupcount" and
    "invocation.groupsize" will be one in these cases.


    (add the following subsection to section 2.X.3.5, Program Results.)

    Compute programs have no result variables; all shader results must be
    written to memory.


    Add New Section 2.X.3.Y, Compute Program Shared Memory, after Section
    2.X.3.6, Program Parameter Buffers

    Compute program shared memory variables are arrays of basic machine units
    from which data can be read or written using the LDS and STS instructions.
    Compute program shared memory also supports atomic memory operations using
    the ATOMS instruction.  The GL allocates a single block of shared memory
    for each local work group, whose size in basic machine units is specified
    by the "SHARED_MEMORY" statement.  The contents of compute program shared
    memory are undefined when program execution for the local work group
    begins and can be changed only by using the ATOMS or STS instructions.
    Compute program shared memory variables are shared between all invocations
    of a local work group.  Writes performed by one invocation will be visible
    for any reads of the same memory from any other invocation executed after
    the write.  Note that the order of reads and writes between different
    invocations in a local work group is largely undefined, although the BAR
    instruction can be used to introduce synchronization points for all
    invocations in a local work group.

    Shared memory variables may only be used as operands in the ATOMS, LDS,
    and STS instructions; they may not be used as results or operands in
    general instructions.  Shared memory variables must be declared
    explicitly via the <SHARED_statement> grammar rule.  Shared memory
    bindings cannot be used directly in executable instructions.

    Shared memory variables may be declared as arrays, but all
    bindings assigned to the array must use the same binding point(s) and must
    increase consecutively.

      Binding                        Components  Underlying State
      -----------------------------  ----------  -----------------------------
      program.sharedmem[a]           (x,x,x,x)   compute shared memory,
                                                   element a
      program.sharedmem[a..b]        (x,x,x,x)   compute shared memory,
                                                   elements a through b
      program.sharedmem              (x,x,x,x)   compute shared memory,
                                                   all elements

      Table X.3: Shared Memory Bindings.  <a> and <b> indicate individual
      elements of shared memory.

    If a shared memory binding matches "program.sharedmem[a]", the shared
    memory variable is associated with basic machine element <a> of compute
    shared memory.

    For shared memory declarations, "program.sharedmem[a..b]" is equivalent to
    specifying elements <a> through <b> of compute shared memory in order.

    For shared memory declarations, "program.sharedmem" is equivalent to
    specifying elements zero through <N>-1 of compute shared memory in order,
    where <N> is the total shared memory size declared by the "SHARED_MEMORY"
    statement.


    Modify Section 2.X.4, Program Execution Environment

    (add to the opcode table)

                  Modifiers
      Instruction F I C S H D  Out Inputs    Description
      ----------- - - - - - -  --- --------  --------------------------------
      ATOMS       - - X - - -  s   v,su      atomic transaction to shared mem
      BAR         - - - - - -  -   -         work group execution barrier
      LDS         - - X X - F  v   su        load from shared memory
      STS         - - - - - -  -   v,su      store to shared memory


    Modify Section 2.X.4.1, Program Instruction Modifiers

      Modifier  Description
      --------  -----------------------------------------------
      CTA       Memory barrier orders only memory transactions
                relative to invocations within local work group

    (add to descriptions of opcode modifiers)

    For the MEMBAR (memory barrier) instruction, the "CTA" modifier specifies
    that memory transactions before and after the barrier are strongly ordered
    as observed by any other shader invocation in the local work group.


    Modify Section 2.X.4.5, Program Memory Access, from NV_gpu_program5

    (add to the end of the first paragraph) ... Additionally programs may load
    from or store to shared memory via the ATOMS (atomic shared memory
    operation), LDS (load from shared memory), and STS (store to shared
    memory) instructions.

    (modify miscellaneous other language referring to "buffer object memory"
    to instead refer to "buffer object and shared memory")

    (add hypothetical built-in functions SharedMemoryLoad() and
    SharedMemoryStore() that behave similarly to BufferMemoryLoad() and
    BufferMemoryStore(), except that they access local work group shared
    memory instead of buffer object memory)


    Add the following subsection to section 2.X.7, Program Declarations

    Section 2.X.7.Y, Compute Program Declarations

    Compute programs support two types of declaration statement, as described
    below.

    - Shader Thread Group Size (GROUP_SIZE)

    The GROUP_SIZE statement declares the number of shader threads in a one-,
    two-, or three-dimensional local work group.  The statement must have one
    to three unsigned integer arguments.  Each argument must be less than or
    equal to the value of the implementation-dependent limit
    MAX_COMPUTE_LOCAL_WORK_SIZE for its corresponding dimension (X, Y, or Z).
    A program will fail to load unless it contains exactly one GROUP_SIZE
    declaration.
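
    The argument rules for GROUP_SIZE can be summarized in a small C sketch.
    This is illustrative only: validate_group_size is a hypothetical helper,
    and max_size stands in for the per-dimension values of
    MAX_COMPUTE_LOCAL_WORK_SIZE that an application would query from the GL.
    Unspecified dimensions are filled with one, matching the values reported
    by "invocation.groupsize" for one- and two-dimensional groups.  The text
    above does not say whether a zero argument is accepted, so this sketch
    does not reject it.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical check of GROUP_SIZE arguments.  Returns false if the
 * argument count is not 1..3 or if any argument exceeds the limit for its
 * dimension.  On success, out[] holds the (x, y, z) group size, with
 * unspecified dimensions set to one.
 */
bool validate_group_size(const unsigned *args, size_t count,
                         const unsigned max_size[3], unsigned out[3])
{
    if (count < 1 || count > 3)
        return false;

    for (size_t i = 0; i < 3; i++) {
        if (i < count) {
            if (args[i] > max_size[i])
                return false;   /* exceeds the limit for this dimension */
            out[i] = args[i];
        } else {
            out[i] = 1u;        /* dimension not specified */
        }
    }
    return true;
}
```

    The rule that a program must contain exactly one GROUP_SIZE declaration
    is a separate check that this sketch does not cover.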


    - Shared Memory Storage Size (SHARED_MEMORY)

    The SHARED_MEMORY statement declares the size of the shared memory, in
    basic machine units, available to the threads of each local work group.
    The SHARED_MEMORY statement is optional, but a program will fail to load
    if it includes multiple SHARED_MEMORY declarations, if it uses the
    ATOMS, LDS, or STS instructions without a SHARED_MEMORY declaration, if
    it uses these instructions with an offset that would access memory
    beyond the declared shared memory size, or if the declared shared
    memory size is greater than the implementation-dependent limit
    MAX_COMPUTE_SHARED_VARIABLE_SIZE.
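
    The four failure conditions listed above can be expressed as a single
    check, sketched below in C.  The structure shmem_usage, the status codes,
    and the function name are hypothetical; max_shared_size stands in for the
    value of MAX_COMPUTE_SHARED_VARIABLE_SIZE.

```c
#include <stdbool.h>

typedef enum {
    SHMEM_OK,
    SHMEM_MULTIPLE_DECLARATIONS,
    SHMEM_MISSING_DECLARATION,
    SHMEM_SIZE_EXCEEDS_LIMIT,
    SHMEM_ACCESS_OUT_OF_RANGE
} shmem_status;

/* Hypothetical summary of how a program uses shared memory. */
typedef struct {
    unsigned decl_count;      /* number of SHARED_MEMORY declarations      */
    unsigned declared_size;   /* declared size, in basic machine units     */
    bool     uses_shared_ops; /* program contains ATOMS, LDS, or STS       */
    unsigned max_access_end;  /* one past the highest unit those ops touch */
} shmem_usage;

shmem_status check_shared_memory(const shmem_usage *u,
                                 unsigned max_shared_size)
{
    if (u->decl_count > 1)
        return SHMEM_MULTIPLE_DECLARATIONS;
    if (u->uses_shared_ops && u->decl_count == 0)
        return SHMEM_MISSING_DECLARATION;
    if (u->decl_count == 1 && u->declared_size > max_shared_size)
        return SHMEM_SIZE_EXCEEDS_LIMIT;
    if (u->uses_shared_ops && u->max_access_end > u->declared_size)
        return SHMEM_ACCESS_OUT_OF_RANGE;
    return SHMEM_OK;
}
```

    A program with no SHARED_MEMORY declaration and no shared memory
    instructions passes this check, reflecting that the statement is
    optional.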


    (add the following subsection to section 2.X.8, Program Instruction Set.)

    Section 2.X.8.Z, ATOMS:  Atomic Memory Operation (Shared Memory)

    The ATOMS instruction performs an atomic memory operation by reading from
    shared memory specified by the second unsigned integer scalar operand,
    computing a new value based on the value read from memory and the first
    (vector) operand, and then writing the result back to the same memory
    address.  The memory transaction is atomic, guaranteeing that no other
    write to the memory accessed will occur between the time it is read and
    written by the ATOMS instruction.  The result of the ATOMS instruction is
    the scalar value read from memory.  The second operand used for the ATOMS
    instruction must correspond to a shared memory variable declared using the
    "SHARED" statement; a program will fail to load if any other type of
    operand is used for the second operand of an ATOMS instruction.

    The ATOMS instruction has two required instruction modifiers.  The atomic
    modifier specifies the type of operation to be performed.  The storage
    modifier specifies the size and data type of the operand read from memory
    and the base data type of the operation used to compute the value to be
    written to memory.

      atomic     storage
      modifier   modifiers            operation
      --------   ------------------   --------------------------------------
       ADD       U32, S32, U64, F32   compute a sum
       MIN       U32, S32             compute minimum
       MAX       U32, S32             compute maximum
       IWRAP     U32                  increment memory, wrapping at operand
       DWRAP     U32                  decrement memory, wrapping at operand
       AND       U32, S32             compute bit-wise AND
       OR        U32, S32             compute bit-wise OR
       XOR       U32, S32             compute bit-wise XOR
       EXCH      U32, S32, U64, F32   exchange memory with operand
       CSWAP     U32, S32, U64        compare-and-swap

     Table X.Y, Supported atomic and storage modifiers for the ATOMS
     instruction.

    Not all storage modifiers are supported by ATOMS, and the set of modifiers
    allowed for any given instruction depends on the atomic modifier
    specified.  Table X.Y enumerates the set of atomic modifiers supported by
    the ATOMS instruction, and the storage modifiers allowed for each.

      tmp0 = VectorLoad(op0);
      result = SharedMemoryLoad(op1, storageModifier);
      switch (atomicModifier) {
      case ADD:
        writeval = tmp0.x + result;
        break;
      case MIN:
        writeval = min(tmp0.x, result);
        break;
      case MAX:
        writeval = max(tmp0.x, result);
        break;
      case IWRAP:
        writeval = (result >= tmp0.x) ? 0 : result+1;
        break;
      case DWRAP:
        writeval = (result == 0 || result > tmp0.x) ? tmp0.x : result-1;
        break;
      case AND:
        writeval = tmp0.x & result;
        break;
      case OR:
        writeval = tmp0.x | result;
        break;
      case XOR:
        writeval = tmp0.x ^ result;
        break;
      case EXCH:
        writeval = tmp0.x;  // exchange memory with operand
        break;
      case CSWAP:
        if (result == tmp0.x) {
          writeval = tmp0.y;
        } else {
          return result;  // no memory store
        }
        break;
      }
      SharedMemoryStore(op1, writeval, storageModifier);

    ATOMS performs a scalar atomic operation.  The <y>, <z>, and <w>
    components of the result vector are undefined.

    ATOMS supports no base data type modifiers, but requires exactly one
    storage modifier.  The base data types of the result vector, and the first
    (vector) operand are derived from the storage modifier.  The second
    operand is always interpreted as a scalar unsigned integer.
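
    For readers who prefer compilable code to pseudocode, the following C
    sketch models the ATOMS operations for the U32 storage modifier only.  It
    is a single-threaded model and does not reproduce the atomicity
    guarantee.  The names atoms_op and atoms_u32 are hypothetical, x and y
    stand for tmp0.x and tmp0.y, and the EXCH case assumes that the operand
    value is what gets written to memory, following the "exchange memory with
    operand" description in Table X.Y.

```c
#include <stdint.h>

typedef enum {
    ATOMS_ADD, ATOMS_MIN, ATOMS_MAX, ATOMS_IWRAP, ATOMS_DWRAP,
    ATOMS_AND, ATOMS_OR, ATOMS_XOR, ATOMS_EXCH, ATOMS_CSWAP
} atoms_op;

/*
 * Applies one operation to the shared memory word at *mem and returns the
 * value that was read from memory (the instruction result).
 */
uint32_t atoms_u32(atoms_op op, uint32_t *mem, uint32_t x, uint32_t y)
{
    uint32_t result = *mem;
    uint32_t writeval = result;

    switch (op) {
    case ATOMS_ADD:   writeval = x + result;                    break;
    case ATOMS_MIN:   writeval = (x < result) ? x : result;     break;
    case ATOMS_MAX:   writeval = (x > result) ? x : result;     break;
    case ATOMS_IWRAP: writeval = (result >= x) ? 0 : result + 1; break;
    case ATOMS_DWRAP:
        writeval = (result == 0 || result > x) ? x : result - 1;
        break;
    case ATOMS_AND:   writeval = x & result;                    break;
    case ATOMS_OR:    writeval = x | result;                    break;
    case ATOMS_XOR:   writeval = x ^ result;                    break;
    case ATOMS_EXCH:  writeval = x;                             break;
    case ATOMS_CSWAP:
        if (result != x)
            return result;      /* comparison failed: no memory store */
        writeval = y;
        break;
    }
    *mem = writeval;
    return result;
}
```

    In this model, IWRAP with an operand of 8 counts 0, 1, ..., 8 and then
    wraps back to 0, while DWRAP with an operand of 7 steps from 0 to 7 and
    then counts down.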


    Section 2.X.8.Z, BAR:  Execution Barrier

    The BAR instruction synchronizes the execution of compute shader
    invocations within a local work group.  When a compute shader invocation
    executes the BAR instruction, it pauses until the same BAR instruction has
    been executed by all invocations in the current local work group.  Once
    all invocations have executed the BAR instruction, processing continues
    with the instruction following the BAR instruction.

    There is no compile-time restriction on the locations in a program where
    BAR is allowed.  However, BAR instructions are not allowed in divergent
    flow control; if any compute shader invocation in the work group executes
    the BAR instruction, all compute shader invocations must execute the
    instruction.  Results of executing a BAR instruction are undefined and can
    result in application hangs and/or program termination if the instruction
    is issued:

      * inside any IF/ELSE/ENDIF block where the results of the condition
        evaluated by the IF instruction are not identical across the work
        group;

      * inside any iteration of REP/ENDREP block where at least one invocation
        in the work group has skipped to the next iteration using the CONT
        instruction, exited the loop using a BRK or RET instruction, or exited
        the loop due to having completed the requested number of loop
        iterations; or

      * inside any subroutine (including main) where at least one invocation
        in the work group has exited the subroutine using the RET instruction.

    BAR has no operands and generates no result.


    Section 2.X.8.Z, LDS:  Load from Shared Memory

    The LDS instruction generates a result vector by fetching data from the
    shared memory for the current local work group identified by the first
    operand, as described in Section 2.X.4.5.  The single operand for the LDS
    instruction must correspond to a shared memory variable declared using
    the "SHARED" statement; a program will fail to load if any other type of
    operand is used in an LDS instruction.

      result = SharedMemoryLoad(op0, storageModifier);

    LDS supports no base data type modifiers, but requires exactly one storage
    modifier.  The base data type of the result vector is derived from the
    storage modifier.


    Replace Section 2.X.8.Z, MEMBAR:  Memory Barrier, as added by
    EXT_shader_image_load_store

    The MEMBAR instruction synchronizes memory transactions to ensure that
    memory transactions resulting from any instruction executed by the thread
    prior to the MEMBAR instruction complete prior to any memory transactions
    issued after the instruction, as observed by other shader invocations.

    The MEMBAR instruction has one optional instruction modifier.  If the CTA
    instruction modifier is specified, memory transactions before and after
    the barrier will be strongly ordered as observed by other shader
    invocations in the same local work group.  However, it does not order
    transactions as viewed by any other shader.  With the CTA modifier,
    shaders not in the local work group may observe the results of memory
    transactions issued after the MEMBAR instruction before those issued
    before the MEMBAR instruction.  If the CTA instruction modifier is not
    specified, all shader invocations will see the results of any memory
    transaction issued before the MEMBAR instruction before those issued after
    the MEMBAR instruction.

    MEMBAR has no operands and generates no result.


    Section 2.X.8.Z, STS:  Store to Shared Memory

    The STS instruction writes the contents of the first vector operand to
    shared memory for the current local work group identified by the second
    operand, as described in Section 2.X.4.5.  This instruction generates no
    result.  The second operand for the STS instruction must correspond to a
    shared memory variable declared using the "SHARED" statement; a program
    will fail to load if any other type of operand is used in an STS
    instruction.

      tmp0 = VectorLoad(op0);
      SharedMemoryStore(op1, tmp0, storageModifier);

    STS supports no base data type modifiers, but requires exactly one storage
    modifier.  The base data type of the vector components of the first
    operand is derived from the storage modifier.


Additions to Chapter 3 of the OpenGL 4.2 (Compatibility Profile) Specification
(Rasterization)

    None.

Additions to Chapter 4 of the OpenGL 4.2 (Compatibility Profile) Specification
(Per-Fragment Operations and the Frame Buffer)

    None.

Additions to Chapter 5 of the OpenGL 4.2 (Compatibility Profile) Specification
(Special Functions)

    None.

Additions to Chapter 6 of the OpenGL 4.2 (Compatibility Profile) Specification
(State and State Requests)

    None.

Additions to the AGL/GLX/WGL Specifications

    None.

GLX Protocol

    None.

Dependencies on NV_shader_atomic_float

    If NV_shader_atomic_float is not supported, the ADD and EXCH atomic
    operations in the ATOMS instruction do not support the "F32" storage
    modifier.
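
    The combinations permitted by Table X.Y, together with the dependency
    above, can be captured in a small lookup.  The C sketch below is
    illustrative only; the enumerations and the function
    atoms_modifiers_allowed are hypothetical names, and has_atomic_float is
    an assumed flag indicating whether NV_shader_atomic_float is supported.

```c
#include <stdbool.h>

typedef enum {
    AT_ADD, AT_MIN, AT_MAX, AT_IWRAP, AT_DWRAP,
    AT_AND, AT_OR, AT_XOR, AT_EXCH, AT_CSWAP
} atoms_atomic;

typedef enum { ST_U32, ST_S32, ST_U64, ST_F32 } atoms_storage;

/* Returns true if Table X.Y allows the storage modifier for the operation. */
bool atoms_modifiers_allowed(atoms_atomic op, atoms_storage st,
                             bool has_atomic_float)
{
    switch (op) {
    case AT_ADD:
    case AT_EXCH:
        /* U32, S32, U64 always; F32 only with NV_shader_atomic_float. */
        return (st == ST_F32) ? has_atomic_float : true;
    case AT_CSWAP:
        return st == ST_U32 || st == ST_S32 || st == ST_U64;
    case AT_MIN:
    case AT_MAX:
    case AT_AND:
    case AT_OR:
    case AT_XOR:
        return st == ST_U32 || st == ST_S32;
    case AT_IWRAP:
    case AT_DWRAP:
        return st == ST_U32;
    }
    return false;
}
```

    Under this sketch, ADD with F32 is accepted only when the flag is set,
    and CSWAP with F32 is never accepted.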

Dependencies on EXT_shader_image_load_store

    If EXT_shader_image_load_store is not supported, language describing the
    "CTA" instruction modifier and modifying the MEMBAR instruction (as added
    by EXT_shader_image_load_store) should be removed.

Errors

    None.

New State

    (Modify ARB_vertex_program, Table X.6 -- Program State)

                                                      Initial
    Get Value                    Type    Get Command  Value   Description               Sec.    Attribute
    ---------                    ------- -----------  ------- ------------------------  ------  ---------
    COMPUTE_PROGRAM_PARAMETER_   Z+      GetIntegerv  0       Active compute program    2.14.1  -
      BUFFER_NV                                               buffer object binding
    COMPUTE_PROGRAM_PARAMETER_   nxZ+    GetInteger-  0       Buffer objects bound for  2.14.1  -
      BUFFER_NV                          IndexedvEXT          compute program use

    Also shares buffer bindings and other state with the ARB_compute_shader
    extension.

New Implementation Dependent State

    None, but shares implementation-dependent state with the
    ARB_compute_shader extension.

Issues

    None.

Revision History

    Rev.    Date    Author    Changes
    ----  --------  --------  --------------------------------------------
     2    10/23/12  pbrown    Remove the restriction forbidding the use of BAR
                              inside potentially divergent flow control.
                              Instead, we will allow BAR to be executed
                              anywhere, but specify undefined results
                              (including hangs or program termination) if the
                              flow control is divergent (bug 9367).

     1              pbrown    Internal spec development.