1Name
2
3    NV_fragment_program2
4
5Name Strings
6
7    GL_NV_fragment_program2
8
9Contact
10
11    Pat Brown, NVIDIA Corporation (pbrown 'at' nvidia.com)
12    Eric Werness, NVIDIA Corporation (ewerness 'at' nvidia.com)
13
14Status
15
16    Shipping.
17
18Version
19
20    Last Modified:      08/04/2004
21    NVIDIA Revision:    8
22
23Number
24
25    304
26
27
28Dependencies
29
30    ARB_fragment_program is required.
31    NV_fragment_program_option is required.
32
33Overview
34
35    This extension, like the NV_fragment_program_option extension, provides
36    additional fragment program functionality to extend the standard
37    ARB_fragment_program language and execution environment.  ARB programs
38    wishing to use this added functionality need only add:
39
40        OPTION NV_fragment_program2;
41
42    to the beginning of their fragment programs.
43
44    New functionality provided by this extension, above and beyond that
45    already provided by the NV_fragment_program_option extension, includes:
46
47
48      * structured branching support, including data-dependent IF tests, loops
49        supporting a fixed number of iterations, and a data-dependent loop
50        exit instruction (BRK),
51
52      * subroutine calls,
53
54      * instructions to perform vector normalization, divide vector components
55        by a scalar, and perform two-component dot products (with or without a
56        scalar add),
57
58      * an instruction to perform a texture lookup with an explicit LOD,
59
60      * a loop index register for indirect access into the texture coordinate
61        attribute array, and
62
63      * a facing attribute that indicates whether the fragment is generated
64        from a front- or back-facing primitive.
65
66
67Issues
68
69    * Should this extension expose projective forms of the LOD-modifying
70      texture instructions?
71
72        RESOLVED: No. The user can manually add a DIV instruction to achieve
73        the same effect.
74
75    * Should this extension expose precision explicitly?
76
77        RESOLVED: Only for storage using the SHORT TEMP and LONG TEMP syntax
78        (similar to NV_fragment_program_option).
79
80    * How are resources (such as registers and condition codes) scoped?
81
82        RESOLVED: All resources are globally scoped. This means that if, for
83        instance, a subroutine modifies a condition code, that modification
84        affects both the caller and the callee.
85
86    * How is the scope determined for instructions required to be within a
87      specific loop construct?
88
89        RESOLVED: The scope is determined statically at compile time. This means
90        that using BRK or A0 in a subroutine called from within a loop is a
91        compile error.
92
93
94New Procedures and Functions
95
96    None.
97
98New Tokens
99
100    Accepted by the <pname> parameter of GetProgramivARB:
101
102        MAX_PROGRAM_EXEC_INSTRUCTIONS_NV                0x88F4
103        MAX_PROGRAM_CALL_DEPTH_NV                       0x88F5
104        MAX_PROGRAM_IF_DEPTH_NV                         0x88F6
105        MAX_PROGRAM_LOOP_DEPTH_NV                       0x88F7
106        MAX_PROGRAM_LOOP_COUNT_NV                       0x88F8
107
108
109Additions to Chapter 2 of the OpenGL 1.2.1 Specification (OpenGL Operation)
110
111    None.
112
113Additions to Chapter 3 of the OpenGL 1.2.1 Specification (Rasterization)
114
115    Modify Section 3.11 of ARB_fragment_program (Fragment Program):
116
117    Delete the sentence referring to the lack of branching or looping.
118
119    Modify Section 3.11.2 of ARB_fragment_program (Fragment Program Grammar
120    and Restrictions):
121
122    (mostly add to existing grammar rules, as extended by
123    NV_fragment_program_option)
124
125    <optionName>            ::= "NV_fragment_program2"
126
127    <statement>             ::= <branchLabel> ":"
128
129    <instruction>           ::= <FlowInstruction>
130
131    <ALUInstruction>        ::= <VECSCAop_instruction>
132
133    <FlowInstruction>       ::= <BRAop_instruction>
134                              | <FLOWCCop_instruction>
135                              | <IFop_instruction>
136                              | <LOOPop_instruction>
137                              | <ENDFLOWop_instruction>
138
139    <VECTORop>              ::= "NRM"
140
141    <VECSCAop_instruction>  ::= <VECSCAop> <instResult> "," <instOperandV> ","
142                                <instOperandS>
143
144    <VECSCAop>              ::= "DIV"
145
146    <BINop>                 ::= "DP2"
147
148    <TRIop>                 ::= "DP2A"
149
150    <TEXop>                 ::= "TXL"
151
152    <BRAop_instruction>     ::= <BRAop> <branchLabel> <optBranchCond>
153
154    <BRAop>                 ::= "CAL"
155
156    <FLOWCCop_instruction>  ::= <FLOWCCop> <optBranchCond>
157
158    <FLOWCCop>              ::= "RET"
159                              | "BRK"
160
161    <IFop_instruction>      ::= <IFop> <ccTest>
162
163    <IFop>                  ::= "IF"
164
165    <LOOPop_instruction>    ::= <LOOPop> <instOperandV>
166
167    <LOOPop>                ::= "LOOP"
168                              | "REP"
169
170    <ENDFLOWop_instruction> ::= <ENDFLOWop>
171
172    <ENDFLOWop>             ::= "ELSE"
173                              | "ENDIF"
174                              | "ENDLOOP"
175                              | "ENDREP"
176
177    <optBranchCond>         ::= /* empty */
178                              | <ccMask>
179
180    <branchLabel>           ::= <identifier>
181
182    <attribFragBasic>       ::= "texcoord" "[" <arrayMemRel> "]"
183                              | "facing"
184
185    <arrayMemRel>           ::= <addrUseS> <arrayMemRelOffset>
186
187    <arrayMemRelOffset>     ::= /* empty */
188                              | "+" <addrRegPosOffset>
189
190    <addrRegPosOffset>      ::= <integer> from 0 to 9
191
192    <addrUseS>              ::= <addrVarName> <scalarAddrSuffix>
193
194    <scalarAddrSuffix>      ::= "." <addrComponent>
195
196    <addrComponent>         ::= "x"
197
198    Note:  This extension provides a pre-defined address register (A0) that
199    matches the <addrVarName> grammar rule and can be used as a loop counter
200    (Section 3.11.3.Y).  It is not possible to declare additional address
201    register variables.
202
203
204    Modify Section 3.11.3.1, Fragment Attributes
205
206    (add new bindings to binding table)
207
208      Fragment Attribute Binding  Components  Underlying State
209      --------------------------  ----------  ----------------------------
210      ...
211      fragment.texcoord[A0.x+n]   (s,t,r,q)   indexed texture coordinate
212      fragment.facing             (f,0,0,1)   fragment facing
213
214    If a fragment attribute binding matches "fragment.texcoord[A0.x+n]", a
215    texture coordinate number <c> is computed by adding the current value of
216    the "A0.x" address register (the loop index -- Section 3.11.3.Y) and <n>.
217    The "x", "y", "z", and "w" components of the fragment attribute variable
218    are filled with the "s", "t", "r", and "q" components, respectively, of
219    the fragment texture coordinates for texture coordinate set <c>.  If <c>
220    is negative or greater than or equal to MAX_TEXTURE_COORDS_ARB, the
221    fragment attribute variable is undefined.
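
    Illustration (non-normative):  the index computation above can be sketched
    in C as follows.  The function name and its parameters are hypothetical
    and are not part of any API; <max_texture_coords> stands in for the value
    of MAX_TEXTURE_COORDS_ARB.

```c
#include <stdbool.h>

/* Hypothetical helper: resolves fragment.texcoord[A0.x+n] to a texture
 * coordinate set number <c>.  Returns false when the binding would be
 * undefined, in which case *c_out is left untouched. */
static bool resolve_texcoord_set(int a0x, int n, int max_texture_coords,
                                 int *c_out)
{
    int c = a0x + n;                 /* <c> = loop index + constant offset */
    if (c < 0 || c >= max_texture_coords) {
        return false;                /* attribute variable is undefined */
    }
    *c_out = c;
    return true;
}
```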
222
223    If a fragment attribute binding matches "fragment.facing", the "x"
224    component of the fragment attribute variable is filled with +1.0 or -1.0,
225    depending on the orientation of the primitive producing the fragment.  If
226    the fragment is generated by a back-facing polygon (including point- and
227    line-mode polygons), the facing is -1.0; otherwise, the facing is +1.0.
228    The "y", "z", and "w" coordinates are filled with 0, 0, and 1,
229    respectively.
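
    Illustration (non-normative):  a minimal C sketch of the value bound to
    "fragment.facing".  The Vec4 type and the function name are assumptions
    made for this example only.

```c
typedef struct { float x, y, z, w; } Vec4;

/* is_back_facing_polygon is nonzero only for fragments generated by a
 * back-facing polygon (including point- and line-mode polygons).  All other
 * fragments are treated as front-facing. */
Vec4 fragment_facing(int is_back_facing_polygon)
{
    Vec4 f;
    f.x = is_back_facing_polygon ? -1.0f : 1.0f;
    f.y = 0.0f;
    f.z = 0.0f;
    f.w = 1.0f;
    return f;
}
```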
230
231
232    Add New Section 3.11.3.Y, Fragment Program Address Register (insert after
233    Section 3.11.3.X, Condition Code Register)
234
235    Fragment program address register variables are a set of four-component
236    signed integer vectors where only the "x" component of the address
237    registers is currently accessible.  Address registers are used as indices
238    when performing relative addressing in the "fragment.texcoord" attribute
239    array (section 3.11.3.1).
240
241    Fragment program address registers cannot be declared in a fragment
242    program.  There is only a single built-in address register, "A0.x" (loop
243    index), which is available inside LOOP/ENDLOOP blocks.  A fragment program
244    that accesses A0.x outside a LOOP/ENDLOOP block will fail to load.
245
246    A0.x is initialized by the LOOP instruction and updated by the ENDLOOP
247    instruction.  When LOOP blocks are nested, each block has its own value
248    for A0.x, but only the A0.x value for the innermost block can be used. The
249    value of A0.x is clamped to be greater than or equal to 0.
250
251
252    Modify Section 3.11.4, Fragment Program Execution Environment
253
254    (modify instruction table) There are sixty-seven fragment program
255    instructions....
256
257               Modifiers
258      Instr.   R H X C S  Inputs  Output   Description
259      -------  - - - - -  ------  ------   --------------------------------
260      ABS      X X X X X  v       v        absolute value
261      ADD      X X X X X  v,v     v        add
262      BRK      - - - - -  c       -        break out of loop instruction
263      CAL      - - - - -  c       -        subroutine call
264      CMP      - - - X X  v,v,v   v        compare
265      COS      X X - X X  s       ssss     cosine with reduction to [-PI,PI]
266      DDX      X X - X X  v       v        partial derivative relative to X
267      DDY      X X - X X  v       v        partial derivative relative to Y
268      DIV      X X - X X  v,s     v        divide vector components by scalar
269      DP2      X X X X X  v,v     ssss     2-component dot product
270      DP2A     X X X X X  v,v,v   ssss     2-comp. dot product w/scalar add
271      DP3      X X X X X  v,v     ssss     3-component dot product
272      DP4      X X X X X  v,v     ssss     4-component dot product
273      DPH      X X X X X  v,v     ssss     homogeneous dot product
274      DST      X X - X X  v,v     v        distance vector
275      ELSE     - - - - -  -       -        start if test else block
276      ENDIF    - - - - -  -       -        end if test block
277      ENDLOOP  - - - - -  -       -        end of loop block
278      ENDREP   - - - - -  -       -        end of repeat block
279      EX2      X X - X X  s       ssss     exponential base 2
280      FLR      X X X X X  v       v        floor
281      FRC      X X X X X  v       v        fraction
282      IF       - - - - -  c       -        start of if test block
283      KIL      - - - - -  v or c  -        kill fragment
284      LG2      X X - X X  s       ssss     logarithm base 2
285      LIT      X X - X X  v       v        compute light coefficients
286      LOOP     - - - - -  v       -        start of loop block
287      LRP      X X X X X  v,v,v   v        linear interpolation
288      MAD      X X X X X  v,v,v   v        multiply and add
289      MAX      X X X X X  v,v     v        maximum
290      MIN      X X X X X  v,v     v        minimum
291      MOV      X X X X X  v       v        move
292      MUL      X X X X X  v,v     v        multiply
293      NRM      X X - X X  v       v        normalize 3-component vector
294      PK2H     - - - - -  v       ssss     pack two 16-bit floats
295      PK2US    - - - - -  v       ssss     pack two unsigned 16-bit scalars
296      PK4B     - - - - -  v       ssss     pack four signed 8-bit scalars
297      PK4UB    - - - - -  v       ssss     pack four unsigned 8-bit scalars
298      POW      X X - X X  s,s     ssss     exponentiate
299      RCP      X X - X X  s       ssss     reciprocal
300      REP      - - - - -  v       -        start of repeat block
301      RET      - - - - -  c       -        subroutine return
302      RFL      X X - X X  v,v     v        reflection vector
303      RSQ      X X - X X  s       ssss     reciprocal square root
304      SCS      X X - X X  s       ss--     sine/cosine without reduction
305      SEQ      X X X X X  v,v     v        set on equal
306      SFL      X X X X X  v,v     v        set on false
307      SGE      X X X X X  v,v     v        set on greater than or equal
308      SGT      X X X X X  v,v     v        set on greater than
309      SIN      X X - X X  s       ssss     sine with reduction to [-PI,PI]
310      SLE      X X X X X  v,v     v        set on less than or equal
311      SLT      X X X X X  v,v     v        set on less than
312      SNE      X X X X X  v,v     v        set on not equal
313      STR      X X X X X  v,v     v        set on true
314      SUB      X X X X X  v,v     v        subtract
315      SWZ      X X - X X  v       v        extended swizzle
316      TEX      - - - X X  v       v        texture sample
317      TXB      - - - X X  v       v        texture sample with bias
318      TXD      - - - X X  v,v,v   v        texture sample w/partials
319      TXL      - - - X X  v       v        texture sample w/explicit LOD
320      TXP      - - - X X  v       v        texture sample with projection
321      UP2H     - - - X X  s       v        unpack two 16-bit floats
322      UP2US    - - - X X  s       v        unpack two unsigned 16-bit scalars
323      UP4B     - - - X X  s       v        unpack four signed 8-bit scalars
324      UP4UB    - - - X X  s       v        unpack four unsigned 8-bit scalars
325      X2D      X X - X X  v,v,v   v        2D coordinate transformation
326      XPD      X X - X X  v,v     v        cross product
327
328      Table X.5:  Summary of fragment program instructions.  The columns "R",
329      "H", "X", "C", and "S" indicate whether the "R", "H", or "X" precision
330      modifiers, the C condition code update modifier, and the "_SAT"/"_SSAT"
331      saturation modifiers, respectively, are supported for the opcode.  In
332      the input/output columns, "v" indicates a floating-point vector input or
333      output, "s" indicates a floating-point scalar input, "ssss" indicates a
334      scalar output replicated across a 4-component result vector, "ss--"
335      indicates two scalar outputs in the first two components, and "c"
336      indicates a condition code test.  Instructions described as "texture
337      sample" also specify a texture image unit identifier and a texture
338      target.
339
340
341    Modify Section 3.11.4.3, Fragment Program Destination Register Update
342
343    (modify saturation discussion) If the instruction opcode has the "_SAT"
344    suffix, requesting saturated result vectors, each component of the result
345    vector is clamped to the range [0,1] before updating the destination
346    register.  If the instruction opcode has the "_SSAT" suffix, requesting
347    signed saturation, each component of the result vector is clamped to the
348    range [-1,1] before updating the destination register.
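
    Illustration (non-normative):  the two saturation modes amount to
    per-component clamps.  The helper names below are hypothetical, and the
    treatment of NaN inputs is not addressed by this sketch.

```c
/* Clamp v to [lo, hi]; applied independently to each result component. */
static float clampf(float v, float lo, float hi)
{
    return v < lo ? lo : (v > hi ? hi : v);
}

/* "_SAT" suffix: clamp to [0,1]. */
float sat(float v)  { return clampf(v, 0.0f, 1.0f); }

/* "_SSAT" suffix: clamp to [-1,1]. */
float ssat(float v) { return clampf(v, -1.0f, 1.0f); }
```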
349
350
351    Add Section 3.11.4.X, Fragment Program Branching (before Section 3.11.4.4,
352    Fragment Program Result Processing)
353
354    Fragment programs support a limited model of branching.  Fragment programs
355    can specify one of several types of instruction blocks: IF/ELSE/ENDIF
356    blocks, LOOP/ENDLOOP blocks, and REP/ENDREP blocks.  Examples include the
357    following:
358
359      LOOP {5, 0, 1};     # 5 iterations with loop index at 0,1,2,3,4
360      ADD R0, R0, R1;
361      ENDLOOP;
362
363      REP repCount;
364      ADD R0, R0, R1;
365      ENDREP;
366
367      MOVC CC, R0;
368      IF GT.x;
369        MOV R0, R1;  # executes if R0.x > 0
370      ELSE;
371        MOV R0, R2;  # executes if R0.x <= 0
372      ENDIF;
373
374    Instruction blocks may be nested -- for example, a LOOP block may be
375    contained inside an IF/ELSE/ENDIF block.  In all cases, each instruction
376    block must be terminated with the appropriate instruction (ENDIF for IF,
377    ENDLOOP for LOOP, ENDREP for REP).  Nested instruction blocks must be
378    wholly contained within a block -- if a LOOP instruction is found between
379    an IF and ELSE instruction, the ENDLOOP must also be present between the
380    IF and ELSE.  A fragment program will fail to load if any instruction
381    block is terminated by an incorrect instruction or is not terminated
382    before the block containing it.
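
    Illustration (non-normative):  the termination and containment rules above
    can be checked with a simple stack at load time.  The sketch below is one
    possible way to do this, not a description of any actual implementation.
    The Op enumeration and MAX_NEST bound are assumptions made for the
    example, the depth limits (MAX_PROGRAM_IF_DEPTH_NV and
    MAX_PROGRAM_LOOP_DEPTH_NV) are not modeled, and rejecting a second ELSE
    in the same IF block is also an assumption.

```c
#include <stdbool.h>
#include <stddef.h>

typedef enum {
    OP_OTHER, OP_IF, OP_ELSE, OP_ENDIF,
    OP_LOOP, OP_ENDLOOP, OP_REP, OP_ENDREP
} Op;

#define MAX_NEST 64  /* arbitrary bound for this sketch */

/* Returns true if every block is closed by the matching instruction and
 * nested blocks are wholly contained in their parent. */
bool blocks_well_formed(const Op *ops, size_t count)
{
    Op stack[MAX_NEST];   /* innermost open block is stack[depth - 1] */
    size_t depth = 0;

    for (size_t i = 0; i < count; i++) {
        switch (ops[i]) {
        case OP_IF: case OP_LOOP: case OP_REP:
            if (depth == MAX_NEST) return false;
            stack[depth++] = ops[i];
            break;
        case OP_ELSE:
            /* ELSE must belong to the innermost open IF (no ELSE seen yet). */
            if (depth == 0 || stack[depth - 1] != OP_IF) return false;
            stack[depth - 1] = OP_ELSE;
            break;
        case OP_ENDIF:
            if (depth == 0 || (stack[depth - 1] != OP_IF &&
                               stack[depth - 1] != OP_ELSE)) return false;
            depth--;
            break;
        case OP_ENDLOOP:
            if (depth == 0 || stack[depth - 1] != OP_LOOP) return false;
            depth--;
            break;
        case OP_ENDREP:
            if (depth == 0 || stack[depth - 1] != OP_REP) return false;
            depth--;
            break;
        default:
            break;
        }
    }
    return depth == 0;    /* every block must have been terminated */
}
```

    A LOOP opened between IF and ELSE whose ENDLOOP appears after the ELSE is
    rejected, because the ELSE is encountered while the LOOP is still the
    innermost open block.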
383
384    IF/ELSE/ENDIF blocks evaluate a condition to determine which instructions
385    to execute.  If the condition is true, all instructions between the IF and
386    ELSE are executed.  If the condition is false, all instructions between
387    the ELSE and ENDIF are executed.  The ELSE instruction is optional.  If
388    the ELSE is omitted, all instructions between the IF and ENDIF are
389    executed if the condition is true, or skipped if the condition is false.
390    A limited amount of nesting is supported -- a fragment program will fail
391    to load if an IF instruction is nested inside MAX_PROGRAM_IF_DEPTH_NV or
392    more IF/ELSE/ENDIF blocks.
393
394    The condition of an IF test is specified by the <ccTest> grammar rule and
395    may depend on the contents of the condition code register.  Branch
396    conditions are evaluated by evaluating a condition code write mask in
397    exactly the same manner as done for register writes (section 2.14.2.2).
398    If any of the four components of the condition code write mask are
399    enabled, the branch is taken and execution continues with the instruction
400    following the label specified in the instruction.  Otherwise, the
401    instruction is ignored and fragment program execution continues with the
402    next instruction.  In the following example code,
403
404        MOVC CC, c[0];         # c[0]=(-2, 0, 2, NaN), CC gets (LT,EQ,GT,UN)
405        CAL label1 (LT.xyzw);  # call taken
406        CAL label2 (LT.wyzw);  # call not taken
407
408    the first CAL instruction loads a condition code of (LT,EQ,GT,UN) while
409    the second CAL instruction loads a condition code of (UN,EQ,GT,UN).  The
410    first call will be made because the "x" component evaluates to LT; the
411    second call will not be made because no component evaluates to LT.
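
    Illustration (non-normative):  the example above can be reproduced with a
    small C model of a swizzled condition code test.  The type and function
    names are hypothetical, and only the rules needed for the example (LT, EQ,
    GT, TR) are modeled; the full rule set is defined elsewhere.

```c
#include <stdbool.h>

typedef enum { CC_LT, CC_EQ, CC_GT, CC_UN } CCValue;
typedef enum { RULE_LT, RULE_EQ, RULE_GT, RULE_TR } CCRule;

/* Simplified rule evaluation covering only the rules used in this sketch. */
static bool rule_passes(CCRule rule, CCValue v)
{
    switch (rule) {
    case RULE_LT: return v == CC_LT;
    case RULE_EQ: return v == CC_EQ;
    case RULE_GT: return v == CC_GT;
    case RULE_TR: return true;
    }
    return false;
}

/* swizzle[i] selects which CC component (0 = x ... 3 = w) feeds component i
 * of the write mask.  The branch is taken if any component is enabled. */
bool branch_taken(const CCValue cc[4], CCRule rule, const int swizzle[4])
{
    for (int i = 0; i < 4; i++) {
        if (rule_passes(rule, cc[swizzle[i]])) {
            return true;
        }
    }
    return false;
}
```

    With cc = (LT,EQ,GT,UN), the swizzle "xyzw" yields a taken branch for the
    LT rule, while "wyzw" does not, matching the two CAL instructions above.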
412
413    LOOP/ENDLOOP and REP/ENDREP blocks involve a loop counter that indicates
414    the number of times the instructions between the LOOP/REP and
415    ENDLOOP/ENDREP are executed.  Looping blocks have a number of significant
416    limitations.  First, the loop counter cannot be computed at run time; it
417    must be specified as a program parameter.  Second, the number of loop
418    iterations is limited to the value MAX_PROGRAM_LOOP_COUNT_NV, which must
419    be at least 255.  Third, only a limited amount of nesting is supported --
420    a fragment program will fail to load if a LOOP or REP instruction is
421    nested inside MAX_PROGRAM_LOOP_DEPTH_NV or more LOOP/ENDLOOP or REP/ENDREP
422    blocks.
423
424    The BRK instruction is available to terminate a loop block early.  A BRK
425    instruction can be conditional; the condition is evaluated in the same
426    manner as the condition of an IF instruction, and the loop is terminated
427    if the condition is true.  A fragment program will fail to load if it
428    contains a BRK instruction that is not nested inside a LOOP/ENDLOOP or
429    REP/ENDREP block.
430
431    Fragment programs can contain one or more instruction labels, matching the
432    grammar rule <branchLabel>.  An instruction label can be referred to
433    explicitly in subroutine call (CAL) instructions.  Instruction labels can
434    be used at any point in the body of a program, and can be used in
435    instructions before being defined in the program string.  Instruction
436    labels can be defined anywhere in the program, except inside an
437    IF/ELSE/ENDIF, LOOP/ENDLOOP, or REP/ENDREP instruction block.  A fragment
438    program will fail to load if it contains an instruction label inside an
439    instruction block.
440
441    Fragment programs can also specify subroutine calls.  When a subroutine
442    call (CAL) instruction is executed, a reference to the instruction
443    immediately following the CAL instruction is pushed onto the call stack.
444    When a subroutine return (RET) instruction is executed, an instruction
445    reference is popped off the call stack and program execution continues
446    with the popped instruction.  A fragment program will terminate if a CAL
447    instruction is executed with MAX_PROGRAM_CALL_DEPTH_NV entries already in
448    the call stack or if a RET instruction is executed with an empty call
449    stack.  Subroutine calls may be conditional; the condition is specified by
450    the <optBranchCond> grammar rule and evaluated in the same way as the
451    condition of the IF instruction.  If no condition is specified, it is as
452    though "(TR)" were specified -- the branch is unconditional.
453
454    If a fragment program has an instruction label "main", program execution
455    begins with the instruction immediately following the instruction label.
456    Otherwise, program execution begins with the first instruction of the
457    program.  Instructions will be executed sequentially in the order
458    specified in the program, although branch instructions will affect the
459    instruction execution order, as described above.  A fragment program will
460    terminate after executing a RET instruction with an empty call stack.  A
461    fragment program will also terminate after executing the last instruction
462    in the program, unless that instruction was a taken branch.
463
464    A fragment program will fail to load if an instruction refers to a label
465    that is not defined in the program string.
466
467    A fragment program will terminate abnormally if a subroutine call
468    instruction produces a call stack overflow.  Additionally, a fragment
469    program will terminate abnormally after executing
470    MAX_PROGRAM_EXEC_INSTRUCTIONS_NV instructions to prevent hangs caused by
471    infinite loops in the program.
472
473    When a fragment program terminates, normally or abnormally, it will emit a
474    fragment whose attributes are taken from the final values of the fragment
475    program result variables (section 3.11.3.4).
476
477
478    Add to Section 3.11.4.5 of ARB_fragment_program (Fragment Program
479    Options):
480
481    Section 3.11.4.5.3, NV_fragment_program2 Option
482
483    If a fragment program specifies the "NV_fragment_program2" option, the
484    ARB_fragment_program grammar and execution environment are extended to
485    take advantage of all the features of the "NV_fragment_program" option,
486    plus the following features:
487
488      * structured branching support, including data-dependent IF tests, loops
489        supporting a fixed number of iterations, and a data-dependent loop
490        exit instruction (BRK),
491
492      * subroutine calls,
493
494      * several new instructions:
495
496        * NRM -- vector normalization
497        * DIV -- divide vector components by a scalar
498        * DP2 -- two-component dot product
499        * DP2A -- two-component dot product with scalar add
500        * TXL -- texture lookup with explicit LOD specified
501        * IF/ELSE/ENDIF -- conditional execution blocks
502        * REP/ENDREP -- loop block
503        * LOOP/ENDLOOP -- loop block using index register
504        * BRK -- break out of loop block
505        * CAL -- subroutine call
506        * RET -- subroutine return
507
508      * a loop index register inside LOOP/ENDLOOP blocks that can be used for
509        indirect access into the texture coordinate attribute array, and
510
511      * a facing attribute that indicates whether the fragment is generated
512        from a front- or back-facing primitive.
513
514
515    Modify Section 3.11.5,  Fragment Program ALU Instruction Set
516
517    Section 3.11.5.48, DIV:  Divide (Vector Components by Scalar)
518
519    The DIV instruction divides each component of the first vector operand by
520    the second scalar operand to produce a 4-component result vector.
521
522      tmp0 = VectorLoad(op0);
523      tmp1 = ScalarLoad(op1);
524      result.x = tmp0.x / tmp1;
525      result.y = tmp0.y / tmp1;
526      result.z = tmp0.z / tmp1;
527      result.w = tmp0.w / tmp1;
528
529    This instruction may not produce results identical to a RCP/MUL
530    instruction sequence.
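
    Illustration (non-normative):  a direct C transcription of the DIV
    pseudocode.  The Vec4 type and function name are assumptions made for this
    example; division by zero follows the host's floating-point rules here,
    which need not match hardware behavior.

```c
typedef struct { float x, y, z, w; } Vec4;

/* DIV: divide each component of the vector operand by the scalar operand. */
Vec4 op_div(Vec4 v, float s)
{
    Vec4 r;
    r.x = v.x / s;
    r.y = v.y / s;
    r.z = v.z / s;
    r.w = v.w / s;
    return r;
}
```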
531
532
533    Section 3.11.5.49, DP2:  2-Component Dot Product
534
535    The DP2 instruction computes a two-component dot product of the two
536    operands (using the first two components) and replicates the dot product
537    to all four components of the result vector.
538
539      tmp0 = VectorLoad(op0);
540      tmp1 = VectorLoad(op1);
541      dot = (tmp0.x * tmp1.x) + (tmp0.y * tmp1.y);
542      result.x = dot;
543      result.y = dot;
544      result.z = dot;
545      result.w = dot;
546
547    Section 3.11.5.50, DP2A:  2-Component Dot Product w/Scalar Add
548
549    The DP2A instruction computes a two-component dot product of the two
550    operands (using the first two components), adds the x component of the
551    third operand, and replicates the result to all four components of the
552    result vector.
553
554      tmp0 = VectorLoad(op0);
555      tmp1 = VectorLoad(op1);
556      tmp2 = VectorLoad(op2);
557      dot = (tmp0.x * tmp1.x) + (tmp0.y * tmp1.y) + tmp2.x;
558      result.x = dot;
559      result.y = dot;
560      result.z = dot;
561      result.w = dot;
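
    Illustration (non-normative):  DP2 and DP2A transcribed to C.  The Vec4
    type and function names are assumptions made for this example.

```c
typedef struct { float x, y, z, w; } Vec4;

/* Replicate a scalar across all four components ("ssss" output). */
static Vec4 splat(float s)
{
    Vec4 r = { s, s, s, s };
    return r;
}

/* DP2: dot product of the x and y components only. */
Vec4 op_dp2(Vec4 a, Vec4 b)
{
    return splat(a.x * b.x + a.y * b.y);
}

/* DP2A: same dot product, plus the x component of the third operand. */
Vec4 op_dp2a(Vec4 a, Vec4 b, Vec4 c)
{
    return splat(a.x * b.x + a.y * b.y + c.x);
}
```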
562
563
564    Section 3.11.5.51, NRM:  3-Component Vector Normalize
565
566    The NRM instruction normalizes the vector given by the x, y, and z
567    components of the vector operand to produce the x, y, and z components of
568    the result vector.  The w component of the result is undefined.
569
570      tmp = VectorLoad(op0);
571      scale = ApproxRSQ(tmp.x * tmp.x + tmp.y * tmp.y + tmp.z * tmp.z);
572      result.x = tmp.x * scale;
573      result.y = tmp.y * scale;
574      result.z = tmp.z * scale;
575      result.w = undefined;
576
577    Note that the normalization uses an approximate scale and may be carried
578    out at lower precision than a corresponding sequence of DP3, RSQ, and MUL
579    instructions.
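
    Illustration (non-normative):  a C sketch of NRM.  ApproxRSQ is not
    defined by this specification; the version below substitutes a well-known
    bit-level initial guess refined by Newton-Raphson steps, purely as an
    assumption for the example.  The w component is undefined in the
    specification and is arbitrarily set to zero here.  The input is assumed
    to have non-zero length.

```c
#include <stdint.h>
#include <string.h>

typedef struct { float x, y, z, w; } Vec4;

/* Stand-in for ApproxRSQ: approximate 1/sqrt(x) for x > 0. */
static float approx_rsq(float x)
{
    uint32_t bits;
    float y;
    memcpy(&bits, &x, sizeof bits);
    bits = 0x5f3759dfu - (bits >> 1);          /* initial guess */
    memcpy(&y, &bits, sizeof y);
    for (int i = 0; i < 3; i++) {
        y = y * (1.5f - 0.5f * x * y * y);     /* Newton-Raphson refinement */
    }
    return y;
}

/* NRM: scale x, y, z by the reciprocal length of the 3-component vector. */
Vec4 op_nrm(Vec4 v)
{
    float scale = approx_rsq(v.x * v.x + v.y * v.y + v.z * v.z);
    Vec4 r;
    r.x = v.x * scale;
    r.y = v.y * scale;
    r.z = v.z * scale;
    r.w = 0.0f;   /* undefined per the specification; zero chosen here */
    return r;
}
```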
580
581
582    Add Section 3.11.6.6, TXL: Texture Lookup with Explicit LOD
583
584    The TXL instruction takes the x, y, and z components of the vector operand
585    and maps them to s, t, and r, respectively.  These coordinates are used to
586    sample from the specified texture target on the specified texture image
587    unit in a manner consistent with its parameters.
588
589    The level of detail is computed as specified in section 3.8.8, except that
590    rho(x,y) is given by 2^w, where w is the w component of the vector
591    operand.
592
593    The resulting sample is mapped to RGBA as described in table 3.21
594    and written to the result vector.
595
596      tmp = VectorLoad(op0);
597      result = TextureSample(tmp.x, tmp.y, tmp.z, 0.0, op1, op2);
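
    Illustration (non-normative):  because rho is given by 2^w, the base level
    of detail log2(rho) is simply w.  The sketch below applies only the
    minimum/maximum LOD clamp of section 3.8.8; any bias terms are omitted,
    and the function and parameter names are hypothetical.

```c
/* Returns the level of detail used by TXL for a given w component.
 * min_lod and max_lod stand in for the texture's LOD clamp range. */
float txl_lod(float w, float min_lod, float max_lod)
{
    float lambda = w;   /* log2(rho) with rho = 2^w */
    if (lambda < min_lod) lambda = min_lod;
    if (lambda > max_lod) lambda = max_lod;
    return lambda;
}
```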
598
599
600    Add Section 3.11.X, Fragment Program Flow Control Instruction Set
601    (immediately after Section 3.11.6, Fragment Program Texture Instruction
602    Set)
603
604    3.11.X.1, BRK:  Break
605
606    The BRK instruction conditionally transfers control to the instruction
607    immediately following the next ENDLOOP or ENDREP instruction.  A BRK
608    instruction has no effect if the condition code test evaluates to FALSE.
609
610    The following pseudocode describes the operation of the instruction:
611
612      if (TestCC(cc.c***) || TestCC(cc.*c**) ||
613          TestCC(cc.**c*) || TestCC(cc.***c)) {
614        continue execution at instruction following the next ENDLOOP or
615          ENDREP;
616      }
617
618
619    3.11.X.2, CAL:  Subroutine Call
620
621    The CAL instruction conditionally transfers control to the instruction
622    following the label specified in the instruction.  A CAL instruction has
623    no effect if the condition code test evaluates to FALSE.
624
625    When executed, the CAL instruction pushes a reference to the instruction
626    immediately following the CAL instruction onto the call stack.  When the
627    matching RET instruction is executed, execution continues at that
628    instruction.
629
630    Implementations may have a limited call stack.  If the number of CAL
631    instructions that have been performed without returning is
632    MAX_PROGRAM_CALL_DEPTH_NV, a CAL instruction will cause the call stack to
633    overflow and the fragment program to terminate.
634
635    The following pseudocode describes the operation of the instruction:
636
637      if (TestCC(cc.c***) || TestCC(cc.*c**) ||
638          TestCC(cc.**c*) || TestCC(cc.***c)) {
639
640        // Check for call stack overflow.
641        if (callStackDepth >= MAX_PROGRAM_CALL_DEPTH_NV) {
642          terminate fragment program;
643        }
644
645        push instruction following the CAL instruction on the call stack;
646        continue execution at instruction following <branchLabel>;
647      }
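
    Illustration (non-normative):  the call stack behavior of CAL, together
    with the RET behavior described in Section 3.11.4.X, can be modeled in C
    as follows.  Instructions are identified by integer indices, a return
    value of -1 denotes program termination, and CALL_DEPTH_LIMIT is a
    hypothetical stand-in for MAX_PROGRAM_CALL_DEPTH_NV.  All names are
    assumptions made for this example.

```c
#include <stdbool.h>

#define CALL_DEPTH_LIMIT 4  /* hypothetical value for this sketch */

typedef struct {
    int return_pc[CALL_DEPTH_LIMIT];
    int depth;
} CallStack;

/* CAL at instruction <pc>, targeting the instruction following the label
 * (<target_pc>).  Returns the next instruction index, or -1 to terminate. */
int do_cal(CallStack *cs, int pc, int target_pc, bool cond)
{
    if (!cond) return pc + 1;                   /* call not taken */
    if (cs->depth >= CALL_DEPTH_LIMIT) return -1;  /* call stack overflow */
    cs->return_pc[cs->depth++] = pc + 1;        /* push return reference */
    return target_pc;
}

/* RET at instruction <pc>.  Returns the next instruction index, or -1 when
 * the call stack is empty and the program terminates. */
int do_ret(CallStack *cs, int pc, bool cond)
{
    if (!cond) return pc + 1;                   /* return not taken */
    if (cs->depth == 0) return -1;
    return cs->return_pc[--cs->depth];          /* pop return reference */
}
```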
648
649
650    3.11.X.3, ELSE:  Beginning of ELSE Block
651
652    The ELSE instruction signifies the end of the "execute if true" portion of
653    an IF/ELSE/ENDIF block.
654
655    If the condition evaluated at the IF statement was TRUE, when a program
656    reaches the ELSE statement, it has completed the entire "execute if true"
657    portion of the IF/ELSE/ENDIF block.  Execution will continue at the
658    corresponding ENDIF instruction.
659
660    If the condition evaluated at the IF statement was FALSE, program
661    execution would skip over the entire "execute if true" portion of the
662    IF/ELSE/ENDIF block, including the ELSE instruction.
663
664
665    3.11.X.4, ENDIF:  End of IF/ELSE Block
666
667    The ENDIF instruction signifies the end of an IF/ELSE/ENDIF block.  It has
668    no other effect on program execution.
669
670
671    3.11.X.5, ENDLOOP:  End of LOOP Block
672
673    The ENDLOOP instruction specifies the end of a LOOP block.  When an
674    ENDLOOP instruction executes, the loop count is decremented and the loop
675    index increment value is added to the loop index (A0.x).  If the
676    decremented loop count is greater than zero, execution continues at the
677    top of the LOOP block.
678
679      LoopCount--;
680      LoopIndex += LoopIncr;
681      if (LoopCount > 0) {
682        continue execution at instruction following corresponding LOOP
683          instruction;
684      }
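
    Illustration (non-normative):  the interaction of LOOP and ENDLOOP can be
    simulated in C to see which loop index values the body observes.  The
    function name and parameters are hypothetical, and the clamping of A0.x to
    non-negative values is not modeled.

```c
/* Simulates LOOP {count, index, incr} ... ENDLOOP.  Records the loop index
 * seen by each iteration in indices_out (up to max_out entries) and returns
 * the number of iterations executed. */
int simulate_loop(int count, int index, int incr,
                  int *indices_out, int max_out)
{
    int n = 0;
    if (count <= 0) {
        return 0;                 /* LOOP skips the block entirely */
    }
    do {
        if (n < max_out) {
            indices_out[n] = index;   /* value of A0.x inside the body */
        }
        n++;
        count--;                  /* ENDLOOP: LoopCount--          */
        index += incr;            /* ENDLOOP: LoopIndex += LoopIncr */
    } while (count > 0);
    return n;
}
```

    For the operand {5, 0, 1} this records the indices 0, 1, 2, 3, 4 over
    five iterations.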
685
686    3.11.X.6, ENDREP:  End of REP Block
687
688    The ENDREP instruction specifies the end of a REP block.  When an ENDREP
689    instruction executes, the loop count is decremented.  If the decremented
690    loop count is greater than zero, execution continues at the top of the REP
691    block.
692
693      LoopCount--;
694      if (LoopCount > 0) {
695        continue execution at instruction following corresponding REP
696          instruction;
697      }
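
    Illustration (non-normative):  a C model of a REP/ENDREP block whose body
    begins with a conditional BRK.  The callback type and function names are
    hypothetical.  Skipping the block when the initial count is not positive
    is an assumption that mirrors the LOOP instruction.

```c
#include <stddef.h>

/* Returns nonzero when the BRK condition is true for this iteration. */
typedef int (*BreakTest)(int iteration);

/* Example condition: break once three iterations have completed. */
static int stop_at_three(int iteration)
{
    return iteration >= 3;
}

/* Simulates REP count; BRK (cond); <body>; ENDREP; and returns how many
 * times the body after the BRK was executed. */
int simulate_rep(int count, BreakTest brk)
{
    int executed = 0;
    if (count <= 0) {
        return 0;                       /* assumed: block is skipped */
    }
    do {
        if (brk != NULL && brk(executed)) {
            break;                      /* BRK: continue after ENDREP */
        }
        executed++;
        count--;                        /* ENDREP: LoopCount-- */
    } while (count > 0);
    return executed;
}
```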
698
699
700    3.11.X.7, IF:  Beginning of IF Block
701
702    The IF instruction conditionally transfers control to the instruction
703    immediately following the corresponding ELSE instruction (if present) or
704    ENDIF instruction (if no ELSE is present).
705
706    Implementations may have a limited ability to nest IF blocks at run time.
707    If the number of IF/ENDIF blocks that are currently active is
708    MAX_PROGRAM_IF_DEPTH_NV, an IF instruction will cause the fragment program
709    to terminate.  If an IF instruction is executed inside a subroutine, any
710    active IF/ENDIF blocks in the calling code count against this limit.
711
712      if (IF block nested too deeply) {
713        terminate fragment program;
714      }
715
716      // Evaluate the condition.  If the condition is true, continue at the
717      // next instruction.  Otherwise, continue after the ELSE or ENDIF.
718      if (TestCC(cc.c***) || TestCC(cc.*c**) ||
719          TestCC(cc.**c*) || TestCC(cc.***c)) {
720        continue execution at the next instruction;
721      } else if (IF block contains an ELSE statement) {
722        continue execution at instruction following corresponding ELSE;
723      } else {
724        continue execution at instruction following corresponding ENDIF;
725      }
726
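    The IF pseudocode above can be modeled as a small Python function.  This
    is only a sketch: the names if_target and ProgramTerminated, and the
    string labels it returns, are hypothetical and not part of this
    extension.  The four booleans stand in for the four swizzled TestCC
    results.

```python
class ProgramTerminated(Exception):
    """Hypothetical marker: the modeled fragment program has terminated."""


def if_target(cc_tests, has_else, active_if_blocks, max_if_depth):
    """Return a label for where execution continues after an IF.

    cc_tests:         four booleans, one per condition code component test.
    has_else:         True if the IF block contains an ELSE instruction.
    active_if_blocks: number of IF/ENDIF blocks currently active,
                      including those in calling code.
    max_if_depth:     the value of MAX_PROGRAM_IF_DEPTH_NV.
    """
    # The depth check comes first, as in the pseudocode.
    if active_if_blocks >= max_if_depth:
        raise ProgramTerminated("IF block nested too deeply")

    # The condition is true if any of the four component tests passes.
    if any(cc_tests):
        return "next instruction"
    if has_else:
        return "after ELSE"
    return "after ENDIF"


print(if_target((False, False, False, False), True, 0, 48))   # after ELSE
print(if_target((False, True, False, False), True, 0, 48))    # next instruction
```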
727
728    3.11.X.8, LOOP:  Beginning of LOOP Block
729
730    The LOOP instruction begins a LOOP block.  The x, y, and z components of
731    the single vector operand specify the initial values for the loop count,
732    loop index, and loop index increment, respectively.
733
734    The loop count indicates the number of times the instructions between the
735    LOOP and corresponding ENDLOOP instruction will be executed.  If the
736    initial value of the loop count is not positive, the entire block is
737    skipped and execution continues at the corresponding ENDLOOP instruction.
738
739    The loop index (A0.x) can be used for indirect addressing in the set of
740    texture coordinate fragment attributes.  A fragment program can only use
741    the loop index of the current LOOP block; loop indices for containing LOOP
742    blocks are not available.
743
744    Implementations may have a limited ability to nest LOOP and REP blocks at
745    run time.  If the number of LOOP/ENDLOOP and REP/ENDREP blocks that have
746    not completed is MAX_PROGRAM_LOOP_DEPTH_NV, a LOOP instruction will cause
747    the fragment program to terminate.  If a LOOP instruction is executed
748    inside a subroutine, any active LOOP/ENDLOOP or REP/ENDREP blocks in the
749    calling code count against this limit.
750
751      if (LOOP block nested too deeply) {
752        terminate fragment program;
753      }
754
755      // Set up loop information for the new nesting level.
756      tmp = VectorLoad(op0);
757      LoopCount = floor(tmp.x);
758      LoopIndex = floor(tmp.y);
759      LoopIncr  = floor(tmp.z);
760      if (LoopCount <= 0) {
761        continue execution at the corresponding ENDLOOP;
762      }
763
764    LOOP blocks do not support fully general branching -- a fragment program
765    will fail to load if the vector operand is not a program parameter.
766
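    As an illustration, the LOOP and ENDLOOP pseudocode can be combined into
    one Python function that runs a single, non-nested block.  This is a
    sketch under simplifying assumptions: run_loop and its arguments are
    hypothetical names, nesting limits are ignored, and the loop body is an
    ordinary callable rather than a sequence of instructions.

```python
import math


def run_loop(operand, body):
    """Model one LOOP ... ENDLOOP block.

    operand: (x, y, z) giving the initial loop count, loop index, and
             loop index increment.
    body:    callable invoked once per iteration with the loop index (A0.x).
    Returns the list of loop index values the body was called with.
    """
    # LOOP: each component is floored before use.
    loop_count = math.floor(operand[0])
    loop_index = math.floor(operand[1])
    loop_incr = math.floor(operand[2])

    seen = []
    if loop_count <= 0:
        # Non-positive initial count: the whole block is skipped.
        return seen

    while True:
        body(loop_index)
        seen.append(loop_index)
        # ENDLOOP: decrement the count, step the index, then test the count.
        loop_count -= 1
        loop_index += loop_incr
        if loop_count <= 0:
            break
    return seen


print(run_loop((3.0, 0.0, 2.0), lambda index: None))  # [0, 2, 4]
```

    With an operand of (3, 0, 2) the body runs three times and sees index
    values 0, 2, and 4.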
767
768    3.11.X.9, REP:  Beginning of REP Block
769
770    The REP instruction begins a REP block.  The x component of the single
771    vector operand specifies the initial value for the loop count.  REP blocks
772    behave identically to LOOP blocks, except that they do not use the loop
773    index.
774
775    The loop count indicates the number of times the instructions between the
776    REP and corresponding ENDREP instruction will be executed.  If the initial
777    value of the loop count is not positive, the entire block is skipped and
778    execution continues at the instruction following the corresponding ENDREP
779    instruction.
780
781    Implementations may have a limited ability to nest LOOP and REP blocks at
782    run time.  If the number of LOOP/ENDLOOP and REP/ENDREP blocks that have
783    not completed is MAX_PROGRAM_LOOP_DEPTH_NV, a REP instruction will cause
784    the fragment program to terminate.  If a REP instruction is executed
785    inside a subroutine, any active LOOP/ENDLOOP or REP/ENDREP blocks in the
786    calling code count against this limit.
787
788      if (REP block nested too deeply) {
789        terminate fragment program;
790      }
791
792      // Set up loop information for the new nesting level.
793      tmp = VectorLoad(op0);
794      LoopCount = floor(tmp.x);
795      if (LoopCount <= 0) {
796        continue execution at the corresponding ENDREP;
797      }
798
799    REP blocks do not support fully general branching -- a fragment program
800    will fail to load if the vector operand is not a program parameter.
801
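    LOOP and REP blocks share one nesting limit.  The sketch below models
    that shared counter in Python.  The class LoopNesting, its methods, and
    the ProgramTerminated exception are hypothetical names used only for
    illustration, and the limit of 4 is the minimum value listed for
    MAX_PROGRAM_LOOP_DEPTH_NV.

```python
class ProgramTerminated(Exception):
    """Hypothetical marker: the modeled fragment program has terminated."""


class LoopNesting:
    """Track LOOP and REP blocks that have not yet completed."""

    def __init__(self, max_depth):
        self.max_depth = max_depth  # value of MAX_PROGRAM_LOOP_DEPTH_NV
        self.active = []            # block kinds, innermost last

    def enter(self, kind):
        """Model a LOOP or REP instruction; kind is "LOOP" or "REP"."""
        # Both kinds count against the same limit, including blocks that
        # are still active in calling code.
        if len(self.active) >= self.max_depth:
            raise ProgramTerminated(kind + " block nested too deeply")
        self.active.append(kind)

    def leave(self, kind):
        """Model the final ENDLOOP or ENDREP that completes a block."""
        if not self.active or self.active[-1] != kind:
            raise ValueError("mismatched end of " + kind + " block")
        self.active.pop()


nesting = LoopNesting(max_depth=4)
for kind in ("REP", "LOOP", "REP", "LOOP"):
    nesting.enter(kind)
print(len(nesting.active))  # 4

try:
    nesting.enter("REP")
except ProgramTerminated as err:
    print(err)  # REP block nested too deeply
```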
802
803    3.11.X.10, RET:  Subroutine Return
804
805    The RET instruction conditionally returns from a subroutine initiated by a
806    CAL instruction.  A RET instruction has no effect if the condition code
807    test evaluates to FALSE.
808
809    When executed, the RET instruction pops a reference to the instruction
810    immediately following the corresponding CAL instruction off the call
811    stack and continues execution at that instruction.
812
813    If a RET instruction is issued when the call stack is empty, the fragment
814    program is terminated.
815
816      if (TestCC(cc.c***) || TestCC(cc.*c**) ||
817          TestCC(cc.**c*) || TestCC(cc.***c)) {
818
819        if (callStackDepth <= 0) {
820          terminate fragment program;
821        }
822
823        pop instruction following the CAL instruction off the call stack;
824        continue execution at that instruction;
825      }
826
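    The RET pseudocode can be modeled in Python with an explicit call stack
    of instruction indices.  This is a sketch: cal, ret, and
    ProgramTerminated are hypothetical names, and cal is simplified to an
    unconditional push with no depth check, which is an assumption made only
    so that the stack has something to pop.

```python
class ProgramTerminated(Exception):
    """Hypothetical marker: the modeled fragment program has terminated."""


def cal(call_stack, pc, target):
    """Simplified CAL (assumption): push the return address, jump to target."""
    call_stack.append(pc + 1)
    return target


def ret(cc_tests, call_stack, pc):
    """Return the index of the instruction executed after a RET at pc.

    cc_tests: four booleans, one per condition code component test.
    """
    if not any(cc_tests):
        # Condition FALSE: RET has no effect, execution falls through.
        return pc + 1
    if not call_stack:
        raise ProgramTerminated("RET with an empty call stack")
    # Pop the instruction following the CAL and continue there.
    return call_stack.pop()


stack = []
pc = cal(stack, pc=10, target=40)   # CAL at instruction 10 jumps to 40
print(pc, stack)                     # 40 [11]
pc = ret((True, False, False, False), stack, pc=42)
print(pc, stack)                     # 11 []
```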
827
828Additions to Chapter 4 of the OpenGL 1.4 Specification (Per-Fragment
829Operations and the Frame Buffer)
830
831    None.
832
833Additions to Chapter 5 of the OpenGL 1.4 Specification (Special Functions)
834
835    None.
836
837Additions to Chapter 6 of the OpenGL 1.4 Specification (State and
838State Requests)
839
840    None.
841
842Additions to Appendix A of the OpenGL 1.4 Specification (Invariance)
843
844    None.
845
846Additions to the AGL/GLX/WGL Specifications
847
848    None.
849
850Dependencies on ARB_fragment_program
851
852    ARB_fragment_program is required.
853
854    This specification and NV_fragment_program_option are based on a modified
855    version of the grammar published in the ARB_fragment_program
856    specification.  This modified grammar includes a few structural changes to
857    better accommodate new functionality from this and other extensions, but
858    should be functionally equivalent to the ARB_fragment_program grammar.
859    See NV_fragment_program_option for details on the base grammar.
860
861Dependencies on NV_fragment_program_option
862
863    NV_fragment_program_option is required.
864
865    If the NV_fragment_program2 program option is specified, all the
866    functionality described in both this extension and the
867    NV_fragment_program_option specification is available.
868
869GLX Protocol
870
871    None.
872
873Errors
874
875    None.
876
877New State
878
879    None.
880
881New Implementation Dependent State
882                                                                  Min
883    Get Value                            Type    Get Command      Value   Description         Sec       Attrib
884    -----------------------------------  ----    ---------------  ------  -----------------   --------  ------
885    MAX_PROGRAM_EXEC_INSTRUCTIONS_NV     Z+      GetProgramivARB  65536   maximum program     3.11.4.X  -
886                                                                          execution
887                                                                          instruction count
888    MAX_PROGRAM_CALL_DEPTH_NV            Z+      GetProgramivARB  4       maximum program     3.11.4.X  -
889                                                                          call stack depth
890    MAX_PROGRAM_IF_DEPTH_NV              Z+      GetProgramivARB  48      maximum program     3.11.4.X  -
891                                                                          if nesting
892    MAX_PROGRAM_LOOP_DEPTH_NV            Z+      GetProgramivARB  4       maximum program     3.11.4.X  -
893                                                                          loop nesting
894    MAX_PROGRAM_LOOP_COUNT_NV            Z+      GetProgramivARB  255     maximum program     3.11.4.X  -
895                                                                          initial loop count
896
897    (add to Table X.10.  New Implementation-Dependent Values Introduced by
898     ARB_fragment_program.  Values queried by GetProgramivARB require a <target>
899     of FRAGMENT_PROGRAM_ARB.)
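
    A program that stays within the minimum values in this table stays within
    the limits every implementation must report.  The Python sketch below
    checks a program's requirements against those minimums.  The names
    MIN_LIMITS and exceeded_limits, and the sample requirements, are
    hypothetical.  A real application would query the actual limits with
    GetProgramivARB rather than rely on this table.

```python
# Minimum values copied from the table above; implementations may report more.
MIN_LIMITS = {
    "MAX_PROGRAM_EXEC_INSTRUCTIONS_NV": 65536,
    "MAX_PROGRAM_CALL_DEPTH_NV": 4,
    "MAX_PROGRAM_IF_DEPTH_NV": 48,
    "MAX_PROGRAM_LOOP_DEPTH_NV": 4,
    "MAX_PROGRAM_LOOP_COUNT_NV": 255,
}


def exceeded_limits(needs, limits=MIN_LIMITS):
    """Return, sorted, the limit names that the requirements exceed.

    needs:  dict mapping a limit name to the value a program requires.
    limits: dict mapping a limit name to the available value.
    """
    return sorted(name for name, value in needs.items() if value > limits[name])


# Hypothetical program: five nested loops and ten nested IF blocks.
needs = {"MAX_PROGRAM_LOOP_DEPTH_NV": 5, "MAX_PROGRAM_IF_DEPTH_NV": 10}
print(exceeded_limits(needs))  # ['MAX_PROGRAM_LOOP_DEPTH_NV']
```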
900
901Revision History
902
903    Rev.  Date      Author   Changes
904    ----  --------  -------  --------------------------------------------
905    8     08/04/04  pbrown   Fixed two typos in the TXL instruction.
906
907    7     07/08/04  pbrown   Fixed entries for KIL and RFL in the opcode
908                             table.
909
910    6     05/16/04  pbrown   Documented that "A0" is a pre-defined address
911                             register variable for the purposes of the
912                             grammar, and that no other address register
913                             variables can be declared.
914
915    5     --------  pbrown   Internal pre-release revisions.
916