Name

    NV_vertex_program3

Name Strings

    GL_NV_vertex_program3

Contact

    Pat Brown, NVIDIA Corporation (pbrown 'at' nvidia.com)

Status

    Shipping.

Version

    Last Modified Date:         10/12/2009
    NVIDIA Revision:            7

Number

    306

Dependencies

    ARB_vertex_program is required.
    NV_vertex_program2_option is required.
    This extension interacts with ARB_fragment_program_shadow.

Overview

    This extension, like the NV_vertex_program2_option extension,
    provides additional vertex program functionality to extend the
    standard ARB_vertex_program language and execution environment.
    ARB programs wishing to use this added functionality need only add:

        OPTION NV_vertex_program3;

    to the beginning of their vertex programs.

    New functionality provided by this extension, above and beyond that
    already provided by the NV_vertex_program2_option extension, includes:

        * texture lookups in vertex programs,

        * ability to push and pop address registers on the stack,

        * address register-relative addressing for vertex attribute and
          result arrays, and

        * a second four-component condition code.
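
    As an illustration only, the following C fragment holds a small
    vertex program that enables the option and performs one vertex
    texture lookup.  The program text, the array name displace_vp, and
    the helper option_follows_header are hypothetical and not part of
    this extension; the TEX operand order follows the grammar added by
    this extension, and the remaining statements are assumed to be plain
    ARB_vertex_program syntax.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical example program: samples a height from texture image
 * unit 0 and displaces the vertex along its normal before applying
 * the modelview-projection transform. */
const char displace_vp[] =
    "!!ARBvp1.0\n"
    "OPTION NV_vertex_program3;\n"
    "TEMP height, pos;\n"
    "TEX height, vertex.texcoord[0], texture[0], 2D;\n"
    "MOV pos, vertex.position;\n"
    "MAD pos.xyz, vertex.normal, height.x, pos;\n"
    "DP4 result.position.x, state.matrix.mvp.row[0], pos;\n"
    "DP4 result.position.y, state.matrix.mvp.row[1], pos;\n"
    "DP4 result.position.z, state.matrix.mvp.row[2], pos;\n"
    "DP4 result.position.w, state.matrix.mvp.row[3], pos;\n"
    "END\n";

/* Returns nonzero if the OPTION line directly follows the header line. */
int option_follows_header(const char *text)
{
    static const char header[] = "!!ARBvp1.0\n";
    static const char option[] = "OPTION NV_vertex_program3;";

    assert(text != NULL);
    if (strncmp(text, header, strlen(header)) != 0)
        return 0;
    return strncmp(text + strlen(header), option, strlen(option)) == 0;
}
```

    An application would typically hand such a string to
    ProgramStringARB (from ARB_vertex_program) with the target
    VERTEX_PROGRAM_ARB; that call needs a GL context and is omitted here.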

Issues

    Should we provide a separate "!!VP3.0" program type, like the
    "!!VP2.0" type defined in NV_vertex_program2?

      RESOLVED:  No.  Since ARB_vertex_program has been fully defined
      (it wasn't in the !!VP2.0 time-frame), we will simply define
      language extensions to !!ARBvp1.0 that expose new functionality.
      The NV_vertex_program2_option specification followed this same
      pattern for the NV3X family (GeForce FX, Quadro FX).

    Should this be called "NV_vertex_program3_option"?

      RESOLVED:  No.  The similar extension to !!ARBvp1.0 called
      "NV_vertex_program2_option" got that name only because the simpler
      "NV_vertex_program2" name had already been used.

    Is there a limit on the number of texture units that can be accessed
    by a vertex program?

      RESOLVED:  Yes.  The limit may be lower than the total number of texture
      image units available and is given by the implementation-dependent
      constant MAX_VERTEX_TEXTURE_IMAGE_UNITS_ARB.  Any program that attempts
      to use more unique texture image units will fail to load.  Programs can
      use any texture image unit number, as long as they don't use too many
      simultaneously.  As an example, the GeForce 6 series of GPUs provides 16
      texture image units accessible to vertex programs, but no more than four
      can be used simultaneously.  It is not an error to use texture image
      units 12-15 in a program.

      This limitation is identical to the one in the ARB_vertex_shader
      extension -- both extensions use the same enum to query the number of
      available image units.  Violating this limit in GLSL results in a link
      error.

    Is there a restriction on the texture targets that can be accessed by a
    vertex program?

      RESOLVED:  Yes -- for any texture image unit, vertex and fragment
      processing cannot use different targets.  If they do, an
      INVALID_OPERATION is generated at Begin-time.  This resolution is
      consistent with the resolution of the same issue in the
      ARB_vertex_shader extension and OpenGL 2.0.

    Since vertices don't have screen space partial derivatives, how is
    the LOD used for texture accesses defined?

      RESOLVED:  The TXL instruction allows a program to explicitly
      set an LOD; the LOD for all other texture instructions is zero.
      The texture LOD biases specified in the texture object and texture
      environment do apply to all vertex texture lookups.


New Procedures and Functions

    None.

New Tokens

    Accepted by the <pname> parameter of GetBooleanv, GetIntegerv,
    GetFloatv, and GetDoublev:

        MAX_VERTEX_TEXTURE_IMAGE_UNITS_ARB              0x8B4C

Additions to Chapter 2 of the OpenGL 1.4 Specification (OpenGL Operation)

    Modify Section 2.14.2, Vertex Program Grammar and Restrictions

    (mostly add to existing grammar rules, as extended by
    NV_vertex_program2_option)

    <optionName>            ::= "NV_vertex_program3"

    <instruction>           ::= <TexInstruction>

    <ALUInstruction>        ::= <ASTACKop_instruction>

    <TexInstruction>        ::= <TEXop_instruction>

    <ASTACKop_instruction>  ::= <PUSHAop> <instOperandAddrVNS>
                              | <POPAop> <instResultAddr>

    <PUSHAop>               ::= "PUSHA"

    <POPAop>                ::= "POPA"

    <TEXop_instruction>     ::= <TEXop> <instResult> "," <instOperandV> ","
                                <texTarget>

    <TEXop>                 ::= "TEX"
                              | "TXP"
                              | "TXB"
                              | "TXL"

    <texTarget>             ::= <texImageUnit> "," <texTargetType>

    <texImageUnit>          ::= "texture" <optTexImageUnitNum>

    <optTexImageUnitNum>    ::= /* empty */
                              | "[" <texImageUnitNum> "]"

    <texImageUnitNum>       ::= <integer>
                                /*[0,MAX_TEXTURE_IMAGE_UNITS_ARB-1]*/

    <texTargetType>         ::= "1D"
                              | "2D"
                              | "3D"
                              | "CUBE"
                              | "RECT"

    <attribVtxBasic>        ::= "texcoord" "[" <arrayMemRel> "]"
                              | "attrib" "[" <arrayMemRel> "]"

    <resultVtxBasic>        ::= "texcoord" "[" <arrayMemRel> "]"

    <ccMaskRule>            ::= "EQ0"
                              | "GE0"
                              | "GT0"
                              | "LE0"
                              | "LT0"
                              | "NE0"
                              | "TR0"
                              | "FL0"
                              | "EQ1"
                              | "GE1"
                              | "GT1"
                              | "LE1"
                              | "LT1"
                              | "NE1"
                              | "TR1"
                              | "FL1"

    (modify description of reserved identifiers)

    ... The following strings are reserved keywords and may not be used
    as identifiers:

        ABS, ADD, ADDRESS, ALIAS, ARA, ARL, ARR, ATTRIB, BRA, CAL, COS,
        DP3, DP4, DPH, DST, END, EX2, EXP, FLR, FRC, LG2, LIT, LOG, MAD,
        MAX, MIN, MOV, MUL, OPTION, OUTPUT, PARAM, POPA, POW, PUSHA, RCC,
        RCP, RET, RSQ, SEQ, SFL, SGE, SGT, SIN, SLE, SLT, SNE, SUB, SSG,
        STR, SWZ, TEMP, TEX, TXB, TXL, TXP, XPD, program, result, state,
        and vertex.

    Modify Section 2.14.3.1, Vertex Attributes

    (add new bindings to binding table)

      Vertex Attribute Binding  Components  Underlying State
      ------------------------  ----------  --------------------------------
      ...
      vertex.texcoord[A+n]      (s,t,r,q)   indexed texture coordinate
      vertex.attrib[A+n]        (x,y,z,w)   indexed generic vertex attribute

    If a vertex attribute binding matches "vertex.texcoord[A+n]", where
    "A" is a component of an address register (Section 2.14.3.5), a
    texture coordinate number <c> is computed by adding the current
    value of the address register component and <n>.  The "x", "y",
    "z", and "w" components of the vertex attribute variable are
    filled with the "s", "t", "r", and "q" components, respectively,
    of the vertex texture coordinates for texture unit <c>.  If <c>
    is negative or greater than or equal to MAX_TEXTURE_COORDS_ARB,
    the vertex attribute variable is undefined.

    If a vertex attribute binding matches "vertex.attrib[A+n]", where
    "A" is a component of an address register (Section 2.14.3.5), a
    vertex attribute number <a> is computed by adding the current value
    of the address register component and <n>.  The "x", "y", "z", and
    "w" components of the vertex attribute variable are filled with the
    "x", "y", "z", and "w" components, respectively, of generic vertex
    attribute <a>.  If <a> is negative or greater than or equal to
    MAX_VERTEX_ATTRIBS_ARB, the vertex attribute variable is undefined.
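
    The index computation for relative addressing can be sketched in C
    as follows.  This is a minimal model, not driver code: the function
    name resolve_relative_attrib and the constant
    EXAMPLE_MAX_VERTEX_ATTRIBS are hypothetical, and the limit of 16 is
    an assumed stand-in for the queried MAX_VERTEX_ATTRIBS_ARB value.
    The same check would apply to "vertex.texcoord[A+n]" with
    MAX_TEXTURE_COORDS_ARB as the limit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumed example limit; a real application would query
 * MAX_VERTEX_ATTRIBS_ARB from the GL instead. */
enum { EXAMPLE_MAX_VERTEX_ATTRIBS = 16 };

/* Computes the attribute number a = A + n for "vertex.attrib[A+n]".
 * Returns true and stores the number in *index when it is in range.
 * Returns false for the out-of-range case, where the attribute
 * variable is undefined. */
bool resolve_relative_attrib(int addr_component, int offset, int limit,
                             int *index)
{
    assert(index != NULL);

    int a = addr_component + offset;
    if (a < 0 || a >= limit)
        return false;

    *index = a;
    return true;
}
```

    Reporting the out-of-range case through the return value keeps the
    sketch from inventing a value for a result the text leaves undefined.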

    Modify Section 2.14.3.4, Vertex Program Results

    (add new binding to binding table)

      Binding                        Components  Description
      -----------------------------  ----------  ----------------------------
      ...
      result.texcoord[A+n]           (s,t,r,q)   indexed texture coordinate

    If a result variable binding matches "result.texcoord[A+n]", where "A"
    is a component of an address register (Section 2.14.3.5), a texture
    coordinate number <c> is computed by adding the current value of
    the address register component and <n>.  Updates to the "x", "y",
    "z", and "w" components of the result variable set the "s", "t",
    "r" and "q" components, respectively, of the transformed vertex's
    texture coordinates for texture unit <c>.  If <c> is negative or
    greater than or equal to MAX_TEXTURE_COORDS_ARB, the effects of
    updates to the result variable are undefined and may overwrite
    other program results.

    Modify Section 2.14.3.X, Condition Code Registers (added in
    NV_vertex_program2_option)

    The vertex program condition code registers are two four-component
    vectors, called CC0 and CC1.  Each component of these registers is one
    of four enumerated values: GT (greater than), EQ (equal), LT (less
    than), or UN (unordered).  The condition code registers can be used
    to mask writes to registers and to evaluate conditional branches.

    Most vertex program instructions can optionally update one of the
    two condition code registers.  When a vertex program instruction
    updates a condition code register, a condition code component is set
    to LT if the corresponding component of the result is less than zero,
    EQ if it is equal to zero, GT if it is greater than zero, and UN if
    it is NaN (not a number).

    The condition code registers are initialized to vectors of EQ values
    each time a vertex program executes.

    Modify Section 2.14.3.7, Vertex Program Resource Limits

    (add new paragraph to end of section) In addition to the previous limits,
    the number of unique texture image units that can be accessed
    simultaneously by a vertex program is limited.  The limit is given by the
    implementation-dependent constant MAX_VERTEX_TEXTURE_IMAGE_UNITS_ARB, and
    may be lower than the total number of texture image units provided.  If
    the number of texture image units referenced by a vertex program exceeds
    this limit, the program will fail to load.
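
    The limit applies to the count of distinct units, not to the unit
    numbers themselves.  The C sketch below models only that count.  The
    names count_unique_units, program_would_load, and
    EXAMPLE_MAX_UNIT_NUMBER are hypothetical, and the limit is passed in
    rather than queried from the GL.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumed bound on unit numbers for this sketch, so that a 32-bit
 * mask can record which units have been seen. */
enum { EXAMPLE_MAX_UNIT_NUMBER = 32 };

/* Counts the distinct texture image unit numbers in units[0..n-1]. */
int count_unique_units(const int *units, size_t n)
{
    unsigned long seen = 0;   /* bit i set => unit i already counted */
    int count = 0;

    assert(units != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        assert(units[i] >= 0 && units[i] < EXAMPLE_MAX_UNIT_NUMBER);
        unsigned long bit = 1UL << units[i];
        if (!(seen & bit)) {
            seen |= bit;
            count++;
        }
    }
    return count;
}

/* True if the number of distinct units is within the limit that
 * MAX_VERTEX_TEXTURE_IMAGE_UNITS_ARB would report. */
bool program_would_load(const int *units, size_t n, int max_vertex_units)
{
    return count_unique_units(units, n) <= max_vertex_units;
}
```

    With a limit of four, a program referencing units 12, 13, 14, and 15
    stays within the limit, while one referencing units 0 through 4 does
    not.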

    Modify Section 2.14.4, Vertex Program Execution Environment

    (modify Begin-time error language for vertex program execution to cover
    invalid texture uses)

    If vertex program mode is enabled and the currently bound program object
    does not contain a valid vertex program, the error INVALID_OPERATION will
    be generated by Begin, RasterPos, and any command that implicitly calls
    Begin (e.g., DrawArrays).

    If vertex program mode is enabled and the currently bound program object
    accesses a texture image unit, the texture target used must be consistent
    with the target (if any) used for fragment processing.  If vertex and
    fragment processing require the use of different texture targets on the
    same texture image unit, the error INVALID_OPERATION will be generated by
    Begin, RasterPos, and any command that implicitly calls Begin.

    (modify instruction table) There are forty-eight vertex program
    instructions.  Vertex program instructions may have up to eight
    variants, including a suffix of "C" or "C0" to allow an update of
    condition code register zero (section 2.14.3.X), a suffix of "C1"
    to allow an update of condition code register one, and a suffix of
    "_SAT" to clamp the result vector components to the range [0,1].
    For example, the eight forms of the "ADD" instruction are "ADD",
    "ADDC", "ADDC0", "ADDC1", "ADD_SAT", "ADDC_SAT", "ADDC0_SAT", and
    "ADDC1_SAT".  The instructions and their respective input and output
    parameters are summarized in Table X.5.

                  Modifiers
      Instruction   C S   Inputs  Output   Description
      -----------   - -   ------  ------   --------------------------------
      ABS           X X   v       v        absolute value
      ADD           X X   v,v     v        add
      ARA           X -   a       a        address register add
      ARL           X -   s       a        address register load
      ARR           X -   v       a        address register load (round)
      BRA           - -   c       -        branch
      CAL           - -   c       -        subroutine call
      COS           X X   s       ssss     cosine
      DP3           X X   v,v     ssss     3-component dot product
      DP4           X X   v,v     ssss     4-component dot product
      DPH           X X   v,v     ssss     homogeneous dot product
      DST           X X   v,v     v        distance vector
      EX2           X X   s       ssss     exponential base 2
      EXP           X X   s       v        exponential base 2 (approximate)
      FLR           X X   v       v        floor
      FRC           X X   v       v        fraction
      LG2           X X   s       ssss     logarithm base 2
      LIT           X X   v       v        compute light coefficients
      LOG           X X   s       v        logarithm base 2 (approximate)
      MAD           X X   v,v,v   v        multiply and add
      MAX           X X   v,v     v        maximum
      MIN           X X   v,v     v        minimum
      MOV           X X   v       v        move
      MUL           X X   v,v     v        multiply
      POPA          - -   -       a        pop address register
      POW           X X   s,s     ssss     exponentiate
      PUSHA         - -   a       -        push address register
      RCC           X X   s       ssss     reciprocal (clamped)
      RCP           X X   s       ssss     reciprocal
      RET           - -   c       -        subroutine return
      RSQ           X X   s       ssss     reciprocal square root
      SEQ           X X   v,v     v        set on equal
      SFL           X X   v,v     v        set on false
      SGE           X X   v,v     v        set on greater than or equal
      SGT           X X   v,v     v        set on greater than
      SIN           X X   s       ssss     sine
      SLE           X X   v,v     v        set on less than or equal
      SLT           X X   v,v     v        set on less than
      SNE           X X   v,v     v        set on not equal
      SSG           X X   v       v        set sign
      STR           X X   v,v     v        set on true
      SUB           X X   v,v     v        subtract
      SWZ           X X   v       v        extended swizzle
      TEX           X X   v       v        texture lookup
      TXB           X X   v       v        texture lookup with LOD bias
      TXL           X X   v       v        texture lookup with explicit LOD
      TXP           X X   v       v        projective texture lookup
      XPD           X X   v,v     v        cross product

      Table X.5:  Summary of vertex program instructions.  The columns
      "C" and "S" indicate whether the "C", "C0", and "C1" condition code
      update modifiers, and the "_SAT" saturation modifiers, respectively,
      are supported for the opcode.  "v" indicates a floating-point vector
      input or output, "s" indicates a floating-point scalar input,
      "ssss" indicates a scalar output replicated across a 4-component
      result vector, "a" indicates a vector address register, and "c"
      indicates a condition code test.

    Rewrite Section 2.14.4.3,  Vertex Program Destination Register Update

    A vertex program instruction can optionally clamp the results of
    a floating-point result vector to the range [0,1].  The components
    of the result vector are clamped to [0,1] if the saturation suffix
    "_SAT" is present in the instruction.

    Most vertex program instructions write a 4-component result vector to
    a single temporary or vertex result register.  Writes to individual
    components of the destination register are controlled by individual
    component write masks specified as part of the instruction.

    The component write mask is specified by the <optionalMask> rule
    found in the <maskedDstReg> rule.  If the optional mask is "",
    all components are enabled.  Otherwise, the optional mask names
    the individual components to enable.  The characters "x", "y",
    "z", and "w" match the x, y, z, and w components respectively.
    For example, an optional mask of ".xzw" indicates that the x, z,
    and w components should be enabled for writing but the y component
    should not.  The grammar requires that the destination register mask
    components must be listed in "xyzw" order.

    The condition code write mask is specified by the <ccMask> rule found
    in the <instResultCC> and <instResultAddrCC> rules.  The selected
    condition code register is loaded and swizzled according to the
    swizzle codes specified by <swizzleSuffix>.  Each component of the
    swizzled condition code is tested according to the rule given by
    <ccMaskRule>.  <ccMaskRule> may have the values "EQ", "NE", "LT",
    "GE", "LE", or "GT", which mean to enable writes if the corresponding
    condition code field evaluates to equal, not equal, less than, greater
    than or equal, less than or equal, or greater than, respectively.
    Comparisons involving condition codes of "UN" (unordered) evaluate to
    true for "NE" and false otherwise.  For example, if the condition code
    is (GT,LT,EQ,GT) and the condition code mask is "(NE.zyxw)", the
    swizzle operation will load (EQ,LT,GT,GT) and the mask will thus
    enable writes on the y, z, and w components.  In addition, "TR" always
    enables writes and "FL" always disables writes, regardless of the
    condition code.  If the condition code mask is empty, it is treated
    as "(TR)".
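
    The "(NE.zyxw)" example above can be reproduced with a short C
    sketch.  The type and function names (CondCode, CCRule, test_cc,
    cc_write_enables) are hypothetical, and the swizzle is modeled as an
    array of source component indices (0 = x through 3 = w), which is an
    assumption of this sketch rather than anything the grammar defines.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum { CC_EQ, CC_GT, CC_LT, CC_UN } CondCode;
typedef enum { RULE_EQ, RULE_GE, RULE_GT, RULE_LE,
               RULE_LT, RULE_NE, RULE_TR, RULE_FL } CCRule;

/* Tests one condition code field against a mask rule.  A field of
 * CC_UN passes only RULE_NE and RULE_TR. */
bool test_cc(CCRule rule, CondCode field)
{
    switch (rule) {
    case RULE_EQ: return field == CC_EQ;
    case RULE_NE: return field != CC_EQ;
    case RULE_LT: return field == CC_LT;
    case RULE_GE: return field == CC_GT || field == CC_EQ;
    case RULE_LE: return field == CC_LT || field == CC_EQ;
    case RULE_GT: return field == CC_GT;
    case RULE_TR: return true;
    case RULE_FL: return false;
    }
    return false;
}

/* Swizzles the condition code, then tests each swizzled component.
 * swizzle[i] names the source component that feeds component i. */
void cc_write_enables(const CondCode cc[4], const int swizzle[4],
                      CCRule rule, bool enable[4])
{
    for (int i = 0; i < 4; i++) {
        assert(swizzle[i] >= 0 && swizzle[i] < 4);
        enable[i] = test_cc(rule, cc[swizzle[i]]);
    }
}
```

    For the condition code (GT,LT,EQ,GT) with swizzle {2,1,0,3} and
    RULE_NE, the enables come out as (false,true,true,true), matching
    the y, z, and w result described above.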

    Each component of the destination register is updated with the result
    of the vertex program instruction if and only if the component is
    enabled for writes by both the component write mask and the condition
    code write mask.  Otherwise, the component of the destination register
    remains unchanged.

    A vertex program instruction can also optionally update one of the
    condition code registers.  A condition code register is updated if a
    condition code update suffix ("C", "C0", or "C1") is present in the
    instruction.  The instruction "ADDC" will update condition code
    register zero; the otherwise equivalent instruction "ADD" will not
    update either register.  If condition code updates
    are enabled, each component of the destination register enabled
    for writes is compared to zero.  The corresponding component of
    the condition code is set to "LT", "EQ", or "GT", if the written
    component is less than, equal to, or greater than zero, respectively.
    Condition code components are set to "UN" if the written component is
    NaN (not a number).  Values of -0.0 and +0.0 both evaluate to "EQ".
    If a component of the destination register is not enabled for writes,
    the corresponding condition code component is also unchanged.

    In the following example code,

        # R1=(-2, 0, 2, NaN)              R0                  CC
        MOVC R0, R1;               # ( -2,  0,   2, NaN) (LT,EQ,GT,UN)
        MOVC R0.xyz, R1.yzwx;      # (  0,  2, NaN, NaN) (EQ,GT,UN,UN)
        MOVC R0 (NE), R1.zywx;     # (  0,  0, NaN,  -2) (EQ,EQ,UN,LT)

    the first instruction writes (-2,0,2,NaN) to R0 and updates the
    condition code to (LT,EQ,GT,UN).  In the second instruction, only the
    "x", "y", and "z" components of R0 and the condition code are updated,
    so R0 ends up with (0,2,NaN,NaN) and the condition code ends up with
    (EQ,GT,UN,UN).  In the third instruction, the condition code mask
    disables writes to the x component (its condition code field is "EQ"),
    so R0 ends up with (0,0,NaN,-2) and the condition code ends up with
    (EQ,EQ,UN,LT).

    The following pseudocode illustrates the process of writing a
    result vector to the destination register.  In the pseudocode,
    "instrSaturate" is TRUE if and only if result saturation is
    enabled, "instrMask" refers to the component write mask given by
    the <optWriteMask> rule, "ccMaskRule" refers to the condition code
    mask rule given by <ccMask>, and "updatecc" is TRUE if and only if
    condition code updates are enabled.  "result", "destination", and "cc"
    refer to the result vector, the register selected by <dstRegister>
    and the condition code, respectively.  Condition codes do not exist
    in the VP1 execution environment.

      boolean TestCC(CondCode field) {
          // A field of "UN" passes only the "NE" and "TR" rules.
          switch (ccMaskRule) {
          case "EQ":  return (field == "EQ");
          case "NE":  return (field != "EQ");
          case "LT":  return (field == "LT");
          case "GE":  return (field == "GT" || field == "EQ");
          case "LE":  return (field == "LT" || field == "EQ");
          case "GT":  return (field == "GT");
          case "TR":  return TRUE;
          case "FL":  return FALSE;
          case "":    return TRUE;
          }
      }

      enum GenerateCC(float value) {
          // NaN never compares equal to any value, including itself,
          // so it must be detected with an explicit test.
          if (isnan(value)) {
              return UN;
          } else if (value < 0) {
              return LT;
          } else if (value == 0) {
              return EQ;
          } else {
              return GT;
          }
      }

      void UpdateDestination(floatVec destination, floatVec result)
      {
          floatVec merged;
          ccVec    mergedCC;

          // Clamp result components to [0,1] if requested in the instruction.
          if (instrSaturate) {
              if (result.x < 0)      result.x = 0;
              else if (result.x > 1) result.x = 1;
              if (result.y < 0)      result.y = 0;
              else if (result.y > 1) result.y = 1;
              if (result.z < 0)      result.z = 0;
              else if (result.z > 1) result.z = 1;
              if (result.w < 0)      result.w = 0;
              else if (result.w > 1) result.w = 1;
          }

          // Merge the converted result into the destination register, under
          // control of the compile- and run-time write masks.
          merged = destination;
          mergedCC = cc;
          if (instrMask.x && TestCC(cc.c***)) {
              merged.x = result.x;
              if (updatecc) mergedCC.x = GenerateCC(result.x);
          }
          if (instrMask.y && TestCC(cc.*c**)) {
              merged.y = result.y;
              if (updatecc) mergedCC.y = GenerateCC(result.y);
          }
          if (instrMask.z && TestCC(cc.**c*)) {
              merged.z = result.z;
              if (updatecc) mergedCC.z = GenerateCC(result.z);
          }
          if (instrMask.w && TestCC(cc.***c)) {
              merged.w = result.w;
              if (updatecc) mergedCC.w = GenerateCC(result.w);
          }

          // Write out the new destination register and condition code.
          destination = merged;
          cc = mergedCC;
      }

    While this rule describes floating-point results, the same logic
    applies to the integer results generated by the ARA, ARL, and ARR
    instructions.
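
    The three-instruction MOVC example earlier in this section can be
    checked against the update rule with a small C model.  This is a
    sketch under simplifying assumptions: it models a single destination
    register and a single condition code register, omits saturation,
    supports only the "(NE)" and empty condition code masks without
    swizzling, and expects the caller to apply the source swizzle.  The
    names VpState, generate_cc, and movc are hypothetical.

```c
#include <assert.h>
#include <math.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum { CC_EQ, CC_GT, CC_LT, CC_UN } CondCode;

typedef struct {
    float    r0[4];   /* destination register R0 */
    CondCode cc[4];   /* one condition code register */
} VpState;

/* Derives a condition code field from a written value. */
CondCode generate_cc(float v)
{
    if (isnan(v)) return CC_UN;
    if (v < 0.0f) return CC_LT;
    if (v > 0.0f) return CC_GT;
    return CC_EQ;              /* covers both +0.0 and -0.0 */
}

/* MOVC-style update.  write_mask[i] is the component write mask;
 * ne_mask selects the "(NE)" condition code mask, otherwise the
 * empty mask, which behaves as "(TR)". */
void movc(VpState *s, const float src[4], const bool write_mask[4],
          bool ne_mask)
{
    float    merged[4];
    CondCode merged_cc[4];

    assert(s != NULL);

    /* Evaluate every component against the old condition code before
     * committing anything, as in the merged/mergedCC pseudocode. */
    for (int i = 0; i < 4; i++) {
        bool cc_ok = ne_mask ? (s->cc[i] != CC_EQ) : true;
        merged[i]    = s->r0[i];
        merged_cc[i] = s->cc[i];
        if (write_mask[i] && cc_ok) {
            merged[i]    = src[i];
            merged_cc[i] = generate_cc(src[i]);
        }
    }
    for (int i = 0; i < 4; i++) {
        s->r0[i] = merged[i];
        s->cc[i] = merged_cc[i];
    }
}
```

    Starting from a condition code of all EQ and applying the three
    moves with sources (-2,0,2,NaN), (0,2,NaN,-2), and (2,0,NaN,-2)
    leaves R0 as (0,0,NaN,-2) and the condition code as (EQ,EQ,UN,LT).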

    Add to Section 2.14.4.5, Vertex Program Options

    Section 2.14.4.5.3, NV_vertex_program3 Program Option

    If a vertex program specifies the "NV_vertex_program3" option, the
    ARB_vertex_program grammar and execution environment are extended
    to take advantage of all the features of the "NV_vertex_program2"
    option, plus the following features:

        * several new instructions:

          * POPA -- pop address register off stack
          * PUSHA -- push address register onto stack
          * TEX -- texture lookup
          * TXB -- texture lookup w/LOD bias
          * TXL -- texture lookup w/explicit LOD
          * TXP -- projective texture lookup

        * address register-relative addressing for vertex texture
          coordinate and generic attribute arrays,

        * address register-relative addressing for vertex texture
          coordinate result array, and

        * a second four-component condition code.


    Modify Section 2.14.5.34,  RET:  Subroutine Call Return

    The RET instruction conditionally returns from a subroutine initiated
    by a CAL instruction by popping an instruction reference off the
    top of the call stack and transferring control to the referenced
    instruction.  The following pseudocode describes the operation of
    the instruction:

      if (TestCC(cc.c***) || TestCC(cc.*c**) ||
          TestCC(cc.**c*) || TestCC(cc.***c)) {
        if (callStackDepth <= 0) {
          // terminate vertex program normally
        } else {
          callStackDepth--;
          if (callStack[callStackDepth] is an instruction reference) {
            instruction = callStack[callStackDepth];
          } else {
            // terminate vertex program abnormally
          }
        }

        // continue execution at <instruction>
      } else {
        // do nothing
      }

    In the pseudocode, <callStackDepth> is the depth of the call stack,
    <callStack> is an array holding the call stack, and <instruction> is
    a reference to an instruction previously pushed onto the call stack.

    If the call stack is empty when RET executes, the vertex program
    terminates normally.

    The vertex program terminates abnormally if the entry at the top of the
    call stack is not an instruction reference pushed by CAL.  When a vertex
    program terminates abnormally, all of the vertex program results are
    undefined.

    Add to Section 2.14.5,  Vertex Program Instruction Set

    Section 2.14.5.43, POPA:  Pop Address Register Stack

    The POPA instruction generates an integer result vector by popping
    an entry off of the call stack.

      if (callStackDepth <= 0) {
        terminate vertex program;
      } else {
        callStackDepth--;
        if (callStack[callStackDepth] is an address register) {
          iresult = callStack[callStackDepth];
        } else {
          terminate vertex program;
        }
      }

    POPA does not support non-default write masks; a program will fail to load
    if it includes a component write mask other than ".xyzw" or a condition
    code write mask test other than "TR".

    In the pseudocode, <callStackDepth> is the current depth of the call
    stack and <callStack> is an array holding the call stack.

    The vertex program terminates abnormally if it executes a POPA instruction
    when the call stack is empty, or when the entry at the top of the call
    stack is not an address register pushed by PUSHA.  When a vertex program
    terminates abnormally, all of the vertex program results are undefined.

    Section 2.14.5.44, PUSHA:  Push Address Register Stack

    The PUSHA instruction pushes the address register operand onto the
    call stack, which is also used for subroutine calls.  The PUSHA
    instruction does not generate a result vector.

      tmp = AddrVectorLoad(op0);
      if (callStackDepth >= MAX_PROGRAM_CALL_DEPTH_NV) {
        terminate vertex program;
      } else {
        callStack[callStackDepth] = tmp;
        callStackDepth++;
      }

    In the pseudocode, <callStackDepth> is the current depth of the call
    stack and <callStack> is an array holding the call stack.

    The vertex program terminates abnormally if it executes a PUSHA
    instruction when the call stack is full.  When a vertex program terminates
    abnormally, all of the vertex program results are undefined.

    Component swizzling is not supported when the operand is loaded.
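
    Because PUSHA and CAL share one stack, each entry must record what
    kind of value it holds so that POPA and RET can detect a mismatch.
    The C sketch below is one way to model that, not a description of
    any implementation.  The tagged StackEntry layout, the function
    names, and EXAMPLE_CALL_DEPTH (an assumed stand-in for
    MAX_PROGRAM_CALL_DEPTH_NV) are all hypothetical.  A return value of
    false marks the cases where the text says the program terminates
    abnormally.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumed example depth; the real limit is MAX_PROGRAM_CALL_DEPTH_NV. */
enum { EXAMPLE_CALL_DEPTH = 4 };

typedef enum { ENTRY_INSTRUCTION, ENTRY_ADDRESS } EntryKind;

typedef struct {
    EntryKind kind;
    int       addr[4];       /* meaningful when kind == ENTRY_ADDRESS */
    int       instruction;   /* meaningful when kind == ENTRY_INSTRUCTION */
} StackEntry;

typedef struct {
    StackEntry entries[EXAMPLE_CALL_DEPTH];
    int        depth;
} CallStack;

/* PUSHA: fails when the stack is already full. */
bool pusha(CallStack *st, const int addr[4])
{
    assert(st != NULL && addr != NULL);
    if (st->depth >= EXAMPLE_CALL_DEPTH)
        return false;
    StackEntry *e = &st->entries[st->depth++];
    e->kind = ENTRY_ADDRESS;
    e->instruction = 0;
    for (int i = 0; i < 4; i++)
        e->addr[i] = addr[i];
    return true;
}

/* CAL side of the shared stack: pushes an instruction reference. */
bool push_return(CallStack *st, int instruction)
{
    assert(st != NULL);
    if (st->depth >= EXAMPLE_CALL_DEPTH)
        return false;
    StackEntry *e = &st->entries[st->depth++];
    e->kind = ENTRY_INSTRUCTION;
    e->instruction = instruction;
    for (int i = 0; i < 4; i++)
        e->addr[i] = 0;
    return true;
}

/* POPA: fails on an empty stack or when the top entry was not
 * pushed by PUSHA. */
bool popa(CallStack *st, int out[4])
{
    assert(st != NULL && out != NULL);
    if (st->depth <= 0)
        return false;
    st->depth--;
    const StackEntry *e = &st->entries[st->depth];
    if (e->kind != ENTRY_ADDRESS)
        return false;
    for (int i = 0; i < 4; i++)
        out[i] = e->addr[i];
    return true;
}
```

    A subroutine that needs the address register for its own indexing
    could bracket its body with PUSHA and POPA; in this model that
    corresponds to a pusha call followed later by a matching popa.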

    Section 2.14.5.45, TEX:  Texture Lookup

    The TEX instruction uses the single vector operand to perform a
    lookup in the specified texture map, yielding a 4-component result
    vector containing filtered texel values.  The (s,t,r,q) coordinates
    used for the texture lookup are (x,y,z,1), where x, y, and z are
    components of the vector operand.

      tmp = VectorLoad(op0);
      result = TextureSample(tmp.x, tmp.y, tmp.z, 1.0, 0.0, unit, target);

    where <unit> and <target> are the texture image unit number and
    target type, matching the <texImageUnitNum> and <texTargetType>
    grammar rules.

    The resulting sample is mapped to RGBA as described in Table 3.21,
    and the R, G, B, and A values are written to the x, y, z, and w
    components, respectively, of the result vector.

    Since partial derivatives of the texture coordinates are not defined,
    the base LOD value for vertex texture lookups is defined to be
    zero.  The value of lambda' used in equation 3.16 will be simply
    clamp(texobj_bias + texunit_bias).

    Section 2.14.5.46, TXB:  Texture Lookup (With LOD Bias)

    The TXB instruction uses the single vector operand to perform a
    lookup in the specified texture map, yielding a 4-component result
    vector containing filtered texel values.  The (s,t,r,q) coordinates
    used for the texture lookup are (x,y,z,1), where x, y, and z are
    components of the vector operand.  The w component of the operand
    is used as an additional LOD bias.

      tmp = VectorLoad(op0);
      result = TextureSample(tmp.x, tmp.y, tmp.z, 1.0, tmp.w, unit, target);

    where <unit> and <target> are the texture image unit number and
    target type, matching the <texImageUnitNum> and <texTargetType>
    grammar rules.

    The resulting sample is mapped to RGBA as described in Table 3.21,
    and the R, G, B, and A values are written to the x, y, z, and w
    components, respectively, of the result vector.

    Since partial derivatives of the texture coordinates are not defined,
    the base LOD value for vertex texture lookups is defined to be
    zero.  The value of lambda' used in equation 3.16 will be simply
    clamp(texobj_bias + texunit_bias + tmp.w).
690
691    Since the base LOD value is zero, the TXB instruction is completely
692    equivalent to the TXL instruction, where the w component contains
693    an explicit base LOD value.

    Section 2.14.5.47, TXL:  Texture Lookup (With Explicit LOD)

    The TXL instruction uses the single vector operand to perform a
    lookup in the specified texture map, yielding a 4-component result
    vector containing filtered texel values.  The (s,t,r,q) coordinates
    used for the texture lookup are (x,y,z,1), where x, y, and z are
    components of the vector operand.  The w component of the operand
    is used as the base LOD for the texture lookup.

      tmp = VectorLoad(op0);
      result = TextureSampleLOD(tmp.x, tmp.y, tmp.z, 1.0, tmp.w, unit, target);

    where <unit> and <target> are the texture image unit number and
    target type, matching the <texImageUnitNum> and <texTargetType>
    grammar rules.

    The resulting sample is mapped to RGBA as described in Table 3.21,
    and the R, G, B, and A values are written to the x, y, z, and w
    components, respectively, of the result vector.

    The value of lambda' used in equation 3.16 will be simply tmp.w +
    clamp(texobj_bias + texunit_bias), where tmp.w is the base LOD.
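
    Read literally, the lambda' expressions given for TXB and TXL
    differ only in whether tmp.w sits inside or outside the clamp.
    The sketch below evaluates both so they can be compared.  It is an
    illustration under stated assumptions, not a description of any
    implementation: the function names are hypothetical, MAX_BIAS is a
    placeholder, and the clamp range [-MAX_BIAS, MAX_BIAS] is assumed.

```python
MAX_BIAS = 2.0  # placeholder for the implementation's LOD bias limit


def clamp_bias(bias, max_bias=MAX_BIAS):
    """Clamp a LOD bias sum to [-max_bias, max_bias] (assumed range)."""
    return max(-max_bias, min(max_bias, bias))


def txb_lambda(texobj_bias, texunit_bias, w, max_bias=MAX_BIAS):
    """TXB: w is an extra bias added before clamping; base LOD is zero."""
    return clamp_bias(texobj_bias + texunit_bias + w, max_bias)


def txl_lambda(texobj_bias, texunit_bias, w, max_bias=MAX_BIAS):
    """TXL: w is the base LOD, added after the biases are clamped."""
    return w + clamp_bias(texobj_bias + texunit_bias, max_bias)
```

    Under these assumptions the two functions return the same value
    whenever the bias sum stays inside the clamp range.  They can
    diverge once the clamp becomes active: with the placeholder limit,
    txb_lambda(0.0, 0.0, 3.0) saturates at 2.0 while
    txl_lambda(0.0, 0.0, 3.0) yields 3.0.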

    Section 2.14.5.48, TXP:  Texture Lookup (Projective)

    The TXP instruction uses the single vector operand to perform a
    lookup in the specified texture map, yielding a 4-component result
    vector containing filtered texel values.  The (s,t,r,q) coordinates
    used for the texture lookup are (x,y,z,w), where x, y, z, and w are
    the four components of the vector operand.

      tmp = VectorLoad(op0);
      result = TextureSample(tmp.x, tmp.y, tmp.z, tmp.w, 0.0, unit, target);

    where <unit> and <target> are the texture image unit number and
    target type, matching the <texImageUnitNum> and <texTargetType>
    grammar rules.

    The resulting sample is mapped to RGBA as described in Table 3.21,
    and the R, G, B, and A values are written to the x, y, z, and w
    components, respectively, of the result vector.

    Since partial derivatives of the texture coordinates are not defined,
    the base LOD value for vertex texture lookups is defined to be
    zero.  The value of lambda' used in equation 3.16 will be simply
    clamp(texobj_bias + texunit_bias).
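
    TEX and TXP differ only in the q coordinate handed to the lookup.
    The sketch below makes that concrete.  It assumes that the lookup
    divides s, t, and r by q before sampling, which is the usual
    meaning of a projective lookup but is not spelled out in this
    section; all function names are hypothetical.

```python
def tex_coords(op0):
    """TEX: (s,t,r,q) = (x,y,z,1); the operand's w component is ignored."""
    x, y, z, _w = op0
    return (x, y, z, 1.0)


def txp_coords(op0):
    """TXP: (s,t,r,q) = (x,y,z,w), taken directly from the operand."""
    x, y, z, w = op0
    return (x, y, z, w)


def project(strq):
    """Assumed projective divide: (s/q, t/q, r/q).  q must be nonzero."""
    s, t, r, q = strq
    return (s / q, t / q, r / q)
```

    For the operand (1.0, 2.0, 3.0, 4.0), project(tex_coords(...))
    leaves the first three components unchanged, while
    project(txp_coords(...)) scales each of them by 1/4.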

Additions to Chapter 3 of the OpenGL 1.4 Specification (Rasterization)

    None.

Additions to Chapter 4 of the OpenGL 1.4 Specification (Per-Fragment
Operations and the Frame Buffer)

    None.

Additions to Chapter 5 of the OpenGL 1.4 Specification (Special Functions)

    None.

Additions to Chapter 6 of the OpenGL 1.4 Specification (State and
State Requests)

    None.

Additions to Appendix A of the OpenGL 1.4 Specification (Invariance)

    None.

Additions to the AGL/GLX/WGL Specifications

    None.

Dependencies on ARB_vertex_program

    ARB_vertex_program is required.

    This specification and NV_vertex_program2_option are based on a
    modified version of the grammar published in the ARB_vertex_program
    specification.  This modified grammar includes a few structural
    changes to better accommodate new functionality from this and
    other extensions, but should be functionally equivalent to the
    ARB_vertex_program grammar.  See NV_vertex_program2_option for
    details on the base grammar.

Dependencies on NV_vertex_program2_option

    NV_vertex_program2_option is required.

    If the NV_vertex_program3 program option is specified, all
    the functionality described in both this extension and the
    NV_vertex_program2_option specification is available.

Dependencies on ARB_fragment_program_shadow

    If this extension and ARB_fragment_program_shadow are both supported,
    vertex programs may include the option statement:

      OPTION ARB_fragment_program_shadow;

    which enables the use of SHADOW1D, SHADOW2D, and SHADOWRECT texture
    targets in texture lookup instructions, as described in the
    ARB_fragment_program_shadow specification.
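
    As an illustration of the option statements working together, the
    Python fragment below holds a small vertex program as a string and
    checks that a shadow target is only used when both options are
    declared.  The program text is a hypothetical example that has not
    been validated against any driver, and the helper functions are
    inventions of this sketch rather than part of any API.

```python
import re

# Hypothetical vertex program text; not validated against any driver.
VERTEX_PROGRAM = """!!ARBvp1.0
OPTION NV_vertex_program3;
OPTION ARB_fragment_program_shadow;
TEMP lit;
TEX lit, vertex.texcoord[0], texture[0], SHADOW2D;
MOV result.color, lit;
MOV result.position, vertex.position;
END
"""


def declared_options(source):
    """Return the option names declared by OPTION statements, in order."""
    return re.findall(r"^\s*OPTION\s+(\w+)\s*;", source, flags=re.MULTILINE)


def uses_shadow_target(source):
    """True if the text names a SHADOW1D, SHADOW2D, or SHADOWRECT target."""
    return re.search(r"\bSHADOW(1D|2D|RECT)\b", source) is not None


def shadow_targets_allowed(source):
    """Shadow targets need both options declared, per the text above."""
    opts = declared_options(source)
    return ("NV_vertex_program3" in opts
            and "ARB_fragment_program_shadow" in opts)
```

    A loader could call these helpers before handing the string to the
    driver, although the driver's own program parser remains the
    authority on what is accepted.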

    NVIDIA NOTE:  Drivers prior to September 2006 do not support the use of
    this option, and will not accept texture lookups with SHADOW1D, SHADOW2D,
    and SHADOWRECT targets.  Shadow mapping in vertex programs will result in
    software fallbacks on GeForce 6 and GeForce 7 series GPUs, but may be done
    in hardware on future GPUs.

Errors

    None.

New State

    None.

New Implementation Dependent State:

                                             Minimum
    Get Value             Type  Get Command   Value   Description                 Section   Attr.
    ---------             ----  -----------  -------  --------------------------  --------  -----
    MAX_VERTEX_TEXTURE_    Z+   GetIntegerv     1     Number of separate texture  2.14.3.7  -
      IMAGE_UNITS_ARB                                 image units that can be
                                                      accessed by a vertex program
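
    The limit in the table above bounds how many texture image units a
    vertex program may access.  The sketch below counts the distinct
    units referenced in a program string and compares the count with a
    limit supplied by the caller.  It is a rough illustration: the
    helper names are hypothetical, the limit would normally come from
    GetIntegerv on a live context (not done here), a bare "texture"
    reference is assumed to mean unit 0, and the rule "distinct units
    used must not exceed the limit" is an assumption of this sketch.

```python
import re

# Matches a texture image unit reference such as "texture[1]" or a bare
# "texture" (assumed to mean unit 0), but not "state.matrix.texture[n]".
_TEX_UNIT = re.compile(r"(?<![.\w])texture\b(?:\s*\[\s*(\d+)\s*\])?")


def texture_units_used(source):
    """Return the set of texture image unit numbers referenced in source."""
    code = re.sub(r"#.*", "", source)  # drop '#' comments
    return {int(m.group(1) or 0) for m in _TEX_UNIT.finditer(code)}


def fits_vertex_texture_limit(source, max_units):
    """Assumed rule: the number of distinct units must not exceed max_units."""
    return len(texture_units_used(source)) <= max_units
```

    Since the minimum value of the limit is 1, a program that samples
    from a single texture image unit passes this check on any
    implementation reporting at least that minimum.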

Revision History

    Rev.    Date    Author    Changes
    ----  --------  --------  --------------------------------------------
    7     10/12/09  pbrown    Update grammar/documentation of PUSHA/POPA to
                              reflect the implementation.  <instResultAddr> is
                              used for POPA with some semantic checks.  Note
                              that some driver versions erroneously allowed
                              conditional write masks on POPA.  Also clarify
                              that ARB_fragment_program_shadow includes
                              support for "SHADOWRECT".

    6     09/27/06  pbrown    Document that ARB_fragment_program_shadow is
                              allowed, to enable the use of "SHADOW1D" and
                              "SHADOW2D" targets for texture lookups.

    5     11/07/05  pbrown    Fix PUSHA documentation to specify the right
                              constant name used for overflow testing.

    4     09/01/05  pbrown    Fix spec language to document that a vertex
                              program will fail to compile if it uses "too
                              many" textures -- previously only documented
                              in the issues section.

    3     08/25/05  pbrown    Document that using a different texture target
                              than fragment processing on the same texture
                              unit results in an INVALID_OPERATION error at
                              Begin time.  This is consistent with GLSL
                              language in the ARB_shader_objects and OpenGL
                              2.0 specifications.  The implementation has
                              always done this, but it was overlooked in
                              the spec language.

    2     06/23/04  pbrown    Documented that vertex results are undefined
                              when a vertex program terminates abnormally
                              (e.g., PUSHA/POPA stack overflow/underflow).
                              Documented error in RET if the top of the call
                              stack contains a value written by PUSHA.

    1     --------  pbrown    Initial pre-release revisions.