1Name
2
3    NV_vertex_program2
4
5Name Strings
6
7    GL_NV_vertex_program2
8
9Contact
10
11    Pat Brown, NVIDIA Corporation (pbrown 'at' nvidia.com)
12    Mark Kilgard, NVIDIA Corporation (mjk 'at' nvidia.com)
13
14Notice
15
16    Copyright NVIDIA Corporation, 2000-2002.
17
18IP Status
19
20    NVIDIA Proprietary.
21
22Status
23
24    Implemented in CineFX (NV30) Emulation driver, August 2002.
25    Shipping in Release 40 NVIDIA driver for CineFX hardware, January 2003.
26
27Version
28
29    Last Modified Date:  03/18/2008
30    NVIDIA Revision:     33
31
32Number
33
34    287
35
36Dependencies
37
38    Written based on the wording of the OpenGL 1.3 Specification and requires
39    OpenGL 1.3.
40
41    Written based on the wording of the NV_vertex_program extension
42    specification, version 1.0.
43
44    NV_vertex_program is required.
45
46Overview
47
48    This extension further enhances the concept of vertex programmability
49    introduced by the NV_vertex_program extension, and extended by
50    NV_vertex_program1_1.  These extensions create a separate vertex program
51    mode where the configurable vertex transformation operations in unextended
52    OpenGL are replaced by a user-defined program.
53
54    This extension introduces the VP2 execution environment, which extends the
55    VP1 execution environment introduced in NV_vertex_program.  The VP2
56    environment provides several language features not present in previous
57    vertex programming execution environments:
58
59      * Branch instructions allow a program to jump to another instruction
60        specified in the program.
61
62      * Branching support allows for up to four levels of subroutine
63        calls/returns.
64
65      * A four-component condition code register allows an application to
66        compute a component-wise write mask at run time and apply that mask to
67        register writes.
68
69      * Conditional branches are supported, where the condition code register
70        is used to determine if a branch should be taken.
71
      * Programmable user clipping is supported (via the CLP0-CLP5
        clip distance registers).  Primitives are clipped to the area where
        the interpolated clip distances are greater than or equal to zero.
75
76      * Instructions can perform a component-wise absolute value operation on
77        any operand load.
78
79    The VP2 execution environment provides a number of new instructions, and
80    extends the semantics of several instructions already defined in
81    NV_vertex_program.
82
      * ARR:  Operates like ARL, except that float-to-int conversion is done
        by rounding.  Equivalent results could be achieved (less efficiently)
        in NV_vertex_program using an ADD/ARL sequence and a program parameter
        holding the value 0.5.
87
88      * BRA, CAL, RET:  Branch, subroutine call, and subroutine return
89        instructions.
90
91      * COS, SIN:  Adds support for high-precision sine and cosine
92        computations.
93
94      * FLR, FRC:  Adds support for computing the floor and fractional portion
95        of floating-point vector components.  Equivalent results could be
96        achieved (less efficiently) in NV_vertex_program using the EXP
97        instruction to compute the fractional portion of one component at a
98        time.
99
100      * EX2, LG2:  Adds support for high-precision exponentiation and
101        logarithm computations.
102
103      * ARA:  Adds pairs of components of an address register; useful for
104        looping and other operations.
105
106      * SEQ, SFL, SGT, SLE, SNE, STR:  Add six new "set on" instructions,
107        similar to the SLT and SGE instructions defined in NV_vertex_program.
108        Equivalent results could be achieved (less efficiently) in
109        NV_vertex_program with multiple SLT, SGE, and arithmetic instructions.
110
111      * SSG:  Adds a new "set sign" operation, which produces a vector holding
112        negative one for negative components, zero for components with a value
113        of zero, and positive one for positive components.  Equivalent results
114        could be achieved (less efficiently) in NV_vertex_program with
115        multiple SLT, SGE, and arithmetic instructions.
116
117      * The ARL instruction is extended to operate on four components instead
118        of a single component.
119
120      * All instructions that produce integer or floating-point result vectors
121        have variants that update the condition code register based on the
122        result vector.
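
    The ARR bullet above can be illustrated with a small C sketch.  It
    assumes that ARL converts by taking the floor (as in NV_vertex_program)
    and models only the ADD/ARL emulation; the function names are
    hypothetical, and the treatment of exact ties by ARR itself is not
    covered here.

```c
/* Floor conversion, which ARL is assumed to perform. */
int arl_convert(float x)
{
    int i = (int)x;                     /* truncates toward zero */
    return (x < (float)i) ? i - 1 : i;  /* step down for negative fractions */
}

/* ADD with a program parameter holding 0.5, followed by ARL. */
int add_arl_convert(float x)
{
    return arl_convert(x + 0.5f);
}
```

    For an input of 2.7 the ARL-style conversion selects index 2, while the
    ADD/ARL sequence selects index 3.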
123
124    This extension also raises some of the resource limitations in the
125    NV_vertex_program extension.
126
127      * 256 program parameter registers (versus 96 in NV_vertex_program).
128
129      * 16 temporary registers (versus 12 in NV_vertex_program).
130
131      * Two four-component integer address registers (versus one
132        single-component register in NV_vertex_program).
133
134      * 256 total vertex program instructions (versus 128 in
135        NV_vertex_program).
136
137      * Including loops, programs can execute up to 64K instructions.
138
139
140Issues
141
142    This extension builds upon the NV_vertex_program extension.  Should this
143    specification contain selected edits to the NV_vertex_program
144    specification or should the specs be unified?
145
146      RESOLVED:  Since NV_vertex_program and NV_vertex_program2 programs share
147      many features, the main section of this specification is unified and
148      describes both types of programs.  Other sections containing
149      NV_vertex_program features that are unchanged by this extension will not
150      be edited.
151
152    How can a program use condition codes to avoid extra computations?
153
154      Consider the example of evaluating the OpenGL lighting model for a
155      given light.  If the diffuse dot product is negative (roughly 1/2 the
156      time for random geometry), the only contribution to the light is
157      ambient.  In this case, condition codes and branching can skip over a
158      number of unneeded instructions.
159
          # R0 holds accumulated light color
          # R2 holds normal
          # R3 holds computed light vector
          # R4 holds computed half vector
          # c[0] holds ambient light/material product
          # c[1] holds diffuse light/material product
          # c[2].xyz holds specular light/material product
          # c[2].w   holds specular exponent
          DP3C R1.x, R2, R3;            # diffuse dot product
          ADD  R0, R0, c[0];            # accumulate ambient
          BRA  pointsAway (LT.x);       # skip rest if diffuse dot < 0
          MOV  R1.w, c[2].w;
          DP3  R1.y, R2, R4;            # specular dot product
          LIT  R1, R1;                  # compute exponentiated specular
          MAD  R0, c[1], R1.y, R0;      # accumulate diffuse
          MAD  R0.xyz, c[2], R1.z, R0;  # accumulate specular
        pointsAway:
          ...                           # continue execution
178
179    How can a program use subroutines?
180
181      With subroutines, a program can encapsulate a small piece of
182      functionality into a subroutine and call it multiple times, as in CPU
183      code.  Applications will need to identify the registers used to pass
184      data to and from the subroutine.
185
186      Subroutines could be used for applications like evaluating lighting
187      equations for a single light.  With conditional branching and
188      subroutines, a variable number of lights (which could even vary
189      per-vertex) can be easily supported.
190
        accumulate:
          # R0 holds the accumulated result
          # R1 holds the value to add
          ADD R0, R0, R1;
          RET;

          # Compute floor(A)*B by repeated addition using a subroutine.  Yes,
          # this is a stupid example.
          #
          # c[0] holds (A,B,0,1).
          # R0 holds the accumulated result
          # R1 holds B, the value to accumulate.
          # R2 holds the number of iterations remaining.
          MOV R0, c[0].z;               # start with zero
          MOV R1, c[0].y;
          FLRC R2.x, c[0].x;
          BRA done (LE.x);
        top:
          CAL accumulate;
          ADDC R2.x, R2.x, -c[0].w;     # decrement count
          BRA top (GT.x);
        done:
          ...
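
      For readers more comfortable with CPU code, the control flow of the
      floor(A)*B example can be sketched in C as follows.  This is only an
      approximation of the program's behavior for a single scalar; the
      function name is hypothetical, and the comments map each statement back
      to the instruction it stands in for.

```c
float floor_times(float a, float b)
{
    float r0 = 0.0f;                    /* MOV R0, c[0].z */
    float r1 = b;                       /* MOV R1, c[0].y */
    int   t  = (int)a;                  /* FLRC R2.x, c[0].x (floor of A) */
    float r2 = (a < (float)t) ? (float)(t - 1) : (float)t;

    if (r2 <= 0.0f)                     /* BRA done (LE.x) */
        return r0;
    do {
        r0 = r0 + r1;                   /* CAL accumulate */
        r2 = r2 - 1.0f;                 /* ADDC R2.x, R2.x, -c[0].w */
    } while (r2 > 0.0f);                /* BRA top (GT.x) */
    return r0;                          /* done: */
}
```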
214
215    How can conventional OpenGL clip planes be supported in vertex programs?
216
217      The clip distance in the OpenGL specification can be evaluated with a
218      simple DP4 instruction that writes to one of the six clip distance
219      registers.  Primitives will automatically be clipped to the half-space
220      where o[CLPx] >= 0, which matches the definition in the spec.
221
222          # R0 holds eye coordinates
223          # c[0] holds eye-space clip plane coefficients
224          DP4 o[CLP0].x, R0, c[0];
225
226      Note that the clip plane or clip distance volume corresponding to the
227      o[CLPn] register used must be enabled, or no clipping will be performed.
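
      The same test can be written out on the CPU.  The sketch below is a
      minimal C model of the DP4 above and of the "clip distance >= 0"
      rule; dp4 and inside_clip_plane are hypothetical helper names, not
      part of any API.

```c
/* Four-component dot product, as computed by DP4. */
float dp4(const float a[4], const float b[4])
{
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2] + a[3] * b[3];
}

/* A vertex is kept when its clip distance is greater than or equal to zero. */
int inside_clip_plane(const float eye[4], const float plane[4])
{
    return dp4(eye, plane) >= 0.0f;
}
```

      With plane coefficients (0, 1, 0, -1), which describe y >= 1, the
      eye-space point (0, 2, 0, 1) is inside and the origin is outside.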
228
229      The clip distance registers allow for clip distance volumes to be
230      computed more-or-less arbitrarily.  To approximate clipping to a sphere
231      of radius <n>, the following code can be used.
232
233          # R0 holds eye coordinates
234          # c[0].xyz holds sphere center
235          # c[0].w holds the square of the sphere radius
236          SUB R1.xyz, R0, c[0];            # distance vector
237          DP3 R1.w, R1, R1;                # compute distance squared
238          SUB o[CLP0].x, c[0].w, R1.w;     # compute r^2 - d^2
239
      Since the clip distance is interpolated linearly over a primitive, the
      clip distance evaluated at a point will represent a piecewise-linear
      approximation of the true distance.  The approximation will become
      increasingly accurate as the primitive is tessellated more finely.
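
      The effect of linear interpolation can be seen with a small C sketch.
      It assumes a plain linear blend between two vertices and uses
      hypothetical helper names; it is meant only to show why coarse
      primitives clip poorly against a curved volume.

```c
/* r^2 - d^2 for a point p, matching the three-instruction sequence above. */
float sphere_clip_distance(const float p[3], const float center[3],
                           float radius_sq)
{
    float dx = p[0] - center[0];
    float dy = p[1] - center[1];
    float dz = p[2] - center[2];
    return radius_sq - (dx * dx + dy * dy + dz * dz);
}

/* Linear blend of two per-vertex clip distances, t in [0,1]. */
float blend_clip_distance(float ca, float cb, float t)
{
    return ca + t * (cb - ca);
}
```

      For a unit sphere at the origin and a line from (-2,0,0) to (2,0,0),
      both endpoints have clip distance -3, so the interpolated value at the
      midpoint is also -3 and the whole line is clipped, even though the
      midpoint itself lies at the sphere center (true value +1).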
244
245    How can looping be achieved in vertex programs?
246
247      Simple loops can be achieved using a general purpose floating-point
248      register component as a counter.  The following code calls a function
249      named "function" <n> times, where <n> is specified in a program
250      parameter register component.
251
252          # c[0].x holds the number of iterations to execute.
253          # c[1].x holds the constant 1.0.
254          MOVC R15.x, c[0].x;
255        startLoop:
256          CAL  function (GT.x);             # if (counter > 0) function();
257          SUBC R15.x, R15.x, c[1].x;        # counter = counter - 1;
258          BRA  startLoop (GT.x);            # if (counter > 0) goto start;
259        endLoop:
260          ...
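
      A rough C equivalent of this loop is shown below.  It assumes that
      "function" leaves the condition code register untouched, and it merely
      counts how many times the subroutine would be called; count_calls is a
      hypothetical name.

```c
int count_calls(float n)
{
    int   calls   = 0;
    float counter = n;                  /* MOVC R15.x, c[0].x */

    do {
        if (counter > 0.0f)             /* CAL function (GT.x) */
            calls++;
        counter -= 1.0f;                /* SUBC R15.x, R15.x, c[1].x */
    } while (counter > 0.0f);           /* BRA startLoop (GT.x) */
    return calls;
}
```

      Because the CAL is itself conditional, an iteration count of zero (or
      less) results in no calls at all.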
261
262      More complex loops (where a separate index may be needed for indexed
263      addressing into the program parameter array) can be achieved using the
264      ARA instruction, which will add the x/z and y/w components of an address
265      register.
266
267          # c[0].x holds the number of iterations to execute
268          # c[0].y holds the initial index value
269          # c[0].z holds the constant -1.0 (used for the iteration count)
270          # c[0].w holds the index step value
271          ARLC A1, c[0];
272        startLoop:
273          CAL  function (GT.x);             # if (counter > 0) function();
274                                            # Note: A1.y can be used for
275                                            # indexing in function().
276          ARAC A1.xy, A1;                   # counter = counter - 1;
277                                            # index += loopStep;
278          BRA  startLoop (GT.x);            # if (counter > 0) goto start;
279        endLoop:
280          ...
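
      The ARA-based loop can be modeled in C as below.  The sketch assumes
      that ARL takes the floor of each component, ignores the limited range
      of address register components, and records the index value seen by
      each call; all names are hypothetical.

```c
/* Floor conversion, assumed to match ARL. */
int floor_to_int(float x)
{
    int i = (int)x;
    return (x < (float)i) ? i - 1 : i;
}

/* c0 = (iteration count, initial index, -1, index step).
   Stores the index seen by each call and returns the number of calls. */
int ara_loop(const float c0[4], int visited[], int max_visited)
{
    int a1[4];
    int i, n = 0;

    for (i = 0; i < 4; i++)             /* ARLC A1, c[0] */
        a1[i] = floor_to_int(c0[i]);
    do {
        if (a1[0] > 0 && n < max_visited)
            visited[n++] = a1[1];       /* CAL function (GT.x) */
        a1[0] += a1[2];                 /* ARAC A1.xy, A1:  x += z */
        a1[1] += a1[3];                 /*                  y += w */
    } while (a1[0] > 0);                /* BRA startLoop (GT.x) */
    return n;
}
```

      With c[0] = (3, 4, -1, 4), the subroutine runs three times and sees
      index values 4, 8, and 12, which would step through three consecutive
      4x4 matrices in the program parameter array.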
281
282    Should this specification add support for vertex state programs beyond the
283    VP1 execution environment?
284
285      No.  Vertex state programs are a little-used feature of
286      NV_vertex_program and don't perform particularly well.  They are still
287      supported for compatibility with the original NV_vertex_program spec,
288      but they will not be extended to support new features.
289
    How are NaNs handled in the "set on" instructions (SEQ, SGE, SGT, SLE,
    SLT, SNE)?  What about MIN, MAX?  SSG?  When doing condition code tests?
292
293      Any of these instructions involving a NaN operand will produce a NaN
294      result.  This behavior differs from the NV_fragment_program extension.
295      There, SEQ, SGE, SGT, SLE, and SLT will produce 0.0 if either operand is
296      a NaN, and SNE will produce 1.0 if either operand is a NaN.
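
      The difference between the two extensions can be sketched in C.  The
      functions below are hypothetical scalar models of one component of SEQ
      and SNE, written only from the description above.

```c
#include <math.h>   /* NAN macro only; no libm functions are called */

int is_nan(float x) { return x != x; }

/* SEQ as described for this extension: a NaN operand gives a NaN result. */
float seq_vp2(float a, float b)
{
    if (is_nan(a) || is_nan(b))
        return NAN;
    return (a == b) ? 1.0f : 0.0f;
}

/* SEQ as described for NV_fragment_program: 0.0 if either operand is NaN. */
float seq_fp(float a, float b)
{
    return (a == b) ? 1.0f : 0.0f;      /* a == b is false for NaN */
}

/* SNE as described for NV_fragment_program: 1.0 if either operand is NaN. */
float sne_fp(float a, float b)
{
    return (a != b) ? 1.0f : 0.0f;      /* a != b is true for NaN */
}
```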
297
      For condition code updates, NaN values will result in "UN" condition
      codes.  All conditionals using a "UN" condition code, except "TR" and
      "NE", will evaluate to false.  This behavior is identical to the
      functionality in NV_fragment_program.
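
      A scalar model of one condition code component is sketched below.  The
      "UN" rule comes from the text above; the mapping of negative, zero, and
      positive results to LT, EQ, and GT, and the inclusion of EQ, GE, and FL
      tests, are assumptions inferred from the examples in this section.  All
      type and function names are hypothetical.

```c
typedef enum { CC_EQ, CC_LT, CC_GT, CC_UN } cc_value;
typedef enum { T_EQ, T_NE, T_LT, T_GE, T_LE, T_GT, T_TR, T_FL } cc_test;

/* Condition code produced by a result component. */
cc_value cc_from_result(float r)
{
    if (r != r)   return CC_UN;         /* NaN */
    if (r < 0.0f) return CC_LT;
    if (r > 0.0f) return CC_GT;
    return CC_EQ;
}

/* Evaluate a test against a condition code component. */
int cc_passes(cc_value cc, cc_test t)
{
    switch (t) {
    case T_TR: return 1;
    case T_FL: return 0;
    case T_NE: return cc != CC_EQ;      /* passes for UN */
    case T_EQ: return cc == CC_EQ;
    case T_LT: return cc == CC_LT;
    case T_GT: return cc == CC_GT;
    case T_LE: return cc == CC_LT || cc == CC_EQ;
    case T_GE: return cc == CC_GT || cc == CC_EQ;
    }
    return 0;
}
```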
302
303    How can the various features of this extension be used to provide skinning
304    functionality similar to that in ARB_vertex_blend and ARB_matrix_palette?
305    And how can that functionality be extended?
306
      Assume an implementation that allows application of up to 8 matrices at
      once.  Further assume that v[12].xyzw and v[13].xyzw hold the set of 8
      weights, and v[14].xyzw and v[15].xyzw hold the set of 8 matrix indices.
      Furthermore, assume that the palette of matrices is stored/tracked at
      c[0], c[4], c[8], and so on.  As an additional optimization, an
      application can specify that fewer than 8 matrices should be applied by
      storing a negative palette index immediately after the last index to be
      applied.
315
316      Skinning support in this example can be provided by the following code:
317
318          ARLC A0, v[14];                 # load 4 palette indices at once
319          DP4 R1.x, c[A0.x+0], v[0];      # 1st matrix transform
320          DP4 R1.y, c[A0.x+1], v[0];
321          DP4 R1.z, c[A0.x+2], v[0];
322          DP4 R1.w, c[A0.x+3], v[0];
323          MUL R0, R1, v[12].x;            # accumulate weighted sum in R0
324          BRA end (LT.y);                 # stop on a negative matrix index
325          DP4 R1.x, c[A0.y+0], v[0];      # 2nd matrix transform
326          DP4 R1.y, c[A0.y+1], v[0];
327          DP4 R1.z, c[A0.y+2], v[0];
328          DP4 R1.w, c[A0.y+3], v[0];
329          MAD R0, R1, v[12].y, R0;        # accumulate weighted sum in R0
330          BRA end (LT.z);                 # stop on a negative matrix index
331
332          ...                             # 3rd and 4th matrix transform
333
334          ARLC A0, v[15];                 # load next four palette indices
335          BRA end (LT.x);
336          DP4 R1.x, c[A0.x+0], v[0];      # 5th matrix transform
337          DP4 R1.y, c[A0.x+1], v[0];
338          DP4 R1.z, c[A0.x+2], v[0];
339          DP4 R1.w, c[A0.x+3], v[0];
340          MAD R0, R1, v[13].x, R0;        # accumulate weighted sum in R0
341          BRA end (LT.y);                 # stop on a negative matrix index
342
343          ...                             # 6th, 7th, and 8th matrix transform
344
345        end:
346          ...                             # any additional instructions
347
348      The amount of code used by this example could further be reduced using a
349      subroutine performing four transformations at a time:
350
351          ARLC A0, v[14];  # load first four indices
352          CAL  skin4;      # do first four transformations
353          BRA  end (LT);   # end if any of the first 4 indices was < 0
354          ARLC A0, v[15];  # load second four indices
355          CAL  skin4;      # do second four transformations
356        end:
357          ...              # any additional instructions
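
      As a cross-check for applications, the result of the unrolled skinning
      program can be reproduced on the CPU.  The C sketch below assumes the
      palette is stored as rows of four floats (matrix rows at c[base] through
      c[base+3]), that indices are non-negative integers or a negative
      terminator, and that the first matrix is always applied, as in the
      program above.  The function name is hypothetical.

```c
#include <assert.h>

/* c: flat array of parameter registers, 4 floats per register. */
void skin_vertex(const float *c, const float indices[8],
                 const float weights[8], const float pos[4], float out[4])
{
    int m, r, k;

    assert(indices[0] >= 0.0f);         /* first matrix is unconditional */
    for (r = 0; r < 4; r++)
        out[r] = 0.0f;
    for (m = 0; m < 8; m++) {
        int base;
        if (m > 0 && indices[m] < 0.0f) /* BRA end (LT.?) */
            break;
        base = (int)indices[m];         /* ARLC A0, v[14] or v[15] */
        for (r = 0; r < 4; r++) {       /* DP4 R1.?, c[A0.?+r], v[0] */
            float dot = 0.0f;
            for (k = 0; k < 4; k++)
                dot += c[4 * (base + r) + k] * pos[k];
            out[r] += weights[m] * dot; /* MUL/MAD into R0 */
        }
    }
}
```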
358
359    Why does the RCC instruction exist?
360
361      RESOLVED:  To perform numeric operations that will avoid overflow and
362      underflow issues.
363
364    Should the specification provide more examples?
365
366      RESOLVED:  It would be nice.
367
368
369New Procedures and Functions
370
371    None.
372
373
374New Tokens
375
376    None.
377
378
379Additions to Chapter 2 of the OpenGL 1.3 Specification (OpenGL Operation)
380
381    Modify Section 2.11, Clipping (p. 39)
382
    (modify last paragraph, p. 39) When the GL is not in vertex program mode
    (section 2.14), this view volume may be further restricted by as many as n
    client-defined clip planes to generate the clip volume. ...
387
    (add before next-to-last paragraph, p. 40) When the GL is in vertex
    program mode, the view volume may be restricted to the individual clip
    distance volumes derived from the per-vertex clip distances (o[CLP0] -
    o[CLP5]).  Clip distance volumes are applied if and only if per-vertex
    clip distances are supported in the vertex program execution
    environment.  A point P belonging to the primitive under consideration is
    in the clip distance volume numbered n if and only if
395
396      c_n(P) >= 0,
397
398    where c_n(P) is the interpolated value of the clip distance CLPn at the
399    point P.  For point primitives, c_n(P) is simply the clip distance for the
400    vertex in question.  For line and triangle primitives, per-vertex clip
401    distances are interpolated using a weighted mean, with weights derived
402    according to the algorithms described in sections 3.4 and 3.5.
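
    As an illustration only, the following C sketch clips a line segment
    against a single clip distance volume.  It assumes a plain linear
    parameter t along the segment (t = 0 at vertex a, t = 1 at vertex b)
    in place of the weights of sections 3.4 and 3.5, and the function name
    is hypothetical.

```c
/* ca, cb: clip distances at the two endpoints.
   Returns 0 if the segment is entirely clipped; otherwise returns 1 and
   stores the visible parameter range in *t0 .. *t1. */
int clip_segment(float ca, float cb, float *t0, float *t1)
{
    *t0 = 0.0f;
    *t1 = 1.0f;
    if (ca < 0.0f && cb < 0.0f)
        return 0;
    /* c(t) = ca + t*(cb - ca) crosses zero at t = ca / (ca - cb). */
    if (ca < 0.0f)
        *t0 = ca / (ca - cb);
    if (cb < 0.0f)
        *t1 = ca / (ca - cb);
    return 1;
}
```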
403
    (modify next-to-last paragraph, p.40) Client-defined clip planes or clip
    distance volumes are enabled with the generic Enable command and disabled
    with the Disable command. The value of the argument to either command is
    CLIP_PLANEi where i is an integer between 0 and n-1; specifying a value of
    i enables or disables the plane equation with index i. The constants obey
    CLIP_PLANEi = CLIP_PLANE0 + i.
410
411
412    Add Section 2.14,  Vertex Programs (p. 57).  This section supersedes the
413    similar section added in the NV_vertex_program extension and extended in
414    the NV_vertex_program1_1 extension.
415
416    The conventional GL vertex transformation model described in sections 2.10
417    through 2.13 is a configurable, but essentially hard-wired, sequence of
418    per-vertex computations based on a canonical set of per-vertex parameters
419    and vertex transformation related state such as transformation matrices,
420    lighting parameters, and texture coordinate generation parameters.
421
422    The general success and utility of the conventional GL vertex
423    transformation model reflects its basic correspondence to the typical
424    vertex transformation requirements of 3D applications.
425
    However, when the conventional GL vertex transformation model is not
    sufficient, the vertex program mode provides a substantially more flexible
    model for vertex transformation.  The vertex program mode permits
    applications to define their own vertex programs.
430
431
432    Section 2.14.1, Vertex Program Execution Environment
433
434    The vertex program execution environment is an operational model that
435    defines how a program is executed.  The execution environment includes a
436    set of instructions, a set of registers, and semantic rules defining how
437    operations are performed.  There are three vertex program execution
438    environments, VP1, VP1.1, and VP2.  The environment names are taken from
439    the mandatory program prefix strings found at the beginning of all vertex
440    programs.  The VP1.1 execution environment is a minor addition to the VP1
441    execution environment, so references to the VP1 execution environment
442    below apply to both VP1 and VP1.1 execution environments except where
443    otherwise noted.
444
445    The vertex program instruction set consists primarily of floating-point
446    4-component vector operations operating on per-vertex attributes and
447    program parameters.  Vertex programs execute on a per-vertex basis and
448    operate on each vertex completely independently from the processing of
449    other vertices.  Vertex programs execute without data hazards so results
450    computed in one operation can be used immediately afterwards.  Vertex
451    programs produce a set of vertex result vectors that becomes the set of
452    transformed vertex parameters used by primitive assembly.
453
454    In the VP1 environment, vertex programs execute a finite fixed sequence of
455    instructions with no branching or looping.  In the VP2 environment, vertex
456    programs support conditional and unconditional branches and four levels of
457    subroutine calls.
458
459    The vertex program register set consists of six types of registers
460    described in the following sections.
461
462
463    Section 2.14.1.1, Vertex Attribute Registers
464
465    The Vertex Attribute Registers are sixteen 4-component vector
466    floating-point registers containing the current vertex's per-vertex
467    attributes.  These registers are numbered 0 through 15.  These registers
468    are private to each vertex program invocation and are initialized at each
469    vertex program invocation by the current vertex attribute state specified
470    with VertexAttribNV commands.  These registers are read-only during vertex
471    program execution.  The VertexAttribNV commands used to update the vertex
472    attribute registers can be issued both outside and inside of Begin/End
473    pairs.  Vertex program execution is provoked by updating vertex attribute
474    zero.  Updating vertex attribute zero outside of a Begin/End pair is
475    ignored without generating any error (identical to the Vertex command
476    operation).
477
478    The commands
479
      void VertexAttrib{1234}{sfd}NV(uint index, T coords);
      void VertexAttrib{1234}{sfd}vNV(uint index, T coords[]);
      void VertexAttrib4ubNV(uint index, T coords);
      void VertexAttrib4ubvNV(uint index, T coords[]);
484
485    specify the particular current vertex attribute indicated by index.
486    The coordinates for each vertex attribute are named x, y, z, and w.
487    The VertexAttrib1NV family of commands sets the x coordinate to the
488    provided single argument while setting y and z to 0 and w to 1.
489    Similarly, VertexAttrib2NV sets x and y to the specified values,
490    z to 0 and w to 1; VertexAttrib3NV sets x, y, and z, with w set
491    to 1, and VertexAttrib4NV sets all four coordinates.  The error
492    INVALID_VALUE is generated if index is greater than 15.
493
494    No conversions are applied to the vertex attributes specified as
495    type short, float, or double.  However, vertex attributes specified
496    as type ubyte are converted as described by Table 2.6.
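
    The default-filling and ubyte conversion rules can be sketched in C as
    follows.  The helper names are hypothetical, and the ubyte conversion
    assumes the usual Table 2.6 mapping of c to c/255.

```c
#include <assert.h>

/* Expand 1 to 4 supplied coordinates to (x, y, z, w), filling missing
   y and z with 0 and missing w with 1. */
void expand_attrib(const float *coords, int count, float out[4])
{
    int k;

    assert(count >= 1 && count <= 4);
    out[0] = 0.0f;
    out[1] = 0.0f;
    out[2] = 0.0f;
    out[3] = 1.0f;
    for (k = 0; k < count; k++)
        out[k] = coords[k];
}

/* Unsigned byte to floating-point: 0 maps to 0.0 and 255 maps to 1.0. */
float ubyte_to_float(unsigned char c)
{
    return (float)c / 255.0f;
}
```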
497
498    The commands
499
500      void VertexAttribs{1234}{sfd}vNV(uint index, sizei n, T coords[]);
501      void VertexAttribs4ubvNV(uint index, sizei n, GLubyte coords[]);
502
503    specify a contiguous set of n vertex attributes.  The effect of
504
505      VertexAttribs{1234}{sfd}vNV(index, n, coords)
506
507    is the same (assuming no errors) as the command sequence
508
509      #define NUM k  /* where k is 1, 2, 3, or 4 components */
510      int i;
511      for (i=n-1; i>=0; i--) {
512        VertexAttrib{NUM}{sfd}vNV(i+index, &coords[i*NUM]);
513      }
514
515    VertexAttribs4ubvNV behaves similarly.
516
517    The VertexAttribNV calls equivalent to VertexAttribsNV are issued in
518    reverse order so that vertex program execution is provoked when index
519    is zero only after all the other vertex attributes have first been
520    specified.
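
    The ordering can be demonstrated with a small mock in C.  The functions
    below are stand-ins that only record calls; they are not the GL entry
    points, and all names are hypothetical.

```c
#include <assert.h>

#define NUM_ATTRIBS 16

float    attrib_regs[NUM_ATTRIBS][4];
unsigned call_order[NUM_ATTRIBS];
int      call_count = 0;

/* Mock of a four-component VertexAttrib call: stores the value and
   records which index was written. */
void vertex_attrib4fv(unsigned index, const float *v)
{
    int k;

    assert(index < NUM_ATTRIBS);
    for (k = 0; k < 4; k++)
        attrib_regs[index][k] = v[k];
    if (call_count < NUM_ATTRIBS)
        call_order[call_count++] = index;
}

/* Mock of VertexAttribs4fvNV: highest index first, so that attribute
   zero (which provokes execution) is written last. */
void vertex_attribs4fv(unsigned index, int n, const float *coords)
{
    int i;

    for (i = n - 1; i >= 0; i--)
        vertex_attrib4fv(index + (unsigned)i, &coords[i * 4]);
}
```

    Setting attributes 0 through 2 in one call writes them in the order
    2, 1, 0.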
521
    The set and operation of vertex attribute registers are identical for both
    VP1 and VP2 execution environments.
524
525
526    Section 2.14.1.2, Program Parameter Registers
527
528    The Program Parameter Registers are a set of 4-component floating-point
529    vector registers containing the vertex program parameters.  In the VP1
530    execution environment, there are 96 registers, numbered 0 through 95.  In
531    the VP2 execution environment, there are 256 registers, numbered 0 through
532    255.  This relatively large set of registers is intended to hold
533    parameters such as matrices, lighting parameters, and constants required
534    by vertex programs.  Vertex program parameter registers can be updated in
535    one of two ways:  by the ProgramParameterNV commands outside of a
536    Begin/End pair or by a vertex state program executed outside of a
537    Begin/End pair (vertex state programs are discussed in section 2.14.3).
538
539    The commands
540
      void ProgramParameter4fNV(enum target, uint index,
                                float x, float y, float z, float w);
      void ProgramParameter4dNV(enum target, uint index,
                                double x, double y, double z, double w);

    specify the particular program parameter indicated by index.
    The coordinate values x, y, z, and w are assigned to the respective
    components of the particular program parameter.  target must be
    VERTEX_PROGRAM_NV.
550
551    The commands
552
553      void ProgramParameter4dvNV(enum target, uint index, double *params);
554      void ProgramParameter4fvNV(enum target, uint index, float *params);
555
556    operate identically to ProgramParameter4fNV and ProgramParameter4dNV
557    respectively except that the program parameters are passed as an
558    array of four components.
559
560    The error INVALID_VALUE is generated if the specified index is greater
561    than or equal to the number of program parameters in the execution
562    environment (96 for VP1, 256 for VP2).
563
564    The commands
565
566      void ProgramParameters4dvNV(enum target, uint index,
567                                  uint num, double *params);
568      void ProgramParameters4fvNV(enum target, uint index,
569                                  uint num, float *params);
570
    specify a contiguous set of num program parameters.  The effect is
    the same (assuming no errors) as

      for (i=0; i<num; i++) {
        ProgramParameter4{fd}vNV(target, index+i, &params[i*4]);
      }

    The error INVALID_VALUE is generated if the sum of <index> and <num> is
    greater than the number of program parameters in the execution environment
    (96 for VP1, 256 for VP2).
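
    A mock of the multi-parameter update, including the range check, might
    look like the following C sketch.  It models the VP2 limit of 256
    registers with a plain array; the function name and return convention
    (0 standing in for INVALID_VALUE) are hypothetical.

```c
#define VP2_NUM_PARAMS 256u

float param_regs[VP2_NUM_PARAMS][4];

/* Returns 1 on success, or 0 where the GL would generate INVALID_VALUE. */
int program_parameters4fv(unsigned index, unsigned num, const float *params)
{
    unsigned i;
    int k;

    /* Reject index + num > 256 without risking unsigned wraparound. */
    if (index > VP2_NUM_PARAMS || num > VP2_NUM_PARAMS - index)
        return 0;
    for (i = 0; i < num; i++)
        for (k = 0; k < 4; k++)
            param_regs[index + i][k] = params[i * 4 + k];
    return 1;
}
```

    Note that the source array is indexed from zero, independent of the
    starting register index.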
581
    The program parameter registers are shared by all vertex program
    invocations within a rendering context.  ProgramParameterNV command
    updates and vertex state program executions are serialized with respect to
    vertex program invocations and other vertex state program executions.

    Writes to the program parameter registers during vertex state program
    execution can be masked on a per-component basis.
589
590    The initial value of all 96 (VP1) or 256 (VP2) program parameter registers
591    is (0,0,0,0).
592
593
594    Section 2.14.1.3, Address Registers
595
    The Address Registers are 4-component vector registers with signed 10-bit
    integer components.  In the VP1 execution environment, there is only a
    single address register (A0) and only the x component of the register is
    accessible.  In the VP2 execution environment, there are two address
    registers (A0 and A1), of which all four components are accessible.  The
    address registers are private to each vertex program invocation and are
    initialized to (0,0,0,0) at every vertex program invocation.  These
    registers can be written during vertex program execution (but not read)
    and their values can be used as a relative offset for reading vertex
    program parameter registers.  Only the vertex program parameter registers
    can be read using relative addressing (writes using relative addressing
    are not supported).
608
609    See the discussion of relative addressing of program parameters in section
610    2.14.2.1 and the discussion of the ARL instruction in section 2.14.3.4.
611
612
613    Section 2.14.1.4, Temporary Registers
614
    The Temporary Registers are 4-component floating-point vector registers
    used to hold temporary results during vertex program execution.  In the
    VP1 execution environment, there are 12 temporary registers, numbered 0
    through 11.  In the VP2 execution environment, there are 16 temporary
    registers, numbered 0 through 15.  These registers are private to each
    vertex program invocation and initialized to (0,0,0,0) at every vertex
    program invocation.  These registers can be read and written during vertex
    program execution.  Writes to these registers can be masked on a
    per-component basis.

    In the VP2 execution environment, there is one additional temporary
    pseudo-register, "CC".  CC is treated as an unnumbered, write-only
    temporary register, whose sole purpose is to allow instructions to modify
    the condition code register (section 2.14.1.6) without overwriting the
    contents of any temporary register.
630
631
632    Section 2.14.1.5, Vertex Result Registers
633
    The Vertex Result Registers are 4-component floating-point vector
    registers used to write the results of a vertex program.  There are 15
    result registers in the VP1 execution environment, and 21 in the VP2
    execution environment.  Each register value is initialized to (0,0,0,1) at
    the invocation of each vertex program.  Writes to the vertex result
    registers can be masked on a per-component basis.  These registers are
    named in Table X.1 and further discussed below.
641
642
643    Vertex Result                                      Component
644    Register Name   Description                        Interpretation
645    --------------  ---------------------------------  --------------
646     HPOS            Homogeneous clip space position    (x,y,z,w)
647     COL0            Primary color (front-facing)       (r,g,b,a)
648     COL1            Secondary color (front-facing)     (r,g,b,a)
649     BFC0            Back-facing primary color          (r,g,b,a)
650     BFC1            Back-facing secondary color        (r,g,b,a)
651     FOGC            Fog coordinate                     (f,*,*,*)
652     PSIZ            Point size                         (p,*,*,*)
653     TEX0            Texture coordinate set 0           (s,t,r,q)
654     TEX1            Texture coordinate set 1           (s,t,r,q)
655     TEX2            Texture coordinate set 2           (s,t,r,q)
656     TEX3            Texture coordinate set 3           (s,t,r,q)
657     TEX4            Texture coordinate set 4           (s,t,r,q)
658     TEX5            Texture coordinate set 5           (s,t,r,q)
659     TEX6            Texture coordinate set 6           (s,t,r,q)
660     TEX7            Texture coordinate set 7           (s,t,r,q)
661     CLP0(*)         Clip distance 0                    (d,*,*,*)
662     CLP1(*)         Clip distance 1                    (d,*,*,*)
663     CLP2(*)         Clip distance 2                    (d,*,*,*)
664     CLP3(*)         Clip distance 3                    (d,*,*,*)
665     CLP4(*)         Clip distance 4                    (d,*,*,*)
666     CLP5(*)         Clip distance 5                    (d,*,*,*)
667
    Table X.1:  Vertex Result Registers.  (*) Registers CLP0 through CLP5 are
    available only in the VP2 execution environment.
670
671    HPOS is the transformed vertex's homogeneous clip space position.  The
672    vertex's homogeneous clip space position is converted to normalized device
673    coordinates and transformed to window coordinates as described at the end
674    of section 2.10 and in section 2.11.  Further processing (subsequent to
675    vertex program termination) is responsible for clipping primitives
676    assembled from vertex program-generated vertices as described in section
677    2.10, but all client-defined clip planes are treated as if they are
678    disabled when vertex program mode is enabled.
679
680    Four distinct color results can be generated for each vertex.  COL0 is the
681    transformed vertex's front-facing primary color.  COL1 is the transformed
682    vertex's front-facing secondary color.  BFC0 is the transformed vertex's
683    back-facing primary color.  BFC1 is the transformed vertex's back-facing
684    secondary color.
685
686    Primitive coloring may operate in two-sided color mode.  This behavior is
687    enabled and disabled by calling Enable or Disable with the symbolic value
688    VERTEX_PROGRAM_TWO_SIDE_NV.  The selection between the back-facing colors
689    and the front-facing colors depends on the primitive of which the vertex
690    is a part.  If the primitive is a point or a line segment, the
691    front-facing colors are always selected.  If the primitive is a polygon
692    and two-sided color mode is disabled, the front-facing colors are
693    selected.  If it is a polygon and two-sided color mode is enabled, then
694    the selection is based on the sign of the (clipped or unclipped) polygon's
695    signed area computed in window coordinates.  This facingness determination
696    is identical to the two-sided lighting facingness determination described
697    in section 2.13.1.
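
    The selection rule above can be sketched as follows.  This is a minimal
    illustration in Python, not part of the extension:  the function name
    select_colors, the primitive labels, and the front_facing flag are
    assumptions made for the example.  How the sign of the signed area maps
    to a facing depends on the facingness determination of section 2.13.1,
    so the sketch takes the facing result as an input rather than computing
    it.

```python
def select_colors(primitive, two_sided, front_facing, results):
    """Return the (primary, secondary) colors chosen for a vertex.

    primitive    -- "point", "line", or "polygon" (hypothetical labels)
    two_sided    -- True if VERTEX_PROGRAM_TWO_SIDE_NV is enabled
    front_facing -- facing result for a polygon; ignored otherwise
    results      -- dict holding the COL0, COL1, BFC0, and BFC1 values
    """
    # Back-facing colors are used only for back-facing polygons when
    # two-sided color mode is enabled.
    use_back = primitive == "polygon" and two_sided and not front_facing
    if use_back:
        return results["BFC0"], results["BFC1"]
    return results["COL0"], results["COL1"]
```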
698
699    The selected primary and secondary colors for each primitive are clamped
700    to the range [0,1] and then interpolated across the assembled primitive
701    during rasterization with at least 8-bit accuracy for each color
702    component.
703
704    FOGC is the transformed vertex's fog coordinate.  The register's first
705    floating-point component is interpolated across the assembled primitive
706    during rasterization and used as the fog distance to compute the
707    per-fragment fog factor when fog is enabled.  However, if both fog and vertex
708    program mode are enabled, but the FOGC vertex result register is not
709    written, the fog factor is overridden to 1.0.  The register's other three
710    components are ignored.
711
712    Point size determination may operate in program-specified point size mode.
713    This behavior is enabled and disabled by calling Enable or Disable with
714    the symbolic value VERTEX_PROGRAM_POINT_SIZE_NV.  If the vertex is for a
715    point primitive and the mode is enabled and the PSIZ vertex result is
716    written, the point primitive's size is determined by the clamped x
717    component of the PSIZ register.  Otherwise (because vertex program mode is
718    disabled, program-specified point size mode is disabled, or because the
719    vertex program did not write PSIZ), the point primitive's size is
720    determined by the point size state (the state specified using the
721    PointSize command).
722
723    The PSIZ register's x component is clamped to the range zero through
724    either the hi value of ALIASED_POINT_SIZE_RANGE if point smoothing is
725    disabled or the hi value of the SMOOTH_POINT_SIZE_RANGE if point smoothing
726    is enabled.  The register's other three components are ignored.
727
728    If the vertex is not for a point primitive, the value of the PSIZ vertex
729    result register is ignored.
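
    The point size rules above can be summarized in a short sketch.  The
    function and parameter names are hypothetical, and the two "hi" values
    are assumed to have been queried by the application beforehand.  Any
    further processing applied to the PointSize state is not modeled.

```python
def point_size(vp_enabled, psiz_mode_enabled, psiz_written, psiz_x,
               point_size_state, smooth, aliased_hi, smooth_hi):
    """Return the size used for a point primitive.

    psiz_x           -- x component written to o[PSIZ], if any
    point_size_state -- value set with the PointSize command
    aliased_hi       -- hi value of ALIASED_POINT_SIZE_RANGE
    smooth_hi        -- hi value of SMOOTH_POINT_SIZE_RANGE
    """
    if vp_enabled and psiz_mode_enabled and psiz_written:
        hi = smooth_hi if smooth else aliased_hi
        return min(max(psiz_x, 0.0), hi)   # clamp to [0, hi]
    # Otherwise the conventional point size state applies.
    return point_size_state
```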
730
731    TEX0 through TEX7 are the transformed vertex's texture coordinate sets for
732    texture units 0 through 7.  These floating-point coordinates are
733    interpolated across the assembled primitive during rasterization and used
734    for accessing textures.  If the number of texture units supported is less
735    than eight, the values of vertex result registers that do not correspond
736    to existent texture units are ignored.
737
738    CLP0 through CLP5, available only in the VP2 execution environment, are
739    the transformed vertex's clip distances.  These floating-point coordinates
740    are used by the post-vertex program clipping process (see section 2.11).
741
742
743    Section 2.14.1.6,  The Condition Code Register
744
745    The VP2 execution environment provides a single four-component vector
746    called the condition code register.  Each component of this register is
747    one of four enumerated values:  GT (greater than), EQ (equal), LT (less
748    than), or UN (unordered).  The condition code register can be used to mask
749    writes to registers and to evaluate conditional branches.
750
751    Most vertex program instructions can optionally update the condition code
752    register.  When a vertex program instruction updates the condition code
753    register, a condition code component is set to LT if the corresponding
754    component of the result is less than zero, EQ if it is equal to zero, GT
755    if it is greater than zero, and UN if it is NaN (not a number).
756
757    The condition code register is initialized to a vector of EQ values each
758    time a vertex program executes.
759
760    There is no condition code register available in the VP1 execution
761    environment.
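
    As an illustration of the update rule, the following Python sketch
    classifies each component of a four-component result.  The names
    cc_component, update_cc, and INITIAL_CC are hypothetical, and the sketch
    ignores the write masks and condition code masks that an instruction may
    also apply.

```python
import math

def cc_component(value):
    """Classify one result component as LT, EQ, GT, or UN."""
    if math.isnan(value):
        return "UN"
    if value < 0.0:
        return "LT"
    if value == 0.0:
        return "EQ"
    return "GT"

def update_cc(result):
    """Return the condition code vector for a 4-component result."""
    return tuple(cc_component(v) for v in result)

# Value of the condition code register when a program starts executing.
INITIAL_CC = ("EQ", "EQ", "EQ", "EQ")
```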
762
763
764    Section 2.14.1.7,  Semantic Meaning for Vertex Attributes and Program
765                       Parameters
766
767    One important distinction between the conventional GL vertex
768    transformation mode and the vertex program mode is that per-vertex
769    parameters and other state parameters in vertex program mode do not have
770    dedicated semantic interpretations the way that they do with the
771    conventional GL vertex transformation mode.
772
773    For example, in the conventional GL vertex transformation mode, the Normal
774    command specifies a per-vertex normal.  The semantic that the Normal
775    command supplies a normal for lighting is established because that is how
776    the per-vertex attribute supplied by the Normal command is used by the
777    conventional GL vertex transformation mode.  Similarly, other state
778    parameters such as a light source position have semantic interpretations
779    based on how the conventional GL vertex transformation model uses each
780    particular parameter.
781
782    In contrast, vertex attributes and program parameters for vertex programs
783    have no pre-defined semantic meanings.  The meaning of a vertex attribute
784    or program parameter in vertex program mode is defined by how the vertex
785    attribute or program parameter is used by the current vertex program to
786    compute and write values to the Vertex Result Registers.  This is the
787    reason that per-vertex attributes and program parameters for vertex
788    programs are numbered instead of named.
789
790    For convenience, however, the existing per-vertex parameters for the
791    conventional GL vertex transformation mode (vertices, normals,
792    colors, fog coordinates, vertex weights, and texture coordinates) are
793    aliased to numbered vertex attributes.  This aliasing is specified in
794    Table X.2.  The table includes how the various conventional components
795    map to the 4-component vertex attribute components.
796
797Vertex
798Attribute  Conventional                                           Conventional
799Register   Per-vertex        Conventional                         Component
800Number     Parameter         Per-vertex Parameter Command         Mapping
801---------  ---------------   -----------------------------------  ------------
802 0         vertex position   Vertex                               x,y,z,w
803 1         vertex weights    VertexWeightEXT                      w,0,0,1
804 2         normal            Normal                               x,y,z,1
805 3         primary color     Color                                r,g,b,a
806 4         secondary color   SecondaryColorEXT                    r,g,b,1
807 5         fog coordinate    FogCoordEXT                          fc,0,0,1
808 6         -                 -                                    -
809 7         -                 -                                    -
810 8         texture coord 0   MultiTexCoord(GL_TEXTURE0_ARB, ...)  s,t,r,q
811 9         texture coord 1   MultiTexCoord(GL_TEXTURE1_ARB, ...)  s,t,r,q
812 10        texture coord 2   MultiTexCoord(GL_TEXTURE2_ARB, ...)  s,t,r,q
813 11        texture coord 3   MultiTexCoord(GL_TEXTURE3_ARB, ...)  s,t,r,q
814 12        texture coord 4   MultiTexCoord(GL_TEXTURE4_ARB, ...)  s,t,r,q
815 13        texture coord 5   MultiTexCoord(GL_TEXTURE5_ARB, ...)  s,t,r,q
816 14        texture coord 6   MultiTexCoord(GL_TEXTURE6_ARB, ...)  s,t,r,q
817 15        texture coord 7   MultiTexCoord(GL_TEXTURE7_ARB, ...)  s,t,r,q
818
819Table X.2:  Aliasing of vertex attributes with conventional per-vertex
820parameters.
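
    The component mapping column of Table X.2 can be read as a rule for
    expanding a conventional parameter into a four-component attribute.  The
    sketch below covers only the four rows whose mapping fills in constant
    components.  The ALIASES table and the aliased_attribute helper are
    hypothetical names used for illustration.

```python
# command name -> (attribute register number, expansion to 4 components)
ALIASES = {
    "VertexWeightEXT":   (1, lambda w: (w, 0.0, 0.0, 1.0)),
    "Normal":            (2, lambda x, y, z: (x, y, z, 1.0)),
    "SecondaryColorEXT": (4, lambda r, g, b: (r, g, b, 1.0)),
    "FogCoordEXT":       (5, lambda fc: (fc, 0.0, 0.0, 1.0)),
}

def aliased_attribute(command, *args):
    """Return (register number, 4-component value) for a command."""
    index, expand = ALIASES[command]
    return index, expand(*args)
```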
821
822    Only vertex attribute zero is treated specially because it is
823    the attribute that provokes the execution of the vertex program;
824    this is the attribute that aliases to the Vertex command's vertex
825    coordinates.
826
827    The result of a vertex program is the set of post-transformation
828    vertex parameters written to the Vertex Result Registers.
829    All vertex programs must write a homogeneous clip space position, but
830    the other Vertex Result Registers can be optionally written.
831
832    Clipping and culling are not the responsibility of vertex programs because
833    these operations assume the assembly of multiple vertices into a
834    primitive.  View frustum clipping is performed subsequent to vertex
835    program execution.  Clip planes are not supported in the VP1 execution
836    environment.  Clip planes are supported indirectly via the clip distance
837    (o[CLPx]) registers in the VP2 execution environment.
838
839
840    Section 2.14.1.8,  Vertex Program Specification
841
842    Vertex programs are specified as an array of ubytes.  The array is a
843    string of ASCII characters encoding the program.
844
845    The command
846
847      LoadProgramNV(enum target, uint id, sizei len,
848                    const ubyte *program);
849
850    loads a vertex program when the target parameter is VERTEX_PROGRAM_NV.
851    Multiple programs can be loaded with different names.  id names the
852    program to load.  The name space for programs is the positive integers
853    (zero is reserved).  The error INVALID_VALUE occurs if a program is loaded
854    with an id of zero.  The error INVALID_OPERATION is generated if a program
855    is loaded for an id that is currently loaded with a program of a different
856    program target.  Managing the program name space and binding to vertex
857    programs is discussed later in section 2.14.1.9.
858
859    program is a pointer to an array of ubytes that represents the program
860    being loaded.  The length of the array is indicated by len.
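
    The id checks described above can be modeled with a small sketch.  This
    is a toy model, not GL code:  the ProgramNamespace class and its string
    error codes are assumptions, targets are represented as strings, and
    program parsing is not modeled at all.

```python
class ProgramNamespace:
    """Toy model of the id checks made when a program is loaded."""

    def __init__(self):
        self.programs = {}   # id -> (target, program string)

    def load(self, target, prog_id, program):
        if prog_id == 0:
            return "INVALID_VALUE"        # zero is reserved
        existing = self.programs.get(prog_id)
        if existing is not None and existing[0] != target:
            return "INVALID_OPERATION"    # id holds a different target
        # A successful load replaces any program previously held by id.
        self.programs[prog_id] = (target, program)
        return "NO_ERROR"
```

    Loading id 1 as a vertex program and then attempting to load the same id
    with a vertex state program would return "INVALID_OPERATION" in this
    model, while reloading it with another vertex program succeeds.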
861
862    A second program target type known as vertex state programs is discussed
863    in 2.14.4.
864
865    At program load time, the program is parsed into a set of tokens possibly
866    separated by white space.  Spaces, tabs, newlines, carriage returns, and
867    comments are considered whitespace.  Comments begin with the character "#"
868    and are terminated by a newline, a carriage return, or the end of the
869    program array.
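
    A minimal sketch of the comment rule, assuming the program is held as a
    Python string rather than a ubyte array.  The helper name strip_comments
    is hypothetical.  A comment runs from "#" up to, but not including, the
    next newline or carriage return, or to the end of the program.

```python
import re

def strip_comments(program):
    """Remove '#' comments, keeping the terminating newline or CR."""
    return re.sub(r"#[^\r\n]*", "", program)
```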
870
871    The Backus-Naur Form (BNF) grammar below specifies the syntactically valid
872    sequences for several types of vertex programs.  The set of valid tokens
873    can be inferred from the grammar.  The token "" represents an empty string
874    and is used to indicate optional rules.  A program is invalid if it
875    contains any undefined tokens or characters.
876
877    The grammar provides for three different vertex program types,
878    corresponding to the three vertex program execution environments.  VP1,
879    VP1.1, and VP2 programs match the grammar rules <vp1-program>,
880    <vp11-program>, and <vp2-program>, respectively.  Some grammar rules
881    correspond to features or instruction forms available only in certain
882    execution environments.  Rules beginning with the prefix "vp1-" are
883    available only to VP1 and VP1.1 programs.  Rules beginning with the
884    prefixes "vp11-" and "vp2-" are available only to VP1.1 and VP2 programs,
885    respectively.
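
    Since each program type begins with a distinct header token, the
    execution environment can be identified from the start of the program
    string.  The sketch below is illustrative only:  it assumes the header
    is the first text in the program, and the names HEADERS and
    execution_environment are hypothetical.

```python
HEADERS = {"!!VP1.0": "VP1", "!!VP1.1": "VP1.1", "!!VP2.0": "VP2"}

def execution_environment(program):
    """Return "VP1", "VP1.1", or "VP2" from the header, else None."""
    for header, env in HEADERS.items():
        if program.startswith(header):
            return env
    return None
```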
886
887
888    <program>              ::= <vp1-program>
889                             | <vp11-program>
890                             | <vp2-program>
891
892    <vp1-program>          ::= "!!VP1.0" <programBody> "END"
893
894    <vp11-program>         ::= "!!VP1.1" <programBody> "END"
895
896    <vp2-program>          ::= "!!VP2.0" <programBody> "END"
897
898    <programBody>          ::= <optionSequence> <programText>
899
900    <optionSequence>       ::= <option> <optionSequence>
901                             | ""
902
903    <option>               ::= "OPTION" <vp11-option> ";"
904                             | "OPTION" <vp2-option> ";"
905
906    <vp11-option>          ::= "NV_position_invariant"
907
908    <vp2-option>           ::= "NV_position_invariant"
909
910    <programText>          ::= <programTextItem> <programText>
911                             | ""
912
913    <programTextItem>      ::= <instruction> ";"
914                             | <vp2-instructionLabel>
915
916    <instruction>          ::= <ARL-instruction>
917                             | <VECTORop-instruction>
918                             | <SCALARop-instruction>
919                             | <BINop-instruction>
920                             | <TRIop-instruction>
921                             | <vp2-BRA-instruction>
922                             | <vp2-RET-instruction>
923                             | <vp2-ARA-instruction>
924
925    <ARL-instruction>      ::= <vp1-ARL-instruction>
926                             | <vp2-ARL-instruction>
927
928    <vp1-ARL-instruction>  ::= "ARL" <maskedAddrReg> "," <scalarSrc>
929
930    <vp2-ARL-instruction>  ::= <vp2-ARLop> <maskedAddrReg> "," <vectorSrc>
931
932    <vp2-ARLop>            ::= "ARL" | "ARLC"
933                             | "ARR" | "ARRC"
934
935    <VECTORop-instruction> ::= <VECTORop> <maskedDstReg> "," <vectorSrc>
936
937    <VECTORop>             ::= "LIT"
938                             | "MOV"
939                             | <vp11-VECTORop>
940                             | <vp2-VECTORop>
941
942    <vp11-VECTORop>        ::= "ABS"
943
944    <vp2-VECTORop>         ::=         "ABSC"
945                             | "FLR" | "FLRC"
946                             | "FRC" | "FRCC"
947                             |         "LITC"
948                             |         "MOVC"
949                             | "SSG" | "SSGC"
950
951    <SCALARop-instruction> ::= <SCALARop> <maskedDstReg> "," <scalarSrc>
952
953    <SCALARop>             ::= "EXP"
954                             | "LOG"
955                             | "RCP"
956                             | "RSQ"
957                             | <vp11-SCALARop>
958                             | <vp2-SCALARop>
959
960    <vp11-SCALARop>        ::= "RCC"
961
962    <vp2-SCALARop>         ::= "COS"  | "COSC"
963                             | "EX2"  | "EX2C"
964                             | "LG2"  | "LG2C"
965                             |          "EXPC"
966                             |          "LOGC"
967                             |          "RCCC"
968                             |          "RCPC"
969                             |          "RSQC"
970                             | "SIN"  | "SINC"
971
972    <BINop-instruction>    ::= <BINop> <maskedDstReg> "," <vectorSrc> ","
973                               <vectorSrc>
974
975    <BINop>                ::= "ADD"
976                             | "DP3"
977                             | "DP4"
978                             | "DST"
979                             | "MAX"
980                             | "MIN"
981                             | "MUL"
982                             | "SGE"
983                             | "SLT"
984                             | <vp11-BINop>
985                             | <vp2-BINop>
986
987    <vp11-BINop>           ::= "DPH"
988                             | "SUB"
989
990    <vp2-BINop>            ::=         "ADDC"
991                             |         "DP3C"
992                             |         "DP4C"
993                             |         "DPHC"
994                             |         "DSTC"
995                             |         "MAXC"
996                             |         "MINC"
997                             |         "MULC"
998                             | "SEQ" | "SEQC"
999                             | "SFL" | "SFLC"
1000                             |         "SGEC"
1001                             | "SGT" | "SGTC"
1002                             |         "SLTC"
1003                             | "SLE" | "SLEC"
1004                             | "SNE" | "SNEC"
1005                             | "STR" | "STRC"
1006                             |         "SUBC"
1007
1008    <TRIop-instruction>    ::= <TRIop> <maskedDstReg> "," <vectorSrc> ","
1009                               <vectorSrc> "," <vectorSrc>
1010
1011    <TRIop>                ::= "MAD"
1012                             | <vp2-TRIop>
1013
1014    <vp2-TRIop>            ::= "MADC"
1015
1016    <vp2-BRA-instruction>  ::= <vp2-BRANCHop> <vp2-branchLabel>
1017                                 <vp2-branchCondition>
1018
1019    <vp2-BRANCHop>         ::= "BRA"
1020                             | "CAL"
1021
1022    <vp2-RET-instruction>  ::= "RET" <vp2-branchCondition>
1023
1024    <vp2-ARA-instruction>  ::= <vp2-ARAop> <maskedAddrReg> "," <addrRegister>
1025
1026    <vp2-ARAop>            ::= "ARA" | "ARAC"
1027
1028    <scalarSrc>            ::= <baseScalarSrc>
1029                             | <vp2-absScalarSrc>
1030
1031    <vp2-absScalarSrc>     ::= <optionalSign> "|" <baseScalarSrc> "|"
1032
1033    <baseScalarSrc>        ::= <optionalSign> <srcRegister> <scalarSuffix>
1034
1035    <vectorSrc>            ::= <baseVectorSrc>
1036                             | <vp2-absVectorSrc>
1037
1038    <vp2-absVectorSrc>     ::= <optionalSign> "|" <baseVectorSrc> "|"
1039
1040    <baseVectorSrc>        ::= <optionalSign> <srcRegister> <swizzleSuffix>
1041
1042    <srcRegister>          ::= <vtxAttribRegister>
1043                             | <progParamRegister>
1044                             | <tempRegister>
1045
1046    <maskedDstReg>         ::= <dstRegister> <optionalWriteMask>
1047                                   <optionalCCMask>
1048
1049    <dstRegister>          ::= <vtxResultRegister>
1050                             | <tempRegister>
1051                             | <vp2-nullRegister>
1052
1053    <vp2-nullRegister>     ::= "CC"
1054
1055    <vp2-branchCondition>  ::= <optionalCCMask>
1056
1057    <vtxAttribRegister>    ::= "v" "[" <vtxAttribRegNum> "]"
1058
1059    <vtxAttribRegNum>      ::= decimal integer from 0 to 15 inclusive
1060                             | "OPOS"
1061                             | "WGHT"
1062                             | "NRML"
1063                             | "COL0"
1064                             | "COL1"
1065                             | "FOGC"
1066                             | "TEX0"
1067                             | "TEX1"
1068                             | "TEX2"
1069                             | "TEX3"
1070                             | "TEX4"
1071                             | "TEX5"
1072                             | "TEX6"
1073                             | "TEX7"
1074
1075    <progParamRegister>    ::= <absProgParamReg>
1076                             | <relProgParamReg>
1077
1078    <absProgParamReg>      ::= "c" "[" <progParamRegNum> "]"
1079
1080    <progParamRegNum>      ::= <vp1-progParamRegNum>
1081                             | <vp2-progParamRegNum>
1082
1083    <vp1-progParamRegNum>  ::= decimal integer from 0 to 95 inclusive
1084
1085    <vp2-progParamRegNum>  ::= decimal integer from 0 to 255 inclusive
1086
1087    <relProgParamReg>      ::= "c" "[" <scalarAddr> <relProgParamOffset> "]"
1088
1089    <relProgParamOffset>   ::= ""
1090                             | "+" <progParamPosOffset>
1091                             | "-" <progParamNegOffset>
1092
1093    <progParamPosOffset>   ::= <vp1-progParamPosOff>
1094                             | <vp2-progParamPosOff>
1095
1096    <vp1-progParamPosOff>  ::= decimal integer from 0 to 63 inclusive
1097
1098    <vp2-progParamPosOff>  ::= decimal integer from 0 to 255 inclusive
1099
1100    <progParamNegOffset>   ::= <vp1-progParamNegOff>
1101                             | <vp2-progParamNegOff>
1102
1103    <vp1-progParamNegOff>  ::= decimal integer from 0 to 64 inclusive
1104
1105    <vp2-progParamNegOff>  ::= decimal integer from 0 to 256 inclusive
1106
1107    <tempRegister>         ::= "R0"  | "R1"  | "R2"  | "R3"
1108                             | "R4"  | "R5"  | "R6"  | "R7"
1109                             | "R8"  | "R9"  | "R10" | "R11"
                                 | <vp2-tempRegister>
1110
1111    <vp2-tempRegister>     ::= "R12" | "R13" | "R14" | "R15"
1112
1113    <vtxResultRegister>    ::= "o" "[" <vtxResultRegName> "]"
1114
1115    <vtxResultRegName>     ::= "HPOS"
1116                             | "COL0"
1117                             | "COL1"
1118                             | "BFC0"
1119                             | "BFC1"
1120                             | "FOGC"
1121                             | "PSIZ"
1122                             | "TEX0"
1123                             | "TEX1"
1124                             | "TEX2"
1125                             | "TEX3"
1126                             | "TEX4"
1127                             | "TEX5"
1128                             | "TEX6"
1129                             | "TEX7"
1130                             | <vp2-resultRegName>
1131
1132    <vp2-resultRegName>    ::= "CLP0"
1133                             | "CLP1"
1134                             | "CLP2"
1135                             | "CLP3"
1136                             | "CLP4"
1137                             | "CLP5"
1138
1139    <scalarAddr>           ::= <addrRegister> "." <addrRegisterComp>
1140
1141    <maskedAddrReg>        ::= <addrRegister> <addrWriteMask>
1142
1143    <addrRegister>         ::= "A0"
1144                             | <vp2-addrRegister>
1145
1146    <vp2-addrRegister>     ::= "A1"
1147
1148    <addrRegisterComp>     ::= "x"
1149                             | <vp2-addrRegisterComp>
1150
1151    <vp2-addrRegisterComp> ::= "y"
1152                             | "z"
1153                             | "w"
1154
1155    <addrWriteMask>        ::= "." "x"
1156                             | <vp2-addrWriteMask>
1157
1158    <vp2-addrWriteMask>     ::= ""
1159                             | "."     "y"
1160                             | "." "x" "y"
1161                             | "."         "z"
1162                             | "." "x"     "z"
1163                             | "."     "y" "z"
1164                             | "." "x" "y" "z"
1165                             | "."             "w"
1166                             | "." "x"         "w"
1167                             | "."     "y"     "w"
1168                             | "." "x" "y"     "w"
1169                             | "."         "z" "w"
1170                             | "." "x"     "z" "w"
1171                             | "."     "y" "z" "w"
1172                             | "." "x" "y" "z" "w"
1173
1174
1175    <optionalSign>         ::= ""
1176                             | "-"
1177                             | <vp2-optionalSign>
1178
1179    <vp2-optionalSign>     ::= "+"
1180
1181    <vp2-instructionLabel> ::= <vp2-branchLabel> ":"
1182
1183    <vp2-branchLabel>      ::= <identifier>
1184
1185    <optionalWriteMask>    ::= ""
1186                             | "." "x"
1187                             | "."     "y"
1188                             | "." "x" "y"
1189                             | "."         "z"
1190                             | "." "x"     "z"
1191                             | "."     "y" "z"
1192                             | "." "x" "y" "z"
1193                             | "."             "w"
1194                             | "." "x"         "w"
1195                             | "."     "y"     "w"
1196                             | "." "x" "y"     "w"
1197                             | "."         "z" "w"
1198                             | "." "x"     "z" "w"
1199                             | "."     "y" "z" "w"
1200                             | "." "x" "y" "z" "w"
1201
1202    <optionalCCMask>       ::= ""
1203                             | <vp2-ccMask>
1204
1205    <vp2-ccMask>           ::= "(" <vp2-ccMaskRule> <swizzleSuffix> ")"
1206
1207    <vp2-ccMaskRule>       ::= "EQ" | "GE" | "GT" | "LE" | "LT" | "NE"
1208                             | "TR" | "FL"
1209
1210    <scalarSuffix>         ::= "." <component>
1211
1212    <swizzleSuffix>        ::= ""
1213                             | "." <component>
1214                             | "." <component> <component>
1215                                   <component> <component>
1216
1217    <component>            ::= "x"
1218                             | "y"
1219                             | "z"
1220                             | "w"
1221
1222    The <identifier> rule matches a sequence of one or more letters ("A"
1223    through "Z", "a" through "z", and "_") and digits ("0" through "9"); the
1224    first character must be a letter.  The underscore ("_") counts as a
1225    letter.  Upper and lower case letters are different (names are
1226    case-sensitive).
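
    The <identifier> rule corresponds to a simple regular expression.  The
    sketch below is an illustration in Python.  The names IDENTIFIER and
    is_identifier are hypothetical, and the sketch does not address whether
    other tokens of the grammar may be used as labels.

```python
import re

# First character: a letter or underscore; then letters, digits, or "_".
IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")

def is_identifier(token):
    """Return True if the whole token matches the identifier rule."""
    return IDENTIFIER.fullmatch(token) is not None
```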
1227
1228    The <vtxAttribRegNum> rule matches both register numbers 0 through 15
1229    and a set of mnemonics that abbreviate the aliasing of conventional
1230    per-vertex parameters to vertex attribute register numbers.  Table X.3
1231    shows the mapping from mnemonic to vertex attribute register number and
1232    what the mnemonic abbreviates.
1233
1234                   Vertex Attribute
1235        Mnemonic   Register Number     Meaning
1236        --------   ----------------    --------------------
1237         "OPOS"     0                  object position
1238         "WGHT"     1                  vertex weight
1239         "NRML"     2                  normal
1240         "COL0"     3                  primary color
1241         "COL1"     4                  secondary color
1242         "FOGC"     5                  fog coordinate
1243         "TEX0"     8                  texture coordinate 0
1244         "TEX1"     9                  texture coordinate 1
1245         "TEX2"     10                 texture coordinate 2
1246         "TEX3"     11                 texture coordinate 3
1247         "TEX4"     12                 texture coordinate 4
1248         "TEX5"     13                 texture coordinate 5
1249         "TEX6"     14                 texture coordinate 6
1250         "TEX7"     15                 texture coordinate 7
1251
1252        Table X.3:  The mapping between vertex attribute register numbers,
1253        mnemonics, and meanings.
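
    A sketch of how the text between "v[" and "]" could be resolved to a
    register number using Table X.3.  The dictionary and function names are
    hypothetical, and mnemonics are assumed to be case-sensitive.

```python
import re

ATTRIB_MNEMONICS = {
    "OPOS": 0, "WGHT": 1, "NRML": 2, "COL0": 3, "COL1": 4, "FOGC": 5,
    "TEX0": 8, "TEX1": 9, "TEX2": 10, "TEX3": 11,
    "TEX4": 12, "TEX5": 13, "TEX6": 14, "TEX7": 15,
}

def attrib_reg_num(token):
    """Map a register number or mnemonic to 0..15, or None if invalid."""
    if token in ATTRIB_MNEMONICS:
        return ATTRIB_MNEMONICS[token]
    if re.fullmatch(r"[0-9]+", token) and int(token) <= 15:
        return int(token)
    return None
```

    Note that registers 6 and 7 have no mnemonic and can be referenced only
    by number.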
1254
1255    A vertex program fails to load if it does not write at least one component
1256    of the HPOS register.
1257
1258    A vertex program fails to load in the VP1 execution environment if it
1259    contains more than 128 instructions.  A vertex program fails to load in
1260    the VP2 execution environment if it contains more than 256 instructions.
1261    Each block of text matching the <instruction> rule counts as an
1262    instruction.
1263
1264    A vertex program fails to load if any instruction sources more than one
1265    unique program parameter register.  An instruction can match the
1266    <progParamRegister> rule more than once only if all such matches are
1267    identical.
1268
1269    A vertex program fails to load if any instruction sources more than one
1270    unique vertex attribute register.  An instruction can match the
1271    <vtxAttribRegister> rule more than once only if all such matches refer to
1272    the same register.
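
    The two source restrictions above can be checked per instruction, as in
    the following sketch.  It assumes the instruction text is already
    syntactically valid, compares program parameter references textually
    after removing white space, and resolves vertex attribute mnemonics to
    register numbers so that v[OPOS] and v[0] count as the same register.
    The names ATTRIB and sources_ok are hypothetical.

```python
import re

# Mnemonic to register number, per Table X.3.
ATTRIB = {n: i for i, n in enumerate(
    ["OPOS", "WGHT", "NRML", "COL0", "COL1", "FOGC"])}
ATTRIB.update({"TEX%d" % i: 8 + i for i in range(8)})

def sources_ok(instruction):
    """Return True if at most one unique c[] and one unique v[] is read."""
    params = {re.sub(r"\s+", "", p)
              for p in re.findall(r"\bc\s*\[([^\]]*)\]", instruction)}
    attribs = {ATTRIB[a] if a in ATTRIB else int(a)
               for a in re.findall(r"\bv\s*\[\s*(\w+)\s*\]", instruction)}
    return len(params) <= 1 and len(attribs) <= 1
```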
1273
1274    The error INVALID_OPERATION is generated if a vertex program fails to load
1275    because it is not syntactically correct or for one of the semantic
1276    restrictions listed above.
1277
1278    The error INVALID_OPERATION is generated if a program is loaded for id
1279    when id is currently loaded with a program of a different target.
1280
1281    A successfully loaded vertex program is parsed into a sequence of
1282    instructions.  Each instruction is identified by its tokenized name.  The
1283    operation of these instructions when executed is defined in section
1284    2.14.1.10.
1285
1286    A successfully loaded program replaces the program previously assigned to
1287    the name specified by id.  If the OUT_OF_MEMORY error is generated by
1288    LoadProgramNV, no change is made to the previous contents of the named
1289    program.
1290
1291    Querying the value of PROGRAM_ERROR_POSITION_NV returns a ubyte offset
1292    into the last loaded program string indicating where the first error in
1293    the program occurs.  If the program fails to load because of a semantic
1294    restriction that cannot be determined until the program is fully scanned,
1295    the error position will be len, the length of the program.  If the program
1296    loads successfully, the value of PROGRAM_ERROR_POSITION_NV is assigned the
1297    value negative one.
1298
1299
1300    Section 2.14.1.9,  Vertex Program Binding and Program Management
1301
1302    The current vertex program is invoked whenever vertex attribute zero is
1303    updated (whether by a VertexAttribNV or Vertex command).  The current
1304    vertex program is updated by
1305
1306      BindProgramNV(enum target, uint id);
1307
1308    where target must be VERTEX_PROGRAM_NV.  This binds the vertex program
1309    named by id as the current vertex program. The error INVALID_OPERATION
1310    is generated if id names a program that is not a vertex program
1311    (for example, if id names a vertex state program as described in
1312    section 2.14.4).
1313
1314    Binding to a nonexistent program id does not generate an error.
1315    In particular, binding to program id zero does not generate an error.
1316    However, because program zero cannot be loaded, program zero is
1317    always nonexistent.  If a program id is successfully loaded with a
1318    new vertex program and id is also the currently bound vertex program,
1319    the new program is considered the currently bound vertex program.
1320
1321    The INVALID_OPERATION error is generated when both vertex program
1322    mode is enabled and Begin is called (or when a command that performs
1323    an implicit Begin is called) if the current vertex program is
1324    nonexistent or not valid.  A vertex program may not be valid for
1325    reasons explained in section 2.14.5.
1326
1327    Programs are deleted by calling
1328
1329      void DeleteProgramsNV(sizei n, const uint *ids);
1330
1331    ids contains n names of programs to be deleted.  After a program
1332    is deleted, it becomes nonexistent, and its name is again unused.
1333    If a program that is currently bound is deleted, it is as though
1334    BindProgramNV has been executed with the same target as the deleted
1335    program and program zero.  Unused names in ids are silently ignored,
1336    as is the value zero.
1337
1338    The command
1339
1340      void GenProgramsNV(sizei n, uint *ids);
1341
1342    returns n previously unused program names in ids.  These names
1343    are marked as used, for the purposes of GenProgramsNV only,
1344    but they become existent programs only when they are first loaded
1345    using LoadProgramNV.  The error INVALID_VALUE is generated if n
1346    is negative.
1347
1348    An implementation may choose to establish a working set of programs on
1349    which binding and ExecuteProgramNV operations (execute programs are
1350    explained in section 2.14.4) are performed with higher performance.
1351    A program that is currently part of this working set is said to
1352    be resident.
1353
1354    The command
1355
1356      boolean AreProgramsResidentNV(sizei n, const uint *ids,
1357                                    boolean *residences);
1358
1359    returns TRUE if all of the n programs named in ids are resident,
1360    or if the implementation does not distinguish a working set.  If at
1361    least one of the programs named in ids is not resident, then FALSE is
1362    returned, and the residence of each program is returned in residences.
1363    Otherwise the contents of residences are not changed.  If any of
1364    the names in ids are nonexistent or zero, FALSE is returned, the
1365    error INVALID_VALUE is generated, and the contents of residences
1366    are indeterminate.  The residence status of a single named program
1367    can also be queried by calling GetProgramivNV with id set to the
1368    name of the program and pname set to PROGRAM_RESIDENT_NV.
1369
1370    AreProgramsResidentNV indicates only whether a program is
1371    currently resident, not whether it could be made resident.
1372    An implementation may choose to make a program resident only on
1373    first use, for example.  The client may guide the GL implementation
1374    in determining which programs should be resident by requesting a
1375    set of programs to make resident.
1376
1377    The command
1378
1379      void RequestResidentProgramsNV(sizei n, const uint *ids);
1380
1381    requests that the n programs named in ids should be made resident.
1382    While not every program is guaranteed to become resident,
1383    the implementation should make a best effort to make as many of
1384    the programs resident as possible.  As a result of making the
1385    requested programs resident, program names not among the requested
1386    programs may become non-resident.  Higher priority for residency
1387    should be given to programs listed earlier in the ids array.
1388    RequestResidentProgramsNV silently ignores nonexistent program names
1389    and the value zero in ids.  AreProgramsResidentNV can be
1390    called after RequestResidentProgramsNV to determine which programs
1391    actually became resident.
1392
1393
1394    Section 2.14.2,  Vertex Program Operation
1395
1396    In the VP1 execution environment, there are twenty-one vertex program
1397    instructions.  Four instructions (ABS, DPH, RCC, and SUB) are available
1398    only in the VP1.1 execution environment.  The instructions and their
1399    respective input and output parameters are summarized in Table X.4.
1400
1401      Instruction    Inputs  Output   Description
1402      -----------    ------  ------   --------------------------------
1403      ABS(*)         v       v        absolute value
1404      ADD            v,v     v        add
1405      ARL            v       as       address register load
1406      DP3            v,v     ssss     3-component dot product
1407      DP4            v,v     ssss     4-component dot product
1408      DPH(*)         v,v     ssss     homogeneous dot product
1409      DST            v,v     v        distance vector
1410      EXP            s       v        exponential base 2 (approximate)
1411      LIT            v       v        compute light coefficients
1412      LOG            s       v        logarithm base 2 (approximate)
1413      MAD            v,v,v   v        multiply and add
1414      MAX            v,v     v        maximum
1415      MIN            v,v     v        minimum
1416      MOV            v       v        move
1417      MUL            v,v     v        multiply
1418      RCC(*)         s       ssss     reciprocal (clamped)
1419      RCP            s       ssss     reciprocal
1420      RSQ            s       ssss     reciprocal square root
1421      SGE            v,v     v        set on greater than or equal
1422      SLT            v,v     v        set on less than
1423      SUB(*)         v,v     v        subtract
1424
1425    Table X.4:  Summary of vertex program instructions in the VP1 execution
1426    environment.  "v" indicates a floating-point vector input or output, "s"
1427    indicates a floating-point scalar input, "ssss" indicates a scalar output
1428    replicated across a 4-component vector, "as" indicates a single component
1429    of an address register.
1430
1431
1432    In the VP2 execution environment, there are thirty-nine vertex program
1433    instructions.  Vertex program instructions may have an optional suffix of
1434    "C" to allow an update of the condition code register (section 2.14.1.6).
1435    For example, there are two instructions to perform vector addition, "ADD"
1436    and "ADDC".  The vertex program instructions available in the VP2
1437    execution environment and their respective input and output parameters are
1438    summarized in Table X.5.
1439
1440      Instruction    Inputs  Output   Description
1441      -----------    ------  ------   --------------------------------
1442      ABS[C]         v       v        absolute value
1443      ADD[C]         v,v     v        add
1444      ARA[C]         av      av       address register add
1445      ARL[C]         v       av       address register load
1446      ARR[C]         v       av       address register load (with round)
1447      BRA            as      none     branch
1448      CAL            as      none     subroutine call
1449      COS[C]         s       ssss     cosine
1450      DP3[C]         v,v     ssss     3-component dot product
1451      DP4[C]         v,v     ssss     4-component dot product
1452      DPH[C]         v,v     ssss     homogeneous dot product
1453      DST[C]         v,v     v        distance vector
1454      EX2[C]         s       ssss     exponential base 2
1455      EXP[C]         s       v        exponential base 2 (approximate)
1456      FLR[C]         v       v        floor
1457      FRC[C]         v       v        fraction
1458      LG2[C]         s       ssss     logarithm base 2
1459      LIT[C]         v       v        compute light coefficients
1460      LOG[C]         s       v        logarithm base 2 (approximate)
1461      MAD[C]         v,v,v   v        multiply and add
1462      MAX[C]         v,v     v        maximum
1463      MIN[C]         v,v     v        minimum
1464      MOV[C]         v       v        move
1465      MUL[C]         v,v     v        multiply
1466      RCC[C]         s       ssss     reciprocal (clamped)
1467      RCP[C]         s       ssss     reciprocal
1468      RET            none    none     subroutine call return
1469      RSQ[C]         s       ssss     reciprocal square root
1470      SEQ[C]         v,v     v        set on equal
1471      SFL[C]         v,v     v        set on false
1472      SGE[C]         v,v     v        set on greater than or equal
1473      SGT[C]         v,v     v        set on greater than
1474      SIN[C]         s       ssss     sine
1475      SLE[C]         v,v     v        set on less than or equal
1476      SLT[C]         v,v     v        set on less than
1477      SNE[C]         v,v     v        set on not equal
1478      SSG[C]         v       v        set sign
1479      STR[C]         v,v     v        set on true
1480      SUB[C]         v,v     v        subtract
1481
1482    Table X.5:  Summary of vertex program instructions in the VP2 execution
1483    environment.  "v" indicates a floating-point vector input or output, "s"
1484    indicates a floating-point scalar input, "ssss" indicates a scalar output
1485    replicated across a 4-component vector, "av" indicates a full address
1486    register, "as" indicates a single component of an address register.
1487
1488
1489    Section 2.14.2.1,  Vertex Program Operands
1490
1491    Most vertex program instructions operate on floating-point vectors,
1492    floating-point scalars, or integer scalars, as indicated in the grammar
1493    (see section 2.14.1.8) by the rules <vectorSrc>, <scalarSrc>, and
1494    <scalarAddr>, respectively.
1495
1496    The basic set of floating-point scalar operands is defined by the grammar
1497    rule <baseScalarSrc>.  Scalar operands are single components of vertex
1498    attribute, program parameter, or temporary registers, as allowed by the
1499    <srcRegister> rule.  A vector component is selected by the <scalarSuffix>
1500    rule, where the characters "x", "y", "z", and "w" select the x, y, z, and
1501    w components, respectively, of the vector.
1502
1503    The basic set of floating-point vector operands is defined by the grammar
1504    rule <baseVectorSrc>.  Vector operands can be obtained from vertex
1505    attribute, program parameter, or temporary registers as allowed by the
1506    <srcRegister> rule.
1507
1508    Basic vector operands can be swizzled according to the <swizzleSuffix>
1509    rule.  In its most general form, the <swizzleSuffix> rule matches the
1510    pattern ".????" where each question mark is replaced with one of "x", "y",
1511    "z", or "w".  For such patterns, the x, y, z, and w components of the
1512    operand are taken from the vector components named by the first, second,
1513    third, and fourth character of the pattern, respectively.  For example, if
1514    the swizzle suffix is ".yzzx" and the specified source contains {2,8,9,0},
1515    the swizzled operand used by the instruction is {8,9,9,2}.
1516
1517    If the <swizzleSuffix> rule matches "", it is treated as though it were
1518    ".xyzw".  If the <swizzleSuffix> rule matches (ignoring whitespace) ".x",
1519    ".y", ".z", or ".w", these are treated the same as ".xxxx", ".yyyy",
1520    ".zzzz", and ".wwww" respectively.
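
    The swizzle rules above can be sketched in C as follows.  This is an
    illustrative sketch only, not part of the extension:  the names "Vec4",
    "component_index", and "apply_swizzle" are hypothetical, and the sketch
    assumes the suffix has already been stripped of whitespace.

```c
#include <stddef.h>
#include <string.h>

typedef struct { float x, y, z, w; } Vec4;   /* hypothetical vector type */

/* Map one swizzle character to a component index, or -1 if invalid. */
static int component_index(char c)
{
    switch (c) {
    case 'x': return 0;
    case 'y': return 1;
    case 'z': return 2;
    case 'w': return 3;
    default:  return -1;
    }
}

/* Apply a swizzle suffix such as ".yzzx", ".w", or "" to src.
 * Returns 0 on success, -1 if the suffix is malformed. */
int apply_swizzle(const char *suffix, Vec4 src, Vec4 *out)
{
    const float in[4] = { src.x, src.y, src.z, src.w };
    int sel[4] = { 0, 1, 2, 3 };          /* "" behaves like ".xyzw" */
    size_t len = strlen(suffix);

    if (len != 0) {
        if (suffix[0] != '.' || (len != 2 && len != 5))
            return -1;
        for (size_t i = 0; i < 4; i++) {
            /* ".c" replicates one component: ".y" behaves like ".yyyy" */
            char c = (len == 2) ? suffix[1] : suffix[1 + i];
            sel[i] = component_index(c);
            if (sel[i] < 0)
                return -1;
        }
    }
    out->x = in[sel[0]];
    out->y = in[sel[1]];
    out->z = in[sel[2]];
    out->w = in[sel[3]];
    return 0;
}
```

    With a source of {2,8,9,0}, the suffix ".yzzx" yields {8,9,9,2}, matching
    the example in the text, and ".y" yields {8,8,8,8}.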
1521
1522    Floating-point scalar or vector operands can optionally be negated
1523    according to the <negate> rules in <baseScalarSrc> and <baseVectorSrc>.
1524    If the <negate> matches "-", each operand or operand component is negated.
1525
1526    In the VP2 execution environment, a component-wise absolute value
1527    operation is performed on an operand if the <scalarSrc> or <vectorSrc>
1528    rules match <vp2-absScalarSrc> or <vp2-absVectorSrc>.  In this case, the
1529    absolute value of each component of the operand is taken.  In addition, if
1530    the <negate> rule in <vp2-absScalarSrc> or <vp2-absVectorSrc> matches "-",
1531    each component is subsequently negated.
1532
1533    Integer scalar operands are single components of one of the address
1534    register vectors, as identified by the <addrRegister> rule.  A vector
1535    component is selected by the <scalarSuffix> rule in the same manner as
1536    floating-point scalar operands.  Negation and absolute value operations
1537    are not available for integer scalar operands.
1538
1539    The following pseudo-code spells out the operand generation process.  In
1540    the pseudo-code, "float" and "int" are floating-point and integer scalar
1541    types, while "floatVec" and "intVec" are four-component vectors.  "source"
1542    is the register used for the operand, matching the <srcRegister> or
1543    <addrRegister> rules.  "absolute" is TRUE if the operand matches the
1544    <vp2-absScalarSrc> or <vp2-absVectorSrc> rules, and FALSE otherwise.
1545    "negateBase" is TRUE if the <negate> rule in <baseScalarSrc> or
1546    <baseVectorSrc> matches "-" and FALSE otherwise.  "negateAbs" is TRUE if
1547    the <negate> rule in <vp2-absScalarSrc> or <vp2-absVectorSrc> matches "-"
1548    and FALSE otherwise.  The ".c***", ".*c**", ".**c*", ".***c" modifiers
1549    refer to the x, y, z, and w components obtained by the swizzle operation.
1550
1551      floatVec VectorLoad(floatVec source)
1552      {
1553          floatVec operand;
1554
1555          operand.x = source.c***;
1556          operand.y = source.*c**;
1557          operand.z = source.**c*;
1558          operand.w = source.***c;
1559          if (negateBase) {
1560             operand.x = -operand.x;
1561             operand.y = -operand.y;
1562             operand.z = -operand.z;
1563             operand.w = -operand.w;
1564          }
1565          if (absolute) {
1566             operand.x = abs(operand.x);
1567             operand.y = abs(operand.y);
1568             operand.z = abs(operand.z);
1569             operand.w = abs(operand.w);
1570          }
1571          if (negateAbs) {
1572             operand.x = -operand.x;
1573             operand.y = -operand.y;
1574             operand.z = -operand.z;
1575             operand.w = -operand.w;
1576          }
1577
1578          return operand;
1579      }
1580
1581      float ScalarLoad(floatVec source)
1582      {
1583          float operand;
1584
1585          operand = source.c***;
1586          if (negateBase) {
1587            operand = -operand;
1588          }
1589          if (absolute) {
1590             operand = abs(operand);
1591          }
1592          if (negateAbs) {
1593            operand = -operand;
1594          }
1595
1596          return operand;
1597      }
1598
1599      intVec AddrVectorLoad(intVec source)
1600      {
1601          intVec operand;
1602
1603          operand.x = source.c***;
1604          operand.y = source.*c**;
1605          operand.z = source.**c*;
1606          operand.w = source.***c;
1607
1608          return operand;
1609      }
1610
1611      int AddrScalarLoad(intVec source)
1612      {
1613          return source.c***;
1614      }
1615
1616    If an operand is obtained from a program parameter register, by matching
1617    the <progParamRegister> rule, the register number can be obtained by
1618    absolute or relative addressing.
1619
1620    When absolute addressing is used, by matching the <absProgParamReg> rule,
1621    the program parameter register number is the number matching the
1622    <progParamRegNum>.
1623
1624    When relative addressing is used, by matching the <relProgParamReg> rule,
1625    the program parameter register number is computed during program
1626    execution.  An index is computed by adding the integer scalar operand
1627    specified by the <scalarAddr> rule to the positive or negative offset
1628    specified by the <progParamOffset> rule.  If <progParamOffset> matches "",
1629    an offset of zero is used.
1630
1631    The following pseudo-code spells out the process of loading a program
1632    parameter.  "addrReg" refers to the address register used for relative
1633    addressing, "absolute" is TRUE if the operand uses absolute addressing and
1634    FALSE otherwise.  "paramNumber" is the program parameter number for
1635    absolute addressing; "paramOffset" is the program parameter offset for
1636    relative addressing.  "paramRegister" is an array holding the complete set
1637    of program parameter registers.
1638
1639      floatVec ProgramParameterLoad(intVec addrReg)
1640      {
1641        int index;
1642
1643        if (absolute) {
1644          index = paramNumber;
1645        } else {
1646          index = AddrScalarLoad(addrReg) + paramOffset;
1647        }
1648
1649        return paramRegister[index];
1650      }
1651
1652
1653    Section 2.14.2.2,  Vertex Program Destination Register Update
1654
1655    Most vertex program instructions write a 4-component result vector to a
1656    single temporary, vertex result, or address register.  Writes to
1657    individual components of the destination register are controlled by
1658    individual component write masks specified as part of the instruction.  In
1659    the VP2 execution environment, writes are additionally controlled by a
1660    condition code write mask, which is computed at run time.
1661
1662    The component write mask is specified by the <optionalWriteMask> rule
1663    found in the <maskedDstReg> or <maskedAddrReg> rule.  If the optional mask
1664    is "", all components are enabled.  Otherwise, the optional mask names the
1665    individual components to enable.  The characters "x", "y", "z", and "w"
1666    match the x, y, z, and w components respectively.  For example, an
1667    optional mask of ".xzw" indicates that the x, z, and w components should
1668    be enabled for writing but the y component should not.  The grammar
1669    requires that the destination register mask components must be listed in
1670    "xyzw" order.
1671
1672    In the VP2 execution environment, the condition code write mask is
1673    specified by the <optionalCCMask> rule found in the <maskedDstReg> and
1674    <maskedAddrReg> rules.  If the condition code mask matches "", all
1675    components are enabled.  Otherwise, the condition code register is loaded
1676    and swizzled according to the swizzle codes specified by <swizzleSuffix>.
1677    Each component of the swizzled condition code is tested according to the
1678    rule given by <ccMaskRule>.  <ccMaskRule> may have the values "EQ", "NE",
1679    "LT", "GE", "LE", or "GT", which mean to enable writes if the corresponding
1680    condition code field evaluates to equal, not equal, less than, greater
1681    than or equal, less than or equal, or greater than, respectively.
1682    Comparisons involving condition codes of "UN" (unordered) evaluate to true
1683    for "NE" and false otherwise.  For example, if the condition code is
1684    (GT,LT,EQ,GT) and the condition code mask is "(NE.zyxw)", the swizzle
1685    operation will load (EQ,LT,GT,GT) and the mask will thus enable
1686    writes on the y, z, and w components.  In addition, "TR" always enables
1687    writes and "FL" always disables writes, regardless of the condition code.
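
    As a rough illustration of the condition code write mask described above,
    the following C sketch computes the set of enabled components from a
    condition code register, a mask rule, and a swizzle.  The type and
    function names ("CondCode", "MaskRule", "test_cc", "cc_write_mask") and
    the bit layout of the returned mask are assumptions made for this sketch
    and are not defined by the extension.

```c
typedef enum { CC_EQ, CC_LT, CC_GT, CC_UN } CondCode;
typedef enum { RULE_EQ, RULE_NE, RULE_LT, RULE_GE, RULE_LE, RULE_GT,
               RULE_TR, RULE_FL } MaskRule;

/* Test one condition code field against a mask rule. */
static int test_cc(MaskRule rule, CondCode field)
{
    switch (rule) {
    case RULE_EQ: return field == CC_EQ;
    case RULE_NE: return field != CC_EQ;   /* true for UN as well */
    case RULE_LT: return field == CC_LT;
    case RULE_GE: return field == CC_GT || field == CC_EQ;
    case RULE_LE: return field == CC_LT || field == CC_EQ;
    case RULE_GT: return field == CC_GT;
    case RULE_TR: return 1;
    case RULE_FL: return 0;
    }
    return 0;
}

/* Compute the run-time write mask.  cc[] holds the x, y, z, and w fields of
 * the condition code register; swizzle[] holds the source component index
 * (0 = x ... 3 = w) selected for each destination component.
 * Bit 0 of the result enables x, bit 1 y, bit 2 z, and bit 3 w. */
unsigned cc_write_mask(const CondCode cc[4], MaskRule rule,
                       const int swizzle[4])
{
    unsigned mask = 0;

    for (int i = 0; i < 4; i++) {
        if (test_cc(rule, cc[swizzle[i]]))
            mask |= 1u << i;
    }
    return mask;
}
```

    For the condition code (GT,LT,EQ,GT) with the mask "(NE.zyxw)", the
    swizzle indices are {2,1,0,3} and the sketch returns 0xE, enabling the
    y, z, and w components.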
1688
1689    Each component of the destination register is updated with the result of
1690    the vertex program instruction if and only if the component is enabled for
1691    writes by the component write mask, and the optional condition code mask
1692    (if applicable).  Otherwise, the component of the destination register
1693    remains unchanged.
1694
1695    In the VP2 execution environment, a vertex program instruction can also
1696    optionally update the condition code register.  The condition code is
1697    updated if the condition code register update suffix "C" is present in the
1698    instruction.  The instruction "ADDC" will update the condition code; the
1699    otherwise equivalent instruction "ADD" will not.  If condition code
1700    updates are enabled, each component of the destination register enabled
1701    for writes is compared to zero.  The corresponding component of the
1702    condition code is set to "LT", "EQ", or "GT", if the written component is
1703    less than, equal to, or greater than zero, respectively.  Condition code
1704    components are set to "UN" if the written component is NaN.  Values of
1705    -0.0 and +0.0 both evaluate to "EQ".  If a component of the destination
1706    register is not enabled for writes, the corresponding condition code
1707    component is also unchanged.
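
    The mapping from a written component to a condition code field can be
    sketched in C as shown below.  The names "CondCode" and "generate_cc"
    are hypothetical.  The sketch tests for NaN with isnan(), because an
    equality comparison against NaN is never true under IEEE 754.

```c
#include <math.h>

typedef enum { CC_EQ, CC_LT, CC_GT, CC_UN } CondCode;

/* Classify one written component as LT, EQ, GT, or UN. */
CondCode generate_cc(float value)
{
    if (isnan(value))
        return CC_UN;
    if (value < 0.0f)
        return CC_LT;
    if (value == 0.0f)
        return CC_EQ;   /* -0.0f == 0.0f, so both zeros land here */
    return CC_GT;
}
```

    Applied to the components of (-2, 0, 2, NaN), this yields (LT,EQ,GT,UN).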
1708
1709    In the following example code,
1710
1711        # R1=(-2, 0, 2, NaN)              R0                  CC
1712        MOVC R0, R1;               # ( -2,  0,   2, NaN) (LT,EQ,GT,UN)
1713        MOVC R0.xyz, R1.yzwx;      # (  0,  2, NaN, NaN) (EQ,GT,UN,UN)
1714        MOVC R0 (NE), R1.zywx;     # (  0,  0, NaN,  -2) (EQ,EQ,UN,LT)
1715
1716    the first instruction writes (-2,0,2,NaN) to R0 and updates the condition
1717    code to (LT,EQ,GT,UN).  In the second instruction, only the "x", "y", and "z"
1718    components of R0 and the condition code are updated, so R0 ends up with
1719    (0,2,NaN,NaN) and the condition code ends up with (EQ,GT,UN,UN).  In the
1720    third instruction, the condition code mask disables writes to the x
1721    component (its condition code field is "EQ"), so R0 ends up with
1722    (0,0,NaN,-2) and the condition code ends up with (EQ,EQ,UN,LT).
1723
1724    The following pseudocode illustrates the process of writing a result
1725    vector to the destination register.  In the pseudocode, "instrMask" refers
1726    to the component write mask given by the <optionalWriteMask> rule.  In the
1727    VP1 execution environment, "ccMaskRule" is always "" and "updatecc" is
1728    always FALSE.  In the VP2 execution environment, "ccMaskRule" refers to
1729    the condition code mask rule given by <vp2-optionalCCMask> and "updatecc"
1730    is TRUE if and only if condition code updates are enabled.  "result",
1731    "destination", and "cc" refer to the result vector, the register selected
1732    by <dstRegister> and the condition code, respectively.  Condition codes do
1733    not exist in the VP1 execution environment.
1734
1735      boolean TestCC(CondCode field) {
1736          switch (ccMaskRule) {
1737          case "EQ":  return (field == "EQ");
1738          case "NE":  return (field != "EQ");
1739          case "LT":  return (field == "LT");
1740          case "GE":  return (field == "GT" || field == "EQ");
1741          case "LE":  return (field == "LT" || field == "EQ");
1742          case "GT":  return (field == "GT");
1743          case "TR":  return TRUE;
1744          case "FL":  return FALSE;
1745          case "":    return TRUE;
1746          }
1747      }
1748
1749      CondCode GenerateCC(float value) {
1750        if (isNaN(value)) {
1751          return UN;
1752        } else if (value < 0) {
1753          return LT;
1754        } else if (value == 0) {
1755          return EQ;
1756        } else {
1757          return GT;
1758        }
1759      }
1760
1761      void UpdateDestination(floatVec destination, floatVec result)
1762      {
1763          floatVec merged;
1764          ccVec    mergedCC;
1765
1766          // Merge the converted result into the destination register, under
1767          // control of the compile- and run-time write masks.
1768          merged = destination;
1769          mergedCC = cc;
1770          if (instrMask.x && TestCC(cc.c***)) {
1771              merged.x = result.x;
1772              if (updatecc) mergedCC.x = GenerateCC(result.x);
1773          }
1774          if (instrMask.y && TestCC(cc.*c**)) {
1775              merged.y = result.y;
1776              if (updatecc) mergedCC.y = GenerateCC(result.y);
1777          }
1778          if (instrMask.z && TestCC(cc.**c*)) {
1779              merged.z = result.z;
1780              if (updatecc) mergedCC.z = GenerateCC(result.z);
1781          }
1782          if (instrMask.w && TestCC(cc.***c)) {
1783              merged.w = result.w;
1784              if (updatecc) mergedCC.w = GenerateCC(result.w);
1785          }
1786
1787          // Write out the new destination register and condition code.
1788          destination = merged;
1789          cc = mergedCC;
1790      }
1791
1792    Section 2.14.2.3, Vertex Program Execution
1793
1794    In the VP1 execution environment, vertex programs consist of a sequence of
1795    instructions with no support for branching.  Vertex programs begin by
1796    executing the first instruction in the program, and execute instructions
1797    in the order specified in the program until the last instruction is
1798    reached.
1799
1800    VP2 vertex programs can contain one or more instruction labels, matching
1801    the grammar rule <vp2-instructionLabel>.  An instruction label can be
1802    referred to explicitly in branch (BRA) or subroutine call (CAL)
1803    instructions.  Instruction labels can be defined or used at any point in
1804    the body of a program, and can be used in instructions before being
1805    defined in the program string.
1806
1807    VP2 vertex program branching instructions can be conditional.  The branch
1808    condition is specified by the <vp2-conditionMask> and may depend on the
1809    contents of the condition code register.  Branch conditions are evaluated
1810    by evaluating a condition code write mask in exactly the same manner as
1811    done for register writes (section 2.14.2.2).  If any of the four
1812    components of the condition code write mask are enabled, the branch is
1813    taken and execution continues with the instruction following the label
1814    specified in the instruction.  Otherwise, the instruction is ignored and
1815    vertex program execution continues with the next instruction.  In the
1816    following example code,
1817
1818        MOVC CC, c[0];         # c[0]=(-2, 0, 2, NaN), CC gets (LT,EQ,GT,UN)
1819        BRA label1 (LT.xyzw);
1820        MOV R0,R1;             # not executed
1821      label1:
1822        BRA label2 (LT.wyzw);
1823        MOV R0,R2;             # executed
1824      label2:
1825
1826    the first BRA instruction loads a condition code of (LT,EQ,GT,UN) while
1827    the second BRA instruction loads a condition code of (UN,EQ,GT,UN).  The
1828    first branch will be taken because the "x" component evaluates to LT; the
1829    second branch will not be taken because no component evaluates to LT.
1830
1831    VP2 vertex programs can specify subroutine calls.  When a subroutine call
1832    (CAL) instruction is executed, a reference to the instruction immediately
1833    following the CAL instruction is pushed onto the call stack.  When a
1834    subroutine return (RET) instruction is executed, an instruction reference
1835    is popped off the call stack and program execution continues with the
1836    popped instruction.  A vertex program will terminate if a CAL instruction
1837    is executed with four entries already in the call stack or if a RET
1838    instruction is executed with an empty call stack.
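
    The call stack behavior described above can be modeled with a small C
    sketch.  It assumes instructions are identified by integer indices and
    uses -1 to signal program termination.  Both conventions, and the names
    "CallStack", "do_cal", and "do_ret", are choices made for this
    illustration.  Conditional execution of CAL and RET is not modeled.

```c
#define MAX_CALL_DEPTH 4   /* four levels of subroutine calls */

typedef struct {
    int return_addr[MAX_CALL_DEPTH];
    int depth;
} CallStack;

/* Execute a CAL located at instruction index pc that jumps to target.
 * Returns the next instruction index, or -1 if the program terminates
 * because four entries are already on the call stack. */
int do_cal(CallStack *cs, int pc, int target)
{
    if (cs->depth == MAX_CALL_DEPTH)
        return -1;
    cs->return_addr[cs->depth++] = pc + 1;   /* instruction after the CAL */
    return target;
}

/* Execute a RET.  Returns the popped instruction index, or -1 if the
 * program terminates because the call stack is empty. */
int do_ret(CallStack *cs)
{
    if (cs->depth == 0)
        return -1;
    return cs->return_addr[--cs->depth];
}
```

    A CAL at index 10 followed by a RET resumes execution at index 11.  A
    fifth nested CAL, or a RET on an empty stack, returns -1.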
1839
1840    If a VP2 vertex program has an instruction label "main", program execution
1841    begins with the instruction immediately following the instruction label.
1842    Otherwise, program execution begins with the first instruction of the
1843    program.  Instructions will be executed sequentially in the order
1844    specified in the program, although branch instructions will affect the
1845    instruction execution order, as described above.  A vertex program will
1846    terminate after executing a RET instruction with an empty call stack.  A
1847    vertex program will also terminate after executing the last instruction in
1848    the program, unless that instruction was a taken branch.
1849
1850    A vertex program will fail to load if an instruction refers to a label
1851    that is not defined in the program string.
1852
1853    A vertex program will terminate abnormally if a subroutine call
1854    instruction produces a call stack overflow.  Additionally, a vertex
1855    program will terminate abnormally after executing 65536 instructions to
1856    prevent hangs caused by infinite loops in the program.
1857
1858    When a vertex program terminates, normally or abnormally, it will emit a
1859    vertex whose attributes are taken from the final values of the vertex
1860    result registers (section 2.14.1.5).
1861
1862
1863    Section 2.14.3,  Vertex Program Instruction Set
1864
1865    The following sections describe the set of supported vertex program
1866    instructions.  Instructions available only in the VP1.1 or VP2 execution
1867    environment will be noted in the instruction description.
1868
1869    Each section will contain pseudocode describing the instruction.
1870    Instructions will have up to three operands, referred to as "op0", "op1",
1871    and "op2".  The operands are loaded using the mechanisms specified in
1872    section 2.14.2.1.  Most instructions will generate a result vector called
1873    "result".  The result vector is then written to the destination register
1874    specified in the instruction using the mechanisms specified in section
1875    2.14.2.2.
1876
1877    Operands and results are represented as 32-bit single-precision
1878    floating-point numbers according to the IEEE 754 floating-point
1879    specification.  IEEE denorm encodings, used to represent numbers smaller
1880    than 2^-126, are not supported.  All such numbers are flushed to zero.
1881    There are three special encodings referred to in this section:  +INF means
1882    "positive infinity", -INF means "negative infinity", and NaN refers to
1883    "not a number".
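
    The flush-to-zero treatment of denorms can be approximated on a host CPU
    with the C sketch below.  The function name "flush_denorm" is
    hypothetical.  Preserving the sign of the flushed value is an assumption
    of this sketch, since the text above only states that such numbers are
    flushed to zero.

```c
#include <float.h>

/* Flush IEEE single-precision denorms (magnitude below 2^-126) to zero. */
float flush_denorm(float v)
{
    /* FLT_MIN is the smallest normalized float, 2^-126. */
    if (v != 0.0f && v > -FLT_MIN && v < FLT_MIN)
        return (v < 0.0f) ? -0.0f : 0.0f;   /* sign kept: an assumption */
    return v;   /* normals, zeros, infinities, and NaN pass through */
}
```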
1884
1885    Arithmetic operations are typically carried out in single precision
1886    according to the rules specified in the IEEE 754 specification.  Any
1887    exceptions and special cases will be noted in the instruction description.
1888
1889
1890    Section 2.14.3.1,  ABS:  Absolute Value
1891
1892    The ABS instruction performs a component-wise absolute value operation on
1893    the single operand to yield a result vector.
1894
1895      tmp = VectorLoad(op0);
1896      result.x = abs(tmp.x);
1897      result.y = abs(tmp.y);
1898      result.z = abs(tmp.z);
1899      result.w = abs(tmp.w);
1900
1901    The following special-case rules apply to the absolute value operation:
1902
1903      1. abs(NaN) = NaN.
1904      2. abs(-INF) = abs(+INF) = +INF.
1905      3. abs(-0.0) = abs(+0.0) = +0.0.
1906
1907    The ABS instruction is available only in the VP1.1 and VP2 execution
1908    environments.
1909
1910    In the VP1.0 execution environment, the same functionality can be achieved
1911    with "MAX result, src, -src".
1912
1913    In the VP2 execution environment, the ABS instruction is effectively
1914    obsolete, since instructions can take the absolute value of each operand
1915    at no cost.
1916
1917
1918    Section 2.14.3.2,  ADD:  Add
1919
1920    The ADD instruction performs a component-wise add of the two operands to
1921    yield a result vector.
1922
1923      tmp0 = VectorLoad(op0);
1924      tmp1 = VectorLoad(op1);
1925      result.x = tmp0.x + tmp1.x;
1926      result.y = tmp0.y + tmp1.y;
1927      result.z = tmp0.z + tmp1.z;
1928      result.w = tmp0.w + tmp1.w;
1929
1930    The following special-case rules apply to addition:
1931
1932      1. "A+B" is always equivalent to "B+A".
1933      2. NaN + <x> = NaN, for all <x>.
1934      3. +INF + <x> = +INF, for all <x> except NaN and -INF.
1935      4. -INF + <x> = -INF, for all <x> except NaN and +INF.
1936      5. +INF + -INF = NaN.
1937      6. -0.0 + <x> = <x>, for all <x>.
1938      7. +0.0 + <x> = <x>, for all <x> except -0.0.
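
    Rules 2 through 7 agree with IEEE 754 addition in the default
    round-to-nearest mode, so they can be spot-checked on a host CPU.  The
    sketch below assumes the host compiler implements IEEE 754
    single-precision arithmetic without relaxed floating-point options.  The
    function name "host_add_matches_rules" is hypothetical.

```c
#include <math.h>

/* Returns 1 if host float addition agrees with the sampled rules, else 0. */
int host_add_matches_rules(void)
{
    /* volatile keeps the additions from being folded at compile time */
    volatile float pinf = INFINITY, ninf = -INFINITY, qnan = NAN;
    volatile float pz = 0.0f, nz = -0.0f, one = 1.0f;
    float r;

    if (!isnan(qnan + one))  return 0;                       /* rule 2 */
    if (pinf + one != pinf)  return 0;                       /* rule 3 */
    if (ninf + one != ninf)  return 0;                       /* rule 4 */
    if (!isnan(pinf + ninf)) return 0;                       /* rule 5 */
    r = nz + nz;  if (r != 0.0f || !signbit(r)) return 0;    /* rule 6, -0.0 */
    r = nz + pz;  if (r != 0.0f || signbit(r))  return 0;    /* rule 6, +0.0 */
    r = pz + one; if (r != one) return 0;                    /* rule 7 */
    return 1;
}
```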
1939
1940
1941    Section 2.14.3.3,  ARA:  Address Register Add
1942
1943    The ARA instruction adds two pairs of components of a vector address
1944    register operand to produce an integer result vector.  The "x" and "z"
1945    components of the result vector contain the sum of the "x" and "z"
1946    components of the operand; the "y" and "w" components of the result vector
1947    contain the sum of the "y" and "w" components of the operand.  Each
1948    component of the result vector is clamped to [-512, +511], the range of
1949    representable address register components.
1950
1951      itmp = AddrVectorLoad(op0);
1952      iresult.x = itmp.x + itmp.z;
1953      iresult.y = itmp.y + itmp.w;
1954      iresult.z = itmp.x + itmp.z;
1955      iresult.w = itmp.y + itmp.w;
1956      if (iresult.x < -512) iresult.x = -512;
1957      if (iresult.x > 511)  iresult.x = 511;
1958      if (iresult.y < -512) iresult.y = -512;
1959      if (iresult.y > 511)  iresult.y = 511;
1960      if (iresult.z < -512) iresult.z = -512;
1961      if (iresult.z > 511)  iresult.z = 511;
1962      if (iresult.w < -512) iresult.w = -512;
1963      if (iresult.w > 511)  iresult.w = 511;
1964
1965    Component swizzling is not supported when the operand is loaded.
1966
1967    The ARA instruction is available only in the VP2 execution environment.
1968
1969
1970    Section 2.14.3.4,  ARL:  Address Register Load
1971
1972    In the VP1 execution environment, the ARL instruction loads a single
1973    scalar operand and performs a floor operation to generate an integer
1974    scalar to be written to the address register.
1975
1976      tmp = ScalarLoad(op0);
1977      iresult.x = floor(tmp);
1978
1979    In the VP2 execution environment, the ARL instruction loads a single
1980    vector operand and performs a component-wise floor operation to generate
1981    an integer result vector.  Each component of the result vector is clamped
1982    to [-512, +511], the range of representable address register components.
    The ARL instruction applies all masking operations to address register
    writes as described in section 2.14.2.2.
1985
1986      tmp = VectorLoad(op0);
1987      iresult.x = floor(tmp.x);
1988      iresult.y = floor(tmp.y);
1989      iresult.z = floor(tmp.z);
1990      iresult.w = floor(tmp.w);
1991      if (iresult.x < -512) iresult.x = -512;
1992      if (iresult.x > 511)  iresult.x = 511;
1993      if (iresult.y < -512) iresult.y = -512;
1994      if (iresult.y > 511)  iresult.y = 511;
1995      if (iresult.z < -512) iresult.z = -512;
1996      if (iresult.z > 511)  iresult.z = 511;
1997      if (iresult.w < -512) iresult.w = -512;
1998      if (iresult.w > 511)  iresult.w = 511;
1999
2000    The following special-case rules apply to floor computation:
2001
2002      1. floor(NaN) = NaN.
2003      2. floor(<x>) = <x>, for -0.0, +0.0, -INF, and +INF.  In all cases, the
2004         sign of the result is equal to the sign of the operand.
2005
2006
2007    Section 2.14.3.5,  ARR:  Address Register Load (with round)
2008
2009    The ARR instruction loads a single vector operand and performs a
2010    component-wise round operation to generate an integer result vector.  Each
2011    component of the result vector is clamped to [-512, +511], the range of
2012    representable address register components.  The ARR instruction applies
2013    all masking operations to address register writes as described in section
2014    2.14.2.2.
2015
2016      tmp = VectorLoad(op0);
2017      iresult.x = round(tmp.x);
2018      iresult.y = round(tmp.y);
2019      iresult.z = round(tmp.z);
2020      iresult.w = round(tmp.w);
2021      if (iresult.x < -512) iresult.x = -512;
2022      if (iresult.x > 511)  iresult.x = 511;
2023      if (iresult.y < -512) iresult.y = -512;
2024      if (iresult.y > 511)  iresult.y = 511;
2025      if (iresult.z < -512) iresult.z = -512;
2026      if (iresult.z > 511)  iresult.z = 511;
2027      if (iresult.w < -512) iresult.w = -512;
2028      if (iresult.w > 511)  iresult.w = 511;
2029
2030    The rounding function, round(x), returns the nearest integer to <x>.  If
2031    the fractional portion of <x> is 0.5, round(x) selects the nearest even
2032    integer.
2033
2034    The ARR instruction is available only in the VP2 execution environment.
2035
2036
2037    Section 2.14.3.6,  BRA:  Branch
2038
2039    The BRA instruction conditionally transfers control to the instruction
2040    following the label specified in the instruction.  The following
2041    pseudocode describes the operation of the instruction:
2042
2043      if (TestCC(cc.c***) || TestCC(cc.*c**) ||
2044          TestCC(cc.**c*) || TestCC(cc.***c)) {
2045        // continue execution at instruction following <branchLabel>
2046      } else {
2047        // do nothing
2048      }
2049
2050    In the pseudocode, <branchLabel> is the label specified in the instruction
2051    matching the <vp2-branchLabel> grammar rule.
2052
2053    The BRA instruction is available only in the VP2 execution environment.
2054
2055
2056    Section 2.14.3.7,  CAL:  Subroutine Call
2057
2058    The CAL instruction conditionally transfers control to the instruction
2059    following the label specified in the instruction.  It also pushes a
2060    reference to the instruction immediately following the CAL instruction
2061    onto the call stack, where execution will continue after executing the
2062    matching RET instruction.  The following pseudocode describes the
2063    operation of the instruction:
2064
2065      if (TestCC(cc.c***) || TestCC(cc.*c**) ||
2066          TestCC(cc.**c*) || TestCC(cc.***c)) {
2067        if (callStackDepth >= 4) {
2068          // terminate vertex program
2069        } else {
2070          callStack[callStackDepth] = nextInstruction;
2071          callStackDepth++;
2072        }
2073        // continue execution at instruction following <branchLabel>
2074      } else {
2075        // do nothing
2076      }
2077
2078    In the pseudocode, <branchLabel> is the label specified in the instruction
2079    matching the <vp2-branchLabel> grammar rule, <callStackDepth> is the
2080    current depth of the call stack, <callStack> is an array holding the call
2081    stack, and <nextInstruction> is a reference to the instruction immediately
2082    following the present one in the program string.
2083
2084    The CAL instruction is available only in the VP2 execution environment.
2085
2086
2087    Section 2.14.3.8,  COS:  Cosine
2088
2089    The COS instruction approximates the cosine of the angle specified by the
2090    scalar operand and replicates the approximation to all four components of
2091    the result vector.  The angle is specified in radians and does not have to
2092    be in the range [0,2*PI].
2093
2094      tmp = ScalarLoad(op0);
2095      result.x = ApproxCosine(tmp);
2096      result.y = ApproxCosine(tmp);
2097      result.z = ApproxCosine(tmp);
2098      result.w = ApproxCosine(tmp);
2099
2100    The approximation function ApproxCosine is accurate to at least 22 bits
2101    with an angle in the range [0,2*PI].
2102
2103      | ApproxCosine(x) - cos(x) | < 1.0 / 2^22, if 0.0 <= x < 2.0 * PI.
2104
2105    The error in the approximation will typically increase with the absolute
2106    value of the angle when the angle falls outside the range [0,2*PI].
2107
2108    The following special-case rules apply to cosine approximation:
2109
2110      1. ApproxCosine(NaN) = NaN.
2111      2. ApproxCosine(+/-INF) = NaN.
2112      3. ApproxCosine(+/-0.0) = +1.0.
2113
2114    The COS instruction is available only in the VP2 execution environment.
2115
2116
2117    Section 2.14.3.9,  DP3:  3-component Dot Product
2118
    The DP3 instruction computes a three-component dot product of the two
    operands (using the x, y, and z components) and replicates the dot product
    to all four components of the result vector.

      tmp0 = VectorLoad(op0);
      tmp1 = VectorLoad(op1);
2125      result.x = (tmp0.x * tmp1.x) + (tmp0.y * tmp1.y) +
2126                 (tmp0.z * tmp1.z);
2127      result.y = (tmp0.x * tmp1.x) + (tmp0.y * tmp1.y) +
2128                 (tmp0.z * tmp1.z);
2129      result.z = (tmp0.x * tmp1.x) + (tmp0.y * tmp1.y) +
2130                 (tmp0.z * tmp1.z);
2131      result.w = (tmp0.x * tmp1.x) + (tmp0.y * tmp1.y) +
2132                 (tmp0.z * tmp1.z);
2133
2134
2135    Section 2.14.3.10,  DP4:  4-component Dot Product
2136
    The DP4 instruction computes a four-component dot product of the two
    operands and replicates the dot product to all four components of the
    result vector.

      tmp0 = VectorLoad(op0);
      tmp1 = VectorLoad(op1);
2143      result.x = (tmp0.x * tmp1.x) + (tmp0.y * tmp1.y) +
2144                 (tmp0.z * tmp1.z) + (tmp0.w * tmp1.w);
2145      result.y = (tmp0.x * tmp1.x) + (tmp0.y * tmp1.y) +
2146                 (tmp0.z * tmp1.z) + (tmp0.w * tmp1.w);
2147      result.z = (tmp0.x * tmp1.x) + (tmp0.y * tmp1.y) +
2148                 (tmp0.z * tmp1.z) + (tmp0.w * tmp1.w);
2149      result.w = (tmp0.x * tmp1.x) + (tmp0.y * tmp1.y) +
2150                 (tmp0.z * tmp1.z) + (tmp0.w * tmp1.w);
2151
2152
2153    Section 2.14.3.11,  DPH:  Homogeneous Dot Product
2154
2155    The DPH instruction computes a four-component dot product of the two
2156    operands, except that the W component of the first operand is assumed to
2157    be 1.0.  The instruction replicates the dot product to all four components
2158    of the result vector.
2159
2160      tmp0 = VectorLoad(op0);
      tmp1 = VectorLoad(op1);
2162      result.x = (tmp0.x * tmp1.x) + (tmp0.y * tmp1.y) +
2163                 (tmp0.z * tmp1.z) + tmp1.w;
2164      result.y = (tmp0.x * tmp1.x) + (tmp0.y * tmp1.y) +
2165                 (tmp0.z * tmp1.z) + tmp1.w;
2166      result.z = (tmp0.x * tmp1.x) + (tmp0.y * tmp1.y) +
2167                 (tmp0.z * tmp1.z) + tmp1.w;
2168      result.w = (tmp0.x * tmp1.x) + (tmp0.y * tmp1.y) +
2169                 (tmp0.z * tmp1.z) + tmp1.w;
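
    As an illustration, the scalar computed by DPH can be written in C as
    follows.  The vec4 type and the dph name are assumptions made for this
    sketch.  The form is convenient, for example, when the first operand is a
    three-component position and the second holds a translation term in "w".

```c
typedef struct { float x, y, z, w; } vec4;

/* DPH: four-component dot product with the first operand's w taken as 1.0,
   so a.w is ignored. */
static float dph(vec4 a, vec4 b)
{
    return (a.x * b.x) + (a.y * b.y) + (a.z * b.z) + b.w;
}
```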
2170
2171    The DPH instruction is available only in the VP1.1 and VP2 execution
2172    environments.
2173
2174
2175    Section 2.14.3.12,  DST:  Distance Vector
2176
2177    The DST instruction computes a distance vector from two specially-
2178    formatted operands.  The first operand should be of the form [NA, d^2,
2179    d^2, NA] and the second operand should be of the form [NA, 1/d, NA, 1/d],
2180    where NA values are not relevant to the calculation and d is a vector
2181    length.  If both vectors satisfy these conditions, the result vector will
2182    be of the form [1.0, d, d^2, 1/d].
2183
2184    The exact behavior is specified in the following pseudo-code:
2185
2186      tmp0 = VectorLoad(op0);
2187      tmp1 = VectorLoad(op1);
2188      result.x = 1.0;
2189      result.y = tmp0.y * tmp1.y;
2190      result.z = tmp0.z;
2191      result.w = tmp1.w;
2192
2193    Given an arbitrary vector, d^2 can be obtained using the DP3 instruction
2194    (using the same vector for both operands) and 1/d can be obtained from d^2
2195    using the RSQ instruction.
2196
2197    This distance vector is useful for per-vertex light attenuation
2198    calculations:  a DP3 operation using the distance vector and an
2199    attenuation constants vector as operands will yield the attenuation
2200    factor.
2201
2202
2203    Section 2.14.3.13,  EX2:  Exponential Base 2
2204
2205    The EX2 instruction approximates 2 raised to the power of the scalar
2206    operand and replicates it to all four components of the result vector.
2207
2208      tmp = ScalarLoad(op0);
2209      result.x = Approx2ToX(tmp);
2210      result.y = Approx2ToX(tmp);
2211      result.z = Approx2ToX(tmp);
2212      result.w = Approx2ToX(tmp);
2213
2214    The approximation function is accurate to at least 22 bits:
2215
2216      | Approx2ToX(x) - 2^x | < 1.0 / 2^22, if 0.0 <= x < 1.0,
2217
2218    and, in general,
2219
2220      | Approx2ToX(x) - 2^x | < (1.0 / 2^22) * (2^floor(x)).
2221
2222    The following special-case rules apply to exponential approximation:
2223
2224      1. Approx2ToX(NaN) = NaN.
2225      2. Approx2ToX(-INF) = +0.0.
2226      3. Approx2ToX(+INF) = +INF.
2227      4. Approx2ToX(+/-0.0) = +1.0.
2228
2229    The EX2 instruction is available only in the VP2 execution environment.
2230
2231
2232    Section 2.14.3.14,  EXP:  Exponential Base 2 (approximate)
2233
2234    The EXP instruction computes a rough approximation of 2 raised to the
2235    power of the scalar operand.  The approximation is returned in the "z"
2236    component of the result vector.  A vertex program can also use the "x" and
2237    "y" components of the result vector to generate a more accurate
2238    approximation by evaluating
2239
2240        result.x * f(result.y),
2241
2242    where f(x) is a user-defined function that approximates 2^x over the
2243    domain [0.0, 1.0).  The "w" component of the result vector is always 1.0.
2244
2245    The exact behavior is specified in the following pseudo-code:
2246
2247      tmp = ScalarLoad(op0);
2248      result.x = 2^floor(tmp);
2249      result.y = tmp - floor(tmp);
2250      result.z = RoughApprox2ToX(tmp);
2251      result.w = 1.0;
2252
2253    The approximation function is accurate to at least 11 bits:
2254
2255      | RoughApprox2ToX(x) - 2^x | < 1.0 / 2^11, if 0.0 <= x < 1.0,
2256
2257    and, in general,
2258
2259      | RoughApprox2ToX(x) - 2^x | < (1.0 / 2^11) * (2^floor(x)).
2260
2261    The following special cases apply to the EXP instruction:
2262
2263      1. RoughApprox2ToX(NaN) = NaN.
2264      2. RoughApprox2ToX(-INF) = +0.0.
2265      3. RoughApprox2ToX(+INF) = +INF.
2266      4. RoughApprox2ToX(+/-0.0) = +1.0.
2267
2268    The EXP instruction is present for compatibility with the original
2269    NV_vertex_program instruction set; it is recommended that applications
2270    using NV_vertex_program2 use the EX2 instruction instead.
2271
2272
2273    Section 2.14.3.15,  FLR:  Floor
2274
2275    The FLR instruction performs a component-wise floor operation on the
2276    operand to generate a result vector.  The floor of a value is defined as
2277    the largest integer less than or equal to the value.  The floor of 2.3 is
2278    2.0; the floor of -3.6 is -4.0.
2279
2280      tmp = VectorLoad(op0);
2281      result.x = floor(tmp.x);
2282      result.y = floor(tmp.y);
2283      result.z = floor(tmp.z);
2284      result.w = floor(tmp.w);
2285
2286    The following special-case rules apply to floor computation:
2287
2288      1. floor(NaN) = NaN.
2289      2. floor(<x>) = <x>, for -0.0, +0.0, -INF, and +INF.  In all cases, the
2290         sign of the result is equal to the sign of the operand.
2291
2292    The FLR instruction is available only in the VP2 execution environment.
2293
2294
2295    Section 2.14.3.16,  FRC:  Fraction
2296
2297    The FRC instruction extracts the fractional portion of each component of
2298    the operand to generate a result vector.  The fractional portion of a
2299    component is defined as the result after subtracting off the floor of the
2300    component (see FLR), and is always in the range [0.00, 1.00).
2301
2302    For negative values, the fractional portion is NOT the number written to
2303    the right of the decimal point -- the fractional portion of -1.7 is not
2304    0.7 -- it is 0.3.  0.3 is produced by subtracting the floor of -1.7 (-2.0)
2305    from -1.7.
2306
2307      tmp = VectorLoad(op0);
2308      result.x = tmp.x - floor(tmp.x);
2309      result.y = tmp.y - floor(tmp.y);
2310      result.z = tmp.z - floor(tmp.z);
2311      result.w = tmp.w - floor(tmp.w);
2312
    The following special-case rules, which can be derived from the rules for
    FLR and ADD, apply to fraction computation:
2315
2316      1. fraction(NaN) = NaN.
2317      2. fraction(+/-INF) = NaN.
2318      3. fraction(+/-0.0) = +0.0.
2319
2320    The FRC instruction is available only in the VP2 execution environment.
2321
2322
2323    Section 2.14.3.17,  LG2:  Logarithm Base 2
2324
2325    The LG2 instruction approximates the base 2 logarithm of the scalar
2326    operand and replicates it to all four components of the result vector.
2327
2328      tmp = ScalarLoad(op0);
2329      result.x = ApproxLog2(tmp);
2330      result.y = ApproxLog2(tmp);
2331      result.z = ApproxLog2(tmp);
2332      result.w = ApproxLog2(tmp);
2333
2334    The approximation function is accurate to at least 22 bits:
2335
2336      | ApproxLog2(x) - log_2(x) | < 1.0 / 2^22.
2337
2338    Note that for large values of x, there are not enough bits in the
2339    floating-point storage format to represent a result that precisely.
2340
2341    The following special-case rules apply to logarithm approximation:
2342
2343      1. ApproxLog2(NaN) = NaN.
2344      2. ApproxLog2(+INF) = +INF.
2345      3. ApproxLog2(+/-0.0) = -INF.
2346      4. ApproxLog2(x) = NaN, -INF < x < -0.0.
2347      5. ApproxLog2(-INF) = NaN.
2348
2349    The LG2 instruction is available only in the VP2 execution environment.
2350
2351
2352    Section 2.14.3.18,  LIT:  Compute Light Coefficients
2353
2354    The LIT instruction accelerates per-vertex lighting by computing lighting
2355    coefficients for ambient, diffuse, and specular light contributions.  The
2356    "x" component of the operand is assumed to hold a diffuse dot product (n
2357    dot VP_pli, as in the vertex lighting equations in Section 2.13.1).  The
2358    "y" component of the operand is assumed to hold a specular dot product (n
2359    dot h_i).  The "w" component of the operand is assumed to hold the
2360    specular exponent of the material (s_rm), and is clamped to the range
2361    (-128, +128) exclusive.
2362
2363    The "x" component of the result vector receives the value that should be
2364    multiplied by the ambient light/material product (always 1.0).  The "y"
2365    component of the result vector receives the value that should be
2366    multiplied by the diffuse light/material product (n dot VP_pli).  The "z"
2367    component of the result vector receives the value that should be
2368    multiplied by the specular light/material product (f_i * (n dot h_i) ^
2369    s_rm).  The "w" component of the result is the constant 1.0.
2370
2371    Negative diffuse and specular dot products are clamped to 0.0, as is done
2372    in the standard per-vertex lighting operations.  In addition, if the
2373    diffuse dot product is zero or negative, the specular coefficient is
2374    forced to zero.
2375
      tmp = VectorLoad(op0);
      if (tmp.x < 0) tmp.x = 0;
      if (tmp.y < 0) tmp.y = 0;
      if (tmp.w < -(128.0-epsilon)) tmp.w = -(128.0-epsilon);
      else if (tmp.w > 128.0-epsilon) tmp.w = 128.0-epsilon;
      result.x = 1.0;
      result.y = tmp.x;
      result.z = (tmp.x > 0) ? RoughApproxPower(tmp.y, tmp.w) : 0.0;
      result.w = 1.0;
2385
2386    The exponentiation approximation function is defined in terms of the base
2387    2 exponentiation and logarithm approximation operations in the EXP and LOG
2388    instructions, including errors and the processing of any special cases.
2389    In particular,
2390
      RoughApproxPower(a,b) = RoughApprox2ToX(b * RoughApproxLog2(a)).
2392
2393    The following special-case rules, which can be derived from the rules in
2394    the LOG, MUL, and EXP instructions, apply to exponentiation:
2395
2396      1. RoughApproxPower(NaN, <x>) = NaN,
      2. RoughApproxPower(<x>, <y>) = NaN, if x < -0.0,
2398      3. RoughApproxPower(+/-0.0, <x>) = +0.0, if x > +0.0, or
2399                                         +INF, if x < -0.0,
2400      4. RoughApproxPower(+1.0, <x>) = +1.0, if x is not NaN,
2401      5. RoughApproxPower(+INF, <x>) = +INF, if x > +0.0, or
2402                                       +0.0, if x < -0.0,
      6. RoughApproxPower(<x>, +/-0.0) = +1.0, if x >= -0.0,
      7. RoughApproxPower(<x>, +INF) = +0.0, if -0.0 <= x < +1.0,
                                       +INF, if x > +1.0,
      8. RoughApproxPower(<x>, -INF) = +INF, if -0.0 <= x < +1.0,
                                       +0.0, if x > +1.0,
2408      9. RoughApproxPower(<x>, +1.0) = <x>, if x >= +0.0, and
2409      10. RoughApproxPower(<x>, NaN) = NaN.
2410
2411
2412    Section 2.14.3.19,  LOG:  Logarithm Base 2 (Approximate)
2413
2414    The LOG instruction computes a rough approximation of the base 2 logarithm
2415    of the absolute value of the scalar operand.  The approximation is
2416    returned in the "z" component of the result vector.  A vertex program can
2417    also use the "x" and "y" components of the result vector to generate a
2418    more accurate approximation by evaluating
2419
2420        result.x + f(result.y),
2421
    where f(x) is a user-defined function that approximates log_2(x) over the
    domain [1.0, 2.0).  The "w" component of the result vector is always 1.0.
2424
2425    The exact behavior is specified in the following pseudo-code:
2426
2427      tmp = fabs(ScalarLoad(op0));
2428      result.x = floor(log2(tmp));
2429      result.y = tmp / (2^floor(log2(tmp)));
2430      result.z = RoughApproxLog2(tmp);
2431      result.w = 1.0;
2432
2433    The approximation function is accurate to at least 11 bits:
2434
2435      | RoughApproxLog2(x) - log_2(x) | < 1.0 / 2^11.
2436
2437    The following special-case rules apply to the LOG instruction:
2438
2439      1. RoughApproxLog2(NaN) = NaN.
2440      2. RoughApproxLog2(+INF) = +INF.
2441      3. RoughApproxLog2(+0.0) = -INF.
2442
2443    The LOG instruction is present for compatibility with the original
2444    NV_vertex_program instruction set; it is recommended that applications
2445    using NV_vertex_program2 use the LG2 instruction instead.
2446
2447
2448    Section 2.14.3.20,  MAD:  Multiply And Add
2449
2450    The MAD instruction performs a component-wise multiply of the first two
2451    operands, and then does a component-wise add of the product to the third
2452    operand to yield a result vector.
2453
2454      tmp0 = VectorLoad(op0);
2455      tmp1 = VectorLoad(op1);
2456      tmp2 = VectorLoad(op2);
2457      result.x = tmp0.x * tmp1.x + tmp2.x;
2458      result.y = tmp0.y * tmp1.y + tmp2.y;
2459      result.z = tmp0.z * tmp1.z + tmp2.z;
2460      result.w = tmp0.w * tmp1.w + tmp2.w;
2461
2462    All special case rules applicable to the ADD and MUL instructions apply to
2463    the individual components of the MAD operation as well.
2464
2465
2466    Section 2.14.3.21,  MAX:  Maximum
2467
2468    The MAX instruction computes component-wise maximums of the values in the
2469    two operands to yield a result vector.
2470
2471      tmp0 = VectorLoad(op0);
2472      tmp1 = VectorLoad(op1);
2473      result.x = max(tmp0.x, tmp1.x);
2474      result.y = max(tmp0.y, tmp1.y);
2475      result.z = max(tmp0.z, tmp1.z);
2476      result.w = max(tmp0.w, tmp1.w);
2477
2478    The following special cases apply to the maximum operation:
2479
2480      1. max(A,B) is always equivalent to max(B,A).
2481      2. max(NaN, <x>) == NaN, for all <x>.
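
    Rule 2 differs from the C library function fmaxf, which returns the
    non-NaN operand when exactly one operand is NaN.  A software model of MAX
    therefore needs an explicit NaN check, as in this sketch (the vp_max name
    is hypothetical):

```c
#include <math.h>

/* NaN-propagating maximum for one component. */
static float vp_max(float a, float b)
{
    if (isnan(a) || isnan(b)) return NAN;
    return (a > b) ? a : b;
}
```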
2482
2483
2484    Section 2.14.3.22,  MIN:  Minimum
2485
2486    The MIN instruction computes component-wise minimums of the values in the
2487    two operands to yield a result vector.
2488
2489      tmp0 = VectorLoad(op0);
2490      tmp1 = VectorLoad(op1);
2491      result.x = min(tmp0.x, tmp1.x);
2492      result.y = min(tmp0.y, tmp1.y);
2493      result.z = min(tmp0.z, tmp1.z);
2494      result.w = min(tmp0.w, tmp1.w);
2495
2496    The following special cases apply to the minimum operation:
2497
2498      1. min(A,B) is always equivalent to min(B,A).
2499      2. min(NaN, <x>) == NaN, for all <x>.
2500
2501
2502    Section 2.14.3.23,  MOV:  Move
2503
2504    The MOV instruction copies the value of the operand to yield a result
2505    vector.
2506
2507      result = VectorLoad(op0);
2508
2509
2510    Section 2.14.3.24,  MUL:  Multiply
2511
2512    The MUL instruction performs a component-wise multiply of the two operands
2513    to yield a result vector.
2514
2515      tmp0 = VectorLoad(op0);
2516      tmp1 = VectorLoad(op1);
2517      result.x = tmp0.x * tmp1.x;
2518      result.y = tmp0.y * tmp1.y;
2519      result.z = tmp0.z * tmp1.z;
2520      result.w = tmp0.w * tmp1.w;
2521
2522    The following special-case rules apply to multiplication:
2523
2524      1. "A*B" is always equivalent to "B*A".
2525      2. NaN * <x> = NaN, for all <x>.
2526      3. +/-0.0 * +/-INF = NaN.
2527      4. +/-0.0 * <x> = +/-0.0, for all <x> except -INF, +INF, and NaN.  The
2528         sign of the result is positive if the signs of the two operands match
2529         and negative otherwise.
2530      5. +/-INF * <x> = +/-INF, for all <x> except -0.0, +0.0, and NaN.  The
2531         sign of the result is positive if the signs of the two operands match
2532         and negative otherwise.
2533      6. +1.0 * <x> = <x>, for all <x>.
2534
2535
2536    Section 2.14.3.25,  RCC:  Reciprocal (Clamped)
2537
2538    The RCC instruction approximates the reciprocal of the scalar operand,
2539    clamps the result to one of two ranges, and replicates the clamped result
2540    to all four components of the result vector.
2541
2542    If the approximate reciprocal is greater than 0.0, the result is clamped
2543    to the range [2^-64, 2^+64].  If the approximate reciprocal is not greater
2544    than zero, the result is clamped to the range [-2^+64, -2^-64].
2545
2546      tmp = ScalarLoad(op0);
2547      result.x = ClampApproxReciprocal(tmp);
2548      result.y = ClampApproxReciprocal(tmp);
2549      result.z = ClampApproxReciprocal(tmp);
2550      result.w = ClampApproxReciprocal(tmp);
2551
2552    The approximation function is accurate to at least 22 bits:
2553
2554      | ClampApproxReciprocal(x) - (1/x) | < 1.0 / 2^22, if 1.0 <= x < 2.0.
2555
2556    The following special-case rules apply to reciprocation:
2557
2558      1. ClampApproxReciprocal(NaN) = NaN.
2559      2. ClampApproxReciprocal(+INF) = +2^-64.
2560      3. ClampApproxReciprocal(-INF) = -2^-64.
2561      4. ClampApproxReciprocal(+0.0) = +2^64.
2562      5. ClampApproxReciprocal(-0.0) = -2^64.
2563      6. ClampApproxReciprocal(x) = +2^-64, if +2^64 < x < +INF.
      7. ClampApproxReciprocal(x) = -2^-64, if -INF < x < -2^64.
2565      8. ClampApproxReciprocal(x) = +2^64, if +0.0 < x < +2^-64.
2566      9. ClampApproxReciprocal(x) = -2^64, if -2^-64 < x < -0.0.
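
    The clamping behavior can be sketched in C as follows.  The sketch uses
    exact division in place of the hardware approximation and selects the
    clamp range from the sign of the operand, so that signed zeros and
    infinities follow the special-case rules.  The rcc name is hypothetical.

```c
#include <math.h>

/* Clamped reciprocal: magnitude is kept within [2^-64, 2^+64] and the sign
   of the operand is preserved.  NaN is passed through. */
static float rcc(float x)
{
    const float lo = ldexpf(1.0f, -64);   /* 2^-64 */
    const float hi = ldexpf(1.0f, 64);    /* 2^+64 */
    float mag, r;

    if (isnan(x)) return x;
    mag = fabsf(x);
    if (mag > hi)      r = lo;            /* includes +/-INF */
    else if (mag < lo) r = hi;            /* includes +/-0.0 */
    else               r = 1.0f / mag;
    return copysignf(r, x);
}
```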
2567
2568    The RCC instruction is available only in the VP1.1 and VP2 execution
2569    environments.
2570
2571
2572    Section 2.14.3.26,  RCP:  Reciprocal
2573
2574    The RCP instruction approximates the reciprocal of the scalar operand and
2575    replicates it to all four components of the result vector.
2576
2577      tmp = ScalarLoad(op0);
2578      result.x = ApproxReciprocal(tmp);
2579      result.y = ApproxReciprocal(tmp);
2580      result.z = ApproxReciprocal(tmp);
2581      result.w = ApproxReciprocal(tmp);
2582
2583    The approximation function is accurate to at least 22 bits:
2584
2585      | ApproxReciprocal(x) - (1/x) | < 1.0 / 2^22, if 1.0 <= x < 2.0.
2586
2587    The following special-case rules apply to reciprocation:
2588
2589      1. ApproxReciprocal(NaN) = NaN.
2590      2. ApproxReciprocal(+INF) = +0.0.
2591      3. ApproxReciprocal(-INF) = -0.0.
2592      4. ApproxReciprocal(+0.0) = +INF.
2593      5. ApproxReciprocal(-0.0) = -INF.
2594
2595
2596    Section 2.14.3.27,  RET:  Subroutine Call Return
2597
2598    The RET instruction conditionally returns from a subroutine initiated by a
2599    CAL instruction by popping an instruction reference off the top of the
2600    call stack and transferring control to the referenced instruction.  The
2601    following pseudocode describes the operation of the instruction:
2602
2603      if (TestCC(cc.c***) || TestCC(cc.*c**) ||
2604          TestCC(cc.**c*) || TestCC(cc.***c)) {
2605        if (callStackDepth <= 0) {
2606          // terminate vertex program
2607        } else {
2608          callStackDepth--;
2609          instruction = callStack[callStackDepth];
2610        }
2611
2612        // continue execution at <instruction>
2613      } else {
2614        // do nothing
2615      }
2616
2617    In the pseudocode, <callStackDepth> is the depth of the call stack,
2618    <callStack> is an array holding the call stack, and <instruction> is a
2619    reference to an instruction previously pushed onto the call stack.
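
    The call stack shared by CAL and RET can be modeled in C as a fixed array
    of four entries.  This is an illustrative sketch; the call_stack type, the
    function names, and the use of integer instruction indices are
    assumptions.  A return value of false corresponds to the "terminate vertex
    program" case in the pseudocode.

```c
#include <stdbool.h>

#define MAX_CALL_DEPTH 4   /* limit tested by the CAL pseudocode */

typedef struct {
    int entries[MAX_CALL_DEPTH];  /* instruction indices (illustrative) */
    int depth;
} call_stack;

/* CAL side: push the return reference; false means the program terminates. */
static bool call_push(call_stack *s, int next_instruction)
{
    if (s->depth >= MAX_CALL_DEPTH) return false;
    s->entries[s->depth++] = next_instruction;
    return true;
}

/* RET side: pop into *instruction; false means the program terminates. */
static bool call_pop(call_stack *s, int *instruction)
{
    if (s->depth <= 0) return false;
    *instruction = s->entries[--s->depth];
    return true;
}
```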
2620
2621    The RET instruction is available only in the VP2 execution environment.
2622
2623
2624    Section 2.14.3.28,  RSQ:  Reciprocal Square Root
2625
2626    The RSQ instruction approximates the reciprocal of the square root of the
2627    scalar operand and replicates it to all four components of the result
2628    vector.
2629
2630      tmp = ScalarLoad(op0);
2631      result.x = ApproxRSQRT(tmp);
2632      result.y = ApproxRSQRT(tmp);
2633      result.z = ApproxRSQRT(tmp);
2634      result.w = ApproxRSQRT(tmp);
2635
2636    The approximation function is accurate to at least 22 bits:
2637
      | ApproxRSQRT(x) - (1/sqrt(x)) | < 1.0 / 2^22, if 1.0 <= x < 4.0.
2639
2640    The following special-case rules apply to reciprocal square roots:
2641
2642      1. ApproxRSQRT(NaN) = NaN.
2643      2. ApproxRSQRT(+INF) = +0.0.
2644      3. ApproxRSQRT(-INF) = NaN.
2645      4. ApproxRSQRT(+0.0) = +INF.
2646      5. ApproxRSQRT(-0.0) = -INF.
2647      6. ApproxRSQRT(x) = NaN, if -INF < x < -0.0.
2648
2649
2650    Section 2.14.3.29,  SEQ:  Set on Equal
2651
2652    The SEQ instruction performs a component-wise comparison of the two
2653    operands.  Each component of the result vector is 1.0 if the corresponding
2654    component of the first operand is equal to that of the second, and 0.0
2655    otherwise.
2656
2657      tmp0 = VectorLoad(op0);
2658      tmp1 = VectorLoad(op1);
2659      result.x = (tmp0.x == tmp1.x) ? 1.0 : 0.0;
2660      result.y = (tmp0.y == tmp1.y) ? 1.0 : 0.0;
2661      result.z = (tmp0.z == tmp1.z) ? 1.0 : 0.0;
2662      result.w = (tmp0.w == tmp1.w) ? 1.0 : 0.0;
2663      if (tmp0.x is NaN or tmp1.x is NaN) result.x = NaN;
2664      if (tmp0.y is NaN or tmp1.y is NaN) result.y = NaN;
2665      if (tmp0.z is NaN or tmp1.z is NaN) result.z = NaN;
2666      if (tmp0.w is NaN or tmp1.w is NaN) result.w = NaN;
2667
2668    The following special-case rules apply to SEQ:
2669
      1. (<x> == <y>) and (<y> == <x>) always produce the same result.
      2. (NaN == <x>) is FALSE for all <x>, including NaN.
      3. (+INF == +INF) and (-INF == -INF) are TRUE.
      4. (-0.0 == +0.0) and (+0.0 == -0.0) are TRUE.
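
    A one-component C sketch of SEQ follows, assuming IEEE 754 comparison
    semantics.  It includes the final step of the pseudocode, which writes NaN
    to the result when either operand is NaN.  The seq_component name is
    hypothetical.

```c
#include <math.h>

/* SEQ for one component: 1.0 when equal, 0.0 when not equal, and NaN when
   either operand is NaN. */
static float seq_component(float a, float b)
{
    if (isnan(a) || isnan(b)) return NAN;
    return (a == b) ? 1.0f : 0.0f;
}
```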
2674
2675    The SEQ instruction is available only in the VP2 execution environment.
2676
2677
2678    Section 2.14.3.30,  SFL:  Set on False
2679
2680    The SFL instruction is a degenerate case of the other "Set on"
2681    instructions that sets all components of the result vector to
2682    0.0.
2683
2684      result.x = 0.0;
2685      result.y = 0.0;
2686      result.z = 0.0;
2687      result.w = 0.0;
2688
2689    The SFL instruction is available only in the VP2 execution environment.
2690
2691
2692    Section 2.14.3.31,  SGE:  Set on Greater Than or Equal
2693
    The SGE instruction performs a component-wise comparison of the two
    operands.  Each component of the result vector is 1.0 if the corresponding
    component of the first operand is greater than or equal to that of the
    second, and 0.0 otherwise.
2698
2699      tmp0 = VectorLoad(op0);
2700      tmp1 = VectorLoad(op1);
2701      result.x = (tmp0.x >= tmp1.x) ? 1.0 : 0.0;
2702      result.y = (tmp0.y >= tmp1.y) ? 1.0 : 0.0;
2703      result.z = (tmp0.z >= tmp1.z) ? 1.0 : 0.0;
2704      result.w = (tmp0.w >= tmp1.w) ? 1.0 : 0.0;
2705      if (tmp0.x is NaN or tmp1.x is NaN) result.x = NaN;
2706      if (tmp0.y is NaN or tmp1.y is NaN) result.y = NaN;
2707      if (tmp0.z is NaN or tmp1.z is NaN) result.z = NaN;
2708      if (tmp0.w is NaN or tmp1.w is NaN) result.w = NaN;
2709
2710    The following special-case rules apply to SGE:
2711
2712      1. (NaN >= <x>) and (<x> >= NaN) are FALSE for all <x>.
2713      2. (+INF >= +INF) and (-INF >= -INF) are TRUE.
2714      3. (-0.0 >= +0.0) and (+0.0 >= -0.0) are TRUE.
2715
2716
2717    Section 2.14.3.32,  SGT:  Set on Greater Than
2718
    The SGT instruction performs a component-wise comparison of the two
    operands.  Each component of the result vector is 1.0 if the corresponding
    component of the first operand is greater than that of the second, and
    0.0 otherwise.
2723
2724      tmp0 = VectorLoad(op0);
2725      tmp1 = VectorLoad(op1);
2726      result.x = (tmp0.x > tmp1.x) ? 1.0 : 0.0;
2727      result.y = (tmp0.y > tmp1.y) ? 1.0 : 0.0;
2728      result.z = (tmp0.z > tmp1.z) ? 1.0 : 0.0;
2729      result.w = (tmp0.w > tmp1.w) ? 1.0 : 0.0;
2730      if (tmp0.x is NaN or tmp1.x is NaN) result.x = NaN;
2731      if (tmp0.y is NaN or tmp1.y is NaN) result.y = NaN;
2732      if (tmp0.z is NaN or tmp1.z is NaN) result.z = NaN;
2733      if (tmp0.w is NaN or tmp1.w is NaN) result.w = NaN;
2734
2735    The following special-case rules apply to SGT:
2736
2737      1. (NaN > <x>) and (<x> > NaN) are FALSE for all <x>.
2738      2. (-0.0 > +0.0) and (+0.0 > -0.0) are FALSE.
2739
2740    The SGT instruction is available only in the VP2 execution environment.
2741
2742
2743    Section 2.14.3.33,  SIN:  Sine
2744
2745    The SIN instruction approximates the sine of the angle specified by the
2746    scalar operand and replicates it to all four components of the result
2747    vector.  The angle is specified in radians and does not have to be in the
2748    range [0,2*PI].
2749
2750      tmp = ScalarLoad(op0);
2751      result.x = ApproxSine(tmp);
2752      result.y = ApproxSine(tmp);
2753      result.z = ApproxSine(tmp);
2754      result.w = ApproxSine(tmp);
2755
2756    The approximation function is accurate to at least 22 bits with an angle
2757    in the range [0,2*PI].
2758
2759      | ApproxSine(x) - sin(x) | < 1.0 / 2^22, if 0.0 <= x < 2.0 * PI.
2760
2761    The error in the approximation will typically increase with the absolute
2762    value of the angle when the angle falls outside the range [0,2*PI].
2763
2764    The following special-case rules apply to sine approximation:
2765
2766      1. ApproxSine(NaN) = NaN.
2767      2. ApproxSine(+/-INF) = NaN.
2768      3. ApproxSine(+/-0.0) = +/-0.0.  The sign of the result is equal to the
2769         sign of the single operand.
2770
2771    The SIN instruction is available only in the VP2 execution environment.
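
    Because the accuracy bound is stated only for angles in [0,2*PI), an
    application that needs that accuracy for arbitrary angles may want to
    reduce the angle first.  The following is a minimal CPU-side sketch in C,
    assuming a finite input whose quotient by 2*PI fits in a long long.
    reduce_angle is a hypothetical helper and is not part of this extension.

```c
/* Map a finite angle in radians into [0, 2*PI), the range for which the
   SIN approximation is specified to be accurate to 22 bits.
   Assumption: x / (2*PI) fits in a long long. */
double reduce_angle(double x)
{
    const double two_pi = 6.283185307179586;
    double turns = x / two_pi;
    long long whole = (long long)turns;      /* truncates toward zero */
    double r;

    if ((double)whole > turns)
        whole -= 1;                          /* make it a floor for x < 0 */

    r = x - two_pi * (double)whole;

    /* Guard against rounding that lands just outside the interval. */
    if (r < 0.0)     r += two_pi;
    if (r >= two_pi) r -= two_pi;
    return r;
}
```

    The reduction uses a truncating cast instead of a math-library call so
    that the sketch has no library dependencies.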
2772
2773
2774    Section 2.14.3.34,  SLE:  Set on Less Than or Equal
2775
2776    The SLE instruction performs a component-wise comparison of the two
2777    operands.  Each component of the result vector is 1.0 if the corresponding
2778    component of the first operand is less than or equal to that of the
2779    second, and 0.0 otherwise.
2780
2781      tmp0 = VectorLoad(op0);
2782      tmp1 = VectorLoad(op1);
2783      result.x = (tmp0.x <= tmp1.x) ? 1.0 : 0.0;
2784      result.y = (tmp0.y <= tmp1.y) ? 1.0 : 0.0;
2785      result.z = (tmp0.z <= tmp1.z) ? 1.0 : 0.0;
2786      result.w = (tmp0.w <= tmp1.w) ? 1.0 : 0.0;
2787      if (tmp0.x is NaN or tmp1.x is NaN) result.x = NaN;
2788      if (tmp0.y is NaN or tmp1.y is NaN) result.y = NaN;
2789      if (tmp0.z is NaN or tmp1.z is NaN) result.z = NaN;
2790      if (tmp0.w is NaN or tmp1.w is NaN) result.w = NaN;
2791
2792    The following special-case rules apply to SLE:
2793
2794      1. (NaN <= <x>) and (<x> <= NaN) are FALSE for all <x>.
2795      2. (+INF <= +INF) and (-INF <= -INF) are TRUE.
2796      3. (-0.0 <= +0.0) and (+0.0 <= -0.0) are TRUE.
2797
2798    The SLE instruction is available only in the VP2 execution environment.
2799
2800
2801    Section 2.14.3.35,  SLT:  Set on Less Than
2802
2803    The SLT instruction performs a component-wise comparison of the two
2804    operands.  Each component of the result vector is 1.0 if the corresponding
2805    component of the first operand is less than that of the second, and 0.0
2806    otherwise.
2807
2808      tmp0 = VectorLoad(op0);
2809      tmp1 = VectorLoad(op1);
2810      result.x = (tmp0.x < tmp1.x) ? 1.0 : 0.0;
2811      result.y = (tmp0.y < tmp1.y) ? 1.0 : 0.0;
2812      result.z = (tmp0.z < tmp1.z) ? 1.0 : 0.0;
2813      result.w = (tmp0.w < tmp1.w) ? 1.0 : 0.0;
2814      if (tmp0.x is NaN or tmp1.x is NaN) result.x = NaN;
2815      if (tmp0.y is NaN or tmp1.y is NaN) result.y = NaN;
2816      if (tmp0.z is NaN or tmp1.z is NaN) result.z = NaN;
2817      if (tmp0.w is NaN or tmp1.w is NaN) result.w = NaN;
2818
2819    The following special-case rules apply to SLT:
2820
2821      1. (NaN < <x>) and (<x> < NaN) are FALSE for all <x>.
2822      2. (-0.0 < +0.0) and (+0.0 < -0.0) are FALSE.
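
    One common use of the set-on instructions is to build 0.0/1.0 masks that
    select between two values without branching.  The following C sketch
    emulates a single component of SGE and SLT as given by their pseudocode
    and combines the two masks.  The helper names (sge1, slt1, select_ge) are
    hypothetical and are not part of this extension.

```c
#include <math.h>

/* One component of SGE, following the pseudocode: 1.0 or 0.0, or NaN
   when either input is NaN. */
float sge1(float a, float b)
{
    if (isnan(a) || isnan(b)) return NAN;
    return (a >= b) ? 1.0f : 0.0f;
}

/* One component of SLT, with the same conventions. */
float slt1(float a, float b)
{
    if (isnan(a) || isnan(b)) return NAN;
    return (a < b) ? 1.0f : 0.0f;
}

/* Branch-free select: yields hi when x >= t and lo otherwise.  For
   non-NaN inputs exactly one of the two masks is 1.0. */
float select_ge(float x, float t, float hi, float lo)
{
    return sge1(x, t) * hi + slt1(x, t) * lo;
}
```

    In this sketch a NaN in x or t propagates to the selected value, because
    both masks are then NaN.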
2823
2824
2825    Section 2.14.3.36,  SNE:  Set on Not Equal
2826
2827    The SNE instruction performs a component-wise comparison of the two
2828    operands.  Each component of the result vector is 1.0 if the corresponding
2829    component of the first operand is not equal to that of the second, and 0.0
2830    otherwise.
2831
2832      tmp0 = VectorLoad(op0);
2833      tmp1 = VectorLoad(op1);
2834      result.x = (tmp0.x != tmp1.x) ? 1.0 : 0.0;
2835      result.y = (tmp0.y != tmp1.y) ? 1.0 : 0.0;
2836      result.z = (tmp0.z != tmp1.z) ? 1.0 : 0.0;
2837      result.w = (tmp0.w != tmp1.w) ? 1.0 : 0.0;
2838      if (tmp0.x is NaN or tmp1.x is NaN) result.x = NaN;
2839      if (tmp0.y is NaN or tmp1.y is NaN) result.y = NaN;
2840      if (tmp0.z is NaN or tmp1.z is NaN) result.z = NaN;
2841      if (tmp0.w is NaN or tmp1.w is NaN) result.w = NaN;
2842
2843    The following special-case rules apply to SNE:
2844
2845      1. (<x> != <y>) and (<y> != <x>) always produce the same result.
2846      2. (NaN != <x>) is TRUE for all <x>, including NaN.
2847      3. (+INF != +INF) and (-INF != -INF) are FALSE.
2848      4. (-0.0 != +0.0) and (+0.0 != -0.0) are FALSE.
2849
2850    The SNE instruction is available only in the VP2 execution environment.
2851
2852
2853    Section 2.14.3.37,  SSG:  Set Sign
2854
2855    The SSG instruction generates a result vector containing the signs of each
2856    component of the single operand.  Each component of the result vector is
2857    1.0 if the corresponding component of the operand is greater than zero,
2858    0.0 if the corresponding component of the operand is equal to zero, and
2859    -1.0 if the corresponding component of the operand is less than zero.
2860
2861      tmp = VectorLoad(op0);
2862      result.x = SetSign(tmp.x);
2863      result.y = SetSign(tmp.y);
2864      result.z = SetSign(tmp.z);
2865      result.w = SetSign(tmp.w);
2866
2867    The following special-case rules apply to SSG:
2868
2869      1. SetSign(NaN) = NaN.
2870      2. SetSign(-0.0) = SetSign(+0.0) = 0.0.
2871      3. SetSign(-INF) = -1.0.
2872      4. SetSign(+INF) = +1.0.
2873      5. SetSign(x) = -1.0, if -INF < x < -0.0.
2874      6. SetSign(x) = +1.0, if +0.0 < x < +INF.
2875
2876    The SSG instruction is available only in the VP2 execution environment.
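
    The six special-case rules above fit in a few comparisons.  The following
    C sketch of SetSign for one component is illustrative only; set_sign is a
    hypothetical name.

```c
#include <math.h>

/* One component of SSG. */
float set_sign(float x)
{
    if (isnan(x)) return NAN;    /* rule 1 */
    if (x > 0.0f) return 1.0f;   /* rules 4 and 6: +INF and positive values */
    if (x < 0.0f) return -1.0f;  /* rules 3 and 5: -INF and negative values */
    return 0.0f;                 /* rule 2: both -0.0 and +0.0 */
}
```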
2877
2878
2879    Section 2.14.3.38,  STR:  Set on True
2880
2881    The STR instruction is a degenerate case of the other "Set on"
2882    instructions that sets all components of the result vector to 1.0.
2883
2884      result.x = 1.0;
2885      result.y = 1.0;
2886      result.z = 1.0;
2887      result.w = 1.0;
2888
2889    The STR instruction is available only in the VP2 execution environment.
2890
2891
2892    Section 2.14.3.39,  SUB:  Subtract
2893
2894    The SUB instruction performs a component-wise subtraction of the second
2895    operand from the first to yield a result vector.
2896
2897      tmp0 = VectorLoad(op0);
2898      tmp1 = VectorLoad(op1);
2899      result.x = tmp0.x - tmp1.x;
2900      result.y = tmp0.y - tmp1.y;
2901      result.z = tmp0.z - tmp1.z;
2902      result.w = tmp0.w - tmp1.w;
2903
2904    The SUB instruction is completely equivalent to an identical ADD
2905    instruction in which the negate operator on the second operand is
2906    reversed:
2907
2908      1. "SUB R0, R1, R2" is equivalent to "ADD R0, R1, -R2".
2909      2. "SUB R0, R1, -R2" is equivalent to "ADD R0, R1, R2".
2910      3. "SUB R0, R1, |R2|" is equivalent to "ADD R0, R1, -|R2|".
2911      4. "SUB R0, R1, -|R2|" is equivalent to "ADD R0, R1, |R2|".
2912
2913    The SUB instruction is available only in the VP1.1 and VP2 execution
2914    environments.
2915
2916
2917    2.14.4  Vertex Arrays for Vertex Attributes
2918
2919    Data for vertex attributes in vertex program mode may be specified
2920    using vertex array commands.  The client may specify and enable any
2921    of sixteen vertex attribute arrays.
2922
2923    The vertex attribute arrays are ignored when vertex program mode
2924    is disabled.  When vertex program mode is enabled, vertex attribute
2925    arrays are used.
2926
2927    The command
2928
2929      void VertexAttribPointerNV(uint index, int size, enum type,
2930                                 sizei stride, const void *pointer);
2931
2932    describes the locations and organizations of the sixteen vertex
2933    attribute arrays.  index specifies the particular vertex attribute
2934    to be described.  size indicates the number of values per vertex
2935    that are stored in the array; size must be one of 1, 2, 3, or 4.
2936    type specifies the data type of the values stored in the array.
2937    type must be one of SHORT, FLOAT, DOUBLE, or UNSIGNED_BYTE and these
2938    values correspond to the array types short, float, double, and
2939    ubyte respectively.  The INVALID_OPERATION error is generated if
2940    type is UNSIGNED_BYTE and size is not 4.  The INVALID_VALUE error
2941    is generated if index is greater than 15.  The INVALID_VALUE error
2942    is generated if stride is negative.
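
    The error rules in the paragraph above, together with the size check
    listed in the Errors section, can be sketched as a validation routine.
    The enum names are local stand-ins so that the sketch compiles without GL
    headers, and check_vertex_attrib_pointer is a hypothetical helper.  The
    order of the checks is an assumption; the text does not say which error
    is reported when several conditions apply at once.

```c
/* Local stand-ins carrying the standard OpenGL numeric values. */
enum {
    SK_NO_ERROR          = 0,
    SK_INVALID_VALUE     = 0x0501,
    SK_INVALID_OPERATION = 0x0502,
    SK_UNSIGNED_BYTE     = 0x1401,
    SK_SHORT             = 0x1402,
    SK_FLOAT             = 0x1406,
    SK_DOUBLE            = 0x140A
};

/* Returns the error VertexAttribPointerNV would generate for these
   arguments, or SK_NO_ERROR.  Only the rules stated in the text are
   checked; the type itself is not validated here. */
int check_vertex_attrib_pointer(unsigned index, int size, int type, int stride)
{
    if (index > 15u)            return SK_INVALID_VALUE;
    if (size < 1 || size > 4)   return SK_INVALID_VALUE;
    if (stride < 0)             return SK_INVALID_VALUE;
    if (type == SK_UNSIGNED_BYTE && size != 4)
                                return SK_INVALID_OPERATION;
    return SK_NO_ERROR;
}
```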
2943
2944    The one, two, three, or four values in an array that correspond to a
2945    single vertex attribute comprise an array element.  The values within
2946    each array element are stored sequentially in memory.  If the stride
2947    is specified as zero, then array elements are stored sequentially
2948    as well.  Otherwise, pointers to the ith and (i+1)st elements of an array
2949    differ by stride basic machine units (typically unsigned bytes),
2950    the pointer to the (i+1)st element being greater.  pointer specifies
2951    the location in memory of the first value of the first element of
2952    the array being specified.
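
    The stride rule can be sketched as an address computation.  In the
    following C sketch, attrib_element and its parameter names are
    hypothetical; elem_bytes stands for size multiplied by the byte width of
    the array type.

```c
#include <stddef.h>

/* Address of the first value of element i of a vertex attribute array.
   A stride of zero means the elements are tightly packed, so consecutive
   elements are elem_bytes apart. */
const unsigned char *attrib_element(const void *pointer, size_t elem_bytes,
                                    size_t stride, size_t i)
{
    size_t step = (stride != 0) ? stride : elem_bytes;
    return (const unsigned char *)pointer + i * step;
}
```

    For example, three floats per element give elem_bytes = 12, so element 2
    starts 24 bytes in when stride is zero and 32 bytes in when stride is 16.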
2953
2954    Vertex attribute arrays are enabled with the EnableClientState command
2955    and disabled with the DisableClientState command.  The value of the
2956    argument to either command is VERTEX_ATTRIB_ARRAYi_NV where i is an
2957    integer between 0 and 15; specifying a value of i enables or
2958    disables the vertex attribute array with index i.  The constants
2959    obey VERTEX_ATTRIB_ARRAYi_NV = VERTEX_ATTRIB_ARRAY0_NV + i.
2960
2961    When vertex program mode is enabled, the ArrayElement command operates
2962    as described in this section in contrast to the behavior described
2963    in section 2.8.  Likewise, any vertex array transfer commands that
2964    are defined in terms of ArrayElement (DrawArrays, DrawElements, and
2965    DrawRangeElements) assume the operation of ArrayElement described
2966    in this section when vertex program mode is enabled.
2967
2968    When vertex program mode is enabled, the ArrayElement command
2969    transfers the ith element of particular enabled vertex arrays as
2970    described below.  For each enabled vertex attribute array, it is
2971    as though the corresponding command from section 2.14.1.1 were
2972    called with a pointer to element i.  For each vertex attribute,
2973    the corresponding command is VertexAttrib[size][type]v, where size
2974    is one of [1,2,3,4], and type is one of [s,f,d,ub], corresponding
2975    to the array types short, float, double, and ubyte respectively.
2976
2977    However, if a given vertex attribute array is disabled, but its
2978    corresponding aliased conventional per-vertex parameter's vertex
2979    array (as described in section 2.14.1.6) is enabled, then it is
2980    as though the corresponding command from section 2.7 or section
2981    2.6.2 were called with a pointer to element i.  In this case, the
2982    corresponding command is determined as described in section 2.8's
2983    description of ArrayElement.
2984
2985    If the vertex attribute array 0 is enabled, it is as though
2986    VertexAttrib[size][type]v(0, ...) is executed last, after the
2987    executions of other corresponding commands.  If the vertex attribute
2988    array 0 is disabled but the vertex array is enabled, it is as though
2989    Vertex[size][type]v is executed last, after the executions of other
2990    corresponding commands.
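
    The ordering rule for attribute 0 can be sketched as follows.
    attrib_issue_order is a hypothetical helper.  It covers only the vertex
    attribute arrays, ignores the aliased conventional arrays, and issues the
    remaining attributes in ascending index order, which is an assumption
    because the text only requires that attribute 0 come last.

```c
/* Fill order[] with the indices of the enabled vertex attribute arrays in
   the order their VertexAttrib commands would be issued for one array
   element.  Attribute 0, if enabled, is placed last.  Returns the number
   of entries written. */
int attrib_issue_order(const int enabled[16], int order[16])
{
    int n = 0;
    int i;

    for (i = 1; i < 16; ++i)
        if (enabled[i])
            order[n++] = i;
    if (enabled[0])
        order[n++] = 0;
    return n;
}
```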
2991
2992    2.14.5  Vertex State Programs
2993
2994    Vertex state programs share the same instruction set as and a similar
2995    execution model to vertex programs.  While vertex programs are executed
2996    implicitly when a vertex transformation is provoked, vertex state programs
2997    are executed explicitly, independently of any vertices.  Vertex state
2998    programs can write program parameter registers, but may not write vertex
2999    result registers.  Vertex state programs have not been extended beyond the
3000    VP1.0 execution environment, and are offered solely for compatibility
3001    with that execution environment.
3002
3003    The purpose of a vertex state program is to update program parameter
3004    registers by means of an application-defined program.  Typically, an
3005    application will load a set of program parameters and then execute a
3006    vertex state program that reads and updates the program parameter
3007    registers.  For example, a vertex state program might normalize a set of
3008    unnormalized vectors previously loaded as program parameters.  The
3009    expectation is that subsequently executed vertex programs would use the
3010    normalized program parameters.
3011
3012    Vertex state programs are loaded with the same LoadProgramNV command (see
3013    section 2.14.1.8) used to load vertex programs except that the target must
3014    be VERTEX_STATE_PROGRAM_NV when loading a vertex state program.
3015
3016    Vertex state programs must conform to a more limited grammar than the
3017    grammar for vertex programs.  The vertex state program grammar for
3018    syntactically valid sequences is the same as the grammar defined in
3019    section 2.14.1.8 with the following modified rules:
3020
3021    <program>              ::= <vp1-program>
3022
3023    <vp1-program>          ::= "!!VSP1.0" <programBody> "END"
3024
3025    <dstReg>               ::= <absProgParamReg>
3026                             | <temporaryReg>
3027
3028    <vertexAttribReg>      ::= "v" "[" "0" "]"
3029
3030    A vertex state program fails to load if it does not write at least
3031    one program parameter register.
3032
3033    A vertex state program fails to load if it contains more than 128
3034    instructions.
3035
3036    A vertex state program fails to load if any instruction sources more
3037    than one unique program parameter register.
3038
3039    A vertex state program fails to load if any instruction sources
3040    more than one unique vertex attribute register (this is necessarily
3041    true because only vertex attribute 0 is available in vertex state
3042    programs).
3043
3044    The error INVALID_OPERATION is generated if a vertex state program
3045    fails to load because it is not syntactically correct or for one
3046    of the other reasons listed above.
3047
3048    A successfully loaded vertex state program is parsed into a sequence
3049    of instructions.  Each instruction is identified by its tokenized
3050    name.  The operation of these instructions when executed is defined
3051    in section 2.14.1.10.
3052
3053    Executing vertex state programs is legal only outside a Begin/End
3054    pair.  A vertex state program may not read any vertex attribute
3055    register other than register zero.  A vertex state program may not
3056    write any vertex result register.
3057
3058    The command
3059
3060      ExecuteProgramNV(enum target, uint id, const float *params);
3061
3062    executes the vertex state program named by id.  The target must be
3063    VERTEX_STATE_PROGRAM_NV and the id must be the name of a program loaded
3064    with a target type of VERTEX_STATE_PROGRAM_NV.  params points to
3065    an array of four floating-point values that are loaded into vertex
3066    attribute register zero (the only vertex attribute readable from a
3067    vertex state program).
3068
3069    The INVALID_OPERATION error is generated if the named program is
3070    nonexistent, is invalid, or the program is not a vertex state
3071    program.  A vertex state program may not be valid for reasons
3072    explained in section 2.14.5.
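
    As an illustration of the normalization example mentioned earlier in this
    section, the following C sketch holds a small vertex state program as a
    string.  The program text is an assumption written against the VP1.0
    instruction syntax and is not taken from this specification.
    count_instructions is a hypothetical helper that counts semicolons so the
    result can be compared with the 128-instruction limit.  The GL calls that
    would load and execute the program appear only in a comment because they
    require a current GL context.

```c
#include <stddef.h>

/* Normalizes the xyz part of program parameter c[0] in place.  Each
   instruction sources only one unique program parameter register, and the
   program writes a program parameter register as required. */
const char normalize_c0_vsp[] =
    "!!VSP1.0\n"
    "DP3 R0.w, c[0], c[0];\n"      /* R0.w = dot(c[0].xyz, c[0].xyz) */
    "RSQ R0.w, R0.w;\n"            /* R0.w = 1 / sqrt(R0.w)          */
    "MUL c[0].xyz, c[0], R0.w;\n"  /* scale xyz, leave w untouched   */
    "END";

/* Crude instruction count: one semicolon per instruction. */
size_t count_instructions(const char *program)
{
    size_t n = 0;
    for (; *program != '\0'; ++program)
        if (*program == ';')
            ++n;
    return n;
}

/* With a current GL context, usage would look roughly like this:
     glLoadProgramNV(GL_VERTEX_STATE_PROGRAM_NV, id,
                     (GLsizei)(sizeof normalize_c0_vsp - 1),
                     (const GLubyte *)normalize_c0_vsp);
     glExecuteProgramNV(GL_VERTEX_STATE_PROGRAM_NV, id, params);
   where params points to the four floats loaded into v[0]. */
```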
3073
3074
3075    2.14.6  Program Options
3076
3077    In the VP1.1 and VP2.0 execution environments, vertex programs may specify
3078    one or more program options that modify the execution environment,
3079    according to the <option> grammar rule.  The set of options available to
3080    the program is described below.
3081
3082    Section 2.14.6.1, Position-Invariant Vertex Program Option
3083
3084    If <vp11-option> or <vp2-option> matches "NV_position_invariant", the
3085    vertex program is presumed to be position-invariant.  By default, vertex
3086    programs are not position-invariant.  Even if programs emulate the
3087    conventional OpenGL transformation model, they may still not produce the
3088    exact same transform results, due to rounding errors or different
3089    operation orders.  Such programs may not work well for multi-pass
3090    rendering algorithms where the second and subsequent passes use an EQUAL
3091    depth test.
3092
3093    Position-invariant vertex programs do not compute a final vertex position;
3094    instead, the GL computes vertex coordinates as described in section 2.10.
3095    This computation should produce exactly the same results as the
3096    conventional OpenGL transformation model, assuming vertex weighting and
3097    vertex blending are disabled.
3098
3099    A vertex program that specifies the position-invariant option will fail to
3100    load if it writes to the HPOS result register.
3101
3102    Additionally, in the VP1.1 execution environment, position-invariant
3103    programs cannot use relative addressing for program parameters.  Any
3104    position-invariant VP1.1 program that matches the grammar rule
3105    <relProgParamReg> will fail to load.  No such restriction exists for
3106    VP2.0 programs.
3107
3108    For position-invariant programs, the limit on the number of instructions
3109    allowed in a program is reduced by four:  position-invariant VP1.1 and
3110    VP2.0 programs may have no more than 124 or 252 instructions,
3111    respectively.
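
    The reduced limits follow from subtracting four from the base limits of
    128 (VP1.1) and 256 (VP2.0) instructions.  The following C sketch uses a
    hypothetical helper name.

```c
/* Maximum number of instructions a vertex program may contain.
   is_vp2 selects the VP2.0 limit (256) over the VP1.1 limit (128);
   position_invariant applies the reduction of four instructions. */
int max_vertex_program_instructions(int is_vp2, int position_invariant)
{
    int limit = is_vp2 ? 256 : 128;
    return position_invariant ? limit - 4 : limit;
}
```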
3112
3113
3114    2.14.7  Tracking Matrices
3115
3116    As a convenience to applications, standard GL matrix state can be
3117    tracked into program parameter vectors.  This permits vertex programs
3118    to access matrices specified through GL matrix commands.
3119
3120    In addition to GL's conventional matrices, several additional matrices
3121    are available for tracking.  These matrices have names of the form
3122    MATRIXi_NV where i is between zero and n-1 where n is the value
3123    of the MAX_TRACK_MATRICES_NV implementation dependent constant.
3124    The MATRIXi_NV constants obey MATRIXi_NV = MATRIX0_NV + i.  The value
3125    of MAX_TRACK_MATRICES_NV must be at least eight.  The maximum
3126    stack depth for tracking matrices is defined by the
3127    MAX_TRACK_MATRIX_STACK_DEPTH_NV constant and must be at least 1.
3128
3129    The command
3130
3131      TrackMatrixNV(enum target, uint address, enum matrix, enum transform);
3132
3133    tracks a given transformed version of a particular matrix into
3134    a contiguous sequence of four vertex program parameter registers
3135    beginning at address.  target must be VERTEX_PROGRAM_NV (though
3136    tracked matrices apply to vertex state programs as well because both
3137    vertex state programs and vertex programs share the same program
3138    parameter registers).  matrix must be one of NONE, MODELVIEW,
3139    PROJECTION, TEXTURE, TEXTUREi_ARB (where i is between 0 and n-1
3140    where n is the number of texture units supported), COLOR (if
3141    the ARB_imaging subset is supported), MODELVIEW_PROJECTION_NV,
3142    or MATRIXi_NV.  transform must be one of IDENTITY_NV, INVERSE_NV,
3143    TRANSPOSE_NV, or INVERSE_TRANSPOSE_NV.  The INVALID_VALUE error is
3144    generated if address is not a multiple of four.
3145
3146    The MODELVIEW_PROJECTION_NV matrix represents the concatenation of
3147    the current modelview and projection matrices.  If M is the current
3148    modelview matrix and P is the current projection matrix, then the
3149    MODELVIEW_PROJECTION_NV matrix is C and computed as
3150
3151        C = P M
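
    The concatenation can be sketched in C as follows, assuming both matrices
    are held in OpenGL's column-major layout, where element (row, col) is at
    index col * 4 + row.  mat4_mul is a hypothetical helper, and the output
    array must not alias either input.

```c
/* c = p * m for 4x4 column-major matrices. */
void mat4_mul(float c[16], const float p[16], const float m[16])
{
    int row, col, k;

    for (col = 0; col < 4; ++col) {
        for (row = 0; row < 4; ++row) {
            float sum = 0.0f;
            for (k = 0; k < 4; ++k)
                sum += p[k * 4 + row] * m[col * 4 + k];
            c[col * 4 + row] = sum;
        }
    }
}
```

    With P as the first factor, a vertex multiplied by C is transformed by M
    first and by P second.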
3152
3153    Matrix tracking for the specified program parameter register and the
3154    next consecutive three registers is disabled when NONE is supplied
3155    for matrix.  When tracking is disabled the previously tracked program
3156    parameter registers retain the state of their last tracked values.
3157    Otherwise, the specified transformed version of matrix is tracked into
3158    the specified program parameter register and the next three registers.
3159    Whenever the matrix changes, the transformed version of the matrix
3160    is updated in the specified range of program parameter registers.
3161    If TEXTURE is specified for matrix, the texture matrix for the current
3162    active texture unit is tracked.  If TEXTUREi_ARB is specified for
3163    matrix, the <i>th texture matrix is tracked.
3164
3165    Matrices are tracked row-wise meaning that the top row of the
3166    transformed matrix is loaded into the program parameter address,
3167    the second from the top row of the transformed matrix is loaded into
3168    the program parameter address+1, the third from the top row of the
3169    transformed matrix is loaded into the program parameter address+2,
3170    and the bottom row of the transformed matrix is loaded into the
3171    program parameter address+3.  The transformed matrix may be identical
3172    to the specified matrix, the inverse of the specified matrix, the
3173    transpose of the specified matrix, or the inverse transpose of the
3174    specified matrix, depending on the value of transform.
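
    Row-wise tracking can be sketched in C as follows, assuming the source
    matrix has already been transformed as requested and is stored in
    OpenGL's column-major layout.  track_rows and the params array are
    hypothetical stand-ins for the implementation's program parameter
    storage.

```c
/* Copy the four rows of a column-major 4x4 matrix into the program
   parameter registers address .. address+3.  address is assumed to be a
   multiple of four and within the bounds of params. */
void track_rows(float params[][4], unsigned address, const float m[16])
{
    unsigned row, col;

    for (row = 0; row < 4; ++row)
        for (col = 0; col < 4; ++col)
            params[address + row][col] = m[col * 4 + row];
}
```

    Storing one row per register lets a program transform a vector with one
    four-component dot product per row.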
3175
3176    When matrix tracking is enabled for a particular program parameter
3177    register sequence, updates to the program parameter using
3178    ProgramParameterNV commands, a vertex program, or a vertex state
3179    program are not possible.  The INVALID_OPERATION error is generated
3180    if a ProgramParameterNV command is used to update a program parameter
3181    register currently tracking a matrix.
3182
3183    The INVALID_OPERATION error is generated by ExecuteProgramNV when
3184    the vertex state program requested for execution writes to a program
3185    parameter register that is currently tracking a matrix because the
3186    program is considered invalid.
3187
3188    2.14.8  Required Vertex Program State
3189
3190    The state required for vertex programs consists of:
3191
3192      a bit indicating whether or not program mode is enabled;
3193
3194      a bit indicating whether or not two-sided color mode is enabled;
3195
3196      a bit indicating whether or not program-specified point size mode
3197      is enabled;
3198
3199      256 4-component floating-point program parameter registers;
3200
3201      16 4-component vertex attribute registers (though this state is
3202      aliased with the current normal, primary color, secondary color,
3203      fog coordinate, weights, and texture coordinate sets);
3204
3205      24 sets of matrix tracking state for each set of four sequential
3206      program parameter registers, consisting of an n-valued integer
3207      indicating the tracked matrix or GL_NONE (where n is 5 + the number
3208      of texture units supported + the number of tracking matrices
3209      supported) and a four-valued integer indicating the transformation
3210      of the tracked matrix;
3211
3212      an unsigned integer naming the currently bound vertex program;
3213
3214      and the state that must be maintained to indicate which integers
3215      are currently in use as program names.
3216
3217   Each existent program object consists of a target, a boolean indicating
3218   whether the program is resident, an array of type ubyte containing the
3219   program string, and the length of the program string array.  Initially,
3220   no program objects exist.
3221
3222   Program mode, two-sided color mode, and program-specified point size
3223   mode are all initially disabled.
3224
3225   The initial state of all 256 program parameter registers is (0,0,0,0).
3226
3227   The initial state of the 16 vertex attribute registers is (0,0,0,1)
3228   except in cases where a vertex attribute register aliases to a
3229   conventional GL transform mode vertex parameter in which case
3230   the initial state is the initial state of the respective aliased
3231   conventional vertex parameter.
3232
3233   The initial state of the 24 sets of matrix tracking state is NONE
3234   for the tracked matrix and IDENTITY_NV for the transformation of the
3235   tracked matrix.
3236
3237   The initial currently bound program is zero.
3238
3239   The client state required to implement the 16 vertex attribute
3240   arrays consists of 16 boolean values, 16 memory pointers, 16 integer
3241   stride values, 16 symbolic constants representing array types,
3242   and 16 integers representing values per element.  Initially, the
3243   boolean values are each disabled, the memory pointers are each null,
3244   the strides are each zero, the array types are each FLOAT, and the
3245   integers representing values per element are each four."
3246
3247
3248Additions to Chapter 3 of the OpenGL 1.3 Specification (Rasterization)
3249
3250    None.
3251
3252Additions to Chapter 4 of the OpenGL 1.3 Specification (Per-Fragment
3253Operations and the Frame Buffer)
3254
3255    None.
3256
3257Additions to Chapter 5 of the OpenGL 1.3 Specification (Special Functions)
3258
3259    None.
3260
3261Additions to Chapter 6 of the OpenGL 1.3 Specification (State and
3262State Requests)
3263
3264    None.
3265
3266Additions to Appendix A of the OpenGL 1.3 Specification (Invariance)
3267
3268    None.
3269
3270Additions to the AGL/GLX/WGL Specifications
3271
3272    None.
3273
3274GLX Protocol
3275
3276    All relevant protocol is defined in the NV_vertex_program extension.
3277
3278Errors
3279
3280    This list includes the errors specified in the NV_vertex_program
3281    extension, modified as appropriate.
3282
3283    The error INVALID_VALUE is generated if VertexAttribNV is called where
3284    index is greater than 15.
3285
3286    The error INVALID_VALUE is generated if any ProgramParameterNV command
3287    has an index greater than 255 (was 95 in NV_vertex_program).
3288
3289    The error INVALID_VALUE is generated if VertexAttribPointerNV is called
3290    where index is greater than 15.
3291
3292    The error INVALID_VALUE is generated if VertexAttribPointerNV is called
3293    where size is not one of 1, 2, 3, or 4.
3294
3295    The error INVALID_VALUE is generated if VertexAttribPointerNV is called
3296    where stride is negative.
3297
3298    The error INVALID_OPERATION is generated if VertexAttribPointerNV is
3299    called where type is UNSIGNED_BYTE and size is not 4.
3300
3301    The error INVALID_VALUE is generated if LoadProgramNV is used to load a
3302    program with an id of zero.
3303
3304    The error INVALID_OPERATION is generated if LoadProgramNV is used to load
3305    an id that is currently loaded with a program of a different program
3306    target.
3307
3308    The error INVALID_OPERATION is generated if the program passed to
3309    LoadProgramNV fails to load because it is not syntactically correct based
3310    on the specified target.  The value of PROGRAM_ERROR_POSITION_NV is still
3311    updated when this error is generated.
3312
3313    The error INVALID_OPERATION is generated if LoadProgramNV has a target of
3314    VERTEX_PROGRAM_NV and the specified program fails to load because it does
3315    not write the HPOS register at least once.  The value of
3316    PROGRAM_ERROR_POSITION_NV is still updated when this error is generated.
3317
3318    The error INVALID_OPERATION is generated if LoadProgramNV has a target of
3319    VERTEX_STATE_PROGRAM_NV and the specified program fails to load because it
3320    does not write at least one program parameter register.  The value of
3321    PROGRAM_ERROR_POSITION_NV is still updated when this error is generated.
3322
3323    The error INVALID_OPERATION is generated if the vertex program or vertex
3324    state program passed to LoadProgramNV fails to load because it contains
3325    more than 128 instructions (VP1 programs) or 256 instructions (VP2
3326    programs).  The value of PROGRAM_ERROR_POSITION_NV is still updated when
3327    this error is generated.
3328
3329    The error INVALID_OPERATION is generated if a program is loaded with
3330    LoadProgramNV for id when id is currently loaded with a program of a
3331    different target.
3332
3333    The error INVALID_OPERATION is generated if BindProgramNV attempts to bind
3334    to a program name that is not a vertex program (for example, if the
3335    program is a vertex state program).
3336
3337    The error INVALID_VALUE is generated if GenProgramsNV is called where n is
3338    negative.
3339
3340    The error INVALID_VALUE is generated if AreProgramsResidentNV is called
3341    and any of the queried programs are zero or do not exist.
3342
3343    The error INVALID_OPERATION is generated if ExecuteProgramNV executes a
3344    program that does not exist.
3345
3346    The error INVALID_OPERATION is generated if ExecuteProgramNV executes a
3347    program that is not a vertex state program.
3348
3349    The error INVALID_OPERATION is generated if Begin, RasterPos, or a command
3350    that performs an explicit Begin is called when vertex program mode is
3351    enabled and the currently bound vertex program writes program parameters
3352    that are currently being tracked.
3353
3354    The error INVALID_OPERATION is generated if ExecuteProgramNV is called and
3355    the vertex state program to execute writes program parameters that are
3356    currently being tracked.
3357
3358    The error INVALID_VALUE is generated if TrackMatrixNV has a target of
3359    VERTEX_PROGRAM_NV and attempts to track an address that is not a
3360    multiple of four.
3361
3362    The error INVALID_VALUE is generated if GetProgramParameterNV is called to
3363    query an index greater than 255 (was 95 in NV_vertex_program).
3364
3365    The error INVALID_VALUE is generated if GetVertexAttribNV is called to
3366    query an <index> greater than 15, or if <index> is zero and <pname> is
3367    CURRENT_ATTRIB_NV.
3368
3369    The error INVALID_VALUE is generated if GetVertexAttribPointervNV is
3370    called to query an index greater than 15.
3371
3372    The error INVALID_OPERATION is generated if GetProgramivNV is called and
3373    the program named id does not exist.
3374
3375    The error INVALID_OPERATION is generated if GetProgramStringNV is called
3376    and the program named <program> does not exist.
3377
3378    The error INVALID_VALUE is generated if GetTrackMatrixivNV is called with
3379    an <address> that is not divisible by four or that is greater than or
3380    equal to 256 (was 96 in NV_vertex_program).
3381
3382    The error INVALID_VALUE is generated if AreProgramsResidentNV,
3383    DeleteProgramsNV, GenProgramsNV, or RequestResidentProgramsNV are called
3384    where <n> is negative.
3385
3386    The error INVALID_VALUE is generated if LoadProgramNV is called where
3387    <len> is negative.
3388
3389    The error INVALID_VALUE is generated if ProgramParameters4dvNV or
3390    ProgramParameters4fvNV are called where <count> is negative.
3391
3392    The error INVALID_VALUE is generated if VertexAttribs{1,2,3,4}{d,f,s}vNV
3393    is called where <count> is negative.
3394
3395    The error INVALID_ENUM is generated if BindProgramNV,
3396    GetProgramParameterfvNV, GetProgramParameterdvNV, GetTrackMatrixivNV,
3397    ProgramParameter4fNV, ProgramParameter4dNV, ProgramParameter4fvNV,
3398    ProgramParameter4dvNV, ProgramParameters4fvNV, ProgramParameters4dvNV,
3399    or TrackMatrixNV are called where <target> is not VERTEX_PROGRAM_NV.
3400
3401    The error INVALID_ENUM is generated if LoadProgramNV or
3402    ExecuteProgramNV are called where <target> is not either
3403    VERTEX_PROGRAM_NV or VERTEX_STATE_PROGRAM_NV.
3404
3405New State
3406
3407(Modify Table X.5, New State Introduced by NV_vertex_program from the
3408 NV_vertex_program specification.)
3409
3410Get Value             Type    Get Command              Initial Value  Description         Sec       Attribute
3411--------------------- ------  -----------------------  -------------  ------------------  --------  ------------
3412PROGRAM_PARAMETER_NV  256xR4  GetProgramParameterNV    (0,0,0,0)      program parameters  2.14.1.2  -
3413
3414
3415(Modify Table X.7.  Vertex Program Per-vertex Execution State.  "VP1" and
3416"VP2" refer to the VP1 and VP2 execution environments, respectively.)
3417
3418Get Value    Type    Get Command   Initial Value  Description              Sec       Attribute
3419---------    ------  -----------   -------------  -----------------------  --------  ---------
3420-            12xR4   -             (0,0,0,0)      VP1 temporary registers  2.14.1.4  -
3421-            16xR4   -             (0,0,0,0)      VP2 temporary registers  2.14.1.4  -
3422-            15xR4   -             (0,0,0,1)      vertex result registers  2.14.1.4  -
3423-            Z4      -             (0,0,0,0)      VP1 address register     2.14.1.3  -
3424-            2xZ4    -             (0,0,0,0)      VP2 address registers    2.14.1.3  -
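
The per-environment register counts in the table above could be recorded by a
tool such as a program validator roughly as follows.  This is an illustrative
sketch only: every type and function name is hypothetical, and the numbers
are copied from the table (12 or 16 temporaries, 1 or 2 address registers).

```c
#include <stdbool.h>

/* Hypothetical tag for the two execution environments named in the table. */
enum sketch_vp_env { SKETCH_VP1, SKETCH_VP2 };

/* Hypothetical record of the per-environment counts from Table X.7. */
struct sketch_vp_limits {
    unsigned int temporaries;        /* 12xR4 in VP1, 16xR4 in VP2 */
    unsigned int address_registers;  /* Z4 in VP1, 2xZ4 in VP2 */
};

/* Returns the register counts listed in the table for one environment. */
struct sketch_vp_limits sketch_limits_for(enum sketch_vp_env env)
{
    struct sketch_vp_limits limits;
    if (env == SKETCH_VP2) {
        limits.temporaries = 16u;
        limits.address_registers = 2u;
    } else {
        limits.temporaries = 12u;
        limits.address_registers = 1u;
    }
    return limits;
}

/* Returns true if temporary register <index> exists in the environment. */
bool sketch_temporary_in_range(enum sketch_vp_env env, unsigned int index)
{
    return index < sketch_limits_for(env).temporaries;
}
```

With these counts, temporary index 12 is in range for a VP2 program but not
for a VP1 program.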
3425
3426
3427Revision History
3428
3429    Rev.  Date      Author   Changes
3430    ----  --------  -------  --------------------------------------------
3431    33    03/18/08  pbrown   Fixed incorrectly documented clamp in the RCC
3432                             instruction.
3433
3434    32    05/16/04  pbrown   Documented that it's not possible to get results from
3435                             LG2 that are any more precise than what is
3436                             available in the fp32 storage format.
3437
3438    31    08/17/03  pbrown   Added several overlooked opcodes (RCC, SUB, SIN)
3439                             to the grammar.  They are documented in the spec
3440                             body, however.
3441
3442    30    02/28/03  pbrown   Fixed incorrect condition code example.
3443
3444    29    12/08/02  pbrown   Fixed minor bug where "ABS" and "DPH" were listed
3445                             twice in the grammar.
3446
3447    28    10/29/02  pbrown   Remove support for indirect branching.  Added
3448                             missing o[CLPx] outputs to the grammar.  Minor
3449                             typo fixes.
3450
3451    25    07/19/02  pbrown   Fixed several miscellaneous errors in the spec.
3452
3453    24    06/28/02  pbrown   Fixed several erroneous resource limitations.
3454
3455    23    06/07/02  pbrown   Removed stray and erroneous abs() from the
3456                             documentation of the LG2 instruction.
3457
3458    22    06/06/02  pbrown   Added missing items from NV_vertex_program1_1, in
3459                             particular, program options.  Documented that
3460                             VP2.0 position-invariant programs have no
3461                             restrictions on indirect addressing.
3462
3463    21    06/19/02  pbrown   Cleaned up miscellaneous errors and issues
3464                             in the spec.
3465
3466    20    05/17/02  pbrown   Documented LOG instruction as taking the
3467                             absolute value of the operand, as in VP1.0.
3468                             Fixed special-case rules for MUL.  Added clamps
3469                             to special-case clamping rules for RCC.
3470
3471    18    05/09/02  pbrown   Clarified the handling of NaN/UN in certain
3472                             instructions and conditional operations.
3473
3474    17    04/26/02  pbrown   Fix incorrectly specified algorithm for computing
3475                             the y result in the LOG instruction.
3476
3477    16    04/21/02  pbrown   Added example for "paletted skinning".
3478                             Documented size limitation (10 bits) on the
3479                             address register and ARA, ARL, and ARR
3480                             instructions.  The limit needs to be exposed
3481                             because of the ARA instruction.  Cleaned up
3482                             documentation on absolute value on input
3483                             operations.  Added examples for masked writes and
3484                             CC updates, and for branching.  Fixed
3485                             out-of-range indexed branch language and
3486                             pseudocode to clamp to the actual table size
3487                             (rather than the theoretical maximum).
3488                             Documented ABS as semi-deprecated in VP2.  Fixed
3489                             special cases for MIN, MAX, SEQ, SGE, SGT, SLE,
3490                             SLT, and SNE.  Fix completely botched description
3491                             of RET.
3492
3493    15    04/05/02  pbrown   Updated introduction to indicate that
3494                             ARL/ARR/ARA all can update condition code.
3495                             Minor fixes and optimizations to the looping
3496                             examples.  Add missing "set on" opcodes to the
3497                             grammar.  Fixed spec to clamp branch table
3498                             indices to [0,15].  Added a couple caveats to
3499                             the "ABS" pseudo-instruction.   Documented
3500                             "ARR" as using IEEE round to nearest even
3501                             mode.  Documented special cases for "SSG".
3502