1Name
2
3    NV_gpu_program4
4
5Name Strings
6
7    GL_NV_gpu_program4
8
9Contact
10
11    Pat Brown, NVIDIA Corporation (pbrown 'at' nvidia.com)
12
13Status
14
15    Shipping for GeForce 8 Series (November 2006)
16
17Version
18
19    Last Modified Date:         09/11/2014
20    NVIDIA Revision:            11
21
22Number
23
24    322
25
26Dependencies
27
28    This extension is written against the OpenGL 2.0 specification.
29
30    OpenGL 2.0 is not required, but we expect all implementations of this
31    extension will also support OpenGL 2.0.
32
33    This extension is also written against the ARB_vertex_program
34    specification, which provides the basic mechanisms for the assembly
35    programming model used by this extension.
36
37    This extension serves as the basis for the NV_fragment_program4,
38    NV_geometry_program4, and NV_vertex_program4 extensions, which all build
39    on this extension to support fragment, geometry, and vertex programs,
40    respectively.  If "GL_NV_gpu_program4" is found in the extension string,
41    all of these extensions are supported.
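
    As an illustration of the extension-string test described above, the
    following C sketch checks for a complete, space-delimited name.  The
    helper name has_extension is hypothetical and not part of any GL API.
    In practice the string would typically come from
    glGetString(GL_EXTENSIONS), which is left out here so that the example
    does not need a GL context.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not part of OpenGL).  Returns 1 if `name` appears
 * as a complete space-delimited token in `extensions`, 0 otherwise.  A
 * plain strstr() is not enough, because one extension name can be a
 * prefix of another. */
int has_extension(const char *extensions, const char *name)
{
    size_t name_len;
    const char *p = extensions;

    assert(extensions != NULL && name != NULL);

    name_len = strlen(name);
    if (name_len == 0)
        return 0;

    while ((p = strstr(p, name)) != NULL) {
        int starts_token = (p == extensions) || (p[-1] == ' ');
        int ends_token = (p[name_len] == ' ') || (p[name_len] == '\0');
        if (starts_token && ends_token)
            return 1;
        p += name_len;   /* keep searching after this partial match */
    }
    return 0;
}
```

    A caller would test for "GL_NV_gpu_program4" once and, per the paragraph
    above, treat the fragment, geometry, and vertex program extensions as
    available when the test succeeds.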
42
43    NV_parameter_buffer_object affects the definition of this extension.
44
45    ARB_texture_rectangle trivially affects the definition of this extension.
46
47    EXT_gpu_program_parameters trivially affects the definition of this
48    extension.
49
50    EXT_texture_integer trivially affects the definition of this extension.
51
52    EXT_texture_array trivially affects the definition of this extension.
53
54    EXT_texture_buffer_object trivially affects the definition of this
55    extension.
56
57    NV_primitive_restart trivially affects the definition of this extension.
58
59Overview
60
61    This specification documents the common instruction set and basic
62    functionality provided by NVIDIA's 4th generation of assembly instruction
63    sets supporting programmable graphics pipeline stages.
64
65    The instruction set builds upon the basic framework provided by the
66    ARB_vertex_program and ARB_fragment_program extensions to expose
67    considerably more capable hardware.  In addition to new capabilities for
68    vertex and fragment programs, this extension provides a new program type
69    (geometry programs) further described in the NV_geometry_program4
70    specification.
71
72    NV_gpu_program4 provides a unified instruction set -- all instruction set
73    features are available for all program types, except for a small number of
74    features that make sense only for a specific program type.  It provides
75    fully capable signed and unsigned integer data types, along with a set of
76    arithmetic, logical, and data type conversion instructions capable of
77    operating on integers.  It also provides a uniform set of structured
78    branching constructs (if tests, loops, and subroutines) that fully support
79    run-time condition testing.
80
81    This extension provides several new texture mapping capabilities.  Shadow
82    cube maps are supported, where cube map faces can encode depth values.
83    Texture lookup instructions can include an immediate texel offset, which
84    can assist in advanced filtering.  New instructions are provided to fetch
85    a single texel by address in a texture map (TXF) and query the size of a
86    specified texture level (TXQ).
87
88    By and large, vertex and fragment programs written to ARB_vertex_program
89    and ARB_fragment_program can be ported directly by simply changing the
90    program header from "!!ARBvp1.0" or "!!ARBfp1.0" to "!!NVvp4.0" or
91    "!!NVfp4.0", and then modifying the code to take advantage of the expanded
92    feature set.  There are a small number of areas where this extension is
93    not a functional superset of previous vertex program extensions, which are
94    documented in this specification.
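
    The header change described above can be sketched as a simple prefix
    swap.  The C helper below is a hypothetical illustration, not part of
    this extension; it only rewrites the header and copies the rest of the
    program text unchanged.  Because this extension is not a strict
    superset of the earlier ones, a ported program may still need review.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: copies `src` into `dst`, replacing a leading
 * "!!ARBvp1.0" or "!!ARBfp1.0" header with "!!NVvp4.0" or "!!NVfp4.0".
 * Returns 0 on success, or -1 if the header is not recognized or `dst`
 * is too small.  The program body is copied as-is. */
int port_program_header(const char *src, char *dst, size_t dst_size)
{
    static const struct { const char *from; const char *to; } map[] = {
        { "!!ARBvp1.0", "!!NVvp4.0" },
        { "!!ARBfp1.0", "!!NVfp4.0" },
    };
    size_t i;

    assert(src != NULL && dst != NULL);

    for (i = 0; i < sizeof map / sizeof map[0]; i++) {
        size_t from_len = strlen(map[i].from);
        if (strncmp(src, map[i].from, from_len) == 0) {
            int n = snprintf(dst, dst_size, "%s%s",
                             map[i].to, src + from_len);
            return (n >= 0 && (size_t)n < dst_size) ? 0 : -1;
        }
    }
    return -1;   /* neither ARB header was found */
}
```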
95
96
97New Procedures and Functions
98
99    void ProgramLocalParameterI4iNV(enum target, uint index,
100                                    int x, int y, int z, int w);
101    void ProgramLocalParameterI4ivNV(enum target, uint index,
102                                     const int *params);
103    void ProgramLocalParametersI4ivNV(enum target, uint index,
104                                      sizei count, const int *params);
105    void ProgramLocalParameterI4uiNV(enum target, uint index,
106                                     uint x, uint y, uint z, uint w);
107    void ProgramLocalParameterI4uivNV(enum target, uint index,
108                                      const uint *params);
109    void ProgramLocalParametersI4uivNV(enum target, uint index,
110                                       sizei count, const uint *params);
111
112    void ProgramEnvParameterI4iNV(enum target, uint index,
113                                  int x, int y, int z, int w);
114    void ProgramEnvParameterI4ivNV(enum target, uint index,
115                                   const int *params);
116    void ProgramEnvParametersI4ivNV(enum target, uint index,
117                                    sizei count, const int *params);
118    void ProgramEnvParameterI4uiNV(enum target, uint index,
119                                   uint x, uint y, uint z, uint w);
120    void ProgramEnvParameterI4uivNV(enum target, uint index,
121                                    const uint *params);
122    void ProgramEnvParametersI4uivNV(enum target, uint index,
123                                     sizei count, const uint *params);
124
125    void GetProgramLocalParameterIivNV(enum target, uint index,
126                                       int *params);
127    void GetProgramLocalParameterIuivNV(enum target, uint index,
128                                        uint *params);
129    void GetProgramEnvParameterIivNV(enum target, uint index,
130                                     int *params);
131    void GetProgramEnvParameterIuivNV(enum target, uint index,
132                                      uint *params);
133
134New Tokens
135
136
137    Accepted by the <pname> parameter of GetBooleanv, GetIntegerv,
138    GetFloatv, and GetDoublev:
139
140        MIN_PROGRAM_TEXEL_OFFSET_EXT                    0x8904
141        MAX_PROGRAM_TEXEL_OFFSET_EXT                    0x8905
142
143    (note:  these tokens are shared with the EXT_gpu_shader4 extension.)
144
145    Accepted by the <pname> parameter of GetProgramivARB:
146
147        PROGRAM_ATTRIB_COMPONENTS_NV                    0x8906
148        PROGRAM_RESULT_COMPONENTS_NV                    0x8907
149        MAX_PROGRAM_ATTRIB_COMPONENTS_NV                0x8908
150        MAX_PROGRAM_RESULT_COMPONENTS_NV                0x8909
151        MAX_PROGRAM_GENERIC_ATTRIBS_NV                  0x8DA5
152        MAX_PROGRAM_GENERIC_RESULTS_NV                  0x8DA6
153
154Additions to Chapter 2 of the OpenGL 1.5 Specification (OpenGL Operation)
155
156    (Modify "Section 2.14.1" of the ARB_vertex_program specification,
157    describing program parameters.)
158
159    Each program object has an associated array of program local parameters.
160    Program local parameters are four-component vectors whose components can
161    hold floating-point, signed integer, or unsigned integer values.  The data
162    type of each local parameter is established when the parameter's values
163    are assigned.  If a program attempts to read a local parameter using a
164    data type other than the one used when the parameter is set, the values
165    returned are undefined.  ... The commands
166
167      void ProgramLocalParameter4fARB(enum target, uint index,
168                                      float x, float y, float z, float w);
169      void ProgramLocalParameter4fvARB(enum target, uint index,
170                                       const float *params);
171      void ProgramLocalParameter4dARB(enum target, uint index,
172                                      double x, double y, double z, double w);
173      void ProgramLocalParameter4dvARB(enum target, uint index,
174                                       const double *params);
175
176      void ProgramLocalParameterI4iNV(enum target, uint index,
177                                      int x, int y, int z, int w);
178      void ProgramLocalParameterI4ivNV(enum target, uint index,
179                                       const int *params);
180      void ProgramLocalParameterI4uiNV(enum target, uint index,
181                                       uint x, uint y, uint z, uint w);
182      void ProgramLocalParameterI4uivNV(enum target, uint index,
183                                        const uint *params);
184
185    update the values of the program local parameter numbered <index>
186    belonging to the program object currently bound to <target>.  For the
187    non-vector versions of these commands, the four components of the
188    parameter are updated with the values of <x>, <y>, <z>, and <w>,
189    respectively.  For the vector versions, the components of the parameter
190    are updated with the array of four values pointed to by <params>.  The
191    error INVALID_VALUE is generated if <index> is greater than or equal to
192    the number of program local parameters supported by <target>.
193
194    The commands
195
196      void ProgramLocalParameters4fvNV(enum target, uint index,
197                                       sizei count, const float *params);
198      void ProgramLocalParametersI4ivNV(enum target, uint index,
199                                        sizei count, const int *params);
200      void ProgramLocalParametersI4uivNV(enum target, uint index,
201                                         sizei count, const uint *params);
202
203    update the values of the program local parameters numbered <index> through
204    <index> + <count> - 1 with the array of 4 * <count> values pointed to by
205    <params>.  The error INVALID_VALUE is generated if the sum of <index> and
206    <count> is greater than the number of program local parameters supported
207    by <target>.
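
    The two INVALID_VALUE rules above can be modeled without a GL context.
    The C sketch below is illustrative only: the function names and the
    MODEL_ constants are hypothetical, and <count> is taken as unsigned, so
    negative counts are outside the model.  The numeric value 0x0501 is the
    value of INVALID_VALUE in the core OpenGL headers.

```c
#include <assert.h>
#include <stdint.h>

/* Local stand-ins for NO_ERROR and INVALID_VALUE. */
#define MODEL_NO_ERROR       0u
#define MODEL_INVALID_VALUE  0x0501u

/* Single-parameter update: <index> must be less than the number of
 * parameters supported by the target. */
unsigned check_single_update(uint32_t index, uint32_t max_params)
{
    return (index >= max_params) ? MODEL_INVALID_VALUE : MODEL_NO_ERROR;
}

/* Multi-parameter update: <index> + <count> must not exceed the number
 * of parameters.  The sum is formed in 64 bits so it cannot wrap. */
unsigned check_range_update(uint32_t index, uint32_t count,
                            uint32_t max_params)
{
    uint64_t end = (uint64_t)index + (uint64_t)count;

    assert(end >= index);   /* holds because the sum is 64 bits wide */
    return (end > max_params) ? MODEL_INVALID_VALUE : MODEL_NO_ERROR;
}
```

    Note the asymmetry between the two checks: the single-parameter form
    rejects <index> equal to the parameter count, while the range form
    compares the sum with "greater than", so a range that ends exactly at
    the last parameter is accepted.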
208
209    When a program local parameter is updated, the data type of its components
210    is assigned according to the data type of the provided values.  If values
211    provided are of type "float" or "double", the components of the parameter
212    are floating-point.  If the values provided are of type "int", the
213    components of the parameter are signed integers.  If the values provided
214    are of type "uint", the components of the parameter are unsigned integers.
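
    One way to picture the typing rule above is a parameter that carries a
    tag recording how it was last written.  The C model below is a sketch
    under that assumption; ProgramParam and the param_ functions are
    hypothetical names and say nothing about how an implementation stores
    parameters.  The "double" entry points are omitted; per the text above
    they also produce floating-point components.

```c
#include <assert.h>
#include <string.h>

typedef enum { PARAM_FLOAT, PARAM_INT, PARAM_UINT } ParamType;

/* Hypothetical model of one four-component program parameter. */
typedef struct {
    ParamType type;          /* set by the most recent update */
    union {
        float    f[4];
        int      i[4];
        unsigned u[4];
    } v;
} ProgramParam;

void param_set_float(ProgramParam *p, const float *src)
{
    assert(p != NULL && src != NULL);
    p->type = PARAM_FLOAT;
    memcpy(p->v.f, src, sizeof p->v.f);
}

void param_set_int(ProgramParam *p, const int *src)
{
    assert(p != NULL && src != NULL);
    p->type = PARAM_INT;
    memcpy(p->v.i, src, sizeof p->v.i);
}

void param_set_uint(ProgramParam *p, const unsigned *src)
{
    assert(p != NULL && src != NULL);
    p->type = PARAM_UINT;
    memcpy(p->v.u, src, sizeof p->v.u);
}

/* Returns 1 if a program reading the parameter as `as` gets defined
 * values, 0 if the values returned would be undefined. */
int param_read_is_defined(const ProgramParam *p, ParamType as)
{
    assert(p != NULL);
    return p->type == as;
}
```

    In this model, writing a parameter with an integer command and then
    reading it as floating-point is flagged as undefined until the
    parameter is written again with a floating-point command.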
215
216    Additionally, each program target has an associated array of program
217    environment parameters.  Unlike program local parameters, program
218    environment parameters are shared by all program objects of a given
219    target.  Program environment parameters are four-component vectors whose
220    components can hold floating-point, signed integer, or unsigned integer
221    values.  The data type of each environment parameter is established when
222    the parameter's values are assigned.  If a program attempts to read an
223    environment parameter using a data type other than the one used when the
224    parameter is set, the values returned are undefined.  ... The commands
225
226      void ProgramEnvParameter4fARB(enum target, uint index,
227                                    float x, float y, float z, float w);
228      void ProgramEnvParameter4fvARB(enum target, uint index,
229                                     const float *params);
230      void ProgramEnvParameter4dARB(enum target, uint index,
231                                    double x, double y, double z, double w);
232      void ProgramEnvParameter4dvARB(enum target, uint index,
233                                     const double *params);
234      void ProgramEnvParameterI4iNV(enum target, uint index,
235                                    int x, int y, int z, int w);
236      void ProgramEnvParameterI4ivNV(enum target, uint index,
237                                     const int *params);
238      void ProgramEnvParameterI4uiNV(enum target, uint index,
239                                     uint x, uint y, uint z, uint w);
240      void ProgramEnvParameterI4uivNV(enum target, uint index,
241                                      const uint *params);
242
243    update the values of the program environment parameter numbered <index>
244    for the given program target <target>.  For the non-vector versions of
245    these commands, the four components of the parameter are updated with the
246    values of <x>, <y>, <z>, and <w>, respectively.  For the vector versions,
247    the four components of the parameter are updated with the array of four
248    values pointed to by <params>.  The error INVALID_VALUE is generated if
249    <index> is greater than or equal to the number of program environment
250    parameters supported by <target>.
251
252    The commands
253
254      void ProgramEnvParameters4fvNV(enum target, uint index,
255                                     sizei count, const float *params);
256      void ProgramEnvParametersI4ivNV(enum target, uint index,
257                                      sizei count, const int *params);
258      void ProgramEnvParametersI4uivNV(enum target, uint index,
259                                       sizei count, const uint *params);
260
261    update the values of the program environment parameters numbered <index>
262    through <index> + <count> - 1 with the array of 4 * <count> values pointed
263    to by <params>.  The error INVALID_VALUE is generated if the sum of
264    <index> and <count> is greater than the number of program environment
265    parameters supported by <target>.
266
267    When a program environment parameter is updated, the data type of its
268    components is assigned according to the data type of the provided values.
269    If values provided are of type "float" or "double", the components of the
270    parameter are floating-point.  If the values provided are of type "int",
271    the components of the parameter are signed integers.  If the values
272    provided are of type "uint", the components of the parameter are unsigned
273    integers.
274
275    ...
276
277
278    Insert New Section 2.X between Sections 2.Y and 2.Z:
279
280    Section 2.X, GPU Programs
281
282    The GL provides a number of different program targets that allow an
283    application to either replace certain fixed-function pipeline stages with
284    a fully programmable model or use a program to control aspects of the GL
285    pipeline that previously had only hard-wired behavior.
286
287    A common base instruction set is available for all program types,
288    providing both integer and floating-point operations.  Structured
289    branching operations and subroutine calls are available.  Texture
290    mapping (loading data from external images) is supported for all
291    program types.  The main differences between the different program
292    types are the set of available inputs and outputs, which are program type-
293    specific, and a few instructions that are meaningful for only a subset
294    of program types.
295
296
297
298    Section 2.X.2, Program Grammar
299
300    GPU program strings are specified as an array of ASCII characters
301    containing the program text.  When a GPU program is loaded by a call to
302    ProgramStringARB, the program string is parsed into a set of tokens
303    possibly separated by whitespace.  Spaces, tabs, newlines, carriage
304    returns, and comments are considered whitespace.  Comments begin with the
305    character "#" and are terminated by a newline, a carriage return, or the
306    end of the program array.
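
    The whitespace and comment rules above can be sketched as a small
    scanner step.  The C function below is a hypothetical illustration,
    not the parser of any implementation; it takes the program text as an
    array with an explicit length and returns the position of the next
    character that could start a token.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical scanner step.  Returns the index of the first character
 * at or after `pos` that is not whitespace, or `len` if only whitespace
 * remains.  Spaces, tabs, newlines, carriage returns, and comments all
 * count as whitespace. */
size_t skip_whitespace(const char *text, size_t len, size_t pos)
{
    assert(text != NULL && pos <= len);

    while (pos < len) {
        char c = text[pos];
        if (c == ' ' || c == '\t' || c == '\n' || c == '\r') {
            pos++;
        } else if (c == '#') {
            /* A comment runs up to a newline, a carriage return, or
             * the end of the program array. */
            while (pos < len && text[pos] != '\n' && text[pos] != '\r')
                pos++;
        } else {
            break;   /* start of a token */
        }
    }
    return pos;
}
```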
307
308    The Backus-Naur Form (BNF) grammar below specifies the syntactically valid
309    sequences for GPU programs.  The set of valid tokens can be inferred
310    from the grammar.  A line containing "/* empty */" represents an empty
311    string and is used to indicate optional rules.  A program is invalid if it
312    contains any tokens or characters not defined in this specification.
313
314    Note that this extension is not a standalone extension and a small number
315    of grammar rules are left to be defined in the extensions defining the
316    specific vertex, fragment, and geometry program types.
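
    To illustrate how a recursive rule in the grammar below can be read,
    the following C sketch walks the <opModifiers> rule, which is a
    possibly empty sequence of "." <opModifier> items following an opcode.
    The function names are hypothetical.  The sketch assumes the opcode
    and its modifiers arrive as one token with no embedded whitespace, and
    it does not check the opcode itself or whether a given modifier is
    meaningful for a given instruction.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* The alternatives of the <opModifier> rule. */
static const char *const k_modifiers[] = {
    "F", "U", "S", "CC", "CC0", "CC1",
    "SAT", "SSAT", "NTC", "S24", "U24", "HI"
};

static int is_modifier(const char *s, size_t n)
{
    size_t i;
    for (i = 0; i < sizeof k_modifiers / sizeof k_modifiers[0]; i++) {
        if (strlen(k_modifiers[i]) == n &&
            memcmp(k_modifiers[i], s, n) == 0)
            return 1;
    }
    return 0;
}

/* Hypothetical helper.  Counts the modifiers in a token such as
 * "ADD.S.CC1".  Returns the count, or -1 if any item after a "." is
 * not one of the <opModifier> alternatives. */
int count_op_modifiers(const char *token)
{
    const char *p;
    int count = 0;

    assert(token != NULL);

    p = strchr(token, '.');
    while (p != NULL) {
        const char *start = p + 1;
        const char *next = strchr(start, '.');
        size_t n = next ? (size_t)(next - start) : strlen(start);
        if (!is_modifier(start, n))
            return -1;
        count++;
        p = next;    /* the rule recurses on the remaining items */
    }
    return count;
}
```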
317
318
319    <program>               ::= <optionSequence> <declSequence>
320                                <statementSequence> "END"
321
322    <optionSequence>        ::= <option> <optionSequence>
323                              | /* empty */
324
325    <option>                ::= "OPTION" <identifier> ";"
326
327    <declSequence>          ::= /* empty */
328
329    <statementSequence>     ::= <statement> <statementSequence>
330                              | /* empty */
331
332    <statement>             ::= <instruction> ";"
333                              | <namingStatement> ";"
334                              | <instLabel> ":"
335
336    <instruction>           ::= <ALUInstruction>
337                              | <TexInstruction>
338                              | <FlowInstruction>
339
340    <ALUInstruction>        ::= <VECTORop_instruction>
341                              | <SCALARop_instruction>
342                              | <BINSCop_instruction>
343                              | <BINop_instruction>
344                              | <VECSCAop_instruction>
345                              | <TRIop_instruction>
346                              | <SWZop_instruction>
347
348    <TexInstruction>        ::= <TEXop_instruction>
349                              | <TXDop_instruction>
350
351    <FlowInstruction>       ::= <BRAop_instruction>
352                              | <FLOWCCop_instruction>
353                              | <IFop_instruction>
354                              | <REPop_instruction>
355                              | <ENDFLOWop_instruction>
356
357    <VECTORop_instruction>  ::= <VECTORop> <opModifiers> <instResult> ","
358                                <instOperandV>
359
360    <VECTORop>              ::= "ABS"
361                              | "CEIL"
362                              | "FLR"
363                              | "FRC"
364                              | "I2F"
365                              | "LIT"
366                              | "MOV"
367                              | "NOT"
368                              | "NRM"
369                              | "PK2H"
370                              | "PK2US"
371                              | "PK4B"
372                              | "PK4UB"
373                              | "ROUND"
374                              | "SSG"
375                              | "TRUNC"
376
377    <SCALARop_instruction>  ::= <SCALARop> <opModifiers> <instResult> ","
378                                <instOperandS>
379
380    <SCALARop>              ::= "COS"
381                              | "EX2"
382                              | "LG2"
383                              | "RCC"
384                              | "RCP"
385                              | "RSQ"
386                              | "SCS"
387                              | "SIN"
388                              | "UP2H"
389                              | "UP2US"
390                              | "UP4B"
391                              | "UP4UB"
392
393    <BINSCop_instruction>   ::= <BINSCop> <opModifiers> <instResult> ","
394                                <instOperandS> "," <instOperandS>
395
396    <BINSCop>               ::= "POW"
397
398    <VECSCAop_instruction>  ::= <VECSCAop> <opModifiers> <instResult> ","
399                                <instOperandV> "," <instOperandS>
400
401    <VECSCAop>              ::= "DIV"
402                              | "SHL"
403                              | "SHR"
404                              | "MOD"
405
406    <BINop_instruction>     ::= <BINop> <opModifiers> <instResult> ","
407                                <instOperandV> "," <instOperandV>
408
409    <BINop>                 ::= "ADD"
410                              | "AND"
411                              | "DP3"
412                              | "DP4"
413                              | "DPH"
414                              | "DST"
415                              | "MAX"
416                              | "MIN"
417                              | "MUL"
418                              | "OR"
419                              | "RFL"
420                              | "SEQ"
421                              | "SFL"
422                              | "SGE"
423                              | "SGT"
424                              | "SLE"
425                              | "SLT"
426                              | "SNE"
427                              | "STR"
428                              | "SUB"
429                              | "XPD"
430                              | "DP2"
431                              | "XOR"
432
433    <TRIop_instruction>     ::= <TRIop> <opModifiers> <instResult> ","
434                                <instOperandV> "," <instOperandV> ","
435                                <instOperandV>
436
437    <TRIop>                 ::= "CMP"
438                              | "DP2A"
439                              | "LRP"
440                              | "MAD"
441                              | "SAD"
442                              | "X2D"
443
444    <SWZop_instruction>     ::= <SWZop> <opModifiers> <instResult> ","
445                                <instOperandVNS> "," <extendedSwizzle>
446
447    <SWZop>                 ::= "SWZ"
448
449    <TEXop_instruction>     ::= <TEXop> <opModifiers> <instResult> ","
450                                <instOperandV> "," <texAccess>
451
452    <TEXop>                 ::= "TEX"
453                              | "TXB"
454                              | "TXF"
455                              | "TXL"
456                              | "TXP"
457                              | "TXQ"
458
459    <TXDop_instruction>     ::= <TXDop> <opModifiers> <instResult> ","
460                                <instOperandV> "," <instOperandV> ","
461                                <instOperandV> "," <texAccess>
462
463    <TXDop>                 ::= "TXD"
464
465    <BRAop_instruction>     ::= <BRAop> <opModifiers> <instTarget>
466                                <optBranchCond>
467
468    <BRAop>                 ::= "CAL"
469
470    <FLOWCCop_instruction>  ::= <FLOWCCop> <opModifiers> <optBranchCond>
471
472    <FLOWCCop>              ::= "RET"
473                              | "BRK"
474                              | "CONT"
475
476    <IFop_instruction>      ::= <IFop> <opModifiers> <ccTest>
477
478    <IFop>                  ::= "IF"
479
480    <REPop_instruction>     ::= <REPop> <opModifiers> <instOperandV>
481                              | <REPop> <opModifiers>
482
483    <REPop>                 ::= "REP"
484
485    <ENDFLOWop_instruction> ::= <ENDFLOWop> <opModifiers>
486
487    <ENDFLOWop>             ::= "ELSE"
488                              | "ENDIF"
489                              | "ENDREP"
490
491    <opModifiers>           ::= <opModifierItem> <opModifiers>
492                              | /* empty */
493
494    <opModifierItem>        ::= "." <opModifier>
495
496    <opModifier>            ::= "F"
497                              | "U"
498                              | "S"
499                              | "CC"
500                              | "CC0"
501                              | "CC1"
502                              | "SAT"
503                              | "SSAT"
504                              | "NTC"
505                              | "S24"
506                              | "U24"
507                              | "HI"
508
509    <texAccess>             ::= <texImageUnit> "," <texTarget>
510                              | <texImageUnit> "," <texTarget> "," <texOffset>
511
512    <texImageUnit>          ::= "texture" <optArrayMemAbs>
513
514    <texTarget>             ::= "1D"
515                              | "2D"
516                              | "3D"
517                              | "CUBE"
518                              | "RECT"
519                              | "SHADOW1D"
520                              | "SHADOW2D"
521                              | "SHADOWRECT"
522                              | "ARRAY1D"
523                              | "ARRAY2D"
524                              | "SHADOWCUBE"
525                              | "SHADOWARRAY1D"
526                              | "SHADOWARRAY2D"
527
528    <texOffset>             ::= "(" <texOffsetComp> ")"
529                              | "(" <texOffsetComp> "," <texOffsetComp> ")"
530                              | "(" <texOffsetComp> "," <texOffsetComp> ","
531                                <texOffsetComp> ")"
532
533    <texOffsetComp>         ::= <optSign> <int>
534
535    <optBranchCond>         ::= /* empty */
536                              | <ccMask>
537
538    <instOperandV>          ::= <instOperandAbsV>
539                              | <instOperandBaseV>
540
541    <instOperandAbsV>       ::= <operandAbsNeg> "|" <instOperandBaseV> "|"
542
543    <instOperandBaseV>      ::= <operandNeg> <attribUseV>
544                              | <operandNeg> <tempUseV>
545                              | <operandNeg> <paramUseV>
546                              | <operandNeg> <bufferUseV>
547
548    <instOperandS>          ::= <instOperandAbsS>
549                              | <instOperandBaseS>
550
551    <instOperandAbsS>       ::= <operandAbsNeg> "|" <instOperandBaseS> "|"
552
553    <instOperandBaseS>      ::= <operandNeg> <attribUseS>
554                              | <operandNeg> <tempUseS>
555                              | <operandNeg> <paramUseS>
556                              | <operandNeg> <bufferUseS>
557
558    <instOperandVNS>        ::= <attribUseVNS>
559                              | <tempUseVNS>
560                              | <paramUseVNS>
561                              | <bufferUseVNS>
562
563    <operandAbsNeg>         ::= <optSign>
564
565    <operandNeg>            ::= <optSign>
566
567    <instResult>            ::= <instResultCC>
568                              | <instResultBase>
569
570    <instResultCC>          ::= <instResultBase> <ccMask>
571
572    <instResultBase>        ::= <tempUseW>
573                              | <resultUseW>
574
575    <namingStatement>       ::= <varMods> <ATTRIB_statement>
576                              | <varMods> <PARAM_statement>
577                              | <varMods> <TEMP_statement>
578                              | <varMods> <OUTPUT_statement>
579                              | <varMods> <BUFFER_statement>
580                              | <ALIAS_statement>
581
582    <ATTRIB_statement>      ::= "ATTRIB" <establishName> "=" <attribUseD>
583
584    <PARAM_statement>       ::= <PARAM_singleStmt>
585                              | <PARAM_multipleStmt>
586
587    <PARAM_singleStmt>      ::= "PARAM" <establishName> <paramSingleInit>
588
589    <PARAM_multipleStmt>    ::= "PARAM" <establishName> <optArraySize>
590                                <paramMultipleInit>
591
592    <paramSingleInit>       ::= "=" <paramUseDB>
593
594    <paramMultipleInit>     ::= "=" "{" <paramMultInitList> "}"
595
596    <paramMultInitList>     ::= <paramUseDM>
597                              | <paramUseDM> "," <paramMultInitList>
598
599    <TEMP_statement>        ::= "TEMP" <varNameList>
600
601    <OUTPUT_statement>      ::= "OUTPUT" <establishName> "=" <resultUseD>
602
603    <varMods>               ::= <varModifier> <varMods>
604                              | /* empty */
605
606    <varModifier>           ::= "SHORT"
607                              | "LONG"
608                              | "INT"
609                              | "UINT"
610                              | "FLOAT"
611
612    <ALIAS_statement>       ::= "ALIAS" <establishName> "=" <establishedName>
613
614    <BUFFER_statement>      ::= <bufferDeclType> <establishName>
615                                <bufferSingleInit>
616                              | <bufferDeclType> <establishName>
617                                <optArraySize> <bufferMultInit>
618
619    <bufferDeclType>        ::= "BUFFER"
620                              | "BUFFER4"
621
622    <bufferSingleInit>      ::= "=" <bufferUseDB>
623
624    <bufferMultInit>        ::= "=" "{" <bufferMultInitList> "}"
625
626    <bufferMultInitList>    ::= <bufferUseDM>
627                              | <bufferUseDM> "," <bufferMultInitList>
628
629    <varNameList>           ::= <establishName>
630                              | <establishName> "," <varNameList>
631
632    <attribUseV>            ::= <attribBasic> <swizzleSuffix>
633                              | <attribVarName> <swizzleSuffix>
634                              | <attribVarName> <arrayMem> <swizzleSuffix>
635                              | <attribColor> <swizzleSuffix>
636                              | <attribColor> "." <colorType> <swizzleSuffix>
637
638    <attribUseS>            ::= <attribBasic> <scalarSuffix>
639                              | <attribVarName> <scalarSuffix>
640                              | <attribVarName> <arrayMem> <scalarSuffix>
641                              | <attribColor> <scalarSuffix>
642                              | <attribColor> "." <colorType> <scalarSuffix>
643
644    <attribUseVNS>          ::= <attribBasic>
645                              | <attribVarName>
646                              | <attribVarName> <arrayMem>
647                              | <attribColor>
648                              | <attribColor> "." <colorType>
649
650    <attribUseD>            ::= <attribBasic>
651                              | <attribColor>
652                              | <attribColor> "." <colorType>
653                              | <attribMulti>
654
655    <paramUseV>             ::= <paramVarName> <optArrayMem> <swizzleSuffix>
656                              | <stateSingleItem> <swizzleSuffix>
657                              | <programSingleItem> <swizzleSuffix>
658                              | <constantVector> <swizzleSuffix>
659                              | <constantScalar>
660
661    <paramUseS>             ::= <paramVarName> <optArrayMem> <scalarSuffix>
662                              | <stateSingleItem> <scalarSuffix>
663                              | <programSingleItem> <scalarSuffix>
664                              | <constantVector> <scalarSuffix>
665                              | <constantScalar>
666
667    <paramUseVNS>           ::= <paramVarName> <optArrayMem>
668                              | <stateSingleItem>
669                              | <programSingleItem>
670                              | <constantVector>
671                              | <constantScalar>
672
673    <paramUseDB>            ::= <stateSingleItem>
674                              | <programSingleItem>
675                              | <constantVector>
676                              | <signedConstantScalar>
677
678    <paramUseDM>            ::= <stateMultipleItem>
679                              | <programMultipleItem>
680                              | <constantVector>
681                              | <signedConstantScalar>
682
683    <stateMultipleItem>     ::= <stateSingleItem>
684                              | "state" "." <stateMatrixRows>
685
686    <stateSingleItem>       ::= "state" "." <stateMaterialItem>
687                              | "state" "." <stateLightItem>
688                              | "state" "." <stateLightModelItem>
689                              | "state" "." <stateLightProdItem>
690                              | "state" "." <stateFogItem>
691                              | "state" "." <stateMatrixRow>
692                              | "state" "." <stateTexGenItem>
693                              | "state" "." <stateClipPlaneItem>
694                              | "state" "." <statePointItem>
695                              | "state" "." <stateTexEnvItem>
696                              | "state" "." <stateDepthItem>
697
698    <stateMaterialItem>     ::= "material" "." <stateMatProperty>
699                              | "material" "." <faceType> "."
700                                <stateMatProperty>
701
702    <stateMatProperty>      ::= "ambient"
703                              | "diffuse"
704                              | "specular"
705                              | "emission"
706                              | "shininess"
707
708    <stateLightItem>        ::= "light" <arrayMemAbs> "." <stateLightProperty>
709
710    <stateLightProperty>    ::= "ambient"
711                              | "diffuse"
712                              | "specular"
713                              | "position"
714                              | "attenuation"
715                              | "spot" "." <stateSpotProperty>
716                              | "half"
717
718    <stateSpotProperty>     ::= "direction"
719
720    <stateLightModelItem>   ::= "lightmodel" "." <stateLModProperty>
721
722    <stateLModProperty>     ::= "ambient"
723                              | "scenecolor"
724                              | <faceType> "." "scenecolor"
725
726    <stateLightProdItem>    ::= "lightprod" <arrayMemAbs> "."
727                                <stateLProdProperty>
728                              | "lightprod" <arrayMemAbs> "." <faceType> "."
729                                <stateLProdProperty>
730
731    <stateLProdProperty>    ::= "ambient"
732                              | "diffuse"
733                              | "specular"
734
735    <stateFogItem>          ::= "fog" "." <stateFogProperty>
736
737    <stateFogProperty>      ::= "color"
738                              | "params"
739
740    <stateMatrixRows>       ::= <stateMatrixItem>
741                              | <stateMatrixItem> "." <stateMatModifier>
742                              | <stateMatrixItem> "." "row" <arrayRange>
743                              | <stateMatrixItem> "." <stateMatModifier> "."
744                                "row" <arrayRange>
745
746    <stateMatrixRow>        ::= <stateMatrixItem> "." "row" <arrayMemAbs>
747                              | <stateMatrixItem> "." <stateMatModifier> "."
748                                "row" <arrayMemAbs>
749
750    <stateMatrixItem>       ::= "matrix" "." <stateMatrixName>
751
752    <stateMatModifier>      ::= "inverse"
753                              | "transpose"
754                              | "invtrans"
755
756    <stateMatrixName>       ::= "modelview" <optArrayMemAbs>
757                              | "projection"
758                              | "mvp"
759                              | "texture" <optArrayMemAbs>
760                              | "program" <arrayMemAbs>
761
762    <stateTexGenItem>       ::= "texgen" <optArrayMemAbs> "."
763                                <stateTexGenType> "." <stateTexGenCoord>
764
765    <stateTexGenType>       ::= "eye"
766                              | "object"
767
768    <stateTexGenCoord>      ::= "s"
769                              | "t"
770                              | "r"
771                              | "q"
772
773    <stateClipPlaneItem>    ::= "clip" <arrayMemAbs> "." "plane"
774
775    <statePointItem>        ::= "point" "." <statePointProperty>
776
777    <statePointProperty>    ::= "size"
778                              | "attenuation"
779
780    <stateTexEnvItem>       ::= "texenv" <optArrayMemAbs> "."
781                                <stateTexEnvProperty>
782
783    <stateTexEnvProperty>   ::= "color"
784
785    <stateDepthItem>        ::= "depth" "." <stateDepthProperty>
786
787    <stateDepthProperty>    ::= "range"
788
789    <programSingleItem>     ::= <progEnvParam>
790                              | <progLocalParam>
791
792    <programMultipleItem>   ::= <progEnvParams>
793                              | <progLocalParams>
794
795    <progEnvParams>         ::= "program" "." "env" <arrayMemAbs>
796                              | "program" "." "env" <arrayRange>
797
798    <progEnvParam>          ::= "program" "." "env" <arrayMemAbs>
799
800    <progLocalParams>       ::= "program" "." "local" <arrayMemAbs>
801                              | "program" "." "local" <arrayRange>
802
803    <progLocalParam>        ::= "program" "." "local" <arrayMemAbs>
804
805    <constantVector>        ::= "{" <constantVectorList> "}"
806
807    <constantVectorList>    ::= <signedConstantScalar>
808                              | <signedConstantScalar> ","
809                                <signedConstantScalar>
810                              | <signedConstantScalar> ","
811                                <signedConstantScalar> ","
812                                <signedConstantScalar>
813                              | <signedConstantScalar> ","
814                                <signedConstantScalar> ","
815                                <signedConstantScalar> ","
816                                <signedConstantScalar>
817
818    <signedConstantScalar>  ::= <optSign> <constantScalar>
819
820    <constantScalar>        ::= <floatConstant>
821                              | <intConstant>
822
823    <floatConstant>         ::= <float>
824
825    <intConstant>           ::= <int>
826
827    <tempUseV>              ::= <tempVarName> <swizzleSuffix>
828
829    <tempUseS>              ::= <tempVarName> <scalarSuffix>
830
831    <tempUseVNS>            ::= <tempVarName>
832
833    <tempUseW>              ::= <tempVarName> <optWriteMask>
834
835    <resultUseW>            ::= <resultBasic> <optWriteMask>
836                              | <resultVarName> <optWriteMask>
837
838    <resultUseD>            ::= <resultBasic>
839
840    <bufferUseV>            ::= <bufferVarName> <optArrayMem> <swizzleSuffix>
841
842    <bufferUseS>            ::= <bufferVarName> <optArrayMem> <scalarSuffix>
843
844    <bufferUseVNS>          ::= <bufferVarName> <optArrayMem>
845
846    <bufferUseDB>           ::= <bufferBinding> <arrayMemAbs>
847
848    <bufferUseDM>           ::= <bufferBinding> <arrayMemAbs>
849                              | <bufferBinding> <arrayRange>
850                              | <bufferBinding>
851
852    <bufferBinding>         ::= "program" "." "buffer" <arrayMemAbs>
853
854    <optArraySize>          ::= "[" "]"
855                              | "[" <int> "]"
856
857    <optArrayMem>           ::= /* empty */
858                              | <arrayMem>
859
860    <arrayMem>              ::= <arrayMemAbs>
861                              | <arrayMemRel>
862
863    <optArrayMemAbs>        ::= /* empty */
864                              | <arrayMemAbs>
865
866    <arrayMemAbs>           ::= "[" <int> "]"
867
868    <arrayMemRel>           ::= "[" <arrayMemReg> <arrayMemOffset> "]"
869
870    <arrayMemReg>           ::= <addrUseS>
871
872    <arrayMemOffset>        ::= /* empty */
873                              | "+" <int>
874                              | "-" <int>
875
876    <arrayRange>            ::= "[" <int> ".." <int> "]"
877
878    <addrUseS>              ::= <addrVarName> <scalarSuffix>
879
880    <ccMask>                ::= "(" <ccTest> ")"
881
882    <ccTest>                ::= <ccMaskRule> <swizzleSuffix>
883
884    <ccMaskRule>            ::= "EQ"
885                              | "GE"
886                              | "GT"
887                              | "LE"
888                              | "LT"
889                              | "NE"
890                              | "TR"
891                              | "FL"
892                              | "EQ0"
893                              | "GE0"
894                              | "GT0"
895                              | "LE0"
896                              | "LT0"
897                              | "NE0"
898                              | "TR0"
899                              | "FL0"
900                              | "EQ1"
901                              | "GE1"
902                              | "GT1"
903                              | "LE1"
904                              | "LT1"
905                              | "NE1"
906                              | "TR1"
907                              | "FL1"
908                              | "NAN"
909                              | "NAN0"
910                              | "NAN1"
911                              | "LEG"
912                              | "LEG0"
913                              | "LEG1"
914                              | "CF"
915                              | "CF0"
916                              | "CF1"
917                              | "NCF"
918                              | "NCF0"
919                              | "NCF1"
920                              | "OF"
921                              | "OF0"
922                              | "OF1"
923                              | "NOF"
924                              | "NOF0"
925                              | "NOF1"
926                              | "AB"
927                              | "AB0"
928                              | "AB1"
929                              | "BLE"
930                              | "BLE0"
931                              | "BLE1"
932                              | "SF"
933                              | "SF0"
934                              | "SF1"
935                              | "NSF"
936                              | "NSF0"
937                              | "NSF1"
938
939    <optWriteMask>          ::= /* empty */
940                              | <xyzwMask>
941                              | <rgbaMask>
942
943    <xyzwMask>              ::= "." "x"
944                              | "." "y"
945                              | "." "xy"
946                              | "." "z"
947                              | "." "xz"
948                              | "." "yz"
949                              | "." "xyz"
950                              | "." "w"
951                              | "." "xw"
952                              | "." "yw"
953                              | "." "xyw"
954                              | "." "zw"
955                              | "." "xzw"
956                              | "." "yzw"
957                              | "." "xyzw"
958
959    <rgbaMask>              ::= "." "r"
960                              | "." "g"
961                              | "." "rg"
962                              | "." "b"
963                              | "." "rb"
964                              | "." "gb"
965                              | "." "rgb"
966                              | "." "a"
967                              | "." "ra"
968                              | "." "ga"
969                              | "." "rga"
970                              | "." "ba"
971                              | "." "rba"
972                              | "." "gba"
973                              | "." "rgba"
974
975    <swizzleSuffix>         ::= /* empty */
976                              | "." <component>
977                              | "." <xyzwSwizzle>
978                              | "." <rgbaSwizzle>
979
980    <extendedSwizzle>       ::= <extSwizComp> "," <extSwizComp> ","
981                                <extSwizComp> "," <extSwizComp>
982
983    <extSwizComp>           ::= <optSign> <xyzwExtSwizSel>
984                              | <optSign> <rgbaExtSwizSel>
985
986    <xyzwExtSwizSel>        ::= "0"
987                              | "1"
988                              | <xyzwComponent>
989
990    <rgbaExtSwizSel>        ::= <rgbaComponent>
991
992    <scalarSuffix>          ::= "." <component>
993
994    <component>             ::= <xyzwComponent>
995                              | <rgbaComponent>
996
997    <xyzwComponent>         ::= "x"
998                              | "y"
999                              | "z"
1000                              | "w"
1001
1002    <rgbaComponent>         ::= "r"
1003                              | "g"
1004                              | "b"
1005                              | "a"
1006
1007    <optSign>               ::= /* empty */
1008                              | "-"
1009                              | "+"
1010
1011    <faceType>              ::= "front"
1012                              | "back"
1013
1014    <colorType>             ::= "primary"
1015                              | "secondary"
1016
1017    <instLabel>             ::= <identifier>
1018
1019    <instTarget>            ::= <identifier>
1020
1021    <establishedName>       ::= <identifier>
1022
1023    <establishName>         ::= <identifier>
1024
1025
1026    The <int> rule matches an integer constant.  The integer consists of a
1027    sequence of one or more digits ("0" through "9"), or a sequence in
1028    hexadecimal form beginning with "0x" followed by a sequence of one or more
1029    hexadecimal digits ("0" through "9", "a" through "f", "A" through "F").
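
    As an informal illustration, and not as part of the grammar, the <int>
    rule can be approximated by a regular expression.  The Python sketch
    below assumes that only a lowercase "0x" prefix is accepted, as the rule
    is written; the names INT_RE and parse_int are hypothetical.

```python
import re

# Hypothetical helper: "0x" followed by hex digits, or plain decimal digits.
# The hexadecimal alternative is listed first so that "0x1F" is not read
# as the decimal constant "0" followed by stray characters.
INT_RE = re.compile(r"0x[0-9a-fA-F]+|[0-9]+")


def parse_int(text):
    """Return the value of `text` if it matches <int>, otherwise None."""
    if INT_RE.fullmatch(text) is None:
        return None
    return int(text, 16) if text.startswith("0x") else int(text, 10)
```

    Under these assumptions, parse_int("0x1F") yields 31, while
    parse_int("0x") and parse_int("-1") yield None; a leading sign is
    handled by <optSign>, not by <int>.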
1030
1031    The <float> rule matches a floating-point constant consisting of an
1032    integer part, a decimal point, a fraction part, an "e" or "E", and an
1033    optionally signed integer exponent.  The integer and fraction parts both
1034    consist of a sequence of one or more digits ("0" through "9").  Either the
1035    integer part or the fraction part (not both) may be missing; either the
1036    decimal point or the "e" (or "E") and the exponent (not both) may be
1037    missing.  Most grammar rules that allow floating-point values also allow
1038    integers matching the <int> rule.
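
    The <float> rule can be sketched the same way.  The Python pattern below
    is one reading of the paragraph above, not a normative definition:  it
    assumes that a constant with a decimal point needs digits on at least one
    side of the point, and that a constant without a decimal point needs both
    an integer part and an exponent.  FLOAT_RE and is_float_constant are
    hypothetical names.

```python
import re

# First alternative: a decimal point with digits on at least one side,
# followed by an optional exponent.
# Second alternative: no decimal point, so the exponent is mandatory.
FLOAT_RE = re.compile(
    r"(?:[0-9]+\.[0-9]*|\.[0-9]+)(?:[eE][+-]?[0-9]+)?"
    r"|[0-9]+[eE][+-]?[0-9]+"
)


def is_float_constant(text):
    """Return True if `text` matches this reading of the <float> rule."""
    return FLOAT_RE.fullmatch(text) is not None
```

    With this reading, "1.", ".5", "1e5", and "1.5E-3" are accepted, while
    "1" is rejected here because it matches <int> instead, and "." is
    rejected because both the integer and fraction parts are missing.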
1039
1040    The <identifier> rule matches a sequence of one or more letters ("A"
1041    through "Z", "a" through "z"), digits ("0" through "9"), underscores ("_"),
1042    or dollar signs ("$"); the first character must not be a digit.  Upper
1043    and lower case letters are considered different (names are
1044    case-sensitive).  The following strings are reserved keywords and may not
1045    be used as identifiers:  "fragment" (for fragment programs only), "vertex"
1046    (for vertex and geometry programs), "primitive" (for fragment and geometry
1047    programs), "program", "result", "state", and "texture".
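
    The identifier rule and the reserved keyword list can be checked as in
    the following Python sketch.  The program type names "vertex",
    "geometry", and "fragment" used as dictionary keys, and the function
    is_valid_identifier, are assumptions made for illustration only.

```python
import re

# Letters, digits, "_" and "$"; the first character may not be a digit.
IDENT_RE = re.compile(r"[A-Za-z_$][A-Za-z0-9_$]*")

# Keywords reserved for every program type.
COMMON_RESERVED = {"program", "result", "state", "texture"}

# Additional keywords reserved per program type, as listed in the text.
RESERVED_BY_TYPE = {
    "vertex": {"vertex"},
    "geometry": {"vertex", "primitive"},
    "fragment": {"fragment", "primitive"},
}


def is_valid_identifier(name, program_type):
    """Return True if `name` may be used as an identifier in a program
    of the given type.  Comparison is case-sensitive."""
    if IDENT_RE.fullmatch(name) is None:
        return False
    reserved = COMMON_RESERVED | RESERVED_BY_TYPE[program_type]
    return name not in reserved
```

    Because names are case-sensitive, "State" is accepted where "state" is
    not, and "vertex" is accepted as an identifier in a fragment program
    but rejected in vertex and geometry programs.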
1048
1049    The <tempVarName>, <paramVarName>, <attribVarName>, <resultVarName>, and
1050    <bufferVarName> rules match identifiers that have been previously established
1051    as names of temporary, program parameter, attribute, result, and program
1052    parameter buffer variables, respectively.
1053
1054    The <xyzwSwizzle> and <rgbaSwizzle> rules match any 4-character strings
1055    consisting only of the characters "x", "y", "z", and "w" (<xyzwSwizzle>)
1056    or "r", "g", "b", "a" (<rgbaSwizzle>).
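
    The two swizzle rules do not allow mixing the "xyzw" and "rgba" naming
    schemes within one suffix.  The Python sketch below illustrates this;
    swizzle_indices is a hypothetical helper that maps a swizzle string to
    component indices 0 through 3.

```python
import re

XYZW_SWIZZLE_RE = re.compile(r"[xyzw]{4}")
RGBA_SWIZZLE_RE = re.compile(r"[rgba]{4}")


def swizzle_indices(swizzle):
    """Return the four selected component indices (0..3) for a
    4-character swizzle, or None if it matches neither rule."""
    for names, pattern in (("xyzw", XYZW_SWIZZLE_RE),
                           ("rgba", RGBA_SWIZZLE_RE)):
        if pattern.fullmatch(swizzle):
            return [names.index(ch) for ch in swizzle]
    return None
```

    Here swizzle_indices("wzyx") gives [3, 2, 1, 0] and
    swizzle_indices("rrra") gives [0, 0, 0, 3], while "xyba" and "xyz"
    give None.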
1057
1058    The error INVALID_OPERATION is generated if a program fails to load
1059    because it is not syntactically correct or because it violates one of the
1060    semantic restrictions described in the following sections.
1061
1062    A successfully loaded program is parsed into a sequence of instructions.
1063    Each instruction is identified by its tokenized name.  The operation of
1064    these instructions when executed is defined in section 2.X.4.  A
1065    successfully loaded program string replaces the program string previously
1066    loaded into the specified program object.  If the OUT_OF_MEMORY error is
1067    generated by ProgramStringARB, no change is made to the previous contents
1068    of the current program object.
1069
1070
1071    Section 2.X.3, Program Variables
1072
1073    Programs may operate on a number of different variables during their
1074    execution.  The following sections define the different classes of
1075    variables that can be declared and used by a program.
1076
1077    Some variable classes require variable bindings.  Variable classes with
1078    bindings refer to state that is either generated or consumed outside the
1079    program.  Examples of variable bindings include a vertex's normal, the
1080    position of a vertex computed by a vertex program, an interpolated texture
1081    coordinate, and the diffuse color of light 1.  Variables that are used
1082    only during program execution do not have bindings.
1083
1084    Variables may be declared explicitly according to the <namingStatement>
1085    grammar rule.  Explicit variable declarations allow a program to establish
1086    a variable name that can be used to refer to a specified resource in
1087    subsequent instructions.  Variables may be declared anywhere in the
1088    program string, but must be declared prior to use.  A program will fail to
1089    load if it declares the same variable name more than once, or if it refers
1090    to a variable name that has not been previously declared in the program
1091    string.
1092
1093    Variables may also be declared implicitly, simply by using a variable
1094    binding as an operand in a program instruction.  Such uses are considered
1095    to automatically create a nameless variable using the specified binding.
1096    Only variables from classes with bindings can be declared implicitly.
1097
1098
1099    Section 2.X.3.1, Program Variable Types
1100
1101    Explicit variable declarations may include one or more modifiers that
1102    specify additional information about the variable, such as the size and
1103    data type of the components of the variable.  Variable modifiers are
1104    specified according to the <varModifier> grammar rule.
1105
1106    By default, variables are considered typeless.  They can be used in
1107    instructions that read or write the variable as floating-point values,
1108    signed integers, or unsigned integers.  If a variable is written using one
1109    data type but then read using a different one, the results of the
1110    operation are undefined.  Variables with bindings are considered to be
1111    read or written when their values are produced or consumed; the data type
1112    used by the GL is specified in the description of each binding.
1113
1114    Explicitly declared variables may optionally have one data type modifier,
1115    which can be used to detect data type mismatch errors.  Type modifiers of
1116    "INT", "UINT", and "FLOAT" indicate that the components of the variable
1117    are stored as signed integers, unsigned integers, or floating-point
1118    values, respectively.  A program will fail to load if it attempts to read
1119    or write a variable using a data type other than the one indicated by the
1120    data type modifier.  Variables without a data type modifier can be read or
1121    written using any data type.
1122
1123    Explicitly declared variables may optionally have one storage size
1124    modifier.  Variables declared as "SHORT" will be represented using at least
1125    16 bits per component.  "SHORT" floating-point values will have at least 5
1126    bits of exponent and 10 bits of mantissa.  Variables declared as "LONG"
1127    will be represented with at least 32 bits per component.  "LONG"
1128    floating-point values will have at least 8 bits of exponent and 23 bits of
1129    mantissa.  If no size modifier is provided, the GL will automatically
1130    select component sizes.  Implementations are not required to support more
1131    than one component size, so "SHORT", "LONG", and the default could all
1132    refer to the same component size.  The "LONG" modifier is supported only
1133    for declarations of temporary variables ("TEMP").  The "SHORT" modifier is
1134    supported only for declarations of temporary variables and result
1135    variables ("OUTPUT").
1136
1137    Each variable declaration can include at most one data type and one
1138    storage size modifier.  A program will fail to load if it specifies
1139    multiple data type or multiple storage size modifiers in a single variable
1140    declaration.
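
    The modifier restrictions above can be summarized in a short Python
    sketch.  It assumes that the modifiers of a declaration have already
    been split into a list of strings, that repeating the same modifier
    counts as specifying it multiple times, and that using a size modifier
    on an unsupported declaration type causes a load failure (the text says
    only that the modifier is "supported only" for certain declarations).
    The function name check_modifiers is hypothetical.

```python
TYPE_MODIFIERS = {"INT", "UINT", "FLOAT"}
SIZE_MODIFIERS = {"SHORT", "LONG"}


def check_modifiers(modifiers, decl_kind):
    """Return None if the modifier list is acceptable for a declaration
    of kind `decl_kind` (for example "TEMP" or "OUTPUT"), otherwise a
    short description of the problem."""
    types = [m for m in modifiers if m in TYPE_MODIFIERS]
    sizes = [m for m in modifiers if m in SIZE_MODIFIERS]
    if len(types) > 1:
        return "multiple data type modifiers"
    if len(sizes) > 1:
        return "multiple storage size modifiers"
    # Assumption: an unsupported size modifier is treated as an error.
    if "LONG" in sizes and decl_kind != "TEMP":
        return "LONG is supported only for TEMP declarations"
    if "SHORT" in sizes and decl_kind not in ("TEMP", "OUTPUT"):
        return "SHORT is supported only for TEMP and OUTPUT declarations"
    return None
```

    Modifiers outside these two groups, such as the fragment program
    modifiers mentioned in the note below, are ignored by this sketch.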
1141
1142    (NOTE:  Fragment programs also support the modifiers "FLAT", "CENTROID",
1143    and "NOPERSPECTIVE", which control how per-fragment attribute values are
1144    produced.  These modifiers are described in detail in the
1145    NV_fragment_program4 specification.)
1146
1147    Explicitly declared variables of all types may be declared as arrays.  An
1148    array variable has one or more members, numbered 0 through <n>-1, where
1149    <n> is the number of entries in the array.  The total number of entries in
1150    the array can be declared using the <optArraySize> grammar rule.  For
1151    variable classes without bindings, an array size must be specified in the
1152    program, and must be a positive integer.  For variable classes with
1153    bindings, a declared size is optional, and is taken from the number of
1154    bindings assigned in the declaration if omitted.  A program will fail to
1155    load if the declared size of an array variable does not match the number
1156    of assigned bindings.
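
    The array sizing rules can be expressed as a small Python function.
    This is a sketch only:  resolve_array_size is a hypothetical name, a
    declared size of None stands for the empty "[]" form, and a ValueError
    stands for the program failing to load.

```python
def resolve_array_size(declared_size, num_bindings, has_bindings):
    """Return the number of entries in an array variable.

    declared_size -- integer from "[n]", or None for "[]"
    num_bindings  -- number of bindings assigned in the declaration
    has_bindings  -- True if the variable class has bindings
    """
    if not has_bindings:
        # No bindings: an explicit, positive size is required.
        if declared_size is None or declared_size <= 0:
            raise ValueError("array size must be a positive integer")
        return declared_size
    if declared_size is None:
        # Size omitted: taken from the number of assigned bindings.
        return num_bindings
    if declared_size != num_bindings:
        raise ValueError("declared size does not match number of bindings")
    return declared_size
```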
1157
1158    When a variable is declared as an array, instructions that use the
1159    variable must specify an array member to access according to the
1160    <arrayMem> grammar rule.  A program will fail to load if it contains an
1161    instruction that accesses an array variable without specifying an array
1162    member or an instruction that specifies an array member for a non-array
1163    variable.
1164
1165
1166    Section 2.X.3.2, Program Attribute Variables
1167
1168    Program attribute variables represent per-vertex or per-fragment inputs to
1169    the program.  All attribute variables have associated bindings, and are
1170    read-only during program execution.  Attribute variables may be declared
1171    explicitly via the <ATTRIB_statement> grammar rule, or implicitly by using
1172    an attribute binding in an instruction.
1173
1174    The set of available attribute bindings depends on the program type, and
1175    is enumerated in the specifications for each program type.
1176
1177    The set of bindings allowed for attribute array variables is limited to
1178    attribute state grouped in arrays (e.g., texture coordinates, generic
1179    vertex attributes).  Additionally, all bindings assigned to the array must
1180    be of the same binding type and must increase consecutively.  Examples of
1181    valid and invalid binding lists include:
1182
1183      vertex.attrib[1], vertex.attrib[2]      # valid, 2-entry array
1184      vertex.texcoord[0..3]                   # valid, 4-entry array
1185      vertex.attrib[1], vertex.attrib[3]      # invalid, skipped attrib 2
1186      vertex.attrib[2], vertex.attrib[1]      # invalid, wrong order
1187      vertex.attrib[1], vertex.texcoord[2]    # invalid, different types
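
    The "same type, consecutive, increasing" requirement in these examples
    can be checked as in the following Python sketch.  It assumes that each
    binding has already been reduced to a (binding type, index) pair and
    that ranges such as [0..3] have been expanded; it does not check
    whether the binding type is one that is grouped in arrays.  The name
    is_valid_attrib_array is hypothetical.

```python
def is_valid_attrib_array(bindings):
    """Return True if `bindings`, a list of (binding_type, index) pairs,
    all share one binding type and have consecutive, increasing indices."""
    if not bindings:
        return False
    first_type, first_index = bindings[0]
    for offset, (binding_type, index) in enumerate(bindings):
        if binding_type != first_type or index != first_index + offset:
            return False
    return True
```

    Applied to the lists above, the first two are accepted and the last
    three are rejected.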
1188
1189    Additionally, attribute bindings may be used in no more than one array
1190    variable accessed with relative addressing.
1191
1192    Implementations may have a limit on the total number of attribute binding
1193    components used by each program target (MAX_PROGRAM_ATTRIB_COMPONENTS_NV).
1194    Programs that use more attribute binding components than this limit will
1195    fail to load.  The method of counting used attribute binding components is
1196    implementation-dependent, but must satisfy the following properties:
1197
1198      * If an attribute binding is not referenced in a program, or is
1199        referenced only in declarations of attribute variables that are not
1200        used, none of its components are counted.
1201
1202      * An attribute binding component may be counted as used only if there
1203        exists an instruction operand where
1204
1205          - the component is enabled for read by the swizzle pattern (Section
1206            2.X.4.2), and
1207
1208          - the attribute binding is
1209
1210              - referenced directly by the operand,
1211
1212              - bound to a declared variable referenced by the operand, or
1213
1214              - bound to a declared array variable where another binding in
1215                the array satisfies one of the two previous conditions.
1216
1217        Implementations are not required to optimize out unused elements of an
1218        attribute array or components that are used in only some elements of
1219        an array.  The last of these rules is intended to cover the case where
1220        the same attribute binding is used in multiple variables.
1221
1222        For example, an operand whose swizzle pattern selects only the x
1223        component may result in the x component of an attribute binding being
1224        counted, but may never result in the counting of the y, z, or w
1225        components of any attribute binding.
1226
1227      * Implementations are not required to determine that components read by
1228        an instruction are actually unused due to:
1229
1230          - instruction write masks (for example, a component-wise ADD
1231            operation that only writes the "x" component doesn't have to read
1232            the "y", "z", and "w" components of its operands) or
1233
1234          - any other properties of the instruction (for example, the DP3
1235            instruction, which computes a 3-component dot product, doesn't
1236            have to read the "w" component of its operands).
1237
1238
1239    Section 2.X.3.3, Program Parameters
1240
1241    Program parameter variables are used as constants during program
1242    execution.  All program parameter variables have associated bindings and
1243    are read-only during program execution.  Program parameters retain their
1244    values across program invocations, although their values may change
1245    between invocations due to GL state changes.  Program parameter variables
1246    may be declared explicitly via the <PARAM_statement> grammar rule, or
1247    implicitly by using a parameter binding in an instruction.  Except where
1248    otherwise specified, program parameter bindings always specify
1249    floating-point values.
1250
1251    When declaring program parameter array variables, all bindings are
1252    supported and can be assigned to array members in any order.  The only
1253    restriction is that no parameter binding may be used more than once in
1254    array variables accessed using relative addressing.  A program will fail
1255    to load if any program parameter binding is used more than once in a
1256    single array accessed using relative addressing or used at least once in
1257    two or more arrays accessed using relative addressing.
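
    Both failure cases reduce to one check:  no binding may appear more
    than once across all arrays that are accessed with relative addressing.
    The Python sketch below assumes each such array is given as a list of
    binding strings; find_reused_bindings is a hypothetical helper.

```python
from collections import Counter


def find_reused_bindings(relative_arrays):
    """Given the binding lists of all parameter arrays accessed with
    relative addressing, return the bindings used more than once,
    whether within one array or across several."""
    counts = Counter(
        binding for array in relative_arrays for binding in array
    )
    return sorted(b for b, n in counts.items() if n > 1)
```

    A non-empty result corresponds to a program that would fail to load.
    Arrays that are accessed only with absolute addressing are not passed
    to this function and are not restricted by this rule.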
1258
1259
1260    Constant Bindings
1261
1262    If a program parameter binding matches the <constantScalar> or
1263    <signedConstantScalar> grammar rules, the corresponding program parameter
1264    variable is bound to the vector (X,X,X,X), where X is the value of the
1265    specified constant.
1266
1267    If a program parameter binding matches <constantVector>, the corresponding
1268    program parameter variable is bound to the vector (X,Y,Z,W), where X, Y,
1269    Z, and W are the values corresponding to the first, second, third, and
1270    fourth match of <signedConstantScalar>.  If fewer than four constants are
1271    specified, Y, Z, and W assume the values 0, 0, and 1, if their respective
1272    constants are not specified.
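
    The expansion of scalar and vector constants can be written out as
    follows.  This Python sketch uses hypothetical function names and
    treats the constants as already-parsed numbers.

```python
def constant_scalar_binding(x):
    """A scalar constant X is bound to the vector (X, X, X, X)."""
    return (x, x, x, x)


def constant_vector_binding(values):
    """A vector constant with one to four entries is bound to
    (X, Y, Z, W), with missing Y, Z, W filled in as 0, 0, 1."""
    if not 1 <= len(values) <= 4:
        raise ValueError("a constant vector has one to four components")
    defaults = (0, 0, 0, 1)
    return tuple(values) + defaults[len(values):]
```

    For example, the constant {2.0} is bound to (2.0, 0, 0, 1), whereas the
    scalar constant 2.0 is bound to (2.0, 2.0, 2.0, 2.0).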
1273
1274    Constant bindings can be interpreted as having signed integer, unsigned
1275    integer, or floating-point values, depending on how they are used in the
1276    program text.  For constants in variable declarations, the components of
1277    the constant are interpreted according to the variable's component data
1278    type modifier.  If no data type modifier is specified in a declaration,
1279    constants are interpreted as floating-point values.  For constant bindings
1280    used directly in an instruction, the components of the constant are
1281    interpreted according to the required data type of the operand.  A program
1282    will fail to load if it specifies a floating-point constant value
1283    (matching the <floatConstant> grammar rule) that should be interpreted as
1284    a signed or unsigned integer, or a negative integer constant value that
1285    should be interpreted as an unsigned integer.
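
    The two load failures described above can be sketched as a single
    predicate.  In this Python illustration the caller is assumed to know
    whether the constant matched <floatConstant> and whether it carried a
    "-" sign; the function name and the data type strings are hypothetical.

```python
def constant_is_allowed(is_float_constant, is_negative, data_type):
    """Return True if a constant may be interpreted as `data_type`
    ("FLOAT", "INT", or "UINT") without causing a load failure."""
    if data_type in ("INT", "UINT") and is_float_constant:
        # Floating-point constants may not be interpreted as integers.
        return False
    if data_type == "UINT" and is_negative:
        # Negative integer constants may not be interpreted as unsigned.
        return False
    return True
```

    Integer constants interpreted as floating-point values are always
    accepted by this check, consistent with the earlier statement that most
    rules allowing floating-point values also allow integers.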
1286
1287    If the value used to specify a floating-point constant cannot be exactly
1288    represented, the nearest floating-point value will be used.  If the value
1289    used to specify an integer constant is too large to be represented, the
1290    program will fail to load.
1291
1292
1293    Program Environment/Local Parameter Bindings
1294
1295      Binding                    Components  Underlying State
1296      -------------------------  ----------  -------------------------------
1297      program.env[a]             (x,y,z,w)   program environment parameter a
1298      program.local[a]           (x,y,z,w)   program local parameter a
1299      program.env[a..b]          (x,y,z,w)   program environment parameters
1300                                             a through b
1301      program.local[a..b]        (x,y,z,w)   program local parameters
1302                                             a through b
1303
1304      Table X.1:  Program Environment/Local Parameter Bindings.  <a> and <b>
1305      indicate parameter numbers, where <a> must be less than or equal to <b>.
1306
1307    If a program parameter binding matches "program.env[a]" or
1308    "program.local[a]", the four components of the program parameter variable
1309    are filled with the four components of program environment parameter <a>
1310    or program local parameter <a> respectively.
1311
1312    Additionally, for program parameter array bindings, "program.env[a..b]"
1313    and "program.local[a..b]" are equivalent to specifying program environment
1314    or local parameters <a> through <b> in order, respectively.  A program
1315    using any of these bindings will fail to load if <a> is greater than <b>.
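
    The equivalence between a range binding and an ordered list of single
    bindings can be illustrated as follows.  In this Python sketch,
    expand_param_range is a hypothetical helper and a ValueError stands
    for the program failing to load.

```python
def expand_param_range(kind, a, b):
    """Expand "program.<kind>[a..b]" into the equivalent ordered list of
    single-parameter bindings.  `kind` is "env" or "local"."""
    if kind not in ("env", "local"):
        raise ValueError("kind must be 'env' or 'local'")
    if a > b:
        raise ValueError("range start must not exceed range end")
    return [f"program.{kind}[{i}]" for i in range(a, b + 1)]
```

    For example, expand_param_range("env", 2, 4) produces the bindings
    program.env[2], program.env[3], and program.env[4], in that order.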
1316
1317    Program environment and local parameters are typeless, and may be
1318    specified as signed integer, unsigned integer, or floating-point
1319    variables.  If a program environment parameter is read using a data type
1320    other than the one used to specify it, an undefined value is returned.
1321
1322
1323    Material Property Bindings
1324
1325      Binding                        Components  Underlying State
1326      -----------------------------  ----------  ----------------------------
1327      state.material.ambient         (r,g,b,a)   front ambient material color
1328      state.material.diffuse         (r,g,b,a)   front diffuse material color
1329      state.material.specular        (r,g,b,a)   front specular material color
1330      state.material.emission        (r,g,b,a)   front emissive material color
1331      state.material.shininess       (s,0,0,1)   front material shininess
1332      state.material.front.ambient   (r,g,b,a)   front ambient material color
1333      state.material.front.diffuse   (r,g,b,a)   front diffuse material color
1334      state.material.front.specular  (r,g,b,a)   front specular material color
1335      state.material.front.emission  (r,g,b,a)   front emissive material color
1336      state.material.front.shininess (s,0,0,1)   front material shininess
1337      state.material.back.ambient    (r,g,b,a)   back ambient material color
1338      state.material.back.diffuse    (r,g,b,a)   back diffuse material color
1339      state.material.back.specular   (r,g,b,a)   back specular material color
1340      state.material.back.emission   (r,g,b,a)   back emissive material color
1341      state.material.back.shininess  (s,0,0,1)   back material shininess
1342
1343      Table X.3:  Material Property Bindings.  If a material face is not
1344      specified in the binding, the front property is used.
1345
1346    If a program parameter binding matches any of the material properties
1347    listed in Table X.3, the program parameter variable is filled according to
1348    the table.  For ambient, diffuse, specular, or emissive colors, the "x",
1349    "y", "z", and "w" components are filled with the "r", "g", "b", and "a"
1350    components, respectively, of the corresponding material color.  For
1351    material shininess, the "x" component is filled with the material's
1352    specular exponent, and the "y", "z", and "w" components are filled with
1353    the floating-point constants 0, 0, and 1, respectively.  Bindings
1354    containing ".back" refer to the back material; all other bindings refer to
1355    the front material.
1356
1357    Material properties can be changed inside a Begin/End pair, either
1358    directly by calling Material, or indirectly through color material.
1359    However, such property changes are not guaranteed to update program
1360    parameter bindings until the following End command.  Program parameter
1361    variables bound to material properties changed inside a Begin/End pair are
1362    undefined until the following End command.
1363
1364
1365    Light Property Bindings
1366
1367      Binding                        Components  Underlying State
1368      -----------------------------  ----------  ----------------------------
1369      state.light[n].ambient         (r,g,b,a)   light n ambient color
1370      state.light[n].diffuse         (r,g,b,a)   light n diffuse color
1371      state.light[n].specular        (r,g,b,a)   light n specular color
1372      state.light[n].position        (x,y,z,w)   light n position
1373      state.light[n].attenuation     (a,b,c,e)   light n attenuation constants
1374                                                 and spot light exponent
1375      state.light[n].spot.direction  (x,y,z,c)   light n spot direction and
1376                                                 cutoff angle cosine
1377      state.light[n].half            (x,y,z,1)   light n infinite half-angle
1378      state.lightmodel.ambient       (r,g,b,a)   light model ambient color
1379      state.lightmodel.scenecolor    (r,g,b,a)   light model front scene color
1380      state.lightmodel.              (r,g,b,a)   light model front scene color
1381               front.scenecolor
1382      state.lightmodel.              (r,g,b,a)   light model back scene color
1383               back.scenecolor
1384      state.lightprod[n].ambient     (r,g,b,a)   light n / front material
1385                                                 ambient color product
1386      state.lightprod[n].diffuse     (r,g,b,a)   light n / front material
1387                                                 diffuse color product
1388      state.lightprod[n].specular    (r,g,b,a)   light n / front material
1389                                                 specular color product
1390      state.lightprod[n].            (r,g,b,a)   light n / front material
1391              front.ambient                      ambient color product
1392      state.lightprod[n].            (r,g,b,a)   light n / front material
1393              front.diffuse                      diffuse color product
1394      state.lightprod[n].            (r,g,b,a)   light n / front material
1395              front.specular                     specular color product
1396      state.lightprod[n].            (r,g,b,a)   light n / back material
1397              back.ambient                       ambient color product
1398      state.lightprod[n].            (r,g,b,a)   light n / back material
1399              back.diffuse                       diffuse color product
1400      state.lightprod[n].            (r,g,b,a)   light n / back material
1401              back.specular                      specular color product
1402
1403      Table X.4: Light Property Bindings.  <n> indicates a light number.
1404
1405    If a program parameter binding matches "state.light[n].ambient",
1406    "state.light[n].diffuse", or "state.light[n].specular", the "x", "y", "z",
1407    and "w" components of the program parameter variable are filled with the
1408    "r", "g", "b", and "a" components, respectively, of the corresponding
1409    light color.
1410
1411    If a program parameter binding matches "state.light[n].position", the "x",
1412    "y", "z", and "w" components of the program parameter variable are filled
1413    with the "x", "y", "z", and "w" components, respectively, of the light
1414    position.
1415
1416    If a program parameter binding matches "state.light[n].attenuation", the
1417    "x", "y", and "z" components of the program parameter variable are filled
1418    with the constant, linear, and quadratic attenuation parameters of the
1419    specified light, respectively (section 2.13.1).  The "w" component of the
1420    program parameter variable is filled with the spot light exponent of the
1421    specified light.
1422
1423    If a program parameter binding matches "state.light[n].spot.direction",
1424    the "x", "y", and "z" components of the program parameter variable are
1425    filled with the "x", "y", and "z" components of the spot light direction
1426    of the specified light, respectively (section 2.13.1).  The "w" component
1427    of the program parameter variable is filled with the cosine of the spot
1428    light cutoff angle of the specified light.
1429
1430    If a program parameter binding matches "state.light[n].half", the "x",
1431    "y", and "z" components of the program parameter variable are filled with
1432    the x, y, and z components, respectively, of the normalized infinite
1433    half-angle vector
1434
1435      h_inf = || P + (0, 0, 1) ||.
1436
1437    The "w" component is filled with 1.0.  In the computation of h_inf, P
1438    consists of the x, y, and z coordinates of the normalized vector from the
1439    eye position P_e to the eye-space light position P_pli (section 2.13.1).
1440    h_inf is defined to correspond to the normalized half-angle vector when
1441    using an infinite light (w coordinate of the position is zero) and an
1442    infinite viewer (v_bs is FALSE).  For local lights or a local viewer,
1443    h_inf is well-defined but does not match the normalized half-angle vector,
1444    which will vary depending on the vertex position.
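
    As an illustration only, the h_inf computation can be sketched in Python.
    The function names "normalize" and "infinite_half_angle" are hypothetical.
    The sketch reads the "|| v ||" notation in the formula as "normalize v" and
    assumes an infinite light, so that the direction toward the light is the
    xyz part of its eye-space position.

```python
import math

def normalize(v):
    """Return v scaled to unit length (v must be non-zero)."""
    length = math.sqrt(sum(c * c for c in v))
    return tuple(c / length for c in v)

def infinite_half_angle(light_dir):
    """Model the (x, y, z, w) value of state.light[n].half.

    light_dir is the eye-space direction toward the light (assumed here to
    be the xyz part of an infinite light's position).
    """
    p = normalize(light_dir)
    # Add the infinite-viewer direction (0, 0, 1), then renormalize.
    # A light direction of exactly (0, 0, -1) would give a zero vector;
    # this sketch does not handle that case.
    hx, hy, hz = normalize((p[0], p[1], p[2] + 1.0))
    return (hx, hy, hz, 1.0)
```

    For a light directly along +z, the half-angle vector is +z itself; for a
    light along +x, it lies halfway between +x and +z.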
1445
1446    If a program parameter binding matches "state.lightmodel.ambient", the
1447    "x", "y", "z", and "w" components of the program parameter variable are
1448    filled with the "r", "g", "b", and "a" components of the light model
1449    ambient color, respectively.
1450
1451    If a program parameter binding matches "state.lightmodel.scenecolor" or
1452    "state.lightmodel.front.scenecolor", the "x", "y", and "z" components of
1453    the program parameter variable are filled with the "r", "g", and "b"
1454    components, respectively, of the "front scene color"
1455
1456      c_scene = a_cs * a_cm + e_cm,
1457
1458    where a_cs is the light model ambient color, a_cm is the front ambient
1459    material color, and e_cm is the front emissive material color.  The "w"
1460    component of the program parameter variable is filled with the alpha
1461    component of the front diffuse material color.  If a program parameter
1462    binding matches "state.lightmodel.back.scenecolor", a similar back scene
1463    color, computed using back-facing material properties, is used.  The front
1464    and back scene colors match the values that would be assigned to vertices
1465    using conventional lighting if all lights were disabled.
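
    A minimal sketch of the scene color computation, for illustration.  The
    function name "scene_color" and its argument names are hypothetical; each
    argument is an (r, g, b, a) tuple for a single material face.

```python
def scene_color(light_model_ambient, material_ambient, material_emission,
                material_diffuse):
    """Model state.lightmodel.scenecolor for one material face."""
    a_cs, a_cm, e_cm = light_model_ambient, material_ambient, material_emission
    # c_scene = a_cs * a_cm + e_cm, evaluated per RGB component.
    rgb = tuple(a_cs[i] * a_cm[i] + e_cm[i] for i in range(3))
    # The "w" component is the alpha of the diffuse material color.
    return rgb + (material_diffuse[3],)
```

    Passing the back-facing material properties instead of the front-facing
    ones would model "state.lightmodel.back.scenecolor".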
1466
1467    If a program parameter binding matches anything beginning with
1468    "state.lightprod[n]", the "x", "y", and "z" components of the program
1469    parameter variable are filled with the "r", "g", and "b" components,
1470    respectively, of the corresponding light product.  The three light product
1471    components are the products of the corresponding color components of the
1472    specified material property and the light color of the specified light
1473    (see Table X.4).  The "w" component of the program parameter variable is
1474    filled with the alpha component of the specified material property.
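
    The light product rule can be sketched the same way.  The function name
    "light_product" is hypothetical; both inputs are (r, g, b, a) tuples.

```python
def light_product(light_color, material_color):
    """Model a state.lightprod[n] binding for one light and material property."""
    # RGB components are multiplied component-wise.
    rgb = tuple(light_color[i] * material_color[i] for i in range(3))
    # Alpha is copied from the material property rather than multiplied.
    return rgb + (material_color[3],)
```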
1475
1476    Light products depend on material properties, which can be changed inside
1477    a Begin/End pair.  Such property changes are not guaranteed to take effect
1478    until the following End command.  Program parameter variables bound to
1479    light products whose corresponding material property changes inside a
1480    Begin/End pair are undefined until the following End command.
1481
1482
1483    Texture Coordinate Generation Property Bindings
1484
1485      Binding                    Components  Underlying State
1486      -------------------------  ----------  ----------------------------
1487      state.texgen[n].eye.s      (a,b,c,d)   TexGen eye linear plane
1488                                             coefficients, s coord, unit n
1489      state.texgen[n].eye.t      (a,b,c,d)   TexGen eye linear plane
1490                                             coefficients, t coord, unit n
1491      state.texgen[n].eye.r      (a,b,c,d)   TexGen eye linear plane
1492                                             coefficients, r coord, unit n
1493      state.texgen[n].eye.q      (a,b,c,d)   TexGen eye linear plane
1494                                             coefficients, q coord, unit n
1495      state.texgen[n].object.s   (a,b,c,d)   TexGen object linear plane
1496                                             coefficients, s coord, unit n
1497      state.texgen[n].object.t   (a,b,c,d)   TexGen object linear plane
1498                                             coefficients, t coord, unit n
1499      state.texgen[n].object.r   (a,b,c,d)   TexGen object linear plane
1500                                             coefficients, r coord, unit n
1501      state.texgen[n].object.q   (a,b,c,d)   TexGen object linear plane
1502                                             coefficients, q coord, unit n
1503
1504      Table X.5:  Texture Coordinate Generation Property Bindings.  "[n]" is
1505      optional -- texture unit <n> is used if specified; texture unit 0 is
1506      used otherwise.
1507
1508    If a program parameter binding matches a set of TexGen plane coefficients,
1509    the "x", "y", "z", and "w" components of the program parameter variable
1510    are filled with the coefficients p1, p2, p3, and p4, respectively, for
1511    object linear coefficients, and the coefficients p1', p2', p3', and p4',
1512    respectively, for eye linear coefficients (section 2.10.4).
1513
1514
1515    Fog Property Bindings
1516
1517      Binding                        Components  Underlying State
1518      -----------------------------  ----------  ----------------------------
1519      state.fog.color                (r,g,b,a)   RGB fog color (section 3.10)
1520      state.fog.params               (d,s,e,r)   fog density, linear start
1521                                                 and end, and 1/(end-start)
1522                                                 (section 3.10)
1523
1524      Table X.6:  Fog Property Bindings
1525
1526    If a program parameter binding matches "state.fog.color", the "x", "y",
1527    "z", and "w" components of the program parameter variable are filled with
1528    the "r", "g", "b", and "a" components, respectively, of the fog color
1529    (section 3.10).
1530
1531    If a program parameter binding matches "state.fog.params", the "x", "y",
1532    and "z" components of the program parameter variable are filled with the
1533    fog density, linear fog start, and linear fog end parameters (section
1534    3.10), respectively.  The "w" component is filled with 1/(end-start),
1535    where end and start are the linear fog end and start parameters,
1536    respectively.
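
    For illustration, the contents of "state.fog.params" can be modelled as
    follows.  The function name "fog_params" is hypothetical.

```python
def fog_params(density, start, end):
    """Model the (x, y, z, w) value of state.fog.params."""
    # "w" is the reciprocal of the linear fog range.  start == end would
    # divide by zero here, a case the text above does not address.
    return (density, start, end, 1.0 / (end - start))
```

    Supplying the reciprocal lets a program form a linear fog factor with a
    multiply, for example (end - c) * w for an eye distance c, instead of a
    divide.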
1537
1538
1539    Clip Plane Property Bindings
1540
1541      Binding                        Components  Underlying State
1542      -----------------------------  ----------  ----------------------------
1543      state.clip[n].plane            (a,b,c,d)   clip plane n coefficients
1544
1545      Table X.7:  Clip Plane Property Bindings.  <n> specifies the clip plane
1546      number, and is required.
1547
1548    If a program parameter binding matches "state.clip[n].plane", the "x",
1549    "y", "z", and "w" components of the program parameter variable are filled
1550    with the coefficients p1', p2', p3', and p4', respectively, of clip plane
1551    <n> (section 2.11).
1552
1553
1554    Point Property Bindings
1555
1556      Binding                        Components  Underlying State
1557      -----------------------------  ----------  ----------------------------
1558      state.point.size               (s,n,x,f)   point size, min and max size
1559                                                 clamps, and fade threshold
1560                                                 (section 3.3)
1561      state.point.attenuation        (a,b,c,1)   point size attenuation consts
1562
1563      Table X.8:  Point Property Bindings
1564
1565    If a program parameter binding matches "state.point.size", the "x", "y",
1566    "z", and "w" components of the program parameter variable are filled with
1567    the point size, minimum point size, maximum point size, and fade
1568    threshold, respectively (section 3.3).
1569
1570    If a program parameter binding matches "state.point.attenuation", the "x",
1571    "y", and "z" components of the program parameter variable are filled with
1572    the constant, linear, and quadratic point size attenuation parameters (a,
1573    b, and c), respectively (section 3.3).  The "w" component is filled with
1574    1.0.
1575
1576
1577    Texture Environment Property Bindings
1578
1579      Binding                    Components  Underlying State
1580      -------------------------  ----------  ----------------------------
1581      state.texenv[n].color      (r,g,b,a)   texture environment n color
1582
1583      Table X.9:  Texture Environment Property Bindings.  "[n]" is optional --
1584      texture unit <n> is used if specified; texture unit 0 is used otherwise.
1585
1586    If a program parameter binding matches "state.texenv[n].color", the "x",
1587    "y", "z", and "w" components of the program parameter variable are filled
1588    with the "r", "g", "b", and "a" components, respectively, of the
1589    corresponding texture environment color.  Note that only "legacy" texture
1590    units, as queried by MAX_TEXTURE_UNITS, include texture environment state.
1591    Texture image units and texture coordinate sets do not have associated
1592    texture environment state.
1593
1594
1595    Depth Property Bindings
1596
1597      Binding                      Components  Underlying State
1598      ---------------------------  ----------  ----------------------------
1599      state.depth.range            (n,f,d,1)   Depth range near, far, and
1600                                               (far-near) (section 2.10.1)
1601
1602      Table X.10:  Depth Property Bindings
1603
1604    If a program parameter binding matches "state.depth.range", the "x" and
1605    "y" components of the program parameter variable are filled with the
1606    mappings of near and far clipping planes to window coordinates,
1607    respectively.  The "z" component is filled with the difference of the
1608    mappings of near and far clipping planes, far minus near.  The "w"
1609    component is filled with 1.0.
1610
1611
1612    Matrix Property Bindings
1613
1614      Binding                               Underlying State
1615      ------------------------------------  ---------------------------
1616      * state.matrix.modelview[n]           modelview matrix n
1617        state.matrix.projection             projection matrix
1618        state.matrix.mvp                    modelview-projection matrix
1619      * state.matrix.texture[n]             texture matrix n
1620        state.matrix.program[n]             program matrix n
1621
1622      Table X.11:  Base Matrix Property Bindings.  The "[n]" syntax indicates
1623      a specific matrix number.  For modelview and texture matrices, a matrix
1624      number is optional, and matrix zero will be used if the matrix number is
1625      omitted.  These base bindings may further be modified by an
1626      inverse/transpose selector and a row selector.
1627
1628    If the beginning of a program parameter binding matches any of the matrix
1629    binding names listed in Table X.11, the binding corresponds to a 4x4
1630    matrix.  If the parameter binding is followed by ".inverse", ".transpose",
1631    or ".invtrans" (<stateMatModifier> grammar rule), the inverse, transpose,
1632    or transpose of the inverse, respectively, of the matrix specified in
1633    Table X.11 is selected.  Otherwise, the matrix specified in Table X.11 is
1634    selected.  If the specified matrix is poorly-conditioned (singular or
1635    nearly so), its inverse matrix is undefined.  The binding name
1636    "state.matrix.mvp" refers to the product of modelview matrix zero and the
1637    projection matrix, defined as
1638
1639       MVP = P * M0,
1640
1641    where P is the projection matrix and M0 is modelview matrix zero.
1642
1643    If the selected matrix is followed by ".row[<a>]" (matching the
1644    <stateMatrixRow> grammar rule), the "x", "y", "z", and "w" components of
1645    the program parameter variable are filled with the four entries of row <a>
1646    of the selected matrix.  In the example,
1647
1648      PARAM m0 = state.matrix.modelview[1].row[0];
1649      PARAM m1 = state.matrix.projection.transpose.row[3];
1650
1651    the variable "m0" is set to the first row (row 0) of modelview matrix 1
1652    and "m1" is set to the last row (row 3) of the transpose of the projection
1653    matrix.
1654
1655    For program parameter array bindings, multiple rows of the selected matrix
1656    can be bound via the <stateMatrixRows> grammar rule.  If the selected
1657    matrix binding is followed by ".row[<a>..<b>]", the result is equivalent
1658    to specifying matrix rows <a> through <b>, in order.  A program will fail
1659    to load if <a> is greater than <b>.  If no row selection is specified
1660    (<optMatrixRows> matches ""), matrix rows 0 through 3 are bound in order.
1661    In the example,
1662
1663      PARAM m2[] = { state.matrix.program[0].row[1..2] };
1664      PARAM m3[] = { state.matrix.program[0].transpose };
1665
1666    the array "m2" has two entries, containing rows 1 and 2 of program matrix
1667    zero, and "m3" has four entries, containing all four rows of the transpose
1668    of program matrix zero.
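
    The row selection rules above can be sketched in Python.  This is a
    simplified model, not an implementation: "bind_rows" and "transpose" are
    hypothetical names, matrices are lists of four rows, and only the
    ".transpose" modifier is modelled (".inverse" and ".invtrans" are left
    out).

```python
def transpose(m):
    """Return the transpose of a 4x4 matrix stored as a list of rows."""
    return [list(col) for col in zip(*m)]

def bind_rows(matrix, modifier=None, first=0, last=3):
    """Return the rows bound by '<matrix>[.<modifier>].row[first..last]'.

    With the defaults, rows 0 through 3 are bound in order, matching the
    case where no row selection is specified.
    """
    if first > last:
        raise ValueError("program fails to load: <a> is greater than <b>")
    if modifier == "transpose":
        matrix = transpose(matrix)
    elif modifier is not None:
        raise NotImplementedError("only 'transpose' is modelled in this sketch")
    return [tuple(matrix[r]) for r in range(first, last + 1)]
```

    In this model, bind_rows(m, first=1, last=2) corresponds to the "m2"
    declaration above and bind_rows(m, "transpose") corresponds to "m3".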
1669
1670
1671    Section 2.X.3.4, Program Temporaries
1672
1673    Program temporary variables are used to hold temporary results during
1674    program execution.  Temporaries do not persist between program
1675    invocations, and are undefined at the beginning of each program
1676    invocation.
1677
1678    Temporary variables are declared explicitly using the <TEMP_statement>
1679    grammar rule.  Each such statement can declare one or more temporaries.
1680    Temporaries cannot be declared implicitly.  Temporaries can be declared
1681    using any component size ("SHORT" or "LONG") and type ("FLOAT" or "INT")
1682    modifier.
1683
1684    Temporary variables may be declared as arrays.  Temporary variables
1685    declared as arrays may be stored in slower memory than those not declared
1686    as arrays, and it is recommended to use non-array variables unless array
1687    functionality is required.
1688
1689
1690    Section 2.X.3.5, Program Results
1691
1692    Program result variables represent the per-vertex or per-fragment results
1693    of the program.  All result variables have associated bindings, are
1694    write-only during program execution, and are undefined at the beginning of
1695    each program invocation.  Any vertex or fragment attributes corresponding
1696    to unwritten result variables will be undefined in subsequent stages of
1697    the pipeline.  Result variables may be declared explicitly via the
1698    <OUTPUT_statement> grammar rule, or implicitly by using a result binding
1699    in an instruction.
1700
1701    The set of available result bindings depends on the program type, and is
1702    enumerated in the specifications for each program type.
1703
1704    Result variables may generally be declared as arrays, but the set of
1705    bindings allowed for arrays is limited to state grouped in arrays (e.g.,
1706    texture coordinates, clip distances, colors).  Additionally, all bindings
1707    assigned to the array must be of the same binding type and must increase
1708    consecutively.  Examples of valid and invalid binding lists for vertex
1709    programs include:
1710
1711      result.clip[1], result.clip[2]          # valid, 2-entry array
1712      result.texcoord[0..3]                   # valid, 4-entry array
1713      result.texcoord[1], result.texcoord[3]  # invalid, skipped texcoord 2
1714      result.texcoord[2], result.texcoord[1]  # invalid, wrong order
1715      result.texcoord[1], result.clip[2]      # invalid, different types
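
    The "same binding type, consecutive indices" rule illustrated by these
    examples can be checked mechanically.  The sketch below is illustrative
    only: the helper names are hypothetical, only bindings of the form
    "result.<name>[i]" or "result.<name>[i..j]" are parsed, and rejecting a
    reversed range is an assumption rather than something stated above.

```python
import re

_BINDING = re.compile(r"^result\.(\w+)\[(\d+)(?:\.\.(\d+))?\]$")

def expand(binding):
    """Expand one binding string into a list of (type, index) pairs."""
    m = _BINDING.match(binding)
    if m is None:
        raise ValueError("unsupported binding syntax: " + binding)
    kind, first = m.group(1), int(m.group(2))
    last = int(m.group(3)) if m.group(3) is not None else first
    if first > last:
        # Assumption: reversed ranges are rejected.
        raise ValueError("reversed range: " + binding)
    return [(kind, i) for i in range(first, last + 1)]

def is_valid_result_array(bindings):
    """Check that all entries share one type and increase consecutively."""
    entries = [e for b in bindings for e in expand(b)]
    for (k0, i0), (k1, i1) in zip(entries, entries[1:]):
        if k1 != k0 or i1 != i0 + 1:
            return False
    return True
```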
1716
1717    Additionally, result bindings may be used in no more than one array
1718    addressed with relative addressing.
1719
1720    Implementations may have a limit on the total number of result binding
1721    components used by each program target (MAX_PROGRAM_RESULT_COMPONENTS_NV).
1722    Programs that require more result binding components than this limit will
1723    fail to load.  The method of counting used result binding components is
1724    implementation-dependent, but must satisfy the following properties:
1725
1726      * If a result binding is not referenced in a program, or is referenced
1727        only in declarations of result variables that are not used, none of
1728        its components are counted.
1729
1730      * A result binding component may be counted as used only if there exists
1731        an instruction operand where
1732
1733          - the component is enabled in the write mask (Section 2.X.4.3), and
1734
1735          - the result binding is either
1736
1737              - referenced directly by the operand,
1738
1739              - bound to a declared variable referenced by the operand, or
1740
1741              - bound to a declared array variable where another binding in
1742                the array satisfies one of the two previous conditions.
1743
1744        Implementations are not required to optimize out unused elements of a
1745        result array or components that are used in only some elements of an
1746        array.  The last of these rules is intended to cover the case where
1747        the same result binding is used in multiple variables.
1748
1749        For example, an instruction whose write mask selects only the x
1750        component may result in the x component of a result binding being
1751        counted, but may never result in the counting of the y, z, or w
1752        components of any result binding.
1753
1754
1755    Section 2.X.3.6, Program Parameter Buffers
1756
1757    Program parameter buffers are arrays consisting of single-component
1758    typeless values or four-component typeless vectors stored in a buffer
1759    object.  The GL provides an implementation-dependent number of buffer
1760    object binding points for each program target, to which buffer objects can
1761    be attached.  Program parameter buffer variables can be changed either by
1762    updating the contents of bound buffer objects, or simply by changing the
1763    buffer object attached to a binding point.
1764
1765    Program parameter buffer variables are used as constants during program
1766    execution.  All program parameter buffer variables have an associated
1767    binding and are read-only during program execution.  Program parameter
1768    buffers retain their values across program invocations, although their
1769    values may change as buffer object bindings or contents change.  Program
1770    parameter buffer variables must be declared explicitly via the
1771    <BUFFER_statement> grammar rule.  Program parameter buffer bindings
1772    cannot be used directly in executable instructions.
1773
1774    Program parameter buffer variables are treated as an array of
1775    single-component values if the <bufferDeclType> grammar rule matches
1776    "BUFFER" or as an array of four-component vectors if it matches "BUFFER4".
1777    A program will fail to load if a variable declared as "BUFFER" and another
1778    variable declared as "BUFFER4" use the same buffer binding point.
1779
1780    Program parameter buffer variables may be declared as arrays, but all
1781    bindings assigned to the array must use the same binding point and must
1782    increase consecutively.
1783
1784      Binding                        Components  Underlying State
1785      -----------------------------  ----------  -----------------------------
1786      program.buffer[a][b]           (x,x,x,x)   program parameter buffer a,
1787                                                   element b
1788      program.buffer[a][b..c]        (x,x,x,x)   program parameter buffer a,
1789                                                   elements b through c
1790      program.buffer[a]              (x,x,x,x)   program parameter buffer a,
1791                                                   all elements
1792
1793      Table X.12: Program Parameter Buffer Bindings.  <a> indicates a buffer
1794      number, <b> and <c> indicate individual elements.
1795
1796    If a program parameter buffer binding matches "program.buffer[a][b]", the
1797    program parameter variable is filled with element <b> of the buffer
1798    object bound to binding point <a>.  Each element of the bound buffer
1799    object is treated as one or four words of data that can hold integer or
1800    floating-point values.  When a single-component binding is evaluated, the
1801    selected word is broadcast to all four components of the variable.  When a
1802    four-component binding is evaluated, the four components of the buffer
1803    element are loaded into the variable.  If no buffer object is bound to
1804    binding point <a>, or the bound buffer object is not large enough to hold
1805    an element <b>, the values used are undefined.  The binding point <a> must
1806    be a nonnegative integer constant.
1807
1808    For program parameter buffer array declarations, "program.buffer[a][b..c]"
1809    is equivalent to specifying elements <b> through <c> of the buffer object
1810    bound to binding point <a> in order.
1811
1812    For program parameter buffer array declarations, "program.buffer[a]" is
1813    equivalent to specifying the entire buffer -- elements 0 through <n>-1,
1814    where <n> is either the size of the array (if declared) or the
1815    implementation-dependent maximum parameter buffer object size limit (if no
1816    size is declared).
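
    The difference between single-component and four-component fetches can be
    sketched in Python over a plain byte string standing in for the bound
    buffer object.  The function name "load_buffer_element" is hypothetical.
    The sketch assumes 32-bit, little-endian, tightly packed words, and it
    returns None where the text above says the values are undefined.

```python
import struct

def load_buffer_element(data, index, four_component, fmt="f"):
    """Model one program.buffer[a][b] fetch from the bytes of a bound buffer.

    four_component=False models a "BUFFER" declaration and True models
    "BUFFER4".  fmt is a struct code for the word type ("f", "i", or "I"),
    reflecting that the stored words are typeless.
    """
    words = 4 if four_component else 1
    offset = index * words * 4  # assumes 4-byte words
    if offset + words * 4 > len(data):
        return None  # buffer too small: values are undefined
    values = struct.unpack_from("<" + fmt * words, data, offset)
    # A single-component fetch broadcasts the word to all four components.
    return values if four_component else values * 4
```

    For a buffer holding the floats 0.0 through 7.0, element 5 read as a
    single component yields (5.0, 5.0, 5.0, 5.0), while element 1 read as a
    four-component vector yields (4.0, 5.0, 6.0, 7.0).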
1817
1818
1819    Section 2.X.3.7, Program Condition Code Registers
1820
1821    The program condition code registers are four-component vectors.  Each
1822    component of each register is a collection of single-bit flags, including
1823    a sign flag (SF), a zero flag (ZF), an overflow flag (OF), and a carry
1824    flag (CF).  There are two condition code registers (CC0 and CC1), whose
1825    values are undefined at the beginning of program execution.
1826
1827    Most program instructions can optionally update one of the condition code
1828    registers, by designating the condition code to update in the instruction.
1829    When a condition code component is updated, the four flags of each
1830    component of the condition code are set according to the corresponding
1831    component of the instruction result.  Full details on the condition code
1832    updates and tests can be found in Section 2.X.4.3.
1833
1834    The value of these four flags can be combined in various condition code
1835    tests, which can be used to mask writes to destination variables and to
1836    perform conditional branches or other conditional operations.
1837
1838
1839    Section 2.X.3.8, Program Aliases
1840
1841    Programs can create aliases by matching the <ALIAS_statement> grammar
1842    rule.  Aliases allow programs to use multiple variable names to refer to a
1843    single underlying variable.  For example, the statement
1844
1845      ALIAS var1 = var0
1846
1847    establishes a variable name of "var1".  Subsequent references to "var1" in
1848    the program text are treated as references to "var0".  The left hand side
1849    of an ALIAS statement must be a new variable name, and the right hand side
1850    must be an established variable name.
1851
1852    Aliases are not considered variable declarations, so do not count against
1853    the limits on the number of variable declarations allowed in the program
1854    text.
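
    One way to picture alias handling is as a name table kept separately from
    the table of declared variables.  The class below is a hypothetical
    sketch, not a description of any implementation; in particular, allowing
    an alias to name another alias is an assumption.

```python
class SymbolTable:
    """Toy model of variable and alias names in a program."""

    def __init__(self):
        self.variables = set()  # declared variables (count against limits)
        self.aliases = {}       # alias name -> underlying variable name

    def declare(self, name):
        self.variables.add(name)

    def resolve(self, name):
        """Return the underlying variable for a variable or alias name."""
        if name in self.variables:
            return name
        return self.aliases[name]  # raises KeyError for unknown names

    def alias(self, new_name, existing):
        # The left hand side must be a new name ...
        if new_name in self.variables or new_name in self.aliases:
            raise ValueError(new_name + " is already in use")
        # ... and the right hand side must already be established.
        self.aliases[new_name] = self.resolve(existing)
```

    After declare("var0") and alias("var1", "var0"), resolve("var1") returns
    "var0", and the number of declared variables is still one.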
1855
1856
1857    Section 2.X.3.9, Program Resource Limits
1858
1859    (see ARB_vertex_program specification, incorporates all the different
1860    limits on instruction counts, temporaries, attribute bindings, program
1861    parameters, and so on)
1862
1863
1864    Section 2.X.4, Program Execution Environment
1865
1866    The set of instructions supported for GPU programs is given in Table X.13
1867    below and is described in detail in Section 2.X.8.  An instruction can use
1868    up to three operands when it executes, and most instructions can write a
1869    single result vector.  Instructions may also specify one or more
1870    modifiers, according to the <opModifiers> grammar rule.  Instruction
1871    modifiers affect how the specified operation is performed.
1872
1873    GPU programs may operate on signed integer, unsigned integer, or
1874    floating-point values; some instructions are capable of operating on any
1875    of the three types.  However, the data type of the operands and the result
1876    are always determined based solely on the instruction and its modifiers.
1877    If any of the variables used in the instruction are typeless, they will be
1878    interpreted according to the data type derived from the instruction.  If
1879    any variables with a conflicting data type are used in the instruction,
1880    the program will fail to load unless the "NTC" (no type checking)
1881    instruction modifier is specified.
1882
1883                  Modifiers
1884      Instruction F I C S H D  Out Inputs    Description
1885      ----------- - - - - - -  --- --------  --------------------------------
1886      ABS         X X X X X F  v   v         absolute value
1887      ADD         X X X X X F  v   v,v       add
1888      AND         - X X - - S  v   v,v       bitwise and
1889      BRK         - - - - - -  -   c         break out of loop instruction
1890      CAL         - - - - - -  -   c         subroutine call
1891      CEIL        X X X X X F  v   vf        ceiling
1892      CMP         X X X X X F  v   v,v,v     compare
1893      CONT        - - - - - -  -   c         continue with next loop iteration
1894      COS         X - X X X F  s   s         cosine with reduction to [-PI,PI]
1895      DIV         X X X X X F  v   v,s       divide vector components by scalar
1896      DP2         X - X X X F  s   v,v       2-component dot product
1897      DP2A        X - X X X F  s   v,v,v     2-comp. dot product w/scalar add
1898      DP3         X - X X X F  s   v,v       3-component dot product
1899      DP4         X - X X X F  s   v,v       4-component dot product
1900      DPH         X - X X X F  s   v,v       homogeneous dot product
1901      DST         X - X X X F  v   v,v       distance vector
1902      ELSE        - - - - - -  -   -         start if test else block
1903      ENDIF       - - - - - -  -   -         end if test block
1904      ENDREP      - - - - - -  -   -         end of repeat block
1905      EX2         X - X X X F  s   s         exponential base 2
1906      FLR         X X X X X F  v   vf        floor
1907      FRC         X - X X X F  v   v         fraction
1908      I2F         - X X - - S  vf  v         integer to float
1909      IF          - - - - - -  -   c         start of if test block
1910      KIL         X X - - X F  -   vc        kill fragment
1911      LG2         X - X X X F  s   s         logarithm base 2
1912      LIT         X - X X X F  v   v         compute lighting coefficients
1913      LRP         X - X X X F  v   v,v,v     linear interpolation
1914      MAD         X X X X X F  v   v,v,v     multiply and add
1915      MAX         X X X X X F  v   v,v       maximum
1916      MIN         X X X X X F  v   v,v       minimum
1917      MOD         - X X - - S  v   v,s       modulus vector components by scalar
1918      MOV         X X X X X F  v   v         move
1919      MUL         X X X X X F  v   v,v       multiply
1920      NOT         - X X - - S  v   v         bitwise not
1921      NRM         X - X X X F  v   v         normalize 3-component vector
1922      OR          - X X - - S  v   v,v       bitwise or
1923      PK2H        X X - - - F  s   vf        pack two 16-bit floats
1924      PK2US       X X - - - F  s   vf        pack two floats as unsigned 16-bit
1925      PK4B        X X - - - F  s   vf        pack four floats as signed 8-bit
1926      PK4UB       X X - - - F  s   vf        pack four floats as unsigned 8-bit
1927      POW         X - X X X F  s   s,s       exponentiate
1928      RCC         X - X X X F  s   s         reciprocal (clamped)
1929      RCP         X - X X X F  s   s         reciprocal
1930      REP         X X - - X F  -   v         start of repeat block
1931      RET         - - - - - -  -   c         subroutine return
1932      RFL         X - X X X F  v   v,v       reflection vector
1933      ROUND       X X X X X F  v   vf        round to nearest integer
1934      RSQ         X - X X X F  s   s         reciprocal square root
1935      SAD         - X X - - S  vu  v,v,vu    sum of absolute differences
1936      SCS         X - X X X F  v   s         sine/cosine without reduction
1937      SEQ         X X X X X F  v   v,v       set on equal
1938      SFL         X X X X X F  v   v,v       set on false
1939      SGE         X X X X X F  v   v,v       set on greater than or equal
1940      SGT         X X X X X F  v   v,v       set on greater than
1941      SHL         - X X - - S  v   v,s       shift left
1942      SHR         - X X - - S  v   v,s       shift right
1943      SIN         X - X X X F  s   s         sine with reduction to [-PI,PI]
1944      SLE         X X X X X F  v   v,v       set on less than or equal
1945      SLT         X X X X X F  v   v,v       set on less than
1946      SNE         X X X X X F  v   v,v       set on not equal
1947      SSG         X - X X X F  v   v         set sign
1948      STR         X X X X X F  v   v,v       set on true
1949      SUB         X X X X X F  v   v,v       subtract
1950      SWZ         X - X X X F  v   v         extended swizzle
1951      TEX         X X X X - F  v   vf        texture sample
1952      TRUNC       X X X X X F  v   vf        truncate (round toward zero)
1953      TXB         X X X X - F  v   vf        texture sample with bias
1954      TXD         X X X X - F  v   vf,vf,vf  texture sample w/partials
1955      TXF         X X X X - F  v   vs        texel fetch
1956      TXL         X X X X - F  v   vf        texture sample w/LOD
1957      TXP         X X X X - F  v   vf        texture sample w/projection
1958      TXQ         - - - - - S  vs  vs        texture info query
1959      UP2H        X X X X - F  vf  s         unpack two 16-bit floats
1960      UP2US       X X X X - F  vf  s         unpack two unsigned 16-bit ints
1961      UP4B        X X X X - F  vf  s         unpack four signed 8-bit ints
1962      UP4UB       X X X X - F  vf  s         unpack four unsigned 8-bit ints
1963      X2D         X - X X X F  v   v,v,v     2D coordinate transformation
1964      XOR         - X X - - S  v   v,v       exclusive or
1965      XPD         X - X X X F  v   v,v       cross product
1966
1967      Table X.13:  Summary of NV_gpu_program4 instructions.  The "Modifiers"
1968      columns specify the set of modifiers allowed for the instruction:
1969
1970        F = floating-point data type modifiers
1971        I = signed and unsigned integer data type modifiers
1972        C = condition code update modifiers
1973        S = clamping (saturation) modifiers
1974        H = half-precision float data type suffix
1975        D = default data type modifier (F, U, or S)
1976
1977      The input and output columns describe the formats of the operands and
1978      results of the instruction.
1979
1980        v:  4-component vector (data type is inherited from operation)
1981        vf: 4-component vector (data type is always floating-point)
1982        vs: 4-component vector (data type is always signed integer)
1983        vu: 4-component vector (data type is always unsigned integer)
1984        s:  scalar (replicated if written to a vector destination;
1985                    data type is inherited from operation)
1986        c:  condition code test result (e.g., "EQ", "GT1.x")
1987        vc: 4-component vector or condition code test
1988
1989
1990    Section 2.X.4.1, Program Instruction Modifiers
1991
1992    There are several types of instruction modifiers available.  A data type
1993    modifier specifies that an instruction should operate on signed integer,
1994    unsigned integer, or floating-point data, when multiple data types are
1995    supported.  A clamping modifier applies to instructions with
1996    floating-point results, and specifies the range to which the results
1997    should be clamped.  A condition code update modifier specifies that the
1998    instruction should update one of the condition code variables.  Several
1999    other special modifiers are also provided.
2000
2001    Instruction modifiers may be specified as stand-alone modifiers or as
2002    suffixes concatenated with the opcode name.  A program will fail to load
2003    if it contains an instruction that
2004
2005      * specifies more than one modifier of any given type,
2006
2007      * specifies a clamping modifier on an instruction, unless it produces
2008        floating-point results, or
2009
2010      * specifies a modifier that is not supported by the instruction (see
2011        Table X.13 and the instruction description).
2012
2013    Stand-alone instruction modifiers are specified according to the
2014    <opModifiers> grammar rule using a ".<modifier>" syntax.  Multiple
2015    modifiers, separated by periods, may be specified.  The set of supported
2016    modifiers is described in Table X.14.
2017
2018      Modifier  Description
2019      --------  -----------------------------------------------
2020      F         Floating-point operation
2021      U         Fixed-point operation, unsigned operands
2022      S         Fixed-point operation, signed operands
2023      CC        Update condition code register zero
2024      CC0       Update condition code register zero
2025      CC1       Update condition code register one
2026      SAT       Floating-point results clamped to [0,1]
2027      SSAT      Floating-point results clamped to [-1,1]
2028      NTC       Disable type-checking on operands/results
2029      S24       Signed multiply (24-bit operands)
2030      U24       Unsigned multiply (24-bit operands)
2031      HI        Multiplies two 32-bit integer operands, returns
2032                  the 32 MSBs of the product
2033
2034      Table X.14, Instruction Modifiers.
2035
2036    "F", "U", and "S" modifiers are data type modifiers and specify that the
2037    instruction should operate on floating-point, unsigned integer, or
2038    signed integer values, respectively.  For example, "ADD.F", "ADD.U", and
2039    "ADD.S" specify component-wise addition of floating-point, unsigned
2040    integer, or signed integer vectors, respectively.  These modifiers specify
2041    a data type, but do not specify a precision at which the operation is
2042    performed.  Floating-point operations will be carried out with an internal
2043    precision no less than that used to represent the largest operand.
2044    Fixed-point operations will be carried out using at least as many bits as
2045    used to represent the largest operand.  Operands represented with fewer
2046    bits than used to perform the instruction will be promoted to a larger
2047    data type.  Signed integer operands will be sign-extended, where the most
2048    significant bits are filled with ones if the operand is negative and zero
2049    otherwise.  Unsigned integer operands will be zero-extended, where the
2050    most significant bits are always filled with zeroes.  For some
2051    instructions, the data type of some operands or the result is fixed; in
2052    these cases, the data type modifier specifies the data type of the
2053    remaining values.
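
    The promotion rules above can be illustrated with a minimal Python
    sketch.  The function names and the explicit bit-width parameters are
    assumptions made for illustration only; they are not part of this
    extension.

```python
def sign_extend(value, from_bits, to_bits):
    """Widen a from_bits-wide two's complement pattern to to_bits bits."""
    value &= (1 << from_bits) - 1
    if value & (1 << (from_bits - 1)):
        # Negative operand: fill the new most significant bits with ones.
        value |= ((1 << to_bits) - 1) & ~((1 << from_bits) - 1)
    return value


def zero_extend(value, from_bits, to_bits):
    """Widen an unsigned pattern; the new most significant bits are zero.

    to_bits is accepted only for symmetry with sign_extend().
    """
    return value & ((1 << from_bits) - 1)
```

    For example, the 16-bit pattern 0xFFFE promotes to 0xFFFFFFFE when
    treated as signed and to 0x0000FFFE when treated as unsigned.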
2054
2055    "CC", "CC0", and "CC1" are condition code update modifiers that specify
2056    that one of the condition code registers should be updated based on the
2057    result of the instruction, as described in section 2.X.4.3.  "CC" and
2058    "CC0" specify that the condition code register CC0 be updated; "CC1"
2059    specifies an update to CC1.  If no condition code update modifier is
2060    provided, the condition code registers will not be affected.
2061
2062    "SAT" and "SSAT" are clamping modifiers that specify that the
2063    floating-point components of the instruction result should be clamped to
2064    [0,1] or [-1,1], respectively, before updating the condition code and the
2065    destination variable.  If no clamping suffix is specified, unclamped
2066    results will be used for condition code updates (if any) and destination
2067    variable writes.  Clamping modifiers are not supported on instructions
2068    that do not produce floating-point results.
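
    As a rough illustration of the clamping modifiers, the following Python
    sketch clamps each result component before it would be used for the
    condition code update and the destination write.  The function name is
    hypothetical, and the treatment of NaN components is not modeled.

```python
# Clamp ranges for the two clamping modifiers described above.
CLAMP_RANGES = {"SAT": (0.0, 1.0), "SSAT": (-1.0, 1.0)}


def clamp_result(components, modifier=None):
    """Return the result components after the optional clamping modifier."""
    if modifier is None:
        return list(components)          # no clamping modifier: unchanged
    lo, hi = CLAMP_RANGES[modifier]
    return [min(max(c, lo), hi) for c in components]
```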
2069
2070    "NTC" (no type checking) disables data type checking on the instruction,
2071    and allows instructions to use operands or result variables whose data
2072    types are inconsistent with the expected data types of the instruction.
2073
2074    "S24", "U24", and "HI" are special modifiers that are allowed only for the
2075    MUL instruction, and are described in detail where MUL is documented.  No
2076    more than one such modifier may be provided for any instruction.
2077
2078    If an instruction supports data type modifiers, but none is provided, a
2079    default data type will be chosen based on the instruction, as specified in
2080    Table X.13 and the instruction set description (Section 2.X.8).  If
2081    condition code update or clamping modifiers are not specified, the
2082    corresponding operation will not be performed.
2083
2084    Additionally, each instruction name may have one or more suffixes,
2085    concatenated onto the base instruction name, that operate as instruction
2086    modifiers.  For conciseness, these suffixes are not spelled out in the
2087    grammar -- the base opcode name is used as a placeholder for the opcode
2088    and all of its possible suffixes.  Instruction suffixes are provided
2089    mainly for compatibility with prior GPU program instruction sets (e.g.,
2090    NV_vertex_program3, NV_fragment_program2, and predecessors).  The set of
2091    allowable suffixes, and their equivalent stand-alone modifiers, are listed
2092    in Table X.15.
2093
2094      Suffix  Modifier     Description
2095      ------  ----------   ---------------------------------------------------
2096      R       F            Floating-point operation, 32-bit precision
2097      H       F(*)         Floating-point operation, at least 16-bit precision
2098      C       CC0          Update condition code register zero
2099      C0      CC0          Update condition code register zero
2100      C1      CC1          Update condition code register one
2101      _SAT    SAT          Floating-point results clamped to [0,1]
2102      _SSAT   SSAT         Floating-point results clamped to [-1,1]
2103
2104      Table X.15,  Instruction Suffixes.
2105
2106    The "R" and "H" suffixes specify floating-point operations and are
2107    equivalent to the "F" data type modifier.  They additionally specify a
2108    minimum precision for the operations.  Instructions with an "R" precision
2109    modifier will be carried out at no less than IEEE single-precision
2110    floating-point (8 bits of exponent, 23 bits of mantissa).  Instructions
2111    with an "H" precision modifier will be carried out at no less than 16-bit
2112    floating-point precision (5 bits of exponent, 10 bits of mantissa).
2113
2114    An instruction may have multiple suffixes, but they must appear in order,
2115    with data type suffixes first, followed by condition code update suffixes,
2116    followed by clamping suffixes.  For example, "ADDR" carries out an add at
2117    32-bit precision.  "ADDH_SAT" carries out an add at 16-bit precision (or
2118    better) and clamps the results to [0,1].  "ADDRC1_SSAT" carries out an add
2119    at 32-bit floating-point precision, clamps the results to [-1,1], and
2120    updates condition code one based on the clamped result.
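
    The suffix ordering rule can be sketched in Python as follows.  This is
    an illustrative helper, not part of the grammar: the caller supplies the
    base opcode name, the "H" suffix is mapped to "F" without recording its
    precision hint, and no check is made against the per-instruction
    modifier support listed in Table X.13.

```python
# Suffix groups in the order they must appear (Table X.15).
SUFFIX_GROUPS = [
    {"R": "F", "H": "F"},                      # data type suffixes
    {"C0": "CC0", "C1": "CC1", "C": "CC0"},    # condition code update suffixes
    {"_SSAT": "SSAT", "_SAT": "SAT"},          # clamping suffixes
]


def split_suffixes(name, base):
    """Map the suffixes on `name` to their stand-alone modifiers."""
    if not name.startswith(base):
        raise ValueError("name does not start with the base opcode")
    rest = name[len(base):]
    modifiers = []
    for group in SUFFIX_GROUPS:
        for suffix, modifier in group.items():
            if rest.startswith(suffix):
                modifiers.append(modifier)
                rest = rest[len(suffix):]
                break                          # at most one suffix per group
    if rest:
        raise ValueError("unknown or out-of-order suffix: " + rest)
    return modifiers
```

    With this sketch, "ADDRC1_SSAT" with base "ADD" yields the modifiers
    F, CC1, and SSAT, while a name with a clamping suffix ahead of a data
    type suffix is rejected.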
2121
2122
2123    Section 2.X.4.2, Program Operands
2124
2125    Most program instructions operate on one or more scalar or vector
2126    operands.  Each operand specifies an operand variable, which is either the
2127    name of a previously declared variable or an implicit variable declaration
2128    created by using a variable binding in the instruction.  Attribute,
2129    parameter, or parameter buffer variables can be declared implicitly by
2130    using a valid binding name in an operand.  Instruction operands are
2131    specified by the <instOperandV>, <instOperandS>, or <instOperandVNS>
2132    grammar rules.
2133
2134    If the operand variable is not an array, its contents are loaded directly.
2135    If the operand variable is an array, a single element of the array is
2136    loaded according to the <arrayMem> grammar rule.  The elements of an array
2137    are numbered from 0 to <n>-1, where <n> is the number of entries in the
2138    array.  Array members can be accessed using either absolute or relative
2139    addressing.
2140
2141    Absolute array addressing is used when the <arrayMemAbs> grammar rule is
2142    matched; the array member to load is specified by the matching integer.
2143    Out-of-bounds array absolute accesses are not allowed.  If the specified
2144    member number is greater than or equal to the size of the array, the
2145    program will fail to load.
2146
2147    Relative array addressing is used when the <arrayMemRel> grammar rule is
2148    matched.  This grammar rule allows the program to specify a scalar integer
2149    operand and an optional constant offset, according to the <arrayMemReg>
2150    and <arrayMemOffset> grammar rules.  When performing relative addressing,
2151    the GL evaluates the specified integer scalar operand (according to the
2152    rules specified in this section) and adds the constant offset.  The array
2153    member loaded is given by this sum.  The constant offset is considered
2154    zero if an offset is omitted.  If the sum is negative or is greater than
2155    or equal to the size of the array, the results of the access are
2156    undefined, but may not lead to program or GL termination.  The set of
2157    constant offsets supported for relative addressing is limited to values
2158    in the range [0,<n>-1], where <n> is the size of the array.  A program
2159    will fail to load if it specifies an offset outside that range.  If
2160    offsets outside that range are required, they can be applied by using an
2161    integer ADD instruction writing to a temporary variable.
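
    A minimal Python sketch of relative addressing, separating the load-time
    check on the constant offset from the run-time index computation.  The
    function names are hypothetical, and returning None is only a stand-in
    for "undefined results".

```python
def check_relative_offset(offset, array_size):
    """Load-time check: the constant offset must lie in [0, n-1]."""
    if not 0 <= offset <= array_size - 1:
        raise ValueError("program fails to load: offset out of range")


def relative_index(reg_value, offset, array_size):
    """Run-time member selection: register value plus constant offset.

    Returns None when the sum falls outside the array, where the results
    of the access are undefined.
    """
    index = reg_value + offset
    return index if 0 <= index < array_size else None
```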
2162
2163    After the operand is loaded, its components can be rearranged according to
2164    the <swizzleSuffix> grammar rule, or it can be converted to a scalar
2165    operand according to the <scalarSuffix> grammar rule.
2166
2167    The <swizzleSuffix> grammar rule rearranges the components of a loaded
2168    vector to produce another vector.  If the <swizzleSuffix> rule matches the
2169    <xyzwSwizzle> or <rgbaSwizzle> grammar rule, a pattern of the form ".????"
2170    is used, where each question mark is replaced with one of "x", "y", "z",
2171    "w", "r", "g", "b", or "a".  For such patterns, the x, y, z, and w
2172    components of the operand are taken from the vector components named by
2173    the first, second, third, and fourth character of the pattern,
2174    respectively.  Swizzle components of "r", "g", "b", and "a" are equivalent
2175    to "x", "y", "z", and "w", respectively.  For example, if the swizzle
2176    suffix is ".yzzx" or ".gbbr" and the specified source contains {2,8,9,0},
2177    the result is the vector {8,9,9,2}.  If the <swizzleSuffix> matches the
2178    <component> grammar rule, a pattern of the form ".?" is used.  For this
2179    pattern, all four components of the operand are taken from the single
2180    component identified by the pattern.  If the swizzle suffix is omitted,
2181    components are not rearranged and swizzling has no effect, as though
2182    ".xyzw" were specified.
2183
2184    The swizzle suffix rules do not allow mixing "x", "y", "z", or "w"
2185    selectors with "r", "g", "b", or "a" selectors.  A program will fail to
2186    load if it contains a swizzle suffix with selectors from both of these
2187    sets.
2188
2189    The <scalarSuffix> grammar rule converts a vector to a scalar by selecting
2190    a single component.  The <scalarSuffix> rule is similar to the swizzle
2191    selector, except that only a single component is selected.  If the scalar
2192    suffix is ".y" and the specified source contains {2,8,9,0}, the value is
2193    the scalar value 8.
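
    The swizzle and scalar suffix rules can be sketched in Python as
    follows.  The function names are hypothetical; vectors are modeled as
    4-element lists and suffixes are given without the leading period.

```python
def swizzle(vec, suffix="xyzw"):
    """Rearrange a 4-component vector according to a swizzle suffix."""
    for names in ("xyzw", "rgba"):
        if all(ch in names for ch in suffix):
            indices = [names.index(ch) for ch in suffix]
            break
    else:
        # Selectors from both sets (or unknown ones): program fails to load.
        raise ValueError("invalid swizzle suffix: " + suffix)
    if len(indices) == 1:
        indices = indices * 4            # ".?" replicates a single component
    if len(indices) != 4:
        raise ValueError("swizzle must name one or four components")
    return [vec[i] for i in indices]


def scalar_select(vec, suffix):
    """Convert a vector to a scalar by selecting a single component."""
    if len(suffix) != 1:
        raise ValueError("scalar suffix must name exactly one component")
    return swizzle(vec, suffix)[0]
```

    Using the example from the text, a source of {2,8,9,0} with suffix
    ".yzzx" or ".gbbr" yields {8,9,9,2}, and the scalar suffix ".y" yields 8.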
2194
2195    Next, a component-wise negate operation is performed on the operand if the
2196    <operandNeg> grammar rule matches "-".  Negation is not performed if the
2197    operand has no sign prefix, or is prefixed with "+".  For unsigned integer
2198    operands, the negate operation performs a two's complement operation.
2199
2200    Next, a component-wise absolute value operation is performed on the
2201    operand if the <instOperandAbsV> or <instOperandAbsS> grammar rule is
2202    matched, by surrounding the operand with two "|" characters.  The result
2203    is optionally negated if the <operandAbsNeg> grammar rule matches "-".
2204    For unsigned integer operands, the absolute value operation has no effect.
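
    The order of the negate and absolute value steps can be sketched per
    component in Python.  The function and parameter names are hypothetical,
    and unsigned components are assumed to be 32 bits wide for the purposes
    of the two's complement negate.

```python
MASK32 = 0xFFFFFFFF  # assumption: 32-bit unsigned components


def modify_component(x, neg=False, absolute=False, abs_neg=False,
                     unsigned=False):
    """Apply operand negate, absolute value, and post-abs negate in order."""
    def negate(v):
        # Unsigned negate is a two's complement of the bit pattern.
        return (-v) & MASK32 if unsigned else -v

    if neg:                      # <operandNeg> matched "-"
        x = negate(x)
    if absolute:                 # operand surrounded by "|" characters
        if not unsigned:         # no effect on unsigned operands
            x = abs(x)
        if abs_neg:              # <operandAbsNeg> matched "-"
            x = negate(x)
    return x
```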
2205
2206
2207    Section 2.X.4.3, Program Destination Variable Update
2208
2209    Most program instructions perform computations that produce a result,
2210    which will be written to a variable.  Each instruction that computes a
2211    result specifies a destination variable, which is either the name of a
2212    previously declared variable or an implicit variable declaration created
2213    by using a variable binding in the instruction.  Result variables can be
2214    declared implicitly by using a valid program result binding name in the
2215    result portion of the instruction.  Instruction results are specified
2216    according to the <instResult> grammar rule.
2217
2218    The destination variable may be a single member of an array.  In this
2219    case, a single array member is specified using the <arrayMem> grammar
2220    rule, and the array member to update is computed in the exact same manner
2221    as done for operand loads.  If the array member is computed at run time,
2222    and is negative or greater than or equal to the size of the array, the
2223    results of the destination variable update are undefined and could result
2224    in overwriting other program variables.
2225
2226    The results of the operation may be obtained at a different precision than
2227    that used to store the destination variable.  If so, the results are
2228    converted to match the size of the destination variable.  For
2229    floating-point values, the results are rounded to the nearest
2230    floating-point value that can be represented in the destination variable.
2231    If a result component is larger in magnitude than the largest
2232    representable floating-point value in the data type of the destination
2233    variable, an infinity encoding (+/-INF) is used.  Signed or unsigned
2234    integer values are sign-extended or zero-extended, respectively, if the
2235    destination variable has more bits than the result, and have their most
2236    significant bits discarded if the destination variable has fewer bits.
2237
2238    Writes to individual components of a vector destination variable can be
2239    controlled at compile time by individual component write masks specified
2240    in the instruction.  The component write mask is specified by the
2241    <optWriteMask> grammar rule, and is a string of up to four characters,
2242    naming the components to enable for writing.  If no write mask is
2243    specified, all components are enabled for writing.  The characters "x",
2244    "y", "z", and "w" match the x, y, z, and w components respectively.  For
2245    example, a write mask of ".xzw" indicates that the x, z, and w
2246    components should be enabled for writing but the y component should not be
2247    written.  The grammar requires that the destination register mask
2248    components must be listed in "xyzw" order.  Additionally, write mask
2249    components of "r", "g", "b", and "a" are equivalent to "x", "y", "z", and
2250    "w", respectively.  The grammar does not allow mixing "x", "y", "z", or
2251    "w" components with "r", "g", "b", and "a" ones.
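
    A short Python sketch of the compile-time component write mask.  The
    function name is hypothetical; the mask is given without the leading
    period, and None stands for an omitted mask.

```python
def apply_write_mask(dest, result, mask=None):
    """Return the destination after a masked write of `result`."""
    if mask is None:
        mask = "xyzw"                    # no mask: all components enabled
    for names in ("xyzw", "rgba"):
        if all(ch in names for ch in mask):
            break
    else:
        raise ValueError("mixed or unknown write mask components")
    indices = [names.index(ch) for ch in mask]
    if indices != sorted(set(indices)):
        raise ValueError("components must be listed in xyzw order")
    out = list(dest)
    for i in indices:
        out[i] = result[i]               # only enabled components change
    return out
```

    For instance, writing {1,2,3,4} to a zeroed destination with the mask
    ".xzw" leaves the y component untouched and produces {1,0,3,4}.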
2252
2253    Writes to individual components of a vector destination variable, or to a
2254    scalar destination variable, can also be controlled at run time using
2255    condition code write masks.  The condition code write mask is specified by
2256    the <ccMask> grammar rule.  If a mask is specified, a condition code
2257    variable is loaded according to the <ccMaskRule> grammar rule and tested
2258    as described in Table X.16 to produce a four-component vector of TRUE/FALSE
2259    values.
2260
2261         mask rule         test name                condition
2262         ---------------   ----------------------   -----------------
2263         EQ,  EQ0,  EQ1    equal                    !SF && ZF
2264         GE,  GE0,  GE1    greater than or equal    !(SF ^ OF)
2265         GT,  GT0,  GT1    greater than             (!SF ^ OF) && !ZF
2266         LE,  LE0,  LE1    less than or equal       SF ^ (ZF || OF)
2267         LT,  LT0,  LT1    less than                (SF && !ZF) ^ OF
2268         NE,  NE0,  NE1    not equal                SF || !ZF
2269         FL,  FL0,  FL1    false                    always false
2270         TR,  TR0,  TR1    true                     always true
2271
2272         NAN, NAN0, NAN1   not a number             SF && ZF
2273         LEG, LEG0, LEG1   less, equal, or greater  !SF || !ZF
2274                             (anything but a NaN)
2275
2276         CF,  CF0,  CF1    carry flag               CF
2277         NCF, NCF0, NCF1   no carry flag            !CF
2278         OF,  OF0,  OF1    overflow flag            OF
2279         NOF, NOF0, NOF1   no overflow flag         !OF
2280         SF,  SF0,  SF1    sign flag                SF
2281         NSF, NSF0, NSF1   no sign flag             !SF
2282         AB,  AB0,  AB1    above                    CF && !ZF
2283         BLE, BLE0, BLE1   below or equal           !CF || ZF
2284
2285      Table X.16, Condition Code Tests.  The allowed rules are specified in
2286      the "mask rule" column.  If "0" or "1" is appended to the rule name
2287      (e.g., "EQ1"), the corresponding condition code register (CC1 in this
2288      example) is loaded, otherwise CC0 is loaded.  After loading, each
2289      component is tested, using the expression listed in the "condition"
2290      column.
2291
2292    After the condition code tests are performed, the four-component result
2293    can be swizzled according to the <swizzleSuffix> grammar rule.  Individual
2294    components of the destination variable are written only if the
2295    corresponding component of the swizzled condition code test result is
2296    TRUE.  If both a (compile-time) component write mask and a condition code
2297    write mask are specified, destination variable components are written only
2298    if the corresponding component is enabled in both masks.
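
    The way the two masks combine can be sketched in Python.  The sketch
    below transcribes a subset of the conditions in Table X.16 and models a
    single condition code register as a list of four flag sets; the helper
    names are hypothetical, and selecting between CC0 and CC1 through the
    "0" or "1" rule suffix is not modeled.

```python
def flags(sf=False, zf=False, of=False, cf=False):
    """One condition code component; all flags are booleans."""
    return {"SF": sf, "ZF": zf, "OF": of, "CF": cf}


# Subset of the "condition" column of Table X.16.
CC_TESTS = {
    "EQ":  lambda f: (not f["SF"]) and f["ZF"],
    "GE":  lambda f: not (f["SF"] ^ f["OF"]),
    "GT":  lambda f: ((not f["SF"]) ^ f["OF"]) and not f["ZF"],
    "LE":  lambda f: f["SF"] ^ (f["ZF"] or f["OF"]),
    "LT":  lambda f: (f["SF"] and not f["ZF"]) ^ f["OF"],
    "NE":  lambda f: f["SF"] or not f["ZF"],
    "FL":  lambda f: False,
    "TR":  lambda f: True,
    "NAN": lambda f: f["SF"] and f["ZF"],
    "LEG": lambda f: (not f["SF"]) or (not f["ZF"]),
}


def write_enables(cc, rule, cc_swizzle="xyzw", write_mask="xyzw"):
    """Per-component write enables from both masks."""
    tested = [bool(CC_TESTS[rule](component)) for component in cc]
    swizzled = [tested["xyzw".index(ch)] for ch in cc_swizzle]
    # A component is written only if it is enabled in both masks.
    return [swizzled[i] and "xyzw"[i] in write_mask for i in range(4)]
```

    With a condition code holding flags for a zero, a negative, a positive,
    and a NaN component, the "GT" rule enables only the third component,
    and the rule "GT.zzzz" combined with a write mask of ".xy" enables x
    and y.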
2299
2300    A program instruction can also optionally update one of the two condition
2301    code registers if the "CC", "CC0", or "CC1" instruction modifier is
2302    specified.  These instruction modifiers update condition code register
2303    CC0, CC0, or CC1, respectively.  The instructions "ADD.CC" or "ADD.CC0"
2304    will perform an add and update condition code zero, "ADD.CC1" will add and
2305    update condition code one, and "ADD" will simply perform the add without a
2306    condition code update.  The components of the selected condition code
2307    register are updated if and only if the corresponding component of the
2308    destination variable is enabled by both write masks.  For the purposes of
2309    condition code update, a scalar destination variable is treated as a
2310    vector where the scalar result is written to "x" (if enabled in the write
2311    mask), and writes to the "y", "z", and "w" components are disabled.
2312
2313    When condition code components are written, the condition code flags are
2314    updated based on the corresponding component of the result.  If a
2315    component of the destination register is not enabled for writes, the
2316    corresponding condition code component is also unchanged.
2317
2318    For floating-point results, the sign flag (SF) is set if the result is
2319    less than zero or is a NaN (not a number) value.  The zero flag (ZF) is
2320    set if the result is equal to zero or is a NaN.
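
    The floating-point flag rules can be written as a small Python sketch.
    The function name is hypothetical, and only SF and ZF are modeled
    because only those two are described for floating-point results here.

```python
import math


def float_flags(x):
    """Sign and zero flags for one floating-point result component."""
    is_nan = math.isnan(x)
    return {
        "SF": is_nan or x < 0.0,     # less than zero, or NaN
        "ZF": is_nan or x == 0.0,    # equal to zero, or NaN
    }
```

    A NaN result is the only case that sets both flags, which is what the
    "NAN" test in Table X.16 (SF && ZF) detects.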
2321
2322    For signed and unsigned integer results, the sign flag (SF) is set if the
2323    most significant bit of the value written to the result variable is set
2324    and the zero flag (ZF) is set if the result written is zero.  For
2325    instructions other than those performing an integer add or subtract (ADD,
2326    MAD, SAD, SUB), the overflow and carry flags (OF and CF) are cleared.
2327
2328    For integer add or subtract operations, the overflow and carry flags are
2329    determined by doing both signed and unsigned adds/subtracts as follows:
2330
2331      The overflow flag (OF) is set by interpreting the two operands as signed
2332      integers and performing a signed add or subtract.  If the result is
2333      representable as a signed integer (i.e., doesn't overflow), the overflow
2334      flag is cleared; otherwise, it is set.
2335
2336      The carry flag (CF) is set by interpreting the two operands as unsigned
2337      integers and performing an unsigned add or subtract.  If the result of
2338      an add is representable as an unsigned integer (i.e., doesn't overflow),
2339      the carry flag is cleared; otherwise, it is set.  If the result of a
2340      subtract is greater than or equal to zero, the carry flag is set;
2341      otherwise, it is cleared.
2342
2343    For the purposes of condition code setting, negation modifiers turn add
2344    operations into subtracts and vice versa.  If the operation is equivalent
2345    to an add with both operands negated (-A-B), the carry and overflow flags
2346    are both undefined.
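
    The integer flag rules for a two-operand add or subtract can be
    sketched in Python.  The 32-bit width and the function names are
    assumptions for illustration; operand negation modifiers and the
    three-operand MAD and SAD cases are not modeled.

```python
BITS = 32                        # assumption: 32-bit integer components
MASK = (1 << BITS) - 1


def to_signed(v):
    """Reinterpret a BITS-wide pattern as a two's complement integer."""
    v &= MASK
    return v - (1 << BITS) if v >> (BITS - 1) else v


def int_flags(a, b, subtract=False):
    """Condition code flags for an integer add or subtract of a and b."""
    ua, ub = a & MASK, b & MASK
    sa, sb = to_signed(ua), to_signed(ub)
    wide_signed = sa - sb if subtract else sa + sb
    wide_unsigned = ua - ub if subtract else ua + ub
    result = wide_unsigned & MASK
    # OF: the signed result is not representable in BITS bits.
    overflow = not -(1 << (BITS - 1)) <= wide_signed <= (1 << (BITS - 1)) - 1
    # CF: unsigned add overflowed, or unsigned subtract stayed >= 0.
    carry = wide_unsigned >= 0 if subtract else wide_unsigned > MASK
    return {"SF": bool(result >> (BITS - 1)), "ZF": result == 0,
            "OF": overflow, "CF": carry}
```

    For example, subtracting 3 from 5 sets CF and clears ZF, so the "AB"
    (above) test in Table X.16 passes; subtracting 5 from 3 clears CF.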
2347
2348
2349    Section 2.X.4.4, Program Texture Access
2350
2351    Certain program instructions may access texture images, as described in
2352    section 3.8.  The coordinates, level-of-detail, and partial derivatives
2353    used for performing the texture lookup are derived from values provided in
2354    the program as described in the various sub-sections of Section 2.X.8.
2355    These descriptions use the function
2356
2357      result_t_vec
2358        TextureSample(float_vec coord, float lod, float_vec ddx,
2359                      float_vec ddy, int_vec offset);
2360
2361    which obtains a filtered texel value <tau> as described in Section 3.8.8
2362    and returns a 4-component vector (R,G,B,A) according to the format
2363    conversions specified in Table 3.21.  The result vector is interpreted as
2364    floating-point, signed integer, or unsigned integer, according to the data
2365    type modifier of the instruction.  If the internal format of the texture
2366    does not match the instruction's data type modifier, the results of the
2367    texture lookup are undefined.
2368
2369    (Note:  For unextended OpenGL 2.0, all supported texture internal formats
2370    store integer values but return floating-point results in the range [0,1]
2371    on a texture lookup.  The ARB_texture_float extension introduces
2372    floating-point internal formats where components are both stored and
2373    returned as floating-point values.  The EXT_texture_integer extension
2374    introduces formats that both store and return either signed or unsigned
2375    integer values.)
2376
2377    <coord> is a four-component floating-point vector from which the (s,t,r)
2378    texture coordinates used for the texture access, the layer used for array
2379    textures, and the reference value used for depth comparisons (section
2380    3.8.14) are extracted according to Table X.17.  If the texture is a cube
2381    map, (s,t,r) is projected to one of the six cube faces to produce a new
2382    (s,t) vector according to Section 3.8.6.  For array textures, the layer
2383    used is derived by rounding the extracted floating-point component to the
2384    nearest integer and clamping the result to the range [0,<n>-1], where <n>
2385    is the number of layers in the texture.
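
    The array layer selection can be sketched in Python.  The function name
    is hypothetical, and the handling of exact ties (values halfway between
    two integers) is an assumption, since the text above only says "nearest
    integer".

```python
import math


def array_layer(coord_component, num_layers):
    """Round the layer coordinate to nearest and clamp to [0, n-1]."""
    # Assumption: ties round up via floor(x + 0.5).
    layer = int(math.floor(coord_component + 0.5))
    return max(0, min(layer, num_layers - 1))
```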
2386
2387    <lod> specifies the level of detail parameter and replaces the value
2388    computed in equation 3.18.  <ddx> and <ddy> specify partial derivatives
2389    (ds/dx, dt/dx, dr/dx, ds/dy, dt/dy, and dr/dy) for the texture
2390    coordinates, and may be used to derive footprint shapes for anisotropic
2391    texture filtering.
2392
2393    <offset> is a constant 3-component signed integer vector specified
2394    according to the <texOffset> grammar rule, which is added to the computed
2395    <u>, <v>, and <w> texel locations prior to sampling.  One, two, or three
2396    components may be specified in the instruction; if fewer than three are
2397    specified, the remaining offset components are zero.  A limited range of
2398    offset values is supported; the minimum and maximum <texOffset> values
2399    are implementation-dependent and given by MIN_PROGRAM_TEXEL_OFFSET_EXT and
2400    MAX_PROGRAM_TEXEL_OFFSET_EXT, respectively.  A program will fail to load:
2401
2402      * if the texture target specified in the instruction is 1D, ARRAY1D,
2403        SHADOW1D, or SHADOWARRAY1D, and the second or third component of the
2404        offset vector is non-zero,
2405
2406      * if the texture target specified in the instruction is 2D, RECT,
2407        ARRAY2D, SHADOW2D, SHADOWRECT, or SHADOWARRAY2D, and the third
2408        component of the offset vector is non-zero,
2409
2410      * if the texture target is CUBE or SHADOWCUBE, and any component of the
2411        offset vector is non-zero -- texel offsets are not supported for cube
2412        map or buffer textures, or
2413
2414      * if any component of the offset vector is less than
2415        MIN_PROGRAM_TEXEL_OFFSET_EXT or greater than
2416        MAX_PROGRAM_TEXEL_OFFSET_EXT.
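
    The load-time offset checks listed above could be expressed roughly as
    follows.  This is a sketch only:  the enum, the function names, and the
    idea of passing the limits as plain integers are assumptions made for
    illustration.  A real implementation would use the values reported for
    MIN_PROGRAM_TEXEL_OFFSET_EXT and MAX_PROGRAM_TEXEL_OFFSET_EXT.

```c
#include <stdbool.h>

/* Texture targets accepted by TextureSample() (BUFFER is excluded). */
typedef enum {
    TARGET_1D, TARGET_2D, TARGET_3D, TARGET_CUBE, TARGET_RECT,
    TARGET_ARRAY1D, TARGET_ARRAY2D,
    TARGET_SHADOW1D, TARGET_SHADOW2D, TARGET_SHADOWRECT, TARGET_SHADOWCUBE,
    TARGET_SHADOWARRAY1D, TARGET_SHADOWARRAY2D
} TexTarget;

/* Number of leading offset components that may be non-zero. */
int offset_components_allowed(TexTarget target)
{
    switch (target) {
    case TARGET_1D: case TARGET_ARRAY1D:
    case TARGET_SHADOW1D: case TARGET_SHADOWARRAY1D:
        return 1;
    case TARGET_2D: case TARGET_RECT: case TARGET_ARRAY2D:
    case TARGET_SHADOW2D: case TARGET_SHADOWRECT: case TARGET_SHADOWARRAY2D:
        return 2;
    case TARGET_3D:
        return 3;
    case TARGET_CUBE: case TARGET_SHADOWCUBE:
        return 0;                       /* no texel offsets for cube maps */
    }
    return 0;
}

/* Returns true if a program using this offset would pass the checks. */
bool texel_offset_accepted(TexTarget target, int ox, int oy, int oz,
                           int min_offset, int max_offset)
{
    const int offset[3] = { ox, oy, oz };
    int allowed = offset_components_allowed(target);

    for (int i = 0; i < 3; i++) {
        if (i >= allowed && offset[i] != 0) {
            return false;               /* component unused by this target */
        }
        if (offset[i] < min_offset || offset[i] > max_offset) {
            return false;               /* outside the supported range */
        }
    }
    return true;
}
```

    The limits -8 and 7 used when exercising this sketch are illustrative
    values, not limits stated by this specification.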
2417
2418    (NOTE:  Texel offsets are a new feature provided by this extension and are
2419    described in more detail in edits to Section 3.8 below.)
2420
2421    The texture used by TextureSample() is one of the textures bound to the
2422    texture image unit whose number is specified in the instruction according
2423    to the <texImageUnit> grammar rule.  The texture target accessed is
2424    specified according to the <texTarget> grammar rule and Table X.17.
2425    Fixed-function texture enables are always ignored when determining the
2426    texture to access in a program.
2427
2428                                                     coordinates used
2429      texTarget          Texture Type               s t r  layer  shadow
2430      ----------------   ---------------------      -----  -----  ------
2431      1D                 TEXTURE_1D                 x - -    -      -
2432      2D                 TEXTURE_2D                 x y -    -      -
2433      3D                 TEXTURE_3D                 x y z    -      -
2434      CUBE               TEXTURE_CUBE_MAP           x y z    -      -
2435      RECT               TEXTURE_RECTANGLE_ARB      x y -    -      -
2436      ARRAY1D            TEXTURE_1D_ARRAY_EXT       x - -    y      -
2437      ARRAY2D            TEXTURE_2D_ARRAY_EXT       x y -    z      -
2438      SHADOW1D           TEXTURE_1D                 x - -    -      z
2439      SHADOW2D           TEXTURE_2D                 x y -    -      z
2440      SHADOWRECT         TEXTURE_RECTANGLE_ARB      x y -    -      z
2441      SHADOWCUBE         TEXTURE_CUBE_MAP           x y z    -      w
2442      SHADOWARRAY1D      TEXTURE_1D_ARRAY_EXT       x - -    y      z
2443      SHADOWARRAY2D      TEXTURE_2D_ARRAY_EXT       x y -    z      w
2444      BUFFER             TEXTURE_BUFFER_EXT           <not supported>
2445
2446      Table X.17:  Texture types accessed for each <texTarget> value, and
2447      coordinate mappings.  The "SHADOW" and "ARRAY" targets are special
2448      pseudo-targets described below.  The "coordinates used" column indicates
2449      the input values used for each coordinate of the texture lookup, the
2450      layer selector for array textures, and the reference value for texture
2451      comparisons.  Buffer textures are not supported by normal texture lookup
2452      functions, but are supported by TXF and TXQ, described below.
2453
2454    Texture targets with "SHADOW" are used to access textures with a
2455    DEPTH_COMPONENT base internal format using depth comparisons (Section
2456    3.8.14).  Results of a texture access are undefined:
2457
2458      * if a "SHADOW" target is used, and the corresponding texture has a base
2459        internal format other than DEPTH_COMPONENT or a TEXTURE_COMPARE_MODE
2460        of NONE, or
2461
2462      * if a non-"SHADOW" target is used, and the corresponding texture has a
2463        base internal format of DEPTH_COMPONENT and a TEXTURE_COMPARE_MODE
2464        other than NONE.
2465
2466    If the texture being accessed is not complete (or cube complete for
2467    cubemap textures), no texture access is performed and the result is
2468    undefined.
2469
2470    A program will fail to load if it attempts to sample from multiple texture
2471    targets (including the SHADOW pseudo-targets) on the same texture image
2472    unit.  For example, a program containing any two of the following
2473    instructions will fail to load:
2474
2475      TEX out, coord, texture[0], 1D;
2476      TEX out, coord, texture[0], 2D;
2477      TEX out, coord, texture[0], ARRAY2D;
2478      TEX out, coord, texture[0], SHADOW2D;
2479      TEX out, coord, texture[0], 3D;
2480
2481    Additionally, multiple texture targets for a single texture image unit may
2482    not be used at the same time by the GL.  The error INVALID_OPERATION is
2483    generated by Begin, RasterPos, or any command that performs an implicit
2484    Begin if an enabled program accesses one texture target for a texture unit
2485    while another enabled program or fixed-function fragment processing
2486    accesses a different texture target for the same texture image unit.
2487
2488    Some texture instructions use standard methods to compute partial
2489    derivatives and/or the level-of-detail used to perform texture accesses.
2490    For fragment programs, the functions
2491
2492      float_vec ComputePartialsX(float_vec coord);
2493      float_vec ComputePartialsY(float_vec coord);
2494
2495    compute approximate component-wise partial derivatives of the
2496    floating-point vector <coord> relative to the X and Y coordinates,
2497    respectively.  For vertex and geometry programs, these functions always
2498    return (0,0,0,0).  The function
2499
2500      float ComputeLOD(float_vec ddx, float_vec ddy);
2501
2502    maps partial derivative vectors <ddx> and <ddy> to ds/dx, dt/dx, dr/dx,
2503    ds/dy, dt/dy, and dr/dy and computes lambda_base(x,y) according to
2504    equation 3.18.
2505
2506    The TXF instruction provides the ability to extract a single texel from a
2507    specified texture image using the function
2508
2509      result_t_vec TexelFetch(int_vec coord, int_vec offset);
2510
2511    The extracted texel is converted to an (R,G,B,A) vector according to Table
2512    3.21.  The result vector is interpreted as floating-point, signed integer,
2513    or unsigned integer, according to the data type modifier of the
2514    instruction.  If the internal format of the texture is not compatible with
2515    the instruction's data type modifier, the extracted texel value is
2516    undefined.
2517
2518    <coord> is a four-component signed integer vector used to identify the
2519    single texel accessed.  The (i,j,k) coordinates of the texel and the layer
2520    used for array textures are extracted according to Table X.18.  The level
2521    of detail accessed is obtained by adding the w component of <coord> to the
2522    base level (level_base).  <offset> is a constant 3-component signed
2523    integer vector added to the texel coordinates prior to the texel fetch as
2524    described above.  In addition to the restrictions described above,
2525    non-zero offset components are also not supported for BUFFER targets.
2526
2527    The texture used by TexelFetch() is specified by the image unit and target
2528    parameters provided in the instruction, as for TextureSample() above.
2529    Single texel fetches cannot perform depth comparisons or access cubemaps.
2530    If a program contains a TXF instruction specifying one of the "SHADOW" or
2531    "CUBE" targets, it will fail to load.
2532
2533                                      coordinates used
2534      texTarget          supported      i j k  layer  lod
2535      ----------------   ---------      -----  -----  ---
2536      1D                    yes         x - -    -     w
2537      2D                    yes         x y -    -     w
2538      3D                    yes         x y z    -     w
2539      CUBE                  no          - - -    -     -
2540      RECT                  yes         x y -    -     w
2541      ARRAY1D               yes         x - -    y     w
2542      ARRAY2D               yes         x y -    z     w
2543      SHADOW1D              no          - - -    -     -
2544      SHADOW2D              no          - - -    -     -
2545      SHADOWRECT            no          - - -    -     -
2546      SHADOWCUBE            no          - - -    -     -
2547      SHADOWARRAY1D         no          - - -    -     -
2548      SHADOWARRAY2D         no          - - -    -     -
2549      BUFFER                yes         x - -    -     -
2550
2551      Table X.18:  Mappings of texel fetch coordinates to texel location.
2552
2553    Single-texel fetches do not support LOD clamping or any texture wrap mode,
2554    and require a mipmapped minification filter to access any level of detail
2555    other than the base level.  The results of the texel fetch are undefined:
2556
2557      * if the computed LOD is less than the texture's base level (level_base)
2558        or greater than the maximum level (level_max),
2559
2560      * if the computed LOD is not the texture's base level and the texture's
2561        minification filter is NEAREST or LINEAR,
2562
2563      * if the layer specified for array textures is negative or greater than
2564        the number of layers in the array texture,
2565
2566      * if the (i,j,k) coordinates refer to a texel outside the defined
2567        extents (including any border) of the specified LOD, where
2568
2569         i < -b_s, j < -b_s, k < -b_s,
2570         i >= w_s - b_s, j >= h_s - b_s, or k >= d_s - b_s,
2571
2572        where the size parameters (w_s, h_s, d_s, and b_s) refer to the width,
2573        height, depth, and border size of the image, as in equations 3.15,
2574        3.16, and 3.17, or
2575
2576      * if the texture being accessed is not complete (or cube complete for
2577        cubemaps).
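
    Two of the conditions above, the level-of-detail rules and the texel
    bounds test, can be sketched in C as follows.  The function names and
    parameter layout are hypothetical; the layer and completeness checks are
    deliberately left out.  For 1D or 2D images, the caller is assumed to
    pass 1 for the unused size parameters and 0 for the unused coordinates.

```c
#include <stdbool.h>

/* LOD check for a single-texel fetch.  coord_w is the w component of
 * <coord>; the accessed level is level_base + coord_w. */
bool fetch_lod_defined(int level_base, int level_max,
                       bool mipmapped_min_filter, int coord_w)
{
    int lod = level_base + coord_w;

    if (lod < level_base || lod > level_max) {
        return false;                   /* outside [level_base, level_max] */
    }
    if (lod != level_base && !mipmapped_min_filter) {
        return false;                   /* NEAREST or LINEAR: base level only */
    }
    return true;
}

/* Bounds check for (i,j,k) against an image of size w_s x h_s x d_s with
 * border width b_s, following the inequalities in the list above. */
bool fetch_texel_in_bounds(int w_s, int h_s, int d_s, int b_s,
                           int i, int j, int k)
{
    if (i < -b_s || j < -b_s || k < -b_s) {
        return false;
    }
    if (i >= w_s - b_s || j >= h_s - b_s || k >= d_s - b_s) {
        return false;
    }
    return true;
}
```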
2578
2579
2580    Section 2.X.5, Program Flow Control
2581
2582    In addition to basic arithmetic, logical, and texture instructions, a
2583    number of flow control instructions are provided, which are described in
2584    detail in Section 2.X.8.  Programs can contain several types of
2585    instruction blocks:  IF/ELSE/ENDIF blocks, REP/ENDREP blocks, and
2586    subroutine blocks.  IF/ELSE/ENDIF blocks are a set of instructions
2587    beginning with an "IF" instruction, ending with an "ENDIF" instruction,
2588    and possibly containing an optional "ELSE" instruction.  REP/ENDREP blocks
2589    are a set of instructions beginning with a "REP" instruction and ending
2590    with an "ENDREP" instruction.  Subroutine blocks begin with an instruction
2591    label identifying the name of the subroutine and end just before the
2592    next instruction label or the end of the program.  Examples include the
2593    following:
2594
2595        MOVC CC, R0;
2596        IF GT.x;
2597          MOV R0, R1;     # executes if R0.x > 0
2598        ELSE;
2599          MOV R0, R2;     # executes if R0.x <= 0
2600        ENDIF;
2601
2602        REP repCount;
2603        ADD R0, R0, R1;
2604        ENDREP;
2605
2606      square:             # subroutine to compute R0^2
2607        MUL R0, R0, R0;
2608        RET;
2609      main:
2610        MOV R0, 9.0;
2611        CAL square;       # compute 9.0^2 in R0
2612
2613    IF/ELSE/ENDIF and REP/ENDREP blocks may be nested inside each other, and
2614    inside subroutines.  In all cases, each instruction block must be
2615    terminated with the appropriate instruction (ENDIF for IF, ENDREP for
2616    REP).  Nested instruction blocks must be wholly contained within a block
2617    -- if a REP instruction is found between an IF and ELSE instruction, the
2618    corresponding ENDREP must also be present between the IF and ELSE.
2619    Subroutines may not be nested inside IF/ELSE/ENDIF or REP/ENDREP blocks,
2620    or inside other subroutines.  A program will fail to load if any
2621    instruction block is terminated by an incorrect instruction, is not
2622    terminated before the block containing it, or contains an instruction
2623    label.
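
    The block-structure rules above amount to a stack discipline.  The
    following C sketch checks them over a simplified instruction stream; it
    is not the loader's actual algorithm.  The one-character encoding ('I'
    for IF, 'E' for ELSE, 'F' for ENDIF, 'R' for REP, 'P' for ENDREP, 'L'
    for an instruction label, anything else for an ordinary instruction),
    the function name, and the capacity constant are all assumptions made
    for illustration.  Rejecting a second ELSE in one IF block is likewise
    an assumption.

```c
#include <stdbool.h>
#include <stddef.h>

enum { MAX_OPEN_BLOCKS = 64 };   /* hypothetical capacity for this sketch */

typedef enum { OPEN_IF, OPEN_ELSE, OPEN_REP } OpenBlock;

/* Returns true if every block is closed by the matching instruction,
 * nested blocks are wholly contained, and no label appears in a block. */
bool blocks_well_formed(const char *ops)
{
    OpenBlock open[MAX_OPEN_BLOCKS];
    size_t depth = 0;

    for (; *ops != '\0'; ops++) {
        switch (*ops) {
        case 'I':
        case 'R':
            if (depth == MAX_OPEN_BLOCKS) return false;
            open[depth++] = (*ops == 'I') ? OPEN_IF : OPEN_REP;
            break;
        case 'E':   /* ELSE must sit directly inside an IF with no ELSE yet */
            if (depth == 0 || open[depth - 1] != OPEN_IF) return false;
            open[depth - 1] = OPEN_ELSE;
            break;
        case 'F':   /* ENDIF closes an IF, with or without an ELSE */
            if (depth == 0 || open[depth - 1] == OPEN_REP) return false;
            depth--;
            break;
        case 'P':   /* ENDREP closes a REP */
            if (depth == 0 || open[depth - 1] != OPEN_REP) return false;
            depth--;
            break;
        case 'L':   /* labels may not appear inside any open block */
            if (depth != 0) return false;
            break;
        default:    /* ordinary instruction */
            break;
        }
    }
    return depth == 0;   /* every block must have been terminated */
}
```

    With this encoding, "IRPEF" (a REP block closed before the ELSE) is
    accepted, while "IREPF" (ENDREP after the ELSE) is rejected.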
2624
2625    IF/ELSE/ENDIF blocks evaluate a condition to determine which instructions
2626    to execute.  If the condition is true, all instructions between the IF and
2627    ELSE are executed.  If the condition is false, all instructions between
2628    the ELSE and ENDIF are executed.  The ELSE instruction is optional.  If
2629    the ELSE is omitted, all instructions between the IF and ENDIF are
2630    executed if the condition is true, or skipped if the condition is false.
2631    A limited amount of nesting is supported -- a program will fail to load if
2632    an IF instruction is nested inside MAX_PROGRAM_IF_DEPTH_NV or more
2633    IF/ELSE/ENDIF blocks.
2634
2635    REP/ENDREP blocks are used to execute a sequence of instructions multiple
2636    times.  The REP instruction includes an optional scalar operand to specify
2637    a loop count indicating the number of times the block of instructions
2638    should be repeated.  If the loop count is omitted, the contents of a
2639    REP/ENDREP block will be repeated indefinitely until the loop is
2640    explicitly terminated.  A limited amount of nesting is supported -- a
2641    program will fail to load if a REP instruction is nested inside
2642    MAX_PROGRAM_LOOP_DEPTH_NV or more REP/ENDREP blocks.
2643
2644    Within a REP/ENDREP block, the CONT instruction can be used to terminate
2645    the current iteration of the loop by effectively jumping to the ENDREP
2646    instruction.  The BRK instruction can be used to terminate the entire loop
2647    by effectively jumping to the instruction immediately following the ENDREP
2648    instruction.  If CONT and BRK instructions are found inside multiply
2649    nested REP/ENDREP blocks, they apply to the innermost block.  A program
2650    will fail to load if it includes a CONT or BRK instruction that is not
2651    contained inside a REP/ENDREP block.
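
    As a rough C analogue, a counted REP/ENDREP block behaves like a loop in
    which CONT corresponds to "continue" and BRK to "break".  The sketch
    below is only an analogy under stated assumptions:  the function and
    parameter names are hypothetical, the CONT and BRK conditions are
    modeled as simple iteration-index tests, and the behavior for a
    non-positive loop count (the body is skipped here) is assumed rather
    than taken from the text above.

```c
/* C analogue of:
 *
 *   REP count;
 *     CONT (when the iteration index equals skip_at);
 *     BRK  (when the iteration index equals stop_at);
 *     ADD  acc, acc, 1;
 *   ENDREP;
 *
 * Pass -1 for skip_at or stop_at to disable that test. */
int rep_accumulate(int count, int skip_at, int stop_at)
{
    int acc = 0;

    for (int remaining = count, iter = 0; remaining > 0; remaining--, iter++) {
        if (iter == skip_at) {
            continue;   /* CONT: jump to ENDREP, which counts the iteration */
        }
        if (iter == stop_at) {
            break;      /* BRK: resume after ENDREP */
        }
        acc += 1;
    }
    return acc;
}
```

    As in the instruction set, "continue" and "break" in nested C loops
    apply to the innermost enclosing loop.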
2652
2653    A REP/ENDREP block without a specified loop count can result in an
2654    infinite loop.  To prevent obvious infinite loops, a program will fail to
2655    load if it contains a REP/ENDREP block that contains neither a BRK
2656    instruction at the current nesting level nor a RET instruction at any
2657    nesting level.
2658
2659    Subroutines are supported via the CAL and RET instructions.  A subroutine
2660    block is identified by an instruction label, which can be any identifier
2661    according to the <instLabel> grammar rule.  The CAL instruction identifies
2662    a subroutine name to call according to the <instTarget> grammar rule.
2663    Instruction labels used in CAL instructions do not need to be defined in
2664    the program text that precedes the instruction, but a program will fail to
2665    load if it includes a CAL instruction that references an instruction label
2666    that is not defined anywhere in the program.  When a CAL instruction is
2667    executed, it transfers control to the instruction immediately following
2668    the specified instruction label.  Subsequent instructions in that
2669    subroutine are executed until a RET instruction is executed, or until
2670    program execution reaches another instruction label or the end of the
2671    program text.  After the subroutine finishes, execution continues with the
2672    instruction immediately following the CAL instruction.  When a RET
2673    instruction is issued, it will break out of any IF/ELSE/ENDIF or
2674    REP/ENDREP blocks that contain it.
2675
2676    Subroutines may call other subroutines before completing, up to an
2677    implementation-dependent maximum depth of MAX_PROGRAM_CALL_DEPTH_NV calls.
2678    Subroutines may call any subroutine in the program, including themselves,
2679    as long as the call depth limit is obeyed.  Issuing a CAL instruction
2680    while MAX_PROGRAM_CALL_DEPTH_NV subroutines have not completed has
2681    undefined results, including possible program termination.
2682
2683    Several flow control instructions include condition code tests.  The IF
2684    instruction requires a condition test to determine what instructions are
2685    executed.  The CONT, BRK, CAL, and RET instructions have an optional
2686    condition code test; if the test fails, the instructions are not executed.
2687    Condition code tests are specified by the <ccTest> grammar rule.  The test
2688    is evaluated like the condition code write mask (section 2.X.4.3), and
2689    passes if and only if any of the four components passes.
2690
2691    If an instruction label named "main" is specified, GPU program execution
2692    begins with the instruction immediately following that label.  Otherwise,
2693    it begins with the first instruction of the program.  Instructions are
2694    executed in sequence until either a RET instruction is issued in the main
2695    subroutine or the end of the program text is reached.
2696
2697
2698    Section 2.X.6, Program Options
2699
2700    Programs may specify a number of options to indicate that one or more
2701    extended language features are used by the program.  All program options
2702    used by the program must be declared at the beginning of the program
2703    string.  Each program option specified in a program string will modify the
2704    syntactic or semantic rules used to interpret the program and the execution
2705    environment used to execute the program.  Features in program options
2706    not declared by the program are ignored, even if the option is otherwise
2707    supported by the GL.  Each option declaration consists of two tokens: the
2708    keyword "OPTION" and an identifier.
2709
2710    The set of available options depends on the program type, and is
2711    enumerated in the specifications for each program type.  Some program
2712    types may not provide any options.
2713
2714
2715    Section 2.X.7, Program Declarations
2716
2717    Programs may include a number of declaration statements to specify
2718    characteristics of the program.  Each declaration statement is followed by
2719    one or more arguments, separated by commas.
2720
2721    The set of available declarations depends on the program type, and is
2722    enumerated in the specifications for each program type.  Some program
2723    types may not provide declarations.
2724
2725
2726    Section 2.X.8, Program Instruction Set
2727
2728    The following sections enumerate the set of instructions supported for GPU
2729    programs.
2730
2731    Some instructions allow the use of one of the three basic data type
2732    modifiers (floating point, signed integer, and unsigned integer).  Unless
2733    otherwise mentioned:
2734
2735      * the result and all of the operands will be interpreted according to
2736        the specified data type, and
2737
2738      * if no data type modifier is specified, the instruction will operate as
2739        though a floating-point modifier ("F") were specified.
2740
2741    Some instructions will override one or both of these rules.
2742
2743
2744    Section 2.X.8.Z, ABS:  Absolute Value
2745
2746    The ABS instruction performs a component-wise absolute value operation on
2747    the single operand to yield a result vector.
2748
2749      tmp = VectorLoad(op0);
2750      result.x = abs(tmp.x);
2751      result.y = abs(tmp.y);
2752      result.z = abs(tmp.z);
2753      result.w = abs(tmp.w);
2754
2755    ABS supports all three data type modifiers.  Taking the absolute value of
2756    an unsigned integer is not a useful operation, but is not illegal.
2757
2758
2759    Section 2.X.8.Z, ADD:  Add
2760
2761    The ADD instruction performs a component-wise add of the two operands to
2762    yield a result vector.
2763
2764      tmp0 = VectorLoad(op0);
2765      tmp1 = VectorLoad(op1);
2766      result.x = tmp0.x + tmp1.x;
2767      result.y = tmp0.y + tmp1.y;
2768      result.z = tmp0.z + tmp1.z;
2769      result.w = tmp0.w + tmp1.w;
2770
2771    ADD supports all three data type modifiers.
2772
2773
2774    Section 2.X.8.Z, AND:  Bitwise AND
2775
2776    The AND instruction performs a bitwise AND operation on the components of
2777    the two source vectors to yield a result vector.
2778
2779      tmp0 = VectorLoad(op0);
2780      tmp1 = VectorLoad(op1);
2781      result.x = tmp0.x & tmp1.x;
2782      result.y = tmp0.y & tmp1.y;
2783      result.z = tmp0.z & tmp1.z;
2784      result.w = tmp0.w & tmp1.w;
2785
2786    AND supports only signed and unsigned integer data type modifiers.  If no
2787    type modifier is specified, both operands and the result are treated as
2788    signed integers.
2789
2790
2791    Section 2.X.8.Z, BRK:  Break out of Loop Instruction
2792
2793    The BRK instruction conditionally transfers control to the instruction
2794    immediately following the next ENDREP instruction.  A BRK instruction has
2795    no effect if the condition code test evaluates to FALSE.
2796
2797    The following pseudocode describes the operation of the instruction:
2798
2799      if (TestCC(cc.c***) || TestCC(cc.*c**) ||
2800          TestCC(cc.**c*) || TestCC(cc.***c)) {
2801        continue execution at instruction following the next ENDREP;
2802      }
2803
2804
2805    Section 2.X.8.Z, CAL:  Subroutine Call
2806
2807    The CAL instruction conditionally transfers control to the instruction
2808    following the label specified in the instruction.  It also pushes a
2809    reference to the instruction immediately following the CAL instruction
2810    onto the call stack, where execution will continue after executing the
2811    matching RET instruction.  The following pseudocode describes the
2812    operation of the instruction:
2813
2814      if (TestCC(cc.c***) || TestCC(cc.*c**) ||
2815          TestCC(cc.**c*) || TestCC(cc.***c)) {
2816        if (callStackDepth >= MAX_PROGRAM_CALL_DEPTH_NV) {
2817          // undefined results
2818        } else {
2819          callStack[callStackDepth] = nextInstruction;
2820          callStackDepth++;
2821        }
2822        // continue execution at instruction following <instTarget>
2823      } else {
2824        // do nothing
2825      }
2826
2827    In the pseudocode, <instTarget> is the label specified in the instruction
2828    matching the <branchLabel> grammar rule, <callStackDepth> is the current
2829    depth of the call stack, <callStack> is an array holding the call stack,
2830    and <nextInstruction> is a reference to the instruction immediately
2831    following the CAL instruction in the program string.
2832
2833    If the call stack overflows, the results of the CAL instruction are
2834    undefined, and can result in immediate program termination.
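
    The call stack bookkeeping in the pseudocode above can be modeled in C
    roughly as follows.  This is a sketch, not the implementation:  the
    type and function names are hypothetical, CALL_DEPTH_LIMIT is a small
    stand-in for MAX_PROGRAM_CALL_DEPTH_NV, and an overflowing CAL is
    reported as a failure here because the specification leaves its result
    undefined.

```c
#include <stdbool.h>

enum { CALL_DEPTH_LIMIT = 4 };   /* stand-in for MAX_PROGRAM_CALL_DEPTH_NV */

typedef struct {
    int return_addr[CALL_DEPTH_LIMIT];   /* instruction to resume at */
    int depth;                           /* number of active calls */
} CallStack;

/* CAL: record where to resume.  Returns false on overflow. */
bool call_push(CallStack *cs, int next_instruction)
{
    if (cs->depth >= CALL_DEPTH_LIMIT) {
        return false;                    /* undefined results in the spec */
    }
    cs->return_addr[cs->depth++] = next_instruction;
    return true;
}

/* RET: fetch the resume point.  Returns false when the stack is empty,
 * which models a RET issued in the main subroutine. */
bool call_pop(CallStack *cs, int *resume_at)
{
    if (cs->depth == 0) {
        return false;
    }
    *resume_at = cs->return_addr[--cs->depth];
    return true;
}

/* Usage sketch: attempt `calls` nested CALs, then unwind with RETs.
 * Returns the number of CALs that fit, or -1 if the return addresses
 * did not come back in last-in, first-out order. */
int nested_calls_accepted(int calls)
{
    CallStack cs = { {0}, 0 };
    int accepted = 0;

    for (int i = 0; i < calls; i++) {
        if (!call_push(&cs, 100 + i)) break;
        accepted++;
    }
    for (int i = accepted - 1; i >= 0; i--) {
        int resume_at;
        if (!call_pop(&cs, &resume_at) || resume_at != 100 + i) return -1;
    }
    return accepted;
}
```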
2835
2836    An instruction label signifies the beginning of a new subroutine.
2837    Subroutines may not nest or overlap.  If a CAL instruction is executed and
2838    subsequent program execution reaches an instruction label before a
2839    corresponding RET instruction is executed, the subroutine call returns
2840    immediately, as though an unconditional RET instruction were inserted
2841    immediately before the instruction label.
2842
2843    (Note:  On previous vertex program extensions -- NV_vertex_program2 and
2844    NV_vertex_program3 -- instruction labels were also used as targets for
2845    branch (BRA) instructions.  This unstructured branching functionality has
2846    been replaced with the structured branching constructs found in this
2847    instruction set.)
2848
2849
2850    Section 2.X.8.Z, CEIL:  Ceiling
2851
2852    The CEIL instruction loads a single vector operand and performs a
2853    component-wise ceiling operation to generate a result vector.
2854
2855      tmp = VectorLoad(op0);
2856      iresult.x = ceil(tmp.x);
2857      iresult.y = ceil(tmp.y);
2858      iresult.z = ceil(tmp.z);
2859      iresult.w = ceil(tmp.w);
2860
2861    The ceiling operation returns the nearest integer greater than or equal to
2862    the operand.  For example ceil(-1.7) = -1.0, ceil(+1.0) = +1.0, and
2863    ceil(+3.7) = +4.0.
2864
2865    CEIL supports all three data type modifiers.  The single operand is always
2866    treated as a floating-point vector, but the result is written as a
2867    floating-point value, a signed integer, or an unsigned integer, as
2868    specified by the data type modifier.  If a value is not exactly
2869    representable using the data type of the result (e.g., an overflow or
2870    writing a negative value to an unsigned integer), the result is undefined.
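
    The representability rule above can be illustrated in C.  The sketch
    below assumes 32-bit integer results and uses hypothetical function
    names.  The "fits" predicates report whether the ceiling of a value is
    exactly representable in the target type; the conversion functions may
    only be called when the matching predicate holds.  Treating the ceiling
    of values in (-1, 0] as unsigned zero is an assumption of this sketch.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if ceil(x) is representable as a signed 32-bit integer. */
bool ceil_fits_int32(float x)
{
    return x >= -2147483648.0f && x < 2147483648.0f;   /* false for NaN */
}

/* True if ceil(x) is representable as an unsigned 32-bit integer. */
bool ceil_fits_uint32(float x)
{
    return x > -1.0f && x < 4294967296.0f;              /* false for NaN */
}

/* Ceiling written as a signed integer; requires ceil_fits_int32(x). */
int32_t ceil_int32(float x)
{
    int32_t t = (int32_t)x;             /* truncates toward zero */
    return ((float)t < x) ? t + 1 : t;  /* bump positive fractions up */
}

/* Ceiling written as an unsigned integer; requires ceil_fits_uint32(x). */
uint32_t ceil_uint32(float x)
{
    if (x <= 0.0f) {
        return 0;                       /* ceiling of (-1, 0] is zero */
    }
    uint32_t t = (uint32_t)x;
    return ((float)t < x) ? t + 1 : t;
}
```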
2871
2872
2873    Section 2.X.8.Z, CMP:  Compare
2874
2875    The CMP instruction performs a component-wise comparison of the first
2876    operand against zero, and copies the values of the second or third
2877    operands based on the results of the compare.
2878
2879      tmp0 = VectorLoad(op0);
2880      tmp1 = VectorLoad(op1);
2881      tmp2 = VectorLoad(op2);
2882      result.x = (tmp0.x < 0) ? tmp1.x : tmp2.x;
2883      result.y = (tmp0.y < 0) ? tmp1.y : tmp2.y;
2884      result.z = (tmp0.z < 0) ? tmp1.z : tmp2.z;
2885      result.w = (tmp0.w < 0) ? tmp1.w : tmp2.w;
2886
2887    CMP supports all three data type modifiers.  CMP with an unsigned data
2888    type modifier is not a useful operation, but is not illegal.
2889
2890
2891    Section 2.X.8.Z, CONT:  Continue with Next Loop Iteration
2892
2893    The CONT instruction conditionally transfers control to the next ENDREP
2894    instruction.  A CONT instruction has no effect if the condition code test
2895    evaluates to FALSE.
2896
2897    The following pseudocode describes the operation of the instruction:
2898
2899      if (TestCC(cc.c***) || TestCC(cc.*c**) ||
2900          TestCC(cc.**c*) || TestCC(cc.***c)) {
2901        continue execution at the next ENDREP;
2902      }
2903
2904
2905    Section 2.X.8.Z, COS:  Cosine with Reduction to [-PI,PI]
2906
2907    The COS instruction approximates the trigonometric cosine of the angle
2908    specified by the scalar operand and replicates it to all four components
2909    of the result vector.  The angle is specified in radians and does not have
2910    to be in the range [-PI,PI].
2911
2912      tmp = ScalarLoad(op0);
2913      result.x = ApproxCosine(tmp);
2914      result.y = ApproxCosine(tmp);
2915      result.z = ApproxCosine(tmp);
2916      result.w = ApproxCosine(tmp);
2917
2918    COS supports only floating-point data type modifiers.
2919
2920
2921    Section 2.X.8.Z, DDX:  Partial Derivative Relative to X
2922
2923    The DDX instruction computes approximate partial derivatives of a vector
2924    operand with respect to the X window coordinate, and is only available to
2925    fragment programs.  See the NV_fragment_program4 specification for more
2926    details.
2927
2928
2929    Section 2.X.8.Z, DDY:  Partial Derivative Relative to Y
2930
2931    The DDY instruction computes approximate partial derivatives of a vector
2932    operand with respect to the Y window coordinate, and is only available to
2933    fragment programs.  See the NV_fragment_program4 specification for more
2934    details.
2935
2936
2937    Section 2.X.8.Z, DIV:  Divide Vector Components by Scalar
2938
2939    The DIV instruction performs a component-wise divide of the first vector
2940    operand by the second scalar operand to produce a 4-component result
2941    vector.
2942
2943      tmp0 = VectorLoad(op0);
2944      tmp1 = ScalarLoad(op1);
2945      result.x = tmp0.x / tmp1;
2946      result.y = tmp0.y / tmp1;
2947      result.z = tmp0.z / tmp1;
2948      result.w = tmp0.w / tmp1;
2949
2950    DIV supports all three data type modifiers.  For floating-point division,
2951    this instruction is not guaranteed to produce results identical to a
2952    RCP/MUL instruction sequence.
2953
2954    The results of a signed or unsigned integer division by zero are
2955    undefined.
2956
2957
2958    Section 2.X.8.Z, DP2:  2-Component Dot Product
2959
2960    The DP2 instruction computes a two-component dot product of the two
2961    operands (using the first two components) and replicates the dot product
2962    to all four components of the result vector.
2963
2964      tmp0 = VectorLoad(op0);
2965      tmp1 = VectorLoad(op1);
2966      dot = (tmp0.x * tmp1.x) + (tmp0.y * tmp1.y);
2967      result.x = dot;
2968      result.y = dot;
2969      result.z = dot;
2970      result.w = dot;
2971
2972    DP2 supports only floating-point data type modifiers.
2973
2974
2975    Section 2.X.8.Z, DP2A:  2-Component Dot Product with Scalar Add
2976
2977    The DP2A instruction computes a two-component dot product of the two
2978    operands (using the first two components), adds the x component of the
2979    third operand, and replicates the result to all four components of the
2980    result vector.
2981
2982      tmp0 = VectorLoad(op0);
2983      tmp1 = VectorLoad(op1);
2984      tmp2 = VectorLoad(op2);
2985      dot = (tmp0.x * tmp1.x) + (tmp0.y * tmp1.y) + tmp2.x;
2986      result.x = dot;
2987      result.y = dot;
2988      result.z = dot;
2989      result.w = dot;
2990
2991    DP2A supports only floating-point data type modifiers.
2992
2993
2994    Section 2.X.8.Z, DP3:  3-Component Dot Product
2995
2996    The DP3 instruction computes a three-component dot product of the two
2997    operands (using the x, y, and z components) and replicates the dot product
2998    to all four components of the result vector.
2999
3000      tmp0 = VectorLoad(op0);
3001      tmp1 = VectorLoad(op1);
3002      dot = (tmp0.x * tmp1.x) + (tmp0.y * tmp1.y) +
3003            (tmp0.z * tmp1.z);
3004      result.x = dot;
3005      result.y = dot;
3006      result.z = dot;
3007      result.w = dot;
3008
3009    DP3 supports only floating-point data type modifiers.
3010
3011
3012    Section 2.X.8.Z, DP4:  4-Component Dot Product
3013
3014    The DP4 instruction computes a four-component dot product of the two
3015    operands and replicates the dot product to all four components of the
3016    result vector.
3017
3018      tmp0 = VectorLoad(op0);
3019      tmp1 = VectorLoad(op1);
3020      dot = (tmp0.x * tmp1.x) + (tmp0.y * tmp1.y) +
3021            (tmp0.z * tmp1.z) + (tmp0.w * tmp1.w);
3022      result.x = dot;
3023      result.y = dot;
3024      result.z = dot;
3025      result.w = dot;
3026
3027    DP4 supports only floating-point data type modifiers.
3028
3029
3030    Section 2.X.8.Z, DPH:  Homogeneous Dot Product
3031
3032    The DPH instruction computes a three-component dot product of the two
3033    operands (using the x, y, and z components), adds the w component of the
3034    second operand, and replicates the sum to all four components of the
3035    result vector.  This is equivalent to a four-component dot product where
3036    the w component of the first operand is forced to 1.0.
3037
3038      tmp0 = VectorLoad(op0);
3039      tmp1 = VectorLoad(op1);
3040      dot = (tmp0.x * tmp1.x) + (tmp0.y * tmp1.y) +
3041            (tmp0.z * tmp1.z) + tmp1.w;
3042      result.x = dot;
3043      result.y = dot;
3044      result.z = dot;
3045      result.w = dot;
3046
3047    DPH supports only floating-point data type modifiers.
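
    The stated equivalence between DPH and a four-component dot product can
    be checked with a short C sketch.  The Vec4 type and the function names
    are hypothetical and exist only for this illustration; operand loading,
    swizzling, and result masking are not modeled.

```c
typedef struct { float x, y, z, w; } Vec4;

/* Convenience constructor for the examples. */
Vec4 vec4(float x, float y, float z, float w)
{
    Vec4 v = { x, y, z, w };
    return v;
}

/* DP4: four-component dot product. */
float dp4(Vec4 a, Vec4 b)
{
    return (a.x * b.x) + (a.y * b.y) + (a.z * b.z) + (a.w * b.w);
}

/* DPH: three-component dot product plus the w component of the second
 * operand; the w component of the first operand is ignored. */
float dph(Vec4 a, Vec4 b)
{
    return (a.x * b.x) + (a.y * b.y) + (a.z * b.z) + b.w;
}
```

    For example, dph(vec4(1, 2, 3, 9), vec4(4, 5, 6, 7)) yields the same
    value as dp4(vec4(1, 2, 3, 1), vec4(4, 5, 6, 7)).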
3048
3049
3050    Section 2.X.8.Z, DST:  Distance Vector
3051
3052    The DST instruction computes a distance vector from two specially-
3053    formatted operands.  The first operand should be of the form [NA, d^2,
3054    d^2, NA] and the second operand should be of the form [NA, 1/d, NA, 1/d],
3055    where NA values are not relevant to the calculation and d is a vector
3056    length.  If both vectors satisfy these conditions, the result vector will
3057    be of the form [1.0, d, d^2, 1/d].
3058
3059    The exact behavior is specified in the following pseudo-code:
3060
3061      tmp0 = VectorLoad(op0);
3062      tmp1 = VectorLoad(op1);
3063      result.x = 1.0;
3064      result.y = tmp0.y * tmp1.y;
3065      result.z = tmp0.z;
3066      result.w = tmp1.w;
3067
3068    Given an arbitrary vector, d^2 can be obtained using the DP3 instruction
3069    (using the same vector for both operands) and 1/d can be obtained from d^2
3070    using the RSQ instruction.
3071
3072    This distance vector is useful for per-vertex light attenuation
3073    calculations:  a DP3 operation using the distance vector and an
3074    attenuation constants vector as operands will yield the attenuation
3075    factor.
3076
3077    DST supports only floating-point data type modifiers.
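
    The attenuation use case described above can be sketched in C.  The
    names below are hypothetical, the "NA" operand slots are filled with
    zero purely for illustration, and the coefficients (k0, k1, k2) stand
    for whatever attenuation constants vector the application supplies.

```c
typedef struct { float x, y, z, w; } DistVec;

/* DST applied to operands [NA, d2, d2, NA] and [NA, inv_d, NA, inv_d],
 * where d2 is the squared distance and inv_d is 1/d. */
DistVec dst_vector(float d2, float inv_d)
{
    DistVec op0 = { 0.0f, d2, d2, 0.0f };
    DistVec op1 = { 0.0f, inv_d, 0.0f, inv_d };
    DistVec result = { 1.0f, op0.y * op1.y, op0.z, op1.w };
    return result;                      /* [1.0, d, d^2, 1/d] */
}

/* DP3 of the distance vector with the constants (k0, k1, k2), giving
 * k0 + k1*d + k2*d^2. */
float dst_attenuation(DistVec dist, float k0, float k1, float k2)
{
    return (dist.x * k0) + (dist.y * k1) + (dist.z * k2);
}
```

    With d = 4 (so d2 = 16 and inv_d = 0.25) and constants (1, 0.5, 0.25),
    the DP3 evaluates to 1 + 2 + 4 = 7.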
3078
3079
3080    Section 2.X.8.Z, ELSE:  Start of If Test Else Block
3081
3082    The ELSE instruction signifies the end of the "execute if true" portion of
3083    an IF/ELSE/ENDIF block and the beginning of the "execute if false"
3084    portion.
3085
3086    If the condition evaluated at the IF statement was TRUE, when a program
3087    reaches the ELSE statement, it has completed the entire "execute if true"
3088    portion of the IF/ELSE/ENDIF block.  Execution will continue at the
3089    corresponding ENDIF instruction.
3090
3091    If the condition evaluated at the IF statement was FALSE, program
3092    execution skips over the entire "execute if true" portion of the
3093    IF/ELSE/ENDIF block, including the ELSE instruction.
3094
3095
3096    Section 2.X.8.Z, EMIT:  Emit Vertex
3097
3098    The EMIT instruction emits a new vertex to be added to the current output
3099    primitive generated by a geometry program, and is only available to
3100    geometry programs.  See the NV_geometry_program4 specification for more
3101    details.
3102
3103
3104    Section 2.X.8.Z, ENDIF:  End of If Test Block
3105
3106    The ENDIF instruction signifies the end of an IF/ELSE/ENDIF block.  It has
3107    no other effect on program execution.
3108
3109
3110    Section 2.X.8.Z, ENDPRIM:  End of Primitive
3111
3112    A geometry program can emit multiple primitives in a single invocation.
3113    The ENDPRIM instruction is used in a geometry program to signify the end
3114    of the current primitive and the beginning of a new primitive of the same
3115    type.  It is only available to geometry programs.  See the
3116    NV_geometry_program4 specification for more details.
3117
3118
3119    Section 2.X.8.Z, ENDREP:  End of Repeat Block
3120
3121    The ENDREP instruction specifies the end of a REP block.
3122
3123    When used in conjunction with a REP instruction with a loop count,
3124    ENDREP decrements the loop counter.  If the decremented loop counter is
3125    greater than zero, ENDREP transfers control to the instruction immediately
3126    after the corresponding REP instruction.  If the loop counter is less than
3127    or equal to zero, execution continues at the instruction following the
3128    ENDREP instruction.  When used in conjunction with a REP instruction
3129    without loop count, ENDREP always transfers control to the instruction
3130    immediately after the REP instruction.
3131
3132      if (REP instruction includes a loop count) {
3133        LoopCount--;
3134        if (LoopCount > 0) {
3135          continue execution at instruction following corresponding REP
3136            instruction;
3137        }
3138      } else {
3139        continue execution at instruction following corresponding REP
3140          instruction;
3141      }
3142
3143
3144    Section 2.X.8.Z, EX2:  Exponential Base 2
3145
3146    The EX2 instruction approximates 2 raised to the power of the scalar
3147    operand and replicates the approximation to all four components of the
3148    result vector.
3149
3150      tmp = ScalarLoad(op0);
3151      result.x = Approx2ToX(tmp);
3152      result.y = Approx2ToX(tmp);
3153      result.z = Approx2ToX(tmp);
3154      result.w = Approx2ToX(tmp);
3155
3156    EX2 supports only floating-point data type modifiers.
3157
3158
3159    Section 2.X.8.Z, FLR:  Floor
3160
3161    The FLR instruction loads a single vector operand and performs a
3162    component-wise floor operation to generate a result vector.
3163
3164      tmp = VectorLoad(op0);
3165      result.x = floor(tmp.x);
3166      result.y = floor(tmp.y);
3167      result.z = floor(tmp.z);
3168      result.w = floor(tmp.w);
3169
3170    The floor operation returns the nearest integer less than or equal to the
3171    operand.  For example, floor(-1.7) = -2.0, floor(+1.0) = +1.0, and floor(+3.7)
3172    = +3.0.
3173
3174    FLR supports all three data type modifiers.  The single operand is always
3175    treated as a floating-point value, but the result is written as a
3176    floating-point value, a signed integer, or an unsigned integer, as
3177    specified by the data type modifier.  If a value is not exactly
3178    representable using the data type of the result (e.g., an overflow or
3179    writing a negative value to an unsigned integer), the result is undefined.
3180
3181
3182    Section 2.X.8.Z, FRC:  Fraction
3183
3184    The FRC instruction extracts the fractional portion of each component of
3185    the operand to generate a result vector.  The fractional portion of a
3186    component is defined as the result after subtracting off the floor of the
3187    component (see FLR), and is always in the range [0.0, 1.0).
3188
3189    For negative values, the fractional portion is NOT the number written to
3190    the right of the decimal point -- the fractional portion of -1.7 is not
3191    0.7 -- it is 0.3.  0.3 is produced by subtracting the floor of -1.7 (-2.0)
3192    from -1.7.
3193
3194      tmp = VectorLoad(op0);
3195      result.x = fraction(tmp.x);
3196      result.y = fraction(tmp.y);
3197      result.z = fraction(tmp.z);
3198      result.w = fraction(tmp.w);
3199
3200    FRC supports only floating-point data type modifiers.
3201
3202
3203    Section 2.X.8.Z, I2F:  Integer to Float
3204
3205    The I2F instruction converts the components of an integer vector operand
3206    to floating-point to produce a floating-point result vector.
3207
3208      tmp = VectorLoad(op0);
3209      result.x = (float) tmp.x;
3210      result.y = (float) tmp.y;
3211      result.z = (float) tmp.z;
3212      result.w = (float) tmp.w;
3213
3214    I2F supports only signed and unsigned integer data type modifiers.  The
3215    single operand is interpreted according to the data type modifier.  If no
3216    data type modifier is specified, the operand is treated as a signed
3217    integer vector.  The result is always written as a float.
3218
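    The effect of the data type modifier on I2F can be illustrated with a
    small C sketch that converts the same 32 bits under both interpretations.
    The function names are assumptions made for this example.

```c
#include <stdint.h>
#include <string.h>

/* Signed interpretation: reinterpret the 32 bits as two's complement. */
static float i2f_signed(uint32_t bits)
{
    int32_t s;
    memcpy(&s, &bits, sizeof s);
    return (float)s;
}

/* Unsigned interpretation of the same 32 bits. */
static float i2f_unsigned(uint32_t bits)
{
    return (float)bits;
}
```

    For the bit pattern 0xFFFFFFFF, the signed conversion produces -1.0 and
    the unsigned conversion produces 4294967296.0 (the nearest
    single-precision value to 2^32 - 1 under round-to-nearest).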
3219
3220    Section 2.X.8.Z, IF:  Start of If Test Block
3221
3222    The IF instruction performs a condition code test to determine what
3223    instructions inside an IF/ELSE/ENDIF block are executed.  If the test
3224    passes, execution continues at the instruction immediately following the
3225    IF instruction.  If the test fails, IF transfers control to the
3226    instruction immediately following the corresponding ELSE instruction (if
3227    present) or the ENDIF instruction (if no ELSE is present).
3228
3229    Implementations may have a limited ability to nest IF blocks in any
3230    subroutine.  If the number of IF/ENDIF blocks nested inside each other is
3231    MAX_PROGRAM_IF_DEPTH_NV or higher, a program will fail to compile.
3232
3233      // Evaluate the condition.  If true, continue at the next instruction.
3234      // Otherwise, continue after the corresponding ELSE (or ENDIF).
3235      if (TestCC(cc.c***) || TestCC(cc.*c**) ||
3236          TestCC(cc.**c*) || TestCC(cc.***c)) {
3237        continue execution at the next instruction;
3238      } else if (IF block contains an ELSE statement) {
3239        continue execution at instruction following corresponding ELSE;
3240      } else {
3241        continue execution at instruction following corresponding ENDIF;
3242      }
3243
3244    (Note:  Unlike the NV_fragment_program2 extension, there is no run-time
3245    limit on the maximum overall depth of IF/ENDIF nesting.  As long as each
3246    individual subroutine of the program obeys the static nesting limits,
3247    there will be no run-time errors in the program.  With the
3248    NV_fragment_program2 extension, a program could terminate abnormally if it
3249    called a subroutine inside a very deeply nested set of IF/ENDIF blocks and
3250    the called subroutine also contained deeply nested IF/ENDIF blocks.  Such
3251    an error could occur even if neither subroutine exceeded static limits.)
3252
3253
3254    Section 2.X.8.Z, KIL:  Kill Fragment
3255
3256    The KIL instruction conditionally kills a fragment, and is only available
3257    to fragment programs.  See the NV_fragment_program4 specification for more
3258    details.
3259
3260
3261    Section 2.X.8.Z, LG2:  Logarithm Base 2
3262
3263    The LG2 instruction approximates the base 2 logarithm of the scalar
3264    operand and replicates it to all four components of the result vector.
3265
3266      tmp = ScalarLoad(op0);
3267      result.x = ApproxLog2(tmp);
3268      result.y = ApproxLog2(tmp);
3269      result.z = ApproxLog2(tmp);
3270      result.w = ApproxLog2(tmp);
3271
3272    If the scalar operand is zero or negative, the result is undefined.
3273
3274    LG2 supports only floating-point data type modifiers.
3275
3276
3277    Section 2.X.8.Z, LIT:  Compute Lighting Coefficients
3278
3279    The LIT instruction accelerates lighting computations by computing
3280    lighting coefficients for ambient, diffuse, and specular light
3281    contributions.  The "x" component of the single operand is assumed to hold
3282    a diffuse dot product (n dot VP_pli, as in the vertex lighting equations
3283    in Section 2.13.1).  The "y" component of the operand is assumed to hold a
3284    specular dot product (n dot h_i).  The "w" component of the operand is
3285    assumed to hold the specular exponent of the material (s_rm), and is
3286    clamped to the range (-128, +128) exclusive.
3287
3288    The "x" component of the result vector receives the value that should be
3289    multiplied by the ambient light/material product (always 1.0).  The "y"
3290    component of the result vector receives the value that should be
3291    multiplied by the diffuse light/material product (n dot VP_pli).  The "z"
3292    component of the result vector receives the value that should be
3293    multiplied by the specular light/material product (f_i * (n dot h_i) ^
3294    s_rm).  The "w" component of the result is the constant 1.0.
3295
3296    Negative diffuse and specular dot products are clamped to 0.0, as is done
3297    in the standard per-vertex lighting operations.  In addition, if the
3298    diffuse dot product is zero or negative, the specular coefficient is
3299    forced to zero.
3300
3301      tmp = VectorLoad(op0);
3302      if (tmp.x < 0) tmp.x = 0;
3303      if (tmp.y < 0) tmp.y = 0;
3304      if (tmp.w < -(128.0-epsilon)) tmp.w = -(128.0-epsilon);
3305      else if (tmp.w > 128.0-epsilon) tmp.w = 128.0-epsilon;
3306      result.x = 1.0;
3307      result.y = tmp.x;
3308      result.z = (tmp.x > 0) ? RoughApproxPower(tmp.y, tmp.w) : 0.0;
3309      result.w = 1.0;
3310
3311    Since 0^0 is defined to be 1, RoughApproxPower(0.0, 0.0) will produce 1.0.
3312
3313    LIT supports only floating-point data type modifiers.
3314
3315
3316    Section 2.X.8.Z, LRP:  Linear Interpolation
3317
3318    The LRP instruction performs a component-wise linear interpolation between
3319    the second and third operands using the first operand as the blend factor.
3320
3321      tmp0 = VectorLoad(op0);
3322      tmp1 = VectorLoad(op1);
3323      tmp2 = VectorLoad(op2);
3324      result.x = tmp0.x * tmp1.x + (1 - tmp0.x) * tmp2.x;
3325      result.y = tmp0.y * tmp1.y + (1 - tmp0.y) * tmp2.y;
3326      result.z = tmp0.z * tmp1.z + (1 - tmp0.z) * tmp2.z;
3327      result.w = tmp0.w * tmp1.w + (1 - tmp0.w) * tmp2.w;
3328
3329    LRP supports only floating-point data type modifiers.
3330
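    The operand order of LRP is easy to misread, so the following C sketch
    spells it out for a single component.  The function name lrp1() is an
    assumption for this example.  Note that a blend factor of 1.0 selects the
    second operand and a blend factor of 0.0 selects the third.

```c
/* One component of LRP: t is the blend factor (first operand), a and b are
   the corresponding components of the second and third operands. */
static float lrp1(float t, float a, float b)
{
    return t * a + (1.0f - t) * b;
}
```

    For example, lrp1(0.25, 10.0, 20.0) evaluates 0.25 * 10.0 + 0.75 * 20.0,
    which is 17.5.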
3331
3332    Section 2.X.8.Z, MAD:  Multiply and Add
3333
3334    The MAD instruction performs a component-wise multiply of the first two
3335    operands, and then does a component-wise add of the product to the third
3336    operand to yield a result vector.
3337
3338      tmp0 = VectorLoad(op0);
3339      tmp1 = VectorLoad(op1);
3340      tmp2 = VectorLoad(op2);
3341      result.x = tmp0.x * tmp1.x + tmp2.x;
3342      result.y = tmp0.y * tmp1.y + tmp2.y;
3343      result.z = tmp0.z * tmp1.z + tmp2.z;
3344      result.w = tmp0.w * tmp1.w + tmp2.w;
3345
3346    The multiplication and addition operations in this instruction are subject
3347    to the same rules as described for the MUL and ADD instructions.
3348
3349    MAD supports all three data type modifiers.
3350
3351
3352    Section 2.X.8.Z, MAX:  Maximum
3353
3354    The MAX instruction computes component-wise maximums of the values in the
3355    two operands to yield a result vector.
3356
3357      tmp0 = VectorLoad(op0);
3358      tmp1 = VectorLoad(op1);
3359      result.x = (tmp0.x > tmp1.x) ? tmp0.x : tmp1.x;
3360      result.y = (tmp0.y > tmp1.y) ? tmp0.y : tmp1.y;
3361      result.z = (tmp0.z > tmp1.z) ? tmp0.z : tmp1.z;
3362      result.w = (tmp0.w > tmp1.w) ? tmp0.w : tmp1.w;
3363
3364    MAX supports all three data type modifiers.
3365
3366
3367    Section 2.X.8.Z, MIN:  Minimum
3368
3369    The MIN instruction computes component-wise minimums of the values in the
3370    two operands to yield a result vector.
3371
3372      tmp0 = VectorLoad(op0);
3373      tmp1 = VectorLoad(op1);
3374      result.x = (tmp0.x > tmp1.x) ? tmp1.x : tmp0.x;
3375      result.y = (tmp0.y > tmp1.y) ? tmp1.y : tmp0.y;
3376      result.z = (tmp0.z > tmp1.z) ? tmp1.z : tmp0.z;
3377      result.w = (tmp0.w > tmp1.w) ? tmp1.w : tmp0.w;
3378
3379    MIN supports all three data type modifiers.
3380
3381
3382    Section 2.X.8.Z, MOD:  Modulus
3383
3384    The MOD instruction performs a component-wise modulus operation on the first
3385    vector operand by the second scalar operand to produce a 4-component result
3386    vector.
3387
3388      tmp0 = VectorLoad(op0);
3389      tmp1 = ScalarLoad(op1);
3390      result.x = tmp0.x % tmp1;
3391      result.y = tmp0.y % tmp1;
3392      result.z = tmp0.z % tmp1;
3393      result.w = tmp0.w % tmp1;
3394
3395    MOD supports both signed and unsigned integer data type modifiers.  If no
3396    data type modifier is specified, both operands and the result are treated
3397    as signed integers.
3398
3399    A result component is undefined if the corresponding component of the
3400    first operand is negative or if the second operand is less than or equal
3401    to zero.
3402
3403
3404    Section 2.X.8.Z, MOV:  Move
3405
3406    The MOV instruction copies the value of the operand to yield a result
3407    vector.
3408
3409      result = VectorLoad(op0);
3410
3411    MOV supports all three data type modifiers.
3412
3413
3414    Section 2.X.8.Z, MUL:  Multiply
3415
3416    The MUL instruction performs a component-wise multiply of the two operands
3417    to yield a result vector.
3418
3419      tmp0 = VectorLoad(op0);
3420      tmp1 = VectorLoad(op1);
3421      result.x = tmp0.x * tmp1.x;
3422      result.y = tmp0.y * tmp1.y;
3423      result.z = tmp0.z * tmp1.z;
3424      result.w = tmp0.w * tmp1.w;
3425
3426    MUL supports all three data type modifiers.  The MUL instruction
3427    additionally supports three special modifiers.
3428
3429    The "S24" and "U24" modifiers specify "fast" signed or unsigned integer
3430    multiplies of 24-bit quantities, respectively.  The results of such
3431    multiplies are undefined if either operand is outside the range
3432    [-2^23,+2^23-1] for S24 or [0,2^24-1] for U24.  If "S24" or "U24" is
3433    specified, the data type is implied and normal data type modifiers may not
3434    be provided.
3435
3436    The "HI" modifier specifies a 32-bit integer multiply that returns the 32
3437    most significant bits of the 64-bit product.  Integer multiplies without
3438    the "HI" modifier normally return the least significant bits of the
3439    product.  If "HI" is specified, either of the "S" or "U" integer data type
3440    modifiers must also be specified.
3441
3442    Note that if condition code updates are performed on integer multiplies,
3443    the overflow or carry flags are always cleared, even if the product
3444    overflowed.  If it is necessary to determine if the results of an integer
3445    multiply overflowed, the MUL.HI instruction may be used.
3446
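    One way to use the high half of the product for overflow detection is
    sketched below in C for the unsigned case.  The function names are
    assumptions for this example; mul_lo_u32() models an unsigned multiply
    without "HI" and mul_hi_u32() models one with "HI".  An unsigned 32-bit
    multiply overflowed exactly when the high 32 bits are non-zero.

```c
#include <stdint.h>

/* Least significant 32 bits of the 64-bit product. */
static uint32_t mul_lo_u32(uint32_t a, uint32_t b)
{
    return (uint32_t)((uint64_t)a * b);
}

/* Most significant 32 bits of the 64-bit product. */
static uint32_t mul_hi_u32(uint32_t a, uint32_t b)
{
    return (uint32_t)(((uint64_t)a * b) >> 32);
}

/* Non-zero if a * b does not fit in 32 unsigned bits. */
static int mul_overflows_u32(uint32_t a, uint32_t b)
{
    return mul_hi_u32(a, b) != 0;
}
```

    For a = b = 65536 the low half is 0 and the high half is 1, so the
    overflow is visible only through the high half.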
3447
3448    Section 2.X.8.Z, NOT:  Bitwise Not
3449
3450    The NOT instruction performs a component-wise bitwise NOT operation on the
3451    source vector to produce a result vector.
3452
3453      tmp = VectorLoad(op0);
3454      result.x = ~tmp.x;
3455      result.y = ~tmp.y;
3456      result.z = ~tmp.z;
3457      result.w = ~tmp.w;
3458
3459    NOT supports only integer data type modifiers.  If no type modifier is
3460    specified, the operand and the result are treated as signed integers.
3461
3462
3463    Section 2.X.8.Z, NRM:  Normalize 3-Component Vector
3464
3465    The NRM instruction normalizes the vector given by the x, y, and z
3466    components of the vector operand to produce the x, y, and z components of
3467    the result vector.  The w component of the result is undefined.
3468
3469      tmp = VectorLoad(op0);
3470      scale = ApproxRSQ(tmp.x * tmp.x + tmp.y * tmp.y + tmp.z * tmp.z);
3471      result.x = tmp.x * scale;
3472      result.y = tmp.y * scale;
3473      result.z = tmp.z * scale;
3474      result.w = undefined;
3475
3476    NRM supports only floating-point data type modifiers.
3477
3478
3479    Section 2.X.8.Z, OR:  Bitwise Or
3480
3481    The OR instruction performs a bitwise OR operation on the components of
3482    the two source vectors to yield a result vector.
3483
3484      tmp0 = VectorLoad(op0);
3485      tmp1 = VectorLoad(op1);
3486      result.x = tmp0.x | tmp1.x;
3487      result.y = tmp0.y | tmp1.y;
3488      result.z = tmp0.z | tmp1.z;
3489      result.w = tmp0.w | tmp1.w;
3490
3491    OR supports only integer data type modifiers.  If no type modifier is
3492    specified, both operands and the result are treated as signed integers.
3493
3494
3495    Section 2.X.8.Z, PK2H:  Pack Two 16-bit Floats
3496
3497    The PK2H instruction converts the "x" and "y" components of the single
3498    floating-point vector operand into 16-bit floating-point format, packs the
3499    bit representation of these two floats into a 32-bit unsigned integer, and
3500    replicates that value to all four components of the result vector.  The
3501    PK2H instruction can be reversed by the UP2H instruction below.
3502
3503      tmp0 = VectorLoad(op0);
3504      /* result obtained by combining raw bits of tmp0.x, tmp0.y */
3505      result.x = RawBits(tmp0.x) | (RawBits(tmp0.y) << 16);
3506      result.y = RawBits(tmp0.x) | (RawBits(tmp0.y) << 16);
3507      result.z = RawBits(tmp0.x) | (RawBits(tmp0.y) << 16);
3508      result.w = RawBits(tmp0.x) | (RawBits(tmp0.y) << 16);
3509
3510    PK2H supports all three data type modifiers.  The single operand is always
3511    treated as a floating-point value, but the result is written as a
3512    floating-point value, a signed integer, or an unsigned integer, as
3513    specified by the data type modifier.  For integer results, the bits can be
3514    interpreted as described above.  For floating-point result variables, the
3515    packed results do not constitute a meaningful floating-point variable and
3516    should only be used to feed future unpack instructions.
3517
3518    A program will fail to load if it contains a PK2H instruction that writes
3519    its results to a variable declared as "SHORT".
3520
3521
3522    Section 2.X.8.Z, PK2US:  Pack Two Floats as Unsigned 16-bit
3523
3524    The PK2US instruction converts the "x" and "y" components of the single
3525    floating-point vector operand into a packed pair of 16-bit unsigned
3526    scalars.  The scalars are represented in a bit pattern where all '0' bits
3527    correspond to 0.0 and all '1' bits correspond to 1.0.  The bit
3528    representations of the two converted components are packed into a 32-bit
3529    unsigned integer, and that value is replicated to all four components of
3530    the result vector.  The PK2US instruction can be reversed by the UP2US
3531    instruction below.
3532
3533      tmp0 = VectorLoad(op0);
3534      if (tmp0.x < 0.0) tmp0.x = 0.0;
3535      if (tmp0.x > 1.0) tmp0.x = 1.0;
3536      if (tmp0.y < 0.0) tmp0.y = 0.0;
3537      if (tmp0.y > 1.0) tmp0.y = 1.0;
3538      us.x = round(65535.0 * tmp0.x);  /* us is a ushort vector */
3539      us.y = round(65535.0 * tmp0.y);
3540      /* result obtained by combining raw bits of us. */
3541      result.x = ((us.x) | (us.y << 16));
3542      result.y = ((us.x) | (us.y << 16));
3543      result.z = ((us.x) | (us.y << 16));
3544      result.w = ((us.x) | (us.y << 16));
3545
3546    PK2US supports all three data type modifiers.  The single operand is
3547    always treated as a floating-point value, but the result is written as a
3548    floating-point value, a signed integer, or an unsigned integer, as
3549    specified by the data type modifier.  For integer result variables, the
3550    bits can be interpreted as described above.  For floating-point result
3551    variables, the packed results do not constitute a meaningful
3552    floating-point variable and should only be used to feed future unpack
3553    instructions.
3554
3555    A program will fail to load if it contains a PK2US instruction that writes
3556    its results to a variable declared as "SHORT".
3557
3558
3559    Section 2.X.8.Z, PK4B:  Pack Four Floats as Signed 8-bit
3560
3561    The PK4B instruction converts the four components of the single
3562    floating-point vector operand into 8-bit signed quantities.  The signed
3563    quantities are represented in a bit pattern where all '0' bits correspond
3564    to -128/127 and all '1' bits correspond to +127/127.  The bit
3565    representations of the four converted components are packed into a 32-bit
3566    unsigned integer, and that value is replicated to all four components of
3567    the result vector.  The PK4B instruction can be reversed by the UP4B
3568    instruction below.
3569
3570      tmp0 = VectorLoad(op0);
3571      if (tmp0.x < -128/127) tmp0.x = -128/127;
3572      if (tmp0.y < -128/127) tmp0.y = -128/127;
3573      if (tmp0.z < -128/127) tmp0.z = -128/127;
3574      if (tmp0.w < -128/127) tmp0.w = -128/127;
3575      if (tmp0.x > +127/127) tmp0.x = +127/127;
3576      if (tmp0.y > +127/127) tmp0.y = +127/127;
3577      if (tmp0.z > +127/127) tmp0.z = +127/127;
3578      if (tmp0.w > +127/127) tmp0.w = +127/127;
3579      ub.x = round(127.0 * tmp0.x + 128.0);  /* ub is a ubyte vector */
3580      ub.y = round(127.0 * tmp0.y + 128.0);
3581      ub.z = round(127.0 * tmp0.z + 128.0);
3582      ub.w = round(127.0 * tmp0.w + 128.0);
3583      /* result obtained by combining raw bits of ub. */
3584      result.x = ((ub.x) | (ub.y << 8) | (ub.z << 16) | (ub.w << 24));
3585      result.y = ((ub.x) | (ub.y << 8) | (ub.z << 16) | (ub.w << 24));
3586      result.z = ((ub.x) | (ub.y << 8) | (ub.z << 16) | (ub.w << 24));
3587      result.w = ((ub.x) | (ub.y << 8) | (ub.z << 16) | (ub.w << 24));
3588
3589    PK4B supports all three data type modifiers.  The single operand is always
3590    treated as a floating-point value, but the result is written as a
3591    floating-point value, a signed integer, or an unsigned integer, as
3592    specified by the data type modifier.  For integer result variables, the
3593    bits can be interpreted as described above.  For floating-point result
3594    variables, the packed results do not constitute a meaningful
3595    floating-point variable and should only be used to feed future unpack
3596    instructions.  A program will fail to load if it contains a PK4B
3597    instruction that writes its results to a variable declared as "SHORT".
3598
3599
3600    Section 2.X.8.Z, PK4UB:  Pack Four Floats as Unsigned 8-bit
3601
3602    The PK4UB instruction converts the four components of the single
3603    floating-point vector operand into a packed grouping of 8-bit unsigned
3604    scalars.  The scalars are represented in a bit pattern where all '0' bits
3605    correspond to 0.0 and all '1' bits correspond to 1.0.  The bit
3606    representations of the four converted components are packed into a 32-bit
3607    unsigned integer, and that value is replicated to all four components of
3608    the result vector.  The PK4UB instruction can be reversed by the UP4UB
3609    instruction below.
3610
3611      tmp0 = VectorLoad(op0);
3612      if (tmp0.x < 0.0) tmp0.x = 0.0;
3613      if (tmp0.x > 1.0) tmp0.x = 1.0;
3614      if (tmp0.y < 0.0) tmp0.y = 0.0;
3615      if (tmp0.y > 1.0) tmp0.y = 1.0;
3616      if (tmp0.z < 0.0) tmp0.z = 0.0;
3617      if (tmp0.z > 1.0) tmp0.z = 1.0;
3618      if (tmp0.w < 0.0) tmp0.w = 0.0;
3619      if (tmp0.w > 1.0) tmp0.w = 1.0;
3620      ub.x = round(255.0 * tmp0.x);  /* ub is a ubyte vector */
3621      ub.y = round(255.0 * tmp0.y);
3622      ub.z = round(255.0 * tmp0.z);
3623      ub.w = round(255.0 * tmp0.w);
3624      /* result obtained by combining raw bits of ub. */
3625      result.x = ((ub.x) | (ub.y << 8) | (ub.z << 16) | (ub.w << 24));
3626      result.y = ((ub.x) | (ub.y << 8) | (ub.z << 16) | (ub.w << 24));
3627      result.z = ((ub.x) | (ub.y << 8) | (ub.z << 16) | (ub.w << 24));
3628      result.w = ((ub.x) | (ub.y << 8) | (ub.z << 16) | (ub.w << 24));
3629
3630    PK4UB supports all three data type modifiers.  The single operand is
3631    always treated as a floating-point value, but the result is written as a
3632    floating-point value, a signed integer, or an unsigned integer, as
3633    specified by the data type modifier.  For integer result variables, the
3634    bits can be interpreted as described above.  For floating-point result
3635    variables, the packed results do not constitute a meaningful
3636    floating-point variable and should only be used to feed future unpack
3637    instructions.
3638
3639    A program will fail to load if it contains a PK4UB instruction that writes
3640    its results to a variable declared as "SHORT".
3641
3642
3643    Section 2.X.8.Z, POW:  Exponentiate
3644
3645    The POW instruction approximates the value of the first scalar operand
3646    raised to the power of the second scalar operand and replicates it to all
3647    four components of the result vector.
3648
3649      tmp0 = ScalarLoad(op0);
3650      tmp1 = ScalarLoad(op1);
3651      result.x = ApproxPower(tmp0, tmp1);
3652      result.y = ApproxPower(tmp0, tmp1);
3653      result.z = ApproxPower(tmp0, tmp1);
3654      result.w = ApproxPower(tmp0, tmp1);
3655
3656    The exponentiation approximation function may be implemented using the
3657    base 2 exponentiation and logarithm approximation operations in the EX2
3658    and LG2 instructions.  In particular,
3659
3660      ApproxPower(a,b) = ApproxExp2(b * ApproxLog2(a)).
3661
3662    Note that a logarithm may be involved even for cases where the exponent is
3663    an integer.  This means that it may not be possible to exponentiate
3664    correctly with a negative base.  In contrast, it is possible in a
3665    "normal" mathematical formulation to raise negative numbers to integral
3666    powers (e.g., (-3)^2 == 9 and (-0.5)^-2 == 4).
3667
3668    POW supports only floating-point data type modifiers.
3669
3670
3671    Section 2.X.8.Z, RCC:  Reciprocal (Clamped)
3672
3673    The RCC instruction approximates the reciprocal of the scalar operand,
3674    clamps the result to one of two ranges, and replicates the clamped result
3675    to all four components of the result vector.
3676
3677    If the approximated reciprocal is greater than 0.0, the result is clamped
3678    to the range [2^-64, 2^+64].  If the approximate reciprocal is not greater
3679    than zero, the result is clamped to the range [-2^+64, -2^-64].
3680
3681      tmp = ScalarLoad(op0);
3682      result.x = ClampApproxReciprocal(tmp);
3683      result.y = ClampApproxReciprocal(tmp);
3684      result.z = ClampApproxReciprocal(tmp);
3685      result.w = ClampApproxReciprocal(tmp);
3686
3687    RCC supports only floating-point data type modifiers.
3688
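    The two clamping ranges of RCC can be sketched in C as follows, assuming
    IEEE 754 single-precision arithmetic (so that 1.0 / 0.0 is +infinity and
    1.0 / -0.0 is -infinity).  The function name rcc() and the use of an
    exact division in place of the approximate reciprocal are assumptions
    for this example.

```c
/* Clamped reciprocal: positive results are clamped to [2^-64, 2^+64],
   all other results to [-2^+64, -2^-64]. */
static float rcc(float x)
{
    const float big = 0x1p+64f;   /* 2^+64 */
    const float tiny = 0x1p-64f;  /* 2^-64 */
    float r = 1.0f / x;

    if (r > 0.0f) {
        if (r < tiny) r = tiny;
        if (r > big) r = big;
    } else {
        if (r > -tiny) r = -tiny;
        if (r < -big) r = -big;
    }
    return r;
}
```

    In this sketch rcc(0.0) returns 2^+64 and rcc(-0.0) returns -2^+64, so
    the result is always finite and non-zero for non-NaN inputs.  Very large
    inputs such as 1e30 return 2^-64 rather than a value that underflows
    toward zero.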
3689
3690    Section 2.X.8.Z, RCP:  Reciprocal
3691
3692    The RCP instruction approximates the reciprocal of the scalar operand and
3693    replicates it to all four components of the result vector.
3694
3695      tmp = ScalarLoad(op0);
3696      result.x = ApproxReciprocal(tmp);
3697      result.y = ApproxReciprocal(tmp);
3698      result.z = ApproxReciprocal(tmp);
3699      result.w = ApproxReciprocal(tmp);
3700
3701    RCP supports only floating-point data type modifiers.
3702
3703
3704    Section 2.X.8.Z, REP:  Start of Repeat Block
3705
3706    The REP instruction begins a REP/ENDREP block.  The REP instruction
3707    supports an optional operand whose x component specifies the initial value
3708    for the loop count.  The loop count indicates the number of times the
3709    instructions between the REP and corresponding ENDREP instruction will be
3710    executed.  If the initial value of the loop count is not positive, the
3711    entire block is skipped and execution continues at the instruction
3712    following the corresponding ENDREP instruction.  If the loop count is
3713    specified as a floating-point value, it is converted to the largest
3714    integer less than or equal to the specified value (i.e., taking its
3715    floor).
3716
3717    If no operand is provided to REP, the loop count is ignored and the
3718    corresponding ENDREP instruction unconditionally transfers control to the
3719    instruction immediately following the REP instruction.  The only way to
3720    exit such a loop is with the BRK instruction.  To prevent obvious infinite
3721    loops, a program that includes a REP/ENDREP block with no loop count will
3722    fail to compile unless it contains either a BRK instruction at the current
3723    nesting level or a RET instruction at any nesting level.
3724
3725    Implementations may have a limited ability to nest REP/ENDREP blocks.  If
3726    the number of REP/ENDREP blocks nested inside each other is
3727    MAX_PROGRAM_LOOP_DEPTH_NV or higher, a program will fail to compile.
3728
3729      // Set up loop information for the new nesting level.
3730      tmp = VectorLoad(op0);
3731      LoopCount = floor(tmp.x);
3732      if (LoopCount <= 0) {
3733        continue execution at the corresponding ENDREP;
3734      }
3735
3736    REP supports all three data type modifiers.  The single operand is
3737    interpreted according to the data type modifier.
3738
3739    (Note:  Unlike the NV_fragment_program2 extension, REP blocks in this
3740    extension support fully general looping; the specified loop count can be
3741    computed in the program itself.  Additionally, there is no run-time limit
3742    on the maximum overall depth of REP/ENDREP nesting.  As long as each
3743    individual subroutine of the program obeys the static nesting limits,
3744    there will be no run-time errors in the program.  With the
3745    NV_fragment_program2 extension, a program could terminate abnormally if it
3746    called a subroutine inside a deeply nested set of REP/ENDREP blocks and
3747    the called subroutine also contained deeply nested REP/ENDREP blocks.
3748    Such an error could occur even if neither subroutine exceeded static
3749    limits.)
3750
3751
3752    Section 2.X.8.Z, RET:  Subroutine Return
3753
3754    The RET instruction conditionally returns from a subroutine initiated by a
3755    CAL instruction by popping an instruction reference off the top of the
3756    call stack and transferring control to the referenced instruction.  The
3757    following pseudocode describes the operation of the instruction:
3758
3759      if (TestCC(cc.c***) || TestCC(cc.*c**) ||
3760          TestCC(cc.**c*) || TestCC(cc.***c)) {
3761        if (callStackDepth <= 0) {
3762          // terminate program
3763        } else {
3764          callStackDepth--;
3765          instruction = callStack[callStackDepth];
3766        }
3767
3768        // continue execution at <instruction>
3769      } else {
3770        // do nothing
3771      }
3772
3773    In the pseudocode, <callStackDepth> is the depth of the call stack,
3774    <callStack> is an array holding the call stack, and <instruction> is a
3775    reference to an instruction previously pushed onto the call stack.
3776
3777    If the call stack is empty when RET executes, the program terminates
3778    normally.
3779
3780
3781    Section 2.X.8.Z, RFL:  Reflection Vector
3782
3783    The RFL instruction computes the reflection of the second vector operand
3784    (the "direction" vector) about the vector specified by the first vector
3785    operand (the "axis" vector).  Both operands are treated as 3D vectors (the
3786    w components are ignored).  The result vector is another 3D vector (the
3787    "reflected direction" vector).  The length of the result vector, ignoring
3788    rounding errors, should equal that of the second operand.
3789
3790      axis = VectorLoad(op0);
3791      direction = VectorLoad(op1);
3792      tmp.w = (axis.x * axis.x + axis.y * axis.y + axis.z * axis.z);
3793      tmp.x = (axis.x * direction.x + axis.y * direction.y +
3794               axis.z * direction.z);
3795      tmp.x = 2.0 * tmp.x;
3796      tmp.x = tmp.x / tmp.w;
3797      result.x = tmp.x * axis.x - direction.x;
3798      result.y = tmp.x * axis.y - direction.y;
3799      result.z = tmp.x * axis.z - direction.z;
3800      result.w = undefined;
3801
3802    RFL supports only floating-point data type modifiers.
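
    The pseudocode above can be restated as a small C function.  This is a
    sketch only; the Vec3 type and the function name rfl are hypothetical.
    Note that the axis does not need to be normalized, because the dot
    product is divided by the squared length of the axis.

```c
typedef struct { float x, y, z; } Vec3;  /* hypothetical helper type */

/* Reflects <dir> about <axis>, following the RFL pseudocode.  The w
   component, which RFL leaves undefined, is not modeled here. */
static Vec3 rfl(Vec3 axis, Vec3 dir)
{
    float axis_len2 = axis.x * axis.x + axis.y * axis.y + axis.z * axis.z;
    float dot = axis.x * dir.x + axis.y * dir.y + axis.z * dir.z;
    float scale = 2.0f * dot / axis_len2;
    Vec3 r = { scale * axis.x - dir.x,
               scale * axis.y - dir.y,
               scale * axis.z - dir.z };
    return r;
}
```

    For example, reflecting (1,0,1) about the unnormalized axis (0,0,2)
    yields (-1,0,1), which has the same length as the input direction.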
3803
3804
3805    Section 2.X.8.Z, ROUND:  Round to Nearest Integer
3806
3807    The ROUND instruction loads a single vector operand and performs a
3808    component-wise round operation to generate a result vector.
3809
3810      tmp = VectorLoad(op0);
3811      result.x = round(tmp.x);
3812      result.y = round(tmp.y);
3813      result.z = round(tmp.z);
3814      result.w = round(tmp.w);
3815
3816    The round operation returns the nearest integer to the operand.  If the
3817    fractional portion of the operand is 0.5, round() selects the nearest even
3818    integer.  For example, round(-1.7) = -2.0, round(+1.0) = +1.0, and
3819    round(+3.7) = +4.0.
3820
3821    ROUND supports all three data type modifiers.  The single operand is
3822    always treated as a floating-point value, but the result is written as a
3823    floating-point value, a signed integer, or an unsigned integer, as
3824    specified by the data type modifier.  If a value is not exactly
3825    representable using the data type of the result (e.g., an overflow or
3826    writing a negative value to an unsigned integer), the result is undefined.
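
    The round-to-nearest-even rule can be sketched in C without relying on
    the math library.  The function name round_half_even is hypothetical.
    The sketch covers only the floating-point result case, assumes IEEE 754
    single precision, and does not handle NaN operands.

```c
/* Rounds to the nearest integer; ties go to the nearest even integer. */
static float round_half_even(float x)
{
    /* Floats with magnitude >= 2^23 have no fractional bits. */
    if (x >= 8388608.0f || x <= -8388608.0f) {
        return x;
    }
    int t = (int)x;                  /* conversion truncates toward zero */
    float frac = x - (float)t;       /* same sign as x, magnitude < 1 */
    float mag = (frac < 0.0f) ? -frac : frac;
    if (mag > 0.5f || (mag == 0.5f && (t % 2) != 0)) {
        t += (x < 0.0f) ? -1 : 1;    /* step away from zero */
    }
    return (float)t;
}
```

    With this rule, 2.5 rounds to 2.0 and 3.5 rounds to 4.0, since both
    results are the even neighbor of the operand.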
3827
3828
3829    Section 2.X.8.Z, RSQ:  Reciprocal Square Root
3830
3831    The RSQ instruction approximates the reciprocal of the square root of the
3832    scalar operand and replicates it to all four components of the result
3833    vector.
3834
3835      tmp = ScalarLoad(op0);
3836      result.x = ApproxRSQRT(tmp);
3837      result.y = ApproxRSQRT(tmp);
3838      result.z = ApproxRSQRT(tmp);
3839      result.w = ApproxRSQRT(tmp);
3840
3841    If the operand is less than or equal to zero, the results of the
3842    instruction are undefined.
3843
3844    RSQ supports only floating-point data type modifiers.
3845
3846    Note that this instruction differs from the RSQ instruction in
3847    ARB_vertex_program in that it does not implicitly take the absolute value
3848    of its operand.  The |abs| operator can be used to achieve equivalent
3849    semantics.
3850
3851
3852    Section 2.X.8.Z, SAD:  Sum of Absolute Differences
3853
3854    The SAD instruction performs a component-wise difference of the first two
3855    integer operands (subtracting the second from the first), and then does a
3856    component-wise add of the absolute value of the difference to the third
3857    unsigned integer operand to yield an unsigned integer result vector.
3858
3859      tmp0 = VectorLoad(op0);
3860      tmp1 = VectorLoad(op1);
3861      tmp2 = VectorLoad(op2);
3862      result.x = abs(tmp0.x - tmp1.x) + tmp2.x;
3863      result.y = abs(tmp0.y - tmp1.y) + tmp2.y;
3864      result.z = abs(tmp0.z - tmp1.z) + tmp2.z;
3865      result.w = abs(tmp0.w - tmp1.w) + tmp2.w;
3866
3867    SAD supports signed and unsigned integer data type modifiers.  The first
3868    two operands are interpreted according to the data type modifier.  The
3869    third operand and the result are always unsigned integers.
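
    A per-component C sketch of SAD for both integer data type modifiers
    follows.  The function names are hypothetical, and 32-bit components
    are assumed.  This text does not state what happens on overflow; the
    sketch assumes the difference is formed without overflow (using a wider
    intermediate) and that the final add wraps modulo 2^32.

```c
#include <stdint.h>

/* Signed modifier: the first two operands are signed; the third operand
   and the result are unsigned. */
static uint32_t sad_s32(int32_t a, int32_t b, uint32_t acc)
{
    int64_t diff = (int64_t)a - (int64_t)b;  /* widened: cannot overflow */
    if (diff < 0) {
        diff = -diff;
    }
    return (uint32_t)diff + acc;             /* unsigned add wraps */
}

/* Unsigned modifier: all operands and the result are unsigned. */
static uint32_t sad_u32(uint32_t a, uint32_t b, uint32_t acc)
{
    uint32_t diff = (a > b) ? (a - b) : (b - a);
    return diff + acc;
}
```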
3870
3871
3872    Section 2.X.8.Z, SCS:  Sine/Cosine without Reduction
3873
3874    The SCS instruction approximates the trigonometric sine and cosine of the
3875    angle specified by the scalar operand and places the cosine in the x
3876    component and the sine in the y component of the result vector.  The z and
3877    w components of the result vector are undefined.  The angle is specified
3878    in radians and must be in the range [-PI,PI].
3879
3880      tmp = ScalarLoad(op0);
3881      result.x = ApproxCosine(tmp);
3882      result.y = ApproxSine(tmp);
3883      result.z = undefined;
3884      result.w = undefined;
3885
3886    If the scalar operand is not in the range [-PI,PI], the result vector is
3887    undefined.
3888
3889    SCS supports only floating-point data type modifiers.
3890
3891
3892    Section 2.X.8.Z, SEQ:  Set on Equal
3893
3894    The SEQ instruction performs a component-wise comparison of the two
3895    operands.  Each component of the result vector returns a TRUE value
3896    (described below) if the corresponding component of the first operand is
3897    equal to that of the second, and a FALSE value otherwise.
3898
3899      tmp0 = VectorLoad(op0);
3900      tmp1 = VectorLoad(op1);
3901      result.x = (tmp0.x == tmp1.x) ? TRUE : FALSE;
3902      result.y = (tmp0.y == tmp1.y) ? TRUE : FALSE;
3903      result.z = (tmp0.z == tmp1.z) ? TRUE : FALSE;
3904      result.w = (tmp0.w == tmp1.w) ? TRUE : FALSE;
3905
3906    SEQ supports all data type modifiers.  For floating-point data types, the
3907    TRUE value is 1.0 and the FALSE value is 0.0.  For signed integer data
3908    types, the TRUE value is -1 and the FALSE value is 0.  For unsigned
3909    integer data types, the TRUE value is the maximum integer value (all bits
3910    are ones) and the FALSE value is zero.
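
    The TRUE and FALSE encodings shared by the "Set on" instructions can be
    summarized with three small C helpers.  The helper names are
    hypothetical and 32-bit integer components are assumed.

```c
#include <stdbool.h>
#include <stdint.h>

/* Floating-point data types: TRUE is 1.0, FALSE is 0.0. */
static float set_result_float(bool cond)
{
    return cond ? 1.0f : 0.0f;
}

/* Signed integer data types: TRUE is -1, FALSE is 0. */
static int32_t set_result_signed(bool cond)
{
    return cond ? -1 : 0;
}

/* Unsigned integer data types: TRUE is all ones, FALSE is 0. */
static uint32_t set_result_unsigned(bool cond)
{
    return cond ? UINT32_MAX : 0u;
}
```

    In two's complement, the signed TRUE value (-1) and the unsigned TRUE
    value (all bits ones) have the same bit pattern.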
3911
3912
3913    Section 2.X.8.Z, SFL:  Set on False
3914
3915    The SFL instruction is a degenerate case of the other "Set on"
3916    instructions that sets all components of the result vector to a FALSE
3917    value (described below).
3918
3919      result.x = FALSE;
3920      result.y = FALSE;
3921      result.z = FALSE;
3922      result.w = FALSE;
3923
3924    SFL supports all data type modifiers.  For floating-point data types, the
3925    FALSE value is 0.0.  For signed and unsigned integer data types, the FALSE
3926    value is zero.
3927
3928
3929    Section 2.X.8.Z, SGE:  Set on Greater Than or Equal
3930
3931    The SGE instruction performs a component-wise comparison of the two
3932    operands.  Each component of the result vector returns a TRUE value
3933    (described below) if the corresponding component of the first operand is
3934    greater than or equal to that of the second, and a FALSE value otherwise.
3935
3936      tmp0 = VectorLoad(op0);
3937      tmp1 = VectorLoad(op1);
3938      result.x = (tmp0.x >= tmp1.x) ? TRUE : FALSE;
3939      result.y = (tmp0.y >= tmp1.y) ? TRUE : FALSE;
3940      result.z = (tmp0.z >= tmp1.z) ? TRUE : FALSE;
3941      result.w = (tmp0.w >= tmp1.w) ? TRUE : FALSE;
3942
3943    SGE supports all data type modifiers.  For floating-point data types, the
3944    TRUE value is 1.0 and the FALSE value is 0.0.  For signed integer data
3945    types, the TRUE value is -1 and the FALSE value is 0.  For unsigned
3946    integer data types, the TRUE value is the maximum integer value (all bits
3947    are ones) and the FALSE value is zero.
3948
3949
3950    Section 2.X.8.Z, SGT:  Set on Greater Than
3951
3952    The SGT instruction performs a component-wise comparison of the two
3953    operands.  Each component of the result vector returns a TRUE value
3954    (described below) if the corresponding component of the first operand is
3955    greater than that of the second, and a FALSE value otherwise.
3956
3957      tmp0 = VectorLoad(op0);
3958      tmp1 = VectorLoad(op1);
3959      result.x = (tmp0.x > tmp1.x) ? TRUE : FALSE;
3960      result.y = (tmp0.y > tmp1.y) ? TRUE : FALSE;
3961      result.z = (tmp0.z > tmp1.z) ? TRUE : FALSE;
3962      result.w = (tmp0.w > tmp1.w) ? TRUE : FALSE;
3963
3964    SGT supports all data type modifiers.  For floating-point data types, the
3965    TRUE value is 1.0 and the FALSE value is 0.0.  For signed integer data
3966    types, the TRUE value is -1 and the FALSE value is 0.  For unsigned
3967    integer data types, the TRUE value is the maximum integer value (all bits
3968    are ones) and the FALSE value is zero.
3969
3970
3971    Section 2.X.8.Z, SHL:  Shift Left
3972
3973    The SHL instruction performs a component-wise left shift of the bits of
3974    the first operand by the value of the second scalar operand to produce a
3975    result vector.  The bits vacated during the shift operation are filled
3976    with zeroes.
3977
3978      tmp0 = VectorLoad(op0);
3979      tmp1 = ScalarLoad(op1);
3980      result.x = tmp0.x << tmp1;
3981      result.y = tmp0.y << tmp1;
3982      result.z = tmp0.z << tmp1;
3983      result.w = tmp0.w << tmp1;
3984
3985    The results of a shift operation ("<<") are undefined if the value of the
3986    second operand is negative, or greater than or equal to the number of bits
3987    in the first operand.
3988
3989    SHL supports both signed and unsigned integer data type modifiers.  If no
3990    modifier is provided, the operands and the result are treated as signed
3991    integers.
3992
3993
3994    Section 2.X.8.Z, SHR:  Shift Right
3995
3996    The SHR instruction performs a component-wise right shift of the bits of
3997    the first operand by the value of the second scalar operand to produce a
3998    result vector.  The bits vacated during the shift operation are filled
3999    with zeroes if the shifted component is non-negative and ones otherwise.
4000
4001      tmp0 = VectorLoad(op0);
4002      tmp1 = ScalarLoad(op1);
4003      result.x = tmp0.x >> tmp1;
4004      result.y = tmp0.y >> tmp1;
4005      result.z = tmp0.z >> tmp1;
4006      result.w = tmp0.w >> tmp1;
4007
4008    The results of a shift operation (">>") are undefined if the value of the
4009    second operand is negative, or greater than or equal to the number of bits
4010    in the first operand.
4011
4012    SHR supports both signed and unsigned integer data type modifiers.  If no
4013    modifiers are provided, the operands and the result are treated as signed
4014    integers.
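
    The fill rule for vacated bits can be illustrated in C.  Because the
    ">>" operator applied to a negative signed value is
    implementation-defined in C, the signed case below builds the ones fill
    explicitly.  The function names are hypothetical, 32-bit components are
    assumed, and the shift count must be in the range [0,31], mirroring the
    undefined cases described above.

```c
#include <stdint.h>

/* Unsigned modifier: vacated bits are always filled with zeroes. */
static uint32_t shr_u32(uint32_t v, uint32_t n)
{
    return v >> n;
}

/* Signed modifier: vacated bits are filled with copies of the sign. */
static int32_t shr_s32(int32_t v, uint32_t n)
{
    if (v >= 0) {
        return (int32_t)((uint32_t)v >> n);
    }
    /* For negative v, shift the complement (which is non-negative) and
       complement again; -(y) - 1 equals ~y in two's complement. */
    return -(int32_t)(~(uint32_t)v >> n) - 1;
}
```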
4015
4016
4017    Section 2.X.8.Z, SIN:  Sine with Reduction to [-PI,PI]
4018
4019    The SIN instruction approximates the trigonometric sine of the angle
4020    specified by the scalar operand and replicates it to all four components
4021    of the result vector.  The angle is specified in radians and does not have
4022    to be in the range [-PI,PI].
4023
4024      tmp = ScalarLoad(op0);
4025      result.x = ApproxSine(tmp);
4026      result.y = ApproxSine(tmp);
4027      result.z = ApproxSine(tmp);
4028      result.w = ApproxSine(tmp);
4029
4030    SIN supports only floating-point data type modifiers.
4031
4032
4033    Section 2.X.8.Z, SLE:  Set on Less Than or Equal
4034
4035    The SLE instruction performs a component-wise comparison of the two
4036    operands.  Each component of the result vector returns a TRUE value
4037    (described below) if the corresponding component of the first operand is
4038    less than or equal to that of the second, and a FALSE value otherwise.
4039
4040      tmp0 = VectorLoad(op0);
4041      tmp1 = VectorLoad(op1);
4042      result.x = (tmp0.x <= tmp1.x) ? TRUE : FALSE;
4043      result.y = (tmp0.y <= tmp1.y) ? TRUE : FALSE;
4044      result.z = (tmp0.z <= tmp1.z) ? TRUE : FALSE;
4045      result.w = (tmp0.w <= tmp1.w) ? TRUE : FALSE;
4046
4047    SLE supports all data type modifiers.  For floating-point data types, the
4048    TRUE value is 1.0 and the FALSE value is 0.0.  For signed integer data
4049    types, the TRUE value is -1 and the FALSE value is 0.  For unsigned
4050    integer data types, the TRUE value is the maximum integer value (all bits
4051    are ones) and the FALSE value is zero.
4052
4053
4054    Section 2.X.8.Z, SLT:  Set on Less Than
4055
4056    The SLT instruction performs a component-wise comparison of the two
4057    operands.  Each component of the result vector returns a TRUE value
4058    (described below) if the corresponding component of the first operand is
4059    less than that of the second, and a FALSE value otherwise.
4060
4061      tmp0 = VectorLoad(op0);
4062      tmp1 = VectorLoad(op1);
4063      result.x = (tmp0.x < tmp1.x) ? TRUE : FALSE;
4064      result.y = (tmp0.y < tmp1.y) ? TRUE : FALSE;
4065      result.z = (tmp0.z < tmp1.z) ? TRUE : FALSE;
4066      result.w = (tmp0.w < tmp1.w) ? TRUE : FALSE;
4067
4068    SLT supports all data type modifiers.  For floating-point data types, the
4069    TRUE value is 1.0 and the FALSE value is 0.0.  For signed integer data
4070    types, the TRUE value is -1 and the FALSE value is 0.  For unsigned
4071    integer data types, the TRUE value is the maximum integer value (all bits
4072    are ones) and the FALSE value is zero.
4073
4074
4075    Section 2.X.8.Z, SNE:  Set on Not Equal
4076
4077    The SNE instruction performs a component-wise comparison of the two
4078    operands.  Each component of the result vector returns a TRUE value
4079    (described below) if the corresponding component of the first operand is
4080    not equal to that of the second, and a FALSE value otherwise.
4081
4082      tmp0 = VectorLoad(op0);
4083      tmp1 = VectorLoad(op1);
4084      result.x = (tmp0.x != tmp1.x) ? TRUE : FALSE;
4085      result.y = (tmp0.y != tmp1.y) ? TRUE : FALSE;
4086      result.z = (tmp0.z != tmp1.z) ? TRUE : FALSE;
4087      result.w = (tmp0.w != tmp1.w) ? TRUE : FALSE;
4088
4089    SNE supports all data type modifiers.  For floating-point data types, the
4090    TRUE value is 1.0 and the FALSE value is 0.0.  For signed integer data
4091    types, the TRUE value is -1 and the FALSE value is 0.  For unsigned
4092    integer data types, the TRUE value is the maximum integer value (all bits
4093    are ones) and the FALSE value is zero.
4094
4095
4096    Section 2.X.8.Z, SSG:  Set Sign
4097
4098    The SSG instruction generates a result vector containing the signs of
4099    each component of the single vector operand.  Each component of the
4100    result vector is 1.0 if the corresponding component of the operand
4101    is greater than zero, 0.0 if the corresponding component of the
4102    operand is equal to zero, and -1.0 if the corresponding component
4103    of the operand is less than zero.
4104
4105      tmp = VectorLoad(op0);
4106      result.x = SetSign(tmp.x);
4107      result.y = SetSign(tmp.y);
4108      result.z = SetSign(tmp.z);
4109      result.w = SetSign(tmp.w);
4110
4111    SSG supports only floating-point data type modifiers.
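
    The SetSign() function used in the pseudocode is described only in
    prose above.  One possible C rendering is shown below; the name
    set_sign is hypothetical.  For a NaN operand both comparisons are false
    and the sketch returns 0.0, which is an artifact of the sketch and not
    something this text specifies.

```c
/* Returns 1.0 for positive x, -1.0 for negative x, and 0.0 for zero. */
static float set_sign(float x)
{
    return (float)((x > 0.0f) - (x < 0.0f));
}
```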
4112
4113
4114    Section 2.X.8.Z, STR:  Set on True
4115
4116    The STR instruction is a degenerate case of the other "Set on"
4117    instructions that sets all components of the result vector to a TRUE value
4118    (described below).
4119
4120      result.x = TRUE;
4121      result.y = TRUE;
4122      result.z = TRUE;
4123      result.w = TRUE;
4124
4125    STR supports all data type modifiers.  For floating-point data types, the
4126    TRUE value is 1.0.  For signed integer data types, the TRUE value is -1.
4127    For unsigned integer data types, the TRUE value is the maximum integer
4128    value (all bits are ones).
4129
4130
4131    Section 2.X.8.Z, SUB:  Subtract
4132
4133    The SUB instruction performs a component-wise subtraction of the second
4134    operand from the first to yield a result vector.
4135
4136      tmp0 = VectorLoad(op0);
4137      tmp1 = VectorLoad(op1);
4138      result.x = tmp0.x - tmp1.x;
4139      result.y = tmp0.y - tmp1.y;
4140      result.z = tmp0.z - tmp1.z;
4141      result.w = tmp0.w - tmp1.w;
4142
4143    SUB supports all three data type modifiers.
4144
4145
4146    Section 2.X.8.Z, SWZ:  Extended Swizzle
4147
4148    The SWZ instruction loads the single vector operand, and performs a
4149    swizzle operation more powerful than that provided for loading normal
4150    vector operands to yield a result vector.
4151
4152    After the operand is loaded, the "x", "y", "z", and "w" components of the
4153    result vector are selected by the first, second, third, and fourth matches
4154    of the <extSwizComp> pattern in the <extendedSwizzle> rule.
4155
4156    A result component can be selected from any of the four components of the
4157    operand or the constants 0.0 and 1.0.  The result component can also be
4158    optionally negated.  The following pseudocode describes the component
4159    selection method.  "operand" refers to the vector operand, "select" is an
4160    enumerant where the values ZERO, ONE, X, Y, Z, and W correspond to the
4161    <extSwizSel> rule matching "0", "1", "x", "y", "z", and "w", respectively.
4162    "negate" is TRUE if and only if the <optionalSign> rule in <extSwizComp>
4163    matches "-".
4164
4165      float ExtSwizComponent(floatVec operand, enum select, boolean negate)
4166      {
4167          float result;
4168          switch (select) {
4169            case ZERO:  result = 0.0; break;
4170            case ONE:   result = 1.0; break;
4171            case X:     result = operand.x; break;
4172            case Y:     result = operand.y; break;
4173            case Z:     result = operand.z; break;
4174            case W:     result = operand.w; break;
4175          }
4176          if (negate) {
4177            result = -result;
4178          }
4179          return result;
4180      }
4181
4182    The entire extended swizzle operation is then defined using the following
4183    pseudocode:
4184
4185      tmp = VectorLoad(op0);
4186      result.x = ExtSwizComponent(tmp, xSelect, xNegate);
4187      result.y = ExtSwizComponent(tmp, ySelect, yNegate);
4188      result.z = ExtSwizComponent(tmp, zSelect, zNegate);
4189      result.w = ExtSwizComponent(tmp, wSelect, wNegate);
4190
4191    "xSelect", "xNegate", "ySelect", "yNegate", "zSelect", "zNegate",
4192    "wSelect", and "wNegate" correspond to the "select" and "negate" values
4193    above for the four <extSwizComp> matches.
4194
4195    Since this instruction allows for component selection and negation for
4196    each individual component, the grammar does not allow the use of the
4197    normal swizzle and negation operations allowed for vector operands in
4198    other instructions.
4199
4200    SWZ supports only floating-point data type modifiers.
4201
4202
4203    Section 2.X.8.Z, TEX:  Texture Sample
4204
4205    The TEX instruction takes the four components of a single floating-point
4206    source vector and performs a filtered texture access as described in
4207    Section 2.X.4.4.  The returned (R,G,B,A) value is written to the
4208    floating-point result vector.  Partial derivatives and the level of detail
4209    are computed automatically.
4210
4211      tmp = VectorLoad(op0);
4212      ddx = ComputePartialsX(tmp);
4213      ddy = ComputePartialsY(tmp);
4214      lambda = ComputeLOD(ddx, ddy);
4215      result = TextureSample(tmp, lambda, ddx, ddy, texelOffset);
4216
4217    TEX supports all three data type modifiers.  The single operand is always
4218    treated as a floating-point vector; the results are interpreted according
4219    to the data type modifier.
4220
4221
4222    Section 2.X.8.Z, TRUNC:  Truncate (Round Toward Zero)
4223
4224    The TRUNC instruction loads a single vector operand and performs a
4225    component-wise truncate operation to generate a result vector.
4226
4227      tmp = VectorLoad(op0);
4228      result.x = trunc(tmp.x);
4229      result.y = trunc(tmp.y);
4230      result.z = trunc(tmp.z);
4231      result.w = trunc(tmp.w);
4232
4233    The truncate operation returns the integer nearest to the operand whose
4234    magnitude does not exceed that of the operand.  For example,
4235    trunc(-1.7) = -1.0, trunc(+1.0) = +1.0, and trunc(+3.7) = +3.0.
4236
4237    TRUNC supports all three data type modifiers.  The single operand is
4238    always treated as a floating-point value, but the result is written as a
4239    floating-point value, a signed integer, or an unsigned integer, as
4240    specified by the data type modifier.  If a value is not exactly
4241    representable using the data type of the result (e.g., an overflow or
4242    writing a negative value to an unsigned integer), the result is undefined.
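
    In C, converting a floating-point value to an integer type discards the
    fractional part, which matches the truncate operation.  The sketch
    below uses that property; the function names are hypothetical, IEEE 754
    single precision is assumed, and NaN operands are not handled.  As with
    TRUNC itself, converting a value that does not fit in the destination
    integer type is undefined.

```c
#include <stdint.h>

/* Floating-point result: drop the fraction, keep a float. */
static float trunc_float(float x)
{
    /* Floats with magnitude >= 2^23 have no fractional bits. */
    if (x >= 8388608.0f || x <= -8388608.0f) {
        return x;
    }
    return (float)(int32_t)x;
}

/* Signed integer result; the caller must ensure x is in range. */
static int32_t trunc_to_s32(float x)
{
    return (int32_t)x;
}
```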
4243
4244
4245    Section 2.X.8.Z, TXB:  Texture Sample with Bias
4246
4247    The TXB instruction takes the four components of a single floating-point
4248    source vector and performs a filtered texture access as described in
4249    Section 2.X.4.4.  The returned (R,G,B,A) value is written to the
4250    floating-point result vector.  Partial derivatives and the level of detail
4251    are computed automatically, but the fourth component of the source vector
4252    is added to the computed LOD prior to sampling.
4253
4254      tmp = VectorLoad(op0);
4255      ddx = ComputePartialsX(tmp);
4256      ddy = ComputePartialsY(tmp);
4257      lambda = ComputeLOD(ddx, ddy);
4258      result = TextureSample(tmp, lambda + tmp.w, ddx, ddy, texelOffset);
4259
4260    The single source vector in the TXB instruction does not have enough
4261    coordinates to specify a lookup into a two-dimensional array texture or
4262    cube map texture with both an LOD bias and an explicit reference value for
4263    depth comparison.  A program will fail to load if it contains a TXB
4264    instruction with a target of SHADOWCUBE or SHADOWARRAY2D.
4265
4266    TXB supports all three data type modifiers.  The single operand is always
4267    treated as a floating-point vector; the results are interpreted according
4268    to the data type modifier.
4269
4270
4271    Section 2.X.8.Z, TXD:  Texture Sample with Partials
4272
4273    The TXD instruction takes the four components of the first floating-point
4274    source vector and performs a filtered texture access as described in
4275    Section 2.X.4.4.  The returned (R,G,B,A) value is written to the
4276    floating-point result vector.  The partial derivatives of the texture
4277    coordinates with respect to X and Y are specified by the second and third
4278    floating-point source vectors.  The level of detail is computed
4279    automatically using the provided partial derivatives.
4280
4281    Note that for cube map texture targets, the provided partial derivatives
4282    are in the coordinate system used before texture coordinates are projected
4283    onto the appropriate cube face.  The partial derivatives of the
4284    post-projection texture coordinates, which are used for level-of-detail
4285    and anisotropic filtering calculations, are derived from the original
4286    coordinates and partial derivatives in an implementation-dependent manner.
4287
4288      tmp0 = VectorLoad(op0);
4289      tmp1 = VectorLoad(op1);
4290      tmp2 = VectorLoad(op2);
4291      lambda = ComputeLOD(tmp1, tmp2);
4292      result = TextureSample(tmp0, lambda, tmp1, tmp2, texelOffset);
4293
4294    TXD supports all three data type modifiers.  All three operands are always
4295    treated as floating-point vectors; the results are interpreted according
4296    to the data type modifier.
4297
4298
4299    Section 2.X.8.Z, TXF:  Texel Fetch
4300
4301    The TXF instruction takes the four components of a single signed integer
4302    source vector and performs a single texel fetch as described in Section
4303    2.X.4.4.  The first three components provide the <i>, <j>, and <k> values
4304    for the texel fetch, and the fourth component is used to determine the LOD
4305    to access.  The returned (R,G,B,A) value is written to the floating-point
4306    result vector.  Partial derivatives are irrelevant for single texel
4307    fetches.
4308
4309      tmp = VectorLoad(op0);
4310      result = TexelFetch(tmp, texelOffset);
4311
4312    TXF supports all three data type modifiers.  The single vector operand is
4313    treated as a signed integer vector; the results are interpreted according
4314    to the data type modifier.
4315
4316
4317    Section 2.X.8.Z, TXL:  Texture Sample with LOD
4318
4319    The TXL instruction takes the four components of a single floating-point
4320    source vector and performs a filtered texture access as described in
4321    Section 2.X.4.4.  The returned (R,G,B,A) value is written to the
4322    floating-point result vector.  The level of detail is taken from the
4323    fourth component of the source vector.
4324
4325    Partial derivatives are not computed by the TXL instruction and
4326    anisotropic filtering is not performed.
4327
4328      tmp = VectorLoad(op0);
4329      ddx = (0,0,0);
4330      ddy = (0,0,0);
4331      result = TextureSample(tmp, tmp.w, ddx, ddy, texelOffset);
4332
4333    The single source vector in the TXL instruction does not have enough
4334    coordinates to specify a lookup into a 2D array or cube map texture with
4335    both an explicit LOD and a reference value for depth comparison.  A
4336    program will fail to load if it contains a TXL instruction with a target
4337    of SHADOWCUBE or SHADOWARRAY2D.
4338
4339    TXL supports all three data type modifiers.  The single vector operand is
4340    treated as a floating-point vector; the results are interpreted according
4341    to the data type modifier.
4342
4343
4344    Section 2.X.8.Z, TXP:  Texture Sample with Projection
4345
4346    The TXP instruction divides the first three components of its single
4347    floating-point source vector by its fourth component, maps the results to
4348    s, t, and r, and performs a filtered texture access as described in
4349    Section 2.X.4.4.  The returned (R,G,B,A) value is written to the
4350    floating-point result vector.  Partial derivatives and the level of detail
4351    are computed automatically.
4352
4353      tmp0 = VectorLoad(op0);
4354      tmp0.x = tmp0.x / tmp0.w;
4355      tmp0.y = tmp0.y / tmp0.w;
4356      tmp0.z = tmp0.z / tmp0.w;
4357      ddx = ComputePartialsX(tmp0);
4358      ddy = ComputePartialsY(tmp0);
4359      lambda = ComputeLOD(ddx, ddy);
4360      result = TextureSample(tmp0, lambda, ddx, ddy, texelOffset);
4361
4362    The single source vector in the TXP instruction does not have enough
4363    coordinates to specify a lookup into a 2D array or cube map texture with
4364    both a Q coordinate and an explicit reference value for depth comparison.
4365    A program will fail to load if it contains a TXP instruction with a target
4366    of SHADOWCUBE or SHADOWARRAY2D.
4367
4368    TXP supports all three data type modifiers.  The single vector operand is
4369    treated as a floating-point vector; the results are interpreted according
4370    to the data type modifier.
4371
4372
4373    Section 2.X.8.Z, TXQ:  Texture Size Query
4374
4375    The TXQ instruction takes the first component of the single integer vector
4376    operand, adds the number of the base level of the specified texture to
4377    determine a texture image level, and returns an integer result vector
4378    containing the size of the image at that level of the texture.
4379
4380    For one-dimensional and one-dimensional array textures, the "x" component
4381    of the result vector is filled with the width of the image(s).  For
4382    two-dimensional, rectangle, cube map, and two-dimensional array textures,
4383    the "x" and "y" components are filled with the width and height of the
4384    image(s).  For three-dimensional textures, the "x", "y", and "z"
4385    components are filled with the width, height, and depth of the image.
4386    Additionally, the number of layers in an array texture is returned in the
4387    "y" component of the result for one-dimensional array textures or the "z"
4388    component for two-dimensional array textures.  All other components of the
4389    result vector are undefined.  For the purposes of this instruction, the
4390    width, height, and depth of a texture do NOT include any border.
4391
4392      tmp0 = VectorLoad(op0);
4393      tmp0.x = tmp0.x + texture[op1].target[op2].base_level;
4394      result.x = texture[op1].target[op2].level[tmp0.x].width;
4395      result.y = texture[op1].target[op2].level[tmp0.x].height;
4396      result.z = texture[op1].target[op2].level[tmp0.x].depth;
4397
4398    If the level computed by adding the operand to the base level of the
4399    texture is less than the base level number or greater than the maximum
4400    level number, the results are undefined.
4401
4402    TXQ supports no data type modifiers; the scalar operand and the result
4403    vector are both interpreted as signed integers.
4404
4405
4406    Section 2.X.8.Z, UP2H:  Unpack Two 16-bit Floats
4407
4408    The UP2H instruction unpacks two 16-bit floats stored together in a 32-bit
4409    scalar operand.  The first 16-bit float (stored in the 16 least
4410    significant bits) is written into the "x" and "z" components of the result
4411    vector; the second is written into the "y" and "w" components of the
4412    result vector.
4413
4414    This operation undoes the type conversion and packing performed by
4415    the PK2H instruction.
4416
4417      tmp = ScalarLoad(op0);
4418      result.x = (fp16) (RawBits(tmp) & 0xFFFF);
4419      result.y = (fp16) ((RawBits(tmp) >> 16) & 0xFFFF);
4420      result.z = (fp16) (RawBits(tmp) & 0xFFFF);
4421      result.w = (fp16) ((RawBits(tmp) >> 16) & 0xFFFF);
4422
4423    UP2H supports all three data type modifiers.  The single operand is read
4424    as a floating-point value, a signed integer, or an unsigned integer, as
4425    specified by the data type modifier; the 32 least significant bits of the
4426    encoding are used for unpacking.  For floating-point operand variables, it
4427    is expected (but not required) that the operand was produced by a previous
4428    pack instruction.  The result is always written as a floating-point
4429    vector.
4430
4431    A program will fail to load if it contains a UP2H instruction whose
4432    operand is a variable declared as "SHORT".
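
    The "(fp16)" conversion in the pseudocode can be made concrete with a C
    sketch that decodes the standard 16-bit floating-point layout (1 sign
    bit, 5 exponent bits, 10 mantissa bits) into a 32-bit float.  The names
    Up2hResult, half_bits_to_float, and up2h are hypothetical, and the host
    float type is assumed to be IEEE 754 single precision.

```c
#include <stdint.h>
#include <string.h>

typedef struct { float x, y, z, w; } Up2hResult;  /* hypothetical type */

/* Decodes one 16-bit float, given as raw bits, into a 32-bit float. */
static float half_bits_to_float(uint16_t h)
{
    uint32_t sign = (uint32_t)(h & 0x8000u) << 16;
    uint32_t exp = (uint32_t)(h >> 10) & 0x1Fu;
    uint32_t man = (uint32_t)h & 0x3FFu;
    uint32_t bits;

    if (exp == 0u && man == 0u) {
        bits = sign;                               /* signed zero */
    } else if (exp == 0u) {
        /* Denormalized half: shift until the implicit one bit appears. */
        uint32_t shift = 0u;
        while ((man & 0x400u) == 0u) {
            man <<= 1;
            shift++;
        }
        man &= 0x3FFu;
        bits = sign | ((113u - shift) << 23) | (man << 13);
    } else if (exp == 0x1Fu) {
        bits = sign | 0x7F800000u | (man << 13);   /* infinity or NaN */
    } else {
        bits = sign | ((exp + 112u) << 23) | (man << 13);  /* rebias */
    }

    float f;
    memcpy(&f, &bits, sizeof f);                   /* reinterpret bits */
    return f;
}

/* Low half goes to x and z; high half goes to y and w. */
static Up2hResult up2h(uint32_t raw)
{
    float lo = half_bits_to_float((uint16_t)(raw & 0xFFFFu));
    float hi = half_bits_to_float((uint16_t)(raw >> 16));
    Up2hResult r = { lo, hi, lo, hi };
    return r;
}
```

    For example, the operand bits 0xC0003C00 hold the half-float encodings
    of 1.0 (low half, 0x3C00) and -2.0 (high half, 0xC000).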
4433
4434
4435    Section 2.X.8.Z, UP2US:  Unpack Two Unsigned 16-bit Integers
4436
4437    The UP2US instruction unpacks two 16-bit unsigned values packed
4438    together in a 32-bit scalar operand.  The unsigned quantities are
4439    encoded where a bit pattern of all '0' bits corresponds to 0.0 and
4440    a pattern of all '1' bits corresponds to 1.0.  The "x" and "z"
4441    components of the result vector are obtained from the 16 least
4442    significant bits of the operand; the "y" and "w" components are
4443    obtained from the 16 most significant bits.
4444
4445    This operation undoes the type conversion and packing performed by
4446    the PK2US instruction.
4447
4448      tmp = ScalarLoad(op0);
4449      result.x = ((RawBits(tmp) >> 0)  & 0xFFFF) / 65535.0;
4450      result.y = ((RawBits(tmp) >> 16) & 0xFFFF) / 65535.0;
4451      result.z = ((RawBits(tmp) >> 0)  & 0xFFFF) / 65535.0;
4452      result.w = ((RawBits(tmp) >> 16) & 0xFFFF) / 65535.0;
4453
4454    UP2US supports all three data type modifiers.  The single operand is read
4455    as a floating-point value, a signed integer, or an unsigned integer, as
4456    specified by the data type modifier; the 32 least significant bits of the
4457    encoding are used for unpacking.  For floating-point operand variables, it
4458    is expected (but not required) that the operand was produced by a previous
4459    pack instruction.  The result is always written as a floating-point
4460    vector.
4461
4462    A GPU program will fail to load if it contains a UP2US instruction
4463    whose operand is a variable declared as "SHORT".
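
    A C rendering of the UP2US pseudocode is sketched below.  The names
    Up2usResult and up2us are hypothetical, and the operand is passed as
    its raw 32-bit encoding.

```c
#include <stdint.h>

typedef struct { float x, y, z, w; } Up2usResult;  /* hypothetical type */

/* Maps each 16-bit field from [0,65535] to [0.0,1.0]. */
static Up2usResult up2us(uint32_t raw)
{
    float lo = (float)(raw & 0xFFFFu) / 65535.0f;
    float hi = (float)((raw >> 16) & 0xFFFFu) / 65535.0f;
    Up2usResult r = { lo, hi, lo, hi };
    return r;
}
```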
4464
4465
4466    Section 2.X.8.Z, UP4B:  Unpack Four Signed 8-bit Integers
4467
4468    The UP4B instruction unpacks four 8-bit signed values packed together
4469    in a 32-bit scalar operand.  The signed quantities are encoded where
4470    a bit pattern of all '0' bits corresponds to -128/127 and a pattern
4471    of all '1' bits corresponds to +127/127.  The "x" component of the
4472    result vector is the converted value corresponding to the 8 least
4473    significant bits of the operand; the "w" component corresponds to
4474    the 8 most significant bits.
4475
4476    This operation undoes the type conversion and packing performed by
4477    the PK4B instruction.
4478
4479      tmp = ScalarLoad(op0);
4480      result.x = (((RawBits(tmp) >> 0) & 0xFF) - 128) / 127.0;
4481      result.y = (((RawBits(tmp) >> 8) & 0xFF) - 128) / 127.0;
4482      result.z = (((RawBits(tmp) >> 16) & 0xFF) - 128) / 127.0;
4483      result.w = (((RawBits(tmp) >> 24) & 0xFF) - 128) / 127.0;
4484
4485    UP4B supports all three data type modifiers.  The single operand is read
4486    as a floating-point value, a signed integer, or an unsigned integer, as
4487    specified by the data type modifier; the 32 least significant bits of the
4488    encoding are used for unpacking.  For floating-point operand variables, it
4489    is expected (but not required) that the operand was produced by a previous
4490    pack instruction.  The result is always written as a floating-point
4491    vector.
4492
4493    A program will fail to load if it contains a UP4B instruction whose
4494    operand is a variable declared as "SHORT".
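
    A minimal C sketch of the UP4B pseudocode, assuming the operand's raw
    32-bit encoding is already available as an unsigned integer.  The name
    up4b is hypothetical and used only for illustration.

```c
#include <stdint.h>

/* Hypothetical helper mirroring the UP4B pseudocode.  Component i of the
 * result is taken from bits [8*i, 8*i+7] of the operand. */
static void up4b(uint32_t bits, float result[4])
{
    for (int i = 0; i < 4; i++) {
        int field = (int)((bits >> (8 * i)) & 0xFFu);  /* 0..255 */
        result[i] = (float)(field - 128) / 127.0f;     /* bias, then scale */
    }
}
```

    Because the bias is 128 and the scale is 127, a field of all '0' bits
    unpacks to -128/127, which is slightly below -1.0.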
4495
4496
4497    Section 2.X.8.Z, UP4UB:  Unpack Four Unsigned 8-bit Integers
4498
4499    The UP4UB instruction unpacks four 8-bit unsigned values packed
4500    together in a 32-bit scalar operand.  The unsigned quantities are
4501    encoded where a bit pattern of all '0' bits corresponds to 0.0 and a
4502    pattern of all '1' bits corresponds to 1.0.  The "x" component of the
4503    result vector is obtained from the 8 least significant bits of the
4504    operand; the "w" component is obtained from the 8 most significant
4505    bits.
4506
4507    This operation undoes the type conversion and packing performed by
4508    the PK4UB instruction.
4509
4510      tmp = ScalarLoad(op0);
4511      result.x = ((RawBits(tmp) >> 0)  & 0xFF) / 255.0;
4512      result.y = ((RawBits(tmp) >> 8)  & 0xFF) / 255.0;
4513      result.z = ((RawBits(tmp) >> 16) & 0xFF) / 255.0;
4514      result.w = ((RawBits(tmp) >> 24) & 0xFF) / 255.0;
4515
4516    UP4UB supports all three data type modifiers.  The single operand is read
4517    as a floating-point value, a signed integer, or an unsigned integer, as
4518    specified by the data type modifier; the 32 least significant bits of the
4519    encoding are used for unpacking.  For floating-point operand variables, it
4520    is expected (but not required) that the operand was produced by a previous
4521    pack instruction.  The result is always written as a floating-point
4522    vector.
4523
4524    A program will fail to load if it contains a UP4UB instruction whose
4525    operand is a variable declared as "SHORT".
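
    When the operand is read as a floating-point value, the unpack
    instructions operate on its 32-bit encoding rather than on its numeric
    value.  The sketch below models that behavior for UP4UB.  It assumes a
    32-bit IEEE 754 "float"; the names raw_bits and up4ub_from_float are
    hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Stand-in for RawBits() on a floating-point operand: copy out the 32-bit
 * encoding without converting the numeric value. */
static uint32_t raw_bits(float operand)
{
    uint32_t bits;
    memcpy(&bits, &operand, sizeof bits);
    return bits;
}

/* UP4UB applied to a floating-point operand. */
static void up4ub_from_float(float operand, float result[4])
{
    uint32_t bits = raw_bits(operand);

    for (int i = 0; i < 4; i++)
        result[i] = (float)((bits >> (8 * i)) & 0xFFu) / 255.0f;
}
```

    For example, 1.0 is encoded as 0x3F800000, so it unpacks to
    (0, 0, 128/255, 63/255) rather than to four copies of 1.0.  This is why
    a floating-point operand is expected to come from a pack instruction.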
4526
4527
4528    Section 2.X.8.Z, X2D:  2D Coordinate Transformation
4529
4530    The X2D instruction multiplies the 2D offset vector specified by the
4531    "x" and "y" components of the second vector operand by the 2x2 matrix
4532    specified by the four components of the third vector operand, and adds
4533    the transformed offset vector to the 2D vector specified by the "x"
4534    and "y" components of the first vector operand.  The first component
4535    of the sum is written to the "x" and "z" components of the result;
4536    the second component is written to the "y" and "w" components of
4537    the result.
4538
4539      tmp0 = VectorLoad(op0);
4540      tmp1 = VectorLoad(op1);
4541      tmp2 = VectorLoad(op2);
4542      result.x = tmp0.x + tmp1.x * tmp2.x + tmp1.y * tmp2.y;
4543      result.y = tmp0.y + tmp1.x * tmp2.z + tmp1.y * tmp2.w;
4544      result.z = tmp0.x + tmp1.x * tmp2.x + tmp1.y * tmp2.y;
4545      result.w = tmp0.y + tmp1.x * tmp2.z + tmp1.y * tmp2.w;
4546
4547    X2D supports only floating-point data type modifiers.
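
    The X2D computation can be sketched in C as shown below.  The function
    name x2d and the use of four-element arrays for the operands are
    assumptions of this example.  The third operand holds the 2x2 matrix in
    row order:  (x, y) is the first row and (z, w) is the second.

```c
/* Hypothetical helper mirroring the X2D pseudocode:
 *   result.xy = op0.xy + M * op1.xy, with M = | op2.x op2.y |
 *                                             | op2.z op2.w |
 * The 2D sum is replicated into the z and w components. */
static void x2d(const float op0[4], const float op1[4],
                const float op2[4], float result[4])
{
    float sx = op0[0] + op1[0] * op2[0] + op1[1] * op2[1];
    float sy = op0[1] + op1[0] * op2[2] + op1[1] * op2[3];

    result[0] = sx;
    result[1] = sy;
    result[2] = sx;
    result[3] = sy;
}
```

    With a base of (10, 20), an offset of (1, 0), and the matrix
    (0, -1, 1, 0), the result is (10, 21, 10, 21).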
4548
4549
4550    Section 2.X.8.Z, XOR:  Exclusive Or
4551
4552    The XOR instruction performs a bitwise XOR operation on the components of
4553    the two source vectors to yield a result vector.
4554
4555      tmp0 = VectorLoad(op0);
4556      tmp1 = VectorLoad(op1);
4557      result.x = tmp0.x ^ tmp1.x;
4558      result.y = tmp0.y ^ tmp1.y;
4559      result.z = tmp0.z ^ tmp1.z;
4560      result.w = tmp0.w ^ tmp1.w;
4561
4562    XOR supports only integer data type modifiers.  If no type modifier is
4563    specified, both operands and the result are treated as signed integers.
4564
4565
4566    Section 2.X.8.Z, XPD:  Cross Product
4567
4568    The XPD instruction computes the cross product using the first three
4569    components of its two vector operands to generate the x, y, and z
4570    components of the result vector.  The w component of the result vector is
4571    undefined.
4572
4573      tmp0 = VectorLoad(op0);
4574      tmp1 = VectorLoad(op1);
4575      result.x = tmp0.y * tmp1.z - tmp0.z * tmp1.y;
4576      result.y = tmp0.z * tmp1.x - tmp0.x * tmp1.z;
4577      result.z = tmp0.x * tmp1.y - tmp0.y * tmp1.x;
4578      result.w = undefined;
4579
4580    XPD supports only floating-point data type modifiers.
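
    A minimal C sketch of the XPD computation follows.  The name xpd is
    hypothetical.  Since the "w" component of the result is undefined, the
    sketch simply leaves it unmodified; callers should not rely on it.

```c
/* Hypothetical helper mirroring the XPD pseudocode.  Only x, y, and z of
 * `result` are written; result[3] is left as the caller provided it.
 * `result` must not alias `a` or `b`. */
static void xpd(const float a[4], const float b[4], float result[4])
{
    result[0] = a[1] * b[2] - a[2] * b[1];
    result[1] = a[2] * b[0] - a[0] * b[2];
    result[2] = a[0] * b[1] - a[1] * b[0];
}
```

    For example, the cross product of (1, 0, 0) and (0, 1, 0) is (0, 0, 1).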
4581
4582
4583Additions to Chapter 3 of the OpenGL 1.5 Specification (Rasterization)
4584
4585    Modify Section 3.8.1, Texture Image Specification, p. 150
4586
4587    (modify 4th paragraph, p. 151 -- add cubemaps to the list of texture
4588    targets that can be used with DEPTH_COMPONENT textures) Textures with a
4589    base internal format of DEPTH_COMPONENT are supported by texture image
4590    specification commands only if <target> is TEXTURE_1D, TEXTURE_2D,
4591    TEXTURE_CUBE_MAP, TEXTURE_RECTANGLE_ARB, TEXTURE_1D_ARRAY_EXT,
4592    TEXTURE_2D_ARRAY_EXT, PROXY_TEXTURE_1D, PROXY_TEXTURE_2D,
4593    PROXY_TEXTURE_CUBE_MAP, PROXY_TEXTURE_RECTANGLE_ARB,
4594    PROXY_TEXTURE_1D_ARRAY_EXT, or PROXY_TEXTURE_2D_ARRAY_EXT.  Using this
4595    format in conjunction with any other target will result in an
4596    INVALID_OPERATION error.
4597
4598
4599    Delete Section 3.8.7, Texture Wrap Modes.  (The language in this section
4600    is folded into updates to the following section, and is no longer needed
4601    here.)
4602
4603
4604    Modify Section 3.8.8, Texture Minification:
4605
4606    (replace the last paragraph, p. 171):  Let s(x,y) be the function that
4607    associates an s texture coordinate with each set of window coordinates
4608    (x,y) that lie within a primitive; define t(x,y) and r(x,y) analogously.
4609    Let
4610
4611      u(x,y) = w_t * s(x,y) + offsetu_shader,
4612      v(x,y) = h_t * t(x,y) + offsetv_shader, and
4613      w(x,y) = d_t * r(x,y) + offsetw_shader,
4614
4615    where w_t, h_t, and d_t are as defined by equations 3.15, 3.16, and 3.17
4616    with w_s, h_s, and d_s equal to the width, height, and depth of the image
4617    array whose level is level_base.  (offsetu_shader, offsetv_shader,
4618    offsetw_shader) is the texel offset specified in the vertex, geometry, or
4619    fragment program instruction used to perform the access.  For
4620    fixed-function texture accesses, all three shader offsets are taken to be
4621    zero.  For a one-dimensional texture, define v(x,y) == 0 and w(x,y) == 0;
4622    for two-dimensional textures, define w(x,y) == 0.
4623
4624    After u(x,y), v(x,y), and w(x,y) are generated, they are clamped if the
4625    corresponding texture wrap modes are CLAMP or MIRROR_CLAMP_EXT.  Let
4626
4627      u'(x,y) = clamp(u(x,y), 0, w_t),      if TEXTURE_WRAP_S is CLAMP
4628                clamp(u(x,y), -w_t, w_t),   if TEXTURE_WRAP_S is
4629                                              MIRROR_CLAMP_EXT, or
4630                u(x,y),                     otherwise
4631      v'(x,y) = clamp(v(x,y), 0, h_t),      if TEXTURE_WRAP_T is CLAMP
4632                clamp(v(x,y), -h_t, h_t),   if TEXTURE_WRAP_T is
4633                                              MIRROR_CLAMP_EXT, or
4634                v(x,y),                     otherwise
4635      w'(x,y) = clamp(w(x,y), 0, d_t),      if TEXTURE_WRAP_R is CLAMP
4636                clamp(w(x,y), -d_t, d_t),   if TEXTURE_WRAP_R is
4637                                              MIRROR_CLAMP_EXT, or
4638                w(x,y),                     otherwise,
4639
4640    where clamp(<a>,<b>,<c>) returns <b> if <a> is less than <b>, <c> if <a> is
4641    greater than <c>, and <a> otherwise.
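
    The pre-filter coordinate clamp can be sketched in C as follows.  The
    enum, the function names, and the parameter "size" (standing for the
    texture dimension that applies to the coordinate being clamped) are
    assumptions of this example, not names defined by the extension.

```c
/* Hypothetical classification of the wrap modes relevant to this step. */
enum clamp_mode {
    CLAMP_MODE_CLAMP,         /* wrap mode is CLAMP */
    CLAMP_MODE_MIRROR_CLAMP,  /* wrap mode is MIRROR_CLAMP_EXT */
    CLAMP_MODE_NONE           /* any other wrap mode */
};

/* clamp(a, b, c) as defined above: b if a < b, c if a > c, otherwise a. */
static float clampf(float a, float b, float c)
{
    return a < b ? b : (a > c ? c : a);
}

/* Computes u', v', or w' from the unclamped coordinate. */
static float clamp_coord(float coord, float size, enum clamp_mode mode)
{
    switch (mode) {
    case CLAMP_MODE_CLAMP:        return clampf(coord, 0.0f, size);
    case CLAMP_MODE_MIRROR_CLAMP: return clampf(coord, -size, size);
    default:                      return coord;
    }
}
```

    For a dimension of 8, a coordinate of -3 becomes 0 under CLAMP, stays
    -3 under MIRROR_CLAMP_EXT, and is unchanged for all other wrap modes.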
4642
4643    (start a new paragraph with "For a polygon, rho is given at a fragment
4644    with window coordinates...", and then continue with the original spec
4645    text.)
4646
4647    (replace text starting with the last paragraph on p. 172, continuing to
4648    the end of p. 174)
4649
4650    When lambda indicates minification, the value assigned to
4651    TEXTURE_MIN_FILTER is used to determine how the texture value for a
4652    fragment is selected.
4653
4654    When TEXTURE_MIN_FILTER is NEAREST, the texel in the image array of level
4655    level_base that is nearest (in Manhattan distance) to that specified by
4656    (s,t,r) is obtained.  Let i, j, and k be integers such that:
4657
4658      i = apply_wrap(floor(u'(x,y))),
4659      j = apply_wrap(floor(v'(x,y))), and
4660      k = apply_wrap(floor(w'(x,y))),
4661
4662    where the coordinate returned by apply_wrap() is as defined by Table X.19.
4663    The values of i, j, and k are then modified according to the texture wrap
4664    modes, as described in Table 3.19, to produce new values (i', j', and k').
4665    For a three-dimensional texture, the texel at location (i,j,k) becomes the
4666    texture value.  For a two-dimensional texture, k is irrelevant, and the
4667    texel at location (i,j) becomes the texture value.  For a one-dimensional
4668    texture, j and k are irrelevant, and the texel at location i becomes the
4669    texture value.
4670
4671      Wrap mode                   Result
4672      --------------------------  ------------------------------------------
4673      CLAMP_TO_EDGE               clamp(coord, 0, size-1)
4674      CLAMP_TO_BORDER             clamp(coord, -1, size)
4675      CLAMP                       { clamp(coord, 0, size-1),
4676                                  {         for NEAREST filtering
4677                                  { clamp(coord, -1, size),
4678                                  {         for LINEAR filtering
4679      REPEAT                      mod(coord, size)
4680      MIRROR_CLAMP_TO_EDGE_EXT    clamp(mirror(coord), 0, size-1)
4681      MIRROR_CLAMP_TO_BORDER_EXT  clamp(mirror(coord), 0, size)
4682      MIRROR_CLAMP_EXT            { clamp(mirror(coord), 0, size-1),
4683                                  {         for NEAREST filtering
4684                                  { clamp(mirror(coord), 0, size),
4685                                  {         for LINEAR filtering
4686      MIRRORED_REPEAT             (size-1) - mirror(mod(coord, 2*size)-size)
4687
4688      Table X.19:  Texel location wrap mode application.  mod(<a>,<b>) is
4689      defined to return <a>-<b>*floor(<a>/<b>), and mirror(<a>) is defined to
4690      return <a> if <a> is greater than or equal to zero or -(1+<a>)
4691      otherwise.  The values of "wrap mode" and size are TEXTURE_WRAP_S and
4692      w_t, TEXTURE_WRAP_T and h_t, and TEXTURE_WRAP_R and d_t, for i, j, and k
4693      coordinates, respectively.  The coordinate clamp for CLAMP and
4694      MIRROR_CLAMP_EXT depends on the filtering mode (NEAREST or LINEAR).
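
    One possible reading of Table X.19 for integer texel coordinates is
    sketched below in C.  The enum values, the function names, and the
    "linear" flag (non-zero for LINEAR filtering) are hypothetical.  The
    sketch assumes that the mirrored clamp modes apply mirror() to the
    coordinate before clamping.

```c
#include <assert.h>

enum wrap_mode {
    WRAP_CLAMP_TO_EDGE, WRAP_CLAMP_TO_BORDER, WRAP_CLAMP, WRAP_REPEAT,
    WRAP_MIRROR_CLAMP_TO_EDGE, WRAP_MIRROR_CLAMP_TO_BORDER,
    WRAP_MIRROR_CLAMP, WRAP_MIRRORED_REPEAT
};

static int clampi(int a, int b, int c)
{
    return a < b ? b : (a > c ? c : a);
}

/* mod(a, b) = a - b*floor(a/b), for b > 0; always in [0, b-1]. */
static int mod_floor(int a, int b)
{
    int r = a % b;
    return r < 0 ? r + b : r;
}

/* mirror(a) = a if a >= 0, otherwise -(1 + a). */
static int mirror(int a)
{
    return a >= 0 ? a : -(1 + a);
}

static int apply_wrap(int coord, int size, enum wrap_mode mode, int linear)
{
    assert(size > 0);
    switch (mode) {
    case WRAP_CLAMP_TO_EDGE:   return clampi(coord, 0, size - 1);
    case WRAP_CLAMP_TO_BORDER: return clampi(coord, -1, size);
    case WRAP_CLAMP:
        return linear ? clampi(coord, -1, size) : clampi(coord, 0, size - 1);
    case WRAP_REPEAT:          return mod_floor(coord, size);
    case WRAP_MIRROR_CLAMP_TO_EDGE:
        return clampi(mirror(coord), 0, size - 1);
    case WRAP_MIRROR_CLAMP_TO_BORDER:
        return clampi(mirror(coord), 0, size);
    case WRAP_MIRROR_CLAMP:
        return clampi(mirror(coord), 0, linear ? size : size - 1);
    case WRAP_MIRRORED_REPEAT:
        return (size - 1) - mirror(mod_floor(coord, 2 * size) - size);
    }
    return coord;  /* not reached for the modes listed above */
}
```

    With a size of 4, MIRRORED_REPEAT maps the coordinates 4, 5, 6, and 7
    to 3, 2, 1, and 0, and REPEAT maps -1 to 3.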
4695
4696    If the selected (i,j,k), (i,j), or i location refers to a border texel
4697    that satisfies any of the following conditions:
4698
4699      i < -b_s,
4700      j < -b_s,
4701      k < -b_s,
4702      i >= w_t + b_s,
4703      j >= h_t + b_s, or
4704      k >= d_t + b_s,
4705
4706    then the border values defined by TEXTURE_BORDER_COLOR are used in place
4707    of the non-existent texel. If the texture contains color components, the
4708    values of TEXTURE_BORDER_COLOR are interpreted as an RGBA color to match
4709    the texture's internal format in a manner consistent with table 3.15. If
4710    the texture contains depth components, the first component of
4711    TEXTURE_BORDER_COLOR is interpreted as a depth value.
4712
4713    When TEXTURE_MIN_FILTER is LINEAR, a 2x2x2 cube of texels in the image
4714    array of level level_base is selected.  Let:
4715
4716      i_0   = apply_wrap(floor(u' - 0.5)),
4717      j_0   = apply_wrap(floor(v' - 0.5)),
4718      k_0   = apply_wrap(floor(w' - 0.5)),
4719      i_1   = apply_wrap(floor(u' - 0.5) + 1),
4720      j_1   = apply_wrap(floor(v' - 0.5) + 1),
4721      k_1   = apply_wrap(floor(w' - 0.5) + 1),
4722      alpha = frac(u' - 0.5),
4723      beta  = frac(v' - 0.5),
4724      gamma = frac(w' - 0.5),
4725
4726    where frac(<x>) denotes the fractional part of <x>.
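
    For a single coordinate, the LINEAR footprint above can be sketched in
    C as follows.  The struct, the function names, and the integer floor
    helper are assumptions of this example.  The sketch returns the two
    texel indices before apply_wrap() is applied, and it takes frac(<x>) to
    mean <x> - floor(<x>).

```c
/* floor() for values that fit in an int, written without <math.h>. */
static int floor_to_int(float x)
{
    int i = (int)x;               /* truncates toward zero */
    return ((float)i > x) ? i - 1 : i;
}

/* Hypothetical result type: unwrapped texel indices and blend weight. */
struct linear_tap {
    int   i0;     /* floor(u' - 0.5)     */
    int   i1;     /* floor(u' - 0.5) + 1 */
    float alpha;  /* frac(u' - 0.5)      */
};

static struct linear_tap linear_tap_1d(float u_prime)
{
    struct linear_tap tap;
    float shifted = u_prime - 0.5f;
    int base = floor_to_int(shifted);

    tap.i0 = base;
    tap.i1 = base + 1;
    tap.alpha = shifted - (float)base;
    return tap;
}
```

    For u' = 2.75, the sketch yields i_0 = 2, i_1 = 3, and alpha = 0.25.
    For u' = 0.25, it yields i_0 = -1, i_1 = 0, and alpha = 0.75, so the
    wrap mode decides which texel the index -1 refers to.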
4727
4728    For a three-dimensional texture, the texture value tau is found as...
4729
4730    (replace last paragraph, p.174) For any texel in the equation above that
4731    refers to a border texel outside the defined range of the image, the texel
4732    value is taken from the texture border color as with NEAREST filtering.
4733
4734
4735    Modify Section 3.8.14, Texture Comparison Modes (p. 185)
4736
4737    (modify 2nd paragraph, p. 188, indicating that the Q texture coordinate is
4738    used for depth comparisons on cubemap textures)
4739
4740    Let D_t be the depth texture value, in the range [0, 1].  For
4741    fixed-function texture lookups, let R be the interpolated <r> texture
4742    coordinate, clamped to the range [0, 1].  For texture lookups generated by
4743    a program instruction, let R be the reference value for depth comparisons
4744    provided in the instruction, also clamped to [0, 1].  Then the effective
4745    texture value L_t, I_t, or A_t is computed as follows:
4746
4747
4748Additions to Chapter 4 of the OpenGL 1.5 Specification (Per-Fragment
4749Operations and the Frame Buffer)
4750
4751    None.
4752
4753
4754Additions to Chapter 5 of the OpenGL 1.5 Specification (Special Functions)
4755
4756    None.
4757
4758
4759Additions to Chapter 6 of the OpenGL 1.5 Specification (State and
4760State Requests)
4761
4762    Modify Section 6.1.12 of the ARB_vertex_program specification.
4763
4764    (Add new integer program parameter queries, plus language that program
4765    environment or local parameter query results are undefined if the query
4766    specifies a data type incompatible with the data type of the parameter
4767    being queried.)
4768
4769    The commands
4770
4771      void GetProgramEnvParameterdvARB(enum target, uint index,
4772                                       double *params);
4773      void GetProgramEnvParameterfvARB(enum target, uint index,
4774                                       float *params);
4775      void GetProgramEnvParameterIivNV(enum target, uint index,
4776                                       int *params);
4777      void GetProgramEnvParameterIuivNV(enum target, uint index,
4778                                        uint *params);
4779
4780    obtain the current value for the program environment parameter numbered
4781    <index> for the given program target <target>, and place the information
4782    in the array <params>.  The values returned are undefined if the data type
4783    of the components of the parameter is not compatible with the data type of
4784    <params>.  Floating-point components are compatible with "double" or
4785    "float"; signed and unsigned integer components are compatible with "int"
4786    and "uint", respectively.  The error INVALID_ENUM is generated if <target>
4787    specifies a nonexistent program target or a program target that does not
4788    support program environment parameters.  The error INVALID_VALUE is
4789    generated if <index> is greater than or equal to the
4790    implementation-dependent number of supported program environment
4791    parameters for the program target.
4792
4793    ...
4794
4795    The commands
4796
4797      void GetProgramLocalParameterdvARB(enum target, uint index,
4798                                         double *params);
4799      void GetProgramLocalParameterfvARB(enum target, uint index,
4800                                         float *params);
4801      void GetProgramLocalParameterIivNV(enum target, uint index,
4802                                         int *params);
4803      void GetProgramLocalParameterIuivNV(enum target, uint index,
4804                                          uint *params);
4805
4806    obtain the current value for the program local parameter numbered <index>
4807    belonging to the program object currently bound to <target>, and place
4808    the information in the array <params>.  The values returned are undefined
4809    if the data type of the components of the parameter is not compatible with
4810    the data type of <params>.  Floating-point components are compatible with
4811    "double" or "float"; signed and unsigned integer components are compatible
4812    with "int" and "uint", respectively.  The error INVALID_ENUM is generated
4813    if <target> specifies a nonexistent program target or a program target
4814    that does not support program local parameters.  The error INVALID_VALUE
4815    is generated if <index> is greater than or equal to the
4816    implementation-dependent number of supported program local parameters for
4817    the program target.
4818
4819    ...
4820
4821    The command
4822
4823      void GetProgramivARB(enum target, enum pname, int *params);
4824
4825    obtains program state for the program target <target>, writing ...
4826
4827    (add new paragraphs describing the new supported queries)
4828
4829    If <pname> is PROGRAM_ATTRIB_COMPONENTS_NV or
4830    PROGRAM_RESULT_COMPONENTS_NV, GetProgramivARB returns a single integer
4831    holding the number of active attribute or result variable components,
4832    respectively, used by the program object currently bound to <target>.
4833
4834    If <pname> is MAX_PROGRAM_ATTRIB_COMPONENTS_NV or
4835    MAX_PROGRAM_RESULT_COMPONENTS_NV, GetProgramivARB returns a single integer
4836    holding the maximum number of active attribute or result variable
4837    components, respectively, supported for programs of type <target>.
4838
4839
4840Additions to Appendix A of the OpenGL 1.5 Specification (Invariance)
4841
4842    None.
4843
4844
4845Additions to the AGL/GLX/WGL Specifications
4846
4847    None.
4848
4849
4850GLX Protocol
4851
4852    The following new rendering commands are sent to the server as part
4853    of a glXRender request.
4854
4855    ProgramLocalParameterI4ivNV
4856
4857        2           28               rendering command length
4858        2           4303             rendering command opcode
4859        4           ENUM             target
4860        4           CARD32           index
4861        4           INT32            params[0]
4862        4           INT32            params[1]
4863        4           INT32            params[2]
4864        4           INT32            params[3]
4865
4866    ProgramLocalParameterI4uivNV
4867
4868        2           28               rendering command length
4869        2           4305             rendering command opcode
4870        4           ENUM             target
4871        4           CARD32           index
4872        4           CARD32           params[0]
4873        4           CARD32           params[1]
4874        4           CARD32           params[2]
4875        4           CARD32           params[3]
4876
4877    ProgramEnvParameterI4ivNV
4878
4879        2           28               rendering command length
4880        2           4307             rendering command opcode
4881        4           ENUM             target
4882        4           CARD32           index
4883        4           INT32            params[0]
4884        4           INT32            params[1]
4885        4           INT32            params[2]
4886        4           INT32            params[3]
4887
4888    ProgramEnvParameterI4uivNV
4889
4890        2           28               rendering command length
4891        2           4309             rendering command opcode
4892        4           ENUM             target
4893        4           CARD32           index
4894        4           CARD32           params[0]
4895        4           CARD32           params[1]
4896        4           CARD32           params[2]
4897        4           CARD32           params[3]
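
    As an illustration of the fixed-size layouts above, the following C
    sketch serializes a ProgramLocalParameterI4ivNV command into a byte
    buffer.  The function name is hypothetical, and the sketch assumes that
    fields are written in the client's native byte order.  The other three
    commands differ only in the opcode and in the signedness of <params>.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical encoder for ProgramLocalParameterI4ivNV.  `out` must have
 * room for 28 bytes.  Returns the number of bytes written, which equals
 * the rendering command length. */
static size_t encode_program_local_parameter_i4iv(uint8_t *out,
                                                  uint32_t target,
                                                  uint32_t index,
                                                  const int32_t params[4])
{
    const uint16_t length = 28;    /* 2 + 2 + 4 + 4 + 4*4 bytes */
    const uint16_t opcode = 4303;  /* rendering command opcode */
    size_t off = 0;

    assert(out != NULL && params != NULL);
    memcpy(out + off, &length, sizeof length); off += sizeof length;
    memcpy(out + off, &opcode, sizeof opcode); off += sizeof opcode;
    memcpy(out + off, &target, sizeof target); off += sizeof target;
    memcpy(out + off, &index,  sizeof index);  off += sizeof index;
    memcpy(out + off, params, 4 * sizeof params[0]);
    off += 4 * sizeof params[0];
    return off;
}
```

    The returned size is 28 bytes, matching the command length field:  an
    8-byte header of length, opcode, and target is followed by the index
    and the four 32-bit parameter components.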
4898
4899    The following new rendering commands are added.  These can be sent as a
4900    glXRender or glXRenderLarge request.
4901
4902    ProgramLocalParametersI4ivNV
4903
4904        2           16+count*4*4     rendering command length
4905        2           4304             rendering command opcode
4906        4           ENUM             target
4907        4           CARD32           index
4908        4           CARD32           count
4909        4*count*4   LISTofINT32      params
4910
4911    If the command is encoded in a glXRenderLarge request, the
4912    command opcode and command length fields above are expanded to
4913    4 bytes each:
4914
4915        4           20+count*4*4     rendering command length
4916        4           4304             rendering command opcode
4917
4918    ProgramLocalParametersI4uivNV
4919
4920        2           16+count*4*4     rendering command length
4921        2           4306             rendering command opcode
4922        4           ENUM             target
4923        4           CARD32           index
4924        4           CARD32           count
4925        4*count*4   LISTofCARD32     params
4926
4927    If the command is encoded in a glXRenderLarge request, the
4928    command opcode and command length fields above are expanded to
4929    4 bytes each:
4930
4931        4           20+count*4*4     rendering command length
4932        4           4306             rendering command opcode
4933
4934    ProgramEnvParametersI4ivNV
4935
4936        2           16+count*4*4     rendering command length
4937        2           4308             rendering command opcode
4938        4           ENUM             target
4939        4           CARD32           index
4940        4           CARD32           count
4941        4*count*4   LISTofINT32      params
4942
4943    If the command is encoded in a glXRenderLarge request, the
4944    command opcode and command length fields above are expanded to
4945    4 bytes each:
4946
4947        4           20+count*4*4     rendering command length
4948        4           4308             rendering command opcode
4949
4950    ProgramEnvParametersI4uivNV
4951
4952        2           16+count*4*4     rendering command length
4953        2           4310             rendering command opcode
4954        4           ENUM             target
4955        4           CARD32           index
4956        4           CARD32           count
4957        4*count*4   LISTofCARD32     params
4958
4959    If the command is encoded in a glXRenderLarge request, the
4960    command opcode and command length fields above are expanded to
4961    4 bytes each:
4962
4963        4           20+count*4*4     rendering command length
4964        4           4310             rendering command opcode
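
    The length fields of the variable-size commands above follow directly
    from the layouts:  a fixed header plus 16 bytes (four 32-bit components)
    per parameter.  A minimal C sketch, with hypothetical function names:

```c
#include <stdint.h>

/* Command length when sent in a glXRender request:
 * 2 (length) + 2 (opcode) + 4 (target) + 4 (index) + 4 (count) = 16 bytes
 * of header, followed by count * 4 components * 4 bytes. */
static uint32_t params_render_length(uint32_t count)
{
    return 16u + count * 4u * 4u;
}

/* Command length when sent in a glXRenderLarge request: the length and
 * opcode fields grow to 4 bytes each, so the header is 20 bytes. */
static uint32_t params_render_large_length(uint32_t count)
{
    return 20u + count * 4u * 4u;
}
```

    For example, three parameters give a length of 64 bytes in a glXRender
    request and 68 bytes in a glXRenderLarge request.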
4965
4966    The remaining commands are non-rendering commands.  These commands
4967    are sent separately (i.e., not as part of a glXRender or
4968    glXRenderLarge request), using the glXVendorPrivateWithReply
4969    request:
4970
4971    GetProgramLocalParameterIivNV
4972        1           CARD8            opcode (X assigned)
4973        1           17               GLX opcode (X_GLXVendorPrivateWithReply)
4974        2           5                request length
4975        4           1365             vendor specific opcode
4976        4           GLX_CONTEXT_TAG  context tag
4977        4           ENUM             target
4978        4           CARD32           index
4979      =>
4980        1           1                reply
4981        1           CARD8            unused
4982        2           CARD16           sequence number
4983        4           4                reply length
4984        24          CARD32           unused
4985        16          INT32            params
4986
4987    GetProgramLocalParameterIuivNV
4988        1           CARD8            opcode (X assigned)
4989        1           17               GLX opcode (X_GLXVendorPrivateWithReply)
4990        2           5                request length
4991        4           1366             vendor specific opcode
4992        4           GLX_CONTEXT_TAG  context tag
4993        4           ENUM             target
4994        4           CARD32           index
4995      =>
4996        1           1                reply
4997        1           CARD8            unused
4998        2           CARD16           sequence number
4999        4           4                reply length
5000        24          CARD32           unused
5001        16          CARD32           params
5002
5003    GetProgramEnvParameterIivNV
5004        1           CARD8            opcode (X assigned)
5005        1           17               GLX opcode (X_GLXVendorPrivateWithReply)
5006        2           5                request length
5007        4           1367             vendor specific opcode
5008        4           GLX_CONTEXT_TAG  context tag
5009        4           ENUM             target
5010        4           CARD32           index
5011      =>
5012        1           1                reply
5013        1           CARD8            unused
5014        2           CARD16           sequence number
5015        4           4                reply length
5016        24          CARD32           unused
5017        16          INT32            params
5018
5019    GetProgramEnvParameterIuivNV
5020        1           CARD8            opcode (X assigned)
5021        1           17               GLX opcode (X_GLXVendorPrivateWithReply)
5022        2           5                request length
5023        4           1368             vendor specific opcode
5024        4           GLX_CONTEXT_TAG  context tag
5025        4           ENUM             target
5026        4           CARD32           index
5027      =>
5028        1           1                reply
5029        1           CARD8            unused
5030        2           CARD16           sequence number
5031        4           4                reply length
5032        24          CARD32           unused
5033        16          CARD32           params
5034
5035Errors
5036
5037    The error INVALID_VALUE is generated by ProgramLocalParameter4fARB,
5038    ProgramLocalParameter4fvARB, ProgramLocalParameter4dARB,
5039    ProgramLocalParameter4dvARB, ProgramLocalParameterI4iNV,
5040    ProgramLocalParameterI4ivNV, ProgramLocalParameterI4uiNV,
5041    ProgramLocalParameterI4uivNV, GetProgramLocalParameterfvARB,
5042    GetProgramLocalParameterdvARB, GetProgramLocalParameterIivNV, and
5043    GetProgramLocalParameterIuivNV if <index> is greater than or equal to the
5044    number of program local parameters supported by <target>.
5045
5046    The error INVALID_VALUE is generated by ProgramEnvParameter4fARB,
5047    ProgramEnvParameter4fvARB, ProgramEnvParameter4dARB,
5048    ProgramEnvParameter4dvARB, ProgramEnvParameterI4iNV,
5049    ProgramEnvParameterI4ivNV, ProgramEnvParameterI4uiNV,
5050    ProgramEnvParameterI4uivNV, GetProgramEnvParameterfvARB,
5051    GetProgramEnvParameterdvARB, GetProgramEnvParameterIivNV, and
5052    GetProgramEnvParameterIuivNV if <index> is greater than or equal to the
5053    number of program environment parameters supported by <target>.
5054
5055    The error INVALID_VALUE is generated by ProgramLocalParameters4fvNV,
5056    ProgramLocalParametersI4ivNV, and ProgramLocalParametersI4uivNV if the sum
5057    of <index> and <count> is greater than the number of program local
5058    parameters supported by <target>.
5059
5060    The error INVALID_VALUE is generated by ProgramEnvParameters4fvNV,
5061    ProgramEnvParametersI4ivNV, and ProgramEnvParametersI4uivNV if the sum of
5062    <index> and <count> is greater than the number of program environment
5063    parameters supported by <target>.
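
    The two index checks described above can be sketched in C as follows.
    The enum and function names are hypothetical stand-ins, and
    "max_params" represents the number of local or environment parameters
    supported by <target>.

```c
#include <stdint.h>

/* Hypothetical stand-ins for "no error" and the INVALID_VALUE error. */
enum param_check { PARAM_OK, PARAM_INVALID_VALUE };

/* Single-parameter commands and queries: <index> must be less than the
 * number of supported parameters. */
static enum param_check check_param_index(uint32_t index,
                                          uint32_t max_params)
{
    return index >= max_params ? PARAM_INVALID_VALUE : PARAM_OK;
}

/* Multi-parameter commands: <index> + <count> must not exceed the number
 * of supported parameters.  The sum is formed in 64 bits so that it
 * cannot wrap around. */
static enum param_check check_param_range(uint32_t index, uint32_t count,
                                          uint32_t max_params)
{
    return ((uint64_t)index + count > max_params) ? PARAM_INVALID_VALUE
                                                  : PARAM_OK;
}
```

    With 256 supported parameters, index 255 is accepted and index 256 is
    rejected; a range starting at 250 may cover at most 6 parameters.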
5064
5065
5066Dependencies on NV_parameter_buffer_object
5067
5068    If NV_parameter_buffer_object is not supported, references to program
5069    parameter buffer variables and bindings should be removed.
5070
5071
5072Dependencies on ARB_texture_rectangle
5073
5074    If ARB_texture_rectangle is not supported, references to rectangle
5075    textures and the RECT and SHADOWRECT texture target identifiers should be
5076    removed.
5077
5078
5079Dependencies on EXT_gpu_program_parameters
5080
5081    If EXT_gpu_program_parameters is not supported, references to the
5082    Program{Local,Env}Parameters4fvNV commands, which set multiple program
5083    local or environment parameters in a single call, should be removed.
5084    These prototypes were included in this spec for completeness only.
5085
5086
5087Dependencies on EXT_texture_integer
5088
5089    If EXT_texture_integer is not supported, references to texture lookups
5090    returning integer values in Section 2.X.4.4 (Texture Access) should be
5091    removed, and all texture formats are considered to produce floating-point
5092    values.
5093
5094
5095Dependencies on EXT_texture_array
5096
5097    If EXT_texture_array is not supported, references to array textures in
5098    Section 2.X.4.4 (Texture Access) and elsewhere should be removed, as
5099    should all references to the "ARRAY1D", "ARRAY2D", "SHADOWARRAY1D", and
5100    "SHADOWARRAY2D" tokens.
5101
5102
5103Dependencies on EXT_texture_buffer_object
5104
5105    If EXT_texture_buffer_object is not supported, references to buffer
5106    textures in Section 2.X.4.4 (Texture Access) and elsewhere should be
5107    removed, as should all references to the "BUFFER" tokens.
5108
5109
5110Dependencies on NV_primitive_restart
5111
5112    If NV_primitive_restart is supported, index values causing a primitive
5113    restart are not considered as specifying an End command, followed by
5114    another Begin.  Primitive restart is therefore not guaranteed to
5115    immediately update bindings for material properties changed inside a
5116    Begin/End.  The spec language says they "are not guaranteed to update
5117    program parameter bindings until the following End command."
5118
5119
5120New State
5121
5122                                                         Initial
5123    Get Value                     Type  Get Command       Value  Description             Sec     Attrib
5124    ----------------------------  ----  ---------------  ------- ----------------------  ------  ------
5125    PROGRAM_ATTRIB_COMPONENTS_NV  Z+    GetProgramivARB     -    number of components    6.1.12   -
5126                                                                 used for attributes
5127    PROGRAM_RESULT_COMPONENTS_NV  Z+    GetProgramivARB     -    number of components    6.1.12   -
5128                                                                 used for results
5129
5130    Table X.20.  New Program Object State.  Program object queries return
5131    attributes of the program object currently bound to the program target
5132    <target>.
5133
5134
5135New Implementation Dependent State
5136
5137                                                             Minimum
5138    Get Value                         Type  Get Command       Value   Description           Sec.   Attrib
5139    --------------------------------  ----  ---------------  -------  --------------------- ------ ------
5140    MIN_PROGRAM_TEXEL_OFFSET_EXT      Z     GetIntegerv        -8     minimum texel offset  2.X.4.4  -
5141                                                                      allowed in lookup
5142    MAX_PROGRAM_TEXEL_OFFSET_EXT      Z     GetIntegerv        +7     maximum texel offset  2.X.4.4  -
5143                                                                      allowed in lookup
5144    MAX_PROGRAM_ATTRIB_COMPONENTS_NV  Z+    GetProgramivARB    (*)    maximum number of     6.1.12   -
5145                                                                      components allowed
5146                                                                      for attributes
5147    MAX_PROGRAM_RESULT_COMPONENTS_NV  Z+    GetProgramivARB    (*)    maximum number of     6.1.12   -
5148                                                                      components allowed
5149                                                                      for results
5150    MAX_PROGRAM_GENERIC_ATTRIBS_NV    Z+    GetProgramivARB    (*)    number of generic     6.1.12   -
5151                                                                      attribute vectors
5152                                                                      supported
5153    MAX_PROGRAM_GENERIC_RESULTS_NV    Z+    GetProgramivARB    (*)    number of generic     6.1.12   -
5154                                                                      result vectors
5155                                                                      supported
5156    MAX_PROGRAM_CALL_DEPTH_NV         Z+    GetProgramivARB     4     maximum program       2.X.5    -
5157                                                                      call stack depth
5158    MAX_PROGRAM_IF_DEPTH_NV           Z+    GetProgramivARB     48    maximum program       2.X.5    -
5159                                                                      if nesting
5160    MAX_PROGRAM_LOOP_DEPTH_NV         Z+    GetProgramivARB     4     maximum program       2.X.5    -
5161                                                                      loop nesting
5162
5163    Table X.21:  New Implementation-Dependent Values Introduced by
5164    NV_gpu_program4.  (*) means that the required minimum is program
5165    type-specific.  There are separate limits for each program type.
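
    As a rough illustration of how an application might sanity-check the
    fixed minimums in Table X.21, the Python sketch below compares a set of
    reported limits against the table.  The dictionary "reported" and the
    function name "unmet_limits" are hypothetical; in a real application the
    values would come from GetIntegerv or GetProgramivARB.  The sketch
    assumes that a MIN_* limit must be at least as negative as the table
    value and that a MAX_* limit must be at least as large.  Entries marked
    (*) are omitted because their minimums are program type-specific.

```python
# Minimum required values copied from Table X.21 (entries marked (*) omitted).
REQUIRED = {
    "MIN_PROGRAM_TEXEL_OFFSET_EXT": -8,
    "MAX_PROGRAM_TEXEL_OFFSET_EXT": 7,
    "MAX_PROGRAM_CALL_DEPTH_NV": 4,
    "MAX_PROGRAM_IF_DEPTH_NV": 48,
    "MAX_PROGRAM_LOOP_DEPTH_NV": 4,
}


def unmet_limits(reported: dict) -> list:
    """Return the names whose reported value is weaker than the table requires."""
    failures = []
    for name, required in REQUIRED.items():
        value = reported[name]
        # Assumption: MIN_* limits must be <= the table value,
        # MAX_* limits must be >= the table value.
        if name.startswith("MIN_"):
            ok = value <= required
        else:
            ok = value >= required
        if not ok:
            failures.append(name)
    return failures
```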
5166
5167
5168Issues
5169
5170    (1) How does this extension differ from previous NV_vertex_program and
5171    NV_fragment_program extensions?
5172
5173      RESOLVED:
5174
5175        - This extension provides a uniform set of instructions and bindings.
5176          Unlike previous extensions, the set of instructions and bindings
5177          available is generally the same.  The only exceptions are a small
5178          number of instructions and bindings that make sense for one specific
5179          program type.
5180
5181        - This extension supports integer data types and provides a
5182          full-fledged integer instruction set.
5183
5184        - This extension supports array variables of all types, including
5185          temporaries.  Array variables can be accessed directly or indirectly
5186          (using integer temporaries as indices).
5187
5188        - This extension provides a uniform set of structured branching
5189          constructs (if tests, loops, subroutines) that fully support
5190          run-time condition testing.  Previous versions of NV_vertex_program
5191          provided unstructured branching.  Previous versions of
5192          NV_fragment_program provided structured branching constructs, but the
5193          support was more limited -- for example, looping constructs couldn't
5194          specify loop counts with values computed at run time.
5195
5196        - This extension supports geometry programs, which are described in
5197          more detail in the NV_geometry_program4 extension.
5198
5199        - This extension provides the ability to specify and use cubemap
5200          textures with a DEPTH_COMPONENT internal format.  Shadow mapping is
5201          supported; the Q texture coordinate is used as the reference value
5202          for comparisons.
5203
5204    (2) Is this extension backward-compatible with previous NV_vertex_program
5205    and NV_fragment_program extensions?  If not, what support has been
5206    removed?
5207
5208      RESOLVED:  This extension is largely, but not completely,
5209      backward-compatible.  Functionality removed includes:
5210
5211        - Unstructured branching:  NV_vertex_program2 included a general
5212          branch instruction "BRA" that could be used to jump to an arbitrary
5213          instruction.  The "CAL" instruction could "call" to an arbitrary
5214          instruction into code that was not necessarily structured as simple
5215          subroutine blocks.  Arbitrary unstructured branching can be
5216          difficult to implement efficiently on highly parallel GPU
5217          architectures, while basic structured branching is not nearly as
5218          difficult.
5219
5220          This extension retains the "CAL" instruction but treats each block
5221          of code between instruction labels as a separate subroutine.  The
5222          "BRA" instruction and arbitrary branching have been removed.  The
5223          structured branching constructs in this extension are sufficient to
5224          implement almost all of the looping/branching support in high-level
5225          languages ("goto" being the most obvious exception).
5226
5227        - Address registers:  NV_vertex_program added the notion of address
5228          registers, which were effectively under-powered integer temporaries.
5229          The set of instructions used to manipulate address registers was
5230          severely limited.  NV_vertex_program[23] extended the original
5231          scalars to vectors and added a few more instructions to manipulate
5232          address registers.  Fragment programs had no address registers until
5233          NV_fragment_program2 added the loop counter, which was very similar
5234          in functionality to vertex program address registers, but even more
5235          limited.  This extension adds true integer temporaries, which can
5236          accomplish everything old address registers could do, and much more.
5237          Address register support was removed to simplify the API.
5238
5239        - NV_fragment_program2 LOOP construct:  NV_fragment_program2 added a
5240          LOOP instruction, which let you repeat a block of code <N> times,
5241          with a parallel loop counter that started at <A> and stepped by <B>
5242          on each iteration.  This construct was significantly limited in
5243          several ways -- the loop count had to be constant, and you could
5244          only access the innermost loop counter in a nested loop.  This
5245          extension discards the support and retains the simpler "REP"
5246          construct to implement loops.  If desired, a loop counter can be
5247          implemented by manipulating an integer temporary.  The "BRK"
5248          instruction (conditional break) is retained, and a "CONT"
5249          instruction (conditional continue) is added.  Additionally, the loop
5250          count need not be a constant.
5251
5252        - NV_vertex_program and ARB_vertex_program EXP and LOG instructions:
5253          NV_vertex_program provided EXP and LOG instructions that computed a
5254          rough approximation of 2^x or log_2(x) and provided some additional
5255          values that could help refine the approximation.  Those opcodes were
5256          carried forward into ARB_vertex_program.  Both ARB_vertex_program
5257          and NV_vertex_program2 provided EX2 and LG2 instructions that
5258          computed a better approximation.  All fragment program extensions
5259          also provided EX2 and LG2, but did not bother to include EXP and
5260          LOG.  On the hardware targeted by this extension, there is no
5261          advantage to using EXP and LOG, so these opcodes have been removed
5262          for simplicity.
5263
5264        - NV_vertex_program3 and NV_fragment_program2 provide the ability to
5265          do indirect addressing of inputs/outputs when using bindings in
5266          instructions -- for example:
5267
5268            MOV R0, vertex.attrib[A0.x+2];      # vertex
5269            MOV result.texcoord[A0.y], R1;      # vertex
5270            MOV R2, fragment.texcoord[A0.x];    # fragment
5271
5272          This extension provides indexing capability, but using named array
5273          variables instead.
5274
5275            ATTRIB attribs[] = { vertex.attrib[2..5] };
5276            MOV R0, attribs[A0.x];
5277            OUTPUT outcoords[] = { result.texcoord[0..3] };
5278            MOV outcoords[A0.y], R1;
5279            ATTRIB texcoords[] = { fragment.texcoord[0..2] };
5280            MOV R2, texcoords[A0.x];
5281
5282          This approach makes the set of attribute and result bindings more
5283          regular.  Additionally, it helps the assembler determine which
5284          vertex/fragment attributes are actually needed -- when the assembler
5285          sees constructs like "fragment.texcoord[A0.x]", it must treat *all*
5286          texture coordinates as live unless it can determine the range of
5287          values used for indexing.  The named array variable approach
5288          explicitly identifies which attributes are needed when indexing is
5289          used.
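
      The liveness argument in the last item can be sketched in Python.  This
      is only an illustration of the reasoning, not the assembler's actual
      algorithm; the function name "live_indices" and the "max_count"
      parameter (the number of bindings of that kind) are hypothetical.

```python
import re


def live_indices(binding: str, max_count: int) -> set:
    """Return the binding indices that must be treated as live.

    A range binding such as 'fragment.texcoord[0..2]' names its indices
    explicitly.  Anything else (for example a relative index such as
    'fragment.texcoord[A0.x]') gives no bound, so every index is live.
    """
    match = re.fullmatch(r"[\w.]+\[(\d+)\.\.(\d+)\]", binding)
    if match:
        low, high = int(match.group(1)), int(match.group(2))
        return set(range(low, high + 1))
    return set(range(max_count))
```

      With an assumed limit of eight texture coordinates, the range binding
      keeps three coordinates live, while the relative form keeps all eight.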
5290
5291      Functionality altered includes:
5292
5293        - The RSQ instruction in the original NV_vertex_program and
5294          ARB_vertex_program extensions implicitly took the absolute value of
5295          its operand.  Since the ARB extensions don't have numerics
5296          guarantees, computing the reciprocal square root of a negative value
5297          was not meaningful.  To allow for the possibility of taking the
5298          reciprocal square root of a negative value (which should yield NaN
5299          -- "not a number"), the RSQ instruction in this extension no
5300          longer implicitly takes the absolute value of its operand.
5301          Equivalent functionality can be achieved using the explicit |abs|
5302          absolute value operator on the operand to RSQ.
5303
5304        - The results of texture lookups accessing inconsistent textures are
5305          now undefined, instead of producing a fixed constant vector.
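
      The RSQ change listed above can be illustrated numerically.  The Python
      sketch below models the two behaviors; it is not the hardware's
      arithmetic, the function names "rsq_gpu4" and "rsq_legacy" are
      hypothetical, and the treatment of a zero operand is an assumption.

```python
import math


def rsq_gpu4(x: float) -> float:
    """Reciprocal square root with no implicit absolute value."""
    if math.isnan(x) or x < 0.0:
        return math.nan            # negative operand yields NaN
    if x == 0.0:
        return math.inf            # assumption: 1/sqrt(0) treated as +infinity
    return 1.0 / math.sqrt(x)


def rsq_legacy(x: float) -> float:
    """Older behavior: the absolute value of the operand is taken first."""
    return rsq_gpu4(abs(x))
```

      Calling rsq_gpu4(abs(x)) corresponds to applying the explicit |abs|
      operator to the RSQ operand.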
5306
5307
5308    (3) What should this set of extensions be called?
5309
5310      RESOLVED:  NV_gpu_program4, NV_vertex_program4, NV_fragment_program4,
5311      and NV_geometry_program4.  Only NV_gpu_program4 will appear in the
5312      extension string; the other three specifications exist simply to define
5313      vertex, fragment, and geometry program-specific features.
5314
5315      The "gpu_program" name was chosen due to the common instruction set
5316      intended to run on GPUs.  On previous chip generations, the vertex and
5317      fragment instruction sets were similar, but there were enough
5318      differences to package them separately.
5319
5320      The choice of "4" indicates that this is the fourth generation of
5321      programmable hardware from NVIDIA.  The GeForce3 and GeForce4 series
5322      supported NV_vertex_program.  The GeForce FX series supported
5323      NV_vertex_program2 and added fragment programmability with
5324      NV_fragment_program.  Around this time, the OpenGL Architecture Review
5325      Board (ARB) approved ARB_vertex_program and ARB_fragment_program
5326      extensions, and NVIDIA added NV_vertex_program2_option and
5327      NV_fragment_program_option extensions exposing GeForce FX features using
5328      the ARB extensions' instruction set.  The GeForce6 and GeForce7 series
5329      brought the NV_vertex_program3 and NV_fragment_program2 extensions,
5330      which extend the ARB extensions further.  This extension adds geometry
5331      programs, and brings the "version number" for each of these extensions
5332      up to "4".
5333
5334
5335    (4) This extension adds integer data type support in programmable
5336    shaders that were previously float-centric.  Should applications be able
5337    to pass integer values directly to the shaders, and if so, how does it
5338    work?
5339
5340      RESOLVED:  The diagram at the bottom of this issue depicts data flows in
5341      the GL, as extended by this and related extensions.
5342
5343      This extension generalizes some state to be "typeless", instead of being
5344      strongly typed (and almost invariably floating-point) as in the core
5345      specification.  We introduce a new set of functions to specify GL state
5346      as signed or unsigned integer values, instead of floating point values.
5347      These functions include:
5348
5349        * VertexAttribI*{i,ui}() -- Specify generic vertex attributes as
5350          integers.  This extension does not create "integer" versions for
5351          fixed-function attribute functions (e.g., glColor, glTexCoord),
5352          which remain fully floating-point.
5353
5354        * Program{Env,Local}ParameterI*{i,ui}() -- Specify environment and
5355          local parameters as integers.
5356
5357        * TexImage*() with EXT_texture_integer internal formats -- Specify
5358          texture images as containing integer data whose values are not
5359          converted to floating-point values.
5360
5361        * NV_parameter_buffer_object functions -- Bind (typeless) buffer
5362          object data stores for use as program parameters.  These buffer
5363          objects can be loaded with either integer or floating-point data.
5364
5365        * EXT_texture_buffer_object functions -- Bind (typeless) buffer object
5366          data stores for use as textures.  These buffer objects can be loaded
5367          with either integer or floating-point data.
5368
5369      Each type of program (using NV_gpu_program4 and related extensions) can
5370      read attributes using any data type (float, signed integer, unsigned
5371      integer) and write result values used by subsequent stages using any
5372      data type.
5373
5374      Finally, there are several new places where integer data can be
5375      consumed by the GL:
5376
5377        * NV_transform_feedback -- Stream transformed vertex attribute
5378          components to a (typeless) buffer object.  The transformed
5379          attributes can be written as signed or unsigned integers in vertex
5380          and geometry programs.
5381
5382        * EXT_texture_integer internal formats and framebuffer objects --
5383          Provide support for rendering to integer texture formats, where
5384          final fragment values are treated as signed or unsigned integers,
5385          rather than floating-point values.
5386
5387      The diagram below represents a substantial portion of the GL pipeline.
5388      Each line connecting blocks represents an interface where data is
5389      "produced" from the GL state or by fixed-function or programmable
5390      pipeline stages and "consumed" by another pipeline stage.  Each producer
5391      and consumer is labeled with a data type.  For producers, the
5392      "(typeless)" designation generally means that the state and/or output
5393      can be written as floating-point values or as signed or unsigned
5394      integers.  "(float)" means that the outputs are always written as
5395      floating-point.  The same distinction applies to consumers --
5396      "(typeless)" means that the consumer is capable of reading inputs using
5397      any data type, and "(float)" means that consumer always reads inputs as
5398      floating-point values.
5399
5400      To get sane results, applications must ensure that each value passed
5401      between pipeline stages is produced and consumed using the same data
5402      type.  If a value is written in one stage as a floating-point value, it
5403      must be read as a floating-point value as well.  If such a value is read
5404      as a signed or unsigned integer, its value is considered undefined.  In
5405      practice, the raw bits used to represent the floating-point value (IEEE
5406      single-precision encoding in the initial implementation of this spec)
5407      will be treated as an integer.
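
      A short Python sketch of what "raw bits treated as an integer" means in
      practice, assuming the IEEE single-precision encoding mentioned above.
      The helper names are hypothetical, and the result of such a mismatched
      read remains formally undefined.

```python
import struct


def float_bits_as_uint(value: float) -> int:
    """Reinterpret the single-precision encoding of value as an unsigned 32-bit integer."""
    return struct.unpack("<I", struct.pack("<f", value))[0]


def float_bits_as_int(value: float) -> int:
    """Reinterpret the single-precision encoding of value as a signed 32-bit integer."""
    return struct.unpack("<i", struct.pack("<f", value))[0]
```

      For example, a value written as the float 1.0 and read back as an
      unsigned integer comes out as 0x3F800000 (1065353216), not 1.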
5408
5409      Type matching between stages is not enforced by the GL, because the
5410      overhead of doing so would be substantial.  Such overhead would include:
5411
5412        * matching the inputs and outputs of each pipeline stage
5413          (fixed-function or programmable) every time the program
5414          configuration or fixed-function state changes,
5415
5416        * tracking the data type of each generic vertex attribute and checking
5417          it against the vertex program's inputs,
5418
5419        * tracking the data type of each program parameter and checking it
5420          against the manner in which the parameters are used in programs,
5421
5422        * matching color buffers against fragment program outputs.
5423
5424      Such error checking is certainly valuable, but the additional CPU
5425      overhead cost is substantial.  Given that current CPUs often have a hard
5426      time keeping up with high-end GPUs, adding more overhead is a step in
5427      the wrong direction.  We expect developer tools, such as instrumented
5428      drivers, to be able to provide type checking on most interfaces.
5429
5430      The diagram below depicts assembly programmability.  Using vertex,
5431      geometry, and fragment shaders provided by the OpenGL Shading Language
5432      (GLSL) isn't substantially different from the assembly interface, except
5433      that the interfaces between programmable pipeline stages are more
5434      tightly coupled in GLSL (vertex, geometry, and fragment shaders are
5435      linked together into a single program object), and that shader variables
5436      are more strongly typed in GLSL than in the assembly interface.
5437
5438      In the figure below, the first programmable stage is vertex program
5439      execution.  All inputs read by the vertex program must be specified
5440      through the GL vertex APIs (immediate mode or vertex arrays) using a
5441      data type matching the data type read by the shader.  Additionally,
5442      vertex programs (and all other program types) can read program
5443      parameters, parameter buffers, and textures.  In all cases the
5444      parameter, buffer, or texture data must be accessed in the shader using
5445      the same data type used to specify the data.  If vertex programs are
5446      disabled, fixed-function vertex processing is used.  Fixed-function
5447      vertex processing is fully floating-point, and all the conventional
5448      vertex attributes and state used by fixed-function are floating-point
5449      values.
5450
5451      After vertex processing, an optional geometry program can be executed,
5452      which reads attributes written by vertex programs (or fixed-function) and
5453      writes out new vertex attributes.  The vertex attributes it reads must
5454      have been written by the vertex program (or fixed-function) using a
5455      matching data type.
5456
5457      After geometry program execution, vertex attributes can optionally be
5458      written out to buffer objects using the NV_transform_feedback extension.
5459      The vertex attributes are written by the GL to the buffer objects using
5460      the same data type used to write the attribute in the geometry program
5461      (or vertex program if geometry programs are disabled).
5462
5463      Then, rasterization generates fragments based on transformed vertices.
5464      Most attributes written by vertex or geometry programs can be read by
5465      fragment programs, after the rasterization hardware "interpolates" them.
5466      This extension allows fragment programs to control how each attribute is
5467      interpolated.  If an attribute is flat-shaded, it will be taken from the
5468      output attribute of the provoking vertex of the primitive using the same
5469      data type.  If an attribute is smooth-shaded, the per-vertex attributes
5470      are read as floating-point values and interpolated to a floating-point
5471      result.  One necessary consequence of this is that any integer
5472      per-fragment attributes must be flat-shaded.  To prevent some
5473      interpolation type errors, assembly and GLSL fragment shaders will not
5474      compile if they declare an integer fragment attribute that is not flat
5475      shaded.  [NOTE:  While point primitives generally have constant
5476      attributes, any integer attributes must still be flat-shaded; point
5477      rasterization may perform (degenerate) floating-point interpolation.]
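
      The flat versus smooth distinction and the compile-time check can be
      sketched in Python as follows.  This is an illustration only: the
      function names, the weight list, and the "provoking" index parameter
      are hypothetical, and real interpolation is more involved than a plain
      weighted sum.

```python
def check_fragment_attrib(data_type: str, flat: bool) -> None:
    """Reject integer fragment attributes that are not flat-shaded."""
    if data_type in ("INT", "UINT") and not flat:
        raise ValueError("integer fragment attributes must be flat-shaded")


def shade(values, weights, flat: bool, provoking: int = 0):
    """Produce the per-fragment value of one attribute component.

    Flat shading passes the provoking vertex's value through unchanged, so
    its data type is preserved.  Smooth shading treats every per-vertex
    value as floating-point and returns a floating-point result.
    """
    if flat:
        return values[provoking]
    return sum(w * float(v) for w, v in zip(weights, values))
```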
5478
5479      Fragment programs must read attributes using data types matching the
5480      outputs of the interpolation or flat-shading operations.  They may write
5481      one or more color outputs using any data type, but the data type used
5482      must match the corresponding framebuffer attachments.  Outputs directed
5483      at signed or unsigned integer textures (EXT_texture_integer) must be
5484      written using the appropriate integer data type; all other outputs must
5485      be written as floating-point values.  Note that some of the
5486      fixed-function per-fragment operations (e.g., blending, alpha test) are
5487      specified as floating-point operations and are skipped when directed at
5488      signed or unsigned integer color buffers.
5489
5490
5491
5492                                     generic               conventional
5493                                     vertex                  vertex
5494                                    attributes              attributes
5495                                       | (typeless)             | (float)
5496                                       |                        |
5497                                       |                        |
5498                                       | +----------------------+
5499         program                       | |                      |
5500        parameters ----+               | |                      |
5501        (typeless)     |               | | (typeless)           | (float)
5502                       |               V V                      V
5503         constant      +-+----------> vertex              fixed-function
5504         buffers   ----+ |(typeless)  program                 vertex
5505        (typeless)     | |              |                       |
5506                       | |              | (typeless)            | (float)
5507         textures  ----+ |              V                       |
5508        (typeless)       |              |<----------------------+
5509            |            |              |
5510            |            |              +---------------+
5511            |            |              |               |
5512            |            |              | (typeless)    |
5513            |            |              V               |
5514            |            +---------> geometry           |
5515            |            |(typeless) program            |
5516            |            |              |               |
5517            |            |              | (typeless)    |
5518            |            |              V               |
5519            |            |              |<--------------+
5520            |            |              |
5521            |            |              |
5522            |            |              +-----------------+
5523            |            |              |                 |(typeless)
5524            |            |              |                 v
5525            |            |              |             transform
5526            |            |              |             feedback
5527            |            |              |              buffers
5528            |            |              |
5529            |            |              |
5530            |            |              +-----------------------+
5531            |            |              |                       |
5532            |            |              | (float)               | (typeless)
5533            |            |              V                       V
5534            |            |         interpolated               flat
5535            |            |          attributes             attributes
5536            |            |              |                       |
5537            |            |              | (float)               | (typeless)
5538            |            |              V                       |
5539            |            |              |<----------------------+
5540            |            |              |
5541            |            |              +-----------------------+
5542            |            |              |                       |
5543            |            |              | (typeless)            | (float)
5544            |            |(typeless)    V                       V
5545            |            +---------> fragment     +------> fixed-function
5546            |                        program      |(float)   fragment
5547            |                           |         |             |
5548            +--------------------------/|/--------+             |
5549                                        |                       |
5550                                        | (typeless)            | (float)
5551                                        V                       |
5552                                        |<----------------------+
5553                                        |
5554                                        +-----------------------+------ ....
5555                                        |                       |
5556                                        | (typeless)            | (typeless)
5557                                        V                       V
5558                                      color                   color
5559                                    attachment              attachment
5560                                        0                       1
5561
5562
5563    (5) Instructions can operate on signed integer, unsigned integer, and
5564    floating-point values.  Some operations make sense on all three data
5565    types.  How is this supported, and what type checking support is provided
5566    by the assembler?
5567
5568      RESOLVED:  One important property of the instruction set is that the
5569      data type for all operands and the result is fully specified by the
5570      instructions themselves.  For instructions (such as ADD) that make sense
5571      for both integer and floating-point values, an optional data type
5572      modifier is provided to indicate which type of operation should be
5573      performed.  For example, "ADD.S", "ADD.U", and "ADD.F", add signed
5574      integers, unsigned integers, or floating-point values, respectively.  If
5575      no data type modifier is provided, ".F" is assumed if the instruction
5576      can apply to floating-point values and ".S" is assumed otherwise.
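
      The default-modifier rule can be written out as a small Python sketch.
      The opcode table is an assumed, partial subset chosen for illustration
      (ADD accepting all three types, AND accepting integers only), the
      function name is hypothetical, and only a single modifier is handled
      even though real instructions may carry several.

```python
# Assumed subset of opcodes and the data types each one accepts.
OPCODE_TYPES = {
    "ADD": {"F", "S", "U"},
    "MUL": {"F", "S", "U"},
    "AND": {"S", "U"},
}


def effective_type(instruction: str) -> str:
    """Return the data type ('F', 'S', or 'U') an opcode operates on."""
    opcode, _, modifier = instruction.partition(".")
    allowed = OPCODE_TYPES[opcode]
    if modifier:
        if modifier not in allowed:
            raise ValueError(f"{opcode} does not accept .{modifier}")
        return modifier
    # No modifier: ".F" if the opcode can apply to floats, ".S" otherwise.
    return "F" if "F" in allowed else "S"
```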
5577
5578      To help identify errors where the wrong data type is used -- for
5579      example, adding integer values in an ADD instruction that omits a data
5580      type modifier and thus defaults to "ADD.F" -- variables may be declared
5581      with optional data type modifiers.  In the following code:
5582
5583        INT TEMP a;
5584        UINT TEMP b;
5585        FLOAT TEMP c;
5586        TEMP d;
5587
5588      "a", "b", "c", and "d" are declared as temporary variables holding
5589      signed integer, unsigned integer, floating-point, and typeless values.
5590      Since each instruction fully specifies the data type of each operand and
5591      its result, these data types can be checked against the data type
5592      assigned to the variables operated on.  If the types don't match, and
5593      the variable is not typeless, an error is reported.  The opcode modifier
5594      ".NTC" can be used to ignore such errors on a per-opcode basis, if
5595      required.
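
      A minimal Python sketch of the type check described above.  The
      function name and the use of None for a typeless variable are
      hypothetical conventions chosen for the example; this is not the
      assembler's implementation.

```python
# Declared variable type expected for each instruction data type.
EXPECTED_DECLARATION = {"S": "INT", "U": "UINT", "F": "FLOAT"}


def operand_type_ok(var_type, instr_type: str, ntc: bool = False) -> bool:
    """Check one operand against the data type the instruction specifies.

    var_type is 'INT', 'UINT', 'FLOAT', or None for a typeless variable.
    instr_type is 'S', 'U', or 'F'.  ntc models the ".NTC" opcode modifier.
    """
    if var_type is None or ntc:
        return True                      # typeless or checking suppressed
    return var_type == EXPECTED_DECLARATION[instr_type]
```

      With the declarations shown earlier, an "ADD.F" reading "a" (declared
      INT) fails the check, while the same instruction reading "d" (typeless)
      passes.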
5596
5597      Note that when bindings are used directly in instructions, they are
5598      always considered typeless for simplicity.  Some fixed-function bindings
5599      have an obvious data type, but other bindings (e.g., program parameters)
5600      can hold either integer or floating-point values, depending on how they
5601      were specified.
5602
5603      Variable data types are optional.  Typeless variables are provided
5604      because some programs may want to reuse the same variable in several
5605      places with different data types.
5606
5607    (6) Should both signed (INT) and unsigned integer (UINT) data types be
5608    provided?
5609
5610      RESOLVED:  Yes.  Signed and unsigned integer operations are supported.
5611      Providing both "INT" and "UINT" variable modifiers distinguishes between
5612      signed and unsigned values for type checking purposes, to ensure that
5613      unsigned values aren't read as signed values and vice versa.
5614
5615      This specification says if a value is read as a signed integer, but was
5616      written as an unsigned integer, the value returned is undefined.
5617      However, signed and unsigned integers are interchangeable in practice,
5618      except for very large unsigned integers (which can't be represented as
5619      signed values of the equivalent size) or negative signed integers.
5620
5621      If programs know that they won't generate negative or very large values,
5622      signed and unsigned integers can be used interchangeably.  To avoid type
5623      errors in the assembler in this case, typeless variables can be used.
5624      Or the ".NTC" modifier can be used when appropriate.
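
      The "interchangeable in practice" claim can be checked with a short
      Python sketch.  It assumes 32-bit two's-complement storage, which this
      specification does not promise, and the helper names are hypothetical.

```python
def as_signed32(bits: int) -> int:
    """Read a 32-bit pattern (given as an unsigned value) as a signed integer."""
    return bits - (1 << 32) if bits & 0x80000000 else bits


def as_unsigned32(value: int) -> int:
    """Read a signed 32-bit integer's bit pattern as an unsigned integer."""
    return value & 0xFFFFFFFF


def same_either_way(bits: int) -> bool:
    """True when the signed and unsigned readings of the pattern agree."""
    return as_signed32(bits) == bits
```

      Small non-negative values read the same either way; patterns with the
      top bit set (very large unsigned values, or negative signed values) do
      not.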
5625
5626    (7) Integer and floating-point constants are supported in the instruction
5627    set.  Integer constants might be interpreted to mean either "real integer"
5628    values or floating-point values.  How are they supported?
5629
5630      RESOLVED:  When an obvious floating-point constant is specified (e.g.,
5631      "3.0"), the developer's intent is clear.  If you try to use a
5632      floating-point value in an instruction that wants an integer operand, or
5633      a declaration of an integer parameter variable, the program will fail to
5634      load.  An integer constant used in an instruction isn't quite as clear.
5635      But its meaning can be easily inferred because the operand types of
5636      instructions are well-known at compile time.  An integer multiply
5637      involving the constant "2" will interpret the "2" as an integer.  A
5638      floating-point multiply involving the same constant "2" will interpret
5639      it as a floating-point value.
5640
5641      The only real problem is for a parameter declaration that is typeless.
5642      For typed variables, the intent is clear:
5643
5644        INT PARAM two = 2;               # use integer 2
5645        FLOAT PARAM twoPt0 = 2;          # use floating-point 2.0
5646
5647      For typeless variables, there's no context to go on:
5648
5649        PARAM two = 2;                   # 2?  2.0?
5650
5651      This extension is intended to be largely upward-compatible with
5652      ARB_vertex_program, ARB_fragment_program, and the other extensions built
5653      on top of them.  In all of these, the previous declaration is legal and
5654      means "2.0".  For compatibility, we choose to interpret integer
5655      constants in this case as floating-point values.  The assembler in the
5656      NVIDIA implementation will issue a warning if this case ever occurs.
5657
5658      This extension does not provide decoration of integer constant values --
5659      we considered adding suffixed integers such as "2U" to mean "2, and
5660      don't even think about converting me to a float!".  We expect that it
5661      will be sufficient to use the "INT" or "FLOAT" modifiers to disambiguate
5662      effectively.
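
      The resolution rules above can be summarized in a small Python sketch.
      This is an illustration only, not the assembler's actual logic; the
      function name resolve_constant and the "contains a decimal point" test
      for floating-point literals are assumptions made for this example.

```python
def resolve_constant(text, declared_type=None):
    """Return (value, warned) for a constant in a PARAM initializer.

    declared_type is "INT", "FLOAT", or None for a typeless declaration.
    """
    is_float_literal = "." in text          # simplification: "3.0" style only
    # int(text, 0) also accepts hexadecimal constants such as 0x87A3.
    number = float(text) if is_float_literal else int(text, 0)

    if declared_type == "INT":
        if is_float_literal:
            # A floating-point constant in an integer declaration is an error.
            raise ValueError("program fails to load")
        return number, False
    if declared_type == "FLOAT":
        return float(number), False
    # Typeless declaration: integer constants are treated as floating-point
    # for compatibility, and the assembler may warn about it.
    return float(number), not is_float_literal


print(resolve_constant("2", "INT"))
print(resolve_constant("2", "FLOAT"))
print(resolve_constant("2"))
```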
5663
5664    (8) Should hexadecimal constants (e.g., 0x87A3 or 0xFFFFFFFF) be supported?
5665
5666      RESOLVED:  Yes.
5667
5668    (9) Should we provide data type modifiers with explicit component sizes?
5669    For example, "INT8", "FLOAT16", or "INT32".  If so, should we provide a
5670    mechanism to query the size (in bits) of a variable, or of different
5671    variable types/qualifiers?
5672
5673      RESOLVED:  No.
5674
5675    (10) Should this extension provide better support for array variables?
5676
5677      RESOLVED:  Yes; array variables of all types are allowed.
5678
5679      In ARB_vertex_program, program parameter (constant) variables could be
5680      addressed as arrays.  Temporary variables, vertex attributes, and vertex
5681      results could not be declared as arrays.
5682
5683      In NV_vertex_program3 and NV_fragment_program2, relative addressing was
5684      supported in program bindings:
5685
5686        MOV R0, vertex.attrib[A0.x];            # vertex
5687        MOV result.texcoord[A0.x], R0;          # vertex
5688        MOV R0, fragment.texcoord[A0.x];        # fragment -- inside LOOP
5689
5690      Explicitly declared attribute or result arrays were not supported, and
5691      temporaries could also not be arrays.
5692
5693      This extension allows users to declare attribute, result, and temporary
5694      arrays such as:
5695
5696        ATTRIB attribs[] = { vertex.attrib[7..11] };
5697        TEMP scratch[10];
5698        RESULT texcoords[] = { result.texcoord[0..3] };
5699
5700      Additionally, the relative addressing mechanisms provided by
5701      NV_vertex_program3 and NV_fragment_program2 are NOT supported in this
5702      extension -- instead, declared array variables are the only way to get
5703      relative addressing.  Using declared arrays allows the assembler to
5704      identify which attributes will actually be used.  An expression like
5705      "vertex.texcoord[A0.x]" doesn't identify which texture coordinates are
5706      referenced, and the assembler must be conservative in this case and
5707      assume that they all are.
5708
5709    (11) Is relative addressing of temporaries allowed?
5710
5711      RESOLVED:  Yes.  However, arrays of temporaries may end up being stored
5712      in off-chip memory, and may be slower to access than non-array
5713      temporaries.
5714
5715    (12) Should this extension add bindings to pass generic attributes between
5716    vertex, geometry, and fragment programs, or are texture coordinates
5717    sufficient?
5718
5719      RESOLVED:  While texture coordinates have been used in the past, generic
5720      attributes should be provided.
5721
5722      The assembler provides a large set of bindings and automatically
5723      eliminates generic attributes or components that are unused.  At each
5724      interface between programs, there is an implementation-dependent limit
5725      on the number of attribute components that can be passed.
5726
5727      There are several reasons that this approach was chosen.  First, if the
5728      number of attributes that can be passed between program stages exceeds
5729      the number of existing texture coordinate sets supported when specifying
5730      vertices, a second implementation-dependent number of texture coordinates
5731      would need to be exposed to cover the number supported between stages.
5732      Second, the mechanisms described above reduce or eliminate the need to
5733      pack attributes into four-component vectors.  Third, "texture
5734      coordinates" that have been historically used for texture lookups don't
5735      need to be used to pass values that aren't used this way.
5736
5737    (13) The structured branching support in NV_fragment_program2 provides a
5738    REP instruction that says to repeat a block of code <N> times, as well as
5739    a LOOP instruction that does the same, but also provides a special loop
5740    counter variable.  What sort of looping mechanism should we provide here?
5741
5742      RESOLVED:  Provide only the REP instruction.  The functionality provided
5743      by the LOOP instruction can be easily achieved by using an integer
5744      temporary as the loop index.  This avoids two annoyances of the old LOOP
5745      models:  (a) the loop index (A0.x) is a special variable name, while all
5746      other variables are declared normally and (b) instructions can only
5747      access the loop index of the innermost loop -- loop indices at higher
5748      nesting levels are not accessible.
5749
5750      One other option was considered -- a "LOOPV" instruction (LOOP with a
5751      variable), where the program specifies a variable name and component to
5752      hold the loop index, instead of using the implicit variable name "A0.x".
5753      In the end, it was decided that using an integer temporary as a loop
5754      counter was sufficient.
5755
5756    (14) The structured branching support in NV_fragment_program2 provides a
5757    REP instruction that requires a loop count.  Some looping constructs may
5758    not have a definite loop count, such as a "while" statement in C.  Should
5759    this construct be supported, and if so, how?
5760
5761      RESOLVED:  The REP instruction is extended to make the loop count
5762      optional.  If no loop count is provided, the REP instruction specifies a
5763      loop that can only be exited using the BRK (break) or RET instructions.
5764      To avoid obvious infinite loops, an error will be reported if a
5765      REP/ENDREP block contains no BRK instruction at the current nesting
5766      level and no RET instruction at any nesting level.
5767
5768      To implement a loop like "while (value < 7.0) ...", code such as the
5769      following can be used:
5770
5771        TEMP cc;                        # dummy variable
5772        REP;
5773          SLT.CC cc.x, value.x, 7.0;    # compare value.x to 7.0, set CC0
5774          BRK NE.x;                     # break out if not true
5775          ...
5776          ...                           # presumably update value!
5777          ...
5778        ENDREP;
5779
5780    (15) The structured branching support in NV_fragment_program2 provides a
5781    BRK instruction that operates like C's "break" statement.  Should we
5782    provide something similar to C's "continue" statement, which skips to the
5783    next iteration of the loop?
5784
5785      RESOLVED:  Yes, a new CONT opcode is provided for this purpose.
5786
5787    (16) Can the BRK or CONT instructions break out of multiple levels of
5788    nested loops at once?
5789
5790      RESOLVED:  No.  BRK and CONT only exit the current nesting level.  To
5791      break out of multiple levels of nested loops, multiple BRK/CONT
5792      instructions are required.
5793
5794    (17) For REP instructions, is the loop counter reloaded on each iteration
5795    of the loop?
5796
5797      RESOLVED:  No.  The loop counter is loaded once at the top of the loop,
5798      compared to zero at the top of the loop, and decremented when each loop
5799      iteration completes.  A program may overwrite the variable used to
5800      specify the initial value of the loop counter inside the loop without
5801      affecting the number of times the loop body is executed.
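
      The counter behavior described above can be modeled with the following
      Python sketch.  The names run_rep and clobber are hypothetical, and the
      dictionary stands in for the program's registers; this illustrates the
      stated semantics and is not the implementation.

```python
def run_rep(state, count_key, body):
    """Emulate a counted REP/ENDREP block; return the iteration count."""
    counter = state[count_key]      # loaded once, before the first iteration
    iterations = 0
    while counter > 0:              # compared to zero at the top of the loop
        body(state)                 # the body may overwrite state[count_key]
        counter -= 1                # decremented when the iteration completes
        iterations += 1
    return iterations


def clobber(state):
    """Loop body that overwrites the variable used as the loop count."""
    state["n"] = 100


regs = {"n": 3}
print(run_rep(regs, "n", clobber))  # the body still executes three times
```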
5802
5803    (18) How are floating-point values represented in this extension?  What
5804    about floating-point arithmetic operations?
5805
5806      RESOLVED:  In the initial hardware implementation of this extension,
5807      floating-point values are represented using the standard 32-bit IEEE
5808      single-precision encoding, consisting of a sign bit, 8 exponent bits,
5809      and 23 mantissa bits.  Special encodings for NaN (not a number), +/-INF
5810      (infinity), and positive and negative zero are supported.  Denorms
5811      (values less than 2^-126, which have an exponent encoding of "0" and no
5812      implied leading one) are supported, but may be flushed to zero,
5813      preserving the sign bit of the original value.  Arithmetic operations
5814      are carried out at single-precision using normal IEEE floating-point
5815      rules, including special rules for generating infinities, NaNs, and
5816      zeros of each sign.
5817
5818      Floating-point temporaries declared as "SHORT" may be, but are not
5819      necessarily, stored as 16-bit "fp16" values (sign bit, five exponent
5820      bits, ten mantissa bits), as specified in the NV_float_buffer and
5821      ARB_half_float_pixel extensions.
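
      As an illustration of the 32-bit encoding described above, the
      following Python sketch splits a value into its sign, exponent, and
      mantissa fields and models flushing a denorm to zero while preserving
      the sign bit.  The helper names are hypothetical, and whether a given
      implementation actually flushes denorms is left open by this
      specification.

```python
import struct


def float_to_bits(value):
    """Return the IEEE single-precision encoding of value as an integer."""
    return struct.unpack("<I", struct.pack("<f", value))[0]


def decode_fp32(bits):
    """Split a 32-bit encoding into (sign, exponent, mantissa) fields."""
    sign = bits >> 31
    exponent = (bits >> 23) & 0xFF      # 8 exponent bits
    mantissa = bits & 0x7FFFFF          # 23 mantissa bits
    return sign, exponent, mantissa


def flush_denorm(bits):
    """Flush a denorm to zero, keeping the sign bit of the original value."""
    _, exponent, mantissa = decode_fp32(bits)
    if exponent == 0 and mantissa != 0:
        return bits & 0x80000000
    return bits


print(decode_fp32(float_to_bits(1.0)))
print(hex(flush_denorm(0x80000001)))
```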
5822
5823    (19) Should we provide a method to declare how fragment attributes are
5824    interpolated?  It is possible to have flat-shaded attributes,
5825    perspective-corrected attributes, and centroid-sampled attributes.
5826
5827      RESOLVED:  Yes.  Fragment program attribute variable declarations may
5828      specify the "FLAT", "NOPERSPECTIVE", and "CENTROID" modifiers.
5829
5830      These modifiers are documented in detail in the NV_fragment_program4
5831      specification.
5832
5833    (20) Should vertex and primitive identifiers be supported?  If so, how?
5834
5835      RESOLVED:  A vertex identifier is available as "vertex.id" in a vertex
5836      program.  The vertex ID is equal to the value effectively passed to
5837      ArrayElement when the vertex is specified, and is defined only if vertex
5838      arrays are used with buffer objects (VBOs).
5839
5840      A primitive identifier is available as "primitive.id" in a geometry or
5841      fragment program.  The primitive ID is equal to the number of primitives
5842      processed since the last implicit or explicit call to glBegin().
5843
5844      See the NV_vertex_program4 spec for more information on vertex IDs, and
5845      the NV_geometry_program4 or NV_fragment_program4 specs for more
5846      information on primitive IDs.
5847
5848    (21) For integer opcodes, should a bitwise inversion operator "~" be
5849    provided, analogous to the existing negation operator?
5850
5851      RESOLVED:  No.  If this operator were provided, it might allow a program
5852      to evaluate the expression "a&(~b)" using a single instruction:
5853
5854        AND.U a, a, ~b;
5855
5856      Instead, it is necessary to do something like:
5857
5858        UINT TEMP t;
5859        NOT.U t, b;
5860        AND.U a, a, t;
5861
5862      If necessary, this functionality could be added in a subsequent
5863      extension.
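
      For reference, the two-instruction sequence can be checked with a small
      Python model of 32-bit unsigned operations.  The helper names below are
      assumptions for this sketch and mirror the NOT.U and AND.U opcodes.

```python
MASK32 = 0xFFFFFFFF


def not_u(b):
    """Model NOT.U: bitwise inversion of a 32-bit unsigned value."""
    return ~b & MASK32


def and_u(a, b):
    """Model AND.U: bitwise AND of two 32-bit unsigned values."""
    return a & b & MASK32


def and_not(a, b):
    """Evaluate a&(~b) using a temporary, as in the sequence above."""
    t = not_u(b)
    return and_u(a, t)


print(hex(and_not(0xFF00FF00, 0x0F0F0F0F)))
```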
5864
5865    (22) What happens if you negate or take the absolute value of the
5866    biggest-magnitude negative integer?
5867
5868      RESOLVED:  Signed integers are represented using two's complement
5869      representation.  For 32-bit integers, the largest possible value is
5870      2^31-1; the smallest possible value is -2^31.  There is no way to
5871      represent 2^31, which is what these operators "should" return.  The
5872      value returned in this case is the original value of -2^31.
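
      The wraparound can be reproduced with a Python sketch of 32-bit two's
      complement arithmetic.  The function names are hypothetical; Python
      integers are unbounded, so the result is explicitly reduced to 32 bits.

```python
def to_signed32(value):
    """Reduce an integer to 32 bits and interpret it as two's complement."""
    bits = value & 0xFFFFFFFF
    return bits - (1 << 32) if bits & 0x80000000 else bits


def neg_s32(x):
    """Negate a signed 32-bit integer with two's complement wraparound."""
    return to_signed32(-x)


def abs_s32(x):
    """Absolute value of a signed 32-bit integer with wraparound."""
    return to_signed32(abs(x))


INT32_MIN = -(1 << 31)
print(neg_s32(INT32_MIN) == INT32_MIN)
print(abs_s32(INT32_MIN) == INT32_MIN)
```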
5873
5874    (23) How do condition codes work?  How are they different from those
5875    provided in previous NVIDIA extensions?
5876
5877      RESOLVED:  There are two condition codes -- CC0 and CC1 -- each of which
5878      is a four-component vector.  The condition codes are set based on the
5879      result of an instruction that specifies a condition code update
5880      modifier.  Examples include:
5881
5882        ADD.S.CC  R0, R1, R2;       # add signed integers R1 and R2, update
5883                                    #   CC0 based on the result, write the
5884                                    #   final value to R0
5885        ADD.F.CC1 R3, R4, R5;       # add floats R4 and R5, update CC1 based
5886                                    #   on the result, write the final value
5887                                    #   to R3
5888        ADD.U.CC0 R6.xy, R7, R8;    # add unsigned integers R7 and R8, update
5889                                    #   CC0 (x and y components) based on the
5890                                    #   result, write the final value to R6
5891                                    #   (x and y components)
5892
5893      Condition codes can be used for conditional writes, conditional
5894      branches, or other operations.  The condition codes aren't used
5895      directly, but are instead used with a condition code test such as "LT"
5896      (less than) or "EQ" (equal to).  Examples include:
5897
5898        MOV R0 (GT.x), R1;          # move R1 to R0 only if the x component of
5899                                    #   CC0 indicates a result of ">0"
5900        MOV R2 (NE1), R3;           # component-wise move of R3 to R2 if the
5901                                    #   corresponding component of CC1
5902                                    #   indicates a result of "!=0"
5903        IF LE0.xyxy;                # execute the block of code if the x or
5904          ...                       #   y components of CC0 indicate a result
5905        ENDIF;                      #   of "<=0"
5906        REP;
5907          ...
5908          BRK EQ1.xyzx;             # break out of loop if the x, y, or z
5909        ENDREP;                     #   components of CC1 indicate a result of
5910                                    #   "==0".
5911
5912      Previous NVIDIA extensions provide eight tests, which are still
5913      supported here.  The tests "EQ" (equal), "GE" (greater/equal), "GT"
5914      (greater than), "LE" (less/equal), "LT" (less than), and "NE" (not
5915      equal) can be used to determine the relation of the result used to set
5916      the condition code with zero.  The tests "TR" (true) and "FL" (false)
5917      are special tests that always evaluate to true or false, respectively.
5918
5919      For floating-point results, a NaN (not a number) encoding causes the
5920      "NE" condition to evaluate to TRUE and all other conditions to evaluate
5921      to FALSE.  IEEE encodings for "negative" and "positive" zero are both
5922      treated as equal to zero.
5923
5924      Condition codes are implemented as a set of flags, which are set
5925      depending on the type of operation, as described in the spec.
5926
5927      For instructions that return floating-point or signed integer values,
5928      the normal condition code tests reliably indicate the relationship of
5929      the result to zero.  For instructions that return unsigned values, the
5930      condition codes are a bit more complicated.  For example, the sign flag
5931      is set if the most significant bit of the result written is set.  As a
5932      result, very large unsigned integer values (e.g., 0x80000000 -
5933      0xFFFFFFFF) are effectively treated as negative values.  Condition code
5934      tests should be used with care with unsigned results -- to test if an
5935      unsigned integer is ">0", use a sequence like:
5936
5937        MOV.U.CC R0, R1;            # move R1 to R0, set condition code
5938        IF NE;                      # test if the result is "!=0", a very
5939          ...                       #   large value might fail "GT"!
5940        ENDIF;
5941
5942      This extension provides a number of additional condition code tests
5943      useful for different floating-point or integer operations:
5944
5945        * NAN (not a number) is true if a floating-point result is a NaN.  LEG
5946          (less, equal to, or greater) is the opposite of NAN.
5947
5948        * CF (carry flag) is true if an unsigned add overflows, or if an
5949          unsigned subtract produces a non-negative value.  NCF (no carry
5950          flag) is the opposite of CF.
5951
5952        * OF (overflow flag) is true if a signed add or subtract overflows.
5953          NOF (no overflow flag) is the opposite of OF.
5954
5955        * SF (sign flag) is true if the sign flag is set.  NSF (no sign flag)
5956          is the opposite of SF.
5957
5958        * AB (above) is true if an unsigned subtract produces a positive
5959          result.  BLE (below or equal) is the opposite of AB, and is true if
5960          an unsigned subtract produces a negative result or zero.  Note that
5961          CF can be used to test if the result is greater than or equal to
5962          zero, and NCF can be used to test if the result is less than zero.
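
      The pitfall with unsigned results can be seen in a deliberately
      simplified Python model that tracks only a sign flag and a zero flag.
      This is an assumption made for illustration: the flag definitions in
      the body of this specification also involve carry and overflow flags
      and NaN handling, which are ignored here.  The function names are
      hypothetical.

```python
def flags_from_result(bits):
    """Derive a sign flag and a zero flag from a 32-bit result."""
    bits &= 0xFFFFFFFF
    return {"SF": bool(bits & 0x80000000), "ZF": bits == 0}


def evaluate(flags, name):
    """Evaluate one of the eight basic tests against the two flags."""
    sf, zf = flags["SF"], flags["ZF"]
    tests = {
        "EQ": zf,
        "NE": not zf,
        "GT": not sf and not zf,
        "GE": not sf or zf,
        "LT": sf and not zf,
        "LE": sf or zf,
        "TR": True,
        "FL": False,
    }
    return tests[name]


large = flags_from_result(0x80000000)   # a very large unsigned value
print(evaluate(large, "GT"))            # the sign flag makes "GT" fail
print(evaluate(large, "NE"))            # "NE" still reports a non-zero value
```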
5963
5964    (24) How do the "set on" instructions (SEQ, SGE, SGT, SLE, SLT, SNE) work
5965    with integer values and/or condition codes?
5966
5967      RESOLVED:  "Set on" instructions comparing signed and unsigned values
5968      return zero if the condition is false, and an integer with all bits set
5969      if the condition is true.  If the result is signed, it is interpreted as
5970      -1.  If the result is unsigned, it is interpreted as the largest unsigned
5971      value (0xFFFFFFFF for 32-bit integers).  This is different from the
5972      floating-point "set on", which is defined to return 1.0.
5973
5974      This specific result encoding was chosen so that bitwise operators (NOT,
5975      AND, OR, XOR) can be used to evaluate boolean expressions.
5976
5977      When performing condition code tests on the results of an integer "set
5978      on" instruction, keep in mind that a TRUE result has the most
5979      significant bit set and will be interpreted as a negative value.  To
5980      test if a condition is true, use "NE" (!=0).  A condition code test of
5981      "GT" will always fail if the condition code was written by an integer
5982      "set on" instruction.
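
      A Python sketch of the all-bits-set encoding shows how bitwise
      operators combine comparison results into a boolean expression.  The
      helper names loosely mirror the SEQ, SLT, and NOT opcodes and are
      assumptions for this example; only the scalar unsigned case is modeled.

```python
MASK32 = 0xFFFFFFFF


def seq_u(a, b):
    """Integer "set on equal": all bits set if true, zero if false."""
    return MASK32 if a == b else 0


def slt_u(a, b):
    """Integer "set on less than": all bits set if true, zero if false."""
    return MASK32 if a < b else 0


def not_u(a):
    """Bitwise inversion of a 32-bit value."""
    return ~a & MASK32


def as_signed(bits):
    """Interpret a 32-bit pattern as a two's complement signed integer."""
    return bits - (1 << 32) if bits & 0x80000000 else bits


def below_limit_and_not_skipped(x, limit, skip):
    """Evaluate (x < limit) && !(x == skip) with bitwise operators only."""
    return slt_u(x, limit) & not_u(seq_u(x, skip))


print(hex(below_limit_and_not_skipped(3, 10, 5)))
print(as_signed(seq_u(1, 1)))   # a TRUE result read as a signed integer
```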
5983
5984    (25) What new texture functionality is provided?
5985
5986      RESOLVED:  Several new features are provided.
5987
5988      First, the TXF (texel fetch) instruction allows programs to access a
5989      texture map like a normal array.  Integer coordinates identifying an
5990      individual texel and LOD are provided, and the corresponding texture
5991      data is returned without filtering of any type.
5992
5993      Second, the TXQ (texture size query) instruction allows programs to
5994      query the size of a specified level of detail of a texture.  This
5995      feature allows programs to perform computations dependent on the size of
5996      the texture without having to pass the size as a program parameter or
5997      via some other mechanism.
5998
5999      Third, applications may specify a constant texel offset in a texture
6000      instruction that moves the texture sample point by the specified number
6001      of texels.  This offset can be used to perform custom texture filtering,
6002      and is also independent of the size of the texture LOD -- the same
6003      offsets are applied, regardless of the mipmap level.
6004
6005      Fourth, shadow mapping is supported for cube map textures.  The first
6006      three coordinates are the normal (s,t,r) coordinates for a cube map
6007      texture lookup, and the fourth component is a depth reference value that
6008      can be compared to the depth value stored in the texture.
6009
6010    (26) What "consistency" requirements are in effect for textures accessed
6011    via the TXF (texel fetch) instruction?
6012
6013      UNRESOLVED:  The texture must be usable for regular texture mapping
6014      operations -- if texture sizes or formats are inconsistent and a
6015      mipmapped min filter is used, the results are undefined.
6016
6017    (27) How does the TXF instruction work with bordered textures?
6018
6019      RESOLVED:  The entire image can be accessed, including the border
6020      texels.  For a 64x64 2D texture plus border (66x66 overall), the lower
6021      left border texel is accessed using the coordinates (-1,-1); the upper
6022      right border texel is accessed using the coordinates (64,64).
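
      The coordinate convention can be illustrated by mapping TXF coordinates
      to an index in a row-major array holding the full image, border
      included.  The storage layout and the function name texel_index are
      assumptions for this sketch; the extension does not define how texels
      are stored.

```python
def texel_index(x, y, width, height, border=1):
    """Map TXF (x, y) coordinates to a row-major index, border included."""
    full_width = width + 2 * border
    full_height = height + 2 * border
    column = x + border             # x == -1 selects the left border column
    row = y + border                # y == -1 selects the bottom border row
    if not (0 <= column < full_width and 0 <= row < full_height):
        raise IndexError("coordinates outside the bordered image")
    return row * full_width + column


print(texel_index(-1, -1, 64, 64))      # lower left border texel
print(texel_index(64, 64, 64, 64))      # upper right border texel
```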
6023
6024    (28) What should TXQ (texture size query) return for "irrelevant" texture
6025    sizes (e.g., height of a 1D texture)?  Should it return any other
6026    information at the same time?
6027
6028      RESOLVED:  This specification leaves all "extra" components undefined.
6029
6030    (29) How do texture offsets interact with cubemap textures?
6031
6032      RESOLVED:  They are not supported in this extension.
6033
6034    (30) How do texture offsets interact with mipmapped textures?
6035
6036      RESOLVED:  The texture offsets are added after the (s,t,r) coordinates
6037      have been divided by q (if applicable) and converted to (u,v,w)
6038      coordinates by multiplying by the size of the selected texture level.
6039      The offsets are added to the (u,v,w) coordinates, and always move the
6040      sample point by an integral number of texel coordinates.  If multiple
6041      mipmaps are accessed, the sample point in each mipmap level is moved by
6042      an identical offset.  The applied offsets are independent of the
6043      selected mipmap level.
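
      The order of operations can be sketched for a single coordinate as
      follows.  The function name and parameters are hypothetical, and
      wrapping and filtering are omitted; the point is only that the same
      texel offset is added at every mipmap level.

```python
def offset_sample_point(s, q, level_size, offset):
    """Return the u coordinate after projection, scaling, and offset."""
    u = (s / q) * level_size    # divide by q, then scale to texel space
    return u + offset           # the offset is an integral number of texels


# The same offset of 2 texels applies to a 64-texel and a 32-texel level.
for size in (64, 32):
    print(size, offset_sample_point(0.5, 1.0, size, 2))
```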
6044
6045    (31) How do shadow cube maps work?
6046
6047      UNRESOLVED:  An application can define a cube map texture with a
6048      DEPTH_COMPONENT internal format, and then render a scene using the cube
6049      map faces as the depth buffer(s).  When rendering, the projection should
6050      be set up using the "center" of the cubemap as the eye, and using a
6051      normal projection matrix.  When applying the shadow map, the fragment
6052      program should read the (x,y,z) eye coordinates, compute the length of
6053      the major axis (MAX(|x|,|y|,|z|)), and then transform this coordinate to
6054      [0,1] space using the same parameters used to derive Z in the projection
6055      matrix.  A 4-component vector consisting of x, y, z, and this computed
6056      depth value should be passed to the texture lookup, and normal shadow
6057      mapping operations will be performed.
6058
6059      This issue should include the math needed to do this computation and
6060      sample code.
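
      One possible form of the computation is sketched below in Python.  It
      is not part of the resolution of this issue, and it rests on
      assumptions not stated above: each cube face was rendered with a
      standard perspective projection (as built by glFrustum) using near and
      far planes <near> and <far>, and the depth range is the default [0,1].
      The function name is hypothetical.

```python
def cube_shadow_coord(x, y, z, near, far):
    """Return (x, y, z, depth) for a shadow cube map lookup."""
    major = max(abs(x), abs(y), abs(z))     # length along the major axis
    # Normalized device depth for an eye-space distance of "major" under a
    # glFrustum-style perspective projection.
    z_ndc = (far + near) / (far - near) - (2.0 * far * near) / ((far - near) * major)
    depth = 0.5 * z_ndc + 0.5               # map [-1,1] to [0,1]
    return x, y, z, depth


print(cube_shadow_coord(0.0, 0.0, -1.0, 1.0, 100.0))    # on the near plane
print(cube_shadow_coord(100.0, 3.0, -2.0, 1.0, 100.0))  # on the far plane
```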
6061
6062    (32) Integer multiplies can overflow by a lot.  Should there be some way
6063    to return the high part of both unsigned and signed integer multiplies?
6064
6065      RESOLVED:  Yes.  The ".HI" modifier is provided to return the 32 MSBs
6066      of a 32x32 integer multiply.  The instruction sequence:
6067
6068        INT TEMP R0, R1, R2, R3;
6069        MUL.S    R0, R2, R3;
6070        MUL.S.HI R1, R2, R3;
6071
6072      will do a 32x32 signed integer multiply of R2 and R3, with the 32 LSBs of
6073      the 64-bit result in R0 and the 32 MSBs in R1.
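
      The split of the 64-bit product can be modeled in Python, whose
      integers are unbounded.  The names mul_s and mul_s_hi are hypothetical
      stand-ins for MUL.S and MUL.S.HI applied to a single component.

```python
def to_signed32(value):
    """Reduce an integer to 32 bits and interpret it as two's complement."""
    bits = value & 0xFFFFFFFF
    return bits - (1 << 32) if bits & 0x80000000 else bits


def mul_s(a, b):
    """Return the 32 LSBs of the signed 64-bit product."""
    return to_signed32(a * b)


def mul_s_hi(a, b):
    """Return the 32 MSBs of the signed 64-bit product."""
    return to_signed32((a * b) >> 32)   # arithmetic shift keeps the sign


a, b = 1 << 30, 8                       # product is 2^33
print(mul_s(a, b), mul_s_hi(a, b))
```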
6074
6075    (33) Should there be any other special multiplication modifiers?
6076
6077      RESOLVED:  Yes.  The ".S24" and ".U24" modifiers allow for signed and
6078      unsigned integer multiplies where both operands are guaranteed to fit in
6079      the least significant 24 bits.  On some architectures supporting this
6080      extension, ".S24" and ".U24" integer multiplies may be faster than
6081      general-purpose ".S" and ".U" multiplies.  If either value doesn't fit
6082      in 24 bits, the results of the operation are undefined --
6083      implementations may, but are not required to, ignore the MSBs of the
6084      operands if ".S24" or ".U24" is specified.
6085
6086    (34) This extension provides subroutines, but doesn't provide a stack to
6087    push and pop parameters.  How do we deal with this?  NV_vertex_program3
6088    supported PUSHA/POPA instructions to push and pop address registers.
6089
6090      RESOLVED:  No explicit stack is required.  A program can implement a
6091      stack by allocating a temporary array plus a single integer temporary to
6092      use as the stack "pointer".  For example:
6093
6094        TEMP stack[256];                # 256 4-component vectors
6095        INT TEMP sp;                    # sp.x == stack pointer
6096        INT TEMP cc;                    # condition code results
6097
6098        function:
6099          SGE.S.CC cc.x, sp.x, 256;     # compute stackPointer >= 256
6100          RET NE.x;                     # return if TRUE
6101          MOV stack[sp], R0;            # push R0 onto the stack
6102          ADD.S sp.x, sp.x, 1;
6103          ...
6104          SUB.S sp.x, sp.x, 1;          # pop R0 off the stack
6105          MOV R0, stack[sp];
6106          RET;
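
      The same discipline, a fixed-size array plus an integer stack pointer,
      is shown below as a Python sketch.  The class name RegisterStack is
      hypothetical.  Unlike the assembly above, the sketch also guards
      against popping from an empty stack.

```python
STACK_SIZE = 256


class RegisterStack:
    """Fixed-size array plus an integer stack pointer."""

    def __init__(self):
        self.slots = [None] * STACK_SIZE
        self.sp = 0                     # index of the next free slot

    def push(self, value):
        """Store value and advance sp; return False if the stack is full."""
        if self.sp >= STACK_SIZE:       # mirrors the SGE.S.CC / RET NE.x check
            return False
        self.slots[self.sp] = value
        self.sp += 1
        return True

    def pop(self):
        """Retreat sp and return the value stored there."""
        if self.sp == 0:
            raise IndexError("pop from empty stack")
        self.sp -= 1
        return self.slots[self.sp]


stack = RegisterStack()
stack.push((1.0, 2.0, 3.0, 4.0))        # one four-component vector
print(stack.pop())
```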
6107
6108    (35) Should we provide new vector semantics for previously-defined opcodes
6109    (e.g., LG2 computes a component-wise logarithm)?
6110
6111      RESOLVED:  Not in this extension.  The instructions we define here are
6112      compatible with the vector or scalar nature of previously defined
6113      opcodes.  This simplifies the implementation of an assembler that needs
6114      to support both old and new instruction sets.
6115
6116    (36) Should it really be undefined to read from a register storing data of
6117    one type with an instruction of the other type (e.g., to read the bits of
6118    a floating-point number as an unsigned integer)?
6119
6120      RESOLVED:  The spec describes undefined results for simplicity.  In
6121      practice, mixing data types can be done, where signed integers are
6122      represented as two's complement integers and floating-point numbers are
6123      represented using IEEE single-precision representation.  For example:
6124
6125        TEMP R0, R1;                    # typeless
6126        MOV.U R0, 0x3F800000;           # R0 = 1.0
6127        MOV.U R1, 0xBF800000;           # R1 = -1.0
6128        MUL.F R0, R0, R1;               # R0 = -1 * 1 = -1 (0xBF800000)
6129        XOR.U R0, R0, R1;               # R0 = 0xBF800000 ^ 0xBF800000 = 0
6130        NOT.U R0, R0;                   # R0 = 0xFFFFFFFF
6131        I2F.S R0, R0;                   # R0 = -1.0 (0xFFFFFFFF = -1 signed)
6132        SEQ.F R0, R0, R1;               # R0 = 1.0 (-1.0 == -1.0)
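
      The bit patterns in the comments above can be reproduced on a host CPU
      with the Python sketch below, which uses the standard struct module to
      reinterpret bits.  The function names are hypothetical, and only a
      single component is modeled.

```python
import struct


def bits_to_float(bits):
    """Reinterpret a 32-bit pattern as an IEEE single-precision value."""
    return struct.unpack("<f", struct.pack("<I", bits & 0xFFFFFFFF))[0]


def float_to_bits(value):
    """Reinterpret an IEEE single-precision value as a 32-bit pattern."""
    return struct.unpack("<I", struct.pack("<f", value))[0]


def to_signed32(bits):
    """Interpret a 32-bit pattern as a two's complement signed integer."""
    return bits - (1 << 32) if bits & 0x80000000 else bits


def run_example():
    """Return the value of R0 after each instruction following the MOVs."""
    r0, r1 = 0x3F800000, 0xBF800000                             # MOV.U x2
    steps = []
    r0 = float_to_bits(bits_to_float(r0) * bits_to_float(r1))   # MUL.F
    steps.append(r0)
    r0 ^= r1                                                    # XOR.U
    steps.append(r0)
    r0 = ~r0 & 0xFFFFFFFF                                       # NOT.U
    steps.append(r0)
    r0 = float_to_bits(float(to_signed32(r0)))                  # I2F.S
    steps.append(r0)
    equal = bits_to_float(r0) == bits_to_float(r1)              # SEQ.F
    steps.append(float_to_bits(1.0 if equal else 0.0))
    return steps


print([hex(step) for step in run_example()])
```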
6133
6134    (37) Buffer objects can be sourced as program parameters using the
6135    NV_parameter_buffer_object extension.  How are they accessed in a program?
6136
6137      RESOLVED:  The instruction set and existing program environment and
6138      local parameter bindings operate largely on four-component vectors.
6139      However, NV_parameter_buffer_object exposes the ability to reach into
6140      buffers consisting of user-generated data or data written to the buffer
6141      object by the GPU.  Such data sets may not consist entirely of
6142      four-component floating-point vectors, so a four-component vector API
6143      may be unnatural.  An application might need to reformat its data set to
6144      deal with this issue.  Or it might generate odd code to compensate for
6145      mis-alignment -- for example, reading an array of 3-component vectors by
6146      doing two four-component vector accesses and then rotating based on
6147      alignment.  Neither approach is particularly satisfying.
6148
6149      Instead, this extension takes the approach of treating parameter buffers
6150      as arrays of scalar words.  When an individual buffer element is read,
6151      the single word is replicated to produce a four-component vector.  To
6152      access an array of 3-component vectors, code like the following can be
6153      used:
6154
6155        PARAM buffer[] = { program.buffer[0] };
6156        INT TEMP index;
6157        TEMP R0;
6158        ...
6159        MUL.S index, index, 3;          # to read "vec3" #X, compute 3*X
6160        MOV R0.x, buffer[index+0];
6161        MOV R0.y, buffer[index+1];
6162        MOV R0.z, buffer[index+2];
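
      The addressing scheme can be modeled in Python with a flat list
      standing in for the parameter buffer.  The function names are
      hypothetical; read_word models the replication of one scalar word into
      a four-component vector, and read_vec3 models the three MOVs above.

```python
def read_word(buffer, index):
    """Read one scalar word and replicate it to four components."""
    word = buffer[index]
    return (word, word, word, word)


def read_vec3(buffer, n):
    """Read 3-component vector number n from a buffer of scalar words."""
    base = 3 * n                        # MUL.S index, index, 3
    x = read_word(buffer, base + 0)[0]  # MOV R0.x, buffer[index+0]
    y = read_word(buffer, base + 1)[1]  # MOV R0.y, buffer[index+1]
    z = read_word(buffer, base + 2)[2]  # MOV R0.z, buffer[index+2]
    return (x, y, z)


words = [float(i) for i in range(9)]    # three tightly packed vec3 values
print(read_vec3(words, 1))
```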
6163
6164    (38) Should recursion be allowed?  If so, how is the total amount of
6165    recursion limited?
6166
6167      RESOLVED:  Recursion is allowed, and a call stack is provided by the
6168      implementation.  The size of the call stack is limited to the
6169      implementation-dependent constant MAX_PROGRAM_CALL_DEPTH, and when the
6170      call stack is full, the results of further CAL instructions are
6171      undefined.  In the initial implementation of this extension, such
6172      instructions will have no effect.
6173
6174      Note that no stack is provided to hold local registers; a program may
6175      implement its own via a temporary array and integer stack "pointer".
6176
6177    (39) Variables are all four-component vectors in previous extensions.
6178    Should scalar or small-vector variables be provided?
6179
6180      RESOLVED:  It would be a useful feature, but it was left out for
6181      simplicity.  In practice, a variable where only the X component is used
6182      will be equivalent to a scalar.
6183
6184    (40) The PK* (pack) and UP* (unpack) instructions allow packing multiple
6185    components of data into a single component.  The bit packing is
6186    well-defined.  Should we require specific data types (e.g., unsigned
6187    integer) to hold packed values?
6188
6189      RESOLVED:  No.  Previous instruction sets only allowed programs to write
6190      packed values to a floating-point variable (the only data type
6191      provided).  We will allow packed results to be written to a variable of
6192      any data type.  Integer instructions can be used to manipulate bits of
6193      packed data in place.
6194
6195    (41) What happens when converting integers to floats or vice versa if
6196    there is insufficient precision or range to represent the result?
6197
6198      RESOLVED:  For integer-to-float conversions, the nearest representable
6199      floating-point value is used, and the least significant bits of the
6200      original integer value are lost.  For float-to-integer conversions,
6201      out-of-range values are clamped to the nearest representable integer.
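
      Both directions can be illustrated with a Python sketch.  The function
      names are hypothetical.  Two details are assumptions not covered by
      this issue: the sketch truncates toward zero for in-range values, and
      it rejects NaN inputs rather than choosing a result for them.

```python
import math
import struct

INT32_MAX = (1 << 31) - 1
INT32_MIN = -(1 << 31)


def int_to_float32(value):
    """Round an integer to the nearest single-precision value."""
    return struct.unpack("<f", struct.pack("<f", float(value)))[0]


def float_to_int32(value):
    """Convert to a signed 32-bit integer, clamping out-of-range values."""
    if math.isnan(value):
        raise ValueError("NaN handling is not modeled in this sketch")
    if value >= float(INT32_MAX):
        return INT32_MAX
    if value <= float(INT32_MIN):
        return INT32_MIN
    return int(value)                   # assumption: truncate toward zero


print(int_to_float32(16777217))         # 2^24 + 1 loses its low bit
print(float_to_int32(3.0e9))            # clamped to the largest integer
```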
6202
6203    (42) Why are some of the grammar rules so bizarre (e.g., attribUseD,
6204    attribUseV, attribUseS, attribUseVNS)?
6205
6206      RESOLVED:  This grammar is based upon the original ARB_vertex_program
6207      grammar, which has a number of "interesting" characteristics.  For
6208      example, some of the bindings provided by ARB_vertex_program naturally
6209      require some amount of lookahead.  For example, a vertex program can
6210      write an output color using any of the following:
6211
6212        MOV result.color, 0;            # primary color
6213        MOV result.color.primary, 0;    # primary color again
6214        MOV result.color.secondary, 0;  # secondary color this time
6215
6216      The pieces of the color binding are separated by "." tokens.  However,
6217      writemasks are also supported, which also use "." before the write
6218      mask.  So, we could also have something like:
6219
6220        MOV result.color.xyz, 0;        # primary color with W masked off
6221
6222      In this form, a parser needs to look at both the "." and the "xyz" to
6223      determine that the binding being used is "result.color" (and not
6224      "result.color.secondary").
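
      The lookahead can be illustrated with a Python sketch that splits a
      result color operand into its binding and optional write mask.  This is
      not the grammar of this extension; the function name and the
      string-splitting approach are assumptions for illustration, and the
      mask itself is not validated.

```python
def split_color_operand(operand):
    """Split "result.color[.primary|.secondary][.mask]" into its parts."""
    parts = operand.split(".")
    if parts[:2] != ["result", "color"]:
        raise ValueError("not a result color binding")
    binding = "result.color"
    rest = parts[2:]
    # The token after the "." must be inspected before the parser knows
    # whether it extends the binding or begins a write mask.
    if rest and rest[0] in ("primary", "secondary"):
        binding += "." + rest[0]
        rest = rest[1:]
    mask = rest[0] if rest else None
    return binding, mask


print(split_color_operand("result.color.xyz"))
print(split_color_operand("result.color.secondary"))
```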
6225
6226      Additionally, some checks that should probably be semantic errors (e.g.,
6227      allowing different swizzle or scalar operand selectors per instruction,
6228      or disallowing both in the case of SWZ) were specified in the original
6229      grammar.
6230
6231      ARB_fragment_program and subsequent NVIDIA instruction sets built upon this,
6232      and the grammar for this extension was rewritten in the current form so
6233      it could be validated more easily.
6234
6235    (43) This is an NV extension (NV_gpu_program4).  Why does the
6236    MAX_PROGRAM_TEXEL_OFFSET_EXT token have an "EXT" suffix?
6237
6238      RESOLVED:  This token is shared between this extension and the
6239      comparable high-level GLSL programmability extension (EXT_gpu_shader4).
6240      Rather than provide a duplicate set of token names, we simply use the
6241      EXT version here.

    (44) For the purposes of determining the number of attribute and result
         components, how are "scalar" attributes counted?  For example, only
         the x component of the "pointsize" per-vertex output is actually
         relevant.

      RESOLVED:  Implementations are allowed to count all inputs and outputs
      as full four-component vectors.  To avoid this, apply appropriate write
      masks or swizzles.

      For example, writing to "result.pointsize" may count as four components.
      Consistently writing to "result.pointsize.x" may count as only one.
      Similarly, reading a fragment's fog coordinate as "fragment.fogcoord"
      may count as four components; "fragment.fogcoord.x" will only count as
      one.

Revision History

    Rev.    Date    Author    Changes
    ----  --------  --------  --------------------------------------------
    11    09/11/14  pbrown    Fix cut-and-paste error in PK2US section.

    10    12/14/09  mgodse    Added GLX protocol.

     9    10/29/09  pbrown    Add language for previously undocumented errors
                              when using "SHORT" and "LONG" modifiers on
                              variable declarations.  They're allowed only on
                              "TEMP" statements, except that "SHORT" is
                              allowed for "OUTPUT" as well.

     8    08/11/08  jbreton   Clarified that when a MOD instruction is
                              performed on negative operands the result is
                              undefined.

     7    07/29/08  pbrown    Discovered additional issues with texture wrap
                              handling, replaced with logic that applies wrap
                              modes per sample.  Add a few instruction
                              pseudo-code lines explicitly identifying
                              undefined components.

     6    05/02/08  pbrown    Fix the prototype for the internal TexelFetch()
                              function used in the spec language; texel
                              coordinates are signed integers.

     5    02/22/08  pbrown    Clarified that when counting attribute/result
                              components, irrelevant/undefined components
                              can still count against the limits.

     4    02/04/08  pbrown    Fix errors in texture wrap mode handling.
                              Added a missing clamp to avoid sampling border
                              in REPEAT mode.  Fixed incorrectly specified
                              weights for LINEAR filtering.

     3    02/09/07  pbrown    Updated status section (now released).

     2    10/19/06  pbrown    Change the token suffix for maximum texel offset
                              values from NV to EXT, since it is shared with
                              EXT_gpu_shader4.  Clarify what happens on a
                              negate of an unsigned value.  Fix typo in data
                              type modifier description.  Add missing
                              description of the "BUFFER4" declaration
                              keyword.

     1              pbrown    Internal spec development.
