Name

    NV_gpu_program4

Name Strings

    GL_NV_gpu_program4

Contact

    Pat Brown, NVIDIA Corporation (pbrown 'at' nvidia.com)

Status

    Shipping for GeForce 8 Series (November 2006)

Version

    Last Modified Date:         09/11/2014
    NVIDIA Revision:            11

Number

    322

Dependencies

    This extension is written against the OpenGL 2.0 specification.

    OpenGL 2.0 is not required, but we expect all implementations of this
    extension will also support OpenGL 2.0.

    This extension is also written against the ARB_vertex_program
    specification, which provides the basic mechanisms for the assembly
    programming model used by this extension.

    This extension serves as the basis for the NV_fragment_program4,
    NV_geometry_program4, and NV_vertex_program4 extensions, which all build
    on this extension to support fragment, geometry, and vertex programs,
    respectively.  If "GL_NV_gpu_program4" is found in the extension string,
    all of these extensions are supported.
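
    The extension string check described above can be sketched in C as
    follows.  This is an illustration only, not part of the specification:
    the helper name has_extension is hypothetical, and the caller is assumed
    to pass the space-separated string returned by GetString(EXTENSIONS).
    The name is matched as a whole token because one extension name can be
    a prefix of another.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns 1 if `name` appears as a complete,
 * space-delimited token in `ext_list`, and 0 otherwise. */
int has_extension(const char *ext_list, const char *name)
{
    size_t len;
    const char *p;

    assert(ext_list != NULL && name != NULL);
    len = strlen(name);
    if (len == 0)
        return 0;

    p = ext_list;
    while ((p = strstr(p, name)) != NULL) {
        /* The match must start at the beginning or right after a space... */
        int starts_token = (p == ext_list) || (p[-1] == ' ');
        /* ...and must end at a space or at the end of the string. */
        int ends_token = (p[len] == ' ') || (p[len] == '\0');
        if (starts_token && ends_token)
            return 1;
        p += 1;   /* keep searching after a partial match */
    }
    return 0;
}
```

    An application would typically call has_extension(extensions,
    "GL_NV_gpu_program4") once after context creation and, on success, rely
    on the three program-type extensions listed above.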

    NV_parameter_buffer_object affects the definition of this extension.

    ARB_texture_rectangle trivially affects the definition of this extension.

    EXT_gpu_program_parameters trivially affects the definition of this
    extension.

    EXT_texture_integer trivially affects the definition of this extension.

    EXT_texture_array trivially affects the definition of this extension.

    EXT_texture_buffer_object trivially affects the definition of this
    extension.

    NV_primitive_restart trivially affects the definition of this extension.

Overview

    This specification documents the common instruction set and basic
    functionality provided by NVIDIA's 4th generation of assembly instruction
    sets supporting programmable graphics pipeline stages.

    The instruction set builds upon the basic framework provided by the
    ARB_vertex_program and ARB_fragment_program extensions to expose
    considerably more capable hardware.  In addition to new capabilities for
    vertex and fragment programs, this extension provides a new program type
    (geometry programs) further described in the NV_geometry_program4
    specification.

    NV_gpu_program4 provides a unified instruction set -- all instruction set
    features are available for all program types, except for a small number of
    features that make sense only for a specific program type.  It provides
    fully capable signed and unsigned integer data types, along with a set of
    arithmetic, logical, and data type conversion instructions capable of
    operating on integers.  It also provides a uniform set of structured
    branching constructs (if tests, loops, and subroutines) that fully support
    run-time condition testing.

    This extension provides several new texture mapping capabilities.  Shadow
    cube maps are supported, where cube map faces can encode depth values.
    Texture lookup instructions can include an immediate texel offset, which
    can assist in advanced filtering.  New instructions are provided to fetch
    a single texel by address in a texture map (TXF) and query the size of a
    specified texture level (TXQ).
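
    In program text, the immediate texel offset mentioned above is written
    as a parenthesized list of one to three signed integers (the <texOffset>
    rule in Section 2.X.2).  The following C sketch parses that form.  It is
    a hedged illustration: parse_tex_offset and skip_ws are hypothetical
    names, <optSign> is assumed to be an optional "+" or "-", <int> is
    assumed to be a run of decimal digits, and no check is made against
    MIN_PROGRAM_TEXEL_OFFSET_EXT or MAX_PROGRAM_TEXEL_OFFSET_EXT.

```c
#include <ctype.h>

/* Skips the whitespace characters named in Section 2.X.2 (comments are
 * not handled in this sketch). */
static const char *skip_ws(const char *p)
{
    while (*p == ' ' || *p == '\t' || *p == '\n' || *p == '\r')
        p++;
    return p;
}

/* Hypothetical parser for "(a)", "(a,b)" or "(a,b,c)".  Stores the
 * components in out[] and returns their count (1..3), or 0 if the text
 * does not match. */
int parse_tex_offset(const char *s, int out[3])
{
    int n = 0;
    const char *p = skip_ws(s);

    if (*p != '(')
        return 0;
    p++;
    for (;;) {
        int sign = 1, value = 0;

        p = skip_ws(p);
        if (*p == '+' || *p == '-') {          /* assumed <optSign> */
            if (*p == '-')
                sign = -1;
            p = skip_ws(p + 1);
        }
        if (!isdigit((unsigned char)*p))
            return 0;
        while (isdigit((unsigned char)*p)) {   /* assumed <int> */
            value = value * 10 + (*p - '0');
            if (value > 1000000)               /* arbitrary sketch limit */
                return 0;
            p++;
        }
        if (n == 3)                            /* at most three components */
            return 0;
        out[n++] = sign * value;
        p = skip_ws(p);
        if (*p != ',')
            break;
        p++;
    }
    if (*p != ')')
        return 0;
    p = skip_ws(p + 1);
    return (*p == '\0') ? n : 0;
}
```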

    By and large, vertex and fragment programs written to ARB_vertex_program
    and ARB_fragment_program can be ported directly by simply changing the
    program header from "!!ARBvp1.0" or "!!ARBfp1.0" to "!!NVvp4.0" or
    "!!NVfp4.0", and then modifying the code to take advantage of the expanded
    feature set.  In a small number of areas, this extension is not a
    functional superset of previous vertex program extensions; those areas
    are documented in this specification.
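
    The header change described above amounts to a simple string rewrite.
    The C sketch below is illustrative only: port_program_header is a
    hypothetical helper, the program text is assumed to be NUL-terminated,
    and the sketch does nothing about the areas where this extension is not
    a functional superset of earlier ones.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copies `src` into `dst` with an ARB program header
 * replaced by its NV_gpu_program4 counterpart.  Returns 1 on success, or
 * 0 if `src` does not start with a known ARB header or `dst_size` is too
 * small to hold the result. */
int port_program_header(const char *src, char *dst, size_t dst_size)
{
    static const char *const arb_hdr[] = { "!!ARBvp1.0", "!!ARBfp1.0" };
    static const char *const nv_hdr[]  = { "!!NVvp4.0",  "!!NVfp4.0"  };
    size_t i;

    assert(src != NULL && dst != NULL);
    for (i = 0; i < 2; i++) {
        size_t old_len = strlen(arb_hdr[i]);

        if (strncmp(src, arb_hdr[i], old_len) == 0) {
            size_t new_len  = strlen(nv_hdr[i]);
            size_t rest_len = strlen(src + old_len);

            if (new_len + rest_len + 1 > dst_size)
                return 0;
            memcpy(dst, nv_hdr[i], new_len);
            /* Copy the rest of the program, including the terminator. */
            memcpy(dst + new_len, src + old_len, rest_len + 1);
            return 1;
        }
    }
    return 0;
}
```

    The rewritten text would then be passed to ProgramStringARB in the usual
    way; whether it loads without further changes depends on the program.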


New Procedures and Functions

    void ProgramLocalParameterI4iNV(enum target, uint index,
                                    int x, int y, int z, int w);
    void ProgramLocalParameterI4ivNV(enum target, uint index,
                                     const int *params);
    void ProgramLocalParametersI4ivNV(enum target, uint index,
                                      sizei count, const int *params);
    void ProgramLocalParameterI4uiNV(enum target, uint index,
                                     uint x, uint y, uint z, uint w);
    void ProgramLocalParameterI4uivNV(enum target, uint index,
                                      const uint *params);
    void ProgramLocalParametersI4uivNV(enum target, uint index,
                                       sizei count, const uint *params);

    void ProgramEnvParameterI4iNV(enum target, uint index,
                                  int x, int y, int z, int w);
    void ProgramEnvParameterI4ivNV(enum target, uint index,
                                   const int *params);
    void ProgramEnvParametersI4ivNV(enum target, uint index,
                                    sizei count, const int *params);
    void ProgramEnvParameterI4uiNV(enum target, uint index,
                                   uint x, uint y, uint z, uint w);
    void ProgramEnvParameterI4uivNV(enum target, uint index,
                                    const uint *params);
    void ProgramEnvParametersI4uivNV(enum target, uint index,
                                     sizei count, const uint *params);

    void GetProgramLocalParameterIivNV(enum target, uint index,
                                       int *params);
    void GetProgramLocalParameterIuivNV(enum target, uint index,
                                        uint *params);
    void GetProgramEnvParameterIivNV(enum target, uint index,
                                     int *params);
    void GetProgramEnvParameterIuivNV(enum target, uint index,
                                      uint *params);

New Tokens


    Accepted by the <pname> parameter of GetBooleanv, GetIntegerv,
    GetFloatv, and GetDoublev:

        MIN_PROGRAM_TEXEL_OFFSET_EXT                    0x8904
        MAX_PROGRAM_TEXEL_OFFSET_EXT                    0x8905

    (note:  these tokens are shared with the EXT_gpu_shader4 extension.)

    Accepted by the <pname> parameter of GetProgramivARB:

        PROGRAM_ATTRIB_COMPONENTS_NV                    0x8906
        PROGRAM_RESULT_COMPONENTS_NV                    0x8907
        MAX_PROGRAM_ATTRIB_COMPONENTS_NV                0x8908
        MAX_PROGRAM_RESULT_COMPONENTS_NV                0x8909
        MAX_PROGRAM_GENERIC_ATTRIBS_NV                  0x8DA5
        MAX_PROGRAM_GENERIC_RESULTS_NV                  0x8DA6

Additions to Chapter 2 of the OpenGL 1.5 Specification (OpenGL Operation)

    (Modify "Section 2.14.1" of the ARB_vertex_program specification,
    describing program parameters.)

    Each program object has an associated array of program local parameters.
    Program local parameters are four-component vectors whose components can
    hold floating-point, signed integer, or unsigned integer values.  The data
    type of each local parameter is established when the parameter's values
    are assigned.  If a program attempts to read a local parameter using a
    data type other than the one used when the parameter is set, the values
    returned are undefined.  ... The commands

      void ProgramLocalParameter4fARB(enum target, uint index,
                                      float x, float y, float z, float w);
      void ProgramLocalParameter4fvARB(enum target, uint index,
                                       const float *params);
      void ProgramLocalParameter4dARB(enum target, uint index,
                                      double x, double y, double z, double w);
      void ProgramLocalParameter4dvARB(enum target, uint index,
                                       const double *params);

      void ProgramLocalParameterI4iNV(enum target, uint index,
                                      int x, int y, int z, int w);
      void ProgramLocalParameterI4ivNV(enum target, uint index,
                                       const int *params);
      void ProgramLocalParameterI4uiNV(enum target, uint index,
                                       uint x, uint y, uint z, uint w);
      void ProgramLocalParameterI4uivNV(enum target, uint index,
                                        const uint *params);

    update the values of the program local parameter numbered <index>
    belonging to the program object currently bound to <target>.  For the
    non-vector versions of these commands, the four components of the
    parameter are updated with the values of <x>, <y>, <z>, and <w>,
    respectively.  For the vector versions, the components of the parameter
    are updated with the array of four values pointed to by <params>.  The
    error INVALID_VALUE is generated if <index> is greater than or equal to
    the number of program local parameters supported by <target>.

    The commands

      void ProgramLocalParameters4fvNV(enum target, uint index,
                                       sizei count, const float *params);
      void ProgramLocalParametersI4ivNV(enum target, uint index,
                                        sizei count, const int *params);
      void ProgramLocalParametersI4uivNV(enum target, uint index,
                                         sizei count, const uint *params);

    update the values of the program local parameters numbered <index> through
    <index> + <count> - 1 with the array of 4 * <count> values pointed to by
    <params>.  The error INVALID_VALUE is generated if the sum of <index> and
    <count> is greater than the number of program local parameters supported
    by <target>.
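
    The range rule for the multi-parameter commands can be sketched as a
    small C predicate.  This is an illustration under stated assumptions:
    param_range_is_valid is a hypothetical name, <max_params> stands for the
    number of parameters supported by <target>, and a negative <count> is
    assumed to be rejected, following the general GL convention for sizei
    arguments.  The comparison is arranged so that <index> + <count> cannot
    overflow.

```c
/* Hypothetical predicate: returns 1 if parameters <index> through
 * <index> + <count> - 1 all exist, or 0 if the update would generate
 * INVALID_VALUE. */
int param_range_is_valid(unsigned int index, int count,
                         unsigned int max_params)
{
    if (count < 0)
        return 0;   /* assumption: negative sizei values are rejected */
    if (index > max_params)
        return 0;
    /* Equivalent to index + count <= max_params, without overflow. */
    return (unsigned int)count <= max_params - index;
}
```

    Note that the sum being equal to the number of supported parameters is
    legal; only a sum greater than that number is an error.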

    When a program local parameter is updated, the data type of its components
    is assigned according to the data type of the provided values.  If the
    values provided are of type "float" or "double", the components of the
    parameter are floating-point.  If the values provided are of type "int",
    the components of the parameter are signed integers.  If the values
    provided are of type "uint", the components of the parameter are unsigned
    integers.
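
    One way to picture this rule is a parameter that carries a type tag
    which is rewritten on every update.  The C model below is a sketch only;
    the Param type and the param_* functions are hypothetical and say
    nothing about how an implementation actually stores parameters.
    Storing "double" values as single-precision floats is likewise an
    assumption of the model.

```c
/* Hypothetical model of one four-component program parameter. */
typedef enum { PARAM_FLOAT, PARAM_INT, PARAM_UINT } ParamType;

typedef struct {
    ParamType type;            /* set by the most recent update */
    union {
        float        f[4];
        int          i[4];
        unsigned int u[4];
    } v;
} Param;

void param_set_float(Param *p, const float *src)
{
    int c;
    for (c = 0; c < 4; c++)
        p->v.f[c] = src[c];
    p->type = PARAM_FLOAT;
}

void param_set_double(Param *p, const double *src)
{
    int c;
    for (c = 0; c < 4; c++)
        p->v.f[c] = (float)src[c];   /* assumption: narrowed to float */
    p->type = PARAM_FLOAT;
}

void param_set_int(Param *p, const int *src)
{
    int c;
    for (c = 0; c < 4; c++)
        p->v.i[c] = src[c];
    p->type = PARAM_INT;
}

void param_set_uint(Param *p, const unsigned int *src)
{
    int c;
    for (c = 0; c < 4; c++)
        p->v.u[c] = src[c];
    p->type = PARAM_UINT;
}

/* Returns 1 if a program reading the parameter as type `as` would see
 * defined values, 0 if the values returned would be undefined. */
int param_read_is_defined(const Param *p, ParamType as)
{
    return p->type == as;
}
```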

    Additionally, each program target has an associated array of program
    environment parameters.  Unlike program local parameters, program
    environment parameters are shared by all program objects of a given
    target.  Program environment parameters are four-component vectors whose
    components can hold floating-point, signed integer, or unsigned integer
    values.  The data type of each environment parameter is established when
    the parameter's values are assigned.  If a program attempts to read an
    environment parameter using a data type other than the one used when the
    parameter is set, the values returned are undefined.  ... The commands

      void ProgramEnvParameter4fARB(enum target, uint index,
                                    float x, float y, float z, float w);
      void ProgramEnvParameter4fvARB(enum target, uint index,
                                     const float *params);
      void ProgramEnvParameter4dARB(enum target, uint index,
                                    double x, double y, double z, double w);
      void ProgramEnvParameter4dvARB(enum target, uint index,
                                     const double *params);
      void ProgramEnvParameterI4iNV(enum target, uint index,
                                    int x, int y, int z, int w);
      void ProgramEnvParameterI4ivNV(enum target, uint index,
                                     const int *params);
      void ProgramEnvParameterI4uiNV(enum target, uint index,
                                     uint x, uint y, uint z, uint w);
      void ProgramEnvParameterI4uivNV(enum target, uint index,
                                      const uint *params);

    update the values of the program environment parameter numbered <index>
    for the given program target <target>.  For the non-vector versions of
    these commands, the four components of the parameter are updated with the
    values of <x>, <y>, <z>, and <w>, respectively.  For the vector versions,
    the four components of the parameter are updated with the array of four
    values pointed to by <params>.  The error INVALID_VALUE is generated if
    <index> is greater than or equal to the number of program environment
    parameters supported by <target>.

    The commands

      void ProgramEnvParameters4fvNV(enum target, uint index,
                                     sizei count, const float *params);
      void ProgramEnvParametersI4ivNV(enum target, uint index,
                                      sizei count, const int *params);
      void ProgramEnvParametersI4uivNV(enum target, uint index,
                                       sizei count, const uint *params);

    update the values of the program environment parameters numbered <index>
    through <index> + <count> - 1 with the array of 4 * <count> values pointed
    to by <params>.  The error INVALID_VALUE is generated if the sum of
    <index> and <count> is greater than the number of program environment
    parameters supported by <target>.

    When a program environment parameter is updated, the data type of its
    components is assigned according to the data type of the provided values.
    If the values provided are of type "float" or "double", the components of
    the parameter are floating-point.  If the values provided are of type
    "int", the components of the parameter are signed integers.  If the
    values provided are of type "uint", the components of the parameter are
    unsigned integers.

    ...


    Insert New Section 2.X between Sections 2.Y and 2.Z:

    Section 2.X, GPU Programs

    The GL provides a number of different program targets that allow an
    application to either replace certain fixed-function pipeline stages with
    a fully programmable model or use a program to control aspects of the GL
    pipeline that previously had only hard-wired behavior.

    A common base instruction set is available for all program types,
    providing both integer and floating-point operations.  Structured
    branching operations and subroutine calls are available.  Texture
    mapping (loading data from external images) is supported for all
    program types.  The main differences between the different program
    types are the set of available inputs and outputs, which are program
    type-specific, and a few instructions that are meaningful for only a
    subset of program types.


    Section 2.X.2, Program Grammar

    GPU program strings are specified as an array of ASCII characters
    containing the program text.  When a GPU program is loaded by a call to
    ProgramStringARB, the program string is parsed into a set of tokens
    possibly separated by whitespace.  Spaces, tabs, newlines, carriage
    returns, and comments are considered whitespace.  Comments begin with the
    character "#" and are terminated by a newline, a carriage return, or the
    end of the program array.
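
    The whitespace and comment rules above can be sketched in C as a
    pre-pass over the program text.  This is an illustration, not a
    description of any real parser: strip_comments and
    is_program_whitespace are hypothetical names, and the program is
    assumed to be a NUL-terminated string rather than an array with an
    explicit length.

```c
/* Returns 1 for the four characters the grammar treats as whitespace. */
int is_program_whitespace(char c)
{
    return c == ' ' || c == '\t' || c == '\n' || c == '\r';
}

/* Hypothetical pre-pass: copies `src` to `dst`, replacing each comment
 * with a single space.  A comment starts at '#' and runs up to, but not
 * including, the next newline or carriage return, or to the end of the
 * string.  `dst` must have room for strlen(src) + 1 bytes; the output is
 * never longer than the input. */
void strip_comments(const char *src, char *dst)
{
    while (*src != '\0') {
        if (*src == '#') {
            while (*src != '\0' && *src != '\n' && *src != '\r')
                src++;
            *dst++ = ' ';
        } else {
            *dst++ = *src++;
        }
    }
    *dst = '\0';
}
```

    Replacing a comment with a space rather than deleting it keeps the
    tokens on either side separated, consistent with comments counting as
    whitespace.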

    The Backus-Naur Form (BNF) grammar below specifies the syntactically valid
    sequences for GPU programs.  The set of valid tokens can be inferred
    from the grammar.  A line containing "/* empty */" represents an empty
    string and is used to indicate optional rules.  A program is invalid if it
    contains any tokens or characters not defined in this specification.
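
    As one illustration of checking program text against the grammar, the
    C sketch below tests whether an opcode suffix is a syntactically valid
    <opModifiers> sequence, using the <opModifier> tokens listed in the
    grammar below.  It is a hedged example: count_op_modifiers is a
    hypothetical name, whitespace between tokens is not handled, and the
    check is purely syntactic.  It does not decide which modifiers are
    meaningful for a given opcode or which combinations conflict.

```c
#include <stddef.h>
#include <string.h>

/* The <opModifier> tokens from the grammar. */
static const char *const k_modifiers[] = {
    "F", "U", "S", "CC", "CC0", "CC1",
    "SAT", "SSAT", "NTC", "S24", "U24", "HI"
};

/* Hypothetical checker: `suffix` is the text following an opcode, such
 * as ".S.CC1".  Returns the number of modifiers if the suffix matches
 * <opModifiers>, or -1 if it does not.  An empty suffix is valid and
 * yields 0. */
int count_op_modifiers(const char *suffix)
{
    int count = 0;

    while (*suffix != '\0') {
        const char *end;
        size_t len, i;
        int found = 0;

        if (*suffix != '.')            /* each item is "." <opModifier> */
            return -1;
        suffix++;
        end = strchr(suffix, '.');
        len = end ? (size_t)(end - suffix) : strlen(suffix);
        for (i = 0; i < sizeof k_modifiers / sizeof k_modifiers[0]; i++) {
            if (strlen(k_modifiers[i]) == len &&
                strncmp(suffix, k_modifiers[i], len) == 0) {
                found = 1;
                break;
            }
        }
        if (!found)
            return -1;
        count++;
        suffix += len;
    }
    return count;
}
```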

    Note that this extension is not a standalone extension and a small number
    of grammar rules are left to be defined in the extensions defining the
    specific vertex, fragment, and geometry program types.


    <program>               ::= <optionSequence> <declSequence>
                                <statementSequence> "END"

    <optionSequence>        ::= <option> <optionSequence>
                              | /* empty */

    <option>                ::= "OPTION" <identifier> ";"

    <declSequence>          ::= /* empty */

    <statementSequence>     ::= <statement> <statementSequence>
                              | /* empty */

    <statement>             ::= <instruction> ";"
                              | <namingStatement> ";"
                              | <instLabel> ":"

    <instruction>           ::= <ALUInstruction>
                              | <TexInstruction>
                              | <FlowInstruction>

    <ALUInstruction>        ::= <VECTORop_instruction>
                              | <SCALARop_instruction>
                              | <BINSCop_instruction>
                              | <BINop_instruction>
                              | <VECSCAop_instruction>
                              | <TRIop_instruction>
                              | <SWZop_instruction>

    <TexInstruction>        ::= <TEXop_instruction>
                              | <TXDop_instruction>

    <FlowInstruction>       ::= <BRAop_instruction>
                              | <FLOWCCop_instruction>
                              | <IFop_instruction>
                              | <REPop_instruction>
                              | <ENDFLOWop_instruction>

    <VECTORop_instruction>  ::= <VECTORop> <opModifiers> <instResult> ","
                                <instOperandV>

    <VECTORop>              ::= "ABS"
                              | "CEIL"
                              | "FLR"
                              | "FRC"
                              | "I2F"
                              | "LIT"
                              | "MOV"
                              | "NOT"
                              | "NRM"
                              | "PK2H"
                              | "PK2US"
                              | "PK4B"
                              | "PK4UB"
                              | "ROUND"
                              | "SSG"
                              | "TRUNC"

    <SCALARop_instruction>  ::= <SCALARop> <opModifiers> <instResult> ","
                                <instOperandS>

    <SCALARop>              ::= "COS"
                              | "EX2"
                              | "LG2"
                              | "RCC"
                              | "RCP"
                              | "RSQ"
                              | "SCS"
                              | "SIN"
                              | "UP2H"
                              | "UP2US"
                              | "UP4B"
                              | "UP4UB"

    <BINSCop_instruction>   ::= <BINSCop> <opModifiers> <instResult> ","
                                <instOperandS> "," <instOperandS>

    <BINSCop>               ::= "POW"

    <VECSCAop_instruction>  ::= <VECSCAop> <opModifiers> <instResult> ","
                                <instOperandV> "," <instOperandS>

    <VECSCAop>              ::= "DIV"
                              | "SHL"
                              | "SHR"
                              | "MOD"

    <BINop_instruction>     ::= <BINop> <opModifiers> <instResult> ","
                                <instOperandV> "," <instOperandV>

    <BINop>                 ::= "ADD"
                              | "AND"
                              | "DP3"
                              | "DP4"
                              | "DPH"
                              | "DST"
                              | "MAX"
                              | "MIN"
                              | "MUL"
                              | "OR"
                              | "RFL"
                              | "SEQ"
                              | "SFL"
                              | "SGE"
                              | "SGT"
                              | "SLE"
                              | "SLT"
                              | "SNE"
                              | "STR"
                              | "SUB"
                              | "XPD"
                              | "DP2"
                              | "XOR"

    <TRIop_instruction>     ::= <TRIop> <opModifiers> <instResult> ","
                                <instOperandV> "," <instOperandV> ","
                                <instOperandV>

    <TRIop>                 ::= "CMP"
                              | "DP2A"
                              | "LRP"
                              | "MAD"
                              | "SAD"
                              | "X2D"

    <SWZop_instruction>     ::= <SWZop> <opModifiers> <instResult> ","
                                <instOperandVNS> "," <extendedSwizzle>

    <SWZop>                 ::= "SWZ"

    <TEXop_instruction>     ::= <TEXop> <opModifiers> <instResult> ","
                                <instOperandV> "," <texAccess>

    <TEXop>                 ::= "TEX"
                              | "TXB"
                              | "TXF"
                              | "TXL"
                              | "TXP"
                              | "TXQ"

    <TXDop_instruction>     ::= <TXDop> <opModifiers> <instResult> ","
                                <instOperandV> "," <instOperandV> ","
                                <instOperandV> "," <texAccess>

    <TXDop>                 ::= "TXD"

    <BRAop_instruction>     ::= <BRAop> <opModifiers> <instTarget>
                                <optBranchCond>

    <BRAop>                 ::= "CAL"

    <FLOWCCop_instruction>  ::= <FLOWCCop> <opModifiers> <optBranchCond>

    <FLOWCCop>              ::= "RET"
                              | "BRK"
                              | "CONT"

    <IFop_instruction>      ::= <IFop> <opModifiers> <ccTest>

    <IFop>                  ::= "IF"

    <REPop_instruction>     ::= <REPop> <opModifiers> <instOperandV>
                              | <REPop> <opModifiers>

    <REPop>                 ::= "REP"

    <ENDFLOWop_instruction> ::= <ENDFLOWop> <opModifiers>

    <ENDFLOWop>             ::= "ELSE"
                              | "ENDIF"
                              | "ENDREP"

    <opModifiers>           ::= <opModifierItem> <opModifiers>
                              | /* empty */

    <opModifierItem>        ::= "." <opModifier>

    <opModifier>            ::= "F"
                              | "U"
                              | "S"
                              | "CC"
                              | "CC0"
                              | "CC1"
                              | "SAT"
                              | "SSAT"
                              | "NTC"
                              | "S24"
                              | "U24"
                              | "HI"

    <texAccess>             ::= <texImageUnit> "," <texTarget>
                              | <texImageUnit> "," <texTarget> "," <texOffset>

    <texImageUnit>          ::= "texture" <optArrayMemAbs>

    <texTarget>             ::= "1D"
                              | "2D"
                              | "3D"
                              | "CUBE"
                              | "RECT"
                              | "SHADOW1D"
                              | "SHADOW2D"
                              | "SHADOWRECT"
                              | "ARRAY1D"
                              | "ARRAY2D"
                              | "SHADOWCUBE"
                              | "SHADOWARRAY1D"
                              | "SHADOWARRAY2D"

    <texOffset>             ::= "(" <texOffsetComp> ")"
                              | "(" <texOffsetComp> "," <texOffsetComp> ")"
                              | "(" <texOffsetComp> "," <texOffsetComp> ","
                                <texOffsetComp> ")"

    <texOffsetComp>         ::= <optSign> <int>

    <optBranchCond>         ::= /* empty */
                              | <ccMask>

    <instOperandV>          ::= <instOperandAbsV>
                              | <instOperandBaseV>

    <instOperandAbsV>       ::= <operandAbsNeg> "|" <instOperandBaseV> "|"

    <instOperandBaseV>      ::= <operandNeg> <attribUseV>
                              | <operandNeg> <tempUseV>
                              | <operandNeg> <paramUseV>
                              | <operandNeg> <bufferUseV>

    <instOperandS>          ::= <instOperandAbsS>
                              | <instOperandBaseS>

    <instOperandAbsS>       ::= <operandAbsNeg> "|" <instOperandBaseS> "|"

    <instOperandBaseS>      ::= <operandNeg> <attribUseS>
                              | <operandNeg> <tempUseS>
                              | <operandNeg> <paramUseS>
                              | <operandNeg> <bufferUseS>

    <instOperandVNS>        ::= <attribUseVNS>
                              | <tempUseVNS>
                              | <paramUseVNS>
                              | <bufferUseVNS>

    <operandAbsNeg>         ::= <optSign>

    <operandNeg>            ::= <optSign>

    <instResult>            ::= <instResultCC>
                              | <instResultBase>

    <instResultCC>          ::= <instResultBase> <ccMask>

    <instResultBase>        ::= <tempUseW>
                              | <resultUseW>

    <namingStatement>       ::= <varMods> <ATTRIB_statement>
                              | <varMods> <PARAM_statement>
                              | <varMods> <TEMP_statement>
                              | <varMods> <OUTPUT_statement>
                              | <varMods> <BUFFER_statement>
                              | <ALIAS_statement>

    <ATTRIB_statement>      ::= "ATTRIB" <establishName> "=" <attribUseD>

    <PARAM_statement>       ::= <PARAM_singleStmt>
                              | <PARAM_multipleStmt>

    <PARAM_singleStmt>      ::= "PARAM" <establishName> <paramSingleInit>

    <PARAM_multipleStmt>    ::= "PARAM" <establishName> <optArraySize>
                                <paramMultipleInit>

    <paramSingleInit>       ::= "=" <paramUseDB>

    <paramMultipleInit>     ::= "=" "{" <paramMultInitList> "}"

    <paramMultInitList>     ::= <paramUseDM>
                              | <paramUseDM> "," <paramMultInitList>

    <TEMP_statement>        ::= "TEMP" <varNameList>

    <OUTPUT_statement>      ::= "OUTPUT" <establishName> "=" <resultUseD>

    <varMods>               ::= <varModifier> <varMods>
                              | /* empty */

    <varModifier>           ::= "SHORT"
                              | "LONG"
                              | "INT"
                              | "UINT"
                              | "FLOAT"

    <ALIAS_statement>       ::= "ALIAS" <establishName> "=" <establishedName>

    <BUFFER_statement>      ::= <bufferDeclType> <establishName>
                                <bufferSingleInit>
                              | <bufferDeclType> <establishName>
                                <optArraySize> <bufferMultInit>

    <bufferDeclType>        ::= "BUFFER"
                              | "BUFFER4"

    <bufferSingleInit>      ::= "=" <bufferUseDB>

    <bufferMultInit>        ::= "=" "{" <bufferMultInitList> "}"

    <bufferMultInitList>    ::= <bufferUseDM>
                              | <bufferUseDM> "," <bufferMultInitList>

    <varNameList>           ::= <establishName>
                              | <establishName> "," <varNameList>

    <attribUseV>            ::= <attribBasic> <swizzleSuffix>
                              | <attribVarName> <swizzleSuffix>
                              | <attribVarName> <arrayMem> <swizzleSuffix>
                              | <attribColor> <swizzleSuffix>
                              | <attribColor> "." <colorType> <swizzleSuffix>

    <attribUseS>            ::= <attribBasic> <scalarSuffix>
                              | <attribVarName> <scalarSuffix>
                              | <attribVarName> <arrayMem> <scalarSuffix>
                              | <attribColor> <scalarSuffix>
                              | <attribColor> "." <colorType> <scalarSuffix>

    <attribUseVNS>          ::= <attribBasic>
                              | <attribVarName>
                              | <attribVarName> <arrayMem>
                              | <attribColor>
                              | <attribColor> "." <colorType>

    <attribUseD>            ::= <attribBasic>
                              | <attribColor>
                              | <attribColor> "." <colorType>
                              | <attribMulti>

    <paramUseV>             ::= <paramVarName> <optArrayMem> <swizzleSuffix>
                              | <stateSingleItem> <swizzleSuffix>
                              | <programSingleItem> <swizzleSuffix>
                              | <constantVector> <swizzleSuffix>
                              | <constantScalar>

    <paramUseS>             ::= <paramVarName> <optArrayMem> <scalarSuffix>
                              | <stateSingleItem> <scalarSuffix>
                              | <programSingleItem> <scalarSuffix>
                              | <constantVector> <scalarSuffix>
                              | <constantScalar>

    <paramUseVNS>           ::= <paramVarName> <optArrayMem>
                              | <stateSingleItem>
                              | <programSingleItem>
                              | <constantVector>
                              | <constantScalar>

    <paramUseDB>            ::= <stateSingleItem>
                              | <programSingleItem>
                              | <constantVector>
                              | <signedConstantScalar>

    <paramUseDM>            ::= <stateMultipleItem>
                              | <programMultipleItem>
                              | <constantVector>
                              | <signedConstantScalar>

    <stateMultipleItem>     ::= <stateSingleItem>
                              | "state" "." <stateMatrixRows>

    <stateSingleItem>       ::= "state" "." <stateMaterialItem>
                              | "state" "." <stateLightItem>
                              | "state" "." <stateLightModelItem>
                              | "state" "." <stateLightProdItem>
                              | "state" "." <stateFogItem>
                              | "state" "." <stateMatrixRow>
                              | "state" "." <stateTexGenItem>
                              | "state" "." <stateClipPlaneItem>
                              | "state" "." <statePointItem>
                              | "state" "." <stateTexEnvItem>
                              | "state" "." <stateDepthItem>

    <stateMaterialItem>     ::= "material" "." <stateMatProperty>
                              | "material" "." <faceType> "."
                                <stateMatProperty>

    <stateMatProperty>      ::= "ambient"
                              | "diffuse"
                              | "specular"
                              | "emission"
                              | "shininess"

    <stateLightItem>        ::= "light" <arrayMemAbs> "." <stateLightProperty>

    <stateLightProperty>    ::= "ambient"
                              | "diffuse"
                              | "specular"
                              | "position"
                              | "attenuation"
                              | "spot" "." <stateSpotProperty>
                              | "half"

    <stateSpotProperty>     ::= "direction"

    <stateLightModelItem>   ::= "lightmodel" "." <stateLModProperty>

    <stateLModProperty>     ::= "ambient"
                              | "scenecolor"
                              | <faceType> "." "scenecolor"

    <stateLightProdItem>    ::= "lightprod" <arrayMemAbs> "."
                                <stateLProdProperty>
                              | "lightprod" <arrayMemAbs> "." <faceType> "."
                                <stateLProdProperty>

    <stateLProdProperty>    ::= "ambient"
                              | "diffuse"
                              | "specular"

    <stateFogItem>          ::= "fog" "." <stateFogProperty>

    <stateFogProperty>      ::= "color"
                              | "params"

    <stateMatrixRows>       ::= <stateMatrixItem>
                              | <stateMatrixItem> "." <stateMatModifier>
                              | <stateMatrixItem> "." "row" <arrayRange>
                              | <stateMatrixItem> "." <stateMatModifier> "."
                                "row" <arrayRange>

    <stateMatrixRow>        ::= <stateMatrixItem> "." "row" <arrayMemAbs>
                              | <stateMatrixItem> "." <stateMatModifier> "."
                                "row" <arrayMemAbs>

    <stateMatrixItem>       ::= "matrix" "." <stateMatrixName>

    <stateMatModifier>      ::= "inverse"
                              | "transpose"
                              | "invtrans"

    <stateMatrixName>       ::= "modelview" <optArrayMemAbs>
                              | "projection"
                              | "mvp"
                              | "texture" <optArrayMemAbs>
                              | "program" <arrayMemAbs>

    <stateTexGenItem>       ::= "texgen" <optArrayMemAbs> "."
                                <stateTexGenType> "." <stateTexGenCoord>

    <stateTexGenType>       ::= "eye"
                              | "object"

    <stateTexGenCoord>      ::= "s"
                              | "t"
                              | "r"
                              | "q"

    <stateClipPlaneItem>    ::= "clip" <arrayMemAbs> "." "plane"

    <statePointItem>        ::= "point" "." <statePointProperty>

    <statePointProperty>    ::= "size"
                              | "attenuation"

    <stateTexEnvItem>       ::= "texenv" <optArrayMemAbs> "."
                                <stateTexEnvProperty>

    <stateTexEnvProperty>   ::= "color"

    <stateDepthItem>        ::= "depth" "." <stateDepthProperty>

    <stateDepthProperty>    ::= "range"

    <programSingleItem>     ::= <progEnvParam>
                              | <progLocalParam>

    <programMultipleItem>   ::= <progEnvParams>
                              | <progLocalParams>

    <progEnvParams>         ::= "program" "." "env" <arrayMemAbs>
                              | "program" "." "env" <arrayRange>

    <progEnvParam>          ::= "program" "." "env" <arrayMemAbs>

    <progLocalParams>       ::= "program" "." "local" <arrayMemAbs>
                              | "program" "." "local" <arrayRange>

    <progLocalParam>        ::= "program" "." "local" <arrayMemAbs>

    <constantVector>        ::= "{" <constantVectorList> "}"

    <constantVectorList>    ::= <signedConstantScalar>
                              | <signedConstantScalar> ","
                                <signedConstantScalar>
                              | <signedConstantScalar> ","
                                <signedConstantScalar> ","
                                <signedConstantScalar>
                              | <signedConstantScalar> ","
                                <signedConstantScalar> ","
                                <signedConstantScalar> ","
                                <signedConstantScalar>

    <signedConstantScalar>  ::= <optSign> <constantScalar>

    <constantScalar>        ::= <floatConstant>
                              | <intConstant>

    <floatConstant>         ::= <float>

    <intConstant>           ::= <int>

    <tempUseV>              ::= <tempVarName> <swizzleSuffix>

    <tempUseS>              ::= <tempVarName> <scalarSuffix>

    <tempUseVNS>            ::= <tempVarName>

    <tempUseW>              ::= <tempVarName> <optWriteMask>

    <resultUseW>            ::= <resultBasic> <optWriteMask>
                              | <resultVarName> <optWriteMask>

    <resultUseD>            ::= <resultBasic>

    <bufferUseV>            ::= <bufferVarName> <optArrayMem> <swizzleSuffix>

    <bufferUseS>            ::= <bufferVarName> <optArrayMem> <scalarSuffix>

    <bufferUseVNS>          ::= <bufferVarName> <optArrayMem>

    <bufferUseDB>           ::= <bufferBinding> <arrayMemAbs>

    <bufferUseDM>           ::= <bufferBinding> <arrayMemAbs>
                              | <bufferBinding> <arrayRange>
                              | <bufferBinding>

    <bufferBinding>         ::= "program" "." "buffer" <arrayMemAbs>

    <optArraySize>          ::= "[" "]"
                              | "[" <int> "]"

    <optArrayMem>           ::= /* empty */
                              | <arrayMem>

    <arrayMem>              ::= <arrayMemAbs>
                              | <arrayMemRel>

    <optArrayMemAbs>        ::= /* empty */
                              | <arrayMemAbs>

    <arrayMemAbs>           ::= "[" <int> "]"

    <arrayMemRel>           ::= "[" <arrayMemReg> <arrayMemOffset> "]"

    <arrayMemReg>           ::= <addrUseS>

    <arrayMemOffset>        ::= /* empty */
                              | "+" <int>
                              | "-" <int>

    <arrayRange>            ::= "[" <int> ".." <int> "]"

    <addrUseS>              ::= <addrVarName> <scalarSuffix>

    <ccMask>                ::= "(" <ccTest> ")"

    <ccTest>                ::= <ccMaskRule> <swizzleSuffix>

    <ccMaskRule>            ::= "EQ"
                              | "GE"
                              | "GT"
                              | "LE"
                              | "LT"
                              | "NE"
                              | "TR"
                              | "FL"
                              | "EQ0"
                              | "GE0"
                              | "GT0"
                              | "LE0"
                              | "LT0"
                              | "NE0"
                              | "TR0"
                              | "FL0"
                              | "EQ1"
                              | "GE1"
                              | "GT1"
                              | "LE1"
                              | "LT1"
                              | "NE1"
                              | "TR1"
                              | "FL1"
                              | "NAN"
                              | "NAN0"
                              | "NAN1"
                              | "LEG"
                              | "LEG0"
                              | "LEG1"
                              | "CF"
                              | "CF0"
                              | "CF1"
                              | "NCF"
                              | "NCF0"
                              | "NCF1"
                              | "OF"
                              | "OF0"
                              | "OF1"
                              | "NOF"
                              | "NOF0"
                              | "NOF1"
                              | "AB"
                              | "AB0"
                              | "AB1"
                              | "BLE"
                              | "BLE0"
                              | "BLE1"
                              | "SF"
                              | "SF0"
                              | "SF1"
                              | "NSF"
                              | "NSF0"
                              | "NSF1"

    <optWriteMask>          ::= /* empty */
                              | <xyzwMask>
                              | <rgbaMask>

    <xyzwMask>              ::= "." "x"
                              | "." "y"
                              | "." "xy"
                              | "." "z"
                              | "." "xz"
                              | "." "yz"
                              | "." "xyz"
                              | "." "w"
                              | "." "xw"
                              | "." "yw"
                              | "." "xyw"
                              | "." "zw"
                              | "." "xzw"
                              | "." "yzw"
                              | "." "xyzw"

    <rgbaMask>              ::= "." "r"
                              | "." "g"
                              | "." "rg"
                              | "." "b"
                              | "." "rb"
                              | "." "gb"
                              | "." "rgb"
                              | "." "a"
                              | "." "ra"
                              | "." "ga"
                              | "." "rga"
                              | "." "ba"
                              | "." "rba"
                              | "." "gba"
                              | "." "rgba"

    <swizzleSuffix>         ::= /* empty */
                              | "." <component>
                              | "." <xyzwSwizzle>
                              | "." <rgbaSwizzle>

    <extendedSwizzle>       ::= <extSwizComp> "," <extSwizComp> ","
                                <extSwizComp> "," <extSwizComp>

    <extSwizComp>           ::= <optSign> <xyzwExtSwizSel>
                              | <optSign> <rgbaExtSwizSel>

    <xyzwExtSwizSel>        ::= "0"
                              | "1"
                              | <xyzwComponent>

    <rgbaExtSwizSel>        ::= <rgbaComponent>

    <scalarSuffix>          ::= "." <component>

    <component>             ::= <xyzwComponent>
                              | <rgbaComponent>

    <xyzwComponent>         ::= "x"
                              | "y"
                              | "z"
                              | "w"

    <rgbaComponent>         ::= "r"
                              | "g"
                              | "b"
                              | "a"

    <optSign>               ::= /* empty */
                              | "-"
                              | "+"

    <faceType>              ::= "front"
                              | "back"

    <colorType>             ::= "primary"
                              | "secondary"

    <instLabel>             ::= <identifier>

    <instTarget>            ::= <identifier>

    <establishedName>       ::= <identifier>

    <establishName>         ::= <identifier>


    The <int> rule matches an integer constant.  The integer consists of a
    sequence of one or more digits ("0" through "9"), or a sequence in
    hexadecimal form beginning with "0x" followed by a sequence of one or more
    hexadecimal digits ("0" through "9", "a" through "f", "A" through "F").

    The <float> rule matches a floating-point constant consisting of an
    integer part, a decimal point, a fraction part, an "e" or "E", and an
    optionally signed integer exponent.  The integer and fraction parts both
    consist of a sequence of one or more digits ("0" through "9").  Either the
    integer part or the fraction part (not both) may be missing; either the
    decimal point or the "e" (or "E") and the exponent (not both) may be
    missing.  Most grammar rules that allow floating-point values also allow
    integers matching the <int> rule.
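
    As an illustration, the following literals are classified by applying the
    <int> and <float> rules above directly.  This is a sketch rather than an
    exhaustive list; any sign is matched separately by <optSign>.

      16          # <int>, decimal
      0x1F        # <int>, hexadecimal
      1.0         # <float>, exponent omitted
      .5          # <float>, integer part and exponent omitted
      3.          # <float>, fraction part and exponent omitted
      1e-3        # <float>, decimal point and fraction part omitted
      2.5E+4      # <float>, all parts present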

    The <identifier> rule matches a sequence of one or more letters ("A"
    through "Z", "a" through "z"), digits ("0" through "9"), underscores ("_"),
    or dollar signs ("$"); the first character must not be a digit.  Upper
    and lower case letters are considered different (names are
    case-sensitive).  The following strings are reserved keywords and may not
    be used as identifiers:  "fragment" (for fragment programs only), "vertex"
    (for vertex and geometry programs), "primitive" (for fragment and geometry
    programs), "program", "result", "state", and "texture".
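
    For example, applying the <identifier> rule to a few hypothetical names
    (none of which are defined by this specification):

      _tmp0       # valid
      $scale      # valid
      Color       # valid, and distinct from "color"
      2ndPass     # invalid, begins with a digit
      my-var      # invalid, "-" is not a permitted character
      state       # invalid, reserved keyword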

    The <tempVarName>, <paramVarName>, <attribVarName>, <resultVarName>, and
    <bufferVarName> rules match identifiers that have been previously
    established as names of temporary, program parameter, attribute, result,
    and program parameter buffer variables, respectively.

    The <xyzwSwizzle> and <rgbaSwizzle> rules match any 4-character string
    consisting only of the characters "x", "y", "z", and "w" (<xyzwSwizzle>)
    or "r", "g", "b", "a" (<rgbaSwizzle>).

    The error INVALID_OPERATION is generated if a program fails to load
    because it is not syntactically correct or because it violates one of the
    semantic restrictions described in the following sections.

    A successfully loaded program is parsed into a sequence of instructions.
    Each instruction is identified by its tokenized name.  The operation of
    these instructions when executed is defined in section 2.X.4.  A
    successfully loaded program string replaces the program string previously
    loaded into the specified program object.  If the OUT_OF_MEMORY error is
    generated by ProgramStringARB, no change is made to the previous contents
    of the current program object.


    Section 2.X.3, Program Variables

    Programs may operate on a number of different variables during their
    execution.  The following sections define the different classes of
    variables that can be declared and used by a program.

    Some variable classes require variable bindings.  Variable classes with
    bindings refer to state that is either generated or consumed outside the
    program.  Examples of variable bindings include a vertex's normal, the
    position of a vertex computed by a vertex program, an interpolated texture
    coordinate, and the diffuse color of light 1.  Variables that are used
    only during program execution do not have bindings.

    Variables may be declared explicitly according to the <namingStatement>
    grammar rule.  Explicit variable declarations allow a program to establish
    a variable name that can be used to refer to a specified resource in
    subsequent instructions.  Variables may be declared anywhere in the
    program string, but must be declared prior to use.  A program will fail to
    load if it declares the same variable name more than once, or if it refers
    to a variable name that has not been previously declared in the program
    string.

    Variables may also be declared implicitly, simply by using a variable
    binding as an operand in a program instruction.  Such uses are considered
    to automatically create a nameless variable using the specified binding.
    Only variables from classes with bindings can be declared implicitly.
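
    A minimal vertex program sketch contrasting explicit and implicit
    declarations follows.  It assumes the "!!NVvp4.0" program header and the
    "vertex.*" and "result.*" bindings provided by NV_vertex_program4 and
    ARB_vertex_program; the variable names are illustrative only.

      !!NVvp4.0
      ATTRIB pos    = vertex.position;        # explicit, named "pos"
      PARAM  mvp[4] = { state.matrix.mvp };   # explicit, 4-entry array
      TEMP   clip;                            # no binding, explicit only
      DP4    clip.x, mvp[0], pos;
      DP4    clip.y, mvp[1], pos;
      DP4    clip.z, mvp[2], pos;
      DP4    clip.w, mvp[3], pos;
      MOV    result.position, clip;           # implicit result variable
      MOV    result.color, vertex.color;      # implicit attribute and result
      END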


    Section 2.X.3.1, Program Variable Types

    Explicit variable declarations may include one or more modifiers that
    specify additional information about the variable, such as the size and
    data type of the components of the variable.  Variable modifiers are
    specified according to the <varModifier> grammar rule.

    By default, variables are considered typeless.  They can be used in
    instructions that read or write the variable as floating-point values,
    signed integers, or unsigned integers.  If a variable is written using one
    data type but then read using a different one, the results of the
    operation are undefined.  Variables with bindings are considered to be
    read or written when their values are produced or consumed; the data type
    used by the GL is specified in the description of each binding.

    Explicitly declared variables may optionally have one data type modifier,
    which can be used to detect data type mismatch errors.  Type modifiers of
    "INT", "UINT", and "FLOAT" indicate that the components of the variable
    are stored as signed integers, unsigned integers, or floating-point
    values, respectively.  A program will fail to load if it attempts to read
    or write a variable using a data type other than the one indicated by the
    data type modifier.  Variables without a data type modifier can be read or
    written using any data type.

    Explicitly declared variables may optionally have one storage size
    modifier.  Variables declared as "SHORT" will be represented using at least
    16 bits per component.  "SHORT" floating-point values will have at least 5
    bits of exponent and 10 bits of mantissa.  Variables declared as "LONG"
    will be represented with at least 32 bits per component.  "LONG"
    floating-point values will have at least 8 bits of exponent and 23 bits of
    mantissa.  If no size modifier is provided, the GL will automatically
    select component sizes.  Implementations are not required to support more
    than one component size, so "SHORT", "LONG", and the default could all
    refer to the same component size.  The "LONG" modifier is supported only
    for declarations of temporary variables ("TEMP").  The "SHORT" modifier is
    supported only for declarations of temporary variables and result
    variables ("OUTPUT").

    Each variable declaration can include at most one data type and one
    storage size modifier.  A program will fail to load if it specifies
    multiple data type or multiple storage size modifiers in a single variable
    declaration.
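
    The following declarations sketch how the data type and storage size
    modifiers combine.  Each line is to be read on its own.  The example
    assumes that modifiers precede the declaration keyword, as specified by
    the <namingStatement> grammar rule, and uses the ".S" (signed integer) and
    ".F" (floating-point) opcode modifiers described with the instruction set
    later in this specification.  Variable names are illustrative.

      INT TEMP count;            # components are signed integers
      SHORT FLOAT TEMP h;        # floating-point, at least 16 bits each
      LONG TEMP acc;             # typeless, at least 32 bits each
      TEMP any;                  # typeless, component size chosen by the GL

      INT UINT TEMP bad0;        # fails to load: two data type modifiers
      SHORT LONG TEMP bad1;      # fails to load: two storage size modifiers
      LONG PARAM bad2 = 1.0;     # fails to load: "LONG" is TEMP-only

      ADD.S count, count, 1;     # OK: "count" accessed as signed integer
      ADD.F count, count, h;     # fails to load: "count" is declared INT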

    (NOTE:  Fragment programs also support the modifiers "FLAT", "CENTROID",
    and "NOPERSPECTIVE", which control how per-fragment attribute values are
    produced.  These modifiers are described in detail in the
    NV_fragment_program4 specification.)

    Explicitly declared variables of all types may be declared as arrays.  An
    array variable has one or more members, numbered 0 through <n>-1, where
    <n> is the number of entries in the array.  The total number of entries in
    the array can be declared using the <optArraySize> grammar rule.  For
    variable classes without bindings, an array size must be specified in the
    program, and must be a positive integer.  For variable classes with
    bindings, a declared size is optional, and is taken from the number of
    bindings assigned in the declaration if omitted.  A program will fail to
    load if the declared size of an array variable does not match the number
    of assigned bindings.

    When a variable is declared as an array, instructions that use the
    variable must specify an array member to access according to the
    <arrayMem> grammar rule.  A program will fail to load if it contains an
    instruction that accesses an array variable without specifying an array
    member or an instruction that specifies an array member for a non-array
    variable.
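
    For example, using program parameter arrays (a sketch with illustrative
    names; each line is to be read on its own):

      PARAM a[4] = { program.env[0..3] };     # declared size matches 4 bindings
      PARAM b[]  = { program.local[0..7] };   # size omitted: 8 entries
      PARAM c[3] = { program.env[0..3] };     # fails to load: 3 != 4 bindings
      PARAM s    = program.env[4];            # not an array
      TEMP  t;

      MOV t, a[2];                            # OK: array member specified
      MOV t, a;                               # fails to load: no array member
      MOV t, s[0];                            # fails to load: "s" is no array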


    Section 2.X.3.2, Program Attribute Variables

    Program attribute variables represent per-vertex or per-fragment inputs to
    the program.  All attribute variables have associated bindings, and are
    read-only during program execution.  Attribute variables may be declared
    explicitly via the <ATTRIB_statement> grammar rule, or implicitly by using
    an attribute binding in an instruction.

    The set of available attribute bindings depends on the program type, and
    is enumerated in the specifications for each program type.

    The set of bindings allowed for attribute array variables is limited to
    attribute state grouped in arrays (e.g., texture coordinates, generic
    vertex attributes).  Additionally, all bindings assigned to the array must
    be of the same binding type and must increase consecutively.  Examples of
    valid and invalid binding lists include:

      vertex.attrib[1], vertex.attrib[2]      # valid, 2-entry array
      vertex.texcoord[0..3]                   # valid, 4-entry array
      vertex.attrib[1], vertex.attrib[3]      # invalid, skipped attrib 2
      vertex.attrib[2], vertex.attrib[1]      # invalid, wrong order
      vertex.attrib[1], vertex.texcoord[2]    # invalid, different types

    Additionally, attribute bindings may be used in no more than one array
    variable accessed with relative addressing.

    Implementations may have a limit on the total number of attribute binding
    components used by each program target (MAX_PROGRAM_ATTRIB_COMPONENTS_NV).
    Programs that use more attribute binding components than this limit will
    fail to load.  The method of counting used attribute binding components is
    implementation-dependent, but must satisfy the following properties:

      * If an attribute binding is not referenced in a program, or is
        referenced only in declarations of attribute variables that are not
        used, none of its components are counted.

      * An attribute binding component may be counted as used only if there
        exists an instruction operand where

          - the component is enabled for read by the swizzle pattern (Section
            2.X.4.2), and

          - the attribute binding is

              - referenced directly by the operand,

              - bound to a declared variable referenced by the operand, or

              - bound to a declared array variable where another binding in
                the array satisfies one of the two previous conditions.

        Implementations are not required to optimize out unused elements of an
        attribute array or components that are used in only some elements of
        an array.  The last of these rules is intended to cover the case where
        the same attribute binding is used in multiple variables.

        For example, an operand whose swizzle pattern selects only the x
        component may result in the x component of an attribute binding being
        counted, but may never result in the counting of the y, z, or w
        components of any attribute binding.

      * Implementations are not required to determine that components read by
        an instruction are actually unused due to:

          - instruction write masks (for example, a component-wise ADD
            operation that only writes the "x" component doesn't have to read
            the "y", "z", and "w" components of its operands) or

          - any other properties of the instruction (for example, the DP3
            instruction, which computes a 3-component dot product, doesn't
            have to read the "w" component of its operands).
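
    As a sketch of how these counting rules apply, consider the two
    instructions below, each taken as the only use of the array in a separate
    program.  The "vertex.texcoord" binding is assumed from the vertex program
    specifications, and the variable names are illustrative.

      ATTRIB tc[2] = { vertex.texcoord[0..1] };
      TEMP   t;

      MOV t.x, tc[0].x;   # swizzle reads only x:  the x components of
                          # texcoord 0 and 1 may be counted (array rule),
                          # but never any y, z, or w component

      MOV t.x, tc[0];     # swizzle reads x, y, z, and w:  all components
                          # of both bindings may be counted, since the
                          # write mask need not be taken into account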


    Section 2.X.3.3, Program Parameters

    Program parameter variables are used as constants during program
    execution.  All program parameter variables have associated bindings and
    are read-only during program execution.  Program parameters retain their
    values across program invocations, although their values may change
    between invocations due to GL state changes.  Program parameter variables
    may be declared explicitly via the <PARAM_statement> grammar rule, or
    implicitly by using a parameter binding in an instruction.  Except where
    otherwise specified, program parameter bindings always specify
    floating-point values.

    When declaring program parameter array variables, all bindings are
    supported and can be assigned to array members in any order.  The only
    restriction is that no parameter binding may be used more than once in
    array variables accessed using relative addressing.  A program will fail
    to load if any program parameter binding is used more than once in a
    single array accessed using relative addressing or used at least once in
    two or more arrays accessed using relative addressing.


    Constant Bindings

    If a program parameter binding matches the <constantScalar> or
    <signedConstantScalar> grammar rules, the corresponding program parameter
    variable is bound to the vector (X,X,X,X), where X is the value of the
    specified constant.

    If a program parameter binding matches <constantVector>, the corresponding
    program parameter variable is bound to the vector (X,Y,Z,W), where X, Y,
    Z, and W are the values corresponding to the first, second, third, and
    fourth match of <signedConstantScalar>.  If fewer than four constants are
    specified, the components whose constants are omitted take default
    values:  0 for Y, 0 for Z, and 1 for W.
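
    For example, the following declarations (with illustrative names) bind
    the vectors shown in the comments:

      PARAM a = 2.0;                        # (2.0, 2.0, 2.0, 2.0)
      PARAM b = -0.5;                       # (-0.5, -0.5, -0.5, -0.5)
      PARAM c = { 1.0 };                    # (1.0, 0.0, 0.0, 1.0)
      PARAM d = { 1.0, 2.0 };               # (1.0, 2.0, 0.0, 1.0)
      PARAM e = { 1.0, 2.0, 3.0 };          # (1.0, 2.0, 3.0, 1.0)
      PARAM f = { 1.0, 2.0, 3.0, 4.0 };     # (1.0, 2.0, 3.0, 4.0)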

    Constant bindings can be interpreted as having signed integer, unsigned
    integer, or floating-point values, depending on how they are used in the
    program text.  For constants in variable declarations, the components of
    the constant are interpreted according to the variable's component data
    type modifier.  If no data type modifier is specified in a declaration,
    constants are interpreted as floating-point values.  For constant bindings
    used directly in an instruction, the components of the constant are
    interpreted according to the required data type of the operand.  A program
    will fail to load if it specifies a floating-point constant value
    (matching the <floatConstant> grammar rule) that should be interpreted as
    a signed or unsigned integer, or a negative integer constant value that
    should be interpreted as an unsigned integer.

    If the value used to specify a floating-point constant cannot be exactly
    represented, the nearest floating-point value will be used.  If the value
    used to specify an integer constant is too large to be represented, the
    program will fail to load.
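
    The interpretation rules for constants in declarations can be sketched as
    follows.  Names are illustrative, and each line is to be read on its own.

      INT  PARAM i = { 1, -2, 3, 4 };   # signed integers
      UINT PARAM u = 0xFF;              # unsigned (255, 255, 255, 255)
      PARAM      f = { 1, 2, 3, 4 };    # no modifier: floating-point values
      INT  PARAM bad0 = 1.5;            # fails to load: <floatConstant>
                                        # interpreted as an integer
      UINT PARAM bad1 = -1;             # fails to load: negative constant
                                        # interpreted as unsigned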


    Program Environment/Local Parameter Bindings

      Binding                    Components  Underlying State
      -------------------------  ----------  -------------------------------
      program.env[a]             (x,y,z,w)   program environment parameter a
      program.local[a]           (x,y,z,w)   program local parameter a
      program.env[a..b]          (x,y,z,w)   program environment parameters
                                             a through b
      program.local[a..b]        (x,y,z,w)   program local parameters
                                             a through b

      Table X.1:  Program Environment/Local Parameter Bindings.  <a> and <b>
      indicate parameter numbers, where <a> must be less than or equal to <b>.

    If a program parameter binding matches "program.env[a]" or
    "program.local[a]", the four components of the program parameter variable
    are filled with the four components of program environment parameter <a>
    or program local parameter <a> respectively.

    Additionally, for program parameter array bindings, "program.env[a..b]"
    and "program.local[a..b]" are equivalent to specifying program environment
    or local parameters <a> through <b> in order, respectively.  A program
    using any of these bindings will fail to load if <a> is greater than <b>.
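
    For example (a sketch with illustrative names):

      PARAM e     = program.env[2];             # one environment parameter
      PARAM l[]   = { program.local[0..3] };    # locals 0, 1, 2, and 3
      PARAM mix[] = { program.local[5],         # bindings may be mixed and
                      program.env[0..1],        # listed in any order;
                      program.env[7] };         # 4 entries in total
      PARAM bad[] = { program.env[3..1] };      # fails to load: 3 > 1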

    Program environment and local parameters are typeless, and may be
    specified as signed integer, unsigned integer, or floating-point
    variables.  If a program environment parameter is read using a data type
    other than the one used to specify it, an undefined value is returned.


    Material Property Bindings

      Binding                        Components  Underlying State
      -----------------------------  ----------  ----------------------------
      state.material.ambient         (r,g,b,a)   front ambient material color
      state.material.diffuse         (r,g,b,a)   front diffuse material color
      state.material.specular        (r,g,b,a)   front specular material color
      state.material.emission        (r,g,b,a)   front emissive material color
      state.material.shininess       (s,0,0,1)   front material shininess
      state.material.front.ambient   (r,g,b,a)   front ambient material color
      state.material.front.diffuse   (r,g,b,a)   front diffuse material color
      state.material.front.specular  (r,g,b,a)   front specular material color
      state.material.front.emission  (r,g,b,a)   front emissive material color
      state.material.front.shininess (s,0,0,1)   front material shininess
      state.material.back.ambient    (r,g,b,a)   back ambient material color
      state.material.back.diffuse    (r,g,b,a)   back diffuse material color
      state.material.back.specular   (r,g,b,a)   back specular material color
      state.material.back.emission   (r,g,b,a)   back emissive material color
      state.material.back.shininess  (s,0,0,1)   back material shininess

      Table X.3:  Material Property Bindings.  If a material face is not
      specified in the binding, the front property is used.

    If a program parameter binding matches any of the material properties
    listed in Table X.3, the program parameter variable is filled according to
    the table.  For ambient, diffuse, specular, or emissive colors, the "x",
    "y", "z", and "w" components are filled with the "r", "g", "b", and "a"
    components, respectively, of the corresponding material color.  For
    material shininess, the "x" component is filled with the material's
    specular exponent, and the "y", "z", and "w" components are filled with
    the floating-point constants 0, 0, and 1, respectively.  Bindings
    containing ".back" refer to the back material; all other bindings refer to
    the front material.
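
    For example, the following declarations (with illustrative names) select
    front and back material state according to Table X.3:

      PARAM shine = state.material.shininess;       # front, (s, 0, 0, 1)
      PARAM fdiff = state.material.front.diffuse;   # front, same state as
                                                    # state.material.diffuse
      PARAM bdiff = state.material.back.diffuse;    # back, (r, g, b, a)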

    Material properties can be changed inside a Begin/End pair, either
    directly by calling Material, or indirectly through color material.
    However, such property changes are not guaranteed to update program
    parameter bindings until the following End command.  Program parameter
    variables bound to material properties changed inside a Begin/End pair are
    undefined until the following End command.


    Light Property Bindings

      Binding                        Components  Underlying State
      -----------------------------  ----------  ----------------------------
      state.light[n].ambient         (r,g,b,a)   light n ambient color
      state.light[n].diffuse         (r,g,b,a)   light n diffuse color
      state.light[n].specular        (r,g,b,a)   light n specular color
      state.light[n].position        (x,y,z,w)   light n position
      state.light[n].attenuation     (a,b,c,e)   light n attenuation constants
                                                 and spot light exponent
      state.light[n].spot.direction  (x,y,z,c)   light n spot direction and
                                                 cutoff angle cosine
      state.light[n].half            (x,y,z,1)   light n infinite half-angle
      state.lightmodel.ambient       (r,g,b,a)   light model ambient color
      state.lightmodel.scenecolor    (r,g,b,a)   light model front scene color
      state.lightmodel.              (r,g,b,a)   light model front scene color
               front.scenecolor
      state.lightmodel.              (r,g,b,a)   light model back scene color
               back.scenecolor
      state.lightprod[n].ambient     (r,g,b,a)   light n / front material
                                                 ambient color product
      state.lightprod[n].diffuse     (r,g,b,a)   light n / front material
                                                 diffuse color product
      state.lightprod[n].specular    (r,g,b,a)   light n / front material
                                                 specular color product
      state.lightprod[n].            (r,g,b,a)   light n / front material
              front.ambient                      ambient color product
      state.lightprod[n].            (r,g,b,a)   light n / front material
              front.diffuse                      diffuse color product
      state.lightprod[n].            (r,g,b,a)   light n / front material
              front.specular                     specular color product
      state.lightprod[n].            (r,g,b,a)   light n / back material
              back.ambient                       ambient color product
      state.lightprod[n].            (r,g,b,a)   light n / back material
              back.diffuse                       diffuse color product
      state.lightprod[n].            (r,g,b,a)   light n / back material
              back.specular                      specular color product

      Table X.4: Light Property Bindings.  <n> indicates a light number.

    If a program parameter binding matches "state.light[n].ambient",
    "state.light[n].diffuse", or "state.light[n].specular", the "x", "y", "z",
    and "w" components of the program parameter variable are filled with the
    "r", "g", "b", and "a" components, respectively, of the corresponding
    light color.

    If a program parameter binding matches "state.light[n].position", the "x",
    "y", "z", and "w" components of the program parameter variable are filled
    with the "x", "y", "z", and "w" components, respectively, of the light
    position.

    If a program parameter binding matches "state.light[n].attenuation", the
    "x", "y", and "z" components of the program parameter variable are filled
    with the constant, linear, and quadratic attenuation parameters of the
    specified light, respectively (section 2.13.1).  The "w" component of the
    program parameter variable is filled with the spot light exponent of the
    specified light.

    If a program parameter binding matches "state.light[n].spot.direction",
    the "x", "y", and "z" components of the program parameter variable are
    filled with the "x", "y", and "z" components of the spot light direction
    of the specified light, respectively (section 2.13.1).  The "w" component
    of the program parameter variable is filled with the cosine of the spot
    light cutoff angle of the specified light.
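
    As a minimal sketch of how the attenuation and spot direction bindings
    are packed, the Python helpers below assume the light state is supplied
    as plain numbers, with the spot cutoff given in degrees.  The function
    names are hypothetical and are not part of any GL API.

```python
import math

def light_attenuation_binding(constant, linear, quadratic, spot_exponent):
    """Pack state.light[n].attenuation as (x, y, z, w)."""
    # x, y, z: constant, linear, quadratic attenuation; w: spot exponent.
    return (constant, linear, quadratic, spot_exponent)

def light_spot_direction_binding(direction, cutoff_degrees):
    """Pack state.light[n].spot.direction as (x, y, z, w)."""
    x, y, z = direction
    # w holds the cosine of the cutoff angle, not the angle itself.
    return (x, y, z, math.cos(math.radians(cutoff_degrees)))
```

    Note that a program reading the "w" component of the spot direction
    binding receives a cosine and can compare it directly against a dot
    product of unit vectors.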

    If a program parameter binding matches "state.light[n].half", the "x",
    "y", and "z" components of the program parameter variable are filled with
    the x, y, and z components, respectively, of the normalized infinite
    half-angle vector

      h_inf = || P + (0, 0, 1) ||.

    The "w" component is filled with 1.0.  In the computation of h_inf, P
    consists of the x, y, and z coordinates of the normalized vector from the
    eye position P_e to the eye-space light position P_pli (section 2.13.1).
    h_inf is defined to correspond to the normalized half-angle vector when
    using an infinite light (w coordinate of the position is zero) and an
    infinite viewer (v_bs is FALSE).  For local lights or a local viewer,
    h_inf is well-defined but does not match the normalized half-angle vector,
    which will vary depending on the vertex position.
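
    The computation of h_inf can be sketched in Python as follows.  This is
    an illustration only: the function names are hypothetical, the eye
    position is assumed to be the eye-space origin, and the degenerate case
    where P equals (0, 0, -1) is not handled.

```python
import math

def normalize(v):
    """Return v scaled to unit length (raises ZeroDivisionError for zero v)."""
    length = math.sqrt(sum(c * c for c in v))
    return tuple(c / length for c in v)

def light_half_binding(light_position_eye):
    """Return (x, y, z, w) for state.light[n].half.

    light_position_eye is the eye-space light position (x, y, z, w).
    """
    x, y, z, w = light_position_eye
    if w != 0.0:
        # Local light: project the position, then take the direction from
        # the eye (assumed to sit at the origin) to the light.
        x, y, z = x / w, y / w, z / w
    p = normalize((x, y, z))
    # h_inf is the normalized sum of P and the infinite-viewer vector.
    hx, hy, hz = normalize((p[0], p[1], p[2] + 1.0))
    return (hx, hy, hz, 1.0)
```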

    If a program parameter binding matches "state.lightmodel.ambient", the
    "x", "y", "z", and "w" components of the program parameter variable are
    filled with the "r", "g", "b", and "a" components of the light model
    ambient color, respectively.

    If a program parameter binding matches "state.lightmodel.scenecolor" or
    "state.lightmodel.front.scenecolor", the "x", "y", and "z" components of
    the program parameter variable are filled with the "r", "g", and "b"
    components, respectively, of the "front scene color"

      c_scene = a_cs * a_cm + e_cm,

    where a_cs is the light model ambient color, a_cm is the front ambient
    material color, and e_cm is the front emissive material color.  The "w"
    component of the program parameter variable is filled with the alpha
    component of the front diffuse material color.  If a program parameter
    binding matches "state.lightmodel.back.scenecolor", a similar back scene
    color, computed using back-facing material properties, is used.  The front
    and back scene colors match the values that would be assigned to vertices
    using conventional lighting if all lights were disabled.
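
    A minimal Python sketch of the scene color computation follows.  The
    function name and the representation of colors as (r, g, b, a) tuples
    are assumptions made for illustration.

```python
def scene_color_binding(light_model_ambient, material_ambient,
                        material_emission, material_diffuse):
    """Return (x, y, z, w) for a scene color binding.

    Pass front material colors for the front scene color, or back
    material colors for the back scene color.
    """
    # c_scene = a_cs * a_cm + e_cm, evaluated per RGB component.
    rgb = tuple(a_cs * a_cm + e_cm
                for a_cs, a_cm, e_cm in zip(light_model_ambient[:3],
                                            material_ambient[:3],
                                            material_emission[:3]))
    # w is the alpha of the diffuse material color.
    return rgb + (material_diffuse[3],)
```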

    If a program parameter binding matches anything beginning with
    "state.lightprod[n]", the "x", "y", and "z" components of the program
    parameter variable are filled with the "r", "g", and "b" components,
    respectively, of the corresponding light product.  The three light product
    components are the products of the corresponding color components of the
    specified material property and the light color of the specified light
    (see Table X.4).  The "w" component of the program parameter variable is
    filled with the alpha component of the specified material property.
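
    The light product rule can be sketched in Python as shown below.  The
    function name is hypothetical; colors are assumed to be (r, g, b, a)
    tuples.

```python
def light_product_binding(material_color, light_color):
    """Return (x, y, z, w) for a state.lightprod[n] binding."""
    # RGB: component-wise product of the material and light colors.
    r, g, b = (m * l for m, l in zip(material_color[:3], light_color[:3]))
    # w: alpha of the material property; the light's alpha is not used.
    return (r, g, b, material_color[3])
```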

    Light products depend on material properties, which can be changed inside
    a Begin/End pair.  Such property changes are not guaranteed to take effect
    until the following End command.  Program parameter variables bound to
    light products whose corresponding material property changes inside a
    Begin/End pair are undefined until the following End command.


    Texture Coordinate Generation Property Bindings

      Binding                    Components  Underlying State
      -------------------------  ----------  ----------------------------
      state.texgen[n].eye.s      (a,b,c,d)   TexGen eye linear plane
                                             coefficients, s coord, unit n
      state.texgen[n].eye.t      (a,b,c,d)   TexGen eye linear plane
                                             coefficients, t coord, unit n
      state.texgen[n].eye.r      (a,b,c,d)   TexGen eye linear plane
                                             coefficients, r coord, unit n
      state.texgen[n].eye.q      (a,b,c,d)   TexGen eye linear plane
                                             coefficients, q coord, unit n
      state.texgen[n].object.s   (a,b,c,d)   TexGen object linear plane
                                             coefficients, s coord, unit n
      state.texgen[n].object.t   (a,b,c,d)   TexGen object linear plane
                                             coefficients, t coord, unit n
      state.texgen[n].object.r   (a,b,c,d)   TexGen object linear plane
                                             coefficients, r coord, unit n
      state.texgen[n].object.q   (a,b,c,d)   TexGen object linear plane
                                             coefficients, q coord, unit n

      Table X.5:  Texture Coordinate Generation Property Bindings.  "[n]" is
      optional -- texture unit <n> is used if specified; texture unit 0 is
      used otherwise.

    If a program parameter binding matches a set of TexGen plane coefficients,
    the "x", "y", "z", and "w" components of the program parameter variable
    are filled with the coefficients p1, p2, p3, and p4, respectively, for
    object linear coefficients, and the coefficients p1', p2', p3', and p4',
    respectively, for eye linear coefficients (section 2.10.4).


    Fog Property Bindings

      Binding                        Components  Underlying State
      -----------------------------  ----------  ----------------------------
      state.fog.color                (r,g,b,a)   RGB fog color (section 3.10)
      state.fog.params               (d,s,e,r)   fog density, linear start
                                                 and end, and 1/(end-start)
                                                 (section 3.10)

      Table X.6:  Fog Property Bindings

    If a program parameter binding matches "state.fog.color", the "x", "y",
    "z", and "w" components of the program parameter variable are filled with
    the "r", "g", "b", and "a" components, respectively, of the fog color
    (section 3.10).

    If a program parameter binding matches "state.fog.params", the "x", "y",
    and "z" components of the program parameter variable are filled with the
    fog density, linear fog start, and linear fog end parameters (section
    3.10), respectively.  The "w" component is filled with 1/(end-start),
    where end and start are the linear fog end and start parameters,
    respectively.
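
    The following Python sketch shows how the "state.fog.params" vector could
    be assembled and why the precomputed "w" component is convenient for
    linear fog.  The function names are hypothetical, and the case where the
    fog start and end are equal is not handled.

```python
def fog_params_binding(density, start, end):
    """Return (x, y, z, w) for state.fog.params."""
    # w caches 1/(end-start) so a program can avoid a per-vertex divide.
    return (density, start, end, 1.0 / (end - start))

def linear_fog_factor(params, c):
    """Evaluate the linear fog factor (end - c) / (end - start)."""
    _, _, end, inv_range = params
    return (end - c) * inv_range
```

    With this layout, a program can evaluate linear fog using one subtract
    and one multiply.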


    Clip Plane Property Bindings

      Binding                        Components  Underlying State
      -----------------------------  ----------  ----------------------------
      state.clip[n].plane            (a,b,c,d)   clip plane n coefficients

      Table X.7:  Clip Plane Property Bindings.  <n> specifies the clip plane
      number, and is required.

    If a program parameter binding matches "state.clip[n].plane", the "x",
    "y", "z", and "w" components of the program parameter variable are filled
    with the coefficients p1', p2', p3', and p4', respectively, of clip plane
    <n> (section 2.11).


    Point Property Bindings

      Binding                        Components  Underlying State
      -----------------------------  ----------  ----------------------------
      state.point.size               (s,n,x,f)   point size, min and max size
                                                 clamps, and fade threshold
                                                 (section 3.3)
      state.point.attenuation        (a,b,c,1)   point size attenuation consts

      Table X.8:  Point Property Bindings

    If a program parameter binding matches "state.point.size", the "x", "y",
    "z", and "w" components of the program parameter variable are filled with
    the point size, minimum point size, maximum point size, and fade
    threshold, respectively (section 3.3).

    If a program parameter binding matches "state.point.attenuation", the "x",
    "y", and "z" components of the program parameter variable are filled with
    the constant, linear, and quadratic point size attenuation parameters (a,
    b, and c), respectively (section 3.3).  The "w" component is filled with
    1.0.


    Texture Environment Property Bindings

      Binding                    Components  Underlying State
      -------------------------  ----------  ----------------------------
      state.texenv[n].color      (r,g,b,a)   texture environment n color

      Table X.9:  Texture Environment Property Bindings.  "[n]" is optional --
      texture unit <n> is used if specified; texture unit 0 is used otherwise.

    If a program parameter binding matches "state.texenv[n].color", the "x",
    "y", "z", and "w" components of the program parameter variable are filled
    with the "r", "g", "b", and "a" components, respectively, of the
    corresponding texture environment color.  Note that only "legacy" texture
    units, as queried by MAX_TEXTURE_UNITS, include texture environment state.
    Texture image units and texture coordinate sets do not have associated
    texture environment state.


    Depth Property Bindings

      Binding                      Components  Underlying State
      ---------------------------  ----------  ----------------------------
      state.depth.range            (n,f,d,1)   Depth range near, far, and
                                               (far-near) (section 2.10.1)

      Table X.10:  Depth Property Bindings

    If a program parameter binding matches "state.depth.range", the "x" and
    "y" components of the program parameter variable are filled with the
    mappings of near and far clipping planes to window coordinates,
    respectively.  The "z" component is filled with the difference of the
    mappings of near and far clipping planes, far minus near.  The "w"
    component is filled with 1.0.


    Matrix Property Bindings

      Binding                               Underlying State
      ------------------------------------  ---------------------------
      * state.matrix.modelview[n]           modelview matrix n
        state.matrix.projection             projection matrix
        state.matrix.mvp                    modelview-projection matrix
      * state.matrix.texture[n]             texture matrix n
        state.matrix.program[n]             program matrix n

      Table X.11:  Base Matrix Property Bindings.  The "[n]" syntax indicates
      a specific matrix number.  For modelview and texture matrices, a matrix
      number is optional, and matrix zero will be used if the matrix number is
      omitted.  These base bindings may further be modified by an
      inverse/transpose selector and a row selector.

    If the beginning of a program parameter binding matches any of the matrix
    binding names listed in Table X.11, the binding corresponds to a 4x4
    matrix.  If the parameter binding is followed by ".inverse", ".transpose",
    or ".invtrans" (<stateMatModifier> grammar rule), the inverse, transpose,
    or transpose of the inverse, respectively, of the matrix specified in
    Table X.11 is selected.  Otherwise, the matrix specified in Table X.11 is
    selected.  If the specified matrix is poorly-conditioned (singular or
    nearly so), its inverse matrix is undefined.  The binding name
    "state.matrix.mvp" refers to the product of modelview matrix zero and the
    projection matrix, defined as

       MVP = P * M0,

    where P is the projection matrix and M0 is modelview matrix zero.

    If the selected matrix is followed by ".row[<a>]" (matching the
    <stateMatrixRow> grammar rule), the "x", "y", "z", and "w" components of
    the program parameter variable are filled with the four entries of row <a>
    of the selected matrix.  In the example,

      PARAM m0 = state.matrix.modelview[1].row[0];
      PARAM m1 = state.matrix.projection.transpose.row[3];

    the variable "m0" is set to the first row (row 0) of modelview matrix 1
    and "m1" is set to the last row (row 3) of the transpose of the projection
    matrix.
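
    As a rough illustration of the matrix selection rules, the Python sketch
    below forms MVP = P * M0 and extracts a single row, optionally from the
    transpose.  Matrices are assumed to be lists of four rows, the function
    names are hypothetical, and the ".inverse" and ".invtrans" selectors are
    omitted for brevity.

```python
def matmul4(a, b):
    """Return the 4x4 matrix product a * b."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def transpose4(m):
    """Return the transpose of a 4x4 matrix."""
    return [list(column) for column in zip(*m)]

def matrix_row_binding(matrix, row, transpose=False):
    """Return (x, y, z, w) for '<matrix>[.transpose].row[<row>]'."""
    selected = transpose4(matrix) if transpose else matrix
    return tuple(selected[row])
```

    For example, matrix_row_binding(matmul4(P, M0), 0) corresponds to
    "state.matrix.mvp.row[0]" for a projection matrix P and modelview matrix
    M0.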

    For program parameter array bindings, multiple rows of the selected matrix
    can be bound via the <stateMatrixRows> grammar rule.  If the selected
    matrix binding is followed by ".row[<a>..<b>]", the result is equivalent
    to specifying matrix rows <a> through <b>, in order.  A program will fail
    to load if <a> is greater than <b>.  If no row selection is specified
    (<optMatrixRows> matches ""), matrix rows 0 through 3 are bound in order.
    In the example,

      PARAM m2[] = { state.matrix.program[0].row[1..2] };
      PARAM m3[] = { state.matrix.program[0].transpose };

    the array "m2" has two entries, containing rows 1 and 2 of program matrix
    zero, and "m3" has four entries, containing all four rows of the transpose
    of program matrix zero.
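
    The row range rules for array bindings can be sketched in Python as
    follows.  The function name is hypothetical, the matrix is assumed to be
    a list of four rows, and a ValueError stands in for a program load
    failure.

```python
def matrix_rows_binding(matrix, first=0, last=3):
    """Return the array entries bound by '<matrix>.row[<first>..<last>]'.

    With the defaults, all four rows are bound in order, matching a
    binding that has no row selection.
    """
    if first > last:
        # The program fails to load if <a> is greater than <b>.
        raise ValueError("program fails to load: first row exceeds last row")
    return [tuple(matrix[r]) for r in range(first, last + 1)]
```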


    Section 2.X.3.4, Program Temporaries

    Program temporary variables are used to hold temporary results during
    program execution.  Temporaries do not persist between program
    invocations, and are undefined at the beginning of each program
    invocation.

    Temporary variables are declared explicitly using the <TEMP_statement>
    grammar rule.  Each such statement can declare one or more temporaries.
    Temporaries can not be declared implicitly.  Temporaries can be declared
    using any component size ("SHORT" or "LONG") and type ("FLOAT" or "INT")
    modifier.

    Temporary variables may be declared as arrays.  Temporary variables
    declared as arrays may be stored in slower memory than those not declared
    as arrays, and it is recommended to use non-array variables unless array
    functionality is required.


    Section 2.X.3.5, Program Results

    Program result variables represent the per-vertex or per-fragment results
    of the program.  All result variables have associated bindings, are
    write-only during program execution, and are undefined at the beginning of
    each program invocation.  Any vertex or fragment attributes corresponding
    to unwritten result variables will be undefined in subsequent stages of
    the pipeline.  Result variables may be declared explicitly via the
    <OUTPUT_statement> grammar rule, or implicitly by using a result binding
    in an instruction.

    The set of available result bindings depends on the program type, and is
    enumerated in the specifications for each program type.

    Result variables may generally be declared as arrays, but the set of
    bindings allowed for arrays is limited to state grouped in arrays (e.g.,
    texture coordinates, clip distances, colors).  Additionally, all bindings
    assigned to the array must be of the same binding type and must increase
    consecutively.  Examples of valid and invalid binding lists for vertex
    programs include:

      result.clip[1], result.clip[2]          # valid, 2-entry array
      result.texcoord[0..3]                   # valid, 4-entry array
      result.texcoord[1], result.texcoord[3]  # invalid, skipped texcoord 2
      result.texcoord[2], result.texcoord[1]  # invalid, wrong order
      result.texcoord[1], result.clip[2]      # invalid, different types
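
    The two array rules above (same binding type, consecutively increasing
    indices) can be checked with a short Python sketch.  The function names
    are hypothetical, and the parser only understands bindings of the form
    "result.<name>[i]" or "result.<name>[i..j]"; it is not a full
    implementation of the grammar.

```python
import re

_BINDING = re.compile(r"^(result\.\w+)\[(\d+)(?:\.\.(\d+))?\]$")

def expand_binding(binding):
    """Expand 'result.x[i]' or 'result.x[i..j]' into (name, index) pairs."""
    match = _BINDING.match(binding)
    if match is None:
        raise ValueError("unsupported binding syntax: " + binding)
    name, low = match.group(1), int(match.group(2))
    high = low if match.group(3) is None else int(match.group(3))
    return [(name, i) for i in range(low, high + 1)]

def is_valid_result_array(bindings):
    """Return True if the binding list may initialize a result array."""
    entries = [pair for b in bindings for pair in expand_binding(b)]
    if not entries:
        return False
    first_name, first_index = entries[0]
    # Every entry must share one binding type and follow its predecessor.
    return all(name == first_name and index == first_index + offset
               for offset, (name, index) in enumerate(entries))
```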

    Additionally, result bindings may be used in no more than one array
    addressed with relative addressing.

    Implementations may have a limit on the total number of result binding
    components used by each program target (MAX_PROGRAM_RESULT_COMPONENTS_NV).
    Programs that require more result binding components than this limit will
    fail to load.  The method of counting used result binding components is
    implementation-dependent, but must satisfy the following properties:

      * If a result binding is not referenced in a program, or is referenced
        only in declarations of result variables that are not used, none of
        its components are counted.

      * A result binding component may be counted as used only if there exists
        an instruction operand where

          - the component is enabled in the write mask (Section 2.X.4.3), and

          - the result binding is either

              - referenced directly by the operand,

              - bound to a declared variable referenced by the operand, or

              - bound to a declared array variable where another binding in
                the array satisfies one of the two previous conditions.

        Implementations are not required to optimize out unused elements of a
        result array or components that are used in only some elements of an
        array.  The last of these rules is intended to cover the case where
        the same result binding is used in multiple variables.

        For example, an instruction whose write mask selects only the x
        component may result in the x component of a result binding being
        counted, but may never result in the counting of the y, z, or w
        components of any result binding.


    Section 2.X.3.6, Program Parameter Buffers

    Program parameter buffers are arrays consisting of single-component
    typeless values or four-component typeless vectors stored in a buffer
    object.  The GL provides an implementation-dependent number of buffer
    object binding points for each program target, to which buffer objects can
    be attached.  Program parameter buffer variables can be changed either by
    updating the contents of bound buffer objects, or simply by changing the
    buffer object attached to a binding point.

    Program parameter buffer variables are used as constants during program
    execution.  All program parameter buffer variables have an associated
    binding and are read-only during program execution.  Program parameter
    buffers retain their values across program invocations, although their
    values may change as buffer object bindings or contents change.  Program
    parameter buffer variables must be declared explicitly via the
    <BUFFER_statement> grammar rule.  Program parameter buffer bindings can
    not be used directly in executable instructions.

    Program parameter buffer variables are treated as an array of
    single-component values if the <bufferDeclType> grammar rule matches
    "BUFFER" or as an array of four-component vectors if it matches "BUFFER4".
    A program will fail to load if a variable declared as "BUFFER" and another
    variable declared as "BUFFER4" use the same buffer binding point.

    Program parameter buffer variables may be declared as arrays, but all
    bindings assigned to the array must use the same binding point and must
    increase consecutively.

      Binding                        Components  Underlying State
      -----------------------------  ----------  -----------------------------
      program.buffer[a][b]           (x,x,x,x)   program parameter buffer a,
                                                   element b
      program.buffer[a][b..c]        (x,x,x,x)   program parameter buffer a,
                                                   elements b through c
      program.buffer[a]              (x,x,x,x)   program parameter buffer a,
                                                   all elements

      Table X.12: Program Parameter Buffer Bindings.  <a> indicates a buffer
      number, <b> and <c> indicate individual elements.

    If a program parameter buffer binding matches "program.buffer[a][b]", the
    program parameter variable is filled with element <b> of the buffer
    object bound to binding point <a>.  Each element of the bound buffer
    object is treated as one or four words of data that can hold integer or
    floating-point values.  When a single-component binding is evaluated, the
    selected word is broadcast to all four components of the variable.  When a
    four-component binding is evaluated, the four components of the buffer
    element are loaded into the variable.  If no buffer object is bound to
    binding point <a>, or the bound buffer object is not large enough to hold
    an element <b>, the values used are undefined.  The binding point <a> must
    be a nonnegative integer constant.
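
    The difference between single-component and four-component bindings can
    be sketched in Python as shown below.  The function name is
    hypothetical, the buffer is modeled as a flat list of words, and the
    sketch assumes that element <b> of a four-component buffer occupies
    words 4*<b> through 4*<b>+3.  None is used as a stand-in for the
    undefined values described above.

```python
def fetch_buffer_element(words, b, four_component):
    """Return the (x, y, z, w) value loaded for element b.

    words:          flat list of typeless words in the bound buffer
    four_component: True for BUFFER4 declarations, False for BUFFER
    """
    if four_component:
        base = 4 * b
        if base + 4 > len(words):
            return None  # buffer too small: values are undefined
        return tuple(words[base:base + 4])
    if b >= len(words):
        return None  # buffer too small: values are undefined
    # A single word is broadcast to all four components.
    return (words[b],) * 4
```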

    For program parameter buffer array declarations, "program.buffer[a][b..c]"
    is equivalent to specifying elements <b> through <c> of the buffer object
    bound to binding point <a> in order.

    For program parameter buffer array declarations, "program.buffer[a]" is
    equivalent to specifying the entire buffer -- elements 0 through <n>-1,
    where <n> is either the size of the array (if declared) or the
    implementation-dependent maximum parameter buffer object size limit (if no
    size is declared).


    Section 2.X.3.7, Program Condition Code Registers

    The program condition code registers are four-component vectors.  Each
    component of these registers is a collection of single-bit flags, including
    a sign flag (SF), a zero flag (ZF), an overflow flag (OF), and a carry
    flag (CF).  There are two condition code registers (CC0 and CC1), whose
    values are undefined at the beginning of program execution.

    Most program instructions can optionally update one of the condition code
    registers, by designating the condition code to update in the instruction.
    When a condition code component is updated, the four flags of each
    component of the condition code are set according to the corresponding
    component of the instruction result.  Full details on the condition code
    updates and tests can be found in Section 2.X.4.3.

    The values of these four flags can be combined in various condition code
    tests, which can be used to mask writes to destination variables and to
    perform conditional branches or other condition operations.


    Section 2.X.3.8, Program Aliases

    Programs can create aliases by matching the <ALIAS_statement> grammar
    rule.  Aliases allow programs to use multiple variable names to refer to a
    single underlying variable.  For example, the statement

      ALIAS var1 = var0;

    establishes a variable name of "var1".  Subsequent references to "var1" in
    the program text are treated as references to "var0".  The left hand side
    of an ALIAS statement must be a new variable name, and the right hand side
    must be an established variable name.

    Aliases are not considered variable declarations, so they do not count against
    the limits on the number of variable declarations allowed in the program
    text.


    Section 2.X.3.9, Program Resource Limits

    (see ARB_vertex_program specification, incorporates all the different
    limits on instruction counts, temporaries, attribute bindings, program
    parameters, and so on)


    Section 2.X.4, Program Execution Environment

    The set of instructions supported for GPU programs is given in Table X.13
    below and is described in detail in Section 2.X.8.  An instruction can use
    up to three operands when it executes, and most instructions can write a
    single result vector.  Instructions may also specify one or more
    modifiers, according to the <opModifiers> grammar rule.  Instruction
    modifiers affect how the specified operation is performed.

    GPU programs may operate on signed integer, unsigned integer, or
    floating-point values; some instructions are capable of operating on any
    of the three types.  However, the data types of the operands and the result
    are always determined based solely on the instruction and its modifiers.
    If any of the variables used in the instruction are typeless, they will be
    interpreted according to the data type derived from the instruction.  If
    any variables with a conflicting data type are used in the instruction,
    the program will fail to load unless the "NTC" (no type checking)
    instruction modifier is specified.

                  Modifiers
      Instruction F I C S H D  Out Inputs    Description
      ----------- - - - - - -  --- --------  --------------------------------
      ABS         X X X X X F  v   v         absolute value
      ADD         X X X X X F  v   v,v       add
      AND         - X X - - S  v   v,v       bitwise and
      BRK         - - - - - -  -   c         break out of loop instruction
      CAL         - - - - - -  -   c         subroutine call
      CEIL        X X X X X F  v   vf        ceiling
      CMP         X X X X X F  v   v,v,v     compare
      CONT        - - - - - -  -   c         continue with next loop iteration
      COS         X - X X X F  s   s         cosine with reduction to [-PI,PI]
      DIV         X X X X X F  v   v,s       divide vector components by scalar
      DP2         X - X X X F  s   v,v       2-component dot product
      DP2A        X - X X X F  s   v,v,v     2-comp. dot product w/scalar add
      DP3         X - X X X F  s   v,v       3-component dot product
      DP4         X - X X X F  s   v,v       4-component dot product
      DPH         X - X X X F  s   v,v       homogeneous dot product
      DST         X - X X X F  v   v,v       distance vector
      ELSE        - - - - - -  -   -         start if test else block
      ENDIF       - - - - - -  -   -         end if test block
      ENDREP      - - - - - -  -   -         end of repeat block
      EX2         X - X X X F  s   s         exponential base 2
      FLR         X X X X X F  v   vf        floor
      FRC         X - X X X F  v   v         fraction
      I2F         - X X - - S  vf  v         integer to float
      IF          - - - - - -  -   c         start of if test block
      KIL         X X - - X F  -   vc        kill fragment
      LG2         X - X X X F  s   s         logarithm base 2
      LIT         X - X X X F  v   v         compute lighting coefficients
      LRP         X - X X X F  v   v,v,v     linear interpolation
      MAD         X X X X X F  v   v,v,v     multiply and add
      MAX         X X X X X F  v   v,v       maximum
      MIN         X X X X X F  v   v,v       minimum
      MOD         - X X - - S  v   v,s       modulus vector components by scalar
      MOV         X X X X X F  v   v         move
      MUL         X X X X X F  v   v,v       multiply
      NOT         - X X - - S  v   v         bitwise not
      NRM         X - X X X F  v   v         normalize 3-component vector
      OR          - X X - - S  v   v,v       bitwise or
      PK2H        X X - - - F  s   vf        pack two 16-bit floats
      PK2US       X X - - - F  s   vf        pack two floats as unsigned 16-bit
      PK4B        X X - - - F  s   vf        pack four floats as signed 8-bit
      PK4UB       X X - - - F  s   vf        pack four floats as unsigned 8-bit
      POW         X - X X X F  s   s,s       exponentiate
      RCC         X - X X X F  s   s         reciprocal (clamped)
      RCP         X - X X X F  s   s         reciprocal
      REP         X X - - X F  -   v         start of repeat block
      RET         - - - - - -  -   c         subroutine return
      RFL         X - X X X F  v   v,v       reflection vector
      ROUND       X X X X X F  v   vf        round to nearest integer
      RSQ         X - X X X F  s   s         reciprocal square root
      SAD         - X X - - S  vu  v,v,vu    sum of absolute differences
      SCS         X - X X X F  v   s         sine/cosine without reduction
      SEQ         X X X X X F  v   v,v       set on equal
      SFL         X X X X X F  v   v,v       set on false
      SGE         X X X X X F  v   v,v       set on greater than or equal
      SGT         X X X X X F  v   v,v       set on greater than
      SHL         - X X - - S  v   v,s       shift left
      SHR         - X X - - S  v   v,s       shift right
      SIN         X - X X X F  s   s         sine with reduction to [-PI,PI]
      SLE         X X X X X F  v   v,v       set on less than or equal
      SLT         X X X X X F  v   v,v       set on less than
      SNE         X X X X X F  v   v,v       set on not equal
      SSG         X - X X X F  v   v         set sign
      STR         X X X X X F  v   v,v       set on true
      SUB         X X X X X F  v   v,v       subtract
      SWZ         X - X X X F  v   v         extended swizzle
      TEX         X X X X - F  v   vf        texture sample
      TRUNC       X X X X X F  v   vf        truncate (round toward zero)
      TXB         X X X X - F  v   vf        texture sample with bias
      TXD         X X X X - F  v   vf,vf,vf  texture sample w/partials
      TXF         X X X X - F  v   vs        texel fetch
      TXL         X X X X - F  v   vf        texture sample w/LOD
      TXP         X X X X - F  v   vf        texture sample w/projection
      TXQ         - - - - - S  vs  vs        texture info query
      UP2H        X X X X - F  vf  s         unpack two 16-bit floats
      UP2US       X X X X - F  vf  s         unpack two unsigned 16-bit ints
      UP4B        X X X X - F  vf  s         unpack four signed 8-bit ints
      UP4UB       X X X X - F  vf  s         unpack four unsigned 8-bit ints
      X2D         X - X X X F  v   v,v,v     2D coordinate transformation
      XOR         - X X - - S  v   v,v       exclusive or
      XPD         X - X X X F  v   v,v       cross product

      Table X.13:  Summary of NV_gpu_program4 instructions.  The "Modifiers"
      columns specify the set of modifiers allowed for the instruction:

        F = floating-point data type modifiers
        I = signed and unsigned integer data type modifiers
        C = condition code update modifiers
        S = clamping (saturation) modifiers
        H = half-precision float data type suffix
        D = default data type modifier (F, U, or S)

      The input and output columns describe the formats of the operands and
      results of the instruction.

        v:  4-component vector (data type is inherited from operation)
        vf: 4-component vector (data type is always floating-point)
        vs: 4-component vector (data type is always signed integer)
        vu: 4-component vector (data type is always unsigned integer)
        s:  scalar (replicated if written to a vector destination;
                    data type is inherited from operation)
        c:  condition code test result (e.g., "EQ", "GT1.x")
        vc: 4-component vector or condition code test


    Section 2.X.4.1, Program Instruction Modifiers

    There are several types of instruction modifiers available.  A data type
    modifier specifies that an instruction should operate on signed integer,
    unsigned integer, or floating-point data, when multiple data types are
    supported.  A clamping modifier applies to instructions with
    floating-point results, and specifies the range to which the results
    should be clamped.  A condition code update modifier specifies that the
    instruction should update one of the condition code variables.  Several
    other special modifiers are also provided.

    Instruction modifiers may be specified as stand-alone modifiers or as
    suffixes concatenated with the opcode name.  A program will fail to load
    if it contains an instruction that

      * specifies more than one modifier of any given type,

      * specifies a clamping modifier on an instruction that does not produce
        floating-point results, or

      * specifies a modifier that is not supported by the instruction (see
        Table X.13 and the instruction description).

    Stand-alone instruction modifiers are specified according to the
    <opModifiers> grammar rule using a ".<modifier>" syntax.  Multiple
    modifiers, separated by periods, may be specified.  The set of supported
    modifiers is described in Table X.14.

      Modifier  Description
      --------  -----------------------------------------------
      F         Floating-point operation
      U         Fixed-point operation, unsigned operands
      S         Fixed-point operation, signed operands
      CC        Update condition code register zero
      CC0       Update condition code register zero
      CC1       Update condition code register one
      SAT       Floating-point results clamped to [0,1]
      SSAT      Floating-point results clamped to [-1,1]
      NTC       Disable type-checking on operands/results
      S24       Signed multiply (24-bit operands)
      U24       Unsigned multiply (24-bit operands)
      HI        Multiplies two 32-bit integer operands, returns
                  the 32 MSBs of the product

      Table X.14, Instruction Modifiers.

    "F", "U", and "S" modifiers are data type modifiers and specify that the
    instruction should operate on floating-point, unsigned integer, or
    signed integer values, respectively.  For example, "ADD.F", "ADD.U", and
    "ADD.S" specify component-wise addition of floating-point, unsigned
    integer, or signed integer vectors, respectively.  These modifiers specify
    a data type, but do not specify a precision at which the operation is
    performed.  Floating-point operations will be carried out with an internal
    precision no less than that used to represent the largest operand.
    Fixed-point operations will be carried out using at least as many bits as
    used to represent the largest operand.  Operands represented with fewer
    bits than used to perform the instruction will be promoted to a larger
    data type.  Signed integer operands will be sign-extended, where the most
    significant bits are filled with ones if the operand is negative and zero
    otherwise.  Unsigned integer operands will be zero-extended, where the
    most significant bits are always filled with zeroes.  For some
    instructions, the data type of some operands or of the result is fixed; in
    these cases, the data type modifier specifies the data type of the
    remaining values.
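
    The promotion rules above can be illustrated with a minimal Python
    sketch.  The function name extend() and its parameters are hypothetical
    and are used only for illustration; integers are modeled as raw bit
    patterns.

```python
def extend(value, from_bits, to_bits, signed):
    """Promote a from_bits-wide integer bit pattern to to_bits bits."""
    assert 0 < from_bits <= to_bits
    low_mask = (1 << from_bits) - 1
    value &= low_mask                       # keep only the source bits
    if signed and value >> (from_bits - 1):
        # Negative signed operand: fill the new most significant bits with ones.
        value |= ((1 << to_bits) - 1) & ~low_mask
    return value                            # otherwise the new bits stay zero


# The 16-bit pattern 0x8001 promoted to 32 bits.
print(hex(extend(0x8001, 16, 32, signed=True)))   # 0xffff8001
print(hex(extend(0x8001, 16, 32, signed=False)))  # 0x8001
```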

    "CC", "CC0", and "CC1" are condition code update modifiers that specify
    that one of the condition code registers should be updated based on the
    result of the instruction, as described in section 2.X.4.3.  "CC" and
    "CC0" specify that the condition code register CC0 be updated; "CC1"
    specifies an update to CC1.  If no condition code update modifier is
    provided, the condition code registers will not be affected.

    "SAT" and "SSAT" are clamping modifiers that specify that the
    floating-point components of the instruction result should be clamped to
    [0,1] or [-1,1], respectively, before updating the condition code and the
    destination variable.  If no clamping suffix is specified, unclamped
    results will be used for condition code updates (if any) and destination
    variable writes.  Clamping modifiers are not supported on instructions
    that do not produce floating-point results.

    "NTC" (no type checking) disables data type checking on the instruction,
    and allows instructions to use operands or result variables whose data
    types are inconsistent with the expected data types of the instruction.

    "S24", "U24", and "HI" are special modifiers that are allowed only for the
    MUL instruction, and are described in detail where MUL is documented.  No
    more than one such modifier may be provided for any instruction.

    If an instruction supports data type modifiers, but none is provided, a
    default data type will be chosen based on the instruction, as specified in
    Table X.13 and the instruction set description (Section 2.X.8).  If
    condition code update or clamping modifiers are not specified, the
    corresponding operation will not be performed.

    Additionally, each instruction name may have one or more suffixes,
    concatenated onto the base instruction name, that operate as instruction
    modifiers.  For conciseness, these suffixes are not spelled out in the
    grammar -- the base opcode name is used as a placeholder for the opcode
    and all of its possible suffixes.  Instruction suffixes are provided
    mainly for compatibility with prior GPU program instruction sets (e.g.,
    NV_vertex_program3, NV_fragment_program2, and predecessors).  The
    allowable suffixes, and their equivalent stand-alone modifiers, are listed
    in Table X.15.

      Suffix  Modifier     Description
      ------  ----------   ---------------------------------------------------
      R       F            Floating-point operation, 32-bit precision
      H       F(*)         Floating-point operation, at least 16-bit precision
      C       CC0          Update condition code register zero
      C0      CC0          Update condition code register zero
      C1      CC1          Update condition code register one
      _SAT    SAT          Floating-point results clamped to [0,1]
      _SSAT   SSAT         Floating-point results clamped to [-1,1]

      Table X.15, Instruction Suffixes.

    The "R" and "H" suffixes specify floating-point operations and are
    equivalent to the "F" data type modifier.  They additionally specify a
    minimum precision for the operations.  Instructions with an "R" suffix
    will be carried out at no less than IEEE single-precision floating-point
    (8 bits of exponent, 23 bits of mantissa).  Instructions with an "H"
    suffix will be carried out at no less than 16-bit floating-point
    precision (5 bits of exponent, 10 bits of mantissa).

    An instruction may have multiple suffixes, but they must appear in order,
    with data type suffixes first, followed by condition code update suffixes,
    followed by clamping suffixes.  For example, "ADDR" carries out an add at
    32-bit precision.  "ADDH_SAT" carries out an add at 16-bit precision (or
    better) and clamps the results to [0,1].  "ADDRC1_SSAT" carries out an add
    at 32-bit floating-point precision, clamps the results to [-1,1], and
    updates condition code one based on the clamped result.
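
    The suffix ordering rule and the mapping in Table X.15 can be sketched
    in Python as follows.  This is only an illustration: the function
    suffix_modifiers() is hypothetical, it assumes the caller already knows
    the base opcode name, and it drops the minimum-precision information
    carried by "R" and "H".

```python
import re

# Suffix order from the text: data type, then condition code, then clamping.
_SUFFIXES = re.compile(r"(R|H)?(C0|C1|C)?(_SAT|_SSAT)?")

# Suffix -> equivalent stand-alone modifier (Table X.15).
_TO_MODIFIER = {"R": "F", "H": "F", "C": "CC0", "C0": "CC0", "C1": "CC1",
                "_SAT": "SAT", "_SSAT": "SSAT"}


def suffix_modifiers(opcode, base):
    """Return the modifiers implied by the suffixes on `opcode`, or None if
    the suffixes are unknown or out of order."""
    if not opcode.startswith(base):
        return None
    match = _SUFFIXES.fullmatch(opcode[len(base):])
    if match is None:
        return None
    return [_TO_MODIFIER[s] for s in match.groups() if s]


print(suffix_modifiers("ADDRC1_SSAT", "ADD"))  # ['F', 'CC1', 'SSAT']
print(suffix_modifiers("ADDH_SAT", "ADD"))     # ['F', 'SAT']
print(suffix_modifiers("ADD_SATR", "ADD"))     # None (suffixes out of order)
```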


    Section 2.X.4.2, Program Operands

    Most program instructions operate on one or more scalar or vector
    operands.  Each operand specifies an operand variable, which is either the
    name of a previously declared variable or an implicit variable declaration
    created by using a variable binding in the instruction.  Attribute,
    parameter, or parameter buffer variables can be declared implicitly by
    using a valid binding name in an operand.  Instruction operands are
    specified by the <instOperandV>, <instOperandS>, or <instOperandVNS>
    grammar rules.

    If the operand variable is not an array, its contents are loaded directly.
    If the operand variable is an array, a single element of the array is
    loaded according to the <arrayMem> grammar rule.  The elements of an array
    are numbered from 0 to <n>-1, where <n> is the number of entries in the
    array.  Array members can be accessed using either absolute or relative
    addressing.

    Absolute array addressing is used when the <arrayMemAbs> grammar rule is
    matched; the array member to load is specified by the matching integer.
    Out-of-bounds absolute array accesses are not allowed.  If the specified
    member number is greater than or equal to the size of the array, the
    program will fail to load.

    Relative array addressing is used when the <arrayMemRel> grammar rule is
    matched.  This grammar rule allows the program to specify a scalar integer
    operand and an optional constant offset, according to the <arrayMemReg>
    and <arrayMemOffset> grammar rules.  When performing relative addressing,
    the GL evaluates the specified integer scalar operand (according to the
    rules specified in this section) and adds the constant offset.  The array
    member loaded is given by this sum.  The constant offset is considered
    zero if an offset is omitted.  If the sum is negative or is greater than
    or equal to the size of the array, the results of the access are
    undefined, but may not lead to program or GL termination.  The set of
    constant offsets supported for relative addressing is limited to values
    in the range [0,<n>-1], where <n>
    is the size of the array.  A program will fail to load if it specifies an
    offset outside that range.  If offsets outside that range are required,
    they can be applied by using an integer ADD instruction writing to a
    temporary variable.
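
    A minimal Python sketch of relative addressing follows.  The helper
    names are hypothetical.  Because array elements are numbered from 0 to
    <n>-1, the sketch treats any computed index outside that range as an
    undefined access and reports it as None.

```python
def offset_is_loadable(offset, n):
    """Load-time check: constant offsets must lie in [0, n-1]."""
    return 0 <= offset <= n - 1


def relative_index(reg_value, offset, n):
    """Run-time member number for array[reg + offset].

    Returns None when the access falls outside elements 0..n-1, which the
    specification leaves undefined (but non-terminating)."""
    index = reg_value + offset
    return index if 0 <= index < n else None


print(offset_is_loadable(4, 4))   # False: program fails to load
print(relative_index(2, 1, 4))    # 3
print(relative_index(3, 1, 4))    # None: undefined access
```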

    After the operand is loaded, its components can be rearranged according to
    the <swizzleSuffix> grammar rule, or it can be converted to a scalar
    operand according to the <scalarSuffix> grammar rule.

    The <swizzleSuffix> grammar rule rearranges the components of a loaded
    vector to produce another vector.  If the <swizzleSuffix> rule matches the
    <xyzwSwizzle> or <rgbaSwizzle> grammar rule, a pattern of the form ".????"
    is used, where each question mark is replaced with one of "x", "y", "z",
    "w", "r", "g", "b", or "a".  For such patterns, the x, y, z, and w
    components of the operand are taken from the vector components named by
    the first, second, third, and fourth character of the pattern,
    respectively.  Swizzle components of "r", "g", "b", and "a" are equivalent
    to "x", "y", "z", and "w", respectively.  For example, if the swizzle
    suffix is ".yzzx" or ".gbbr" and the specified source contains {2,8,9,0},
    the result is the vector {8,9,9,2}.  If the <swizzleSuffix> matches the
    <component> grammar rule, a pattern of the form ".?" is used.  For this
    pattern, all four components of the operand are taken from the single
    component identified by the pattern.  If the swizzle suffix is omitted,
    components are not rearranged and swizzling has no effect, as though
    ".xyzw" were specified.

    The swizzle suffix rules do not allow mixing "x", "y", "z", or "w"
    selectors with "r", "g", "b", or "a" selectors.  A program will fail to
    load if it contains a swizzle suffix with selectors from both of these
    sets.
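
    The swizzle rules can be sketched in Python as shown below.  The
    function swizzle() is hypothetical; it takes the suffix without the
    leading period and raises ValueError in the cases where a program would
    fail to load.

```python
def swizzle(vec, suffix="xyzw"):
    """Rearrange a 4-component vector according to a swizzle suffix."""
    # All selectors must come from one naming set; mixing is not allowed.
    for names in ("xyzw", "rgba"):
        if all(c in names for c in suffix):
            break
    else:
        raise ValueError("invalid or mixed swizzle selectors: " + suffix)
    if len(suffix) == 1:
        suffix = suffix * 4          # ".?" replicates a single component
    if len(suffix) != 4:
        raise ValueError("swizzle must name one or four components")
    return [vec[names.index(c)] for c in suffix]


print(swizzle([2, 8, 9, 0], "yzzx"))  # [8, 9, 9, 2]
print(swizzle([2, 8, 9, 0], "gbbr"))  # [8, 9, 9, 2]
print(swizzle([2, 8, 9, 0], "y"))     # [8, 8, 8, 8]
```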

    The <scalarSuffix> grammar rule converts a vector to a scalar by selecting
    a single component.  The <scalarSuffix> rule is similar to the swizzle
    selector, except that only a single component is selected.  If the scalar
    suffix is ".y" and the specified source contains {2,8,9,0}, the value is
    the scalar value 8.

    Next, a component-wise negate operation is performed on the operand if the
    <operandNeg> grammar rule matches "-".  Negation is not performed if the
    operand has no sign prefix, or is prefixed with "+".  For unsigned integer
    operands, the negate operation performs a two's complement operation.

    Next, a component-wise absolute value operation is performed on the
    operand if the <instOperandAbsV> or <instOperandAbsS> grammar rule is
    matched, by surrounding the operand with two "|" characters.  The result
    is optionally negated if the <operandAbsNeg> grammar rule matches "-".
    For unsigned integer operands, the absolute value operation has no effect.


    Section 2.X.4.3, Program Destination Variable Update

    Most program instructions perform computations that produce a result,
    which will be written to a variable.  Each instruction that computes a
    result specifies a destination variable, which is either the name of a
    previously declared variable or an implicit variable declaration created
    by using a variable binding in the instruction.  Result variables can be
    declared implicitly by using a valid program result binding name in the
    result portion of the instruction.  Instruction results are specified
    according to the <instResult> grammar rule.

    The destination variable may be a single member of an array.  In this
    case, a single array member is specified using the <arrayMem> grammar
    rule, and the array member to update is computed in the exact same manner
    as done for operand loads.  If the array member is computed at run time,
    and is negative or greater than or equal to the size of the array, the
    results of the destination variable update are undefined and could result
    in overwriting other program variables.

    The results of the operation may be obtained at a different precision than
    that used to store the destination variable.  If so, the results are
    converted to match the size of the destination variable.  For
    floating-point values, the results are rounded to the nearest
    floating-point value that can be represented in the destination variable.
    If a result component is larger in magnitude than the largest
    representable floating-point value in the data type of the destination
    variable, an infinity encoding (+/-INF) is used.  Signed or unsigned
    integer values are sign-extended or zero-extended, respectively, if the
    destination variable has more bits than the result, and have their most
    significant bits discarded if the destination variable has fewer bits.

    Writes to individual components of a vector destination variable can be
    controlled at compile time by individual component write masks specified
    in the instruction.  The component write mask is specified by the
    <optWriteMask> grammar rule, and is a string of up to four characters,
    naming the components to enable for writing.  If no write mask is
    specified, all components are enabled for writing.  The characters "x",
    "y", "z", and "w" match the x, y, z, and w components respectively.  For
    example, a write mask of ".xzw" indicates that the x, z, and w
    components should be enabled for writing but the y component should not be
    written.  The grammar requires that the destination register mask
    components be listed in "xyzw" order.  Additionally, write mask
    components of "r", "g", "b", and "a" are equivalent to "x", "y", "z", and
    "w", respectively.  The grammar does not allow mixing "x", "y", "z", or
    "w" components with "r", "g", "b", and "a" ones.

    Writes to individual components of a vector destination variable, or to a
    scalar destination variable, can also be controlled at run time using
    condition code write masks.  The condition code write mask is specified by
    the <ccMask> grammar rule.  If a mask is specified, a condition code
    variable is loaded according to the <ccMaskRule> grammar rule and tested
    as described in Table X.16 to produce a four-component vector of TRUE/FALSE
    values.

         mask rule         test name                condition
         ---------------   ----------------------   -----------------
         EQ,  EQ0,  EQ1    equal                    !SF && ZF
         GE,  GE0,  GE1    greater than or equal    !(SF ^ OF)
         GT,  GT0,  GT1    greater than             (!SF ^ OF) && !ZF
         LE,  LE0,  LE1    less than or equal       SF ^ (ZF || OF)
         LT,  LT0,  LT1    less than                (SF && !ZF) ^ OF
         NE,  NE0,  NE1    not equal                SF || !ZF
         FL,  FL0,  FL1    false                    always false
         TR,  TR0,  TR1    true                     always true

         NAN, NAN0, NAN1   not a number             SF && ZF
         LEG, LEG0, LEG1   less, equal, or greater  !SF || !ZF
                             (anything but a NaN)

         CF,  CF0,  CF1    carry flag               CF
         NCF, NCF0, NCF1   no carry flag            !CF
         OF,  OF0,  OF1    overflow flag            OF
         NOF, NOF0, NOF1   no overflow flag         !OF
         SF,  SF0,  SF1    sign flag                SF
         NSF, NSF0, NSF1   no sign flag             !SF
         AB,  AB0,  AB1    above                    CF && !ZF
         BLE, BLE0, BLE1   below or equal           !CF || ZF

      Table X.16, Condition Code Tests.  The allowed rules are specified in
      the "mask rule" column.  If "0" or "1" is appended to the rule name
      (e.g., "EQ1"), the corresponding condition code register (CC1 in this
      example) is loaded, otherwise CC0 is loaded.  After loading, each
      component is tested, using the expression listed in the "condition"
      column.
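
    The tests in Table X.16 can be written out as a Python sketch.  The
    Flags tuple, the TESTS dictionary, and eval_cc_mask() are hypothetical
    names; each condition code register is modeled as a list of four
    per-component flag sets holding Python booleans.

```python
from collections import namedtuple

Flags = namedtuple("Flags", "SF ZF OF CF")  # one set of flags per component

# Conditions transcribed from the "condition" column of Table X.16.
TESTS = {
    "EQ":  lambda f: not f.SF and f.ZF,
    "GE":  lambda f: not (f.SF ^ f.OF),
    "GT":  lambda f: ((not f.SF) ^ f.OF) and not f.ZF,
    "LE":  lambda f: f.SF ^ (f.ZF or f.OF),
    "LT":  lambda f: (f.SF and not f.ZF) ^ f.OF,
    "NE":  lambda f: f.SF or not f.ZF,
    "FL":  lambda f: False,
    "TR":  lambda f: True,
    "NAN": lambda f: f.SF and f.ZF,
    "LEG": lambda f: not f.SF or not f.ZF,
    "CF":  lambda f: f.CF,
    "NCF": lambda f: not f.CF,
    "OF":  lambda f: f.OF,
    "NOF": lambda f: not f.OF,
    "SF":  lambda f: f.SF,
    "NSF": lambda f: not f.SF,
    "AB":  lambda f: f.CF and not f.ZF,
    "BLE": lambda f: not f.CF or f.ZF,
}


def eval_cc_mask(rule, cc_regs):
    """Evaluate a mask rule such as "GT" or "EQ1" against [CC0, CC1]."""
    reg = 0                                  # CC0 unless "0"/"1" is appended
    if rule[-1] in "01":
        rule, reg = rule[:-1], int(rule[-1])
    return [bool(TESTS[rule](f)) for f in cc_regs[reg]]


# Flags for floating-point results that are positive, zero, negative, NaN.
pos = Flags(False, False, False, False)
zero = Flags(False, True, False, False)
neg = Flags(True, False, False, False)
nan = Flags(True, True, False, False)
cc = [[pos, zero, neg, nan], [nan, nan, zero, zero]]
print(eval_cc_mask("GT", cc))    # [True, False, False, False]
print(eval_cc_mask("NAN1", cc))  # [True, True, False, False]
```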

    After the condition code tests are performed, the four-component result
    can be swizzled according to the <swizzleSuffix> grammar rule.  Individual
    components of the destination variable are written only if the
    corresponding component of the swizzled condition code test result is
    TRUE.  If both a (compile-time) component write mask and a condition code
    write mask are specified, destination variable components are written only
    if the corresponding component is enabled in both masks.
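
    The interaction of the two masks can be sketched in Python.  The
    function write_dest() is hypothetical; <cc_result> stands for the
    swizzled condition code test result, and the write mask is given in
    "xyzw" form without the leading period.

```python
def write_dest(dest, result, write_mask="xyzw", cc_result=(True,) * 4):
    """Return the new destination value after a masked write.

    A component is written only if it is named in the compile-time write
    mask and the matching condition code test component is TRUE."""
    return [r if (name in write_mask and ok) else d
            for d, r, name, ok in zip(dest, result, "xyzw", cc_result)]


# Write mask ".xzw" combined with a test result of (T, T, F, T).
print(write_dest([0, 0, 0, 0], [1, 2, 3, 4], "xzw",
                 [True, True, False, True]))   # [1, 0, 0, 4]
```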

    A program instruction can also optionally update one of the two condition
    code registers if the "CC", "CC0", or "CC1" instruction modifier is
    specified.  These instruction modifiers update condition code register
    CC0, CC0, or CC1, respectively.  The instructions "ADD.CC" or "ADD.CC0"
    will perform an add and update condition code zero, "ADD.CC1" will add and
    update condition code one, and "ADD" will simply perform the add without a
    condition code update.  The components of the selected condition code
    register are updated if and only if the corresponding component of the
    destination variable is enabled by both write masks.  For the purposes of
    condition code update, a scalar destination variable is treated as a
    vector where the scalar result is written to "x" (if enabled in the write
    mask), and writes to the "y", "z", and "w" components are disabled.

    When condition code components are written, the condition code flags are
    updated based on the corresponding component of the result.  If a
    component of the destination register is not enabled for writes, the
    corresponding condition code component is also unchanged.

    For floating-point results, the sign flag (SF) is set if the result is
    less than zero or is a NaN (not a number) value.  The zero flag (ZF) is
    set if the result is equal to zero or is a NaN.
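
    As an illustration, the floating-point rule can be sketched in Python.
    The function float_flags() is hypothetical and models only the SF and
    ZF flags described in the preceding paragraph.

```python
import math


def float_flags(x):
    """Return (SF, ZF) for one floating-point result component."""
    is_nan = math.isnan(x)
    sf = is_nan or x < 0     # negative or NaN
    zf = is_nan or x == 0    # zero or NaN
    return sf, zf


print(float_flags(2.5))           # (False, False)
print(float_flags(0.0))           # (False, True)
print(float_flags(-1.0))          # (True, False)
print(float_flags(float("nan")))  # (True, True)
```

    Note that a NaN result is the only case that sets both flags, which is
    what the "NAN" and "LEG" tests in Table X.16 rely on.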

    For signed and unsigned integer results, the sign flag (SF) is set if the
    most significant bit of the value written to the result variable is set
    and the zero flag (ZF) is set if the result written is zero.  For
    instructions other than those performing an integer add or subtract (ADD,
    MAD, SAD, SUB), the overflow and carry flags (OF and CF) are cleared.

    For integer add or subtract operations, the overflow and carry flags are
    determined by doing both signed and unsigned adds/subtracts as follows:

      The overflow flag (OF) is set by interpreting the two operands as signed
      integers and performing a signed add or subtract.  If the result is
      representable as a signed integer (i.e., doesn't overflow), the overflow
      flag is cleared; otherwise, it is set.

      The carry flag (CF) is set by interpreting the two operands as unsigned
      integers and performing an unsigned add or subtract.  If the result of
      an add is representable as an unsigned integer (i.e., doesn't overflow),
      the carry flag is cleared; otherwise, it is set.  If the result of a
      subtract is greater than or equal to zero, the carry flag is set;
      otherwise, it is cleared.
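
    The flag rules for integer adds and subtracts can be sketched in
    Python.  This is a minimal model that assumes 32-bit operands; the
    function add_sub_flags() and the constant names are hypothetical.

```python
BITS = 32                      # assumed operand width for this sketch
MASK = (1 << BITS) - 1


def to_signed(x):
    """Interpret a BITS-wide bit pattern as a two's complement integer."""
    x &= MASK
    return x - (1 << BITS) if x >> (BITS - 1) else x


def add_sub_flags(a, b, subtract=False):
    """Return (result, flags) for an integer add or subtract."""
    a &= MASK
    b &= MASK
    if subtract:
        signed, unsigned = to_signed(a) - to_signed(b), a - b
    else:
        signed, unsigned = to_signed(a) + to_signed(b), a + b
    result = unsigned & MASK
    # OF: the signed result is not representable in BITS bits.
    overflow = not (-(1 << (BITS - 1)) <= signed < (1 << (BITS - 1)))
    # CF: unsigned add overflowed, or unsigned subtract is >= 0.
    carry = (unsigned >= 0) if subtract else (unsigned > MASK)
    flags = {"SF": bool(result >> (BITS - 1)), "ZF": result == 0,
             "OF": overflow, "CF": carry}
    return result, flags


print(add_sub_flags(0x7FFFFFFF, 1)[1])
# {'SF': True, 'ZF': False, 'OF': True, 'CF': False}
print(add_sub_flags(5, 3, subtract=True)[1])
# {'SF': False, 'ZF': False, 'OF': False, 'CF': True}
```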

    For the purposes of condition code setting, negation modifiers turn add
    operations into subtracts and vice versa.  If the operation is equivalent
    to an add with both operands negated (-A-B), the carry and overflow flags
    are both undefined.


    Section 2.X.4.4, Program Texture Access

    Certain program instructions may access texture images, as described in
    section 3.8.  The coordinates, level-of-detail, and partial derivatives
    used for performing the texture lookup are derived from values provided in
    the program as described in the various sub-sections of Section 2.X.8.
    These descriptions use the function

      result_t_vec
        TextureSample(float_vec coord, float lod, float_vec ddx,
                      float_vec ddy, int_vec offset);

    which obtains a filtered texel value <tau> as described in Section 3.8.8
    and returns a 4-component vector (R,G,B,A) according to the format
    conversions specified in Table 3.21.  The result vector is interpreted as
    floating-point, signed integer, or unsigned integer, according to the data
    type modifier of the instruction.  If the internal format of the texture
    does not match the instruction's data type modifier, the results of the
    texture lookup are undefined.

    (Note:  For unextended OpenGL 2.0, all supported texture internal formats
    store integer values but return floating-point results in the range [0,1]
    on a texture lookup.  The ARB_texture_float extension introduces
    floating-point internal formats where components are both stored and
    returned as floating-point values.  The EXT_texture_integer extension
    introduces formats that both store and return either signed or unsigned
    integer values.)

    <coord> is a four-component floating-point vector from which the (s,t,r)
    texture coordinates used for the texture access, the layer used for array
    textures, and the reference value used for depth comparisons (section
    3.8.14) are extracted according to Table X.17.  If the texture is a cube
    map, (s,t,r) is projected to one of the six cube faces to produce a new
    (s,t) vector according to Section 3.8.6.  For array textures, the layer
    used is derived by rounding the extracted floating-point component to the
    nearest integer and clamping the result to the range [0,<n>-1], where <n>
    is the number of layers in the texture.
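
    The layer selection for array textures can be sketched in Python.  The
    function select_layer() is hypothetical.  The text does not say how
    exact ties are rounded, so rounding halves upward is an assumption of
    this sketch.

```python
import math


def select_layer(value, num_layers):
    """Round to the nearest integer, then clamp to [0, num_layers - 1]."""
    nearest = math.floor(value + 0.5)   # assumed tie rule: halves round up
    return max(0, min(nearest, num_layers - 1))


print(select_layer(2.6, 8))    # 3
print(select_layer(-3.0, 8))   # 0
print(select_layer(100.0, 8))  # 7
```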

    <lod> specifies the level of detail parameter and replaces the value
    computed in equation 3.18.  <ddx> and <ddy> specify partial derivatives
    (ds/dx, dt/dx, dr/dx, ds/dy, dt/dy, and dr/dy) for the texture
    coordinates, and may be used to derive footprint shapes for anisotropic
    texture filtering.

    <offset> is a constant 3-component signed integer vector specified
    according to the <texOffset> grammar rule, which is added to the computed
    <u>, <v>, and <w> texel locations prior to sampling.  One, two, or three
    components may be specified in the instruction; if fewer than three are
    specified, the remaining offset components are zero.  A limited range of
    offset values is supported; the minimum and maximum <texOffset> values
    are implementation-dependent and given by MIN_PROGRAM_TEXEL_OFFSET_EXT and
    MAX_PROGRAM_TEXEL_OFFSET_EXT, respectively.  A program will fail to load:

      * if the texture target specified in the instruction is 1D, ARRAY1D,
        SHADOW1D, or SHADOWARRAY1D, and the second or third component of the
        offset vector is non-zero,

      * if the texture target specified in the instruction is 2D, RECT,
        ARRAY2D, SHADOW2D, SHADOWRECT, or SHADOWARRAY2D, and the third
        component of the offset vector is non-zero,

      * if the texture target is CUBE or SHADOWCUBE, and any component of the
        offset vector is non-zero -- texel offsets are not supported for cube
        map or buffer textures, or

      * if any component of the offset vector is less than
        MIN_PROGRAM_TEXEL_OFFSET_EXT or greater than
        MAX_PROGRAM_TEXEL_OFFSET_EXT.
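
    The load-time checks listed above can be sketched in Python.  The
    function offset_is_loadable() and the table name are hypothetical, and
    the limits -8 and 7 in the example are placeholder values, not values
    taken from this specification.

```python
# Number of leading offset components that may be non-zero for each target.
_OFFSET_COMPONENTS = {
    "1D": 1, "ARRAY1D": 1, "SHADOW1D": 1, "SHADOWARRAY1D": 1,
    "2D": 2, "RECT": 2, "ARRAY2D": 2,
    "SHADOW2D": 2, "SHADOWRECT": 2, "SHADOWARRAY2D": 2,
    "3D": 3,
    "CUBE": 0, "SHADOWCUBE": 0, "BUFFER": 0,   # offsets not supported
}


def offset_is_loadable(target, offset, min_offset, max_offset):
    """Return False where a program using this <texOffset> fails to load."""
    # Unspecified trailing components are zero.
    padded = tuple(offset) + (0,) * (3 - len(offset))
    if any(c < min_offset or c > max_offset for c in padded):
        return False
    allowed = _OFFSET_COMPONENTS[target]
    return all(c == 0 for c in padded[allowed:])


print(offset_is_loadable("2D", (1, -1), -8, 7))     # True
print(offset_is_loadable("2D", (1, 1, 1), -8, 7))   # False
print(offset_is_loadable("CUBE", (1,), -8, 7))      # False
```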

    (NOTE:  Texel offsets are a new feature provided by this extension and are
    described in more detail in edits to Section 3.8 below.)

    The texture used by TextureSample() is one of the textures bound to the
    texture image unit whose number is specified in the instruction according
    to the <texImageUnit> grammar rule.  The texture target accessed is
    specified according to the <texTarget> grammar rule and Table X.17.
    Fixed-function texture enables are always ignored when determining the
    texture to access in a program.

                                                     coordinates used
      texTarget          Texture Type               s t r  layer  shadow
      ----------------   ---------------------      -----  -----  ------
      1D                 TEXTURE_1D                 x - -    -      -
      2D                 TEXTURE_2D                 x y -    -      -
      3D                 TEXTURE_3D                 x y z    -      -
      CUBE               TEXTURE_CUBE_MAP           x y z    -      -
      RECT               TEXTURE_RECTANGLE_ARB      x y -    -      -
      ARRAY1D            TEXTURE_1D_ARRAY_EXT       x - -    y      -
      ARRAY2D            TEXTURE_2D_ARRAY_EXT       x y -    z      -
      SHADOW1D           TEXTURE_1D                 x - -    -      z
      SHADOW2D           TEXTURE_2D                 x y -    -      z
      SHADOWRECT         TEXTURE_RECTANGLE_ARB      x y -    -      z
      SHADOWCUBE         TEXTURE_CUBE_MAP           x y z    -      w
      SHADOWARRAY1D      TEXTURE_1D_ARRAY_EXT       x - -    y      z
      SHADOWARRAY2D      TEXTURE_2D_ARRAY_EXT       x y -    z      w
      BUFFER             TEXTURE_BUFFER_EXT           <not supported>

      Table X.17:  Texture types accessed for each <texTarget>, and
      coordinate mappings.  The "SHADOW" and "ARRAY" targets are special
      pseudo-targets described below.  The "coordinates used" columns indicate
      the input values used for each coordinate of the texture lookup, the
      layer selector for array textures, and the reference value for texture
      comparisons.  Buffer textures are not supported by normal texture lookup
      functions, but are supported by TXF and TXQ, described below.

    Texture targets with "SHADOW" are used to access textures with a
    DEPTH_COMPONENT base internal format using depth comparisons (Section
    3.8.14).  Results of a texture access are undefined:

      * if a "SHADOW" target is used, and the corresponding texture has a base
        internal format other than DEPTH_COMPONENT or a TEXTURE_COMPARE_MODE
        of NONE, or

      * if a non-"SHADOW" target is used, and the corresponding texture has a
        base internal format of DEPTH_COMPONENT and a TEXTURE_COMPARE_MODE
        other than NONE.

    If the texture being accessed is not complete (or cube complete for
    cubemap textures), no texture access is performed and the result is
    undefined.

    A program will fail to load if it attempts to sample from multiple texture
    targets (including the SHADOW pseudo-targets) on the same texture image
    unit.  For example, a program containing any two of the following
    instructions will fail to load:

      TEX out, coord, texture[0], 1D;
      TEX out, coord, texture[0], 2D;
      TEX out, coord, texture[0], ARRAY2D;
      TEX out, coord, texture[0], SHADOW2D;
      TEX out, coord, texture[0], 3D;
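
    A loader could detect this condition with a check along the lines of
    the following Python sketch.  The function conflicting_units() is
    hypothetical and assumes the texture instructions have already been
    reduced to (image unit, target) pairs.

```python
def conflicting_units(tex_instructions):
    """Return the image units sampled with more than one texture target."""
    first_target = {}
    conflicts = set()
    for unit, target in tex_instructions:
        # Remember the first target seen on each unit; any other target
        # on the same unit (including SHADOW pseudo-targets) conflicts.
        if first_target.setdefault(unit, target) != target:
            conflicts.add(unit)
    return sorted(conflicts)


program = [(0, "1D"), (0, "2D"), (1, "3D"), (1, "3D")]
print(conflicting_units(program))   # [0]
```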

    Additionally, multiple texture targets for a single texture image unit may
    not be used at the same time by the GL.  The error INVALID_OPERATION is
    generated by Begin, RasterPos, or any command that performs an implicit
    Begin if an enabled program accesses one texture target for a texture unit
    while another enabled program or fixed-function fragment processing
    accesses a different texture target for the same texture image unit.

    Some texture instructions use standard methods to compute partial
    derivatives and/or the level-of-detail used to perform texture accesses.
    For fragment programs, the functions

      float_vec ComputePartialsX(float_vec coord);
      float_vec ComputePartialsY(float_vec coord);

    compute approximate component-wise partial derivatives of the
    floating-point vector <coord> relative to the X and Y coordinates,
    respectively.  For vertex and geometry programs, these functions always
    return (0,0,0,0).  The function

      float ComputeLOD(float_vec ddx, float_vec ddy);

    maps partial derivative vectors <ddx> and <ddy> to ds/dx, dt/dx, dr/dx,
    ds/dy, dt/dy, and dr/dy and computes lambda_base(x,y) according to
    equation 3.18.

    The TXF instruction provides the ability to extract a single texel from a
    specified texture image using the function

      result_t_vec TexelFetch(int_vec coord, int_vec offset);

    The extracted texel is converted to an (R,G,B,A) vector according to Table
    3.21.  The result vector is interpreted as floating-point, signed integer,
    or unsigned integer, according to the data type modifier of the
    instruction.  If the internal format of the texture is not compatible with
    the instruction's data type modifier, the extracted texel value is
    undefined.

    <coord> is a four-component signed integer vector used to identify the
    single texel accessed.  The (i,j,k) coordinates of the texel and the layer
    used for array textures are extracted according to Table X.18.  The level
    of detail accessed is obtained by adding the w component of <coord> to the
    base level (level_base).  <offset> is a constant 3-component signed
    integer vector added to the texel coordinates prior to the texel fetch as
    described above.  In addition to the restrictions described above,
    non-zero offset components are also not supported for BUFFER targets.

    The texture used by TexelFetch() is specified by the image unit and target
    parameters provided in the instruction, as for TextureSample() above.
    Single texel fetches cannot perform depth comparisons or access cubemaps.
    If a program contains a TXF instruction specifying one of the "SHADOW" or
    "CUBE" targets, it will fail to load.

                                      coordinates used
      texTarget          supported      i j k  layer  lod
      ----------------   ---------      -----  -----  ---
      1D                    yes         x - -    -     w
      2D                    yes         x y -    -     w
      3D                    yes         x y z    -     w
      CUBE                  no          - - -    -     -
      RECT                  yes         x y -    -     w
      ARRAY1D               yes         x - -    y     w
      ARRAY2D               yes         x y -    z     w
      SHADOW1D              no          - - -    -     -
      SHADOW2D              no          - - -    -     -
      SHADOWRECT            no          - - -    -     -
      SHADOWCUBE            no          - - -    -     -
      SHADOWARRAY1D         no          - - -    -     -
      SHADOWARRAY2D         no          - - -    -     -
      BUFFER                yes         x - -    -     -

      Table X.18, Mappings of texel fetch coordinates to texel location.

    Single-texel fetches do not support LOD clamping or any texture wrap mode,
    and require a mipmapped minification filter to access any level of detail
    other than the base level.  The results of the texel fetch are undefined:

      * if the computed LOD is less than the texture's base level (level_base)
        or greater than the maximum level (level_max),

      * if the computed LOD is not the texture's base level and the texture's
        minification filter is NEAREST or LINEAR,

      * if the layer specified for array textures is negative or greater than
        or equal to the number of layers in the array texture,

      * if the (i,j,k) coordinates refer to a border texel outside the
        defined extents of the specified LOD, where

         i < -b_s, j < -b_s, k < -b_s,
         i >= w_s - b_s, j >= h_s - b_s, or k >= d_s - b_s,

        where the size parameters (w_s, h_s, d_s, and b_s) refer to the width,
        height, depth, and border size of the image, as in equations 3.15,
        3.16, and 3.17, or

      * if the texture being accessed is not complete (or cube complete for
        cubemaps).
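
    The conditions above can be collected into a Python sketch.  The
    function texel_fetch_defined() and its parameter names are
    hypothetical.  <lod> is the level already computed from the base level
    and the w component, <coords> and <sizes> hold only the coordinates and
    image sizes used by the target, and any layer outside
    [0, num_layers-1] is treated as out of range.

```python
def texel_fetch_defined(coords, sizes, b_s, lod, level_base, level_max,
                        min_filter, layer=None, num_layers=None,
                        complete=True):
    """Return True if a TXF result is defined under the rules above."""
    if not complete:
        return False
    if lod < level_base or lod > level_max:
        return False
    # Non-base levels require a mipmapped minification filter.
    if lod != level_base and min_filter in ("NEAREST", "LINEAR"):
        return False
    if layer is not None and not (0 <= layer < num_layers):
        return False
    # Each used coordinate c must satisfy -b_s <= c < size - b_s.
    return all(-b_s <= c < size - b_s for c, size in zip(coords, sizes))


print(texel_fetch_defined((0, 0), (64, 64), 0, lod=0, level_base=0,
                          level_max=6, min_filter="LINEAR"))   # True
print(texel_fetch_defined((64, 0), (64, 64), 0, lod=0, level_base=0,
                          level_max=6, min_filter="LINEAR"))   # False
```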


    Section 2.X.5, Program Flow Control

    In addition to basic arithmetic, logical, and texture instructions, a
    number of flow control instructions are provided, which are described in
    detail in Section 2.X.8.  Programs can contain several types of
    instruction blocks:  IF/ELSE/ENDIF blocks, REP/ENDREP blocks, and
    subroutine blocks.  IF/ELSE/ENDIF blocks are a set of instructions
    beginning with an "IF" instruction, ending with an "ENDIF" instruction,
    and possibly containing an optional "ELSE" instruction.  REP/ENDREP blocks
    are a set of instructions beginning with a "REP" instruction and ending
    with an "ENDREP" instruction.  Subroutine blocks begin with an instruction
    label identifying the name of the subroutine and end just before the
    next instruction label or the end of the program.  Examples include the
    following:

        MOVC CC, R0;
        IF GT.x;
          MOV R0, R1;     # executes if R0.x > 0
        ELSE;
          MOV R0, R2;     # executes if R0.x <= 0
        ENDIF;

        REP repCount;
        ADD R0, R0, R1;
        ENDREP;

      square:             # subroutine to compute R0^2
        MUL R0, R0, R0;
        RET;
      main:
        MOV R0, 9.0;
        CAL square;       # compute 9.0^2 in R0

    IF/ELSE/ENDIF and REP/ENDREP blocks may be nested inside each other, and
    inside subroutines.  In all cases, each instruction block must be
    terminated with the appropriate instruction (ENDIF for IF, ENDREP for
    REP).  Nested instruction blocks must be wholly contained within a block
    -- if a REP instruction is found between an IF and ELSE instruction, the
    corresponding ENDREP must also be present between the IF and ELSE.
    Subroutines may not be nested inside IF/ELSE/ENDIF or REP/ENDREP blocks,
    or inside other subroutines.  A program will fail to load if any
    instruction block is terminated by an incorrect instruction, is not
    terminated before the block containing it, or contains an instruction
    label.
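
    The nesting rules above amount to a stack discipline, which the following
    C sketch illustrates.  It is not how any implementation is known to load
    programs:  the opcode enumeration, the blocks_well_formed function, and
    the MAX_OPEN_BLOCKS bound are all hypothetical, and the sketch ignores the
    nesting depth limits and the placement rules for CONT and BRK.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified view of a parsed program. */
typedef enum {
    OP_IF, OP_ELSE, OP_ENDIF, OP_REP, OP_ENDREP, OP_LABEL, OP_OTHER
} opcode;

enum { MAX_OPEN_BLOCKS = 64 };  /* arbitrary bound for this sketch only */

/* Returns true if every block is closed by the right instruction, inside
 * the block that contains it, and no label appears inside a block. */
bool blocks_well_formed(const opcode *ops, size_t count)
{
    /* Entries are OP_IF, OP_ELSE (an IF whose ELSE was seen), or OP_REP. */
    opcode stack[MAX_OPEN_BLOCKS];
    size_t depth = 0;

    for (size_t n = 0; n < count; n++) {
        switch (ops[n]) {
        case OP_IF:
        case OP_REP:
            if (depth == MAX_OPEN_BLOCKS)
                return false;           /* sketch limit, not a GL limit */
            stack[depth++] = ops[n];
            break;
        case OP_ELSE:
            if (depth == 0 || stack[depth - 1] != OP_IF)
                return false;
            stack[depth - 1] = OP_ELSE;
            break;
        case OP_ENDIF:
            if (depth == 0 || (stack[depth - 1] != OP_IF &&
                               stack[depth - 1] != OP_ELSE))
                return false;
            depth--;
            break;
        case OP_ENDREP:
            if (depth == 0 || stack[depth - 1] != OP_REP)
                return false;
            depth--;
            break;
        case OP_LABEL:
            if (depth != 0)
                return false;           /* label inside an open block */
            break;
        default:
            break;
        }
    }
    return depth == 0;                  /* everything must be closed */
}
```

    With this check, the sequence IF, REP, ENDREP, ELSE, ENDIF is accepted,
    while IF, REP, ELSE, ENDREP, ENDIF is rejected because the REP block is
    not closed before the ELSE.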

    IF/ELSE/ENDIF blocks evaluate a condition to determine which instructions
    to execute.  If the condition is true, all instructions between the IF and
    ELSE are executed.  If the condition is false, all instructions between
    the ELSE and ENDIF are executed.  The ELSE instruction is optional.  If
    the ELSE is omitted, all instructions between the IF and ENDIF are
    executed if the condition is true, or skipped if the condition is false.
    A limited amount of nesting is supported -- a program will fail to load if
    an IF instruction is nested inside MAX_PROGRAM_IF_DEPTH_NV or more
    IF/ELSE/ENDIF blocks.

    REP/ENDREP blocks are used to execute a sequence of instructions multiple
    times.  The REP instruction includes an optional scalar operand to specify
    a loop count indicating the number of times the block of instructions
    should be repeated.  If the loop count is omitted, the contents of a
    REP/ENDREP block will be repeated indefinitely until the loop is
    explicitly terminated.  A limited amount of nesting is supported -- a
    program will fail to load if a REP instruction is nested inside
    MAX_PROGRAM_LOOP_DEPTH_NV or more REP/ENDREP blocks.

    Within a REP/ENDREP block, the CONT instruction can be used to terminate
    the current iteration of the loop by effectively jumping to the ENDREP
    instruction.  The BRK instruction can be used to terminate the entire loop
    by effectively jumping to the instruction immediately following the ENDREP
    instruction.  If CONT and BRK instructions are found inside multiply
    nested REP/ENDREP blocks, they apply to the innermost block.  A program
    will fail to load if it includes a CONT or BRK instruction that is not
    contained inside a REP/ENDREP block.
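
    In C terms, CONT and BRK behave much like the continue and break
    statements applied to the innermost loop.  The sketch below is an analogy
    under that assumption, with positive loop counts only; the run_nested
    function and its parameters are hypothetical.

```c
/* C analogue of two nested REP/ENDREP blocks.  The inner block skips odd
 * iterations with a CONT and leaves the inner loop with a BRK when the
 * iteration index reaches brk_at.  Returns how often the body ran. */
int run_nested(int outer, int inner, int brk_at)
{
    int executed = 0;

    for (int o = 0; o < outer; o++) {       /* REP outer;               */
        for (int i = 0; i < inner; i++) {   /*   REP inner;             */
            if (i % 2 == 1)
                continue;                   /*     CONT: to inner ENDREP */
            if (i == brk_at)
                break;                      /*     BRK: past inner ENDREP */
            executed++;                     /*     loop body             */
        }                                   /*   ENDREP;                */
    }                                       /* ENDREP;                  */
    return executed;
}
```

    Because the BRK only leaves the inner block, the outer block still runs
    all of its iterations:  run_nested(2, 6, 4) executes the body twice per
    outer iteration, four times in total.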

    A REP/ENDREP block without a specified loop count can result in an
    infinite loop.  To prevent obvious infinite loops, a program will fail to
    load if it contains a REP/ENDREP block that contains neither a BRK
    instruction at the current nesting level nor a RET instruction at any
    nesting level.

    Subroutines are supported via the CAL and RET instructions.  A subroutine
    block is identified by an instruction label, which can be any valid identifier
    according to the <instLabel> grammar rule.  The CAL instruction identifies
    a subroutine name to call according to the <instTarget> grammar rule.
    Instruction labels used in CAL instructions do not need to be defined in
    the program text that precedes the instruction, but a program will fail to
    load if it includes a CAL instruction that references an instruction label
    that is not defined anywhere in the program.  When a CAL instruction is
    executed, it transfers control to the instruction immediately following
    the specified instruction label.  Subsequent instructions in that
    subroutine are executed until a RET instruction is executed, or until
    program execution reaches another instruction label or the end of the
    program text.  After the subroutine finishes, execution continues with the
    instruction immediately following the CAL instruction.  When a RET
    instruction is issued, it will break out of any IF/ELSE/ENDIF or
    REP/ENDREP blocks that contain it.

    Subroutines may call other subroutines before completing, up to an
    implementation-dependent maximum depth of MAX_PROGRAM_CALL_DEPTH_NV calls.
    Subroutines may call any subroutine in the program, including themselves,
    as long as the call depth limit is obeyed.  Issuing a CAL instruction
    while MAX_PROGRAM_CALL_DEPTH_NV subroutines have not completed has
    undefined results, including possible program termination.

    Several flow control instructions include condition code tests.  The IF
    instruction requires a condition test to determine what instructions are
    executed.  The CONT, BRK, CAL, and RET instructions have an optional
    condition code test; if the test fails, the instructions are not executed.
    Condition code tests are specified by the <ccTest> grammar rule.  The test
    is evaluated like the condition code write mask (section 2.X.4.3), and
    passes if and only if any of the four components passes.

    If an instruction label named "main" is specified, GPU program execution
    begins with the instruction immediately following that label.  Otherwise,
    it begins with the first instruction of the program.  Instructions are
    executed in sequence until either a RET instruction is issued in the main
    subroutine or the end of the program text is reached.


    Section 2.X.6, Program Options

    Programs may specify a number of options to indicate that one or more
    extended language features are used by the program.  All program options
    used by the program must be declared at the beginning of the program
    string.  Each program option specified in a program string will modify the
    syntactic or semantic rules used to interpret the program and the execution
    environment used to execute the program.  Features in program options
    not declared by the program are ignored, even if the option is otherwise
    supported by the GL.  Each option declaration consists of two tokens: the
    keyword "OPTION" and an identifier.

    The set of available options depends on the program type, and is
    enumerated in the specifications for each program type.  Some program
    types may not provide any options.


    Section 2.X.7, Program Declarations

    Programs may include a number of declaration statements to specify
    characteristics of the program.  Each declaration statement is followed by
    one or more arguments, separated by commas.

    The set of available declarations depends on the program type, and is
    enumerated in the specifications for each program type.  Some program
    types may not provide declarations.


    Section 2.X.8, Program Instruction Set

    The following sections enumerate the set of instructions supported for GPU
    programs.

    Some instructions allow the use of one of the three basic data type
    modifiers (floating point, signed integer, and unsigned integer).  Unless
    otherwise mentioned:

      * the result and all of the operands will be interpreted according to
        the specified data type, and

      * if no data type modifier is specified, the instruction will operate as
        though a floating-point modifier ("F") were specified.

    Some instructions will override one or both of these rules.


    Section 2.X.8.Z, ABS:  Absolute Value

    The ABS instruction performs a component-wise absolute value operation on
    the single operand to yield a result vector.

      tmp = VectorLoad(op0);
      result.x = abs(tmp.x);
      result.y = abs(tmp.y);
      result.z = abs(tmp.z);
      result.w = abs(tmp.w);

    ABS supports all three data type modifiers.  Taking the absolute value of
    an unsigned integer is not a useful operation, but is not illegal.


    Section 2.X.8.Z, ADD:  Add

    The ADD instruction performs a component-wise add of the two operands to
    yield a result vector.

      tmp0 = VectorLoad(op0);
      tmp1 = VectorLoad(op1);
      result.x = tmp0.x + tmp1.x;
      result.y = tmp0.y + tmp1.y;
      result.z = tmp0.z + tmp1.z;
      result.w = tmp0.w + tmp1.w;

    ADD supports all three data type modifiers.


    Section 2.X.8.Z, AND:  Bitwise AND

    The AND instruction performs a bitwise AND operation on the components of
    the two source vectors to yield a result vector.

      tmp0 = VectorLoad(op0);
      tmp1 = VectorLoad(op1);
      result.x = tmp0.x & tmp1.x;
      result.y = tmp0.y & tmp1.y;
      result.z = tmp0.z & tmp1.z;
      result.w = tmp0.w & tmp1.w;

    AND supports only signed and unsigned integer data type modifiers.  If no
    type modifier is specified, both operands and the result are treated as
    signed integers.


    Section 2.X.8.Z, BRK:  Break out of Loop Instruction

    The BRK instruction conditionally transfers control to the instruction
    immediately following the next ENDREP instruction.  A BRK instruction has
    no effect if the condition code test evaluates to FALSE.

    The following pseudocode describes the operation of the instruction:

      if (TestCC(cc.c***) || TestCC(cc.*c**) ||
          TestCC(cc.**c*) || TestCC(cc.***c)) {
        continue execution at instruction following the next ENDREP;
      }


    Section 2.X.8.Z, CAL:  Subroutine Call

    The CAL instruction conditionally transfers control to the instruction
    following the label specified in the instruction.  It also pushes a
    reference to the instruction immediately following the CAL instruction
    onto the call stack, where execution will continue after executing the
    matching RET instruction.  The following pseudocode describes the
    operation of the instruction:

      if (TestCC(cc.c***) || TestCC(cc.*c**) ||
          TestCC(cc.**c*) || TestCC(cc.***c)) {
        if (callStackDepth >= MAX_PROGRAM_CALL_DEPTH_NV) {
          // undefined results
        } else {
          callStack[callStackDepth] = nextInstruction;
          callStackDepth++;
        }
        // continue execution at instruction following <instTarget>
      } else {
        // do nothing
      }

    In the pseudocode, <instTarget> is the label specified in the instruction
    matching the <instTarget> grammar rule, <callStackDepth> is the current
    depth of the call stack, <callStack> is an array holding the call stack,
    and <nextInstruction> is a reference to the instruction immediately
    following the CAL instruction in the program string.

    If the call stack overflows, the results of the CAL instruction are
    undefined, and can result in immediate program termination.
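
    The call stack bookkeeping can be illustrated with a small C model.  This
    is a sketch under stated assumptions, not a description of any
    implementation:  the machine structure, do_cal, do_ret, and the
    CALL_DEPTH_LIMIT value are hypothetical, instructions are identified by
    integer indices, and the sketch refuses a call on overflow where this
    specification leaves the results undefined.

```c
#include <stdbool.h>

enum { CALL_DEPTH_LIMIT = 4 };  /* stand-in for MAX_PROGRAM_CALL_DEPTH_NV */

typedef struct {
    int stack[CALL_DEPTH_LIMIT];  /* saved return instruction indices */
    int depth;                    /* current call stack depth         */
    int pc;                       /* index of the current instruction */
} machine;

/* CAL: <target> is the index of the instruction following the label.
 * Returns false on call stack overflow (this sketch's choice). */
bool do_cal(machine *m, bool cc_pass, int target)
{
    if (!cc_pass) {               /* condition code test failed */
        m->pc++;
        return true;
    }
    if (m->depth >= CALL_DEPTH_LIMIT)
        return false;
    m->stack[m->depth++] = m->pc + 1;   /* instruction after the CAL */
    m->pc = target;
    return true;
}

/* RET: resumes after the matching CAL.  Returns false when the call stack
 * is empty, which corresponds to a RET in the main subroutine. */
bool do_ret(machine *m)
{
    if (m->depth == 0)
        return false;
    m->pc = m->stack[--m->depth];
    return true;
}
```

    A CAL at instruction 10 that targets instruction 2 leaves 11 on the
    stack, so the matching RET resumes execution at instruction 11.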

    An instruction label signifies the beginning of a new subroutine.
    Subroutines may not nest or overlap.  If a CAL instruction is executed and
    subsequent program execution reaches an instruction label before a
    corresponding RET instruction is executed, the subroutine call returns
    immediately, as though an unconditional RET instruction were inserted
    immediately before the instruction label.

    (Note:  On previous vertex program extensions -- NV_vertex_program2 and
    NV_vertex_program3 -- instruction labels were also used as targets for
    branch (BRA) instructions.  This unstructured branching functionality has
    been replaced with the structured branching constructs found in this
    instruction set.)


    Section 2.X.8.Z, CEIL:  Ceiling

    The CEIL instruction loads a single vector operand and performs a
    component-wise ceiling operation to generate a result vector.

      tmp = VectorLoad(op0);
      result.x = ceil(tmp.x);
      result.y = ceil(tmp.y);
      result.z = ceil(tmp.z);
      result.w = ceil(tmp.w);

    The ceiling operation returns the nearest integer greater than or equal to
    the operand.  For example, ceil(-1.7) = -1.0, ceil(+1.0) = +1.0, and
    ceil(+3.7) = +4.0.

    CEIL supports all three data type modifiers.  The single operand is always
    treated as a floating-point vector, but the result is written as a
    floating-point value, a signed integer, or an unsigned integer, as
    specified by the data type modifier.  If a value is not exactly
    representable using the data type of the result (e.g., an overflow or
    writing a negative value to an unsigned integer), the result is undefined.


    Section 2.X.8.Z, CMP:  Compare

    The CMP instruction performs a component-wise comparison of the first
    operand against zero, and copies the values of the second or third
    operands based on the results of the compare.

      tmp0 = VectorLoad(op0);
      tmp1 = VectorLoad(op1);
      tmp2 = VectorLoad(op2);
      result.x = (tmp0.x < 0) ? tmp1.x : tmp2.x;
      result.y = (tmp0.y < 0) ? tmp1.y : tmp2.y;
      result.z = (tmp0.z < 0) ? tmp1.z : tmp2.z;
      result.w = (tmp0.w < 0) ? tmp1.w : tmp2.w;

    CMP supports all three data type modifiers.  CMP with an unsigned data
    type modifier is not a useful operation, but is not illegal.


    Section 2.X.8.Z, CONT:  Continue with Next Loop Iteration

    The CONT instruction conditionally transfers control to the next ENDREP
    instruction.  A CONT instruction has no effect if the condition code test
    evaluates to FALSE.

    The following pseudocode describes the operation of the instruction:

      if (TestCC(cc.c***) || TestCC(cc.*c**) ||
          TestCC(cc.**c*) || TestCC(cc.***c)) {
        continue execution at the next ENDREP;
      }


    Section 2.X.8.Z, COS:  Cosine with Reduction to [-PI,PI]

    The COS instruction approximates the trigonometric cosine of the angle
    specified by the scalar operand and replicates it to all four components
    of the result vector.  The angle is specified in radians and does not have
    to be in the range [-PI,PI].

      tmp = ScalarLoad(op0);
      result.x = ApproxCosine(tmp);
      result.y = ApproxCosine(tmp);
      result.z = ApproxCosine(tmp);
      result.w = ApproxCosine(tmp);

    COS supports only floating-point data type modifiers.


    Section 2.X.8.Z, DDX:  Partial Derivative Relative to X

    The DDX instruction computes approximate partial derivatives of a vector
    operand with respect to the X window coordinate, and is only available to
    fragment programs.  See the NV_fragment_program4 specification for more
    details.


    Section 2.X.8.Z, DDY:  Partial Derivative Relative to Y

    The DDY instruction computes approximate partial derivatives of a vector
    operand with respect to the Y window coordinate, and is only available to
    fragment programs.  See the NV_fragment_program4 specification for more
    details.


    Section 2.X.8.Z, DIV:  Divide Vector Components by Scalar

    The DIV instruction performs a component-wise divide of the first vector
    operand by the second scalar operand to produce a 4-component result
    vector.

      tmp0 = VectorLoad(op0);
      tmp1 = ScalarLoad(op1);
      result.x = tmp0.x / tmp1;
      result.y = tmp0.y / tmp1;
      result.z = tmp0.z / tmp1;
      result.w = tmp0.w / tmp1;

    DIV supports all three data type modifiers.  For floating-point division,
    this instruction is not guaranteed to produce results identical to a
    RCP/MUL instruction sequence.

    The results of a signed or unsigned integer division by zero are
    undefined.


    Section 2.X.8.Z, DP2:  2-Component Dot Product

    The DP2 instruction computes a two-component dot product of the two
    operands (using the first two components) and replicates the dot product
    to all four components of the result vector.

      tmp0 = VectorLoad(op0);
      tmp1 = VectorLoad(op1);
      dot = (tmp0.x * tmp1.x) + (tmp0.y * tmp1.y);
      result.x = dot;
      result.y = dot;
      result.z = dot;
      result.w = dot;

    DP2 supports only floating-point data type modifiers.


    Section 2.X.8.Z, DP2A:  2-Component Dot Product with Scalar Add

    The DP2A instruction computes a two-component dot product of the two
    operands (using the first two components), adds the x component of the
    third operand, and replicates the result to all four components of the
    result vector.

      tmp0 = VectorLoad(op0);
      tmp1 = VectorLoad(op1);
      tmp2 = VectorLoad(op2);
      dot = (tmp0.x * tmp1.x) + (tmp0.y * tmp1.y) + tmp2.x;
      result.x = dot;
      result.y = dot;
      result.z = dot;
      result.w = dot;

    DP2A supports only floating-point data type modifiers.


    Section 2.X.8.Z, DP3:  3-Component Dot Product

    The DP3 instruction computes a three-component dot product of the two
    operands (using the x, y, and z components) and replicates the dot product
    to all four components of the result vector.

      tmp0 = VectorLoad(op0);
      tmp1 = VectorLoad(op1);
      dot = (tmp0.x * tmp1.x) + (tmp0.y * tmp1.y) +
            (tmp0.z * tmp1.z);
      result.x = dot;
      result.y = dot;
      result.z = dot;
      result.w = dot;

    DP3 supports only floating-point data type modifiers.


    Section 2.X.8.Z, DP4:  4-Component Dot Product

    The DP4 instruction computes a four-component dot product of the two
    operands and replicates the dot product to all four components of the
    result vector.

      tmp0 = VectorLoad(op0);
      tmp1 = VectorLoad(op1);
      dot = (tmp0.x * tmp1.x) + (tmp0.y * tmp1.y) +
            (tmp0.z * tmp1.z) + (tmp0.w * tmp1.w);
      result.x = dot;
      result.y = dot;
      result.z = dot;
      result.w = dot;

    DP4 supports only floating-point data type modifiers.


    Section 2.X.8.Z, DPH:  Homogeneous Dot Product

    The DPH instruction computes a three-component dot product of the two
    operands (using the x, y, and z components), adds the w component of the
    second operand, and replicates the sum to all four components of the
    result vector.  This is equivalent to a four-component dot product where
    the w component of the first operand is forced to 1.0.

      tmp0 = VectorLoad(op0);
      tmp1 = VectorLoad(op1);
      dot = (tmp0.x * tmp1.x) + (tmp0.y * tmp1.y) +
            (tmp0.z * tmp1.z) + tmp1.w;
      result.x = dot;
      result.y = dot;
      result.z = dot;
      result.w = dot;

    DPH supports only floating-point data type modifiers.
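
    The stated equivalence between DPH and a four-component dot product can
    be checked with a short C sketch.  The vec4 type and the dp4 and dph
    helpers are hypothetical names that mirror the pseudocode; they are not
    part of any API.

```c
typedef struct { float x, y, z, w; } vec4;

/* Mirrors the DP4 pseudocode. */
float dp4(vec4 a, vec4 b)
{
    return (a.x * b.x) + (a.y * b.y) + (a.z * b.z) + (a.w * b.w);
}

/* Mirrors the DPH pseudocode: the w component of the first operand is
 * ignored, and the w component of the second operand is added as-is. */
float dph(vec4 a, vec4 b)
{
    return (a.x * b.x) + (a.y * b.y) + (a.z * b.z) + b.w;
}
```

    For a = (1, 2, 3, 99) and b = (4, 5, 6, 7), dph(a, b) is 39, which is
    also the value of dp4 once a.w is replaced by 1.0.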


    Section 2.X.8.Z, DST:  Distance Vector

    The DST instruction computes a distance vector from two specially-
    formatted operands.  The first operand should be of the form [NA, d^2,
    d^2, NA] and the second operand should be of the form [NA, 1/d, NA, 1/d],
    where NA values are not relevant to the calculation and d is a vector
    length.  If both vectors satisfy these conditions, the result vector will
    be of the form [1.0, d, d^2, 1/d].

    The exact behavior is specified in the following pseudo-code:

      tmp0 = VectorLoad(op0);
      tmp1 = VectorLoad(op1);
      result.x = 1.0;
      result.y = tmp0.y * tmp1.y;
      result.z = tmp0.z;
      result.w = tmp1.w;

    Given an arbitrary vector, d^2 can be obtained using the DP3 instruction
    (using the same vector for both operands) and 1/d can be obtained from d^2
    using the RSQ instruction.

    This distance vector is useful for per-vertex light attenuation
    calculations:  a DP3 operation using the distance vector and an
    attenuation constants vector as operands will yield the attenuation
    factor.

    DST supports only floating-point data type modifiers.
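
    The following C sketch walks through the operand layout described above
    for a known distance d.  The vec4 type and the dst, dp3, and
    attenuation_sum helpers are hypothetical names; the NA components are
    simply set to zero, and d is assumed to be greater than zero.

```c
typedef struct { float x, y, z, w; } vec4;

/* Mirrors the DST pseudocode. */
vec4 dst(vec4 op0, vec4 op1)
{
    vec4 r = { 1.0f, op0.y * op1.y, op0.z, op1.w };
    return r;
}

/* Mirrors the DP3 pseudocode, returning the scalar dot product. */
float dp3(vec4 a, vec4 b)
{
    return (a.x * b.x) + (a.y * b.y) + (a.z * b.z);
}

/* Builds [NA, d^2, d^2, NA] and [NA, 1/d, NA, 1/d], runs DST to obtain
 * [1, d, d^2, 1/d], then takes a DP3 with the constants k = (k0, k1, k2). */
float attenuation_sum(float d, vec4 k)
{
    vec4 op0 = { 0.0f, d * d, d * d, 0.0f };
    vec4 op1 = { 0.0f, 1.0f / d, 0.0f, 1.0f / d };
    return dp3(dst(op0, op1), k);
}
```

    The DP3 in this sketch evaluates k0 + k1*d + k2*d^2.  With d = 2 and
    k = (1, 0.5, 0.25), the sum is 3.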


    Section 2.X.8.Z, ELSE:  Start of If Test Else Block

    The ELSE instruction signifies the end of the "execute if true" portion of
    an IF/ELSE/ENDIF block and the beginning of the "execute if false"
    portion.

    If the condition evaluated at the IF statement was TRUE, when a program
    reaches the ELSE statement, it has completed the entire "execute if true"
    portion of the IF/ELSE/ENDIF block.  Execution will continue at the
    corresponding ENDIF instruction.

    If the condition evaluated at the IF statement was FALSE, program
    execution would skip over the entire "execute if true" portion of the
    IF/ELSE/ENDIF block, including the ELSE instruction.


    Section 2.X.8.Z, EMIT:  Emit Vertex

    The EMIT instruction emits a new vertex to be added to the current output
    primitive generated by a geometry program, and is only available to
    geometry programs.  See the NV_geometry_program4 specification for more
    details.


    Section 2.X.8.Z, ENDIF:  End of If Test Block

    The ENDIF instruction signifies the end of an IF/ELSE/ENDIF block.  It has
    no other effect on program execution.


    Section 2.X.8.Z, ENDPRIM:  End of Primitive

    A geometry program can emit multiple primitives in a single invocation.
    The ENDPRIM instruction is used in a geometry program to signify the end
    of the current primitive and the beginning of a new primitive of the same
    type.  It is only available to geometry programs.  See the
    NV_geometry_program4 specification for more details.


    Section 2.X.8.Z, ENDREP:  End of Repeat Block

    The ENDREP instruction specifies the end of a REP block.

    When used in conjunction with a REP instruction with a loop count,
    ENDREP decrements the loop counter.  If the decremented loop counter is
    greater than zero, ENDREP transfers control to the instruction immediately
    after the corresponding REP instruction.  If the loop counter is less than
    or equal to zero, execution continues at the instruction following the
    ENDREP instruction.  When used in conjunction with a REP instruction
    without loop count, ENDREP always transfers control to the instruction
    immediately after the REP instruction.

      if (REP instruction includes a loop count) {
        LoopCount--;
        if (LoopCount > 0) {
          continue execution at instruction following corresponding REP
            instruction;
        }
      } else {
        continue execution at instruction following corresponding REP
          instruction;
      }
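
    As an illustration of the counted case, the C sketch below counts how
    many times a loop body runs under the ENDREP rule above.  The
    rep_iterations function is hypothetical, and the sketch assumes that the
    body is entered with a loop count of at least one; how REP itself treats
    smaller counts is not modeled here.

```c
/* Returns the number of times the loop body executes when the block is
 * entered with the given loop count (assumed to be >= 1). */
int rep_iterations(int loop_count)
{
    int executed = 0;

    do {
        executed++;                 /* loop body                        */
    } while (--loop_count > 0);     /* ENDREP: decrement, then compare  */

    return executed;
}
```

    Under these assumptions, a loop count of 3 runs the body three times and
    a loop count of 1 runs it once.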


    Section 2.X.8.Z, EX2:  Exponential Base 2

    The EX2 instruction approximates 2 raised to the power of the scalar
    operand and replicates the approximation to all four components of the
    result vector.

      tmp = ScalarLoad(op0);
      result.x = Approx2ToX(tmp);
      result.y = Approx2ToX(tmp);
      result.z = Approx2ToX(tmp);
      result.w = Approx2ToX(tmp);

    EX2 supports only floating-point data type modifiers.


    Section 2.X.8.Z, FLR:  Floor

    The FLR instruction loads a single vector operand and performs a
    component-wise floor operation to generate a result vector.

      tmp = VectorLoad(op0);
      result.x = floor(tmp.x);
      result.y = floor(tmp.y);
      result.z = floor(tmp.z);
      result.w = floor(tmp.w);

    The floor operation returns the nearest integer less than or equal to the
    operand.  For example, floor(-1.7) = -2.0, floor(+1.0) = +1.0, and
    floor(+3.7) = +3.0.

    FLR supports all three data type modifiers.  The single operand is always
    treated as a floating-point value, but the result is written as a
    floating-point value, a signed integer, or an unsigned integer, as
    specified by the data type modifier.  If a value is not exactly
    representable using the data type of the result (e.g., an overflow or
    writing a negative value to an unsigned integer), the result is undefined.


    Section 2.X.8.Z, FRC:  Fraction

    The FRC instruction extracts the fractional portion of each component of
    the operand to generate a result vector.  The fractional portion of a
    component is defined as the result after subtracting off the floor of the
    component (see FLR), and is always in the range [0.0, 1.0).

    For negative values, the fractional portion is NOT the number written to
    the right of the decimal point -- the fractional portion of -1.7 is not
    0.7 -- it is 0.3.  0.3 is produced by subtracting the floor of -1.7 (-2.0)
    from -1.7.

      tmp = VectorLoad(op0);
      result.x = fraction(tmp.x);
      result.y = fraction(tmp.y);
      result.z = fraction(tmp.z);
      result.w = fraction(tmp.w);

    FRC supports only floating-point data type modifiers.


    Section 2.X.8.Z, I2F:  Integer to Float

    The I2F instruction converts the components of an integer vector operand
    to floating-point to produce a floating-point result vector.

      tmp = VectorLoad(op0);
      result.x = (float) tmp.x;
      result.y = (float) tmp.y;
      result.z = (float) tmp.z;
      result.w = (float) tmp.w;

    I2F supports only signed and unsigned integer data type modifiers.  The
    single operand is interpreted according to the data type modifier.  If no
    data type modifier is specified, the operand is treated as a signed
    integer vector.  The result is always written as a float.
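
    The effect of the data type modifier on I2F can be illustrated in C by
    converting the same 32-bit pattern both ways.  This sketch assumes 32-bit
    two's complement components; the i2f_signed and i2f_unsigned helpers are
    hypothetical names.

```c
#include <stdint.h>
#include <string.h>

/* Interprets the bits as a signed integer, then converts to float. */
float i2f_signed(uint32_t bits)
{
    int32_t value;
    memcpy(&value, &bits, sizeof value);  /* reinterpret without casting */
    return (float)value;
}

/* Interprets the bits as an unsigned integer, then converts to float. */
float i2f_unsigned(uint32_t bits)
{
    return (float)bits;
}
```

    The pattern 0xFFFFFFFF converts to -1.0 when treated as signed and to a
    value near 4.29e9 when treated as unsigned, while small non-negative
    values such as 7 convert identically.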


    Section 2.X.8.Z, IF:  Start of If Test Block

    The IF instruction performs a condition code test to determine what
    instructions inside an IF/ELSE/ENDIF block are executed.  If the test
    passes, execution continues at the instruction immediately following the
    IF instruction.  If the test fails, IF transfers control to the
    instruction immediately following the corresponding ELSE instruction (if
    present) or the ENDIF instruction (if no ELSE is present).

    Implementations may have a limited ability to nest IF blocks in any
    subroutine.  If the number of IF/ENDIF blocks nested inside each other is
    MAX_PROGRAM_IF_DEPTH_NV or higher, a program will fail to load.

      // Evaluate the condition.  If the condition is true, continue at the
      // next instruction.  Otherwise, continue at the instruction following
      // the corresponding ELSE (if present) or ENDIF.
      if (TestCC(cc.c***) || TestCC(cc.*c**) ||
          TestCC(cc.**c*) || TestCC(cc.***c)) {
        continue execution at the next instruction;
      } else if (IF block contains an ELSE statement) {
        continue execution at instruction following corresponding ELSE;
      } else {
        continue execution at instruction following corresponding ENDIF;
      }

    (Note:  Unlike the NV_fragment_program2 extension, there is no run-time
    limit on the maximum overall depth of IF/ENDIF nesting.  As long as each
    individual subroutine of the program obeys the static nesting limits,
    there will be no run-time errors in the program.  With the
    NV_fragment_program2 extension, a program could terminate abnormally if it
    called a subroutine inside a very deeply nested set of IF/ENDIF blocks and
    the called subroutine also contained deeply nested IF/ENDIF blocks.  Such
    an error could occur even if neither subroutine exceeded static limits.)


    Section 2.X.8.Z, KIL:  Kill Fragment

    The KIL instruction conditionally kills a fragment, and is only available
    to fragment programs.  See the NV_fragment_program4 specification for more
    details.


    Section 2.X.8.Z, LG2:  Logarithm Base 2

    The LG2 instruction approximates the base 2 logarithm of the scalar
    operand and replicates it to all four components of the result vector.

      tmp = ScalarLoad(op0);
      result.x = ApproxLog2(tmp);
      result.y = ApproxLog2(tmp);
      result.z = ApproxLog2(tmp);
      result.w = ApproxLog2(tmp);

    If the scalar operand is zero or negative, the result is undefined.

    LG2 supports only floating-point data type modifiers.


    Section 2.X.8.Z, LIT:  Compute Lighting Coefficients

    The LIT instruction accelerates lighting computations by computing
    lighting coefficients for ambient, diffuse, and specular light
    contributions.  The "x" component of the single operand is assumed to hold
    a diffuse dot product (n dot VP_pli, as in the vertex lighting equations
    in Section 2.13.1).  The "y" component of the operand is assumed to hold a
    specular dot product (n dot h_i).  The "w" component of the operand is
    assumed to hold the specular exponent of the material (s_rm), and is
    clamped to the range (-128, +128) exclusive.

    The "x" component of the result vector receives the value that should be
    multiplied by the ambient light/material product (always 1.0).  The "y"
    component of the result vector receives the value that should be
    multiplied by the diffuse light/material product (n dot VP_pli).  The "z"
    component of the result vector receives the value that should be
    multiplied by the specular light/material product (f_i * (n dot h_i) ^
    s_rm).  The "w" component of the result is the constant 1.0.

    Negative diffuse and specular dot products are clamped to 0.0, as is done
    in the standard per-vertex lighting operations.  In addition, if the
    diffuse dot product is zero or negative, the specular coefficient is
    forced to zero.

      tmp = VectorLoad(op0);
      if (tmp.x < 0) tmp.x = 0;
      if (tmp.y < 0) tmp.y = 0;
      if (tmp.w < -(128.0-epsilon)) tmp.w = -(128.0-epsilon);
      else if (tmp.w > 128.0-epsilon) tmp.w = 128.0-epsilon;
      result.x = 1.0;
      result.y = tmp.x;
      result.z = (tmp.x > 0) ? RoughApproxPower(tmp.y, tmp.w) : 0.0;
      result.w = 1.0;

    Since 0^0 is defined to be 1, RoughApproxPower(0.0, 0.0) will produce 1.0.

    LIT supports only floating-point data type modifiers.
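
    The LIT pseudocode translates almost directly into C.  In the sketch
    below, several pieces are assumptions made for illustration:  the vec4
    type, the LIT_EPSILON value, and the power_fn callback that stands in for
    RoughApproxPower, whose precision is not described in this section.  The
    int_power helper is only a stand-in that handles non-negative integer
    exponents.

```c
typedef struct { float x, y, z, w; } vec4;

/* Stand-in for RoughApproxPower, supplied by the caller. */
typedef float (*power_fn)(float base, float exponent);

#define LIT_EPSILON (1.0f / 256.0f)  /* assumed value, for illustration */

/* Example power function: repeated multiplication, ignoring any fractional
 * or negative part of the exponent.  Returns 1.0 for a zero exponent. */
float int_power(float base, float exponent)
{
    float r = 1.0f;
    for (int n = 0; n < (int)exponent; n++)
        r *= base;
    return r;
}

/* Mirrors the LIT pseudocode. */
vec4 lit(vec4 op, power_fn rough_power)
{
    const float limit = 128.0f - LIT_EPSILON;
    vec4 r;

    if (op.x < 0.0f) op.x = 0.0f;           /* clamp diffuse dot product  */
    if (op.y < 0.0f) op.y = 0.0f;           /* clamp specular dot product */
    if (op.w < -limit) op.w = -limit;       /* clamp specular exponent    */
    else if (op.w > limit) op.w = limit;

    r.x = 1.0f;
    r.y = op.x;
    r.z = (op.x > 0.0f) ? rough_power(op.y, op.w) : 0.0f;
    r.w = 1.0f;
    return r;
}
```

    With an operand of (0.5, 0.5, 0, 2) this yields (1, 0.5, 0.25, 1).  With
    a negative diffuse dot product, both the diffuse and specular
    coefficients are zero.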


    Section 2.X.8.Z, LRP:  Linear Interpolation

    The LRP instruction performs a component-wise linear interpolation between
    the second and third operands using the first operand as the blend factor.

      tmp0 = VectorLoad(op0);
      tmp1 = VectorLoad(op1);
      tmp2 = VectorLoad(op2);
      result.x = tmp0.x * tmp1.x + (1 - tmp0.x) * tmp2.x;
      result.y = tmp0.y * tmp1.y + (1 - tmp0.y) * tmp2.y;
      result.z = tmp0.z * tmp1.z + (1 - tmp0.z) * tmp2.z;
      result.w = tmp0.w * tmp1.w + (1 - tmp0.w) * tmp2.w;

    LRP supports only floating-point data type modifiers.
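
    Because the blend factor is the first operand and weights the second
    operand, the operand order differs from some other interpolation
    functions; for instance, GLSL's mix(x, y, a) weights its second argument
    by a.  The C sketch below mirrors the pseudocode; the vec4 type and the
    lrp1 and lrp helpers are hypothetical names.

```c
typedef struct { float x, y, z, w; } vec4;

/* One component of LRP: f weights a, and (1 - f) weights b. */
float lrp1(float f, float a, float b)
{
    return f * a + (1.0f - f) * b;
}

/* Mirrors the LRP pseudocode for all four components. */
vec4 lrp(vec4 f, vec4 a, vec4 b)
{
    vec4 r = {
        lrp1(f.x, a.x, b.x), lrp1(f.y, a.y, b.y),
        lrp1(f.z, a.z, b.z), lrp1(f.w, a.w, b.w)
    };
    return r;
}
```

    A blend factor of 0 selects the third operand, a factor of 1 selects the
    second operand, and a factor of 0.25 between the values 8 and 4 gives 5.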


    Section 2.X.8.Z, MAD:  Multiply and Add

    The MAD instruction performs a component-wise multiply of the first two
    operands, and then does a component-wise add of the product to the third
    operand to yield a result vector.

      tmp0 = VectorLoad(op0);
      tmp1 = VectorLoad(op1);
      tmp2 = VectorLoad(op2);
      result.x = tmp0.x * tmp1.x + tmp2.x;
      result.y = tmp0.y * tmp1.y + tmp2.y;
      result.z = tmp0.z * tmp1.z + tmp2.z;
      result.w = tmp0.w * tmp1.w + tmp2.w;

    The multiplication and addition operations in this instruction are subject
    to the same rules as described for the MUL and ADD instructions.

    MAD supports all three data type modifiers.


    Section 2.X.8.Z, MAX:  Maximum

    The MAX instruction computes component-wise maximums of the values in the
    two operands to yield a result vector.

      tmp0 = VectorLoad(op0);
      tmp1 = VectorLoad(op1);
      result.x = (tmp0.x > tmp1.x) ? tmp0.x : tmp1.x;
      result.y = (tmp0.y > tmp1.y) ? tmp0.y : tmp1.y;
      result.z = (tmp0.z > tmp1.z) ? tmp0.z : tmp1.z;
      result.w = (tmp0.w > tmp1.w) ? tmp0.w : tmp1.w;

    MAX supports all three data type modifiers.


    Section 2.X.8.Z, MIN:  Minimum

    The MIN instruction computes component-wise minimums of the values in the
    two operands to yield a result vector.

      tmp0 = VectorLoad(op0);
      tmp1 = VectorLoad(op1);
      result.x = (tmp0.x > tmp1.x) ? tmp1.x : tmp0.x;
      result.y = (tmp0.y > tmp1.y) ? tmp1.y : tmp0.y;
      result.z = (tmp0.z > tmp1.z) ? tmp1.z : tmp0.z;
      result.w = (tmp0.w > tmp1.w) ? tmp1.w : tmp0.w;

    MIN supports all three data type modifiers.


    Section 2.X.8.Z, MOD:  Modulus

    The MOD instruction performs a component-wise modulus operation on the first
    vector operand by the second scalar operand to produce a 4-component result
    vector.

      tmp0 = VectorLoad(op0);
      tmp1 = ScalarLoad(op1);
      result.x = tmp0.x % tmp1;
      result.y = tmp0.y % tmp1;
      result.z = tmp0.z % tmp1;
      result.w = tmp0.w % tmp1;

    MOD supports both signed and unsigned integer data type modifiers.  If no
    data type modifier is specified, both operands and the result are treated
    as signed integers.

    A result component is undefined if the corresponding component of the
    first operand is negative or if the second operand is less than or equal
    to zero.


    Section 2.X.8.Z, MOV:  Move

    The MOV instruction copies the value of the operand to yield a result
    vector.

      result = VectorLoad(op0);

    MOV supports all three data type modifiers.


    Section 2.X.8.Z, MUL:  Multiply

    The MUL instruction performs a component-wise multiply of the two operands
    to yield a result vector.

      tmp0 = VectorLoad(op0);
      tmp1 = VectorLoad(op1);
      result.x = tmp0.x * tmp1.x;
      result.y = tmp0.y * tmp1.y;
      result.z = tmp0.z * tmp1.z;
      result.w = tmp0.w * tmp1.w;

    MUL supports all three data type modifiers.  The MUL instruction
    additionally supports three special modifiers.

    The "S24" and "U24" modifiers specify "fast" signed or unsigned integer
    multiplies of 24-bit quantities, respectively.  The results of such
    multiplies are undefined if either operand is outside the range
    [-2^23,+2^23-1] for S24 or [0,2^24-1] for U24.  If "S24" or "U24" is
    specified, the data type is implied and normal data type modifiers may not
    be provided.

    The "HI" modifier specifies a 32-bit integer multiply that returns the 32
    most significant bits of the 64-bit product.  Integer multiplies without
    the "HI" modifier normally return the least significant bits of the
    product.  If "HI" is specified, one of the "S" or "U" integer data type
    modifiers must also be specified.

    Note that if condition code updates are performed on integer multiplies,
    the overflow or carry flags are always cleared, even if the product
    overflowed.  If it is necessary to determine if the results of an integer
    multiply overflowed, the MUL.HI instruction may be used.
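
    The overflow test described above can be sketched in C for the unsigned
    case.  This is only a model of the arithmetic, using a 64-bit intermediate
    product; the function names are hypothetical.

```c
#include <stdint.h>

/* Low 32 bits of the product, as an unsigned MUL without "HI" returns. */
static uint32_t mul_lo_u32(uint32_t a, uint32_t b)
{
    return (uint32_t)((uint64_t)a * b);
}

/* High 32 bits of the 64-bit product, as an unsigned MUL with "HI" returns. */
static uint32_t mul_hi_u32(uint32_t a, uint32_t b)
{
    return (uint32_t)(((uint64_t)a * b) >> 32);
}

/* An unsigned product overflowed 32 bits exactly when its high half is
 * nonzero. */
static int mul_overflowed_u32(uint32_t a, uint32_t b)
{
    return mul_hi_u32(a, b) != 0u;
}
```

    For example, 0x80000000 * 2 has a low half of 0 and a high half of 1, so
    the overflow is visible only in the "HI" result.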


    Section 2.X.8.Z, NOT:  Bitwise Not

    The NOT instruction performs a component-wise bitwise NOT operation on the
    source vector to produce a result vector.

      tmp = VectorLoad(op0);
      result.x = ~tmp.x;
      result.y = ~tmp.y;
      result.z = ~tmp.z;
      result.w = ~tmp.w;

    NOT supports only integer data type modifiers.  If no type modifier is
    specified, the operand and the result are treated as signed integers.


    Section 2.X.8.Z, NRM:  Normalize 3-Component Vector

    The NRM instruction normalizes the vector given by the x, y, and z
    components of the vector operand to produce the x, y, and z components of
    the result vector.  The w component of the result is undefined.

      tmp = VectorLoad(op0);
      scale = ApproxRSQRT(tmp.x * tmp.x + tmp.y * tmp.y + tmp.z * tmp.z);
      result.x = tmp.x * scale;
      result.y = tmp.y * scale;
      result.z = tmp.z * scale;
      result.w = undefined;

    NRM supports only floating-point data type modifiers.


    Section 2.X.8.Z, OR:  Bitwise Or

    The OR instruction performs a bitwise OR operation on the components of
    the two source vectors to yield a result vector.

      tmp0 = VectorLoad(op0);
      tmp1 = VectorLoad(op1);
      result.x = tmp0.x | tmp1.x;
      result.y = tmp0.y | tmp1.y;
      result.z = tmp0.z | tmp1.z;
      result.w = tmp0.w | tmp1.w;

    OR supports only integer data type modifiers.  If no type modifier is
    specified, both operands and the result are treated as signed integers.


    Section 2.X.8.Z, PK2H:  Pack Two 16-bit Floats

    The PK2H instruction converts the "x" and "y" components of the single
    floating-point vector operand into 16-bit floating-point format, packs the
    bit representation of these two floats into a 32-bit unsigned integer, and
    replicates that value to all four components of the result vector.  The
    PK2H instruction can be reversed by the UP2H instruction below.

      tmp0 = VectorLoad(op0);
      /* result obtained by combining raw bits of tmp0.x, tmp0.y */
      result.x = RawBits(tmp0.x) | (RawBits(tmp0.y) << 16);
      result.y = RawBits(tmp0.x) | (RawBits(tmp0.y) << 16);
      result.z = RawBits(tmp0.x) | (RawBits(tmp0.y) << 16);
      result.w = RawBits(tmp0.x) | (RawBits(tmp0.y) << 16);

    PK2H supports all three data type modifiers.  The single operand is always
    treated as a floating-point value, but the result is written as a
    floating-point value, a signed integer, or an unsigned integer, as
    specified by the data type modifier.  For integer results, the bits can be
    interpreted as described above.  For floating-point result variables, the
    packed results do not constitute a meaningful floating-point variable and
    should only be used to feed future unpack instructions.

    A program will fail to load if it contains a PK2H instruction that writes
    its results to a variable declared as "SHORT".


    Section 2.X.8.Z, PK2US:  Pack Two Floats as Unsigned 16-bit

    The PK2US instruction converts the "x" and "y" components of the single
    floating-point vector operand into a packed pair of 16-bit unsigned
    scalars.  The scalars are represented in a bit pattern where all '0' bits
    correspond to 0.0 and all '1' bits correspond to 1.0.  The bit
    representations of the two converted components are packed into a 32-bit
    unsigned integer, and that value is replicated to all four components of
    the result vector.  The PK2US instruction can be reversed by the UP2US
    instruction below.

      tmp0 = VectorLoad(op0);
      if (tmp0.x < 0.0) tmp0.x = 0.0;
      if (tmp0.x > 1.0) tmp0.x = 1.0;
      if (tmp0.y < 0.0) tmp0.y = 0.0;
      if (tmp0.y > 1.0) tmp0.y = 1.0;
      us.x = round(65535.0 * tmp0.x);  /* us is a ushort vector */
      us.y = round(65535.0 * tmp0.y);
      /* result obtained by combining raw bits of us. */
      result.x = ((us.x) | (us.y << 16));
      result.y = ((us.x) | (us.y << 16));
      result.z = ((us.x) | (us.y << 16));
      result.w = ((us.x) | (us.y << 16));

    PK2US supports all three data type modifiers.  The single operand is
    always treated as a floating-point value, but the result is written as a
    floating-point value, a signed integer, or an unsigned integer, as
    specified by the data type modifier.  For integer result variables, the
    bits can be interpreted as described above.  For floating-point result
    variables, the packed results do not constitute a meaningful
    floating-point variable and should only be used to feed future unpack
    instructions.

    A program will fail to load if it contains a PK2US instruction that writes
    its results to a variable declared as "SHORT".
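
    The PK2US packing can be sketched in C as follows.  This sketch assumes
    that round() in the pseudocode uses the round-to-nearest-even rule
    described for the ROUND instruction, and it does not handle NaN inputs.
    All function names are hypothetical.

```c
#include <stdint.h>

/* Round a non-negative value to the nearest integer, ties to even.
 * Assumption: this matches the round() used by the pseudocode. */
static uint32_t round_half_even_nonneg(float v)
{
    uint32_t whole = (uint32_t)v;      /* truncation equals floor for v >= 0 */
    float frac = v - (float)whole;
    if (frac > 0.5f || (frac == 0.5f && (whole & 1u)))
        whole += 1u;
    return whole;
}

static float clamp01(float v)
{
    if (v < 0.0f) return 0.0f;
    if (v > 1.0f) return 1.0f;
    return v;
}

/* The 32-bit word that PK2US replicates to all four result components:
 * x occupies the low 16 bits and y the high 16 bits. */
static uint32_t pk2us_word(float x, float y)
{
    uint32_t us_x = round_half_even_nonneg(65535.0f * clamp01(x));
    uint32_t us_y = round_half_even_nonneg(65535.0f * clamp01(y));
    return us_x | (us_y << 16);
}
```

    Out-of-range inputs are clamped before conversion, so (-5.0, 2.0) packs
    to the same word as (0.0, 1.0).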


    Section 2.X.8.Z, PK4B:  Pack Four Floats as Signed 8-bit

    The PK4B instruction converts the four components of the single
    floating-point vector operand into 8-bit signed quantities.  The signed
    quantities are represented in a bit pattern where all '0' bits correspond
    to -128/127 and all '1' bits correspond to +127/127.  The bit
    representations of the four converted components are packed into a 32-bit
    unsigned integer, and that value is replicated to all four components of
    the result vector.  The PK4B instruction can be reversed by the UP4B
    instruction below.

      tmp0 = VectorLoad(op0);
      if (tmp0.x < -128/127) tmp0.x = -128/127;
      if (tmp0.y < -128/127) tmp0.y = -128/127;
      if (tmp0.z < -128/127) tmp0.z = -128/127;
      if (tmp0.w < -128/127) tmp0.w = -128/127;
      if (tmp0.x > +127/127) tmp0.x = +127/127;
      if (tmp0.y > +127/127) tmp0.y = +127/127;
      if (tmp0.z > +127/127) tmp0.z = +127/127;
      if (tmp0.w > +127/127) tmp0.w = +127/127;
      ub.x = round(127.0 * tmp0.x + 128.0);  /* ub is a ubyte vector */
      ub.y = round(127.0 * tmp0.y + 128.0);
      ub.z = round(127.0 * tmp0.z + 128.0);
      ub.w = round(127.0 * tmp0.w + 128.0);
      /* result obtained by combining raw bits of ub. */
      result.x = ((ub.x) | (ub.y << 8) | (ub.z << 16) | (ub.w << 24));
      result.y = ((ub.x) | (ub.y << 8) | (ub.z << 16) | (ub.w << 24));
      result.z = ((ub.x) | (ub.y << 8) | (ub.z << 16) | (ub.w << 24));
      result.w = ((ub.x) | (ub.y << 8) | (ub.z << 16) | (ub.w << 24));

    PK4B supports all three data type modifiers.  The single operand is always
    treated as a floating-point value, but the result is written as a
    floating-point value, a signed integer, or an unsigned integer, as
    specified by the data type modifier.  For integer result variables, the
    bits can be interpreted as described above.  For floating-point result
    variables, the packed results do not constitute a meaningful
    floating-point variable and should only be used to feed future unpack
    instructions.  A program will fail to load if it contains a PK4B
    instruction that writes its results to a variable declared as "SHORT".
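
    The PK4B packing can be sketched in C as follows.  This sketch assumes
    that round() in the pseudocode rounds to nearest with ties to even, and
    it does not handle NaN inputs.  All function names are hypothetical.

```c
#include <stdint.h>

/* Round a non-negative value to the nearest integer, ties to even.
 * Assumption: this matches the round() used by the pseudocode. */
static uint32_t round_half_even_nonneg(float v)
{
    uint32_t whole = (uint32_t)v;      /* truncation equals floor for v >= 0 */
    float frac = v - (float)whole;
    if (frac > 0.5f || (frac == 0.5f && (whole & 1u)))
        whole += 1u;
    return whole;
}

/* Convert one component to its biased 8-bit encoding. */
static uint32_t pk4b_byte(float v)
{
    const float lo = -128.0f / 127.0f;
    float biased;
    if (v < lo)   v = lo;
    if (v > 1.0f) v = 1.0f;
    biased = 127.0f * v + 128.0f;
    if (biased < 0.0f) biased = 0.0f;  /* guard against rounding error at lo */
    return round_half_even_nonneg(biased);
}

/* The 32-bit word that PK4B replicates to all four result components. */
static uint32_t pk4b_word(float x, float y, float z, float w)
{
    return pk4b_byte(x) | (pk4b_byte(y) << 8) |
           (pk4b_byte(z) << 16) | (pk4b_byte(w) << 24);
}
```

    Under this encoding, 0.0 maps to the byte 0x80, +1.0 to 0xFF, and -1.0 to
    0x01; only the clamped minimum of -128/127 maps to 0x00.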


    Section 2.X.8.Z, PK4UB:  Pack Four Floats as Unsigned 8-bit

    The PK4UB instruction converts the four components of the single
    floating-point vector operand into a packed grouping of 8-bit unsigned
    scalars.  The scalars are represented in a bit pattern where all '0' bits
    correspond to 0.0 and all '1' bits correspond to 1.0.  The bit
    representations of the four converted components are packed into a 32-bit
    unsigned integer, and that value is replicated to all four components of
    the result vector.  The PK4UB instruction can be reversed by the UP4UB
    instruction below.

      tmp0 = VectorLoad(op0);
      if (tmp0.x < 0.0) tmp0.x = 0.0;
      if (tmp0.x > 1.0) tmp0.x = 1.0;
      if (tmp0.y < 0.0) tmp0.y = 0.0;
      if (tmp0.y > 1.0) tmp0.y = 1.0;
      if (tmp0.z < 0.0) tmp0.z = 0.0;
      if (tmp0.z > 1.0) tmp0.z = 1.0;
      if (tmp0.w < 0.0) tmp0.w = 0.0;
      if (tmp0.w > 1.0) tmp0.w = 1.0;
      ub.x = round(255.0 * tmp0.x);  /* ub is a ubyte vector */
      ub.y = round(255.0 * tmp0.y);
      ub.z = round(255.0 * tmp0.z);
      ub.w = round(255.0 * tmp0.w);
      /* result obtained by combining raw bits of ub. */
      result.x = ((ub.x) | (ub.y << 8) | (ub.z << 16) | (ub.w << 24));
      result.y = ((ub.x) | (ub.y << 8) | (ub.z << 16) | (ub.w << 24));
      result.z = ((ub.x) | (ub.y << 8) | (ub.z << 16) | (ub.w << 24));
      result.w = ((ub.x) | (ub.y << 8) | (ub.z << 16) | (ub.w << 24));

    PK4UB supports all three data type modifiers.  The single operand is
    always treated as a floating-point value, but the result is written as a
    floating-point value, a signed integer, or an unsigned integer, as
    specified by the data type modifier.  For integer result variables, the
    bits can be interpreted as described above.  For floating-point result
    variables, the packed results do not constitute a meaningful
    floating-point variable and should only be used to feed future unpack
    instructions.

    A program will fail to load if it contains a PK4UB instruction that writes
    its results to a variable declared as "SHORT".


    Section 2.X.8.Z, POW:  Exponentiate

    The POW instruction approximates the value of the first scalar operand
    raised to the power of the second scalar operand and replicates it to all
    four components of the result vector.

      tmp0 = ScalarLoad(op0);
      tmp1 = ScalarLoad(op1);
      result.x = ApproxPower(tmp0, tmp1);
      result.y = ApproxPower(tmp0, tmp1);
      result.z = ApproxPower(tmp0, tmp1);
      result.w = ApproxPower(tmp0, tmp1);

    The exponentiation approximation function may be implemented using the
    base 2 exponentiation and logarithm approximation operations in the EX2
    and LG2 instructions.  In particular,

      ApproxPower(a,b) = ApproxExp2(b * ApproxLog2(a)).

    Note that a logarithm may be involved even for cases where the exponent is
    an integer.  This means that it may not be possible to exponentiate
    correctly with a negative base.  In contrast, it is possible in a
    "normal" mathematical formulation to raise negative numbers to integral
    powers (e.g., (-3)^2 == 9, and (-0.5)^-2 == 4).

    POW supports only floating-point data type modifiers.


    Section 2.X.8.Z, RCC:  Reciprocal (Clamped)

    The RCC instruction approximates the reciprocal of the scalar operand,
    clamps the result to one of two ranges, and replicates the clamped result
    to all four components of the result vector.

    If the approximated reciprocal is greater than 0.0, the result is clamped
    to the range [2^-64, 2^+64].  If the approximate reciprocal is not greater
    than zero, the result is clamped to the range [-2^+64, -2^-64].

      tmp = ScalarLoad(op0);
      result.x = ClampApproxReciprocal(tmp);
      result.y = ClampApproxReciprocal(tmp);
      result.z = ClampApproxReciprocal(tmp);
      result.w = ClampApproxReciprocal(tmp);

    RCC supports only floating-point data type modifiers.
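
    The clamping rule can be sketched in C as follows.  The sketch substitutes
    an exact IEEE 754 divide for the hardware reciprocal approximation, which
    is an assumption made for illustration only; the function name is
    hypothetical.

```c
/* Hypothetical model of ClampApproxReciprocal.  A positive reciprocal is
 * clamped to [2^-64, 2^+64]; any other reciprocal is clamped to
 * [-2^+64, -2^-64].  NaN inputs are not handled. */
static float clamp_reciprocal(float v)
{
    const float big  = 0x1p64f;     /* 2^+64 */
    const float tiny = 0x1p-64f;    /* 2^-64 */
    float r = 1.0f / v;

    if (r > 0.0f) {
        if (r < tiny) r = tiny;
        if (r > big)  r = big;
    } else {
        if (r < -big)  r = -big;
        if (r > -tiny) r = -tiny;
    }
    return r;
}
```

    Reciprocals already inside the permitted range pass through unchanged;
    very large or very small magnitudes are pulled back to the nearest bound
    of the range selected by the sign test.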


    Section 2.X.8.Z, RCP:  Reciprocal

    The RCP instruction approximates the reciprocal of the scalar operand and
    replicates it to all four components of the result vector.

      tmp = ScalarLoad(op0);
      result.x = ApproxReciprocal(tmp);
      result.y = ApproxReciprocal(tmp);
      result.z = ApproxReciprocal(tmp);
      result.w = ApproxReciprocal(tmp);

    RCP supports only floating-point data type modifiers.


    Section 2.X.8.Z, REP:  Start of Repeat Block

    The REP instruction begins a REP/ENDREP block.  The REP instruction
    supports an optional operand whose x component specifies the initial value
    for the loop count.  The loop count indicates the number of times the
    instructions between the REP and corresponding ENDREP instruction will be
    executed.  If the initial value of the loop count is not positive, the
    entire block is skipped and execution continues at the instruction
    following the corresponding ENDREP instruction.  If the loop count is
    specified as a floating-point value, it is converted to the largest
    integer less than or equal to the specified value (i.e., taking its
    floor).

    If no operand is provided to REP, the loop count is ignored and the
    corresponding ENDREP instruction unconditionally transfers control to the
    instruction immediately following the REP instruction.  The only way to
    exit such a loop is with the BRK instruction.  To prevent obvious infinite
    loops, a program that includes a REP/ENDREP block with no loop count will
    fail to compile unless it contains either a BRK instruction at the current
    nesting level or a RET instruction at any nesting level.

    Implementations may have a limited ability to nest REP/ENDREP blocks.  If
    the number of REP/ENDREP blocks nested inside each other is
    MAX_PROGRAM_LOOP_DEPTH_NV or higher, a program will fail to compile.

      // Set up loop information for the new nesting level.
      tmp = VectorLoad(op0);
      LoopCount = floor(tmp.x);
      if (LoopCount <= 0) {
        continue execution at the corresponding ENDREP;
      }

    REP supports all three data type modifiers.  The single operand is
    interpreted according to the data type modifier.
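
    The loop count rules for a floating-point operand can be sketched in C as
    follows.  The sketch ignores BRK and RET, assumes the count fits in an
    int, and assumes that the corresponding ENDREP decrements the loop count
    and repeats the block while the count remains positive.  The function
    name is hypothetical.

```c
/* Hypothetical helper: number of times the body of a REP/ENDREP block runs
 * for a given floating-point loop count. */
static int rep_iterations(float count)
{
    int loop_count = (int)count;        /* truncates toward zero */
    int executed = 0;

    if ((float)loop_count > count)
        loop_count -= 1;                /* adjust truncation down to floor */

    while (loop_count > 0) {            /* REP: skip the block if count <= 0 */
        executed += 1;                  /* stands in for the loop body */
        loop_count -= 1;                /* assumed ENDREP behavior */
    }
    return executed;
}
```

    A count of 3.7 runs the body three times, while counts of 0.9 and -2.5
    both skip the block entirely.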

    (Note:  Unlike the NV_fragment_program2 extension, REP blocks in this
    extension support fully general looping; the specified loop count can be
    computed in the program itself.  Additionally, there is no run-time limit
    on the maximum overall depth of REP/ENDREP nesting.  As long as each
    individual subroutine of the program obeys the static nesting limits,
    there will be no run-time errors in the program.  With the
    NV_fragment_program2 extension, a program could terminate abnormally if it
    called a subroutine inside a deeply nested set of REP/ENDREP blocks and
    the called subroutine also contained deeply nested REP/ENDREP blocks.
    Such an error could occur even if neither subroutine exceeded static
    limits.)


    Section 2.X.8.Z, RET:  Subroutine Return

    The RET instruction conditionally returns from a subroutine initiated by a
    CAL instruction by popping an instruction reference off the top of the
    call stack and transferring control to the referenced instruction.  The
    following pseudocode describes the operation of the instruction:

      if (TestCC(cc.c***) || TestCC(cc.*c**) ||
          TestCC(cc.**c*) || TestCC(cc.***c)) {
        if (callStackDepth <= 0) {
          // terminate program
        } else {
          callStackDepth--;
          instruction = callStack[callStackDepth];
        }

        // continue execution at <instruction>
      } else {
        // do nothing
      }

    In the pseudocode, <callStackDepth> is the depth of the call stack,
    <callStack> is an array holding the call stack, and <instruction> is a
    reference to an instruction previously pushed onto the call stack.

    If the call stack is empty when RET executes, the program terminates
    normally.


    Section 2.X.8.Z, RFL:  Reflection Vector

    The RFL instruction computes the reflection of the second vector operand
    (the "direction" vector) about the vector specified by the first vector
    operand (the "axis" vector).  Both operands are treated as 3D vectors (the
    w components are ignored).  The result vector is another 3D vector (the
    "reflected direction" vector).  The length of the result vector, ignoring
    rounding errors, should equal that of the second operand.

      axis = VectorLoad(op0);
      direction = VectorLoad(op1);
      tmp.w = (axis.x * axis.x + axis.y * axis.y + axis.z * axis.z);
      tmp.x = (axis.x * direction.x + axis.y * direction.y +
               axis.z * direction.z);
      tmp.x = 2.0 * tmp.x;
      tmp.x = tmp.x / tmp.w;
      result.x = tmp.x * axis.x - direction.x;
      result.y = tmp.x * axis.y - direction.y;
      result.z = tmp.x * axis.z - direction.z;
      result.w = undefined;

    RFL supports only floating-point data type modifiers.
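
    The RFL computation can be sketched in C as follows.  The vec3 type and
    the function names are hypothetical, and exact division is used in place
    of whatever approximation an implementation may apply.

```c
typedef struct { float x, y, z; } vec3;     /* hypothetical 3D vector type */

static vec3 vec3_of(float x, float y, float z)
{
    vec3 v;
    v.x = x; v.y = y; v.z = z;
    return v;
}

/* Reflect "direction" about "axis", following the RFL pseudocode. */
static vec3 rfl(vec3 axis, vec3 direction)
{
    float axis_sq = axis.x * axis.x + axis.y * axis.y + axis.z * axis.z;
    float dot = axis.x * direction.x + axis.y * direction.y +
                axis.z * direction.z;
    float k = 2.0f * dot / axis_sq;

    return vec3_of(k * axis.x - direction.x,
                   k * axis.y - direction.y,
                   k * axis.z - direction.z);
}
```

    Because the scale factor divides by the squared length of the axis, the
    axis does not need to be normalized: reflecting (1,0,1) about (0,0,2)
    gives (-1,0,1), the same result as reflecting about (0,0,1).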


    Section 2.X.8.Z, ROUND:  Round to Nearest Integer

    The ROUND instruction loads a single vector operand and performs a
    component-wise round operation to generate a result vector.

      tmp = VectorLoad(op0);
      result.x = round(tmp.x);
      result.y = round(tmp.y);
      result.z = round(tmp.z);
      result.w = round(tmp.w);

    The round operation returns the nearest integer to the operand.  If the
    fractional portion of the operand is 0.5, round() selects the nearest even
    integer.  For example, round(-1.7) = -2.0, round(+1.0) = +1.0, and
    round(+3.7) = +4.0.

    ROUND supports all three data type modifiers.  The single operand is
    always treated as a floating-point value, but the result is written as a
    floating-point value, a signed integer, or an unsigned integer, as
    specified by the data type modifier.  If a value is not exactly
    representable using the data type of the result (e.g., an overflow or
    writing a negative value to an unsigned integer), the result is undefined.
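
    The round-to-nearest-even rule can be sketched in C as follows for a
    floating-point result.  The function name is hypothetical, and NaN inputs
    are not handled.

```c
#include <stdint.h>

/* Round to the nearest integer, choosing the even neighbor on an exact
 * tie.  Floats with magnitude of at least 2^23 are already integers. */
static float round_half_even(float v)
{
    int32_t whole;
    float floor_v;
    float frac;

    if (v >= 8388608.0f || v <= -8388608.0f)
        return v;

    whole = (int32_t)v;                 /* truncates toward zero */
    floor_v = (float)whole;
    if (floor_v > v) {                  /* adjust truncation down to floor */
        whole -= 1;
        floor_v -= 1.0f;
    }

    frac = v - floor_v;                 /* in [0, 1) */
    if (frac > 0.5f || (frac == 0.5f && (whole & 1)))
        whole += 1;
    return (float)whole;
}
```

    Under this rule, ties such as 2.5 and 3.5 round to 2.0 and 4.0
    respectively, and -2.5 rounds to -2.0.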


    Section 2.X.8.Z, RSQ:  Reciprocal Square Root

    The RSQ instruction approximates the reciprocal of the square root of the
    scalar operand and replicates it to all four components of the result
    vector.

      tmp = ScalarLoad(op0);
      result.x = ApproxRSQRT(tmp);
      result.y = ApproxRSQRT(tmp);
      result.z = ApproxRSQRT(tmp);
      result.w = ApproxRSQRT(tmp);

    If the operand is less than or equal to zero, the results of the
    instruction are undefined.

    RSQ supports only floating-point data type modifiers.

    Note that this instruction differs from the RSQ instruction in
    ARB_vertex_program in that it does not implicitly take the absolute value
    of its operand.  The |abs| operator can be used to achieve equivalent
    semantics.


    Section 2.X.8.Z, SAD:  Sum of Absolute Differences

    The SAD instruction performs a component-wise difference of the first two
    integer operands (subtracting the second from the first), and then does a
    component-wise add of the absolute value of the difference to the third
    unsigned integer operand to yield an unsigned integer result vector.

      tmp0 = VectorLoad(op0);
      tmp1 = VectorLoad(op1);
      tmp2 = VectorLoad(op2);
      result.x = abs(tmp0.x - tmp1.x) + tmp2.x;
      result.y = abs(tmp0.y - tmp1.y) + tmp2.y;
      result.z = abs(tmp0.z - tmp1.z) + tmp2.z;
      result.w = abs(tmp0.w - tmp1.w) + tmp2.w;

    SAD supports signed and unsigned integer data type modifiers.  The first
    two operands are interpreted according to the data type modifier.  The
    third operand and the result are always unsigned integers.
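
    One component of a signed SAD can be sketched in C as follows.  The
    sketch assumes that the difference is formed without overflow (modeled
    here with a 64-bit intermediate) and that the final add wraps modulo
    2^32; neither detail is stated above.  The function name is hypothetical.

```c
#include <stdint.h>

/* Hypothetical model of one component of SAD with signed first and second
 * operands; the accumulator and the result are unsigned. */
static uint32_t sad_s32(int32_t a, int32_t b, uint32_t acc)
{
    int64_t diff = (int64_t)a - (int64_t)b;
    uint64_t magnitude = (uint64_t)(diff < 0 ? -diff : diff);
    return (uint32_t)(magnitude + acc);
}
```

    Since only the absolute difference is accumulated, swapping the first two
    operands does not change the result.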


    Section 2.X.8.Z, SCS:  Sine/Cosine without Reduction

    The SCS instruction approximates the trigonometric sine and cosine of the
    angle specified by the scalar operand and places the cosine in the x
    component and the sine in the y component of the result vector.  The z and
    w components of the result vector are undefined.  The angle is specified
    in radians and must be in the range [-PI,PI].

      tmp = ScalarLoad(op0);
      result.x = ApproxCosine(tmp);
      result.y = ApproxSine(tmp);
      result.z = undefined;
      result.w = undefined;

    If the scalar operand is not in the range [-PI,PI], the result vector is
    undefined.

    SCS supports only floating-point data type modifiers.


    Section 2.X.8.Z, SEQ:  Set on Equal

    The SEQ instruction performs a component-wise comparison of the two
    operands.  Each component of the result vector returns a TRUE value
    (described below) if the corresponding component of the first operand is
    equal to that of the second, and a FALSE value otherwise.

      tmp0 = VectorLoad(op0);
      tmp1 = VectorLoad(op1);
      result.x = (tmp0.x == tmp1.x) ? TRUE : FALSE;
      result.y = (tmp0.y == tmp1.y) ? TRUE : FALSE;
      result.z = (tmp0.z == tmp1.z) ? TRUE : FALSE;
      result.w = (tmp0.w == tmp1.w) ? TRUE : FALSE;

    SEQ supports all data type modifiers.  For floating-point data types, the
    TRUE value is 1.0 and the FALSE value is 0.0.  For signed integer data
    types, the TRUE value is -1 and the FALSE value is 0.  For unsigned
    integer data types, the TRUE value is the maximum integer value (all bits
    are ones) and the FALSE value is zero.
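
    The TRUE and FALSE encodings for the three data types can be sketched in
    C as follows.  The function names are hypothetical, and the mask-based
    select is shown only as one possible use of the all-ones integer TRUE
    value.

```c
#include <stdint.h>

/* One component of SEQ for each data type modifier. */
static float seq_f(float a, float b)
{
    return (a == b) ? 1.0f : 0.0f;
}

static int32_t seq_s(int32_t a, int32_t b)
{
    return (a == b) ? -1 : 0;
}

static uint32_t seq_u(uint32_t a, uint32_t b)
{
    return (a == b) ? 0xFFFFFFFFu : 0u;
}

/* Because the integer TRUE value has every bit set, it can serve directly
 * as a bit mask for choosing between two values. */
static uint32_t select_u(uint32_t mask, uint32_t if_true, uint32_t if_false)
{
    return (mask & if_true) | (~mask & if_false);
}
```

    The same encodings apply to the other "Set on" comparison instructions,
    which differ only in the comparison performed.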


    Section 2.X.8.Z, SFL:  Set on False

    The SFL instruction is a degenerate case of the other "Set on"
    instructions that sets all components of the result vector to a FALSE
    value (described below).

      result.x = FALSE;
      result.y = FALSE;
      result.z = FALSE;
      result.w = FALSE;

    SFL supports all data type modifiers.  For floating-point data types, the
    FALSE value is 0.0.  For signed and unsigned integer data types, the FALSE
    value is zero.


    Section 2.X.8.Z, SGE:  Set on Greater Than or Equal

    The SGE instruction performs a component-wise comparison of the two
    operands.  Each component of the result vector returns a TRUE value
    (described below) if the corresponding component of the first operand is
    greater than or equal to that of the second, and a FALSE value otherwise.

      tmp0 = VectorLoad(op0);
      tmp1 = VectorLoad(op1);
      result.x = (tmp0.x >= tmp1.x) ? TRUE : FALSE;
      result.y = (tmp0.y >= tmp1.y) ? TRUE : FALSE;
      result.z = (tmp0.z >= tmp1.z) ? TRUE : FALSE;
      result.w = (tmp0.w >= tmp1.w) ? TRUE : FALSE;

    SGE supports all data type modifiers.  For floating-point data types, the
    TRUE value is 1.0 and the FALSE value is 0.0.  For signed integer data
    types, the TRUE value is -1 and the FALSE value is 0.  For unsigned
    integer data types, the TRUE value is the maximum integer value (all bits
    are ones) and the FALSE value is zero.


    Section 2.X.8.Z, SGT:  Set on Greater Than

    The SGT instruction performs a component-wise comparison of the two
    operands.  Each component of the result vector returns a TRUE value
    (described below) if the corresponding component of the first operand is
    greater than that of the second, and a FALSE value otherwise.

      tmp0 = VectorLoad(op0);
      tmp1 = VectorLoad(op1);
      result.x = (tmp0.x > tmp1.x) ? TRUE : FALSE;
      result.y = (tmp0.y > tmp1.y) ? TRUE : FALSE;
      result.z = (tmp0.z > tmp1.z) ? TRUE : FALSE;
      result.w = (tmp0.w > tmp1.w) ? TRUE : FALSE;

    SGT supports all data type modifiers.  For floating-point data types, the
    TRUE value is 1.0 and the FALSE value is 0.0.  For signed integer data
    types, the TRUE value is -1 and the FALSE value is 0.  For unsigned
    integer data types, the TRUE value is the maximum integer value (all bits
    are ones) and the FALSE value is zero.


    Section 2.X.8.Z, SHL:  Shift Left

    The SHL instruction performs a component-wise left shift of the bits of
    the first operand by the value of the second scalar operand to produce a
    result vector.  The bits vacated during the shift operation are filled
    with zeroes.

      tmp0 = VectorLoad(op0);
      tmp1 = ScalarLoad(op1);
      result.x = tmp0.x << tmp1;
      result.y = tmp0.y << tmp1;
      result.z = tmp0.z << tmp1;
      result.w = tmp0.w << tmp1;

    The results of a shift operation ("<<") are undefined if the value of the
    second operand is negative, or greater than or equal to the number of bits
    in the first operand.

    SHL supports both signed and unsigned integer data type modifiers.  If no
    modifier is provided, the operands and the result are treated as signed
    integers.


    Section 2.X.8.Z, SHR:  Shift Right

    The SHR instruction performs a component-wise right shift of the bits of
    the first operand by the value of the second scalar operand to produce a
    result vector.  The bits vacated during the shift operation are filled
    with zeros if the operand is non-negative and ones otherwise.

      tmp0 = VectorLoad(op0);
      tmp1 = ScalarLoad(op1);
      result.x = tmp0.x >> tmp1;
      result.y = tmp0.y >> tmp1;
      result.z = tmp0.z >> tmp1;
      result.w = tmp0.w >> tmp1;

    The results of a shift operation (">>") are undefined if the value of the
    second operand is negative, or greater than or equal to the number of bits
    in the first operand.

    SHR supports both signed and unsigned integer data type modifiers.  If no
    modifiers are provided, the operands and the result are treated as signed
    integers.
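
    The difference between the signed and unsigned forms of SHR can be
    sketched in C as follows.  Right shifts of negative signed values are
    implementation-defined in C, so the signed case is modeled on the
    unsigned bit pattern.  The function names are hypothetical, and shift
    counts must be in the range [0,31].

```c
#include <stdint.h>

/* Unsigned shift right: vacated high bits are filled with zeros. */
static uint32_t shr_u32(uint32_t v, unsigned n)
{
    return v >> n;
}

/* Signed shift right, returned as a raw bit pattern: vacated high bits are
 * filled with ones when the operand is negative. */
static uint32_t shr_s32_bits(int32_t v, unsigned n)
{
    uint32_t bits = (uint32_t)v >> n;
    if (v < 0 && n > 0)
        bits |= ~(0xFFFFFFFFu >> n);    /* set the n vacated high bits */
    return bits;
}
```

    Shifting the bit pattern 0xFFFFFFF8 right by one yields 0x7FFFFFFC when
    treated as unsigned, and 0xFFFFFFFC (that is, -4) when treated as the
    signed value -8.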


    Section 2.X.8.Z, SIN:  Sine with Reduction to [-PI,PI]

    The SIN instruction approximates the trigonometric sine of the angle
    specified by the scalar operand and replicates it to all four components
    of the result vector.  The angle is specified in radians and does not have
    to be in the range [-PI,PI].

      tmp = ScalarLoad(op0);
      result.x = ApproxSine(tmp);
      result.y = ApproxSine(tmp);
      result.z = ApproxSine(tmp);
      result.w = ApproxSine(tmp);

    SIN supports only floating-point data type modifiers.


    Section 2.X.8.Z, SLE:  Set on Less Than or Equal

    The SLE instruction performs a component-wise comparison of the two
    operands.  Each component of the result vector returns a TRUE value
    (described below) if the corresponding component of the first operand is
    less than or equal to that of the second, and a FALSE value otherwise.

      tmp0 = VectorLoad(op0);
      tmp1 = VectorLoad(op1);
      result.x = (tmp0.x <= tmp1.x) ? TRUE : FALSE;
      result.y = (tmp0.y <= tmp1.y) ? TRUE : FALSE;
      result.z = (tmp0.z <= tmp1.z) ? TRUE : FALSE;
      result.w = (tmp0.w <= tmp1.w) ? TRUE : FALSE;

    SLE supports all data type modifiers.  For floating-point data types, the
    TRUE value is 1.0 and the FALSE value is 0.0.  For signed integer data
    types, the TRUE value is -1 and the FALSE value is 0.  For unsigned
    integer data types, the TRUE value is the maximum integer value (all bits
    are ones) and the FALSE value is zero.


    Section 2.X.8.Z, SLT:  Set on Less Than

    The SLT instruction performs a component-wise comparison of the two
    operands.  Each component of the result vector returns a TRUE value
    (described below) if the corresponding component of the first operand is
    less than that of the second, and a FALSE value otherwise.

      tmp0 = VectorLoad(op0);
      tmp1 = VectorLoad(op1);
      result.x = (tmp0.x < tmp1.x) ? TRUE : FALSE;
      result.y = (tmp0.y < tmp1.y) ? TRUE : FALSE;
      result.z = (tmp0.z < tmp1.z) ? TRUE : FALSE;
      result.w = (tmp0.w < tmp1.w) ? TRUE : FALSE;

    SLT supports all data type modifiers.  For floating-point data types, the
    TRUE value is 1.0 and the FALSE value is 0.0.  For signed integer data
    types, the TRUE value is -1 and the FALSE value is 0.  For unsigned
    integer data types, the TRUE value is the maximum integer value (all bits
    are ones) and the FALSE value is zero.


    Section 2.X.8.Z, SNE:  Set on Not Equal

    The SNE instruction performs a component-wise comparison of the two
    operands.  Each component of the result vector returns a TRUE value
    (described below) if the corresponding component of the first operand is
    not equal to that of the second, and a FALSE value otherwise.

      tmp0 = VectorLoad(op0);
      tmp1 = VectorLoad(op1);
      result.x = (tmp0.x != tmp1.x) ? TRUE : FALSE;
      result.y = (tmp0.y != tmp1.y) ? TRUE : FALSE;
      result.z = (tmp0.z != tmp1.z) ? TRUE : FALSE;
      result.w = (tmp0.w != tmp1.w) ? TRUE : FALSE;

    SNE supports all data type modifiers.  For floating-point data types, the
    TRUE value is 1.0 and the FALSE value is 0.0.  For signed integer data
    types, the TRUE value is -1 and the FALSE value is 0.  For unsigned
    integer data types, the TRUE value is the maximum integer value (all bits
    are ones) and the FALSE value is zero.


    Section 2.X.8.Z, SSG:  Set Sign

    The SSG instruction generates a result vector containing the signs of
    each component of the single vector operand.  Each component of the
    result vector is 1.0 if the corresponding component of the operand
    is greater than zero, 0.0 if the corresponding component of the
    operand is equal to zero, and -1.0 if the corresponding component
    of the operand is less than zero.

      tmp = VectorLoad(op0);
      result.x = SetSign(tmp.x);
      result.y = SetSign(tmp.y);
      result.z = SetSign(tmp.z);
      result.w = SetSign(tmp.w);

    SSG supports only floating-point data type modifiers.


    Section 2.X.8.Z, STR:  Set on True

    The STR instruction is a degenerate case of the other "Set on"
    instructions that sets all components of the result vector to a TRUE value
    (described below).

      result.x = TRUE;
      result.y = TRUE;
      result.z = TRUE;
      result.w = TRUE;

    STR supports all data type modifiers.  For floating-point data types, the
    TRUE value is 1.0.  For signed integer data types, the TRUE value is -1.
    For unsigned integer data types, the TRUE value is the maximum integer
    value (all bits are ones).


    Section 2.X.8.Z, SUB:  Subtract

    The SUB instruction performs a component-wise subtraction of the second
    operand from the first to yield a result vector.

      tmp0 = VectorLoad(op0);
      tmp1 = VectorLoad(op1);
      result.x = tmp0.x - tmp1.x;
      result.y = tmp0.y - tmp1.y;
      result.z = tmp0.z - tmp1.z;
      result.w = tmp0.w - tmp1.w;

    SUB supports all three data type modifiers.


    Section 2.X.8.Z, SWZ:  Extended Swizzle

    The SWZ instruction loads the single vector operand, and performs a
    swizzle operation more powerful than that provided for loading normal
    vector operands to yield a result vector.

    After the operand is loaded, the "x", "y", "z", and "w" components of the
    result vector are selected by the first, second, third, and fourth matches
    of the <extSwizComp> pattern in the <extendedSwizzle> rule.

    A result component can be selected from any of the four components of the
    operand or the constants 0.0 and 1.0.  The result component can also be
    optionally negated.  The following pseudocode describes the component
    selection method.  "operand" refers to the vector operand, "select" is an
    enumerant where the values ZERO, ONE, X, Y, Z, and W correspond to the
    <extSwizSel> rule matching "0", "1", "x", "y", "z", and "w", respectively.
    "negate" is TRUE if and only if the <optionalSign> rule in <extSwizComp>
    matches "-".

      float ExtSwizComponent(floatVec operand, enum select, boolean negate)
      {
          float result;
          switch (select) {
            case ZERO:  result = 0.0; break;
            case ONE:   result = 1.0; break;
            case X:     result = operand.x; break;
            case Y:     result = operand.y; break;
            case Z:     result = operand.z; break;
            case W:     result = operand.w; break;
          }
          if (negate) {
            result = -result;
          }
          return result;
      }

    The entire extended swizzle operation is then defined using the following
    pseudocode:

      tmp = VectorLoad(op0);
      result.x = ExtSwizComponent(tmp, xSelect, xNegate);
      result.y = ExtSwizComponent(tmp, ySelect, yNegate);
      result.z = ExtSwizComponent(tmp, zSelect, zNegate);
      result.w = ExtSwizComponent(tmp, wSelect, wNegate);

    "xSelect", "xNegate", "ySelect", "yNegate", "zSelect", "zNegate",
    "wSelect", and "wNegate" correspond to the "select" and "negate" values
    above for the four <extSwizComp> matches.

    Since this instruction allows for component selection and negation for
    each individual component, the grammar does not allow the use of the
    normal swizzle and negation operations allowed for vector operands in
    other instructions.

    SWZ supports only floating-point data type modifiers.


    Section 2.X.8.Z, TEX:  Texture Sample

    The TEX instruction takes the four components of a single floating-point
    source vector and performs a filtered texture access as described in
    Section 2.X.4.4.  The returned (R,G,B,A) value is written to the
    floating-point result vector.  Partial derivatives and the level of detail
    are computed automatically.

      tmp = VectorLoad(op0);
      ddx = ComputePartialsX(tmp);
      ddy = ComputePartialsY(tmp);
      lambda = ComputeLOD(ddx, ddy);
      result = TextureSample(tmp, lambda, ddx, ddy, texelOffset);

    TEX supports all three data type modifiers.  The single operand is always
    treated as a floating-point vector; the results are interpreted according
    to the data type modifier.


    Section 2.X.8.Z, TRUNC:  Truncate (Round Toward Zero)

    The TRUNC instruction loads a single vector operand and performs a
    component-wise truncate operation to generate a result vector.

      tmp = VectorLoad(op0);
      result.x = trunc(tmp.x);
      result.y = trunc(tmp.y);
      result.z = trunc(tmp.z);
      result.w = trunc(tmp.w);

    The truncate operation returns the integer nearest to the operand whose
    magnitude does not exceed that of the operand (rounding toward zero).  For
    example, trunc(-1.7) = -1.0, trunc(+1.0) = +1.0, and trunc(+3.7) = +3.0.

    TRUNC supports all three data type modifiers.  The single operand is
    always treated as a floating-point value, but the result is written as a
    floating-point value, a signed integer, or an unsigned integer, as
    specified by the data type modifier.  If a value is not exactly
    representable using the data type of the result (e.g., an overflow or
    writing a negative value to an unsigned integer), the result is undefined.
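    For the signed integer form of the result, the behavior resembles C's
    float-to-integer conversion, which also rounds toward zero and is
    likewise undefined when the value does not fit in the destination type.
    The sketch below is illustrative only; the function name is hypothetical
    and a 32-bit signed result is assumed.

```c
#include <stdint.h>

/* Hypothetical per-component TRUNC with a signed integer result.
 * The C conversion discards the fractional part (rounds toward zero).
 * As with TRUNC, the result is undefined if x is out of range. */
int32_t trunc_to_int(float x)
{
    return (int32_t)x;
}
```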


    Section 2.X.8.Z, TXB:  Texture Sample with Bias

    The TXB instruction takes the four components of a single floating-point
    source vector and performs a filtered texture access as described in
    Section 2.X.4.4.  The returned (R,G,B,A) value is written to the
    floating-point result vector.  Partial derivatives and the level of detail
    are computed automatically, but the fourth component of the source vector
    is added to the computed LOD prior to sampling.

      tmp = VectorLoad(op0);
      ddx = ComputePartialsX(tmp);
      ddy = ComputePartialsY(tmp);
      lambda = ComputeLOD(ddx, ddy);
      result = TextureSample(tmp, lambda + tmp.w, ddx, ddy, texelOffset);

    The single source vector in the TXB instruction does not have enough
    coordinates to specify a lookup into a two-dimensional array texture or
    cube map texture with both an LOD bias and an explicit reference value for
    depth comparison.  A program will fail to load if it contains a TXB
    instruction with a target of SHADOWCUBE or SHADOWARRAY2D.

    TXB supports all three data type modifiers.  The single operand is always
    treated as a floating-point vector; the results are interpreted according
    to the data type modifier.


    Section 2.X.8.Z, TXD:  Texture Sample with Partials

    The TXD instruction takes the four components of the first floating-point
    source vector and performs a filtered texture access as described in
    Section 2.X.4.4.  The returned (R,G,B,A) value is written to the
    floating-point result vector.  The partial derivatives of the texture
    coordinates with respect to X and Y are specified by the second and third
    floating-point source vectors.  The level of detail is computed
    automatically using the provided partial derivatives.

    Note that for cube map texture targets, the provided partial derivatives
    are in the coordinate system used before texture coordinates are projected
    onto the appropriate cube face.  The partial derivatives of the
    post-projection texture coordinates, which are used for level-of-detail
    and anisotropic filtering calculations, are derived from the original
    coordinates and partial derivatives in an implementation-dependent manner.

      tmp0 = VectorLoad(op0);
      tmp1 = VectorLoad(op1);
      tmp2 = VectorLoad(op2);
      lambda = ComputeLOD(tmp1, tmp2);
      result = TextureSample(tmp0, lambda, tmp1, tmp2, texelOffset);

    TXD supports all three data type modifiers.  All three operands are always
    treated as floating-point vectors; the results are interpreted according
    to the data type modifier.


    Section 2.X.8.Z, TXF:  Texel Fetch

    The TXF instruction takes the four components of a single signed integer
    source vector and performs a single texel fetch as described in Section
    2.X.4.4.  The first three components provide the <i>, <j>, and <k> values
    for the texel fetch, and the fourth component is used to determine the LOD
    to access.  The returned (R,G,B,A) value is written to the floating-point
    result vector.  Partial derivatives are irrelevant for single texel
    fetches.

      tmp = VectorLoad(op0);
      result = TexelFetch(tmp, texelOffset);

    TXF supports all three data type modifiers.  The single vector operand is
    treated as a signed integer vector; the results are interpreted according
    to the data type modifier.


    Section 2.X.8.Z, TXL:  Texture Sample with LOD

    The TXL instruction takes the four components of a single floating-point
    source vector and performs a filtered texture access as described in
    Section 2.X.4.4.  The returned (R,G,B,A) value is written to the
    floating-point result vector.  The level of detail is taken from the
    fourth component of the source vector.

    Partial derivatives are not computed by the TXL instruction and
    anisotropic filtering is not performed.

      tmp = VectorLoad(op0);
      ddx = (0,0,0);
      ddy = (0,0,0);
      result = TextureSample(tmp, tmp.w, ddx, ddy, texelOffset);

    The single source vector in the TXL instruction does not have enough
    coordinates to specify a lookup into a 2D array or cube map texture with
    both an explicit LOD and a reference value for depth comparison.  A
    program will fail to load if it contains a TXL instruction with a target
    of SHADOWCUBE or SHADOWARRAY2D.

    TXL supports all three data type modifiers.  The single vector operand is
    treated as a floating-point vector; the results are interpreted according
    to the data type modifier.


    Section 2.X.8.Z, TXP:  Texture Sample with Projection

    The TXP instruction divides the first three components of its single
    floating-point source vector by its fourth component, maps the results to
    s, t, and r, and performs a filtered texture access as described in
    Section 2.X.4.4.  The returned (R,G,B,A) value is written to the
    floating-point result vector.  Partial derivatives and the level of detail
    are computed automatically.

      tmp0 = VectorLoad(op0);
      tmp0.x = tmp0.x / tmp0.w;
      tmp0.y = tmp0.y / tmp0.w;
      tmp0.z = tmp0.z / tmp0.w;
      ddx = ComputePartialsX(tmp0);
      ddy = ComputePartialsY(tmp0);
      lambda = ComputeLOD(ddx, ddy);
      result = TextureSample(tmp0, lambda, ddx, ddy, texelOffset);

    The single source vector in the TXP instruction does not have enough
    coordinates to specify a lookup into a 2D array or cube map texture with
    both a Q coordinate and an explicit reference value for depth comparison.
    A program will fail to load if it contains a TXP instruction with a target
    of SHADOWCUBE or SHADOWARRAY2D.

    TXP supports all three data type modifiers.  The single vector operand is
    treated as a floating-point vector; the results are interpreted according
    to the data type modifier.


    Section 2.X.8.Z, TXQ:  Texture Size Query

    The TXQ instruction takes the first component of the single integer vector
    operand, adds the number of the base level of the specified texture to
    determine a texture image level, and returns an integer result vector
    containing the size of the image at that level of the texture.

    For one-dimensional and one-dimensional array textures, the "x" component
    of the result vector is filled with the width of the image(s).  For
    two-dimensional, rectangle, cube map, and two-dimensional array textures,
    the "x" and "y" components are filled with the width and height of the
    image(s).  For three-dimensional textures, the "x", "y", and "z"
    components are filled with the width, height, and depth of the image.
    Additionally, the number of layers in an array texture is returned in the
    "y" component of the result for one-dimensional array textures or the "z"
    component for two-dimensional array textures.  All other components of the
    result vector are undefined.  For the purposes of this instruction, the
    width, height, and depth of a texture do NOT include any border.

      tmp0 = VectorLoad(op0);
      tmp0.x = tmp0.x + texture[op1].target[op2].base_level;
      result.x = texture[op1].target[op2].level[tmp0.x].width;
      result.y = texture[op1].target[op2].level[tmp0.x].height;
      result.z = texture[op1].target[op2].level[tmp0.x].depth;

    If the level computed by adding the operand to the base level of the
    texture is less than the base level number or greater than the maximum
    level number, the results are undefined.

    TXQ supports no data type modifiers; the scalar operand and the result
    vector are both interpreted as signed integers.


    Section 2.X.8.Z, UP2H:  Unpack Two 16-bit Floats

    The UP2H instruction unpacks two 16-bit floats stored together in a 32-bit
    scalar operand.  The first 16-bit float (stored in the 16 least
    significant bits) is written into the "x" and "z" components of the result
    vector; the second is written into the "y" and "w" components of the
    result vector.

    This operation undoes the type conversion and packing performed by
    the PK2H instruction.

      tmp = ScalarLoad(op0);
      result.x = (fp16) (RawBits(tmp) & 0xFFFF);
      result.y = (fp16) ((RawBits(tmp) >> 16) & 0xFFFF);
      result.z = (fp16) (RawBits(tmp) & 0xFFFF);
      result.w = (fp16) ((RawBits(tmp) >> 16) & 0xFFFF);

    UP2H supports all three data type modifiers.  The single operand is read
    as a floating-point value, a signed integer, or an unsigned integer, as
    specified by the data type modifier; the 32 least significant bits of the
    encoding are used for unpacking.  For floating-point operand variables, it
    is expected (but not required) that the operand was produced by a previous
    pack instruction.  The result is always written as a floating-point
    vector.

    A program will fail to load if it contains a UP2H instruction whose
    operand is a variable declared as "SHORT".
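    The "(fp16)" conversion in the pseudocode above can be sketched in C as
    shown below.  This is an illustration under stated assumptions, not the
    implementation used by any driver: it assumes the 16-bit format has 1
    sign bit, 5 exponent bits (bias 15), and 10 mantissa bits, and that the
    C "float" type is IEEE 754 binary32.  The names half_to_float and up2h
    are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Decode one 16-bit float (assumed s1e5m10, bias 15) to a 32-bit float. */
float half_to_float(uint16_t h)
{
    uint32_t sign = (uint32_t)(h & 0x8000u) << 16;
    uint32_t exp  = (h >> 10) & 0x1Fu;
    uint32_t man  = h & 0x3FFu;
    uint32_t bits;
    float f;

    if (exp == 0) {
        if (man == 0) {
            bits = sign;                        /* signed zero */
        } else {
            /* Denormal: shift until the implicit leading one appears. */
            int shift = 0;
            while ((man & 0x400u) == 0) {
                man <<= 1;
                shift++;
            }
            man &= 0x3FFu;
            bits = sign | ((uint32_t)(113 - shift) << 23) | (man << 13);
        }
    } else if (exp == 0x1Fu) {
        bits = sign | 0x7F800000u | (man << 13); /* infinity or NaN */
    } else {
        bits = sign | ((exp + 112u) << 23) | (man << 13); /* rebias 15->127 */
    }
    memcpy(&f, &bits, sizeof f);
    return f;
}

/* UP2H on the raw 32 bits of the scalar operand. */
void up2h(uint32_t raw, float result[4])
{
    float lo = half_to_float((uint16_t)(raw & 0xFFFFu));
    float hi = half_to_float((uint16_t)((raw >> 16) & 0xFFFFu));
    result[0] = lo;
    result[1] = hi;
    result[2] = lo;
    result[3] = hi;
}
```

    For example, the raw value 0xC0003C00 holds the 16-bit encodings of 1.0
    (low half) and -2.0 (high half) under the assumed format.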


    Section 2.X.8.Z, UP2US:  Unpack Two Unsigned 16-bit Integers

    The UP2US instruction unpacks two 16-bit unsigned values packed
    together in a 32-bit scalar operand.  The unsigned quantities are
    encoded where a bit pattern of all '0' bits corresponds to 0.0 and
    a pattern of all '1' bits corresponds to 1.0.  The "x" and "z"
    components of the result vector are obtained from the 16 least
    significant bits of the operand; the "y" and "w" components are
    obtained from the 16 most significant bits.

    This operation undoes the type conversion and packing performed by
    the PK2US instruction.

      tmp = ScalarLoad(op0);
      result.x = ((RawBits(tmp) >> 0)  & 0xFFFF) / 65535.0;
      result.y = ((RawBits(tmp) >> 16) & 0xFFFF) / 65535.0;
      result.z = ((RawBits(tmp) >> 0)  & 0xFFFF) / 65535.0;
      result.w = ((RawBits(tmp) >> 16) & 0xFFFF) / 65535.0;

    UP2US supports all three data type modifiers.  The single operand is read
    as a floating-point value, a signed integer, or an unsigned integer, as
    specified by the data type modifier; the 32 least significant bits of the
    encoding are used for unpacking.  For floating-point operand variables, it
    is expected (but not required) that the operand was produced by a previous
    pack instruction.  The result is always written as a floating-point
    vector.

    A GPU program will fail to load if it contains a UP2US instruction
    whose operand is a variable declared as "SHORT".


    Section 2.X.8.Z, UP4B:  Unpack Four Signed 8-bit Integers

    The UP4B instruction unpacks four 8-bit signed values packed together
    in a 32-bit scalar operand.  The signed quantities are encoded where
    a bit pattern of all '0' bits corresponds to -128/127 and a pattern
    of all '1' bits corresponds to +127/127.  The "x" component of the
    result vector is the converted value corresponding to the 8 least
    significant bits of the operand; the "w" component corresponds to
    the 8 most significant bits.

    This operation undoes the type conversion and packing performed by
    the PK4B instruction.

      tmp = ScalarLoad(op0);
      result.x = (((RawBits(tmp) >> 0) & 0xFF) - 128) / 127.0;
      result.y = (((RawBits(tmp) >> 8) & 0xFF) - 128) / 127.0;
      result.z = (((RawBits(tmp) >> 16) & 0xFF) - 128) / 127.0;
      result.w = (((RawBits(tmp) >> 24) & 0xFF) - 128) / 127.0;

    UP4B supports all three data type modifiers.  The single operand is read
    as a floating-point value, a signed integer, or an unsigned integer, as
    specified by the data type modifier; the 32 least significant bits of the
    encoding are used for unpacking.  For floating-point operand variables, it
    is expected (but not required) that the operand was produced by a previous
    pack instruction.  The result is always written as a floating-point
    vector.

    A program will fail to load if it contains a UP4B instruction whose
    operand is a variable declared as "SHORT".
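    The UP4B unpacking arithmetic can be exercised with a small C sketch.
    The function name up4b is hypothetical, and the operand is passed as its
    raw 32 bits.

```c
#include <stdint.h>

/* UP4B on the raw 32 bits of the scalar operand: each byte is treated as
 * an unsigned value in [0, 255], biased by 128, and scaled by 1/127. */
void up4b(uint32_t raw, float result[4])
{
    int i;
    for (i = 0; i < 4; i++) {
        int byte = (int)((raw >> (8 * i)) & 0xFFu);
        result[i] = (float)(byte - 128) / 127.0f;
    }
}
```

    With this encoding, a byte of 0x80 decodes to 0.0 and 0xFF to +1.0, while
    0x00 decodes to -128/127, which is slightly below -1.0.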


    Section 2.X.8.Z, UP4UB:  Unpack Four Unsigned 8-bit Integers

    The UP4UB instruction unpacks four 8-bit unsigned values packed
    together in a 32-bit scalar operand.  The unsigned quantities are
    encoded where a bit pattern of all '0' bits corresponds to 0.0 and a
    pattern of all '1' bits corresponds to 1.0.  The "x" component of the
    result vector is obtained from the 8 least significant bits of the
    operand; the "w" component is obtained from the 8 most significant
    bits.

    This operation undoes the type conversion and packing performed by
    the PK4UB instruction.

      tmp = ScalarLoad(op0);
      result.x = ((RawBits(tmp) >> 0)  & 0xFF) / 255.0;
      result.y = ((RawBits(tmp) >> 8)  & 0xFF) / 255.0;
      result.z = ((RawBits(tmp) >> 16) & 0xFF) / 255.0;
      result.w = ((RawBits(tmp) >> 24) & 0xFF) / 255.0;

    UP4UB supports all three data type modifiers.  The single operand is read
    as a floating-point value, a signed integer, or an unsigned integer, as
    specified by the data type modifier; the 32 least significant bits of the
    encoding are used for unpacking.  For floating-point operand variables, it
    is expected (but not required) that the operand was produced by a previous
    pack instruction.  The result is always written as a floating-point
    vector.

    A program will fail to load if it contains a UP4UB instruction whose
    operand is a variable declared as "SHORT".
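    A common use of UP4UB is expanding a packed 8-bit-per-channel color into
    a floating-point vector.  The C sketch below mirrors the pseudocode; the
    function name up4ub is hypothetical.

```c
#include <stdint.h>

/* UP4UB on the raw 32 bits of the scalar operand: byte i of the operand
 * (least significant first) maps to result component i in [0.0, 1.0]. */
void up4ub(uint32_t raw, float result[4])
{
    int i;
    for (i = 0; i < 4; i++) {
        result[i] = (float)((raw >> (8 * i)) & 0xFFu) / 255.0f;
    }
}
```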


    Section 2.X.8.Z, X2D:  2D Coordinate Transformation

    The X2D instruction multiplies the 2D offset vector specified by the
    "x" and "y" components of the second vector operand by the 2x2 matrix
    specified by the four components of the third vector operand, and adds
    the transformed offset vector to the 2D vector specified by the "x"
    and "y" components of the first vector operand.  The first component
    of the sum is written to the "x" and "z" components of the result;
    the second component is written to the "y" and "w" components of
    the result.

      tmp0 = VectorLoad(op0);
      tmp1 = VectorLoad(op1);
      tmp2 = VectorLoad(op2);
      result.x = tmp0.x + tmp1.x * tmp2.x + tmp1.y * tmp2.y;
      result.y = tmp0.y + tmp1.x * tmp2.z + tmp1.y * tmp2.w;
      result.z = tmp0.x + tmp1.x * tmp2.x + tmp1.y * tmp2.y;
      result.w = tmp0.y + tmp1.x * tmp2.z + tmp1.y * tmp2.w;

    X2D supports only floating-point data type modifiers.
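    To make the operand roles concrete, the following C sketch applies the
    X2D arithmetic to plain float arrays.  The function name x2d and the
    array-based calling convention are hypothetical.

```c
/* X2D: result = base.xy + M * offset.xy, replicated into (x, y, x, y).
 * The 2x2 matrix M is stored row-major in m[0..3]: (m0 m1; m2 m3). */
void x2d(const float base[4], const float offset[4], const float m[4],
         float result[4])
{
    float x = base[0] + offset[0] * m[0] + offset[1] * m[1];
    float y = base[1] + offset[0] * m[2] + offset[1] * m[3];
    result[0] = x;
    result[1] = y;
    result[2] = x;
    result[3] = y;
}
```

    For instance, with base = (1, 2), offset = (3, 4), and the 90-degree
    rotation matrix (0, -1, 1, 0), the transformed offset is (-4, 3) and the
    result is (-3, 5, -3, 5).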


    Section 2.X.8.Z, XOR:  Exclusive Or

    The XOR instruction performs a bitwise XOR operation on the components of
    the two source vectors to yield a result vector.

      tmp0 = VectorLoad(op0);
      tmp1 = VectorLoad(op1);
      result.x = tmp0.x ^ tmp1.x;
      result.y = tmp0.y ^ tmp1.y;
      result.z = tmp0.z ^ tmp1.z;
      result.w = tmp0.w ^ tmp1.w;

    XOR supports only integer data type modifiers.  If no type modifier is
    specified, both operands and the result are treated as signed integers.


    Section 2.X.8.Z, XPD:  Cross Product

    The XPD instruction computes the cross product using the first three
    components of its two vector operands to generate the x, y, and z
    components of the result vector.  The w component of the result vector is
    undefined.

      tmp0 = VectorLoad(op0);
      tmp1 = VectorLoad(op1);
      result.x = tmp0.y * tmp1.z - tmp0.z * tmp1.y;
      result.y = tmp0.z * tmp1.x - tmp0.x * tmp1.z;
      result.z = tmp0.x * tmp1.y - tmp0.y * tmp1.x;
      result.w = undefined;

    XPD supports only floating-point data type modifiers.


Additions to Chapter 3 of the OpenGL 1.5 Specification (Rasterization)

    Modify Section 3.8.1, Texture Image Specification, p. 150

    (modify 4th paragraph, p. 151 -- add cubemaps to the list of texture
    targets that can be used with DEPTH_COMPONENT textures) Textures with a
    base internal format of DEPTH_COMPONENT are supported by texture image
    specification commands only if <target> is TEXTURE_1D, TEXTURE_2D,
    TEXTURE_CUBE_MAP, TEXTURE_RECTANGLE_ARB, TEXTURE_1D_ARRAY_EXT,
    TEXTURE_2D_ARRAY_EXT, PROXY_TEXTURE_1D, PROXY_TEXTURE_2D,
    PROXY_TEXTURE_CUBE_MAP, PROXY_TEXTURE_RECTANGLE_ARB,
    PROXY_TEXTURE_1D_ARRAY_EXT, or PROXY_TEXTURE_2D_ARRAY_EXT.  Using this
    format in conjunction with any other target will result in an
    INVALID_OPERATION error.


    Delete Section 3.8.7, Texture Wrap Modes.  (The language in this section
    is folded into updates to the following section, and is no longer needed
    here.)


    Modify Section 3.8.8, Texture Minification:

    (replace the last paragraph, p. 171):  Let s(x,y) be the function that
    associates an s texture coordinate with each set of window coordinates
    (x,y) that lie within a primitive; define t(x,y) and r(x,y) analogously.
    Let

      u(x,y) = w_t * s(x,y) + offsetu_shader,
      v(x,y) = h_t * t(x,y) + offsetv_shader, and
      w(x,y) = d_t * r(x,y) + offsetw_shader,

    where w_t, h_t, and d_t are as defined by equations 3.15, 3.16, and 3.17
    with w_s, h_s, and d_s equal to the width, height, and depth of the image
    array whose level is level_base.  (offsetu_shader, offsetv_shader,
    offsetw_shader) is the texel offset specified in the vertex, geometry, or
    fragment program instruction used to perform the access.  For
    fixed-function texture accesses, all three shader offsets are taken to be
    zero.  For a one-dimensional texture, define v(x,y) == 0 and w(x,y) == 0;
    for two-dimensional textures, define w(x,y) == 0.

    After u(x,y), v(x,y), and w(x,y) are generated, they are clamped if the
    corresponding texture wrap modes are CLAMP or MIRROR_CLAMP_EXT.  Let

      u'(x,y) = clamp(u(x,y), 0, w_t),      if TEXTURE_WRAP_S is CLAMP
                clamp(u(x,y), -w_t, w_t),   if TEXTURE_WRAP_S is
                                              MIRROR_CLAMP_EXT, or
                u(x,y),                     otherwise
      v'(x,y) = clamp(v(x,y), 0, h_t),      if TEXTURE_WRAP_T is CLAMP
                clamp(v(x,y), -h_t, h_t),   if TEXTURE_WRAP_T is
                                              MIRROR_CLAMP_EXT, or
                v(x,y),                     otherwise
      w'(x,y) = clamp(w(x,y), 0, d_t),      if TEXTURE_WRAP_R is CLAMP
                clamp(w(x,y), -d_t, d_t),   if TEXTURE_WRAP_R is
                                              MIRROR_CLAMP_EXT, or
                w(x,y),                     otherwise,

    where clamp(<a>,<b>,<c>) returns <b> if <a> is less than <b>, <c> if <a> is
    greater than <c>, and <a> otherwise.
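    The pre-filter clamp can be sketched in C for a single coordinate axis.
    The enum and function names below are hypothetical; "size" stands for
    the texture dimension along the axis in question.

```c
/* clamp(a, b, c) as defined in the text: b if a < b, c if a > c, else a. */
float clampf(float a, float b, float c)
{
    return (a < b) ? b : ((a > c) ? c : a);
}

/* Hypothetical classification of the wrap mode for this step only. */
typedef enum {
    PRE_WRAP_CLAMP,         /* wrap mode is CLAMP */
    PRE_WRAP_MIRROR_CLAMP,  /* wrap mode is MIRROR_CLAMP_EXT */
    PRE_WRAP_OTHER          /* any other wrap mode: no pre-filter clamp */
} pre_wrap_mode;

/* Compute the clamped coordinate for one axis. */
float pre_filter_clamp(float coord, float size, pre_wrap_mode mode)
{
    switch (mode) {
    case PRE_WRAP_CLAMP:        return clampf(coord, 0.0f, size);
    case PRE_WRAP_MIRROR_CLAMP: return clampf(coord, -size, size);
    default:                    return coord;
    }
}
```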

    (start a new paragraph with "For a polygon, rho is given at a fragment
    with window coordinates...", and then continue with the original spec
    text.)

    (replace text starting with the last paragraph on p. 172, continuing to
    the end of p. 174)

    When lambda indicates minification, the value assigned to
    TEXTURE_MIN_FILTER is used to determine how the texture value for a
    fragment is selected.

    When TEXTURE_MIN_FILTER is NEAREST, the texel in the image array of level
    level_base that is nearest (in Manhattan distance) to that specified by
    (s,t,r) is obtained.  Let i, j, and k be integers such that:

      i = apply_wrap(floor(u'(x,y))),
      j = apply_wrap(floor(v'(x,y))), and
      k = apply_wrap(floor(w'(x,y))),

    where the coordinate returned by apply_wrap() is as defined by Table X.19.
    The values of i, j, and k are then modified according to the texture wrap
    modes, as described in Table 3.19, to produce new values (i', j', and k').
    For a three-dimensional texture, the texel at location (i,j,k) becomes the
    texture value.  For a two-dimensional texture, k is irrelevant, and the
    texel at location (i,j) becomes the texture value.  For a one-dimensional
    texture, j and k are irrelevant, and the texel at location i becomes the
    texture value.

      Wrap mode                   Result
      --------------------------  ------------------------------------------
      CLAMP_TO_EDGE               clamp(coord, 0, size-1)
      CLAMP_TO_BORDER             clamp(coord, -1, size)
      CLAMP                       { clamp(coord, 0, size-1),
                                  {         for NEAREST filtering
                                  { clamp(coord, -1, size),
                                  {         for LINEAR filtering
      REPEAT                      mod(coord, size)
      MIRROR_CLAMP_TO_EDGE_EXT    clamp(mirror(coord), 0, size-1)
      MIRROR_CLAMP_TO_BORDER_EXT  clamp(mirror(coord), 0, size)
      MIRROR_CLAMP_EXT            { clamp(mirror(coord), 0, size-1),
                                  {         for NEAREST filtering
                                  { clamp(mirror(coord), 0, size),
                                  {         for LINEAR filtering
      MIRRORED_REPEAT             (size-1) - mirror(mod(coord, 2*size)-size)

      Table X.19:  Texel location wrap mode application.  mod(<a>,<b>) is
      defined to return <a>-<b>*floor(<a>/<b>), and mirror(<a>) is defined to
      return <a> if <a> is greater than or equal to zero or -(1+<a>)
      otherwise.  The values of "wrap mode" and size are TEXTURE_WRAP_S and
      w_t, TEXTURE_WRAP_T and h_t, and TEXTURE_WRAP_R and d_t, for i, j, and k
      coordinates, respectively.  The coordinate clamp for CLAMP and
      MIRROR_CLAMP_EXT depends on the filtering mode (NEAREST or LINEAR).
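    A subset of the rows of Table X.19 can be expressed directly in C on
    integer texel coordinates.  This is a sketch only: the enum and function
    names are hypothetical, "size" is assumed to be positive, and the
    filter-dependent rows (CLAMP and MIRROR_CLAMP_EXT) as well as
    MIRROR_CLAMP_TO_BORDER_EXT are omitted.

```c
typedef enum {
    WRAP_CLAMP_TO_EDGE,
    WRAP_CLAMP_TO_BORDER,
    WRAP_REPEAT,
    WRAP_MIRROR_CLAMP_TO_EDGE,
    WRAP_MIRRORED_REPEAT
} wrap_mode;

int clampi(int a, int b, int c)
{
    return (a < b) ? b : ((a > c) ? c : a);
}

/* mod(a, b) = a - b*floor(a/b) for b > 0; C's % truncates toward zero,
 * so negative remainders are shifted back into [0, b). */
int mod_floor(int a, int b)
{
    int m = a % b;
    return (m < 0) ? m + b : m;
}

/* mirror(a) = a if a >= 0, else -(1 + a). */
int mirror(int a)
{
    return (a >= 0) ? a : -(1 + a);
}

int apply_wrap(int coord, int size, wrap_mode mode)
{
    switch (mode) {
    case WRAP_CLAMP_TO_EDGE:
        return clampi(coord, 0, size - 1);
    case WRAP_CLAMP_TO_BORDER:
        return clampi(coord, -1, size);
    case WRAP_REPEAT:
        return mod_floor(coord, size);
    case WRAP_MIRROR_CLAMP_TO_EDGE:
        return clampi(mirror(coord), 0, size - 1);
    case WRAP_MIRRORED_REPEAT:
        return (size - 1) - mirror(mod_floor(coord, 2 * size) - size);
    }
    return coord;
}
```

    For a texture of size 4, MIRRORED_REPEAT maps coordinates 0 through 7 to
    0, 1, 2, 3, 3, 2, 1, 0, and maps -1 to 0.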

    If the selected (i,j,k), (i,j), or i location refers to a border texel
    that satisfies any of the following conditions:

      i < -b_s,
      j < -b_s,
      k < -b_s,
      i >= w_t + b_s,
      j >= h_t + b_s, or
      k >= d_t + b_s,

    then the border values defined by TEXTURE_BORDER_COLOR are used in place
    of the non-existent texel. If the texture contains color components, the
    values of TEXTURE_BORDER_COLOR are interpreted as an RGBA color to match
    the texture's internal format in a manner consistent with table 3.15. If
    the texture contains depth components, the first component of
    TEXTURE_BORDER_COLOR is interpreted as a depth value.

    When TEXTURE_MIN_FILTER is LINEAR, a 2x2x2 cube of texels in the image
    array of level level_base is selected.  Let:

      i_0   = apply_wrap(floor(u' - 0.5)),
      j_0   = apply_wrap(floor(v' - 0.5)),
      k_0   = apply_wrap(floor(w' - 0.5)),
      i_1   = apply_wrap(floor(u' - 0.5) + 1),
      j_1   = apply_wrap(floor(v' - 0.5) + 1),
      k_1   = apply_wrap(floor(w' - 0.5) + 1),
      alpha = frac(u' - 0.5),
      beta  = frac(v' - 0.5),
      gamma = frac(w' - 0.5),

    where frac(<x>) denotes the fractional part of <x>.
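    For a single axis, the selection of the two texel indices and the blend
    weight can be sketched in C as follows.  The sketch omits apply_wrap(),
    so the returned indices are the unwrapped values, and it assumes that
    frac(x) means x - floor(x).  The type and function names are
    hypothetical.

```c
/* floor() to int without <math.h>: adjust C's truncation for negatives. */
int floor_to_int(float x)
{
    int i = (int)x;
    return (x < (float)i) ? i - 1 : i;
}

typedef struct {
    int   i0;     /* floor(u' - 0.5), before apply_wrap() */
    int   i1;     /* floor(u' - 0.5) + 1, before apply_wrap() */
    float alpha;  /* frac(u' - 0.5), weight of texel i1 */
} linear_taps;

linear_taps linear_select(float u_prime)
{
    linear_taps taps;
    float t = u_prime - 0.5f;
    int base = floor_to_int(t);
    taps.i0 = base;
    taps.i1 = base + 1;
    taps.alpha = t - (float)base;
    return taps;
}
```

    For u' = 2.75, this yields i_0 = 2, i_1 = 3, and alpha = 0.25.  For
    u' = 0.25, it yields i_0 = -1, i_1 = 0, and alpha = 0.75, so the first
    tap falls outside the image before wrapping is applied.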

    For a three-dimensional texture, the texture value tau is found as...

    (replace last paragraph, p.174) For any texel in the equation above that
    refers to a border texel outside the defined range of the image, the texel
    value is taken from the texture border color as with NEAREST filtering.


    Modify Section 3.8.14, Texture Comparison Modes (p. 185)

    (modify 2nd paragraph, p. 188, indicating that the Q texture coordinate is
    used for depth comparisons on cubemap textures)

    Let D_t be the depth texture value, in the range [0, 1].  For
    fixed-function texture lookups, let R be the interpolated <r> texture
    coordinate, clamped to the range [0, 1].  For texture lookups generated by
    a program instruction, let R be the reference value for depth comparisons
    provided in the instruction, also clamped to [0, 1].  Then the effective
    texture value L_t, I_t, or A_t is computed as follows:


Additions to Chapter 4 of the OpenGL 1.5 Specification (Per-Fragment
Operations and the Frame Buffer)

    None.


Additions to Chapter 5 of the OpenGL 1.5 Specification (Special Functions)

    None.


Additions to Chapter 6 of the OpenGL 1.5 Specification (State and
State Requests)

    Modify Section 6.1.12 of the ARB_vertex_program specification.

    (Add new integer program parameter queries, plus language that program
    environment or local parameter query results are undefined if the query
    specifies a data type incompatible with the data type of the parameter
    being queried.)

    The commands

      void GetProgramEnvParameterdvARB(enum target, uint index,
                                       double *params);
      void GetProgramEnvParameterfvARB(enum target, uint index,
                                       float *params);
      void GetProgramEnvParameterIivNV(enum target, uint index,
                                       int *params);
      void GetProgramEnvParameterIuivNV(enum target, uint index,
                                        uint *params);

    obtain the current value for the program environment parameter numbered
    <index> for the given program target <target>, and place the information
    in the array <params>.  The values returned are undefined if the data type
    of the components of the parameter is not compatible with the data type of
    <params>.  Floating-point components are compatible with "double" or
    "float"; signed and unsigned integer components are compatible with "int"
    and "uint", respectively.  The error INVALID_ENUM is generated if <target>
    specifies a nonexistent program target or a program target that does not
    support program environment parameters.  The error INVALID_VALUE is
    generated if <index> is greater than or equal to the
    implementation-dependent number of supported program environment
    parameters for the program target.

    ...

    The commands

      void GetProgramLocalParameterdvARB(enum target, uint index,
                                         double *params);
      void GetProgramLocalParameterfvARB(enum target, uint index,
                                         float *params);
      void GetProgramLocalParameterIivNV(enum target, uint index,
                                         int *params);
      void GetProgramLocalParameterIuivNV(enum target, uint index,
                                          uint *params);

    obtain the current value for the program local parameter numbered <index>
    belonging to the program object currently bound to <target>, and place
    the information in the array <params>.  The values returned are undefined
    if the data type of the components of the parameter is not compatible with
    the data type of <params>.  Floating-point components are compatible with
    "double" or "float"; signed and unsigned integer components are compatible
    with "int" and "uint", respectively.  The error INVALID_ENUM is generated
    if <target> specifies a nonexistent program target or a program target
    that does not support program local parameters.  The error INVALID_VALUE
    is generated if <index> is greater than or equal to the
    implementation-dependent number of supported program local parameters for
    the program target.

    ...

    The command

      void GetProgramivARB(enum target, enum pname, int *params);

    obtains program state for the program target <target>, writing ...

    (add new paragraphs describing the new supported queries)

    If <pname> is PROGRAM_ATTRIB_COMPONENTS_NV or
    PROGRAM_RESULT_COMPONENTS_NV, GetProgramivARB returns a single integer
    holding the number of active attribute or result variable components,
    respectively, used by the program object currently bound to <target>.

    If <pname> is MAX_PROGRAM_ATTRIB_COMPONENTS_NV or
    MAX_PROGRAM_RESULT_COMPONENTS_NV, GetProgramivARB returns a single integer
    holding the maximum number of active attribute or result variable
    components, respectively, supported for programs of type <target>.


Additions to Appendix A of the OpenGL 1.5 Specification (Invariance)

    None.


Additions to the AGL/GLX/WGL Specifications

    None.


GLX Protocol

    The following new rendering commands are sent to the server as part
    of a glXRender request.

    ProgramLocalParameterI4ivNV

        2           28               rendering command length
        2           4303             rendering command opcode
        4           ENUM             target
        4           CARD32           index
        4           INT32            params[0]
        4           INT32            params[1]
        4           INT32            params[2]
        4           INT32            params[3]

    ProgramLocalParameterI4uivNV

        2           28               rendering command length
        2           4305             rendering command opcode
        4           ENUM             target
        4           CARD32           index
        4           CARD32           params[0]
        4           CARD32           params[1]
        4           CARD32           params[2]
        4           CARD32           params[3]

    ProgramEnvParameterI4ivNV

        2           28               rendering command length
        2           4307             rendering command opcode
        4           ENUM             target
        4           CARD32           index
        4           INT32            params[0]
        4           INT32            params[1]
        4           INT32            params[2]
        4           INT32            params[3]

    ProgramEnvParameterI4uivNV

        2           28               rendering command length
        2           4309             rendering command opcode
        4           ENUM             target
        4           CARD32           index
        4           CARD32           params[0]
        4           CARD32           params[1]
        4           CARD32           params[2]
        4           CARD32           params[3]
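
    As an illustration only, the fixed-size layout of
    ProgramLocalParameterI4ivNV given above can be packed as in the following
    C sketch.  The helper name, the caller-supplied buffer, and the use of
    host byte order (assumed to be the byte order in use on the connection)
    are assumptions of the sketch and are not part of the protocol
    definition.

```c
#include <stdint.h>
#include <string.h>

/* Length and opcode taken from the ProgramLocalParameterI4ivNV layout. */
enum {
    CMD_LENGTH_I4IV   = 28,   /* 2 + 2 + 4 + 4 + 4*4 bytes */
    OPCODE_LOCAL_I4IV = 4303
};

/* Hypothetical helper: packs one command into out[] (at least 28 bytes),
 * in host byte order, and returns the number of bytes written. */
static size_t encode_program_local_parameter_i4iv(uint8_t *out,
                                                   uint32_t target,
                                                   uint32_t index,
                                                   const int32_t params[4])
{
    uint16_t length = CMD_LENGTH_I4IV;
    uint16_t opcode = OPCODE_LOCAL_I4IV;
    size_t off = 0;

    memcpy(out + off, &length, 2);  off += 2;   /* rendering command length */
    memcpy(out + off, &opcode, 2);  off += 2;   /* rendering command opcode */
    memcpy(out + off, &target, 4);  off += 4;   /* ENUM target              */
    memcpy(out + off, &index, 4);   off += 4;   /* CARD32 index             */
    memcpy(out + off, params, 16);  off += 16;  /* INT32 params[0..3]       */
    return off;
}
```

    The other three fixed-size commands differ only in the opcode and in
    whether the four parameter words are signed or unsigned.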

    The following new rendering commands are added.  These can be sent as a
    glXRender or glXRenderLarge request.

    ProgramLocalParametersI4ivNV

        2           16+count*4*4     rendering command length
        2           4304             rendering command opcode
        4           ENUM             target
        4           CARD32           index
        4           CARD32           count
        4*count*4   LISTofINT32      params

    If the command is encoded in a glXRenderLarge request, the
    command opcode and command length fields above are expanded to
    4 bytes each:

        4           20+count*4*4     rendering command length
        4           4304             rendering command opcode

    ProgramLocalParametersI4uivNV

        2           16+count*4*4     rendering command length
        2           4306             rendering command opcode
        4           ENUM             target
        4           CARD32           index
        4           CARD32           count
        4*count*4   LISTofCARD32     params

    If the command is encoded in a glXRenderLarge request, the
    command opcode and command length fields above are expanded to
    4 bytes each:

        4           20+count*4*4     rendering command length
        4           4306             rendering command opcode

    ProgramEnvParametersI4ivNV

        2           16+count*4*4     rendering command length
        2           4308             rendering command opcode
        4           ENUM             target
        4           CARD32           index
        4           CARD32           count
        4*count*4   LISTofINT32      params

    If the command is encoded in a glXRenderLarge request, the
    command opcode and command length fields above are expanded to
    4 bytes each:

        4           20+count*4*4     rendering command length
        4           4308             rendering command opcode

    ProgramEnvParametersI4uivNV

        2           16+count*4*4     rendering command length
        2           4310             rendering command opcode
        4           ENUM             target
        4           CARD32           index
        4           CARD32           count
        4*count*4   LISTofCARD32     params

    If the command is encoded in a glXRenderLarge request, the
    command opcode and command length fields above are expanded to
    4 bytes each:

        4           20+count*4*4     rendering command length
        4           4310             rendering command opcode
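
    The length arithmetic shared by the four variable-length commands above
    can be sketched in C as follows.  The function names are hypothetical.
    The second helper only checks the 2-byte length field of a glXRender
    command; it does not model the server's actual maximum request size,
    which is what normally decides when glXRenderLarge is required.

```c
#include <stdint.h>

/* Byte length of a Program{Local,Env}ParametersI4{i,ui}vNV command that
 * carries `count` four-component parameters.  The header is 16 bytes in a
 * glXRender request and 20 bytes in a glXRenderLarge request, where the
 * length and opcode fields grow from 2 to 4 bytes each. */
static uint64_t parameters_command_length(uint32_t count, int render_large)
{
    uint64_t header = render_large ? 20u : 16u;
    return header + (uint64_t)count * 4u * 4u;
}

/* Returns 1 if the glXRender form of the command still fits in its
 * 2-byte "rendering command length" field. */
static int fits_render_length_field(uint32_t count)
{
    return parameters_command_length(count, 0) <= UINT16_MAX;
}
```

    For example, a single parameter occupies 32 bytes as a glXRender command
    and 36 bytes as a glXRenderLarge command.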

    The remaining commands are non-rendering commands.  These commands
    are sent separately (i.e., not as part of a glXRender or
    glXRenderLarge request), using the glXVendorPrivateWithReply
    request:

    GetProgramLocalParameterIivNV
        1           CARD8            opcode (X assigned)
        1           17               GLX opcode (X_GLXVendorPrivateWithReply)
        2           5                request length
        4           1365             vendor specific opcode
        4           GLX_CONTEXT_TAG  context tag
        4           ENUM             target
        4           CARD32           index
      =>
        1           1                reply
        1           CARD8            unused
        2           CARD16           sequence number
        4           4                reply length
        24          CARD32           unused
        16          INT32            params

    GetProgramLocalParameterIuivNV
        1           CARD8            opcode (X assigned)
        1           17               GLX opcode (X_GLXVendorPrivateWithReply)
        2           5                request length
        4           1366             vendor specific opcode
        4           GLX_CONTEXT_TAG  context tag
        4           ENUM             target
        4           CARD32           index
      =>
        1           1                reply
        1           CARD8            unused
        2           CARD16           sequence number
        4           4                reply length
        24          CARD32           unused
        16          CARD32           params

    GetProgramEnvParameterIivNV
        1           CARD8            opcode (X assigned)
        1           17               GLX opcode (X_GLXVendorPrivateWithReply)
        2           5                request length
        4           1367             vendor specific opcode
        4           GLX_CONTEXT_TAG  context tag
        4           ENUM             target
        4           CARD32           index
      =>
        1           1                reply
        1           CARD8            unused
        2           CARD16           sequence number
        4           4                reply length
        24          CARD32           unused
        16          INT32            params

    GetProgramEnvParameterIuivNV
        1           CARD8            opcode (X assigned)
        1           17               GLX opcode (X_GLXVendorPrivateWithReply)
        2           5                request length
        4           1368             vendor specific opcode
        4           GLX_CONTEXT_TAG  context tag
        4           ENUM             target
        4           CARD32           index
      =>
        1           1                reply
        1           CARD8            unused
        2           CARD16           sequence number
        4           4                reply length
        24          CARD32           unused
        16          CARD32           params
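
    The request and reply sizes above follow from the field widths: the
    request is 5 four-byte units (20 bytes), and the reply carries 32 bytes
    of header followed by 4 units (16 bytes) of parameter data.  A minimal C
    sketch of extracting the parameters from a raw reply is shown below.
    The helper names are hypothetical, and the reply is assumed to already
    be in host byte order.

```c
#include <stdint.h>
#include <string.h>

enum {
    REQUEST_UNITS     = 5,   /* "request length", in 4-byte units        */
    REPLY_HEADER      = 32,  /* 1 + 1 + 2 + 4 + 24 bytes before params   */
    REPLY_EXTRA_UNITS = 4    /* "reply length": 16 bytes of params       */
};

/* Size of the request in bytes. */
static size_t request_size_bytes(void)
{
    return (size_t)REQUEST_UNITS * 4u;
}

/* Hypothetical helper: copies the four 32-bit params out of a raw reply.
 * Returns 0 on success, or -1 if the buffer is too short or the
 * reply-length field does not match the layout described above. */
static int decode_get_parameter_reply(const uint8_t *reply, size_t size,
                                      int32_t params[4])
{
    uint32_t reply_length;

    if (size < (size_t)REPLY_HEADER + 4u * (size_t)REPLY_EXTRA_UNITS)
        return -1;
    memcpy(&reply_length, reply + 4, 4);     /* bytes 4..7: reply length */
    if (reply_length != (uint32_t)REPLY_EXTRA_UNITS)
        return -1;
    memcpy(params, reply + REPLY_HEADER, 16);
    return 0;
}
```

    For the unsigned queries, the same 16 bytes would be copied into an array
    of four unsigned 32-bit values instead.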

Errors

    The error INVALID_VALUE is generated by ProgramLocalParameter4fARB,
    ProgramLocalParameter4fvARB, ProgramLocalParameter4dARB,
    ProgramLocalParameter4dvARB, ProgramLocalParameterI4iNV,
    ProgramLocalParameterI4ivNV, ProgramLocalParameterI4uiNV,
    ProgramLocalParameterI4uivNV, GetProgramLocalParameterfvARB,
    GetProgramLocalParameterdvARB, GetProgramLocalParameterIivNV, and
    GetProgramLocalParameterIuivNV if <index> is greater than or equal to the
    number of program local parameters supported by <target>.

    The error INVALID_VALUE is generated by ProgramEnvParameter4fARB,
    ProgramEnvParameter4fvARB, ProgramEnvParameter4dARB,
    ProgramEnvParameter4dvARB, ProgramEnvParameterI4iNV,
    ProgramEnvParameterI4ivNV, ProgramEnvParameterI4uiNV,
    ProgramEnvParameterI4uivNV, GetProgramEnvParameterfvARB,
    GetProgramEnvParameterdvARB, GetProgramEnvParameterIivNV, and
    GetProgramEnvParameterIuivNV if <index> is greater than or equal to the
    number of program environment parameters supported by <target>.

    The error INVALID_VALUE is generated by ProgramLocalParameters4fvNV,
    ProgramLocalParametersI4ivNV, and ProgramLocalParametersI4uivNV if the sum
    of <index> and <count> is greater than the number of program local
    parameters supported by <target>.

    The error INVALID_VALUE is generated by ProgramEnvParameters4fvNV,
    ProgramEnvParametersI4ivNV, and ProgramEnvParametersI4uivNV if the sum of
    <index> and <count> is greater than the number of program environment
    parameters supported by <target>.
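
    The two kinds of range check described above differ slightly: the
    single-parameter commands reject <index> when it is greater than or equal
    to the limit, while the multi-parameter commands reject the update when
    <index> plus <count> exceeds the limit.  The following C sketch models
    both checks.  The function and enum names are hypothetical, <count> is
    assumed to be non-negative, and the sum is formed in 64 bits so that it
    cannot wrap.

```c
#include <stdint.h>

/* Stand-ins for the GL error codes (values as in gl.h). */
enum {
    SKETCH_NO_ERROR      = 0,
    SKETCH_INVALID_VALUE = 0x0501
};

/* Check used by commands that access one parameter. */
static int check_single_index(uint32_t index, uint32_t max_params)
{
    return index >= max_params ? SKETCH_INVALID_VALUE : SKETCH_NO_ERROR;
}

/* Check used by commands that update <count> consecutive parameters
 * starting at <index>. */
static int check_index_range(uint32_t index, uint32_t count,
                             uint32_t max_params)
{
    uint64_t end = (uint64_t)index + count;   /* cannot wrap in 64 bits */
    return end > max_params ? SKETCH_INVALID_VALUE : SKETCH_NO_ERROR;
}
```

    With a limit of 256 parameters, updating 4 parameters starting at index
    252 is accepted, while starting at index 253 is not.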


Dependencies on NV_parameter_buffer_object

    If NV_parameter_buffer_object is not supported, references to program
    parameter buffer variables and bindings should be removed.


Dependencies on ARB_texture_rectangle

    If ARB_texture_rectangle is not supported, references to rectangle
    textures and the RECT and SHADOWRECT texture target identifiers should be
    removed.


Dependencies on EXT_gpu_program_parameters

    If EXT_gpu_program_parameters is not supported, references to the
    Program{Local,Env}Parameters4fvNV commands, which set multiple program
    local or environment parameters in a single call, should be removed.
    These prototypes were included in this spec for completeness only.


Dependencies on EXT_texture_integer

    If EXT_texture_integer is not supported, references to texture lookups
    returning integer values in Section 2.X.4.4 (Texture Access) should be
    removed, and all texture formats are considered to produce floating-point
    values.


Dependencies on EXT_texture_array

    If EXT_texture_array is not supported, references to array textures in
    Section 2.X.4.4 (Texture Access) and elsewhere should be removed, as
    should all references to the "ARRAY1D", "ARRAY2D", "SHADOWARRAY1D", and
    "SHADOWARRAY2D" tokens.


Dependencies on EXT_texture_buffer_object

    If EXT_texture_buffer_object is not supported, references to buffer
    textures in Section 2.X.4.4 (Texture Access) and elsewhere should be
    removed, as should all references to the "BUFFER" tokens.


Dependencies on NV_primitive_restart

    If NV_primitive_restart is supported, index values causing a primitive
    restart are not considered as specifying an End command, followed by
    another Begin.  Primitive restart is therefore not guaranteed to
    immediately update bindings for material properties changed inside a
    Begin/End.  The spec language says they "are not guaranteed to update
    program parameter bindings until the following End command."


New State

                                                         Initial
    Get Value                     Type  Get Command       Value  Description             Sec     Attrib
    ----------------------------  ----  ---------------  ------- ----------------------  ------  ------
    PROGRAM_ATTRIB_COMPONENTS_NV  Z+    GetProgramivARB     -    number of components    6.1.12   -
                                                                 used for attributes
    PROGRAM_RESULT_COMPONENTS_NV  Z+    GetProgramivARB     -    number of components    6.1.12   -
                                                                 used for results

    Table X.20.  New Program Object State.  Program object queries return
    attributes of the program object currently bound to the program target
    <target>.


New Implementation Dependent State

                                                             Minimum
    Get Value                         Type  Get Command       Value   Description           Sec.   Attrib
    --------------------------------  ----  ---------------  -------  --------------------- ------ ------
    MIN_PROGRAM_TEXEL_OFFSET_EXT      Z     GetIntegerv        -8     minimum texel offset  2.X.4.4  -
                                                                      allowed in lookup
    MAX_PROGRAM_TEXEL_OFFSET_EXT      Z     GetIntegerv        +7     maximum texel offset  2.X.4.4  -
                                                                      allowed in lookup
    MAX_PROGRAM_ATTRIB_COMPONENTS_NV  Z+    GetProgramivARB    (*)    maximum number of     6.1.12   -
                                                                      components allowed
                                                                      for attributes
    MAX_PROGRAM_RESULT_COMPONENTS_NV  Z+    GetProgramivARB    (*)    maximum number of     6.1.12   -
                                                                      components allowed
                                                                      for results
    MAX_PROGRAM_GENERIC_ATTRIBS_NV    Z+    GetProgramivARB    (*)    number of generic     6.1.12   -
                                                                      attribute vectors
                                                                      supported
    MAX_PROGRAM_GENERIC_RESULTS_NV    Z+    GetProgramivARB    (*)    number of generic     6.1.12   -
                                                                      result vectors
                                                                      supported
    MAX_PROGRAM_CALL_DEPTH_NV         Z+    GetProgramivARB     4     maximum program       2.X.5    -
                                                                      call stack depth
    MAX_PROGRAM_IF_DEPTH_NV           Z+    GetProgramivARB     48    maximum program       2.X.5    -
                                                                      if nesting
    MAX_PROGRAM_LOOP_DEPTH_NV         Z+    GetProgramivARB     4     maximum program       2.X.5    -
                                                                      loop nesting

    Table X.21:  New Implementation-Dependent Values Introduced by
    NV_gpu_program4.  (*) means that the required minimum is program
    type-specific.  There are separate limits for each program type.
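
    As a sketch of how an application might use the control-flow limits in
    Table X.21, the following C fragment compares the nesting used by a
    program against either the required minimums or limits obtained from a
    specific implementation (for example, via GetProgramivARB).  The
    structure and function names are hypothetical; how the application
    determines its own nesting depths is outside the scope of the sketch.

```c
/* Required minimums from Table X.21. */
enum {
    MIN_CALL_DEPTH = 4,
    MIN_IF_DEPTH   = 48,
    MIN_LOOP_DEPTH = 4
};

/* Deepest nesting used by a program, or the limits of an implementation. */
struct program_nesting {
    int call_depth;
    int if_depth;
    int loop_depth;
};

/* Returns 1 if `used` does not exceed `limit` in any category. */
static int nesting_within(const struct program_nesting *used,
                          const struct program_nesting *limit)
{
    return used->call_depth <= limit->call_depth &&
           used->if_depth   <= limit->if_depth &&
           used->loop_depth <= limit->loop_depth;
}

/* Returns 1 if `used` fits the minimums every implementation must support. */
static int nesting_is_portable(const struct program_nesting *used)
{
    const struct program_nesting minimums = {
        MIN_CALL_DEPTH, MIN_IF_DEPTH, MIN_LOOP_DEPTH
    };
    return nesting_within(used, &minimums);
}
```

    A program that nests subroutine calls five deep exceeds the required
    minimum and should be used only after checking the limits reported by
    the implementation.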


Issues

    (1) How does this extension differ from previous NV_vertex_program and
    NV_fragment_program extensions?

      RESOLVED:

        - This extension provides a uniform set of instructions and bindings.
          Unlike in previous extensions, the set of instructions and bindings
          available is generally the same for every program type.  The only
          exceptions are a small number of instructions and bindings that make
          sense for only one specific program type.

        - This extension supports integer data types and provides a
          full-fledged integer instruction set.

        - This extension supports array variables of all types, including
          temporaries.  Array variables can be accessed directly or indirectly
          (using integer temporaries as indices).

        - This extension provides a uniform set of structured branching
          constructs (if tests, loops, subroutines) that fully support
          run-time condition testing.  Previous versions of NV_vertex_program
          provided unstructured branching.  Previous versions of
          NV_fragment_program provided structured branching constructs, but the
          support was more limited -- for example, looping constructs couldn't
          specify loop counts with values computed at run time.

        - This extension supports geometry programs, which are described in
          more detail in the NV_geometry_program4 extension.

        - This extension provides the ability to specify and use cubemap
          textures with a DEPTH_COMPONENT internal format.  Shadow mapping is
          supported; the Q texture coordinate is used as the reference value
          for comparisons.

    (2) Is this extension backward-compatible with previous NV_vertex_program
    and NV_fragment_program extensions?  If not, what support has been
    removed?

      RESOLVED:  This extension is largely, but not completely,
      backward-compatible.  Functionality removed includes:

        - Unstructured branching:  NV_vertex_program2 included a general
          branch instruction "BRA" that could be used to jump to an arbitrary
          instruction.  The "CAL" instruction could "call" to an arbitrary
          instruction into code that was not necessarily structured as simple
          subroutine blocks.  Arbitrary unstructured branching can be
          difficult to implement efficiently on highly parallel GPU
          architectures, while basic structured branching is not nearly as
          difficult.

          This extension retains the "CAL" instruction but treats each block
          of code between instruction labels as a separate subroutine.  The
          "BRA" instruction and arbitrary branching have been removed.  The
          structured branching constructs in this extension are sufficient to
          implement almost all of the looping/branching support in high-level
          languages ("goto" being the most obvious exception).

        - Address registers:  NV_vertex_program added the notion of address
          registers, which were effectively under-powered integer temporaries.
          The set of instructions used to manipulate address registers was
          severely limited.  NV_vertex_program[23] extended the original
          scalars to vectors and added a few more instructions to manipulate
          address registers.  Fragment programs had no address registers until
          NV_fragment_program2 added the loop counter, which was very similar
          in functionality to vertex program address registers, but even more
          limited.  This extension adds true integer temporaries, which can
          accomplish everything old address registers could do, and much more.
          Address register support was removed to simplify the API.

        - NV_fragment_program2 LOOP construct:  NV_fragment_program2 added a
          LOOP instruction, which let you repeat a block of code <N> times,
          with a parallel loop counter that started at <A> and stepped by <B>
          on each iteration.  This construct was significantly limited in
          several ways -- the loop count had to be constant, and you could
          only access the innermost loop counter in a nested loop.  This
          extension discards the support and retains the simpler "REP"
          construct to implement loops.  If desired, a loop counter can be
          implemented by manipulating an integer temporary.  The "BRK"
          instruction (conditional break) is retained, and a "CONT"
          instruction (conditional continue) is added.  Additionally, the loop
          count need not be a constant.

        - NV_vertex_program and ARB_vertex_program EXP and LOG instructions:
          NV_vertex_program provided EXP and LOG instructions that computed a
          rough approximation of 2^x or log_2(x) and provided some additional
          values that could help refine the approximation.  Those opcodes were
          carried forward into ARB_vertex_program.  Both ARB_vertex_program
          and NV_vertex_program2 provided EX2 and LG2 instructions that
          computed a better approximation.  All fragment program extensions
          also provided EX2 and LG2, but did not bother to include EXP and
          LOG.  On the hardware targeted by this extension, there is no
          advantage to using EXP and LOG, so these opcodes have been removed
          for simplicity.

        - NV_vertex_program3 and NV_fragment_program2 provide the ability to
          do indirect addressing of inputs/outputs when using bindings in
          instructions -- for example:

            MOV R0, vertex.attrib[A0.x+2];      # vertex
            MOV result.texcoord[A0.y], R1;      # vertex
            MOV R2, fragment.texcoord[A0.x];    # fragment

          This extension provides indexing capability, but using named array
          variables instead.

            ATTRIB attribs[] = { vertex.attrib[2..5] };
            MOV R0, attribs[A0.x];
            OUTPUT outcoords[] = { result.texcoord[0..3] };
            MOV outcoords[A0.y], R1;
            ATTRIB texcoords[] = { fragment.texcoord[0..2] };
            MOV R2, texcoords[A0.x];

          This approach makes the set of attribute and result bindings more
          regular.  Additionally, it helps the assembler determine which
          vertex/fragment attributes are actually needed -- when the assembler
          sees constructs like "fragment.texcoord[A0.x]", it must treat *all*
          texture coordinates as live unless it can determine the range of
          values used for indexing.  The named array variable approach
          explicitly identifies which attributes are needed when indexing is
          used.

      Functionality altered includes:

        - The RSQ instruction in the original NV_vertex_program and
          ARB_vertex_program extensions implicitly took the absolute value of
          its operand.  Since the ARB extensions don't have numerics
          guarantees, computing the reciprocal square root of a negative value
          was not meaningful.  To allow for the possibility of taking the
          reciprocal square root of a negative value (which should yield NaN
          -- "not a number"), the RSQ instruction in this extension no
          longer implicitly takes the absolute value of its operand.
          Equivalent functionality can be achieved using the explicit |abs|
          absolute value operator on the operand to RSQ.

        - The results of texture lookups accessing inconsistent textures are
          now undefined, instead of producing a fixed constant vector.


    (3) What should this set of extensions be called?

      RESOLVED:  NV_gpu_program4, NV_vertex_program4, NV_fragment_program4,
      and NV_geometry_program4.  Only NV_gpu_program4 will appear in the
      extension string; the other three specifications exist simply to define
      vertex, fragment, and geometry program-specific features.

      The "gpu_program" name was chosen due to the common instruction set
      intended to run on GPUs.  On previous chip generations, the vertex and
      fragment instruction sets were similar, but there were enough
      differences to package them separately.

      The choice of "4" indicates that this is the fourth generation of
      programmable hardware from NVIDIA.  The GeForce3 and GeForce4 series
      supported NV_vertex_program.  The GeForce FX series supported
      NV_vertex_program2 and added fragment programmability with
      NV_fragment_program.  Around this time, the OpenGL Architecture Review
      Board (ARB) approved ARB_vertex_program and ARB_fragment_program
      extensions, and NVIDIA added NV_vertex_program2_option and
      NV_fragment_program_option extensions exposing GeForce FX features using
      the ARB extensions' instruction set.  The GeForce6 and GeForce7 series
      brought the NV_vertex_program3 and NV_fragment_program2 extensions,
      which extend the ARB extensions further.  This extension adds geometry
      programs, and brings the "version number" for each of these extensions
      up to "4".


    (4) This extension adds integer data type support in programmable
    shaders that were previously float-centric.  Should applications be able
    to pass integer values directly to the shaders, and if so, how does it
    work?

      RESOLVED:  The diagram at the bottom of this issue depicts data flows in
      the GL, as extended by this and related extensions.

      This extension generalizes some state to be "typeless", instead of being
      strongly typed (and almost invariably floating-point) as in the core
      specification.  We introduce a new set of functions to specify GL state
      as signed or unsigned integer values, instead of floating point values.
      These functions include:

        * VertexAttribI*{i,ui}() -- Specify generic vertex attributes as
          integers.  This extension does not create "integer" versions for
          fixed-function attribute functions (e.g., glColor, glTexCoord),
          which remain fully floating-point.

        * Program{Env,Local}ParameterI*{i,ui}() -- Specify environment and
          local parameters as integers.

        * TexImage*() with EXT_texture_integer internal formats -- Specify
          texture images as containing integer data whose values are not
          converted to floating-point values.

        * NV_parameter_buffer_object functions -- Bind (typeless) buffer
          object data stores for use as program parameters.  These buffer
          objects can be loaded with either integer or floating-point data.

        * EXT_texture_buffer_object functions -- Bind (typeless) buffer object
          data stores for use as textures.  These buffer objects can be loaded
          with either integer or floating-point data.

      Each type of program (using NV_gpu_program4 and related extensions) can
      read attributes using any data type (float, signed integer, unsigned
      integer) and write result values used by subsequent stages using any
      data type.

      Finally, there are several new places where integer data can be
      consumed by the GL:

        * NV_transform_feedback -- Stream transformed vertex attribute
          components to a (typeless) buffer object.  The transformed
          attributes can be written as signed or unsigned integers in vertex
          and geometry programs.

        * EXT_texture_integer internal formats and framebuffer objects --
          Provide support for rendering to integer texture formats, where
          final fragment values are treated as signed or unsigned integers,
          rather than floating-point values.

      The diagram below represents a substantial portion of the GL pipeline.
      Each line connecting blocks represents an interface where data is
      "produced" from the GL state or by fixed-function or programmable
      pipeline stages and "consumed" by another pipeline stage.  Each producer
      and consumer is labeled with a data type.  For producers, the
      "(typeless)" designation generally means that the state and/or output
      can be written as floating-point values or as signed or unsigned
      integers.  "(float)" means that the outputs are always written as
      floating-point.  The same distinction applies to consumers --
      "(typeless)" means that the consumer is capable of reading inputs using
      any data type, and "(float)" means that consumer always reads inputs as
      floating-point values.

      To get sane results, applications must ensure that each value passed
      between pipeline stages is produced and consumed using the same data
      type.  If a value is written in one stage as a floating-point value, it
      must be read as a floating-point value as well.  If such a value is read
      as a signed or unsigned integer, its value is considered undefined.  In
      practice, the raw bits used to represent the floating-point value (IEEE
      single-precision floating-point encoding in the initial implementation
      of this spec) will be treated as an integer.
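
      The "raw bits" behavior described above can be illustrated in C.  The
      sketch below assumes that "float" is a 32-bit IEEE single-precision
      value on the host; the function name is hypothetical.  It shows what an
      integer read of a value written as floating-point would typically
      observe, not a behavior guaranteed by this specification.

```c
#include <stdint.h>
#include <string.h>

/* Returns the bit pattern of a single-precision float as an unsigned
 * integer, without any numeric conversion. */
static uint32_t float_bits_as_uint(float value)
{
    uint32_t bits;
    memcpy(&bits, &value, sizeof bits);
    return bits;
}
```

      For example, a parameter written as the floating-point value 1.0 would
      be seen as the integer 0x3F800000 (1065353216), not as 1.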

      Type matching between stages is not enforced by the GL, because the
      overhead of doing so would be substantial.  Such overhead would include:

        * matching the inputs and outputs of each pipeline stage
          (fixed-function or programmable) every time the program
          configuration or fixed-function state changes,

        * tracking the data type of each generic vertex attribute and checking
          it against the vertex program's inputs,

        * tracking the data type of each program parameter and checking it
          against the manner in which the parameters are used in programs,

        * matching color buffers against fragment program outputs.

      Such error checking is certainly valuable, but the additional CPU
      overhead cost is substantial.  Given that current CPUs often have a hard
      time keeping up with high-end GPUs, adding more overhead is a step in
      the wrong direction.  We expect developer tools, such as instrumented
      drivers, to be able to provide type checking on most interfaces.

      The diagram below depicts assembly programmability.  Using vertex,
      geometry, and fragment shaders provided by the OpenGL Shading Language
      (GLSL) isn't substantially different from the assembly interface, except
      that the interfaces between programmable pipeline stages are more
      tightly coupled in GLSL (vertex, geometry, and fragment shaders are
      linked together into a single program object), and that shader variables
      are more strongly typed in GLSL than in the assembly interface.

      In the figure below, the first programmable stage is vertex program
      execution.  All inputs read by the vertex program must be specified in
      the GL vertex APIs (immediate mode or vertex arrays) using a data type
      matching the data type read by the shader.  Additionally,
      vertex programs (and all other program types) can read program
      parameters, parameter buffers, and textures.  In all cases the
      parameter, buffer, or texture data must be accessed in the shader using
      the same data type used to specify the data.  If vertex programs are
      disabled, fixed-function vertex processing is used.  Fixed-function
      vertex processing is fully floating-point, and all the conventional
      vertex attributes and state used by fixed-function are floating-point
      values.

      After vertex processing, an optional geometry program can be executed,
      which reads attributes written by vertex programs (or fixed-function) and
      writes out new vertex attributes.  The vertex attributes it reads must
      have been written by the vertex program (or fixed-function) using a
      matching data type.

      After geometry program execution, vertex attributes can optionally be
      written out to buffer objects using the NV_transform_feedback extension.
      The vertex attributes are written by the GL to the buffer objects using
      the same data type used to write the attribute in the geometry program
      (or vertex program if geometry programs are disabled).

      Then, rasterization generates fragments based on transformed vertices.
      Most attributes written by vertex or geometry programs can be read by
      fragment programs, after the rasterization hardware "interpolates" them.
      This extension allows fragment programs to control how each attribute is
      interpolated.  If an attribute is flat-shaded, it will be taken from the
      output attribute of the provoking vertex of the primitive using the same
      data type.  If an attribute is smooth-shaded, the per-vertex attributes
      will be interpreted as floating-point values and interpolated to
      produce a floating-point result.  One necessary consequence of this is
      that any integer per-fragment attributes must be flat-shaded.  To
      prevent some interpolation type errors, assembly and GLSL fragment
      shaders will not compile if they declare an integer fragment attribute
      that is not flat-shaded.  [NOTE:  While point primitives generally have
      constant attributes, any integer attributes must still be flat-shaded;
      point rasterization may perform (degenerate) floating-point
      interpolation.]

      Fragment programs must read attributes using data types matching the
      outputs of the interpolation or flat-shading operations.  They may write
      one or more color outputs using any data type, but the data type used
      must match the corresponding framebuffer attachments.  Outputs directed
      at signed or unsigned integer textures (EXT_texture_integer) must be
      written using the appropriate integer data type; all other outputs must
      be written as floating-point values.  Note that some of the
      fixed-function per-fragment operations (e.g., blending, alpha test) are
      specified as floating-point operations and are skipped when directed at
      signed or unsigned integer color buffers.


                                     generic               conventional
                                     vertex                  vertex
                                    attributes              attributes
                                       | (typeless)             | (float)
                                       |                        |
                                       |                        |
                                       | +----------------------+
         program                       | |                      |
        parameters ----+               | |                      |
        (typeless)     |               | | (typeless)           | (float)
                       |               V V                      V
         constant      +-+----------> vertex              fixed-function
         buffers   ----+ |(typeless)  program                 vertex
        (typeless)     | |              |                       |
                       | |              | (typeless)            | (float)
         textures  ----+ |              V                       |
        (typeless)       |              |<----------------------+
            |            |              |
            |            |              +---------------+
            |            |              |               |
            |            |              | (typeless)    |
            |            |              V               |
            |            +---------> geometry           |
            |            |(typeless) program            |
            |            |              |               |
            |            |              | (typeless)    |
            |            |              V               |
            |            |              |<--------------+
            |            |              |
            |            |              |
            |            |              +-----------------+
            |            |              |                 |(typeless)
            |            |              |                 v
            |            |              |             transform
            |            |              |             feedback
            |            |              |              buffers
            |            |              |
            |            |              |
            |            |              +-----------------------+
            |            |              |                       |
            |            |              | (float)               | (typeless)
            |            |              V                       V
            |            |         interpolated               flat
            |            |          attributes             attributes
            |            |              |                       |
            |            |              | (float)               | (typeless)
            |            |              V                       |
            |            |              |<----------------------+
            |            |              |
            |            |              +-----------------------+
            |            |              |                       |
            |            |              | (typeless)            | (float)
            |            |(typeless)    V                       V
            |            +---------> fragment     +------> fixed-function
            |                        program      |(float)   fragment
            |                           |         |             |
            +--------------------------/|/--------+             |
                                        |                       |
                                        | (typeless)            | (float)
                                        V                       |
                                        |<----------------------+
                                        |
                                        +-----------------------+------ ....
                                        |                       |
                                        | (typeless)            | (typeless)
                                        V                       V
                                      color                   color
                                    attachment              attachment
                                        0                       1


    (5) Instructions can operate on signed integer, unsigned integer, and
    floating-point values.  Some operations make sense on all three data
    types.  How is this supported, and what type checking support is provided
    by the assembler?

      RESOLVED:  One important property of the instruction set is that the
      data type for all operands and the result is fully specified by the
      instructions themselves.  For instructions (such as ADD) that make sense
      for both integer and floating-point values, an optional data type
      modifier is provided to indicate which type of operation should be
      performed.  For example, "ADD.S", "ADD.U", and "ADD.F" add signed
      integers, unsigned integers, or floating-point values, respectively.  If
      no data type modifier is provided, ".F" is assumed if the instruction
      can apply to floating-point values and ".S" is assumed otherwise.

      To help identify errors where the wrong data type is used -- for
      example, adding integer values in an ADD instruction that omits a data
      type modifier and thus defaults to "ADD.F" -- variables may be declared
      with optional data type modifiers.  In the following code:

        INT TEMP a;
        UINT TEMP b;
        FLOAT TEMP c;
        TEMP d;

      "a", "b", "c", and "d" are declared as temporary variables holding
      signed integer, unsigned integer, floating-point, and typeless values.
      Since each instruction fully specifies the data type of each operand and
      its result, these data types can be checked against the data type
      assigned to the variables operated on.  If the types don't match, and
      the variable is not typeless, an error is reported.  The opcode modifier
      ".NTC" can be used to ignore such errors on a per-opcode basis, if
      required.

      Note that when bindings are used directly in instructions, they are
      always considered typeless for simplicity.  Some fixed-function bindings
      have an obvious data type, but other bindings (e.g., program parameters)
      can hold either integer or floating-point values, depending on how they
      were specified.

      Variable data types are optional.  Typeless variables are provided
      because some programs may want to reuse the same variable in several
      places with different data types.

    (6) Should both signed (INT) and unsigned integer (UINT) data types be
    provided?

      RESOLVED:  Yes.  Signed and unsigned integer operations are supported.
      Providing both "INT" and "UINT" variable modifiers distinguishes
      between signed and unsigned values for type checking purposes, to
      ensure that unsigned values aren't read as signed values and vice
      versa.

      This specification says that if a value is read as a signed integer but
      was written as an unsigned integer, the value returned is undefined.
      However, signed and unsigned integers are interchangeable in practice,
      except for very large unsigned integers (which can't be represented as
      signed values of the equivalent size) or negative signed integers.

      If programs know that they won't generate negative or very large values,
      signed and unsigned integers can be used interchangeably.  To avoid type
      errors in the assembler in this case, typeless variables can be used.
      Or the ".NTC" modifier can be used when appropriate.

    (7) Integer and floating-point constants are supported in the instruction
    set.  Integer constants might be interpreted to mean either "real integer"
    values or floating-point values.  How are they supported?

      RESOLVED:  When an obvious floating-point constant is specified (e.g.,
      "3.0"), the developer's intent is clear.  If you try to use a
      floating-point value in an instruction that wants an integer operand, or
      a declaration of an integer parameter variable, the program will fail to
      load.  An integer constant used in an instruction isn't quite as clear.
      But its meaning can be easily inferred because the operand types of
      instructions are well-known at compile time.  An integer multiply
      involving the constant "2" will interpret the "2" as an integer.  A
      floating-point multiply involving the same constant "2" will interpret
      it as a floating-point value.

      The only real problem is for a parameter declaration that is typeless.
      For typed variables, the intent is clear:

        INT PARAM two = 2;               # use integer 2
        FLOAT PARAM twoPt0 = 2;          # use floating-point 2.0

      For typeless variables, there's no context to go on:

        PARAM two = 2;                   # 2?  2.0?

      This extension is intended to be largely upward-compatible with
      ARB_vertex_program, ARB_fragment_program, and the other extensions built
      on top of them.  In all of these, the previous declaration is legal and
      means "2.0".  For compatibility, we choose to interpret integer
      constants in this case as floating-point values.  The assembler in the
      NVIDIA implementation will issue a warning if this case ever occurs.

      This extension does not provide decoration of integer constant values --
      we considered adding suffixed integers such as "2U" to mean "2, and
      don't even think about converting me to a float!".  We expect that it
      will be sufficient to use the "INT" or "FLOAT" modifiers to disambiguate
      effectively.

    (8) Should hexadecimal constants (e.g., 0x87A3 or 0xFFFFFFFF) be supported?

      RESOLVED:  Yes.

    (9) Should we provide data type modifiers with explicit component sizes?
    For example, "INT8", "FLOAT16", or "INT32".  If so, should we provide a
    mechanism to query the size (in bits) of a variable, or of different
    variable types/qualifiers?

      RESOLVED:  No.

    (10) Should this extension provide better support for array variables?

      RESOLVED:  Yes; array variables of all types are allowed.

      In ARB_vertex_program, program parameter (constant) variables could be
      addressed as arrays.  Temporary variables, vertex attributes, and vertex
      results could not be declared as arrays.

      In NV_vertex_program3 and NV_fragment_program2, relative addressing was
      supported in program bindings:

        MOV R0, vertex.attrib[A0.x];            # vertex
        MOV result.texcoord[A0.x], R0;          # vertex
        MOV R0, fragment.texcoord[A0.x];        # fragment -- inside LOOP

      Explicitly declared attribute or result arrays were not supported, and
      temporaries could also not be arrays.

      This extension allows users to declare attribute, result, and temporary
      arrays such as:

        ATTRIB attribs[] = { vertex.attrib[7..11] };
        TEMP scratch[10];
        RESULT texcoords[] = { result.texcoord[0..3] };

      Additionally, the relative addressing mechanisms provided by
      NV_vertex_program3 and NV_fragment_program2 are NOT supported in this
      extension -- instead, declared array variables are the only way to get
      relative addressing.  Using declared arrays allows the assembler to
      identify which attributes will actually be used.  An expression like
      "vertex.texcoord[A0.x]" doesn't identify which texture coordinates are
      referenced, and the assembler must be conservative in this case and
      assume that they all are.

    (11) Is relative addressing of temporaries allowed?

      RESOLVED:  Yes.  However, arrays of temporaries may end up being stored
      in off-chip memory, and may be slower to access than non-array
      temporaries.

    (12) Should this extension add bindings to pass generic attributes between
    vertex, geometry, and fragment programs, or are texture coordinates
    sufficient?

      RESOLVED:  While texture coordinates have been used in the past, generic
      attributes should be provided.

      The assembler provides a large set of bindings and automatically
      eliminates generic attributes or components that are unused.  At each
      interface between programs, there is an implementation-dependent limit
      on the number of attribute components that can be passed.

      There are several reasons that this approach was chosen.  First, if the
      number of attributes that can be passed between program stages exceeds
      the number of existing texture coordinate sets supported when specifying
      vertices, a second implementation-dependent number of texture
      coordinates would need to be exposed to cover the number supported
      between stages.
      Second, the mechanisms described above reduce or eliminate the need to
      pack attributes into four component vectors.  Third, "texture
      coordinates" that have been historically used for texture lookups don't
      need to be used to pass values that aren't used this way.

    (13) The structured branching support in NV_fragment_program2 provides a
    REP instruction that says to repeat a block of code <N> times, as well as
    a LOOP instruction that does the same, but also provides a special loop
    counter variable.  What sort of looping mechanism should we provide here?

      RESOLVED:  Provide only the REP instruction.  The functionality provided
      by the LOOP instruction can be easily achieved by using an integer
      temporary as the loop index.  This avoids two annoyances of the old LOOP
      models:  (a) the loop index (A0.x) is a special variable name, while all
      other variables are declared normally and (b) instructions can only
      access the loop index of the innermost loop -- loop indices at higher
      nesting levels are not accessible.

      One other option was considered -- a "LOOPV" instruction (LOOP with a
      variable), where the program specifies a variable name and component to
      hold the loop index, instead of using the implicit variable name "A0.x".
      In the end, it was decided that using an integer temporary as a loop
      counter was sufficient.

    (14) The structured branching support in NV_fragment_program2 provides a
    REP instruction that requires a loop count.  Some looping constructs may
    not have a definite loop count, such as a "while" statement in C.  Should
    this construct be supported, and if so, how?

      RESOLVED:  The REP instruction is extended to make the loop count
      optional.  If no loop count is provided, the REP instruction specifies a
      loop that can only be exited using the BRK (break) or RET instructions.
      To avoid obvious infinite loops, an error will be reported if a
      REP/ENDREP block contains no BRK instruction at the current nesting
      level and no RET instruction at any nesting level.

      To implement a loop like "while (value < 7.0) ...", code such as the
      following can be used:

        TEMP cc;                        # dummy variable
        REP;
          SLT.CC cc.x, value.x, 7.0;    # compare value.x to 7.0, set CC0
          BRK EQ.x;                     # break out if not true (result == 0)
          ...
          ...                           # presumably update value!
          ...
        ENDREP;

    (15) The structured branching support in NV_fragment_program2 provides a
    BRK instruction that operates like C's "break" statement.  Should we
    provide something similar to C's "continue" statement, which skips to the
    next iteration of the loop?

      RESOLVED:  Yes, a new CONT opcode is provided for this purpose.

    (16) Can the BRK or CONT instructions break out of multiple levels of
    nested loops at once?

      RESOLVED:  No.  BRK and CONT only exit the current nesting level.  To
      break out of multiple levels of nested loops, multiple BRK/CONT
      instructions are required.

    (17) For REP instructions, is the loop counter reloaded on each iteration
    of the loop?

      RESOLVED:  No.  The loop counter is loaded once at the top of the loop,
      compared to zero at the top of the loop, and decremented when each loop
      iteration completes.  A program may overwrite the variable used to
      specify the initial value of the loop counter inside the loop without
      affecting the number of times the loop body is executed.
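
      As a non-normative illustration, the counting rule above can be modeled
      in C.  The function name and the loop body are hypothetical and only
      mirror the prose:  the count is latched once, compared to zero before
      each iteration, and decremented when the iteration completes.

```c
/* Hypothetical model of "REP count; ... ENDREP;" (not part of the spec). */
static int model_rep_iterations(int initial_count)
{
    int count_var  = initial_count; /* variable named in the REP instruction */
    int counter    = count_var;     /* loop counter, loaded once */
    int iterations = 0;

    while (counter > 0) {           /* compared to zero at the top */
        count_var = 1000;           /* body overwrites the source variable */
        iterations++;
        counter--;                  /* decremented when the iteration ends */
    }
    (void)count_var;                /* the overwrite never reaches counter */
    return iterations;
}
```

      In this model, overwriting the variable inside the loop body has no
      effect on the iteration count, matching the resolution above.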

    (18) How are floating-point values represented in this extension?  What
    about floating-point arithmetic operations?

      RESOLVED:  In the initial hardware implementation of this extension,
      floating-point values are represented using the standard 32-bit IEEE
      single-precision encoding, consisting of a sign bit, 8 exponent bits,
      and 23 mantissa bits.  Special encodings for NaN (not a number), +/-INF
      (infinity), and positive and negative zero are supported.  Denorms
      (nonzero values with magnitude less than 2^-126, which have an exponent
      encoding of "0" and no implied leading one) are supported, but may be
      flushed to zero, preserving the sign bit of the original value.
      Arithmetic operations are carried out at single-precision using normal
      IEEE floating-point rules, including special rules for generating
      infinities, NaNs, and zeros of each sign.
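
      The encoding described above can be inspected with a short C sketch.
      The helper names are hypothetical, and the flush routine shows only one
      permitted behavior (denorms "may be" flushed); it assumes the host
      "float" type uses the same 32-bit IEEE encoding.

```c
#include <stdint.h>
#include <string.h>

/* Return the raw 32-bit encoding of a float (1 sign, 8 exponent,
 * 23 mantissa bits). */
static uint32_t model_float_bits(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    return bits;
}

/* Flush a denorm to zero while preserving its sign bit. */
static float model_flush_denorm(float f)
{
    uint32_t bits     = model_float_bits(f);
    uint32_t exponent = (bits >> 23) & 0xFFu;
    uint32_t mantissa = bits & 0x007FFFFFu;
    float out;

    if (exponent == 0u && mantissa != 0u) {
        bits &= 0x80000000u;        /* keep only the sign bit */
    }
    memcpy(&out, &bits, sizeof out);
    return out;
}
```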

      Floating-point temporaries declared as "SHORT" may be, but are not
      necessarily, stored as 16-bit "fp16" values (sign bit, five exponent
      bits, ten mantissa bits), as specified in the NV_float_buffer and
      ARB_half_float_pixel extensions.

    (19) Should we provide a method to declare how fragment attributes are
    interpolated?  It is possible to have flat-shaded attributes,
    perspective-corrected attributes, and centroid-sampled attributes.

      RESOLVED:  Yes.  Fragment program attribute variable declarations may
      specify the "FLAT", "NOPERSPECTIVE", and "CENTROID" modifiers.

      These modifiers are documented in detail in the NV_fragment_program4
      specification.

    (20) Should vertex and primitive identifiers be supported?  If so, how?

      RESOLVED:  A vertex identifier is available as "vertex.id" in a vertex
      program.  The vertex ID is equal to the value effectively passed to
      ArrayElement when the vertex is specified, and is defined only if vertex
      arrays are used with buffer objects (VBOs).

      A primitive identifier is available as "primitive.id" in a geometry or
      fragment program.  The primitive ID is equal to the number of primitives
      processed since the last implicit or explicit call to glBegin().

      See the NV_vertex_program4 spec for more information on vertex IDs, and
      the NV_geometry_program4 or NV_fragment_program4 specs for more
      information on primitive IDs.

    (21) For integer opcodes, should a bitwise inversion operator "~" be
    provided, analogous to the existing negation operator?

      RESOLVED:  No.  If this operator were provided, it might allow a program
      to evaluate the expression "a&(~b)" using a single instruction:

        AND.U a, a, ~b;

      Instead, it is necessary to do something like:

        UINT TEMP t;
        NOT.U t, b;
        AND.U a, a, t;

      If necessary, this functionality could be added in a subsequent
      extension.

    (22) What happens if you negate or take the absolute value of the
    biggest-magnitude negative integer?

      RESOLVED:  Signed integers are represented using two's complement
      representation.  For 32-bit integers, the largest possible value is
      2^31-1; the smallest possible value is -2^31.  There is no way to
      represent 2^31, which is what these operators "should" return.  The
      value returned in this case is the original value of -2^31.
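
      The wraparound described above can be reproduced in C as a
      non-normative sketch.  The helper names are hypothetical.  Unsigned
      arithmetic is used for the negation because negating INT32_MIN directly
      is undefined behavior in C; the conversion back to int32_t assumes the
      two's complement wrap provided by mainstream compilers.

```c
#include <stdint.h>

/* Two's complement negation: 0 - bits wraps modulo 2^32. */
static int32_t model_neg_s32(int32_t v)
{
    uint32_t bits = (uint32_t)v;
    return (int32_t)(0u - bits);
}

/* Absolute value built on the wrapping negation above. */
static int32_t model_abs_s32(int32_t v)
{
    return (v < 0) ? model_neg_s32(v) : v;
}
```

      Both helpers return -2^31 unchanged when given -2^31, and behave
      conventionally for every other input.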

    (23) How do condition codes work?  How are they different from those
    provided in previous NVIDIA extensions?

      RESOLVED:  There are two condition codes -- CC0 and CC1 -- each of which
      is a four-component vector.  The condition codes are set based on the
      result of an instruction that specifies a condition code update
      modifier.  Examples include:

        ADD.S.CC  R0, R1, R2;       # add signed integers R1 and R2, update
                                    #   CC0 based on the result, write the
                                    #   final value to R0
        ADD.F.CC1 R3, R4, R5;       # add floats R4 and R5, update CC1 based
                                    #   on the result, write the final value
                                    #   to R3
        ADD.U.CC0 R6.xy, R7, R8;    # add unsigned integers R7 and R8, update
                                    #   CC0 (x and y components) based on the
                                    #   result, write the final value to R6
                                    #   (x and y components)

      Condition codes can be used for conditional writes, conditional
      branches, or other operations.  The condition codes aren't used
      directly, but are instead used with a condition code test such as "LT"
      (less than) or "EQ" (equal to).  Examples include:

        MOV R0 (GT.x), R1;          # move R1 to R0 only if the x component of
                                    #   CC0 indicates a result of ">0"
        MOV R2 (NE1), R3;           # component-wise move of R3 to R2 if the
                                    #   corresponding component of CC1
                                    #   indicates a result of "!=0"
        IF LE0.xyxy;                # execute the block of code if the x or
          ...                       #   y components of CC0 indicate a result
        ENDIF;                      #   of "<=0"
        REP;
          ...
          BRK EQ1.xyzx;             # break out of loop if the x, y, or z
        ENDREP;                     #   components of CC1 indicate a result of
                                    #   "==0".

      Previous NVIDIA extensions provide eight tests, which are still
      supported here.  The tests "EQ" (equal), "GE" (greater/equal), "GT"
      (greater than), "LE" (less/equal), "LT" (less than), and "NE" (not
      equal) can be used to determine the relation of the result used to set
      the condition code with zero.  The tests "TR" (true) and "FL" (false)
      are special tests that always evaluate to true or false, respectively.

      For floating-point results, a NaN (not a number) encoding causes the
      "NE" condition to evaluate to TRUE and all other conditions to evaluate
      to FALSE.  IEEE encodings for "negative" and "positive" zero are both
      treated as equal to zero.

      Condition codes are implemented as a set of flags, which are set
      depending on the type of operation, as described in the spec.

      For instructions that return floating-point or signed integer values,
      the normal condition code tests reliably indicate the relationship of
      the result to zero.  For instructions that return unsigned values, the
      condition codes are a bit more complicated.  For example, the sign flag
      is set if the most significant bit of the result written is set.  As a
      result, very large unsigned integer values (e.g., 0x80000000 -
      0xFFFFFFFF) are effectively treated as negative values.  Condition code
      tests should be used with care with unsigned results -- to test if an
      unsigned integer is ">0", use a sequence like:

        MOV.U.CC R0, R1;            # move R1 to R0, set condition code
        IF NE;                      # test if the result is "!=0", a very
          ...                       #   large value might fail "GT"!
        ENDIF;

      This extension provides a number of additional condition code tests
      useful for different floating-point or integer operations:

        * NAN (not a number) is true if a floating-point result is a NaN.  LEG
          (less, equal to, or greater) is the opposite of NAN.

        * CF (carry flag) is true if an unsigned add overflows, or if an
          unsigned subtract produces a non-negative value.  NCF (no carry
          flag) is the opposite of CF.

        * OF (overflow flag) is true if a signed add or subtract overflows.
          NOF (no overflow flag) is the opposite of OF.

        * SF (sign flag) is true if the sign flag is set.  NSF (no sign flag)
          is the opposite of SF.

        * AB (above) is true if an unsigned subtract produces a positive
          result.  BLE (below or equal) is the opposite of AB, and is true if
          an unsigned subtract produces a negative result or zero.  Note that
          CF can be used to test if the result is greater than or equal to
          zero, and NCF can be used to test if the result is less than zero.
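
      As a non-normative illustration, the tests listed above can be modeled
      in C for an unsigned subtract "a - b".  The structure and function names
      are hypothetical, and the flag formulas are a reading of the
      descriptions above (CF as "no borrow", AB as "strictly greater"), not
      text taken from the instruction definitions.

```c
#include <stdint.h>

/* Hypothetical flag record; field names mirror the tests in the text. */
typedef struct {
    uint32_t result;  /* a - b, wrapped to 32 bits */
    int sf;           /* sign flag: most significant bit of the result */
    int cf;           /* carry flag: mathematical result is >= 0 */
    int ab;           /* above: mathematical result is > 0 */
    int ble;          /* below or equal: opposite of AB */
} model_usub_flags;

static model_usub_flags model_usub(uint32_t a, uint32_t b)
{
    model_usub_flags f;
    f.result = a - b;
    f.sf     = (int)(f.result >> 31);
    f.cf     = (a >= b);
    f.ab     = (a > b);
    f.ble    = !f.ab;
    return f;
}
```

      In this model, subtracting 0 from 0xFFFFFFFF sets both SF and AB, which
      shows why sign-based tests such as "GT" are unreliable for large
      unsigned results while AB and CF remain meaningful.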

    (24) How do the "set on" instructions (SEQ, SGE, SGT, SLE, SLT, SNE) work
    with integer values and/or condition codes?

      RESOLVED:  "Set on" instructions comparing signed or unsigned integer
      values return zero if the condition is false, and an integer with all
      bits set if the condition is true.  If the result is signed, it is
      interpreted as -1.  If the result is unsigned, it is interpreted as the
      largest unsigned value (0xFFFFFFFF for 32-bit integers).  This is
      different from the floating-point "set on", which is defined to return
      1.0.

      This specific result encoding was chosen so that bitwise operators (NOT,
      AND, OR, XOR) can be used to evaluate boolean expressions.
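
      A short C sketch illustrates why the all-bits-set encoding composes with
      bitwise operators.  The helper names are hypothetical stand-ins for
      unsigned SLT and SGE followed by an AND; they are not part of this
      extension.

```c
#include <stdint.h>

/* Integer "set on less than": all bits set if true, zero if false. */
static uint32_t model_slt_u(uint32_t a, uint32_t b)
{
    return (a < b) ? 0xFFFFFFFFu : 0u;
}

/* Integer "set on greater or equal", same encoding. */
static uint32_t model_sge_u(uint32_t a, uint32_t b)
{
    return (a >= b) ? 0xFFFFFFFFu : 0u;
}

/* (lo <= v) && (v < hi), evaluated with a bitwise AND of two masks. */
static uint32_t model_in_range_u(uint32_t v, uint32_t lo, uint32_t hi)
{
    return model_sge_u(v, lo) & model_slt_u(v, hi);
}
```

      Because TRUE and FALSE are full-width masks, the AND of two results is
      again a valid TRUE or FALSE value, and NOT maps one to the other.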

      When performing condition code tests on the results of an integer "set
      on" instruction, keep in mind that a TRUE result has the most
      significant bit set and will be interpreted as a negative value.  To
      test if a condition is true, use "NE" (!=0).  A condition code test of
      "GT" will always fail if the condition code was written by an integer
      "set on" instruction.

    (25) What new texture functionality is provided?

      RESOLVED:  Several new features are provided.

      First, the TXF (texel fetch) instruction allows programs to access a
      texture map like a normal array.  Integer coordinates identifying an
      individual texel and LOD are provided, and the corresponding texture
      data is returned without filtering of any type.

      Second, the TXQ (texture size query) instruction allows programs to
      query the size of a specified level of detail of a texture.  This
      feature allows programs to perform computations dependent on the size of
      the texture without having to pass the size as a program parameter or
      via some other mechanism.

      Third, applications may specify a constant texel offset in a texture
      instruction that moves the texture sample point by the specified number
      of texels.  This offset can be used to perform custom texture filtering,
      and is also independent of the size of the texture LOD -- the same
      offsets are applied, regardless of the mipmap level.

      Fourth, shadow mapping is supported for cube map textures.  The first
      three coordinates are the normal (s,t,r) coordinates for a cube map
      texture lookup, and the fourth component is a depth reference value that
      can be compared to the depth value stored in the texture.

    (26) What "consistency" requirements are in effect for textures accessed
    via the TXF (texel fetch) instruction?

      UNRESOLVED:  The texture must be usable for regular texture mapping
      operations -- if texture sizes or formats are inconsistent and a
      mipmapped min filter is used, the results are undefined.

    (27) How does the TXF instruction work with bordered textures?

      RESOLVED:  The entire image can be accessed, including the border
      texels.  For a 64x64 2D texture plus border (66x66 overall), the lower
      left border texel is accessed using the coordinates (-1,-1); the upper
      right border texel is accessed using the coordinates (64,64).

    (28) What should TXQ (texture size query) return for "irrelevant" texture
    sizes (e.g., height of a 1D texture)?  Should it return any other
    information at the same time?

      RESOLVED:  This specification leaves all "extra" components undefined.

    (29) How do texture offsets interact with cubemap textures?

      RESOLVED:  They are not supported in this extension.

    (30) How do texture offsets interact with mipmapped textures?

      RESOLVED:  The texture offsets are added after the (s,t,r) coordinates
      have been divided by q (if applicable) and converted to (u,v,w)
      coordinates by multiplying by the size of the selected texture level.
      The offsets are added to the (u,v,w) coordinates, and always move the
      sample point by an integral number of texel coordinates.  If multiple
      mipmaps are accessed, the sample point in each mipmap level is moved by
      an identical offset.  The applied offsets are independent of the
      selected mipmap level.
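
      The order of operations above can be sketched in C for a single
      coordinate.  The function and parameter names are hypothetical, and
      wrap modes and filtering are deliberately left out.

```c
/* Compute the unnormalized u coordinate for one mipmap level:
 * divide by q, scale by the level size, then add the texel offset. */
static float model_offset_u(float s, float q, int level_size, int offset)
{
    float projected = s / q;                 /* projection, if applicable */
    float u = projected * (float)level_size; /* scale to texel space */
    return u + (float)offset;                /* integral texel offset */
}
```

      Calling the helper with the same offset and two different level sizes
      moves the sample point by the same number of texels in each level,
      which is the behavior described above.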

    (31) How do shadow cube maps work?

      UNRESOLVED:  An application can define a cube map texture with a
      DEPTH_COMPONENT internal format, and then render a scene using the cube
      map faces as the depth buffer(s).  When rendering, the projection should
      be set up using the "center" of the cubemap as the eye, and using a
      normal projection matrix.  When applying the shadow map, the fragment
      program should read the (x,y,z) eye coordinates, compute the length of
      the major axis (MAX(|x|,|y|,|z|)), and then transform this coordinate to
      [0,1] space using the same parameters used to derive Z in the projection
      matrix.  A 4-component vector consisting of x, y, z, and this computed
      depth value should be passed to the texture lookup, and normal shadow
      mapping operations will be performed.

      This issue should include the math needed to do this computation and
      sample code.
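
      Pending that, one possible form of the computation is sketched below in
      C.  This is not part of the specification:  it assumes each cube face
      was rendered with a standard perspective projection (as produced by
      Frustum) with near and far distances n and f, the default [0,1] depth
      range, and a non-zero major axis.  All names are hypothetical.

```c
static float model_absf(float v)          { return (v < 0.0f) ? -v : v; }
static float model_maxf(float a, float b) { return (a > b) ? a : b; }

/* Map the major-axis distance of (x,y,z) to a [0,1] depth reference. */
static float model_cube_shadow_depth(float x, float y, float z,
                                     float n, float f)
{
    /* Length of the major axis, MAX(|x|,|y|,|z|). */
    float ma = model_maxf(model_absf(x),
                          model_maxf(model_absf(y), model_absf(z)));
    /* Normalized device depth for an eye-space distance of ma. */
    float z_ndc = (f + n) / (f - n) - (2.0f * f * n) / ((f - n) * ma);
    /* Default depth range maps [-1,1] to [0,1]. */
    return 0.5f * z_ndc + 0.5f;
}
```

      Under these assumptions, a point whose major-axis distance equals n maps
      to a depth of 0 and one at distance f maps to a depth of 1.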

    (32) Integer multiplies can overflow by a lot.  Should there be some way
    to return the high part of both unsigned and signed integer multiplies?

      RESOLVED:  Yes.  The ".HI" modifier is provided to return the 32 MSBs
      of a 32x32 integer multiply.  The instruction sequence:

        INT TEMP R0, R1, R2, R3;
        MUL.S    R0, R2, R3;
        MUL.S.HI R1, R2, R3;

      will do a 32x32 signed integer multiply of R2 and R3, with the 32 LSBs
      of the 64-bit result in R0 and the 32 MSBs in R1.
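
      The split of the 64-bit product can be modeled in C as a non-normative
      sketch.  The helper names are hypothetical; they correspond to "MUL.S"
      and "MUL.S.HI" only in the sense described above, and the conversions
      back to int32_t assume two's complement wrap.

```c
#include <stdint.h>

/* 32 LSBs of the 64-bit signed product. */
static int32_t model_mul_lo_s32(int32_t a, int32_t b)
{
    int64_t product = (int64_t)a * (int64_t)b;
    return (int32_t)(uint32_t)((uint64_t)product & 0xFFFFFFFFu);
}

/* 32 MSBs of the 64-bit signed product. */
static int32_t model_mul_hi_s32(int32_t a, int32_t b)
{
    int64_t product = (int64_t)a * (int64_t)b;
    return (int32_t)(uint32_t)((uint64_t)product >> 32);
}
```

      For example, multiplying 2^30 by 8 gives 2^33, so the low word is 0 and
      the high word is 2.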

    (33) Should there be any other special multiplication modifiers?

      RESOLVED:  Yes.  The ".S24" and ".U24" modifiers allow for signed and
      unsigned integer multiplies where both operands are guaranteed to fit in
      the least significant 24 bits.  On some architectures supporting this
      extension, ".S24" and ".U24" integer multiplies may be faster than
      general-purpose ".S" and ".U" multiplies.  If either value doesn't fit
      in 24 bits, the results of the operation are undefined --
      implementations may, but are not required to, ignore the MSBs of the
      operands if ".S24" or ".U24" is specified.
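
      The following C sketch illustrates the operand requirement.  The helper
      names are hypothetical, and the signed range check assumes that "fits
      in 24 bits" means a 24-bit two's complement value.  The last function
      models only one permitted behavior (ignoring the MSBs of the operands);
      implementations are not required to behave this way.

```c
#include <stdint.h>

/* Nonzero if v can be used as a ".U24" operand with defined results. */
static int fits_u24(uint32_t v)
{
    return (v >> 24) == 0;
}

/* Nonzero if v can be used as a ".S24" operand with defined results
 * (assumption: 24-bit two's complement range). */
static int fits_s24(int32_t v)
{
    return v >= -(1 << 23) && v < (1 << 23);
}

/* One permitted ".U24" behavior: the MSBs of both operands are ignored and
 * the low 32 bits of the product are returned. */
static uint32_t mul_u24_model(uint32_t a, uint32_t b)
{
    return (a & 0x00FFFFFFu) * (b & 0x00FFFFFFu);
}
```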

    (34) This extension provides subroutines, but doesn't provide a stack to
    push and pop parameters.  How do we deal with this?  NV_vertex_program3
    supported PUSHA/POPA instructions to push and pop address registers.

      RESOLVED:  No explicit stack is required.  A program can implement a
      stack by allocating a temporary array plus a single integer temporary to
      use as the stack "pointer".  For example:

        TEMP stack[256];                # 256 4-component vectors
        INT TEMP sp;                    # sp.x == stack pointer
        INT TEMP cc;                    # condition code results

        function:
          SGE.S.CC cc.x, sp.x, 256;     # compute stackPointer >= 256
          RET (NE.x);                   # return if TRUE (stack is full)
          MOV stack[sp.x], R0;          # push R0 onto the stack
          ADD.S sp.x, sp.x, 1;
          ...
          SUB.S sp.x, sp.x, 1;          # pop R0 off the stack
          MOV R0, stack[sp.x];
          RET;

    (35) Should we provide new vector semantics for previously-defined opcodes
    (e.g., LG2 computes a component-wise logarithm)?

      RESOLVED:  Not in this extension.  The instructions we define here are
      compatible with the vector or scalar nature of previously defined
      opcodes.  This simplifies the implementation of an assembler that needs
      to support both old and new instruction sets.

    (36) Should it really be undefined to read from a register storing data of
    one type with an instruction of the other type (e.g., to read the bits of
    a floating-point number as an unsigned integer)?

      RESOLVED:  The spec describes undefined results for simplicity.  In
      practice, mixing data types can be done, where signed integers are
      represented as two's complement integers and floating-point numbers are
      represented using IEEE single-precision representation.  For example:

        TEMP R0, R1;                    # typeless
        MOV.U R0, 0x3F800000;           # R0 = 1.0
        MOV.U R1, 0xBF800000;           # R1 = -1.0
        MUL.F R0, R0, R1;               # R0 = -1 * 1 = -1 (0xBF800000)
        XOR.U R0, R0, R1;               # R0 = 0xBF800000 ^ 0xBF800000 = 0
        NOT.U R0, R0;                   # R0 = 0xFFFFFFFF
        I2F.S R0, R0;                   # R0 = -1.0 (0xFFFFFFFF = -1 signed)
        SEQ.F R0, R0, R1;               # R0 = 1.0 (-1.0 == -1.0)
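
      The same sequence can be traced on the CPU.  The C sketch below is
      illustrative only; it assumes that "float" is an IEEE single-precision
      type and that integers are two's complement, and the helper names are
      hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Reinterpret a 32-bit pattern as a float, and back, without conversion. */
static float bits_to_float(uint32_t u)
{
    float f;
    memcpy(&f, &u, sizeof f);
    return f;
}

static uint32_t float_to_bits(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return u;
}

/* Mirrors the instruction sequence above for a single component. */
static float issue36_sequence(void)
{
    uint32_t r0 = 0x3F800000u;                  /* MOV.U: bits of 1.0  */
    uint32_t r1 = 0xBF800000u;                  /* MOV.U: bits of -1.0 */
    r0 = float_to_bits(bits_to_float(r0) * bits_to_float(r1)); /* MUL.F */
    r0 ^= r1;                                   /* XOR.U: 0            */
    r0 = ~r0;                                   /* NOT.U: 0xFFFFFFFF   */
    r0 = float_to_bits((float)(int32_t)r0);     /* I2F.S: -1.0         */
    /* SEQ.F: 1.0 if equal, 0.0 otherwise. */
    return (bits_to_float(r0) == bits_to_float(r1)) ? 1.0f : 0.0f;
}
```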

    (37) Buffer objects can be sourced as program parameters using the
    NV_parameter_buffer_object extension.  How are they accessed in a program?

      RESOLVED:  The instruction set and existing program environment and
      local parameter bindings operate largely on four-component vectors.
      However, NV_parameter_buffer_object exposes the ability to reach into
      buffers consisting of user-generated data or data written to the buffer
      object by the GPU.  Such data sets may not consist entirely of
      four-component floating-point vectors, so a four-component vector API
      may be unnatural.  An application might need to reformat its data set to
      deal with this issue.  Or it might generate odd code to compensate for
      mis-alignment -- for example, reading an array of 3-component vectors by
      doing two four-component vector accesses and then rotating based on
      alignment.  Neither approach is particularly satisfying.

      Instead, this extension takes the approach of treating parameter buffers
      as arrays of scalar words.  When an individual buffer element is read,
      the single word is replicated to produce a four-component vector.  To
      access an array of 3-component vectors, code like the following can be
      used:

        PARAM buffer[] = { program.buffer[0] };
        INT TEMP index;
        TEMP R0;
        ...
        MUL.S index.x, index.x, 3;      # to read "vec3" #X, compute 3*X
        MOV R0.x, buffer[index.x+0];
        MOV R0.y, buffer[index.x+1];
        MOV R0.z, buffer[index.x+2];
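
      The addressing used above corresponds to indexing a flat array of
      words.  The following C sketch models it; the Vec3 type, the function
      name, and the sample data are hypothetical and are used only for
      illustration.

```c
#include <stddef.h>

typedef struct { float x, y, z; } Vec3;

/* Hypothetical buffer contents: two tightly packed 3-component vectors. */
static const float sample_words[6] = { 1.0f, 2.0f, 3.0f, 4.0f, 5.0f, 6.0f };

/* Read "vec3" number <element> from a buffer viewed as scalar words. */
static Vec3 read_vec3(const float *words, size_t element)
{
    size_t index = 3 * element;     /* MUL.S: first word of the vector */
    Vec3 v;
    v.x = words[index + 0];         /* one scalar word per MOV */
    v.y = words[index + 1];
    v.z = words[index + 2];
    return v;
}
```

      No padding or rotation is needed because each access fetches exactly
      one word.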

    (38) Should recursion be allowed?  If so, how is the total amount of
    recursion limited?

      RESOLVED:  Recursion is allowed, and a call stack is provided by the
      implementation.  The size of the call stack is limited to the
      implementation-dependent constant MAX_PROGRAM_CALL_DEPTH, and when the
      call stack is full, the results of further CAL instructions are
      undefined.  In the initial implementation of this extension, such
      instructions will have no effect.

      Note that no stack is provided to hold local registers; a program may
      implement its own via a temporary array and integer stack "pointer".

    (39) Variables are all four-component vectors in previous extensions.
    Should scalar or small-vector variables be provided?

      RESOLVED:  It would be a useful feature, but it was left out for
      simplicity.  In practice, a variable where only the X component is used
      will be equivalent to a scalar.

    (40) The PK* (pack) and UP* (unpack) instructions allow packing multiple
    components of data into a single component.  The bit packing is
    well-defined.  Should we require specific data types (e.g., unsigned
    integer) to hold packed values?

      RESOLVED:  No.  Previous instruction sets only allowed programs to write
      packed values to a floating-point variable (the only data type
      provided).  We will allow packed results to be written to a variable of
      any data type.  Integer instructions can be used to manipulate bits of
      packed data in place.

    (41) What happens when converting integers to floats or vice versa if
    there is insufficient precision or range to represent the result?

      RESOLVED:  For integer-to-float conversions, the nearest representable
      floating-point value is used, and the least significant bits of the
      original integer value are lost.  For float-to-integer conversions,
      out-of-range values are clamped to the nearest representable integer.
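
      The behavior described above can be modeled in C as follows.  This is
      an illustrative sketch for signed 32-bit integers only, with
      hypothetical function names.  Truncation toward zero stands in for
      whichever rounding the converting instruction applies, and the NaN
      result is an arbitrary choice of this model, since NaN is not
      addressed above.

```c
#include <stdint.h>

/* Integer-to-float: the nearest representable float is produced, so the
 * least significant bits of large magnitudes are lost. */
static float i2f_s_model(int32_t v)
{
    return (float)v;
}

/* Float-to-integer: out-of-range values clamp to the nearest representable
 * signed 32-bit integer. */
static int32_t f2i_s_model(float f)
{
    if (f != f) return 0;                        /* NaN: model's choice */
    if (f >= 2147483648.0f) return INT32_MAX;    /* too large: clamp    */
    if (f <= -2147483648.0f) return INT32_MIN;   /* too small: clamp    */
    return (int32_t)f;                           /* in range: truncate  */
}
```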

    (42) Why are some of the grammar rules so bizarre (e.g., attribUseD,
    attribUseV, attribUseS, attribUseVNS)?

      RESOLVED:  This grammar is based upon the original ARB_vertex_program
      grammar, which has a number of "interesting" characteristics.  Some of
      the bindings provided by ARB_vertex_program naturally require some
      amount of lookahead.  For example, a vertex program can write an output
      color using any of the following:

        MOV result.color, 0;            # primary color
        MOV result.color.primary, 0;    # primary color again
        MOV result.color.secondary, 0;  # secondary color this time

      The pieces of the color binding are separated by "." tokens.  However,
      write masks are also supported, and they also use a "." before the
      mask.  So, we could also have something like:

        MOV result.color.xyz, 0;        # primary color with W masked off

      In this form, a parser needs to look at both the "." and the "xyz" to
      determine that the binding being used is "result.color" (and not
      "result.color.secondary").

      Additionally, some checks that should probably be semantic errors (e.g.,
      allowing different swizzle or scalar operand selectors per instruction,
      or disallowing both in the case of SWZ) were specified in the original
      grammar.

      ARB_fragment_program and subsequent NVIDIA instruction sets built upon
      this, and the grammar for this extension was rewritten in the current
      form so it could be validated more easily.
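
      The lookahead needed for the "result.color" example above can be
      sketched in C.  The enum and function names are hypothetical; the
      sketch handles only the "primary" and "secondary" qualifiers from the
      example and does not check component ordering within a write mask.

```c
#include <string.h>

typedef enum {
    SUFFIX_BINDING,     /* continues the binding, e.g. "secondary" */
    SUFFIX_WRITEMASK,   /* ends the binding, e.g. "xyz"            */
    SUFFIX_INVALID
} SuffixKind;

/* Classify the identifier that follows the "." after "result.color". */
static SuffixKind classify_color_suffix(const char *ident)
{
    size_t len = strlen(ident);

    if (strcmp(ident, "primary") == 0 || strcmp(ident, "secondary") == 0)
        return SUFFIX_BINDING;

    /* One to four characters drawn only from "xyzw" is treated as a mask. */
    if (len >= 1 && len <= 4 && strspn(ident, "xyzw") == len)
        return SUFFIX_WRITEMASK;

    return SUFFIX_INVALID;
}
```

      A parser cannot commit to "result.color" as the complete binding until
      this second token has been examined.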

    (43) This is an NV extension (NV_gpu_program4).  Why does the
    MAX_PROGRAM_TEXEL_OFFSET_EXT token have an "EXT" suffix?

      RESOLVED:  This token is shared between this extension and the
      comparable high-level GLSL programmability extension (EXT_gpu_shader4).
      Rather than provide a duplicate set of token names, we simply use the
      EXT version here.

    (44) For the purposes of determining the number of attribute and result
    components, how are "scalar" attributes counted?  For example, only the x
    component of the "pointsize" per-vertex output is actually relevant.

      RESOLVED:  Implementations are allowed to count all inputs and outputs
      as full four-component vectors.  To avoid this, apply appropriate write
      masks or swizzles.

      For example, writing to "result.pointsize" may count as four components.
      Consistently writing to "result.pointsize.x" may only count as one.
      Similarly, reading a fragment's fog coordinate as "fragment.fogcoord"
      may count as four components; "fragment.fogcoord.x" will only count as
      one.

Revision History

    Rev.    Date    Author    Changes
    ----  --------  --------  --------------------------------------------
    11    09/11/14  pbrown    Fix cut-and-paste error in PK2US section.

    10    12/14/09  mgodse    Added GLX protocol.

     9    10/29/09  pbrown    Add language for previously undocumented errors
                              when using "SHORT" and "LONG" modifiers on
                              variable declarations.  They're allowed only on
                              "TEMP" statements, except that "SHORT" is
                              allowed for "OUTPUT" as well.

     8    08/11/08  jbreton   Clarified that when a MOD instruction is
                              performed on negative operands the result is
                              undefined.

     7    07/29/08  pbrown    Discovered additional issues with texture wrap
                              handling, replaced with logic that applies wrap
                              modes per sample.  Add a few instruction
                              pseudo-code lines explicitly identifying
                              undefined components.

     6    05/02/08  pbrown    Fix the prototype for the internal TexelFetch()
                              function used in the spec language; texel
                              coordinates are signed integers.

     5    02/22/08  pbrown    Clarified that when counting attribute/result
                              components, irrelevant/undefined components
                              can still count against the limits.

     4    02/04/08  pbrown    Fix errors in texture wrap mode handling.
                              Added a missing clamp to avoid sampling border
                              in REPEAT mode.  Fixed incorrectly specified
                              weights for LINEAR filtering.

     3    02/09/07  pbrown    Updated status section (now released).

     2    10/19/06  pbrown    Change the token suffix for maximum texel offset
                              values from NV to EXT, since it is shared with
                              EXT_gpu_shader4.  Clarify what happens on a
                              negate of an unsigned value.  Fix typo in data
                              type modifier description.  Add missing
                              description of the "BUFFER4" declaration
                              keyword.

     1              pbrown    Internal spec development.