Name

    NV_fragment_program

Name Strings

    GL_NV_fragment_program

Contact

    Pat Brown, NVIDIA Corporation (pbrown 'at' nvidia.com)
    Mark J. Kilgard, NVIDIA Corporation (mjk 'at' nvidia.com)

Notice

    Copyright NVIDIA Corporation, 2001-2002.

IP Status

    NVIDIA Proprietary.

Status

    Implemented in CineFX (NV30) Emulation driver, August 2002.
    Shipping in Release 40 NVIDIA driver for CineFX hardware, January 2003.

Version

    Last Modified Date:  2005/05/24
    NVIDIA Revision:     73

Number

    282

Dependencies

    Written based on the wording of the OpenGL 1.2.1 specification and
    requires OpenGL 1.2.1.

    Requires support for the ARB_multitexture extension with at least
    two texture units.

    NV_vertex_program affects the definition of this extension.  The only
    dependency is that both extensions use the same mechanisms for defining
    and binding programs.

    NV_texture_shader trivially affects the definition of this extension.

    NV_texture_rectangle trivially affects the definition of this extension.

    ARB_texture_cube_map trivially affects the definition of this extension.

    EXT_fog_coord trivially affects the definition of this extension.

    NV_depth_clamp affects the definition of this extension.

    ARB_depth_texture and SGIX_depth_texture affect the definition of this
    extension.

    NV_float_buffer affects the definition of this extension.

    ARB_vertex_program affects the definition of this extension.

    ARB_fragment_program affects the definition of this extension.

Overview

    OpenGL mandates a certain set of configurable per-fragment computations
    defining texture lookup, texture environment, color sum, and fog
    operations.  Each of these areas provides a useful but limited set of
    fixed operations.  For example, unextended OpenGL 1.2.1 provides only
    four texture environment modes, color sum, and three fog modes.  Many
    OpenGL extensions have either improved existing functionality or
    introduced new configurable fragment operations.  While these extensions
    have enabled new and interesting rendering effects, the set of effects is
    limited by the set of special modes introduced by the extension.  This
    lack of flexibility is in contrast to the high level of programmability
    of general-purpose CPUs and other (frequently software-based) shading
    languages.  The purpose of this extension is to expose to the OpenGL
    application writer an unprecedented degree of programmability in the
    computation of final fragment colors and depth values.

    This extension provides a mechanism for defining fragment program
    instruction sequences for application-defined fragment programs.  When in
    fragment program mode, a program is executed each time a fragment is
    produced by rasterization.  The inputs for the program are the attributes
    (position, colors, texture coordinates) associated with the fragment and a
    set of constant registers.  A fragment program can perform mathematical
    computations and texture lookups using arbitrary texture coordinates.  The
    results of a fragment program are new color and depth values for the
    fragment.

    This extension defines a programming model including a 4-component vector
    instruction set, 16- and 32-bit floating-point data types, and a
    relatively large set of temporary registers.  The programming model also
    includes a condition code vector which can be used to mask register writes
    at run-time or kill fragments altogether.  The syntax, program
    instructions, and general semantics are similar to those in the
    NV_vertex_program and NV_vertex_program2 extensions, which provide for the
    execution of an arbitrary program each time the GL receives a vertex.

    The fragment program execution environment is designed for efficient
    hardware implementation and to support a wide variety of programs.  By
    design, the entire set of existing fragment programs defined by existing
    OpenGL per-fragment computation extensions can be implemented using the
    extension's programming model.

    The fragment program execution environment accesses textures via
    arbitrarily computed texture coordinates.  As such, there is no necessary
    correspondence between the texture coordinates and texture maps previously
    lumped into a single "texture unit".  This extension separates the notion
    of "texture coordinate sets" and "texture image units" (texture maps and
    associated parameters), allowing implementations with a different number
    of each.  The initial implementation of this extension will support 8
    texture coordinate sets and 16 texture image units.

Issues

    What limitations exist in this extension?

        RESOLVED:  Very few.  Programs cannot exceed a maximum program length
        (which is no less than 1024 instructions), and can use no more than
        32-64 temporary registers.  Programs cannot access more than one
        fragment attribute or program parameter (constant) per instruction,
        but can work around this restriction using temporaries.  The number of
        textures that can be used by a program is limited to the number of
        texture image units provided by the implementation (16 in the initial
        implementation of this extension).

        These limits are fairly high.  Additionally, there is no limit on the
        total number of texture lookups that can be performed by a program.
        There is no limit on the length of a texture dependency chain -- one
        can write a program that performs over 1000 consecutive dependent
        texture lookups.  There are no restrictions on dependencies between
        texture mapping instructions and arithmetic instructions.  Texture
        lookups can be performed using arbitrarily computed texture
        coordinates.  Applications can carry out their calculations with full
        32-bit single precision, although two lower-precision modes are also
        available.

    How does texture mapping work with fragment programs?

        RESOLVED:  This extension provides three instructions used to perform
        texture lookups.

        The "TEX" instruction performs a lookup with the (s,t,r) values taken
        from an interpolated texture coordinate, an arbitrarily computed
        vector, or even a program constant.  The "TXP" instruction performs a
        similar lookup, except that it uses the fourth component of the source
        vector to perform a perspective divide, using (s/q, t/q, r/q).  In
        both cases, the GL will automatically compute partial derivatives used
        for filter and LOD selection.

        The "TXD" instruction operates like "TEX", except that it allows the
        program to explicitly specify two additional vectors containing the
        partial derivatives of the texture coordinate with respect to x and y
        window coordinates.

        All three instructions write a filtered texel value to a temporary or
        output register.  Other than the computation of texture coordinates
        and partial derivatives, texture lookups are not performed any
        differently in fragment program mode.  In particular, any applicable
        LOD biases, wrap modes, minification and magnification filters, and
        anisotropic filtering controls are still applied in fragment program
        mode.

        The results of the texture lookup are available to be used arbitrarily
        by subsequent fragment program instructions.  Fragment programs are
        allowed to access any texture map arbitrarily many times.

    Can fragment programs be used to compute depth values?

        RESOLVED:  Yes.  A fragment program can perform arbitrary
        computations to compute a final depth value for the fragment, which
        it should write to the "z" component of the o[DEPR] register.  The
        "z" value written should be in the range [0,1], regardless of the
        size of the depth buffer.

        To assist in the computation of the final Z value, a fragment program
        can access the interpolated depth of the fragment (prior to any
        displacement) by reading the "z" component of the f[WPOS] attribute
        register.

    How should near and far plane clipping work in fragment program mode if
    the current fragment program computes a depth value?

        RESOLVED:  Geometric clipping to the near and far clip plane should be
        disabled.  Clipping should be done based on the depth values computed
        per-fragment.  The rationale is that per-fragment depth displacement
        operations may effectively move portions of a primitive initially
        outside the clip volume inside, and vice versa.

        Note that under the NV_depth_clamp extension, geometric clipping to
        the near and far clip planes is also disabled, and the fragment depth
        values are clamped to the depth range.  If depth clamp mode is enabled
        when using a fragment program that computes a depth value, the
        computed depth value will be clamped to the depth range.

    Should fragment programs be allowed to use multiple precisions for
    operands and operations?

        RESOLVED:  Yes.  Low-precision operands are generally adequate for
        representing colors.  Allowing low-precision registers also allows for
        a larger number of temporary registers (at lower precision).
        Low-precision operations also provide the opportunity for a higher
        level of performance.

        Applications are free to use only high-precision operations or mix
        high- and low-precision operations as necessary.

    What levels of precision are supported in arithmetic operations?

        RESOLVED:  Arithmetic operations can be performed at three different
        precisions.  32-bit floating point precision (fp32) uses the IEEE
        single-precision standard with a sign bit, 8 exponent bits, and 23
        mantissa bits.  16-bit floating-point precision (fp16) uses a similar
        floating-point representation, but with 5 exponent bits and 10
        mantissa bits.  Additionally, many arithmetic operations can also be
        carried out at 12-bit fixed point precision (fx12), where values in
        the range [-2,+2) are represented as signed values with 10 fraction
        bits.
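
        As a rough illustration of the fx12 range and resolution described
        above, the following Python sketch quantizes a value to a multiple of
        2^-10 and clamps it to [-2,+2).  The helper name to_fx12 and the
        round-to-nearest choice are assumptions made for this example; they
        are not taken from this specification.

```python
import math

FX12_FRACTION_BITS = 10                  # from the text: 10 fraction bits
FX12_SCALE = 1 << FX12_FRACTION_BITS     # 1024 steps per unit
FX12_MIN_CODE = -2 * FX12_SCALE          # integer code for -2.0
FX12_MAX_CODE = 2 * FX12_SCALE - 1       # integer code for 2 - 2^-10

def to_fx12(x):
    """Quantize x to an fx12 value, clamping to the range [-2, +2)."""
    code = math.floor(x * FX12_SCALE + 0.5)   # round to nearest (assumption)
    code = max(FX12_MIN_CODE, min(FX12_MAX_CODE, code))
    return code / FX12_SCALE

# The codes -2048..2047 are exactly the 4096 values of a signed 12-bit integer.
FX12_CODE_COUNT = FX12_MAX_CODE - FX12_MIN_CODE + 1
```

        Under these assumptions, to_fx12(3.0) clamps to the largest
        representable value, 2047/1024, and to_fx12(-7.0) clamps to -2.0.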

    How should the precision with which operations are carried out be
    specified?  Should we infer the precision from the types of the operands
    or result vectors?  Or should it be an attribute of the instruction?

        RESOLVED:  Applications can optionally specify the precision of
        individual instructions by adding a suffix of "R", "H", and "X" to
        instruction names to select fp32, fp16, and fx12 precision,
        respectively.

        By default, instructions will be carried out using the precision of
        the destination register.  Always inferring the precision from the
        operands has a number of issues.  First, there are a number of
        operations (e.g., TEX/TXP/TXD) where the result type has little to no
        correspondence to the type of the operands.  In these cases, precision
        suffixes are not supported.  Second, one could have instructions
        automatically cast operands and compute results using the type of the
        highest precision operand or result.  This behavior would be
        problematic since all fragment attribute registers and program
        parameters are kept at full precision, but full precision may not be
        needed by the operation.

        The choice of precision level allows programs to trade off precision
        for potentially higher performance.  Giving the program explicit
        control over the precision also allows it to dictate precision
        explicitly and eliminate any uncertainty over type casting.

    For instructions whose specified precision is different from the
    precision of the operands or the result registers, how are the operations
    performed?  How are the condition codes updated?

        RESOLVED:  Operations are performed with operands and results at the
        precision specified by the instruction.  After the operation is
        complete, the result is converted to the precision of the destination
        register, after which the condition code is generated.

        In an alternate approach, the condition code could be generated from
        the result before conversion.  However, in some cases, the register
        contents would not match the condition code.  In such cases, it may
        not be reliable to use the condition code to prevent division by zero
        or other special cases.
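
        The ordering above can be illustrated with a small Python model.  It
        assumes a result computed at high precision and then stored to an
        fp16 register.  The helpers to_fp16 and condition_code are
        hypothetical, and the labels EQ/LT/GT/UN are illustrative names for
        the outcome of comparing a value with zero.

```python
import struct

def to_fp16(x):
    """Round x to the nearest fp16 value; overflow becomes +/-INF."""
    try:
        return struct.unpack("<e", struct.pack("<e", x))[0]
    except OverflowError:
        return float("inf") if x > 0 else float("-inf")

def condition_code(value):
    """Classify a value relative to zero (illustrative labels)."""
    if value != value:           # NaN compares unequal to itself
        return "UN"
    if value > 0.0:
        return "GT"
    if value < 0.0:
        return "LT"
    return "EQ"

result = 1e-10                   # nonzero at high precision
stored = to_fp16(result)         # too small for fp16, so it becomes 0.0

cc_after_conversion = condition_code(stored)    # agrees with the register
cc_before_conversion = condition_code(result)   # the rejected alternative
```

        In this model the register holds zero.  A code generated before
        conversion would report GT, so a program relying on it to guard a
        reciprocal of the stored value would still divide by zero.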

    How does this extension interact with the ARB_multisample extension?  In
    the ARB_multisample extension, each fragment has multiple depth values.
    In this extension, a single interpolated depth value may be modified by a
    fragment program.

        RESOLVED:  The depth values for the extra samples are generated by
        computing partials of the computed depth value and using these
        partials to derive the depth values for each of the extra samples.
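
        One way to picture this is a first-order extrapolation from the
        computed depth value, sketched below in Python.  The function name,
        the sample offsets, and the assumption that offsets are measured in
        pixels from the point where the depth was computed are all
        hypothetical; this text does not specify sample locations or how an
        implementation obtains the partials.

```python
def sample_depths(z, dzdx, dzdy, sample_offsets):
    """Extrapolate per-sample depths from one depth value and its partials.

    z              -- depth computed by the fragment program
    dzdx, dzdy     -- partials of that depth in window x and y
    sample_offsets -- (dx, dy) offset of each sample (hypothetical layout)
    """
    return [z + dzdx * dx + dzdy * dy for dx, dy in sample_offsets]

# Two samples placed diagonally around the fragment center.
depths = sample_depths(0.5, 0.01, -0.02, [(0.25, 0.25), (-0.25, -0.25)])
```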

    How does this extension interact with polygon offset?  Both extensions
    modify fragment depth values.

        RESOLVED:  As in the base OpenGL spec, the depth offset generated by
        polygon offset is added during polygon rasterization.  The depth value
        provided to programs in f[WPOS].z already includes polygon offset, if
        enabled.  If the depth value is replaced by a fragment program, the
        polygon offset value will NOT be recomputed and added back after
        program execution.

        This is probably not desirable for fragment programs that modify depth
        values since the partials used to generate the offset may not match
        the partials of the computed depth value.  Polygon offset for filled
        polygons can be approximated in a fragment program using the depth
        partials obtained by the DDX and DDY instructions.  This will not work
        properly for line- and point-mode polygons, since the partials used
        for offset are computed over the polygon, while the partials resulting
        from the DDX and DDY instructions are computed along the line (or are
        zero for point-mode polygons).  In addition, separate treatment of
        points, line segments, and polygons is not possible in a fragment
        program.
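
        The approximation mentioned above can be sketched in Python using the
        core OpenGL polygon offset formula, o = m * factor + r * units, with
        the permitted approximation m = max(|dz/dx|, |dz/dy|).  The function
        name and the example value of r (the minimum resolvable depth
        difference) are assumptions; in a fragment program the two partials
        would come from the DDX and DDY instructions.

```python
def approx_polygon_offset(dzdx, dzdy, factor, units, r):
    """Approximate the polygon offset from per-fragment depth partials.

    dzdx, dzdy -- depth partials in window x and y
    factor     -- polygon offset factor
    units      -- polygon offset units
    r          -- minimum resolvable depth difference (implementation value)
    """
    m = max(abs(dzdx), abs(dzdy))    # approximate maximum depth slope
    return m * factor + r * units

# Example with a hypothetical 24-bit depth resolution.
offset = approx_polygon_offset(0.001, -0.003, 2.0, 1.0, 2.0 ** -24)
```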

    Should depth component replacement be a property of the fragment program
    or a separate enable?

        RESOLVED:  It should be a program property.  Using the output register
        notation simplifies matters:  depth components are replaced if and
        only if the DEPR register is written to.  This alleviates the
        application and driver burden of maintaining separate state.

    How does this extension affect the handling of q texture coordinates in
    the OpenGL spec?

        RESOLVED:  Fragment programs are allowed to access an associated q
        texture coordinate, so this attribute must be produced by
        rasterization.  In unextended OpenGL 1.2, the q coordinate is
        eliminated in the rasterization portions of the spec after dividing
        each of s, t, and r by it.  This extension updates the specification
        to pass q coordinates through at least to conventional texture
        mapping.  When fragment program mode is disabled, q coordinates will
        be eliminated there in an identical manner.  This modification has the
        added benefit of simplifying the equations used for attribute
        interpolation.

    How should clip w coordinates be handled by this extension?

        RESOLVED:  Fragment programs are allowed to access the reciprocal of
        the clip w coordinate, so this attribute must be produced by
        rasterization.  The OpenGL 1.2 spec doesn't explicitly enumerate the
        attributes associated with the fragment, but we add treatment of the w
        clip coordinate in the appropriate locations.

        The reciprocal of the clip w coordinate in traditional graphics
        hardware is produced by screen-space linear interpolation of the
        reciprocals of the clip w coordinates of the vertices.  However, this
        spec says the clip w coordinate is produced by perspective-correct
        interpolation of the (non-reciprocated) clip w vertex coordinates.
        These two formulations turn out to be equivalent, and the latter is
        more convenient since the core OpenGL spec already contains formulas
        for perspective-correct interpolation of vertex attributes.
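
        The equivalence can be checked numerically.  The Python sketch below
        uses the core OpenGL perspective-correct formula along a line segment,
        f = ((1-t)*fa/wa + t*fb/wb) / ((1-t)/wa + t/wb), and substitutes w
        itself for the attribute f.  The numerator then collapses to 1, so
        the reciprocal of the interpolated w equals the linear interpolation
        of 1/w.  Function names are chosen for this example only.

```python
def lerp(a, b, t):
    """Screen-space linear interpolation between a and b."""
    return (1.0 - t) * a + t * b

def perspective_correct(fa, fb, wa, wb, t):
    """Perspective-correct interpolation of an attribute along a segment."""
    return lerp(fa / wa, fb / wb, t) / lerp(1.0 / wa, 1.0 / wb, t)

def reciprocal_w_linear(wa, wb, t):
    """Traditional hardware view: interpolate 1/w linearly in screen space."""
    return lerp(1.0 / wa, 1.0 / wb, t)

def reciprocal_w_from_spec(wa, wb, t):
    """This spec's view: interpolate w perspective-correctly, then invert."""
    return 1.0 / perspective_correct(wa, wb, wa, wb, t)
```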

    What is produced by the TEX/TXP/TXD instructions if the requested texture
    image is inconsistent?

        RESOLVED:  The result vector is specified to be (0,0,0,0).  This
        behavior is consistent with the NV_texture_shader extension.  Note
        that, as in NV_texture_shader, these instructions ignore the standard
        hierarchy of texture enables and programs can access textures that are
        not specifically "enabled".

    Should a minimum precision be specified for certain fragment attribute
    registers (in particular COL0, COL1) that may not be generated with full
    fp32 precision?

        RESOLVED:  No.  It is expected that the precision of COL0/COL1 should
        generally be at least as high as that of the frame buffer.

    Fragment color components (f[COL0] and f[COL1]) are generally
    low-precision fixed-point values in the range [0,1].  Is it possible to
    pass unclamped or high-precision color components to fragment programs?

        RESOLVED:  Yes, although you can't exactly call them "colors".
        High-precision per-vertex color values can be written into any unused
        texture coordinate set, either via a MultiTexCoord call or using a
        vertex program.  These "texture coordinates" will be interpolated
        during rasterization, and can be used arbitrarily by a fragment
        program.

        In particular, there is no requirement that per-fragment attributes
        called "texture coordinates" be used for texture mapping.

    Should this specification guarantee that temporary registers are
    initialized to zero?

        RESOLVED:  Yes.  This will allow for the modular construction of
        programs that accumulate results in registers.  For example,
        per-fragment lighting may use MAD instructions to accumulate color
        contributions at each light.  Without zero-initialization, the program
        would require an explicit MOV instruction to load 0 or the use of the
        MUL instruction for the first light.

    Should this specification support Unicode program strings?

        RESOLVED:  Not necessary.

    Programs defined by NV_vertex_program begin with "!!VP1.0".  Should
    fragment programs have a similar identifier?

        RESOLVED:  Yes, "!!FP1.0", identifying the first revision of this
        fragment program language.

    Should per-fragment attributes have equivalent integer names in the
    program language, as per-vertex attributes do in NV_vertex_program?

        RESOLVED:  No.  In NV_vertex_program, "generic" vertex attributes
        could be specified directly by an application using only an attribute
        number.  Those numbers may have no necessary correlation with the
        conventional attribute names, although conventional vertex attributes
        are mapped to attribute numbers.  However, conventional attributes are
        the only outputs of vertex programs and of rasterization.  Therefore,
        there is no need for a similar input-by-number functionality for
        fragment programs.

    Should we provide the ability to issue instructions that do not update
    temporary or output registers?

        RESOLVED:  Yes.  Programs may issue instructions whose only purpose is
        to update the condition code register, and requiring such instructions
        to write to a temporary may require the use of an additional temporary
        and/or defeat possible program optimizations.  We accomplish this by
        adding two write-only temporary pseudo-registers ("RC" and "HC") that
        can be specified as destination registers.

    Do the packing and unpacking instructions in this extension make any
    sense?

        RESOLVED:  Yes.  They are useful for packing and unpacking multiple
        components in a single channel of a floating-point frame buffer.  For
        example, a 128-bit "RGBA" frame buffer could pack 16 8-bit quantities
        or 8 16-bit quantities, all of which could be used in later
        rasterization passes.  See the NV_float_buffer extension for more
        information.
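
        The arithmetic behind that example is sketched below in Python: four
        32-bit channels hold 128 bits, which is room for 16 bytes or 8 16-bit
        words.  The pack_4x8 and unpack_4x8 helpers show generic bit packing
        of four unsigned bytes into one 32-bit word.  They are hypothetical
        illustrations and do not reproduce the exact semantics of this
        extension's pack and unpack instructions.

```python
CHANNELS = 4          # R, G, B, A
CHANNEL_BITS = 32     # one 32-bit value per channel in a 128-bit buffer

QUANTITIES_8BIT = CHANNELS * CHANNEL_BITS // 8
QUANTITIES_16BIT = CHANNELS * CHANNEL_BITS // 16

def pack_4x8(a, b, c, d):
    """Pack four unsigned bytes into one 32-bit word, a in the low byte."""
    for v in (a, b, c, d):
        if not 0 <= v <= 0xFF:
            raise ValueError("each component must fit in 8 bits")
    return a | (b << 8) | (c << 16) | (d << 24)

def unpack_4x8(word):
    """Recover the four bytes stored by pack_4x8."""
    return tuple((word >> shift) & 0xFF for shift in (0, 8, 16, 24))
```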

    Should we provide a method for specifying a fp16 depth component output
    value?

        RESOLVED:  No.  There is no good reason for supporting half-precision
        Z outputs.  Even with 16-bit Z buffers, the 10-bit mantissa of the
        half-precision float is rather limiting.  There would effectively be
        only 11 good bits in the back half of the Z buffer.
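
        The "11 good bits" figure can be reproduced by enumerating the fp16
        values in the back half of the depth range, [0.5, 1.0).  The Python
        sketch below does this with the standard struct module; the variable
        names are chosen for this example only.

```python
import struct

def fp16_from_bits(bits):
    """Decode a 16-bit pattern as an fp16 value."""
    return struct.unpack("<e", struct.pack("<H", bits))[0]

# Bit patterns 0x3800 through 0x3BFF are the fp16 values in [0.5, 1.0):
# one fixed exponent combined with every 10-bit mantissa.
back_half = [fp16_from_bits(b) for b in range(0x3800, 0x3C00)]

count = len(back_half)                    # distinct depths in the back half
spacing = back_half[1] - back_half[0]     # step between neighboring values
z16_step = 1.0 / (2 ** 16 - 1)            # step of a 16-bit integer Z buffer
```

        The spacing works out to 2^-11, far coarser than the step of a
        16-bit integer Z buffer.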

    Should RequestResidentProgramsNV (or a new equivalent function) take a
    target?  Dealing with working sets of different program types is a bit
    messy.  Should we document some limitation if we get programs of different
    types?

        RESOLVED:  In retrospect, it may have been a good idea to attach a
        target to this command, but there isn't a good reason to mess with
        something that already works for vertex programs.  The driver is
        responsible for ensuring consistent results when the program types
        specified are mixed.

    What happens on data type conversions where the original value is not
    exactly representable in the new data type, either due to overflow or
    insufficient precision in the destination type?

        RESOLVED:  In case of overflow, the original value is clamped to
        +/-INF (fp16 or fp32) or the nearest representable value (fx12).  In
        case of imprecision, the conversion either rounds or truncates to
        the nearest representable value.

    Should this extension support IEEE-style denorms?  For 32-bit IEEE
    floating point, denorms are numbers smaller in absolute value than 2^-126.
    For 16-bit floats used by this extension, denorms are numbers smaller in
    absolute value than 2^-14.

        RESOLVED:  For 32-bit data types, hardware support for denorms was
        considered too expensive relative to the benefit provided.
        Computational results that would otherwise produce denorms are flushed
        to zero.  For 16-bit data types, hardware denorm support will be
        present.  The expense of hardware denorm support is lower and the
        potential precision benefit is greater for 16-bit data types.
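
        The difference between the two policies can be modeled in Python.
        The sketch below keeps fp16 denorms but flushes fp32 denorms to zero.
        The helper names are hypothetical, and to_fp16 here is only meant for
        small magnitudes (it does not handle overflow).

```python
import math
import struct

FP16_MIN_NORMAL = 2.0 ** -14     # smaller fp16 magnitudes are denorms
FP32_MIN_NORMAL = 2.0 ** -126    # smaller fp32 magnitudes are denorms

def to_fp16(x):
    """Round x to fp16, keeping denorms (small magnitudes only)."""
    return struct.unpack("<e", struct.pack("<e", x))[0]

def to_fp32_flush(x):
    """Round x to fp32, then flush a denormal result to a signed zero."""
    y = struct.unpack("<f", struct.pack("<f", x))[0]
    if abs(y) < FP32_MIN_NORMAL:
        return math.copysign(0.0, y)
    return y

kept = to_fp16(2.0 ** -20)           # fp16 denorm, preserved
flushed = to_fp32_flush(2.0 ** -130) # fp32 denorm, flushed to zero
```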

    OpenGL provides a hierarchy of texture enables.  The texture lookup
    operations in NV_texture_shader effectively override the texture enable
    hierarchy and select a specific texture to enable.  What should be done by
    this extension?

        RESOLVED:  This extension will build upon NV_texture_shader and reduce
        the driver overhead of validating the texture enables.  Texture
        lookups can be specified by instructions like "TEX H0, f[TEX2], TEX2,
        3D", which would indicate to use texture coordinate set number 2 to do
        a lookup in the texture object bound to the TEXTURE_3D target in
        texture image unit 2.

        Each texture unit can have only one "active" target.  Programs are not
        allowed to reference different texture targets in the same texture
        image unit.  In the example above, any other texture instructions
        using texture image unit 2 must specify the 3D texture target.
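
        A tool that generates program strings could check this rule before
        loading a program.  The Python sketch below is one hypothetical way
        to do so.  It uses a simplified pattern modeled on the example
        instruction above, ignores any instruction modifiers, and assumes the
        target names 1D, 2D, 3D, CUBE, and RECT; it is not the extension's
        grammar.

```python
import re

# Matches the trailing "TEX<n>, <target>;" of a TEX/TXP/TXD instruction.
TEX_INSTRUCTION = re.compile(
    r"\b(?:TEX|TXP|TXD)\b[^;]*?,\s*TEX(\d+)\s*,\s*(1D|2D|3D|CUBE|RECT)\s*;"
)

def check_targets(program_text):
    """Return {image_unit: target}; raise ValueError on a conflict."""
    active = {}
    for match in TEX_INSTRUCTION.finditer(program_text):
        unit, target = int(match.group(1)), match.group(2)
        # The first reference fixes the unit's target; later ones must agree.
        if active.setdefault(unit, target) != target:
            raise ValueError(
                "image unit %d used with both %s and %s"
                % (unit, active[unit], target)
            )
    return active
```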

    What is the interaction with NV_register_combiners?

        RESOLVED:  Register combiners are not available when fragment programs
        are enabled.

        Previous versions of this specification supported the notion of
        combiner programs, where the result of fragment program execution was
        a set of four "texture lookup" values that fed the register combiners.

    For convenience, should we include pseudo-instructions not present in the
    hardware instruction set that are trivially implementable?  For example,
    absolute value and subtract instructions could fall in this category.  An
    "ABS R1,R0" instruction would be equivalent to "MAX R1,R0,-R0", and a "SUB
    R2,R0,R1" would be equivalent to "ADD R2,R0,-R1".

        RESOLVED:  In general, yes.  A SUB instruction is provided for
        convenience.  This extension does not provide a separate ABS
        instruction because it supports absolute value operations on each
        operand.
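
        The two equivalences quoted in the question can be checked
        component-wise.  The Python sketch below models 4-component registers
        as tuples; the helper names are chosen for this example only.

```python
def negate(v):
    """Component-wise negation, as in a "-R0" operand."""
    return tuple(-x for x in v)

def op_max(a, b):
    """Component-wise maximum, modeling MAX."""
    return tuple(max(x, y) for x, y in zip(a, b))

def op_add(a, b):
    """Component-wise sum, modeling ADD."""
    return tuple(x + y for x, y in zip(a, b))

r0 = (1.5, -2.0, 0.0, -0.25)
r1 = (0.5, 4.0, -1.0, 0.25)

abs_r0 = op_max(r0, negate(r0))     # "MAX R1,R0,-R0" acts as ABS
sub_r0_r1 = op_add(r0, negate(r1))  # "ADD R2,R0,-R1" acts as SUB
```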

    Should there be a '+' in the <optionalSign> portion of the grammar?  There
    isn't one in the GL_NV_vertex_program spec.

        RESOLVED:  Yes, for orthogonality/readability.  A '+' obviously adds
        no functionality.  In NV_vertex_program, an <optionalSign> of "-" was
        always a negation operator.  However, in fragment programs, it can
        also be used as a sign for a constant value.

    Can the same fragment attribute register, program parameter register, or
    constants be used for multiple operands in the same instruction?  If so,
    can it be used with different swizzle patterns?

        RESOLVED:  Yes and yes.

    This extension allows different limits for the number of texture
    coordinate sets and the number of texture image units (i.e., texture maps
    and associated data).  The state in ActiveTextureARB affects both
    coordinate sets (TexGen, matrix operations) and image units (TexParameter,
    TexEnv).  How should we deal with this?

        RESOLVED:  Continue to use ActiveTextureARB and emit an
        INVALID_OPERATION if the active texture refers to an unsupported
        coordinate set/image unit.  Other options included creating dummy
        (unusable) state for unsupported coordinate sets/image units and
        continuing to use ActiveTextureARB normally, or creating separate
        state and state-setting commands for coordinate sets and image units.
        Separate state is the cleanest solution, but would add more calls and
        potentially cause more programmer confusion.  Dummy state would avoid
        additional error checks, but the demands of dummy state could grow if
        the number of texture image units and texture coordinate sets
        increases.

        The current OpenGL spec is vague as to what state is affected by the
        active texture selector and has no distinction between
        coordinate-related and image-related state.  The state tables could
        use a good clean-up in this area.

    The LRP instruction is defined so that the result of "LRP R0, R0, R1, R2"
    is R0*R1+(1-R0)*R2.  There are conflicting precedents here.  The
    definition here matches the "lrp" instruction in the DirectX 8.0 pixel
    shader language.  However, an equivalent RenderMan lerp operation would
    yield a result of (1-R0)*R1+R0*R2.  Which ordering should be implemented?

        RESOLVED:  NVIDIA hardware implements the former operand ordering, and
        there is no good reason to specify a different ordering.  To convert a
        "LRP" using the latter ordering to NV_fragment_program, swap the third
        and fourth arguments.
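
        The two orderings and the conversion between them can be written out
        as a short Python sketch operating on single components.  The
        function names are chosen for this example; the formulas are the ones
        quoted above.

```python
def lrp_nv(f, a, b):
    """Ordering used by this extension: f*a + (1-f)*b."""
    return f * a + (1.0 - f) * b

def lerp_alternate(f, a, b):
    """The other ordering quoted above: (1-f)*a + f*b."""
    return (1.0 - f) * a + f * b

def lerp_alternate_via_lrp(f, a, b):
    """Express the alternate ordering with LRP by swapping a and b."""
    return lrp_nv(f, b, a)
```

        For f = 0.25, a = 10, b = 20, lrp_nv gives 17.5 while the alternate
        ordering gives 12.5; swapping the last two operands reconciles them.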
540
541    Should this extension provide tracking of matrices or any other state,
542    similar to that provided in NV_vertex_program?
543
544        RESOLVED:  No.
545
546    Should this extension provide global program parameters -- values shared
547    between multiple fragment programs?
548
549        RESOLVED:  No.
550
551    Should this extension provide program parameters specific to a program?
552    If so, how?
553
554        RESOLVED:  Yes.  These parameters will be called "local parameters".
555        This extension will provide both named and numbered local parameters.
556        Local parameters can be managed by the driver and eliminate the need
557        for applications to manage a global name space.
558
559        Named local parameters work much like standard variable names in most
560        programming languages.  They are created using the "DECLARE"
561        instruction within the fragment program itself.  For example:
562
563            DECLARE color = {1,0,0,1};
564
565        Named local parameters are used simply by referencing the variable
566        name.  They do not require the array syntax like the global parameters
567        in the NV_vertex_program extension.  They can be updated using the
568        commands ProgramNamedParameter4[f,fv]NV.
569
570        Numbered local parameters are not declared.  They are used by simply
571        referencing an element of an array called "p".  For example,
572
573            MOV R0, p[12];
574
575        loads the value of numbered local parameter 12 into register R0.
576        Numbered local parameters can be updated using the commands
577        ProgramLocalParameter4[d,dv,f,fv]ARB.
578
579        The numbered local parameter APIs were added to this extension late in
580        its development, and are provided for compatibility with the
581        ARB_vertex_program extension, and what will likely be supported in
582        ARB_fragment_program as well.  Providing this mechanism allows
583        programs to use the same mechanisms to set local parameters in both
584        extension.
585
586    Why are the APIs for setting named and numbered local parameters
587    different?
588
589        RESOLVED:  The named parameter API was created prior to
590        ARB_vertex_program (and the possible future ARB_fragment_program) and
591        uses conventions borrowed from NV_vertex_program.  A slightly
592        different API was chosen during the ARB standardization process; see
593        the ARB_vertex_program specification for more details.
594
595        The named parameter API takes a program ID and a parameter name, and
596        sets the parameter for the program with the specified ID.  The
597        specified program does not need to be bound (via BindProgramNV) in
598        order to modify the values of its named parameters.  The numbered
599        parameter API takes a program target enum (FRAGMENT_PROGRAM_NV) and a
600        parameter number and modifies the corresponding numbered parameter of
601        the currently bound program.
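
        The difference in addressing can be sketched with a small Python
        model.  This is an illustration only, not driver code:  the class and
        method names (ProgramStore, set_named, set_numbered) are hypothetical,
        and the count of numbered parameters is a placeholder rather than a
        value taken from this specification.

```python
class ProgramStore:
    """Toy model of how the two parameter APIs address a program."""

    NUM_NUMBERED = 64  # placeholder count, not taken from the specification

    def __init__(self):
        self.programs = {}   # program id -> parameter storage
        self.bound = None    # id of the currently bound fragment program

    def create(self, prog_id, declared_names):
        # Parameters that were never set read back as (0,0,0,0).
        zero = (0.0, 0.0, 0.0, 0.0)
        self.programs[prog_id] = {
            "named": {name: zero for name in declared_names},
            "numbered": [zero] * self.NUM_NUMBERED,
        }

    def bind(self, prog_id):
        self.bound = prog_id

    def set_named(self, prog_id, name, value):
        # Named API: addressed by program id and name; no binding required.
        self.programs[prog_id]["named"][name] = tuple(value)

    def set_numbered(self, index, value):
        # Numbered API: addressed by index; always targets the bound program.
        self.programs[self.bound]["numbered"][index] = tuple(value)


store = ProgramStore()
store.create(1, ["color"])
store.create(2, ["color"])
store.bind(2)
store.set_named(1, "color", (1.0, 0.0, 0.0, 1.0))   # program 1 is not bound
store.set_numbered(12, (0.5, 0.5, 0.5, 0.5))        # lands in bound program 2
```

        In this model the named update reaches program 1 even though program 2
        is bound, while the numbered update can only reach program 2.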
602
603    What should be the initial value of uninitialized local parameters?
604
605        RESOLVED:  (0,0,0,0).  This choice is somewhat arbitrary, but matches
606        previous extensions (e.g., NV_vertex_program).
607
608    Should this extension support program parameter arrays?
609
610        RESOLVED:  No hardware support is present.  Note that from the point
611        of view of a fragment program, a texture map can be used as a 1-, 2-,
612        or 3-dimensional array of constants.
613
    Should this extension provide support for constants in fragment programs?
    If so, how?

        RESOLVED:  Yes.  Scalar or vector constants can be defined inline
        (e.g., "1.0" or "{1,2,3,4}").  In addition, named constants are
        supported using the "DEFINE" instruction, which allows programmers to
        change the values of constants used in multiple instructions simply by
        changing the value assigned to the named constant.

        Note that because this extension uses program strings, the
        floating-point value of any constants generated on the fly must be
        printed to the program string.  An alternate method that avoids the
        need to print constants is to declare a named local program parameter
        and initialize it with the ProgramNamedParameter4[f,fv]NV commands.
628
629    Should named constants be allowed to be redefined?
630
631        RESOLVED:  No.  If you want to redefine the values of constants, you
632        can create an equivalent named program parameter by changing the
633        "DEFINE" keyword to "DECLARE".
634
    Should functions used to update or query named local parameters take a
    zero-terminated string (as with most strings in the C programming
    language), or should they require an explicit string length?  If the
    former, should we create a version of LoadProgramNV that does not require
    a string length?

        RESOLVED:  Stick with explicit string length.  Strings that are
        defined as constants can have the length computed at compile-time.
        Strings read from files will have the length known in advance.
        Programs that build strings at run-time likely also keep the length
        up-to-date.  Passing an explicit length saves time, since the driver
        doesn't have to do a strlen().
647
648    What is the deal with the alpha of the secondary color?
649
650        RESOLVED:  In unextended OpenGL 1.2, the alpha component of the
651        secondary color is forced to 0.0.  In the EXT_secondary_color
652        extension, the alpha of the per-vertex secondary colors is defined to
653        be 0.0.  NV_vertex_program allows vertex programs to produce a
654        per-vertex alpha component, but it is forced to zero for the purposes
655        of the color sum.  In the NV_register_combiners extension, the alpha
656        component of the secondary color is undefined.  What a mess.
657
658        In this extension, the alpha of the secondary color is well-defined
659        and can be used normally.  When in vertex program mode
660
661    Why are fragment program instructions involving f[FOGC] or f[TEX0] through
662    f[TEX7] automatically carried out at full precision?
663
        RESOLVED:  This is an artifact of the method by which these
        interpolants are generated on NVIDIA graphics hardware.  If such
        instructions absolutely must be carried out at lower precision, the
        requirement can be met by first loading the interpolants into a
        temporary register.
668
669    With a different number of texture coordinate sets and texture image
670    units, how many copies of each kind of texture state are there?
671
        RESOLVED:  The intention is that texture state be broken into three
        groups.  (1) There are MAX_TEXTURE_COORDS_NV copies of texture
        coordinate set state, which includes current texture coordinates,
        TexGen state, and texture matrices.  (2) There are
        MAX_TEXTURE_IMAGE_UNITS_NV copies of texture image unit state, which
        includes texture maps, texture parameters, and LOD bias parameters.
        (3) There are MAX_TEXTURE_UNITS_ARB copies of legacy OpenGL texture
        unit state (e.g., texture enables, TexEnv blending state), all of
        which are unused when in fragment program mode.

        It is not necessary that MAX_TEXTURE_UNITS_ARB be equal to the minimum
        of MAX_TEXTURE_COORDS_NV and MAX_TEXTURE_IMAGE_UNITS_NV --
        implementations may choose not to extend fixed-function OpenGL texture
        mapping modes beyond a certain point.
686
    The GLX protocol for LoadProgramNV (and ProgramNamedParameterNV) may end
    up with programs >64KB.  This will overflow the limits of the GLX Render
    protocol, resulting in the need to use the RenderLarge path.  This is
    also an issue with vertex programs.
691
692        RESOLVED:  Yes, it is.
693
694    Should textures used by fragment programs be declared?  For example,
695    "TEXTURE TEX3, 2D", indicating that the 2D texture should be used for all
696    accesses to texture unit 3.  The dimension could be dropped from the TEX
697    family of instructions, and some of the compile-time error checking could
698    be dropped.
699
700        RESOLVED:  Maybe it should be, but for better or worse, it isn't.
701
702    It is not all that uncommon to have negative q values with projective
703    texture mapping, but results are undefined if any q values are negative in
704    this specification.  Why?
705
        RESOLVED:  This restriction carries over a similar one from the
        initial OpenGL specification.  The motivation for this restriction is
        that when interpolating, it is possible for a fragment to have an
        interpolated q coordinate at or near 0.0.  Since the texture
        coordinates used for projective texture mapping are s/q, t/q, and r/q,
        this will result in a divide-by-zero error or significant numerical
        instability.  Results will be inaccurate for such fragments.
713
714        Other than the numerical stability issue above, NVIDIA hardware should
715        have no problems with negative q coordinates.
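
        The instability near q = 0 can be seen in a short Python sketch.  The
        endpoint values below are hypothetical, and plain linear interpolation
        is used for simplicity instead of the perspective-corrected form, so
        this only illustrates the numerical issue and does not model the
        hardware.

```python
def lerp(a, b, t):
    """Plain linear interpolation between a and b."""
    return (1.0 - t) * a + t * b


def projective_coord(s, q):
    """Return s/q, or None where the division is undefined (q == 0)."""
    return None if q == 0 else s / q


# Hypothetical endpoints: q is positive at one end and negative at the other.
q_a, q_b = 1.0, -1.0
s_a, s_b = 0.5, 0.5

# Sample the interpolated coordinate at several positions along the primitive.
samples = [
    (t, projective_coord(lerp(s_a, s_b, t), lerp(q_a, q_b, t)))
    for t in (0.0, 0.25, 0.49, 0.5, 0.51, 0.75, 1.0)
]
```

        With these endpoints the interpolated q passes through zero at the
        midpoint, and s/q swings from a large positive value to a large
        negative one on either side of it.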
716
    Should programs that replace depth have their own special program type,
    such as "!!FPD1.0" and "!!FPDC1.0"?
719
720        RESOLVED:  No.  If a program has an instruction that writes to
721        o[DEPR], the final fragment depth value is taken from o[DEPR].z.
722        Otherwise, the fragment's original depth value is used.
723
724    What fx12 value should NaN map to?
725
726        RESOLVED:  For the lack of any better choice, 0.0.
727
728    How are special-case encodings (-INF, +INF, -0.0, +0.0, NaN) handled for
729    arithmetic and comparison operations?
730
731        RESOLVED:  The special cases for all floating-point operations are
732        designed to match the IEEE specification for floating-point numbers as
733        closely as possible.  The results produced by special cases should be
734        enumerated in the sections of this spec describing the operations.
735        There are some cases where the implemented fragment program behavior
736        does not match IEEE conventions, and these cases should be noted in
737        this specification.
738
739    How can condition codes be used to mask out register writes?  How about
740    killing fragments?  What other things can you do?
741
        RESOLVED:  The following example computes a component-wise |R1-R2|:

          SUBC R0, R1, R2;      # "C" suffix means update condition code
          MOV  R0 (LT), -R0;    # Conditional write mask in parentheses

        The first instruction computes a component-wise difference between R1
        and R2, storing R1-R2 in register R0.  The "C" suffix in the
        instruction means to update the condition code based on the sign of
        the result vector components.  The second instruction inverts the sign
        of the components of R0.  However, the "(LT)" portion says that the
        destination register should be updated only if the corresponding
        condition code component is LT (negative).  This means that only those
        components of R0 that are negative are negated, leaving |R1-R2| in R0.
755
        To kill a fragment if the red (x) component of a texture lookup
        returns zero:

          TEXC R0, f[TEX0], TEX0, 2D;
          KIL EQ.x;

        To kill based on the green (y) component, use "EQ.y" instead.  To kill
        if any of the four components is zero, use "EQ.xyzw" or just "EQ".

        Fragment programs do not support boolean expressions directly, but
        they can generally be emulated using conditional write masks.

        To evaluate the expression "(R0.x == 0) && (R1.x == 0)":

          MOVC RC.x, R0.x;
          MOVC RC.x (EQ), R1.x;

        To evaluate the expression "(R0.x == 0) || (R1.x == 0)":

          MOVC RC.x, R0.x;
          MOVC RC.x (NE), R1.x;
777
778        In both cases, the x component of the condition code will contain "EQ"
779        if and only if the condition is TRUE.
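
        The examples in this answer can be checked with a small Python model
        of condition codes and conditional write masks.  It is a sketch under
        stated assumptions:  the helper names are hypothetical, only the EQ,
        NE, and LT mask rules are modeled, and a "C"-suffixed instruction is
        assumed to update the condition code only for the components it
        actually writes.

```python
import math


def cc_of(x):
    """Condition code for one result component."""
    if math.isnan(x):
        return "UN"
    return "LT" if x < 0 else ("GT" if x > 0 else "EQ")


def cc_passes(code, rule):
    """Does condition code `code` satisfy the mask rule `rule`?"""
    return {"EQ": code == "EQ", "NE": code != "EQ", "LT": code == "LT"}[rule]


def masked_write(dst, cc, src, rule=None, update_cc=False):
    """Write src into dst component-wise where cc satisfies `rule`.

    Returns the new (dst, cc).  Assumption of this sketch: with
    update_cc=True, only the written components update the condition code.
    """
    new_dst, new_cc = list(dst), list(cc)
    for i, value in enumerate(src):
        if rule is None or cc_passes(cc[i], rule):
            new_dst[i] = value
            if update_cc:
                new_cc[i] = cc_of(value)
    return new_dst, new_cc


def abs_diff(r1, r2):
    r0, cc = [0.0] * 4, ["EQ"] * 4
    # SUBC R0, R1, R2;
    r0, cc = masked_write(r0, cc, [a - b for a, b in zip(r1, r2)],
                          update_cc=True)
    # MOV R0 (LT), -R0;
    r0, cc = masked_write(r0, cc, [-v for v in r0], rule="LT")
    return r0


def both_zero(r0x, r1x):
    cc = ["EQ"]
    _, cc = masked_write([0.0], cc, [r0x], update_cc=True)  # MOVC RC.x, R0.x;
    _, cc = masked_write([0.0], cc, [r1x], rule="EQ",
                         update_cc=True)                    # MOVC RC.x (EQ), R1.x;
    return cc[0] == "EQ"


def either_zero(r0x, r1x):
    cc = ["EQ"]
    _, cc = masked_write([0.0], cc, [r0x], update_cc=True)  # MOVC RC.x, R0.x;
    _, cc = masked_write([0.0], cc, [r1x], rule="NE",
                         update_cc=True)                    # MOVC RC.x (NE), R1.x;
    return cc[0] == "EQ"
```

        Under these assumptions, abs_diff reproduces the |R1-R2| example, and
        both_zero and either_zero return True exactly when the corresponding
        boolean expression holds.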
780
781    How can fragment programs be used to implement non-standard texture
782    filtering modes?
783
        RESOLVED:  As one example, consider a case where you want to do linear
        filtering in a 2D texture map, but only horizontally.  To achieve
        this, first set the texture filtering mode to NEAREST.  For a 16 x n
        texture, you might do something like:

          DEFINE halfTexel = { 0.03125, 0 };   # 1/32 (1/2 a texel)
          ADD R2, f[TEX0], -halfTexel;         # coords of left sample
          ADD R1, f[TEX0], +halfTexel;         # coords of right sample
          TEX R0, R2, TEX0, 2D;                # lookup left sample
          TEX R1, R1, TEX0, 2D;                # lookup right sample
          MUL R2.x, R2.x, 16;                  # scale X coords to texels
          FRC R2.x, R2.x;                      # get fraction, filter weight
          LRP R0, R2.x, R1, R0;                # blend samples based on weight
797
798        There are plenty of other interesting things that can be done.
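
        The arithmetic of the horizontal filter can be followed in a Python
        model of a single 16-texel row.  This is a sketch, not a description
        of the hardware:  the function names are hypothetical, and the NEAREST
        lookup is assumed to clamp the texel index to the row.

```python
import math

WIDTH = 16  # texels per row, matching the 16 x n example


def nearest(row, s):
    """NEAREST lookup along s; the texel index is clamped (an assumption)."""
    i = math.floor(s * WIDTH)
    return row[min(max(i, 0), WIDTH - 1)]


def horizontal_linear(row, s):
    half_texel = 0.5 / WIDTH              # 1/32, the halfTexel constant
    left_s = s - half_texel               # ADD R2, f[TEX0], -halfTexel;
    right_s = s + half_texel              # ADD R1, f[TEX0], +halfTexel;
    left = nearest(row, left_s)           # TEX R0, R2, TEX0, 2D;
    right = nearest(row, right_s)         # TEX R1, R1, TEX0, 2D;
    scaled = left_s * WIDTH               # MUL R2.x, R2.x, 16;
    weight = scaled - math.floor(scaled)  # FRC R2.x, R2.x;
    # LRP R0, R2.x, R1, R0;  blends toward the right sample by `weight`
    return weight * right + (1.0 - weight) * left


row = [float(i) for i in range(WIDTH)]         # texel i holds the value i
at_center = horizontal_linear(row, 3.5 / WIDTH)
off_center = horizontal_linear(row, 3.75 / WIDTH)
```

        At a texel center the weight is zero and the left sample is returned
        unchanged; a quarter texel to the right of the center, the result is a
        25/75 blend of the right and left samples.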
799
800    Should this specification provide more examples?
801
802        RESOLVED:  Yes, it should.
803
804    Is the OpenGL ARB working on a multi-vendor standard for fragment
805    programmability?  Will there be an ARB_fragment_program extension?  If so,
806    how will this extension interact with the ARB standard?
807
808        RESOLVED:  Yes, as of July 2002, there was a multi-vendor working
809        group and a draft specification.  The ARB extension is expected to
810        have several features not present in this extension, such as state
811        tracking and global parameters (called "program environment
812        parameters").  It will also likely lack certain features found in this
813        extension.
814
815    Why does the HEMI mapping apply to the third component of signed HILO
816    textures, but not to unsigned HILO textures?
817
818        RESOLVED:  This behavior matches the behavior of NV_texture_shader
819        (e.g., the DOT_PRODUCT_NV mode).  The HEMI mapping will construct the
820        third component of a unit vector whose first two components are
821        encoded in the HILO texture.
822
823
824New Procedures and Functions
825
826    void ProgramNamedParameter4fNV(uint id, sizei len, const ubyte *name,
827                                   float x, float y, float z, float w);
828    void ProgramNamedParameter4dNV(uint id, sizei len, const ubyte *name,
829                                   double x, double y, double z, double w);
830    void ProgramNamedParameter4fvNV(uint id, sizei len, const ubyte *name,
831                                    const float v[]);
832    void ProgramNamedParameter4dvNV(uint id, sizei len, const ubyte *name,
833                                    const double v[]);
834    void GetProgramNamedParameterfvNV(uint id, sizei len, const ubyte *name,
835                                      float *params);
836    void GetProgramNamedParameterdvNV(uint id, sizei len, const ubyte *name,
837                                      double *params);
838
839    void ProgramLocalParameter4dARB(enum target, uint index,
840                                    double x, double y, double z, double w);
841    void ProgramLocalParameter4dvARB(enum target, uint index,
842                                     const double *params);
843    void ProgramLocalParameter4fARB(enum target, uint index,
844                                    float x, float y, float z, float w);
845    void ProgramLocalParameter4fvARB(enum target, uint index,
846                                     const float *params);
847    void GetProgramLocalParameterdvARB(enum target, uint index,
848                                       double *params);
849    void GetProgramLocalParameterfvARB(enum target, uint index,
850                                       float *params);
851
852
853New Tokens
854
855    Accepted by the <cap> parameter of Disable, Enable, and IsEnabled, by the
856    <pname> parameter of GetBooleanv, GetIntegerv, GetFloatv, and GetDoublev,
857    and by the <target> parameter of BindProgramNV, LoadProgramNV,
858    ProgramLocalParameter4dARB, ProgramLocalParameter4dvARB,
859    ProgramLocalParameter4fARB, ProgramLocalParameter4fvARB,
860    GetProgramLocalParameterdvARB, and GetProgramLocalParameterfvARB:
861
862        FRAGMENT_PROGRAM_NV                            0x8870
863
864    Accepted by the <pname> parameter of GetBooleanv, GetIntegerv, GetFloatv,
865    and GetDoublev:
866
867        MAX_TEXTURE_COORDS_NV                          0x8871
868        MAX_TEXTURE_IMAGE_UNITS_NV                     0x8872
869        FRAGMENT_PROGRAM_BINDING_NV                    0x8873
870        MAX_FRAGMENT_PROGRAM_LOCAL_PARAMETERS_NV       0x8868
871
872    Accepted by the <name> parameter of GetString:
873
874        PROGRAM_ERROR_STRING_NV                        0x8874
875
876
877Additions to Chapter 2 of the OpenGL 1.2.1 Specification (OpenGL Operation)
878
879    Modify Section 2.11, Clipping (p.39)
880
881    (replace the first paragraph of the section, p. 39)  Primitives are clipped
882    to the clip volume.  In clip coordinates, the view volume is defined by
883
884        -w_c <= x_c <= w_c,
885        -w_c <= y_c <= w_c, and
886        -w_c <= z_c <= w_c.
887
    Clipping to the near and far clip planes is ignored if fragment program
    mode (section 3.11) or texture shaders (see NV_texture_shader
    specification) are enabled and the current fragment program or texture
    shader computes per-fragment depth values.  In this case, the view volume
    is defined by:
893
894        -w_c <= x_c <= w_c and
895        -w_c <= y_c <= w_c.
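
    As an illustration, the two view volume definitions can be written as a
    single Python predicate.  The function and parameter names are
    hypothetical; program_writes_depth stands for the case where the current
    fragment program or texture shader computes per-fragment depth values.

```python
def inside_view_volume(x, y, z, w, program_writes_depth=False):
    """Test a clip-space point against the view volume.

    When depth is computed per fragment, the near and far plane tests
    (the z_c inequalities) are skipped.
    """
    in_xy = (-w <= x <= w) and (-w <= y <= w)
    if program_writes_depth:
        return in_xy
    return in_xy and (-w <= z <= w)
```

    A point such as (0, 0, 2, 1) is outside the conventional view volume but
    inside the relaxed one.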
896
897
898Additions to Chapter 3 of the OpenGL 1.2.1 Specification (Rasterization)
899
900    Modify Chapter 3 introduction (p. 57)
901
902    (p.57, modify 1st paragraph) ... Figure 3.1 diagrams the rasterization
903    process.  The color value assigned to a fragment is initially determined
904    by the rasterization operations (Sections 3.3 through 3.7) and modified by
905    either the execution of the texturing, color sum, and fog operations as
906    defined in Sections 3.8, 3.9, and 3.10, or of a fragment program defined
907    in Section 3.11.  The final depth value is initially determined by the
908    rasterization operations and may be modified by a fragment program.
909
910    note:  Antialiasing Application is renumbered from Section 3.11 to Section
911    3.12.
912
913    Modify Figure 3.1 (p.58)
914
915                             Primitive Assembly
916                                      |
917              +-----------+-----------+-----------+-----------+
918              |           |           |           |           |
919              |           |           |        Pixel          |
920            Point       Line       Polygon     Rectangle   Bitmap
921           Raster-     Raster-     Raster-     Raster-     Raster-
922           ization     ization     ization     ization     ization
923              |           |           |           |           |
924              +-----------+-----------+-----------+-----------+
925                                      |
926                                      |
927                    +-----------------+-----------------+
928                    |                 |                 |
929              Conventional         Texture          Fragment
930              Texture Fetch        Shaders          Programs
931                    |                 |                 |
932                    |  +--------------+                 |
933                    |  |                                |
934        TEXTURE_    o  o                                |
935        SHADER_NV                                       |
936        enable      o                                   |
937                    |                                   |
938                    +-------------+                     |
939                    |             |                     |
940               Conventional   Register                  |
941                  TexEnv      Combiners                 |
942                    |             |                     |
943                Color Sum         |                     |
944                    |             |                     |
945                   Fog            |                     |
946                    |             |                     |
947                    |  +----------+                     |
948                    |  |                                |
949        REGISTER_   o  o                                |
950        COMBINERS_                                      |
951        NV enable   o                                   |
952                    |                                   |
953                    +-----------------+  +--------------+
954                                      |  |
955                           FRAGMENT_  o  o
956                           PROGRAM_
957                           NV enable  o
958                                      |
959                                      |
960                                   Coverage
961                                  Application
962                                      |
963                                      v
964                            to fragment processing
965
966
967    Modify Section 3.3, Points (p.61)
968
969    All fragments produced in rasterizing a non-antialiased point are assigned
970    the same associated data, which are those of the vertex corresponding to
971    the point.  (delete reference to divide by q).
972
    If antialiasing is enabled, then ...  The data associated with each
    fragment are otherwise the data associated with the point being
    rasterized.  (delete reference to divide by q)
976
977    Modify Section 3.4.1, Basic Line Segment Rasterization (p.66)
978
979    (Note that t=0 at p_a and t=1 at p_b).  The value of an associated datum f
980    from the fragment, whether it be R, G, B, or A (in RGBA mode) or a color
981    index (in color index mode), the s, t, r, or q texture coordinate, or the
982    clip w coordinate (the depth value, window z, must be found using equation
983    3.3, below), is found as
984
985      f = (1-t) * f_a / w_a + t * f_b / w_b                     (3.2)
986          ---------------------------------
987                (1-t) / w_a + t / w_b
988
    where f_a and f_b are the data associated with the starting and ending
    endpoints of the segment, respectively; w_a and w_b are the clip
    w coordinates of the starting and ending endpoints of the segment,
    respectively.  Note that linear interpolation would use
993
994      f = (1-t) * f_a + t * f_b.                                (3.3)
995
996    ... A GL implementation may choose to approximate equation 3.2 with 3.3,
997    but this will normally lead to unacceptable distortion effects when
998    interpolating texture coordinates or clip w coordinates.
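
    The difference between equations 3.2 and 3.3 can be seen numerically in a
    short Python sketch.  The function names and sample values are
    hypothetical and chosen only for illustration.

```python
def perspective_interp(f_a, f_b, w_a, w_b, t):
    """Equation 3.2: perspective-correct interpolation along a segment."""
    numerator = (1 - t) * f_a / w_a + t * f_b / w_b
    denominator = (1 - t) / w_a + t / w_b
    return numerator / denominator


def linear_interp(f_a, f_b, t):
    """Equation 3.3: plain linear interpolation."""
    return (1 - t) * f_a + t * f_b


# Midpoint of a segment whose endpoints have different clip w coordinates.
correct = perspective_interp(0.0, 1.0, 1.0, 3.0, 0.5)
approximate = linear_interp(0.0, 1.0, 0.5)
```

    With w_a = 1 and w_b = 3, the perspective-correct midpoint value is about
    0.25, while the linear approximation gives 0.5.  The two agree whenever
    w_a equals w_b.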
999
1000    Modify Section 3.5.1, Basic Polygon Rasterization (p.71)
1001
1002    Denote a datum at p_a, p_b, or p_c ... is given by
1003
1004      f = a * f_a / w_a + b * f_b / w_b + c * f_c / w_c         (3.4)
1005          ---------------------------------------------
1006                  a / w_a + b / w_b + c / w_c
1007
1008    where w_a, w_b, and w_c are the clip w coordinates of p_a, p_b, and p_c,
1009    respectively.  a, b, and c are the barycentric coordinates of the fragment
1010    for which the data are produced. a, b, and c must correspond precisely to
1011    the exact coordinates ... at the fragment's center.
1012
1013    Just as with line segment rasterization, equation 3.4 may be approximated
1014    by
1015
1016      f = a * f_a + b * f_b + c * f_c;                          (3.5)
1017
1018    this may yield ... for texture coordinates or clip w coordinates.
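
    Equations 3.4 and 3.5 can be compared in a similar Python sketch.  The
    function names and sample values are hypothetical; f, w, and bary each
    hold one value per vertex, in the order (a, b, c).

```python
def perspective_barycentric(f, w, bary):
    """Equation 3.4: perspective-correct interpolation over a triangle."""
    numerator = sum(k * f_i / w_i for k, f_i, w_i in zip(bary, f, w))
    denominator = sum(k / w_i for k, w_i in zip(bary, w))
    return numerator / denominator


def linear_barycentric(f, bary):
    """Equation 3.5: the linear approximation."""
    return sum(k * f_i for k, f_i in zip(bary, f))


f = (0.0, 0.0, 1.0)          # datum at p_a, p_b, p_c
w = (1.0, 1.0, 2.0)          # clip w at p_a, p_b, p_c
bary = (0.25, 0.25, 0.5)     # barycentric coordinates a, b, c

correct = perspective_barycentric(f, w, bary)
approximate = linear_barycentric(f, bary)
```

    For these values equation 3.4 yields about 1/3, while equation 3.5 yields
    0.5.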
1019
1020    Modify Section 3.6.4, Rasterization of Pixel Rectangles (p.100)
1021
1022    A fragment arising from a group ... are given by those associated with the
1023    current raster position.  (delete reference to divide by q)
1024
1025    Modify Section 3.7, Bitmaps (p.111)
1026
1027    Otherwise, a rectangular array ... The associated data for each fragment
1028    are those associated with the current raster position.  (delete reference
1029    to divide by q)  Once the fragments have been produced ...
1030
1031    Modify Section 3.8, Texturing (p.112)
1032
    ... an image at the location indicated by a fragment's texture coordinates
    to modify the fragment's primary RGBA color.  Texturing does not affect the
    secondary color.
1036
1037    Texturing is specified only for RGBA mode; its use in color index mode is
1038    undefined.
1039
1040    Except when in fragment program mode (Section 3.11), the (s,t,r) texture
1041    coordinates used for texturing are the values s/q, t/q, and r/q,
1042    respectively, where s, t, r, and q are the texture coordinates associated
1043    with the fragment.  When in fragment program mode, the (s,t,r) texture
1044    coordinates are specified by the program.  If q is less than or equal to
1045    zero, the results of texturing are undefined.
1046
1047    Add new Section 3.11, Fragment Programs (p.140)
1048
1049    Fragment program mode is enabled and disabled with the Enable and Disable
1050    commands using the symbolic constant FRAGMENT_PROGRAM_NV.  When fragment
1051    program mode is enabled, standard and extended texturing, color sum, and
1052    fog application stages are ignored and a general purpose program is
1053    executed instead.
1054
1055    A fragment program is a sequence of instructions that execute on a
1056    per-fragment basis.  In fragment program mode, the currently bound
1057    fragment program is executed as each fragment is generated by the
1058    rasterization operations.  Fragment programs execute a finite fixed
1059    sequence of instructions with no branching or looping, and operate
1060    independently from the processing of other fragments.  Fragment programs
1061    are used to compute new color values to be associated with each fragment,
1062    and can optionally compute a new depth value for each fragment as well.
1063
    Fragment program mode is not available in color index mode and is
    considered disabled, regardless of the state of FRAGMENT_PROGRAM_NV.  When
    fragment program mode is enabled, texture shaders and register combiners
    (NV_texture_shader and NV_register_combiners extensions) are disabled,
    regardless of the state of TEXTURE_SHADER_NV and REGISTER_COMBINERS_NV.
1069
1070    Section 3.11.1, Fragment Program Registers
1071
1072    Fragment programs operate on a set of program registers.  Each program
1073    register is a 4-component vector, whose components are referred to as "x",
1074    "y", "z", and "w" respectively.  The components of a fragment register are
1075    always referred to in this manner, regardless of the meaning of their
1076    contents.
1077
1078    The four components of each fragment program register have one of two
1079    different representations:  32-bit floating-point (fp32) or 16-bit
1080    floating-point (fp16).  More details on these representations can be found
1081    in Section 3.11.4.1.
1082
1083    There are several different classes of program registers.  Attribute
1084    registers (Table X.1) correspond to the fragment's associated data
1085    produced by rasterization.  Temporary registers (Table X.2) hold
1086    intermediate results generated by the fragment program.  Output registers
1087    (Table X.3) hold the final results of a fragment program.  The single
1088    condition code register is used to mask writes to other registers or to
1089    determine if a fragment should be discarded.
1090
1091
1092    Section 3.11.1.1, Fragment Program Attribute Registers
1093
1094    The fragment program attribute registers (Table X.1) hold the location of
1095    the fragment and the data associated with the fragment produced by
1096    rasterization.
1097
1098    Fragment Attribute                                    Component
1099    Register Name    Description                          Interpretation
1100    --------------   -----------------------------------  --------------
1101       f[WPOS]       Position of the fragment center.     (x,y,z,1/w)
1102       f[COL0]       Interpolated primary color           (r,g,b,a)
1103       f[COL1]       Interpolated secondary color         (r,g,b,a)
1104       f[FOGC]       Interpolated fog distance/coord      (z,0,0,0)
1105       f[TEX0]       Texture coordinate (unit 0)          (s,t,r,q)
1106       f[TEX1]       Texture coordinate (unit 1)          (s,t,r,q)
1107       f[TEX2]       Texture coordinate (unit 2)          (s,t,r,q)
1108       f[TEX3]       Texture coordinate (unit 3)          (s,t,r,q)
1109       f[TEX4]       Texture coordinate (unit 4)          (s,t,r,q)
1110       f[TEX5]       Texture coordinate (unit 5)          (s,t,r,q)
1111       f[TEX6]       Texture coordinate (unit 6)          (s,t,r,q)
1112       f[TEX7]       Texture coordinate (unit 7)          (s,t,r,q)
1113
1114    Table X.1:  Fragment Attribute Registers.  The component interpretation
1115    column describes the mapping of attribute values to register components.
1116    For example, the "x" component of f[COL0] holds the red color component,
1117    and the "x" component of f[TEX0] holds the "s" texture coordinate for
1118    texture unit 0.  The entries "0" and "1" indicate that the attribute
1119    register components hold the constants 0 and 1, respectively.
1120
    f[WPOS].x and f[WPOS].y hold the (x,y) window coordinates of the fragment
    center, relative to the lower left corner of the window.  f[WPOS].z
    holds the associated z window coordinate, normally in the range [0,1].
    f[WPOS].w holds the reciprocal of the associated clip w coordinate.
1125
1126    f[COL0] and f[COL1] hold the associated RGBA primary and secondary colors
1127    of the fragment, respectively.
1128
1129    f[FOGC] holds the associated eye distance or fog coordinate normally used
1130    for fog computations.
1131
1132    f[TEX0] through f[TEX7] hold the associated texture coordinates for
1133    texture coordinate sets 0 through 7, respectively.
1134
1135    All attribute register components are treated as 32-bit floats.  However,
1136    the components of primary and secondary colors (f[COL0] and f[COL1]) may
1137    be generated with reduced precision.
1138
1139    The contents of the fragment attribute registers may not be modified by a
1140    fragment program.  In addition, each fragment program instruction can use
1141    at most one unique attribute register.
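
    The one-attribute-register rule can be illustrated with a simple Python
    check over instruction text.  This is a sketch only:  the helper names
    are hypothetical, and the regular expression covers just the attribute
    register names listed in Table X.1 and does not parse the full grammar.

```python
import re

# Matches f[WPOS], f[COL0], f[COL1], f[FOGC], and f[TEX0] through f[TEX7].
ATTRIBUTE = re.compile(r"f\[(WPOS|COL[01]|FOGC|TEX[0-7])\]")


def attribute_registers(instruction):
    """Return the set of distinct attribute registers named in the text."""
    return set(ATTRIBUTE.findall(instruction))


def obeys_attribute_limit(instruction):
    """True if the instruction uses at most one unique attribute register."""
    return len(attribute_registers(instruction)) <= 1
```

    Reading the same attribute register twice in one instruction is allowed
    by this check, since the limit applies to unique registers.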
1142
1143
1144    Section 3.11.1.2, Fragment Program Temporary Registers
1145
1146    The fragment temporary registers (Table X.2) hold intermediate values used
1147    during the execution of a fragment program.  There are 96 temporary
1148    register names, but not all can be used simultaneously.
1149
1150    Fragment Temporary
1151    Register Name       Description
1152    ------------------  -----------------------------------------------------
1153        R0-R31          Four 32-bit (fp32) floating point values (s.e8.m23)
1154        H0-H63          Four 16-bit (fp16) floating point values (s.e5.m10)
1155
1156    Table X.2:  Fragment Temporary Registers.
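
    The precision difference between the two storage formats can be explored
    in Python, whose struct module supports IEEE binary32 ("f") and binary16
    ("e") encodings with the same s.e8.m23 and s.e5.m10 layouts.  The helper
    names are hypothetical, and the rounding shown is that of the Python
    library, which is only assumed to be representative of the hardware.

```python
import struct


def to_fp32(x):
    """Round a Python float to the nearest fp32 (s.e8.m23) value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def to_fp16(x):
    """Round a Python float to the nearest fp16 (s.e5.m10) value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


# Error introduced by storing 0.1 in an R register versus an H register.
error_fp32 = abs(to_fp32(0.1) - 0.1)
error_fp16 = abs(to_fp16(0.1) - 0.1)
```

    Values above 2048 illustrate the coarser fp16 spacing:  representable
    values there are two apart, so 2049.5 rounds to 2050 in fp16 while fp32
    stores it exactly.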
1157
1158    In addition to the normal temporary registers, there are two temporary
1159    pseudo-registers, "RC" and "HC".  RC and HC are treated as unnumbered,
1160    write-only temporary registers.  The components of RC have a fp32 data
1161    type; the components of HC have a fp16 data type.  The sole purpose of
1162    these registers is to permit instructions to modify the condition code
1163    register (section 3.11.1.4) without overwriting the values in any
1164    temporary register.
1165
1166    Fragment program instructions can read and write temporary registers.
1167    There is no restriction on the number of temporary registers that can be
1168    accessed by any given instruction.
1169
1170    All temporary registers are initialized to (0,0,0,0) each time a fragment
1171    program executes.
1172
1173
1174    Section 3.11.1.3, Fragment Program Output Registers
1175
1176    The fragment program output registers hold the final results of the
1177    fragment program.  The possible final results of a fragment program are a
1178    high- or low-precision RGBA fragment color, and a fragment depth value.
1179
1180       Output
1181    Register Name      Description
1182    -------------      -------------------------------------------------------
1183       o[COLR]         Final RGBA fragment color, fp32 format
1184       o[COLH]         Final RGBA fragment color, fp16 format
1185       o[DEPR]         Final fragment depth value, fp32 format
1186
1187    Table X.3:  Fragment Program Output Registers.
1188
    o[COLR] and o[COLH] specify the color of a fragment.  These two registers
    are identical, except for the associated data type of the components.  The
    R, G, B, and A components of the fragment color are taken from the x, y,
    z, and w components, respectively, of o[COLR] or o[COLH].  A fragment
    program will fail to load if it writes to both o[COLR] and o[COLH].
1194
1195    o[DEPR] can be used to replace the associated depth value of a fragment.
1196    The new depth value is taken from the z component of o[DEPR].  If a
1197    fragment program does not write to o[DEPR], the associated depth value is
1198    unmodified.
1199
1200    A fragment program will fail to load if it does not write to at least one
1201    output register.
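
    The output register rules above can be summarized in a Python sketch.
    The function names and the data layout (a set of written register names,
    and a mapping from register name to a 4-component tuple) are hypothetical
    choices made for this illustration.

```python
def output_load_error(written):
    """Return None if the written output registers permit loading,
    otherwise a short description of the violated rule."""
    written = set(written)
    if not written:
        return "no output register is written"
    if {"COLR", "COLH"} <= written:
        return "both o[COLR] and o[COLH] are written"
    return None


def final_depth(outputs, original_depth):
    """Pick the fragment depth: o[DEPR].z if written, else the original."""
    if "DEPR" in outputs:
        return outputs["DEPR"][2]     # z component
    return original_depth
```

    A program that writes only o[DEPR] still satisfies the rule that at least
    one output register be written.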
1202
1203    The fragment program output registers may not be read by a fragment
1204    program, but may be written to multiple times.
1205
1206    The values of all fragment program output registers are initially
1207    undefined.
1208
1209
1210    Section 3.11.1.4, Fragment Program Condition Code Register
1211
1212    The condition code register (CC) is a single four-component vector.  Each
1213    component of this register is one of four enumerated values:  GT (greater
1214    than), EQ (equal), LT (less than), or UN (unordered).  The condition code
1215    register can be used to mask writes to fragment data register components
1216    or to terminate processing of a fragment altogether (via the KIL
1217    instruction).
1218
1219    Most fragment program instructions can optionally update the condition
1220    code register.  When a fragment program instruction updates the condition
1221    code register, a condition code component is set to LT if the
1222    corresponding component of the result vector is less than zero, EQ if it
1223    is equal to zero, GT if it is greater than zero, and UN if it is NaN (not
1224    a number).
1225
1226    The condition code register is initialized to a vector of EQ values each
1227    time a fragment program executes.
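
        The update rule can be modeled as follows.  This is a minimal sketch
        that assumes all four components are updated; it ignores write masks
        and the precision of the result, and the names used are illustrative
        only.

```python
import math

# Per the text, CC starts as a vector of EQ on every program execution.
INITIAL_CC = ["EQ", "EQ", "EQ", "EQ"]


def update_condition_code(result):
    """Map each component of a result vector to LT, EQ, GT, or UN."""
    codes = []
    for value in result:
        if math.isnan(value):
            codes.append("UN")   # NaN compares unordered
        elif value < 0:
            codes.append("LT")
        elif value == 0:
            codes.append("EQ")
        else:
            codes.append("GT")
    return codes
```

        For example, the result vector (-1, 0, 2.5, NaN) yields the condition
        code (LT, EQ, GT, UN).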
1228
1229
1230    Section 3.11.2, Fragment Program Parameters
1231
1232    In addition to using the registers defined in Section 3.11.1, fragment
1233    programs may also use fragment program parameters in their computation.
1234    Fragment program parameters are constant during the execution of fragment
1235    programs, but some parameters may be modified outside the execution of a
1236    fragment program.
1237
1238    There are five different types of program parameters:  embedded scalar
1239    constants, embedded vector constants, named constants, named local
1240    parameters, and numbered local parameters.
1241
1242    Embedded scalar constants are written as standard floating-point numbers
1243    with an optional sign designator ("+" or "-") and optional scientific
1244    notation (e.g., "E+06", meaning "times 10^6").
1245
1246    Embedded vector constants are written as a comma-separated array of one to
1247    four scalar constants, surrounded by braces (like a C/C++ array
1248    initializer).  Vector constants are always treated as 4-component vectors:
1249    constants with fewer than four components are expanded to 4 components by
1250    filling missing y and z components with 0.0 and missing w components with
1251    1.0.  Thus, the vector constant "{2}" is equivalent to "{2,0,0,1}",
1252    "{3,4}" is equivalent to "{3,4,0,1}", and "{5,6,7}" is equivalent to
1253    "{5,6,7,1}".
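
        The expansion rule can be written down directly.  The helper below is
        a sketch for illustration; its name is not part of this extension.

```python
def expand_vector_constant(components):
    """Expand a 1- to 4-component vector constant to four components.

    Missing y and z components become 0.0; a missing w becomes 1.0.
    """
    if not 1 <= len(components) <= 4:
        raise ValueError("a vector constant has one to four components")
    defaults = [0.0, 0.0, 0.0, 1.0]
    return list(components) + defaults[len(components):]
```

        With this helper, expand_vector_constant([3, 4]) gives [3, 4, 0.0, 1.0],
        matching the "{3,4}" example above.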
1254
1255    Named constants allow fragment program instructions to define scalar or
1256    vector constants that can be referenced by name.  Named constants are
1257    created using the DEFINE instruction:
1258
1259        DEFINE pi = 3.1415926535;
1260        DEFINE color = {0.2, 0.5, 0.8, 1.0};
1261
1262    The DEFINE instruction associates a constant name with a scalar or vector
1263    constant value.  Subsequent fragment program instructions that use the
1264    constant name are equivalent to those using the corresponding constant
1265    value.
1266
1267    Named local parameters are similar to named vector constants, but their
1268    values can be modified after the program is loaded.  Local parameters are
1269    created using the DECLARE instruction:
1270
1271        DECLARE fog_color1;
1272        DECLARE fog_color2 = {0.3, 0.6, 0.9, 0.1};
1273
1274    The DECLARE instruction creates a 4-component vector associated with the
1275    local parameter name.  Subsequent fragment program instructions
1276    referencing the local parameter name are processed as though the current
1277    value of the local parameter vector were specified instead of the
1278    parameter name.  A DECLARE instruction can optionally specify an initial
1279    value for the local parameter, which can be either a scalar or vector
1280    constant.  Scalar constants are expanded to 4-component vectors by
1281    replicating the scalar value in each component.  The initial value of
1282    local parameters not initialized by the program is (0,0,0,0).
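
        The initial value of a DECLARE'd parameter can be sketched as below.
        The function name is hypothetical, and the handling of short vector
        initializers assumes they follow the same expansion rule as embedded
        vector constants.

```python
def declare_initial_value(initializer=None):
    """Return the 4-component initial value of a named local parameter."""
    if initializer is None:
        # No initializer in the DECLARE instruction.
        return [0.0, 0.0, 0.0, 0.0]
    if isinstance(initializer, (int, float)):
        # Scalar constants are replicated into every component.
        return [float(initializer)] * 4
    components = [float(c) for c in initializer]
    if not 1 <= len(components) <= 4:
        raise ValueError("a vector constant has one to four components")
    # Assumption: short vectors are padded like embedded vector constants.
    return components + [0.0, 0.0, 0.0, 1.0][len(components):]
```

        Here declare_initial_value(0.5) produces [0.5, 0.5, 0.5, 0.5], while
        declare_initial_value() produces the all-zero default.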
1283
1284    A named local parameter for a specific program can be updated using the
1285    calls ProgramNamedParameter4fNV or ProgramNamedParameter4fvNV (section
1286    5.7).  Named local parameters are accessible only by the program in which
1287    they are defined.  Modifying a local parameter affects only the
1288    associated program and does not affect local parameters with the same name
1289    that are found in any other fragment program.
1290
1291    Numbered local parameters are similar to named local parameters, except
1292    that they are referred to by number and are not declared in fragment
1293    programs.  Each fragment program object has an array of four-component
1294    floating-point vectors that can be used by the program.  The number of
1295    vectors is given by the implementation-dependent constant
1296    MAX_FRAGMENT_PROGRAM_LOCAL_PARAMETERS_NV, and must be at least 64.
1297    Numbered local parameters are accessed by a fragment program as members of
1298    an array called "p".  For example, the instruction
1299
1300        MOV R0, p[31];
1301
1302    copies the contents of numbered local parameter 31 into temporary register
1303    R0.
1304
1305    Constant and local parameter names can be arbitrary strings consisting of
1306    letters (upper or lower-case), numbers, underscores ("_"), and dollar
1307    signs ("$").  Keywords defined in the grammar (including instruction
1308    names) cannot be used as constant names, nor can strings that start with
1309    numbers, or strings that specify valid temporary register or texture
1310    numbers (e.g., "R0"-"R31", "H0"-"H63", "TEX0"-"TEX15").  A fragment
1311    program will fail to load if a DEFINE or DECLARE instruction specifies an
1312    invalid constant or local parameter name.
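
        A name check along these lines might look as follows.  This is a
        sketch under stated assumptions: the keyword set is only a small
        illustrative subset of the grammar's keywords, and the function name
        is hypothetical.

```python
import re

# Letters, digits, "_" and "$"; the first character may not be a digit.
_NAME_PATTERN = re.compile(r"[A-Za-z_$][A-Za-z0-9_$]*")

# Names of valid temporary registers and texture numbers.
_RESERVED_REGISTERS = (
    {f"R{i}" for i in range(32)}
    | {f"H{i}" for i in range(64)}
    | {f"TEX{i}" for i in range(16)}
)

# Illustrative subset only; a real loader would use every grammar keyword.
_SAMPLE_KEYWORDS = {"DEFINE", "DECLARE", "END", "MOV", "ADD", "KIL", "TEX"}


def is_valid_parameter_name(name, keywords=_SAMPLE_KEYWORDS):
    """Return True if `name` may be used in a DEFINE or DECLARE."""
    if _NAME_PATTERN.fullmatch(name) is None:
        return False
    return name not in _RESERVED_REGISTERS and name not in keywords
```

        Names such as "fog_color1" or "$scale" pass this check, while "R0",
        "TEX15", "DEFINE", and "3d" do not.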
1313
1314    A fragment program will fail to load if an instruction contains a named
1315    parameter not specified in a previous DEFINE or DECLARE instruction.  A
1316    fragment program will also fail to load if a DEFINE or DECLARE instruction
1317    attempts to re-define a named parameter specified in a previous DEFINE or
1318    DECLARE instruction.
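
        The ordering rules for named parameters can be sketched as a single
        pass over the program.  The (kind, name) statement representation and
        the function name are assumptions made for this example; "USE" stands
        for any instruction that references the name.

```python
def find_name_error(statements):
    """Return a message for the first naming error, or None if there is none.

    `statements` is a sequence of (kind, name) pairs in program order,
    where kind is "DEFINE", "DECLARE", or "USE".
    """
    defined = set()
    for index, (kind, name) in enumerate(statements):
        if kind in ("DEFINE", "DECLARE"):
            if name in defined:
                return f"statement {index}: {name!r} is already defined"
            defined.add(name)
        elif kind == "USE":
            if name not in defined:
                return f"statement {index}: {name!r} is not defined yet"
        else:
            raise ValueError(f"unknown statement kind: {kind!r}")
    return None
```

        A program that uses "pi" before its DEFINE, or that declares the same
        name twice, is reported as an error and would fail to load.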
1319
1320    The contents of the fragment program parameters may not be modified by a
1321    fragment program.  In addition, each fragment program instruction can
1322    normally use at most one unique program parameter.  The only exception to
1323    this rule is if all program parameter references specify named or embedded
1324    constants that, taken together, contain no more than four unique scalar
1325    values.  For such instructions, the GL will automatically generate an
1326    equivalent instruction that references a single merged vector constant.
1327    This merging allows programs to specify instructions like the following:
1328
1329        Instruction              Equivalent Instruction
1330        ---------------------    ---------------------------------------
1331        MAD R0, R1, 2, -1;       MAD R0, R1, {2,-1,0,0}.x, {2,-1,0,0}.y;
1332        ADD R0, {1,2,3,4}, 4;    ADD R0, {1,2,3,4}.xyzw, {1,2,3,4}.w;
1333
1334    Before counting the number of unique values, any named constants are first
1335    converted to the equivalent embedded constants.  When generating a
1336    combined vector constant, the GL does not perform swizzling, component
1337    selection, negation, or absolute value operations.  The following
1338    instructions are invalid, as they contain more than four unique scalar
1339    values.
1340
1341        Invalid Instructions
1342        -----------------------------------
1343        ADD R0, {1,2,3,4}, -4;
1344        ADD R0, {1,2,3,4}, |-4|;
1345        ADD R0, {1,2,3,4}, -{-1,-2,-3,-4};
1346        ADD R0, {1,2,3,4}, {4,5,6,7}.x;
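
        The merging rule can be modeled as below.  This is a sketch rather
        than the GL's actual algorithm: each constant operand is given as the
        list of scalar values literally written in the program, with no
        negation, absolute value, or component selection applied.  The
        function name and the choice of 0.0 as padding are assumptions taken
        from the first example in the table above.

```python
def merge_constants(operands):
    """Merge constant operands into one vector constant, if possible.

    `operands` is a list of lists of scalars, one list per constant
    operand.  Returns (merged_vector, swizzles), or None when the
    operands contain more than four unique scalar values.
    """
    merged = []
    for values in operands:
        for value in values:
            if value not in merged:
                merged.append(value)
    if len(merged) > 4:
        return None
    # Each operand becomes a swizzle into the merged vector.
    swizzles = ["".join("xyzw"[merged.index(v)] for v in values)
                for values in operands]
    padded = merged + [0.0] * (4 - len(merged))
    return padded, swizzles
```

        For "MAD R0, R1, 2, -1" this gives the vector {2,-1,0,0} with
        swizzles "x" and "y".  For "ADD R0, {1,2,3,4}, -4" it returns None,
        because -4 is a fifth unique value.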
1347
1348
1349    Section 3.11.3, Fragment Program Specification
1350
1351    Fragment programs are specified as an array of ubytes.  The array is a
1352    string of ASCII characters encoding the program.  The command
1353    LoadProgramNV loads a fragment program when the target parameter is
1354    FRAGMENT_PROGRAM_NV.  The command BindProgramNV enables a fragment program
1355    for execution.
1356
1357    At program load time, the program is parsed into a set of tokens possibly
1358    separated by whitespace.  Spaces, tabs, newlines, carriage returns, and
1359    comments are considered whitespace.  Comments begin with the character "#"
1360    and are terminated by a newline, a carriage return, or the end of the
1361    program array.  Fragment programs are case-sensitive -- upper and lower
1362    case letters are treated differently.  The proper choice of case can be
1363    inferred from the grammar.
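
        Comment handling can be sketched with a regular expression.  The
        helper below is illustrative only: it replaces each comment with a
        single space, since comments count as whitespace, and it does not
        attempt to tokenize the program.

```python
import re

# A comment runs from "#" up to (not including) a newline, a carriage
# return, or the end of the program string.
_COMMENT = re.compile(r"#[^\r\n]*")


def blank_out_comments(source):
    """Replace every comment in a program string with one space."""
    return _COMMENT.sub(" ", source)
```

        After this step, only spaces, tabs, newlines, and carriage returns
        remain as separators between tokens.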
1364
1365    The Backus-Naur Form (BNF) grammar below specifies the syntactically valid
1366    sequences for fragment programs.  The set of valid tokens can be inferred
1367    from the grammar.  The token "" represents an empty string and is used to
1368    indicate optional rules.  A program is invalid if it contains any
1369    undefined tokens or characters.
1370
1371    <program>              ::= <progPrefix> <instructionSequence> "END"
1372
1373    <progPrefix>           ::= "!!FP1.0"
1374
1375    <instructionSequence>  ::= <instructionSequence> <instructionStatement>
1376                             | <instructionStatement>
1377
1378    <instructionStatement> ::= <instruction> ";"
1379                             | <constantDefinition> ";"
1380                             | <localDeclaration> ";"
1381
1382    <instruction>          ::= <VECTORop-instruction>
1383                             | <SCALARop-instruction>
1384                             | <BINSCop-instruction>
1385                             | <BINop-instruction>
1386                             | <TRIop-instruction>
1387                             | <KILop-instruction>
1388                             | <TEXop-instruction>
1389                             | <TXDop-instruction>
1390
1391    <VECTORop-instruction> ::= <VECTORop> <maskedDstReg> ","
1392                               <vectorSrc>
1393
1394    <VECTORop>             ::= "DDX"   | "DDX_SAT"
1395                             | "DDXR"  | "DDXR_SAT"
1396                             | "DDXH"  | "DDXH_SAT"
1397                             | "DDXC"  | "DDXC_SAT"
1398                             | "DDXRC" | "DDXRC_SAT"
1399                             | "DDXHC" | "DDXHC_SAT"
1400                             | "DDY"   | "DDY_SAT"
1401                             | "DDYR"  | "DDYR_SAT"
1402                             | "DDYH"  | "DDYH_SAT"
1403                             | "DDYC"  | "DDYC_SAT"
1404                             | "DDYRC" | "DDYRC_SAT"
1405                             | "DDYHC" | "DDYHC_SAT"
1406                             | "FLR"   | "FLR_SAT"
1407                             | "FLRR"  | "FLRR_SAT"
1408                             | "FLRH"  | "FLRH_SAT"
1409                             | "FLRX"  | "FLRX_SAT"
1410                             | "FLRC"  | "FLRC_SAT"
1411                             | "FLRRC" | "FLRRC_SAT"
1412                             | "FLRHC" | "FLRHC_SAT"
1413                             | "FLRXC" | "FLRXC_SAT"
1414                             | "FRC"   | "FRC_SAT"
1415                             | "FRCR"  | "FRCR_SAT"
1416                             | "FRCH"  | "FRCH_SAT"
1417                             | "FRCX"  | "FRCX_SAT"
1418                             | "FRCC"  | "FRCC_SAT"
1419                             | "FRCRC" | "FRCRC_SAT"
1420                             | "FRCHC" | "FRCHC_SAT"
1421                             | "FRCXC" | "FRCXC_SAT"
1422                             | "LIT"   | "LIT_SAT"
1423                             | "LITR"  | "LITR_SAT"
1424                             | "LITH"  | "LITH_SAT"
1425                             | "LITC"  | "LITC_SAT"
1426                             | "LITRC" | "LITRC_SAT"
1427                             | "LITHC" | "LITHC_SAT"
1428                             | "MOV"   | "MOV_SAT"
1429                             | "MOVR"  | "MOVR_SAT"
1430                             | "MOVH"  | "MOVH_SAT"
1431                             | "MOVX"  | "MOVX_SAT"
1432                             | "MOVC"  | "MOVC_SAT"
1433                             | "MOVRC" | "MOVRC_SAT"
1434                             | "MOVHC" | "MOVHC_SAT"
1435                             | "MOVXC" | "MOVXC_SAT"
1436                             | "PK2H"
1437                             | "PK2US"
1438                             | "PK4B"
1439                             | "PK4UB"
1440
1441    <SCALARop-instruction> ::= <SCALARop> <maskedDstReg> ","
1442                               <scalarSrc>
1443
1444    <SCALARop>             ::= "COS"     | "COS_SAT"
1445                             | "COSR"    | "COSR_SAT"
1446                             | "COSH"    | "COSH_SAT"
1447                             | "COSC"    | "COSC_SAT"
1448                             | "COSRC"   | "COSRC_SAT"
1449                             | "COSHC"   | "COSHC_SAT"
1450                             | "EX2"     | "EX2_SAT"
1451                             | "EX2R"    | "EX2R_SAT"
1452                             | "EX2H"    | "EX2H_SAT"
1453                             | "EX2C"    | "EX2C_SAT"
1454                             | "EX2RC"   | "EX2RC_SAT"
1455                             | "EX2HC"   | "EX2HC_SAT"
1456                             | "LG2"     | "LG2_SAT"
1457                             | "LG2R"    | "LG2R_SAT"
1458                             | "LG2H"    | "LG2H_SAT"
1459                             | "LG2C"    | "LG2C_SAT"
1460                             | "LG2RC"   | "LG2RC_SAT"
1461                             | "LG2HC"   | "LG2HC_SAT"
1462                             | "RCP"     | "RCP_SAT"
1463                             | "RCPR"    | "RCPR_SAT"
1464                             | "RCPH"    | "RCPH_SAT"
1465                             | "RCPC"    | "RCPC_SAT"
1466                             | "RCPRC"   | "RCPRC_SAT"
1467                             | "RCPHC"   | "RCPHC_SAT"
1468                             | "RSQ"     | "RSQ_SAT"
1469                             | "RSQR"    | "RSQR_SAT"
1470                             | "RSQH"    | "RSQH_SAT"
1471                             | "RSQC"    | "RSQC_SAT"
1472                             | "RSQRC"   | "RSQRC_SAT"
1473                             | "RSQHC"   | "RSQHC_SAT"
1474                             | "SIN"     | "SIN_SAT"
1475                             | "SINR"    | "SINR_SAT"
1476                             | "SINH"    | "SINH_SAT"
1477                             | "SINC"    | "SINC_SAT"
1478                             | "SINRC"   | "SINRC_SAT"
1479                             | "SINHC"   | "SINHC_SAT"
1480                             | "UP2H"    | "UP2H_SAT"
1481                             | "UP2HC"   | "UP2HC_SAT"
1482                             | "UP2US"   | "UP2US_SAT"
1483                             | "UP2USC"  | "UP2USC_SAT"
1484                             | "UP4B"    | "UP4B_SAT"
1485                             | "UP4BC"   | "UP4BC_SAT"
1486                             | "UP4UB"   | "UP4UB_SAT"
1487                             | "UP4UBC"  | "UP4UBC_SAT"
1488
1489    <BINSCop-instruction>  ::= <BINSCop> <maskedDstReg> ","
1490                               <scalarSrc> "," <scalarSrc>
1491
1492    <BINSCop>              ::= "POW"   | "POW_SAT"
1493                             | "POWR"  | "POWR_SAT"
1494                             | "POWH"  | "POWH_SAT"
1495                             | "POWC"  | "POWC_SAT"
1496                             | "POWRC" | "POWRC_SAT"
1497                             | "POWHC" | "POWHC_SAT"
1498
1499    <BINop-instruction>    ::= <BINop> <maskedDstReg> ","
1500                               <vectorSrc> "," <vectorSrc>
1501
1502    <BINop>                ::= "ADD"   | "ADD_SAT"
1503                             | "ADDR"  | "ADDR_SAT"
1504                             | "ADDH"  | "ADDH_SAT"
1505                             | "ADDX"  | "ADDX_SAT"
1506                             | "ADDC"  | "ADDC_SAT"
1507                             | "ADDRC" | "ADDRC_SAT"
1508                             | "ADDHC" | "ADDHC_SAT"
1509                             | "ADDXC" | "ADDXC_SAT"
1510                             | "DP3"   | "DP3_SAT"
1511                             | "DP3R"  | "DP3R_SAT"
1512                             | "DP3H"  | "DP3H_SAT"
1513                             | "DP3X"  | "DP3X_SAT"
1514                             | "DP3C"  | "DP3C_SAT"
1515                             | "DP3RC" | "DP3RC_SAT"
1516                             | "DP3HC" | "DP3HC_SAT"
1517                             | "DP3XC" | "DP3XC_SAT"
1518                             | "DP4"   | "DP4_SAT"
1519                             | "DP4R"  | "DP4R_SAT"
1520                             | "DP4H"  | "DP4H_SAT"
1521                             | "DP4X"  | "DP4X_SAT"
1522                             | "DP4C"  | "DP4C_SAT"
1523                             | "DP4RC" | "DP4RC_SAT"
1524                             | "DP4HC" | "DP4HC_SAT"
1525                             | "DP4XC" | "DP4XC_SAT"
1526                             | "DST"   | "DST_SAT"
1527                             | "DSTR"  | "DSTR_SAT"
1528                             | "DSTH"  | "DSTH_SAT"
1529                             | "DSTC"  | "DSTC_SAT"
1530                             | "DSTRC" | "DSTRC_SAT"
1531                             | "DSTHC" | "DSTHC_SAT"
1532                             | "MAX"   | "MAX_SAT"
1533                             | "MAXR"  | "MAXR_SAT"
1534                             | "MAXH"  | "MAXH_SAT"
1535                             | "MAXX"  | "MAXX_SAT"
1536                             | "MAXC"  | "MAXC_SAT"
1537                             | "MAXRC" | "MAXRC_SAT"
1538                             | "MAXHC" | "MAXHC_SAT"
1539                             | "MAXXC" | "MAXXC_SAT"
1540                             | "MIN"   | "MIN_SAT"
1541                             | "MINR"  | "MINR_SAT"
1542                             | "MINH"  | "MINH_SAT"
1543                             | "MINX"  | "MINX_SAT"
1544                             | "MINC"  | "MINC_SAT"
1545                             | "MINRC" | "MINRC_SAT"
1546                             | "MINHC" | "MINHC_SAT"
1547                             | "MINXC" | "MINXC_SAT"
1548                             | "MUL"   | "MUL_SAT"
1549                             | "MULR"  | "MULR_SAT"
1550                             | "MULH"  | "MULH_SAT"
1551                             | "MULX"  | "MULX_SAT"
1552                             | "MULC"  | "MULC_SAT"
1553                             | "MULRC" | "MULRC_SAT"
1554                             | "MULHC" | "MULHC_SAT"
1555                             | "MULXC" | "MULXC_SAT"
1556                             | "RFL"   | "RFL_SAT"
1557                             | "RFLR"  | "RFLR_SAT"
1558                             | "RFLH"  | "RFLH_SAT"
1559                             | "RFLC"  | "RFLC_SAT"
1560                             | "RFLRC" | "RFLRC_SAT"
1561                             | "RFLHC" | "RFLHC_SAT"
1562                             | "SEQ"   | "SEQ_SAT"
1563                             | "SEQR"  | "SEQR_SAT"
1564                             | "SEQH"  | "SEQH_SAT"
1565                             | "SEQX"  | "SEQX_SAT"
1566                             | "SEQC"  | "SEQC_SAT"
1567                             | "SEQRC" | "SEQRC_SAT"
1568                             | "SEQHC" | "SEQHC_SAT"
1569                             | "SEQXC" | "SEQXC_SAT"
1570                             | "SFL"   | "SFL_SAT"
1571                             | "SFLR"  | "SFLR_SAT"
1572                             | "SFLH"  | "SFLH_SAT"
1573                             | "SFLX"  | "SFLX_SAT"
1574                             | "SFLC"  | "SFLC_SAT"
1575                             | "SFLRC" | "SFLRC_SAT"
1576                             | "SFLHC" | "SFLHC_SAT"
1577                             | "SFLXC" | "SFLXC_SAT"
1578                             | "SGE"   | "SGE_SAT"
1579                             | "SGER"  | "SGER_SAT"
1580                             | "SGEH"  | "SGEH_SAT"
1581                             | "SGEX"  | "SGEX_SAT"
1582                             | "SGEC"  | "SGEC_SAT"
1583                             | "SGERC" | "SGERC_SAT"
1584                             | "SGEHC" | "SGEHC_SAT"
1585                             | "SGEXC" | "SGEXC_SAT"
1586                             | "SGT"   | "SGT_SAT"
1587                             | "SGTR"  | "SGTR_SAT"
1588                             | "SGTH"  | "SGTH_SAT"
1589                             | "SGTX"  | "SGTX_SAT"
1590                             | "SGTC"  | "SGTC_SAT"
1591                             | "SGTRC" | "SGTRC_SAT"
1592                             | "SGTHC" | "SGTHC_SAT"
1593                             | "SGTXC" | "SGTXC_SAT"
1594                             | "SLE"   | "SLE_SAT"
1595                             | "SLER"  | "SLER_SAT"
1596                             | "SLEH"  | "SLEH_SAT"
1597                             | "SLEX"  | "SLEX_SAT"
1598                             | "SLEC"  | "SLEC_SAT"
1599                             | "SLERC" | "SLERC_SAT"
1600                             | "SLEHC" | "SLEHC_SAT"
1601                             | "SLEXC" | "SLEXC_SAT"
1602                             | "SLT"   | "SLT_SAT"
1603                             | "SLTR"  | "SLTR_SAT"
1604                             | "SLTH"  | "SLTH_SAT"
1605                             | "SLTX"  | "SLTX_SAT"
1606                             | "SLTC"  | "SLTC_SAT"
1607                             | "SLTRC" | "SLTRC_SAT"
1608                             | "SLTHC" | "SLTHC_SAT"
1609                             | "SLTXC" | "SLTXC_SAT"
1610                             | "SNE"   | "SNE_SAT"
1611                             | "SNER"  | "SNER_SAT"
1612                             | "SNEH"  | "SNEH_SAT"
1613                             | "SNEX"  | "SNEX_SAT"
1614                             | "SNEC"  | "SNEC_SAT"
1615                             | "SNERC" | "SNERC_SAT"
1616                             | "SNEHC" | "SNEHC_SAT"
1617                             | "SNEXC" | "SNEXC_SAT"
1618                             | "STR"   | "STR_SAT"
1619                             | "STRR"  | "STRR_SAT"
1620                             | "STRH"  | "STRH_SAT"
1621                             | "STRX"  | "STRX_SAT"
1622                             | "STRC"  | "STRC_SAT"
1623                             | "STRRC" | "STRRC_SAT"
1624                             | "STRHC" | "STRHC_SAT"
1625                             | "STRXC" | "STRXC_SAT"
1626                             | "SUB"   | "SUB_SAT"
1627                             | "SUBR"  | "SUBR_SAT"
1628                             | "SUBH"  | "SUBH_SAT"
1629                             | "SUBX"  | "SUBX_SAT"
1630                             | "SUBC"  | "SUBC_SAT"
1631                             | "SUBRC" | "SUBRC_SAT"
1632                             | "SUBHC" | "SUBHC_SAT"
1633                             | "SUBXC" | "SUBXC_SAT"
1634
1635    <TRIop-instruction>    ::= <TRIop> <maskedDstReg> ","
1636                               <vectorSrc> "," <vectorSrc> ","
1637                               <vectorSrc>
1638
1639    <TRIop>                ::= "MAD"   | "MAD_SAT"
1640                             | "MADR"  | "MADR_SAT"
1641                             | "MADH"  | "MADH_SAT"
1642                             | "MADX"  | "MADX_SAT"
1643                             | "MADC"  | "MADC_SAT"
1644                             | "MADRC" | "MADRC_SAT"
1645                             | "MADHC" | "MADHC_SAT"
1646                             | "MADXC" | "MADXC_SAT"
1647                             | "LRP"   | "LRP_SAT"
1648                             | "LRPR"  | "LRPR_SAT"
1649                             | "LRPH"  | "LRPH_SAT"
1650                             | "LRPX"  | "LRPX_SAT"
1651                             | "LRPC"  | "LRPC_SAT"
1652                             | "LRPRC" | "LRPRC_SAT"
1653                             | "LRPHC" | "LRPHC_SAT"
1654                             | "LRPXC" | "LRPXC_SAT"
1655                             | "X2D"   | "X2D_SAT"
1656                             | "X2DR"  | "X2DR_SAT"
1657                             | "X2DH"  | "X2DH_SAT"
1658                             | "X2DC"  | "X2DC_SAT"
1659                             | "X2DRC" | "X2DRC_SAT"
1660                             | "X2DHC" | "X2DHC_SAT"
1661
1662    <KILop-instruction>    ::= <KILop> <ccMask>
1663
1664    <KILop>                ::= "KIL"
1665
1666    <TEXop-instruction>    ::= <TEXop> <maskedDstReg> ","
1667                               <vectorSrc> "," <texImageId>
1668
1669    <TEXop>                ::= "TEX"  | "TEX_SAT"
1670                             | "TEXC" | "TEXC_SAT"
1671                             | "TXP"  | "TXP_SAT"
1672                             | "TXPC" | "TXPC_SAT"
1673
1674    <TXDop-instruction>    ::= <TXDop> <maskedDstReg> ","
1675                               <vectorSrc> "," <vectorSrc> ","
1676                               <vectorSrc> "," <texImageId>
1677
1678    <TXDop>                ::= "TXD"  | "TXD_SAT"
1679                             | "TXDC" | "TXDC_SAT"
1680
1681    <scalarSrc>            ::= <absScalarSrc>
1682                             | <baseScalarSrc>
1683
1684    <absScalarSrc>         ::= <negate> "|" <baseScalarSrc> "|"
1685
1686    <baseScalarSrc>        ::= <signedScalarConstant>
1687                             | <negate> <namedScalarConstant>
1688                             | <negate> <vectorConstant> <scalarSuffix>
1689                             | <negate> <namedLocalParameter> <scalarSuffix>
1690                             | <negate> <numberedLocal> <scalarSuffix>
1691                             | <negate> <srcRegister> <scalarSuffix>
1692
1693    <vectorSrc>            ::= <absVectorSrc>
1694                             | <baseVectorSrc>
1695
1696    <absVectorSrc>         ::= <negate> "|" <baseVectorSrc> "|"
1697
1698    <baseVectorSrc>        ::= <signedScalarConstant>
1699                             | <negate> <namedScalarConstant>
1700                             | <negate> <vectorConstant> <scalarSuffix>
1701                             | <negate> <vectorConstant> <swizzleSuffix>
1702                             | <negate> <namedLocalParameter> <scalarSuffix>
1703                             | <negate> <namedLocalParameter> <swizzleSuffix>
1704                             | <negate> <numberedLocal> <scalarSuffix>
1705                             | <negate> <numberedLocal> <swizzleSuffix>
1706                             | <negate> <srcRegister> <scalarSuffix>
1707                             | <negate> <srcRegister> <swizzleSuffix>
1708
1709    <maskedDstReg>         ::= <dstRegister> <optionalWriteMask>
1710                               <optionalCCMask>
1711
1712    <dstRegister>          ::= <fragTempReg>
1713                             | <fragOutputReg>
1714                             | "RC"
1715                             | "HC"
1716
1717    <optionalCCMask>       ::= "(" <ccMask> ")"
1718                             | ""
1719
1720    <ccMask>               ::= <ccMaskRule> <swizzleSuffix>
1721                             | <ccMaskRule> <scalarSuffix>
1722
1723    <ccMaskRule>           ::= "EQ" | "GE" | "GT" | "LE" | "LT" | "NE" |
1724                               "TR" | "FL"
1725
1726    <optionalWriteMask>    ::= ""
1727                             | "." "x"
1728                             | "."     "y"
1729                             | "." "x" "y"
1730                             | "."         "z"
1731                             | "." "x"     "z"
1732                             | "."     "y" "z"
1733                             | "." "x" "y" "z"
1734                             | "."             "w"
1735                             | "." "x"         "w"
1736                             | "."     "y"     "w"
1737                             | "." "x" "y"     "w"
1738                             | "."         "z" "w"
1739                             | "." "x"     "z" "w"
1740                             | "."     "y" "z" "w"
1741                             | "." "x" "y" "z" "w"
1742
1743    <srcRegister>          ::= <fragAttribReg>
1744                             | <fragTempReg>
1745
1746    <fragAttribReg>        ::= "f" "[" <fragAttribRegId> "]"
1747
1748    <fragAttribRegId>      ::= "WPOS" | "COL0" | "COL1" | "FOGC" | "TEX0"
1749                             | "TEX1" | "TEX2" | "TEX3" | "TEX4" | "TEX5"
1750                             | "TEX6" | "TEX7"
1751
1752    <fragTempReg>          ::= <fragF32Reg>
1753                             | <fragF16Reg>
1754
1755    <fragF32Reg>           ::= "R0"  | "R1"  | "R2"  | "R3"
1756                             | "R4"  | "R5"  | "R6"  | "R7"
1757                             | "R8"  | "R9"  | "R10" | "R11"
1758                             | "R12" | "R13" | "R14" | "R15"
1759                             | "R16" | "R17" | "R18" | "R19"
1760                             | "R20" | "R21" | "R22" | "R23"
1761                             | "R24" | "R25" | "R26" | "R27"
1762                             | "R28" | "R29" | "R30" | "R31"
1763
1764    <fragF16Reg>           ::= "H0"  | "H1"  | "H2"  | "H3"
1765                             | "H4"  | "H5"  | "H6"  | "H7"
1766                             | "H8"  | "H9"  | "H10" | "H11"
1767                             | "H12" | "H13" | "H14" | "H15"
1768                             | "H16" | "H17" | "H18" | "H19"
1769                             | "H20" | "H21" | "H22" | "H23"
1770                             | "H24" | "H25" | "H26" | "H27"
1771                             | "H28" | "H29" | "H30" | "H31"
1772                             | "H32" | "H33" | "H34" | "H35"
1773                             | "H36" | "H37" | "H38" | "H39"
1774                             | "H40" | "H41" | "H42" | "H43"
1775                             | "H44" | "H45" | "H46" | "H47"
1776                             | "H48" | "H49" | "H50" | "H51"
1777                             | "H52" | "H53" | "H54" | "H55"
1778                             | "H56" | "H57" | "H58" | "H59"
1779                             | "H60" | "H61" | "H62" | "H63"
1780
1781    <fragOutputReg>        ::= "o" "[" <fragOutputRegName> "]"
1782
1783    <fragOutputRegName>    ::= "COLR" | "COLH" | "DEPR"
1784
1785    <numberedLocal>        ::= "p" "[" <localNumber> "]"
1786
1787    <localNumber>          ::= <integer> from 0 to
1788                               MAX_FRAGMENT_PROGRAM_LOCAL_PARAMETERS_NV - 1
1789
1790    <scalarSuffix>         ::= "." <component>
1791
1792    <swizzleSuffix>        ::= ""
1793                             | "." <component> <component>
1794                                   <component> <component>
1795
1796    <component>            ::= "x" | "y" | "z" | "w"
1797
1798    <texImageId>           ::= <texImageUnit> "," <texImageTarget>
1799
1800    <texImageUnit>         ::= "TEX0"  | "TEX1"  | "TEX2"  | "TEX3"
1801                             | "TEX4"  | "TEX5"  | "TEX6"  | "TEX7"
1802                             | "TEX8"  | "TEX9"  | "TEX10" | "TEX11"
1803                             | "TEX12" | "TEX13" | "TEX14" | "TEX15"
1804
1805    <texImageTarget>       ::= "1D" | "2D" | "3D" | "CUBE" | "RECT"
1806
1807    <constantDefinition>   ::= "DEFINE" <namedVectorConstant> "="
1808                               <vectorConstant>
1809                             | "DEFINE" <namedScalarConstant> "="
1810                               <scalarConstant>
1811
1812    <localDeclaration>     ::= "DECLARE" <namedLocalParameter>
1813                               <optionalLocalValue>
1814
1815    <optionalLocalValue>   ::= ""
1816                             | "=" <vectorConstant>
1817                             | "=" <scalarConstant>
1818
1819    <vectorConstant>       ::= "{" <vectorConstantList> "}"
1820                             | <namedVectorConstant>
1821
1822    <vectorConstantList>   ::= <scalarConstant>
1823                             | <scalarConstant> "," <scalarConstant>
1824                             | <scalarConstant> "," <scalarConstant> ","
1825                               <scalarConstant>
1826                             | <scalarConstant> "," <scalarConstant> ","
1827                               <scalarConstant> "," <scalarConstant>
1828
1829    <scalarConstant>       ::= <signedScalarConstant>
1830                             | <namedScalarConstant>
1831
1832    <signedScalarConstant> ::= <optionalSign> <floatConstant>
1833
1834    <namedScalarConstant>  ::= <identifier>    ((name of a scalar constant
1835                                                 in a DEFINE instruction))
1836
1837    <namedVectorConstant>  ::= <identifier>    ((name of a vector constant
1838                                                 in a DEFINE instruction))
1839
1840    <namedLocalParameter>  ::= <identifier>    ((name of a local parameter
1841                                                 in a DECLARE instruction))
1842
1843    <negate>               ::= "-" | "+" | ""
1844
1845    <optionalSign>         ::= "-" | "+" | ""
1846
1847    <identifier>           ::= see text below
1848
1849    <floatConstant>        ::= see text below
1850
1851
    The <identifier> rule matches a sequence of one or more letters ("A"
    through "Z", "a" through "z", "_", and "$") and digits ("0" through "9");
    the first character must be a letter.  The underscore ("_") and dollar
    sign ("$") count as letters.  Upper and lower case letters are different
    (names are case-sensitive).
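
    As an informal illustration of the <identifier> rule, the following C
    sketch checks a candidate name character by character.  The function
    name is_identifier is hypothetical and not part of this extension; the
    sketch checks only the lexical rule and does not reject names that
    collide with register names or other reserved tokens.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Letters per the <identifier> rule: A-Z, a-z, "_" and "$". */
static bool is_ident_letter(char c)
{
    return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
           c == '_' || c == '$';
}

static bool is_ident_digit(char c)
{
    return c >= '0' && c <= '9';
}

/* Returns true if the NUL-terminated string s matches <identifier>. */
bool is_identifier(const char *s)
{
    assert(s != NULL);
    if (!is_ident_letter(s[0]))       /* also rejects the empty string */
        return false;
    for (const char *p = s + 1; *p != '\0'; ++p) {
        if (!is_ident_letter(*p) && !is_ident_digit(*p))
            return false;
    }
    return true;
}
```

    Under this sketch "_tmp$1" and "Color" are accepted as distinct from
    "color", while "9lives" is rejected because it starts with a digit.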
1857
1858    The <floatConstant> rule matches a floating-point constant consisting
1859    of an integer part, a decimal point, a fraction part, an "e" or
1860    "E", and an optionally signed integer exponent.  The integer and
    fraction parts both consist of a sequence of one or more digits ("0"
    through "9").  Either the integer part or the fraction part (not
1863    both) may be missing; either the decimal point or the "e" (or "E")
1864    and the exponent (not both) may be missing.
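
    The <floatConstant> rule can be sketched as a small hand-written
    matcher.  This is an illustration under one reading of the text above
    (the same shape as a C floating-point constant without a type suffix),
    not the parser used by any implementation; is_float_constant is a
    hypothetical name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Counts the leading decimal digits of s. */
static size_t count_digits(const char *s)
{
    size_t n = 0;
    while (s[n] >= '0' && s[n] <= '9')
        ++n;
    return n;
}

/* Returns true if the NUL-terminated string s matches <floatConstant>
 * as read here: [digits] ["." [digits]] [("e"|"E") [sign] digits]. */
bool is_float_constant(const char *s)
{
    assert(s != NULL);

    size_t int_len = count_digits(s);
    s += int_len;

    bool has_point = (*s == '.');
    size_t frac_len = 0;
    if (has_point) {
        ++s;
        frac_len = count_digits(s);
        s += frac_len;
    }

    bool has_exp = (*s == 'e' || *s == 'E');
    if (has_exp) {
        ++s;
        if (*s == '+' || *s == '-')
            ++s;
        size_t exp_len = count_digits(s);
        if (exp_len == 0)
            return false;             /* exponent needs at least one digit */
        s += exp_len;
    }

    if (*s != '\0')
        return false;                 /* trailing characters */
    if (int_len == 0 && frac_len == 0)
        return false;                 /* integer and fraction both missing */
    if (!has_point && !has_exp)
        return false;                 /* point and exponent both missing */
    return true;
}
```

    With this reading, "1.0", ".5", "1.", "1e5", and "1.5E-3" match, while
    "1", ".", "e5", and "1e" do not.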
1865
1866    A fragment program fails to load if it contains more than the maximum
1867    number of executable instructions.  If ARB_fragment_program is supported,
1868    this limit is the value of MAX_PROGRAM_INSTRUCTIONS_ARB for the
1869    FRAGMENT_PROGRAM_ARB target.  Otherwise, the limit is 1024.  Executable
1870    instructions are those matching the <instruction> rule in the grammar, and
1871    do not include DEFINE or DECLARE instructions.
1872
1873    A fragment program fails to load if its total temporary and output
1874    register count exceeds 64.  Each fp32 temporary or output register used by
1875    the program (R0-R31, o[COLR], and o[DEPR]) counts as two registers; each
    fp16 temporary or output register used by the program (H0-H63 and o[COLH])
    counts as a single register.
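
    The register budget above amounts to a weighted count.  The following
    C sketch shows the arithmetic only; the function and macro names are
    hypothetical, and the caller is assumed to have already counted the
    distinct fp32 and fp16 temporary and output registers the program uses.

```c
#include <assert.h>
#include <stdbool.h>

#define FRAGMENT_REGISTER_BUDGET 64   /* limit stated in the text above */

/* Each fp32 register (R0-R31, o[COLR], o[DEPR]) costs two units;
 * each fp16 register (H0-H63, o[COLH]) costs one unit. */
int fragment_register_cost(int fp32_regs_used, int fp16_regs_used)
{
    assert(fp32_regs_used >= 0 && fp16_regs_used >= 0);
    return 2 * fp32_regs_used + fp16_regs_used;
}

/* Returns true if the program stays within the budget and may load. */
bool fragment_register_count_ok(int fp32_regs_used, int fp16_regs_used)
{
    return fragment_register_cost(fp32_regs_used, fp16_regs_used)
           <= FRAGMENT_REGISTER_BUDGET;
}
```

    For example, a program using ten fp32 registers and 44 fp16 registers
    costs exactly 64 and loads; adding one more fp16 register exceeds the
    limit.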
1878
1879    A fragment program fails to load if any instruction sources more than one
1880    unique fragment attribute register.  Instructions sourcing the same
1881    attribute register multiple times are acceptable.
1882
1883    A fragment program fails to load if any instruction sources more than one
1884    unique program parameter register.  Instructions sourcing the same program
1885    parameter multiple times are acceptable.
1886
1887    A fragment program fails to load if multiple texture lookup instructions
1888    reference different targets for the same texture image unit.
1889
1890    A fragment program fails to load if it writes to both the o[COLR] and
1891    o[COLH] output registers.
1892
1893    The error INVALID_OPERATION is generated by LoadProgramNV if a fragment
1894    program fails to load because it is not syntactically correct or for one
1895    of the semantic restrictions listed above.
1896
1897    The error INVALID_OPERATION is generated by LoadProgramNV if a program is
1898    loaded for id when id is currently loaded with a program of a different
1899    target.
1900
1901    A successfully loaded fragment program is parsed into a sequence of
1902    instructions.  Each instruction is identified by its tokenized name.  The
1903    operation of these instructions when executed is defined in Sections
1904    3.11.4 and 3.11.5.
1905
1906
1907    Section 3.11.4, Fragment Program Operation
1908
    There are forty-five fragment program instructions.  Fragment program
    instructions may have up to sixteen variants, including a suffix of "R",
1911    "H", or "X" to specify arithmetic precision (section 3.11.4.2), a suffix
1912    of "C" to allow an update of the condition code register (section
1913    3.11.4.4), and a suffix of "_SAT" to clamp the result vector components to
1914    the range [0,1] (section 3.11.4.4).  For example, the sixteen forms of the
1915    "ADD" instruction are "ADD", "ADDR", "ADDH", "ADDX", "ADDC", "ADDRC",
1916    "ADDHC", "ADDXC", "ADD_SAT", "ADDR_SAT", "ADDH_SAT", "ADDX_SAT",
1917    "ADDC_SAT", "ADDRC_SAT", "ADDHC_SAT", and "ADDXC_SAT".
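
    The sixteen forms arise from four precision choices (none, "R", "H",
    "X"), two condition code choices, and two clamp choices.  The C sketch
    below enumerates them in the order listed above; format_variant is a
    hypothetical helper, and it does not account for instructions that
    support only a subset of the suffixes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Writes variant number `index` (0..15) of opcode `base` into `out`.
 * Bits 0-1 select the precision suffix, bit 2 the "C" suffix, and
 * bit 3 the "_SAT" suffix.  Returns the length snprintf reports. */
int format_variant(char *out, size_t out_size, const char *base, int index)
{
    static const char *const precision[4] = { "", "R", "H", "X" };

    assert(out != NULL && base != NULL);
    assert(index >= 0 && index < 16);

    const char *cc  = (index & 4) ? "C" : "";
    const char *sat = (index & 8) ? "_SAT" : "";
    return snprintf(out, out_size, "%s%s%s%s",
                    base, precision[index & 3], cc, sat);
}
```

    Calling format_variant with base "ADD" and index values 0 through 15
    yields "ADD", "ADDR", "ADDH", "ADDX", "ADDC", and so on through
    "ADDXC_SAT".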
1918
1919    Some mathematical instructions that support precision suffixes, typically
1920    those that involve complicated floating-point computations, do not support
1921    the "X" precision suffix.
1922
1923    The fragment program instructions and their respective input and output
1924    parameters are summarized in Table X.4.
1925
1926      Instruction          Inputs  Output   Description
1927      -----------------    ------  ------   --------------------------------
1928      ADD[RHX][C][_SAT]    v,v     v        add
1929      COS[RH ][C][_SAT]    s       ssss     cosine
1930      DDX[RH ][C][_SAT]    v       v        derivative relative to x
1931      DDY[RH ][C][_SAT]    v       v        derivative relative to y
1932      DP3[RHX][C][_SAT]    v,v     ssss     3-component dot product
1933      DP4[RHX][C][_SAT]    v,v     ssss     4-component dot product
1934      DST[RH ][C][_SAT]    v,v     v        distance vector
1935      EX2[RH ][C][_SAT]    s       ssss     exponential base 2
1936      FLR[RHX][C][_SAT]    v       v        floor
1937      FRC[RHX][C][_SAT]    v       v        fraction
1938      KIL                  none    none     conditionally discard fragment
1939      LG2[RH ][C][_SAT]    s       ssss     logarithm base 2
1940      LIT[RH ][C][_SAT]    v       v        compute light coefficients
1941      LRP[RHX][C][_SAT]    v,v,v   v        linear interpolation
1942      MAD[RHX][C][_SAT]    v,v,v   v        multiply and add
1943      MAX[RHX][C][_SAT]    v,v     v        maximum
1944      MIN[RHX][C][_SAT]    v,v     v        minimum
1945      MOV[RHX][C][_SAT]    v       v        move
1946      MUL[RHX][C][_SAT]    v,v     v        multiply
1947      PK2H                 v       ssss     pack two 16-bit floats
1948      PK2US                v       ssss     pack two unsigned 16-bit scalars
1949      PK4B                 v       ssss     pack four signed 8-bit scalars
1950      PK4UB                v       ssss     pack four unsigned 8-bit scalars
1951      POW[RH ][C][_SAT]    s,s     ssss     exponentiation (x^y)
1952      RCP[RH ][C][_SAT]    s       ssss     reciprocal
1953      RFL[RH ][C][_SAT]    v,v     v        reflection vector
1954      RSQ[RH ][C][_SAT]    s       ssss     reciprocal square root
1955      SEQ[RHX][C][_SAT]    v,v     v        set on equal
1956      SFL[RHX][C][_SAT]    v,v     v        set on false
1957      SGE[RHX][C][_SAT]    v,v     v        set on greater than or equal
1958      SGT[RHX][C][_SAT]    v,v     v        set on greater than
1959      SIN[RH ][C][_SAT]    s       ssss     sine
1960      SLE[RHX][C][_SAT]    v,v     v        set on less than or equal
1961      SLT[RHX][C][_SAT]    v,v     v        set on less than
1962      SNE[RHX][C][_SAT]    v,v     v        set on not equal
1963      STR[RHX][C][_SAT]    v,v     v        set on true
1964      SUB[RHX][C][_SAT]    v,v     v        subtract
1965      TEX[C][_SAT]         v       v        texture lookup
1966      TXD[C][_SAT]         v,v,v   v        texture lookup w/partials
1967      TXP[C][_SAT]         v       v        projective texture lookup
1968      UP2H[C][_SAT]        s       v        unpack two 16-bit floats
1969      UP2US[C][_SAT]       s       v        unpack two unsigned 16-bit scalars
1970      UP4B[C][_SAT]        s       v        unpack four signed 8-bit scalars
1971      UP4UB[C][_SAT]       s       v        unpack four unsigned 8-bit scalars
1972      X2D[RH ][C][_SAT]    v,v,v   v        2D coordinate transformation
1973
1974    Table X.4:  Summary of fragment program instructions.  "[RHX]" indicates
1975    an optional arithmetic precision suffix.  "[C]" indicates an optional
1976    condition code update suffix.  "[_SAT]" indicates an optional clamp of
1977    result vector components to [0,1].  "v" indicates a 4-component vector
1978    input or output, "s" indicates a scalar input, and "ssss" indicates a
1979    scalar output replicated across a 4-component vector.
1980
1981
1982    Section 3.11.4.1:  Fragment Program Storage Precision
1983
    Registers in fragment programs are stored in two different representations:
1985    16-bit floating-point (fp16) and 32-bit floating-point (fp32).  There is
1986    an additional 12-bit fixed-point representation (fx12) used only as an
1987    internal representation for instructions with the "X" precision qualifier.
1988
1989    In the 32-bit float (fp32) representation, each component is represented
1990    in floating-point with eight exponent and twenty-three mantissa bits, as
1991    in the standard IEEE single-precision format.  If S represents the sign (0
1992    or 1), E represents the exponent in the range [0,255], and M represents
1993    the mantissa in the range [0,2^23-1], then a fp32 float is decoded as:
1994
1995       (-1)^S * 0.0,                           if E == 0,
1996       (-1)^S * 2^(E-127) * (1 + M/2^23),      if 0 < E < 255,
1997       (-1)^S * INF,                           if E == 255 and M == 0,
1998       NaN,                                    if E == 255 and M != 0.
1999
2000    INF (Infinity) is a special representation indicating numerical overflow.
2001    NaN (Not a Number) is a special representation indicating the result of
2002    illegal arithmetic operations, such as computing the square root or
2003    logarithm of a negative number.  Note that all normal fp32 values, zero,
2004    and INF have an associated sign.  -0.0 and +0.0 are considered equivalent
2005    for the purposes of comparisons.
2006
2007    This representation is identical to the IEEE single-precision
2008    floating-point standard, except that no special representation is provided
2009    for denorms -- numbers in the range (-2^-126, +2^-126).  All such numbers
2010    are flushed to zero.
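
    The fp32 decoding rules, including the flush of denorms to zero, can
    be sketched in C as follows.  decode_fp32 is a hypothetical helper
    that returns the decoded value as a host double; it illustrates the
    table above and is not a description of any hardware datapath.

```c
#include <math.h>
#include <stdint.h>

/* Exact power of two for the small exponents needed here. */
static double fp32_pow2(int n)
{
    double r = 1.0;
    for (; n > 0; --n) r *= 2.0;
    for (; n < 0; ++n) r *= 0.5;
    return r;
}

/* Decodes a 32-bit pattern using the S/E/M rules given above. */
double decode_fp32(uint32_t bits)
{
    uint32_t s = bits >> 31;              /* sign, 0 or 1          */
    uint32_t e = (bits >> 23) & 0xFFu;    /* exponent, [0,255]     */
    uint32_t m = bits & 0x7FFFFFu;        /* mantissa, [0,2^23-1]  */
    double sign = s ? -1.0 : 1.0;

    if (e == 0)
        return sign * 0.0;                /* zero; denorms flush to zero */
    if (e == 255)
        return (m == 0) ? sign * INFINITY : NAN;
    return sign * fp32_pow2((int)e - 127) * (1.0 + (double)m / 8388608.0);
}
```

    Note that a pattern such as 0x00000001, which IEEE single precision
    would decode as the smallest denorm, decodes to zero here.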
2011
2012    In a 16-bit float (fp16) register, each component is represented
2013    similarly, except with only five exponent and ten mantissa bits.  If S
2014    represents the sign (0 or 1), E represents the exponent in the range
    [0,31], and M represents the mantissa in the range [0,2^10-1], then an
    fp16 float is decoded as:
2017
2018       (-1)^S * 0.0,                           if E == 0 and M == 0,
       (-1)^S * 2^-14 * M/2^10,                if E == 0 and M != 0,
2020       (-1)^S * 2^(E-15) * (1 + M/2^10),       if 0 < E < 31,
2021       (-1)^S * INF,                           if E == 31 and M == 0, or
2022       NaN,                                    if E == 31 and M != 0.
2023
2024    One important difference is that the fp16 representation, unlike fp32,
2025    supports denorms to maximize the limited precision of the 16-bit floating
2026    point encodings.
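
    A matching C sketch for the fp16 rules shows how denorms are retained.
    As with any illustration here, decode_fp16 is a hypothetical helper
    returning a host double, not part of this extension.

```c
#include <math.h>
#include <stdint.h>

/* Exact power of two for the small exponents needed here. */
static double fp16_pow2(int n)
{
    double r = 1.0;
    for (; n > 0; --n) r *= 2.0;
    for (; n < 0; ++n) r *= 0.5;
    return r;
}

/* Decodes a 16-bit pattern using the S/E/M rules given above. */
double decode_fp16(uint16_t bits)
{
    unsigned s = (unsigned)bits >> 15;           /* sign, 0 or 1         */
    unsigned e = ((unsigned)bits >> 10) & 0x1Fu; /* exponent, [0,31]     */
    unsigned m = (unsigned)bits & 0x3FFu;        /* mantissa, [0,2^10-1] */
    double sign = s ? -1.0 : 1.0;

    if (e == 0)                       /* zero (m == 0) or denorm (m != 0) */
        return sign * fp16_pow2(-14) * ((double)m / 1024.0);
    if (e == 31)
        return (m == 0) ? sign * INFINITY : NAN;
    return sign * fp16_pow2((int)e - 15) * (1.0 + (double)m / 1024.0);
}
```

    The smallest positive pattern, 0x0001, decodes to 2^-24 rather than
    zero, and the largest finite pattern, 0x7BFF, decodes to 65504.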
2027
2028    In the 12-bit fixed-point (fx12) format, numbers are represented as signed
2029    12-bit two's complement integers with 10 fraction bits.  The range of
2030    representable values is [-2048/1024, +2047/1024].
2031
2032    Section 3.11.4.2:  Fragment Program Operation Precision
2033
2034    Fragment program instructions frequently perform mathematical operations.
2035    Such operations may be performed at one of three different precisions.
2036    Fragment programs can specify the precision of each instruction by using
2037    the precision suffix.  If an instruction has a suffix of "R", calculations
2038    are carried out with 32-bit floating point operands and results.  If an
2039    instruction has a suffix of "H", calculations are carried out using 16-bit
2040    floating point operands and results.  If an instruction has a suffix of
2041    "X", calculations are carried out using 12-bit fixed point operands and
2042    results.  For example, the instruction "MULR" performs a 32-bit
2043    floating-point multiply, "MULH" performs a 16-bit floating-point multiply,
2044    and "MULX" performs a 12-bit fixed-point multiply.  If no precision suffix
2045    is specified, calculations are carried out using the precision of the
2046    temporary register receiving the result.
2047
2048    Fragment program instructions may source registers or constants whose
2049    precisions differ from the precision specified with the instruction.
2050    Instructions may also generate intermediate results with a different
2051    precision than that of the destination register.  In these cases, the
2052    values sourced are converted to the precision specified by the
2053    instruction.
2054
2055    When converting to fx12 format, -INF and any values less than -2048/1024
2056    become -2048/1024.  +INF, and any values greater than +2047/1024 become
2057    +2047/1024.  NaN becomes 0.
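
    The fx12 conversion rules can be sketched in C as shown below.  The
    clamping and NaN handling follow the paragraph above; the rounding of
    in-range values is not specified there, so round-to-nearest is an
    assumption of this sketch.  float_to_fx12 is a hypothetical helper
    that returns the integer n such that the fx12 value is n/1024.

```c
#include <stdint.h>

/* Converts v to fx12, returned as an integer in [-2048, 2047]
 * representing n/1024. */
int16_t float_to_fx12(float v)
{
    if (v != v)
        return 0;                         /* NaN becomes 0 */
    if (v < -2048.0f / 1024.0f)
        return -2048;                     /* includes -INF */
    if (v > 2047.0f / 1024.0f)
        return 2047;                      /* includes +INF */

    /* ASSUMPTION: round to nearest, ties away from zero. */
    float scaled = v * 1024.0f;
    return (int16_t)(scaled >= 0.0f ? scaled + 0.5f : scaled - 0.5f);
}
```

    For example, 0.5 converts to 512 (512/1024), while 5.0 and +INF both
    saturate to 2047.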
2058
2059    When converting to fp16 format, any values less than or equal to -2^16 are
2060    converted to -INF.  Any values greater than or equal to +2^16 are
2061    converted to +INF.  -INF, +INF, NaN, -0.0, and +0.0 are unchanged.  Any
2062    other values that are not exactly representable in fp16 format are
2063    converted to one of the two nearest representable values.
2064
2065    When converting to fp32 format, any values less than or equal to -2^128
2066    are converted to -INF.  Any values greater than or equal to +2^128 are
2067    converted to +INF.  -INF, +INF, NaN, -0.0, and +0.0 are unchanged.  Any
2068    other values that are not exactly representable in fp32 format are
2069    converted to one of the two nearest representable values.
2070
2071    Fragment program instructions using the fragment attribute registers
2072    f[FOGC] or f[TEX0] through f[TEX7] will be carried out at full fp32
2073    precision, regardless of the precision specified by the instruction.
2074
2075    Section 3.11.4.3:  Fragment Program Operands
2076
2077    Except for KIL, fragment program instructions operate on either vector or
2078    scalar operands, indicated in the grammar (see section 3.11.3) by the
2079    rules <vectorSrc> and <scalarSrc> respectively.
2080
2081    The basic set of scalar operands is defined by the grammar rule
2082    <baseScalarSrc>.  Scalar operands can be scalar constants (embedded or
2083    named), or single components of vector constants, local parameters, or
2084    registers allowed by the <srcRegister> rule.  A vector component is
2085    selected by the <scalarSuffix> rule, where the characters "x", "y", "z",
2086    and "w" select the x, y, z, and w components, respectively, of the vector.
2087
2088    The basic set of vector operands is defined by the grammar rule
2089    <baseVectorSrc>.  Vector operands can include vector constants, local
2090    parameters, or registers allowed by the <srcRegister> rule.
2091
2092    Basic vector operands can be swizzled according to the <swizzleSuffix>
2093    rule.  In its most general form, the <swizzleSuffix> rule matches the
2094    pattern ".????" where each question mark is one of "x", "y", "z", or "w".
2095    For such patterns, the x, y, z, and w components of the operand are taken
2096    from the vector components named by the first, second, third, and fourth
2097    character of the pattern, respectively.  For example, if the swizzle
2098    suffix is ".yzzx" and the specified source contains {2,8,9,0}, the
2099    swizzled operand used by the instruction is {8,9,9,2}.  If the
2100    <swizzleSuffix> rule matches "", it is treated as though it were ".xyzw".
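
    The swizzle selection can be illustrated with a short C sketch.  The
    Vec4 type and apply_swizzle function are hypothetical names used only
    for this example; the suffix is passed as the text matched by
    <swizzleSuffix>, either "" or a "." followed by four component names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

typedef struct { float x, y, z, w; } Vec4;

/* Applies a swizzle suffix to src.  Returns false, leaving *out
 * untouched, if the suffix is neither "" nor of the form ".????". */
bool apply_swizzle(const char *suffix, Vec4 src, Vec4 *out)
{
    static const char comps[] = "xyzw";
    const float in[4] = { src.x, src.y, src.z, src.w };
    float res[4];

    assert(suffix != NULL && out != NULL);

    if (suffix[0] == '\0')
        suffix = ".xyzw";                 /* empty suffix is the identity */
    if (strlen(suffix) != 5 || suffix[0] != '.')
        return false;

    for (int i = 0; i < 4; ++i) {
        const char *pos = strchr(comps, suffix[i + 1]);
        if (pos == NULL)
            return false;                 /* not one of x, y, z, w */
        res[i] = in[pos - comps];
    }

    out->x = res[0];
    out->y = res[1];
    out->z = res[2];
    out->w = res[3];
    return true;
}
```

    Applying ".yzzx" to {2,8,9,0} with this sketch produces {8,9,9,2},
    matching the example in the text.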
2101
2102    Operands can optionally be negated according to the <negate> rule in
2103    <baseScalarSrc> or <baseVectorSrc>.  If the <negate> matches "-", each
2104    value is negated.
2105
2106    The absolute value of operands can be taken if the <vectorSrc> or
2107    <scalarSrc> rules match <absScalarSrc> or <absVectorSrc>.  In this case,
2108    the absolute value of each component is taken.  In addition, if the
2109    <negate> rule in <absScalarSrc> or <absVectorSrc> matches "-", the result
2110    is then negated.
2111
2112    Instructions requiring vector operands can also use scalar operands in the
2113    case where the <vectorSrc> rule matches <scalarSrc>.  In such cases, a
2114    4-component vector is produced by replicating the scalar.
2115
2116    After operands are loaded, they are converted to a data type corresponding
2117    to the operation precision specified in the fragment program instruction.
2118
2119    The following pseudo-code spells out the operand generation process.
2120    "SrcT" and "InstT" refer to the data types of the specified register or
2121    constant and the instruction, respectively.  "VecSrcT" and "VecInstT"
2122    refer to 4-component vectors of the corresponding type.  "absolute" is
2123    TRUE if the operand matches the <absScalarSrc> or <absVectorSrc> rules,
2124    and FALSE otherwise.  "negateBase" is TRUE if the <negate> rule in
2125    <baseScalarSrc> or <baseVectorSrc> matches "-" and FALSE otherwise.
2126    "negateAbs" is TRUE if the <negate> rule in <absScalarSrc> or
2127    <absVectorSrc> matches "-" and FALSE otherwise.  The ".c***", ".*c**",
2128    ".**c*", ".***c" modifiers refer to the x, y, z, and w components obtained
2129    by the swizzle operation.  TypeConvert() is assumed to convert a scalar of
2130    type SrcT to a scalar of type InstT using the type conversion process
2131    specified above.
2132
2133      VecInstT VectorLoad(VecSrcT source)
2134      {
2135          VecSrcT srcVal;
2136          VecInstT convertedVal;
2137
2138          srcVal.x = source.c***;
2139          srcVal.y = source.*c**;
2140          srcVal.z = source.**c*;
2141          srcVal.w = source.***c;
2142          if (negateBase) {
2143             srcVal.x = -srcVal.x;
2144             srcVal.y = -srcVal.y;
2145             srcVal.z = -srcVal.z;
2146             srcVal.w = -srcVal.w;
2147          }
2148          if (absolute) {
2149             srcVal.x = abs(srcVal.x);
2150             srcVal.y = abs(srcVal.y);
2151             srcVal.z = abs(srcVal.z);
2152             srcVal.w = abs(srcVal.w);
2153          }
2154          if (negateAbs) {
2155             srcVal.x = -srcVal.x;
2156             srcVal.y = -srcVal.y;
2157             srcVal.z = -srcVal.z;
2158             srcVal.w = -srcVal.w;
2159          }
2160
2161          convertedVal.x = TypeConvert(srcVal.x);
2162          convertedVal.y = TypeConvert(srcVal.y);
2163          convertedVal.z = TypeConvert(srcVal.z);
2164          convertedVal.w = TypeConvert(srcVal.w);
2165          return convertedVal;
2166      }
2167
2168      InstT ScalarLoad(VecSrcT source)
2169      {
2170          SrcT srcVal;
2171          InstT convertedVal;
2172
2173          srcVal = source.c***;
2174          if (negateBase) {
2175            srcVal = -srcVal;
2176          }
2177          if (absolute) {
2178             srcVal = abs(srcVal);
2179          }
2180          if (negateAbs) {
2181            srcVal = -srcVal;
2182          }
2183
2184          convertedVal = TypeConvert(srcVal);
2185          return convertedVal;
2186      }
2187
2188
2189    Section 3.11.4.4, Fragment Program Destination Register Update
2190
2191    Each fragment program instruction, except for KIL, writes a 4-component
2192    result vector to a single temporary or output register.
2193
2194    The four components of the result vector are first optionally clamped to
2195    the range [0,1].  The components will be clamped if and only if the result
2196    clamp suffix "_SAT" is present in the instruction name.  The instruction
2197    "ADD_SAT" will clamp the results to [0,1]; the otherwise equivalent
2198    instruction "ADD" will not.
2199
2200    Since the instruction may be carried out at a different precision than the
    destination register, the components of the result vector are then
    converted to the data type corresponding to the destination register.
2203
    Writes to individual components of the destination register are controlled
2205    by two sets of enables: individual component write masks specified as part
2206    of the instruction and the optional condition code mask.
2207
2208    The component write mask is specified by the <optionalWriteMask> rule
2209    found in the <maskedDstReg> rule.  If the optional mask is "", all
2210    components are enabled.  Otherwise, the optional mask names the individual
2211    components to enable.  The characters "x", "y", "z", and "w" match the x,
2212    y, z, and w components respectively.  For example, an optional mask of
2213    ".xzw" indicates that the x, z, and w components should be enabled for
2214    writing but the y component should not.  The grammar requires that the
2215    destination register mask components must be listed in "xyzw" order.
2216
2217    The optional condition code mask is specified by the <optionalCCMask> rule
2218    found in the <maskedDstReg> rule.  If <optionalCCMask> matches "", all
2219    components are enabled.  Otherwise, the condition code register is loaded
2220    and swizzled according to the swizzling specified by <swizzleSuffix>.
2221    Each component of the swizzled condition code is tested according to the
2222    rule given by <ccMaskRule>.  <ccMaskRule> may have the values "EQ", "NE",
    "LT", "GE", "LE", or "GT", which mean to enable writes if the corresponding
2224    condition code field evaluates to equal, not equal, less than, greater
2225    than or equal, less than or equal, or greater than, respectively.
2226    Comparisons involving condition codes of "UN" (unordered) evaluate to true
2227    for "NE" and false otherwise.  For example, if the condition code is
2228    (GT,LT,EQ,GT) and the condition code mask is "(NE.zyxw)", the swizzle
    operation will load (EQ,LT,GT,GT) and the mask will thus enable
2230    writes on the y, z, and w components.  In addition, "TR" always enables
2231    writes and "FL" always disables writes, regardless of the condition code.
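
    The condition code mask test described above can be exercised with a
    small C sketch.  All type and function names here are hypothetical;
    the swizzle is passed as four indices (0 for x through 3 for w) naming
    the condition code component that feeds each destination component.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum { CC_EQ, CC_LT, CC_GT, CC_UN } CondCode;
typedef enum {
    RULE_EQ, RULE_NE, RULE_LT, RULE_GE,
    RULE_LE, RULE_GT, RULE_TR, RULE_FL
} CCMaskRule;

/* Tests one condition code field against a mask rule.  "UN" passes
 * only the NE and TR rules. */
bool test_cc(CCMaskRule rule, CondCode field)
{
    switch (rule) {
    case RULE_EQ: return field == CC_EQ;
    case RULE_NE: return field != CC_EQ;
    case RULE_LT: return field == CC_LT;
    case RULE_GE: return field == CC_GT || field == CC_EQ;
    case RULE_LE: return field == CC_LT || field == CC_EQ;
    case RULE_GT: return field == CC_GT;
    case RULE_TR: return true;
    case RULE_FL: return false;
    }
    return false;
}

/* Returns a 4-bit write-enable mask: bit 0 is x, bit 3 is w. */
unsigned cc_write_enables(CCMaskRule rule, const CondCode cc[4],
                          const int swizzle[4])
{
    unsigned enables = 0;
    for (int i = 0; i < 4; ++i) {
        assert(swizzle[i] >= 0 && swizzle[i] < 4);
        if (test_cc(rule, cc[swizzle[i]]))
            enables |= 1u << i;
    }
    return enables;
}
```

    With a condition code of (GT,LT,EQ,GT), the NE rule, and the swizzle
    indices {2,1,0,3} for ".zyxw", the sketch returns 0xE, meaning y, z,
    and w are enabled, as in the example in the text.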
2232
2233    Each component of the destination register is updated with the result of
2234    the fragment program if and only if the component is enabled for writes by
2235    both the component write mask and the optional condition code mask.
2236    Otherwise, the component of the destination register remains unchanged.
2237
2238    A fragment program instruction can also optionally update the condition
2239    code register.  The condition code is updated if the condition code
2240    register update suffix "C" is present in the instruction name.  The
2241    instruction "ADDC" will update the condition code; the otherwise
2242    equivalent instruction "ADD" will not.  If condition code updates are
2243    enabled, each component of the destination register enabled for writes is
2244    compared to zero.  The corresponding component of the condition code is
2245    set to "LT", "EQ", or "GT", if the written component is less than, equal
2246    to, or greater than zero, respectively.  Condition code components are set
2247    to "UN" if the written component is NaN.  Note that values of -0.0 and
2248    +0.0 both evaluate to "EQ".  If a component of the destination register is
2249    not enabled for writes, the corresponding condition code component is
2250    unchanged.
2251
2252    In the following example code,
2253
2254        # R1=(-2, 0, 2, NaN)              R0                  CC
2255        MOVC R0, R1;               # ( -2,  0,   2, NaN) (LT,EQ,GT,UN)
2256        MOVC R0.xyz, R1.yzwx;      # (  0,  2, NaN, NaN) (EQ,GT,UN,UN)
2257        MOVC R0 (NE), R1.zywx;     # (  0,  0, NaN,  -2) (EQ,EQ,UN,LT)
2258
2259    the first instruction writes (-2,0,2,NaN) to R0 and updates the condition
    code to (LT,EQ,GT,UN).  In the second instruction, only the "x", "y", and
    "z" components of R0 and the condition code are updated, so R0 ends up
    with (0,2,NaN,NaN) and the condition code ends up with (EQ,GT,UN,UN).  In
    the third instruction, the condition code mask disables writes to the x
    component (its condition code field is "EQ"), so R0 ends up with
    (0,0,NaN,-2) and the condition code ends up with (EQ,EQ,UN,LT).
2266
2267    The following pseudocode illustrates the process of writing a result
2268    vector to the destination register.  In the example, "ccMaskRule" refers
2269    to the condition code mask rule given by <ccMaskRule> (or "" if no rule is
    specified), "instrMask" refers to the component write mask given by the
2271    <optionalWriteMask> rule, "updatecc" is TRUE if condition code updates are
2272    enabled, and "clamp01" is TRUE if [0,1] result clamping is enabled.
2273    "destination" and "cc" refer to the register selected by <dstRegister> and
2274    the condition code, respectively.
2275
2276      boolean TestCC(CondCode field) {
2277          switch (ccMaskRule) {
2278          case "EQ":  return (field == "EQ");
2279          case "NE":  return (field != "EQ");
2280          case "LT":  return (field == "LT");
2281          case "GE":  return (field == "GT" || field == "EQ");
2282          case "LE":  return (field == "LT" || field == "EQ");
2283          case "GT":  return (field == "GT");
2284          case "TR":  return TRUE;
2285          case "FL":  return FALSE;
          case "":    return TRUE;
          }
      }
2288
2289      enum GenerateCC(DstT value) {
2290        if (value == NaN) {
2291          return UN;
2292        } else if (value < 0) {
2293          return LT;
2294        } else if (value == 0) {
2295          return EQ;
2296        } else {
2297          return GT;
2298        }
2299      }
2300
2301      void UpdateDestination(VecDstT destination, VecInstT result)
2302      {
2303          // Load the original destination register and condition code.
2304          VecDstT resultDst;
2305          VecDstT merged;
2306          VecCC   mergedCC;
2307
2308          // Clamp the result vector components to [0,1], if requested.
2309          if (clamp01) {
2310              if (result.x < 0)      result.x = 0;
2311              else if (result.x > 1) result.x = 1;
2312              if (result.y < 0)      result.y = 0;
2313              else if (result.y > 1) result.y = 1;
2314              if (result.z < 0)      result.z = 0;
2315              else if (result.z > 1) result.z = 1;
2316              if (result.w < 0)      result.w = 0;
2317              else if (result.w > 1) result.w = 1;
2318          }
2319
2320          // Convert the result to the type of the destination register.
2321          resultDst.x = TypeConvert(result.x);
2322          resultDst.y = TypeConvert(result.y);
2323          resultDst.z = TypeConvert(result.z);
2324          resultDst.w = TypeConvert(result.w);
2325
2326          // Merge the converted result into the destination register, under
2327          // control of the compile- and run-time write masks.
2328          merged = destination;
2329          mergedCC = cc;
          if (instrMask.x && TestCC(cc.c***)) {
              merged.x = resultDst.x;
              if (updatecc) mergedCC.x = GenerateCC(resultDst.x);
          }
          if (instrMask.y && TestCC(cc.*c**)) {
              merged.y = resultDst.y;
              if (updatecc) mergedCC.y = GenerateCC(resultDst.y);
          }
          if (instrMask.z && TestCC(cc.**c*)) {
              merged.z = resultDst.z;
              if (updatecc) mergedCC.z = GenerateCC(resultDst.z);
          }
          if (instrMask.w && TestCC(cc.***c)) {
              merged.w = resultDst.w;
              if (updatecc) mergedCC.w = GenerateCC(resultDst.w);
          }
2346
2347          // Write out the new destination register and result code.
2348          destination = merged;
2349          cc = mergedCC;
2350      }
2351
2352    Section 3.11.5, Fragment Program Instruction Set
2353
2354    The following sections describe the instruction set available to fragment
2355    programs.
2356
2357
2358    Section 3.11.5.1,  ADD:  Add
2359
2360    The ADD instruction performs a component-wise add of the two operands to
2361    yield a result vector.
2362
2363      tmp0 = VectorLoad(op0);
2364      tmp1 = VectorLoad(op1);
2365      result.x = tmp0.x + tmp1.x;
2366      result.y = tmp0.y + tmp1.y;
2367      result.z = tmp0.z + tmp1.z;
2368      result.w = tmp0.w + tmp1.w;
2369
2370    The following special-case rules apply to addition:
2371
2372      1. "A+B" is always equivalent to "B+A".
2373      2. NaN + <x> = NaN, for all <x>.
2374      3. +INF + <x> = +INF, for all <x> except NaN and -INF.
2375      4. -INF + <x> = -INF, for all <x> except NaN and +INF.
2376      5. +INF + -INF = NaN.
2377      6. -0.0 + <x> = <x>, for all <x>.
2378      7. +0.0 + <x> = <x>, for all <x> except -0.0.
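
    These rules coincide with IEEE 754 addition in the default rounding
    mode, so they can be spot-checked on a host whose C float type follows
    IEEE 754 (an assumption of this sketch).  The helper names below are
    hypothetical; rule 1 is checked against a single sample operand only.

```c
#include <math.h>
#include <stdbool.h>

/* True for negative values, including -0.0 and -INF. */
static bool is_negative(float v)
{
    return v == 0.0f ? (1.0f / v) < 0.0f : v < 0.0f;
}

/* Equality that treats NaN as equal to NaN and -0.0 as unlike +0.0. */
static bool same_value(float a, float b)
{
    if (a != a || b != b)
        return (a != a) && (b != b);
    return a == b && is_negative(a) == is_negative(b);
}

/* Returns true if host addition obeys rules 1-7 above for operand x. */
bool add_special_cases_hold(float x)
{
    const float inf = INFINITY, nan = NAN;
    const float nzero = -0.0f, pzero = 0.0f;
    const bool x_is_nan = (x != x);
    const bool x_is_nzero = (x == 0.0f) && is_negative(x);

    if (!same_value(x + 1.5f, 1.5f + x)) return false;           /* 1 */
    if (!same_value(nan + x, nan)) return false;                 /* 2 */
    if (!x_is_nan && x != -inf && !same_value(inf + x, inf))
        return false;                                            /* 3 */
    if (!x_is_nan && x != inf && !same_value(-inf + x, -inf))
        return false;                                            /* 4 */
    if (!same_value(inf + -inf, nan)) return false;              /* 5 */
    if (!same_value(nzero + x, x)) return false;                 /* 6 */
    if (!x_is_nzero && !same_value(pzero + x, x)) return false;  /* 7 */
    return true;
}
```

    The exception in rule 7 reflects that +0.0 + -0.0 yields +0.0, which
    differs in sign from the -0.0 operand.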
2379
2380
2381    Section 3.11.5.2,  COS:  Cosine
2382
2383    The COS instruction approximates the cosine of the angle specified by the
2384    scalar operand and replicates the approximation to all four components of
2385    the result vector.  The angle is specified in radians and does not have to
2386    be in the range [0,2*PI].
2387
2388      tmp = ScalarLoad(op0);
2389      result.x = ApproxCosine(tmp);
2390      result.y = ApproxCosine(tmp);
2391      result.z = ApproxCosine(tmp);
2392      result.w = ApproxCosine(tmp);
2393
2394    The approximation function ApproxCosine is accurate to at least 22 bits
2395    with an angle in the range [0,2*PI].
2396
2397      | ApproxCosine(x) - cos(x) | < 1.0 / 2^22, if 0.0 <= x < 2.0 * PI.
2398
2399    The error in the approximation will typically increase with the absolute
2400    value of the angle when the angle falls outside the range [0,2*PI].
2401
2402    The following special-case rules apply to cosine approximation:
2403
2404      1. ApproxCosine(NaN) = NaN.
2405      2. ApproxCosine(+/-INF) = NaN.
2406      3. ApproxCosine(+/-0.0) = +1.0.
2407
2408
2409    Section 3.11.5.3,  DDX:  Derivative Relative to X
2410
2411    The DDX instruction computes approximate partial derivatives of the four
2412    components of the single operand with respect to the X window coordinate
2413    to yield a result vector.  The partial derivative is evaluated at the
2414    center of the pixel.
2415
2416      f = VectorLoad(op0);
2417      result = ComputePartialX(f);
2418
    Note that the partial derivatives obtained by this instruction are
    approximate, and derivative-of-derivative instruction sequences may not
2421    yield accurate second derivatives.
2422
2423    For components with partial derivatives that overflow (including +/-INF
2424    inputs), the resulting partials may be encoded as large floating-point
2425    numbers instead of +/-INF.
2426
2427
2428    Section 3.11.5.4,  DDY:  Derivative Relative to Y
2429
2430    The DDY instruction computes approximate partial derivatives of the four
2431    components of the single operand with respect to the Y window coordinate
2432    to yield a result vector.  The partial derivative is evaluated at the
2433    center of the pixel.
2434
2435      f = VectorLoad(op0);
2436      result = ComputePartialY(f);
2437
    Note that the partial derivatives obtained by this instruction are
    approximate, and derivative-of-derivative instruction sequences may not
    yield accurate second derivatives.
2441
2442    For components with partial derivatives that overflow (including +/-INF
2443    inputs), the resulting partials may be encoded as large floating-point
2444    numbers instead of +/-INF.
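
    The specification does not say how ComputePartialX and ComputePartialY
    are evaluated.  As a minimal sketch only, assume (hypothetically) that an
    implementation takes finite differences across a 2x2 block of pixels and
    shares the result within the block.  The helper names quad_ddx and
    quad_ddy are illustrative and not part of this extension.  Under that
    assumption, applying the derivative twice yields zero rather than the true
    second derivative, which is one way the caveat above can arise.

```python
def quad_ddx(quad):
    """quad[row][col] holds f at (x+col, y+row); returns d/dx per pixel."""
    return [[row[1] - row[0]] * 2 for row in quad]

def quad_ddy(quad):
    """Same layout as quad_ddx; returns d/dy per pixel."""
    top, bottom = quad
    diffs = [bottom[0] - top[0], bottom[1] - top[1]]
    return [list(diffs), list(diffs)]

# f(x, y) = x * x sampled at x = 10 and x = 11 on two adjacent rows.
quad = [[100.0, 121.0], [100.0, 121.0]]
first = quad_ddx(quad)      # every pixel sees 21.0
second = quad_ddx(first)    # every pixel sees 0.0, although d2f/dx2 = 2
```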
2445
2446
2447    Section 3.11.5.5,  DP3:  3-Component Dot Product
2448
    The DP3 instruction computes a three-component dot product of the two
    operands (using the x, y, and z components) and replicates the dot product
    to all four components of the result vector.

      tmp0 = VectorLoad(op0);
      tmp1 = VectorLoad(op1);
      result.x = (tmp0.x * tmp1.x) + (tmp0.y * tmp1.y) +
                 (tmp0.z * tmp1.z);
      result.y = (tmp0.x * tmp1.x) + (tmp0.y * tmp1.y) +
                 (tmp0.z * tmp1.z);
      result.z = (tmp0.x * tmp1.x) + (tmp0.y * tmp1.y) +
                 (tmp0.z * tmp1.z);
      result.w = (tmp0.x * tmp1.x) + (tmp0.y * tmp1.y) +
                 (tmp0.z * tmp1.z);
2463
2464
2465    Section 3.11.5.6,  DP4:  4-Component Dot Product
2466
    The DP4 instruction computes a four-component dot product of the two
    operands and replicates the dot product to all four components of the
    result vector.

      tmp0 = VectorLoad(op0);
      tmp1 = VectorLoad(op1);
      result.x = (tmp0.x * tmp1.x) + (tmp0.y * tmp1.y) +
                 (tmp0.z * tmp1.z) + (tmp0.w * tmp1.w);
      result.y = (tmp0.x * tmp1.x) + (tmp0.y * tmp1.y) +
                 (tmp0.z * tmp1.z) + (tmp0.w * tmp1.w);
      result.z = (tmp0.x * tmp1.x) + (tmp0.y * tmp1.y) +
                 (tmp0.z * tmp1.z) + (tmp0.w * tmp1.w);
      result.w = (tmp0.x * tmp1.x) + (tmp0.y * tmp1.y) +
                 (tmp0.z * tmp1.z) + (tmp0.w * tmp1.w);
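
    The DP3 and DP4 descriptions can be sketched in Python as follows.
    Operands are modeled as 4-tuples of floats; the function names dp3 and
    dp4 are illustrative and simply mirror the opcode names.

```python
def dp3(op0, op1):
    """Dot product of the x, y, z components, replicated to all four lanes."""
    d = op0[0] * op1[0] + op0[1] * op1[1] + op0[2] * op1[2]
    return (d, d, d, d)

def dp4(op0, op1):
    """Dot product of all four components, replicated to all four lanes."""
    d = (op0[0] * op1[0] + op0[1] * op1[1] +
         op0[2] * op1[2] + op0[3] * op1[3])
    return (d, d, d, d)
```

    Note that dp3 ignores the w components of both operands on input but
    still writes the result to w on output.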
2481
2482
2483    Section 3.11.5.7,  DST:  Distance Vector
2484
2485    The DST instruction computes a distance vector from two specially-
2486    formatted operands.  The first operand should be of the form [NA, d^2,
2487    d^2, NA] and the second operand should be of the form [NA, 1/d, NA, 1/d],
2488    where NA values are not relevant to the calculation and d is a vector
2489    length.  If both vectors satisfy these conditions, the result vector will
2490    be of the form [1.0, d, d^2, 1/d].
2491
2492    The exact behavior is specified in the following pseudo-code:
2493
2494      tmp0 = VectorLoad(op0);
2495      tmp1 = VectorLoad(op1);
2496      result.x = 1.0;
2497      result.y = tmp0.y * tmp1.y;
2498      result.z = tmp0.z;
2499      result.w = tmp1.w;
2500
    Given an arbitrary vector, d^2 can be obtained using the DP3 instruction
    (using the same vector for both operands) and 1/d can be obtained from d^2
    using the RSQ instruction.

    This distance vector is useful for per-fragment light attenuation
    calculations:  a DP3 operation involving the distance vector and an
    attenuation constants vector will yield the attenuation factor.
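
    As a hedged illustration of the sequence described above, the following
    Python sketch builds the two specially-formatted operands from an
    arbitrary vector and applies the DST pseudo-code.  The helper names
    distance_vector and attenuation_denominator, and the constants
    (k_c, k_l, k_q), are assumptions made for this example.  In the standard
    OpenGL lighting model the attenuation is the reciprocal of
    k_c + k_l*d + k_q*d^2, so under that reading a reciprocal of the dot
    product computed here would still be required.

```python
import math

def dst(op0, op1):
    """DST pseudo-code: op0 = [NA, d^2, d^2, NA], op1 = [NA, 1/d, NA, 1/d]."""
    return (1.0, op0[1] * op1[1], op0[2], op1[3])

def distance_vector(v):
    d2 = v[0] * v[0] + v[1] * v[1] + v[2] * v[2]   # DP3 of v with itself
    inv_d = 1.0 / math.sqrt(d2)                    # RSQ of d^2
    return dst((0.0, d2, d2, 0.0), (0.0, inv_d, 0.0, inv_d))

def attenuation_denominator(dist, k):
    """Three-component dot product of [1, d, d^2] with (k_c, k_l, k_q)."""
    return dist[0] * k[0] + dist[1] * k[1] + dist[2] * k[2]

dist = distance_vector((3.0, 4.0, 0.0))   # approximately (1.0, 5.0, 25.0, 0.2)
```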
2508
2509
2510    Section 3.11.5.8,  EX2:  Exponential Base 2
2511
2512    The EX2 instruction approximates 2 raised to the power of the scalar
2513    operand and replicates it to all four components of the result
2514    vector.
2515
2516      tmp = ScalarLoad(op0);
2517      result.x = Approx2ToX(tmp);
2518      result.y = Approx2ToX(tmp);
2519      result.z = Approx2ToX(tmp);
2520      result.w = Approx2ToX(tmp);
2521
2522    The approximation function is accurate to at least 22 bits:
2523
2524      | Approx2ToX(x) - 2^x | < 1.0 / 2^22, if 0.0 <= x < 1.0,
2525
2526    and, in general,
2527
2528      | Approx2ToX(x) - 2^x | < (1.0 / 2^22) * (2^floor(x)).
2529
2530    The following special-case rules apply to exponential approximation:
2531
2532      1. Approx2ToX(NaN) = NaN.
2533      2. Approx2ToX(-INF) = +0.0.
2534      3. Approx2ToX(+INF) = +INF.
2535      4. Approx2ToX(+/-0.0) = +1.0.
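
    The general error bound and the special cases can be written out as a
    small Python sketch.  Here ex2_reference is a hypothetical stand-in that
    uses Python's own pow rather than a hardware approximation, and
    ex2_error_bound evaluates the right-hand side of the general inequality
    above for a finite operand.

```python
import math

def ex2_reference(x):
    """Reference 2^x that follows the special-case rules listed for EX2."""
    if math.isnan(x):
        return math.nan
    try:
        return 2.0 ** x        # 2.0 ** -inf is 0.0, 2.0 ** inf is inf
    except OverflowError:      # Python raises for large finite exponents
        return math.inf

def ex2_error_bound(x):
    """(1.0 / 2^22) * 2^floor(x), for finite x."""
    return math.ldexp(1.0, math.floor(x) - 22)
```

    For example, ex2_error_bound(3.7) is 2^-19, eight times the bound that
    applies to operands in [0.0, 1.0).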
2536
2537
2538    Section 3.11.5.9,  FLR:  Floor
2539
2540    The FLR instruction performs a component-wise floor operation on the
2541    operand to generate a result vector.  The floor of a value is defined as
2542    the largest integer less than or equal to the value.  The floor of 2.3 is
2543    2.0; the floor of -3.6 is -4.0.
2544
2545      tmp = VectorLoad(op0);
2546      result.x = floor(tmp.x);
2547      result.y = floor(tmp.y);
2548      result.z = floor(tmp.z);
2549      result.w = floor(tmp.w);
2550
2551    The following special-case rules apply to floor computation:
2552
2553      1. floor(NaN) = NaN.
2554      2. floor(<x>) = <x>, for -0.0, +0.0, -INF, and +INF.  In all cases, the
2555         sign of the result is equal to the sign of the operand.
2556
2557
2558    Section 3.11.5.10,  FRC:  Fraction
2559
2560    The FRC instruction extracts the fractional portion of each component of
2561    the operand to generate a result vector.  The fractional portion of a
2562    component is defined as the result after subtracting off the floor of the
2563    component (see FLR), and is always in the range [0.00, 1.00).
2564
2565    For negative values, the fractional portion is NOT the number written to
2566    the right of the decimal point -- the fractional portion of -1.7 is not
2567    0.7 -- it is 0.3.  0.3 is produced by subtracting the floor of -1.7 (-2.0)
2568    from -1.7.
2569
2570      tmp = VectorLoad(op0);
2571      result.x = tmp.x - floor(tmp.x);
2572      result.y = tmp.y - floor(tmp.y);
2573      result.z = tmp.z - floor(tmp.z);
2574      result.w = tmp.w - floor(tmp.w);
2575
    The following special-case rules, which can be derived from the rules for
    FLR and ADD, apply to fraction computation:
2578
2579      1. fraction(NaN) = NaN.
2580      2. fraction(+/-INF) = NaN.
2581      3. fraction(+/-0.0) = +0.0.
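
    A minimal Python sketch of FLR and FRC for a single component follows.
    The names flr and frc are illustrative.  The special cases are handled
    explicitly because Python's math.floor raises an exception for NaN and
    infinite arguments instead of returning a float.

```python
import math

def flr(x):
    """Floor of one component, following the FLR special-case rules."""
    if math.isnan(x) or math.isinf(x) or x == 0.0:
        return x                 # NaN, +/-INF, and +/-0.0 pass through
    return float(math.floor(x))

def frc(x):
    """Fraction of one component: x - floor(x)."""
    if math.isnan(x) or math.isinf(x):
        return math.nan          # INF - INF is NaN
    return x - flr(x)
```

    With these definitions frc(-1.7) is approximately 0.3, matching the
    example for negative values given above.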
2582
2583
2584    Section 3.11.5.11,  KIL:  Conditionally Discard Fragment
2585
2586    The KIL instruction is unlike any other instruction in the instruction
2587    set.  This instruction evaluates components of a swizzled condition code
2588    using a test expression identical to that used to evaluate condition code
2589    write masks (Section 3.11.4.4).  If any condition code component evaluates
2590    to TRUE, the fragment is discarded.  Otherwise, the instruction has no
2591    effect.  The condition code components are specified, swizzled, and
2592    evaluated in the same manner as the condition code write mask.
2593
2594      if (TestCC(rc.c***) || TestCC(rc.*c**) ||
2595          TestCC(rc.**c*) || TestCC(rc.***c)) {
2596         // Discard the fragment.
2597      } else {
2598        // Do nothing.
2599      }
2600
2601    If the fragment is discarded, it is treated as though it were not produced
2602    by rasterization.  In particular, none of the per-fragment operations
2603    (such as stencil tests, blends, stencil, depth, or color buffer writes)
2604    are performed on the fragment.
2605
2606
2607    Section 3.11.5.12,  LG2:  Logarithm Base 2
2608
2609    The LG2 instruction approximates the base 2 logarithm of the scalar
2610    operand and replicates it to all four components of the result vector.
2611
2612      tmp = ScalarLoad(op0);
2613      result.x = ApproxLog2(tmp);
2614      result.y = ApproxLog2(tmp);
2615      result.z = ApproxLog2(tmp);
2616      result.w = ApproxLog2(tmp);
2617
2618    The approximation function is accurate to at least 22 bits:
2619
2620      | ApproxLog2(x) - log_2(x) | < 1.0 / 2^22.
2621
2622    Note that for large values of x, there are not enough bits in the
2623    floating-point storage format to represent a result that precisely.
2624
2625    The following special-case rules apply to logarithm approximation:
2626
2627      1. ApproxLog2(NaN) = NaN.
2628      2. ApproxLog2(+INF) = +INF.
2629      3. ApproxLog2(+/-0.0) = -INF.
2630      4. ApproxLog2(x) = NaN, -INF < x < -0.0.
2631      5. ApproxLog2(-INF) = NaN.
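
    The special-case rules can be collected into a short Python sketch.  The
    function lg2_reference is a hypothetical stand-in that relies on
    math.log2 for ordinary operands; it does not model the hardware
    approximation error.

```python
import math

def lg2_reference(x):
    """Reference log2 that follows the LG2 special-case rules."""
    if math.isnan(x):
        return math.nan
    if x == 0.0:               # matches both +0.0 and -0.0
        return -math.inf
    if x < 0.0:                # includes -INF
        return math.nan
    return math.log2(x)        # math.log2(inf) returns inf
```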
2632
2633
2634    Section 3.11.5.13,  LIT:  Compute Light Coefficients
2635
2636    The LIT instruction accelerates per-fragment lighting by computing
2637    lighting coefficients for ambient, diffuse, and specular light
2638    contributions.  The "x" component of the operand is assumed to hold a
2639    diffuse dot product (n dot VP_pli, as in the vertex lighting equations in
2640    Section 2.13.1).  The "y" component of the operand is assumed to hold a
2641    specular dot product (n dot h_i).  The "w" component of the operand is
2642    assumed to hold the specular exponent of the material (s_rm).
2643
2644    The "x" component of the result vector receives the value that should be
2645    multiplied by the ambient light/material product (always 1.0).  The "y"
2646    component of the result vector receives the value that should be
2647    multiplied by the diffuse light/material product (n dot VP_pli).  The "z"
2648    component of the result vector receives the value that should be
2649    multiplied by the specular light/material product (f_i * (n dot h_i) ^
2650    s_rm).  The "w" component of the result is the constant 1.0.
2651
2652    Negative diffuse and specular dot products are clamped to 0.0, as is done
2653    in the standard per-vertex lighting operations.  In addition, if the
2654    diffuse dot product is zero or negative, the specular coefficient is
2655    forced to zero.
2656
      tmp = VectorLoad(op0);
      if (tmp.x < 0) tmp.x = 0;
      if (tmp.y < 0) tmp.y = 0;
      result.x = 1.0;
      result.y = tmp.x;
      result.z = (tmp.x > 0) ? ApproxPower(tmp.y, tmp.w) : 0.0;
      result.w = 1.0;

    The exponentiation approximation used to compute result.z is identical to
    that used in the POW instruction, including errors and the processing of
    any special cases.
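
    The LIT computation can be sketched in Python as shown below.  The
    helpers lg2, ex2, and approx_power are hypothetical reference versions
    built from exact library functions, composed as the POW section
    describes (a base 2 exponential of the exponent times a base 2
    logarithm); they carry none of the hardware approximation error.

```python
import math

def lg2(x):
    if math.isnan(x) or x < 0.0:
        return math.nan
    return -math.inf if x == 0.0 else math.log2(x)

def ex2(x):
    if math.isnan(x):
        return math.nan
    try:
        return 2.0 ** x
    except OverflowError:      # large finite exponent
        return math.inf

def approx_power(a, b):
    return ex2(b * lg2(a))

def lit(op0):
    """op0 = (diffuse dot, specular dot, unused, specular exponent)."""
    x, y, _, w = op0
    if x < 0.0:
        x = 0.0
    if y < 0.0:
        y = 0.0
    z = approx_power(y, w) if x > 0.0 else 0.0
    return (1.0, x, z, 1.0)
```

    Because the power function is inherited from POW, a zero specular dot
    product combined with a zero exponent yields NaN in this sketch, as it
    would for POW.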
2668
2669
2670    Section 3.11.5.14,  LRP:  Linear Interpolation
2671
2672    The LRP instruction performs a component-wise linear interpolation to
2673    yield a result vector.  It interpolates between the components of the
2674    second and third operands, using the first operand as a weight.
2675
2676      tmp0 = VectorLoad(op0);
2677      tmp1 = VectorLoad(op1);
2678      tmp2 = VectorLoad(op2);
2679      result.x = tmp0.x * tmp1.x + (1 - tmp0.x) * tmp2.x;
2680      result.y = tmp0.y * tmp1.y + (1 - tmp0.y) * tmp2.y;
2681      result.z = tmp0.z * tmp1.z + (1 - tmp0.z) * tmp2.z;
2682      result.w = tmp0.w * tmp1.w + (1 - tmp0.w) * tmp2.w;
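
    The operand order is easy to misread, so a small Python sketch may help.
    The function name lrp mirrors the opcode; operands are 4-tuples.  A
    weight of 1.0 selects the second operand and a weight of 0.0 selects the
    third.

```python
def lrp(weight, a, b):
    """Component-wise weight * a + (1 - weight) * b."""
    return tuple(w * x + (1.0 - w) * y for w, x, y in zip(weight, a, b))

blend = lrp((0.0, 1.0, 0.5, 0.25),
            (10.0, 10.0, 10.0, 10.0),
            (20.0, 20.0, 20.0, 20.0))
```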
2683
2684
2685    Section 3.11.5.15,  MAD:  Multiply and Add
2686
2687    The MAD instruction performs a component-wise multiply of the first two
2688    operands, and then does a component-wise add of the product to the third
2689    operand to yield a result vector.
2690
2691      tmp0 = VectorLoad(op0);
2692      tmp1 = VectorLoad(op1);
2693      tmp2 = VectorLoad(op2);
2694      result.x = tmp0.x * tmp1.x + tmp2.x;
2695      result.y = tmp0.y * tmp1.y + tmp2.y;
2696      result.z = tmp0.z * tmp1.z + tmp2.z;
2697      result.w = tmp0.w * tmp1.w + tmp2.w;
2698
2699
    Section 3.11.5.16,  MAX:  Maximum
2701
2702    The MAX instruction computes component-wise maximums of the values in the
2703    two operands to yield a result vector.
2704
2705      tmp0 = VectorLoad(op0);
2706      tmp1 = VectorLoad(op1);
2707      result.x = max(tmp0.x, tmp1.x);
2708      result.y = max(tmp0.y, tmp1.y);
2709      result.z = max(tmp0.z, tmp1.z);
2710      result.w = max(tmp0.w, tmp1.w);
2711
2712    The following special cases apply to the maximum operation:
2713
2714      1. max(A,B) is always equivalent to max(B,A).
2715      2. max(NaN, <x>) == NaN, for all <x>.
2716
2717
2718
    Section 3.11.5.17,  MIN:  Minimum
2720
2721    The MIN instruction computes component-wise minimums of the values in the
2722    two operands to yield a result vector.
2723
2724      tmp0 = VectorLoad(op0);
2725      tmp1 = VectorLoad(op1);
2726      result.x = min(tmp0.x, tmp1.x);
2727      result.y = min(tmp0.y, tmp1.y);
2728      result.z = min(tmp0.z, tmp1.z);
2729      result.w = min(tmp0.w, tmp1.w);
2730
2731    The following special cases apply to the minimum operation:
2732
2733      1. min(A,B) is always equivalent to min(B,A).
2734      2. min(NaN, <x>) == NaN, for all <x>.
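
    The NaN rules for MAX and MIN differ from what some host languages
    provide.  For instance, Python's built-in max(1.0, nan) returns 1.0 while
    max(nan, 1.0) returns NaN, so it is neither commutative nor
    NaN-propagating.  The sketch below shows one way to model the rules
    stated here; the names nan_max, nan_min, and fp_max are illustrative.

```python
import math

def nan_max(a, b):
    """Maximum of two components; NaN in either operand gives NaN."""
    if math.isnan(a) or math.isnan(b):
        return math.nan
    return a if a > b else b

def nan_min(a, b):
    """Minimum of two components; NaN in either operand gives NaN."""
    if math.isnan(a) or math.isnan(b):
        return math.nan
    return a if a < b else b

def fp_max(op0, op1):
    """Component-wise MAX over two 4-tuples."""
    return tuple(nan_max(a, b) for a, b in zip(op0, op1))
```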
2735
2736
2737    Section 3.11.5.18,  MOV:  Move
2738
2739    The MOV instruction copies the value of the operand to yield a result
2740    vector.
2741
2742      result = VectorLoad(op0);
2743
2744
2745    Section 3.11.5.19,  MUL:  Multiply
2746
2747    The MUL instruction performs a component-wise multiply of the two operands
2748    to yield a result vector.
2749
2750      tmp0 = VectorLoad(op0);
2751      tmp1 = VectorLoad(op1);
2752      result.x = tmp0.x * tmp1.x;
2753      result.y = tmp0.y * tmp1.y;
2754      result.z = tmp0.z * tmp1.z;
2755      result.w = tmp0.w * tmp1.w;
2756
2757    The following special-case rules apply to multiplication:
2758
2759      1. "A*B" is always equivalent to "B*A".
2760      2. NaN * <x> = NaN, for all <x>.
2761      3. +/-0.0 * +/-INF = NaN.
2762      4. +/-0.0 * <x> = +/-0.0, for all <x> except -INF, +INF, and NaN.  The
2763         sign of the result is positive if the signs of the two operands match
2764         and negative otherwise.
2765      5. +/-INF * <x> = +/-INF, for all <x> except -0.0, +0.0, and NaN.  The
2766         sign of the result is positive if the signs of the two operands match
2767         and negative otherwise.
2768      6. +1.0 * <x> = <x>, for all <x>.
2769
2770
2771    Section 3.11.5.20,  PK2H:  Pack Two 16-bit Floats
2772
2773    The PK2H instruction converts the "x" and "y" components of the single
2774    operand into 16-bit floating-point format, packs the bit representation of
2775    these two floats into a 32-bit value, and replicates that value to all
2776    four components of the result vector.  The PK2H instruction can be
2777    reversed by the UP2H instruction below.
2778
2779      tmp0 = VectorLoad(op0);
2780      /* result obtained by combining raw bits of tmp0.x, tmp0.y */
2781      result.x = RawBits(tmp0.x) | (RawBits(tmp0.y) << 16);
2782      result.y = RawBits(tmp0.x) | (RawBits(tmp0.y) << 16);
2783      result.z = RawBits(tmp0.x) | (RawBits(tmp0.y) << 16);
2784      result.w = RawBits(tmp0.x) | (RawBits(tmp0.y) << 16);
2785
2786    The result must be written to a register with 32-bit components (an "R"
2787    register, o[COLR], or o[DEPR]).  A fragment program will fail to load if
2788    any other register type is specified.
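
    The bit layout can be illustrated with Python's struct module, whose "e"
    format code converts to and from IEEE half precision.  This is a sketch
    under the assumption that the 16-bit format used by PK2H matches that
    layout; the function names pk2h and up2h are illustrative, and the
    packed value is returned as an integer rather than as the raw bits of a
    float register.

```python
import struct

def pk2h(x, y):
    """Pack x into the low 16 bits and y into the high 16 bits."""
    lo, = struct.unpack('<H', struct.pack('<e', x))
    hi, = struct.unpack('<H', struct.pack('<e', y))
    return lo | (hi << 16)

def up2h(bits):
    """Inverse of pk2h: recover the two half-precision values."""
    lo, = struct.unpack('<e', struct.pack('<H', bits & 0xFFFF))
    hi, = struct.unpack('<e', struct.pack('<H', (bits >> 16) & 0xFFFF))
    return (lo, hi)
```

    Values that are not exactly representable in 16 bits lose precision in
    the round trip, and struct.pack raises OverflowError for magnitudes too
    large for the half-precision format.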
2789
2790
2791    Section 3.11.5.21,  PK2US:  Pack Two Unsigned 16-bit Scalars
2792
2793    The PK2US instruction converts the "x" and "y" components of the single
2794    operand into a packed pair of 16-bit unsigned scalars.  The scalars are
    represented in a bit pattern where all '0' bits correspond to 0.0 and all
    '1' bits correspond to 1.0.  The bit representations of the two converted
2797    components are packed into a 32-bit value, and that value is replicated to
2798    all four components of the result vector.  The PK2US instruction can be
2799    reversed by the UP2US instruction below.
2800
2801      tmp0 = VectorLoad(op0);
2802      if (tmp0.x < 0.0) tmp0.x = 0.0;
2803      if (tmp0.x > 1.0) tmp0.x = 1.0;
2804      if (tmp0.y < 0.0) tmp0.y = 0.0;
2805      if (tmp0.y > 1.0) tmp0.y = 1.0;
2806      us.x = round(65535.0 * tmp0.x);  /* us is a ushort vector */
2807      us.y = round(65535.0 * tmp0.y);
2808      /* result obtained by combining raw bits of us. */
2809      result.x = ((us.x) | (us.y << 16));
2810      result.y = ((us.x) | (us.y << 16));
2811      result.z = ((us.x) | (us.y << 16));
2812      result.w = ((us.x) | (us.y << 16));
2813
2814    The result must be written to a register with 32-bit components (an "R"
2815    register, o[COLR], or o[DEPR]).  A fragment program will fail to load if
2816    any other register type is specified.
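
    A minimal Python sketch of the PK2US pseudo-code follows.  The
    specification only says "round"; this example assumes rounding half up,
    which is an assumption of the sketch and not a statement about any
    implementation.  The packed value is returned as an integer.

```python
import math

def pk2us(x, y):
    """Clamp to [0, 1], scale to 16 bits, and pack x low, y high."""
    def to_ushort(v):
        v = min(max(v, 0.0), 1.0)
        return int(math.floor(65535.0 * v + 0.5))   # assumed rounding mode
    return to_ushort(x) | (to_ushort(y) << 16)
```

    NaN operands are not covered by the pseudo-code; in this sketch they
    raise ValueError inside math.floor.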
2817
2818
2819    Section 3.11.5.22,  PK4B:  Pack Four Signed 8-bit Scalars
2820
2821    The PK4B instruction converts the four components of the single operand
2822    into 8-bit signed quantities.  The signed quantities are represented in a
    bit pattern where all '0' bits correspond to -128/127 and all '1' bits
    correspond to +127/127.  The bit representations of the four converted
2825    components are packed into a 32-bit value, and that value is replicated to
2826    all four components of the result vector.  The PK4B instruction can be
2827    reversed by the UP4B instruction below.
2828
2829      tmp0 = VectorLoad(op0);
2830      if (tmp0.x < -128/127) tmp0.x = -128/127;
2831      if (tmp0.y < -128/127) tmp0.y = -128/127;
2832      if (tmp0.z < -128/127) tmp0.z = -128/127;
2833      if (tmp0.w < -128/127) tmp0.w = -128/127;
2834      if (tmp0.x > +127/127) tmp0.x = +127/127;
2835      if (tmp0.y > +127/127) tmp0.y = +127/127;
2836      if (tmp0.z > +127/127) tmp0.z = +127/127;
2837      if (tmp0.w > +127/127) tmp0.w = +127/127;
2838      ub.x = round(127.0 * tmp0.x + 128.0);  /* ub is a ubyte vector */
2839      ub.y = round(127.0 * tmp0.y + 128.0);
2840      ub.z = round(127.0 * tmp0.z + 128.0);
2841      ub.w = round(127.0 * tmp0.w + 128.0);
2842      /* result obtained by combining raw bits of ub. */
2843      result.x = ((ub.x) | (ub.y << 8) | (ub.z << 16) | (ub.w << 24));
2844      result.y = ((ub.x) | (ub.y << 8) | (ub.z << 16) | (ub.w << 24));
2845      result.z = ((ub.x) | (ub.y << 8) | (ub.z << 16) | (ub.w << 24));
2846      result.w = ((ub.x) | (ub.y << 8) | (ub.z << 16) | (ub.w << 24));
2847
2848    The result must be written to a register with 32-bit components (an "R"
2849    register, o[COLR], or o[DEPR]).  A fragment program will fail to load if
2850    any other register type is specified.
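
    The biased encoding used by PK4B can be sketched in Python as follows.
    As with the other packing examples, rounding half up is an assumption of
    this sketch, the name pk4b is illustrative, and the packed value is
    returned as an integer.

```python
import math

def pk4b(op0):
    """Clamp to [-128/127, 1], bias by 128, and pack x into the low byte."""
    low, high = -128.0 / 127.0, 1.0
    bits = 0
    for i, c in enumerate(op0):
        c = min(max(c, low), high)
        ub = int(math.floor(127.0 * c + 128.0 + 0.5))   # assumed rounding
        bits |= ub << (8 * i)
    return bits
```

    In this encoding 0.0 maps to the byte value 128 and -1.0 maps to 1, so
    the all-zero byte is reserved for -128/127.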
2851
2852
2853    Section 3.11.5.23,  PK4UB:  Pack Four Unsigned 8-bit Scalars
2854
2855    The PK4UB instruction converts the four components of the single operand
2856    into a packed grouping of 8-bit unsigned scalars.  The scalars are
    represented in a bit pattern where all '0' bits correspond to 0.0 and all
    '1' bits correspond to 1.0.  The bit representations of the four
2859    converted components are packed into a 32-bit value, and that value is
2860    replicated to all four components of the result vector.  The PK4UB
2861    instruction can be reversed by the UP4UB instruction below.
2862
2863      tmp0 = VectorLoad(op0);
2864      if (tmp0.x < 0.0) tmp0.x = 0.0;
2865      if (tmp0.x > 1.0) tmp0.x = 1.0;
2866      if (tmp0.y < 0.0) tmp0.y = 0.0;
2867      if (tmp0.y > 1.0) tmp0.y = 1.0;
2868      if (tmp0.z < 0.0) tmp0.z = 0.0;
2869      if (tmp0.z > 1.0) tmp0.z = 1.0;
2870      if (tmp0.w < 0.0) tmp0.w = 0.0;
2871      if (tmp0.w > 1.0) tmp0.w = 1.0;
2872      ub.x = round(255.0 * tmp0.x);  /* ub is a ubyte vector */
2873      ub.y = round(255.0 * tmp0.y);
2874      ub.z = round(255.0 * tmp0.z);
2875      ub.w = round(255.0 * tmp0.w);
2876      /* result obtained by combining raw bits of ub. */
2877      result.x = ((ub.x) | (ub.y << 8) | (ub.z << 16) | (ub.w << 24));
2878      result.y = ((ub.x) | (ub.y << 8) | (ub.z << 16) | (ub.w << 24));
2879      result.z = ((ub.x) | (ub.y << 8) | (ub.z << 16) | (ub.w << 24));
2880      result.w = ((ub.x) | (ub.y << 8) | (ub.z << 16) | (ub.w << 24));
2881
2882    The result must be written to a register with 32-bit components (an "R"
2883    register, o[COLR], or o[DEPR]).  A fragment program will fail to load if
2884    any other register type is specified.
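
    The PK4UB pseudo-code can be sketched in Python in the same manner.  The
    name pk4ub is illustrative, rounding half up is assumed because the
    rounding mode is not stated, and the packed value is returned as an
    integer.

```python
import math

def pk4ub(op0):
    """Clamp to [0, 1], scale to 8 bits, and pack x into the low byte."""
    bits = 0
    for i, c in enumerate(op0):
        c = min(max(c, 0.0), 1.0)
        ub = int(math.floor(255.0 * c + 0.5))   # assumed rounding mode
        bits |= ub << (8 * i)
    return bits
```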
2885
2886
2887    Section 3.11.5.24,  POW:  Exponentiation
2888
2889    The POW instruction approximates the value of the first scalar operand
2890    raised to the power of the second scalar operand and replicates it to all
2891    four components of the result vector.
2892
2893      tmp0 = ScalarLoad(op0);
2894      tmp1 = ScalarLoad(op1);
2895      result.x = ApproxPower(tmp0, tmp1);
2896      result.y = ApproxPower(tmp0, tmp1);
2897      result.z = ApproxPower(tmp0, tmp1);
2898      result.w = ApproxPower(tmp0, tmp1);
2899
2900    The exponentiation approximation function is defined in terms of the base
2901    2 exponentiation and logarithm approximation operations in the EX2 and LG2
2902    instructions, including errors and the processing of any special cases.
2903    In particular,
2904
      ApproxPower(a,b) = Approx2ToX(b * ApproxLog2(a)).
2906
2907    The following special-case rules, which can be derived from the rules in
2908    the LG2, MUL, and EX2 instructions, apply to exponentiation:
2909
      1. ApproxPower(<x>, <y>) = NaN, if x < -0.0.
      2. ApproxPower(<x>, <y>) = NaN, if x or y is NaN.
      3. ApproxPower(+/-0.0, +/-0.0) = NaN.
      4. ApproxPower(+INF, +/-0.0) = NaN.
      5. ApproxPower(+1.0, +/-INF) = NaN.
      6. ApproxPower(+/-0.0, <x>) = +0.0, if x > +0.0.
      7. ApproxPower(+/-0.0, <x>) = +INF, if x < -0.0.
      8. ApproxPower(+1.0, <x>) = +1.0, if -INF < x < +INF.
      9. ApproxPower(+INF, <x>) = +INF, if x > +0.0.
      10. ApproxPower(+INF, <x>) = +0.0, if x < -0.0.
      11. ApproxPower(<x>, +/-0.0) = +1.0, if +0.0 < x < +INF.
      12. ApproxPower(<x>, +1.0) ~= <x>, if x >= +0.0.
      13. ApproxPower(<x>, +INF) = +0.0, if -0.0 <= x < +1.0,
                                   +INF, if x > +1.0.
      14. ApproxPower(<x>, -INF) = +INF, if -0.0 <= x < +1.0,
                                   +0.0, if x > +1.0.
2926
    Note that 0^0 is defined here as NaN, since ApproxLog2(0) = -INF, and
    0*(-INF) = NaN.  In many other applications, including the standard C
    pow() function, 0^0 is defined as 1.0.  This behavior can be emulated
    using additional instructions in much the same way that the pow()
    function is implemented on many CPUs.

    Note that a logarithm is involved even if the exponent is an integer.
    This means that any exponentiation with a negative base will produce NaN.
    In contrast, it is possible in a "normal" mathematical formulation to
    raise negative numbers to integral powers (e.g., (-3)^2 == 9, and
    (-0.5)^-2 == 4).
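
    The way the special cases fall out of the logarithm, multiply, and
    exponential steps can be checked with a short Python sketch.  The
    helpers lg2, ex2, and approx_power are hypothetical reference versions
    built from exact library functions; they reproduce the special-case
    behavior but not the hardware approximation error.

```python
import math

def lg2(x):
    """Reference log2 with the LG2 special cases."""
    if math.isnan(x) or x < 0.0:
        return math.nan
    return -math.inf if x == 0.0 else math.log2(x)

def ex2(x):
    """Reference 2^x with the EX2 special cases."""
    if math.isnan(x):
        return math.nan
    try:
        return 2.0 ** x
    except OverflowError:      # large finite exponent
        return math.inf

def approx_power(a, b):
    """2 raised to (b * log2(a)), composed as described for POW."""
    return ex2(b * lg2(a))
```

    For example, approx_power(0.0, 0.0) is NaN because the intermediate
    product 0.0 * -INF is NaN, and approx_power(-3.0, 2.0) is NaN because
    the logarithm of a negative base is NaN.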
2938
2939
2940    Section 3.11.5.25,  RCP:  Reciprocal
2941
2942    The RCP instruction approximates the reciprocal of the scalar operand and
2943    replicates it to all four components of the result vector.
2944
2945      tmp = ScalarLoad(op0);
2946      result.x = ApproxReciprocal(tmp);
2947      result.y = ApproxReciprocal(tmp);
2948      result.z = ApproxReciprocal(tmp);
2949      result.w = ApproxReciprocal(tmp);
2950
2951    The approximation function is accurate to at least 22 bits:
2952
2953      | ApproxReciprocal(x) - (1/x) | < 1.0 / 2^22, if 1.0 <= x < 2.0.
2954
2955    The following special-case rules apply to reciprocation:
2956
2957      1. ApproxReciprocal(NaN) = NaN.
2958      2. ApproxReciprocal(+INF) = +0.0.
2959      3. ApproxReciprocal(-INF) = -0.0.
2960      4. ApproxReciprocal(+0.0) = +INF.
2961      5. ApproxReciprocal(-0.0) = -INF.
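
    A Python sketch of the special cases follows.  The function
    rcp_reference is a hypothetical stand-in that uses exact division.
    Zero operands need explicit handling because Python raises
    ZeroDivisionError for 1.0 / 0.0 instead of returning an infinity.

```python
import math

def rcp_reference(x):
    """Reference reciprocal that follows the RCP special-case rules."""
    if math.isnan(x):
        return math.nan
    if x == 0.0:
        return math.copysign(math.inf, x)   # the sign of zero is preserved
    return 1.0 / x                          # 1.0 / +/-INF gives +/-0.0
```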
2962
2963
2964    Section 3.11.5.26,  RFL:  Reflection Vector
2965
2966    The RFL instruction computes the reflection of the second vector operand
2967    (the "direction" vector) about the vector specified by the first vector
2968    operand (the "axis" vector).  Both operands are treated as 3D vectors (the
2969    w components are ignored).  The result vector is another 3D vector (the
2970    "reflected direction" vector).  The length of the result vector, ignoring
2971    rounding errors, should equal that of the second operand.
2972
2973      axis = VectorLoad(op0);
2974      direction = VectorLoad(op1);
2975      tmp.w = (axis.x * axis.x + axis.y * axis.y +
2976               axis.z * axis.z);
2977      tmp.x = (axis.x * direction.x + axis.y * direction.y +
2978               axis.z * direction.z);
2979      tmp.x = 2.0 * tmp.x;
2980      tmp.x = tmp.x / tmp.w;
2981      result.x = tmp.x * axis.x - direction.x;
2982      result.y = tmp.x * axis.y - direction.y;
2983      result.z = tmp.x * axis.z - direction.z;
2984
2985    A fragment program will fail to load if the w component of the result is
2986    enabled in the component write mask (see the <optionalWriteMask> rule in
2987    the grammar).
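
    The RFL pseudo-code translates directly into the Python sketch below.
    The function name rfl is illustrative, and only the x, y, and z
    components are read and returned.  Note that the axis does not need to
    be normalized, because the computation divides by its squared length.

```python
def rfl(axis, direction):
    """Reflect direction about axis; both are treated as 3D vectors."""
    a, d = axis[:3], direction[:3]
    axis_len_sq = sum(c * c for c in a)
    scale = 2.0 * sum(p * q for p, q in zip(a, d)) / axis_len_sq
    return tuple(scale * p - q for p, q in zip(a, d))

reflected = rfl((0.0, 0.0, 2.0, 0.0), (1.0, 2.0, 3.0, 0.0))
```

    Here the component along the axis is kept and the perpendicular
    components are negated, so the squared length (14.0) is unchanged.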
2988
2989
2990    Section 3.11.5.27,  RSQ:  Reciprocal Square Root
2991
2992    The RSQ instruction approximates the reciprocal of the square root of the
2993    scalar operand and replicates it to all four components of the result
2994    vector.
2995
2996      tmp = ScalarLoad(op0);
2997      result.x = ApproxRSQRT(tmp);
2998      result.y = ApproxRSQRT(tmp);
2999      result.z = ApproxRSQRT(tmp);
3000      result.w = ApproxRSQRT(tmp);
3001
3002    The approximation function is accurate to at least 22 bits:
3003
      | ApproxRSQRT(x) - (1/sqrt(x)) | < 1.0 / 2^22, if 1.0 <= x < 4.0.
3005
3006    The following special-case rules apply to reciprocal square roots:
3007
3008      1. ApproxRSQRT(NaN) = NaN.
3009      2. ApproxRSQRT(+INF) = +0.0.
3010      3. ApproxRSQRT(-INF) = NaN.
3011      4. ApproxRSQRT(+0.0) = +INF.
3012      5. ApproxRSQRT(-0.0) = -INF.
3013      6. ApproxRSQRT(x) = NaN, if -INF < x < -0.0.
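
    The special cases can be gathered into a Python sketch.  The function
    rsq_reference is a hypothetical stand-in that uses math.sqrt and exact
    division, so it does not model the approximation error.

```python
import math

def rsq_reference(x):
    """Reference 1/sqrt(x) that follows the RSQ special-case rules."""
    if math.isnan(x):
        return math.nan
    if x == 0.0:
        return math.copysign(math.inf, x)   # +0.0 gives +INF, -0.0 gives -INF
    if x < 0.0:                             # includes -INF
        return math.nan
    return 1.0 / math.sqrt(x)               # +INF gives +0.0
```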
3014
3015
3016    Section 3.11.5.28,  SEQ:  Set on Equal To
3017
3018    The SEQ instruction performs a component-wise comparison of the two
3019    operands.  Each component of the result vector is 1.0 if the corresponding
3020    component of the first operand is equal to that of the second, and 0.0
3021    otherwise.
3022
3023      tmp0 = VectorLoad(op0);
3024      tmp1 = VectorLoad(op1);
3025      result.x = (tmp0.x == tmp1.x) ? 1.0 : 0.0;
3026      result.y = (tmp0.y == tmp1.y) ? 1.0 : 0.0;
3027      result.z = (tmp0.z == tmp1.z) ? 1.0 : 0.0;
3028      result.w = (tmp0.w == tmp1.w) ? 1.0 : 0.0;
3029
3030    The following special-case rules apply to SEQ:
3031
      1. (<x> == <y>) and (<y> == <x>) always produce the same result.
      2. (NaN == <x>) is FALSE for all <x>, including NaN.
      3. (+INF == +INF) and (-INF == -INF) are TRUE.
      4. (-0.0 == +0.0) and (+0.0 == -0.0) are TRUE.
3036
3037
3038    Section 3.11.5.29,  SFL:  Set on False
3039
3040    The SFL instruction is a degenerate case of the other "Set on"
3041    instructions that sets all components of the result vector to
3042    0.0.
3043
3044      result.x = 0.0;
3045      result.y = 0.0;
3046      result.z = 0.0;
3047      result.w = 0.0;
3048
3049
3050    Section 3.11.5.30,  SGE:  Set on Greater Than or Equal
3051
3052    The SGE instruction performs a component-wise comparison of the two
3053    operands.  Each component of the result vector is 1.0 if the corresponding
    component of the first operand is greater than or equal to that of the
3055    second, and 0.0 otherwise.
3056
3057      tmp0 = VectorLoad(op0);
3058      tmp1 = VectorLoad(op1);
3059      result.x = (tmp0.x >= tmp1.x) ? 1.0 : 0.0;
3060      result.y = (tmp0.y >= tmp1.y) ? 1.0 : 0.0;
3061      result.z = (tmp0.z >= tmp1.z) ? 1.0 : 0.0;
3062      result.w = (tmp0.w >= tmp1.w) ? 1.0 : 0.0;
3063
3064    The following special-case rules apply to SGE:
3065
3066      1. (NaN >= <x>) and (<x> >= NaN) are FALSE for all <x>.
3067      2. (+INF >= +INF) and (-INF >= -INF) are TRUE.
3068      3. (-0.0 >= +0.0) and (+0.0 >= -0.0) are TRUE.
3069
3070
3071    Section 3.11.5.31,  SGT:  Set on Greater Than
3072
3073    The SGT instruction performs a component-wise comparison of the two
3074    operands.  Each component of the result vector is 1.0 if the corresponding
    component of the first operand is greater than that of the second, and
3076    0.0 otherwise.
3077
3078      tmp0 = VectorLoad(op0);
3079      tmp1 = VectorLoad(op1);
3080      result.x = (tmp0.x > tmp1.x) ? 1.0 : 0.0;
3081      result.y = (tmp0.y > tmp1.y) ? 1.0 : 0.0;
3082      result.z = (tmp0.z > tmp1.z) ? 1.0 : 0.0;
3083      result.w = (tmp0.w > tmp1.w) ? 1.0 : 0.0;
3084
3085    The following special-case rules apply to SGT:
3086
3087      1. (NaN > <x>) and (<x> > NaN) are FALSE for all <x>.
3088      2. (-0.0 > +0.0) and (+0.0 > -0.0) are FALSE.
3089
3090
3091    Section 3.11.5.32,  SIN:  Sine
3092
3093    The SIN instruction approximates the sine of the angle specified by the
3094    scalar operand and replicates it to all four components of the result
3095    vector.  The angle is specified in radians and does not have to be in the
3096    range [0,2*PI].
3097
3098      tmp = ScalarLoad(op0);
3099      result.x = ApproxSine(tmp);
3100      result.y = ApproxSine(tmp);
3101      result.z = ApproxSine(tmp);
3102      result.w = ApproxSine(tmp);
3103
3104    The approximation function is accurate to at least 22 bits with an angle
3105    in the range [0,2*PI].
3106
3107      | ApproxSine(x) - sin(x) | < 1.0 / 2^22, if 0.0 <= x < 2.0 * PI.
3108
3109    The error in the approximation will typically increase with the absolute
3110    value of the angle when the angle falls outside the range [0,2*PI].
3111
    The following special-case rules apply to sine approximation:
3113
3114      1. ApproxSine(NaN) = NaN.
3115      2. ApproxSine(+/-INF) = NaN.
3116      3. ApproxSine(+/-0.0) = +/-0.0.  The sign of the result is equal to the
3117         sign of the single operand.
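
    Because the error may grow outside [0,2*PI], a program that needs the
    stated accuracy for large angles might first reduce the angle into that
    range.  The Python sketch below shows one possible reduction; the name
    reduce_angle is illustrative, and the mapping to a MUL, FRC, MUL
    instruction sequence mentioned in the comment is an assumption of this
    example, not something this extension prescribes.

```python
import math

TWO_PI = 2.0 * math.pi

def reduce_angle(x):
    """Map a finite angle to an equivalent one in (or very near) [0, 2*PI]."""
    # Equivalent in spirit to: MUL by 1/(2*PI), FRC, MUL by 2*PI.
    return x - TWO_PI * math.floor(x / TWO_PI)
```

    The sine of the reduced angle agrees with the sine of the original angle
    up to the rounding error introduced by the reduction itself.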
3118
3119
3120    Section 3.11.5.33,  SLE:  Set on Less Than or Equal
3121
3122    The SLE instruction performs a component-wise comparison of the two
3123    operands.  Each component of the result vector is 1.0 if the corresponding
3124    component of the first operand is less than or equal to that of the
3125    second, and 0.0 otherwise.
3126
3127      tmp0 = VectorLoad(op0);
3128      tmp1 = VectorLoad(op1);
3129      result.x = (tmp0.x <= tmp1.x) ? 1.0 : 0.0;
3130      result.y = (tmp0.y <= tmp1.y) ? 1.0 : 0.0;
3131      result.z = (tmp0.z <= tmp1.z) ? 1.0 : 0.0;
3132      result.w = (tmp0.w <= tmp1.w) ? 1.0 : 0.0;
3133
3134    The following special-case rules apply to SLE:
3135
3136      1. (NaN <= <x>) and (<x> <= NaN) are FALSE for all <x>.
3137      2. (+INF <= +INF) and (-INF <= -INF) are TRUE.
3138      3. (-0.0 <= +0.0) and (+0.0 <= -0.0) are TRUE.
3139
3140
3141    Section 3.11.5.34,  SLT:  Set on Less Than
3142
3143    The SLT instruction performs a component-wise comparison of the two
3144    operands.  Each component of the result vector is 1.0 if the corresponding
3145    component of the first operand is less than that of the second, and 0.0
3146    otherwise.
3147
3148      tmp0 = VectorLoad(op0);
3149      tmp1 = VectorLoad(op1);
3150      result.x = (tmp0.x < tmp1.x) ? 1.0 : 0.0;
3151      result.y = (tmp0.y < tmp1.y) ? 1.0 : 0.0;
3152      result.z = (tmp0.z < tmp1.z) ? 1.0 : 0.0;
3153      result.w = (tmp0.w < tmp1.w) ? 1.0 : 0.0;
3154
3155    The following special-case rules apply to SLT:
3156
3157      1. (NaN < <x>) and (<x> < NaN) are FALSE for all <x>.
3158      2. (-0.0 < +0.0) and (+0.0 < -0.0) are FALSE.
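
    The special-case rules for SEQ, SGE, SGT, SLE, and SLT match the
    comparison behavior of IEEE 754 floating point, which Python's float
    operators also follow.  The sketch below uses that as a model; set_on
    and the per-opcode wrappers are illustrative names, and operands are
    4-tuples.

```python
import math
import operator

def set_on(compare, op0, op1):
    """Component-wise comparison producing 1.0 for TRUE and 0.0 for FALSE."""
    return tuple(1.0 if compare(a, b) else 0.0 for a, b in zip(op0, op1))

def seq(op0, op1): return set_on(operator.eq, op0, op1)
def sge(op0, op1): return set_on(operator.ge, op0, op1)
def sgt(op0, op1): return set_on(operator.gt, op0, op1)
def sle(op0, op1): return set_on(operator.le, op0, op1)
def slt(op0, op1): return set_on(operator.lt, op0, op1)

a = (math.nan, math.inf, -0.0, 1.0)
b = (math.nan, math.inf, +0.0, 2.0)
```

    With these operands, every comparison involving NaN in the x component
    produces 0.0, including seq.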
3159
3160
3161    Section 3.11.5.35,  SNE:  Set on Not Equal
3162
3163    The SNE instruction performs a component-wise comparison of the two
3164    operands.  Each component of the result vector is 1.0 if the corresponding
3165    component of the first operand is not equal to that of the second, and 0.0
3166    otherwise.
3167
3168      tmp0 = VectorLoad(op0);
3169      tmp1 = VectorLoad(op1);
3170      result.x = (tmp0.x != tmp1.x) ? 1.0 : 0.0;
3171      result.y = (tmp0.y != tmp1.y) ? 1.0 : 0.0;
3172      result.z = (tmp0.z != tmp1.z) ? 1.0 : 0.0;
3173      result.w = (tmp0.w != tmp1.w) ? 1.0 : 0.0;
3174
3175    The following special-case rules apply to SNE:
3176
3177      1. (<x> != <y>) and (<y> != <x>) always produce the same result.
3178      2. (NaN != <x>) is TRUE for all <x>, including NaN.
3179      3. (+INF != +INF) and (-INF != -INF) are FALSE.
      4. (-0.0 != +0.0) and (+0.0 != -0.0) are FALSE.
3181
3182
3183    Section 3.11.5.36,  STR:  Set on True
3184
3185    The STR instruction is a degenerate case of the other "Set on"
3186    instructions that sets all components of the result vector to 1.0.
3187
3188      result.x = 1.0;
3189      result.y = 1.0;
3190      result.z = 1.0;
3191      result.w = 1.0;
3192
3193
3194    Section 3.11.5.37,  SUB:  Subtract
3195
3196    The SUB instruction performs a component-wise subtraction of the second
3197    operand from the first to yield a result vector.
3198
3199      tmp0 = VectorLoad(op0);
3200      tmp1 = VectorLoad(op1);
3201      result.x = tmp0.x - tmp1.x;
3202      result.y = tmp0.y - tmp1.y;
3203      result.z = tmp0.z - tmp1.z;
3204      result.w = tmp0.w - tmp1.w;
3205
3206    The SUB instruction is completely equivalent to an identical ADD
3207    instruction in which the negate operator on the second operand is
3208    reversed:
3209
3210      1. "SUB R0, R1, R2" is equivalent to "ADD R0, R1, -R2".
3211      2. "SUB R0, R1, -R2" is equivalent to "ADD R0, R1, R2".
3212      3. "SUB R0, R1, |R2|" is equivalent to "ADD R0, R1, -|R2|".
3213      4. "SUB R0, R1, -|R2|" is equivalent to "ADD R0, R1, |R2|".
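
    The equivalence between SUB and ADD can be checked with a small C
    sketch.  The vec4 type and the function names are hypothetical and
    exist only for this illustration; operand modifiers such as |R2| are
    assumed to have been applied already by the caller.

```c
typedef struct { float x, y, z, w; } vec4;   /* hypothetical vector type */

/* Component-wise negation, standing in for the negate operator. */
vec4 vec4_negate(vec4 v)
{
    vec4 r;
    r.x = -v.x; r.y = -v.y; r.z = -v.z; r.w = -v.w;
    return r;
}

/* ADD: component-wise sum. */
vec4 op_add(vec4 a, vec4 b)
{
    vec4 r;
    r.x = a.x + b.x; r.y = a.y + b.y; r.z = a.z + b.z; r.w = a.w + b.w;
    return r;
}

/* SUB: component-wise difference, written directly. */
vec4 op_sub(vec4 a, vec4 b)
{
    vec4 r;
    r.x = a.x - b.x; r.y = a.y - b.y; r.z = a.z - b.z; r.w = a.w - b.w;
    return r;
}
```

    With these definitions, op_sub(a, b) and op_add(a, vec4_negate(b))
    produce the same components under IEEE 754 arithmetic.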
3214
3215
3216    Section 3.11.5.38,  TEX: Texture Lookup
3217
3218    The TEX instruction performs a filtered texture lookup using the texture
3219    target given by <texImageTarget> belonging to the texture image unit given
3220    by <texImageUnit>.  <texImageTarget> values of "1D", "2D", "3D", "CUBE",
3221    and "RECT" correspond to the texture targets TEXTURE_1D, TEXTURE_2D,
3222    TEXTURE_3D, TEXTURE_CUBE_MAP_ARB, and TEXTURE_RECTANGLE_NV, respectively.
3223
3224    The (s,t,r) texture coordinates used for the lookup are the x, y, and z
3225    components of the single operand.
3226
3227    The texture lookup is performed as specified in Section 3.8.  The LOD
3228    calculations in Section 3.8.5 are performed using an implementation
3229    dependent method to derive ds/dx, ds/dy, dt/dx, dt/dy, dr/dx, and dr/dy.
3230    The mapping of filtered texture components to the components of the result
3231    vector is dependent on the base internal format of the texture and is
3232    specified in Table X.5.
3233
3234                                 Result Vector Components
3235      Base Internal Format        X      Y      Z      W
3236      --------------------      -----  -----  -----  -----
3237      ALPHA                      0.0    0.0    0.0    At
3238      LUMINANCE                  Lt     Lt     Lt     1.0
3239      LUMINANCE_ALPHA            Lt     Lt     Lt     At
3240      INTENSITY                  It     It     It     It
3241      RGB                        Rt     Gt     Bt     1.0
3242      RGBA                       Rt     Gt     Bt     At
3243      HILO_NV (signed)           HIt    LOt    HEMI   1.0
3244      HILO_NV (unsigned)         HIt    LOt    1.0    1.0
3245      DSDT_NV                    DSt    DTt    0.0    1.0
3246      DSDT_MAG_NV                DSt    DTt    MAGt   1.0
3247      DSDT_MAG_INTENSITY_NV      DSt    DTt    MAGt   It
3248      FLOAT_R_NV                 Rt     0.0    0.0    1.0
3249      FLOAT_RG_NV                Rt     Gt     0.0    1.0
3250      FLOAT_RGB_NV               Rt     Gt     Bt     1.0
3251      FLOAT_RGBA_NV              Rt     Gt     Bt     At
3252
3253      Table X.5:  Mapping of filtered texel components to result vector
3254      components for the TEX instruction.  0.0 and 1.0 indicate that the
3255      corresponding constant value is written to the result vector.
3256      DEPTH_COMPONENT textures are treated as ALPHA, LUMINANCE, or INTENSITY,
3257      as specified in the texture's depth texture mode.
3258
3259      For HILO_NV textures with signed components, "HEMI" is defined as
3260      sqrt(MAX(0, 1-(HIt^2+LOt^2))).
3261
3262    This instruction specifies a particular texture target, ignoring the
3263    standard hierarchy of texture enables (TEXTURE_CUBE_MAP_ARB, TEXTURE_3D,
3264    TEXTURE_2D, TEXTURE_1D) used to select a texture target in unextended
3265    OpenGL.  If the specified texture target has a consistent set of images, a
3266    lookup is performed.  Otherwise, the result of the instruction is the
3267    vector (0,0,0,0).
3268
3269    Although this instruction allows the selection of any texture target, a
3270    fragment program cannot use more than one texture target for any given
3271    texture image unit.
3272
3273
3274    Section 3.11.5.39,  TXD: Texture Lookup with Derivatives
3275
3276    The TXD instruction performs a filtered texture lookup using the texture
3277    target given by <texImageTarget> belonging to the texture image unit given
3278    by <texImageUnit>.  <texImageTarget> values of "1D", "2D", "3D", "CUBE",
3279    and "RECT" correspond to the texture targets TEXTURE_1D, TEXTURE_2D,
3280    TEXTURE_3D, TEXTURE_CUBE_MAP_ARB, and TEXTURE_RECTANGLE_NV, respectively.
3281
3282    The (s,t,r) texture coordinates used for the lookup are the x, y, and z
3283    components of the first operand.  The partial derivatives in the X
3284    direction (ds/dx, dt/dx, dr/dx) are specified by the x, y, and z
3285    components of the second operand.  The partial derivatives in the Y
3286    direction (ds/dy, dt/dy, dr/dy) are specified by the x, y, and z
3287    components of the third operand.
3288
3289    The texture lookup is performed as specified in Section 3.8.  The LOD
3290    calculations in Section 3.8.5 are performed using the specified partial
3291    derivatives.  The mapping of filtered texture components to the components
3292    of the result vector is dependent on the base internal format of the
3293    texture and is specified in Table X.5.
3294
3295    This instruction specifies a particular texture target, ignoring the
3296    standard hierarchy of texture enables (TEXTURE_CUBE_MAP_ARB, TEXTURE_3D,
3297    TEXTURE_2D, TEXTURE_1D) used to select a texture target in unextended
3298    OpenGL.  If the specified texture target has a consistent set of images, a
3299    lookup is performed.  Otherwise, the result of the instruction is the
3300    vector (0,0,0,0).
3301
3302    Although this instruction allows the selection of any texture target, a
3303    fragment program cannot use more than one texture target for any given
3304    texture image unit.
3305
3306
3307    Section 3.11.5.40,  TXP: Projective Texture Lookup
3308
3309    The TXP instruction performs a filtered texture lookup using the texture
3310    target given by <texImageTarget> belonging to the texture image unit given
3311    by <texImageUnit>.  <texImageTarget> values of "1D", "2D", "3D", "CUBE",
3312    and "RECT" correspond to the texture targets TEXTURE_1D, TEXTURE_2D,
3313    TEXTURE_3D, TEXTURE_CUBE_MAP_ARB, and TEXTURE_RECTANGLE_NV, respectively.
3314
3315    For cube map textures, the (s,t,r) texture coordinates used for the lookup
3316    are given by x, y, and z, respectively.  For all other textures, the
3317    (s,t,r) texture coordinates used for the lookup are given by x/w, y/w, and
3318    z/w, respectively, where x, y, z, and w are the corresponding components
3319    of the operand.
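
    The coordinate selection described above can be sketched in C.  The
    types and the function name txp_coords are hypothetical.  The behavior
    when w is zero is not addressed in this section; the sketch simply
    performs the floating-point division.

```c
typedef struct { float x, y, z, w; } vec4;    /* hypothetical operand type */
typedef struct { float s, t, r; } texcoord;   /* hypothetical result type  */

/* Derive the (s,t,r) lookup coordinates for TXP from the operand. */
texcoord txp_coords(vec4 op, int is_cube_map)
{
    texcoord tc;
    if (is_cube_map) {
        /* Cube map targets use x, y, and z without projection. */
        tc.s = op.x;
        tc.t = op.y;
        tc.r = op.z;
    } else {
        /* All other targets divide by the w component. */
        tc.s = op.x / op.w;
        tc.t = op.y / op.w;
        tc.r = op.z / op.w;
    }
    return tc;
}
```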
3320
3321    The texture lookup is performed as specified in Section 3.8.  The LOD
3322    calculations in Section 3.8.5 are performed using an implementation
3323    dependent method to derive ds/dx, ds/dy, dt/dx, dt/dy, dr/dx, and dr/dy.
3324    The mapping of filtered texture components to the components of the result
3325    vector is dependent on the base internal format of the texture and is
3326    specified in Table X.5.
3327
3328    This instruction specifies a particular texture target, ignoring the
3329    standard hierarchy of texture enables (TEXTURE_CUBE_MAP_ARB, TEXTURE_3D,
3330    TEXTURE_2D, TEXTURE_1D) used to select a texture target in unextended
3331    OpenGL.  If the specified texture target has a consistent set of images, a
3332    lookup is performed.  Otherwise, the result of the instruction is the
3333    vector (0,0,0,0).
3334
3335    Although this instruction allows the selection of any texture target, a
3336    fragment program cannot use more than one texture target for any given
3337    texture image unit.
3338
3339
3340    Section 3.11.5.41,  UP2H:  Unpack Two 16-Bit Floats
3341
3342    The UP2H instruction unpacks two 16-bit floats stored together in a 32-bit
3343    scalar operand.  The first 16-bit float (stored in the 16 least
3344    significant bits) is written into the "x" and "z" components of the result
3345    vector; the second is written into the "y" and "w" components of the
3346    result vector.
3347
3348    This operation undoes the type conversion and packing performed by the
3349    PK2H instruction.
3350
3351      tmp = ScalarLoad(op0);
3352      result.x = (fp16) (RawBits(tmp) & 0xFFFF);
3353      result.y = (fp16) ((RawBits(tmp) >> 16) & 0xFFFF);
3354      result.z = (fp16) (RawBits(tmp) & 0xFFFF);
3355      result.w = (fp16) ((RawBits(tmp) >> 16) & 0xFFFF);
3356
3357    Since the source operand must be a 32-bit scalar, a fragment program will
3358    fail to load if the operand is not obtained from a register with 32-bit
3359    components or from a program parameter.
3360
3361
3362    Section 3.11.5.42,  UP2US:  Unpack Two Unsigned 16-Bit Scalars
3363
3364    The UP2US instruction unpacks two 16-bit unsigned values packed together
3365    in a 32-bit scalar operand.  The unsigned quantities are encoded where a
3366    bit pattern of all '0' bits corresponds to 0.0 and a pattern of all '1'
3367    bits corresponds to 1.0.  The "x" and "z" components of the result vector
3368    are obtained from the 16 least significant bits of the operand; the "y"
3369    and "w" components are obtained from the 16 most significant bits.
3370
3371    This operation undoes the type conversion and packing performed by the
3372    PK2US instruction.
3373
3374      tmp = ScalarLoad(op0);
3375      result.x = ((RawBits(tmp) >> 0)  & 0xFFFF) / 65535.0;
3376      result.y = ((RawBits(tmp) >> 16) & 0xFFFF) / 65535.0;
3377      result.z = ((RawBits(tmp) >> 0)  & 0xFFFF) / 65535.0;
3378      result.w = ((RawBits(tmp) >> 16) & 0xFFFF) / 65535.0;
3379
3380    Since the source operand must be a 32-bit scalar, a fragment program will
3381    fail to load if the operand is not obtained from a register with 32-bit
3382    components or from a program parameter.
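
    A C sketch of the UP2US arithmetic follows.  It takes the raw 32-bit
    pattern directly, so the RawBits step is assumed to have been done by
    the caller; vec4 and up2us are hypothetical names.

```c
#include <stdint.h>

typedef struct { float x, y, z, w; } vec4;   /* hypothetical vector type */

/* Unpack two unsigned 16-bit values from a raw 32-bit pattern. */
vec4 up2us(uint32_t raw)
{
    /* Low half feeds x and z; high half feeds y and w. */
    float lo = (float)(raw & 0xFFFFu) / 65535.0f;
    float hi = (float)((raw >> 16) & 0xFFFFu) / 65535.0f;
    vec4 r;
    r.x = lo;
    r.y = hi;
    r.z = lo;
    r.w = hi;
    return r;
}
```

    For example, the pattern 0xFFFF0000 has all '0' bits in its low half
    and all '1' bits in its high half, so x and z are 0.0 and y and w are
    1.0.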
3383
3384
3385    Section 3.11.5.43,  UP4B:  Unpack Four Signed 8-Bit Values
3386
3387    The UP4B instruction unpacks four 8-bit signed values packed together in a
3388    32-bit scalar operand.  The signed quantities are encoded where a bit
3389    pattern of all '0' bits corresponds to -128/127 and a pattern of all '1'
3390    bits corresponds to +127/127.  The "x" component of the result vector is
3391    the converted value corresponding to the 8 least significant bits of the
3392    operand; the "w" component corresponds to the 8 most significant bits.
3393
3394    This operation undoes the type conversion and packing performed by the
3395    PK4B instruction.
3396
3397      tmp = ScalarLoad(op0);
3398      result.x = (((RawBits(tmp) >> 0) & 0xFF) - 128) / 127.0;
3399      result.y = (((RawBits(tmp) >> 8) & 0xFF) - 128) / 127.0;
3400      result.z = (((RawBits(tmp) >> 16) & 0xFF) - 128) / 127.0;
3401      result.w = (((RawBits(tmp) >> 24) & 0xFF) - 128) / 127.0;
3402
3403    Since the source operand must be a 32-bit scalar, a fragment program will
3404    fail to load if the operand is not obtained from a register with 32-bit
3405    components or from a program parameter.
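
    When the UP4B pseudocode is transcribed into C, the subtraction of 128
    needs a signed type.  The sketch below shows one way to do this; vec4,
    up4b_byte, and up4b are hypothetical names, and the raw 32-bit pattern
    is assumed to be supplied by the caller.

```c
#include <stdint.h>

typedef struct { float x, y, z, w; } vec4;   /* hypothetical vector type */

/* Convert the byte found at bit offset `shift` to a signed value. */
static float up4b_byte(uint32_t raw, unsigned shift)
{
    /* Convert to int before subtracting: in unsigned arithmetic,
       (byte - 128) would wrap around for bytes below 128. */
    int byte = (int)((raw >> shift) & 0xFFu);
    return (float)(byte - 128) / 127.0f;
}

/* Unpack four signed 8-bit values from a raw 32-bit pattern. */
vec4 up4b(uint32_t raw)
{
    vec4 r;
    r.x = up4b_byte(raw, 0);    /* 8 least significant bits */
    r.y = up4b_byte(raw, 8);
    r.z = up4b_byte(raw, 16);
    r.w = up4b_byte(raw, 24);   /* 8 most significant bits  */
    return r;
}
```

    A byte of 0x80 maps to 0.0, 0xFF maps to +127/127, and 0x00 maps to
    -128/127, which lies slightly below -1.0.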
3406
3407
3408    Section 3.11.5.44,  UP4UB:  Unpack Four Unsigned 8-Bit Scalars
3409
3410    The UP4UB instruction unpacks four 8-bit unsigned values packed together
3411    in a 32-bit scalar operand.  The unsigned quantities are encoded where a
3412    bit pattern of all '0' bits corresponds to 0.0 and a pattern of all '1'
3413    bits corresponds to 1.0.  The "x" component of the result vector is
3414    obtained from the 8 least significant bits of the operand; the "w"
3415    component is obtained from the 8 most significant bits.
3416
3417    This operation undoes the type conversion and packing performed by the
3418    PK4UB instruction.
3419
3420      tmp = ScalarLoad(op0);
3421      result.x = ((RawBits(tmp) >> 0)  & 0xFF) / 255.0;
3422      result.y = ((RawBits(tmp) >> 8)  & 0xFF) / 255.0;
3423      result.z = ((RawBits(tmp) >> 16) & 0xFF) / 255.0;
3424      result.w = ((RawBits(tmp) >> 24) & 0xFF) / 255.0;
3425
3426    Since the source operand must be a 32-bit scalar, a fragment program will
3427    fail to load if the operand is not obtained from a register with 32-bit
3428    components or from a program parameter.
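
    The RawBits step used by the unpack instructions can be sketched in C
    with memcpy, which reinterprets the bits of a value without converting
    it.  This sketch assumes that float is a 32-bit IEEE 754 type; raw_bits,
    float_from_bits, and up4ub are hypothetical names.

```c
#include <stdint.h>
#include <string.h>

typedef struct { float x, y, z, w; } vec4;   /* hypothetical vector type */

/* Reinterpret the bits of a 32-bit float as an unsigned integer. */
uint32_t raw_bits(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return u;
}

/* Inverse of raw_bits, useful for building test operands. */
float float_from_bits(uint32_t u)
{
    float f;
    memcpy(&f, &u, sizeof f);
    return f;
}

/* Unpack four unsigned 8-bit values from a 32-bit scalar operand. */
vec4 up4ub(float operand)
{
    uint32_t raw = raw_bits(operand);
    vec4 r;
    r.x = (float)((raw >> 0)  & 0xFFu) / 255.0f;
    r.y = (float)((raw >> 8)  & 0xFFu) / 255.0f;
    r.z = (float)((raw >> 16) & 0xFFu) / 255.0f;
    r.w = (float)((raw >> 24) & 0xFFu) / 255.0f;
    return r;
}
```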
3429
3430
3431    Section 3.11.5.45,  X2D:  2D Coordinate Transformation
3432
3433    The X2D instruction multiplies the 2D offset vector specified by the "x"
3434    and "y" components of the second vector operand by the 2x2 matrix
3435    specified by the four components of the third vector operand, and adds the
3436    transformed offset vector to the 2D vector specified by the "x" and "y"
3437    components of the first vector operand.  The first component of the sum is
3438    written to the "x" and "z" components of the result; the second component
3439    is written to the "y" and "w" components of the result.
3440
3441    The X2D instruction can be used to displace texture coordinates in the
3442    same manner as the OFFSET_TEXTURE_2D_NV mode in the GL_NV_texture_shader
3443    extension.
3444
3445      tmp0 = VectorLoad(op0);
3446      tmp1 = VectorLoad(op1);
3447      tmp2 = VectorLoad(op2);
3448      result.x = tmp0.x + tmp1.x * tmp2.x + tmp1.y * tmp2.y;
3449      result.y = tmp0.y + tmp1.x * tmp2.z + tmp1.y * tmp2.w;
3450      result.z = tmp0.x + tmp1.x * tmp2.x + tmp1.y * tmp2.y;
3451      result.w = tmp0.y + tmp1.x * tmp2.z + tmp1.y * tmp2.w;
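
    The X2D computation can be transcribed into C as shown below.  The
    vec4 type and the parameter names are hypothetical; the matrix operand
    holds its first row in (x, y) and its second row in (z, w), following
    the pseudocode above.

```c
typedef struct { float x, y, z, w; } vec4;   /* hypothetical vector type */

/* base + M * offset, where M is the 2x2 matrix packed into `m`. */
vec4 x2d(vec4 base, vec4 offset, vec4 m)
{
    /* Only the x and y components of base and offset are used. */
    float sx = base.x + offset.x * m.x + offset.y * m.y;
    float sy = base.y + offset.x * m.z + offset.y * m.w;
    vec4 r;
    r.x = sx;
    r.y = sy;
    r.z = sx;   /* first component replicated into z  */
    r.w = sy;   /* second component replicated into w */
    return r;
}
```

    With the identity matrix (1, 0, 0, 1) as the third operand, the result
    reduces to the component-wise sum of the first two operands' x and y
    components.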
3452
3453
3454    Section 3.11.6, Fragment Program Outputs
3455
3456    Upon completion of fragment program execution, the output registers are
3457    used to replace the fragment's associated data.
3458
3459    The RGBA color of the fragment is taken from the color output register
3460    used by the program (COLR or COLH).  The R, G, B, and A color components
3461    are extracted from the "x", "y", "z", and "w" components, respectively, of
3462    the output register and are clamped to the range [0,1].
3463
3464    If the DEPR output register is written by the fragment program, the depth
3465    value of the fragment is taken from the z component of the DEPR output
3466    register.  If depth clamping is enabled, the depth value is clamped to the
3467    range [min(n,f), max(n,f)], where n and f are the near and far depth range
3468    values.  If depth clamping is disabled, the fragment is discarded if its
3469    depth value is outside the range [min(n,f), max(n,f)].
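
    The output handling described above can be sketched in C.  The function
    names are hypothetical, and resolve_depth is meant to model only the
    case where the program writes DEPR.

```c
/* Clamp one color component to the range [0,1]. */
float clamp01(float c)
{
    if (c < 0.0f) return 0.0f;
    if (c > 1.0f) return 1.0f;
    return c;
}

/* Apply the depth rules to the z component of DEPR.
   Returns 1 if the fragment survives and stores its depth in *depth_out;
   returns 0 if the fragment is discarded. */
int resolve_depth(float z, float n, float f, int depth_clamp_enabled,
                  float *depth_out)
{
    /* The valid range is [min(n,f), max(n,f)], so n may exceed f. */
    float lo = (n < f) ? n : f;
    float hi = (n < f) ? f : n;

    if (depth_clamp_enabled) {
        if (z < lo) z = lo;
        if (z > hi) z = hi;
    } else if (z < lo || z > hi) {
        return 0;   /* outside the depth range: discard */
    }
    *depth_out = z;
    return 1;
}
```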
3470
3471
3472    Section 3.11.7, Required Fragment Program State
3473
3474    The state required for managing fragment programs consists of:
3475
3476      a bit indicating whether or not fragment program mode is enabled;
3477
3478      an unsigned integer naming the currently bound fragment program;
3479
3480      and the state that must be maintained to indicate which integers are
3481      currently in use as fragment program names.
3482
3483    Fragment program mode is initially disabled.  The initial state of all 128
3484    fragment program parameter registers is (0,0,0,0).  The initial currently
3485    bound fragment program is zero.
3486
3487    Each fragment program object consists of:
3488
3489      an enumerant giving the program target (FRAGMENT_PROGRAM_NV);
3490
3491      a boolean indicating whether the program is resident;
3492
3493      an array of type ubyte containing the program string;
3494
3495      an integer representing the length of the program string array;
3496
3497      one four-component floating-point vector for each named local
3498      parameter in the program;
3499
3500      and a set of MAX_FRAGMENT_PROGRAM_LOCAL_PARAMETERS_NV four-component
3501      floating-point vectors to hold numbered local parameters, each initially
3502      set to (0,0,0,0).
3503
3504    Initially, no program objects exist.
3505
3506    Additionally, the state required during the execution of a fragment
3507    program consists of:  twelve 4-component floating-point fragment attribute
3508    registers, thirty-two 128-bit physical temporary registers, and a single
3509    4-component condition code, whose components have one of four values (LT,
3510    EQ, GT, or UN).
3511
3512    Each time a fragment program is executed, the fragment attribute registers
3513    are initialized with the fragment's location and associated data, all
3514    temporary register components are initialized to zero, and all condition
3515    code components are initialized to EQ.
3516
3517
3518    Renumber Section 3.11 to Section 3.12, Antialiasing Application (p.140).
3519    No changes to the text of the section.
3520
3521
3522Additions to Chapter 4 of the OpenGL 1.2.1 Specification (Per-Fragment
3523Operations and the Framebuffer)
3524
3525    None
3526
3527Additions to Chapter 5 of the OpenGL 1.2.1 Specification (Special Functions)
3528
3529    Add new section 5.7, Programs (after "Flush and Finish")
3530
3531    Programs are specified as an array of ubytes used to control the operation
3532    of portions of the GL.  The array is a string of ASCII characters encoding
3533    the program.
3534
3535    The command
3536
3537      LoadProgramNV(enum target, uint id, sizei len, const ubyte *program);
3538
3539    loads a program.  The target parameter specifies the type of program
3540    loaded and can be VERTEX_PROGRAM_NV, VERTEX_STATE_PROGRAM_NV, or
3541    FRAGMENT_PROGRAM_NV.  VERTEX_PROGRAM_NV specifies a program to be executed
3542    in vertex program mode as each vertex is specified.  VERTEX_STATE_PROGRAM_NV
3543    specifies a program to be run manually to update vertex state.
3544    FRAGMENT_PROGRAM_NV specifies a program to be executed in fragment program
3545    mode as each fragment is rasterized.
3546
3547    Multiple programs can be loaded with different names.  id names the
3548    program to load.  The name space for programs is the set of positive
3549    integers (zero is reserved).  The error INVALID_VALUE is generated by
3550    LoadProgramNV if a program is loaded with an id of zero.  The error
3551    INVALID_OPERATION is generated by LoadProgramNV if a program is loaded
3552    for an id that is currently loaded with a program of a different program
3553    target.  program is a pointer to an array of ubytes that represents the
3554    program being loaded.  The length of the array in ubytes is indicated by
3555    len.
3556
3557    At program load time, the program is parsed into a set of tokens possibly
3558    separated by white space.  Spaces, tabs, newlines, carriage returns, and
3559    comments are considered whitespace.  Comments begin with the character "#"
3560    and are terminated by a newline, a carriage return, or the end of the
3561    program array.  Tokens are processed in a case-sensitive manner:  upper
3562    and lower-case letters are not considered equivalent.
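
    The whitespace and comment rules above can be sketched as a small C
    scanner.  This is an illustration of the rules only, not the parser of
    any implementation; skip_whitespace is a hypothetical name.

```c
#include <stddef.h>

/* Return the offset of the next token at or after `pos`, or `len` if
   only whitespace and comments remain. */
size_t skip_whitespace(const unsigned char *program, size_t len, size_t pos)
{
    while (pos < len) {
        unsigned char c = program[pos];
        if (c == ' ' || c == '\t' || c == '\n' || c == '\r') {
            pos++;
        } else if (c == '#') {
            /* A comment runs to a newline, a carriage return, or the
               end of the program array. */
            while (pos < len && program[pos] != '\n' && program[pos] != '\r')
                pos++;
        } else {
            break;   /* first character of a token */
        }
    }
    return pos;
}
```

    The scanner works from the explicit length and does not rely on a
    terminating zero byte, since the program is specified as an array of
    ubytes with length len.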
3563
3564    Each program target has a corresponding Backus-Naur Form (BNF) grammar
3565    specifying the syntactically valid sequences for programs of the specified
3566    type.  The set of valid tokens can be inferred from the grammar.  The
3567    token "" represents an empty string and is used to indicate optional
3568    rules.  A program is invalid if it contains any undefined tokens or
3569    characters.
3570
3571    The error INVALID_OPERATION is generated by LoadProgramNV if a program
3572    fails to load because it is not syntactically correct or fails to satisfy
3573    all of the semantic restrictions corresponding to the program target.
3574
3575    A successfully loaded program is parsed into a sequence of instructions.
3576    Each instruction is identified by its tokenized name.  The operation of
3577    these instructions is specific to the program target and is defined
3578    elsewhere.
3579
3580    A successfully loaded program replaces the program previously assigned to
3581    the name specified by id.  If the OUT_OF_MEMORY error is generated by
3582    LoadProgramNV, no change is made to the previous contents of the named
3583    program.
3584
3585    Querying the value of PROGRAM_ERROR_POSITION_NV returns a ubyte offset
3586    into the program string most recently passed to LoadProgramNV indicating
3587    the position of the first error, if any, in the program.  If the program
3588    fails to load because of a semantic restriction that cannot be determined
3589    until the program is fully scanned, the error position will be len, the
3590    length of the program.  If the program loads successfully, the value of
3591    PROGRAM_ERROR_POSITION_NV is assigned the value negative one.
3592
3593    For targets whose programs are executed automatically (e.g., vertex and
3594    fragment programs), there must be a current program.  The current vertex
3595    program is executed automatically in vertex program mode as vertices are
3596    specified.  The current fragment program is executed automatically in
3597    fragment program mode as fragments are generated by rasterization.
3598    Current programs for a program target are updated by
3599
3600      BindProgramNV(enum target, uint id);
3601
3602    where target must be VERTEX_PROGRAM_NV or FRAGMENT_PROGRAM_NV.  The error
3603    INVALID_OPERATION is generated by BindProgramNV if id names a program that
3604    has a type different than target (for example, if id names a vertex state
3605    program as described in section 2.14.4).
3606
3607    Binding to a nonexistent program id does not generate an error.  In
3608    particular, binding to program id zero does not generate an error.
3609    However, because program zero cannot be loaded, program zero is always
3610    nonexistent.  If a program id is successfully loaded with a new vertex
3611    program and id is also the currently bound vertex program, the new program
3612    is considered the currently bound vertex program.
3613
3614    The INVALID_OPERATION error is generated when both vertex program mode is
3615    enabled and Begin is called (or when a command that performs an implicit
3616    Begin is called) if the current vertex program is nonexistent or not
3617    valid.  A vertex program may not be valid for reasons explained in section
3618    2.14.5.
3619
3620    The INVALID_OPERATION error is generated when both fragment program mode
3621    is enabled and Begin, another GL command that performs an implicit Begin,
3622    or any other GL command that generates fragments is called, if the current
3623    fragment program is nonexistent or not valid.  A fragment program may be
3624    invalid for reasons explained in Section 3.11.3.
3625
3626    Programs are deleted by calling
3627
3628      void DeleteProgramsNV(sizei n, const uint *ids);
3629
3630    ids contains n names of programs to be deleted.  After a program is
3631    deleted, it becomes nonexistent, and its name is again unused.  If a
3632    program that is currently bound is deleted, it is as though BindProgramNV
3633    has been executed with the same target as the deleted program and program
3634    zero.  Unused names in ids are silently ignored, as is the value zero.
3635
3636    The command
3637
3638      void GenProgramsNV(sizei n, uint *ids);
3639
3640    returns n currently unused program names in ids.  These names are marked
3641    as used, for the purposes of GenProgramsNV only, but they become existent
3642    programs only when they are first loaded using LoadProgramNV.
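
    The life cycle of a program name (unused, generated, existent) can be
    modeled with a toy C state table.  Everything in this sketch is
    hypothetical: it makes no GL calls, uses a fixed table of 16 names, and
    hands out the lowest unused name, whereas GenProgramsNV makes no promise
    about which unused names it returns.

```c
#include <stddef.h>

#define MODEL_MAX_NAMES 16   /* hypothetical table size; name 0 is reserved */

enum name_state { NAME_UNUSED, NAME_GENERATED, NAME_LOADED };

typedef struct { enum name_state state[MODEL_MAX_NAMES]; } name_model;

void model_init(name_model *m)
{
    size_t i;
    for (i = 0; i < MODEL_MAX_NAMES; i++)
        m->state[i] = NAME_UNUSED;
}

/* Models GenProgramsNV for n = 1: returns 0 if the toy table is full. */
unsigned model_gen(name_model *m)
{
    unsigned id;
    for (id = 1; id < MODEL_MAX_NAMES; id++) {
        if (m->state[id] == NAME_UNUSED) {
            m->state[id] = NAME_GENERATED;   /* used, but not yet existent */
            return id;
        }
    }
    return 0;
}

/* Models a successful LoadProgramNV: returns 0 for the reserved name 0. */
int model_load(name_model *m, unsigned id)
{
    if (id == 0 || id >= MODEL_MAX_NAMES)
        return 0;
    m->state[id] = NAME_LOADED;
    return 1;
}

/* A name is an existent program only once it has been loaded. */
int model_exists(const name_model *m, unsigned id)
{
    return id != 0 && id < MODEL_MAX_NAMES && m->state[id] == NAME_LOADED;
}

/* Models DeleteProgramsNV for n = 1: zero and unused names are ignored. */
void model_delete(name_model *m, unsigned id)
{
    if (id != 0 && id < MODEL_MAX_NAMES)
        m->state[id] = NAME_UNUSED;
}
```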
3643
3644    An implementation may choose to establish a working set of programs on
3645    which binding and/or manual execution are performed with higher
3646    performance.  A program that is currently part of this working set is said
3647    to be resident.
3648
3649    The command
3650
3651      boolean AreProgramsResidentNV(sizei n, const uint *ids,
3652                                    boolean *residences);
3653
3654    returns TRUE if all of the n programs named in ids are resident, or if the
3655    implementation does not distinguish a working set.  If at least one of the
3656    programs named in ids is not resident, then FALSE is returned, and the
3657    residence of each program is returned in residences.  Otherwise the
3658    contents of residences are not changed.  If any of the names in ids are
3659    nonexistent or zero, FALSE is returned, the error INVALID_VALUE is
3660    generated, and the contents of residences are indeterminate.  The
3661    residence status of a single named program can also be queried by calling
3662    GetProgramivNV (Section 6.1.13) with id set to the name of the program and
3663    pname set to PROGRAM_RESIDENT_NV.
3664
3665    AreProgramsResidentNV indicates only whether a program is currently
3666    resident, not whether it could not be made resident.  An implementation
3667    may choose to make a program resident only on first use, for example.  The
3668    client may guide the GL implementation in determining which programs
3669    should be resident by requesting a set of programs to make resident.
3670
3671    The command
3672
3673      void RequestResidentProgramsNV(sizei n, const uint *ids);
3674
3675    requests that the n programs named in ids should be made resident.
3676    While all the programs are not guaranteed to become resident,
3677    the implementation should make a best effort to make as many of
3678    the programs resident as possible.  As a result of making the
3679    requested programs resident, program names not among the requested
3680    programs may become non-resident.  Higher priority for residency
3681    should be given to programs listed earlier in the ids array.
3682    RequestResidentProgramsNV silently ignores attempts to make resident
3683    nonexistent program names or zero.  AreProgramsResidentNV can be
3684    called after RequestResidentProgramsNV to determine which programs
3685    actually became resident.
3686
3687    The commands
3688
3689      void ProgramNamedParameter4fNV(uint id, sizei len, const ubyte *name,
3690                                     float x, float y, float z, float w);
3691      void ProgramNamedParameter4dNV(uint id, sizei len, const ubyte *name,
3692                                     double x, double y, double z, double w);
3693      void ProgramNamedParameter4fvNV(uint id, sizei len, const ubyte *name,
3694                                      const float v[]);
3695      void ProgramNamedParameter4dvNV(uint id, sizei len, const ubyte *name,
3696                                      const double v[]);
3697
3698    specify a new value for the named program local parameter <name> belonging
3699    to the fragment program specified by <id>.  <name> is a pointer to an
3700    array of ubytes holding the parameter name.  <len> specifies the number of
3701    ubytes in the array given by <name>.  The new x, y, z, and w components of
3702    the named local parameter are given by x, y, z, and w, respectively, for
3703    ProgramNamedParameter4fNV and ProgramNamedParameter4dNV, and by v[0],
3704    v[1], v[2], and v[3], respectively, for ProgramNamedParameter4fvNV and
3705    ProgramNamedParameter4dvNV.  The error INVALID_OPERATION is generated if
3706    <id> specifies a nonexistent program or a program whose type does not
3707    support named local parameters.  The error INVALID_VALUE is generated
3708    if <name> does not specify the name of a local parameter in the program
3709    corresponding to <id>.  The error INVALID_VALUE is also generated if <len>
3710    is zero.
3711
3712    The commands
3713
3714      void ProgramLocalParameter4fARB(enum target, uint index,
3715                                      float x, float y, float z, float w);
3716      void ProgramLocalParameter4fvARB(enum target, uint index,
3717                                       const float *params);
3718      void ProgramLocalParameter4dARB(enum target, uint index,
3719                                      double x, double y, double z, double w);
3720      void ProgramLocalParameter4dvARB(enum target, uint index,
3721                                       const double *params);
3722
3723    update the values of the numbered program local parameter <index>
3724    belonging to the program object currently bound to <target>.  For
3725    ProgramLocalParameter4fARB and ProgramLocalParameter4dARB, the four
3726    components of the parameter are updated with the values of <x>, <y>, <z>,
3727    and <w>, respectively.  For ProgramLocalParameter4fvARB and
3728    ProgramLocalParameter4dvARB, the four components of the parameter are
3729    updated with the array of four values pointed to by <params>.  The error
3730    INVALID_VALUE is generated if <index> is greater than or equal to the
3731    number of numbered program local parameters supported by <target>.
3732
3733
3734Additions to Chapter 6 of the OpenGL 1.2.1 Specification (State and
3735State Requests)
3736
3737    Modify Section 6.1.11, Pointer and String Queries (p. 206)
3738
3739    (modify last paragraph, p. 206) ... The possible values for <name> are
3740    VENDOR, RENDERER, VERSION, EXTENSIONS, and PROGRAM_ERROR_STRING_NV.
3741
3742    (add after last paragraph of section, p. 207) Queries of
3743    PROGRAM_ERROR_STRING_NV return a pointer to an implementation-dependent
3744    program load error string.  If the last call to LoadProgramNV failed to
3745    load a program, the returned string describes a reason that the program
3746    failed to load.  Otherwise, a pointer to an empty string (containing only
3747    a terminator) is returned.
3748
3749    Rename and modify Section 6.1.13, Vertex and Fragment Program Queries
3750    (from GL_NV_fragment_program).  Portions of this section pertaining to
3751    fragment programs are copied verbatim.
3752
3753    (insert after discussion of GetProgramParameter[fd]vNV)
3754
3755    The commands
3756
3757      void GetProgramNamedParameterfvNV(uint id, sizei len,
3758                                        const ubyte *name, float *params);
3759      void GetProgramNamedParameterdvNV(uint id, sizei len,
3760                                        const ubyte *name, double *params);
3761
3762    obtain the current program named local parameter value for the parameter
3763    named <name> belonging to the program given by <id>.  <name> is a pointer
3764    to an array of ubytes holding the parameter name.  <len> specifies the
3765    number of ubytes in the array given by <name>.  The error
3766    INVALID_OPERATION is generated if <id> specifies a nonexistent program or
3767    a program whose type does not support named local parameters.  The error
3768    INVALID_VALUE is generated if <name> does not specify the name of a local
3769    parameter in the program corresponding to <id>.  The error INVALID_VALUE
3770    is also generated if <len> is zero.  Each named program local parameter is
3771    an array of four values.
3772
3773    The commands
3774
3775      void GetProgramLocalParameterdvARB(enum target, uint index,
3776                                         double *params);
3777      void GetProgramLocalParameterfvARB(enum target, uint index,
3778                                         float *params);
3779
3780    obtain the current value for the numbered program local parameter <index>
3781    belonging to the program object currently bound to <target>, and places
3782    the information in the array <params>.  The error INVALID_ENUM is
3783    generated if <target> specifies a nonexistent program target or a program
3784    target that does not support numbered program local parameters.  The error
3785    INVALID_VALUE is generated if <index> is greater than or equal to the
3786    implementation-dependent number of supported numbered program local
3787    parameters for the program target.
3788
3789    When the program target type is FRAGMENT_PROGRAM_NV, each numbered program
3790    local parameter returned is an array of four values.  ...
3791
3792    The command
3793
3794      void GetProgramivNV(uint id, enum pname, int *params);
3795
3796    obtains program state named by pname for the program named id in the array
3797    params.  pname must be one of PROGRAM_TARGET_NV, PROGRAM_LENGTH_NV, or
3798    PROGRAM_RESIDENT_NV.  The error INVALID_OPERATION is generated if the
3799    program named id does not exist.
3800
3801    The command
3802
3803      void GetProgramStringNV(uint id, enum pname,
3804                              ubyte *program);
3805
3806    obtains the program string for program id.  pname must be
3807    PROGRAM_STRING_NV.  n ubytes are returned into the array program
3808    where n is the length of the program in ubytes.  GetProgramivNV with
3809    PROGRAM_LENGTH_NV can be used to query the length of a program's
3810    string.  The INVALID_OPERATION error is generated if the program
3811    named id does not exist.
3812
3813    ...
3814
3815    The command
3816
3817      boolean IsProgramNV(uint id);
3818
3819    returns TRUE if id is the name of a program object.  If id
3820    is zero or is a non-zero value that is not the name of a program
3821    object, or if an error condition occurs, IsProgramNV returns FALSE.
3822    A name returned by GenProgramsNV but not yet loaded with a program
3823    is not the name of a program object.
3824
3825
3826Additions to Appendix F of the OpenGL 1.2.1 Specification (ARB Extensions)
3827
3828    Modify Section F.2.3 (Changes to Section 2.6), p.240
3829
3830    (modify last paragraph on p.240) ... Multiple sets of texture coordinates
3831    may be used to specify how multiple texture images are mapped onto a
3832    primitive.  The number of texture coordinate sets supported is
3833    implementation dependent, but must be at least 1.  The number of texture
3834    coordinate sets supported may be queried with the state
3835    MAX_TEXTURE_COORDS_NV.
3836
3837    Modify Section F.2.4 (Changes to Section 2.7), p.241
3838
3839    (modify the last paragraph on p.241, carrying over to p.243)
3840    Implementations may support more than one set of texture coordinates.  The
3841    commands
3842
3843        void MultiTexCoord{1234}{sifd}ARB(enum texture, T coords)
3844        void MultiTexCoord{1234}{sifd}vARB(enum texture, T coords)
3845
3846    take the coordinate set to be modified as the <texture> parameter.
3847    <texture> is a symbolic constant of the form TEXTUREi_ARB, indicating that
3848    texture coordinate set i is to be modified.  The constants obey
3849    TEXTUREi_ARB = TEXTURE0_ARB + i (i is in the range 0 to k-1, where k is
3850    the implementation dependent number of texture coordinate sets defined by
3851    MAX_TEXTURE_COORDS_NV).
3852
3853
3854    Modify Section F.2.5 (Changes to Section 2.8), p.243
3855
3856    (modify first and second paragraphs of section) ... The client may specify
3857    up to 5 plus the value of MAX_TEXTURE_COORDS_NV arrays; one each to store
3858    vertex coordinates...
3859
3860    In implementations which support more than one texture coordinate set, the
3861    command
3862
3863        void ClientActiveTextureARB(enum texture)
3864
3865    is used to select the vertex array client state parameters to be modified
3866    by the TexCoordPointer command and the array affected by EnableClientState
3867    and DisableClientState with the parameter TEXTURE_COORD_ARRAY.  This
3868    command sets the state variable CLIENT_ACTIVE_TEXTURE_ARB.  Each texture
3869    coordinate set has a client state vector which is selected when this
3870    command is invoked.  This state vector also includes the vertex array
3871    state.  This command also selects the texture coordinate set state used
3872    for queries of client state.
3873
3874    (modify first paragraph on p.244) If the number of supported texture
3875    coordinate sets (the value of MAX_TEXTURE_COORDS_NV) is k, ...
3876
3877
3878    Modify Section F.2.6 (Changes to Section 2.10.2), p.244
3879
3880    (modify first paragraph)  For each texture coordinate set, a 4x4 matrix is
3881    applied to the corresponding texture coordinates...
3882
3883    (replace second and third paragraphs) The command
3884
3885      void ActiveTextureARB(enum texture);
3886
3887    specifies the active texture unit selector, ACTIVE_TEXTURE_ARB.  Each
3888    texture unit contains up to two distinct sub-units:  a texture coordinate
3889    processing unit (consisting of a texture matrix stack and texture
3890    coordinate generation state) and a texture image unit (consisting of all
3891    the texture state defined in Section 3.8).  In implementations with a
3892    different number of supported texture coordinate sets and texture image
3893    units, some texture units may consist of only one of the two sub-units.
3894
3895    The active texture unit selector specifies the texture unit accessed by
3896    commands involving texture coordinate processing.  Such commands include
3897    those accessing the current matrix stack (if MATRIX_MODE is TEXTURE),
3898    TexGen (Section 2.10.4), Enable/Disable (if any texture coordinate
3899    generation enum is selected), as well as queries of the current texture
3900    coordinates and current raster texture coordinates.  If the texture unit
3901    number corresponding to the current value of ACTIVE_TEXTURE_ARB is greater
3902    than or equal to the implementation dependent constant
3903    MAX_TEXTURE_COORDS_NV, the error INVALID_OPERATION is generated by any
3904    such command.
3905
3906    The active texture unit selector also selects the texture unit accessed by
3907    commands involving texture image processing (Section 3.8).  Such commands
3908    include all variants of TexEnv, TexParameter, and TexImage commands,
3909    BindTexture, Enable/Disable for any texture target (e.g., TEXTURE_2D), and
3910    queries of all such state.  If the texture unit number corresponding to
3911    the current value of ACTIVE_TEXTURE_ARB is greater than or equal to the
3912    implementation dependent constant MAX_TEXTURE_IMAGE_UNITS_NV, the error
3913    INVALID_OPERATION is generated by any such command.
3914
3915    ActiveTextureARB generates the error INVALID_ENUM if an invalid <texture>
3916    is specified.  <texture> is a symbolic constant of the form TEXTUREi_ARB,
3917    indicating that texture unit i is to be modified.  The constants obey
3918    TEXTUREi_ARB = TEXTURE0_ARB + i (i is in the range 0 to k-1, where k is
3919    the larger of MAX_TEXTURE_COORDS_NV and MAX_TEXTURE_IMAGE_UNITS_NV).
3920    For compatibility with old OpenGL specifications, the implementation
3921    dependent constant MAX_TEXTURE_UNITS_ARB specifies the number of
3922    conventional texture units supported by the implementation.  Its value
3923    must be no larger than the minimum of MAX_TEXTURE_COORDS_NV and
3924    MAX_TEXTURE_IMAGE_UNITS_NV.
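
    The relationship between the two limits and the active texture unit
    selector can be sketched in C as follows.  The helper names are
    hypothetical and the limits are passed in as plain integers; a real
    application would obtain them with GetIntegerv.

```c
/* Value of TEXTURE0_ARB as defined by ARB_multitexture. */
#define GL_TEXTURE0_ARB 0x84C0

/* Hypothetical helper: is <texture> accepted by ActiveTextureARB?
 * Valid enums are TEXTURE0_ARB + i for i in [0, k-1], where k is the
 * larger of the two implementation dependent limits. */
static int texture_enum_is_valid(unsigned int texture,
                                 unsigned int max_coords,
                                 unsigned int max_image_units)
{
    unsigned int k = max_coords > max_image_units ? max_coords
                                                  : max_image_units;
    return texture >= GL_TEXTURE0_ARB && (texture - GL_TEXTURE0_ARB) < k;
}

/* Hypothetical helper: may texture coordinate processing state (matrix
 * stack, TexGen) be accessed while unit <unit> is active? */
static int coord_state_is_accessible(unsigned int unit,
                                     unsigned int max_coords)
{
    return unit < max_coords;
}

/* Hypothetical helper: may texture image state (TexImage, BindTexture,
 * TexEnv, TexParameter) be accessed while unit <unit> is active? */
static int image_state_is_accessible(unsigned int unit,
                                     unsigned int max_image_units)
{
    return unit < max_image_units;
}
```

    For example, with 8 texture coordinate sets and 16 texture image units,
    selecting TEXTURE8_ARB is legal, but a subsequent TexGen call generates
    INVALID_OPERATION while BindTexture does not.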
3925
3926    Modify Section F.2.12 (Changes to Section 3.8.10), p.249
3927
3928    (modify next-to-last paragraph) Texturing is enabled and disabled
3929    individually for each texture unit.  If texturing is disabled for one of
3930    the units, then the fragment resulting from the previous unit is passed
3931    unaltered to the following unit.  Individual texture units beyond those
3932    specified by MAX_TEXTURE_UNITS_ARB may be incomplete and are always
3933    treated as disabled.
3934
3935    Modify Section F.2.15 (Changes to Section 6.1.2), p.251
3936
3937    (add to end of paragraph) Queries of texture state variables corresponding
3938    to texture coordinate processing unit (namely, TexGen state and enables,
3939    and matrices) will produce an INVALID_OPERATION error if the value of
3940    ACTIVE_TEXTURE_ARB is greater than or equal to MAX_TEXTURE_COORDS_NV.  All
3941    other texture state queries will result in an INVALID_OPERATION error if
3942    the value of ACTIVE_TEXTURE_ARB is greater than or equal to
3943    MAX_TEXTURE_IMAGE_UNITS_NV.
3944
3945Additions to the AGL/GLX/WGL Specifications
3946
3947    Program objects are shared between AGL/GLX/WGL rendering contexts if
3948    and only if the rendering contexts share display lists.  No change
3949    is made to the AGL/GLX/WGL API.
3950
3951Dependencies on GL_NV_vertex_program
3952
3953    If NV_vertex_program is supported, the description of LoadProgramNV in
3954    Section 2.14.1.7 (up to the BNF description of vertex programs) is
3955    deleted, as it is replaced by the contents of Section 5.7 in this
3956    specification.  The general error descriptions in Section 2.14.1.7 common
3957    to Section 5.7 (like INVALID_OPERATION if the program fails to compile)
3958    should also be deleted.  Section 2.14.1.8 should also be deleted.  Section
3959    6.1.13 is modified by this specification as described above.
3960
3961Dependencies on NV_texture_shader
3962
3963    If NV_texture_shader is not supported, the comment about texture shaders
3964    being disabled in fragment program mode is not applicable.
3965
3966Dependencies on NV_texture_rectangle
3967
3968    If NV_texture_rectangle is not supported, the references to "RECT" in the
3969    <texImageTarget> grammar rule and TEXTURE_RECTANGLE_NV are not applicable.
3970
3971Dependencies on ARB_texture_cube_map
3972
3973    If ARB_texture_cube_map is not supported, the references to "CUBE" in the
3974    <texImageTarget> grammar rule and TEXTURE_CUBE_MAP_ARB are not applicable.
3975
3976Dependencies on EXT_fog_coord
3977
3978    If EXT_fog_coord is not supported, references to "fog coordinate" in the
3979    definition of the "FOGC" fragment attribute register should be removed.
3980
3981Dependencies on NV_depth_clamp
3982
3983    If NV_depth_clamp is not supported, section 3.11.6 is modified to remove
3984    discussion of the depth clamp enable and instead indicate that fragments
3985    with depth values outside [min(n,f), max(n,f)] are always discarded.
3986
3987Dependencies on ARB_depth_texture and SGIX_depth_texture
3988
3989    If ARB_depth_texture is not supported, but SGIX_depth_texture is
3990    supported, the discussion of Table X.5 is modified to indicate that
3991    DEPTH_COMPONENT textures are treated as LUMINANCE.
3992
3993    If neither extension is supported, the discussion of DEPTH_COMPONENT
3994    textures in Table X.5 should be removed.
3995
3996Dependencies on NV_float_buffer
3997
3998    If NV_float_buffer is not supported, references to FLOAT_R_NV,
3999    FLOAT_RG_NV, FLOAT_RGB_NV, and FLOAT_RGBA_NV internal texture formats in
4000    Table X.5 should be removed.
4001
4002Dependencies on ARB_vertex_program
4003
4004    This extension does not have any explicit dependencies, but the APIs for
4005    setting and querying numbered local parameters (ProgramLocalParameter*ARB
4006    and GetProgramLocalParameter*ARB) were taken directly from ARB_vertex_program.
4007
4008Dependencies on ARB_fragment_program
4009
4010    If ARB_fragment_program is not supported, the maximum number of executable
4011    instructions in any !!FP1.0 program is 1024.  If ARB_fragment_program is
4012    supported, the maximum number of executable instructions for an !!FP1.0
4013    program is at least 1024, but can be larger.  The limit can be queried by
4014    calling GetProgramivARB with <target> set to FRAGMENT_PROGRAM_ARB and
4015    <pname> set to MAX_PROGRAM_INSTRUCTIONS_ARB.
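
    An application might select the limit along the following lines.  This
    is a sketch only: the function name is hypothetical, the caller is
    assumed to have already checked the extension string and (when
    ARB_fragment_program is present) queried MAX_PROGRAM_INSTRUCTIONS_ARB,
    and the defensive clamp to 1024 is an assumption rather than something
    this specification requires.

```c
/* Hard-wired limit that applies when ARB_fragment_program is absent. */
#define FP10_DEFAULT_MAX_INSTRUCTIONS 1024

/* Hypothetical helper: returns the instruction limit to assume for an
 * !!FP1.0 program.  <queried_limit> is ignored when
 * <has_arb_fragment_program> is zero. */
static int fp10_max_instructions(int has_arb_fragment_program,
                                 int queried_limit)
{
    if (!has_arb_fragment_program)
        return FP10_DEFAULT_MAX_INSTRUCTIONS;

    /* The limit is specified to be at least 1024; clamping guards
     * against a bogus query result (an assumption, not spec behavior). */
    if (queried_limit < FP10_DEFAULT_MAX_INSTRUCTIONS)
        return FP10_DEFAULT_MAX_INSTRUCTIONS;
    return queried_limit;
}
```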
4016
4017
4018GLX Protocol
4019
4020    Most of the GLX protocol needed to implement this extension is described
4021    in the GL_NV_vertex_program extension specification and will not be
4022    repeated here.
4023
4024    The following two rendering commands are potentially large, and hence can
4025    be sent in a glXRender or glXRenderLarge request.
4026
4027        ProgramNamedParameter4fvNV
4028            2           28+len+p        rendering command length
4029            2           4218            rendering command opcode
4030            4           CARD32          id
4031            4           CARD32          len
4032            4           FLOAT32         params[0]
4033            4           FLOAT32         params[1]
4034            4           FLOAT32         params[2]
4035            4           FLOAT32         params[3]
4036            len         LISTofCARD8     name
4037            p                           unused, p=pad(len)
4038
4039         If the command is encoded in a glXRenderLarge request, the command
4040         opcode and command length fields above are expanded to 4 bytes each:
4041
4042            4           32+len+p        rendering command length
4043            4           4218            rendering command opcode
4044
4045
4046        ProgramNamedParameter4dvNV
4047            2           44+len+p        rendering command length
4048            2           4219            rendering command opcode
4049            4           CARD32          id
4050            4           CARD32          len
4051            8           FLOAT64         params[0]
4052            8           FLOAT64         params[1]
4053            8           FLOAT64         params[2]
4054            8           FLOAT64         params[3]
4055            len         LISTofCARD8     name
4056            p                           unused, p=pad(len)
4057
4058         If the command is encoded in a glXRenderLarge request, the command
4059         opcode and command length fields above are expanded to 4 bytes each:
4060
4061            4           48+len+p        rendering command length
4062            4           4219            rendering command opcode
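
    The command lengths listed above can be computed as in the following
    sketch.  The function names are hypothetical; pad(len) is taken to be
    the usual X protocol definition, the number of bytes needed to round
    <len> up to a multiple of four.

```c
#include <stddef.h>

/* Number of unused bytes that round <len> up to a multiple of 4. */
static size_t glx_pad(size_t len)
{
    return (4 - (len & 3)) & 3;
}

/* Hypothetical helper: rendering command length in bytes for
 * ProgramNamedParameter4fvNV (is_double == 0) or
 * ProgramNamedParameter4dvNV (is_double != 0).  In a glXRenderLarge
 * request the length and opcode fields grow from 2 to 4 bytes each,
 * adding 4 bytes in total. */
static size_t named_param4_cmd_length(size_t len, int is_double,
                                      int is_large)
{
    size_t fixed = is_double ? 44 : 28;   /* header + id + len + params */
    if (is_large)
        fixed += 4;
    return fixed + len + glx_pad(len);
}
```

    For a 5-byte name, p is 3, so ProgramNamedParameter4fvNV occupies 36
    bytes in a glXRender request and 40 bytes in a glXRenderLarge request.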
4063
4064
4065    The remaining two commands are non-rendering commands.  These commands are
4066    sent separately (i.e., not as part of a glXRender or glXRenderLarge
4067    request), using the glXVendorPrivateWithReply request:
4068
4069        GetProgramNamedParameterfvNV
4070            1           CARD8           opcode (X assigned)
4071            1           17              GLX opcode (glXVendorPrivateWithReply)
4072            2           4+(len+p)/4     request length
4073            4           1310            vendor specific opcode
4074            4           GLX_CONTEXT_TAG context tag
4075            4           INT32           len
4076            len         LISTofCARD8     name
4077            p                           unused, p=pad(len)
4078          =>
4079
4080          If the command succeeds, 4 floats are sent in the reply:
4081
4082            1           1               reply
4083            1                           unused
4084            2           CARD16          sequence number
4085            4           4               reply length
4086            24                          unused
4087            16          LISTofFLOAT32   params
4088
4089          Otherwise, an empty reply is sent, indicating that a GL error
4090          occurred:
4091
4092            1           1               reply
4093            1                           unused
4094            2           CARD16          sequence number
4095            4           0               reply length
4096            24                          unused
4097
4098
4099        GetProgramNamedParameterdvNV
4100            1           CARD8           opcode (X assigned)
4101            1           17              GLX opcode (glXVendorPrivateWithReply)
4102            2           4+(len+p)/4     request length
4103            4           1311            vendor specific opcode
4104            4           GLX_CONTEXT_TAG context tag
4105            4           INT32           len
4106            len         LISTofCARD8     name
4107            p                           unused, p=pad(len)
4108          =>
4109
4110          If the command succeeds, 4 doubles are sent in the reply:
4111
4112            1           1               reply
4113            1                           unused
4114            2           CARD16          sequence number
4115            4           8               reply length
4116            24                          unused
4117            32          LISTofFLOAT64   params
4118
4119          Otherwise, an empty reply is sent, indicating that a GL error
4120          occurred:
4121
4122            1           1               reply
4123            1                           unused
4124            2           CARD16          sequence number
4125            4           0               reply length
4126            24                          unused
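
    The request and reply sizes for the two queries follow directly from
    the layouts above, as the sketch below shows.  The function names are
    hypothetical.  Request and reply lengths are expressed in 4-byte units,
    and the reply length counts only the data that follows the fixed
    32-byte reply header.

```c
#include <stddef.h>

/* Number of unused bytes that round <len> up to a multiple of 4. */
static size_t pad4(size_t len)
{
    return (4 - (len & 3)) & 3;
}

/* Hypothetical helper: "request length" field (in 4-byte units) for
 * GetProgramNamedParameter{fd}vNV, following the layout listed above. */
static size_t get_named_param_request_length(size_t len)
{
    return 4 + (len + pad4(len)) / 4;
}

/* Hypothetical helper: "reply length" field (in 4-byte units).  A
 * successful reply carries 4 floats (16 bytes) or 4 doubles (32 bytes);
 * a failed query carries no data. */
static size_t get_named_param_reply_length(int is_double, int succeeded)
{
    if (!succeeded)
        return 0;
    return is_double ? 8 : 4;
}

/* Total reply size in bytes: 32-byte header plus the reply data. */
static size_t get_named_param_reply_bytes(int is_double, int succeeded)
{
    return 32 + 4 * get_named_param_reply_length(is_double, succeeded);
}
```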
4127
4128
4129Errors
4130
4131    INVALID_OPERATION is generated by Begin, DrawPixels, Bitmap, CopyPixels,
4132    or a command that performs an explicit Begin if FRAGMENT_PROGRAM_NV is
4133    enabled and the currently bound fragment program does not exist.
4134
4135    INVALID_OPERATION is generated by ProgramNamedParameter4fNV,
4136    ProgramNamedParameter4dNV, ProgramNamedParameter4fvNV,
4137    ProgramNamedParameter4dvNV, GetProgramNamedParameterfvNV, or
4138    GetProgramNamedParameterdvNV if <id> specifies a nonexistent program or a
4139    program whose type does not support local parameters.
4140
4141    INVALID_VALUE is generated by ProgramNamedParameter4fNV,
4142    ProgramNamedParameter4dNV, ProgramNamedParameter4fvNV,
4143    ProgramNamedParameter4dvNV, GetProgramNamedParameterfvNV, or
4144    GetProgramNamedParameterdvNV if <len> is zero.
4145
4146    INVALID_VALUE is generated by ProgramNamedParameter4fNV,
4147    ProgramNamedParameter4dNV, ProgramNamedParameter4fvNV,
4148    ProgramNamedParameter4dvNV, GetProgramNamedParameterfvNV, or
4149    GetProgramNamedParameterdvNV if <name> does not specify the name of a
4150    local parameter in the program corresponding to <id>.
4151
4152    INVALID_OPERATION is generated by any command accessing texture coordinate
4153    processing state if the texture unit number corresponding to the current
4154    value of ACTIVE_TEXTURE_ARB is greater than or equal to the implementation
4155    dependent constant MAX_TEXTURE_COORDS_NV.
4156
4157    INVALID_OPERATION is generated by any command accessing texture image
4158    processing state if the texture unit number corresponding to the current
4159    value of ACTIVE_TEXTURE_ARB is greater than or equal to the implementation
4160    dependent constant MAX_TEXTURE_IMAGE_UNITS_NV.
4161
4162
4163    (The following are error descriptions copied from GL_NV_vertex_program
4164     that apply to this extension as well.  These modifications do not affect
4165     the behavior of that extension.)
4166
4167    INVALID_VALUE is generated by LoadProgramNV if id is zero.
4168
4169    INVALID_OPERATION is generated by LoadProgramNV if the program
4170    corresponding to id is currently loaded but has a program type different
4171    from that given by target.
4172
4173    INVALID_OPERATION is generated by LoadProgramNV if the program specified
4174    is syntactically incorrect for the program type specified by target.  The
4175    value of PROGRAM_ERROR_POSITION_NV is still updated when this error is
4176    generated.
4177
4178    INVALID_OPERATION is generated by LoadProgramNV if the program specified
4179    fails to conform to any of the semantic restrictions imposed on programs
4180    of the type specified by target.  The value of PROGRAM_ERROR_POSITION_NV
4181    is still updated when this error is generated.
4182
4183    INVALID_OPERATION is generated by BindProgramNV if target does not match
4184    the type of the program named by id.
4185
4186    INVALID_VALUE is generated by AreProgramsResidentNV if any of the queried
4187    programs are zero or do not exist.
4188
4189    INVALID_OPERATION is generated by GetProgramivNV or GetProgramStringNV if
4190    the program named id does not exist.
4191
4192
4193New State
4194
4195Get Value                          Type  Get Command              Initial Value  Description         Section   Attribute
4196---------------------------------  ----  -----------------------  -------------  ------------------  --------  ------------
4197FRAGMENT_PROGRAM_NV                B     IsEnabled                FALSE          fragment program    3.11      enable
4198                                                                                 mode enable
4199FRAGMENT_PROGRAM_BINDING_NV        Z+    GetIntegerv              0              bound fragment      5.7       -
4200                                                                                 program
4201
4202Table X.6.  New State Introduced by NV_fragment_program.
4203
4204
4205Get Value                  Type    Get Command          Initial Value  Description         Section   Attribute
4206-------------------------  ------  ------------------   -------------  ------------------  --------  ---------
4207PROGRAM_ERROR_POSITION_NV  Z       GetIntegerv          -1             program error       5.7       -
4208                                                                       position
4209PROGRAM_TARGET_NV          Z2      GetProgramivNV       0              program target      6.1.13    -
4210PROGRAM_LENGTH_NV          Z+      GetProgramivNV       0              program length      6.1.13    -
4211PROGRAM_RESIDENT_NV        Z2      GetProgramivNV       FALSE          program residency   6.1.13    -
4212PROGRAM_STRING_NV          ubxn    GetProgramStringNV   ""             program string      6.1.13    -
4213-                          nxR4    GetProgramNamed-     (0,0,0,0)      named program local 5.7       -
4214                                   ParameterNV                         parameter value
4215-                          64+xR4  GetProgramLocal-     (0,0,0,0)      numbered program    5.7       -
4216                                   ParameterARB                        local parameter
4217
4218Table X.7.  Program Object State common to NV_vertex_program and NV_fragment_program.
4219
4220
4221Get Value    Type    Get Command   Initial Value  Description               Section   Attribute
4222---------    ------  -----------   -------------  -----------------------   --------  ---------
4223-            12xR4   -             fragment data  fragment attribute        3.11.1.1  -
4224                                                  registers
4225-            16xR4   -             (0,0,0,0)      fp32 temporary registers  3.11.1.2  -
4226-            32xR4   -             (0,0,0,0)      fp16 temporary registers  3.11.1.2  -
4227             (Z_4)4  -             (EQ,EQ,EQ,EQ)  condition code register   3.11.1.4  -
4228                                                  address register
4229
4230Table X.8.  Fragment Program Per-Fragment Execution State.
4231
4232
4233New Implementation Dependent State
4234
4235                                                 Minimum
4236Get Value                   Type   Get Command    Value       Description    Section  Attribute
4237---------                   ----   -----------   -------  -----------------  -------  ---------
4238MAX_TEXTURE_COORDS_NV       Z+     GetIntegerv      2     number of texture  2.6      -
4239                                                          coordinate sets
4240                                                          supported
4241MAX_TEXTURE_IMAGE_UNITS_NV  Z+     GetIntegerv      2     number of texture  2.10.2   -
4242                                                          image units
4243                                                          supported
4244MAX_FRAGMENT_PROGRAM_       Z+     GetIntegerv     64     number of numbered 3.11.7   -
4245  LOCAL_PARAMETERS_NV                                     local parameters
4246                                                          supported
4247
4248
4249Revision History
4250
4251    Rev.    Date    Author   Changes
4252    ----  -------- --------  --------------------------------------------
4253     73   05/23/05  pbrown   Fixed cut-and-paste error in the dependency
4254                             section where it said "NV_texture_rectangle"
4255                             instead of "ARB_texture_cube_map".
4256
4257     72   05/16/04  pbrown   Documented that it's not possible to get
4258                             results from LG2 that are any more precise than
4259                             what is available in the fp32 storage format.
4260
4261     71   04/23/04  pbrown   Fixed incorrect example.
4262
4263     70   03/20/03  pbrown   Made the instruction count limit for !!FP1.0
4264                             programs queryable instead of a hard-wired value
4265                             of 1024.  The limit can be queried using
4266                             ARB_fragment_program mechanisms, and remains 1024
4267                             if ARB_fragment_program is unsupported.
4268
4269     69   02/01/03  pbrown   Removed support for combiner fragment programs
4270                             (!!FCP1.0).
4271
4272     68   01/08/03  pbrown   Correct spec language providing examples of NaNs,
4273                             such as sqrt(-1) or log(-1).  Division by zero
4274                             produces an infinity, not a NaN.
4275
4276     67   12/23/02  pbrown   Fix incorrect syntax of examples of "KIL"
4277                             instruction. The condition code test is not
4278                             parenthesized in KIL.
4279
4280     66   10/31/02  pbrown   Cleaned up special cases of POW, including the
4281                             fact that "POW dst, 0, 0" produces NaN in this
4282                             spec, not 1.0.
4283
4284     65   10/28/02  pbrown   Documented that signed HILO textures will have
4285                             the hemisphere remapping applied, but unsigned
4286                             textures will not.
4287
4288     64   09/17/02  pbrown   Minor typo fixes.
4289
4290     63   08/14/02  pbrown   Clarified the value of the "other" components
4291                             of f[FOGC].
4292
4293     62   07/24/02  pbrown   Removed PK4UBG and UP4UBG instructions.
4294                             Simplified the implementation of the temporary
4295                             and output register limit for combiner
4296                             programs by counting all four o[TEXn] registers
4297                             against the limit, whether or not they are
4298                             written.
4299
4300     61   07/19/02  pbrown   Renamed ProgramLocalParameter*NV to
4301                             ProgramNamedParameter*NV to eliminate naming
4302                             conflicts with ARB_vertex_program (and presumably
4303                             ARB_fragment_program).
4304
4305                             Added support for numbered program local
4306                             parameters for compatibility with the ARB vertex
4307                             program extension (and upcoming ARB fragment
4308                             program extension), so it's possible to set local
4309                             parameters the same way in both extensions.
4310
4311                             Eliminated the language describing "register
4312                             slots" and how the "H" and "R" registers overlap.
4313                             Instead, registers are guaranteed not to overlap,
4314                             and a semantic limit is added on the number of
4315                             temporaries and output registers that can be used
4316                             by a program.
4317
4318                             Eliminated the requirement that non-combiner
4319                             programs actually write a color value; the only
4320                             requirement is that one output register be
4321                             written.  When using fragment programs that use
4322                             depth replacement, there may not be a need to
4323                             compute color if color writes are currently
4324                             disabled.
4325
4326                             Cleaned up the issues section.  Added several
4327                             examples of fragment program operation.
4328
4329                             Cleaned up GLX protocol.
4330
4331     59   07/07/02  pbrown   Minor clarifications of texture lookup handling.
4332                             Documented that DDX and DDY may not always
4333                             produce infinities.
4334
4335     58   06/27/02  pbrown   Added clarification that instructions can use the
4336                             same attribute or parameter register more than
4337                             once.  Added support for "X" precision on the
4338                             "set on" instructions.  Removed "X" precision
4339                             support from DST.
4340
4341     57   06/27/02  pbrown   Added missing table entries covering the use of
4342                             floating-point textures.
4343
4344     56   06/27/02  pbrown   Modified the spec to indicate that depth textures
4345                             are treated as alpha, luminance, or intensity
4346                             according to the depth texture mode in ARB_shadow.
4347
4348     55   06/26/02  pbrown   Corrected the aliased register number and
4349                             "read-only" mappings for o[DEPR] in combiner
4350                             programs.
4351
4352     54   06/05/02  pbrown   Fixed the spec to indicate that near and far
4353                             frustum clipping is disabled for depth
4354                             replacement programs.  Fixed the spec to indicate
4355                             that the register combiners enable is overridden
4356                             for fragment programs (enabled for combiner
4357                             programs, disabled for color programs).
4358
4359     53   05/20/02  pbrown   Miscellaneous bug fixes for wording and
4360                             special-case handling errors.
4361
4362     52   05/16/02  pbrown   Added "_SAT" suffix to clamp result vector
4363                             components to [0,1].  Fixed special case rules
4364                             for MUL instruction and the "UN" condition code.
4365
4366     50   04/19/02  pbrown   Added "$" as a legal character in an identifier
4367                             name.  Added example for fixed and conditional
4368                             write masks and condition code updates.
4369
4370     49   04/16/02  pbrown   Added new query of PROGRAM_ERROR_STRING_NV to
4371                             return more detailed information on program load
4372                             failures.
4373
4374     48   04/02/02  pbrown   Added missing enum value for the
4375                             FRAGMENT_PROGRAM_BINDING_NV query.
4376
4377     47   03/15/02  pbrown   Fixed various typos, and an incorrect description
4378                             of the MAX operation.
4379
4380     45   01/31/02  pbrown   Renamed the packing and unpacking opcodes to more
4381                             closely match OpenGL data type naming conventions
4382                             (PK2 becomes PK2H, PK16 becomes PK2US, PK4
4383                             becomes PK4B, PKB becomes PK4UB).  Renamed "BEM"
4384                             instruction to "X2D" to reflect the fact that it
4385                             does a 2D coordinate transformation (not just a
4386                             bump mapping operation).  Added PK4UBG and UP4UBG
4387                             instructions to support sRGB gamma correction
4388                             when packing and unpacking components.
4389
4390     44   01/18/02  pbrown   Double the number of available temporaries (16 to
4391                             32 fp32 vectors).  Add BEM (texture coordinate
4392                             offset), PKB/UPB (unsigned byte packing), and
4393                             PK16/UP16 (unsigned short packing) instructions.
4394
4395     43   01/04/02  pbrown   Documented special cases for comparisons,
4396                             including the handling of NaN in the SNE
4397                             instruction. Added automatic generation of a
4398                             third normal component for HILO textures.
4399                             Documented the restriction that RFL can't write
4400                             to the w component of the result.  Trivial fix of
4401                             the special-cases for RCP.  Fixed minor typo on
4402                             the TEX instruction.
4403
4404     40   11/26/01  pbrown   Eliminated "X" precision specifier on those
4405                             instructions that do complicated math or don't
4406                             otherwise need it (e.g., "SGE").  Fixed special
4407                             case math on LG2 instruction.  Eliminated
4408                             incorrectly specified exponent clamping on LIT
4409                             instruction.  Fixed description and special-case
4410                             math on LIT/POW instructions.  Specified that
4411                             combiner program outputs are clamped to [-1,+1],
4412                             not [+0,+1].
4413
4414     39   11/16/01  pbrown   Added semantic restriction that PK2/PK4 must
4415                             write to a 32-bit register.  Cleaned up the
4416                             converse restrictions on UP2/UP4, making sure to
4417                             allow UP2/UP4 from a program parameter.  Fix
4418                             section numberings and a few typos.
4419
4420     36   11/07/01  pbrown   Cleaned up explanation of the "negative q is
4421                             undefined" for texture mapping spec restriction.
4422                             Fixed a nit on the number of condition code
4423                             values (now 4 with UN - unordered).
4424
4425     35   10/29/01  pbrown   Add a SUB instruction for programmer
4426                             convenience. Moved unresolved issue list back to
4427                             the "Issues" section.  Fix several minor wording
4428                             issues.  Clarify register combiners/texture
4429                             shader/fragment program flow control diagram.
4430
4431     32   10/19/01  pbrown   Document the fragment program restriction that
4432                             instructions involving f[FOGC] and f[TEX0-TEX7]
4433                             are always carried out at fp32 precision.
4434
4435     31   10/19/01  pbrown   Fixed incorrect description of encoding of fp16
4436                             denorms.
4437
4438     30   10/12/01  pbrown   Documented (0,0,0,0) local parameter
4439                             initialization.  Disallow multiple defines of the
4440                             same token.  Allow tokens that look like a
4441                             possible register or texture name, but have
4442                             numbers that are too big (e.g., "TEX24", "R37").
4443                             Fixed up several grammar bugs.  Documented that
4444                             LG2 and RSQ now do not automatically take
4445                             absolute values, plus new math special cases.
4446