Name

    NV_fragment_program

Name Strings

    GL_NV_fragment_program

Contact

    Pat Brown, NVIDIA Corporation (pbrown 'at' nvidia.com)
    Mark J. Kilgard, NVIDIA Corporation (mjk 'at' nvidia.com)

Notice

    Copyright NVIDIA Corporation, 2001-2002.

IP Status

    NVIDIA Proprietary.

Status

    Implemented in CineFX (NV30) Emulation driver, August 2002.
    Shipping in Release 40 NVIDIA driver for CineFX hardware, January 2003.

Version

    Last Modified Date: 2005/05/24
    NVIDIA Revision: 73

Number

    282

Dependencies

    Written based on the wording of the OpenGL 1.2.1 specification and
    requires OpenGL 1.2.1.

    Requires support for the ARB_multitexture extension with at least
    two texture units.

    NV_vertex_program affects the definition of this extension. The only
    dependency is that both extensions use the same mechanisms for defining
    and binding programs.

    NV_texture_shader trivially affects the definition of this extension.

    NV_texture_rectangle trivially affects the definition of this extension.

    ARB_texture_cube_map trivially affects the definition of this extension.

    EXT_fog_coord trivially affects the definition of this extension.

    NV_depth_clamp affects the definition of this extension.

    ARB_depth_texture and SGIX_depth_texture affect the definition of this
    extension.

    NV_float_buffer affects the definition of this extension.

    ARB_vertex_program affects the definition of this extension.

    ARB_fragment_program affects the definition of this extension.

Overview

    OpenGL mandates a certain set of configurable per-fragment computations
    defining texture lookup, texture environment, color sum, and fog
    operations. Each of these areas provides a useful but limited set of
    fixed operations. For example, unextended OpenGL 1.2.1 provides only four
    texture environment modes, color sum, and three fog modes.
    Many OpenGL extensions have either improved existing functionality or
    introduced new configurable fragment operations. While these extensions
    have enabled new and interesting rendering effects, the set of effects is
    limited by the set of special modes introduced by the extension. This lack
    of flexibility is in contrast to the high level of programmability of
    general-purpose CPUs and other (frequently software-based) shading
    languages. The purpose of this extension is to expose to the OpenGL
    application writer an unprecedented degree of programmability in the
    computation of final fragment colors and depth values.

    This extension provides a mechanism for defining fragment program
    instruction sequences for application-defined fragment programs. When in
    fragment program mode, a program is executed each time a fragment is
    produced by rasterization. The inputs for the program are the attributes
    (position, colors, texture coordinates) associated with the fragment and a
    set of constant registers. A fragment program can perform mathematical
    computations and texture lookups using arbitrary texture coordinates. The
    results of a fragment program are new color and depth values for the
    fragment.

    This extension defines a programming model including a 4-component vector
    instruction set, 16- and 32-bit floating-point data types, and a
    relatively large set of temporary registers. The programming model also
    includes a condition code vector which can be used to mask register writes
    at run-time or kill fragments altogether. The syntax, program
    instructions, and general semantics are similar to those in the
    NV_vertex_program and NV_vertex_program2 extensions, which provide for the
    execution of an arbitrary program each time the GL receives a vertex.

    The fragment program execution environment is designed for efficient
    hardware implementation and to support a wide variety of programs.
    By design, the entire set of existing fragment programs defined by
    existing OpenGL per-fragment computation extensions can be implemented
    using the extension's programming model.

    The fragment program execution environment accesses textures via
    arbitrarily computed texture coordinates. As such, there is no necessary
    correspondence between the texture coordinates and texture maps previously
    lumped into a single "texture unit". This extension separates the notions
    of "texture coordinate sets" and "texture image units" (texture maps and
    associated parameters), allowing implementations with a different number
    of each. The initial implementation of this extension will support 8
    texture coordinate sets and 16 texture image units.

Issues

    What limitations exist in this extension?

      RESOLVED: Very few. Programs can not exceed a maximum program length
      (which is no less than 1024 instructions), and can use no more than
      32-64 temporary registers. Programs can not access more than one
      fragment attribute or program parameter (constant) per instruction,
      but can work around this restriction using temporaries. The number of
      textures that can be used by a program is limited to the number of
      texture image units provided by the implementation (16 in the initial
      implementation of this extension).

      These limits are fairly high. Additionally, there is no limit on the
      total number of texture lookups that can be performed by a program.
      There is no limit on the length of a texture dependency chain -- one
      can write a program that performs over 1000 consecutive dependent
      texture lookups. There are no restrictions on dependencies between
      texture mapping instructions and arithmetic instructions. Texture
      lookups can be performed using arbitrarily computed texture
      coordinates.
      Applications can carry out their calculations with full
      32-bit single precision, although two lower-precision modes are also
      available.

    How does texture mapping work with fragment programs?

      RESOLVED: This extension provides three instructions used to perform
      texture lookups.

      The "TEX" instruction performs a lookup with the (s,t,r) values taken
      from an interpolated texture coordinate, an arbitrarily computed
      vector, or even a program constant. The "TXP" instruction performs a
      similar lookup, except that it uses the fourth component of the source
      vector to perform a perspective divide, using (s/q, t/q, r/q). In
      both cases, the GL will automatically compute partial derivatives used
      for filter and LOD selection.

      The "TXD" instruction operates like "TEX", except that it allows the
      program to explicitly specify two additional vectors containing the
      partial derivatives of the texture coordinate with respect to x and y
      window coordinates.

      All three instructions write a filtered texel value to a temporary or
      output register. Other than the computation of texture coordinates
      and partial derivatives, texture lookups are not performed any
      differently in fragment program mode. In particular, any applicable LOD
      biases, wrap modes, minification and magnification filters, and
      anisotropic filtering controls are still applied in fragment program
      mode.

      The results of the texture lookup are available to be used arbitrarily
      by subsequent fragment program instructions. Fragment programs are
      allowed to access any texture map arbitrarily many times.

    Can fragment programs be used to compute depth values?

      RESOLVED: Yes. A fragment program can perform arbitrary
      computations to compute a final value for the fragment, which it
      should write to the "z" component of the o[DEPR] register.
      The "z" value written should be in the range [0,1], regardless of the
      size of the depth buffer.

      To assist in the computation of the final Z value, a fragment program
      can access the interpolated depth of the fragment (prior to any
      displacement) by reading the "z" component of the f[WPOS] attribute
      register.

    How should near and far plane clipping work in fragment program mode if
    the current fragment program computes a depth value?

      RESOLVED: Geometric clipping to the near and far clip plane should be
      disabled. Clipping should be done based on the depth values computed
      per-fragment. The rationale is that per-fragment depth displacement
      operations may effectively move portions of a primitive initially
      outside the clip volume inside, and vice versa.

      Note that under the NV_depth_clamp extension, geometric clipping to
      the near and far clip planes is also disabled, and the fragment depth
      values are clamped to the depth range. If depth clamp mode is enabled
      when using a fragment program that computes a depth value, the
      computed depth value will be clamped to the depth range.

    Should fragment programs be allowed to use multiple precisions for
    operands and operations?

      RESOLVED: Yes. Low-precision operands are generally adequate for
      representing colors. Allowing low-precision registers also allows for
      a larger number of temporary registers (at lower precision).
      Low-precision operations also provide the opportunity for a higher
      level of performance.

      Applications are free to use only high-precision operations or mix
      high- and low-precision operations as necessary.

    What levels of precision are supported in arithmetic operations?

      RESOLVED: Arithmetic operations can be performed at three different
      precisions.
      32-bit floating point precision (fp32) uses the IEEE
      single-precision standard with a sign bit, 8 exponent bits, and 23
      mantissa bits. 16-bit floating-point precision (fp16) uses a similar
      floating-point representation, but with 5 exponent bits and 10
      mantissa bits. Additionally, many arithmetic operations can also be
      carried out at 12-bit fixed point precision (fx12), where values in
      the range [-2,+2) are represented as signed values with 10 fraction
      bits.

    How should the precision with which operations are carried out be
    specified? Should we infer the precision from the types of the operands
    or result vectors? Or should it be an attribute of the instruction?

      RESOLVED: Applications can optionally specify the precision of
      individual instructions by adding a suffix of "R", "H", or "X" to
      instruction names to select fp32, fp16, or fx12 precision,
      respectively.

      By default, instructions will be carried out using the precision of
      the destination register. Always inferring the precision from the
      operands has a number of issues. First, there are a number of
      operations (e.g., TEX/TXP/TXD) where the result type has little to no
      correspondence to the type of the operands. In these cases, precision
      suffixes are not supported. Second, one could have instructions
      automatically cast operands and compute results using the type of the
      highest precision operand or result. This behavior would be
      problematic since all fragment attribute registers and program
      parameters are kept at full precision, but full precision may not be
      needed by the operation.

      The choice of precision level allows programs to trade off precision
      for potentially higher performance. Giving the program explicit
      control over the precision also allows it to dictate precision
      explicitly and eliminate any uncertainty over type casting.
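
      As a rough illustration of the fx12 format described above, the
      following C sketch models an fx12 value as a signed integer code with
      10 fraction bits, so that code/1024 covers [-2,+2). The helper names
      (fx12_encode, fx12_decode) and the FX12_* macros are hypothetical and
      are not part of this extension. Clamping out-of-range values and
      mapping NaN to 0.0 follow the resolutions given elsewhere in this
      Issues section; the round-to-nearest rule is an assumption, since the
      specification allows either rounding or truncation.

```c
/* Sketch only: fx12 modeled as a signed 12-bit integer code with 10
   fraction bits.  All names here are hypothetical illustration helpers. */

#define FX12_ONE   1024      /* 2^10: the integer code for 1.0          */
#define FX12_MIN  (-2048)    /* integer code for -2.0                   */
#define FX12_MAX    2047     /* integer code for 2.0 - 1/1024           */

/* Convert a float to the nearest fx12 integer code. */
static int fx12_encode(float v)
{
    float scaled;

    if (v != v)              /* NaN compares unequal to itself          */
        return 0;            /* NaN maps to 0.0                         */

    scaled = v * (float)FX12_ONE;

    /* Overflow clamps to the nearest representable value. */
    if (scaled <= (float)FX12_MIN) return FX12_MIN;
    if (scaled >= (float)FX12_MAX) return FX12_MAX;

    /* Assumed rounding rule: nearest, with halves away from zero. */
    return (int)(scaled + (scaled >= 0.0f ? 0.5f : -0.5f));
}

/* Convert an fx12 integer code back to a float. */
static float fx12_decode(int code)
{
    return (float)code / (float)FX12_ONE;
}
```

      Under these assumptions, the smallest step between adjacent fx12
      values is 1/1024, and any input at or above 2.0 decodes to
      2047/1024.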

    For instructions whose specified precision is different than the precision
    of the operands or the result registers, how are the operations performed?
    How are the condition codes updated?

      RESOLVED: Operations are performed with operands and results at the
      precision specified by the instruction. After the operation is
      complete, the result is converted to the precision of the destination
      register, after which the condition code is generated.

      In an alternate approach, the condition code could be generated from
      the result. However, in some cases, the register contents would not
      match the condition code. In such cases, it may not be reliable to
      use the condition code to prevent division by zero or other special
      cases.

    How does this extension interact with the ARB_multisample extension? In
    the ARB_multisample extension, each fragment has multiple depth values.
    In this extension, a single interpolated depth value may be modified by a
    fragment program.

      RESOLVED: The depth values for the extra samples are generated by
      computing partials of the computed depth value and using these
      partials to derive the depth values for each of the extra samples.

    How does this extension interact with polygon offset? Both extensions
    modify fragment depth values.

      RESOLVED: As in the base OpenGL spec, the depth offset generated by
      polygon offset is added during polygon rasterization. The depth value
      provided to programs in f[WPOS].z already includes polygon offset, if
      enabled. If the depth value is replaced by a fragment program, the
      polygon offset value will NOT be recomputed and added back after
      program execution.

      This is probably not desirable for fragment programs that modify depth
      values since the partials used to generate the offset may not match
      the partials of the computed depth value.
      Polygon offset for filled
      polygons can be approximated in a fragment program using the depth
      partials obtained by the DDX and DDY instructions. This will not work
      properly for line- and point-mode polygons, since the partials used
      for offset are computed over the polygon, while the partials resulting
      from the DDX and DDY instructions are computed along the line (or are
      zero for point-mode polygons). In addition, separate treatment of
      points, line segments, and polygons is not possible in a fragment
      program.

    Should depth component replacement be a property of the fragment program
    or a separate enable?

      RESOLVED: It should be a program property. Using the output register
      notation simplifies matters: depth components are replaced if and
      only if the DEPR register is written to. This alleviates the
      application and driver burden of maintaining separate state.

    How does this extension affect the handling of q texture coordinates in
    the OpenGL spec?

      RESOLVED: Fragment programs are allowed to access an associated q
      texture coordinate, so this attribute must be produced by
      rasterization. In unextended OpenGL 1.2, the q coordinate is
      eliminated in the rasterization portions of the spec after dividing
      each of s, t, and r by it. This extension updates the specification
      to pass q coordinates through at least to conventional texture
      mapping. When fragment program mode is disabled, q coordinates will
      be eliminated there in an identical manner. This modification has the
      added benefit of simplifying the equations used for attribute
      interpolation.

    How should clip w coordinates be handled by this extension?

      RESOLVED: Fragment programs are allowed to access the reciprocal of
      the clip w coordinate, so this attribute must be produced by
      rasterization.
      The OpenGL 1.2 spec doesn't explicitly enumerate the
      attributes associated with the fragment, but we add treatment of the w
      clip coordinate in the appropriate locations.

      The reciprocal of the clip w coordinate in traditional graphics
      hardware is produced by screen-space linear interpolation of the
      reciprocals of the clip w coordinates of the vertices. However, this
      spec says the clip w coordinate is produced by perspective-correct
      interpolation of the (non-reciprocated) clip w vertex coordinates.
      These two formulations turn out to be equivalent, and the latter is
      more convenient since the core OpenGL spec already contains formulas
      for perspective-correct interpolation of vertex attributes.

    What is produced by the TEX/TXP/TXD instructions if the requested texture
    image is inconsistent?

      RESOLVED: The result vector is specified to be (0,0,0,0). This
      behavior is consistent with the NV_texture_shader extension. Note
      that, as in NV_texture_shader, these instructions ignore the standard
      hierarchy of texture enables and programs can access textures that are
      not specifically "enabled".

    Should a minimum precision be specified for certain fragment attribute
    registers (in particular COL0, COL1) that may not be generated with full
    fp32 precision?

      RESOLVED: No. It is expected that the precision of COL0/COL1 should
      generally be at least as high as that of the frame buffer.

    Fragment color components (f[COL0] and f[COL1]) are generally
    low-precision fixed-point values in the range [0,1]. Is it possible to
    pass unclamped or high-precision color components to fragment programs?

      RESOLVED: Yes, although you can't exactly call them "colors".
      High-precision per-vertex color values can be written into any unused
      texture coordinate set, either via a MultiTexCoord call or using a
      vertex program.
      These "texture coordinates" will be interpolated
      during rasterization, and can be used arbitrarily by a fragment
      program.

      In particular, there is no requirement that per-fragment attributes
      called "texture coordinates" be used for texture mapping.

    Should this specification guarantee that temporary registers are
    initialized to zero?

      RESOLVED: Yes. This will allow for the modular construction of
      programs that accumulate results in registers. For example,
      per-fragment lighting may use MAD instructions to accumulate color
      contributions at each light. Without zero-initialization, the program
      would require an explicit MOV instruction to load 0 or the use of the
      MUL instruction for the first light.

    Should this specification support Unicode program strings?

      RESOLVED: Not necessary.

    Programs defined by NV_vertex_program begin with "!!VP1.0". Should
    fragment programs have a similar identifier?

      RESOLVED: Yes, "!!FP1.0", identifying the first revision of this
      fragment program language.

    Should per-fragment attributes have equivalent integer names in the
    program language, as per-vertex attributes do in NV_vertex_program?

      RESOLVED: No. In NV_vertex_program, "generic" vertex attributes
      could be specified directly by an application using only an attribute
      number. Those numbers may have no necessary correlation with the
      conventional attribute names, although conventional vertex attributes
      are mapped to attribute numbers. However, conventional attributes are
      the only outputs of vertex programs and of rasterization. Therefore,
      there is no need for a similar input-by-number functionality for
      fragment programs.

    Should we provide the ability to issue instructions that do not update
    temporary or output registers?

      RESOLVED: Yes.
      Programs may issue instructions whose only purpose is
      to update the condition code register, and requiring such instructions
      to write to a temporary may require the use of an additional temporary
      and/or defeat possible program optimizations. We accomplish this by
      adding two write-only temporary pseudo-registers ("RC" and "HC") that
      can be specified as destination registers.

    Do the packing and unpacking instructions in this extension make any
    sense?

      RESOLVED: Yes. They are useful for packing and unpacking multiple
      components in a single channel of a floating-point frame buffer. For
      example, a 128-bit "RGBA" frame buffer could pack 16 8-bit quantities
      or 8 16-bit quantities, all of which could be used in later
      rasterization passes. See the NV_float_buffer extension for more
      information.

    Should we provide a method for specifying an fp16 depth component output
    value?

      RESOLVED: No. There is no good reason for supporting half-precision
      Z outputs. Even with 16-bit Z buffers, the 10-bit mantissa of the
      half-precision float is rather limiting. There would effectively be
      only 11 good bits in the back half of the Z buffer.

    Should RequestResidentProgramsNV (or a new equivalent function) take a
    target? Dealing with working sets of different program types is a bit
    messy. Should we document some limitation if we get programs of different
    types?

      RESOLVED: In retrospect, it may have been a good idea to attach a
      target to this command, but there isn't a good reason to mess with
      something that already works for vertex programs. The driver is
      responsible for ensuring consistent results when the program types
      specified are mixed.

    What happens on data type conversions where the original value is not
    exactly representable in the new data type, either due to overflow or
    insufficient precision in the destination type?

      RESOLVED: In case of overflow, the original value is clamped to
      +/-INF (fp16 or fp32) or the nearest representable value (fx12). In
      case of imprecision, the conversion either rounds or truncates to
      the nearest representable value.

    Should this extension support IEEE-style denorms? For 32-bit IEEE
    floating point, denorms are numbers smaller in absolute value than 2^-126.
    For 16-bit floats used by this extension, denorms are numbers smaller in
    absolute value than 2^-14.

      RESOLVED: For 32-bit data types, hardware support for denorms was
      considered too expensive relative to the benefit provided.
      Computational results that would otherwise produce denorms are flushed
      to zero. For 16-bit data types, hardware denorm support will be
      present. The expense of hardware denorm support is lower and the
      potential precision benefit is greater for 16-bit data types.

    OpenGL provides a hierarchy of texture enables. The texture lookup
    operations in NV_texture_shader effectively override the texture enable
    hierarchy and select a specific texture to enable. What should be done by
    this extension?

      RESOLVED: This extension will build upon NV_texture_shader and reduce
      the driver overhead of validating the texture enables. Texture
      lookups can be specified by instructions like "TEX H0, f[TEX2], TEX2,
      3D", which would indicate to use texture coordinate set number 2 to do
      a lookup in the texture object bound to the TEXTURE_3D target in
      texture image unit 2.

      Each texture unit can have only one "active" target. Programs are not
      allowed to reference different texture targets in the same texture
      image unit. In the example above, any other texture instructions
      using texture image unit 2 must specify the 3D texture target.

    What is the interaction with NV_register_combiners?

      RESOLVED: Register combiners are not available when fragment programs
      are enabled.

      Previous versions of this specification supported the notion of
      combiner programs, where the result of fragment program execution was
      a set of four "texture lookup" values that fed the register combiners.

    For convenience, should we include pseudo-instructions not present in the
    hardware instruction set that are trivially implementable? For example,
    absolute value and subtract instructions could fall in this category. An
    "ABS R1,R0" instruction would be equivalent to "MAX R1,R0,-R0", and a "SUB
    R2,R0,R1" would be equivalent to "ADD R2,R0,-R1".

      RESOLVED: In general, yes. A SUB instruction is provided for
      convenience. This extension does not provide a separate ABS
      instruction because it supports absolute value operations on each
      operand.

    Should there be a '+' in the <optionalSign> portion of the grammar? There
    isn't one in the GL_NV_vertex_program spec.

      RESOLVED: Yes, for orthogonality/readability. A '+' obviously adds
      no functionality. In NV_vertex_program, an <optionalSign> of "-" was
      always a negation operator. However, in fragment programs, it can
      also be used as a sign for a constant value.

    Can the same fragment attribute register, program parameter register, or
    constants be used for multiple operands in the same instruction? If so,
    can it be used with different swizzle patterns?

      RESOLVED: Yes and yes.

    This extension allows different limits for the number of texture
    coordinate sets and the number of texture image units (i.e., texture maps
    and associated data). The state in ActiveTextureARB affects both
    coordinate sets (TexGen, matrix operations) and image units (TexParameter,
    TexEnv). How should we deal with this?

      RESOLVED: Continue to use ActiveTextureARB and emit an
      INVALID_OPERATION if the active texture refers to an unsupported
      coordinate set/image unit. Other options included creating dummy
      (unusable) state for unsupported coordinate sets/image units and
      continuing to use ActiveTextureARB normally, or creating separate state
      and state-setting commands for coordinate sets and image units.
      Separate state is the cleanest solution, but would add more calls and
      potentially cause more programmer confusion. Dummy state would avoid
      additional error checks, but the demands of dummy state could grow if
      the number of texture image units and texture coordinate sets
      increases.

      The current OpenGL spec is vague as to what state is affected by the
      active texture selector and has no distinction between
      coordinate-related and image-related state. The state tables could
      use a good clean-up in this area.

    The LRP instruction is defined so that the result of "LRP R0, R0, R1, R2"
    is R0*R1+(1-R0)*R2. There are conflicting precedents here. The
    definition here matches the "lrp" instruction in the DirectX 8.0 pixel
    shader language. However, an equivalent RenderMan lerp operation would
    yield a result of (1-R0)*R1+R0*R2. Which ordering should be implemented?

      RESOLVED: NVIDIA hardware implements the former operand ordering, and
      there is no good reason to specify a different ordering. To convert a
      "LRP" using the latter ordering to NV_fragment_program, swap the third
      and fourth arguments.

    Should this extension provide tracking of matrices or any other state,
    similar to that provided in NV_vertex_program?

      RESOLVED: No.

    Should this extension provide global program parameters -- values shared
    between multiple fragment programs?

      RESOLVED: No.

    Should this extension provide program parameters specific to a program?
    If so, how?

      RESOLVED: Yes. These parameters will be called "local parameters".
      This extension will provide both named and numbered local parameters.
      Local parameters can be managed by the driver and eliminate the need
      for applications to manage a global name space.

      Named local parameters work much like standard variable names in most
      programming languages. They are created using the "DECLARE"
      instruction within the fragment program itself. For example:

        DECLARE color = {1,0,0,1};

      Named local parameters are used simply by referencing the variable
      name. They do not require the array syntax like the global parameters
      in the NV_vertex_program extension. They can be updated using the
      commands ProgramNamedParameter4[f,fv]NV.

      Numbered local parameters are not declared. They are used by simply
      referencing an element of an array called "p". For example,

        MOV R0, p[12];

      loads the value of numbered local parameter 12 into register R0.
      Numbered local parameters can be updated using the commands
      ProgramLocalParameter4[d,dv,f,fv]ARB.

      The numbered local parameter APIs were added to this extension late in
      its development, and are provided for compatibility with the
      ARB_vertex_program extension, and what will likely be supported in
      ARB_fragment_program as well. Providing this mechanism allows
      programs to use the same mechanisms to set local parameters in both
      extensions.

    Why are the APIs for setting named and numbered local parameters
    different?

      RESOLVED: The named parameter API was created prior to
      ARB_vertex_program (and the possible future ARB_fragment_program) and
      uses conventions borrowed from NV_vertex_program. A slightly
      different API was chosen during the ARB standardization process; see
      the ARB_vertex_program specification for more details.

      The named parameter API takes a program ID and a parameter name, and
      sets the parameter for the program with the specified ID. The
      specified program does not need to be bound (via BindProgramNV) in
      order to modify the values of its named parameters. The numbered
      parameter API takes a program target enum (FRAGMENT_PROGRAM_NV) and a
      parameter number and modifies the corresponding numbered parameter of
      the currently bound program.

    What should be the initial value of uninitialized local parameters?

      RESOLVED: (0,0,0,0). This choice is somewhat arbitrary, but matches
      previous extensions (e.g., NV_vertex_program).

    Should this extension support program parameter arrays?

      RESOLVED: No hardware support is present. Note that from the point
      of view of a fragment program, a texture map can be used as a 1-, 2-,
      or 3-dimensional array of constants.

    Should this extension support constants in fragment programs? If
    so, how?

      RESOLVED: Yes. Scalar or vector constants can be defined inline
      (e.g., "1.0" or "{1,2,3,4}"). In addition, named constants are
      supported using the "DEFINE" instruction, which allows programmers to
      change the values of constants used in multiple instructions simply by
      changing the value assigned to the named constant.

      Note that because this extension uses program strings, the
      floating-point value of any constants generated on the fly must be
      printed to the program string. An alternate method that avoids the
      need to print constants is to declare a named local program parameter
      and initialize it with the ProgramNamedParameter4[f,fv]NV calls.

    Should named constants be allowed to be redefined?

      RESOLVED: No. If you want to redefine the values of constants, you
      can create an equivalent named program parameter by changing the
      "DEFINE" keyword to "DECLARE".
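
      To make the point about printing run-time constants into the program
      string concrete, here is a minimal C sketch. The helper name
      format_define is hypothetical, and the sketch assumes that "DEFINE"
      accepts the same "name = value;" form shown for "DECLARE" above and
      that the program grammar accepts the text produced by "%.9g"
      (including exponent notation for very large or small values); both
      assumptions should be checked against the grammar.

```c
#include <stdio.h>

/* Hypothetical helper, not part of this extension: prints a named
   constant definition such as "DEFINE scale = 0.5;\n" into buf.
   Returns the number of characters written (excluding the terminating
   NUL), or -1 if the text does not fit in the buffer. */
static int format_define(char *buf, size_t size, const char *name,
                         float value)
{
    /* "%.9g" emits enough significant digits to round-trip a 32-bit
       float through decimal text. */
    int n = snprintf(buf, size, "DEFINE %s = %.9g;\n", name, (double)value);

    if (n < 0 || (size_t)n >= size)
        return -1;           /* encoding error or truncated output */
    return n;
}
```

      The returned count can be accumulated while the rest of the program
      text is appended, so the application knows the total string length
      without a separate strlen() pass. Declaring the value with "DECLARE"
      and updating it through the named parameter API avoids this
      formatting step entirely.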

    Should functions used to update or query named local parameters take a
    zero-terminated string (as with most strings in the C programming
    language), or should they require an explicit string length? If the
    former, should we create a version of LoadProgramNV that does not require
    a string length?

      RESOLVED: Stick with explicit string length. Strings that are
      defined as constants can have the length computed at compile-time.
      Strings read from files will have the length known in advance.
      Programs that build strings at run-time also likely keep the length
      up-to-date. Passing an explicit length saves time, since the driver
      doesn't have to do a strlen().

    What is the deal with the alpha of the secondary color?

      RESOLVED: In unextended OpenGL 1.2, the alpha component of the
      secondary color is forced to 0.0. In the EXT_secondary_color
      extension, the alpha of the per-vertex secondary colors is defined to
      be 0.0. NV_vertex_program allows vertex programs to produce a
      per-vertex alpha component, but it is forced to zero for the purposes
      of the color sum. In the NV_register_combiners extension, the alpha
      component of the secondary color is undefined. What a mess.

      In this extension, the alpha of the secondary color is well-defined
      and can be used normally. When in vertex program mode

    Why are fragment program instructions involving f[FOGC] or f[TEX0] through
    f[TEX7] automatically carried out at full precision?

      RESOLVED: This is an artifact of the method by which these interpolants
      are generated in the NVIDIA graphics hardware. If such instructions
      absolutely must be carried out at lower precision, the requirement can
      be met by first loading the interpolants into a temporary register.

    With a different number of texture coordinate sets and texture image
    units, how many copies of each kind of texture state are there?
671 672 RESOLVED: The intention is that texture state be broken into three 673 groups. (1) There are MAX_TEXTURE_COORDS_NV copies of texture 674 coordinate set state, which includes current texture coordinates, 675 TexGen state, and texture matrices. (2) There are 676 MAX_TEXTURE_IMAGE_UNITS_NV copies of texture image unit state, which 677 includes texture maps, texture parameters, and LOD bias parameters. (3) 678 There are MAX_TEXTURE_UNITS_ARB copies of legacy OpenGL texture unit 679 state (e.g., texture enables, TexEnv blending state), all of which are 680 unused when in fragment program mode. 681 682 It is not necessary that MAX_TEXTURE_UNITS_ARB be equal to the minimum 683 of MAX_TEXTURE_COORDS_NV and MAX_TEXTURE_IMAGE_UNITS_NV -- 684 implementations may choose not to extend fixed-function OpenGL texture 685 mapping modes beyond a certain point. 686 687 The GLX protocol for LoadProgramNV (and ProgramNamedParameterNV) may end 688 up with programs >64KB. This will overflow the limits of the GLX Render 689 protocol, resulting in the need to use the RenderLarge path. This is an issue 690 with vertex programs, also. 691 692 RESOLVED: Yes, it is. 693 694 Should textures used by fragment programs be declared? For example, 695 "TEXTURE TEX3, 2D", indicating that the 2D texture should be used for all 696 accesses to texture unit 3. The dimension could be dropped from the TEX 697 family of instructions, and some of the compile-time error checking could 698 be dropped. 699 700 RESOLVED: Maybe it should be, but for better or worse, it isn't. 701 702 It is not all that uncommon to have negative q values with projective 703 texture mapping, but results are undefined if any q values are negative in 704 this specification. Why? 705 706 RESOLVED: This restriction carries on a similar one in the initial 707 OpenGL specification. The motivation for this restriction is that 708 when interpolating, it is possible for a fragment to have an 709 interpolated q coordinate at or near 0.0.
Since the texture 710 coordinates used for projective texture mapping are s/q, t/q, and r/q, 711 this will result in a divide-by-zero error or significant 712 numerical instability. Results will be inaccurate for such fragments. 713 714 Other than the numerical stability issue above, NVIDIA hardware should 715 have no problems with negative q coordinates. 716 717 Should programs that replace depth have their own special program type, 718 such as "!!FPD1.0" and "!!FPDC1.0"? 719 720 RESOLVED: No. If a program has an instruction that writes to 721 o[DEPR], the final fragment depth value is taken from o[DEPR].z. 722 Otherwise, the fragment's original depth value is used. 723 724 What fx12 value should NaN map to? 725 726 RESOLVED: For the lack of any better choice, 0.0. 727 728 How are special-case encodings (-INF, +INF, -0.0, +0.0, NaN) handled for 729 arithmetic and comparison operations? 730 731 RESOLVED: The special cases for all floating-point operations are 732 designed to match the IEEE specification for floating-point numbers as 733 closely as possible. The results produced by special cases should be 734 enumerated in the sections of this spec describing the operations. 735 There are some cases where the implemented fragment program behavior 736 does not match IEEE conventions, and these cases should be noted in 737 this specification. 738 739 How can condition codes be used to mask out register writes? How about 740 killing fragments? What other things can you do? 741 742 RESOLVED: The following example computes a component-wise |R1-R2|: 743 744 SUBC R0, R1, R2; # "C" suffix means update condition code 745 MOV R0 (LT), -R0; # Conditional write mask in parentheses 746 747 The first instruction computes a component-wise difference between R1 748 and R2, storing R1-R2 in register R0. The "C" suffix in the 749 instruction means to update the condition code based on the sign of 750 the result vector components.
The second instruction inverts the sign 751 of the components of R0. However, the "(LT)" portion says that the 752 destination register should be updated only if the corresponding 753 condition code component is LT (negative). This means that only those 754 components of R0 that are negative are negated, leaving |R1-R2| in R0. 755 756 To kill a fragment if the red (x) component of a texture lookup 757 returns zero: 758 759 TEXC R0, f[TEX0], TEX0, 2D; 760 KIL EQ.x; 761 762 To kill based on the green (y) component, use "EQ.y" instead. To kill 763 if any of the four components is zero, use "EQ.xyzw" or just "EQ". 764 765 Fragment programs do not support boolean expressions. These can 766 generally be achieved using conditional write masks. 767 768 To evaluate the expression "(R0.x == 0) && (R1.x == 0)": 769 770 MOVC RC.x, R0.x; 771 MOVC RC.x (EQ), R1.x; 772 773 To evaluate the expression "(R0.x == 0) || (R1.x == 0)": 774 775 MOVC RC.x, R0.x; 776 MOVC RC.x (NE), R1.x; 777 778 In both cases, the x component of the condition code will contain "EQ" 779 if and only if the condition is TRUE. 780 781 How can fragment programs be used to implement non-standard texture 782 filtering modes? 783 784 RESOLVED: As one example, consider a case where you want to do linear 785 filtering in a 2D texture map, but only horizontally. To achieve 786 this, first set the texture filtering mode to NEAREST. For a 16 x n 787 texture, you might do something like: 788 789 DEFINE halfTexel = { 0.03125, 0 }; # 1/32 (1/2 a texel) 790 ADD R2, f[TEX0], -halfTexel; # coords of left sample 791 ADD R1, f[TEX0], +halfTexel; # coords of right sample 792 TEX R0, R2, TEX0, 2D; # lookup left sample 793 TEX R1, R1, TEX0, 2D; # lookup right sample 794 MUL R2.x, R2.x, 16; # scale X coords to texels 795 FRC R2.x, R2.x; # get fraction, filter weight 796 LRP R0, R2.x, R1, R0; # blend samples based on weight 797 798 There are plenty of other interesting things that can be done. 799 800 Should this specification provide more examples? 801 802 RESOLVED: Yes, it should.
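The condition-code idioms discussed in the issues above (the component-wise |R1-R2| computation and the boolean AND/OR tests) can be modeled in a few lines of Python. This is only a simplified sketch, not a description of the hardware: the helper names `cc_of`, `abs_diff`, `both_zero`, and `either_zero` are hypothetical, swizzles are ignored, and it assumes that a masked-out component leaves both the destination and the condition code unchanged, which is what the examples above rely on.

```python
import math

def cc_of(value):
    """Classify one result component as LT, EQ, GT, or UN (for NaN)."""
    if math.isnan(value):
        return "UN"
    if value < 0.0:
        return "LT"
    if value == 0.0:
        return "EQ"
    return "GT"

def abs_diff(r1, r2):
    """Model of: SUBC R0, R1, R2;  MOV R0 (LT), -R0;"""
    # SUBC: compute R1 - R2 and update the condition code per component.
    r0 = [a - b for a, b in zip(r1, r2)]
    cc = [cc_of(v) for v in r0]
    # MOV with an (LT) mask: negate only components whose code is LT.
    return [-v if c == "LT" else v for v, c in zip(r0, cc)]

def both_zero(r0x, r1x):
    """Model of: MOVC RC.x, R0.x;  MOVC RC.x (EQ), R1.x;"""
    cc = cc_of(r0x)
    if cc == "EQ":          # second MOVC only takes effect where CC is EQ
        cc = cc_of(r1x)
    return cc == "EQ"       # EQ if and only if (R0.x == 0) && (R1.x == 0)

def either_zero(r0x, r1x):
    """Model of: MOVC RC.x, R0.x;  MOVC RC.x (NE), R1.x;"""
    cc = cc_of(r0x)
    if cc != "EQ":          # second MOVC only takes effect where CC is not EQ
        cc = cc_of(r1x)
    return cc == "EQ"       # EQ if and only if (R0.x == 0) || (R1.x == 0)

print(abs_diff([1.0, 5.0, 3.0, 0.0], [4.0, 2.0, 3.0, -1.0]))
```

Writing to RC in the real program discards the moved value and keeps only the condition code, which is why the Python model tracks `cc` alone.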
803 804 Is the OpenGL ARB working on a multi-vendor standard for fragment 805 programmability? Will there be an ARB_fragment_program extension? If so, 806 how will this extension interact with the ARB standard? 807 808 RESOLVED: Yes, as of July 2002, there was a multi-vendor working 809 group and a draft specification. The ARB extension is expected to 810 have several features not present in this extension, such as state 811 tracking and global parameters (called "program environment 812 parameters"). It will also likely lack certain features found in this 813 extension. 814 815 Why does the HEMI mapping apply to the third component of signed HILO 816 textures, but not to unsigned HILO textures? 817 818 RESOLVED: This behavior matches the behavior of NV_texture_shader 819 (e.g., the DOT_PRODUCT_NV mode). The HEMI mapping will construct the 820 third component of a unit vector whose first two components are 821 encoded in the HILO texture. 822 823 824New Procedures and Functions 825 826 void ProgramNamedParameter4fNV(uint id, sizei len, const ubyte *name, 827 float x, float y, float z, float w); 828 void ProgramNamedParameter4dNV(uint id, sizei len, const ubyte *name, 829 double x, double y, double z, double w); 830 void ProgramNamedParameter4fvNV(uint id, sizei len, const ubyte *name, 831 const float v[]); 832 void ProgramNamedParameter4dvNV(uint id, sizei len, const ubyte *name, 833 const double v[]); 834 void GetProgramNamedParameterfvNV(uint id, sizei len, const ubyte *name, 835 float *params); 836 void GetProgramNamedParameterdvNV(uint id, sizei len, const ubyte *name, 837 double *params); 838 839 void ProgramLocalParameter4dARB(enum target, uint index, 840 double x, double y, double z, double w); 841 void ProgramLocalParameter4dvARB(enum target, uint index, 842 const double *params); 843 void ProgramLocalParameter4fARB(enum target, uint index, 844 float x, float y, float z, float w); 845 void ProgramLocalParameter4fvARB(enum target, uint index, 846 const float 
*params); 847 void GetProgramLocalParameterdvARB(enum target, uint index, 848 double *params); 849 void GetProgramLocalParameterfvARB(enum target, uint index, 850 float *params); 851 852 853New Tokens 854 855 Accepted by the <cap> parameter of Disable, Enable, and IsEnabled, by the 856 <pname> parameter of GetBooleanv, GetIntegerv, GetFloatv, and GetDoublev, 857 and by the <target> parameter of BindProgramNV, LoadProgramNV, 858 ProgramLocalParameter4dARB, ProgramLocalParameter4dvARB, 859 ProgramLocalParameter4fARB, ProgramLocalParameter4fvARB, 860 GetProgramLocalParameterdvARB, and GetProgramLocalParameterfvARB: 861 862 FRAGMENT_PROGRAM_NV 0x8870 863 864 Accepted by the <pname> parameter of GetBooleanv, GetIntegerv, GetFloatv, 865 and GetDoublev: 866 867 MAX_TEXTURE_COORDS_NV 0x8871 868 MAX_TEXTURE_IMAGE_UNITS_NV 0x8872 869 FRAGMENT_PROGRAM_BINDING_NV 0x8873 870 MAX_FRAGMENT_PROGRAM_LOCAL_PARAMETERS_NV 0x8868 871 872 Accepted by the <name> parameter of GetString: 873 874 PROGRAM_ERROR_STRING_NV 0x8874 875 876 877Additions to Chapter 2 of the OpenGL 1.2.1 Specification (OpenGL Operation) 878 879 Modify Section 2.11, Clipping (p.39) 880 881 (replace the first paragraph of the section, p. 39) Primitives are clipped 882 to the clip volume. In clip coordinates, the view volume is defined by 883 884 -w_c <= x_c <= w_c, 885 -w_c <= y_c <= w_c, and 886 -w_c <= z_c <= w_c. 887 888 Clipping to the near and far clip planes is ignored if fragment program 889 mode (section 3.11) or texture shaders (see NV_texture_shader 890 specification) are enabled, if the current fragment program or texture 891 shader computes per-fragment depth values. In this case, the view volume 892 is defined by: 893 894 -w_c <= x_c <= w_c and 895 -w_c <= y_c <= w_c. 896 897 898Additions to Chapter 3 of the OpenGL 1.2.1 Specification (Rasterization) 899 900 Modify Chapter 3 introduction (p. 57) 901 902 (p.57, modify 1st paragraph) ... Figure 3.1 diagrams the rasterization 903 process. 
The color value assigned to a fragment is initially determined 904 by the rasterization operations (Sections 3.3 through 3.7) and modified by 905 either the execution of the texturing, color sum, and fog operations as 906 defined in Sections 3.8, 3.9, and 3.10, or of a fragment program defined 907 in Section 3.11. The final depth value is initially determined by the 908 rasterization operations and may be modified by a fragment program. 909 910 note: Antialiasing Application is renumbered from Section 3.11 to Section 911 3.12. 912 913 Modify Figure 3.1 (p.58) 914 915 Primitive Assembly 916 | 917 +-----------+-----------+-----------+-----------+ 918 | | | | | 919 | | | Pixel | 920 Point Line Polygon Rectangle Bitmap 921 Raster- Raster- Raster- Raster- Raster- 922 ization ization ization ization ization 923 | | | | | 924 +-----------+-----------+-----------+-----------+ 925 | 926 | 927 +-----------------+-----------------+ 928 | | | 929 Conventional Texture Fragment 930 Texture Fetch Shaders Programs 931 | | | 932 | +--------------+ | 933 | | | 934 TEXTURE_ o o | 935 SHADER_NV | 936 enable o | 937 | | 938 +-------------+ | 939 | | | 940 Conventional Register | 941 TexEnv Combiners | 942 | | | 943 Color Sum | | 944 | | | 945 Fog | | 946 | | | 947 | +----------+ | 948 | | | 949 REGISTER_ o o | 950 COMBINERS_ | 951 NV enable o | 952 | | 953 +-----------------+ +--------------+ 954 | | 955 FRAGMENT_ o o 956 PROGRAM_ 957 NV enable o 958 | 959 | 960 Coverage 961 Application 962 | 963 v 964 to fragment processing 965 966 967 Modify Section 3.3, Points (p.61) 968 969 All fragments produced in rasterizing a non-antialiased point are assigned 970 the same associated data, which are those of the vertex corresponding to 971 the point. (delete reference to divide by q). 972 973 If antialiasing is enabled, then ... The data associated with each 974 fragment are otherwise the data associated with the point being 975 rasterized.
(delete reference to divide by q) 976 977 Modify Section 3.4.1, Basic Line Segment Rasterization (p.66) 978 979 (Note that t=0 at p_a and t=1 at p_b). The value of an associated datum f 980 for the fragment, whether it be R, G, B, or A (in RGBA mode) or a color 981 index (in color index mode), the s, t, r, or q texture coordinate, or the 982 clip w coordinate (the depth value, window z, must be found using equation 983 3.3, below), is found as 984 985 f = [(1-t) * f_a / w_a + t * f_b / w_b] / [(1-t) / w_a + t / w_b] (3.2) 986 987 988 989 where f_a and f_b are the data associated with the starting and ending 990 endpoints of the segment, respectively; w_a and w_b are the clip 991 w coordinates of the starting and ending endpoints of the segment, 992 respectively. Note that linear interpolation would use 993 994 f = (1-t) * f_a + t * f_b. (3.3) 995 996 ... A GL implementation may choose to approximate equation 3.2 with 3.3, 997 but this will normally lead to unacceptable distortion effects when 998 interpolating texture coordinates or clip w coordinates. 999 1000 Modify Section 3.5.1, Basic Polygon Rasterization (p.71) 1001 1002 Denote a datum at p_a, p_b, or p_c ... is given by 1003 1004 f = [a * f_a / w_a + b * f_b / w_b + c * f_c / w_c] / [a / w_a + b / w_b + c / w_c] (3.4) 1005 1006 1007 1008 where w_a, w_b, and w_c are the clip w coordinates of p_a, p_b, and p_c, 1009 respectively. a, b, and c are the barycentric coordinates of the fragment 1010 for which the data are produced. a, b, and c must correspond precisely to 1011 the exact coordinates ... at the fragment's center. 1012 1013 Just as with line segment rasterization, equation 3.4 may be approximated 1014 by 1015 1016 f = a * f_a + b * f_b + c * f_c; (3.5) 1017 1018 this may yield ... for texture coordinates or clip w coordinates.
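The difference between the perspective-correct forms (equations 3.2 and 3.4) and their linear approximations (equations 3.3 and 3.5) is easy to see numerically. The following Python sketch is illustrative only; the function names `interp_perspective` and `interp_linear` are hypothetical, and the weights are (1-t, t) for a line segment or the barycentric (a, b, c) for a polygon.

```python
def interp_perspective(weights, data, clip_w):
    """Equations 3.2 / 3.4: weight each datum by 1/w, then renormalize."""
    numerator = sum(k * f / w for k, f, w in zip(weights, data, clip_w))
    denominator = sum(k / w for k, w in zip(weights, clip_w))
    return numerator / denominator

def interp_linear(weights, data):
    """Equations 3.3 / 3.5: plain linear (screen-space) interpolation."""
    return sum(k * f for k, f in zip(weights, data))

# Midpoint (t = 0.5) of a segment whose far endpoint has clip w = 3.
t = 0.5
weights = (1.0 - t, t)
data = (0.0, 1.0)        # e.g. the s texture coordinate at p_a and p_b
clip_w = (1.0, 3.0)

print(interp_perspective(weights, data, clip_w))  # close to 0.25
print(interp_linear(weights, data))               # 0.5
```

When all clip w values are equal the two forms agree, which is why the approximation only becomes visible on primitives that span a range of depths.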
1019 1020 Modify Section 3.6.4, Rasterization of Pixel Rectangles (p.100) 1021 1022 A fragment arising from a group ... are given by those associated with the 1023 current raster position. (delete reference to divide by q) 1024 1025 Modify Section 3.7, Bitmaps (p.111) 1026 1027 Otherwise, a rectangular array ... The associated data for each fragment 1028 are those associated with the current raster position. (delete reference 1029 to divide by q) Once the fragments have been produced ... 1030 1031 Modify Section 3.8, Texturing (p.112) 1032 1033 ... an image at the location indicated by a fragment's texture coordinates 1034 to modify the fragment's primary RGBA color. Texturing does not affect the 1035 secondary color. 1036 1037 Texturing is specified only for RGBA mode; its use in color index mode is 1038 undefined. 1039 1040 Except when in fragment program mode (Section 3.11), the (s,t,r) texture 1041 coordinates used for texturing are the values s/q, t/q, and r/q, 1042 respectively, where s, t, r, and q are the texture coordinates associated 1043 with the fragment. When in fragment program mode, the (s,t,r) texture 1044 coordinates are specified by the program. If q is less than or equal to 1045 zero, the results of texturing are undefined. 1046 1047 Add new Section 3.11, Fragment Programs (p.140) 1048 1049 Fragment program mode is enabled and disabled with the Enable and Disable 1050 commands using the symbolic constant FRAGMENT_PROGRAM_NV. When fragment 1051 program mode is enabled, standard and extended texturing, color sum, and 1052 fog application stages are ignored and a general purpose program is 1053 executed instead. 1054 1055 A fragment program is a sequence of instructions that execute on a 1056 per-fragment basis. In fragment program mode, the currently bound 1057 fragment program is executed as each fragment is generated by the 1058 rasterization operations.
Fragment programs execute a finite fixed 1059 sequence of instructions with no branching or looping, and operate 1060 independently from the processing of other fragments. Fragment programs 1061 are used to compute new color values to be associated with each fragment, 1062 and can optionally compute a new depth value for each fragment as well. 1063 1064 Fragment program mode is not available in color index mode and is 1065 considered disabled, regardless of the state of FRAGMENT_PROGRAM_NV. When 1066 fragment program mode is enabled, texture shaders and register combiners 1067 (NV_texture_shader and NV_register_combiners extensions) are disabled, 1068 regardless of the state of TEXTURE_SHADER_NV and REGISTER_COMBINERS_NV. 1069 1070 Section 3.11.1, Fragment Program Registers 1071 1072 Fragment programs operate on a set of program registers. Each program 1073 register is a 4-component vector, whose components are referred to as "x", 1074 "y", "z", and "w" respectively. The components of a fragment register are 1075 always referred to in this manner, regardless of the meaning of their 1076 contents. 1077 1078 The four components of each fragment program register have one of two 1079 different representations: 32-bit floating-point (fp32) or 16-bit 1080 floating-point (fp16). More details on these representations can be found 1081 in Section 3.11.4.1. 1082 1083 There are several different classes of program registers. Attribute 1084 registers (Table X.1) correspond to the fragment's associated data 1085 produced by rasterization. Temporary registers (Table X.2) hold 1086 intermediate results generated by the fragment program. Output registers 1087 (Table X.3) hold the final results of a fragment program. The single 1088 condition code register is used to mask writes to other registers or to 1089 determine if a fragment should be discarded.
1090 1091 1092 Section 3.11.1.1, Fragment Program Attribute Registers 1093 1094 The fragment program attribute registers (Table X.1) hold the location of 1095 the fragment and the data associated with the fragment produced by 1096 rasterization. 1097 1098 Fragment Attribute Component 1099 Register Name Description Interpretation 1100 -------------- ----------------------------------- -------------- 1101 f[WPOS] Position of the fragment center. (x,y,z,1/w) 1102 f[COL0] Interpolated primary color (r,g,b,a) 1103 f[COL1] Interpolated secondary color (r,g,b,a) 1104 f[FOGC] Interpolated fog distance/coord (z,0,0,0) 1105 f[TEX0] Texture coordinate (unit 0) (s,t,r,q) 1106 f[TEX1] Texture coordinate (unit 1) (s,t,r,q) 1107 f[TEX2] Texture coordinate (unit 2) (s,t,r,q) 1108 f[TEX3] Texture coordinate (unit 3) (s,t,r,q) 1109 f[TEX4] Texture coordinate (unit 4) (s,t,r,q) 1110 f[TEX5] Texture coordinate (unit 5) (s,t,r,q) 1111 f[TEX6] Texture coordinate (unit 6) (s,t,r,q) 1112 f[TEX7] Texture coordinate (unit 7) (s,t,r,q) 1113 1114 Table X.1: Fragment Attribute Registers. The component interpretation 1115 column describes the mapping of attribute values to register components. 1116 For example, the "x" component of f[COL0] holds the red color component, 1117 and the "x" component of f[TEX0] holds the "s" texture coordinate for 1118 texture unit 0. The entries "0" and "1" indicate that the attribute 1119 register components hold the constants 0 and 1, respectively. 1120 1121 f[WPOS].x and f[WPOS].y hold the (x,y) window coordinates of the fragment 1122 center, relative to the lower left corner of the window. f[WPOS].z 1123 holds the associated z window coordinate, normally in the range [0,1]. 1124 f[WPOS].w holds the reciprocal of the associated clip w coordinate. 1125 1126 f[COL0] and f[COL1] hold the associated RGBA primary and secondary colors 1127 of the fragment, respectively.
1128 1129 f[FOGC] holds the associated eye distance or fog coordinate normally used 1130 for fog computations. 1131 1132 f[TEX0] through f[TEX7] hold the associated texture coordinates for 1133 texture coordinate sets 0 through 7, respectively. 1134 1135 All attribute register components are treated as 32-bit floats. However, 1136 the components of primary and secondary colors (f[COL0] and f[COL1]) may 1137 be generated with reduced precision. 1138 1139 The contents of the fragment attribute registers may not be modified by a 1140 fragment program. In addition, each fragment program instruction can use 1141 at most one unique attribute register. 1142 1143 1144 Section 3.11.1.2, Fragment Program Temporary Registers 1145 1146 The fragment temporary registers (Table X.2) hold intermediate values used 1147 during the execution of a fragment program. There are 96 temporary 1148 register names, but not all can be used simultaneously. 1149 1150 Fragment Temporary 1151 Register Name Description 1152 ------------------ ----------------------------------------------------- 1153 R0-R31 Four 32-bit (fp32) floating point values (s.e8.m23) 1154 H0-H63 Four 16-bit (fp16) floating point values (s.e5.m10) 1155 1156 Table X.2: Fragment Temporary Registers. 1157 1158 In addition to the normal temporary registers, there are two temporary 1159 pseudo-registers, "RC" and "HC". RC and HC are treated as unnumbered, 1160 write-only temporary registers. The components of RC have a fp32 data 1161 type; the components of HC have a fp16 data type. The sole purpose of 1162 these registers is to permit instructions to modify the condition code 1163 register (section 3.11.1.4) without overwriting the values in any 1164 temporary register. 1165 1166 Fragment program instructions can read and write temporary registers. 1167 There is no restriction on the number of temporary registers that can be 1168 accessed by any given instruction. 
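To give a feel for the gap between the fp32 (s.e8.m23) and fp16 (s.e5.m10) temporary register formats listed in Table X.2, the following Python sketch rounds values through both bit layouts. It assumes the IEEE 754 binary32 and binary16 encodings provided by Python's `struct` module, which share those field widths; the rounding and special-case behavior of actual hardware may differ. The helper names `to_fp32` and `to_fp16` are hypothetical.

```python
import struct

def to_fp32(x):
    """Round a Python float to IEEE 754 binary32 (s.e8.m23) and back."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

def to_fp16(x):
    """Round a Python float to IEEE 754 binary16 (s.e5.m10) and back."""
    return struct.unpack("<e", struct.pack("<e", x))[0]

for value in (0.1, 1.0, 2048.9):
    # An R register would hold roughly to_fp32(value);
    # an H register would hold roughly to_fp16(value).
    print(value, to_fp32(value), to_fp16(value))
```

With only 10 mantissa bits, fp16 cannot distinguish neighboring integers above 2048, so values such as texel addresses in large textures are better kept in R registers.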
1169 1170 All temporary registers are initialized to (0,0,0,0) each time a fragment 1171 program executes. 1172 1173 1174 Section 3.11.1.3, Fragment Program Output Registers 1175 1176 The fragment program output registers hold the final results of the 1177 fragment program. The possible final results of a fragment program are a 1178 high- or low-precision RGBA fragment color, and a fragment depth value. 1179 1180 Output 1181 Register Name Description 1182 ------------- ------------------------------------------------------- 1183 o[COLR] Final RGBA fragment color, fp32 format 1184 o[COLH] Final RGBA fragment color, fp16 format 1185 o[DEPR] Final fragment depth value, fp32 format 1186 1187 Table X.3: Fragment Program Output Registers. 1188 1189 o[COLR] and o[COLH] specify the color of a fragment. These two registers 1190 are identical, except for the associated data type of the components. The 1191 R, G, B, and A components of the fragment color are taken from the x, y, 1192 z, and w components respectively of the o[COLR] or o[COLH]. A fragment 1193 program will fail to load if it writes to both o[COLR] and o[COLH]. 1194 1195 o[DEPR] can be used to replace the associated depth value of a fragment. 1196 The new depth value is taken from the z component of o[DEPR]. If a 1197 fragment program does not write to o[DEPR], the associated depth value is 1198 unmodified. 1199 1200 A fragment program will fail to load if it does not write to at least one 1201 output register. 1202 1203 The fragment program output registers may not be read by a fragment 1204 program, but may be written to multiple times. 1205 1206 The values of all fragment program output registers are initially 1207 undefined. 1208 1209 1210 Section 3.11.1.4, Fragment Program Condition Code Register 1211 1212 The condition code register (CC) is a single four-component vector. 
Each 1213 component of this register is one of four enumerated values: GT (greater 1214 than), EQ (equal), LT (less than), or UN (unordered). The condition code 1215 register can be used to mask writes to fragment data register components 1216 or to terminate processing of a fragment altogether (via the KIL 1217 instruction). 1218 1219 Most fragment program instructions can optionally update the condition 1220 code register. When a fragment program instruction updates the condition 1221 code register, a condition code component is set to LT if the 1222 corresponding component of the result vector is less than zero, EQ if it 1223 is equal to zero, GT if it is greater than zero, and UN if it is NaN (not 1224 a number). 1225 1226 The condition code register is initialized to a vector of EQ values each 1227 time a fragment program executes. 1228 1229 1230 Section 3.11.2, Fragment Program Parameters 1231 1232 In addition to using the registers defined in Section 3.11.1, fragment 1233 programs may also use fragment program parameters in their computation. 1234 Fragment program parameters are constant during the execution of fragment 1235 programs, but some parameters may be modified outside the execution of a 1236 fragment program. 1237 1238 There are five different types of program parameters: embedded scalar 1239 constants, embedded vector constants, named constants, named local 1240 parameters, and numbered local parameters. 1241 1242 Embedded scalar constants are written as standard floating-point numbers 1243 with an optional sign designator ("+" or "-") and optional scientific 1244 notation (e.g., "E+06", meaning "times 10^6"). 1245 1246 Embedded vector constants are written as a comma-separated array of one to 1247 four scalar constants, surrounded by braces (like a C/C++ array 1248 initializer). 
Vector constants are always treated as 4-component vectors: 1249 constants with fewer than four components are expanded to 4 components by 1250 filling missing y and z components with 0.0 and missing w components with 1251 1.0. Thus, the vector constant "{2}" is equivalent to "{2,0,0,1}", 1252 "{3,4}" is equivalent to "{3,4,0,1}", and "{5,6,7}" is equivalent to 1253 "{5,6,7,1}". 1254 1255 Named constants allow fragment program instructions to define scalar or 1256 vector constants that can be referenced by name. Named constants are 1257 created using the DEFINE instruction: 1258 1259 DEFINE pi = 3.1415926535; 1260 DEFINE color = {0.2, 0.5, 0.8, 1.0}; 1261 1262 The DEFINE instruction associates a constant name with a scalar or vector 1263 constant value. Subsequent fragment program instructions that use the 1264 constant name are equivalent to those using the corresponding constant 1265 value. 1266 1267 Named local parameters are similar to named vector constants, but their 1268 values can be modified after the program is loaded. Local parameters are 1269 created using the DECLARE instruction: 1270 1271 DECLARE fog_color1; 1272 DECLARE fog_color2 = {0.3, 0.6, 0.9, 0.1}; 1273 1274 The DECLARE instruction creates a 4-component vector associated with the 1275 local parameter name. Subsequent fragment program instructions 1276 referencing the local parameter name are processed as though the current 1277 value of the local parameter vector were specified instead of the 1278 parameter name. A DECLARE instruction can optionally specify an initial 1279 value for the local parameter, which can be either a scalar or vector 1280 constant. Scalar constants are expanded to 4-component vectors by 1281 replicating the scalar value in each component. The initial value of 1282 local parameters not initialized by the program is (0,0,0,0).
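The expansion rules described above for embedded vector constants and for DECLARE initializers can be restated as a short Python sketch. It mirrors only the rules given in the text; the function names `expand_vector_constant` and `declare_initial_value` are hypothetical and do not correspond to any GL entry point.

```python
def expand_vector_constant(components):
    """Expand a 1- to 4-component vector constant to (x, y, z, w).

    Missing y and z components become 0.0; a missing w becomes 1.0.
    """
    if not 1 <= len(components) <= 4:
        raise ValueError("vector constants have one to four components")
    defaults = [0.0, 0.0, 0.0, 1.0]
    values = [float(c) for c in components]
    return tuple(values + defaults[len(values):])

def declare_initial_value(initializer=None):
    """Initial value of a DECLAREd local parameter.

    No initializer gives (0,0,0,0); a scalar is replicated into all four
    components; a vector constant is expanded as above.
    """
    if initializer is None:
        return (0.0, 0.0, 0.0, 0.0)
    if isinstance(initializer, (int, float)):
        return (float(initializer),) * 4
    return expand_vector_constant(initializer)

print(expand_vector_constant([3, 4]))               # (3.0, 4.0, 0.0, 1.0)
print(declare_initial_value(0.5))                   # (0.5, 0.5, 0.5, 0.5)
print(declare_initial_value([0.3, 0.6, 0.9, 0.1]))  # (0.3, 0.6, 0.9, 0.1)
```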
1283 1284 A named local parameter for a specific program can be updated using the 1285 calls ProgramNamedParameter4fNV or ProgramNamedParameter4fvNV (section 1286 5.7). Named local parameters are accessible only by the program in which 1287 they are defined. Modifying a local parameter affects only the 1288 associated program and does not affect local parameters with the same name 1289 that are found in any other fragment program. 1290 1291 Numbered local parameters are similar to named local parameters, except 1292 that they are referred to by number and are not declared in fragment 1293 programs. Each fragment program object has an array of four-component 1294 floating-point vectors that can be used by the program. The number of 1295 vectors is given by the implementation-dependent constant 1296 MAX_FRAGMENT_PROGRAM_LOCAL_PARAMETERS_NV, and must be at least 64. 1297 Numbered local parameters are accessed by a fragment program as members of 1298 an array called "p". For example, the instruction 1299 1300 MOV R0, p[31]; 1301 1302 copies the contents of numbered local parameter 31 into temporary register 1303 R0. 1304 1305 Constant and local parameter names can be arbitrary strings consisting of 1306 letters (upper or lower-case), numbers, underscores ("_"), and dollar 1307 signs ("$"). Keywords defined in the grammar (including instruction 1308 names) cannot be used as constant names, nor can strings that start with 1309 numbers, or strings that specify valid temporary register or texture 1310 numbers (e.g., "R0"-"R31", "H0"-"H63", "TEX0"-"TEX15"). A fragment 1311 program will fail to load if a DEFINE or DECLARE instruction specifies an 1312 invalid constant or local parameter name. 1313 1314 A fragment program will fail to load if an instruction contains a named 1315 parameter not specified in a previous DEFINE or DECLARE instruction.
A 1316 fragment program will also fail to load if a DEFINE or DECLARE instruction 1317 attempts to re-define a named parameter specified in a previous DEFINE or 1318 DECLARE instruction. 1319 1320 The contents of the fragment program parameters may not be modified by a 1321 fragment program. In addition, each fragment program instruction can 1322 normally use at most one unique program parameter. The only exception to 1323 this rule is if all program parameter references specify named or embedded 1324 constants that taken together contain no more than four unique scalar 1325 values. For such instructions, the GL will automatically generate an 1326 equivalent instruction that references a single merged vector constant. 1327 This merging allows programs to specify instructions like the following: 1328 1329 Instruction Equivalent Instruction 1330 --------------------- --------------------------------------- 1331 MAD R0, R1, 2, -1; MAD R0, R1, {2,-1,0,0}.x, {2,-1,0,0}.y; 1332 ADD R0, {1,2,3,4}, 4; ADD R0, {1,2,3,4}.xyzw, {1,2,3,4}.w; 1333 1334 Before counting the number of unique values, any named constants are first 1335 converted to the equivalent embedded constants. When generating a 1336 combined vector constant, the GL does not perform swizzling, component 1337 selection, negation, or absolute value operations. The following 1338 instructions are invalid, as they contain more than four unique scalar 1339 values. 1340 1341 Invalid Instructions 1342 ----------------------------------- 1343 ADD R0, {1,2,3,4}, -4; 1344 ADD R0, {1,2,3,4}, |-4|; 1345 ADD R0, {1,2,3,4}, -{-1,-2,-3,-4}; 1346 ADD R0, {1,2,3,4}, {4,5,6,7}.x; 1347 1348 1349 Section 3.11.3, Fragment Program Specification 1350 1351 Fragment programs are specified as an array of ubytes. The array is a 1352 string of ASCII characters encoding the program. The command 1353 LoadProgramNV loads a fragment program when the target parameter is 1354 FRAGMENT_PROGRAM_NV. 
The command BindProgramNV enables a fragment program 1355 for execution. 1356 1357 At program load time, the program is parsed into a set of tokens possibly 1358 separated by white space. Spaces, tabs, newlines, carriage returns, and 1359 comments are considered whitespace. Comments begin with the character "#" 1360 and are terminated by a newline, a carriage return, or the end of the 1361 program array. Fragment programs are case-sensitive -- upper and lower 1362 case letters are treated differently. The proper choice of case can be 1363 inferred from the grammar. 1364 1365 The Backus-Naur Form (BNF) grammar below specifies the syntactically valid 1366 sequences for fragment programs. The set of valid tokens can be inferred 1367 from the grammar. The token "" represents an empty string and is used to 1368 indicate optional rules. A program is invalid if it contains any 1369 undefined tokens or characters. 1370 1371 <program> ::= <progPrefix> <instructionSequence> "END" 1372 1373 <progPrefix> ::= "!!FP1.0" 1374 1375 <instructionSequence> ::= <instructionSequence> <instructionStatement> 1376 | <instructionStatement> 1377 1378 <instructionStatement> ::= <instruction> ";" 1379 | <constantDefinition> ";" 1380 | <localDeclaration> ";" 1381 1382 <instruction> ::= <VECTORop-instruction> 1383 | <SCALARop-instruction> 1384 | <BINSCop-instruction> 1385 | <BINop-instruction> 1386 | <TRIop-instruction> 1387 | <KILop-instruction> 1388 | <TEXop-instruction> 1389 | <TXDop-instruction> 1390 1391 <VECTORop-instruction> ::= <VECTORop> <maskedDstReg> "," 1392 <vectorSrc> 1393 1394 <VECTORop> ::= "DDX" | "DDX_SAT" 1395 | "DDXR" | "DDXR_SAT" 1396 | "DDXH" | "DDXH_SAT" 1397 | "DDXC" | "DDXC_SAT" 1398 | "DDXRC" | "DDXRC_SAT" 1399 | "DDXHC" | "DDXHC_SAT" 1400 | "DDY" | "DDY_SAT" 1401 | "DDYR" | "DDYR_SAT" 1402 | "DDYH" | "DDYH_SAT" 1403 | "DDYC" | "DDYC_SAT" 1404 | "DDYRC" | "DDYRC_SAT" 1405 | "DDYHC" | "DDYHC_SAT" 1406 | "FLR" | "FLR_SAT" 1407 | "FLRR" | "FLRR_SAT" 1408 | "FLRH" | 
"FLRH_SAT" 1409 | "FLRX" | "FLRX_SAT" 1410 | "FLRC" | "FLRC_SAT" 1411 | "FLRRC" | "FLRRC_SAT" 1412 | "FLRHC" | "FLRHC_SAT" 1413 | "FLRXC" | "FLRXC_SAT" 1414 | "FRC" | "FRC_SAT" 1415 | "FRCR" | "FRCR_SAT" 1416 | "FRCH" | "FRCH_SAT" 1417 | "FRCX" | "FRCX_SAT" 1418 | "FRCC" | "FRCC_SAT" 1419 | "FRCRC" | "FRCRC_SAT" 1420 | "FRCHC" | "FRCHC_SAT" 1421 | "FRCXC" | "FRCXC_SAT" 1422 | "LIT" | "LIT_SAT" 1423 | "LITR" | "LITR_SAT" 1424 | "LITH" | "LITH_SAT" 1425 | "LITC" | "LITC_SAT" 1426 | "LITRC" | "LITRC_SAT" 1427 | "LITHC" | "LITHC_SAT" 1428 | "MOV" | "MOV_SAT" 1429 | "MOVR" | "MOVR_SAT" 1430 | "MOVH" | "MOVH_SAT" 1431 | "MOVX" | "MOVX_SAT" 1432 | "MOVC" | "MOVC_SAT" 1433 | "MOVRC" | "MOVRC_SAT" 1434 | "MOVHC" | "MOVHC_SAT" 1435 | "MOVXC" | "MOVXC_SAT" 1436 | "PK2H" 1437 | "PK2US" 1438 | "PK4B" 1439 | "PK4UB" 1440 1441 <SCALARop-instruction> ::= <SCALARop> <maskedDstReg> "," 1442 <scalarSrc> 1443 1444 <SCALARop> ::= "COS" | "COS_SAT" 1445 | "COSR" | "COSR_SAT" 1446 | "COSH" | "COSH_SAT" 1447 | "COSC" | "COSC_SAT" 1448 | "COSRC" | "COSRC_SAT" 1449 | "COSHC" | "COSHC_SAT" 1450 | "EX2" | "EX2_SAT" 1451 | "EX2R" | "EX2R_SAT" 1452 | "EX2H" | "EX2H_SAT" 1453 | "EX2C" | "EX2C_SAT" 1454 | "EX2RC" | "EX2RC_SAT" 1455 | "EX2HC" | "EX2HC_SAT" 1456 | "LG2" | "LG2_SAT" 1457 | "LG2R" | "LG2R_SAT" 1458 | "LG2H" | "LG2H_SAT" 1459 | "LG2C" | "LG2C_SAT" 1460 | "LG2RC" | "LG2RC_SAT" 1461 | "LG2HC" | "LG2HC_SAT" 1462 | "RCP" | "RCP_SAT" 1463 | "RCPR" | "RCPR_SAT" 1464 | "RCPH" | "RCPH_SAT" 1465 | "RCPC" | "RCPC_SAT" 1466 | "RCPRC" | "RCPRC_SAT" 1467 | "RCPHC" | "RCPHC_SAT" 1468 | "RSQ" | "RSQ_SAT" 1469 | "RSQR" | "RSQR_SAT" 1470 | "RSQH" | "RSQH_SAT" 1471 | "RSQC" | "RSQC_SAT" 1472 | "RSQRC" | "RSQRC_SAT" 1473 | "RSQHC" | "RSQHC_SAT" 1474 | "SIN" | "SIN_SAT" 1475 | "SINR" | "SINR_SAT" 1476 | "SINH" | "SINH_SAT" 1477 | "SINC" | "SINC_SAT" 1478 | "SINRC" | "SINRC_SAT" 1479 | "SINHC" | "SINHC_SAT" 1480 | "UP2H" | "UP2H_SAT" 1481 | "UP2HC" | "UP2HC_SAT" 1482 | "UP2US" | "UP2US_SAT" 1483 | 
"UP2USC" | "UP2USC_SAT" 1484 | "UP4B" | "UP4B_SAT" 1485 | "UP4BC" | "UP4BC_SAT" 1486 | "UP4UB" | "UP4UB_SAT" 1487 | "UP4UBC" | "UP4UBC_SAT" 1488 1489 <BINSCop-instruction> ::= <BINSCop> <maskedDstReg> "," 1490 <scalarSrc> "," <scalarSrc> 1491 1492 <BINSCop> ::= "POW" | "POW_SAT" 1493 | "POWR" | "POWR_SAT" 1494 | "POWH" | "POWH_SAT" 1495 | "POWC" | "POWC_SAT" 1496 | "POWRC" | "POWRC_SAT" 1497 | "POWHC" | "POWHC_SAT" 1498 1499 <BINop-instruction> ::= <BINop> <maskedDstReg> "," 1500 <vectorSrc> "," <vectorSrc> 1501 1502 <BINop> ::= "ADD" | "ADD_SAT" 1503 | "ADDR" | "ADDR_SAT" 1504 | "ADDH" | "ADDH_SAT" 1505 | "ADDX" | "ADDX_SAT" 1506 | "ADDC" | "ADDC_SAT" 1507 | "ADDRC" | "ADDRC_SAT" 1508 | "ADDHC" | "ADDHC_SAT" 1509 | "ADDXC" | "ADDXC_SAT" 1510 | "DP3" | "DP3_SAT" 1511 | "DP3R" | "DP3R_SAT" 1512 | "DP3H" | "DP3H_SAT" 1513 | "DP3X" | "DP3X_SAT" 1514 | "DP3C" | "DP3C_SAT" 1515 | "DP3RC" | "DP3RC_SAT" 1516 | "DP3HC" | "DP3HC_SAT" 1517 | "DP3XC" | "DP3XC_SAT" 1518 | "DP4" | "DP4_SAT" 1519 | "DP4R" | "DP4R_SAT" 1520 | "DP4H" | "DP4H_SAT" 1521 | "DP4X" | "DP4X_SAT" 1522 | "DP4C" | "DP4C_SAT" 1523 | "DP4RC" | "DP4RC_SAT" 1524 | "DP4HC" | "DP4HC_SAT" 1525 | "DP4XC" | "DP4XC_SAT" 1526 | "DST" | "DST_SAT" 1527 | "DSTR" | "DSTR_SAT" 1528 | "DSTH" | "DSTH_SAT" 1529 | "DSTC" | "DSTC_SAT" 1530 | "DSTRC" | "DSTRC_SAT" 1531 | "DSTHC" | "DSTHC_SAT" 1532 | "MAX" | "MAX_SAT" 1533 | "MAXR" | "MAXR_SAT" 1534 | "MAXH" | "MAXH_SAT" 1535 | "MAXX" | "MAXX_SAT" 1536 | "MAXC" | "MAXC_SAT" 1537 | "MAXRC" | "MAXRC_SAT" 1538 | "MAXHC" | "MAXHC_SAT" 1539 | "MAXXC" | "MAXXC_SAT" 1540 | "MIN" | "MIN_SAT" 1541 | "MINR" | "MINR_SAT" 1542 | "MINH" | "MINH_SAT" 1543 | "MINX" | "MINX_SAT" 1544 | "MINC" | "MINC_SAT" 1545 | "MINRC" | "MINRC_SAT" 1546 | "MINHC" | "MINHC_SAT" 1547 | "MINXC" | "MINXC_SAT" 1548 | "MUL" | "MUL_SAT" 1549 | "MULR" | "MULR_SAT" 1550 | "MULH" | "MULH_SAT" 1551 | "MULX" | "MULX_SAT" 1552 | "MULC" | "MULC_SAT" 1553 | "MULRC" | "MULRC_SAT" 1554 | "MULHC" | "MULHC_SAT" 1555 | "MULXC" | 
"MULXC_SAT" 1556 | "RFL" | "RFL_SAT" 1557 | "RFLR" | "RFLR_SAT" 1558 | "RFLH" | "RFLH_SAT" 1559 | "RFLC" | "RFLC_SAT" 1560 | "RFLRC" | "RFLRC_SAT" 1561 | "RFLHC" | "RFLHC_SAT" 1562 | "SEQ" | "SEQ_SAT" 1563 | "SEQR" | "SEQR_SAT" 1564 | "SEQH" | "SEQH_SAT" 1565 | "SEQX" | "SEQX_SAT" 1566 | "SEQC" | "SEQC_SAT" 1567 | "SEQRC" | "SEQRC_SAT" 1568 | "SEQHC" | "SEQHC_SAT" 1569 | "SEQXC" | "SEQXC_SAT" 1570 | "SFL" | "SFL_SAT" 1571 | "SFLR" | "SFLR_SAT" 1572 | "SFLH" | "SFLH_SAT" 1573 | "SFLX" | "SFLX_SAT" 1574 | "SFLC" | "SFLC_SAT" 1575 | "SFLRC" | "SFLRC_SAT" 1576 | "SFLHC" | "SFLHC_SAT" 1577 | "SFLXC" | "SFLXC_SAT" 1578 | "SGE" | "SGE_SAT" 1579 | "SGER" | "SGER_SAT" 1580 | "SGEH" | "SGEH_SAT" 1581 | "SGEX" | "SGEX_SAT" 1582 | "SGEC" | "SGEC_SAT" 1583 | "SGERC" | "SGERC_SAT" 1584 | "SGEHC" | "SGEHC_SAT" 1585 | "SGEXC" | "SGEXC_SAT" 1586 | "SGT" | "SGT_SAT" 1587 | "SGTR" | "SGTR_SAT" 1588 | "SGTH" | "SGTH_SAT" 1589 | "SGTX" | "SGTX_SAT" 1590 | "SGTC" | "SGTC_SAT" 1591 | "SGTRC" | "SGTRC_SAT" 1592 | "SGTHC" | "SGTHC_SAT" 1593 | "SGTXC" | "SGTXC_SAT" 1594 | "SLE" | "SLE_SAT" 1595 | "SLER" | "SLER_SAT" 1596 | "SLEH" | "SLEH_SAT" 1597 | "SLEX" | "SLEX_SAT" 1598 | "SLEC" | "SLEC_SAT" 1599 | "SLERC" | "SLERC_SAT" 1600 | "SLEHC" | "SLEHC_SAT" 1601 | "SLEXC" | "SLEXC_SAT" 1602 | "SLT" | "SLT_SAT" 1603 | "SLTR" | "SLTR_SAT" 1604 | "SLTH" | "SLTH_SAT" 1605 | "SLTX" | "SLTX_SAT" 1606 | "SLTC" | "SLTC_SAT" 1607 | "SLTRC" | "SLTRC_SAT" 1608 | "SLTHC" | "SLTHC_SAT" 1609 | "SLTXC" | "SLTXC_SAT" 1610 | "SNE" | "SNE_SAT" 1611 | "SNER" | "SNER_SAT" 1612 | "SNEH" | "SNEH_SAT" 1613 | "SNEX" | "SNEX_SAT" 1614 | "SNEC" | "SNEC_SAT" 1615 | "SNERC" | "SNERC_SAT" 1616 | "SNEHC" | "SNEHC_SAT" 1617 | "SNEXC" | "SNEXC_SAT" 1618 | "STR" | "STR_SAT" 1619 | "STRR" | "STRR_SAT" 1620 | "STRH" | "STRH_SAT" 1621 | "STRX" | "STRX_SAT" 1622 | "STRC" | "STRC_SAT" 1623 | "STRRC" | "STRRC_SAT" 1624 | "STRHC" | "STRHC_SAT" 1625 | "STRXC" | "STRXC_SAT" 1626 | "SUB" | "SUB_SAT" 1627 | "SUBR" | "SUBR_SAT" 1628 | 
"SUBH" | "SUBH_SAT" 1629 | "SUBX" | "SUBX_SAT" 1630 | "SUBC" | "SUBC_SAT" 1631 | "SUBRC" | "SUBRC_SAT" 1632 | "SUBHC" | "SUBHC_SAT" 1633 | "SUBXC" | "SUBXC_SAT" 1634 1635 <TRIop-instruction> ::= <TRIop> <maskedDstReg> "," 1636 <vectorSrc> "," <vectorSrc> "," 1637 <vectorSrc> 1638 1639 <TRIop> ::= "MAD" | "MAD_SAT" 1640 | "MADR" | "MADR_SAT" 1641 | "MADH" | "MADH_SAT" 1642 | "MADX" | "MADX_SAT" 1643 | "MADC" | "MADC_SAT" 1644 | "MADRC" | "MADRC_SAT" 1645 | "MADHC" | "MADHC_SAT" 1646 | "MADXC" | "MADXC_SAT" 1647 | "LRP" | "LRP_SAT" 1648 | "LRPR" | "LRPR_SAT" 1649 | "LRPH" | "LRPH_SAT" 1650 | "LRPX" | "LRPX_SAT" 1651 | "LRPC" | "LRPC_SAT" 1652 | "LRPRC" | "LRPRC_SAT" 1653 | "LRPHC" | "LRPHC_SAT" 1654 | "LRPXC" | "LRPXC_SAT" 1655 | "X2D" | "X2D_SAT" 1656 | "X2DR" | "X2DR_SAT" 1657 | "X2DH" | "X2DH_SAT" 1658 | "X2DC" | "X2DC_SAT" 1659 | "X2DRC" | "X2DRC_SAT" 1660 | "X2DHC" | "X2DHC_SAT" 1661 1662 <KILop-instruction> ::= <KILop> <ccMask> 1663 1664 <KILop> ::= "KIL" 1665 1666 <TEXop-instruction> ::= <TEXop> <maskedDstReg> "," 1667 <vectorSrc> "," <texImageId> 1668 1669 <TEXop> ::= "TEX" | "TEX_SAT" 1670 | "TEXC" | "TEXC_SAT" 1671 | "TXP" | "TXP_SAT" 1672 | "TXPC" | "TXPC_SAT" 1673 1674 <TXDop-instruction> ::= <TXDop> <maskedDstReg> "," 1675 <vectorSrc> "," <vectorSrc> "," 1676 <vectorSrc> "," <texImageId> 1677 1678 <TXDop> ::= "TXD" | "TXD_SAT" 1679 | "TXDC" | "TXDC_SAT" 1680 1681 <scalarSrc> ::= <absScalarSrc> 1682 | <baseScalarSrc> 1683 1684 <absScalarSrc> ::= <negate> "|" <baseScalarSrc> "|" 1685 1686 <baseScalarSrc> ::= <signedScalarConstant> 1687 | <negate> <namedScalarConstant> 1688 | <negate> <vectorConstant> <scalarSuffix> 1689 | <negate> <namedLocalParameter> <scalarSuffix> 1690 | <negate> <numberedLocal> <scalarSuffix> 1691 | <negate> <srcRegister> <scalarSuffix> 1692 1693 <vectorSrc> ::= <absVectorSrc> 1694 | <baseVectorSrc> 1695 1696 <absVectorSrc> ::= <negate> "|" <baseVectorSrc> "|" 1697 1698 <baseVectorSrc> ::= <signedScalarConstant> 1699 | <negate> 
<namedScalarConstant> 1700 | <negate> <vectorConstant> <scalarSuffix> 1701 | <negate> <vectorConstant> <swizzleSuffix> 1702 | <negate> <namedLocalParameter> <scalarSuffix> 1703 | <negate> <namedLocalParameter> <swizzleSuffix> 1704 | <negate> <numberedLocal> <scalarSuffix> 1705 | <negate> <numberedLocal> <swizzleSuffix> 1706 | <negate> <srcRegister> <scalarSuffix> 1707 | <negate> <srcRegister> <swizzleSuffix> 1708 1709 <maskedDstReg> ::= <dstRegister> <optionalWriteMask> 1710 <optionalCCMask> 1711 1712 <dstRegister> ::= <fragTempReg> 1713 | <fragOutputReg> 1714 | "RC" 1715 | "HC" 1716 1717 <optionalCCMask> ::= "(" <ccMask> ")" 1718 | "" 1719 1720 <ccMask> ::= <ccMaskRule> <swizzleSuffix> 1721 | <ccMaskRule> <scalarSuffix> 1722 1723 <ccMaskRule> ::= "EQ" | "GE" | "GT" | "LE" | "LT" | "NE" | 1724 "TR" | "FL" 1725 1726 <optionalWriteMask> ::= "" 1727 | "." "x" 1728 | "." "y" 1729 | "." "x" "y" 1730 | "." "z" 1731 | "." "x" "z" 1732 | "." "y" "z" 1733 | "." "x" "y" "z" 1734 | "." "w" 1735 | "." "x" "w" 1736 | "." "y" "w" 1737 | "." "x" "y" "w" 1738 | "." "z" "w" 1739 | "." "x" "z" "w" 1740 | "." "y" "z" "w" 1741 | "." 
"x" "y" "z" "w" 1742 1743 <srcRegister> ::= <fragAttribReg> 1744 | <fragTempReg> 1745 1746 <fragAttribReg> ::= "f" "[" <fragAttribRegId> "]" 1747 1748 <fragAttribRegId> ::= "WPOS" | "COL0" | "COL1" | "FOGC" | "TEX0" 1749 | "TEX1" | "TEX2" | "TEX3" | "TEX4" | "TEX5" 1750 | "TEX6" | "TEX7" 1751 1752 <fragTempReg> ::= <fragF32Reg> 1753 | <fragF16Reg> 1754 1755 <fragF32Reg> ::= "R0" | "R1" | "R2" | "R3" 1756 | "R4" | "R5" | "R6" | "R7" 1757 | "R8" | "R9" | "R10" | "R11" 1758 | "R12" | "R13" | "R14" | "R15" 1759 | "R16" | "R17" | "R18" | "R19" 1760 | "R20" | "R21" | "R22" | "R23" 1761 | "R24" | "R25" | "R26" | "R27" 1762 | "R28" | "R29" | "R30" | "R31" 1763 1764 <fragF16Reg> ::= "H0" | "H1" | "H2" | "H3" 1765 | "H4" | "H5" | "H6" | "H7" 1766 | "H8" | "H9" | "H10" | "H11" 1767 | "H12" | "H13" | "H14" | "H15" 1768 | "H16" | "H17" | "H18" | "H19" 1769 | "H20" | "H21" | "H22" | "H23" 1770 | "H24" | "H25" | "H26" | "H27" 1771 | "H28" | "H29" | "H30" | "H31" 1772 | "H32" | "H33" | "H34" | "H35" 1773 | "H36" | "H37" | "H38" | "H39" 1774 | "H40" | "H41" | "H42" | "H43" 1775 | "H44" | "H45" | "H46" | "H47" 1776 | "H48" | "H49" | "H50" | "H51" 1777 | "H52" | "H53" | "H54" | "H55" 1778 | "H56" | "H57" | "H58" | "H59" 1779 | "H60" | "H61" | "H62" | "H63" 1780 1781 <fragOutputReg> ::= "o" "[" <fragOutputRegName> "]" 1782 1783 <fragOutputRegName> ::= "COLR" | "COLH" | "DEPR" 1784 1785 <numberedLocal> ::= "p" "[" <localNumber> "]" 1786 1787 <localNumber> ::= <integer> from 0 to 1788 MAX_FRAGMENT_PROGRAM_LOCAL_PARAMETERS_NV - 1 1789 1790 <scalarSuffix> ::= "." <component> 1791 1792 <swizzleSuffix> ::= "" 1793 | "." 
                                  <component> <component> <component> <component>

    <component>             ::= "x" | "y" | "z" | "w"

    <texImageId>            ::= <texImageUnit> "," <texImageTarget>

    <texImageUnit>          ::= "TEX0" | "TEX1" | "TEX2" | "TEX3"
                              | "TEX4" | "TEX5" | "TEX6" | "TEX7"
                              | "TEX8" | "TEX9" | "TEX10" | "TEX11"
                              | "TEX12" | "TEX13" | "TEX14" | "TEX15"

    <texImageTarget>        ::= "1D" | "2D" | "3D" | "CUBE" | "RECT"

    <constantDefinition>    ::= "DEFINE" <namedVectorConstant> "="
                                <vectorConstant>
                              | "DEFINE" <namedScalarConstant> "="
                                <scalarConstant>

    <localDeclaration>      ::= "DECLARE" <namedLocalParameter>
                                <optionalLocalValue>

    <optionalLocalValue>    ::= ""
                              | "=" <vectorConstant>
                              | "=" <scalarConstant>

    <vectorConstant>        ::= "{" <vectorConstantList> "}"
                              | <namedVectorConstant>

    <vectorConstantList>    ::= <scalarConstant>
                              | <scalarConstant> "," <scalarConstant>
                              | <scalarConstant> "," <scalarConstant> ","
                                <scalarConstant>
                              | <scalarConstant> "," <scalarConstant> ","
                                <scalarConstant> "," <scalarConstant>

    <scalarConstant>        ::= <signedScalarConstant>
                              | <namedScalarConstant>

    <signedScalarConstant>  ::= <optionalSign> <floatConstant>

    <namedScalarConstant>   ::= <identifier>  ((name of a scalar constant
                                              in a DEFINE instruction))

    <namedVectorConstant>   ::= <identifier>  ((name of a vector constant
                                              in a DEFINE instruction))

    <namedLocalParameter>   ::= <identifier>  ((name of a local parameter
                                              in a DECLARE instruction))

    <negate>                ::= "-" | "+" | ""

    <optionalSign>          ::= "-" | "+" | ""

    <identifier>            ::= see text below

    <floatConstant>         ::= see text below


    The <identifier> rule matches a sequence of one or more letters ("A"
    through "Z", "a" through "z", "_", and "$") and digits ("0" through "9");
    the first character must be a letter.
    The underscore ("_") and dollar sign ("$") count as letters.  Upper and
    lower case letters are different (names are case-sensitive).

    The <floatConstant> rule matches a floating-point constant consisting
    of an integer part, a decimal point, a fraction part, an "e" or
    "E", and an optionally signed integer exponent.  The integer and
    fraction parts both consist of a sequence of one or more digits ("0"
    through "9").  Either the integer part or the fraction part (not
    both) may be missing; either the decimal point or the "e" (or "E")
    and the exponent (not both) may be missing.

    A fragment program fails to load if it contains more than the maximum
    number of executable instructions.  If ARB_fragment_program is supported,
    this limit is the value of MAX_PROGRAM_INSTRUCTIONS_ARB for the
    FRAGMENT_PROGRAM_ARB target.  Otherwise, the limit is 1024.  Executable
    instructions are those matching the <instruction> rule in the grammar, and
    do not include DEFINE or DECLARE instructions.

    A fragment program fails to load if its total temporary and output
    register count exceeds 64.  Each fp32 temporary or output register used by
    the program (R0-R31, o[COLR], and o[DEPR]) counts as two registers; each
    fp16 temporary or output register used by the program (H0-H63 and o[COLH])
    counts as a single register.

    A fragment program fails to load if any instruction sources more than one
    unique fragment attribute register.  Instructions sourcing the same
    attribute register multiple times are acceptable.

    A fragment program fails to load if any instruction sources more than one
    unique program parameter register.  Instructions sourcing the same program
    parameter multiple times are acceptable.

    A fragment program fails to load if multiple texture lookup instructions
    reference different targets for the same texture image unit.
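
    The register-count rule above can be illustrated with a short C sketch.
    It is not part of the extension: the helper names fp_register_cost and
    fp_registers_fit are hypothetical, and the caller is assumed to have
    already counted the distinct fp32 and fp16 temporary and output registers
    that a program uses.

```c
#include <assert.h>
#include <stdbool.h>

/* Register budget stated in the text above: a count above 64 fails to load. */
enum { FP_REGISTER_BUDGET = 64 };

/* Hypothetical helper: cost of a program that uses fp32_regs distinct fp32
 * registers (R0-R31, o[COLR], o[DEPR]) and fp16_regs distinct fp16
 * registers (H0-H63, o[COLH]). */
static int fp_register_cost(int fp32_regs, int fp16_regs)
{
    assert(fp32_regs >= 0 && fp16_regs >= 0);
    return 2 * fp32_regs + fp16_regs;   /* each fp32 register counts twice */
}

/* Hypothetical helper: true if the program passes the register-count check. */
static bool fp_registers_fit(int fp32_regs, int fp16_regs)
{
    return fp_register_cost(fp32_regs, fp16_regs) <= FP_REGISTER_BUDGET;
}
```

    Under this reading, a program using all 32 fp32 temporaries sits exactly
    at the limit, so also writing any output register would push it over.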

    A fragment program fails to load if it writes to both the o[COLR] and
    o[COLH] output registers.

    The error INVALID_OPERATION is generated by LoadProgramNV if a fragment
    program fails to load because it is not syntactically correct or for one
    of the semantic restrictions listed above.

    The error INVALID_OPERATION is generated by LoadProgramNV if a program is
    loaded for id when id is currently loaded with a program of a different
    target.

    A successfully loaded fragment program is parsed into a sequence of
    instructions.  Each instruction is identified by its tokenized name.  The
    operation of these instructions when executed is defined in Sections
    3.11.4 and 3.11.5.


    Section 3.11.4, Fragment Program Operation

    There are forty-five fragment program instructions.  Fragment program
    instructions may have up to sixteen variants, formed by combining a suffix
    of "R", "H", or "X" to specify arithmetic precision (section 3.11.4.2), a
    suffix of "C" to allow an update of the condition code register (section
    3.11.4.4), and a suffix of "_SAT" to clamp the result vector components to
    the range [0,1] (section 3.11.4.4).  For example, the sixteen forms of the
    "ADD" instruction are "ADD", "ADDR", "ADDH", "ADDX", "ADDC", "ADDRC",
    "ADDHC", "ADDXC", "ADD_SAT", "ADDR_SAT", "ADDH_SAT", "ADDX_SAT",
    "ADDC_SAT", "ADDRC_SAT", "ADDHC_SAT", and "ADDXC_SAT".

    Some mathematical instructions that support precision suffixes, typically
    those that involve complicated floating-point computations, do not support
    the "X" precision suffix.

    The fragment program instructions and their respective input and output
    parameters are summarized in Table X.4.

      Instruction          Inputs  Output   Description
      -----------------    ------  ------   --------------------------------
      ADD[RHX][C][_SAT]    v,v     v        add
      COS[RH ][C][_SAT]    s       ssss     cosine
      DDX[RH ][C][_SAT]    v       v        derivative relative to x
      DDY[RH ][C][_SAT]    v       v        derivative relative to y
      DP3[RHX][C][_SAT]    v,v     ssss     3-component dot product
      DP4[RHX][C][_SAT]    v,v     ssss     4-component dot product
      DST[RH ][C][_SAT]    v,v     v        distance vector
      EX2[RH ][C][_SAT]    s       ssss     exponential base 2
      FLR[RHX][C][_SAT]    v       v        floor
      FRC[RHX][C][_SAT]    v       v        fraction
      KIL                  none    none     conditionally discard fragment
      LG2[RH ][C][_SAT]    s       ssss     logarithm base 2
      LIT[RH ][C][_SAT]    v       v        compute light coefficients
      LRP[RHX][C][_SAT]    v,v,v   v        linear interpolation
      MAD[RHX][C][_SAT]    v,v,v   v        multiply and add
      MAX[RHX][C][_SAT]    v,v     v        maximum
      MIN[RHX][C][_SAT]    v,v     v        minimum
      MOV[RHX][C][_SAT]    v       v        move
      MUL[RHX][C][_SAT]    v,v     v        multiply
      PK2H                 v       ssss     pack two 16-bit floats
      PK2US                v       ssss     pack two unsigned 16-bit scalars
      PK4B                 v       ssss     pack four signed 8-bit scalars
      PK4UB                v       ssss     pack four unsigned 8-bit scalars
      POW[RH ][C][_SAT]    s,s     ssss     exponentiation (x^y)
      RCP[RH ][C][_SAT]    s       ssss     reciprocal
      RFL[RH ][C][_SAT]    v,v     v        reflection vector
      RSQ[RH ][C][_SAT]    s       ssss     reciprocal square root
      SEQ[RHX][C][_SAT]    v,v     v        set on equal
      SFL[RHX][C][_SAT]    v,v     v        set on false
      SGE[RHX][C][_SAT]    v,v     v        set on greater than or equal
      SGT[RHX][C][_SAT]    v,v     v        set on greater than
      SIN[RH ][C][_SAT]    s       ssss     sine
      SLE[RHX][C][_SAT]    v,v     v        set on less than or equal
      SLT[RHX][C][_SAT]    v,v     v        set on less than
      SNE[RHX][C][_SAT]    v,v     v        set on not equal
      STR[RHX][C][_SAT]    v,v     v        set on true
      SUB[RHX][C][_SAT]    v,v     v        subtract
      TEX[C][_SAT]         v       v        texture lookup
      TXD[C][_SAT]         v,v,v   v        texture lookup w/partials
      TXP[C][_SAT]         v       v        projective texture lookup
      UP2H[C][_SAT]        s       v        unpack two 16-bit floats
      UP2US[C][_SAT]       s       v        unpack two unsigned 16-bit scalars
      UP4B[C][_SAT]        s       v        unpack four signed 8-bit scalars
      UP4UB[C][_SAT]       s       v        unpack four unsigned 8-bit scalars
      X2D[RH ][C][_SAT]    v,v,v   v        2D coordinate transformation

    Table X.4:  Summary of fragment program instructions.  "[RHX]" indicates
    an optional arithmetic precision suffix.  "[C]" indicates an optional
    condition code update suffix.  "[_SAT]" indicates an optional clamp of
    result vector components to [0,1].  "v" indicates a 4-component vector
    input or output, "s" indicates a scalar input, and "ssss" indicates a
    scalar output replicated across a 4-component vector.


    Section 3.11.4.1:  Fragment Program Storage Precision

    Registers in fragment programs are stored in two different
    representations: 16-bit floating-point (fp16) and 32-bit floating-point
    (fp32).  There is an additional 12-bit fixed-point representation (fx12)
    used only as an internal representation for instructions with the "X"
    precision qualifier.

    In the 32-bit float (fp32) representation, each component is represented
    in floating-point with eight exponent and twenty-three mantissa bits, as
    in the standard IEEE single-precision format.  If S represents the sign (0
    or 1), E represents the exponent in the range [0,255], and M represents
    the mantissa in the range [0,2^23-1], then an fp32 float is decoded as:

        (-1)^S * 0.0,                        if E == 0,
        (-1)^S * 2^(E-127) * (1 + M/2^23),   if 0 < E < 255,
        (-1)^S * INF,                        if E == 255 and M == 0,
        NaN,                                 if E == 255 and M != 0.

    INF (Infinity) is a special representation indicating numerical overflow.
    NaN (Not a Number) is a special representation indicating the result of
    illegal arithmetic operations, such as computing the square root or
    logarithm of a negative number.  Note that all normal fp32 values, zero,
    and INF have an associated sign.
    -0.0 and +0.0 are considered equivalent for the purposes of comparisons.

    This representation is identical to the IEEE single-precision
    floating-point standard, except that no special representation is provided
    for denorms -- numbers in the range (-2^-126, +2^-126).  All such numbers
    are flushed to zero.

    In a 16-bit float (fp16) register, each component is represented
    similarly, except with only five exponent and ten mantissa bits.  If S
    represents the sign (0 or 1), E represents the exponent in the range
    [0,31], and M represents the mantissa in the range [0,2^10-1], then an
    fp16 float is decoded as:

        (-1)^S * 0.0,                        if E == 0 and M == 0,
        (-1)^S * 2^-14 * M/2^10,             if E == 0 and M != 0,
        (-1)^S * 2^(E-15) * (1 + M/2^10),    if 0 < E < 31,
        (-1)^S * INF,                        if E == 31 and M == 0, or
        NaN,                                 if E == 31 and M != 0.

    One important difference is that the fp16 representation, unlike fp32,
    supports denorms to maximize the limited precision of the 16-bit floating
    point encodings.

    In the 12-bit fixed-point (fx12) format, numbers are represented as signed
    12-bit two's complement integers with 10 fraction bits.  The range of
    representable values is [-2048/1024, +2047/1024].

    Section 3.11.4.2:  Fragment Program Operation Precision

    Fragment program instructions frequently perform mathematical operations.
    Such operations may be performed at one of three different precisions.
    Fragment programs can specify the precision of each instruction by using
    the precision suffix.  If an instruction has a suffix of "R", calculations
    are carried out with 32-bit floating point operands and results.  If an
    instruction has a suffix of "H", calculations are carried out using 16-bit
    floating point operands and results.  If an instruction has a suffix of
    "X", calculations are carried out using 12-bit fixed point operands and
    results.
    For example, the instruction "MULR" performs a 32-bit floating-point
    multiply, "MULH" performs a 16-bit floating-point multiply, and "MULX"
    performs a 12-bit fixed-point multiply.  If no precision suffix is
    specified, calculations are carried out using the precision of the
    temporary register receiving the result.

    Fragment program instructions may source registers or constants whose
    precisions differ from the precision specified with the instruction.
    Instructions may also generate intermediate results with a different
    precision than that of the destination register.  In these cases, the
    values sourced are converted to the precision specified by the
    instruction.

    When converting to fx12 format, -INF and any values less than -2048/1024
    become -2048/1024.  +INF and any values greater than +2047/1024 become
    +2047/1024.  NaN becomes 0.

    When converting to fp16 format, any values less than or equal to -2^16 are
    converted to -INF.  Any values greater than or equal to +2^16 are
    converted to +INF.  -INF, +INF, NaN, -0.0, and +0.0 are unchanged.  Any
    other values that are not exactly representable in fp16 format are
    converted to one of the two nearest representable values.

    When converting to fp32 format, any values less than or equal to -2^128
    are converted to -INF.  Any values greater than or equal to +2^128 are
    converted to +INF.  -INF, +INF, NaN, -0.0, and +0.0 are unchanged.  Any
    other values that are not exactly representable in fp32 format are
    converted to one of the two nearest representable values.

    Fragment program instructions using the fragment attribute registers
    f[FOGC] or f[TEX0] through f[TEX7] will be carried out at full fp32
    precision, regardless of the precision specified by the instruction.
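
    The fp16 decoding rules of Section 3.11.4.1 and the fx12 conversion rules
    above can be illustrated with a short C sketch.  It is not part of the
    extension: the names pow2, fp16_decode, and fx12_from_double are
    hypothetical, the fx12 value is returned as a raw integer count of 1/1024
    steps, and rounding in-range values to the nearest step is an assumption,
    since the text does not specify how such values are rounded.

```c
#include <assert.h>
#include <math.h>      /* INFINITY and NAN macros only */
#include <stdint.h>

/* 2^n for small integer n, computed exactly without calling into libm. */
static double pow2(int n)
{
    double r = 1.0;
    for (; n > 0; n--) r *= 2.0;
    for (; n < 0; n++) r *= 0.5;
    return r;
}

/* Decode a 16-bit pattern using the fp16 rules: 1 sign bit, 5 exponent
 * bits, 10 mantissa bits, with denorms supported. */
static double fp16_decode(uint16_t bits)
{
    int s = (bits >> 15) & 0x1;
    int e = (bits >> 10) & 0x1F;
    int m = bits & 0x3FF;
    double sign = s ? -1.0 : 1.0;

    if (e == 0)                 /* signed zero (m == 0) or a denorm */
        return sign * pow2(-14) * (m / 1024.0);
    if (e == 31)                /* INF (m == 0) or NaN (m != 0) */
        return (m == 0) ? sign * INFINITY : NAN;
    return sign * pow2(e - 15) * (1.0 + m / 1024.0);
}

/* Convert a value to fx12, returned as an integer in [-2048, 2047] that
 * represents value * 1024.  Out-of-range inputs clamp and NaN becomes 0,
 * as stated in the text; nearest-step rounding is an assumption. */
static int fx12_from_double(double v)
{
    double scaled;

    if (v != v) return 0;                        /* NaN compares unequal to itself */
    if (v < -2048.0 / 1024.0) return -2048;      /* includes -INF */
    if (v > 2047.0 / 1024.0) return 2047;        /* includes +INF */
    scaled = v * 1024.0;
    return (scaled >= 0.0) ? (int)(scaled + 0.5) : -(int)(-scaled + 0.5);
}
```

    For instance, the bit pattern 0x7BFF decodes to 65504, the largest finite
    fp16 value, and converting that value to fx12 clamps it to 2047, that is,
    +2047/1024.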

    Section 3.11.4.3:  Fragment Program Operands

    Except for KIL, fragment program instructions operate on either vector or
    scalar operands, indicated in the grammar (see section 3.11.3) by the
    rules <vectorSrc> and <scalarSrc> respectively.

    The basic set of scalar operands is defined by the grammar rule
    <baseScalarSrc>.  Scalar operands can be scalar constants (embedded or
    named), or single components of vector constants, local parameters, or
    registers allowed by the <srcRegister> rule.  A vector component is
    selected by the <scalarSuffix> rule, where the characters "x", "y", "z",
    and "w" select the x, y, z, and w components, respectively, of the vector.

    The basic set of vector operands is defined by the grammar rule
    <baseVectorSrc>.  Vector operands can include vector constants, local
    parameters, or registers allowed by the <srcRegister> rule.

    Basic vector operands can be swizzled according to the <swizzleSuffix>
    rule.  In its most general form, the <swizzleSuffix> rule matches the
    pattern ".????" where each question mark is one of "x", "y", "z", or "w".
    For such patterns, the x, y, z, and w components of the operand are taken
    from the vector components named by the first, second, third, and fourth
    character of the pattern, respectively.  For example, if the swizzle
    suffix is ".yzzx" and the specified source contains {2,8,9,0}, the
    swizzled operand used by the instruction is {8,9,9,2}.  If the
    <swizzleSuffix> rule matches "", it is treated as though it were ".xyzw".

    Operands can optionally be negated according to the <negate> rule in
    <baseScalarSrc> or <baseVectorSrc>.  If the <negate> matches "-", each
    value is negated.

    The absolute value of operands can be taken if the <vectorSrc> or
    <scalarSrc> rules match <absScalarSrc> or <absVectorSrc>.  In this case,
    the absolute value of each component is taken.
    In addition, if the <negate> rule in <absScalarSrc> or <absVectorSrc>
    matches "-", the result is then negated.

    Instructions requiring vector operands can also use scalar operands in the
    case where the <vectorSrc> rule matches <scalarSrc>.  In such cases, a
    4-component vector is produced by replicating the scalar.

    After operands are loaded, they are converted to a data type corresponding
    to the operation precision specified in the fragment program instruction.

    The following pseudo-code spells out the operand generation process.
    "SrcT" and "InstT" refer to the data types of the specified register or
    constant and the instruction, respectively.  "VecSrcT" and "VecInstT"
    refer to 4-component vectors of the corresponding type.  "absolute" is
    TRUE if the operand matches the <absScalarSrc> or <absVectorSrc> rules,
    and FALSE otherwise.  "negateBase" is TRUE if the <negate> rule in
    <baseScalarSrc> or <baseVectorSrc> matches "-" and FALSE otherwise.
    "negateAbs" is TRUE if the <negate> rule in <absScalarSrc> or
    <absVectorSrc> matches "-" and FALSE otherwise.  The ".c***", ".*c**",
    ".**c*", ".***c" modifiers refer to the x, y, z, and w components obtained
    by the swizzle operation.  TypeConvert() is assumed to convert a scalar of
    type SrcT to a scalar of type InstT using the type conversion process
    specified above.

        VecInstT VectorLoad(VecSrcT source)
        {
            VecSrcT srcVal;
            VecInstT convertedVal;

            srcVal.x = source.c***;
            srcVal.y = source.*c**;
            srcVal.z = source.**c*;
            srcVal.w = source.***c;
            if (negateBase) {
                srcVal.x = -srcVal.x;
                srcVal.y = -srcVal.y;
                srcVal.z = -srcVal.z;
                srcVal.w = -srcVal.w;
            }
            if (absolute) {
                srcVal.x = abs(srcVal.x);
                srcVal.y = abs(srcVal.y);
                srcVal.z = abs(srcVal.z);
                srcVal.w = abs(srcVal.w);
            }
            if (negateAbs) {
                srcVal.x = -srcVal.x;
                srcVal.y = -srcVal.y;
                srcVal.z = -srcVal.z;
                srcVal.w = -srcVal.w;
            }

            convertedVal.x = TypeConvert(srcVal.x);
            convertedVal.y = TypeConvert(srcVal.y);
            convertedVal.z = TypeConvert(srcVal.z);
            convertedVal.w = TypeConvert(srcVal.w);
            return convertedVal;
        }

        InstT ScalarLoad(VecSrcT source)
        {
            SrcT srcVal;
            InstT convertedVal;

            srcVal = source.c***;
            if (negateBase) {
                srcVal = -srcVal;
            }
            if (absolute) {
                srcVal = abs(srcVal);
            }
            if (negateAbs) {
                srcVal = -srcVal;
            }

            convertedVal = TypeConvert(srcVal);
            return convertedVal;
        }


    Section 3.11.4.4, Fragment Program Destination Register Update

    Each fragment program instruction, except for KIL, writes a 4-component
    result vector to a single temporary or output register.

    The four components of the result vector are first optionally clamped to
    the range [0,1].  The components will be clamped if and only if the result
    clamp suffix "_SAT" is present in the instruction name.  The instruction
    "ADD_SAT" will clamp the results to [0,1]; the otherwise equivalent
    instruction "ADD" will not.

    Since the instruction may be carried out at a different precision than the
    destination register, the components of the result vector are then
    converted to the data type corresponding to the destination register.

    Writes to individual components of the destination register are controlled
    by two sets of enables: individual component write masks specified as part
    of the instruction and the optional condition code mask.

    The component write mask is specified by the <optionalWriteMask> rule
    found in the <maskedDstReg> rule.  If the optional mask is "", all
    components are enabled.  Otherwise, the optional mask names the individual
    components to enable.  The characters "x", "y", "z", and "w" match the x,
    y, z, and w components respectively.  For example, an optional mask of
    ".xzw" indicates that the x, z, and w components should be enabled for
    writing but the y component should not.  The grammar requires that the
    destination register mask components be listed in "xyzw" order.

    The optional condition code mask is specified by the <optionalCCMask> rule
    found in the <maskedDstReg> rule.  If <optionalCCMask> matches "", all
    components are enabled.  Otherwise, the condition code register is loaded
    and swizzled according to the swizzling specified by <swizzleSuffix>.
    Each component of the swizzled condition code is tested according to the
    rule given by <ccMaskRule>.  <ccMaskRule> may have the values "EQ", "NE",
    "LT", "GE", "LE", or "GT", which mean to enable writes if the corresponding
    condition code field evaluates to equal, not equal, less than, greater
    than or equal, less than or equal, or greater than, respectively.
    Comparisons involving condition codes of "UN" (unordered) evaluate to true
    for "NE" and false otherwise.
    For example, if the condition code is (GT,LT,EQ,GT) and the condition code
    mask is "(NE.zyxw)", the swizzle operation will load (EQ,LT,GT,GT) and the
    mask will thus enable writes on the y, z, and w components.  In addition,
    "TR" always enables writes and "FL" always disables writes, regardless of
    the condition code.

    Each component of the destination register is updated with the result of
    the fragment program if and only if the component is enabled for writes by
    both the component write mask and the optional condition code mask.
    Otherwise, the component of the destination register remains unchanged.

    A fragment program instruction can also optionally update the condition
    code register.  The condition code is updated if the condition code
    register update suffix "C" is present in the instruction name.  The
    instruction "ADDC" will update the condition code; the otherwise
    equivalent instruction "ADD" will not.  If condition code updates are
    enabled, each component of the destination register enabled for writes is
    compared to zero.  The corresponding component of the condition code is
    set to "LT", "EQ", or "GT", if the written component is less than, equal
    to, or greater than zero, respectively.  Condition code components are set
    to "UN" if the written component is NaN.  Note that values of -0.0 and
    +0.0 both evaluate to "EQ".  If a component of the destination register is
    not enabled for writes, the corresponding condition code component is
    unchanged.

    In the following example code,

        # R1=(-2, 0, 2, NaN)              R0                  CC
        MOVC R0, R1;               # ( -2,   0,   2, NaN) (LT,EQ,GT,UN)
        MOVC R0.xyz, R1.yzwx;      # (  0,   2, NaN, NaN) (EQ,GT,UN,UN)
        MOVC R0 (NE), R1.zywx;     # (  0,   0, NaN,  -2) (EQ,EQ,UN,LT)

    the first instruction writes (-2,0,2,NaN) to R0 and updates the condition
    code to (LT,EQ,GT,UN).
    In the second instruction, only the "x", "y", and "z"
    components of R0 and the condition code are updated, so R0 ends up with
    (0,2,NaN,NaN) and the condition code ends up with (EQ,GT,UN,UN). In the
    third instruction, the condition code mask disables writes to the x
    component (its condition code field is "EQ"), so R0 ends up with
    (0,0,NaN,-2) and the condition code ends up with (EQ,EQ,UN,LT).

    The following pseudocode illustrates the process of writing a result
    vector to the destination register. In the example, "ccMaskRule" refers
    to the condition code mask rule given by <ccMaskRule> (or "" if no rule is
    specified), "instrMask" refers to the component write mask given by the
    <optionalWriteMask> rule, "updatecc" is TRUE if condition code updates are
    enabled, and "clamp01" is TRUE if [0,1] result clamping is enabled.
    "destination" and "cc" refer to the register selected by <dstRegister> and
    the condition code, respectively.

      boolean TestCC(CondCode field) {
          switch (ccMaskRule) {
          case "EQ": return (field == "EQ");
          case "NE": return (field != "EQ");
          case "LT": return (field == "LT");
          case "GE": return (field == "GT" || field == "EQ");
          case "LE": return (field == "LT" || field == "EQ");
          case "GT": return (field == "GT");
          case "TR": return TRUE;
          case "FL": return FALSE;
          case "":   return TRUE;
          }
      }

      enum GenerateCC(DstT value) {
          if (value == NaN) {
              return UN;
          } else if (value < 0) {
              return LT;
          } else if (value == 0) {
              return EQ;
          } else {
              return GT;
          }
      }

      void UpdateDestination(VecDstT destination, VecInstT result)
      {
          // Temporaries for the converted result and for the merged
          // destination register and condition code.
          VecDstT resultDst;
          VecDstT merged;
          VecCC mergedCC;

          // Clamp the result vector components to [0,1], if requested.
          if (clamp01) {
              if (result.x < 0) result.x = 0;
              else if (result.x > 1) result.x = 1;
              if (result.y < 0) result.y = 0;
              else if (result.y > 1) result.y = 1;
              if (result.z < 0) result.z = 0;
              else if (result.z > 1) result.z = 1;
              if (result.w < 0) result.w = 0;
              else if (result.w > 1) result.w = 1;
          }

          // Convert the result to the type of the destination register.
          resultDst.x = TypeConvert(result.x);
          resultDst.y = TypeConvert(result.y);
          resultDst.z = TypeConvert(result.z);
          resultDst.w = TypeConvert(result.w);

          // Merge the converted result into the destination register, under
          // control of the compile- and run-time write masks.
          merged = destination;
          mergedCC = cc;
          if (instrMask.x && TestCC(cc.c***)) {
              merged.x = resultDst.x;
              if (updatecc) mergedCC.x = GenerateCC(resultDst.x);
          }
          if (instrMask.y && TestCC(cc.*c**)) {
              merged.y = resultDst.y;
              if (updatecc) mergedCC.y = GenerateCC(resultDst.y);
          }
          if (instrMask.z && TestCC(cc.**c*)) {
              merged.z = resultDst.z;
              if (updatecc) mergedCC.z = GenerateCC(resultDst.z);
          }
          if (instrMask.w && TestCC(cc.***c)) {
              merged.w = resultDst.w;
              if (updatecc) mergedCC.w = GenerateCC(resultDst.w);
          }

          // Write out the new destination register and condition code.
          destination = merged;
          cc = mergedCC;
      }

    Section 3.11.5, Fragment Program Instruction Set

    The following sections describe the instruction set available to fragment
    programs.


    Section 3.11.5.1, ADD: Add

    The ADD instruction performs a component-wise add of the two operands to
    yield a result vector.

      tmp0 = VectorLoad(op0);
      tmp1 = VectorLoad(op1);
      result.x = tmp0.x + tmp1.x;
      result.y = tmp0.y + tmp1.y;
      result.z = tmp0.z + tmp1.z;
      result.w = tmp0.w + tmp1.w;

    The following special-case rules apply to addition:

      1. "A+B" is always equivalent to "B+A".
      2. NaN + <x> = NaN, for all <x>.
      3. +INF + <x> = +INF, for all <x> except NaN and -INF.
      4. -INF + <x> = -INF, for all <x> except NaN and +INF.
      5. +INF + -INF = NaN.
      6. -0.0 + <x> = <x>, for all <x>.
      7. +0.0 + <x> = <x>, for all <x> except -0.0.


    Section 3.11.5.2, COS: Cosine

    The COS instruction approximates the cosine of the angle specified by the
    scalar operand and replicates the approximation to all four components of
    the result vector. The angle is specified in radians and does not have to
    be in the range [0,2*PI].

      tmp = ScalarLoad(op0);
      result.x = ApproxCosine(tmp);
      result.y = ApproxCosine(tmp);
      result.z = ApproxCosine(tmp);
      result.w = ApproxCosine(tmp);

    The approximation function ApproxCosine is accurate to at least 22 bits
    with an angle in the range [0,2*PI].

      | ApproxCosine(x) - cos(x) | < 1.0 / 2^22, if 0.0 <= x < 2.0 * PI.

    The error in the approximation will typically increase with the absolute
    value of the angle when the angle falls outside the range [0,2*PI].

    The following special-case rules apply to cosine approximation:

      1. ApproxCosine(NaN) = NaN.
      2. ApproxCosine(+/-INF) = NaN.
      3. ApproxCosine(+/-0.0) = +1.0.


    Section 3.11.5.3, DDX: Derivative Relative to X

    The DDX instruction computes approximate partial derivatives of the four
    components of the single operand with respect to the X window coordinate
    to yield a result vector. The partial derivative is evaluated at the
    center of the pixel.

      f = VectorLoad(op0);
      result = ComputePartialX(f);

    Note that the partial derivatives obtained by this instruction are
    approximate, and derivative-of-derivative instruction sequences may not
    yield accurate second derivatives.

    For components with partial derivatives that overflow (including +/-INF
    inputs), the resulting partials may be encoded as large floating-point
    numbers instead of +/-INF.


    Section 3.11.5.4, DDY: Derivative Relative to Y

    The DDY instruction computes approximate partial derivatives of the four
    components of the single operand with respect to the Y window coordinate
    to yield a result vector. The partial derivative is evaluated at the
    center of the pixel.

      f = VectorLoad(op0);
      result = ComputePartialY(f);

    Note that the partial derivatives obtained by this instruction are
    approximate, and derivative-of-derivative instruction sequences may not
    yield accurate second derivatives.

    For components with partial derivatives that overflow (including +/-INF
    inputs), the resulting partials may be encoded as large floating-point
    numbers instead of +/-INF.


    Section 3.11.5.5, DP3: 3-Component Dot Product

    The DP3 instruction computes a three-component dot product of the two
    operands (using the x, y, and z components) and replicates the dot product
    to all four components of the result vector.

      tmp0 = VectorLoad(op0);
      tmp1 = VectorLoad(op1);
      result.x = (tmp0.x * tmp1.x) + (tmp0.y * tmp1.y) +
                 (tmp0.z * tmp1.z);
      result.y = (tmp0.x * tmp1.x) + (tmp0.y * tmp1.y) +
                 (tmp0.z * tmp1.z);
      result.z = (tmp0.x * tmp1.x) + (tmp0.y * tmp1.y) +
                 (tmp0.z * tmp1.z);
      result.w = (tmp0.x * tmp1.x) + (tmp0.y * tmp1.y) +
                 (tmp0.z * tmp1.z);


    Section 3.11.5.6, DP4: 4-Component Dot Product

    The DP4 instruction computes a four-component dot product of the two
    operands and replicates the dot product to all four components of the
    result vector.

      tmp0 = VectorLoad(op0);
      tmp1 = VectorLoad(op1);
      result.x = (tmp0.x * tmp1.x) + (tmp0.y * tmp1.y) +
                 (tmp0.z * tmp1.z) + (tmp0.w * tmp1.w);
      result.y = (tmp0.x * tmp1.x) + (tmp0.y * tmp1.y) +
                 (tmp0.z * tmp1.z) + (tmp0.w * tmp1.w);
      result.z = (tmp0.x * tmp1.x) + (tmp0.y * tmp1.y) +
                 (tmp0.z * tmp1.z) + (tmp0.w * tmp1.w);
      result.w = (tmp0.x * tmp1.x) + (tmp0.y * tmp1.y) +
                 (tmp0.z * tmp1.z) + (tmp0.w * tmp1.w);


    Section 3.11.5.7, DST: Distance Vector

    The DST instruction computes a distance vector from two specially-
    formatted operands. The first operand should be of the form [NA, d^2,
    d^2, NA] and the second operand should be of the form [NA, 1/d, NA, 1/d],
    where NA values are not relevant to the calculation and d is a vector
    length. If both vectors satisfy these conditions, the result vector will
    be of the form [1.0, d, d^2, 1/d].

    The exact behavior is specified in the following pseudo-code:

      tmp0 = VectorLoad(op0);
      tmp1 = VectorLoad(op1);
      result.x = 1.0;
      result.y = tmp0.y * tmp1.y;
      result.z = tmp0.z;
      result.w = tmp1.w;

    Given an arbitrary vector, d^2 can be obtained using the DP3 instruction
    (using the same vector for both operands) and 1/d can be obtained from d^2
    using the RSQ instruction.

    This distance vector is useful for per-fragment light attenuation
    calculations: a DP3 operation involving the distance vector and an
    attenuation constants vector will yield the attenuation factor.


    Section 3.11.5.8, EX2: Exponential Base 2

    The EX2 instruction approximates 2 raised to the power of the scalar
    operand and replicates it to all four components of the result
    vector.

      tmp = ScalarLoad(op0);
      result.x = Approx2ToX(tmp);
      result.y = Approx2ToX(tmp);
      result.z = Approx2ToX(tmp);
      result.w = Approx2ToX(tmp);

    The approximation function is accurate to at least 22 bits:

      | Approx2ToX(x) - 2^x | < 1.0 / 2^22, if 0.0 <= x < 1.0,

    and, in general,

      | Approx2ToX(x) - 2^x | < (1.0 / 2^22) * (2^floor(x)).

    The following special-case rules apply to exponential approximation:

      1. Approx2ToX(NaN) = NaN.
      2. Approx2ToX(-INF) = +0.0.
      3. Approx2ToX(+INF) = +INF.
      4. Approx2ToX(+/-0.0) = +1.0.


    Section 3.11.5.9, FLR: Floor

    The FLR instruction performs a component-wise floor operation on the
    operand to generate a result vector. The floor of a value is defined as
    the largest integer less than or equal to the value. The floor of 2.3 is
    2.0; the floor of -3.6 is -4.0.

      tmp = VectorLoad(op0);
      result.x = floor(tmp.x);
      result.y = floor(tmp.y);
      result.z = floor(tmp.z);
      result.w = floor(tmp.w);

    The following special-case rules apply to floor computation:

      1. floor(NaN) = NaN.
      2. floor(<x>) = <x>, for -0.0, +0.0, -INF, and +INF. In all cases, the
         sign of the result is equal to the sign of the operand.


    Section 3.11.5.10, FRC: Fraction

    The FRC instruction extracts the fractional portion of each component of
    the operand to generate a result vector. The fractional portion of a
    component is defined as the result after subtracting off the floor of the
    component (see FLR), and is always in the range [0.00, 1.00).

    For negative values, the fractional portion is NOT the number written to
    the right of the decimal point -- the fractional portion of -1.7 is not
    0.7 -- it is 0.3. 0.3 is produced by subtracting the floor of -1.7 (-2.0)
    from -1.7.
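
    The rule for negative values can be checked numerically. The following
    Python sketch is illustrative only: the helper names frc and frc_vec are
    hypothetical, and host double-precision arithmetic is assumed to stand in
    for the fragment program data types. The special-case handling follows
    the rules listed later in this section.

```python
import math

def frc(x):
    """Scalar fraction in the FRC sense: x minus floor(x)."""
    # NaN and +/-INF have no meaningful fraction; the special-case rules
    # for FRC give NaN for both.
    if math.isnan(x) or math.isinf(x):
        return math.nan
    # Adding +0.0 turns a -0.0 difference into +0.0, so frc(-0.0) is +0.0.
    return (x - math.floor(x)) + 0.0

def frc_vec(v):
    """Apply frc to each component of a 4-component vector (hypothetical helper)."""
    return tuple(frc(c) for c in v)
```

    With these definitions, frc(-1.7) is approximately 0.3 rather than 0.7,
    and frc(2.25) is 0.25.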

      tmp = VectorLoad(op0);
      result.x = tmp.x - floor(tmp.x);
      result.y = tmp.y - floor(tmp.y);
      result.z = tmp.z - floor(tmp.z);
      result.w = tmp.w - floor(tmp.w);

    The following special-case rules, which can be derived from the rules for
    FLR and ADD, apply to fraction computation:

      1. fraction(NaN) = NaN.
      2. fraction(+/-INF) = NaN.
      3. fraction(+/-0.0) = +0.0.


    Section 3.11.5.11, KIL: Conditionally Discard Fragment

    The KIL instruction is unlike any other instruction in the instruction
    set. This instruction evaluates components of a swizzled condition code
    using a test expression identical to that used to evaluate condition code
    write masks (Section 3.11.4.4). If any condition code component evaluates
    to TRUE, the fragment is discarded. Otherwise, the instruction has no
    effect. The condition code components are specified, swizzled, and
    evaluated in the same manner as the condition code write mask.

      if (TestCC(cc.c***) || TestCC(cc.*c**) ||
          TestCC(cc.**c*) || TestCC(cc.***c)) {
          // Discard the fragment.
      } else {
          // Do nothing.
      }

    If the fragment is discarded, it is treated as though it were not produced
    by rasterization. In particular, none of the per-fragment operations
    (such as stencil tests, blends, stencil, depth, or color buffer writes)
    are performed on the fragment.


    Section 3.11.5.12, LG2: Logarithm Base 2

    The LG2 instruction approximates the base 2 logarithm of the scalar
    operand and replicates it to all four components of the result vector.

      tmp = ScalarLoad(op0);
      result.x = ApproxLog2(tmp);
      result.y = ApproxLog2(tmp);
      result.z = ApproxLog2(tmp);
      result.w = ApproxLog2(tmp);

    The approximation function is accurate to at least 22 bits:

      | ApproxLog2(x) - log_2(x) | < 1.0 / 2^22.

    Note that for large values of x, there are not enough bits in the
    floating-point storage format to represent a result that precisely.

    The following special-case rules apply to logarithm approximation:

      1. ApproxLog2(NaN) = NaN.
      2. ApproxLog2(+INF) = +INF.
      3. ApproxLog2(+/-0.0) = -INF.
      4. ApproxLog2(x) = NaN, -INF < x < -0.0.
      5. ApproxLog2(-INF) = NaN.


    Section 3.11.5.13, LIT: Compute Light Coefficients

    The LIT instruction accelerates per-fragment lighting by computing
    lighting coefficients for ambient, diffuse, and specular light
    contributions. The "x" component of the operand is assumed to hold a
    diffuse dot product (n dot VP_pli, as in the vertex lighting equations in
    Section 2.13.1). The "y" component of the operand is assumed to hold a
    specular dot product (n dot h_i). The "w" component of the operand is
    assumed to hold the specular exponent of the material (s_rm).

    The "x" component of the result vector receives the value that should be
    multiplied by the ambient light/material product (always 1.0). The "y"
    component of the result vector receives the value that should be
    multiplied by the diffuse light/material product (n dot VP_pli). The "z"
    component of the result vector receives the value that should be
    multiplied by the specular light/material product (f_i * (n dot h_i) ^
    s_rm). The "w" component of the result is the constant 1.0.

    Negative diffuse and specular dot products are clamped to 0.0, as is done
    in the standard per-vertex lighting operations. In addition, if the
    diffuse dot product is zero or negative, the specular coefficient is
    forced to zero.

      t = VectorLoad(op0);
      if (t.x < 0) t.x = 0;
      if (t.y < 0) t.y = 0;
      result.x = 1.0;
      result.y = t.x;
      result.z = (t.x > 0) ?
                 ApproxPower(t.y, t.w) : 0.0;
      result.w = 1.0;

    The exponentiation approximation used to compute result.z is identical to
    that used in the POW instruction, including errors and the processing of
    any special cases.


    Section 3.11.5.14, LRP: Linear Interpolation

    The LRP instruction performs a component-wise linear interpolation to
    yield a result vector. It interpolates between the components of the
    second and third operands, using the first operand as a weight.

      tmp0 = VectorLoad(op0);
      tmp1 = VectorLoad(op1);
      tmp2 = VectorLoad(op2);
      result.x = tmp0.x * tmp1.x + (1 - tmp0.x) * tmp2.x;
      result.y = tmp0.y * tmp1.y + (1 - tmp0.y) * tmp2.y;
      result.z = tmp0.z * tmp1.z + (1 - tmp0.z) * tmp2.z;
      result.w = tmp0.w * tmp1.w + (1 - tmp0.w) * tmp2.w;


    Section 3.11.5.15, MAD: Multiply and Add

    The MAD instruction performs a component-wise multiply of the first two
    operands, and then does a component-wise add of the product to the third
    operand to yield a result vector.

      tmp0 = VectorLoad(op0);
      tmp1 = VectorLoad(op1);
      tmp2 = VectorLoad(op2);
      result.x = tmp0.x * tmp1.x + tmp2.x;
      result.y = tmp0.y * tmp1.y + tmp2.y;
      result.z = tmp0.z * tmp1.z + tmp2.z;
      result.w = tmp0.w * tmp1.w + tmp2.w;


    Section 3.11.5.16, MAX: Maximum

    The MAX instruction computes component-wise maximums of the values in the
    two operands to yield a result vector.

      tmp0 = VectorLoad(op0);
      tmp1 = VectorLoad(op1);
      result.x = max(tmp0.x, tmp1.x);
      result.y = max(tmp0.y, tmp1.y);
      result.z = max(tmp0.z, tmp1.z);
      result.w = max(tmp0.w, tmp1.w);

    The following special cases apply to the maximum operation:

      1. max(A,B) is always equivalent to max(B,A).
      2. max(NaN, <x>) == NaN, for all <x>.
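
    The two MAX special cases are easy to get wrong when emulating the
    instruction on a host CPU, because many host max() routines are
    order-dependent when one argument is NaN. The following Python sketch
    shows one way to honor both rules; the names nv_max and nv_max_vec are
    hypothetical and host floats stand in for the fragment program data
    types.

```python
import math

def nv_max(a, b):
    """Scalar maximum that propagates NaN from either operand."""
    # Rule 2: a NaN in either position yields NaN, which also keeps the
    # operation symmetric as required by rule 1.
    if math.isnan(a) or math.isnan(b):
        return math.nan
    return a if a > b else b

def nv_max_vec(v0, v1):
    """Component-wise maximum of two 4-component vectors."""
    return tuple(nv_max(a, b) for a, b in zip(v0, v1))
```

    Python's built-in max() would not serve here: max(1.0, NaN) returns 1.0
    while max(NaN, 1.0) returns NaN, which violates rule 1.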



    Section 3.11.5.17, MIN: Minimum

    The MIN instruction computes component-wise minimums of the values in the
    two operands to yield a result vector.

      tmp0 = VectorLoad(op0);
      tmp1 = VectorLoad(op1);
      result.x = min(tmp0.x, tmp1.x);
      result.y = min(tmp0.y, tmp1.y);
      result.z = min(tmp0.z, tmp1.z);
      result.w = min(tmp0.w, tmp1.w);

    The following special cases apply to the minimum operation:

      1. min(A,B) is always equivalent to min(B,A).
      2. min(NaN, <x>) == NaN, for all <x>.


    Section 3.11.5.18, MOV: Move

    The MOV instruction copies the value of the operand to yield a result
    vector.

      result = VectorLoad(op0);


    Section 3.11.5.19, MUL: Multiply

    The MUL instruction performs a component-wise multiply of the two operands
    to yield a result vector.

      tmp0 = VectorLoad(op0);
      tmp1 = VectorLoad(op1);
      result.x = tmp0.x * tmp1.x;
      result.y = tmp0.y * tmp1.y;
      result.z = tmp0.z * tmp1.z;
      result.w = tmp0.w * tmp1.w;

    The following special-case rules apply to multiplication:

      1. "A*B" is always equivalent to "B*A".
      2. NaN * <x> = NaN, for all <x>.
      3. +/-0.0 * +/-INF = NaN.
      4. +/-0.0 * <x> = +/-0.0, for all <x> except -INF, +INF, and NaN. The
         sign of the result is positive if the signs of the two operands match
         and negative otherwise.
      5. +/-INF * <x> = +/-INF, for all <x> except -0.0, +0.0, and NaN. The
         sign of the result is positive if the signs of the two operands match
         and negative otherwise.
      6. +1.0 * <x> = <x>, for all <x>.
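
    The multiplication rules above coincide with ordinary IEEE 754
    multiplication, so they can be spot-checked on a host CPU. The following
    Python sketch is illustrative only: mul_vec and same_value are
    hypothetical helpers, and Python's double-precision floats are assumed
    to behave like the fragment program data types for these special cases.

```python
import math

INF = math.inf

def mul_vec(v0, v1):
    """Component-wise product, mirroring the MUL pseudocode."""
    return tuple(a * b for a, b in zip(v0, v1))

def same_value(a, b):
    """Compare two floats, treating NaN as equal to NaN and
    distinguishing -0.0 from +0.0."""
    if math.isnan(a) or math.isnan(b):
        return math.isnan(a) and math.isnan(b)
    return a == b and math.copysign(1.0, a) == math.copysign(1.0, b)

# One component per rule: rule 3, rule 4, rule 5, and rule 6.
example = mul_vec((0.0, -0.0, INF, 1.0), (INF, 3.0, -2.0, -0.0))
```

    Here the four components of example exercise rules 3, 4, 5, and 6 in
    turn: zero times infinity, a signed zero times a finite value, an
    infinity times a finite value, and +1.0 times a signed zero.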


    Section 3.11.5.20, PK2H: Pack Two 16-bit Floats

    The PK2H instruction converts the "x" and "y" components of the single
    operand into 16-bit floating-point format, packs the bit representation of
    these two floats into a 32-bit value, and replicates that value to all
    four components of the result vector. The PK2H instruction can be
    reversed by the UP2H instruction below.

      tmp0 = VectorLoad(op0);
      /* result obtained by combining raw bits of tmp0.x, tmp0.y */
      result.x = RawBits(tmp0.x) | (RawBits(tmp0.y) << 16);
      result.y = RawBits(tmp0.x) | (RawBits(tmp0.y) << 16);
      result.z = RawBits(tmp0.x) | (RawBits(tmp0.y) << 16);
      result.w = RawBits(tmp0.x) | (RawBits(tmp0.y) << 16);

    The result must be written to a register with 32-bit components (an "R"
    register, o[COLR], or o[DEPR]). A fragment program will fail to load if
    any other register type is specified.


    Section 3.11.5.21, PK2US: Pack Two Unsigned 16-bit Scalars

    The PK2US instruction converts the "x" and "y" components of the single
    operand into a packed pair of 16-bit unsigned scalars. The scalars are
    represented in a bit pattern where all '0' bits correspond to 0.0 and all
    '1' bits correspond to 1.0. The bit representations of the two converted
    components are packed into a 32-bit value, and that value is replicated to
    all four components of the result vector. The PK2US instruction can be
    reversed by the UP2US instruction below.

      tmp0 = VectorLoad(op0);
      if (tmp0.x < 0.0) tmp0.x = 0.0;
      if (tmp0.x > 1.0) tmp0.x = 1.0;
      if (tmp0.y < 0.0) tmp0.y = 0.0;
      if (tmp0.y > 1.0) tmp0.y = 1.0;
      us.x = round(65535.0 * tmp0.x); /* us is a ushort vector */
      us.y = round(65535.0 * tmp0.y);
      /* result obtained by combining raw bits of us.
      */
      result.x = ((us.x) | (us.y << 16));
      result.y = ((us.x) | (us.y << 16));
      result.z = ((us.x) | (us.y << 16));
      result.w = ((us.x) | (us.y << 16));

    The result must be written to a register with 32-bit components (an "R"
    register, o[COLR], or o[DEPR]). A fragment program will fail to load if
    any other register type is specified.


    Section 3.11.5.22, PK4B: Pack Four Signed 8-bit Scalars

    The PK4B instruction converts the four components of the single operand
    into 8-bit signed quantities. The signed quantities are represented in a
    bit pattern where all '0' bits correspond to -128/127 and all '1' bits
    correspond to +127/127. The bit representations of the four converted
    components are packed into a 32-bit value, and that value is replicated to
    all four components of the result vector. The PK4B instruction can be
    reversed by the UP4B instruction below.

      tmp0 = VectorLoad(op0);
      if (tmp0.x < -128/127) tmp0.x = -128/127;
      if (tmp0.y < -128/127) tmp0.y = -128/127;
      if (tmp0.z < -128/127) tmp0.z = -128/127;
      if (tmp0.w < -128/127) tmp0.w = -128/127;
      if (tmp0.x > +127/127) tmp0.x = +127/127;
      if (tmp0.y > +127/127) tmp0.y = +127/127;
      if (tmp0.z > +127/127) tmp0.z = +127/127;
      if (tmp0.w > +127/127) tmp0.w = +127/127;
      ub.x = round(127.0 * tmp0.x + 128.0); /* ub is a ubyte vector */
      ub.y = round(127.0 * tmp0.y + 128.0);
      ub.z = round(127.0 * tmp0.z + 128.0);
      ub.w = round(127.0 * tmp0.w + 128.0);
      /* result obtained by combining raw bits of ub.
      */
      result.x = ((ub.x) | (ub.y << 8) | (ub.z << 16) | (ub.w << 24));
      result.y = ((ub.x) | (ub.y << 8) | (ub.z << 16) | (ub.w << 24));
      result.z = ((ub.x) | (ub.y << 8) | (ub.z << 16) | (ub.w << 24));
      result.w = ((ub.x) | (ub.y << 8) | (ub.z << 16) | (ub.w << 24));

    The result must be written to a register with 32-bit components (an "R"
    register, o[COLR], or o[DEPR]). A fragment program will fail to load if
    any other register type is specified.


    Section 3.11.5.23, PK4UB: Pack Four Unsigned 8-bit Scalars

    The PK4UB instruction converts the four components of the single operand
    into a packed grouping of 8-bit unsigned scalars. The scalars are
    represented in a bit pattern where all '0' bits correspond to 0.0 and all
    '1' bits correspond to 1.0. The bit representations of the four
    converted components are packed into a 32-bit value, and that value is
    replicated to all four components of the result vector. The PK4UB
    instruction can be reversed by the UP4UB instruction below.

      tmp0 = VectorLoad(op0);
      if (tmp0.x < 0.0) tmp0.x = 0.0;
      if (tmp0.x > 1.0) tmp0.x = 1.0;
      if (tmp0.y < 0.0) tmp0.y = 0.0;
      if (tmp0.y > 1.0) tmp0.y = 1.0;
      if (tmp0.z < 0.0) tmp0.z = 0.0;
      if (tmp0.z > 1.0) tmp0.z = 1.0;
      if (tmp0.w < 0.0) tmp0.w = 0.0;
      if (tmp0.w > 1.0) tmp0.w = 1.0;
      ub.x = round(255.0 * tmp0.x); /* ub is a ubyte vector */
      ub.y = round(255.0 * tmp0.y);
      ub.z = round(255.0 * tmp0.z);
      ub.w = round(255.0 * tmp0.w);
      /* result obtained by combining raw bits of ub.
      */
      result.x = ((ub.x) | (ub.y << 8) | (ub.z << 16) | (ub.w << 24));
      result.y = ((ub.x) | (ub.y << 8) | (ub.z << 16) | (ub.w << 24));
      result.z = ((ub.x) | (ub.y << 8) | (ub.z << 16) | (ub.w << 24));
      result.w = ((ub.x) | (ub.y << 8) | (ub.z << 16) | (ub.w << 24));

    The result must be written to a register with 32-bit components (an "R"
    register, o[COLR], or o[DEPR]). A fragment program will fail to load if
    any other register type is specified.


    Section 3.11.5.24, POW: Exponentiation

    The POW instruction approximates the value of the first scalar operand
    raised to the power of the second scalar operand and replicates it to all
    four components of the result vector.

      tmp0 = ScalarLoad(op0);
      tmp1 = ScalarLoad(op1);
      result.x = ApproxPower(tmp0, tmp1);
      result.y = ApproxPower(tmp0, tmp1);
      result.z = ApproxPower(tmp0, tmp1);
      result.w = ApproxPower(tmp0, tmp1);

    The exponentiation approximation function is defined in terms of the base
    2 exponentiation and logarithm approximation operations in the EX2 and LG2
    instructions, including errors and the processing of any special cases.
    In particular,

      ApproxPower(a,b) = Approx2ToX(b * ApproxLog2(a)).

    The following special-case rules, which can be derived from the rules in
    the LG2, MUL, and EX2 instructions, apply to exponentiation:

      1. ApproxPower(<x>, <y>) = NaN, if x < -0.0.
      2. ApproxPower(<x>, <y>) = NaN, if x or y is NaN.
      3. ApproxPower(+/-0.0, +/-0.0) = NaN.
      4. ApproxPower(+INF, +/-0.0) = NaN.
      5. ApproxPower(+1.0, +/-INF) = NaN.
      6. ApproxPower(+/-0.0, <x>) = +0.0, if x > +0.0.
      7. ApproxPower(+/-0.0, <x>) = +INF, if x < -0.0.
      8. ApproxPower(+1.0, <x>) = +1.0, if -INF < x < +INF.
      9. ApproxPower(+INF, <x>) = +INF, if x > +0.0.
      10. ApproxPower(+INF, <x>) = +0.0, if x < -0.0.
      11.
          ApproxPower(<x>, +/-0.0) = +1.0, if +0.0 < x < +INF.
      12. ApproxPower(<x>, +1.0) ~= <x>, if x >= +0.0.
      13. ApproxPower(<x>, +INF) = +0.0, if -0.0 <= x < +1.0,
                                   +INF, if x > +1.0.
      14. ApproxPower(<x>, -INF) = +INF, if -0.0 <= x < +1.0,
                                   +0.0, if x > +1.0.

    Note that 0^0 is defined here as NaN, since ApproxLog2(0) = -INF, and
    0*(-INF) = NaN. In many other applications, including the standard C
    pow() function, 0^0 is defined as 1.0. This behavior can be emulated
    using additional instructions in much the same way that the pow()
    function is implemented on many CPUs.

    Note that a logarithm is involved even if the exponent is an integer.
    This means that any exponentiation with a negative base will produce NaN.
    In contrast, it is possible in a "normal" mathematical formulation to
    raise negative numbers to integral powers (e.g., (-3)^2 == 9, and
    (-0.5)^-2 == 4).


    Section 3.11.5.25, RCP: Reciprocal

    The RCP instruction approximates the reciprocal of the scalar operand and
    replicates it to all four components of the result vector.

      tmp = ScalarLoad(op0);
      result.x = ApproxReciprocal(tmp);
      result.y = ApproxReciprocal(tmp);
      result.z = ApproxReciprocal(tmp);
      result.w = ApproxReciprocal(tmp);

    The approximation function is accurate to at least 22 bits:

      | ApproxReciprocal(x) - (1/x) | < 1.0 / 2^22, if 1.0 <= x < 2.0.

    The following special-case rules apply to reciprocation:

      1. ApproxReciprocal(NaN) = NaN.
      2. ApproxReciprocal(+INF) = +0.0.
      3. ApproxReciprocal(-INF) = -0.0.
      4. ApproxReciprocal(+0.0) = +INF.
      5. ApproxReciprocal(-0.0) = -INF.


    Section 3.11.5.26, RFL: Reflection Vector

    The RFL instruction computes the reflection of the second vector operand
    (the "direction" vector) about the vector specified by the first vector
    operand (the "axis" vector).
    Both operands are treated as 3D vectors (the
    w components are ignored). The result vector is another 3D vector (the
    "reflected direction" vector). The length of the result vector, ignoring
    rounding errors, should equal that of the second operand.

      axis = VectorLoad(op0);
      direction = VectorLoad(op1);
      tmp.w = (axis.x * axis.x + axis.y * axis.y +
               axis.z * axis.z);
      tmp.x = (axis.x * direction.x + axis.y * direction.y +
               axis.z * direction.z);
      tmp.x = 2.0 * tmp.x;
      tmp.x = tmp.x / tmp.w;
      result.x = tmp.x * axis.x - direction.x;
      result.y = tmp.x * axis.y - direction.y;
      result.z = tmp.x * axis.z - direction.z;

    A fragment program will fail to load if the w component of the result is
    enabled in the component write mask (see the <optionalWriteMask> rule in
    the grammar).


    Section 3.11.5.27, RSQ: Reciprocal Square Root

    The RSQ instruction approximates the reciprocal of the square root of the
    scalar operand and replicates it to all four components of the result
    vector.

      tmp = ScalarLoad(op0);
      result.x = ApproxRSQRT(tmp);
      result.y = ApproxRSQRT(tmp);
      result.z = ApproxRSQRT(tmp);
      result.w = ApproxRSQRT(tmp);

    The approximation function is accurate to at least 22 bits:

      | ApproxRSQRT(x) - (1/sqrt(x)) | < 1.0 / 2^22, if 1.0 <= x < 4.0.

    The following special-case rules apply to reciprocal square roots:

      1. ApproxRSQRT(NaN) = NaN.
      2. ApproxRSQRT(+INF) = +0.0.
      3. ApproxRSQRT(-INF) = NaN.
      4. ApproxRSQRT(+0.0) = +INF.
      5. ApproxRSQRT(-0.0) = -INF.
      6. ApproxRSQRT(x) = NaN, if -INF < x < -0.0.


    Section 3.11.5.28, SEQ: Set on Equal To

    The SEQ instruction performs a component-wise comparison of the two
    operands.
    Each component of the result vector is 1.0 if the corresponding
    component of the first operand is equal to that of the second, and 0.0
    otherwise.

      tmp0 = VectorLoad(op0);
      tmp1 = VectorLoad(op1);
      result.x = (tmp0.x == tmp1.x) ? 1.0 : 0.0;
      result.y = (tmp0.y == tmp1.y) ? 1.0 : 0.0;
      result.z = (tmp0.z == tmp1.z) ? 1.0 : 0.0;
      result.w = (tmp0.w == tmp1.w) ? 1.0 : 0.0;

    The following special-case rules apply to SEQ:

      1. (<x> == <y>) and (<y> == <x>) always produce the same result.
      2. (NaN == <x>) is FALSE for all <x>, including NaN.
      3. (+INF == +INF) and (-INF == -INF) are TRUE.
      4. (-0.0 == +0.0) and (+0.0 == -0.0) are TRUE.


    Section 3.11.5.29, SFL: Set on False

    The SFL instruction is a degenerate case of the other "Set on"
    instructions that sets all components of the result vector to
    0.0.

      result.x = 0.0;
      result.y = 0.0;
      result.z = 0.0;
      result.w = 0.0;


    Section 3.11.5.30, SGE: Set on Greater Than or Equal

    The SGE instruction performs a component-wise comparison of the two
    operands. Each component of the result vector is 1.0 if the corresponding
    component of the first operand is greater than or equal to that of the
    second, and 0.0 otherwise.

      tmp0 = VectorLoad(op0);
      tmp1 = VectorLoad(op1);
      result.x = (tmp0.x >= tmp1.x) ? 1.0 : 0.0;
      result.y = (tmp0.y >= tmp1.y) ? 1.0 : 0.0;
      result.z = (tmp0.z >= tmp1.z) ? 1.0 : 0.0;
      result.w = (tmp0.w >= tmp1.w) ? 1.0 : 0.0;

    The following special-case rules apply to SGE:

      1. (NaN >= <x>) and (<x> >= NaN) are FALSE for all <x>.
      2. (+INF >= +INF) and (-INF >= -INF) are TRUE.
      3. (-0.0 >= +0.0) and (+0.0 >= -0.0) are TRUE.


    Section 3.11.5.31, SGT: Set on Greater Than

    The SGT instruction performs a component-wise comparison of the two
    operands.
    Each component of the result vector is 1.0 if the corresponding
    component of the first operand is greater than that of the second, and
    0.0 otherwise.

      tmp0 = VectorLoad(op0);
      tmp1 = VectorLoad(op1);
      result.x = (tmp0.x > tmp1.x) ? 1.0 : 0.0;
      result.y = (tmp0.y > tmp1.y) ? 1.0 : 0.0;
      result.z = (tmp0.z > tmp1.z) ? 1.0 : 0.0;
      result.w = (tmp0.w > tmp1.w) ? 1.0 : 0.0;

    The following special-case rules apply to SGT:

      1. (NaN > <x>) and (<x> > NaN) are FALSE for all <x>.
      2. (-0.0 > +0.0) and (+0.0 > -0.0) are FALSE.


    Section 3.11.5.32, SIN: Sine

    The SIN instruction approximates the sine of the angle specified by the
    scalar operand and replicates it to all four components of the result
    vector. The angle is specified in radians and does not have to be in the
    range [0,2*PI].

      tmp = ScalarLoad(op0);
      result.x = ApproxSine(tmp);
      result.y = ApproxSine(tmp);
      result.z = ApproxSine(tmp);
      result.w = ApproxSine(tmp);

    The approximation function is accurate to at least 22 bits with an angle
    in the range [0,2*PI].

      | ApproxSine(x) - sin(x) | < 1.0 / 2^22, if 0.0 <= x < 2.0 * PI.

    The error in the approximation will typically increase with the absolute
    value of the angle when the angle falls outside the range [0,2*PI].

    The following special-case rules apply to sine approximation:

      1. ApproxSine(NaN) = NaN.
      2. ApproxSine(+/-INF) = NaN.
      3. ApproxSine(+/-0.0) = +/-0.0. The sign of the result is equal to the
         sign of the single operand.


    Section 3.11.5.33, SLE: Set on Less Than or Equal

    The SLE instruction performs a component-wise comparison of the two
    operands. Each component of the result vector is 1.0 if the corresponding
    component of the first operand is less than or equal to that of the
    second, and 0.0 otherwise.

      tmp0 = VectorLoad(op0);
      tmp1 = VectorLoad(op1);
      result.x = (tmp0.x <= tmp1.x) ? 1.0 : 0.0;
      result.y = (tmp0.y <= tmp1.y) ? 1.0 : 0.0;
      result.z = (tmp0.z <= tmp1.z) ? 1.0 : 0.0;
      result.w = (tmp0.w <= tmp1.w) ? 1.0 : 0.0;

    The following special-case rules apply to SLE:

      1. (NaN <= <x>) and (<x> <= NaN) are FALSE for all <x>.
      2. (+INF <= +INF) and (-INF <= -INF) are TRUE.
      3. (-0.0 <= +0.0) and (+0.0 <= -0.0) are TRUE.


    Section 3.11.5.34, SLT: Set on Less Than

    The SLT instruction performs a component-wise comparison of the two
    operands. Each component of the result vector is 1.0 if the corresponding
    component of the first operand is less than that of the second, and 0.0
    otherwise.

      tmp0 = VectorLoad(op0);
      tmp1 = VectorLoad(op1);
      result.x = (tmp0.x < tmp1.x) ? 1.0 : 0.0;
      result.y = (tmp0.y < tmp1.y) ? 1.0 : 0.0;
      result.z = (tmp0.z < tmp1.z) ? 1.0 : 0.0;
      result.w = (tmp0.w < tmp1.w) ? 1.0 : 0.0;

    The following special-case rules apply to SLT:

      1. (NaN < <x>) and (<x> < NaN) are FALSE for all <x>.
      2. (-0.0 < +0.0) and (+0.0 < -0.0) are FALSE.


    Section 3.11.5.35, SNE: Set on Not Equal

    The SNE instruction performs a component-wise comparison of the two
    operands. Each component of the result vector is 1.0 if the corresponding
    component of the first operand is not equal to that of the second, and 0.0
    otherwise.

      tmp0 = VectorLoad(op0);
      tmp1 = VectorLoad(op1);
      result.x = (tmp0.x != tmp1.x) ? 1.0 : 0.0;
      result.y = (tmp0.y != tmp1.y) ? 1.0 : 0.0;
      result.z = (tmp0.z != tmp1.z) ? 1.0 : 0.0;
      result.w = (tmp0.w != tmp1.w) ? 1.0 : 0.0;

    The following special-case rules apply to SNE:

      1. (<x> != <y>) and (<y> != <x>) always produce the same result.
      2. (NaN != <x>) is TRUE for all <x>, including NaN.
      3. (+INF != +INF) and (-INF != -INF) are FALSE.
      4. (-0.0 != +0.0) and (+0.0 != -0.0) are FALSE.


    Section 3.11.5.36, STR: Set on True

    The STR instruction is a degenerate case of the other "Set on"
    instructions that sets all components of the result vector to 1.0.

      result.x = 1.0;
      result.y = 1.0;
      result.z = 1.0;
      result.w = 1.0;


    Section 3.11.5.37, SUB: Subtract

    The SUB instruction performs a component-wise subtraction of the second
    operand from the first to yield a result vector.

      tmp0 = VectorLoad(op0);
      tmp1 = VectorLoad(op1);
      result.x = tmp0.x - tmp1.x;
      result.y = tmp0.y - tmp1.y;
      result.z = tmp0.z - tmp1.z;
      result.w = tmp0.w - tmp1.w;

    The SUB instruction is completely equivalent to an identical ADD
    instruction in which the negate operator on the second operand is
    reversed:

      1. "SUB R0, R1, R2" is equivalent to "ADD R0, R1, -R2".
      2. "SUB R0, R1, -R2" is equivalent to "ADD R0, R1, R2".
      3. "SUB R0, R1, |R2|" is equivalent to "ADD R0, R1, -|R2|".
      4. "SUB R0, R1, -|R2|" is equivalent to "ADD R0, R1, |R2|".


    Section 3.11.5.38, TEX: Texture Lookup

    The TEX instruction performs a filtered texture lookup using the texture
    target given by <texImageTarget> belonging to the texture image unit given
    by <texImageUnit>. <texImageTarget> values of "1D", "2D", "3D", "CUBE",
    and "RECT" correspond to the texture targets TEXTURE_1D, TEXTURE_2D,
    TEXTURE_3D, TEXTURE_CUBE_MAP_ARB, and TEXTURE_RECTANGLE_NV, respectively.

    The (s,t,r) texture coordinates used for the lookup are the x, y, and z
    components of the single operand.

    The texture lookup is performed as specified in Section 3.8. The LOD
    calculations in Section 3.8.5 are performed using an implementation
    dependent method to derive ds/dx, ds/dy, dt/dx, dt/dy, dr/dx, and dr/dy.
    The mapping of filtered texture components to the components of the result
    vector is dependent on the base internal format of the texture and is
    specified in Table X.5.

                                  Result Vector Components
      Base Internal Format        X      Y      Z      W
      ---------------------     -----  -----  -----  -----
      ALPHA                      0.0    0.0    0.0    At
      LUMINANCE                  Lt     Lt     Lt     1.0
      LUMINANCE_ALPHA            Lt     Lt     Lt     At
      INTENSITY                  It     It     It     It
      RGB                        Rt     Gt     Bt     1.0
      RGBA                       Rt     Gt     Bt     At
      HILO_NV (signed)           HIt    LOt    HEMI   1.0
      HILO_NV (unsigned)         HIt    LOt    1.0    1.0
      DSDT_NV                    DSt    DTt    0.0    1.0
      DSDT_MAG_NV                DSt    DTt    MAGt   1.0
      DSDT_MAG_INTENSITY_NV      DSt    DTt    MAGt   It
      FLOAT_R_NV                 Rt     0.0    0.0    1.0
      FLOAT_RG_NV                Rt     Gt     0.0    1.0
      FLOAT_RGB_NV               Rt     Gt     Bt     1.0
      FLOAT_RGBA_NV              Rt     Gt     Bt     At

      Table X.5: Mapping of filtered texel components to result vector
      components for the TEX instruction. 0.0 and 1.0 indicate that the
      corresponding constant value is written to the result vector.
      DEPTH_COMPONENT textures are treated as ALPHA, LUMINANCE, or INTENSITY,
      as specified in the texture's depth texture mode.

    For HILO_NV textures with signed components, "HEMI" is defined as
    sqrt(MAX(0, 1-(HIt^2+LOt^2))).

    This instruction specifies a particular texture target, ignoring the
    standard hierarchy of texture enables (TEXTURE_CUBE_MAP_ARB, TEXTURE_3D,
    TEXTURE_2D, TEXTURE_1D) used to select a texture target in unextended
    OpenGL. If the specified texture target has a consistent set of images, a
    lookup is performed. Otherwise, the result of the instruction is the
    vector (0,0,0,0).

    Although this instruction allows the selection of any texture target, a
    fragment program can not use more than one texture target for any given
    texture image unit.
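
    As a non-normative illustration, the rows of Table X.5 for the six core
    base internal formats can be sketched in C. The names base_format, texel,
    and tex_result are hypothetical and are not defined by this extension.
    The HILO_NV, DSDT_NV, and FLOAT_*_NV rows are omitted for brevity.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical subset of the base internal formats listed in Table X.5. */
typedef enum {
    FMT_ALPHA, FMT_LUMINANCE, FMT_LUMINANCE_ALPHA,
    FMT_INTENSITY, FMT_RGB, FMT_RGBA
} base_format;

/* Filtered texel components; only the fields relevant to a format are read. */
typedef struct {
    float r, g, b, a;   /* Rt, Gt, Bt, At */
    float l, i;         /* Lt, It */
} texel;

/* Fills result[0..3] (x, y, z, w) following the corresponding table row. */
void tex_result(base_format fmt, const texel *t, float result[4])
{
    assert(t != NULL && result != NULL);
    switch (fmt) {
    case FMT_ALPHA:
        result[0] = 0.0f; result[1] = 0.0f; result[2] = 0.0f; result[3] = t->a;
        break;
    case FMT_LUMINANCE:
        result[0] = t->l; result[1] = t->l; result[2] = t->l; result[3] = 1.0f;
        break;
    case FMT_LUMINANCE_ALPHA:
        result[0] = t->l; result[1] = t->l; result[2] = t->l; result[3] = t->a;
        break;
    case FMT_INTENSITY:
        result[0] = t->i; result[1] = t->i; result[2] = t->i; result[3] = t->i;
        break;
    case FMT_RGB:
        result[0] = t->r; result[1] = t->g; result[2] = t->b; result[3] = 1.0f;
        break;
    case FMT_RGBA:
        result[0] = t->r; result[1] = t->g; result[2] = t->b; result[3] = t->a;
        break;
    }
}
```

    A program that samples an ALPHA texture therefore sees zero in the x, y,
    and z components of the result, while a LUMINANCE texture always yields
    1.0 in the w component.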


    Section 3.11.5.39, TXD: Texture Lookup with Derivatives

    The TXD instruction performs a filtered texture lookup using the texture
    target given by <texImageTarget> belonging to the texture image unit given
    by <texImageUnit>. <texImageTarget> values of "1D", "2D", "3D", "CUBE",
    and "RECT" correspond to the texture targets TEXTURE_1D, TEXTURE_2D,
    TEXTURE_3D, TEXTURE_CUBE_MAP_ARB, and TEXTURE_RECTANGLE_NV, respectively.

    The (s,t,r) texture coordinates used for the lookup are the x, y, and z
    components of the first operand. The partial derivatives in the X
    direction (ds/dx, dt/dx, dr/dx) are specified by the x, y, and z
    components of the second operand. The partial derivatives in the Y
    direction (ds/dy, dt/dy, dr/dy) are specified by the x, y, and z
    components of the third operand.

    The texture lookup is performed as specified in Section 3.8. The LOD
    calculations in Section 3.8.5 are performed using the specified partial
    derivatives. The mapping of filtered texture components to the components
    of the result vector is dependent on the base internal format of the
    texture and is specified in Table X.5.

    This instruction specifies a particular texture target, ignoring the
    standard hierarchy of texture enables (TEXTURE_CUBE_MAP_ARB, TEXTURE_3D,
    TEXTURE_2D, TEXTURE_1D) used to select a texture target in unextended
    OpenGL. If the specified texture target has a consistent set of images, a
    lookup is performed. Otherwise, the result of the instruction is the
    vector (0,0,0,0).

    Although this instruction allows the selection of any texture target, a
    fragment program can not use more than one texture target for any given
    texture image unit.


    Section 3.11.5.40, TXP: Projective Texture Lookup

    The TXP instruction performs a filtered texture lookup using the texture
    target given by <texImageTarget> belonging to the texture image unit given
    by <texImageUnit>. <texImageTarget> values of "1D", "2D", "3D", "CUBE",
    and "RECT" correspond to the texture targets TEXTURE_1D, TEXTURE_2D,
    TEXTURE_3D, TEXTURE_CUBE_MAP_ARB, and TEXTURE_RECTANGLE_NV, respectively.

    For cube map textures, the (s,t,r) texture coordinates used for the lookup
    are given by x, y, and z, respectively. For all other textures, the
    (s,t,r) texture coordinates used for the lookup are given by x/w, y/w, and
    z/w, respectively, where x, y, z, and w are the corresponding components
    of the operand.

    The texture lookup is performed as specified in Section 3.8. The LOD
    calculations in Section 3.8.5 are performed using an implementation
    dependent method to derive ds/dx, ds/dy, dt/dx, dt/dy, dr/dx, and dr/dy.
    The mapping of filtered texture components to the components of the result
    vector is dependent on the base internal format of the texture and is
    specified in Table X.5.

    This instruction specifies a particular texture target, ignoring the
    standard hierarchy of texture enables (TEXTURE_CUBE_MAP_ARB, TEXTURE_3D,
    TEXTURE_2D, TEXTURE_1D) used to select a texture target in unextended
    OpenGL. If the specified texture target has a consistent set of images, a
    lookup is performed. Otherwise, the result of the instruction is the
    vector (0,0,0,0).

    Although this instruction allows the selection of any texture target, a
    fragment program can not use more than one texture target for any given
    texture image unit.


    Section 3.11.5.41, UP2H: Unpack Two 16-Bit Floats

    The UP2H instruction unpacks two 16-bit floats stored together in a 32-bit
    scalar operand.
    The first 16-bit float (stored in the 16 least
    significant bits) is written into the "x" and "z" components of the result
    vector; the second is written into the "y" and "w" components of the
    result vector.

    This operation undoes the type conversion and packing performed by the
    PK2H instruction.

      tmp = ScalarLoad(op0);
      result.x = (fp16) (RawBits(tmp) & 0xFFFF);
      result.y = (fp16) ((RawBits(tmp) >> 16) & 0xFFFF);
      result.z = (fp16) (RawBits(tmp) & 0xFFFF);
      result.w = (fp16) ((RawBits(tmp) >> 16) & 0xFFFF);

    Since the source operand must be a 32-bit scalar, a fragment program will
    fail to load if the operand is not obtained from a register with 32-bit
    components or from a program parameter.


    Section 3.11.5.42, UP2US: Unpack Two Unsigned 16-Bit Scalars

    The UP2US instruction unpacks two 16-bit unsigned values packed together
    in a 32-bit scalar operand. The unsigned quantities are encoded where a
    bit pattern of all '0' bits corresponds to 0.0 and a pattern of all '1'
    bits corresponds to 1.0. The "x" and "z" components of the result vector
    are obtained from the 16 least significant bits of the operand; the "y"
    and "w" components are obtained from the 16 most significant bits.

    This operation undoes the type conversion and packing performed by the
    PK2US instruction.

      tmp = ScalarLoad(op0);
      result.x = ((RawBits(tmp) >> 0) & 0xFFFF) / 65535.0;
      result.y = ((RawBits(tmp) >> 16) & 0xFFFF) / 65535.0;
      result.z = ((RawBits(tmp) >> 0) & 0xFFFF) / 65535.0;
      result.w = ((RawBits(tmp) >> 16) & 0xFFFF) / 65535.0;

    Since the source operand must be a 32-bit scalar, a fragment program will
    fail to load if the operand is not obtained from a register with 32-bit
    components or from a program parameter.
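
    For reference, the UP2US pseudocode above can be mirrored on the CPU, for
    example to check packed data before it is passed to a fragment program.
    The following is a minimal sketch in C. The function name unpack_2us is
    hypothetical, and the sketch assumes the caller has already obtained the
    raw 32 bits of the operand as an unsigned integer (the RawBits() step).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Unpacks two unsigned 16-bit values from bits into result[0..3] (x, y, z, w).
 * The low half is replicated to x and z, the high half to y and w. */
void unpack_2us(uint32_t bits, float result[4])
{
    float lo, hi;

    assert(result != NULL);
    lo = (float)(bits & 0xFFFFu) / 65535.0f;          /* 16 least significant bits */
    hi = (float)((bits >> 16) & 0xFFFFu) / 65535.0f;  /* 16 most significant bits */
    result[0] = lo;
    result[1] = hi;
    result[2] = lo;
    result[3] = hi;
}
```

    With an input of 0xFFFF0000, the low half is all '0' bits and the high
    half is all '1' bits, so the result is (0.0, 1.0, 0.0, 1.0).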


    Section 3.11.5.43, UP4B: Unpack Four Signed 8-Bit Values

    The UP4B instruction unpacks four 8-bit signed values packed together in a
    32-bit scalar operand. The signed quantities are encoded where a bit
    pattern of all '0' bits corresponds to -128/127 and a pattern of all '1'
    bits corresponds to +127/127. The "x" component of the result vector is
    the converted value corresponding to the 8 least significant bits of the
    operand; the "w" component corresponds to the 8 most significant bits.

    This operation undoes the type conversion and packing performed by the
    PK4B instruction.

      tmp = ScalarLoad(op0);
      result.x = (((RawBits(tmp) >> 0) & 0xFF) - 128) / 127.0;
      result.y = (((RawBits(tmp) >> 8) & 0xFF) - 128) / 127.0;
      result.z = (((RawBits(tmp) >> 16) & 0xFF) - 128) / 127.0;
      result.w = (((RawBits(tmp) >> 24) & 0xFF) - 128) / 127.0;

    Since the source operand must be a 32-bit scalar, a fragment program will
    fail to load if the operand is not obtained from a register with 32-bit
    components or from a program parameter.


    Section 3.11.5.44, UP4UB: Unpack Four Unsigned 8-Bit Scalars

    The UP4UB instruction unpacks four 8-bit unsigned values packed together
    in a 32-bit scalar operand. The unsigned quantities are encoded where a
    bit pattern of all '0' bits corresponds to 0.0 and a pattern of all '1'
    bits corresponds to 1.0. The "x" component of the result vector is
    obtained from the 8 least significant bits of the operand; the "w"
    component is obtained from the 8 most significant bits.

    This operation undoes the type conversion and packing performed by the
    PK4UB instruction.

      tmp = ScalarLoad(op0);
      result.x = ((RawBits(tmp) >> 0) & 0xFF) / 255.0;
      result.y = ((RawBits(tmp) >> 8) & 0xFF) / 255.0;
      result.z = ((RawBits(tmp) >> 16) & 0xFF) / 255.0;
      result.w = ((RawBits(tmp) >> 24) & 0xFF) / 255.0;

    Since the source operand must be a 32-bit scalar, a fragment program will
    fail to load if the operand is not obtained from a register with 32-bit
    components or from a program parameter.


    Section 3.11.5.45, X2D: 2D Coordinate Transformation

    The X2D instruction multiplies the 2D offset vector specified by the "x"
    and "y" components of the second vector operand by the 2x2 matrix
    specified by the four components of the third vector operand, and adds the
    transformed offset vector to the 2D vector specified by the "x" and "y"
    components of the first vector operand. The first component of the sum is
    written to the "x" and "z" components of the result; the second component
    is written to the "y" and "w" components of the result.

    The X2D instruction can be used to displace texture coordinates in the
    same manner as the OFFSET_TEXTURE_2D_NV mode in the GL_NV_texture_shader
    extension.

      tmp0 = VectorLoad(op0);
      tmp1 = VectorLoad(op1);
      tmp2 = VectorLoad(op2);
      result.x = tmp0.x + tmp1.x * tmp2.x + tmp1.y * tmp2.y;
      result.y = tmp0.y + tmp1.x * tmp2.z + tmp1.y * tmp2.w;
      result.z = tmp0.x + tmp1.x * tmp2.x + tmp1.y * tmp2.y;
      result.w = tmp0.y + tmp1.x * tmp2.z + tmp1.y * tmp2.w;


    Section 3.11.6, Fragment Program Outputs

    Upon completion of fragment program execution, the output registers are
    used to replace the fragment's associated data.

    The RGBA color of the fragment is taken from the color output register
    used by the program (COLR or COLH).
    The R, G, B, and A color components
    are extracted from the "x", "y", "z", and "w" components, respectively, of
    the output register and are clamped to the range [0,1].

    If the DEPR output register is written by the fragment program, the depth
    value of the fragment is taken from the z component of the DEPR output
    register. If depth clamping is enabled, the depth value is clamped to the
    range [min(n,f), max(n,f)], where n and f are the near and far depth range
    values. If depth clamping is disabled, the fragment is discarded if its
    depth value is outside the range [min(n,f), max(n,f)].


    Section 3.11.7, Required Fragment Program State

    The state required for managing fragment programs consists of:

      a bit indicating whether or not fragment program mode is enabled;

      an unsigned integer naming the currently bound fragment program;

      and the state that must be maintained to indicate which integers are
      currently in use as fragment program names.

    Fragment program mode is initially disabled. The initial state of all 128
    fragment program parameter registers is (0,0,0,0). The initial currently
    bound fragment program is zero.

    Each fragment program object consists of:

      an enumerant giving the program target (FRAGMENT_PROGRAM_NV);

      a boolean indicating whether the program is resident;

      an array of type ubyte containing the program string;

      an integer representing the length of the program string array;

      one four-component floating-point vector for each named local
      parameter in the program;

      and a set of MAX_FRAGMENT_PROGRAM_LOCAL_PARAMETERS_NV four-component
      floating-point vectors to hold numbered local parameters, each initially
      set to (0,0,0,0).

    Initially, no program objects exist.

    Additionally, the state required during the execution of a fragment
    program consists of: twelve 4-component floating-point fragment attribute
    registers, thirty-two 128-bit physical temporary registers, and a single
    4-component condition code, whose components have one of four values (LT,
    EQ, GT, or UN).

    Each time a fragment program is executed, the fragment attribute registers
    are initialized with the fragment's location and associated data, all
    temporary register components are initialized to zero, and all condition
    code components are initialized to EQ.


    Renumber Section 3.11 to Section 3.12, Antialiasing Application (p.140).
    No changes to the text of the section.


Additions to Chapter 4 of the OpenGL 1.2.1 Specification (Per-Fragment
Operations and the Framebuffer)

    None

Additions to Chapter 5 of the OpenGL 1.2.1 Specification (Special Functions)

    Add new section 5.7, Programs (after "Flush and Finish")

    Programs are specified as an array of ubytes used to control the operation
    of portions of the GL. The array is a string of ASCII characters encoding
    the program.

    The command

      LoadProgramNV(enum target, uint id, sizei len, const ubyte *program);

    loads a program. The target parameter specifies the type of program
    loaded and can be VERTEX_PROGRAM_NV, VERTEX_STATE_PROGRAM_NV, or
    FRAGMENT_PROGRAM_NV. VERTEX_PROGRAM_NV specifies a program to be executed
    in vertex program mode as each vertex is specified. VERTEX_STATE_PROGRAM_NV
    specifies a program to be run manually to update vertex state.
    FRAGMENT_PROGRAM_NV specifies a program to be executed in fragment program
    mode as each fragment is rasterized.

    Multiple programs can be loaded with different names. id names the
    program to load. The name space for programs is the set of positive
    integers (zero is reserved).
    The error INVALID_VALUE is generated by
    LoadProgramNV if a program is loaded with an id of zero. The error
    INVALID_OPERATION is generated by LoadProgramNV if a program is loaded
    for an id that is currently loaded with a program of a different program
    target. program is a pointer to an array of ubytes that represents the
    program being loaded. The length of the array in ubytes is indicated by
    len.

    At program load time, the program is parsed into a set of tokens possibly
    separated by whitespace. Spaces, tabs, newlines, carriage returns, and
    comments are considered whitespace. Comments begin with the character "#"
    and are terminated by a newline, a carriage return, or the end of the
    program array. Tokens are processed in a case-sensitive manner: upper
    and lower-case letters are not considered equivalent.

    Each program target has a corresponding Backus-Naur Form (BNF) grammar
    specifying the syntactically valid sequences for programs of the specified
    type. The set of valid tokens can be inferred from the grammar. The
    token "" represents an empty string and is used to indicate optional
    rules. A program is invalid if it contains any undefined tokens or
    characters.

    The error INVALID_OPERATION is generated by LoadProgramNV if a program
    fails to load because it is not syntactically correct or fails to satisfy
    all of the semantic restrictions corresponding to the program target.

    A successfully loaded program is parsed into a sequence of instructions.
    Each instruction is identified by its tokenized name. The operation of
    these instructions is specific to the program target and is defined
    elsewhere.

    A successfully loaded program replaces the program previously assigned to
    the name specified by id.
    If the OUT_OF_MEMORY error is generated by
    LoadProgramNV, no change is made to the previous contents of the named
    program.

    Querying the value of PROGRAM_ERROR_POSITION_NV returns a ubyte offset
    into the program string most recently passed to LoadProgramNV indicating
    the position of the first error, if any, in the program. If the program
    fails to load because of a semantic restriction that cannot be determined
    until the program is fully scanned, the error position will be len, the
    length of the program. If the program loads successfully, the value of
    PROGRAM_ERROR_POSITION_NV is assigned the value negative one.

    For targets whose programs are executed automatically (e.g., vertex and
    fragment programs), there must be a current program. The current vertex
    program is executed automatically in vertex program mode as vertices are
    specified. The current fragment program is executed automatically in
    fragment program mode as fragments are generated by rasterization.
    Current programs for a program target are updated by

      BindProgramNV(enum target, uint id);

    where target must be VERTEX_PROGRAM_NV or FRAGMENT_PROGRAM_NV. The error
    INVALID_OPERATION is generated by BindProgramNV if id names a program that
    has a type different than target (for example, if id names a vertex state
    program as described in section 2.14.4).

    Binding to a nonexistent program id does not generate an error. In
    particular, binding to program id zero does not generate an error.
    However, because program zero cannot be loaded, program zero is always
    nonexistent. If a program id is successfully loaded with a new vertex
    program and id is also the currently bound vertex program, the new program
    is considered the currently bound vertex program.
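
    An application will often want to report the byte offset queried from
    PROGRAM_ERROR_POSITION_NV as a line and column within the program string.
    The following is a minimal sketch in C of such a conversion. The helper
    name error_line_col is hypothetical and not part of this extension, and
    the sketch assumes that only a newline ends a line (a carriage return is
    counted as an ordinary column).

```c
#include <assert.h>
#include <stddef.h>

/* Converts a byte offset into program (for example, the value queried from
 * PROGRAM_ERROR_POSITION_NV) into a 1-based line and column.
 * Returns 0 if pos is negative (no error) or greater than len, and 1
 * otherwise. An offset equal to len is accepted, since that value is
 * reported for errors detected only after the whole program is scanned. */
int error_line_col(const unsigned char *program, size_t len, long pos,
                   size_t *line, size_t *col)
{
    size_t i;

    assert(program != NULL && line != NULL && col != NULL);
    if (pos < 0 || (size_t)pos > len)
        return 0;

    *line = 1;
    *col = 1;
    for (i = 0; i < (size_t)pos; i++) {
        if (program[i] == '\n') {
            (*line)++;      /* start of a new line */
            *col = 1;
        } else {
            (*col)++;
        }
    }
    return 1;
}
```

    A caller would typically print the returned line and column together with
    the program load error string, if one is available.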

    The INVALID_OPERATION error is generated when both vertex program mode is
    enabled and Begin is called (or when a command that performs an implicit
    Begin is called) if the current vertex program is nonexistent or not
    valid. A vertex program may not be valid for reasons explained in section
    2.14.5.

    The INVALID_OPERATION error is generated when both fragment program mode
    is enabled and Begin, another GL command that performs an implicit Begin,
    or any other GL command that generates fragments is called, if the current
    fragment program is nonexistent or not valid. A fragment program may be
    invalid for reasons explained in Section 3.11.3.

    Programs are deleted by calling

      void DeleteProgramsNV(sizei n, const uint *ids);

    ids contains n names of programs to be deleted. After a program is
    deleted, it becomes nonexistent, and its name is again unused. If a
    program that is currently bound is deleted, it is as though BindProgramNV
    has been executed with the same target as the deleted program and program
    zero. Unused names in ids are silently ignored, as is the value zero.

    The command

      void GenProgramsNV(sizei n, uint *ids);

    returns n currently unused program names in ids. These names are marked
    as used, for the purposes of GenProgramsNV only, but they become existent
    programs only when they are first loaded using LoadProgramNV.

    An implementation may choose to establish a working set of programs on
    which binding and/or manual execution are performed with higher
    performance. A program that is currently part of this working set is said
    to be resident.

    The command

      boolean AreProgramsResidentNV(sizei n, const uint *ids,
                                    boolean *residences);

    returns TRUE if all of the n programs named in ids are resident, or if the
    implementation does not distinguish a working set.
    If at least one of the
    programs named in ids is not resident, then FALSE is returned, and the
    residence of each program is returned in residences. Otherwise the
    contents of residences are not changed. If any of the names in ids are
    nonexistent or zero, FALSE is returned, the error INVALID_VALUE is
    generated, and the contents of residences are indeterminate. The
    residence status of a single named program can also be queried by calling
    GetProgramivNV (Section 6.1.13) with id set to the name of the program and
    pname set to PROGRAM_RESIDENT_NV.

    AreProgramsResidentNV indicates only whether a program is currently
    resident, not whether it could not be made resident. An implementation
    may choose to make a program resident only on first use, for example. The
    client may guide the GL implementation in determining which programs
    should be resident by requesting a set of programs to make resident.

    The command

      void RequestResidentProgramsNV(sizei n, const uint *ids);

    requests that the n programs named in ids should be made resident.
    While all the programs are not guaranteed to become resident,
    the implementation should make a best effort to make as many of
    the programs resident as possible. As a result of making the
    requested programs resident, program names not among the requested
    programs may become non-resident. Higher priority for residency
    should be given to programs listed earlier in the ids array.
    RequestResidentProgramsNV silently ignores attempts to make resident
    nonexistent program names or zero. AreProgramsResidentNV can be
    called after RequestResidentProgramsNV to determine which programs
    actually became resident.
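
    The return contract of AreProgramsResidentNV is easy to misread: the
    residences array is written only when at least one queried program is not
    resident. The following C sketch models that contract outside the GL. The
    function name are_programs_resident_model and its resident[] input are
    hypothetical stand-ins for implementation state. All queried names are
    assumed to exist, so the INVALID_VALUE case is not modeled.

```c
#include <assert.h>
#include <stddef.h>

/* Models the documented result of a residency query.
 * resident[i] is nonzero if the i-th queried program is resident.
 * Returns 1 (TRUE) and leaves residences untouched if all are resident;
 * otherwise returns 0 (FALSE) and writes every entry of residences. */
int are_programs_resident_model(size_t n, const int *resident,
                                unsigned char *residences)
{
    size_t i;
    int all_resident = 1;

    assert(resident != NULL && residences != NULL);
    for (i = 0; i < n; i++) {
        if (!resident[i])
            all_resident = 0;
    }
    if (all_resident)
        return 1;               /* residences is not changed */

    for (i = 0; i < n; i++)
        residences[i] = resident[i] ? 1 : 0;
    return 0;
}
```

    Callers should therefore inspect residences only after a FALSE return
    that was not accompanied by an error.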

    The commands

      void ProgramNamedParameter4fNV(uint id, sizei len, const ubyte *name,
                                     float x, float y, float z, float w);
      void ProgramNamedParameter4dNV(uint id, sizei len, const ubyte *name,
                                     double x, double y, double z, double w);
      void ProgramNamedParameter4fvNV(uint id, sizei len, const ubyte *name,
                                      const float v[]);
      void ProgramNamedParameter4dvNV(uint id, sizei len, const ubyte *name,
                                      const double v[]);

    specify a new value for the named program local parameter <name> belonging
    to the fragment program specified by <id>. <name> is a pointer to an
    array of ubytes holding the parameter name. <len> specifies the number of
    ubytes in the array given by <name>. The new x, y, z, and w components of
    the named local parameter are given by x, y, z, and w, respectively, for
    ProgramNamedParameter4fNV and ProgramNamedParameter4dNV, and by v[0],
    v[1], v[2], and v[3], respectively, for ProgramNamedParameter4fvNV and
    ProgramNamedParameter4dvNV. The error INVALID_OPERATION is generated if
    <id> specifies a nonexistent program or a program whose type does not
    support named local parameters. The error INVALID_VALUE is generated
    if <name> does not specify the name of a local parameter in the program
    corresponding to <id>. The error INVALID_VALUE is also generated if <len>
    is zero.

    The commands

      void ProgramLocalParameter4fARB(enum target, uint index,
                                      float x, float y, float z, float w);
      void ProgramLocalParameter4fvARB(enum target, uint index,
                                       const float *params);
      void ProgramLocalParameter4dARB(enum target, uint index,
                                      double x, double y, double z, double w);
      void ProgramLocalParameter4dvARB(enum target, uint index,
                                       const double *params);

    update the values of the numbered program local parameter <index>
    belonging to the program object currently bound to <target>.
    For
    ProgramLocalParameter4fARB and ProgramLocalParameter4dARB, the four
    components of the parameter are updated with the values of <x>, <y>, <z>,
    and <w>, respectively. For ProgramLocalParameter4fvARB and
    ProgramLocalParameter4dvARB, the four components of the parameter are
    updated with the array of four values pointed to by <params>. The error
    INVALID_VALUE is generated if <index> is greater than or equal to the
    number of numbered program local parameters supported by <target>.


Additions to Chapter 6 of the OpenGL 1.2.1 Specification (State and
State Requests)

    Modify Section 6.1.11, Pointer and String Queries (p. 206)

    (modify last paragraph, p. 206) ... The possible values for <name> are
    VENDOR, RENDERER, VERSION, EXTENSIONS, and PROGRAM_ERROR_STRING_NV.

    (add after last paragraph of section, p. 207) Queries of
    PROGRAM_ERROR_STRING_NV return a pointer to an implementation-dependent
    program load error string. If the last call to LoadProgramNV failed to
    load a program, the returned string describes a reason that the program
    failed to load. Otherwise, a pointer to an empty string (containing only
    a terminator) is returned.

    Rename and modify Section 6.1.13, Vertex and Fragment Program Queries
    (from GL_NV_fragment_program). Portions of this section pertaining to
    fragment programs are copied verbatim.

    (insert after discussion of GetProgramParameter[fd]vNV)

    The commands

      void GetProgramNamedParameterfvNV(uint id, sizei len,
                                        const ubyte *name, float *params);
      void GetProgramNamedParameterdvNV(uint id, sizei len,
                                        const ubyte *name, double *params);

    obtain the current program named local parameter value for the parameter
    named <name> belonging to the program given by <id>. <name> is a pointer
    to an array of ubytes holding the parameter name.
    <len> specifies the
    number of ubytes in the array given by <name>. The error
    INVALID_OPERATION is generated if <id> specifies a nonexistent program or
    a program whose type does not support named local parameters. The error
    INVALID_VALUE is generated if <name> does not specify the name of a local
    parameter in the program corresponding to <id>. The error INVALID_VALUE
    is also generated if <len> is zero. Each named program local parameter is
    an array of four values.

    The commands

      void GetProgramLocalParameterdvARB(enum target, uint index,
                                         double *params);
      void GetProgramLocalParameterfvARB(enum target, uint index,
                                         float *params);

    obtain the current value for the numbered program local parameter <index>
    belonging to the program object currently bound to <target>, and place
    the information in the array <params>. The error INVALID_ENUM is
    generated if <target> specifies a nonexistent program target or a program
    target that does not support numbered program local parameters. The error
    INVALID_VALUE is generated if <index> is greater than or equal to the
    implementation-dependent number of supported numbered program local
    parameters for the program target.

    When the program target type is FRAGMENT_PROGRAM_NV, each numbered program
    local parameter returned is an array of four values. ...

    The command

      void GetProgramivNV(uint id, enum pname, int *params);

    obtains program state named by pname for the program named id in the array
    params. pname must be one of PROGRAM_TARGET_NV, PROGRAM_LENGTH_NV, or
    PROGRAM_RESIDENT_NV. The error INVALID_OPERATION is generated if the
    program named id does not exist.

    The command

      void GetProgramStringNV(uint id, enum pname,
                              ubyte *program);

    obtains the program string for program id. pname must be
    PROGRAM_STRING_NV.
    n ubytes are returned into the array program, where n is the length of
    the program in ubytes. GetProgramivNV with PROGRAM_LENGTH_NV can be used
    to query the length of a program's string. The INVALID_OPERATION error
    is generated if the program named id does not exist.

    ...

    The command

      boolean IsProgramNV(uint id);

    returns TRUE if id is the name of a program object. If id
    is zero or is a non-zero value that is not the name of a program
    object, or if an error condition occurs, IsProgramNV returns FALSE.
    A name returned by GenProgramsNV but not yet loaded with a program
    is not the name of a program object."


Additions to Appendix F of the OpenGL 1.2.1 Specification (ARB Extensions)

    Modify Section F.2.3 (Changes to Section 2.6), p.240

    (modify last paragraph on p.240) ... Multiple sets of texture coordinates
    may be used to specify how multiple texture images are mapped onto a
    primitive. The number of texture coordinate sets supported is
    implementation dependent, but must be at least 1. The number of texture
    coordinate sets supported may be queried with the state
    MAX_TEXTURE_COORDS_NV.

    Modify Section F.2.4 (Changes to Section 2.7), p.241

    (modify the last paragraph on p.241, carrying over to p.243)
    Implementations may support more than one set of texture coordinates. The
    commands

      void MultiTexCoord{1234}{sifd}ARB(enum texture, T coords)
      void MultiTexCoord{1234}{sifd}vARB(enum texture, T coords)

    take the coordinate set to be modified as the <texture> parameter.
    <texture> is a symbolic constant of the form TEXTUREi_ARB, indicating that
    texture coordinate set i is to be modified.
    The constants obey TEXTUREi_ARB = TEXTURE0_ARB + i (i is in the range 0
    to k-1, where k is the implementation dependent number of texture units
    defined by MAX_TEXTURE_COORDS_NV).


    Modify Section F.2.5 (Changes to Section 2.8), p.243

    (modify first and second paragraphs of section) ... The client may specify
    up to 5 plus the value of MAX_TEXTURE_COORDS_NV arrays; one each to store
    vertex coordinates...

    In implementations which support more than one texture coordinate set, the
    command

      void ClientActiveTextureARB(enum texture)

    is used to select the vertex array client state parameters to be modified
    by the TexCoordPointer command and the array affected by EnableClientState
    and DisableClientState with the parameter TEXTURE_COORD_ARRAY. This
    command sets the state variable CLIENT_ACTIVE_TEXTURE_ARB. Each texture
    coordinate set has a client state vector which is selected when this
    command is invoked. This state vector also includes the vertex array
    state. This command also selects the texture coordinate set state used
    for queries of client state.

    (modify first paragraph on p.244) If the number of supported texture
    coordinate sets (the value of MAX_TEXTURE_COORDS_NV) is k, ...


    Modify Section F.2.6 (Changes to Section 2.10.2), p.244

    (modify first paragraph) For each texture coordinate set, a 4x4 matrix is
    applied to the corresponding texture coordinates...

    (replace second and third paragraphs) The command

      void ActiveTextureARB(enum texture);

    specifies the active texture unit selector, ACTIVE_TEXTURE_ARB. Each
    texture unit contains up to two distinct sub-units: a texture coordinate
    processing unit (consisting of a texture matrix stack and texture
    coordinate generation state) and a texture image unit (consisting of all
    the texture state defined in Section 3.8).
    In implementations with a different number of supported texture
    coordinate sets and texture image units, some texture units may consist
    of only one of the two sub-units.

    The active texture unit selector specifies the texture unit accessed by
    commands involving texture coordinate processing. Such commands include
    those accessing the current matrix stack (if MATRIX_MODE is TEXTURE),
    TexGen (Section 2.10.4), Enable/Disable (if any texture coordinate
    generation enum is selected), as well as queries of the current texture
    coordinates and current raster texture coordinates. If the texture unit
    number corresponding to the current value of ACTIVE_TEXTURE_ARB is greater
    than or equal to the implementation dependent constant
    MAX_TEXTURE_COORDS_NV, the error INVALID_OPERATION is generated by any
    such command.

    The active texture unit selector also selects the texture unit accessed by
    commands involving texture image processing (Section 3.8). Such commands
    include all variants of TexEnv, TexParameter, and TexImage commands,
    BindTexture, Enable/Disable for any texture target (e.g., TEXTURE_2D), and
    queries of all such state. If the texture unit number corresponding to
    the current value of ACTIVE_TEXTURE_ARB is greater than or equal to the
    implementation dependent constant MAX_TEXTURE_IMAGE_UNITS_NV, the error
    INVALID_OPERATION is generated by any such command.

    ActiveTextureARB generates the error INVALID_ENUM if an invalid <texture>
    is specified. <texture> is a symbolic constant of the form TEXTUREi_ARB,
    indicating that texture unit i is to be modified. The constants obey
    TEXTUREi_ARB = TEXTURE0_ARB + i (i is in the range 0 to k-1, where k is
    the larger of MAX_TEXTURE_COORDS_NV and MAX_TEXTURE_IMAGE_UNITS_NV).
    For compatibility with old OpenGL specifications, the implementation
    dependent constant MAX_TEXTURE_UNITS_ARB specifies the number of
    conventional texture units supported by the implementation. Its value
    must be no larger than the minimum of MAX_TEXTURE_COORDS_NV and
    MAX_TEXTURE_IMAGE_UNITS_NV.

    Modify Section F.2.12 (Changes to Section 3.8.10), p.249

    (modify next-to-last paragraph) Texturing is enabled and disabled
    individually for each texture unit. If texturing is disabled for one of
    the units, then the fragment resulting from the previous unit is passed
    unaltered to the following unit. Individual texture units beyond those
    specified by MAX_TEXTURE_UNITS_ARB may be incomplete and are always
    treated as disabled.

    Modify Section F.2.15 (Changes to Section 6.1.2), p.251

    (add to end of paragraph) Queries of texture state variables corresponding
    to texture coordinate processing unit (namely, TexGen state and enables,
    and matrices) will produce an INVALID_OPERATION error if the value of
    ACTIVE_TEXTURE_ARB is greater than or equal to MAX_TEXTURE_COORDS_NV. All
    other texture state queries will result in an INVALID_OPERATION error if
    the value of ACTIVE_TEXTURE_ARB is greater than or equal to
    MAX_TEXTURE_IMAGE_UNITS_NV.

Additions to the AGL/GLX/WGL Specifications

    Program objects are shared between AGL/GLX/WGL rendering contexts if
    and only if the rendering contexts share display lists. No change
    is made to the AGL/GLX/WGL API.

Dependencies on GL_NV_vertex_program

    If NV_vertex_program is supported, the description of LoadProgramNV in
    Section 2.14.1.7 (up to the BNF description of vertex programs) is
    deleted, as it is replaced by the contents of Section 5.7 in this
    specification.
    The general error descriptions in Section 2.14.1.7 common to Section 5.7
    (like INVALID_OPERATION if the program fails to compile) should also be
    deleted. Section 2.14.1.8 should also be deleted. Section 6.1.13 is
    modified by this specification as described above.

Dependencies on NV_texture_shader

    If NV_texture_shader is not supported, the comment about texture shaders
    being disabled in fragment program mode is not applicable.

Dependencies on NV_texture_rectangle

    If NV_texture_rectangle is not supported, the references to "RECT" in the
    <texImageTarget> grammar rule and TEXTURE_RECTANGLE_NV are not applicable.

Dependencies on ARB_texture_cube_map

    If ARB_texture_cube_map is not supported, the references to "CUBE" in the
    <texImageTarget> grammar rule and TEXTURE_CUBE_MAP_ARB are not applicable.

Dependencies on EXT_fog_coord

    If EXT_fog_coord is not supported, references to "fog coordinate" in the
    definition of the "FOGC" fragment attribute register should be removed.

Dependencies on NV_depth_clamp

    If NV_depth_clamp is not supported, section 3.11.6 is modified to remove
    discussion of the depth clamp enable and instead indicate that fragments
    with depth values outside [min(n,f), max(n,f)] are always discarded.

Dependencies on ARB_depth_texture and SGIX_depth_texture

    If ARB_depth_texture is not supported, but SGIX_depth_texture is
    supported, the discussion of Table X.5 is modified to indicate that
    DEPTH_COMPONENT textures are treated as LUMINANCE.

    If neither extension is supported, the discussion of DEPTH_COMPONENT
    textures in Table X.5 should be removed.

Dependencies on NV_float_buffer

    If NV_float_buffer is not supported, references to FLOAT_R_NV,
    FLOAT_RG_NV, FLOAT_RGB_NV, and FLOAT_RGBA_NV internal texture formats in
    Table X.5 should be removed.

Dependencies on ARB_vertex_program

    NV_fragment_program has no explicit dependency on ARB_vertex_program, but
    the APIs for setting and querying numbered local parameters
    (ProgramLocalParameter*ARB and GetProgramLocalParameter*ARB) were taken
    directly from that extension.

Dependencies on ARB_fragment_program

    If ARB_fragment_program is not supported, the maximum number of executable
    instructions in any !!FP1.0 program is 1024. If ARB_fragment_program is
    supported, the maximum number of executable instructions for an !!FP1.0
    program is at least 1024, but can be larger. The limit can be queried by
    calling GetProgramivARB with <target> set to FRAGMENT_PROGRAM_ARB and
    <pname> set to MAX_PROGRAM_INSTRUCTIONS_ARB.


GLX Protocol

    Most of the GLX protocol needed to implement this extension is described
    in the GL_NV_vertex_program extension specification and will not be
    repeated here.

    The following two rendering commands are potentially large, and hence can
    be sent in a glXRender or glXRenderLarge request.

        ProgramNamedParameter4fvNV
            2           28+len+p        rendering command length
            2           4218            rendering command opcode
            4           CARD32          id
            4           CARD32          len
            4           FLOAT32         params[0]
            4           FLOAT32         params[1]
            4           FLOAT32         params[2]
            4           FLOAT32         params[3]
            len         LISTofCARD8     name
            p                           unused, p=pad(len)

        If the command is encoded in a glXRenderLarge request, the command
        opcode and command length fields above are expanded to 4 bytes each:

            4           32+len+p        rendering command length
            4           4218            rendering command opcode


        ProgramNamedParameter4dvNV
            2           44+len+p        rendering command length
            2           4219            rendering command opcode
            4           CARD32          id
            4           CARD32          len
            8           FLOAT64         params[0]
            8           FLOAT64         params[1]
            8           FLOAT64         params[2]
            8           FLOAT64         params[3]
            len         LISTofCARD8     name
            p                           unused, p=pad(len)

        If the command is encoded in a glXRenderLarge request, the command
        opcode and command length fields above are expanded to 4 bytes each:

            4           48+len+p        rendering command length
            4           4219            rendering command opcode


    The remaining two commands are non-rendering commands.
    These commands are sent separately (i.e., not as part of a glXRender or
    glXRenderLarge request), using the glXVendorPrivateWithReply request:

        GetProgramNamedParameterfvNV
            1           CARD8           opcode (X assigned)
            1           17              GLX opcode (glXVendorPrivateWithReply)
            2           4+(len+p)/4     request length
            4           1310            vendor specific opcode
            4           GLX_CONTEXT_TAG context tag
            4           INT32           len
            len         LISTofCARD8     name
            p                           unused, p=pad(len)
          =>

        If the command succeeds, 4 floats are sent in the reply:

            1           1               reply
            1                           unused
            2           CARD16          sequence number
            4           4               reply length
            24                          unused
            16          LISTofFLOAT32   params

        Otherwise, an empty reply is sent, indicating that a GL error
        occurred:

            1           1               reply
            1                           unused
            2           CARD16          sequence number
            4           0               reply length
            24                          unused


        GetProgramNamedParameterdvNV
            1           CARD8           opcode (X assigned)
            1           17              GLX opcode (glXVendorPrivateWithReply)
            2           4+(len+p)/4     request length
            4           1311            vendor specific opcode
            4           GLX_CONTEXT_TAG context tag
            4           INT32           len
            len         LISTofCARD8     name
            p                           unused, p=pad(len)
          =>

        If the command succeeds, 4 doubles are sent in the reply:

            1           1               reply
            1                           unused
            2           CARD16          sequence number
            4           8               reply length
            24                          unused
            32          LISTofFLOAT64   params

        Otherwise, an empty reply is sent, indicating that a GL error
        occurred:

            1           1               reply
            1                           unused
            2           CARD16          sequence number
            4           0               reply length
            24                          unused


Errors

    INVALID_OPERATION is generated by Begin, DrawPixels, Bitmap, CopyPixels,
    or a command that performs an explicit Begin if FRAGMENT_PROGRAM_NV is
    enabled and the currently bound fragment program does not exist.
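
    As a non-normative illustration of the GLX Protocol section, the length
    fields of this extension's named-parameter commands can be computed as in
    the C sketch below. The helper names (glx_pad, named_param_render_length,
    get_named_param_request_words) are hypothetical and are not part of GLX
    or of this extension, and pad(len) is assumed to be the usual padding to
    the next multiple of 4 bytes.

```c
#include <stddef.h>

/* pad(n): bytes needed to round n up to a multiple of 4 (assumed GLX rule). */
size_t glx_pad(size_t n)
{
    return (4 - (n & 3)) & 3;
}

/*
 * Rendering command length for ProgramNamedParameter4{fd}vNV.
 * param_bytes is 16 for the "fv" form (4 x FLOAT32) and 32 for the "dv"
 * form (4 x FLOAT64).  The length and opcode fields take 2 bytes each in a
 * glXRender request and 4 bytes each in a glXRenderLarge request.
 */
size_t named_param_render_length(size_t len, size_t param_bytes, int large)
{
    size_t header = large ? 8 : 4;            /* length + opcode fields */
    return header + 4 /* id */ + 4 /* len */ + param_bytes
         + len + glx_pad(len);                /* name plus padding */
}

/* Request length, in 4-byte units, for GetProgramNamedParameter{fd}vNV. */
size_t get_named_param_request_words(size_t len)
{
    return 4 + (len + glx_pad(len)) / 4;
}
```

    For a 5-byte parameter name, the sketch yields 28+5+3 bytes for the "fv"
    form in a glXRender request and 32+5+3 bytes in a glXRenderLarge request,
    matching the encodings given in the GLX Protocol section.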

    INVALID_OPERATION is generated by ProgramNamedParameter4fNV,
    ProgramNamedParameter4dNV, ProgramNamedParameter4fvNV,
    ProgramNamedParameter4dvNV, GetProgramNamedParameterfvNV, or
    GetProgramNamedParameterdvNV if <id> specifies a nonexistent program or a
    program whose type does not support local parameters.

    INVALID_VALUE is generated by ProgramNamedParameter4fNV,
    ProgramNamedParameter4dNV, ProgramNamedParameter4fvNV,
    ProgramNamedParameter4dvNV, GetProgramNamedParameterfvNV, or
    GetProgramNamedParameterdvNV if <len> is zero.

    INVALID_VALUE is generated by ProgramNamedParameter4fNV,
    ProgramNamedParameter4dNV, ProgramNamedParameter4fvNV,
    ProgramNamedParameter4dvNV, GetProgramNamedParameterfvNV, or
    GetProgramNamedParameterdvNV if <name> does not specify the name of a
    local parameter in the program corresponding to <id>.

    INVALID_OPERATION is generated by any command accessing texture coordinate
    processing state if the texture unit number corresponding to the current
    value of ACTIVE_TEXTURE_ARB is greater than or equal to the implementation
    dependent constant MAX_TEXTURE_COORDS_NV.

    INVALID_OPERATION is generated by any command accessing texture image
    processing state if the texture unit number corresponding to the current
    value of ACTIVE_TEXTURE_ARB is greater than or equal to the implementation
    dependent constant MAX_TEXTURE_IMAGE_UNITS_NV.


    (The following are error descriptions copied from GL_NV_vertex_program
    that apply to this extension as well. These modifications do not affect
    the behavior of that extension.)

    INVALID_VALUE is generated by LoadProgramNV if id is zero.

    INVALID_OPERATION is generated by LoadProgramNV if the program
    corresponding to id is currently loaded but has a program type different
    from that given by target.

    INVALID_OPERATION is generated by LoadProgramNV if the program specified
    is syntactically incorrect for the program type specified by target. The
    value of PROGRAM_ERROR_POSITION_NV is still updated when this error is
    generated.

    INVALID_OPERATION is generated by LoadProgramNV if the program specified
    fails to conform to any of the semantic restrictions imposed on programs
    of the type specified by target. The value of PROGRAM_ERROR_POSITION_NV
    is still updated when this error is generated.

    INVALID_OPERATION is generated by BindProgramNV if target does not match
    the type of the program named by id.

    INVALID_VALUE is generated by AreProgramsResidentNV if any of the queried
    programs are zero or do not exist.

    INVALID_OPERATION is generated by GetProgramivNV or GetProgramStringNV if
    the program named id does not exist.


New State

Get Value                    Type  Get Command  Initial Value  Description       Section  Attribute
---------------------------  ----  -----------  -------------  ----------------  -------  ---------
FRAGMENT_PROGRAM_NV          B     IsEnabled    FALSE          fragment program  3.11     enable
                                                               mode enable
FRAGMENT_PROGRAM_BINDING_NV  Z+    GetIntegerv  0              bound fragment    5.7      -
                                                               program

Table X.6. New State Introduced by NV_fragment_program.
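
    As a non-normative illustration, the two ACTIVE_TEXTURE_ARB rules listed
    in the Errors section can be modeled as simple range checks. In the C
    sketch below, the TextureLimits structure and the function names are
    hypothetical stand-ins for implementation state; the two limit values
    are assumed to come from GetIntegerv queries of MAX_TEXTURE_COORDS_NV
    and MAX_TEXTURE_IMAGE_UNITS_NV.

```c
#include <stdbool.h>

/* Hypothetical container for the two implementation dependent limits. */
typedef struct {
    unsigned max_texture_coords;       /* value of MAX_TEXTURE_COORDS_NV */
    unsigned max_texture_image_units;  /* value of MAX_TEXTURE_IMAGE_UNITS_NV */
} TextureLimits;

/* Number of valid TEXTUREi_ARB selectors: the larger of the two limits. */
unsigned selector_count(const TextureLimits *lim)
{
    return lim->max_texture_coords > lim->max_texture_image_units
         ? lim->max_texture_coords
         : lim->max_texture_image_units;
}

/* false means a coordinate processing command (texture matrix, TexGen,
 * current texture coordinates) would generate INVALID_OPERATION. */
bool coord_state_accessible(const TextureLimits *lim, unsigned unit)
{
    return unit < lim->max_texture_coords;
}

/* false means a texture image command (TexEnv, TexParameter, TexImage,
 * BindTexture) would generate INVALID_OPERATION. */
bool image_state_accessible(const TextureLimits *lim, unsigned unit)
{
    return unit < lim->max_texture_image_units;
}
```

    With limits of 8 coordinate sets and 16 image units, for example, unit 8
    is a valid selector for ActiveTextureARB and may be used for texture
    image state, but not for texture coordinate processing state.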


Get Value                  Type    Get Command         Initial Value  Description          Section  Attribute
-------------------------  ------  ------------------  -------------  -------------------  -------  ---------
PROGRAM_ERROR_POSITION_NV  Z       GetIntegerv         -1             program error        5.7      -
                                                                      position
PROGRAM_TARGET_NV          Z2      GetProgramivNV      0              program target       6.1.13   -
PROGRAM_LENGTH_NV          Z+      GetProgramivNV      0              program length       6.1.13   -
PROGRAM_RESIDENT_NV        Z2      GetProgramivNV      False          program residency    6.1.13   -
PROGRAM_STRING_NV          ubxn    GetProgramStringNV  ""             program string       6.1.13   -
-                          nxR4    GetProgramNamed-    (0,0,0,0)      named program local  5.7      -
                                   ParameterNV                        parameter value
-                          64+xR4  GetProgramLocal-    (0,0,0,0)      numbered program     5.7      -
                                   ParameterARB                       local parameter

Table X.7. Program Object State common to NV_vertex_program and NV_fragment_program.


Get Value  Type    Get Command  Initial Value  Description               Section   Attribute
---------  ------  -----------  -------------  ------------------------  --------  ---------
-          12xR4   -            fragment data  fragment attribute
                                               registers                 3.11.1.1  -
-          16xR4   -            (0,0,0,0)      fp32 temporary registers  3.11.1.2  -
-          32xR4   -            (0,0,0,0)      fp16 temporary registers  3.11.1.2  -
           (Z_4)4  -            (EQ,EQ,EQ,EQ)  condition code register   3.11.1.4  -
                                               address register

Table X.8. Fragment Program Per-Fragment Execution State.


New Implementation Dependent State

                                                Minimum
Get Value                   Type  Get Command   Value    Description         Section  Attribute
--------------------------  ----  -----------   -------  ------------------  -------  ---------
MAX_TEXTURE_COORDS_NV       Z+    GetIntegerv   2        number of texture   2.6      -
                                                         coordinate sets
                                                         supported
MAX_TEXTURE_IMAGE_UNITS_NV  Z+    GetIntegerv   2        number of texture   2.10.2   -
                                                         image units
                                                         supported
MAX_FRAGMENT_PROGRAM_       Z+    GetIntegerv   64       number of numbered  3.11.7   -
  LOCAL_PARAMETERS_NV                                    local parameters
                                                         supported


Revision History

    Rev.  Date      Author   Changes
    ----  --------  -------  --------------------------------------------
    73    05/23/05  pbrown   Fixed cut-and-paste error in the dependency
                             section where it said "NV_texture_rectangle"
                             instead of "ARB_texture_cube_map".

    72    05/16/04  pbrown   Documented that it's not possible to get results
                             from LG2 that are any more precise than what is
                             available in the fp32 storage format.

    71    04/23/04  pbrown   Fixed incorrect example.

    70    03/20/03  pbrown   Made the instruction count limit for !!FP1.0
                             programs queryable instead of a hard-wired value
                             of 1024. The limit can be queried using
                             ARB_fragment_program mechanisms, and remains 1024
                             if ARB_fragment_program is unsupported.

    69    02/01/03  pbrown   Removed support for combiner fragment programs
                             (!!FCP1.0).

    68    01/08/03  pbrown   Correct spec language providing examples of NaNs,
                             such as sqrt(-1) or log(-1). Division by zero
                             produces an infinity, not a NaN.

    67    12/23/02  pbrown   Fix incorrect syntax of examples of "KIL"
                             instruction. The condition code test is not
                             parenthesized in KIL.

    66    10/31/02  pbrown   Cleaned up special cases of POW, including the
                             fact that "POW dst, 0, 0" produces NaN in this
                             spec, not 1.0.

    65    10/28/02  pbrown   Documented that signed HILO textures will have
                             the hemisphere remapping applied, but unsigned
                             textures will not.

    64    09/17/02  pbrown   Minor typo fixes.

    63    08/14/02  pbrown   Clarified the value of the "other" components
                             of f[FOGC].

    62    07/24/02  pbrown   Removed PK4UBG and UP4UBG instructions.
                             Simplified the implementation of the temporary
                             and output register limit for combiner
                             programs by counting all four o[TEXn] registers
                             against the limit, whether or not they are
                             written.

    61    07/19/02  pbrown   Renamed ProgramLocalParameter*NV to
                             ProgramNamedParameter*NV to eliminate naming
                             conflicts with ARB_vertex_program (and presumably
                             ARB_fragment_program).

                             Added support for numbered program local
                             parameters for compatibility with the ARB vertex
                             program extension (and upcoming ARB fragment
                             program extension), so it's possible to set local
                             parameters the same way in both extensions.

                             Eliminated the language describing "register
                             slots" and how the "H" and "R" registers overlap.
                             Instead, registers are guaranteed not to overlap,
                             and a semantic limit is added on the number of
                             temporaries and output registers that can be used
                             by a program.

                             Eliminated the requirement that non-combiner
                             programs actually write a color value; the only
                             requirement is that one output register be
                             written. When using fragment programs that use
                             depth replacement, there may not be a need to
                             compute color if color writes are currently
                             disabled.

                             Cleaned up the issues section. Added several
                             examples of fragment program operation.

                             Cleaned up GLX protocol.

    59    07/07/02  pbrown   Minor clarifications of texture lookup handling.
                             Documented that DDX and DDY may not always
                             produce infinities.

    58    06/27/02  pbrown   Added clarification that instructions can use the
                             same attribute or parameter register more than
                             once. Added support for "X" precision on the
                             "set on" instructions. Removed "X" precision
                             support from DST.

    57    06/27/02  pbrown   Added missing table entries covering the use of
                             floating-point textures.

    56    06/27/02  pbrown   Modified the spec to indicate that depth textures
                             are treated as alpha, luminance, or intensity
                             according to the depth texture mode in ARB_shadow.

    55    06/26/02  pbrown   Corrected the aliased register number and
                             "read-only" mappings for o[DEPR] in combiner
                             programs.

    54    06/05/02  pbrown   Fixed the spec to indicate that near and far
                             frustum clipping is disabled for depth
                             replacement programs. Fixed the spec to indicate
                             that the register combiners enable is overridden
                             for fragment programs (enabled for combiner
                             programs, disabled for color programs).

    53    05/20/02  pbrown   Miscellaneous bug fixes for wording and
                             special-case handling errors.

    52    05/16/02  pbrown   Added "_SAT" suffix to clamp result vector
                             components to [0,1]. Fixed special case rules
                             for MUL instruction and the "UN" condition code.

    50    04/19/02  pbrown   Added "$" as a legal character in an identifier
                             name. Added example for fixed and conditional
                             write masks and condition code updates.

    49    04/16/02  pbrown   Added new query of PROGRAM_ERROR_STRING_NV to
                             return more detailed information on program load
                             failures.

    48    04/02/02  pbrown   Added missing enum value for the
                             FRAGMENT_PROGRAM_BINDING_NV query.

    47    03/15/02  pbrown   Fixed various typos, and an incorrect description
                             of the MAX operation.

    45    01/31/02  pbrown   Renamed the packing and unpacking opcodes to more
                             closely match OpenGL data type naming conventions
                             (PK2 becomes PK2H, PK16 becomes PK2US, PK4
                             becomes PK4B, PKB becomes PK4UB). Renamed "BEM"
                             instruction to "X2D" to reflect the fact that it
                             does a 2D coordinate transformation (not just a
                             bump mapping operation). Added PK4UBG and UP4UBG
                             instructions to support sRGB gamma correction
                             when packing and unpacking components.

    44    01/18/02  pbrown   Double the number of available temporaries (16 to
                             32 fp32 vectors). Add BEM (texture coordinate
                             offset), PKB/UPB (unsigned byte packing), and
                             PK16/UP16 (unsigned short packing) instructions.

    43    01/04/02  pbrown   Documented special cases for comparisons,
                             including the handling of NaN in the SNE
                             instruction. Added automatic generation of a
                             third normal component for HILO textures.
                             Documented the restriction that RFL can't write
                             to the w component of the result. Trivial fix of
                             the special-cases for RCP. Fixed minor typo on
                             the TEX instruction.

    40    11/26/01  pbrown   Eliminated "X" precision specifier on those
                             instructions that do complicated math or don't
                             otherwise need it (e.g., "SGE"). Fixed special
                             case math on LG2 instruction. Eliminated
                             incorrectly specified exponent clamping on LIT
                             instruction. Fixed description and special-case
                             math on LIT/POW instructions. Specified that
                             combiner program outputs are clamped to [-1,+1],
                             not [+0,+1].

    39    11/16/01  pbrown   Added semantic restriction that PK2/PK4 must
                             write to a 32-bit register. Cleaned up the
                             converse restrictions on UP2/UP4, making sure to
                             allow UP2/UP4 from a program parameter. Fix
                             section numberings and a few typos.

    36    11/07/01  pbrown   Cleaned up explanation of the "negative q is
                             undefined" for texture mapping spec restriction.
                             Fixed a nit on the number of condition code
                             values (now 4 with UN - unordered).

    35    10/29/01  pbrown   Add a SUB instruction for programmer
                             convenience. Moved unresolved issue list back to
                             the "Issues" section. Fix several minor wording
                             issues. Clarify register combiners/texture
                             shader/fragment program flow control diagram.

    32    10/19/01  pbrown   Document the fragment program restriction that
                             instructions involving f[FOGC] and f[TEX0-TEX7]
                             are always carried out at fp32 precision.

    31    10/19/01  pbrown   Fixed incorrect description of encoding of fp16
                             denorms.

    30    10/12/01  pbrown   Documented (0,0,0,0) local parameter
                             initialization.
                             Disallow multiple defines of the same token.
                             Allow tokens that look like a possible register
                             or texture name, but have numbers that are too
                             big (e.g., "TEX24", "R37").
                             Fixed up several grammar bugs. Documented that
                             LG2 and RSQ now do not automatically take
                             absolute values, plus new math special cases.