Name

    NV_gpu_program4

Name Strings

    GL_NV_gpu_program4

Contact

    Pat Brown, NVIDIA Corporation (pbrown 'at' nvidia.com)

Status

    Shipping for GeForce 8 Series (November 2006)

Version

    Last Modified Date:         09/11/2014
    NVIDIA Revision:            11

Number

    322

Dependencies

    This extension is written against the OpenGL 2.0 specification.

    OpenGL 2.0 is not required, but we expect all implementations of this
    extension will also support OpenGL 2.0.

    This extension is also written against the ARB_vertex_program
    specification, which provides the basic mechanisms for the assembly
    programming model used by this extension.

    This extension serves as the basis for the NV_fragment_program4,
    NV_geometry_program4, and NV_vertex_program4 extensions, which all build
    on this extension to support fragment, geometry, and vertex programs,
    respectively. If "GL_NV_gpu_program4" is found in the extension string,
    all of these extensions are supported.

    NV_parameter_buffer_object affects the definition of this extension.

    ARB_texture_rectangle trivially affects the definition of this extension.

    EXT_gpu_program_parameters trivially affects the definition of this
    extension.

    EXT_texture_integer trivially affects the definition of this extension.

    EXT_texture_array trivially affects the definition of this extension.

    EXT_texture_buffer_object trivially affects the definition of this
    extension.

    NV_primitive_restart trivially affects the definition of this extension.

Overview

    This specification documents the common instruction set and basic
    functionality provided by NVIDIA's 4th generation of assembly instruction
    sets supporting programmable graphics pipeline stages.

    The instruction set builds upon the basic framework provided by the
    ARB_vertex_program and ARB_fragment_program extensions to expose
    considerably more capable hardware.
    In addition to new capabilities for vertex and fragment programs, this
    extension provides a new program type (geometry programs), further
    described in the NV_geometry_program4 specification.

    NV_gpu_program4 provides a unified instruction set: all instruction set
    features are available for all program types, except for a small number
    of features that make sense only for a specific program type. It provides
    fully capable signed and unsigned integer data types, along with a set of
    arithmetic, logical, and data type conversion instructions capable of
    operating on integers. It also provides a uniform set of structured
    branching constructs (if tests, loops, and subroutines) that fully support
    run-time condition testing.

    This extension provides several new texture mapping capabilities. Shadow
    cube maps are supported, where cube map faces can encode depth values.
    Texture lookup instructions can include an immediate texel offset, which
    can assist in advanced filtering. New instructions are provided to fetch
    a single texel by address in a texture map (TXF) and query the size of a
    specified texture level (TXQ).

    By and large, vertex and fragment programs written to ARB_vertex_program
    and ARB_fragment_program can be ported directly by simply changing the
    program header from "!!ARBvp1.0" or "!!ARBfp1.0" to "!!NVvp4.0" or
    "!!NVfp4.0", and then modifying the code to take advantage of the expanded
    feature set. There are a small number of areas where this extension is
    not a functional superset of previous vertex program extensions, which are
    documented in this specification.
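    The header change described above is a purely textual step. The minimal
    C sketch below isolates that step. The helper name
    port_arb_program_header is hypothetical and is not part of any GL API.
    The rest of the port is not attempted, including any changes needed in
    the areas where this extension is not a functional superset.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, not a GL entry point: returns a newly allocated copy
 * of an ARB program string whose "!!ARBvp1.0" or "!!ARBfp1.0" header is
 * replaced by "!!NVvp4.0" or "!!NVfp4.0".  The program body is copied
 * unchanged.  Returns NULL if the string does not begin with one of the two
 * ARB headers, or if memory allocation fails.  The caller frees the result. */
char *port_arb_program_header(const char *src)
{
    static const struct {
        const char *arb_header;
        const char *nv_header;
    } header_map[] = {
        { "!!ARBvp1.0", "!!NVvp4.0" },
        { "!!ARBfp1.0", "!!NVfp4.0" },
    };

    for (size_t i = 0; i < sizeof header_map / sizeof header_map[0]; i++) {
        size_t arb_len = strlen(header_map[i].arb_header);
        if (strncmp(src, header_map[i].arb_header, arb_len) != 0)
            continue;

        const char *body = src + arb_len;       /* everything after the header */
        size_t nv_len = strlen(header_map[i].nv_header);
        char *out = malloc(nv_len + strlen(body) + 1);
        if (out == NULL)
            return NULL;
        memcpy(out, header_map[i].nv_header, nv_len);
        strcpy(out + nv_len, body);             /* also copies the final NUL */
        return out;
    }
    return NULL;                                /* not an ARB vp/fp program */
}
```

    For example, the string "!!ARBfp1.0\nMOV result.color, fragment.color;\nEND\n"
    comes back with only its first line changed, to "!!NVfp4.0". A string that
    already starts with an NV header returns NULL.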

New Procedures and Functions

    void ProgramLocalParameterI4iNV(enum target, uint index,
                                    int x, int y, int z, int w);
    void ProgramLocalParameterI4ivNV(enum target, uint index,
                                     const int *params);
    void ProgramLocalParametersI4ivNV(enum target, uint index,
                                      sizei count, const int *params);
    void ProgramLocalParameterI4uiNV(enum target, uint index,
                                     uint x, uint y, uint z, uint w);
    void ProgramLocalParameterI4uivNV(enum target, uint index,
                                      const uint *params);
    void ProgramLocalParametersI4uivNV(enum target, uint index,
                                       sizei count, const uint *params);

    void ProgramEnvParameterI4iNV(enum target, uint index,
                                  int x, int y, int z, int w);
    void ProgramEnvParameterI4ivNV(enum target, uint index,
                                   const int *params);
    void ProgramEnvParametersI4ivNV(enum target, uint index,
                                    sizei count, const int *params);
    void ProgramEnvParameterI4uiNV(enum target, uint index,
                                   uint x, uint y, uint z, uint w);
    void ProgramEnvParameterI4uivNV(enum target, uint index,
                                    const uint *params);
    void ProgramEnvParametersI4uivNV(enum target, uint index,
                                     sizei count, const uint *params);

    void GetProgramLocalParameterIivNV(enum target, uint index,
                                       int *params);
    void GetProgramLocalParameterIuivNV(enum target, uint index,
                                        uint *params);
    void GetProgramEnvParameterIivNV(enum target, uint index,
                                     int *params);
    void GetProgramEnvParameterIuivNV(enum target, uint index,
                                      uint *params);

New Tokens

    Accepted by the <pname> parameter of GetBooleanv, GetIntegerv,
    GetFloatv, and GetDoublev:

        MIN_PROGRAM_TEXEL_OFFSET_EXT                    0x8904
        MAX_PROGRAM_TEXEL_OFFSET_EXT                    0x8905

    (note: these tokens are shared with the EXT_gpu_shader4 extension.)

    Accepted by the <pname> parameter of GetProgramivARB:

        PROGRAM_ATTRIB_COMPONENTS_NV                    0x8906
        PROGRAM_RESULT_COMPONENTS_NV                    0x8907
        MAX_PROGRAM_ATTRIB_COMPONENTS_NV                0x8908
        MAX_PROGRAM_RESULT_COMPONENTS_NV                0x8909
        MAX_PROGRAM_GENERIC_ATTRIBS_NV                  0x8DA5
        MAX_PROGRAM_GENERIC_RESULTS_NV                  0x8DA6

Additions to Chapter 2 of the OpenGL 1.5 Specification (OpenGL Operation)

    (Modify "Section 2.14.1" of the ARB_vertex_program specification,
    describing program parameters.)

    Each program object has an associated array of program local parameters.
    Program local parameters are four-component vectors whose components can
    hold floating-point, signed integer, or unsigned integer values. The data
    type of each local parameter is established when the parameter's values
    are assigned. If a program attempts to read a local parameter using a
    data type other than the one used when the parameter is set, the values
    returned are undefined. ... The commands

      void ProgramLocalParameter4fARB(enum target, uint index,
                                      float x, float y, float z, float w);
      void ProgramLocalParameter4fvARB(enum target, uint index,
                                       const float *params);
      void ProgramLocalParameter4dARB(enum target, uint index,
                                      double x, double y, double z, double w);
      void ProgramLocalParameter4dvARB(enum target, uint index,
                                       const double *params);

      void ProgramLocalParameterI4iNV(enum target, uint index,
                                      int x, int y, int z, int w);
      void ProgramLocalParameterI4ivNV(enum target, uint index,
                                       const int *params);
      void ProgramLocalParameterI4uiNV(enum target, uint index,
                                       uint x, uint y, uint z, uint w);
      void ProgramLocalParameterI4uivNV(enum target, uint index,
                                        const uint *params);

    update the values of the program local parameter numbered <index>
    belonging to the program object currently bound to <target>.
    For the non-vector versions of these commands, the four components of the
    parameter are updated with the values of <x>, <y>, <z>, and <w>,
    respectively. For the vector versions, the components of the parameter
    are updated with the array of four values pointed to by <params>. The
    error INVALID_VALUE is generated if <index> is greater than or equal to
    the number of program local parameters supported by <target>.

    The commands

      void ProgramLocalParameters4fvNV(enum target, uint index,
                                       sizei count, const float *params);
      void ProgramLocalParametersI4ivNV(enum target, uint index,
                                        sizei count, const int *params);
      void ProgramLocalParametersI4uivNV(enum target, uint index,
                                         sizei count, const uint *params);

    update the values of the program local parameters numbered <index> through
    <index> + <count> - 1 with the array of 4 * <count> values pointed to by
    <params>. The error INVALID_VALUE is generated if the sum of <index> and
    <count> is greater than the number of program local parameters supported
    by <target>.

    When a program local parameter is updated, the data type of its components
    is assigned according to the data type of the provided values. If the
    values provided are of type "float" or "double", the components of the
    parameter are floating-point. If the values provided are of type "int",
    the components of the parameter are signed integers. If the values
    provided are of type "uint", the components of the parameter are unsigned
    integers.

    Additionally, each program target has an associated array of program
    environment parameters. Unlike program local parameters, program
    environment parameters are shared by all program objects of a given
    target. Program environment parameters are four-component vectors whose
    components can hold floating-point, signed integer, or unsigned integer
    values.
    The data type of each environment parameter is established when the
    parameter's values are assigned. If a program attempts to read an
    environment parameter using a data type other than the one used when the
    parameter is set, the values returned are undefined. ... The commands

      void ProgramEnvParameter4fARB(enum target, uint index,
                                    float x, float y, float z, float w);
      void ProgramEnvParameter4fvARB(enum target, uint index,
                                     const float *params);
      void ProgramEnvParameter4dARB(enum target, uint index,
                                    double x, double y, double z, double w);
      void ProgramEnvParameter4dvARB(enum target, uint index,
                                     const double *params);

      void ProgramEnvParameterI4iNV(enum target, uint index,
                                    int x, int y, int z, int w);
      void ProgramEnvParameterI4ivNV(enum target, uint index,
                                     const int *params);
      void ProgramEnvParameterI4uiNV(enum target, uint index,
                                     uint x, uint y, uint z, uint w);
      void ProgramEnvParameterI4uivNV(enum target, uint index,
                                      const uint *params);

    update the values of the program environment parameter numbered <index>
    for the given program target <target>. For the non-vector versions of
    these commands, the four components of the parameter are updated with the
    values of <x>, <y>, <z>, and <w>, respectively. For the vector versions,
    the four components of the parameter are updated with the array of four
    values pointed to by <params>. The error INVALID_VALUE is generated if
    <index> is greater than or equal to the number of program environment
    parameters supported by <target>.
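    The typing and index rules above can be modeled in a few lines of C. The
    sketch below is illustrative only and makes no GL calls. The names
    (EnvParam, set_env_param_4f, set_env_param_i4i, set_env_param_i4ui,
    read_is_defined) are hypothetical. The fixed limit MODEL_MAX_ENV_PARAMS
    is a hypothetical stand-in for the implementation-dependent number of
    environment parameters supported by one target. Each write records the
    type of the values supplied. A read is only "defined" when it uses that
    same type.

```c
#include <stdint.h>

/* Hypothetical stand-in for the implementation-dependent number of
 * program environment parameters supported by one target. */
#define MODEL_MAX_ENV_PARAMS 256u

typedef enum { PARAM_FLOAT = 0, PARAM_INT, PARAM_UINT } ParamType;

typedef enum { MODEL_NO_ERROR = 0, MODEL_INVALID_VALUE } ModelError;

typedef struct {
    ParamType type;              /* set by whichever command last wrote it */
    union {
        float    f[4];
        int32_t  i[4];
        uint32_t u[4];
    } v;
} EnvParam;

/* Zero-initialized, so in this model every parameter starts out as a
 * floating-point (0, 0, 0, 0) vector. */
static EnvParam env_params[MODEL_MAX_ENV_PARAMS];

/* Models ProgramEnvParameter4fARB: float values make the components
 * floating-point. */
ModelError set_env_param_4f(uint32_t index, float x, float y, float z, float w)
{
    if (index >= MODEL_MAX_ENV_PARAMS)
        return MODEL_INVALID_VALUE;      /* index >= number of parameters */
    EnvParam *p = &env_params[index];
    p->type = PARAM_FLOAT;
    p->v.f[0] = x; p->v.f[1] = y; p->v.f[2] = z; p->v.f[3] = w;
    return MODEL_NO_ERROR;
}

/* Models ProgramEnvParameterI4iNV: int values make the components signed
 * integers. */
ModelError set_env_param_i4i(uint32_t index, int32_t x, int32_t y,
                             int32_t z, int32_t w)
{
    if (index >= MODEL_MAX_ENV_PARAMS)
        return MODEL_INVALID_VALUE;
    EnvParam *p = &env_params[index];
    p->type = PARAM_INT;
    p->v.i[0] = x; p->v.i[1] = y; p->v.i[2] = z; p->v.i[3] = w;
    return MODEL_NO_ERROR;
}

/* Models ProgramEnvParameterI4uiNV: uint values make the components
 * unsigned integers. */
ModelError set_env_param_i4ui(uint32_t index, uint32_t x, uint32_t y,
                              uint32_t z, uint32_t w)
{
    if (index >= MODEL_MAX_ENV_PARAMS)
        return MODEL_INVALID_VALUE;
    EnvParam *p = &env_params[index];
    p->type = PARAM_UINT;
    p->v.u[0] = x; p->v.u[1] = y; p->v.u[2] = z; p->v.u[3] = w;
    return MODEL_NO_ERROR;
}

/* Returns nonzero if a program reading parameter <index> as <type> would get
 * defined values, i.e. <type> matches the type used when it was last set. */
int read_is_defined(uint32_t index, ParamType type)
{
    return index < MODEL_MAX_ENV_PARAMS && env_params[index].type == type;
}
```

    Storing a tag next to a union mirrors the rule that a parameter's type is
    fixed at assignment time rather than at declaration time. Overwriting a
    parameter with a different command therefore simply changes its tag.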

    The commands

      void ProgramEnvParameters4fvNV(enum target, uint index,
                                     sizei count, const float *params);
      void ProgramEnvParametersI4ivNV(enum target, uint index,
                                      sizei count, const int *params);
      void ProgramEnvParametersI4uivNV(enum target, uint index,
                                       sizei count, const uint *params);

    update the values of the program environment parameters numbered <index>
    through <index> + <count> - 1 with the array of 4 * <count> values pointed
    to by <params>. The error INVALID_VALUE is generated if the sum of
    <index> and <count> is greater than the number of program environment
    parameters supported by <target>.

    When a program environment parameter is updated, the data type of its
    components is assigned according to the data type of the provided values.
    If the values provided are of type "float" or "double", the components of
    the parameter are floating-point. If the values provided are of type
    "int", the components of the parameter are signed integers. If the values
    provided are of type "uint", the components of the parameter are unsigned
    integers.

    ...


    Insert New Section 2.X between Sections 2.Y and 2.Z:

    Section 2.X, GPU Programs

    The GL provides a number of different program targets that allow an
    application to either replace certain fixed-function pipeline stages with
    a fully programmable model or use a program to control aspects of the GL
    pipeline that previously had only hard-wired behavior.

    A common base instruction set is available for all program types,
    providing both integer and floating-point operations. Structured
    branching operations and subroutine calls are available. Texture
    mapping (loading data from external images) is supported for all
    program types.
    The main differences between the different program types are the set of
    available inputs and outputs, which are program type-specific, and a few
    instructions that are meaningful for only a subset of program types.


    Section 2.X.2, Program Grammar

    GPU program strings are specified as an array of ASCII characters
    containing the program text. When a GPU program is loaded by a call to
    ProgramStringARB, the program string is parsed into a set of tokens
    possibly separated by whitespace. Spaces, tabs, newlines, carriage
    returns, and comments are considered whitespace. Comments begin with the
    character "#" and are terminated by a newline, a carriage return, or the
    end of the program array.

    The Backus-Naur Form (BNF) grammar below specifies the syntactically valid
    sequences for GPU programs. The set of valid tokens can be inferred
    from the grammar. A line containing "/* empty */" represents an empty
    string and is used to indicate optional rules. A program is invalid if it
    contains any tokens or characters not defined in this specification.

    Note that this extension is not a standalone extension and a small number
    of grammar rules are left to be defined in the extensions defining the
    specific vertex, fragment, and geometry program types.
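    The whitespace and comment rule described above can be stated directly
    in code. The C sketch below advances over whitespace and "#" comments.
    The function name skip_whitespace_and_comments is hypothetical, and a
    real parser may be organized differently. Like the string passed to
    ProgramStringARB, the program array here is given by its bounds rather
    than being NUL-terminated.

```c
/* Hypothetical helper: starting at p, skip spaces, tabs, newlines, carriage
 * returns, and comments in the program array [p, end).  A comment starts
 * with '#' and runs up to (not including) the next newline or carriage
 * return, or to the end of the array.  Returns a pointer to the first
 * character of the next token, or end if only whitespace remains. */
const char *skip_whitespace_and_comments(const char *p, const char *end)
{
    while (p < end) {
        char c = *p;
        if (c == ' ' || c == '\t' || c == '\n' || c == '\r') {
            p++;                                  /* ordinary whitespace */
        } else if (c == '#') {
            while (p < end && *p != '\n' && *p != '\r')
                p++;                              /* consume comment body */
        } else {
            break;                                /* start of a token */
        }
    }
    return p;
}
```

    The comment loop stops at the newline or carriage return without
    consuming it, so the terminator is then skipped as ordinary whitespace. A
    comment that runs to the end of the array simply ends the scan.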

    <program>               ::= <optionSequence> <declSequence>
                                <statementSequence> "END"

    <optionSequence>        ::= <option> <optionSequence>
                              | /* empty */

    <option>                ::= "OPTION" <identifier> ";"

    <declSequence>          ::= /* empty */

    <statementSequence>     ::= <statement> <statementSequence>
                              | /* empty */

    <statement>             ::= <instruction> ";"
                              | <namingStatement> ";"
                              | <instLabel> ":"

    <instruction>           ::= <ALUInstruction>
                              | <TexInstruction>
                              | <FlowInstruction>

    <ALUInstruction>        ::= <VECTORop_instruction>
                              | <SCALARop_instruction>
                              | <BINSCop_instruction>
                              | <BINop_instruction>
                              | <VECSCAop_instruction>
                              | <TRIop_instruction>
                              | <SWZop_instruction>

    <TexInstruction>        ::= <TEXop_instruction>
                              | <TXDop_instruction>

    <FlowInstruction>       ::= <BRAop_instruction>
                              | <FLOWCCop_instruction>
                              | <IFop_instruction>
                              | <REPop_instruction>
                              | <ENDFLOWop_instruction>

    <VECTORop_instruction>  ::= <VECTORop> <opModifiers> <instResult> ","
                                <instOperandV>

    <VECTORop>              ::= "ABS"
                              | "CEIL"
                              | "FLR"
                              | "FRC"
                              | "I2F"
                              | "LIT"
                              | "MOV"
                              | "NOT"
                              | "NRM"
                              | "PK2H"
                              | "PK2US"
                              | "PK4B"
                              | "PK4UB"
                              | "ROUND"
                              | "SSG"
                              | "TRUNC"

    <SCALARop_instruction>  ::= <SCALARop> <opModifiers> <instResult> ","
                                <instOperandS>

    <SCALARop>              ::= "COS"
                              | "EX2"
                              | "LG2"
                              | "RCC"
                              | "RCP"
                              | "RSQ"
                              | "SCS"
                              | "SIN"
                              | "UP2H"
                              | "UP2US"
                              | "UP4B"
                              | "UP4UB"

    <BINSCop_instruction>   ::= <BINSCop> <opModifiers> <instResult> ","
                                <instOperandS> "," <instOperandS>

    <BINSCop>               ::= "POW"

    <VECSCAop_instruction>  ::= <VECSCAop> <opModifiers> <instResult> ","
                                <instOperandV> "," <instOperandS>

    <VECSCAop>              ::= "DIV"
                              | "SHL"
                              | "SHR"
                              | "MOD"

    <BINop_instruction>     ::= <BINop> <opModifiers> <instResult> ","
                                <instOperandV> "," <instOperandV>

    <BINop>                 ::= "ADD"
                              | "AND"
                              | "DP3"
                              | "DP4"
                              | "DPH"
                              | "DST"
                              | "MAX"
                              | "MIN"
                              | "MUL"
                              | "OR"
                              | "RFL"
                              | "SEQ"
                              | "SFL"
                              | "SGE"
                              | "SGT"
                              | "SLE"
                              | "SLT"
                              | "SNE"
                              | "STR"
                              | "SUB"
                              | "XPD"
                              | "DP2"
                              | "XOR"

    <TRIop_instruction>     ::= <TRIop> <opModifiers> <instResult> ","
                                <instOperandV> "," <instOperandV> ","
                                <instOperandV>

    <TRIop>                 ::= "CMP"
                              | "DP2A"
                              | "LRP"
                              | "MAD"
                              | "SAD"
                              | "X2D"

    <SWZop_instruction>     ::= <SWZop> <opModifiers> <instResult> ","
                                <instOperandVNS> "," <extendedSwizzle>

    <SWZop>                 ::= "SWZ"

    <TEXop_instruction>     ::= <TEXop> <opModifiers> <instResult> ","
                                <instOperandV> "," <texAccess>

    <TEXop>                 ::= "TEX"
                              | "TXB"
                              | "TXF"
                              | "TXL"
                              | "TXP"
                              | "TXQ"

    <TXDop_instruction>     ::= <TXDop> <opModifiers> <instResult> ","
                                <instOperandV> "," <instOperandV> ","
                                <instOperandV> "," <texAccess>

    <TXDop>                 ::= "TXD"

    <BRAop_instruction>     ::= <BRAop> <opModifiers> <instTarget>
                                <optBranchCond>

    <BRAop>                 ::= "CAL"

    <FLOWCCop_instruction>  ::= <FLOWCCop> <opModifiers> <optBranchCond>

    <FLOWCCop>              ::= "RET"
                              | "BRK"
                              | "CONT"

    <IFop_instruction>      ::= <IFop> <opModifiers> <ccTest>

    <IFop>                  ::= "IF"

    <REPop_instruction>     ::= <REPop> <opModifiers> <instOperandV>
                              | <REPop> <opModifiers>

    <REPop>                 ::= "REP"

    <ENDFLOWop_instruction> ::= <ENDFLOWop> <opModifiers>

    <ENDFLOWop>             ::= "ELSE"
                              | "ENDIF"
                              | "ENDREP"

    <opModifiers>           ::= <opModifierItem> <opModifiers>
                              | /* empty */

    <opModifierItem>        ::= "." <opModifier>

    <opModifier>            ::= "F"
                              | "U"
                              | "S"
                              | "CC"
                              | "CC0"
                              | "CC1"
                              | "SAT"
                              | "SSAT"
                              | "NTC"
                              | "S24"
                              | "U24"
                              | "HI"

    <texAccess>             ::= <texImageUnit> "," <texTarget>
                              | <texImageUnit> "," <texTarget> "," <texOffset>

    <texImageUnit>          ::= "texture" <optArrayMemAbs>

    <texTarget>             ::= "1D"
                              | "2D"
                              | "3D"
                              | "CUBE"
                              | "RECT"
                              | "SHADOW1D"
                              | "SHADOW2D"
                              | "SHADOWRECT"
                              | "ARRAY1D"
                              | "ARRAY2D"
                              | "SHADOWCUBE"
                              | "SHADOWARRAY1D"
                              | "SHADOWARRAY2D"

    <texOffset>             ::= "(" <texOffsetComp> ")"
                              | "(" <texOffsetComp> "," <texOffsetComp> ")"
                              | "(" <texOffsetComp> "," <texOffsetComp> ","
                                <texOffsetComp> ")"

    <texOffsetComp>         ::= <optSign> <int>

    <optBranchCond>         ::= /* empty */
                              | <ccMask>

    <instOperandV>          ::= <instOperandAbsV>
                              | <instOperandBaseV>

    <instOperandAbsV>       ::= <operandAbsNeg> "|" <instOperandBaseV> "|"

    <instOperandBaseV>      ::= <operandNeg> <attribUseV>
                              | <operandNeg> <tempUseV>
                              | <operandNeg> <paramUseV>
                              | <operandNeg> <bufferUseV>

    <instOperandS>          ::= <instOperandAbsS>
                              | <instOperandBaseS>

    <instOperandAbsS>       ::= <operandAbsNeg> "|" <instOperandBaseS> "|"

    <instOperandBaseS>      ::= <operandNeg> <attribUseS>
                              | <operandNeg> <tempUseS>
                              | <operandNeg> <paramUseS>
                              | <operandNeg> <bufferUseS>

    <instOperandVNS>        ::= <attribUseVNS>
                              | <tempUseVNS>
                              | <paramUseVNS>
                              | <bufferUseVNS>

    <operandAbsNeg>         ::= <optSign>

    <operandNeg>            ::= <optSign>

    <instResult>            ::= <instResultCC>
                              | <instResultBase>

    <instResultCC>          ::= <instResultBase> <ccMask>

    <instResultBase>        ::= <tempUseW>
                              | <resultUseW>

    <namingStatement>       ::= <varMods> <ATTRIB_statement>
                              | <varMods> <PARAM_statement>
                              | <varMods> <TEMP_statement>
                              | <varMods> <OUTPUT_statement>
                              | <varMods> <BUFFER_statement>
                              | <ALIAS_statement>

    <ATTRIB_statement>      ::= "ATTRIB" <establishName> "=" <attribUseD>

    <PARAM_statement>       ::= <PARAM_singleStmt>
                              | <PARAM_multipleStmt>

    <PARAM_singleStmt>      ::= "PARAM" <establishName> <paramSingleInit>

    <PARAM_multipleStmt>    ::= "PARAM" <establishName> <optArraySize>
                                <paramMultipleInit>

    <paramSingleInit>       ::= "=" <paramUseDB>

    <paramMultipleInit>     ::= "=" "{" <paramMultInitList> "}"

    <paramMultInitList>     ::= <paramUseDM>
                              | <paramUseDM> "," <paramMultInitList>

    <TEMP_statement>        ::= "TEMP" <varNameList>

    <OUTPUT_statement>      ::= "OUTPUT" <establishName> "=" <resultUseD>

    <varMods>               ::= <varModifier> <varMods>
                              | /* empty */

    <varModifier>           ::= "SHORT"
                              | "LONG"
                              | "INT"
                              | "UINT"
                              | "FLOAT"

    <ALIAS_statement>       ::= "ALIAS" <establishName> "=" <establishedName>

    <BUFFER_statement>      ::= <bufferDeclType> <establishName>
                                <bufferSingleInit>
                              | <bufferDeclType> <establishName>
                                <optArraySize> <bufferMultInit>

    <bufferDeclType>        ::= "BUFFER"
                              | "BUFFER4"

    <bufferSingleInit>      ::= "=" <bufferUseDB>

    <bufferMultInit>        ::= "=" "{" <bufferMultInitList> "}"

    <bufferMultInitList>    ::= <bufferUseDM>
                              | <bufferUseDM> "," <bufferMultInitList>

    <varNameList>           ::= <establishName>
                              | <establishName> "," <varNameList>

    <attribUseV>            ::= <attribBasic> <swizzleSuffix>
                              | <attribVarName> <swizzleSuffix>
                              | <attribVarName> <arrayMem> <swizzleSuffix>
                              | <attribColor> <swizzleSuffix>
                              | <attribColor> "." <colorType> <swizzleSuffix>

    <attribUseS>            ::= <attribBasic> <scalarSuffix>
                              | <attribVarName> <scalarSuffix>
                              | <attribVarName> <arrayMem> <scalarSuffix>
                              | <attribColor> <scalarSuffix>
                              | <attribColor> "." <colorType> <scalarSuffix>

    <attribUseVNS>          ::= <attribBasic>
                              | <attribVarName>
                              | <attribVarName> <arrayMem>
                              | <attribColor>
                              | <attribColor> "." <colorType>

    <attribUseD>            ::= <attribBasic>
                              | <attribColor>
                              | <attribColor> "." <colorType>
                              | <attribMulti>

    <paramUseV>             ::= <paramVarName> <optArrayMem> <swizzleSuffix>
                              | <stateSingleItem> <swizzleSuffix>
                              | <programSingleItem> <swizzleSuffix>
                              | <constantVector> <swizzleSuffix>
                              | <constantScalar>

    <paramUseS>             ::= <paramVarName> <optArrayMem> <scalarSuffix>
                              | <stateSingleItem> <scalarSuffix>
                              | <programSingleItem> <scalarSuffix>
                              | <constantVector> <scalarSuffix>
                              | <constantScalar>

    <paramUseVNS>           ::= <paramVarName> <optArrayMem>
                              | <stateSingleItem>
                              | <programSingleItem>
                              | <constantVector>
                              | <constantScalar>

    <paramUseDB>            ::= <stateSingleItem>
                              | <programSingleItem>
                              | <constantVector>
                              | <signedConstantScalar>

    <paramUseDM>            ::= <stateMultipleItem>
                              | <programMultipleItem>
                              | <constantVector>
                              | <signedConstantScalar>

    <stateMultipleItem>     ::= <stateSingleItem>
                              | "state" "." <stateMatrixRows>

    <stateSingleItem>       ::= "state" "." <stateMaterialItem>
                              | "state" "." <stateLightItem>
                              | "state" "." <stateLightModelItem>
                              | "state" "." <stateLightProdItem>
                              | "state" "." <stateFogItem>
                              | "state" "." <stateMatrixRow>
                              | "state" "." <stateTexGenItem>
                              | "state" "." <stateClipPlaneItem>
                              | "state" "." <statePointItem>
                              | "state" "." <stateTexEnvItem>
                              | "state" "." <stateDepthItem>

    <stateMaterialItem>     ::= "material" "." <stateMatProperty>
                              | "material" "." <faceType> "."
                                <stateMatProperty>

    <stateMatProperty>      ::= "ambient"
                              | "diffuse"
                              | "specular"
                              | "emission"
                              | "shininess"

    <stateLightItem>        ::= "light" <arrayMemAbs> "." <stateLightProperty>

    <stateLightProperty>    ::= "ambient"
                              | "diffuse"
                              | "specular"
                              | "position"
                              | "attenuation"
                              | "spot" "." <stateSpotProperty>
                              | "half"

    <stateSpotProperty>     ::= "direction"

    <stateLightModelItem>   ::= "lightmodel" "." <stateLModProperty>

    <stateLModProperty>     ::= "ambient"
                              | "scenecolor"
                              | <faceType> "." "scenecolor"

    <stateLightProdItem>    ::= "lightprod" <arrayMemAbs> "."
                                <stateLProdProperty>
                              | "lightprod" <arrayMemAbs> "." <faceType> "."
                                <stateLProdProperty>

    <stateLProdProperty>    ::= "ambient"
                              | "diffuse"
                              | "specular"

    <stateFogItem>          ::= "fog" "." <stateFogProperty>

    <stateFogProperty>      ::= "color"
                              | "params"

    <stateMatrixRows>       ::= <stateMatrixItem>
                              | <stateMatrixItem> "." <stateMatModifier>
                              | <stateMatrixItem> "." "row" <arrayRange>
                              | <stateMatrixItem> "." <stateMatModifier> "."
                                "row" <arrayRange>

    <stateMatrixRow>        ::= <stateMatrixItem> "." "row" <arrayMemAbs>
                              | <stateMatrixItem> "." <stateMatModifier> "."
                                "row" <arrayMemAbs>

    <stateMatrixItem>       ::= "matrix" "." <stateMatrixName>

    <stateMatModifier>      ::= "inverse"
                              | "transpose"
                              | "invtrans"

    <stateMatrixName>       ::= "modelview" <optArrayMemAbs>
                              | "projection"
                              | "mvp"
                              | "texture" <optArrayMemAbs>
                              | "program" <arrayMemAbs>

    <stateTexGenItem>       ::= "texgen" <optArrayMemAbs> "."
                                <stateTexGenType> "." <stateTexGenCoord>

    <stateTexGenType>       ::= "eye"
                              | "object"

    <stateTexGenCoord>      ::= "s"
                              | "t"
                              | "r"
                              | "q"

    <stateClipPlaneItem>    ::= "clip" <arrayMemAbs> "." "plane"

    <statePointItem>        ::= "point" "." <statePointProperty>

    <statePointProperty>    ::= "size"
                              | "attenuation"

    <stateTexEnvItem>       ::= "texenv" <optArrayMemAbs> "."
                                <stateTexEnvProperty>

    <stateTexEnvProperty>   ::= "color"

    <stateDepthItem>        ::= "depth" "." <stateDepthProperty>

    <stateDepthProperty>    ::= "range"

    <programSingleItem>     ::= <progEnvParam>
                              | <progLocalParam>

    <programMultipleItem>   ::= <progEnvParams>
                              | <progLocalParams>

    <progEnvParams>         ::= "program" "." "env" <arrayMemAbs>
                              | "program" "." "env" <arrayRange>

    <progEnvParam>          ::= "program" "." "env" <arrayMemAbs>

    <progLocalParams>       ::= "program" "." "local" <arrayMemAbs>
                              | "program" "." "local" <arrayRange>

    <progLocalParam>        ::= "program" "." "local" <arrayMemAbs>

    <constantVector>        ::= "{" <constantVectorList> "}"

    <constantVectorList>    ::= <signedConstantScalar>
                              | <signedConstantScalar> ","
                                <signedConstantScalar>
                              | <signedConstantScalar> ","
                                <signedConstantScalar> ","
                                <signedConstantScalar>
                              | <signedConstantScalar> ","
                                <signedConstantScalar> ","
                                <signedConstantScalar> ","
                                <signedConstantScalar>

    <signedConstantScalar>  ::= <optSign> <constantScalar>

    <constantScalar>        ::= <floatConstant>
                              | <intConstant>

    <floatConstant>         ::= <float>

    <intConstant>           ::= <int>

    <tempUseV>              ::= <tempVarName> <swizzleSuffix>

    <tempUseS>              ::= <tempVarName> <scalarSuffix>

    <tempUseVNS>            ::= <tempVarName>

    <tempUseW>              ::= <tempVarName> <optWriteMask>

    <resultUseW>            ::= <resultBasic> <optWriteMask>
                              | <resultVarName> <optWriteMask>

    <resultUseD>            ::= <resultBasic>

    <bufferUseV>            ::= <bufferVarName> <optArrayMem> <swizzleSuffix>

    <bufferUseS>            ::= <bufferVarName> <optArrayMem> <scalarSuffix>

    <bufferUseVNS>          ::= <bufferVarName> <optArrayMem>

    <bufferUseDB>           ::= <bufferBinding> <arrayMemAbs>

    <bufferUseDM>           ::= <bufferBinding> <arrayMemAbs>
                              | <bufferBinding> <arrayRange>
                              | <bufferBinding>

    <bufferBinding>         ::= "program" "." "buffer" <arrayMemAbs>

    <optArraySize>          ::= "[" "]"
                              | "[" <int> "]"

    <optArrayMem>           ::= /* empty */
                              | <arrayMem>

    <arrayMem>              ::= <arrayMemAbs>
                              | <arrayMemRel>

    <optArrayMemAbs>        ::= /* empty */
                              | <arrayMemAbs>

    <arrayMemAbs>           ::= "[" <int> "]"

    <arrayMemRel>           ::= "[" <arrayMemReg> <arrayMemOffset> "]"

    <arrayMemReg>           ::= <addrUseS>

    <arrayMemOffset>        ::= /* empty */
                              | "+" <int>
                              | "-" <int>

    <arrayRange>            ::= "[" <int> ".." <int> "]"

    <addrUseS>              ::= <addrVarName> <scalarSuffix>

    <ccMask>                ::= "(" <ccTest> ")"

    <ccTest>                ::= <ccMaskRule> <swizzleSuffix>

    <ccMaskRule>            ::= "EQ"
                              | "GE"
                              | "GT"
                              | "LE"
                              | "LT"
                              | "NE"
                              | "TR"
                              | "FL"
                              | "EQ0"
                              | "GE0"
                              | "GT0"
                              | "LE0"
                              | "LT0"
                              | "NE0"
                              | "TR0"
                              | "FL0"
                              | "EQ1"
                              | "GE1"
                              | "GT1"
                              | "LE1"
                              | "LT1"
                              | "NE1"
                              | "TR1"
                              | "FL1"
                              | "NAN"
                              | "NAN0"
                              | "NAN1"
                              | "LEG"
                              | "LEG0"
                              | "LEG1"
                              | "CF"
                              | "CF0"
                              | "CF1"
                              | "NCF"
                              | "NCF0"
                              | "NCF1"
                              | "OF"
                              | "OF0"
                              | "OF1"
                              | "NOF"
                              | "NOF0"
                              | "NOF1"
                              | "AB"
                              | "AB0"
                              | "AB1"
                              | "BLE"
                              | "BLE0"
                              | "BLE1"
                              | "SF"
                              | "SF0"
                              | "SF1"
                              | "NSF"
                              | "NSF0"
                              | "NSF1"

    <optWriteMask>          ::= /* empty */
                              | <xyzwMask>
                              | <rgbaMask>

    <xyzwMask>              ::= "." "x"
                              | "." "y"
                              | "." "xy"
                              | "." "z"
                              | "." "xz"
                              | "." "yz"
                              | "." "xyz"
                              | "." "w"
                              | "." "xw"
                              | "." "yw"
                              | "." "xyw"
                              | "." "zw"
                              | "." "xzw"
                              | "." "yzw"
                              | "." "xyzw"

    <rgbaMask>              ::= "." "r"
                              | "." "g"
                              | "." "rg"
                              | "." "b"
                              | "." "rb"
                              | "." "gb"
                              | "." "rgb"
                              | "." "a"
                              | "." "ra"
                              | "." "ga"
                              | "." "rga"
                              | "." "ba"
                              | "." "rba"
                              | "." "gba"
                              | "." "rgba"

    <swizzleSuffix>         ::= /* empty */
                              | "." <component>
                              | "." <xyzwSwizzle>
                              | "." <rgbaSwizzle>

    <extendedSwizzle>       ::= <extSwizComp> "," <extSwizComp> ","
                                <extSwizComp> "," <extSwizComp>

    <extSwizComp>           ::= <optSign> <xyzwExtSwizSel>
                              | <optSign> <rgbaExtSwizSel>

    <xyzwExtSwizSel>        ::= "0"
                              | "1"
                              | <xyzwComponent>

    <rgbaExtSwizSel>        ::= <rgbaComponent>

    <scalarSuffix>          ::= "." <component>

    <component>             ::= <xyzwComponent>
                              | <rgbaComponent>

    <xyzwComponent>         ::= "x"
                              | "y"
                              | "z"
                              | "w"

    <rgbaComponent>         ::= "r"
                              | "g"
                              | "b"
                              | "a"

    <optSign>               ::= /* empty */
                              | "-"
                              | "+"

    <faceType>              ::= "front"
                              | "back"

    <colorType>             ::= "primary"
                              | "secondary"

    <instLabel>             ::= <identifier>

    <instTarget>            ::= <identifier>

    <establishedName>       ::= <identifier>

    <establishName>         ::= <identifier>


    The <int> rule matches an integer constant. The integer consists of a
    sequence of one or more digits ("0" through "9"), or a sequence in
    hexadecimal form beginning with "0x" followed by a sequence of one or more
    hexadecimal digits ("0" through "9", "a" through "f", "A" through "F").

    The <float> rule matches a floating-point constant consisting of an
    integer part, a decimal point, a fraction part, an "e" or "E", and an
    optionally signed integer exponent. The integer and fraction parts both
    consist of a sequence of one or more digits ("0" through "9"). Either the
    integer part or the fraction part (not both) may be missing; either the
    decimal point or the "e" (or "E") and the exponent (not both) may be
    missing. Most grammar rules that allow floating-point values also allow
    integers matching the <int> rule.

    The <identifier> rule matches a sequence of one or more letters ("A"
    through "Z", "a" through "z"), digits ("0" through "9"), underscores ("_"),
    or dollar signs ("$"); the first character must not be a number.
    Upper and lower case letters are considered different (names are
    case-sensitive). The following strings are reserved keywords and may not
    be used as identifiers: "fragment" (for fragment programs only), "vertex"
    (for vertex and geometry programs), "primitive" (for fragment and geometry
    programs), "program", "result", "state", and "texture".

    The <tempVarName>, <paramVarName>, <attribVarName>, <resultVarName>, and
    <bufferVarName> rules match identifiers that have been previously
    established as names of temporary, program parameter, attribute, result,
    and program parameter buffer variables, respectively.

    The <xyzwSwizzle> and <rgbaSwizzle> rules match any four-character string
    consisting only of the characters "x", "y", "z", and "w" (<xyzwSwizzle>)
    or "r", "g", "b", and "a" (<rgbaSwizzle>).

    The error INVALID_OPERATION is generated if a program fails to load
    because it is not syntactically correct or because it violates one of the
    semantic restrictions described in the following sections.

    A successfully loaded program is parsed into a sequence of instructions.
    Each instruction is identified by its tokenized name. The operation of
    these instructions when executed is defined in section 2.X.4. A
    successfully loaded program string replaces the program string previously
    loaded into the specified program object. If the OUT_OF_MEMORY error is
    generated by ProgramStringARB, no change is made to the previous contents
    of the current program object.


    Section 2.X.3, Program Variables

    Programs may operate on a number of different variables during their
    execution. The following sections define the different classes of
    variables that can be declared and used by a program.

    Some variable classes require variable bindings. Variable classes with
    bindings refer to state that is either generated or consumed outside the
    program.
    Examples of variable bindings include a vertex's normal, the position of
    a vertex computed by a vertex program, an interpolated texture coordinate,
    and the diffuse color of light 1. Variables that are used only during
    program execution do not have bindings.

    Variables may be declared explicitly according to the <namingStatement>
    grammar rule. Explicit variable declarations allow a program to establish
    a variable name that can be used to refer to a specified resource in
    subsequent instructions. Variables may be declared anywhere in the
    program string, but must be declared prior to use. A program will fail to
    load if it declares the same variable name more than once, or if it refers
    to a variable name that has not been previously declared in the program
    string.

    Variables may also be declared implicitly, simply by using a variable
    binding as an operand in a program instruction. Such uses are considered
    to automatically create a nameless variable using the specified binding.
    Only variables from classes with bindings can be declared implicitly.


    Section 2.X.3.1, Program Variable Types

    Explicit variable declarations may include one or more modifiers that
    specify additional information about the variable, such as the size and
    data type of the components of the variable. Variable modifiers are
    specified according to the <varModifier> grammar rule.

    By default, variables are considered typeless. They can be used in
    instructions that read or write the variable as floating-point values,
    signed integers, or unsigned integers. If a variable is written using one
    data type but then read using a different one, the results of the
    operation are undefined.
    Variables with bindings are considered to be read or written when their
    values are produced or consumed; the data type used by the GL is
    specified in the description of each binding.

    Explicitly declared variables may optionally have one data type modifier,
    which can be used to detect data type mismatch errors. Type modifiers of
    "INT", "UINT", and "FLOAT" indicate that the components of the variable
    are stored as signed integers, unsigned integers, or floating-point
    values, respectively. A program will fail to load if it attempts to read
    or write a variable using a data type other than the one indicated by the
    data type modifier. Variables without a data type modifier can be read or
    written using any data type.

    Explicitly declared variables may optionally have one storage size
    modifier. Variables declared as "SHORT" will be represented using at least
    16 bits per component. "SHORT" floating-point values will have at least 5
    bits of exponent and 10 bits of mantissa. Variables declared as "LONG"
    will be represented with at least 32 bits per component. "LONG"
    floating-point values will have at least 8 bits of exponent and 23 bits of
    mantissa. If no size modifier is provided, the GL will automatically
    select component sizes. Implementations are not required to support more
    than one component size, so "SHORT", "LONG", and the default could all
    refer to the same component size. The "LONG" modifier is supported only
    for declarations of temporary variables ("TEMP"). The "SHORT" modifier is
    supported only for declarations of temporary variables and result
    variables ("OUTPUT").

    Each variable declaration can include at most one data type and one
    storage size modifier. A program will fail to load if it specifies
    multiple data type or multiple storage size modifiers in a single variable
    declaration.
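
    The modifier rules above can be checked mechanically. The following C
    sketch is illustrative only: DeclKind and modifiers_valid() are
    hypothetical names, not part of any GL API, and fragment-only
    interpolation modifiers are not modeled.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical declaration kinds: only the distinctions the size rules need. */
typedef enum { DECL_TEMP, DECL_PARAM, DECL_ATTRIB, DECL_OUTPUT } DeclKind;

/* Returns true if the modifier list is acceptable under Section 2.X.3.1:
 * at most one data type modifier ("INT", "UINT", "FLOAT"), at most one
 * storage size modifier, "LONG" only on TEMP, "SHORT" only on TEMP or
 * OUTPUT. Any other modifier string is rejected by this sketch. */
bool modifiers_valid(DeclKind kind, const char *mods[], int count)
{
    int type_mods = 0;
    int size_mods = 0;

    for (int i = 0; i < count; i++) {
        const char *m = mods[i];
        if (strcmp(m, "INT") == 0 || strcmp(m, "UINT") == 0 ||
            strcmp(m, "FLOAT") == 0) {
            type_mods++;
        } else if (strcmp(m, "LONG") == 0) {
            if (kind != DECL_TEMP)
                return false;           /* LONG is TEMP-only */
            size_mods++;
        } else if (strcmp(m, "SHORT") == 0) {
            if (kind != DECL_TEMP && kind != DECL_OUTPUT)
                return false;           /* SHORT is TEMP/OUTPUT-only */
            size_mods++;
        } else {
            return false;               /* not modeled in this sketch */
        }
    }
    return type_mods <= 1 && size_mods <= 1;
}
```

    Under this sketch, a TEMP declaration carrying "LONG" and "FLOAT" is
    accepted, while the same two modifiers on an OUTPUT declaration are
    rejected because "LONG" is limited to temporaries.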

    (NOTE: Fragment programs also support the modifiers "FLAT", "CENTROID",
    and "NOPERSPECTIVE", which control how per-fragment attribute values are
    produced. These modifiers are described in detail in the
    NV_fragment_program4 specification.)

    Explicitly declared variables of all types may be declared as arrays. An
    array variable has one or more members, numbered 0 through <n>-1, where
    <n> is the number of entries in the array. The total number of entries in
    the array can be declared using the <optArraySize> grammar rule. For
    variable classes without bindings, an array size must be specified in the
    program, and must be a positive integer. For variable classes with
    bindings, a declared size is optional, and is taken from the number of
    bindings assigned in the declaration if omitted. A program will fail to
    load if the declared size of an array variable does not match the number
    of assigned bindings.

    When a variable is declared as an array, instructions that use the
    variable must specify an array member to access according to the
    <arrayMem> grammar rule. A program will fail to load if it contains an
    instruction that accesses an array variable without specifying an array
    member or an instruction that specifies an array member for a non-array
    variable.


    Section 2.X.3.2, Program Attribute Variables

    Program attribute variables represent per-vertex or per-fragment inputs to
    the program. All attribute variables have associated bindings, and are
    read-only during program execution. Attribute variables may be declared
    explicitly via the <ATTRIB_statement> grammar rule, or implicitly by using
    an attribute binding in an instruction.

    The set of available attribute bindings depends on the program type, and
    is enumerated in the specifications for each program type.
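
    Because attribute variables always have bindings, an attribute array may
    omit its size under the array rules of Section 2.X.3.1. Those sizing
    rules are condensed in the C sketch below; resolve_array_size() and its
    convention of passing a negative declared size to mean "no size given"
    are assumptions made for illustration.

```c
#include <stdbool.h>

/* Resolve the size of an array declaration under Section 2.X.3.1.
 * declared_size < 0 means no size was written (empty <optArraySize>);
 * num_bindings is the number of bindings assigned in the declaration.
 * Returns the array size, or -1 if the program would fail to load. */
int resolve_array_size(bool class_has_bindings, int declared_size,
                       int num_bindings)
{
    if (!class_has_bindings) {
        /* e.g. TEMP arrays: a positive size must be written */
        return declared_size > 0 ? declared_size : -1;
    }
    if (declared_size < 0) {
        /* e.g. ATTRIB arrays: size comes from the binding list */
        return num_bindings;
    }
    /* an explicit size must match the number of bindings */
    return declared_size == num_bindings ? declared_size : -1;
}
```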

    The set of bindings allowed for attribute array variables is limited to
    attribute state grouped in arrays (e.g., texture coordinates, generic
    vertex attributes). Additionally, all bindings assigned to the array must
    be of the same binding type and must increase consecutively. Examples of
    valid and invalid binding lists include:

      vertex.attrib[1], vertex.attrib[2]     # valid, 2-entry array
      vertex.texcoord[0..3]                  # valid, 4-entry array
      vertex.attrib[1], vertex.attrib[3]     # invalid, skipped attrib 2
      vertex.attrib[2], vertex.attrib[1]     # invalid, wrong order
      vertex.attrib[1], vertex.texcoord[2]   # invalid, different types

    Additionally, attribute bindings may be used in no more than one array
    variable accessed with relative addressing.

    Implementations may have a limit on the total number of attribute binding
    components used by each program target (MAX_PROGRAM_ATTRIB_COMPONENTS_NV).
    Programs that use more attribute binding components than this limit will
    fail to load. The method of counting used attribute binding components is
    implementation-dependent, but must satisfy the following properties:

      * If an attribute binding is not referenced in a program, or is
        referenced only in declarations of attribute variables that are not
        used, none of its components are counted.

      * An attribute binding component may be counted as used only if there
        exists an instruction operand where

        - the component is enabled for read by the swizzle pattern (Section
          2.X.4.2), and

        - the attribute binding is either

          - referenced directly by the operand,

          - bound to a declared variable referenced by the operand, or

          - bound to a declared array variable where another binding in
            the array satisfies one of the two previous conditions.

        Implementations are not required to optimize out unused elements of an
        attribute array or components that are used in only some elements of
        an array. The last of these rules is intended to cover the case where
        the same attribute binding is used in multiple variables.

        For example, an operand whose swizzle pattern selects only the x
        component may result in the x component of an attribute binding being
        counted, but may never result in the counting of the y, z, or w
        components of any attribute binding.

      * Implementations are not required to determine that components read by
        an instruction are actually unused due to:

        - instruction write masks (for example, a component-wise ADD
          operation that only writes the "x" component doesn't have to read
          the "y", "z", and "w" components of its operands) or

        - any other properties of the instruction (for example, the DP3
          instruction, which computes a 3-component dot product, doesn't have
          to read the "w" component of its operands).


    Section 2.X.3.3, Program Parameters

    Program parameter variables are used as constants during program
    execution. All program parameter variables have associated bindings and
    are read-only during program execution. Program parameters retain their
    values across program invocations, although their values may change
    between invocations due to GL state changes. Program parameter variables
    may be declared explicitly via the <PARAM_statement> grammar rule, or
    implicitly by using a parameter binding in an instruction. Except where
    otherwise specified, program parameter bindings always specify
    floating-point values.

    When declaring program parameter array variables, all bindings are
    supported and can be assigned to array members in any order.
    The only restriction is that no parameter binding may be used more than
    once in array variables accessed using relative addressing. A program
    will fail to load if any program parameter binding is used more than once
    in a single array accessed using relative addressing or used at least once
    in two or more arrays accessed using relative addressing.


    Constant Bindings

    If a program parameter binding matches the <constantScalar> or
    <signedConstantScalar> grammar rules, the corresponding program parameter
    variable is bound to the vector (X,X,X,X), where X is the value of the
    specified constant.

    If a program parameter binding matches <constantVector>, the corresponding
    program parameter variable is bound to the vector (X,Y,Z,W), where X, Y,
    Z, and W are the values corresponding to the first, second, third, and
    fourth match of <signedConstantScalar>. If fewer than four constants are
    specified, any of Y, Z, and W that are not specified take the values 0,
    0, and 1, respectively.

    Constant bindings can be interpreted as having signed integer, unsigned
    integer, or floating-point values, depending on how they are used in the
    program text. For constants in variable declarations, the components of
    the constant are interpreted according to the variable's component data
    type modifier. If no data type modifier is specified in a declaration,
    constants are interpreted as floating-point values. For constant bindings
    used directly in an instruction, the components of the constant are
    interpreted according to the required data type of the operand. A program
    will fail to load if it specifies a floating-point constant value
    (matching the <floatConstant> grammar rule) that should be interpreted as
    a signed or unsigned integer, or a negative integer constant value that
    should be interpreted as an unsigned integer.
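
    The expansion and typing rules for constant bindings can be sketched in C
    as follows. The names expand_constant(), constant_type_ok(), and CompType
    are hypothetical; the sketch assumes the constant text has already been
    tokenized and only models component replication, default filling, and
    the two load-failure conditions.

```c
#include <stdbool.h>

typedef enum { TYPE_FLOAT, TYPE_INT, TYPE_UINT } CompType;

/* Expand the count (1 to 4) constants in vals into a 4-component vector.
 * A scalar binding (<constantScalar>, <signedConstantScalar>) replicates
 * its value into all four components; a vector binding (<constantVector>)
 * fills any missing Y, Z, and W components with 0, 0, and 1.
 * Values are held as float here purely for simplicity. */
void expand_constant(const float *vals, int count, bool is_vector,
                     float out[4])
{
    static const float defaults[4] = { 0.0f, 0.0f, 0.0f, 1.0f };

    for (int i = 0; i < 4; i++) {
        if (!is_vector)
            out[i] = vals[0];
        else
            out[i] = (i < count) ? vals[i] : defaults[i];
    }
}

/* Returns false when the program would fail to load: a floating-point
 * literal (<floatConstant>) used where an integer is required, or a
 * negative integer literal used where an unsigned integer is required. */
bool constant_type_ok(bool is_float_literal, bool is_negative, CompType want)
{
    if (is_float_literal && want != TYPE_FLOAT)
        return false;
    if (is_negative && want == TYPE_UINT)
        return false;
    return true;
}
```

    For example, a vector constant written with two components, {0.5, 0.25},
    expands to (0.5, 0.25, 0, 1) under this sketch, while a scalar constant
    2.0 expands to (2, 2, 2, 2).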

    If the value used to specify a floating-point constant can not be exactly
    represented, the nearest floating-point value will be used. If the value
    used to specify an integer constant is too large to be represented, the
    program will fail to load.


    Program Environment/Local Parameter Bindings

      Binding                    Components  Underlying State
      -------------------------  ----------  -------------------------------
      program.env[a]             (x,y,z,w)   program environment parameter a
      program.local[a]           (x,y,z,w)   program local parameter a
      program.env[a..b]          (x,y,z,w)   program environment parameters
                                               a through b
      program.local[a..b]        (x,y,z,w)   program local parameters
                                               a through b

      Table X.1: Program Environment/Local Parameter Bindings. <a> and <b>
      indicate parameter numbers, where <a> must be less than or equal to <b>.

    If a program parameter binding matches "program.env[a]" or
    "program.local[a]", the four components of the program parameter variable
    are filled with the four components of program environment parameter <a>
    or program local parameter <a>, respectively.

    Additionally, for program parameter array bindings, "program.env[a..b]"
    and "program.local[a..b]" are equivalent to specifying program environment
    or local parameters <a> through <b> in order, respectively. A program
    using any of these bindings will fail to load if <a> is greater than <b>.

    Program environment and local parameters are typeless, and may be
    specified as signed integer, unsigned integer, or floating-point
    variables. If a program environment parameter is read using a data type
    other than the one used to specify it, an undefined value is returned.
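
    The range form of these bindings is simply an ordered list of parameter
    numbers. A minimal C sketch follows; expand_param_range() and its
    caller-supplied capacity limit are hypothetical, standing in for whatever
    parameter count an implementation supports.

```c
/* Expand "program.env[a..b]" (or "program.local[a..b]") into the list of
 * parameter numbers it denotes, in order. Returns the number of entries
 * written to out, or -1 if a > b (the program fails to load) or if the
 * range would exceed max_out (a limit assumed only by this sketch). */
int expand_param_range(int a, int b, int out[], int max_out)
{
    if (a > b)
        return -1;

    int n = b - a + 1;
    if (n > max_out)
        return -1;

    for (int i = 0; i < n; i++)
        out[i] = a + i;
    return n;
}
```

    With this sketch, "program.env[2..5]" yields the four parameter numbers
    2, 3, 4, and 5, and a reversed range such as "program.env[5..2]" is
    reported as a load failure.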


    Material Property Bindings

      Binding                         Components  Underlying State
      ------------------------------  ----------  -----------------------------
      state.material.ambient          (r,g,b,a)   front ambient material color
      state.material.diffuse          (r,g,b,a)   front diffuse material color
      state.material.specular         (r,g,b,a)   front specular material color
      state.material.emission         (r,g,b,a)   front emissive material color
      state.material.shininess        (s,0,0,1)   front material shininess
      state.material.front.ambient    (r,g,b,a)   front ambient material color
      state.material.front.diffuse    (r,g,b,a)   front diffuse material color
      state.material.front.specular   (r,g,b,a)   front specular material color
      state.material.front.emission   (r,g,b,a)   front emissive material color
      state.material.front.shininess  (s,0,0,1)   front material shininess
      state.material.back.ambient     (r,g,b,a)   back ambient material color
      state.material.back.diffuse     (r,g,b,a)   back diffuse material color
      state.material.back.specular    (r,g,b,a)   back specular material color
      state.material.back.emission    (r,g,b,a)   back emissive material color
      state.material.back.shininess   (s,0,0,1)   back material shininess

      Table X.3: Material Property Bindings. If a material face is not
      specified in the binding, the front property is used.

    If a program parameter binding matches any of the material properties
    listed in Table X.3, the program parameter variable is filled according to
    the table. For ambient, diffuse, specular, or emissive colors, the "x",
    "y", "z", and "w" components are filled with the "r", "g", "b", and "a"
    components, respectively, of the corresponding material color. For
    material shininess, the "x" component is filled with the material's
    specular exponent, and the "y", "z", and "w" components are filled with
    the floating-point constants 0, 0, and 1, respectively.
    Bindings containing ".back" refer to the back material; all other
    bindings refer to the front material.

    Material properties can be changed inside a Begin/End pair, either
    directly by calling Material, or indirectly through color material.
    However, such property changes are not guaranteed to update program
    parameter bindings until the following End command. Program parameter
    variables bound to material properties changed inside a Begin/End pair are
    undefined until the following End command.


    Light Property Bindings

      Binding                            Components  Underlying State
      ---------------------------------  ----------  --------------------------
      state.light[n].ambient             (r,g,b,a)   light n ambient color
      state.light[n].diffuse             (r,g,b,a)   light n diffuse color
      state.light[n].specular            (r,g,b,a)   light n specular color
      state.light[n].position            (x,y,z,w)   light n position
      state.light[n].attenuation         (a,b,c,e)   light n attenuation
                                                     constants and spot light
                                                     exponent
      state.light[n].spot.direction      (x,y,z,c)   light n spot direction and
                                                     cutoff angle cosine
      state.light[n].half                (x,y,z,1)   light n infinite
                                                     half-angle
      state.lightmodel.ambient           (r,g,b,a)   light model ambient color
      state.lightmodel.scenecolor        (r,g,b,a)   light model front scene
                                                     color
      state.lightmodel.front.scenecolor  (r,g,b,a)   light model front scene
                                                     color
      state.lightmodel.back.scenecolor   (r,g,b,a)   light model back scene
                                                     color
      state.lightprod[n].ambient         (r,g,b,a)   light n / front material
                                                     ambient color product
      state.lightprod[n].diffuse         (r,g,b,a)   light n / front material
                                                     diffuse color product
      state.lightprod[n].specular        (r,g,b,a)   light n / front material
                                                     specular color product
      state.lightprod[n].front.ambient   (r,g,b,a)   light n / front material
                                                     ambient color product
      state.lightprod[n].front.diffuse   (r,g,b,a)   light n / front material
                                                     diffuse color product
      state.lightprod[n].front.specular  (r,g,b,a)   light n / front material
                                                     specular color product
      state.lightprod[n].back.ambient    (r,g,b,a)   light n / back material
                                                     ambient color product
      state.lightprod[n].back.diffuse    (r,g,b,a)   light n / back material
                                                     diffuse color product
      state.lightprod[n].back.specular   (r,g,b,a)   light n / back material
                                                     specular color product

      Table X.4: Light Property Bindings. <n> indicates a light number.

    If a program parameter binding matches "state.light[n].ambient",
    "state.light[n].diffuse", or "state.light[n].specular", the "x", "y", "z",
    and "w" components of the program parameter variable are filled with the
    "r", "g", "b", and "a" components, respectively, of the corresponding
    light color.

    If a program parameter binding matches "state.light[n].position", the "x",
    "y", "z", and "w" components of the program parameter variable are filled
    with the "x", "y", "z", and "w" components, respectively, of the light
    position.

    If a program parameter binding matches "state.light[n].attenuation", the
    "x", "y", and "z" components of the program parameter variable are filled
    with the constant, linear, and quadratic attenuation parameters of the
    specified light, respectively (section 2.13.1). The "w" component of the
    program parameter variable is filled with the spot light exponent of the
    specified light.

    If a program parameter binding matches "state.light[n].spot.direction",
    the "x", "y", and "z" components of the program parameter variable are
    filled with the "x", "y", and "z" components of the spot light direction
    of the specified light, respectively (section 2.13.1). The "w" component
    of the program parameter variable is filled with the cosine of the spot
    light cutoff angle of the specified light.

    If a program parameter binding matches "state.light[n].half", the "x",
    "y", and "z" components of the program parameter variable are filled with
    the x, y, and z components, respectively, of the normalized infinite
    half-angle vector

      h_inf = || P + (0, 0, 1) ||.

    The "w" component is filled with 1.0. In the computation of h_inf, P
    consists of the x, y, and z coordinates of the normalized vector from the
    eye position P_e to the eye-space light position P_pli (section 2.13.1).
    h_inf is defined to correspond to the normalized half-angle vector when
    using an infinite light (w coordinate of the position is zero) and an
    infinite viewer (v_bs is FALSE). For local lights or a local viewer,
    h_inf is well-defined but does not match the normalized half-angle vector,
    which will vary depending on the vertex position.

    If a program parameter binding matches "state.lightmodel.ambient", the
    "x", "y", "z", and "w" components of the program parameter variable are
    filled with the "r", "g", "b", and "a" components of the light model
    ambient color, respectively.

    If a program parameter binding matches "state.lightmodel.scenecolor" or
    "state.lightmodel.front.scenecolor", the "x", "y", and "z" components of
    the program parameter variable are filled with the "r", "g", and "b"
    components, respectively, of the "front scene color"

      c_scene = a_cs * a_cm + e_cm,

    where a_cs is the light model ambient color, a_cm is the front ambient
    material color, and e_cm is the front emissive material color. The "w"
    component of the program parameter variable is filled with the alpha
    component of the front diffuse material color. If a program parameter
    binding matches "state.lightmodel.back.scenecolor", a similar back scene
    color, computed using back-facing material properties, is used.
    The front and back scene colors match the values that would be assigned
    to vertices using conventional lighting if all lights were disabled.

    If a program parameter binding matches anything beginning with
    "state.lightprod[n]", the "x", "y", and "z" components of the program
    parameter variable are filled with the "r", "g", and "b" components,
    respectively, of the corresponding light product. The three light product
    components are the products of the corresponding color components of the
    specified material property and the light color of the specified light
    (see Table X.4). The "w" component of the program parameter variable is
    filled with the alpha component of the specified material property.

    Light products depend on material properties, which can be changed inside
    a Begin/End pair. Such property changes are not guaranteed to take effect
    until the following End command. Program parameter variables bound to
    light products whose corresponding material property changes inside a
    Begin/End pair are undefined until the following End command.


    Texture Coordinate Generation Property Bindings

      Binding                   Components  Underlying State
      ------------------------  ----------  -----------------------------
      state.texgen[n].eye.s     (a,b,c,d)   TexGen eye linear plane
                                            coefficients, s coord, unit n
      state.texgen[n].eye.t     (a,b,c,d)   TexGen eye linear plane
                                            coefficients, t coord, unit n
      state.texgen[n].eye.r     (a,b,c,d)   TexGen eye linear plane
                                            coefficients, r coord, unit n
      state.texgen[n].eye.q     (a,b,c,d)   TexGen eye linear plane
                                            coefficients, q coord, unit n
      state.texgen[n].object.s  (a,b,c,d)   TexGen object linear plane
                                            coefficients, s coord, unit n
      state.texgen[n].object.t  (a,b,c,d)   TexGen object linear plane
                                            coefficients, t coord, unit n
      state.texgen[n].object.r  (a,b,c,d)   TexGen object linear plane
                                            coefficients, r coord, unit n
      state.texgen[n].object.q  (a,b,c,d)   TexGen object linear plane
                                            coefficients, q coord, unit n

      Table X.5: Texture Coordinate Generation Property Bindings. "[n]" is
      optional -- texture unit <n> is used if specified; texture unit 0 is
      used otherwise.

    If a program parameter binding matches a set of TexGen plane coefficients,
    the "x", "y", "z", and "w" components of the program parameter variable
    are filled with the coefficients p1, p2, p3, and p4, respectively, for
    object linear coefficients, and the coefficients p1', p2', p3', and p4',
    respectively, for eye linear coefficients (section 2.10.4).
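
    A program that binds TexGen planes can reproduce object linear
    generation itself: each generated coordinate is the dot product of the
    plane coefficients with the object-space position, which a DP4
    instruction computes. A C sketch of that arithmetic, using the
    hypothetical name texgen_object_linear():

```c
/* Object linear texture coordinate generation for one coordinate:
 * g = p1*xo + p2*yo + p3*zo + p4*wo, where plane holds the values bound
 * by state.texgen[n].object.<coord> and obj is the object-space vertex
 * position. This is the 4-component dot product a DP4 computes. */
float texgen_object_linear(const float plane[4], const float obj[4])
{
    return plane[0] * obj[0] + plane[1] * obj[1] +
           plane[2] * obj[2] + plane[3] * obj[3];
}
```

    The same dot product, taken with the eye linear coefficients p1' through
    p4' and an eye-space position, gives eye linear generation.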


    Fog Property Bindings

      Binding                   Components  Underlying State
      ------------------------  ----------  -----------------------------
      state.fog.color           (r,g,b,a)   RGBA fog color (section 3.10)
      state.fog.params          (d,s,e,r)   fog density, linear start
                                            and end, and 1/(end-start)
                                            (section 3.10)

      Table X.6: Fog Property Bindings

    If a program parameter binding matches "state.fog.color", the "x", "y",
    "z", and "w" components of the program parameter variable are filled with
    the "r", "g", "b", and "a" components, respectively, of the fog color
    (section 3.10).

    If a program parameter binding matches "state.fog.params", the "x", "y",
    and "z" components of the program parameter variable are filled with the
    fog density, linear fog start, and linear fog end parameters (section
    3.10), respectively. The "w" component is filled with 1/(end-start),
    where end and start are the linear fog end and start parameters,
    respectively.


    Clip Plane Property Bindings

      Binding                   Components  Underlying State
      ------------------------  ----------  -----------------------------
      state.clip[n].plane       (a,b,c,d)   clip plane n coefficients

      Table X.7: Clip Plane Property Bindings. <n> specifies the clip plane
      number, and is required.

    If a program parameter binding matches "state.clip[n].plane", the "x",
    "y", "z", and "w" components of the program parameter variable are filled
    with the coefficients p1', p2', p3', and p4', respectively, of clip plane
    <n> (section 2.11).
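
    The fog and clip plane bindings are typically consumed by simple
    arithmetic in a program. The C sketch below shows how the fog parameters
    are packed, how the precomputed reciprocal lets linear fog be evaluated
    without a division, and how a clip plane is tested against an eye-space
    position. FogState and the function names are hypothetical, and the
    clamp to [0, 1] mirrors fixed-function linear fog.

```c
/* Hypothetical subset of GL fog state: FOG_DENSITY, FOG_START, FOG_END. */
typedef struct {
    float density, start, end;
} FogState;

/* state.fog.params -> (density, start, end, 1/(end - start)).
 * When end == start the reciprocal is not finite under IEEE arithmetic;
 * this sketch does not guard against that case. */
void fog_params(const FogState *f, float out[4])
{
    out[0] = f->density;
    out[1] = f->start;
    out[2] = f->end;
    out[3] = 1.0f / (f->end - f->start);
}

/* Linear fog factor (end - c) / (end - start), clamped to [0, 1], using
 * the precomputed reciprocal in params[3]. c is the fog coordinate. */
float linear_fog_factor(const float params[4], float c)
{
    float f = (params[2] - c) * params[3];
    if (f < 0.0f)
        f = 0.0f;
    if (f > 1.0f)
        f = 1.0f;
    return f;
}

/* state.clip[n].plane -> (p1', p2', p3', p4'). An eye-space point is
 * inside the clip half-space when this dot product is >= 0. */
float clip_plane_distance(const float plane[4], const float eye_pos[4])
{
    return plane[0] * eye_pos[0] + plane[1] * eye_pos[1] +
           plane[2] * eye_pos[2] + plane[3] * eye_pos[3];
}
```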


    Point Property Bindings

      Binding                   Components  Underlying State
      ------------------------  ----------  -----------------------------
      state.point.size          (s,n,x,f)   point size, min and max size
                                            clamps, and fade threshold
                                            (section 3.3)
      state.point.attenuation   (a,b,c,1)   point size attenuation
                                            constants

      Table X.8: Point Property Bindings

    If a program parameter binding matches "state.point.size", the "x", "y",
    "z", and "w" components of the program parameter variable are filled with
    the point size, minimum point size, maximum point size, and fade
    threshold, respectively (section 3.3).

    If a program parameter binding matches "state.point.attenuation", the "x",
    "y", and "z" components of the program parameter variable are filled with
    the constant, linear, and quadratic point size attenuation parameters (a,
    b, and c), respectively (section 3.3). The "w" component is filled with
    1.0.


    Texture Environment Property Bindings

      Binding                   Components  Underlying State
      ------------------------  ----------  -----------------------------
      state.texenv[n].color     (r,g,b,a)   texture environment n color

      Table X.9: Texture Environment Property Bindings. "[n]" is optional --
      texture unit <n> is used if specified; texture unit 0 is used otherwise.

    If a program parameter binding matches "state.texenv[n].color", the "x",
    "y", "z", and "w" components of the program parameter variable are filled
    with the "r", "g", "b", and "a" components, respectively, of the
    corresponding texture environment color. Note that only "legacy" texture
    units, as queried by MAX_TEXTURE_UNITS, include texture environment state.
    Texture image units and texture coordinate sets do not have associated
    texture environment state.


    Depth Property Bindings

      Binding                   Components  Underlying State
      ------------------------  ----------  -----------------------------
      state.depth.range         (n,f,d,1)   depth range near, far, and
                                            (far-near) (section 2.10.1)

      Table X.10: Depth Property Bindings

    If a program parameter binding matches "state.depth.range", the "x" and
    "y" components of the program parameter variable are filled with the
    mappings of near and far clipping planes to window coordinates,
    respectively. The "z" component is filled with the difference of the
    mappings of near and far clipping planes, far minus near. The "w"
    component is filled with 1.0.


    Matrix Property Bindings

      Binding                               Underlying State
      ------------------------------------  ---------------------------
      * state.matrix.modelview[n]           modelview matrix n
        state.matrix.projection             projection matrix
        state.matrix.mvp                    modelview-projection matrix
      * state.matrix.texture[n]             texture matrix n
        state.matrix.program[n]             program matrix n

      Table X.11: Base Matrix Property Bindings. The "[n]" syntax indicates
      a specific matrix number. For modelview and texture matrices, a matrix
      number is optional, and matrix zero will be used if the matrix number is
      omitted. These base bindings may further be modified by an
      inverse/transpose selector and a row selector.

    If the beginning of a program parameter binding matches any of the matrix
    binding names listed in Table X.11, the binding corresponds to a 4x4
    matrix. If the parameter binding is followed by ".inverse", ".transpose",
    or ".invtrans" (<stateMatModifier> grammar rule), the inverse, transpose,
    or transpose of the inverse, respectively, of the matrix specified in
    Table X.11 is selected. Otherwise, the matrix specified in Table X.11 is
    selected.
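
    The matrix bindings in this subsection select a 4x4 matrix, optionally
    transform it, and then bind some or all of its rows. The C sketch below
    models the transpose selector, the MVP product P * M0, and row binding.
    It assumes a hypothetical Mat4 struct stored row-major as m[row][col]
    (the GL's own matrix arrays are column-major) and rejects row numbers
    outside 0 through 3; the inverse selectors are omitted.

```c
#include <string.h>

/* Hypothetical matrix type, stored row-major as m[row][col]. */
typedef struct {
    float m[4][4];
} Mat4;

/* ".transpose" selector. */
Mat4 mat4_transpose(const Mat4 *a)
{
    Mat4 r;
    for (int i = 0; i < 4; i++)
        for (int j = 0; j < 4; j++)
            r.m[i][j] = a->m[j][i];
    return r;
}

/* Standard matrix product a * b. */
Mat4 mat4_mul(const Mat4 *a, const Mat4 *b)
{
    Mat4 r;
    for (int i = 0; i < 4; i++)
        for (int j = 0; j < 4; j++) {
            float s = 0.0f;
            for (int k = 0; k < 4; k++)
                s += a->m[i][k] * b->m[k][j];
            r.m[i][j] = s;
        }
    return r;
}

/* "state.matrix.mvp": MVP = P * M0. */
Mat4 mvp_matrix(const Mat4 *projection, const Mat4 *modelview0)
{
    return mat4_mul(projection, modelview0);
}

/* ".row[a..b]": copy rows a through b, in order, into out[]. Omitting the
 * row selector corresponds to a = 0, b = 3. Returns the number of rows
 * bound, or -1 when a > b (load failure) or a row is outside 0..3
 * (rejected here as an assumption of the sketch). */
int bind_matrix_rows(const Mat4 *mat, int a, int b, float out[][4])
{
    if (a > b || a < 0 || b > 3)
        return -1;
    for (int r = a; r <= b; r++)
        memcpy(out[r - a], mat->m[r], sizeof mat->m[r]);
    return b - a + 1;
}
```

    In this model, "state.matrix.projection.transpose.row[3]" corresponds to
    mat4_transpose() followed by bind_matrix_rows() with a = b = 3, which
    yields column 3 of the projection matrix.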

    If the specified matrix is poorly-conditioned (singular or nearly so), its
    inverse matrix is undefined. The binding name "state.matrix.mvp" refers
    to the product of modelview matrix zero and the projection matrix, defined
    as

      MVP = P * M0,

    where P is the projection matrix and M0 is modelview matrix zero.

    If the selected matrix is followed by ".row[<a>]" (matching the
    <stateMatrixRow> grammar rule), the "x", "y", "z", and "w" components of
    the program parameter variable are filled with the four entries of row <a>
    of the selected matrix. In the example,

      PARAM m0 = state.matrix.modelview[1].row[0];
      PARAM m1 = state.matrix.projection.transpose.row[3];

    the variable "m0" is set to the first row (row 0) of modelview matrix 1
    and "m1" is set to the last row (row 3) of the transpose of the projection
    matrix.

    For program parameter array bindings, multiple rows of the selected matrix
    can be bound via the <stateMatrixRows> grammar rule. If the selected
    matrix binding is followed by ".row[<a>..<b>]", the result is equivalent
    to specifying matrix rows <a> through <b>, in order. A program will fail
    to load if <a> is greater than <b>. If no row selection is specified
    (<optMatrixRows> matches ""), matrix rows 0 through 3 are bound in order.
    In the example,

      PARAM m2[] = { state.matrix.program[0].row[1..2] };
      PARAM m3[] = { state.matrix.program[0].transpose };

    the array "m2" has two entries, containing rows 1 and 2 of program matrix
    zero, and "m3" has four entries, containing all four rows of the transpose
    of program matrix zero.


    Section 2.X.3.4, Program Temporaries

    Program temporary variables are used to hold temporary results during
    program execution. Temporaries do not persist between program
    invocations, and are undefined at the beginning of each program
    invocation.

    Temporary variables are declared explicitly using the <TEMP_statement>
    grammar rule. Each such statement can declare one or more temporaries.
    Temporaries can not be declared implicitly. Temporaries can be declared
    using any component size ("SHORT" or "LONG") and type ("FLOAT", "INT", or
    "UINT") modifier.

    Temporary variables may be declared as arrays. Temporary variables
    declared as arrays may be stored in slower memory than those not declared
    as arrays, and it is recommended to use non-array variables unless array
    functionality is required.


    Section 2.X.3.5, Program Results

    Program result variables represent the per-vertex or per-fragment results
    of the program. All result variables have associated bindings, are
    write-only during program execution, and are undefined at the beginning of
    each program invocation. Any vertex or fragment attributes corresponding
    to unwritten result variables will be undefined in subsequent stages of
    the pipeline. Result variables may be declared explicitly via the
    <OUTPUT_statement> grammar rule, or implicitly by using a result binding
    in an instruction.

    The set of available result bindings depends on the program type, and is
    enumerated in the specifications for each program type.

    Result variables may generally be declared as arrays, but the set of
    bindings allowed for arrays is limited to state grouped in arrays (e.g.,
    texture coordinates, clip distances, colors). Additionally, all bindings
    assigned to the array must be of the same binding type and must increase
    consecutively.
    Examples of valid and invalid binding lists for vertex programs include:

      result.clip[1], result.clip[2]          # valid, 2-entry array
      result.texcoord[0..3]                   # valid, 4-entry array
      result.texcoord[1], result.texcoord[3]  # invalid, skipped texcoord 2
      result.texcoord[2], result.texcoord[1]  # invalid, wrong order
      result.texcoord[1], result.clip[2]      # invalid, different types

    Additionally, result bindings may be used in no more than one array
    addressed with relative addressing.

    Implementations may have a limit on the total number of result binding
    components used by each program target (MAX_PROGRAM_RESULT_COMPONENTS_NV).
    Programs that require more result binding components than this limit will
    fail to load. The method of counting used result binding components is
    implementation-dependent, but must satisfy the following properties:

      * If a result binding is not referenced in a program, or is referenced
        only in declarations of result variables that are not used, none of
        its components are counted.

      * A result binding component may be counted as used only if there exists
        an instruction operand where

        - the component is enabled in the write mask (Section 2.X.4.3), and

        - the result binding is either

          - referenced directly by the operand,

          - bound to a declared variable referenced by the operand, or

          - bound to a declared array variable where another binding in
            the array satisfies one of the two previous conditions.

    Implementations are not required to optimize out unused elements of a
    result array or components that are used in only some elements of an
    array. The last of these rules is intended to cover the case where
    the same result binding is used in multiple variables.
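The array rules for result bindings (one binding type, consecutively increasing indices) can be checked mechanically. The sketch below is a hypothetical validator, not part of any GL API: it only understands indexed bindings of the forms used in the examples ("result.clip[2]", "result.texcoord[0..3]"), takes a comma-separated binding list as a string, and does not model the relative-addressing or component-counting rules.

```python
import re

# An indexed result binding such as "result.clip[2]", or a range such as
# "result.texcoord[0..3]".  Other binding forms are out of scope here.
_INDEXED = re.compile(r"(result\.\w+)\[(\d+)(?:\.\.(\d+))?\]")


def expand_binding(binding):
    """Expand one binding into a list of (binding type, index) entries."""
    m = _INDEXED.fullmatch(binding.strip())
    if m is None:
        raise ValueError(f"unsupported binding {binding!r}")
    name, first = m.group(1), int(m.group(2))
    last = first if m.group(3) is None else int(m.group(3))
    return [(name, i) for i in range(first, last + 1)]


def is_valid_result_array(binding_list):
    """True if every entry has the same binding type and the indices
    increase by exactly one from entry to entry."""
    entries = [e for b in binding_list.split(",") for e in expand_binding(b)]
    if not entries:
        return False
    base_name, base_index = entries[0]
    return all(name == base_name and index == base_index + k
               for k, (name, index) in enumerate(entries))
```

Applied to the five example lists, this returns True for the first two and False for the remaining three.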

    For example, an instruction whose write mask selects only the x
    component may result in the x component of a result binding being
    counted, but may never result in the counting of the y, z, or w
    components of any result binding.


    Section 2.X.3.6, Program Parameter Buffers

    Program parameter buffers are arrays consisting of single-component
    typeless values or four-component typeless vectors stored in a buffer
    object. The GL provides an implementation-dependent number of buffer
    object binding points for each program target, to which buffer objects can
    be attached. Program parameter buffer variables can be changed either by
    updating the contents of bound buffer objects, or simply by changing the
    buffer object attached to a binding point.

    Program parameter buffer variables are used as constants during program
    execution. All program parameter buffer variables have an associated
    binding and are read-only during program execution. Program parameter
    buffers retain their values across program invocations, although their
    values may change as buffer object bindings or contents change. Program
    parameter buffer variables must be declared explicitly via the
    <BUFFER_statement> grammar rule. Program parameter buffer bindings can
    not be used directly in executable instructions.

    Program parameter buffer variables are treated as an array of
    single-component values if the <bufferDeclType> grammar rule matches
    "BUFFER" or as an array of four-component vectors if it matches "BUFFER4".
    A program will fail to load if a variable declared as "BUFFER" and another
    variable declared as "BUFFER4" use the same buffer binding point.

    Program parameter buffer variables may be declared as arrays, but all
    bindings assigned to the array must use the same binding point and must
    increase consecutively.
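The rule that "BUFFER" and "BUFFER4" declarations may not share a buffer binding point amounts to remembering the first declaration type seen for each binding point. A minimal sketch, assuming declarations have already been reduced to (declaration type, binding point) pairs; `check_buffer_declarations` is a hypothetical name, and load failure is modeled as a `ValueError`.

```python
def check_buffer_declarations(declarations):
    """Reject programs that use one binding point for both BUFFER and
    BUFFER4 declarations.

    declarations: iterable of (decl_type, binding_point) pairs, where
    decl_type is "BUFFER" or "BUFFER4".  Returns the declaration type
    recorded for each binding point.
    """
    kind_by_point = {}
    for decl_type, point in declarations:
        if decl_type not in ("BUFFER", "BUFFER4"):
            raise ValueError(f"unknown declaration type {decl_type!r}")
        if point < 0:
            raise ValueError("binding point must be a nonnegative integer")
        first = kind_by_point.setdefault(point, decl_type)
        if first != decl_type:
            raise ValueError(
                f"program fails to load: binding point {point} is used by "
                "both BUFFER and BUFFER4 declarations")
    return kind_by_point
```

Repeated declarations of the same type on one binding point are accepted; only mixing the two types is rejected.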

      Binding                        Components  Underlying State
      -----------------------------  ----------  -----------------------------
      program.buffer[a][b]           (x,x,x,x)   program parameter buffer a,
                                                   element b
      program.buffer[a][b..c]        (x,x,x,x)   program parameter buffer a,
                                                   elements b through c
      program.buffer[a]              (x,x,x,x)   program parameter buffer a,
                                                   all elements

      Table X.12: Program Parameter Buffer Bindings. <a> indicates a buffer
      number, <b> and <c> indicate individual elements.

    If a program parameter buffer binding matches "program.buffer[a][b]", the
    program parameter variable is filled with element <b> of the buffer
    object bound to binding point <a>. Each element of the bound buffer
    object is treated as one or four words of data that can hold integer or
    floating-point values. When a single-component binding is evaluated, the
    selected word is broadcast to all four components of the variable. When a
    four-component binding is evaluated, the four components of the buffer
    element are loaded into the variable. If no buffer object is bound to
    binding point <a>, or the bound buffer object is not large enough to hold
    an element <b>, the values used are undefined. The binding point <a> must
    be a nonnegative integer constant.

    For program parameter buffer array declarations, "program.buffer[a][b..c]"
    is equivalent to specifying elements <b> through <c> of the buffer object
    bound to binding point <a> in order.

    For program parameter buffer array declarations, "program.buffer[a]" is
    equivalent to specifying the entire buffer -- elements 0 through <n>-1,
    where <n> is either the size of the array (if declared) or the
    implementation-dependent maximum parameter buffer object size limit (if no
    size is declared).


    Section 2.X.3.7, Program Condition Code Registers

    The program condition code registers are four-component vectors.
    Each component of each register is a collection of single-bit flags,
    including a sign flag (SF), a zero flag (ZF), an overflow flag (OF), and
    a carry flag (CF). There are two condition code registers (CC0 and CC1),
    whose values are undefined at the beginning of program execution.

    Most program instructions can optionally update one of the condition code
    registers, by designating the condition code to update in the instruction.
    When a condition code component is updated, the four flags of each
    component of the condition code are set according to the corresponding
    component of the instruction result. Full details on the condition code
    updates and tests can be found in Section 2.X.4.3.

    The values of these four flags can be combined in various condition code
    tests, which can be used to mask writes to destination variables and to
    perform conditional branches or other conditional operations.


    Section 2.X.3.8, Program Aliases

    Programs can create aliases by matching the <ALIAS_statement> grammar
    rule. Aliases allow programs to use multiple variable names to refer to a
    single underlying variable. For example, the statement

      ALIAS var1 = var0

    establishes a variable name of "var1". Subsequent references to "var1" in
    the program text are treated as references to "var0". The left hand side
    of an ALIAS statement must be a new variable name, and the right hand side
    must be an established variable name.

    Aliases are not considered variable declarations, so they do not count
    against the limits on the number of variable declarations allowed in the
    program text.
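The ALIAS rules (the left-hand side must be new, the right-hand side must already be established, later references resolve to the underlying variable) behave like an ordinary symbol-table insertion. The sketch below is a hypothetical model, not an implementation of any real assembler. It assumes that an earlier alias name counts as an established name, and that only real declarations (not aliases) would be counted against declaration limits.

```python
def declare_alias(names, new_name, existing_name):
    """Record "ALIAS new_name = existing_name".

    names maps every established name (declared variables and, by
    assumption, earlier aliases) to the underlying variable it refers to.
    Aliases are not declarations, so no declaration count is updated here.
    """
    if new_name in names:
        raise ValueError(f"{new_name!r} is not a new variable name")
    if existing_name not in names:
        raise ValueError(f"{existing_name!r} is not an established name")
    names[new_name] = names[existing_name]


def resolve(names, name):
    """Return the underlying variable that a name in the program refers to."""
    return names[name]
```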


    Section 2.X.3.9, Program Resource Limits

    (see ARB_vertex_program specification, incorporates all the different
    limits on instruction counts, temporaries, attribute bindings, program
    parameters, and so on)


    Section 2.X.4, Program Execution Environment

    The set of instructions supported for GPU programs is given in Table X.13
    below and is described in detail in Section 2.X.8. An instruction can use
    up to three operands when it executes, and most instructions can write a
    single result vector. Instructions may also specify one or more
    modifiers, according to the <opModifiers> grammar rule. Instruction
    modifiers affect how the specified operation is performed.

    GPU programs may operate on signed integer, unsigned integer, or
    floating-point values; some instructions are capable of operating on any
    of the three types. However, the data types of the operands and the result
    are always determined based solely on the instruction and its modifiers.
    If any of the variables used in the instruction are typeless, they will be
    interpreted according to the data type derived from the instruction. If
    any variables with a conflicting data type are used in the instruction,
    the program will fail to load unless the "NTC" (no type checking)
    instruction modifier is specified.
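The type-checking rule above can be summarized as: the instruction fixes one data type, typeless variables adopt it, and a conflicting variable type is an error unless "NTC" is present. A minimal sketch under a simplifying assumption: variable types are written with the same "F", "S", and "U" codes as the data type modifiers (the real declaration modifiers are spelled differently), and `operand_types` is a hypothetical helper.

```python
def operand_types(instr_type, var_types, ntc=False):
    """Return the data type each variable is interpreted as.

    instr_type: "F", "S", or "U", derived from the opcode and modifiers.
    var_types: declared type of each variable the instruction uses, using
    the same codes, or None for a typeless variable.
    """
    for t in var_types:
        if t is not None and t != instr_type and not ntc:
            raise ValueError("program fails to load: conflicting data type")
    # Typeless variables (and, with NTC, mismatched ones) are interpreted
    # using the data type derived from the instruction.
    return [instr_type for _ in var_types]
```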

                  Modifiers
      Instruction F I C S H D  Out  Inputs    Description
      ----------- - - - - - -  ---  --------  --------------------------------
      ABS         X X X X X F  v    v         absolute value
      ADD         X X X X X F  v    v,v       add
      AND         - X X - - S  v    v,v       bitwise and
      BRK         - - - - - -  -    c         break out of loop instruction
      CAL         - - - - - -  -    c         subroutine call
      CEIL        X X X X X F  v    vf        ceiling
      CMP         X X X X X F  v    v,v,v     compare
      CONT        - - - - - -  -    c         continue with next loop iteration
      COS         X - X X X F  s    s         cosine with reduction to [-PI,PI]
      DIV         X X X X X F  v    v,s       divide vector components by scalar
      DP2         X - X X X F  s    v,v       2-component dot product
      DP2A        X - X X X F  s    v,v,v     2-comp. dot product w/scalar add
      DP3         X - X X X F  s    v,v       3-component dot product
      DP4         X - X X X F  s    v,v       4-component dot product
      DPH         X - X X X F  s    v,v       homogeneous dot product
      DST         X - X X X F  v    v,v       distance vector
      ELSE        - - - - - -  -    -         start if test else block
      ENDIF       - - - - - -  -    -         end if test block
      ENDREP      - - - - - -  -    -         end of repeat block
      EX2         X - X X X F  s    s         exponential base 2
      FLR         X X X X X F  v    vf        floor
      FRC         X - X X X F  v    v         fraction
      I2F         - X X - - S  vf   v         integer to float
      IF          - - - - - -  -    c         start of if test block
      KIL         X X - - X F  -    vc        kill fragment
      LG2         X - X X X F  s    s         logarithm base 2
      LIT         X - X X X F  v    v         compute lighting coefficients
      LRP         X - X X X F  v    v,v,v     linear interpolation
      MAD         X X X X X F  v    v,v,v     multiply and add
      MAX         X X X X X F  v    v,v       maximum
      MIN         X X X X X F  v    v,v       minimum
      MOD         - X X - - S  v    v,s       modulus vector components by scalar
      MOV         X X X X X F  v    v         move
      MUL         X X X X X F  v    v,v       multiply
      NOT         - X X - - S  v    v         bitwise not
      NRM         X - X X X F  v    v         normalize 3-component vector
      OR          - X X - - S  v    v,v       bitwise or
      PK2H        X X - - - F  s    vf        pack two 16-bit floats
      PK2US       X X - - - F  s    vf        pack two floats as unsigned 16-bit
      PK4B        X X - - - F  s    vf        pack four floats as signed 8-bit
      PK4UB       X X - - - F  s    vf        pack four floats as unsigned 8-bit
      POW         X - X X X F  s    s,s       exponentiate
      RCC         X - X X X F  s    s         reciprocal (clamped)
      RCP         X - X X X F  s    s         reciprocal
      REP         X X - - X F  -    v         start of repeat block
      RET         - - - - - -  -    c         subroutine return
      RFL         X - X X X F  v    v,v       reflection vector
      ROUND       X X X X X F  v    vf        round to nearest integer
      RSQ         X - X X X F  s    s         reciprocal square root
      SAD         - X X - - S  vu   v,v,vu    sum of absolute differences
      SCS         X - X X X F  v    s         sine/cosine without reduction
      SEQ         X X X X X F  v    v,v       set on equal
      SFL         X X X X X F  v    v,v       set on false
      SGE         X X X X X F  v    v,v       set on greater than or equal
      SGT         X X X X X F  v    v,v       set on greater than
      SHL         - X X - - S  v    v,s       shift left
      SHR         - X X - - S  v    v,s       shift right
      SIN         X - X X X F  s    s         sine with reduction to [-PI,PI]
      SLE         X X X X X F  v    v,v       set on less than or equal
      SLT         X X X X X F  v    v,v       set on less than
      SNE         X X X X X F  v    v,v       set on not equal
      SSG         X - X X X F  v    v         set sign
      STR         X X X X X F  v    v,v       set on true
      SUB         X X X X X F  v    v,v       subtract
      SWZ         X - X X X F  v    v         extended swizzle
      TEX         X X X X - F  v    vf        texture sample
      TRUNC       X X X X X F  v    vf        truncate (round toward zero)
      TXB         X X X X - F  v    vf        texture sample with bias
      TXD         X X X X - F  v    vf,vf,vf  texture sample w/partials
      TXF         X X X X - F  v    vs        texel fetch
      TXL         X X X X - F  v    vf        texture sample w/LOD
      TXP         X X X X - F  v    vf        texture sample w/projection
      TXQ         - - - - - S  vs   vs        texture info query
      UP2H        X X X X - F  vf   s         unpack two 16-bit floats
      UP2US       X X X X - F  vf   s         unpack two unsigned 16-bit ints
      UP4B        X X X X - F  vf   s         unpack four signed 8-bit ints
      UP4UB       X X X X - F  vf   s         unpack four unsigned 8-bit ints
      X2D         X - X X X F  v    v,v,v     2D coordinate transformation
      XOR         - X X - - S  v    v,v       exclusive or
      XPD         X - X X X F  v    v,v       cross product

      Table X.13: Summary of NV_gpu_program4 instructions.
      The "Modifiers" columns specify the set of modifiers allowed for the
      instruction:

        F = floating-point data type modifiers
        I = signed and unsigned integer data type modifiers
        C = condition code update modifiers
        S = clamping (saturation) modifiers
        H = half-precision float data type suffix
        D = default data type modifier (F, U, or S)

      The input and output columns describe the formats of the operands and
      results of the instruction.

        v:  4-component vector (data type is inherited from operation)
        vf: 4-component vector (data type is always floating-point)
        vs: 4-component vector (data type is always signed integer)
        vu: 4-component vector (data type is always unsigned integer)
        s:  scalar (replicated if written to a vector destination;
            data type is inherited from operation)
        c:  condition code test result (e.g., "EQ", "GT1.x")
        vc: 4-component vector or condition code test


    Section 2.X.4.1, Program Instruction Modifiers

    There are several types of instruction modifiers available. A data type
    modifier specifies that an instruction should operate on signed integer,
    unsigned integer, or floating-point data, when multiple data types are
    supported. A clamping modifier applies to instructions with
    floating-point results, and specifies the range to which the results
    should be clamped. A condition code update modifier specifies that the
    instruction should update one of the condition code variables. Several
    other special modifiers are also provided.

    Instruction modifiers may be specified as stand-alone modifiers or as
    suffixes concatenated with the opcode name.
    A program will fail to load if it contains an instruction that

      * specifies more than one modifier of any given type,

      * specifies a clamping modifier on an instruction, unless it produces
        floating-point results, or

      * specifies a modifier that is not supported by the instruction (see
        Table X.13 and the instruction description).

    Stand-alone instruction modifiers are specified according to the
    <opModifiers> grammar rule using a ".<modifier>" syntax. Multiple
    modifiers, separated by periods, may be specified. The set of supported
    modifiers is described in Table X.14.

      Modifier  Description
      --------  -----------------------------------------------
      F         Floating-point operation
      U         Fixed-point operation, unsigned operands
      S         Fixed-point operation, signed operands
      CC        Update condition code register zero
      CC0       Update condition code register zero
      CC1       Update condition code register one
      SAT       Floating-point results clamped to [0,1]
      SSAT      Floating-point results clamped to [-1,1]
      NTC       Disable type-checking on operands/results
      S24       Signed multiply (24-bit operands)
      U24       Unsigned multiply (24-bit operands)
      HI        Multiplies two 32-bit integer operands, returns
                the 32 MSBs of the product

      Table X.14, Instruction Modifiers.

    "F", "U", and "S" modifiers are data type modifiers and specify that the
    instruction should operate on floating-point, unsigned integer, or
    signed integer values, respectively. For example, "ADD.F", "ADD.U", and
    "ADD.S" specify component-wise addition of floating-point, unsigned
    integer, or signed integer vectors, respectively. These modifiers specify
    a data type, but do not specify a precision at which the operation is
    performed. Floating-point operations will be carried out with an internal
    precision no less than that used to represent the largest operand.
    Fixed-point operations will be carried out using at least as many bits as
    used to represent the largest operand. Operands represented with fewer
    bits than used to perform the instruction will be promoted to a larger
    data type. Signed integer operands will be sign-extended, where the most
    significant bits are filled with ones if the operand is negative and zero
    otherwise. Unsigned integer operands will be zero-extended, where the
    most significant bits are always filled with zeroes. For some
    instructions, the data types of some operands or of the result are fixed;
    in these cases, the data type modifier specifies the data type of the
    remaining values.

    "CC", "CC0", and "CC1" are condition code update modifiers that specify
    that one of the condition code registers should be updated based on the
    result of the instruction, as described in Section 2.X.4.3. "CC" and
    "CC0" specify that the condition code register CC0 be updated; "CC1"
    specifies an update to CC1. If no condition code update modifier is
    provided, the condition code registers will not be affected.

    "SAT" and "SSAT" are clamping modifiers that specify that the
    floating-point components of the instruction result should be clamped to
    [0,1] or [-1,1], respectively, before updating the condition code and the
    destination variable. If no clamping suffix is specified, unclamped
    results will be used for condition code updates (if any) and destination
    variable writes. Clamping modifiers are not supported on instructions
    that do not produce floating-point results.

    "NTC" (no type checking) disables data type checking on the instruction,
    and allows instructions to use operands or result variables whose data
    types are inconsistent with the expected data types of the instruction.
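Two of the rules above, integer operand promotion and the "SAT"/"SSAT" clamping modifiers, reduce to simple bit and range operations. The sketch below uses hypothetical helper names and treats integer operands as unsigned bit patterns of a given width; NaN handling for the clamping modifiers is not modeled.

```python
def promote(value, from_bits, to_bits, signed):
    """Widen an integer bit pattern from from_bits to to_bits bits.

    Signed operands are sign-extended (high bits filled with ones when the
    operand is negative); unsigned operands are zero-extended.
    """
    value &= (1 << from_bits) - 1
    if signed and value >> (from_bits - 1):
        high_bits = ((1 << to_bits) - 1) ^ ((1 << from_bits) - 1)
        value |= high_bits
    return value


def clamp_result(x, modifier=None):
    """Apply a SAT ([0,1]) or SSAT ([-1,1]) modifier to a float result."""
    if modifier == "SAT":
        return min(max(x, 0.0), 1.0)
    if modifier == "SSAT":
        return min(max(x, -1.0), 1.0)
    return x  # no clamping modifier: the result is used unclamped
```

For example, the 8-bit pattern 0xFF widens to 0xFFFFFFFF as a signed operand but stays 0xFF as an unsigned one.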

    "S24", "U24", and "HI" are special modifiers that are allowed only for the
    MUL instruction, and are described in detail where MUL is documented. No
    more than one such modifier may be provided for any instruction.

    If an instruction supports data type modifiers, but none is provided, a
    default data type will be chosen based on the instruction, as specified in
    Table X.13 and the instruction set description (Section 2.X.8). If
    condition code update or clamping modifiers are not specified, the
    corresponding operation will not be performed.

    Additionally, each instruction name may have one or more suffixes,
    concatenated onto the base instruction name, that operate as instruction
    modifiers. For conciseness, these suffixes are not spelled out in the
    grammar -- the base opcode name is used as a placeholder for the opcode
    and all of its possible suffixes. Instruction suffixes are provided
    mainly for compatibility with prior GPU program instruction sets (e.g.,
    NV_vertex_program3, NV_fragment_program2, and predecessors). The set of
    allowable suffixes, and their equivalent stand-alone modifiers, are listed
    in Table X.15.

      Suffix  Modifier    Description
      ------  ----------  ---------------------------------------------------
      R       F           Floating-point operation, 32-bit precision
      H       F(*)        Floating-point operation, at least 16-bit precision
      C       CC0         Update condition code register zero
      C0      CC0         Update condition code register zero
      C1      CC1         Update condition code register one
      _SAT    SAT         Floating-point results clamped to [0,1]
      _SSAT   SSAT        Floating-point results clamped to [-1,1]

      Table X.15, Instruction Suffixes.

    The "R" and "H" suffixes specify floating-point operations and are
    equivalent to the "F" data type modifier. They additionally specify a
    minimum precision for the operations.
    Instructions with an "R" precision modifier will be carried out at no
    less than IEEE single-precision floating-point (8 bits of exponent, 23
    bits of mantissa). Instructions with an "H" precision modifier will be
    carried out at no less than 16-bit floating-point precision (5 bits of
    exponent, 10 bits of mantissa).

    An instruction may have multiple suffixes, but they must appear in order,
    with data type suffixes first, followed by condition code update suffixes,
    followed by clamping suffixes. For example, "ADDR" carries out an add at
    32-bit precision. "ADDH_SAT" carries out an add at 16-bit precision (or
    better) and clamps the results to [0,1]. "ADDRC1_SSAT" carries out an add
    at 32-bit floating-point precision, clamps the results to [-1,1], and
    updates condition code one based on the clamped result.


    Section 2.X.4.2, Program Operands

    Most program instructions operate on one or more scalar or vector
    operands. Each operand specifies an operand variable, which is either the
    name of a previously declared variable or an implicit variable declaration
    created by using a variable binding in the instruction. Attribute,
    parameter, or parameter buffer variables can be declared implicitly by
    using a valid binding name in an operand. Instruction operands are
    specified by the <instOperandV>, <instOperandS>, or <instOperandVNS>
    grammar rules.

    If the operand variable is not an array, its contents are loaded directly.
    If the operand variable is an array, a single element of the array is
    loaded according to the <arrayMem> grammar rule. The elements of an array
    are numbered from 0 to <n>-1, where <n> is the number of entries in the
    array. Array members can be accessed using either absolute or relative
    addressing.
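The suffix ordering rule (data type, then condition code update, then clamping) maps naturally onto a regular expression applied after the base opcode. This is a hypothetical sketch: `BASE_OPCODES` is a small illustrative subset of Table X.13, longer opcode names are tried first so that a name such as "DPH" is not split incorrectly, and the sketch does not check whether a given opcode actually supports the modifiers it parses.

```python
import re

# Illustrative subset of Table X.13 opcodes; not the full instruction set.
BASE_OPCODES = ("ADD", "DP3", "DPH", "MOV", "MUL", "XOR")

# Suffixes must appear in order: data type, condition code update, clamping.
_SUFFIXES = re.compile(r"(R|H)?(C0|C1|C)?(_SAT|_SSAT)?")


def parse_opcode(token):
    """Split an opcode token into its base name and equivalent modifiers."""
    for base in sorted(BASE_OPCODES, key=len, reverse=True):
        if not token.startswith(base):
            continue
        m = _SUFFIXES.fullmatch(token[len(base):])
        if m is None:
            continue
        dtype, cc, clamp = m.groups()
        return base, {
            "type": "F" if dtype else None,
            "min_precision_bits": {"R": 32, "H": 16}.get(dtype),
            "cc": {"C": "CC0", "C0": "CC0", "C1": "CC1"}.get(cc),
            "clamp": clamp[1:] if clamp else None,
        }
    raise ValueError(f"unrecognized opcode or suffix order: {token}")
```

Running it on the three examples from the text yields the expected decompositions, while an out-of-order token such as "ADD_SATC1" is rejected.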

    Absolute array addressing is used when the <arrayMemAbs> grammar rule is
    matched; the array member to load is specified by the matching integer.
    Out-of-bounds array absolute accesses are not allowed. If the specified
    member number is greater than or equal to the size of the array, the
    program will fail to load.

    Relative array addressing is used when the <arrayMemRel> grammar rule is
    matched. This grammar rule allows the program to specify a scalar integer
    operand and an optional constant offset, according to the <arrayMemReg>
    and <arrayMemOffset> grammar rules. When performing relative addressing,
    the GL evaluates the specified integer scalar operand (according to the
    rules specified in this section) and adds the constant offset. The array
    member loaded is given by this sum. The constant offset is considered
    zero if an offset is omitted. If the sum is negative or is greater than
    or equal to the size of the array, the results of the access are
    undefined, but may not lead to program or GL termination. The set of
    constant offsets supported for relative addressing is limited to values
    in the range [0,<n>-1], where <n> is the size of the array. A program
    will fail to load if it specifies an offset outside that range. If
    offsets outside that range are required, they can be applied by using an
    integer ADD instruction writing to a temporary variable.

    After the operand is loaded, its components can be rearranged according to
    the <swizzleSuffix> grammar rule, or it can be converted to a scalar
    operand according to the <scalarSuffix> grammar rule.

    The <swizzleSuffix> grammar rule rearranges the components of a loaded
    vector to produce another vector. If the <swizzleSuffix> rule matches the
    <xyzwSwizzle> or <rgbaSwizzle> grammar rule, a pattern of the form ".????"
    is used, where each question mark is replaced with one of "x", "y", "z",
    "w", "r", "g", "b", or "a". For such patterns, the x, y, z, and w
    components of the operand are taken from the vector components named by
    the first, second, third, and fourth characters of the pattern,
    respectively. Swizzle components of "r", "g", "b", and "a" are equivalent
    to "x", "y", "z", and "w", respectively. For example, if the swizzle
    suffix is ".yzzx" or ".gbbr" and the specified source contains {2,8,9,0},
    the result is the vector {8,9,9,2}. If the <swizzleSuffix> matches the
    <component> grammar rule, a pattern of the form ".?" is used. For this
    pattern, all four components of the operand are taken from the single
    component identified by the pattern. If the swizzle suffix is omitted,
    components are not rearranged and swizzling has no effect, as though
    ".xyzw" were specified.

    The swizzle suffix rules do not allow mixing "x", "y", "z", or "w"
    selectors with "r", "g", "b", or "a" selectors. A program will fail to
    load if it contains a swizzle suffix with selectors from both of these
    sets.

    The <scalarSuffix> grammar rule converts a vector to a scalar by selecting
    a single component. The <scalarSuffix> rule is similar to the swizzle
    selector, except that only a single component is selected. If the scalar
    suffix is ".y" and the specified source contains {2,8,9,0}, the value is
    the scalar value 8.

    Next, a component-wise negate operation is performed on the operand if the
    <operandNeg> grammar rule matches "-". Negation is not performed if the
    operand has no sign prefix, or is prefixed with "+". For unsigned integer
    operands, the negate operation performs a two's complement operation.

    Next, a component-wise absolute value operation is performed on the
    operand if the <instOperandAbsV> or <instOperandAbsS> grammar rule is
    matched, by surrounding the operand with two "|" characters. The result
    is optionally negated if the <operandAbsNeg> grammar rule matches "-".
    For unsigned integer operands, the absolute value operation has no effect.


    Section 2.X.4.3, Program Destination Variable Update

    Most program instructions perform computations that produce a result,
    which will be written to a variable. Each instruction that computes a
    result specifies a destination variable, which is either the name of a
    previously declared variable or an implicit variable declaration created
    by using a variable binding in the instruction. Result variables can be
    declared implicitly by using a valid program result binding name in the
    result portion of the instruction. Instruction results are specified
    according to the <instResult> grammar rule.

    The destination variable may be a single member of an array. In this
    case, a single array member is specified using the <arrayMem> grammar
    rule, and the array member to update is computed in the exact same manner
    as done for operand loads. If the array member is computed at run time,
    and is negative or greater than or equal to the size of the array, the
    results of the destination variable update are undefined and could result
    in overwriting other program variables.

    The results of the operation may be obtained at a different precision than
    that used to store the destination variable. If so, the results are
    converted to match the size of the destination variable. For
    floating-point values, the results are rounded to the nearest
    floating-point value that can be represented in the destination variable.
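The operand-loading steps of Section 2.X.4.2 (relative array addressing, swizzling, negation, then absolute value with an optional outer negation) can be traced with a small model. The helpers below are hypothetical and only illustrate the ordering and the validity checks described in the text. An out-of-bounds relative access is merely reported, since the GL leaves its result undefined. Signed-integer overflow (for example negating the most negative value) is not modeled, because Python integers do not wrap.

```python
_SELECTOR_SETS = ("xyzw", "rgba")


def relative_member(index_value, offset, array_size):
    """Array member chosen by relative addressing (index operand + offset).

    Returns (member, in_bounds); the constant offset must lie in
    [0, array_size - 1] or the program fails to load.
    """
    if not 0 <= offset <= array_size - 1:
        raise ValueError("program fails to load: offset outside [0,<n>-1]")
    member = index_value + offset
    return member, 0 <= member < array_size


def swizzle(vec, suffix=".xyzw"):
    """Apply a ".????" or ".?" swizzle suffix to a four-component vector."""
    pattern = suffix[1:] if suffix.startswith(".") else ""
    sets = [s for s in _SELECTOR_SETS if pattern and set(pattern) <= set(s)]
    if len(pattern) not in (1, 4) or not sets:
        raise ValueError(f"invalid or mixed swizzle {suffix!r}")
    if len(pattern) == 1:
        pattern *= 4  # ".?" replicates a single component
    return [vec[sets[0].index(c)] for c in pattern]


def sign_modifiers(vec, data_type, neg=False, absolute=False, abs_neg=False,
                   bits=32):
    """Apply "-", then "|...|", then an outer "-" to a swizzled operand."""
    mask = (1 << bits) - 1

    def negate(c):
        # Unsigned negation is a two's complement within the operand width.
        return (-c) & mask if data_type == "U" else -c

    out = [negate(c) for c in vec] if neg else list(vec)
    if absolute:
        if data_type != "U":  # absolute value has no effect on unsigned
            out = [abs(c) for c in out]
        if abs_neg:
            out = [negate(c) for c in out]
    return out
```

With the source vector {2,8,9,0}, `swizzle` reproduces the ".yzzx", ".gbbr", and ".y" examples from the text.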
    If a result component is larger in magnitude than the largest
    representable floating-point value in the data type of the destination
    variable, an infinity encoding (+/-INF) is used. Signed or unsigned
    integer values are sign-extended or zero-extended, respectively, if the
    destination variable has more bits than the result, and have their most
    significant bits discarded if the destination variable has fewer bits.

    Writes to individual components of a vector destination variable can be
    controlled at compile time by individual component write masks specified
    in the instruction. The component write mask is specified by the
    <optWriteMask> grammar rule, and is a string of up to four characters,
    naming the components to enable for writing. If no write mask is
    specified, all components are enabled for writing. The characters "x",
    "y", "z", and "w" match the x, y, z, and w components respectively. For
    example, a write mask of ".xzw" indicates that the x, z, and w
    components should be enabled for writing but the y component should not be
    written. The grammar requires that the destination register mask
    components be listed in "xyzw" order. Additionally, write mask
    components of "r", "g", "b", and "a" are equivalent to "x", "y", "z", and
    "w", respectively. The grammar does not allow mixing "x", "y", "z", or
    "w" components with "r", "g", "b", or "a" ones.

    Writes to individual components of a vector destination variable, or to a
    scalar destination variable, can also be controlled at run time using
    condition code write masks. The condition code write mask is specified by
    the <ccMask> grammar rule. If a mask is specified, a condition code
    variable is loaded according to the <ccMaskRule> grammar rule and tested
    as described in Table X.16 to produce a four-component vector of TRUE/FALSE
    values.
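The compile-time write mask rules (optional, up to four components, listed once each in "xyzw" order, no mixing with "rgba") and the integer narrowing rule can be sketched as follows. `parse_write_mask` and `narrow` are hypothetical helpers; the mask is passed with its leading period, as it appears in program text.

```python
def parse_write_mask(mask=None):
    """Return [x, y, z, w] write enables for an optional ".xzw"-style mask."""
    if mask is None:
        return [True, True, True, True]  # no mask: all components written
    letters = mask[1:] if mask.startswith(".") else ""
    for order in ("xyzw", "rgba"):
        if letters and all(c in order for c in letters):
            positions = [order.index(c) for c in letters]
            # Each component may appear once, in xyzw (or rgba) order.
            if positions != sorted(set(positions)):
                raise ValueError(f"write mask {mask!r} is out of order")
            return [i in positions for i in range(4)]
    raise ValueError(f"invalid or mixed write mask {mask!r}")


def narrow(value, to_bits):
    """Store an integer in a narrower destination by discarding its most
    significant bits; the result is returned as an unsigned bit pattern."""
    return value & ((1 << to_bits) - 1)
```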

      mask rule         test name                 condition
      ---------------   ----------------------    -----------------
      EQ, EQ0, EQ1      equal                     !SF && ZF
      GE, GE0, GE1      greater than or equal     !(SF ^ OF)
      GT, GT0, GT1      greater than              (!SF ^ OF) && !ZF
      LE, LE0, LE1      less than or equal        SF ^ (ZF || OF)
      LT, LT0, LT1      less than                 (SF && !ZF) ^ OF
      NE, NE0, NE1      not equal                 SF || !ZF
      FL, FL0, FL1      false                     always false
      TR, TR0, TR1      true                      always true

      NAN, NAN0, NAN1   not a number              SF && ZF
      LEG, LEG0, LEG1   less, equal, or greater   !SF || !ZF
                        (anything but a NaN)

      CF, CF0, CF1      carry flag                CF
      NCF, NCF0, NCF1   no carry flag             !CF
      OF, OF0, OF1      overflow flag             OF
      NOF, NOF0, NOF1   no overflow flag          !OF
      SF, SF0, SF1      sign flag                 SF
      NSF, NSF0, NSF1   no sign flag              !SF
      AB, AB0, AB1      above                     CF && !ZF
      BLE, BLE0, BLE1   below or equal            !CF || ZF

      Table X.16, Condition Code Tests. The allowed rules are specified in
      the "mask rule" column. If "0" or "1" is appended to the rule name
      (e.g., "EQ1"), the corresponding condition code register (CC1 in this
      example) is loaded, otherwise CC0 is loaded. After loading, each
      component is tested, using the expression listed in the "condition"
      column.

    After the condition code tests are performed, the four-component result
    can be swizzled according to the <swizzleSuffix> grammar rule. Individual
    components of the destination variable are written only if the
    corresponding component of the swizzled condition code test result is
    TRUE. If both a (compile-time) component write mask and a condition code
    write mask are specified, destination variable components are written only
    if the corresponding component is enabled in both masks.

    A program instruction can also optionally update one of the two condition
    code registers if the "CC", "CC0", or "CC1" instruction modifier is
    specified.
    These instruction modifiers update condition code register
    CC0, CC0, or CC1, respectively. The instructions "ADD.CC" or "ADD.CC0"
    will perform an add and update condition code zero, "ADD.CC1" will add and
    update condition code one, and "ADD" will simply perform the add without a
    condition code update. The components of the selected condition code
    register are updated if and only if the corresponding component of the
    destination variable is enabled by both write masks. For the purposes of
    condition code update, a scalar destination variable is treated as a
    vector where the scalar result is written to "x" (if enabled in the write
    mask), and writes to the "y", "z", and "w" components are disabled.

    When condition code components are written, the condition code flags are
    updated based on the corresponding component of the result. If a
    component of the destination register is not enabled for writes, the
    corresponding condition code component is also unchanged.

    For floating-point results, the sign flag (SF) is set if the result is
    less than zero or is a NaN (not a number) value. The zero flag (ZF) is
    set if the result is equal to zero or is a NaN.

    For signed and unsigned integer results, the sign flag (SF) is set if the
    most significant bit of the value written to the result variable is set
    and the zero flag (ZF) is set if the result written is zero. For
    instructions other than those performing an integer add or subtract (ADD,
    MAD, SAD, SUB), the overflow and carry flags (OF and CF) are cleared.

    For integer add or subtract operations, the overflow and carry flags are
    set by doing both signed and unsigned adds/subtracts as follows:

      The overflow flag (OF) is set by interpreting the two operands as signed
      integers and performing a signed add or subtract.
      If the result is representable as a signed integer (i.e., doesn't
      overflow), the overflow flag is cleared; otherwise, it is set.

      The carry flag (CF) is set by interpreting the two operands as
      unsigned integers and performing an unsigned add or subtract.  If the
      result of an add is representable as an unsigned integer (i.e.,
      doesn't overflow), the carry flag is cleared; otherwise, it is set.
      If the result of a subtract is greater than or equal to zero, the
      carry flag is set; otherwise, it is cleared.

    For the purposes of condition code setting, negation modifiers turn add
    operations into subtracts and vice versa.  If the operation is
    equivalent to an add with both operands negated (-A-B), the carry and
    overflow flags are both undefined.


    Section 2.X.4.4, Program Texture Access

    Certain program instructions may access texture images, as described in
    section 3.8.  The coordinates, level-of-detail, and partial derivatives
    used for performing the texture lookup are derived from values provided
    in the program as described in the various sub-sections of Section
    2.X.8.  These descriptions use the function

      result_t_vec
        TextureSample(float_vec coord, float lod, float_vec ddx,
                      float_vec ddy, int_vec offset);

    which obtains a filtered texel value <tau> as described in Section 3.8.8
    and returns a 4-component vector (R,G,B,A) according to the format
    conversions specified in Table 3.21.  The result vector is interpreted
    as floating-point, signed integer, or unsigned integer, according to the
    data type modifier of the instruction.  If the internal format of the
    texture does not match the instruction's data type modifier, the results
    of the texture lookup are undefined.

    (Note: For unextended OpenGL 2.0, all supported texture internal formats
    store integer values but return floating-point results in the range
    [0,1] on a texture lookup.  The ARB_texture_float extension introduces
    floating-point internal formats where components are both stored and
    returned as floating-point values.  The EXT_texture_integer extension
    introduces formats that both store and return either signed or unsigned
    integer values.)

    <coord> is a four-component floating-point vector from which the (s,t,r)
    texture coordinates used for the texture access, the layer used for
    array textures, and the reference value used for depth comparisons
    (section 3.8.14) are extracted according to Table X.17.  If the texture
    is a cube map, (s,t,r) is projected to one of the six cube faces to
    produce a new (s,t) vector according to Section 3.8.6.  For array
    textures, the layer used is derived by rounding the extracted
    floating-point component to the nearest integer and clamping the result
    to the range [0,<n>-1], where <n> is the number of layers in the
    texture.

    <lod> specifies the level of detail parameter and replaces the value
    computed in equation 3.18.  <ddx> and <ddy> specify partial derivatives
    (ds/dx, dt/dx, dr/dx, ds/dy, dt/dy, and dr/dy) for the texture
    coordinates, and may be used to derive footprint shapes for anisotropic
    texture filtering.

    <offset> is a constant 3-component signed integer vector specified
    according to the <texOffset> grammar rule, which is added to the
    computed <u>, <v>, and <w> texel locations prior to sampling.  One,
    two, or three components may be specified in the instruction; if fewer
    than three are specified, the remaining offset components are zero.
    A limited range of offset values is supported; the minimum and maximum
    <texOffset> values are implementation-dependent and given by
    MIN_PROGRAM_TEXEL_OFFSET_EXT and MAX_PROGRAM_TEXEL_OFFSET_EXT,
    respectively.  A program will fail to load:

      * if the texture target specified in the instruction is 1D, ARRAY1D,
        SHADOW1D, or SHADOWARRAY1D, and the second or third component of
        the offset vector is non-zero,

      * if the texture target specified in the instruction is 2D, RECT,
        ARRAY2D, SHADOW2D, SHADOWRECT, or SHADOWARRAY2D, and the third
        component of the offset vector is non-zero,

      * if the texture target is CUBE or SHADOWCUBE, and any component of
        the offset vector is non-zero -- texel offsets are not supported for
        cube map or buffer textures, or

      * if any component of the offset vector is less than
        MIN_PROGRAM_TEXEL_OFFSET_EXT or greater than
        MAX_PROGRAM_TEXEL_OFFSET_EXT.

    (NOTE: Texel offsets are a new feature provided by this extension and
    are described in more detail in edits to Section 3.8 below.)

    The texture used by TextureSample() is one of the textures bound to the
    texture image unit whose number is specified in the instruction
    according to the <texImageUnit> grammar rule.  The texture target
    accessed is specified according to the <texTarget> grammar rule and
    Table X.17.  Fixed-function texture enables are always ignored when
    determining the texture to access in a program.
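
    The load-time texel offset checks listed above can be modeled as a small
    validation function.  The following Python fragment is a minimal,
    non-normative sketch: the limits -8 and 7 are placeholders assumed only
    for illustration (a real implementation reports its own
    MIN_PROGRAM_TEXEL_OFFSET_EXT and MAX_PROGRAM_TEXEL_OFFSET_EXT), and the
    names texel_offset_is_legal and OFFSET_COMPONENTS are hypothetical.

```python
from typing import Sequence

# Placeholder limits assumed for this sketch; real values come from
# MIN_PROGRAM_TEXEL_OFFSET_EXT and MAX_PROGRAM_TEXEL_OFFSET_EXT.
MIN_TEXEL_OFFSET = -8
MAX_TEXEL_OFFSET = 7

# Number of leading offset components each <texTarget> may set non-zero.
OFFSET_COMPONENTS = {
    "1D": 1, "ARRAY1D": 1, "SHADOW1D": 1, "SHADOWARRAY1D": 1,
    "2D": 2, "RECT": 2, "ARRAY2D": 2,
    "SHADOW2D": 2, "SHADOWRECT": 2, "SHADOWARRAY2D": 2,
    "3D": 3,
    "CUBE": 0, "SHADOWCUBE": 0,  # no texel offsets for cube maps
}


def texel_offset_is_legal(target: str, offset: Sequence[int]) -> bool:
    """Return True if a texture instruction with this target and constant
    offset passes the load-time checks; missing components count as zero."""
    if not 1 <= len(offset) <= 3:
        raise ValueError("an offset has one to three components")
    padded = list(offset) + [0] * (3 - len(offset))
    allowed = OFFSET_COMPONENTS[target]  # KeyError for unlisted targets
    for index, value in enumerate(padded):
        if index >= allowed and value != 0:
            return False  # component not meaningful for this target
        if not MIN_TEXEL_OFFSET <= value <= MAX_TEXEL_OFFSET:
            return False  # outside the implementation's supported range
    return True
```

    For example, texel_offset_is_legal("2D", (1, -2)) passes, while a
    non-zero third component on a 2D target, or any non-zero component on a
    CUBE target, is rejected.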

                                               coordinates used
      texTarget         Texture Type           s t r  layer  shadow
      ----------------  ---------------------  -----  -----  ------
      1D                TEXTURE_1D             x - -    -      -
      2D                TEXTURE_2D             x y -    -      -
      3D                TEXTURE_3D             x y z    -      -
      CUBE              TEXTURE_CUBE_MAP       x y z    -      -
      RECT              TEXTURE_RECTANGLE_ARB  x y -    -      -
      ARRAY1D           TEXTURE_1D_ARRAY_EXT   x - -    y      -
      ARRAY2D           TEXTURE_2D_ARRAY_EXT   x y -    z      -
      SHADOW1D          TEXTURE_1D             x - -    -      z
      SHADOW2D          TEXTURE_2D             x y -    -      z
      SHADOWRECT        TEXTURE_RECTANGLE_ARB  x y -    -      z
      SHADOWCUBE        TEXTURE_CUBE_MAP       x y z    -      w
      SHADOWARRAY1D     TEXTURE_1D_ARRAY_EXT   x - -    y      z
      SHADOWARRAY2D     TEXTURE_2D_ARRAY_EXT   x y -    z      w
      BUFFER            TEXTURE_BUFFER_EXT     <not supported>

    Table X.17:  Texture types accessed for each <texTarget>, and coordinate
    mappings.  The "SHADOW" and "ARRAY" targets are special pseudo-targets
    described below.  The "coordinates used" column indicates the input
    values used for each coordinate of the texture lookup, the layer
    selector for array textures, and the reference value for texture
    comparisons.  Buffer textures are not supported by normal texture lookup
    functions, but are supported by TXF and TXQ, described below.

    Texture targets with "SHADOW" are used to access textures with a
    DEPTH_COMPONENT base internal format using depth comparisons (Section
    3.8.14).  Results of a texture access are undefined:

      * if a "SHADOW" target is used, and the corresponding texture has a
        base internal format other than DEPTH_COMPONENT or a
        TEXTURE_COMPARE_MODE of NONE, or

      * if a non-"SHADOW" target is used, and the corresponding texture has
        a base internal format of DEPTH_COMPONENT and a TEXTURE_COMPARE_MODE
        other than NONE.

    If the texture being accessed is not complete (or cube complete for
    cubemap textures), no texture access is performed and the result is
    undefined.
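
    Table X.17, together with the layer rounding and clamping rule given
    for <coord>, can be expressed as a lookup table plus a layer-selection
    helper.  The Python sketch below is illustrative only; the names
    COORD_MAP, extract_lookup_inputs, and select_layer are hypothetical,
    and rounding exact halves upward via floor(value + 0.5) is an assumption,
    since the text only says "nearest integer".

```python
import math
from typing import Dict, Optional, Sequence, Tuple

# Table X.17: source component of <coord> for (s, t, r, layer, shadow);
# None marks a "-" entry.  BUFFER is omitted since normal lookups reject it.
COORD_MAP: Dict[str, Tuple[Optional[str], ...]] = {
    "1D":            ("x", None, None, None, None),
    "2D":            ("x", "y", None, None, None),
    "3D":            ("x", "y", "z", None, None),
    "CUBE":          ("x", "y", "z", None, None),
    "RECT":          ("x", "y", None, None, None),
    "ARRAY1D":       ("x", None, None, "y", None),
    "ARRAY2D":       ("x", "y", None, "z", None),
    "SHADOW1D":      ("x", None, None, None, "z"),
    "SHADOW2D":      ("x", "y", None, None, "z"),
    "SHADOWRECT":    ("x", "y", None, None, "z"),
    "SHADOWCUBE":    ("x", "y", "z", None, "w"),
    "SHADOWARRAY1D": ("x", None, None, "y", "z"),
    "SHADOWARRAY2D": ("x", "y", None, "z", "w"),
}

INPUT_NAMES = ("s", "t", "r", "layer", "shadow")


def extract_lookup_inputs(target: str,
                          coord: Sequence[float]) -> Dict[str, float]:
    """Pick the lookup inputs used by <target> out of a 4-component coord."""
    return {name: coord["xyzw".index(src)]
            for name, src in zip(INPUT_NAMES, COORD_MAP[target])
            if src is not None}


def select_layer(value: float, num_layers: int) -> int:
    """Round to the nearest integer, then clamp to [0, num_layers - 1]."""
    if num_layers < 1:
        raise ValueError("an array texture has at least one layer")
    return min(max(math.floor(value + 0.5), 0), num_layers - 1)
```

    With a SHADOWARRAY2D target, for instance, the z component selects the
    layer and w supplies the depth reference value, matching the last table
    row before BUFFER.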

    A program will fail to load if it attempts to sample from multiple
    texture targets (including the SHADOW pseudo-targets) on the same
    texture image unit.  For example, a program containing any two of the
    following instructions will fail to load:

      TEX out, coord, texture[0], 1D;
      TEX out, coord, texture[0], 2D;
      TEX out, coord, texture[0], ARRAY2D;
      TEX out, coord, texture[0], SHADOW2D;
      TEX out, coord, texture[0], 3D;

    Additionally, multiple texture targets for a single texture image unit
    may not be used at the same time by the GL.  The error INVALID_OPERATION
    is generated by Begin, RasterPos, or any command that performs an
    implicit Begin if an enabled program accesses one texture target for a
    texture unit while another enabled program or fixed-function fragment
    processing accesses a different texture target for the same texture
    image unit.

    Some texture instructions use standard methods to compute partial
    derivatives and/or the level-of-detail used to perform texture accesses.
    For fragment programs, the functions

      float_vec ComputePartialsX(float_vec coord);
      float_vec ComputePartialsY(float_vec coord);

    compute approximate component-wise partial derivatives of the
    floating-point vector <coord> relative to the X and Y coordinates,
    respectively.  For vertex and geometry programs, these functions always
    return (0,0,0,0).  The function

      float ComputeLOD(float_vec ddx, float_vec ddy);

    maps partial derivative vectors <ddx> and <ddy> to ds/dx, dt/dx, dr/dx,
    ds/dy, dt/dy, and dr/dy and computes lambda_base(x,y) according to
    equation 3.18.
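
    ComputeLOD() can be illustrated with the OpenGL 2.0 formulation from
    Section 3.8.8, where lambda_base = log2(rho) and rho is the larger of
    the lengths of the x and y derivative vectors after s, t, and r are
    scaled to texel units.  The Python sketch below is non-normative: the
    name compute_lod is hypothetical, the texture size is passed explicitly
    (the GL function takes it from the bound texture), and returning
    -infinity for a zero footprint is a choice made only for this sketch.

```python
import math
from typing import Sequence


def compute_lod(ddx: Sequence[float], ddy: Sequence[float],
                size: Sequence[float]) -> float:
    """Model of ComputeLOD(): lambda_base = log2(rho).

    ddx = (ds/dx, dt/dx, dr/dx) and ddy = (ds/dy, dt/dy, dr/dy).  size holds
    the (width, height, depth) factors that convert s, t, r to texel units.
    """
    def scaled_length(d: Sequence[float]) -> float:
        # Length of one derivative vector, measured in texels.
        return math.sqrt(sum((dc * sc) ** 2 for dc, sc in zip(d, size)))

    rho = max(scaled_length(ddx), scaled_length(ddy))
    if rho == 0.0:
        return -math.inf  # no footprint: maximal magnification
    return math.log2(rho)
```

    A footprint of one texel per pixel on a 256x256 texture gives a level of
    detail of 0.0; stretching the x derivative to four texels gives 2.0.
    Implementations are allowed to approximate rho, so real hardware results
    may differ slightly.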

    The TXF instruction provides the ability to extract a single texel from
    a specified texture image using the function

      result_t_vec TexelFetch(int_vec coord, int_vec offset);

    The extracted texel is converted to an (R,G,B,A) vector according to
    Table 3.21.  The result vector is interpreted as floating-point, signed
    integer, or unsigned integer, according to the data type modifier of the
    instruction.  If the internal format of the texture is not compatible
    with the instruction's data type modifier, the extracted texel value is
    undefined.

    <coord> is a four-component signed integer vector used to identify the
    single texel accessed.  The (i,j,k) coordinates of the texel and the
    layer used for array textures are extracted according to Table X.18.
    The level of detail accessed is obtained by adding the w component of
    <coord> to the base level (level_base).  <offset> is a constant
    3-component signed integer vector added to the texel coordinates prior
    to the texel fetch as described above.  In addition to the restrictions
    described above, non-zero offset components are also not supported for
    BUFFER targets.

    The texture used by TexelFetch() is specified by the image unit and
    target parameters provided in the instruction, as for TextureSample()
    above.  Single texel fetches cannot perform depth comparisons or access
    cubemaps.  If a program contains a TXF instruction specifying one of the
    "SHADOW" or "CUBE" targets, it will fail to load.

                                   coordinates used
      texTarget         supported  i j k  layer  lod
      ----------------  ---------  -----  -----  ---
      1D                yes        x - -    -     w
      2D                yes        x y -    -     w
      3D                yes        x y z    -     w
      CUBE              no         - - -    -     -
      RECT              yes        x y -    -     w
      ARRAY1D           yes        x - -    y     w
      ARRAY2D           yes        x y -    z     w
      SHADOW1D          no         - - -    -     -
      SHADOW2D          no         - - -    -     -
      SHADOWRECT        no         - - -    -     -
      SHADOWCUBE        no         - - -    -     -
      SHADOWARRAY1D     no         - - -    -     -
      SHADOWARRAY2D     no         - - -    -     -
      BUFFER            yes        x - -    -     -

    Table X.18, Mappings of texel fetch coordinates to texel location.

    Single-texel fetches do not support LOD clamping or any texture wrap
    mode, and require a mipmapped minification filter to access any level of
    detail other than the base level.  The results of the texel fetch are
    undefined:

      * if the computed LOD is less than the texture's base level
        (level_base) or greater than the maximum level (level_max),

      * if the computed LOD is not the texture's base level and the
        texture's minification filter is NEAREST or LINEAR,

      * if the layer specified for array textures is negative or greater
        than or equal to the number of layers in the array texture,

      * if the texel at (i,j,k) coordinates refers to a border texel outside
        the defined extents of the specified LOD, where

          i < -b_s, j < -b_s, k < -b_s,
          i >= w_s - b_s, j >= h_s - b_s, or k >= d_s - b_s,

        where the size parameters (w_s, h_s, d_s, and b_s) refer to the
        width, height, depth, and border size of the image, as in equations
        3.15, 3.16, and 3.17, or

      * if the texture being accessed is not complete (or cube complete for
        cubemaps).


    Section 2.X.5, Program Flow Control

    In addition to basic arithmetic, logical, and texture instructions, a
    number of flow control instructions are provided, which are described in
    detail in Section 2.X.8.
    Programs can contain several types of instruction blocks:
    IF/ELSE/ENDIF blocks, REP/ENDREP blocks, and subroutine blocks.
    IF/ELSE/ENDIF blocks are a set of instructions beginning with an "IF"
    instruction, ending with an "ENDIF" instruction, and possibly containing
    an optional "ELSE" instruction.  REP/ENDREP blocks are a set of
    instructions beginning with a "REP" instruction and ending with an
    "ENDREP" instruction.  Subroutine blocks begin with an instruction label
    identifying the name of the subroutine and end just before the next
    instruction label or the end of the program.  Examples include the
    following:

      MOVC  CC, R0;
      IF    GT.x;
        MOV   R0, R1;         # executes if R0.x > 0
      ELSE;
        MOV   R0, R2;         # executes if R0.x <= 0
      ENDIF;

      REP repCount;
        ADD   R0, R0, R1;
      ENDREP;

      square:                 # subroutine to compute R0^2
        MUL   R0, R0, R0;
        RET;
      main:
        MOV   R0, 9.0;
        CAL   square;         # compute 9.0^2 in R0

    IF/ELSE/ENDIF and REP/ENDREP blocks may be nested inside each other, and
    inside subroutines.  In all cases, each instruction block must be
    terminated with the appropriate instruction (ENDIF for IF, ENDREP for
    REP).  Nested instruction blocks must be wholly contained within a block
    -- if a REP instruction is found between an IF and ELSE instruction, the
    corresponding ENDREP must also be present between the IF and ELSE.
    Subroutines may not be nested inside IF/ELSE/ENDIF or REP/ENDREP blocks,
    or inside other subroutines.  A program will fail to load if any
    instruction block is terminated by an incorrect instruction, is not
    terminated before the block containing it, or contains an instruction
    label.

    IF/ELSE/ENDIF blocks evaluate a condition to determine which
    instructions to execute.  If the condition is true, all instructions
    between the IF and ELSE are executed.
    If the condition is false, all instructions between the ELSE and ENDIF
    are executed.  The ELSE instruction is optional.  If the ELSE is
    omitted, all instructions between the IF and ENDIF are executed if the
    condition is true, or skipped if the condition is false.  A limited
    amount of nesting is supported -- a program will fail to load if an IF
    instruction is nested inside MAX_PROGRAM_IF_DEPTH_NV or more
    IF/ELSE/ENDIF blocks.

    REP/ENDREP blocks are used to execute a sequence of instructions
    multiple times.  The REP instruction includes an optional scalar operand
    to specify a loop count indicating the number of times the block of
    instructions should be repeated.  If the loop count is omitted, the
    contents of a REP/ENDREP block will be repeated indefinitely until the
    loop is explicitly terminated.  A limited amount of nesting is supported
    -- a program will fail to load if a REP instruction is nested inside
    MAX_PROGRAM_LOOP_DEPTH_NV or more REP/ENDREP blocks.

    Within a REP/ENDREP block, the CONT instruction can be used to terminate
    the current iteration of the loop by effectively jumping to the ENDREP
    instruction.  The BRK instruction can be used to terminate the entire
    loop by effectively jumping to the instruction immediately following the
    ENDREP instruction.  If CONT and BRK instructions are found inside
    multiply nested REP/ENDREP blocks, they apply to the innermost block.  A
    program will fail to load if it includes a CONT or BRK instruction that
    is not contained inside a REP/ENDREP block.

    A REP/ENDREP block without a specified loop count can result in an
    infinite loop.  To prevent obvious infinite loops, a program will fail
    to load if it contains a REP/ENDREP block that contains neither a BRK
    instruction at the current nesting level nor a RET instruction at any
    nesting level.
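
    The run-time behavior of a single REP/ENDREP block, with CONT and BRK
    applied to that block and the loop counter handled at ENDREP as
    described in Section 2.X.8, can be modeled in a few lines.  The Python
    sketch below is illustrative only: run_rep and its callback protocol are
    hypothetical, only positive loop counts are modeled, and safety_limit is
    a sketch-only guard because a REP without a count can run indefinitely.

```python
from typing import Callable, Optional


def run_rep(body: Callable[[int], Optional[str]],
            loop_count: Optional[int] = None,
            safety_limit: int = 10_000) -> int:
    """Model one REP/ENDREP block and return how many iterations started.

    body(i) runs iteration i and returns "BRK" (jump past ENDREP), "CONT"
    (jump to ENDREP), or None (reach ENDREP normally).
    """
    if loop_count is not None and loop_count <= 0:
        raise ValueError("this sketch only models positive loop counts")
    counter = loop_count
    started = 0
    while True:
        if started >= safety_limit:
            raise RuntimeError("iteration limit reached")
        started += 1
        if body(started - 1) == "BRK":
            break  # BRK: continue after the ENDREP of this block
        # CONT and normal completion both arrive at ENDREP.
        if counter is not None:
            counter -= 1  # ENDREP decrements the loop counter
            if counter <= 0:
                break  # fall through past ENDREP
        # Without a loop count, ENDREP always jumps back after REP.
    return started
```

    For example, a loop count of 3 runs the body three times, and a
    count-less loop whose body issues BRK on iteration 6 starts seven
    iterations.  Load-time checks (nesting limits, the required BRK or RET)
    are not modeled.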

    Subroutines are supported via the CAL and RET instructions.  A
    subroutine block is identified by an instruction label, which can be any
    valid identifier according to the <instLabel> grammar rule.  The CAL
    instruction identifies a subroutine name to call according to the
    <instTarget> grammar rule.  Instruction labels used in CAL instructions
    do not need to be defined in the program text that precedes the
    instruction, but a program will fail to load if it includes a CAL
    instruction that references an instruction label that is not defined
    anywhere in the program.  When a CAL instruction is executed, it
    transfers control to the instruction immediately following the
    specified instruction label.  Subsequent instructions in that subroutine
    are executed until a RET instruction is executed, or until program
    execution reaches another instruction label or the end of the program
    text.  After the subroutine finishes, execution continues with the
    instruction immediately following the CAL instruction.  When a RET
    instruction is issued, it will break out of any IF/ELSE/ENDIF or
    REP/ENDREP blocks that contain it.

    Subroutines may call other subroutines before completing, up to an
    implementation-dependent maximum depth of MAX_PROGRAM_CALL_DEPTH_NV
    calls.  Subroutines may call any subroutine in the program, including
    themselves, as long as the call depth limit is obeyed.  Issuing a CAL
    instruction while MAX_PROGRAM_CALL_DEPTH_NV subroutines have not
    completed has undefined results, including possible program
    termination.

    Several flow control instructions include condition code tests.  The IF
    instruction requires a condition test to determine what instructions
    are executed.  The CONT, BRK, CAL, and RET instructions have an optional
    condition code test; if the test fails, the instructions are not
    executed.
    Condition code tests are specified by the <ccTest> grammar rule.  The
    test is evaluated like the condition code write mask (section 2.X.4.3),
    and passes if and only if any of the four components passes.

    If an instruction label named "main" is specified, GPU program execution
    begins with the instruction immediately following that label.
    Otherwise, it begins with the first instruction of the program.
    Instructions are executed in sequence until either a RET instruction is
    issued in the main subroutine or the end of the program text is reached.


    Section 2.X.6, Program Options

    Programs may specify a number of options to indicate that one or more
    extended language features are used by the program.  All program options
    used by the program must be declared at the beginning of the program
    string.  Each program option specified in a program string will modify
    the syntactic or semantic rules used to interpret the program and the
    execution environment used to execute the program.  Features in program
    options not declared by the program are ignored, even if the option is
    otherwise supported by the GL.  Each option declaration consists of two
    tokens: the keyword "OPTION" and an identifier.

    The set of available options depends on the program type, and is
    enumerated in the specifications for each program type.  Some program
    types may not provide any options.


    Section 2.X.7, Program Declarations

    Programs may include a number of declaration statements to specify
    characteristics of the program.  Each declaration statement is followed
    by one or more arguments, separated by commas.

    The set of available declarations depends on the program type, and is
    enumerated in the specifications for each program type.  Some program
    types may not provide declarations.


    Section 2.X.8, Program Instruction Set

    The following sections enumerate the set of instructions supported for
    GPU programs.

    Some instructions allow the use of one of the three basic data type
    modifiers (floating point, signed integer, and unsigned integer).  Unless
    otherwise mentioned:

      * the result and all of the operands will be interpreted according to
        the specified data type, and

      * if no data type modifier is specified, the instruction will operate
        as though a floating-point modifier ("F") were specified.

    Some instructions will override one or both of these rules.


    Section 2.X.8.Z, ABS: Absolute Value

    The ABS instruction performs a component-wise absolute value operation
    on the single operand to yield a result vector.

      tmp = VectorLoad(op0);
      result.x = abs(tmp.x);
      result.y = abs(tmp.y);
      result.z = abs(tmp.z);
      result.w = abs(tmp.w);

    ABS supports all three data type modifiers.  Taking the absolute value
    of an unsigned integer is not a useful operation, but is not illegal.


    Section 2.X.8.Z, ADD: Add

    The ADD instruction performs a component-wise add of the two operands to
    yield a result vector.

      tmp0 = VectorLoad(op0);
      tmp1 = VectorLoad(op1);
      result.x = tmp0.x + tmp1.x;
      result.y = tmp0.y + tmp1.y;
      result.z = tmp0.z + tmp1.z;
      result.w = tmp0.w + tmp1.w;

    ADD supports all three data type modifiers.


    Section 2.X.8.Z, AND: Bitwise AND

    The AND instruction performs a bitwise AND operation on the components
    of the two source vectors to yield a result vector.

      tmp0 = VectorLoad(op0);
      tmp1 = VectorLoad(op1);
      result.x = tmp0.x & tmp1.x;
      result.y = tmp0.y & tmp1.y;
      result.z = tmp0.z & tmp1.z;
      result.w = tmp0.w & tmp1.w;

    AND supports only signed and unsigned integer data type modifiers.  If
    no type modifier is specified, both operands and the result are treated
    as signed integers.


    Section 2.X.8.Z, BRK: Break out of Loop Instruction

    The BRK instruction conditionally transfers control to the instruction
    immediately following the ENDREP instruction of the innermost
    REP/ENDREP block containing the BRK.  A BRK instruction has no effect if
    the condition code test evaluates to FALSE.

    The following pseudocode describes the operation of the instruction:

      if (TestCC(cc.c***) || TestCC(cc.*c**) ||
          TestCC(cc.**c*) || TestCC(cc.***c)) {
        continue execution at instruction following the corresponding ENDREP;
      }


    Section 2.X.8.Z, CAL: Subroutine Call

    The CAL instruction conditionally transfers control to the instruction
    following the label specified in the instruction.  It also pushes a
    reference to the instruction immediately following the CAL instruction
    onto the call stack, where execution will continue after executing the
    matching RET instruction.
    The following pseudocode describes the operation of the instruction:

      if (TestCC(cc.c***) || TestCC(cc.*c**) ||
          TestCC(cc.**c*) || TestCC(cc.***c)) {
        if (callStackDepth >= MAX_PROGRAM_CALL_DEPTH_NV) {
          // undefined results
        } else {
          callStack[callStackDepth] = nextInstruction;
          callStackDepth++;
        }
        // continue execution at instruction following <instTarget>
      } else {
        // do nothing
      }

    In the pseudocode, <instTarget> is the label specified in the
    instruction matching the <branchLabel> grammar rule, <callStackDepth> is
    the current depth of the call stack, <callStack> is an array holding the
    call stack, and <nextInstruction> is a reference to the instruction
    immediately following the CAL instruction in the program string.

    If the call stack overflows, the results of the CAL instruction are
    undefined, and can result in immediate program termination.

    An instruction label signifies the beginning of a new subroutine.
    Subroutines may not nest or overlap.  If a CAL instruction is executed
    and subsequent program execution reaches an instruction label before a
    corresponding RET instruction is executed, the subroutine call returns
    immediately, as though an unconditional RET instruction were inserted
    immediately before the instruction label.

    (Note: On previous vertex program extensions -- NV_vertex_program2 and
    NV_vertex_program3 -- instruction labels were also used as targets for
    branch (BRA) instructions.  This unstructured branching functionality
    has been replaced with the structured branching constructs found in this
    instruction set.)


    Section 2.X.8.Z, CEIL: Ceiling

    The CEIL instruction loads a single vector operand and performs a
    component-wise ceiling operation to generate a result vector.

      tmp = VectorLoad(op0);
      iresult.x = ceil(tmp.x);
      iresult.y = ceil(tmp.y);
      iresult.z = ceil(tmp.z);
      iresult.w = ceil(tmp.w);

    The ceiling operation returns the nearest integer greater than or equal
    to the operand.  For example, ceil(-1.7) = -1.0, ceil(+1.0) = +1.0, and
    ceil(+3.7) = +4.0.

    CEIL supports all three data type modifiers.  The single operand is
    always treated as a floating-point vector, but the result is written as
    a floating-point value, a signed integer, or an unsigned integer, as
    specified by the data type modifier.  If a value is not exactly
    representable using the data type of the result (e.g., an overflow or
    writing a negative value to an unsigned integer), the result is
    undefined.


    Section 2.X.8.Z, CMP: Compare

    The CMP instruction performs a component-wise comparison of the first
    operand against zero, and copies the values of the second or third
    operands based on the results of the compare.

      tmp0 = VectorLoad(op0);
      tmp1 = VectorLoad(op1);
      tmp2 = VectorLoad(op2);
      result.x = (tmp0.x < 0) ? tmp1.x : tmp2.x;
      result.y = (tmp0.y < 0) ? tmp1.y : tmp2.y;
      result.z = (tmp0.z < 0) ? tmp1.z : tmp2.z;
      result.w = (tmp0.w < 0) ? tmp1.w : tmp2.w;

    CMP supports all three data type modifiers.  CMP with an unsigned data
    type modifier is not a useful operation, but is not illegal.


    Section 2.X.8.Z, CONT: Continue with Next Loop Iteration

    The CONT instruction conditionally transfers control to the ENDREP
    instruction of the innermost REP/ENDREP block containing the CONT.  A
    CONT instruction has no effect if the condition code test evaluates to
    FALSE.

    The following pseudocode describes the operation of the instruction:

      if (TestCC(cc.c***) || TestCC(cc.*c**) ||
          TestCC(cc.**c*) || TestCC(cc.***c)) {
        continue execution at the corresponding ENDREP;
      }


    Section 2.X.8.Z, COS: Cosine with Reduction to [-PI,PI]

    The COS instruction approximates the trigonometric cosine of the angle
    specified by the scalar operand and replicates it to all four components
    of the result vector.  The angle is specified in radians and does not
    have to be in the range [-PI,PI].

      tmp = ScalarLoad(op0);
      result.x = ApproxCosine(tmp);
      result.y = ApproxCosine(tmp);
      result.z = ApproxCosine(tmp);
      result.w = ApproxCosine(tmp);

    COS supports only floating-point data type modifiers.


    Section 2.X.8.Z, DDX: Partial Derivative Relative to X

    The DDX instruction computes approximate partial derivatives of a vector
    operand with respect to the X window coordinate, and is only available
    to fragment programs.  See the NV_fragment_program4 specification for
    more details.


    Section 2.X.8.Z, DDY: Partial Derivative Relative to Y

    The DDY instruction computes approximate partial derivatives of a vector
    operand with respect to the Y window coordinate, and is only available
    to fragment programs.  See the NV_fragment_program4 specification for
    more details.


    Section 2.X.8.Z, DIV: Divide Vector Components by Scalar

    The DIV instruction performs a component-wise divide of the first vector
    operand by the second scalar operand to produce a 4-component result
    vector.

      tmp0 = VectorLoad(op0);
      tmp1 = ScalarLoad(op1);
      result.x = tmp0.x / tmp1;
      result.y = tmp0.y / tmp1;
      result.z = tmp0.z / tmp1;
      result.w = tmp0.w / tmp1;

    DIV supports all three data type modifiers.
    For floating-point division, this instruction is not guaranteed to
    produce results identical to an RCP/MUL instruction sequence.

    The results of a signed or unsigned integer division by zero are
    undefined.


    Section 2.X.8.Z, DP2: 2-Component Dot Product

    The DP2 instruction computes a two-component dot product of the two
    operands (using the first two components) and replicates the dot
    product to all four components of the result vector.

      tmp0 = VectorLoad(op0);
      tmp1 = VectorLoad(op1);
      dot = (tmp0.x * tmp1.x) + (tmp0.y * tmp1.y);
      result.x = dot;
      result.y = dot;
      result.z = dot;
      result.w = dot;

    DP2 supports only floating-point data type modifiers.


    Section 2.X.8.Z, DP2A: 2-Component Dot Product with Scalar Add

    The DP2A instruction computes a two-component dot product of the two
    operands (using the first two components), adds the x component of the
    third operand, and replicates the result to all four components of the
    result vector.

      tmp0 = VectorLoad(op0);
      tmp1 = VectorLoad(op1);
      tmp2 = VectorLoad(op2);
      dot = (tmp0.x * tmp1.x) + (tmp0.y * tmp1.y) + tmp2.x;
      result.x = dot;
      result.y = dot;
      result.z = dot;
      result.w = dot;

    DP2A supports only floating-point data type modifiers.


    Section 2.X.8.Z, DP3: 3-Component Dot Product

    The DP3 instruction computes a three-component dot product of the two
    operands (using the x, y, and z components) and replicates the dot
    product to all four components of the result vector.

      tmp0 = VectorLoad(op0);
      tmp1 = VectorLoad(op1);
      dot = (tmp0.x * tmp1.x) + (tmp0.y * tmp1.y) +
            (tmp0.z * tmp1.z);
      result.x = dot;
      result.y = dot;
      result.z = dot;
      result.w = dot;

    DP3 supports only floating-point data type modifiers.
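
    The DP2, DP2A, and DP3 pseudocode above can be mirrored directly in
    Python to check hand calculations.  This is a non-normative sketch
    under stated assumptions: operands are taken as already produced by
    VectorLoad() (swizzles, negation, and absolute-value modifiers are not
    modeled), Python's double-precision arithmetic stands in for the GPU's
    floating-point unit, and the helper names are hypothetical.

```python
from typing import Sequence, Tuple

Vec4 = Tuple[float, float, float, float]


def _replicate(value: float) -> Vec4:
    # Each dot-product instruction writes the same scalar to x, y, z, and w.
    return (value, value, value, value)


def dp2(a: Sequence[float], b: Sequence[float]) -> Vec4:
    return _replicate(a[0] * b[0] + a[1] * b[1])


def dp2a(a: Sequence[float], b: Sequence[float],
         c: Sequence[float]) -> Vec4:
    # Only the x component of the third operand is added.
    return _replicate(a[0] * b[0] + a[1] * b[1] + c[0])


def dp3(a: Sequence[float], b: Sequence[float]) -> Vec4:
    return _replicate(a[0] * b[0] + a[1] * b[1] + a[2] * b[2])
```

    With a = (1, 2, 3, 4) and b = (5, 6, 7, 8), DP2 yields 17 in every
    component, DP3 yields 38, and DP2A with a third operand whose x is 10
    yields 27, ignoring that operand's y, z, and w.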


    Section 2.X.8.Z, DP4: 4-Component Dot Product

    The DP4 instruction computes a four-component dot product of the two
    operands and replicates the dot product to all four components of the
    result vector.

      tmp0 = VectorLoad(op0);
      tmp1 = VectorLoad(op1);
      dot = (tmp0.x * tmp1.x) + (tmp0.y * tmp1.y) +
            (tmp0.z * tmp1.z) + (tmp0.w * tmp1.w);
      result.x = dot;
      result.y = dot;
      result.z = dot;
      result.w = dot;

    DP4 supports only floating-point data type modifiers.


    Section 2.X.8.Z, DPH: Homogeneous Dot Product

    The DPH instruction computes a three-component dot product of the two
    operands (using the x, y, and z components), adds the w component of the
    second operand, and replicates the sum to all four components of the
    result vector.  This is equivalent to a four-component dot product where
    the w component of the first operand is forced to 1.0.

      tmp0 = VectorLoad(op0);
      tmp1 = VectorLoad(op1);
      dot = (tmp0.x * tmp1.x) + (tmp0.y * tmp1.y) +
            (tmp0.z * tmp1.z) + tmp1.w;
      result.x = dot;
      result.y = dot;
      result.z = dot;
      result.w = dot;

    DPH supports only floating-point data type modifiers.


    Section 2.X.8.Z, DST: Distance Vector

    The DST instruction computes a distance vector from two
    specially-formatted operands.  The first operand should be of the form
    [NA, d^2, d^2, NA] and the second operand should be of the form
    [NA, 1/d, NA, 1/d], where NA values are not relevant to the calculation
    and d is a vector length.  If both vectors satisfy these conditions, the
    result vector will be of the form [1.0, d, d^2, 1/d].

    The exact behavior is specified in the following pseudocode:

      tmp0 = VectorLoad(op0);
      tmp1 = VectorLoad(op1);
      result.x = 1.0;
      result.y = tmp0.y * tmp1.y;
      result.z = tmp0.z;
      result.w = tmp1.w;

    Given an arbitrary vector, d^2 can be obtained using the DP3 instruction
    (using the same vector for both operands) and 1/d can be obtained from d^2
    using the RSQ instruction.

    This distance vector is useful for per-vertex light attenuation
    calculations: a DP3 operation using the distance vector and a vector of
    attenuation constants (k0, k1, k2) as operands yields k0 + k1*d + k2*d^2,
    the denominator of the standard attenuation factor; an RCP instruction
    then produces the attenuation factor itself.

    DST supports only floating-point data type modifiers.


    Section 2.X.8.Z, ELSE: Start of If Test Else Block

    The ELSE instruction signifies the end of the "execute if true" portion of
    an IF/ELSE/ENDIF block and the beginning of the "execute if false"
    portion.

    If the condition evaluated at the IF instruction was TRUE, a program that
    reaches the ELSE instruction has completed the entire "execute if true"
    portion of the IF/ELSE/ENDIF block, and execution continues at the
    corresponding ENDIF instruction.

    If the condition evaluated at the IF instruction was FALSE, program
    execution skips over the entire "execute if true" portion of the
    IF/ELSE/ENDIF block, including the ELSE instruction itself, and continues
    at the instruction following the ELSE.


    Section 2.X.8.Z, EMIT: Emit Vertex

    The EMIT instruction emits a new vertex to be added to the current output
    primitive generated by a geometry program, and is only available to
    geometry programs. See the NV_geometry_program4 specification for more
    details.


    Section 2.X.8.Z, ENDIF: End of If Test Block

    The ENDIF instruction signifies the end of an IF/ELSE/ENDIF block. It has
    no other effect on program execution.


    Section 2.X.8.Z, ENDPRIM: End of Primitive

    A geometry program can emit multiple primitives in a single invocation.
    The ENDPRIM instruction is used in a geometry program to signify the end
    of the current primitive and the beginning of a new primitive of the same
    type. It is only available to geometry programs. See the
    NV_geometry_program4 specification for more details.


    Section 2.X.8.Z, ENDREP: End of Repeat Block

    The ENDREP instruction specifies the end of a REP block.

    When used in conjunction with a REP instruction with a loop count, ENDREP
    decrements the loop counter. If the decremented loop counter is greater
    than zero, ENDREP transfers control to the instruction immediately after
    the corresponding REP instruction. If the loop counter is less than or
    equal to zero, execution continues at the instruction following the
    ENDREP instruction. When used in conjunction with a REP instruction
    without a loop count, ENDREP always transfers control to the instruction
    immediately after the REP instruction.

      if (REP instruction includes a loop count) {
        LoopCount--;
        if (LoopCount > 0) {
          continue execution at instruction following corresponding REP
            instruction;
        }
      } else {
        continue execution at instruction following corresponding REP
          instruction;
      }


    Section 2.X.8.Z, EX2: Exponential Base 2

    The EX2 instruction approximates 2 raised to the power of the scalar
    operand and replicates the approximation to all four components of the
    result vector.

      tmp = ScalarLoad(op0);
      result.x = Approx2ToX(tmp);
      result.y = Approx2ToX(tmp);
      result.z = Approx2ToX(tmp);
      result.w = Approx2ToX(tmp);

    EX2 supports only floating-point data type modifiers.


    Section 2.X.8.Z, FLR: Floor

    The FLR instruction loads a single vector operand and performs a
    component-wise floor operation to generate a result vector.

      tmp = VectorLoad(op0);
      result.x = floor(tmp.x);
      result.y = floor(tmp.y);
      result.z = floor(tmp.z);
      result.w = floor(tmp.w);

    The floor operation returns the nearest integer less than or equal to the
    operand. For example, floor(-1.7) = -2.0, floor(+1.0) = +1.0, and
    floor(+3.7) = +3.0.

    FLR supports all three data type modifiers. The single operand is always
    treated as a floating-point value, but the result is written as a
    floating-point value, a signed integer, or an unsigned integer, as
    specified by the data type modifier. If a value is not exactly
    representable using the data type of the result (e.g., an overflow or
    writing a negative value to an unsigned integer), the result is undefined.


    Section 2.X.8.Z, FRC: Fraction

    The FRC instruction extracts the fractional portion of each component of
    the operand to generate a result vector. The fractional portion of a
    component is defined as the result after subtracting off the floor of the
    component (see FLR), and is always in the range [0.0, 1.0).

    For negative values, the fractional portion is NOT the number written to
    the right of the decimal point -- the fractional portion of -1.7 is not
    0.7 -- it is 0.3. The value 0.3 is produced by subtracting the floor of
    -1.7 (-2.0) from -1.7.

      tmp = VectorLoad(op0);
      result.x = fraction(tmp.x);
      result.y = fraction(tmp.y);
      result.z = fraction(tmp.z);
      result.w = fraction(tmp.w);

    FRC supports only floating-point data type modifiers.


    Section 2.X.8.Z, I2F: Integer to Float

    The I2F instruction converts the components of an integer vector operand
    to floating-point to produce a floating-point result vector.

      tmp = VectorLoad(op0);
      result.x = (float) tmp.x;
      result.y = (float) tmp.y;
      result.z = (float) tmp.z;
      result.w = (float) tmp.w;

    I2F supports only signed and unsigned integer data type modifiers. The
    single operand is interpreted according to the data type modifier. If no
    data type modifier is specified, the operand is treated as a signed
    integer vector. The result is always written as a float.


    Section 2.X.8.Z, IF: Start of If Test Block

    The IF instruction performs a condition code test to determine what
    instructions inside an IF/ELSE/ENDIF block are executed. If the test
    passes, execution continues at the instruction immediately following the
    IF instruction. If the test fails, IF transfers control to the
    instruction immediately following the corresponding ELSE instruction (if
    present) or the ENDIF instruction (if no ELSE is present).

    Implementations may have a limited ability to nest IF blocks in any
    subroutine. If the number of IF/ENDIF blocks nested inside each other is
    MAX_PROGRAM_IF_DEPTH_NV or higher, a program will fail to compile.

      // Evaluate the condition. If the condition is true, continue at the
      // next instruction. Otherwise, continue at the instruction following
      // the corresponding ELSE, or the corresponding ENDIF if there is no ELSE.
      if (TestCC(cc.c***) || TestCC(cc.*c**) ||
          TestCC(cc.**c*) || TestCC(cc.***c)) {
        continue execution at the next instruction;
      } else if (IF block contains an ELSE statement) {
        continue execution at instruction following corresponding ELSE;
      } else {
        continue execution at instruction following corresponding ENDIF;
      }

    (Note: Unlike the NV_fragment_program2 extension, there is no run-time
    limit on the maximum overall depth of IF/ENDIF nesting. As long as each
    individual subroutine of the program obeys the static nesting limits,
    there will be no run-time errors in the program.
    With the NV_fragment_program2 extension, a program could terminate
    abnormally if it called a subroutine inside a very deeply nested set of
    IF/ENDIF blocks and the called subroutine also contained deeply nested
    IF/ENDIF blocks. Such an error could occur even if neither subroutine
    exceeded static limits.)


    Section 2.X.8.Z, KIL: Kill Fragment

    The KIL instruction conditionally kills a fragment, and is only available
    to fragment programs. See the NV_fragment_program4 specification for more
    details.


    Section 2.X.8.Z, LG2: Logarithm Base 2

    The LG2 instruction approximates the base 2 logarithm of the scalar
    operand and replicates it to all four components of the result vector.

      tmp = ScalarLoad(op0);
      result.x = ApproxLog2(tmp);
      result.y = ApproxLog2(tmp);
      result.z = ApproxLog2(tmp);
      result.w = ApproxLog2(tmp);

    If the scalar operand is zero or negative, the result is undefined.

    LG2 supports only floating-point data type modifiers.


    Section 2.X.8.Z, LIT: Compute Lighting Coefficients

    The LIT instruction accelerates lighting computations by computing
    lighting coefficients for ambient, diffuse, and specular light
    contributions. The "x" component of the single operand is assumed to hold
    a diffuse dot product (n dot VP_pli, as in the vertex lighting equations
    in Section 2.13.1). The "y" component of the operand is assumed to hold a
    specular dot product (n dot h_i). The "w" component of the operand is
    assumed to hold the specular exponent of the material (s_rm), and is
    clamped to the range (-128, +128) exclusive.

    The "x" component of the result vector receives the value that should be
    multiplied by the ambient light/material product (always 1.0).
    The "y" component of the result vector receives the value that should be
    multiplied by the diffuse light/material product (n dot VP_pli). The "z"
    component of the result vector receives the value that should be
    multiplied by the specular light/material product (f_i * (n dot h_i)^s_rm).
    The "w" component of the result is the constant 1.0.

    Negative diffuse and specular dot products are clamped to 0.0, as is done
    in the standard per-vertex lighting operations. In addition, if the
    diffuse dot product is zero or negative, the specular coefficient is
    forced to zero.

      tmp = VectorLoad(op0);
      if (tmp.x < 0) tmp.x = 0;
      if (tmp.y < 0) tmp.y = 0;
      if (tmp.w < -(128.0-epsilon)) tmp.w = -(128.0-epsilon);
      else if (tmp.w > 128.0-epsilon) tmp.w = 128.0-epsilon;
      result.x = 1.0;
      result.y = tmp.x;
      result.z = (tmp.x > 0) ? RoughApproxPower(tmp.y, tmp.w) : 0.0;
      result.w = 1.0;

    Since 0^0 is defined to be 1, RoughApproxPower(0.0, 0.0) will produce 1.0.

    LIT supports only floating-point data type modifiers.


    Section 2.X.8.Z, LRP: Linear Interpolation

    The LRP instruction performs a component-wise linear interpolation between
    the second and third operands using the first operand as the blend
    factor: a blend factor of 1.0 selects the second operand, and a blend
    factor of 0.0 selects the third.

      tmp0 = VectorLoad(op0);
      tmp1 = VectorLoad(op1);
      tmp2 = VectorLoad(op2);
      result.x = tmp0.x * tmp1.x + (1 - tmp0.x) * tmp2.x;
      result.y = tmp0.y * tmp1.y + (1 - tmp0.y) * tmp2.y;
      result.z = tmp0.z * tmp1.z + (1 - tmp0.z) * tmp2.z;
      result.w = tmp0.w * tmp1.w + (1 - tmp0.w) * tmp2.w;

    LRP supports only floating-point data type modifiers.


    Section 2.X.8.Z, MAD: Multiply and Add

    The MAD instruction performs a component-wise multiply of the first two
    operands, and then does a component-wise add of the product to the third
    operand to yield a result vector.

      tmp0 = VectorLoad(op0);
      tmp1 = VectorLoad(op1);
      tmp2 = VectorLoad(op2);
      result.x = tmp0.x * tmp1.x + tmp2.x;
      result.y = tmp0.y * tmp1.y + tmp2.y;
      result.z = tmp0.z * tmp1.z + tmp2.z;
      result.w = tmp0.w * tmp1.w + tmp2.w;

    The multiplication and addition operations in this instruction are subject
    to the same rules as described for the MUL and ADD instructions.

    MAD supports all three data type modifiers.


    Section 2.X.8.Z, MAX: Maximum

    The MAX instruction computes component-wise maximums of the values in the
    two operands to yield a result vector.

      tmp0 = VectorLoad(op0);
      tmp1 = VectorLoad(op1);
      result.x = (tmp0.x > tmp1.x) ? tmp0.x : tmp1.x;
      result.y = (tmp0.y > tmp1.y) ? tmp0.y : tmp1.y;
      result.z = (tmp0.z > tmp1.z) ? tmp0.z : tmp1.z;
      result.w = (tmp0.w > tmp1.w) ? tmp0.w : tmp1.w;

    MAX supports all three data type modifiers.


    Section 2.X.8.Z, MIN: Minimum

    The MIN instruction computes component-wise minimums of the values in the
    two operands to yield a result vector.

      tmp0 = VectorLoad(op0);
      tmp1 = VectorLoad(op1);
      result.x = (tmp0.x > tmp1.x) ? tmp1.x : tmp0.x;
      result.y = (tmp0.y > tmp1.y) ? tmp1.y : tmp0.y;
      result.z = (tmp0.z > tmp1.z) ? tmp1.z : tmp0.z;
      result.w = (tmp0.w > tmp1.w) ? tmp1.w : tmp0.w;

    MIN supports all three data type modifiers.


    Section 2.X.8.Z, MOD: Modulus

    The MOD instruction computes, component-wise, the remainder of dividing
    the first (vector) operand by the second (scalar) operand to produce a
    4-component result vector.

      tmp0 = VectorLoad(op0);
      tmp1 = ScalarLoad(op1);
      result.x = tmp0.x % tmp1;
      result.y = tmp0.y % tmp1;
      result.z = tmp0.z % tmp1;
      result.w = tmp0.w % tmp1;

    MOD supports both signed and unsigned integer data type modifiers.
    If no data type modifier is specified, both operands and the result are
    treated as signed integers.

    A result component is undefined if the corresponding component of the
    first operand is negative or if the second operand is less than or equal
    to zero.


    Section 2.X.8.Z, MOV: Move

    The MOV instruction copies the value of the operand to yield a result
    vector.

      result = VectorLoad(op0);

    MOV supports all three data type modifiers.


    Section 2.X.8.Z, MUL: Multiply

    The MUL instruction performs a component-wise multiply of the two operands
    to yield a result vector.

      tmp0 = VectorLoad(op0);
      tmp1 = VectorLoad(op1);
      result.x = tmp0.x * tmp1.x;
      result.y = tmp0.y * tmp1.y;
      result.z = tmp0.z * tmp1.z;
      result.w = tmp0.w * tmp1.w;

    MUL supports all three data type modifiers. The MUL instruction
    additionally supports three special modifiers.

    The "S24" and "U24" modifiers specify "fast" signed or unsigned integer
    multiplies of 24-bit quantities, respectively. The results of such
    multiplies are undefined if either operand is outside the range
    [-2^23,+2^23-1] for S24 or [0,2^24-1] for U24. If "S24" or "U24" is
    specified, the data type is implied and normal data type modifiers may not
    be provided.

    The "HI" modifier specifies a 32-bit integer multiply that returns the 32
    most significant bits of the 64-bit product. Integer multiplies without
    the "HI" modifier return the 32 least significant bits of the product. If
    "HI" is specified, either of the "S" or "U" integer data type modifiers
    must also be specified.

    Note that if condition code updates are performed on integer multiplies,
    the overflow or carry flags are always cleared, even if the product
    overflowed.
    If it is necessary to determine whether the result of an integer multiply
    overflowed, the MUL.HI instruction may be used.


    Section 2.X.8.Z, NOT: Bitwise Not

    The NOT instruction performs a component-wise bitwise NOT operation on the
    source vector to produce a result vector.

      tmp = VectorLoad(op0);
      result.x = ~tmp.x;
      result.y = ~tmp.y;
      result.z = ~tmp.z;
      result.w = ~tmp.w;

    NOT supports only integer data type modifiers. If no type modifier is
    specified, the operand and the result are treated as signed integers.


    Section 2.X.8.Z, NRM: Normalize 3-Component Vector

    The NRM instruction normalizes the vector given by the x, y, and z
    components of the vector operand to produce the x, y, and z components of
    the result vector. The w component of the result is undefined.

      tmp = VectorLoad(op0);
      scale = ApproxRSQ(tmp.x * tmp.x + tmp.y * tmp.y + tmp.z * tmp.z);
      result.x = tmp.x * scale;
      result.y = tmp.y * scale;
      result.z = tmp.z * scale;
      result.w = undefined;

    NRM supports only floating-point data type modifiers.


    Section 2.X.8.Z, OR: Bitwise Or

    The OR instruction performs a bitwise OR operation on the components of
    the two source vectors to yield a result vector.

      tmp0 = VectorLoad(op0);
      tmp1 = VectorLoad(op1);
      result.x = tmp0.x | tmp1.x;
      result.y = tmp0.y | tmp1.y;
      result.z = tmp0.z | tmp1.z;
      result.w = tmp0.w | tmp1.w;

    OR supports only integer data type modifiers. If no type modifier is
    specified, both operands and the result are treated as signed integers.
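
    Because integer MUL clears the overflow and carry condition code flags,
    overflow has to be detected from the upper half of the product. The
    Python sketch below models 32-bit MUL (low bits), MUL.HI (high bits), and
    NOT, assuming two's-complement 32-bit registers. The helper names and the
    overflow test are illustrative, not defined by this specification: an
    unsigned product overflowed when the HI result is nonzero, and a signed
    product overflowed when the HI result is not the sign extension of the
    low result.

```python
MASK32 = 0xFFFFFFFF


def to_signed32(x):
    """Reinterpret the low 32 bits of x as a two's-complement integer."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x


def mul_lo(a, b, signed=True):
    """Plain integer MUL: the 32 least significant bits of the product."""
    p = (a * b) & MASK32
    return to_signed32(p) if signed else p


def mul_hi(a, b, signed=True):
    """MUL.HI: the 32 most significant bits of the 64-bit product."""
    p = (a * b) >> 32  # Python's >> on negative ints is an arithmetic shift
    return to_signed32(p) if signed else p & MASK32


def mul_overflowed(a, b, signed=True):
    """Detect overflow with MUL.HI, since MUL clears the overflow flag."""
    hi = mul_hi(a, b, signed)
    if not signed:
        return hi != 0
    lo = mul_lo(a, b, signed=True)
    # A signed product fits in 32 bits iff hi is the sign extension of lo.
    return hi != (-1 if lo < 0 else 0)


def not32(a, signed=True):
    """NOT: bitwise complement of a 32-bit value."""
    r = ~a & MASK32
    return to_signed32(r) if signed else r


print(mul_lo(0x10000, 0x10000, signed=False))  # 0 (low half wrapped)
print(mul_hi(0x10000, 0x10000, signed=False))  # 1
print(mul_overflowed(0x40000000, 2))            # True
print(not32(0))                                 # -1
```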


    Section 2.X.8.Z, PK2H: Pack Two 16-bit Floats

    The PK2H instruction converts the "x" and "y" components of the single
    floating-point vector operand into 16-bit floating-point format, packs the
    bit representation of these two floats into a 32-bit unsigned integer, and
    replicates that value to all four components of the result vector. The
    PK2H instruction can be reversed by the UP2H instruction below.

      tmp0 = VectorLoad(op0);
      /* RawBits() is the 16-bit encoding of its operand after conversion to
         16-bit floating-point format; the result combines the raw bits of
         tmp0.x and tmp0.y. */
      result.x = RawBits(tmp0.x) | (RawBits(tmp0.y) << 16);
      result.y = RawBits(tmp0.x) | (RawBits(tmp0.y) << 16);
      result.z = RawBits(tmp0.x) | (RawBits(tmp0.y) << 16);
      result.w = RawBits(tmp0.x) | (RawBits(tmp0.y) << 16);

    PK2H supports all three data type modifiers. The single operand is always
    treated as a floating-point value, but the result is written as a
    floating-point value, a signed integer, or an unsigned integer, as
    specified by the data type modifier. For integer results, the bits can be
    interpreted as described above. For floating-point result variables, the
    packed results do not constitute a meaningful floating-point variable and
    should only be used to feed future unpack instructions.

    A program will fail to load if it contains a PK2H instruction that writes
    its results to a variable declared as "SHORT".


    Section 2.X.8.Z, PK2US: Pack Two Floats as Unsigned 16-bit

    The PK2US instruction converts the "x" and "y" components of the single
    floating-point vector operand into a packed pair of 16-bit unsigned
    scalars. The scalars are represented in a bit pattern where all '0' bits
    correspond to 0.0 and all '1' bits correspond to 1.0. The bit
    representations of the two converted components are packed into a 32-bit
    unsigned integer, and that value is replicated to all four components of
    the result vector.
    The PK2US instruction can be reversed by the UP2US instruction below.

      tmp0 = VectorLoad(op0);
      if (tmp0.x < 0.0) tmp0.x = 0.0;
      if (tmp0.x > 1.0) tmp0.x = 1.0;
      if (tmp0.y < 0.0) tmp0.y = 0.0;
      if (tmp0.y > 1.0) tmp0.y = 1.0;
      us.x = round(65535.0 * tmp0.x); /* us is a ushort vector */
      us.y = round(65535.0 * tmp0.y);
      /* result obtained by combining raw bits of us. */
      result.x = ((us.x) | (us.y << 16));
      result.y = ((us.x) | (us.y << 16));
      result.z = ((us.x) | (us.y << 16));
      result.w = ((us.x) | (us.y << 16));

    PK2US supports all three data type modifiers. The single operand is
    always treated as a floating-point value, but the result is written as a
    floating-point value, a signed integer, or an unsigned integer, as
    specified by the data type modifier. For integer result variables, the
    bits can be interpreted as described above. For floating-point result
    variables, the packed results do not constitute a meaningful
    floating-point variable and should only be used to feed future unpack
    instructions.

    A program will fail to load if it contains a PK2US instruction that writes
    its results to a variable declared as "SHORT".


    Section 2.X.8.Z, PK4B: Pack Four Floats as Signed 8-bit

    The PK4B instruction converts the four components of the single
    floating-point vector operand into 8-bit signed quantities. The signed
    quantities are represented in a bit pattern where all '0' bits correspond
    to -128/127 and all '1' bits correspond to +127/127. The bit
    representations of the four converted components are packed into a 32-bit
    unsigned integer, and that value is replicated to all four components of
    the result vector. The PK4B instruction can be reversed by the UP4B
    instruction below.

      tmp0 = VectorLoad(op0);
      if (tmp0.x < -128/127) tmp0.x = -128/127;
      if (tmp0.y < -128/127) tmp0.y = -128/127;
      if (tmp0.z < -128/127) tmp0.z = -128/127;
      if (tmp0.w < -128/127) tmp0.w = -128/127;
      if (tmp0.x > +127/127) tmp0.x = +127/127;
      if (tmp0.y > +127/127) tmp0.y = +127/127;
      if (tmp0.z > +127/127) tmp0.z = +127/127;
      if (tmp0.w > +127/127) tmp0.w = +127/127;
      ub.x = round(127.0 * tmp0.x + 128.0); /* ub is a ubyte vector */
      ub.y = round(127.0 * tmp0.y + 128.0);
      ub.z = round(127.0 * tmp0.z + 128.0);
      ub.w = round(127.0 * tmp0.w + 128.0);
      /* result obtained by combining raw bits of ub. */
      result.x = ((ub.x) | (ub.y << 8) | (ub.z << 16) | (ub.w << 24));
      result.y = ((ub.x) | (ub.y << 8) | (ub.z << 16) | (ub.w << 24));
      result.z = ((ub.x) | (ub.y << 8) | (ub.z << 16) | (ub.w << 24));
      result.w = ((ub.x) | (ub.y << 8) | (ub.z << 16) | (ub.w << 24));

    PK4B supports all three data type modifiers. The single operand is always
    treated as a floating-point value, but the result is written as a
    floating-point value, a signed integer, or an unsigned integer, as
    specified by the data type modifier. For integer result variables, the
    bits can be interpreted as described above. For floating-point result
    variables, the packed results do not constitute a meaningful
    floating-point variable and should only be used to feed future unpack
    instructions.

    A program will fail to load if it contains a PK4B instruction that writes
    its results to a variable declared as "SHORT".


    Section 2.X.8.Z, PK4UB: Pack Four Floats as Unsigned 8-bit

    The PK4UB instruction converts the four components of the single
    floating-point vector operand into a packed grouping of 8-bit unsigned
    scalars. The scalars are represented in a bit pattern where all '0' bits
    correspond to 0.0 and all '1' bits correspond to 1.0.
    The bit representations of the four converted components are packed into
    a 32-bit unsigned integer, and that value is replicated to all four
    components of the result vector. The PK4UB instruction can be reversed by
    the UP4UB instruction below.

      tmp0 = VectorLoad(op0);
      if (tmp0.x < 0.0) tmp0.x = 0.0;
      if (tmp0.x > 1.0) tmp0.x = 1.0;
      if (tmp0.y < 0.0) tmp0.y = 0.0;
      if (tmp0.y > 1.0) tmp0.y = 1.0;
      if (tmp0.z < 0.0) tmp0.z = 0.0;
      if (tmp0.z > 1.0) tmp0.z = 1.0;
      if (tmp0.w < 0.0) tmp0.w = 0.0;
      if (tmp0.w > 1.0) tmp0.w = 1.0;
      ub.x = round(255.0 * tmp0.x); /* ub is a ubyte vector */
      ub.y = round(255.0 * tmp0.y);
      ub.z = round(255.0 * tmp0.z);
      ub.w = round(255.0 * tmp0.w);
      /* result obtained by combining raw bits of ub. */
      result.x = ((ub.x) | (ub.y << 8) | (ub.z << 16) | (ub.w << 24));
      result.y = ((ub.x) | (ub.y << 8) | (ub.z << 16) | (ub.w << 24));
      result.z = ((ub.x) | (ub.y << 8) | (ub.z << 16) | (ub.w << 24));
      result.w = ((ub.x) | (ub.y << 8) | (ub.z << 16) | (ub.w << 24));

    PK4UB supports all three data type modifiers. The single operand is
    always treated as a floating-point value, but the result is written as a
    floating-point value, a signed integer, or an unsigned integer, as
    specified by the data type modifier. For integer result variables, the
    bits can be interpreted as described above. For floating-point result
    variables, the packed results do not constitute a meaningful
    floating-point variable and should only be used to feed future unpack
    instructions.

    A program will fail to load if it contains a PK4UB instruction that writes
    its results to a variable declared as "SHORT".
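
    The pack instructions above can be modeled bit-exactly on the CPU for
    testing. This Python sketch assumes IEEE-754 binary16 for PK2H (via the
    struct module's 'e' format) and uses Python's round(), which breaks exact
    halfway ties toward even; the specification text does not state a tie
    rule, so hardware may differ on halfway values. Each function returns the
    single 32-bit value that the instruction replicates to all four result
    components; the function names are hypothetical.

```python
import struct


def clamp(x, lo, hi):
    return max(lo, min(hi, x))


def pk2h(x, y):
    """PK2H: two binary16 encodings, x in bits 0-15 and y in bits 16-31."""
    # struct's 'e' format is IEEE-754 half precision; it raises OverflowError
    # for magnitudes too large for binary16 (hardware behavior not modeled).
    hx = struct.unpack('<H', struct.pack('<e', x))[0]
    hy = struct.unpack('<H', struct.pack('<e', y))[0]
    return hx | (hy << 16)


def pk2us(x, y):
    """PK2US: clamp to [0, 1], scale to 0..65535, pack x low and y high."""
    us_x = round(65535.0 * clamp(x, 0.0, 1.0))
    us_y = round(65535.0 * clamp(y, 0.0, 1.0))
    return us_x | (us_y << 16)


def pk4b(v):
    """PK4B: clamp to [-128/127, 1], map to 0..255, x in the low byte."""
    packed = 0
    for i, c in enumerate(v):
        ub = round(127.0 * clamp(c, -128.0 / 127.0, 1.0) + 128.0)
        packed |= ub << (8 * i)
    return packed


def pk4ub(v):
    """PK4UB: clamp to [0, 1], scale to 0..255, x in the low byte."""
    packed = 0
    for i, c in enumerate(v):
        ub = round(255.0 * clamp(c, 0.0, 1.0))
        packed |= ub << (8 * i)
    return packed


print(hex(pk2h(1.0, -2.0)))               # 0xc0003c00
print(hex(pk2us(1.0, -5.0)))              # 0xffff
print(hex(pk4b((1.0, -1.0, 0.0, -2.0))))  # 0x8001ff
print(hex(pk4ub((1.0, 0.0, 1.0, 2.0))))   # 0xffff00ff
```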


    Section 2.X.8.Z, POW: Exponentiate

    The POW instruction approximates the value of the first scalar operand
    raised to the power of the second scalar operand and replicates it to all
    four components of the result vector.

      tmp0 = ScalarLoad(op0);
      tmp1 = ScalarLoad(op1);
      result.x = ApproxPower(tmp0, tmp1);
      result.y = ApproxPower(tmp0, tmp1);
      result.z = ApproxPower(tmp0, tmp1);
      result.w = ApproxPower(tmp0, tmp1);

    The exponentiation approximation function may be implemented using the
    base 2 exponentiation and logarithm approximation operations in the EX2
    and LG2 instructions. In particular,

      ApproxPower(a,b) = Approx2ToX(b * ApproxLog2(a)).

    Note that a logarithm may be involved even for cases where the exponent is
    an integer. This means that it may not be possible to exponentiate
    correctly with a negative base. In contrast, it is possible in a "normal"
    mathematical formulation to raise negative numbers to integral powers
    (e.g., (-3)^2 == 9, and (-0.5)^-2 == 4).

    POW supports only floating-point data type modifiers.


    Section 2.X.8.Z, RCC: Reciprocal (Clamped)

    The RCC instruction approximates the reciprocal of the scalar operand,
    clamps the result to one of two ranges, and replicates the clamped result
    to all four components of the result vector.

    If the approximate reciprocal is greater than 0.0, the result is clamped
    to the range [2^-64, 2^+64]. If the approximate reciprocal is not greater
    than zero, the result is clamped to the range [-2^+64, -2^-64].

      tmp = ScalarLoad(op0);
      result.x = ClampApproxReciprocal(tmp);
      result.y = ClampApproxReciprocal(tmp);
      result.z = ClampApproxReciprocal(tmp);
      result.w = ClampApproxReciprocal(tmp);

    RCC supports only floating-point data type modifiers.
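
    The POW note and the RCC clamping rule above can be illustrated in Python.
    In this sketch, exact math functions stand in for the approximate EX2,
    LG2, and reciprocal operations, and NaN is used (as an assumption) to mark
    the undefined LG2 result for non-positive operands; the function names
    are hypothetical.

```python
import math


def approx_power(a, b):
    """POW modeled as 2^(b * log2(a)), the EX2/LG2 formulation above."""
    if a <= 0.0:
        return math.nan  # LG2 of a zero or negative operand is undefined
    return 2.0 ** (b * math.log2(a))


def rcc(x):
    """RCC: reciprocal clamped to [2^-64, 2^64] or [-2^64, -2^-64]."""
    lo, hi = 2.0 ** -64, 2.0 ** 64
    # 1/0 would raise in Python; use an infinity carrying the operand's sign.
    r = 1.0 / x if x != 0.0 else math.copysign(math.inf, x)
    if r > 0.0:
        return min(max(r, lo), hi)
    return max(min(r, -lo), -hi)


print(approx_power(2.0, 10.0))  # 1024.0
print(approx_power(-3.0, 2.0))  # nan, although (-3)^2 is 9
print(rcc(0.0) == 2.0 ** 64)    # True
```

    The negative-base case shows why the note warns against relying on POW
    for integral powers of negative numbers; a MUL sequence avoids the
    logarithm entirely.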


    Section 2.X.8.Z, RCP: Reciprocal

    The RCP instruction approximates the reciprocal of the scalar operand and
    replicates it to all four components of the result vector.

      tmp = ScalarLoad(op0);
      result.x = ApproxReciprocal(tmp);
      result.y = ApproxReciprocal(tmp);
      result.z = ApproxReciprocal(tmp);
      result.w = ApproxReciprocal(tmp);

    RCP supports only floating-point data type modifiers.


    Section 2.X.8.Z, REP: Start of Repeat Block

    The REP instruction begins a REP/ENDREP block. The REP instruction
    supports an optional operand whose x component specifies the initial value
    for the loop count. The loop count indicates the number of times the
    instructions between the REP and corresponding ENDREP instruction will be
    executed. If the initial value of the loop count is not positive, the
    entire block is skipped and execution continues at the instruction
    following the corresponding ENDREP instruction. If the loop count is
    specified as a floating-point value, it is converted to the largest
    integer less than or equal to the specified value (i.e., taking its
    floor).

    If no operand is provided to REP, the loop count is ignored and the
    corresponding ENDREP instruction unconditionally transfers control to the
    instruction immediately following the REP instruction. The only ways to
    exit such a loop are the BRK and RET instructions. To prevent obvious
    infinite loops, a program that includes a REP/ENDREP block with no loop
    count will fail to compile unless it contains either a BRK instruction at
    the current nesting level or a RET instruction at any nesting level.

    Implementations may have a limited ability to nest REP/ENDREP blocks. If
    the number of REP/ENDREP blocks nested inside each other is
    MAX_PROGRAM_LOOP_DEPTH_NV or higher, a program will fail to compile.
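
    Combining the REP rules above with the decrement described in the ENDREP
    section gives a simple way to predict how many times a counted REP/ENDREP
    body executes: floor(count) times for a positive count, and zero times
    otherwise. The Python sketch below models only counted loops without BRK,
    RET, or nesting; rep_iterations is a hypothetical name.

```python
import math


def rep_iterations(loop_count):
    """Return how many times a counted REP/ENDREP body executes."""
    count = math.floor(loop_count)  # REP: LoopCount = floor(tmp.x)
    if count <= 0:
        return 0                    # REP: skip past the corresponding ENDREP
    executed = 0
    while True:
        executed += 1               # the loop body runs once
        count -= 1                  # ENDREP: LoopCount--
        if count <= 0:
            return executed         # ENDREP: continue after ENDREP


for n in (3, 2.9, 1, 0.5, 0, -2):
    print(n, rep_iterations(n))
```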

      // Set up loop information for the new nesting level.
      tmp = VectorLoad(op0);
      LoopCount = floor(tmp.x);
      if (LoopCount <= 0) {
        continue execution at the instruction following the corresponding
          ENDREP;
      }

    REP supports all three data type modifiers. The single operand is
    interpreted according to the data type modifier.

    (Note: Unlike the NV_fragment_program2 extension, REP blocks in this
    extension support fully general looping; the specified loop count can be
    computed in the program itself. Additionally, there is no run-time limit
    on the maximum overall depth of REP/ENDREP nesting. As long as each
    individual subroutine of the program obeys the static nesting limits,
    there will be no run-time errors in the program. With the
    NV_fragment_program2 extension, a program could terminate abnormally if it
    called a subroutine inside a deeply nested set of REP/ENDREP blocks and
    the called subroutine also contained deeply nested REP/ENDREP blocks.
    Such an error could occur even if neither subroutine exceeded static
    limits.)


    Section 2.X.8.Z, RET: Subroutine Return

    The RET instruction conditionally returns from a subroutine initiated by a
    CAL instruction by popping an instruction reference off the top of the
    call stack and transferring control to the referenced instruction.
    The following pseudocode describes the operation of the instruction:

      if (TestCC(cc.c***) || TestCC(cc.*c**) ||
          TestCC(cc.**c*) || TestCC(cc.***c)) {
        if (callStackDepth <= 0) {
          // terminate program
        } else {
          callStackDepth--;
          instruction = callStack[callStackDepth];
        }

        // continue execution at <instruction>
      } else {
        // do nothing
      }

    In the pseudocode, <callStackDepth> is the depth of the call stack,
    <callStack> is an array holding the call stack, and <instruction> is a
    reference to an instruction previously pushed onto the call stack.

    If the call stack is empty when a RET instruction executes and its
    condition test passes, the program terminates normally.


    Section 2.X.8.Z, RFL: Reflection Vector

    The RFL instruction computes the reflection of the second vector operand
    (the "direction" vector) about the vector specified by the first vector
    operand (the "axis" vector). Both operands are treated as 3D vectors (the
    w components are ignored). The result vector is another 3D vector (the
    "reflected direction" vector). The length of the result vector, ignoring
    rounding errors, should equal that of the second operand.

      axis = VectorLoad(op0);
      direction = VectorLoad(op1);
      tmp.w = (axis.x * axis.x + axis.y * axis.y + axis.z * axis.z);
      tmp.x = (axis.x * direction.x + axis.y * direction.y +
               axis.z * direction.z);
      tmp.x = 2.0 * tmp.x;
      tmp.x = tmp.x / tmp.w;
      result.x = tmp.x * axis.x - direction.x;
      result.y = tmp.x * axis.y - direction.y;
      result.z = tmp.x * axis.z - direction.z;
      result.w = undefined;

    RFL supports only floating-point data type modifiers.


    Section 2.X.8.Z, ROUND: Round to Nearest Integer

    The ROUND instruction loads a single vector operand and performs a
    component-wise round operation to generate a result vector.

      tmp = VectorLoad(op0);
      result.x = round(tmp.x);
      result.y = round(tmp.y);
      result.z = round(tmp.z);
      result.w = round(tmp.w);

    The round operation returns the nearest integer to the operand. If the
    fractional portion of the operand is 0.5, round() selects the nearest even
    integer. For example, round(-1.7) = -2.0, round(+1.0) = +1.0,
    round(+2.5) = +2.0, and round(+3.7) = +4.0.

    ROUND supports all three data type modifiers. The single operand is
    always treated as a floating-point value, but the result is written as a
    floating-point value, a signed integer, or an unsigned integer, as
    specified by the data type modifier. If a value is not exactly
    representable using the data type of the result (e.g., an overflow or
    writing a negative value to an unsigned integer), the result is undefined.


    Section 2.X.8.Z, RSQ: Reciprocal Square Root

    The RSQ instruction approximates the reciprocal of the square root of the
    scalar operand and replicates it to all four components of the result
    vector.

      tmp = ScalarLoad(op0);
      result.x = ApproxRSQRT(tmp);
      result.y = ApproxRSQRT(tmp);
      result.z = ApproxRSQRT(tmp);
      result.w = ApproxRSQRT(tmp);

    If the operand is less than or equal to zero, the results of the
    instruction are undefined.

    RSQ supports only floating-point data type modifiers.

    Note that this instruction differs from the RSQ instruction in
    ARB_vertex_program in that it does not implicitly take the absolute value
    of its operand. The |abs| operator can be used to achieve equivalent
    semantics.


    Section 2.X.8.Z, SAD: Sum of Absolute Differences

    The SAD instruction performs a component-wise difference of the first two
    integer operands (subtracting the second from the first), and then does a
    component-wise add of the absolute value of the difference to the third
    unsigned integer operand to yield an unsigned integer result vector.

      tmp0 = VectorLoad(op0);
      tmp1 = VectorLoad(op1);
      tmp2 = VectorLoad(op2);
      result.x = abs(tmp0.x - tmp1.x) + tmp2.x;
      result.y = abs(tmp0.y - tmp1.y) + tmp2.y;
      result.z = abs(tmp0.z - tmp1.z) + tmp2.z;
      result.w = abs(tmp0.w - tmp1.w) + tmp2.w;

    SAD supports signed and unsigned integer data type modifiers. The first
    two operands are interpreted according to the data type modifier. The
    third operand and the result are always unsigned integers.


    Section 2.X.8.Z, SCS: Sine/Cosine without Reduction

    The SCS instruction approximates the trigonometric sine and cosine of the
    angle specified by the scalar operand and places the cosine in the x
    component and the sine in the y component of the result vector. The z and
    w components of the result vector are undefined. The angle is specified
    in radians and must be in the range [-PI,PI].

      tmp = ScalarLoad(op0);
      result.x = ApproxCosine(tmp);
      result.y = ApproxSine(tmp);
      result.z = undefined;
      result.w = undefined;

    If the scalar operand is not in the range [-PI,PI], the result vector is
    undefined.

    SCS supports only floating-point data type modifiers.


    Section 2.X.8.Z, SEQ: Set on Equal

    The SEQ instruction performs a component-wise comparison of the two
    operands. Each component of the result vector returns a TRUE value
    (described below) if the corresponding component of the first operand is
    equal to that of the second, and a FALSE value otherwise.

      tmp0 = VectorLoad(op0);
      tmp1 = VectorLoad(op1);
      result.x = (tmp0.x == tmp1.x) ? TRUE : FALSE;
      result.y = (tmp0.y == tmp1.y) ? TRUE : FALSE;
      result.z = (tmp0.z == tmp1.z) ? TRUE : FALSE;
      result.w = (tmp0.w == tmp1.w) ? TRUE : FALSE;

    SEQ supports all data type modifiers. For floating-point data types, the
    TRUE value is 1.0 and the FALSE value is 0.0. For signed integer data
    types, the TRUE value is -1 and the FALSE value is 0. For unsigned
    integer data types, the TRUE value is the maximum integer value (all bits
    are ones) and the FALSE value is zero.


    Section 2.X.8.Z, SFL: Set on False

    The SFL instruction is a degenerate case of the other "Set on"
    instructions that sets all components of the result vector to a FALSE
    value (described below).

      result.x = FALSE;
      result.y = FALSE;
      result.z = FALSE;
      result.w = FALSE;

    SFL supports all data type modifiers. For floating-point data types, the
    FALSE value is 0.0. For signed and unsigned integer data types, the FALSE
    value is zero.


    Section 2.X.8.Z, SGE: Set on Greater Than or Equal

    The SGE instruction performs a component-wise comparison of the two
    operands. Each component of the result vector returns a TRUE value
    (described below) if the corresponding component of the first operand is
    greater than or equal to that of the second, and a FALSE value otherwise.

      tmp0 = VectorLoad(op0);
      tmp1 = VectorLoad(op1);
      result.x = (tmp0.x >= tmp1.x) ? TRUE : FALSE;
      result.y = (tmp0.y >= tmp1.y) ? TRUE : FALSE;
      result.z = (tmp0.z >= tmp1.z) ? TRUE : FALSE;
      result.w = (tmp0.w >= tmp1.w) ? TRUE : FALSE;

    SGE supports all data type modifiers. For floating-point data types, the
    TRUE value is 1.0 and the FALSE value is 0.0. For signed integer data
    types, the TRUE value is -1 and the FALSE value is 0.
    For unsigned integer data types, the TRUE value is the maximum integer
    value (all bits are ones) and the FALSE value is zero.


    Section 2.X.8.Z, SGT: Set on Greater Than

    The SGT instruction performs a component-wise comparison of the two
    operands. Each component of the result vector returns a TRUE value
    (described below) if the corresponding component of the first operand is
    greater than that of the second, and a FALSE value otherwise.

      tmp0 = VectorLoad(op0);
      tmp1 = VectorLoad(op1);
      result.x = (tmp0.x > tmp1.x) ? TRUE : FALSE;
      result.y = (tmp0.y > tmp1.y) ? TRUE : FALSE;
      result.z = (tmp0.z > tmp1.z) ? TRUE : FALSE;
      result.w = (tmp0.w > tmp1.w) ? TRUE : FALSE;

    SGT supports all data type modifiers. For floating-point data types, the
    TRUE value is 1.0 and the FALSE value is 0.0. For signed integer data
    types, the TRUE value is -1 and the FALSE value is 0. For unsigned
    integer data types, the TRUE value is the maximum integer value (all bits
    are ones) and the FALSE value is zero.


    Section 2.X.8.Z, SHL: Shift Left

    The SHL instruction performs a component-wise left shift of the bits of
    the first operand by the value of the second scalar operand to produce a
    result vector. The bits vacated during the shift operation are filled
    with zeroes.

      tmp0 = VectorLoad(op0);
      tmp1 = ScalarLoad(op1);
      result.x = tmp0.x << tmp1;
      result.y = tmp0.y << tmp1;
      result.z = tmp0.z << tmp1;
      result.w = tmp0.w << tmp1;

    The results of a shift operation ("<<") are undefined if the value of the
    second operand is negative, or greater than or equal to the number of bits
    in the first operand.

    SHL supports both signed and unsigned integer data type modifiers. If no
    modifier is provided, the operands and the result are treated as signed
    integers.
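
    The TRUE and FALSE encodings shared by the "Set on" instructions above
    (SEQ, SGE, SGT, and the others in this group) can be modeled in C, using
    SGE as the example. This is only a sketch: the three functions stand in
    for the floating-point, signed integer, and unsigned integer data type
    modifiers, and their names are hypothetical. Because int32_t is a two's
    complement type, the signed TRUE value (-1) and the unsigned TRUE value
    (all bits set) share the same 32-bit pattern.

```c
#include <stdint.h>

/* Hypothetical per-component models of SGE, one per data type modifier. */

/* Floating-point: TRUE is 1.0, FALSE is 0.0. */
float sge_float(float a, float b)
{
    return (a >= b) ? 1.0f : 0.0f;
}

/* Signed integer: TRUE is -1, FALSE is 0. */
int32_t sge_signed(int32_t a, int32_t b)
{
    return (a >= b) ? -1 : 0;
}

/* Unsigned integer: TRUE is the maximum value (all bits set), FALSE is 0. */
uint32_t sge_unsigned(uint32_t a, uint32_t b)
{
    return (a >= b) ? UINT32_MAX : 0u;
}
```

    One consequence of this encoding (an observation, not a rule stated in
    this section) is that an integer comparison result can be ANDed with
    another value to either keep that value unchanged or clear it to zero.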


    Section 2.X.8.Z, SHR: Shift Right

    The SHR instruction performs a component-wise right shift of the bits of
    the first operand by the value of the second scalar operand to produce a
    result vector. The bits vacated during the shift operation are filled with
    zeros if the operand is non-negative and ones otherwise.

      tmp0 = VectorLoad(op0);
      tmp1 = ScalarLoad(op1);
      result.x = tmp0.x >> tmp1;
      result.y = tmp0.y >> tmp1;
      result.z = tmp0.z >> tmp1;
      result.w = tmp0.w >> tmp1;

    The results of a shift operation (">>") are undefined if the value of the
    second operand is negative, or greater than or equal to the number of bits
    in the first operand.

    SHR supports both signed and unsigned integer data type modifiers. If no
    modifier is provided, the operands and the result are treated as signed
    integers.


    Section 2.X.8.Z, SIN: Sine with Reduction to [-PI,PI]

    The SIN instruction approximates the trigonometric sine of the angle
    specified by the scalar operand and replicates it to all four components
    of the result vector. The angle is specified in radians and does not have
    to be in the range [-PI,PI].

      tmp = ScalarLoad(op0);
      result.x = ApproxSine(tmp);
      result.y = ApproxSine(tmp);
      result.z = ApproxSine(tmp);
      result.w = ApproxSine(tmp);

    SIN supports only floating-point data type modifiers.


    Section 2.X.8.Z, SLE: Set on Less Than or Equal

    The SLE instruction performs a component-wise comparison of the two
    operands. Each component of the result vector returns a TRUE value
    (described below) if the corresponding component of the first operand is
    less than or equal to that of the second, and a FALSE value otherwise.

      tmp0 = VectorLoad(op0);
      tmp1 = VectorLoad(op1);
      result.x = (tmp0.x <= tmp1.x) ? TRUE : FALSE;
      result.y = (tmp0.y <= tmp1.y) ? TRUE : FALSE;
      result.z = (tmp0.z <= tmp1.z) ? TRUE : FALSE;
      result.w = (tmp0.w <= tmp1.w) ? TRUE : FALSE;

    SLE supports all data type modifiers. For floating-point data types, the
    TRUE value is 1.0 and the FALSE value is 0.0. For signed integer data
    types, the TRUE value is -1 and the FALSE value is 0. For unsigned
    integer data types, the TRUE value is the maximum integer value (all bits
    are ones) and the FALSE value is zero.


    Section 2.X.8.Z, SLT: Set on Less Than

    The SLT instruction performs a component-wise comparison of the two
    operands. Each component of the result vector returns a TRUE value
    (described below) if the corresponding component of the first operand is
    less than that of the second, and a FALSE value otherwise.

      tmp0 = VectorLoad(op0);
      tmp1 = VectorLoad(op1);
      result.x = (tmp0.x < tmp1.x) ? TRUE : FALSE;
      result.y = (tmp0.y < tmp1.y) ? TRUE : FALSE;
      result.z = (tmp0.z < tmp1.z) ? TRUE : FALSE;
      result.w = (tmp0.w < tmp1.w) ? TRUE : FALSE;

    SLT supports all data type modifiers. For floating-point data types, the
    TRUE value is 1.0 and the FALSE value is 0.0. For signed integer data
    types, the TRUE value is -1 and the FALSE value is 0. For unsigned
    integer data types, the TRUE value is the maximum integer value (all bits
    are ones) and the FALSE value is zero.


    Section 2.X.8.Z, SNE: Set on Not Equal

    The SNE instruction performs a component-wise comparison of the two
    operands. Each component of the result vector returns a TRUE value
    (described below) if the corresponding component of the first operand is
    not equal to that of the second, and a FALSE value otherwise.

      tmp0 = VectorLoad(op0);
      tmp1 = VectorLoad(op1);
      result.x = (tmp0.x != tmp1.x) ? TRUE : FALSE;
      result.y = (tmp0.y != tmp1.y) ? TRUE : FALSE;
      result.z = (tmp0.z != tmp1.z) ? TRUE : FALSE;
      result.w = (tmp0.w != tmp1.w) ? TRUE : FALSE;

    SNE supports all data type modifiers. For floating-point data types, the
    TRUE value is 1.0 and the FALSE value is 0.0. For signed integer data
    types, the TRUE value is -1 and the FALSE value is 0. For unsigned
    integer data types, the TRUE value is the maximum integer value (all bits
    are ones) and the FALSE value is zero.


    Section 2.X.8.Z, SSG: Set Sign

    The SSG instruction generates a result vector containing the signs of
    each component of the single vector operand. Each component of the
    result vector is 1.0 if the corresponding component of the operand
    is greater than zero, 0.0 if the corresponding component of the
    operand is equal to zero, and -1.0 if the corresponding component
    of the operand is less than zero.

      tmp = VectorLoad(op0);
      result.x = SetSign(tmp.x);
      result.y = SetSign(tmp.y);
      result.z = SetSign(tmp.z);
      result.w = SetSign(tmp.w);

    SSG supports only floating-point data type modifiers.


    Section 2.X.8.Z, STR: Set on True

    The STR instruction is a degenerate case of the other "Set on"
    instructions that sets all components of the result vector to a TRUE value
    (described below).

      result.x = TRUE;
      result.y = TRUE;
      result.z = TRUE;
      result.w = TRUE;

    STR supports all data type modifiers. For floating-point data types, the
    TRUE value is 1.0. For signed integer data types, the TRUE value is -1.
    For unsigned integer data types, the TRUE value is the maximum integer
    value (all bits are ones).


    Section 2.X.8.Z, SUB: Subtract

    The SUB instruction performs a component-wise subtraction of the second
    operand from the first to yield a result vector.

      tmp0 = VectorLoad(op0);
      tmp1 = VectorLoad(op1);
      result.x = tmp0.x - tmp1.x;
      result.y = tmp0.y - tmp1.y;
      result.z = tmp0.z - tmp1.z;
      result.w = tmp0.w - tmp1.w;

    SUB supports all three data type modifiers.


    Section 2.X.8.Z, SWZ: Extended Swizzle

    The SWZ instruction loads the single vector operand and performs a
    swizzle operation more powerful than that provided for loading normal
    vector operands to yield a result vector.

    After the operand is loaded, the "x", "y", "z", and "w" components of the
    result vector are selected by the first, second, third, and fourth matches
    of the <extSwizComp> pattern in the <extendedSwizzle> rule.

    A result component can be selected from any of the four components of the
    operand or the constants 0.0 and 1.0. The result component can also be
    optionally negated. The following pseudocode describes the component
    selection method. "operand" refers to the vector operand, "select" is an
    enumerant where the values ZERO, ONE, X, Y, Z, and W correspond to the
    <extSwizSel> rule matching "0", "1", "x", "y", "z", and "w", respectively.
    "negate" is TRUE if and only if the <optionalSign> rule in <extSwizComp>
    matches "-".

      float ExtSwizComponent(floatVec operand, enum select, boolean negate)
      {
        float result;
        switch (select) {
          case ZERO:  result = 0.0;        break;
          case ONE:   result = 1.0;        break;
          case X:     result = operand.x;  break;
          case Y:     result = operand.y;  break;
          case Z:     result = operand.z;  break;
          case W:     result = operand.w;  break;
        }
        if (negate) {
          result = -result;
        }
        return result;
      }

    The entire extended swizzle operation is then defined using the following
    pseudocode:

      tmp = VectorLoad(op0);
      result.x = ExtSwizComponent(tmp, xSelect, xNegate);
      result.y = ExtSwizComponent(tmp, ySelect, yNegate);
      result.z = ExtSwizComponent(tmp, zSelect, zNegate);
      result.w = ExtSwizComponent(tmp, wSelect, wNegate);

    "xSelect", "xNegate", "ySelect", "yNegate", "zSelect", "zNegate",
    "wSelect", and "wNegate" correspond to the "select" and "negate" values
    above for the four <extSwizComp> matches.

    Since this instruction allows for component selection and negation for
    each individual component, the grammar does not allow the use of the
    normal swizzle and negation operations allowed for vector operands in
    other instructions.

    SWZ supports only floating-point data type modifiers.


    Section 2.X.8.Z, TEX: Texture Sample

    The TEX instruction takes the four components of a single floating-point
    source vector and performs a filtered texture access as described in
    Section 2.X.4.4. The returned (R,G,B,A) value is written to the
    floating-point result vector. Partial derivatives and the level of detail
    are computed automatically.

      tmp = VectorLoad(op0);
      ddx = ComputePartialsX(tmp);
      ddy = ComputePartialsY(tmp);
      lambda = ComputeLOD(ddx, ddy);
      result = TextureSample(tmp, lambda, ddx, ddy, texelOffset);

    TEX supports all three data type modifiers.
    The single operand is always treated as a floating-point vector; the
    results are interpreted according to the data type modifier.


    Section 2.X.8.Z, TRUNC: Truncate (Round Toward Zero)

    The TRUNC instruction loads a single vector operand and performs a
    component-wise truncate operation to generate a result vector.

      tmp = VectorLoad(op0);
      result.x = trunc(tmp.x);
      result.y = trunc(tmp.y);
      result.z = trunc(tmp.z);
      result.w = trunc(tmp.w);

    The truncate operation rounds the operand toward zero; that is, it returns
    the integer nearest to the operand whose magnitude is not greater than
    that of the operand. For example, trunc(-1.7) = -1.0, trunc(+1.0) = +1.0,
    and trunc(+3.7) = +3.0.

    TRUNC supports all three data type modifiers. The single operand is
    always treated as a floating-point value, but the result is written as a
    floating-point value, a signed integer, or an unsigned integer, as
    specified by the data type modifier. If a value is not exactly
    representable using the data type of the result (e.g., an overflow or
    writing a negative value to an unsigned integer), the result is undefined.


    Section 2.X.8.Z, TXB: Texture Sample with Bias

    The TXB instruction takes the four components of a single floating-point
    source vector and performs a filtered texture access as described in
    Section 2.X.4.4. The returned (R,G,B,A) value is written to the
    floating-point result vector. Partial derivatives and the level of detail
    are computed automatically, but the fourth component of the source vector
    is added to the computed LOD prior to sampling.

      tmp = VectorLoad(op0);
      ddx = ComputePartialsX(tmp);
      ddy = ComputePartialsY(tmp);
      lambda = ComputeLOD(ddx, ddy);
      result = TextureSample(tmp, lambda + tmp.w, ddx, ddy, texelOffset);

    The single source vector in the TXB instruction does not have enough
    coordinates to specify a lookup into a two-dimensional array texture or
    cube map texture with both an LOD bias and an explicit reference value for
    depth comparison. A program will fail to load if it contains a TXB
    instruction with a target of SHADOWCUBE or SHADOWARRAY2D.

    TXB supports all three data type modifiers. The single operand is always
    treated as a floating-point vector; the results are interpreted according
    to the data type modifier.


    Section 2.X.8.Z, TXD: Texture Sample with Partials

    The TXD instruction takes the four components of the first floating-point
    source vector and performs a filtered texture access as described in
    Section 2.X.4.4. The returned (R,G,B,A) value is written to the
    floating-point result vector. The partial derivatives of the texture
    coordinates with respect to X and Y are specified by the second and third
    floating-point source vectors. The level of detail is computed
    automatically using the provided partial derivatives.

    Note that for cube map texture targets, the provided partial derivatives
    are in the coordinate system used before texture coordinates are projected
    onto the appropriate cube face. The partial derivatives of the
    post-projection texture coordinates, which are used for level-of-detail
    and anisotropic filtering calculations, are derived from the original
    coordinates and partial derivatives in an implementation-dependent manner.

      tmp0 = VectorLoad(op0);
      tmp1 = VectorLoad(op1);
      tmp2 = VectorLoad(op2);
      lambda = ComputeLOD(tmp1, tmp2);
      result = TextureSample(tmp0, lambda, tmp1, tmp2, texelOffset);

    TXD supports all three data type modifiers. All three operands are always
    treated as floating-point vectors; the results are interpreted according
    to the data type modifier.


    Section 2.X.8.Z, TXF: Texel Fetch

    The TXF instruction takes the four components of a single signed integer
    source vector and performs a single texel fetch as described in Section
    2.X.4.4. The first three components provide the <i>, <j>, and <k> values
    for the texel fetch, and the fourth component is used to determine the LOD
    to access. The returned (R,G,B,A) value is written to the floating-point
    result vector. Partial derivatives are irrelevant for single texel
    fetches.

      tmp = VectorLoad(op0);
      result = TexelFetch(tmp, texelOffset);

    TXF supports all three data type modifiers. The single vector operand is
    treated as a signed integer vector; the results are interpreted according
    to the data type modifier.


    Section 2.X.8.Z, TXL: Texture Sample with LOD

    The TXL instruction takes the four components of a single floating-point
    source vector and performs a filtered texture access as described in
    Section 2.X.4.4. The returned (R,G,B,A) value is written to the
    floating-point result vector. The level of detail is taken from the
    fourth component of the source vector.

    Partial derivatives are not computed by the TXL instruction and
    anisotropic filtering is not performed.

      tmp = VectorLoad(op0);
      ddx = (0,0,0);
      ddy = (0,0,0);
      result = TextureSample(tmp, tmp.w, ddx, ddy, texelOffset);

    The single source vector in the TXL instruction does not have enough
    coordinates to specify a lookup into a 2D array or cube map texture with
    both an explicit LOD and a reference value for depth comparison. A
    program will fail to load if it contains a TXL instruction with a target
    of SHADOWCUBE or SHADOWARRAY2D.

    TXL supports all three data type modifiers. The single vector operand is
    treated as a floating-point vector; the results are interpreted according
    to the data type modifier.


    Section 2.X.8.Z, TXP: Texture Sample with Projection

    The TXP instruction divides the first three components of its single
    floating-point source vector by its fourth component, maps the results to
    s, t, and r, and performs a filtered texture access as described in
    Section 2.X.4.4. The returned (R,G,B,A) value is written to the
    floating-point result vector. Partial derivatives and the level of detail
    are computed automatically.

      tmp = VectorLoad(op0);
      tmp.x = tmp.x / tmp.w;
      tmp.y = tmp.y / tmp.w;
      tmp.z = tmp.z / tmp.w;
      ddx = ComputePartialsX(tmp);
      ddy = ComputePartialsY(tmp);
      lambda = ComputeLOD(ddx, ddy);
      result = TextureSample(tmp, lambda, ddx, ddy, texelOffset);

    The single source vector in the TXP instruction does not have enough
    coordinates to specify a lookup into a 2D array or cube map texture with
    both a Q coordinate and an explicit reference value for depth comparison.
    A program will fail to load if it contains a TXP instruction with a target
    of SHADOWCUBE or SHADOWARRAY2D.

    TXP supports all three data type modifiers. The single vector operand is
    treated as a floating-point vector; the results are interpreted according
    to the data type modifier.
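
    For TXB, TXL, and TXP, the fourth component of the source vector has a
    special role: it biases the computed LOD, replaces the LOD, or divides
    the other coordinates, respectively. The following C sketch models only
    that coordinate and LOD setup; the sampling step itself (TextureSample()
    in the pseudocode) is not modeled. The vec4 type, the function names,
    and the lambda parameter, which stands in for the automatically computed
    LOD returned by ComputeLOD(), are hypothetical.

```c
/* Hypothetical four-component vector type for this sketch. */
typedef struct {
    float x, y, z, w;
} vec4;

/* TXP: divide x, y, and z by w before the lookup; w itself is unchanged,
 * as in the pseudocode. Partial derivatives and the LOD would then be
 * computed from these projected coordinates. */
vec4 txp_project(vec4 c)
{
    vec4 r = { c.x / c.w, c.y / c.w, c.z / c.w, c.w };
    return r;
}

/* TXB: the fourth component biases the automatically computed LOD. */
float txb_lod(vec4 c, float lambda)
{
    return lambda + c.w;
}

/* TXL: the fourth component is used directly as the LOD; the partial
 * derivatives are taken to be zero, so no automatic LOD is involved. */
float txl_lod(vec4 c)
{
    return c.w;
}
```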


    Section 2.X.8.Z, TXQ: Texture Size Query

    The TXQ instruction takes the first component of the single integer vector
    operand, adds the number of the base level of the specified texture to
    determine a texture image level, and returns an integer result vector
    containing the size of the image at that level of the texture.

    For one-dimensional and one-dimensional array textures, the "x" component
    of the result vector is filled with the width of the image(s). For
    two-dimensional, rectangle, cube map, and two-dimensional array textures,
    the "x" and "y" components are filled with the width and height of the
    image(s). For three-dimensional textures, the "x", "y", and "z"
    components are filled with the width, height, and depth of the image.
    Additionally, the number of layers in an array texture is returned in the
    "y" component of the result for one-dimensional array textures or the "z"
    component for two-dimensional array textures. All other components of the
    result vector are undefined. For the purposes of this instruction, the
    width, height, and depth of a texture do NOT include any border.

      tmp0 = VectorLoad(op0);
      tmp0.x = tmp0.x + texture[op1].target[op2].base_level;
      result.x = texture[op1].target[op2].level[tmp0.x].width;
      result.y = texture[op1].target[op2].level[tmp0.x].height;
      result.z = texture[op1].target[op2].level[tmp0.x].depth;
      result.w = undefined;

    If the level computed by adding the operand to the base level of the
    texture is less than the base level number or greater than the maximum
    level number, the results are undefined.

    TXQ supports no data type modifiers; the scalar operand and the result
    vector are both interpreted as signed integers.


    Section 2.X.8.Z, UP2H: Unpack Two 16-bit Floats

    The UP2H instruction unpacks two 16-bit floats stored together in a 32-bit
    scalar operand.
    The first 16-bit float (stored in the 16 least significant bits) is
    written into the "x" and "z" components of the result vector; the second
    is written into the "y" and "w" components of the result vector.

    This operation undoes the type conversion and packing performed by
    the PK2H instruction.

      tmp = ScalarLoad(op0);
      result.x = (fp16) (RawBits(tmp) & 0xFFFF);
      result.y = (fp16) ((RawBits(tmp) >> 16) & 0xFFFF);
      result.z = (fp16) (RawBits(tmp) & 0xFFFF);
      result.w = (fp16) ((RawBits(tmp) >> 16) & 0xFFFF);

    UP2H supports all three data type modifiers. The single operand is read
    as a floating-point value, a signed integer, or an unsigned integer, as
    specified by the data type modifier; the 32 least significant bits of the
    encoding are used for unpacking. For floating-point operand variables, it
    is expected (but not required) that the operand was produced by a previous
    pack instruction. The result is always written as a floating-point
    vector.

    A program will fail to load if it contains a UP2H instruction whose
    operand is a variable declared as "SHORT".


    Section 2.X.8.Z, UP2US: Unpack Two Unsigned 16-bit Integers

    The UP2US instruction unpacks two 16-bit unsigned values packed
    together in a 32-bit scalar operand. The unsigned quantities are
    encoded where a bit pattern of all '0' bits corresponds to 0.0 and
    a pattern of all '1' bits corresponds to 1.0. The "x" and "z"
    components of the result vector are obtained from the 16 least
    significant bits of the operand; the "y" and "w" components are
    obtained from the 16 most significant bits.

    This operation undoes the type conversion and packing performed by
    the PK2US instruction.

      tmp = ScalarLoad(op0);
      result.x = ((RawBits(tmp) >> 0) & 0xFFFF) / 65535.0;
      result.y = ((RawBits(tmp) >> 16) & 0xFFFF) / 65535.0;
      result.z = ((RawBits(tmp) >> 0) & 0xFFFF) / 65535.0;
      result.w = ((RawBits(tmp) >> 16) & 0xFFFF) / 65535.0;

    UP2US supports all three data type modifiers. The single operand is read
    as a floating-point value, a signed integer, or an unsigned integer, as
    specified by the data type modifier; the 32 least significant bits of the
    encoding are used for unpacking. For floating-point operand variables, it
    is expected (but not required) that the operand was produced by a previous
    pack instruction. The result is always written as a floating-point
    vector.

    A program will fail to load if it contains a UP2US instruction whose
    operand is a variable declared as "SHORT".


    Section 2.X.8.Z, UP4B: Unpack Four Signed 8-bit Integers

    The UP4B instruction unpacks four 8-bit signed values packed together
    in a 32-bit scalar operand. The signed quantities are encoded where
    a bit pattern of all '0' bits corresponds to -128/127 and a pattern
    of all '1' bits corresponds to +127/127. The "x" component of the
    result vector is the converted value corresponding to the 8 least
    significant bits of the operand; the "w" component corresponds to
    the 8 most significant bits.

    This operation undoes the type conversion and packing performed by
    the PK4B instruction.

      tmp = ScalarLoad(op0);
      result.x = (((RawBits(tmp) >> 0) & 0xFF) - 128) / 127.0;
      result.y = (((RawBits(tmp) >> 8) & 0xFF) - 128) / 127.0;
      result.z = (((RawBits(tmp) >> 16) & 0xFF) - 128) / 127.0;
      result.w = (((RawBits(tmp) >> 24) & 0xFF) - 128) / 127.0;

    UP4B supports all three data type modifiers.
    The single operand is read as a floating-point value, a signed integer,
    or an unsigned integer, as specified by the data type modifier; the 32
    least significant bits of the encoding are used for unpacking. For
    floating-point operand variables, it is expected (but not required) that
    the operand was produced by a previous pack instruction. The result is
    always written as a floating-point vector.

    A program will fail to load if it contains a UP4B instruction whose
    operand is a variable declared as "SHORT".


    Section 2.X.8.Z, UP4UB: Unpack Four Unsigned 8-bit Integers

    The UP4UB instruction unpacks four 8-bit unsigned values packed
    together in a 32-bit scalar operand. The unsigned quantities are
    encoded where a bit pattern of all '0' bits corresponds to 0.0 and a
    pattern of all '1' bits corresponds to 1.0. The "x" component of the
    result vector is obtained from the 8 least significant bits of the
    operand; the "w" component is obtained from the 8 most significant
    bits.

    This operation undoes the type conversion and packing performed by
    the PK4UB instruction.

      tmp = ScalarLoad(op0);
      result.x = ((RawBits(tmp) >> 0) & 0xFF) / 255.0;
      result.y = ((RawBits(tmp) >> 8) & 0xFF) / 255.0;
      result.z = ((RawBits(tmp) >> 16) & 0xFF) / 255.0;
      result.w = ((RawBits(tmp) >> 24) & 0xFF) / 255.0;

    UP4UB supports all three data type modifiers. The single operand is read
    as a floating-point value, a signed integer, or an unsigned integer, as
    specified by the data type modifier; the 32 least significant bits of the
    encoding are used for unpacking. For floating-point operand variables, it
    is expected (but not required) that the operand was produced by a previous
    pack instruction. The result is always written as a floating-point
    vector.
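
    The unpacking performed by UP2US, UP4B, and UP4UB amounts to extracting
    bit fields from the raw 32-bit operand and rescaling them. The following
    C sketch mirrors the pseudocode above, assuming the operand is already
    available as its raw 32-bit encoding (RawBits() in the pseudocode). The
    vec4 type and all function names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical four-component vector type for this sketch. */
typedef struct {
    float x, y, z, w;
} vec4;

/* UP2US: low 16 bits go to x and z, high 16 bits to y and w,
 * each mapped from [0,65535] to [0.0,1.0]. */
vec4 unpack_2us(uint32_t bits)
{
    float lo = (float)(bits & 0xFFFFu) / 65535.0f;
    float hi = (float)((bits >> 16) & 0xFFFFu) / 65535.0f;
    vec4 r = { lo, hi, lo, hi };
    return r;
}

/* UP4B helper: byte b maps to (b - 128) / 127. The byte is converted to
 * float before subtracting 128 so the subtraction cannot wrap around as
 * an unsigned value. */
static float unpack_signed_byte(uint32_t bits, int shift)
{
    return ((float)((bits >> shift) & 0xFFu) - 128.0f) / 127.0f;
}

vec4 unpack_4b(uint32_t bits)
{
    vec4 r = { unpack_signed_byte(bits, 0), unpack_signed_byte(bits, 8),
               unpack_signed_byte(bits, 16), unpack_signed_byte(bits, 24) };
    return r;
}

/* UP4UB helper: byte b maps to b / 255; x comes from the lowest byte. */
static float unpack_unsigned_byte(uint32_t bits, int shift)
{
    return (float)((bits >> shift) & 0xFFu) / 255.0f;
}

vec4 unpack_4ub(uint32_t bits)
{
    vec4 r = { unpack_unsigned_byte(bits, 0), unpack_unsigned_byte(bits, 8),
               unpack_unsigned_byte(bits, 16), unpack_unsigned_byte(bits, 24) };
    return r;
}
```

    For example, unpack_4ub(0xFF0000FFu) yields (1.0, 0.0, 0.0, 1.0), and
    unpack_2us(0xFFFF0000u) yields (0.0, 1.0, 0.0, 1.0).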

    A program will fail to load if it contains a UP4UB instruction whose
    operand is a variable declared as "SHORT".


    Section 2.X.8.Z, X2D: 2D Coordinate Transformation

    The X2D instruction multiplies the 2D offset vector specified by the
    "x" and "y" components of the second vector operand by the 2x2 matrix
    specified by the four components of the third vector operand, and adds
    the transformed offset vector to the 2D vector specified by the "x"
    and "y" components of the first vector operand. The first component
    of the sum is written to the "x" and "z" components of the result;
    the second component is written to the "y" and "w" components of
    the result.

      tmp0 = VectorLoad(op0);
      tmp1 = VectorLoad(op1);
      tmp2 = VectorLoad(op2);
      result.x = tmp0.x + tmp1.x * tmp2.x + tmp1.y * tmp2.y;
      result.y = tmp0.y + tmp1.x * tmp2.z + tmp1.y * tmp2.w;
      result.z = tmp0.x + tmp1.x * tmp2.x + tmp1.y * tmp2.y;
      result.w = tmp0.y + tmp1.x * tmp2.z + tmp1.y * tmp2.w;

    X2D supports only floating-point data type modifiers.


    Section 2.X.8.Z, XOR: Exclusive Or

    The XOR instruction performs a bitwise XOR operation on the components of
    the two source vectors to yield a result vector.

      tmp0 = VectorLoad(op0);
      tmp1 = VectorLoad(op1);
      result.x = tmp0.x ^ tmp1.x;
      result.y = tmp0.y ^ tmp1.y;
      result.z = tmp0.z ^ tmp1.z;
      result.w = tmp0.w ^ tmp1.w;

    XOR supports only integer data type modifiers. If no type modifier is
    specified, both operands and the result are treated as signed integers.


    Section 2.X.8.Z, XPD: Cross Product

    The XPD instruction computes the cross product using the first three
    components of its two vector operands to generate the x, y, and z
    components of the result vector. The w component of the result vector is
    undefined.

      tmp0 = VectorLoad(op0);
      tmp1 = VectorLoad(op1);
      result.x = tmp0.y * tmp1.z - tmp0.z * tmp1.y;
      result.y = tmp0.z * tmp1.x - tmp0.x * tmp1.z;
      result.z = tmp0.x * tmp1.y - tmp0.y * tmp1.x;
      result.w = undefined;

    XPD supports only floating-point data type modifiers.


Additions to Chapter 3 of the OpenGL 1.5 Specification (Rasterization)

    Modify Section 3.8.1, Texture Image Specification, p. 150

    (modify 4th paragraph, p. 151 -- add cubemaps to the list of texture
    targets that can be used with DEPTH_COMPONENT textures) Textures with a
    base internal format of DEPTH_COMPONENT are supported by texture image
    specification commands only if <target> is TEXTURE_1D, TEXTURE_2D,
    TEXTURE_CUBE_MAP, TEXTURE_RECTANGLE_ARB, TEXTURE_1D_ARRAY_EXT,
    TEXTURE_2D_ARRAY_EXT, PROXY_TEXTURE_1D, PROXY_TEXTURE_2D,
    PROXY_TEXTURE_CUBE_MAP, PROXY_TEXTURE_RECTANGLE_ARB,
    PROXY_TEXTURE_1D_ARRAY_EXT, or PROXY_TEXTURE_2D_ARRAY_EXT. Using this
    format in conjunction with any other target will result in an
    INVALID_OPERATION error.


    Delete Section 3.8.7, Texture Wrap Modes. (The language in this section
    is folded into updates to the following section, and is no longer needed
    here.)


    Modify Section 3.8.8, Texture Minification:

    (replace the last paragraph, p. 171): Let s(x,y) be the function that
    associates an s texture coordinate with each set of window coordinates
    (x,y) that lie within a primitive; define t(x,y) and r(x,y) analogously.
    Let

      u(x,y) = w_t * s(x,y) + offsetu_shader,
      v(x,y) = h_t * t(x,y) + offsetv_shader, and
      w(x,y) = d_t * r(x,y) + offsetw_shader,

    where w_t, h_t, and d_t are as defined by equations 3.15, 3.16, and 3.17
    with w_s, h_s, and d_s equal to the width, height, and depth of the image
    array whose level is level_base.
    (offsetu_shader, offsetv_shader, offsetw_shader) is the texel offset
    specified in the vertex, geometry, or fragment program instruction used
    to perform the access. For fixed-function texture accesses, all three
    shader offsets are taken to be zero. For a one-dimensional texture,
    define v(x,y) == 0 and w(x,y) == 0; for two-dimensional textures, define
    w(x,y) == 0.

    After u(x,y), v(x,y), and w(x,y) are generated, they are clamped if the
    corresponding texture wrap modes are CLAMP or MIRROR_CLAMP_EXT. Let

      u'(x,y) = clamp(u(x,y), 0, w_t),     if TEXTURE_WRAP_S is CLAMP,
                clamp(u(x,y), -w_t, w_t),  if TEXTURE_WRAP_S is
                                           MIRROR_CLAMP_EXT, or
                u(x,y),                    otherwise;
      v'(x,y) = clamp(v(x,y), 0, h_t),     if TEXTURE_WRAP_T is CLAMP,
                clamp(v(x,y), -h_t, h_t),  if TEXTURE_WRAP_T is
                                           MIRROR_CLAMP_EXT, or
                v(x,y),                    otherwise;
      w'(x,y) = clamp(w(x,y), 0, d_t),     if TEXTURE_WRAP_R is CLAMP,
                clamp(w(x,y), -d_t, d_t),  if TEXTURE_WRAP_R is
                                           MIRROR_CLAMP_EXT, or
                w(x,y),                    otherwise,

    where clamp(<a>,<b>,<c>) returns <b> if <a> is less than <b>, <c> if <a>
    is greater than <c>, and <a> otherwise.

    (start a new paragraph with "For a polygon, rho is given at a fragment
    with window coordinates...", and then continue with the original spec
    text.)

    (replace text starting with the last paragraph on p. 172, continuing to
    the end of p. 174)

    When lambda indicates minification, the value assigned to
    TEXTURE_MIN_FILTER is used to determine how the texture value for a
    fragment is selected.

    When TEXTURE_MIN_FILTER is NEAREST, the texel in the image array of level
    level_base that is nearest (in Manhattan distance) to that specified by
    (s,t,r) is obtained.
    Let i, j, and k be integers such that:

      i = apply_wrap(floor(u'(x,y))),
      j = apply_wrap(floor(v'(x,y))), and
      k = apply_wrap(floor(w'(x,y))),

    where the coordinate returned by apply_wrap() is as defined by Table X.19.
    The values of i, j, and k are then modified according to the texture wrap
    modes, as described in Table 3.19, to produce new values (i', j', and k').
    For a three-dimensional texture, the texel at location (i,j,k) becomes the
    texture value. For a two-dimensional texture, k is irrelevant, and the
    texel at location (i,j) becomes the texture value. For a one-dimensional
    texture, j and k are irrelevant, and the texel at location i becomes the
    texture value.

      Wrap mode                   Result
      --------------------------  ------------------------------------------
      CLAMP_TO_EDGE               clamp(coord, 0, size-1)
      CLAMP_TO_BORDER             clamp(coord, -1, size)
      CLAMP                       { clamp(coord, 0, size-1),
                                  {     for NEAREST filtering
                                  { clamp(coord, -1, size),
                                  {     for LINEAR filtering
      REPEAT                      mod(coord, size)
      MIRROR_CLAMP_TO_EDGE_EXT    clamp(mirror(coord), 0, size-1)
      MIRROR_CLAMP_TO_BORDER_EXT  clamp(mirror(coord), 0, size)
      MIRROR_CLAMP_EXT            { clamp(mirror(coord), 0, size-1),
                                  {     for NEAREST filtering
                                  { clamp(mirror(coord), 0, size),
                                  {     for LINEAR filtering
      MIRRORED_REPEAT             (size-1) - mirror(mod(coord, 2*size)-size)

      Table X.19: Texel location wrap mode application. mod(<a>,<b>) is
      defined to return <a>-<b>*floor(<a>/<b>), and mirror(<a>) is defined to
      return <a> if <a> is greater than or equal to zero or -(1+<a>)
      otherwise. The values of "wrap mode" and size are TEXTURE_WRAP_S and
      w_t, TEXTURE_WRAP_T and h_t, and TEXTURE_WRAP_R and d_t, for i, j, and k
      coordinates, respectively. The coordinate clamp for CLAMP and
      MIRROR_CLAMP_EXT depends on the filtering mode (NEAREST or LINEAR).
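    A minimal C sketch of apply_wrap() for a single axis, following Table
    X.19 for the wrap modes whose result does not depend on the filter. The
    WrapMode enum and the helper names are hypothetical stand-ins for the GL
    tokens, and CLAMP and MIRROR_CLAMP_EXT are deliberately omitted because
    their result also depends on NEAREST versus LINEAR filtering. The one
    subtle step is mod(): C's % operator truncates toward zero, so a
    negative remainder is shifted up to match the floor-based definition
    given in the table caption.

```c
#include <assert.h>

/* Hypothetical stand-ins for the GL wrap-mode tokens in Table X.19.
 * CLAMP and MIRROR_CLAMP_EXT are omitted: they also depend on the filter. */
typedef enum {
    WRAP_CLAMP_TO_EDGE,
    WRAP_CLAMP_TO_BORDER,
    WRAP_REPEAT,
    WRAP_MIRROR_CLAMP_TO_EDGE,
    WRAP_MIRROR_CLAMP_TO_BORDER,
    WRAP_MIRRORED_REPEAT
} WrapMode;

/* clamp(a,b,c): b if a < b, c if a > c, otherwise a. */
static int clamp_i(int a, int lo, int hi)
{
    return a < lo ? lo : (a > hi ? hi : a);
}

/* mod(a,b) = a - b*floor(a/b); unlike C's %, never negative for b > 0. */
static int mod_i(int a, int b)
{
    int m = a % b;
    return m < 0 ? m + b : m;
}

/* mirror(a) = a if a >= 0, otherwise -(1+a). */
static int mirror_i(int a)
{
    return a >= 0 ? a : -(1 + a);
}

/* Map an integer texel coordinate to a texel location on one axis, where
 * size is w_t, h_t, or d_t for the i, j, or k coordinate. With no texture
 * border, results of -1 or size refer to border texels. */
static int apply_wrap(int coord, int size, WrapMode mode)
{
    assert(size > 0);
    switch (mode) {
    case WRAP_CLAMP_TO_EDGE:
        return clamp_i(coord, 0, size - 1);
    case WRAP_CLAMP_TO_BORDER:
        return clamp_i(coord, -1, size);
    case WRAP_REPEAT:
        return mod_i(coord, size);
    case WRAP_MIRROR_CLAMP_TO_EDGE:
        return clamp_i(mirror_i(coord), 0, size - 1);
    case WRAP_MIRROR_CLAMP_TO_BORDER:
        return clamp_i(mirror_i(coord), 0, size);
    case WRAP_MIRRORED_REPEAT:
        return (size - 1) - mirror_i(mod_i(coord, 2 * size) - size);
    }
    return coord; /* not reached for valid modes */
}
```

    For example, with size 4, REPEAT maps coordinate -1 to 3, while
    MIRRORED_REPEAT maps 4 to 3 and 7 to 0, reflecting the texture on every
    other repetition.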

    If the selected (i,j,k), (i,j), or i location refers to a border texel
    that satisfies any of the following conditions:

      i < -b_s,
      j < -b_s,
      k < -b_s,
      i >= w_t + b_s,
      j >= h_t + b_s, or
      k >= d_t + b_s,

    then the border values defined by TEXTURE_BORDER_COLOR are used in place
    of the non-existent texel. If the texture contains color components, the
    values of TEXTURE_BORDER_COLOR are interpreted as an RGBA color to match
    the texture's internal format in a manner consistent with table 3.15. If
    the texture contains depth components, the first component of
    TEXTURE_BORDER_COLOR is interpreted as a depth value.

    When TEXTURE_MIN_FILTER is LINEAR, a 2x2x2 cube of texels in the image
    array of level level_base is selected. Let:

      i_0   = apply_wrap(floor(u' - 0.5)),
      j_0   = apply_wrap(floor(v' - 0.5)),
      k_0   = apply_wrap(floor(w' - 0.5)),
      i_1   = apply_wrap(floor(u' - 0.5) + 1),
      j_1   = apply_wrap(floor(v' - 0.5) + 1),
      k_1   = apply_wrap(floor(w' - 0.5) + 1),
      alpha = frac(u' - 0.5),
      beta  = frac(v' - 0.5),
      gamma = frac(w' - 0.5),

    where frac(<x>) denotes the fractional part of <x>.

    For a three-dimensional texture, the texture value tau is found as...

    (replace last paragraph, p.174) For any texel in the equation above that
    refers to a border texel outside the defined range of the image, the texel
    value is taken from the texture border color as with NEAREST filtering.


    Modify Section 3.8.14, Texture Comparison Modes (p. 185)

    (modify 2nd paragraph, p. 188, indicating that the Q texture coordinate is
    used for depth comparisons on cubemap textures)

    Let D_t be the depth texture value, in the range [0, 1]. For
    fixed-function texture lookups, let R be the interpolated <r> texture
    coordinate, clamped to the range [0, 1].
    For texture lookups generated by a program instruction, let R be the
    reference value for depth comparisons provided in the instruction, also
    clamped to [0, 1]. Then the effective texture value L_t, I_t, or A_t is
    computed as follows:


Additions to Chapter 4 of the OpenGL 1.5 Specification (Per-Fragment
Operations and the Frame Buffer)

    None.


Additions to Chapter 5 of the OpenGL 1.5 Specification (Special Functions)

    None.


Additions to Chapter 6 of the OpenGL 1.5 Specification (State and
State Requests)

    Modify Section 6.1.12 of the ARB_vertex_program specification.

    (Add new integer program parameter queries, plus language that program
    environment or local parameter query results are undefined if the query
    specifies a data type incompatible with the data type of the parameter
    being queried.)

    The commands

      void GetProgramEnvParameterdvARB(enum target, uint index,
                                       double *params);
      void GetProgramEnvParameterfvARB(enum target, uint index,
                                       float *params);
      void GetProgramEnvParameterIivNV(enum target, uint index,
                                       int *params);
      void GetProgramEnvParameterIuivNV(enum target, uint index,
                                        uint *params);

    obtain the current value for the program environment parameter numbered
    <index> for the given program target <target>, and place the information
    in the array <params>. The values returned are undefined if the data type
    of the components of the parameter is not compatible with the data type of
    <params>. Floating-point components are compatible with "double" or
    "float"; signed and unsigned integer components are compatible with "int"
    and "uint", respectively. The error INVALID_ENUM is generated if <target>
    specifies a nonexistent program target or a program target that does not
    support program environment parameters.
    The error INVALID_VALUE is generated if <index> is greater than or equal
    to the implementation-dependent number of supported program environment
    parameters for the program target.

    ...

    The commands

      void GetProgramLocalParameterdvARB(enum target, uint index,
                                         double *params);
      void GetProgramLocalParameterfvARB(enum target, uint index,
                                         float *params);
      void GetProgramLocalParameterIivNV(enum target, uint index,
                                         int *params);
      void GetProgramLocalParameterIuivNV(enum target, uint index,
                                          uint *params);

    obtain the current value for the program local parameter numbered <index>
    belonging to the program object currently bound to <target>, and place
    the information in the array <params>. The values returned are undefined
    if the data type of the components of the parameter is not compatible with
    the data type of <params>. Floating-point components are compatible with
    "double" or "float"; signed and unsigned integer components are compatible
    with "int" and "uint", respectively. The error INVALID_ENUM is generated
    if <target> specifies a nonexistent program target or a program target
    that does not support program local parameters. The error INVALID_VALUE
    is generated if <index> is greater than or equal to the
    implementation-dependent number of supported program local parameters for
    the program target.

    ...

    The command

      void GetProgramivARB(enum target, enum pname, int *params);

    obtains program state for the program target <target>, writing ...

    (add new paragraphs describing the new supported queries)

    If <pname> is PROGRAM_ATTRIB_COMPONENTS_NV or
    PROGRAM_RESULT_COMPONENTS_NV, GetProgramivARB returns a single integer
    holding the number of active attribute or result variable components,
    respectively, used by the program object currently bound to <target>.

    If <pname> is MAX_PROGRAM_ATTRIB_COMPONENTS_NV or
    MAX_PROGRAM_RESULT_COMPONENTS_NV, GetProgramivARB returns a single integer
    holding the maximum number of active attribute or result variable
    components, respectively, supported for programs of type <target>.


Additions to Appendix A of the OpenGL 1.5 Specification (Invariance)

    None.


Additions to the AGL/GLX/WGL Specifications

    None.


GLX Protocol

    The following new rendering commands are sent to the server as part
    of a glXRender request.

        ProgramLocalParameterI4ivNV

            2           28              rendering command length
            2           4303            rendering command opcode
            4           ENUM            target
            4           CARD32          index
            4           INT32           params[0]
            4           INT32           params[1]
            4           INT32           params[2]
            4           INT32           params[3]

        ProgramLocalParameterI4uivNV

            2           28              rendering command length
            2           4305            rendering command opcode
            4           ENUM            target
            4           CARD32          index
            4           CARD32          params[0]
            4           CARD32          params[1]
            4           CARD32          params[2]
            4           CARD32          params[3]

        ProgramEnvParameterI4ivNV

            2           28              rendering command length
            2           4307            rendering command opcode
            4           ENUM            target
            4           CARD32          index
            4           INT32           params[0]
            4           INT32           params[1]
            4           INT32           params[2]
            4           INT32           params[3]

        ProgramEnvParameterI4uivNV

            2           28              rendering command length
            2           4309            rendering command opcode
            4           ENUM            target
            4           CARD32          index
            4           CARD32          params[0]
            4           CARD32          params[1]
            4           CARD32          params[2]
            4           CARD32          params[3]

    The following new rendering commands are also added. These can be sent
    as a glXRender or glXRenderLarge request.

        ProgramLocalParametersI4ivNV

            2           16+count*4*4    rendering command length
            2           4304            rendering command opcode
            4           ENUM            target
            4           CARD32          index
            4           CARD32          count
            4*count*4   LISTofINT32     params

            If the command is encoded in a glXRenderLarge request, the
            command opcode and command length fields above are expanded to
            4 bytes each:

            4           20+count*4*4    rendering command length
            4           4304            rendering command opcode

        ProgramLocalParametersI4uivNV

            2           16+count*4*4    rendering command length
            2           4306            rendering command opcode
            4           ENUM            target
            4           CARD32          index
            4           CARD32          count
            4*count*4   LISTofCARD32    params

            If the command is encoded in a glXRenderLarge request, the
            command opcode and command length fields above are expanded to
            4 bytes each:

            4           20+count*4*4    rendering command length
            4           4306            rendering command opcode

        ProgramEnvParametersI4ivNV

            2           16+count*4*4    rendering command length
            2           4308            rendering command opcode
            4           ENUM            target
            4           CARD32          index
            4           CARD32          count
            4*count*4   LISTofINT32     params

            If the command is encoded in a glXRenderLarge request, the
            command opcode and command length fields above are expanded to
            4 bytes each:

            4           20+count*4*4    rendering command length
            4           4308            rendering command opcode

        ProgramEnvParametersI4uivNV

            2           16+count*4*4    rendering command length
            2           4310            rendering command opcode
            4           ENUM            target
            4           CARD32          index
            4           CARD32          count
            4*count*4   LISTofCARD32    params

            If the command is encoded in a glXRenderLarge request, the
            command opcode and command length fields above are expanded to
            4 bytes each:

            4           20+count*4*4    rendering command length
            4           4310            rendering command opcode

    The remaining commands are non-rendering commands.
    These commands are sent separately (i.e., not as part of a glXRender or
    glXRenderLarge request), using the glXVendorPrivateWithReply request:

        GetProgramLocalParameterIivNV
            1           CARD8           opcode (X assigned)
            1           17              GLX opcode (X_GLXVendorPrivateWithReply)
            2           5               request length
            4           1365            vendor specific opcode
            4           GLX_CONTEXT_TAG context tag
            4           ENUM            target
            4           CARD32          index
          =>
            1           1               reply
            1           CARD8           unused
            2           CARD16          sequence number
            4           4               reply length
            24          CARD32          unused
            16          INT32           params

        GetProgramLocalParameterIuivNV
            1           CARD8           opcode (X assigned)
            1           17              GLX opcode (X_GLXVendorPrivateWithReply)
            2           5               request length
            4           1366            vendor specific opcode
            4           GLX_CONTEXT_TAG context tag
            4           ENUM            target
            4           CARD32          index
          =>
            1           1               reply
            1           CARD8           unused
            2           CARD16          sequence number
            4           4               reply length
            24          CARD32          unused
            16          CARD32          params

        GetProgramEnvParameterIivNV
            1           CARD8           opcode (X assigned)
            1           17              GLX opcode (X_GLXVendorPrivateWithReply)
            2           5               request length
            4           1367            vendor specific opcode
            4           GLX_CONTEXT_TAG context tag
            4           ENUM            target
            4           CARD32          index
          =>
            1           1               reply
            1           CARD8           unused
            2           CARD16          sequence number
            4           4               reply length
            24          CARD32          unused
            16          INT32           params

        GetProgramEnvParameterIuivNV
            1           CARD8           opcode (X assigned)
            1           17              GLX opcode (X_GLXVendorPrivateWithReply)
            2           5               request length
            4           1368            vendor specific opcode
            4           GLX_CONTEXT_TAG context tag
            4           ENUM            target
            4           CARD32          index
          =>
            1           1               reply
            1           CARD8           unused
            2           CARD16          sequence number
            4           4               reply length
            24          CARD32          unused
            16          CARD32          params

Errors

    The error INVALID_VALUE is generated by ProgramLocalParameter4fARB,
    ProgramLocalParameter4fvARB, ProgramLocalParameter4dARB,
    ProgramLocalParameter4dvARB, ProgramLocalParameterI4iNV,
    ProgramLocalParameterI4ivNV,
    ProgramLocalParameterI4uiNV,
    ProgramLocalParameterI4uivNV, GetProgramLocalParameterfvARB,
    GetProgramLocalParameterdvARB, GetProgramLocalParameterIivNV, and
    GetProgramLocalParameterIuivNV if <index> is greater than or equal to the
    number of program local parameters supported by <target>.

    The error INVALID_VALUE is generated by ProgramEnvParameter4fARB,
    ProgramEnvParameter4fvARB, ProgramEnvParameter4dARB,
    ProgramEnvParameter4dvARB, ProgramEnvParameterI4iNV,
    ProgramEnvParameterI4ivNV, ProgramEnvParameterI4uiNV,
    ProgramEnvParameterI4uivNV, GetProgramEnvParameterfvARB,
    GetProgramEnvParameterdvARB, GetProgramEnvParameterIivNV, and
    GetProgramEnvParameterIuivNV if <index> is greater than or equal to the
    number of program environment parameters supported by <target>.

    The error INVALID_VALUE is generated by ProgramLocalParameters4fvNV,
    ProgramLocalParametersI4ivNV, and ProgramLocalParametersI4uivNV if the sum
    of <index> and <count> is greater than the number of program local
    parameters supported by <target>.

    The error INVALID_VALUE is generated by ProgramEnvParameters4fvNV,
    ProgramEnvParametersI4ivNV, and ProgramEnvParametersI4uivNV if the sum of
    <index> and <count> is greater than the number of program environment
    parameters supported by <target>.


Dependencies on NV_parameter_buffer_object

    If NV_parameter_buffer_object is not supported, references to program
    parameter buffer variables and bindings should be removed.


Dependencies on ARB_texture_rectangle

    If ARB_texture_rectangle is not supported, references to rectangle
    textures and the RECT and SHADOWRECT texture target identifiers should be
    removed.


Dependencies on EXT_gpu_program_parameters

    If EXT_gpu_program_parameters is not supported, references to the
    Program{Local,Env}Parameters4fvNV commands, which set multiple program
    local or environment parameters in a single call, should be removed.
    These prototypes were included in this spec for completeness only.


Dependencies on EXT_texture_integer

    If EXT_texture_integer is not supported, references to texture lookups
    returning integer values in Section 2.X.4.4 (Texture Access) should be
    removed, and all texture formats are considered to produce floating-point
    values.


Dependencies on EXT_texture_array

    If EXT_texture_array is not supported, references to array textures in
    Section 2.X.4.4 (Texture Access) and elsewhere should be removed, as
    should all references to the "ARRAY1D", "ARRAY2D", "SHADOWARRAY1D", and
    "SHADOWARRAY2D" tokens.


Dependencies on EXT_texture_buffer_object

    If EXT_texture_buffer_object is not supported, references to buffer
    textures in Section 2.X.4.4 (Texture Access) and elsewhere should be
    removed, as should all references to the "BUFFER" token.


Dependencies on NV_primitive_restart

    If NV_primitive_restart is supported, index values causing a primitive
    restart are not considered as specifying an End command, followed by
    another Begin. Primitive restart is therefore not guaranteed to
    immediately update bindings for material properties changed inside a
    Begin/End. The spec language says they "are not guaranteed to update
    program parameter bindings until the following End command."


New State

                                                         Initial
    Get Value                     Type  Get Command      Value    Description            Sec     Attrib
    ----------------------------  ----  ---------------  -------  ---------------------  ------  ------
    PROGRAM_ATTRIB_COMPONENTS_NV  Z+    GetProgramivARB  -        number of components   6.1.12  -
                                                                  used for attributes
    PROGRAM_RESULT_COMPONENTS_NV  Z+    GetProgramivARB  -        number of components   6.1.12  -
                                                                  used for results

    Table X.20. New Program Object State. Program object queries return
    attributes of the program object currently bound to the program target
    <target>.


New Implementation Dependent State

                                                             Minimum
    Get Value                         Type  Get Command      Value    Description            Sec.     Attrib
    --------------------------------  ----  ---------------  -------  ---------------------  -------  ------
    MIN_PROGRAM_TEXEL_OFFSET_EXT      Z     GetIntegerv      -8       minimum texel offset   2.X.4.4  -
                                                                      allowed in lookup
    MAX_PROGRAM_TEXEL_OFFSET_EXT      Z     GetIntegerv      +7       maximum texel offset   2.X.4.4  -
                                                                      allowed in lookup
    MAX_PROGRAM_ATTRIB_COMPONENTS_NV  Z+    GetProgramivARB  (*)      maximum number of      6.1.12   -
                                                                      components allowed
                                                                      for attributes
    MAX_PROGRAM_RESULT_COMPONENTS_NV  Z+    GetProgramivARB  (*)      maximum number of      6.1.12   -
                                                                      components allowed
                                                                      for results
    MAX_PROGRAM_GENERIC_ATTRIBS_NV    Z+    GetProgramivARB  (*)      number of generic      6.1.12   -
                                                                      attribute vectors
                                                                      supported
    MAX_PROGRAM_GENERIC_RESULTS_NV    Z+    GetProgramivARB  (*)      number of generic      6.1.12   -
                                                                      result vectors
                                                                      supported
    MAX_PROGRAM_CALL_DEPTH_NV         Z+    GetProgramivARB  4        maximum program        2.X.5    -
                                                                      call stack depth
    MAX_PROGRAM_IF_DEPTH_NV           Z+    GetProgramivARB  48       maximum program        2.X.5    -
                                                                      if nesting
    MAX_PROGRAM_LOOP_DEPTH_NV         Z+    GetProgramivARB  4        maximum program        2.X.5    -
                                                                      loop nesting

    Table X.21: New Implementation-Dependent Values Introduced by
    NV_gpu_program4. (*) means that the required minimum is program
    type-specific.
    There are separate limits for each program type.


Issues

    (1) How does this extension differ from previous NV_vertex_program and
    NV_fragment_program extensions?

      RESOLVED:

        - This extension provides a uniform set of instructions and bindings.
          Unlike previous extensions, the set of instructions and bindings
          available is generally the same for all program types. The only
          exceptions are a small number of instructions and bindings that
          make sense for one specific program type.

        - This extension supports integer data types and provides a
          full-fledged integer instruction set.

        - This extension supports array variables of all types, including
          temporaries. Array variables can be accessed directly or indirectly
          (using integer temporaries as indices).

        - This extension provides a uniform set of structured branching
          constructs (if tests, loops, subroutines) that fully support
          run-time condition testing. Previous versions of NV_vertex_program
          provided unstructured branching. Previous versions of
          NV_fragment_program provided structured branching constructs, but
          the support was more limited -- for example, looping constructs
          couldn't specify loop counts with values computed at run time.

        - This extension supports geometry programs, which are described in
          more detail in the NV_geometry_program4 extension.

        - This extension provides the ability to specify and use cubemap
          textures with a DEPTH_COMPONENT internal format. Shadow mapping is
          supported; the Q texture coordinate is used as the reference value
          for comparisons.

    (2) Is this extension backward-compatible with previous NV_vertex_program
    and NV_fragment_program extensions? If not, what support has been
    removed?

      RESOLVED: This extension is largely, but not completely,
      backward-compatible.
      Functionality removed includes:

        - Unstructured branching: NV_vertex_program2 included a general
          branch instruction "BRA" that could be used to jump to an arbitrary
          instruction. The "CAL" instruction could "call" an arbitrary
          instruction, entering code that was not necessarily structured as
          simple subroutine blocks. Arbitrary unstructured branching can be
          difficult to implement efficiently on highly parallel GPU
          architectures, while basic structured branching is not nearly as
          difficult.

          This extension retains the "CAL" instruction but treats each block
          of code between instruction labels as a separate subroutine. The
          "BRA" instruction and arbitrary branching have been removed. The
          structured branching constructs in this extension are sufficient to
          implement almost all of the looping/branching support in high-level
          languages ("goto" being the most obvious exception).

        - Address registers: NV_vertex_program added the notion of address
          registers, which were effectively under-powered integer temporaries.
          The set of instructions used to manipulate address registers was
          severely limited. NV_vertex_program[23] extended the original
          scalars to vectors and added a few more instructions to manipulate
          address registers. Fragment programs had no address registers until
          NV_fragment_program2 added the loop counter, which was very similar
          in functionality to vertex program address registers, but even more
          limited. This extension adds true integer temporaries, which can
          accomplish everything old address registers could do, and much more.
          Address register support was removed to simplify the API.

        - NV_fragment_program2 LOOP construct: NV_fragment_program2 added a
          LOOP instruction, which let you repeat a block of code <N> times,
          with a parallel loop counter that started at <A> and stepped by <B>
          on each iteration.
          This construct was significantly limited in several ways -- the
          loop count had to be constant, and you could only access the
          innermost loop counter in a nested loop. This extension discards
          LOOP and retains the simpler "REP" construct to implement loops. If
          desired, a loop counter can be implemented by manipulating an
          integer temporary. The "BRK" instruction (conditional break) is
          retained, and a "CONT" instruction (conditional continue) is added.
          Additionally, the loop count need not be a constant.

        - NV_vertex_program and ARB_vertex_program EXP and LOG instructions:
          NV_vertex_program provided EXP and LOG instructions that computed a
          rough approximation of 2^x or log_2(x) and provided some additional
          values that could help refine the approximation. Those opcodes were
          carried forward into ARB_vertex_program. Both ARB_vertex_program
          and NV_vertex_program2 provided EX2 and LG2 instructions that
          computed a better approximation. All fragment program extensions
          also provided EX2 and LG2, but did not bother to include EXP and
          LOG. On the hardware targeted by this extension, there is no
          advantage to using EXP and LOG, so these opcodes have been removed
          for simplicity.

        - NV_vertex_program3 and NV_fragment_program2 provide the ability to
          do indirect addressing of inputs/outputs when using bindings in
          instructions -- for example:

            MOV R0, vertex.attrib[A0.x+2];        # vertex
            MOV result.texcoord[A0.y], R1;        # vertex
            MOV R2, fragment.texcoord[A0.x];      # fragment

          This extension provides indexing capability, but using named array
          variables instead.

            ATTRIB attribs[] = { vertex.attrib[2..5] };
            MOV R0, attribs[A0.x];
            OUTPUT outcoords[] = { result.texcoord[0..3] };
            MOV outcoords[A0.y], R1;
            ATTRIB texcoords[] = { fragment.texcoord[0..2] };
            MOV R2, texcoords[A0.x];

          This approach makes the set of attribute and result bindings more
          regular. Additionally, it helps the assembler determine which
          vertex/fragment attributes are actually needed -- when the assembler
          sees constructs like "fragment.texcoord[A0.x]", it must treat *all*
          texture coordinates as live unless it can determine the range of
          values used for indexing. The named array variable approach
          explicitly identifies which attributes are needed when indexing is
          used.

      Functionality altered includes:

        - The RSQ instruction in the original NV_vertex_program and
          ARB_vertex_program extensions implicitly took the absolute value of
          its operand. Since the ARB extensions don't have numerics
          guarantees, computing the reciprocal square root of a negative value
          was not meaningful. To allow for the possibility of taking the
          reciprocal square root of a negative value (which should yield NaN
          -- "not a number"), the RSQ instruction in this extension no
          longer implicitly takes the absolute value of its operand.
          Equivalent functionality can be achieved using the explicit |abs|
          absolute value operator on the operand to RSQ.

        - The results of texture lookups accessing inconsistent textures are
          now undefined, instead of producing a fixed constant vector.


    (3) What should this set of extensions be called?

      RESOLVED: NV_gpu_program4, NV_vertex_program4, NV_fragment_program4,
      and NV_geometry_program4. Only NV_gpu_program4 will appear in the
      extension string; the other three specifications exist simply to define
      vertex, fragment, and geometry program-specific features.

      The "gpu_program" name was chosen due to the common instruction set
      intended to run on GPUs. On previous chip generations, the vertex and
      fragment instruction sets were similar, but there were enough
      differences to package them separately.

      The choice of "4" indicates that this is the fourth generation of
      programmable hardware from NVIDIA. The GeForce3 and GeForce4 series
      supported NV_vertex_program. The GeForce FX series supported
      NV_vertex_program2 and added fragment programmability with
      NV_fragment_program. Around this time, the OpenGL Architecture Review
      Board (ARB) approved ARB_vertex_program and ARB_fragment_program
      extensions, and NVIDIA added NV_vertex_program2_option and
      NV_fragment_program_option extensions exposing GeForce FX features using
      the ARB extensions' instruction set. The GeForce6 and GeForce7 series
      brought the NV_vertex_program3 and NV_fragment_program2 extensions,
      which extended the ARB extensions further. This extension adds geometry
      programs, and brings the "version number" for each of these extensions
      up to "4".


    (4) This extension adds integer data type support in programmable
    shaders that were previously float-centric. Should applications be able
    to pass integer values directly to the shaders, and if so, how does it
    work?

      RESOLVED: The diagram at the bottom of this issue depicts data flows in
      the GL, as extended by this and related extensions.

      This extension generalizes some state to be "typeless", instead of being
      strongly typed (and almost invariably floating-point) as in the core
      specification. We introduce a new set of functions to specify GL state
      as signed or unsigned integer values, instead of floating-point values.
      These functions include:

        * VertexAttribI*{i,ui}() -- Specify generic vertex attributes as
          integers.
        This extension does not create "integer" versions for
        fixed-function attribute functions (e.g., glColor, glTexCoord),
        which remain fully floating-point.

      * Program{Env,Local}ParameterI*{i,ui}() -- Specify environment and
        local parameters as integers.

      * TexImage*() with EXT_texture_integer internal formats -- Specify
        texture images as containing integer data whose values are not
        converted to floating-point values.

      * NV_parameter_buffer_object functions -- Bind (typeless) buffer
        object data stores for use as program parameters. These buffer
        objects can be loaded with either integer or floating-point data.

      * EXT_texture_buffer_object functions -- Bind (typeless) buffer object
        data stores for use as textures. These buffer objects can be loaded
        with either integer or floating-point data.

      Each type of program (using NV_gpu_program4 and related extensions) can
      read attributes using any data type (float, signed integer, unsigned
      integer) and write result values used by subsequent stages using any
      data type.

      Finally, there are several new places where integer data can be
      consumed by the GL:

      * NV_transform_feedback -- Stream transformed vertex attribute
        components to a (typeless) buffer object. The transformed
        attributes can be written as signed or unsigned integers in vertex
        and geometry programs.

      * EXT_texture_integer internal formats and framebuffer objects --
        Provide support for rendering to integer texture formats, where
        final fragment values are treated as signed or unsigned integers,
        rather than floating-point values.

      The diagram below represents a substantial portion of the GL pipeline.
      Each line connecting blocks represents an interface where data is
      "produced" from the GL state or by fixed-function or programmable
      pipeline stages and "consumed" by another pipeline stage. Each producer
      and consumer is labeled with a data type. For producers, the
      "(typeless)" designation generally means that the state and/or output
      can be written as floating-point values or as signed or unsigned
      integers. "(float)" means that the outputs are always written as
      floating-point. The same distinction applies to consumers --
      "(typeless)" means that the consumer is capable of reading inputs using
      any data type, and "(float)" means that the consumer always reads
      inputs as floating-point values.

      To get meaningful results, applications must ensure that each value
      passed between pipeline stages is produced and consumed using the same
      data type. If a value is written in one stage as a floating-point value,
      it must be read as a floating-point value as well. If such a value is
      read as a signed or unsigned integer, its value is considered undefined.
      In practice, the raw bits used to represent the floating-point value
      (IEEE single-precision floating-point encoding in the initial
      implementation of this spec) will be treated as an integer.

      Type matching between stages is not enforced by the GL, because the
      overhead of doing so would be substantial. Such overhead would include:

      * matching the inputs and outputs of each pipeline stage
        (fixed-function or programmable) every time the program
        configuration or fixed-function state changes,

      * tracking the data type of each generic vertex attribute and checking
        it against the vertex program's inputs,

      * tracking the data type of each program parameter and checking it
        against the manner in which the parameters were used in programs,

      * matching color buffers against fragment program outputs.
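
      The raw-bits behavior described above can be pictured with a small C
      sketch. It assumes the host "float" is the same 32-bit IEEE
      single-precision format used by the initial implementation, and the
      helper names float_bits_as_uint() and float_bits_as_int() are
      hypothetical, not GL entry points. The GL itself only promises an
      undefined value in this case; the sketch shows what that value tends
      to look like in practice.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helpers, not part of the GL API. They model a consumer
 * reading, as an integer, a value that a producer wrote as a float.
 * Assumes float is a 32-bit IEEE single-precision type. */
uint32_t float_bits_as_uint(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof u);   /* copy the bit pattern; no conversion */
    return u;
}

int32_t float_bits_as_int(float f)
{
    int32_t i;
    memcpy(&i, &f, sizeof i);   /* the float's sign bit becomes the int's */
    return i;
}
```

      For example, 1.0 written as a float and read back as an unsigned
      integer arrives as 0x3F800000 (1065353216), not 1.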

      Such error checking is certainly valuable, but the additional CPU
      overhead cost is substantial. Given that current CPUs often have a hard
      time keeping up with high-end GPUs, adding more overhead is a step in
      the wrong direction. We expect developer tools, such as instrumented
      drivers, to be able to provide type checking on most interfaces.

      The diagram below depicts assembly programmability. Using vertex,
      geometry, and fragment shaders provided by the OpenGL Shading Language
      (GLSL) isn't substantially different from the assembly interface, except
      that the interfaces between programmable pipeline stages are more
      tightly coupled in GLSL (vertex, geometry, and fragment shaders are
      linked together into a single program object), and that shader variables
      are more strongly typed in GLSL than in the assembly interface.

      In the figure below, the first programmable stage is vertex program
      execution. All inputs read by the vertex program must be specified in
      the GL vertex APIs (immediate mode or vertex arrays) using a data type
      matching the data type read by the shader. Additionally, vertex
      programs (and all other program types) can read program parameters,
      parameter buffers, and textures. In all cases the parameter, buffer,
      or texture data must be accessed in the shader using the same data
      type used to specify the data. If vertex programs are disabled,
      fixed-function vertex processing is used. Fixed-function vertex
      processing is fully floating-point, and all the conventional vertex
      attributes and state used by fixed-function are floating-point values.

      After vertex processing, an optional geometry program can be executed,
      which reads attributes written by vertex programs (or fixed-function) and
      writes out new vertex attributes.
      The vertex attributes it reads must have been written by the vertex
      program (or fixed-function) using a matching data type.

      After geometry program execution, vertex attributes can optionally be
      written out to buffer objects using the NV_transform_feedback extension.
      The vertex attributes are written by the GL to the buffer objects using
      the same data type used to write the attribute in the geometry program
      (or vertex program if geometry programs are disabled).

      Then, rasterization generates fragments based on transformed vertices.
      Most attributes written by vertex or geometry programs can be read by
      fragment programs, after the rasterization hardware "interpolates" them.
      This extension allows fragment programs to control how each attribute is
      interpolated. If an attribute is flat-shaded, it will be taken from the
      output attribute of the provoking vertex of the primitive using the same
      data type. If an attribute is smooth-shaded, the per-vertex attributes
      will be interpreted as floating-point values, and the interpolated
      result will be floating-point. One necessary consequence of this is
      that any integer per-fragment attributes must be flat-shaded. To
      prevent some interpolation type errors, assembly and GLSL fragment
      shaders will not compile if they declare an integer fragment attribute
      that is not flat-shaded. [NOTE: While point primitives generally have
      constant attributes, any integer attributes must still be flat-shaded;
      point rasterization may perform (degenerate) floating-point
      interpolation.]

      Fragment programs must read attributes using data types matching the
      outputs of the interpolation or flat-shading operations. They may write
      one or more color outputs using any data type, but the data type used
      must match the corresponding framebuffer attachments.
      Outputs directed at signed or unsigned integer textures
      (EXT_texture_integer) must be written using the appropriate integer
      data type; all other outputs must be written as floating-point values.
      Note that some of the fixed-function per-fragment operations (e.g.,
      blending, alpha test) are specified as floating-point operations and
      are skipped when directed at signed or unsigned integer color buffers.


                                  generic           conventional
                                  vertex               vertex
                                attributes           attributes
                                     | (typeless)         | (float)
                                     |                    |
                                     |                    |
                                     |   +----------------+
      program                        |   |                |
      parameters ----+               |   |                |
      (typeless)     |               |   | (typeless)     | (float)
                     |               V   V                V
      constant       +-+----------> vertex         fixed-function
      buffers -------+ |(typeless)  program            vertex
      (typeless)     | |               |                  |
                     | |               | (typeless)       | (float)
      textures ------+ |               V                  |
      (typeless)     | |               |<-----------------+
                     | |               |
                     | |               +-------------+
                     | |               |             |
                     | |               | (typeless)  |
                     | |               V             |
                     | +----------> geometry         |
                     | |(typeless)  program          |
                     | |               |             |
                     | |               | (typeless)  |
                     | |               V             |
                     | |               |<------------+
                     | |               |
                     | |               |
                     | |               +---------------+
                     | |               |               | (typeless)
                     | |               |               V
                     | |               |           transform
                     | |               |           feedback
                     | |               |            buffers
                     | |               |
                     | |               |
                     | |               +-------------------+
                     | |               |                   |
                     | |               | (float)           | (typeless)
                     | |               V                   V
                     | |         interpolated            flat
                     | |          attributes          attributes
                     | |               |                   |
                     | |               | (float)           | (typeless)
                     | |               V                   |
                     | |               |<------------------+
                     | |               |
                     | |               +-------------------+
                     | |               |                   |
                     | |               | (typeless)        | (float)
                     | |(typeless)     V                   V
                     | +----------> fragment +-----> fixed-function
                     |              program  | (float)  fragment
                     |                 |     |             |
                     +----------------/|/----+             |
                                       |                   |
                                       | (typeless)        | (float)
                                       V                   |
                                       |<------------------+
                                       |
                                       +-------------------+------ ....
                                       |                   |
                                       | (typeless)        | (typeless)
                                       V                   V
                                     color               color
                                  attachment          attachment
                                       0                   1


    (5) Instructions can operate on signed integer, unsigned integer, and
    floating-point values. Some operations make sense on all three data
    types. How is this supported, and what type checking support is provided
    by the assembler?

      RESOLVED: One important property of the instruction set is that the
      data type for all operands and the result is fully specified by the
      instructions themselves. For instructions (such as ADD) that make sense
      for both integer and floating-point values, an optional data type
      modifier is provided to indicate which type of operation should be
      performed. For example, "ADD.S", "ADD.U", and "ADD.F" add signed
      integers, unsigned integers, or floating-point values, respectively. If
      no data type modifier is provided, ".F" is assumed if the instruction
      can apply to floating-point values and ".S" is assumed otherwise.

      To help identify errors where the wrong data type is used -- for
      example, adding integer values in an ADD instruction that omits a data
      type modifier and thus defaults to "ADD.F" -- variables may be declared
      with optional data type modifiers. In the following code:

        INT TEMP a;
        UINT TEMP b;
        FLOAT TEMP c;
        TEMP d;

      "a", "b", "c", and "d" are declared as temporary variables holding
      signed integer, unsigned integer, floating-point, and typeless values,
      respectively. Since each instruction fully specifies the data type of
      each operand and its result, these data types can be checked against
      the data type assigned to the variables operated on. If the types
      don't match, and the variable is not typeless, an error is reported.
      The opcode modifier ".NTC" can be used to ignore such errors on a
      per-opcode basis, if required.
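
      To see why the opcode, rather than the register, has to carry the data
      type, here is a small C model that applies an integer add and a
      floating-point add to the same 32-bit register contents. The names Reg,
      add_int(), and add_float() are hypothetical. The sketch assumes integer
      adds wrap modulo 2^32, so ADD.S and ADD.U produce the same result bits
      and differ only in typing and condition-code flags, which it does not
      model.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical model of one untyped 32-bit register component.
 * Assumes float is a 32-bit IEEE single-precision type. */
typedef uint32_t Reg;

static float reg_as_float(Reg r)
{
    float f;
    memcpy(&f, &r, sizeof f);
    return f;
}

static Reg reg_from_float(float f)
{
    Reg r;
    memcpy(&r, &f, sizeof r);
    return r;
}

/* ADD.S / ADD.U: integer add, wrapping modulo 2^32. Unsigned arithmetic
 * is used so that overflow is well defined in C. */
Reg add_int(Reg a, Reg b)
{
    return (Reg)(a + b);
}

/* ADD.F (also the default for ADD without a type modifier): the same
 * register bits are interpreted as IEEE floats. */
Reg add_float(Reg a, Reg b)
{
    return reg_from_float(reg_as_float(a) + reg_as_float(b));
}
```

      With both operands holding 0x3F800000 (the bits of 1.0), add_int()
      yields 0x7F000000 while add_float() yields 0x40000000 (2.0). An integer
      variable passed to an unmodified ADD runs into exactly this mismatch,
      which the INT, UINT, and FLOAT declarations let the assembler report.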

      Note that when bindings are used directly in instructions, they are
      always considered typeless for simplicity. Some fixed-function bindings
      have an obvious data type, but other bindings (e.g., program parameters)
      can hold either integer or floating-point values, depending on how they
      were specified.

      Variable data types are optional. Typeless variables are provided
      because some programs may want to reuse the same variable in several
      places with different data types.

    (6) Should both signed (INT) and unsigned integer (UINT) data types be
    provided?

      RESOLVED: Yes. Signed and unsigned integer operations are supported.
      Providing both "INT" and "UINT" variable modifiers distinguishes between
      signed and unsigned values for type checking purposes, to ensure that
      unsigned values aren't read as signed values and vice versa.

      This specification says that if a value is read as a signed integer,
      but was written as an unsigned integer, the value returned is
      undefined. However, signed and unsigned integers are interchangeable in
      practice, except for very large unsigned integers (which can't be
      represented as signed values of the equivalent size) or negative signed
      integers.

      If programs know that they won't generate negative or very large values,
      signed and unsigned integers can be used interchangeably. To avoid type
      errors in the assembler in this case, typeless variables can be used.
      Alternatively, the ".NTC" modifier can be used where appropriate.

    (7) Integer and floating-point constants are supported in the instruction
    set. Integer constants might be interpreted to mean either "real integer"
    values or floating-point values. How are they supported?

      RESOLVED: When an obvious floating-point constant is specified (e.g.,
      "3.0"), the developer's intent is clear.
      If you try to use a floating-point value in an instruction that wants an
      integer operand, or a declaration of an integer parameter variable, the
      program will fail to load. An integer constant used in an instruction
      isn't quite as clear. But its meaning can be easily inferred because
      the operand types of instructions are well-known at compile time. An
      integer multiply involving the constant "2" will interpret the "2" as an
      integer. A floating-point multiply involving the same constant "2" will
      interpret it as a floating-point value.

      The only real problem is for a parameter declaration that is typeless.
      For typed variables, the intent is clear:

        INT PARAM two = 2;          # use integer 2
        FLOAT PARAM twoPt0 = 2;     # use floating-point 2.0

      For typeless variables, there's no context to go on:

        PARAM two = 2;              # 2? 2.0?

      This extension is intended to be largely upward-compatible with
      ARB_vertex_program, ARB_fragment_program, and the other extensions built
      on top of them. In all of these, the previous declaration is legal and
      means "2.0". For compatibility, we choose to interpret integer
      constants in this case as floating-point values. The assembler in the
      NVIDIA implementation will issue a warning if this case ever occurs.

      This extension does not provide decoration of integer constant values --
      we considered adding suffixed integers such as "2U" to mean "2, and
      don't even think about converting me to a float!". We expect that it
      will be sufficient to use the "INT" or "FLOAT" modifiers to disambiguate
      effectively.

    (8) Should hexadecimal constants (e.g., 0x87A3 or 0xFFFFFFFF) be supported?

      RESOLVED: Yes.

    (9) Should we provide data type modifiers with explicit component sizes?
    For example, "INT8", "FLOAT16", or "INT32".
    If so, should we provide a mechanism to query the size (in bits) of a
    variable, or of different variable types/qualifiers?

      RESOLVED: No.

    (10) Should this extension provide better support for array variables?

      RESOLVED: Yes; array variables of all types are allowed.

      In ARB_vertex_program, program parameter (constant) variables could be
      addressed as arrays. Temporary variables, vertex attributes, and vertex
      results could not be declared as arrays.

      In NV_vertex_program3 and NV_fragment_program2, relative addressing was
      supported in program bindings:

        MOV R0, vertex.attrib[A0.x];        # vertex
        MOV result.texcoord[A0.x], R0;      # vertex
        MOV R0, fragment.texcoord[A0.x];    # fragment -- inside LOOP

      Explicitly declared attribute or result arrays were not supported, and
      temporaries could not be arrays either.

      This extension allows users to declare attribute, result, and temporary
      arrays such as:

        ATTRIB attribs[] = { vertex.attrib[7..11] };
        TEMP scratch[10];
        OUTPUT texcoords[] = { result.texcoord[0..3] };

      Additionally, the relative addressing mechanisms provided by
      NV_vertex_program3 and NV_fragment_program2 are NOT supported in this
      extension -- instead, declared array variables are the only way to get
      relative addressing. Using declared arrays allows the assembler to
      identify which attributes will actually be used. An expression like
      "vertex.texcoord[A0.x]" doesn't identify which texture coordinates are
      referenced, and the assembler must be conservative in this case and
      assume that they all are.

    (11) Is relative addressing of temporaries allowed?

      RESOLVED: Yes. However, arrays of temporaries may end up being stored
      in off-chip memory, and may be slower to access than non-array
      temporaries.

    (12) Should this extension add bindings to pass generic attributes between
    vertex, geometry, and fragment programs, or are texture coordinates
    sufficient?

      RESOLVED: While texture coordinates have been used in the past, generic
      attributes should be provided.

      The assembler provides a large set of bindings and automatically
      eliminates generic attributes or components that are unused. At each
      interface between programs, there is an implementation-dependent limit
      on the number of attribute components that can be passed.

      There are several reasons that this approach was chosen. First, if the
      number of attributes that can be passed between program stages exceeds
      the number of texture coordinate sets supported when specifying
      vertices, a second implementation-dependent number of texture
      coordinates would need to be exposed to cover the number supported
      between stages. Second, the mechanisms described above reduce or
      eliminate the need to pack attributes into four-component vectors.
      Third, "texture coordinates" that have historically been used for
      texture lookups don't need to be used to pass values that aren't used
      this way.

    (13) The structured branching support in NV_fragment_program2 provides a
    REP instruction that says to repeat a block of code <N> times, as well as
    a LOOP instruction that does the same, but also provides a special loop
    counter variable. What sort of looping mechanism should we provide here?

      RESOLVED: Provide only the REP instruction. The functionality provided
      by the LOOP instruction can be easily achieved by using an integer
      temporary as the loop index.
      This avoids two annoyances of the old LOOP model: (a) the loop index
      (A0.x) is a special variable name, while all other variables are
      declared normally, and (b) instructions can only access the loop index
      of the innermost loop -- loop indices at higher nesting levels are not
      accessible.

      One other option was considered -- a "LOOPV" instruction (LOOP with a
      variable), where the program specified a variable name and component to
      hold the loop index, instead of using the implicit variable name "A0.x".
      In the end, it was decided that using an integer temporary as a loop
      counter was sufficient.

    (14) The structured branching support in NV_fragment_program2 provides a
    REP instruction that requires a loop count. Some looping constructs may
    not have a definite loop count, such as a "while" statement in C. Should
    this construct be supported, and if so, how?

      RESOLVED: The REP instruction is extended to make the loop count
      optional. If no loop count is provided, the REP instruction specifies a
      loop that can only be exited using the BRK (break) or RET instructions.
      To avoid obvious infinite loops, an error will be reported if a
      REP/ENDREP block contains no BRK instruction at the current nesting
      level and no RET instruction at any nesting level.

      To implement a loop like "while (value < 7.0) ...", code such as the
      following can be used:

        TEMP cc;                      # dummy variable
        REP;
          SLT.CC cc.x, value.x, 7.0;  # compare value.x to 7.0, set CC0
          BRK EQ.x;                   # break out if not true (cc.x == 0)
          ...
          ...                         # presumably update value!
          ...
        ENDREP;

    (15) The structured branching support in NV_fragment_program2 provides a
    BRK instruction that operates like C's "break" statement. Should we
    provide something similar to C's "continue" statement, which skips to the
    next iteration of the loop?

      RESOLVED: Yes, a new CONT opcode is provided for this purpose.

    (16) Can the BRK or CONT instructions break out of multiple levels of
    nested loops at once?

      RESOLVED: No. BRK and CONT only affect the current (innermost) loop
      nesting level. To break out of multiple levels of nested loops,
      multiple BRK/CONT instructions are required.

    (17) For REP instructions, is the loop counter reloaded on each iteration
    of the loop?

      RESOLVED: No. The loop counter is loaded once at the top of the loop,
      compared to zero at the top of the loop, and decremented when each loop
      iteration completes. A program may overwrite the variable used to
      specify the initial value of the loop counter inside the loop without
      affecting the number of times the loop body is executed.

    (18) How are floating-point values represented in this extension? What
    about floating-point arithmetic operations?

      RESOLVED: In the initial hardware implementation of this extension,
      floating-point values are represented using the standard 32-bit IEEE
      single-precision encoding, consisting of a sign bit, 8 exponent bits,
      and 23 mantissa bits. Special encodings for NaN (not a number), +/-INF
      (infinity), and positive and negative zero are supported. Denorms
      (nonzero values with magnitude less than 2^-126, which have an exponent
      encoding of "0" and no implied leading one) are supported, but may be
      flushed to zero, preserving the sign bit of the original value.
      Arithmetic operations are carried out at single-precision using normal
      IEEE floating-point rules, including special rules for generating
      infinities, NaNs, and zeros of each sign.

      Floating-point temporaries declared as "SHORT" may be, but are not
      necessarily, stored as 16-bit "fp16" values (sign bit, five exponent
      bits, ten mantissa bits), as specified in the NV_float_buffer and
      ARB_half_float_pixel extensions.
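
      The single-precision encoding described in (18) can be sketched in C,
      assuming the host "float" uses the same 32-bit IEEE format. The sketch
      splits a value into its sign, 8-bit biased exponent, and 23-bit mantissa
      fields, detects denorms (exponent field 0, nonzero mantissa), and models
      the optional flush-to-zero that keeps the original sign bit. The names
      FloatFields, float_fields(), is_denorm(), and flush_denorm() are
      illustrative only; whether a given implementation actually flushes
      denorms is not something this code can determine.

```c
#include <stdint.h>
#include <string.h>

/* Illustrative decomposition of an IEEE single-precision value.
 * Assumes float is a 32-bit IEEE binary32 type. */
typedef struct {
    uint32_t sign;      /* 1 bit */
    uint32_t exponent;  /* 8-bit biased exponent, 0..255 */
    uint32_t mantissa;  /* 23-bit fraction, no implied leading one */
} FloatFields;

static uint32_t bits_of(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return u;
}

FloatFields float_fields(float f)
{
    uint32_t bits = bits_of(f);
    FloatFields r;
    r.sign = bits >> 31;
    r.exponent = (bits >> 23) & 0xFFu;
    r.mantissa = bits & 0x7FFFFFu;
    return r;
}

/* A denorm has exponent field 0 and a nonzero mantissa, i.e. a nonzero
 * magnitude below 2^-126. */
int is_denorm(float f)
{
    FloatFields r = float_fields(f);
    return r.exponent == 0 && r.mantissa != 0;
}

/* Models the optional flush-to-zero: a denorm becomes a zero that keeps
 * the original sign bit; any other value passes through unchanged. */
float flush_denorm(float f)
{
    if (!is_denorm(f))
        return f;
    uint32_t bits = bits_of(f) & 0x80000000u;   /* keep only the sign */
    float z;
    memcpy(&z, &bits, sizeof z);
    return z;
}
```

      For instance, flush_denorm(-1e-40f) returns a negative zero, while
      normal values such as 1.5f pass through unchanged.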

    (19) Should we provide a method to declare how fragment attributes are
    interpolated? It is possible to have flat-shaded attributes,
    perspective-corrected attributes, and centroid-sampled attributes.

      RESOLVED: Yes. Fragment program attribute variable declarations may
      specify the "FLAT", "NOPERSPECTIVE", and "CENTROID" modifiers.

      These modifiers are documented in detail in the NV_fragment_program4
      specification.

    (20) Should vertex and primitive identifiers be supported? If so, how?

      RESOLVED: A vertex identifier is available as "vertex.id" in a vertex
      program. The vertex ID is equal to the value effectively passed to
      ArrayElement when the vertex is specified, and is defined only if vertex
      arrays are used with buffer objects (VBOs).

      A primitive identifier is available as "primitive.id" in a geometry or
      fragment program. The primitive ID is equal to the number of primitives
      processed since the last implicit or explicit call to glBegin().

      See the NV_vertex_program4 spec for more information on vertex IDs, and
      the NV_geometry_program4 or NV_fragment_program4 specs for more
      information on primitive IDs.

    (21) For integer opcodes, should a bitwise inversion operator "~" be
    provided, analogous to the existing negation operator?

      RESOLVED: No. If this operator were provided, it would allow a program
      to evaluate the expression "a&(~b)" using a single instruction:

        AND.U a, a, ~b;

      Instead, it is necessary to do something like:

        UINT TEMP t;
        NOT.U t, b;
        AND.U a, a, t;

      If necessary, this functionality could be added in a subsequent
      extension.

    (22) What happens if you negate or take the absolute value of the
    biggest-magnitude negative integer?

      RESOLVED: Signed integers are represented using two's complement
      representation.
      For 32-bit integers, the largest possible value is 2^31-1; the
      smallest possible value is -2^31. There is no way to represent 2^31,
      which is what these operators "should" return. The value returned in
      this case is the original value of -2^31.

    (23) How do condition codes work? How are they different from those
    provided in previous NVIDIA extensions?

      RESOLVED: There are two condition codes -- CC0 and CC1 -- each of which
      is a four-component vector. The condition codes are set based on the
      result of an instruction that specifies a condition code update
      modifier. Examples include:

        ADD.S.CC R0, R1, R2;      # add signed integers R1 and R2, update
                                  # CC0 based on the result, write the
                                  # final value to R0
        ADD.F.CC1 R3, R4, R5;     # add floats R4 and R5, update CC1 based
                                  # on the result, write the final value
                                  # to R3
        ADD.U.CC0 R6.xy, R7, R8;  # add unsigned integers R7 and R8, update
                                  # CC0 (x and y components) based on the
                                  # result, write the final value to R6
                                  # (x and y components)

      Condition codes can be used for conditional writes, conditional
      branches, or other operations. The condition codes aren't used
      directly, but are instead used with a condition code test such as "LT"
      (less than) or "EQ" (equal to). Examples include:

        MOV R0 (GT.x), R1;  # move R1 to R0 only if the x component of
                            # CC0 indicates a result of ">0"
        MOV R2 (NE1), R3;   # component-wise move of R3 to R2 if the
                            # corresponding component of CC1
                            # indicates a result of "!=0"
        IF LE0.xyxy;        # execute the block of code if the x or
          ...               # y components of CC0 indicate a result
        ENDIF;              # of "<=0"
        REP;
          ...
          BRK EQ1.xyzx;     # break out of loop if the x, y, or z
        ENDREP;             # components of CC1 indicate a result of
                            # "==0".

      Previous NVIDIA extensions provide eight tests, which are still
      supported here.
      The tests "EQ" (equal), "GE" (greater/equal), "GT" (greater than),
      "LE" (less/equal), "LT" (less than), and "NE" (not equal) can be used
      to determine the relation of the result used to set the condition code
      with zero. The tests "TR" (true) and "FL" (false) are special tests
      that always evaluate to true or false, respectively.

      For floating-point results, a NaN (not a number) encoding causes the
      "NE" condition to evaluate to TRUE and all other conditions to evaluate
      to FALSE. IEEE encodings for "negative" and "positive" zero are both
      treated as equal to zero.

      Condition codes are implemented as a set of flags, which are set
      depending on the type of operation, as described in the spec.

      For instructions that return floating-point or signed integer values,
      the normal condition code tests reliably indicate the relationship of
      the result to zero. For instructions that return unsigned values, the
      condition codes are a bit more complicated. For example, the sign flag
      is set if the most significant bit of the result written is set. As a
      result, very large unsigned integer values (e.g., 0x80000000 through
      0xFFFFFFFF) are effectively treated as negative values. Condition code
      tests should be used with care with unsigned results -- to test if an
      unsigned integer is ">0", use a sequence like:

        MOV.U.CC R0, R1;    # move R1 to R0, set condition code
        IF NE;              # test if the result is "!=0", a very
          ...               # large value might fail "GT"!
        ENDIF;

      This extension provides a number of additional condition code tests
      useful for different floating-point or integer operations:

      * NAN (not a number) is true if a floating-point result is a NaN. LEG
        (less, equal to, or greater) is the opposite of NAN.

      * CF (carry flag) is true if an unsigned add overflows, or if an
        unsigned subtract produces a non-negative value.
        NCF (no carry flag) is the opposite of CF.

      * OF (overflow flag) is true if a signed add or subtract overflows.
        NOF (no overflow flag) is the opposite of OF.

      * SF (sign flag) is true if the sign flag is set. NSF (no sign flag)
        is the opposite of SF.

      * AB (above) is true if an unsigned subtract produces a positive
        result. BLE (below or equal) is the opposite of AB, and is true if
        an unsigned subtract produces a negative result or zero. Note that
        CF can be used to test if the result is greater than or equal to
        zero, and NCF can be used to test if the result is less than zero.

    (24) How do the "set on" instructions (SEQ, SGE, SGT, SLE, SLT, SNE) work
    with integer values and/or condition codes?

      RESOLVED: "Set on" instructions comparing signed and unsigned values
      return zero if the condition is false, and an integer with all bits set
      if the condition is true. If the result is signed, it is interpreted as
      -1. If the result is unsigned, it is interpreted as the largest
      unsigned value (0xFFFFFFFF for 32-bit integers). This is different from
      the floating-point "set on", which is defined to return 1.0.

      This specific result encoding was chosen so that bitwise operators (NOT,
      AND, OR, XOR) can be used to evaluate boolean expressions.

      When performing condition code tests on the results of an integer "set
      on" instruction, keep in mind that a TRUE result has the most
      significant bit set and will be interpreted as a negative value. To
      test if a condition is true, use "NE" (!=0). A condition code test of
      "GT" will always fail if the condition code was written by an integer
      "set on" instruction.

    (25) What new texture functionality is provided?

      RESOLVED: Several new features are provided.

      First, the TXF (texel fetch) instruction allows programs to access a
      texture map like a normal array.
      Integer coordinates identifying an individual texel and LOD are
      provided, and the corresponding texture data is returned without
      filtering of any type.

      Second, the TXQ (texture size query) instruction allows programs to
      query the size of a specified level of detail of a texture. This
      feature allows programs to perform computations dependent on the size of
      the texture without having to pass the size as a program parameter or
      via some other mechanism.

      Third, applications may specify a constant texel offset in a texture
      instruction that moves the texture sample point by the specified number
      of texels. This offset can be used to perform custom texture filtering,
      and is also independent of the size of the texture LOD -- the same
      offsets are applied, regardless of the mipmap level.

      Fourth, shadow mapping is supported for cube map textures. The first
      three coordinates are the normal (s,t,r) coordinates for a cube map
      texture lookup, and the fourth component is a depth reference value that
      can be compared to the depth value stored in the texture.

    (26) What "consistency" requirements are in effect for textures accessed
    via the TXF (texel fetch) instruction?

      UNRESOLVED: The texture must be usable for regular texture mapping
      operations -- if texture sizes or formats are inconsistent and a
      mipmapped min filter is used, the results are undefined.

    (27) How does the TXF instruction work with bordered textures?

      RESOLVED: The entire image can be accessed, including the border
      texels. For a 64x64 2D texture plus border (66x66 overall), the lower
      left border texel is accessed using the coordinates (-1,-1); the upper
      right border texel is accessed using the coordinates (64,64).

    (28) What should TXQ (texture size query) return for "irrelevant" texture
    sizes (e.g., height of a 1D texture)?
    Should it return any other information at the same time?

      RESOLVED: This specification leaves all "extra" components undefined.

    (29) How do texture offsets interact with cubemap textures?

      RESOLVED: They are not supported in this extension.

    (30) How do texture offsets interact with mipmapped textures?

      RESOLVED: The texture offsets are added after the (s,t,r) coordinates
      have been divided by q (if applicable) and converted to (u,v,w)
      coordinates by multiplying by the size of the selected texture level.
      The offsets are added to the (u,v,w) coordinates, and always move the
      sample point by an integral number of texel coordinates. If multiple
      mipmaps are accessed, the sample point in each mipmap level is moved by
      an identical offset. The applied offsets are independent of the
      selected mipmap level.

    (31) How do shadow cube maps work?

      UNRESOLVED: An application can define a cube map texture with a
      DEPTH_COMPONENT internal format, and then render a scene using the cube
      map faces as the depth buffer(s). When rendering, the projection should
      be set up using the "center" of the cubemap as the eye, and using a
      normal projection matrix. When applying the shadow map, the fragment
      program should read the (x,y,z) eye coordinates, compute the length of
      the major axis (MAX(|x|,|y|,|z|)), and then transform this coordinate
      to [0,1] space using the same parameters used to derive Z in the
      projection matrix. A 4-component vector consisting of x, y, z, and
      this computed depth value should be passed to the texture lookup, and
      normal shadow mapping operations will be performed.

      This issue should include the math needed to do this computation and
      sample code.

    (32) Integer multiplies can overflow by a lot. Should there be some way
    to return the high part of both unsigned and signed integer multiplies?

      RESOLVED: Yes. The ".HI" modifier is provided to return the 32 MSBs
      of a 32x32 integer multiply. The instruction sequence:

        INT TEMP R0, R1, R2, R3;
        MUL.S R0, R2, R3;
        MUL.S.HI R1, R2, R3;

      will do a 32x32 signed integer multiply of R2 and R3, with the 32 LSBs
      of the 64-bit result in R0 and the 32 MSBs in R1.

    (33) Should there be any other special multiplication modifiers?

      RESOLVED: Yes. The ".S24" and ".U24" modifiers allow for signed and
      unsigned integer multiplies where both operands are guaranteed to fit
      in the least significant 24 bits. On some architectures supporting
      this extension, ".S24" and ".U24" integer multiplies may be faster than
      general-purpose ".S" and ".U" multiplies. If either value doesn't fit
      in 24 bits, the results of the operation are undefined --
      implementations may, but are not required to, ignore the MSBs of the
      operands if ".S24" or ".U24" is specified.

    (34) This extension provides subroutines, but doesn't provide a stack to
    push and pop parameters. How do we deal with this? NV_vertex_program3
    supported PUSHA/POPA instructions to push and pop address registers.

      RESOLVED: No explicit stack is required. A program can implement a
      stack by allocating a temporary array plus a single integer temporary
      to use as the stack "pointer". For example:

        TEMP stack[256];              # 256 4-component vectors
        INT TEMP sp;                  # sp.x == stack pointer
        INT TEMP cc;                  # condition code results

      function:
        SGE.S.CC cc.x, sp.x, 256;     # compute stackPointer >= 256
        RET (NE.x);                   # return if TRUE (stack is full)
        MOV stack[sp.x], R0;          # push R0 onto the stack
        ADD.S sp.x, sp.x, 1;
        ...
        SUB.S sp.x, sp.x, 1;          # pop R0 off the stack
        MOV R0, stack[sp.x];
        RET;

    (35) Should we provide new vector semantics for previously-defined opcodes
    (e.g., LG2 computes a component-wise logarithm)?

      RESOLVED: Not in this extension. The instructions we define here are
      compatible with the vector or scalar nature of previously defined
      opcodes. This simplifies the implementation of an assembler that needs
      to support both old and new instruction sets.

    (36) Should it really be undefined to read from a register storing data
    of one type with an instruction of the other type (e.g., to read the
    bits of a floating-point number as an unsigned integer)?

      RESOLVED: The spec describes undefined results for simplicity. In
      practice, mixing data types can be done, where signed integers are
      represented as two's complement integers and floating-point numbers
      are represented using IEEE single-precision representation. For
      example:

        TEMP R0, R1;                # typeless
        MOV.U R0, 0x3F800000;       # R0 = 1.0
        MOV.U R1, 0xBF800000;       # R1 = -1.0
        MUL.F R0, R0, R1;           # R0 = -1 * 1 = -1 (0xBF800000)
        XOR.U R0, R0, R1;           # R0 = 0xBF800000 ^ 0xBF800000 = 0
        NOT.U R0, R0;               # R0 = 0xFFFFFFFF
        I2F.S R0, R0;               # R0 = -1.0 (0xFFFFFFFF = -1 signed)
        SEQ.F R0, R0, R1;           # R0 = 1.0 (-1.0 == -1.0)

    (37) Buffer objects can be sourced as program parameters using the
    NV_parameter_buffer_object extension. How are they accessed in a
    program?

      RESOLVED: The instruction set and existing program environment and
      local parameter bindings operate largely on four-component vectors.
      However, NV_parameter_buffer_object exposes the ability to reach into
      buffers consisting of user-generated data or data written to the
      buffer object by the GPU. Such data sets may not consist entirely of
      four-component floating-point vectors, so a four-component vector API
      may be unnatural. An application might need to reformat its data set
      to deal with this issue.
      Or it might generate odd code to compensate for misalignment -- for
      example, reading an array of 3-component vectors by doing two
      four-component vector accesses and then rotating based on alignment.
      Neither approach is particularly satisfying.

      Instead, this extension takes the approach of treating parameter
      buffers as arrays of scalar words. When an individual buffer element
      is read, the single word is replicated to produce a four-component
      vector. To access an array of 3-component vectors, code like the
      following can be used:

        PARAM buffer[] = { program.buffer[0] };
        INT TEMP index;
        TEMP R0;
        ...
        MUL.S index, index, 3;          # to read "vec3" #X, compute 3*X
        MOV R0.x, buffer[index.x+0];
        MOV R0.y, buffer[index.x+1];
        MOV R0.z, buffer[index.x+2];

    (38) Should recursion be allowed? If so, how is the total amount of
    recursion limited?

      RESOLVED: Recursion is allowed, and a call stack is provided by the
      implementation. The size of the call stack is limited to the
      implementation-dependent constant MAX_PROGRAM_CALL_DEPTH, and when the
      call stack is full, the results of further CAL instructions are
      undefined. In the initial implementation of this extension, such
      instructions will have no effect.

      Note that no stack is provided to hold local registers; a program may
      implement its own via a temporary array and integer stack "pointer".

    (39) Variables are all four-component vectors in previous extensions.
    Should scalar or small-vector variables be provided?

      RESOLVED: It would be a useful feature, but it was left out for
      simplicity. In practice, a variable where only the X component is used
      will be equivalent to a scalar.

    (40) The PK* (pack) and UP* (unpack) instructions allow packing multiple
    components of data into a single component. The bit packing is
    well-defined.
    Should we require specific data types (e.g., unsigned integer) to hold
    packed values?

      RESOLVED: No. Previous instruction sets only allowed programs to write
      packed values to a floating-point variable (the only data type
      provided). We will allow packed results to be written to a variable of
      any data type. Integer instructions can be used to manipulate bits of
      packed data in place.

    (41) What happens when converting integers to floats or vice versa if
    there is insufficient precision or range to represent the result?

      RESOLVED: For integer-to-float conversions, the nearest representable
      floating-point value is used, and the least significant bits of the
      original integer value are lost. For float-to-integer conversions,
      out-of-range values are clamped to the nearest representable integer.

    (42) Why are some of the grammar rules so bizarre (e.g., attribUseD,
    attribUseV, attribUseS, attribUseVNS)?

      RESOLVED: This grammar is based upon the original ARB_vertex_program
      grammar, which has a number of "interesting" characteristics. For
      example, some of the bindings provided by ARB_vertex_program naturally
      require some amount of lookahead. A vertex program can write an output
      color using any of the following:

        MOV result.color, 0;              # primary color
        MOV result.color.primary, 0;      # primary color again
        MOV result.color.secondary, 0;    # secondary color this time

      The pieces of the color binding are separated by "." tokens. However,
      writemasks are also supported, and they also use "." before the write
      mask. So, we could also have something like:

        MOV result.color.xyz, 0;          # primary color with W masked off

      In this form, a parser needs to look at both the "." and the "xyz" to
      determine that the binding being used is "result.color" (and not a
      longer binding such as "result.color.secondary").
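
      The lookahead decision described above can be sketched in C. This is
      a hypothetical helper, not code from any actual assembler. The names
      SuffixKind, is_writemask, and classify_color_suffix are made up for
      illustration. The sketch assumes the parser has already consumed
      "result.color." and is examining the next identifier. It recognizes
      only the "primary" and "secondary" suffixes from the example above;
      ARB_vertex_program's "front" and "back" color bindings are omitted.
      It also assumes vertex-program write masks use only x, y, z, and w,
      listed in that order without repeats.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical classification of the identifier that follows
 * "result.color." in an ARB-style vertex program.  Illustrative sketch
 * only; "front"/"back" bindings are deliberately not handled. */
typedef enum {
    SUFFIX_BINDING,   /* binding continues: result.color.primary       */
    SUFFIX_WRITEMASK, /* binding ends: result.color plus a write mask  */
    SUFFIX_INVALID    /* neither; a real parser would report an error  */
} SuffixKind;

/* Returns 1 if tok is a non-empty write mask whose components are drawn
 * from x, y, z, w in that order with no repeats (e.g. "xyz", "xw"). */
int is_writemask(const char *tok)
{
    static const char order[] = "xyzw";
    const char *next = order;   /* earliest component still allowed */

    assert(tok != NULL);
    if (*tok == '\0')
        return 0;
    for (; *tok != '\0'; tok++) {
        const char *found = strchr(next, *tok);
        if (found == NULL)
            return 0;           /* unknown, repeated, or out of order */
        next = found + 1;       /* later components must come after it */
    }
    return 1;
}

/* The lookahead step: after "result.color." the parser cannot tell
 * whether it is still reading the binding or has reached the write
 * mask until it has examined the whole identifier. */
SuffixKind classify_color_suffix(const char *tok)
{
    if (strcmp(tok, "primary") == 0 || strcmp(tok, "secondary") == 0)
        return SUFFIX_BINDING;
    if (is_writemask(tok))
        return SUFFIX_WRITEMASK;
    return SUFFIX_INVALID;
}
```

      With these rules, "secondary" classifies as SUFFIX_BINDING, "xyz" and
      "w" classify as SUFFIX_WRITEMASK, and the out-of-order "zx" classifies
      as SUFFIX_INVALID.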

      Additionally, some checks that should probably be semantic errors
      (e.g., allowing different swizzle or scalar operand selectors per
      instruction, or disallowing both in the case of SWZ) were specified in
      the original grammar.

      ARB_fragment_program and subsequent NVIDIA instruction sets built upon
      this grammar, and the grammar for this extension was rewritten in its
      current form so it could be validated more easily.

    (43) This is an NV extension (NV_gpu_program4). Why does the
    MAX_PROGRAM_TEXEL_OFFSET_EXT token have an "EXT" suffix?

      RESOLVED: This token is shared between this extension and the
      comparable high-level GLSL programmability extension (EXT_gpu_shader4).
      Rather than provide a duplicate set of token names, we simply use the
      EXT version here.

    (44) For the purposes of determining the number of attribute and result
    components, how are "scalar" attributes counted? For example, only the
    x component of the "pointsize" per-vertex output is actually relevant.

      RESOLVED: Implementations are allowed to count all inputs and outputs
      as full four-component vectors. To avoid this, apply appropriate write
      masks or swizzles.

      For example, writing to "result.pointsize" may count as four
      components. Consistently writing to "result.pointsize.x" may only
      count as one. Similarly, reading a fragment's fog coordinate as
      "fragment.fogcoord" may count as four components; "fragment.fogcoord.x"
      will only count as one.

Revision History

    Rev.    Date    Author    Changes
    ----  --------  --------  --------------------------------------------
     11   09/11/14  pbrown    Fix cut-and-paste error in PK2US section.

     10   12/14/09  mgodse    Added GLX protocol.

      9   10/29/09  pbrown    Add language for previously undocumented errors
                              when using "SHORT" and "LONG" modifiers on
                              variable declarations.
                              They're allowed only on "TEMP" statements,
                              except that "SHORT" is allowed for "OUTPUT"
                              as well.

      8   08/11/08  jbreton   Clarified that when a MOD instruction is
                              performed on negative operands the result is
                              undefined.

      7   07/29/08  pbrown    Discovered additional issues with texture wrap
                              handling, replaced with logic that applies wrap
                              modes per sample. Add a few instruction
                              pseudo-code lines explicitly identifying
                              undefined components.

      6   05/02/08  pbrown    Fix the prototype for the internal TexelFetch()
                              function used in the spec language; texel
                              coordinates are signed integers.

      5   02/22/08  pbrown    Clarified that when counting attribute/result
                              components, irrelevant/undefined components
                              can still count against the limits.

      4   02/04/08  pbrown    Fix errors in texture wrap mode handling.
                              Added a missing clamp to avoid sampling border
                              in REPEAT mode. Fixed incorrectly specified
                              weights for LINEAR filtering.

      3   02/09/07  pbrown    Updated status section (now released).

      2   10/19/06  pbrown    Change the token suffix for maximum texel offset
                              values from NV to EXT, since it is shared with
                              EXT_gpu_shader4. Clarify what happens on a
                              negate of an unsigned value. Fix typo in data
                              type modifier description. Add missing
                              description of the "BUFFER4" declaration
                              keyword.

      1             pbrown    Internal spec development.