Name

    NV_fragment_program2

Name Strings

    GL_NV_fragment_program2

Contact

    Pat Brown, NVIDIA Corporation (pbrown 'at' nvidia.com)
    Eric Werness, NVIDIA Corporation (ewerness 'at' nvidia.com)

Status

    Shipping.

Version

    Last Modified:      08/04/2004
    NVIDIA Revision:    8

Number

    304

Dependencies

    ARB_fragment_program is required.
    NV_fragment_program_option is required.

Overview

    This extension, like the NV_fragment_program_option extension, provides
    additional fragment program functionality to extend the standard
    ARB_fragment_program language and execution environment.  ARB programs
    wishing to use this added functionality need only add:

        OPTION NV_fragment_program2;

    to the beginning of their fragment programs.

    New functionality provided by this extension, above and beyond that
    already provided by the NV_fragment_program_option extension, includes:

      * structured branching support, including data-dependent IF tests, loops
        supporting a fixed number of iterations, and a data-dependent loop
        exit instruction (BRK),

      * subroutine calls,

      * instructions to perform vector normalization, divide vector components
        by a scalar, and perform two-component dot products (with or without a
        scalar add),

      * an instruction to perform a texture lookup with an explicit LOD,

      * a loop index register for indirect access into the texture coordinate
        attribute array, and

      * a facing attribute that indicates whether the fragment is generated
        from a front- or back-facing primitive.

Issues

    * Should this extension expose projective forms of the LOD-modifying
      texture instructions?

      RESOLVED: No.  The user can manually add a DIV instruction to achieve
      the same effect.

    * Should this extension expose precision explicitly?

      RESOLVED: Only for storage using the SHORT TEMP and LONG TEMP syntax
      (similar to NV_fragment_program_option).

    * How are resources (such as registers and condition codes) scoped?

      RESOLVED: All resources are globally scoped.  This means that if, for
      instance, a subroutine modifies a condition code, that modification
      affects both the caller and the callee.

    * How is the scope determined for instructions required to be within a
      specific loop construct?

      RESOLVED: The scope is determined statically at compile time.  This
      means that using BRK or A0 in a subroutine called from within a loop is
      a compile error.

New Procedures and Functions

    None.

New Tokens

    Accepted by the <pname> parameter of GetProgramivARB:

        MAX_PROGRAM_EXEC_INSTRUCTIONS_NV                0x88F4
        MAX_PROGRAM_CALL_DEPTH_NV                       0x88F5
        MAX_PROGRAM_IF_DEPTH_NV                         0x88F6
        MAX_PROGRAM_LOOP_DEPTH_NV                       0x88F7
        MAX_PROGRAM_LOOP_COUNT_NV                       0x88F8

Additions to Chapter 2 of the OpenGL 1.2.1 Specification (OpenGL Operation)

    None.

Additions to Chapter 3 of the OpenGL 1.2.1 Specification (Rasterization)

    Modify Section 3.11 of ARB_fragment_program (Fragment Program):

    Delete the sentence referring to the lack of branching or looping.

    Modify Section 3.11.2 of ARB_fragment_program (Fragment Program Grammar
    and Restrictions):

    (mostly add to existing grammar rules, as extended by
    NV_fragment_program_option)

    <optionName>            ::= "NV_fragment_program2"

    <statement>             ::= <branchLabel> ":"

    <instruction>           ::= <FlowInstruction>

    <ALUInstruction>        ::= <VECSCAop_instruction>

    <FlowInstruction>       ::= <BRAop_instruction>
                              | <FLOWCCop_instruction>
                              | <IFop_instruction>
                              | <LOOPop_instruction>
                              | <ENDFLOWop_instruction>

    <VECTORop>              ::= "NRM"

    <VECSCAop_instruction>  ::= <VECSCAop> <instResult> "," <instOperandV> ","
                                <instOperandS>

    <VECSCAop>              ::= "DIV"

    <BINop>                 ::= "DP2"

    <TRIop>                 ::= "DP2A"

    <TEXop>                 ::= "TXL"

    <BRAop_instruction>     ::= <BRAop> <branchLabel> <optBranchCond>

    <BRAop>                 ::= "CAL"

    <FLOWCCop_instruction>  ::= <FLOWCCop> <optBranchCond>

    <FLOWCCop>              ::= "RET"
                              | "BRK"

    <IFop_instruction>      ::= <IFop> <ccTest>

    <IFop>                  ::= "IF"

    <LOOPop_instruction>    ::= <LOOPop> <instOperandV>

    <LOOPop>                ::= "LOOP"
                              | "REP"

    <ENDFLOWop_instruction> ::= <ENDFLOWop>

    <ENDFLOWop>             ::= "ELSE"
                              | "ENDIF"
                              | "ENDLOOP"
                              | "ENDREP"

    <optBranchCond>         ::= /* empty */
                              | <ccMask>

    <branchLabel>           ::= <identifier>

    <attribFragBasic>       ::= "texcoord" "[" <arrayMemRel> "]"
                              | "facing"

    <arrayMemRel>           ::= <addrUseS> <arrayMemRelOffset>

    <arrayMemRelOffset>     ::= /* empty */
                              | "+" <addrRegPosOffset>

    <addrRegPosOffset>      ::= <integer> from 0 to 9

    <addrUseS>              ::= <addrVarName> <scalarAddrSuffix>

    <scalarAddrSuffix>      ::= "." <addrComponent>

    <addrComponent>         ::= "x"

    Note:  This extension provides a pre-defined address register (A0) that
    matches the <addrVarName> grammar rule and can be used as a loop counter
    (Section 3.11.3.Y).  It is not possible to declare additional address
    register variables.


    Modify Section 3.11.3.1, Fragment Attributes

    (add new bindings to binding table)

      Fragment Attribute Binding  Components  Underlying State
      --------------------------  ----------  ----------------------------
        ...
      fragment.texcoord[A0.x+n]   (s,t,r,q)   indexed texture coordinate
      fragment.facing             (f,0,0,1)   fragment facing

    If a fragment attribute binding matches "fragment.texcoord[A0.x+n]", a
    texture coordinate number <c> is computed by adding the current value of
    the "A0.x" address register (the loop index -- Section 3.11.3.Y) and <n>.
    The "x", "y", "z", and "w" components of the fragment attribute variable
    are filled with the "s", "t", "r", and "q" components, respectively, of
    the fragment texture coordinates for texture coordinate set <c>.  If <c>
    is negative or greater than or equal to MAX_TEXTURE_COORDS_ARB, the
    fragment attribute variable is undefined.

    If a fragment attribute binding matches "fragment.facing", the "x"
    component of the fragment attribute variable is filled with +1.0 or -1.0,
    depending on the orientation of the primitive producing the fragment.  If
    the fragment is generated by a back-facing polygon (including point- and
    line-mode polygons), the facing is -1.0; otherwise, the facing is +1.0.
    The "y", "z", and "w" coordinates are filled with 0, 0, and 1,
    respectively.


    Add New Section 3.11.3.Y, Fragment Program Address Register (insert after
    Section 3.11.3.X, Condition Code Register)

    Fragment program address register variables are a set of four-component
    signed integer vectors where only the "x" component of the address
    registers is currently accessible.  Address registers are used as indices
    when performing relative addressing in the "fragment.texcoord" attribute
    array (section 3.11.3.1).
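
    The two attribute bindings added above can be modeled in a few lines of
    C.  This is only an illustrative sketch, not part of the extension:  the
    function names and the EXAMPLE_MAX_TEXTURE_COORDS constant are
    hypothetical stand-ins (a real application would query
    MAX_TEXTURE_COORDS_ARB from the implementation).

```c
#include <stdbool.h>

/* Hypothetical stand-in for the queried MAX_TEXTURE_COORDS_ARB value. */
#define EXAMPLE_MAX_TEXTURE_COORDS 8

/* Models "fragment.texcoord[A0.x+n]": computes the texture coordinate set
 * number c = A0.x + n.  Returns false when c is out of range, in which case
 * the attribute is undefined and *set_out is left untouched. */
bool resolve_texcoord_set(int a0_x, int n, int *set_out)
{
    int c = a0_x + n;
    if (c < 0 || c >= EXAMPLE_MAX_TEXTURE_COORDS) {
        return false;
    }
    *set_out = c;
    return true;
}

/* Models "fragment.facing": (f, 0, 0, 1), where f is -1.0 for fragments
 * from back-facing polygons and +1.0 for everything else. */
void facing_attribute(bool back_facing_polygon, float out[4])
{
    out[0] = back_facing_polygon ? -1.0f : 1.0f;
    out[1] = 0.0f;
    out[2] = 0.0f;
    out[3] = 1.0f;
}
```

    With a loop index of 2 and an offset of 3, resolve_texcoord_set selects
    coordinate set 5; with a loop index of 7 and an offset of 1 it reports
    the binding as undefined under the assumed limit of 8.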

    Fragment program address registers cannot be declared in a fragment
    program.  There is only a single built-in address register, "A0.x" (loop
    index), which is available inside LOOP/ENDLOOP blocks.  A fragment program
    that accesses A0.x outside a LOOP/ENDLOOP block will fail to load.

    A0.x is initialized by the LOOP instruction and updated by the ENDLOOP
    instruction.  When LOOP blocks are nested, each block has its own value
    for A0.x, but only the A0.x value for the innermost block can be used.
    The value of A0.x is clamped to be greater than or equal to 0.


    Modify Section 3.11.4, Fragment Program Execution Environment

    (modify instruction table) There are sixty-seven fragment program
    instructions....

                Modifiers
      Instr.    R H X C S   Inputs  Output   Description
      -------   - - - - -   ------  ------   --------------------------------
      ABS       X X X X X   v       v        absolute value
      ADD       X X X X X   v,v     v        add
      BRK       - - - - -   c       -        break out of loop instruction
      CAL       - - - - -   c       -        subroutine call
      CMP       - - - X X   v,v,v   v        compare
      COS       X X - X X   s       ssss     cosine with reduction to [-PI,PI]
      DDX       X X - X X   v       v        partial derivative relative to X
      DDY       X X - X X   v       v        partial derivative relative to Y
      DIV       X X - X X   v,s     v        divide vector components by scalar
      DP2       X X X X X   v,v     ssss     2-component dot product
      DP2A      X X X X X   v,v,v   ssss     2-comp. dot product w/scalar add
      DP3       X X X X X   v,v     ssss     3-component dot product
      DP4       X X X X X   v,v     ssss     4-component dot product
      DPH       X X X X X   v,v     ssss     homogeneous dot product
      DST       X X - X X   v,v     v        distance vector
      ELSE      - - - - -   -       -        start if test else block
      ENDIF     - - - - -   -       -        end if test block
      ENDLOOP   - - - - -   -       -        end of loop block
      ENDREP    - - - - -   -       -        end of repeat block
      EX2       X X - X X   s       ssss     exponential base 2
      FLR       X X X X X   v       v        floor
      FRC       X X X X X   v       v        fraction
      IF        - - - - -   c       -        start of if test block
      KIL       - - - - -   v or c  -        kill fragment
      LG2       X X - X X   s       ssss     logarithm base 2
      LIT       X X - X X   v       v        compute light coefficients
      LOOP      - - - - -   v       -        start of loop block
      LRP       X X X X X   v,v,v   v        linear interpolation
      MAD       X X X X X   v,v,v   v        multiply and add
      MAX       X X X X X   v,v     v        maximum
      MIN       X X X X X   v,v     v        minimum
      MOV       X X X X X   v       v        move
      MUL       X X X X X   v,v     v        multiply
      NRM       X X - X X   v       v        normalize 3-component vector
      PK2H      - - - - -   v       ssss     pack two 16-bit floats
      PK2US     - - - - -   v       ssss     pack two unsigned 16-bit scalars
      PK4B      - - - - -   v       ssss     pack four signed 8-bit scalars
      PK4UB     - - - - -   v       ssss     pack four unsigned 8-bit scalars
      POW       X X - X X   s,s     ssss     exponentiate
      RCP       X X - X X   s       ssss     reciprocal
      REP       - - - - -   v       -        start of repeat block
      RET       - - - - -   c       -        subroutine return
      RFL       X X - X X   v,v     v        reflection vector
      RSQ       X X - X X   s       ssss     reciprocal square root
      SCS       X X - X X   s       ss--     sine/cosine without reduction
      SEQ       X X X X X   v,v     v        set on equal
      SFL       X X X X X   v,v     v        set on false
      SGE       X X X X X   v,v     v        set on greater than or equal
      SGT       X X X X X   v,v     v        set on greater than
      SIN       X X - X X   s       ssss     sine with reduction to [-PI,PI]
      SLE       X X X X X   v,v     v        set on less than or equal
      SLT       X X X X X   v,v     v        set on less than
      SNE       X X X X X   v,v     v        set on not equal
      STR       X X X X X   v,v     v        set on true
      SUB       X X X X X   v,v     v        subtract
      SWZ       X X - X X   v       v        extended swizzle
      TEX       - - - X X   v       v        texture sample
      TXB       - - - X X   v       v        texture sample with bias
      TXD       - - - X X   v,v,v   v        texture sample w/partials
      TXL       - - - X X   v       v        texture sample w/explicit LOD
      TXP       - - - X X   v       v        texture sample with projection
      UP2H      - - - X X   s       v        unpack two 16-bit floats
      UP2US     - - - X X   s       v        unpack two unsigned 16-bit scalars
      UP4B      - - - X X   s       v        unpack four signed 8-bit scalars
      UP4UB     - - - X X   s       v        unpack four unsigned 8-bit scalars
      X2D       X X - X X   v,v,v   v        2D coordinate transformation
      XPD       X X - X X   v,v     v        cross product

    Table X.5:  Summary of fragment program instructions.  The columns "R",
    "H", "X", "C", and "S" indicate whether the "R", "H", or "X" precision
    modifiers, the C condition code update modifier, and the "_SAT"/"_SSAT"
    saturation modifiers, respectively, are supported for the opcode.  In
    the input/output columns, "v" indicates a floating-point vector input or
    output, "s" indicates a floating-point scalar input, "ssss" indicates a
    scalar output replicated across a 4-component result vector, "ss--"
    indicates two scalar outputs in the first two components, and "c"
    indicates a condition code test.  Instructions described as "texture
    sample" also specify a texture image unit identifier and a texture
    target.


    Modify Section 3.11.4.3, Fragment Program Destination Register Update

    (modify saturation discussion) If the instruction opcode has the "_SAT"
    suffix, requesting saturated result vectors, each component of the result
    vector is clamped to the range [0,1] before updating the destination
    register.  If the instruction opcode has the "_SSAT" suffix, requesting
    signed saturation, each component of the result vector is clamped to the
    range [-1,1] before updating the destination register.
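
    The two saturation modes can be illustrated with a minimal C sketch.
    The enum and function names below are hypothetical and chosen only for
    this example; the handling of NaN inputs is not addressed by the text
    above and is not modeled here.

```c
/* Hypothetical names for the three destination update modes. */
typedef enum { SAT_NONE, SAT_UNSIGNED, SAT_SIGNED } SaturationMode;

/* Clamps v to the closed range [lo, hi]. */
float clamp_component(float v, float lo, float hi)
{
    if (v < lo) return lo;
    if (v > hi) return hi;
    return v;
}

/* Applies "_SAT" ([0,1]) or "_SSAT" ([-1,1]) clamping to each component of
 * a result vector before it would be written to the destination register. */
void apply_saturation(float result[4], SaturationMode mode)
{
    for (int i = 0; i < 4; i++) {
        if (mode == SAT_UNSIGNED) {
            result[i] = clamp_component(result[i], 0.0f, 1.0f);
        } else if (mode == SAT_SIGNED) {
            result[i] = clamp_component(result[i], -1.0f, 1.0f);
        }
    }
}
```

    For the vector (-3.0, -0.25, 0.5, 1.5), SAT_UNSIGNED yields
    (0.0, 0.0, 0.5, 1.0), while SAT_SIGNED yields (-1.0, -0.25, 0.5, 1.0).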


    Add Section 3.11.4.X, Fragment Program Branching (before Section 3.11.4.4,
    Fragment Program Result Processing)

    Fragment programs support a limited model of branching.  Fragment programs
    can specify one of several types of instruction blocks:  IF/ELSE/ENDIF
    blocks, LOOP/ENDLOOP blocks, and REP/ENDREP blocks.  Examples include the
    following:

      LOOP {5, 0, 1};           # 5 iterations with loop index at 0,1,2,3,4
        ADD R0, R0, R1;
      ENDLOOP;

      REP repCount;
        ADD R0, R0, R1;
      ENDREP;

      MOVC CC, R0;
      IF GT.x;
        MOV R0, R1;             # executes if R0.x > 0
      ELSE;
        MOV R0, R2;             # executes if R0.x <= 0
      ENDIF;

    Instruction blocks may be nested -- for example, a LOOP block may be
    contained inside an IF/ELSE/ENDIF block.  In all cases, each instruction
    block must be terminated with the appropriate instruction (ENDIF for IF,
    ENDLOOP for LOOP, ENDREP for REP).  Nested instruction blocks must be
    wholly contained within a block -- if a LOOP instruction is found between
    an IF and ELSE instruction, the ENDLOOP must also be present between the
    IF and ELSE.  A fragment program will fail to load if any instruction
    block is terminated by an incorrect instruction or is not terminated
    before the block containing it.

    IF/ELSE/ENDIF blocks evaluate a condition to determine which instructions
    to execute.  If the condition is true, all instructions between the IF and
    ELSE are executed.  If the condition is false, all instructions between
    the ELSE and ENDIF are executed.  The ELSE instruction is optional.  If
    the ELSE is omitted, all instructions between the IF and ENDIF are
    executed if the condition is true, or skipped if the condition is false.
    A limited amount of nesting is supported -- a fragment program will fail
    to load if an IF instruction is nested inside MAX_PROGRAM_IF_DEPTH_NV or
    more IF/ELSE/ENDIF blocks.
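
    The block-nesting rules above amount to a stack discipline, which the
    following C sketch illustrates.  It is not how any driver is known to
    validate programs; the Op enum, the function name, and the
    EXAMPLE_MAX_OPEN_BLOCKS capacity are hypothetical.  Rejecting a second
    ELSE in the same IF block is an assumption of the sketch, and the loop
    nesting limit it checks is the one described for LOOP and REP blocks
    elsewhere in this section.

```c
#include <stdbool.h>
#include <stddef.h>

typedef enum {
    OP_IF, OP_ELSE, OP_ENDIF,
    OP_LOOP, OP_ENDLOOP,
    OP_REP, OP_ENDREP,
    OP_OTHER                          /* any non-block instruction */
} Op;

#define EXAMPLE_MAX_OPEN_BLOCKS 64    /* hypothetical capacity of this checker */

/* Returns true if every block is closed by its matching terminator, blocks
 * are wholly nested, and the IF and loop nesting limits are respected. */
bool blocks_well_formed(const Op *ops, size_t count,
                        int max_if_depth, int max_loop_depth)
{
    Op   opener[EXAMPLE_MAX_OPEN_BLOCKS];    /* opener of each active block */
    bool has_else[EXAMPLE_MAX_OPEN_BLOCKS];  /* ELSE already seen for an IF */
    int  depth = 0, if_depth = 0, loop_depth = 0;

    for (size_t i = 0; i < count; i++) {
        Op op = ops[i];

        if (op == OP_IF || op == OP_LOOP || op == OP_REP) {
            int *kind_depth = (op == OP_IF) ? &if_depth : &loop_depth;
            int  kind_limit = (op == OP_IF) ? max_if_depth : max_loop_depth;
            if (*kind_depth >= kind_limit || depth >= EXAMPLE_MAX_OPEN_BLOCKS) {
                return false;                /* nested too deeply */
            }
            (*kind_depth)++;
            opener[depth] = op;
            has_else[depth] = false;
            depth++;
        } else if (op == OP_ELSE) {
            if (depth == 0 || opener[depth - 1] != OP_IF || has_else[depth - 1]) {
                return false;                /* ELSE outside the innermost IF */
            }
            has_else[depth - 1] = true;
        } else if (op == OP_ENDIF || op == OP_ENDLOOP || op == OP_ENDREP) {
            Op expected = (op == OP_ENDIF)   ? OP_IF
                        : (op == OP_ENDLOOP) ? OP_LOOP
                                             : OP_REP;
            if (depth == 0 || opener[depth - 1] != expected) {
                return false;                /* wrong or stray terminator */
            }
            if (op == OP_ENDIF) {
                if_depth--;
            } else {
                loop_depth--;
            }
            depth--;
        }
    }
    return depth == 0;                       /* no block left open */
}
```

    The sequence IF, LOOP, ENDLOOP, ELSE, ENDIF passes this check, while
    IF, LOOP, ELSE, ENDLOOP, ENDIF is rejected because the ELSE appears
    before the enclosed LOOP block has been terminated.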

    The condition of an IF test is specified by the <ccTest> grammar rule and
    may depend on the contents of the condition code register.  Branch
    conditions are evaluated by evaluating a condition code write mask in
    exactly the same manner as done for register writes (section 2.14.2.2).
    If any of the four components of the condition code write mask are
    enabled, the branch is taken and execution continues with the instruction
    following the label specified in the instruction.  Otherwise, the
    instruction is ignored and fragment program execution continues with the
    next instruction.  In the following example code,

        MOVC CC, c[0];         # c[0]=(-2, 0, 2, NaN), CC gets (LT,EQ,GT,UN)
        CAL label1 (LT.xyzw);  # call taken
        CAL label2 (LT.wyzw);  # call not taken

    the first CAL instruction loads a condition code of (LT,EQ,GT,UN) while
    the second CAL instruction loads a condition code of (UN,EQ,GT,UN).  The
    first call will be made because the "x" component evaluates to LT; the
    second call will not be made because no component evaluates to LT.

    LOOP/ENDLOOP and REP/ENDREP blocks involve a loop counter that indicates
    the number of times the instructions between the LOOP/REP and
    ENDLOOP/ENDREP are executed.  Looping blocks have a number of significant
    limitations.  First, the loop counter cannot be computed at run time; it
    must be specified as a program parameter.  Second, the number of loop
    iterations is limited to the value MAX_PROGRAM_LOOP_COUNT_NV, which must
    be at least 255.  Third, only a limited amount of nesting is supported --
    a fragment program will fail to load if a LOOP or REP instruction is
    nested inside MAX_PROGRAM_LOOP_DEPTH_NV or more LOOP/ENDLOOP or REP/ENDREP
    blocks.

    The BRK instruction is available to terminate a loop block early.  A BRK
    instruction can be conditional; the condition is evaluated in the same
    manner as the condition of an IF instruction, and the loop is terminated
    if the condition is true.  A fragment program will fail to load if it
    contains a BRK instruction that is not nested inside a LOOP/ENDLOOP or
    REP/ENDREP block.

    Fragment programs can contain one or more instruction labels, matching the
    grammar rule <branchLabel>.  An instruction label can be referred to
    explicitly in subroutine call (CAL) instructions.  Instruction labels can
    be used at any point in the body of a program, and can be used in
    instructions before being defined in the program string.  Instruction
    labels can be defined anywhere in the program, except inside an
    IF/ELSE/ENDIF, LOOP/ENDLOOP, or REP/ENDREP instruction block.  A fragment
    program will fail to load if it contains an instruction label inside an
    instruction block.

    Fragment programs can also specify subroutine calls.  When a subroutine
    call (CAL) instruction is executed, a reference to the instruction
    immediately following the CAL instruction is pushed onto the call stack.
    When a subroutine return (RET) instruction is executed, an instruction
    reference is popped off the call stack and program execution continues
    with the popped instruction.  A fragment program will terminate if a CAL
    instruction is executed with MAX_PROGRAM_CALL_DEPTH_NV entries already in
    the call stack or if a RET instruction is executed with an empty call
    stack.  Subroutine calls may be conditional; the condition is specified by
    the <optBranchCond> grammar rule and evaluated in the same way as the
    condition of the IF instruction.  If no condition is specified, it is as
    though "(TR)" were specified -- the branch is unconditional.

    If a fragment program has an instruction label "main", program execution
    begins with the instruction immediately following the instruction label.
    Otherwise, program execution begins with the first instruction of the
    program.  Instructions will be executed sequentially in the order
    specified in the program, although branch instructions will affect the
    instruction execution order, as described above.  A fragment program will
    terminate after executing a RET instruction with an empty call stack.  A
    fragment program will also terminate after executing the last instruction
    in the program, unless that instruction was a taken branch.

    A fragment program will fail to load if an instruction refers to a label
    that is not defined in the program string.

    A fragment program will terminate abnormally if a subroutine call
    instruction produces a call stack overflow.  Additionally, a fragment
    program will terminate abnormally after executing
    MAX_PROGRAM_EXEC_INSTRUCTIONS_NV instructions to prevent hangs caused by
    infinite loops in the program.

    When a fragment program terminates, normally or abnormally, it will emit a
    fragment whose attributes are taken from the final values of the fragment
    program result variables (section 3.11.3.4).
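
    The call stack behavior described in this section, including the two
    conditions that terminate the program, can be sketched in C as follows.
    The type and function names are hypothetical, instructions are
    identified by plain integer indices for illustration, and
    EXAMPLE_MAX_CALL_DEPTH is set to 4 only because that is the minimum
    value this extension lists for MAX_PROGRAM_CALL_DEPTH_NV.

```c
#include <stdbool.h>

#define EXAMPLE_MAX_CALL_DEPTH 4   /* assumed MAX_PROGRAM_CALL_DEPTH_NV */

typedef struct {
    int return_to[EXAMPLE_MAX_CALL_DEPTH];  /* saved instruction indices */
    int depth;                              /* entries currently in use */
} CallStack;

/* CAL: saves the index of the instruction following the CAL.  Returns
 * false on call stack overflow, where the program would terminate. */
bool cal_push(CallStack *stack, int cal_index)
{
    if (stack->depth >= EXAMPLE_MAX_CALL_DEPTH) {
        return false;
    }
    stack->return_to[stack->depth++] = cal_index + 1;
    return true;
}

/* RET: retrieves the instruction index to resume at.  Returns false when
 * the call stack is empty, where the program would terminate. */
bool ret_pop(CallStack *stack, int *resume_at)
{
    if (stack->depth <= 0) {
        return false;
    }
    *resume_at = stack->return_to[--stack->depth];
    return true;
}
```

    A CAL at instruction 20 followed by a RET resumes at instruction 21; a
    fifth nested CAL, or a RET with nothing on the stack, makes the
    corresponding function return false.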


    Add to Section 3.11.4.5 of ARB_fragment_program (Fragment Program
    Options):

    Section 3.11.4.5.3, NV_fragment_program2 Option

    If a fragment program specifies the "NV_fragment_program2" option, the
    ARB_fragment_program grammar and execution environment are extended to
    take advantage of all the features of the "NV_fragment_program" option,
    plus the following features:

      * structured branching support, including data-dependent IF tests, loops
        supporting a fixed number of iterations, and a data-dependent loop
        exit instruction (BRK),

      * subroutine calls,

      * several new instructions:

          * NRM -- vector normalization
          * DIV -- divide vector components by a scalar
          * DP2 -- two-component dot product
          * DP2A -- two-component dot product with scalar add
          * TXL -- texture lookup with explicit LOD specified
          * IF/ELSE/ENDIF -- conditional execution blocks
          * REP/ENDREP -- loop block
          * LOOP/ENDLOOP -- loop block using index register
          * BRK -- break out of loop block
          * CAL -- subroutine call
          * RET -- subroutine return

      * a loop index register inside LOOP/ENDLOOP blocks that can be used for
        indirect access into the texture coordinate attribute array, and

      * a facing attribute that indicates whether the fragment is generated
        from a front- or back-facing primitive.


    Modify Section 3.11.5, Fragment Program ALU Instruction Set

    Section 3.11.5.48, DIV:  Divide (Vector Components by Scalar)

    The DIV instruction divides each component of the first vector operand by
    the second scalar operand to produce a 4-component result vector.

      tmp0 = VectorLoad(op0);
      tmp1 = ScalarLoad(op1);
      result.x = tmp0.x / tmp1;
      result.y = tmp0.y / tmp1;
      result.z = tmp0.z / tmp1;
      result.w = tmp0.w / tmp1;

    This instruction may not produce results identical to a RCP/MUL
    instruction sequence.
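
    As a rough illustration, the DIV pseudocode and the RCP/MUL sequence it
    is contrasted with can be written side by side in C.  The function names
    are hypothetical, and ordinary C float arithmetic stands in for
    whatever precision the hardware actually uses, so this sketch says
    nothing about how large any difference would be on a real
    implementation.

```c
/* DIV: divides each component of the vector operand by the scalar. */
void div_vec_scalar(const float op0[4], float op1, float result[4])
{
    for (int i = 0; i < 4; i++) {
        result[i] = op0[i] / op1;
    }
}

/* RCP followed by MUL: computes the reciprocal once, then multiplies.  The
 * reciprocal is rounded before the multiply, so results may differ from a
 * direct division in the last bits. */
void rcp_mul_vec_scalar(const float op0[4], float op1, float result[4])
{
    float rcp = 1.0f / op1;
    for (int i = 0; i < 4; i++) {
        result[i] = op0[i] * rcp;
    }
}
```

    Both functions agree exactly when the scalar is a power of two, such as
    dividing (2, 4, 6, 8) by 2 to get (1, 2, 3, 4); for other scalars the
    two may round differently.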


    Section 3.11.5.49, DP2:  2-Component Dot Product

    The DP2 instruction computes a two-component dot product of the two
    operands (using the first two components) and replicates the dot product
    to all four components of the result vector.

      tmp0 = VectorLoad(op0);
      tmp1 = VectorLoad(op1);
      dot = (tmp0.x * tmp1.x) + (tmp0.y * tmp1.y);
      result.x = dot;
      result.y = dot;
      result.z = dot;
      result.w = dot;

    Section 3.11.5.50, DP2A:  2-Component Dot Product w/Scalar Add

    The DP2A instruction computes a two-component dot product of the first
    two operands (using the first two components), adds the x component of
    the third operand, and replicates the result to all four components of
    the result vector.

      tmp0 = VectorLoad(op0);
      tmp1 = VectorLoad(op1);
      tmp2 = VectorLoad(op2);
      dot = (tmp0.x * tmp1.x) + (tmp0.y * tmp1.y) + tmp2.x;
      result.x = dot;
      result.y = dot;
      result.z = dot;
      result.w = dot;


    Section 3.11.5.51, NRM:  3-Component Vector Normalize

    The NRM instruction normalizes the vector given by the x, y, and z
    components of the vector operand to produce the x, y, and z components of
    the result vector.  The w component of the result is undefined.

      tmp = VectorLoad(op0);
      scale = ApproxRSQ(tmp.x * tmp.x + tmp.y * tmp.y + tmp.z * tmp.z);
      result.x = tmp.x * scale;
      result.y = tmp.y * scale;
      result.z = tmp.z * scale;
      result.w = undefined;

    Note that the normalization uses an approximate scale and may be carried
    out at lower precision than a corresponding sequence of DP3, RSQ, and MUL
    instructions.


    Add Section 3.11.6.6, TXL:  Texture Lookup with Explicit LOD

    The TXL instruction takes the x, y, and z components of the vector operand
    and maps them to s, t, and r, respectively.  These coordinates are used to
    sample from the specified texture target on the specified texture image
    unit in a manner consistent with its parameters.

    The level of detail is computed as specified in section 3.8.8, except that
    rho(x,y) is given by 2^w, where w is the w component of the vector
    operand.

    The resulting sample is mapped to RGBA as described in table 3.21
    and written to the result vector.

      tmp = VectorLoad(op0);
      result = TextureSample(tmp.x, tmp.y, tmp.z, 0.0, op1, op2);


    Add Section 3.11.X, Fragment Program Flow Control Instruction Set
    (immediately after Section 3.11.6, Fragment Program Texture Instruction
    Set)

    3.11.X.1, BRK:  Break

    The BRK instruction conditionally transfers control to the instruction
    immediately following the next ENDLOOP or ENDREP instruction.  A BRK
    instruction has no effect if the condition code test evaluates to FALSE.

    The following pseudocode describes the operation of the instruction:

      if (TestCC(cc.c***) || TestCC(cc.*c**) ||
          TestCC(cc.**c*) || TestCC(cc.***c)) {
        continue execution at instruction following the next ENDLOOP or
          ENDREP;
      }


    3.11.X.2, CAL:  Subroutine Call

    The CAL instruction conditionally transfers control to the instruction
    following the label specified in the instruction.  A CAL instruction has
    no effect if the condition code test evaluates to FALSE.

    When executed, the CAL instruction pushes a reference to the instruction
    immediately following the CAL instruction onto the call stack.  When the
    matching RET instruction is executed, execution continues at that
    instruction.

    Implementations may have a limited call stack.  If the number of CAL
    instructions that have been performed without returning is
    MAX_PROGRAM_CALL_DEPTH_NV, a CAL instruction will cause the call stack to
    overflow and the fragment program to terminate.

    The following pseudocode describes the operation of the instruction:

      if (TestCC(cc.c***) || TestCC(cc.*c**) ||
          TestCC(cc.**c*) || TestCC(cc.***c)) {

        // Check for call stack overflow.
        if (callStackDepth >= MAX_PROGRAM_CALL_DEPTH_NV) {
          terminate fragment program;
        }

        push instruction following the CAL instruction on the call stack;
        continue execution at instruction following <branchLabel>;
      }


    3.11.X.3, ELSE:  Beginning of ELSE Block

    The ELSE instruction signifies the end of the "execute if true" portion of
    an IF/ELSE/ENDIF block.

    If the condition evaluated at the IF statement was TRUE, when a program
    reaches the ELSE statement, it has completed the entire "execute if true"
    portion of the IF/ELSE/ENDIF block.  Execution will continue at the
    corresponding ENDIF instruction.

    If the condition evaluated at the IF statement was FALSE, program
    execution would skip over the entire "execute if true" portion of the
    IF/ELSE/ENDIF block, including the ELSE instruction.


    3.11.X.4, ENDIF:  End of IF/ELSE Block

    The ENDIF instruction signifies the end of an IF/ELSE/ENDIF block.  It has
    no other effect on program execution.


    3.11.X.5, ENDLOOP:  End of LOOP Block

    The ENDLOOP instruction specifies the end of a LOOP block.  When an
    ENDLOOP instruction executes, the loop count is decremented and the loop
    index increment value is added to the loop index (A0.x).  If the
    decremented loop count is greater than zero, execution continues at the
    top of the LOOP block.

      LoopCount--;
      LoopIndex += LoopIncr;
      if (LoopCount > 0) {
        continue execution at instruction following corresponding LOOP
          instruction;
      }

    3.11.X.6, ENDREP:  End of REP Block

    The ENDREP instruction specifies the end of a REP block.  When an ENDREP
    instruction executes, the loop count is decremented.  If the decremented
    loop count is greater than zero, execution continues at the top of the REP
    block.

      LoopCount--;
      if (LoopCount > 0) {
        continue execution at instruction following corresponding REP
          instruction;
      }


    3.11.X.7, IF:  Beginning of IF Block

    The IF instruction conditionally transfers control to the instruction
    immediately following the corresponding ELSE instruction (if present) or
    ENDIF instruction (if no ELSE is present).

    Implementations may have a limited ability to nest IF blocks at run time.
    If the number of IF/ENDIF blocks that are currently active is
    MAX_PROGRAM_IF_DEPTH_NV, an IF instruction will cause the fragment program
    to terminate.  If an IF instruction is executed inside a subroutine, any
    active IF/ENDIF blocks in the calling code count against this limit.

      if (IF block nested too deeply) {
        terminate fragment program;
      }

      // Evaluate the condition.  If the condition is true, continue at the
      // next instruction.  Otherwise, continue at the instruction following
      // the corresponding ELSE (if present) or ENDIF.
      if (TestCC(cc.c***) || TestCC(cc.*c**) ||
          TestCC(cc.**c*) || TestCC(cc.***c)) {
        continue execution at the next instruction;
      } else if (IF block contains an ELSE statement) {
        continue execution at instruction following corresponding ELSE;
      } else {
        continue execution at instruction following corresponding ENDIF;
      }


    3.11.X.8, LOOP:  Beginning of LOOP Block

    The LOOP instruction begins a LOOP block.  The x, y, and z components of
    the single vector operand specify the initial values for the loop count,
    loop index, and loop index increment, respectively.

    The loop count indicates the number of times the instructions between the
    LOOP and corresponding ENDLOOP instruction will be executed.  If the
    initial value of the loop count is not positive, the entire block is
    skipped and execution continues at the instruction following the
    corresponding ENDLOOP instruction.

    The loop index (A0.x) can be used for indirect addressing in the set of
    texture coordinate fragment attributes.  A fragment program can only use
    the loop index of the current LOOP block; loop indices for containing LOOP
    blocks are not available.

    Implementations may have a limited ability to nest LOOP and REP blocks at
    run time.  If the number of LOOP/ENDLOOP and REP/ENDREP blocks that have
    not completed is MAX_PROGRAM_LOOP_DEPTH_NV, a LOOP instruction will cause
    the fragment program to terminate.  If a LOOP instruction is executed
    inside a subroutine, any active LOOP/ENDLOOP or REP/ENDREP blocks in the
    calling code count against this limit.

      if (LOOP block nested too deeply) {
        terminate fragment program;
      }

      // Set up loop information for the new nesting level.
      tmp = VectorLoad(op0);
      LoopCount = floor(tmp.x);
      LoopIndex = floor(tmp.y);
      LoopIncr = floor(tmp.z);
      if (LoopCount <= 0) {
        continue execution at instruction following the corresponding
          ENDLOOP;
      }

    LOOP blocks do not support fully general branching -- a fragment program
    will fail to load if the vector operand is not a program parameter.


    3.11.X.9, REP:  Beginning of REP Block

    The REP instruction begins a REP block.  The x component of the single
    vector operand specifies the initial value for the loop count.  REP blocks
    are completely identical to LOOP blocks except that they don't use the
    loop index at all.
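
    The LOOP/ENDLOOP bookkeeping described above, which REP/ENDREP shares
    apart from the loop index, can be sketched in C.  The function names are
    hypothetical, the loop body is reduced to recording the value A0.x
    would hold on each iteration, and neither the nesting limit nor the
    clamping of A0.x to non-negative values is modeled.

```c
/* Floor of a small float as an int, without relying on the math library.
 * Only intended for values well inside the range of int. */
int floor_to_int(float v)
{
    int truncated = (int)v;               /* truncates toward zero */
    return (v < (float)truncated) ? truncated - 1 : truncated;
}

/* Simulates "LOOP {count, index, incr}; ... ENDLOOP;".  Stores the loop
 * index seen by each iteration in indices[] (up to capacity entries) and
 * returns the number of iterations executed. */
int simulate_loop(float count, float index, float incr,
                  int *indices, int capacity)
{
    /* LOOP: set up loop information from the operand components. */
    int loop_count = floor_to_int(count);
    int loop_index = floor_to_int(index);
    int loop_incr  = floor_to_int(incr);
    int executed   = 0;

    if (loop_count <= 0) {
        return 0;                         /* the whole block is skipped */
    }
    do {
        /* Loop body: A0.x currently equals loop_index. */
        if (executed < capacity) {
            indices[executed] = loop_index;
        }
        executed++;

        /* ENDLOOP: decrement the count and advance the index. */
        loop_count--;
        loop_index += loop_incr;
    } while (loop_count > 0);

    return executed;
}
```

    Simulating LOOP {5, 0, 1} runs five iterations with loop index values
    0, 1, 2, 3, and 4, matching the example in the branching section; a
    count of zero runs none.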

    The loop count indicates the number of times the instructions between the
    REP and corresponding ENDREP instruction will be executed.  If the initial
    value of the loop count is not positive, the entire block is skipped and
    execution continues at the instruction following the corresponding ENDREP
    instruction.

    Implementations may have a limited ability to nest LOOP and REP blocks at
    run time.  If the number of LOOP/ENDLOOP and REP/ENDREP blocks that have
    not completed is MAX_PROGRAM_LOOP_DEPTH_NV, a REP instruction will cause
    the fragment program to terminate.  If a REP instruction is executed
    inside a subroutine, any active LOOP/ENDLOOP or REP/ENDREP blocks in the
    calling code count against this limit.

      if (REP block nested too deeply) {
        terminate fragment program;
      }

      // Set up loop information for the new nesting level.
      tmp = VectorLoad(op0);
      LoopCount = floor(tmp.x);
      if (LoopCount <= 0) {
        continue execution at instruction following the corresponding
          ENDREP;
      }

    REP blocks do not support fully general branching -- a fragment program
    will fail to load if the vector operand is not a program parameter.


    3.11.X.10, RET:  Subroutine Return

    The RET instruction conditionally returns from a subroutine initiated by a
    CAL instruction.  A RET instruction has no effect if the condition code
    test evaluates to FALSE.

    When executed, the RET instruction pops a reference to the instruction
    immediately following the corresponding CAL instruction off the call
    stack and continues execution at that instruction.

    If a RET instruction is issued when the call stack is empty, the fragment
    program is terminated.

      if (TestCC(cc.c***) || TestCC(cc.*c**) ||
          TestCC(cc.**c*) || TestCC(cc.***c)) {

        if (callStackDepth <= 0) {
          terminate fragment program;
        }

        pop instruction following the CAL instruction off the call stack;
        continue execution at that instruction;
      }


Additions to Chapter 4 of the OpenGL 1.4 Specification (Per-Fragment
Operations and the Frame Buffer)

    None.

Additions to Chapter 5 of the OpenGL 1.4 Specification (Special Functions)

    None.

Additions to Chapter 6 of the OpenGL 1.4 Specification (State and
State Requests)

    None.

Additions to Appendix A of the OpenGL 1.4 Specification (Invariance)

    None.

Additions to the AGL/GLX/WGL Specifications

    None.

Dependencies on ARB_fragment_program

    ARB_fragment_program is required.

    This specification and NV_fragment_program_option are based on a modified
    version of the grammar published in the ARB_fragment_program
    specification.  This modified grammar includes a few structural changes to
    better accommodate new functionality from this and other extensions, but
    should be functionally equivalent to the ARB_fragment_program grammar.
    See NV_fragment_program_option for details on the base grammar.

Dependencies on NV_fragment_program_option

    NV_fragment_program_option is required.

    If the NV_fragment_program2 program option is specified, all the
    functionality described in both this extension and the
    NV_fragment_program_option specification is available.

GLX Protocol

    None.

Errors

    None.

New State

    None.

New Implementation Dependent State

                                                            Min
    Get Value                         Type  Get Command     Value  Description         Sec       Attrib
    --------------------------------  ----  --------------- -----  ------------------  --------  ------
    MAX_PROGRAM_EXEC_INSTRUCTIONS_NV  Z+    GetProgramivARB 65536  maximum program     3.11.4.X  -
                                                                   execution
                                                                   instruction count
    MAX_PROGRAM_CALL_DEPTH_NV         Z+    GetProgramivARB 4      maximum program     3.11.4.X  -
                                                                   call stack depth
    MAX_PROGRAM_IF_DEPTH_NV           Z+    GetProgramivARB 48     maximum program     3.11.4.X  -
                                                                   if nesting
    MAX_PROGRAM_LOOP_DEPTH_NV         Z+    GetProgramivARB 4      maximum program     3.11.4.X  -
                                                                   loop nesting
    MAX_PROGRAM_LOOP_COUNT_NV         Z+    GetProgramivARB 255    maximum program     3.11.4.X  -
                                                                   initial loop count

    (add to Table X.10.  New Implementation-Dependent Values Introduced by
    ARB_fragment_program.  Values queried by GetProgramivARB require a <pname>
    of FRAGMENT_PROGRAM_ARB.)

Revision History

    Rev.  Date      Author   Changes
    ----  --------  -------  --------------------------------------------
    8     08/04/04  pbrown   Fixed two typos in the TXL instruction.

    7     07/08/04  pbrown   Fixed entries for KIL and RFL in the opcode
                             table.

    6     05/16/04  pbrown   Documented that "A0" is a pre-defined address
                             register variable for the purposes of the
                             grammar, and that no other address register
                             variables can be declared.

    5     --------  pbrown   Internal pre-release revisions.