Name

    NV_vertex_program2

Name Strings

    GL_NV_vertex_program2

Contact

    Pat Brown, NVIDIA Corporation (pbrown 'at' nvidia.com)
    Mark Kilgard, NVIDIA Corporation (mjk 'at' nvidia.com)

Notice

    Copyright NVIDIA Corporation, 2000-2002.

IP Status

    NVIDIA Proprietary.

Status

    Implemented in CineFX (NV30) Emulation driver, August 2002.
    Shipping in Release 40 NVIDIA driver for CineFX hardware, January 2003.

Version

    Last Modified Date: 03/18/2008
    NVIDIA Revision: 33

Number

    287

Dependencies

    Written based on the wording of the OpenGL 1.3 Specification and requires
    OpenGL 1.3.

    Written based on the wording of the NV_vertex_program extension
    specification, version 1.0.

    NV_vertex_program is required.

Overview

    This extension further enhances the concept of vertex programmability
    introduced by the NV_vertex_program extension, and extended by
    NV_vertex_program1_1. These extensions create a separate vertex program
    mode where the configurable vertex transformation operations in unextended
    OpenGL are replaced by a user-defined program.

    This extension introduces the VP2 execution environment, which extends the
    VP1 execution environment introduced in NV_vertex_program. The VP2
    environment provides several language features not present in previous
    vertex programming execution environments:

      * Branch instructions allow a program to jump to another instruction
        specified in the program.

      * Branching support allows for up to four levels of subroutine
        calls/returns.

      * A four-component condition code register allows an application to
        compute a component-wise write mask at run time and apply that mask to
        register writes.

      * Conditional branches are supported, where the condition code register
        is used to determine if a branch should be taken.

      * Programmable user clipping is supported (via the CLP0-CLP5 clip
        distance registers). Primitives are clipped to the area where the
        interpolated clip distances are greater than or equal to zero.

      * Instructions can perform a component-wise absolute value operation on
        any operand load.

    The VP2 execution environment provides a number of new instructions, and
    extends the semantics of several instructions already defined in
    NV_vertex_program.

      * ARR: Operates like ARL, except that float-to-int conversion is done
        by rounding. Equivalent results could be achieved (less efficiently)
        in NV_vertex_program using an ADD/ARL sequence and a program parameter
        holding the value 0.5.

      * BRA, CAL, RET: Branch, subroutine call, and subroutine return
        instructions.

      * COS, SIN: Adds support for high-precision sine and cosine
        computations.

      * FLR, FRC: Adds support for computing the floor and fractional portion
        of floating-point vector components. Equivalent results could be
        achieved (less efficiently) in NV_vertex_program using the EXP
        instruction to compute the fractional portion of one component at a
        time.

      * EX2, LG2: Adds support for high-precision exponentiation and
        logarithm computations.

      * ARA: Adds pairs of components of an address register; useful for
        looping and other operations.

      * SEQ, SFL, SGT, SLE, SNE, STR: Add six new "set on" instructions,
        similar to the SLT and SGE instructions defined in NV_vertex_program.
        Equivalent results could be achieved (less efficiently) in
        NV_vertex_program with multiple SLT, SGE, and arithmetic instructions.

      * SSG: Adds a new "set sign" operation, which produces a vector holding
        negative one for negative components, zero for components with a value
        of zero, and positive one for positive components. Equivalent results
        could be achieved (less efficiently) in NV_vertex_program with
        multiple SLT, SGE, and arithmetic instructions.

      * The ARL instruction is extended to operate on four components instead
        of a single component.

      * All instructions that produce integer or floating-point result vectors
        have variants that update the condition code register based on the
        result vector.

    This extension also raises some of the resource limitations in the
    NV_vertex_program extension.

      * 256 program parameter registers (versus 96 in NV_vertex_program).

      * 16 temporary registers (versus 12 in NV_vertex_program).

      * Two four-component integer address registers (versus one
        single-component register in NV_vertex_program).

      * 256 total vertex program instructions (versus 128 in
        NV_vertex_program).

      * Including loops, programs can execute up to 64K instructions.


Issues

    This extension builds upon the NV_vertex_program extension. Should this
    specification contain selected edits to the NV_vertex_program
    specification or should the specs be unified?

      RESOLVED: Since NV_vertex_program and NV_vertex_program2 programs share
      many features, the main section of this specification is unified and
      describes both types of programs. Other sections containing
      NV_vertex_program features that are unchanged by this extension will not
      be edited.

    How can a program use condition codes to avoid extra computations?

      Consider the example of evaluating the OpenGL lighting model for a
      given light. If the diffuse dot product is negative (roughly 1/2 the
      time for random geometry), the only contribution to the light is
      ambient. In this case, condition codes and branching can skip over a
      number of unneeded instructions.

        # R0 holds accumulated light color
        # R2 holds normal
        # R3 holds computed light vector
        # R4 holds computed half vector
        # c[0] holds ambient light/material product
        # c[1] holds diffuse light/material product
        # c[2].xyz holds specular light/material product
        # c[2].w holds specular exponent
        DP3C R1.x, R2, R3;            # diffuse dot product
        ADD R0, R0, c[0];             # accumulate ambient
        BRA pointsAway (LT.x);        # skip rest if diffuse dot < 0
        MOV R1.w, c[2].w;             # specular exponent for LIT
        DP3 R1.y, R2, R4;             # specular dot product
        LIT R1, R1;                   # compute exponentiated specular
        MAD R0, c[1], R1.y, R0;       # accumulate diffuse
        MAD R0.xyz, c[2], R1.z, R0;   # accumulate specular
      pointsAway:
        ...                           # continue execution

    How can a program use subroutines?

      With subroutines, a program can encapsulate a small piece of
      functionality into a subroutine and call it multiple times, as in CPU
      code. Applications will need to identify the registers used to pass
      data to and from the subroutine.

      Subroutines could be used for applications like evaluating lighting
      equations for a single light. With conditional branching and
      subroutines, a variable number of lights (which could even vary
      per-vertex) can be easily supported.

      accumulate:
        # R0 holds the accumulated result
        # R1 holds the value to add
        ADD R0, R0, R1;
        RET;

        # Compute floor(A)*B by repeated addition using a subroutine. Yes,
        # this is a stupid example.
        #
        # c[0] holds (A,B,0,1).
        # R0 holds the accumulated result
        # R1 holds B, the value to accumulate.
        # R2 holds the number of iterations remaining.
        MOV R0, c[0].z;               # start with zero
        MOV R1, c[0].y;
        FLRC R2.x, c[0].x;
        BRA done (LE.x);
      top:
        CAL accumulate;
        ADDC R2.x, R2.x, -c[0].w;     # decrement count
        BRA top (GT.x);
      done:
        ...

    How can conventional OpenGL clip planes be supported in vertex programs?

      The clip distance in the OpenGL specification can be evaluated with a
      simple DP4 instruction that writes to one of the six clip distance
      registers. Primitives will automatically be clipped to the half-space
      where o[CLPx] >= 0, which matches the definition in the spec.

        # R0 holds eye coordinates
        # c[0] holds eye-space clip plane coefficients
        DP4 o[CLP0].x, R0, c[0];

      Note that the clip plane or clip distance volume corresponding to the
      o[CLPn] register used must be enabled, or no clipping will be performed.

      The clip distance registers allow for clip distance volumes to be
      computed more-or-less arbitrarily. To approximate clipping to a sphere
      of radius <n>, the following code can be used.

        # R0 holds eye coordinates
        # c[0].xyz holds sphere center
        # c[0].w holds the square of the sphere radius
        SUB R1.xyz, R0, c[0];         # distance vector
        DP3 R1.w, R1, R1;             # compute distance squared
        SUB o[CLP0].x, c[0].w, R1.w;  # compute r^2 - d^2

      Since the clip distance is interpolated linearly over a primitive, the
      clip distance evaluated at a point will represent a piecewise-linear
      approximation of the true distance. The approximation becomes
      increasingly accurate as the primitive is tessellated more finely.

    How can looping be achieved in vertex programs?

      Simple loops can be achieved using a general purpose floating-point
      register component as a counter. The following code calls a function
      named "function" <n> times, where <n> is specified in a program
      parameter register component.

        # c[0].x holds the number of iterations to execute.
        # c[1].x holds the constant 1.0.
        MOVC R15.x, c[0].x;
      startLoop:
        CAL function (GT.x);          # if (counter > 0) function();
        SUBC R15.x, R15.x, c[1].x;    # counter = counter - 1;
        BRA startLoop (GT.x);         # if (counter > 0) goto start;
      endLoop:
        ...
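
      The control flow of the simple loop above can be checked with a small C
      model. This is a sketch for illustration, not part of the extension: the
      function name simple_loop_call_count is hypothetical, CAL and BRA are
      modeled as ordinary C control flow, the 64K executed-instruction limit is
      ignored, and "function" is assumed not to write R15.

```c
#include <assert.h>

/* Hypothetical C model of the simple VP2 loop:
 *
 *     MOVC R15.x, c[0].x;
 *   startLoop:
 *     CAL  function (GT.x);
 *     SUBC R15.x, R15.x, c[1].x;   (c[1].x == 1.0)
 *     BRA  startLoop (GT.x);
 *
 * Returns how many times "function" is called. Assumes function()
 * does not write R15. The CAL test reads the condition code set by
 * the preceding MOVC/SUBC, and SUBC refreshes it before BRA, so any
 * condition code writes inside function() do not affect the loop. */
int simple_loop_call_count(float iterations)
{
    float counter = iterations;      /* MOVC R15.x, c[0].x */
    int calls = 0;

    for (;;) {
        /* GT.x is false for EQ, LT, and UN (NaN) condition codes. */
        if (counter > 0.0f) {        /* CAL function (GT.x) */
            calls++;
        }
        counter -= 1.0f;             /* SUBC R15.x, R15.x, c[1].x */
        if (!(counter > 0.0f)) {     /* BRA startLoop (GT.x) not taken */
            break;
        }
    }
    return calls;
}
```

      Under this model a positive integer count <n> yields exactly <n> calls,
      a non-integer count is effectively rounded up (2.5 gives three calls),
      and a zero, negative, or NaN count yields no calls at all.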

      More complex loops (where a separate index may be needed for indexed
      addressing into the program parameter array) can be achieved using the
      ARA instruction, which will add the x/z and y/w components of an address
      register.

        # c[0].x holds the number of iterations to execute
        # c[0].y holds the initial index value
        # c[0].z holds the constant -1.0 (used for the iteration count)
        # c[0].w holds the index step value
        ARLC A1, c[0];
      startLoop:
        CAL function (GT.x);          # if (counter > 0) function();
                                      # Note: A1.y can be used for
                                      # indexing in function().
        ARAC A1.xy, A1;               # counter = counter - 1;
                                      # index += loopStep;
        BRA startLoop (GT.x);         # if (counter > 0) goto start;
      endLoop:
        ...

    Should this specification add support for vertex state programs beyond the
    VP1 execution environment?

      No. Vertex state programs are a little-used feature of
      NV_vertex_program and don't perform particularly well. They are still
      supported for compatibility with the original NV_vertex_program spec,
      but they will not be extended to support new features.

    How are NaNs handled in the "set on" instructions (SEQ, SGE, SGT, SLE,
    SLT, SNE)? What about MIN, MAX? SSG? When doing condition code tests?

      Any of these instructions involving a NaN operand will produce a NaN
      result. This behavior differs from the NV_fragment_program extension.
      There, SEQ, SGE, SGT, SLE, and SLT will produce 0.0 if either operand is
      a NaN, and SNE will produce 1.0 if either operand is a NaN.

      For condition code updates, NaN values will result in "UN" condition
      codes. All conditionals using a "UN" condition code, except "TR" and
      "NE", will evaluate to false. This behavior is identical to the
      functionality in NV_fragment_program.
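
      The NaN rules above, together with the condition code update rule, can
      be summarized in C. This is a hypothetical model for illustration only:
      the names cc_update, vp2_seq, vp2_sne, and cc_test are not part of any
      API, only two of the "set on" instructions are shown, and the condition
      names GE and FL (which this issue does not mention) are assumed to exist
      alongside EQ, GT, LE, LT, NE, and TR.

```c
#include <assert.h>
#include <math.h>

/* Hypothetical model of the VP2 NaN rules; names are illustrative only. */

typedef enum { CC_GT, CC_EQ, CC_LT, CC_UN } cc_t;

/* Condition code produced for one result component. */
cc_t cc_update(float r)
{
    if (isnan(r)) return CC_UN;
    if (r < 0.0f) return CC_LT;
    if (r > 0.0f) return CC_GT;
    return CC_EQ;
}

/* VP2 "set on" instructions: any NaN operand yields a NaN result
 * (NV_fragment_program would instead return 0.0 for SEQ, 1.0 for SNE). */
float vp2_seq(float a, float b)
{
    if (isnan(a) || isnan(b)) return NAN;
    return (a == b) ? 1.0f : 0.0f;
}

float vp2_sne(float a, float b)
{
    if (isnan(a) || isnan(b)) return NAN;
    return (a != b) ? 1.0f : 0.0f;
}

/* Condition tests; GE and FL are assumed condition names. */
typedef enum {
    TEST_EQ, TEST_GE, TEST_GT, TEST_LE, TEST_LT, TEST_NE, TEST_TR, TEST_FL
} cc_test_t;

/* With a UN condition code, only TR and NE evaluate to true. */
int cc_test(cc_t cc, cc_test_t t)
{
    switch (t) {
    case TEST_EQ: return cc == CC_EQ;
    case TEST_GE: return cc == CC_GT || cc == CC_EQ;
    case TEST_GT: return cc == CC_GT;
    case TEST_LE: return cc == CC_LT || cc == CC_EQ;
    case TEST_LT: return cc == CC_LT;
    case TEST_NE: return cc != CC_EQ;
    case TEST_TR: return 1;
    case TEST_FL: return 0;
    }
    return 0;
}
```

      Note that NE is modeled as "not EQ" rather than "LT or GT", which is
      what makes it true for UN.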

    How can the various features of this extension be used to provide skinning
    functionality similar to that in ARB_vertex_blend and ARB_matrix_palette?
    And how can that functionality be extended?

      Assume an implementation that allows application of up to 8 matrices at
      once. Further assume that v[12].xyzw and v[13].xyzw hold the set of 8
      weights, and v[14].xyzw and v[15].xyzw hold the set of 8 matrix indices.
      Furthermore, assume that the palette of matrices is stored/tracked at
      c[0], c[4], c[8], and so on, so that each matrix index names the first
      of the four program parameter registers holding that matrix. As an
      additional optimization, an application can specify that fewer than 8
      matrices should be applied by storing a negative palette index
      immediately after the last index to be applied.

      Skinning support in this example can be provided by the following code:

        ARLC A0, v[14];               # load 4 palette indices at once
        DP4 R1.x, c[A0.x+0], v[0];    # 1st matrix transform
        DP4 R1.y, c[A0.x+1], v[0];
        DP4 R1.z, c[A0.x+2], v[0];
        DP4 R1.w, c[A0.x+3], v[0];
        MUL R0, R1, v[12].x;          # accumulate weighted sum in R0
        BRA end (LT.y);               # stop on a negative matrix index
        DP4 R1.x, c[A0.y+0], v[0];    # 2nd matrix transform
        DP4 R1.y, c[A0.y+1], v[0];
        DP4 R1.z, c[A0.y+2], v[0];
        DP4 R1.w, c[A0.y+3], v[0];
        MAD R0, R1, v[12].y, R0;      # accumulate weighted sum in R0
        BRA end (LT.z);               # stop on a negative matrix index

        ...                           # 3rd and 4th matrix transform

        ARLC A0, v[15];               # load next four palette indices
        BRA end (LT.x);
        DP4 R1.x, c[A0.x+0], v[0];    # 5th matrix transform
        DP4 R1.y, c[A0.x+1], v[0];
        DP4 R1.z, c[A0.x+2], v[0];
        DP4 R1.w, c[A0.x+3], v[0];
        MAD R0, R1, v[13].x, R0;      # accumulate weighted sum in R0
        BRA end (LT.y);               # stop on a negative matrix index

        ...                           # 6th, 7th, and 8th matrix transform

      end:
        ...                           # any additional instructions

      The amount of code used by this example could further be reduced using a
      subroutine performing four transformations at a time (assuming skin4
      does not modify the condition code register):

        ARLC A0, v[14];               # load first four indices
        CAL skin4;                    # do first four transformations
        BRA end (LT);                 # end if any of the first 4 indices was < 0
        ARLC A0, v[15];               # load second four indices
        CAL skin4;                    # do second four transformations
      end:
        ...                           # any additional instructions

    Why does the RCC instruction exist?

      RESOLVED: To perform numeric operations that will avoid overflow and
      underflow issues.

    Should the specification provide more examples?

      RESOLVED: It would be nice.


New Procedures and Functions

    None.


New Tokens

    None.


Additions to Chapter 2 of the OpenGL 1.3 Specification (OpenGL Operation)

    Modify Section 2.11, Clipping (p. 39)

    (modify last paragraph, p. 39) When the GL is not in vertex program mode
    (section 2.14), this view volume may be further restricted by as many as n
    client-defined clip planes to generate the clip volume. ...

    (add before next-to-last paragraph, p. 40) When the GL is in vertex
    program mode, the view volume may be restricted to the individual clip
    distance volumes derived from the per-vertex clip distances (o[CLP0] -
    o[CLP5]). Clip distance volumes are applied if and only if per-vertex
    clip distances are supported in the vertex program execution
    environment. A point P belonging to the primitive under consideration is
    in the clip distance volume numbered n if and only if

        c_n(P) >= 0,

    where c_n(P) is the interpolated value of the clip distance CLPn at the
    point P. For point primitives, c_n(P) is simply the clip distance for the
    vertex in question.
    For line and triangle primitives, per-vertex clip distances are
    interpolated using a weighted mean, with weights derived according to the
    algorithms described in sections 3.4 and 3.5.

    (modify next-to-last paragraph, p. 40) Client-defined clip planes or clip
    distance volumes are enabled with the generic Enable command and disabled
    with the Disable command. The value of the argument to either command is
    CLIP_PLANEi, where i is an integer between 0 and n-1; specifying a value
    of i enables or disables the plane equation with index i. The constants
    obey CLIP_PLANEi = CLIP_PLANE0 + i.


    Add Section 2.14, Vertex Programs (p. 57). This section supersedes the
    similar section added in the NV_vertex_program extension and extended in
    the NV_vertex_program1_1 extension.

    The conventional GL vertex transformation model described in sections 2.10
    through 2.13 is a configurable, but essentially hard-wired, sequence of
    per-vertex computations based on a canonical set of per-vertex parameters
    and vertex transformation related state such as transformation matrices,
    lighting parameters, and texture coordinate generation parameters.

    The general success and utility of the conventional GL vertex
    transformation model reflects its basic correspondence to the typical
    vertex transformation requirements of 3D applications.

    However, when the conventional GL vertex transformation model is not
    sufficient, the vertex program mode provides a substantially more flexible
    model for vertex transformation. The vertex program mode permits
    applications to define their own vertex programs.


    Section 2.14.1, Vertex Program Execution Environment

    The vertex program execution environment is an operational model that
    defines how a program is executed.
    The execution environment includes a set of instructions, a set of
    registers, and semantic rules defining how operations are performed.
    There are three vertex program execution environments: VP1, VP1.1, and
    VP2. The environment names are taken from the mandatory program prefix
    strings found at the beginning of all vertex programs. The VP1.1
    execution environment is a minor addition to the VP1 execution
    environment, so references to the VP1 execution environment below apply
    to both VP1 and VP1.1 execution environments except where otherwise
    noted.

    The vertex program instruction set consists primarily of floating-point
    4-component vector operations operating on per-vertex attributes and
    program parameters. Vertex programs execute on a per-vertex basis and
    operate on each vertex completely independently from the processing of
    other vertices. Vertex programs execute without data hazards so results
    computed in one operation can be used immediately afterwards. Vertex
    programs produce a set of vertex result vectors that becomes the set of
    transformed vertex parameters used by primitive assembly.

    In the VP1 environment, vertex programs execute a finite fixed sequence of
    instructions with no branching or looping. In the VP2 environment, vertex
    programs support conditional and unconditional branches and four levels of
    subroutine calls.

    The vertex program register set consists of six types of registers
    described in the following sections.


    Section 2.14.1.1, Vertex Attribute Registers

    The Vertex Attribute Registers are sixteen 4-component vector
    floating-point registers containing the current vertex's per-vertex
    attributes. These registers are numbered 0 through 15.
    These registers are private to each vertex program invocation and are
    initialized at each vertex program invocation by the current vertex
    attribute state specified with VertexAttribNV commands. These registers
    are read-only during vertex program execution. The VertexAttribNV
    commands used to update the vertex attribute registers can be issued both
    outside and inside of Begin/End pairs. Vertex program execution is
    provoked by updating vertex attribute zero. Updating vertex attribute zero
    outside of a Begin/End pair is ignored without generating any error
    (identical to the Vertex command operation).

    The commands

        void VertexAttrib{1234}{sfd}NV(uint index, T coords);
        void VertexAttrib{1234}{sfd}vNV(uint index, T coords);
        void VertexAttrib4ubNV(uint index, T coords);
        void VertexAttrib4ubvNV(uint index, T coords);

    specify the particular current vertex attribute indicated by index.
    The coordinates for each vertex attribute are named x, y, z, and w.
    The VertexAttrib1NV family of commands sets the x coordinate to the
    provided single argument while setting y and z to 0 and w to 1.
    Similarly, VertexAttrib2NV sets x and y to the specified values,
    z to 0 and w to 1; VertexAttrib3NV sets x, y, and z, with w set
    to 1, and VertexAttrib4NV sets all four coordinates. The error
    INVALID_VALUE is generated if index is greater than 15.

    No conversions are applied to the vertex attributes specified as
    type short, float, or double. However, vertex attributes specified
    as type ubyte are converted as described by Table 2.6.

    The commands

        void VertexAttribs{1234}{sfd}vNV(uint index, sizei n, T coords[]);
        void VertexAttribs4ubvNV(uint index, sizei n, GLubyte coords[]);

    specify a contiguous set of n vertex attributes.
    The effect of

        VertexAttribs{1234}{sfd}vNV(index, n, coords)

    is the same (assuming no errors) as the command sequence

        #define NUM k  /* where k is 1, 2, 3, or 4 components */
        int i;
        for (i=n-1; i>=0; i--) {
            VertexAttrib{NUM}{sfd}vNV(i+index, &coords[i*NUM]);
        }

    VertexAttribs4ubvNV behaves similarly.

    The VertexAttribNV calls equivalent to VertexAttribsNV are issued in
    reverse order so that vertex program execution is provoked when index
    is zero only after all the other vertex attributes have first been
    specified.

    The set and operation of vertex attribute registers are identical for both
    the VP1 and VP2 execution environments.


    Section 2.14.1.2, Program Parameter Registers

    The Program Parameter Registers are a set of 4-component floating-point
    vector registers containing the vertex program parameters. In the VP1
    execution environment, there are 96 registers, numbered 0 through 95. In
    the VP2 execution environment, there are 256 registers, numbered 0
    through 255. This relatively large set of registers is intended to hold
    parameters such as matrices, lighting parameters, and constants required
    by vertex programs. Vertex program parameter registers can be updated in
    one of two ways: by the ProgramParameterNV commands outside of a
    Begin/End pair or by a vertex state program executed outside of a
    Begin/End pair (vertex state programs are discussed in section 2.14.3).

    The commands

        void ProgramParameter4fNV(enum target, uint index,
                                  float x, float y, float z, float w)
        void ProgramParameter4dNV(enum target, uint index,
                                  double x, double y, double z, double w)

    specify the particular program parameter indicated by index.
    The coordinate values x, y, z, and w are assigned to the respective
    components of the particular program parameter. target must be
    VERTEX_PROGRAM_NV.

    The commands

        void ProgramParameter4dvNV(enum target, uint index, double *params);
        void ProgramParameter4fvNV(enum target, uint index, float *params);

    operate identically to ProgramParameter4dNV and ProgramParameter4fNV,
    respectively, except that the program parameters are passed as an
    array of four components.

    The error INVALID_VALUE is generated if the specified index is greater
    than or equal to the number of program parameters in the execution
    environment (96 for VP1, 256 for VP2).

    The commands

        void ProgramParameters4dvNV(enum target, uint index,
                                    uint num, double *params);
        void ProgramParameters4fvNV(enum target, uint index,
                                    uint num, float *params);

    specify a contiguous set of num program parameters. The effect is
    the same (assuming no errors) as

        for (i=index; i<index+num; i++) {
            ProgramParameter4{fd}vNV(target, i, &params[(i-index)*4]);
        }

    The error INVALID_VALUE is generated if the sum of <index> and <num> is
    greater than the number of program parameters in the execution environment
    (96 for VP1, 256 for VP2).

    The program parameter registers are shared among all vertex program
    invocations within a rendering context. ProgramParameterNV command
    updates and vertex state program executions are serialized with respect to
    vertex program invocations and other vertex state program executions.

    Writes to the program parameter registers during vertex state program
    execution can be masked on a per-component basis.

    The initial value of all 96 (VP1) or 256 (VP2) program parameter registers
    is (0,0,0,0).


    Section 2.14.1.3, Address Registers

    The Address Registers are 4-component vector registers with signed 10-bit
    integer components. In the VP1 execution environment, there is only a
    single address register (A0) and only the x component of the register is
    accessible.
    In the VP2 execution environment, there are two address registers (A0
    and A1), of which all four components are accessible. The address
    registers are private to each vertex program invocation and are
    initialized to (0,0,0,0) at every vertex program invocation. These
    registers can be written during vertex program execution (but not read)
    and their values can be used as a relative offset for reading vertex
    program parameter registers. Only the vertex program parameter registers
    can be read using relative addressing (writes using relative addressing
    are not supported).

    See the discussion of relative addressing of program parameters in section
    2.14.2.1 and the discussion of the ARL instruction in section 2.14.3.4.


    Section 2.14.1.4, Temporary Registers

    The Temporary Registers are 4-component floating-point vector registers
    used to hold temporary results during vertex program execution. In the
    VP1 execution environment, there are 12 temporary registers, numbered 0
    through 11. In the VP2 execution environment, there are 16 temporary
    registers, numbered 0 through 15. These registers are private to each
    vertex program invocation and initialized to (0,0,0,0) at every vertex
    program invocation. These registers can be read and written during vertex
    program execution. Writes to these registers can be masked on a
    per-component basis.

    In the VP2 execution environment, there is one additional temporary
    pseudo-register, "CC". CC is treated as an unnumbered, write-only
    temporary register, whose sole purpose is to allow instructions to modify
    the condition code register (section 2.14.1.6) without overwriting the
    contents of any temporary register.


    Section 2.14.1.5, Vertex Result Registers

    The Vertex Result Registers are 4-component floating-point vector
    registers used to write the results of a vertex program.
    There are 15 result registers in the VP1 execution environment, and 21
    in the VP2 execution environment. Each register value is initialized to
    (0,0,0,1) at the invocation of each vertex program. Writes to the vertex
    result registers can be masked on a per-component basis. These registers
    are named in Table X.1 and further discussed below.


      Vertex Result                                      Component
      Register Name   Description                        Interpretation
      --------------  ---------------------------------  --------------
        HPOS          Homogeneous clip space position    (x,y,z,w)
        COL0          Primary color (front-facing)       (r,g,b,a)
        COL1          Secondary color (front-facing)     (r,g,b,a)
        BFC0          Back-facing primary color          (r,g,b,a)
        BFC1          Back-facing secondary color        (r,g,b,a)
        FOGC          Fog coordinate                     (f,*,*,*)
        PSIZ          Point size                         (p,*,*,*)
        TEX0          Texture coordinate set 0           (s,t,r,q)
        TEX1          Texture coordinate set 1           (s,t,r,q)
        TEX2          Texture coordinate set 2           (s,t,r,q)
        TEX3          Texture coordinate set 3           (s,t,r,q)
        TEX4          Texture coordinate set 4           (s,t,r,q)
        TEX5          Texture coordinate set 5           (s,t,r,q)
        TEX6          Texture coordinate set 6           (s,t,r,q)
        TEX7          Texture coordinate set 7           (s,t,r,q)
        CLP0(*)       Clip distance 0                    (d,*,*,*)
        CLP1(*)       Clip distance 1                    (d,*,*,*)
        CLP2(*)       Clip distance 2                    (d,*,*,*)
        CLP3(*)       Clip distance 3                    (d,*,*,*)
        CLP4(*)       Clip distance 4                    (d,*,*,*)
        CLP5(*)       Clip distance 5                    (d,*,*,*)

      Table X.1: Vertex Result Registers. (*) Registers CLP0 through CLP5
      are available only in the VP2 execution environment.

    HPOS is the transformed vertex's homogeneous clip space position. The
    vertex's homogeneous clip space position is converted to normalized device
    coordinates and transformed to window coordinates as described at the end
    of section 2.10 and in section 2.11.
    Further processing (subsequent to vertex program termination) is
    responsible for clipping primitives assembled from vertex
    program-generated vertices as described in section 2.10 but all
    client-defined clip planes are treated as if they are disabled when vertex
    program mode is enabled.

    Four distinct color results can be generated for each vertex. COL0 is the
    transformed vertex's front-facing primary color. COL1 is the transformed
    vertex's front-facing secondary color. BFC0 is the transformed vertex's
    back-facing primary color. BFC1 is the transformed vertex's back-facing
    secondary color.

    Primitive coloring may operate in two-sided color mode. This behavior is
    enabled and disabled by calling Enable or Disable with the symbolic value
    VERTEX_PROGRAM_TWO_SIDE_NV. The selection between the back-facing colors
    and the front-facing colors depends on the primitive of which the vertex
    is a part. If the primitive is a point or a line segment, the
    front-facing colors are always selected. If the primitive is a polygon
    and two-sided color mode is disabled, the front-facing colors are
    selected. If it is a polygon and two-sided color mode is enabled, then
    the selection is based on the sign of the (clipped or unclipped) polygon's
    signed area computed in window coordinates. This facingness determination
    is identical to the two-sided lighting facingness determination described
    in section 2.13.1.

    The selected primary and secondary colors for each primitive are clamped
    to the range [0,1] and then interpolated across the assembled primitive
    during rasterization with at least 8-bit accuracy for each color
    component.

    FOGC is the transformed vertex's fog coordinate.
    The register's first floating-point component is interpolated across the
    assembled primitive during rasterization and used as the fog distance to
    compute the per-fragment fog factor when fog is enabled. However, if both
    fog and vertex program mode are enabled, but the FOGC vertex result
    register is not written, the fog factor is overridden to 1.0. The
    register's other three components are ignored.

    Point size determination may operate in program-specified point size mode.
    This behavior is enabled and disabled by calling Enable or Disable with
    the symbolic value VERTEX_PROGRAM_POINT_SIZE_NV. If the vertex is for a
    point primitive and the mode is enabled and the PSIZ vertex result is
    written, the point primitive's size is determined by the clamped x
    component of the PSIZ register. Otherwise (because vertex program mode is
    disabled, program-specified point size mode is disabled, or because the
    vertex program did not write PSIZ), the point primitive's size is
    determined by the point size state (the state specified using the
    PointSize command).

    The PSIZ register's x component is clamped to the range zero through
    either the hi value of ALIASED_POINT_SIZE_RANGE if point smoothing is
    disabled or the hi value of the SMOOTH_POINT_SIZE_RANGE if point smoothing
    is enabled. The register's other three components are ignored.

    If the vertex is not for a point primitive, the value of the PSIZ vertex
    result register is ignored.

    TEX0 through TEX7 are the transformed vertex's texture coordinate sets for
    texture units 0 through 7. These floating-point coordinates are
    interpolated across the assembled primitive during rasterization and used
    for accessing textures. If the number of texture units supported is less
    than eight, the values of vertex result registers that do not correspond
    to existent texture units are ignored.
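
    The point size selection and clamping rules for PSIZ can be sketched in C
    as follows. The struct and function names are hypothetical, the two *_hi
    fields stand in for the hi values of ALIASED_POINT_SIZE_RANGE and
    SMOOTH_POINT_SIZE_RANGE that an application would query, and the vertex
    is assumed to belong to a point primitive (for other primitives PSIZ is
    ignored).

```c
#include <assert.h>

/* Hypothetical state relevant to point size selection. */
typedef struct {
    int   vertex_program_mode;      /* vertex program mode enabled */
    int   program_point_size_mode;  /* VERTEX_PROGRAM_POINT_SIZE_NV enabled */
    int   point_smooth;             /* point smoothing enabled */
    float point_size_state;         /* value specified with PointSize */
    float aliased_hi;               /* hi value of ALIASED_POINT_SIZE_RANGE */
    float smooth_hi;                /* hi value of SMOOTH_POINT_SIZE_RANGE */
} point_size_state_t;

/* Size of a point primitive, given whether the vertex program wrote
 * PSIZ and the x component it wrote. */
float select_point_size(const point_size_state_t *s,
                        int psiz_written, float psiz_x)
{
    if (s->vertex_program_mode && s->program_point_size_mode && psiz_written) {
        /* Clamp PSIZ.x to [0, hi]; hi depends on point smoothing. */
        float hi = s->point_smooth ? s->smooth_hi : s->aliased_hi;
        assert(hi >= 0.0f);
        if (psiz_x < 0.0f) return 0.0f;
        if (psiz_x > hi) return hi;
        return psiz_x;
    }
    /* Otherwise the PointSize state determines the size. */
    return s->point_size_state;
}
```

    For example, with point smoothing disabled and an assumed aliased hi
    value of 64.0, a PSIZ.x of 100.0 yields a point size of 64.0, while the
    same vertex program run with VERTEX_PROGRAM_POINT_SIZE_NV disabled falls
    back to the PointSize value.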

    CLP0 through CLP5, available only in the VP2 execution environment, are
    the transformed vertex's clip distances.  These floating-point values
    are used by the post-vertex-program clipping process (see section 2.11).


    Section 2.14.1.6, The Condition Code Register

    The VP2 execution environment provides a single four-component vector
    called the condition code register.  Each component of this register is
    one of four enumerated values: GT (greater than), EQ (equal), LT (less
    than), or UN (unordered).  The condition code register can be used to
    mask writes to registers and to evaluate conditional branches.

    Most vertex program instructions can optionally update the condition code
    register.  When a vertex program instruction updates the condition code
    register, a condition code component is set to LT if the corresponding
    component of the result is less than zero, EQ if it is equal to zero, GT
    if it is greater than zero, and UN if it is NaN (not a number).

    The condition code register is initialized to a vector of EQ values each
    time a vertex program executes.

    There is no condition code register available in the VP1 execution
    environment.


    Section 2.14.1.7, Semantic Meaning for Vertex Attributes and Program
    Parameters

    One important distinction between the conventional GL vertex
    transformation mode and the vertex program mode is that per-vertex
    parameters and other state parameters in vertex program mode do not have
    dedicated semantic interpretations the way that they do with the
    conventional GL vertex transformation mode.

    For example, in the conventional GL vertex transformation mode, the
    Normal command specifies a per-vertex normal.
    The semantic that the Normal command supplies a normal for lighting is
    established because that is how the per-vertex attribute supplied by the
    Normal command is used by the conventional GL vertex transformation mode.
    Similarly, other state parameters such as a light source position have
    semantic interpretations based on how the conventional GL vertex
    transformation model uses each particular parameter.

    In contrast, vertex attributes and program parameters for vertex programs
    have no pre-defined semantic meanings.  The meaning of a vertex attribute
    or program parameter in vertex program mode is defined by how the vertex
    attribute or program parameter is used by the current vertex program to
    compute and write values to the Vertex Result Registers.  This is the
    reason that per-vertex attributes and program parameters for vertex
    programs are numbered instead of named.

    For convenience, however, the existing per-vertex parameters for the
    conventional GL vertex transformation mode (vertices, normals, colors,
    fog coordinates, vertex weights, and texture coordinates) are aliased to
    numbered vertex attributes.  This aliasing is specified in Table X.2.
    The table includes how the various conventional components map to the
    4-component vertex attribute components.

Vertex
Attribute  Conventional                                          Conventional
Register   Per-vertex       Conventional                         Component
Number     Parameter        Per-vertex Parameter Command         Mapping
---------  ---------------  -----------------------------------  ------------
 0         vertex position  Vertex                               x,y,z,w
 1         vertex weights   VertexWeightEXT                      w,0,0,1
 2         normal           Normal                               x,y,z,1
 3         primary color    Color                                r,g,b,a
 4         secondary color  SecondaryColorEXT                    r,g,b,1
 5         fog coordinate   FogCoordEXT                          fc,0,0,1
 6         -                -                                    -
 7         -                -                                    -
 8         texture coord 0  MultiTexCoord(GL_TEXTURE0_ARB, ...)  s,t,r,q
 9         texture coord 1  MultiTexCoord(GL_TEXTURE1_ARB, ...)  s,t,r,q
10         texture coord 2  MultiTexCoord(GL_TEXTURE2_ARB, ...)  s,t,r,q
11         texture coord 3  MultiTexCoord(GL_TEXTURE3_ARB, ...)  s,t,r,q
12         texture coord 4  MultiTexCoord(GL_TEXTURE4_ARB, ...)  s,t,r,q
13         texture coord 5  MultiTexCoord(GL_TEXTURE5_ARB, ...)  s,t,r,q
14         texture coord 6  MultiTexCoord(GL_TEXTURE6_ARB, ...)  s,t,r,q
15         texture coord 7  MultiTexCoord(GL_TEXTURE7_ARB, ...)  s,t,r,q

Table X.2:  Aliasing of vertex attributes with conventional per-vertex
parameters.

    Only vertex attribute zero is treated specially because it is the
    attribute that provokes the execution of the vertex program; this is the
    attribute that aliases to the Vertex command's vertex coordinates.

    The result of a vertex program is the set of post-transformation vertex
    parameters written to the Vertex Result Registers.  All vertex programs
    must write a homogeneous clip space position, but the other Vertex Result
    Registers can be optionally written.

    Clipping and culling are not the responsibility of vertex programs
    because these operations assume the assembly of multiple vertices into a
    primitive.  View frustum clipping is performed subsequent to vertex
    program execution.  Clip planes are not supported in the VP1 execution
    environment.  Clip planes are supported indirectly via the clip distance
    (o[CLPx]) registers in the VP2 execution environment.


    Section 2.14.1.8, Vertex Program Specification

    Vertex programs are specified as an array of ubytes.  The array is a
    string of ASCII characters encoding the program.

    The command

      LoadProgramNV(enum target, uint id, sizei len,
                    const ubyte *program);

    loads a vertex program when the target parameter is VERTEX_PROGRAM_NV.
    Multiple programs can be loaded with different names.  id names the
    program to load.  The name space for programs is the positive integers
    (zero is reserved).
    The error INVALID_VALUE occurs if a program is loaded with an id of zero.
    The error INVALID_OPERATION is generated if a program is loaded for an id
    that is currently loaded with a program of a different program target.
    Managing the program name space and binding to vertex programs is
    discussed later in section 2.14.1.9.

    program is a pointer to an array of ubytes that represents the program
    being loaded.  The length of the array is indicated by len.

    A second program target type known as vertex state programs is discussed
    in section 2.14.4.

    At program load time, the program is parsed into a set of tokens possibly
    separated by whitespace.  Spaces, tabs, newlines, carriage returns, and
    comments are considered whitespace.  Comments begin with the character
    "#" and are terminated by a newline, a carriage return, or the end of the
    program array.

    The Backus-Naur Form (BNF) grammar below specifies the syntactically
    valid sequences for several types of vertex programs.  The set of valid
    tokens can be inferred from the grammar.  The token "" represents an
    empty string and is used to indicate optional rules.  A program is
    invalid if it contains any undefined tokens or characters.

    The grammar provides for three different vertex program types,
    corresponding to the three vertex program execution environments.  VP1,
    VP1.1, and VP2 programs match the grammar rules <vp1-program>,
    <vp11-program>, and <vp2-program>, respectively.  Some grammar rules
    correspond to features or instruction forms available only in certain
    execution environments.  Rules beginning with the prefix "vp1-" are
    available only to VP1 and VP1.1 programs.  Rules beginning with the
    prefix "vp11-" are available to VP1.1 and VP2 programs, and rules
    beginning with the prefix "vp2-" are available only to VP2 programs.
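
    As a minimal sketch of the first step in telling the three program types
    apart, the C function below inspects only the leading header token.  The
    names VpType and classify_program are hypothetical, and the sketch
    assumes the header occupies the very first bytes of the program array,
    with no preceding whitespace or comments.  It takes an explicit length
    because the program passed to LoadProgramNV is a counted ubyte array that
    need not be NUL-terminated.

```c
#include <string.h>

/* Program types corresponding to <vp1-program>, <vp11-program>, and
   <vp2-program>. */
typedef enum { VP_UNKNOWN, VP_1_0, VP_1_1, VP_2_0 } VpType;

/* Classifies a program by its header.  Only the header is examined; the
   rest of the program is not validated. */
VpType classify_program(const char *text, size_t len)
{
    static const struct {
        const char *header;
        VpType type;
    } kHeaders[] = {
        {"!!VP1.0", VP_1_0},
        {"!!VP1.1", VP_1_1},
        {"!!VP2.0", VP_2_0},
    };

    for (size_t i = 0; i < sizeof kHeaders / sizeof kHeaders[0]; ++i) {
        size_t n = strlen(kHeaders[i].header);
        /* memcmp never reads past len, so text need not end in NUL. */
        if (len >= n && memcmp(text, kHeaders[i].header, n) == 0)
            return kHeaders[i].type;
    }
    return VP_UNKNOWN;
}
```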


    <program>               ::= <vp1-program>
                              | <vp11-program>
                              | <vp2-program>

    <vp1-program>           ::= "!!VP1.0" <programBody> "END"

    <vp11-program>          ::= "!!VP1.1" <programBody> "END"

    <vp2-program>           ::= "!!VP2.0" <programBody> "END"

    <programBody>           ::= <optionSequence> <programText>

    <optionSequence>        ::= <option> <optionSequence>
                              | ""

    <option>                ::= "OPTION" <vp11-option> ";"
                              | "OPTION" <vp2-option> ";"

    <vp11-option>           ::= "NV_position_invariant"

    <vp2-option>            ::= "NV_position_invariant"

    <programText>           ::= <programTextItem> <programText>
                              | ""

    <programTextItem>       ::= <instruction> ";"
                              | <vp2-instructionLabel>

    <instruction>           ::= <ARL-instruction>
                              | <VECTORop-instruction>
                              | <SCALARop-instruction>
                              | <BINop-instruction>
                              | <TRIop-instruction>
                              | <vp2-BRA-instruction>
                              | <vp2-RET-instruction>
                              | <vp2-ARA-instruction>

    <ARL-instruction>       ::= <vp1-ARL-instruction>
                              | <vp2-ARL-instruction>

    <vp1-ARL-instruction>   ::= "ARL" <maskedAddrReg> "," <scalarSrc>

    <vp2-ARL-instruction>   ::= <vp2-ARLop> <maskedAddrReg> "," <vectorSrc>

    <vp2-ARLop>             ::= "ARL" | "ARLC"
                              | "ARR" | "ARRC"

    <VECTORop-instruction>  ::= <VECTORop> <maskedDstReg> "," <vectorSrc>

    <VECTORop>              ::= "LIT"
                              | "MOV"
                              | <vp11-VECTORop>
                              | <vp2-VECTORop>

    <vp11-VECTORop>         ::= "ABS"

    <vp2-VECTORop>          ::= "ABSC"
                              | "FLR" | "FLRC"
                              | "FRC" | "FRCC"
                              | "LITC"
                              | "MOVC"
                              | "SSG" | "SSGC"

    <SCALARop-instruction>  ::= <SCALARop> <maskedDstReg> "," <scalarSrc>

    <SCALARop>              ::= "EXP"
                              | "LOG"
                              | "RCP"
                              | "RSQ"
                              | <vp11-SCALARop>
                              | <vp2-SCALARop>

    <vp11-SCALARop>         ::= "RCC"

    <vp2-SCALARop>          ::= "COS" | "COSC"
                              | "EX2" | "EX2C"
                              | "LG2" | "LG2C"
                              | "EXPC"
                              | "LOGC"
                              | "RCCC"
                              | "RCPC"
                              | "RSQC"
                              | "SIN" | "SINC"

    <BINop-instruction>     ::= <BINop> <maskedDstReg> "," <vectorSrc>
                                "," <vectorSrc>

    <BINop>                 ::= "ADD"
                              | "DP3"
                              | "DP4"
                              | "DST"
                              | "MAX"
                              | "MIN"
                              | "MUL"
                              | "SGE"
                              | "SLT"
                              | <vp11-BINop>
                              | <vp2-BINop>

    <vp11-BINop>            ::= "DPH"
                              | "SUB"

    <vp2-BINop>             ::= "ADDC"
                              | "DP3C"
                              | "DP4C"
                              | "DPHC"
                              | "DSTC"
                              | "MAXC"
                              | "MINC"
                              | "MULC"
                              | "SEQ" | "SEQC"
                              | "SFL" | "SFLC"
                              | "SGEC"
                              | "SGT" | "SGTC"
                              | "SLTC"
                              | "SLE" | "SLEC"
                              | "SNE" | "SNEC"
                              | "STR" | "STRC"
                              | "SUBC"

    <TRIop-instruction>     ::= <TRIop> <maskedDstReg> "," <vectorSrc> ","
                                <vectorSrc> "," <vectorSrc>

    <TRIop>                 ::= "MAD"
                              | <vp2-TRIop>

    <vp2-TRIop>             ::= "MADC"

    <vp2-BRA-instruction>   ::= <vp2-BRANCHop> <vp2-branchLabel>
                                <vp2-branchCondition>

    <vp2-BRANCHop>          ::= "BRA"
                              | "CAL"

    <vp2-RET-instruction>   ::= "RET" <vp2-branchCondition>

    <vp2-ARA-instruction>   ::= <vp2-ARAop> <maskedAddrReg> "," <addrRegister>

    <vp2-ARAop>             ::= "ARA" | "ARAC"

    <scalarSrc>             ::= <baseScalarSrc>
                              | <vp2-absScalarSrc>

    <vp2-absScalarSrc>      ::= <optionalSign> "|" <baseScalarSrc> "|"

    <baseScalarSrc>         ::= <optionalSign> <srcRegister> <scalarSuffix>

    <vectorSrc>             ::= <baseVectorSrc>
                              | <vp2-absVectorSrc>

    <vp2-absVectorSrc>      ::= <optionalSign> "|" <baseVectorSrc> "|"

    <baseVectorSrc>         ::= <optionalSign> <srcRegister> <swizzleSuffix>

    <srcRegister>           ::= <vtxAttribRegister>
                              | <progParamRegister>
                              | <tempRegister>

    <maskedDstReg>          ::= <dstRegister> <optionalWriteMask>
                                <optionalCCMask>

    <dstRegister>           ::= <vtxResultRegister>
                              | <tempRegister>
                              | <vp2-nullRegister>

    <vp2-nullRegister>      ::= "CC"

    <vp2-branchCondition>   ::= <optionalCCMask>

    <vtxAttribRegister>     ::= "v" "[" <vtxAttribRegNum> "]"

    <vtxAttribRegNum>       ::= decimal integer from 0 to 15 inclusive
                              | "OPOS"
                              | "WGHT"
                              | "NRML"
                              | "COL0"
                              | "COL1"
                              | "FOGC"
                              | "TEX0"
                              | "TEX1"
                              | "TEX2"
                              | "TEX3"
                              | "TEX4"
                              | "TEX5"
                              | "TEX6"
                              | "TEX7"

    <progParamRegister>     ::= <absProgParamReg>
                              | <relProgParamReg>

    <absProgParamReg>       ::= "c" "[" <progParamRegNum> "]"

    <progParamRegNum>       ::= <vp1-progParamRegNum>
                              | <vp2-progParamRegNum>

    <vp1-progParamRegNum>   ::= decimal integer from 0 to 95 inclusive

    <vp2-progParamRegNum>   ::= decimal integer from 0 to 255 inclusive

    <relProgParamReg>       ::= "c" "[" <scalarAddr> <relProgParamOffset> "]"

    <relProgParamOffset>    ::= ""
                              | "+" <progParamPosOffset>
                              | "-" <progParamNegOffset>

    <progParamPosOffset>    ::= <vp1-progParamPosOff>
                              | <vp2-progParamPosOff>

    <vp1-progParamPosOff>   ::= decimal integer from 0 to 63 inclusive

    <vp2-progParamPosOff>   ::= decimal integer from 0 to 255 inclusive

    <progParamNegOffset>    ::= <vp1-progParamNegOff>
                              | <vp2-progParamNegOff>

    <vp1-progParamNegOff>   ::= decimal integer from 0 to 64 inclusive

    <vp2-progParamNegOff>   ::= decimal integer from 0 to 256 inclusive

    <tempRegister>          ::= "R0"  | "R1"  | "R2"  | "R3"
                              | "R4"  | "R5"  | "R6"  | "R7"
                              | "R8"  | "R9"  | "R10" | "R11"
                              | <vp2-tempRegister>

    <vp2-tempRegister>      ::= "R12" | "R13" | "R14" | "R15"

    <vtxResultRegister>     ::= "o" "[" <vtxResultRegName> "]"

    <vtxResultRegName>      ::= "HPOS"
                              | "COL0"
                              | "COL1"
                              | "BFC0"
                              | "BFC1"
                              | "FOGC"
                              | "PSIZ"
                              | "TEX0"
                              | "TEX1"
                              | "TEX2"
                              | "TEX3"
                              | "TEX4"
                              | "TEX5"
                              | "TEX6"
                              | "TEX7"
                              | <vp2-resultRegName>

    <vp2-resultRegName>     ::= "CLP0"
                              | "CLP1"
                              | "CLP2"
                              | "CLP3"
                              | "CLP4"
                              | "CLP5"

    <scalarAddr>            ::= <addrRegister> "." <addrRegisterComp>

    <maskedAddrReg>         ::= <addrRegister> <addrWriteMask>

    <addrRegister>          ::= "A0"
                              | <vp2-addrRegister>

    <vp2-addrRegister>      ::= "A1"

    <addrRegisterComp>      ::= "x"
                              | <vp2-addrRegisterComp>

    <vp2-addrRegisterComp>  ::= "y"
                              | "z"
                              | "w"

    <addrWriteMask>         ::= "." "x"
                              | <vp2-addrWriteMask>

    <vp2-addrWriteMask>     ::= ""
                              | "." "y"
                              | "." "x" "y"
                              | "." "z"
                              | "." "x" "z"
                              | "." "y" "z"
                              | "." "x" "y" "z"
                              | "." "w"
                              | "." "x" "w"
                              | "." "y" "w"
                              | "." "x" "y" "w"
                              | "." "z" "w"
                              | "." "x" "z" "w"
                              | "." "y" "z" "w"
                              | "." "x" "y" "z" "w"

    <optionalSign>          ::= ""
                              | "-"
                              | <vp2-optionalSign>

    <vp2-optionalSign>      ::= "+"

    <vp2-instructionLabel>  ::= <vp2-branchLabel> ":"

    <vp2-branchLabel>       ::= <identifier>

    <optionalWriteMask>     ::= ""
                              | "." "x"
                              | "." "y"
                              | "." "x" "y"
                              | "." "z"
                              | "." "x" "z"
                              | "." "y" "z"
                              | "." "x" "y" "z"
                              | "." "w"
                              | "." "x" "w"
                              | "." "y" "w"
                              | "." "x" "y" "w"
                              | "." "z" "w"
                              | "." "x" "z" "w"
                              | "." "y" "z" "w"
                              | "." "x" "y" "z" "w"

    <optionalCCMask>        ::= ""
                              | <vp2-ccMask>

    <vp2-ccMask>            ::= "(" <vp2-ccMaskRule> <swizzleSuffix> ")"

    <vp2-ccMaskRule>        ::= "EQ" | "GE" | "GT" | "LE" | "LT" | "NE"
                              | "TR" | "FL"

    <scalarSuffix>          ::= "." <component>

    <swizzleSuffix>         ::= ""
                              | "." <component>
                              | "." <component> <component>
                                    <component> <component>

    <component>             ::= "x"
                              | "y"
                              | "z"
                              | "w"

    The <identifier> rule matches a sequence of one or more letters ("A"
    through "Z", "a" through "z", and "_") and digits ("0" through "9"); the
    first character must be a letter.  The underscore ("_") counts as a
    letter.  Upper and lower case letters are different (names are
    case-sensitive).
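
    The <identifier> rule used for branch labels can be checked with a small
    scanner.  The following C sketch is illustrative only; is_valid_identifier
    and its helpers are hypothetical names, not part of any GL API.  Letters
    are tested with explicit ASCII ranges rather than isalpha() so that the
    result does not depend on the C locale.

```c
#include <stdbool.h>
#include <stddef.h>

/* "Letter" for the <identifier> rule: A-Z, a-z, or underscore. */
static bool is_ident_letter(char c)
{
    return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') || c == '_';
}

static bool is_ident_digit(char c)
{
    return c >= '0' && c <= '9';
}

/* Returns true if the NUL-terminated string s matches <identifier>:
   one or more letters and digits, the first of which is a letter. */
bool is_valid_identifier(const char *s)
{
    if (s == NULL || !is_ident_letter(s[0]))
        return false;
    for (const char *p = s + 1; *p != '\0'; ++p) {
        if (!is_ident_letter(*p) && !is_ident_digit(*p))
            return false;
    }
    return true;
}
```

    Because names are case-sensitive, labels such as "Loop" and "loop"
    would be compared with a plain byte-wise comparison such as strcmp,
    with no case folding.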

    The <vtxAttribRegNum> rule matches both register numbers 0 through 15
    and a set of mnemonics that abbreviate the aliasing of conventional
    per-vertex parameters to vertex attribute register numbers.  Table X.3
    shows the mapping from mnemonic to vertex attribute register number and
    what the mnemonic abbreviates.

                  Vertex Attribute
        Mnemonic  Register Number   Meaning
        --------  ----------------  --------------------
        "OPOS"    0                 object position
        "WGHT"    1                 vertex weight
        "NRML"    2                 normal
        "COL0"    3                 primary color
        "COL1"    4                 secondary color
        "FOGC"    5                 fog coordinate
        "TEX0"    8                 texture coordinate 0
        "TEX1"    9                 texture coordinate 1
        "TEX2"    10                texture coordinate 2
        "TEX3"    11                texture coordinate 3
        "TEX4"    12                texture coordinate 4
        "TEX5"    13                texture coordinate 5
        "TEX6"    14                texture coordinate 6
        "TEX7"    15                texture coordinate 7

        Table X.3:  The mapping between vertex attribute register numbers,
        mnemonics, and meanings.

    A vertex program fails to load if it does not write at least one
    component of the HPOS register.

    A vertex program fails to load in the VP1 execution environment if it
    contains more than 128 instructions.  A vertex program fails to load in
    the VP2 execution environment if it contains more than 256 instructions.
    Each block of text matching the <instruction> rule counts as an
    instruction.

    A vertex program fails to load if any instruction sources more than one
    unique program parameter register.  An instruction can match the
    <progParamRegister> rule more than once only if all such matches are
    identical.

    A vertex program fails to load if any instruction sources more than one
    unique vertex attribute register.  An instruction can match the
    <vtxAttribRegister> rule more than once only if all such matches refer to
    the same register.
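
    Resolving the text inside "v[...]" is a direct application of the
    <vtxAttribRegNum> rule and Table X.3.  The sketch below is illustrative;
    resolve_vtx_attrib and the AttribAlias table are hypothetical names, and
    the sketch treats mnemonics case-sensitively, which is an assumption
    consistent with the rest of the grammar.

```c
#include <string.h>

/* Mnemonic-to-register mapping from Table X.3.  Attributes 6 and 7 have
   no mnemonic. */
typedef struct {
    const char *mnemonic;
    int reg;
} AttribAlias;

static const AttribAlias kAttribAliases[] = {
    {"OPOS", 0},  {"WGHT", 1},  {"NRML", 2},  {"COL0", 3},
    {"COL1", 4},  {"FOGC", 5},  {"TEX0", 8},  {"TEX1", 9},
    {"TEX2", 10}, {"TEX3", 11}, {"TEX4", 12}, {"TEX5", 13},
    {"TEX6", 14}, {"TEX7", 15},
};

/* Resolves the text between "v[" and "]" to an attribute register number.
   Accepts a decimal number 0..15 or a Table X.3 mnemonic; returns -1 if the
   text matches neither form. */
int resolve_vtx_attrib(const char *text)
{
    size_t count = sizeof kAttribAliases / sizeof kAttribAliases[0];
    for (size_t i = 0; i < count; ++i) {
        if (strcmp(text, kAttribAliases[i].mnemonic) == 0)
            return kAttribAliases[i].reg;
    }

    /* Decimal form: one or two digits with a value of at most 15. */
    size_t len = strlen(text);
    if (len == 0 || len > 2)
        return -1;
    int value = 0;
    for (size_t i = 0; i < len; ++i) {
        if (text[i] < '0' || text[i] > '9')
            return -1;
        value = value * 10 + (text[i] - '0');
    }
    return value <= 15 ? value : -1;
}
```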

    The error INVALID_OPERATION is generated if a vertex program fails to
    load because it is not syntactically correct or for one of the semantic
    restrictions listed above.

    The error INVALID_OPERATION is generated if a program is loaded for id
    when id is currently loaded with a program of a different target.

    A successfully loaded vertex program is parsed into a sequence of
    instructions.  Each instruction is identified by its tokenized name.  The
    operation of these instructions when executed is defined in section
    2.14.2.

    A successfully loaded program replaces the program previously assigned to
    the name specified by id.  If the OUT_OF_MEMORY error is generated by
    LoadProgramNV, no change is made to the previous contents of the named
    program.

    Querying the value of PROGRAM_ERROR_POSITION_NV returns a ubyte offset
    into the last loaded program string indicating where the first error in
    the program was detected.  If the program fails to load because of a
    semantic restriction that cannot be determined until the program is fully
    scanned, the error position will be len, the length of the program.  If
    the program loads successfully, the value of PROGRAM_ERROR_POSITION_NV is
    assigned the value negative one.


    Section 2.14.1.9, Vertex Program Binding and Program Management

    The current vertex program is invoked whenever vertex attribute zero is
    updated (whether by a VertexAttrib*NV or Vertex command).  The current
    vertex program is updated by

      BindProgramNV(enum target, uint id);

    where target must be VERTEX_PROGRAM_NV.  This binds the vertex program
    named by id as the current vertex program.  The error INVALID_OPERATION
    is generated if id names a program that is not a vertex program (for
    example, if id names a vertex state program as described in section
    2.14.4).
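
    The id and target checks performed by LoadProgramNV and BindProgramNV can
    be illustrated with a toy model of the program name table.  This is a
    hypothetical sketch for reasoning about the error rules, not how a GL
    implementation stores programs: Target, Err, ProgramTable, toy_load,
    toy_bind_vertex, and the fixed MAX_IDS limit are all invented for
    illustration, and parsing and the other load-time restrictions are
    omitted.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for GL targets and errors; the values are not
   the real GL token values. */
typedef enum { TARGET_NONE, TARGET_VERTEX_PROGRAM,
               TARGET_VERTEX_STATE_PROGRAM } Target;
typedef enum { ERR_NONE, ERR_INVALID_VALUE, ERR_INVALID_OPERATION } Err;

#define MAX_IDS 64   /* toy name space: ids 1..63 */

typedef struct {
    Target target[MAX_IDS];   /* TARGET_NONE means "nonexistent"      */
    unsigned bound_vertex;    /* current vertex program id (0 = none) */
} ProgramTable;

/* Name and target checks made when loading a program. */
Err toy_load(ProgramTable *t, Target target, unsigned id)
{
    if (id == 0 || id >= MAX_IDS)   /* id >= MAX_IDS is a toy limit only */
        return ERR_INVALID_VALUE;
    if (t->target[id] != TARGET_NONE && t->target[id] != target)
        return ERR_INVALID_OPERATION;
    t->target[id] = target;
    return ERR_NONE;
}

/* Binding a nonexistent id (including zero) succeeds; binding a program
   that is not a vertex program fails and leaves the binding unchanged. */
Err toy_bind_vertex(ProgramTable *t, unsigned id)
{
    bool exists = id < MAX_IDS && t->target[id] != TARGET_NONE;
    if (exists && t->target[id] != TARGET_VERTEX_PROGRAM)
        return ERR_INVALID_OPERATION;
    t->bound_vertex = id;
    return ERR_NONE;
}
```

    A failed bind leaves bound_vertex untouched, which follows the general
    GL convention that a command generating an error has no other effect.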

    Binding to a nonexistent program id does not generate an error.  In
    particular, binding to program id zero does not generate an error.
    However, because program zero cannot be loaded, program zero is always
    nonexistent.  If a program id is successfully loaded with a new vertex
    program and id is also the currently bound vertex program, the new
    program is considered the currently bound vertex program.

    The INVALID_OPERATION error is generated when Begin is called (or when a
    command that performs an implicit Begin is called) while vertex program
    mode is enabled and the current vertex program is nonexistent or not
    valid.  A vertex program may not be valid for reasons explained in
    section 2.14.5.

    Programs are deleted by calling

      void DeleteProgramsNV(sizei n, const uint *ids);

    ids contains n names of programs to be deleted.  After a program is
    deleted, it becomes nonexistent, and its name is again unused.  If a
    program that is currently bound is deleted, it is as though
    BindProgramNV has been executed with the same target as the deleted
    program and program zero.  Unused names in ids are silently ignored, as
    is the value zero.

    The command

      void GenProgramsNV(sizei n, uint *ids);

    returns n previously unused program names in ids.  These names are
    marked as used, for the purposes of GenProgramsNV only, but they become
    existent programs only when they are first loaded using LoadProgramNV.
    The error INVALID_VALUE is generated if n is negative.

    An implementation may choose to establish a working set of programs on
    which binding and ExecuteProgramNV operations (execute programs are
    explained in section 2.14.4) are performed with higher performance.  A
    program that is currently part of this working set is said to be
    resident.
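
    The difference between a name that is merely "used" (reserved by
    GenProgramsNV) and a program that "exists" (has been loaded), and the
    rebinding to program zero on deletion, can be sketched with another toy
    model.  Everything here is hypothetical: NameSpace, toy_gen,
    toy_mark_loaded, toy_delete, and the TOY_NAMES limit are invented, and
    the lowest-free-name allocation policy is an assumption (GenProgramsNV
    only promises previously unused names).  The INVALID_VALUE error for a
    negative n is not modeled; toy_gen simply returns 0.

```c
#include <stdbool.h>

#define TOY_NAMES 32   /* hypothetical name space: ids 1..31 */

typedef struct {
    bool used[TOY_NAMES];     /* reserved by gen or by a load       */
    bool exists[TOY_NAMES];   /* a program has actually been loaded */
    unsigned bound;           /* bound vertex program, 0 = none     */
} NameSpace;

/* GenProgramsNV analogue: reserves up to n unused names without creating
   programs.  Returns the number of names written to ids. */
int toy_gen(NameSpace *ns, int n, unsigned *ids)
{
    int produced = 0;
    for (unsigned id = 1; id < TOY_NAMES && produced < n; ++id) {
        if (!ns->used[id]) {
            ns->used[id] = true;
            ids[produced++] = id;
        }
    }
    return produced;
}

/* Stand-in for a successful LoadProgramNV on id. */
void toy_mark_loaded(NameSpace *ns, unsigned id)
{
    if (id != 0 && id < TOY_NAMES) {
        ns->used[id] = true;
        ns->exists[id] = true;
    }
}

/* DeleteProgramsNV analogue: zero and unused names are ignored; deleting
   the bound program behaves like binding program zero. */
void toy_delete(NameSpace *ns, int n, const unsigned *ids)
{
    for (int i = 0; i < n; ++i) {
        unsigned id = ids[i];
        if (id == 0 || id >= TOY_NAMES || !ns->used[id])
            continue;
        ns->used[id] = false;
        ns->exists[id] = false;
        if (ns->bound == id)
            ns->bound = 0;
    }
}
```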

    The command

      boolean AreProgramsResidentNV(sizei n, const uint *ids,
                                    boolean *residences);

    returns TRUE if all of the n programs named in ids are resident, or if
    the implementation does not distinguish a working set.  If at least one
    of the programs named in ids is not resident, then FALSE is returned, and
    the residence of each program is returned in residences.  Otherwise the
    contents of residences are not changed.  If any of the names in ids are
    nonexistent or zero, FALSE is returned, the error INVALID_VALUE is
    generated, and the contents of residences are indeterminate.  The
    residence status of a single named program can also be queried by
    calling GetProgramivNV with id set to the name of the program and pname
    set to PROGRAM_RESIDENT_NV.

    AreProgramsResidentNV indicates only whether a program is currently
    resident, not whether it could not be made resident.  An implementation
    may choose to make a program resident only on first use, for example.
    The client may guide the GL implementation in determining which programs
    should be resident by requesting a set of programs to make resident.

    The command

      void RequestResidentProgramsNV(sizei n, const uint *ids);

    requests that the n programs named in ids should be made resident.
    Although not all of the programs are guaranteed to become resident, the
    implementation should make a best effort to make as many of the programs
    resident as possible.  As a result of making the requested programs
    resident, program names not among the requested programs may become
    non-resident.  Higher priority for residency should be given to programs
    listed earlier in the ids array.  RequestResidentProgramsNV silently
    ignores attempts to make resident nonexistent program names or zero.
    AreProgramsResidentNV can be called after RequestResidentProgramsNV to
    determine which programs actually became resident.


    Section 2.14.2, Vertex Program Operation

    In the VP1 execution environment, there are twenty-one vertex program
    instructions.  Four of these instructions (ABS, DPH, RCC, and SUB, marked
    with (*) in Table X.4) are available only in the VP1.1 execution
    environment.  The instructions and their respective input and output
    parameters are summarized in Table X.4.

    Instruction  Inputs  Output  Description
    -----------  ------  ------  --------------------------------
    ABS(*)       v       v       absolute value
    ADD          v,v     v       add
    ARL          s       as      address register load
    DP3          v,v     ssss    3-component dot product
    DP4          v,v     ssss    4-component dot product
    DPH(*)       v,v     ssss    homogeneous dot product
    DST          v,v     v       distance vector
    EXP          s       v       exponential base 2 (approximate)
    LIT          v       v       compute light coefficients
    LOG          s       v       logarithm base 2 (approximate)
    MAD          v,v,v   v       multiply and add
    MAX          v,v     v       maximum
    MIN          v,v     v       minimum
    MOV          v       v       move
    MUL          v,v     v       multiply
    RCC(*)       s       ssss    reciprocal (clamped)
    RCP          s       ssss    reciprocal
    RSQ          s       ssss    reciprocal square root
    SGE          v,v     v       set on greater than or equal
    SLT          v,v     v       set on less than
    SUB(*)       v,v     v       subtract

    Table X.4:  Summary of vertex program instructions in the VP1 execution
    environment.  "v" indicates a floating-point vector input or output, "s"
    indicates a floating-point scalar input, "ssss" indicates a scalar output
    replicated across a 4-component vector, "as" indicates a single component
    of an address register.


    In the VP2 execution environment, there are thirty-nine vertex program
    instructions.  The instructions listed with a "[C]" suffix in Table X.5
    may also be written with a "C" suffix, which causes them to update the
    condition code register (section 2.14.1.6).  For example, there are two
    instructions to perform vector addition, "ADD" and "ADDC".  The vertex
    program instructions available in the VP2 execution environment and
    their respective input and output parameters are summarized in Table X.5.

    Instruction  Inputs  Output  Description
    -----------  ------  ------  --------------------------------
    ABS[C]       v       v       absolute value
    ADD[C]       v,v     v       add
    ARA[C]       av      av      address register add
    ARL[C]       v       av      address register load
    ARR[C]       v       av      address register load (with round)
    BRA          as      none    branch
    CAL          as      none    subroutine call
    COS[C]       s       ssss    cosine
    DP3[C]       v,v     ssss    3-component dot product
    DP4[C]       v,v     ssss    4-component dot product
    DPH[C]       v,v     ssss    homogeneous dot product
    DST[C]       v,v     v       distance vector
    EX2[C]       s       ssss    exponential base 2
    EXP[C]       s       v       exponential base 2 (approximate)
    FLR[C]       v       v       floor
    FRC[C]       v       v       fraction
    LG2[C]       s       ssss    logarithm base 2
    LIT[C]       v       v       compute light coefficients
    LOG[C]       s       v       logarithm base 2 (approximate)
    MAD[C]       v,v,v   v       multiply and add
    MAX[C]       v,v     v       maximum
    MIN[C]       v,v     v       minimum
    MOV[C]       v       v       move
    MUL[C]       v,v     v       multiply
    RCC[C]       s       ssss    reciprocal (clamped)
    RCP[C]       s       ssss    reciprocal
    RET          none    none    subroutine return
    RSQ[C]       s       ssss    reciprocal square root
    SEQ[C]       v,v     v       set on equal
    SFL[C]       v,v     v       set on false
    SGE[C]       v,v     v       set on greater than or equal
    SGT[C]       v,v     v       set on greater than
    SIN[C]       s       ssss    sine
    SLE[C]       v,v     v       set on less than or equal
    SLT[C]       v,v     v       set on less than
    SNE[C]       v,v     v       set on not equal
    SSG[C]       v       v       set sign
    STR[C]       v,v     v       set on true
    SUB[C]       v,v     v       subtract

    Table X.5:  Summary of vertex program instructions in the VP2 execution
    environment.  "v" indicates a floating-point vector input or output, "s"
    indicates a floating-point scalar input, "ssss" indicates a scalar output
    replicated across a 4-component vector, "av" indicates a full address
    register, "as" indicates a single component of an address register.


    Section 2.14.2.1, Vertex Program Operands

    Most vertex program instructions operate on floating-point vectors,
    floating-point scalars, or integer scalars, as indicated in the grammar
    (see section 2.14.1.8) by the rules <vectorSrc>, <scalarSrc>, and
    <scalarAddr>, respectively.

    The basic set of floating-point scalar operands is defined by the grammar
    rule <baseScalarSrc>.  Scalar operands are single components of vertex
    attribute, program parameter, or temporary registers, as allowed by the
    <srcRegister> rule.  A vector component is selected by the <scalarSuffix>
    rule, where the characters "x", "y", "z", and "w" select the x, y, z, and
    w components, respectively, of the vector.

    The basic set of floating-point vector operands is defined by the grammar
    rule <baseVectorSrc>.  Vector operands can be obtained from vertex
    attribute, program parameter, or temporary registers as allowed by the
    <srcRegister> rule.

    Basic vector operands can be swizzled according to the <swizzleSuffix>
    rule.  In its most general form, the <swizzleSuffix> rule matches the
    pattern ".????" where each question mark is replaced with one of "x",
    "y", "z", or "w".  For such patterns, the x, y, z, and w components of
    the operand are taken from the vector components named by the first,
    second, third, and fourth character of the pattern, respectively.  For
    example, if the swizzle suffix is ".yzzx" and the specified source
    contains {2,8,9,0}, the swizzled operand used by the instruction is
    {8,9,9,2}.

    If the <swizzleSuffix> rule matches "", it is treated as though it were
    ".xyzw".  If the <swizzleSuffix> rule matches (ignoring whitespace) ".x",
    ".y", ".z", or ".w", these are treated the same as ".xxxx", ".yyyy",
    ".zzzz", and ".wwww", respectively.

    Floating-point scalar or vector operands can optionally be negated
    according to the <optionalSign> rules in <baseScalarSrc> and
    <baseVectorSrc>.  If the <optionalSign> matches "-", each operand or
    operand component is negated.  A "+" sign, accepted only in the VP2
    execution environment, leaves the operand unchanged.

    In the VP2 execution environment, a component-wise absolute value
    operation is performed on an operand if the <scalarSrc> or <vectorSrc>
    rules match <vp2-absScalarSrc> or <vp2-absVectorSrc>.  In this case, the
    absolute value of each component of the operand is taken.  In addition,
    if the <optionalSign> rule in <vp2-absScalarSrc> or <vp2-absVectorSrc>
    matches "-", each component is subsequently negated.

    Integer scalar operands are single components of one of the address
    register vectors, as identified by the <addrRegister> rule.  A vector
    component is selected by the <addrRegisterComp> rule in the same manner
    as for floating-point scalar operands.  Negation and absolute value
    operations are not available for integer scalar operands.

    The following pseudo-code spells out the operand generation process.  In
    the pseudo-code, "float" and "int" are floating-point and integer scalar
    types, while "floatVec" and "intVec" are four-component vectors.
    "source" is the register used for the operand, matching the
    <srcRegister> or <addrRegister> rules.  "absolute" is TRUE if the operand
    matches the <vp2-absScalarSrc> or <vp2-absVectorSrc> rules, and FALSE
    otherwise.  "negateBase" is TRUE if the <optionalSign> rule in
    <baseScalarSrc> or <baseVectorSrc> matches "-" and FALSE otherwise.
    "negateAbs" is TRUE if the <optionalSign> rule in <vp2-absScalarSrc> or
    <vp2-absVectorSrc> matches "-" and FALSE otherwise.  The ".c***",
    ".*c**", ".**c*", and ".***c" modifiers refer to the x, y, z, and w
    components obtained by the swizzle operation.

      floatVec VectorLoad(floatVec source)
      {
          floatVec operand;

          operand.x = source.c***;
          operand.y = source.*c**;
          operand.z = source.**c*;
          operand.w = source.***c;
          if (negateBase) {
              operand.x = -operand.x;
              operand.y = -operand.y;
              operand.z = -operand.z;
              operand.w = -operand.w;
          }
          if (absolute) {
              operand.x = abs(operand.x);
              operand.y = abs(operand.y);
              operand.z = abs(operand.z);
              operand.w = abs(operand.w);
          }
          if (negateAbs) {
              operand.x = -operand.x;
              operand.y = -operand.y;
              operand.z = -operand.z;
              operand.w = -operand.w;
          }

          return operand;
      }

      float ScalarLoad(floatVec source)
      {
          float operand;

          operand = source.c***;
          if (negateBase) {
              operand = -operand;
          }
          if (absolute) {
              operand = abs(operand);
          }
          if (negateAbs) {
              operand = -operand;
          }

          return operand;
      }

      intVec AddrVectorLoad(intVec source)
      {
          intVec operand;

          operand.x = source.c***;
          operand.y = source.*c**;
          operand.z = source.**c*;
          operand.w = source.***c;

          return operand;
      }

      int AddrScalarLoad(intVec source)
      {
          return source.c***;
      }

    If an operand is obtained from a program parameter register, by matching
    the <progParamRegister> rule, the register number can be obtained by
    absolute or relative addressing.

    When absolute addressing is used, by matching the <absProgParamReg>
    rule, the program parameter register number is the number matching the
    <progParamRegNum> rule.

    When relative addressing is used (matching the <relProgParamReg> rule),
    the program parameter register number is computed during program
    execution.  An index is computed by adding the integer scalar operand
    specified by the <scalarAddr> rule to the positive or negative offset
    specified by the <progParamOffset> rule.  If <progParamOffset> matches "",
    an offset of zero is used.

    The following pseudo-code spells out the process of loading a program
    parameter.  "addrReg" refers to the address register used for relative
    addressing, and "absolute" is TRUE if the operand uses absolute addressing
    and FALSE otherwise.  "paramNumber" is the program parameter number for
    absolute addressing; "paramOffset" is the program parameter offset for
    relative addressing.  "paramRegister" is an array holding the complete set
    of program parameter registers.

      floatVec ProgramParameterLoad(intVec addrReg)
      {
          int index;

          if (absolute) {
              index = paramNumber;
          } else {
              index = AddrScalarLoad(addrReg) + paramOffset;
          }

          return paramRegister[index];
      }


    Section 2.14.2.2, Vertex Program Destination Register Update

    Most vertex program instructions write a 4-component result vector to a
    single temporary, vertex result, or address register.  Writes to
    individual components of the destination register are controlled by
    individual component write masks specified as part of the instruction.  In
    the VP2 execution environment, writes are additionally controlled by a
    condition code write mask, which is computed at run time.

    The component write mask is specified by the <optionalWriteMask> rule
    found in the <maskedDstReg> or <maskedAddrReg> rule.  If the optional mask
    is "", all components are enabled.  Otherwise, the optional mask names the
    individual components to enable.
    The characters "x", "y", "z", and "w" match the x, y, z, and w components
    respectively.  For example, an optional mask of ".xzw" indicates that the
    x, z, and w components should be enabled for writing but the y component
    should not.  The grammar requires that the destination register mask
    components be listed in "xyzw" order.

    In the VP2 execution environment, the condition code write mask is
    specified by the <optionalCCMask> rule found in the <maskedDstReg> and
    <maskedAddrReg> rules.  If the condition code mask matches "", all
    components are enabled.  Otherwise, the condition code register is loaded
    and swizzled according to the swizzle codes specified by <swizzleSuffix>.
    Each component of the swizzled condition code is tested according to the
    rule given by <ccMaskRule>.  <ccMaskRule> may have the values "EQ", "NE",
    "LT", "GE", "LE", or "GT", which enable writes if the corresponding
    condition code field evaluates to equal, not equal, less than, greater
    than or equal, less than or equal, or greater than, respectively.
    Comparisons involving condition codes of "UN" (unordered) evaluate to true
    for "NE" and false otherwise.  For example, if the condition code is
    (GT,LT,EQ,GT) and the condition code mask is "(NE.zyxw)", the swizzle
    operation will load (EQ,LT,GT,GT), and the mask will thus enable writes on
    the y, z, and w components.  In addition, "TR" always enables writes and
    "FL" always disables writes, regardless of the condition code.

    Each component of the destination register is updated with the result of
    the vertex program instruction if and only if the component is enabled for
    writes by both the component write mask and the optional condition code
    mask (if applicable).  Otherwise, the component of the destination
    register remains unchanged.
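    The condition code mask test described above can be sketched in C.  This
    is an assumption-laden illustration, not an implementation from this
    specification.  The enum CondCode, the functions test_cc and
    cc_write_mask, and the use of C strings for rule names are all
    hypothetical.  The sketch reproduces the "(NE.zyxw)" example and the rule
    that "UN" satisfies only "NE".

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical encoding of the four condition code values. */
typedef enum { CC_LT, CC_EQ, CC_GT, CC_UN } CondCode;

/* Tests one swizzled condition code field against a <ccMaskRule> name. */
static bool test_cc(const char *rule, CondCode f)
{
    if (strcmp(rule, "EQ") == 0) return f == CC_EQ;
    if (strcmp(rule, "NE") == 0) return f != CC_EQ;   /* UN counts as NE */
    if (strcmp(rule, "LT") == 0) return f == CC_LT;
    if (strcmp(rule, "GE") == 0) return f == CC_GT || f == CC_EQ;
    if (strcmp(rule, "LE") == 0) return f == CC_LT || f == CC_EQ;
    if (strcmp(rule, "GT") == 0) return f == CC_GT;
    if (strcmp(rule, "FL") == 0) return false;
    /* "TR" or "" (no mask): always enabled.  Any other string is assumed to
       have been rejected by the program parser. */
    return true;
}

/* Builds the run-time write mask from the condition code, a swizzle given
   as source component indices, and the mask rule. */
static void cc_write_mask(const CondCode cc[4], const int swz[4],
                          const char *rule, bool mask[4])
{
    for (int i = 0; i < 4; i++) {
        assert(swz[i] >= 0 && swz[i] < 4);
        mask[i] = test_cc(rule, cc[swz[i]]);
    }
}
```

    With cc = (GT,LT,EQ,GT), a ".zyxw" swizzle and the rule "NE", the
    resulting mask enables y, z, and w but not x, matching the example in the
    text.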

    In the VP2 execution environment, a vertex program instruction can also
    optionally update the condition code register.  The condition code is
    updated if the condition code register update suffix "C" is present in the
    instruction.  The instruction "ADDC" will update the condition code; the
    otherwise equivalent instruction "ADD" will not.  If condition code
    updates are enabled, each component of the destination register enabled
    for writes is compared to zero.  The corresponding component of the
    condition code is set to "LT", "EQ", or "GT" if the written component is
    less than, equal to, or greater than zero, respectively.  Condition code
    components are set to "UN" if the written component is NaN.  Values of
    -0.0 and +0.0 both evaluate to "EQ".  If a component of the destination
    register is not enabled for writes, the corresponding condition code
    component is also unchanged.

    In the following example code,

      # Initially R1 = (-2, 0, 2, NaN).          R0                 CC
      MOVC R0, R1;              # ( -2,   0,   2, NaN)   (LT,EQ,GT,UN)
      MOVC R0.xyz, R1.yzwx;     # (  0,   2, NaN, NaN)   (EQ,GT,UN,UN)
      MOVC R0 (NE), R1.zywx;    # (  0,   0, NaN,  -2)   (EQ,EQ,UN,LT)

    the first instruction writes (-2,0,2,NaN) to R0 and updates the condition
    code to (LT,EQ,GT,UN).  In the second instruction, only the "x", "y", and
    "z" components of R0 and the condition code are updated, so R0 ends up
    with (0,2,NaN,NaN) and the condition code ends up with (EQ,GT,UN,UN).  In
    the third instruction, the condition code mask disables writes to the x
    component (its condition code field is "EQ"), so R0 ends up with
    (0,0,NaN,-2) and the condition code ends up with (EQ,EQ,UN,LT).

    The following pseudocode illustrates the process of writing a result
    vector to the destination register.  In the pseudocode, "instrMask" refers
    to the component write mask given by the <optionalWriteMask> rule.
    In the VP1 execution environment, "ccMaskRule" is always "" and "updatecc"
    is always FALSE.  In the VP2 execution environment, "ccMaskRule" refers to
    the condition code mask rule given by <vp2-optionalCCMask>, and "updatecc"
    is TRUE if and only if condition code updates are enabled.  "result",
    "destination", and "cc" refer to the result vector, the register selected
    by <dstRegister>, and the condition code, respectively.  Condition codes
    do not exist in the VP1 execution environment.

      boolean TestCC(CondCode field) {
          switch (ccMaskRule) {
          case "EQ":  return (field == "EQ");
          case "NE":  return (field != "EQ");
          case "LT":  return (field == "LT");
          case "GE":  return (field == "GT" || field == "EQ");
          case "LE":  return (field == "LT" || field == "EQ");
          case "GT":  return (field == "GT");
          case "TR":  return TRUE;
          case "FL":  return FALSE;
          case "":    return TRUE;
          }
      }

      CondCode GenerateCC(float value) {
          if (isnan(value)) {          // "value == NaN" is never true
              return "UN";
          } else if (value < 0) {
              return "LT";
          } else if (value == 0) {     // true for both -0.0 and +0.0
              return "EQ";
          } else {
              return "GT";
          }
      }

      void UpdateDestination(floatVec destination, floatVec result)
      {
          floatVec merged;
          ccVec    mergedCC;

          // Merge the converted result into the destination register, under
          // control of the compile- and run-time write masks.
          merged = destination;
          mergedCC = cc;
          if (instrMask.x && TestCC(cc.c***)) {
              merged.x = result.x;
              if (updatecc) mergedCC.x = GenerateCC(result.x);
          }
          if (instrMask.y && TestCC(cc.*c**)) {
              merged.y = result.y;
              if (updatecc) mergedCC.y = GenerateCC(result.y);
          }
          if (instrMask.z && TestCC(cc.**c*)) {
              merged.z = result.z;
              if (updatecc) mergedCC.z = GenerateCC(result.z);
          }
          if (instrMask.w && TestCC(cc.***c)) {
              merged.w = result.w;
              if (updatecc) mergedCC.w = GenerateCC(result.w);
          }

          // Write out the new destination register and condition code.
          destination = merged;
          cc = mergedCC;
      }

    Section 2.14.2.3, Vertex Program Execution

    In the VP1 execution environment, vertex programs consist of a sequence of
    instructions without any support for branching.  Vertex programs begin by
    executing the first instruction in the program, and execute instructions
    in the order specified in the program until the last instruction is
    reached.

    VP2 vertex programs can contain one or more instruction labels, matching
    the grammar rule <vp2-instructionLabel>.  An instruction label can be
    referred to explicitly in branch (BRA) or subroutine call (CAL)
    instructions.  Instruction labels can be defined or used at any point in
    the body of a program, and can be used in instructions before being
    defined in the program string.

    VP2 vertex program branching instructions can be conditional.  The branch
    condition is specified by the <vp2-conditionMask> rule and may depend on
    the contents of the condition code register.  Branch conditions are
    evaluated by computing a condition code write mask in exactly the same
    manner as for register writes (section 2.14.2.2).
    If any of the four components of the condition code write mask is
    enabled, the branch is taken and execution continues with the instruction
    following the label specified in the instruction.  Otherwise, the
    instruction is ignored and vertex program execution continues with the
    next instruction.  In the following example code,

        MOVC CC, c[0];         # c[0]=(-2, 0, 2, NaN), CC gets (LT,EQ,GT,UN)
        BRA label1 (LT.xyzw);
        MOV R0,R1;             # not executed
      label1:
        BRA label2 (LT.wyzw);
        MOV R0,R2;             # executed
      label2:

    the first BRA instruction loads a condition code of (LT,EQ,GT,UN), while
    the second BRA instruction loads a condition code of (UN,EQ,GT,UN).  The
    first branch will be taken because the "x" component evaluates to LT; the
    second branch will not be taken because no component evaluates to LT.

    VP2 vertex programs can specify subroutine calls.  When a subroutine call
    (CAL) instruction is executed, a reference to the instruction immediately
    following the CAL instruction is pushed onto the call stack.  When a
    subroutine return (RET) instruction is executed, an instruction reference
    is popped off the call stack and program execution continues with the
    popped instruction.  A vertex program will terminate if a CAL instruction
    is executed with four entries already in the call stack or if a RET
    instruction is executed with an empty call stack.

    If a VP2 vertex program has an instruction label "main", program execution
    begins with the instruction immediately following that label.  Otherwise,
    program execution begins with the first instruction of the program.
    Instructions will be executed sequentially in the order specified in the
    program, although branch instructions will affect the instruction
    execution order, as described above.
    A vertex program will terminate after executing a RET instruction with an
    empty call stack.  A vertex program will also terminate after executing
    the last instruction in the program, unless that instruction was a taken
    branch.

    A vertex program will fail to load if an instruction refers to a label
    that is not defined in the program string.

    A vertex program will terminate abnormally if a subroutine call
    instruction produces a call stack overflow.  Additionally, a vertex
    program will terminate abnormally after executing 65536 instructions to
    prevent hangs caused by infinite loops in the program.

    When a vertex program terminates, normally or abnormally, it will emit a
    vertex whose attributes are taken from the final values of the vertex
    result registers (section 2.14.1.5).


    Section 2.14.3, Vertex Program Instruction Set

    The following sections describe the set of supported vertex program
    instructions.  Instructions available only in the VP1.1 or VP2 execution
    environment will be noted in the instruction description.

    Each section will contain pseudocode describing the instruction.
    Instructions will have up to three operands, referred to as "op0", "op1",
    and "op2".  The operands are loaded using the mechanisms specified in
    section 2.14.2.1.  Most instructions will generate a result vector called
    "result".  The result vector is then written to the destination register
    specified in the instruction using the mechanisms specified in section
    2.14.2.2.

    Operands and results are represented as 32-bit single-precision
    floating-point numbers according to the IEEE 754 floating-point
    specification.  IEEE denorm encodings, used to represent numbers smaller
    in magnitude than 2^-126, are not supported.  All such numbers are flushed
    to zero.
    There are three special encodings referred to in this section: +INF means
    "positive infinity", -INF means "negative infinity", and NaN refers to
    "not a number".

    Arithmetic operations are typically carried out in single precision
    according to the rules specified in the IEEE 754 specification.  Any
    exceptions and special cases will be noted in the instruction description.


    Section 2.14.3.1, ABS: Absolute Value

    The ABS instruction performs a component-wise absolute value operation on
    the single operand to yield a result vector.

      tmp = VectorLoad(op0);
      result.x = abs(tmp.x);
      result.y = abs(tmp.y);
      result.z = abs(tmp.z);
      result.w = abs(tmp.w);

    The following special-case rules apply to the absolute value operation:

      1. abs(NaN) = NaN.
      2. abs(-INF) = abs(+INF) = +INF.
      3. abs(-0.0) = abs(+0.0) = +0.0.

    The ABS instruction is available only in the VP1.1 and VP2 execution
    environments.

    In the VP1.0 execution environment, the same functionality can be achieved
    with "MAX result, src, -src".

    In the VP2 execution environment, the ABS instruction is effectively
    obsolete, since instructions can take the absolute value of each operand
    at no cost.


    Section 2.14.3.2, ADD: Add

    The ADD instruction performs a component-wise add of the two operands to
    yield a result vector.

      tmp0 = VectorLoad(op0);
      tmp1 = VectorLoad(op1);
      result.x = tmp0.x + tmp1.x;
      result.y = tmp0.y + tmp1.y;
      result.z = tmp0.z + tmp1.z;
      result.w = tmp0.w + tmp1.w;

    The following special-case rules apply to addition:

      1. "A+B" is always equivalent to "B+A".
      2. NaN + <x> = NaN, for all <x>.
      3. +INF + <x> = +INF, for all <x> except NaN and -INF.
      4. -INF + <x> = -INF, for all <x> except NaN and +INF.
      5. +INF + -INF = NaN.
      6. -0.0 + <x> = <x>, for all <x>.
      7. +0.0 + <x> = <x>, for all <x> except -0.0.


    Section 2.14.3.3, ARA: Address Register Add

    The ARA instruction adds two pairs of components of a vector address
    register operand to produce an integer result vector.  The "x" and "z"
    components of the result vector contain the sum of the "x" and "z"
    components of the operand; the "y" and "w" components of the result vector
    contain the sum of the "y" and "w" components of the operand.  Each
    component of the result vector is clamped to [-512, +511], the range of
    representable address register components.

      itmp = AddrVectorLoad(op0);
      iresult.x = itmp.x + itmp.z;
      iresult.y = itmp.y + itmp.w;
      iresult.z = itmp.x + itmp.z;
      iresult.w = itmp.y + itmp.w;
      if (iresult.x < -512) iresult.x = -512;
      if (iresult.x > 511)  iresult.x = 511;
      if (iresult.y < -512) iresult.y = -512;
      if (iresult.y > 511)  iresult.y = 511;
      if (iresult.z < -512) iresult.z = -512;
      if (iresult.z > 511)  iresult.z = 511;
      if (iresult.w < -512) iresult.w = -512;
      if (iresult.w > 511)  iresult.w = 511;

    Component swizzling is not supported when the operand is loaded.

    The ARA instruction is available only in the VP2 execution environment.


    Section 2.14.3.4, ARL: Address Register Load

    In the VP1 execution environment, the ARL instruction loads a single
    scalar operand and performs a floor operation to generate an integer
    scalar to be written to the address register.

      tmp = ScalarLoad(op0);
      iresult.x = floor(tmp);

    In the VP2 execution environment, the ARL instruction loads a single
    vector operand and performs a component-wise floor operation to generate
    an integer result vector.  Each component of the result vector is clamped
    to [-512, +511], the range of representable address register components.
    The ARL instruction applies all masking operations to address register
    writes as described in section 2.14.2.2.

      tmp = VectorLoad(op0);
      iresult.x = floor(tmp.x);
      iresult.y = floor(tmp.y);
      iresult.z = floor(tmp.z);
      iresult.w = floor(tmp.w);
      if (iresult.x < -512) iresult.x = -512;
      if (iresult.x > 511)  iresult.x = 511;
      if (iresult.y < -512) iresult.y = -512;
      if (iresult.y > 511)  iresult.y = 511;
      if (iresult.z < -512) iresult.z = -512;
      if (iresult.z > 511)  iresult.z = 511;
      if (iresult.w < -512) iresult.w = -512;
      if (iresult.w > 511)  iresult.w = 511;

    The following special-case rules apply to floor computation:

      1. floor(NaN) = NaN.
      2. floor(<x>) = <x>, for -0.0, +0.0, -INF, and +INF.  In all cases, the
         sign of the result is equal to the sign of the operand.


    Section 2.14.3.5, ARR: Address Register Load (with round)

    The ARR instruction loads a single vector operand and performs a
    component-wise round operation to generate an integer result vector.  Each
    component of the result vector is clamped to [-512, +511], the range of
    representable address register components.  The ARR instruction applies
    all masking operations to address register writes as described in section
    2.14.2.2.

      tmp = VectorLoad(op0);
      iresult.x = round(tmp.x);
      iresult.y = round(tmp.y);
      iresult.z = round(tmp.z);
      iresult.w = round(tmp.w);
      if (iresult.x < -512) iresult.x = -512;
      if (iresult.x > 511)  iresult.x = 511;
      if (iresult.y < -512) iresult.y = -512;
      if (iresult.y > 511)  iresult.y = 511;
      if (iresult.z < -512) iresult.z = -512;
      if (iresult.z > 511)  iresult.z = 511;
      if (iresult.w < -512) iresult.w = -512;
      if (iresult.w > 511)  iresult.w = 511;

    The rounding function, round(x), returns the nearest integer to <x>.
    If the fractional portion of <x> is 0.5, round(x) selects the nearest even
    integer.

    The ARR instruction is available only in the VP2 execution environment.


    Section 2.14.3.6, BRA: Branch

    The BRA instruction conditionally transfers control to the instruction
    following the label specified in the instruction.  The following
    pseudocode describes the operation of the instruction:

      if (TestCC(cc.c***) || TestCC(cc.*c**) ||
          TestCC(cc.**c*) || TestCC(cc.***c)) {
          // continue execution at instruction following <branchLabel>
      } else {
          // do nothing
      }

    In the pseudocode, <branchLabel> is the label specified in the instruction
    matching the <vp2-branchLabel> grammar rule.

    The BRA instruction is available only in the VP2 execution environment.


    Section 2.14.3.7, CAL: Subroutine Call

    The CAL instruction conditionally transfers control to the instruction
    following the label specified in the instruction.  It also pushes a
    reference to the instruction immediately following the CAL instruction
    onto the call stack, where execution will continue after executing the
    matching RET instruction.
    The following pseudocode describes the operation of the instruction:

      if (TestCC(cc.c***) || TestCC(cc.*c**) ||
          TestCC(cc.**c*) || TestCC(cc.***c)) {
          if (callStackDepth >= 4) {
              // terminate vertex program
          } else {
              callStack[callStackDepth] = nextInstruction;
              callStackDepth++;
              // continue execution at instruction following <branchLabel>
          }
      } else {
          // do nothing
      }

    In the pseudocode, <branchLabel> is the label specified in the instruction
    matching the <vp2-branchLabel> grammar rule, <callStackDepth> is the
    current depth of the call stack, <callStack> is an array holding the call
    stack, and <nextInstruction> is a reference to the instruction immediately
    following the present one in the program string.

    The CAL instruction is available only in the VP2 execution environment.


    Section 2.14.3.8, COS: Cosine

    The COS instruction approximates the cosine of the angle specified by the
    scalar operand and replicates the approximation to all four components of
    the result vector.  The angle is specified in radians and does not have to
    be in the range [0,2*PI].

      tmp = ScalarLoad(op0);
      result.x = ApproxCosine(tmp);
      result.y = ApproxCosine(tmp);
      result.z = ApproxCosine(tmp);
      result.w = ApproxCosine(tmp);

    The approximation function ApproxCosine is accurate to at least 22 bits
    for an angle in the range [0,2*PI]:

      | ApproxCosine(x) - cos(x) | < 1.0 / 2^22, if 0.0 <= x < 2.0 * PI.

    The error in the approximation will typically increase with the absolute
    value of the angle when the angle falls outside the range [0,2*PI].

    The following special-case rules apply to cosine approximation:

      1. ApproxCosine(NaN) = NaN.
      2. ApproxCosine(+/-INF) = NaN.
      3. ApproxCosine(+/-0.0) = +1.0.

    The COS instruction is available only in the VP2 execution environment.


    Section 2.14.3.9, DP3: 3-component Dot Product

    The DP3 instruction computes a three-component dot product of the two
    operands (using the x, y, and z components) and replicates the dot product
    to all four components of the result vector.

      tmp0 = VectorLoad(op0);
      tmp1 = VectorLoad(op1);
      result.x = (tmp0.x * tmp1.x) + (tmp0.y * tmp1.y) +
                 (tmp0.z * tmp1.z);
      result.y = (tmp0.x * tmp1.x) + (tmp0.y * tmp1.y) +
                 (tmp0.z * tmp1.z);
      result.z = (tmp0.x * tmp1.x) + (tmp0.y * tmp1.y) +
                 (tmp0.z * tmp1.z);
      result.w = (tmp0.x * tmp1.x) + (tmp0.y * tmp1.y) +
                 (tmp0.z * tmp1.z);


    Section 2.14.3.10, DP4: 4-component Dot Product

    The DP4 instruction computes a four-component dot product of the two
    operands and replicates the dot product to all four components of the
    result vector.

      tmp0 = VectorLoad(op0);
      tmp1 = VectorLoad(op1);
      result.x = (tmp0.x * tmp1.x) + (tmp0.y * tmp1.y) +
                 (tmp0.z * tmp1.z) + (tmp0.w * tmp1.w);
      result.y = (tmp0.x * tmp1.x) + (tmp0.y * tmp1.y) +
                 (tmp0.z * tmp1.z) + (tmp0.w * tmp1.w);
      result.z = (tmp0.x * tmp1.x) + (tmp0.y * tmp1.y) +
                 (tmp0.z * tmp1.z) + (tmp0.w * tmp1.w);
      result.w = (tmp0.x * tmp1.x) + (tmp0.y * tmp1.y) +
                 (tmp0.z * tmp1.z) + (tmp0.w * tmp1.w);


    Section 2.14.3.11, DPH: Homogeneous Dot Product

    The DPH instruction computes a four-component dot product of the two
    operands, except that the w component of the first operand is assumed to
    be 1.0.  The instruction replicates the dot product to all four components
    of the result vector.

      tmp0 = VectorLoad(op0);
      tmp1 = VectorLoad(op1);
      result.x = (tmp0.x * tmp1.x) + (tmp0.y * tmp1.y) +
                 (tmp0.z * tmp1.z) + tmp1.w;
      result.y = (tmp0.x * tmp1.x) + (tmp0.y * tmp1.y) +
                 (tmp0.z * tmp1.z) + tmp1.w;
      result.z = (tmp0.x * tmp1.x) + (tmp0.y * tmp1.y) +
                 (tmp0.z * tmp1.z) + tmp1.w;
      result.w = (tmp0.x * tmp1.x) + (tmp0.y * tmp1.y) +
                 (tmp0.z * tmp1.z) + tmp1.w;

    The DPH instruction is available only in the VP1.1 and VP2 execution
    environments.


    Section 2.14.3.12, DST: Distance Vector

    The DST instruction computes a distance vector from two
    specially-formatted operands.  The first operand should be of the form
    [NA, d^2, d^2, NA] and the second operand should be of the form
    [NA, 1/d, NA, 1/d], where NA values are not relevant to the calculation
    and d is a vector length.  If both vectors satisfy these conditions, the
    result vector will be of the form [1.0, d, d^2, 1/d].

    The exact behavior is specified in the following pseudo-code:

      tmp0 = VectorLoad(op0);
      tmp1 = VectorLoad(op1);
      result.x = 1.0;
      result.y = tmp0.y * tmp1.y;
      result.z = tmp0.z;
      result.w = tmp1.w;

    Given an arbitrary vector, d^2 can be obtained using the DP3 instruction
    (using the same vector for both operands), and 1/d can be obtained from
    d^2 using the RSQ instruction.

    This distance vector is useful for per-vertex light attenuation
    calculations: a DP3 operation using the distance vector and a vector of
    attenuation constants (k0, k1, k2) as operands yields k0 + k1*d + k2*d^2,
    the denominator of the attenuation factor.  Its reciprocal (for example,
    computed with RCP) is the attenuation factor itself.


    Section 2.14.3.13, EX2: Exponential Base 2

    The EX2 instruction approximates 2 raised to the power of the scalar
    operand and replicates it to all four components of the result vector.

      tmp = ScalarLoad(op0);
      result.x = Approx2ToX(tmp);
      result.y = Approx2ToX(tmp);
      result.z = Approx2ToX(tmp);
      result.w = Approx2ToX(tmp);

    The approximation function is accurate to at least 22 bits:

      | Approx2ToX(x) - 2^x | < 1.0 / 2^22, if 0.0 <= x < 1.0,

    and, in general,

      | Approx2ToX(x) - 2^x | < (1.0 / 2^22) * (2^floor(x)).

    The following special-case rules apply to exponential approximation:

      1. Approx2ToX(NaN) = NaN.
      2. Approx2ToX(-INF) = +0.0.
      3. Approx2ToX(+INF) = +INF.
      4. Approx2ToX(+/-0.0) = +1.0.

    The EX2 instruction is available only in the VP2 execution environment.


    Section 2.14.3.14, EXP: Exponential Base 2 (approximate)

    The EXP instruction computes a rough approximation of 2 raised to the
    power of the scalar operand.  The approximation is returned in the "z"
    component of the result vector.  A vertex program can also use the "x" and
    "y" components of the result vector to generate a more accurate
    approximation by evaluating

      result.x * f(result.y),

    where f(x) is a user-defined function that approximates 2^x over the
    domain [0.0, 1.0).  The "w" component of the result vector is always 1.0.

    The exact behavior is specified in the following pseudo-code:

      tmp = ScalarLoad(op0);
      result.x = 2^floor(tmp);
      result.y = tmp - floor(tmp);
      result.z = RoughApprox2ToX(tmp);
      result.w = 1.0;

    The approximation function is accurate to at least 11 bits:

      | RoughApprox2ToX(x) - 2^x | < 1.0 / 2^11, if 0.0 <= x < 1.0,

    and, in general,

      | RoughApprox2ToX(x) - 2^x | < (1.0 / 2^11) * (2^floor(x)).

    The following special cases apply to the EXP instruction:

      1. RoughApprox2ToX(NaN) = NaN.
      2. RoughApprox2ToX(-INF) = +0.0.
      3. RoughApprox2ToX(+INF) = +INF.
      4. RoughApprox2ToX(+/-0.0) = +1.0.

    The EXP instruction is present for compatibility with the original
    NV_vertex_program instruction set; it is recommended that applications
    using NV_vertex_program2 use the EX2 instruction instead.


    Section 2.14.3.15, FLR: Floor

    The FLR instruction performs a component-wise floor operation on the
    operand to generate a result vector.  The floor of a value is defined as
    the largest integer less than or equal to the value.  The floor of 2.3 is
    2.0; the floor of -3.6 is -4.0.

      tmp = VectorLoad(op0);
      result.x = floor(tmp.x);
      result.y = floor(tmp.y);
      result.z = floor(tmp.z);
      result.w = floor(tmp.w);

    The following special-case rules apply to floor computation:

      1. floor(NaN) = NaN.
      2. floor(<x>) = <x>, for -0.0, +0.0, -INF, and +INF.  In all cases, the
         sign of the result is equal to the sign of the operand.

    The FLR instruction is available only in the VP2 execution environment.


    Section 2.14.3.16, FRC: Fraction

    The FRC instruction extracts the fractional portion of each component of
    the operand to generate a result vector.  The fractional portion of a
    component is defined as the result after subtracting off the floor of the
    component (see FLR), and is always in the range [0.00, 1.00).

    For negative values, the fractional portion is NOT the number written to
    the right of the decimal point.  The fractional portion of -1.7 is not 0.7
    but 0.3, which is produced by subtracting the floor of -1.7 (-2.0) from
    -1.7.

      tmp = VectorLoad(op0);
      result.x = tmp.x - floor(tmp.x);
      result.y = tmp.y - floor(tmp.y);
      result.z = tmp.z - floor(tmp.z);
      result.w = tmp.w - floor(tmp.w);

    The following special-case rules, which can be derived from the rules for
    FLR and ADD, apply to fraction computation:

      1. fraction(NaN) = NaN.
      2. fraction(+/-INF) = NaN.
      3. fraction(+/-0.0) = +0.0.

    The FRC instruction is available only in the VP2 execution environment.


    Section 2.14.3.17, LG2: Logarithm Base 2

    The LG2 instruction approximates the base 2 logarithm of the scalar
    operand and replicates it to all four components of the result vector.

      tmp = ScalarLoad(op0);
      result.x = ApproxLog2(tmp);
      result.y = ApproxLog2(tmp);
      result.z = ApproxLog2(tmp);
      result.w = ApproxLog2(tmp);

    The approximation function is accurate to at least 22 bits:

      | ApproxLog2(x) - log_2(x) | < 1.0 / 2^22.

    Note that for large values of x, the floating-point storage format does
    not have enough bits to represent the result that precisely.

    The following special-case rules apply to logarithm approximation:

      1. ApproxLog2(NaN) = NaN.
      2. ApproxLog2(+INF) = +INF.
      3. ApproxLog2(+/-0.0) = -INF.
      4. ApproxLog2(x) = NaN, if -INF < x < -0.0.
      5. ApproxLog2(-INF) = NaN.

    The LG2 instruction is available only in the VP2 execution environment.


    Section 2.14.3.18, LIT: Compute Light Coefficients

    The LIT instruction accelerates per-vertex lighting by computing lighting
    coefficients for ambient, diffuse, and specular light contributions.  The
    "x" component of the operand is assumed to hold a diffuse dot product
    (n dot VP_pli, as in the vertex lighting equations in Section 2.13.1).
    The "y" component of the operand is assumed to hold a specular dot
    product (n dot h_i).  The "w" component of the operand is assumed to hold
    the specular exponent of the material (s_rm), and is clamped to the range
    (-128, +128) exclusive.

    The "x" component of the result vector receives the value that should be
    multiplied by the ambient light/material product (always 1.0).
    The "y" component of the result vector receives the value that should be
    multiplied by the diffuse light/material product (n dot VP_pli).  The "z"
    component of the result vector receives the value that should be
    multiplied by the specular light/material product
    (f_i * (n dot h_i) ^ s_rm).  The "w" component of the result is the
    constant 1.0.

    Negative diffuse and specular dot products are clamped to 0.0, as is done
    in the standard per-vertex lighting operations.  In addition, if the
    diffuse dot product is zero or negative, the specular coefficient is
    forced to zero.

      t = VectorLoad(op0);
      if (t.x < 0) t.x = 0;
      if (t.y < 0) t.y = 0;
      if (t.w < -(128.0-epsilon)) t.w = -(128.0-epsilon);
      else if (t.w > 128.0-epsilon) t.w = 128.0-epsilon;
      result.x = 1.0;
      result.y = t.x;
      result.z = (t.x > 0) ? RoughApproxPower(t.y, t.w) : 0.0;
      result.w = 1.0;

    The exponentiation approximation function is defined in terms of the base
    2 exponentiation and logarithm approximation operations in the EXP and LOG
    instructions, including errors and the processing of any special cases.
    In particular,

      RoughApproxPower(a,b) = RoughApprox2ToX(b * RoughApproxLog2(a)).

    The following special-case rules, which can be derived from the rules in
    the LOG, MUL, and EXP instructions, apply to exponentiation:

       1. RoughApproxPower(NaN, <x>) = NaN,
       2. RoughApproxPower(<x>, <y>) = NaN, if x < -0.0,
       3. RoughApproxPower(+/-0.0, <x>) = +0.0, if x > +0.0, or
                                          +INF, if x < -0.0,
       4. RoughApproxPower(+1.0, <x>) = +1.0, if x is not NaN,
       5. RoughApproxPower(+INF, <x>) = +INF, if x > +0.0, or
                                        +0.0, if x < -0.0,
       6. RoughApproxPower(<x>, +/-0.0) = +1.0, if x >= -0.0,
       7. RoughApproxPower(<x>, +INF) = +0.0, if -0.0 <= x < +1.0,
                                        +INF, if x > +1.0,
       8. RoughApproxPower(<x>, -INF) = +INF, if -0.0 <= x < +1.0,
                                        +0.0, if x > +1.0,
       9. RoughApproxPower(<x>, +1.0) = <x>, if x >= +0.0, and
      10. RoughApproxPower(<x>, NaN) = NaN.


    Section 2.14.3.19, LOG: Logarithm Base 2 (Approximate)

    The LOG instruction computes a rough approximation of the base 2 logarithm
    of the absolute value of the scalar operand.  The approximation is
    returned in the "z" component of the result vector.  A vertex program can
    also use the "x" and "y" components of the result vector to generate a
    more accurate approximation by evaluating

      result.x + f(result.y),

    where f(x) is a user-defined function that approximates log_2(x) over the
    domain [1.0, 2.0).  The "w" component of the result vector is always 1.0.

    The exact behavior is specified in the following pseudo-code:

      tmp = fabs(ScalarLoad(op0));
      result.x = floor(log2(tmp));
      result.y = tmp / (2^floor(log2(tmp)));
      result.z = RoughApproxLog2(tmp);
      result.w = 1.0;

    The approximation function is accurate to at least 11 bits:

      | RoughApproxLog2(x) - log_2(x) | < 1.0 / 2^11.

    The following special-case rules apply to the LOG instruction:

      1. RoughApproxLog2(NaN) = NaN.
      2. RoughApproxLog2(+INF) = +INF.
      3. RoughApproxLog2(+0.0) = -INF.

    The LOG instruction is present for compatibility with the original
    NV_vertex_program instruction set; it is recommended that applications
    using NV_vertex_program2 use the LG2 instruction instead.


    Section 2.14.3.20, MAD: Multiply And Add

    The MAD instruction performs a component-wise multiply of the first two
    operands, and then does a component-wise add of the product to the third
    operand to yield a result vector.

      tmp0 = VectorLoad(op0);
      tmp1 = VectorLoad(op1);
      tmp2 = VectorLoad(op2);
      result.x = tmp0.x * tmp1.x + tmp2.x;
      result.y = tmp0.y * tmp1.y + tmp2.y;
      result.z = tmp0.z * tmp1.z + tmp2.z;
      result.w = tmp0.w * tmp1.w + tmp2.w;

    All special-case rules applicable to the ADD and MUL instructions apply to
    the individual components of the MAD operation as well.


    Section 2.14.3.21, MAX: Maximum

    The MAX instruction computes component-wise maximums of the values in the
    two operands to yield a result vector.

      tmp0 = VectorLoad(op0);
      tmp1 = VectorLoad(op1);
      result.x = max(tmp0.x, tmp1.x);
      result.y = max(tmp0.y, tmp1.y);
      result.z = max(tmp0.z, tmp1.z);
      result.w = max(tmp0.w, tmp1.w);

    The following special cases apply to the maximum operation:

      1. max(A,B) is always equivalent to max(B,A).
      2. max(NaN, <x>) == NaN, for all <x>.


    Section 2.14.3.22, MIN: Minimum

    The MIN instruction computes component-wise minimums of the values in the
    two operands to yield a result vector.

      tmp0 = VectorLoad(op0);
      tmp1 = VectorLoad(op1);
      result.x = min(tmp0.x, tmp1.x);
      result.y = min(tmp0.y, tmp1.y);
      result.z = min(tmp0.z, tmp1.z);
      result.w = min(tmp0.w, tmp1.w);

    The following special cases apply to the minimum operation:

      1. min(A,B) is always equivalent to min(B,A).
      2. min(NaN, <x>) == NaN, for all <x>.


    Section 2.14.3.23, MOV: Move

    The MOV instruction copies the value of the operand to yield a result
    vector.

      result = VectorLoad(op0);


    Section 2.14.3.24, MUL: Multiply

    The MUL instruction performs a component-wise multiply of the two operands
    to yield a result vector.

      tmp0 = VectorLoad(op0);
      tmp1 = VectorLoad(op1);
      result.x = tmp0.x * tmp1.x;
      result.y = tmp0.y * tmp1.y;
      result.z = tmp0.z * tmp1.z;
      result.w = tmp0.w * tmp1.w;

    The following special-case rules apply to multiplication:

      1. "A*B" is always equivalent to "B*A".
      2. NaN * <x> = NaN, for all <x>.
      3. +/-0.0 * +/-INF = NaN.
      4. +/-0.0 * <x> = +/-0.0, for all <x> except -INF, +INF, and NaN. The
         sign of the result is positive if the signs of the two operands match
         and negative otherwise.
      5. +/-INF * <x> = +/-INF, for all <x> except -0.0, +0.0, and NaN. The
         sign of the result is positive if the signs of the two operands match
         and negative otherwise.
      6. +1.0 * <x> = <x>, for all <x>.


    Section 2.14.3.25, RCC: Reciprocal (Clamped)

    The RCC instruction approximates the reciprocal of the scalar operand,
    clamps the result to one of two ranges, and replicates the clamped result
    to all four components of the result vector.

    If the approximate reciprocal is greater than 0.0, the result is clamped
    to the range [2^-64, 2^+64]. If the approximate reciprocal is not greater
    than zero, the result is clamped to the range [-2^+64, -2^-64].

      tmp = ScalarLoad(op0);
      result.x = ClampApproxReciprocal(tmp);
      result.y = ClampApproxReciprocal(tmp);
      result.z = ClampApproxReciprocal(tmp);
      result.w = ClampApproxReciprocal(tmp);

    The approximation function is accurate to at least 22 bits:

      | ClampApproxReciprocal(x) - (1/x) | < 1.0 / 2^22, if 1.0 <= x < 2.0.

    The following special-case rules apply to reciprocation:

      1. ClampApproxReciprocal(NaN) = NaN.
      2. ClampApproxReciprocal(+INF) = +2^-64.
      3. ClampApproxReciprocal(-INF) = -2^-64.
      4. ClampApproxReciprocal(+0.0) = +2^64.
      5. ClampApproxReciprocal(-0.0) = -2^64.
      6. ClampApproxReciprocal(x) = +2^-64, if +2^64 < x < +INF.
      7. ClampApproxReciprocal(x) = -2^-64, if -INF < x < -2^64.
      8. ClampApproxReciprocal(x) = +2^64, if +0.0 < x < +2^-64.
      9. ClampApproxReciprocal(x) = -2^64, if -2^-64 < x < -0.0.

    The RCC instruction is available only in the VP1.1 and VP2 execution
    environments.


    Section 2.14.3.26, RCP: Reciprocal

    The RCP instruction approximates the reciprocal of the scalar operand and
    replicates it to all four components of the result vector.

      tmp = ScalarLoad(op0);
      result.x = ApproxReciprocal(tmp);
      result.y = ApproxReciprocal(tmp);
      result.z = ApproxReciprocal(tmp);
      result.w = ApproxReciprocal(tmp);

    The approximation function is accurate to at least 22 bits:

      | ApproxReciprocal(x) - (1/x) | < 1.0 / 2^22, if 1.0 <= x < 2.0.

    The following special-case rules apply to reciprocation:

      1. ApproxReciprocal(NaN) = NaN.
      2. ApproxReciprocal(+INF) = +0.0.
      3. ApproxReciprocal(-INF) = -0.0.
      4. ApproxReciprocal(+0.0) = +INF.
      5. ApproxReciprocal(-0.0) = -INF.


    Section 2.14.3.27, RET: Subroutine Call Return

    The RET instruction conditionally returns from a subroutine initiated by a
    CAL instruction by popping an instruction reference off the top of the
    call stack and transferring control to the referenced instruction.
    The following pseudocode describes the operation of the instruction:

      if (TestCC(cc.c***) || TestCC(cc.*c**) ||
          TestCC(cc.**c*) || TestCC(cc.***c)) {
        if (callStackDepth <= 0) {
          // terminate vertex program
        } else {
          callStackDepth--;
          instruction = callStack[callStackDepth];
        }

        // continue execution at <instruction>
      } else {
        // do nothing
      }

    In the pseudocode, <callStackDepth> is the depth of the call stack,
    <callStack> is an array holding the call stack, and <instruction> is a
    reference to an instruction previously pushed onto the call stack.

    The RET instruction is available only in the VP2 execution environment.


    Section 2.14.3.28, RSQ: Reciprocal Square Root

    The RSQ instruction approximates the reciprocal of the square root of the
    scalar operand and replicates it to all four components of the result
    vector.

      tmp = ScalarLoad(op0);
      result.x = ApproxRSQRT(tmp);
      result.y = ApproxRSQRT(tmp);
      result.z = ApproxRSQRT(tmp);
      result.w = ApproxRSQRT(tmp);

    The approximation function is accurate to at least 22 bits:

      | ApproxRSQRT(x) - (1/sqrt(x)) | < 1.0 / 2^22, if 1.0 <= x < 4.0.

    The following special-case rules apply to reciprocal square roots:

      1. ApproxRSQRT(NaN) = NaN.
      2. ApproxRSQRT(+INF) = +0.0.
      3. ApproxRSQRT(-INF) = NaN.
      4. ApproxRSQRT(+0.0) = +INF.
      5. ApproxRSQRT(-0.0) = -INF.
      6. ApproxRSQRT(x) = NaN, if -INF < x < -0.0.


    Section 2.14.3.29, SEQ: Set on Equal

    The SEQ instruction performs a component-wise comparison of the two
    operands. Each component of the result vector is 1.0 if the corresponding
    component of the first operand is equal to that of the second, and 0.0
    otherwise.

      tmp0 = VectorLoad(op0);
      tmp1 = VectorLoad(op1);
      result.x = (tmp0.x == tmp1.x) ? 1.0 : 0.0;
      result.y = (tmp0.y == tmp1.y) ? 1.0 : 0.0;
      result.z = (tmp0.z == tmp1.z) ? 1.0 : 0.0;
      result.w = (tmp0.w == tmp1.w) ? 1.0 : 0.0;
      if (tmp0.x is NaN or tmp1.x is NaN) result.x = NaN;
      if (tmp0.y is NaN or tmp1.y is NaN) result.y = NaN;
      if (tmp0.z is NaN or tmp1.z is NaN) result.z = NaN;
      if (tmp0.w is NaN or tmp1.w is NaN) result.w = NaN;

    The following special-case rules apply to SEQ:

      1. (<x> == <y>) and (<y> == <x>) always produce the same result.
      2. (NaN == <x>) is FALSE for all <x>, including NaN.
      3. (+INF == +INF) and (-INF == -INF) are TRUE.
      4. (-0.0 == +0.0) and (+0.0 == -0.0) are TRUE.

    The SEQ instruction is available only in the VP2 execution environment.


    Section 2.14.3.30, SFL: Set on False

    The SFL instruction is a degenerate case of the other "Set on"
    instructions that sets all components of the result vector to
    0.0.

      result.x = 0.0;
      result.y = 0.0;
      result.z = 0.0;
      result.w = 0.0;

    The SFL instruction is available only in the VP2 execution environment.


    Section 2.14.3.31, SGE: Set on Greater Than or Equal

    The SGE instruction performs a component-wise comparison of the two
    operands. Each component of the result vector is 1.0 if the corresponding
    component of the first operand is greater than or equal to that of the
    second, and 0.0 otherwise.

      tmp0 = VectorLoad(op0);
      tmp1 = VectorLoad(op1);
      result.x = (tmp0.x >= tmp1.x) ? 1.0 : 0.0;
      result.y = (tmp0.y >= tmp1.y) ? 1.0 : 0.0;
      result.z = (tmp0.z >= tmp1.z) ? 1.0 : 0.0;
      result.w = (tmp0.w >= tmp1.w) ? 1.0 : 0.0;
      if (tmp0.x is NaN or tmp1.x is NaN) result.x = NaN;
      if (tmp0.y is NaN or tmp1.y is NaN) result.y = NaN;
      if (tmp0.z is NaN or tmp1.z is NaN) result.z = NaN;
      if (tmp0.w is NaN or tmp1.w is NaN) result.w = NaN;

    The following special-case rules apply to SGE:

      1. (NaN >= <x>) and (<x> >= NaN) are FALSE for all <x>.
      2. (+INF >= +INF) and (-INF >= -INF) are TRUE.
      3. (-0.0 >= +0.0) and (+0.0 >= -0.0) are TRUE.


    Section 2.14.3.32, SGT: Set on Greater Than

    The SGT instruction performs a component-wise comparison of the two
    operands. Each component of the result vector is 1.0 if the corresponding
    component of the first operand is greater than that of the second, and
    0.0 otherwise.

      tmp0 = VectorLoad(op0);
      tmp1 = VectorLoad(op1);
      result.x = (tmp0.x > tmp1.x) ? 1.0 : 0.0;
      result.y = (tmp0.y > tmp1.y) ? 1.0 : 0.0;
      result.z = (tmp0.z > tmp1.z) ? 1.0 : 0.0;
      result.w = (tmp0.w > tmp1.w) ? 1.0 : 0.0;
      if (tmp0.x is NaN or tmp1.x is NaN) result.x = NaN;
      if (tmp0.y is NaN or tmp1.y is NaN) result.y = NaN;
      if (tmp0.z is NaN or tmp1.z is NaN) result.z = NaN;
      if (tmp0.w is NaN or tmp1.w is NaN) result.w = NaN;

    The following special-case rules apply to SGT:

      1. (NaN > <x>) and (<x> > NaN) are FALSE for all <x>.
      2. (-0.0 > +0.0) and (+0.0 > -0.0) are FALSE.

    The SGT instruction is available only in the VP2 execution environment.


    Section 2.14.3.33, SIN: Sine

    The SIN instruction approximates the sine of the angle specified by the
    scalar operand and replicates it to all four components of the result
    vector. The angle is specified in radians and does not have to be in the
    range [0,2*PI].

      tmp = ScalarLoad(op0);
      result.x = ApproxSine(tmp);
      result.y = ApproxSine(tmp);
      result.z = ApproxSine(tmp);
      result.w = ApproxSine(tmp);

    The approximation function is accurate to at least 22 bits with an angle
    in the range [0,2*PI].

      | ApproxSine(x) - sin(x) | < 1.0 / 2^22, if 0.0 <= x < 2.0 * PI.

    The error in the approximation will typically increase with the absolute
    value of the angle when the angle falls outside the range [0,2*PI].

    The following special-case rules apply to sine approximation:

      1. ApproxSine(NaN) = NaN.
      2. ApproxSine(+/-INF) = NaN.
      3. ApproxSine(+/-0.0) = +/-0.0. The sign of the result is equal to the
         sign of the single operand.

    The SIN instruction is available only in the VP2 execution environment.


    Section 2.14.3.34, SLE: Set on Less Than or Equal

    The SLE instruction performs a component-wise comparison of the two
    operands. Each component of the result vector is 1.0 if the corresponding
    component of the first operand is less than or equal to that of the
    second, and 0.0 otherwise.

      tmp0 = VectorLoad(op0);
      tmp1 = VectorLoad(op1);
      result.x = (tmp0.x <= tmp1.x) ? 1.0 : 0.0;
      result.y = (tmp0.y <= tmp1.y) ? 1.0 : 0.0;
      result.z = (tmp0.z <= tmp1.z) ? 1.0 : 0.0;
      result.w = (tmp0.w <= tmp1.w) ? 1.0 : 0.0;
      if (tmp0.x is NaN or tmp1.x is NaN) result.x = NaN;
      if (tmp0.y is NaN or tmp1.y is NaN) result.y = NaN;
      if (tmp0.z is NaN or tmp1.z is NaN) result.z = NaN;
      if (tmp0.w is NaN or tmp1.w is NaN) result.w = NaN;

    The following special-case rules apply to SLE:

      1. (NaN <= <x>) and (<x> <= NaN) are FALSE for all <x>.
      2. (+INF <= +INF) and (-INF <= -INF) are TRUE.
      3. (-0.0 <= +0.0) and (+0.0 <= -0.0) are TRUE.

    The SLE instruction is available only in the VP2 execution environment.
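
    As an informal illustration only, the per-component behavior shared by
    SEQ, SGE, SGT, and SLE (and likewise SLT and SNE, defined in the
    following sections) can be modeled on the host in C. The names vec4,
    CmpOp, setOn, and setOnVec are hypothetical and not part of this
    extension. The sketch assumes IEEE 754 single-precision arithmetic and
    follows the pseudocode listings, including the final NaN override, which
    takes precedence over the comparison result. SFL and STR are constant
    0.0 and 1.0 and are omitted.

```c
#include <math.h>

/* Hypothetical host-side model of a 4-component register. */
typedef struct { float x, y, z, w; } vec4;

/* Comparison performed by each "set on" instruction. */
typedef enum { CMP_EQ, CMP_NE, CMP_LT, CMP_LE, CMP_GT, CMP_GE } CmpOp;

/* One component: 1.0 if the comparison holds, 0.0 otherwise, and NaN if
   either input is NaN (the override at the end of each listing). */
static float setOn(CmpOp op, float a, float b)
{
    int holds = 0;

    if (isnan(a) || isnan(b))
        return NAN;

    /* C's IEEE comparisons already treat -0.0 and +0.0 as equal and
       +INF == +INF as true, matching the special-case rules. */
    switch (op) {
    case CMP_EQ: holds = (a == b); break;
    case CMP_NE: holds = (a != b); break;
    case CMP_LT: holds = (a <  b); break;
    case CMP_LE: holds = (a <= b); break;
    case CMP_GT: holds = (a >  b); break;
    case CMP_GE: holds = (a >= b); break;
    }
    return holds ? 1.0f : 0.0f;
}

/* Component-wise application, as in result.x through result.w. */
static vec4 setOnVec(CmpOp op, vec4 a, vec4 b)
{
    vec4 r;
    r.x = setOn(op, a.x, b.x);
    r.y = setOn(op, a.y, b.y);
    r.z = setOn(op, a.z, b.z);
    r.w = setOn(op, a.w, b.w);
    return r;
}
```

    For example, comparing (1.0, -0.0, NaN, 3.0) against (1.0, +0.0, 2.0,
    2.0) with CMP_EQ yields (1.0, 1.0, NaN, 0.0), since signed zeros compare
    equal and the NaN component propagates.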


    Section 2.14.3.35, SLT: Set on Less Than

    The SLT instruction performs a component-wise comparison of the two
    operands. Each component of the result vector is 1.0 if the corresponding
    component of the first operand is less than that of the second, and 0.0
    otherwise.

      tmp0 = VectorLoad(op0);
      tmp1 = VectorLoad(op1);
      result.x = (tmp0.x < tmp1.x) ? 1.0 : 0.0;
      result.y = (tmp0.y < tmp1.y) ? 1.0 : 0.0;
      result.z = (tmp0.z < tmp1.z) ? 1.0 : 0.0;
      result.w = (tmp0.w < tmp1.w) ? 1.0 : 0.0;
      if (tmp0.x is NaN or tmp1.x is NaN) result.x = NaN;
      if (tmp0.y is NaN or tmp1.y is NaN) result.y = NaN;
      if (tmp0.z is NaN or tmp1.z is NaN) result.z = NaN;
      if (tmp0.w is NaN or tmp1.w is NaN) result.w = NaN;

    The following special-case rules apply to SLT:

      1. (NaN < <x>) and (<x> < NaN) are FALSE for all <x>.
      2. (-0.0 < +0.0) and (+0.0 < -0.0) are FALSE.


    Section 2.14.3.36, SNE: Set on Not Equal

    The SNE instruction performs a component-wise comparison of the two
    operands. Each component of the result vector is 1.0 if the corresponding
    component of the first operand is not equal to that of the second, and 0.0
    otherwise.

      tmp0 = VectorLoad(op0);
      tmp1 = VectorLoad(op1);
      result.x = (tmp0.x != tmp1.x) ? 1.0 : 0.0;
      result.y = (tmp0.y != tmp1.y) ? 1.0 : 0.0;
      result.z = (tmp0.z != tmp1.z) ? 1.0 : 0.0;
      result.w = (tmp0.w != tmp1.w) ? 1.0 : 0.0;
      if (tmp0.x is NaN or tmp1.x is NaN) result.x = NaN;
      if (tmp0.y is NaN or tmp1.y is NaN) result.y = NaN;
      if (tmp0.z is NaN or tmp1.z is NaN) result.z = NaN;
      if (tmp0.w is NaN or tmp1.w is NaN) result.w = NaN;

    The following special-case rules apply to SNE:

      1. (<x> != <y>) and (<y> != <x>) always produce the same result.
      2. (NaN != <x>) is TRUE for all <x>, including NaN.
      3. (+INF != +INF) and (-INF != -INF) are FALSE.
      4. (-0.0 != +0.0) and (+0.0 != -0.0) are TRUE.

    The SNE instruction is available only in the VP2 execution environment.


    Section 2.14.3.37, SSG: Set Sign

    The SSG instruction generates a result vector containing the signs of each
    component of the single operand. Each component of the result vector is
    1.0 if the corresponding component of the operand is greater than zero,
    0.0 if the corresponding component of the operand is equal to zero, and
    -1.0 if the corresponding component of the operand is less than zero.

      tmp = VectorLoad(op0);
      result.x = SetSign(tmp.x);
      result.y = SetSign(tmp.y);
      result.z = SetSign(tmp.z);
      result.w = SetSign(tmp.w);

    The following special-case rules apply to SSG:

      1. SetSign(NaN) = NaN.
      2. SetSign(-0.0) = SetSign(+0.0) = 0.0.
      3. SetSign(-INF) = -1.0.
      4. SetSign(+INF) = +1.0.
      5. SetSign(x) = -1.0, if -INF < x < -0.0.
      6. SetSign(x) = +1.0, if +0.0 < x < +INF.

    The SSG instruction is available only in the VP2 execution environment.


    Section 2.14.3.38, STR: Set on True

    The STR instruction is a degenerate case of the other "Set on"
    instructions that sets all components of the result vector to 1.0.

      result.x = 1.0;
      result.y = 1.0;
      result.z = 1.0;
      result.w = 1.0;

    The STR instruction is available only in the VP2 execution environment.


    Section 2.14.3.39, SUB: Subtract

    The SUB instruction performs a component-wise subtraction of the second
    operand from the first to yield a result vector.

      tmp0 = VectorLoad(op0);
      tmp1 = VectorLoad(op1);
      result.x = tmp0.x - tmp1.x;
      result.y = tmp0.y - tmp1.y;
      result.z = tmp0.z - tmp1.z;
      result.w = tmp0.w - tmp1.w;

    The SUB instruction is completely equivalent to an identical ADD
    instruction in which the negate operator on the second operand is
    reversed:

      1. "SUB R0, R1, R2" is equivalent to "ADD R0, R1, -R2".
      2. "SUB R0, R1, -R2" is equivalent to "ADD R0, R1, R2".
      3. "SUB R0, R1, |R2|" is equivalent to "ADD R0, R1, -|R2|".
      4. "SUB R0, R1, -|R2|" is equivalent to "ADD R0, R1, |R2|".

    The SUB instruction is available only in the VP1.1 and VP2 execution
    environments.


    2.14.4 Vertex Arrays for Vertex Attributes

    Data for vertex attributes in vertex program mode may be specified
    using vertex array commands. The client may specify and enable any
    of sixteen vertex attribute arrays.

    The vertex attribute arrays are ignored when vertex program mode
    is disabled. When vertex program mode is enabled, vertex attribute
    arrays are used.

    The command

      void VertexAttribPointerNV(uint index, int size, enum type,
                                 sizei stride, const void *pointer);

    describes the locations and organizations of the sixteen vertex
    attribute arrays. index specifies the particular vertex attribute
    to be described. size indicates the number of values per vertex
    that are stored in the array; size must be one of 1, 2, 3, or 4.
    type specifies the data type of the values stored in the array.
    type must be one of SHORT, FLOAT, DOUBLE, or UNSIGNED_BYTE, and these
    values correspond to the array types short, float, double, and
    ubyte respectively. The INVALID_OPERATION error is generated if
    type is UNSIGNED_BYTE and size is not 4. The INVALID_VALUE error
    is generated if index is greater than 15. The INVALID_VALUE error
    is generated if stride is negative.

    The one, two, three, or four values in an array that correspond to a
    single vertex attribute comprise an array element. The values within
    each array element are stored sequentially in memory. If the stride
    is specified as zero, then array elements are stored sequentially
    as well. Otherwise, pointers to the ith and (i+1)st elements of an
    array differ by stride basic machine units (typically unsigned bytes),
    the pointer to the (i+1)st element being greater. pointer specifies
    the location in memory of the first value of the first element of
    the array being specified.

    Vertex attribute arrays are enabled with the EnableClientState command
    and disabled with the DisableClientState command. The value of the
    argument to either command is VERTEX_ATTRIB_ARRAYi_NV where i is an
    integer between 0 and 15; specifying a value of i enables or
    disables the vertex attribute array with index i. The constants
    obey VERTEX_ATTRIB_ARRAYi_NV = VERTEX_ATTRIB_ARRAY0_NV + i.

    When vertex program mode is enabled, the ArrayElement command operates
    as described in this section in contrast to the behavior described
    in section 2.8. Likewise, any vertex array transfer commands that
    are defined in terms of ArrayElement (DrawArrays, DrawElements, and
    DrawRangeElements) assume the operation of ArrayElement described
    in this section when vertex program mode is enabled.

    When vertex program mode is enabled, the ArrayElement command
    transfers the ith element of particular enabled vertex arrays as
    described below. For each enabled vertex attribute array, it is
    as though the corresponding command from section 2.14.1.1 were
    called with a pointer to element i. For each vertex attribute,
    the corresponding command is VertexAttrib[size][type]v, where size
    is one of [1,2,3,4], and type is one of [s,f,d,ub], corresponding
    to the array types short, float, double, and ubyte respectively.

    However, if a given vertex attribute array is disabled, but its
    corresponding aliased conventional per-vertex parameter's vertex
    array (as described in section 2.14.1.6) is enabled, then it is
    as though the corresponding command from section 2.7 or section
    2.6.2 were called with a pointer to element i. In this case, the
    corresponding command is determined as described in section 2.8's
    description of ArrayElement.

    If the vertex attribute array 0 is enabled, it is as though
    VertexAttrib[size][type]v(0, ...) is executed last, after the
    executions of other corresponding commands. If the vertex attribute
    array 0 is disabled but the vertex array is enabled, it is as though
    Vertex[size][type]v is executed last, after the executions of other
    corresponding commands.

    2.14.5 Vertex State Programs

    Vertex state programs share the same instruction set as vertex programs
    and a similar execution model. While vertex programs are executed
    implicitly when a vertex transformation is provoked, vertex state programs
    are executed explicitly, independently of any vertices. Vertex state
    programs can write program parameter registers, but may not write vertex
    result registers. Vertex state programs have not been extended beyond the
    VP1.0 execution environment, and are offered solely for compatibility
    with that execution environment.

    The purpose of a vertex state program is to update program parameter
    registers by means of an application-defined program. Typically, an
    application will load a set of program parameters and then execute a
    vertex state program that reads and updates the program parameter
    registers. For example, a vertex state program might normalize a set of
    unnormalized vectors previously loaded as program parameters. The
    expectation is that subsequently executed vertex programs would use the
    normalized program parameters.

    Vertex state programs are loaded with the same LoadProgramNV command (see
    section 2.14.1.8) used to load vertex programs except that the target must
    be VERTEX_STATE_PROGRAM_NV when loading a vertex state program.

    Vertex state programs must conform to a more limited grammar than the
    grammar for vertex programs. The vertex state program grammar for
    syntactically valid sequences is the same as the grammar defined in
    section 2.14.1.8 with the following modified rules:

      <program>         ::= <vp1-program>

      <vp1-program>     ::= "!!VSP1.0" <programBody> "END"

      <dstReg>          ::= <absProgParamReg>
                          | <temporaryReg>

      <vertexAttribReg> ::= "v" "[" "0" "]"

    A vertex state program fails to load if it does not write at least
    one program parameter register.

    A vertex state program fails to load if it contains more than 128
    instructions.

    A vertex state program fails to load if any instruction sources more
    than one unique program parameter register.

    A vertex state program fails to load if any instruction sources
    more than one unique vertex attribute register (this is necessarily
    true because only vertex attribute 0 is available in vertex state
    programs).

    The error INVALID_OPERATION is generated if a vertex state program
    fails to load because it is not syntactically correct or for one
    of the other reasons listed above.
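
    The load-time restrictions listed above can be sketched as a check over
    an already-parsed program. This is a hypothetical model for
    illustration, not part of the GL API: the RegFile, Reg, and Instr types
    and the function vspPassesLoadChecks() are invented here, syntax
    checking is assumed to have already succeeded, and relative addressing
    of program parameters is not modeled.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical register files for a parsed instruction operand. */
typedef enum { REG_NONE, REG_TEMP, REG_PARAM, REG_ATTRIB } RegFile;

typedef struct {
    RegFile file;
    int     index;   /* absolute register number, e.g. 5 for c[5] */
} Reg;

typedef struct {
    Reg dst;
    Reg src[3];
    int numSrc;      /* number of source operands actually used (0..3) */
} Instr;

#define VSP_MAX_INSTRUCTIONS 128

/* Returns true if a syntactically valid VSP1.0 program also satisfies the
   extra load-time rules: at most 128 instructions, at least one program
   parameter write, at most one unique program parameter register sourced
   per instruction, and only v[0] sourced as a vertex attribute. */
static bool vspPassesLoadChecks(const Instr *prog, size_t count)
{
    bool writesParam = false;

    if (count > VSP_MAX_INSTRUCTIONS)
        return false;

    for (size_t i = 0; i < count; i++) {
        const Instr *in = &prog[i];
        int paramSeen = -1;  /* first program parameter index sourced */

        if (in->dst.file == REG_PARAM)
            writesParam = true;

        for (int s = 0; s < in->numSrc; s++) {
            const Reg *r = &in->src[s];
            if (r->file == REG_PARAM) {
                if (paramSeen < 0)
                    paramSeen = r->index;
                else if (r->index != paramSeen)
                    return false;   /* two different c[] registers */
            } else if (r->file == REG_ATTRIB && r->index != 0) {
                return false;       /* only v[0] is readable */
            }
        }
    }
    return writesParam;
}
```

    Reading the same program parameter register twice in one instruction is
    accepted, since the rule limits unique registers rather than operand
    slots.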

    A successfully loaded vertex state program is parsed into a sequence
    of instructions. Each instruction is identified by its tokenized
    name. The operation of these instructions when executed is defined
    in section 2.14.1.10.

    Executing vertex state programs is legal only outside a Begin/End
    pair. A vertex state program may not read any vertex attribute
    register other than register zero. A vertex state program may not
    write any vertex result register.

    The command

      ExecuteProgramNV(enum target, uint id, const float *params);

    executes the vertex state program named by id. The target must be
    VERTEX_STATE_PROGRAM_NV and the id must be the name of a program loaded
    with a target type of VERTEX_STATE_PROGRAM_NV. params points to
    an array of four floating-point values that are loaded into vertex
    attribute register zero (the only vertex attribute readable from a
    vertex state program).

    The INVALID_OPERATION error is generated if the named program is
    nonexistent, is invalid, or the program is not a vertex state
    program. A vertex state program may not be valid for reasons
    explained in section 2.14.5.


    2.14.6 Program Options

    In the VP1.1 and VP2.0 execution environments, vertex programs may specify
    one or more program options that modify the execution environment,
    according to the <option> grammar rule. The set of options available to
    the program is described below.

    Section 2.14.6.1, Position-Invariant Vertex Program Option

    If <vp11-option> or <vp2-option> matches "NV_position_invariant", the
    vertex program is presumed to be position-invariant. By default, vertex
    programs are not position-invariant. Even if programs emulate the
    conventional OpenGL transformation model, they may still not produce the
    exact same transform results, due to rounding errors or different
    operation orders. Such programs may not work well for multi-pass
    rendering algorithms where the second and subsequent passes use an EQUAL
    depth test.

    Position-invariant vertex programs do not compute a final vertex position;
    instead, the GL computes vertex coordinates as described in section 2.10.
    This computation should produce exactly the same results as the
    conventional OpenGL transformation model, assuming vertex weighting and
    vertex blending are disabled.

    A vertex program that specifies the position-invariant option will fail to
    load if it writes to the HPOS result register.

    Additionally, in the VP1.1 execution environment, position-invariant
    programs cannot use relative addressing for program parameters. Any
    position-invariant VP1.1 program that matches the grammar rule
    <relProgParamReg> will fail to load. No such restriction exists for
    VP2.0 programs.

    For position-invariant programs, the limit on the number of instructions
    allowed in a program is reduced by four: position-invariant VP1.1 and
    VP2.0 programs may have no more than 124 or 252 instructions,
    respectively.


    2.14.7 Tracking Matrices

    As a convenience to applications, standard GL matrix state can be
    tracked into program parameter vectors. This permits vertex programs
    to access matrices specified through GL matrix commands.

    In addition to GL's conventional matrices, several additional matrices
    are available for tracking. These matrices have names of the form
    MATRIXi_NV where i is between zero and n-1 where n is the value
    of the MAX_TRACK_MATRICES_NV implementation-dependent constant.
    The MATRIXi_NV constants obey MATRIXi_NV = MATRIX0_NV + i. The value
    of MAX_TRACK_MATRICES_NV must be at least eight. The maximum
    stack depth for tracking matrices is defined by the value of
    MAX_TRACK_MATRIX_STACK_DEPTH_NV, which must be at least 1.

    The command

      TrackMatrixNV(enum target, uint address, enum matrix, enum transform);

    tracks a given transformed version of a particular matrix into
    a contiguous sequence of four vertex program parameter registers
    beginning at address. target must be VERTEX_PROGRAM_NV (though
    tracked matrices apply to vertex state programs as well because both
    vertex state programs and vertex programs share the same program
    parameter registers). matrix must be one of NONE, MODELVIEW,
    PROJECTION, TEXTURE, TEXTUREi_ARB (where i is between 0 and n-1
    where n is the number of texture units supported), COLOR (if
    the ARB_imaging subset is supported), MODELVIEW_PROJECTION_NV,
    or MATRIXi_NV. transform must be one of IDENTITY_NV, INVERSE_NV,
    TRANSPOSE_NV, or INVERSE_TRANSPOSE_NV. The INVALID_VALUE error is
    generated if address is not a multiple of four.

    The MODELVIEW_PROJECTION_NV matrix represents the concatenation of
    the current modelview and projection matrices. If M is the current
    modelview matrix and P is the current projection matrix, then the
    MODELVIEW_PROJECTION_NV matrix is C, computed as

      C = P M

    Matrix tracking for the specified program parameter register and the
    next consecutive three registers is disabled when NONE is supplied
    for matrix. When tracking is disabled, the previously tracked program
    parameter registers retain the state of their last tracked values.
    Otherwise, the specified transformed version of matrix is tracked into
    the specified program parameter register and the next three registers.
    Whenever the matrix changes, the transformed version of the matrix
    is updated in the specified range of program parameter registers.
    If TEXTURE is specified for matrix, the texture matrix for the current
    active texture unit is tracked. If TEXTUREi_ARB is specified for
    matrix, the <i>th texture matrix is tracked.

    Matrices are tracked row-wise, meaning that the top row of the
    transformed matrix is loaded into the program parameter address,
    the second from the top row of the transformed matrix is loaded into
    the program parameter address+1, the third from the top row of the
    transformed matrix is loaded into the program parameter address+2,
    and the bottom row of the transformed matrix is loaded into the
    program parameter address+3. The transformed matrix may be identical
    to the specified matrix, the inverse of the specified matrix, the
    transpose of the specified matrix, or the inverse transpose of the
    specified matrix, depending on the value of transform.

    When matrix tracking is enabled for a particular program parameter
    register sequence, updates to the program parameter using
    ProgramParameterNV commands, a vertex program, or a vertex state
    program are not possible. The INVALID_OPERATION error is generated
    if a ProgramParameterNV command is used to update a program parameter
    register currently tracking a matrix.

    The INVALID_OPERATION error is generated by ExecuteProgramNV when
    the vertex state program requested for execution writes to a program
    parameter register that is currently tracking a matrix; such a
    program is considered invalid.
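
    A host-side sketch of what tracking MODELVIEW_PROJECTION_NV with the
    IDENTITY_NV or TRANSPOSE_NV transform stores may help clarify the
    row-wise layout. It assumes GL's column-major matrix storage (the
    element at row r, column c is m[c*4 + r]). The Param type and the
    functions matMul4, transpose4, and trackRows are hypothetical helpers,
    not GL entry points, and INVERSE_NV and INVERSE_TRANSPOSE_NV are
    omitted for brevity.

```c
/* Hypothetical model of one 4-component program parameter register. */
typedef struct { float v[4]; } Param;

/* out = a * b for column-major 4x4 matrices (element (r,c) at [c*4 + r]).
   out must not alias a or b. */
static void matMul4(const float a[16], const float b[16], float out[16])
{
    for (int c = 0; c < 4; c++) {
        for (int r = 0; r < 4; r++) {
            float sum = 0.0f;
            for (int k = 0; k < 4; k++)
                sum += a[k * 4 + r] * b[c * 4 + k];
            out[c * 4 + r] = sum;
        }
    }
}

/* out = transpose(m); out must not alias m. */
static void transpose4(const float m[16], float out[16])
{
    for (int c = 0; c < 4; c++)
        for (int r = 0; r < 4; r++)
            out[r * 4 + c] = m[c * 4 + r];
}

/* Row-wise tracking: row i of the (transformed) matrix is written to
   program parameter register address + i. */
static void trackRows(const float m[16], Param *params, unsigned address)
{
    for (int r = 0; r < 4; r++)
        for (int c = 0; c < 4; c++)
            params[address + r].v[c] = m[c * 4 + r];
}
```

    With P a uniform scale by 2 and M a translation by (1,2,3), C = P M
    tracked at address 4 leaves (2,0,0,2), (0,2,0,4), (0,0,2,6), and
    (0,0,0,1) in registers 4 through 7. Because each register holds one row,
    a vertex program can transform a position with four DP4 instructions,
    one per tracked register.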

    2.14.8  Required Vertex Program State

    The state required for vertex programs consists of:

      a bit indicating whether or not program mode is enabled;

      a bit indicating whether or not two-sided color mode is enabled;

      a bit indicating whether or not program-specified point size mode
      is enabled;

      256 4-component floating-point program parameter registers;

      16 4-component vertex attribute registers (though this state is
      aliased with the current normal, primary color, secondary color,
      fog coordinate, weights, and texture coordinate sets);

      24 sets of matrix tracking state for each set of four sequential
      program parameter registers, consisting of an n-valued integer
      indicating the tracked matrix or GL_NONE (where n is 5 + the number
      of texture units supported + the number of tracking matrices
      supported) and a four-valued integer indicating the transformation
      of the tracked matrix;

      an unsigned integer naming the currently bound vertex program

    and the state must be maintained to indicate which integers are
    currently in use as program names.

    Each existent program object consists of a target, a boolean indicating
    whether the program is resident, an array of type ubyte containing the
    program string, and the length of the program string array. Initially,
    no program objects exist.

    Program mode, two-sided color mode, and program-specified point size
    mode are all initially disabled.

    The initial state of all 256 program parameter registers is (0,0,0,0).

    The initial state of the 16 vertex attribute registers is (0,0,0,1)
    except in cases where a vertex attribute register aliases to a
    conventional GL transform mode vertex parameter, in which case the
    initial state is the initial state of the respective aliased
    conventional vertex parameter.
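    The state listed in this section can be pictured as a plain C
    structure. The sketch below is hypothetical (the type, field, and enum
    names are illustrative and not part of any API). It covers the enables,
    the program parameter and vertex attribute registers, the bound program
    name, and the client vertex attribute array state described at the end
    of this section, all set to their initial values. Matrix tracking state
    and program objects are omitted, and attribute aliasing is deliberately
    not modeled: registers that alias conventional state (for example, the
    current normal or primary color) would start with that state's initial
    value rather than (0,0,0,1).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NUM_PROGRAM_PARAMETERS 256
#define NUM_VERTEX_ATTRIBS     16

/* Hypothetical stand-in for the FLOAT array type enum. */
enum ArrayType { ARRAY_TYPE_FLOAT };

/* Client state for one vertex attribute array. */
typedef struct {
    bool           enabled;   /* initially disabled */
    const void    *pointer;   /* initially null */
    int            stride;    /* initially zero */
    enum ArrayType type;      /* initially FLOAT */
    int            size;      /* values per element, initially four */
} AttribArrayState;

typedef struct {
    bool     program_mode;        /* vertex program mode */
    bool     two_sided_color;     /* two-sided color mode */
    bool     program_point_size;  /* program-specified point size mode */
    float    params[NUM_PROGRAM_PARAMETERS][4];
    float    attribs[NUM_VERTEX_ATTRIBS][4];
    unsigned bound_program;       /* 0 means no program bound */
    AttribArrayState arrays[NUM_VERTEX_ATTRIBS];  /* client state */
} VertexProgramState;

/* Apply the initial values listed in Section 2.14.8. */
static void init_vertex_program_state(VertexProgramState *s)
{
    s->program_mode = false;
    s->two_sided_color = false;
    s->program_point_size = false;
    for (int i = 0; i < NUM_PROGRAM_PARAMETERS; i++)
        for (int c = 0; c < 4; c++)
            s->params[i][c] = 0.0f;          /* (0,0,0,0) */
    for (int i = 0; i < NUM_VERTEX_ATTRIBS; i++) {
        /* (0,0,0,1); aliasing to conventional state is not modeled */
        s->attribs[i][0] = 0.0f;
        s->attribs[i][1] = 0.0f;
        s->attribs[i][2] = 0.0f;
        s->attribs[i][3] = 1.0f;
        s->arrays[i].enabled = false;
        s->arrays[i].pointer = NULL;
        s->arrays[i].stride = 0;
        s->arrays[i].type = ARRAY_TYPE_FLOAT;
        s->arrays[i].size = 4;
    }
    s->bound_program = 0;
}
```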

    The initial state of the 24 sets of matrix tracking state is NONE for
    the tracked matrix and IDENTITY_NV for the transformation of the
    tracked matrix.

    The initial currently bound program is zero.

    The client state required to implement the 16 vertex attribute arrays
    consists of 16 boolean values, 16 memory pointers, 16 integer stride
    values, 16 symbolic constants representing array types, and 16
    integers representing values per element. Initially, the boolean
    values are each disabled, the memory pointers are each null, the
    strides are each zero, the array types are each FLOAT, and the integers
    representing values per element are each four."


Additions to Chapter 3 of the OpenGL 1.3 Specification (Rasterization)

    None.

Additions to Chapter 4 of the OpenGL 1.3 Specification (Per-Fragment
Operations and the Frame Buffer)

    None.

Additions to Chapter 5 of the OpenGL 1.3 Specification (Special Functions)

    None.

Additions to Chapter 6 of the OpenGL 1.3 Specification (State and
State Requests)

    None.

Additions to Appendix A of the OpenGL 1.3 Specification (Invariance)

    None.

Additions to the AGL/GLX/WGL Specifications

    None.

GLX Protocol

    All relevant protocol is defined in the NV_vertex_program extension.

Errors

    This list includes the errors specified in the NV_vertex_program
    extension, modified as appropriate.

    The error INVALID_VALUE is generated if VertexAttribNV is called where
    index is greater than 15.

    The error INVALID_VALUE is generated if any ProgramParameterNV has an
    index greater than 255 (was 95 in NV_vertex_program).

    The error INVALID_VALUE is generated if VertexAttribPointerNV is called
    where index is greater than 15.

    The error INVALID_VALUE is generated if VertexAttribPointerNV is called
    where size is not one of 1, 2, 3, or 4.

    The error INVALID_VALUE is generated if VertexAttribPointerNV is called
    where stride is negative.

    The error INVALID_OPERATION is generated if VertexAttribPointerNV is
    called where type is UNSIGNED_BYTE and size is not 4.

    The error INVALID_VALUE is generated if LoadProgramNV is used to load a
    program with an id of zero.

    The error INVALID_OPERATION is generated if LoadProgramNV is used to load
    an id that is currently loaded with a program of a different program
    target.

    The error INVALID_OPERATION is generated if the program passed to
    LoadProgramNV fails to load because it is not syntactically correct based
    on the specified target. The value of PROGRAM_ERROR_POSITION_NV is still
    updated when this error is generated.

    The error INVALID_OPERATION is generated if LoadProgramNV has a target of
    VERTEX_PROGRAM_NV and the specified program fails to load because it does
    not write the HPOS register at least once. The value of
    PROGRAM_ERROR_POSITION_NV is still updated when this error is generated.

    The error INVALID_OPERATION is generated if LoadProgramNV has a target of
    VERTEX_STATE_PROGRAM_NV and the specified program fails to load because it
    does not write at least one program parameter register. The value of
    PROGRAM_ERROR_POSITION_NV is still updated when this error is generated.

    The error INVALID_OPERATION is generated if the vertex program or vertex
    state program passed to LoadProgramNV fails to load because it contains
    more than 128 instructions (VP1 programs) or 256 instructions (VP2
    programs). The value of PROGRAM_ERROR_POSITION_NV is still updated when
    this error is generated.
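    Several of the errors above are simple argument checks. As a
    hypothetical sketch (the function names, the ERR_* codes, and the enums
    below are illustrative stand-ins, not GL entry points or tokens), the
    VertexAttribPointerNV checks and the VP1/VP2 instruction-count limits
    could be expressed as follows. When several conditions fail at once,
    this sketch reports the first one in the order listed above; that
    ordering is a choice of the sketch, not a requirement stated here.

```c
#include <assert.h>

/* Hypothetical error codes standing in for the GL errors named in the spec. */
enum { ERR_NONE = 0, ERR_INVALID_VALUE, ERR_INVALID_OPERATION };

/* Hypothetical stand-ins for the GL array type enums. */
enum AttribType { TYPE_SHORT, TYPE_FLOAT, TYPE_DOUBLE, TYPE_UNSIGNED_BYTE };

/* Hypothetical stand-ins for the two execution environments. */
enum ExecEnv { ENV_VP1, ENV_VP2 };

/* Argument checks for VertexAttribPointerNV, in the order listed. */
static int check_vertex_attrib_pointer(unsigned index, int size,
                                       enum AttribType type, int stride)
{
    if (index > 15)
        return ERR_INVALID_VALUE;      /* only 16 attribute arrays */
    if (size < 1 || size > 4)
        return ERR_INVALID_VALUE;      /* size must be 1, 2, 3, or 4 */
    if (stride < 0)
        return ERR_INVALID_VALUE;      /* negative stride */
    if (type == TYPE_UNSIGNED_BYTE && size != 4)
        return ERR_INVALID_OPERATION;  /* UNSIGNED_BYTE requires size 4 */
    return ERR_NONE;
}

/* LoadProgramNV length limit: 128 instructions for VP1, 256 for VP2. */
static int check_instruction_count(enum ExecEnv env, int count)
{
    int limit = (env == ENV_VP2) ? 256 : 128;
    return count > limit ? ERR_INVALID_OPERATION : ERR_NONE;
}
```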

    The error INVALID_OPERATION is generated if a program is loaded with
    LoadProgramNV for id when id is currently loaded with a program of a
    different target.

    The error INVALID_OPERATION is generated if BindProgramNV attempts to bind
    to a program name that is not a vertex program (for example, if the
    program is a vertex state program).

    The error INVALID_VALUE is generated if GenProgramsNV is called where n is
    negative.

    The error INVALID_VALUE is generated if AreProgramsResidentNV is called
    and any of the queried programs are zero or do not exist.

    The error INVALID_OPERATION is generated if ExecuteProgramNV executes a
    program that does not exist.

    The error INVALID_OPERATION is generated if ExecuteProgramNV executes a
    program that is not a vertex state program.

    The error INVALID_OPERATION is generated if Begin, RasterPos, or a command
    that performs an explicit Begin is called when vertex program mode is
    enabled and the currently bound vertex program writes program parameters
    that are currently being tracked.

    The error INVALID_OPERATION is generated if ExecuteProgramNV is called and
    the vertex state program to execute writes program parameters that are
    currently being tracked.

    The error INVALID_VALUE is generated if TrackMatrixNV has a target of
    VERTEX_PROGRAM_NV and attempts to track an address that is not a multiple
    of four.

    The error INVALID_VALUE is generated if GetProgramParameterNV is called to
    query an index greater than 255 (was 95 in NV_vertex_program).

    The error INVALID_VALUE is generated if GetVertexAttribNV is called to
    query an <index> greater than 15, or if <index> is zero and <pname> is
    CURRENT_ATTRIB_NV.

    The error INVALID_VALUE is generated if GetVertexAttribPointervNV is
    called to query an index greater than 15.

    The error INVALID_OPERATION is generated if GetProgramivNV is called and
    the program named id does not exist.

    The error INVALID_OPERATION is generated if GetProgramStringNV is called
    and the program named <program> does not exist.

    The error INVALID_VALUE is generated if GetTrackMatrixivNV is called with
    an <address> that is not divisible by four or greater than or equal to 256
    (was 96 in NV_vertex_program).

    The error INVALID_VALUE is generated if AreProgramsResidentNV,
    DeleteProgramsNV, GenProgramsNV, or RequestResidentProgramsNV are called
    where <n> is negative.

    The error INVALID_VALUE is generated if LoadProgramNV is called where
    <len> is negative.

    The error INVALID_VALUE is generated if ProgramParameters4dvNV or
    ProgramParameters4fvNV are called where <count> is negative.

    The error INVALID_VALUE is generated if VertexAttribs{1,2,3,4}{d,f,s}vNV
    is called where <count> is negative.

    The error INVALID_ENUM is generated if BindProgramNV,
    GetProgramParameterfvNV, GetProgramParameterdvNV, GetTrackMatrixivNV,
    ProgramParameter4fNV, ProgramParameter4dNV, ProgramParameter4fvNV,
    ProgramParameter4dvNV, ProgramParameters4fvNV, ProgramParameters4dvNV,
    or TrackMatrixNV are called where <target> is not VERTEX_PROGRAM_NV.

    The error INVALID_ENUM is generated if LoadProgramNV or
    ExecuteProgramNV are called where <target> is not either
    VERTEX_PROGRAM_NV or VERTEX_STATE_PROGRAM_NV.

New State

(Modify Table X.5, New State Introduced by NV_vertex_program from the
 NV_vertex_program specification.)

Get Value             Type   Get Command             Initial Value Description        Sec      Attribute
--------------------- ------ ----------------------- ------------- ------------------ -------- ------------
PROGRAM_PARAMETER_NV  256xR4 GetProgramParameterNV   (0,0,0,0)     program parameters 2.14.1.2 -


(Modify Table X.7. Vertex Program Per-vertex Execution State. "VP1" and
"VP2" refer to the VP1 and VP2 execution environments, respectively.)

Get Value Type   Get Command Initial Value Description             Sec      Attribute
--------- ------ ----------- ------------- ----------------------- -------- ---------
-         12xR4  -           (0,0,0,0)     VP1 temporary registers 2.14.1.4 -
-         16xR4  -           (0,0,0,0)     VP2 temporary registers 2.14.1.4 -
-         15xR4  -           (0,0,0,1)     vertex result registers 2.14.1.4 -
          Z4     -           (0,0,0,0)     VP1 address register    2.14.1.3 -
          2xZ4   -           (0,0,0,0)     VP2 address registers   2.14.1.3 -


Revision History

    Rev.  Date     Author   Changes
    ----  -------- -------  --------------------------------------------
    33    03/18/08 pbrown   Fixed incorrectly documented clamp in the RCC
                            instruction.

    32    05/16/04 pbrown   Documented that it's not possible to obtain
                            results from LG2 that are any more precise
                            than what is available in the fp32 storage
                            format.

    31    08/17/03 pbrown   Added several overlooked opcodes (RCC, SUB,
                            SIN) to the grammar. They are documented in
                            the spec body, however.

    30    02/28/03 pbrown   Fixed incorrect condition code example.

    29    12/08/02 pbrown   Fixed minor bug where "ABS" and "DPH" were
                            listed twice in the grammar.

    28    10/29/02 pbrown   Removed support for indirect branching. Added
                            missing o[CLPx] outputs to the grammar. Minor
                            typo fixes.

    25    07/19/02 pbrown   Fixed several miscellaneous errors in the
                            spec.

    24    06/28/02 pbrown   Fixed several erroneous resource limitations.

    23    06/07/02 pbrown   Removed stray and erroneous abs() from the
                            documentation of the LG2 instruction.

    22    06/06/02 pbrown   Added missing items from NV_vertex_program1_1,
                            in particular, program options. Documented
                            that VP2.0 position-invariant programs have
                            no restrictions on indirect addressing.

    21    06/19/02 pbrown   Cleaned up miscellaneous errors and issues in
                            the spec.

    20    05/17/02 pbrown   Documented LOG instruction as taking the
                            absolute value of the operand, as in VP1.0.
                            Fixed special-case rules for MUL. Added clamps
                            to special-case clamping rules for RCC.

    18    05/09/02 pbrown   Clarified the handling of NaN/UN in certain
                            instructions and conditional operations.

    17    04/26/02 pbrown   Fixed incorrectly specified algorithm for
                            computing the y result in the LOG instruction.

    16    04/21/02 pbrown   Added example for "paletted skinning".
                            Documented size limitation (10 bits) on the
                            address register and ARA, ARL, and ARR
                            instructions. The limits need to be exposed
                            because of the ARA instruction. Cleaned up
                            documentation on absolute value on input
                            operations. Added examples for masked writes
                            and CC updates, and for branching. Fixed
                            out-of-range indexed branch language and
                            pseudocode to clamp to the actual table size
                            (rather than the theoretical maximum).
                            Documented ABS as semi-deprecated in VP2.
                            Fixed special cases for MIN, MAX, SEQ, SGE,
                            SGT, SLE, SLT, and SNE. Fixed completely
                            botched description of RET.

    15    04/05/02 pbrown   Updated introduction to indicate that
                            ARL/ARR/ARA all can update condition code.
                            Minor fixes and optimizations to the looping
                            examples. Added missing "set on" opcodes to
                            the grammar. Fixed spec to clamp branch table
                            indices to [0,15]. Added a couple caveats to
                            the "ABS" pseudo-instruction. Documented
                            "ARR" as using IEEE round to nearest even
                            mode. Documented special cases for "SSG".