1TGSI 2==== 3 4TGSI, Tungsten Graphics Shader Infrastructure, is an intermediate language 5for describing shaders. Since Gallium is inherently shaderful, shaders are 6an important part of the API. TGSI is the only intermediate representation 7used by all drivers. 8 9Basics 10------ 11 12All TGSI instructions, known as *opcodes*, operate on arbitrary-precision 13floating-point four-component vectors. An opcode may have up to one 14destination register, known as *dst*, and between zero and three source 15registers, called *src0* through *src2*, or simply *src* if there is only 16one. 17 18Some instructions, like :opcode:`I2F`, permit re-interpretation of vector 19components as integers. Other instructions permit using registers as 20two-component vectors with double precision; see :ref:`doubleopcodes`. 21 22When an instruction has a scalar result, the result is usually copied into 23each of the components of *dst*. When this happens, the result is said to be 24*replicated* to *dst*. :opcode:`RCP` is one such instruction. 25 26Modifiers 27^^^^^^^^^^^^^^^ 28 29TGSI supports modifiers on inputs (as well as saturate and precise modifier 30on instructions). 31 32For arithmetic instruction having a precise modifier certain optimizations 33which may alter the result are disallowed. Example: *add(mul(a,b),c)* can't be 34optimized to TGSI_OPCODE_MAD, because some hardware only supports the fused 35MAD instruction. 36 37For inputs which have a floating point type, both absolute value and 38negation modifiers are supported (with absolute value being applied 39first). The only source of TGSI_OPCODE_MOV and the second and third 40sources of TGSI_OPCODE_UCMP are considered to have float type for 41applying modifiers. 42 43For inputs which have signed or unsigned type only the negate modifier is 44supported. 45 46Instruction Set 47--------------- 48 49Core ISA 50^^^^^^^^^^^^^^^^^^^^^^^^^ 51 52These opcodes are guaranteed to be available regardless of the driver being 53used. 54 55.. opcode:: ARL - Address Register Load 56 57.. math:: 58 59 dst.x = (int) \lfloor src.x\rfloor 60 61 dst.y = (int) \lfloor src.y\rfloor 62 63 dst.z = (int) \lfloor src.z\rfloor 64 65 dst.w = (int) \lfloor src.w\rfloor 66 67 68.. opcode:: MOV - Move 69 70.. math:: 71 72 dst.x = src.x 73 74 dst.y = src.y 75 76 dst.z = src.z 77 78 dst.w = src.w 79 80 81.. opcode:: LIT - Light Coefficients 82 83.. math:: 84 85 dst.x &= 1 \\ 86 dst.y &= max(src.x, 0) \\ 87 dst.z &= (src.x > 0) ? max(src.y, 0)^{clamp(src.w, -128, 128))} : 0 \\ 88 dst.w &= 1 89 90 91.. opcode:: RCP - Reciprocal 92 93This instruction replicates its result. 94 95.. math:: 96 97 dst = \frac{1}{src.x} 98 99 100.. opcode:: RSQ - Reciprocal Square Root 101 102This instruction replicates its result. The results are undefined for src <= 0. 103 104.. math:: 105 106 dst = \frac{1}{\sqrt{src.x}} 107 108 109.. opcode:: SQRT - Square Root 110 111This instruction replicates its result. The results are undefined for src < 0. 112 113.. math:: 114 115 dst = {\sqrt{src.x}} 116 117 118.. opcode:: EXP - Approximate Exponential Base 2 119 120.. math:: 121 122 dst.x &= 2^{\lfloor src.x\rfloor} \\ 123 dst.y &= src.x - \lfloor src.x\rfloor \\ 124 dst.z &= 2^{src.x} \\ 125 dst.w &= 1 126 127 128.. opcode:: LOG - Approximate Logarithm Base 2 129 130.. math:: 131 132 dst.x &= \lfloor\log_2{|src.x|}\rfloor \\ 133 dst.y &= \frac{|src.x|}{2^{\lfloor\log_2{|src.x|}\rfloor}} \\ 134 dst.z &= \log_2{|src.x|} \\ 135 dst.w &= 1 136 137 138.. opcode:: MUL - Multiply 139 140.. math:: 141 142 dst.x = src0.x \times src1.x 143 144 dst.y = src0.y \times src1.y 145 146 dst.z = src0.z \times src1.z 147 148 dst.w = src0.w \times src1.w 149 150 151.. opcode:: ADD - Add 152 153.. math:: 154 155 dst.x = src0.x + src1.x 156 157 dst.y = src0.y + src1.y 158 159 dst.z = src0.z + src1.z 160 161 dst.w = src0.w + src1.w 162 163 164.. opcode:: DP3 - 3-component Dot Product 165 166This instruction replicates its result. 167 168.. math:: 169 170 dst = src0.x \times src1.x + src0.y \times src1.y + src0.z \times src1.z 171 172 173.. opcode:: DP4 - 4-component Dot Product 174 175This instruction replicates its result. 176 177.. math:: 178 179 dst = src0.x \times src1.x + src0.y \times src1.y + src0.z \times src1.z + src0.w \times src1.w 180 181 182.. opcode:: DST - Distance Vector 183 184.. math:: 185 186 dst.x &= 1\\ 187 dst.y &= src0.y \times src1.y\\ 188 dst.z &= src0.z\\ 189 dst.w &= src1.w 190 191 192.. opcode:: MIN - Minimum 193 194.. math:: 195 196 dst.x = min(src0.x, src1.x) 197 198 dst.y = min(src0.y, src1.y) 199 200 dst.z = min(src0.z, src1.z) 201 202 dst.w = min(src0.w, src1.w) 203 204 205.. opcode:: MAX - Maximum 206 207.. math:: 208 209 dst.x = max(src0.x, src1.x) 210 211 dst.y = max(src0.y, src1.y) 212 213 dst.z = max(src0.z, src1.z) 214 215 dst.w = max(src0.w, src1.w) 216 217 218.. opcode:: SLT - Set On Less Than 219 220.. math:: 221 222 dst.x = (src0.x < src1.x) ? 1.0F : 0.0F 223 224 dst.y = (src0.y < src1.y) ? 1.0F : 0.0F 225 226 dst.z = (src0.z < src1.z) ? 1.0F : 0.0F 227 228 dst.w = (src0.w < src1.w) ? 1.0F : 0.0F 229 230 231.. opcode:: SGE - Set On Greater Equal Than 232 233.. math:: 234 235 dst.x = (src0.x >= src1.x) ? 1.0F : 0.0F 236 237 dst.y = (src0.y >= src1.y) ? 1.0F : 0.0F 238 239 dst.z = (src0.z >= src1.z) ? 1.0F : 0.0F 240 241 dst.w = (src0.w >= src1.w) ? 1.0F : 0.0F 242 243 244.. opcode:: MAD - Multiply And Add 245 246Perform a * b + c. The implementation is free to decide whether there is an 247intermediate rounding step or not. 248 249.. math:: 250 251 dst.x = src0.x \times src1.x + src2.x 252 253 dst.y = src0.y \times src1.y + src2.y 254 255 dst.z = src0.z \times src1.z + src2.z 256 257 dst.w = src0.w \times src1.w + src2.w 258 259 260.. opcode:: LRP - Linear Interpolate 261 262.. math:: 263 264 dst.x = src0.x \times src1.x + (1 - src0.x) \times src2.x 265 266 dst.y = src0.y \times src1.y + (1 - src0.y) \times src2.y 267 268 dst.z = src0.z \times src1.z + (1 - src0.z) \times src2.z 269 270 dst.w = src0.w \times src1.w + (1 - src0.w) \times src2.w 271 272 273.. opcode:: FMA - Fused Multiply-Add 274 275Perform a * b + c with no intermediate rounding step. 276 277.. math:: 278 279 dst.x = src0.x \times src1.x + src2.x 280 281 dst.y = src0.y \times src1.y + src2.y 282 283 dst.z = src0.z \times src1.z + src2.z 284 285 dst.w = src0.w \times src1.w + src2.w 286 287 288.. opcode:: FRC - Fraction 289 290.. math:: 291 292 dst.x = src.x - \lfloor src.x\rfloor 293 294 dst.y = src.y - \lfloor src.y\rfloor 295 296 dst.z = src.z - \lfloor src.z\rfloor 297 298 dst.w = src.w - \lfloor src.w\rfloor 299 300 301.. opcode:: FLR - Floor 302 303.. math:: 304 305 dst.x = \lfloor src.x\rfloor 306 307 dst.y = \lfloor src.y\rfloor 308 309 dst.z = \lfloor src.z\rfloor 310 311 dst.w = \lfloor src.w\rfloor 312 313 314.. opcode:: ROUND - Round 315 316.. math:: 317 318 dst.x = round(src.x) 319 320 dst.y = round(src.y) 321 322 dst.z = round(src.z) 323 324 dst.w = round(src.w) 325 326 327.. opcode:: EX2 - Exponential Base 2 328 329This instruction replicates its result. 330 331.. math:: 332 333 dst = 2^{src.x} 334 335 336.. opcode:: LG2 - Logarithm Base 2 337 338This instruction replicates its result. 339 340.. math:: 341 342 dst = \log_2{src.x} 343 344 345.. opcode:: POW - Power 346 347This instruction replicates its result. 348 349.. math:: 350 351 dst = src0.x^{src1.x} 352 353 354.. opcode:: LDEXP - Multiply Number by Integral Power of 2 355 356src1 is an integer. 357 358.. math:: 359 360 dst.x = src0.x * 2^{src1.x} 361 dst.y = src0.y * 2^{src1.y} 362 dst.z = src0.z * 2^{src1.z} 363 dst.w = src0.w * 2^{src1.w} 364 365 366.. opcode:: COS - Cosine 367 368This instruction replicates its result. 369 370.. math:: 371 372 dst = \cos{src.x} 373 374 375.. opcode:: DDX, DDX_FINE - Derivative Relative To X 376 377The fine variant is only used when ``PIPE_CAP_TGSI_FS_FINE_DERIVATIVE`` is 378advertised. When it is, the fine version guarantees one derivative per row 379while DDX is allowed to be the same for the entire 2x2 quad. 380 381.. math:: 382 383 dst.x = partialx(src.x) 384 385 dst.y = partialx(src.y) 386 387 dst.z = partialx(src.z) 388 389 dst.w = partialx(src.w) 390 391 392.. opcode:: DDY, DDY_FINE - Derivative Relative To Y 393 394The fine variant is only used when ``PIPE_CAP_TGSI_FS_FINE_DERIVATIVE`` is 395advertised. When it is, the fine version guarantees one derivative per column 396while DDY is allowed to be the same for the entire 2x2 quad. 397 398.. math:: 399 400 dst.x = partialy(src.x) 401 402 dst.y = partialy(src.y) 403 404 dst.z = partialy(src.z) 405 406 dst.w = partialy(src.w) 407 408 409.. opcode:: PK2H - Pack Two 16-bit Floats 410 411This instruction replicates its result. 412 413.. math:: 414 415 dst = f32\_to\_f16(src.x) | f32\_to\_f16(src.y) << 16 416 417 418.. opcode:: PK2US - Pack Two Unsigned 16-bit Scalars 419 420This instruction replicates its result. 421 422.. math:: 423 424 dst = f32\_to\_unorm16(src.x) | f32\_to\_unorm16(src.y) << 16 425 426 427.. opcode:: PK4B - Pack Four Signed 8-bit Scalars 428 429This instruction replicates its result. 430 431.. math:: 432 433 dst = f32\_to\_snorm8(src.x) | 434 (f32\_to\_snorm8(src.y) << 8) | 435 (f32\_to\_snorm8(src.z) << 16) | 436 (f32\_to\_snorm8(src.w) << 24) 437 438 439.. opcode:: PK4UB - Pack Four Unsigned 8-bit Scalars 440 441This instruction replicates its result. 442 443.. math:: 444 445 dst = f32\_to\_unorm8(src.x) | 446 (f32\_to\_unorm8(src.y) << 8) | 447 (f32\_to\_unorm8(src.z) << 16) | 448 (f32\_to\_unorm8(src.w) << 24) 449 450 451.. opcode:: SEQ - Set On Equal 452 453.. math:: 454 455 dst.x = (src0.x == src1.x) ? 1.0F : 0.0F 456 457 dst.y = (src0.y == src1.y) ? 1.0F : 0.0F 458 459 dst.z = (src0.z == src1.z) ? 1.0F : 0.0F 460 461 dst.w = (src0.w == src1.w) ? 1.0F : 0.0F 462 463 464.. opcode:: SGT - Set On Greater Than 465 466.. math:: 467 468 dst.x = (src0.x > src1.x) ? 1.0F : 0.0F 469 470 dst.y = (src0.y > src1.y) ? 1.0F : 0.0F 471 472 dst.z = (src0.z > src1.z) ? 1.0F : 0.0F 473 474 dst.w = (src0.w > src1.w) ? 1.0F : 0.0F 475 476 477.. opcode:: SIN - Sine 478 479This instruction replicates its result. 480 481.. math:: 482 483 dst = \sin{src.x} 484 485 486.. opcode:: SLE - Set On Less Equal Than 487 488.. math:: 489 490 dst.x = (src0.x <= src1.x) ? 1.0F : 0.0F 491 492 dst.y = (src0.y <= src1.y) ? 1.0F : 0.0F 493 494 dst.z = (src0.z <= src1.z) ? 1.0F : 0.0F 495 496 dst.w = (src0.w <= src1.w) ? 1.0F : 0.0F 497 498 499.. opcode:: SNE - Set On Not Equal 500 501.. math:: 502 503 dst.x = (src0.x != src1.x) ? 1.0F : 0.0F 504 505 dst.y = (src0.y != src1.y) ? 1.0F : 0.0F 506 507 dst.z = (src0.z != src1.z) ? 1.0F : 0.0F 508 509 dst.w = (src0.w != src1.w) ? 1.0F : 0.0F 510 511 512.. opcode:: TEX - Texture Lookup 513 514 for array textures src0.y contains the slice for 1D, 515 and src0.z contain the slice for 2D. 516 517 for shadow textures with no arrays (and not cube map), 518 src0.z contains the reference value. 519 520 for shadow textures with arrays, src0.z contains 521 the reference value for 1D arrays, and src0.w contains 522 the reference value for 2D arrays and cube maps. 523 524 for cube map array shadow textures, the reference value 525 cannot be passed in src0.w, and TEX2 must be used instead. 526 527.. math:: 528 529 coord = src0 530 531 shadow_ref = src0.z or src0.w (optional) 532 533 unit = src1 534 535 dst = texture\_sample(unit, coord, shadow_ref) 536 537 538.. opcode:: TEX2 - Texture Lookup (for shadow cube map arrays only) 539 540 this is the same as TEX, but uses another reg to encode the 541 reference value. 542 543.. math:: 544 545 coord = src0 546 547 shadow_ref = src1.x 548 549 unit = src2 550 551 dst = texture\_sample(unit, coord, shadow_ref) 552 553 554 555 556.. opcode:: TXD - Texture Lookup with Derivatives 557 558.. math:: 559 560 coord = src0 561 562 ddx = src1 563 564 ddy = src2 565 566 unit = src3 567 568 dst = texture\_sample\_deriv(unit, coord, ddx, ddy) 569 570 571.. opcode:: TXP - Projective Texture Lookup 572 573.. math:: 574 575 coord.x = src0.x / src0.w 576 577 coord.y = src0.y / src0.w 578 579 coord.z = src0.z / src0.w 580 581 coord.w = src0.w 582 583 unit = src1 584 585 dst = texture\_sample(unit, coord) 586 587 588.. opcode:: UP2H - Unpack Two 16-Bit Floats 589 590.. math:: 591 592 dst.x = f16\_to\_f32(src0.x \& 0xffff) 593 594 dst.y = f16\_to\_f32(src0.x >> 16) 595 596 dst.z = f16\_to\_f32(src0.x \& 0xffff) 597 598 dst.w = f16\_to\_f32(src0.x >> 16) 599 600.. note:: 601 602 Considered for removal. 603 604.. opcode:: UP2US - Unpack Two Unsigned 16-Bit Scalars 605 606 TBD 607 608.. note:: 609 610 Considered for removal. 611 612.. opcode:: UP4B - Unpack Four Signed 8-Bit Values 613 614 TBD 615 616.. note:: 617 618 Considered for removal. 619 620.. opcode:: UP4UB - Unpack Four Unsigned 8-Bit Scalars 621 622 TBD 623 624.. note:: 625 626 Considered for removal. 627 628 629.. opcode:: ARR - Address Register Load With Round 630 631.. math:: 632 633 dst.x = (int) round(src.x) 634 635 dst.y = (int) round(src.y) 636 637 dst.z = (int) round(src.z) 638 639 dst.w = (int) round(src.w) 640 641 642.. opcode:: SSG - Set Sign 643 644.. math:: 645 646 dst.x = (src.x > 0) ? 1 : (src.x < 0) ? -1 : 0 647 648 dst.y = (src.y > 0) ? 1 : (src.y < 0) ? -1 : 0 649 650 dst.z = (src.z > 0) ? 1 : (src.z < 0) ? -1 : 0 651 652 dst.w = (src.w > 0) ? 1 : (src.w < 0) ? -1 : 0 653 654 655.. opcode:: CMP - Compare 656 657.. math:: 658 659 dst.x = (src0.x < 0) ? src1.x : src2.x 660 661 dst.y = (src0.y < 0) ? src1.y : src2.y 662 663 dst.z = (src0.z < 0) ? src1.z : src2.z 664 665 dst.w = (src0.w < 0) ? src1.w : src2.w 666 667 668.. opcode:: KILL_IF - Conditional Discard 669 670 Conditional discard. Allowed in fragment shaders only. 671 672.. math:: 673 674 if (src.x < 0 || src.y < 0 || src.z < 0 || src.w < 0) 675 discard 676 endif 677 678 679.. opcode:: KILL - Discard 680 681 Unconditional discard. Allowed in fragment shaders only. 682 683 684.. opcode:: DEMOTE - Demote Invocation to a Helper 685 686 This demotes the current invocation to a helper, but continues 687 execution (while KILL may or may not terminate the 688 invocation). After this runs, all the usual helper invocation rules 689 apply about discarding buffer and render target writes. This is 690 useful for having accurate derivatives in the other invocations 691 which have not been demoted. 692 693 Allowed in fragment shaders only. 694 695 696.. opcode:: READ_HELPER - Reads Invocation Helper Status 697 698 This is identical to ``TGSI_SEMANTIC_HELPER_INVOCATION``, except 699 this will read the current value, which might change as a result of 700 a ``DEMOTE`` instruction. 701 702 Allowed in fragment shaders only. 703 704 705.. opcode:: TXB - Texture Lookup With Bias 706 707 for cube map array textures and shadow cube maps, the bias value 708 cannot be passed in src0.w, and TXB2 must be used instead. 709 710 if the target is a shadow texture, the reference value is always 711 in src.z (this prevents shadow 3d and shadow 2d arrays from 712 using this instruction, but this is not needed). 713 714.. math:: 715 716 coord.x = src0.x 717 718 coord.y = src0.y 719 720 coord.z = src0.z 721 722 coord.w = none 723 724 bias = src0.w 725 726 unit = src1 727 728 dst = texture\_sample(unit, coord, bias) 729 730 731.. opcode:: TXB2 - Texture Lookup With Bias (some cube maps only) 732 733 this is the same as TXB, but uses another reg to encode the 734 lod bias value for cube map arrays and shadow cube maps. 735 Presumably shadow 2d arrays and shadow 3d targets could use 736 this encoding too, but this is not legal. 737 738 if the target is a shadow cube map array, the reference value is in 739 src1.y. 740 741.. math:: 742 743 coord = src0 744 745 bias = src1.x 746 747 unit = src2 748 749 dst = texture\_sample(unit, coord, bias) 750 751 752.. opcode:: DIV - Divide 753 754.. math:: 755 756 dst.x = \frac{src0.x}{src1.x} 757 758 dst.y = \frac{src0.y}{src1.y} 759 760 dst.z = \frac{src0.z}{src1.z} 761 762 dst.w = \frac{src0.w}{src1.w} 763 764 765.. opcode:: DP2 - 2-component Dot Product 766 767This instruction replicates its result. 768 769.. math:: 770 771 dst = src0.x \times src1.x + src0.y \times src1.y 772 773 774.. opcode:: TEX_LZ - Texture Lookup With LOD = 0 775 776 This is the same as TXL with LOD = 0. Like every texture opcode, it obeys 777 pipe_sampler_view::u.tex.first_level and pipe_sampler_state::min_lod. 778 There is no way to override those two in shaders. 779 780.. math:: 781 782 coord.x = src0.x 783 784 coord.y = src0.y 785 786 coord.z = src0.z 787 788 coord.w = none 789 790 lod = 0 791 792 unit = src1 793 794 dst = texture\_sample(unit, coord, lod) 795 796 797.. opcode:: TXL - Texture Lookup With explicit LOD 798 799 for cube map array textures, the explicit lod value 800 cannot be passed in src0.w, and TXL2 must be used instead. 801 802 if the target is a shadow texture, the reference value is always 803 in src.z (this prevents shadow 3d / 2d array / cube targets from 804 using this instruction, but this is not needed). 805 806.. math:: 807 808 coord.x = src0.x 809 810 coord.y = src0.y 811 812 coord.z = src0.z 813 814 coord.w = none 815 816 lod = src0.w 817 818 unit = src1 819 820 dst = texture\_sample(unit, coord, lod) 821 822 823.. opcode:: TXL2 - Texture Lookup With explicit LOD (for cube map arrays only) 824 825 this is the same as TXL, but uses another reg to encode the 826 explicit lod value. 827 Presumably shadow 3d / 2d array / cube targets could use 828 this encoding too, but this is not legal. 829 830 if the target is a shadow cube map array, the reference value is in 831 src1.y. 832 833.. math:: 834 835 coord = src0 836 837 lod = src1.x 838 839 unit = src2 840 841 dst = texture\_sample(unit, coord, lod) 842 843 844Compute ISA 845^^^^^^^^^^^^^^^^^^^^^^^^ 846 847These opcodes are primarily provided for special-use computational shaders. 848Support for these opcodes indicated by a special pipe capability bit (TBD). 849 850XXX doesn't look like most of the opcodes really belong here. 851 852.. opcode:: CEIL - Ceiling 853 854.. math:: 855 856 dst.x = \lceil src.x\rceil 857 858 dst.y = \lceil src.y\rceil 859 860 dst.z = \lceil src.z\rceil 861 862 dst.w = \lceil src.w\rceil 863 864 865.. opcode:: TRUNC - Truncate 866 867.. math:: 868 869 dst.x = trunc(src.x) 870 871 dst.y = trunc(src.y) 872 873 dst.z = trunc(src.z) 874 875 dst.w = trunc(src.w) 876 877 878.. opcode:: MOD - Modulus 879 880.. math:: 881 882 dst.x = src0.x \bmod src1.x 883 884 dst.y = src0.y \bmod src1.y 885 886 dst.z = src0.z \bmod src1.z 887 888 dst.w = src0.w \bmod src1.w 889 890 891.. opcode:: UARL - Integer Address Register Load 892 893 Moves the contents of the source register, assumed to be an integer, into the 894 destination register, which is assumed to be an address (ADDR) register. 895 896 897.. opcode:: TXF - Texel Fetch 898 899 As per NV_gpu_shader4, extract a single texel from a specified texture 900 image or PIPE_BUFFER resource. The source sampler may not be a CUBE or 901 SHADOW. src 0 is a 902 four-component signed integer vector used to identify the single texel 903 accessed. 3 components + level. If the texture is multisampled, then 904 the fourth component indicates the sample, not the mipmap level. 905 Just like texture instructions, an optional 906 offset vector is provided, which is subject to various driver restrictions 907 (regarding range, source of offsets). This instruction ignores the sampler 908 state. 909 910 TXF(uint_vec coord, int_vec offset). 911 912 913.. opcode:: TXQ - Texture Size Query 914 915 As per NV_gpu_program4, retrieve the dimensions of the texture depending on 916 the target. For 1D (width), 2D/RECT/CUBE (width, height), 3D (width, height, 917 depth), 1D array (width, layers), 2D array (width, height, layers). 918 Also return the number of accessible levels (last_level - first_level + 1) 919 in W. 920 921 For components which don't return a resource dimension, their value 922 is undefined. 923 924.. math:: 925 926 lod = src0.x 927 928 dst.x = texture\_width(unit, lod) 929 930 dst.y = texture\_height(unit, lod) 931 932 dst.z = texture\_depth(unit, lod) 933 934 dst.w = texture\_levels(unit) 935 936 937.. opcode:: TXQS - Texture Samples Query 938 939 This retrieves the number of samples in the texture, and stores it 940 into the x component as an unsigned integer. The other components are 941 undefined. If the texture is not multisampled, this function returns 942 (1, undef, undef, undef). 943 944.. math:: 945 946 dst.x = texture\_samples(unit) 947 948 949.. opcode:: TG4 - Texture Gather 950 951 As per ARB_texture_gather, gathers the four texels to be used in a bi-linear 952 filtering operation and packs them into a single register. Only works with 953 2D, 2D array, cubemaps, and cubemaps arrays. For 2D textures, only the 954 addressing modes of the sampler and the top level of any mip pyramid are 955 used. Set W to zero. It behaves like the TEX instruction, but a filtered 956 sample is not generated. The four samples that contribute to filtering are 957 placed into xyzw in clockwise order, starting with the (u,v) texture 958 coordinate delta at the following locations (-, +), (+, +), (+, -), (-, -), 959 where the magnitude of the deltas are half a texel. 960 961 PIPE_CAP_TEXTURE_SM5 enhances this instruction to support shadow per-sample 962 depth compares, single component selection, and a non-constant offset. It 963 doesn't allow support for the GL independent offset to get i0,j0. This would 964 require another CAP is hw can do it natively. For now we lower that before 965 TGSI. 966 967 PIPE_CAP_TGSI_TG4_COMPONENT_IN_SWIZZLE changes the encoding so that component 968 is stored in the sampler source swizzle x. 969 970.. math:: 971 972 coord = src0 973 974 (without TGSI_TG4_COMPONENT_IN_SWIZZLE) 975 component = src1 976 977 dst = texture\_gather4 (unit, coord, component) 978 979 (with TGSI_TG4_COMPONENT_IN_SWIZZLE) 980 dst = texture\_gather4 (unit, coord) 981 component is encoded in sampler swizzle. 982 983(with SM5 - cube array shadow) 984 985.. math:: 986 987 coord = src0 988 989 compare = src1 990 991 dst = texture\_gather (uint, coord, compare) 992 993.. opcode:: LODQ - level of detail query 994 995 Compute the LOD information that the texture pipe would use to access the 996 texture. The Y component contains the computed LOD lambda_prime. The X 997 component contains the LOD that will be accessed, based on min/max lod's 998 and mipmap filters. 999 1000.. math:: 1001 1002 coord = src0 1003 1004 dst.xy = lodq(uint, coord); 1005 1006.. opcode:: CLOCK - retrieve the current shader time 1007 1008 Invoking this instruction multiple times in the same shader should 1009 cause monotonically increasing values to be returned. The values 1010 are implicitly 64-bit, so if fewer than 64 bits of precision are 1011 available, to provide expected wraparound semantics, the value 1012 should be shifted up so that the most significant bit of the time 1013 is the most significant bit of the 64-bit value. 1014 1015.. math:: 1016 1017 dst.xy = clock() 1018 1019 1020Integer ISA 1021^^^^^^^^^^^^^^^^^^^^^^^^ 1022These opcodes are used for integer operations. 1023Support for these opcodes indicated by PIPE_SHADER_CAP_INTEGERS (all of them?) 1024 1025 1026.. opcode:: I2F - Signed Integer To Float 1027 1028 Rounding is unspecified (round to nearest even suggested). 1029 1030.. math:: 1031 1032 dst.x = (float) src.x 1033 1034 dst.y = (float) src.y 1035 1036 dst.z = (float) src.z 1037 1038 dst.w = (float) src.w 1039 1040 1041.. opcode:: U2F - Unsigned Integer To Float 1042 1043 Rounding is unspecified (round to nearest even suggested). 1044 1045.. math:: 1046 1047 dst.x = (float) src.x 1048 1049 dst.y = (float) src.y 1050 1051 dst.z = (float) src.z 1052 1053 dst.w = (float) src.w 1054 1055 1056.. opcode:: F2I - Float to Signed Integer 1057 1058 Rounding is towards zero (truncate). 1059 Values outside signed range (including NaNs) produce undefined results. 1060 1061.. math:: 1062 1063 dst.x = (int) src.x 1064 1065 dst.y = (int) src.y 1066 1067 dst.z = (int) src.z 1068 1069 dst.w = (int) src.w 1070 1071 1072.. opcode:: F2U - Float to Unsigned Integer 1073 1074 Rounding is towards zero (truncate). 1075 Values outside unsigned range (including NaNs) produce undefined results. 1076 1077.. math:: 1078 1079 dst.x = (unsigned) src.x 1080 1081 dst.y = (unsigned) src.y 1082 1083 dst.z = (unsigned) src.z 1084 1085 dst.w = (unsigned) src.w 1086 1087 1088.. opcode:: UADD - Integer Add 1089 1090 This instruction works the same for signed and unsigned integers. 1091 The low 32bit of the result is returned. 1092 1093.. math:: 1094 1095 dst.x = src0.x + src1.x 1096 1097 dst.y = src0.y + src1.y 1098 1099 dst.z = src0.z + src1.z 1100 1101 dst.w = src0.w + src1.w 1102 1103 1104.. opcode:: UMAD - Integer Multiply And Add 1105 1106 This instruction works the same for signed and unsigned integers. 1107 The multiplication returns the low 32bit (as does the result itself). 1108 1109.. math:: 1110 1111 dst.x = src0.x \times src1.x + src2.x 1112 1113 dst.y = src0.y \times src1.y + src2.y 1114 1115 dst.z = src0.z \times src1.z + src2.z 1116 1117 dst.w = src0.w \times src1.w + src2.w 1118 1119 1120.. opcode:: UMUL - Integer Multiply 1121 1122 This instruction works the same for signed and unsigned integers. 1123 The low 32bit of the result is returned. 1124 1125.. math:: 1126 1127 dst.x = src0.x \times src1.x 1128 1129 dst.y = src0.y \times src1.y 1130 1131 dst.z = src0.z \times src1.z 1132 1133 dst.w = src0.w \times src1.w 1134 1135 1136.. opcode:: IMUL_HI - Signed Integer Multiply High Bits 1137 1138 The high 32bits of the multiplication of 2 signed integers are returned. 1139 1140.. math:: 1141 1142 dst.x = (src0.x \times src1.x) >> 32 1143 1144 dst.y = (src0.y \times src1.y) >> 32 1145 1146 dst.z = (src0.z \times src1.z) >> 32 1147 1148 dst.w = (src0.w \times src1.w) >> 32 1149 1150 1151.. opcode:: UMUL_HI - Unsigned Integer Multiply High Bits 1152 1153 The high 32bits of the multiplication of 2 unsigned integers are returned. 1154 1155.. math:: 1156 1157 dst.x = (src0.x \times src1.x) >> 32 1158 1159 dst.y = (src0.y \times src1.y) >> 32 1160 1161 dst.z = (src0.z \times src1.z) >> 32 1162 1163 dst.w = (src0.w \times src1.w) >> 32 1164 1165 1166.. opcode:: IDIV - Signed Integer Division 1167 1168 TBD: behavior for division by zero. 1169 1170.. math:: 1171 1172 dst.x = \frac{src0.x}{src1.x} 1173 1174 dst.y = \frac{src0.y}{src1.y} 1175 1176 dst.z = \frac{src0.z}{src1.z} 1177 1178 dst.w = \frac{src0.w}{src1.w} 1179 1180 1181.. opcode:: UDIV - Unsigned Integer Division 1182 1183 For division by zero, 0xffffffff is returned. 1184 1185.. math:: 1186 1187 dst.x = \frac{src0.x}{src1.x} 1188 1189 dst.y = \frac{src0.y}{src1.y} 1190 1191 dst.z = \frac{src0.z}{src1.z} 1192 1193 dst.w = \frac{src0.w}{src1.w} 1194 1195 1196.. opcode:: UMOD - Unsigned Integer Remainder 1197 1198 If second arg is zero, 0xffffffff is returned. 1199 1200.. math:: 1201 1202 dst.x = src0.x \bmod src1.x 1203 1204 dst.y = src0.y \bmod src1.y 1205 1206 dst.z = src0.z \bmod src1.z 1207 1208 dst.w = src0.w \bmod src1.w 1209 1210 1211.. opcode:: NOT - Bitwise Not 1212 1213.. math:: 1214 1215 dst.x = \sim src.x 1216 1217 dst.y = \sim src.y 1218 1219 dst.z = \sim src.z 1220 1221 dst.w = \sim src.w 1222 1223 1224.. opcode:: AND - Bitwise And 1225 1226.. math:: 1227 1228 dst.x = src0.x \& src1.x 1229 1230 dst.y = src0.y \& src1.y 1231 1232 dst.z = src0.z \& src1.z 1233 1234 dst.w = src0.w \& src1.w 1235 1236 1237.. opcode:: OR - Bitwise Or 1238 1239.. math:: 1240 1241 dst.x = src0.x | src1.x 1242 1243 dst.y = src0.y | src1.y 1244 1245 dst.z = src0.z | src1.z 1246 1247 dst.w = src0.w | src1.w 1248 1249 1250.. opcode:: XOR - Bitwise Xor 1251 1252.. math:: 1253 1254 dst.x = src0.x \oplus src1.x 1255 1256 dst.y = src0.y \oplus src1.y 1257 1258 dst.z = src0.z \oplus src1.z 1259 1260 dst.w = src0.w \oplus src1.w 1261 1262 1263.. opcode:: IMAX - Maximum of Signed Integers 1264 1265.. math:: 1266 1267 dst.x = max(src0.x, src1.x) 1268 1269 dst.y = max(src0.y, src1.y) 1270 1271 dst.z = max(src0.z, src1.z) 1272 1273 dst.w = max(src0.w, src1.w) 1274 1275 1276.. opcode:: UMAX - Maximum of Unsigned Integers 1277 1278.. math:: 1279 1280 dst.x = max(src0.x, src1.x) 1281 1282 dst.y = max(src0.y, src1.y) 1283 1284 dst.z = max(src0.z, src1.z) 1285 1286 dst.w = max(src0.w, src1.w) 1287 1288 1289.. opcode:: IMIN - Minimum of Signed Integers 1290 1291.. math:: 1292 1293 dst.x = min(src0.x, src1.x) 1294 1295 dst.y = min(src0.y, src1.y) 1296 1297 dst.z = min(src0.z, src1.z) 1298 1299 dst.w = min(src0.w, src1.w) 1300 1301 1302.. opcode:: UMIN - Minimum of Unsigned Integers 1303 1304.. math:: 1305 1306 dst.x = min(src0.x, src1.x) 1307 1308 dst.y = min(src0.y, src1.y) 1309 1310 dst.z = min(src0.z, src1.z) 1311 1312 dst.w = min(src0.w, src1.w) 1313 1314 1315.. opcode:: SHL - Shift Left 1316 1317 The shift count is masked with 0x1f before the shift is applied. 1318 1319.. math:: 1320 1321 dst.x = src0.x << (0x1f \& src1.x) 1322 1323 dst.y = src0.y << (0x1f \& src1.y) 1324 1325 dst.z = src0.z << (0x1f \& src1.z) 1326 1327 dst.w = src0.w << (0x1f \& src1.w) 1328 1329 1330.. opcode:: ISHR - Arithmetic Shift Right (of Signed Integer) 1331 1332 The shift count is masked with 0x1f before the shift is applied. 1333 1334.. math:: 1335 1336 dst.x = src0.x >> (0x1f \& src1.x) 1337 1338 dst.y = src0.y >> (0x1f \& src1.y) 1339 1340 dst.z = src0.z >> (0x1f \& src1.z) 1341 1342 dst.w = src0.w >> (0x1f \& src1.w) 1343 1344 1345.. opcode:: USHR - Logical Shift Right 1346 1347 The shift count is masked with 0x1f before the shift is applied. 1348 1349.. math:: 1350 1351 dst.x = src0.x >> (unsigned) (0x1f \& src1.x) 1352 1353 dst.y = src0.y >> (unsigned) (0x1f \& src1.y) 1354 1355 dst.z = src0.z >> (unsigned) (0x1f \& src1.z) 1356 1357 dst.w = src0.w >> (unsigned) (0x1f \& src1.w) 1358 1359 1360.. opcode:: UCMP - Integer Conditional Move 1361 1362.. math:: 1363 1364 dst.x = src0.x ? src1.x : src2.x 1365 1366 dst.y = src0.y ? src1.y : src2.y 1367 1368 dst.z = src0.z ? src1.z : src2.z 1369 1370 dst.w = src0.w ? src1.w : src2.w 1371 1372 1373 1374.. opcode:: ISSG - Integer Set Sign 1375 1376.. math:: 1377 1378 dst.x = (src0.x < 0) ? -1 : (src0.x > 0) ? 1 : 0 1379 1380 dst.y = (src0.y < 0) ? -1 : (src0.y > 0) ? 1 : 0 1381 1382 dst.z = (src0.z < 0) ? -1 : (src0.z > 0) ? 1 : 0 1383 1384 dst.w = (src0.w < 0) ? -1 : (src0.w > 0) ? 1 : 0 1385 1386 1387 1388.. opcode:: FSLT - Float Set On Less Than (ordered) 1389 1390 Same comparison as SLT but returns integer instead of 1.0/0.0 float 1391 1392.. math:: 1393 1394 dst.x = (src0.x < src1.x) ? \sim 0 : 0 1395 1396 dst.y = (src0.y < src1.y) ? \sim 0 : 0 1397 1398 dst.z = (src0.z < src1.z) ? \sim 0 : 0 1399 1400 dst.w = (src0.w < src1.w) ? \sim 0 : 0 1401 1402 1403.. opcode:: ISLT - Signed Integer Set On Less Than 1404 1405.. math:: 1406 1407 dst.x = (src0.x < src1.x) ? \sim 0 : 0 1408 1409 dst.y = (src0.y < src1.y) ? \sim 0 : 0 1410 1411 dst.z = (src0.z < src1.z) ? \sim 0 : 0 1412 1413 dst.w = (src0.w < src1.w) ? \sim 0 : 0 1414 1415 1416.. opcode:: USLT - Unsigned Integer Set On Less Than 1417 1418.. math:: 1419 1420 dst.x = (src0.x < src1.x) ? \sim 0 : 0 1421 1422 dst.y = (src0.y < src1.y) ? \sim 0 : 0 1423 1424 dst.z = (src0.z < src1.z) ? \sim 0 : 0 1425 1426 dst.w = (src0.w < src1.w) ? \sim 0 : 0 1427 1428 1429.. opcode:: FSGE - Float Set On Greater Equal Than (ordered) 1430 1431 Same comparison as SGE but returns integer instead of 1.0/0.0 float 1432 1433.. math:: 1434 1435 dst.x = (src0.x >= src1.x) ? \sim 0 : 0 1436 1437 dst.y = (src0.y >= src1.y) ? \sim 0 : 0 1438 1439 dst.z = (src0.z >= src1.z) ? \sim 0 : 0 1440 1441 dst.w = (src0.w >= src1.w) ? \sim 0 : 0 1442 1443 1444.. opcode:: ISGE - Signed Integer Set On Greater Equal Than 1445 1446.. math:: 1447 1448 dst.x = (src0.x >= src1.x) ? \sim 0 : 0 1449 1450 dst.y = (src0.y >= src1.y) ? \sim 0 : 0 1451 1452 dst.z = (src0.z >= src1.z) ? \sim 0 : 0 1453 1454 dst.w = (src0.w >= src1.w) ? \sim 0 : 0 1455 1456 1457.. opcode:: USGE - Unsigned Integer Set On Greater Equal Than 1458 1459.. math:: 1460 1461 dst.x = (src0.x >= src1.x) ? \sim 0 : 0 1462 1463 dst.y = (src0.y >= src1.y) ? \sim 0 : 0 1464 1465 dst.z = (src0.z >= src1.z) ? \sim 0 : 0 1466 1467 dst.w = (src0.w >= src1.w) ? \sim 0 : 0 1468 1469 1470.. opcode:: FSEQ - Float Set On Equal (ordered) 1471 1472 Same comparison as SEQ but returns integer instead of 1.0/0.0 float 1473 1474.. math:: 1475 1476 dst.x = (src0.x == src1.x) ? \sim 0 : 0 1477 1478 dst.y = (src0.y == src1.y) ? \sim 0 : 0 1479 1480 dst.z = (src0.z == src1.z) ? \sim 0 : 0 1481 1482 dst.w = (src0.w == src1.w) ? \sim 0 : 0 1483 1484 1485.. opcode:: USEQ - Integer Set On Equal 1486 1487.. math:: 1488 1489 dst.x = (src0.x == src1.x) ? \sim 0 : 0 1490 1491 dst.y = (src0.y == src1.y) ? \sim 0 : 0 1492 1493 dst.z = (src0.z == src1.z) ? \sim 0 : 0 1494 1495 dst.w = (src0.w == src1.w) ? \sim 0 : 0 1496 1497 1498.. opcode:: FSNE - Float Set On Not Equal (unordered) 1499 1500 Same comparison as SNE but returns integer instead of 1.0/0.0 float 1501 1502.. math:: 1503 1504 dst.x = (src0.x != src1.x) ? \sim 0 : 0 1505 1506 dst.y = (src0.y != src1.y) ? \sim 0 : 0 1507 1508 dst.z = (src0.z != src1.z) ? \sim 0 : 0 1509 1510 dst.w = (src0.w != src1.w) ? \sim 0 : 0 1511 1512 1513.. opcode:: USNE - Integer Set On Not Equal 1514 1515.. math:: 1516 1517 dst.x = (src0.x != src1.x) ? \sim 0 : 0 1518 1519 dst.y = (src0.y != src1.y) ? \sim 0 : 0 1520 1521 dst.z = (src0.z != src1.z) ? \sim 0 : 0 1522 1523 dst.w = (src0.w != src1.w) ? \sim 0 : 0 1524 1525 1526.. opcode:: INEG - Integer Negate 1527 1528 Two's complement. 1529 1530.. math:: 1531 1532 dst.x = -src.x 1533 1534 dst.y = -src.y 1535 1536 dst.z = -src.z 1537 1538 dst.w = -src.w 1539 1540 1541.. opcode:: IABS - Integer Absolute Value 1542 1543.. math:: 1544 1545 dst.x = |src.x| 1546 1547 dst.y = |src.y| 1548 1549 dst.z = |src.z| 1550 1551 dst.w = |src.w| 1552 1553Bitwise ISA 1554^^^^^^^^^^^ 1555These opcodes are used for bit-level manipulation of integers. 1556 1557.. opcode:: IBFE - Signed Bitfield Extract 1558 1559 Like GLSL bitfieldExtract. Extracts a set of bits from the input, and 1560 sign-extends them if the high bit of the extracted window is set. 1561 1562 Pseudocode:: 1563 1564 def ibfe(value, offset, bits): 1565 if offset < 0 or bits < 0 or offset + bits > 32: 1566 return undefined 1567 if bits == 0: return 0 1568 # Note: >> sign-extends 1569 return (value << (32 - offset - bits)) >> (32 - bits) 1570 1571.. opcode:: UBFE - Unsigned Bitfield Extract 1572 1573 Like GLSL bitfieldExtract. Extracts a set of bits from the input, without 1574 any sign-extension. 1575 1576 Pseudocode:: 1577 1578 def ubfe(value, offset, bits): 1579 if offset < 0 or bits < 0 or offset + bits > 32: 1580 return undefined 1581 if bits == 0: return 0 1582 # Note: >> does not sign-extend 1583 return (value << (32 - offset - bits)) >> (32 - bits) 1584 1585.. opcode:: BFI - Bitfield Insert 1586 1587 Like GLSL bitfieldInsert. Replaces a bit region of 'base' with the low bits 1588 of 'insert'. 1589 1590 Pseudocode:: 1591 1592 def bfi(base, insert, offset, bits): 1593 if offset < 0 or bits < 0 or offset + bits > 32: 1594 return undefined 1595 # << defined such that mask == ~0 when bits == 32, offset == 0 1596 mask = ((1 << bits) - 1) << offset 1597 return ((insert << offset) & mask) | (base & ~mask) 1598 1599.. opcode:: BREV - Bitfield Reverse 1600 1601 See SM5 instruction BFREV. Reverses the bits of the argument. 1602 1603.. opcode:: POPC - Population Count 1604 1605 See SM5 instruction COUNTBITS. Counts the number of set bits in the argument. 1606 1607.. opcode:: LSB - Index of lowest set bit 1608 1609 See SM5 instruction FIRSTBIT_LO. Computes the 0-based index of the first set 1610 bit of the argument. Returns -1 if none are set. 1611 1612.. opcode:: IMSB - Index of highest non-sign bit 1613 1614 See SM5 instruction FIRSTBIT_SHI. Computes the 0-based index of the highest 1615 non-sign bit of the argument (i.e. highest 0 bit for negative numbers, 1616 highest 1 bit for positive numbers). Returns -1 if all bits are the same 1617 (i.e. for inputs 0 and -1). 1618 1619.. opcode:: UMSB - Index of highest set bit 1620 1621 See SM5 instruction FIRSTBIT_HI. Computes the 0-based index of the highest 1622 set bit of the argument. Returns -1 if none are set. 1623 1624Geometry ISA 1625^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 1626 1627These opcodes are only supported in geometry shaders; they have no meaning 1628in any other type of shader. 1629 1630.. opcode:: EMIT - Emit 1631 1632 Generate a new vertex for the current primitive into the specified vertex 1633 stream using the values in the output registers. 1634 1635 1636.. opcode:: ENDPRIM - End Primitive 1637 1638 Complete the current primitive in the specified vertex stream (consisting of 1639 the emitted vertices), and start a new one. 1640 1641 1642GLSL ISA 1643^^^^^^^^^^ 1644 1645These opcodes are part of :term:`GLSL`'s opcode set. Support for these 1646opcodes is determined by a special capability bit, ``GLSL``. 1647Some require glsl version 1.30 (UIF/SWITCH/CASE/DEFAULT/ENDSWITCH). 1648 1649.. opcode:: CAL - Subroutine Call 1650 1651 push(pc) 1652 pc = target 1653 1654 1655.. opcode:: RET - Subroutine Call Return 1656 1657 pc = pop() 1658 1659 1660.. opcode:: CONT - Continue 1661 1662 Unconditionally moves the point of execution to the instruction after the 1663 last bgnloop. The instruction must appear within a bgnloop/endloop. 1664 1665.. note:: 1666 1667 Support for CONT is determined by a special capability bit, 1668 ``TGSI_CONT_SUPPORTED``. See :ref:`Screen` for more information. 1669 1670 1671.. opcode:: BGNLOOP - Begin a Loop 1672 1673 Start a loop. Must have a matching endloop. 1674 1675 1676.. opcode:: BGNSUB - Begin Subroutine 1677 1678 Starts definition of a subroutine. Must have a matching endsub. 1679 1680 1681.. opcode:: ENDLOOP - End a Loop 1682 1683 End a loop started with bgnloop. 1684 1685 1686.. opcode:: ENDSUB - End Subroutine 1687 1688 Ends definition of a subroutine. 1689 1690 1691.. opcode:: NOP - No Operation 1692 1693 Do nothing. 1694 1695 1696.. opcode:: BRK - Break 1697 1698 Unconditionally moves the point of execution to the instruction after the 1699 next endloop or endswitch. The instruction must appear within a loop/endloop 1700 or switch/endswitch. 1701 1702 1703.. opcode:: IF - Float If 1704 1705 Start an IF ... ELSE .. ENDIF block. Condition evaluates to true if 1706 1707 src0.x != 0.0 1708 1709 where src0.x is interpreted as a floating point register. 1710 1711 1712.. opcode:: UIF - Bitwise If 1713 1714 Start an UIF ... ELSE .. ENDIF block. Condition evaluates to true if 1715 1716 src0.x != 0 1717 1718 where src0.x is interpreted as an integer register. 1719 1720 1721.. opcode:: ELSE - Else 1722 1723 Starts an else block, after an IF or UIF statement. 1724 1725 1726.. opcode:: ENDIF - End If 1727 1728 Ends an IF or UIF block. 1729 1730 1731.. opcode:: SWITCH - Switch 1732 1733 Starts a C-style switch expression. The switch consists of one or multiple 1734 CASE statements, and at most one DEFAULT statement. Execution of a statement 1735 ends when a BRK is hit, but just like in C falling through to other cases 1736 without a break is allowed. Similarly, DEFAULT label is allowed anywhere not 1737 just as last statement, and fallthrough is allowed into/from it. 1738 CASE src arguments are evaluated at bit level against the SWITCH src argument. 1739 1740 Example:: 1741 1742 SWITCH src[0].x 1743 CASE src[0].x 1744 (some instructions here) 1745 (optional BRK here) 1746 DEFAULT 1747 (some instructions here) 1748 (optional BRK here) 1749 CASE src[0].x 1750 (some instructions here) 1751 (optional BRK here) 1752 ENDSWITCH 1753 1754 1755.. opcode:: CASE - Switch case 1756 1757 This represents a switch case label. The src arg must be an integer immediate. 1758 1759 1760.. opcode:: DEFAULT - Switch default 1761 1762 This represents the default case in the switch, which is taken if no other 1763 case matches. 1764 1765 1766.. opcode:: ENDSWITCH - End of switch 1767 1768 Ends a switch expression. 1769 1770 1771Interpolation ISA 1772^^^^^^^^^^^^^^^^^ 1773 1774The interpolation instructions allow an input to be interpolated in a 1775different way than its declaration. This corresponds to the GLSL 4.00 1776interpolateAt* functions. The first argument of each of these must come from 1777``TGSI_FILE_INPUT``. 1778 1779.. opcode:: INTERP_CENTROID - Interpolate at the centroid 1780 1781 Interpolates the varying specified by src0 at the centroid 1782 1783.. opcode:: INTERP_SAMPLE - Interpolate at the specified sample 1784 1785 Interpolates the varying specified by src0 at the sample id specified by 1786 src1.x (interpreted as an integer) 1787 1788.. opcode:: INTERP_OFFSET - Interpolate at the specified offset 1789 1790 Interpolates the varying specified by src0 at the offset src1.xy from the 1791 pixel center (interpreted as floats) 1792 1793 1794.. _doubleopcodes: 1795 1796Double ISA 1797^^^^^^^^^^^^^^^ 1798 1799The double-precision opcodes reinterpret four-component vectors into 1800two-component vectors with doubled precision in each component. 1801 1802.. opcode:: DABS - Absolute 1803 1804.. math:: 1805 1806 dst.xy = |src0.xy| 1807 1808 dst.zw = |src0.zw| 1809 1810.. opcode:: DADD - Add 1811 1812.. math:: 1813 1814 dst.xy = src0.xy + src1.xy 1815 1816 dst.zw = src0.zw + src1.zw 1817 1818.. opcode:: DSEQ - Set on Equal 1819 1820.. math:: 1821 1822 dst.x = src0.xy == src1.xy ? \sim 0 : 0 1823 1824 dst.z = src0.zw == src1.zw ? \sim 0 : 0 1825 1826.. opcode:: DSNE - Set on Not Equal 1827 1828.. math:: 1829 1830 dst.x = src0.xy != src1.xy ? \sim 0 : 0 1831 1832 dst.z = src0.zw != src1.zw ? \sim 0 : 0 1833 1834.. opcode:: DSLT - Set on Less than 1835 1836.. math:: 1837 1838 dst.x = src0.xy < src1.xy ? \sim 0 : 0 1839 1840 dst.z = src0.zw < src1.zw ? \sim 0 : 0 1841 1842.. opcode:: DSGE - Set on Greater equal 1843 1844.. math:: 1845 1846 dst.x = src0.xy >= src1.xy ? \sim 0 : 0 1847 1848 dst.z = src0.zw >= src1.zw ? \sim 0 : 0 1849 1850.. opcode:: DFRAC - Fraction 1851 1852.. math:: 1853 1854 dst.xy = src.xy - \lfloor src.xy\rfloor 1855 1856 dst.zw = src.zw - \lfloor src.zw\rfloor 1857 1858.. opcode:: DTRUNC - Truncate 1859 1860.. math:: 1861 1862 dst.xy = trunc(src.xy) 1863 1864 dst.zw = trunc(src.zw) 1865 1866.. opcode:: DCEIL - Ceiling 1867 1868.. math:: 1869 1870 dst.xy = \lceil src.xy\rceil 1871 1872 dst.zw = \lceil src.zw\rceil 1873 1874.. opcode:: DFLR - Floor 1875 1876.. math:: 1877 1878 dst.xy = \lfloor src.xy\rfloor 1879 1880 dst.zw = \lfloor src.zw\rfloor 1881 1882.. opcode:: DROUND - Fraction 1883 1884.. math:: 1885 1886 dst.xy = round(src.xy) 1887 1888 dst.zw = round(src.zw) 1889 1890.. opcode:: DSSG - Set Sign 1891 1892.. math:: 1893 1894 dst.xy = (src.xy > 0) ? 1.0 : (src.xy < 0) ? -1.0 : 0.0 1895 1896 dst.zw = (src.zw > 0) ? 1.0 : (src.zw < 0) ? -1.0 : 0.0 1897 1898.. opcode:: DFRACEXP - Convert Number to Fractional and Integral Components 1899 1900Like the ``frexp()`` routine in many math libraries, this opcode stores the 1901exponent of its source to ``dst0``, and the significand to ``dst1``, such that 1902:math:`dst1 \times 2^{dst0} = src` . The results are replicated across 1903channels. 1904 1905.. math:: 1906 1907 dst0.xy = dst.zw = frac(src.xy) 1908 1909 dst1 = frac(src.xy) 1910 1911 1912.. opcode:: DLDEXP - Multiply Number by Integral Power of 2 1913 1914This opcode is the inverse of :opcode:`DFRACEXP`. The second 1915source is an integer. 1916 1917.. math:: 1918 1919 dst.xy = src0.xy \times 2^{src1.x} 1920 1921 dst.zw = src0.zw \times 2^{src1.z} 1922 1923.. opcode:: DMIN - Minimum 1924 1925.. math:: 1926 1927 dst.xy = min(src0.xy, src1.xy) 1928 1929 dst.zw = min(src0.zw, src1.zw) 1930 1931.. opcode:: DMAX - Maximum 1932 1933.. math:: 1934 1935 dst.xy = max(src0.xy, src1.xy) 1936 1937 dst.zw = max(src0.zw, src1.zw) 1938 1939.. opcode:: DMUL - Multiply 1940 1941.. math:: 1942 1943 dst.xy = src0.xy \times src1.xy 1944 1945 dst.zw = src0.zw \times src1.zw 1946 1947 1948.. opcode:: DMAD - Multiply And Add 1949 1950.. math:: 1951 1952 dst.xy = src0.xy \times src1.xy + src2.xy 1953 1954 dst.zw = src0.zw \times src1.zw + src2.zw 1955 1956 1957.. opcode:: DFMA - Fused Multiply-Add 1958 1959Perform a * b + c with no intermediate rounding step. 1960 1961.. math:: 1962 1963 dst.xy = src0.xy \times src1.xy + src2.xy 1964 1965 dst.zw = src0.zw \times src1.zw + src2.zw 1966 1967 1968.. opcode:: DDIV - Divide 1969 1970.. math:: 1971 1972 dst.xy = \frac{src0.xy}{src1.xy} 1973 1974 dst.zw = \frac{src0.zw}{src1.zw} 1975 1976 1977.. opcode:: DRCP - Reciprocal 1978 1979.. math:: 1980 1981 dst.xy = \frac{1}{src.xy} 1982 1983 dst.zw = \frac{1}{src.zw} 1984 1985.. opcode:: DSQRT - Square Root 1986 1987.. math:: 1988 1989 dst.xy = \sqrt{src.xy} 1990 1991 dst.zw = \sqrt{src.zw} 1992 1993.. opcode:: DRSQ - Reciprocal Square Root 1994 1995.. math:: 1996 1997 dst.xy = \frac{1}{\sqrt{src.xy}} 1998 1999 dst.zw = \frac{1}{\sqrt{src.zw}} 2000 2001.. opcode:: F2D - Float to Double 2002 2003.. math:: 2004 2005 dst.xy = double(src0.x) 2006 2007 dst.zw = double(src0.y) 2008 2009.. opcode:: D2F - Double to Float 2010 2011.. math:: 2012 2013 dst.x = float(src0.xy) 2014 2015 dst.y = float(src0.zw) 2016 2017.. opcode:: I2D - Int to Double 2018 2019.. math:: 2020 2021 dst.xy = double(src0.x) 2022 2023 dst.zw = double(src0.y) 2024 2025.. opcode:: D2I - Double to Int 2026 2027.. math:: 2028 2029 dst.x = int(src0.xy) 2030 2031 dst.y = int(src0.zw) 2032 2033.. opcode:: U2D - Unsigned Int to Double 2034 2035.. math:: 2036 2037 dst.xy = double(src0.x) 2038 2039 dst.zw = double(src0.y) 2040 2041.. opcode:: D2U - Double to Unsigned Int 2042 2043.. math:: 2044 2045 dst.x = unsigned(src0.xy) 2046 2047 dst.y = unsigned(src0.zw) 2048 204964-bit Integer ISA 2050^^^^^^^^^^^^^^^^^^ 2051 2052The 64-bit integer opcodes reinterpret four-component vectors into 2053two-component vectors with 64-bits in each component. 2054 2055.. opcode:: I64ABS - 64-bit Integer Absolute Value 2056 2057.. math:: 2058 2059 dst.xy = |src0.xy| 2060 2061 dst.zw = |src0.zw| 2062 2063.. opcode:: I64NEG - 64-bit Integer Negate 2064 2065 Two's complement. 2066 2067.. math:: 2068 2069 dst.xy = -src.xy 2070 2071 dst.zw = -src.zw 2072 2073.. opcode:: I64SSG - 64-bit Integer Set Sign 2074 2075.. math:: 2076 2077 dst.xy = (src0.xy < 0) ? -1 : (src0.xy > 0) ? 1 : 0 2078 2079 dst.zw = (src0.zw < 0) ? -1 : (src0.zw > 0) ? 1 : 0 2080 2081.. opcode:: U64ADD - 64-bit Integer Add 2082 2083.. math:: 2084 2085 dst.xy = src0.xy + src1.xy 2086 2087 dst.zw = src0.zw + src1.zw 2088 2089.. opcode:: U64MUL - 64-bit Integer Multiply 2090 2091.. math:: 2092 2093 dst.xy = src0.xy * src1.xy 2094 2095 dst.zw = src0.zw * src1.zw 2096 2097.. opcode:: U64SEQ - 64-bit Integer Set on Equal 2098 2099.. math:: 2100 2101 dst.x = src0.xy == src1.xy ? \sim 0 : 0 2102 2103 dst.z = src0.zw == src1.zw ? \sim 0 : 0 2104 2105.. opcode:: U64SNE - 64-bit Integer Set on Not Equal 2106 2107.. math:: 2108 2109 dst.x = src0.xy != src1.xy ? \sim 0 : 0 2110 2111 dst.z = src0.zw != src1.zw ? \sim 0 : 0 2112 2113.. opcode:: U64SLT - 64-bit Unsigned Integer Set on Less Than 2114 2115.. math:: 2116 2117 dst.x = src0.xy < src1.xy ? \sim 0 : 0 2118 2119 dst.z = src0.zw < src1.zw ? \sim 0 : 0 2120 2121.. opcode:: U64SGE - 64-bit Unsigned Integer Set on Greater Equal 2122 2123.. math:: 2124 2125 dst.x = src0.xy >= src1.xy ? \sim 0 : 0 2126 2127 dst.z = src0.zw >= src1.zw ? \sim 0 : 0 2128 2129.. opcode:: I64SLT - 64-bit Signed Integer Set on Less Than 2130 2131.. math:: 2132 2133 dst.x = src0.xy < src1.xy ? \sim 0 : 0 2134 2135 dst.z = src0.zw < src1.zw ? \sim 0 : 0 2136 2137.. opcode:: I64SGE - 64-bit Signed Integer Set on Greater Equal 2138 2139.. math:: 2140 2141 dst.x = src0.xy >= src1.xy ? \sim 0 : 0 2142 2143 dst.z = src0.zw >= src1.zw ? \sim 0 : 0 2144 2145.. opcode:: I64MIN - Minimum of 64-bit Signed Integers 2146 2147.. math:: 2148 2149 dst.xy = min(src0.xy, src1.xy) 2150 2151 dst.zw = min(src0.zw, src1.zw) 2152 2153.. opcode:: U64MIN - Minimum of 64-bit Unsigned Integers 2154 2155.. math:: 2156 2157 dst.xy = min(src0.xy, src1.xy) 2158 2159 dst.zw = min(src0.zw, src1.zw) 2160 2161.. opcode:: I64MAX - Maximum of 64-bit Signed Integers 2162 2163.. math:: 2164 2165 dst.xy = max(src0.xy, src1.xy) 2166 2167 dst.zw = max(src0.zw, src1.zw) 2168 2169.. opcode:: U64MAX - Maximum of 64-bit Unsigned Integers 2170 2171.. math:: 2172 2173 dst.xy = max(src0.xy, src1.xy) 2174 2175 dst.zw = max(src0.zw, src1.zw) 2176 2177.. opcode:: U64SHL - Shift Left 64-bit Unsigned Integer 2178 2179 The shift count is masked with 0x3f before the shift is applied. 2180 2181.. math:: 2182 2183 dst.xy = src0.xy << (0x3f \& src1.x) 2184 2185 dst.zw = src0.zw << (0x3f \& src1.y) 2186 2187.. opcode:: I64SHR - Arithmetic Shift Right (of 64-bit Signed Integer) 2188 2189 The shift count is masked with 0x3f before the shift is applied. 2190 2191.. math:: 2192 2193 dst.xy = src0.xy >> (0x3f \& src1.x) 2194 2195 dst.zw = src0.zw >> (0x3f \& src1.y) 2196 2197.. opcode:: U64SHR - Logical Shift Right (of 64-bit Unsigned Integer) 2198 2199 The shift count is masked with 0x3f before the shift is applied. 2200 2201.. math:: 2202 2203 dst.xy = src0.xy >> (unsigned) (0x3f \& src1.x) 2204 2205 dst.zw = src0.zw >> (unsigned) (0x3f \& src1.y) 2206 2207.. opcode:: I64DIV - 64-bit Signed Integer Division 2208 2209.. math:: 2210 2211 dst.xy = \frac{src0.xy}{src1.xy} 2212 2213 dst.zw = \frac{src0.zw}{src1.zw} 2214 2215.. opcode:: U64DIV - 64-bit Unsigned Integer Division 2216 2217.. math:: 2218 2219 dst.xy = \frac{src0.xy}{src1.xy} 2220 2221 dst.zw = \frac{src0.zw}{src1.zw} 2222 2223.. opcode:: U64MOD - 64-bit Unsigned Integer Remainder 2224 2225.. math:: 2226 2227 dst.xy = src0.xy \bmod src1.xy 2228 2229 dst.zw = src0.zw \bmod src1.zw 2230 2231.. opcode:: I64MOD - 64-bit Signed Integer Remainder 2232 2233.. math:: 2234 2235 dst.xy = src0.xy \bmod src1.xy 2236 2237 dst.zw = src0.zw \bmod src1.zw 2238 2239.. opcode:: F2U64 - Float to 64-bit Unsigned Int 2240 2241.. math:: 2242 2243 dst.xy = (uint64_t) src0.x 2244 2245 dst.zw = (uint64_t) src0.y 2246 2247.. opcode:: F2I64 - Float to 64-bit Int 2248 2249.. math:: 2250 2251 dst.xy = (int64_t) src0.x 2252 2253 dst.zw = (int64_t) src0.y 2254 2255.. opcode:: U2I64 - Unsigned Integer to 64-bit Integer 2256 2257 This is a zero extension. 2258 2259.. math:: 2260 2261 dst.xy = (int64_t) src0.x 2262 2263 dst.zw = (int64_t) src0.y 2264 2265.. opcode:: I2I64 - Signed Integer to 64-bit Integer 2266 2267 This is a sign extension. 2268 2269.. math:: 2270 2271 dst.xy = (int64_t) src0.x 2272 2273 dst.zw = (int64_t) src0.y 2274 2275.. opcode:: D2U64 - Double to 64-bit Unsigned Int 2276 2277.. math:: 2278 2279 dst.xy = (uint64_t) src0.xy 2280 2281 dst.zw = (uint64_t) src0.zw 2282 2283.. opcode:: D2I64 - Double to 64-bit Int 2284 2285.. math:: 2286 2287 dst.xy = (int64_t) src0.xy 2288 2289 dst.zw = (int64_t) src0.zw 2290 2291.. opcode:: U642F - 64-bit unsigned integer to float 2292 2293.. math:: 2294 2295 dst.x = (float) src0.xy 2296 2297 dst.y = (float) src0.zw 2298 2299.. opcode:: I642F - 64-bit Int to Float 2300 2301.. math:: 2302 2303 dst.x = (float) src0.xy 2304 2305 dst.y = (float) src0.zw 2306 2307.. opcode:: U642D - 64-bit unsigned integer to double 2308 2309.. math:: 2310 2311 dst.xy = (double) src0.xy 2312 2313 dst.zw = (double) src0.zw 2314 2315.. opcode:: I642D - 64-bit Int to double 2316 2317.. math:: 2318 2319 dst.xy = (double) src0.xy 2320 2321 dst.zw = (double) src0.zw 2322 2323.. _samplingopcodes: 2324 2325Resource Sampling Opcodes 2326^^^^^^^^^^^^^^^^^^^^^^^^^ 2327 2328Those opcodes follow very closely semantics of the respective Direct3D 2329instructions. If in doubt double check Direct3D documentation. 2330Note that the swizzle on SVIEW (src1) determines texel swizzling 2331after lookup. 2332 2333.. opcode:: SAMPLE 2334 2335 Using provided address, sample data from the specified texture using the 2336 filtering mode identified by the given sampler. The source data may come from 2337 any resource type other than buffers. 2338 2339 Syntax: ``SAMPLE dst, address, sampler_view, sampler`` 2340 2341 Example: ``SAMPLE TEMP[0], TEMP[1], SVIEW[0], SAMP[0]`` 2342 2343.. opcode:: SAMPLE_I 2344 2345 Simplified alternative to the SAMPLE instruction. Using the provided 2346 integer address, SAMPLE_I fetches data from the specified sampler view 2347 without any filtering. The source data may come from any resource type 2348 other than CUBE. 2349 2350 Syntax: ``SAMPLE_I dst, address, sampler_view`` 2351 2352 Example: ``SAMPLE_I TEMP[0], TEMP[1], SVIEW[0]`` 2353 2354 The 'address' is specified as unsigned integers. If the 'address' is out of 2355 range [0...(# texels - 1)] the result of the fetch is always 0 in all 2356 components. As such the instruction doesn't honor address wrap modes, in 2357 cases where that behavior is desirable 'SAMPLE' instruction should be used. 2358 address.w always provides an unsigned integer mipmap level. If the value is 2359 out of the range then the instruction always returns 0 in all components. 2360 address.yz are ignored for buffers and 1d textures. address.z is ignored 2361 for 1d texture arrays and 2d textures. 2362 2363 For 1D texture arrays address.y provides the array index (also as unsigned 2364 integer). If the value is out of the range of available array indices 2365 [0... (array size - 1)] then the opcode always returns 0 in all components. 2366 For 2D texture arrays address.z provides the array index, otherwise it 2367 exhibits the same behavior as in the case for 1D texture arrays. The exact 2368 semantics of the source address are presented in the table below: 2369 2370 +---------------------------+----+-----+-----+---------+ 2371 | resource type | X | Y | Z | W | 2372 +===========================+====+=====+=====+=========+ 2373 | ``PIPE_BUFFER`` | x | | | ignored | 2374 +---------------------------+----+-----+-----+---------+ 2375 | ``PIPE_TEXTURE_1D`` | x | | | mpl | 2376 +---------------------------+----+-----+-----+---------+ 2377 | ``PIPE_TEXTURE_2D`` | x | y | | mpl | 2378 +---------------------------+----+-----+-----+---------+ 2379 | ``PIPE_TEXTURE_3D`` | x | y | z | mpl | 2380 +---------------------------+----+-----+-----+---------+ 2381 | ``PIPE_TEXTURE_RECT`` | x | y | | mpl | 2382 +---------------------------+----+-----+-----+---------+ 2383 | ``PIPE_TEXTURE_CUBE`` | not allowed as source | 2384 +---------------------------+----+-----+-----+---------+ 2385 | ``PIPE_TEXTURE_1D_ARRAY`` | x | idx | | mpl | 2386 +---------------------------+----+-----+-----+---------+ 2387 | ``PIPE_TEXTURE_2D_ARRAY`` | x | y | idx | mpl | 2388 +---------------------------+----+-----+-----+---------+ 2389 2390 Where 'mpl' is a mipmap level and 'idx' is the array index. 2391 2392.. opcode:: SAMPLE_I_MS 2393 2394 Just like SAMPLE_I but allows fetch data from multi-sampled surfaces. 2395 2396 Syntax: ``SAMPLE_I_MS dst, address, sampler_view, sample`` 2397 2398.. opcode:: SAMPLE_B 2399 2400 Just like the SAMPLE instruction with the exception that an additional bias 2401 is applied to the level of detail computed as part of the instruction 2402 execution. 2403 2404 Syntax: ``SAMPLE_B dst, address, sampler_view, sampler, lod_bias`` 2405 2406 Example: ``SAMPLE_B TEMP[0], TEMP[1], SVIEW[0], SAMP[0], TEMP[2].x`` 2407 2408.. opcode:: SAMPLE_C 2409 2410 Similar to the SAMPLE instruction but it performs a comparison filter. The 2411 operands to SAMPLE_C are identical to SAMPLE, except that there is an 2412 additional float32 operand, reference value, which must be a register with 2413 single-component, or a scalar literal. SAMPLE_C makes the hardware use the 2414 current samplers compare_func (in pipe_sampler_state) to compare reference 2415 value against the red component value for the surce resource at each texel 2416 that the currently configured texture filter covers based on the provided 2417 coordinates. 2418 2419 Syntax: ``SAMPLE_C dst, address, sampler_view.r, sampler, ref_value`` 2420 2421 Example: ``SAMPLE_C TEMP[0], TEMP[1], SVIEW[0].r, SAMP[0], TEMP[2].x`` 2422 2423.. opcode:: SAMPLE_C_LZ 2424 2425 Same as SAMPLE_C, but LOD is 0 and derivatives are ignored. The LZ stands 2426 for level-zero. 2427 2428 Syntax: ``SAMPLE_C_LZ dst, address, sampler_view.r, sampler, ref_value`` 2429 2430 Example: ``SAMPLE_C_LZ TEMP[0], TEMP[1], SVIEW[0].r, SAMP[0], TEMP[2].x`` 2431 2432 2433.. opcode:: SAMPLE_D 2434 2435 SAMPLE_D is identical to the SAMPLE opcode except that the derivatives for 2436 the source address in the x direction and the y direction are provided by 2437 extra parameters. 2438 2439 Syntax: ``SAMPLE_D dst, address, sampler_view, sampler, der_x, der_y`` 2440 2441 Example: ``SAMPLE_D TEMP[0], TEMP[1], SVIEW[0], SAMP[0], TEMP[2], TEMP[3]`` 2442 2443.. opcode:: SAMPLE_L 2444 2445 SAMPLE_L is identical to the SAMPLE opcode except that the LOD is provided 2446 directly as a scalar value, representing no anisotropy. 2447 2448 Syntax: ``SAMPLE_L dst, address, sampler_view, sampler, explicit_lod`` 2449 2450 Example: ``SAMPLE_L TEMP[0], TEMP[1], SVIEW[0], SAMP[0], TEMP[2].x`` 2451 2452.. opcode:: GATHER4 2453 2454 Gathers the four texels to be used in a bi-linear filtering operation and 2455 packs them into a single register. Only works with 2D, 2D array, cubemaps, 2456 and cubemaps arrays. For 2D textures, only the addressing modes of the 2457 sampler and the top level of any mip pyramid are used. Set W to zero. It 2458 behaves like the SAMPLE instruction, but a filtered sample is not 2459 generated. The four samples that contribute to filtering are placed into 2460 xyzw in counter-clockwise order, starting with the (u,v) texture coordinate 2461 delta at the following locations (-, +), (+, +), (+, -), (-, -), where the 2462 magnitude of the deltas are half a texel. 2463 2464 2465.. opcode:: SVIEWINFO 2466 2467 Query the dimensions of a given sampler view. dst receives width, height, 2468 depth or array size and number of mipmap levels as int4. The dst can have a 2469 writemask which will specify what info is the caller interested in. 2470 2471 Syntax: ``SVIEWINFO dst, src_mip_level, sampler_view`` 2472 2473 Example: ``SVIEWINFO TEMP[0], TEMP[1].x, SVIEW[0]`` 2474 2475 src_mip_level is an unsigned integer scalar. If it's out of range then 2476 returns 0 for width, height and depth/array size but the total number of 2477 mipmap is still returned correctly for the given sampler view. The returned 2478 width, height and depth values are for the mipmap level selected by the 2479 src_mip_level and are in the number of texels. For 1d texture array width 2480 is in dst.x, array size is in dst.y and dst.z is 0. The number of mipmaps is 2481 still in dst.w. In contrast to d3d10 resinfo, there's no way in the tgsi 2482 instruction encoding to specify the return type (float/rcpfloat/uint), hence 2483 always using uint. Also, unlike the SAMPLE instructions, the swizzle on src1 2484 resinfo allowing swizzling dst values is ignored (due to the interaction 2485 with rcpfloat modifier which requires some swizzle handling in the state 2486 tracker anyway). 2487 2488.. opcode:: SAMPLE_POS 2489 2490 Query the position of a sample in the given resource or render target 2491 when per-sample fragment shading is in effect. 2492 2493 Syntax: ``SAMPLE_POS dst, source, sample_index`` 2494 2495 dst receives float4 (x, y, undef, undef) indicated where the sample is 2496 located. Sample locations are in the range [0, 1] where 0.5 is the center 2497 of the fragment. 2498 2499 source is either a sampler view (to indicate a shader resource) or temp 2500 register (to indicate the render target). The source register may have 2501 an optional swizzle to apply to the returned result 2502 2503 sample_index is an integer scalar indicating which sample position is to 2504 be queried. 2505 2506 If per-sample shading is not in effect or the source resource or render 2507 target is not multisampled, the result is (0.5, 0.5, undef, undef). 2508 2509 NOTE: no driver has implemented this opcode yet (and no gallium frontend 2510 emits it). This information is subject to change. 2511 2512.. opcode:: SAMPLE_INFO 2513 2514 Query the number of samples in a multisampled resource or render target. 2515 2516 Syntax: ``SAMPLE_INFO dst, source`` 2517 2518 dst receives int4 (n, 0, 0, 0) where n is the number of samples in a 2519 resource or the render target. 2520 2521 source is either a sampler view (to indicate a shader resource) or temp 2522 register (to indicate the render target). The source register may have 2523 an optional swizzle to apply to the returned result 2524 2525 If per-sample shading is not in effect or the source resource or render 2526 target is not multisampled, the result is (1, 0, 0, 0). 2527 2528 NOTE: no driver has implemented this opcode yet (and no gallium frontend 2529 emits it). This information is subject to change. 2530 2531.. opcode:: LOD - level of detail 2532 2533 Same syntax as the SAMPLE opcode but instead of performing an actual 2534 texture lookup/filter, return the computed LOD information that the 2535 texture pipe would use to access the texture. The Y component contains 2536 the computed LOD lambda_prime. The X component contains the LOD that will 2537 be accessed, based on min/max lod's and mipmap filters. 2538 The Z and W components are set to 0. 2539 2540 Syntax: ``LOD dst, address, sampler_view, sampler`` 2541 2542 2543.. _resourceopcodes: 2544 2545Resource Access Opcodes 2546^^^^^^^^^^^^^^^^^^^^^^^ 2547 2548For these opcodes, the resource can be a BUFFER, IMAGE, or MEMORY. 2549 2550.. opcode:: LOAD - Fetch data from a shader buffer or image 2551 2552 Syntax: ``LOAD dst, resource, address`` 2553 2554 Example: ``LOAD TEMP[0], BUFFER[0], TEMP[1]`` 2555 2556 Using the provided integer address, LOAD fetches data 2557 from the specified buffer or texture without any 2558 filtering. 2559 2560 The 'address' is specified as a vector of unsigned 2561 integers. If the 'address' is out of range the result 2562 is unspecified. 2563 2564 Only the first mipmap level of a resource can be read 2565 from using this instruction. 2566 2567 For 1D or 2D texture arrays, the array index is 2568 provided as an unsigned integer in address.y or 2569 address.z, respectively. address.yz are ignored for 2570 buffers and 1D textures. address.z is ignored for 1D 2571 texture arrays and 2D textures. address.w is always 2572 ignored. 2573 2574 A swizzle suffix may be added to the resource argument 2575 this will cause the resource data to be swizzled accordingly. 2576 2577.. opcode:: STORE - Write data to a shader resource 2578 2579 Syntax: ``STORE resource, address, src`` 2580 2581 Example: ``STORE BUFFER[0], TEMP[0], TEMP[1]`` 2582 2583 Using the provided integer address, STORE writes data 2584 to the specified buffer or texture. 2585 2586 The 'address' is specified as a vector of unsigned 2587 integers. If the 'address' is out of range the result 2588 is unspecified. 2589 2590 Only the first mipmap level of a resource can be 2591 written to using this instruction. 2592 2593 For 1D or 2D texture arrays, the array index is 2594 provided as an unsigned integer in address.y or 2595 address.z, respectively. address.yz are ignored for 2596 buffers and 1D textures. address.z is ignored for 1D 2597 texture arrays and 2D textures. address.w is always 2598 ignored. 2599 2600.. opcode:: RESQ - Query information about a resource 2601 2602 Syntax: ``RESQ dst, resource`` 2603 2604 Example: ``RESQ TEMP[0], BUFFER[0]`` 2605 2606 Returns information about the buffer or image resource. For buffer 2607 resources, the size (in bytes) is returned in the x component. For 2608 image resources, .xyz will contain the width/height/layers of the 2609 image, while .w will contain the number of samples for multi-sampled 2610 images. 2611 2612.. opcode:: FBFETCH - Load data from framebuffer 2613 2614 Syntax: ``FBFETCH dst, output`` 2615 2616 Example: ``FBFETCH TEMP[0], OUT[0]`` 2617 2618 This is only valid on ``COLOR`` semantic outputs. Returns the color 2619 of the current position in the framebuffer from before this fragment 2620 shader invocation. May return the same value from multiple calls for 2621 a particular output within a single invocation. Note that result may 2622 be undefined if a fragment is drawn multiple times without a blend 2623 barrier in between. 2624 2625 2626.. _bindlessopcodes: 2627 2628Bindless Opcodes 2629^^^^^^^^^^^^^^^^ 2630 2631These opcodes are for working with bindless sampler or image handles and 2632require PIPE_CAP_BINDLESS_TEXTURE. 2633 2634.. opcode:: IMG2HND - Get a bindless handle for a image 2635 2636 Syntax: ``IMG2HND dst, image`` 2637 2638 Example: ``IMG2HND TEMP[0], IMAGE[0]`` 2639 2640 Sets 'dst' to a bindless handle for 'image'. 2641 2642.. opcode:: SAMP2HND - Get a bindless handle for a sampler 2643 2644 Syntax: ``SAMP2HND dst, sampler`` 2645 2646 Example: ``SAMP2HND TEMP[0], SAMP[0]`` 2647 2648 Sets 'dst' to a bindless handle for 'sampler'. 2649 2650 2651.. _threadsyncopcodes: 2652 2653Inter-thread synchronization opcodes 2654^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2655 2656These opcodes are intended for communication between threads running 2657within the same compute grid. For now they're only valid in compute 2658programs. 2659 2660.. opcode:: BARRIER - Thread group barrier 2661 2662 ``BARRIER`` 2663 2664 This opcode suspends the execution of the current thread until all 2665 the remaining threads in the working group reach the same point of 2666 the program. Results are unspecified if any of the remaining 2667 threads terminates or never reaches an executed BARRIER instruction. 2668 2669.. opcode:: MEMBAR - Memory barrier 2670 2671 ``MEMBAR type`` 2672 2673 This opcode waits for the completion of all memory accesses based on 2674 the type passed in. The type is an immediate bitfield with the following 2675 meaning: 2676 2677 Bit 0: Shader storage buffers 2678 Bit 1: Atomic buffers 2679 Bit 2: Images 2680 Bit 3: Shared memory 2681 Bit 4: Thread group 2682 2683 These may be passed in in any combination. An implementation is free to not 2684 distinguish between these as it sees fit. However these map to all the 2685 possibilities made available by GLSL. 2686 2687.. _atomopcodes: 2688 2689Atomic opcodes 2690^^^^^^^^^^^^^^ 2691 2692These opcodes provide atomic variants of some common arithmetic and 2693logical operations. In this context atomicity means that another 2694concurrent memory access operation that affects the same memory 2695location is guaranteed to be performed strictly before or after the 2696entire execution of the atomic operation. The resource may be a BUFFER, 2697IMAGE, HWATOMIC, or MEMORY. In the case of an image, the offset works 2698the same as for ``LOAD`` and ``STORE``, specified above. For atomic 2699counters, the offset is an immediate index to the base hw atomic 2700counter for this operation. 2701These atomic operations may only be used with 32-bit integer image formats. 2702 2703.. opcode:: ATOMUADD - Atomic integer addition 2704 2705 Syntax: ``ATOMUADD dst, resource, offset, src`` 2706 2707 Example: ``ATOMUADD TEMP[0], BUFFER[0], TEMP[1], TEMP[2]`` 2708 2709 The following operation is performed atomically: 2710 2711.. math:: 2712 2713 dst_x = resource[offset] 2714 2715 resource[offset] = dst_x + src_x 2716 2717 2718.. opcode:: ATOMFADD - Atomic floating point addition 2719 2720 Syntax: ``ATOMFADD dst, resource, offset, src`` 2721 2722 Example: ``ATOMFADD TEMP[0], BUFFER[0], TEMP[1], TEMP[2]`` 2723 2724 The following operation is performed atomically: 2725 2726.. math:: 2727 2728 dst_x = resource[offset] 2729 2730 resource[offset] = dst_x + src_x 2731 2732 2733.. opcode:: ATOMXCHG - Atomic exchange 2734 2735 Syntax: ``ATOMXCHG dst, resource, offset, src`` 2736 2737 Example: ``ATOMXCHG TEMP[0], BUFFER[0], TEMP[1], TEMP[2]`` 2738 2739 The following operation is performed atomically: 2740 2741.. math:: 2742 2743 dst_x = resource[offset] 2744 2745 resource[offset] = src_x 2746 2747 2748.. opcode:: ATOMCAS - Atomic compare-and-exchange 2749 2750 Syntax: ``ATOMCAS dst, resource, offset, cmp, src`` 2751 2752 Example: ``ATOMCAS TEMP[0], BUFFER[0], TEMP[1], TEMP[2], TEMP[3]`` 2753 2754 The following operation is performed atomically: 2755 2756.. math:: 2757 2758 dst_x = resource[offset] 2759 2760 resource[offset] = (dst_x == cmp_x ? src_x : dst_x) 2761 2762 2763.. opcode:: ATOMAND - Atomic bitwise And 2764 2765 Syntax: ``ATOMAND dst, resource, offset, src`` 2766 2767 Example: ``ATOMAND TEMP[0], BUFFER[0], TEMP[1], TEMP[2]`` 2768 2769 The following operation is performed atomically: 2770 2771.. math:: 2772 2773 dst_x = resource[offset] 2774 2775 resource[offset] = dst_x \& src_x 2776 2777 2778.. opcode:: ATOMOR - Atomic bitwise Or 2779 2780 Syntax: ``ATOMOR dst, resource, offset, src`` 2781 2782 Example: ``ATOMOR TEMP[0], BUFFER[0], TEMP[1], TEMP[2]`` 2783 2784 The following operation is performed atomically: 2785 2786.. math:: 2787 2788 dst_x = resource[offset] 2789 2790 resource[offset] = dst_x | src_x 2791 2792 2793.. opcode:: ATOMXOR - Atomic bitwise Xor 2794 2795 Syntax: ``ATOMXOR dst, resource, offset, src`` 2796 2797 Example: ``ATOMXOR TEMP[0], BUFFER[0], TEMP[1], TEMP[2]`` 2798 2799 The following operation is performed atomically: 2800 2801.. math:: 2802 2803 dst_x = resource[offset] 2804 2805 resource[offset] = dst_x \oplus src_x 2806 2807 2808.. opcode:: ATOMUMIN - Atomic unsigned minimum 2809 2810 Syntax: ``ATOMUMIN dst, resource, offset, src`` 2811 2812 Example: ``ATOMUMIN TEMP[0], BUFFER[0], TEMP[1], TEMP[2]`` 2813 2814 The following operation is performed atomically: 2815 2816.. math:: 2817 2818 dst_x = resource[offset] 2819 2820 resource[offset] = (dst_x < src_x ? dst_x : src_x) 2821 2822 2823.. opcode:: ATOMUMAX - Atomic unsigned maximum 2824 2825 Syntax: ``ATOMUMAX dst, resource, offset, src`` 2826 2827 Example: ``ATOMUMAX TEMP[0], BUFFER[0], TEMP[1], TEMP[2]`` 2828 2829 The following operation is performed atomically: 2830 2831.. math:: 2832 2833 dst_x = resource[offset] 2834 2835 resource[offset] = (dst_x > src_x ? dst_x : src_x) 2836 2837 2838.. opcode:: ATOMIMIN - Atomic signed minimum 2839 2840 Syntax: ``ATOMIMIN dst, resource, offset, src`` 2841 2842 Example: ``ATOMIMIN TEMP[0], BUFFER[0], TEMP[1], TEMP[2]`` 2843 2844 The following operation is performed atomically: 2845 2846.. math:: 2847 2848 dst_x = resource[offset] 2849 2850 resource[offset] = (dst_x < src_x ? dst_x : src_x) 2851 2852 2853.. opcode:: ATOMIMAX - Atomic signed maximum 2854 2855 Syntax: ``ATOMIMAX dst, resource, offset, src`` 2856 2857 Example: ``ATOMIMAX TEMP[0], BUFFER[0], TEMP[1], TEMP[2]`` 2858 2859 The following operation is performed atomically: 2860 2861.. math:: 2862 2863 dst_x = resource[offset] 2864 2865 resource[offset] = (dst_x > src_x ? dst_x : src_x) 2866 2867 2868.. opcode:: ATOMINC_WRAP - Atomic increment + wrap around 2869 2870 Syntax: ``ATOMINC_WRAP dst, resource, offset, src`` 2871 2872 Example: ``ATOMINC_WRAP TEMP[0], BUFFER[0], TEMP[1], TEMP[2]`` 2873 2874 The following operation is performed atomically: 2875 2876.. math:: 2877 2878 dst_x = resource[offset] + 1 2879 2880 resource[offset] = dst_x <= src_x ? dst_x : 0 2881 2882 2883.. opcode:: ATOMDEC_WRAP - Atomic decrement + wrap around 2884 2885 Syntax: ``ATOMDEC_WRAP dst, resource, offset, src`` 2886 2887 Example: ``ATOMDEC_WRAP TEMP[0], BUFFER[0], TEMP[1], TEMP[2]`` 2888 2889 The following operation is performed atomically: 2890 2891.. math:: 2892 2893 dst_x = resource[offset] 2894 2895 resource[offset] = (dst_x > 0 && dst_x < src_x) ? dst_x - 1 : 0 2896 2897 2898.. _interlaneopcodes: 2899 2900Inter-lane opcodes 2901^^^^^^^^^^^^^^^^^^ 2902 2903These opcodes reduce the given value across the shader invocations 2904running in the current SIMD group. Every thread in the subgroup will receive 2905the same result. The BALLOT operations accept a single-channel argument that 2906is treated as a boolean and produce a 64-bit value. 2907 2908.. opcode:: VOTE_ANY - Value is set in any of the active invocations 2909 2910 Syntax: ``VOTE_ANY dst, value`` 2911 2912 Example: ``VOTE_ANY TEMP[0].x, TEMP[1].x`` 2913 2914 2915.. opcode:: VOTE_ALL - Value is set in all of the active invocations 2916 2917 Syntax: ``VOTE_ALL dst, value`` 2918 2919 Example: ``VOTE_ALL TEMP[0].x, TEMP[1].x`` 2920 2921 2922.. opcode:: VOTE_EQ - Value is the same in all of the active invocations 2923 2924 Syntax: ``VOTE_EQ dst, value`` 2925 2926 Example: ``VOTE_EQ TEMP[0].x, TEMP[1].x`` 2927 2928 2929.. opcode:: BALLOT - Lanemask of whether the value is set in each active 2930 invocation 2931 2932 Syntax: ``BALLOT dst, value`` 2933 2934 Example: ``BALLOT TEMP[0].xy, TEMP[1].x`` 2935 2936 When the argument is a constant true, this produces a bitmask of active 2937 invocations. In fragment shaders, this can include helper invocations 2938 (invocations whose outputs and writes to memory are discarded, but which 2939 are used to compute derivatives). 2940 2941 2942.. opcode:: READ_FIRST - Broadcast the value from the first active 2943 invocation to all active lanes 2944 2945 Syntax: ``READ_FIRST dst, value`` 2946 2947 Example: ``READ_FIRST TEMP[0], TEMP[1]`` 2948 2949 2950.. opcode:: READ_INVOC - Retrieve the value from the given invocation 2951 (need not be uniform) 2952 2953 Syntax: ``READ_INVOC dst, value, invocation`` 2954 2955 Example: ``READ_INVOC TEMP[0].xy, TEMP[1].xy, TEMP[2].x`` 2956 2957 invocation.x controls the invocation number to read from for all channels. 2958 The invocation number must be the same across all active invocations in a 2959 sub-group; otherwise, the results are undefined. 2960 2961 2962Explanation of symbols used 2963------------------------------ 2964 2965 2966Functions 2967^^^^^^^^^^^^^^ 2968 2969 2970 :math:`|x|` Absolute value of `x`. 2971 2972 :math:`\lceil x \rceil` Ceiling of `x`. 2973 2974 clamp(x,y,z) Clamp x between y and z. 2975 (x < y) ? y : (x > z) ? z : x 2976 2977 :math:`\lfloor x\rfloor` Floor of `x`. 2978 2979 :math:`\log_2{x}` Logarithm of `x`, base 2. 2980 2981 max(x,y) Maximum of x and y. 2982 (x > y) ? x : y 2983 2984 min(x,y) Minimum of x and y. 2985 (x < y) ? x : y 2986 2987 partialx(x) Derivative of x relative to fragment's X. 2988 2989 partialy(x) Derivative of x relative to fragment's Y. 2990 2991 pop() Pop from stack. 2992 2993 :math:`x^y` `x` to the power `y`. 2994 2995 push(x) Push x on stack. 2996 2997 round(x) Round x. 2998 2999 trunc(x) Truncate x, i.e. drop the fraction bits. 3000 3001 3002Keywords 3003^^^^^^^^^^^^^ 3004 3005 3006 discard Discard fragment. 3007 3008 pc Program counter. 3009 3010 target Label of target instruction. 3011 3012 3013Other tokens 3014--------------- 3015 3016 3017Declaration 3018^^^^^^^^^^^ 3019 3020 3021Declares a register that is will be referenced as an operand in Instruction 3022tokens. 3023 3024File field contains register file that is being declared and is one 3025of TGSI_FILE. 3026 3027UsageMask field specifies which of the register components can be accessed 3028and is one of TGSI_WRITEMASK. 3029 3030The Local flag specifies that a given value isn't intended for 3031subroutine parameter passing and, as a result, the implementation 3032isn't required to give any guarantees of it being preserved across 3033subroutine boundaries. As it's merely a compiler hint, the 3034implementation is free to ignore it. 3035 3036If Dimension flag is set to 1, a Declaration Dimension token follows. 3037 3038If Semantic flag is set to 1, a Declaration Semantic token follows. 3039 3040If Interpolate flag is set to 1, a Declaration Interpolate token follows. 3041 3042If file is TGSI_FILE_RESOURCE, a Declaration Resource token follows. 3043 3044If Array flag is set to 1, a Declaration Array token follows. 3045 3046Array Declaration 3047^^^^^^^^^^^^^^^^^^^^^^^^ 3048 3049Declarations can optional have an ArrayID attribute which can be referred by 3050indirect addressing operands. An ArrayID of zero is reserved and treated as 3051if no ArrayID is specified. 3052 3053If an indirect addressing operand refers to a specific declaration by using 3054an ArrayID only the registers in this declaration are guaranteed to be 3055accessed, accessing any register outside this declaration results in undefined 3056behavior. Note that for compatibility the effective index is zero-based and 3057not relative to the specified declaration 3058 3059If no ArrayID is specified with an indirect addressing operand the whole 3060register file might be accessed by this operand. This is strongly discouraged 3061and will prevent packing of scalar/vec2 arrays and effective alias analysis. 3062This is only legal for TEMP and CONST register files. 3063 3064Declaration Semantic 3065^^^^^^^^^^^^^^^^^^^^^^^^ 3066 3067Vertex and fragment shader input and output registers may be labeled 3068with semantic information consisting of a name and index. 3069 3070Follows Declaration token if Semantic bit is set. 3071 3072Since its purpose is to link a shader with other stages of the pipeline, 3073it is valid to follow only those Declaration tokens that declare a register 3074either in INPUT or OUTPUT file. 3075 3076SemanticName field contains the semantic name of the register being declared. 3077There is no default value. 3078 3079SemanticIndex is an optional subscript that can be used to distinguish 3080different register declarations with the same semantic name. The default value 3081is 0. 3082 3083The meanings of the individual semantic names are explained in the following 3084sections. 3085 3086TGSI_SEMANTIC_POSITION 3087"""""""""""""""""""""" 3088 3089For vertex shaders, TGSI_SEMANTIC_POSITION indicates the vertex shader 3090output register which contains the homogeneous vertex position in the clip 3091space coordinate system. After clipping, the X, Y and Z components of the 3092vertex will be divided by the W value to get normalized device coordinates. 3093 3094For fragment shaders, TGSI_SEMANTIC_POSITION is used to indicate that 3095fragment shader input (or system value, depending on which one is 3096supported by the driver) contains the fragment's window position. The X 3097component starts at zero and always increases from left to right. 3098The Y component starts at zero and always increases but Y=0 may either 3099indicate the top of the window or the bottom depending on the fragment 3100coordinate origin convention (see TGSI_PROPERTY_FS_COORD_ORIGIN). 3101The Z coordinate ranges from 0 to 1 to represent depth from the front 3102to the back of the Z buffer. The W component contains the interpolated 3103reciprocal of the vertex position W component (corresponding to gl_Fragcoord, 3104but unlike d3d10 which interpolates the same 1/w but then gives back 3105the reciprocal of the interpolated value). 3106 3107Fragment shaders may also declare an output register with 3108TGSI_SEMANTIC_POSITION. Only the Z component is writable. This allows 3109the fragment shader to change the fragment's Z position. 3110 3111 3112 3113TGSI_SEMANTIC_COLOR 3114""""""""""""""""""" 3115 3116For vertex shader outputs or fragment shader inputs/outputs, this 3117label indicates that the register contains an R,G,B,A color. 3118 3119Several shader inputs/outputs may contain colors so the semantic index 3120is used to distinguish them. For example, color[0] may be the diffuse 3121color while color[1] may be the specular color. 3122 3123This label is needed so that the flat/smooth shading can be applied 3124to the right interpolants during rasterization. 3125 3126 3127 3128TGSI_SEMANTIC_BCOLOR 3129"""""""""""""""""""" 3130 3131Back-facing colors are only used for back-facing polygons, and are only valid 3132in vertex shader outputs. After rasterization, all polygons are front-facing 3133and COLOR and BCOLOR end up occupying the same slots in the fragment shader, 3134so all BCOLORs effectively become regular COLORs in the fragment shader. 3135 3136 3137TGSI_SEMANTIC_FOG 3138""""""""""""""""" 3139 3140Vertex shader inputs and outputs and fragment shader inputs may be 3141labeled with TGSI_SEMANTIC_FOG to indicate that the register contains 3142a fog coordinate. Typically, the fragment shader will use the fog coordinate 3143to compute a fog blend factor which is used to blend the normal fragment color 3144with a constant fog color. But fog coord really is just an ordinary vec4 3145register like regular semantics. 3146 3147 3148TGSI_SEMANTIC_PSIZE 3149""""""""""""""""""" 3150 3151Vertex shader input and output registers may be labeled with 3152TGIS_SEMANTIC_PSIZE to indicate that the register contains a point size 3153in the form (S, 0, 0, 1). The point size controls the width or diameter 3154of points for rasterization. This label cannot be used in fragment 3155shaders. 3156 3157When using this semantic, be sure to set the appropriate state in the 3158:ref:`rasterizer` first. 3159 3160 3161TGSI_SEMANTIC_TEXCOORD 3162"""""""""""""""""""""" 3163 3164Only available if PIPE_CAP_TGSI_TEXCOORD is exposed ! 3165 3166Vertex shader outputs and fragment shader inputs may be labeled with 3167this semantic to make them replaceable by sprite coordinates via the 3168sprite_coord_enable state in the :ref:`rasterizer`. 3169The semantic index permitted with this semantic is limited to <= 7. 3170 3171If the driver does not support TEXCOORD, sprite coordinate replacement 3172applies to inputs with the GENERIC semantic instead. 3173 3174The intended use case for this semantic is gl_TexCoord. 3175 3176 3177TGSI_SEMANTIC_PCOORD 3178"""""""""""""""""""" 3179 3180Only available if PIPE_CAP_TGSI_TEXCOORD is exposed ! 3181 3182Fragment shader inputs may be labeled with TGSI_SEMANTIC_PCOORD to indicate 3183that the register contains sprite coordinates in the form (x, y, 0, 1), if 3184the current primitive is a point and point sprites are enabled. Otherwise, 3185the contents of the register are undefined. 3186 3187The intended use case for this semantic is gl_PointCoord. 3188 3189 3190TGSI_SEMANTIC_GENERIC 3191""""""""""""""""""""" 3192 3193All vertex/fragment shader inputs/outputs not labeled with any other 3194semantic label can be considered to be generic attributes. Typical 3195uses of generic inputs/outputs are texcoords and user-defined values. 3196 3197 3198TGSI_SEMANTIC_NORMAL 3199"""""""""""""""""""" 3200 3201Indicates that a vertex shader input is a normal vector. This is 3202typically only used for legacy graphics APIs. 3203 3204 3205TGSI_SEMANTIC_FACE 3206"""""""""""""""""" 3207 3208This label applies to fragment shader inputs (or system values, 3209depending on which one is supported by the driver) and indicates that 3210the register contains front/back-face information. 3211 3212If it is an input, it will be a floating-point vector in the form (F, 0, 0, 1), 3213where F will be positive when the fragment belongs to a front-facing polygon, 3214and negative when the fragment belongs to a back-facing polygon. 3215 3216If it is a system value, it will be an integer vector in the form (F, 0, 0, 1), 3217where F is 0xffffffff when the fragment belongs to a front-facing polygon and 32180 when the fragment belongs to a back-facing polygon. 3219 3220 3221TGSI_SEMANTIC_EDGEFLAG 3222"""""""""""""""""""""" 3223 3224For vertex shaders, this sematic label indicates that an input or 3225output is a boolean edge flag. The register layout is [F, x, x, x] 3226where F is 0.0 or 1.0 and x = don't care. Normally, the vertex shader 3227simply copies the edge flag input to the edgeflag output. 3228 3229Edge flags are used to control which lines or points are actually 3230drawn when the polygon mode converts triangles/quads/polygons into 3231points or lines. 3232 3233 3234TGSI_SEMANTIC_STENCIL 3235""""""""""""""""""""" 3236 3237For fragment shaders, this semantic label indicates that an output 3238is a writable stencil reference value. Only the Y component is writable. 3239This allows the fragment shader to change the fragments stencilref value. 3240 3241 3242TGSI_SEMANTIC_VIEWPORT_INDEX 3243"""""""""""""""""""""""""""" 3244 3245For geometry shaders, this semantic label indicates that an output 3246contains the index of the viewport (and scissor) to use. 3247This is an integer value, and only the X component is used. 3248 3249If PIPE_CAP_TGSI_VS_LAYER_VIEWPORT or PIPE_CAP_TGSI_TES_LAYER_VIEWPORT is 3250supported, then this semantic label can also be used in vertex or 3251tessellation evaluation shaders, respectively. Only the value written in the 3252last vertex processing stage is used. 3253 3254 3255TGSI_SEMANTIC_LAYER 3256""""""""""""""""""" 3257 3258For geometry shaders, this semantic label indicates that an output 3259contains the layer value to use for the color and depth/stencil surfaces. 3260This is an integer value, and only the X component is used. 3261(Also known as rendertarget array index.) 3262 3263If PIPE_CAP_TGSI_VS_LAYER_VIEWPORT or PIPE_CAP_TGSI_TES_LAYER_VIEWPORT is 3264supported, then this semantic label can also be used in vertex or 3265tessellation evaluation shaders, respectively. Only the value written in the 3266last vertex processing stage is used. 3267 3268 3269TGSI_SEMANTIC_CLIPDIST 3270"""""""""""""""""""""" 3271 3272Note this covers clipping and culling distances. 3273 3274When components of vertex elements are identified this way, these 3275values are each assumed to be a float32 signed distance to a plane. 3276 3277For clip distances: 3278Primitive setup only invokes rasterization on pixels for which 3279the interpolated plane distances are >= 0. 3280 3281For cull distances: 3282Primitives will be completely discarded if the plane distance 3283for all of the vertices in the primitive are < 0. 3284If a vertex has a cull distance of NaN, that vertex counts as "out" 3285(as if its < 0); 3286 3287Multiple clip/cull planes can be implemented simultaneously, by 3288annotating multiple components of one or more vertex elements with 3289the above specified semantic. 3290The limits on both clip and cull distances are bound 3291by the PIPE_MAX_CLIP_OR_CULL_DISTANCE_COUNT define which defines 3292the maximum number of components that can be used to hold the 3293distances and by the PIPE_MAX_CLIP_OR_CULL_DISTANCE_ELEMENT_COUNT 3294which specifies the maximum number of registers which can be 3295annotated with those semantics. 3296The properties NUM_CLIPDIST_ENABLED and NUM_CULLDIST_ENABLED 3297are used to divide up the 2 x vec4 space between clipping and culling. 3298 3299TGSI_SEMANTIC_SAMPLEID 3300"""""""""""""""""""""" 3301 3302For fragment shaders, this semantic label indicates that a system value 3303contains the current sample id (i.e. gl_SampleID) as an unsigned int. 3304Only the X component is used. If per-sample shading is not enabled, 3305the result is (0, undef, undef, undef). 3306 3307Note that if the fragment shader uses this system value, the fragment 3308shader is automatically executed at per sample frequency. 3309 3310TGSI_SEMANTIC_SAMPLEPOS 3311""""""""""""""""""""""" 3312 3313For fragment shaders, this semantic label indicates that a system 3314value contains the current sample's position as float4(x, y, undef, undef) 3315in the render target (i.e. gl_SamplePosition) when per-fragment shading 3316is in effect. Position values are in the range [0, 1] where 0.5 is 3317the center of the fragment. 3318 3319Note that if the fragment shader uses this system value, the fragment 3320shader is automatically executed at per sample frequency. 3321 3322TGSI_SEMANTIC_SAMPLEMASK 3323"""""""""""""""""""""""" 3324 3325For fragment shaders, this semantic label can be applied to either a 3326shader system value input or output. 3327 3328For a system value, the sample mask indicates the set of samples covered by 3329the current primitive. If MSAA is not enabled, the value is (1, 0, 0, 0). 3330 3331For an output, the sample mask is used to disable further sample processing. 3332 3333For both, the register type is uint[4] but only the X component is used 3334(i.e. gl_SampleMask[0]). Each bit corresponds to one sample position (up 3335to 32x MSAA is supported). 3336 3337TGSI_SEMANTIC_INVOCATIONID 3338"""""""""""""""""""""""""" 3339 3340For geometry shaders, this semantic label indicates that a system value 3341contains the current invocation id (i.e. gl_InvocationID). 3342This is an integer value, and only the X component is used. 3343 3344TGSI_SEMANTIC_INSTANCEID 3345"""""""""""""""""""""""" 3346 3347For vertex shaders, this semantic label indicates that a system value contains 3348the current instance id (i.e. gl_InstanceID). It does not include the base 3349instance. This is an integer value, and only the X component is used. 3350 3351TGSI_SEMANTIC_VERTEXID 3352"""""""""""""""""""""" 3353 3354For vertex shaders, this semantic label indicates that a system value contains 3355the current vertex id (i.e. gl_VertexID). It does (unlike in d3d10) include the 3356base vertex. This is an integer value, and only the X component is used. 3357 3358TGSI_SEMANTIC_VERTEXID_NOBASE 3359""""""""""""""""""""""""""""""" 3360 3361For vertex shaders, this semantic label indicates that a system value contains 3362the current vertex id without including the base vertex (this corresponds to 3363d3d10 vertex id, so TGSI_SEMANTIC_VERTEXID_NOBASE + TGSI_SEMANTIC_BASEVERTEX 3364== TGSI_SEMANTIC_VERTEXID). This is an integer value, and only the X component 3365is used. 3366 3367TGSI_SEMANTIC_BASEVERTEX 3368"""""""""""""""""""""""" 3369 3370For vertex shaders, this semantic label indicates that a system value contains 3371the base vertex (i.e. gl_BaseVertex). Note that for non-indexed draw calls, 3372this contains the first (or start) value instead. 3373This is an integer value, and only the X component is used. 3374 3375TGSI_SEMANTIC_PRIMID 3376"""""""""""""""""""" 3377 3378For geometry and fragment shaders, this semantic label indicates the value 3379contains the primitive id (i.e. gl_PrimitiveID). This is an integer value, 3380and only the X component is used. 3381FIXME: This right now can be either a ordinary input or a system value... 3382 3383 3384TGSI_SEMANTIC_PATCH 3385""""""""""""""""""" 3386 3387For tessellation evaluation/control shaders, this semantic label indicates a 3388generic per-patch attribute. Such semantics will not implicitly be per-vertex 3389arrays. 3390 3391TGSI_SEMANTIC_TESSCOORD 3392""""""""""""""""""""""" 3393 3394For tessellation evaluation shaders, this semantic label indicates the 3395coordinates of the vertex being processed. This is available in XYZ; W is 3396undefined. 3397 3398TGSI_SEMANTIC_TESSOUTER 3399""""""""""""""""""""""" 3400 3401For tessellation evaluation/control shaders, this semantic label indicates the 3402outer tessellation levels of the patch. Isoline tessellation will only have XY 3403defined, triangle will have XYZ and quads will have XYZW defined. This 3404corresponds to gl_TessLevelOuter. 3405 3406TGSI_SEMANTIC_TESSINNER 3407""""""""""""""""""""""" 3408 3409For tessellation evaluation/control shaders, this semantic label indicates the 3410inner tessellation levels of the patch. The X value is only defined for 3411triangle tessellation, while quads will have XY defined. This is entirely 3412undefined for isoline tessellation. 3413 3414TGSI_SEMANTIC_VERTICESIN 3415"""""""""""""""""""""""" 3416 3417For tessellation evaluation/control shaders, this semantic label indicates the 3418number of vertices provided in the input patch. Only the X value is defined. 3419 3420TGSI_SEMANTIC_HELPER_INVOCATION 3421""""""""""""""""""""""""""""""" 3422 3423For fragment shaders, this semantic indicates whether the current 3424invocation is covered or not. Helper invocations are created in order 3425to properly compute derivatives, however it may be desirable to skip 3426some of the logic in those cases. See ``gl_HelperInvocation`` documentation. 3427 3428TGSI_SEMANTIC_BASEINSTANCE 3429"""""""""""""""""""""""""" 3430 3431For vertex shaders, the base instance argument supplied for this 3432draw. This is an integer value, and only the X component is used. 3433 3434TGSI_SEMANTIC_DRAWID 3435"""""""""""""""""""" 3436 3437For vertex shaders, the zero-based index of the current draw in a 3438``glMultiDraw*`` invocation. This is an integer value, and only the X 3439component is used. 3440 3441 3442TGSI_SEMANTIC_WORK_DIM 3443"""""""""""""""""""""" 3444 3445For compute shaders started via opencl this retrieves the work_dim 3446parameter to the clEnqueueNDRangeKernel call with which the shader 3447was started. 3448 3449 3450TGSI_SEMANTIC_GRID_SIZE 3451""""""""""""""""""""""" 3452 3453For compute shaders, this semantic indicates the maximum (x, y, z) dimensions 3454of a grid of thread blocks. 3455 3456 3457TGSI_SEMANTIC_BLOCK_ID 3458"""""""""""""""""""""" 3459 3460For compute shaders, this semantic indicates the (x, y, z) coordinates of the 3461current block inside of the grid. 3462 3463 3464TGSI_SEMANTIC_BLOCK_SIZE 3465"""""""""""""""""""""""" 3466 3467For compute shaders, this semantic indicates the maximum (x, y, z) dimensions 3468of a block in threads. 3469 3470 3471TGSI_SEMANTIC_THREAD_ID 3472""""""""""""""""""""""" 3473 3474For compute shaders, this semantic indicates the (x, y, z) coordinates of the 3475current thread inside of the block. 3476 3477 3478TGSI_SEMANTIC_SUBGROUP_SIZE 3479""""""""""""""""""""""""""" 3480 3481This semantic indicates the subgroup size for the current invocation. This is 3482an integer of at most 64, as it indicates the width of lanemasks. It does not 3483depend on the number of invocations that are active. 3484 3485 3486TGSI_SEMANTIC_SUBGROUP_INVOCATION 3487""""""""""""""""""""""""""""""""" 3488 3489The index of the current invocation within its subgroup. 3490 3491 3492TGSI_SEMANTIC_SUBGROUP_EQ_MASK 3493"""""""""""""""""""""""""""""" 3494 3495A bit mask of ``bit index == TGSI_SEMANTIC_SUBGROUP_INVOCATION``, i.e. 3496``1 << subgroup_invocation`` in arbitrary precision arithmetic. 3497 3498 3499TGSI_SEMANTIC_SUBGROUP_GE_MASK 3500"""""""""""""""""""""""""""""" 3501 3502A bit mask of ``bit index >= TGSI_SEMANTIC_SUBGROUP_INVOCATION``, i.e. 3503``((1 << (subgroup_size - subgroup_invocation)) - 1) << subgroup_invocation`` 3504in arbitrary precision arithmetic. 3505 3506 3507TGSI_SEMANTIC_SUBGROUP_GT_MASK 3508"""""""""""""""""""""""""""""" 3509 3510A bit mask of ``bit index > TGSI_SEMANTIC_SUBGROUP_INVOCATION``, i.e. 3511``((1 << (subgroup_size - subgroup_invocation - 1)) - 1) << (subgroup_invocation + 1)`` 3512in arbitrary precision arithmetic. 3513 3514 3515TGSI_SEMANTIC_SUBGROUP_LE_MASK 3516"""""""""""""""""""""""""""""" 3517 3518A bit mask of ``bit index <= TGSI_SEMANTIC_SUBGROUP_INVOCATION``, i.e. 3519``(1 << (subgroup_invocation + 1)) - 1`` in arbitrary precision arithmetic. 3520 3521 3522TGSI_SEMANTIC_SUBGROUP_LT_MASK 3523"""""""""""""""""""""""""""""" 3524 3525A bit mask of ``bit index < TGSI_SEMANTIC_SUBGROUP_INVOCATION``, i.e. 3526``(1 << subgroup_invocation) - 1`` in arbitrary precision arithmetic. 3527 3528 3529TGSI_SEMANTIC_VIEWPORT_MASK 3530""""""""""""""""""""""""""" 3531 3532A bit mask of viewports to broadcast the current primitive to. See 3533GL_NV_viewport_array2 for more details. 3534 3535 3536TGSI_SEMANTIC_TESS_DEFAULT_OUTER_LEVEL 3537"""""""""""""""""""""""""""""""""""""" 3538 3539A system value equal to the default_outer_level array set via set_tess_level. 3540 3541 3542TGSI_SEMANTIC_TESS_DEFAULT_INNER_LEVEL 3543"""""""""""""""""""""""""""""""""""""" 3544 3545A system value equal to the default_inner_level array set via set_tess_level. 3546 3547 3548Declaration Interpolate 3549^^^^^^^^^^^^^^^^^^^^^^^ 3550 3551This token is only valid for fragment shader INPUT declarations. 3552 3553The Interpolate field specifes the way input is being interpolated by 3554the rasteriser and is one of TGSI_INTERPOLATE_*. 3555 3556The Location field specifies the location inside the pixel that the 3557interpolation should be done at, one of ``TGSI_INTERPOLATE_LOC_*``. Note that 3558when per-sample shading is enabled, the implementation may choose to 3559interpolate at the sample irrespective of the Location field. 3560 3561The CylindricalWrap bitfield specifies which register components 3562should be subject to cylindrical wrapping when interpolating by the 3563rasteriser. If TGSI_CYLINDRICAL_WRAP_X is set to 1, the X component 3564should be interpolated according to cylindrical wrapping rules. 3565 3566 3567Declaration Sampler View 3568^^^^^^^^^^^^^^^^^^^^^^^^ 3569 3570Follows Declaration token if file is TGSI_FILE_SAMPLER_VIEW. 3571 3572DCL SVIEW[#], resource, type(s) 3573 3574Declares a shader input sampler view and assigns it to a SVIEW[#] 3575register. 3576 3577resource can be one of BUFFER, 1D, 2D, 3D, 1DArray and 2DArray. 3578 3579type must be 1 or 4 entries (if specifying on a per-component 3580level) out of UNORM, SNORM, SINT, UINT and FLOAT. 3581 3582For TEX\* style texture sample opcodes (as opposed to SAMPLE\* opcodes 3583which take an explicit SVIEW[#] source register), there may be optionally 3584SVIEW[#] declarations. In this case, the SVIEW index is implied by the 3585SAMP index, and there must be a corresponding SVIEW[#] declaration for 3586each SAMP[#] declaration. Drivers are free to ignore this if they wish. 3587But note in particular that some drivers need to know the sampler type 3588(float/int/unsigned) in order to generate the correct code, so cases 3589where integer textures are sampled, SVIEW[#] declarations should be 3590used. 3591 3592NOTE: It is NOT legal to mix SAMPLE\* style opcodes and TEX\* opcodes 3593in the same shader. 3594 3595Declaration Resource 3596^^^^^^^^^^^^^^^^^^^^ 3597 3598Follows Declaration token if file is TGSI_FILE_RESOURCE. 3599 3600DCL RES[#], resource [, WR] [, RAW] 3601 3602Declares a shader input resource and assigns it to a RES[#] 3603register. 3604 3605resource can be one of BUFFER, 1D, 2D, 3D, CUBE, 1DArray and 36062DArray. 3607 3608If the RAW keyword is not specified, the texture data will be 3609subject to conversion, swizzling and scaling as required to yield 3610the specified data type from the physical data format of the bound 3611resource. 3612 3613If the RAW keyword is specified, no channel conversion will be 3614performed: the values read for each of the channels (X,Y,Z,W) will 3615correspond to consecutive words in the same order and format 3616they're found in memory. No element-to-address conversion will be 3617performed either: the value of the provided X coordinate will be 3618interpreted in byte units instead of texel units. The result of 3619accessing a misaligned address is undefined. 3620 3621Usage of the STORE opcode is only allowed if the WR (writable) flag 3622is set. 3623 3624Hardware Atomic Register File 3625^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 3626 3627Hardware atomics are declared as a 2D array with an optional array id. 3628 3629The first member of the dimension is the buffer resource the atomic 3630is located in. 3631The second member is a range into the buffer resource, either for 3632one or multiple counters. If this is an array, the declaration will have 3633an unique array id. 3634 3635Each counter is 4 bytes in size, and index and ranges are in counters not bytes. 3636DCL HWATOMIC[0][0] 3637DCL HWATOMIC[0][1] 3638 3639This declares two atomics, one at the start of the buffer and one in the 3640second 4 bytes. 3641 3642DCL HWATOMIC[0][0] 3643DCL HWATOMIC[1][0] 3644DCL HWATOMIC[1][1..3], ARRAY(1) 3645 3646This declares 5 atomics, one in buffer 0 at 0, 3647one in buffer 1 at 0, and an array of 3 atomics in 3648the buffer 1, starting at 1. 3649 3650Properties 3651^^^^^^^^^^^^^^^^^^^^^^^^ 3652 3653Properties are general directives that apply to the whole TGSI program. 3654 3655FS_COORD_ORIGIN 3656""""""""""""""" 3657 3658Specifies the fragment shader TGSI_SEMANTIC_POSITION coordinate origin. 3659The default value is UPPER_LEFT. 3660 3661If UPPER_LEFT, the position will be (0,0) at the upper left corner and 3662increase downward and rightward. 3663If LOWER_LEFT, the position will be (0,0) at the lower left corner and 3664increase upward and rightward. 3665 3666OpenGL defaults to LOWER_LEFT, and is configurable with the 3667GL_ARB_fragment_coord_conventions extension. 3668 3669DirectX 9/10 use UPPER_LEFT. 3670 3671FS_COORD_PIXEL_CENTER 3672""""""""""""""""""""" 3673 3674Specifies the fragment shader TGSI_SEMANTIC_POSITION pixel center convention. 3675The default value is HALF_INTEGER. 3676 3677If HALF_INTEGER, the fractionary part of the position will be 0.5 3678If INTEGER, the fractionary part of the position will be 0.0 3679 3680Note that this does not affect the set of fragments generated by 3681rasterization, which is instead controlled by half_pixel_center in the 3682rasterizer. 3683 3684OpenGL defaults to HALF_INTEGER, and is configurable with the 3685GL_ARB_fragment_coord_conventions extension. 3686 3687DirectX 9 uses INTEGER. 3688DirectX 10 uses HALF_INTEGER. 3689 3690FS_COLOR0_WRITES_ALL_CBUFS 3691"""""""""""""""""""""""""" 3692Specifies that writes to the fragment shader color 0 are replicated to all 3693bound cbufs. This facilitates OpenGL's fragColor output vs fragData[0] where 3694fragData is directed to a single color buffer, but fragColor is broadcast. 3695 3696VS_PROHIBIT_UCPS 3697"""""""""""""""""""""""""" 3698If this property is set on the program bound to the shader stage before the 3699fragment shader, user clip planes should have no effect (be disabled) even if 3700that shader does not write to any clip distance outputs and the rasterizer's 3701clip_plane_enable is non-zero. 3702This property is only supported by drivers that also support shader clip 3703distance outputs. 3704This is useful for APIs that don't have UCPs and where clip distances written 3705by a shader cannot be disabled. 3706 3707GS_INVOCATIONS 3708"""""""""""""" 3709 3710Specifies the number of times a geometry shader should be executed for each 3711input primitive. Each invocation will have a different 3712TGSI_SEMANTIC_INVOCATIONID system value set. If not specified, assumed to 3713be 1. 3714 3715VS_WINDOW_SPACE_POSITION 3716"""""""""""""""""""""""""" 3717If this property is set on the vertex shader, the TGSI_SEMANTIC_POSITION output 3718is assumed to contain window space coordinates. 3719Division of X,Y,Z by W and the viewport transformation are disabled, and 1/W is 3720directly taken from the 4-th component of the shader output. 3721Naturally, clipping is not performed on window coordinates either. 3722The effect of this property is undefined if a geometry or tessellation shader 3723are in use. 3724 3725TCS_VERTICES_OUT 3726"""""""""""""""" 3727 3728The number of vertices written by the tessellation control shader. This 3729effectively defines the patch input size of the tessellation evaluation shader 3730as well. 3731 3732TES_PRIM_MODE 3733""""""""""""" 3734 3735This sets the tessellation primitive mode, one of ``PIPE_PRIM_TRIANGLES``, 3736``PIPE_PRIM_QUADS``, or ``PIPE_PRIM_LINES``. (Unlike in GL, there is no 3737separate isolines settings, the regular lines is assumed to mean isolines.) 3738 3739TES_SPACING 3740""""""""""" 3741 3742This sets the spacing mode of the tessellation generator, one of 3743``PIPE_TESS_SPACING_*``. 3744 3745TES_VERTEX_ORDER_CW 3746""""""""""""""""""" 3747 3748This sets the vertex order to be clockwise if the value is 1, or 3749counter-clockwise if set to 0. 3750 3751TES_POINT_MODE 3752"""""""""""""" 3753 3754If set to a non-zero value, this turns on point mode for the tessellator, 3755which means that points will be generated instead of primitives. 3756 3757NUM_CLIPDIST_ENABLED 3758"""""""""""""""""""" 3759 3760How many clip distance scalar outputs are enabled. 3761 3762NUM_CULLDIST_ENABLED 3763"""""""""""""""""""" 3764 3765How many cull distance scalar outputs are enabled. 3766 3767FS_EARLY_DEPTH_STENCIL 3768"""""""""""""""""""""" 3769 3770Whether depth test, stencil test, and occlusion query should run before 3771the fragment shader (regardless of fragment shader side effects). Corresponds 3772to GLSL early_fragment_tests. 3773 3774NEXT_SHADER 3775""""""""""" 3776 3777Which shader stage will MOST LIKELY follow after this shader when the shader 3778is bound. This is only a hint to the driver and doesn't have to be precise. 3779Only set for VS and TES. 3780 3781CS_FIXED_BLOCK_WIDTH / HEIGHT / DEPTH 3782""""""""""""""""""""""""""""""""""""" 3783 3784Threads per block in each dimension, if known at compile time. If the block size 3785is known all three should be at least 1. If it is unknown they should all be set 3786to 0 or not set. 3787 3788MUL_ZERO_WINS 3789""""""""""""" 3790 3791The MUL TGSI operation (FP32 multiplication) will return 0 if either 3792of the operands are equal to 0. That means that 0 * Inf = 0. This 3793should be set the same way for an entire pipeline. Note that this 3794applies not only to the literal MUL TGSI opcode, but all FP32 3795multiplications implied by other operations, such as MAD, FMA, DP2, 3796DP3, DP4, DST, LOG, LRP, and possibly others. If there is a 3797mismatch between shaders, then it is unspecified whether this behavior 3798will be enabled. 3799 3800FS_POST_DEPTH_COVERAGE 3801"""""""""""""""""""""" 3802 3803When enabled, the input for TGSI_SEMANTIC_SAMPLEMASK will exclude samples 3804that have failed the depth/stencil tests. This is only valid when 3805FS_EARLY_DEPTH_STENCIL is also specified. 3806 3807LAYER_VIEWPORT_RELATIVE 3808""""""""""""""""""""""" 3809 3810When enabled, the TGSI_SEMATNIC_LAYER output value is relative to the 3811current viewport. This is especially useful in conjunction with 3812TGSI_SEMANTIC_VIEWPORT_MASK. 3813 3814 3815Texture Sampling and Texture Formats 3816------------------------------------ 3817 3818This table shows how texture image components are returned as (x,y,z,w) tuples 3819by TGSI texture instructions, such as :opcode:`TEX`, :opcode:`TXD`, and 3820:opcode:`TXP`. For reference, OpenGL and Direct3D conventions are shown as 3821well. 3822 3823+--------------------+--------------+--------------------+--------------+ 3824| Texture Components | Gallium | OpenGL | Direct3D 9 | 3825+====================+==============+====================+==============+ 3826| R | (r, 0, 0, 1) | (r, 0, 0, 1) | (r, 1, 1, 1) | 3827+--------------------+--------------+--------------------+--------------+ 3828| RG | (r, g, 0, 1) | (r, g, 0, 1) | (r, g, 1, 1) | 3829+--------------------+--------------+--------------------+--------------+ 3830| RGB | (r, g, b, 1) | (r, g, b, 1) | (r, g, b, 1) | 3831+--------------------+--------------+--------------------+--------------+ 3832| RGBA | (r, g, b, a) | (r, g, b, a) | (r, g, b, a) | 3833+--------------------+--------------+--------------------+--------------+ 3834| A | (0, 0, 0, a) | (0, 0, 0, a) | (0, 0, 0, a) | 3835+--------------------+--------------+--------------------+--------------+ 3836| L | (l, l, l, 1) | (l, l, l, 1) | (l, l, l, 1) | 3837+--------------------+--------------+--------------------+--------------+ 3838| LA | (l, l, l, a) | (l, l, l, a) | (l, l, l, a) | 3839+--------------------+--------------+--------------------+--------------+ 3840| I | (i, i, i, i) | (i, i, i, i) | N/A | 3841+--------------------+--------------+--------------------+--------------+ 3842| UV | XXX TBD | (0, 0, 0, 1) | (u, v, 1, 1) | 3843| | | [#envmap-bumpmap]_ | | 3844+--------------------+--------------+--------------------+--------------+ 3845| Z | XXX TBD | (z, z, z, 1) | (0, z, 0, 1) | 3846| | | [#depth-tex-mode]_ | | 3847+--------------------+--------------+--------------------+--------------+ 3848| S | (s, s, s, s) | unknown | unknown | 3849+--------------------+--------------+--------------------+--------------+ 3850 3851.. [#envmap-bumpmap] http://www.opengl.org/registry/specs/ATI/envmap_bumpmap.txt 3852.. [#depth-tex-mode] the default is (z, z, z, 1) but may also be (0, 0, 0, z) 3853 or (z, z, z, z) depending on the value of GL_DEPTH_TEXTURE_MODE. 3854