cat3 src1 and src2. Some parts are similar to the cat2/cat4 src
encoding, but a few extra bits are trimmed out to squeeze in the
3rd src register: (abs) and the immed encoding are dropped, and
a few other bits are moved elsewhere.
{HALF}{SRC}
00000
{IMMED_ENCODING}
{IMMED}
1
{HALF}c{CONST}.{SWIZ}
10
01
{HALF}r<a0.x + {OFFSET}>
0
{HALF}c<a0.x + {OFFSET}>
1
{SY}{SS}{JP}{SAT}(nop{NOP}) {UL}{NAME} {DST_HALF}{DST}, {SRC1_NEG}{SRC1}, {SRC2_NEG}{HALF}{SRC2}, {SRC3_NEG}{SRC3}
{SY}{SS}{JP}{SAT}{REPEAT}{UL}{NAME} {DST_HALF}{DST}, {SRC1_NEG}{SRC1_R}{SRC1}, {SRC2_NEG}{SRC2_R}{HALF}{SRC2}, {SRC3_NEG}{SRC3_R}{SRC3}
011
0
The source precision is determined by the instruction
opcode. If {DST_CONV} is set, the result is widened/narrowed
to the opposite precision.
The difference is that this cat3 version does not support plain
const registers as src1/src3 but does support immediate values.
On the other hand, it still supports relative gpr and consts.
1
0000
0001
0010
0011
0100
0101
0110
0111
1000
1001
1010
1011
1100
1101
1110
1111
(src2 >> src1) & src3
1000
(src2 << src1) & src3
1001
(src2 >> src1) | src3
1010
(src2 << src1) | src3
1011
(src2 & src1) | src3
1100
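The five shift/bitwise expressions above can be sketched as plain integer arithmetic. This is a minimal illustration, not the hardware definition: the function names are hypothetical, and it assumes 32-bit unsigned operands with a logical (zero-filling) right shift. Behavior for shift amounts of 32 or more is not stated in the source, so the sketch does not model it specially.

```python
MASK32 = 0xFFFFFFFF  # assumption: operands are 32-bit unsigned values


def shift_right_mask(src1, src2, src3):
    """(src2 >> src1) & src3"""
    return ((src2 & MASK32) >> src1) & src3 & MASK32


def shift_left_mask(src1, src2, src3):
    """(src2 << src1) & src3"""
    return ((src2 & MASK32) << src1) & src3 & MASK32


def shift_right_or(src1, src2, src3):
    """(src2 >> src1) | src3"""
    return (((src2 & MASK32) >> src1) | src3) & MASK32


def shift_left_or(src1, src2, src3):
    """(src2 << src1) | src3"""
    return (((src2 & MASK32) << src1) | src3) & MASK32


def and_or(src1, src2, src3):
    """(src2 & src1) | src3"""
    return ((src2 & src1) | src3) & MASK32
```

Note the operand order: src1 is the shift amount and src2 is the value being shifted.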
{SY}{SS}{JP}{SAT}(nop{NOP}) {UL}{NAME}{SRC_SIGN}{SRC_PACK} {DST}, {SRC1}, {SRC2}, {SRC3_NEG}{SRC3}
1
Given:
SRC1 is an i8vec2 or u8vec2
SRC2 is a u8vec2
SRC1 and SRC2 are packed into low or high halves of the registers.
SRC3 is an int32_t or uint32_t
Do:
DST = dot(SRC1, SRC2) + SRC3
0
1101
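The vec2 dot-product-accumulate described above can be sketched as follows. This is an illustration under stated assumptions, not the hardware definition: the helper names and the `src1_signed` / `high` parameters are hypothetical, component 0 is assumed to sit in the low byte of the selected 16-bit half, both sources are assumed to use the same half, and the result is assumed to wrap to 32 bits.

```python
def unpack_8x2(reg, high, signed):
    """Extract two 8-bit components from one 16-bit half of a 32-bit register."""
    half = (reg >> 16) & 0xFFFF if high else reg & 0xFFFF
    # Assumption: component 0 is the low byte, component 1 the high byte.
    comps = [half & 0xFF, (half >> 8) & 0xFF]
    if signed:
        # Reinterpret each byte as a two's-complement int8.
        comps = [c - 256 if c >= 128 else c for c in comps]
    return comps


def dp2acc(src1, src2, src3, src1_signed=False, high=False):
    """DST = dot(SRC1, SRC2) + SRC3, with SRC2 always unsigned."""
    a = unpack_8x2(src1, high, src1_signed)
    b = unpack_8x2(src2, high, False)
    dot = sum(x * y for x, y in zip(a, b))
    # Assumption: the accumulate wraps to 32 bits.
    return (dot + src3) & 0xFFFFFFFF
```

For example, with SRC1 = (1, 2), SRC2 = (3, 4) and SRC3 = 10, `dp2acc(0x0201, 0x0403, 10)` computes 1*3 + 2*4 + 10.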
Same as dp2acc but for vec4 instead of vec2.
Corresponds to packed variants of OpUDotKHR and OpSUDotKHR.
1
1101
(!{DST_FULL})
1
Given:
SRC1 = (x_1, x_2, x_3, x_4) - 4 consecutive registers
SRC2 = (y_1, y_2, y_3, y_4) - 4 consecutive registers
SRC3 is an immediate in range of [0, 160]
Do:
float y_sum = y_1 + y_2 + y_3 + y_4
vec4 result = (x_1 * y_sum, x_2 * y_sum, x_3 * y_sum, x_4 * y_sum)
Starting from the DST reg, duplicate *result* into consecutive
registers (1 << (SRC3 / 32)) times.
0
1110
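The wmm computation described above can be sketched as below. This is a rough model under assumptions: the function name and the list-based register representation are hypothetical, Python floats (double precision) stand in for the hardware float type, and SRC3 / 32 is taken to be integer division.

```python
def wmm(x, y, src3):
    """Return the values written to consecutive registers starting at DST.

    x, y: four values each, standing in for 4 consecutive registers.
    src3: immediate in the range [0, 160].
    """
    if len(x) != 4 or len(y) != 4:
        raise ValueError("SRC1 and SRC2 must each cover 4 registers")
    if not 0 <= src3 <= 160:
        raise ValueError("SRC3 must be in [0, 160]")

    y_sum = y[0] + y[1] + y[2] + y[3]
    result = [xi * y_sum for xi in x]

    # Assumption: SRC3 / 32 is integer division, giving 1, 2, 4, 8, 16
    # or 32 copies of the vec4 result.
    copies = 1 << (src3 // 32)
    return result * copies
```

With SRC3 = 0 a single vec4 is written; with SRC3 = 160 the sketch produces 32 copies, i.e. 128 consecutive register values.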
Same as wmm, but instead of overwriting DST the result is
added to the DST registers. However, the first reg of the
result is always overwritten.
1
1110