Unlike other instruction categories, cat1 can have a relative destination.
({OFFSET} == 0) && {DST_REL}
r<a0.x>
{DST_REL}
r<a0.x + {OFFSET}>
{DST}
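The three display alternatives above choose how the destination is printed. A relative destination with a zero offset prints as `r<a0.x>`, a relative destination with a nonzero offset prints as `r<a0.x + N>`, and anything else prints as a plain register. The following is a minimal sketch of that selection. `format_cat1_dst` and its parameters are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, not taken from any real disassembler: renders a
 * cat1 destination following the three display alternatives.
 * `dst_name` is the already-formatted register used when the
 * destination is not relative. */
void format_cat1_dst(bool dst_rel, int offset, const char *dst_name,
                     char *buf, size_t len)
{
    assert(buf != NULL && len > 0);
    if (offset == 0 && dst_rel)
        snprintf(buf, len, "r<a0.x>");               /* relative, no offset */
    else if (dst_rel)
        snprintf(buf, len, "r<a0.x + %d>", offset);  /* relative with offset */
    else
        snprintf(buf, len, "%s", dst_name);          /* plain register */
}
```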
0
001
({SRC_TYPE} == 0) /* f16 */ ||
({SRC_TYPE} == 2) /* u16 */ ||
({SRC_TYPE} == 4) /* s16 */ ||
({SRC_TYPE} == 6) /* u8 */ ||
({SRC_TYPE} == 7) /* s8 */
({DST_TYPE} == 0) /* f16 */ ||
({DST_TYPE} == 2) /* u16 */ ||
({DST_TYPE} == 4) /* s16 */ ||
({DST_TYPE} == 6) /* u8 */ ||
({DST_TYPE} == 7) /* s8 */
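The two expressions above test for the same set of type codes (f16, u16, s16, u8, s8), which are all 16 bits wide or narrower. One plausible reading, which is an assumption here, is that they decide whether the `h` half-register prefix (`{HALF}` / `{DST_HALF}`) is printed for the source and destination. A small sketch of the predicate follows. The enum and function names are hypothetical, and the code values come from the comments in the expressions.

```c
#include <assert.h>
#include <stdbool.h>

/* Type codes as annotated in the expressions (names are hypothetical). */
enum cat1_type {
    TYPE_F16 = 0, TYPE_F32 = 1, TYPE_U16 = 2, TYPE_U32 = 3,
    TYPE_S16 = 4, TYPE_S32 = 5, TYPE_U8 = 6, TYPE_S8 = 7,
};

/* True for the 16-bit-or-narrower types the expressions test for. */
bool cat1_type_is_half(unsigned type)
{
    assert(type <= 7);  /* the type fields are 3 bits wide */
    return type == TYPE_F16 || type == TYPE_U16 || type == TYPE_S16 ||
           type == TYPE_U8  || type == TYPE_S8;
}
```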
({DST} == 0xf4 /* a0.x */) && ({SRC_TYPE} == 4 /* s16 */) && ({DST_TYPE} == 4)
{SY}{SS}{JP}{REPEAT}{UL}mova {ROUND}a0.x, {SRC}
11110100
100
100
({DST} == 0xf5 /* a1.x */) && ({SRC_TYPE} == 2 /* u16 */) && ({DST_TYPE} == 2)
{SY}{SS}{JP}{REPEAT}{UL}mova1 {ROUND}a1.x, {SRC}
11110101
010
010
{SRC_TYPE} != {DST_TYPE}
{SY}{SS}{JP}{REPEAT}{UL}cov.{SRC_TYPE}{DST_TYPE} {ROUND}{DST_HALF}{DST}, {SRC}
{SY}{SS}{JP}{REPEAT}{UL}mov.{SRC_TYPE}{DST_TYPE} {ROUND}{DST_HALF}{DST}, {SRC}
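The rules above choose between four mnemonics. `mova` is used for an s16-to-s16 move into register 0xf4 (a0.x), `mova1` for a u16-to-u16 move into 0xf5 (a1.x), `cov` whenever the source and destination types differ, and `mov` otherwise. The sketch below assumes the special cases are checked in that order before the generic rule. `cat1_mnemonic` is a hypothetical name, and the sketch ignores the flag and rounding prefixes.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: picks the mnemonic the display rules would print.
 * The mova/mova1 special cases are checked first (an assumption about
 * evaluation order). */
const char *cat1_mnemonic(unsigned dst, unsigned src_type, unsigned dst_type)
{
    assert(src_type <= 7 && dst_type <= 7);
    if (dst == 0xf4 && src_type == 4 && dst_type == 4)
        return "mova";   /* a0.x, s16 -> s16 */
    if (dst == 0xf5 && src_type == 2 && dst_type == 2)
        return "mova1";  /* a1.x, u16 -> u16 */
    if (src_type != dst_type)
        return "cov";    /* conversion between types */
    return "mov";        /* same type on both sides */
}
```

For example, an s16-to-s16 move into a0.x is printed as `mova`. The same move into an ordinary register falls through to `mov`.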
00
{SRC_TYPE} == 0 /* f16 */
h({IMMED})
{SRC_TYPE} == 1 /* f32 */
({IMMED})
({SRC_TYPE} == 3 /* u32 */) && ({IMMED} > 0x1000)
0x{IMMED}
{SRC_TYPE} == 4 /* s16 */
{SRC_TYPE} == 5 /* s32 */
{IMMED}
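The immediate display rules above depend on the source type. f16 immediates are wrapped in `h(...)`, f32 immediates in `(...)`, u32 immediates above 0x1000 are printed as hex, and the signed types are printed as plain values. The sketch below covers only the integer branches, because the float formatting is not spelled out here. Two points are assumptions: that a u32 value of 0x1000 or less falls back to decimal, and that s32 is the raw 32-bit field read as two's complement. `format_cat1_int_immed` is a hypothetical name.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical sketch of the integer immediate display rules.
 * `bits` is the raw 32-bit immediate field. */
void format_cat1_int_immed(unsigned src_type, uint32_t bits,
                           char *buf, size_t len)
{
    assert(src_type == 3 || src_type == 5);  /* u32 or s32 only */
    if (src_type == 3 && bits > 0x1000)
        snprintf(buf, len, "0x%" PRIx32, bits);        /* large u32: hex */
    else if (src_type == 3)
        snprintf(buf, len, "%" PRIu32, bits);          /* assumed decimal fallback */
    else
        snprintf(buf, len, "%" PRId32, (int32_t)bits); /* s32: signed */
}
```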
{SRC_R}{HALF}{CONST}
{SRC_R}{HALF}{SRC}
{SRC_R}{HALF}{SRC}
{SRC_R}{HALF}{SRC}
0
10
000000000000000000000
01
000000000000000000000000
00
1
00000000000000000000
00
0
1
{HALF}{REG}
{DST_HALF}{REG}
These instructions all expand to a series of mov instructions,
like (rptN) but more flexible. They aren't any faster than the
equivalent sequence of mov/cov, but they guarantee that all
sources are read before any destination is written, so they
behave as if the moves were executed in parallel.
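The "read all sources, then write all destinations" guarantee is what separates these instructions from a plain sequence of movs. A minimal sketch of that rule follows. It models registers as a hypothetical flat array `regs` indexed by register number, and `parallel_mov` is an illustrative name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_PARALLEL_MOVES 4

/* Sketch of the as-if-parallel rule: every source is read into a
 * temporary before any destination is written. */
void parallel_mov(uint32_t *regs, const unsigned *dst, const unsigned *src,
                  size_t n)
{
    uint32_t tmp[MAX_PARALLEL_MOVES];
    assert(n <= MAX_PARALLEL_MOVES);
    for (size_t i = 0; i < n; i++)
        tmp[i] = regs[src[i]];   /* read phase */
    for (size_t i = 0; i < n; i++)
        regs[dst[i]] = tmp[i];   /* write phase */
}
```

As an example, rotating r0 <- r1, r1 <- r2, r2 <- r0 starting from {1, 2, 3} yields {2, 3, 1}. A naive sequential loop would instead copy the already-overwritten r0 into r2 and end with {2, 3, 2}.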
0
0
00
10
SWiZzle. Move SRC0 to DST0 and SRC1 to DST1 in parallel. In
particular, this can be used to swap two registers.
{SY}{SS}{JP}{UL}swz.{SRC_TYPE}{DST_TYPE} {ROUND}{DST0}, {DST1}, {SRC0}, {SRC1}
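The swap use case works because both sources are read before either destination is written. The sketch below models `swz` over a hypothetical flat register array. Its argument order follows the display template (DST0, DST1, SRC0, SRC1), and type conversion and rounding are left out.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of swz: DST0 <- SRC0 and DST1 <- SRC1, with both
 * sources read before either destination is written. */
void swz(uint32_t *regs, unsigned dst0, unsigned dst1,
         unsigned src0, unsigned src1)
{
    uint32_t v0 = regs[src0];   /* read both sources first */
    uint32_t v1 = regs[src1];
    regs[dst0] = v0;            /* then write both destinations */
    regs[dst1] = v1;
}
```

Calling `swz(regs, 0, 1, 1, 0)` exchanges registers 0 and 1 without needing a scratch register.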
00000000
00
GATher. Move SRC0 to DST0, SRC1 to DST0 + 1, SRC2 to DST0 + 2, and SRC3 to DST0 + 3.
{SY}{SS}{JP}{UL}gat.{SRC_TYPE}{DST_TYPE} {ROUND}{DST0}, {SRC0}, {SRC1}, {SRC2}, {SRC3}
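In other words, `gat` collects four arbitrary sources into four consecutive destinations. The sketch below assumes that "DST0 + i" means the next register number and treats registers as flat indices into a hypothetical array `regs`. Type conversion and rounding are omitted.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of gat: SRCi -> DST0 + i for i = 0..3, reading all
 * sources before writing any destination. */
void gat(uint32_t *regs, unsigned dst0, const unsigned src[4])
{
    uint32_t tmp[4];
    for (int i = 0; i < 4; i++)
        tmp[i] = regs[src[i]];      /* read all sources first */
    for (int i = 0; i < 4; i++)
        regs[dst0 + i] = tmp[i];    /* then write DST0 + i */
}
```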
01
SCaTter. Move SRC0 to DST0, SRC0 + 1 to DST1, SRC0 + 2 to DST2, and SRC0 + 3 to DST3.
{SY}{SS}{JP}{UL}sct.{SRC_TYPE}{DST_TYPE} {ROUND}{DST0}, {DST1}, {DST2}, {DST3}, {SRC0}
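`sct` is the mirror image of `gat`: four consecutive sources are spread across four arbitrary destinations. As with the gat sketch, this assumes flat register numbering in a hypothetical array `regs` and omits type conversion.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of sct: SRC0 + i -> DSTi for i = 0..3, reading all
 * sources before writing any destination. */
void sct(uint32_t *regs, const unsigned dst[4], unsigned src0)
{
    uint32_t tmp[4];
    for (int i = 0; i < 4; i++)
        tmp[i] = regs[src0 + i];    /* read SRC0 + i */
    for (int i = 0; i < 4; i++)
        regs[dst[i]] = tmp[i];      /* write DSTi */
}
```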
10
{SY}{SS}{JP}{UL}movmsk.w{W} {DST}
({REPEAT} + 1) * 32
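The expression above appears to give the `W` shown in the `movmsk.w{W}` mnemonic. On that reading, a repeat count of 0 prints as `movmsk.w32`, 1 as `movmsk.w64`, and so on. The sketch below works under that assumption. Both function names are hypothetical, and the formatter drops the `{SY}{SS}{JP}{UL}` prefixes for brevity.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* W for movmsk, derived from the repeat count as (REPEAT + 1) * 32. */
unsigned movmsk_width(unsigned repeat)
{
    return (repeat + 1) * 32;
}

/* Hypothetical formatter: prints only the opcode and destination,
 * omitting the flag prefixes. */
void format_movmsk(unsigned repeat, const char *dst, char *buf, size_t len)
{
    snprintf(buf, len, "movmsk.w%u %s", movmsk_width(repeat), dst);
}
```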
00000000000000000000000000000000
0
011
011
00
11