initialize CP's micro-engine
skip N 32-bit words to get to the next packet
indirect buffer dispatch. prefetch parser uses this packet
type to determine whether to pre-fetch the IB
Takes the same arguments as CP_INDIRECT_BUFFER, but jumps to
another buffer at the same level. Must be at the end of the IB, and
doesn't work with draw state IBs.
indirect buffer dispatch. same as IB, but init is pipelined
wait for the IDLE state of the engine
wait until a register or memory location is a specific value
wait until a register location is equal to a specific value
wait until a register location is >= a specific value
wait until a read completes
wait until all base/size writes from an IB_PFD packet have completed
register read/modify/write
Set binning configuration registers
reads register in chip and writes to memory
write N 32-bit words to memory
write CP_PROG_COUNTER value to memory
conditional execution of a sequence of packets
conditional write to memory or register
generate an event that creates a write to memory when completed
generate a VS|PS_done event
generate a cache flush done event
generate a z_pass done event
Real name unknown, but this seems to be what is used for
OpenCL instead of CP_DRAW_INDX.
initiate fetch of index buffer and draw
draw using supplied indices in packet
initiate fetch of index buffer and binIDs and draw
initiate fetch of bin IDs and draw using supplied indices
begin/end initiator for viz query extent processing
fetch state sub-blocks and initiate shader code DMAs
load constant into chip and to memory
load sequencer instruction memory (pointer-based)
load sequencer instruction memory (code embedded in packet)
load constants from a location in memory
selective invalidation of state pointers
dynamically changes shader instruction memory partition
sets the 64-bit BIN_MASK register in the PFP
sets the 64-bit BIN_SELECT register in the PFP
updates the current context, if needed
generate interrupt from the command stream
copy sequencer instruction memory to system memory
sets draw initiator flags register in PFP, gets bitwise-ORed into
every draw initiator
sets the register protection mode
load high level sequencer command
Conditionally load an IB based on a flag, prefetch enabled
Conditionally load an IB based on a flag, prefetch disabled
Load a buffer with pre-fetch enabled
Set bin (?)
test two memory locations against the specified dword values
Write register, ignoring context state for context sensitive registers
Record the real-time when this packet is processed by PFP
PFP waits until the FIFO between the PFP and the ME is empty
Used a bit like CP_SET_CONSTANT on a2xx, but can write multiple
groups of registers. Looks like it can be used to create state
objects in GPU memory, and on state change only emit pointer
(via CP_SET_DRAW_STATE), which should be nice for reducing CPU
overhead:
(A4x) save PM4 stream pointers to execute upon a visible draw
Enable or disable predication globally. Also resets the
predicate to "passing" and the local bit to enabled when
enabling global predication.
Enable or disable predication locally. Unlike globally enabling
predication, this packet doesn't touch any other state.
Predication only happens when enabled globally and locally and a
predicate has been set. This should be used for internal draws
which aren't supposed to use the predication state:
CP_DRAW_PRED_ENABLE_LOCAL(0)
... do draw...
CP_DRAW_PRED_ENABLE_LOCAL(1)
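The gating rule described above can be modeled in a few lines of Python. This is only a sketch of the documented behavior, not the CP firmware's logic: the class `DrawPredState`, its field names, and the `latch()` helper are hypothetical, and treating a "passing" predicate as "the draw executes" is an assumption.

```python
from dataclasses import dataclass


@dataclass
class DrawPredState:
    """Hypothetical model of the draw predication state (names are made up)."""
    global_enable: bool = False
    local_enable: bool = True
    predicate_set: bool = False   # a predicate has been latched
    predicate_pass: bool = True   # assumed meaning: draws are not skipped

    def enable_global(self, enable: bool) -> None:
        self.global_enable = enable
        if enable:
            # Enabling globally also resets the predicate to "passing"
            # and the local bit to enabled.
            self.predicate_pass = True
            self.local_enable = True

    def enable_local(self, enable: bool) -> None:
        # Unlike the global packet, this touches no other state.
        self.local_enable = enable

    def latch(self, passing: bool) -> None:
        self.predicate_set = True
        self.predicate_pass = passing

    def draw_executes(self) -> bool:
        # Predication applies only when enabled globally and locally
        # and a predicate has been set.
        predicated = (self.global_enable and self.local_enable
                      and self.predicate_set)
        return (not predicated) or self.predicate_pass
```

In this model an internal draw wrapped in `enable_local(False)` / `enable_local(True)` always executes, and the latched predicate is untouched afterwards.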
Latch a draw predicate into the internal register.
for A4xx
Write to register with address that does not fit into type-0 pkt
copy from ME scratch RAM to a register
Copy from REG to ME scratch RAM
Wait for memory writes to complete
Conditional execution based on register comparison
Memory to REG copy
for a5xx
Tells CP the current mode of GPU operation
Instruct CP to set a few internal CP registers
Load state, a3xx (and later?)
inline with the CP_LOAD_STATE packet
in buffer pointed to by EXT_SRC_ADDR
Load state, a4xx+
Load state, a6xx+
SS6_UBO used by the a6xx Vulkan blob with tessellation constants
in this case, EXT_SRC_ADDR is (ubo_id shl 16 | offset)
to load constants from a UBO loaded with DST_OFF = 14 and offset 0,
EXT_SRC_ADDR = 0xe0000
(offset is a guess, should be in bytes given that maxUniformBufferRange=64k)
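As a quick check of the encoding above, a minimal Python sketch: the helper name `ubo_ext_src_addr` is hypothetical, and the 16-bit range check on the offset is an assumption that follows the "offset is a guess" note.

```python
def ubo_ext_src_addr(ubo_id: int, offset: int) -> int:
    """Pack EXT_SRC_ADDR for the SS6_UBO case as (ubo_id shl 16 | offset)."""
    # Assumption: the offset occupies the low 16 bits (64k range).
    if not 0 <= offset < (1 << 16):
        raise ValueError("offset does not fit in 16 bits")
    if ubo_id < 0:
        raise ValueError("ubo_id must be non-negative")
    return (ubo_id << 16) | offset


# UBO loaded with DST_OFF = 14, offset 0
print(hex(ubo_ext_src_addr(14, 0)))  # 0xe0000
```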
DST_OFF same as in CP_LOAD_STATE6 - vec4 VS const at this offset will
be updated for each draw to {draw_id, first_vertex, first_instance, 0}
value of 0 disables it
Read a 64-bit value at the given address and
test if it equals/doesn't equal 0.
value at offset 0 always seems to be 0x00000000..
Like CP_SET_BIN_DATA5, but set the pointers as offsets from the
pointers stored in VSC_PIPE_{DATA,DATA2,SIZE}_ADDRESS. Useful
for Vulkan where these values aren't known when the command
stream is recorded.
Modifies DST_REG using two sources that can either be registers
or immediates. If SRC1_ADD is set, then do the following:
$dst = (($dst & $src0) rot $rotate) + $src1
Otherwise:
$dst = (($dst & $src0) rot $rotate) | $src1
Here "rot" means rotate left.
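The two formulas above can be written out as a small Python model. This is a sketch only: the function names are hypothetical, and the 32-bit register width with wrap-around addition is an assumption that the description does not state.

```python
MASK32 = 0xFFFFFFFF


def rotl32(value: int, amount: int) -> int:
    """Rotate a 32-bit value left by `amount` bits."""
    amount &= 31
    value &= MASK32
    return ((value << amount) | (value >> (32 - amount))) & MASK32


def reg_rmw(dst: int, src0: int, src1: int, rotate: int, src1_add: bool) -> int:
    """Model of the read/modify/write: returns the new value of $dst."""
    tmp = rotl32(dst & src0, rotate)
    if src1_add:
        # $dst = (($dst & $src0) rot $rotate) + $src1 (assumed to wrap at 32 bits)
        return (tmp + src1) & MASK32
    # $dst = (($dst & $src0) rot $rotate) | $src1
    return (tmp | src1) & MASK32


print(hex(reg_rmw(0x000000F0, 0xFF, 0x1, 4, src1_add=False)))  # 0xf01
print(hex(reg_rmw(0x80000001, MASK32, 0x0, 1, src1_add=True)))  # 0x3
```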
Like CP_REG_TO_MEM, but the memory address to write to can be
offset using either one or two registers or scratch
registers.
Like CP_REG_TO_MEM, but the memory address to write to can be
offset using a DWORD in memory.
Wait until a memory value is greater than or equal to the
reference, using signed comparison.
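The signed comparison matters when the high bit of either value is set. A minimal Python sketch of the wait condition follows, assuming 32-bit two's-complement dwords; the names `to_signed32` and `wait_mem_gte`, the `read_dword` callback, and the poll limit are all hypothetical stand-ins for the hardware behavior.

```python
def to_signed32(value: int) -> int:
    """Reinterpret the low 32 bits of `value` as a two's-complement integer."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value & 0x80000000 else value


def wait_mem_gte(read_dword, ref: int, max_polls: int = 1000) -> int:
    """Poll until the memory value is >= ref (signed); return the poll count."""
    for polls in range(1, max_polls + 1):
        if to_signed32(read_dword()) >= to_signed32(ref):
            return polls
    raise TimeoutError("memory value never reached the reference")


# 0xFFFFFFFF is -1 when signed, so it does not satisfy ">= 0";
# an unsigned comparison would have passed on the first poll.
values = iter([0xFFFFFFFF, 0x00000000])
print(wait_mem_gte(lambda: next(values), ref=0))  # 2
```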
This uses the same internal comparison as CP_COND_WRITE,
but waits until the comparison is true instead. It busy-loops in
the CP for the given number of cycles before trying again.
Waits for REG0 to not be 0 or REG1 to not equal REF
Tell CP the current operation mode, indicates save and restore procedure
Set internal CP registers, used to indicate context save data addresses
Tests bit in specified register and sets predicate for CP_COND_REG_EXEC.
So:
opcode: CP_REG_TEST (39) (2 dwords)
{ REG = 0xc10 | BIT = 0 }
0000: 70b90001 00000c10
opcode: CP_COND_REG_EXEC (47) (3 dwords)
0000: 70c70002 10000000 00000004
opcode: CP_INDIRECT_BUFFER (3f) (4 dwords)
Will execute the CP_INDIRECT_BUFFER only if bit 0 in the register at
offset 0x0c10 is 1
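The header dwords in the dump above (70b90001 and 70c70002) can be picked apart with a short Python sketch. The field layout used here is an assumption, chosen because it is consistent with both sample headers: packet type in bits 31:28, a 7-bit opcode in bits 22:16 with an odd-parity bit at bit 23, and the payload dword count in the low bits with an odd-parity bit at bit 15. The function names are hypothetical.

```python
def odd_parity_bit(value: int) -> int:
    """Return the bit that makes the total number of set bits odd."""
    return 0 if bin(value).count("1") % 2 else 1


def decode_type7_header(header: int) -> dict:
    """Split a header dword into its assumed fields and check both parity bits."""
    opcode = (header >> 16) & 0x7F
    count = header & 0x3FFF  # payload dwords, not counting the header itself
    return {
        "type": header >> 28,
        "opcode": opcode,
        "count": count,
        "opcode_parity_ok": ((header >> 23) & 1) == odd_parity_bit(opcode),
        "count_parity_ok": ((header >> 15) & 1) == odd_parity_bit(count),
    }


for raw in (0x70B90001, 0x70C70002):
    info = decode_type7_header(raw)
    print(f"{raw:08x}: opcode=0x{info['opcode']:02x} payload_dwords={info['count']}")
```

Under this layout the decoded opcodes are 0x39 and 0x47, and the counts of 1 and 2 payload dwords plus the header match the "(2 dwords)" and "(3 dwords)" totals shown in the dump.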
Executes the following DWORDs of commands if the dword at ADDR0
is not equal to 0 and the dword at ADDR1 is less than REF
(signed comparison).
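The condition above, written out as a Python predicate. This is a sketch under the assumption that all three values are 32-bit dwords; the function names are hypothetical.

```python
def to_signed32(value: int) -> int:
    """Reinterpret the low 32 bits of `value` as a two's-complement integer."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value & 0x80000000 else value


def cond_exec_passes(dword_at_addr0: int, dword_at_addr1: int, ref: int) -> bool:
    """True if the following DWORDs of commands would be executed."""
    nonzero = (dword_at_addr0 & 0xFFFFFFFF) != 0
    less_than_ref = to_signed32(dword_at_addr1) < to_signed32(ref)
    return nonzero and less_than_ref


print(cond_exec_passes(1, 0xFFFFFFFF, 0))  # True: 0xFFFFFFFF is -1 when signed
print(cond_exec_passes(0, 0, 1))           # False: dword at ADDR0 is 0
print(cond_exec_passes(1, 5, 5))           # False: 5 is not less than 5
```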
Used by the userspace driver to set various IBs which are
executed during context save/restore for handling state that
isn't restored by the context switch routine itself.
Executed unconditionally when switching back to the context.
Executed when switching back after switching away during
execution of a CP_SET_MARKER packet with RM6_YIELD as the
payload *and* the normal save routine was bypassed for a
shorter one. I think this is connected to the
"skipsaverestore" bit set by the kernel when preempting.
Executed when switching away from the context, except for
context switches initiated via CP_YIELD.
This can only be set by the RB (i.e. the kernel) and executes
with protected mode off, but is otherwise similar to SAVE_IB.
Keep shadow copies of these registers and only set them
when drawing, avoiding redundant writes:
- VPC_CNTL_0
- HLSQ_CONTROL_1_REG
- HLSQ_UNKNOWN_B980
Track RB_RENDER_CNTL, and insert a WFI in the following
situation:
- There is a write that disables binning
- There was a draw with binning left enabled, but in
BYPASS mode
Presumably this is a hang workaround?
Do a mysterious CP_EVENT_WRITE 0x3f when the low bit of
the data to write is 0. Used by the Vulkan blob with
PC_MULTIVIEW_CNTL, but this isn't predicated on particular
register(s) like the others.
Note that the SMMU's definition of TTBRn can take different forms
depending on the pgtable format. But a5xx+ only uses aarch64
format.
Unused, does not apply to aarch64 pgtable format