initialize CP's micro-engine
skip N 32-bit words to get to the next packet
indirect buffer dispatch. prefetch parser uses this packet type to determine whether to pre-fetch the IB
Takes the same arguments as CP_INDIRECT_BUFFER, but jumps to another buffer at the same level. Must be at the end of IB, and doesn't work with draw state IB's.
indirect buffer dispatch. same as IB, but init is pipelined
wait for the IDLE state of the engine
wait until a register or memory location is a specific value
wait until a register location is equal to a specific value
wait until a register location is >= a specific value
wait until a read completes
wait until all base/size writes from an IB_PFD packet have completed
register read/modify/write
Set binning configuration registers
reads register in chip and writes to memory
write N 32-bit words to memory
write CP_PROG_COUNTER value to memory
conditional execution of a sequence of packets
conditional write to memory or register
generate an event that creates a write to memory when completed
generate a VS|PS_done event
generate a cache flush done event
generate a z_pass done event
not sure the real name, but this seems to be what is used for opencl, instead of CP_DRAW_INDX..
initiate fetch of index buffer and draw
draw using supplied indices in packet
initiate fetch of index buffer and binIDs and draw
initiate fetch of bin IDs and draw using supplied indices
begin/end initiator for viz query extent processing
fetch state sub-blocks and initiate shader code DMAs
load constant into chip and to memory
load sequencer instruction memory (pointer-based)
load sequencer instruction memory (code embedded in packet)
load constants from a location in memory
selective invalidation of state pointers
dynamically changes shader instruction memory partition
sets the 64-bit BIN_MASK register in the PFP
sets the 64-bit BIN_SELECT register in the PFP
updates the current context, if needed
generate interrupt from the command stream
copy sequencer instruction memory to system memory
sets draw initiator flags register in PFP, gets bitwise-ORed into every draw initiator
sets the register protection mode
load high level sequencer command
Conditionally load an IB based on a flag, prefetch enabled
Conditionally load an IB based on a flag, prefetch disabled
Load a buffer with pre-fetch enabled
Set bin (?)
test 2 memory locations to dword values specified
Write register, ignoring context state for context sensitive registers
Record the real-time when this packet is processed by PFP
PFP waits until the FIFO between the PFP and the ME is empty
Used a bit like CP_SET_CONSTANT on a2xx, but can write multiple groups of registers. Looks like it can be used to create state objects in GPU memory, and on state change only emit pointer (via CP_SET_DRAW_STATE), which should be nice for reducing CPU overhead:
(A4x) save PM4 stream pointers to execute upon a visible draw
Enable or disable predication globally. Also resets the predicate to "passing" and the local bit to enabled when enabling global predication.
Enable or disable predication locally. Unlike globally enabling predication, this packet doesn't touch any other state.
Predication only happens when enabled globally and locally and a predicate has been set. This should be used for internal draws which aren't supposed to use the predication state:
    CP_DRAW_PRED_ENABLE_LOCAL(0)
    ... do draw...
    CP_DRAW_PRED_ENABLE_LOCAL(1)
Latch a draw predicate into the internal register.
for A4xx
Write to register with address that does not fit into type-0 pkt
copy from ME scratch RAM to a register
Copy from REG to ME scratch RAM
Wait for memory writes to complete
Conditional execution based on register comparison
Memory to REG copy
for a5xx
Tells CP the current mode of GPU operation
Instruct CP to set a few internal CP registers
Load state, a3xx (and later?)
inline with the CP_LOAD_STATE packet
in buffer pointed to by EXT_SRC_ADDR
Load state, a4xx+
Load state, a6xx+
SS6_UBO is used by the a6xx Vulkan blob with tessellation constants. In this case, EXT_SRC_ADDR is (ubo_id shl 16 | offset). To load constants from a UBO loaded with DST_OFF = 14 and offset 0, EXT_SRC_ADDR = 0xe0000 (offset is a guess, should be in bytes given that maxUniformBufferRange=64k)
DST_OFF is the same as in CP_LOAD_STATE6: the vec4 VS const at this offset will be updated for each draw to {draw_id, first_vertex, first_instance, 0}. A value of 0 disables it.
Read a 64-bit value at the given address and test if it equals/doesn't equal 0.
value at offset 0 always seems to be 0x00000000..
Like CP_SET_BIN_DATA5, but set the pointers as offsets from the pointers stored in VSC_PIPE_{DATA,DATA2,SIZE}_ADDRESS. Useful for Vulkan where these values aren't known when the command stream is recorded.
Modifies DST_REG using two sources that can either be registers or immediates. If SRC1_ADD is set, then do the following:
    $dst = (($dst & $src0) rot $rotate) + $src1
Otherwise:
    $dst = (($dst & $src0) rot $rotate) | $src1
Here "rot" means rotate left.
Like CP_REG_TO_MEM, but the memory address to write to can be offset using either one or two registers or scratch registers.
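The DST_REG update rule above (mask with $src0, rotate left, then add or OR $src1 depending on SRC1_ADD) can be modelled in a few lines. This is only a sketch of the arithmetic, not of the packet encoding. It assumes 32-bit registers and that the add wraps modulo 2^32, neither of which is stated in the text; the function names `rotl32` and `reg_rmw` are made up for illustration.

```python
MASK32 = 0xFFFFFFFF  # assumption: registers are 32 bits wide


def rotl32(value, amount):
    """Rotate a 32-bit value left by `amount` bits."""
    amount %= 32
    value &= MASK32
    return ((value << amount) | (value >> (32 - amount))) & MASK32


def reg_rmw(dst, src0, src1, rotate, src1_add):
    """Model of: $dst = (($dst & $src0) rot $rotate) (+ or |) $src1."""
    masked = rotl32(dst & src0, rotate)
    if src1_add:
        # assumption: the add wraps around at 32 bits
        return (masked + src1) & MASK32
    return (masked | src1) & MASK32
```

For instance, with `src1_add` false, `reg_rmw(0xF0, 0xFFFFFFFF, 0x1, 4, False)` keeps all bits of `0xF0`, rotates them left by four and ORs in bit 0.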
Like CP_REG_TO_MEM, but the memory address to write to can be offset using a DWORD in memory.
Wait until a memory value is greater than or equal to the reference, using signed comparison.
This uses the same internal comparison as CP_COND_WRITE, but waits until the comparison is true instead. It busy-loops in the CP for the given number of cycles before trying again.
Waits for REG0 to not be 0 or REG1 to not equal REF
Tell CP the current operation mode, indicates save and restore procedure
Set internal CP registers, used to indicate context save data addresses
Tests bit in specified register and sets predicate for CP_COND_REG_EXEC. So:
    opcode: CP_REG_TEST (39) (2 dwords)
        { REG = 0xc10 | BIT = 0 }
        0000: 70b90001 00000c10
    opcode: CP_COND_REG_EXEC (47) (3 dwords)
        0000: 70c70002 10000000 00000004
    opcode: CP_INDIRECT_BUFFER (3f) (4 dwords)
Will execute the CP_INDIRECT_BUFFER only if b0 in the register at offset 0x0c10 is 1
Executes the following DWORDs of commands if the dword at ADDR0 is not equal to 0 and the dword at ADDR1 is less than REF (signed comparison).
Used by the userspace driver to set various IBs which are executed during context save/restore for handling state that isn't restored by the context switch routine itself.
Executed unconditionally when switching back to the context.
Executed when switching back after switching away during execution of a CP_SET_MARKER packet with RM6_YIELD as the payload *and* the normal save routine was bypassed for a shorter one. I think this is connected to the "skipsaverestore" bit set by the kernel when preempting.
Executed when switching away from the context, except for context switches initiated via CP_YIELD.
This can only be set by the RB (i.e. the kernel) and executes with protected mode off, but is otherwise similar to SAVE_IB.
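The header dwords in the CP_REG_TEST / CP_COND_REG_EXEC dump above (70b90001 and 70c70002) can be checked against the opcodes and dword counts printed next to them. The sketch below assumes the type-7 header layout: packet type in bits 31:28, opcode in bits 22:16, payload dword count in bits 14:0, with bits 23 and 15 treated as parity bits and ignored. That layout is not spelled out in the text above, so treat it as an assumption; `decode_type7_header` is a hypothetical helper name.

```python
def decode_type7_header(header):
    """Split a packet header dword into (type, opcode, payload_dwords).

    Assumed layout (not stated in the text above):
      bits 31:28  packet type (7 for these packets)
      bit  23     parity of the opcode (ignored here)
      bits 22:16  opcode
      bit  15     parity of the count (ignored here)
      bits 14:0   number of payload dwords following the header
    """
    pkt_type = (header >> 28) & 0xF
    opcode = (header >> 16) & 0x7F
    payload_dwords = header & 0x7FFF
    return pkt_type, opcode, payload_dwords


def total_dwords(header):
    """Header dword plus payload, matching the '(N dwords)' in the dump."""
    return 1 + decode_type7_header(header)[2]
```

Under that assumption, 0x70b90001 decodes to opcode 0x39 with one payload dword (2 dwords in total) and 0x70c70002 to opcode 0x47 with two payload dwords (3 in total), which agrees with the dump.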
Keep shadow copies of these registers and only set them when drawing, avoiding redundant writes:
- VPC_CNTL_0
- HLSQ_CONTROL_1_REG
- HLSQ_UNKNOWN_B980
Track RB_RENDER_CNTL, and insert a WFI in the following situation:
- There is a write that disables binning
- There was a draw with binning left enabled, but in BYPASS mode
Presumably this is a hang workaround?
Do a mysterious CP_EVENT_WRITE 0x3f when the low bit of the data to write is 0. Used by the Vulkan blob with PC_MULTIVIEW_CNTL, but this isn't predicated on particular register(s) like the others.
Note that the SMMU's definition of TTBRn can take different forms depending on the pgtable format. But a5xx+ only uses aarch64 format.
Unused, does not apply to aarch64 pgtable format