Allow early z-test and early-lrz (if applicable)
Disable early z-test and early-lrz test (if applicable)
A special mode that allows early-lrz test but disables
early-z test. Which might sound a bit funny, since
lrz-test happens before z-test. But as long as a couple
conditions are maintained this allows using lrz-test in
cases where fragment shader has kill/discard:
1) Disable lrz-write in cases where it is uncertain during
binning pass that a fragment will pass. Ie. if frag
shader has-kill, writes-z, or alpha/stencil test is
enabled. (For correctness, lrz-write must be disabled
when blend is enabled.) This is analogous to how a
z-prepass works.
2) Disable lrz-write and test if a depth-test direction
reversal is detected. Due to condition (1), the contents
of the lrz buffer are a conservative estimation of the
depth buffer during the draw pass. Meaning that geometry
that we know for certain will not be visible will not pass
lrz-test. But geometry which may be (or contributes to
blend) will pass the lrz-test.
This allows us to keep early-lrz-test in cases where the frag
shader does not write-z (ie. we know the z-value before FS)
and does not have side-effects (image/ssbo writes, etc), but
does have kill/discard. Which turns out to be a common
enough case that it is useful to keep early-lrz test against
the conservative lrz buffer to discard fragments that we
know will definitely not be visible.
b0..7 seems to contain the size of buffered by not yet processed
RB level cmdstream.. it's possible that it is a low threshold
and b8..15 is a high threshold?
b16..23 identifies where IB1 data starts (and RB data ends?)
b24..31 identifies where IB2 data starts (and IB1 data ends)
low bits identify where CP_SET_DRAW_STATE stateobj
processing starts (and IB2 data ends). I'm guessing
b8 is part of this since (from downstream kgsl):
/* ROQ sizes are twice as big on a640/a680 than on a630 */
if (adreno_is_a640(adreno_dev) || adreno_is_a680(adreno_dev)) {
kgsl_regwrite(device, A6XX_CP_ROQ_THRESHOLDS_2, 0x02000140);
kgsl_regwrite(device, A6XX_CP_ROQ_THRESHOLDS_1, 0x8040362C);
} ...
number of remaining dwords incl current dword being consumed?
number of remaining dwords incl current dword being consumed?
number of dwords that have already been read but haven't been consumed by $addr
Configures the mapping between VSC_PIPE buffer and
bin, X/Y specify the bin index in the horiz/vert
direction (0,0 is upper left, 0,1 is leftmost bin
on second row, and so on). W/H specify the number
of bins assigned to this VSC_PIPE in the horiz/vert
dimension.
Seems to be a bitmap of which tiles mapped to the VSC
pipe contain geometry.
I suppose we can connect a maximum of 32 tiles to a
single VSC pipe.
Has the size of data written to corresponding VSC_PRIM_STRM
buffer.
Has the size of data written to corresponding VSC pipe, ie.
same thing that is written out to VSC_DRAW_STRM_SIZE_ADDRESS_LO/HI
LRZ write also disabled for blend/etc.
update MAX instead of MIN value, ie. GL_GREATER/GL_GEQUAL
Z_TEST_ENABLE bit is set for zfunc other than GL_ALWAYS or GL_NEVER
also set when Z_BOUNDS_ENABLE is set
For clearing depth/stencil
1 - depth
2 - stencil
3 - depth+stencil
For clearing color buffer:
then probably a component mask, I always see 0xf
num of varyings plus four for gl_Position (plus one if gl_PointSize)
plus # of transform-feedback (streamout) varyings if using the
hw streamout (rather than stg instructions in shader)
The number of extra copies of POSITION, i.e.
number of views minus one when multi-position
output is enabled, otherwise 0.
This VPC location will be overwritten with
ViewID when multiview is enabled. It's used when
fragment shaders read ViewID. It's only
strictly required for multi-position output,
where the same VS invocation is used for all the
views at once, but it can be used when multi-pos
output is disabled too, to avoid having to pass
ViewID through the VS.
num of varyings plus four for gl_Position (plus one if gl_PointSize)
plus # of transform-feedback (streamout) varyings if using the
hw streamout (rather than stg instructions in shader)
geometry shader
size in vec4s of per-primitive storage for gs. TODO: not actually in VPC
Multi-position output lets the last geometry
stage shader write multiple copies of
gl_Position. If disabled then the VS is run once
for each view, and ViewID is passed as a
register to the VS.
bit 0 seems to toggle between 2k and 32k of shared storage
the ldl/stl offset seems to be rewritten to 0 when it is beyond
this limit. This is different from ldlw/stlw, which wraps at
64k (and has 36k of storage on A640 - reads between 36k-64k
always return 0)
per MRT
This register clears pending loads queued up by
CP_LOAD_STATE6. Each bit resets a particular kind(s) of
CP_LOAD_STATE6.
Shared constants are intended to be used for Vulkan push
constants. When enabled, 8 vec4's are reserved in the FS
const pool and 16 in the geometry const pool although
only 8 are actually used (why?) and they are mapped to
c504-c511 in each stage. Both VS and FS shared consts
are written using ST6_CONSTANTS/SB6_IBO, so that both
the geometry and FS shared consts can be written at once
by using CP_LOAD_STATE6 rather than
CP_LOAD_STATE6_FRAG/CP_LOAD_STATE6_GEOM. In addition
DST_OFF and NUM_UNIT are in units of dwords instead of
vec4's.
There is also a separate shared constant pool for CS,
which is loaded through CP_LOAD_STATE6_FRAG with
ST6_UBO/ST6_IBO. However the only real difference for CS
is the dword units.
Texture sampler dwords
Texture constant dwords
Pitch in bytes (so actually stride)
Pitch in bytes (so actually stride)