Low Resolution Z Buffer
=======================

This doc is based on reverse engineering of a6xx hardware; a5xx should be
similar to a6xx before gen3.

The Low Resolution Z buffer is very similar to a depth prepass: it helps
the HW avoid executing the fragment shader for fragments that would later
be discarded by the depth test.

The interesting part of this feature is that it allows applications
to submit the vertices in any order.

Citing official Adreno documentation:

::

   [A Low Resolution Z (LRZ)] pass is also referred to as draw order independent
   depth rejection. During the binning pass, a low resolution Z-buffer is constructed,
   and can reject LRZ-tile wide contributions to boost binning performance. This LRZ
   is then used during the rendering pass to reject pixels efficiently before testing
   against the full resolution Z-buffer.

TODO: a7xx

Limitations
-----------

There are two main limitations of LRZ:

- Since LRZ is an early depth test, it cannot be used when late-z is required;
- The LRZ buffer can only be formed in one direction; changing the depth comparison
  direction without disabling LRZ would lead to a malformed LRZ buffer.

Pre-a650 (before gen3)
----------------------

The direction is fully tracked on the CPU. In a render pass, LRZ starts with an
unknown direction. The direction is set the first time a depth write occurs,
and if it changes afterwards, the direction becomes invalid and LRZ is
disabled for the rest of the render pass.

Since the direction is not tracked by the GPU, it's impossible to know whether
LRZ is enabled while secondary command buffers are being recorded.

For the same reason, it's impossible to reuse LRZ between render passes.

A650+ (gen3+)
-------------

On A650+, the LRZ direction can be tracked on the GPU. There are two parts:

- A direction byte which stores the current LRZ direction - ``GRAS_LRZ_CNTL.DIR``.
- Parameters of the last used depth view - ``GRAS_LRZ_DEPTH_VIEW``.

The idea is the same as when LRZ is tracked on the CPU: when ``GRAS_LRZ_CNTL``
is used, its direction is compared to the previously known direction, and the
direction byte is set to disabled when the directions are incompatible.

Additionally, to reuse LRZ between render passes, ``GRAS_LRZ_CNTL`` checks
whether the current value of ``GRAS_LRZ_DEPTH_VIEW`` is equal to the value
stored in the buffer. If not, LRZ is disabled. This is necessary
because a depth buffer may have several layers and mip levels, while the
LRZ buffer represents only a single layer + mip level.

LRZ Fast-Clear
--------------

The LRZ fast-clear buffer is initialized to zeroes and is read/written
when ``GRAS_LRZ_CNTL.FC_ENABLE`` is set. It appears to store 1 bit per block:
``0`` means the block still has the original depth clear value, and ``1`` means
that the corresponding block in LRZ has been modified.

LRZ fast-clear clears the LRZ buffer conservatively. At the point where LRZ is
written, the LRZ block which corresponds to a single fast-clear bit is cleared:

- To ``0.0`` if the depth comparison is ``GREATER``;
- To ``1.0`` if the depth comparison is ``LESS``.

Each of these is the most permissive value for its direction, so it's always
valid to fast-clear, whatever the actual depth clear value is.

LRZ Precision
-------------

LRZ always uses ``Z16_UNORM``. Its epsilon is roughly ``1.f / (1 << 16)``, which is
not enough to represent all values of ``Z32_UNORM`` or ``Z32_FLOAT``.
This is especially a concern for fast-clear: if fast-clear uses a value which
cannot be precisely represented by LRZ, we wouldn't be able to round it in the
correct direction, since the direction is tracked on the GPU.

However, depth comparisons against LRZ values seem to have some "slack",
so nothing special needs to be done for such depth clear values.

How it was tested:

- Clear a ``Z32_FLOAT`` attachment to ``1.f / (1 << 17)``

  - LRZ buffer contains all zeroes.

- Do draws and check whether all samples are passing:

  - ``OP_GREATER`` with ``(1.f / (1 << 17) + float32_epsilon)`` - passing;
  - ``OP_GREATER`` with ``(1.f / (1 << 17) - float32_epsilon)`` - not passing;
  - ``OP_LESS`` with ``(1.f / (1 << 17) - float32_epsilon)`` - passing;
  - ``OP_LESS`` with ``(1.f / (1 << 17) + float32_epsilon)`` - not passing;
  - ``OP_LESS_OR_EQ`` with ``(1.f / (1 << 17) + float32_epsilon)`` - not passing.

In all cases the resulting LRZ buffer is all zeroes and the LRZ direction is updated.

LRZ Caches
----------

``LRZ_FLUSH`` flushes and invalidates the LRZ caches. There are two caches:

- Cache for the fast-clear buffer;
- Cache for the direction byte + depth view params.

They can be cleared with ``LRZ_CLEAR``. To make the cleared values visible in GPU
memory, the caches should then be flushed with ``LRZ_FLUSH``.

``GRAS_LRZ_CNTL`` reads from these caches.
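
The CPU-side direction tracking described in "Pre-a650 (before gen3)" amounts to a
small per-render-pass state machine. The following is a minimal sketch, not the
driver's actual code: ``enum lrz_dir``, ``struct lrz_state``,
``lrz_begin_render_pass()`` and ``lrz_track_depth_write()`` are hypothetical names,
and each draw's depth comparison is assumed to be already reduced to a LESS or
GREATER direction.

.. code-block:: c

   #include <assert.h>
   #include <stdbool.h>

   /* Hypothetical names; this is not the actual driver code. */
   enum lrz_dir {
      LRZ_DIR_UNKNOWN, /* no depth write seen yet in this render pass */
      LRZ_DIR_LESS,    /* LESS-style comparisons */
      LRZ_DIR_GREATER, /* GREATER-style comparisons */
      LRZ_DIR_INVALID, /* direction changed: LRZ disabled until the pass ends */
   };

   struct lrz_state {
      enum lrz_dir dir;
   };

   static void lrz_begin_render_pass(struct lrz_state *s)
   {
      s->dir = LRZ_DIR_UNKNOWN; /* every render pass starts with no direction */
   }

   /* Called for each draw that writes depth. Returns true if LRZ may still be
    * used for this draw, false once the direction has become invalid. */
   static bool lrz_track_depth_write(struct lrz_state *s, enum lrz_dir draw_dir)
   {
      assert(draw_dir == LRZ_DIR_LESS || draw_dir == LRZ_DIR_GREATER);

      if (s->dir == LRZ_DIR_UNKNOWN)
         s->dir = draw_dir;        /* first depth write fixes the direction */
      else if (s->dir != draw_dir)
         s->dir = LRZ_DIR_INVALID; /* direction changed (or already invalid) */

      return s->dir != LRZ_DIR_INVALID;
   }

Because this state only exists on the CPU, it cannot carry over into the next
render pass or into a secondary command buffer recorded separately, which is
exactly the limitation the section describes.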
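
The GPU-side check from "A650+ (gen3+)" can be modelled the same way, with the
state stored next to the LRZ buffer instead of on the CPU. This is a hedged
software model of what ``GRAS_LRZ_CNTL`` appears to do, not a description of the
hardware: the struct layouts, the three depth-view fields, and the assumption that
clearing LRZ records the current depth view (``lrz_gpu_clear()``) are all
hypothetical.

.. code-block:: c

   #include <assert.h>
   #include <stdbool.h>
   #include <stdint.h>

   /* Hypothetical model; the real encoding of the direction byte and of
    * GRAS_LRZ_DEPTH_VIEW is not reproduced here. */
   enum lrz_gpu_dir {
      LRZ_GPU_DIR_UNKNOWN,
      LRZ_GPU_DIR_LESS,
      LRZ_GPU_DIR_GREATER,
      LRZ_GPU_DIR_DISABLED,
   };

   /* Illustrative view parameters: which layer(s) + mip level LRZ covers. */
   struct lrz_depth_view {
      uint32_t base_layer;
      uint32_t layer_count;
      uint32_t base_mip_level;
   };

   /* State kept in GPU memory alongside the LRZ buffer. */
   struct lrz_gpu_state {
      enum lrz_gpu_dir dir;
      struct lrz_depth_view view;
   };

   static bool depth_view_equal(const struct lrz_depth_view *a,
                                const struct lrz_depth_view *b)
   {
      return a->base_layer == b->base_layer &&
             a->layer_count == b->layer_count &&
             a->base_mip_level == b->base_mip_level;
   }

   /* Assumption: clearing LRZ resets the direction and records the view. */
   static void lrz_gpu_clear(struct lrz_gpu_state *st,
                             const struct lrz_depth_view *view)
   {
      st->dir = LRZ_GPU_DIR_UNKNOWN;
      st->view = *view;
   }

   /* Models one use of GRAS_LRZ_CNTL with direction `dir` while the current
    * depth view is `cur`. Returns whether LRZ stays enabled. */
   static bool lrz_gpu_check(struct lrz_gpu_state *st, enum lrz_gpu_dir dir,
                             const struct lrz_depth_view *cur)
   {
      assert(dir == LRZ_GPU_DIR_LESS || dir == LRZ_GPU_DIR_GREATER);

      if (!depth_view_equal(&st->view, cur))
         st->dir = LRZ_GPU_DIR_DISABLED; /* LRZ belongs to another layer/mip */
      else if (st->dir == LRZ_GPU_DIR_UNKNOWN)
         st->dir = dir;                  /* first use fixes the direction */
      else if (st->dir != dir)
         st->dir = LRZ_GPU_DIR_DISABLED; /* incompatible (or already disabled) */

      return st->dir != LRZ_GPU_DIR_DISABLED;
   }

The view comparison is what makes reuse across render passes safe in this model:
a later pass that binds a different layer or mip level of the same depth image
sees a mismatch and disables LRZ instead of trusting stale data.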
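
The behaviour described in "LRZ Fast-Clear" can be sketched as a bitmap with one
bit per LRZ block. Only the reset values (``0.0`` for ``GREATER``, ``1.0`` for
``LESS``) come from that section; the bit order within a byte (LSB first), the
one-float-per-block LRZ array, and all function names are assumptions made for
illustration.

.. code-block:: c

   #include <assert.h>
   #include <stdbool.h>
   #include <stdint.h>

   /* Hypothetical model: 1 fast-clear bit per block, LSB-first within a byte
    * (assumed), and one LRZ depth value per block. */
   enum lrz_fc_dir { LRZ_FC_LESS, LRZ_FC_GREATER };

   /* Most permissive LRZ value for the given direction. */
   static float lrz_fc_reset_value(enum lrz_fc_dir dir)
   {
      assert(dir == LRZ_FC_LESS || dir == LRZ_FC_GREATER);
      return dir == LRZ_FC_GREATER ? 0.0f : 1.0f;
   }

   /* 0 = block still has the original clear value, 1 = block was modified. */
   static bool lrz_fc_is_modified(const uint8_t *fc, unsigned block)
   {
      return (fc[block / 8] >> (block % 8)) & 1u;
   }

   /* Called before writing LRZ for `block`: if its bit is still 0, the block
    * is reset conservatively and marked as modified; otherwise it is kept. */
   static void lrz_fc_prepare_write(uint8_t *fc, float *lrz, unsigned block,
                                    enum lrz_fc_dir dir)
   {
      if (!lrz_fc_is_modified(fc, block)) {
         lrz[block] = lrz_fc_reset_value(dir);
         fc[block / 8] |= (uint8_t)(1u << (block % 8));
      }
   }

Since the reset never depends on the real depth clear value, the same fast-clear
path works for any clear value; the price is that freshly written blocks start
out rejecting nothing until real depth data narrows them down.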
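
The precision issue from "LRZ Precision" can be checked with plain arithmetic. A
``Z16_UNORM`` value ``v`` represents ``v / 65535``, so the exact step is
``1 / 65535``, close to the ``1.f / (1 << 16)`` quoted there. The sketch below
assumes round-to-nearest when converting a float depth to Z16; the rounding the
hardware actually uses is not documented here (truncation would give the same
results for these inputs). Under that assumption, the test clear value
``1.f / (1 << 17)`` is just under half a step and quantizes to ``0``, which agrees
with the all-zero LRZ buffer observed in the test.

.. code-block:: c

   #include <assert.h>
   #include <stdint.h>

   /* Quantize a depth in [0, 1] to Z16_UNORM. Round-to-nearest (half up) is an
    * assumption about the hardware, not a documented fact. */
   static uint16_t z16_unorm_from_float(float z)
   {
      assert(z >= 0.0f && z <= 1.0f);
      return (uint16_t)(z * 65535.0f + 0.5f);
   }

   /* Value represented by a Z16_UNORM code: v / 65535. */
   static float z16_unorm_to_float(uint16_t v)
   {
      return (float)v / 65535.0f;
   }

For example, ``z16_unorm_from_float(1.0f / (1 << 17))`` computes
``65535 / 131072 + 0.5``, which is just below ``1``, so it returns ``0``; converting
back gives ``0.0f`` rather than the original clear value.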