Freedreno
=========

Freedreno is a GLES and GL driver for Adreno 2xx-6xx GPUs.  It implements up to
OpenGL ES 3.2 and desktop OpenGL 4.5.

See the `Freedreno Wiki
<https://gitlab.freedesktop.org/freedreno/freedreno/-/wikis/home>`__ for more
details.

Turnip
------

Turnip is a Vulkan 1.3 driver for Adreno 6xx GPUs.

The current set of specific chip versions supported can be found in
:file:`src/freedreno/common/freedreno_devices.py`.  The current set of supported
features is listed at `Mesa Matrix <https://mesamatrix.net/>`__.
There are no plans to port to a5xx or earlier GPUs.

Hardware architecture
---------------------

Adreno is a mostly tile-mode renderer, but with the option to bypass tiling
("gmem") and render directly to system memory ("sysmem").  It is UMA, using
mostly write-combined memory but with the ability to map some buffers as cache
coherent with the CPU.

.. toctree::
   :glob:

   freedreno/hw/*

Hardware acronyms
^^^^^^^^^^^^^^^^^

.. glossary::

  Cluster
    A group of hardware registers, often with multiple copies to allow
    pipelining.  There is an M:N relationship between hardware blocks that do
    work and the clusters of registers for the state that hardware blocks use.

  CP
    Command Processor.  Reads the stream of state changes and draw commands
    generated by the driver.

  PFP
    Prefetch Parser.  Adreno 2xx-4xx CP component.

  ME
    Micro Engine.  Adreno 2xx-4xx CP component after PFP, handles most PM4 commands.

  SQE
    a6xx+ replacement for PFP/ME.  This is the microcontroller that runs the
    microcode (loaded from Linux) which actually processes the command stream
    and writes to the hardware registers.  See `afuc
    <https://gitlab.freedesktop.org/mesa/mesa/-/blob/main/src/freedreno/afuc/README.rst>`__.

  ROQ
    DMA engine used by the SQE for reading memory, with some prefetch buffering.
    Mostly reads in the command stream, but also serves for
    ``CP_MEMCPY``/``CP_MEM_TO_REG`` and visibility stream reads.

  SP
    Shader Processor.  Unified, scalar shader engine.  One or more, depending on
    GPU and tier.

  TP
    Texture Processor.

  UCHE
    Unified L2 Cache.  32KB on A330, unclear how big now.

  CCU
    Color Cache Unit.

  VSC
    Visibility Stream Compressor.

  PVS
    Primitive Visibility Stream.

  FE
    Front End?  Index buffer and vertex attribute fetch cluster.  Includes PC,
    VFD, VPC.

  VFD
    Vertex Fetch and Decode.

  VPC
    Varying/Position Cache?  Hardware block that stores shaded vertex data for
    primitive assembly.

  HLSQ
    High Level Sequencer.  Manages state for the SPs, batches up PS invocations
    between primitives, and is involved in preemption.

  PC_VS
    Cluster where varyings are read from VPC and assembled into primitives to
    feed GRAS.

  VS
    Vertex Shader.  Responsible for generating VS/GS/tess invocations.

  GRAS
    Rasterizer.  Responsible for generating PS invocations from primitives, also
    does LRZ.

  PS
    Pixel Shader.

  RB
    Render Backend.  Performs both early and late Z testing, blending, and
    attachment stores of output of the PS.

  GMEM
    Roughly 128KB-1MB of memory on the GPU (SKU-dependent), used to store
    attachments during tiled rendering.

  LRZ
    Low Resolution Z.  A low resolution area of the depth buffer that can be
    initialized during the binning pass to contain the worst-case (farthest) Z
    values in a block, and then used to early reject fragments during
    rasterization.
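
The LRZ idea can be illustrated with a small CPU-side sketch.  It does not
model how the hardware actually stores or updates LRZ: the flat per-block depth
array, the ``GL_LESS``-style compare (smaller Z is nearer), and the assumption
that every pixel of the block has already been written are simplifications, and
both function names are hypothetical.

.. code-block:: c

   #include <stdbool.h>
   #include <stddef.h>

   /*
    * Hypothetical helper: the conservative LRZ value for one block is the
    * farthest depth stored in it.  Assumes a GL_LESS-style depth test and that
    * all n (> 0) pixels of the block have been written.
    */
   float lrz_block_value(const float *depth, size_t n)
   {
      float farthest = depth[0];
      for (size_t i = 1; i < n; i++) {
         if (depth[i] > farthest)
            farthest = depth[i];
      }
      return farthest;
   }

   /*
    * Hypothetical helper: a fragment at least as far as the farthest stored
    * depth fails the depth test at every pixel of the block, so it can be
    * rejected without reading the full-resolution depth buffer.
    */
   bool lrz_can_reject(float lrz_value, float frag_z)
   {
      return frag_z >= lrz_value;
   }

Keeping the farthest value rather than the nearest is what makes the test
conservative: a fragment is only rejected when it would lose against every
pixel in the block.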

Cache hierarchy
^^^^^^^^^^^^^^^

The a6xx GPUs have two main caches: CCU and UCHE.

UCHE (Unified L2 Cache) is the cache behind the vertex fetch, VSC writes,
texture L1, LRZ, and storage image accesses (``ldib``/``stib``).  Misses and
flushes access system memory.

The CCU is the separate cache used by 2D blits and sysmem render target access
(and also for resolves to system memory when in GMEM mode).  Its memory comes
from a carveout of GMEM controlled by ``RB_CCU_CNTL``, with a varying amount
reserved based on whether we're in a render pass using GMEM for attachment
storage, or we're doing sysmem rendering.  Cache entries have the attachment
number and layer mixed into the cache tag in some way, likely so that a
fragment's access is spread through the cache even if the attachments are the
same size and alignment in address space.  This means that the cache must be
flushed and invalidated between memory being used for one attachment and another
(notably depth vs color, but also MRT color).

The Texture Processors (TP) additionally have a small L1 cache (1KB on A330,
unclear how big now) before accessing UCHE.  This cache is used for normal
sampling like ``sam`` and ``isam`` (and the compiler will also route read-only
storage image accesses through it).  It is not coherent with UCHE (you may get
stale results when you ``sam`` after ``stib``), but it appears to get flushed per
draw or similar, because you don't need a manual invalidate between draws storing
to an image and draws sampling from a texture.

The command processor (CP) does not read from either of these caches, and
instead uses FIFOs in the ROQ to avoid stalls reading from system memory.

Draw states
^^^^^^^^^^^

The SQE is not a fast processor, and with tiled rendering many draws won't even
be used in many bins.  So, since a5xx, state updates can be batched up into
"draw states" that point to a fragment of CP packets.  At draw time, if the draw
call is going to actually execute (some primitive is visible in the current
tile), the SQE goes through the ``GROUP_ID``\s and, for any with an update since
the last time they were executed, executes the corresponding fragment.

Starting with a6xx, states can be tagged with whether they should be executed
at draw time for any of sysmem, binning, or tile rendering.  This allows a
single command stream to be generated which can be executed in any of the modes,
unlike pre-a6xx where we had to generate separate command lists for the binning
and rendering phases.

Note that this means that the generated draw state always has to update all of
the state you have chosen to pack into that ``GROUP_ID``, since any of your
previous state changes in a previous draw state command may have been skipped.
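
A toy model makes the last point concrete.  Everything in it (the structs, the
group count, the mode bits and the function names) is hypothetical and does not
reflect how Mesa or the SQE firmware represent draw states; it only models "an
updated group runs if the draw is visible", with a per-group mode mask standing
in for the a6xx sysmem/binning/tile tags.

.. code-block:: c

   #include <stdbool.h>
   #include <stdint.h>
   #include <string.h>

   /* Hypothetical mode bits standing in for the a6xx sysmem/binning/gmem tags. */
   #define DS_SYSMEM  (1u << 0)
   #define DS_BINNING (1u << 1)
   #define DS_GMEM    (1u << 2)

   #define DS_MAX_GROUPS 32  /* hypothetical number of GROUP_IDs */
   #define DS_MAX_LOG    64  /* size of the execution log in this model */

   struct ds_group {
      int fragment;    /* id of the CP packet fragment, -1 if unset */
      uint32_t modes;  /* modes in which this group should execute */
      bool dirty;      /* updated since it was last executed */
   };

   struct sqe_model {
      struct ds_group groups[DS_MAX_GROUPS];
      int executed[DS_MAX_LOG];  /* fragments executed, in order */
      int num_executed;
   };

   void sqe_init(struct sqe_model *sqe)
   {
      memset(sqe, 0, sizeof(*sqe));
      for (int i = 0; i < DS_MAX_GROUPS; i++)
         sqe->groups[i].fragment = -1;
   }

   /* Setting a draw state only records a pointer; nothing executes yet.
    * group_id must be < DS_MAX_GROUPS. */
   void sqe_set_draw_state(struct sqe_model *sqe, unsigned group_id,
                           int fragment, uint32_t modes)
   {
      struct ds_group *g = &sqe->groups[group_id];
      g->fragment = fragment;
      g->modes = modes;
      g->dirty = true;
   }

   /* At draw time, updated groups run only if the draw is visible in this tile. */
   void sqe_draw(struct sqe_model *sqe, uint32_t mode, bool visible)
   {
      if (!visible)
         return;  /* pending updates stay pending and may be overwritten */

      for (int i = 0; i < DS_MAX_GROUPS; i++) {
         struct ds_group *g = &sqe->groups[i];
         if (g->dirty && (g->modes & mode) && sqe->num_executed < DS_MAX_LOG) {
            sqe->executed[sqe->num_executed++] = g->fragment;
            g->dirty = false;
         }
      }
   }

If group 0 is set to fragment 100, a draw that is not visible in the current
tile follows, and group 0 is then set to fragment 101, the next visible draw
executes only fragment 101.  Fragment 100 never runs, which is why each fragment
has to carry the complete state for its group.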

Pipelining (a6xx+)
^^^^^^^^^^^^^^^^^^

Most CP commands write to registers.  In a6xx+, the registers are located in
clusters corresponding to the stage of the pipeline they are used from (see
``enum tu_stage`` for a list).  To pipeline state updates and drawing, registers
generally have two copies ("contexts") in their cluster, so previous draws can
be working on the previous set of register state while the next draw's state is
being set up.  You can find what registers go into which clusters by looking at
:command:`crashdec` output in the ``regs-name: CP_MEMPOOL`` section.

As SQE processes register writes in the command stream, it sends them into a
per-cluster queue stored in ``CP_MEMPOOL``.  This allows the pipeline stages to
process their stream of register updates and events independently of each other
(so even with just 2 contexts in a stage, earlier stages can proceed on to later
draws before later stages have caught up).

Each cluster has a per-context bit indicating that the context is done/free.
Register writes will stall on the context being done.

During a 3D draw command, SQE generates several internal events that flow
through the pipeline:

- ``CP_EVENT_START`` clears the done bit for the context when written to the
  cluster.
- ``PC_EVENT_CMD``/``PC_DRAW_CMD``/``HLSQ_EVENT_CMD``/``HLSQ_DRAW_CMD`` kick off
  the actual event/drawing.
- ``CONTEXT_DONE`` event completes after the event/draw is complete and sets the
  done flag.
- ``CP_EVENT_END`` waits for the done flag on the next context, then copies all
  the registers that were dirtied in this context to that one.
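
The event sequence above can be modelled for a single cluster in a few lines of
C.  This is a hypothetical, single-threaded sketch (the struct, the 16-register
cluster size and all function names are made up for illustration): it keeps two
register contexts, and its functions return ``false`` where the hardware would
stall instead of blocking.

.. code-block:: c

   #include <stdbool.h>
   #include <stdint.h>
   #include <string.h>

   #define NUM_CONTEXTS 2
   #define NUM_REGS     16  /* hypothetical cluster size */

   struct cluster_model {
      uint32_t regs[NUM_CONTEXTS][NUM_REGS];
      bool dirty[NUM_REGS];     /* regs written in the current context */
      bool done[NUM_CONTEXTS];  /* context is done/free */
      unsigned cur;             /* context receiving register writes */
   };

   void cluster_init(struct cluster_model *c)
   {
      memset(c, 0, sizeof(*c));
      for (unsigned i = 0; i < NUM_CONTEXTS; i++)
         c->done[i] = true;
   }

   /* Register writes stall until the current context is done. */
   bool cluster_write_reg(struct cluster_model *c, unsigned reg, uint32_t val)
   {
      if (!c->done[c->cur])
         return false;  /* would stall */
      c->regs[c->cur][reg] = val;
      c->dirty[reg] = true;
      return true;
   }

   /* CP_EVENT_START: the draw now owns the current context. */
   void cluster_event_start(struct cluster_model *c)
   {
      c->done[c->cur] = false;
   }

   /* CONTEXT_DONE: the draw that used context ctx has completed. */
   void cluster_context_done(struct cluster_model *c, unsigned ctx)
   {
      c->done[ctx] = true;
   }

   /* CP_EVENT_END: wait for the next context, copy dirtied regs, switch to it. */
   bool cluster_event_end(struct cluster_model *c)
   {
      unsigned next = (c->cur + 1) % NUM_CONTEXTS;
      if (!c->done[next])
         return false;  /* would stall until the older draw completes */

      for (unsigned r = 0; r < NUM_REGS; r++) {
         if (c->dirty[r]) {
            c->regs[next][r] = c->regs[c->cur][r];
            c->dirty[r] = false;
         }
      }
      c->cur = next;
      return true;
   }

In this model a second draw can set up its state in context 1 while the first
draw is still running in context 0, but the second draw's ``CP_EVENT_END``
cannot switch back to context 0 until ``CONTEXT_DONE`` has arrived for the first
draw.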

The 2D blit engine has its own ``CP_2D_EVENT_START``, ``CP_2D_EVENT_END``,
``CONTEXT_DONE_2D``, so 2D and 3D register contexts can do separate context
rollover.

Because the clusters proceed independently of each other even across draws, if
you need to synchronize an earlier cluster to the output of a later one, then
you will need to ``CP_WAIT_FOR_IDLE`` after flushing and invalidating any
necessary caches.

Also, note that some registers are not banked at all, and will require a
``CP_WAIT_FOR_IDLE`` for any previous usage of the register to complete.

In a2xx-a4xx, there weren't per-stage clusters, and instead there were two
register banks that were flipped between per draw.

Bindless/Bindful Descriptors (a6xx+)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Starting with a6xx, cat5 (texture) and cat6 (image/ssbo/ubo) instructions are
extended to support bindless descriptors.

In the old bindful model, descriptors are separate for textures, samplers,
UBOs, and IBOs (combined descriptor for images and SSBOs), with separate
registers for the memory containing the array of descriptors, and/or different
``STATE_TYPE`` and ``STATE_BLOCK`` for ``CP_LOAD_STATE``/``_FRAG``/``_GEOM``
to pre-load the descriptors into cache.

- textures - per-shader-stage
   - registers: ``SP_xS_TEX_CONST``/``SP_xS_TEX_COUNT``
   - state-type: ``ST6_CONSTANTS``
   - state-block: ``SB6_xS_TEX``
- samplers - per-shader-stage
   - registers: ``SP_xS_TEX_SAMP``
   - state-type: ``ST6_SHADER``
   - state-block: ``SB6_xS_TEX``
- UBOs - per-shader-stage
   - registers: none
   - state-type: ``ST6_UBO``
   - state-block: ``SB6_xS_SHADER``
- IBOs - global across 3D shader stages, separate for compute shader
   - registers: ``SP_IBO``/``SP_IBO_COUNT`` or ``SP_CS_IBO``/``SP_CS_IBO_COUNT``
   - state-type: ``ST6_SHADER``
   - state-block: ``SB6_IBO`` or ``SB6_CS_IBO`` for compute shaders
   - Note, unlike per-shader-stage descriptors, ``CP_LOAD_STATE6`` is used,
     as opposed to ``CP_LOAD_STATE6_GEOM`` or ``CP_LOAD_STATE6_FRAG``
     depending on shader stage.

.. note::
   For the per-shader-stage registers and state-blocks the ``xS`` notation
   refers to per-shader-stage names, e.g. ``SP_FS_TEX_CONST`` or ``SB6_DS_TEX``.

Textures and IBOs (images) use *basically* the same 64-byte descriptor format,
with some exceptions (e.g. for IBOs, cubemaps are handled as 2D arrays).
SSBOs are just untyped buffers, but otherwise use the same descriptors and
instructions as images.  Samplers use a 16-byte descriptor, and UBOs use an
8-byte descriptor which packs the size in the upper 15 bits of the UBO address.
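
As a sketch of that packing, the helpers below build and decode such a
descriptor, assuming a 49-bit GPU address in the low bits and the 15-bit size
field in bits 49-63.  The text does not state the unit of the size field, so
the helpers take and return the raw field value; the function names are
hypothetical.

.. code-block:: c

   #include <stdint.h>

   #define UBO_SIZE_BITS 15
   #define UBO_ADDR_BITS (64 - UBO_SIZE_BITS)  /* 49 address bits remain */
   #define UBO_ADDR_MASK ((UINT64_C(1) << UBO_ADDR_BITS) - 1)

   /* Hypothetical helper: pack an address and a raw size field into 8 bytes. */
   uint64_t ubo_desc_pack(uint64_t iova, uint32_t size_field)
   {
      uint64_t size = size_field & ((1u << UBO_SIZE_BITS) - 1);
      return (iova & UBO_ADDR_MASK) | (size << UBO_ADDR_BITS);
   }

   /* Hypothetical helpers: recover both fields from a packed descriptor. */
   uint64_t ubo_desc_addr(uint64_t desc)
   {
      return desc & UBO_ADDR_MASK;
   }

   uint32_t ubo_desc_size(uint64_t desc)
   {
      return (uint32_t)(desc >> UBO_ADDR_BITS);
   }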

In the bindless model, descriptors are split into 5 descriptor sets, which are
global across shader stages (but as with bindful IBO descriptors, separate for
3d stages vs compute stage).  Each hw descriptor set is an array of descriptors
of configurable size (each descriptor set can be configured for a descriptor
pitch of 8 bytes or 64 bytes).  Each descriptor can be of arbitrary format (i.e.
UBOs/IBOs/textures/samplers interleaved); its interpretation by the hw is
determined by the instruction that references the descriptor.  Each descriptor
set can contain at least 2\ :sup:`16` descriptors.

The hw is configured with the base address of the descriptor set via an array
of "BINDLESS_BASE" registers, i.e. ``SP_BINDLESS_BASE[n]``/``HLSQ_BINDLESS_BASE[n]``
for 3d shader stages, or ``SP_CS_BINDLESS_BASE[n]``/``HLSQ_CS_BINDLESS_BASE[n]``
for compute shaders, with the descriptor pitch encoded in the low bits.
Which of the descriptor sets is referenced is encoded via three bits in the
instruction.  The address of the descriptor is calculated as::

   descriptor_addr = (BINDLESS_BASE[n] & ~0x3) +
                     (idx * 4 * (2 << (BINDLESS_BASE[n] & 0x3)))
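
The same computation written as C, as a sketch: the helper name is
hypothetical, and it reads the pitch term as ``2 << (BINDLESS_BASE[n] & 0x3)``,
so the low two bits select the pitch, giving 8 bytes for ``0`` and 64 bytes for
``3``.

.. code-block:: c

   #include <stdint.h>

   /* Hypothetical helper: address of descriptor idx within one bindless set. */
   uint64_t bindless_desc_addr(uint64_t bindless_base, uint32_t idx)
   {
      uint64_t base = bindless_base & ~UINT64_C(0x3);               /* set address */
      uint64_t pitch = 4 * (UINT64_C(2) << (bindless_base & 0x3));  /* in bytes */
      return base + (uint64_t)idx * pitch;
   }

Note the parentheses around the mask: in C, ``<<`` binds tighter than ``&``, so
without them the expression would compute ``(2 << BINDLESS_BASE[n]) & 0x3``
instead.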

.. note::
   Turnip reserves one descriptor set for internal use and exposes the other
   four for the application via the Vulkan API.

Software Architecture
---------------------

Freedreno and Turnip use a shared core for shader compiler, image layout, and
register and command stream definitions.  They implement separate state
management and command stream generation.

.. toctree::
   :glob:

   freedreno/*

GPU devcoredump
^^^^^^^^^^^^^^^

A kernel message from DRM of "gpu fault" can mean any sort of error reported by
the GPU (including its internal hang detection).  If a fault in GPU address
space happened, you should expect to find a message from the iommu, with the
faulting address and the hardware unit involved:

.. code-block:: text

  *** gpu fault: ttbr0=000000001c941000 iova=000000010066a000 dir=READ type=TRANSLATION source=TP|VFD (0,0,0,1)

On a GPU fault or hang, a GPU core dump is taken by the DRM driver and saved to
``/sys/devices/virtual/devcoredump/**/data``.  You can copy that file (e.g. to
:file:`crash.devcore`) to save it, otherwise the kernel will expire it
eventually.  Echo 1 to the file to free the core early, as another core won't be
taken until then.

Once you have your core file, you can use :command:`crashdec -f crash.devcore`
to decode it.  The output will have ``ESTIMATED CRASH LOCATION`` where we
estimate the CP to have stopped.  Note that it is expected that this will be
some distance past whatever state triggered the fault, given GPU pipelining, and
will often be at some ``CP_REG_TO_MEM`` (which waits on previous WFIs) or
``CP_WAIT_FOR_ME`` (which waits for all register writes to land) or similar
event.  You can try running the workload with ``TU_DEBUG=flushall`` or
``FD_MESA_DEBUG=flush`` to try to close in on the failing commands.

You can also find what commands were queued up to each cluster in the
``regs-name: CP_MEMPOOL`` section.

If there is no ``ESTIMATED CRASH LOCATION``, you can look at ``CP_SQE_STAT``,
though this is a last resort and is unlikely to be helpful.

.. code-block::

  indexed-registers:
    - regs-name: CP_SQE_STAT
      dwords: 51
  	 PC: 00d7                                <-------------
  	PKT: CP_LOAD_STATE6_FRAG
  	$01: 70348003		$11: 00000000
  	$02: 20000000		$12: 00000022

The ``PC`` value is an instruction address in the current firmware.
You need to disassemble the firmware (:file:`/lib/firmware/qcom/aXXX_sqe.fw`)
with:

.. code-block:: sh

  afuc-disasm -v a650_sqe.fw > a650_sqe.fw.disasm

Then search for the PC value in the disassembly, e.g.:

.. code-block::

  l018:	00d1: 08dd0001  add $addr, $06, 0x0001
       	00d2: 981ff806  mov $data, $data
       	00d3: 8a080001  mov $08, 0x0001 << 16
       	00d4: 3108ffff  or $08, $08, 0xffff
       	00d5: 9be8f805  and $data, $data, $08
       	00d6: 9806e806  mov $addr, $06
       	00d7: 9803f806  mov $data, $03           <------------- HERE
       	00d8: d8000000  waitin
       	00d9: 981f0806  mov $01, $data

Command Stream Capture
^^^^^^^^^^^^^^^^^^^^^^

During Mesa development, it's often useful to look at the command streams we
send to the kernel.  We have an interface for the kernel to capture all
submitted command streams:

.. code-block:: sh

  cat /sys/kernel/debug/dri/0/rd > cmdstream &

By default, command stream capture does not capture texture/vertex/etc. data.
You can enable capturing all the BOs with:

.. code-block:: sh

  echo Y > /sys/module/msm/parameters/rd_full

Note that, since all command streams get captured, it is easy to run the system
out of memory doing this, so you probably don't want to enable it during play of
a heavyweight game.  Instead, to capture a command stream within a game, you
probably want to cause a crash in the GPU during a frame of interest so that a
single GPU core dump is generated.  Emitting ``0xdeadbeef`` in the CS should be
enough to cause a fault.

``fd_rd_output`` facilities provide support for generating the command stream
capture from inside Mesa.  Different ``FD_RD_DUMP`` options are available:

- ``enable`` simply enables dumping the command stream on each submit for a
  given logical device.  When a more advanced option is specified, ``enable`` is
  implied.
- ``combine`` will combine all dumps into a single file instead of writing the
  dump for each submit into a standalone file.
- ``full`` will dump every buffer object, which is necessary for replays of
  command streams (see below).
- ``trigger`` will establish a trigger file through which dumps can be better
  controlled.  Writing a positive integer value into the file will enable dumping
  of that many subsequent submits.  Writing -1 will enable dumping of submits
  until disabled.  Writing 0 (or any other value) will disable dumps.
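
The trigger semantics can be summarised as a small state machine.  This is a
hypothetical model written for illustration, not the ``fd_rd_output`` code (it
does not show where the trigger file lives or how Mesa reads it): it only tracks
how many of the following submits should be dumped, with ``-1`` meaning "until
disabled".

.. code-block:: c

   #include <stdbool.h>

   /* Hypothetical model of the trigger file, not the fd_rd_output code. */
   struct rd_trigger {
      int remaining;  /* submits left to dump; -1 means "until disabled" */
   };

   /* Interpret a value written into the trigger file. */
   void rd_trigger_write(struct rd_trigger *t, int value)
   {
      if (value > 0 || value == -1)
         t->remaining = value;
      else
         t->remaining = 0;  /* 0 or any other value disables dumping */
   }

   /* Called once per submit: returns whether this submit should be dumped. */
   bool rd_trigger_should_dump(struct rd_trigger *t)
   {
      if (t->remaining == -1)
         return true;
      if (t->remaining > 0) {
         t->remaining--;
         return true;
      }
      return false;
   }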

Output dump files and the trigger file (when enabled) are hard-coded to be
placed under ``/tmp``, or ``/data/local/tmp`` on Android.

The functionality is generic to any Freedreno-based backend, but is currently
only integrated in the MSM backend of Turnip.  Using the existing
``TU_DEBUG=rd`` option will translate to ``FD_RD_DUMP=enable``.

Capturing Hang RD
+++++++++++++++++

The devcore file doesn't contain all submitted command streams, only the hanging
one.  Additionally, it is geared towards analyzing the GPU state at the moment of
the crash.

Alternatively, it's possible to obtain the whole submission with all command
streams via ``/sys/kernel/debug/dri/0/hangrd``:

.. code-block:: sh

  # Start the cat before the expected hang happens
  sudo cat /sys/kernel/debug/dri/0/hangrd > logfile.rd

The format of hangrd is the same as in ordinary command stream capture.
``rd_full`` also has the same effect on it.

Replaying Command Stream
^^^^^^^^^^^^^^^^^^^^^^^^

The ``replay`` tool allows capturing and replaying ``rd`` files to reproduce GPU
faults.  It is especially useful for transient GPU issues, since it has a much
higher chance of reproducing them.

Dumping rendering results or even just memory is currently unsupported.

- Replaying command streams requires a kernel with ``MSM_INFO_SET_IOVA`` support.
- The ``rd`` capture must contain full snapshots of the memory (``rd_full`` is enabled).

Replaying is done via the ``replay`` tool:

.. code-block:: sh

  ./replay test_replay.rd

More examples:

.. code-block:: sh

  ./replay --first=start_submit_n --last=last_submit_n test_replay.rd

.. code-block:: sh

  ./replay --override=0 --generator=./generate_rd test_replay.rd

Editing Command Stream (a6xx+)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

While replaying a fault is useful in itself, modifying the capture to
understand what causes the fault could be even more useful.

``rddecompiler`` decompiles a single cmdstream from ``rd`` into compilable C
source.  Given the address space bounds, the generated program creates a new
``rd`` which can be used to override a cmdstream with ``replay``.  The generated
``rd`` is not replayable on its own and depends on buffers provided by the
source ``rd``.

The C source can be compiled using :file:`rdcompiler-meson.build` as an example.

The workflow would look like this:

1. Find the number of the cmdstream you want to edit;
2. Decompile it:

   .. code-block:: sh

      ./rddecompiler -s %cmd_stream_n% example.rd > generate_rd.c

3. Edit the command stream;
4. Compile it back, see :file:`rdcompiler-meson.build` for the instructions;
5. Plug the generator into cmdstream replay:

   .. code-block:: sh

      ./replay --override=%cmd_stream_n% --generator=./generate_rd example.rd

6. Repeat 3-5.

GPU Hang Debugging
^^^^^^^^^^^^^^^^^^

This is not a step-by-step guide, but mostly an enumeration of methods.

Useful ``TU_DEBUG`` (for Turnip) options to narrow down the hang cause:

``sysmem``, ``gmem``, ``nobin``, ``forcebin``, ``noubwc``, ``nolrz``, ``flushall``, ``syncdraw``, ``rast_order``

Useful ``FD_MESA_DEBUG`` (for Freedreno) options:

``sysmem``, ``gmem``, ``nobin``, ``noubwc``, ``nolrz``, ``notile``, ``dclear``, ``ddraw``, ``flush``, ``inorder``, ``noblit``

Useful ``IR3_SHADER_DEBUG`` options:

``nouboopt``, ``spillall``, ``nopreamble``, ``nofp16``

Use Graphics Flight Recorder (GFR) to narrow down the place which hangs, and
use our own breadcrumbs implementation in case of unrecoverable hangs.

In case of faults, use RenderDoc to find the problematic command.  If it's a
draw call, edit the shader in RenderDoc to find out whether the culprit is the
shader.  If so, bisect it.

If editing the shader changes the assembly too much and the issue becomes
unreproducible, try editing the assembly itself via ``IR3_SHADER_OVERRIDE_PATH``.

If the fault or hang is transient, try capturing an ``rd`` and replaying it.  If
the issue is reproduced, bisect the GPU packets until the culprit is found.

Do the above if the culprit is not a shader.

The hang recovery mechanism in the kernel is not perfect.  In case of
unrecoverable hangs, check whether the kernel is up to date and look for
unmerged patches which could improve the recovery.

GPU Breadcrumbs
+++++++++++++++

The breadcrumbs described below are only available in Turnip.

Freedreno has simpler breadcrumbs: in debug builds it writes breadcrumbs into
``CP_SCRATCH_REG[6]`` and per-tile breadcrumbs into ``CP_SCRATCH_REG[7]``, so
that they are available in the devcoredump.  TODO: generalize Turnip's
breadcrumbs implementation.

This is a simple implementation of breadcrumb tracking of GPU progress,
intended to be a last resort when debugging unrecoverable hangs.
For best results, use Vulkan traces to have a predictable place of hang.

For ordinary hangs, GFR ("Graphics Flight Recorder") is a more user-friendly
solution.

Our breadcrumbs implementation aims to handle cases where nothing can be done
after the hang.  In-driver breadcrumbs also allow more precise tracking, since
we can target a single GPU packet.

While breadcrumbs support gmem, try to reproduce the hang in sysmem mode,
because it requires far fewer breadcrumb writes and syncs.

Breadcrumbs settings:

.. code-block:: sh

  TU_BREADCRUMBS=%IP%:%PORT%,break=%BREAKPOINT%:%BREAKPOINT_HITS%

``BREAKPOINT``
  The breadcrumb starting from which we require explicit ack.
``BREAKPOINT_HITS``
  How many times the breakpoint should be reached for the break to occur.
  Necessary for gmem mode and reusable command buffers, in both of which
  the same cmdstream can be executed several times.

A typical workflow would be:

- Start listening for breadcrumbs on a remote host:

  .. code-block:: sh

     nc -lvup $PORT | stdbuf -o0 xxd -pc -c 4 | awk -Wposix '{printf("%u:%u\n", "0x" $0, a[$0]++)}'

- Start capturing the command stream;
- Replay the hanging trace with:

  .. code-block:: sh

     TU_BREADCRUMBS=$IP:$PORT,break=-1:0

- Increase the hangcheck period:

  .. code-block:: sh

     echo -n 60000 > /sys/kernel/debug/dri/0/hangcheck_period_ms

- After the GPU hangs, note the last breadcrumb and relaunch the trace with:

  .. code-block:: sh

     TU_BREADCRUMBS=%IP%:%PORT%,break=%LAST_BREADCRUMB%:%HITS%

- After the breakpoint is reached, each breadcrumb will require an explicit
  ack from the user.  This way it's possible to find the last packet which
  didn't hang.

- Find the packet in the decoded cmdstream.
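
To make the listener output easier to read, here is a hypothetical C equivalent
of what the pipeline above computes (all names are made up for illustration).
Each 4-byte record becomes one ``value:count`` line, where ``value`` is the
record read as a hexadecimal number in the order the bytes arrived (the order in
which ``xxd`` prints them) and ``count`` is how many times that value was seen
before; awk's ``a[$0]++`` post-increments, so the first occurrence prints ``0``.
Whether that byte order matches the value Turnip meant to send is not stated
here, so the sketch only mirrors the pipeline.

.. code-block:: c

   #include <stddef.h>
   #include <stdint.h>
   #include <stdio.h>

   #define MAX_CRUMBS 1024  /* hypothetical limit on distinct values tracked */

   struct crumb_counter {
      uint32_t values[MAX_CRUMBS];
      unsigned hits[MAX_CRUMBS];
      size_t count;
   };

   /* The first byte received becomes the most significant, as in "0x" $0. */
   uint32_t crumb_decode(const unsigned char rec[4])
   {
      return ((uint32_t)rec[0] << 24) | ((uint32_t)rec[1] << 16) |
             ((uint32_t)rec[2] << 8) | (uint32_t)rec[3];
   }

   /* Like awk's a[$0]++: returns the number of earlier sightings, then
    * records this one. */
   unsigned crumb_counter_hit(struct crumb_counter *c, uint32_t value)
   {
      for (size_t i = 0; i < c->count; i++) {
         if (c->values[i] == value)
            return c->hits[i]++;
      }
      if (c->count < MAX_CRUMBS) {
         c->values[c->count] = value;
         c->hits[c->count] = 1;
         c->count++;
      }
      return 0;
   }

   /* Print one "value:count" line per complete 4-byte record in buf. */
   void crumb_print(struct crumb_counter *c, const unsigned char *buf,
                    size_t len, FILE *out)
   {
      for (size_t off = 0; off + 4 <= len; off += 4) {
         uint32_t v = crumb_decode(buf + off);
         fprintf(out, "%u:%u\n", (unsigned)v, crumb_counter_hit(c, v));
      }
   }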

Debugging random failures
^^^^^^^^^^^^^^^^^^^^^^^^^

In most cases, random GPU faults and rendering artifacts are caused by some kind
of undefined behaviour that falls under the following categories:

- Usage of a stale reg value;
- Usage of stale memory (e.g. expecting it to be zeroed when it is not);
- Lack of proper synchronization.

Finding instances of stale reg reads
++++++++++++++++++++++++++++++++++++

Turnip has a debug option to stomp the registers with invalid values to catch
the cases where stale data is read.

.. code-block:: sh

  MESA_VK_ABORT_ON_DEVICE_LOSS=1 \
  TU_DEBUG_STALE_REGS_RANGE=0x00000c00,0x0000be01 \
  TU_DEBUG_STALE_REGS_FLAGS=cmdbuf,renderpass \
  ./app

.. envvar:: TU_DEBUG_STALE_REGS_RANGE

  The register range in which registers will be stomped.  Add ``inverse`` to
  the flags in order for this range to specify which registers NOT to stomp.

.. envvar:: TU_DEBUG_STALE_REGS_FLAGS

  ``cmdbuf``
    stomp registers at the start of each command buffer.
  ``renderpass``
    stomp registers before each renderpass.
  ``inverse``
    changes the meaning of ``TU_DEBUG_STALE_REGS_RANGE`` to
    "regs that should NOT be stomped".

The best way to pinpoint the reg which causes a failure is to bisect the regs
range.  When a failure is caused by a combination of several registers, the
``inverse`` flag may be set to find the reg which prevents the failure.
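
The bisection can be sketched as a binary search over the register range.  The
sketch assumes a single culprit register, a failure that reproduces
deterministically whenever that register is stomped, and an inclusive range.
The oracle callback stands for re-running the application with
``TU_DEBUG_STALE_REGS_RANGE`` set to the candidate half, and all names
(including the stand-in ``fails_if_contains`` oracle) are hypothetical.

.. code-block:: c

   #include <stdbool.h>
   #include <stdint.h>
   #include <stdio.h>

   /* Hypothetical oracle: "run the app with TU_DEBUG_STALE_REGS_RANGE=lo,hi
    * and report whether the failure reproduces".  In practice, a manual re-run. */
   typedef bool (*stale_range_oracle)(uint32_t lo, uint32_t hi, void *data);

   /* Format a range like the example above, e.g. "0x00000c00,0x0000be01". */
   int format_stale_range(char *buf, size_t size, uint32_t lo, uint32_t hi)
   {
      return snprintf(buf, size, "0x%08x,0x%08x", (unsigned)lo, (unsigned)hi);
   }

   /* Binary search for a single culprit register in the inclusive range
    * [lo, hi], assuming the failure reproduces exactly when it is stomped. */
   uint32_t bisect_stale_reg(uint32_t lo, uint32_t hi, stale_range_oracle fails,
                             void *data)
   {
      while (lo < hi) {
         uint32_t mid = lo + (hi - lo) / 2;
         if (fails(lo, mid, data))
            hi = mid;       /* culprit is in the lower half */
         else
            lo = mid + 1;   /* culprit is in the upper half */
      }
      return lo;
   }

   /* Stand-in oracle for trying the search offline: data points at the culprit. */
   bool fails_if_contains(uint32_t lo, uint32_t hi, void *data)
   {
      uint32_t culprit = *(const uint32_t *)data;
      return culprit >= lo && culprit <= hi;
   }

Each step halves the candidate range, so the ``0x0c00``-``0xbe01`` range from
the example above (45570 registers, if inclusive) takes about 16 runs.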