• Home
  • Line#
  • Scopes#
  • Navigate#
  • Raw
  • Download
1:orphan:
2
3.. _radv-debug-hang:
4
5Debugging GPU hangs with RADV
6=============================
7
8UMR (optional)
9--------------
10
11UMR is needed for dumping a lot of useful information. Clone, build and install
12`UMR <https://gitlab.freedesktop.org/tomstdenis/umr>`__. Do not forget to run
13``chmod +s $(which umr)`` so RADV can actually access UMR.
14
15UMR needs to access some kernel debug interfaces:
16
17.. code-block:: sh
18
19   chmod 777 /sys/kernel/debug
20   chmod -R 777 /sys/kernel/debug/dri
21
22Secure boot has to be disabled as well.
23
24Generating and analyzing hang reports
25-------------------------------------
26
27With UMR installed, you can now set ``RADV_DEBUG=hang`` which makes RADV insert
28trace markers and synchronization and check for hangs. The hang report will be
29saved to ``~/radv_dumps_<pid>_<time>``. Inside the directory of the hang report,
30there are a couple of files:
31
32* ``*.spv``: SPIR-V binaries of the pipeline that was bound when the hang
33  occurred.
34* ``addr_binding_report.log``: VK_EXT_address_binding_report logs.
35* ``app_info.log``: ``VkApplicationInfo`` fields.
36* ``bo_history.log``: A list of every GPU memory allocation and deallocation.
37  If the GPU hang was caused by a page fault, you can use
38  `radv_check_va.py <https://gitlab.freedesktop.org/mesa/mesa/-/blob/main/src/amd/vulkan/radv_check_va.py>`__
39  to figure out if address is invalid or used after the memory was deallocated.
40* ``bo_ranges.log``: Address ranges that were valid at the time of submission.
41* ``dmesg.log``: Output of ``dmesg``, if available.
42* ``gpu_info.log``: Fields of ``radeon_info``.
43* ``pipeline.log``: IR of the shaders that were bound during the hang as well as
44  programm counters of waves executing said shaders and bound descriptors.
45* ``registers.log``: Various GPU state registers.
46* ``trace.log``: An annotated list of the command stream that caused the hang.
47  the commands that hung come after
48  ``!!!!! This is the last trace point that was reached by the CP !!!!!``.
49* ``umr_ring.log``: Similar to ``trace.log``.
50* ``umr_waves.log``: A list of waves that were active at the time of the hang,
51  including register values.
52* ``vm_fault.log``: The page fault address if a page fault occurred.
53
54Note: By default, the backend IR (ACO or LLVM) and the disassembly should be
55dumped to ``pipeline.log``. But due to shaders caching, you might need
56``RADV_DEBUG=hang,nocache`` to get SPIR-V and NIR in the GPU hang report.
57
58Debugging Steam games
59---------------------
60
61Steam games require a bit more work so RADV can access UMR: In your Steam library,
62make sure **Tools** is checked and search for **Steam Linux Runtime**.
63Under **Properties** -> **Installed Files**, click **Browse**, open
64``_v2-entry-point`` and add
65
66.. code-block:: sh
67
68   shift 2
69   exec "$@"
70
71at the top of the file. Hang debugging can be enabled by selecting the faulting
72game and adding ``RADV_DEBUG=hang %command%`` under **Properties** -> **General**
73-> **LAUNCH OPTIONS**.
74
75Debugging hangs without RADV_DEBUG=hang
76---------------------------------------
77
78In some situations, ``RADV_DEBUG=hang`` wouldn't be able to generate a GPU hang
79report, like for synchronization issues (because it enables
80``RADV_DEBUG=syncshaders`` behind the scene). An alternative solution is to
81disable GPU recovery by adding ``amdgpu.gpu_recovery=0`` to your kernel command
82line options. And then invoke UMR manually with
83``umr --by-pci <pci_id> -O bits,halt_waves -go 0 -wa <ring> -go 1 2>&1`` for
84dumping the waves and ``umr --by-pci <pci_id> -RS <ring> 2>&1`` for dumping the
85rings once the GPU hang occurred.
86