ANV
===

Debugging
---------

Here are a few debug environment variables specific to ANV:

:envvar:`ANV_ENABLE_PIPELINE_CACHE`
   If defined to ``0`` or ``false``, this will disable pipeline
   caching, forcing ANV to reparse and recompile any VkShaderModule
   (SPIR-V) it is given.
:envvar:`ANV_DISABLE_SECONDARY_CMD_BUFFER_CALLS`
   If defined to ``1`` or ``true``, this will prevent usage of
   self-modifying command buffers to implement ``vkCmdExecuteCommands``.
   As a result of this, it will also disable
   :ext:`VK_KHR_performance_query`.
:envvar:`ANV_ALWAYS_BINDLESS`
   If defined to ``1`` or ``true``, this forces all descriptor sets to
   use the internal `Bindless model`_.
:envvar:`ANV_QUEUE_THREAD_DISABLE`
   If defined to ``1`` or ``true``, this disables support for timeline
   semaphores.
:envvar:`ANV_USERSPACE_RELOCS`
   If defined to ``1`` or ``true``, this forces ANV to always do
   kernel relocations in command buffers. This should only have an
   effect on hardware that doesn't support soft-pinning (Ivybridge,
   Haswell, Cherryview).
:envvar:`ANV_PRIMITIVE_REPLICATION_MAX_VIEWS`
   Specifies up to how many view shaders can be lowered to handle
   :ext:`VK_KHR_multiview`. Beyond this number, multiview is implemented
   using instanced rendering. If unspecified, the value defaults to
   ``2``.
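
As a minimal sketch of how these variables might be applied when
launching an application from a script, the Python helper below builds
a child-process environment. The helper name ``anv_debug_env`` and the
example value chosen for ``ANV_PRIMITIVE_REPLICATION_MAX_VIEWS`` are
assumptions made for illustration; only the variable names and their
accepted values come from the list above.

.. code-block:: python

   import os


   def anv_debug_env(base=None, **overrides):
       """Return a copy of an environment with ANV debug variables set.

       Keyword names are the variable names without the ``ANV_`` prefix.
       Booleans become "1"/"0" and integers become decimal strings.
       """
       env = dict(os.environ if base is None else base)
       for name, value in overrides.items():
           if isinstance(value, bool):
               value = "1" if value else "0"
           env["ANV_" + name] = str(value)
       return env


   env = anv_debug_env(
       {},                                  # start from an empty environment
       ENABLE_PIPELINE_CACHE=False,         # "0" disables pipeline caching
       PRIMITIVE_REPLICATION_MAX_VIEWS=4,   # hypothetical value
   )
   # Passing env to subprocess.run([...], env=env) would launch a Vulkan
   # application with these variables set.

Starting from ``os.environ`` (the default when ``base`` is omitted)
keeps the rest of the caller's environment intact.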


Experimental features
---------------------

.. _`Bindless model`:

Binding Model
-------------

Here is the ANV bindless binding model that was implemented for the
descriptor indexing feature of Vulkan 1.2:

.. graphviz::

  digraph G {
    fontcolor="black";
    compound=true;

    subgraph cluster_1 {
      label = "Binding Table (HW)";

      bgcolor="cornflowerblue";

      node [ style=filled,shape="record",fillcolor="white",
             label="RT0"    ] n0;
      node [ label="RT1"    ] n1;
      node [ label="dynbuf0"] n2;
      node [ label="set0"   ] n3;
      node [ label="set1"   ] n4;
      node [ label="set2"   ] n5;

      n0 -> n1 -> n2 -> n3 -> n4 -> n5 [style=invis];
    }
    subgraph cluster_2 {
      label = "Descriptor Set 0";

      bgcolor="burlywood3";
      fixedsize = true;

      node [ style=filled,shape="record",fillcolor="white", fixedsize = true, width=4,
             label="binding 0 - STORAGE_IMAGE\n anv_storage_image_descriptor"          ] n8;
      node [ label="binding 1 - COMBINED_IMAGE_SAMPLER\n anv_sampled_image_descriptor" ] n9;
      node [ label="binding 2 - UNIFORM_BUFFER\n anv_address_range_descriptor"         ] n10;
      node [ label="binding 3 - UNIFORM_TEXEL_BUFFER\n anv_storage_image_descriptor"   ] n11;

      n8 -> n9 -> n10 -> n11 [style=invis];
    }
    subgraph cluster_5 {
      label = "Vulkan Objects"

      fontcolor="black";
      bgcolor="darkolivegreen4";

      subgraph cluster_6 {
        label = "VkImageView";

        bgcolor=darkolivegreen3;
        node [ style=filled,shape="box",fillcolor="white", fixedsize = true, width=2,
               label="surface_state" ] n12;
      }
      subgraph cluster_7 {
        label = "VkSampler";

        bgcolor=darkolivegreen3;
        node [ style=filled,shape="box",fillcolor="white", fixedsize = true, width=2,
               label="sample_state" ] n13;
      }
      subgraph cluster_8 {
        label = "VkImageView";
        bgcolor="darkolivegreen3";

        node [ style=filled,shape="box",fillcolor="white", fixedsize = true, width=2,
               label="surface_state" ] n14;
      }
      subgraph cluster_9 {
        label = "VkBuffer";
        bgcolor=darkolivegreen3;

        node [ style=filled,shape="box",fillcolor="white", fixedsize = true, width=2,
               label="address" ] n15;
      }
      subgraph cluster_10 {
        label = "VkBufferView";

        bgcolor=darkolivegreen3;
        node [ style=filled,shape="box",fillcolor="white", fixedsize = true, width=2,
               label="surface_state" ] n16;
      }

      n12 -> n13 -> n14 -> n15 -> n16 [style=invis];
    }

    subgraph cluster_11 {
      subgraph cluster_12 {
        label = "CommandBuffer state stream";

        bgcolor="gold3";
        node [ style=filled,shape="box",fillcolor="white", fixedsize = true, width=2,
               label="surface_state" ] n17;
        node [ label="surface_state" ] n18;
        node [ label="surface_state" ] n19;

        n17 -> n18 -> n19 [style=invis];
      }
    }

    n3  -> n8 [lhead=cluster_2];

    n8  -> n12;
    n9  -> n13;
    n9  -> n14;
    n10 -> n15;
    n11 -> n16;

    n0 -> n17;
    n1 -> n18;
    n2 -> n19;
  }


The HW binding table is generated when the draw or dispatch commands
are emitted. Here are the types of entries one can find in the binding
table:

- The currently bound descriptor sets, one entry per descriptor set
  (our limit is 8).

- For dynamic buffers, one entry per dynamic buffer.

- For draw commands, render target entries if needed.
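
The composition of the table can be modeled with a few lines of
Python. This is only an illustrative sketch: the function name
``build_binding_table`` is hypothetical, and the entry ordering simply
mirrors the diagram above rather than anything the driver guarantees.

.. code-block:: python

   MAX_SETS = 8  # descriptor set limit quoted in the list above


   def build_binding_table(num_render_targets, num_dynamic_buffers, num_sets):
       """Return illustrative labels, one per HW binding table entry."""
       if num_sets > MAX_SETS:
           raise ValueError("at most %d descriptor sets can be bound" % MAX_SETS)
       table = ["RT%d" % i for i in range(num_render_targets)]
       table += ["dynbuf%d" % i for i in range(num_dynamic_buffers)]
       table += ["set%d" % i for i in range(num_sets)]
       return table


   # Reproduces the "Binding Table (HW)" cluster of the diagram.
   table = build_binding_table(num_render_targets=2,
                               num_dynamic_buffers=1,
                               num_sets=3)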

The entries of the HW binding table for descriptor sets are
``RENDER_SURFACE_STATE`` entries, similar to what you would have for a
normal uniform buffer. The shader first emits reads of this buffer to
get the information it needs to access a surface, sampler, etc., and
then emits the appropriate message using the information gathered from
the descriptor set buffer.

Each binding type entry gets an associated structure in memory
(``anv_storage_image_descriptor``, ``anv_sampled_image_descriptor``,
``anv_address_range_descriptor``).
This is the information read by the shader.


.. _`Binding tables`:

Binding Tables
--------------

Binding tables are arrays of 32-bit offset entries referencing surface
states. Shaders refer to a binding table entry to read or write a
surface. For example, fragment shaders will often refer to entry 0 as
the first render target.

The way binding tables are managed is fairly awkward.

Each shader stage must have its binding table programmed through
a corresponding ``3DSTATE_BINDING_TABLE_POINTERS_*`` instruction
(each stage has its own).

.. graphviz::

  digraph structs {
    node [shape=record];
    struct3 [label="{ binding tables\n area | { <bt4> BT4 | <bt3> BT3 | ... | <bt0> BT0 } }|{ surface state\n area |{<ss0> ss0|<ss1> ss1|<ss2> ss2|...}}"];
    struct3:bt0 -> struct3:ss0;
    struct3:bt0 -> struct3:ss1;
  }


The value programmed in the ``3DSTATE_BINDING_TABLE_POINTERS_*``
instructions is not a 64-bit pointer but an offset from the address
programmed in ``STATE_BASE_ADDRESS::Surface State Base Address`` or
``3DSTATE_BINDING_TABLE_POOL_ALLOC::Binding Table Pool Base Address``
(available on Gfx11+). The offset value in
``3DSTATE_BINDING_TABLE_POINTERS_*`` is also limited to a few bits
(not a full 32-bit value), meaning that as we use more and more binding
tables we need to reposition ``STATE_BASE_ADDRESS::Surface State Base
Address`` to make space for new binding table arrays.

To make things even more awkward, the binding table entries are also
relative to ``STATE_BASE_ADDRESS::Surface State Base Address``, so as
we change ``STATE_BASE_ADDRESS::Surface State Base Address`` we need
to add that offset to the binding table entries.

The way we deal with this is that we allocate 4GB of address space
(since the binding table entries can address 4GB of surface state
elements). We reserve the first gigabyte exclusively for binding
tables, so that anywhere we position our binding table in that first
gigabyte, it can always refer to the surface states in the next 3GB.
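
The address arithmetic can be illustrated with a small Python model.
It is a sketch under simplifying assumptions, not the driver's
allocator: the names ``binding_table_entry``, ``BT_REGION_SIZE``,
``SS_REGION_START`` and ``SS_REGION_END`` are hypothetical, and all
values are byte offsets from the start of the 4GB range.

.. code-block:: python

   GIB = 1 << 30
   BT_REGION_SIZE = 1 * GIB    # first gigabyte: binding tables only
   SS_REGION_START = 1 * GIB   # surface states occupy the next 3 GiB
   SS_REGION_END = 4 * GIB


   def binding_table_entry(surface_state_offset, surface_state_base):
       """Return the 32-bit binding table entry for one surface state.

       surface_state_offset: offset of the surface state inside the
           surface state region (0 is the first surface state).
       surface_state_base: where Surface State Base Address currently
           points inside the binding table region.
       """
       if not 0 <= surface_state_base < BT_REGION_SIZE:
           raise ValueError("base must stay in the binding table region")
       if not 0 <= surface_state_offset < SS_REGION_END - SS_REGION_START:
           raise ValueError("surface state outside its region")
       # Distance from the repositioned base to the start of the surface
       # state region; this is the offset added to every entry.
       base_to_region = SS_REGION_START - surface_state_base
       entry = surface_state_offset + base_to_region
       assert 0 < entry < (1 << 32)   # always representable in 32 bits
       return entry


   first = binding_table_entry(0x40, surface_state_base=0)
   moved = binding_table_entry(0x40, surface_state_base=0x10000)

Because the base never leaves the first gigabyte and surface states
never leave the following three, the computed entry stays below
2\ :sup:`32` wherever the base is positioned.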


.. _`Descriptor Set Memory Layout`:

Descriptor Set Memory Layout
----------------------------

Here is a representation of how the descriptor set bindings, with each
element in each binding, are mapped to the descriptor set memory:

.. graphviz::

  digraph structs {
    node [shape=record];
    rankdir=LR;

    struct1 [label="Descriptor Set | \
                    <b0> binding 0\n STORAGE_IMAGE \n (array_length=3) | \
                    <b1> binding 1\n COMBINED_IMAGE_SAMPLER \n (array_length=2) | \
                    <b2> binding 2\n UNIFORM_BUFFER \n (array_length=1) | \
                    <b3> binding 3\n UNIFORM_TEXEL_BUFFER \n (array_length=1)"];
    struct2 [label="Descriptor Set Memory | \
                    <b0e0> anv_storage_image_descriptor|\
                    <b0e1> anv_storage_image_descriptor|\
                    <b0e2> anv_storage_image_descriptor|\
                    <b1e0> anv_sampled_image_descriptor|\
                    <b1e1> anv_sampled_image_descriptor|\
                    <b2e0> anv_address_range_descriptor|\
                    <b3e0> anv_storage_image_descriptor"];

    struct1:b0 -> struct2:b0e0;
    struct1:b0 -> struct2:b0e1;
    struct1:b0 -> struct2:b0e2;
    struct1:b1 -> struct2:b1e0;
    struct1:b1 -> struct2:b1e1;
    struct1:b2 -> struct2:b2e0;
    struct1:b3 -> struct2:b3e0;
  }

Each binding in the descriptor set is allocated an array of
``anv_*_descriptor`` data structures. The type of ``anv_*_descriptor``
used for a binding is selected based on the ``VkDescriptorType`` of
the binding.

The value of ``anv_descriptor_set_binding_layout::descriptor_offset``
is a byte offset from the start of the descriptor set memory to the
associated binding. ``anv_descriptor_set_binding_layout::array_size``
is the number of ``anv_*_descriptor`` elements in the descriptor set
memory from that offset for the binding.
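
The offset computation can be sketched in Python for the four bindings
of the diagram. This is a model under stated assumptions, not the
driver's layout code: the per-type sizes in ``DESCRIPTOR_SIZE`` are
placeholders (the real ``anv_*_descriptor`` sizes may differ),
descriptors are assumed to be tightly packed with no extra alignment,
and ``layout_bindings`` and ``element_offset`` are hypothetical names.

.. code-block:: python

   # Placeholder sizes in bytes; the real anv_*_descriptor sizes may differ.
   DESCRIPTOR_SIZE = {
       "STORAGE_IMAGE": 16,
       "COMBINED_IMAGE_SAMPLER": 16,
       "UNIFORM_BUFFER": 32,
       "UNIFORM_TEXEL_BUFFER": 16,
   }


   def layout_bindings(bindings):
       """Assign a byte offset to each (descriptor_type, array_size) pair."""
       layout, offset = [], 0
       for descriptor_type, array_size in bindings:
           stride = DESCRIPTOR_SIZE[descriptor_type]
           layout.append({"descriptor_offset": offset,
                          "array_size": array_size,
                          "stride": stride})
           offset += array_size * stride
       return layout, offset


   def element_offset(binding_layout, index):
       """Byte offset of element ``index`` of a binding in the set memory."""
       if not 0 <= index < binding_layout["array_size"]:
           raise IndexError("array element out of range")
       return binding_layout["descriptor_offset"] + index * binding_layout["stride"]


   layout, total_size = layout_bindings([
       ("STORAGE_IMAGE", 3),
       ("COMBINED_IMAGE_SAMPLER", 2),
       ("UNIFORM_BUFFER", 1),
       ("UNIFORM_TEXEL_BUFFER", 1),
   ])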


Pipeline state emission
-----------------------

Vulkan initially started by baking as much state as possible in
pipelines. But extension after extension, more and more state has
become potentially dynamic.

ANV tries to limit the number of times an instruction has to be packed
to reprogram part of the 3D pipeline state. The packing happens in two
places:

- ``genX_pipeline.c``, where the non-dynamic state is emitted in the
  pipeline batch. Chunks of the batches are copied into the command
  buffer as a result of calling ``vkCmdBindPipeline()``, depending on
  what changes from the previously bound graphics pipeline.

- ``genX_gfx_state.c``, where the dynamic state is added to already
  packed instructions from ``genX_pipeline.c``.

The rule to know where to emit an instruction programming the 3D
pipeline is as follows:

- If any field of the instruction can be made dynamic, it should be
  emitted in ``genX_gfx_state.c``.

- Otherwise, the instruction can be emitted in ``genX_pipeline.c``.

When a piece of state programming is dynamic, it should have a
corresponding field in ``anv_gfx_dynamic_state`` and the
``genX(cmd_buffer_flush_gfx_runtime_state)`` function should be
updated to ensure we minimize the number of times an instruction is
emitted. Each instruction should have an associated
``ANV_GFX_STATE_*`` mask so that the dynamic emission code can tell
when to re-emit an instruction.
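
The dirty-mask idea can be sketched in Python. This is a generic model
of the technique, not ANV code: the ``GfxState`` bits are hypothetical
stand-ins for ``ANV_GFX_STATE_*`` values, and ``DynamicState`` is an
illustrative class rather than ``anv_gfx_dynamic_state``.

.. code-block:: python

   from enum import IntFlag


   class GfxState(IntFlag):
       """Hypothetical per-instruction dirty bits."""
       VIEWPORT = 1 << 0
       LINE_WIDTH = 1 << 1
       DEPTH_BIAS = 1 << 2


   class DynamicState:
       def __init__(self):
           self.values = {}
           self.dirty = GfxState(0)
           self.emitted = []   # log of re-emitted instructions

       def set(self, bit, value):
           # Only mark the instruction dirty when the value changes.
           if self.values.get(bit) != value:
               self.values[bit] = value
               self.dirty |= bit

       def flush(self):
           # Re-emit only the instructions whose dirty bit is set.
           for bit in GfxState:
               if self.dirty & bit:
                   self.emitted.append(bit.name)
           self.dirty = GfxState(0)


   state = DynamicState()
   state.set(GfxState.LINE_WIDTH, 2.0)
   state.flush()                         # emits LINE_WIDTH once
   state.set(GfxState.LINE_WIDTH, 2.0)   # same value: nothing is dirtied
   state.flush()                         # emits nothing

Comparing against the previously stored value before setting the bit
is what keeps redundant state changes from triggering a re-emission.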


Generated indirect draws optimization
-------------------------------------

Indirect draws have traditionally been implemented on Intel HW by
loading the indirect parameters from memory into HW registers using
the command streamer's ``MI_LOAD_REGISTER_MEM`` instruction before
dispatching a draw call to the 3D pipeline.

On recent products, it was found that the command streamer is a
performance bottleneck, because it cannot dispatch draw calls fast
enough to keep the 3D pipeline busy.

The solution to this problem is to change the way we deal with
indirect draws. Instead of loading HW registers with values using the
command streamer, we generate an entire set of ``3DPRIMITIVE``
instructions using a shader. The generated instructions contain the
entire draw call parameters. This way the command streamer executes
only ``3DPRIMITIVE`` instructions and doesn't do any data loading from
memory or touch HW registers, feeding the 3D pipeline as fast as it
can.
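
The idea can be sketched in Python as a CPU-side model of what a
generation shader does. The function name ``generate_draws`` and the
dictionary standing in for a ``3DPRIMITIVE`` instruction are
illustrative assumptions; the only layout relied on is that of
``VkDrawIndirectCommand`` (four 32-bit values: ``vertexCount``,
``instanceCount``, ``firstVertex``, ``firstInstance``).

.. code-block:: python

   import struct

   # VkDrawIndirectCommand: four uint32 values (little-endian on x86).
   DRAW_INDIRECT = struct.Struct("<4I")


   def generate_draws(indirect_buffer, draw_count, offset=0,
                      stride=DRAW_INDIRECT.size):
       """Turn indirect draw parameters into fully specified draw records.

       Each returned dict stands in for one generated 3DPRIMITIVE; the
       real instruction encoding is not reproduced here.
       """
       draws = []
       for i in range(draw_count):
           vertex_count, instance_count, first_vertex, first_instance = (
               DRAW_INDIRECT.unpack_from(indirect_buffer, offset + i * stride)
           )
           draws.append({
               "vertex_count": vertex_count,
               "instance_count": instance_count,
               "first_vertex": first_vertex,
               "first_instance": first_instance,
           })
       return draws


   buffer = DRAW_INDIRECT.pack(3, 1, 0, 0) + DRAW_INDIRECT.pack(6, 2, 3, 1)
   draws = generate_draws(buffer, draw_count=2)

Once the records exist, consuming them requires no further memory
reads for parameters, which is the property the text describes for the
command streamer.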

In ANV this is implemented in two different ways:

By generating instructions directly into the command stream using a
side batch buffer. When ANV encounters the first indirect draw, it
generates a jump into the side batch. The side batch contains a draw
call using a generation shader for each indirect draw. We keep adding
more generation draws into the batch until we have to stop due to
the end of the command buffer, secondary command buffer calls, or a
barrier containing the access flag
``VK_ACCESS_INDIRECT_COMMAND_READ_BIT``. The side batch buffer jumps
back right after the instruction where it was called. Here is a
high-level diagram showing how the generation batch buffer writes in
the main command buffer:

.. graphviz::

  digraph commands_mode {
    rankdir = "LR"
    "main-command-buffer" [
      label = "main command buffer|...|draw indirect0 start|<f0>jump to\ngeneration batch|<f1>|<f2>empty instruction0|<f3>empty instruction1|...|draw indirect0 end|...|draw indirect1 start|<f4>empty instruction0|<f5>empty instruction1|...|<f6>draw indirect1 end|..."
      shape = "record"
    ];
    "generation-command-buffer" [
      label = "generation command buffer|<f0>|<f1>write draw indirect0|<f2>write draw indirect1|...|<f3>exit jump"
      shape = "record"
    ];
    "main-command-buffer":f0 -> "generation-command-buffer":f0;
    "generation-command-buffer":f1 -> "main-command-buffer":f2 [color="#0000ff"];
    "generation-command-buffer":f1 -> "main-command-buffer":f3 [color="#0000ff"];
    "generation-command-buffer":f2 -> "main-command-buffer":f4 [color="#0000ff"];
    "generation-command-buffer":f2 -> "main-command-buffer":f5 [color="#0000ff"];
    "generation-command-buffer":f3 -> "main-command-buffer":f1;
  }

By generating instructions into a ring buffer of commands, when the
draw count is high. This solution allows smaller batches to be
emitted. Here is a high-level diagram showing how things are
executed:

.. graphviz::

  digraph ring_mode {
    rankdir=LR;
    "main-command-buffer" [
      label = "main command buffer|...| draw indirect |<f1>generation shader|<f2> jump to ring|<f3> increment\ndraw_base|<f4>..."
      shape = "record"
    ];
    "ring-buffer" [
      label = "ring buffer|<f0>generated draw0|<f1>generated draw1|<f2>generated draw2|...|<f3>exit jump"
      shape = "record"
    ];
    "main-command-buffer":f2 -> "ring-buffer":f0;
    "ring-buffer":f3 -> "main-command-buffer":f3;
    "ring-buffer":f3 -> "main-command-buffer":f4;
    "main-command-buffer":f3 -> "main-command-buffer":f1;
    "main-command-buffer":f1 -> "ring-buffer":f1 [color="#0000ff"];
    "main-command-buffer":f1 -> "ring-buffer":f2 [color="#0000ff"];
  }
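
The loop in the ring buffer diagram can be modeled as follows. This is
a sketch, not ANV code: ``run_ring_mode`` and ``ring_size`` (the number
of draws the ring can hold) are hypothetical names, and the termination
condition is an assumption based on the ``increment draw_base`` step
shown in the diagram.

.. code-block:: python

   def run_ring_mode(draw_count, ring_size):
       """Return the (draw_base, count) chunks executed through the ring."""
       if ring_size <= 0:
           raise ValueError("ring_size must be positive")
       chunks = []
       draw_base = 0
       while draw_base < draw_count:
           # The generation shader fills the ring with up to ring_size
           # draws, then the main batch jumps into the ring.
           count = min(ring_size, draw_count - draw_base)
           chunks.append((draw_base, count))
           # After the ring's exit jump: "increment draw_base" and loop.
           draw_base += ring_size
       return chunks


   chunks = run_ring_mode(draw_count=10, ring_size=4)

The ring is reused for every chunk, so its size stays fixed however
large the draw count is.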

Runtime dependencies
--------------------

Starting with Intel 12th generation/Alder Lake-P and Intel Arc
Alchemist, the Intel 3D driver stack requires GuC firmware for proper
operation. You have two options to install the firmware:

- Distro package: Install the pre-packaged firmware included in your
  Linux distribution's repositories.
- Manual download: You can download the firmware from the official
  repository at
  https://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git/tree/i915
  and place the downloaded files in the ``/lib/firmware/i915``
  directory.

Important: For optimal performance, we recommend updating the GuC
firmware to version 70.6.3 or later.