ANV
===

Debugging
---------

Here are a few debug environment variables specific to ANV:

:envvar:`ANV_ENABLE_PIPELINE_CACHE`
   If defined to ``0`` or ``false``, this will disable pipeline
   caching, forcing ANV to reparse and recompile any VkShaderModule
   (SPIR-V) it is given.
:envvar:`ANV_DISABLE_SECONDARY_CMD_BUFFER_CALLS`
   If defined to ``1`` or ``true``, this will prevent usage of
   self-modifying command buffers to implement
   ``vkCmdExecuteCommands``. As a result of this, it will also disable
   :ext:`VK_KHR_performance_query`.
:envvar:`ANV_ALWAYS_BINDLESS`
   If defined to ``1`` or ``true``, this forces all descriptor sets to
   use the internal `Bindless model`_.
:envvar:`ANV_QUEUE_THREAD_DISABLE`
   If defined to ``1`` or ``true``, this disables support for timeline
   semaphores.
:envvar:`ANV_USERSPACE_RELOCS`
   If defined to ``1`` or ``true``, this forces ANV to always do
   kernel relocations in command buffers. This should only have an
   effect on hardware that doesn't support soft-pinning (Ivy Bridge,
   Haswell, Cherryview).
:envvar:`ANV_PRIMITIVE_REPLICATION_MAX_VIEWS`
   Specifies up to how many view shaders can be lowered to handle
   :ext:`VK_KHR_multiview`. Beyond this number, multiview is implemented
   using instanced rendering. If unspecified, the value defaults to
   ``2``.


Experimental features
---------------------

.. _`Bindless model`:

Binding Model
-------------

Here is the ANV bindless binding model that was implemented for the
descriptor indexing feature of Vulkan 1.2:

.. graphviz::

   digraph G {
     fontcolor="black";
     compound=true;

     subgraph cluster_1 {
       label = "Binding Table (HW)";

       bgcolor="cornflowerblue";

       node [ style=filled,shape="record",fillcolor="white",
              label="RT0" ] n0;
       node [ label="RT1" ] n1;
       node [ label="dynbuf0"] n2;
       node [ label="set0" ] n3;
       node [ label="set1" ] n4;
       node [ label="set2" ] n5;

       n0 -> n1 -> n2 -> n3 -> n4 -> n5 [style=invis];
     }
     subgraph cluster_2 {
       label = "Descriptor Set 0";

       bgcolor="burlywood3";
       fixedsize = true;

       node [ style=filled,shape="record",fillcolor="white", fixedsize = true, width=4,
              label="binding 0 - STORAGE_IMAGE\n anv_storage_image_descriptor" ] n8;
       node [ label="binding 1 - COMBINED_IMAGE_SAMPLER\n anv_sampled_image_descriptor" ] n9;
       node [ label="binding 2 - UNIFORM_BUFFER\n anv_address_range_descriptor" ] n10;
       node [ label="binding 3 - UNIFORM_TEXEL_BUFFER\n anv_storage_image_descriptor" ] n11;

       n8 -> n9 -> n10 -> n11 [style=invis];
     }
     subgraph cluster_5 {
       label = "Vulkan Objects"

       fontcolor="black";
       bgcolor="darkolivegreen4";

       subgraph cluster_6 {
         label = "VkImageView";

         bgcolor=darkolivegreen3;
         node [ style=filled,shape="box",fillcolor="white", fixedsize = true, width=2,
                label="surface_state" ] n12;
       }
       subgraph cluster_7 {
         label = "VkSampler";

         bgcolor=darkolivegreen3;
         node [ style=filled,shape="box",fillcolor="white", fixedsize = true, width=2,
                label="sample_state" ] n13;
       }
       subgraph cluster_8 {
         label = "VkImageView";
         bgcolor="darkolivegreen3";

         node [ style=filled,shape="box",fillcolor="white", fixedsize = true, width=2,
                label="surface_state" ] n14;
       }
       subgraph cluster_9 {
         label = "VkBuffer";
         bgcolor=darkolivegreen3;

         node [ style=filled,shape="box",fillcolor="white", fixedsize = true, width=2,
                label="address" ] n15;
       }
       subgraph cluster_10 {
         label = "VkBufferView";

         bgcolor=darkolivegreen3;
         node [ style=filled,shape="box",fillcolor="white", fixedsize = true, width=2,
                label="surface_state" ] n16;
       }

       n12 -> n13 -> n14 -> n15 -> n16 [style=invis];
     }

     subgraph cluster_11 {
       subgraph cluster_12 {
         label = "CommandBuffer state stream";

         bgcolor="gold3";
         node [ style=filled,shape="box",fillcolor="white", fixedsize = true, width=2,
                label="surface_state" ] n17;
         node [ label="surface_state" ] n18;
         node [ label="surface_state" ] n19;

         n17 -> n18 -> n19 [style=invis];
       }
     }

     n3 -> n8 [lhead=cluster_2];

     n8 -> n12;
     n9 -> n13;
     n9 -> n14;
     n10 -> n15;
     n11 -> n16;

     n0 -> n17;
     n1 -> n18;
     n2 -> n19;
   }



The HW binding table is generated when the draw or dispatch commands
are emitted. Here are the types of entries one can find in the binding
table:

- The currently bound descriptor sets, one entry per descriptor set
  (our limit is 8).

- For dynamic buffers, one entry per dynamic buffer.

- For draw commands, render target entries if needed.

The entries of the HW binding table for descriptor sets are
``RENDER_SURFACE_STATE`` entries similar to what you would have for a
normal uniform buffer. The shader first emits reads of this buffer to
get the information it needs to access a surface, sampler, etc., and
then emits the appropriate message using the information gathered from
the descriptor set buffer.

Each binding type entry gets an associated structure in memory
(``anv_storage_image_descriptor``, ``anv_sampled_image_descriptor``,
``anv_address_range_descriptor``, ``anv_storage_image_descriptor``).
This is the information read by the shader.


.. _`Binding tables`:

Binding Tables
--------------

Binding tables are arrays of 32-bit offset entries referencing surface
states. Shaders refer to a binding table entry to read or write a
surface. For example, fragment shaders will often refer to entry 0 as
the first render target.

The way binding tables are managed is fairly awkward.

Each shader stage must have its binding table programmed through
a corresponding instruction
``3DSTATE_BINDING_TABLE_POINTERS_*`` (each stage has its own).

.. graphviz::

   digraph structs {
     node [shape=record];
     struct3 [label="{ binding tables\n area | { <bt4> BT4 | <bt3> BT3 | ... | <bt0> BT0 } }|{ surface state\n area |{<ss0> ss0|<ss1> ss1|<ss2> ss2|...}}"];
     struct3:bt0 -> struct3:ss0;
     struct3:bt0 -> struct3:ss1;
   }


The value programmed in the ``3DSTATE_BINDING_TABLE_POINTERS_*``
instructions is not a 64-bit pointer but an offset from the address
programmed in ``STATE_BASE_ADDRESS::Surface State Base Address`` or
``3DSTATE_BINDING_TABLE_POOL_ALLOC::Binding Table Pool Base Address``
(available on Gfx11+). The offset value in
``3DSTATE_BINDING_TABLE_POINTERS_*`` is also limited to a few bits
(not a full 32-bit value), meaning that as we use more and more binding
tables we need to reposition ``STATE_BASE_ADDRESS::Surface State Base
Address`` to make space for new binding table arrays.

To make things even more awkward, the binding table entries are also
relative to ``STATE_BASE_ADDRESS::Surface State Base Address``, so as
we change ``STATE_BASE_ADDRESS::Surface State Base Address`` we need
to add that offset to the binding table entries.

The way we deal with this is to allocate 4GB of address space
(since the binding table entries can address 4GB of surface state
elements). We reserve the first gigabyte exclusively for binding
tables, so that wherever we position our binding table in that first
gigabyte, it can always refer to the surface states in the next 3GB.


.. _`Descriptor Set Memory Layout`:

Descriptor Set Memory Layout
----------------------------

Here is a representation of how the descriptor set bindings, with each
element of each binding, are mapped to the descriptor set memory:

.. graphviz::

   digraph structs {
     node [shape=record];
     rankdir=LR;

     struct1 [label="Descriptor Set | \
                     <b0> binding 0\n STORAGE_IMAGE \n (array_length=3) | \
                     <b1> binding 1\n COMBINED_IMAGE_SAMPLER \n (array_length=2) | \
                     <b2> binding 2\n UNIFORM_BUFFER \n (array_length=1) | \
                     <b3> binding 3\n UNIFORM_TEXEL_BUFFER \n (array_length=1)"];
     struct2 [label="Descriptor Set Memory | \
                     <b0e0> anv_storage_image_descriptor|\
                     <b0e1> anv_storage_image_descriptor|\
                     <b0e2> anv_storage_image_descriptor|\
                     <b1e0> anv_sampled_image_descriptor|\
                     <b1e1> anv_sampled_image_descriptor|\
                     <b2e0> anv_address_range_descriptor|\
                     <b3e0> anv_storage_image_descriptor"];

     struct1:b0 -> struct2:b0e0;
     struct1:b0 -> struct2:b0e1;
     struct1:b0 -> struct2:b0e2;
     struct1:b1 -> struct2:b1e0;
     struct1:b1 -> struct2:b1e1;
     struct1:b2 -> struct2:b2e0;
     struct1:b3 -> struct2:b3e0;
   }

Each binding in the descriptor set is allocated an array of
``anv_*_descriptor`` data structures. The type of ``anv_*_descriptor``
used for a binding is selected based on the ``VkDescriptorType`` of
the binding.

The value of ``anv_descriptor_set_binding_layout::descriptor_offset``
is a byte offset from the start of the descriptor set memory to the
associated binding. ``anv_descriptor_set_binding_layout::array_size``
is the number of ``anv_*_descriptor`` elements in the descriptor set
memory from that offset for the binding.


Pipeline state emission
-----------------------

Vulkan initially started by baking as much state as possible in
pipelines. But extension after extension, more and more state has
become potentially dynamic.

ANV tries to limit the number of times an instruction has to be packed
to reprogram part of the 3D pipeline state. The packing happens in 2
places:

- ``genX_pipeline.c`` where the non-dynamic state is emitted in the
  pipeline batch. Chunks of the batches are copied into the command
  buffer as a result of calling ``vkCmdBindPipeline()``, depending on
  what changes from the previously bound graphics pipeline.

- ``genX_gfx_state.c`` where the dynamic state is added to already
  packed instructions from ``genX_pipeline.c``.

The rule to know where to emit an instruction programming the 3D
pipeline is as follows:

- If any field of the instruction can be made dynamic, it should be
  emitted in ``genX_gfx_state.c``.

- Otherwise, the instruction can be emitted in ``genX_pipeline.c``.

When a piece of state programming is dynamic, it should have a
corresponding field in ``anv_gfx_dynamic_state`` and the
``genX(cmd_buffer_flush_gfx_runtime_state)`` function should be
updated to ensure we minimize the number of times an instruction is
emitted. Each instruction should have an associated
``ANV_GFX_STATE_*`` mask so that the dynamic emission code can tell
when to re-emit an instruction.


Generated indirect draws optimization
-------------------------------------

Indirect draws have traditionally been implemented on Intel HW by
loading the indirect parameters from memory into HW registers using
the command streamer's ``MI_LOAD_REGISTER_MEM`` instruction before
dispatching a draw call to the 3D pipeline.

On recent products, it was found that the command streamer shows up
as a performance bottleneck, because it cannot dispatch draw calls
fast enough to keep the 3D pipeline busy.

The solution to this problem is to change the way we deal with
indirect draws. Instead of loading HW registers with values using the
command streamer, we generate an entire set of ``3DPRIMITIVE``
instructions using a shader. The generated instructions contain the
entire draw call parameters. This way the command streamer executes
only ``3DPRIMITIVE`` instructions and doesn't do any data loading from
memory or touch HW registers, feeding the 3D pipeline as fast as it
can.

In ANV this is implemented in 2 different ways:

By generating instructions directly into the command stream using a
side batch buffer. When ANV encounters the first indirect draw, it
generates a jump into the side batch. The side batch contains a draw
call using a generation shader for each indirect draw. We keep adding
more generation draws into the batch until we have to stop due to
command buffer end, secondary command buffer calls or a barrier
containing the access flag ``VK_ACCESS_INDIRECT_COMMAND_READ_BIT``.
The side batch buffer jumps back right after the instruction where it
was called. Here is a high level diagram showing how the generation
batch buffer writes in the main command buffer:

.. graphviz::

   digraph commands_mode {
     rankdir = "LR"
     "main-command-buffer" [
       label = "main command buffer|...|draw indirect0 start|<f0>jump to\ngeneration batch|<f1>|<f2>empty instruction0|<f3>empty instruction1|...|draw indirect0 end|...|draw indirect1 start|<f4>empty instruction0|<f5>empty instruction1|...|<f6>draw indirect1 end|..."
       shape = "record"
     ];
     "generation-command-buffer" [
       label = "generation command buffer|<f0>|<f1>write draw indirect0|<f2>write draw indirect1|...|<f3>exit jump"
       shape = "record"
     ];
     "main-command-buffer":f0 -> "generation-command-buffer":f0;
     "generation-command-buffer":f1 -> "main-command-buffer":f2 [color="#0000ff"];
     "generation-command-buffer":f1 -> "main-command-buffer":f3 [color="#0000ff"];
     "generation-command-buffer":f2 -> "main-command-buffer":f4 [color="#0000ff"];
     "generation-command-buffer":f2 -> "main-command-buffer":f5 [color="#0000ff"];
     "generation-command-buffer":f3 -> "main-command-buffer":f1;
   }

By generating instructions into a ring buffer of commands, when the
draw count is high. This solution allows smaller batches to be
emitted. Here is a high level diagram showing how things are
executed:

.. graphviz::

   digraph ring_mode {
     rankdir=LR;
     "main-command-buffer" [
       label = "main command buffer|...| draw indirect |<f1>generation shader|<f2> jump to ring|<f3> increment\ndraw_base|<f4>..."
       shape = "record"
     ];
     "ring-buffer" [
       label = "ring buffer|<f0>generated draw0|<f1>generated draw1|<f2>generated draw2|...|<f3>exit jump"
       shape = "record"
     ];
     "main-command-buffer":f2 -> "ring-buffer":f0;
     "ring-buffer":f3 -> "main-command-buffer":f3;
     "ring-buffer":f3 -> "main-command-buffer":f4;
     "main-command-buffer":f3 -> "main-command-buffer":f1;
     "main-command-buffer":f1 -> "ring-buffer":f1 [color="#0000ff"];
     "main-command-buffer":f1 -> "ring-buffer":f2 [color="#0000ff"];
   }

Runtime dependencies
--------------------

Starting with Intel 12th generation/Alder Lake-P and Intel Arc
Alchemist, the Intel 3D driver stack requires GuC firmware for proper
operation. You have two options to install the firmware:

- Distro package: Install the pre-packaged firmware included in your
  Linux distribution's repositories.
- Manual download: You can download the firmware from the official
  repository:
  https://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git/tree/i915.
  Place the downloaded files in the ``/lib/firmware/i915`` directory.

Important: For optimal performance, we recommend updating the GuC
firmware to version 70.6.3 or later.
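
As a rough illustration of the version recommendation above, the
following Python sketch checks a GuC version against 70.6.3. It
assumes the version is available as text containing ``version X.Y.Z``,
for example a line copied from the kernel log. That text format, the
sample lines and the helper names ``parse_guc_version`` and
``guc_is_recent_enough`` are assumptions made for this example; they
are not part of ANV or of the kernel. Versions are compared as integer
tuples, because a plain string comparison would wrongly rank
``70.20.0`` below ``70.6.3``.

.. code-block:: python

   import re

   # Minimum GuC firmware version recommended in this section.
   MIN_GUC_VERSION = (70, 6, 3)


   def parse_guc_version(text):
       """Return (major, minor, patch) from 'version X.Y.Z', or None.

       The 'version X.Y.Z' wording is an assumed format, not a
       documented interface.
       """
       match = re.search(r"version\s+(\d+)\.(\d+)\.(\d+)", text)
       if match is None:
           return None
       return tuple(int(part) for part in match.groups())


   def guc_is_recent_enough(text, minimum=MIN_GUC_VERSION):
       """True if the version found in text is at least `minimum`."""
       version = parse_guc_version(text)
       return version is not None and version >= minimum


   if __name__ == "__main__":
       # Hypothetical sample lines, for illustration only.
       samples = [
           "GuC firmware i915/example_guc.bin version 70.20.0",
           "GuC firmware i915/example_guc.bin version 70.5.1",
       ]
       for line in samples:
           print(line, "->", guc_is_recent_enough(line))

Text with no recognizable version is reported as not recent enough, so
a missing or unreadable version is never mistaken for a good one.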