asahi.rst - OpenGrok cross reference for /third_party/mesa3d/docs/drivers/asahi.rst

Lines Matching +full:run +full:- +full:shader +full:- +full:db
7 -----------------
12 reverse-engineering the hardware, as glue to get at the "interesting" GPU
15 The library is only built if ``-Dtools=asahi`` is passed. It builds a single
19 For example, to trace an app ``./app``, run:
24 -----------------
26 At an API level, vertex shader outputs need to be interpolated to become
27 fragment shader inputs. This process is logically pipelined in AGX, with a value
28 traveling from a vertex shader to remapping hardware to coefficient register
29 setup to the fragment shader to the iterator hardware. Each stage is described
32 Vertex shader
35 A vertex shader (running on the :term:`Unified Shader Cores`) outputs varyings with the
36 ``st_var`` instruction. ``st_var`` takes a *vertex output index* and a 32-bit
38 of the shader in the "Bind Vertex Pipeline" packet. The value may
39 consist of a single 32-bit value or an aligned 16-bit register pair, depending
40 on whether interpolation should happen at 32-bit or 16-bit. Vertex outputs are
42 32-bit user varyings coming next with perspective, flat, and linear interpolated
43 varyings grouped in that order, then 16-bit user varyings with the same groupings,
45 *clip distances* are not accessible from the fragment shader; if the fragment
46 shader needs to read the interpolated clip distance, the vertex shader must
47 *also* write the clip distance values to a user varying for the fragment shader
51 .. list-table:: Ordering of vertex outputs with all outputs used
53    :header-rows: 1
55    * - Size (words)
56      - Value
57    * - 4
58      - Vertex position
59    * - 1
60      - 32-bit smooth varying 0
61    * -
62      - ...
63    * - 1
64      - 32-bit smooth varying m
65    * - 1
66      - 32-bit flat varying 0
67    * -
68      - ...
69    * - 1
70      - 32-bit flat varying n
71    * - 1
72      - 32-bit linear varying 0
73    * -
74      - ...
75    * - 1
76      - 32-bit linear varying o
77    * - 1
78      - Packed pair of 16-bit smooth varyings 0
79    * -
80      - ...
81    * - 1
82      - Packed pair of 16-bit smooth varyings p
83    * - 1
84      - Packed pair of 16-bit flat varyings 0
85    * -
86      - ...
87    * - 1
88      - Packed pair of 16-bit flat varyings q
89    * - 1
90      - Packed pair of 16-bit linear varyings 0
91    * -
92      - ...
93    * - 1
94      - Packed pair of 16-bit linear varyings r
95    * - 1
96      - Point size
97    * - 1
98      - Clip distance for plane 0
99    * -
100      - ...
101    * - 1
102      - Clip distance for plane 15
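
The ordering in the table above can be sketched as a small offset calculator. This is an illustrative Python sketch, not code from Mesa: the helper name ``vertex_output_offsets`` and its parameters are hypothetical. It assumes the position is always present and that unused groups simply take no space, which the table (captioned "with all outputs used") does not itself confirm.

```python
def vertex_output_offsets(smooth32=0, flat32=0, linear32=0,
                          smooth16=0, flat16=0, linear16=0,
                          point_size=False, clip_planes=0):
    """Return {group: first word index} in the order given by the table.

    Hypothetical helper. Each count is a number of words: one word per
    32-bit varying, one word per packed pair of 16-bit varyings.
    """
    groups = [
        ("position", 4),              # vertex position, 4 words
        ("smooth32", smooth32),       # 32-bit smooth varyings
        ("flat32", flat32),           # 32-bit flat varyings
        ("linear32", linear32),       # 32-bit linear varyings
        ("smooth16", smooth16),       # packed 16-bit smooth pairs
        ("flat16", flat16),           # packed 16-bit flat pairs
        ("linear16", linear16),       # packed 16-bit linear pairs
        ("point_size", 1 if point_size else 0),
        ("clip_distance", clip_planes),  # one word per clip plane
    ]
    offsets = {}
    cursor = 0
    for name, words in groups:
        if words:                     # assumption: empty groups are skipped
            offsets[name] = cursor
        cursor += words
    offsets["total_words"] = cursor
    return offsets


layout = vertex_output_offsets(smooth32=2, flat32=1, smooth16=1,
                               point_size=True, clip_planes=2)
```

With two smooth and one flat 32-bit varyings, one packed 16-bit smooth pair, a point size, and two clip planes, the sketch places the first smooth varying at word 4, directly after the position.
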
113 .. list-table:: Ordering of remapped slots
115    :header-rows: 1
117    * - Index
118      - Value
119    * - 0
120      - Fragment coord W
121    * - 1
122      - Fragment coord Z
123    * - 2
124      - 32-bit varying 0
125    * -
126      - ...
127    * - 2 + m
128      - 32-bit varying m
129    * - 2 + m + 1
130      - Packed pair of 16-bit varyings 0
131    * -
132      - ...
133    * - 2 + m + n + 1
134      - Packed pair of 16-bit varyings n
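
The index column of the table above can be expressed as a short function. This is a sketch for illustration only: the name ``remapped_slot`` and the ``kind`` strings are hypothetical, and ``num_32bit`` is the number of 32-bit varyings (``m + 1`` in the table's notation, since the varyings are numbered from 0 to ``m``).

```python
def remapped_slot(kind, index=0, num_32bit=0):
    """Map a value to its remapped slot index, following the table.

    Hypothetical helper. `kind` is one of "w", "z", "32" or "16";
    `index` counts 32-bit varyings or packed 16-bit pairs from 0.
    """
    if index < 0:
        raise ValueError("index must be non-negative")
    if kind == "w":
        return 0                      # fragment coord W
    if kind == "z":
        return 1                      # fragment coord Z
    if kind == "32":
        if index >= num_32bit:
            raise ValueError("32-bit varying index out of range")
        return 2 + index              # 32-bit varyings follow W and Z
    if kind == "16":
        return 2 + num_32bit + index  # 16-bit pairs follow all 32-bit slots
    raise ValueError(f"unknown kind: {kind!r}")


# Three 32-bit varyings, i.e. m = 2 in the table's notation.
last_32 = remapped_slot("32", 2, num_32bit=3)
first_16 = remapped_slot("16", 0, num_32bit=3)
```

Here ``last_32`` lands at ``2 + m`` and ``first_16`` at ``2 + m + 1``, matching the corresponding rows of the table.
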
139 The fragment shader does not see the physical slots.
141 register is a register allocated constant for all fragment shader invocations in
142 a given polygon. Physically, it contains the values output by the vertex shader
149 preceded by a header. The header contains the number of 32-bit varying slots. As
151 is below this count are treated as 32-bit. The remaining slots are treated as
152 16-bit.
161 However, this may be inconvenient for some APIs that require a separable shader
162 model. For these APIs, the flexibility to mix-and-match slots and coefficient
163 registers allows mixing shaders without shader variants. In that case, the
165 bindings are fixed and known at compile-time, the bindings could be generated
168 Fragment shader
171 In the fragment shader, coefficient registers, identified by the prefix ``cf``
185 To actually interpolate varyings, AGX provides fixed-function iteration hardware
189 the required coefficients are preloaded before the shader begins execution.
191 a data fence, and does not require the shader to wait on a data fence before
195 -------------
200 The simplest layout is **strided linear**. Pixels are stored in raster-order in
201 memory with a software-controlled stride. Strided linear images are useful for
202 working with modifier-unaware window systems; however, performance will suffer.
205 - Strides must be a multiple of 16 bytes.
206 - Strides must be nonzero. For 1D images where the stride is logically
208 - Only 1D, 2D, and 2D Array images may be linear. In particular, 3D images and cubemaps may not be.
209 - 2D images must not be mipmapped.
210 - Block-compressed formats and multisampled images are unsupported. Elements of
222 power-of-two sized tiles. The tiles themselves are stored in raster-order.
227 are used (:math:`n` power-of-two). :math:`n` is such that each page contains
228 exactly one tile. Only power-of-two block sizes are supported in hardware,
232 .. list-table:: Tile sizes for large images
234    :header-rows: 1
236    * - Bytes per block
237      - Tile size
238    * - 1
239      - 128 x 128
240    * - 2
241      - 128 x 64
242    * - 4
243      - 64 x 64
244    * - 8
245      - 64 x 32
246    * - 16
247      - 32 x 32
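
Every row of the table above multiplies out to the same footprint (16384 bytes per tile), so the sizes can be derived instead of looked up. The sketch below is illustrative and not taken from Mesa: the names ``TILE_BYTES`` and ``large_tile_size`` are hypothetical, and reading each table entry as width x height is an assumption.

```python
TILE_BYTES = 16384  # footprint shared by every row of the table


def large_tile_size(bytes_per_block):
    """Return the tile size in blocks for a large image.

    Hypothetical helper. The result is ordered like the table entries,
    assumed here to be (width, height).
    """
    if bytes_per_block not in (1, 2, 4, 8, 16):
        raise ValueError("only the block sizes listed in the table are handled")
    blocks = TILE_BYTES // bytes_per_block   # blocks per tile
    log2_blocks = blocks.bit_length() - 1    # exact: blocks is a power of two
    height = 1 << (log2_blocks // 2)         # smaller half of the exponent
    width = 1 << (log2_blocks - log2_blocks // 2)
    return (width, height)


sizes = {bpb: large_tile_size(bpb) for bpb in (1, 2, 4, 8, 16)}
```

When the exponent is odd, the extra factor of two goes to the first dimension, which reproduces the 128 x 64 and 64 x 32 rows.
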
250 In addition, non-power-of-two large images have extra padding tiles when
262 In other words, small images use the smallest square power-of-two tile such that
272 by the dimensions of level 0. For power-of-two images, the two calculations are
273 equivalent. However, they differ subtly for non-power-of-two images. To
275 tiles for level 0 should be right-shifted by :math:`2l`. That appears to divide
280 non-power-of-two integer multiplication is only required for level 0.
296 drm-shim (Linux only)
297 ---------------------
300 stack, allowing the Mesa driver to run on non-M1 Linux hardware. This can be
305    -Dgallium-drivers=asahi -Dtools=drm-shim
307 Then run an OpenGL workload with the environment variable:
309 .. code-block:: sh
311    LD_PRELOAD=~/mesa/build/src/asahi/drm-shim/libasahi_noop_drm_shim.so
313 For example, to compile a shader with shaderdb and print some statistics along
316 .. code-block:: sh
318 …shader-db$ AGX_MESA_DEBUG=shaders,shaderdb ASAHI_MESA_DEBUG=precompile LD_PRELOAD=~/mesa/build/src…
320 The drm-shim implementation for Asahi is located in ``src/asahi/drm-shim``. The
321 drm-shim implementation there should be updated as new UABI is added.
324 -----------------
344    Unified Shader Cores
345       A unified shader core is a small CPU that runs shader code. The core is
348       compute have separate ISAs for shader stages.
367       Hardware unit which buffers the outputs of the vertex shader (varyings).