
Lines Matching +full:run +full:- +full:shader +full:- +full:db

7 -----------------
12 reverse-engineering the hardware, as glue to get at the "interesting" GPU
15 The library is only built if ``-Dtools=asahi`` is passed. It builds a single
19 For example, to trace an app ``./app``, run:
24 -----------------
26 At an API level, vertex shader outputs need to be interpolated to become
27 fragment shader inputs. This process is logically pipelined in AGX, with a value
28 traveling from a vertex shader to remapping hardware to coefficient register
29 setup to the fragment shader to the iterator hardware. Each stage is described
32 Vertex shader
35 A vertex shader (running on the :term:`Unified Shader Cores`) outputs varyings with the
36 ``st_var`` instruction. ``st_var`` takes a *vertex output index* and a 32-bit
38 of the shader in the "Bind Vertex Pipeline" packet. The value may
39 consist of a single 32-bit value or an aligned 16-bit register pair, depending
40 on whether interpolation should happen at 32-bit or 16-bit. Vertex outputs are
42 32-bit user varyings coming next with perspective, flat, and linear interpolated
43 varyings grouped in that order, then 16-bit user varyings with the same groupings,
45 *clip distances* are not accessible from the fragment shader; if the fragment
46 shader needs to read the interpolated clip distance, the vertex shader must
47 *also* write the clip distance values to a user varying for the fragment shader
51 .. list-table:: Ordering of vertex outputs with all outputs used
53 :header-rows: 1
55 * - Size (words)
56 - Value
57 * - 4
58 - Vertex position
59 * - 1
60 - 32-bit smooth varying 0
61 * -
62 - ...
63 * - 1
64 - 32-bit smooth varying m
65 * - 1
66 - 32-bit flat varying 0
67 * -
68 - ...
69 * - 1
70 - 32-bit flat varying n
71 * - 1
72 - 32-bit linear varying 0
73 * -
74 - ...
75 * - 1
76 - 32-bit linear varying o
77 * - 1
78 - Packed pair of 16-bit smooth varyings 0
79 * -
80 - ...
81 * - 1
82 - Packed pair of 16-bit smooth varyings p
83 * - 1
84 - Packed pair of 16-bit flat varyings 0
85 * -
86 - ...
87 * - 1
88 - Packed pair of 16-bit flat varyings q
89 * - 1
90 - Packed pair of 16-bit linear varyings 0
91 * -
92 - ...
93 * - 1
94 - Packed pair of 16-bit linear varyings r
95 * - 1
96 - Point size
97 * - 1
98 - Clip distance for plane 0
99 * -
100 - ...
101 * - 1
102 - Clip distance for plane 15
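The ordering in the table above can be sketched as a small offset calculation. This is an illustrative model only, not the driver's code: the `VaryingCounts` type, its field names, and `vertex_output_layout` are hypothetical, and the sketch assumes the position is always written and that unused groups simply occupy zero words.

```python
from dataclasses import dataclass


@dataclass
class VaryingCounts:
    """Hypothetical description of which vertex outputs a shader writes."""
    smooth32: int = 0        # 32-bit perspective-interpolated varyings
    flat32: int = 0          # 32-bit flat varyings
    linear32: int = 0        # 32-bit linear varyings
    smooth16_pairs: int = 0  # packed pairs of 16-bit smooth varyings
    flat16_pairs: int = 0    # packed pairs of 16-bit flat varyings
    linear16_pairs: int = 0  # packed pairs of 16-bit linear varyings
    point_size: bool = False
    clip_distances: int = 0  # number of clip planes written (0..16)


def vertex_output_layout(c: VaryingCounts):
    """Return ({group name: word offset}, total words) in table order."""
    if not 0 <= c.clip_distances <= 16:
        raise ValueError("the table lists clip planes 0 through 15 only")

    # Each entry is (group name, size in 32-bit words), in the documented order.
    groups = [
        ("position", 4),  # assumed to be always present
        ("smooth32", c.smooth32),
        ("flat32", c.flat32),
        ("linear32", c.linear32),
        ("smooth16_pairs", c.smooth16_pairs),
        ("flat16_pairs", c.flat16_pairs),
        ("linear16_pairs", c.linear16_pairs),
        ("point_size", 1 if c.point_size else 0),
        ("clip_distances", c.clip_distances),
    ]

    offsets = {}
    offset = 0
    for name, size in groups:
        if size:
            offsets[name] = offset
        offset += size
    return offsets, offset


example = VaryingCounts(smooth32=2, flat32=1, smooth16_pairs=1,
                        point_size=True, clip_distances=2)
layout, total_words = vertex_output_layout(example)
```

For the example counts, the 32-bit smooth varyings start right after the four position words, and groups that are not used do not appear in the returned mapping.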
113 .. list-table:: Ordering of remapped slots
115 :header-rows: 1
117 * - Index
118 - Value
119 * - 0
120 - Fragment coord W
121 * - 1
122 - Fragment coord Z
123 * - 2
124 - 32-bit varying 0
125 * -
126 - ...
127 * - 2 + m
128 - 32-bit varying m
129 * - 2 + m + 1
130 - Packed pair of 16-bit varyings 0
131 * -
132 - ...
133 * - 2 + m + n + 1
134 - Packed pair of 16-bit varyings n
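As a rough sketch of the slot numbering in the table above, the helper below maps a remapped value to its slot index. The function name and the `kind` strings are hypothetical; `num_32bit` is the count of 32-bit varyings, which equals `m + 1` in the table's notation.

```python
def remapped_slot_index(kind: str, index: int = 0, num_32bit: int = 0) -> int:
    """Return the remapped slot index for a value, following the table order.

    kind is one of "w", "z", "varying32", "varying16_pair" (hypothetical names).
    index is the position of the varying within its own group.
    """
    if kind == "w":
        return 0  # fragment coord W
    if kind == "z":
        return 1  # fragment coord Z
    if kind == "varying32":
        if not 0 <= index < num_32bit:
            raise ValueError("32-bit varying index out of range")
        return 2 + index
    if kind == "varying16_pair":
        if index < 0:
            raise ValueError("16-bit pair index must be non-negative")
        # 16-bit pairs follow all 32-bit varyings: 2 + m + 1 + index.
        return 2 + num_32bit + index
    raise ValueError(f"unknown kind: {kind}")


# With three 32-bit varyings (m = 2), the first 16-bit pair lands at 2 + m + 1.
first_pair_slot = remapped_slot_index("varying16_pair", 0, num_32bit=3)
```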
139 The fragment shader does not see the physical slots.
141 register is a register allocated constant for all fragment shader invocations in
142 a given polygon. Physically, it contains the values output by the vertex shader
149 preceded by a header. The header contains the number of 32-bit varying slots. As
151 is below this count are treated as 32-bit. The remaining slots are treated as
152 16-bit.
161 However, this may be inconvenient for some APIs that require a separable shader
162 model. For these APIs, the flexibility to mix-and-match slots and coefficient
163 registers allows mixing shaders without shader variants. In that case, the
165 bindings are fixed and known at compile-time, the bindings could be generated
168 Fragment shader
171 In the fragment shader, coefficient registers, identified by the prefix ``cf``
185 To actually interpolate varyings, AGX provides fixed-function iteration hardware
189 the required coefficients are preloaded before the shader begins execution.
191 a data fence, and does not require the shader to wait on a data fence before
195 -------------
200 The simplest layout is **strided linear**. Pixels are stored in raster-order in
201 memory with a software-controlled stride. Strided linear images are useful for
202 working with modifier-unaware window systems; however, performance will suffer.
205 - Strides must be a multiple of 16 bytes.
206 - Strides must be nonzero. For 1D images where the stride is logically
208 - Only 1D, 2D, and 2D Array images may be linear. In particular, 3D images and cubemaps cannot be linear.
209 - 2D images must not be mipmapped.
210 - Block-compressed formats and multisampled images are unsupported. Elements of
222 power-of-two sized tiles. The tiles themselves are stored in raster-order.
227 are used (:math:`n` power-of-two). :math:`n` is such that each page contains
228 exactly one tile. Only power-of-two block sizes are supported in hardware,
232 .. list-table:: Tile sizes for large images
234 :header-rows: 1
236 * - Bytes per block
237 - Tile size
238 * - 1
239 - 128 x 128
240 * - 2
241 - 128 x 64
242 * - 4
243 - 64 x 64
244 * - 8
245 - 64 x 32
246 * - 16
247 - 32 x 32
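The table above can be expressed as a lookup, which also makes one property easy to check: every row has the same byte footprint, which fits the statement that each page holds exactly one tile. This is a minimal sketch; `LARGE_TILE_SIZES`, `large_image_tile_size`, and `tile_bytes` are hypothetical names, and reading the first number of each table entry as the width is an assumption.

```python
# Bytes per block -> tile dimensions in blocks, copied from the table.
LARGE_TILE_SIZES = {
    1: (128, 128),
    2: (128, 64),
    4: (64, 64),
    8: (64, 32),
    16: (32, 32),
}


def large_image_tile_size(bytes_per_block: int) -> tuple[int, int]:
    """Look up the tile dimensions for a large image."""
    try:
        return LARGE_TILE_SIZES[bytes_per_block]
    except KeyError:
        raise ValueError(
            "only power-of-two block sizes from 1 to 16 bytes are listed"
        ) from None


def tile_bytes(bytes_per_block: int) -> int:
    """Size of one tile in bytes for the given block size."""
    first, second = large_image_tile_size(bytes_per_block)
    return first * second * bytes_per_block


# Every listed block size yields the same tile footprint.
footprints = {bpb: tile_bytes(bpb) for bpb in LARGE_TILE_SIZES}
```

Each entry in `footprints` works out to 16384 bytes, so halving a tile dimension exactly offsets each doubling of the block size.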
250 In addition, non-power-of-two large images have extra padding tiles when
262 In other words, small images use the smallest square power-of-two tile such that
272 by the dimensions of level 0. For power-of-two images, the two calculations are
273 equivalent. However, they differ subtly for non-power-of-two images. To
275 tiles for level 0 should be right-shifted by :math:`2l`. That appears to divide
280 non-power-of-two integer multiplication is only required for level 0.
296 drm-shim (Linux only)
297 ---------------------
300 stack, allowing the Mesa driver to run on non-M1 Linux hardware. This can be
305 -Dgallium-drivers=asahi -Dtools=drm-shim
307 Then run an OpenGL workload with the environment variable:
309 .. code-block:: sh
311 LD_PRELOAD=~/mesa/build/src/asahi/drm-shim/libasahi_noop_drm_shim.so
313 For example, to compile a shader with shaderdb and print some statistics along
316 .. code-block:: sh
318 shader-db$ AGX_MESA_DEBUG=shaders,shaderdb ASAHI_MESA_DEBUG=precompile LD_PRELOAD=~/mesa/build/src…
320 The drm-shim implementation for Asahi is located in ``src/asahi/drm-shim``. The
321 drm-shim implementation there should be updated as new UABI is added.
324 -----------------
344 Unified Shader Cores
345 A unified shader core is a small CPU that runs shader code. The core is
348 compute have separate ISAs for shader stages.
367 Hardware unit which buffers the outputs of the vertex shader (varyings).