• Home
  • Line#
  • Scopes#
  • Navigate#
  • Raw
  • Download
1r600-sb
2=======
3
4* * * * *
5
6Debugging
7---------
8
9### Environment variables
10
11-   **R600\_DEBUG**
12
13    There are new flags:
14
15    -   **sb** - Enable optimization of graphics shaders
16    -   **sbcl** - Enable optimization of compute shaders (experimental)
17    -   **sbdry** - Dry run, optimize but use source bytecode -
18        useful if you only want to check shader dumps
19        without the risk of lockups and other problems
20    -   **sbstat** - Print optimization statistics (only time so far)
21    -   **sbdump** - Print IR after some passes.
22
23### Regression debugging
24
25If there are any regressions as compared to the default backend
26(R600\_SB=0), it's possible to use the following environment variables
27to find the incorrectly optimized shader that causes the regression.
28
29-   **R600\_SB\_DSKIP\_MODE** - allows to skip optimization for some
30    shaders
31    -   0 - disabled (default)
32    -   1 - skip optimization for the shaders in the range
33        [R600\_SB\_DSKIP\_START; R600\_SB\_DSKIP\_END], that is,
34        optimize only the shaders that are not in this range
35    -   2 - optimize only the shaders in the range
36        [R600\_SB\_DSKIP\_START; R600\_SB\_DSKIP\_END]
37
38-   **R600\_SB\_DSKIP\_START** - start of the range (1-based)
39
40-   **R600\_SB\_DSKIP\_END** - end of the range (1-based)
41
42Example - optimize only the shaders 5, 6, and 7:
43
44    R600_SB_DSKIP_START=5 R600_SB_DSKIP_END=7 R600_SB_DSKIP_MODE=2
45
46All shaders compiled by the application are numbered starting from 1,
47the number of shaders used by the application may be obtained by running
48it with "R600_DEBUG=sb,sbstat" - it will print "sb: shader \#index\#"
49for each compiled shader.
50
51After figuring out the total number of shaders used by the application,
52the variables above allow to use bisection to find the shader that is
53the cause of regression. E.g. if the application uses 100 shaders, we
54can divide the range [1; 100] and run the application with the
55optimization enabled only for the first half of the shaders:
56
57    R600_SB_DSKIP_START=1 R600_SB_DSKIP_END=50 R600_SB_DSKIP_MODE=2 <app>
58
59If the regression is reproduced with these parameters, then the failing
60shader is in the range [1; 50], if it's not reproduced - then it's in
61the range [51; 100]. Then we can divide the new range again and repeat
62the testing, until we'll reduce the range to a single failing shader.
63
64*NOTE: This method relies on the assumption that the application
65produces the same sequence of the shaders on each run. It's not always
66true - some applications may produce different sequences of the shaders,
67in such cases the tools like apitrace may be used to record the trace
68with the application, then this method may be applied when replaying the
69trace - also this may be faster and/or more convenient than testing the
70application itself.*
71
72* * * * *
73
74Intermediate Representation
75---------------------------
76
77### Values
78
79All kinds of the operands (literal constants, references to kcache
80constants, references to GPRs, etc) are currently represented by the
81**value** class (possibly it makes sense to switch to hierarchy of
82classes derived from **value** instead, to save some memory).
83
84All values (except some pseudo values like the exec\_mask or predicate
85register) represent 32bit scalar values - there are no vector values,
86CF/FETCH instructions use groups of 4 values for src and dst operands.
87
88### Nodes
89
90Shader programs are represented using the tree data structure, some
91nodes contain a list of subnodes.
92
93#### Control flow nodes
94
95Control flow information is represented using four special node types
96(based on the ideas from [[1]](#references) )
97
98-   **region\_node** - single-entry, single-exit region.
99
100    All loops and if's in the program are enclosed in region nodes.
101    Region nodes have two containers for phi nodes -
102    region\_node::loop\_phi contains the phi expressions to be executed
103    at the region entry, region\_node::phi contains the phi expressions
104    to be executed at the region exit. It's the only type of the node
105    that contains associated phi expressions.
106
107-   **depart\_node** - "depart region \$id after { ... }"
108
109    Depart target region (jump to exit point) after executing contained
110    code.
111
112-   **repeat\_node** - "repeat region \$id after { ... }"
113
114    Repeat target region (jump to entry point) after executing contained
115    code.
116
117-   **if\_node** - "if (cond) { ... }"
118
119    Execute contained code if condition is true. The difference from
120    [[1]](#references) is that we don't have associated phi expressions
121    for the **if\_node**, we enclose **if\_node** in the
122    **region\_node** and store corresponding phi's in the
123    **region\_node**, this allows more uniform handling.
124
125The target region of depart and repeat nodes is always the region where
126they are located (possibly in the nested region), there are no arbitrary
127jumps/goto's - control flow in the program is always structured.
128
129Typical control flow constructs can be represented as in the following
130examples:
131
132GLSL:
133
134    if (cond) {
135        < 1 >
136    } else {
137        < 2 >
138    }
139
140IR:
141
142    region #0 {
143        depart region #0 after {
144            if (cond) {
145                depart region #0 after {
146                    < 1 >
147                }
148            }
149            < 2 >
150        }
151        <region #0 phi nodes >
152    }
153
154GLSL:
155
156    while (cond) {
157        < 1 >
158    }
159
160IR:
161
162    region #0 {
163        <region #0 loop_phi nodes>
164        repeat region #0 after {
165            region #1 {
166                depart region #1 after {
167                    if (!cond) {
168                        depart region #0
169                    }
170                }
171            }
172            < 1 >
173        }
174        <region #0 phi nodes>
175    }
176
177'Break' and 'continue' inside the loops are directly translated to the
178depart and repeat nodes for the corresponding loop region.
179
180This may look a bit too complicated, but in fact this allows more simple
181and uniform handling of the control flow.
182
183All loop\_phi and phi nodes for some region always have the same number
184of source operands. The number of source operands for
185region\_node::loop\_phi nodes is 1 + number of repeat nodes that
186reference this region as a target. The number of source operands for
187region\_node::phi nodes is equal to the number of depart nodes that
188reference this region as a target. All depart/repeat nodes for the
189region have unique indices equal to the index of source operand for
190phi/loop\_phi nodes.
191
192First source operand for region\_node::loop\_phi nodes (src[0]) is an
193incoming value that enters the region from the outside. Each remaining
194source operand comes from the corresponding repeat node.
195
196More complex example:
197
198GLSL:
199
200    a = 1;
201    while (a < 5) {
202        a = a * 2;
203        if (b == 3) {
204            continue;
205        } else {
206            a = 6;
207        }
208        if (c == 4)
209            break;
210        a = a + 1;
211    }
212
213IR with SSA form:
214
215    a.1 = 1;
216    region #0 {
217        // loop phi values: src[0] - incoming, src[1] - from repeat_1, src[2] - from repeat_2
218        region#0 loop_phi: a.2 = phi a.1, a.6, a.3
219
220        repeat_1 region #0 after {
221            a.3 = a.2 * 2;
222            cond1 = (b == 3);
223            region #1 {
224                depart_0 region #1 after {
225                    if (cond1) {
226                        repeat_2 region #0;
227                    }
228                }
229                a.4 = 6;
230
231                region #1 phi: a.5 = phi a.4; // src[0] - from depart_0
232            }
233            cond2 = (c == 4);
234            region #2 {
235                depart_0 region #2 after {
236                    if (cond2) {
237                        depart_0 region #0;
238                    }
239                }
240            }
241            a.6 = a.5 + 1;
242        }
243
244        region #0 phi: a.7 = phi a.5 // src[0] from depart_0
245    }
246
247Phi nodes with single source operand are just copies, they are not
248really necessary, but this allows to handle all **depart\_node**s in the
249uniform way.
250
251#### Instruction nodes
252
253Instruction nodes represent different kinds of instructions -
254**alu\_node**, **cf\_node**, **fetch\_node**, etc. Each of them contains
255the "bc" structure where all fields of the bytecode are stored (the type
256is **bc\_alu** for **alu\_node**, etc). The operands are represented
257using the vectors of pointers to **value** class (node::src, node::dst)
258
259#### SSA-specific nodes
260
261Phi nodes currently don't have special node class, they are stored as
262**node**. Destination vector contains a single destination value, source
263vector contains 1 or more source values.
264
265Psi nodes [[5], [6]](#references) also don't have a special node class
266and stored as **node**. Source vector contains 3 values for each source
267operand - the **value** of predicate, **value** of corresponding
268PRED\_SEL field, and the source **value** itself.
269
270### Indirect addressing
271
272Special kind of values (VLK\_RELREG) is used to represent indirect
273operands. These values don't have SSA versions. The representation is
274mostly based on the [[2]](#references). Indirect operand contains the
275"offset/address" value (value::rel), (e.g. some SSA version of the AR
276register value, though after some passes it may be any value - constant,
277register, etc), also it contains the maydef and mayuse vectors of
278pointers to **value**s (similar to dst/src vectors in the **node**) to
279represent the effects of aliasing in the SSA form.
280
281E.g. if we have the array R5.x ... R8.x and the following instruction :
282
283    MOV R0.x, R[5 + AR].x
284
285then source indirect operand is represented with the VLK\_RELREG value,
286value::rel is AR, value::maydef is empty (in fact it always contain the
287same number of elements as mayuse to simplify the handling, but they are
288NULLs), value::mayuse contains [R5.x, R6.x, R7.x, R8.x] (or the
289corresponding SSA versions after ssa\_rename).
290
291Additional "virtual variables" as in [HSSA [2]](#references) are not
292used, also there is no special handling for "zero versions". Typical
293programs in our case are small, indirect addressing is rare, array sizes
294are limited by max gpr number, so we don't really need to use special
295tricks to avoid the explosion of value versions. Also this allows more
296precise liveness computation for array elements without modifications to
297the algorithms.
298
299With the following instruction:
300
301    MOV R[5+AR].x, R0.x
302
303we'll have both maydef and mayuse vectors for dst operand filled with
304array values initially: [R5.x, R6.x, R7.x, R8.x]. After the ssa\_rename
305pass mayuse will contain previous versions, maydef will contain new
306potentially-defined versions.
307
308* * * * *
309
310Passes
311------
312
313-   **bc\_parser** - creates the IR from the source bytecode,
314    initializes src and dst value vectors for instruction nodes. Most
315    ALU nodes have one dst operand and the number of source operands is
316    equal to the number of source operands for the ISA instruction.
317    Nodes for PREDSETxx instructions have 3 dst operands - dst[0] is dst
318    gpr as in the original instruction, other two are pseudo-operands
319    that represent possibly updated predicate and exec\_mask. Predicate
320    values are used in the predicated alu instructions (node::pred),
321    exec\_mask values are used in the if\_nodes (if\_node::cond). Each
322    vector operand in the CF/TEX/VTX instructions is represented with 4
323    values - components of the vector.
324
325-   **ssa\_prepare** - creates phi expressions.
326
327-   **ssa\_rename** - renames the values (assigns versions).
328
329-   **liveness** - liveness computation, sets 'dead' flag for unused
330    nodes and values, optionally computes interference information for
331    the values.
332
333-   **dce\_cleanup** - eliminates 'dead' nodes, also removes some
334    unnecessary nodes created by bc\_parser, e.g. the nodes for the JUMP
335    instructions in the source, containers for ALU groups (they were
336    only needed for the ssa\_rename pass)
337
338-   **if\_conversion** - converts control flow with if\_nodes to the
339    data flow in cases where it can improve performance (small alu-only
340    branches). Both branches are executed speculatively and the phi
341    expressions are replaced with conditional moves (CNDxx) to select
342    the final value using the same condition predicate as was used by
343    the original if\_node. E.g. **if\_node** used dst[2] from PREDSETxx
344    instruction, CNDxx now uses dst[0] from the same PREDSETxx
345    instruction.
346
347-   **peephole** - peephole optimizations
348
349-   **gvn** - Global Value Numbering [[2]](#references),
350    [[3]](#references)
351
352-   **gcm** - Global Code Motion [[3]](#references). Also performs
353    grouping of the instructions of the same kind (CF/FETCH/ALU).
354
355-   register allocation passes, some ideas are used from
356    [[4]](#references), but implementation is simplified to make it more
357    efficient in terms of the compilation speed (e.g. no recursive
358    recoloring) while achieving good enough results.
359
360    -   **ra\_split** - prepares the program to register allocation.
361        Splits live ranges for constrained values by inserting the
362        copies to/from temporary values, so that the live range of the
363        constrained values becomes minimal.
364
365    -   **ra\_coalesce** - performs global allocation on registers used
366        in CF/FETCH instructions. It's performed first to make sure they
367        end up in the same GPR. Also tries to allocate all values
368        involved in copies (inserted by the ra\_split pass) to the same
369        register, so that the copies may be eliminated.
370
371    -   **ra\_init** - allocates gpr arrays (if indirect addressing is
372        used), and remaining values.
373
374-   **post\_scheduler** - ALU scheduler, handles VLIW packing and
375    performs the final register allocation for local values inside ALU
376    clauses. Eliminates all coalesced copies (if src and dst of the copy
377    are allocated to the same register).
378
379-   **ra\_checker** - optional debugging pass that tries to catch basic
380    errors of the scheduler or regalloc,
381
382-   **bc\_finalize** - propagates the regalloc information from values
383    in node::src and node::dst vectors to the bytecode fields, converts
384    control flow structure (region/depart/repeat) to the target
385    instructions (JUMP/ELSE/POP,
386    LOOP\_START/LOOP\_END/LOOP\_CONTINUE/LOOP\_BREAK).
387
388-   **bc\_builder** - builds final bytecode,
389
390* * * * *
391
392References
393----------
394
395[1] ["Tree-Based Code Optimization. A Thesis Proposal", Carl
396McConnell](http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.38.4210&rep=rep1&type=pdf)
397
398[2] ["Effective Representation of Aliases and Indirect Memory Operations
399in SSA Form", Fred Chow, Sun Chan, Shin-Ming Liu, Raymond Lo, Mark
400Streich](http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.33.6974&rep=rep1&type=pdf)
401
402[3] ["Global Code Motion. Global Value Numbering.", Cliff
403Click](http://www.cs.washington.edu/education/courses/cse501/06wi/reading/click-pldi95.pdf)
404
405[4] ["Register Allocation for Programs in SSA Form", Sebastian
406Hack](http://digbib.ubka.uni-karlsruhe.de/volltexte/documents/6532)
407
408[5] ["An extension to the SSA representation for predicated code",
409Francois de
410Ferriere](http://www.cdl.uni-saarland.de/ssasem/talks/Francois.de.Ferriere.pdf)
411
412[6] ["Improvements to the Psi-SSA Representation", F. de
413Ferriere](http://www.scopesconf.org/scopes-07/presentations/3_Presentation.pdf)
414