• Home
  • Line#
  • Scopes#
  • Navigate#
  • Raw
  • Download
1# Expression balancing
2
3## Overview
4**Expression balancing** - optimization that reorganises computation of algebraic expressions.
5The optimization is applied to expressions of the form of a long chain of the same binary associative and commutative operator, such as `ADD`, `MUL`, `AND`, `OR`, and etc. It calculates expression's critical path and, if it can be decreased, reorganises expression so it would be optimal. For example: critical path of `(((a + b) + c) + d)` is 3, whereas critical path of `((a + b) + (c + d))` is 2.
6
7## Rationality
8Increases instruction-level parallelism.
9
10## Dependence
11* RPO analyze.
12
13## Algorithm
14
15Visit all basic blocks in PRO order.
16
17For each block iterate over instructions in reverse order looking for suitable instruction (i.e. operator).
18
19If such instruction is found, it is the last one in an expression and it is necessary to determine the whole chain and its critical path by recursive [analysis](#operator-analysis) of operator inputs (`lhs` and `rhs`).
20
21If the critical path isn't [optimal](#optimal-critical-path), delete expression's operators from the basic block, allocate them to expression's terms in an [optimal way](#operators-allocation) and insert new operators in the basic block.
22
23
24### Note
25#### Operator analysis
26`Analysis of operator inputs` is a check if `lhs`(`rhs`) has the same opcode and has the only user (the operator itself). If so, the input is an operator of the expression, the analysis is called for it too and it should be saved to array of operators (`operators_`); otherwise it is an expression's term and it should be saved to array of terms (`sources_`).
27If inputs belong to different basic blocks but satisfy the conditions above (have single user and are arithmetic operations), they should be moved to a single (dominatee) basic block (similiar to CodeSink).
28
29#### Optimal critical path
30The `optimal critical path` of an expression is `ceil[log2(n_terms)]` (`ceil[x]` is rounding `x` up).
31
32#### Operators allocation
33`Allocation in an optimal way` is an algorithm that creates expression and guarantees that it would have an [optimal](#optimal-critical-path) critical path:
34Assume `terms[]` is an array of expression terms, algorithm called for two arguments `first_idx` and `last_idx` creates expression with terms in range from `terms[first_idx]` to `terms[last_idx]`.
35The algorithm is:
36- If range `first_idx:last_idx` covers `1` element then return this element.
37- If range `first_idx:last_idx` covers `2` elements then create and return operator with `lhs` and `rhs` equal to `terms[first_idx]` and `terms[last_idx]`.
38- Else calculate `split_idx` so that `split_idx` is strictly less than `last_idx` and size of `first_idx:split_idx` is the greatest possible power of 2, create and return operator with `lhs` and `rhs` equal to results of recursive calls  `allocate(first_idx, split_idx)` and `allocate(split_idx + 1, last_idx)`.
39
40
41## Pseudocode
42```
43    for (auto basic_block : GetGraph()->GetBlocksRPO()) {
44        for (auto instruction : basic_block->InstsReverse()) {
45            if (instruction is a suitable operator) {
46
47                // Recursively check and save inputs, determine critical_path_length;
48                ...
49
50                // Calculate optimal_critical_path_length;
51                ...
52
53                if (optimal < current) {
54                    AllocateSources(0, size(terms[]) - 1);
55                    insert new expression to the basic block;
56                }
57            }
58        }
59    }
60
61    auto AllocateSources(size_t first_idx, size_t last_idx) {
62        if (first_idx == last_idx) {
63            return terms[first_idx];
64        }
65        if (first_idx == last_idx + 1) {
66            auto operator = operators[free_op_idx++];
67            operator->GetBasicBlock()->RemoveInst(operator);
68
69            operator.lhs = terms[first_idx];
70            operator.rhs = terms[last_idx];
71
72            basic_block.Insert(operator);
73            return operator;
74        }
75        else {
76            size_t split_idx = calculate split_idx;
77            auto lhs = AllocateSources(first_idx, split_idx);
78            auto rhs = AllocateSources(split_idx + 1, last_idx);
79
80            auto operator = operators[free_op_idx++];
81            basic_block->RemoveInst(operator);
82
83            operator.lhs = lhs;
84            operator.rhs = rhs;
85
86            basic_block.Insert(operator);
87            return operator;
88        }
89    }
90
91```
92## Examples
93Before expressions balancing:
94```
950.i64 Parameter  -> v8      // a
961.i64 Parameter  -> v8      // b
972.i64 Parameter  -> v9      // c
983.i64 Parameter  -> v10     // d
994.i64 Parameter  -> v11     // e
1005.i64 Parameter  -> v12     // f
1016.i64 Parameter  -> v12     // g
102
1038. Add v0, v1 -> v9         // a + b
1049. Add v2, v8 -> v10        // c + (a + b)
105// As soon as v10 has more than one users it has side-effects, so the algorithm considers it as a term:
10610. Add v3, v9 -> v11, v14  // s10 = d + (c + (a + b))
107
10811. Add v4, v10 -> v13      // e + s10
10912. Add v5, v6 -> v13       // f + g
11013. Add v11, v12 -> v14     // (e + s10) + (f + g)
11114. Add v10, v13 -> v15     // s10 + ((e + s10) + (f + g))
112
11315. Return v14
114```
115The code above contains two expressions: `v8-v10` (critical path is 3) and `v11-v14` (critical path is 3). Moreover, `v11-v14` is [optimal](#optimal-critical-path).
116After expressions balancing:
117```
1180.i64 Parameter  -> v8      // a
1191.i64 Parameter  -> v8      // b
1202.i64 Parameter  -> v9      // c
1213.i64 Parameter  -> v10     // d
1224.i64 Parameter  -> v11     // e
1235.i64 Parameter  -> v12     // f
1246.i64 Parameter  -> v12     // g
125
1268. Add v3, v2 -> v10        // d + c
1279. Add v0, v1 -> v10        // a + b
128// As soon as v10 has more than one users it has side-effects, so the algorithm considers it as a term:
12910. Add v8, v9 -> v11, v14  // s10 = ((d + c) + (a + b))
130
13111. Add v4, v10 -> v13      // e + s10
13212. Add v5, v6 -> v13       // f + g
13313. Add v11, v12 -> v14     // (e + s10) + (f + g)
13414. Add v10, v13 -> v15     // s10 + ((e + s10) + (f + g))
135
13615. Return v14
137```
138## Links
139Source code:
140[balance_expressions.cpp](../optimizer/optimizations/balance_expressions.cpp)
141[balance_expressions.h](../optimizer/optimizations/balance_expressions.h)
142
143Tests:
144[balance_expressions_test.cpp](../tests/balance_expressions_test.cpp)
145
146