1# Expression balancing 2 3## Overview 4**Expression balancing** - optimization that reorganises computation of algebraic expressions. 5The optimization is applied to expressions of the form of a long chain of the same binary associative and commutative operator, such as `ADD`, `MUL`, `AND`, `OR`, and etc. It calculates expression's critical path and, if it can be decreased, reorganises expression so it would be optimal. For example: critical path of `(((a + b) + c) + d)` is 3, whereas critical path of `((a + b) + (c + d))` is 2. 6 7## Rationality 8Increases instruction-level parallelism. 9 10## Dependence 11* RPO analyze. 12 13## Algorithm 14 15Visit all basic blocks in PRO order. 16 17For each block iterate over instructions in reverse order looking for suitable instruction (i.e. operator). 18 19If such instruction is found, it is the last one in an expression and it is necessary to determine the whole chain and its critical path by recursive [analysis](#operator-analysis) of operator inputs (`lhs` and `rhs`). 20 21If the critical path isn't [optimal](#optimal-critical-path), delete expression's operators from the basic block, allocate them to expression's terms in an [optimal way](#operators-allocation) and insert new operators in the basic block. 22 23 24### Note 25#### Operator analysis 26`Analysis of operator inputs` is a check if `lhs`(`rhs`) has the same opcode and has the only user (the operator itself). If so, the input is an operator of the expression, the analysis is called for it too and it should be saved to array of operators (`operators_`); otherwise it is an expression's term and it should be saved to array of terms (`sources_`). 27If inputs belong to different basic blocks but satisfy the conditions above (have single user and are arithmetic operations), they should be moved to a single (dominatee) basic block (similiar to CodeSink). 28 29#### Optimal critical path 30The `optimal critical path` of an expression is `ceil[log2(n_terms)]` (`ceil[x]` is rounding `x` up). 31 32#### Operators allocation 33`Allocation in an optimal way` is an algorithm that creates expression and guarantees that it would have an [optimal](#optimal-critical-path) critical path: 34Assume `terms[]` is an array of expression terms, algorithm called for two arguments `first_idx` and `last_idx` creates expression with terms in range from `terms[first_idx]` to `terms[last_idx]`. 35The algorithm is: 36- If range `first_idx:last_idx` covers `1` element then return this element. 37- If range `first_idx:last_idx` covers `2` elements then create and return operator with `lhs` and `rhs` equal to `terms[first_idx]` and `terms[last_idx]`. 38- Else calculate `split_idx` so that `split_idx` is strictly less than `last_idx` and size of `first_idx:split_idx` is the greatest possible power of 2, create and return operator with `lhs` and `rhs` equal to results of recursive calls `allocate(first_idx, split_idx)` and `allocate(split_idx + 1, last_idx)`. 39 40 41## Pseudocode 42``` 43 for (auto basic_block : GetGraph()->GetBlocksRPO()) { 44 for (auto instruction : basic_block->InstsReverse()) { 45 if (instruction is a suitable operator) { 46 47 // Recursively check and save inputs, determine critical_path_length; 48 ... 49 50 // Calculate optimal_critical_path_length; 51 ... 52 53 if (optimal < current) { 54 AllocateSources(0, size(terms[]) - 1); 55 insert new expression to the basic block; 56 } 57 } 58 } 59 } 60 61 auto AllocateSources(size_t first_idx, size_t last_idx) { 62 if (first_idx == last_idx) { 63 return terms[first_idx]; 64 } 65 if (first_idx == last_idx + 1) { 66 auto operator = operators[free_op_idx++]; 67 operator->GetBasicBlock()->RemoveInst(operator); 68 69 operator.lhs = terms[first_idx]; 70 operator.rhs = terms[last_idx]; 71 72 basic_block.Insert(operator); 73 return operator; 74 } 75 else { 76 size_t split_idx = calculate split_idx; 77 auto lhs = AllocateSources(first_idx, split_idx); 78 auto rhs = AllocateSources(split_idx + 1, last_idx); 79 80 auto operator = operators[free_op_idx++]; 81 basic_block->RemoveInst(operator); 82 83 operator.lhs = lhs; 84 operator.rhs = rhs; 85 86 basic_block.Insert(operator); 87 return operator; 88 } 89 } 90 91``` 92## Examples 93Before expressions balancing: 94``` 950.i64 Parameter -> v8 // a 961.i64 Parameter -> v8 // b 972.i64 Parameter -> v9 // c 983.i64 Parameter -> v10 // d 994.i64 Parameter -> v11 // e 1005.i64 Parameter -> v12 // f 1016.i64 Parameter -> v12 // g 102 1038. Add v0, v1 -> v9 // a + b 1049. Add v2, v8 -> v10 // c + (a + b) 105// As soon as v10 has more than one users it has side-effects, so the algorithm considers it as a term: 10610. Add v3, v9 -> v11, v14 // s10 = d + (c + (a + b)) 107 10811. Add v4, v10 -> v13 // e + s10 10912. Add v5, v6 -> v13 // f + g 11013. Add v11, v12 -> v14 // (e + s10) + (f + g) 11114. Add v10, v13 -> v15 // s10 + ((e + s10) + (f + g)) 112 11315. Return v14 114``` 115The code above contains two expressions: `v8-v10` (critical path is 3) and `v11-v14` (critical path is 3). Moreover, `v11-v14` is [optimal](#optimal-critical-path). 116After expressions balancing: 117``` 1180.i64 Parameter -> v8 // a 1191.i64 Parameter -> v8 // b 1202.i64 Parameter -> v9 // c 1213.i64 Parameter -> v10 // d 1224.i64 Parameter -> v11 // e 1235.i64 Parameter -> v12 // f 1246.i64 Parameter -> v12 // g 125 1268. Add v3, v2 -> v10 // d + c 1279. Add v0, v1 -> v10 // a + b 128// As soon as v10 has more than one users it has side-effects, so the algorithm considers it as a term: 12910. Add v8, v9 -> v11, v14 // s10 = ((d + c) + (a + b)) 130 13111. Add v4, v10 -> v13 // e + s10 13212. Add v5, v6 -> v13 // f + g 13313. Add v11, v12 -> v14 // (e + s10) + (f + g) 13414. Add v10, v13 -> v15 // s10 + ((e + s10) + (f + g)) 135 13615. Return v14 137``` 138## Links 139Source code: 140[balance_expressions.cpp](../optimizer/optimizations/balance_expressions.cpp) 141[balance_expressions.h](../optimizer/optimizations/balance_expressions.h) 142 143Tests: 144[balance_expressions_test.cpp](../tests/balance_expressions_test.cpp) 145 146