• Home
  • Line#
  • Scopes#
  • Navigate#
  • Raw
  • Download
1# Loop Unrolling
2## Overview
3
4`Loop unrolling` optimization increases loop body by copying instructions of the original loop body.
5
6## Rationality
7
8Increase number of instructions for each loop iteration, reduce branch penalty.
9
10## Dependence
11
12* Loop Analysis
13* Dominators Tree
14* Reverse Post Order (RPO)
15* Loop Peeling (to make loop with exit-point from backedge)
16
17
18## Algorithm
19
20`Loop unrolling` modifies loops with the following requirements:
21
22* loop is not irreducible;
23* loop-header is not OSR-entry;
24* there is only 1 back-edge;
25* loop-backedge is a single loop-exit point;
26* there are no inner loops;
27
28Optimization settings:
29
30**Instructions limit** - the maximum number of loop instructions after its unrolling;
31
32**Unroll factor** - the number of loop body copies including the original one;
33
34There two types of unrolling: with side-exits and without them. Unrolling without side-exits is applied for countable loops.
35
36### Countable loops
37
38Loop is countable if it contains compare between loop-index instruction and test-instruction defined outside loop.
39Loop-index should be incremented or decremented by a constant. Currently signed integer indexes are supported.
40
41```
42[Loop-header]
43Phi(init, update)
44...
45
46[Loop-backedge]
47update(phi, constant)
48Compare(update, test)
49...
50where `update` is Add or Sub instruction
51```
52### Unrolling without side-exits
53
54There are 3 stages of unrolling loop without side-exits:
55
561. Clone loop-body without loop-backedge `factor` times;
572. Fix loop-backedge compare by incrementing/decrementing test input with constant, counted using formula: `factor * loop_step`. If compare has `not-equal` condition code, replace it by `less-than`/`greater-than`
583. Clone loop-body with the original loop-backedge `factor` times, but replace edge to the loop-header with edge the loop-outer block;
59
60Here `factor` means number of cloned loop bodies.
61
62```cpp
63/---->[header]
64|        |
65|        v
66|     [loop-body]
67|        |
68|        v
69\-----[backedge]----> ...
70```
71```
72  /---->[header]
73  |        |
74  |        v
75  |     [loop-body]
76  |        |
77  |        v
78  |     [loop-body']
79  |        |
80  |        v
81  |     [loop-body'']
82  |        |
83  |        v
84  \-----[backedge]----> ...
85           |
86           v
87       [loop-body]
88           |
89           v
90       [backedge]------\
91           |           |
92           v           |
93       [loop-body]     |
94           |           |
95           v           |
96      [outer-block]<---/
97```
98
99
100
101### Unrolling with side-exits
102
103For this case both loop-body and loop-backedge are cloned:
104```cpp
105   /---->[header]
106  |         |
107  |         v
108  |     [loop-body]
109  |         |
110  |         v
111  |     [backedge]------------\   << exit-block
112  |         |                 |
113  |         v                 |
114  |    [loop-body-clone]      |
115  |         |                 |
116  |         v                 |
117  \-----[backedge-clone]----->|       << last-block
118                              |
119                              v
120                           [outer]-----> ...
121```
122## Pseudocode
123```cpp
124if (IsLoopCountable(loop)) {
125    auto clone_loop = CloneLoop(loop);
126    UnrollLoopBodyWithoutSideExits(loop);
127    FixCompareInst(loop);
128    UnrollLoopBodyWithSideExits(clone_loop);
129    RemoveEdgeToLoopHeader(clone_loop);
130} else {
131    UnrollLoopBodyWithSideExits(loop);
132}
133```
134## Examples
135
136Countable loop unrolling:
137```cpp
138    auto graph = CreateEmptyGraph();
139    GRAPH(graph) {
140        CONSTANT(0, stop);
141        CONSTANT(1, 0);  // a = 0, b = 0
142        CONSTANT(2, step);
143        BASIC_BLOCK(2, 3, 4) {
144            INST(3, Opcode::Compare).b().SrcType(DataType::INT32).CC(CC_LT).Inputs(1, 0);
145            INST(4, Opcode::IfImm).SrcType(DataType::BOOL).CC(CC_NE).Imm(0).Inputs(3);  // if a < stop
146        }
147        BASIC_BLOCK(3, 3, 4) {
148            INST(5, Opcode::Phi).s32().Inputs(1, 7);  // a
149            INST(6, Opcode::Phi).s32().Inputs(1, 8);  // b
150            INST(7, Opcode::Add).s32().Inputs(5, 2);  // a += step
151            INST(8, Opcode::Add).s32().Inputs(6, 7);  // b += a
152            INST(9, Opcode::Compare).b().SrcType(DataType::INT32).CC(CC_LT).Inputs(7, 0);
153            INST(10, Opcode::IfImm).SrcType(DataType::BOOL).CC(CC_NE).Imm(0).Inputs(9);  // if a < stop
154        }
155        BASIC_BLOCK(4, -1) {
156            INST(11, Opcode::Phi).s32().Inputs(1, 6);
157            INST(12, Opcode::Return).s32().Inputs(11);  // return b;
158        }
159    }
160    return graph;
161
162```
163```cpp
164    uint32_t UNROLL_FACTOR = 2;
165
166    GRAPH(graph_unroll) {
167        CONSTANT(0, 10);
168        CONSTANT(1, 0);  // a = 0, b = 0
169        CONSTANT(2, 1);
170        BASIC_BLOCK(2, 3, 5) {
171            INST(20, Opcode::SubI).s32().Inputs(0).Imm(UNROLL_FACTOR - 1);
172            INST(3, Opcode::Compare).b().SrcType(DataType::INT32).CC(CC_LT).Inputs(1, 20);  // if (a < 10 - (UNROLL_FACTOR - 1))
173            INST(4, Opcode::IfImm).SrcType(DataType::BOOL).CC(CC_NE).Imm(0).Inputs(3);
174        }
175        BASIC_BLOCK(3, 3, 5) {
176            INST(5, Opcode::Phi).s32().Inputs(1, 21);   // a
177            INST(6, Opcode::Phi).s32().Inputs(1, 22);   // b
178            INST(7, Opcode::Add).s32().Inputs(5, 2);    // a + 1
179            INST(8, Opcode::Add).s32().Inputs(6, 7);    // b + 1
180            INST(21, Opcode::Add).s32().Inputs(7, 2);   // a + 1
181            INST(22, Opcode::Add).s32().Inputs(8, 21);  // b + 1
182            INST(9, Opcode::Compare).b().SrcType(DataType::INT32).CC(CC_LT).Inputs(21, 20);
183            INST(10, Opcode::IfImm).SrcType(DataType::BOOL).CC(CC_NE).Imm(0).Inputs(9);  // if (a < 10 - (UNROLL_FACTOR - 1))
184        }
185        BASIC_BLOCK(5, 6, 4) {
186            INST(11, Opcode::Phi).s32().Inputs(1, 8);
187            INST(25, Opcode::Phi).s32().Inputs(1, 21);                                       // a
188            INST(26, Opcode::Phi).s32().Inputs(1, 22);                                       // b
189            INST(27, Opcode::Compare).b().SrcType(DataType::INT32).CC(CC_LT).Inputs(25, 0);  // if (a < 10)
190            INST(28, Opcode::IfImm).SrcType(DataType::BOOL).CC(CC_NE).Imm(0).Inputs(27);
191        }
192        BASIC_BLOCK(6, 4) {
193            INST(29, Opcode::Add).s32().Inputs(25, 2);   // a + 1
194            INST(30, Opcode::Add).s32().Inputs(26, 29);  // b + 1
195        }
196        BASIC_BLOCK(4, -1) {
197            INST(31, Opcode::Phi).s32().Inputs(11, 26);
198            INST(12, Opcode::Return).s32().Inputs(31);  // return b
199        }
200    }
201
202```
203## Links
204Source code:
205
206[loop_unroll.cpp](../optimizer/optimizations/loop_unroll.cpp)
207
208[loop_unroll.h](../optimizer/optimizations/loop_unroll.h)
209
210Tests:
211
212[loop_unroll_test.cpp](../tests/loop_unroll_test.cpp)