• Home
  • Line#
  • Scopes#
  • Navigate#
  • Raw
  • Download
1
2# Inlining optimization
3
4## Overview
5
6Inlining optimization replaces a method call site with the body of the called method.
7
8Inlining optimizations supports two types of call instructions.
9
10- CallStatic - call static methods
11- CallVirtual - call virtual methods
12
13Inlining of these instructions has two main difference: method resolving and guards.
14
15Resolving of the CallStatic method is quite simple, since target method is explicitly determined by the call instruction
16itself. Thus, no guards is needed as well.
17
18CallVirtual instruction also contains method ID, that should be called, but it also contains input object(receiver) that
19is instance of some class, that, in turn, determines which implementation of the method should be called.
20
21To devirtualize virtual calls, i.e. determine target method in compile time, we use following techniques:
22- Static devirtualization
23- Devirtualization with Class Hierarchy Analysis
24- Inline caches
25
26### Static devirtualization
27
28Receiver is determined via ObjectTypePropagation analysis, that propagates statically known types down to the users.
29For example:
30```
31    newobj v0, A
32    call.virt A.foo, v0
33```
34here, we know receiver class (A) at compile time and we can inline A.foo method without any speculation.
35
36### Devirtualization with Class Hierarchy Analysis
37
38We use dynamic Class Hierarchy Analysis, because Panda Runtime allows dynamic class loading.
39
40Class Hierarchy Analysis(CHA) is a runtime analysis that is invoked every time when a new class is loaded. The result of
41the analysis is a flag in Method class, that indicates that method has a single implementation.
42
43```
44.record Test2 {}
45.record A {}
46.record B <extends=A> {}
47
48.function i32 A.func(A a0) {
49    ldai 1
50    return
51}
52.function i32 B.func(B a0) {
53    ldai 0
54    return
55}
56
57# Actually, here is static devirtualization makes a work, we use this assembly for simplicity and clarity.
58.function i32 Test2.main() {
59    newobj v0, A  # CHA set SingleImplementation for A.foo method
60    call.virt A.foo, v0
61    newobj v0, B  # CHA set SingleImplementation for B.foo method and reset it for A.foo
62    call.virt A.foo, v0
63    return
64}
65```
66
67Since Panda VM allows instantiating a classes with abstract methods, we can't rely on the fact that abstract classes are
68never instantiated. Thus we can't apply the rule: abstract method does not break Single Implementation flag of the
69parent method.
70
71We should protect all devirtualized call sites with CHA by special guards. In short, these guards check that
72devirtualized method still has Single Implementation flag.
73
74CHA has dependency map, where key is a devirtualized method and value is a list of methods that employ this
75devirtualized method, f.e. inline. Once method lost Single Implementation flag, CHA searches this method in the
76dependency map and reset compilation code in the dependent methods.
77
78But dependent methods (methods that inline devirtualized method) may already being executed and resetting the
79compilation code in Method class is not enough. Thus, we walk over call stack of the all threads and search frames with
80dependent methods. Then we set special `ShouldDeoptimize` flag in the frame of these dependent methods. This flag is
81exactly that CHA guards check. When method execution reaches this guard, it reads flag and see that method is no longer
82valid and deoptimize itself.
83
84CHA guard comprises two instructions:
85- `IsMustDeoptimize` - checks `ShouldDeoptimize` flag in the frame, return true if it is set.
86- `DeoptimizeIf` - deoptimize method if input condition is true.
87
88Disassembly of the CHA guard:
89```
90# [inst]    60.b    IsMustDeoptimize           r7 (v61)
91  00a0: ldrb w7, [sp, #592]
92  00a4: and w7, w7, #0x1
93# [inst]    61.     DeoptimizeIf               v60(r7), v14
94  00c0: cbnz w7, #+0x3f4 (addr 0x400316e8fc)
95```
96
97
98### Inline caches
99
100Not implemented yet.
101
102## Inlining algorithm
103
104After target method is determined, Inlining optimization call IrBuilderInliningAnalysis that check each bytecode
105instruction of the inlining method is suiteble for inline.
106
107If method is suitable, the Inliner creates new graph and runs IrBuilder for it. After that it embeds inlined graph into
108the current graph.
109
110_Pseudo code for the inlined graph embedding:_
111```python
112# Check bytecode is suitable for inlining
113if not inlined_method.bytecode.include_only_suitable_insts:
114    return false
115
116# Build graph for inlined method. Pass current SaveState instruction, created for call_inst.
117# This SaveState will be set as aditional input for all SaveState instructions in the inlined graph.
118inlined_graph = IrBuilder.run(inlined_method, current_save_state)
119
120# Split block by call instruction
121[first_bb, second_bb] = curr_block.split_on(call_inst)
122
123# Replace inlined graph incoming dataflow edges
124if call_inst.has_inputs:
125    for input in call_inst.inputs:
126        input.replace_succ(call_inst, inlined_graph.start_block.param_for(input))
127
128# Replace inlined graph outcoming dataflow edges
129call_inst.replace_succs(inlined_graph.return_inst.input)
130inlined_graph.remove(inlined_graph.return_inst)
131
132# Move constants of inlined graph to the current graph
133for constant in inlined_graph.constants:
134    exisitng_constant = current_graph.get_constant(constant.value)
135    if not exisitng_constant:
136        current_graph.append_constant(constant)
137    else:
138        for succ : constant.succs:
139            succ.replace_pred(constant, exisitng_constant)
140
141# Connect inlined graph as successor of the first part of split block
142inlined_graph.start_block.succ.set_pred(first_bb)
143
144# Move all predecessors of inlined end block to second part of split block
145for pred in inlined_graph.end_block:
146    pred.replace_succ(inlined_graph.end_block, second_bb)
147
148```
149
150## Dependencies
151
152- IrBuilder
153- ObjectTypePropagation
154
155## Links
156
157[Inlining optimization source code](../optimizer/optimizations/inlining.cpp)
158
159[Class Hierarchy Analysis source code](../../runtime/cha.cpp)
160