1 2# Inlining optimization 3 4## Overview 5 6Inlining optimization replaces a method call site with the body of the called method. 7 8Inlining optimizations supports two types of call instructions. 9 10- CallStatic - call static methods 11- CallVirtual - call virtual methods 12 13Inlining of these instructions has two main difference: method resolving and guards. 14 15Resolving of the CallStatic method is quite simple, since target method is explicitly determined by the call instruction 16itself. Thus, no guards is needed as well. 17 18CallVirtual instruction also contains method ID, that should be called, but it also contains input object(receiver) that 19is instance of some class, that, in turn, determines which implementation of the method should be called. 20 21To devirtualize virtual calls, i.e. determine target method in compile time, we use following techniques: 22- Static devirtualization 23- Devirtualization with Class Hierarchy Analysis 24- Inline caches 25 26### Static devirtualization 27 28Receiver is determined via ObjectTypePropagation analysis, that propagates statically known types down to the users. 29For example: 30``` 31 newobj v0, A 32 call.virt A.foo, v0 33``` 34here, we know receiver class (A) at compile time and we can inline A.foo method without any speculation. 35 36### Devirtualization with Class Hierarchy Analysis 37 38We use dynamic Class Hierarchy Analysis, because Panda Runtime allows dynamic class loading. 39 40Class Hierarchy Analysis(CHA) is a runtime analysis that is invoked every time when a new class is loaded. The result of 41the analysis is a flag in Method class, that indicates that method has a single implementation. 42 43``` 44.record Test2 {} 45.record A {} 46.record B <extends=A> {} 47 48.function i32 A.func(A a0) { 49 ldai 1 50 return 51} 52.function i32 B.func(B a0) { 53 ldai 0 54 return 55} 56 57# Actually, here is static devirtualization makes a work, we use this assembly for simplicity and clarity. 58.function i32 Test2.main() { 59 newobj v0, A # CHA set SingleImplementation for A.foo method 60 call.virt A.foo, v0 61 newobj v0, B # CHA set SingleImplementation for B.foo method and reset it for A.foo 62 call.virt A.foo, v0 63 return 64} 65``` 66 67Since Panda VM allows instantiating a classes with abstract methods, we can't rely on the fact that abstract classes are 68never instantiated. Thus we can't apply the rule: abstract method does not break Single Implementation flag of the 69parent method. 70 71We should protect all devirtualized call sites with CHA by special guards. In short, these guards check that 72devirtualized method still has Single Implementation flag. 73 74CHA has dependency map, where key is a devirtualized method and value is a list of methods that employ this 75devirtualized method, f.e. inline. Once method lost Single Implementation flag, CHA searches this method in the 76dependency map and reset compilation code in the dependent methods. 77 78But dependent methods (methods that inline devirtualized method) may already being executed and resetting the 79compilation code in Method class is not enough. Thus, we walk over call stack of the all threads and search frames with 80dependent methods. Then we set special `ShouldDeoptimize` flag in the frame of these dependent methods. This flag is 81exactly that CHA guards check. When method execution reaches this guard, it reads flag and see that method is no longer 82valid and deoptimize itself. 83 84CHA guard comprises two instructions: 85- `IsMustDeoptimize` - checks `ShouldDeoptimize` flag in the frame, return true if it is set. 86- `DeoptimizeIf` - deoptimize method if input condition is true. 87 88Disassembly of the CHA guard: 89``` 90# [inst] 60.b IsMustDeoptimize r7 (v61) 91 00a0: ldrb w7, [sp, #592] 92 00a4: and w7, w7, #0x1 93# [inst] 61. DeoptimizeIf v60(r7), v14 94 00c0: cbnz w7, #+0x3f4 (addr 0x400316e8fc) 95``` 96 97 98### Inline caches 99 100Not implemented yet. 101 102## Inlining algorithm 103 104After target method is determined, Inlining optimization call IrBuilderInliningAnalysis that check each bytecode 105instruction of the inlining method is suiteble for inline. 106 107If method is suitable, the Inliner creates new graph and runs IrBuilder for it. After that it embeds inlined graph into 108the current graph. 109 110_Pseudo code for the inlined graph embedding:_ 111```python 112# Check bytecode is suitable for inlining 113if not inlined_method.bytecode.include_only_suitable_insts: 114 return false 115 116# Build graph for inlined method. Pass current SaveState instruction, created for call_inst. 117# This SaveState will be set as aditional input for all SaveState instructions in the inlined graph. 118inlined_graph = IrBuilder.run(inlined_method, current_save_state) 119 120# Split block by call instruction 121[first_bb, second_bb] = curr_block.split_on(call_inst) 122 123# Replace inlined graph incoming dataflow edges 124if call_inst.has_inputs: 125 for input in call_inst.inputs: 126 input.replace_succ(call_inst, inlined_graph.start_block.param_for(input)) 127 128# Replace inlined graph outcoming dataflow edges 129call_inst.replace_succs(inlined_graph.return_inst.input) 130inlined_graph.remove(inlined_graph.return_inst) 131 132# Move constants of inlined graph to the current graph 133for constant in inlined_graph.constants: 134 exisitng_constant = current_graph.get_constant(constant.value) 135 if not exisitng_constant: 136 current_graph.append_constant(constant) 137 else: 138 for succ : constant.succs: 139 succ.replace_pred(constant, exisitng_constant) 140 141# Connect inlined graph as successor of the first part of split block 142inlined_graph.start_block.succ.set_pred(first_bb) 143 144# Move all predecessors of inlined end block to second part of split block 145for pred in inlined_graph.end_block: 146 pred.replace_succ(inlined_graph.end_block, second_bb) 147 148``` 149 150## Dependencies 151 152- IrBuilder 153- ObjectTypePropagation 154 155## Links 156 157[Inlining optimization source code](../optimizer/optimizations/inlining.cpp) 158 159[Class Hierarchy Analysis source code](../../runtime/cha.cpp) 160