# Inlining optimization ## Overview Inlining optimization replaces a method call site with the body of the called method. Inlining optimizations supports two types of call instructions. - CallStatic - call static methods - CallVirtual - call virtual methods Inlining of these instructions has two main difference: method resolving and guards. Resolving of the CallStatic method is quite simple, since target method is explicitly determined by the call instruction itself. Thus, no guards is needed as well. CallVirtual instruction also contains method ID, that should be called, but it also contains input object(receiver) that is instance of some class, that, in turn, determines which implementation of the method should be called. To devirtualize virtual calls, i.e. determine target method in compile time, we use following techniques: - Static devirtualization - Devirtualization with Class Hierarchy Analysis - Inline caches ### Static devirtualization Receiver is determined via ObjectTypePropagation analysis, that propagates statically known types down to the users. For example: ``` newobj v0, A call.virt A.foo, v0 ``` here, we know receiver class (A) at compile time and we can inline A.foo method without any speculation. ### Devirtualization with Class Hierarchy Analysis We use dynamic Class Hierarchy Analysis, because Panda Runtime allows dynamic class loading. Class Hierarchy Analysis(CHA) is a runtime analysis that is invoked every time when a new class is loaded. The result of the analysis is a flag in Method class, that indicates that method has a single implementation. ``` .record Test2 {} .record A {} .record B {} .function i32 A.func(A a0) { ldai 1 return } .function i32 B.func(B a0) { ldai 0 return } # Actually, here is static devirtualization makes a work, we use this assembly for simplicity and clarity. .function i32 Test2.main() { newobj v0, A # CHA set SingleImplementation for A.foo method call.virt A.foo, v0 newobj v0, B # CHA set SingleImplementation for B.foo method and reset it for A.foo call.virt A.foo, v0 return } ``` Since Panda VM allows instantiating a classes with abstract methods, we can't rely on the fact that abstract classes are never instantiated. Thus we can't apply the rule: abstract method does not break Single Implementation flag of the parent method. We should protect all devirtualized call sites with CHA by special guards. In short, these guards check that devirtualized method still has Single Implementation flag. CHA has dependency map, where key is a devirtualized method and value is a list of methods that employ this devirtualized method, f.e. inline. Once method lost Single Implementation flag, CHA searches this method in the dependency map and reset compilation code in the dependent methods. But dependent methods (methods that inline devirtualized method) may already being executed and resetting the compilation code in Method class is not enough. Thus, we walk over call stack of the all threads and search frames with dependent methods. Then we set special `ShouldDeoptimize` flag in the frame of these dependent methods. This flag is exactly that CHA guards check. When method execution reaches this guard, it reads flag and see that method is no longer valid and deoptimize itself. CHA guard comprises two instructions: - `IsMustDeoptimize` - checks `ShouldDeoptimize` flag in the frame, return true if it is set. - `DeoptimizeIf` - deoptimize method if input condition is true. Disassembly of the CHA guard: ``` # [inst] 60.b IsMustDeoptimize r7 (v61) 00a0: ldrb w7, [sp, #592] 00a4: and w7, w7, #0x1 # [inst] 61. DeoptimizeIf v60(r7), v14 00c0: cbnz w7, #+0x3f4 (addr 0x400316e8fc) ``` ### Inline caches Not implemented yet. ## Inlining algorithm After target method is determined, Inlining optimization call IrBuilderInliningAnalysis that check each bytecode instruction of the inlining method is suiteble for inline. If method is suitable, the Inliner creates new graph and runs IrBuilder for it. After that it embeds inlined graph into the current graph. _Pseudo code for the inlined graph embedding:_ ```python # Check bytecode is suitable for inlining if not inlined_method.bytecode.include_only_suitable_insts: return false # Build graph for inlined method. Pass current SaveState instruction, created for call_inst. # This SaveState will be set as aditional input for all SaveState instructions in the inlined graph. inlined_graph = IrBuilder.run(inlined_method, current_save_state) # Split block by call instruction [first_bb, second_bb] = curr_block.split_on(call_inst) # Replace inlined graph incoming dataflow edges if call_inst.has_inputs: for input in call_inst.inputs: input.replace_succ(call_inst, inlined_graph.start_block.param_for(input)) # Replace inlined graph outcoming dataflow edges call_inst.replace_succs(inlined_graph.return_inst.input) inlined_graph.remove(inlined_graph.return_inst) # Move constants of inlined graph to the current graph for constant in inlined_graph.constants: exisitng_constant = current_graph.get_constant(constant.value) if not exisitng_constant: current_graph.append_constant(constant) else: for succ : constant.succs: succ.replace_pred(constant, exisitng_constant) # Connect inlined graph as successor of the first part of split block inlined_graph.start_block.succ.set_pred(first_bb) # Move all predecessors of inlined end block to second part of split block for pred in inlined_graph.end_block: pred.replace_succ(inlined_graph.end_block, second_bb) ``` ## Dependencies - IrBuilder - ObjectTypePropagation ## Links [Inlining optimization source code](../optimizer/optimizations/inlining.cpp) [Class Hierarchy Analysis source code](../../runtime/cha.cpp)