# On-Stack Replacement

### Overview

On-Stack Replacement (OSR) is a technique for switching between different implementations of the same function.

By OSR we mean the transition from interpreter code to optimized code. The opposite transition, from optimized to
unoptimized code, is called `Deoptimization`.

OSR workflow:
```
                                               +-----------------------+
                                               |                       |
                                               |      Interpreter      |
                                               |                       |
                                               +-----------------------+
                                          Method::osr_code |
    +------------------------+                             |
    |    Method Prologue     |                             V
    +------------------------+                    +-----------------+
    |   mov x10, 0           |                    |OsrEntry         |
    |   mov d4, 3.14         |                    +-----------------+
    |                        |                             |
    |                        |                  +---------------------+
    |      . . .             |                  |                     V
    |                        |                  |           +-------------------+
    |   osr_entry_1:         |                  |           | PrepareOsrEntry   |
+-->|------------------------|                  |           |(fill CFrame from  |
|   |   Loop 2               |                  |           | OsrStateStamp)    |
|   |                        |                  |           +-------------------+
|   |                        |                  |            CFrame |     ^
|   |------------------------|                  |<------------------+     |
|   |      . . .             |                  |                         |
|   |                        |                  |           OsrStateStamp |
|   |------------------------|                  |   +-----------------------------------+
|   |   Method epilogue      |                  |   |native_pc   : INVALID              |
|   |------------------------|                  |   |bytecode_pc : offsetof osr_entry_1 |
|   |   OSR Stub 1:          |<-----------------+   |osr_entry   : osr_code+bytecode_pc |
|   |   mov x10, 0           |                      |vregs[]     : vreg1=Slot(2)        |
|   |   mov d4, 3.14         |                      |              vreg4=CpuReg(8)      |
+---|   jump osr_entry_1     |                      +-----------------------------------+
    +------------------------+
```

### Triggering

Both OSR and regular compilation use the same hotness counter. The first time the counter overflows, we check
whether the method is already compiled. If it is not, we start compilation in regular mode. Otherwise, we compile
the method in OSR mode.

Once compilation is triggered and OSR-compiled code is already set, we begin the On-Stack Replacement procedure.

Triggering workflow:

### Compilation

JIT compiles the whole OSR-method the same way it compiles a hot method.

To ensure that every loop in the compiled code can be entered from the interpreter, we need to avoid loop optimizations.
In OSR-methods, a special osr-entry flag is added to the loop-header basic blocks, and some optimizations have to skip
such loops.

There are no restrictions on inlining: methods can be inlined in the usual way, and all loop optimizations are
applicable to them, because the loop headers of inlined methods are not marked as osr-entry.

A new pseudo-instruction is introduced: SaveStateOsr. It must be the first instruction in each loop-header basic block
whose osr-entry flag is true.
This instruction contains information about all virtual registers that are live at the entry to the loop.
Codegen creates a special OsrStackMap for each SaveStateOsr instruction. It differs from a regular stackmap in that it has
an `osr entry bytecode offset` field.

### Metainfo

On each OSR entry, we need to restore the execution context.
To do this, we need to know all virtual registers that are live at that moment.
For this purpose, a new stackmap and a new opcode were introduced.

The new opcode (SaveStateOsr) has the same properties as a regular SaveState, except that codegen handles it differently.
No code is generated in place of SaveStateOsr; instead, a special OsrEntryStub entity is created,
which is needed to generate the OSR entry code.

OsrEntryStub does the following:
1. moves all constants to the CPU registers or frame slots by inserting move or store instructions
2. encodes a jump instruction to the head of the loop where the corresponding SaveStateOsr is located

The first step is necessary because the Panda compiler can place some constants in CPU registers,
but the constants themselves are not virtual registers and are not stored in the metainfo.
Accordingly, they need to be restored to the CPU registers or frame slots.

OSR stackmaps (OsrStateStamp) are needed to restore virtual registers.
Each OsrStateStamp is linked to a specific bytecode offset, which is the offset of the first instruction of the loop.
The stackmap contains all the information needed to convert an IFrame to a CFrame.

### Frame replacement

Since the Panda Interpreter is written in C++, we do not have access to its stack. Thus, we cannot simply replace
the interpreter frame with a cframe on the stack. When OSR occurs, we call the OSR-compiled code, and once it finishes
execution we return `true` to the Interpreter. The Interpreter, in turn, executes a fake `return` instruction to exit
from the execution procedure.

Pseudocode:
```python
def interpreter_work():
    # One step of the interpreter dispatch loop (other opcodes omitted)
    if current_inst == Return:
        return
    if current_inst == Jump:
        # Only backward jumps (loop back-edges) update the hotness counter
        if target < current_inst.offset:
            if update_hotness(method, current_inst.bytecode_offset):
                # The OSR code has already finished executing the method,
                # so a fake `return` is executed to leave the interpreter
                set_current_inst(Return)
    ...

def update_hotness(method: "Method", bytecode_offset: int) -> bool:
    hotness_counter += 1
    if hotness_counter < threshold:
        return False

    if method.HasOsrCode():
        return osr_entry(method, bytecode_offset)

    ...  # run compilation, see Triggering for more information

    return False

def osr_entry(method: "Method", bytecode_offset: int) -> bool:
    stamp = Metainfo.find_stamp(bytecode_offset)
    if not stamp:
        return False

    # Call assembly functions to do OSR magic

    return True
```

Most of the OSR entry is written in assembly language, because the CFrame resides on the native stack.

OSR entry can occur in three different contexts, depending on the kind of the previous frame:
1. **Previous frame is CFrame**

    Before: cframe->c2i->iframe

    After: cframe->cframe'

    The new cframe is created in place of the `c2i` frame, which is simply dropped.

2. **Previous frame is IFrame**

    Before: iframe->iframe

    After: iframe->i2c->cframe'

    The new cframe is created at the current stack position, but an i2c bridge must be inserted before it.

3. **Previous frame is null (current frame is the top frame)**

    Before: iframe

    After: cframe'

c2i - compiled to interpreter code bridge

i2c - interpreter to compiled code bridge

cframe' - new cframe, converted from iframe
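
The three frame-replacement contexts above can be modelled as a small transformation on a list of frame kinds. This is only an illustrative sketch, not the runtime's actual code: the `Kind` enum and the `osr_replace_top_frame` function are hypothetical names introduced here, and the real replacement is done in assembly on the native stack.

```python
from enum import Enum


class Kind(Enum):
    """Frame kinds used in this sketch (hypothetical names)."""
    IFRAME = "iframe"  # interpreter frame
    CFRAME = "cframe"  # compiled-code frame
    C2I = "c2i"        # compiled to interpreter code bridge
    I2C = "i2c"        # interpreter to compiled code bridge


def osr_replace_top_frame(stack):
    """Return the stack layout after OSR entry.

    `stack` lists frame kinds from the oldest frame to the newest one.
    The newest frame must be the interpreter frame that triggered OSR.
    """
    if not stack or stack[-1] is not Kind.IFRAME:
        raise ValueError("OSR entry requires an interpreter frame on top")

    below = stack[:-1]
    if not below:
        # Context 3: no previous frame, the iframe simply becomes cframe'
        return [Kind.CFRAME]
    if below[-1] is Kind.C2I:
        # Context 1: the c2i bridge is dropped and cframe' takes its place
        return below[:-1] + [Kind.CFRAME]
    if below[-1] is Kind.IFRAME:
        # Context 2: an i2c bridge is inserted before cframe'
        return below + [Kind.I2C, Kind.CFRAME]
    raise ValueError("unexpected frame kind below the interpreter frame")


if __name__ == "__main__":
    for before in ([Kind.CFRAME, Kind.C2I, Kind.IFRAME],
                   [Kind.IFRAME, Kind.IFRAME],
                   [Kind.IFRAME]):
        after = osr_replace_top_frame(before)
        print("->".join(k.value for k in before), "=>",
              "->".join(k.value for k in after))
```

The sketch tracks only frame kinds. Filling the new cframe from the OsrStateStamp is a separate step and is not modelled here.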