# PLT Resolvers

AOT compiler mode is mainly described in [aot.md](../../docs/aot.md); please read it first.

## Brief SlowPath idea description

JIT/AOT compiler has a `SlowPath` mechanism. It is used for some opcodes where a call to runtime is required conditionally,
but not always.
During code generation so-called `SlowPath` code is created, and we put it into a special cold code block at the end of the function.
A unique `SlowPath` blob is generated for each place it is called, and as it contains saving registers and setting up of the so-called
`BoundaryFrame` for the stack walker, its code is much longer than the few runtime-call-related instructions mentioned in the section above.

## Code size issue

Speaking about AOT mode, for opcodes like `CallStatic`, `CallVirtual`, and opcodes related to `Class` resolving such a
`SlowPath` can also be used, as we can cache the gathered Method or Class pointer into a slot in the GOT table (in the `.aot_got` section).
The problem is that such a `SlowPath` would actually be required only once, the first time we reach the appropriate `method Id`
or `class Id`. So, in order to reduce code size in AOT mode, a trickier solution with PLT Resolvers is used.

## Static Call Resolver

For each pair of File (input for the `ark_aot` compiler) and callee `method Id` (`panda_file::File::EntityId`) three
consecutive slots are reserved in the PLT-GOT table. `FirstSlot` is filled during AOT file creation and contains `method Id`.
`SecondSlot` is filled during AOT file loading into runtime and contains the `PLT CallStatic Resolver` address.
`ThirdSlot` will store the `Method pointer` after resolving, but during AOT file loading it is initialized
to the address of `SecondSlot` minus the `GetCompiledEntryPointOffset` value.

During calls, the first parameter is always the callee `Method pointer`, so the trick from the previous paragraph allows a
fully transparent resolver for code generation.
Let's look at an `arm64` example (`GetCompiledEntryPointOffset` is 56 = 7 * 8, all function
parameters are already in proper registers):

```
========= .aot_got ========
; Somewhere in PLT-GOT table
    . . .
-YY-16: FirstSlot - method Id
-YY-08: SecondSlot - PLT CallStatic Resolver
-YY-00: ThirdSlot - address of (-YY-08-56)  <--------------
    . . .                                                 |
; start of entrypoint table                               |
-NN: address of handler 0, NN = N * 8                     |
    . . .                                                 |
-16: address of handler N-1                               |
-08: address of handler N                                 |
========== .text ==========                               |
00:                                                       |
    . . .                                                 |
XX+00: adr x0, #-(YY+XX)    ; Put address of ThirdSlot into x0   ; before resolve    ; after resolve
XX+04: ldr x0, [x0]         ; Load value stored in ThirdSlot     ; (&FirstSlot)-48   ; Method Pointer
XX+08: ldr x30, [x0, #56]   ; Load EntryPoint                    ; SecondSlot value  ; Executable code
XX+12: blr x30              ; Call                               ; Call Resolver     ; Call Method
    . . .
```

`PLT CallStatic Resolver`, after saving all registers to the stack and generating the `BoundaryFrame`, has the `(&FirstSlot)-48`
value in `x0`, so it can execute `ldr x1, [x0, #48]` to get `method Id` from `FirstSlot`.
The caller `Method pointer` can be extracted (into `x0`) directly from the caller's CFrame, so,
having these two values in `x0` and `x1`, it just calls `GetCalleeMethod` to gather the `Method pointer`.

Once we have the `Method pointer`, it is stored into `ThirdSlot`, which makes it possible to load the proper executable address,
and it goes as the first parameter in the actual method call. A jump by register value is used instead of a call, so that the
method returns directly into the code, not into the resolver.

## Virtual Call Resolver

For each pair of File (input for the `ark_aot` compiler) and callee `method Id` (`panda_file::File::EntityId`) two consecutive
slots are reserved in the PLT-GOT table. `FirstSlot` is filled during AOT file creation and contains `method Id`.
`SecondSlot` is filled with zero and after resolving it stores the `VTable index` incremented by 1.
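
The `index+1` encoding can be illustrated with a small C++ sketch. It is only a model of the arithmetic, not runtime code: the helper names (`EncodeSlot`, `IsResolved`, `MethodOffset`) are hypothetical, and the constants 168 and 8 are simply the values used in the `arm64` listing of this section.

```cpp
#include <cassert>
#include <cstdint>

// Values taken from the arm64 listing in this section; they are
// illustrative here and not read from any real runtime header.
constexpr std::uint32_t kVTableStartOffset = 168;  // VTable offset inside Class
constexpr std::uint32_t kPointerSize = 8;

// Value the resolver stores into SecondSlot: zero stays reserved
// for "not resolved yet", so index 0 is stored as 1.
constexpr std::uint32_t EncodeSlot(std::uint32_t vtable_index) {
    return vtable_index + 1;
}

constexpr bool IsResolved(std::uint32_t slot_value) {
    return slot_value != 0;
}

// Byte offset of the Method pointer relative to the Class pointer,
// computed the way the generated code does it:
// Class + slot * 8, then a load with displacement 168 - 8 = 160.
inline std::uint32_t MethodOffset(std::uint32_t slot_value) {
    assert(IsResolved(slot_value));
    return slot_value * kPointerSize + (kVTableStartOffset - kPointerSize);
}
```

For example, `MethodOffset(EncodeSlot(0))` yields 168, the first VTable entry, which is why the generated load can use the fixed displacement 160 without decrementing the slot value first.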

```
========= .aot_got ========
; Somewhere in PLT-GOT table
    . . .
-YY-08: FirstSlot - method Id
-YY-00: SecondSlot, zero or (index+1)  <---------------------------
    . . .                                                         |
; start of entrypoint table                                       |
-NN: address of handler 0, NN = N * 8                             |
    . . .                                                         |
-16: address of handler N-1                                       |
-08: address of handler N                                         |
========== .text ==========                                       |
00:                                                               |
    . . .                                                         |
; CallVirtual opcode (register allocator used x5 for Class ptr)   |
XX+00: adr x16, #-(YY+XX)                      ; Put address of SecondSlot into x16
XX+04: ldr w17, [x16]                          ; Load value from SecondSlot
XX+08: cbnz w17, #16                           ; Jump to XX+24 when non-zero
XX+16: ldr x30, [x28, #CALL_VIRTUAL_RESOLVER]  ; Load VirtualCall Resolver address
XX+20: blr x30                                 ; Call Resolver, x16 is like a "parameter" and "return value"
XX+24: ldr w16, [x5, #4]                       ; Get Class pointer into x16
XX+28: add w16, w16, w17, lsl #3               ; x16 = Class+(index+1)*8
XX+32: ldr w16, [x16, #160]                    ; Load Method from VTable (compensating index+1, as VTable start offset is 168)
    . . .                                      ; Check IsAbstract
    . . .                                      ; Save caller-saved registers
    . . .                                      ; Set call parameters
ZZ+00: mov x0, x16                             ; x0 = Method address
ZZ+04: ldr x30, [x0, #56]                      ; Executable code address
ZZ+08: blr x30                                 ; Call
    . . .
```

Unlike CallStatic, there is no way to use the default parameter registers to pass values to and from the resolver.
Thus for `PLT CallVirtual Resolver` the convention is the following: the first `Encoder` temporary register
(`x16` for `arm64` or `r12` for `x86_64`) is a parameter holding the `SecondSlot` address, and the same register
also works as the "return value".

`PLT CallVirtual Resolver` loads `method Id` from `FirstSlot` using address `x16-8`,
takes the caller `Method pointer` from the previous frame and calls the `GetCalleeMethod` entrypoint.
Having the `Method pointer`, it is easy to load the `VTable index` value.
The resolver returns the `index+1` value in `x16` and, unlike `PLT CallStatic Resolver`, does not call any other function.
Control is returned back into the code instead.

## Class and InitClass Resolvers

For each pair of File (input for the `ark_aot` compiler) and `class Id` (`panda_file::File::EntityId`) which needs to be resolved
three consecutive slots are reserved in the PLT-GOT table. `FirstSlot` is filled during AOT file creation and contains `class Id`.
`SecondSlot` and `ThirdSlot` are filled with zeroes and after resolving they both store the `Class pointer`, but have different meanings.
When `SecondSlot` is non-zero it means that the `Class` is known to be in the `Initialized` state already.

```
========= .aot_got ========
; Somewhere in PLT-GOT table
    . . .
-YY-16: FirstSlot - class Id
-YY-08: SecondSlot, zero or "Initialized Class" pointer  <-----------
-YY-00: ThirdSlot, zero or Class pointer                            |
    . . .                                                           |
; start of entrypoint table                                         |
-NN: address of handler 0, NN = N * 8                               |
    . . .                                                           |
-16: address of handler N-1                                         |
-08: address of handler N                                           |
========== .text ==========                                         |
00:                                                                 |
    . . .                                                           |
; Shared resolved slow path for PLT resolver                        |
YY+00: ldr x17, [x28, #CLASS_INIT_RESOLVER]  ; Load InitClass Resolver address
YY+04: br x17                                ; Jump to resolver, x16 works like a "parameter" and "return value"
    . . .                                                           |
; LoadAndInitClass opcode (w7 register allocated for result)        |
XX+00: adr x16, #-(YY+8+XX)                  ; Put address of SecondSlot into x16
XX+04: ldr w7, [x16]                         ; Load value from SecondSlot
XX+08: cbnz w7, #20                          ; Jump to XX+28 when non-zero
XX+12: bl YY - (XX+08)                       ; Call shared slow path for PLT resolver, x16 works like a "parameter" and "return value"
XX+16: mov w7, w16                           ; Class should be in w7
XX+20: ...                                   ; run next opcode
    . . .
```

For class-related resolvers the convention is the following: the first `Encoder` temporary register
(`x16` for `arm64` or `r12` for `x86_64`) is a parameter holding the Slot address, and it is also used as the "return value".
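
The meaning of the two class slots can be modelled with a short C++ sketch. It is an illustration under assumptions, not the resolver implementation: `ClassSlots`, `ClassState` and all function names are hypothetical, class pointers are plain integers, and the store rules follow the resolver descriptions given in this section.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of the class state reported by the runtime.
enum class ClassState { INITIALIZING, INITIALIZED };

// Hypothetical model of the three consecutive PLT-GOT slots.
struct ClassSlots {
    std::uint64_t class_id = 0;           // FirstSlot, set at AOT file creation
    std::uint64_t initialized_class = 0;  // SecondSlot, zero until known Initialized
    std::uint64_t class_ptr = 0;          // ThirdSlot, zero until resolved
};

// Fast-path checks done by generated code before calling a resolver.
inline bool NeedsInitClassResolver(const ClassSlots &slots) {
    return slots.initialized_class == 0;
}
inline bool NeedsClassResolver(const ClassSlots &slots) {
    return slots.class_ptr == 0;
}

// InitClass Resolver: always fills ThirdSlot, fills SecondSlot only
// when the class is already in the Initialized state.
inline std::uint64_t OnInitClassResolved(ClassSlots &slots, std::uint64_t cls, ClassState state) {
    assert(cls != 0);  // zero is reserved for "not resolved"
    slots.class_ptr = cls;
    if (state == ClassState::INITIALIZED) {
        slots.initialized_class = cls;
    }
    return cls;
}

// Class Resolver: fills ThirdSlot only.
inline std::uint64_t OnClassResolved(ClassSlots &slots, std::uint64_t cls) {
    assert(cls != 0);
    slots.class_ptr = cls;
    return cls;
}
```

In this model a class that is still `INITIALIZING` leaves `SecondSlot` at zero, so the next `LoadAndInitClass` at the same site goes through the resolver again, while plain class loads already hit the cached `ThirdSlot`.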

`PLT InitClass Resolver` loads `class Id` from `FirstSlot` using address `x16-8`,
takes the caller `Method pointer` from the previous frame and calls the `InitializeClassById` entrypoint.
It stores the gathered `Class pointer` into `ThirdSlot`, and also into `SecondSlot`, but only under a condition.
The condition is whether the `Class` state is `Initialized`, since in some corner cases the `InitializeClassById` entrypoint
can return while the `Class` is still only in the `Initializing` state.

`PLT Class Resolver` receives `x16` addressing `ThirdSlot`, so it loads `class Id` from `FirstSlot` using address `x16-16`.
Another entrypoint is called here: `ResolveClass`. The gathered `Class pointer` value is stored into `ThirdSlot` only.

Both Resolvers return the `Class pointer` value in `x16` back into the code.

## Resolver Encoding

As all 4 resolvers have a lot of similar parts, their generation is implemented in one method, `EncodePltHelper`.
Moreover, it is placed in the platform-independent file `code_generator/target/target.cpp`, although there are actually several
differences in what happens for `arm64` and `x86_64`.

The main difference between the two supported platforms is the main temporary register used in a Resolver.
For `arm64` we use the `LR` register (`x30`), and for `x86_64` the third `Encoder` temporary, `r14`, is used.

One more issue is that the first `Encoder` temporary register (`x16` for `arm64` or `r12` for `x86_64`), used as a parameter
in 3 Resolvers (all but CallStatic), is caller-saved on `arm64` but callee-saved on `x86_64`, leading to some
differences.

Let's briefly go through all steps which happen consecutively in any Resolver:
* **Save LR and FP register to stack.**
On `arm64` it is just one `stp x29, x30, [sp, #-16]` instruction, while on `x86_64` the caller return address is already
on the stack, so we load it into a temporary (we need it for `BoundaryFrame`) and push `rbp` to the stack.

* **Create BoundaryFrame.**
It actually copies the `SlowPath` behavior of the usual `BoundaryFrame` class constructor, but with one special trick:
for 3 out of 4 Resolvers (all but CallStatic) the "return address" and "previous frame" values which are already on the stack
(see previous step) directly become the upper part of the `BoundaryFrame` stack part.

* **Save caller-saved registers.**
In the CallStatic resolver we prepare a place on the stack and save registers there. In the three other Resolvers caller-saved
registers are saved directly into the appropriate places in the previous CFrame.
The stack pointer is temporarily manually adjusted in this case to allow the `SaveCallerRegisters` function to do its job.
Moreover, for `arm64` we manually add `x16` to the live registers set.

* **Prepare parameters for Runtime Call.**
This step is described above separately in each resolver description.

* **Save callee-saved registers.**
Adjust the stack pointer (a second time for the `CallStatic` Resolver, and the only time for the others) and
call `SaveRegisters` two times: for float and scalar registers.

* **Make a Runtime Call.**
This step is done using the `MakeCallAot` function with a properly calculated offset. Resolvers are placed after all functions in
the AOT file, but the distance to the `.aot_got` section can be calculated in the same way as for usual code generation.

* **Load callee-saved registers.**
Reverse what was done two steps above: `LoadRegisters` for float and scalar registers, then adjust the stack pointer back.

* **Restore previous Frame.**
Works similarly to the `BoundaryFrame` class destructor.

* **Process gathered result.**
First, `arm64` non-`CallStatic` Resolvers need to manually restore `x16` from the place where it was saved.
On `x86_64` this step is not required, as `r12` is a callee-saved register and is restored already.
The main logic of this step is described above separately in each resolver description.

* **Load caller-saved registers.**
Registers are loaded in the same manner as they were saved. So, in CallStatic we have to adjust the stack pointer after loading,
while in the other Resolvers it is temporarily manually adjusted to the previous frame before calling the `LoadCallerRegisters` function.

* **Restore LR and FP.**
Nothing special, symmetric to the very first step.

* **Leave Resolver.**
Jump to the callee Method in the `CallStatic` Resolver, and do a usual "return" in the others.
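
To make the `ThirdSlot` trick from the Static Call Resolver section concrete, here is a minimal C++ sketch that replays the two loads of the call sequence against a simulated memory. Everything in it is an assumption made for illustration: the `Memory` helper, the slot addresses, the fake resolver and method addresses, and the function names are hypothetical; only the offset 56 and the slot order come from the text above.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// 56 = 7 * 8 is the GetCompiledEntryPointOffset value from the arm64 example.
constexpr std::uint64_t kEntryPointOffset = 56;

// Simulated byte-addressed memory made of 8-byte words (hypothetical helper).
struct Memory {
    std::array<std::uint64_t, 64> words{};

    std::uint64_t Load(std::uint64_t addr) const {
        assert(addr % 8 == 0 && addr / 8 < words.size());
        return words[addr / 8];
    }
    void Store(std::uint64_t addr, std::uint64_t value) {
        assert(addr % 8 == 0 && addr / 8 < words.size());
        words[addr / 8] = value;
    }
};

// Arbitrary addresses of the three consecutive slots in the simulated memory.
constexpr std::uint64_t kFirstSlot = 128;
constexpr std::uint64_t kSecondSlot = kFirstSlot + 8;
constexpr std::uint64_t kThirdSlot = kFirstSlot + 16;

// Slot contents after AOT file creation and loading, before the first call.
inline void InitSlots(Memory &mem, std::uint64_t method_id, std::uint64_t resolver_addr) {
    mem.Store(kFirstSlot, method_id);
    mem.Store(kSecondSlot, resolver_addr);
    mem.Store(kThirdSlot, kSecondSlot - kEntryPointOffset);
}

// Models `ldr x0, [x0]` followed by `ldr x30, [x0, #56]`.
inline std::uint64_t CallTarget(const Memory &mem) {
    std::uint64_t x0 = mem.Load(kThirdSlot);
    return mem.Load(x0 + kEntryPointOffset);
}

// Models `ldr x1, [x0, #48]` executed inside the resolver.
inline std::uint64_t MethodIdSeenByResolver(const Memory &mem) {
    std::uint64_t x0 = mem.Load(kThirdSlot);
    return mem.Load(x0 + kEntryPointOffset - 8);
}
```

Before resolving, `CallTarget` returns the resolver address because the fake "Method pointer" in `ThirdSlot` makes the entry point load land exactly on `SecondSlot`. After a real method address is stored into `ThirdSlot`, the same two loads return that method's entry point, so the generated call sequence never changes.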