# PLT Resolvers

AOT compiler mode is mainly described in [aot.md](../../docs/aot.md); please read it first.

## Brief SlowPath idea description

The JIT/AOT compiler has a `SlowPath` mechanism. It is used for opcodes where a call to the runtime is required conditionally,
but not always.
During code generation, so-called `SlowPath` code is created and placed into a special cold code block at the end of the function.
A unique `SlowPath` blob is generated for each place it is called from, and because it saves registers and sets up the so-called
`BoundaryFrame` for the stack walker, its code is much longer than the few instructions that make up the runtime call itself.

## Code size issue

In AOT mode, such a `SlowPath` can also be used for opcodes like `CallStatic` and `CallVirtual`, and for opcodes related to `Class` resolving,
because the resolved Method or Class pointer can be cached in a slot of the GOT table (in the `.aot_got` section).
The problem is that such a `SlowPath` is actually required only once, the first time execution reaches the corresponding `method Id`
or `class Id`. So, to reduce code size in AOT mode, a trickier solution based on PLT Resolvers is used.

## Static Call Resolver

For each pair of File (input for the `ark_aot` compiler) and callee `method Id` (`panda_file::File::EntityId`), three
consecutive slots are reserved in the PLT-GOT table. `FirstSlot` is filled during AOT file creation and contains the `method Id`.
`SecondSlot` is filled when the AOT file is loaded into the runtime and contains the `PLT CallStatic Resolver` address.
`ThirdSlot` eventually stores the `Method pointer` after resolving, but during AOT file loading it is initialized
to the address of `SecondSlot` minus the `GetCompiledEntryPointOffset` value.

During calls, the first parameter is always the callee `Method pointer`, so the trick from the previous paragraph makes the
resolver fully transparent for code generation. Let's look at an `arm64` example (`GetCompiledEntryPointOffset` is 56 = 7 * 8, and all function
parameters are already in the proper registers):

```
========= .aot_got ========
; Somewhere in PLT-GOT table
 . . .
-YY-16: FirstSlot - method Id
-YY-08: SecondSlot - PLT CallStatic Resolver
-YY-00: ThirdSlot - address of (-YY-08-56)  <--------------
 . . .                                                    |
; start of entrypoint table                               |
-NN: address of handler 0, NN = N * 8                     |
 . . .                                                    |
-16: address of handler N-1                               |
-08: address of handler N                                 |
========== .text ==========                               |
00:                                                       |
 . . .                                                    |
XX+00: adr x0, #-(YY+XX)   ; Put to the x0 address of ThirdSlot ; before resolve   ; after resolve
XX+04: ldr x0, [x0]        ; Load value stored in ThirdSlot     ; (&FirstSlot)-48  ; Method Pointer
XX+08: ldr x30, [x0, #56]  ; Load EntryPoint                    ; SecondSlot value ; Executable code
XX+12: blr x30             ; Call                               ; Call Resolver    ; Call Method
 . . .
```

After saving all registers to the stack and generating the `BoundaryFrame`, `PLT CallStatic Resolver` has the `(&FirstSlot)-48`
value in `x0`, so it can execute `ldr x1, [x0, #48]` to get the `method Id` from `FirstSlot`.
The caller `Method pointer` can be extracted (into `x0`) directly from the caller's CFrame, so,
having these two values in `x0` and `x1`, it just calls `GetCalleeMethod` to obtain the `Method pointer`.

Once the `Method pointer` is known, it is stored into `ThirdSlot`, used to load the proper executable address, and passed as the first
parameter of the actual method call. A jump by register value is used instead of a call, so that the callee returns directly into the
calling code and not into the resolver.

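The three-slot trick can be checked with a small simulation. The sketch below is not the runtime's code: memory is modeled as a plain Python dict from address to 64-bit value, and the names `load_aot_file`, `call_site` and `plt_call_static_resolver`, as well as all concrete addresses, are hypothetical. It also assumes that the resolver finds `ThirdSlot` from `x0` via the slot layout, which the text does not state explicitly.

```python
ENTRY_OFFSET = 56  # GetCompiledEntryPointOffset from the arm64 example
SLOT = 8           # size of one PLT-GOT slot in bytes


def load_aot_file(mem, third_slot, method_id, resolver_addr):
    """Fill the three slots as described for AOT file creation and loading."""
    first_slot = third_slot - 2 * SLOT
    second_slot = third_slot - SLOT
    mem[first_slot] = method_id                    # filled at AOT file creation
    mem[second_slot] = resolver_addr               # filled at AOT file loading
    mem[third_slot] = second_slot - ENTRY_OFFSET   # fake "Method pointer"


def call_site(mem, third_slot):
    """Mirror of the call sequence; returns (x0, branch target)."""
    x0 = mem[third_slot]            # ldr x0, [x0]
    x30 = mem[x0 + ENTRY_OFFSET]    # ldr x30, [x0, #56]
    return x0, x30                  # blr x30


def plt_call_static_resolver(mem, x0, caller, get_callee_method):
    """Resolve the callee, patch ThirdSlot, return (Method pointer, entry point)."""
    method_id = mem[x0 + ENTRY_OFFSET - SLOT]      # ldr x1, [x0, #48] reads FirstSlot
    method = get_callee_method(caller, method_id)  # stand-in for GetCalleeMethod
    third_slot = x0 + ENTRY_OFFSET + SLOT          # assumption: derived from the layout
    mem[third_slot] = method
    return method, mem[method + ENTRY_OFFSET]


# Toy Method object at 0x5000 whose entry point field holds 0xC0DE00.
mem = {0x5000 + ENTRY_OFFSET: 0xC0DE00}
load_aot_file(mem, third_slot=0x1000, method_id=42, resolver_addr=0xAAAA00)
x0, target = call_site(mem, 0x1000)   # first call: target is the resolver
plt_call_static_resolver(mem, x0, 0x4000, lambda caller, method_id: 0x5000)
print(call_site(mem, 0x1000) == (0x5000, 0xC0DE00))
```

The call site executes the same two loads before and after resolving; only the content of `ThirdSlot` changes, which is why no extra branch is needed in the generated code.
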
## Virtual Call Resolver

For each pair of File (input for the `ark_aot` compiler) and callee `method Id` (`panda_file::File::EntityId`), two consecutive
slots are reserved in the PLT-GOT table. `FirstSlot` is filled during AOT file creation and contains the `method Id`.
`SecondSlot` is initially filled with zero; after resolving it stores the `VTable index` incremented by 1.

```
========= .aot_got ========
; Somewhere in PLT-GOT table
 . . .
-YY-08: FirstSlot - method Id
-YY-00: SecondSlot, zero or (index+1) <---------------------------
 . . .                                                           |
; start of entrypoint table                                      |
-NN: address of handler 0, NN = N * 8                            |
 . . .                                                           |
-16: address of handler N-1                                      |
-08: address of handler N                                        |
========== .text ==========                                      |
00:                                                              |
 . . .                                                           |
; CallVirtual opcode (register allocator used x5 for Class ptr)  |
XX+00: adr x16, #-(YY+XX)        ; Put to the x16 address of SecondSlot
XX+04: ldr w17, [x16]            ; Load value from SecondSlot
XX+08: cbnz w17, #16             ; Jump to XX+24 when non-zero
XX+16: ldr x28, [#CALL_VIRTUAL_RESOLVER] ; Load VirtualCall Resolver address
XX+20: blr x30                   ; Call Resolver, x16 is like a "parameter" and "return value"
XX+24: ldr w16, [x5, #4]         ; Get Class pointer into x16
XX+28: add w16, w16, w17, lsl #3 ; x16 = Class+(index+1)*8
XX+32: ldr w16, [x16, #160]      ; Load Method from VTable (compensating index+1, as VTable start offset is 168)
 . . . ; Check IsAbstract
 . . . ; Save caller-saved registers
 . . . ; Set call parameters
ZZ+00: mov x0, x16               ; x0 = Method address
ZZ+04: ldr x30, [x0, #56]        ; Executable code address
ZZ+08: blr x30                   ; Call
 . . .
```

Unlike CallStatic, there is no way to use the default parameter registers to pass values to and from the resolver.
Thus the convention for `PLT CallVirtual Resolver` is the following: the first `Encoder` temporary register
(`x16` for `arm64` or `r12` for `x86_64`) is a parameter holding the `SecondSlot` address, and the same register
also serves as the "return value".

`PLT CallVirtual Resolver` loads the `method Id` from `FirstSlot` using address `x16-8`,
takes the caller `Method pointer` from the previous frame and calls the `GetCalleeMethod` entrypoint.
Having the `Method pointer`, it is easy to load the `VTable index` value.
The resolver returns the `index+1` value in `x16` and, unlike `PLT CallStatic Resolver`, does not call any other functions.
Control is returned back to the calling code instead.

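The `index+1` encoding and the `#160` load offset can be illustrated with a short model. This is a sketch under stated assumptions, not the actual implementation: the GOT is a Python dict, `vtable_index_of` is a hypothetical stand-in for "call `GetCalleeMethod` and read the `VTable index`", and the reason given for the `+1` (keeping index 0 distinct from the unresolved zero state) is inferred from the slot description rather than stated in the text.

```python
SLOT = 8
VTABLE_OFFSET = 168  # VTable start offset inside Class, from the listing comment


def vtable_entry_address(class_ptr, cached):
    """Address read by `ldr w16, [x16, #160]`; `cached` is index + 1."""
    # Class + (index+1)*8 + 160 == Class + 168 + index*8
    return class_ptr + (cached << 3) + (VTABLE_OFFSET - SLOT)


def plt_call_virtual_resolver(got, second_slot, vtable_index_of):
    """Toy resolver: read method Id at x16-8, cache and return index + 1."""
    method_id = got[second_slot - SLOT]      # FirstSlot
    cached = vtable_index_of(method_id) + 1  # never zero, even for index 0
    got[second_slot] = cached
    return cached


def call_virtual(got, second_slot, class_ptr, vtable_index_of):
    """Fast path when SecondSlot is non-zero, resolver call otherwise."""
    cached = got[second_slot]
    if cached == 0:                          # cbnz falls through
        cached = plt_call_virtual_resolver(got, second_slot, vtable_index_of)
    return vtable_entry_address(class_ptr, cached)


got = {0x0FF8: 7, 0x1000: 0}                 # method Id 7, unresolved SecondSlot
print(hex(call_virtual(got, 0x1000, 0x8000, lambda method_id: 0)))
```

With this encoding a single comparison against zero is enough to tell whether the slot is resolved, and the constant `160 = 168 - 8` folds the `-1` correction into the load offset.
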
## Class and InitClass Resolvers

For each pair of File (input for the `ark_aot` compiler) and `class Id` (`panda_file::File::EntityId`) that needs to be resolved,
three consecutive slots are reserved in the PLT-GOT table. `FirstSlot` is filled during AOT file creation and contains the `class Id`.
`SecondSlot` and `ThirdSlot` are initially filled with zeroes; after resolving they both store the `Class pointer`, but with different meanings.
A non-zero `SecondSlot` means that the `Class` is known to be in the `Initialized` state already.

```
========= .aot_got ========
; Somewhere in PLT-GOT table
 . . .
-YY-16: FirstSlot - class Id
-YY-08: SecondSlot, zero or "Initialized Class" pointer <---------
-YY-00: ThirdSlot, zero or Class pointer                         |
 . . .                                                           |
; start of entrypoint table                                      |
-NN: address of handler 0, NN = N * 8                            |
 . . .                                                           |
-16: address of handler N-1                                      |
-08: address of handler N                                        |
========== .text ==========                                      |
00:                                                              |
 . . .                                                           |
; Shared resolved slow path for PLT resolver                     |
YY+00: ldr x17, x28, [CLASS_INIT_RESOLVER] ; Load InitClass Resolver address
YY+04: br  x17                             ; Jump to resolver, x16 works like a "parameter" and "return value"
 . . .                                                           |
; LoadAndInitClass opcode (w7 register allocated for result)     |
XX+00: adr x16, #-(YY+8+XX)      ; Put to the x16 address of SecondSlot
XX+04: ldr w7, [x16]             ; Load value from SecondSlot
XX+08: cbnz w7, #20              ; Jump to XX+28 when non-zero
XX+12: bl YY - (XX+08)           ; Call shared slow path for PLT resolver, x16 works like a "parameter" and "return value"
XX+16: mov w7, w16               ; Class should be in w7
XX+20: ... ; run next opcode
 . . .
```

For class-related resolvers the convention is the following: the first `Encoder` temporary register
(`x16` for `arm64` or `r12` for `x86_64`) is a parameter holding the slot address, and it is also used as the "return value".

`PLT InitClass Resolver` loads the `class Id` from `FirstSlot` using address `x16-8`,
takes the caller `Method pointer` from the previous frame and calls the `InitializeClassById` entrypoint.
It stores the obtained `Class pointer` into `ThirdSlot`, and also into `SecondSlot`, but only under a condition.
The condition is that the `Class` state is `Initialized`, because in some corner cases `InitializeClassById` can return
while the `Class` is still only in the `Initializing` state.

`PLT Class Resolver` receives `x16` pointing to `ThirdSlot`, so it loads the `class Id` from `FirstSlot` using address `x16-16`.
A different entrypoint is called here: `ResolveClass`. The obtained `Class pointer` value is stored into `ThirdSlot` only.

Both resolvers return the `Class pointer` value to the calling code in `x16`.

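The different roles of `SecondSlot` and `ThirdSlot` can be modeled in a few lines. This is only a sketch: the GOT is a Python dict, and the callbacks `initialize_class_by_id` and `resolve_class` are hypothetical stand-ins for the entrypoints. In particular, the toy `initialize_class_by_id` returns a `(pointer, state)` pair, which is an assumption made for brevity; the text only says that the resolver checks the `Class` state.

```python
SLOT = 8
INITIALIZING, INITIALIZED = "Initializing", "Initialized"


def plt_init_class_resolver(got, second_slot, initialize_class_by_id):
    """x16 points to SecondSlot; class Id is at x16-8, ThirdSlot at x16+8."""
    class_id = got[second_slot - SLOT]
    class_ptr, state = initialize_class_by_id(class_id)
    got[second_slot + SLOT] = class_ptr      # ThirdSlot: always updated
    if state == INITIALIZED:
        got[second_slot] = class_ptr         # SecondSlot: only when Initialized
    return class_ptr                         # "return value" in x16


def plt_class_resolver(got, third_slot, resolve_class):
    """x16 points to ThirdSlot; class Id is at x16-16."""
    class_ptr = resolve_class(got[third_slot - 2 * SLOT])
    got[third_slot] = class_ptr              # SecondSlot is left untouched
    return class_ptr


def load_and_init_class(got, second_slot, initialize_class_by_id):
    """Fast path of the LoadAndInitClass opcode, resolver otherwise."""
    class_ptr = got[second_slot]
    if class_ptr == 0:
        class_ptr = plt_init_class_resolver(got, second_slot, initialize_class_by_id)
    return class_ptr


got = {0x0FF0: 5, 0x0FF8: 0, 0x1000: 0}      # class Id 5, both slots unresolved
load_and_init_class(got, 0x0FF8, lambda class_id: (0x9000, INITIALIZING))
print(got[0x0FF8], hex(got[0x1000]))
```

While the class is still `Initializing`, `SecondSlot` stays zero, so every `LoadAndInitClass` keeps going through the resolver, whereas code that only needs the resolved pointer can already use `ThirdSlot`.
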
## Resolver Encoding

As all 4 resolvers have a lot of similar parts, their generation is implemented in one method, `EncodePltHelper`.
Moreover, it is placed in the platform-independent file `code_generator/target/target.cpp`, although there are actually several
differences in what happens for `arm64` and `x86_64`.

The main difference between the two supported platforms is the main temporary register used in the Resolver.
For `arm64` we use the `LR` register (`x30`), and for `x86_64` the third `Encoder` temporary, `r14`, is used.

One more issue is that the first `Encoder` temporary register (`x16` for `arm64` or `r12` for `x86_64`), used as a parameter
in 3 Resolvers (all but CallStatic), is caller-saved on `arm64` but callee-saved on `x86_64`, which leads to some
differences.

Let's briefly discuss all the steps that happen consecutively in any Resolver:
* **Save LR and FP registers to the stack.**
On `arm64` it is just one `stp x29, x30, [sp, #-16]` instruction, while on `x86_64` the caller return address is already
on the stack, so we load it into a temporary (we need it for the `BoundaryFrame`) and push `rbp` to the stack.

* **Create BoundaryFrame.**
This step copies the `SlowPath` behavior of the usual `BoundaryFrame` class constructor, but with one special trick:
for 3 out of 4 Resolvers (all but CallStatic), the "return address" and "previous frame" values that are already on the stack
(see the previous step) directly become the upper part of the `BoundaryFrame` stack area.

* **Save caller-saved registers.**
In the CallStatic resolver we reserve space on the stack and save the registers there. In the three other Resolvers, caller-saved
registers are saved directly into the appropriate places in the previous CFrame.
The stack pointer is temporarily adjusted by hand in this case to allow the `SaveCallerRegisters` function to do its job.
Moreover, for `arm64` we manually add `x16` to the live registers set.

* **Prepare parameters for Runtime Call.**
This step is described above separately in each resolver description.

* **Save callee-saved registers.**
Adjust the stack pointer (for the second time in the `CallStatic` Resolver, and for the only time in the others) and
call `SaveRegisters` twice: for float and scalar registers.

* **Make a Runtime Call.**
This step is done using the `MakeCallAot` function with a properly calculated offset. Resolvers are placed after all functions in
the AOT file, but the distance to the `.aot_got` section can be calculated in the same way as for usual code generation.

* **Load callee-saved registers.**
Reverse what was done two steps above: call `LoadRegisters` for float and scalar registers, then adjust the stack pointer back.

* **Restore previous Frame.**
Works similarly to the `BoundaryFrame` class destructor.

* **Process gathered result.**
First, `arm64` non-`CallStatic` Resolvers need to manually restore `x16` from the place where it was saved.
On `x86_64` this is not required, as `r12` is a callee-saved register and has already been restored.
The main logic of this step is described above separately in each resolver description.

* **Load caller-saved registers.**
Registers are loaded in the same manner they were saved. So, in CallStatic we have to adjust the stack pointer after loading,
while in the other Resolvers it is temporarily adjusted by hand to the previous frame before calling the `LoadCallerRegisters` function.

* **Restore LR and FP.**
Nothing special, symmetric to the very first step.

* **Leave Resolver.**
Jump to the callee Method in the `CallStatic` Resolver, and do a usual "return" in the others.

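The step list above, with its per-resolver and per-platform variations, can be summarized as a small function. This is purely an illustrative sketch and not how `EncodePltHelper` is structured: the function name `resolver_steps` and the step strings are hypothetical, and it assumes that the manual `x16` handling on `arm64` applies only to the non-`CallStatic` resolvers.

```python
def resolver_steps(resolver, arch):
    """Ordered steps for one resolver ("CallStatic", "CallVirtual", "Class",
    "InitClass") on one platform ("arm64" or "x86_64")."""
    is_static = resolver == "CallStatic"
    # Assumption: x16 needs manual care only where it carries the parameter
    # and is caller-saved, i.e. on arm64 in the non-CallStatic resolvers.
    manual_x16 = arch == "arm64" and not is_static

    steps = ["save LR/FP", "create BoundaryFrame"]
    if is_static:
        steps.append("save caller-saved registers to a new stack area")
    elif manual_x16:
        steps.append("save caller-saved registers and x16 into previous CFrame")
    else:
        steps.append("save caller-saved registers into previous CFrame")
    steps += [
        "prepare runtime call parameters",
        "save callee-saved registers",
        "make runtime call",
        "load callee-saved registers",
        "restore previous frame",
    ]
    if manual_x16:
        steps.append("restore x16")
    steps += ["process result", "load caller-saved registers", "restore LR/FP"]
    steps.append("jump to callee method" if is_static else "return to caller")
    return steps


for name in ("CallStatic", "CallVirtual"):
    print(name, len(resolver_steps(name, "arm64")))
```

Keeping the common skeleton in one place and branching only on the few differing steps mirrors the motivation given above for generating all four resolvers from a single method.
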