1# The Frame Stack 2 3Each call to a Python function has an activation record, 4commonly known as a "frame". 5Python semantics allows frames to outlive the activation, 6so they have (before 3.11) been allocated on the heap. 7This is expensive as it requires many allocations and 8results in poor locality of reference. 9 10In 3.11, rather than have these frames scattered about memory, 11as happens for heap-allocated objects, frames are allocated 12contiguously in a per-thread stack. 13This improves performance significantly for two reasons: 14* It reduces allocation overhead to a pointer comparison and increment. 15* Stack allocated data has the best possible locality and will always be in 16 CPU cache. 17 18Generator and coroutines still need heap allocated activation records, but 19can be linked into the per-thread stack so as to not impact performance too much. 20 21## Layout 22 23Each activation record consists of four conceptual sections: 24 25* Local variables (including arguments, cells and free variables) 26* Evaluation stack 27* Specials: The per-frame object references needed by the VM: globals dict, 28 code object, etc. 29* Linkage: Pointer to the previous activation record, stack depth, etc. 30 31### Layout 32 33The specials and linkage sections are a fixed size, so are grouped together. 34 35Each activation record is laid out as: 36* Specials and linkage 37* Locals 38* Stack 39 40This seems to provide the best performance without excessive complexity. 41It needs the interpreter to hold two pointers, a frame pointer and a stack pointer. 42 43#### Alternative layout 44 45An alternative layout that was used for part of 3.11 alpha was: 46 47* Locals 48* Specials and linkage 49* Stack 50 51This has the advantage that no copying is required when making a call, 52as the arguments on the stack are (usually) already in the correct 53location for the parameters. However, it requires the VM to maintain 54an extra pointer for the locals, which can hurt performance. 55 56A variant that only needs the need two pointers is to reverse the numbering 57of the locals, so that the last one is numbered `0`, and the first in memory 58is numbered `N-1`. 59This allows the locals, specials and linkage to accessed from the frame pointer. 60We may implement this in the future. 61 62#### Note: 63 64> In a contiguous stack, we would need to save one fewer registers, as the 65> top of the caller's activation record would be the same at the base of the 66> callee's. However, since some activation records are kept on the heap we 67> cannot do this. 68 69### Generators and Coroutines 70 71Generators and coroutines contain a `_PyInterpreterFrame` 72The specials sections contains the following pointers: 73 74* Globals dict 75* Builtins dict 76* Locals dict (not the "fast" locals, but the locals for eval and class creation) 77* Code object 78* Heap allocated `PyFrameObject` for this activation record, if any. 79* The function. 80 81The pointer to the function is not strictly required, but it is cheaper to 82store a strong reference to the function and borrowed references to the globals 83and builtins, than strong references to both globals and builtins. 84 85### Frame objects 86 87When creating a backtrace or when calling `sys._getframe()` the frame becomes 88visible to Python code. When this happens a new `PyFrameObject` is created 89and a strong reference to it placed in the `frame_obj` field of the specials 90section. The `frame_obj` field is initially `NULL`. 91 92The `PyFrameObject` may outlive a stack-allocated `_PyInterpreterFrame`. 93If it does then `_PyInterpreterFrame` is copied into the `PyFrameObject`, 94except the evaluation stack which must be empty at this point. 95The linkage section is updated to reflect the new location of the frame. 96 97This mechanism provides the appearance of persistent, heap-allocated 98frames for each activation, but with low runtime overhead. 99 100### Generators and Coroutines 101 102 103Generator objects have a `_PyInterpreterFrame` embedded in them. 104This means that creating a generator requires only a single allocation, 105reducing allocation overhead and improving locality of reference. 106The embedded frame is linked into the per-thread frame when iterated or 107awaited. 108 109If a frame object associated with a generator outlives the generator, then 110the embedded `_PyInterpreterFrame` is copied into the frame object. 111 112 113All the above applies to coroutines and async generators as well. 114 115### Field names 116 117Many of the fields in `_PyInterpreterFrame` were copied from the 3.10 `PyFrameObject`. 118Thus, some of the field names may be a bit misleading. 119 120For example the `f_globals` field has a `f_` prefix implying it belongs to the 121`PyFrameObject` struct, although it belongs to the `_PyInterpreterFrame` struct. 122We may rationalize this naming scheme for 3.12. 123 124 125### Shim frames 126 127On entry to `_PyEval_EvalFrameDefault()` a shim `_PyInterpreterFrame` is pushed. 128This frame is stored on the C stack, and popped when `_PyEval_EvalFrameDefault()` 129returns. This extra frame is inserted so that `RETURN_VALUE`, `YIELD_VALUE`, and 130`RETURN_GENERATOR` do not need to check whether the current frame is the entry frame. 131The shim frame points to a special code object containing the `INTERPRETER_EXIT` 132instruction which cleans up the shim frame and returns. 133 134 135### The Instruction Pointer 136 137`_PyInterpreterFrame` has two fields which are used to maintain the instruction 138pointer: `instr_ptr` and `return_offset`. 139 140When a frame is executing, `instr_ptr` points to the instruction currently being 141executed. In a suspended frame, it points to the instruction that would execute 142if the frame were to resume. After `frame.f_lineno` is set, `instr_ptr` points to 143the next instruction to be executed. During a call to a python function, 144`instr_ptr` points to the call instruction, because this is what we would expect 145to see in an exception traceback. 146 147The `return_offset` field determines where a `RETURN` should go in the caller, 148relative to `instr_ptr`. It is only meaningful to the callee, so it needs to 149be set in any instruction that implements a call (to a Python function), 150including CALL, SEND and BINARY_SUBSCR_GETITEM, among others. If there is no 151callee, then return_offset is meaningless. It is necessary to have a separate 152field for the return offset because (1) if we apply this offset to `instr_ptr` 153while executing the `RETURN`, this is too early and would lose us information 154about the previous instruction which we could need for introspecting and 155debugging. (2) `SEND` needs to pass two offsets to the generator: one for 156`RETURN` and one for `YIELD`. It uses the `oparg` for one, and the 157`return_offset` for the other. 158