• Home
  • Line#
  • Scopes#
  • Navigate#
  • Raw
  • Download
1# The Frame Stack
2
3Each call to a Python function has an activation record,
4commonly known as a "frame".
5Python semantics allows frames to outlive the activation,
6so they have (before 3.11) been allocated on the heap.
7This is expensive as it requires many allocations and
8results in poor locality of reference.
9
10In 3.11, rather than have these frames scattered about memory,
11as happens for heap-allocated objects, frames are allocated
12contiguously in a per-thread stack.
13This improves performance significantly for two reasons:
14* It reduces allocation overhead to a pointer comparison and increment.
15* Stack allocated data has the best possible locality and will always be in
16  CPU cache.
17
18Generator and coroutines still need heap allocated activation records, but
19can be linked into the per-thread stack so as to not impact performance too much.
20
21## Layout
22
23Each activation record consists of four conceptual sections:
24
25* Local variables (including arguments, cells and free variables)
26* Evaluation stack
27* Specials: The per-frame object references needed by the VM: globals dict,
28  code object, etc.
29* Linkage: Pointer to the previous activation record, stack depth, etc.
30
31### Layout
32
33The specials and linkage sections are a fixed size, so are grouped together.
34
35Each activation record is laid out as:
36* Specials and linkage
37* Locals
38* Stack
39
40This seems to provide the best performance without excessive complexity.
41It needs the interpreter to hold two pointers, a frame pointer and a stack pointer.
42
43#### Alternative layout
44
45An alternative layout that was used for part of 3.11 alpha was:
46
47* Locals
48* Specials and linkage
49* Stack
50
51This has the advantage that no copying is required when making a call,
52as the arguments on the stack are (usually) already in the correct
53location for the parameters. However, it requires the VM to maintain
54an extra pointer for the locals, which can hurt performance.
55
56A variant that only needs the need two pointers is to reverse the numbering
57of the locals, so that the last one is numbered `0`, and the first in memory
58is numbered `N-1`.
59This allows the locals, specials and linkage to accessed from the frame pointer.
60We may implement this in the future.
61
62#### Note:
63
64> In a contiguous stack, we would need to save one fewer registers, as the
65> top of the caller's activation record would be the same at the base of the
66> callee's. However, since some activation records are kept on the heap we
67> cannot do this.
68
69### Generators and Coroutines
70
71Generators and coroutines contain a `_PyInterpreterFrame`
72The specials sections contains the following pointers:
73
74* Globals dict
75* Builtins dict
76* Locals dict (not the "fast" locals, but the locals for eval and class creation)
77* Code object
78* Heap allocated `PyFrameObject` for this activation record, if any.
79* The function.
80
81The pointer to the function is not strictly required, but it is cheaper to
82store a strong reference to the function and borrowed references to the globals
83and builtins, than strong references to both globals and builtins.
84
85### Frame objects
86
87When creating a backtrace or when calling `sys._getframe()` the frame becomes
88visible to Python code. When this happens a new `PyFrameObject` is created
89and a strong reference to it placed in the `frame_obj` field of the specials
90section. The `frame_obj` field is initially `NULL`.
91
92The `PyFrameObject` may outlive a stack-allocated `_PyInterpreterFrame`.
93If it does then `_PyInterpreterFrame` is copied into the `PyFrameObject`,
94except the evaluation stack which must be empty at this point.
95The linkage section is updated to reflect the new location of the frame.
96
97This mechanism provides the appearance of persistent, heap-allocated
98frames for each activation, but with low runtime overhead.
99
100### Generators and Coroutines
101
102
103Generator objects have a `_PyInterpreterFrame` embedded in them.
104This means that creating a generator requires only a single allocation,
105reducing allocation overhead and improving locality of reference.
106The embedded frame is linked into the per-thread frame when iterated or
107awaited.
108
109If a frame object associated with a generator outlives the generator, then
110the embedded `_PyInterpreterFrame` is copied into the frame object.
111
112
113All the above applies to coroutines and async generators as well.
114
115### Field names
116
117Many of the fields in `_PyInterpreterFrame` were copied from the 3.10 `PyFrameObject`.
118Thus, some of the field names may be a bit misleading.
119
120For example the `f_globals` field has a `f_` prefix implying it belongs to the
121`PyFrameObject` struct, although it belongs to the `_PyInterpreterFrame` struct.
122We may rationalize this naming scheme for 3.12.
123
124
125### Shim frames
126
127On entry to `_PyEval_EvalFrameDefault()` a shim `_PyInterpreterFrame` is pushed.
128This frame is stored on the C stack, and popped when `_PyEval_EvalFrameDefault()`
129returns. This extra frame is inserted so that `RETURN_VALUE`, `YIELD_VALUE`, and
130`RETURN_GENERATOR` do not need to check whether the current frame is the entry frame.
131The shim frame points to a special code object containing the `INTERPRETER_EXIT`
132instruction which cleans up the shim frame and returns.
133
134
135### The Instruction Pointer
136
137`_PyInterpreterFrame` has two fields which are used to maintain the instruction
138pointer: `instr_ptr` and `return_offset`.
139
140When a frame is executing, `instr_ptr` points to the instruction currently being
141executed. In a suspended frame, it points to the instruction that would execute
142if the frame were to resume. After `frame.f_lineno` is set, `instr_ptr` points to
143the next instruction to be executed. During a call to a python function,
144`instr_ptr` points to the call instruction, because this is what we would expect
145to see in an exception traceback.
146
147The `return_offset` field determines where a `RETURN` should go in the caller,
148relative to `instr_ptr`.  It is only meaningful to the callee, so it needs to
149be set in any instruction that implements a call (to a Python function),
150including CALL, SEND and BINARY_SUBSCR_GETITEM, among others. If there is no
151callee, then return_offset is meaningless.  It is necessary to have a separate
152field for the return offset because (1) if we apply this offset to `instr_ptr`
153while executing the `RETURN`, this is too early and would lose us information
154about the previous instruction which we could need for introspecting and
155debugging. (2) `SEND` needs to pass two offsets to the generator: one for
156`RETURN` and one for `YIELD`. It uses the `oparg` for one, and the
157`return_offset` for the other.
158