• Home
  • Line#
  • Scopes#
  • Navigate#
  • Raw
  • Download
1Description of the internal format of the line number table in Python 3.10
2and earlier.
3
4(For 3.11 onwards, see Objects/locations.md)
5
6Conceptually, the line number table consists of a sequence of triples:
7    start-offset (inclusive), end-offset (exclusive), line-number.
8
9Note that not all byte codes have a line number so we need handle `None` for the line-number.
10
11However, storing the above sequence directly would be very inefficient as we would need 12 bytes per entry.
12
13First, note that the end of one entry is the same as the start of the next, so we can overlap entries.
14Second, we don't really need arbitrary access to the sequence, so we can store deltas.
15
16We just need to store (end - start, line delta) pairs. The start offset of the first entry is always zero.
17
18Third, most deltas are small, so we can use a single byte for each value, as long we allow several entries for the same line.
19
20Consider the following table
21     Start    End     Line
22      0       6       1
23      6       50      2
24      50      350     7
25      350     360     No line number
26      360     376     8
27      376     380     208
28
29Stripping the redundant ends gives:
30
31   End-Start  Line-delta
32      6         +1
33      44        +1
34      300       +5
35      10        No line number
36      16        +1
37      4         +200
38
39
40Note that the end - start value is always positive.
41
42Finally, in order to fit into a single byte we need to convert start deltas to the range 0 <= delta <= 254,
43and line deltas to the range -127  <= delta <= 127.
44A line delta of -128 is used to indicate no line number.
45Also note that a delta of zero indicates that there are no bytecodes in the given range,
46which means we can use an invalid line number for that range.
47
48Final form:
49
50   Start delta   Line delta
51    6               +1
52    44              +1
53    254             +5
54    46              0
55    10              -128 (No line number, treated as a delta of zero)
56    16              +1
57    0               +127 (line 135, but the range is empty as no bytecodes are at line 135)
58    4               +73
59
60Iterating over the table.
61-------------------------
62
63For the `co_lines` method we want to emit the full form, omitting the (350, 360, No line number) and empty entries.
64
65The code is as follows:
66
67def co_lines(code):
68    line = code.co_firstlineno
69    end = 0
70    table_iter = iter(code.internal_line_table):
71    for sdelta, ldelta in table_iter:
72        if ldelta == 0: # No change to line number, just accumulate changes to end
73            end += sdelta
74            continue
75        start = end
76        end = start + sdelta
77        if ldelta == -128: # No valid line number -- skip entry
78            continue
79        line += ldelta
80        if end == start: # Empty range, omit.
81            continue
82        yield start, end, line
83
84
85
86
87The historical co_lnotab format
88-------------------------------
89
90prior to 3.10 code objects stored a field named co_lnotab.
91This was an array of unsigned bytes disguised as a Python bytes object.
92
93The old co_lnotab did not account for the presence of bytecodes without a line number,
94nor was it well suited to tracing as a number of workarounds were required.
95
96The old format can still be accessed via `code.co_lnotab`, which is lazily computed from the new format.
97
98Below is the description of the old co_lnotab format:
99
100
101The array is conceptually a compressed list of
102    (bytecode offset increment, line number increment)
103pairs.  The details are important and delicate, best illustrated by example:
104
105    byte code offset    source code line number
106        0                   1
107        6                   2
108       50                   7
109      350                 207
110      361                 208
111
112Instead of storing these numbers literally, we compress the list by storing only
113the difference from one row to the next.  Conceptually, the stored list might
114look like:
115
116    0, 1,  6, 1,  44, 5,  300, 200,  11, 1
117
118The above doesn't really work, but it's a start. An unsigned byte (byte code
119offset) can't hold negative values, or values larger than 255, a signed byte
120(line number) can't hold values larger than 127 or less than -128, and the
121above example contains two such values.  (Note that before 3.6, line number
122was also encoded by an unsigned byte.)  So we make two tweaks:
123
124 (a) there's a deep assumption that byte code offsets increase monotonically,
125 and
126 (b) if byte code offset jumps by more than 255 from one row to the next, or if
127 source code line number jumps by more than 127 or less than -128 from one row
128 to the next, more than one pair is written to the table. In case #b,
129 there's no way to know from looking at the table later how many were written.
130 That's the delicate part.  A user of co_lnotab desiring to find the source
131 line number corresponding to a bytecode address A should do something like
132 this:
133
134    lineno = addr = 0
135    for addr_incr, line_incr in co_lnotab:
136        addr += addr_incr
137        if addr > A:
138            return lineno
139        if line_incr >= 0x80:
140            line_incr -= 0x100
141        lineno += line_incr
142
143(In C, this is implemented by PyCode_Addr2Line().)  In order for this to work,
144when the addr field increments by more than 255, the line # increment in each
145pair generated must be 0 until the remaining addr increment is < 256.  So, in
146the example above, assemble_lnotab in compile.c should not (as was actually done
147until 2.2) expand 300, 200 to
148    255, 255, 45, 45,
149but to
150    255, 0, 45, 127, 0, 73.
151
152The above is sufficient to reconstruct line numbers for tracebacks, but not for
153line tracing.  Tracing is handled by PyCode_CheckLineNumber() in codeobject.c
154and maybe_call_line_trace() in ceval.c.
155
156*** Tracing ***
157
158To a first approximation, we want to call the tracing function when the line
159number of the current instruction changes.  Re-computing the current line for
160every instruction is a little slow, though, so each time we compute the line
161number we save the bytecode indices where it's valid:
162
163     *instr_lb <= frame->f_lasti < *instr_ub
164
165is true so long as execution does not change lines.  That is, *instr_lb holds
166the first bytecode index of the current line, and *instr_ub holds the first
167bytecode index of the next line.  As long as the above expression is true,
168maybe_call_line_trace() does not need to call PyCode_CheckLineNumber().  Note
169that the same line may appear multiple times in the lnotab, either because the
170bytecode jumped more than 255 indices between line number changes or because
171the compiler inserted the same line twice.  Even in that case, *instr_ub holds
172the first index of the next line.
173
174However, we don't *always* want to call the line trace function when the above
175test fails.
176
177Consider this code:
178
1791: def f(a):
1802:    while a:
1813:       print(1)
1824:       break
1835:    else:
1846:       print(2)
185
186which compiles to this:
187
188  2           0 SETUP_LOOP              26 (to 28)
189        >>    2 LOAD_FAST                0 (a)
190              4 POP_JUMP_IF_FALSE       18
191
192  3           6 LOAD_GLOBAL              0 (print)
193              8 LOAD_CONST               1 (1)
194             10 CALL_NO_KW               1
195             12 POP_TOP
196
197  4          14 BREAK_LOOP
198             16 JUMP_ABSOLUTE            2
199        >>   18 POP_BLOCK
200
201  6          20 LOAD_GLOBAL              0 (print)
202             22 LOAD_CONST               2 (2)
203             24 CALL_NO_KW               1
204             26 POP_TOP
205        >>   28 LOAD_CONST               0 (None)
206             30 RETURN_VALUE
207
208If 'a' is false, execution will jump to the POP_BLOCK instruction at offset 18
209and the co_lnotab will claim that execution has moved to line 4, which is wrong.
210In this case, we could instead associate the POP_BLOCK with line 5, but that
211would break jumps around loops without else clauses.
212
213We fix this by only calling the line trace function for a forward jump if the
214co_lnotab indicates we have jumped to the *start* of a line, i.e. if the current
215instruction offset matches the offset given for the start of a line by the
216co_lnotab.  For backward jumps, however, we always call the line trace function,
217which lets a debugger stop on every evaluation of a loop guard (which usually
218won't be the first opcode in a line).
219
220Why do we set f_lineno when tracing, and only just before calling the trace
221function?  Well, consider the code above when 'a' is true.  If stepping through
222this with 'n' in pdb, you would stop at line 1 with a "call" type event, then
223line events on lines 2, 3, and 4, then a "return" type event -- but because the
224code for the return actually falls in the range of the "line 6" opcodes, you
225would be shown line 6 during this event.  This is a change from the behaviour in
2262.2 and before, and I've found it confusing in practice.  By setting and using
227f_lineno when tracing, one can report a line number different from that
228suggested by f_lasti on this one occasion where it's desirable.
229