1Description of the internal format of the line number table in Python 3.10 2and earlier. 3 4(For 3.11 onwards, see Objects/locations.md) 5 6Conceptually, the line number table consists of a sequence of triples: 7 start-offset (inclusive), end-offset (exclusive), line-number. 8 9Note that not all byte codes have a line number so we need handle `None` for the line-number. 10 11However, storing the above sequence directly would be very inefficient as we would need 12 bytes per entry. 12 13First, note that the end of one entry is the same as the start of the next, so we can overlap entries. 14Second, we don't really need arbitrary access to the sequence, so we can store deltas. 15 16We just need to store (end - start, line delta) pairs. The start offset of the first entry is always zero. 17 18Third, most deltas are small, so we can use a single byte for each value, as long we allow several entries for the same line. 19 20Consider the following table 21 Start End Line 22 0 6 1 23 6 50 2 24 50 350 7 25 350 360 No line number 26 360 376 8 27 376 380 208 28 29Stripping the redundant ends gives: 30 31 End-Start Line-delta 32 6 +1 33 44 +1 34 300 +5 35 10 No line number 36 16 +1 37 4 +200 38 39 40Note that the end - start value is always positive. 41 42Finally, in order to fit into a single byte we need to convert start deltas to the range 0 <= delta <= 254, 43and line deltas to the range -127 <= delta <= 127. 44A line delta of -128 is used to indicate no line number. 45Also note that a delta of zero indicates that there are no bytecodes in the given range, 46which means we can use an invalid line number for that range. 47 48Final form: 49 50 Start delta Line delta 51 6 +1 52 44 +1 53 254 +5 54 46 0 55 10 -128 (No line number, treated as a delta of zero) 56 16 +1 57 0 +127 (line 135, but the range is empty as no bytecodes are at line 135) 58 4 +73 59 60Iterating over the table. 61------------------------- 62 63For the `co_lines` method we want to emit the full form, omitting the (350, 360, No line number) and empty entries. 64 65The code is as follows: 66 67def co_lines(code): 68 line = code.co_firstlineno 69 end = 0 70 table_iter = iter(code.internal_line_table): 71 for sdelta, ldelta in table_iter: 72 if ldelta == 0: # No change to line number, just accumulate changes to end 73 end += sdelta 74 continue 75 start = end 76 end = start + sdelta 77 if ldelta == -128: # No valid line number -- skip entry 78 continue 79 line += ldelta 80 if end == start: # Empty range, omit. 81 continue 82 yield start, end, line 83 84 85 86 87The historical co_lnotab format 88------------------------------- 89 90prior to 3.10 code objects stored a field named co_lnotab. 91This was an array of unsigned bytes disguised as a Python bytes object. 92 93The old co_lnotab did not account for the presence of bytecodes without a line number, 94nor was it well suited to tracing as a number of workarounds were required. 95 96The old format can still be accessed via `code.co_lnotab`, which is lazily computed from the new format. 97 98Below is the description of the old co_lnotab format: 99 100 101The array is conceptually a compressed list of 102 (bytecode offset increment, line number increment) 103pairs. The details are important and delicate, best illustrated by example: 104 105 byte code offset source code line number 106 0 1 107 6 2 108 50 7 109 350 207 110 361 208 111 112Instead of storing these numbers literally, we compress the list by storing only 113the difference from one row to the next. Conceptually, the stored list might 114look like: 115 116 0, 1, 6, 1, 44, 5, 300, 200, 11, 1 117 118The above doesn't really work, but it's a start. An unsigned byte (byte code 119offset) can't hold negative values, or values larger than 255, a signed byte 120(line number) can't hold values larger than 127 or less than -128, and the 121above example contains two such values. (Note that before 3.6, line number 122was also encoded by an unsigned byte.) So we make two tweaks: 123 124 (a) there's a deep assumption that byte code offsets increase monotonically, 125 and 126 (b) if byte code offset jumps by more than 255 from one row to the next, or if 127 source code line number jumps by more than 127 or less than -128 from one row 128 to the next, more than one pair is written to the table. In case #b, 129 there's no way to know from looking at the table later how many were written. 130 That's the delicate part. A user of co_lnotab desiring to find the source 131 line number corresponding to a bytecode address A should do something like 132 this: 133 134 lineno = addr = 0 135 for addr_incr, line_incr in co_lnotab: 136 addr += addr_incr 137 if addr > A: 138 return lineno 139 if line_incr >= 0x80: 140 line_incr -= 0x100 141 lineno += line_incr 142 143(In C, this is implemented by PyCode_Addr2Line().) In order for this to work, 144when the addr field increments by more than 255, the line # increment in each 145pair generated must be 0 until the remaining addr increment is < 256. So, in 146the example above, assemble_lnotab in compile.c should not (as was actually done 147until 2.2) expand 300, 200 to 148 255, 255, 45, 45, 149but to 150 255, 0, 45, 127, 0, 73. 151 152The above is sufficient to reconstruct line numbers for tracebacks, but not for 153line tracing. Tracing is handled by PyCode_CheckLineNumber() in codeobject.c 154and maybe_call_line_trace() in ceval.c. 155 156*** Tracing *** 157 158To a first approximation, we want to call the tracing function when the line 159number of the current instruction changes. Re-computing the current line for 160every instruction is a little slow, though, so each time we compute the line 161number we save the bytecode indices where it's valid: 162 163 *instr_lb <= frame->f_lasti < *instr_ub 164 165is true so long as execution does not change lines. That is, *instr_lb holds 166the first bytecode index of the current line, and *instr_ub holds the first 167bytecode index of the next line. As long as the above expression is true, 168maybe_call_line_trace() does not need to call PyCode_CheckLineNumber(). Note 169that the same line may appear multiple times in the lnotab, either because the 170bytecode jumped more than 255 indices between line number changes or because 171the compiler inserted the same line twice. Even in that case, *instr_ub holds 172the first index of the next line. 173 174However, we don't *always* want to call the line trace function when the above 175test fails. 176 177Consider this code: 178 1791: def f(a): 1802: while a: 1813: print(1) 1824: break 1835: else: 1846: print(2) 185 186which compiles to this: 187 188 2 0 SETUP_LOOP 26 (to 28) 189 >> 2 LOAD_FAST 0 (a) 190 4 POP_JUMP_IF_FALSE 18 191 192 3 6 LOAD_GLOBAL 0 (print) 193 8 LOAD_CONST 1 (1) 194 10 CALL_NO_KW 1 195 12 POP_TOP 196 197 4 14 BREAK_LOOP 198 16 JUMP_ABSOLUTE 2 199 >> 18 POP_BLOCK 200 201 6 20 LOAD_GLOBAL 0 (print) 202 22 LOAD_CONST 2 (2) 203 24 CALL_NO_KW 1 204 26 POP_TOP 205 >> 28 LOAD_CONST 0 (None) 206 30 RETURN_VALUE 207 208If 'a' is false, execution will jump to the POP_BLOCK instruction at offset 18 209and the co_lnotab will claim that execution has moved to line 4, which is wrong. 210In this case, we could instead associate the POP_BLOCK with line 5, but that 211would break jumps around loops without else clauses. 212 213We fix this by only calling the line trace function for a forward jump if the 214co_lnotab indicates we have jumped to the *start* of a line, i.e. if the current 215instruction offset matches the offset given for the start of a line by the 216co_lnotab. For backward jumps, however, we always call the line trace function, 217which lets a debugger stop on every evaluation of a loop guard (which usually 218won't be the first opcode in a line). 219 220Why do we set f_lineno when tracing, and only just before calling the trace 221function? Well, consider the code above when 'a' is true. If stepping through 222this with 'n' in pdb, you would stop at line 1 with a "call" type event, then 223line events on lines 2, 3, and 4, then a "return" type event -- but because the 224code for the return actually falls in the range of the "line 6" opcodes, you 225would be shown line 6 during this event. This is a change from the behaviour in 2262.2 and before, and I've found it confusing in practice. By setting and using 227f_lineno when tracing, one can report a line number different from that 228suggested by f_lasti on this one occasion where it's desirable. 229