1Description of the internal format of the line number table 2 3Conceptually, the line number table consists of a sequence of triples: 4 start-offset (inclusive), end-offset (exclusive), line-number. 5 6Note that not all byte codes have a line number so we need handle `None` for the line-number. 7 8However, storing the above sequence directly would be very inefficient as we would need 12 bytes per entry. 9 10First, note that the end of one entry is the same as the start of the next, so we can overlap entries. 11Second, we don't really need arbitrary access to the sequence, so we can store deltas. 12 13We just need to store (end - start, line delta) pairs. The start offset of the first entry is always zero. 14 15Third, most deltas are small, so we can use a single byte for each value, as long we allow several entries for the same line. 16 17Consider the following table 18 Start End Line 19 0 6 1 20 6 50 2 21 50 350 7 22 350 360 No line number 23 360 376 8 24 376 380 208 25 26Stripping the redundant ends gives: 27 28 End-Start Line-delta 29 6 +1 30 44 +1 31 300 +5 32 10 No line number 33 16 +1 34 4 +200 35 36 37Note that the end - start value is always positive. 38 39Finally, in order to fit into a single byte we need to convert start deltas to the range 0 <= delta <= 254, 40and line deltas to the range -127 <= delta <= 127. 41A line delta of -128 is used to indicate no line number. 42Also note that a delta of zero indicates that there are no bytecodes in the given range, 43which means we can use an invalid line number for that range. 44 45Final form: 46 47 Start delta Line delta 48 6 +1 49 44 +1 50 254 +5 51 46 0 52 10 -128 (No line number, treated as a delta of zero) 53 16 +1 54 0 +127 (line 135, but the range is empty as no bytecodes are at line 135) 55 4 +73 56 57Iterating over the table. 58------------------------- 59 60For the `co_lines` attribute we want to emit the full form, omitting the (350, 360, No line number) and empty entries. 61 62The code is as follows: 63 64def co_lines(code): 65 line = code.co_firstlineno 66 end = 0 67 table_iter = iter(code.internal_line_table): 68 for sdelta, ldelta in table_iter: 69 if ldelta == 0: # No change to line number, just accumulate changes to end 70 end += odelta 71 continue 72 start = end 73 end = start + sdelta 74 if ldelta == -128: # No valid line number -- skip entry 75 continue 76 line += ldelta 77 if end == start: # Empty range, omit. 78 continue 79 yield start, end, line 80 81 82 83 84The historical co_lnotab format 85------------------------------- 86 87prior to 3.10 code objects stored a field named co_lnotab. 88This was an array of unsigned bytes disguised as a Python bytes object. 89 90The old co_lnotab did not account for the presence of bytecodes without a line number, 91nor was it well suited to tracing as a number of workarounds were required. 92 93The old format can still be accessed via `code.co_lnotab`, which is lazily computed from the new format. 94 95Below is the description of the old co_lnotab format: 96 97 98The array is conceptually a compressed list of 99 (bytecode offset increment, line number increment) 100pairs. The details are important and delicate, best illustrated by example: 101 102 byte code offset source code line number 103 0 1 104 6 2 105 50 7 106 350 207 107 361 208 108 109Instead of storing these numbers literally, we compress the list by storing only 110the difference from one row to the next. Conceptually, the stored list might 111look like: 112 113 0, 1, 6, 1, 44, 5, 300, 200, 11, 1 114 115The above doesn't really work, but it's a start. An unsigned byte (byte code 116offset) can't hold negative values, or values larger than 255, a signed byte 117(line number) can't hold values larger than 127 or less than -128, and the 118above example contains two such values. (Note that before 3.6, line number 119was also encoded by an unsigned byte.) So we make two tweaks: 120 121 (a) there's a deep assumption that byte code offsets increase monotonically, 122 and 123 (b) if byte code offset jumps by more than 255 from one row to the next, or if 124 source code line number jumps by more than 127 or less than -128 from one row 125 to the next, more than one pair is written to the table. In case #b, 126 there's no way to know from looking at the table later how many were written. 127 That's the delicate part. A user of co_lnotab desiring to find the source 128 line number corresponding to a bytecode address A should do something like 129 this: 130 131 lineno = addr = 0 132 for addr_incr, line_incr in co_lnotab: 133 addr += addr_incr 134 if addr > A: 135 return lineno 136 if line_incr >= 0x80: 137 line_incr -= 0x100 138 lineno += line_incr 139 140(In C, this is implemented by PyCode_Addr2Line().) In order for this to work, 141when the addr field increments by more than 255, the line # increment in each 142pair generated must be 0 until the remaining addr increment is < 256. So, in 143the example above, assemble_lnotab in compile.c should not (as was actually done 144until 2.2) expand 300, 200 to 145 255, 255, 45, 45, 146but to 147 255, 0, 45, 127, 0, 73. 148 149The above is sufficient to reconstruct line numbers for tracebacks, but not for 150line tracing. Tracing is handled by PyCode_CheckLineNumber() in codeobject.c 151and maybe_call_line_trace() in ceval.c. 152 153*** Tracing *** 154 155To a first approximation, we want to call the tracing function when the line 156number of the current instruction changes. Re-computing the current line for 157every instruction is a little slow, though, so each time we compute the line 158number we save the bytecode indices where it's valid: 159 160 *instr_lb <= frame->f_lasti < *instr_ub 161 162is true so long as execution does not change lines. That is, *instr_lb holds 163the first bytecode index of the current line, and *instr_ub holds the first 164bytecode index of the next line. As long as the above expression is true, 165maybe_call_line_trace() does not need to call PyCode_CheckLineNumber(). Note 166that the same line may appear multiple times in the lnotab, either because the 167bytecode jumped more than 255 indices between line number changes or because 168the compiler inserted the same line twice. Even in that case, *instr_ub holds 169the first index of the next line. 170 171However, we don't *always* want to call the line trace function when the above 172test fails. 173 174Consider this code: 175 1761: def f(a): 1772: while a: 1783: print(1) 1794: break 1805: else: 1816: print(2) 182 183which compiles to this: 184 185 2 0 SETUP_LOOP 26 (to 28) 186 >> 2 LOAD_FAST 0 (a) 187 4 POP_JUMP_IF_FALSE 18 188 189 3 6 LOAD_GLOBAL 0 (print) 190 8 LOAD_CONST 1 (1) 191 10 CALL_FUNCTION 1 192 12 POP_TOP 193 194 4 14 BREAK_LOOP 195 16 JUMP_ABSOLUTE 2 196 >> 18 POP_BLOCK 197 198 6 20 LOAD_GLOBAL 0 (print) 199 22 LOAD_CONST 2 (2) 200 24 CALL_FUNCTION 1 201 26 POP_TOP 202 >> 28 LOAD_CONST 0 (None) 203 30 RETURN_VALUE 204 205If 'a' is false, execution will jump to the POP_BLOCK instruction at offset 18 206and the co_lnotab will claim that execution has moved to line 4, which is wrong. 207In this case, we could instead associate the POP_BLOCK with line 5, but that 208would break jumps around loops without else clauses. 209 210We fix this by only calling the line trace function for a forward jump if the 211co_lnotab indicates we have jumped to the *start* of a line, i.e. if the current 212instruction offset matches the offset given for the start of a line by the 213co_lnotab. For backward jumps, however, we always call the line trace function, 214which lets a debugger stop on every evaluation of a loop guard (which usually 215won't be the first opcode in a line). 216 217Why do we set f_lineno when tracing, and only just before calling the trace 218function? Well, consider the code above when 'a' is true. If stepping through 219this with 'n' in pdb, you would stop at line 1 with a "call" type event, then 220line events on lines 2, 3, and 4, then a "return" type event -- but because the 221code for the return actually falls in the range of the "line 6" opcodes, you 222would be shown line 6 during this event. This is a change from the behaviour in 2232.2 and before, and I've found it confusing in practice. By setting and using 224f_lineno when tracing, one can report a line number different from that 225suggested by f_lasti on this one occasion where it's desirable. 226