All about co_lnotab, the line number table.

Code objects store a field named co_lnotab.  This is an array of unsigned bytes
disguised as a Python string.  It is used to map bytecode offsets to source code
line #s for tracebacks and to identify line number boundaries for line tracing.

The array is conceptually a compressed list of
    (bytecode offset increment, line number increment)
pairs.  The details are important and delicate, best illustrated by example:

    byte code offset    source code line number
        0                   1
        6                   2
       50                   7
      350                 307
      361                 308

Instead of storing these numbers literally, we compress the list by storing only
the increments from one row to the next.  Conceptually, the stored list might
look like:

    0, 1,  6, 1,  44, 5,  300, 300,  11, 1

The above doesn't really work, but it's a start.  Note that an unsigned byte
can't hold negative values, or values larger than 255, and the above example
contains two such values.  So we make two tweaks:

 (a) there's a deep assumption that byte code offsets and their corresponding
 line #s both increase monotonically, and
 (b) if at least one column jumps by more than 255 from one row to the next,
 more than one pair is written to the table.  In case #b, there's no way to know
 from looking at the table later how many were written.  That's the delicate
 part.  A user of co_lnotab desiring to find the source line number
 corresponding to a bytecode address A should do something like this

    lineno = addr = 0
    for addr_incr, line_incr in co_lnotab:
        addr += addr_incr
        if addr > A:
            return lineno
        lineno += line_incr

(In C, this is implemented by PyCode_Addr2Line().)  In order for this to work,
when the addr field increments by more than 255, the line # increment in each
pair generated must be 0 until the remaining addr increment is < 256.
So, in the example above, assemble_lnotab in compile.c should not (as was
actually done until 2.2) expand 300, 300 to
    255, 255, 45, 45,
but to
    255, 0, 45, 255, 0, 45.

The above is sufficient to reconstruct line numbers for tracebacks, but not for
line tracing.  Tracing is handled by PyCode_CheckLineNumber() in codeobject.c
and maybe_call_line_trace() in ceval.c.

*** Tracing ***

To a first approximation, we want to call the tracing function when the line
number of the current instruction changes.  Re-computing the current line for
every instruction is a little slow, though, so each time we compute the line
number we save the bytecode indices where it's valid:

     *instr_lb <= frame->f_lasti < *instr_ub

is true so long as execution does not change lines.  That is, *instr_lb holds
the first bytecode index of the current line, and *instr_ub holds the first
bytecode index of the next line.  As long as the above expression is true,
maybe_call_line_trace() does not need to call PyCode_CheckLineNumber().  Note
that the same line may appear multiple times in the lnotab, either because the
bytecode jumped more than 255 indices between line number changes or because
the compiler inserted the same line twice.  Even in that case, *instr_ub holds
the first index of the next line.

However, we don't *always* want to call the line trace function when the above
test fails.

Consider this code:

1: def f(a):
2:     while a:
3:         print 1,
4:         break
5:     else:
6:         print 2,

which compiles to this:

  2           0 SETUP_LOOP              19 (to 22)
        >>    3 LOAD_FAST                0 (a)
              6 POP_JUMP_IF_FALSE       17

  3           9 LOAD_CONST               1 (1)
             12 PRINT_ITEM

  4          13 BREAK_LOOP
             14 JUMP_ABSOLUTE            3
        >>   17 POP_BLOCK

  6          18 LOAD_CONST               2 (2)
             21 PRINT_ITEM
        >>   22 LOAD_CONST               0 (None)
             25 RETURN_VALUE

If 'a' is false, execution will jump to the POP_BLOCK instruction at offset 17
and the co_lnotab will claim that execution has moved to line 4, which is wrong.
In this case, we could instead associate the POP_BLOCK with line 5, but that
would break jumps around loops without else clauses.

We fix this by only calling the line trace function for a forward jump if the
co_lnotab indicates we have jumped to the *start* of a line, i.e. if the current
instruction offset matches the offset given for the start of a line by the
co_lnotab.  For backward jumps, however, we always call the line trace function,
which lets a debugger stop on every evaluation of a loop guard (which usually
won't be the first opcode in a line).

Why do we set f_lineno when tracing, and only just before calling the trace
function?  Well, consider the code above when 'a' is true.  If stepping through
this with 'n' in pdb, you would stop at line 1 with a "call" type event, then
line events on lines 2, 3, and 4, then a "return" type event -- but because the
code for the return actually falls in the range of the "line 6" opcodes, you
would be shown line 6 during this event.  This is a change from the behaviour in
2.2 and before, and I've found it confusing in practice.  By setting and using
f_lineno when tracing, one can report a line number different from that
suggested by f_lasti on this one occasion where it's desirable.
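
The encoding rules from the first part of these notes can be sketched in
Python.  This is an illustration under stated assumptions, not the code in
compile.c or codeobject.c: encode_lnotab() and addr_to_line() are hypothetical
names, the table is modelled as a flat list of small ints rather than the real
co_lnotab string, counting starts from offset 0 and line 0 as in the example
table (not from co_firstlineno), and the lookup falls back to the last line
once the table is exhausted.

```python
def encode_lnotab(rows):
    """Compress (offset, line) rows into a flat list of unsigned-byte pairs.

    Assumption: offsets and lines both increase monotonically (tweak a).
    """
    table = []
    prev_addr = prev_line = 0
    for addr, line in rows:
        d_addr = addr - prev_addr
        d_line = line - prev_line
        # Tweak (b): split an oversized address increment first, keeping
        # the line increment at 0 until the remainder fits in one byte.
        while d_addr > 255:
            table += [255, 0]
            d_addr -= 255
        # Then split an oversized line increment; only the first of these
        # pairs carries what is left of the address increment.
        while d_line > 255:
            table += [d_addr, 255]
            d_addr = 0
            d_line -= 255
        table += [d_addr, d_line]
        prev_addr, prev_line = addr, line
    return table


def addr_to_line(table, target):
    """Return the line number for bytecode address `target`."""
    lineno = addr = 0
    for i in range(0, len(table), 2):
        addr += table[i]
        if addr > target:
            return lineno
        lineno += table[i + 1]
    # Assumption: addresses past the last entry belong to the last line.
    return lineno


# The example rows from the table at the top of these notes.
ROWS = [(0, 1), (6, 2), (50, 7), (350, 307), (361, 308)]
TABLE = encode_lnotab(ROWS)

# The same rows with 300, 300 expanded the pre-2.2 way (255, 255, 45, 45).
PRE_2_2_TABLE = [0, 1, 6, 1, 44, 5, 255, 255, 45, 45, 11, 1]
```

Here TABLE comes out as 0, 1, 6, 1, 44, 5, 255, 0, 45, 255, 0, 45, 11, 1, and
addr_to_line(TABLE, 349) gives 7.  With PRE_2_2_TABLE the same lookup gives 262,
because the first half of the split pair bumps the line number before the
address has reached 350.  That is why the line # increment has to stay 0 while
the addr increment is being split.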