# Tooling to generate interpreters

Documentation for the instruction definitions in `Python/bytecodes.c`
("the DSL") is [here](interpreter_definition.md).

What's currently here:

- `analyzer.py`: code for converting the `AST` generated by `Parser`
  into a higher-level structure that is easier to work with
- `lexer.py`: lexer for C, originally written by Mark Shannon
- `plexer.py`: OO interface on top of `lexer.py`; main class: `PLexer`
- `parsing.py`: parser for the instruction definition DSL; main class: `Parser`
- `parser.py`: helper for interacting with `parsing.py`
- `tierN_generator.py`: a couple of driver scripts that read `Python/bytecodes.c` and
  write `Python/generated_cases.c.h` (and several other files)
- `optimizer_generator.py`: reads `Python/bytecodes.c` and
  `Python/optimizer_bytecodes.c` and writes
  `Python/optimizer_cases.c.h`
- `stack.py`: code to handle generalized stack effects
- `cwriter.py`: code that understands tokens and how to format C code;
  main class: `CWriter`
- `generators_common.py`: helpers for generators
- `opcode_id_generator.py`: generates a list of opcodes and writes them to
  `Include/opcode_ids.h`
- `opcode_metadata_generator.py`: reads the instruction definitions and
  writes the metadata to `Include/internal/pycore_opcode_metadata.h`
- `py_metadata_generator.py`: reads the instruction definitions and
  writes the metadata to `Lib/_opcode_metadata.py`
- `target_generator.py`: generates targets for computed goto dispatch and
  writes them to `Python/opcode_targets.h`
- `uop_id_generator.py`: generates a list of uop IDs and writes them to
  `Include/internal/pycore_uop_ids.h`
- `uop_metadata_generator.py`: reads the instruction definitions and
  writes the metadata to `Include/internal/pycore_uop_metadata.h`

Note that there is some dummy C code at the top and bottom of
`Python/bytecodes.c` to fool text editors like VS Code into believing
this is valid C code.

## A bit about the parser

The parser class uses a pretty standard recursive descent scheme,
but with unlimited backtracking.
The `PLexer` class tokenizes the entire input before parsing starts.
We do not run the C preprocessor.
Each parsing method returns either an AST node (a `Node` instance)
or `None`, or raises `SyntaxError` (showing the error in the C source).

Most parsing methods are decorated with `@contextual`, which automatically
resets the tokenizer input position when `None` is returned.
A `SyntaxError`, by contrast, is irrecoverable and is not backtracked over.
When a parsing method returns `None`, a different parsing method may still
return a valid AST after backtracking.

Neither the lexer nor the parser is complete or fully correct.
Most known issues are tersely indicated by `# TODO:` comments.
We plan to fix issues as they become relevant.
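To make the backtracking scheme more concrete, here is a minimal, self-contained sketch. It is a recursive descent parser that tokenizes its whole input up front and uses a `@contextual`-style decorator to rewind the token position whenever a method returns `None`. This is not the code in `parsing.py` or `plexer.py`. The toy grammar (names and calls such as `f(x, g(y))`), the `Name`/`Call` node classes, and the `getpos`/`setpos`/`next`/`expect` helpers are hypothetical stand-ins chosen for illustration.

```python
import re
from dataclasses import dataclass
from functools import wraps
from typing import Optional


def tokenize(src: str) -> list[str]:
    # Hypothetical toy tokenizer: identifiers, integers, or any other
    # single non-space character. The whole input is tokenized before parsing.
    return re.findall(r"[A-Za-z_]\w*|\d+|\S", src)


@dataclass
class Name:  # hypothetical AST node
    name: str


@dataclass
class Call:  # hypothetical AST node
    func: str
    args: list


def contextual(method):
    # Sketch of the idea behind @contextual: remember the position, and
    # rewind to it if the method returns None. A SyntaxError simply propagates.
    @wraps(method)
    def wrapper(self, *args, **kwargs):
        begin = self.getpos()
        result = method(self, *args, **kwargs)
        if result is None:
            self.setpos(begin)  # backtrack: undo any tokens consumed
        return result
    return wrapper


class Parser:
    def __init__(self, src: str) -> None:
        self.tokens = tokenize(src)
        self.pos = 0

    # Hypothetical cursor helpers over the pre-tokenized input.
    def getpos(self) -> int:
        return self.pos

    def setpos(self, pos: int) -> None:
        self.pos = pos

    def next(self) -> Optional[str]:
        if self.pos < len(self.tokens):
            self.pos += 1
            return self.tokens[self.pos - 1]
        return None

    def expect(self, text: str) -> bool:
        # Consume the next token only if it equals `text`.
        if self.pos < len(self.tokens) and self.tokens[self.pos] == text:
            self.pos += 1
            return True
        return False

    @contextual
    def expr(self):
        # Try the longer alternative first; if it fails, the position has
        # already been rewound, so the shorter alternative starts cleanly.
        return self.call() or self.name()

    @contextual
    def name(self) -> Optional[Name]:
        tok = self.next()
        return Name(tok) if tok is not None and tok.isidentifier() else None

    @contextual
    def call(self) -> Optional[Call]:
        func = self.name()
        if func is None or not self.expect("("):
            return None  # not a call; @contextual rewinds to the start
        args = []
        if not self.expect(")"):
            while True:
                arg = self.expr()
                if arg is None:
                    raise SyntaxError(f"expected expression at token {self.pos}")
                args.append(arg)
                if self.expect(")"):
                    break
                if not self.expect(","):
                    raise SyntaxError(f"expected ',' or ')' at token {self.pos}")
        return Call(func.name, args)
```

For example, `Parser("f(x, g(y))").expr()` yields `Call("f", [Name("x"), Call("g", [Name("y")])])`. While parsing `x`, `call()` consumes `x`, fails to find `(`, returns `None`, and is rewound so that `name()` can succeed. Once a `(` has been seen, a malformed argument list raises `SyntaxError` instead of returning `None`. That mirrors the split the text describes between backtrackable failure and irrecoverable errors.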