# Tooling to generate interpreters

Documentation for the instruction definitions in `Python/bytecodes.c`
("the DSL") is [here](interpreter_definition.md).

What's currently here:

- `analyzer.py`: converts the AST generated by `Parser`
  into a higher-level structure that is easier to work with
- `lexer.py`: lexer for C, originally written by Mark Shannon
- `plexer.py`: OO interface on top of `lexer.py`; main class: `PLexer`
- `parsing.py`: parser for the instruction definition DSL; main class: `Parser`
- `parser.py`: helper for interactions with `parsing.py`
- `tierN_generator.py`: a couple of driver scripts that read `Python/bytecodes.c` and
  write `Python/generated_cases.c.h` (and several other files)
- `optimizer_generator.py`: reads `Python/bytecodes.c` and
  `Python/optimizer_bytecodes.c` and writes
  `Python/optimizer_cases.c.h`
- `stack.py`: code to handle generalized stack effects
- `cwriter.py`: code that understands tokens and how to format C code;
  main class: `CWriter`
- `generators_common.py`: helpers for generators
- `opcode_id_generator.py`: generates a list of opcodes and writes them to
  `Include/opcode_ids.h`
- `opcode_metadata_generator.py`: reads the instruction definitions and
  writes the metadata to `Include/internal/pycore_opcode_metadata.h`
- `py_metadata_generator.py`: reads the instruction definitions and
  writes the metadata to `Lib/_opcode_metadata.py`
- `target_generator.py`: generates targets for computed goto dispatch and
  writes them to `Python/opcode_targets.h`
- `uop_id_generator.py`: generates a list of uop IDs and writes them to
  `Include/internal/pycore_uop_ids.h`
- `uop_metadata_generator.py`: reads the instruction definitions and
  writes the metadata to `Include/internal/pycore_uop_metadata.h`

Note that there is some dummy C code at the top and bottom of
`Python/bytecodes.c`
to fool text editors like VS Code into believing this is valid C code.

## A bit about the parser

The parser class uses a pretty standard recursive descent scheme,
but with unlimited backtracking.
The `PLexer` class tokenizes the entire input before parsing starts.
We do not run the C preprocessor.
Each parsing method returns either an AST node (a `Node` instance)
or `None`, or raises `SyntaxError` (showing the error in the C source).

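As a rough illustration of "tokenize everything first, then parse by
position", here is a minimal sketch. It is not the real `PLexer`:
`MiniLexer`, `Token`, `TOKEN_RE`, and the tiny token grammar are all
hypothetical names invented for this example, and the actual `lexer.py`
handles full C.

```python
from __future__ import annotations

import re
from dataclasses import dataclass

# Hypothetical three-kind token grammar, just enough for the demo.
TOKEN_RE = re.compile(r"(?P<IDENT>[A-Za-z_]\w*)|(?P<NUMBER>\d+)|(?P<OP>[(),;=+*-])")


@dataclass
class Token:
    kind: str
    text: str


class MiniLexer:
    """Tokenizes the whole input eagerly, then serves tokens by index."""

    def __init__(self, src: str) -> None:
        self.tokens: list[Token] = []
        pos = 0
        while True:
            # Skip whitespace between tokens.
            while pos < len(src) and src[pos].isspace():
                pos += 1
            if pos >= len(src):
                break
            m = TOKEN_RE.match(src, pos)
            if m is None:
                raise SyntaxError(f"unexpected character {src[pos]!r} at offset {pos}")
            kind = m.lastgroup
            self.tokens.append(Token(kind, m.group(kind)))
            pos = m.end()
        # Index of the next token to hand out; a parser can save and
        # restore this integer to backtrack cheaply.
        self.pos = 0

    def expect(self, kind: str) -> Token | None:
        """Consume and return the next token if it has `kind`, else None."""
        if self.pos < len(self.tokens) and self.tokens[self.pos].kind == kind:
            tok = self.tokens[self.pos]
            self.pos += 1
            return tok
        return None
```

Because all tokens already exist in a list, "rewinding" the input is just
resetting `pos`, which is what makes unlimited backtracking practical.
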
Most parsing methods are decorated with `@contextual`, which automatically
resets the tokenizer input position when `None` is returned.
Parsing methods may also raise `SyntaxError`, which is irrecoverable.
When a parsing method returns `None`, it is possible that after backtracking
a different parsing method returns a valid AST.

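The backtracking behaviour described above can be sketched as follows.
This is an illustrative approximation, not the code in `parsing.py`: the
`contextual` decorator here is a simplified stand-in, and `MiniParser`,
its token list, and its two grammar rules are hypothetical.

```python
from __future__ import annotations

import functools


def contextual(func):
    """Reset the input position if the wrapped parsing method returns None."""

    @functools.wraps(func)
    def wrapper(self):
        start = self.pos
        result = func(self)
        if result is None:
            self.pos = start  # backtrack: pretend nothing was consumed
        return result

    return wrapper


class MiniParser:
    def __init__(self, tokens: list[str]) -> None:
        self.tokens = tokens
        self.pos = 0

    def expect(self, text: str) -> str | None:
        if self.pos < len(self.tokens) and self.tokens[self.pos] == text:
            self.pos += 1
            return text
        return None

    def name(self) -> str | None:
        if self.pos < len(self.tokens) and self.tokens[self.pos].isidentifier():
            tok = self.tokens[self.pos]
            self.pos += 1
            return tok
        return None

    @contextual
    def call(self):
        # call: NAME "(" ")"
        if (n := self.name()) and self.expect("(") and self.expect(")"):
            return ("call", n)
        return None

    @contextual
    def assignment(self):
        # assignment: NAME "=" NAME
        if (target := self.name()) and self.expect("="):
            if value := self.name():
                return ("assign", target, value)
            # Past the "=", nothing else can match: fail hard, no backtracking.
            raise SyntaxError("expected a name after '='")
        return None

    def statement(self):
        # Try each alternative in turn; a None result has already rewound pos.
        return self.call() or self.assignment()
```

For the input `["x", "=", "y"]`, `call()` consumes `x`, fails on `(`, and
returns `None`, so the decorator rewinds to position 0 and `assignment()`
then succeeds from the start. An input such as `["x", "="]` raises
`SyntaxError` instead, since that failure is treated as unrecoverable.
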
Neither the lexer nor the parser is complete or fully correct.
Most known issues are tersely indicated by `# TODO:` comments.
We plan to fix issues as they become relevant.