• Home
  • Line#
  • Scopes#
  • Navigate#
  • Raw
  • Download
1# Interpreter Design
2
3## Introduction
4
5This document outlines the key design decisions in the interpreter and its companion components:
6bytecode instruction set, executable file format, etc. This document is organized as follows:
7
8| Title                           | Description                                                |
9| ------------------------------- | ---------------------------------------------------------- |
10| Requirements                    | Enlists the requirements to the component.                 |
11| Key Design Decisions            | Summarizes the key decisions to address the requirements.  |
12| Rationale                       | Elaborates on the rationales behind the decisions.         |
13| Specification / Implementation  | Provides relevant links.                                   |
14
15Please refer to the [glossary](glossary.md) for terminology clarification.
16
17### Common Requirements
18
19This section outlines common requirements that should be considered while designing the interpreter
20and all related components of the platform:
21
221. The platform should scale from microcontrollers to high-end mobile phones:
23    1. It should fit into 50KB of ROM.
24    1. It should be able to run consuming 64KB of RAM.
251. Program execution via bytecode interpretation should be enabled on all targets.
261. The platform should support multiple programming languages:
27    1. Java (and probably other languages that compile to JVM bytecode) should run with acceptable
28       performance.
29    1. JavaScript (and probably other dynamically typed languages) should run on the platform, but
30       their performance may be worse than shown by specialized engines.
31
32## Bytecode
33
34### Requirements
35
361. Bytecode should allow the interpreter to run no slower than state of the art interpreters.
371. Bytecode should be compact in size to avoid bloating application code.
381. Bytecode description should have a single entry point to simplify maintenance
39   across all components of the platform.
40
41### Key Design Decisions
42
431. Bytecode is register-based: all arguments and variables are mapped to virtual registers,
44   and most of bytecodes encode virtual registers as operands.
451. There is a dedicated register called accumulator, which is addressed implicitly by some
46   bytecodes and shared across all function frames during runtime.
471. Bytecode's instruction set architecture is machine-readable with a dedicated API for
48   code and documentation generation.
49
50### Rationale
51
52For the rationale on the bytecode type, please see [here](rationale-for-bytecode.md).
53
54Rationale on the machine-readable instruction set architecture is following. Since bytecode
55is the main form of program representation, information about it is needed in many components of
56the platform outside the interpreter. Having this crucial information copy-pasted or delivered as
57a bunch of C/C++ headers is very fragile and error-prone. At the same time,
58with the machine-readable ISA we now already can re-use this information in many places, e.g.:
59
60* In Panda Assembler's back-end, we automatically generate emission of bytecode in the binary form.
61* Converter from bytecode to the compiler's intermediate representation is partially implemented
62  with code autogenerated from the ISA.
63
64Besides, the machine-readable form naturally sets up the framework for self-testing (e.g.
65definitions have been defined and used, different parts of instruction description don't contradict
66themselves, etc.).
67
68### Specification / Implementation
69
70Please find the implementation of the instruction set architecture [here](../isa/isa.yaml).
71
72## Executable File Format
73
74### Requirements
75
761. All entities in the executable file should be encoded and stored compactly to avoid bloating
77   application code.
781. Format should be enforced as "flat" layout as possible: referencing metadata from other metadata
79   should be kept at minimum to reduce access overhead during runtime.
801. Runtime memory footprint of executable files should be low.
81
82### Key Design Decisions
83
841. The entire code of the application (excluding frameworks and external libraries) fits into a
85   single file to gain maximum benefits from deduplicating constant string pools, information
86   about types, etc.
871. All metadata entities are split into two groups: local (declared in the current executable file)
88   and foreign (declared elsewhere). Local entities can be accessed directly by the offset
89   in the offset. Additionally, 4-byte alignment is enforced for the most of data structures for
90   more efficient data reads from the Panda binary file. Foreign entities are loaded from their
91   respective files on demand.
921. The format uses raw offsets as the main access method to the actual data and doesn't
93   require explicitly how structures should be located relative to each other.
94
95### Specification / Implementation
96
97Please find the specification [here](file_format.md).
98
99## Interpreter
100
101### Requirements
102
1031. Interpreter should run no slower than state of the art interpreters.
1041. Interpreter should be portable enough to run on targets from IoT devices
105   to high-end mobile phones.
1061. Interpreter should not create extra preassure on the host system.
107
108### Key Design Decisions
109
1101. Interpreter uses indirect threaded dispatch technique
111   (implemented via computed goto) to reduce dispatch overhead.
1121. Interpreter does not depend on C++ standard library. All necessary classes, containers, etc.
113   are reimplemented by the platform.
1141. Interpreter is stackless (from the host stack perspective): Whenever a call from managed code
115   to managed code is performed, no new host frame is created for the interpreter itself.
116
117### Rationale
118
1191. Interpreters are by nature slower than native code execution. Slowdown can be explained by:
120    1. The inevitable extra cost that interpreters pay for decoding and dispatching bytecode
121       instructions.
122    1. The "heaviness" of language semantics that interpreters should implement.
123    1. The "nativeness" of the language implementation to the platform. A language that is
124       implemented in another language that runs on the platform may run slower because of
125       additional runtime overhead.
1261. An ideal target would be 5x-10x slowdown factor (compared to native execution)
127   for statically typed languages that run on the platform natively, and 15x-20x for L2 languages.
1281. Panda should scale onto a wide range of devices, including IoT devices. Although more and
129   more toolchains support C++ compilation for IoT, the standard library is often not present
130   on the device. Since static linking with a subset of the library is a pain and may not guarantee
131   optimal size of resulting native binary executable files, it is more reasonable to reimplement
132   required parts.
1331. According to our experiments, a stackless interpreter for a stack-based bytecode (which is by
134   nature slower) can even beat sometimes a non-stackless interpreter for a register-based
135   bytecode (which is by nature faster).
136
137### Specification / Implementation
138
139Please find the reference implementation [here](../runtime/interpreter).
140
141## Virtual Stack
142
143### Requirements
144
1451. All virtual registers should explicitly and precisely distinguish between garbage collectable
146   objects and non-garbage collectable primitives to simplify automatic memory management.
1471. Virtual registers should be able to hold following types: unsigned and signed integers with
148   the size of up to 64 bits, floating point numbers of single and double precision, raw pointers
149   to the objects on 32-bit and 64-bit architectures.
1501. Virtual stack should abstract limitations possibly imposed by the host stack.
151
152### Key Design Decisions
153
1541. All virtual registers are "tagged", meaning that they contain both the payload and additional
155   metadata (a "tag"), which allows at least distinguishment between garbage collectable and other
156   types.
1571. The size of a virtual register is not hardcoded by design. Currently it is always 128 bits,
158   but this is an implementation detail which is hidden behind the interfaces
159   and can be reduced for memory-constrained targets and to 64 bits (using NaN-tagging technique).
1601. By default, virtual stack is not mapped to a host stack. Instead, it is allocated on heap using
161   platform's memory management facilities. It is important that this behavior is not
162   hardcoded by design, so that it can be reconfigured for platforms where performance may benefit from
163   mapping virtual stack directly to the host stack.
164
165### Rationale
166
1671. Although tagged virtual registers occupy more memory (especially on 64-bit architectures),
168   redundant memory consumption is cheaper than ongoing runtime penalties on garbage collector
169   trying to distiguish between objects from non-objects on an "imprecise" stack.
1701. Where does 128 come from? It is `128 = 64 + 64`. The first 64 is either `sizeof(long int)`
171   or `sizeof(double)` or `sizeof(void *)` on a 64-bit architecture (i.e. theoretical maximum
172   size of the payload we are required to store in a virtual register). The second 64 is for tag
173   and padding. A lot of free space is expected to be in the padding area. Probably we may use it
174   for memory management or some other needs.
1751. Enforcing mapping of a virtual stack to the host stack brings some unpleasant constraints
176   on IoT devices. On such devices the size of the host stack may be severely limited: as a result,
177   managed applications have to think about this limitation, which contradicts the idea of
178   portability of managed applications. We can relax this constraint to some extent with configurable
179   virtual stack implementation, and can even to a greater extent with the stackless interpreter (see above).
180
181### Specification / Implementation
182
183Please find the reference implementation [here](../runtime/interpreter/frame.h).
184
185## Quality Assurance
186
187### Requirements
188
1891. Interpreter should allow testing without involving front-ends of concrete languages.
1901. Interpreter should allow early testing.
1911. Interpreter should allow early benchmarking.
192
193### Key Design Decisions
194
1951. A light-weight Panda Assembly language is developed, along with the Panda Assembler tool.
1961. A compliance test suite for the Panda Assembly language is created. The core part of the suite
197   are small chunks of hand-written Panda Assembly covering corner cases, while the majority of
198   cases are covered by automatically generated chunks.
1991. A set of benchmarks is ported to Panda Assembly and maintaned as a part of the source tree.
200
201### Specification / Implementation
202
203Please find the specification [here](assembly_format.md).
204
205Please find the implementation [here](../assembler).
206