rationale-for-bytecode.md - OpenGrok cross reference for /arkcompiler/runtime

Lines Matching full:the
16 directly on the CPU) or _memory_ (some locations in the computer's RAM). An important subset of memory
17 operands are _stack operands_ that reside in a special data structure called the _stack_. The program
18 must maintain the stack in the correct state during runtime because exactly this data structure
23 that the number and purpose of registers differ, too. Some nuances of working with the stack may also
26 Here comes the bytecode. Simply put, it is an attempt to build an abstract CPU on top of real
27 ones. A program written for such an abstract CPU can be run on any real hardware with the help of a
28 special program called _interpreter_. The goal of the interpreter is to read our unified _virtual_
30 making interpretation slower than _native code execution_. In return, we get the ability to
32 (debuggers, profilers, etc.) is also unified, as well as the ecosystem for managing libraries,
35 Although bytecode represents some abstraction, it mirrors all the mentioned concepts from the
36 hardware world: the terms "operations", "operands", "registers" and "stack" have the same meaning.
37 In case there is a chance for ambiguity, the terms "virtual registers" and "virtual stack" are used
38 to distinguish between an abstract system and the hardware.
47 In the _stack-based_ approach, operands are implicitly encoded in the operation, which results in
52     push_arg1 ; copy the first argument to the top of the stack
53     push_arg2 ; copy the second argument to the top of the stack
54     add       ; remove two top-most values from the stack, add them and put the result at the top
55     ; at this point, the top of the stack contains arg1 + arg2
60 In the _register-based_ approach, operands are explicitly encoded in the operation, which results in
76 At the same time, to execute a stack-based addition we need to run 3 instructions compared to
77 just a single register-based instruction. Since the interpreter has extra work to do to read
78 each bytecode instruction, execute it and move to the next one, running more instructions results in
79 more _dispatch overhead_, which means that the stack-based bytecode is slower by nature.
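
The dispatch-count argument above can be illustrated with a minimal Python sketch. It is not Panda's interpreter: the two toy interpreters, the tuple-based instruction layout, and the register form `add v0, v1, v2` are all assumptions made only for this example. The stack-based opcodes mirror the `push_arg1` / `push_arg2` / `add` sequence from the listing.

```python
def run_stack(program, args):
    """Interpret a toy stack-based program; return (result, dispatch_count)."""
    stack = []
    dispatches = 0
    for op in program:
        dispatches += 1  # one fetch/decode/dispatch step per instruction
        if op == "push_arg1":
            stack.append(args[0])
        elif op == "push_arg2":
            stack.append(args[1])
        elif op == "add":
            rhs = stack.pop()
            lhs = stack.pop()
            stack.append(lhs + rhs)
        else:
            raise ValueError(f"unknown opcode: {op}")
    return stack[-1], dispatches


def run_register(program, registers):
    """Interpret a toy register-based program; return (registers, dispatch_count)."""
    regs = dict(registers)  # copy so the caller's mapping is not mutated
    dispatches = 0
    for op, dst, src1, src2 in program:
        dispatches += 1  # one fetch/decode/dispatch step per instruction
        if op == "add":
            regs[dst] = regs[src1] + regs[src2]
        else:
            raise ValueError(f"unknown opcode: {op}")
    return regs, dispatches


# Stack-based: three instructions, hence three dispatches.
stack_result, stack_dispatches = run_stack(["push_arg1", "push_arg2", "add"], (2, 3))

# Register-based (hypothetical encoding): one instruction, one dispatch.
reg_state, reg_dispatches = run_register([("add", "v0", "v1", "v2")], {"v1": 2, "v2": 3})

print(stack_result, stack_dispatches)    # 5 3
print(reg_state["v0"], reg_dispatches)   # 5 1
```

Both programs compute the same sum, but the stack-based one pays the per-instruction dispatch cost three times instead of once.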
82 if substituted by a stack-based analogue. At the same time, performance becomes 10%-40% worse
83 (depending on the benchmark).
85 Since bytecode interpretation is a required program execution mode for Panda, performance of the
89 However, to address the issue of compactness, two main tweaks are used:
94 According to our research, these tweaks will allow us to reduce the size of uncompressed bytecode by
112 some "stack-based'ness" into an otherwise register-based instruction set in an attempt to make the
119   body forming a separate def-use chain, i.e. in the majority of loops.
120 * You don't need to pass the object reference in the accumulator in the object call. Usually objects live
123 * The same goes with object and array loads and stores.
125 To address the risk of producing inefficient bytecode with redundant moves to and from the
126 accumulator, a simple optimizer will be introduced as part of the toolchain.
128 Finally, using the accumulator allows getting rid of the instructions for writing the result to the reg…
134 the virtual stack as follows:
149 the instruction as follows:
157 to the stack-based approach. Of course, if virtual registers have large numbers that do not fit
165 How to make sure that we benefit from the shorter encoding most of the time? An observation shows
179 overloads are calls (different number of operands) and calls are the most popular instructions in
187 One option is to make the operation _statically typed_, i.e. specify explicitly that it works only
189 numbers and store the result into the accumulator, we will need a dedicated `adda_d ...`, etc.
191 Another option is to make the operation _dynamically typed_, i.e. specify that `adda ...` handles
195 The first approach bloats the instruction set, but keeps the semantics of each instruction simple
196 and compact. The second approach keeps the instruction set small, but bloats the semantics of
199 It may seem that the dynamically typed approach is better for dynamically typed languages, but it
200 is true only if the platform is **not** supposed to support multiple languages.
201 Consider a simple example: what is the result of the expression `4 + "2"` in JavaScript and, say,
202 Python? In JavaScript, it evaluates to the string `"42"`, while Python forbids adding a string to
204 on the same platform with the same bytecode, we would have to handle both JavaScript-style addition
218 and `reg2` **must** hold only integer values throughout the function? Fortunately, the answer is
220 which do not distinguish between integers and pointers on many platforms). The key constraint is
222 must be of this very type, unless the virtual register is redefined. Language compilers and
223 bytecode verifiers are responsible for enforcing this invariant.
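
As a rough illustration of the invariant described above (a virtual register keeps the type of its latest definition until it is redefined), here is a minimal Python sketch of a verifier pass. It makes several assumptions that do not come from the source: the opcode names `def_int`, `def_ref` and `add_int`, the `(opcode, destination, sources)` tuple layout, and the restriction to straight-line code with no branches. A real verifier would also have to handle control flow.

```python
# Hypothetical instruction table: opcode -> (required source types, result type).
OPS = {
    "def_int": ((), "int"),              # define a register as an integer
    "def_ref": ((), "ref"),              # define a register as an object reference
    "add_int": (("int", "int"), "int"),  # statically typed integer addition
}


def verify(program):
    """Check straight-line code; return a list of error strings (empty if valid)."""
    reg_types = {}  # register name -> type given by its most recent definition
    errors = []
    for index, (op, dst, srcs) in enumerate(program):
        required, result_type = OPS[op]
        for reg, expected in zip(srcs, required):
            actual = reg_types.get(reg)
            if actual != expected:
                errors.append(
                    f"instruction {index} ({op}): {reg} holds {actual}, expected {expected}"
                )
        reg_types[dst] = result_type  # a definition may change the register's type
    return errors


# v1 is redefined as a reference only after its last use as an integer: accepted.
ok_program = [
    ("def_int", "v1", ()),
    ("def_int", "v2", ()),
    ("add_int", "v0", ("v1", "v2")),
    ("def_ref", "v1", ()),
]

# v2 holds a reference when add_int expects an integer: rejected.
bad_program = [
    ("def_int", "v1", ()),
    ("def_ref", "v2", ()),
    ("add_int", "v0", ("v1", "v2")),
]

ok_errors = verify(ok_program)
bad_errors = verify(bad_program)
print(ok_errors)   # []
print(bad_errors)  # one error, reported for instruction 2
```

The sketch shows why a statically typed `add_int` does not force a register to hold integers for the whole function: only the type at each use has to match.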