Dalvik "mterp" README

NOTE: Find rebuilding instructions at the bottom of this file.


==== Overview ====

This is the source code for the Dalvik interpreter. The core of the
original version was implemented as a single C function, but to improve
performance we rewrote it in assembly. To make this and future assembly
ports easier and less error-prone, we used a modular approach that allows
development of platform-specific code one opcode at a time.

The original all-in-one-function C version still exists as the "portable"
interpreter, and is generated using the same sources and tools that
generate the platform-specific versions.

Every configuration has a "config-*" file that controls how the sources
are generated. The sources are written into the "out" directory, where
they are picked up by the Android build system.

The best way to become familiar with the interpreter is to look at the
generated files in the "out" directory, such as out/InterpC-portstd.c,
rather than trying to look at the various component pieces in (say)
armv5te.


==== Platform-specific source generation ====

The architecture-specific config files determine what goes into two
generated output files (InterpC-<arch>.c, InterpAsm-<arch>.S). The goal is
to make it easy to swap C and assembly sources during initial development
and testing, and to provide a way to use architecture-specific versions of
some operations (e.g. making use of PLD instructions on ARMv6 or avoiding
CLZ on ARMv4T).

Depending on architecture, instruction-to-instruction transitions may
be done as either computed goto or jump table. In the computed goto
variant, each instruction handler is allocated a fixed-size area (e.g. 64
bytes). "Overflow" code is tacked on to the end. In the jump table variant,
all of the instruction handlers are contiguous and may be of any size.
The interpreter style is selected via the "handler-style" command (see below).
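
As a rough illustration of the two dispatch styles, the C sketch below
computes a handler entry point both ways. It is not taken from the mterp
sources: the function names, the two toy handlers, and the 64-byte slot
size are assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed slot size for illustration; a real config sets this value. */
enum { EXAMPLE_HANDLER_SIZE_BYTES = 64 };

/* Computed-goto style: every handler occupies a fixed-size slot, so the
 * entry point is found by arithmetic alone, with no memory lookup. */
uintptr_t example_computed_goto_target(uintptr_t table_start, unsigned opcode)
{
    return table_start + (uintptr_t)opcode * EXAMPLE_HANDLER_SIZE_BYTES;
}

/* Jump-table style: handlers may be any size, so an array of pointers
 * maps each opcode to its handler. */
typedef int (*example_handler_fn)(void);

/* Two toy handlers (hypothetical) that just return their opcode number. */
int example_handle_op0(void) { return 0; }
int example_handle_op1(void) { return 1; }

example_handler_fn example_jump_table_target(const example_handler_fn *table,
                                             size_t table_len,
                                             unsigned opcode)
{
    assert(table != NULL);
    assert(opcode < table_len);   /* guard against reading past the table */
    return table[opcode];
}
```

With a table starting at 0x1000, opcode 2 would land at 0x1000 + 2 * 64 =
0x1080 in the computed-goto variant, while the jump-table variant costs one
extra load but places no limit on handler size.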

When a C implementation for an instruction is desired, the assembly
version packs all local state into the Thread structure and passes
that to the C function. Updates to the state are pulled out of
"Thread" on return.

The "arch" value should indicate an architecture family with common
programming characteristics, so "armv5te" would work for all ARMv5TE CPUs,
but might not be backward- or forward-compatible. (We *might* want to
specify the ABI model as well, e.g. "armv5te-eabi", but currently that adds
verbosity without value.)


==== Config file format ====

The config files are parsed from top to bottom. Each line in the file
may be blank, hold a comment (line starts with '#'), or be a command.

The commands are:

  handler-style <computed-goto|jump-table|all-c>

    Specify which style of interpreter to generate. In computed-goto,
    each handler is allocated a fixed region, allowing transitions to
    be done via table-start-address + (opcode * handler-size). With
    jump-table style, handlers may be of any length, and the generated
    table is an array of pointers to the handlers. The "all-c" style is
    for the portable interpreter (which is implemented completely in C).
    [Note: all-c is distinct from an "allstubs" configuration. In both
    configurations, all handlers are the C versions, but the allstubs
    configuration uses the assembly outer loop and assembly stubs to
    transition to the handlers]. This command is required, and must be
    the first command in the config file.

  handler-size <bytes>

    Specify the size of the fixed region, in bytes. On most platforms
    this will need to be a power of 2. For jump-table and all-c
    implementations, this command is ignored.

  import <filename>

    The specified file is included immediately, in its entirety. No
    substitutions are performed. ".cpp" and ".h" files are copied to the
    C output, ".S" files are copied to the asm output.

  asm-stub <filename>

    The named file will be included whenever an assembly "stub" is needed
    to transfer control to a handler written in C. Text substitution is
    performed on the opcode name. This command is not applicable to
    "all-c" configurations.

  asm-alt-stub <filename>

    When present, this command will cause the generation of an alternate
    set of entry points (for computed-goto interpreters) or an alternate
    jump table (for jump-table interpreters).

  op-start <directory>

    Indicates the start of the opcode list. Must precede any "op"
    commands. The specified directory is the default location to pull
    instruction files from.

  op <opcode> <directory>

    Can only appear after "op-start" and before "op-end". Overrides the
    default source file location of the specified opcode. The opcode
    definition will come from the specified file, e.g. "op OP_NOP armv5te"
    will load from "armv5te/OP_NOP.S". A substitution dictionary will be
    applied (see below).

  alt <opcode> <directory>

    Can only appear after "op-start" and before "op-end". Similar to the
    "op" command above, but denotes a source file to override the entry
    in the alternate handler table. The opcode definition will come from
    the specified file, e.g. "alt OP_NOP armv5te" will load from
    "armv5te/ALT_OP_NOP.S". A substitution dictionary will be applied
    (see below).

  op-end

    Indicates the end of the opcode list. All kNumPackedOpcodes
    opcodes are emitted when this is seen, followed by any code that
    didn't fit inside the fixed-size instruction handler space.

The order of "op" and "alt" directives is not significant; the generation
tool will extract ordering info from the VM sources.

Typically the form in which most opcodes currently exist is used in
the "op-start" directive.
For a new port you would start with "c",
and add architecture-specific "op" entries as you write instructions.
When complete it will default to the target architecture, and you insert
"c" ops to stub out platform-specific code.

For the <directory> specified in the "op" command, the "c" directory
is special in two ways: (1) the sources are assumed to be C code, and
will be inserted into the generated C file; (2) when a C implementation
is emitted, a "glue stub" is emitted in the assembly source file.
(The generator script always emits kNumPackedOpcodes assembly
instructions, unless "asm-stub" was left blank, in which case it only
emits some labels.)


==== Instruction file format ====

The assembly instruction files are simply fragments of assembly sources.
The starting label will be provided by the generation tool, as will
declarations for the segment type and alignment. The expected target
assembler is GNU "as", but others will work (may require fiddling with
some of the pseudo-ops emitted by the generation tool).

The C files do a bunch of fancy things with macros in an attempt to share
code with the portable interpreter. (This is expected to be reduced in
the future.)

A substitution dictionary is applied to all opcode fragments as they are
appended to the output. Substitutions can look like "$value" or "${value}".

The dictionary always includes:

  $opcode - opcode name, e.g. "OP_NOP"
  $opnum - opcode number, e.g. 0 for OP_NOP
  $handler_size_bytes - max size of an instruction handler, in bytes
  $handler_size_bits - max size of an instruction handler, log 2

Both C and assembly sources will be passed through the C pre-processor,
so you can take advantage of C-style comments and preprocessor directives
like "#define".

Some generator operations are available.

  %include "filename" [subst-dict]

    Includes the file, which should look like "armv5te/OP_NOP.S". You can
    specify values for the substitution dictionary, using standard Python
    syntax. For example, this:
        %include "armv5te/unop.S" {"result":"r1"}
    would insert "armv5te/unop.S" at the current file position, replacing
    occurrences of "$result" with "r1".

  %default <subst-dict>

    Specify default substitution dictionary values, using standard Python
    syntax. Useful if you want to have a "base" version and variants.

  %break

    Identifies the split between the main portion of the instruction
    handler (which must fit in "handler-size" bytes) and the "sister"
    code, which is appended to the end of the instruction handler block.
    In jump table implementations, %break is ignored.

  %verify "message"

    Leave a note to yourself about what needs to be tested. (This may
    turn into something more interesting someday; for now, it just gets
    stripped out before the output is generated.)

The generation tool does *not* print a warning if your instructions
exceed "handler-size", but the VM will abort on startup if it detects an
oversized handler. On architectures with fixed-width instructions this
is easy to work with; on others you will need to count bytes.


==== Using C constants from assembly sources ====

The file "common/asm-constants.h" has some definitions for constant
values, structure sizes, and struct member offsets. The format is fairly
restricted, as simple macros are used to massage it for use with both C
(where it is verified) and assembly (where the definitions are used).

If a constant in the file becomes out of sync, the VM will log an error
message and abort during startup.
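
The verification idea can be sketched in C as follows. This is only an
illustration of the technique, not the contents of asm-constants.h: the
structure, the OFF_EXAMPLE_* names, and the CHECK_OFFSET macro are all
hypothetical. Fixed-width fields are used so the offsets are the same on
any target.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical structure standing in for a real VM struct. */
struct ExampleState {
    uint32_t pc;
    uint32_t fp;
    uint32_t subMode;
};

/* Hand-maintained constants, as the assembly sources would see them
 * (hypothetical names). */
#define OFF_EXAMPLE_PC       0
#define OFF_EXAMPLE_FP       4
#define OFF_EXAMPLE_SUBMODE  8

/* Compare each constant against what the C compiler actually laid out.
 * Returns the number of mismatches; 0 means everything is in sync. */
int example_check_constants(void)
{
    int failures = 0;

#define CHECK_OFFSET(name, type, field)                              \
    do {                                                             \
        if ((size_t)(name) != offsetof(type, field)) {               \
            fprintf(stderr, "%s is %d, expected %zu\n",              \
                    #name, (int)(name), offsetof(type, field));      \
            failures++;                                              \
        }                                                            \
    } while (0)

    CHECK_OFFSET(OFF_EXAMPLE_PC, struct ExampleState, pc);
    CHECK_OFFSET(OFF_EXAMPLE_FP, struct ExampleState, fp);
    CHECK_OFFSET(OFF_EXAMPLE_SUBMODE, struct ExampleState, subMode);

#undef CHECK_OFFSET
    return failures;
}
```

A startup check in this style would log each mismatch and abort when the
function returns a non-zero count.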


==== Development tips ====

If you need to debug the initial piece of an opcode handler, and your
debug code expands it beyond the handler size limit, you can insert a
generic header at the top:

    b       ${opcode}_start
%break
${opcode}_start:

If you already have a %break, it's okay to leave it in place -- the second
%break is ignored.


==== Rebuilding ====

If you change any of the source file fragments, you need to rebuild the
combined source files in the "out" directory. Make sure the files in
"out" are editable, then:

    $ cd mterp
    $ ./rebuild.sh

As of this writing, this requires Python 2.5. You may see inscrutable
error messages or just general failure if you have a different version
of Python installed.

The ultimate goal is to have the build system generate the necessary
output files without requiring this separate step, but we're not yet
ready to require Python in the build.


==== Interpreter Control ====

The central mechanism for interpreter control is the InterpBreak structure
that is found in each thread's Thread struct (see vm/Thread.h). There
is one mandatory field, and two optional fields:

    subMode - required, describes debug/profile/special operation
    breakFlags & curHandlerTable - optional, used to lower subMode polling costs

The subMode field is a bitmask which records all currently active
special modes of operation. For example, when Traceview profiling
is active, kSubModeMethodTrace is set. This bit informs the interpreter
that it must notify the profiling subsystem on each method entry and
return. There are similar bits for an active debugging session,
instruction count profiling, pending thread suspension request, etc.
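
A minimal C sketch of the bitmask idea follows. Only the name
kSubModeMethodTrace comes from the text above; the other bit names, all
bit values, the field width, and the helper functions are assumptions for
illustration (see vm/Thread.h for the real definitions).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Bit values are illustrative only. */
enum {
    kSubModeMethodTrace      = 1u << 0,  /* named in the README text */
    kExampleSubModeDebugger  = 1u << 1,  /* hypothetical name */
    kExampleSubModeInstCount = 1u << 2   /* hypothetical name */
};

/* Hypothetical stand-in for the subMode part of InterpBreak. */
typedef struct {
    uint16_t subMode;
} ExampleInterpBreak;

/* Turn on one or more special modes. */
void example_enable_sub_mode(ExampleInterpBreak *ib, unsigned bits)
{
    assert(ib != NULL);
    ib->subMode |= (uint16_t)bits;
}

/* Turn off one or more special modes, leaving the others untouched. */
void example_disable_sub_mode(ExampleInterpBreak *ib, unsigned bits)
{
    assert(ib != NULL);
    ib->subMode &= (uint16_t)~bits;
}

/* True when method entry/return must be reported to the profiler. */
bool example_must_report_methods(const ExampleInterpBreak *ib)
{
    assert(ib != NULL);
    return (ib->subMode & kSubModeMethodTrace) != 0;
}
```

Because each mode owns one bit, several modes can be active at once and
"subMode == 0" is a cheap test for normal operation.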

To support special subMode operation the simplest mechanism for the
interpreter is to poll the subMode field before interpreting each Dalvik
bytecode and take any required action. In fact, this is precisely
what the portable interpreter does. The "FINISH" macro expands to
include a test of subMode and a subsequent call to "dvmCheckBefore()".

Per-instruction polling, however, is expensive and subMode operation is
relatively rare. For normal operation we'd like to avoid having to perform
any checks unless a special subMode is actually in effect. This is
where curHandlerTable and breakFlags come into play.

The mterp fast interpreter achieves much of its performance advantage
over the portable interpreter through its efficient mechanism of
transitioning from one Dalvik bytecode to the next. Mterp for ARM targets
uses a computed-goto mechanism, in which the handler entrypoints are
located at the base of the handler table + (opcode * 64). Mterp for x86
targets instead uses a jump table of handler entry points indexed
by the Dalvik opcode. To support efficient handling of special subModes,
mterp supports two sets of handler entries (for ARM) or two jump
tables (for x86). One handler set is optimized for speed and performs no
inter-instruction checks (mainHandlerTable in the Thread structure), while
the other includes a test of the subMode field (altHandlerTable).

In normal operation (i.e. subMode == 0), the dedicated register rIBASE
(r8 for ARM, edx for x86) holds mainHandlerTable. If we need to switch
to a subMode that requires inter-instruction checking, rIBASE is changed
to altHandlerTable. Note that this change is not immediate. What is actually
changed is the value of curHandlerTable, which is part of the InterpBreak
structure. Rather than explicitly check for changes, each thread will
blindly refresh rIBASE at backward branches, exception throws and returns.
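
The deferred switch can be modeled in C roughly as below. The field names
mainHandlerTable, altHandlerTable and curHandlerTable come from the text;
the struct layout and both functions are hypothetical, and the rIBASE
register is modeled as an ordinary value returned to the caller.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the handler-table fields of a thread. */
typedef struct {
    const void *mainHandlerTable;  /* fast handlers, no checks */
    const void *altHandlerTable;   /* handlers that test subMode */
    const void *curHandlerTable;   /* what rIBASE should become next */
} ExampleThreadTables;

/* Request a switch. Only curHandlerTable changes here; the value the
 * interpreter is currently dispatching through is left alone. */
void example_request_table(ExampleThreadTables *t, bool use_alt)
{
    assert(t != NULL);
    t->curHandlerTable = use_alt ? t->altHandlerTable : t->mainHandlerTable;
}

/* Models what happens at a backward branch, exception throw, or return:
 * the interpreter reloads rIBASE without checking whether it changed. */
const void *example_refresh_ibase(const ExampleThreadTables *t)
{
    assert(t != NULL);
    return t->curHandlerTable;
}
```

Between the request and the next refresh point the interpreter keeps
dispatching through the old table, which is why the switch is not
immediate.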

The breakFlags field tells the interpreter control mechanism whether
curHandlerTable should hold the real or alternate handler base. If
non-zero, we use altHandlerTable. The bits within breakFlags
tell dvmCheckBefore which subModes need to be checked.

See dvmCheckBefore() for subMode handling, and dvmEnableSubMode(),
dvmDisableSubMode() for switching on and off.
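
The breakFlags rule can be sketched in C as follows. This is an
illustration only, not the code in dvmEnableSubMode() or
dvmDisableSubMode(): the struct, the field width, and the function name
are assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the relevant InterpBreak/Thread fields. */
typedef struct {
    uint8_t     breakFlags;
    const void *mainHandlerTable;
    const void *altHandlerTable;
    const void *curHandlerTable;
} ExampleBreakState;

/* Store new break flags and re-derive curHandlerTable from them:
 * any non-zero flag selects the alternate (checking) handlers. */
void example_set_break_flags(ExampleBreakState *s, uint8_t flags)
{
    assert(s != NULL);
    s->breakFlags = flags;
    s->curHandlerTable = (flags != 0) ? s->altHandlerTable
                                      : s->mainHandlerTable;
}
```

Clearing the last flag returns curHandlerTable to the fast handlers, so
the per-instruction check cost is paid only while some flag is set.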