<html>
<head>
    <title>Dalvik Porting Guide</title>
</head>

<body>
<h1>Dalvik Porting Guide</h1>

<p>
The Dalvik virtual machine is intended to run on a variety of platforms.
The baseline system is expected to be a variant of UNIX (Linux, BSD, Mac
OS X) running the GNU C compiler.  Little-endian CPUs have been exercised
the most heavily, but big-endian systems are explicitly supported.
</p><p>
There are two general categories of work: porting to a Linux system
with a previously unseen CPU architecture, and porting to a different
operating system.  This document covers the former.
</p><p>
Basic familiarity with the Android platform, source code structure, and
build system is assumed.
</p>


<h2>Core Libraries</h2>

<p>
The native code in the core libraries (chiefly <code>libcore</code>,
but also <code>dalvik/vm/native</code>) is written in C/C++ and is expected
to work without modification in a Linux environment.
</p><p>
The core libraries pull in code from many other projects, including
OpenSSL, zlib, and ICU.  These will also need to be ported before the VM
can be used.
</p>


<h2>JNI Call Bridge</h2>

<p>
Most of the Dalvik VM runtime is written in portable C.  The one
non-portable component of the runtime is the JNI call bridge.  Simply put,
this converts an array of integers into function arguments of various
types, and calls a function.  This must be done according to the C calling
conventions for the platform.  The task could be as simple as pushing all
of the arguments onto the stack, or involve complex rules for register
assignment and stack alignment.
</p><p>
To ease porting to new platforms, the <a href="http://sourceware.org/libffi/">
open-source FFI library</a> (Foreign Function Interface) is used when a
custom bridge is unavailable.
FFI is not as fast as a native implementation,
and the optional performance improvements it does offer are not used, so
writing a replacement is a good first step.
</p><p>
The code lives in <code>dalvik/vm/arch/*</code>, with the FFI-based version
in the "generic" directory.  There are two source files for each architecture.
One defines the call bridge itself:
</p><p><blockquote>
<code>void dvmPlatformInvoke(void* pEnv, ClassObject* clazz, int argInfo,
int argc, const u4* argv, const char* signature, void* func,
JValue* pReturn)</code>
</blockquote></p><p>
This will invoke a C/C++ function declared:
</p><p><blockquote>
    <code>return_type func(JNIEnv* pEnv, Object* this [, <i>args</i>])<br></code>
</blockquote>or (for a "static" method):<blockquote>
    <code>return_type func(JNIEnv* pEnv, ClassObject* clazz [, <i>args</i>])</code>
</blockquote></p><p>
The role of <code>dvmPlatformInvoke</code> is to lay out the values in
<code>argv</code> according to the C calling conventions, call the method, and
then place the return value into <code>pReturn</code> (a union that holds
all of the basic JNI types).  The code may use the method signature
(a DEX "shorty" signature, with one character for the return type and one
per argument) to determine how to handle the values.
</p><p>
The other source file involved here defines a 32-bit "hint".  The hint
is computed when the method's class is loaded, and passed in as the
"argInfo" argument.  The hint can be used to avoid scanning the ASCII
method signature for things like the return value, total argument size,
or inter-argument 64-bit alignment restrictions.
</p>


<h2>Interpreter</h2>

<p>
The Dalvik runtime includes two interpreters, labeled "portable" and "fast".
The portable interpreter is largely contained within a single C function,
and should compile on any system that supports gcc.
(If you don't have gcc,
you may need to disable the "threaded" execution model, which relies on
gcc's "goto table" implementation; look for the THREADED_INTERP define.)
</p><p>
The fast interpreter uses hand-coded assembly fragments.  If none are
available for the current architecture, the build system will create an
interpreter out of C "stubs".  The resulting "all stubs" interpreter is
quite a bit slower than the portable interpreter, making "fast" something
of a misnomer.
</p><p>
The fast interpreter is enabled by default.  On platforms without native
support, you may want to switch to the portable interpreter.  This can
be controlled with the <code>dalvik.vm.execution-mode</code> system
property.  For example, if you run:
</p><p><blockquote>
<code>adb shell "echo dalvik.vm.execution-mode = int:portable &gt;&gt; /data/local.prop"</code>
</blockquote></p><p>
and reboot, the Android app framework will start the VM with the portable
interpreter enabled.
</p>


<h3>Mterp Interpreter Structure</h3>

<p>
There may be significant performance advantages to rewriting the
interpreter core in assembly language, using architecture-specific
optimizations.  In Dalvik this can be done one instruction at a time.
</p><p>
The simplest way to implement an interpreter is to have a large "switch"
statement.  After each instruction is handled, the interpreter returns to
the top of the loop, fetches the next instruction, and jumps to the
appropriate label.
</p><p>
An improvement on this is called "threaded" execution.  The instruction
fetch and dispatch are included at the end of every instruction handler.
This makes the interpreter a little larger overall, but you get to avoid
the (potentially expensive) branch back to the top of the switch statement.
</p><p>
Dalvik mterp goes one step further, using a computed goto instead of a goto
table.
Instead of looking up the address in a table, which requires an
extra memory fetch on every instruction, mterp multiplies the opcode number
by a fixed value.  By default, each handler is allowed 64 bytes of space.
</p><p>
Not all handlers fit in 64 bytes.  Those that don't can have subroutines
or simply continue on to additional code outside the basic space.  Some of
this is handled automatically by Dalvik, but there's no portable way to detect
overflow of a 64-byte handler until the VM starts executing.
</p><p>
The choice of 64 bytes is somewhat arbitrary, but has worked out well for
ARM and x86.
</p><p>
In the course of development it's useful to have C and assembly
implementations of each handler, and be able to flip back and forth
between them when hunting problems down.  In mterp this is relatively
straightforward.  You can always see the files being fed to the compiler
and assembler for your platform by looking in the
<code>dalvik/vm/mterp/out</code> directory.
</p><p>
The interpreter sources live in <code>dalvik/vm/mterp</code>.  If you
haven't yet, you should read <code>dalvik/vm/mterp/README.txt</code> now.
</p>


<h3>Getting Started With Mterp</h3>

<p>
To get started:
<ol>
<li>Decide on the name of your architecture.  For the sake of discussion,
let's call it <code>myarch</code>.
<li>Copy <code>dalvik/vm/mterp/config-allstubs</code> to
<code>dalvik/vm/mterp/config-myarch</code>.
<li>Create a <code>dalvik/vm/mterp/myarch</code> directory to hold your
source files.
<li>Add <code>myarch</code> to the list in
<code>dalvik/vm/mterp/rebuild.sh</code>.
<li>Make sure <code>dalvik/vm/Android.mk</code> will find the files for
your architecture.  If <code>$(TARGET_ARCH)</code> is configured this
will happen automatically.
<li>Disable the Dalvik JIT.
You can do this in the general device
configuration, or by editing the initialization of WITH_JIT in
<code>dalvik/vm/Dvm.mk</code> to always be <code>false</code>.
</ol>
</p><p>
You now have the basic framework in place.  Whenever you make a change, you
need to perform two steps: regenerate the mterp output, and build the
core VM library.  (It's two steps because we didn't want the build system
to require Python 2.5, which, incidentally, you need to have.)
<ol>
<li>In the <code>dalvik/vm/mterp</code> directory, regenerate the contents
of the files in <code>dalvik/vm/mterp/out</code> by executing
<code>./rebuild.sh</code>.  Note there are two files, one in C and one
in assembly.
<li>In the <code>dalvik</code> directory, regenerate the
<code>libdvm.so</code> library with <code>mm</code>.  You can also use
<code>mmm dalvik/vm</code> from the top of the tree.
</ol>
</p><p>
This will leave you with an updated libdvm.so, which can be pushed out to
a device with <code>adb sync</code> or <code>adb push</code>.  If you're
using the emulator, you also need to run <code>make snod</code> (System image,
NO Dependency check) to rebuild the system image file.  You should not
need to do a top-level "make" and rebuild the dependent binaries.
</p><p>
At this point you have an "all stubs" interpreter.  You can see how it
works by examining <code>dalvik/vm/mterp/cstubs/entry.c</code>.  The
code runs in a loop, pulling out the next opcode, and invoking the
handler through a function pointer.  Each handler takes a "glue" argument
that contains all of the useful state.
</p><p>
Your goal is to replace the entry method, exit method, and each individual
instruction with custom implementations.  The first thing you need to do
is create an entry function that calls the handler for the first instruction.
After that, the instructions chain together, so you don't need a loop.
(Look at the ARM or x86 implementation to see how they work.)
</p><p>
Once you have that, you need something to jump to.  You can't branch
directly to the C stub because it's expecting to be called with a "glue"
argument and then return.  We need a C stub "wrapper" that does the
setup and jumps directly to the next handler.  We write this in assembly
and then add it to the config file definition.
</p><p>
To see how this works, create a file called
<code>dalvik/vm/mterp/myarch/stub.S</code> that contains one line:
<pre>
/* stub for ${opcode} */
</pre>
Then, in <code>dalvik/vm/mterp/config-myarch</code>, add this below the
<code>handler-size</code> directive:
<pre>
# source for the instruction table stub
asm-stub myarch/stub.S
</pre>
</p><p>
Regenerate the sources with <code>./rebuild.sh</code>, and take a look
inside <code>dalvik/vm/mterp/out/InterpAsm-myarch.S</code>.  You should
see 256 copies of the stub function in a single large block after the
<code>dvmAsmInstructionStart</code> label.  The <code>stub.S</code>
code will be used anywhere you don't provide an assembly implementation.
</p><p>
Note that each block begins with a <code>.balign 64</code> directive.
This is what pads each handler out to 64 bytes.  Note also that the
<code>${opcode}</code> text was replaced with an opcode name, which should
be used to call the C implementation (<code>dvmMterp_${opcode}</code>).
</p><p>
The actual contents of <code>stub.S</code> are up to you to define.
See <code>entry.S</code> and <code>stub.S</code> in the <code>armv5te</code>
or <code>x86</code> directories for working examples.
</p><p>
If you're working on a variation of an existing architecture, you may be
able to use most of the existing code and just provide replacements for
a few instructions.  Look at the <code>dalvik/vm/mterp/config-*</code> files
for examples.
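</p><p>
As a rough illustration of the fixed-size handler layout described above,
the C sketch below computes where a handler would sit relative to
<code>dvmAsmInstructionStart</code>.  The names <code>HANDLER_SIZE</code>
and <code>handler_offset</code> are hypothetical and exist only for this
example; the real dispatch is written in assembly for each architecture.
<pre>
/* Illustrative sketch only; these names are not part of the Dalvik sources. */
enum { HANDLER_SIZE = 64 };  /* assumed to match the 64-byte default */

/* Byte offset of an opcode's handler from the start of the handler block. */
unsigned long handler_offset(unsigned int opcode)
{
    return (unsigned long) opcode * HANDLER_SIZE;
}
</pre>
Under these assumptions, opcode 0 starts at offset 0 and opcode 1 at
offset 64, so a handler's address can be found with a multiply (or shift)
and an add, without a table lookup.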
</p>


<h3>Replacing Stubs</h3>

<p>
There are roughly 250 Dalvik opcodes, including some that are inserted by
<a href="dexopt.html">dexopt</a> and aren't described in the
<a href="dalvik-bytecode.html">Dalvik bytecode</a> documentation.  Each
one must perform the appropriate actions, fetch the next opcode, and
branch to the next handler.  The actions performed by the assembly version
must exactly match those performed by the C version (in
<code>dalvik/vm/mterp/c/OP_*</code>).
</p><p>
It is possible to customize the set of "optimized" instructions for your
platform.  This works because optimized DEX files are not expected
to work on multiple devices.  Adding, removing, or redefining instructions
is beyond the scope of this document, and for simplicity it's best to stick
with the basic set defined by the portable interpreter.
</p><p>
Once you have written a handler that looks like it should work, add
it to the config file.  For example, suppose we have a working version
of <code>OP_NOP</code>.  For demonstration purposes, fake it for now by
putting this into <code>dalvik/vm/mterp/myarch/OP_NOP.S</code>:
<pre>
/* This is my NOP handler */
</pre>
</p><p>
Then, in the <code>op-start</code> section of <code>config-myarch</code>, add:
<pre>
    op OP_NOP myarch
</pre>
</p><p>
This tells the generation script to use the assembly version from the
<code>myarch</code> directory instead of the C version from the <code>c</code>
directory.
</p><p>
Execute <code>./rebuild.sh</code>.  Look at <code>InterpAsm-myarch.S</code>
and <code>InterpC-myarch.c</code> in the <code>out</code> directory.  You
will see that the <code>OP_NOP</code> stub wrapper has been replaced with our
new code in the assembly file, and the C stub implementation is no longer
included.
</p><p>
As you implement instructions, the C version and corresponding stub wrapper
will disappear from the output files.  Eventually you will have a 100%
assembly interpreter.  You may find it saves a little time to examine
the output of your compiler for some of the operations.  The
<a href="porting-proto.c.txt">porting-proto.c</a> sample code can be
helpful here.
</p>


<h3>Interpreter Switching</h3>

<p>
The Dalvik VM actually includes a third interpreter implementation: the debug
interpreter.  This is a variation of the portable interpreter that includes
support for debugging and profiling.
</p><p>
When a debugger attaches, or a profiling feature is enabled, the VM
will switch interpreters at a convenient point.  This is done at the
same time as the GC safe point check: on a backward branch, a method
return, or an exception throw.  Similarly, when the debugger detaches
or profiling is discontinued, execution transfers back to the "fast" or
"portable" interpreter.
</p><p>
Your entry function needs to test the "entryPoint" value in the "glue"
pointer to determine where execution should begin.  Your exit function
will need to return a boolean that indicates whether the interpreter is
exiting (because we reached the "bottom" of a thread stack) or wants to
switch to the other implementation.
</p><p>
See the <code>entry.S</code> file in <code>x86</code> or <code>armv5te</code>
for examples.
</p>


<h3>Testing</h3>

<p>
A number of VM tests can be found in <code>dalvik/tests</code>.  The most
useful during interpreter development is <code>003-omnibus-opcodes</code>,
which tests many different instructions.
</p><p>
The basic invocation is:
<pre>
$ cd dalvik/tests
$ ./run-test 003
</pre>
</p><p>
This will run test 003 on an attached device or emulator.
You can run
the test against your desktop VM by specifying <code>--reference</code>
if you suspect the test may be faulty.  You can also use
<code>--portable</code> and <code>--fast</code> to explicitly specify
one Dalvik interpreter or the other.
</p><p>
Some instructions are replaced by <code>dexopt</code>, notably when
"quickening" field accesses and method invocations.  To ensure
that you are testing the basic form of the instruction, add the
<code>--no-optimize</code> option.
</p><p>
There is no built-in instruction tracing mechanism.  If you want
to know for sure that your implementation of an opcode handler
is being used, the easiest approach is to insert a "printf"
call.  For an example, look at <code>common_squeak</code> in
<code>dalvik/vm/mterp/armv5te/footer.S</code>.
</p><p>
At some point you need to ensure that debuggers and profiling work with
your interpreter.  The easiest way to do this is to simply connect a
debugger or toggle profiling.  (A future test suite may include some
tests for this.)
</p>


<h2>Other Performance Issues</h2>

<p>
The <code>System.arraycopy()</code> function is heavily used.  The
implementation relies on the bionic C library to provide a fast,
platform-optimized data copy function for arrays with elements wider
than one byte.  If you're not using bionic, or your platform does not
have an implementation of this function, Dalvik will use correct but
sub-optimal algorithms instead.  For best performance you will want
to provide your own version.
</p><p>
See the comments in <code>dalvik/vm/native/java_lang_System.c</code>
for details.
</p>

<address>Copyright © 2009 The Android Open Source Project</address>

</body>
</html>