Dalvik Porting Guide

The Dalvik virtual machine is intended to run on a variety of platforms. The baseline system is expected to be a variant of UNIX (Linux, BSD, Mac OS X) running the GNU C compiler. Little-endian CPUs have been exercised the most heavily, but big-endian systems are explicitly supported.

There are two general categories of work: porting to a Linux system with a previously unseen CPU architecture, and porting to a different operating system. This document covers the former.

Basic familiarity with the Android platform, source code structure, and build system is assumed.

Core Libraries

The native code in the core libraries (chiefly libcore, but also dalvik/vm/native) is written in C/C++ and is expected to work without modification in a Linux environment.

The core libraries pull in code from many other projects, including OpenSSL, zlib, and ICU. These will also need to be ported before the VM can be used.

JNI Call Bridge

Most of the Dalvik VM runtime is written in portable C. The one non-portable component of the runtime is the JNI call bridge. Simply put, this converts an array of integers into function arguments of various types, and calls a function. This must be done according to the C calling conventions for the platform. The task could be as simple as pushing all of the arguments onto the stack, or involve complex rules for register assignment and stack alignment.

To ease porting to new platforms, the open-source libffi library (Foreign Function Interface) is used when a custom bridge is unavailable. libffi is not as fast as a native implementation, and the optional performance improvements it does offer are not used, so writing a replacement is a good first step.

The code lives in dalvik/vm/arch/*, with the FFI-based version in the "generic" directory. There are two source files for each architecture. One defines the call bridge itself:

void dvmPlatformInvoke(void* pEnv, ClassObject* clazz, int argInfo, int argc, const u4* argv, const char* signature, void* func, JValue* pReturn)

This will invoke a C/C++ function declared:

return_type func(JNIEnv* pEnv, Object* this [, args])

or (for a "static" method):

return_type func(JNIEnv* pEnv, ClassObject* clazz [, args])

The role of dvmPlatformInvoke is to arrange the values in argv according to the C calling conventions, call the method, and then place the return value into pReturn (a union that holds all of the basic JNI types). The code may use the method signature (a DEX "shorty" signature, with one character for the return type and one per argument) to determine how to handle the values.
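As a rough illustration of what the bridge has to do, the sketch below dispatches on a shorty signature for a few all-integer cases by casting the target to a matching function pointer type. It is not the Dalvik implementation: ToyValue, GenericFn, toyInvoke, sampleAdd, and the set of supported signatures are all hypothetical. A real bridge must cope with every argument type, 64-bit alignment, and the platform's register rules, which is why it is usually written in assembly.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins for the VM's u4 and JValue types. */
typedef uint32_t u4;
typedef union ToyValue { int32_t i; int64_t j; } ToyValue;

/* Generic function pointer; the real bridge receives the target as void*. */
typedef void (*GenericFn)(void);

/* Every target takes the environment and a "this" or class pointer first. */
typedef int32_t (*FnI)(void* env, void* self);
typedef int32_t (*FnII)(void* env, void* self, int32_t a);
typedef int32_t (*FnIII)(void* env, void* self, int32_t a, int32_t b);

/* Sample target with shorty "III": returns an int, takes two ints. */
int32_t sampleAdd(void* env, void* self, int32_t a, int32_t b)
{
    (void) env;
    (void) self;
    return a + b;
}

/* Returns 0 on success, or -1 if this toy bridge cannot handle the shorty. */
int toyInvoke(void* env, void* self, const u4* argv, const char* shorty,
              GenericFn func, ToyValue* pReturn)
{
    if (strcmp(shorty, "I") == 0) {
        pReturn->i = ((FnI) func)(env, self);
    } else if (strcmp(shorty, "II") == 0) {
        pReturn->i = ((FnII) func)(env, self, (int32_t) argv[0]);
    } else if (strcmp(shorty, "III") == 0) {
        pReturn->i = ((FnIII) func)(env, self, (int32_t) argv[0],
                                    (int32_t) argv[1]);
    } else {
        return -1;
    }
    return 0;
}
```

Calling toyInvoke with the shorty "III", sampleAdd as the target, and an argv holding two integers stores their sum in the result union. Enumerating signatures like this does not scale, which is the gap that libffi or a hand-written bridge fills.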

The other source file involved here defines a 32-bit "hint". The hint is computed when the method's class is loaded, and passed in as the "argInfo" argument. The hint can be used to avoid scanning the ASCII method signature for things like the return value, total argument size, or inter-argument 64-bit alignment restrictions.
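To make the idea of a hint concrete, here is a minimal sketch that packs a return-type code and the total argument size into one 32-bit value by scanning the shorty signature once. The bit layout, the RET_* codes, and the name computeToyHint are assumptions made up for this example; consult the sources under dalvik/vm/arch for the encoding an actual port uses.

```c
#include <stdint.h>

/* Hypothetical layout, for illustration only:
 *   bits 28..30  return type code
 *   bits  0..15  total argument size, in 32-bit words
 */
enum { RET_VOID = 0, RET_FLOAT = 1, RET_DOUBLE = 2, RET_S8 = 3, RET_S4 = 4 };

uint32_t computeToyHint(const char* shorty)
{
    uint32_t retCode;
    uint32_t words = 0;
    const char* p;

    /* The first shorty character describes the return type. */
    switch (shorty[0]) {
    case 'V': retCode = RET_VOID;   break;
    case 'F': retCode = RET_FLOAT;  break;
    case 'D': retCode = RET_DOUBLE; break;
    case 'J': retCode = RET_S8;     break;
    default:  retCode = RET_S4;     break;  /* int and other 32-bit types */
    }

    /* Remaining characters describe arguments; long and double need two words. */
    for (p = shorty + 1; *p != '\0'; p++)
        words += (*p == 'J' || *p == 'D') ? 2 : 1;

    return (retCode << 28) | (words & 0xffff);
}
```

Because the scan happens once at class load time, the bridge can read the return type and argument size with a shift and a mask on each call instead of walking the string again.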

Interpreter

The Dalvik runtime includes two interpreters, labeled "portable" and "fast". The portable interpreter is largely contained within a single C function, and should compile on any system that supports gcc. (If you don't have gcc, you may need to disable the "threaded" execution model, which relies on gcc's "goto table" implementation; look for the THREADED_INTERP define.)

The fast interpreter uses hand-coded assembly fragments. If none are available for the current architecture, the build system will create an interpreter out of C "stubs". The resulting "all stubs" interpreter is quite a bit slower than the portable interpreter, making "fast" something of a misnomer.

The fast interpreter is enabled by default. On platforms without native support, you may want to switch to the portable interpreter. This can be controlled with the dalvik.vm.execution-mode system property. For example, if you run:

adb shell "echo dalvik.vm.execution-mode = int:portable >> /data/local.prop"

and reboot, the Android app framework will start the VM with the portable interpreter enabled.

Mterp Interpreter Structure

There may be significant performance advantages to rewriting the interpreter core in assembly language, using architecture-specific optimizations. In Dalvik this can be done one instruction at a time.

The simplest way to implement an interpreter is to have a large "switch" statement. After each instruction is handled, the interpreter returns to the top of the loop, fetches the next instruction, and jumps to the appropriate label.
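A minimal sketch of the switch-based approach, using a made-up three-instruction accumulator machine. The TOY_* opcodes and runSwitch are illustrative only and have nothing to do with the real Dalvik instruction set.

```c
/* Hypothetical toy opcodes. */
enum { TOY_INC = 0, TOY_DOUBLE = 1, TOY_HALT = 2 };

/* Executes toy bytecode and returns the accumulator, or -1 on a bad opcode. */
int runSwitch(const unsigned char* code)
{
    int acc = 0;

    for (;;) {
        switch (*code++) {                  /* fetch next opcode, dispatch */
        case TOY_INC:    acc += 1; break;   /* break returns to the loop top */
        case TOY_DOUBLE: acc *= 2; break;
        case TOY_HALT:   return acc;
        default:         return -1;
        }
    }
}
```

Every handler ends by branching back to the single dispatch point at the top of the loop.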

An improvement on this is called "threaded" execution. The instruction fetch and dispatch are included at the end of every instruction handler. This makes the interpreter a little larger overall, but you get to avoid the (potentially expensive) branch back to the top of the switch statement.
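The same toy machine can be sketched in threaded form using the GNU C "labels as values" extension, so it needs gcc or a compatible compiler. The opcodes, the handlers table, and runThreaded are illustrative assumptions, and for brevity the code trusts the bytecode to contain only valid opcodes.

```c
/* Hypothetical toy opcodes. */
enum { TOY_INC = 0, TOY_DOUBLE = 1, TOY_HALT = 2 };

int runThreaded(const unsigned char* code)
{
    /* GNU C extension: &&label yields the address of a label. */
    static void* const handlers[] = { &&op_inc, &&op_double, &&op_halt };
    int acc = 0;

/* Fetch and dispatch, repeated at the end of every handler. */
#define FETCH_AND_DISPATCH() goto *handlers[*code++]

    FETCH_AND_DISPATCH();

op_inc:
    acc += 1;
    FETCH_AND_DISPATCH();
op_double:
    acc *= 2;
    FETCH_AND_DISPATCH();
op_halt:
    return acc;
}
#undef FETCH_AND_DISPATCH
```

There is no loop: each handler jumps straight to its successor, at the cost of one table lookup per instruction.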

Dalvik mterp goes one step further, using a computed goto instead of a goto table. Instead of looking up the address in a table, which requires an extra memory fetch on every instruction, mterp multiplies the opcode number by a fixed value. By default, each handler is allowed 64 bytes of space.

Not all handlers fit in 64 bytes. Those that don't can have subroutines or simply continue on to additional code outside the basic space. Some of this is handled automatically by Dalvik, but there's no portable way to detect overflow of a 64-byte handler until the VM starts executing.

The choice of 64 bytes is somewhat arbitrary, but has worked out well for ARM and x86.
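The address arithmetic behind this scheme is small enough to write out. The sketch below assumes the default 64-byte handler size; handlerAddress and HANDLER_SIZE_BYTES are names invented for the example.

```c
#include <stdint.h>

#define HANDLER_SIZE_BYTES 64   /* default handler size described above */

/* Address of the handler for `opcode`, given the address of handler 0. */
uintptr_t handlerAddress(uintptr_t base, unsigned opcode)
{
    return base + (uintptr_t) opcode * HANDLER_SIZE_BYTES;
}
```

Because 64 is a power of two, the multiply reduces to a left shift by 6, so dispatch needs no memory access beyond fetching the opcode itself.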

In the course of development it's useful to have C and assembly implementations of each handler, and be able to flip back and forth between them when hunting problems down. In mterp this is relatively straightforward. You can always see the files being fed to the compiler and assembler for your platform by looking in the dalvik/vm/mterp/out directory.

The interpreter sources live in dalvik/vm/mterp. If you haven't yet, you should read dalvik/vm/mterp/README.txt now.

Getting Started With Mterp

Getting started:

1. Decide on the name of your architecture. For the sake of discussion, let's call it myarch.
2. Copy dalvik/vm/mterp/config-allstubs to dalvik/vm/mterp/config-myarch.
3. Create a dalvik/vm/mterp/myarch directory to hold your source files.
4. Add myarch to the list in dalvik/vm/mterp/rebuild.sh.
5. Make sure dalvik/vm/Android.mk will find the files for your architecture. If $(TARGET_ARCH) is configured this will happen automatically.
6. Disable the Dalvik JIT. You can do this in the general device configuration, or by editing the initialization of WITH_JIT in dalvik/vm/Dvm.mk to always be false.

You now have the basic framework in place. Whenever you make a change, you need to perform two steps: regenerate the mterp output, and build the core VM library. (It's two steps because we didn't want the build system to require Python 2.5, which, incidentally, you need to have installed.)

1. In the dalvik/vm/mterp directory, regenerate the contents of the files in dalvik/vm/mterp/out by executing ./rebuild.sh. Note there are two files, one in C and one in assembly.
2. In the dalvik directory, regenerate the libdvm.so library with mm. You can also use mmm dalvik/vm from the top of the tree.

This will leave you with an updated libdvm.so, which can be pushed out to a device with adb sync or adb push. If you're using the emulator, you need to add make snod (System image, NO Dependency check) to rebuild the system image file. You should not need to do a top-level "make" and rebuild the dependent binaries.

At this point you have an "all stubs" interpreter. You can see how it works by examining dalvik/vm/mterp/cstubs/entry.c. The code runs in a loop, pulling out the next opcode, and invoking the handler through a function pointer. Each handler takes a "glue" argument that contains all of the useful state.
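The shape of that loop can be sketched as follows. This is a simplified model and not the contents of entry.c: the ToyGlue structure, the three handlers, and runAllStubs are hypothetical, and the real glue structure carries far more state.

```c
/* Hypothetical, much-reduced "glue" structure. */
typedef struct ToyGlue {
    const unsigned char* pc;   /* next instruction to execute */
    int acc;                   /* toy machine state */
    int done;                  /* set by a handler to stop the loop */
} ToyGlue;

typedef void (*ToyHandler)(ToyGlue* glue);

/* Each handler does its work, advances the pc, and returns to the loop. */
void toyOpInc(ToyGlue* glue)    { glue->acc += 1; glue->pc++; }
void toyOpDouble(ToyGlue* glue) { glue->acc *= 2; glue->pc++; }
void toyOpHalt(ToyGlue* glue)   { glue->done = 1; }

/* One entry per opcode, indexed by opcode number. */
ToyHandler const toyHandlers[] = { toyOpInc, toyOpDouble, toyOpHalt };

int runAllStubs(const unsigned char* code)
{
    ToyGlue glue = { code, 0, 0 };

    while (!glue.done) {
        unsigned char opcode = *glue.pc;   /* pull out the next opcode */
        toyHandlers[opcode](&glue);        /* call the handler via pointer */
    }
    return glue.acc;
}
```

The call and return around every instruction, plus the loop itself, is overhead that the chained assembly handlers avoid.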

Your goal is to replace the entry method, exit method, and each individual instruction with custom implementations. The first thing you need to do is create an entry function that calls the handler for the first instruction. After that, the instructions chain together, so you don't need a loop. (Look at the ARM or x86 implementation to see how they work.)

Once you have that, you need something to jump to. You can't branch directly to the C stub because it's expecting to be called with a "glue" argument and then return. We need a C stub "wrapper" that does the setup and jumps directly to the next handler. We write this in assembly and then add it to the config file definition.

To see how this works, create a file called dalvik/vm/mterp/myarch/stub.S that contains one line:

/* stub for ${opcode} */

Then, in dalvik/vm/mterp/config-myarch, add this below the handler-size directive:

# source for the instruction table stub
asm-stub myarch/stub.S

Regenerate the sources with ./rebuild.sh, and take a look inside dalvik/vm/mterp/out/InterpAsm-myarch.S. You should see 256 copies of the stub function in a single large block after the dvmAsmInstructionStart label. The stub.S code will be used anywhere you don't provide an assembly implementation.

Note that each block begins with a .balign 64 directive. This is what pads each handler out to 64 bytes. Note also that the ${opcode} text changed into an opcode name, which should be used to call the C implementation (dvmMterp_${opcode}).

The actual contents of stub.S are up to you to define. See entry.S and stub.S in the armv5te or x86 directories for working examples.

If you're working on a variation of an existing architecture, you may be able to use most of the existing code and just provide replacements for a few instructions. Look at the dalvik/vm/mterp/config-* files for examples.

Replacing Stubs

There are roughly 250 Dalvik opcodes, including some that are inserted by dexopt and aren't described in the Dalvik bytecode documentation. Each one must perform the appropriate actions, fetch the next opcode, and branch to the next handler. The actions performed by the assembly version must exactly match those performed by the C version (in dalvik/vm/mterp/c/OP_*).

It is possible to customize the set of "optimized" instructions for your platform. This is possible because optimized DEX files are not expected to work on multiple devices. Adding, removing, or redefining instructions is beyond the scope of this document, and for simplicity it's best to stick with the basic set defined by the portable interpreter.

Once you have written a handler that looks like it should work, add it to the config file. For example, suppose we have a working version of OP_NOP. For demonstration purposes, fake it for now by putting this into dalvik/vm/mterp/myarch/OP_NOP.S:

/* This is my NOP handler */

Then, in the op-start section of config-myarch, add:

    op OP_NOP myarch

This tells the generation script to use the assembly version from the myarch directory instead of the C version from the c directory.

Execute ./rebuild.sh. Look at InterpAsm-myarch.S and InterpC-myarch.c in the out directory. You will see that the OP_NOP stub wrapper has been replaced with our new code in the assembly file, and the C stub implementation is no longer included.

As you implement instructions, the C version and corresponding stub wrapper will disappear from the output files. Eventually you will have a 100% assembly interpreter. You may find it saves a little time to examine the output of your compiler for some of the operations. The porting-proto.c sample code can be helpful here.

Interpreter Switching

The Dalvik VM actually includes a third interpreter implementation: the debug interpreter. This is a variation of the portable interpreter that includes support for debugging and profiling.

When a debugger attaches, or a profiling feature is enabled, the VM will switch interpreters at a convenient point. This is done at the same time as the GC safe point check: on a backward branch, a method return, or an exception throw. Similarly, when the debugger detaches or profiling is discontinued, execution transfers back to the "fast" or "portable" interpreter.

Your entry function needs to test the "entryPoint" value in the "glue" pointer to determine where execution should begin. Your exit function will need to return a boolean that indicates whether the interpreter is exiting (because we reached the "bottom" of a thread stack) or wants to switch to the other implementation.

See the entry.S file in x86 or armv5te for examples.

Testing

A number of VM tests can be found in dalvik/tests. The most useful during interpreter development is 003-omnibus-opcodes, which tests many different instructions.

The basic invocation is:

$ cd dalvik/tests
$ ./run-test 003

This will run test 003 on an attached device or emulator. You can run the test against your desktop VM by specifying --reference if you suspect the test may be faulty. You can also use --portable and --fast to explicitly specify one Dalvik interpreter or the other.

Some instructions are replaced by dexopt, notably when "quickening" field accesses and method invocations. To ensure that you are testing the basic form of the instruction, add the --no-optimize option.

There is no built-in instruction tracing mechanism. If you want to know for sure that your implementation of an opcode handler is being used, the easiest approach is to insert a "printf" call. For an example, look at common_squeak in dalvik/vm/mterp/armv5te/footer.S.

At some point you need to ensure that debuggers and profiling work with your interpreter. The easiest way to do this is to simply connect a debugger or toggle profiling. (A future test suite may include some tests for this.)

Other Performance Issues

The System.arraycopy() function is heavily used. The implementation relies on the bionic C library to provide a fast, platform-optimized data copy function for arrays with elements wider than one byte. If you're not using bionic, or your platform does not have an implementation of this method, Dalvik will use correct but sub-optimal algorithms instead. For best performance you will want to provide your own version.

See the comments in dalvik/vm/native/java_lang_System.c for details.
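For a sense of what a plain fallback looks like, here is a minimal sketch of an overlap-safe copy for 16-bit elements that moves one whole element at a time. The name toyMove16 is hypothetical and this is not the code Dalvik ships. A platform-optimized version would typically move wider chunks while preserving the same overlap behavior.

```c
#include <stddef.h>
#include <stdint.h>

/* Copies `count` 16-bit elements from src to dest; the regions may overlap. */
void toyMove16(uint16_t* dest, const uint16_t* src, size_t count)
{
    size_t i;

    if (dest == src || count == 0)
        return;

    if ((uintptr_t) dest < (uintptr_t) src) {
        /* Destination is lower: copy forward so the source is read first. */
        for (i = 0; i < count; i++)
            dest[i] = src[i];
    } else {
        /* Destination is higher: copy backward for the same reason. */
        for (i = count; i > 0; i--)
            dest[i - 1] = src[i - 1];
    }
}
```

Choosing the copy direction from the relative addresses is what makes the routine safe when the source and destination ranges lie in the same array.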