<html>
<head>
<title>Dalvik Bytecode Verifier Notes</title>
</head>

<body>
<h1>Dalvik Bytecode Verifier Notes</h1>

<p>
The bytecode verifier in the Dalvik VM attempts to provide the same sorts
of checks and guarantees that other popular virtual machines do. We
perform generally the same set of checks as are described in <i>The Java
Virtual Machine Specification, Second Edition</i>, including the updates
planned for the Third Edition.

<p>
Verification can be enabled for all classes, disabled for all, or enabled
only for "remote" (non-bootstrap) classes. It should be performed for any
class that will be processed with the DEX optimizer, and in fact the
default VM behavior is to only optimize verified classes.


<h2>Why Verify?</h2>

<p>
The verification process adds time to the build and to
the installation of new applications. It's fairly quick for app-sized
DEX files, but rather slow for the big "core" and "framework" files.
Why do it at all, when our system relies on UNIX processes for security?
<p>
<ol>
    <li>Optimizations. The interpreter can ignore a lot of potential
    error cases because the verifier guarantees that they are impossible.
    Also, we can optimize the DEX file more aggressively if we start
    with a stronger set of assumptions about the bytecode.
    <li>"Precise" GC. The work performed during verification has significant
    overlap with the work required to compute register use maps for
    type-precise GC.
    <li>Intra-application security. If an app wants to download bits
    of interpreted code over the network and execute them, it can safely
    do so using well-established security mechanisms.
    <li>Third-party app failure analysis. We have no way to control the
    tools and post-processing utilities that external developers employ,
    so when we get bug reports with a weird exception or native crash
    it's very helpful to start with the assumption that the bytecode
    is valid.
</ol>
<p>
Verification is also a convenient framework for dealing with certain
situations, notably replacement of instructions that access volatile
64-bit fields with more rigorous versions that guarantee atomicity.


<h2>Verifier Differences</h2>

<p>
There are a few checks that the Dalvik bytecode verifier does not perform,
because they're not relevant. For example:
<ul>
    <li>Type restrictions on constant pool references are not enforced,
    because Dalvik does not have a pool of typed constants. (Dalvik
    uses a simple index into type-specific pools.)
    <li>Verification of the operand stack size is not performed, because
    Dalvik does not have an operand stack.
    <li>Limitations on <code>jsr</code> and <code>ret</code> do not apply,
    because Dalvik doesn't support subroutines.
</ul>

<p>
In some cases checks are implemented differently, e.g.:
<ul>
    <li>In a conventional VM, backward branches and exceptions are
    forbidden when a local variable holds an uninitialized reference. In
    Dalvik, the restriction was changed to mark registers as invalid when
    they hold references to the uninitialized result of a previous
    invocation of the same <code>new-instance</code> instruction.
    This solves the same problem -- trickery potentially allowing
    uninitialized objects to slip past the verifier -- without unduly
    limiting branches.
</ul>

<p>
There are also some new checks, such as:
<ul>
    <li>The <code>move-exception</code> instruction can only appear as
    the first instruction in an exception handler.
    <li>The <code>move-result*</code> instructions can only appear
    immediately after an appropriate <code>invoke-*</code>
    or <code>filled-new-array</code> instruction.
</ul>

<p>
The VM is permitted but not required to enforce "structured locking"
constraints, which are designed to ensure that, when a method returns, all
monitors locked by the method have been unlocked an equal number of times.
This enforcement is not currently implemented.

<p>
The Dalvik verifier is more restrictive than other VMs in one area:
type safety on sub-32-bit integer widths. These additional restrictions
should make it impossible to, say, pass a value outside the range
[-128, 127] to a function that takes a <code>byte</code> as an argument.


<h2>Verification Failures</h2>

<p>
The verifier may reject a class immediately, or it may defer throwing
an exception until the code is actually used. For example, if a class
attempts to perform an illegal access on a field, the VM should throw
an IllegalAccessError the first time the instruction is encountered.
On the other hand, if a class contains invalid bytecode, it should be
rejected immediately with a VerifyError.

<p>
Immediate VerifyErrors are accompanied by detailed, if somewhat cryptic,
information in the log file. From this it's possible to determine the
exact instruction that failed, and the reason for the failure.

<p>
It's a bit tricky to implement deferred verification errors in Dalvik.
A few approaches were considered:

<ol>
<li>We could replace the invalid field access instruction with a special
instruction that generates an illegal access error, and allow class
verification to complete successfully. This type of verification must
be deferred to first class load, rather than be performed ahead of time
during DEX optimization, because some failures will depend on the current
execution environment (e.g. not all classes are available at dexopt time).
At that point the bytecode instructions are mapped read-only during
verification, so rewriting them isn't possible.
</li>

<li>We can perform the access checks when the field/method/class is
resolved.
In a typical VM implementation we would do the check when the
entry is resolved in the context of the current classfile, but our DEX
files combine multiple classfiles together, merging the field/method/class
resolution results into a single large table. Once one class successfully
resolves the field, every other class in the same DEX file would be able
to access the field. This would be incorrect.
</li>

<li>Perform the access checks on every field/method/class access.
This adds significant overhead, which is mitigated somewhat by the DEX
optimizer: it will convert many field/method/class accesses into a
simpler form after performing the access check. However, not all accesses
can be optimized (e.g. accesses to classes unknown at dexopt time),
and we don't currently have an optimized form of certain instructions
(notably static field operations).
</li>
</ol>

<p>
In early versions of Dalvik (as found in Android 1.6 and earlier), the verifier
simply regarded all problems as immediately fatal. This generally worked,
but in some cases the VM was rejecting classes because of bits of code
that were never used. The VerifyError itself was sometimes difficult to
decipher, because it was thrown during verification rather than at the
point where the problem was first noticed during execution.
<p>
The current version uses a variation of approach #1. The dexopt
command works the way it did before, leaving the code untouched and
flagging fully-correct classes as "pre-verified". When the VM loads a
class that didn't pass pre-verification, the verifier is invoked. If a
"deferrable" problem is detected, a modifiable copy of the instructions
in the problematic method is made. In that copy, the troubled instruction
is replaced with an "always throw" opcode, and verification continues.
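<p>
The copy-and-replace step described above can be sketched roughly as
follows. This is a simplified model in Python, not the VM's actual code:
the <code>Method</code> class, the <code>verify_method</code> function, the
<code>"iget"</code> and <code>"always-throw"</code> opcode strings, and the
<code>is_accessible</code> predicate are all hypothetical names introduced
for illustration. The read-only mapped instructions are modeled as a tuple,
and the modifiable copy is only allocated when a deferrable problem is found.

```python
from dataclasses import dataclass

# Hypothetical opcode name, chosen for illustration only.
ALWAYS_THROW = "always-throw"


@dataclass
class Method:
    name: str
    # Instructions as mapped from the DEX file; a tuple models read-only memory.
    insns: tuple
    # Modifiable copy, allocated only if the verifier has to rewrite something.
    patched: list = None

    def code(self):
        """Return the instructions the interpreter should execute."""
        return self.patched if self.patched is not None else self.insns


def verify_method(method, is_accessible):
    """Replace each deferrable failure with an always-throw instruction.

    is_accessible is a caller-supplied predicate standing in for the
    real access check (an assumption of this sketch).
    """
    for index, (opcode, operand) in enumerate(method.insns):
        if opcode == "iget" and not is_accessible(operand):
            if method.patched is None:
                # Copy on first failure; the original stays untouched.
                method.patched = list(method.insns)
            method.patched[index] = (ALWAYS_THROW, ("IllegalAccessError", operand))
    return method


example = Method("run", (("iget", "Foo.secret"), ("return-void", None)))
verify_method(example, lambda field_name: field_name != "Foo.secret")
```

<p>
Methods with no deferrable problems keep executing the original mapped
instructions, so in this model the extra memory is only spent on the
methods that actually needed a rewrite.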
<p>
In the example used earlier, an attempt to read from an inaccessible
field would result in the "field get" instruction being replaced by
"always throw IllegalAccessError on field X". Creating copies of method
bodies requires additional heap space, but since this affects very few
methods overall the memory impact should be minor.

<p>
<address>Copyright © 2008 The Android Open Source Project</address>

</body>
</html>