<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">

<html>

<head>
<title>Dalvik VM Instruction Formats</title>
<link rel=stylesheet href="instruction-formats.css">
</head>

<body>

<h1>Dalvik VM Instruction Formats</h1>
<p>Copyright © 2007 The Android Open Source Project

<h2>Introduction and Overview</h2>

<p>This document lists the instruction formats used by Dalvik bytecode
and is meant to be used in conjunction with the
<a href="dalvik-bytecode.html">bytecode reference document</a>.</p>

<h3>Bitwise descriptions</h3>

<p>The first column in the format table lists the bitwise layout of
the format. It consists of one or more space-separated "words", each of
which describes a 16-bit code unit. Each character in a word
represents four bits, read from high bits to low, with vertical bars
("<code>|</code>") interspersed to aid in reading. Uppercase letters
in sequence from "<code>A</code>" are used to indicate fields within
the format (which then get defined further by the syntax column). The term
"<code>op</code>" is used to indicate the position of an eight-bit
opcode within the format. A slashed zero
("<code>Ø</code>") is used to indicate that all bits must be
zero in the indicated position.</p>

<p>For the most part, lettering proceeds from earlier code units to
later code units, and low-order to high-order within a code unit.
However, there are a few exceptions to this general rule, made so that
parts with similar meanings have the same name across different
instruction formats. These cases are noted explicitly
in the format descriptions.</p>

<p>For example, the format "<code>B|A|<i>op</i> CCCC</code>" indicates
that the format consists of two 16-bit code units.
The first word
consists of the opcode in the low eight bits and a pair of four-bit
values in the high eight bits; and the second word consists of a single
16-bit value.</p>

<h3>Format IDs</h3>

<p>The second column in the format table indicates the short identifier
for the format, which is used in other documents and in code to identify
the format.</p>

<p>Most format IDs consist of three characters, two digits followed by a
letter. The first digit indicates the number of 16-bit code units in the
format. The second digit indicates the maximum number of registers that the
format contains (maximum, since some formats can accommodate a variable
number of registers), with the special designation "<code>r</code>" indicating
that a range of registers is encoded. The final letter semi-mnemonically
indicates the type of any extra data encoded by the format. For example,
format "<code>21t</code>" is of length two, contains one register reference,
and additionally contains a branch target.</p>

<p>Suggested static linking formats have an additional
"<code>s</code>" suffix, making them four characters total. Similarly,
suggested "inline" linking formats have an additional "<code>i</code>"
suffix. (In this context, inline linking is like static linking,
except with more direct ties into a virtual machine's implementation.)
Finally, a couple of oddball suggested formats (e.g.,
"<code>20bc</code>") include two pieces of data, both of which are
represented in the format ID.</p>

<p>The full list of typecode letters is as follows.
Note that some
forms have different sizes, depending on the format:</p>

<table class="letters">
<thead>
<tr>
  <th>Mnemonic</th>
  <th>Bit Sizes</th>
  <th>Meaning</th>
</tr>
</thead>
<tbody>
<tr>
  <td>b</td>
  <td>8</td>
  <td>immediate signed <b>b</b>yte</td>
</tr>
<tr>
  <td>c</td>
  <td>16, 32</td>
  <td><b>c</b>onstant pool index</td>
</tr>
<tr>
  <td>f</td>
  <td>16</td>
  <td>inter<b>f</b>ace constants (only used in statically linked formats)
  </td>
</tr>
<tr>
  <td>h</td>
  <td>16</td>
  <td>immediate signed <b>h</b>at (high-order bits of a 32- or 64-bit
    value; low-order bits are all <code>0</code>)
  </td>
</tr>
<tr>
  <td>i</td>
  <td>32</td>
  <td>immediate signed <b>i</b>nt, or 32-bit float</td>
</tr>
<tr>
  <td>l</td>
  <td>64</td>
  <td>immediate signed <b>l</b>ong, or 64-bit double</td>
</tr>
<tr>
  <td>m</td>
  <td>16</td>
  <td><b>m</b>ethod constants (only used in statically linked formats)</td>
</tr>
<tr>
  <td>n</td>
  <td>4</td>
  <td>immediate signed <b>n</b>ibble</td>
</tr>
<tr>
  <td>s</td>
  <td>16</td>
  <td>immediate signed <b>s</b>hort</td>
</tr>
<tr>
  <td>t</td>
  <td>8, 16, 32</td>
  <td>branch <b>t</b>arget</td>
</tr>
<tr>
  <td>x</td>
  <td>0</td>
  <td>no additional data</td>
</tr>
</tbody>
</table>

<h3>Syntax</h3>

<p>The third column of the format table indicates the human-oriented
syntax for instructions which use the indicated format. Each instruction
starts with the named opcode and is optionally followed by one or
more arguments, themselves separated with commas.</p>

<p>Wherever an argument refers to a field from the first column, the
letter for that field is indicated in the syntax, repeated once for
each four bits of the field.
For example, an eight-bit field labeled
"<code>BB</code>" in the first column would also be labeled
"<code>BB</code>" in the syntax column.</p>

<p>Arguments which name a register have the form "<code>v<i>X</i></code>".
The prefix "<code>v</code>" was chosen instead of the more common
"<code>r</code>" exactly to avoid conflicting with (non-virtual) architectures
on which a Dalvik virtual machine might be implemented which themselves
use the prefix "<code>r</code>" for their registers. (That is, this
decision makes it possible to talk about both virtual and real registers
together without the need for circumlocution.)</p>

<p>Arguments which indicate a literal value have the form
"<code>#+<i>X</i></code>". Some formats indicate literals that only
have non-zero bits in their high-order bits; for these, the zeroes
are represented explicitly in the syntax, even though they do not
appear in the bitwise representation.</p>

<p>Arguments which indicate a relative instruction address offset have the
form "<code>+<i>X</i></code>".</p>

<p>Arguments which indicate a literal constant pool index have the form
"<code><i>kind</i>@<i>X</i></code>", where "<code><i>kind</i></code>"
indicates which constant pool is being referred to. Each opcode that
uses such a format explicitly allows only one kind of constant; see
the opcode reference to figure out the correspondence. The four
kinds of constant pool are "<code>string</code>" (string pool index),
"<code>type</code>" (type pool index), "<code>field</code>" (field
pool index), and "<code>meth</code>" (method pool index).</p>

<p>Similar to the representation of constant pool indices, there are
also suggested (optional) forms that indicate prelinked offsets or
indices.
There are two types of suggested prelinked values: vtable offsets
(indicated as "<code>vtaboff</code>") and field offsets (indicated as
"<code>fieldoff</code>").</p>

<p>In the cases where a format value isn't explicitly part of the syntax
but instead picks a variant, each variant is listed with the prefix
"<code>[<i>X</i>=<i>N</i>]</code>" (e.g., "<code>[A=2]</code>") to indicate
the correspondence.</p>

<h2>The Formats</h2>

<table class="format">
<thead>
<tr>
  <th>Format</th>
  <th>ID</th>
  <th>Syntax</th>
  <th>Notable Opcodes Covered</th>
</tr>
</thead>
<tbody>
<tr>
  <td><i>N/A</i></td>
  <td>00x</td>
  <td><i><code>N/A</code></i></td>
  <td><i>pseudo-format used for unused opcodes; suggested for use as the
    nominal format for a breakpoint opcode</i></td>
</tr>
<tr>
  <td>ØØ|<i>op</i></td>
  <td>10x</td>
  <td><i><code>op</code></i></td>
  <td>&nbsp;</td>
</tr>
<tr>
  <td rowspan="2">B|A|<i>op</i></td>
  <td>12x</td>
  <td><i><code>op</code></i> vA, vB</td>
  <td>&nbsp;</td>
</tr>
<tr>
  <td>11n</td>
  <td><i><code>op</code></i> vA, #+B</td>
  <td>&nbsp;</td>
</tr>
<tr>
  <td rowspan="2">AA|<i>op</i></td>
  <td>11x</td>
  <td><i><code>op</code></i> vAA</td>
  <td>&nbsp;</td>
</tr>
<tr>
  <td>10t</td>
  <td><i><code>op</code></i> +AA</td>
  <td>goto</td>
</tr>
<tr>
  <td>ØØ|<i>op</i> AAAA</td>
  <td>20t</td>
  <td><i><code>op</code></i> +AAAA</td>
  <td>goto/16</td>
</tr>
<tr>
  <td>AA|<i>op</i> BBBB</td>
  <td>20bc</td>
  <td><i><code>op</code></i> AA, kind@BBBB</td>
  <td><i>suggested format for statically determined verification errors;
    A is the type of error and B is an index into a type-appropriate
    table (e.g.
    method references for a no-such-method error)</i></td>
</tr>
<tr>
  <td rowspan="5">AA|<i>op</i> BBBB</td>
  <td>22x</td>
  <td><i><code>op</code></i> vAA, vBBBB</td>
  <td>&nbsp;</td>
</tr>
<tr>
  <td>21t</td>
  <td><i><code>op</code></i> vAA, +BBBB</td>
  <td>&nbsp;</td>
</tr>
<tr>
  <td>21s</td>
  <td><i><code>op</code></i> vAA, #+BBBB</td>
  <td>&nbsp;</td>
</tr>
<tr>
  <td>21h</td>
  <td><i><code>op</code></i> vAA, #+BBBB0000<br/>
    <i><code>op</code></i> vAA, #+BBBB000000000000
  </td>
  <td>&nbsp;</td>
</tr>
<tr>
  <td>21c</td>
  <td><i><code>op</code></i> vAA, type@BBBB<br/>
    <i><code>op</code></i> vAA, field@BBBB<br/>
    <i><code>op</code></i> vAA, string@BBBB
  </td>
  <td>check-cast<br/>
    const-class<br/>
    const-string
  </td>
</tr>
<tr>
  <td rowspan="2">AA|<i>op</i> CC|BB</td>
  <td>23x</td>
  <td><i><code>op</code></i> vAA, vBB, vCC</td>
  <td>&nbsp;</td>
</tr>
<tr>
  <td>22b</td>
  <td><i><code>op</code></i> vAA, vBB, #+CC</td>
  <td>&nbsp;</td>
</tr>
<tr>
  <td rowspan="4">B|A|<i>op</i> CCCC</td>
  <td>22t</td>
  <td><i><code>op</code></i> vA, vB, +CCCC</td>
  <td>&nbsp;</td>
</tr>
<tr>
  <td>22s</td>
  <td><i><code>op</code></i> vA, vB, #+CCCC</td>
  <td>&nbsp;</td>
</tr>
<tr>
  <td>22c</td>
  <td><i><code>op</code></i> vA, vB, type@CCCC<br/>
    <i><code>op</code></i> vA, vB, field@CCCC
  </td>
  <td>instance-of</td>
</tr>
<tr>
  <td>22cs</td>
  <td><i><code>op</code></i> vA, vB, fieldoff@CCCC</td>
  <td><i>suggested format for statically linked field access instructions of
    format 22c</i>
  </td>
</tr>
<tr>
  <td>ØØ|<i>op</i> AAAA<sub>lo</sub> AAAA<sub>hi</sub></td>
  <td>30t</td>
  <td><i><code>op</code></i> +AAAAAAAA</td>
  <td>goto/32</td>
</tr>
<tr>
  <td>ØØ|<i>op</i> AAAA BBBB</td>
  <td>32x</td>
  <td><i><code>op</code></i> vAAAA, vBBBB</td>
  <td>&nbsp;</td>
</tr>
<tr>
  <td
    rowspan="3">AA|<i>op</i> BBBB<sub>lo</sub> BBBB<sub>hi</sub></td>
  <td>31i</td>
  <td><i><code>op</code></i> vAA, #+BBBBBBBB</td>
  <td>&nbsp;</td>
</tr>
<tr>
  <td>31t</td>
  <td><i><code>op</code></i> vAA, +BBBBBBBB</td>
  <td>&nbsp;</td>
</tr>
<tr>
  <td>31c</td>
  <td><i><code>op</code></i> vAA, string@BBBBBBBB</td>
  <td>const-string/jumbo</td>
</tr>
<tr>
  <td rowspan="3">A|G|<i>op</i> BBBB F|E|D|C</td>
  <td>35c</td>
  <td><i>[<code>A=5</code>] <code>op</code></i> {vC, vD, vE, vF, vG},
    meth@BBBB<br/>
    <i>[<code>A=5</code>] <code>op</code></i> {vC, vD, vE, vF, vG},
    type@BBBB<br/>
    <i>[<code>A=4</code>] <code>op</code></i> {vC, vD, vE, vF},
    <i><code>kind</code></i>@BBBB<br/>
    <i>[<code>A=3</code>] <code>op</code></i> {vC, vD, vE},
    <i><code>kind</code></i>@BBBB<br/>
    <i>[<code>A=2</code>] <code>op</code></i> {vC, vD},
    <i><code>kind</code></i>@BBBB<br/>
    <i>[<code>A=1</code>] <code>op</code></i> {vC},
    <i><code>kind</code></i>@BBBB<br/>
    <i>[<code>A=0</code>] <code>op</code></i> {},
    <i><code>kind</code></i>@BBBB<br/>
    <p><i>The unusual choice in lettering here reflects a desire to make
    the count and the reference index have the same label as in format
    3rc.</i></p>
  </td>
  <td>&nbsp;</td>
</tr>
<tr>
  <td>35ms</td>
  <td><i>[<code>A=5</code>] <code>op</code></i> {vC, vD, vE, vF, vG},
    vtaboff@BBBB<br/>
    <i>[<code>A=4</code>] <code>op</code></i> {vC, vD, vE, vF},
    vtaboff@BBBB<br/>
    <i>[<code>A=3</code>] <code>op</code></i> {vC, vD, vE},
    vtaboff@BBBB<br/>
    <i>[<code>A=2</code>] <code>op</code></i> {vC, vD},
    vtaboff@BBBB<br/>
    <i>[<code>A=1</code>] <code>op</code></i> {vC},
    vtaboff@BBBB<br/>
    <p><i>The unusual choice in lettering here reflects a desire to make
    the count and the reference index have the same label as in format
    3rms.</i></p>
  </td>
  <td><i>suggested format for statically linked <code>invoke-virtual</code>
    and <code>invoke-super</code> instructions of format 35c</i>
  </td>
</tr>
<tr>
  <td>35mi</td>
  <td><i>[<code>A=5</code>] <code>op</code></i> {vC, vD, vE, vF, vG},
    inline@BBBB<br/>
    <i>[<code>A=4</code>] <code>op</code></i> {vC, vD, vE, vF},
    inline@BBBB<br/>
    <i>[<code>A=3</code>] <code>op</code></i> {vC, vD, vE},
    inline@BBBB<br/>
    <i>[<code>A=2</code>] <code>op</code></i> {vC, vD},
    inline@BBBB<br/>
    <i>[<code>A=1</code>] <code>op</code></i> {vC},
    inline@BBBB<br/>
    <p><i>The unusual choice in lettering here reflects a desire to make
    the count and the reference index have the same label as in format
    3rmi.</i></p>
  </td>
  <td><i>suggested format for inline linked <code>invoke-static</code>
    and <code>invoke-virtual</code> instructions of format 35c</i>
  </td>
</tr>
<tr>
  <td rowspan="3">AA|<i>op</i> BBBB CCCC</td>
  <td>3rc</td>
  <td><i><code>op</code></i> {vCCCC .. vNNNN}, meth@BBBB<br/>
    <i><code>op</code></i> {vCCCC .. vNNNN}, type@BBBB<br/>
    <p><i>where <code>NNNN = CCCC+AA-1</code>, that is <code>A</code>
    determines the count <code>0..255</code>, and <code>C</code>
    determines the first register</i></p>
  </td>
  <td>&nbsp;</td>
</tr>
<tr>
  <td>3rms</td>
  <td><i><code>op</code></i> {vCCCC .. vNNNN}, vtaboff@BBBB<br/>
    <p><i>where <code>NNNN = CCCC+AA-1</code>, that is <code>A</code>
    determines the count <code>0..255</code>, and <code>C</code>
    determines the first register</i></p>
  </td>
  <td><i>suggested format for statically linked <code>invoke-virtual</code>
    and <code>invoke-super</code> instructions of format <code>3rc</code></i>
  </td>
</tr>
<tr>
  <td>3rmi</td>
  <td><i><code>op</code></i> {vCCCC ..
    vNNNN}, inline@BBBB<br/>
    <p><i>where <code>NNNN = CCCC+AA-1</code>, that is <code>A</code>
    determines the count <code>0..255</code>, and <code>C</code>
    determines the first register</i></p>
  </td>
  <td><i>suggested format for inline linked <code>invoke-static</code>
    and <code>invoke-virtual</code> instructions of format 3rc</i>
  </td>
</tr>
<tr>
  <td>AA|<i>op</i> BBBB<sub>lo</sub> BBBB BBBB BBBB<sub>hi</sub></td>
  <td>51l</td>
  <td><i><code>op</code></i> vAA, #+BBBBBBBBBBBBBBBB</td>
  <td>const-wide</td>
</tr>
</tbody>
</table>

</body>
</html>