<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">

<html>

<head>
<title>Dalvik VM Instruction Formats</title>
<link rel=stylesheet href="instruction-formats.css">
</head>

<body>

<h1>Dalvik VM Instruction Formats</h1>
<p>Copyright © 2007 The Android Open Source Project</p>

<h2>Introduction and Overview</h2>

<p>This document lists the instruction formats used by Dalvik bytecode
and is meant to be used in conjunction with the
<a href="dalvik-bytecode.html">bytecode reference document</a>.</p>

<h3>Bitwise descriptions</h3>

<p>The first column in the format table lists the bitwise layout of
the format. It consists of one or more space-separated "words", each of
which describes a 16-bit code unit. Each character in a word
represents four bits, read from high bits to low, with vertical bars
("<code>|</code>") interspersed to aid in reading. Uppercase letters
in sequence from "<code>A</code>" are used to indicate fields within
the format (which are then defined further by the syntax column). The term
"<code>op</code>" is used to indicate the position of an eight-bit
opcode within the format, and similarly "<code>exop</code>" is used
to indicate an extended sixteen-bit opcode. A slashed zero
("<code>Ø</code>") is used to indicate that all bits must be
zero in the indicated position.</p>

<p>For the most part, lettering proceeds from earlier code units to
later code units, and low-order to high-order within a code unit.
However, there are a few exceptions to this general rule, made so that
parts with similar meanings have the same names
across different instruction formats. These cases are noted explicitly
in the format descriptions.</p>

<p>For example, the format "<code>B|A|<i>op</i> CCCC</code>" indicates
that the format consists of two 16-bit code units.
The first word
consists of the opcode in the low eight bits and a pair of four-bit
values in the high eight bits; and the second word consists of a single
16-bit value.</p>

<h3>Format IDs</h3>

<p>The second column in the format table indicates the short identifier
for the format, which is used in other documents and in code to identify
the format.</p>

<p>Most format IDs consist of three characters, two digits followed by a
letter. The first digit indicates the number of 16-bit code units in the
format. The second digit indicates the maximum number of registers that the
format contains (maximum, since some formats can accommodate a variable
number of registers), with the special designation "<code>r</code>" indicating
that a range of registers is encoded. The final letter semi-mnemonically
indicates the type of any extra data encoded by the format. For example,
format "<code>21t</code>" is of length two, contains one register reference,
and additionally contains a branch target.</p>

<p>Suggested static linking formats have an additional
"<code>s</code>" suffix, making them four characters total. Similarly,
suggested "inline" linking formats have an additional "<code>i</code>"
suffix. (In this context, inline linking is like static linking,
except with more direct ties into a virtual machine's implementation.)
Finally, a couple of oddball suggested formats (e.g.,
"<code>20bc</code>") include two pieces of data which are both
represented in their format ID.</p>

<p>The full list of typecode letters is as follows.
Note that some
forms have different sizes, depending on the format:</p>

<table class="letters">
<thead>
<tr>
  <th>Mnemonic</th>
  <th>Bit Sizes</th>
  <th>Meaning</th>
</tr>
</thead>
<tbody>
<tr>
  <td>b</td>
  <td>8</td>
  <td>immediate signed <b>b</b>yte</td>
</tr>
<tr>
  <td>c</td>
  <td>16, 32</td>
  <td><b>c</b>onstant pool index</td>
</tr>
<tr>
  <td>f</td>
  <td>16</td>
  <td>inter<b>f</b>ace constants (only used in statically linked formats)
  </td>
</tr>
<tr>
  <td>h</td>
  <td>16</td>
  <td>immediate signed <b>h</b>at (high-order bits of a 32- or 64-bit
    value; low-order bits are all <code>0</code>)
  </td>
</tr>
<tr>
  <td>i</td>
  <td>32</td>
  <td>immediate signed <b>i</b>nt, or 32-bit float</td>
</tr>
<tr>
  <td>l</td>
  <td>64</td>
  <td>immediate signed <b>l</b>ong, or 64-bit double</td>
</tr>
<tr>
  <td>m</td>
  <td>16</td>
  <td><b>m</b>ethod constants (only used in statically linked formats)</td>
</tr>
<tr>
  <td>n</td>
  <td>4</td>
  <td>immediate signed <b>n</b>ibble</td>
</tr>
<tr>
  <td>s</td>
  <td>16</td>
  <td>immediate signed <b>s</b>hort</td>
</tr>
<tr>
  <td>t</td>
  <td>8, 16, 32</td>
  <td>branch <b>t</b>arget</td>
</tr>
<tr>
  <td>x</td>
  <td>0</td>
  <td>no additional data</td>
</tr>
</tbody>
</table>

<h3>Syntax</h3>

<p>The third column of the format table indicates the human-oriented
syntax for instructions which use the indicated format. Each instruction
starts with the named opcode and is optionally followed by one or
more arguments, themselves separated with commas.</p>

<p>Wherever an argument refers to a field from the first column, the
letter for that field is indicated in the syntax, repeated once for
each four bits of the field.
For example, an eight-bit field labeled
"<code>BB</code>" in the first column would also be labeled
"<code>BB</code>" in the syntax column.</p>

<p>Arguments which name a register have the form "<code>v<i>X</i></code>".
The prefix "<code>v</code>" was chosen instead of the more common
"<code>r</code>" exactly to avoid conflicting with (non-virtual) architectures
on which a Dalvik virtual machine might be implemented which themselves
use the prefix "<code>r</code>" for their registers. (That is, this
decision makes it possible to talk about both virtual and real registers
together without the need for circumlocution.)</p>

<p>Arguments which indicate a literal value have the form
"<code>#+<i>X</i></code>". Some formats indicate literals that only
have non-zero bits in their high-order bits; for these, the zeroes
are represented explicitly in the syntax, even though they do not
appear in the bitwise representation.</p>

<p>Arguments which indicate a relative instruction address offset have the
form "<code>+<i>X</i></code>".</p>

<p>Arguments which indicate a literal constant pool index have the form
"<code><i>kind</i>@<i>X</i></code>", where "<code><i>kind</i></code>"
indicates which constant pool is being referred to. Each opcode that
uses such a format explicitly allows only one kind of constant; see
the opcode reference to figure out the correspondence. The four
kinds of constant pool are "<code>string</code>" (string pool index),
"<code>type</code>" (type pool index), "<code>field</code>" (field
pool index), and "<code>meth</code>" (method pool index).</p>

<p>Similar to the representation of constant pool indices, there are
also suggested (optional) forms that indicate prelinked offsets or
indices.
There are two types of suggested prelinked value: vtable offsets
(indicated as "<code>vtaboff</code>") and field offsets (indicated as
"<code>fieldoff</code>").</p>

<p>In the cases where a format value isn't explicitly part of the syntax
but instead picks a variant, each variant is listed with the prefix
"<code>[<i>X</i>=<i>N</i>]</code>" (e.g., "<code>[A=2]</code>") to indicate
the correspondence.</p>

<h2>The Formats</h2>

<table class="format">
<thead>
<tr>
  <th>Format</th>
  <th>ID</th>
  <th>Syntax</th>
  <th>Notable Opcodes Covered</th>
</tr>
</thead>
<tbody>
<tr>
  <td><i>N/A</i></td>
  <td>00x</td>
  <td><i><code>N/A</code></i></td>
  <td><i>pseudo-format used for unused opcodes; suggested for use as the
    nominal format for a breakpoint opcode</i></td>
</tr>
<tr>
  <td>ØØ|<i>op</i></td>
  <td>10x</td>
  <td><i><code>op</code></i></td>
  <td> </td>
</tr>
<tr>
  <td rowspan="2">B|A|<i>op</i></td>
  <td>12x</td>
  <td><i><code>op</code></i> vA, vB</td>
  <td> </td>
</tr>
<tr>
  <td>11n</td>
  <td><i><code>op</code></i> vA, #+B</td>
  <td> </td>
</tr>
<tr>
  <td rowspan="2">AA|<i>op</i></td>
  <td>11x</td>
  <td><i><code>op</code></i> vAA</td>
  <td> </td>
</tr>
<tr>
  <td>10t</td>
  <td><i><code>op</code></i> +AA</td>
  <td>goto</td>
</tr>
<tr>
  <td>ØØ|<i>op</i> AAAA</td>
  <td>20t</td>
  <td><i><code>op</code></i> +AAAA</td>
  <td>goto/16</td>
</tr>
<tr>
  <td>AA|<i>op</i> BBBB</td>
  <td>20bc</td>
  <td><i><code>op</code></i> AA, kind@BBBB</td>
  <td><i>suggested format for statically determined verification errors;
    A is the type of error and B is an index into a type-appropriate
    table (e.g.
    method references for a no-such-method error)</i></td>
</tr>
<tr>
  <td rowspan="5">AA|<i>op</i> BBBB</td>
  <td>22x</td>
  <td><i><code>op</code></i> vAA, vBBBB</td>
  <td> </td>
</tr>
<tr>
  <td>21t</td>
  <td><i><code>op</code></i> vAA, +BBBB</td>
  <td> </td>
</tr>
<tr>
  <td>21s</td>
  <td><i><code>op</code></i> vAA, #+BBBB</td>
  <td> </td>
</tr>
<tr>
  <td>21h</td>
  <td><i><code>op</code></i> vAA, #+BBBB0000<br/>
    <i><code>op</code></i> vAA, #+BBBB000000000000
  </td>
  <td> </td>
</tr>
<tr>
  <td>21c</td>
  <td><i><code>op</code></i> vAA, type@BBBB<br/>
    <i><code>op</code></i> vAA, field@BBBB<br/>
    <i><code>op</code></i> vAA, string@BBBB
  </td>
  <td>check-cast<br/>
    const-class<br/>
    const-string
  </td>
</tr>
<tr>
  <td rowspan="2">AA|<i>op</i> CC|BB</td>
  <td>23x</td>
  <td><i><code>op</code></i> vAA, vBB, vCC</td>
  <td> </td>
</tr>
<tr>
  <td>22b</td>
  <td><i><code>op</code></i> vAA, vBB, #+CC</td>
  <td> </td>
</tr>
<tr>
  <td rowspan="4">B|A|<i>op</i> CCCC</td>
  <td>22t</td>
  <td><i><code>op</code></i> vA, vB, +CCCC</td>
  <td> </td>
</tr>
<tr>
  <td>22s</td>
  <td><i><code>op</code></i> vA, vB, #+CCCC</td>
  <td> </td>
</tr>
<tr>
  <td>22c</td>
  <td><i><code>op</code></i> vA, vB, type@CCCC<br/>
    <i><code>op</code></i> vA, vB, field@CCCC
  </td>
  <td>instance-of</td>
</tr>
<tr>
  <td>22cs</td>
  <td><i><code>op</code></i> vA, vB, fieldoff@CCCC</td>
  <td><i>suggested format for statically linked field access instructions of
    format 22c</i>
  </td>
</tr>
<tr>
  <td>ØØ|<i>op</i> AAAA<sub>lo</sub> AAAA<sub>hi</sub></td>
  <td>30t</td>
  <td><i><code>op</code></i> +AAAAAAAA</td>
  <td>goto/32</td>
</tr>
<tr>
  <td>ØØ|<i>op</i> AAAA BBBB</td>
  <td>32x</td>
  <td><i><code>op</code></i> vAAAA, vBBBB</td>
  <td> </td>
</tr>
<tr>
  <td
rowspan="3">AA|<i>op</i> BBBB<sub>lo</sub> BBBB<sub>hi</sub></td>
  <td>31i</td>
  <td><i><code>op</code></i> vAA, #+BBBBBBBB</td>
  <td> </td>
</tr>
<tr>
  <td>31t</td>
  <td><i><code>op</code></i> vAA, +BBBBBBBB</td>
  <td> </td>
</tr>
<tr>
  <td>31c</td>
  <td><i><code>op</code></i> vAA, string@BBBBBBBB</td>
  <td>const-string/jumbo</td>
</tr>
<tr>
  <td rowspan="3">A|G|<i>op</i> BBBB F|E|D|C</td>
  <td>35c</td>
  <td><i>[<code>A=5</code>] <code>op</code></i> {vC, vD, vE, vF, vG},
    meth@BBBB<br/>
    <i>[<code>A=5</code>] <code>op</code></i> {vC, vD, vE, vF, vG},
    type@BBBB<br/>
    <i>[<code>A=4</code>] <code>op</code></i> {vC, vD, vE, vF},
    <i><code>kind</code></i>@BBBB<br/>
    <i>[<code>A=3</code>] <code>op</code></i> {vC, vD, vE},
    <i><code>kind</code></i>@BBBB<br/>
    <i>[<code>A=2</code>] <code>op</code></i> {vC, vD},
    <i><code>kind</code></i>@BBBB<br/>
    <i>[<code>A=1</code>] <code>op</code></i> {vC},
    <i><code>kind</code></i>@BBBB<br/>
    <i>[<code>A=0</code>] <code>op</code></i> {},
    <i><code>kind</code></i>@BBBB<br/>
    <p><i>The unusual choice in lettering here reflects a desire to make
    the count and the reference index have the same label as in format
    3rc.</i></p>
  </td>
  <td> </td>
</tr>
<tr>
  <td>35ms</td>
  <td><i>[<code>A=5</code>] <code>op</code></i> {vC, vD, vE, vF, vG},
    vtaboff@BBBB<br/>
    <i>[<code>A=4</code>] <code>op</code></i> {vC, vD, vE, vF},
    vtaboff@BBBB<br/>
    <i>[<code>A=3</code>] <code>op</code></i> {vC, vD, vE},
    vtaboff@BBBB<br/>
    <i>[<code>A=2</code>] <code>op</code></i> {vC, vD},
    vtaboff@BBBB<br/>
    <i>[<code>A=1</code>] <code>op</code></i> {vC},
    vtaboff@BBBB<br/>
    <p><i>The unusual choice in lettering here reflects a desire to make
    the count and the reference index have the same label as in format
    3rms.</i></p>
  </td>
  <td><i>suggested format for statically linked <code>invoke-virtual</code>
    and <code>invoke-super</code> instructions of format 35c</i>
  </td>
</tr>
<tr>
  <td>35mi</td>
  <td><i>[<code>A=5</code>] <code>op</code></i> {vC, vD, vE, vF, vG},
    inline@BBBB<br/>
    <i>[<code>A=4</code>] <code>op</code></i> {vC, vD, vE, vF},
    inline@BBBB<br/>
    <i>[<code>A=3</code>] <code>op</code></i> {vC, vD, vE},
    inline@BBBB<br/>
    <i>[<code>A=2</code>] <code>op</code></i> {vC, vD},
    inline@BBBB<br/>
    <i>[<code>A=1</code>] <code>op</code></i> {vC},
    inline@BBBB<br/>
    <p><i>The unusual choice in lettering here reflects a desire to make
    the count and the reference index have the same label as in format
    3rmi.</i></p>
  </td>
  <td><i>suggested format for inline linked <code>invoke-static</code>
    and <code>invoke-virtual</code> instructions of format 35c</i>
  </td>
</tr>
<tr>
  <td rowspan="3">AA|<i>op</i> BBBB CCCC</td>
  <td>3rc</td>
  <td><i><code>op</code></i> {vCCCC .. vNNNN}, meth@BBBB<br/>
    <i><code>op</code></i> {vCCCC .. vNNNN}, type@BBBB<br/>
    <p><i>where <code>NNNN = CCCC+AA-1</code>, that is <code>A</code>
    determines the count <code>0..255</code>, and <code>C</code>
    determines the first register</i></p>
  </td>
  <td> </td>
</tr>
<tr>
  <td>3rms</td>
  <td><i><code>op</code></i> {vCCCC .. vNNNN}, vtaboff@BBBB<br/>
    <p><i>where <code>NNNN = CCCC+AA-1</code>, that is <code>A</code>
    determines the count <code>0..255</code>, and <code>C</code>
    determines the first register</i></p>
  </td>
  <td><i>suggested format for statically linked <code>invoke-virtual</code>
    and <code>invoke-super</code> instructions of format <code>3rc</code></i>
  </td>
</tr>
<tr>
  <td>3rmi</td>
  <td><i><code>op</code></i> {vCCCC ..
vNNNN}, inline@BBBB<br/>
    <p><i>where <code>NNNN = CCCC+AA-1</code>, that is <code>A</code>
    determines the count <code>0..255</code>, and <code>C</code>
    determines the first register</i></p>
  </td>
  <td><i>suggested format for inline linked <code>invoke-static</code>
    and <code>invoke-virtual</code> instructions of format 3rc</i>
  </td>
</tr>
<tr>
  <td>AA|<i>op</i> BBBB<sub>lo</sub> BBBB BBBB BBBB<sub>hi</sub></td>
  <td>51l</td>
  <td><i><code>op</code></i> vAA, #+BBBBBBBBBBBBBBBB</td>
  <td>const-wide</td>
</tr>
<tr>
  <td rowspan="2"><i>exop</i> BB|AA CCCC</td>
  <td>33x</td>
  <td><i><code>exop</code></i> vAA, vBB, vCCCC</td>
  <td> </td>
</tr>
<tr>
  <td>32s</td>
  <td><i><code>exop</code></i> vAA, vBB, #+CCCC</td>
  <td> </td>
</tr>
<tr>
  <td><i>exop</i> BBBB<sub>lo</sub> BBBB<sub>hi</sub> AAAA</td>
  <td>40sc</td>
  <td><i><code>exop</code></i> AAAA, kind@BBBBBBBB</td>
  <td><i>suggested format for statically determined verification errors;
    see <code>20bc</code>, above</i></td>
</tr>
<tr>
  <td><i>exop</i> BBBB<sub>lo</sub> BBBB<sub>hi</sub> AAAA</td>
  <td>41c</td>
  <td><i><code>exop</code></i> vAAAA, field@BBBBBBBB<br/>
    <i><code>exop</code></i> vAAAA, type@BBBBBBBB
    <p><i>The unusual choice in lettering here reflects a desire to make
    the letters match their use in related formats 21c and 31c.</i></p>
  </td>
  <td> </td>
</tr>
<tr>
  <td><i>exop</i> CCCC<sub>lo</sub> CCCC<sub>hi</sub>
    AAAA BBBB</td>
  <td>52c</td>
  <td><i><code>exop</code></i> vAAAA, vBBBB, field@CCCCCCCC<br/>
    <i><code>exop</code></i> vAAAA, vBBBB, type@CCCCCCCC
    <p><i>The unusual choice in lettering here reflects a desire to make
    the letters match their use in related formats 22c and 22cs.</i></p>
  </td>
  <td> </td>
</tr>
<tr>
  <td><i>exop</i> BBBB<sub>lo</sub> BBBB<sub>hi</sub>
    AAAA CCCC</td>
  <td>5rc</td>
  <td><i><code>exop</code></i> {vCCCC .. vNNNN}, meth@BBBBBBBB<br/>
    <i><code>exop</code></i> {vCCCC .. vNNNN}, type@BBBBBBBB<br/>
    <p><i>where <code>NNNN = CCCC+AAAA-1</code>, that is <code>A</code>
    determines the count <code>0..65535</code>, and <code>C</code>
    determines the first register</i></p>
    <p><i>The unusual choice in lettering here reflects a desire to make
    the letters match their use in related formats 3rc, 3rms, and 3rmi.</i></p>
  </td>
  <td> </td>
</tr>
</tbody>
</table>

</body>
</html>