<?xml version="1.0"?>
<!--
 * Licensed to the Apache Software Foundation (ASF) under one
 * or more contributor license agreements.  See the NOTICE file
 * distributed with this work for additional information
 * regarding copyright ownership.  The ASF licenses this file
 * to you under the Apache License, Version 2.0 (the
 * "License"); you may not use this file except in compliance
 * with the License.  You may obtain a copy of the License at
 *
 *   http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing,
 * software distributed under the License is distributed on an
 * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
 * KIND, either express or implied.  See the License for the
 * specific language governing permissions and limitations
 * under the License.
-->
<document>
  <properties>
    <title>The BCEL API</title>
  </properties>

  <body>
    <section name="The BCEL API">
      <p>
        The <font face="helvetica,arial">BCEL</font> API abstracts from
        the low-level details of the Java Virtual Machine and of
        reading and writing binary Java class files. The API mainly consists
        of three parts:
      </p>

      <p>

      <ol type="1">
        <li> A package that contains classes that describe "static"
        constraints of class files, i.e., classes that reflect the class file
        format and are not intended for byte code modifications. The classes
        may be used to read and write class files from or to a file. This is
        especially useful for analyzing Java classes without having the
        source files at hand. The main data structure is called
        <tt>JavaClass</tt>, which contains methods, fields, etc.</li>

        <li> A package to dynamically generate or modify
        <tt>JavaClass</tt> or <tt>Method</tt> objects.
        It may be used to
        insert analysis code, to strip unnecessary information from class
        files, or to implement the code generator back-end of a Java
        compiler.</li>

        <li> Various code examples and utilities like a class file viewer,
        a tool to convert class files into HTML, and a converter from
        class files to the <a
        href="http://jasmin.sourceforge.net">Jasmin</a> assembly
        language.</li>
      </ol>
      </p>

    <subsection name="JavaClass">
      <p>
        The "static" component of the <font
        face="helvetica,arial">BCEL</font> API resides in the package
        <tt>org.apache.bcel.classfile</tt> and closely represents class
        files. All of the binary components and data structures declared
        in the <a
        href="http://docs.oracle.com/javase/specs/">JVM
        specification</a> and described in section <a
        href="#2 The Java Virtual Machine">2</a> are mapped to classes.

        <a href="#Figure 3">Figure 3</a> shows a UML diagram of the
        class hierarchy of the <font face="helvetica,arial">BCEL
        </font>API. <a href="#Figure 8">Figure 8</a> in the appendix also
        shows a detailed diagram of the <tt>ConstantPool</tt> components.
      </p>

      <p align="center">
        <a name="Figure 3">
          <img src="../images/javaclass.gif"/> <br/>
          Figure 3: UML diagram for the JavaClass API</a>
      </p>

      <p>
        The top-level data structure is <tt>JavaClass</tt>, which in most
        cases is created by a <tt>ClassParser</tt> object that is capable
        of parsing binary class files. A <tt>JavaClass</tt> object
        consists of fields, methods, and symbolic references to the
        super class and to the implemented interfaces.
      </p>

      <p>
        The constant pool serves as a central repository and is
        thus of outstanding importance for all components.
        <tt>ConstantPool</tt> objects contain a fixed-size array of
        <tt>Constant</tt> entries, which may be retrieved via the
        <tt>getConstant()</tt> method taking an integer index as argument.
        Indexes to the constant pool may be contained in instructions as
        well as in other components of a class file and in constant pool
        entries themselves.
      </p>

      <p>
        Methods and fields contain a signature, symbolically defining
        their types. Access flags like <tt>public static final</tt> occur
        in several places and are encoded by an integer bit mask, e.g.,
        <tt>public static final</tt> corresponds to the Java expression
      </p>


      <source>int access_flags = ACC_PUBLIC | ACC_STATIC | ACC_FINAL;</source>

      <p>
        As mentioned in <a href="jvm.html#Java_class_file_format">section
        2.1</a> already, several components may contain <em>attribute</em>
        objects: classes, fields, methods, and <tt>Code</tt> objects
        (introduced in <a href="jvm.html#Method_code">section 2.3</a>). The
        latter is an attribute itself that contains the actual byte code
        array, the maximum stack size, the number of local variables, a
        table of handled exceptions, and some optional debugging
        information coded as <tt>LineNumberTable</tt> and
        <tt>LocalVariableTable</tt> attributes. Attributes are in general
        specific to some data structure, i.e., no two components share the
        same kind of attribute, though this is not explicitly
        forbidden. In the figure the <tt>Attribute</tt> classes are stereotyped
        with the component they belong to.
      </p>

    </subsection>

    <subsection name="Class repository">
      <p>
        Using the provided <tt>Repository</tt> class, reading class files into
        a <tt>JavaClass</tt> object is quite simple:
      </p>

      <source>JavaClass clazz = Repository.lookupClass("java.lang.String");</source>

      <p>
        The repository also contains methods providing the dynamic equivalent
        of the <tt>instanceof</tt> operator, and other useful routines:
      </p>

      <source>
if (Repository.instanceOf(clazz, super_class)) {
  ...
}
      </source>

    </subsection>

    <h4>Accessing class file data</h4>

      <p>
        Information within the class file components may be accessed like
        Java Beans via intuitive set/get methods. All of them also define
        a <tt>toString()</tt> method so that implementing a simple class
        viewer is very easy. In fact all of the examples used here have
        been produced this way:
      </p>

      <source>
System.out.println(clazz);
printCode(clazz.getMethods());
...
// Print each method header followed by its byte code, if it has any
public static void printCode(Method[] methods) {
  for (Method method : methods) {
    System.out.println(method);

    Code code = method.getCode();
    if (code != null) { // Abstract and native methods have no code
      System.out.println(code);
    }
  }
}
      </source>

    <h4>Analyzing class data</h4>
      <p>
        Last but not least, <font face="helvetica,arial">BCEL</font>
        supports the <em>Visitor</em> design pattern, so one can write
        visitor objects to traverse and analyze the contents of a class
        file. Included in the distribution is a class
        <tt>JasminVisitor</tt> that converts class files into the <a
        href="http://jasmin.sourceforge.net">Jasmin</a>
        assembler language.
      </p>

    <subsection name="ClassGen">
      <p>
        This part of the API (package <tt>org.apache.bcel.generic</tt>)
        supplies an abstraction level for creating or transforming class
        files dynamically. It makes the static constraints of Java class
        files, like the hard-coded byte code addresses, "generic". The
        generic constant pool, for example, is implemented by the class
        <tt>ConstantPoolGen</tt>, which offers methods for adding different
        types of constants. Accordingly, <tt>ClassGen</tt> offers an
        interface to add methods, fields, and attributes.
        <a href="#Figure 4">Figure 4</a> gives an overview of this part of the API.
      </p>

      <p align="center">
        <a name="Figure 4">
          <img src="../images/classgen.gif"/>
          <br/>
          Figure 4: UML diagram of the ClassGen API</a>
      </p>

    <h4>Types</h4>
      <p>
        We abstract from the concrete details of the type signature syntax
        (see <a href="jvm.html#Type_information">2.5</a>) by introducing the
        <tt>Type</tt> class, which is used, for example, by methods to
        define their return and argument types. Concrete sub-classes are
        <tt>BasicType</tt>, <tt>ObjectType</tt>, and <tt>ArrayType</tt>,
        the last of which consists of the element type and the number of
        dimensions. For commonly used types the class offers some
        predefined constants. For example, the method signature of the
        <tt>main</tt> method as shown in
        <a href="jvm.html#Type_information">section 2.5</a> is represented by:
      </p>

      <source>
Type   return_type = Type.VOID;
Type[] arg_types   = new Type[] { new ArrayType(Type.STRING, 1) };
      </source>

      <p>
        <tt>Type</tt> also contains methods to convert types into textual
        signatures and vice versa. The sub-classes contain implementations
        of the routines and constraints specified by the Java Language
        Specification.
      </p>

    <h4>Generic fields and methods</h4>
      <p>
        Fields are represented by <tt>FieldGen</tt> objects, which may be
        freely modified by the user. If they have the access rights
        <tt>static final</tt>, i.e., are constants and of basic type, they
        may optionally have an initializing value.
      </p>

      <p>
        Generic methods offer methods to add the exceptions a method may
        throw, local variables, and exception handlers. The latter two are
        represented by user-configurable objects as well. Because
        exception handlers and local variables contain references to byte
        code addresses, they also take the role of an <em>instruction
        targeter</em> in our terminology.
        Instruction targeters contain a
        method <tt>updateTarget()</tt> to redirect a reference. This is
        somewhat related to the Observer design pattern. Generic
        (non-abstract) methods refer to <em>instruction lists</em> that
        consist of instruction objects. References to byte code addresses
        are implemented by handles to instruction objects. If the list is
        updated, the instruction targeters will be informed about it. This
        is explained in more detail in the following sections.
      </p>

      <p>
        The maximum stack size needed by the method and the maximum number
        of local variables used may be set manually or computed
        automatically via the <tt>setMaxStack()</tt> and
        <tt>setMaxLocals()</tt> methods.
      </p>

    <h4>Instructions</h4>
      <p>
        Modeling instructions as objects may look somewhat odd at first
        sight, but in fact enables programmers to obtain a high-level view
        of control flow without handling details like concrete byte code
        offsets. Instructions consist of an opcode (sometimes called
        tag), their length in bytes, and an offset (or index) within the
        byte code. Since many instructions are immutable (stack operators,
        e.g.), the <tt>InstructionConstants</tt> interface offers
        shareable predefined "fly-weight" constants to use.
      </p>

      <p>
        Instructions are grouped via sub-classing; the type hierarchy of
        instruction classes is illustrated by an (incomplete) figure in the
        appendix. The most important family of instructions is the
        <em>branch instructions</em>, e.g., <tt>goto</tt>, that branch to
        targets somewhere within the byte code. Obviously, this makes them
        candidates for playing an <tt>InstructionTargeter</tt> role,
        too.
        Instructions are further grouped by the interfaces they
        implement; there are, e.g., <tt>TypedInstruction</tt>s that are
        associated with a specific type like <tt>ldc</tt>, or
        <tt>ExceptionThrower</tt> instructions that may raise exceptions
        when executed.
      </p>

      <p>
        All instructions can be traversed via <tt>accept(Visitor v)</tt>
        methods, i.e., the Visitor design pattern. There is, however, a
        special trick in these methods that allows the handling of
        certain instruction groups to be merged. The <tt>accept()</tt>
        methods do not only call the corresponding <tt>visit()</tt> method,
        but also call the <tt>visit()</tt> methods of their respective super
        classes and implemented interfaces first, i.e., the most specific
        <tt>visit()</tt> call is last. Thus one can group the handling of,
        say, all <tt>BranchInstruction</tt>s into one single method.
      </p>

      <p>
        For debugging purposes it may even make sense to "invent" your own
        instructions. In a sophisticated code generator like the one used
        as a backend of the <a href="http://barat.sourceforge.net">Barat
        framework</a> for static analysis one often has to insert
        temporary <tt>nop</tt> (No operation) instructions. When examining
        the produced code it may be very difficult to track back where the
        <tt>nop</tt> was actually inserted. One could think of a derived
        <tt>nop2</tt> instruction that contains additional debugging
        information. When the instruction list is dumped to byte code, the
        extra data is simply dropped.
      </p>

      <p>
        One could also think of new byte code instructions operating on
        complex numbers that are replaced by normal byte code at
        load time or are recognized by a new JVM.
      </p>

    <h4>Instruction lists</h4>
      <p>
        An <em>instruction list</em> is implemented by a list of
        <em>instruction handles</em> encapsulating instruction objects.
        References to instructions in the list are thus not implemented by
        direct pointers to instructions but by pointers to instruction
        <em>handles</em>. This makes appending, inserting, and deleting
        areas of code very simple and also allows us to reuse immutable
        instruction objects (fly-weight objects). Since we use symbolic
        references, computation of concrete byte code offsets does not
        need to occur until finalization, i.e., until the user has
        finished the process of generating or transforming code. We will
        use the terms instruction handle and instruction synonymously
        throughout the rest of the paper. Instruction handles may contain
        additional user-defined data using the <tt>addAttribute()</tt>
        method.
      </p>

      <p>
        <b>Appending:</b> One can append instructions or other instruction
        lists anywhere in an existing list. The instructions are appended
        after the given instruction handle. All append methods return a
        new instruction handle which may then be used as the target of a
        branch instruction, e.g.:
      </p>

      <source>
InstructionList il = new InstructionList();
...
GOTO g = new GOTO(null); // Target is not known yet
il.append(g);
...
// Use immutable fly-weight object
InstructionHandle ih = il.append(InstructionConstants.ACONST_NULL);
g.setTarget(ih);
      </source>

      <p>
        <b>Inserting:</b> Instructions may be inserted anywhere into an
        existing list. They are inserted before the given instruction
        handle. All insert methods return a new instruction handle which
        may then be used as the start address of an exception handler, for
        example.
      </p>

      <source>
InstructionHandle start = il.insert(insertion_point, InstructionConstants.NOP);
...
// The caught exception type is passed as an ObjectType
mg.addExceptionHandler(start, end, handler, new ObjectType("java.io.IOException"));
      </source>

      <p>
        <b>Deleting:</b> Deletion of instructions is also very
        straightforward; all instruction handles and the contained
        instructions within a given range are removed from the instruction
        list and disposed. The <tt>delete()</tt> method may however throw
        a <tt>TargetLostException</tt> when there are instruction
        targeters still referencing one of the deleted instructions. The
        user is forced to handle such exceptions in a <tt>try-catch</tt>
        clause and redirect these references elsewhere. The <em>peep
        hole</em> optimizer described in the appendix gives a detailed
        example of this.
      </p>

      <source>
try {
  il.delete(first, last);
} catch (TargetLostException e) {
  // Redirect every targeter of a deleted instruction to new_target
  for (InstructionHandle target : e.getTargets()) {
    for (InstructionTargeter targeter : target.getTargeters()) {
      targeter.updateTarget(target, new_target);
    }
  }
}
      </source>

      <p>
        <b>Finalizing:</b> When the instruction list is ready to be dumped
        to pure byte code, all symbolic references must be mapped to real
        byte code offsets. This is done by the <tt>getByteCode()</tt>
        method, which is called by default by
        <tt>MethodGen.getMethod()</tt>. Afterwards you should call
        <tt>dispose()</tt> so that the instruction handles can be reused
        internally. This helps to improve memory usage.
      </p>

      <source>
InstructionList il = new InstructionList();

ClassGen  cg = new ClassGen("HelloWorld", "java.lang.Object",
                            "&lt;generated&gt;", ACC_PUBLIC | ACC_SUPER, null);
ConstantPoolGen cp = cg.getConstantPool(); // Constant pool of the new class
MethodGen mg = new MethodGen(ACC_STATIC | ACC_PUBLIC,
                             Type.VOID, new Type[] { new ArrayType(Type.STRING, 1) },
                             new String[] { "argv" }, "main", "HelloWorld", il, cp);
...
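// Sketch (an assumption about the elided steps): once all instructions have
// been appended to il, the stack and local-variable limits can be computed
// automatically instead of being set by hand.
mg.setMaxStack();
mg.setMaxLocals();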
cg.addMethod(mg.getMethod());
il.dispose(); // Reuse instruction handles of list
      </source>

    <h4>Code example revisited</h4>
      <p>
        Using instruction lists gives us a generic view of the code: In
        <a href="#Figure 5">Figure 5</a> we again present the code chunk
        of the <tt>readInt()</tt> method of the factorial example in section
        <a href="jvm.html#Code_example">2.6</a>: The local variables
        <tt>n</tt> and <tt>e1</tt> both hold two references to
        instructions, defining their scope. There are two <tt>goto</tt>s
        branching to the <tt>iload</tt> at the end of the method. One of
        the exception handlers is displayed, too: it references the start
        and the end of the <tt>try</tt> block and also the exception
        handler code.
      </p>

      <p align="center">
        <a name="Figure 5">
          <img src="../images/il.gif"/>
          <br/>
          Figure 5: Instruction list for <tt>readInt()</tt> method</a>
      </p>

    <h4>Instruction factories</h4>
      <p>
        To simplify the creation of certain instructions the user can use
        the supplied <tt>InstructionFactory</tt> class, which offers a lot
        of useful methods to create instructions from
        scratch. Alternatively, they can also use <em>compound
        instructions</em>: When producing byte code, some patterns
        typically occur very frequently, for instance the compilation of
        arithmetic or comparison expressions. You certainly do not want
        to rewrite the code that translates such expressions into byte
        code in every place they may appear. In order to support this, the
        <font face="helvetica,arial">BCEL</font> API includes a <em>compound
        instruction</em> (an interface with a single
        <tt>getInstructionList()</tt> method). Instances of classes
        implementing this interface may be used in any place where normal
        instructions would occur, particularly in append operations.
      </p>

      <p>
        <b>Example: Pushing constants</b> Pushing constants onto the
        operand stack may be coded in different ways. As explained in <a
        href="jvm.html#Byte_code_instruction_set">section 2.2</a> there are
        some "short-cut" instructions that can be used to make the
        produced byte code more compact. The smallest instruction to push
        a single <tt>1</tt> onto the stack is <tt>iconst_1</tt>; other
        possibilities are <tt>bipush</tt> (can be used to push values
        between -128 and 127), <tt>sipush</tt> (between -32768 and 32767),
        or <tt>ldc</tt> (load constant from constant pool).
      </p>

      <p>
        Instead of repeatedly selecting the most compact instruction in,
        say, a switch, one can use the compound <tt>PUSH</tt> instruction
        whenever pushing a constant number or string. It will produce the
        appropriate byte code instruction and insert entries into the
        constant pool if necessary.
      </p>

      <source>
InstructionFactory f  = new InstructionFactory(class_gen);
InstructionList    il = new InstructionList();
...
il.append(new PUSH(cp, "Hello, world")); // String constant from the constant pool
il.append(new PUSH(cp, 4711));           // Most compact instruction for an int
...
il.append(f.createPrintln("Hello World"));
...
il.append(f.createReturn(type));
      </source>

    <h4>Code patterns using regular expressions</h4>
      <p>
        When transforming code, for instance during optimization or when
        inserting analysis method calls, one typically searches for
        certain patterns of code at which to perform the transformation. To
        simplify handling such situations <font
        face="helvetica,arial">BCEL </font>introduces a special feature:
        One can search for given code patterns within an instruction list
        using <em>regular expressions</em>. In such expressions,
        instructions are represented by their opcode names, e.g.,
        <tt>LDC</tt>; one may also use their respective super classes, e.g.,
        "<tt>IfInstruction</tt>".
        Meta characters like <tt>+</tt>,
        <tt>*</tt>, and <tt>(..|..)</tt> have their usual meanings. Thus,
        the expression
      </p>

      <source>"NOP+(ILOAD|ALOAD)*"</source>

      <p>
        represents a piece of code consisting of at least one <tt>NOP</tt>
        followed by a possibly empty sequence of <tt>ILOAD</tt> and
        <tt>ALOAD</tt> instructions.
      </p>

      <p>
        The <tt>search()</tt> method of class
        <tt>org.apache.bcel.util.InstructionFinder</tt> takes a regular
        expression and a starting point as arguments and returns an
        iterator over the matched areas of instructions. Additional
        constraints on the matching area of instructions, which cannot be
        implemented via regular expressions, may be expressed via <em>code
        constraint</em> objects.
      </p>

    <h4>Example: Optimizing boolean expressions</h4>
      <p>
        In Java, the boolean values true and false are mapped to 1 and 0,
        respectively. Thus, the simplest way to evaluate boolean
        expressions is to push a 1 or a 0 onto the operand stack depending
        on the truth value of the expression. But this way, the
        subsequent combination of boolean expressions (with
        <tt>&amp;&amp;</tt>, e.g.) yields long chunks of code that push
        lots of 1s and 0s onto the stack.
      </p>

      <p>
        When the code has been finalized these chunks can be optimized
        with a <em>peep hole</em> algorithm: An <tt>IfInstruction</tt>
        (e.g.,
        the comparison of two integers: <tt>if_icmpeq</tt>) that
        either produces a 1 or a 0 on the stack and is followed by an
        <tt>ifne</tt> instruction (branch if stack value is not 0) may be
        replaced by the <tt>IfInstruction</tt> with its branch target
        replaced by the target of the <tt>ifne</tt> instruction:
      </p>

      <source>
CodeConstraint constraint = new CodeConstraint() {
  public boolean checkCode(InstructionHandle[] match) {
    IfInstruction if1 = (IfInstruction) match[0].getInstruction();
    GOTO g = (GOTO) match[2].getInstruction();
    // The branches must stay within the matched pattern
    return (if1.getTarget() == match[3]) &amp;&amp;
      (g.getTarget() == match[4]);
  }
};

InstructionFinder f = new InstructionFinder(il);
String pat = "IfInstruction ICONST_0 GOTO ICONST_1 NOP(IFEQ|IFNE)";

for (Iterator e = f.search(pat, constraint); e.hasNext(); ) {
  InstructionHandle[] match = (InstructionHandle[]) e.next();
  ...
  // Update target: handles of branch instructions are BranchHandles
  ((BranchHandle) match[0]).setTarget(((BranchHandle) match[5]).getTarget());
  ...
  try {
    il.delete(match[1], match[5]);
  } catch (TargetLostException ex) {
    ...
  }
}
      </source>

      <p>
        The applied code constraint object ensures that the matched code
        really corresponds to the targeted expression pattern. Repeated
        application of this algorithm removes all unnecessary stack
        operations and branch instructions from the byte code. If any of
        the deleted instructions is still referenced by an
        <tt>InstructionTargeter</tt> object, the reference has to be
        updated in the <tt>catch</tt>-clause.
      </p>

      <p>
        <b>Example application:</b>
        The expression:
      </p>

      <source>
  if ((a == null) || (i &lt; 2))
    System.out.println("Ooops");
      </source>

      <p>
        can be mapped to both of the chunks of byte code shown in <a
        href="#Figure 6">Figure 6</a>.
        The left column represents the
        unoptimized code while the right column displays the same code
        after the peep hole algorithm has been applied:
      </p>

      <p align="center"><a name="Figure 6">
        <table>
          <tr>
            <td valign="top"><pre>
5:  aload_0
6:  ifnull        #13
9:  iconst_0
10: goto          #14
13: iconst_1
14: nop
15: ifne          #36
18: iload_1
19: iconst_2
20: if_icmplt     #27
23: iconst_0
24: goto          #28
27: iconst_1
28: nop
29: ifne          #36
32: iconst_0
33: goto          #37
36: iconst_1
37: nop
38: ifeq          #52
41: getstatic     System.out
44: ldc           "Ooops"
46: invokevirtual println
52: return
            </pre></td>
            <td valign="top"><pre>
10: aload_0
11: ifnull        #19
14: iload_1
15: iconst_2
16: if_icmpge     #27
19: getstatic     System.out
22: ldc           "Ooops"
24: invokevirtual println
27: return
            </pre></td>
          </tr>
        </table>
       </a>
      </p>
    </subsection>
    </section>
  </body>
</document>