1<?xml version="1.0"?>
2<!--
3    * Licensed to the Apache Software Foundation (ASF) under one
4    * or more contributor license agreements.  See the NOTICE file
5    * distributed with this work for additional information
6    * regarding copyright ownership.  The ASF licenses this file
7    * to you under the Apache License, Version 2.0 (the
8    * "License"); you may not use this file except in compliance
9    * with the License.  You may obtain a copy of the License at
10    *
11    *   http://www.apache.org/licenses/LICENSE-2.0
12    *
13    * Unless required by applicable law or agreed to in writing,
14    * software distributed under the License is distributed on an
15    * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
16    * KIND, either express or implied.  See the License for the
17    * specific language governing permissions and limitations
18    * under the License.
19-->
20<document>
21  <properties>
22    <title>The BCEL API</title>
23  </properties>
24
25  <body>
26    <section name="The BCEL API">
27      <p>
        The <font face="helvetica,arial">BCEL</font> API abstracts from
        the concrete details of the Java Virtual Machine and of how
        binary Java class files are read and written. The API consists
        mainly of three parts:
32      </p>
33
34      <p>
35
36        <ol type="1">
          <li> A package that contains classes describing the "static"
            constraints of class files, i.e., it reflects the class file
            format and is not intended for byte code modifications. The
            classes may be used to read class files from, and write them
            to, a file. This is especially useful for analyzing Java
            classes without having the source files at hand. The main
            data structure is called <tt>JavaClass</tt>, which contains
            methods, fields, etc.</li>
44
45          <li> A package to dynamically generate or modify
46            <tt>JavaClass</tt> or <tt>Method</tt> objects.  It may be used to
47            insert analysis code, to strip unnecessary information from class
48            files, or to implement the code generator back-end of a Java
49            compiler.</li>
50
51          <li> Various code examples and utilities like a class file viewer,
52            a tool to convert class files into HTML, and a converter from
53            class files to the <a
54                    href="http://jasmin.sourceforge.net">Jasmin</a> assembly
55            language.</li>
56        </ol>
57      </p>
58
59    <subsection name="JavaClass">
60      <p>
61        The "static" component of the <font
62              face="helvetica,arial">BCEL</font> API resides in the package
63        <tt>org.apache.bcel.classfile</tt> and closely represents class
64        files. All of the binary components and data structures declared
65        in the <a
66              href="http://docs.oracle.com/javase/specs/">JVM
67        specification</a> and described in section <a
68              href="#2 The Java Virtual Machine">2</a> are mapped to classes.
69
        <a href="#Figure 3">Figure 3</a> shows a UML diagram of the
        class hierarchy of the <font face="helvetica,arial">BCEL</font>
        API. <a href="#Figure 8">Figure 8</a> in the appendix also
        shows a detailed diagram of the <tt>ConstantPool</tt> components.
74      </p>
75
76      <p align="center">
77        <a name="Figure 3">
78          <img src="../images/javaclass.gif"/> <br/>
79          Figure 3: UML diagram for the JavaClass API</a>
80      </p>
81
82      <p>
        The top-level data structure is <tt>JavaClass</tt>, which in most
        cases is created by a <tt>ClassParser</tt> object that is capable
        of parsing binary class files. A <tt>JavaClass</tt> object
        basically consists of fields, methods, and symbolic references to
        the super class and to the implemented interfaces.
88      </p>
89
90      <p>
        The constant pool serves as a central repository and is thus
        important for all components.
        <tt>ConstantPool</tt> objects contain a fixed-size array of
        <tt>Constant</tt> entries, which may be retrieved via the
        <tt>getConstant()</tt> method, which takes an integer index as
        its argument. Indexes into the constant pool may be contained in
        instructions as well as in other components of a class file and
        in constant pool entries themselves.
99      </p>
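
      <p>
        The way entries refer to one another by index can be pictured
        with a small stand-alone sketch. It does not use
        <font face="helvetica,arial">BCEL</font>; the class
        <tt>ConstantPoolSketch</tt>, its <tt>Entry</tt> type and the
        <tt>resolve()</tt> helper are hypothetical names chosen for
        illustration, and only the method name <tt>getConstant()</tt>
        is borrowed from the text above.
      </p>

```java
final class ConstantPoolSketch {
    /** Hypothetical entry: either literal text or an index of another entry. */
    static final class Entry {
        final String text;    // non-null for a text (UTF-8 style) entry
        final int nameIndex;  // only meaningful when text is null

        Entry(String text, int nameIndex) {
            this.text = text;
            this.nameIndex = nameIndex;
        }
    }

    private final Entry[] entries;  // fixed size, as in a class file

    ConstantPoolSketch(Entry[] entries) {
        this.entries = entries;
    }

    Entry getConstant(int index) {
        return entries[index];
    }

    /** Follows index references until a text entry is reached. */
    String resolve(int index) {
        Entry e = getConstant(index);
        while (e.text == null) {
            e = getConstant(e.nameIndex);
        }
        return e.text;
    }

    /** Builds a three-slot pool; slot 0 is unused, as in class files. */
    static ConstantPoolSketch example() {
        return new ConstantPoolSketch(new Entry[] {
            null,
            new Entry("java/lang/String", 0),  // 1: text entry
            new Entry(null, 1)                 // 2: class-like entry naming entry 1
        });
    }
}
```

      <p>
        In this sketch, <tt>example().resolve(2)</tt> follows the index
        stored in entry 2 to the text held by entry 1.
      </p>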
100
101      <p>
        Methods and fields contain a signature that symbolically defines
        their types.  Access flags like <tt>public static final</tt> occur
        in several places and are encoded by an integer bit mask, e.g.,
        <tt>public static final</tt> corresponds to the Java expression
106      </p>
107
108
109      <source>int access_flags = ACC_PUBLIC | ACC_STATIC | ACC_FINAL;</source>
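
      <p>
        As a rough illustration of the bit mask, the following sketch
        builds the same value and decodes it again with the JDK class
        <tt>java.lang.reflect.Modifier</tt>, whose constants coincide
        with the class file values for these three flags. The class
        name <tt>AccessFlagsSketch</tt> is an assumption made for this
        example; the constants are declared locally rather than taken
        from <font face="helvetica,arial">BCEL</font>.
      </p>

```java
import java.lang.reflect.Modifier;

final class AccessFlagsSketch {
    // Values the class file format assigns to these three flags.
    static final int ACC_PUBLIC = 0x0001;
    static final int ACC_STATIC = 0x0008;
    static final int ACC_FINAL  = 0x0010;

    /** Combines the three flags into one bit mask. */
    static int publicStaticFinal() {
        return ACC_PUBLIC | ACC_STATIC | ACC_FINAL;
    }

    /** Turns a bit mask back into Java modifier keywords. */
    static String describe(int accessFlags) {
        return Modifier.toString(accessFlags);
    }

    public static void main(String[] args) {
        int flags = publicStaticFinal();
        System.out.println(describe(flags));          // modifier keywords
        System.out.println(Modifier.isStatic(flags)); // tests a single bit
    }
}
```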
110
111      <p>
112        As mentioned in <a href="jvm.html#Java_class_file_format">section
113        2.1</a> already, several components may contain <em>attribute</em>
114        objects: classes, fields, methods, and <tt>Code</tt> objects
115        (introduced in <a href="jvm.html#Method_code">section 2.3</a>).  The
116        latter is an attribute itself that contains the actual byte code
117        array, the maximum stack size, the number of local variables, a
118        table of handled exceptions, and some optional debugging
119        information coded as <tt>LineNumberTable</tt> and
120        <tt>LocalVariableTable</tt> attributes. Attributes are in general
121        specific to some data structure, i.e., no two components share the
122        same kind of attribute, though this is not explicitly
123        forbidden. In the figure the <tt>Attribute</tt> classes are stereotyped
124        with the component they belong to.
125      </p>
126
127    </subsection>
128
129    <subsection name="Class repository">
130      <p>
131        Using the provided <tt>Repository</tt> class, reading class files into
132        a <tt>JavaClass</tt> object is quite simple:
133      </p>
134
135      <source>JavaClass clazz = Repository.lookupClass("java.lang.String");</source>
136
137      <p>
138        The repository also contains methods providing the dynamic equivalent
139        of the <tt>instanceof</tt> operator, and other useful routines:
140      </p>
141
142      <source>
if (Repository.instanceOf(clazz, super_class)) {
    ...
}
146      </source>
147
148    </subsection>
149
150    <h4>Accessing class file data</h4>
151
152      <p>
153        Information within the class file components may be accessed like
154        Java Beans via intuitive set/get methods. All of them also define
155        a <tt>toString()</tt> method so that implementing a simple class
156        viewer is very easy. In fact all of the examples used here have
157        been produced this way:
158      </p>
159
160      <source>
System.out.println(clazz);
printCode(clazz.getMethods());
...
public static void printCode(Method[] methods) {
    for (int i = 0; i &lt; methods.length; i++) {
        System.out.println(methods[i]);

        Code code = methods[i].getCode();
        if (code != null) { // null for abstract and native methods
            System.out.println(code);
        }
    }
}
173      </source>
174
175    <h4>Analyzing class data</h4>
176      <p>
177        Last but not least, <font face="helvetica,arial">BCEL</font>
178        supports the <em>Visitor</em> design pattern, so one can write
179        visitor objects to traverse and analyze the contents of a class
180        file. Included in the distribution is a class
181        <tt>JasminVisitor</tt> that converts class files into the <a
182              href="http://jasmin.sourceforge.net">Jasmin</a>
183        assembler language.
184      </p>
185
186    <subsection name="ClassGen">
187      <p>
188        This part of the API (package <tt>org.apache.bcel.generic</tt>)
189        supplies an abstraction level for creating or transforming class
190        files dynamically. It makes the static constraints of Java class
191        files like the hard-coded byte code addresses "generic". The
192        generic constant pool, for example, is implemented by the class
193        <tt>ConstantPoolGen</tt> which offers methods for adding different
194        types of constants. Accordingly, <tt>ClassGen</tt> offers an
195        interface to add methods, fields, and attributes.
196        <a href="#Figure 4">Figure 4</a> gives an overview of this part of the API.
197      </p>
198
199      <p align="center">
200        <a name="Figure 4">
201          <img src="../images/classgen.gif"/>
202          <br/>
203          Figure 4: UML diagram of the ClassGen API</a>
204      </p>
205
206    <h4>Types</h4>
207      <p>
208        We abstract from the concrete details of the type signature syntax
209        (see <a href="jvm.html#Type_information">2.5</a>) by introducing the
210        <tt>Type</tt> class, which is used, for example, by methods to
        define their return and argument types. Concrete sub-classes are
        <tt>BasicType</tt>, <tt>ObjectType</tt>, and <tt>ArrayType</tt>,
        the last of which consists of the element type and the number of
        dimensions. For commonly used types the class offers some
215        predefined constants. For example, the method signature of the
216        <tt>main</tt> method as shown in
217        <a href="jvm.html#Type_information">section 2.5</a> is represented by:
218      </p>
219
220      <source>
Type return_type = Type.VOID;
Type[] arg_types = new Type[] { new ArrayType(Type.STRING, 1) };
223      </source>
224
225      <p>
226        <tt>Type</tt> also contains methods to convert types into textual
227        signatures and vice versa. The sub-classes contain implementations
228        of the routines and constraints specified by the Java Language
229        Specification.
230      </p>
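
      <p>
        The round trip between types and textual signatures can be
        tried out without <font face="helvetica,arial">BCEL</font>. In
        the sketch below the JDK class
        <tt>java.lang.invoke.MethodType</tt> stands in for
        <tt>Type</tt>; it is an analogy only, not the
        <font face="helvetica,arial">BCEL</font> implementation, and
        <tt>DescriptorSketch</tt> is a hypothetical name.
      </p>

```java
import java.lang.invoke.MethodType;

final class DescriptorSketch {
    /** Signature text of a method shaped like main: takes String[], returns void. */
    static String mainDescriptor() {
        return MethodType.methodType(void.class, String[].class)
                         .toMethodDescriptorString();
    }

    /** Parses a signature string back into return and parameter types. */
    static MethodType parse(String descriptor) {
        return MethodType.fromMethodDescriptorString(
                descriptor, DescriptorSketch.class.getClassLoader());
    }

    public static void main(String[] args) {
        String text = mainDescriptor();
        MethodType parsed = parse(text);
        System.out.println(text);
        System.out.println(parsed.returnType() + " " + parsed.parameterType(0));
    }
}
```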
231
232    <h4>Generic fields and methods</h4>
233      <p>
234        Fields are represented by <tt>FieldGen</tt> objects, which may be
235        freely modified by the user. If they have the access rights
236        <tt>static final</tt>, i.e., are constants and of basic type, they
237        may optionally have an initializing value.
238      </p>
239
240      <p>
241        Generic methods contain methods to add exceptions the method may
242        throw, local variables, and exception handlers. The latter two are
243        represented by user-configurable objects as well. Because
244        exception handlers and local variables contain references to byte
245        code addresses, they also take the role of an <em>instruction
246        targeter</em> in our terminology. Instruction targeters contain a
247        method <tt>updateTarget()</tt> to redirect a reference. This is
248        somewhat related to the Observer design pattern. Generic
249        (non-abstract) methods refer to <em>instruction lists</em> that
250        consist of instruction objects. References to byte code addresses
251        are implemented by handles to instruction objects. If the list is
252        updated the instruction targeters will be informed about it. This
253        is explained in more detail in the following sections.
254      </p>
255
256      <p>
        The maximum stack size needed by the method and the maximum number
        of local variables used may be set manually or computed
        automatically via the <tt>setMaxStack()</tt> and
        <tt>setMaxLocals()</tt> methods.
261      </p>
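
      <p>
        To give an idea of what such a computation involves, here is a
        deliberately simplified sketch for straight-line code. It
        assumes that the net stack effect of every instruction is
        already known and it ignores branches and exception handlers,
        which a real implementation has to follow. The name
        <tt>MaxStackSketch</tt> is hypothetical and the code is not
        taken from <font face="helvetica,arial">BCEL</font>.
      </p>

```java
final class MaxStackSketch {
    /**
     * deltas[i] is the net stack change of instruction i:
     * +1 for a push of one slot, -1 for a pop of one slot, and so on.
     * Returns the deepest stack level reached along the sequence.
     */
    static int maxStack(int[] deltas) {
        int depth = 0;
        int max = 0;
        for (int delta : deltas) {
            depth += delta;
            max = Math.max(max, depth);
        }
        return max;
    }

    public static void main(String[] args) {
        // iload_1 (+1), iconst_2 (+1), iadd (-1), istore_2 (-1)
        System.out.println(maxStack(new int[] { 1, 1, -1, -1 }));
    }
}
```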
262
263    <h4>Instructions</h4>
264      <p>
        Modeling instructions as objects may look somewhat odd at first
        sight, but in fact it enables programmers to obtain a high-level
        view of control flow without handling details like concrete byte
        code offsets.  Instructions consist of an opcode (sometimes called
        tag), their length in bytes and an offset (or index) within the
        byte code. Since many instructions are immutable (stack operators,
        for example), the <tt>InstructionConstants</tt> interface offers
        shareable predefined "fly-weight" constants to use.
273      </p>
274
275      <p>
        Instructions are grouped via sub-classing; the type hierarchy of
        instruction classes is illustrated by an (incomplete) figure in the
        appendix. The most important family of instructions are the
        <em>branch instructions</em>, e.g., <tt>goto</tt>, that branch to
        targets somewhere within the byte code. Obviously, this makes them
        candidates for playing an <tt>InstructionTargeter</tt> role,
        too. Instructions are further grouped by the interfaces they
        implement: there are, e.g., <tt>TypedInstruction</tt>s that are
        associated with a specific type, like <tt>ldc</tt>, or
        <tt>ExceptionThrower</tt> instructions that may raise exceptions
        when executed.
287      </p>
288
289      <p>
        All instructions can be traversed via <tt>accept(Visitor v)</tt>
        methods, i.e., the Visitor design pattern. There is, however, a
        special trick in these methods that allows the handling of
        certain instruction groups to be merged. The <tt>accept()</tt>
        methods do not only call the corresponding <tt>visit()</tt>
        method, but call the <tt>visit()</tt> methods of their respective
        super classes and implemented interfaces first, i.e., the most
        specific <tt>visit()</tt> call is last. Thus one can group the
        handling of, say, all <tt>BranchInstruction</tt>s into one single
        method.
299      </p>
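
      <p>
        The calling order described above can be sketched with a
        two-class hierarchy. The types <tt>Branch</tt>, <tt>Goto</tt>
        and <tt>Visitor</tt> below are hypothetical miniatures written
        for this example, not the
        <font face="helvetica,arial">BCEL</font> classes; they only
        show the general callback running before the most specific one.
      </p>

```java
final class VisitorOrderSketch {
    /** Hypothetical visitor with one method per class in the hierarchy. */
    interface Visitor {
        void visitBranch(Branch b);
        void visitGoto(Goto g);
    }

    static class Branch {
        void accept(Visitor v) {
            v.visitBranch(this);
        }
    }

    static class Goto extends Branch {
        @Override
        void accept(Visitor v) {
            v.visitBranch(this);  // general handler first
            v.visitGoto(this);    // most specific handler last
        }
    }

    /** Records the order of the callbacks triggered by a single Goto. */
    static String trace() {
        StringBuilder log = new StringBuilder();
        new Goto().accept(new Visitor() {
            public void visitBranch(Branch b) { log.append("branch "); }
            public void visitGoto(Goto g)     { log.append("goto"); }
        });
        return log.toString();
    }
}
```

      <p>
        A visitor that only cares about branches in general would fill
        in <tt>visitBranch()</tt> and leave <tt>visitGoto()</tt> empty.
      </p>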
300
301      <p>
302        For debugging purposes it may even make sense to "invent" your own
303        instructions. In a sophisticated code generator like the one used
304        as a backend of the <a href="http://barat.sourceforge.net">Barat
305        framework</a> for static analysis one often has to insert
306        temporary <tt>nop</tt> (No operation) instructions. When examining
307        the produced code it may be very difficult to track back where the
308        <tt>nop</tt> was actually inserted. One could think of a derived
309        <tt>nop2</tt> instruction that contains additional debugging
310        information. When the instruction list is dumped to byte code, the
311        extra data is simply dropped.
312      </p>
313
314      <p>
315        One could also think of new byte code instructions operating on
316        complex numbers that are replaced by normal byte code upon
317        load-time or are recognized by a new JVM.
318      </p>
319
320    <h4>Instruction lists</h4>
321      <p>
322        An <em>instruction list</em> is implemented by a list of
323        <em>instruction handles</em> encapsulating instruction objects.
324        References to instructions in the list are thus not implemented by
325        direct pointers to instructions but by pointers to instruction
326        <em>handles</em>. This makes appending, inserting and deleting
327        areas of code very simple and also allows us to reuse immutable
328        instruction objects (fly-weight objects). Since we use symbolic
329        references, computation of concrete byte code offsets does not
330        need to occur until finalization, i.e., until the user has
331        finished the process of generating or transforming code. We will
332        use the term instruction handle and instruction synonymously
333        throughout the rest of the paper. Instruction handles may contain
334        additional user-defined data using the <tt>addAttribute()</tt>
335        method.
336      </p>
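
      <p>
        Why handles make editing cheap can be seen in a minimal
        stand-alone sketch. The classes <tt>HandleListSketch</tt> and
        <tt>Handle</tt> and their methods are hypothetical and far
        simpler than the <font face="helvetica,arial">BCEL</font>
        classes: a branch stores a reference to a handle, and byte
        offsets are assigned only in a final pass, so an instruction
        inserted later needs no fix-up of the branch.
      </p>

```java
final class HandleListSketch {
    /** Hypothetical handle: wraps one instruction and links to its successor. */
    static final class Handle {
        final String opcode;
        final int length;   // size of the instruction in bytes
        Handle next;
        Handle target;      // symbolic branch target, or null
        int offset = -1;    // unknown until setPositions() has run

        Handle(String opcode, int length) {
            this.opcode = opcode;
            this.length = length;
        }
    }

    private Handle first;
    private Handle last;

    /** Appends an instruction at the end and returns its handle. */
    Handle append(String opcode, int length) {
        Handle h = new Handle(opcode, length);
        if (first == null) {
            first = h;
        } else {
            last.next = h;
        }
        last = h;
        return h;
    }

    /** Inserts a new instruction directly after the given handle. */
    Handle insertAfter(Handle where, String opcode, int length) {
        Handle h = new Handle(opcode, length);
        h.next = where.next;
        where.next = h;
        if (where == last) {
            last = h;
        }
        return h;
    }

    /** Finalization: one pass over the list assigns concrete byte offsets. */
    void setPositions() {
        int offset = 0;
        for (Handle h = first; h != null; h = h.next) {
            h.offset = offset;
            offset += h.length;
        }
    }

    /** Returns the final offset of a branch target after a later insertion. */
    static int demo() {
        HandleListSketch il = new HandleListSketch();
        Handle g = il.append("goto", 3);
        Handle end = il.append("return", 1);
        g.target = end;               // symbolic reference, no offset needed yet
        il.insertAfter(g, "nop", 1);  // later edit moves "return" by one byte
        il.setPositions();
        return g.target.offset;       // 3 bytes (goto) + 1 byte (nop)
    }
}
```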
337
338      <p>
339        <b>Appending:</b> One can append instructions or other instruction
340        lists anywhere to an existing list. The instructions are appended
341        after the given instruction handle. All append methods return a
342        new instruction handle which may then be used as the target of a
343        branch instruction, e.g.:
344      </p>
345
346      <source>
InstructionList il = new InstructionList();
...
GOTO g = new GOTO(null);
il.append(g);
...
// Use immutable fly-weight object
InstructionHandle ih = il.append(InstructionConstants.ACONST_NULL);
g.setTarget(ih);
355      </source>
356
357      <p>
358        <b>Inserting:</b> Instructions may be inserted anywhere into an
359        existing list. They are inserted before the given instruction
360        handle. All insert methods return a new instruction handle which
361        may then be used as the start address of an exception handler, for
362        example.
363      </p>
364
365      <source>
InstructionHandle start = il.insert(insertion_point, InstructionConstants.NOP);
...
mg.addExceptionHandler(start, end, handler, "java.io.IOException");
369      </source>
370
371      <p>
        <b>Deleting:</b> Deletion of instructions is also very
        straightforward; all instruction handles and the contained
        instructions within a given range are removed from the instruction
        list and disposed of. The <tt>delete()</tt> method may, however,
        throw a <tt>TargetLostException</tt> when there are instruction
        targeters still referencing one of the deleted instructions. The
        user is forced to handle such exceptions in a <tt>try-catch</tt>
        clause and redirect these references elsewhere. The <em>peep
        hole</em> optimizer described in the appendix gives a detailed
        example of this.
382      </p>
383
384      <source>
try {
    il.delete(first, last);
} catch (TargetLostException e) {
    for (InstructionHandle target : e.getTargets()) {
        for (InstructionTargeter targeter : target.getTargeters()) {
            targeter.updateTarget(target, new_target);
        }
    }
}
394      </source>
395
396      <p>
397        <b>Finalizing:</b> When the instruction list is ready to be dumped
398        to pure byte code, all symbolic references must be mapped to real
399        byte code offsets. This is done by the <tt>getByteCode()</tt>
400        method which is called by default by
401        <tt>MethodGen.getMethod()</tt>. Afterwards you should call
402        <tt>dispose()</tt> so that the instruction handles can be reused
403        internally. This helps to improve memory usage.
404      </p>
405
406      <source>
InstructionList il = new InstructionList();

ClassGen  cg = new ClassGen("HelloWorld", "java.lang.Object",
        "&lt;generated&gt;", ACC_PUBLIC | ACC_SUPER, null);
ConstantPoolGen cp = cg.getConstantPool(); // pool shared by the class and its methods
MethodGen mg = new MethodGen(ACC_STATIC | ACC_PUBLIC,
        Type.VOID, new Type[] { new ArrayType(Type.STRING, 1) },
        new String[] { "argv" }, "main", "HelloWorld", il, cp);
...
cg.addMethod(mg.getMethod());
il.dispose(); // Reuse instruction handles of list
417      </source>
418
419    <h4>Code example revisited</h4>
420      <p>
421        Using instruction lists gives us a generic view upon the code: In
422        <a href="#Figure 5">Figure 5</a> we again present the code chunk
423        of the <tt>readInt()</tt> method of the factorial example in section
424        <a href="jvm.html#Code_example">2.6</a>: The local variables
425        <tt>n</tt> and <tt>e1</tt> both hold two references to
426        instructions, defining their scope.  There are two <tt>goto</tt>s
427        branching to the <tt>iload</tt> at the end of the method. One of
428        the exception handlers is displayed, too: it references the start
429        and the end of the <tt>try</tt> block and also the exception
430        handler code.
431      </p>
432
433      <p align="center">
434        <a name="Figure 5">
435          <img src="../images/il.gif"/>
436          <br/>
437          Figure 5: Instruction list for <tt>readInt()</tt> method</a>
438      </p>
439
440    <h4>Instruction factories</h4>
441      <p>
        To simplify the creation of certain instructions the user can use
        the supplied <tt>InstructionFactory</tt> class, which offers many
        useful methods to create instructions from
        scratch. Alternatively, one can also use <em>compound
        instructions</em>: When producing byte code, some patterns
        typically occur very frequently, for instance the compilation of
        arithmetic or comparison expressions. You certainly do not want
        to rewrite the code that translates such expressions into byte
        code in every place they may appear. In order to support this, the
        <font face="helvetica,arial">BCEL</font> API includes a <em>compound
        instruction</em> (an interface with a single
        <tt>getInstructionList()</tt> method). Instances of classes
        implementing this interface may be used in any place where normal
        instructions would occur, particularly in append operations.
456      </p>
457
458      <p>
        <b>Example: Pushing constants</b> Pushing constants onto the
        operand stack may be coded in different ways. As explained in <a
              href="jvm.html#Byte_code_instruction_set">section 2.2</a> there are
        some "short-cut" instructions that can be used to make the
        produced byte code more compact. The smallest instruction to push
        a single <tt>1</tt> onto the stack is <tt>iconst_1</tt>; other
        possibilities are <tt>bipush</tt> (can be used to push values
        between -128 and 127), <tt>sipush</tt> (between -32768 and 32767),
        or <tt>ldc</tt> (load constant from constant pool).
468      </p>
469
470      <p>
        Instead of repeatedly selecting the most compact instruction in,
        say, a switch, one can use the compound <tt>PUSH</tt> instruction
        whenever pushing a constant number or string. It will produce the
        appropriate byte code instruction and insert entries into the
        constant pool if necessary.
476      </p>
477
478      <source>
InstructionFactory f  = new InstructionFactory(class_gen);
InstructionList    il = new InstructionList();
...
il.append(new PUSH(cp, "Hello, world"));
il.append(new PUSH(cp, 4711));
...
il.append(f.createPrintln("Hello World"));
...
il.append(f.createReturn(type));
488      </source>
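
      <p>
        The selection that a compound push instruction has to make for
        integer constants can be sketched as follows. This is an
        illustration of the value ranges named above, not the code of
        the <tt>PUSH</tt> class; <tt>PushSketch</tt>,
        <tt>inRange()</tt> and <tt>select()</tt> are hypothetical
        names, and the result is a mnemonic string rather than an
        instruction object.
      </p>

```java
final class PushSketch {
    /** True if low is at most value and value is at most high. */
    static boolean inRange(int value, int low, int high) {
        if (value >= low) {
            return high >= value;
        }
        return false;
    }

    /** Returns the mnemonic of the most compact instruction pushing value. */
    static String select(int value) {
        if (inRange(value, -1, 5)) {
            // Dedicated one-byte opcodes exist for -1 through 5.
            return value == -1 ? "iconst_m1" : "iconst_" + value;
        }
        if (inRange(value, Byte.MIN_VALUE, Byte.MAX_VALUE)) {
            return "bipush " + value;   // one-byte operand
        }
        if (inRange(value, Short.MIN_VALUE, Short.MAX_VALUE)) {
            return "sipush " + value;   // two-byte operand
        }
        return "ldc " + value;          // value lives in the constant pool
    }

    public static void main(String[] args) {
        for (int v : new int[] { 1, -1, 100, 4711, 100000 }) {
            System.out.println(select(v));
        }
    }
}
```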
489
490    <h4>Code patterns using regular expressions</h4>
491      <p>
        When transforming code, for instance during optimization or when
        inserting analysis method calls, one typically searches for
        certain patterns of code at which to perform the transformation. To
        simplify handling such situations <font
              face="helvetica,arial">BCEL</font> introduces a special feature:
        One can search for given code patterns within an instruction list
        using <em>regular expressions</em>. In such expressions,
        instructions are represented by their opcode names, e.g.,
        <tt>LDC</tt>; one may also use their respective super classes, e.g.,
        "<tt>IfInstruction</tt>". Meta characters like <tt>+</tt>,
        <tt>*</tt>, and <tt>(..|..)</tt> have their usual meanings. Thus,
        the expression
504      </p>
505
506      <source>"NOP+(ILOAD|ALOAD)*"</source>
507
508      <p>
509        represents a piece of code consisting of at least one <tt>NOP</tt>
510        followed by a possibly empty sequence of <tt>ILOAD</tt> and
511        <tt>ALOAD</tt> instructions.
512      </p>
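
      <p>
        One way to picture such a match is to write the opcode names
        of an instruction sequence into a single string and apply an
        ordinary regular expression to it. The sketch below does this
        with <tt>java.util.regex</tt>; it is an assumption about how
        the idea could be realized, not a description of the
        <font face="helvetica,arial">BCEL</font> internals, and
        <tt>OpcodePatternSketch</tt> with its methods is a
        hypothetical name.
      </p>

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class OpcodePatternSketch {
    /** Joins opcode names into one string, each name followed by a space. */
    static String encode(String[] opcodes) {
        StringBuilder sb = new StringBuilder();
        for (String op : opcodes) {
            sb.append(op).append(' ');
        }
        return sb.toString();
    }

    /** Returns the opcode names covered by the first match, or "" if none. */
    static String firstMatch(String[] opcodes) {
        // "NOP+(ILOAD|ALOAD)*" rewritten for space-terminated names;
        // \b keeps a name from matching the tail of a longer name.
        Pattern p = Pattern.compile("(\\bNOP )+(\\b(ILOAD|ALOAD) )*");
        Matcher m = p.matcher(encode(opcodes));
        return m.find() ? m.group().trim() : "";
    }

    public static void main(String[] args) {
        String[] code = { "ICONST_0", "NOP", "NOP", "ILOAD", "ALOAD", "IRETURN" };
        System.out.println(firstMatch(code));
    }
}
```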
513
514      <p>
        The <tt>search()</tt> method of class
        <tt>org.apache.bcel.util.InstructionFinder</tt> takes a regular
        expression and a starting point as arguments and returns an
        iterator describing the area of matched instructions. Additional
        constraints on the matching area of instructions, which cannot be
        implemented via regular expressions, may be expressed via <em>code
        constraint</em> objects.
522      </p>
523
524    <h4>Example: Optimizing boolean expressions</h4>
525      <p>
        In Java, the boolean values <tt>true</tt> and <tt>false</tt> are
        mapped to 1 and 0, respectively. Thus, the simplest way to
        evaluate boolean expressions is to push a 1 or a 0 onto the
        operand stack depending on the truth value of the expression.
        But this way, the subsequent combination of boolean expressions
        (with <tt>&amp;&amp;</tt>, e.g.) yields long chunks of code that
        push lots of 1s and 0s onto the stack.
533      </p>
534
535      <p>
        When the code has been finalized these chunks can be optimized
        with a <em>peep hole</em> algorithm: An <tt>IfInstruction</tt>
        (e.g., the comparison of two integers: <tt>if_icmpeq</tt>) that
        either produces a 1 or a 0 on the stack and is followed by an
        <tt>ifne</tt> instruction (branch if stack value is not 0) may be
        replaced by the <tt>IfInstruction</tt> with its branch target
        replaced by the target of the <tt>ifne</tt> instruction:
543      </p>
544
545      <source>
CodeConstraint constraint = new CodeConstraint() {
    public boolean checkCode(InstructionHandle[] match) {
        IfInstruction if1 = (IfInstruction) match[0].getInstruction();
        GOTO g = (GOTO) match[2].getInstruction();
        return (if1.getTarget() == match[3]) &amp;&amp;
            (g.getTarget() == match[4]);
    }
};

InstructionFinder f = new InstructionFinder(il);
String pat = "IfInstruction ICONST_0 GOTO ICONST_1 NOP(IFEQ|IFNE)";

for (Iterator e = f.search(pat, constraint); e.hasNext(); ) {
    InstructionHandle[] match = (InstructionHandle[]) e.next();
    ...
    // Branch targets belong to the instructions, not to their handles
    IfInstruction if1 = (IfInstruction) match[0].getInstruction();
    BranchInstruction if2 = (BranchInstruction) match[5].getInstruction();
    if1.setTarget(if2.getTarget()); // Update target
    ...
    try {
        il.delete(match[1], match[5]);
    } catch (TargetLostException ex) {
        ...
    }
}
569      </source>
570
571      <p>
572        The applied code constraint object ensures that the matched code
573        really corresponds to the targeted expression pattern. Subsequent
574        application of this algorithm removes all unnecessary stack
575        operations and branch instructions from the byte code. If any of
576        the deleted instructions is still referenced by an
577        <tt>InstructionTargeter</tt> object, the reference has to be
578        updated in the <tt>catch</tt>-clause.
579      </p>
580
581      <p>
582        <b>Example application:</b>
583        The expression:
584      </p>
585
586      <source>
if ((a == null) || (i &lt; 2))
    System.out.println("Ooops");
589      </source>
590
591      <p>
        can be mapped to both of the chunks of byte code shown in <a
              href="#Figure 6">Figure 6</a>. The left column represents the
        unoptimized code while the right column displays the same code
        after the peep hole algorithm has been applied:
596      </p>
597
598      <p align="center"><a name="Figure 6">
599        <table>
600          <tr>
601            <td valign="top"><pre>
              5:  aload_0
              6:  ifnull        #13
              9:  iconst_0
              10: goto          #14
              13: iconst_1
              14: nop
              15: ifne          #36
              18: iload_1
              19: iconst_2
              20: if_icmplt     #27
              23: iconst_0
              24: goto          #28
              27: iconst_1
              28: nop
              29: ifne          #36
              32: iconst_0
              33: goto          #37
              36: iconst_1
              37: nop
              38: ifeq          #52
              41: getstatic     System.out
              44: ldc           "Ooops"
              46: invokevirtual println
              52: return
626            </pre></td>
627            <td valign="top"><pre>
              10: aload_0
              11: ifnull        #19
              14: iload_1
              15: iconst_2
              16: if_icmpge     #27
              19: getstatic     System.out
              22: ldc           "Ooops"
              24: invokevirtual println
              27: return
637            </pre></td>
638          </tr>
639        </table>
640      </a>
641      </p>
642    </subsection>
643    </section>
644  </body>
645</document>