# LyX 1.1 created this file. For more info see http://www.lyx.org/ # # Licensed to the Apache Software Foundation (ASF) under one or more # contributor license agreements. See the NOTICE file distributed with # this work for additional information regarding copyright ownership. # The ASF licenses this file to You under the Apache License, Version 2.0 # (the "License"); you may not use this file except in compliance with # the License. You may obtain a copy of the License at # # http://www.apache.org/licenses/LICENSE-2.0 # # Unless required by applicable law or agreed to in writing, software # distributed under the License is distributed on an "AS IS" BASIS, # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. # See the License for the specific language governing permissions and # limitations under the License. \lyxformat 218 \textclass scrbook \begin_preamble \end_preamble \language english \inputencoding latin1 \fontscheme default \graphics default \float_placement !htp \paperfontsize default \spacing single \papersize Default \paperpackage a4wide \use_geometry 0 \use_amsmath 0 \paperorientation portrait \secnumdepth 2 \tocdepth 2 \paragraph_separation indent \defskip medskip \quotes_language english \quotes_times 2 \papercolumns 1 \papersides 2 \paperpagestyle default \layout Subject \emph on Diplomarbeit \layout Title JustIce \newline \size small A Free Class File Verifier for Java \latex latex \backslash texttrademark\SpecialChar ~ \layout Author Enver Haase \newline \size tiny \layout Date September 2001 \layout Publishers Freie Universität Berlin \newline Institut für Informatik \newline Takustraße 9 \newline D-14195 Berlin \layout Lowertitleback \series bold \size scriptsize Revision \series default \series bold \shape smallcaps $Id$ \layout Minisec Erklärung \begin_float footnote \layout Standard I declare that I wrote this \emph on Diplomarbeit \emph default completely on my own and without the help of persons not listed. All sources of information are listed in the Bibliography section. \end_float \layout Standard Hiermit versichere ich, die vorliegende Diplomarbeit selbständig und ohne fremde Hilfe verfaßt zu haben. Es wurden nur die in der Bibliographie angegebenen Quellen benutzt. \layout Minisec Danksagung \begin_float footnote \layout Standard The creation of this \emph on Diplomarbeit \emph default paper was supported and supervised by Prof. Dr. Elfriede Fehr and Dipl.-Inform. Markus Dahm. Keith Seymour suggested a lot of language-related improvements. Thank you. \end_float \layout Standard Während der Anfertigung dieser Diplomarbeit wurde ich von Prof. Dr. Elfriede Fehr und Dipl.-Inform. Markus Dahm betreut, wofür ich mich an dieser Stelle herzlich bedanke. \layout Standard Desweiteren bedanke ich mich bei Keith Seymour, der mir eine Reihe sprachspezifi scher Verbesserungsvorschläge sandte. \layout Minisec Autor \begin_float footnote \layout Standard Author \end_float \layout Standard Enver Haase \newline Gubener Straße 18 \newline D-10243 Berlin \newline \layout Standard \begin_inset LatexCommand \tableofcontents{} \end_inset \layout Addchap Abstract \layout Standard When Sun Microsystems developed their \emph on Java Platform \emph default in the early 1990s, it was originally designed for use in networked and embedded consumer-electronics applications. But when they introduced it around 1995, it quickly became used in World Wide Web browser software. This was a way to bring interactive content to demanding World Wide Web users. Sun took great care for the robustness of the platform: they planned to connect embedded devices and let them share data and code over a network. Defective devices transmitting bad data or unreliable network connections should not cause other devices to crash. This property made Java a good choice for the code-executing engine in World Wide Web browsers: defective server software or transmission errors would not cause the \emph on Java Platform \emph default to crash; this is also true for purposely malicious code hidden on the Web. The code-executing part of the \emph on Java Platform \emph default is called \emph on The Java Virtual Machine \emph default (the \emph on JVM \emph default , for short). This execution engine has to assure that the code to be executed is well-behave d; it has to \emph on verify \emph default the code. Therefore, the \emph on verifier \emph default is an integral part of every JVM, but JustIce implements a verifier that is not integrated in a JVM. It was implemented using a software library called the \emph on Byte Code Engineering Library \emph default (the \emph on BCEL \emph default , for short) by Markus Dahm \begin_inset LatexCommand \cite{BCEL98,BCEL-WWW} \end_inset . \layout Standard The BCEL is intended to give users a convenient mechanism to analyze, create and manipulate (binary) Java class files. It offers an object-oriented view of otherwise raw data, including program code. This library is, therefore, well-respected especially in the compiler-writer community whenever the JVM is chosen as the target machine of the compiler. Compiler back-ends use the BCEL to produce code for the JVM; and as new compilers may be faulty, they may produce bad code. Testing these compilers often is a difficult task. The generated code should not only be semantically correct, but it also has to pass the verifiers of all existing JVM implementations. Normally, a lot of human interaction is required to run test cases. If the code is rejected by a verifier, one often does not know why. Most verifiers emit error messages which do not identify the offending instruction. \layout Standard JustIce presents an Application Programming Interface (API) that may be used to automate the procedure sketched above. The constraints imposed on class files are designed to be strict, therefore eleminating the need to run several verifiers on the generated code. If code passes the JustIce verifier, it should pass all other verifiers. JustIce was also designed to output human-understandable messages if the verification of some code fails. \layout Standard The application range of JustIce is not limited to compiler back-ends, in the same sense as the BCEL is not only useful in this area. Transformations of existing code and even generation of hand-crafted code fall into its scope, too. As a side effect, JustIce exports some data structures such as a control flow graph; so its API may also be used for applications targeting other problem areas such as static analyses of program code. \layout Chapter Introduction \layout Section Low Level Security as a Part of a Many-Tiered Strategy \layout Standard The Java programming language is well-known for its inherent security facilities such as the lack of pointer arithmetic or the need for memory allocation and deallocation. Lesser known is that this is only the top of an iceberg; the \emph on Java Platform \emph default implements a many-tiered security strategy \begin_inset LatexCommand \cite{Yellin-WWW} \end_inset . It was designed to run even untrusted code -- code that possibly was not produced by a compiler for the Java programming language, code that may be corrupt or code that may have malicious intent (such as stealing credit card number information from a hard disk drive). Three considerations were made: \layout Itemize Untrusted code could damage hardware, software, or information on the host machine. \layout Itemize It could pass unauthorized information to anyone. \layout Itemize It could cause the host machine to become unusable through resource depletion. \layout Standard While some security features such as type-safety or the already-mentioned lack of pointer arithmetic of the Java programming language are a convenient help for programmers, they can only help to reduce programming errors. Of course these features do not help targeting the above problems. At a lower level, however, the \emph on Java Plat\SpecialChar \- form \emph default implements a so-called sandbox: an area where code can be executed but that has well-defined boundaries shielding the rest of the system. This is achieved by means of a \emph on Java Virtual Machine \emph default (JVM) emulation; the host platform does not directly run untrusted code, but a \emph on run-time system \emph default which in turn runs the code, restricting its access to system resources. \layout Standard A run-time system cannot safely assume that untrusted code is well-behaved. Code could cause stack overflows, stack underruns, or otherwise erroneous behaviour that may bring the run-time system into an undefined state -- possibly allowing access to protected memory areas. One could protect the run-time system by letting it predict the effects of every single instruction just in time while actually executing it -- but that would be too time-consuming to be applicable in practice. \layout Standard Therefore, good behaviour of program code has to be enforced \emph on before \emph default it is actually executed -- at least as far as this is possible. This is the lowest level of Java security; there has to be an integral component in every JVM implementation doing so ( \begin_inset LatexCommand \cite{vmspec2} \end_inset , page 420). This part of the JVM is called the \emph on class file verifier \emph default , yet better known as the \emph on bytecode verifier. \emph default Technically speaking, bytecode verification is only a part of class file verification so \emph on class file verifier \emph default is a more embracing term. JustIce implements a whole class file verifier. \layout Standard \begin_float fig \layout Standard \align center \begin_inset Figure size 595 396 file chap1.eps width 3 100 flags 9 \end_inset \layout Caption Concept of Class File Verification \end_float \layout Section Why Another Verifier? \layout Standard As said before, every JVM implementation must contain a class file verifier, so it is reasonable to ask for the motivation behind creating just another class file verifier -- especially one that is \emph on not \emph default part of a JVM implementation. \layout Subsection Bytecode Engineers Need JustIce \layout Standard Shortly after the \emph on Java Platform \emph default was introduced, it was adopted with pleasure because of its inherent independen ce from operating systems and concrete hardware. Industry and educational institutions with heterogenous networked computers could now run the same software program on different host machines. Soon, many efforts were put into research and development of compilers for programming languages other than the Java programming language that use the JVM bytecode as target. \layout Standard Nowadays, many other programming languages do have the JVM as its target platform; e.g. Fortran \begin_inset LatexCommand \cite{f2j} \end_inset , Ada \begin_inset LatexCommand \cite{AppMag-WWW} \end_inset , Scheme \begin_inset LatexCommand \cite{KAWA-WWW} \end_inset or modified Java language versions \begin_inset LatexCommand \cite{GJ-WWW,PMG-WWW} \end_inset . A vast collection of programming languages targeting the JVM can be found on the World Wide Web \begin_inset LatexCommand \cite{PL4JVM} \end_inset . \layout Standard All these compilers emit code for the JVM -- and so all these compilers have to pass the JVM's verifier. Implementors of such compilers have to consider the security related constraint s the JVM poses on the generated code. It is difficult to test if the emitted code works on all JVM implementations, passing all JVM verifier implementations. This is especially problematic if not all of the project's class files are loaded into the JVM during a test run, because then they will not be verified. \layout Standard Having an opportunity to verify the transitive hull of referenced class files (starting with some main class file) would be of help; JustIce offers it. \layout Standard The Bytecode Engineering Library by Markus Dahm is often used as a compiler back-end to emit code, but it is also used to hand-craft code or to implement bytecode transformations. Because JustIce works closely together with the BCEL, users of the BCEL do not even have to leave their development environment to run the JustIce verifier. \layout Standard To our knowledge, JustIce is the only implementation of a Java class file verifier that was written in the Java programming language \begin_inset LatexCommand \cite{langspec2} \end_inset itself \begin_float footnote \layout Standard In a personal communication, Robert Stärk told the author that there was a Java implementation of the verifier discussed in \begin_inset LatexCommand \cite{JBook} \end_inset , written by Joachim Schmid using the BCEL. However, it is not released for public use yet. \end_float . Because of its \emph on Verification API \emph default , it can be included in other software projects written in Java with more ease than any other verifier implementation in a different programming language could provide. \layout Subsection JustIce is Verbose \layout Standard Usually, when classes pass the verifier, it is mute. JustIce, in contrast, distinguishes between verification results and messages. Messages are often warnings, but the reason for emitting such a warning instead of a negative verification result is because the class file does not pose a threat to the integrity of the JVM and thus does not have to be rejected. \layout Standard When a verification error occurs and the class file is rejected, even the built-in verifiers usually produce some output saying so. As an example, consider the following verifier run: \newline \newline \family typewriter ehaase@haneman:/home/ehaase > java Cc \newline Exception in thread "main" java.lang.VerifyError: \newline (class: Cc, method: ttt signature: ()V) \newline Recursive call to jsr entry \family default \newline \latex latex \newline \layout Standard One might ask \emph on which \emph default \begin_inset Quotes eld \end_inset jsr entry \begin_inset Quotes erd \end_inset (a branch target of a \latex latex \backslash texttt{jsr} \latex default or a \latex latex \backslash texttt{jsr \backslash _w} \latex default instruction) is called recursively and which instructions may be responsible for this. Compare this to JustIce's output: \newline \newline [...] \layout Standard \family typewriter Pass 3b, method number 0 ['public static void ttt()']: \layout Standard \family typewriter VERIFIED_REJECTED \layout Standard \family typewriter Constraint violated in method 'public static void ttt()': \layout Standard \family typewriter Subroutine with local variable '1', JSRs '[ 36: jsr[168](3) -> astore_1, 8: jsr[168](3) -> astore_1, 30: jsr[168](3) -> astore_1, 23: jsr[168](3) -> astore_1]', RET ' 62: ret[169](2) 1' is called by a subroutine which uses the same local variable index as itself; maybe even a recursive call? JustIce's clean definition of a subroutine forbids both. \newline \family default [...] \layout Standard \family typewriter Warnings: \layout Standard \family typewriter Pass 2: Attribute 'LineNumber(0, 4), LineNumber(0, 5), LineNumber(15, 8), LineNumber(39, 11), LineNumber(47, 12), LineNumber(57, 13), LineNumber(64, 15)' as an attribute of Code attribute '' (method 'public static void ttt()') will effectively be ignored and is only useful for debuggers and such. \layout Standard \family typewriter Pass 2: Attribute 'LineNumber(0, 1), LineNumber(4, 1)' as an attribute of Code attribute '' (method 'public void ()') will effectively be ignored and is only useful for debuggers and such. \layout Standard \family typewriter Pass 3a: LineNumberTable attribute 'LineNumber(0, 4), LineNumber(0, 5), LineNumber(15, 8), LineNumber(39, 11), LineNumber(47, 12), LineNumber(57, 13), LineNumber(64, 15)' refers to the same code offset ('0') more than once which is violating the semantics [but is sometimes produced by IBM's 'jikes' compiler]. \newline \layout Standard This output obviously has an answer to the above question; it shows the only \latex latex \backslash texttt{jsr} \latex default or \latex latex \backslash texttt{jsr \backslash _w} \latex default instructions possibly responsible for a recursive call (which is not allowed by the specification of the JVM). For the special --but clean-- definition of subroutines JustIce uses, please see section \begin_inset LatexCommand \ref{Subroutines_Def} \end_inset . \layout Standard Note also the warning messages. Class files that were not generated by Sun's \emph on javac \emph default compiler have a tendency to look a little different in some corner cases. IBM's \emph on jikes \emph default compiler, for instance, produces LineNumberTable attributes (see \begin_inset LatexCommand \ref{LineNumberTableAttribute} \end_inset ) which look different from those created by \emph on javac \emph default . Detecting such differences is desirable because future JVMs will have stricter verification checks \begin_float footnote \layout Standard The Solaris port of Sun's JVM, version 1.3.0_01, already has (some of) the stricter checks built in. You may enable them using the command-line option '-Xfuture'. Nothing about this issue is mentioned in the specification \begin_inset LatexCommand \cite{vmspec2} \end_inset . \end_float (which most old \emph on javac \emph default -compiled class files will probably still pass). JustIce guides bytecode engineers to create class files that are indistinguisha ble from those created by \emph on javac \emph default to retain compatibility with Sun's future JVM implementations. Figure \begin_inset LatexCommand \ref{FigVenn} \end_inset graphically shows the relationship between class files and the verifier \begin_float footnote \layout Standard This is a simplicistic figure; unfortunately, there are class files produced by the \emph on javac \emph default compiler that do not pass the verifier. Please see section \begin_inset LatexCommand \ref{javacRejected} \end_inset for more details. \end_float . \begin_float fig \layout Standard \align center \begin_inset Figure size 595 378 file VennDiag.eps width 3 100 height 3 45 flags 9 \end_inset \layout Caption \begin_inset LatexCommand \label{FigVenn} \end_inset Venn diagram showing the operating domain of the Java verifier. \end_float \layout Subsection JustIce is Free \layout Standard Currently, there is no other free and complete open source verifier available known to the author. You may have a look at the JVM's source code by Sun Microsystems but you are not allowed to use the knowledge from that inspection for your own projects or even use their code. JustIce is a clean-room implementation: the author wrote JustIce by only reading the Java \latex latex \backslash texttrademark \latex default \SpecialChar ~ Virtual Machine Specification, Second Edition \begin_inset LatexCommand \cite{vmspec2} \end_inset and comparing the behaviour of JustIce with the behaviour of commercial implementations of Sun Microsystems and IBM Corporation. \layout Standard The open source JVM implementation \emph on Kaffe \emph default \begin_inset LatexCommand \cite{Kaffe-WWW} \end_inset , for example \emph on , \emph default does not have a \emph on complete \emph default verifier built in (although mandated by the JVM specification). \layout Standard \emph on Kissme \emph default \begin_inset LatexCommand \cite{kissme-WWW} \end_inset , another open source JVM implementation, currently does not include any verifier at all. \layout Standard The JVM implementations \emph on SableVM \emph default \begin_inset LatexCommand \cite{SableVM-WWW} \end_inset and Intel Corporation's \emph on Open Runtime Platform \emph default \begin_inset LatexCommand \cite{ORP-WWW} \end_inset are platforms to experiment with performance-enhancements. They are not intended to work as general-purpose JVMs so they do not need to implement verifiers. \layout Standard Other open source projects that could make use of a free verifier include the Java compiler \emph on gcj \emph default which is part of the GNU compiler collection \begin_inset LatexCommand \cite{GCC-WWW} \end_inset . \layout Standard JustIce is covered by the well-known and respected software license \emph on GNU General Public License \emph default (GPL); see section \begin_inset LatexCommand \ref{GPL} \end_inset . The author hopes other free software will benefit from it; from the JustIce software \begin_inset LatexCommand \cite{JustIce} \end_inset as well as from this paper describing some of the inner workings of JustIce. \layout Chapter The Java Virtual Machine \layout Standard The Java Virtual Machine (JVM) is an abstract machine specified in \begin_inset LatexCommand \cite{vmspec2} \end_inset . It has no knowledge about the Java programming language; but only of a certain binary file format: the class file format. A class file contains machine instructions for the JVM (called \emph on bytecodes \emph default ), a symbol table (called \emph on constant pool \emph default ) and some other ancillary information. \layout Standard On method invocation, a local stack frame is set up called the \emph on execution frame \emph default . It consists of an \emph on operand stack \emph default and \emph on local variables \emph default (which may be compared to registers of traditional machines). \layout Standard The instructions in the code arrays of class files are interpreted by the JVM. There are 212 legal instructions; they have read-access to the class file's constant pool and they can modify the operand stack and the local variables in their execution frame. An invoked method reads its arguments from the local variables. Certain instructions pass a return value to the invoking method. \layout Section \begin_inset LatexCommand \label{Classfile Structure} \end_inset The ClassFile Structure \layout Standard Traditionally, the JVM loads its programs from files stored on file systems of host machines; these files have names that end with \emph on \begin_inset Quotes eld \end_inset .class \begin_inset Quotes erd \end_inset \emph default . It is possible to store the files in various other ways; a so-called \emph on class loader \emph default is then used to transform the files internally to the desired, basic class file format. Therefore, it suffices to explain the structure of traditional class files. Every class file consists of a single \family typewriter ClassFile \family default structure as defined below. It defines a single class as known from the Java Programming Language \begin_inset LatexCommand \cite{langspec2} \end_inset . The terms \emph on class \emph default and \emph on class file \emph default may therefore be used interchangeably. \begin_float fig \layout Standard \align center \begin_inset Figure size 595 526 file classfile.eps width 3 100 flags 9 \end_inset \layout Standard A class file consists of constants, fields, methods, attributes and some ancillary information. This figure was taken from \begin_inset LatexCommand \cite{BCEL98} \end_inset , used with permission of the author. \layout Caption A Class File \end_float \layout Standard As we will see, the \family typewriter ClassFile \family default structure and its sub-structures are defined for upwards compatibility, i.e., new structure definitions can be added to the specification easily at a later time. \newline \newline \family typewriter ClassFile { \newline \SpecialChar \- \SpecialChar ~ \SpecialChar ~ u4 magic; \newline \SpecialChar \- \SpecialChar ~ \SpecialChar ~ u2 minor_version; \newline \SpecialChar \- \SpecialChar ~ \SpecialChar ~ u2 major_version; \newline \SpecialChar \- \SpecialChar ~ \SpecialChar ~ u2 constant_pool_count; \newline \SpecialChar \- \SpecialChar ~ \SpecialChar ~ cp_info constant_pool[constant_pool_count-1]; \newline \SpecialChar \- \SpecialChar ~ \SpecialChar ~ u2 access_flags; \newline \SpecialChar \- \SpecialChar ~ \SpecialChar ~ u2 this_class; \newline \SpecialChar \- \SpecialChar ~ \SpecialChar ~ u2 super_class; \newline \SpecialChar \- \SpecialChar ~ \SpecialChar ~ u2 interfaces_count; \newline \SpecialChar \- \SpecialChar ~ \SpecialChar ~ u2 interfaces[interfaces_count]; \newline \SpecialChar \- \SpecialChar ~ \SpecialChar ~ u2 fields_count; \newline \SpecialChar \- \SpecialChar ~ \SpecialChar ~ field_info fields[fields_count]; \newline \SpecialChar \- \SpecialChar ~ \SpecialChar ~ u2 methods_count; \newline \SpecialChar \- \SpecialChar ~ \SpecialChar ~ method_info methods[methods_count]; \newline \SpecialChar \- \SpecialChar ~ \SpecialChar ~ u2 attributes_count; \newline \SpecialChar \- \SpecialChar ~ \SpecialChar ~ attribute_info attributes[attributes_count]; \newline } \newline \newline \family default You may read an ' \family typewriter u \family default ' as 'byte times'; e.g., ' \family typewriter u2 \family default ' means 'two bytes in size'. We will not delve into too much detail here; the exact specification of the entries are published by Sun \begin_inset LatexCommand \cite{vmspec2} \end_inset . But one should note that besides some other information, a class file basically defines \emph on attributes \emph default , \emph on constants \emph default , \emph on fields \emph default and \emph on methods \emph default . Also, there are strong structural constraints imposed on class files. It is a verifier's task to validate them. \layout Subsection Attributes \layout Standard The general format of an attribute is defined below. \newline \newline \family typewriter attribute_info { \newline \SpecialChar \- \SpecialChar ~ \SpecialChar ~ u2 attribute_name_index; \newline \SpecialChar \- \SpecialChar ~ \SpecialChar ~ u4 attribute_length; \newline \SpecialChar \- \SpecialChar ~ \SpecialChar ~ u1 info[attribute_length]; \newline } \family default \newline \newline An attribute is basically a typed data container; its type is determined by its name. Every JVM is required to be silent about attributes of types it does not know. On the other hand, newly defined attributes are required not to impose a semantical change on the class file. These attributes should be uniquely named; in fact, the pair (, ) is required to be unique. This is guaranteed because attributes not defined by Sun Microsystems have to be named according to the package naming scheme of the Java Programming Language \begin_inset LatexCommand \cite{langspec2} \end_inset . Certain basic attributes are predefined. They are used in the \family typewriter ClassFile \family default (see section \begin_inset LatexCommand \ref{Classfile Structure} \end_inset ), \family typewriter field_info \family default (see section \begin_inset LatexCommand \ref{Fields} \end_inset ) and \family typewriter method_info \family default (see section \begin_inset LatexCommand \ref{Methods} \end_inset ). Also, attributes may be nested: the \family typewriter Code \family default attribute references other attributes. \layout Standard Some examples for predefined attributes are listed below. \layout Subsubsection \begin_inset LatexCommand \label{ConstantValueAttribute} \end_inset The ConstantValue attribute \layout Standard The ConstantValue attribute has the following format: \newline \newline \family typewriter ConstantValue_attribute { \newline \SpecialChar \- \SpecialChar ~ \SpecialChar ~ u2 attribute_name_index; \newline \SpecialChar \- \SpecialChar ~ \SpecialChar ~ u4 attribute_length; \newline \SpecialChar \- \SpecialChar ~ \SpecialChar ~ u2 constantvalue_index; \newline } \family default \newline \newline The \family typewriter ConstantValue \family default attribute represents the value of a constant field. It has a fixed length: it contains only a two-byte reference into the constant pool. Only \family typewriter field_info \family default structures (see section \begin_inset LatexCommand \ref{Fields} \end_inset ) contain this type of attribute. \layout Subsubsection \begin_inset LatexCommand \label{CodeAttribute} \end_inset The Code Attribute \layout Standard The \family typewriter Code \family default attribute is used in the \family typewriter method_info \family default (see section \begin_inset LatexCommand \ref{Methods} \end_inset ) structure. It represents the program code of a method and it is defined as follows: \newline \newline \family typewriter Code_attribute { \newline \SpecialChar \- \SpecialChar ~ \SpecialChar ~ u2 attribute_name_index; \newline \SpecialChar \- \SpecialChar ~ \SpecialChar ~ u4 attribute_length; \newline \SpecialChar \- \SpecialChar ~ \SpecialChar ~ u2 max_stack; \newline \SpecialChar \- \SpecialChar ~ \SpecialChar ~ u2 max_locals; \newline \SpecialChar \- \SpecialChar ~ \SpecialChar ~ u4 code_length; \newline \SpecialChar \- \SpecialChar ~ \SpecialChar ~ u1 code[code_length]; \newline \SpecialChar \- \SpecialChar ~ \SpecialChar ~ u2 exception_table_length; \newline \SpecialChar \- \SpecialChar ~ \SpecialChar ~ { \newline \SpecialChar \- \SpecialChar ~ \SpecialChar ~ \SpecialChar \- \SpecialChar ~ \SpecialChar ~ u2 start_pc; \newline \SpecialChar \- \SpecialChar ~ \SpecialChar ~ \SpecialChar \- \SpecialChar ~ \SpecialChar ~ u2 end_pc; \newline \SpecialChar \- \SpecialChar ~ \SpecialChar ~ \SpecialChar \- \SpecialChar ~ \SpecialChar ~ u2 handler_pc; \newline \SpecialChar \- \SpecialChar ~ \SpecialChar ~ \SpecialChar \- \SpecialChar ~ \SpecialChar ~ u2 catch_type; \newline \SpecialChar \- \SpecialChar ~ \SpecialChar ~ } exception_table[exception_table_length]; \newline \SpecialChar \- \SpecialChar ~ \SpecialChar ~ u2 attributes_count; \newline \SpecialChar \- \SpecialChar ~ \SpecialChar ~ attribute_info attributes[attributes_count]; \newline } \family default \newline \newline This is the most complex of all predefined attributes. Every method that has code (i.e., every non-native, non-abstract method) must have such an attribute. Note that the maximum stack depth and the number of local variables for a method invocation are defined here. This is important for the JVM when it creates an \emph on execution frame \emph default (see section \begin_inset LatexCommand \ref{LV_and_OpStack} \end_inset ) at the time the method is invoked. \layout Standard Also, the exception handlers are defined here. Exception handlers prevent an executing method from an abrupt completion if an exceptional situation occurs. Code areas are said to be protected against a class of exceptional situations by an exception handler \begin_float footnote \layout Standard The JVM closely reflects the \emph on exception \emph default mechanism of the Java programming language \begin_inset LatexCommand \cite{langspec2} \end_inset . In the Java programming language, exceptions can be \emph on thrown \emph default , and they can be \emph on caught \emph default explicitly. If an internal JVM error occurs, the JVM also --implicitly-- throws an exception. \end_float . Algorithm \begin_inset LatexCommand \ref{ExcHdAlgo} \end_inset shows an example for the use of exception handlers. The exact meaning of the instruction opcodes is not important here, the most common instructions are explained later in this paper. \layout Standard \begin_float alg \layout Standard [Let \family typewriter start_pc \family default and \family typewriter end_pc \family default protect the area A to B, inclusive. Let the \family typewriter catch_type \family default be \begin_inset Quotes eld \end_inset \family typewriter java.lang.NullPointerException \family default \begin_inset Quotes erd \end_inset . Let the \family typewriter handler_pc \family default point to C.] \layout Standard \family typewriter \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ aconst_null\SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ ; push a NULL onto the operand stack. \layout Standard \family typewriter A:\SpecialChar ~ nop\SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ ; do nothing \layout Standard \family typewriter B:\SpecialChar ~ getfield Foo::bar\SpecialChar ~ \SpecialChar ~ ; dereference NULL, cause NullPointerExc. \layout Standard \family typewriter \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ return\SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ ;\SpecialChar ~ never executed \layout Standard \family typewriter C:\SpecialChar ~ nop\SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ ;\SpecialChar ~ this is executed: we could handle \layout Standard \family typewriter \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ nop\SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ ;\SpecialChar ~ the NullPointerException \layout Standard \family typewriter \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ return\SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ ;\SpecialChar ~ leave method (complete normally) \layout Caption \begin_inset LatexCommand \label{ExcHdAlgo} \end_inset Use of Exception Handlers \end_float \layout Standard The most important item, however, is the \family typewriter code \family default item. It defines the bytecode of this method; i.e., the JVM machine instructions. \layout Subsubsection \begin_inset LatexCommand \label{LineNumberTableAttribute} \end_inset The LineNumberTable Attribute \layout Standard The \family typewriter LineNumberTable \family default attribute is defined as follows: \newline \newline \family typewriter LineNumberTable_attribute { \newline \SpecialChar \- \SpecialChar ~ \SpecialChar ~ u2 attribute_name_index; \newline \SpecialChar \- \SpecialChar ~ \SpecialChar ~ u4 attribute_length; \newline \SpecialChar \- \SpecialChar ~ \SpecialChar ~ u2 line_number_table_length; \newline \SpecialChar \- \SpecialChar ~ \SpecialChar ~ { \newline \SpecialChar \- \SpecialChar ~ \SpecialChar ~ \SpecialChar \- \SpecialChar ~ \SpecialChar ~ u2 start_pc; \newline \SpecialChar \- \SpecialChar ~ \SpecialChar ~ \SpecialChar \- \SpecialChar ~ \SpecialChar ~ u2 line_number; \newline \SpecialChar \- \SpecialChar ~ \SpecialChar ~ } line_number_table[line_number_table_length]; \newline } \newline \family default \newline This attribute describes the relation between source code line numbers and JVM instruction offsets in the \family typewriter code \family default array of the \family typewriter Code_attribute \family default ; it can be used by debuggers to show the source code of currently executing JVM machine instructions. This attribute is usually a sub-attribute of a \family typewriter Code_attribute \family default . Multiple \family typewriter LineNumberTable \family default attributes may together represent a given line of a source code file. \layout Subsection Constants \layout Standard All the constants together form the \emph on constant pool \emph default . The general \family typewriter cp_info \family default structure is straightforward. \newline \newline \family typewriter cp_info { \newline \SpecialChar \- \SpecialChar ~ \SpecialChar ~ u1 tag; \newline \SpecialChar \- \SpecialChar ~ \SpecialChar ~ u1 info[]; \newline } \family default \newline \newline The 'tag' defines what 'info' follows it. Constants define either constant values or constant symbolic references, such as references to other classes. Currently, eleven constant types are defined: \family typewriter Class \family default , \family typewriter Field\SpecialChar \- ref \family default , \family typewriter Method\SpecialChar \- ref \family default , \family typewriter In\SpecialChar \- ter\SpecialChar \- face\SpecialChar \- Method\SpecialChar \- ref \family default , \family typewriter String \family default , \family typewriter In\SpecialChar \- teger \family default , \family typewriter Float \family default , \family typewriter Long \family default , \family typewriter Double \family default , \family typewriter Name\SpecialChar \- And\SpecialChar \- Type \family default and \family typewriter Utf8 \family default . \layout Standard Most of the names are self-explanatory; the interested reader will find more information in the specification \begin_inset LatexCommand \cite{vmspec2} \end_inset . Constants can be nested; this is done by referring to the constant pool index of the enclosed constant. \layout Standard See the following examples. \newline \newline \family typewriter CONSTANT_Utf8_info { \newline \SpecialChar \- \SpecialChar ~ \SpecialChar ~ u1 tag; \newline \SpecialChar \- \SpecialChar ~ \SpecialChar ~ u2 length; \newline \SpecialChar \- \SpecialChar ~ \SpecialChar ~ u1 bytes[length]; \newline } \newline \newline \family default A CONSTANT_Utf8 represents a constant string. Such a string is e.g. used to describe names of methods, names of fields, names of attributes, types of methods or types of fields. This string is encoded in UTF-8 format, a variant of the unicode character set \begin_inset LatexCommand \cite{Unicode} \end_inset . \family typewriter \family default The tag for this type of constant is simply the number 1, as defined in the Java Virtual Machine Specification, Second Edition \begin_inset LatexCommand \cite{vmspec2} \end_inset . \family typewriter \newline \newline CONSTANT_NameAndType_info { \newline \SpecialChar \- \SpecialChar ~ \SpecialChar ~ u1 tag; \newline \SpecialChar \- \SpecialChar ~ \SpecialChar ~ u2 name_index; \newline \SpecialChar \- \SpecialChar ~ \SpecialChar ~ u2 descriptor_index; \newline } \family default \newline \newline A Constant_NameAndType represents a name and a signature of a method, the tag is the number 12. \family typewriter \family default Both \family typewriter class_index \family default and \family typewriter descriptor_index \family default refer to a \family typewriter CONSTANT_Utf8 \family default . \family typewriter \newline \newline CONSTANT_InterfaceMethodref_info { \newline \SpecialChar \- \SpecialChar ~ \SpecialChar ~ u1 tag; \newline \SpecialChar \- \SpecialChar ~ \SpecialChar ~ u2 class_index; \newline \SpecialChar \- \SpecialChar ~ \SpecialChar ~ u2 name_and_type_index; \newline } \family default \newline \newline A \family typewriter CONSTANT_InterfaceMethodref \family default describes a reference to a method defined in an interface class (see section \begin_inset LatexCommand \cite{langspec2} \end_inset for an explanation of interfaces), the tag is number 11. The interface class is referenced via a two-byte index into the constant pool. A \family typewriter Constant_Class \family default is expected there describing a reference to some class file. Every method has a name, zero or more argument types and a return type; this is described in the \family typewriter CONSTANT_NameAndType \family default that is also referenced via a two-byte constant pool index. \layout Standard Note that there are implicit constraints on the integrity of a class file: for example, there must not be a \family typewriter CONSTANT_Integer \family default where a \family typewriter CONSTANT_Utf8 \family default is expected for a certain entity. As another example, the names and the types of methods are encoded as strings in UTF-8 format \begin_inset LatexCommand \cite{Unicode} \end_inset . They have to be well-formed (according to the specification) to be valid. \layout Subsection \begin_inset LatexCommand \label{Fields} \end_inset Fields \layout Standard Each field is described by a field_info structure as defined below. \newline \newline \family typewriter field_info { \newline \SpecialChar \- \SpecialChar ~ \SpecialChar ~ u2 access_flags; \newline \SpecialChar \- \SpecialChar ~ \SpecialChar ~ u2 name_index; \newline \SpecialChar \- \SpecialChar ~ \SpecialChar ~ u2 descriptor_index; \newline \SpecialChar \- \SpecialChar ~ \SpecialChar ~ u2 attributes_count; \newline \SpecialChar \- \SpecialChar ~ \SpecialChar ~ attribute_info attributes[attributes_count]; \family default \newline } \newline \newline A field has to be unique in a class file with respect to its name and descriptor \begin_float footnote \layout Standard The descriptor of a field describes its type. E.g., a descriptor of \begin_inset Quotes eld \end_inset [I \begin_inset Quotes erd \end_inset means \begin_inset Quotes eld \end_inset one-dimensional array of \family typewriter int \family default \begin_inset Quotes erd \end_inset . \end_float . We see that fields reference constants in the constant pool via their constant pool indices (such as a \family typewriter CONSTANT_Utf8 \family default describing a field's name). An important attribute used by fields is the ConstantValue attribute (see section \begin_inset LatexCommand \ref{ConstantValueAttribute} \end_inset ). \layout Standard The \family typewriter access_flags \family default entry is a bit vector that specifies the accessibility and other properties \begin_float footnote \layout Standard Often called \emph on visibility \emph default . \end_float of the field. E.g., a field with the \family typewriter ACC_PRIVATE \begin_float footnote \layout Standard Bit number 1. \end_float bit set is not accessible to other classes. A field with the \family typewriter ACC_PUBLIC \begin_float footnote \layout Standard Bit number 0. \end_float bit set is accessible to any other class. Any combination with both the \family typewriter ACC_PRIVATE \family default and the \family typewriter ACC_PUBLIC \family default bit set is not valid. \layout Standard The \family typewriter descriptor_index \family default refers to a \family typewriter CONSTANT_Utf8 \family default that symbolically encodes the type of the field. \layout Subsection \begin_inset LatexCommand \label{Methods} \end_inset Methods \layout Standard Each method is described by a method_info structure as defined below. \newline \newline \family typewriter method_info { \newline \SpecialChar \- \SpecialChar ~ \SpecialChar ~ u2 access_flags; \newline \SpecialChar \- \SpecialChar ~ \SpecialChar ~ u2 name_index; \newline \SpecialChar \- \SpecialChar ~ \SpecialChar ~ u2 descriptor_index; \newline \SpecialChar \- \SpecialChar ~ \SpecialChar ~ u2 attributes_count; \newline \SpecialChar \- \SpecialChar ~ \SpecialChar ~ attribute_info attributes[attributes_count]; \newline } \family default \newline \newline As we can easily see, this is exactly the same structure we already know as \family typewriter field_info \family default (see section \begin_inset LatexCommand \ref{Fields} \end_inset ). The difference lies in the meaning of the enlisted entities. For example, an access flag saying a field was volatile (non-cacheable) would not make any sense if set in a \family typewriter method_info \family default structure. Vice versa, an access flag saying the floating point instructions should work in \begin_inset Quotes eld \end_inset FP-strict \begin_inset Quotes erd \end_inset mode would be of no use if set in a \family typewriter field_info \family default structure. \layout Standard Methods use a different set of attributes than fields; for example, the \family typewriter Constant\SpecialChar \- Value \family default attribute (see section \begin_inset LatexCommand \ref{ConstantValueAttribute} \end_inset ) is of no use here. The \family typewriter Code \family default and \family typewriter Exceptions \family default attributes frequently used by methods are of no use for fields on the other hand. \layout Section The Execution Engine \layout Standard Before a piece of code (the code of a \begin_inset Quotes eld \end_inset method \begin_inset Quotes erd \end_inset ) is executed, an \emph on execution frame \emph default is set up. It consists of a program counter (as known from traditional CPUs), a set of local variables (similar to registers known from traditional CPUs), and an operand stack. For each new invocation instance of a method, a new execution frame is set up; it is destroyed on method termination. \layout Standard Because a method may invoke other methods or itself recursively, there is a global method invocation stack. \layout Standard There also is a garbage-collected heap shared among the execution frames. This heap is used for object allocation (see section \begin_inset LatexCommand \ref{Instructions} \end_inset ). \layout Standard The number of local variables is not fixed. Every method defines how many local variables are used for its code (up to 65536). \layout Standard Also note that there is no equivalent of a \emph on Processor Status Word \emph default (PSW) in the JVM. Traditionally, a PSW has flags that are set implicitly during execution of the instructions (such as an overflow or is-zero flag). This is often used for conditional branching. The JVM, however, uses the operand stack to store the result of a comparison instruction explicitly. This result is often read from the stack by the JVM's conditional branching instructions. \layout Standard Should exceptional situations occur (such as an out-of-memory situation), the JVM does not lock up. Instead, an \begin_inset Quotes eld \end_inset exception is thrown \begin_inset Quotes erd \end_inset ; the currently executing program is signalled. These signals can be processed ( \begin_inset Quotes eld \end_inset exceptions can be caught \begin_inset Quotes erd \end_inset ). If such a signal is not handled by the currently executing method, the JVM will search a handler through the invocation hierarchy and stop execution only if none was found. \layout Standard There is a thread mechanism in the JVM. Basically every thread creates an own method invocation stack (so there may be more than one active execution frame at a time), but this feature is not important for the rest of this text. \layout Standard \begin_float fig \layout Standard \align center \begin_inset Figure size 595 379 file exframe.eps width 3 100 flags 9 \end_inset \layout Standard This figure shows a method invocation stack. Method \family typewriter main \family default was invoked by the system, \family typewriter main \family default invoked \family typewriter foo \family default , \family typewriter foo \family default invoked \family typewriter bar \family default , and \family typewriter bar \family default invoked \family typewriter foo \family default recursively. This figure assumes \family typewriter main \family default allocates one local variable and one operand stack slot, \family typewriter foo \family default allocates three local variables and two operand stack slots and \family typewriter bar \family default allocates one local variable and two operand stack slots. \layout Caption Method Invocation Stack \end_float \layout Subsection \begin_inset LatexCommand \label{LV_and_OpStack} \end_inset Local Variables and the Operand Stack \layout Standard The method information in a class file defines how many local variables are used on this method's invocation. It also defines the maximum operand stack size. Together, the local variables array and the operand stack are called the \emph on execution frame \emph default . \layout Standard A single stack slot has a width of 32 bits, which is also the width of a local variable. Therefore, values of types that occupy 64 bits ( \emph on double \emph default and \emph on long \emph default ) must be stored in two consecutive stack slots or local variables. \layout Standard The verifier takes care that the stack cannot overflow and that it cannot underflow. Also, it takes care that instructions may only access local variables if they contain a value of a known, correct type (see section \begin_inset LatexCommand \ref{Pass3Spec} \end_inset ). \layout Subsection \begin_inset LatexCommand \label{Instructions} \end_inset Introduction to JVM Instructions \layout Standard This section is derived from section 2.2 of \begin_inset LatexCommand \cite{BCEL98} \end_inset , used with permission of the author. \layout Standard The JVM's instruction set currently consists of 212 instructions, 44 opcodes are marked as reserved and may be used for future extensions or intermediate optimizations within the Virtual Machine. The instruction set can be roughly grouped as follows: \layout Description Stack\SpecialChar ~ operations: Constants can be pushed onto the stack either by loading them from the constant pool with the \latex latex \backslash texttt{ldc} \latex default instruction or with special ``short-cut'' instructions where the operand is encoded into the instructions, e.g., \latex latex \backslash texttt{iconst \backslash _0} \latex default or \latex latex \backslash texttt{bipush} \latex default (push byte value). \layout Description Arithmetic\SpecialChar ~ operations: The instruction set of the JVM distinguishes its operand types using different instructions to operate on values of specific type. Arithmetic operations starting with \latex latex \backslash texttt{i} \latex default , for example, denote an integer operation. E.g., \latex latex \backslash texttt{iadd} \latex default that adds two integers and pushes the result back on the operand stack. The Java types \latex latex \backslash texttt{boolean} \latex default , \latex latex \backslash texttt{byte} \latex default , \latex latex \backslash texttt{short} \latex default , and \latex latex \backslash texttt{char} \latex default are handled as integers by the JVM. \layout Description \begin_inset LatexCommand \label{RetDesc} \end_inset Control\SpecialChar ~ flow: There are branch instructions like \latex latex \backslash texttt{goto} \latex default and \latex latex \backslash texttt{if \backslash _icmpeq} \latex default , which compares two integers for equality. There is also a \latex latex \backslash texttt{jsr} \begin_float footnote \layout Standard There is a \begin_inset Quotes eld \end_inset wide \begin_inset Quotes erd \end_inset version of \latex latex \backslash texttt{jsr} \latex default called \latex latex \backslash texttt{jsr \backslash _w} \latex default . The instructions \latex latex \backslash texttt{jsr} \latex default / \latex latex \backslash texttt{jsr \backslash _w} \latex default and \latex latex \backslash texttt{ret} \latex default play in important role in chapter \begin_inset LatexCommand \ref{Pass3Spec} \end_inset . \end_float (jump into subroutine) and \latex latex \backslash texttt{ret} \latex default (return from subroutine) pair of instructions. Exceptions may be thrown with the \latex latex \backslash texttt{athrow} \latex default instruction. Branch targets are coded as offsets from the current byte code position, i.e., they are coded with an integer number. \layout Description Load\SpecialChar ~ and\SpecialChar ~ store\SpecialChar ~ operations for local variables like \latex latex \backslash texttt{iload} \latex default and \latex latex \backslash texttt{istore} \latex default . There are also array operations like \latex latex \backslash texttt{iastore} \latex default which stores an integer value into an array. \layout Description Field\SpecialChar ~ access: The value of an instance field may be retrieved with \latex latex \backslash texttt{getfield} \latex default and written with \latex latex \backslash texttt{putfield} \latex default . For static fields, there are \latex latex \backslash texttt{getstatic} \latex default and \latex latex \backslash texttt{putstatic} \latex default counterparts. \layout Description Method\SpecialChar ~ invocation: Methods may either be called via static references with \latex latex \backslash texttt{invokestatic} \latex default or be bound virtually with the \latex latex \backslash texttt{invokevirtual} \latex default instruction. Super class methods and private methods are invoked with \latex latex \backslash texttt{invokespecial} \latex default . \layout Description Object\SpecialChar ~ allocation: Class instances are allocated with the \latex latex \backslash texttt{new} \latex default instruction, arrays of basic type like \latex latex \backslash texttt{int[]} \latex default with \latex latex \backslash texttt{newarray} \latex default , arrays of references like \latex latex \backslash texttt{String[][]} \latex default with \latex latex \backslash texttt{anewarray} \latex default or \latex latex \backslash texttt{multianewarray} \latex default . \layout Description Conversion\SpecialChar ~ and\SpecialChar ~ type\SpecialChar ~ checking: For stack operands of basic type there exist casting operations like \latex latex \backslash texttt{f2i} \latex default which converts a float value into an integer. The validity of a type cast may be checked with \latex latex \backslash texttt{checkcast} \latex default and the \latex latex \backslash texttt{instanceof} \latex default operator can be directly mapped to the equally named instruction. \layout Standard Most instructions have a fixed length, but there are also some variable-length instructions: In particular, the \latex latex \backslash texttt{lookupswitch} \latex default and \latex latex \backslash texttt{tableswitch} \latex default instructions, which are often used by compilers to implement the Java language \latex latex \backslash texttt{switch()} \latex default statements. Since the number of \latex latex \backslash texttt{case} \latex default clauses may vary, these instructions contain a variable number of statements. \layout Standard In a class file, the \family typewriter code \family default item in the \family typewriter Code \family default attributes (which in turn are attributes of \family typewriter method_info \family default structures), is a byte array in which binary representations of JVM instruction s are stored sequentially. This is also called \emph on bytecode \emph default . \layout Standard The JVM is a stack-based machine. There are local variables which may be compared to registers, but most instructions work on the operand stack. E.g., the \latex latex \backslash texttt{iadd} \latex default instruction pops two integers from the operand stack and pushes the result of the add operation on top of the stack. \layout Standard We will not list all of the instructions here, since these are explained in detail in the JVM specification. However, you will find the most common instructions in table \begin_inset LatexCommand \ref{typeprefixes} \end_inset , cited with slight corrections and modifications from chapter 4 of \begin_inset LatexCommand \cite{JNS} \end_inset . \layout Standard \begin_float tab \layout Caption \begin_inset LatexCommand \label{typeprefixes} \end_inset Type Prefixes and the Most Common JVM Instructions \layout Standard \align center \begin_inset Tabular \begin_inset Text \layout Standard Prefix \end_inset \begin_inset Text \layout Standard Bytecode type \end_inset \begin_inset Text \layout Standard i \end_inset \begin_inset Text \layout Standard Integer \end_inset \begin_inset Text \layout Standard f \end_inset \begin_inset Text \layout Standard Floating point \end_inset \begin_inset Text \layout Standard l \end_inset \begin_inset Text \layout Standard Long \end_inset \begin_inset Text \layout Standard d \end_inset \begin_inset Text \layout Standard Double precision floating point \end_inset \begin_inset Text \layout Standard b \end_inset \begin_inset Text \layout Standard Byte \end_inset \begin_inset Text \layout Standard s \end_inset \begin_inset Text \layout Standard Short \end_inset \begin_inset Text \layout Standard c \end_inset \begin_inset Text \layout Standard Character \end_inset \begin_inset Text \layout Standard a \end_inset \begin_inset Text \layout Standard Object reference \end_inset \end_inset \end_float \layout Standard \begin_inset Tabular \begin_inset Text \layout Standard \size scriptsize Instruction \end_inset \begin_inset Text \layout Standard \size scriptsize int \end_inset \begin_inset Text \layout Standard \size scriptsize long \end_inset \begin_inset Text \layout Standard \size scriptsize float \end_inset \begin_inset Text \layout Standard \size scriptsize double \end_inset \begin_inset Text \layout Standard \size scriptsize byte \end_inset \begin_inset Text \layout Standard \size scriptsize char \end_inset \begin_inset Text \layout Standard \size scriptsize short \end_inset \begin_inset Text \layout Standard \size scriptsize object ref. \end_inset \begin_inset Text \layout Standard \size scriptsize Function \end_inset \begin_inset Text \layout Standard ?2c \end_inset \begin_inset Text \layout Standard X \end_inset \begin_inset Text \layout Standard \end_inset \begin_inset Text \layout Standard \end_inset \begin_inset Text \layout Standard \end_inset \begin_inset Text \layout Standard \end_inset \begin_inset Text \layout Standard \end_inset \begin_inset Text \layout Standard \end_inset \begin_inset Text \layout Standard \end_inset \begin_inset Text \layout Standard Convert value of type to character \end_inset \begin_inset Text \layout Standard ?2d \end_inset \begin_inset Text \layout Standard X \end_inset \begin_inset Text \layout Standard X \end_inset \begin_inset Text \layout Standard X \end_inset \begin_inset Text \layout Standard \end_inset \begin_inset Text \layout Standard \end_inset \begin_inset Text \layout Standard \end_inset \begin_inset Text \layout Standard \end_inset \begin_inset Text \layout Standard \end_inset \begin_inset Text \layout Standard Convert value of type to double \end_inset \begin_inset Text \layout Standard ?2i \end_inset \begin_inset Text \layout Standard \end_inset \begin_inset Text \layout Standard X \end_inset \begin_inset Text \layout Standard X \end_inset \begin_inset Text \layout Standard X \end_inset \begin_inset Text \layout Standard \end_inset \begin_inset Text \layout Standard \end_inset \begin_inset Text \layout Standard \end_inset \begin_inset Text \layout Standard \end_inset \begin_inset Text \layout Standard Convert value of type to integer \end_inset \begin_inset Text \layout Standard ?2f \end_inset \begin_inset Text \layout Standard X \end_inset \begin_inset Text \layout Standard X \end_inset \begin_inset Text \layout Standard \end_inset \begin_inset Text \layout Standard X \end_inset \begin_inset Text \layout Standard \end_inset \begin_inset Text \layout Standard \end_inset \begin_inset Text \layout Standard \end_inset \begin_inset Text \layout Standard \end_inset \begin_inset Text \layout Standard Convert value of type to float \end_inset \begin_inset Text \layout Standard ?2l \end_inset \begin_inset Text \layout Standard X \end_inset \begin_inset Text \layout Standard \end_inset \begin_inset Text \layout Standard X \end_inset \begin_inset Text \layout Standard X \end_inset \begin_inset Text \layout Standard \end_inset \begin_inset Text \layout Standard \end_inset \begin_inset Text \layout Standard \end_inset \begin_inset Text \layout Standard \end_inset \begin_inset Text \layout Standard Convert value of type to long \end_inset \begin_inset Text \layout Standard ?2s \end_inset \begin_inset Text \layout Standard X \end_inset \begin_inset Text \layout Standard \end_inset \begin_inset Text \layout Standard \end_inset \begin_inset Text \layout Standard \end_inset \begin_inset Text \layout Standard \end_inset \begin_inset Text \layout Standard \end_inset \begin_inset Text \layout Standard \end_inset \begin_inset Text \layout Standard \end_inset \begin_inset Text \layout Standard Convert value of type to short \end_inset \begin_inset Text \layout Standard ?add \end_inset \begin_inset Text \layout Standard X \end_inset \begin_inset Text \layout Standard X \end_inset \begin_inset Text \layout Standard X \end_inset \begin_inset Text \layout Standard X \end_inset \begin_inset Text \layout Standard \end_inset \begin_inset Text \layout Standard \end_inset \begin_inset Text \layout Standard \end_inset \begin_inset Text \layout Standard \end_inset \begin_inset Text \layout Standard Add two values of type \end_inset \begin_inset Text \layout Standard ?aload \end_inset \begin_inset Text \layout Standard X \end_inset \begin_inset Text \layout Standard X \end_inset \begin_inset Text \layout Standard X \end_inset \begin_inset Text \layout Standard X \end_inset \begin_inset Text \layout Standard X \end_inset \begin_inset Text \layout Standard X \end_inset \begin_inset Text \layout Standard X \end_inset \begin_inset Text \layout Standard X \end_inset \begin_inset Text \layout Standard Push an element of type from an array onto the stack \end_inset \begin_inset Text \layout Standard ?and \end_inset \begin_inset Text \layout Standard X \end_inset \begin_inset Text \layout Standard X \end_inset \begin_inset Text \layout Standard \end_inset \begin_inset Text \layout Standard \end_inset \begin_inset Text \layout Standard \end_inset \begin_inset Text \layout Standard \end_inset \begin_inset Text \layout Standard \end_inset \begin_inset Text \layout Standard \end_inset \begin_inset Text \layout Standard Perform logical AND on two values of type \end_inset \begin_inset Text \layout Standard ?astore \end_inset \begin_inset Text \layout Standard X \end_inset \begin_inset Text \layout Standard X \end_inset \begin_inset Text \layout Standard X \end_inset \begin_inset Text \layout Standard X \end_inset \begin_inset Text \layout Standard X \end_inset \begin_inset Text \layout Standard X \end_inset \begin_inset Text \layout Standard X \end_inset \begin_inset Text \layout Standard X \end_inset \begin_inset Text \layout Standard Pop an element of type from the stack and store it in an array of type \end_inset \begin_inset Text \layout Standard ?cmp \end_inset \begin_inset Text \layout Standard \end_inset \begin_inset Text \layout Standard X \end_inset \begin_inset Text \layout Standard \end_inset \begin_inset Text \layout Standard \end_inset \begin_inset Text \layout Standard \end_inset \begin_inset Text \layout Standard \end_inset \begin_inset Text \layout Standard \end_inset \begin_inset Text \layout Standard \end_inset \begin_inset Text \layout Standard Compare two long values. If they are equal push 0, if the first is greater push 1, else push -1 \end_inset \begin_inset Text \layout Standard ?cmpg \end_inset \begin_inset Text \layout Standard \end_inset \begin_inset Text \layout Standard \end_inset \begin_inset Text \layout Standard X \end_inset \begin_inset Text \layout Standard X \end_inset \begin_inset Text \layout Standard \end_inset \begin_inset Text \layout Standard \end_inset \begin_inset Text \layout Standard \end_inset \begin_inset Text \layout Standard \end_inset \begin_inset Text \layout Standard Compare two IEEE values of type from the stack. If they are equal push 0, if the first is greater push 1, if the second is greater push -1. If either is NaN (not a number) push 1 \end_inset \begin_inset Text \layout Standard ?cmpl \end_inset \begin_inset Text \layout Standard \end_inset \begin_inset Text \layout Standard \end_inset \begin_inset Text \layout Standard X \end_inset \begin_inset Text \layout Standard X \end_inset \begin_inset Text \layout Standard \end_inset \begin_inset Text \layout Standard \end_inset \begin_inset Text \layout Standard \end_inset \begin_inset Text \layout Standard \end_inset \begin_inset Text \layout Standard Compare two IEEE values of type from the stack. If they are equal push 0, if the first is greater push 1, if the second is greater push -1. If either is NaN (not a number) push -1 \end_inset \begin_inset Text \layout Standard ?const \end_inset \begin_inset Text \layout Standard X \end_inset \begin_inset Text \layout Standard X \end_inset \begin_inset Text \layout Standard X \end_inset \begin_inset Text \layout Standard X \end_inset \begin_inset Text \layout Standard \end_inset \begin_inset Text \layout Standard \end_inset \begin_inset Text \layout Standard \end_inset \begin_inset Text \layout Standard X \end_inset \begin_inset Text \layout Standard Push a constant value of type onto the stack \end_inset \begin_inset Text \layout Standard ?div \end_inset \begin_inset Text \layout Standard X \end_inset \begin_inset Text \layout Standard X \end_inset \begin_inset Text \layout Standard X \end_inset \begin_inset Text \layout Standard X \end_inset \begin_inset Text \layout Standard \end_inset \begin_inset Text \layout Standard \end_inset \begin_inset Text \layout Standard \end_inset \begin_inset Text \layout Standard \end_inset \begin_inset Text \layout Standard Perform a division using two values of type and push the quotient onto the stack \end_inset \begin_inset Text \layout Standard ?inc \end_inset \begin_inset Text \layout Standard X \end_inset \begin_inset Text \layout Standard \end_inset \begin_inset Text \layout Standard \end_inset \begin_inset Text \layout Standard \end_inset \begin_inset Text \layout Standard \end_inset \begin_inset Text \layout Standard \end_inset \begin_inset Text \layout Standard \end_inset \begin_inset Text \layout Standard \end_inset \begin_inset Text \layout Standard Increment the top of the stack (possibly by a negative value) \end_inset \begin_inset Text \layout Standard ?ipush \end_inset \begin_inset Text \layout Standard \end_inset \begin_inset Text \layout Standard \end_inset \begin_inset Text \layout Standard \end_inset \begin_inset Text \layout Standard \end_inset \begin_inset Text \layout Standard X \end_inset \begin_inset Text \layout Standard \end_inset \begin_inset Text \layout Standard X \end_inset \begin_inset Text \layout Standard \end_inset \begin_inset Text \layout Standard Push a sign extended byte or short value onto the stack \end_inset \begin_inset Text \layout Standard ?load \end_inset \begin_inset Text \layout Standard X \end_inset \begin_inset Text \layout Standard X \end_inset \begin_inset Text \layout Standard X \end_inset \begin_inset Text \layout Standard X \end_inset \begin_inset Text \layout Standard \end_inset \begin_inset Text \layout Standard \end_inset \begin_inset Text \layout Standard \end_inset \begin_inset Text \layout Standard \end_inset \begin_inset Text \layout Standard Push a value of type from a local variable onto the stack \end_inset \begin_inset Text \layout Standard ?mul \end_inset \begin_inset Text \layout Standard X \end_inset \begin_inset Text \layout Standard X \end_inset \begin_inset Text \layout Standard X \end_inset \begin_inset Text \layout Standard X \end_inset \begin_inset Text \layout Standard \end_inset \begin_inset Text \layout Standard \end_inset \begin_inset Text \layout Standard \end_inset \begin_inset Text \layout Standard \end_inset \begin_inset Text \layout Standard Perform multiplication of two values of type \end_inset \begin_inset Text \layout Standard ?neg \end_inset \begin_inset Text \layout Standard X \end_inset \begin_inset Text \layout Standard X \end_inset \begin_inset Text \layout Standard X \end_inset \begin_inset Text \layout Standard X \end_inset \begin_inset Text \layout Standard \end_inset \begin_inset Text \layout Standard \end_inset \begin_inset Text \layout Standard \end_inset \begin_inset Text \layout Standard \end_inset \begin_inset Text \layout Standard Negate a value of type \end_inset \begin_inset Text \layout Standard ?newarray \end_inset \begin_inset Text \layout Standard \end_inset \begin_inset Text \layout Standard \end_inset \begin_inset Text \layout Standard \end_inset \begin_inset Text \layout Standard \end_inset \begin_inset Text \layout Standard \end_inset \begin_inset Text \layout Standard \end_inset \begin_inset Text \layout Standard \end_inset \begin_inset Text \layout Standard X \end_inset \begin_inset Text \layout Standard Create a new array of object references \end_inset \begin_inset Text \layout Standard ?or \end_inset \begin_inset Text \layout Standard X \end_inset \begin_inset Text \layout Standard X \end_inset \begin_inset Text \layout Standard \end_inset \begin_inset Text \layout Standard \end_inset \begin_inset Text \layout Standard \end_inset \begin_inset Text \layout Standard \end_inset \begin_inset Text \layout Standard \end_inset \begin_inset Text \layout Standard \end_inset \begin_inset Text \layout Standard Perform logical OR on two values of type \end_inset \begin_inset Text \layout Standard ?rem \end_inset \begin_inset Text \layout Standard X \end_inset \begin_inset Text \layout Standard X \end_inset \begin_inset Text \layout Standard X \end_inset \begin_inset Text \layout Standard X \end_inset \begin_inset Text \layout Standard \end_inset \begin_inset Text \layout Standard \end_inset \begin_inset Text \layout Standard \end_inset \begin_inset Text \layout Standard \end_inset \begin_inset Text \layout Standard Perform a division using two values of type and push the remainder onto the stack \end_inset \begin_inset Text \layout Standard ?return \end_inset \begin_inset Text \layout Standard X \end_inset \begin_inset Text \layout Standard X \end_inset \begin_inset Text \layout Standard X \end_inset \begin_inset Text \layout Standard X \end_inset \begin_inset Text \layout Standard \end_inset \begin_inset Text \layout Standard \end_inset \begin_inset Text \layout Standard \end_inset \begin_inset Text \layout Standard X \end_inset \begin_inset Text \layout Standard Return a value of type to the invoking method \end_inset \begin_inset Text \layout Standard ?shl \end_inset \begin_inset Text \layout Standard X \end_inset \begin_inset Text \layout Standard X \end_inset \begin_inset Text \layout Standard \end_inset \begin_inset Text \layout Standard \end_inset \begin_inset Text \layout Standard \end_inset \begin_inset Text \layout Standard \end_inset \begin_inset Text \layout Standard \end_inset \begin_inset Text \layout Standard \end_inset \begin_inset Text \layout Standard Perform arithmetic shift left on a value of type \end_inset \begin_inset Text \layout Standard ?shr \end_inset \begin_inset Text \layout Standard X \end_inset \begin_inset Text \layout Standard X \end_inset \begin_inset Text \layout Standard \end_inset \begin_inset Text \layout Standard \end_inset \begin_inset Text \layout Standard \end_inset \begin_inset Text \layout Standard \end_inset \begin_inset Text \layout Standard \end_inset \begin_inset Text \layout Standard \end_inset \begin_inset Text \layout Standard Perform arithmetic shift right on a value of type \end_inset \begin_inset Text \layout Standard ?store \end_inset \begin_inset Text \layout Standard X \end_inset \begin_inset Text \layout Standard X \end_inset \begin_inset Text \layout Standard X \end_inset \begin_inset Text \layout Standard X \end_inset \begin_inset Text \layout Standard \end_inset \begin_inset Text \layout Standard \end_inset \begin_inset Text \layout Standard \end_inset \begin_inset Text \layout Standard X \end_inset \begin_inset Text \layout Standard Pop a value of type and store it into a local variable \end_inset \begin_inset Text \layout Standard ?sub \end_inset \begin_inset Text \layout Standard X \end_inset \begin_inset Text \layout Standard X \end_inset \begin_inset Text \layout Standard X \end_inset \begin_inset Text \layout Standard X \end_inset \begin_inset Text \layout Standard \end_inset \begin_inset Text \layout Standard \end_inset \begin_inset Text \layout Standard \end_inset \begin_inset Text \layout Standard \end_inset \begin_inset Text \layout Standard Perform a subtraction using two values of type \end_inset \end_inset \layout Standard The opcode names are mostly self-explanatory. In this paper, all bytecode is commented to support the intuitive understanding. Algorithms \begin_inset LatexCommand \ref{facjavapl} \end_inset and \begin_inset LatexCommand \ref{facjavabytecode} \end_inset show an example bytecode taken from \begin_inset LatexCommand \cite{BCEL98} \end_inset . It implements the well-known faculty function. To understand this example, it is important to know that method arguments are stored into the local variables of a newly created execution frame upon method invocation. \layout Standard \begin_float alg \layout Caption \begin_inset LatexCommand \label{facjavapl} \end_inset Methed \emph on fac \emph default in a class \emph on Faculty \emph default , Java programming language version \layout Standard \family typewriter public static final int fac(int n){ \layout Standard \family typewriter \SpecialChar ~ \SpecialChar ~ return (n==0)?1:n*fac(n-1); \layout Standard \family typewriter } \end_float \layout Standard \begin_float alg \layout Caption \begin_inset LatexCommand \label{facjavabytecode} \end_inset Method \emph on fac \emph default in a class \emph on Faculty \emph default , Java bytecode version \layout Standard \family typewriter \size footnotesize Faculty.fac (I)I \layout Standard \family typewriter \size footnotesize 0:\SpecialChar ~ \SpecialChar ~ iload_0\SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ ; load argument onto stack \layout Standard \family typewriter \size footnotesize 1:\SpecialChar ~ \SpecialChar ~ ifne #8\SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ ; non-zero? Then branch to 8. \layout Standard \family typewriter \size footnotesize 4:\SpecialChar ~ \SpecialChar ~ iconst_1\SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ ; push constant 1 onto stack \layout Standard \family typewriter \size footnotesize 5:\SpecialChar ~ \SpecialChar ~ goto #16\SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ ; jump to 16 \layout Standard \family typewriter \size footnotesize 8:\SpecialChar ~ \SpecialChar ~ iload_0\SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ ; load argument onto stack \layout Standard \family typewriter \size footnotesize 9:\SpecialChar ~ \SpecialChar ~ iload_0\SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ ; load argument onto stack \layout Standard \family typewriter \size footnotesize 10:\SpecialChar ~ iconst_1\SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ ; push constant 1 onto stack \layout Standard \family typewriter \size footnotesize 11:\SpecialChar ~ isub\SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ ; subtract the stack top from \layout Standard \family typewriter \size footnotesize \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ ; the stack next-to-top which becomes \layout Standard \family typewriter \size footnotesize \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ ; the new stack top \layout Standard \family typewriter \size footnotesize 12:\SpecialChar ~ invokestatic Faculty.fac (I)I\SpecialChar ~ \SpecialChar ~ ; call method fac recursively, \layout Standard \family typewriter \size footnotesize \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ ; the new invocation \layout Standard \family typewriter \size footnotesize \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ ; instance's argument is the stack top \layout Standard \family typewriter \size footnotesize 15:\SpecialChar ~ imul\SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ ; multiply the return value with the \layout Standard \family typewriter \size footnotesize \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ ; argument given to the current \layout Standard \family typewriter \size footnotesize \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ ; invocation instance \layout Standard \family typewriter \size footnotesize 16:\SpecialChar ~ ireturn\SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ ; return value on top of the \layout Standard \family typewriter \size footnotesize \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ ; stack to the invoking method \end_float \layout Chapter \begin_inset LatexCommand \label{SpecPasses} \end_inset Specification of the Verification Passes \layout Standard Sun describes a four-pass class file verifier in The Java Virtual Machine Specification, Second Edition \begin_inset LatexCommand \cite{vmspec2} \end_inset . It is not necessary to implement the verification algorithms literally; and it is not possible anyway (see section \begin_inset LatexCommand \ref{SpecSubroutines} \end_inset ). However, implementing a verifier with a multiple-pass architecture makes sense. It is a good thing to stay close to the specification because it is well-known throughout the bytecode engineering community. Also, the boundaries between the passes are not arbitrary. They are drawn to improve the performance of the verifiers built into JVMs. For example, classes are not verified (completely) before they are actually used but they are loaded as soon as they are referenced in a certain way. Most verifiers use the traditional multiple-pass architecture, including Kimera \begin_inset LatexCommand \cite{Kimera-WWW} \end_inset . Work in other directions (for instance, the one-pass-architecture proposed by Fong \begin_inset LatexCommand \cite{Fong-WWW} \end_inset ) did not yield lasting results. \layout Standard Pass one is basically about loading a class file into the JVM in a sane way and pass two verifies that the loaded class file information is consistent. Pass three verifies that the program code is well-behaved; pass four verifies things that conceptually belong to pass three but are delayed to the run-time for performance reasons. \layout Standard Sometimes implementation details are discussed in this chapter. Whenever the specification \begin_inset LatexCommand \cite{vmspec2} \end_inset was ambigous about some issue, the behaviour of Sun's JVM implementations was observed. The discussed details are part of the specification of the JustIce verifier. \layout Section \begin_inset LatexCommand \label{PassOneSpec} \end_inset Pass One \layout Standard The first pass of the verifier is only vaguely specified. It is there to assure a class file \begin_inset Quotes eld \end_inset \series bold has the basic format of a class file. The first four bytes must contain the right magic number. All recognized attributes must be of the proper length. The class file must not be truncated or have any extra bytes at the end. The constant pool must not contain any superficially unrecognizable information \series default \begin_inset Quotes erd \end_inset ( \begin_inset LatexCommand \cite{vmspec2} \end_inset , page 141). \layout Standard The right magic number is 0xCAFEBABE ( \begin_inset LatexCommand \cite{vmspec2} \end_inset , page 94), which is easy to assure. \layout Standard It is not clear what \begin_inset Quotes eld \end_inset superficially unrecognizable information \begin_inset Quotes erd \end_inset exactly is, however. If an attribute is not known to the JVM (or verifier) implementation, it has to be ignored -- so this does not seem to be \begin_inset Quotes eld \end_inset superficially unrecognizable information \begin_inset Quotes erd \end_inset . Attributes that are not used cannot be detected in pass one. One would have to look at the bytecodes to decide whether an attribute is used or not (which is not the domain of pass one, but of pass three). \layout Standard Observations show that most existing JVM verifiers \begin_float footnote \layout Standard An example of a verifier with this behaviour is the one implemented in Sun's Solaris port of the JVM, version 1.3.0_01. \end_float ignore \begin_inset Quotes eld \end_inset extra bytes at the end \begin_inset Quotes erd \end_inset instead of rejecting class files bearing them. \layout Standard The other two statements specify verification of the class file structure (and the structure of the attributes therein). But this is also the domain of pass two! Only by inspecting the way the JVM \emph on loads \emph default , \emph on resolves \emph default and \emph on prepares \emph default classes one will understand the precise boundary between verification passes one and two \begin_inset LatexCommand \cite{Fong-WWW} \end_inset . \layout Standard 'Being careful when loading a class file' is a good definition for pass one: the structure of the file to load is untrusted. Every implicit statement such as \begin_inset Quotes eld \end_inset this attribute has a length of 1234 bytes in total \begin_inset Quotes erd \end_inset is validated. \layout Standard \emph on Resolution \emph default is the transformation of a symbolic reference to an actual reference -- i.e., as long as there is only a symbolic reference to an entity, this entity cannot be verified at all because it has not been loaded yet. Passes two and three are performed during the \emph on resolution \emph default of a class file; while loading of the class file --pass one-- must have been performed before. \emph on Resolution \emph default as such is meaningless to JustIce; the term is only used to draw the borders between the verification passes. \layout Section \begin_inset LatexCommand \label{SpecPassTwo} \end_inset Pass Two \layout Standard The checks performed in pass two enforce that the following constraints are satisfied. \layout Itemize Ensuring that final classes are not subclassed and that final methods are not overridden. \layout Itemize Checking that every class (except \family typewriter java.lang.Object \family default ) has a direct superclass. \layout Itemize Ensuring that the constant pool satisfies the documented static constraints: for example, that each \family typewriter CONSTANT_Class_info \family default structure in the constant pool contains in its \family typewriter name_index \family default item a valid constant pool index for a \family typewriter CONSTANT_Utf8_info \family default structure. \layout Itemize Checking that all field references and method references in the constant pool have valid names, valid classes, and a valid type descriptor. \layout Standard As Frank Yellin puts it \begin_inset LatexCommand \cite{Yellin-WWW} \end_inset : pass two \begin_inset Quotes eld \end_inset performs all verification that can be performed without looking at the bytecodes \begin_inset Quotes erd \end_inset . Also, \begin_inset Quotes eld \end_inset this pass does not actually check to make sure that the given field or method really exists in the given class; nor does it check that the type signatures given refer to real classes. \begin_inset Quotes erd \end_inset Note that again \emph on resolution \emph default plays an important role to create the boundary between two passes; here it is the boundary between pass two and pass three. Because linking-time verification enhances the performance of the JVM, checks that basically belong to pass two are delayed to pass three. This leads to the obvious contradiction in the sentences cited above. \layout Standard This performance enhancement has an ugly side effect. Consider a reference to a method m contained in a class file C that does not exist. As long as this reference is not \emph on used \emph default , i.e., \emph on resolved \emph default , the absence of C cannot be detected. Such a reference should in the author's opinion regarded as \begin_inset Quotes eld \end_inset superficially unrecognizable information \begin_inset Quotes erd \end_inset (see section \begin_inset LatexCommand \ref{PassOneSpec} \end_inset ) and therefore be detected. \layout Standard This pass has to verify the integrity of the clas file's data structures as explained in section \begin_inset LatexCommand \ref{Classfile Structure} \end_inset . As an example, consider the Line\SpecialChar \- Number\SpecialChar \- Table atribute. Sun did not specify there has to be exactly one \family typewriter Line\SpecialChar \- Number\SpecialChar \- Table \family default attribute (or none at all) per method, so possibly there is more than one attribute of that kind. This lax specification is not necessary due to the fact that you can put all information in a single \family typewriter Line\SpecialChar \- Number\SpecialChar \- Table_attri\SpecialChar \- bute \begin_float footnote \layout Standard Any number of \family typewriter line_number_table \family default array entries fits nicely in a single \family typewriter LineNumberTable_attribute \family default attribute. \end_float , but Sun did specify it this way ( \begin_inset LatexCommand \cite{vmspec2} \end_inset , page 129). \layout Standard Verifiers are requested to reject class files with inconsistent information in their attributes. However, here it may be that only by looking at all \family typewriter Line\SpecialChar \- Number\SpecialChar \- Table_attribute \family default s of a method, an inconsistency can be detected. JustIce does so and rejects class files with inconsistent \family typewriter Line\SpecialChar \- Number\SpecialChar \- Table \family default information. \layout Standard Furthermore, it issues warnings if such an attribute is detected at all to discourage its use (see section \begin_inset LatexCommand \ref{Pass2Impl} \end_inset ). This is done because of possible different interpretations of the specification. \layout Standard It should be noted that the use of attributes raises a few more problems to class file verification. A simple case is the presence of an unknown attribute that may safely be ignored. It is explicitly stated that such a class file must not be rejected. On the other hand, how should a verifier react if --for example-- a \family typewriter field_info \family default (see section \begin_inset LatexCommand \ref{Fields} \end_inset ) structure encloses a \family typewriter Code_attribute \family default ? JustIce will issue a warning but not reject the class file. \layout Section \begin_inset LatexCommand \label{Pass3Spec} \end_inset Pass Three \layout Standard Performing pass three basically means \emph on verifying the bytecode \emph default . There are so-called \begin_inset Quotes eld \end_inset static constraints \begin_inset Quotes erd \end_inset on both the instructions in the code array and their operands. There are also so-called \begin_inset Quotes eld \end_inset structural constraints \begin_inset Quotes erd \end_inset . The structural constraints specify constraints on relationships between JVM instructions, so some people (including the author) regard \begin_inset Quotes eld \end_inset structural constraints \begin_inset Quotes erd \end_inset as a misnomer; they should be called \begin_inset Quotes eld \end_inset dynamic constraints \begin_inset Quotes erd \end_inset . \layout Standard Static constraints are easily enforced using very simple checks. Here is an example for such a check: let there be a \family typewriter Code \family default (see section \begin_inset LatexCommand \ref{CodeAttribute} \end_inset ) attribute with a \family typewriter max_locals \family default value of 2. Only local variables number 0 and 1 may be accessed by the bytecode in this \family typewriter Code \family default attribute. For all instructions accessing local variables, make sure they do not access any other local variable. \layout Standard Structural constraints are enforced using an algorithm sketched by Sun; it implements a symbolic execution of a method's code, by means of data flow analysis including type inference ( \begin_inset LatexCommand \cite{vmspec2} \end_inset , pages 143-151). This algorithm is called the \emph on data flow analyzer. \emph default It is intuitively easy to understand, but it is hard to prove its correctness. The reason for that is the very weak specification of its subtleties; especiall y \emph on subroutines \emph default , \emph on wide date types \emph default and \emph on object initialization \emph default (see below). The general approach, however, is sound \begin_inset LatexCommand \cite{BCV-Soundness} \end_inset . Here is an example for a structural constraint enforced by this algorithm: during program execution, at any given point in the program the operand stack is always of the same height, no matter which code path was taken to reach that point. \layout Standard Pass three is the core of the verifier. Note that we will split this pass up into two passes, namely a pass verifying the static constraints and a pass verifying the structural constraints of a method's code. We will call these passes \begin_inset Quotes eld \end_inset pass 3a \begin_inset Quotes erd \end_inset and \begin_inset Quotes eld \end_inset pass 3b \begin_inset Quotes erd \end_inset . In a way, they resemble pass one and pass two: the former pass carefully parses an entity, while the latter pass performs additional verification. \layout Standard By defining pass four, the specification \begin_inset LatexCommand \cite{vmspec2} \end_inset implicitly excludes \begin_inset Quotes eld \end_inset certain tests that could in principle be performed in Pass 3 \begin_inset Quotes erd \end_inset , because they are \begin_inset Quotes eld \end_inset delayed until the first time the code for the method is actually invoked \begin_inset Quotes erd \end_inset . On the other hand, verifiers are allowed to perform pass four partially or completely as a part of pass three. JustIce performs the pass four checks in pass 3a. \layout Subsection Static Constraints: Pass 3a \layout Standard Sun gives examples of what the verifier does before starting the data flow analyzer ( \begin_inset LatexCommand \cite{vmspec2} \end_inset , pages 143-144): \layout Itemize \pextra_type 1 \pextra_width 10mm \series bold Branches must be within the bounds of the code array for the method. \layout Itemize \pextra_type 1 \pextra_width 10mm \series bold The targets of all control-flow instructions are each the start of an instructio n. In the case of a \latex latex \backslash texttt{wide} \latex default instruction the \latex latex \backslash texttt{wide} \latex default opcode is considered the start of the instruction, and the opcode giving the operation modified by that \latex latex \backslash texttt{wide} \latex default instruction is not considered to start an instruction. Branches into the middle of an instruction are disallowed. \layout Itemize \pextra_type 1 \pextra_width 10mm \series bold No instruction can access or modify a local variable at an index greater than or equal to the number of local variables that its method indicates it allocates. \layout Itemize \pextra_type 1 \pextra_width 10mm \series bold All references to the constant pool must be an entry of the appropriate type. For example: the instruction \latex latex \backslash texttt{ldc} \latex default can be used only for data of type int or float or for instances of class String; the instruction \latex latex \backslash texttt{getfield} \latex default must reference a field. \layout Itemize \pextra_type 1 \pextra_width 10mm \series bold The code does not end in the middle of an instruction. \layout Itemize \pextra_type 1 \pextra_width 10mm \series bold Execution cannot fall off the end of the code. \layout Itemize \pextra_type 1 \pextra_width 10mm \series bold For each exception handler, the starting and ending point of the code protected by the handler must be at the beginning of an instruction or, in the case of the ending point, immediately past the end of the code. The starting point must be before the ending point. The exception handler code must start at a valid instruction, and it may not start at an opcode being modified by the \latex latex \backslash texttt{wide} \latex default instruction. \layout Standard Most of these constraints are either static constraints on instructions or on their operands. A full list of constraints can be found in the Java Virtual Machine Specificati on, Second Edition ( \begin_inset LatexCommand \cite{vmspec2} \end_inset , pages 133-137). \layout Standard The check for execution falling off the end of the code is an exception: this is a structural constraint and should therefore be performed in pass 3b. Sun's verifiers, however, reject code that has an unreachable \latex latex \backslash texttt{nop} \latex default at the end of the code array. Obviously, they reject the code before performing data flow analysis. For the sake of compatibility, JustIce performs this check in pass 3a. \layout Standard Note that the JVM's instructions differ in length. Some instructions occupy only one byte (such as \family typewriter nop \family default ), others occupy three bytes (such as \family typewriter goto \family default ). Branch instructions could therefore target operands of instructions. For example, line 1 of algorithm \begin_inset LatexCommand \ref{facjavabytecode} \end_inset reads \begin_inset Quotes eld \end_inset \family typewriter 1: ifne #8 \family default \begin_inset Quotes erd \end_inset . If it would read \begin_inset Quotes eld \end_inset \family typewriter 1: ifne #7 \family default \begin_inset Quotes erd \end_inset , this code was malformed. A special case is the instruction \family typewriter wide \family default . This instruction takes another instruction \emph on as its operand \emph default , so one could be misguided into thinking this embedded instruction was a valid target for branches. It is not. \layout Standard The checks Sun delays until pass four are performed in pass 3a by JustIce. These are checks to ensure allowed and possible access to a referenced type, listed below. \layout Itemize Is the type (class or interface) currently under examination allowed to reference the type \begin_float footnote \layout Standard Interfaces may contain code, this is normally used for static initialization of \family typewriter final \family default variables. \end_float ? \layout Itemize Does the referenced method or field exist in the given class? \layout Itemize Does the referenced method or field have the indicated descriptor (signature)? \layout Itemize Does the method currently under examination have access to the referenced method or field? \layout Subsection Structural Constraints: Pass 3b \layout Standard The structural constraints of JVM instructions are enforced by a data flow analyzer. This algorithm ensures the following constraints ( \begin_inset LatexCommand \cite{vmspec2} \end_inset , page 142). \layout Itemize \pextra_type 1 \pextra_width 10mm \series bold The operand stack is always the same size and contains the same types of values. \layout Itemize \pextra_type 1 \pextra_width 10mm \series bold No local variable is accessed unless it is known to contain a value of an appropriate type. \layout Itemize \pextra_type 1 \pextra_width 10mm \series bold Methods are invoked with the appropriate arguments. \layout Itemize \pextra_type 1 \pextra_width 10mm \series bold Fields are assigned only using values of appropriate types. \layout Itemize \pextra_type 1 \pextra_width 10mm \series bold All opcodes have appropriate type arguments on the operand stack and in the local variable array. \layout Standard A full list of structural constraints can be found in The Java Virtual Machine Specification, Second Edition ( \begin_inset LatexCommand \cite{vmspec2} \end_inset , pages 137-139). \layout Subsubsection \begin_inset LatexCommand \label{SunCoreAlgo} \end_inset Sun's Verification Algorithm \layout Standard Sun specifies the data flow analyzer by giving an informal algorithm ( \begin_inset LatexCommand \cite{vmspec2} \end_inset , pages 144-146). This algorithm it cited here completely because it is the very core of the verifier. According to this algorithm, every bytecode instruction has a \begin_inset Quotes eld \end_inset changed \begin_inset Quotes erd \end_inset bit. Initially, only the \begin_inset Quotes eld \end_inset changed \begin_inset Quotes erd \end_inset bit of the first instruction is set. \layout Enumerate \pextra_type 1 \pextra_width 10mm \series bold Select a virtual machine instruction whose "changed" bit is set. If no instruction remains whose "changed" bit is set, the method has successful ly been verified. Otherwise, turn off the "changed" bit of the selected instruction. \layout Enumerate \pextra_type 1 \pextra_width 10mm \series bold Model the effect of the instruction on the operand stack and local variable array by doing the following: \newline \latex latex \backslash textbullet\SpecialChar ~ \latex default If the instruction uses values from the operand stack, ensure that there are a sufficient number of values on the stack and that the top values on the stack are of an appropriate type. Otherwise, verification fails. \newline \latex latex \backslash textbullet\SpecialChar ~ \latex default If the instruction uses a local variable, ensure that the specified local variable contains a value of the appropriate type. Otherwise, verification fails. \newline \latex latex \backslash textbullet\SpecialChar ~ \latex default If the instruction pushes values onto the operand stack, ensure that there is sufficient room on the operand stack for the new values. Add the indicated types to the top of the modeled operand stack. \newline \latex latex \backslash textbullet\SpecialChar ~ \latex default If the instruction modifies a local variable, record that the local variable now contains the new type. \layout Enumerate \pextra_type 1 \pextra_width 10mm \series bold Determine the instructions that can follow the current instruction. Successor instructions can be one of the following: \newline \latex latex \backslash textbullet\SpecialChar ~ \latex default The next instruction, if the current instruction is not an unconditional control transfer instruction (for instance goto, return, or athrow). Verification fails if it is possible to "fall off" the last instruction of the method. \newline \latex latex \backslash textbullet\SpecialChar ~ \latex default The target(s) of a conditional or unconditional branch or switch. \newline \latex latex \backslash textbullet\SpecialChar ~ \latex default Any exception handlers for this instruction. \layout Enumerate \pextra_type 1 \pextra_width 10mm \series bold Merge the state of the operand stack and local variable array at the end of the execution of the current instruction into each of the successor instructions. In the special case of control transfer to an exception handler, the operand stack is set to contain a single object of the exception type indicated by the exception handler information. \newline \latex latex \backslash textbullet\SpecialChar ~ \latex default If this is the first time the successor instruction has been visited, record that the operand stack and local variable values calculated in steps 2 and 3 are the state of the operand stack and local variable array prior to executing the successor instruction. Set the "changed" bit for the successor instruction. \newline \latex latex \backslash textbullet\SpecialChar ~ \latex default If the successor instruction has been seen before, merge the operand stack and local variable values calculated in steps 2 and 3 into the values already there. Set the "changed" bit if there is any modification to the values. \layout Enumerate \pextra_type 1 \pextra_width 10mm \series bold Continue at step 1. \layout Standard \pextra_type 1 \pextra_width 10mm \series bold To merge two operand stacks, the number of values on each stack must be identical. The types of values on the stacks must also be identical, except that different ly typed reference values may appear at corresponding places on the two stacks. In this case, the merged operand stack contains a reference to an instance of the first common superclass of the two types. Such a reference type always exists because the type Object is a superclass of all class and interface types. If the operand stacks cannot be merged, verification of the method fails. \layout Standard \pextra_type 1 \pextra_width 10mm \series bold To merge two local variable array states, corresponding pairs of local variables are compared. If the two types are not identical, then unless both contain reference values, the verifier records that the local variable contains an unusable value. If both of the pair of local variables contain reference values, the merged state contains a reference to an instance of the first common superclass of the two types. \layout Standard Certain instructions and data types complicate the data flow analyzer, most notably the instruction \latex latex \backslash texttt{ret} \latex default (see section \begin_inset LatexCommand \ref{RetDesc} \end_inset ). The algorithm above even uses a special definition of \emph on merging \emph default for the \latex latex \backslash texttt{ret} \latex default instruction (see \begin_inset LatexCommand \cite{vmspec2} \end_inset , page 151). The \latex latex \backslash texttt{ret} \latex default instruction is parameterized with a value of type \family typewriter returnaddress \family default which is read from a local variable and used as a branching target. The \latex latex \backslash texttt{ret} \latex default instruction is there to implement a (control flow) return from a \emph on subroutine \emph default . \layout Subsubsection Reachability of Instructions \layout Standard For the data flow analysis algorithm, you need to know all the possible control flow successors of every instruction, i.e., you need to build a \emph on control flow graph \emph default (see below). Without the instructions \latex latex \backslash texttt{jsr} \begin_float footnote \layout Standard Remember, a \latex latex \backslash texttt{jsr} \latex default or \latex latex \backslash texttt{jsr \backslash _w} \latex default instruction is an unconditional branch instruction that jumps into a \emph on subroutine \emph default . Usually a \latex latex \backslash texttt{ret} \latex default instruction leaves the \emph on subroutine \emph default . \end_float , \latex latex \backslash texttt{jsr \backslash _w} \latex default and \latex latex \backslash texttt{ret} \latex default this calculation would be easy. But to calculate successors of a \latex latex \backslash texttt{ret} \latex default instruction, you need a complete control flow graph: you need to find out which \latex latex \backslash texttt{jsr} \latex default or \latex latex \backslash texttt{jsr \backslash _w} \latex default and \latex latex \backslash texttt{ret} \latex default pairs belong together. Therefore, a cycle of self-dependency is created that has to be broken somewhere. This is explained in detail below. \layout Standard This was also an issue that led to the definition of the term \emph on subroutine \emph default that JustIce uses. This definition allows the prediction of a \latex latex \backslash texttt{ret} \latex default instruction's target without performing control flow analysis. \layout Subsubsection \begin_inset LatexCommand \label{SpecSubroutines} \end_inset Subroutines \layout Standard Subroutines make the verification algorithm extremely difficult. They are harshly underspecified. Although \begin_inset Quotes eld \end_inset the Java virtual machine has no guarantee that any file it is asked to load was generated by that compiler \begin_inset Quotes erd \end_inset , the subroutine specification explains how \emph on javac \emph default transforms \begin_inset Quotes eld \end_inset \latex latex \backslash texttt{try} \latex default / \latex latex \backslash texttt{catch} \latex default / \latex latex \backslash texttt{finally} \latex default \begin_inset Quotes erd \end_inset clauses into subroutines \begin_inset LatexCommand \cite{vmspec2} \end_inset . Intuitively, one gets the idea that a subroutine starts with some jump target of a \latex latex \backslash texttt{jsr} \latex default or \latex latex \backslash texttt{jsr \backslash _w} \latex default instruction and ends with a \latex latex \backslash texttt{ret} \latex default instruction. But the specification fails to correctly specify what subroutines exactly are at machine instruction level. Consider algorithm \begin_inset LatexCommand \ref{jsrpopalgo} \end_inset . \layout Standard \begin_float alg \layout Standard \family typewriter 00 jsr\SpecialChar ~ 03\SpecialChar ~ \SpecialChar ~ \SpecialChar ~ ; Jump to \begin_inset Quotes eld \end_inset subroutine \begin_inset Quotes erd \end_inset at offset 03; push return \layout Standard \family typewriter \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ ; address 03 onto stack. \layout Standard \family typewriter 03 pop\SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ ; Pop the return address off the stack. \layout Standard \family typewriter 04 nop\SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ ; No operation. \layout Caption \begin_inset LatexCommand \label{jsrpopalgo} \end_inset Is This a Subroutine? \end_float \layout Standard What is this? Is the \emph on NOP \emph default instruction part of a subroutine or not? Algorithm \begin_inset LatexCommand \ref{OneOrTwoSubroutinesAlgo} \end_inset shows another example. \layout Standard \begin_float alg \layout Caption \begin_inset LatexCommand \label{OneOrTwoSubroutinesAlgo} \end_inset One or Two Subroutines? \layout Standard \family typewriter 00 iload_0\SpecialChar ~ \SpecialChar ~ ; Load a numerical 0 onto the stack. \layout Standard \family typewriter 01 jsr\SpecialChar ~ 05\SpecialChar ~ \SpecialChar ~ \SpecialChar ~ ; Jump to "subroutine" at offset 05; push return \layout Standard \family typewriter \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ ; address 04 onto stack. \layout Standard \family typewriter 04 return\SpecialChar ~ \SpecialChar ~ \SpecialChar ~ ; Leave the method. \layout Standard \family typewriter 05 dup\SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ ; Duplicate the stack's top. \layout Standard \family typewriter 06 astore\SpecialChar ~ 0\SpecialChar ~ ; Store the return address from the stack into \layout Standard \family typewriter \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ ; local variable 0. \layout Standard \family typewriter 07 astore\SpecialChar ~ 1\SpecialChar ~ ; Store the return address from the stack into \layout Standard \family typewriter \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ ; local variable 1. \layout Standard \family typewriter 08 ifeq\SpecialChar ~ 12\SpecialChar ~ \SpecialChar ~ ; If there is a 0 on top of the stack, jump to \layout Standard \family typewriter \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ ; offset 12. \layout Standard \family typewriter 11 ret\SpecialChar ~ 0\SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ ; Return to offset 4 (because this is in local \layout Standard \family typewriter \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ ; variable 0 here). \layout Standard \family typewriter 12 nop\SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ ; No operation. \layout Standard \family typewriter 13 ret\SpecialChar ~ 1\SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ ; Return to offset 4 (because this is in local \layout Standard \family typewriter \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ ; variable 1 here). \end_float \layout Standard Do we deal with one subroutine (which is the case if you define subroutines to start with a \latex latex \backslash texttt{jsr} \latex default or \latex latex \backslash texttt{jsr \backslash _w} \latex default 's target) or are these two subroutines (which is the case if you count the \latex latex \backslash texttt{ret} \latex default instructions and believe that there must be exactly one \latex latex \backslash texttt{ret} \latex default per subroutine)? \layout Standard Recursive calls to subroutines are forbidden by the specification; however, Sun's verifier implementations are not consequently deciding which recursive calls to reject \begin_float footnote \layout Standard This was experimentally found by the author and also published in \begin_inset LatexCommand \cite{JBook} \end_inset . \end_float . This is a failure due to a missing definition of the term \emph on subroutine \emph default . \layout Standard While the first example passes Sun's verifier, the second example is rejected. The exact definition of the term \emph on subroutine \emph default cannot be deducted from ther behaviour of Sun's verifier. \layout Standard A new, clean specification had to be defined. Such a specification can of course not be compatible with the behaviour of Sun's verifier in all corner cases. \layout Subsubsection \begin_inset LatexCommand \label{Subroutines_Def} \end_inset A Precise Definition of the Term \emph on Subroutine \layout Standard Because Sun --inappropriately-- describes how \emph on javac \emph default creates subroutines, the definition presented here is based on the observation of \emph on javac \emph default 's behaviour. This makes the definition compatible with a lot of existing code, but without violating the validity of far-reaching conclusions earned by exploiting a clean definition \begin_float footnote \layout Standard Unfortunately, in some rare cases, \emph on javac \emph default produces code that is incompatible with the constraints related to our definition of \emph on subroutine \emph default . However, \emph on javac \emph default also produces code which is incompatible with Sun's verifier (see section \begin_inset LatexCommand \ref{StaerkJreject} \end_inset ). \end_float . \layout Itemize Every instruction of a method is part of exactly one subroutine (or the top-level). \layout Itemize The first instruction of a subroutine is an \latex latex \backslash texttt{astore N} \latex default instruction that stores the return address in local variable number \emph on N \emph default . \layout Itemize There must be exactly one \latex latex \backslash texttt{ret} \latex default instruction per subroutine. This instruction must work on the local variable \emph on N \emph default ; i.e., it is a \latex latex \backslash texttt{ret N} \latex default instruction. \layout Itemize Subroutines are not protected by exception handlers. \layout Itemize No instruction that is part of a subroutine is the target of an exception handler. \layout Itemize Subroutines of a subroutine do not access local variable \emph on N \emph default . A subsubroutine of a subroutine is also considered a subroutine here, in a recursive sense. \layout Standard As we can see, a subroutine can be characterized by its set of instructions, the most important instruction being the target of some \latex latex \backslash texttt{jsr} \latex default or \latex latex \backslash texttt{jsr \backslash _w} \latex default instruction that is not part of the subroutine itself. Another important property is the local variable \emph on N \emph default the \latex latex \backslash texttt{ret} \latex default instruction is working on. \layout Standard This way, we can make sure subroutines are properly nested, so that JustIce would reject both the example bytecodes in algorithms \begin_inset LatexCommand \ref{jsrpopalgo} \end_inset and \begin_inset LatexCommand \ref{OneOrTwoSubroutinesAlgo} \end_inset . \layout Standard The \latex latex \backslash texttt{astore} \latex default instruction mentioned above is so important because there is no JVM instruction that can read values of a \latex latex \backslash texttt{returnaddress} \latex default type from local variables. After entering a subroutine, the \latex latex \backslash texttt{astore} \latex default instruction pops the return address off the operand stack and writes it into local variable number \emph on N \emph default . Therefore we can be sure it will not be duplicated or deleted as in algorithms \begin_inset LatexCommand \ref{jsrpopalgo} \end_inset and \begin_inset LatexCommand \ref{OneOrTwoSubroutinesAlgo} \end_inset . \layout Standard The constraints concerning exception handlers are defined to make sure that we can observe the control flow statically. If an exception is thrown from within a subroutine, the method simply \begin_inset Quotes eld \end_inset \emph on completes abruptly \emph default \begin_inset Quotes erd \end_inset ( \begin_inset LatexCommand \cite{vmspec2} \end_inset , page 74). If we would allow subroutine instructions to be protected by exception handlers, it would not be clear if the handling instructions are part of the subroutine or not. \layout Standard We can also derive subsubroutines of subroutines recursively by exploiting the properly-nested property explained above. \layout Subsubsection The Control Flow Graph \layout Standard A control flow graph is a directed graph with edges that represent possible branches of control flow. Similarly, the nodes describe groups of physically adjacent instructions that have to be executed one after another -- without any possible control flow branch to another instruction but the physical successor \begin_float footnote \layout Standard More information about control flow graphs can be found in \begin_inset LatexCommand \cite{DragonBook} \end_inset . \end_float . Figure \begin_inset LatexCommand \ref{convcfg} \end_inset shows such a control flow graph for algorithm \begin_inset LatexCommand \ref{facjavabytecode} \end_inset , the implementation of the faculty function discussed earlier. \layout Standard \begin_float fig \layout Standard \align center \begin_inset Figure size 595 368 file conventcfg.eps width 3 100 flags 9 \end_inset \layout Caption \begin_inset LatexCommand \label{convcfg} \end_inset A Conventional Control Flow Graph \end_float \layout Standard The JVM defines a sort of control flow orthogonal to the common execution of instructions, namely, the exception mechanism. Because every instruction could possibly throw an exception (say, a \family typewriter java.lang.VirtualMachineError \family default ) during its execution, the control flow graph calculated by JustIce always uses only one instruction per node. This also reflects the original verification algorithm given by Sun Microsystem s. Figure \begin_inset LatexCommand \ref{justicecfg} \end_inset shows an example for such a control flow graph. \layout Standard \begin_float fig \layout Standard \align center \begin_inset Figure size 595 473 file justicecfg.eps width 3 100 flags 9 \end_inset \layout Caption \begin_inset LatexCommand \label{justicecfg} \end_inset A Control Flow Graph as Used by JustIce \end_float \layout Standard Instruction nodes are augmented with a data structure that represents the simulated operand stack and the simulated local variables array. When running the core verification algorithm, these nodes are put into a queue which is equivalent to tagging them with a \emph on changed \emph default bit as Sun describes \begin_float footnote \layout Standard As explained later, JustIce uses a queue that allows duplicates: this is a slight semantical change. \end_float . \layout Subsubsection Subroutines Revisited: Interplay With the Data Flow Analyzer \layout Standard There is another problem concerning subroutines. Normally, when merging the type information of two simulated local variables, the common type is recorded as \emph on unusable \emph default if the types differ. This \emph on unusable \emph default value is then propagated to subsequent instructions to prevent read access. \layout Standard This is not the case with the successors of the \latex latex \backslash texttt{ret} \latex default instruction. These successors are physical successors of some \latex latex \backslash texttt{jsr} \latex default or \latex latex \backslash texttt{jsr \backslash _w} \latex default instructions. \layout Standard Subroutines are said to be \emph on polymorphic \emph default with respect to their local variables arrays. As an example, consider algorithm \begin_inset LatexCommand \ref{lvpolymorphalgo} \end_inset . This algorithm shows legal JVM code. In line 11, local variable 0 may contain a value of the \family typewriter integer \family default or the \family typewriter float \family default type; depending on the \latex latex \backslash texttt{jsr} \latex default instruction that entered the subroutine. Normally, this would cause the verifier to mark local variable 0 as \emph on unusable \emph default and propagate this information. The successors of the \latex latex \backslash texttt{ret} \latex default instruction are the instructions in lines 5 and 10. However, a correct verifier does \emph on not \emph default mark local variable 0 as \emph on unusable \emph default for them, because the local variable 0 was not accessed or modified in the subroutine. \layout Standard \begin_float alg \layout Caption \begin_inset LatexCommand \label{lvpolymorphalgo} \end_inset Local Variables are Polymorphic in Subroutines \layout Standard \family typewriter 0 : iconst_0\SpecialChar ~ \SpecialChar ~ \SpecialChar ~ ; load integer constant 0 onto stack \layout Standard \family typewriter 1 : istore 0\SpecialChar ~ \SpecialChar ~ \SpecialChar ~ ; move it into local variable 0 \layout Standard \family typewriter 2 : jsr 11\SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ ; enter subroutine \layout Standard \family typewriter 5 : fconst 0.0\SpecialChar ~ ; load float constant 0.0 onto stack \layout Standard \family typewriter 6 : fstore 0\SpecialChar ~ \SpecialChar ~ \SpecialChar ~ ; move it into local variable 0 \layout Standard \family typewriter 7 : jsr 11\SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ ; enter subroutine again \layout Standard \family typewriter 10: return\SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ ; complete method \layout Standard \family typewriter 11: astore 1\SpecialChar ~ \SpecialChar ~ \SpecialChar ~ ; Subroutine entry: move return address \layout Standard \family typewriter \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ ; into local variable 1 \layout Standard \family typewriter 12: nop\SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ ; do nothing \layout Standard \family typewriter 13: ret 1\SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ ; return from subroutine \end_float \layout Standard Basically, only the local variables accessed in the called subroutine (and the subroutines called from there, recursively) are merged with the correspondi ng successor of a \latex latex \backslash texttt{ret} \latex default instruction. This means that in this special case, three sources are used to construct the merged array of local variables type information (instead of only two): the \latex latex \backslash texttt{jsr} \latex default / \latex latex \backslash texttt{jsr \backslash _w} \latex default instruction, the \latex latex \backslash texttt{ret} \latex default instruction and the "old" type information of the \latex latex \backslash texttt{ret} \latex default instruction's target (which is the physical successor of the \latex latex \backslash texttt{jsr} \latex default / \latex latex \backslash texttt{jsr \backslash _w} \latex default instruction). \layout Standard One possibility to deal with this situation is \emph on inlining \emph default . For instance, the verifier of the ElectricalFire JVM \begin_inset LatexCommand \cite{EF} \end_inset uses this approach: instruction nodes of subroutines are duplicated for every calling \latex latex \backslash texttt{jsr} \latex default or \latex latex \backslash texttt{jsr \backslash _w} \latex default instruction. This approach is equivalent to the one sketched by Sun (see \begin_inset LatexCommand \cite{vmspec2} \end_inset , page 151). \layout Standard JustIce uses a variant of this approach: instruction nodes are augmented with sets of local variables arrays. The local variables array used for merging a \latex latex \backslash texttt{ret} \latex default 's type information with the physical successor of some \latex latex \backslash texttt{jsr} \latex default / \latex latex \backslash texttt{jsr \backslash _w} \latex default instruction is keyed by that \latex latex \backslash texttt{jsr} \latex default / \latex latex \backslash texttt{jsr \backslash _w} \latex default instruction itself. This still implies a special merging mechanism for the \latex latex \backslash texttt{ret} \latex default instruction: only the physical successor of one \latex latex \backslash texttt{jsr} \latex default / \latex latex \backslash texttt{jsr \backslash _w} \latex default instruction can be merged with the \latex latex \backslash texttt{ret} \latex default at a time, because other \latex latex \backslash texttt{jsr} \latex default / \latex latex \backslash texttt{jsr \backslash _w} \latex default instructions have possibly not been symbolically executed yet and thus bear no type information at the time of merging. In this scenario, an instruction in a subroutine plays multiple roles; one for each occurence of a \latex latex \backslash texttt{jsr} \latex default / \latex latex \backslash texttt{jsr \backslash _w} \latex default that is calling the subroutine. The queue holding the instructions to symbolically execute is therefore required to allow duplicates. \layout Subsubsection Wide Data Types \layout Standard The types \family typewriter long \family default and \family typewriter double \family default use two consecutive local variables if written to or read from a local variables array. Similarly, they use two operand stack slots. This makes type verification a bit more difficult because of subtle special cases. For example, when a method uses three local variables at maximum (local variables 0, 1 and 2), the code is not allowed to store a \family typewriter double \family default value in local variable 2 (because local variable 3 would have to be occupied, too). \layout Subsubsection Instance Initialization and Newly Created Objects \layout Standard It would be difficult to verify that a newly created instance is initialized exactly once, given all possible paths of execution flow in a method. Fortunately (from a verifier implementor's view), Sun puts constraints on object initialization that match the behaviour of the verifier --- instead of putting sane constraints on object initialization and actually verifying them. \layout Standard \begin_inset Quotes eld \end_inset A valid instruction sequence must not have an uninitialized object on the operand stack or in a local variable during a backwards branch [\SpecialChar \ldots{} ]. Otherwise, a devious piece of code might fool the verifier into thinking it had initialized a class instance when it had, in fact, initialized a class instance created in a previous pass through a loop \begin_inset Quotes erd \end_inset ( \begin_inset LatexCommand \cite{vmspec2} \end_inset , page 148). \layout Section \begin_inset LatexCommand \label{Pass4Spec} \end_inset Pass Four \layout Standard Pass four performs \begin_inset Quotes eld \end_inset certain tests that could in principle be performed in Pass 3 \begin_inset Quotes erd \end_inset ( \begin_inset LatexCommand \cite{vmspec2} \end_inset , page 142). These tests are usually delayed by JVM implementations until run-time, because they possibly trigger the loading of referenced class file definitions. This is a performance enhancement. However, \begin_inset Quotes eld \end_inset A Java virtual machine implementation is allowed to perform any or all of the Pass 4 steps as part of Pass 3 \begin_inset Quotes erd \end_inset ( \begin_inset LatexCommand \cite{vmspec2} \end_inset , page 143). The tests \layout Itemize ensure that the referenced method or field exists in the given class \layout Itemize check that the referenced method or field has the indicated descriptor (signatur e) \layout Itemize check that the currently executing method has access to the referenced method or field. \layout Standard JustIce has no run-time system and so the tests of pass four are performed in pass 3a. \layout Standard There are tests that have to be performed at run-time: for example, if an object referenced by an object reference on top of the operand stack implements a certain interface or not \begin_inset LatexCommand \cite{Fong2-WWW} \end_inset . These are not considered part of the pass four verification. \layout Chapter Implementation of the Verification Passes \layout Standard Occasionally, the behaviour of other verifier implementations was explained in section \begin_inset LatexCommand \ref{SpecPasses} \end_inset \emph on . \emph default This is not a mistake; the Java Virtual Machine Specification, Second Edition \begin_inset LatexCommand \cite{vmspec2} \end_inset is unfortunately not detailed enough to make a clean-room implementation of the JVM verifier possible. Having a close look at the behaviour of existing verifier implementations is sometimes necessary to interpret the specification correctly. For that reason, the behaviour of these implementations is part of the specification of JustIce whereever appropriate. Still, there are some minor differences in behaviour between JustIce and the traditional JVM built-in verifiers. These differences were observed by using the traditional verifiers, not by inspecting their source code. \layout Standard JustIce is implemented in the Java programming language \begin_inset LatexCommand \cite{langspec2} \end_inset using the Byte Code Engineering Library \begin_inset LatexCommand \cite{BCEL-WWW,BCEL98} \end_inset . \layout Section Pass One \layout Standard The Byte Code Engineering Library (BCEL) presents an object oriented view of the class file structure. Therefore, an integral part of that library is parsing class files. JustIce uses the BCEL, so there was nothing left to do to load a class file in. Only minor changes were made to the BCEL to make it more verbose when exception al situations occur; i.e., when a garbled class file is loaded in. The BCEL uses Java's exception mechanism to signal these situations; JustIce transforms this behaviour into the behaviour expected by users of the Verificat ion API (see section \begin_inset LatexCommand \ref{Verification API} \end_inset ). \layout Subsubsection Comparison to Sun's Implementation \layout Standard There does not seem to be any difference in behaviour between JustIce and the traditional verifiers. Still, this conviction is a result of black box tests so it might not be true in corner cases. \layout Standard Unknown attributes are ignored (though JustIce records a warning message, where the traditional verifiers don't). \layout Standard Trailing bytes at the end of the class file are ignored in both versions, contradicting the specification. This was necessary because some Java run-time environments are broken concernin g the handling of .JAR archive files. The mechanism of loading class files from these archives files using the Java Platform's API is used by BCEL and probably by Sun's JVM, too. It is possible that this is the reason why Sun's verifier itself does not enforce this constraint. However, it does not really pose a threat to the integrity of any JVM known to the author. There is no entry in the \family typewriter ClassFile \family default structure (see section \begin_inset LatexCommand \ref{Classfile Structure} \end_inset ) stating how long the class file is in its entirety, so a JVM implementor cannot possibly base a wrong decision on that. \layout Section \begin_inset LatexCommand \label{Pass2Impl} \end_inset Pass Two \layout Standard JustIce does perform \begin_inset Quotes eld \end_inset all verification that can be performed without looking at the bytecodes \begin_inset Quotes erd \end_inset in pass two. For some reasons (like determining a valid ancestor hierarchy of a class), pass two of JustIce has to load referenced classes. Of course, this is done in a careful way: by pass-one-verifying them. If loading of a referenced class should fail (i.e., verification pass one fails on this class), the referencing class is rejected by JustIce's pass two. Pass two of JustIce does not pass-two-verify any referenced classes. \layout Standard Also, JustIce's pass two emits a wealth of (warning) messages. Their target is to guide a bytecode engineer to create class files that are indistinguishable from those created by Sun's \emph on javac \emph default compiler with no debugging output. For example, the use of \family typewriter LineNumberTable \family default attributes (see section \begin_inset LatexCommand \ref{LineNumberTableAttribute} \end_inset ) is discouraged, because these atributes are only useful for debugging purposes. Still, they can be the reason for a class file to be rejected -- to be on the safe side, finished applications for the JVM should not be shipped with this debug information. \layout Standard Most of the checks of pass two were implemented using the Visitor programming pattern \begin_inset LatexCommand \cite{DesignPatterns} \end_inset provided by the BCEL's \emph on de.fub.byte\SpecialChar \- code.class\SpecialChar \- file \emph default API. This made it possible to have all the verification split into several methods without having to define artificial boundaries. For instance, a \family typewriter ConstantValue \family default attribute is verified in a method called \emph on visitConstantValue(ConstantValue) \emph default . This is a use of the object oriented view of class files the BCEL offers. \layout Subsubsection Comparison to Sun's Implementation \layout Standard JustIce does not distinguish between run-time or link-time because it was not intended to implement a JVM. Therefore, the notion of \emph on resolving \emph default (see section \begin_inset LatexCommand \ref{SpecPassTwo} \end_inset ) is useless for JustIce. The author believes that the specification of pass two given by Sun closely reflects their implementation (or the other way around) \begin_float footnote \layout Standard The Java Virtual Machine Specification, Second Edition, began as an internal project documentation ( \begin_inset LatexCommand \cite{vmspec2} \end_inset , page xiv). Unfortunately, this can still be felt sometimes. \end_float . \layout Standard Sometimes, there are ambiguities in the specification. For instance, it is said that \begin_inset Quotes eld \end_inset If the constant pool of a class or interface refers to any class or interface that is not a member of a package, its \family typewriter ClassFile \family default structure must have exactly one \family typewriter InnerClasses \family default attribute in its \family typewriter attributes \family default table \begin_inset Quotes erd \end_inset . A class or interface that is \begin_inset Quotes eld \end_inset not member of a package \begin_inset Quotes erd \end_inset is better known as a \emph on nested class \emph default or \emph on inner class \emph default \begin_inset LatexCommand \cite{InnerSpec} \end_inset , but this is something specific to the Java language. The \emph on javac \emph default compiler creates multiple, often funny-named \begin_float footnote \layout Standard For anonymous classes defined in a class \emph on X \emph default the names are \emph on X$1 \emph default , \emph on X$2 \emph default and so on. For a named inner class \emph on I \emph default defined in class \emph on C \emph default the name is \emph on C$I \emph default . There is, however, no guarantee for that: this is only observed behaviour of javac. Please see section \begin_inset LatexCommand \ref{InnerBug} \end_inset for an example how this behaviour can lead to unexpected problems. \end_float class files that are otherwise indistinguishable from normal class files. \layout Standard Therefore, it is generally not possible to decide if such an attribute is missing; therefore Sun's implementation does not check this constraint. JustIce, in contrast, uses its warning mechanism if the name of a referenced class or interface could be a name of an inner class created by the \emph on javac \emph default compiler and the \family typewriter InnerClass \family default attribute is missing. \layout Standard The sets of accepted or rejected class files concerning pass two are equal using both Sun's implementation and JustIce, as exhaustive tests show. This can, however, not be proven because one would need to analyze Sun's source code for that (which is not intended: as already mentioned, JustIce is a clean-room implementation). \layout Section Pass Three \layout Subsection Pass 3a \layout Standard One feature of the BCEL's \emph on de.fub.bytecode.generic \emph default package is parsing code attributes of methods and transforming them into so-called \family typewriter Instruction\SpecialChar \- List \family default objects. Consequently, this feature is used to implement pass 3a; a few additional checks have been implemented where BCEL is too \begin_inset Quotes eld \end_inset trustful \begin_inset Quotes erd \end_inset when parsing, i.e., where BCEL relies on the correctness of the class file. \layout Standard Pass 3a consists of the checking of static constraints on instructions and static constraints on operands of these instructions. The successful creation an an \family typewriter Instruction\SpecialChar \- List \family default object already implies that the static constraints on instructions are satisfied. Similar to pass one, JustIce transforms the behaviour of BCEL's exception mechanism into the behaviour expected by users of the Verification API (see section \begin_inset LatexCommand \ref{Verification API} \end_inset ). \layout Standard The \emph on de.fub.byte\SpecialChar \- code.ge\SpecialChar \- ne\SpecialChar \- ric \emph default API provided by BCEL offers a Visitor design pattern similar to the one of the \emph on de.fub.byte\SpecialChar \- code.class\SpecialChar \- file \emph default API. The tests for the static constraints on operands of instructions are implemente d by using it. For example, the constraints put on the operands of any \latex latex \backslash texttt{iload} \latex default instruction are verified using a \emph on visitILOAD(ILOAD) \emph default method defined in a Visitor class. This Visitor class implements all the checks for integrity of all instruction's operands. Algorithm \begin_inset LatexCommand \ref{visitILOADstaticoperands} \end_inset shows the impementation of the \emph on visitILOAD(ILOAD) \emph default method. \begin_float alg \layout Caption \begin_inset LatexCommand \label{visitILOADstaticoperands} \end_inset visitILOAD, Visitor ensuring static constraints on operands of instructions \layout Standard \family typewriter \SpecialChar \- \SpecialChar ~ /** Checks if the constraints of operands of the said instruction(s) are satisfied. */ \newline \SpecialChar \- public void visitILOAD(ILOAD o){ \newline \SpecialChar \- \SpecialChar ~ \SpecialChar ~ int idx = o.getIndex(); \newline \SpecialChar \- \SpecialChar ~ \SpecialChar ~ if (idx < 0){ \newline \SpecialChar \- \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ constraintViolated(o, "Index '"+idx+"' must be non-negative."); \newline \SpecialChar \- \SpecialChar ~ \SpecialChar ~ } \newline \SpecialChar \- \SpecialChar ~ \SpecialChar ~ else{ \newline \SpecialChar \- \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ int maxminus1 = max_locals()-1; \newline \SpecialChar \- \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ if (idx > maxminus1){ \newline \SpecialChar \- \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ constraintViolated(o, "Index '"+idx+"' must not be greater than max_locals-1 '"+maxminus1+"'."); \newline \SpecialChar \- \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ } \newline \SpecialChar \- \SpecialChar ~ \SpecialChar ~ } \newline } \end_float \layout Standard JustIce does not provide any run-time, so the tests of pass four (see section \begin_inset LatexCommand \ref{Pass4Spec} \end_inset ) are not delayed until run-time, but performed here. \layout Subsubsection Comparison to Sun's Implementation \layout Standard Sun does not distinguish pass 3a and pass 3b. However, Sun's verifiers also have to ensure that the static constraints on instructions are satisfied before starting data flow analysis. \layout Standard This is obvious because a data structure has to be built before the data flow analyzer can be run; and this data structure has to be built carefully \begin_float footnote \layout Standard This actually means verifying the structural integrity of the bytecodes. \end_float because passes one and two did not look at the bytecodes before. \layout Standard JustIce does implement pass four checks in pass 3a which Sun's verifiers do not. Because JustIce provides no run-time, the outcome of a verification failure is reported instantly. Traditional JVMs are required to silently delay the actions triggered by that knowledge until run-time. \layout Subsection Pass 3b \layout Standard JustIce aims at implementing Sun's data flow analyzing algorithm as closely as possible. First, a control flow graph is built --- which implies analyzing a method's subroutine calling structure first. \layout Standard After that an implementation of the core algorithm sketched by Sun Microsystems is started. Verification failure is internally signalled by the Java exception handling mechanism which is then transformed to match the Verification API (see section \begin_inset LatexCommand \ref{Verification API} \end_inset ). \layout Subsubsection \begin_inset LatexCommand \label{SubroutineImpl} \end_inset Subroutines \layout Standard Subroutines are modeled as instances of the \family typewriter Subroutine \family default interface \emph on . \emph default They provide the following methods (note that an \family typewriter InstructionHandle \family default is the BCEL's programming handle to instruction objects and that \emph on X[] \emph default is the common Java notation for \emph on array of \emph default \emph on X \emph default ): \layout Itemize \emph on boolean contains(InstructionHandle) \emph default \newline Returns true if and only if the given \family typewriter InstructionHandle \family default refers to an instruction that is part of this subroutine, \layout Itemize \emph on InstructionHandle[] getInstructions() \emph default \newline Returns all instructions that together form this subroutine, \layout Itemize \emph on int[] getAccessedLocalsIndices() \emph default \newline Returns an array containing the indices of the local variable slots accessed by this subroutine (read-accessed, write-accessed or both); local variables referenced by subroutines of this subroutine are not included, \layout Itemize \emph on int[] getRecursivelyAccessedLocalsIndices() \emph default \emph on \newline \emph default Returns an array containing the indices of the local variable slots accessed by this subroutine (read-accessed, write-accessed or both); local variables referenced by subroutines of this subroutine are included, \layout Itemize \emph on Subroutine[] subSubs() \emph default \emph on \newline \emph default Returns the subroutines that are directly called from this subroutine, \layout Itemize \emph on InstructionHandle[] getEnteringJsrInstructions() \emph default \newline Returns all the JsrInstructions that have the first instruction of this subroutine as their target, \layout Itemize \emph on InstructionHandle getLeavingRET() \emph default \newline Returns the one and only RET that leaves the subroutine. \layout Standard Together with information from a simple analysis of the possible control flow transfer of all the other instructions but \latex latex \backslash texttt{ret} \latex default (see section \begin_inset LatexCommand \ref{Pass3Spec} \end_inset ), a control flow graph is built. \layout Subsubsection The Control Flow Graph \layout Standard The control flow graph is a single instance with respect to a given method to verify. It is defined by providing access to a set of contexts of instructions. These are modeled as instances of the \emph on \family typewriter \emph default In\SpecialChar \- struc\SpecialChar \- tion\SpecialChar \- Con\SpecialChar \- text \family default interface. \layout Standard These instances enclose \family typewriter InstructionHandle \family default objects (which represent an instruction in the bytecode), but they augment these objects with type information (a set of \family typewriter Frame \family default s, see below) as needed by the data flow analysis algorithm. Also, a method called \emph on getSuccessors() \emph default is provided that calculates the possible control flow successors of a given \family typewriter In\SpecialChar \- struc\SpecialChar \- tion\SpecialChar \- Con\SpecialChar \- text \family default instance. \layout Standard The most notable method defined in the \family typewriter In\SpecialChar \- struc\SpecialChar \- tion\SpecialChar \- Con\SpecialChar \- text \family default \emph on \emph default interface is, however, the \emph on execute(Frame, ArrayList, InstConstraintVisitor, ExecutionVisitor) \emph default method. This method is used to symbolically execute a given instruction. \layout Standard The \family typewriter ArrayList \family default \emph on \emph default argument is there to record the subroutine calling chain. The properly-nested property of JustIce subroutines is exploited here: one can simply count \latex latex \backslash texttt{jsr} \latex default / \latex latex \backslash texttt{jsr \backslash _w} \latex default and \latex latex \backslash texttt{ret} \latex default instructions, similar to counting opened and closed braces in mathematical expressions. \layout Standard A \family typewriter Frame \family default is JustIce's model of an \emph on execution frame \emph default : a local variables array model together with an operand stack model. Every \emph on InstructionContext \emph default instance is augmented with such a frame (to be precise, a set of such frames as discussed in the specification of subroutines, see section \begin_inset LatexCommand \ref{Pass3Spec} \end_inset ). \layout Standard When frames are merged, the \emph on execute(Frame, ArrayList, InstConstraintVisitor, ExecutionVisitor) \emph default method of some successor \family typewriter InstructionContext \family default is called. The \family typewriter Frame \family default argument represents is the current type information of the predecessing \family typewriter InstructionContext. \layout Subsubsection Visitors \layout Standard As in pass 3a, the Visitor pattern of the BCEL \emph on de.fub.byte\SpecialChar \- code.ge\SpecialChar \- ne\SpecialChar \- ric \emph default API is also used in pass 3b. While it was used to verify the static constraints of pass three in pass 3a, it is now used to verify the structural constraints. \layout Standard Before an instruction \family typewriter X \family default is symbolically executed, the corresponding \emph on visitX(X) \emph default method is invoked on an \family typewriter InstConstraintVisitor \family default instance. This instance is there to verify all the preconditions are met to safely execute the instruction \family typewriter X \family default . The \family typewriter InstConstraintVisitor \family default class therefore holds information about the preconditions of all 212 valid Java bytecode instructions. A simplified version of this Visitor's \emph on visitILOAD(ILOAD) \emph default method is listed in algorithm \begin_inset LatexCommand \ref{visitILOADInstConstraints} \end_inset . \layout Standard Similarly, the \emph on \family typewriter \emph default ExecutionVisitor \family default class contains information about the behaviour of every bytecode instruction. An instance of this class is used to model the effect of the bytecode instructi ons on a \emph on Frame \emph default instance. Algorithm \begin_inset LatexCommand \ref{visitILOADExecution} \end_inset shows the \emph on visitILOAD(ILOAD) \emph default method of this Visitor. \layout Standard \begin_float alg \layout Caption \begin_inset LatexCommand \label{visitILOADInstConstraints} \end_inset visitILOAD, Visitor ensuring the structural (dynamic) constraints of instruction s \layout Standard \family typewriter public void visitILOAD(ILOAD o){ \newline \SpecialChar \- \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ int produce = o.produceStack(cpg); \newline \SpecialChar \- \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ if ( produce + stack().slotsUsed() > stack().maxStack() ){ \newline \SpecialChar \- \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ constraintViolated(o, "Cannot produce "+produce+" stack slots: only "+(stack().ma xStack()-stack().slotsUsed())+" free stack slot(s) left. \backslash nStack: \backslash n"+stack()); \newline \SpecialChar \- \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ } \newline [\SpecialChar \ldots{} ] \newline \SpecialChar \- \SpecialChar ~ \SpecialChar ~ } \end_float \begin_float alg \layout Caption \begin_inset LatexCommand \label{visitILOADExecution} \end_inset visitILOAD, Visitor symbolically executing instructions \layout Standard \family typewriter /** Symbolically executes the corresponding Java Virtual Machine instruction. */ \newline \SpecialChar \- public void visitILOAD(ILOAD o){ \newline \SpecialChar \- \SpecialChar ~ \SpecialChar ~ stack().push(Type.INT); \newline \SpecialChar \- } \end_float \begin_float alg \layout Caption Simplified Core Verification Algorithm of Pass 3b \layout Standard \series bold \size small public VerificationResult do_verify(Method m) \series default { \layout Standard \size small \SpecialChar \- \SpecialChar ~ \SpecialChar ~ ControlFlowGraph cfg; \layout Standard \size small \SpecialChar \- \SpecialChar ~ \SpecialChar ~ if (m.hasCode()) \layout Standard \size small \SpecialChar \- \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ cfg = new ControlFlowGraph(m) \layout Standard \size small \SpecialChar \- \SpecialChar ~ \SpecialChar ~ else \layout Standard \size small \SpecialChar \- \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ return Good_VerificationResult; \layout Standard \size small \SpecialChar \- \SpecialChar ~ \SpecialChar ~ Frame f = new Frame(); \shape slanted // local variables and operand stack \layout Standard \size small \SpecialChar \- \SpecialChar ~ \SpecialChar ~ f.localVariables().initialize(m.signature()); \shape slanted // put formal param types into loc. vars \layout Standard \size small \SpecialChar \- \SpecialChar ~ \SpecialChar ~ InstConstraintVisitor icv = new InstConstraintVisitor(); \layout Standard \size small \SpecialChar \- \SpecialChar ~ \SpecialChar ~ ExecutionVisitor ev = new ExecutionVisitor(); \layout Standard \size small \SpecialChar \- \SpecialChar ~ \SpecialChar ~ try{ \layout Standard \size small \SpecialChar \- \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ circulationPump(cfg, f, icv, ev); \layout Standard \size small \SpecialChar \- \SpecialChar ~ \SpecialChar ~ } \layout Standard \size small \SpecialChar \- \SpecialChar ~ \SpecialChar ~ catch(VerificationFailure){ \layout Standard \size small \SpecialChar \- \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ return Bad_VerificationResult; \layout Standard \size small \SpecialChar \- \SpecialChar ~ \SpecialChar ~ } \layout Standard \size small \SpecialChar \- \SpecialChar ~ \SpecialChar ~ return Good_VerificationResult; \layout Standard \size small } \newline \layout Standard \series bold \size small public void circulationPump(ControlflowGraph cfg, Frame startFrame, InstConstrai ntVisitor icv, ExecutionVisitor ev) throws VerificationFailure \series default { \layout Standard \size small Instruction start = cfg.getFirstInstruction(); \layout Standard \shape slanted \size small /* \layout Standard \shape slanted \size small Now merge the first frame (type info) into the first instruction. \layout Standard \shape slanted \size small Empty list -> no instructions have been executed before. \layout Standard \shape slanted \size small */ \layout Standard \size small start.execute(startFrame, EmptyInstructionList, icv, ev); \layout Standard \shape slanted \size small /* \layout Standard \shape slanted \size small Q is a Queue of pairs (Instruction, InstructionList). \layout Standard \shape slanted \size small */ \layout Standard \size small Queue Q = EmptyQueue; \layout Standard \shape slanted \size small /* \layout Standard \shape slanted \size small Put the first instruction into the queue. This is similar to initializing a breadth first search. \layout Standard \shape slanted \size small */ \layout Standard \size small Q.add (start, EmptyInstructionList); \layout Standard \shape slanted \size small /* \layout Standard \shape slanted \size small The main loop \layout Standard \shape slanted \size small */ \layout Standard \size small while (Q.isNotEmpty()){ \layout Standard \size small \SpecialChar \- \SpecialChar ~ \SpecialChar ~ Instruction u = fst(Q.head()); \layout Standard \size small \SpecialChar \- \SpecialChar ~ \SpecialChar ~ InstructionList ec = snd(Q.head()); \layout Standard \size small \SpecialChar \- \SpecialChar ~ \SpecialChar ~ Q.removeHead(); \layout Standard \size small \SpecialChar \- \SpecialChar ~ \SpecialChar ~ InstructionList oldchain = ec; \layout Standard \size small \SpecialChar \- \SpecialChar ~ \SpecialChar ~ InstructionList newchain = ec++[u]; \layout Standard \size small \SpecialChar \- \SpecialChar ~ \SpecialChar ~ for (all successors v of u){ \layout Standard \size small \SpecialChar \- \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \shape slanted /* \layout Standard \shape slanted \size small \SpecialChar \- \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ execute returns true if type info has changed. It may throw VerificationFailures. \layout Standard \shape slanted \size small \SpecialChar \- \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ */ \layout Standard \size small \SpecialChar \- \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ if (v.execute(u.getOutFrame(oldchain), newchain,icv,ev)) \layout Standard \size small \SpecialChar \- \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ Q.add((v, newchain)); \layout Standard \size small \SpecialChar \- \SpecialChar ~ \SpecialChar ~ } \layout Standard \size small } \end_float \layout Subsubsection \begin_inset LatexCommand \label{ComparisonSubroutines} \end_inset Comparison to Sun's Implementation \layout Standard JustIce was originally aimed to be as compatible to Sun's implementation as possible. However, the unclear specification prevents clean room implementations (i.e., implementations whose programmers did not look into Sun's code) from perfect compatibility. \layout Standard Fortunately, it JustIce closely matches Sun's implementation in its behaviour. As a test case, the author verified the transitive hull of the referenced class files starting with the \emph on de.fub.bytecode.verifier.Verifier \emph default class. This set includes most of the classes of the Java 2 API supplied by Sun Microsystems, i.e., a few hundreds of apparently correct classes. A very small number of class files was rejected by JustIce because of its different specification of subroutine constraints. No other rejects were encountered. \layout Standard Most class files that are found to be rejected by Sun's verifier implementations are rejected by JustIce, too. \layout Standard However, there are class file rejected by Sun's verifier implementations but not by JustIce. This should not occur, but JustIce does not mimic the programming errors of Sun's verifiers so far. Please see section \begin_inset LatexCommand \ref{javacRejected} \end_inset for a discussion on a selected incompatibility issue. \layout Standard An automated testing suite could solidify the trust in JustIce's implementation which is not implemented yet. Please see section \begin_inset LatexCommand \ref{VerifierValidationSuite} \end_inset for a discussion on that topic. \layout Section Pass Four \layout Standard The tests Sun's verifiers perform during run-time but which in principle could be performed in pass three \emph on are \emph default performed in pass 3a by JustIce. \layout Subsubsection Comparison to Sun's Implementation \layout Standard It sems natural that Sun's verifier implements the specification by Sun. Obviously, JustIce has no run-time so JustIce has no pass four. The checks Sun performs in pass four \begin_float footnote \layout Standard Some JVMs expose implementation mistakes concerning pass four verification. See section \begin_inset LatexCommand \ref{PassFourBug} \end_inset . \end_float are performed in pass 3a by JustIce. \layout Chapter \begin_inset LatexCommand \label{Verification API} \end_inset The Verification API \layout Section Introduction \layout Standard The Application Programming Interface (API) of JustIce uses object oriented design patterns \begin_inset LatexCommand \cite{DesignPatterns} \end_inset . Readers not familiar with design patterns are encouraged to read at least about the \emph on Visitor \emph default , \emph on Singleton \emph default , \emph on Observer \emph default and \emph on Factory \emph default patterns. \layout Standard JustIce currently consists of four packages: \emph on de.fub.byte\SpecialChar \- code.veri\SpecialChar \- fier \emph default , \emph on de.fub. byte\SpecialChar \- code.veri\SpecialChar \- fier.exc \emph default , \emph on de.fub.byte\SpecialChar \- code.veri\SpecialChar \- fier.statics \emph default and \emph on de.fub.byte\SpecialChar \- code.veri\SpecialChar \- fier. struc\SpecialChar \- tu\SpecialChar \- rals \emph default . (We shall from now on omit the preceding \emph on de.fub.byte\SpecialChar \- code \emph default .) The most important of them is the \emph on verifier \emph default package. The class \family typewriter VerifierFactory \family default can be found here; this is the place where all verification starts. The \family typewriter Veri\SpecialChar \- fier\SpecialChar \- Fac\SpecialChar \- tory \family default creates \family typewriter Verifier \family default instances; only the \family typewriter VerifierFactory \family default can create these instances. A \family typewriter Verifier \family default instance, in turn, has a one-to-one relationship with a class file to verify, \begin_inset Quotes eld \end_inset its class \begin_inset Quotes erd \end_inset . You can instruct a \family typewriter Verifier \family default instance to run a verification pass on its class yielding a \family typewriter VerificationResult \family default . \layout Standard All class files are fetched from the BCEL's class file repository, i.e., the class \family typewriter Re\SpecialChar \- po\SpecialChar \- si\SpecialChar \- to\SpecialChar \- ry \family default . The class files stored there are either put there by the user or they are read from the file system. For a bytecode engineer who uses the BCEL this is convenient, because one does not have to save the dynamically created class file first in order to load it into JustIce. \layout Standard Pass 1 and pass 2 are related to the \family typewriter ClassFile \family default structure as such; passes 3a and 3b verify the bytecode of a method. If a class file was created using the BCEL, the BCEL user already knows how the \family typewriter JavaClass \family default object looks like \begin_float footnote \layout Standard A \family typewriter JavaClass \family default object represents a class file in the BCEL. \end_float . The number of methods is known and the order of the methods in the class file is known. \layout Standard However, if this is not the case, one usually does not know the number of methods in a class file or the order of these methods. To carefully extract this information from an untrusted class file, one should first let a pass-2-verification run on this file. Afterwards, the information can be read from the \family typewriter JavaClass \family default object the BCEL offers. \layout Standard Finally, one is able to supply the \begin_inset Quotes eld \end_inset method index \begin_inset Quotes erd \end_inset needed by verification passes 3a and 3b. \layout Standard Basically, after pass 2 has been run successfully on a class file, one can safely use the methods in the BCEL's \emph on classfile \emph default package \emph on \emph default on that class file. After pass 3a has been run successfully on a method, one can safely work on that method using the BCEL's \emph on generic \emph default package. After pass 3b has been run successfully on all methods in a class file, this class file will not be rejected by other verifiers. \layout Standard Often, the run of a verification pass implies recursively verifying other class files as well (because they are somehow referenced). Therefore, \emph on Verifier \emph default instances for these referenced classes are created transparently. To be notified when such an event occurs, one can implement the \emph on VerifierFactoryObserver \emph default interface and let the \emph on VerifierFactory \emph default register your implementation. \layout Standard \begin_float fig \layout Standard \align center \begin_inset Figure size 595 863 file VerificationAPI.eps width 3 100 angle 90 flags 1 \end_inset \layout Caption UML class diagram of the Verification API \end_float \layout Standard A Verifier creates instances of PassVerifiers. A PassVerifier instance in charge of performing some later verification pass transparently creates PassVerifier instances for the preceding passes. Therefore, users of the Verification API do not have to care about the order of verification passes; i.e., earlier passes are run always before later passes. All verification results are cached; this way an unsual order of calls to the \emph on doPassX() \emph default methods of the \emph on Verifier \emph default class does not even waste computing time. \begin_float fig \layout Standard \align center \begin_inset Figure size 595 631 file V_API_SD.eps width 3 100 height 3 75 flags 9 \end_inset \layout Caption Informal UML sequence diagram showing the dependency of verification pass two on verification pass one. \end_float \layout Section Some Example Code \layout Standard The code below shows an example of how to use the API provided by JustIce. It will verify the transitive hull of all referenced class files. Normally, while verifying a class, referenced classes are recursively verified performing \emph on earlier \emph default passes. Verifiers that are using pass 1 on their class will not load in any other classes (see section \begin_inset LatexCommand \ref{SpecPasses} \end_inset ). Therefore, normally the transitive hull is \emph on not \emph default verified completely (it usually does not make sense to verify it, though -- it's done here only to give an example of what can be done). \family typewriter \size small \newline \newline 01\SpecialChar ~ package de.fub.bytecode.verifier; \newline 02\SpecialChar ~ import de.fub.bytecode.verifier.*; \newline 03\SpecialChar ~ import de.fub.bytecode.classfile.*; \newline 04\SpecialChar ~ import de.fub.bytecode.*; \newline 05\SpecialChar ~ /** \newline 06\SpecialChar ~ \SpecialChar ~ * This class has a main method implementing a demonstration program \newline 07\SpecialChar ~ \SpecialChar ~ * of how to use the VerifierFactoryObserver. It transitively verifies \newline 08\SpecialChar ~ \SpecialChar ~ * all class files encountered; this may take up a lot of time and, \newline 09\SpecialChar ~ \SpecialChar ~ * more notably, memory. \newline 10\SpecialChar ~ \SpecialChar ~ * \newline 11\SpecialChar ~ \SpecialChar ~ * @author Enver Haase \newline 12\SpecialChar ~ \SpecialChar ~ */ \newline 13\SpecialChar ~ public class TransitiveHull implements VerifierFactoryObserver{ \newline 14\SpecialChar ~ \SpecialChar ~ \SpecialChar ~ /** Used for indentation. */ \newline 15\SpecialChar ~ \SpecialChar ~ \SpecialChar ~ private int indent = 0; \newline 16\SpecialChar ~ \SpecialChar ~ \SpecialChar ~ /** Not publicly instantiable. */ \newline 17\SpecialChar ~ \SpecialChar ~ \SpecialChar ~ private TransitiveHull(){ } \newline 18 \newline 19\SpecialChar ~ \SpecialChar ~ \SpecialChar ~ /* Implementing VerifierFactoryObserver. */ \newline 20\SpecialChar ~ \SpecialChar ~ \SpecialChar ~ public void update(String classname){ \newline 21\SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ for (int i=0; i \emph default if it exists (see \begin_inset LatexCommand \cite{vmspec2} \end_inset , page 53). For the correct operation of the JVM it is important that this method does not contain an infinite loop. Verifying if this constraint is true is similar to the Halting Problem and therefore not generally computable \begin_inset LatexCommand \cite{Unknowable} \end_inset . A verifier has to omit the check and pass potentially unsafe class files. \layout Standard For another example, consider algorithm \begin_inset LatexCommand \ref{StackOverflowAlgo} \end_inset below. \layout Standard \begin_float alg \layout Caption \begin_inset LatexCommand \label{StackOverflowAlgo} \end_inset Rejected class \layout Standard \family typewriter public static int always_true() \layout Standard \family typewriter Code(max_stack = 1, max_locals = 1, code_length = 2) \layout Standard \family typewriter 0: iconst_1\SpecialChar ~ \SpecialChar ~ ; push constant 1 onto stack \layout Standard \family typewriter 1: ireturn\SpecialChar ~ \SpecialChar ~ \SpecialChar ~ ; return constant 1 ( \begin_inset Quotes eld \end_inset true \begin_inset Quotes erd \end_inset ) \newline \layout Standard \family typewriter public static void good_method() \layout Standard \family typewriter 0: invokestatic NewClass0.always_true ()I (18) \layout Standard \family typewriter \SpecialChar \- \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ ; Push \begin_inset Quotes eld \end_inset true \begin_inset Quotes erd \end_inset on stack \layout Standard \family typewriter 3: ifne #10\SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ ; If \begin_inset Quotes eld \end_inset true \begin_inset Quotes erd \end_inset is on stack jump to 10 \layout Standard \family typewriter 6: pop \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ ; Pop a value off the stack \layout Standard \family typewriter 7: goto #6 \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ ; jump to 6 \layout Standard \family typewriter 10:return\SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ ; complete method \end_float This code is harmless, because lines 6 and 7 can never be executed (it would underflow the operand stack in an infinite loop). A class file with this code is rejected by JustIce and other verifiers, because the endless loop seems to be a malicious threat to the integrity of the JVM. \layout Standard We conclude that there cannot be a perfect verifier. All that could be done is reduce the degree of uncertainty. For practical purposes, i.e., to be compatible with Sun's implementation, one should not even do that. \layout Standard There is also a simple proof showing a perfect verifier does not exist in \begin_inset LatexCommand \cite{JNS} \end_inset , chapter 6. It uses a diagonalization argument. \layout Section Future Work \layout Standard Class file verification is an integral component of Java security; and applicati on programs running on the Java Virtual Machine are often used in security critical areas. Several security holes and flaws have been found both in implementations and the specification of the Java class file verifier since it was introduced. \layout Standard Recently, the area has experienced a leap as a theoretically founded, sound and complete Java environment was defined in \begin_inset LatexCommand \cite{JBook} \end_inset . Possibly Sun's engineers will use this work to improve Java and the Java verifier. JustIce will have to change to always keep close to the industry standard. \layout Standard But JustIce itself can also be improved concerning practicability, and new software can be developed on top of the Verification API. \layout Subsection Improvements to JustIce \layout Subsubsection Introduction of Unique Identifers for Verification Results and Warning Messages \layout Standard Currently, warning messages and verification results are conceptually text-based. Only \emph on VerificationResult \emph default objects include a numeric value which programs can use to decide if some class verification failed or not. A program like the prototype introduced in section \begin_inset LatexCommand \ref{GUI_APP} \end_inset can currently not hide specific messages from the user without parsing text. This limitation should be removed in the future by using unique message numbers. This would also make translation of the messages into other languages easier. \layout Subsubsection \begin_inset LatexCommand \label{NewVerificationStrategy} \end_inset A New Verification Strategy \layout Standard The core verification algorithm cited in section \begin_inset LatexCommand \ref{SunCoreAlgo} \end_inset works by generalizing the knowledge about an object type along the inheritance hierarchy. \layout Standard For instance, let there be an object of type \family typewriter java.util.Ab\SpecialChar \- stract\SpecialChar \- List \family default on the simulated stack of some modeled instruction. Let there be a loop so that the algorithm has to visit that same instruction again, this time with an object of type \family typewriter java.util.Ab\SpecialChar \- stract\SpecialChar \- Set \family default in that same stack slot. The verifier will compute the meet of the two types and record that there is some object of type \family typewriter java.util.Ab\SpecialChar \- stract\SpecialChar \- Collection \family default in that stack slot. \layout Standard Remember that the instruction will be marked with a \emph on changed \emph default bit until no such re-typing change occurs any more (JustIce will actually put it into a queue). \layout Standard This approach does not work very well when it comes to interface types instead of class files. For example, the meet of a \family typewriter java.lang.In\SpecialChar \- teger \family default and a \family typewriter java.lang.Doub\SpecialChar \- le \family default is a \family typewriter java.lang.Num\SpecialChar \- ber \family default because \family typewriter java.lang.Num\SpecialChar \- ber \family default \emph on \emph default is the first common super class. Both classes also implement the \family typewriter java.lang.Com\SpecialChar \- parable \family default interface, but \family typewriter java.lang.Num\SpecialChar \- ber \family default does not. This information is lost when replacing the type information. However, current verifiers do not reject the class files but make additional run-time checks necessary. \layout Standard Fong noticed that this could be the reason for the \latex latex \backslash texttt{invoke\SpecialChar \- interface} \latex default opcode to be underspecified \begin_inset LatexCommand \cite{Fong2-WWW} \end_inset (also see section \begin_inset LatexCommand \ref{InvokeInterfaceDescFONG} \end_inset ). \layout Standard Stärk et al. suggest the use of \emph on sets \emph default of reference types instead ( \begin_inset LatexCommand \cite{JBook} \end_inset , pages 229-231). This could also be implemented in JustIce. \layout Subsubsection Keeping up with Specification Clarifications \layout Standard As a clean-room implementation, JustIce depends on the clearness of the specification. Ambiguities could lead to programming errors. \layout Standard Here we give one example: methods can be inherited in Java (for example, the method \emph on clone() \emph default is declared in the \family typewriter java.lang.Ob\SpecialChar \- ject \family default class and therefore inherited by every other class). \layout Standard Let a class \family typewriter A \family default be a subclass of \family typewriter java.lang.Ob\SpecialChar \- ject \family default and let class \family typewriter B \family default be a subclass of \family typewriter A \family default . Also, let class \family typewriter B \family default override the definition of \emph on clone() \emph default with an own implementation. \layout Standard If \emph on javac \emph default compiles a Java program that invokes this method, it is either referenced as \emph on java.lang.Ob\SpecialChar \- ject::clone() \emph default or as \emph on B::clone() \emph default . However, because \family typewriter A \family default inherits this method, the reference \emph on A::clone() \emph default is legal, too. \layout Standard In The Java Virtual Machine Specification, Second Edition ( \begin_inset LatexCommand \cite{vmspec2} \end_inset , page 291) it is said that the reference must be a \begin_inset Quotes eld \end_inset symbolic reference to the class in which the method is to be found \begin_inset Quotes erd \end_inset . Statically, the method \emph on clone() \emph default can of course not be found in class \family typewriter A \family default . One could therefore think the reference \emph on A::clone() \emph default was not legal. \layout Standard In the meanwhile, Sun's engineer Gilad Bracha clarified this issue: \begin_inset Quotes eld \end_inset Of course. This is discussed in JVMS 5.4.3.4, which describes interface method resolution. I don't see the text on page 280 as contradicting that. The symbolic reference does give an interface in which the required method can be found, albeit as an inherited member. We could try and reword it in a more precise way, to eliminate any misunderstan dings. \begin_inset Quotes erd \end_inset \layout Standard Keeping up with clarifications like this is an inevitable and on-going part of the development of JustIce. \layout Subsubsection Keeping up with Java Extensions \layout Standard Recently, Sun Microsystems introduced a new attribute: the \family typewriter StackMap \family default attribute which is an attribute local to the \family typewriter Code \family default attribute (see section \emph on \begin_inset LatexCommand \ref{CodeAttribute} \end_inset \emph default ). It was specified in \begin_inset LatexCommand \cite{J2ME-CLDCS} \end_inset . \layout Standard It is there to provide \begin_inset Quotes eld \end_inset limited devices \begin_inset Quotes erd \end_inset that perform a one-pass verification with type information that would normally have to be inferred by the verifier. \layout Standard It is not used by the verification algorithm of JustIce now: it's currently an \emph on unknown attribute \emph default to JustIce. \layout Subsubsection Detecting Local Variable Accesses out of Scope \layout Standard The \family typewriter LocalVariableTable \family default attribute is a debug information attribute. Basically, it gives debuggers information about the original (source code) name and type of a given local variable. \layout Standard JustIce builds data structures to warn if it detects contradicting and overlappi ng areas; e.g., if some local variable is anounced to carry an \family typewriter int \family default value and a \family typewriter float \family default value at the same time. \layout Standard It could also be interesting to warn if a local variable is accessed for which no debug information exists. This is currently not implemented. \layout Subsubsection Extending the Verification API \layout Standard JustIce can easily be extended to run certain analyses related to symbolic bytecode execution. \layout Standard This includes the computation of the maximum number of used operand stack slots in a method or the computation of unused local variables in a method. \layout Standard These analyses are normally costly to implement \begin_float footnote \layout Standard Often, heuristics are used such as the method MethodGen.getMaxStack() in the BCEL \begin_inset LatexCommand \cite{BCEL-WWW,BCEL98} \end_inset . \end_float , but they are a waste product of the verifier's core algorithm. \layout Subsubsection \begin_inset LatexCommand \label{VerifierValidationSuite} \end_inset A Verifier Validation Suite \layout Standard The Kimera project \begin_inset LatexCommand \cite{Kimera-WWW} \end_inset was the first known project to implement a stand-alone Java verifier. The people behind the project had to test the behaviour of their verifier against the behaviour of the previous implementations. Tests have been run in order to validate the Kimera verifier. These tests range from simply introducing random one-byte errors into class files and automatically running Kimera against other verifiers to elaborate research work \begin_inset LatexCommand \cite{Kimera-ProdGram,Kimera-TestingJVM} \end_inset . \layout Standard Currently, JustIce comes only with a very limited possibility of running test cases against the native verifier of the host machine's JVM. The pioneering work of the Kimera project could be used to implement a validation suite for JustIce. \layout Subsection \begin_inset LatexCommand \label{Firewall} \end_inset A Verifier Protecting an Intranet \layout Standard Often, Java Virtual Machines are built into software used to browse the World Wide Web such as the KDE project's \emph on Konqueror \begin_inset LatexCommand \cite{KDE} \end_inset \emph default or Mozilla.org's \emph on Mozilla \emph default \begin_inset LatexCommand \cite{Mozilla} \end_inset products. Such Internet technology is also often used in corporate networks. Corporate networks based on internet technology are called \emph on intranets \emph default ; these networks are normally protected from the Internet by a so-called \emph on firewall \emph default computer. \layout Standard This computer's task is to provide access to the internet only to privileged employees and --even more important-- it blocks access from unauthorized persons outside the intranet. The firewall machine is a single, bi-directional point of access. \layout Standard However, normally web-browsing is considered harmless, so that the employees can unrestrictedly gather information, possibly visiting Java-enabled web sites. The JVMs built into the browser software run software downloaded from the World Wide Web; while the the built-in verifiers make sure that no dangerous code can be executed. \layout Standard Let us assume someone discovered a security hole in the verifier implementation or implementations that are used on the corporate network's workstations; let us also assume a patch exists that would fix the problem. \layout Standard A system administrator would have to spent a lot of time to repair every single verifier. A cheaper solution would be a verifier built into the firewall machine; such a verifier can easily be implemented using JustIce and its Verification API. \layout Subsection A Java Virtual Machine Implementation Using JustIce \layout Standard The Java verifier is originally a part of the Java Virtual Machine. JustIce could also be part of a Java Virtual Machine. JustIce's class files (the program code JustIce consists of) could simply be integrated into the core Java class files. The execution engine would then run JustIce without actually verifying JustIce's class files themselves. \layout Standard For scientific purposes one could also implement a JVM in the Java programming language. Such an implementation could, for example, serve as a debugger. \layout Subsection \begin_inset LatexCommand \label{LinePrincipleInfoHidingAndSecurity} \end_inset Drawing a Clear Line Between the Principle of Information Hiding and Security \layout Standard The principle of information hiding has been (and still is!) a practice of experienced programmers for many years. It is there to reduce programming errors. \layout Standard In the Modula-2 programming language \begin_inset LatexCommand \cite{M2} \end_inset this is achieved by explicitely dividing the program code in definition modules and implementation modules. In older programming languages, such as in the C programming language \begin_inset LatexCommand \cite{C} \end_inset , this principle is implicitely used, too. Basically this is achieved by defining interfaces that only describe what the code of a program module does. These interface \begin_inset Quotes eld \end_inset headers \begin_inset Quotes erd \end_inset are included into user code instead of simply including the code itself. \layout Standard In object-oriented programming languages such as in Delphi \begin_inset LatexCommand \cite{D3} \end_inset , C++ \begin_inset LatexCommand \cite{CPP-D,CPP-E} \end_inset or Java \begin_inset LatexCommand \cite{langspec2} \end_inset , this principle is refined to what is called object encapsulation. When a class is defined, certain key words such as \family typewriter private \family default , \family typewriter protected \family default , \family typewriter friend \family default , \family typewriter public \family default , \family typewriter published \family default set the access rules for the members \begin_float footnote \layout Standard The members of a class are its components: methods (program code) and fields (also called attributes or variables). \end_float of an object of the given class. \layout Standard Still, this refined technique does not have anything to do with security. It is only there to aid programmers create a reasonable design. If every piece of code could manipulate every data structure, one would not know where to look for a programming error in the program source code. On the other hand, if some field is private in C++, one could (with some knowledge about the compiler used) still reference and modify this field by pointer manipulation. In addition to that, a second program like a debugger could watch even the data of private fields. \layout Standard However, when a Java program is compiled into the language of the JVM, the information about the access rights of the fields and methods is included. This is where the principle of information hiding is exploited to provide security. For example, the verifier of the JVM has to make sure private fields are never accessed from a foreign piece of code. But there are many implementations of the JVM which have security flaws such as not honouring the access rights. There are debuggers for JVM bytecodes, too. \layout Standard When one thinks about security, one has to think of some enemy who could try to harm the computer or information stored on that computer. From a JVM user's point of view, the JVM is relatively secure. Even running untrusted code cannot do much harm. Because the security flaws in different JVM implementations differ, they are probably not exploited most times. \layout Standard From a Java programmer's point of view, the JVM is not secure. Untrusted users can do much harm. For example, an online banking application storing important data in Java fields (such as access information to the bank's database management system) is a threat to both the bank and its customers. This information could easily be extracted by a malicious user. \layout Standard Another problem for Java programmers is the amount of symbolical information stored in class files. Today, it is easy to de-compile a Java class file back to Java language source code \begin_inset LatexCommand \cite{JODE-WWW} \end_inset . This source code can then be read and analyzed by the user. Facing this problem, the \begin_inset Quotes eld \end_inset only safe course of action is to assume that ALL Java code will at some point be decompiled \begin_inset Quotes erd \end_inset ( \begin_inset LatexCommand \cite{JNS} \end_inset , page 68). \layout Standard We conclude that the principle of information hiding is not enough to provide a degree of security that both --users and programmers-- could accept. Programmers should not believe a good design makes a program \emph on secure \emph default . \layout Chapter Appendix \layout Section History of JustIce \layout Standard The author of JustIce once started to implement a class file decompiler like Jode \begin_inset LatexCommand \cite{JODE-WWW} \end_inset . It soon became clear that to successfully implement it, one should exploit the \begin_inset Quotes eld \end_inset well-behaved \begin_inset Quotes erd \end_inset property of class files (which essentially means that they pass a verifier, especially pass three) \begin_inset LatexCommand \cite{Krakatoa-WWW} \end_inset . \layout Standard JustIce was then developed to understand the \begin_inset Quotes eld \end_inset well-behaved \begin_inset Quotes erd \end_inset property of usual class files. It took much longer to complete than estimated because of the many inherent bugs and ambiguities in The Java Virtual Machine Specification, Second Edition \begin_inset LatexCommand \cite{vmspec2} \end_inset . \layout Standard Its name starts with a \emph on J \emph default like Java does, referring to the tradition of giving Java-related software such names. The second part of the name, \emph on ICE \emph default , was inspired by a novel by William Gibson \begin_inset LatexCommand \cite{Neuromancer} \end_inset . It is an acronym for \emph on Intrusion Countermeasures Electronics \emph default , something that is very much like today's firewall systems (see section \begin_inset LatexCommand \ref{Firewall} \end_inset ). He credits the invention of \emph on ICE \emph default to Tom Maddox. The missing three letters were inserted to create a word that makes sense; in fact, choosing the three-letter combination \emph on ust \emph default resulted in the creation of a word with a double sense via bi-capitalization. \layout Standard JustIce was written using and extending the excellent Byte Code Engineering Library \begin_inset LatexCommand \cite{BCEL-WWW,BCEL98} \end_inset by Markus Dahm. It really helped a lot and sped up development time. \layout Standard It was also --last but not least-- written to earn its author a German \emph on Dipl.-Inform. \emph default degree which one may compare to a \emph on master \emph default degree. \layout Section Flaws and Ambiguities Encountered \layout Standard While designing, implementing and testing JustIce, a lot of interesting flaws and ambiguities were found in the specification \begin_inset LatexCommand \cite{vmspec2} \end_inset , the Java compiler \emph on javac \emph default and the JVM \emph on java \emph default . \layout Subsection Flaws in the Java Virtual Machine Specification \layout Standard The Java Virtual Machine Specification, Second Edition was derived from an in-house document describing the as-is implementation of Sun's genuine Java Virtual Machine ( \begin_inset LatexCommand \cite{vmspec2} \end_inset , page xiv). This sometimes leads to problems as there are still a few points left where Sun's engineers forgot to describe specification details to the public, in error assuming they would be implementation details. Another source of mistakes are ambiguities, inherent to natural languages auch as English. \layout Subsubsection A Code Length Maximum of 65535 Bytes per Method \layout Standard On page 152, The Java Virtual Machine Specification, Second Edition \begin_inset LatexCommand \cite{vmspec2} \end_inset says that code arrays may at most have a length of 65536 bytes because certain indices that point into the code are only 16 bits of width. Page 134 states the code must have \begin_inset Quotes gld \end_inset less than \begin_inset Quotes grd \end_inset 65536 bytes. Therefore, the limitation stated on page 152 is not helpful, but only confusing. \layout Subsubsection Subroutines \layout Standard The implementation of a provably correct verifier is not possible because of the ambiguities in the specification \begin_inset LatexCommand \cite{vmspec2} \end_inset . To reach this goal, various efforts have been made to describe the verifier and the JVM formally \begin_inset LatexCommand \cite{Qian,StataAbadi,FreundMitchell,JBook,JPaper} \end_inset . By restricting the code \emph on javac \emph default produces or by redefining the verifier's behaviour, however, they are never one-to-one with the behaviour of the existing JVMs. \layout Standard Sun's specification does not define the term \emph on subroutine \emph default although it is used. Instead, it is explained what bytecode the Java \emph on compiler \emph default generates when a \family typewriter finally \family default clause appears in the Java \emph on language \emph default source code -- this definitely does not belong there, because a verifier must never assume the code it verifies was created by Sun's \emph on javac \emph default compiler. \layout Standard Clarifying this issue could lead to an \emph on official \emph default formal specification. \layout Subsubsection The Specification Sometimes Satisfies the Verifier \layout Standard \begin_inset LatexCommand \label{InvokeInterfaceDescFONG} \end_inset Fong \begin_inset LatexCommand \cite{Fong2-WWW} \end_inset found in 1997 that the \family typewriter invokeinterface \family default opcode was underspecified in the first edition of the Java Virtual Machine Specification. He managed to create a class file that did not implement a specific interface but nevertheless used \family typewriter invokeinterface \family default to invoke a method. This class file passed the verifier (up to pass three), but the JVM found the problem during run-time (pass four). Fong concluded that the omission in the specification was done on purpose because the implementation of the data flow analyzer does not allow to check this constraint (please see section \begin_inset LatexCommand \ref{NewVerificationStrategy} \end_inset for a description of how this limitation could be overcome). However, in The Java Virtual Machine Specification, Second Edition \begin_inset LatexCommand \cite{vmspec2} \end_inset , the specification of \family typewriter invokeinterface \family default is corrected. \layout Standard Still, there is another case where one would suspect the specification describes the behaviour of the verifier: on pages 147 and 148 of the specification \begin_inset LatexCommand \cite{vmspec2} \end_inset , verification of instance initialization methods and newly created objects is explained. \begin_inset Quotes eld \end_inset A valid instruction sequence must not have an uninitialized object on the operand stack or in a local variable during a backwards branch, or in a local variable in code protected by an exception handler or a \family typewriter finally \family default clause \begin_inset Quotes erd \end_inset . Note that the Java language keyword \family typewriter finally \family default does not really belong here (Sun should speak of \emph on subroutines \emph default ), but more important is that this specification is made to satisfy the verification algorithm: \begin_inset Quotes eld \end_inset Otherwise, a devious piece of code might fool the verifier \begin_inset Quotes erd \end_inset . \layout Subsubsection \begin_inset LatexCommand \label{InnerBug} \end_inset The '$' Character as a Valid Part of a Java Name \layout Standard Because the \emph on javac \emph default compiler may create class files with a '$' character in their names as a result of Java source files defining inner classes, this character should no longer be a valid part of a Java name to avoid problems. I.e., the method invocation \emph on ja\SpecialChar \- va.lang.Cha\SpecialChar \- rac\SpecialChar \- ter.is\SpecialChar \- Ja\SpecialChar \- va\SpecialChar \- Iden\SpecialChar \- tifier\SpecialChar \- Part('$'); \emph default should return the value \family typewriter false \family default . \layout Subsection Flaws in the Implementation of the \emph on Java Platform \layout Subsubsection \begin_inset LatexCommand \label{javacRejected} \end_inset Sun's Verifier Rejects Code Produced by Sun's Compiler \layout Standard Surprisingly, there are a number of examples in which such a thing happens. \layout Paragraph \begin_inset LatexCommand \label{StaerkJreject} \end_inset Another Problem With Subroutines \layout Standard In \begin_inset LatexCommand \cite{JPaper} \end_inset , Stärk and Schmid give a few code examples which are compiled correctly by the \emph on javac \emph default compiler but the resulting code is rejected by the traditional verifiers. Algorithms \begin_inset LatexCommand \ref{StaerkJLang} \end_inset and \begin_inset LatexCommand \ref{StaerkJByteCode} \end_inset show one of their examples given in the Java programming language and the resulting output of the \emph on javac \emph default compiler. \begin_float alg \layout Caption \begin_inset LatexCommand \label{StaerkJLang} \end_inset Stärk and Schmid's Rejected Class, Java Language Version \layout Standard \family typewriter class Test1{ \newline \SpecialChar \- \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ int test(boolean b){ \newline \SpecialChar \- \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ int i; \newline \SpecialChar \- \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ try{ \newline \SpecialChar \- \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ if (b) return 1; \newline \SpecialChar \- \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ i=2; \newline \SpecialChar \- \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ } \newline \SpecialChar \- \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ finally { \newline \SpecialChar \- \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ if (b) i = 3; \newline \SpecialChar \- \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ } \newline \SpecialChar \- \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ return i; \newline \SpecialChar \- \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ } \newline \SpecialChar ~ \SpecialChar ~ \SpecialChar ~ } \end_float \layout Standard \begin_float alg \layout Caption \begin_inset LatexCommand \label{StaerkJByteCode} \end_inset Stärk and Schmid's Rejected Class, JVM Bytecode Version \layout Standard \family typewriter int test(boolean arg1) \layout Standard \family typewriter Code(max_stack = 1, max_locals = 6, code_length = 39) \layout Standard \family typewriter 0: iload_1 \layout Standard \family typewriter 1: ifeq #11 \layout Standard \family typewriter 4: iconst_1 \layout Standard \family typewriter 5: istore_3 \layout Standard \family typewriter 6: jsr #27 \layout Standard \family typewriter 9: iload_3 \layout Standard \family typewriter 10: ireturn \layout Standard \family typewriter 11: iconst_2 \layout Standard \family typewriter 12: istore_2 \layout Standard \family typewriter 13: jsr #27 \layout Standard \family typewriter 16: goto #37 \layout Standard \family typewriter 19: astore %4 \layout Standard \family typewriter 21: jsr #27 \layout Standard \family typewriter 24: aload %4 \layout Standard \family typewriter 26: athrow \layout Standard \family typewriter 27: astore %5 \layout Standard \family typewriter 29: iload_1 \layout Standard \family typewriter 30: ifeq #35 \layout Standard \family typewriter 33: iconst_3 \layout Standard \family typewriter 34: istore_2 \layout Standard \family typewriter 35: ret %5 \layout Standard \family typewriter 37: iload_2 \layout Standard \family typewriter 38: ireturn \end_float If one tries to run this bytecode using a JVM by IBM Corporation, the code is rejected \begin_float footnote \layout Standard It is also rejected by Sun's JVMs and the Kimera verifier \begin_inset LatexCommand \cite{Kimera-WWW} \end_inset . \end_float : \newline \family typewriter ehaase@haneman:/home/ehaase > java Test1 \newline \SpecialChar \- \SpecialChar ~ \SpecialChar ~ Exception in thread "main" java.lang.VerifyError: \newline \SpecialChar \- \SpecialChar ~ \SpecialChar ~ (class: Test1, method: test signature: (Z)I) \newline \SpecialChar \- \SpecialChar ~ \SpecialChar ~ Localvariable 2 contains wrong type \newline \newline \family default In his lectures, Stärk explains that the problem lies in the polymorphic nature of JVM subroutines \begin_inset LatexCommand \cite{JLectures} \end_inset . Consider algorithm \begin_inset LatexCommand \ref{StaerkJByteCode} \end_inset . In line 12, an \family typewriter int \family default is put into local variable number 2. The subroutine starting at line 27 is then called from line number 13. Note that this subroutine accesses the local variable number 2. Finally, line 16 transfers control to line 37 where the verification problem occurs. An \family typewriter int \family default should be read from local variable number 2, but this is marked \family typewriter unusable \family default , because it was accessed in the subroutine. \layout Standard However, the specification ( \begin_inset LatexCommand \cite{vmspec2} \end_inset , page 151) states: \layout Itemize For any local variable that [\SpecialChar \ldots{} ] has been accessed or modified by the subroutine, use the type of the local variable at the time of the \family typewriter ret \family default . \layout Itemize For any other local variables, use the type of the local variable before the \family typewriter jsr \family default instruction. \layout Standard As one can see, in the above example local variable number 2 holds an \family typewriter int \family default data type in both cases; there is no need to mark it \family typewriter unusable \family default . This is the reason why JustIce does not reject the above bytecode, thus being slightly incompatible with the behaviour of other verifiers. \layout Paragraph The Maximum Method Length May Be Exceeded \layout Standard The \emph on javac \emph default compiler Sun included in the Java Development Kit version 1.3.0_01 does not check for the maximum method length of the \family typewriter code \family default array in a \family typewriter Code \family default attribute (see section \begin_inset LatexCommand \ref{CodeAttribute} \end_inset ). A test file containing 65000 lines like \begin_inset Quotes eld \end_inset \family typewriter Sys\SpecialChar \- tem.out.println( \begin_inset Quotes eld \end_inset Test \begin_inset Quotes erd \end_inset ); \family default \begin_inset Quotes erd \end_inset was compiled, but the resulting class file was rejected by the verifier. \layout Standard IBM Corporation's \emph on jikes \emph default compiler does not even generate code, but it locks up while compiling the test file. \layout Subsubsection A Compiler Issue Related to Inner Classes \layout Standard The \emph on javac \emph default compiler has to name class files, even those of so-called anonymous classes \begin_inset LatexCommand \cite{InnerSpec} \end_inset . \layout Standard This can cause problems: an inner class \emph on I \emph default defined in a class \emph on A \emph default will be compiled into a class file called \emph on A$I.class \emph default . A Java class named \emph on A$I \emph default will also be compiled into a class file named \emph on A$I.class \emph default overwriting the former class file. Because Sun did not forbid the ' \emph on $ \emph default ' character as a legal part of a Java identifier, the \emph on javac \emph default compiler should use a more sophisticated naming scheme. \layout Subsubsection \begin_inset LatexCommand \label{PassFourBug} \end_inset Pass Four is Only Partially Implemented \layout Standard Pass four defines run-time tests for constraints that could also be verified in pass three; it is only for performance reasons that these tests are delayed. Instead of having all the tests in one place, they are unnecessarily spread \begin_inset Quotes eld \end_inset making the validation of the verification algorithm itself extremely difficult \begin_inset Quotes erd \end_inset \begin_inset LatexCommand \cite{Fong-WWW} \end_inset . Risking security for better performance is often regarded as a bad decision. For instance, in the \layout Standard \family typewriter java version "1.3.0_01" \layout Standard \family typewriter Java(TM) 2 Runtime Environment, Standard Edition (build 1.3.0_01) \layout Standard \family typewriter Java HotSpot(TM) Client VM (build 1.3.0_01, mixed mode) \layout Standard Java Virtual Machine, the pass four check for access rights was unintentionally omitted. Sadly, other vendors license Sun's code and base their own implementations on that code. Therefore, mistakes are often inherited throughout the JVM vendors. The \layout Standard \family typewriter java version "1.3.0" \layout Standard \family typewriter Java(TM) 2 Runtime Environment, Standard Edition (build 1.3.0) \layout Standard \family typewriter Classic VM (build 1.3.0, J2RE 1.3.0 IBM build cx130-20010626 (JIT enabled: jitc)) \layout Standard Java Virtual Machine by IBM Corporation, for example, exposes the same mistake. \layout Section Related Work \layout Subsection The Kimera Project \layout Standard It is a misfortune that the Kimera \begin_inset LatexCommand \cite{Kimera-WWW} \end_inset project closed the World Wide Web presence and that the source code of the Kimera verifier was never released -- it would have been quite interesting to see how that respected verifier implementation deals with the problems arising concerning subroutine verification. \layout Standard However, Kimera is the single other stand-alone verifier besides JustIce the author knows of. The people behind the project found important security breaches in JVM implementations of various World Wide Web browsers. \layout Standard Also, they validated their verifier implementation and published several papers on JVM implementation verification \begin_inset LatexCommand \cite{Kimera-ProdGram,Kimera-TestingJVM} \end_inset . \layout Subsection The Verifier by Stärk, Schmid and Börger \layout Standard In \begin_inset LatexCommand \cite{JBook} \end_inset , the authors define the Java programming language and the Java virtual machine formally using \emph on Abstract State Machines \emph default (ASM). This also includes the verifier; its specifications have also been implemented in the functional programming language AsmGofer \begin_inset LatexCommand \cite{AsmGofer} \end_inset . This implementation is included on the CD-ROM that accompanies the book. \layout Standard The \begin_inset Quotes eld \end_inset \emph on JBook verifier \emph default \begin_inset Quotes erd \end_inset does not implement a complete class file verifier. It currently only implements the bytecode verification. Its input files are not class files itself, but a textual representation of class files in so-called Jasmin format \begin_inset LatexCommand \cite{JVM} \end_inset . Therefore, this implementation is merely of theoretical interest. \layout Standard It does, however, implement a bytecode verifier that is founded on a \emph on solid \emph default theory. This theory could become the standard for the interpretation of the JVM specification \begin_inset LatexCommand \cite{vmspec2} \end_inset . It could even change the specification to remove its ambiguities. \layout Standard There is also an unreleased version of this verifier implemented in the Java programming language using the BCEL. This implementation, if it should ever be released, promises a lot as it could combine usability and a solid theory. \layout Section \begin_inset LatexCommand \label{GPL} \end_inset The GNU General Public License \layout Standard \emph on GNU GENERAL PUBLIC LICENSE \layout Standard Version 2, June 1991 \layout Standard Copyright (C) 1989, 1991 Free Software Foundation, Inc. \layout Standard 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA \layout Standard Everyone is permitted to copy and distribute verbatim copies of this license document, but changing it is not allowed. \layout Standard \emph on Preamble \layout Standard The licenses for most software are designed to take away your freedom to share and change it. By contrast, the GNU General Public License is intended to guarantee your freedom to share and change free software--to make sure the software is free for all its users. This General Public License applies to most of the Free Software Foundation's software and to any other program whose authors commit to using it. (Some other Free Software Foundation software is covered by the GNU Library General Public License instead.) You can apply it to your programs, too.When we speak of free software, we are referring to freedom, not price. Our General Public Licenses are designed to make sure that you have the freedom to distribute copies of free software (and charge for this service if you wish), that you receive source code or can get it if you want it, that you can change the software or use pieces of it in new free programs; and that you know you can do these things. \layout Standard To protect your rights, we need to make restrictions that forbid anyone to deny you these rights or to ask you to surrender the rights. \layout Standard These restrictions translate to certain responsibilities for you if you distribute copies of the software, or if you modify it. For example, if you distribute copies of such a program, whether gratis or for a fee, you must give the recipients all the rights that you have. You must make sure that they, too, receive or can get the source code. And you must show them these terms so they know their rights. \layout Standard We protect your rights with two steps: \layout Standard (1) copyright the software, and \layout Standard (2) offer you this license which gives you legal permission to copy, distribute and/or modify the software. \layout Standard Also, for each author's protection and ours, we want to make certain that everyone understands that there is no warranty for this free software. If the software is modified by someone else and passed on, we want its recipients to know that what they have is not the original, so that any problems introduced by others will not reflect on the original authors' reputations. \layout Standard Finally, any free program is threatened constantly by software patents. We wish to avoid the danger that redistributors of a free program will individually obtain patent licenses, in effect making the program proprietary. To prevent this, we have made it clear that any patent must be licensed for everyone's free use or not licensed at all. \layout Standard The precise terms and conditions for copying, distribution and modification follow. \layout Standard \emph on GNU GENERAL PUBLIC LICENSE \layout Standard \emph on TERMS AND CONDITIONS FOR COPYING, DISTRIBUTION AND MODIFICATION \layout Standard 0. This License applies to any program or other work which contains a notice placed by the copyright holder saying it may be distributed under the terms of this General Public License. The "Program", below, refers to any such program or work, and a "work based on the Program" means either the Program or any derivative work under copyright law: that is to say, a work containing the Program or a portion of it, either verbatim or with modifications and/or translated into another language. (Hereinafter, translation is included without limitation in the term "modificat ion".) Each licensee is addressed as "you". Activities other than copying, distribution and modification are not covered by this License; they are outside its scope. The act of running the Program is not restricted, and the output from the Program is covered only if its contents constitute a work based on the Program (independent of having been made by running the Program). Whether that is true depends on what the Program does. \layout Standard 1. You may copy and distribute verbatim copies of the Program's source code as you receive it, in any medium, provided that you conspicuously and appropria tely publish on each copy an appropriate copyright notice and disclaimer of warranty; keep intact all the notices that refer to this License and to the absence of any warranty; and give any other recipients of the Program a copy of this License along with the Program. You may charge a fee for the physical act of transferring a copy, and you may at your option offer warranty protection in exchange for a fee. \layout Standard 2. You may modify your copy or copies of the Program or any portion of it, thus forming a work based on the Program, and copy and distribute such modifications or work under the terms of Section 1 above, provided that you also meet all of these conditions: \layout Standard a) You must cause the modified files to carry prominent notices stating that you changed the files and the date of any change. \layout Standard b) You must cause any work that you distribute or publish, that in whole or in part contains or is derived from the Program or any part thereof, to be licensed as a whole at no charge to all third parties under the terms of this License. \layout Standard c) If the modified program normally reads commands interactively when run, you must cause it, when started running for such interactive use in the most ordinary way, to print or display an announcement including an appropriate copyright notice and a notice that there is no warranty (or else, saying that you provide a warranty) and that users may redistribute the program under these conditions, and telling the user how to view a copy of this License. (Exception: if the Program itself is interactive but does not normally print such an announcement, your work based on the Program is not required to print an announcement.) These requirements apply to the modified work as a whole. If identifiable sections of that work are not derived from the Program, and can be reasonably considered independent and separate works in themselves, then this License, and its terms, do not apply to those sections when you distribute them as separate works. But when you distribute the same sections as part of a whole which is a work based on the Program, the distribution of the whole must be on the terms of this License, whose permissions for other licensees extend to the entire whole, and thus to each and every part regardless of who wrote it. Thus, it is not the intent of this section to claim rights or contest your rights to work written entirely by you; rather, the intent is to exercise the right to control the distribution of derivative or collective works based on the Program. In addition, mere aggregation of another work not based on the Program with the Program (or with a work based on the Program) on a volume of a storage or distribution medium does not bring the other work under the scope of this License. \layout Standard 3. You may copy and distribute the Program (or a work based on it, under Section 2) in object code or executable form under the terms of Sections 1 and 2 above provided that you also do one of the following: \layout Standard a) Accompany it with the complete corresponding machine-readable source code, which must be distributed under the terms of Sections 1 and 2 above on a medium customarily used for software interchange; or, \layout Standard b) Accompany it with a written offer, valid for at least three years, to give any third party, for a charge no more than your cost of physically performing source distribution, a complete machine-readable copy of the corresponding source code, to be distributed under the terms of Sections 1 and 2 above on a medium customarily used for software interchange; or, \layout Standard c) Accompany it with the information you received as to the offer to distribute corresponding source code. (This alternative is allowed only for noncommercial distribution and only if you received the program in object code or executable form with such an offer, in accord with Subsection b above.) The source code for a work means the preferred form of the work for making modifications to it. For an executable work, complete source code means all the source code for all modules it contains, plus any associated interface definition files, plus the scripts used to control compilation and installation of the executable. However, as a special exception, the source code distributed need not include anything that is normally distributed (in either source or binary form) with the major components (compiler, kernel, and so on) of the operating system on which the executable runs, unless that component itself accompanies the executable. If distribution of executable or object code is made by offering access to copy from a designated place, then offering equivalent access to copy the source code from the same place counts as distribution of the source code, even though third parties are not compelled to copy the source along with the object code. \layout Standard 4. You may not copy, modify, sublicense, or distribute the Program except as expressly provided under this License. Any attempt otherwise to copy, modify, sublicense or distribute the Program is void, and will automatically terminate your rights under this License. However, parties who have received copies, or rights, from you under this License will not have their licenses terminated so long as such parties remain in full compliance. \layout Standard 5. You are not required to accept this License, since you have not signed it. However, nothing else grants you permission to modify or distribute the Program or its derivative works. These actions are prohibited by law if you do not accept this License. Therefore, by modifying or distributing the Program (or any work based on the Program), you indicate your acceptance of this License to do so, and all its terms and conditions for copying, distributing or modifying the Program or works based on it. \layout Standard 6. Each time you redistribute the Program (or any work based on the Program), the recipient automatically receives a license from the original licensor to copy, distribute or modify the Program subject to these terms and conditions. You may not impose any further restrictions on the recipients' exercise of the rights granted herein. You are not responsible for enforcing compliance by third parties to this License. \layout Standard 7. If, as a consequence of a court judgment or allegation of patent infringement or for any other reason (not limited to patent issues), conditions are imposed on you (whether by court order, agreement or otherwise) that contradict the conditions of this License, they do not excuse you from the conditions of this License. If you cannot distribute so as to satisfy simultaneously your obligations under this License and any other pertinent obligations, then as a consequence you may not distribute the Program at all. For example, if a patent license would not permit royalty-free redistribution of the Program by all those who receive copies directly or indirectly through you, then the only way you could satisfy both it and this License would be to refrain entirely from distribution of the Program. If any portion of this section is held invalid or unenforceable under any particular circumstance, the balance of the section is intended to apply and the section as a whole is intended to apply in other circumstances. It is not the purpose of this section to induce you to infringe any patents or other property right claims or to contest validity of any such claims; this section has the sole purpose of protecting the integrity of the free software distribution system, which is implemented by public license practices. Many people have made generous contributions to the wide range of software distributed through that system in reliance on consistent application of that system; it is up to the author/donor to decide if he or she is willing to distribute software through any other system and a licensee cannot impose that choice. This section is intended to make thoroughly clear what is believed to be a consequence of the rest of this License. \layout Standard 8. If the distribution and/or use of the Program is restricted in certain countries either by patents or by copyrighted interfaces, the original copyright holder who places the Program under this License may add an explicit geographical distribution limitation excluding those countries, so that distribution is permitted only in or among countries not thus excluded. In such case, this License incorporates the limitation as if written in the body of this License. \layout Standard 9. The Free Software Foundation may publish revised and/or new versions of the General Public License from time to time. Such new versions will be similar in spirit to the present version, but may differ in detail to address new problems or concerns. Each version is given a distinguishing version number. If the Program specifies a version number of this License which applies to it and "any later version", you have the option of following the terms and conditions either of that version or of any later version published by the Free Software Foundation. If the Program does not specify a version number of this License, you may choose any version ever published by the Free Software Foundation. \layout Standard 10. If you wish to incorporate parts of the Program into other free programs whose distribution conditions are different, write to the author to ask for permission. For software which is copyrighted by the Free Software Foundation, write to the Free Software Foundation; we sometimes make exceptions for this. Our decision will be guided by the two goals of preserving the free status of all derivatives of our free software and of promoting the sharing and reuse of software generally. \layout Standard \emph on NO WARRANTY \layout Standard 11. BECAUSE THE PROGRAM IS LICENSED FREE OF CHARGE, THERE IS NO WARRANTY FOR THE PROGRAM, TO THE EXTENT PERMITTED BY APPLICABLE LAW. EXCEPT WHEN OTHERWISE STATED IN WRITING THE COPYRIGHT HOLDERS AND/OR OTHER PARTIES PROVIDE THE PROGRAM "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE ENTIRE RISK AS TO THE QUALITY AND PERFORMANCE OF THE PROGRAM IS WITH YOU. SHOULD THE PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF ALL NECESSARY SERVICING, REPAIR OR CORRECTION. \layout Standard 12. IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MAY MODIFY AND/OR REDISTRIBUTE THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES, INCLUDING ANY GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING OUT OF THE USE OR INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED TO LOSS OF DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY YOU OR THIRD PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER PROGRAMS), EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGES. \layout Standard \emph on END OF TERMS AND CONDITIONS \layout Addchap Glossary \layout Description Access\SpecialChar ~ modifiers In the Java programming language, the use of the keywords \family typewriter private \family default , \family typewriter protected \family default , \family typewriter public \family default (or the use of no keyword) defines the access rights for data or program code (also called visibility). This information is also used by the JVM: it is part of the class files. The most important modifier is \family typewriter private \family default which is used to globally deny access to a field or method. \layout Description Access\SpecialChar ~ rights Access rights are granted or denied by the use of \latex latex \backslash ( \backslash triangleright \backslash ) \latex default access modifiers. \layout Description API Applications Programming Interface. Such an interface is used to include functionality of foreign program modules (often \latex latex \latex default Java \latex latex \backslash ( \backslash triangleright \backslash ) \latex default packages) into own programs. \layout Description Debugger A program used to investigate the behaviour of another program. Often used to find and remove programming errors, so-called bugs. \layout Description Descriptor A symbolic description of type information. In the JVM's class files, strings in UTF-8 format \begin_inset LatexCommand \cite{Unicode} \end_inset are used to describe type information. \layout Description Field A member of a Java object or class, also called variable or attribute. \layout Description Method A member of a Java object or class. Methods include program code or they are abstract representatives for program code. A method can be compared to a \emph on function \emph default in programming languages like C or Pascal. \layout Description Opcode Operation Code. This denotes an instruction in an assembly-like computer language; to some people it means its binary representation. \layout Description Package A package is an entity used in both the Java programming language and the Java Virtual Machine definition. It is used to group classes that in the eyes of the programmer belong together. Package definitions have impact on \latex latex \backslash ( \backslash triangleright \backslash ) \latex default access rights granted to other classes. \layout Description Signature A method has a (possibly empty) set of arguments it expects, and it has a return type (possibly the \family typewriter void \family default type). The type information of the arguments and the return type together is called signature. A signature can be expressed in terms of a \latex latex \backslash ( \backslash triangleright \backslash ) \latex default descriptor. \layout Description Type A field or a method argument has a type such as \family typewriter int \family default or \family typewriter String \family default . In the JVM's context, all values are typed. Types can be expressed in terms of a \latex latex \backslash ( \backslash triangleright \backslash ) \latex default descriptor. \layout Standard \begin_inset LatexCommand \listoffigures{} \end_inset \layout Standard \latex latex \backslash addcontentsline{toc}{chapter}{List Of Figures} \layout Standard \begin_inset LatexCommand \listofalgorithms{} \end_inset \layout Standard \latex latex \backslash addcontentsline{toc}{chapter}{List Of Algorithms} \layout Bibliography \bibitem [AppMag-WWW]{AppMag-WWW} \latex latex \backslash addcontentsline{toc}{chapter}{Bibliography} \latex default AverStar's AppletMagic(tm): Ada for the Java Virtual Machine. \newline \emph on http://www.appletmagic.com \layout Bibliography \bibitem [AsmGofer]{AsmGofer} Joachim Schmid: AsmGofer. \newline \emph on http://www.tydo.org \layout Bibliography \bibitem [BCEL98]{BCEL98} Markus Dahm: Byte Code Engineering with the BCEL API. Freie Universität Berlin, Institut für Informatik. Technical Report B-17-98. \layout Bibliography \bibitem [BCEL-WWW]{BCEL-WWW} Markus Dahm: Byte Code Engineering Library. \emph on \newline http://bcel.sourceforge.net \layout Bibliography \bibitem [BCV-Soundness]{BCV-Soundness} Cornelia Pusch: Proving the Soundness of a Java Bytecode Verifier Specification in Isabelle/HOL. Technische Universität München, Institut für Informatik. \newline \emph on http://www.in.tum.de/~pusch/ \layout Bibliography \bibitem [C]{C} Brian W. Kerninghan, Dennis M. Ritchie: The C Programming Language, Second Edition, ANSI C. Prentice-Hall 1998, ISBN 0131103628. \layout Bibliography \bibitem [CPP-D]{CPP-D} Bjarne Stroustrup: Die C++ Programmiersprache. Addison-Wesly-Longman, 1998, ISBN 3-8273-1296-5. \layout Bibliography \bibitem [CPP-E]{CPP-E} Bjarne Stroustrup: The C++-Programming Language, Third Edition. Addison-Wesley 1997, ISBN 0-201-88954-4. \layout Bibliography \bibitem [D3]{D3} Guido Lang, Andreas Bohne: Delphi 3.0 lernen. Addison-Wesley-Longman 1997, ISBN 3-8273-1190-x. \layout Bibliography \bibitem [DesignPatterns]{DesignPatterns} Erich Gamma, Richard Helm, Ralph Johnson, John Vlissides: Design Patterns Elements of Reusable Object-Oriented Software. Addison-Wesley 1995, ISBN: 0201633612. \layout Bibliography \bibitem [DragonBook]{DragonBook} Alfred V. Aho, Ravi Sethi, Jeffrey D. Ullman: Compilers: Principles, Techniques, and Tools. Addison-Wesley 1985, ISBN: 0201100886. \layout Bibliography \bibitem [EF]{EF} ElectricalFire. \emph on \newline http://www.mozilla.org/projects/ef/ \layout Bibliography \bibitem [f2j]{f2j} Keith Seymour: f2j - Fortran-to-Java Compiler. \newline \emph on http://cs.utk.edu/f2j/ \layout Bibliography \bibitem [Fong-WWW]{Fong-WWW} Philip W. L. Fong: The mysterious Pass One, first draft, September 2, 1997. \newline \emph on http://www.cs.sfu.ca/people/GradStudents/pwfong/personal/ JVM/pass1/ \layout Bibliography \bibitem [Fong2-WWW]{Fong2-WWW} Philip W. L. Fong: A Flaw with the Specification of the Invokeinterface Opcode. \newline \emph on http://www.cs.sfu.ca/people/GradStudents/pwfong/personal/ JVM/invokeinterface/ \layout Bibliography \bibitem [FreundMitchell]{FreundMitchell} Stephen N. Freund, John Mitchell: A Formal Framework for the Java Bytecode Language and Verifier. Department of Computer Science, Stanford University. Stanford, CA 94305-9045. Appeared in OOPSLA '99. \layout Bibliography \bibitem [GCC-WWW]{GCC-WWW} GCC, The GNU compiler collection. \emph on \newline http://gcc.gnu.org \layout Bibliography \bibitem [GJ-WWW]{GJ-WWW} GJ. A Generic Java Language Extension. \newline \emph on http://www.cis.unisa.edu.au/~pizza/gj/ \layout Bibliography \bibitem [InnerSpec]{InnerSpec} Sun Microsystems: Inner Classes Specification. \newline \emph on http://java.sun.com/products/jdk/1.1/docs/guide/ \newline innerclasses/spec/innerclasses.doc.html \layout Bibliography \bibitem [J2ME-CLDCS]{J2ME-CLDCS} Sun Microsystems: J2ME \latex latex \backslash texttrademark \latex default \SpecialChar ~ Connected Limited Device Configuration Specification. \newline \emph on http://jcp.org/aboutJava/communityprocess/final/jsr030/ \layout Bibliography \bibitem [JBook]{JBook} Robert Stärk, Joachim Schmid, Egon Börger: Java \latex latex \backslash texttrademark\SpecialChar ~ \latex default and the Java \latex latex \backslash texttrademark\SpecialChar ~ \latex default Virtual Machine. Springer-Verlag 2001, ISBN 3-540-42088-6. \newline \emph on http://www.inf.ethz.ch/~jbook/ \layout Bibliography \bibitem [JPaper]{JPaper} Robert F. Stärk, Joachim Schmid: Java bytecode verification is not possible. ETH Zürich, Department of Computer Science 2000. \emph on \newline http://www.inf.ethz.ch/~staerk/pdf/jbv00.pdf \layout Bibliography \bibitem [JLectures]{JLectures} Robert F. Stärk: Java and the JVM: Definition and Verification (37-474). \newline \emph on http://www.inf.ethz.ch/~jbook/eth37474/ \newline http://www.inf.ethz.ch/~jbook/eth37474/javaBV.pdf \layout Bibliography \bibitem [JNS]{JNS} Robert Macgregor, Dave Durbin, John Owlett, Andrew Yeomans: JAVA \latex latex \backslash texttrademark \latex default \SpecialChar ~ Network Security. Prentice Hall 1998, ISBN 0137615299. \layout Bibliography \bibitem [JODE-WWW]{JODE-WWW} JODE is a java package containing a decompiler and an optimizer for java. \newline \emph on http://jode.sourceforge.net \layout Bibliography \bibitem [JustIce]{JustIce} Enver Haase: JustIce. A Free Class File Verifier for Java \latex latex \backslash texttrademark \latex default \SpecialChar ~ .Freie Universität Berlin, Takustraße 9, D-14195 Berlin; September 2001. \newline \emph on http://bcel.sourceforge.net/ \newline http://bcel.sourceforge.net/justice \layout Bibliography \bibitem [JVM]{JVM} Jon Meyer, Troy Downing: JAVA Virtual Machine. O'Reilly 1997, ISBN 1-56592-194-1. \layout Bibliography \bibitem [Kaffe-WWW]{Kaffe-WWW} Kaffe. Kaffe is a cleanroom, open source implementation of a Java virtual machine and class libraries. \emph on \newline http://www.kaffe.org \layout Bibliography \bibitem [KAWA-WWW]{KAWA-WWW} Kawa, the Java-based Scheme system. \emph on \newline http://http://www.gnu.org/software/kawa/ \layout Bibliography \bibitem [KDE]{KDE} KDE, the K desktop environment. \newline \emph on http://www.kde.org \layout Bibliography \bibitem [Kimera-WWW]{Kimera-WWW} The Kimera Verifier. \emph on \emph default \newline Currently off-line because of a World Wide Web presentation rework. \emph on \newline http://kimera.cs.washington.edu/verifier.html \newline http://www-kimera.cs.washington.edu \layout Bibliography \bibitem [Kimera-TestingJVM]{Kimera-TestingJVM} Emin Gün Sirer: Testing Java Virtual Machines. An Experience Report on Automatically Testing Java Virtual Machines. University of Washington, Dept. of Computer Science and Engineering. \newline \emph on http://kimera.cs.washington.edu \layout Bibliography \bibitem [Kimera-ProdGram]{Kimera-ProdGram} Emin Gün Sirer, Brian N. Bershad: Using Production Grammars in Software Testing. University of Washington, Department of Computer Science. \newline \emph on http://kimera.cs.washington.edu \layout Bibliography \bibitem [kissme-WWW]{kissme-WWW} kissme. A free Java Virtual Machine. \emph on \newline http://kissme.sourceforge.net \layout Bibliography \bibitem [Krakatoa-WWW]{Krakatoa-WWW} Todd A. Proebsting, Scott A. Watterson: Krakatoa: Decompilation in Java (Does Bytecode Reveal Source?). The University of Arizona, Department of Computer Science. \newline \emph on http://www.cs.arizona.edu/people/saw/papers/Krakatoa-COOTS97.ps.Z \layout Bibliography \bibitem [langspec2]{langspec2} James Gosling, Bill Joy, Guy Steele, Gilad Bracha: The Java Language Specificati on, Second Edition. Addison-Wesley 2000, ISBN 0201310082. \layout Bibliography \bibitem [M2]{M2} Niklaus Wirth: Programming in Modula-2, Fourth Edition. Springer-Verlag 1988, ISBN 3-540-50150-9. \layout Bibliography \bibitem [Mozilla]{Mozilla} Mozilla.org (The Mozilla Origanization): Mozilla. \newline \emph on http://www.mozilla.org \layout Bibliography \bibitem [Neuromancer]{Neuromancer} William Gibson: Neuromancer. Ace Books 1994, ISBN 0441000681. \layout Bibliography \bibitem [ORP-WWW]{ORP-WWW} Open Runtime Platform. A Platform For Bytecode System Research. \newline \emph on http://www.intel.com/research/mrl/orp/index.htm \layout Bibliography \bibitem [PL4JVM]{PL4JVM} Robert Tolksdorf: Programming Languages for the Java Virtual Machine. \newline \emph on http://www.robert-tolksdorf.de/vmlanguages.html \layout Bibliography \bibitem [PMG-WWW]{PMG-WWW} PMG. Poor Man's Genericity for Java. \newline \emph on \layout Bibliography \bibitem [Qian]{Qian} Zhenyu Qian: A Formal Specification of Java \latex latex \backslash texttrademark \latex default \SpecialChar ~ Virtual Machine Instructions for Objects, Methods and Subroutines. Bremen Institute for Safe Systems (BISS), FB3 Informatik, Universität Bremen, D-28334 Bremen, Germany. \layout Bibliography \bibitem [SableVM-WWW]{SableVM-WWW} SableVM. A Bytecode Interpreter. \emph on \newline http://www.sablevm.org \layout Bibliography \bibitem [StataAbadi]{StataAbadi} Raymie Stata and Martin Abadi: A Type System for Java Bytecode Subroutines. In: ACM Transactions on Programming Languages and Systems, Vol. 21, No. 1, January 1999, Pages 90-137. \layout Bibliography \bibitem [Unknowable]{Unknowable} G.J. Chaitin: The Unknowable. Springer-Verlag 1999, ISBN 981-4021-72-5. \newline \emph on http://www.umcs.maine.edu/~chaitin/unknowable/ \layout Bibliography \bibitem [Unicode]{Unicode} The Unicode Consortium: The Unicode Standard, Version 2.0. Niso Press 1996, ISBN 0-201-48345-9. \newline \emph on http://www.unicode.org \layout Bibliography \bibitem [Yellin-WWW]{Yellin-WWW} Frank Yellin: Low Level Security in Java. \emph on \newline http://java.sun.com/sfaq/verifier.html \layout Bibliography \bibitem [VMSPEC2]{vmspec2} Tim Lindholm, Frank Yellin: The Java \latex latex \backslash texttrademark\SpecialChar ~ \latex default Virtual Machine Specification, Second Edition. Addison-Wesley 1999, ISBN 0-201-43294-4. \the_end