1\documentclass{article} 2\title{AFU v4 Design} 3\author{Dan Brown} 4 5\usepackage{graphics} 6\usepackage{verbatim} 7\newenvironment{code}{\footnotesize\verbatim}{\endverbatim\normalsize} 8\newenvironment{jcode}{\footnotesize\verbatim}{\endverbatim\normalsize} 9 10\begin{document} 11\maketitle 12 13\section{Design Considerations} 14 15The Annotation File Utilities (AFU) include programs for extracting 16annotations from Java class files and inserting annotations into Java 17source or class files, in accordance with the Java 8 language and virtual 18machine specification. This 19document uses AFU in the singular, in reference to the source 20insertion program except where otherwise noted. 21 22The AFU suffers from multiple design flaws that impede performance and 23maintainability, including: 24 25\begin{description} 26\item[Unstructured text modification.] The AFU inserts 27annotations into source text instead of modifying syntax trees. 28The use of source positions rather than tree paths for indicating 29insertion placement has led to code that is fragile and hence difficult 30to maintain. 31\item[Non-uniform logic.] There are two distinct ways to 32identify where to insert annotations into the source, each having its 33own distinct data structures and logic. Consequently, it is possible to 34have inconsistent results for different representations of the same 35location. 36\item[Inefficient traversal.] The search strategy is 37generate-and-test: it chooses an AST node in a preorder traversal and 38checks every insertion specification relevant to the class (or package) 39to determine whether it applies to the current node. Furthermore, since 40the implementations of some tests contain recursive calls going up the 41path to the root, these tests can be repeated many times over for some 42nodes. 43\end{description} 44 45The proposed design is aimed at mitigating these problems. It makes 46the traversal a single pass over the annotations, with limited 47re-visitation of AST nodes. It reduces the chance for inconsistency 48between formats by finding the AST node that corresponds to a code 49location given in the scene before determining the insertion position. 50It modifies an AST data structure rather than the source itself, 51preserving formatting and commentary by incorporating non-leaf text 52(i.e.\ whitespace and comments) into the representation of the AST. 53 54\section{Current AFU} 55 56The current AFU source code (3.6.16 as of 11 June 2015) is organized 57into four packages (not counting the \texttt{asmx} and 58\texttt{scene\_lib} sub-projects): 59\begin{itemize} 60\item \texttt{annotator}, containing the main class and a class 61representing source text; 62\item \texttt{annotator.find},\ containing the classes that implement 63the core insertion logic; 64\item \texttt{annotator.scanner},\ containing support classes for 65\texttt{annotator.find};\ and 66\item \texttt{annotator.specification},\ containing classes that 67manage intermediate representations of the insertions specified in a 68Java Annotation Index File (JAIF). 69\end{itemize} 70Should the proposed redesign be implemented, it is anticipated that the 71first will be simplified somewhat and modified to reflect changes in the 72other packages, the second will be radically simplified, the third will 73remain the same, and the fourth will be eliminated. 74 75The details of \texttt{annotator.scanner}\ are not discussed here, since 76the proposed redesign will neither change them nor be influenced by 77them. Of the classes in \texttt{annotator.specification},\ only 78\texttt{IndexFileSpecification},\ which extracts \texttt{Insertion}s 79from \texttt{scene-lib}'s JAIF representation 80(\texttt{annotations.el.AScene}), is relevant to the redesign. 81 82By far the largest and most complex package is \texttt{annotator.find}. 83The insertion logic is distributed among \texttt{TreeFinder} and 84numerous implementations of the \texttt{Criterion} and 85\texttt{Insertion} interfaces. (A \texttt{Criterion} is a reified 86condition that provides a method 87\texttt{boolean isSatisfiedBy(TreePath path)}, 88and an \texttt{Insertion} includes matching criteria, output parameters, 89and methods for generating the source text to be inserted.) 90 91A simplified outline of the basic flow of the program is as follows: 92 93\begin{enumerate} 94\item Interpret each JAIF as a ``scene'' (\texttt{AScene}), then invoke 95the \texttt{parse} method in \texttt{IndexFileSpecification} to extract 96the indicated \texttt{Insertion}s from the scene. 97\item Parse each Java source file to obtain a (type-checked) AST. 98\item Create an empty map from (Integer) source positions to 99insertions. 100\item For each node in the AST in pre-order: 101\begin{itemize} 102\item For each extracted \texttt{Insertion} that is relevant to the 103immediately enclosing class (or package for compilation unit nodes): 104\begin{itemize} 105\item If the path from the root to the current node satisfies every 106\texttt{Criterion} for the \texttt{Insertion}: 107\begin{enumerate} 108\item Find the appropriate source position for the annotation. 109\item Add a mapping from the position to the \texttt{Insertion} to the 110map. 111\item Remove the insertion from the extracted set. 112\end{enumerate} 113\end{itemize} 114\end{itemize} 115\item For each position-to-insertion mapping in descending order by 116position: 117\begin{itemize} 118\item Generate insertion text and insert at indicated position. 119\end{itemize} 120\item Emit modified source code. 121\end{enumerate} 122 123\section{Proposed Revisions} 124 125\begin{figure}[ht] 126\begin{center} 127\resizebox{\columnwidth}{!}{\includegraphics{dataflow.png}} 128\end{center} 129\caption{ 130Data flow between components. Boxes represent sources and sinks; 131ellipses represent Java objects. Bold lines mark new classes and data 132paths, and dashed lines indicate current data flows to be bypassed. 133} 134\label{fig:dataflow} 135\end{figure} 136 137Figure~\ref{fig:dataflow} gives an overview of proposed changes to the 138flow of data between classes. First, the current re-interpretation of 139the scene as a ``specification'' and subsequent extraction of 140\texttt{Criteria} will be eliminated, as the necessary information can 141be determined directly from the \texttt{Scene}. Second, the source text 142will be incorporated into the augmented AST in a more convenient form, 143so there will no need to refer back to the source itself. 144 145The division of responsibilities among classes will remain largely the 146same, with the major differences corresponding to the changes in the 147data flow. First, the annotation locations will be read from the 148\texttt{Scene} and mapped to the corresponding path in the (augmented) 149AST (if any). Second, since \texttt{AAST} will incorporate the salient 150capabilities of \texttt{JCTree} and \texttt{Source}, it will cover the 151responsibilities of both classes. 152 153Many of the current supporting classes will be retained as they are. In 154particular, the new AFU will continue to rely on javac's AST classes and 155parser, and \texttt{scene-lib} probably will need little or no 156modification. The \texttt{annotator.find} package will be radically 157simplified with the elimination of the \texttt{Criterion} and 158(possibly) \texttt{Insertion} hierarchies, as the insertion location 159within the AST can be determined directly from the \texttt{Scene}. 160 161The redesign can be broken down into two independent parts: 162\begin{description} 163\item{Path-based positioning.} Eliminate the \texttt{Criterion} 164hierarchy; determine the insertion location and the text to be inserted 165during (a single) scene traversal; let the scene traversal rather than 166an AST traversal be the outer loop. Advantages: since insertion is 167based on the node rather than a numeric position, there is only one 168positioning logic (even if there are still two matching objects); an 169entire pass and a large amount of intermediate data are eliminated, thus 170improving time and space efficiency. 171\item{Tree transformation.} Eliminate the \texttt{Insertion} hierarchy; 172instead of calculating numeric positions, apply a transformation to the 173appropriate subtree. Advantage: the logic of tree transformation is 174both closer to the specification and easier to comprehend (and maintain) 175than the logic of finding numeric positions for inserting text by 176reverse order. 177\end{description} 178 179While these have no obvious technical disadvantages with respect to the 180existing AFU, the payoff must be weighed against the costs. The current 181codebase is mostly stable at this point, so any increase in reliability 182will be small (though for verification software, a jump from, say, 18399.99\% to 99.9999\% correct makes an enormous difference). 184Fortunately, the AFU's extensive test suite will make it clear how 185quickly a revised implementation achieves the same level of reliability. 186 187\subsection{Path-based Positioning} 188 189The current AFU creates transforms the scene into a set of 190\texttt{Insertions}, each with its own \texttt{Criteria} (a collection 191of objects of various \texttt{Criterion} subclasses). In doing so, it 192throws away information about hierarchical relationships that could be 193used to guide the search for matching locations in the code. 194 195\begin{figure}[ht] 196\begin{center} 197\resizebox{156pt}{!}{\includegraphics{corresp.png}} 198\end{center} 199\caption{ 200Correspondence between scene and AST. 201} 202\label{fig:corresp} 203\end{figure} 204 205It makes more sense to traverse an AST in parallel (conceptually, not 206necessarily by different processors) with a scene that refers to it. 207For example, if a class specification (\texttt{AClass}) in a scene 208corresponds to a class node in an AST, each time one of the method 209specifications (\texttt{AMethod}) is visited, the corresponding method 210node (see Figure~\ref{fig:corresp}) should be visited as well to see 211whether any insertions under the \texttt{AMethod} apply to it. This 212strategy not only obviates an additional pass but avoids much useless 213test repetition, as there is no need to test \emph{every} potential 214insertion against \emph{every} AST node encountered before the correct 215insertion site. With this change, it becomes possible to generate 216\texttt{Insertion}s on the fly---or perhaps even to go ahead and insert 217text, though managing order dependencies would take some additional 218work. 219 220The main loop of the program thus will become a pre-order traversal of 221\texttt{AScene} rather than of \texttt{JCTree}, although the scene 222traversal will explore corresponding parts of the AST. In other words: 223the process will be guided by the annotations to be inserted rather than 224by the shape of the AST. Hence, each specified annotation will be 225visited exactly once. The information currently used to create 226\texttt{Criteria} for the insertions can instead be used to directly 227determine the path (if any) in the AST to which the current scene 228element corresponds. 229 230\subsection{Tree Transformation} 231 232The current AFU finds numeric source positions in trees and inserts 233annotations directly into the source code. The position-finding logic 234is therefore applied at a time other than insertion, introducing 235insertion order dependencies and making it harder to find the source of 236a bug. 237 238A new class hierarchy, the ``Augmented AST'' (\texttt{AAST}), will 239directly manage non-leaf text--that is, inter-leaf text (whitespace and 240comments) and intra-branch text (implicit or contained in branch node 241data members). It will be an implementation of 242\texttt{com.sun.source.Tree} that provides needed bits from 243\texttt{com.sun.tools.javac.tree.JCTree} and defines a parallel subclass 244hierarchy. AAST and subclasses provide methods to retrieve the 245inter-leaf text segments that precede and follow an AST node and 246printing methods that preserve the formatting of the source code. 247 248Non-leaf text will be stored in three types of locations in an AAST: 249\begin{enumerate} 250\item Each leaf node will store any non-code text that immediately 251follows. 252\item For branch nodes, for every implicit text segment or data member 253that is represented in the source code but not in a descendant node, the 254node will store any whitespace and comments that follow, unless the 255content is at the end of the node's text. 256\item The computation unit node will store any text that precedes the 257beginning of the Java code. 258\end{enumerate} 259In this way, all text segment references occur exactly once in the AST, 260and annotations can be inserted at the front with no need to take 261preceding text into account. 262 263An annotation insertion will consist of a transformation of a tree node. 264If no additional code is to be generated, the transformation is simple: 265\begin{itemize} 266\item For type annotations, the parent AST node of the element to be 267annotated adds the optional annotation node if it is absent, then adds 268the annotation to the annotation node. 269\item For declaration annotations, the annotation is added onto the 270front of a declaration's list of modifiers. 271\end{itemize} 272Cases that involve code generation are more complex. 273Figures~\ref{fig:nocast} and \ref{fig:typecast} give an example (pre- 274and post-insertion, respectively) of a transformation that requires a 275typecast to be generated and inserted. Note that previously existing 276non-leaf text segments remain unaltered. 277 278\begin{figure}[ht] 279\begin{center} 280\resizebox{261pt}{!}{\includegraphics{nocast.png}} 281\end{center} 282\caption{ 283Augmented subgraph for \texttt{int i = 0}. Text of leaf nodes and text 284representing elements of branch nodes are connected with solid arrows, 285and inter-leaf and inter-element text segments are connected with dashed 286arrows. The text of the subgraph can be read from the lower frontier of 287the graph. (Underscores represent spaces here.) 288} 289\label{fig:nocast} 290\end{figure} 291 292\begin{figure}[ht] 293\begin{center} 294\resizebox{\columnwidth}{!}{\includegraphics{typecast.png}} 295\end{center} 296\caption{ 297Augmented subgraph for \texttt{int i = (@A int) 0}, the result of adding 298an annotated cast \texttt{(@A int)} to the previous figure's code. With 299respect to the graph in the previous figure, \texttt{TypeCast} has taken 300\texttt{Literal}'s place in the tree, and \texttt{Literal} has become a 301child of \texttt{TypeCast} along with the newly inserted material. 302} 303\label{fig:typecast} 304\end{figure} 305 306The other major change will be in the output generation process, which 307will depend on the \texttt{AAST} rather than directly on the source. 308The \texttt{AAST}'s pretty-printer will perform an in-order traversal 309and emit the text of each leaf interleaved with the non-leaf text. 310 311\subsection{Algorithm Summary} 312 313\begin{enumerate} 314\item Interpret each JAIF as a ``scene'' (\texttt{AScene}). 315\item Parse each Java source file to obtain a (type-checked) AST. 316\item For each AST: 317\begin{enumerate} 318\item Transform the AST into an AAST. 319\item For each annotated element in the scene in pre-order: 320\begin{itemize} 321\item Try to find the AAST node to which the element corresponds. 322\item If the node exists (or can be generated): 323\begin{itemize} 324\item Transform the subtree rooted at the node to incorporate the 325annotation. 326\end{itemize} 327\end{itemize} 328\item Pretty-print AAST. 329\end{enumerate} 330\end{enumerate} 331 332\section{Planning} 333 334There will be some time required for integrating the two parts if they 335are implemented separately. In particular, doing tree transformation 336separately from path-based positioning means reifying transformations 337so they can be done in a second pass. Hence it makes more sense to 338do path-based positioning on the way to doing tree transformation. 339 340Estimated times are as follows: 341\begin{itemize} 342\item Path-based positioning: 2.5 weeks. 343\item Tree transformation: 3 to 3.5 weeks if done after path-based 344positioning, 4 weeks otherwise. 345\end{itemize} 346Thus, doing the full job should take about six weeks. 347 348\appendix 349 350\section{API extensions (non-exhaustive)} 351 352\begin{jcode} 353public abstract class AAST extends com.sun.source.Tree { 354 /** Interleaves non-leaf text with text serialization of AAST. */ 355 public void prettyPrint(java.io.Writer w) throws java.io.IOException; 356 357 /** Insert single annotation at given path. */ 358 public insertAnnotation(TreePath path, Annotation anno); 359} 360 361// main loop 362AAST insertAnnotations(annotations.el.AScene scene, AAST aast); 363\end{jcode} 364 365\newpage 366\section{Excerpts from original API}\label{sec:exc} 367 368\begin{jcode} 369public interface Tree { 370 public enum Kind { // MUCH larger in actual interface 371 /** 372 * Used for instances of {@link BinaryTree} representing 373 * addition or string concatenation {@code +}. 374 */ 375 PLUS(BinaryTree.class), 376 /** 377 * Used for instances of {@link LiteralTree} representing 378 * an integral literal expression of type {@code int}. 379 */ 380 INT_LITERAL(LiteralTree.class); 381 } 382 Kind getKind(); 383 /** 384 * Accept method used to implement the visitor pattern. The 385 * visitor pattern is used to implement operations on trees. 386 * 387 * @param <R> result type of this operation. 388 * @param <D> type of additional data. 389 */ 390 <R,D> R accept(TreeVisitor<R,D> visitor, D data); 391} 392public interface ExpressionTree extends Tree {} 393public interface BinaryTree extends ExpressionTree { 394 ExpressionTree getLeftOperand(); 395 ExpressionTree getRightOperand(); 396} 397public interface LiteralTree extends ExpressionTree { 398 Object getValue(); 399} 400 401public abstract class JCTree implements Tree { 402 public int getStartPosition() { 403 return TreeInfo.getStartPos(this); 404 } 405 public static abstract class JCExpression 406 extends JCTree implements ExpressionTree {} 407 public static class JCBinary extends JCExpression implements BinaryTree { 408 private Tag opcode; 409 public JCExpression lhs; 410 public JCExpression rhs; 411 public Symbol operator; 412 protected JCBinary(Tag opcode, 413 JCExpression lhs, 414 JCExpression rhs, 415 Symbol operator) { 416 this.opcode = opcode; 417 this.lhs = lhs; 418 this.rhs = rhs; 419 this.operator = operator; 420 } 421 public Kind getKind() { return TreeInfo.tagToKind(getTag()); } 422 public JCExpression getLeftOperand() { return lhs; } 423 public JCExpression getRightOperand() { return rhs; } 424 public Symbol getOperator() { return operator; } 425 @Override 426 public <R,D> R accept(TreeVisitor<R,D> v, D d) { 427 return v.visitBinary(this, d); 428 } 429 } 430 public static class JCLiteral extends JCExpression implements LiteralTree { 431 public TypeTag typetag; 432 public Object value; 433 protected JCLiteral(TypeTag typetag, Object value) { 434 this.typetag = typetag; 435 this.value = value; 436 } 437 public Kind getKind() { 438 return typetag.getKindLiteral(); 439 } 440 public Object getValue() { 441 switch (typetag) { 442 case BOOLEAN: 443 int bi = (Integer) value; 444 return (bi != 0); 445 case CHAR: 446 int ci = (Integer) value; 447 char c = (char) ci; 448 if (c != ci) 449 throw new AssertionError("bad value for char literal"); 450 return c; 451 default: 452 return value; 453 } 454 } 455 @Override 456 public <R,D> R accept(TreeVisitor<R,D> v, D d) { 457 return v.visitLiteral(this, d); 458 } 459 } 460} 461\end{jcode} 462 463\end{document} 464 465