• Home
  • Line#
  • Scopes#
  • Navigate#
  • Raw
  • Download
1\documentclass{article}
2\title{AFU v4 Design}
3\author{Dan Brown}
4
5\usepackage{graphics}
6\usepackage{verbatim}
7\newenvironment{code}{\footnotesize\verbatim}{\endverbatim\normalsize}
8\newenvironment{jcode}{\footnotesize\verbatim}{\endverbatim\normalsize}
9
10\begin{document}
11\maketitle
12
13\section{Design Considerations}
14
15The Annotation File Utilities (AFU) include programs for extracting
16annotations from Java class files and inserting annotations into Java
17source or class files, in accordance with the Java 8 language and virtual
18machine specification.  This
19document uses AFU in the singular, in reference to the source
20insertion program except where otherwise noted.
21
22The AFU suffers from multiple design flaws that impede performance and
23maintainability, including:
24
25\begin{description}
26\item[Unstructured text modification.]  The AFU inserts
27annotations into source text instead of modifying syntax trees.
28The use of source positions rather than tree paths for indicating
29insertion placement has led to code that is fragile and hence difficult
30to maintain.
31\item[Non-uniform logic.]  There are two distinct ways to
32identify where to insert annotations into the source, each having its
33own distinct data structures and logic.  Consequently, it is possible to
34have inconsistent results for different representations of the same
35location.
36\item[Inefficient traversal.]  The search strategy is
37generate-and-test: it chooses an AST node in a preorder traversal and
38checks every insertion specification relevant to the class (or package)
39to determine whether it applies to the current node.  Furthermore, since
40the implementations of some tests contain recursive calls going up the
41path to the root, these tests can be repeated many times over for some
42nodes.
43\end{description}
44
45The proposed design is aimed at mitigating these problems.  It makes
46the traversal a single pass over the annotations, with limited
47re-visitation of AST nodes.  It reduces the chance for inconsistency
48between formats by finding the AST node that corresponds to a code
49location given in the scene before determining the insertion position.
50It modifies an AST data structure rather than the source itself,
51preserving formatting and commentary by incorporating non-leaf text
52(i.e.\ whitespace and comments) into the representation of the AST.
53
54\section{Current AFU}
55
56The current AFU source code (3.6.16 as of 11 June 2015) is organized
57into four packages (not counting the \texttt{asmx} and
58\texttt{scene\_lib} sub-projects):
59\begin{itemize}
60\item  \texttt{annotator}, containing the main class and a class
61representing source text;
62\item  \texttt{annotator.find},\ containing the classes that implement
63the core insertion logic;
64\item  \texttt{annotator.scanner},\ containing support classes for
65\texttt{annotator.find};\ and
66\item  \texttt{annotator.specification},\ containing classes that
67manage intermediate representations of the insertions specified in a
68Java Annotation Index File (JAIF).
69\end{itemize}
70Should the proposed redesign be implemented, it is anticipated that the
71first will be simplified somewhat and modified to reflect changes in the
72other packages, the second will be radically simplified, the third will
73remain the same, and the fourth will be eliminated.
74
75The details of \texttt{annotator.scanner}\ are not discussed here, since
76the proposed redesign will neither change them nor be influenced by
77them.  Of the classes in \texttt{annotator.specification},\ only
78\texttt{IndexFileSpecification},\ which extracts \texttt{Insertion}s
79from \texttt{scene-lib}'s JAIF representation
80(\texttt{annotations.el.AScene}), is relevant to the redesign.
81
82By far the largest and most complex package is \texttt{annotator.find}.
83The insertion logic is distributed among \texttt{TreeFinder} and
84numerous implementations of the \texttt{Criterion} and
85\texttt{Insertion} interfaces.  (A \texttt{Criterion} is a reified
86condition that provides a method
87\texttt{boolean isSatisfiedBy(TreePath path)},
88and an \texttt{Insertion} includes matching criteria, output parameters,
89and methods for generating the source text to be inserted.)
90
91A simplified outline of the basic flow of the program is as follows:
92
93\begin{enumerate}
94\item  Interpret each JAIF as a ``scene'' (\texttt{AScene}), then invoke
95the \texttt{parse} method in \texttt{IndexFileSpecification} to extract
96the indicated \texttt{Insertion}s from the scene.
97\item  Parse each Java source file to obtain a (type-checked) AST.
98\item  Create an empty map from (Integer) source positions to
99insertions.
100\item  For each node in the AST in pre-order:
101\begin{itemize}
102\item  For each extracted \texttt{Insertion} that is relevant to the
103immediately enclosing class (or package for compilation unit nodes):
104\begin{itemize}
105\item  If the path from the root to the current node satisfies every
106\texttt{Criterion} for the \texttt{Insertion}:
107\begin{enumerate}
108\item  Find the appropriate source position for the annotation.
109\item  Add a mapping from the position to the \texttt{Insertion} to the
110map.
111\item  Remove the insertion from the extracted set.
112\end{enumerate}
113\end{itemize}
114\end{itemize}
115\item  For each position-to-insertion mapping in descending order by
116position:
117\begin{itemize}
118\item  Generate insertion text and insert at indicated position.
119\end{itemize}
120\item  Emit modified source code.
121\end{enumerate}
122
123\section{Proposed Revisions}
124
125\begin{figure}[ht]
126\begin{center}
127\resizebox{\columnwidth}{!}{\includegraphics{dataflow.png}}
128\end{center}
129\caption{
130Data flow between components.  Boxes represent sources and sinks;
131ellipses represent Java objects.  Bold lines mark new classes and data
132paths, and dashed lines indicate current data flows to be bypassed.
133}
134\label{fig:dataflow}
135\end{figure}
136
137Figure~\ref{fig:dataflow} gives an overview of proposed changes to the
138flow of data between classes.  First, the current re-interpretation of
139the scene as a ``specification'' and subsequent extraction of
140\texttt{Criteria} will be eliminated, as the necessary information can
141be determined directly from the \texttt{Scene}.  Second, the source text
142will be incorporated into the augmented AST in a more convenient form,
143so there will no need to refer back to the source itself.
144
145The division of responsibilities among classes will remain largely the
146same, with the major differences corresponding to the changes in the
147data flow.  First, the annotation locations will be read from the
148\texttt{Scene} and mapped to the corresponding path in the (augmented)
149AST (if any).  Second, since \texttt{AAST} will incorporate the salient
150capabilities of \texttt{JCTree} and \texttt{Source}, it will cover the
151responsibilities of both classes.
152
153Many of the current supporting classes will be retained as they are.  In
154particular, the new AFU will continue to rely on javac's AST classes and
155parser, and \texttt{scene-lib} probably will need little or no
156modification.  The \texttt{annotator.find} package will be radically
157simplified with the elimination of the \texttt{Criterion} and
158(possibly) \texttt{Insertion} hierarchies, as the insertion location
159within the AST can be determined directly from the \texttt{Scene}.
160
161The redesign can be broken down into two independent parts:
162\begin{description}
163\item{Path-based positioning.} Eliminate the \texttt{Criterion}
164hierarchy; determine the insertion location and the text to be inserted
165during (a single) scene traversal; let the scene traversal rather than
166an AST traversal be the outer loop.  Advantages: since insertion is
167based on the node rather than a numeric position, there is only one
168positioning logic (even if there are still two matching objects); an
169entire pass and a large amount of intermediate data are eliminated, thus
170improving time and space efficiency.
171\item{Tree transformation.}  Eliminate the \texttt{Insertion} hierarchy;
172instead of calculating numeric positions, apply a transformation to the
173appropriate subtree.  Advantage: the logic of tree transformation is
174both closer to the specification and easier to comprehend (and maintain)
175than the logic of finding numeric positions for inserting text by
176reverse order.
177\end{description}
178
179While these have no obvious technical disadvantages with respect to the
180existing AFU, the payoff must be weighed against the costs.  The current
181codebase is mostly stable at this point, so any increase in reliability
182will be small (though for verification software, a jump from, say,
18399.99\% to 99.9999\% correct makes an enormous difference).
184Fortunately, the AFU's extensive test suite will make it clear how
185quickly a revised implementation achieves the same level of reliability.
186
187\subsection{Path-based Positioning}
188
189The current AFU creates transforms the scene into a set of
190\texttt{Insertions}, each with its own \texttt{Criteria} (a collection
191of objects of various \texttt{Criterion} subclasses).  In doing so, it
192throws away information about hierarchical relationships that could be
193used to guide the search for matching locations in the code.
194
195\begin{figure}[ht]
196\begin{center}
197\resizebox{156pt}{!}{\includegraphics{corresp.png}}
198\end{center}
199\caption{
200Correspondence between scene and AST.
201}
202\label{fig:corresp}
203\end{figure}
204
205It makes more sense to traverse an AST in parallel (conceptually, not
206necessarily by different processors) with a scene that refers to it.
207For example, if a class specification (\texttt{AClass}) in a scene
208corresponds to a class node in an AST, each time one of the method
209specifications (\texttt{AMethod}) is visited, the corresponding method
210node (see Figure~\ref{fig:corresp}) should be visited as well to see
211whether any insertions under the \texttt{AMethod} apply to it.  This
212strategy not only obviates an additional pass but avoids much useless
213test repetition, as there is no need to test \emph{every} potential
214insertion against \emph{every} AST node encountered before the correct
215insertion site.  With this change, it becomes possible to generate
216\texttt{Insertion}s on the fly---or perhaps even to go ahead and insert
217text, though managing order dependencies would take some additional
218work.
219
220The main loop of the program thus will become a pre-order traversal of
221\texttt{AScene} rather than of \texttt{JCTree}, although the scene
222traversal will explore corresponding parts of the AST.  In other words:
223the process will be guided by the annotations to be inserted rather than
224by the shape of the AST.  Hence, each specified annotation will be
225visited exactly once.  The information currently used to create
226\texttt{Criteria} for the insertions can instead be used to directly
227determine the path (if any) in the AST to which the current scene
228element corresponds.
229
230\subsection{Tree Transformation}
231
232The current AFU finds numeric source positions in trees and inserts
233annotations directly into the source code.  The position-finding logic
234is therefore applied at a time other than insertion, introducing
235insertion order dependencies and making it harder to find the source of
236a bug.
237
238A new class hierarchy, the ``Augmented AST'' (\texttt{AAST}), will
239directly manage non-leaf text--that is, inter-leaf text (whitespace and
240comments) and intra-branch text (implicit or contained in branch node
241data members).  It will be an implementation of
242\texttt{com.sun.source.Tree} that provides needed bits from
243\texttt{com.sun.tools.javac.tree.JCTree} and defines a parallel subclass
244hierarchy.  AAST and subclasses provide methods to retrieve the
245inter-leaf text segments that precede and follow an AST node and
246printing methods that preserve the formatting of the source code.
247
248Non-leaf text will be stored in three types of locations in an AAST:
249\begin{enumerate}
250\item  Each leaf node will store any non-code text that immediately
251follows.
252\item  For branch nodes, for every implicit text segment or data member
253that is represented in the source code but not in a descendant node, the
254node will store any whitespace and comments that follow, unless the
255content is at the end of the node's text.
256\item  The computation unit node will store any text that precedes the
257beginning of the Java code.
258\end{enumerate}
259In this way, all text segment references occur exactly once in the AST,
260and annotations can be inserted at the front with no need to take
261preceding text into account.
262
263An annotation insertion will consist of a transformation of a tree node.
264If no additional code is to be generated, the transformation is simple:
265\begin{itemize}
266\item  For type annotations, the parent AST node of the element to be
267annotated adds the optional annotation node if it is absent, then adds
268the annotation to the annotation node.
269\item  For declaration annotations, the annotation is added onto the
270front of a declaration's list of modifiers.
271\end{itemize}
272Cases that involve code generation are more complex.
273Figures~\ref{fig:nocast} and \ref{fig:typecast} give an example (pre-
274and post-insertion, respectively) of a transformation that requires a
275typecast to be generated and inserted.  Note that previously existing
276non-leaf text segments remain unaltered.
277
278\begin{figure}[ht]
279\begin{center}
280\resizebox{261pt}{!}{\includegraphics{nocast.png}}
281\end{center}
282\caption{
283Augmented subgraph for \texttt{int i = 0}.  Text of leaf nodes and text
284representing elements of branch nodes are connected with solid arrows,
285and inter-leaf and inter-element text segments are connected with dashed
286arrows.  The text of the subgraph can be read from the lower frontier of
287the graph.  (Underscores represent spaces here.)
288}
289\label{fig:nocast}
290\end{figure}
291
292\begin{figure}[ht]
293\begin{center}
294\resizebox{\columnwidth}{!}{\includegraphics{typecast.png}}
295\end{center}
296\caption{
297Augmented subgraph for \texttt{int i = (@A int) 0}, the result of adding
298an annotated cast \texttt{(@A int)} to the previous figure's code.  With
299respect to the graph in the previous figure, \texttt{TypeCast} has taken
300\texttt{Literal}'s place in the tree, and \texttt{Literal} has become a
301child of \texttt{TypeCast} along with the newly inserted material.
302}
303\label{fig:typecast}
304\end{figure}
305
306The other major change will be in the output generation process, which
307will depend on the \texttt{AAST} rather than directly on the source.
308The \texttt{AAST}'s pretty-printer will perform an in-order traversal
309and emit the text of each leaf interleaved with the non-leaf text.
310
311\subsection{Algorithm Summary}
312
313\begin{enumerate}
314\item  Interpret each JAIF as a ``scene'' (\texttt{AScene}).
315\item  Parse each Java source file to obtain a (type-checked) AST.
316\item  For each AST:
317\begin{enumerate}
318\item  Transform the AST into an AAST.
319\item  For each annotated element in the scene in pre-order:
320\begin{itemize}
321\item  Try to find the AAST node to which the element corresponds.
322\item  If the node exists (or can be generated):
323\begin{itemize}
324\item  Transform the subtree rooted at the node to incorporate the
325annotation.
326\end{itemize}
327\end{itemize}
328\item  Pretty-print AAST.
329\end{enumerate}
330\end{enumerate}
331
332\section{Planning}
333
334There will be some time required for integrating the two parts if they
335are implemented separately.  In particular, doing tree transformation
336separately from path-based positioning means reifying transformations
337so they can be done in a second pass.  Hence it makes more sense to
338do path-based positioning on the way to doing tree transformation.
339
340Estimated times are as follows:
341\begin{itemize}
342\item  Path-based positioning: 2.5 weeks.
343\item  Tree transformation: 3 to 3.5 weeks if done after path-based
344positioning, 4 weeks otherwise.
345\end{itemize}
346Thus, doing the full job should take about six weeks.
347
348\appendix
349
350\section{API extensions (non-exhaustive)}
351
352\begin{jcode}
353public abstract class AAST extends com.sun.source.Tree {
354  /** Interleaves non-leaf text with text serialization of AAST. */
355  public void prettyPrint(java.io.Writer w) throws java.io.IOException;
356
357  /** Insert single annotation at given path. */
358  public insertAnnotation(TreePath path, Annotation anno);
359}
360
361// main loop
362AAST insertAnnotations(annotations.el.AScene scene, AAST aast);
363\end{jcode}
364
365\newpage
366\section{Excerpts from original API}\label{sec:exc}
367
368\begin{jcode}
369public interface Tree {
370    public enum Kind {  // MUCH larger in actual interface
371        /**
372         * Used for instances of {@link BinaryTree} representing
373         * addition or string concatenation {@code +}.
374         */
375        PLUS(BinaryTree.class),
376        /**
377         * Used for instances of {@link LiteralTree} representing
378         * an integral literal expression of type {@code int}.
379         */
380        INT_LITERAL(LiteralTree.class);
381    }
382    Kind getKind();
383    /**
384     * Accept method used to implement the visitor pattern.  The
385     * visitor pattern is used to implement operations on trees.
386     *
387     * @param <R> result type of this operation.
388     * @param <D> type of additional data.
389     */
390    <R,D> R accept(TreeVisitor<R,D> visitor, D data);
391}
392public interface ExpressionTree extends Tree {}
393public interface BinaryTree extends ExpressionTree {
394    ExpressionTree getLeftOperand();
395    ExpressionTree getRightOperand();
396}
397public interface LiteralTree extends ExpressionTree {
398    Object getValue();
399}
400
401public abstract class JCTree implements Tree {
402    public int getStartPosition() {
403        return TreeInfo.getStartPos(this);
404    }
405    public static abstract class JCExpression
406            extends JCTree implements ExpressionTree {}
407    public static class JCBinary extends JCExpression implements BinaryTree {
408        private Tag opcode;
409        public JCExpression lhs;
410        public JCExpression rhs;
411        public Symbol operator;
412        protected JCBinary(Tag opcode,
413                         JCExpression lhs,
414                         JCExpression rhs,
415                         Symbol operator) {
416            this.opcode = opcode;
417            this.lhs = lhs;
418            this.rhs = rhs;
419            this.operator = operator;
420        }
421        public Kind getKind() { return TreeInfo.tagToKind(getTag()); }
422        public JCExpression getLeftOperand() { return lhs; }
423        public JCExpression getRightOperand() { return rhs; }
424        public Symbol getOperator() { return operator; }
425        @Override
426        public <R,D> R accept(TreeVisitor<R,D> v, D d) {
427            return v.visitBinary(this, d);
428        }
429    }
430    public static class JCLiteral extends JCExpression implements LiteralTree {
431        public TypeTag typetag;
432        public Object value;
433        protected JCLiteral(TypeTag typetag, Object value) {
434            this.typetag = typetag;
435            this.value = value;
436        }
437        public Kind getKind() {
438            return typetag.getKindLiteral();
439        }
440        public Object getValue() {
441            switch (typetag) {
442                case BOOLEAN:
443                    int bi = (Integer) value;
444                    return (bi != 0);
445                case CHAR:
446                    int ci = (Integer) value;
447                    char c = (char) ci;
448                    if (c != ci)
449                        throw new AssertionError("bad value for char literal");
450                    return c;
451                default:
452                    return value;
453            }
454        }
455        @Override
456        public <R,D> R accept(TreeVisitor<R,D> v, D d) {
457            return v.visitLiteral(this, d);
458        }
459    }
460}
461\end{jcode}
462
463\end{document}
464
465