1//===----------------------------------------------------------------------===// 2// Clang Static Analyzer 3//===----------------------------------------------------------------------===// 4 5= Library Structure = 6 7The analyzer library has two layers: a (low-level) static analysis 8engine (GRExprEngine.cpp and friends), and some static checkers 9(*Checker.cpp). The latter are built on top of the former via the 10Checker and CheckerVisitor interfaces (Checker.h and 11CheckerVisitor.h). The Checker interface is designed to be minimal 12and simple for checker writers, and attempts to isolate them from much 13of the gore of the internal analysis engine. 14 15= How It Works = 16 17The analyzer is inspired by several foundational research papers ([1], 18[2]). (FIXME: kremenek to add more links) 19 20In a nutshell, the analyzer is basically a source code simulator that 21traces out possible paths of execution. The state of the program 22(values of variables and expressions) is encapsulated by the state 23(ProgramState). A location in the program is called a program point 24(ProgramPoint), and the combination of state and program point is a 25node in an exploded graph (ExplodedGraph). The term "exploded" comes 26from exploding the control-flow edges in the control-flow graph (CFG). 27 28Conceptually the analyzer does a reachability analysis through the 29ExplodedGraph. We start at a root node, which has the entry program 30point and initial state, and then simulate transitions by analyzing 31individual expressions. The analysis of an expression can cause the 32state to change, resulting in a new node in the ExplodedGraph with an 33updated program point and an updated state. A bug is found by hitting 34a node that satisfies some "bug condition" (basically a violation of a 35checking invariant). 36 37The analyzer traces out multiple paths by reasoning about branches and 38then bifurcating the state: on the true branch the conditions of the 39branch are assumed to be true and on the false branch the conditions 40of the branch are assumed to be false. Such "assumptions" create 41constraints on the values of the program, and those constraints are 42recorded in the ProgramState object (and are manipulated by the 43ConstraintManager). If assuming the conditions of a branch would 44cause the constraints to be unsatisfiable, the branch is considered 45infeasible and that path is not taken. This is how we get 46path-sensitivity. We reduce exponential blow-up by caching nodes. If 47a new node with the same state and program point as an existing node 48would get generated, the path "caches out" and we simply reuse the 49existing node. Thus the ExplodedGraph is not a DAG; it can contain 50cycles as paths loop back onto each other and cache out. 51 52ProgramState and ExplodedNodes are basically immutable once created. Once 53one creates a ProgramState, you need to create a new one to get a new 54ProgramState. This immutability is key since the ExplodedGraph represents 55the behavior of the analyzed program from the entry point. To 56represent these efficiently, we use functional data structures (e.g., 57ImmutableMaps) which share data between instances. 58 59Finally, individual Checkers work by also manipulating the analysis 60state. The analyzer engine talks to them via a visitor interface. 61For example, the PreVisitCallExpr() method is called by GRExprEngine 62to tell the Checker that we are about to analyze a CallExpr, and the 63checker is asked to check for any preconditions that might not be 64satisfied. The checker can do nothing, or it can generate a new 65ProgramState and ExplodedNode which contains updated checker state. If it 66finds a bug, it can tell the BugReporter object about the bug, 67providing it an ExplodedNode which is the last node in the path that 68triggered the problem. 69 70= Notes about C++ = 71 72Since now constructors are seen before the variable that is constructed 73in the CFG, we create a temporary object as the destination region that 74is constructed into. See ExprEngine::VisitCXXConstructExpr(). 75 76In ExprEngine::processCallExit(), we always bind the object region to the 77evaluated CXXConstructExpr. Then in VisitDeclStmt(), we compute the 78corresponding lazy compound value if the variable is not a reference, and 79bind the variable region to the lazy compound value. If the variable 80is a reference, just use the object region as the initilizer value. 81 82Before entering a C++ method (or ctor/dtor), the 'this' region is bound 83to the object region. In ctors, we synthesize 'this' region with 84CXXRecordDecl*, which means we do not use type qualifiers. In methods, we 85synthesize 'this' region with CXXMethodDecl*, which has getThisType() 86taking type qualifiers into account. It does not matter we use qualified 87'this' region in one method and unqualified 'this' region in another 88method, because we only need to ensure the 'this' region is consistent 89when we synthesize it and create it directly from CXXThisExpr in a single 90method call. 91 92= Working on the Analyzer = 93 94If you are interested in bringing up support for C++ expressions, the 95best place to look is the visitation logic in GRExprEngine, which 96handles the simulation of individual expressions. There are plenty of 97examples there of how other expressions are handled. 98 99If you are interested in writing checkers, look at the Checker and 100CheckerVisitor interfaces (Checker.h and CheckerVisitor.h). Also look 101at the files named *Checker.cpp for examples on how you can implement 102these interfaces. 103 104= Debugging the Analyzer = 105 106There are some useful command-line options for debugging. For example: 107 108$ clang -cc1 -help | grep analyze 109 -analyze-function <value> 110 -analyzer-display-progress 111 -analyzer-viz-egraph-graphviz 112 ... 113 114The first allows you to specify only analyzing a specific function. 115The second prints to the console what function is being analyzed. The 116third generates a graphviz dot file of the ExplodedGraph. This is 117extremely useful when debugging the analyzer and viewing the 118simulation results. 119 120Of course, viewing the CFG (Control-Flow Graph) is also useful: 121 122$ clang -cc1 -help | grep cfg 123 -cfg-add-implicit-dtors Add C++ implicit destructors to CFGs for all analyses 124 -cfg-add-initializers Add C++ initializers to CFGs for all analyses 125 -cfg-dump Display Control-Flow Graphs 126 -cfg-view View Control-Flow Graphs using GraphViz 127 -unoptimized-cfg Generate unoptimized CFGs for all analyses 128 129-cfg-dump dumps a textual representation of the CFG to the console, 130and -cfg-view creates a GraphViz representation. 131 132= References = 133 134[1] Precise interprocedural dataflow analysis via graph reachability, 135 T Reps, S Horwitz, and M Sagiv, POPL '95, 136 http://portal.acm.org/citation.cfm?id=199462 137 138[2] A memory model for static analysis of C programs, Z Xu, T 139 Kremenek, and J Zhang, http://lcs.ios.ac.cn/~xzx/memmodel.pdf 140