=============================
Introduction to the Clang AST
=============================

This document gives a gentle introduction to the mysteries of the Clang
AST. It is targeted at developers who either want to contribute to
Clang, or use tools that work based on Clang's AST, like the AST
matchers.

.. raw:: html

  <center><iframe width="560" height="315" src="http://www.youtube.com/embed/VqCkCDFLSsc?vq=hd720" frameborder="0" allowfullscreen></iframe></center>

`Slides <http://llvm.org/devmtg/2013-04/klimek-slides.pdf>`_

Introduction
============

Clang's AST is different from ASTs produced by some other compilers in
that it closely resembles both the written C++ code and the C++
standard. For example, parenthesis expressions and compile-time
constants are available in an unreduced form in the AST. This makes
Clang's AST a good fit for refactoring tools.

Documentation for all Clang AST nodes is available via the generated
`Doxygen <http://clang.llvm.org/doxygen>`_. The Doxygen online
documentation is also indexed by search engines, so searching for
"clang" plus the AST node's class name usually turns up the Doxygen
page of the class you're looking for (for example, search for:
clang ParenExpr).

Examining the AST
=================

A good way to familiarize yourself with the Clang AST is to actually look
at it on some simple example code. Clang has a built-in AST-dump mode,
which can be enabled with the flag ``-ast-dump``.

Let's look at a simple example AST:

::

    $ cat test.cc
    int f(int x) {
      int result = (x / 42);
      return result;
    }

    # Clang by default is a frontend for many tools; -Xclang is used to pass
    # options directly to the C++ frontend.
    $ clang -Xclang -ast-dump -fsyntax-only test.cc
    TranslationUnitDecl 0x5aea0d0 <<invalid sloc>>
    ... cutting out internal declarations of clang ...
    `-FunctionDecl 0x5aeab50 <test.cc:1:1, line:4:1> f 'int (int)'
      |-ParmVarDecl 0x5aeaa90 <line:1:7, col:11> x 'int'
      `-CompoundStmt 0x5aead88 <col:14, line:4:1>
        |-DeclStmt 0x5aead10 <line:2:3, col:24>
        | `-VarDecl 0x5aeac10 <col:3, col:23> result 'int'
        |   `-ParenExpr 0x5aeacf0 <col:16, col:23> 'int'
        |     `-BinaryOperator 0x5aeacc8 <col:17, col:21> 'int' '/'
        |       |-ImplicitCastExpr 0x5aeacb0 <col:17> 'int' <LValueToRValue>
        |       | `-DeclRefExpr 0x5aeac68 <col:17> 'int' lvalue ParmVar 0x5aeaa90 'x' 'int'
        |       `-IntegerLiteral 0x5aeac90 <col:21> 'int' 42
        `-ReturnStmt 0x5aead68 <line:3:3, col:10>
          `-ImplicitCastExpr 0x5aead50 <col:10> 'int' <LValueToRValue>
            `-DeclRefExpr 0x5aead28 <col:10> 'int' lvalue Var 0x5aeac10 'result' 'int'

The top-level declaration in
a translation unit is always the `translation unit
declaration <http://clang.llvm.org/doxygen/classclang_1_1TranslationUnitDecl.html>`_.
In this example, our first user-written declaration is the `function
declaration <http://clang.llvm.org/doxygen/classclang_1_1FunctionDecl.html>`_
of "``f``". The body of "``f``" is a `compound
statement <http://clang.llvm.org/doxygen/classclang_1_1CompoundStmt.html>`_,
whose child nodes are a `declaration
statement <http://clang.llvm.org/doxygen/classclang_1_1DeclStmt.html>`_
that declares our result variable, and the `return
statement <http://clang.llvm.org/doxygen/classclang_1_1ReturnStmt.html>`_.

AST Context
===========

All information about the AST for a translation unit is bundled up in
the class
`ASTContext <http://clang.llvm.org/doxygen/classclang_1_1ASTContext.html>`_.
It allows traversal of the whole translation unit starting from
`getTranslationUnitDecl <http://clang.llvm.org/doxygen/classclang_1_1ASTContext.html#abd909fb01ef10cfd0244832a67b1dd64>`_,
or access to Clang's `table of
identifiers <http://clang.llvm.org/doxygen/classclang_1_1ASTContext.html#a4f95adb9958e22fbe55212ae6482feb4>`_
for the parsed translation unit.

AST Nodes
=========

Clang's AST nodes are modeled on a class hierarchy that does not have a
common ancestor. Instead, there are multiple larger hierarchies for
basic node types like
`Decl <http://clang.llvm.org/doxygen/classclang_1_1Decl.html>`_ and
`Stmt <http://clang.llvm.org/doxygen/classclang_1_1Stmt.html>`_. Many
important AST nodes derive from
`Type <http://clang.llvm.org/doxygen/classclang_1_1Type.html>`_,
`Decl <http://clang.llvm.org/doxygen/classclang_1_1Decl.html>`_,
`DeclContext <http://clang.llvm.org/doxygen/classclang_1_1DeclContext.html>`_
or `Stmt <http://clang.llvm.org/doxygen/classclang_1_1Stmt.html>`_, with
some classes deriving from both Decl and DeclContext.

There is also a multitude of nodes in the AST that are not part of a
larger hierarchy, and are only reachable from specific other nodes, like
`CXXBaseSpecifier <http://clang.llvm.org/doxygen/classclang_1_1CXXBaseSpecifier.html>`_.

Thus, to traverse the full AST, one starts from the
`TranslationUnitDecl <http://clang.llvm.org/doxygen/classclang_1_1TranslationUnitDecl.html>`_
and then recursively traverses everything that can be reached from that
node; this information has to be encoded for each specific node type.
This algorithm is encoded in the
`RecursiveASTVisitor <http://clang.llvm.org/doxygen/classclang_1_1RecursiveASTVisitor.html>`_.
See the `RecursiveASTVisitor
tutorial <http://clang.llvm.org/docs/RAVFrontendAction.html>`_.
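
The traversal idea described above can be sketched with a small toy
model. This is not Clang's API: the ``Stmt`` and ``Decl`` structs,
``countNodes`` and ``buildExample`` below are hypothetical stand-ins
that use only the C++ standard library. The sketch assumes a reduced
version of the tree for ``f`` and shows why each node type needs its
own knowledge of what can be reached from it when there is no common
base class.

.. code-block:: c++

  #include <cassert>
  #include <cstddef>
  #include <memory>
  #include <string>
  #include <utility>
  #include <vector>

  // Toy model only: hypothetical stand-ins, not Clang's real classes.
  struct Stmt {
    std::string Kind;
    std::vector<std::unique_ptr<Stmt>> Children; // nested statements
  };

  // Decl deliberately shares no base class with Stmt.
  struct Decl {
    std::string Kind;
    std::vector<std::unique_ptr<Decl>> Decls; // nested declarations
    std::unique_ptr<Stmt> Body;               // null when there is no body
  };

  // Per-type knowledge: in this sketch a statement only has child statements.
  std::size_t countNodes(const Stmt &S) {
    std::size_t N = 1;
    for (const auto &Child : S.Children) {
      assert(Child && "child statements are never null in this sketch");
      N += countNodes(*Child);
    }
    return N;
  }

  // Per-type knowledge: a declaration has nested declarations and maybe a body.
  std::size_t countNodes(const Decl &D) {
    std::size_t N = 1;
    for (const auto &Child : D.Decls) {
      assert(Child && "child declarations are never null in this sketch");
      N += countNodes(*Child);
    }
    if (D.Body)
      N += countNodes(*D.Body);
    return N;
  }

  std::unique_ptr<Stmt> makeStmt(std::string Kind) {
    auto S = std::make_unique<Stmt>();
    S->Kind = std::move(Kind);
    return S;
  }

  std::unique_ptr<Decl> makeDecl(std::string Kind) {
    auto D = std::make_unique<Decl>();
    D->Kind = std::move(Kind);
    return D;
  }

  // Builds a reduced tree for f: the function, its parameter, and the
  // compound statement with its two direct children.
  std::unique_ptr<Decl> buildExample() {
    auto Body = makeStmt("CompoundStmt");
    Body->Children.push_back(makeStmt("DeclStmt"));
    Body->Children.push_back(makeStmt("ReturnStmt"));

    auto F = makeDecl("FunctionDecl");
    F->Decls.push_back(makeDecl("ParmVarDecl"));
    F->Body = std::move(Body);

    auto TU = makeDecl("TranslationUnitDecl");
    TU->Decls.push_back(std::move(F));
    return TU;
  }

Calling ``countNodes(*buildExample())`` walks the whole toy tree from
the translation unit downwards. In real Clang code this bookkeeping is
what ``RecursiveASTVisitor`` provides, so tools normally do not write
it by hand.
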

The two most basic nodes in the Clang AST are statements
(`Stmt <http://clang.llvm.org/doxygen/classclang_1_1Stmt.html>`_) and
declarations
(`Decl <http://clang.llvm.org/doxygen/classclang_1_1Decl.html>`_). Note
that expressions
(`Expr <http://clang.llvm.org/doxygen/classclang_1_1Expr.html>`_) are
also statements in Clang's AST.
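
The relationship between expressions and statements can be illustrated
with a minimal sketch. The classes below are hypothetical toy types
that only borrow Clang's names; they are not Clang's real definitions.
The sketch uses ``dynamic_cast`` for brevity, whereas Clang itself
relies on LLVM-style casts such as ``llvm::dyn_cast``.

.. code-block:: c++

  #include <memory>
  #include <string>
  #include <utility>

  // Toy model only: hypothetical types, not Clang's real classes.
  struct Stmt {
    virtual ~Stmt() = default;
    virtual std::string kind() const = 0;
  };

  // Every expression is also a statement.
  struct Expr : Stmt {};

  struct IntegerLiteral : Expr {
    int Value;
    explicit IntegerLiteral(int V) : Value(V) {}
    std::string kind() const override { return "IntegerLiteral"; }
  };

  // A statement that is not an expression, but owns one.
  struct ReturnStmt : Stmt {
    std::unique_ptr<Expr> Result;
    explicit ReturnStmt(std::unique_ptr<Expr> E) : Result(std::move(E)) {}
    std::string kind() const override { return "ReturnStmt"; }
  };

  // Accepts any statement, which includes any expression.
  std::string describe(const Stmt &S) {
    const bool IsExpr = dynamic_cast<const Expr *>(&S) != nullptr;
    return S.kind() + (IsExpr ? " (expression)" : " (statement only)");
  }

Because ``Expr`` derives from ``Stmt`` in this sketch, code written
against ``Stmt`` handles expressions without special cases, and it can
still ask whether a given statement is an expression when that matters.
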