1=============================================================== 2Tutorial for building tools using LibTooling and LibASTMatchers 3=============================================================== 4 5This document is intended to show how to build a useful source-to-source 6translation tool based on Clang's `LibTooling <LibTooling.html>`_. It is 7explicitly aimed at people who are new to Clang, so all you should need 8is a working knowledge of C++ and the command line. 9 10In order to work on the compiler, you need some basic knowledge of the 11abstract syntax tree (AST). To this end, the reader is incouraged to 12skim the :doc:`Introduction to the Clang 13AST <IntroductionToTheClangAST>` 14 15Step 0: Obtaining Clang 16======================= 17 18As Clang is part of the LLVM project, you'll need to download LLVM's 19source code first. Both Clang and LLVM are maintained as Subversion 20repositories, but we'll be accessing them through the git mirror. For 21further information, see the `getting started 22guide <http://llvm.org/docs/GettingStarted.html>`_. 23 24.. code-block:: console 25 26 mkdir ~/clang-llvm && cd ~/clang-llvm 27 git clone http://llvm.org/git/llvm.git 28 cd llvm/tools 29 git clone http://llvm.org/git/clang.git 30 cd clang/tools 31 git clone http://llvm.org/git/clang-tools-extra.git extra 32 33Next you need to obtain the CMake build system and Ninja build tool. You 34may already have CMake installed, but current binary versions of CMake 35aren't built with Ninja support. 36 37.. code-block:: console 38 39 cd ~/clang-llvm 40 git clone https://github.com/martine/ninja.git 41 cd ninja 42 git checkout release 43 ./bootstrap.py 44 sudo cp ninja /usr/bin/ 45 46 cd ~/clang-llvm 47 git clone git://cmake.org/stage/cmake.git 48 cd cmake 49 git checkout next 50 ./bootstrap 51 make 52 sudo make install 53 54Okay. Now we'll build Clang! 55 56.. code-block:: console 57 58 cd ~/clang-llvm 59 mkdir build && cd build 60 cmake -G Ninja ../llvm -DLLVM_BUILD_TESTS=ON # Enable tests; default is off. 61 ninja 62 ninja check # Test LLVM only. 63 ninja clang-test # Test Clang only. 64 ninja install 65 66And we're live. 67 68All of the tests should pass, though there is a (very) small chance that 69you can catch LLVM and Clang out of sync. Running ``'git svn rebase'`` 70in both the llvm and clang directories should fix any problems. 71 72Finally, we want to set Clang as its own compiler. 73 74.. code-block:: console 75 76 cd ~/clang-llvm/build 77 ccmake ../llvm 78 79The second command will bring up a GUI for configuring Clang. You need 80to set the entry for ``CMAKE_CXX_COMPILER``. Press ``'t'`` to turn on 81advanced mode. Scroll down to ``CMAKE_CXX_COMPILER``, and set it to 82``/usr/bin/clang++``, or wherever you installed it. Press ``'c'`` to 83configure, then ``'g'`` to generate CMake's files. 84 85Finally, run ninja one last time, and you're done. 86 87Step 1: Create a ClangTool 88========================== 89 90Now that we have enough background knowledge, it's time to create the 91simplest productive ClangTool in existence: a syntax checker. While this 92already exists as ``clang-check``, it's important to understand what's 93going on. 94 95First, we'll need to create a new directory for our tool and tell CMake 96that it exists. As this is not going to be a core clang tool, it will 97live in the ``tools/extra`` repository. 98 99.. code-block:: console 100 101 cd ~/clang-llvm/llvm/tools/clang 102 mkdir tools/extra/loop-convert 103 echo 'add_subdirectory(loop-convert)' >> tools/extra/CMakeLists.txt 104 vim tools/extra/loop-convert/CMakeLists.txt 105 106CMakeLists.txt should have the following contents: 107 108:: 109 110 set(LLVM_LINK_COMPONENTS support) 111 112 add_clang_executable(loop-convert 113 LoopConvert.cpp 114 ) 115 target_link_libraries(loop-convert 116 clangTooling 117 clangBasic 118 clangASTMatchers 119 ) 120 121With that done, Ninja will be able to compile our tool. Let's give it 122something to compile! Put the following into 123``tools/extra/loop-convert/LoopConvert.cpp``. A detailed explanation of 124why the different parts are needed can be found in the `LibTooling 125documentation <LibTooling.html>`_. 126 127.. code-block:: c++ 128 129 // Declares clang::SyntaxOnlyAction. 130 #include "clang/Frontend/FrontendActions.h" 131 #include "clang/Tooling/CommonOptionsParser.h" 132 #include "clang/Tooling/Tooling.h" 133 // Declares llvm::cl::extrahelp. 134 #include "llvm/Support/CommandLine.h" 135 136 using namespace clang::tooling; 137 using namespace llvm; 138 139 // Apply a custom category to all command-line options so that they are the 140 // only ones displayed. 141 static llvm::cl::OptionCategory MyToolCategory("my-tool options"); 142 143 // CommonOptionsParser declares HelpMessage with a description of the common 144 // command-line options related to the compilation database and input files. 145 // It's nice to have this help message in all tools. 146 static cl::extrahelp CommonHelp(CommonOptionsParser::HelpMessage); 147 148 // A help message for this specific tool can be added afterwards. 149 static cl::extrahelp MoreHelp("\nMore help text..."); 150 151 int main(int argc, const char **argv) { 152 CommonOptionsParser OptionsParser(argc, argv, MyToolCategory); 153 ClangTool Tool(OptionsParser.getCompilations(), 154 OptionsParser.getSourcePathList()); 155 return Tool.run(newFrontendActionFactory<clang::SyntaxOnlyAction>().get()); 156 } 157 158And that's it! You can compile our new tool by running ninja from the 159``build`` directory. 160 161.. code-block:: console 162 163 cd ~/clang-llvm/build 164 ninja 165 166You should now be able to run the syntax checker, which is located in 167``~/clang-llvm/build/bin``, on any source file. Try it! 168 169.. code-block:: console 170 171 echo "int main() { return 0; }" > test.cpp 172 bin/loop-convert test.cpp -- 173 174Note the two dashes after we specify the source file. The additional 175options for the compiler are passed after the dashes rather than loading 176them from a compilation database - there just aren't any options needed 177right now. 178 179Intermezzo: Learn AST matcher basics 180==================================== 181 182Clang recently introduced the :doc:`ASTMatcher 183library <LibASTMatchers>` to provide a simple, powerful, and 184concise way to describe specific patterns in the AST. Implemented as a 185DSL powered by macros and templates (see 186`ASTMatchers.h <../doxygen/ASTMatchers_8h_source.html>`_ if you're 187curious), matchers offer the feel of algebraic data types common to 188functional programming languages. 189 190For example, suppose you wanted to examine only binary operators. There 191is a matcher to do exactly that, conveniently named ``binaryOperator``. 192I'll give you one guess what this matcher does: 193 194.. code-block:: c++ 195 196 binaryOperator(hasOperatorName("+"), hasLHS(integerLiteral(equals(0)))) 197 198Shockingly, it will match against addition expressions whose left hand 199side is exactly the literal 0. It will not match against other forms of 2000, such as ``'\0'`` or ``NULL``, but it will match against macros that 201expand to 0. The matcher will also not match against calls to the 202overloaded operator ``'+'``, as there is a separate ``operatorCallExpr`` 203matcher to handle overloaded operators. 204 205There are AST matchers to match all the different nodes of the AST, 206narrowing matchers to only match AST nodes fulfilling specific criteria, 207and traversal matchers to get from one kind of AST node to another. For 208a complete list of AST matchers, take a look at the `AST Matcher 209References <LibASTMatchersReference.html>`_ 210 211All matcher that are nouns describe entities in the AST and can be 212bound, so that they can be referred to whenever a match is found. To do 213so, simply call the method ``bind`` on these matchers, e.g.: 214 215.. code-block:: c++ 216 217 variable(hasType(isInteger())).bind("intvar") 218 219Step 2: Using AST matchers 220========================== 221 222Okay, on to using matchers for real. Let's start by defining a matcher 223which will capture all ``for`` statements that define a new variable 224initialized to zero. Let's start with matching all ``for`` loops: 225 226.. code-block:: c++ 227 228 forStmt() 229 230Next, we want to specify that a single variable is declared in the first 231portion of the loop, so we can extend the matcher to 232 233.. code-block:: c++ 234 235 forStmt(hasLoopInit(declStmt(hasSingleDecl(varDecl())))) 236 237Finally, we can add the condition that the variable is initialized to 238zero. 239 240.. code-block:: c++ 241 242 forStmt(hasLoopInit(declStmt(hasSingleDecl(varDecl( 243 hasInitializer(integerLiteral(equals(0)))))))) 244 245It is fairly easy to read and understand the matcher definition ("match 246loops whose init portion declares a single variable which is initialized 247to the integer literal 0"), but deciding that every piece is necessary 248is more difficult. Note that this matcher will not match loops whose 249variables are initialized to ``'\0'``, ``0.0``, ``NULL``, or any form of 250zero besides the integer 0. 251 252The last step is giving the matcher a name and binding the ``ForStmt`` 253as we will want to do something with it: 254 255.. code-block:: c++ 256 257 StatementMatcher LoopMatcher = 258 forStmt(hasLoopInit(declStmt(hasSingleDecl(varDecl( 259 hasInitializer(integerLiteral(equals(0)))))))).bind("forLoop"); 260 261Once you have defined your matchers, you will need to add a little more 262scaffolding in order to run them. Matchers are paired with a 263``MatchCallback`` and registered with a ``MatchFinder`` object, then run 264from a ``ClangTool``. More code! 265 266Add the following to ``LoopConvert.cpp``: 267 268.. code-block:: c++ 269 270 #include "clang/ASTMatchers/ASTMatchers.h" 271 #include "clang/ASTMatchers/ASTMatchFinder.h" 272 273 using namespace clang; 274 using namespace clang::ast_matchers; 275 276 StatementMatcher LoopMatcher = 277 forStmt(hasLoopInit(declStmt(hasSingleDecl(varDecl( 278 hasInitializer(integerLiteral(equals(0)))))))).bind("forLoop"); 279 280 class LoopPrinter : public MatchFinder::MatchCallback { 281 public : 282 virtual void run(const MatchFinder::MatchResult &Result) { 283 if (const ForStmt *FS = Result.Nodes.getNodeAs<clang::ForStmt>("forLoop")) 284 FS->dump(); 285 } 286 }; 287 288And change ``main()`` to: 289 290.. code-block:: c++ 291 292 int main(int argc, const char **argv) { 293 CommonOptionsParser OptionsParser(argc, argv, MyToolCategory); 294 ClangTool Tool(OptionsParser.getCompilations(), 295 OptionsParser.getSourcePathList()); 296 297 LoopPrinter Printer; 298 MatchFinder Finder; 299 Finder.addMatcher(LoopMatcher, &Printer); 300 301 return Tool.run(newFrontendActionFactory(&Finder).get()); 302 } 303 304Now, you should be able to recompile and run the code to discover for 305loops. Create a new file with a few examples, and test out our new 306handiwork: 307 308.. code-block:: console 309 310 cd ~/clang-llvm/llvm/llvm_build/ 311 ninja loop-convert 312 vim ~/test-files/simple-loops.cc 313 bin/loop-convert ~/test-files/simple-loops.cc 314 315Step 3.5: More Complicated Matchers 316=================================== 317 318Our simple matcher is capable of discovering for loops, but we would 319still need to filter out many more ourselves. We can do a good portion 320of the remaining work with some cleverly chosen matchers, but first we 321need to decide exactly which properties we want to allow. 322 323How can we characterize for loops over arrays which would be eligible 324for translation to range-based syntax? Range based loops over arrays of 325size ``N`` that: 326 327- start at index ``0`` 328- iterate consecutively 329- end at index ``N-1`` 330 331We already check for (1), so all we need to add is a check to the loop's 332condition to ensure that the loop's index variable is compared against 333``N`` and another check to ensure that the increment step just 334increments this same variable. The matcher for (2) is straightforward: 335require a pre- or post-increment of the same variable declared in the 336init portion. 337 338Unfortunately, such a matcher is impossible to write. Matchers contain 339no logic for comparing two arbitrary AST nodes and determining whether 340or not they are equal, so the best we can do is matching more than we 341would like to allow, and punting extra comparisons to the callback. 342 343In any case, we can start building this sub-matcher. We can require that 344the increment step be a unary increment like this: 345 346.. code-block:: c++ 347 348 hasIncrement(unaryOperator(hasOperatorName("++"))) 349 350Specifying what is incremented introduces another quirk of Clang's AST: 351Usages of variables are represented as ``DeclRefExpr``'s ("declaration 352reference expressions") because they are expressions which refer to 353variable declarations. To find a ``unaryOperator`` that refers to a 354specific declaration, we can simply add a second condition to it: 355 356.. code-block:: c++ 357 358 hasIncrement(unaryOperator( 359 hasOperatorName("++"), 360 hasUnaryOperand(declRefExpr()))) 361 362Furthermore, we can restrict our matcher to only match if the 363incremented variable is an integer: 364 365.. code-block:: c++ 366 367 hasIncrement(unaryOperator( 368 hasOperatorName("++"), 369 hasUnaryOperand(declRefExpr(to(varDecl(hasType(isInteger()))))))) 370 371And the last step will be to attach an identifier to this variable, so 372that we can retrieve it in the callback: 373 374.. code-block:: c++ 375 376 hasIncrement(unaryOperator( 377 hasOperatorName("++"), 378 hasUnaryOperand(declRefExpr(to( 379 varDecl(hasType(isInteger())).bind("incrementVariable")))))) 380 381We can add this code to the definition of ``LoopMatcher`` and make sure 382that our program, outfitted with the new matcher, only prints out loops 383that declare a single variable initialized to zero and have an increment 384step consisting of a unary increment of some variable. 385 386Now, we just need to add a matcher to check if the condition part of the 387``for`` loop compares a variable against the size of the array. There is 388only one problem - we don't know which array we're iterating over 389without looking at the body of the loop! We are again restricted to 390approximating the result we want with matchers, filling in the details 391in the callback. So we start with: 392 393.. code-block:: c++ 394 395 hasCondition(binaryOperator(hasOperatorName("<")) 396 397It makes sense to ensure that the left-hand side is a reference to a 398variable, and that the right-hand side has integer type. 399 400.. code-block:: c++ 401 402 hasCondition(binaryOperator( 403 hasOperatorName("<"), 404 hasLHS(declRefExpr(to(varDecl(hasType(isInteger()))))), 405 hasRHS(expr(hasType(isInteger()))))) 406 407Why? Because it doesn't work. Of the three loops provided in 408``test-files/simple.cpp``, zero of them have a matching condition. A 409quick look at the AST dump of the first for loop, produced by the 410previous iteration of loop-convert, shows us the answer: 411 412:: 413 414 (ForStmt 0x173b240 415 (DeclStmt 0x173afc8 416 0x173af50 "int i = 417 (IntegerLiteral 0x173afa8 'int' 0)") 418 <<>> 419 (BinaryOperator 0x173b060 '_Bool' '<' 420 (ImplicitCastExpr 0x173b030 'int' 421 (DeclRefExpr 0x173afe0 'int' lvalue Var 0x173af50 'i' 'int')) 422 (ImplicitCastExpr 0x173b048 'int' 423 (DeclRefExpr 0x173b008 'const int' lvalue Var 0x170fa80 'N' 'const int'))) 424 (UnaryOperator 0x173b0b0 'int' lvalue prefix '++' 425 (DeclRefExpr 0x173b088 'int' lvalue Var 0x173af50 'i' 'int')) 426 (CompoundStatement ... 427 428We already know that the declaration and increments both match, or this 429loop wouldn't have been dumped. The culprit lies in the implicit cast 430applied to the first operand (i.e. the LHS) of the less-than operator, 431an L-value to R-value conversion applied to the expression referencing 432``i``. Thankfully, the matcher library offers a solution to this problem 433in the form of ``ignoringParenImpCasts``, which instructs the matcher to 434ignore implicit casts and parentheses before continuing to match. 435Adjusting the condition operator will restore the desired match. 436 437.. code-block:: c++ 438 439 hasCondition(binaryOperator( 440 hasOperatorName("<"), 441 hasLHS(ignoringParenImpCasts(declRefExpr( 442 to(varDecl(hasType(isInteger())))))), 443 hasRHS(expr(hasType(isInteger()))))) 444 445After adding binds to the expressions we wished to capture and 446extracting the identifier strings into variables, we have array-step-2 447completed. 448 449Step 4: Retrieving Matched Nodes 450================================ 451 452So far, the matcher callback isn't very interesting: it just dumps the 453loop's AST. At some point, we will need to make changes to the input 454source code. Next, we'll work on using the nodes we bound in the 455previous step. 456 457The ``MatchFinder::run()`` callback takes a 458``MatchFinder::MatchResult&`` as its parameter. We're most interested in 459its ``Context`` and ``Nodes`` members. Clang uses the ``ASTContext`` 460class to represent contextual information about the AST, as the name 461implies, though the most functionally important detail is that several 462operations require an ``ASTContext*`` parameter. More immediately useful 463is the set of matched nodes, and how we retrieve them. 464 465Since we bind three variables (identified by ConditionVarName, 466InitVarName, and IncrementVarName), we can obtain the matched nodes by 467using the ``getNodeAs()`` member function. 468 469In ``LoopConvert.cpp`` add 470 471.. code-block:: c++ 472 473 #include "clang/AST/ASTContext.h" 474 475Change ``LoopMatcher`` to 476 477.. code-block:: c++ 478 479 StatementMatcher LoopMatcher = 480 forStmt(hasLoopInit(declStmt( 481 hasSingleDecl(varDecl(hasInitializer(integerLiteral(equals(0)))) 482 .bind("initVarName")))), 483 hasIncrement(unaryOperator( 484 hasOperatorName("++"), 485 hasUnaryOperand(declRefExpr( 486 to(varDecl(hasType(isInteger())).bind("incVarName")))))), 487 hasCondition(binaryOperator( 488 hasOperatorName("<"), 489 hasLHS(ignoringParenImpCasts(declRefExpr( 490 to(varDecl(hasType(isInteger())).bind("condVarName"))))), 491 hasRHS(expr(hasType(isInteger())))))).bind("forLoop"); 492 493And change ``LoopPrinter::run`` to 494 495.. code-block:: c++ 496 497 void LoopPrinter::run(const MatchFinder::MatchResult &Result) { 498 ASTContext *Context = Result.Context; 499 const ForStmt *FS = Result.Nodes.getStmtAs<ForStmt>("forLoop"); 500 // We do not want to convert header files! 501 if (!FS || !Context->getSourceManager().isFromMainFile(FS->getForLoc())) 502 return; 503 const VarDecl *IncVar = Result.Nodes.getNodeAs<VarDecl>("incVarName"); 504 const VarDecl *CondVar = Result.Nodes.getNodeAs<VarDecl>("condVarName"); 505 const VarDecl *InitVar = Result.Nodes.getNodeAs<VarDecl>("initVarName"); 506 507 if (!areSameVariable(IncVar, CondVar) || !areSameVariable(IncVar, InitVar)) 508 return; 509 llvm::outs() << "Potential array-based loop discovered.\n"; 510 } 511 512Clang associates a ``VarDecl`` with each variable to represent the variable's 513declaration. Since the "canonical" form of each declaration is unique by 514address, all we need to do is make sure neither ``ValueDecl`` (base class of 515``VarDecl``) is ``NULL`` and compare the canonical Decls. 516 517.. code-block:: c++ 518 519 static bool areSameVariable(const ValueDecl *First, const ValueDecl *Second) { 520 return First && Second && 521 First->getCanonicalDecl() == Second->getCanonicalDecl(); 522 } 523 524If execution reaches the end of ``LoopPrinter::run()``, we know that the 525loop shell that looks like 526 527.. code-block:: c++ 528 529 for (int i= 0; i < expr(); ++i) { ... } 530 531For now, we will just print a message explaining that we found a loop. 532The next section will deal with recursively traversing the AST to 533discover all changes needed. 534 535As a side note, it's not as trivial to test if two expressions are the same, 536though Clang has already done the hard work for us by providing a way to 537canonicalize expressions: 538 539.. code-block:: c++ 540 541 static bool areSameExpr(ASTContext *Context, const Expr *First, 542 const Expr *Second) { 543 if (!First || !Second) 544 return false; 545 llvm::FoldingSetNodeID FirstID, SecondID; 546 First->Profile(FirstID, *Context, true); 547 Second->Profile(SecondID, *Context, true); 548 return FirstID == SecondID; 549 } 550 551This code relies on the comparison between two 552``llvm::FoldingSetNodeIDs``. As the documentation for 553``Stmt::Profile()`` indicates, the ``Profile()`` member function builds 554a description of a node in the AST, based on its properties, along with 555those of its children. ``FoldingSetNodeID`` then serves as a hash we can 556use to compare expressions. We will need ``areSameExpr`` later. Before 557you run the new code on the additional loops added to 558test-files/simple.cpp, try to figure out which ones will be considered 559potentially convertible. 560