==============================================
Kaleidoscope: Adding JIT and Optimizer Support
==============================================

.. contents::
   :local:

Chapter 4 Introduction
======================

Welcome to Chapter 4 of the "`Implementing a language with
LLVM <index.html>`_" tutorial. Chapters 1-3 described the implementation
of a simple language and added support for generating LLVM IR. This
chapter describes two new techniques: adding optimizer support to your
language, and adding JIT compiler support. These additions will
demonstrate how to get nice, efficient code for the Kaleidoscope
language.

Trivial Constant Folding
========================

Our demonstration for Chapter 3 is elegant and easy to extend.
Unfortunately, it does not produce wonderful code. The IRBuilder,
however, does give us obvious optimizations when compiling simple code:

::

    ready> def test(x) 1+2+x;
    Read function definition:
    define double @test(double %x) {
    entry:
      %addtmp = fadd double 3.000000e+00, %x
      ret double %addtmp
    }

This code is not a literal transcription of the AST built by parsing the
input. That would be:

::

    ready> def test(x) 1+2+x;
    Read function definition:
    define double @test(double %x) {
    entry:
      %addtmp = fadd double 2.000000e+00, 1.000000e+00
      %addtmp1 = fadd double %addtmp, %x
      ret double %addtmp1
    }

Constant folding, as seen above, is a very common and very important
optimization: so much so that many language implementors implement
constant folding support in their AST representation.

With LLVM, you don't need this support in the AST. Since all calls to
build LLVM IR go through the LLVM IR builder, the builder itself checks
whether there is a constant folding opportunity when you call it. If so,
it just does the constant fold and returns the constant instead of
creating an instruction.
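To make this concrete, here is a small standalone sketch (not part of the
tutorial's toy.cpp; the names ``Ctx``, ``Builder``, ``L``, ``R``, and ``Sum``
are local to this example) showing the builder folding an add of two
constants:

.. code-block:: c++

    #include "llvm/ADT/APFloat.h"
    #include "llvm/IR/Constants.h"
    #include "llvm/IR/IRBuilder.h"
    #include "llvm/IR/LLVMContext.h"
    #include "llvm/Support/Casting.h"

    using namespace llvm;

    int main() {
      LLVMContext Ctx;
      IRBuilder<> Builder(Ctx);

      Value *L = ConstantFP::get(Ctx, APFloat(1.0));
      Value *R = ConstantFP::get(Ctx, APFloat(2.0));

      // Both operands are constants, so the builder folds the addition and
      // hands back the constant 3.0 rather than creating an fadd instruction.
      Value *Sum = Builder.CreateFAdd(L, R, "addtmp");

      // Sum is a ConstantFP, not an Instruction.
      return isa<ConstantFP>(Sum) ? 0 : 1;
    }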
Well, that was easy :). In practice, we recommend always using
``IRBuilder`` when generating code like this. It has no "syntactic
overhead" for its use (you don't have to uglify your compiler with
constant checks everywhere) and it can dramatically reduce the amount of
LLVM IR that is generated in some cases (particularly for languages with
a macro preprocessor or that use a lot of constants).

On the other hand, the ``IRBuilder`` is limited by the fact that it does
all of its analysis inline with the code as it is built. If you take a
slightly more complex example:

::

    ready> def test(x) (1+2+x)*(x+(1+2));
    ready> Read function definition:
    define double @test(double %x) {
    entry:
      %addtmp = fadd double 3.000000e+00, %x
      %addtmp1 = fadd double %x, 3.000000e+00
      %multmp = fmul double %addtmp, %addtmp1
      ret double %multmp
    }

In this case, the LHS and RHS of the multiplication are the same value.
We'd really like to see this generate "``tmp = x+3; result = tmp*tmp;``"
instead of computing "``x+3``" twice.

Unfortunately, no amount of local analysis will be able to detect and
correct this. This requires two transformations: reassociation of
expressions (to make the adds lexically identical) and Common
Subexpression Elimination (CSE) to delete the redundant add instruction.
Fortunately, LLVM provides a broad range of optimizations that you can
use, in the form of "passes".

LLVM Optimization Passes
========================

LLVM provides many optimization passes, which do many different sorts of
things and have different tradeoffs. Unlike other systems, LLVM doesn't
hold to the mistaken notion that one set of optimizations is right for
all languages and for all situations. LLVM allows a compiler implementor
to decide which optimizations to use, in which order, and in what
situation.

As a concrete example, LLVM supports both "whole module" passes, which
look across as large a body of code as they can (often a whole file, but
if run at link time, this can be a substantial portion of the whole
program), and "per-function" passes, which operate on a single function
at a time without looking at other functions. For more information on
passes and how they are run, see the `How to Write a
Pass <../WritingAnLLVMPass.html>`_ document and the
`List of LLVM Passes <../Passes.html>`_.

For Kaleidoscope, we are currently generating functions on the fly, one
at a time, as the user types them in. We aren't shooting for the
ultimate optimization experience in this setting, but we also want to
catch the easy and quick stuff where possible. As such, we will choose
to run a few per-function optimizations as the user types the function
in. If we wanted to make a "static Kaleidoscope compiler", we would use
exactly the code we have now, except that we would defer running the
optimizer until the entire file has been parsed.

In order to get per-function optimizations going, we need to set up a
`FunctionPassManager <../WritingAnLLVMPass.html#what-passmanager-does>`_ to hold
and organize the LLVM optimizations that we want to run. Once we have
that, we can add a set of optimizations to run. We'll need a new
FunctionPassManager for each module that we want to optimize, so we'll
write a function to create and initialize both the module and pass
manager for us:

.. code-block:: c++

    void InitializeModuleAndPassManager(void) {
      // Open a new module.
      TheModule = llvm::make_unique<Module>("my cool jit", TheContext);
      TheModule->setDataLayout(TheJIT->getTargetMachine().createDataLayout());

      // Create a new pass manager attached to it.
      TheFPM = llvm::make_unique<legacy::FunctionPassManager>(TheModule.get());

      // Provide basic AliasAnalysis support for GVN.
      TheFPM->add(createBasicAliasAnalysisPass());
      // Do simple "peephole" optimizations and bit-twiddling optzns.
      TheFPM->add(createInstructionCombiningPass());
      // Reassociate expressions.
      TheFPM->add(createReassociatePass());
      // Eliminate Common SubExpressions.
      TheFPM->add(createGVNPass());
      // Simplify the control flow graph (deleting unreachable blocks, etc).
      TheFPM->add(createCFGSimplificationPass());

      TheFPM->doInitialization();
    }

This code initializes the global module ``TheModule`` and the function pass
manager ``TheFPM``, which is attached to ``TheModule``. Once the pass manager
is set up, we use a series of "add" calls to add a bunch of LLVM passes.

In this case, we choose to add five passes: one analysis pass (alias analysis),
and four optimization passes. The passes we choose here are a pretty standard
set of "cleanup" optimizations that are useful for a wide variety of code. I
won't delve into what they do, but believe me, they are a good starting
place :).
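One ordering detail worth calling out: this function reads the target data
layout from ``TheJIT``, a global we will set up in the next section, so it has
to run after the JIT has been constructed and before any code is generated.
The full code listing at the end of this chapter does this once at startup in
``main``; roughly:

.. code-block:: c++

      TheJIT = llvm::make_unique<KaleidoscopeJIT>();

      // Set up the initial module and pass manager before any IR is generated.
      InitializeModuleAndPassManager();

      // Run the main "interpreter loop" now.
      MainLoop();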
Once the PassManager is set up, we need to make use of it. We do this by
running it after our newly created function is constructed (in
``FunctionAST::codegen()``), but before it is returned to the client:

.. code-block:: c++

      if (Value *RetVal = Body->codegen()) {
        // Finish off the function.
        Builder.CreateRet(RetVal);

        // Validate the generated code, checking for consistency.
        verifyFunction(*TheFunction);

        // Optimize the function.
        TheFPM->run(*TheFunction);

        return TheFunction;
      }

As you can see, this is pretty straightforward. The
``FunctionPassManager`` optimizes and updates the LLVM Function\* in
place, improving (hopefully) its body. With this in place, we can try
our test above again:

::

    ready> def test(x) (1+2+x)*(x+(1+2));
    ready> Read function definition:
    define double @test(double %x) {
    entry:
      %addtmp = fadd double %x, 3.000000e+00
      %multmp = fmul double %addtmp, %addtmp
      ret double %multmp
    }

As expected, we now get our nicely optimized code, saving a floating
point add instruction from every execution of this function.

LLVM provides a wide variety of optimizations that can be used in
certain circumstances. Some `documentation about the various
passes <../Passes.html>`_ is available, but it isn't very complete.
Another good source of ideas is to look at the passes that ``Clang``
runs to get started. The "``opt``" tool allows you to experiment with
passes from the command line, so you can see if they do anything.

Now that we have reasonable code coming out of our front-end, let's talk
about executing it!

Adding a JIT Compiler
=====================

Code that is available in LLVM IR can have a wide variety of tools
applied to it. For example, you can run optimizations on it (as we did
above), you can dump it out in textual or binary forms, you can compile
the code to an assembly file (.s) for some target, or you can JIT
compile it. The nice thing about the LLVM IR representation is that it
is the "common currency" between many different parts of the compiler.

In this section, we'll add JIT compiler support to our interpreter. The
basic idea that we want for Kaleidoscope is to have the user enter
function bodies as they do now, but immediately evaluate the top-level
expressions they type in. For example, if they type in "1 + 2;", we
should evaluate and print out 3. If they define a function, they should
be able to call it from the command line.

In order to do this, we first declare and initialize the JIT. This is
done by adding a global variable ``TheJIT``, and initializing it in
``main``:

.. code-block:: c++

    static std::unique_ptr<KaleidoscopeJIT> TheJIT;
    ...
    int main() {
      ..
      TheJIT = llvm::make_unique<KaleidoscopeJIT>();

      // Run the main "interpreter loop" now.
      MainLoop();

      return 0;
    }

The KaleidoscopeJIT class is a simple JIT built specifically for these
tutorials. In later chapters we will look at how it works and extend it
with new features, but for now we will take it as given. Its API is very
simple: ``addModule`` adds an LLVM IR module to the JIT, making its
functions available for execution; ``removeModule`` removes a module,
freeing any memory associated with the code in that module; and
``findSymbol`` allows us to look up pointers to the compiled code.

We can take this simple API and change our code that parses top-level
expressions to look like this:

.. code-block:: c++

    static void HandleTopLevelExpression() {
      // Evaluate a top-level expression into an anonymous function.
      if (auto FnAST = ParseTopLevelExpr()) {
        if (FnAST->codegen()) {

          // JIT the module containing the anonymous expression, keeping a
          // handle so we can free it later.
          auto H = TheJIT->addModule(std::move(TheModule));
          InitializeModuleAndPassManager();

          // Search the JIT for the __anon_expr symbol.
          auto ExprSymbol = TheJIT->findSymbol("__anon_expr");
          assert(ExprSymbol && "Function not found");

          // Get the symbol's address and cast it to the right type (takes no
          // arguments, returns a double) so we can call it as a native function.
          double (*FP)() = (double (*)())(intptr_t)ExprSymbol.getAddress();
          fprintf(stderr, "Evaluated to %f\n", FP());

          // Delete the anonymous expression module from the JIT.
          TheJIT->removeModule(H);
        }
      } else {
        // Skip token for error recovery.
        getNextToken();
      }
    }

If parsing and codegen succeed, the next step is to add the module containing
the top-level expression to the JIT. We do this by calling addModule, which
triggers code generation for all the functions in the module, and returns a
handle that can be used to remove the module from the JIT later. Once the
module has been added to the JIT it can no longer be modified, so we also open
a new module to hold subsequent code by calling
``InitializeModuleAndPassManager()``.

Once we've added the module to the JIT we need to get a pointer to the final
generated code. We do this by calling the JIT's findSymbol method, and passing
the name of the top-level expression function: ``__anon_expr``. Since we just
added this function, we assert that findSymbol returned a result.

Next, we get the in-memory address of the ``__anon_expr`` function by calling
``getAddress()`` on the symbol. Recall that we compile top-level expressions
into a self-contained LLVM function that takes no arguments and returns the
computed double. Because the LLVM JIT compiler matches the native platform ABI,
this means that you can just cast the result pointer to a function pointer of
that type and call it directly. This means there is no difference between JIT
compiled code and native machine code that is statically linked into your
application.
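If the C-style cast in ``HandleTopLevelExpression`` looks cryptic, it is
equivalent to the following spelled-out form (purely illustrative; the
``AnonExprFn`` alias is not part of the tutorial code):

.. code-block:: c++

      // The anonymous expression compiles to a function that takes no
      // arguments and returns a double, so that is the pointer type we need.
      using AnonExprFn = double (*)();
      AnonExprFn FP =
          reinterpret_cast<AnonExprFn>((intptr_t)ExprSymbol.getAddress());
      fprintf(stderr, "Evaluated to %f\n", FP());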
Finally, since we don't support re-evaluation of top-level expressions, we
remove the module from the JIT when we're done to free the associated memory.
Recall, however, that the module we created a few lines earlier (via
``InitializeModuleAndPassManager``) is still open and waiting for new code to
be added.

With just these two changes, let's see how Kaleidoscope works now!

::

    ready> 4+5;
    Read top-level expression:
    define double @0() {
    entry:
      ret double 9.000000e+00
    }

    Evaluated to 9.000000

Well this looks like it is basically working. The dump of the function
shows the "no argument function that always returns double" that we
synthesize for each top-level expression that is typed in. This
demonstrates very basic functionality, but can we do more?

::

    ready> def testfunc(x y) x + y*2;
    Read function definition:
    define double @testfunc(double %x, double %y) {
    entry:
      %multmp = fmul double %y, 2.000000e+00
      %addtmp = fadd double %multmp, %x
      ret double %addtmp
    }

    ready> testfunc(4, 10);
    Read top-level expression:
    define double @1() {
    entry:
      %calltmp = call double @testfunc(double 4.000000e+00, double 1.000000e+01)
      ret double %calltmp
    }

    Evaluated to 24.000000

    ready> testfunc(5, 10);
    ready> LLVM ERROR: Program used external function 'testfunc' which could not be resolved!

Function definitions and calls also work, but something went very wrong on
that last line. The call looks valid, so what happened? As you may have
guessed from the API, a Module is the unit of allocation for the JIT, and
testfunc was part of the same module that contained the anonymous expression.
When we removed that module from the JIT to free the memory for the anonymous
expression, we deleted the definition of ``testfunc`` along with it. Then,
when we tried to call testfunc a second time, the JIT could no longer find it.

The easiest way to fix this is to put the anonymous expression in a separate
module from the rest of the function definitions. The JIT will happily resolve
function calls across module boundaries, as long as each of the functions
called has a prototype, and is added to the JIT before it is called. By
putting the anonymous expression in a different module we can delete it
without affecting the rest of the functions.

In fact, we're going to go a step further and put every function in its own
module. Doing so allows us to exploit a useful property of the KaleidoscopeJIT
that will make our environment more REPL-like: functions can be added to the
JIT more than once (unlike a module, where every function must have a unique
definition). When you look up a symbol in KaleidoscopeJIT it will always
return the most recent definition:

::

    ready> def foo(x) x + 1;
    Read function definition:
    define double @foo(double %x) {
    entry:
      %addtmp = fadd double %x, 1.000000e+00
      ret double %addtmp
    }

    ready> foo(2);
    Evaluated to 3.000000

    ready> def foo(x) x + 2;
    define double @foo(double %x) {
    entry:
      %addtmp = fadd double %x, 2.000000e+00
      ret double %addtmp
    }

    ready> foo(2);
    Evaluated to 4.000000
To allow each function to live in its own module we'll need a way to
re-generate previous function declarations into each new module we open:

.. code-block:: c++

    static std::unique_ptr<KaleidoscopeJIT> TheJIT;

    ...

    Function *getFunction(std::string Name) {
      // First, see if the function has already been added to the current
      // module.
      if (auto *F = TheModule->getFunction(Name))
        return F;

      // If not, check whether we can codegen the declaration from some
      // existing prototype.
      auto FI = FunctionProtos.find(Name);
      if (FI != FunctionProtos.end())
        return FI->second->codegen();

      // If no existing prototype exists, return null.
      return nullptr;
    }

    ...

    Value *CallExprAST::codegen() {
      // Look up the name in the global module table.
      Function *CalleeF = getFunction(Callee);

    ...

    Function *FunctionAST::codegen() {
      // Transfer ownership of the prototype to the FunctionProtos map, but
      // keep a reference to it for use below.
      auto &P = *Proto;
      FunctionProtos[Proto->getName()] = std::move(Proto);
      Function *TheFunction = getFunction(P.getName());
      if (!TheFunction)
        return nullptr;

To enable this, we'll start by adding a new global, ``FunctionProtos``, that
holds the most recent prototype for each function. We'll also add a
convenience method, ``getFunction()``, to replace calls to
``TheModule->getFunction()``. Our convenience method searches ``TheModule``
for an existing function declaration, falling back to generating a new
declaration from FunctionProtos if it doesn't find one. In
``CallExprAST::codegen()`` we just need to replace the call to
``TheModule->getFunction()``. In ``FunctionAST::codegen()`` we need to update
the FunctionProtos map first, then call ``getFunction()``. With this done, we
can always obtain a function declaration in the current module for any
previously declared function.
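The new ``FunctionProtos`` global is just a map from a function's name to its
most recent prototype. In the full code listing it is declared alongside the
other code generation globals, roughly:

.. code-block:: c++

    // Owns the most recent PrototypeAST seen for each function name, so that
    // getFunction() can re-emit a declaration into whatever module is current.
    static std::map<std::string, std::unique_ptr<PrototypeAST>> FunctionProtos;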
We also need to update HandleDefinition and HandleExtern:

.. code-block:: c++

    static void HandleDefinition() {
      if (auto FnAST = ParseDefinition()) {
        if (auto *FnIR = FnAST->codegen()) {
          fprintf(stderr, "Read function definition:");
          FnIR->dump();
          TheJIT->addModule(std::move(TheModule));
          InitializeModuleAndPassManager();
        }
      } else {
        // Skip token for error recovery.
        getNextToken();
      }
    }

    static void HandleExtern() {
      if (auto ProtoAST = ParseExtern()) {
        if (auto *FnIR = ProtoAST->codegen()) {
          fprintf(stderr, "Read extern: ");
          FnIR->dump();
          FunctionProtos[ProtoAST->getName()] = std::move(ProtoAST);
        }
      } else {
        // Skip token for error recovery.
        getNextToken();
      }
    }

In HandleDefinition, we add two lines to transfer the newly defined function
to the JIT and open a new module. In HandleExtern, we just need to add one
line to add the prototype to FunctionProtos.

With these changes made, let's try our REPL again (I removed the dump of the
anonymous functions this time; you should get the idea by now :):

::

    ready> def foo(x) x + 1;
    ready> foo(2);
    Evaluated to 3.000000

    ready> def foo(x) x + 2;
    ready> foo(2);
    Evaluated to 4.000000

It works!

Even with this simple code, we get some surprisingly powerful capabilities -
check this out:

::

    ready> extern sin(x);
    Read extern:
    declare double @sin(double)

    ready> extern cos(x);
    Read extern:
    declare double @cos(double)

    ready> sin(1.0);
    Read top-level expression:
    define double @2() {
    entry:
      ret double 0x3FEAED548F090CEE
    }

    Evaluated to 0.841471

    ready> def foo(x) sin(x)*sin(x) + cos(x)*cos(x);
    Read function definition:
    define double @foo(double %x) {
    entry:
      %calltmp = call double @sin(double %x)
      %multmp = fmul double %calltmp, %calltmp
      %calltmp2 = call double @cos(double %x)
      %multmp4 = fmul double %calltmp2, %calltmp2
      %addtmp = fadd double %multmp, %multmp4
      ret double %addtmp
    }

    ready> foo(4.0);
    Read top-level expression:
    define double @3() {
    entry:
      %calltmp = call double @foo(double 4.000000e+00)
      ret double %calltmp
    }

    Evaluated to 1.000000

Whoa, how does the JIT know about sin and cos? The answer is surprisingly
simple: the KaleidoscopeJIT has a straightforward symbol resolution rule that
it uses to find symbols that aren't available in any given module: first it
searches all the modules that have already been added to the JIT, from the
most recent to the oldest, to find the newest definition. If no definition is
found inside the JIT, it falls back to calling "``dlsym("sin")``" on the
Kaleidoscope process itself. Since "``sin``" is defined within the JIT's
address space, it simply patches up calls in the module to call the libm
version of ``sin`` directly.

In the future we'll see how tweaking this symbol resolution rule can be used
to enable all sorts of useful features, from security (restricting the set of
symbols available to JIT'd code), to dynamic code generation based on symbol
names, and even lazy compilation.

One immediate benefit of the symbol resolution rule is that we can now extend
the language by writing arbitrary C++ code to implement operations. For
example, if we add:

.. code-block:: c++

    /// putchard - putchar that takes a double and returns 0.
    extern "C" double putchard(double X) {
      fputc((char)X, stderr);
      return 0;
    }

we can produce simple output to the console by using things like
"``extern putchard(x); putchard(120);``", which prints a lowercase 'x'
on the console (120 is the ASCII code for 'x'). Similar code could be
used to implement file I/O, console input, and many other capabilities
in Kaleidoscope.
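A second helper in the same spirit, say a printf-style ``printd`` (the name
and signature here are just for illustration), makes it easy to print values
from Kaleidoscope code:

.. code-block:: c++

    /// printd - printf that takes a double, prints it as "%f\n", returning 0.
    extern "C" double printd(double X) {
      fprintf(stderr, "%f\n", X);
      return 0;
    }

With this in place, "``extern printd(x); printd(123456.789);``" would print
the number followed by a newline.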
This completes the JIT and optimizer chapter of the Kaleidoscope
tutorial. At this point, we can compile a non-Turing-complete
programming language, and optimize and JIT compile it in a user-driven
way. Next up we'll look into `extending the language with control flow
constructs <LangImpl05.html>`_, tackling some interesting LLVM IR issues
along the way.

Full Code Listing
=================

Here is the complete code listing for our running example, enhanced with
the LLVM JIT and optimizer. To build this example, use:

.. code-block:: bash

    # Compile
    clang++ -g toy.cpp `llvm-config --cxxflags --ldflags --system-libs --libs core mcjit native` -O3 -o toy
    # Run
    ./toy

If you are compiling this on Linux, make sure to add the "-rdynamic"
option as well. This makes sure that the external functions are resolved
properly at runtime.

Here is the code:

.. literalinclude:: ../../examples/Kaleidoscope/Chapter4/toy.cpp
   :language: c++

`Next: Extending the language: control flow <LangImpl05.html>`_