==============================================
Kaleidoscope: Adding JIT and Optimizer Support
==============================================

.. contents::
   :local:

Chapter 4 Introduction
======================

Welcome to Chapter 4 of the "`Implementing a language with
LLVM <index.html>`_" tutorial. Chapters 1-3 described the implementation
of a simple language and added support for generating LLVM IR. This
chapter describes two new techniques: adding optimizer support to your
language, and adding JIT compiler support. These additions will
demonstrate how to get nice, efficient code for the Kaleidoscope
language.

Trivial Constant Folding
========================

Our demonstration for Chapter 3 is elegant and easy to extend.
Unfortunately, it does not produce wonderful code. The IRBuilder,
however, does give us obvious optimizations when compiling simple code:

::

    ready> def test(x) 1+2+x;
    Read function definition:
    define double @test(double %x) {
    entry:
      %addtmp = fadd double 3.000000e+00, %x
      ret double %addtmp
    }

This code is not a literal transcription of the AST built by parsing the
input. That would be:

::

    ready> def test(x) 1+2+x;
    Read function definition:
    define double @test(double %x) {
    entry:
      %addtmp = fadd double 2.000000e+00, 1.000000e+00
      %addtmp1 = fadd double %addtmp, %x
      ret double %addtmp1
    }

Constant folding, as seen above, is a very common and very important
optimization: so much so that many language implementors implement
constant folding support directly in their AST representation.

With LLVM, you don't need this support in the AST. Since all calls to
build LLVM IR go through the LLVM IR builder, the builder itself checks
for a constant folding opportunity every time you call it. If one
exists, it simply does the constant fold and returns the constant
instead of creating an instruction.

Well, that was easy :). In practice, we recommend always using
``IRBuilder`` when generating code like this. It has no "syntactic
overhead" for its use (you don't have to uglify your compiler with
constant checks everywhere) and it can dramatically reduce the amount of
LLVM IR that is generated in some cases (particularly for languages with
a macro preprocessor or that use a lot of constants).
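To see this folding in isolation, here is a minimal standalone sketch
(the function name and the ``assert`` are ours, not part of the tutorial
code): handing two constants to the builder yields a ``Constant`` rather
than an ``fadd`` instruction.

.. code-block:: c++

    #include "llvm/ADT/APFloat.h"
    #include "llvm/IR/Constants.h"
    #include "llvm/IR/IRBuilder.h"
    #include "llvm/IR/LLVMContext.h"
    #include <cassert>

    // Both operands are constants, so CreateFAdd folds the addition on
    // the spot and hands back the constant 3.0 instead of emitting an
    // fadd instruction into a basic block.
    static void demoConstantFolding(llvm::LLVMContext &Ctx) {
      llvm::IRBuilder<> Builder(Ctx);
      llvm::Value *L = llvm::ConstantFP::get(Ctx, llvm::APFloat(1.0));
      llvm::Value *R = llvm::ConstantFP::get(Ctx, llvm::APFloat(2.0));
      llvm::Value *Sum = Builder.CreateFAdd(L, R, "addtmp");
      assert(llvm::isa<llvm::Constant>(Sum) && "builder folded to a constant");
    }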
On the other hand, the ``IRBuilder`` is limited by the fact that it does
all of its analysis inline with the code as it is built. If you take a
slightly more complex example:

::

    ready> def test(x) (1+2+x)*(x+(1+2));
    ready> Read function definition:
    define double @test(double %x) {
    entry:
      %addtmp = fadd double 3.000000e+00, %x
      %addtmp1 = fadd double %x, 3.000000e+00
      %multmp = fmul double %addtmp, %addtmp1
      ret double %multmp
    }

In this case, the LHS and RHS of the multiplication are the same value.
We'd really like to see this generate "``tmp = x+3; result = tmp*tmp;``"
instead of computing "``x+3``" twice.

Unfortunately, no amount of local analysis will be able to detect and
correct this. This requires two transformations: reassociation of
expressions (to make the adds lexically identical) and Common
Subexpression Elimination (CSE) to delete the redundant add instruction.
Fortunately, LLVM provides a broad range of optimizations that you can
use, in the form of "passes".

LLVM Optimization Passes
========================

.. warning::

   Due to the transition to the new PassManager infrastructure, this
   tutorial is based on ``llvm::legacy::FunctionPassManager``, which can
   be found in
   `LegacyPassManager.h <https://llvm.org/doxygen/classllvm_1_1legacy_1_1FunctionPassManager.html>`_.
   For the purposes of this tutorial, it should be used until the pass
   manager transition is complete.

LLVM provides many optimization passes, which do many different sorts of
things and have different tradeoffs. Unlike other systems, LLVM doesn't
hold to the mistaken notion that one set of optimizations is right for
all languages and for all situations. LLVM allows a compiler implementor
to make complete decisions about what optimizations to use, in which
order, and in what situation.

As a concrete example, LLVM supports "whole module" passes, which look
across as large a body of code as they can (often a whole file, but if
run at link time, this can be a substantial portion of the whole
program). It also supports and includes "per-function" passes which just
operate on a single function at a time, without looking at other
functions. For more information on passes and how they are run, see the
`How to Write a Pass <../../WritingAnLLVMPass.html>`_ document and the
`List of LLVM Passes <../../Passes.html>`_.

For Kaleidoscope, we are currently generating functions on the fly, one
at a time, as the user types them in. We aren't shooting for the
ultimate optimization experience in this setting, but we also want to
catch the easy and quick stuff where possible. As such, we will choose
to run a few per-function optimizations as the user types the function
in. If we wanted to make a "static Kaleidoscope compiler", we would use
exactly the code we have now, except that we would defer running the
optimizer until the entire file has been parsed.

In order to get per-function optimizations going, we need to set up a
`FunctionPassManager <../../WritingAnLLVMPass.html#what-passmanager-does>`_ to hold
and organize the LLVM optimizations that we want to run. Once we have
that, we can add a set of optimizations to run. We'll need a new
FunctionPassManager for each module that we want to optimize, so we'll
write a function to create and initialize both the module and pass manager
for us:

.. code-block:: c++

    void InitializeModuleAndPassManager(void) {
      // Open a new module.
      TheModule = std::make_unique<Module>("my cool jit", TheContext);

      // Create a new pass manager attached to it.
      TheFPM = std::make_unique<legacy::FunctionPassManager>(TheModule.get());

      // Do simple "peephole" optimizations and bit-twiddling optzns.
      TheFPM->add(createInstructionCombiningPass());
      // Reassociate expressions.
      TheFPM->add(createReassociatePass());
      // Eliminate Common SubExpressions.
      TheFPM->add(createGVNPass());
      // Simplify the control flow graph (deleting unreachable blocks, etc).
      TheFPM->add(createCFGSimplificationPass());

      TheFPM->doInitialization();
    }

This code initializes the global module ``TheModule``, and the function pass
manager ``TheFPM``, which is attached to ``TheModule``. Once the pass manager is
set up, we use a series of "add" calls to add a bunch of LLVM passes.

In this case, we choose to add four optimization passes.
The passes we choose here are a pretty standard set
of "cleanup" optimizations that are useful for a wide variety of code. I won't
delve into what they do but, believe me, they are a good starting place :).
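Since the warning above mentions the PassManager transition, here is a
rough sketch of what the same four-pass pipeline looks like under the
new PassManager, for reference only. The class names below are those of
recent LLVM releases and may differ in older ones (``GVNPass``, for
example, was previously spelled ``GVN``); the function name is ours.

.. code-block:: c++

    #include "llvm/Analysis/CGSCCPassManager.h"
    #include "llvm/Analysis/LoopAnalysisManager.h"
    #include "llvm/IR/PassManager.h"
    #include "llvm/Passes/PassBuilder.h"
    #include "llvm/Transforms/InstCombine/InstCombine.h"
    #include "llvm/Transforms/Scalar/GVN.h"
    #include "llvm/Transforms/Scalar/Reassociate.h"
    #include "llvm/Transforms/Scalar/SimplifyCFG.h"

    // Sketch: the same four "cleanup" passes under the new PassManager.
    void runPipeline(llvm::Function &F) {
      llvm::FunctionPassManager FPM;
      FPM.addPass(llvm::InstCombinePass());
      FPM.addPass(llvm::ReassociatePass());
      FPM.addPass(llvm::GVNPass());
      FPM.addPass(llvm::SimplifyCFGPass());

      // The new PassManager requires the analysis managers to be
      // registered and cross-wired before any pass can run.
      llvm::LoopAnalysisManager LAM;
      llvm::FunctionAnalysisManager FAM;
      llvm::CGSCCAnalysisManager CGAM;
      llvm::ModuleAnalysisManager MAM;
      llvm::PassBuilder PB;
      PB.registerModuleAnalyses(MAM);
      PB.registerCGSCCAnalyses(CGAM);
      PB.registerFunctionAnalyses(FAM);
      PB.registerLoopAnalyses(LAM);
      PB.crossRegisterProxies(LAM, FAM, CGAM, MAM);

      FPM.run(F, FAM);
    }

For the rest of this chapter we stick with the legacy
``FunctionPassManager`` set up above.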
Once the PassManager is set up, we need to make use of it. We do this by
running it after our newly created function is constructed (in
``FunctionAST::codegen()``), but before it is returned to the client:

.. code-block:: c++

    if (Value *RetVal = Body->codegen()) {
      // Finish off the function.
      Builder.CreateRet(RetVal);

      // Validate the generated code, checking for consistency.
      verifyFunction(*TheFunction);

      // Optimize the function.
      TheFPM->run(*TheFunction);

      return TheFunction;
    }

As you can see, this is pretty straightforward. The
``FunctionPassManager`` optimizes and updates the LLVM Function\* in
place, improving (hopefully) its body. With this in place, we can try
our test above again:

::

    ready> def test(x) (1+2+x)*(x+(1+2));
    ready> Read function definition:
    define double @test(double %x) {
    entry:
      %addtmp = fadd double %x, 3.000000e+00
      %multmp = fmul double %addtmp, %addtmp
      ret double %multmp
    }

As expected, we now get our nicely optimized code, saving a floating
point add instruction from every execution of this function.

LLVM provides a wide variety of optimizations that can be used in
certain circumstances. Some `documentation about the various
passes <../../Passes.html>`_ is available, but it isn't very complete.
Another good source of ideas is to look at the passes that ``Clang``
runs to get started. The "``opt``" tool allows you to experiment with
passes from the command line, so you can see if they do anything.
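For example, if you save the unoptimized IR for ``test`` to a file (the
name ``test.ll`` here is hypothetical), you can run the same four passes
by hand. Newer LLVM releases use the ``-passes=`` syntax shown below;
older releases spelled each pass as an individual flag
(``-instcombine -reassociate -gvn -simplifycfg``).

.. code-block:: bash

    # Run the same pipeline from the command line and print the
    # optimized IR (-S keeps the output textual).
    opt -S -passes='instcombine,reassociate,gvn,simplifycfg' test.ll -o test-opt.ll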
Now that we have reasonable code coming out of our front-end, let's talk
about executing it!

Adding a JIT Compiler
=====================

Code that is available in LLVM IR can have a wide variety of tools
applied to it. For example, you can run optimizations on it (as we did
above), you can dump it out in textual or binary forms, you can compile
the code to an assembly file (.s) for some target, or you can JIT
compile it. The nice thing about the LLVM IR representation is that it
is the "common currency" between many different parts of the compiler.

In this section, we'll add JIT compiler support to our interpreter. The
basic idea that we want for Kaleidoscope is to have the user enter
function bodies as they do now, but immediately evaluate the top-level
expressions they type in. For example, if they type in "1 + 2;", we
should evaluate and print out 3. If they define a function, they should
be able to call it from the command line.

In order to do this, we first prepare the environment to create code for
the current native target, and declare and initialize the JIT. This is
done by calling some ``InitializeNativeTarget*`` functions, adding a
global variable ``TheJIT``, and initializing it in ``main``:

.. code-block:: c++

    static std::unique_ptr<KaleidoscopeJIT> TheJIT;
    ...
    int main() {
      InitializeNativeTarget();
      InitializeNativeTargetAsmPrinter();
      InitializeNativeTargetAsmParser();

      // Install standard binary operators.
      // 1 is lowest precedence.
      BinopPrecedence['<'] = 10;
      BinopPrecedence['+'] = 20;
      BinopPrecedence['-'] = 20;
      BinopPrecedence['*'] = 40; // highest.

      // Prime the first token.
      fprintf(stderr, "ready> ");
      getNextToken();

      TheJIT = std::make_unique<KaleidoscopeJIT>();

      // Run the main "interpreter loop" now.
      MainLoop();

      return 0;
    }

We also need to set up the data layout for the JIT:

.. code-block:: c++

    void InitializeModuleAndPassManager(void) {
      // Open a new module.
      TheModule = std::make_unique<Module>("my cool jit", TheContext);
      TheModule->setDataLayout(TheJIT->getTargetMachine().createDataLayout());

      // Create a new pass manager attached to it.
      TheFPM = std::make_unique<legacy::FunctionPassManager>(TheModule.get());
      ...

The KaleidoscopeJIT class is a simple JIT built specifically for these
tutorials, available inside the LLVM source code
at llvm-src/examples/Kaleidoscope/include/KaleidoscopeJIT.h.
In later chapters we will look at how it works and extend it with
new features, but for now we will take it as given. Its API is very simple:
``addModule`` adds an LLVM IR module to the JIT, making its functions
available for execution; ``removeModule`` removes a module, freeing any
memory associated with the code in that module; and ``findSymbol`` allows us
to look up pointers to the compiled code.

We can take this simple API and change our code that parses top-level expressions to
look like this:

.. code-block:: c++

    static void HandleTopLevelExpression() {
      // Evaluate a top-level expression into an anonymous function.
      if (auto FnAST = ParseTopLevelExpr()) {
        if (FnAST->codegen()) {

          // JIT the module containing the anonymous expression, keeping a handle so
          // we can free it later.
          auto H = TheJIT->addModule(std::move(TheModule));
          InitializeModuleAndPassManager();

          // Search the JIT for the __anon_expr symbol.
          auto ExprSymbol = TheJIT->findSymbol("__anon_expr");
          assert(ExprSymbol && "Function not found");

          // Get the symbol's address and cast it to the right type (takes no
          // arguments, returns a double) so we can call it as a native function.
          double (*FP)() = (double (*)())(intptr_t)ExprSymbol.getAddress();
          fprintf(stderr, "Evaluated to %f\n", FP());

          // Delete the anonymous expression module from the JIT.
          TheJIT->removeModule(H);
        }
      } else {
        // Skip token for error recovery.
        getNextToken();
      }
    }

If parsing and codegen succeed, the next step is to add the module containing
the top-level expression to the JIT. We do this by calling addModule, which
triggers code generation for all the functions in the module, and returns a
handle that can be used to remove the module from the JIT later. Once the module
has been added to the JIT it can no longer be modified, so we also open a new
module to hold subsequent code by calling ``InitializeModuleAndPassManager()``.

Once we've added the module to the JIT we need to get a pointer to the final
generated code. We do this by calling the JIT's findSymbol method, and passing
the name of the top-level expression function: ``__anon_expr``. Since we just
added this function, we assert that findSymbol returned a result.

Next, we get the in-memory address of the ``__anon_expr`` function by calling
``getAddress()`` on the symbol. Recall that we compile top-level expressions
into a self-contained LLVM function that takes no arguments and returns the
computed double. Because the LLVM JIT compiler matches the native platform ABI,
you can just cast the result pointer to a function pointer of that type and
call it directly. As a consequence, there is no difference between JIT-compiled
code and native machine code that is statically linked into your application.
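The same lookup-and-cast pattern works for any function the JIT knows
about, not just ``__anon_expr``. As a sketch, assuming the user had
defined a one-argument Kaleidoscope function named ``foo`` (a
hypothetical name), calling it from C++ would look like this:

.. code-block:: c++

    // Look up "foo" in the JIT and cast its address to the matching
    // native signature; every Kaleidoscope value is a double.
    auto FooSymbol = TheJIT->findSymbol("foo");
    assert(FooSymbol && "Function not found");
    double (*FooFP)(double) = (double (*)(double))(intptr_t)FooSymbol.getAddress();
    fprintf(stderr, "foo(4.0) = %f\n", FooFP(4.0));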
Finally, since we don't support re-evaluation of top-level expressions, we
remove the module from the JIT when we're done to free the associated memory.
Recall, however, that the module we created a few lines earlier (via
``InitializeModuleAndPassManager``) is still open and waiting for new code to be
added.

With just these two changes, let's see how Kaleidoscope works now!

::

    ready> 4+5;
    Read top-level expression:
    define double @0() {
    entry:
      ret double 9.000000e+00
    }

    Evaluated to 9.000000

Well this looks like it is basically working. The dump of the function
shows the "no argument function that always returns double" that we
synthesize for each top-level expression that is typed in. This
demonstrates very basic functionality, but can we do more?

::

    ready> def testfunc(x y) x + y*2;
    Read function definition:
    define double @testfunc(double %x, double %y) {
    entry:
      %multmp = fmul double %y, 2.000000e+00
      %addtmp = fadd double %multmp, %x
      ret double %addtmp
    }

    ready> testfunc(4, 10);
    Read top-level expression:
    define double @1() {
    entry:
      %calltmp = call double @testfunc(double 4.000000e+00, double 1.000000e+01)
      ret double %calltmp
    }

    Evaluated to 24.000000

    ready> testfunc(5, 10);
    ready> LLVM ERROR: Program used external function 'testfunc' which could not be resolved!

Function definitions and calls also work, but something went very wrong on that
last line. The call looks valid, so what happened? As you may have guessed from
the API, a Module is the unit of allocation for the JIT, and testfunc was part
of the same module that contained the anonymous expression. When we removed that
module from the JIT to free the memory for the anonymous expression, we deleted
the definition of ``testfunc`` along with it. Then, when we tried to call
testfunc a second time, the JIT could no longer find it.

The easiest way to fix this is to put the anonymous expression in a separate
module from the rest of the function definitions. The JIT will happily resolve
function calls across module boundaries, as long as each of the functions called
has a prototype, and is added to the JIT before it is called. By putting the
anonymous expression in a different module we can delete it without affecting
the rest of the functions.

In fact, we're going to go a step further and put every function in its own
module. Doing so allows us to exploit a useful property of the KaleidoscopeJIT
that will make our environment more REPL-like: functions can be added to the
JIT more than once (unlike a module, where every function must have a unique
definition). When you look up a symbol in KaleidoscopeJIT, it will always
return the most recent definition:
::

    ready> def foo(x) x + 1;
    Read function definition:
    define double @foo(double %x) {
    entry:
      %addtmp = fadd double %x, 1.000000e+00
      ret double %addtmp
    }

    ready> foo(2);
    Evaluated to 3.000000

    ready> def foo(x) x + 2;
    define double @foo(double %x) {
    entry:
      %addtmp = fadd double %x, 2.000000e+00
      ret double %addtmp
    }

    ready> foo(2);
    Evaluated to 4.000000

To allow each function to live in its own module we'll need a way to
re-generate previous function declarations into each new module we open:

.. code-block:: c++

    static std::unique_ptr<KaleidoscopeJIT> TheJIT;

    ...

    Function *getFunction(std::string Name) {
      // First, see if the function has already been added to the current module.
      if (auto *F = TheModule->getFunction(Name))
        return F;

      // If not, check whether we can codegen the declaration from some existing
      // prototype.
      auto FI = FunctionProtos.find(Name);
      if (FI != FunctionProtos.end())
        return FI->second->codegen();

      // If no existing prototype exists, return null.
      return nullptr;
    }

    ...

    Value *CallExprAST::codegen() {
      // Look up the name in the global module table.
      Function *CalleeF = getFunction(Callee);

    ...

    Function *FunctionAST::codegen() {
      // Transfer ownership of the prototype to the FunctionProtos map, but keep a
      // reference to it for use below.
      auto &P = *Proto;
      FunctionProtos[Proto->getName()] = std::move(Proto);
      Function *TheFunction = getFunction(P.getName());
      if (!TheFunction)
        return nullptr;

To enable this, we'll start by adding a new global, ``FunctionProtos``, that
holds the most recent prototype for each function. We'll also add a convenience
method, ``getFunction()``, to replace calls to ``TheModule->getFunction()``.
Our convenience method searches ``TheModule`` for an existing function
declaration, falling back to generating a new declaration from FunctionProtos if
it doesn't find one. In ``CallExprAST::codegen()`` we just need to replace the
call to ``TheModule->getFunction()``. In ``FunctionAST::codegen()`` we need to
update the FunctionProtos map first, then call ``getFunction()``. With this
done, we can always obtain a function declaration in the current module for any
previously declared function.

We also need to update HandleDefinition and HandleExtern:

.. code-block:: c++

    static void HandleDefinition() {
      if (auto FnAST = ParseDefinition()) {
        if (auto *FnIR = FnAST->codegen()) {
          fprintf(stderr, "Read function definition:");
          FnIR->print(errs());
          fprintf(stderr, "\n");
          TheJIT->addModule(std::move(TheModule));
          InitializeModuleAndPassManager();
        }
      } else {
        // Skip token for error recovery.
        getNextToken();
      }
    }

    static void HandleExtern() {
      if (auto ProtoAST = ParseExtern()) {
        if (auto *FnIR = ProtoAST->codegen()) {
          fprintf(stderr, "Read extern: ");
          FnIR->print(errs());
          fprintf(stderr, "\n");
          FunctionProtos[ProtoAST->getName()] = std::move(ProtoAST);
        }
      } else {
        // Skip token for error recovery.
        getNextToken();
      }
    }

In HandleDefinition, we add two lines to transfer the newly defined function to
the JIT and open a new module. In HandleExtern, we just need to add one line to
add the prototype to FunctionProtos.
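One detail the listings above leave implicit is the declaration of
``FunctionProtos`` itself. It is a map from function names to their most
recent prototypes (this is the declaration used in the full code listing
below):

.. code-block:: c++

    // Owns the most recent prototype for each function, keyed by name,
    // so getFunction() can re-emit declarations into new modules.
    static std::map<std::string, std::unique_ptr<PrototypeAST>> FunctionProtos;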
With these changes made, let's try our REPL again (I removed the dump of the
anonymous functions this time, you should get the idea by now :) :

::

    ready> def foo(x) x + 1;
    ready> foo(2);
    Evaluated to 3.000000

    ready> def foo(x) x + 2;
    ready> foo(2);
    Evaluated to 4.000000

It works!

Even with this simple code, we get some surprisingly powerful capabilities -
check this out:

::

    ready> extern sin(x);
    Read extern:
    declare double @sin(double)

    ready> extern cos(x);
    Read extern:
    declare double @cos(double)

    ready> sin(1.0);
    Read top-level expression:
    define double @2() {
    entry:
      ret double 0x3FEAED548F090CEE
    }

    Evaluated to 0.841471

    ready> def foo(x) sin(x)*sin(x) + cos(x)*cos(x);
    Read function definition:
    define double @foo(double %x) {
    entry:
      %calltmp = call double @sin(double %x)
      %multmp = fmul double %calltmp, %calltmp
      %calltmp2 = call double @cos(double %x)
      %multmp4 = fmul double %calltmp2, %calltmp2
      %addtmp = fadd double %multmp, %multmp4
      ret double %addtmp
    }

    ready> foo(4.0);
    Read top-level expression:
    define double @3() {
    entry:
      %calltmp = call double @foo(double 4.000000e+00)
      ret double %calltmp
    }

    Evaluated to 1.000000

Whoa, how does the JIT know about sin and cos? The answer is surprisingly
simple: the KaleidoscopeJIT has a straightforward symbol resolution rule that
it uses to find symbols that aren't available in any given module. First,
it searches all the modules that have already been added to the JIT, from the
most recent to the oldest, to find the newest definition. If no definition is
found inside the JIT, it falls back to calling "``dlsym("sin")``" on the
Kaleidoscope process itself. Since "``sin``" is defined within the JIT's
address space, it simply patches up calls in the module to call the libm
version of ``sin`` directly. In some cases this even goes further: because
sin and cos are names of standard math functions, the constant folder will
directly evaluate the function calls to the correct result when called with
constants, as in the "``sin(1.0)``" call above.

In the future we'll see how tweaking this symbol resolution rule can be used to
enable all sorts of useful features, from security (restricting the set of
symbols available to JIT'd code), to dynamic code generation based on symbol
names, and even lazy compilation.

One immediate benefit of the symbol resolution rule is that we can now extend
the language by writing arbitrary C++ code to implement operations. For example,
if we add:

.. code-block:: c++

    #ifdef _WIN32
    #define DLLEXPORT __declspec(dllexport)
    #else
    #define DLLEXPORT
    #endif

    /// putchard - putchar that takes a double and returns 0.
    extern "C" DLLEXPORT double putchard(double X) {
      fputc((char)X, stderr);
      return 0;
    }

Note that for Windows we need to actually export the functions because
the dynamic symbol loader will use GetProcAddress to find the symbols.

Now we can produce simple output to the console by using things like:
"``extern putchard(x); putchard(120);``", which prints a lowercase 'x'
on the console (120 is the ASCII code for 'x'). Similar code could be
used to implement file I/O, console input, and many other capabilities
in Kaleidoscope.
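In the same vein, we can give Kaleidoscope formatted numeric output with
a second helper, which also appears in the full code listing below:

.. code-block:: c++

    /// printd - printf that takes a double and prints it as "%f\n",
    /// returning 0.
    extern "C" DLLEXPORT double printd(double X) {
      fprintf(stderr, "%f\n", X);
      return 0;
    }

After declaring it with "``extern printd(x);``", evaluating
"``printd(123);``" prints "123.000000" followed by a newline.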
This completes the JIT and optimizer chapter of the Kaleidoscope
tutorial. At this point, we can compile a non-Turing-complete
programming language, and optimize and JIT compile it in a user-driven
way. Next up we'll look into `extending the language with control flow
constructs <LangImpl05.html>`_, tackling some interesting LLVM IR issues
along the way.

Full Code Listing
=================

Here is the complete code listing for our running example, enhanced with
the LLVM JIT and optimizer. To build this example, use:

.. code-block:: bash

    # Compile
    clang++ -g toy.cpp `llvm-config --cxxflags --ldflags --system-libs --libs core mcjit native` -O3 -o toy
    # Run
    ./toy

If you are compiling this on Linux, make sure to add the "-rdynamic"
option as well. This makes sure that the external functions are resolved
properly at runtime.

Here is the code:

.. literalinclude:: ../../../examples/Kaleidoscope/Chapter4/toy.cpp
   :language: c++

`Next: Extending the language: control flow <LangImpl05.html>`_