==============================================
Kaleidoscope: Adding JIT and Optimizer Support
==============================================

.. contents::
   :local:

Chapter 4 Introduction
======================

Welcome to Chapter 4 of the "`Implementing a language with
LLVM <index.html>`_" tutorial. Chapters 1-3 described the implementation
of a simple language and added support for generating LLVM IR. This
chapter describes two new techniques: adding optimizer support to your
language, and adding JIT compiler support. These additions will
demonstrate how to get nice, efficient code for the Kaleidoscope
language.

Trivial Constant Folding
========================

Our demonstration for Chapter 3 is elegant and easy to extend.
Unfortunately, it does not produce wonderful code. The IRBuilder,
however, does give us obvious optimizations when compiling simple code:

::

    ready> def test(x) 1+2+x;
    Read function definition:
    define double @test(double %x) {
    entry:
            %addtmp = fadd double 3.000000e+00, %x
            ret double %addtmp
    }

This code is not a literal transcription of the AST built by parsing the
input. That would be:

::

    ready> def test(x) 1+2+x;
    Read function definition:
    define double @test(double %x) {
    entry:
            %addtmp = fadd double 2.000000e+00, 1.000000e+00
            %addtmp1 = fadd double %addtmp, %x
            ret double %addtmp1
    }

Constant folding, as seen above, is a very common and very important
optimization: so much so that many language implementors implement
constant folding support directly in their AST representation.

With LLVM, you don't need this support in the AST. Since all calls to
build LLVM IR go through the LLVM IR builder, the builder itself checks
to see whether there is a constant folding opportunity when you call it.
If so, it just does the constant fold and returns the constant instead
of creating an instruction.

Well, that was easy :). In practice, we recommend always using
``IRBuilder`` when generating code like this. It has no "syntactic
overhead" for its use (you don't have to uglify your compiler with
constant checks everywhere) and it can dramatically reduce the amount of
LLVM IR that is generated in some cases (particularly for languages with
a macro preprocessor or that use a lot of constants).

On the other hand, the ``IRBuilder`` is limited by the fact that it does
all of its analysis inline with the code as it is built. If you take a
slightly more complex example:

::

    ready> def test(x) (1+2+x)*(x+(1+2));
    ready> Read function definition:
    define double @test(double %x) {
    entry:
            %addtmp = fadd double 3.000000e+00, %x
            %addtmp1 = fadd double %x, 3.000000e+00
            %multmp = fmul double %addtmp, %addtmp1
            ret double %multmp
    }

In this case, the LHS and RHS of the multiplication are the same value.
We'd really like to see this generate "``tmp = x+3; result = tmp*tmp;``"
instead of computing "``x+3``" twice.

Unfortunately, no amount of local analysis will be able to detect and
correct this. This requires two transformations: reassociation of
expressions (to make the adds lexically identical) and Common
Subexpression Elimination (CSE) to delete the redundant add instruction.
Fortunately, LLVM provides a broad range of optimizations that you can
use, in the form of "passes".
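
To make the folding behavior concrete, here is a hedged sketch of what the
builder does with the constants in ``1+2+x``. It is purely illustrative
(not part of toy.cpp) and assumes the ``Builder`` and ``getGlobalContext()``
used throughout the tutorial's codegen:

.. code-block:: c++

    // Hedged sketch: because both operands are ConstantFP values, CreateFAdd
    // returns the folded constant 3.0 instead of emitting an fadd instruction.
    Value *One = ConstantFP::get(getGlobalContext(), APFloat(1.0));
    Value *Two = ConstantFP::get(getGlobalContext(), APFloat(2.0));
    Value *Sum = Builder.CreateFAdd(One, Two, "addtmp");
    // isa<ConstantFP>(Sum) is true here; nothing was added to the basic block.
    // Adding %x afterwards produces the single "fadd 3.0, %x" shown above.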

LLVM Optimization Passes
========================

LLVM provides many optimization passes, which do many different sorts of
things and have different tradeoffs. Unlike other systems, LLVM doesn't
hold to the mistaken notion that one set of optimizations is right for
all languages and for all situations. LLVM allows a compiler implementor
to make complete decisions about what optimizations to use, in which
order, and in what situation.

As a concrete example, LLVM supports both "whole module" passes, which
look across as large a body of code as they can (often a whole file, but
if run at link time, this can be a substantial portion of the whole
program), and "per-function" passes which just operate on a single
function at a time, without looking at other functions. For more
information on passes and how they are run, see the
`How to Write a Pass <../WritingAnLLVMPass.html>`_ document and the
`List of LLVM Passes <../Passes.html>`_.

For Kaleidoscope, we are currently generating functions on the fly, one
at a time, as the user types them in. We aren't shooting for the
ultimate optimization experience in this setting, but we also want to
catch the easy and quick stuff where possible. As such, we will choose
to run a few per-function optimizations as the user types the function
in. If we wanted to make a "static Kaleidoscope compiler", we would use
exactly the code we have now, except that we would defer running the
optimizer until the entire file has been parsed.

In order to get per-function optimizations going, we need to set up a
`FunctionPassManager <../WritingAnLLVMPass.html#passmanager>`_ to hold
and organize the LLVM optimizations that we want to run. Once we have
that, we can add a set of optimizations to run. The code looks like
this:

.. code-block:: c++

    FunctionPassManager OurFPM(TheModule);

    // Set up the optimizer pipeline.  Start with registering info about how the
    // target lays out data structures.
    OurFPM.add(new DataLayout(*TheExecutionEngine->getDataLayout()));
    // Provide basic AliasAnalysis support for GVN.
    OurFPM.add(createBasicAliasAnalysisPass());
    // Do simple "peephole" optimizations and bit-twiddling optzns.
    OurFPM.add(createInstructionCombiningPass());
    // Reassociate expressions.
    OurFPM.add(createReassociatePass());
    // Eliminate Common SubExpressions.
    OurFPM.add(createGVNPass());
    // Simplify the control flow graph (deleting unreachable blocks, etc).
    OurFPM.add(createCFGSimplificationPass());

    OurFPM.doInitialization();

    // Set the global so the code gen can use this.
    TheFPM = &OurFPM;

    // Run the main "interpreter loop" now.
    MainLoop();

This code defines a ``FunctionPassManager``, "``OurFPM``". It requires a
pointer to the ``Module`` to construct itself. Once it is set up, we use
a series of "add" calls to add a bunch of LLVM passes. The first pass is
basically boilerplate: it adds a pass so that later optimizations know
how the data structures in the program are laid out. The
"``TheExecutionEngine``" variable is related to the JIT, which we will
get to in the next section.

In this case, we choose to add four optimization passes (the alias
analysis pass just provides information that GVN uses). The passes we
chose here are a pretty standard set of "cleanup" optimizations that are
useful for a wide variety of code. I won't delve into what they do, but,
believe me, they are a good starting place :).
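
For comparison, the "static Kaleidoscope compiler" mentioned above would
run the same pipeline only once parsing has finished. A minimal sketch of
that variant follows; the ``OptimizeWholeModule`` helper is ours, not part
of the tutorial, and it assumes the same ``using namespace llvm`` context
as toy.cpp:

.. code-block:: c++

    // Hedged sketch of the "static compiler" variant: run the same
    // per-function pipeline over every function after the whole file
    // has been parsed, instead of as each function is typed in.
    static void OptimizeWholeModule(Module *M, FunctionPassManager &FPM) {
      FPM.doInitialization();
      for (Module::iterator F = M->begin(), E = M->end(); F != E; ++F)
        FPM.run(*F);
      FPM.doFinalization();
    }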

Once the PassManager is set up, we need to make use of it. We do this by
running it after our newly created function is constructed (in
``FunctionAST::Codegen``), but before it is returned to the client:

.. code-block:: c++

    if (Value *RetVal = Body->Codegen()) {
      // Finish off the function.
      Builder.CreateRet(RetVal);

      // Validate the generated code, checking for consistency.
      verifyFunction(*TheFunction);

      // Optimize the function.
      TheFPM->run(*TheFunction);

      return TheFunction;
    }

As you can see, this is pretty straightforward. The
``FunctionPassManager`` optimizes and updates the LLVM Function\* in
place, improving (hopefully) its body. With this in place, we can try
our test above again:

::

    ready> def test(x) (1+2+x)*(x+(1+2));
    ready> Read function definition:
    define double @test(double %x) {
    entry:
            %addtmp = fadd double %x, 3.000000e+00
            %multmp = fmul double %addtmp, %addtmp
            ret double %multmp
    }

As expected, we now get our nicely optimized code, saving a floating
point add instruction from every execution of this function.

LLVM provides a wide variety of optimizations that can be used in
certain circumstances. Some `documentation about the various
passes <../Passes.html>`_ is available, but it isn't very complete.
Another good source of ideas is to look at the passes that ``Clang``
runs to get started. The "``opt``" tool allows you to experiment with
passes from the command line, so you can see if they do anything.

Now that we have reasonable code coming out of our front-end, let's talk
about executing it!

Adding a JIT Compiler
=====================

Code that is available in LLVM IR can have a wide variety of tools
applied to it. For example, you can run optimizations on it (as we did
above), you can dump it out in textual or binary forms, you can compile
the code to an assembly file (.s) for some target, or you can JIT
compile it. The nice thing about the LLVM IR representation is that it
is the "common currency" between many different parts of the compiler.

In this section, we'll add JIT compiler support to our interpreter. The
basic idea that we want for Kaleidoscope is to have the user enter
function bodies as they do now, but immediately evaluate the top-level
expressions they type in. For example, if they type in "1 + 2;", we
should evaluate and print out 3. If they define a function, they should
be able to call it from the command line.

In order to do this, we first declare and initialize the JIT. This is
done by adding a global variable and a call in ``main``:

.. code-block:: c++

    static ExecutionEngine *TheExecutionEngine;
    ...
    int main() {
      ..
      // Create the JIT.  This takes ownership of the module.
      TheExecutionEngine = EngineBuilder(TheModule).create();
      ..
    }

This creates an abstract "Execution Engine" which can be either a JIT
compiler or the LLVM interpreter. LLVM will automatically pick a JIT
compiler for you if one is available for your platform, otherwise it
will fall back to the interpreter.
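
Engine creation can fail, for example if no JIT is linked into the
binary, so in practice it is worth checking the result. Here is a hedged
sketch of that setup; the ``CreateJIT`` helper name and the exact error
handling are ours, and the snippet assumes the globals and
``using namespace llvm`` context of toy.cpp:

.. code-block:: c++

    #include "llvm/ExecutionEngine/JIT.h"      // linking this in enables the JIT
    #include "llvm/Support/TargetSelect.h"

    static void CreateJIT() {
      // The native target must be initialized before a JIT can be built.
      InitializeNativeTarget();

      std::string ErrStr;
      TheExecutionEngine =
          EngineBuilder(TheModule).setErrorStr(&ErrStr).create();
      if (!TheExecutionEngine) {
        fprintf(stderr, "Could not create ExecutionEngine: %s\n", ErrStr.c_str());
        exit(1);
      }
    }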

Once the ``ExecutionEngine`` is created, the JIT is ready to be used.
There are a variety of APIs that are useful, but the simplest one is the
"``getPointerToFunction(F)``" method. This method JIT compiles the
specified LLVM Function and returns a function pointer to the generated
machine code. In our case, this means that we can change the code that
parses a top-level expression to look like this:

.. code-block:: c++

    static void HandleTopLevelExpression() {
      // Evaluate a top-level expression into an anonymous function.
      if (FunctionAST *F = ParseTopLevelExpr()) {
        if (Function *LF = F->Codegen()) {
          LF->dump();  // Dump the function for exposition purposes.

          // JIT the function, returning a function pointer.
          void *FPtr = TheExecutionEngine->getPointerToFunction(LF);

          // Cast it to the right type (takes no arguments, returns a double) so we
          // can call it as a native function.
          double (*FP)() = (double (*)())(intptr_t)FPtr;
          fprintf(stderr, "Evaluated to %f\n", FP());
        }
      }
    }

Recall that we compile top-level expressions into a self-contained LLVM
function that takes no arguments and returns the computed double.
Because the LLVM JIT compiler matches the native platform ABI, this
means that you can just cast the result pointer to a function pointer of
that type and call it directly. This means there is no difference
between JIT-compiled code and native machine code that is statically
linked into your application.

With just these two changes, let's see how Kaleidoscope works now!

::

    ready> 4+5;
    Read top-level expression:
    define double @0() {
    entry:
            ret double 9.000000e+00
    }

    Evaluated to 9.000000

Well, this looks like it is basically working. The dump of the function
shows the "no argument function that always returns double" that we
synthesize for each top-level expression that is typed in. This
demonstrates very basic functionality, but can we do more?

::

    ready> def testfunc(x y) x + y*2;
    Read function definition:
    define double @testfunc(double %x, double %y) {
    entry:
            %multmp = fmul double %y, 2.000000e+00
            %addtmp = fadd double %multmp, %x
            ret double %addtmp
    }

    ready> testfunc(4, 10);
    Read top-level expression:
    define double @1() {
    entry:
            %calltmp = call double @testfunc(double 4.000000e+00, double 1.000000e+01)
            ret double %calltmp
    }

    Evaluated to 24.000000

This illustrates that we can now call user code, but there is something
a bit subtle going on here. Note that we only invoke the JIT on the
anonymous functions that *call testfunc*, but we never invoke it on
*testfunc* itself. What actually happened here is that the JIT scanned
for all non-JIT'd functions transitively called from the anonymous
function and compiled all of them before returning from
``getPointerToFunction()``.

The JIT provides a number of other more advanced interfaces for things
like freeing allocated machine code, re-JIT'ing functions to update
them, and so on.
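
For example, with the legacy JIT used here, the machine code for the
anonymous expression can be released once it has run. A hedged sketch,
not part of the tutorial's listing, which would sit right after the
``FP()`` call above:

.. code-block:: c++

    // Hedged sketch (legacy ExecutionEngine interface): release the machine
    // code generated for the anonymous expression so that repeated top-level
    // expressions don't accumulate compiled code.
    TheExecutionEngine->freeMachineCodeForFunction(LF);
    LF->eraseFromParent();   // optionally drop the anonymous IR function too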

However, even with just this simple code, we get some surprisingly
powerful capabilities - check this out (I removed the dump of the
anonymous functions; you should get the idea by now :) :

::

    ready> extern sin(x);
    Read extern:
    declare double @sin(double)

    ready> extern cos(x);
    Read extern:
    declare double @cos(double)

    ready> sin(1.0);
    Read top-level expression:
    define double @2() {
    entry:
            ret double 0x3FEAED548F090CEE
    }

    Evaluated to 0.841471

    ready> def foo(x) sin(x)*sin(x) + cos(x)*cos(x);
    Read function definition:
    define double @foo(double %x) {
    entry:
            %calltmp = call double @sin(double %x)
            %multmp = fmul double %calltmp, %calltmp
            %calltmp2 = call double @cos(double %x)
            %multmp4 = fmul double %calltmp2, %calltmp2
            %addtmp = fadd double %multmp, %multmp4
            ret double %addtmp
    }

    ready> foo(4.0);
    Read top-level expression:
    define double @3() {
    entry:
            %calltmp = call double @foo(double 4.000000e+00)
            ret double %calltmp
    }

    Evaluated to 1.000000

Whoa, how does the JIT know about sin and cos? The answer is
surprisingly simple: in this example, the JIT started execution of a
function and got to a function call. It realized that the function was
not yet JIT compiled and invoked the standard set of routines to resolve
the function. In this case, there is no body defined for the function,
so the JIT ended up calling "``dlsym("sin")``" on the Kaleidoscope
process itself. Since "``sin``" is defined within the JIT's address
space, it simply patches up calls in the module to call the libm version
of ``sin`` directly.

The LLVM JIT provides a number of interfaces (look in the
``ExecutionEngine.h`` file) for controlling how unknown functions get
resolved. It allows you to establish explicit mappings between IR
objects and addresses (useful for LLVM global variables that you want to
map to static tables, for example), allows you to dynamically decide on
the fly based on the function name, and even allows you to have the JIT
compile functions lazily the first time they're called.

One interesting application of this is that we can now extend the
language by writing arbitrary C++ code to implement operations. For
example, if we add:

.. code-block:: c++

    /// putchard - putchar that takes a double and returns 0.
    extern "C"
    double putchard(double X) {
      putchar((char)X);
      return 0;
    }

Now we can produce simple output to the console by using things like:
"``extern putchard(x); putchard(120);``", which prints a lowercase 'x'
on the console (120 is the ASCII code for 'x'). Similar code could be
used to implement file I/O, console input, and many other capabilities
in Kaleidoscope.
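
If a host function is not visible to that ``dlsym()``-style lookup (for
example, because it isn't exported from the executable), the
explicit-mapping interface mentioned earlier is an alternative. A hedged
sketch, not part of the tutorial code, which assumes the ``extern``
declaration for ``putchard`` has already been entered so its ``Function*``
exists in the module:

.. code-block:: c++

    // Hedged sketch: map the IR declaration of putchard directly to its
    // address instead of relying on the JIT's symbol lookup.
    if (Function *PutchardDecl = TheModule->getFunction("putchard"))
      TheExecutionEngine->addGlobalMapping(PutchardDecl,
                                           reinterpret_cast<void *>(putchard));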

This completes the JIT and optimizer chapter of the Kaleidoscope
tutorial. At this point, we can compile a non-Turing-complete
programming language, and optimize and JIT-compile it in a user-driven
way. Next up we'll look into `extending the language with control flow
constructs <LangImpl5.html>`_, tackling some interesting LLVM IR issues
along the way.

Full Code Listing
=================

Here is the complete code listing for our running example, enhanced with
the LLVM JIT and optimizer. To build this example, use:

.. code-block:: bash

    # Compile
    clang++ -g toy.cpp `llvm-config --cppflags --ldflags --libs core jit native` -O3 -o toy
    # Run
    ./toy

If you are compiling this on Linux, make sure to add the "-rdynamic"
option as well. This makes sure that the external functions are resolved
properly at runtime.

Here is the code:

.. literalinclude:: ../../examples/Kaleidoscope/Chapter4/toy.cpp
   :language: c++

`Next: Extending the language: control flow <LangImpl5.html>`_
