==============================================
Kaleidoscope: Adding JIT and Optimizer Support
==============================================

.. contents::
   :local:

Chapter 4 Introduction
======================

Welcome to Chapter 4 of the "`Implementing a language with
LLVM <index.html>`_" tutorial. Chapters 1-3 described the implementation
of a simple language and added support for generating LLVM IR. This
chapter describes two new techniques: adding optimizer support to your
language, and adding JIT compiler support. These additions will
demonstrate how to get nice, efficient code for the Kaleidoscope
language.

Trivial Constant Folding
========================

Our demonstration for Chapter 3 is elegant and easy to extend.
Unfortunately, it does not produce wonderful code. The IRBuilder,
however, does give us obvious optimizations when compiling simple code:

::

    ready> def test(x) 1+2+x;
    Read function definition:
    define double @test(double %x) {
    entry:
            %addtmp = fadd double 3.000000e+00, %x
            ret double %addtmp
    }

This code is not a literal transcription of the AST built by parsing the
input. That would be:

::

    ready> def test(x) 1+2+x;
    Read function definition:
    define double @test(double %x) {
    entry:
            %addtmp = fadd double 2.000000e+00, 1.000000e+00
            %addtmp1 = fadd double %addtmp, %x
            ret double %addtmp1
    }

Constant folding, as seen above, is a very common and very important
optimization: so much so that many language implementors implement
constant folding support in their AST representation.

With LLVM, you don't need this support in the AST. Since all calls to
build LLVM IR go through the LLVM IR builder, the builder itself checks
for a constant folding opportunity whenever you call it. If there is
one, it simply folds the constants and returns the result instead of
creating an instruction.

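You can see this happen directly if you hand the builder two constants.
A minimal sketch (the function and variable names here are illustrative,
not part of the tutorial's code):

.. code-block:: c++

    #include "llvm/ADT/APFloat.h"
    #include "llvm/IR/Constants.h"
    #include "llvm/IR/IRBuilder.h"
    #include "llvm/IR/LLVMContext.h"

    // Because IRBuilder folds constants as it builds, CreateFAdd on two
    // ConstantFPs returns a ConstantFP holding 3.0; no fadd instruction
    // is ever created (and so no insertion point is needed).
    static llvm::Value *foldExample(llvm::LLVMContext &Ctx) {
      llvm::IRBuilder<> Builder(Ctx);
      llvm::Value *L = llvm::ConstantFP::get(Ctx, llvm::APFloat(1.0));
      llvm::Value *R = llvm::ConstantFP::get(Ctx, llvm::APFloat(2.0));
      return Builder.CreateFAdd(L, R, "addtmp"); // a ConstantFP, not an Instruction
    }
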
Well, that was easy :). In practice, we recommend always using
``IRBuilder`` when generating code like this. It has no "syntactic
overhead" for its use (you don't have to uglify your compiler with
constant checks everywhere) and it can dramatically reduce the amount of
LLVM IR that is generated in some cases (particularly for languages with
a macro preprocessor or that use a lot of constants).

On the other hand, the ``IRBuilder`` is limited by the fact that it does
all of its analysis inline with the code as it is built. If you take a
slightly more complex example:

::

    ready> def test(x) (1+2+x)*(x+(1+2));
    ready> Read function definition:
    define double @test(double %x) {
    entry:
            %addtmp = fadd double 3.000000e+00, %x
            %addtmp1 = fadd double %x, 3.000000e+00
            %multmp = fmul double %addtmp, %addtmp1
            ret double %multmp
    }

In this case, the LHS and RHS of the multiplication are the same value.
We'd really like to see this generate "``tmp = x+3; result = tmp*tmp;``"
instead of computing "``x+3``" twice.

Unfortunately, no amount of local analysis will be able to detect and
correct this. This requires two transformations: reassociation of
expressions (to make the adds lexically identical) and Common
Subexpression Elimination (CSE) to delete the redundant add instruction.
Fortunately, LLVM provides a broad range of optimizations that you can
use, in the form of "passes".

LLVM Optimization Passes
========================

.. warning::

   Due to the transition to the new PassManager infrastructure this tutorial
   is based on ``llvm::legacy::FunctionPassManager`` which can be found in
   `LegacyPassManager.h <https://llvm.org/doxygen/classllvm_1_1legacy_1_1FunctionPassManager.html>`_.
   For the purpose of this tutorial the above should be used until
   the pass manager transition is complete.

LLVM provides many optimization passes, which do many different sorts of
things and have different tradeoffs. Unlike other systems, LLVM doesn't
hold to the mistaken notion that one set of optimizations is right for
all languages and for all situations. LLVM allows a compiler implementor
to make complete decisions about what optimizations to use, in which
order, and in what situation.

As a concrete example, LLVM supports both "whole module" passes, which
look across as large a body of code as they can (often a whole file, but
if run at link time, this can be a substantial portion of the whole
program). It also supports "per-function" passes, which just operate on
a single function at a time, without looking at other functions. For
more information on passes and how they are run, see the `How to Write a
Pass <../../WritingAnLLVMPass.html>`_ document and the
`List of LLVM Passes <../../Passes.html>`_.

For Kaleidoscope, we are currently generating functions on the fly, one
at a time, as the user types them in. We aren't shooting for the
ultimate optimization experience in this setting, but we also want to
catch the easy and quick stuff where possible. As such, we will choose
to run a few per-function optimizations as the user types the function
in. If we wanted to make a "static Kaleidoscope compiler", we would use
exactly the code we have now, except that we would defer running the
optimizer until the entire file has been parsed.

In order to get per-function optimizations going, we need to set up a
`FunctionPassManager <../../WritingAnLLVMPass.html#what-passmanager-does>`_ to hold
and organize the LLVM optimizations that we want to run. Once we have
that, we can add a set of optimizations to run. We'll need a new
FunctionPassManager for each module that we want to optimize, so we'll
write a function to create and initialize both the module and pass manager
for us:

.. code-block:: c++

    void InitializeModuleAndPassManager(void) {
      // Open a new module.
      TheModule = std::make_unique<Module>("my cool jit", TheContext);

      // Create a new pass manager attached to it.
      TheFPM = std::make_unique<legacy::FunctionPassManager>(TheModule.get());

      // Do simple "peephole" optimizations and bit-twiddling optzns.
      TheFPM->add(createInstructionCombiningPass());
      // Reassociate expressions.
      TheFPM->add(createReassociatePass());
      // Eliminate Common SubExpressions.
      TheFPM->add(createGVNPass());
      // Simplify the control flow graph (deleting unreachable blocks, etc).
      TheFPM->add(createCFGSimplificationPass());

      TheFPM->doInitialization();
    }

This code initializes the global module ``TheModule``, and the function pass
manager ``TheFPM``, which is attached to ``TheModule``. Once the pass manager is
set up, we use a series of "add" calls to add a bunch of LLVM passes.

In this case, we choose to add four optimization passes.
The passes we choose here are a pretty standard set
of "cleanup" optimizations that are useful for a wide variety of code. I won't
delve into what they do but, believe me, they are a good starting place :).

Once the PassManager is set up, we need to make use of it. We do this by
running it after our newly created function is constructed (in
``FunctionAST::codegen()``), but before it is returned to the client:

.. code-block:: c++

      if (Value *RetVal = Body->codegen()) {
        // Finish off the function.
        Builder.CreateRet(RetVal);

        // Validate the generated code, checking for consistency.
        verifyFunction(*TheFunction);

        // Optimize the function.
        TheFPM->run(*TheFunction);

        return TheFunction;
      }

As you can see, this is pretty straightforward. The
``FunctionPassManager`` optimizes and updates the LLVM Function\* in
place, improving (hopefully) its body. With this in place, we can try
our test above again:

::

    ready> def test(x) (1+2+x)*(x+(1+2));
    ready> Read function definition:
    define double @test(double %x) {
    entry:
            %addtmp = fadd double %x, 3.000000e+00
            %multmp = fmul double %addtmp, %addtmp
            ret double %multmp
    }

As expected, we now get our nicely optimized code, saving a floating
point add instruction from every execution of this function.

LLVM provides a wide variety of optimizations that can be used in
certain circumstances. Some `documentation about the various
passes <../../Passes.html>`_ is available, but it isn't very complete.
Another good source of ideas is to look at the passes that ``Clang``
runs to get started. The "``opt``" tool allows you to experiment with
passes from the command line, so you can see if they do anything.
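
For example, to try the same four passes this chapter uses on a file of
LLVM IR, you could run something like the following (these legacy-PM flag
spellings match the pass manager this tutorial is based on; newer
releases spell this ``opt -passes=...`` instead):

.. code-block:: bash

    # Run instcombine, reassociate, GVN and CFG simplification over
    # test.ll and print the optimized IR to stdout.
    opt -S -instcombine -reassociate -gvn -simplifycfg test.ll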

Now that we have reasonable code coming out of our front-end, let's talk
about executing it!

Adding a JIT Compiler
=====================

Code that is available in LLVM IR can have a wide variety of tools
applied to it. For example, you can run optimizations on it (as we did
above), you can dump it out in textual or binary forms, you can compile
the code to an assembly file (.s) for some target, or you can JIT
compile it. The nice thing about the LLVM IR representation is that it
is the "common currency" between many different parts of the compiler.

In this section, we'll add JIT compiler support to our interpreter. The
basic idea that we want for Kaleidoscope is to have the user enter
function bodies as they do now, but immediately evaluate the top-level
expressions they type in. For example, if they type in "1 + 2;", we
should evaluate and print out 3. If they define a function, they should
be able to call it from the command line.

In order to do this, we first prepare the environment to create code for
the current native target, and declare and initialize the JIT. This is
done by calling some ``InitializeNativeTarget*`` functions, adding a
global variable ``TheJIT``, and initializing it in ``main``:

.. code-block:: c++

    static std::unique_ptr<KaleidoscopeJIT> TheJIT;
    ...
    int main() {
      InitializeNativeTarget();
      InitializeNativeTargetAsmPrinter();
      InitializeNativeTargetAsmParser();

      // Install standard binary operators.
      // 1 is lowest precedence.
      BinopPrecedence['<'] = 10;
      BinopPrecedence['+'] = 20;
      BinopPrecedence['-'] = 20;
      BinopPrecedence['*'] = 40; // highest.

      // Prime the first token.
      fprintf(stderr, "ready> ");
      getNextToken();

      TheJIT = std::make_unique<KaleidoscopeJIT>();

      // Run the main "interpreter loop" now.
      MainLoop();

      return 0;
    }

We also need to set up the data layout for the JIT:

.. code-block:: c++

    void InitializeModuleAndPassManager(void) {
      // Open a new module.
      TheModule = std::make_unique<Module>("my cool jit", TheContext);
      TheModule->setDataLayout(TheJIT->getTargetMachine().createDataLayout());

      // Create a new pass manager attached to it.
      TheFPM = std::make_unique<legacy::FunctionPassManager>(TheModule.get());
      ...

The KaleidoscopeJIT class is a simple JIT built specifically for these
tutorials, available inside the LLVM source code at
``llvm-src/examples/Kaleidoscope/include/KaleidoscopeJIT.h``.
In later chapters we will look at how it works and extend it with
new features, but for now we will take it as given. Its API is very simple:
``addModule`` adds an LLVM IR module to the JIT, making its functions
available for execution; ``removeModule`` removes a module, freeing any
memory associated with the code in that module; and ``findSymbol`` allows us
to look up pointers to the compiled code.

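Sketched as plain signatures, that interface looks roughly like this (the
exact handle and symbol types vary between LLVM releases, so treat this as
an illustration rather than the definitive header):

.. code-block:: c++

    // Rough sketch of the KaleidoscopeJIT interface described above; the
    // precise types (e.g. the module handle) differ across LLVM releases.
    class KaleidoscopeJIT {
    public:
      TargetMachine &getTargetMachine();                 // used for the data layout
      ModuleHandle addModule(std::unique_ptr<Module> M); // compile and expose a module
      void removeModule(ModuleHandle H);                 // free that module's code
      JITSymbol findSymbol(const std::string &Name);     // look up compiled code
    };
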
We can take this simple API and change our code that parses top-level expressions to
look like this:

.. code-block:: c++

    static void HandleTopLevelExpression() {
      // Evaluate a top-level expression into an anonymous function.
      if (auto FnAST = ParseTopLevelExpr()) {
        if (FnAST->codegen()) {

          // JIT the module containing the anonymous expression, keeping a handle so
          // we can free it later.
          auto H = TheJIT->addModule(std::move(TheModule));
          InitializeModuleAndPassManager();

          // Search the JIT for the __anon_expr symbol.
          auto ExprSymbol = TheJIT->findSymbol("__anon_expr");
          assert(ExprSymbol && "Function not found");

          // Get the symbol's address and cast it to the right type (takes no
          // arguments, returns a double) so we can call it as a native function.
          double (*FP)() = (double (*)())(intptr_t)ExprSymbol.getAddress();
          fprintf(stderr, "Evaluated to %f\n", FP());

          // Delete the anonymous expression module from the JIT.
          TheJIT->removeModule(H);
        }

If parsing and codegen succeed, the next step is to add the module containing
the top-level expression to the JIT. We do this by calling ``addModule``, which
triggers code generation for all the functions in the module, and returns a
handle that can be used to remove the module from the JIT later. Once the module
has been added to the JIT it can no longer be modified, so we also open a new
module to hold subsequent code by calling ``InitializeModuleAndPassManager()``.

Once we've added the module to the JIT we need to get a pointer to the final
generated code. We do this by calling the JIT's ``findSymbol`` method, and
passing the name of the top-level expression function: ``__anon_expr``. Since we
just added this function, we assert that ``findSymbol`` returned a result.

Next, we get the in-memory address of the ``__anon_expr`` function by calling
``getAddress()`` on the symbol. Recall that we compile top-level expressions
into a self-contained LLVM function that takes no arguments and returns the
computed double. Because the LLVM JIT compiler matches the native platform ABI,
this means that you can just cast the result pointer to a function pointer of
that type and call it directly. This means there is no difference between
JIT-compiled code and native machine code that is statically linked into your
application.

Finally, since we don't support re-evaluation of top-level expressions, we
remove the module from the JIT when we're done to free the associated memory.
Recall, however, that the module we created a few lines earlier (via
``InitializeModuleAndPassManager``) is still open and waiting for new code to be
added.

With just these two changes, let's see how Kaleidoscope works now!

::

    ready> 4+5;
    Read top-level expression:
    define double @0() {
    entry:
      ret double 9.000000e+00
    }

    Evaluated to 9.000000

Well this looks like it is basically working. The dump of the function
shows the "no argument function that always returns double" that we
synthesize for each top-level expression that is typed in. This
demonstrates very basic functionality, but can we do more?

::

    ready> def testfunc(x y) x + y*2;
    Read function definition:
    define double @testfunc(double %x, double %y) {
    entry:
      %multmp = fmul double %y, 2.000000e+00
      %addtmp = fadd double %multmp, %x
      ret double %addtmp
    }

    ready> testfunc(4, 10);
    Read top-level expression:
    define double @1() {
    entry:
      %calltmp = call double @testfunc(double 4.000000e+00, double 1.000000e+01)
      ret double %calltmp
    }

    Evaluated to 24.000000

    ready> testfunc(5, 10);
    ready> LLVM ERROR: Program used external function 'testfunc' which could not be resolved!

Function definitions and calls also work, but something went very wrong on that
last line. The call looks valid, so what happened? As you may have guessed from
the API, a Module is the unit of allocation for the JIT, and ``testfunc`` was
part of the same module that contained the anonymous expression. When we removed
that module from the JIT to free the memory for the anonymous expression, we
deleted the definition of ``testfunc`` along with it. Then, when we tried to
call ``testfunc`` a second time, the JIT could no longer find it.

The easiest way to fix this is to put the anonymous expression in a separate
module from the rest of the function definitions. The JIT will happily resolve
function calls across module boundaries, as long as each of the functions called
has a prototype, and is added to the JIT before it is called. By putting the
anonymous expression in a different module we can delete it without affecting
the rest of the functions.

In fact, we're going to go a step further and put every function in its own
module. Doing so allows us to exploit a useful property of the KaleidoscopeJIT
that will make our environment more REPL-like: Functions can be added to the
JIT more than once (unlike a module where every function must have a unique
definition). When you look up a symbol in KaleidoscopeJIT it will always return
the most recent definition:

::

    ready> def foo(x) x + 1;
    Read function definition:
    define double @foo(double %x) {
    entry:
      %addtmp = fadd double %x, 1.000000e+00
      ret double %addtmp
    }

    ready> foo(2);
    Evaluated to 3.000000

    ready> def foo(x) x + 2;
    define double @foo(double %x) {
    entry:
      %addtmp = fadd double %x, 2.000000e+00
      ret double %addtmp
    }

    ready> foo(2);
    Evaluated to 4.000000

To allow each function to live in its own module we'll need a way to
re-generate previous function declarations into each new module we open:

.. code-block:: c++

    static std::unique_ptr<KaleidoscopeJIT> TheJIT;

    ...

    Function *getFunction(std::string Name) {
      // First, see if the function has already been added to the current module.
      if (auto *F = TheModule->getFunction(Name))
        return F;

      // If not, check whether we can codegen the declaration from some existing
      // prototype.
      auto FI = FunctionProtos.find(Name);
      if (FI != FunctionProtos.end())
        return FI->second->codegen();

      // If no existing prototype exists, return null.
      return nullptr;
    }

    ...

    Value *CallExprAST::codegen() {
      // Look up the name in the global module table.
      Function *CalleeF = getFunction(Callee);

    ...

    Function *FunctionAST::codegen() {
      // Transfer ownership of the prototype to the FunctionProtos map, but keep a
      // reference to it for use below.
      auto &P = *Proto;
      FunctionProtos[Proto->getName()] = std::move(Proto);
      Function *TheFunction = getFunction(P.getName());
      if (!TheFunction)
        return nullptr;

To enable this, we'll start by adding a new global, ``FunctionProtos``, that
holds the most recent prototype for each function. We'll also add a convenience
method, ``getFunction()``, to replace calls to ``TheModule->getFunction()``.
Our convenience method searches ``TheModule`` for an existing function
declaration, falling back to generating a new declaration from
``FunctionProtos`` if it doesn't find one. In ``CallExprAST::codegen()`` we just
need to replace the call to ``TheModule->getFunction()``. In
``FunctionAST::codegen()`` we need to update the ``FunctionProtos`` map first,
then call ``getFunction()``. With this done, we can always obtain a function
declaration in the current module for any previously declared function.

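The new global is just a map from function names to their most recent
prototypes (this is the declaration used in the full listing at the end of
this chapter):

.. code-block:: c++

    static std::map<std::string, std::unique_ptr<PrototypeAST>> FunctionProtos;
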
We also need to update ``HandleDefinition`` and ``HandleExtern``:

.. code-block:: c++

    static void HandleDefinition() {
      if (auto FnAST = ParseDefinition()) {
        if (auto *FnIR = FnAST->codegen()) {
          fprintf(stderr, "Read function definition:");
          FnIR->print(errs());
          fprintf(stderr, "\n");
          TheJIT->addModule(std::move(TheModule));
          InitializeModuleAndPassManager();
        }
      } else {
        // Skip token for error recovery.
        getNextToken();
      }
    }

    static void HandleExtern() {
      if (auto ProtoAST = ParseExtern()) {
        if (auto *FnIR = ProtoAST->codegen()) {
          fprintf(stderr, "Read extern: ");
          FnIR->print(errs());
          fprintf(stderr, "\n");
          FunctionProtos[ProtoAST->getName()] = std::move(ProtoAST);
        }
      } else {
        // Skip token for error recovery.
        getNextToken();
      }
    }

In ``HandleDefinition``, we add two lines to transfer the newly defined
function to the JIT and open a new module. In ``HandleExtern``, we just need
to add one line to add the prototype to ``FunctionProtos``.

With these changes made, let's try our REPL again (I removed the dump of the
anonymous functions this time; you should get the idea by now :) :

::

    ready> def foo(x) x + 1;
    ready> foo(2);
    Evaluated to 3.000000

    ready> def foo(x) x + 2;
    ready> foo(2);
    Evaluated to 4.000000

It works!

Even with this simple code, we get some surprisingly powerful capabilities -
check this out:

::

    ready> extern sin(x);
    Read extern:
    declare double @sin(double)

    ready> extern cos(x);
    Read extern:
    declare double @cos(double)

    ready> sin(1.0);
    Read top-level expression:
    define double @2() {
    entry:
      ret double 0x3FEAED548F090CEE
    }

    Evaluated to 0.841471

    ready> def foo(x) sin(x)*sin(x) + cos(x)*cos(x);
    Read function definition:
    define double @foo(double %x) {
    entry:
      %calltmp = call double @sin(double %x)
      %multmp = fmul double %calltmp, %calltmp
      %calltmp2 = call double @cos(double %x)
      %multmp4 = fmul double %calltmp2, %calltmp2
      %addtmp = fadd double %multmp, %multmp4
      ret double %addtmp
    }

    ready> foo(4.0);
    Read top-level expression:
    define double @3() {
    entry:
      %calltmp = call double @foo(double 4.000000e+00)
      ret double %calltmp
    }

    Evaluated to 1.000000

Whoa, how does the JIT know about ``sin`` and ``cos``? The answer is
surprisingly simple: The KaleidoscopeJIT has a straightforward symbol
resolution rule that it uses to find symbols that aren't available in any
given module: First it searches all the modules that have already been added
to the JIT, from the most recent to the oldest, to find the newest
definition. If no definition is found inside the JIT, it falls back to
calling "``dlsym("sin")``" on the Kaleidoscope process itself. Since
"``sin``" is defined within the JIT's address space, it simply patches up
calls in the module to call the libm version of ``sin`` directly. But in
some cases this even goes further: as ``sin`` and ``cos`` are names of
standard math functions, the constant folder will directly evaluate the
function calls to the correct result when they are called with constants,
as in the "``sin(1.0)``" example above.

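Under the hood, this rule is just a two-level lookup. In the ORC APIs that
this tutorial's KaleidoscopeJIT is built on, it is expressed as a symbol
resolver with two callbacks, roughly like the sketch below (treat this as an
illustration; the exact resolver API has changed across LLVM releases):

.. code-block:: c++

    // Sketch of KaleidoscopeJIT's resolution rule: first look in the
    // modules already added to the JIT, then fall back to symbols in the
    // host process (which is how sin and cos are found).
    auto Resolver = createLambdaResolver(
        [&](const std::string &Name) {
          if (auto Sym = CompileLayer.findSymbol(Name, false))
            return Sym;
          return JITSymbol(nullptr);
        },
        [](const std::string &Name) {
          if (auto SymAddr =
                  RTDyldMemoryManager::getSymbolAddressInProcess(Name))
            return JITSymbol(SymAddr, JITSymbolFlags::Exported);
          return JITSymbol(nullptr);
        });
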
In the future we'll see how tweaking this symbol resolution rule can be used to
enable all sorts of useful features, from security (restricting the set of
symbols available to JIT'd code), to dynamic code generation based on symbol
names, and even lazy compilation.

One immediate benefit of the symbol resolution rule is that we can now extend
the language by writing arbitrary C++ code to implement operations. For example,
if we add:

.. code-block:: c++

    #ifdef _WIN32
    #define DLLEXPORT __declspec(dllexport)
    #else
    #define DLLEXPORT
    #endif

    /// putchard - putchar that takes a double and returns 0.
    extern "C" DLLEXPORT double putchard(double X) {
      fputc((char)X, stderr);
      return 0;
    }

Note that on Windows we need to actually export the functions because
the dynamic symbol loader will use ``GetProcAddress`` to find the symbols.

Now we can produce simple output to the console by using things like:
"``extern putchard(x); putchard(120);``", which prints a lowercase 'x'
on the console (120 is the ASCII code for 'x'). Similar code could be
used to implement file I/O, console input, and many other capabilities
in Kaleidoscope.

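In the same spirit, the full listing at the end of this chapter also defines
a ``printd`` helper, which prints a double followed by a newline:

.. code-block:: c++

    /// printd - printf that takes a double and prints it as "%f\n",
    /// returning 0.
    extern "C" DLLEXPORT double printd(double X) {
      fprintf(stderr, "%f\n", X);
      return 0;
    }
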
This completes the JIT and optimizer chapter of the Kaleidoscope
tutorial. At this point, we can compile a non-Turing-complete
programming language, and optimize and JIT compile it in a user-driven
way. Next up we'll look into `extending the language with control flow
constructs <LangImpl05.html>`_, tackling some interesting LLVM IR issues
along the way.

Full Code Listing
=================

Here is the complete code listing for our running example, enhanced with
the LLVM JIT and optimizer. To build this example, use:

.. code-block:: bash

    # Compile
    clang++ -g toy.cpp `llvm-config --cxxflags --ldflags --system-libs --libs core mcjit native` -O3 -o toy
    # Run
    ./toy

If you are compiling this on Linux, make sure to add the "-rdynamic"
option as well. This makes sure that the external functions are resolved
properly at runtime.
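
That is, on Linux the compile line would look something like this:

.. code-block:: bash

    # -rdynamic exports the host program's symbols (e.g. putchard) so the
    # JIT's dlsym fallback can resolve them at runtime.
    clang++ -g -rdynamic toy.cpp `llvm-config --cxxflags --ldflags --system-libs --libs core mcjit native` -O3 -o toy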

Here is the code:

.. literalinclude:: ../../../examples/Kaleidoscope/Chapter4/toy.cpp
   :language: c++

`Next: Extending the language: control flow <LangImpl05.html>`_