1<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" 2 "http://www.w3.org/TR/html4/strict.dtd"> 3 4<html> 5<head> 6 <title>Kaleidoscope: Extending the Language: Control Flow</title> 7 <meta http-equiv="Content-Type" content="text/html; charset=utf-8"> 8 <meta name="author" content="Chris Lattner"> 9 <link rel="stylesheet" href="../llvm.css" type="text/css"> 10</head> 11 12<body> 13 14<h1>Kaleidoscope: Extending the Language: Control Flow</h1> 15 16<ul> 17<li><a href="index.html">Up to Tutorial Index</a></li> 18<li>Chapter 5 19 <ol> 20 <li><a href="#intro">Chapter 5 Introduction</a></li> 21 <li><a href="#ifthen">If/Then/Else</a> 22 <ol> 23 <li><a href="#iflexer">Lexer Extensions</a></li> 24 <li><a href="#ifast">AST Extensions</a></li> 25 <li><a href="#ifparser">Parser Extensions</a></li> 26 <li><a href="#ifir">LLVM IR</a></li> 27 <li><a href="#ifcodegen">Code Generation</a></li> 28 </ol> 29 </li> 30 <li><a href="#for">'for' Loop Expression</a> 31 <ol> 32 <li><a href="#forlexer">Lexer Extensions</a></li> 33 <li><a href="#forast">AST Extensions</a></li> 34 <li><a href="#forparser">Parser Extensions</a></li> 35 <li><a href="#forir">LLVM IR</a></li> 36 <li><a href="#forcodegen">Code Generation</a></li> 37 </ol> 38 </li> 39 <li><a href="#code">Full Code Listing</a></li> 40 </ol> 41</li> 42<li><a href="LangImpl6.html">Chapter 6</a>: Extending the Language: 43User-defined Operators</li> 44</ul> 45 46<div class="doc_author"> 47 <p>Written by <a href="mailto:sabre@nondot.org">Chris Lattner</a></p> 48</div> 49 50<!-- *********************************************************************** --> 51<h2><a name="intro">Chapter 5 Introduction</a></h2> 52<!-- *********************************************************************** --> 53 54<div> 55 56<p>Welcome to Chapter 5 of the "<a href="index.html">Implementing a language 57with LLVM</a>" tutorial. Parts 1-4 described the implementation of the simple 58Kaleidoscope language and included support for generating LLVM IR, followed by 59optimizations and a JIT compiler. Unfortunately, as presented, Kaleidoscope is 60mostly useless: it has no control flow other than call and return. This means 61that you can't have conditional branches in the code, significantly limiting its 62power. In this episode of "build that compiler", we'll extend Kaleidoscope to 63have an if/then/else expression plus a simple 'for' loop.</p> 64 65</div> 66 67<!-- *********************************************************************** --> 68<h2><a name="ifthen">If/Then/Else</a></h2> 69<!-- *********************************************************************** --> 70 71<div> 72 73<p> 74Extending Kaleidoscope to support if/then/else is quite straightforward. It 75basically requires adding support for this "new" concept to the lexer, 76parser, AST, and LLVM code emitter. This example is nice, because it shows how 77easy it is to "grow" a language over time, incrementally extending it as new 78ideas are discovered.</p> 79 80<p>Before we get going on "how" we add this extension, lets talk about "what" we 81want. The basic idea is that we want to be able to write this sort of thing: 82</p> 83 84<div class="doc_code"> 85<pre> 86def fib(x) 87 if x < 3 then 88 1 89 else 90 fib(x-1)+fib(x-2); 91</pre> 92</div> 93 94<p>In Kaleidoscope, every construct is an expression: there are no statements. 95As such, the if/then/else expression needs to return a value like any other. 96Since we're using a mostly functional form, we'll have it evaluate its 97conditional, then return the 'then' or 'else' value based on how the condition 98was resolved. This is very similar to the C "?:" expression.</p> 99 100<p>The semantics of the if/then/else expression is that it evaluates the 101condition to a boolean equality value: 0.0 is considered to be false and 102everything else is considered to be true. 103If the condition is true, the first subexpression is evaluated and returned, if 104the condition is false, the second subexpression is evaluated and returned. 105Since Kaleidoscope allows side-effects, this behavior is important to nail down. 106</p> 107 108<p>Now that we know what we "want", lets break this down into its constituent 109pieces.</p> 110 111<!-- ======================================================================= --> 112<h4><a name="iflexer">Lexer Extensions for If/Then/Else</a></h4> 113<!-- ======================================================================= --> 114 115 116<div> 117 118<p>The lexer extensions are straightforward. First we add new enum values 119for the relevant tokens:</p> 120 121<div class="doc_code"> 122<pre> 123 // control 124 tok_if = -6, tok_then = -7, tok_else = -8, 125</pre> 126</div> 127 128<p>Once we have that, we recognize the new keywords in the lexer. This is pretty simple 129stuff:</p> 130 131<div class="doc_code"> 132<pre> 133 ... 134 if (IdentifierStr == "def") return tok_def; 135 if (IdentifierStr == "extern") return tok_extern; 136 <b>if (IdentifierStr == "if") return tok_if; 137 if (IdentifierStr == "then") return tok_then; 138 if (IdentifierStr == "else") return tok_else;</b> 139 return tok_identifier; 140</pre> 141</div> 142 143</div> 144 145<!-- ======================================================================= --> 146<h4><a name="ifast">AST Extensions for If/Then/Else</a></h4> 147<!-- ======================================================================= --> 148 149<div> 150 151<p>To represent the new expression we add a new AST node for it:</p> 152 153<div class="doc_code"> 154<pre> 155/// IfExprAST - Expression class for if/then/else. 156class IfExprAST : public ExprAST { 157 ExprAST *Cond, *Then, *Else; 158public: 159 IfExprAST(ExprAST *cond, ExprAST *then, ExprAST *_else) 160 : Cond(cond), Then(then), Else(_else) {} 161 virtual Value *Codegen(); 162}; 163</pre> 164</div> 165 166<p>The AST node just has pointers to the various subexpressions.</p> 167 168</div> 169 170<!-- ======================================================================= --> 171<h4><a name="ifparser">Parser Extensions for If/Then/Else</a></h4> 172<!-- ======================================================================= --> 173 174<div> 175 176<p>Now that we have the relevant tokens coming from the lexer and we have the 177AST node to build, our parsing logic is relatively straightforward. First we 178define a new parsing function:</p> 179 180<div class="doc_code"> 181<pre> 182/// ifexpr ::= 'if' expression 'then' expression 'else' expression 183static ExprAST *ParseIfExpr() { 184 getNextToken(); // eat the if. 185 186 // condition. 187 ExprAST *Cond = ParseExpression(); 188 if (!Cond) return 0; 189 190 if (CurTok != tok_then) 191 return Error("expected then"); 192 getNextToken(); // eat the then 193 194 ExprAST *Then = ParseExpression(); 195 if (Then == 0) return 0; 196 197 if (CurTok != tok_else) 198 return Error("expected else"); 199 200 getNextToken(); 201 202 ExprAST *Else = ParseExpression(); 203 if (!Else) return 0; 204 205 return new IfExprAST(Cond, Then, Else); 206} 207</pre> 208</div> 209 210<p>Next we hook it up as a primary expression:</p> 211 212<div class="doc_code"> 213<pre> 214static ExprAST *ParsePrimary() { 215 switch (CurTok) { 216 default: return Error("unknown token when expecting an expression"); 217 case tok_identifier: return ParseIdentifierExpr(); 218 case tok_number: return ParseNumberExpr(); 219 case '(': return ParseParenExpr(); 220 <b>case tok_if: return ParseIfExpr();</b> 221 } 222} 223</pre> 224</div> 225 226</div> 227 228<!-- ======================================================================= --> 229<h4><a name="ifir">LLVM IR for If/Then/Else</a></h4> 230<!-- ======================================================================= --> 231 232<div> 233 234<p>Now that we have it parsing and building the AST, the final piece is adding 235LLVM code generation support. This is the most interesting part of the 236if/then/else example, because this is where it starts to introduce new concepts. 237All of the code above has been thoroughly described in previous chapters. 238</p> 239 240<p>To motivate the code we want to produce, lets take a look at a simple 241example. Consider:</p> 242 243<div class="doc_code"> 244<pre> 245extern foo(); 246extern bar(); 247def baz(x) if x then foo() else bar(); 248</pre> 249</div> 250 251<p>If you disable optimizations, the code you'll (soon) get from Kaleidoscope 252looks like this:</p> 253 254<div class="doc_code"> 255<pre> 256declare double @foo() 257 258declare double @bar() 259 260define double @baz(double %x) { 261entry: 262 %ifcond = fcmp one double %x, 0.000000e+00 263 br i1 %ifcond, label %then, label %else 264 265then: ; preds = %entry 266 %calltmp = call double @foo() 267 br label %ifcont 268 269else: ; preds = %entry 270 %calltmp1 = call double @bar() 271 br label %ifcont 272 273ifcont: ; preds = %else, %then 274 %iftmp = phi double [ %calltmp, %then ], [ %calltmp1, %else ] 275 ret double %iftmp 276} 277</pre> 278</div> 279 280<p>To visualize the control flow graph, you can use a nifty feature of the LLVM 281'<a href="http://llvm.org/cmds/opt.html">opt</a>' tool. If you put this LLVM IR 282into "t.ll" and run "<tt>llvm-as < t.ll | opt -analyze -view-cfg</tt>", <a 283href="../ProgrammersManual.html#ViewGraph">a window will pop up</a> and you'll 284see this graph:</p> 285 286<div style="text-align: center"><img src="LangImpl5-cfg.png" alt="Example CFG" width="423" 287height="315"></div> 288 289<p>Another way to get this is to call "<tt>F->viewCFG()</tt>" or 290"<tt>F->viewCFGOnly()</tt>" (where F is a "<tt>Function*</tt>") either by 291inserting actual calls into the code and recompiling or by calling these in the 292debugger. LLVM has many nice features for visualizing various graphs.</p> 293 294<p>Getting back to the generated code, it is fairly simple: the entry block 295evaluates the conditional expression ("x" in our case here) and compares the 296result to 0.0 with the "<tt><a href="../LangRef.html#i_fcmp">fcmp</a> one</tt>" 297instruction ('one' is "Ordered and Not Equal"). Based on the result of this 298expression, the code jumps to either the "then" or "else" blocks, which contain 299the expressions for the true/false cases.</p> 300 301<p>Once the then/else blocks are finished executing, they both branch back to the 302'ifcont' block to execute the code that happens after the if/then/else. In this 303case the only thing left to do is to return to the caller of the function. The 304question then becomes: how does the code know which expression to return?</p> 305 306<p>The answer to this question involves an important SSA operation: the 307<a href="http://en.wikipedia.org/wiki/Static_single_assignment_form">Phi 308operation</a>. If you're not familiar with SSA, <a 309href="http://en.wikipedia.org/wiki/Static_single_assignment_form">the wikipedia 310article</a> is a good introduction and there are various other introductions to 311it available on your favorite search engine. The short version is that 312"execution" of the Phi operation requires "remembering" which block control came 313from. The Phi operation takes on the value corresponding to the input control 314block. In this case, if control comes in from the "then" block, it gets the 315value of "calltmp". If control comes from the "else" block, it gets the value 316of "calltmp1".</p> 317 318<p>At this point, you are probably starting to think "Oh no! This means my 319simple and elegant front-end will have to start generating SSA form in order to 320use LLVM!". Fortunately, this is not the case, and we strongly advise 321<em>not</em> implementing an SSA construction algorithm in your front-end 322unless there is an amazingly good reason to do so. In practice, there are two 323sorts of values that float around in code written for your average imperative 324programming language that might need Phi nodes:</p> 325 326<ol> 327<li>Code that involves user variables: <tt>x = 1; x = x + 1; </tt></li> 328<li>Values that are implicit in the structure of your AST, such as the Phi node 329in this case.</li> 330</ol> 331 332<p>In <a href="LangImpl7.html">Chapter 7</a> of this tutorial ("mutable 333variables"), we'll talk about #1 334in depth. For now, just believe me that you don't need SSA construction to 335handle this case. For #2, you have the choice of using the techniques that we will 336describe for #1, or you can insert Phi nodes directly, if convenient. In this 337case, it is really really easy to generate the Phi node, so we choose to do it 338directly.</p> 339 340<p>Okay, enough of the motivation and overview, lets generate code!</p> 341 342</div> 343 344<!-- ======================================================================= --> 345<h4><a name="ifcodegen">Code Generation for If/Then/Else</a></h4> 346<!-- ======================================================================= --> 347 348<div> 349 350<p>In order to generate code for this, we implement the <tt>Codegen</tt> method 351for <tt>IfExprAST</tt>:</p> 352 353<div class="doc_code"> 354<pre> 355Value *IfExprAST::Codegen() { 356 Value *CondV = Cond->Codegen(); 357 if (CondV == 0) return 0; 358 359 // Convert condition to a bool by comparing equal to 0.0. 360 CondV = Builder.CreateFCmpONE(CondV, 361 ConstantFP::get(getGlobalContext(), APFloat(0.0)), 362 "ifcond"); 363</pre> 364</div> 365 366<p>This code is straightforward and similar to what we saw before. We emit the 367expression for the condition, then compare that value to zero to get a truth 368value as a 1-bit (bool) value.</p> 369 370<div class="doc_code"> 371<pre> 372 Function *TheFunction = Builder.GetInsertBlock()->getParent(); 373 374 // Create blocks for the then and else cases. Insert the 'then' block at the 375 // end of the function. 376 BasicBlock *ThenBB = BasicBlock::Create(getGlobalContext(), "then", TheFunction); 377 BasicBlock *ElseBB = BasicBlock::Create(getGlobalContext(), "else"); 378 BasicBlock *MergeBB = BasicBlock::Create(getGlobalContext(), "ifcont"); 379 380 Builder.CreateCondBr(CondV, ThenBB, ElseBB); 381</pre> 382</div> 383 384<p>This code creates the basic blocks that are related to the if/then/else 385statement, and correspond directly to the blocks in the example above. The 386first line gets the current Function object that is being built. It 387gets this by asking the builder for the current BasicBlock, and asking that 388block for its "parent" (the function it is currently embedded into).</p> 389 390<p>Once it has that, it creates three blocks. Note that it passes "TheFunction" 391into the constructor for the "then" block. This causes the constructor to 392automatically insert the new block into the end of the specified function. The 393other two blocks are created, but aren't yet inserted into the function.</p> 394 395<p>Once the blocks are created, we can emit the conditional branch that chooses 396between them. Note that creating new blocks does not implicitly affect the 397IRBuilder, so it is still inserting into the block that the condition 398went into. Also note that it is creating a branch to the "then" block and the 399"else" block, even though the "else" block isn't inserted into the function yet. 400This is all ok: it is the standard way that LLVM supports forward 401references.</p> 402 403<div class="doc_code"> 404<pre> 405 // Emit then value. 406 Builder.SetInsertPoint(ThenBB); 407 408 Value *ThenV = Then->Codegen(); 409 if (ThenV == 0) return 0; 410 411 Builder.CreateBr(MergeBB); 412 // Codegen of 'Then' can change the current block, update ThenBB for the PHI. 413 ThenBB = Builder.GetInsertBlock(); 414</pre> 415</div> 416 417<p>After the conditional branch is inserted, we move the builder to start 418inserting into the "then" block. Strictly speaking, this call moves the 419insertion point to be at the end of the specified block. However, since the 420"then" block is empty, it also starts out by inserting at the beginning of the 421block. :)</p> 422 423<p>Once the insertion point is set, we recursively codegen the "then" expression 424from the AST. To finish off the "then" block, we create an unconditional branch 425to the merge block. One interesting (and very important) aspect of the LLVM IR 426is that it <a href="../LangRef.html#functionstructure">requires all basic blocks 427to be "terminated"</a> with a <a href="../LangRef.html#terminators">control flow 428instruction</a> such as return or branch. This means that all control flow, 429<em>including fall throughs</em> must be made explicit in the LLVM IR. If you 430violate this rule, the verifier will emit an error.</p> 431 432<p>The final line here is quite subtle, but is very important. The basic issue 433is that when we create the Phi node in the merge block, we need to set up the 434block/value pairs that indicate how the Phi will work. Importantly, the Phi 435node expects to have an entry for each predecessor of the block in the CFG. Why 436then, are we getting the current block when we just set it to ThenBB 5 lines 437above? The problem is that the "Then" expression may actually itself change the 438block that the Builder is emitting into if, for example, it contains a nested 439"if/then/else" expression. Because calling Codegen recursively could 440arbitrarily change the notion of the current block, we are required to get an 441up-to-date value for code that will set up the Phi node.</p> 442 443<div class="doc_code"> 444<pre> 445 // Emit else block. 446 TheFunction->getBasicBlockList().push_back(ElseBB); 447 Builder.SetInsertPoint(ElseBB); 448 449 Value *ElseV = Else->Codegen(); 450 if (ElseV == 0) return 0; 451 452 Builder.CreateBr(MergeBB); 453 // Codegen of 'Else' can change the current block, update ElseBB for the PHI. 454 ElseBB = Builder.GetInsertBlock(); 455</pre> 456</div> 457 458<p>Code generation for the 'else' block is basically identical to codegen for 459the 'then' block. The only significant difference is the first line, which adds 460the 'else' block to the function. Recall previously that the 'else' block was 461created, but not added to the function. Now that the 'then' and 'else' blocks 462are emitted, we can finish up with the merge code:</p> 463 464<div class="doc_code"> 465<pre> 466 // Emit merge block. 467 TheFunction->getBasicBlockList().push_back(MergeBB); 468 Builder.SetInsertPoint(MergeBB); 469 PHINode *PN = Builder.CreatePHI(Type::getDoubleTy(getGlobalContext()), 2, 470 "iftmp"); 471 472 PN->addIncoming(ThenV, ThenBB); 473 PN->addIncoming(ElseV, ElseBB); 474 return PN; 475} 476</pre> 477</div> 478 479<p>The first two lines here are now familiar: the first adds the "merge" block 480to the Function object (it was previously floating, like the else block above). 481The second block changes the insertion point so that newly created code will go 482into the "merge" block. Once that is done, we need to create the PHI node and 483set up the block/value pairs for the PHI.</p> 484 485<p>Finally, the CodeGen function returns the phi node as the value computed by 486the if/then/else expression. In our example above, this returned value will 487feed into the code for the top-level function, which will create the return 488instruction.</p> 489 490<p>Overall, we now have the ability to execute conditional code in 491Kaleidoscope. With this extension, Kaleidoscope is a fairly complete language 492that can calculate a wide variety of numeric functions. Next up we'll add 493another useful expression that is familiar from non-functional languages...</p> 494 495</div> 496 497</div> 498 499<!-- *********************************************************************** --> 500<h2><a name="for">'for' Loop Expression</a></h2> 501<!-- *********************************************************************** --> 502 503<div> 504 505<p>Now that we know how to add basic control flow constructs to the language, 506we have the tools to add more powerful things. Lets add something more 507aggressive, a 'for' expression:</p> 508 509<div class="doc_code"> 510<pre> 511 extern putchard(char) 512 def printstar(n) 513 for i = 1, i < n, 1.0 in 514 putchard(42); # ascii 42 = '*' 515 516 # print 100 '*' characters 517 printstar(100); 518</pre> 519</div> 520 521<p>This expression defines a new variable ("i" in this case) which iterates from 522a starting value, while the condition ("i < n" in this case) is true, 523incrementing by an optional step value ("1.0" in this case). If the step value 524is omitted, it defaults to 1.0. While the loop is true, it executes its 525body expression. Because we don't have anything better to return, we'll just 526define the loop as always returning 0.0. In the future when we have mutable 527variables, it will get more useful.</p> 528 529<p>As before, lets talk about the changes that we need to Kaleidoscope to 530support this.</p> 531 532<!-- ======================================================================= --> 533<h4><a name="forlexer">Lexer Extensions for the 'for' Loop</a></h4> 534<!-- ======================================================================= --> 535 536<div> 537 538<p>The lexer extensions are the same sort of thing as for if/then/else:</p> 539 540<div class="doc_code"> 541<pre> 542 ... in enum Token ... 543 // control 544 tok_if = -6, tok_then = -7, tok_else = -8, 545<b> tok_for = -9, tok_in = -10</b> 546 547 ... in gettok ... 548 if (IdentifierStr == "def") return tok_def; 549 if (IdentifierStr == "extern") return tok_extern; 550 if (IdentifierStr == "if") return tok_if; 551 if (IdentifierStr == "then") return tok_then; 552 if (IdentifierStr == "else") return tok_else; 553 <b>if (IdentifierStr == "for") return tok_for; 554 if (IdentifierStr == "in") return tok_in;</b> 555 return tok_identifier; 556</pre> 557</div> 558 559</div> 560 561<!-- ======================================================================= --> 562<h4><a name="forast">AST Extensions for the 'for' Loop</a></h4> 563<!-- ======================================================================= --> 564 565<div> 566 567<p>The AST node is just as simple. It basically boils down to capturing 568the variable name and the constituent expressions in the node.</p> 569 570<div class="doc_code"> 571<pre> 572/// ForExprAST - Expression class for for/in. 573class ForExprAST : public ExprAST { 574 std::string VarName; 575 ExprAST *Start, *End, *Step, *Body; 576public: 577 ForExprAST(const std::string &varname, ExprAST *start, ExprAST *end, 578 ExprAST *step, ExprAST *body) 579 : VarName(varname), Start(start), End(end), Step(step), Body(body) {} 580 virtual Value *Codegen(); 581}; 582</pre> 583</div> 584 585</div> 586 587<!-- ======================================================================= --> 588<h4><a name="forparser">Parser Extensions for the 'for' Loop</a></h4> 589<!-- ======================================================================= --> 590 591<div> 592 593<p>The parser code is also fairly standard. The only interesting thing here is 594handling of the optional step value. The parser code handles it by checking to 595see if the second comma is present. If not, it sets the step value to null in 596the AST node:</p> 597 598<div class="doc_code"> 599<pre> 600/// forexpr ::= 'for' identifier '=' expr ',' expr (',' expr)? 'in' expression 601static ExprAST *ParseForExpr() { 602 getNextToken(); // eat the for. 603 604 if (CurTok != tok_identifier) 605 return Error("expected identifier after for"); 606 607 std::string IdName = IdentifierStr; 608 getNextToken(); // eat identifier. 609 610 if (CurTok != '=') 611 return Error("expected '=' after for"); 612 getNextToken(); // eat '='. 613 614 615 ExprAST *Start = ParseExpression(); 616 if (Start == 0) return 0; 617 if (CurTok != ',') 618 return Error("expected ',' after for start value"); 619 getNextToken(); 620 621 ExprAST *End = ParseExpression(); 622 if (End == 0) return 0; 623 624 // The step value is optional. 625 ExprAST *Step = 0; 626 if (CurTok == ',') { 627 getNextToken(); 628 Step = ParseExpression(); 629 if (Step == 0) return 0; 630 } 631 632 if (CurTok != tok_in) 633 return Error("expected 'in' after for"); 634 getNextToken(); // eat 'in'. 635 636 ExprAST *Body = ParseExpression(); 637 if (Body == 0) return 0; 638 639 return new ForExprAST(IdName, Start, End, Step, Body); 640} 641</pre> 642</div> 643 644</div> 645 646<!-- ======================================================================= --> 647<h4><a name="forir">LLVM IR for the 'for' Loop</a></h4> 648<!-- ======================================================================= --> 649 650<div> 651 652<p>Now we get to the good part: the LLVM IR we want to generate for this thing. 653With the simple example above, we get this LLVM IR (note that this dump is 654generated with optimizations disabled for clarity): 655</p> 656 657<div class="doc_code"> 658<pre> 659declare double @putchard(double) 660 661define double @printstar(double %n) { 662entry: 663 ; initial value = 1.0 (inlined into phi) 664 br label %loop 665 666loop: ; preds = %loop, %entry 667 %i = phi double [ 1.000000e+00, %entry ], [ %nextvar, %loop ] 668 ; body 669 %calltmp = call double @putchard(double 4.200000e+01) 670 ; increment 671 %nextvar = fadd double %i, 1.000000e+00 672 673 ; termination test 674 %cmptmp = fcmp ult double %i, %n 675 %booltmp = uitofp i1 %cmptmp to double 676 %loopcond = fcmp one double %booltmp, 0.000000e+00 677 br i1 %loopcond, label %loop, label %afterloop 678 679afterloop: ; preds = %loop 680 ; loop always returns 0.0 681 ret double 0.000000e+00 682} 683</pre> 684</div> 685 686<p>This loop contains all the same constructs we saw before: a phi node, several 687expressions, and some basic blocks. Lets see how this fits together.</p> 688 689</div> 690 691<!-- ======================================================================= --> 692<h4><a name="forcodegen">Code Generation for the 'for' Loop</a></h4> 693<!-- ======================================================================= --> 694 695<div> 696 697<p>The first part of Codegen is very simple: we just output the start expression 698for the loop value:</p> 699 700<div class="doc_code"> 701<pre> 702Value *ForExprAST::Codegen() { 703 // Emit the start code first, without 'variable' in scope. 704 Value *StartVal = Start->Codegen(); 705 if (StartVal == 0) return 0; 706</pre> 707</div> 708 709<p>With this out of the way, the next step is to set up the LLVM basic block 710for the start of the loop body. In the case above, the whole loop body is one 711block, but remember that the body code itself could consist of multiple blocks 712(e.g. if it contains an if/then/else or a for/in expression).</p> 713 714<div class="doc_code"> 715<pre> 716 // Make the new basic block for the loop header, inserting after current 717 // block. 718 Function *TheFunction = Builder.GetInsertBlock()->getParent(); 719 BasicBlock *PreheaderBB = Builder.GetInsertBlock(); 720 BasicBlock *LoopBB = BasicBlock::Create(getGlobalContext(), "loop", TheFunction); 721 722 // Insert an explicit fall through from the current block to the LoopBB. 723 Builder.CreateBr(LoopBB); 724</pre> 725</div> 726 727<p>This code is similar to what we saw for if/then/else. Because we will need 728it to create the Phi node, we remember the block that falls through into the 729loop. Once we have that, we create the actual block that starts the loop and 730create an unconditional branch for the fall-through between the two blocks.</p> 731 732<div class="doc_code"> 733<pre> 734 // Start insertion in LoopBB. 735 Builder.SetInsertPoint(LoopBB); 736 737 // Start the PHI node with an entry for Start. 738 PHINode *Variable = Builder.CreatePHI(Type::getDoubleTy(getGlobalContext()), 2, VarName.c_str()); 739 Variable->addIncoming(StartVal, PreheaderBB); 740</pre> 741</div> 742 743<p>Now that the "preheader" for the loop is set up, we switch to emitting code 744for the loop body. To begin with, we move the insertion point and create the 745PHI node for the loop induction variable. Since we already know the incoming 746value for the starting value, we add it to the Phi node. Note that the Phi will 747eventually get a second value for the backedge, but we can't set it up yet 748(because it doesn't exist!).</p> 749 750<div class="doc_code"> 751<pre> 752 // Within the loop, the variable is defined equal to the PHI node. If it 753 // shadows an existing variable, we have to restore it, so save it now. 754 Value *OldVal = NamedValues[VarName]; 755 NamedValues[VarName] = Variable; 756 757 // Emit the body of the loop. This, like any other expr, can change the 758 // current BB. Note that we ignore the value computed by the body, but don't 759 // allow an error. 760 if (Body->Codegen() == 0) 761 return 0; 762</pre> 763</div> 764 765<p>Now the code starts to get more interesting. Our 'for' loop introduces a new 766variable to the symbol table. This means that our symbol table can now contain 767either function arguments or loop variables. To handle this, before we codegen 768the body of the loop, we add the loop variable as the current value for its 769name. Note that it is possible that there is a variable of the same name in the 770outer scope. It would be easy to make this an error (emit an error and return 771null if there is already an entry for VarName) but we choose to allow shadowing 772of variables. In order to handle this correctly, we remember the Value that 773we are potentially shadowing in <tt>OldVal</tt> (which will be null if there is 774no shadowed variable).</p> 775 776<p>Once the loop variable is set into the symbol table, the code recursively 777codegen's the body. This allows the body to use the loop variable: any 778references to it will naturally find it in the symbol table.</p> 779 780<div class="doc_code"> 781<pre> 782 // Emit the step value. 783 Value *StepVal; 784 if (Step) { 785 StepVal = Step->Codegen(); 786 if (StepVal == 0) return 0; 787 } else { 788 // If not specified, use 1.0. 789 StepVal = ConstantFP::get(getGlobalContext(), APFloat(1.0)); 790 } 791 792 Value *NextVar = Builder.CreateFAdd(Variable, StepVal, "nextvar"); 793</pre> 794</div> 795 796<p>Now that the body is emitted, we compute the next value of the iteration 797variable by adding the step value, or 1.0 if it isn't present. '<tt>NextVar</tt>' 798will be the value of the loop variable on the next iteration of the loop.</p> 799 800<div class="doc_code"> 801<pre> 802 // Compute the end condition. 803 Value *EndCond = End->Codegen(); 804 if (EndCond == 0) return EndCond; 805 806 // Convert condition to a bool by comparing equal to 0.0. 807 EndCond = Builder.CreateFCmpONE(EndCond, 808 ConstantFP::get(getGlobalContext(), APFloat(0.0)), 809 "loopcond"); 810</pre> 811</div> 812 813<p>Finally, we evaluate the exit value of the loop, to determine whether the 814loop should exit. This mirrors the condition evaluation for the if/then/else 815statement.</p> 816 817<div class="doc_code"> 818<pre> 819 // Create the "after loop" block and insert it. 820 BasicBlock *LoopEndBB = Builder.GetInsertBlock(); 821 BasicBlock *AfterBB = BasicBlock::Create(getGlobalContext(), "afterloop", TheFunction); 822 823 // Insert the conditional branch into the end of LoopEndBB. 824 Builder.CreateCondBr(EndCond, LoopBB, AfterBB); 825 826 // Any new code will be inserted in AfterBB. 827 Builder.SetInsertPoint(AfterBB); 828</pre> 829</div> 830 831<p>With the code for the body of the loop complete, we just need to finish up 832the control flow for it. This code remembers the end block (for the phi node), 833then creates the block for the loop exit ("afterloop"). Based on the value of 834the exit condition, it creates a conditional branch that chooses between 835executing the loop again and exiting the loop. Any future code is emitted in 836the "afterloop" block, so it sets the insertion position to it.</p> 837 838<div class="doc_code"> 839<pre> 840 // Add a new entry to the PHI node for the backedge. 841 Variable->addIncoming(NextVar, LoopEndBB); 842 843 // Restore the unshadowed variable. 844 if (OldVal) 845 NamedValues[VarName] = OldVal; 846 else 847 NamedValues.erase(VarName); 848 849 // for expr always returns 0.0. 850 return Constant::getNullValue(Type::getDoubleTy(getGlobalContext())); 851} 852</pre> 853</div> 854 855<p>The final code handles various cleanups: now that we have the "NextVar" 856value, we can add the incoming value to the loop PHI node. After that, we 857remove the loop variable from the symbol table, so that it isn't in scope after 858the for loop. Finally, code generation of the for loop always returns 0.0, so 859that is what we return from <tt>ForExprAST::Codegen</tt>.</p> 860 861<p>With this, we conclude the "adding control flow to Kaleidoscope" chapter of 862the tutorial. In this chapter we added two control flow constructs, and used them to motivate a couple of aspects of the LLVM IR that are important for front-end implementors 863to know. In the next chapter of our saga, we will get a bit crazier and add 864<a href="LangImpl6.html">user-defined operators</a> to our poor innocent 865language.</p> 866 867</div> 868 869</div> 870 871<!-- *********************************************************************** --> 872<h2><a name="code">Full Code Listing</a></h2> 873<!-- *********************************************************************** --> 874 875<div> 876 877<p> 878Here is the complete code listing for our running example, enhanced with the 879if/then/else and for expressions.. To build this example, use: 880</p> 881 882<div class="doc_code"> 883<pre> 884# Compile 885clang++ -g toy.cpp `llvm-config --cppflags --ldflags --libs core jit native` -O3 -o toy 886# Run 887./toy 888</pre> 889</div> 890 891<p>Here is the code:</p> 892 893<div class="doc_code"> 894<pre> 895#include "llvm/DerivedTypes.h" 896#include "llvm/ExecutionEngine/ExecutionEngine.h" 897#include "llvm/ExecutionEngine/JIT.h" 898#include "llvm/LLVMContext.h" 899#include "llvm/Module.h" 900#include "llvm/PassManager.h" 901#include "llvm/Analysis/Verifier.h" 902#include "llvm/Analysis/Passes.h" 903#include "llvm/Target/TargetData.h" 904#include "llvm/Transforms/Scalar.h" 905#include "llvm/Support/IRBuilder.h" 906#include "llvm/Support/TargetSelect.h" 907#include <cstdio> 908#include <string> 909#include <map> 910#include <vector> 911using namespace llvm; 912 913//===----------------------------------------------------------------------===// 914// Lexer 915//===----------------------------------------------------------------------===// 916 917// The lexer returns tokens [0-255] if it is an unknown character, otherwise one 918// of these for known things. 919enum Token { 920 tok_eof = -1, 921 922 // commands 923 tok_def = -2, tok_extern = -3, 924 925 // primary 926 tok_identifier = -4, tok_number = -5, 927 928 // control 929 tok_if = -6, tok_then = -7, tok_else = -8, 930 tok_for = -9, tok_in = -10 931}; 932 933static std::string IdentifierStr; // Filled in if tok_identifier 934static double NumVal; // Filled in if tok_number 935 936/// gettok - Return the next token from standard input. 937static int gettok() { 938 static int LastChar = ' '; 939 940 // Skip any whitespace. 941 while (isspace(LastChar)) 942 LastChar = getchar(); 943 944 if (isalpha(LastChar)) { // identifier: [a-zA-Z][a-zA-Z0-9]* 945 IdentifierStr = LastChar; 946 while (isalnum((LastChar = getchar()))) 947 IdentifierStr += LastChar; 948 949 if (IdentifierStr == "def") return tok_def; 950 if (IdentifierStr == "extern") return tok_extern; 951 if (IdentifierStr == "if") return tok_if; 952 if (IdentifierStr == "then") return tok_then; 953 if (IdentifierStr == "else") return tok_else; 954 if (IdentifierStr == "for") return tok_for; 955 if (IdentifierStr == "in") return tok_in; 956 return tok_identifier; 957 } 958 959 if (isdigit(LastChar) || LastChar == '.') { // Number: [0-9.]+ 960 std::string NumStr; 961 do { 962 NumStr += LastChar; 963 LastChar = getchar(); 964 } while (isdigit(LastChar) || LastChar == '.'); 965 966 NumVal = strtod(NumStr.c_str(), 0); 967 return tok_number; 968 } 969 970 if (LastChar == '#') { 971 // Comment until end of line. 972 do LastChar = getchar(); 973 while (LastChar != EOF && LastChar != '\n' && LastChar != '\r'); 974 975 if (LastChar != EOF) 976 return gettok(); 977 } 978 979 // Check for end of file. Don't eat the EOF. 980 if (LastChar == EOF) 981 return tok_eof; 982 983 // Otherwise, just return the character as its ascii value. 984 int ThisChar = LastChar; 985 LastChar = getchar(); 986 return ThisChar; 987} 988 989//===----------------------------------------------------------------------===// 990// Abstract Syntax Tree (aka Parse Tree) 991//===----------------------------------------------------------------------===// 992 993/// ExprAST - Base class for all expression nodes. 994class ExprAST { 995public: 996 virtual ~ExprAST() {} 997 virtual Value *Codegen() = 0; 998}; 999 1000/// NumberExprAST - Expression class for numeric literals like "1.0". 1001class NumberExprAST : public ExprAST { 1002 double Val; 1003public: 1004 NumberExprAST(double val) : Val(val) {} 1005 virtual Value *Codegen(); 1006}; 1007 1008/// VariableExprAST - Expression class for referencing a variable, like "a". 1009class VariableExprAST : public ExprAST { 1010 std::string Name; 1011public: 1012 VariableExprAST(const std::string &name) : Name(name) {} 1013 virtual Value *Codegen(); 1014}; 1015 1016/// BinaryExprAST - Expression class for a binary operator. 1017class BinaryExprAST : public ExprAST { 1018 char Op; 1019 ExprAST *LHS, *RHS; 1020public: 1021 BinaryExprAST(char op, ExprAST *lhs, ExprAST *rhs) 1022 : Op(op), LHS(lhs), RHS(rhs) {} 1023 virtual Value *Codegen(); 1024}; 1025 1026/// CallExprAST - Expression class for function calls. 1027class CallExprAST : public ExprAST { 1028 std::string Callee; 1029 std::vector<ExprAST*> Args; 1030public: 1031 CallExprAST(const std::string &callee, std::vector<ExprAST*> &args) 1032 : Callee(callee), Args(args) {} 1033 virtual Value *Codegen(); 1034}; 1035 1036/// IfExprAST - Expression class for if/then/else. 1037class IfExprAST : public ExprAST { 1038 ExprAST *Cond, *Then, *Else; 1039public: 1040 IfExprAST(ExprAST *cond, ExprAST *then, ExprAST *_else) 1041 : Cond(cond), Then(then), Else(_else) {} 1042 virtual Value *Codegen(); 1043}; 1044 1045/// ForExprAST - Expression class for for/in. 1046class ForExprAST : public ExprAST { 1047 std::string VarName; 1048 ExprAST *Start, *End, *Step, *Body; 1049public: 1050 ForExprAST(const std::string &varname, ExprAST *start, ExprAST *end, 1051 ExprAST *step, ExprAST *body) 1052 : VarName(varname), Start(start), End(end), Step(step), Body(body) {} 1053 virtual Value *Codegen(); 1054}; 1055 1056/// PrototypeAST - This class represents the "prototype" for a function, 1057/// which captures its name, and its argument names (thus implicitly the number 1058/// of arguments the function takes). 1059class PrototypeAST { 1060 std::string Name; 1061 std::vector<std::string> Args; 1062public: 1063 PrototypeAST(const std::string &name, const std::vector<std::string> &args) 1064 : Name(name), Args(args) {} 1065 1066 Function *Codegen(); 1067}; 1068 1069/// FunctionAST - This class represents a function definition itself. 1070class FunctionAST { 1071 PrototypeAST *Proto; 1072 ExprAST *Body; 1073public: 1074 FunctionAST(PrototypeAST *proto, ExprAST *body) 1075 : Proto(proto), Body(body) {} 1076 1077 Function *Codegen(); 1078}; 1079 1080//===----------------------------------------------------------------------===// 1081// Parser 1082//===----------------------------------------------------------------------===// 1083 1084/// CurTok/getNextToken - Provide a simple token buffer. CurTok is the current 1085/// token the parser is looking at. getNextToken reads another token from the 1086/// lexer and updates CurTok with its results. 1087static int CurTok; 1088static int getNextToken() { 1089 return CurTok = gettok(); 1090} 1091 1092/// BinopPrecedence - This holds the precedence for each binary operator that is 1093/// defined. 1094static std::map<char, int> BinopPrecedence; 1095 1096/// GetTokPrecedence - Get the precedence of the pending binary operator token. 1097static int GetTokPrecedence() { 1098 if (!isascii(CurTok)) 1099 return -1; 1100 1101 // Make sure it's a declared binop. 1102 int TokPrec = BinopPrecedence[CurTok]; 1103 if (TokPrec <= 0) return -1; 1104 return TokPrec; 1105} 1106 1107/// Error* - These are little helper functions for error handling. 1108ExprAST *Error(const char *Str) { fprintf(stderr, "Error: %s\n", Str);return 0;} 1109PrototypeAST *ErrorP(const char *Str) { Error(Str); return 0; } 1110FunctionAST *ErrorF(const char *Str) { Error(Str); return 0; } 1111 1112static ExprAST *ParseExpression(); 1113 1114/// identifierexpr 1115/// ::= identifier 1116/// ::= identifier '(' expression* ')' 1117static ExprAST *ParseIdentifierExpr() { 1118 std::string IdName = IdentifierStr; 1119 1120 getNextToken(); // eat identifier. 1121 1122 if (CurTok != '(') // Simple variable ref. 1123 return new VariableExprAST(IdName); 1124 1125 // Call. 1126 getNextToken(); // eat ( 1127 std::vector<ExprAST*> Args; 1128 if (CurTok != ')') { 1129 while (1) { 1130 ExprAST *Arg = ParseExpression(); 1131 if (!Arg) return 0; 1132 Args.push_back(Arg); 1133 1134 if (CurTok == ')') break; 1135 1136 if (CurTok != ',') 1137 return Error("Expected ')' or ',' in argument list"); 1138 getNextToken(); 1139 } 1140 } 1141 1142 // Eat the ')'. 1143 getNextToken(); 1144 1145 return new CallExprAST(IdName, Args); 1146} 1147 1148/// numberexpr ::= number 1149static ExprAST *ParseNumberExpr() { 1150 ExprAST *Result = new NumberExprAST(NumVal); 1151 getNextToken(); // consume the number 1152 return Result; 1153} 1154 1155/// parenexpr ::= '(' expression ')' 1156static ExprAST *ParseParenExpr() { 1157 getNextToken(); // eat (. 1158 ExprAST *V = ParseExpression(); 1159 if (!V) return 0; 1160 1161 if (CurTok != ')') 1162 return Error("expected ')'"); 1163 getNextToken(); // eat ). 1164 return V; 1165} 1166 1167/// ifexpr ::= 'if' expression 'then' expression 'else' expression 1168static ExprAST *ParseIfExpr() { 1169 getNextToken(); // eat the if. 1170 1171 // condition. 1172 ExprAST *Cond = ParseExpression(); 1173 if (!Cond) return 0; 1174 1175 if (CurTok != tok_then) 1176 return Error("expected then"); 1177 getNextToken(); // eat the then 1178 1179 ExprAST *Then = ParseExpression(); 1180 if (Then == 0) return 0; 1181 1182 if (CurTok != tok_else) 1183 return Error("expected else"); 1184 1185 getNextToken(); 1186 1187 ExprAST *Else = ParseExpression(); 1188 if (!Else) return 0; 1189 1190 return new IfExprAST(Cond, Then, Else); 1191} 1192 1193/// forexpr ::= 'for' identifier '=' expr ',' expr (',' expr)? 'in' expression 1194static ExprAST *ParseForExpr() { 1195 getNextToken(); // eat the for. 1196 1197 if (CurTok != tok_identifier) 1198 return Error("expected identifier after for"); 1199 1200 std::string IdName = IdentifierStr; 1201 getNextToken(); // eat identifier. 1202 1203 if (CurTok != '=') 1204 return Error("expected '=' after for"); 1205 getNextToken(); // eat '='. 1206 1207 1208 ExprAST *Start = ParseExpression(); 1209 if (Start == 0) return 0; 1210 if (CurTok != ',') 1211 return Error("expected ',' after for start value"); 1212 getNextToken(); 1213 1214 ExprAST *End = ParseExpression(); 1215 if (End == 0) return 0; 1216 1217 // The step value is optional. 1218 ExprAST *Step = 0; 1219 if (CurTok == ',') { 1220 getNextToken(); 1221 Step = ParseExpression(); 1222 if (Step == 0) return 0; 1223 } 1224 1225 if (CurTok != tok_in) 1226 return Error("expected 'in' after for"); 1227 getNextToken(); // eat 'in'. 1228 1229 ExprAST *Body = ParseExpression(); 1230 if (Body == 0) return 0; 1231 1232 return new ForExprAST(IdName, Start, End, Step, Body); 1233} 1234 1235/// primary 1236/// ::= identifierexpr 1237/// ::= numberexpr 1238/// ::= parenexpr 1239/// ::= ifexpr 1240/// ::= forexpr 1241static ExprAST *ParsePrimary() { 1242 switch (CurTok) { 1243 default: return Error("unknown token when expecting an expression"); 1244 case tok_identifier: return ParseIdentifierExpr(); 1245 case tok_number: return ParseNumberExpr(); 1246 case '(': return ParseParenExpr(); 1247 case tok_if: return ParseIfExpr(); 1248 case tok_for: return ParseForExpr(); 1249 } 1250} 1251 1252/// binoprhs 1253/// ::= ('+' primary)* 1254static ExprAST *ParseBinOpRHS(int ExprPrec, ExprAST *LHS) { 1255 // If this is a binop, find its precedence. 1256 while (1) { 1257 int TokPrec = GetTokPrecedence(); 1258 1259 // If this is a binop that binds at least as tightly as the current binop, 1260 // consume it, otherwise we are done. 1261 if (TokPrec < ExprPrec) 1262 return LHS; 1263 1264 // Okay, we know this is a binop. 1265 int BinOp = CurTok; 1266 getNextToken(); // eat binop 1267 1268 // Parse the primary expression after the binary operator. 1269 ExprAST *RHS = ParsePrimary(); 1270 if (!RHS) return 0; 1271 1272 // If BinOp binds less tightly with RHS than the operator after RHS, let 1273 // the pending operator take RHS as its LHS. 1274 int NextPrec = GetTokPrecedence(); 1275 if (TokPrec < NextPrec) { 1276 RHS = ParseBinOpRHS(TokPrec+1, RHS); 1277 if (RHS == 0) return 0; 1278 } 1279 1280 // Merge LHS/RHS. 1281 LHS = new BinaryExprAST(BinOp, LHS, RHS); 1282 } 1283} 1284 1285/// expression 1286/// ::= primary binoprhs 1287/// 1288static ExprAST *ParseExpression() { 1289 ExprAST *LHS = ParsePrimary(); 1290 if (!LHS) return 0; 1291 1292 return ParseBinOpRHS(0, LHS); 1293} 1294 1295/// prototype 1296/// ::= id '(' id* ')' 1297static PrototypeAST *ParsePrototype() { 1298 if (CurTok != tok_identifier) 1299 return ErrorP("Expected function name in prototype"); 1300 1301 std::string FnName = IdentifierStr; 1302 getNextToken(); 1303 1304 if (CurTok != '(') 1305 return ErrorP("Expected '(' in prototype"); 1306 1307 std::vector<std::string> ArgNames; 1308 while (getNextToken() == tok_identifier) 1309 ArgNames.push_back(IdentifierStr); 1310 if (CurTok != ')') 1311 return ErrorP("Expected ')' in prototype"); 1312 1313 // success. 1314 getNextToken(); // eat ')'. 1315 1316 return new PrototypeAST(FnName, ArgNames); 1317} 1318 1319/// definition ::= 'def' prototype expression 1320static FunctionAST *ParseDefinition() { 1321 getNextToken(); // eat def. 1322 PrototypeAST *Proto = ParsePrototype(); 1323 if (Proto == 0) return 0; 1324 1325 if (ExprAST *E = ParseExpression()) 1326 return new FunctionAST(Proto, E); 1327 return 0; 1328} 1329 1330/// toplevelexpr ::= expression 1331static FunctionAST *ParseTopLevelExpr() { 1332 if (ExprAST *E = ParseExpression()) { 1333 // Make an anonymous proto. 1334 PrototypeAST *Proto = new PrototypeAST("", std::vector<std::string>()); 1335 return new FunctionAST(Proto, E); 1336 } 1337 return 0; 1338} 1339 1340/// external ::= 'extern' prototype 1341static PrototypeAST *ParseExtern() { 1342 getNextToken(); // eat extern. 1343 return ParsePrototype(); 1344} 1345 1346//===----------------------------------------------------------------------===// 1347// Code Generation 1348//===----------------------------------------------------------------------===// 1349 1350static Module *TheModule; 1351static IRBuilder<> Builder(getGlobalContext()); 1352static std::map<std::string, Value*> NamedValues; 1353static FunctionPassManager *TheFPM; 1354 1355Value *ErrorV(const char *Str) { Error(Str); return 0; } 1356 1357Value *NumberExprAST::Codegen() { 1358 return ConstantFP::get(getGlobalContext(), APFloat(Val)); 1359} 1360 1361Value *VariableExprAST::Codegen() { 1362 // Look this variable up in the function. 1363 Value *V = NamedValues[Name]; 1364 return V ? V : ErrorV("Unknown variable name"); 1365} 1366 1367Value *BinaryExprAST::Codegen() { 1368 Value *L = LHS->Codegen(); 1369 Value *R = RHS->Codegen(); 1370 if (L == 0 || R == 0) return 0; 1371 1372 switch (Op) { 1373 case '+': return Builder.CreateFAdd(L, R, "addtmp"); 1374 case '-': return Builder.CreateFSub(L, R, "subtmp"); 1375 case '*': return Builder.CreateFMul(L, R, "multmp"); 1376 case '<': 1377 L = Builder.CreateFCmpULT(L, R, "cmptmp"); 1378 // Convert bool 0/1 to double 0.0 or 1.0 1379 return Builder.CreateUIToFP(L, Type::getDoubleTy(getGlobalContext()), 1380 "booltmp"); 1381 default: return ErrorV("invalid binary operator"); 1382 } 1383} 1384 1385Value *CallExprAST::Codegen() { 1386 // Look up the name in the global module table. 1387 Function *CalleeF = TheModule->getFunction(Callee); 1388 if (CalleeF == 0) 1389 return ErrorV("Unknown function referenced"); 1390 1391 // If argument mismatch error. 1392 if (CalleeF->arg_size() != Args.size()) 1393 return ErrorV("Incorrect # arguments passed"); 1394 1395 std::vector<Value*> ArgsV; 1396 for (unsigned i = 0, e = Args.size(); i != e; ++i) { 1397 ArgsV.push_back(Args[i]->Codegen()); 1398 if (ArgsV.back() == 0) return 0; 1399 } 1400 1401 return Builder.CreateCall(CalleeF, ArgsV, "calltmp"); 1402} 1403 1404Value *IfExprAST::Codegen() { 1405 Value *CondV = Cond->Codegen(); 1406 if (CondV == 0) return 0; 1407 1408 // Convert condition to a bool by comparing equal to 0.0. 1409 CondV = Builder.CreateFCmpONE(CondV, 1410 ConstantFP::get(getGlobalContext(), APFloat(0.0)), 1411 "ifcond"); 1412 1413 Function *TheFunction = Builder.GetInsertBlock()->getParent(); 1414 1415 // Create blocks for the then and else cases. Insert the 'then' block at the 1416 // end of the function. 1417 BasicBlock *ThenBB = BasicBlock::Create(getGlobalContext(), "then", TheFunction); 1418 BasicBlock *ElseBB = BasicBlock::Create(getGlobalContext(), "else"); 1419 BasicBlock *MergeBB = BasicBlock::Create(getGlobalContext(), "ifcont"); 1420 1421 Builder.CreateCondBr(CondV, ThenBB, ElseBB); 1422 1423 // Emit then value. 1424 Builder.SetInsertPoint(ThenBB); 1425 1426 Value *ThenV = Then->Codegen(); 1427 if (ThenV == 0) return 0; 1428 1429 Builder.CreateBr(MergeBB); 1430 // Codegen of 'Then' can change the current block, update ThenBB for the PHI. 1431 ThenBB = Builder.GetInsertBlock(); 1432 1433 // Emit else block. 1434 TheFunction->getBasicBlockList().push_back(ElseBB); 1435 Builder.SetInsertPoint(ElseBB); 1436 1437 Value *ElseV = Else->Codegen(); 1438 if (ElseV == 0) return 0; 1439 1440 Builder.CreateBr(MergeBB); 1441 // Codegen of 'Else' can change the current block, update ElseBB for the PHI. 1442 ElseBB = Builder.GetInsertBlock(); 1443 1444 // Emit merge block. 1445 TheFunction->getBasicBlockList().push_back(MergeBB); 1446 Builder.SetInsertPoint(MergeBB); 1447 PHINode *PN = Builder.CreatePHI(Type::getDoubleTy(getGlobalContext()), 2, 1448 "iftmp"); 1449 1450 PN->addIncoming(ThenV, ThenBB); 1451 PN->addIncoming(ElseV, ElseBB); 1452 return PN; 1453} 1454 1455Value *ForExprAST::Codegen() { 1456 // Output this as: 1457 // ... 1458 // start = startexpr 1459 // goto loop 1460 // loop: 1461 // variable = phi [start, loopheader], [nextvariable, loopend] 1462 // ... 1463 // bodyexpr 1464 // ... 1465 // loopend: 1466 // step = stepexpr 1467 // nextvariable = variable + step 1468 // endcond = endexpr 1469 // br endcond, loop, endloop 1470 // outloop: 1471 1472 // Emit the start code first, without 'variable' in scope. 1473 Value *StartVal = Start->Codegen(); 1474 if (StartVal == 0) return 0; 1475 1476 // Make the new basic block for the loop header, inserting after current 1477 // block. 1478 Function *TheFunction = Builder.GetInsertBlock()->getParent(); 1479 BasicBlock *PreheaderBB = Builder.GetInsertBlock(); 1480 BasicBlock *LoopBB = BasicBlock::Create(getGlobalContext(), "loop", TheFunction); 1481 1482 // Insert an explicit fall through from the current block to the LoopBB. 1483 Builder.CreateBr(LoopBB); 1484 1485 // Start insertion in LoopBB. 1486 Builder.SetInsertPoint(LoopBB); 1487 1488 // Start the PHI node with an entry for Start. 1489 PHINode *Variable = Builder.CreatePHI(Type::getDoubleTy(getGlobalContext()), 2, VarName.c_str()); 1490 Variable->addIncoming(StartVal, PreheaderBB); 1491 1492 // Within the loop, the variable is defined equal to the PHI node. If it 1493 // shadows an existing variable, we have to restore it, so save it now. 1494 Value *OldVal = NamedValues[VarName]; 1495 NamedValues[VarName] = Variable; 1496 1497 // Emit the body of the loop. This, like any other expr, can change the 1498 // current BB. Note that we ignore the value computed by the body, but don't 1499 // allow an error. 1500 if (Body->Codegen() == 0) 1501 return 0; 1502 1503 // Emit the step value. 1504 Value *StepVal; 1505 if (Step) { 1506 StepVal = Step->Codegen(); 1507 if (StepVal == 0) return 0; 1508 } else { 1509 // If not specified, use 1.0. 1510 StepVal = ConstantFP::get(getGlobalContext(), APFloat(1.0)); 1511 } 1512 1513 Value *NextVar = Builder.CreateFAdd(Variable, StepVal, "nextvar"); 1514 1515 // Compute the end condition. 1516 Value *EndCond = End->Codegen(); 1517 if (EndCond == 0) return EndCond; 1518 1519 // Convert condition to a bool by comparing equal to 0.0. 1520 EndCond = Builder.CreateFCmpONE(EndCond, 1521 ConstantFP::get(getGlobalContext(), APFloat(0.0)), 1522 "loopcond"); 1523 1524 // Create the "after loop" block and insert it. 1525 BasicBlock *LoopEndBB = Builder.GetInsertBlock(); 1526 BasicBlock *AfterBB = BasicBlock::Create(getGlobalContext(), "afterloop", TheFunction); 1527 1528 // Insert the conditional branch into the end of LoopEndBB. 1529 Builder.CreateCondBr(EndCond, LoopBB, AfterBB); 1530 1531 // Any new code will be inserted in AfterBB. 1532 Builder.SetInsertPoint(AfterBB); 1533 1534 // Add a new entry to the PHI node for the backedge. 1535 Variable->addIncoming(NextVar, LoopEndBB); 1536 1537 // Restore the unshadowed variable. 1538 if (OldVal) 1539 NamedValues[VarName] = OldVal; 1540 else 1541 NamedValues.erase(VarName); 1542 1543 1544 // for expr always returns 0.0. 1545 return Constant::getNullValue(Type::getDoubleTy(getGlobalContext())); 1546} 1547 1548Function *PrototypeAST::Codegen() { 1549 // Make the function type: double(double,double) etc. 1550 std::vector<Type*> Doubles(Args.size(), 1551 Type::getDoubleTy(getGlobalContext())); 1552 FunctionType *FT = FunctionType::get(Type::getDoubleTy(getGlobalContext()), 1553 Doubles, false); 1554 1555 Function *F = Function::Create(FT, Function::ExternalLinkage, Name, TheModule); 1556 1557 // If F conflicted, there was already something named 'Name'. If it has a 1558 // body, don't allow redefinition or reextern. 1559 if (F->getName() != Name) { 1560 // Delete the one we just made and get the existing one. 1561 F->eraseFromParent(); 1562 F = TheModule->getFunction(Name); 1563 1564 // If F already has a body, reject this. 1565 if (!F->empty()) { 1566 ErrorF("redefinition of function"); 1567 return 0; 1568 } 1569 1570 // If F took a different number of args, reject. 1571 if (F->arg_size() != Args.size()) { 1572 ErrorF("redefinition of function with different # args"); 1573 return 0; 1574 } 1575 } 1576 1577 // Set names for all arguments. 1578 unsigned Idx = 0; 1579 for (Function::arg_iterator AI = F->arg_begin(); Idx != Args.size(); 1580 ++AI, ++Idx) { 1581 AI->setName(Args[Idx]); 1582 1583 // Add arguments to variable symbol table. 1584 NamedValues[Args[Idx]] = AI; 1585 } 1586 1587 return F; 1588} 1589 1590Function *FunctionAST::Codegen() { 1591 NamedValues.clear(); 1592 1593 Function *TheFunction = Proto->Codegen(); 1594 if (TheFunction == 0) 1595 return 0; 1596 1597 // Create a new basic block to start insertion into. 1598 BasicBlock *BB = BasicBlock::Create(getGlobalContext(), "entry", TheFunction); 1599 Builder.SetInsertPoint(BB); 1600 1601 if (Value *RetVal = Body->Codegen()) { 1602 // Finish off the function. 1603 Builder.CreateRet(RetVal); 1604 1605 // Validate the generated code, checking for consistency. 1606 verifyFunction(*TheFunction); 1607 1608 // Optimize the function. 1609 TheFPM->run(*TheFunction); 1610 1611 return TheFunction; 1612 } 1613 1614 // Error reading body, remove function. 1615 TheFunction->eraseFromParent(); 1616 return 0; 1617} 1618 1619//===----------------------------------------------------------------------===// 1620// Top-Level parsing and JIT Driver 1621//===----------------------------------------------------------------------===// 1622 1623static ExecutionEngine *TheExecutionEngine; 1624 1625static void HandleDefinition() { 1626 if (FunctionAST *F = ParseDefinition()) { 1627 if (Function *LF = F->Codegen()) { 1628 fprintf(stderr, "Read function definition:"); 1629 LF->dump(); 1630 } 1631 } else { 1632 // Skip token for error recovery. 1633 getNextToken(); 1634 } 1635} 1636 1637static void HandleExtern() { 1638 if (PrototypeAST *P = ParseExtern()) { 1639 if (Function *F = P->Codegen()) { 1640 fprintf(stderr, "Read extern: "); 1641 F->dump(); 1642 } 1643 } else { 1644 // Skip token for error recovery. 1645 getNextToken(); 1646 } 1647} 1648 1649static void HandleTopLevelExpression() { 1650 // Evaluate a top-level expression into an anonymous function. 1651 if (FunctionAST *F = ParseTopLevelExpr()) { 1652 if (Function *LF = F->Codegen()) { 1653 // JIT the function, returning a function pointer. 1654 void *FPtr = TheExecutionEngine->getPointerToFunction(LF); 1655 1656 // Cast it to the right type (takes no arguments, returns a double) so we 1657 // can call it as a native function. 1658 double (*FP)() = (double (*)())(intptr_t)FPtr; 1659 fprintf(stderr, "Evaluated to %f\n", FP()); 1660 } 1661 } else { 1662 // Skip token for error recovery. 1663 getNextToken(); 1664 } 1665} 1666 1667/// top ::= definition | external | expression | ';' 1668static void MainLoop() { 1669 while (1) { 1670 fprintf(stderr, "ready> "); 1671 switch (CurTok) { 1672 case tok_eof: return; 1673 case ';': getNextToken(); break; // ignore top-level semicolons. 1674 case tok_def: HandleDefinition(); break; 1675 case tok_extern: HandleExtern(); break; 1676 default: HandleTopLevelExpression(); break; 1677 } 1678 } 1679} 1680 1681//===----------------------------------------------------------------------===// 1682// "Library" functions that can be "extern'd" from user code. 1683//===----------------------------------------------------------------------===// 1684 1685/// putchard - putchar that takes a double and returns 0. 1686extern "C" 1687double putchard(double X) { 1688 putchar((char)X); 1689 return 0; 1690} 1691 1692//===----------------------------------------------------------------------===// 1693// Main driver code. 1694//===----------------------------------------------------------------------===// 1695 1696int main() { 1697 InitializeNativeTarget(); 1698 LLVMContext &Context = getGlobalContext(); 1699 1700 // Install standard binary operators. 1701 // 1 is lowest precedence. 1702 BinopPrecedence['<'] = 10; 1703 BinopPrecedence['+'] = 20; 1704 BinopPrecedence['-'] = 20; 1705 BinopPrecedence['*'] = 40; // highest. 1706 1707 // Prime the first token. 1708 fprintf(stderr, "ready> "); 1709 getNextToken(); 1710 1711 // Make the module, which holds all the code. 1712 TheModule = new Module("my cool jit", Context); 1713 1714 // Create the JIT. This takes ownership of the module. 1715 std::string ErrStr; 1716 TheExecutionEngine = EngineBuilder(TheModule).setErrorStr(&ErrStr).create(); 1717 if (!TheExecutionEngine) { 1718 fprintf(stderr, "Could not create ExecutionEngine: %s\n", ErrStr.c_str()); 1719 exit(1); 1720 } 1721 1722 FunctionPassManager OurFPM(TheModule); 1723 1724 // Set up the optimizer pipeline. Start with registering info about how the 1725 // target lays out data structures. 1726 OurFPM.add(new TargetData(*TheExecutionEngine->getTargetData())); 1727 // Provide basic AliasAnalysis support for GVN. 1728 OurFPM.add(createBasicAliasAnalysisPass()); 1729 // Do simple "peephole" optimizations and bit-twiddling optzns. 1730 OurFPM.add(createInstructionCombiningPass()); 1731 // Reassociate expressions. 1732 OurFPM.add(createReassociatePass()); 1733 // Eliminate Common SubExpressions. 1734 OurFPM.add(createGVNPass()); 1735 // Simplify the control flow graph (deleting unreachable blocks, etc). 1736 OurFPM.add(createCFGSimplificationPass()); 1737 1738 OurFPM.doInitialization(); 1739 1740 // Set the global so the code gen can use this. 1741 TheFPM = &OurFPM; 1742 1743 // Run the main "interpreter loop" now. 1744 MainLoop(); 1745 1746 TheFPM = 0; 1747 1748 // Print out all of the generated code. 1749 TheModule->dump(); 1750 1751 return 0; 1752} 1753</pre> 1754</div> 1755 1756<a href="LangImpl6.html">Next: Extending the language: user-defined operators</a> 1757</div> 1758 1759<!-- *********************************************************************** --> 1760<hr> 1761<address> 1762 <a href="http://jigsaw.w3.org/css-validator/check/referer"><img 1763 src="http://jigsaw.w3.org/css-validator/images/vcss" alt="Valid CSS!"></a> 1764 <a href="http://validator.w3.org/check/referer"><img 1765 src="http://www.w3.org/Icons/valid-html401" alt="Valid HTML 4.01!"></a> 1766 1767 <a href="mailto:sabre@nondot.org">Chris Lattner</a><br> 1768 <a href="http://llvm.org/">The LLVM Compiler Infrastructure</a><br> 1769 Last modified: $Date$ 1770</address> 1771</body> 1772</html> 1773