<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
          "http://www.w3.org/TR/html4/strict.dtd">
<html>
<head>
  <META http-equiv="Content-Type" content="text/html; charset=ISO-8859-1" />
  <title>Clang - Performance</title>
  <link type="text/css" rel="stylesheet" href="menu.css" />
  <link type="text/css" rel="stylesheet" href="content.css" />
  <style type="text/css">
</style>
</head>
<body>

<!--#include virtual="menu.html.incl"-->

<div id="content">

<!--*************************************************************************-->
<h1>Clang - Performance</h1>
<!--*************************************************************************-->

<p>This page tracks the compile-time performance of Clang on two
interesting benchmarks:
<ul>
  <li><i>Sketch</i>: The Objective-C example application shipped on
    Mac OS X as part of Xcode. <i>Sketch</i> is indicative of a
    "typical" Objective-C app. The source itself is relatively
    small (~7,500 lines of code), but it relies
    on the extensive Cocoa APIs to build its functionality. Like many
    Objective-C applications, it includes
    <tt>Cocoa/Cocoa.h</tt> in all of its source files, which represents a
    significant stress test of the front-end's performance on lexing,
    preprocessing, parsing, and syntax analysis.</li>
  <li><i>176.gcc</i>: This is the gcc-2.7.2.2 code base as present in
    SPECINT 2000. In contrast to Sketch, <i>176.gcc</i> consists of a
    large amount of C source code (~220,000 lines) with few system
    dependencies.
    This stresses the back-end's performance on generating
    assembly code and debug information.</li>
</ul>
</p>

<!--*************************************************************************-->
<h2><a name="enduser">Experiments</a></h2>
<!--*************************************************************************-->

<p>Measurements are taken by processing each file in the
respective benchmark serially, using Clang, gcc, and llvm-gcc as compilers. In
order to track the performance of various subsystems, the timings have
been broken down into separate stages where possible:

<ul>
  <li><tt>-Eonly</tt>: This option runs the preprocessor but produces
    no output. For gcc and llvm-gcc, the <tt>-MM</tt> option is used
    as a rough equivalent to this step.</li>
  <li><tt>-parse-noop</tt>: This option runs the parser on the input,
    but without semantic analysis or any output. gcc and llvm-gcc have
    no equivalent for this option.</li>
  <li><tt>-fsyntax-only</tt>: This option runs the parser with semantic
    analysis.</li>
  <li><tt>-emit-llvm -O0</tt>: For Clang and llvm-gcc, this option
    converts to the LLVM intermediate representation but doesn't
    generate native code.</li>
  <li><tt>-S -O0</tt>: Perform actual code generation to produce a
    native assembler file.</li>
  <li><tt>-S -O0 -g</tt>: This adds emission of debug information to
    the assembly output.</li>
</ul>
</p>

<p>This set of stages is chosen to be approximately additive; that is,
each subsequent stage simply adds some additional processing. The
timings measure the delta of the given stage from the previous
one. For example, the timings for <tt>-fsyntax-only</tt> below show
the difference between running with <tt>-fsyntax-only</tt> and running
with <tt>-parse-noop</tt> (for Clang) or <tt>-MM</tt> (for gcc and
llvm-gcc).
This amounts to a fairly accurate measure of only the time
to perform semantic analysis (and parsing, in the case of gcc and llvm-gcc).</p>

<p>These timings are chosen to break down the compilation process for
Clang as much as possible. The graphs below show these numbers
combined so that it is easy to see how the time for a particular task
is divided among various components. For example, <tt>-S -O0</tt>
includes the time of <tt>-fsyntax-only</tt> and <tt>-emit-llvm -O0</tt>.</p>

<p>Note that we already know that the LLVM optimizers are substantially (30-40%)
faster than the GCC optimizers at a given <tt>-O</tt> level, so we focus only on
<tt>-O0</tt> compile time here.</p>

<!--*************************************************************************-->
<h2><a name="enduser">Timing Results</a></h2>
<!--*************************************************************************-->

<!--=======================================================================-->
<h3><a name="2008-10-31">2008-10-31</a></h3>
<!--=======================================================================-->

<center><h4>Sketch</h4></center>
<img class="img_slide"
     src="timing-data/2008-10-31/sketch.png" alt="Sketch Timings"/>

<p>This shows Clang's substantial performance improvements in
preprocessing and semantic analysis: over 90% faster on
<tt>-fsyntax-only</tt>. As expected, time spent in code generation for this
benchmark is relatively small. One caveat: Clang's debug information
generation for Objective-C is very incomplete, which means the <tt>-S
-O0 -g</tt> numbers are unfair since Clang is generating substantially
less output.</p>

<p>This chart also shows the effect of using precompiled headers (PCH)
on compile time. gcc and llvm-gcc see a large performance improvement
with PCH: about 4x in wall time.
Unfortunately, Clang does not yet
have an implementation of PCH-style optimizations, but we are actively
working to address this.</p>

<center><h4>176.gcc</h4></center>
<img class="img_slide"
     src="timing-data/2008-10-31/176.gcc.png" alt="176.gcc Timings"/>

<p>Unlike the <i>Sketch</i> timings, compilation of <i>176.gcc</i>
involves a large amount of code generation. The time spent in Clang's
LLVM IR generation and code generation is on par with gcc's code
generation time, but the improved parsing and semantic analysis
performance means Clang still comes in at ~29% faster than gcc
on <tt>-S -O0 -g</tt> and ~20% faster than llvm-gcc.</p>

<p>These numbers indicate that Clang still has room for improvement in
several areas. Notably, Clang's LLVM IR generation is significantly slower
than that of llvm-gcc, and both Clang and llvm-gcc incur a
significantly higher cost for adding debugging information compared to
gcc.</p>

</div>
</body>
</html>