<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
          "http://www.w3.org/TR/html4/strict.dtd">
<html>
<head>
  <META http-equiv="Content-Type" content="text/html; charset=ISO-8859-1" />
  <title>Clang - Features and Goals</title>
  <link type="text/css" rel="stylesheet" href="menu.css" />
  <link type="text/css" rel="stylesheet" href="content.css" />
  <style type="text/css">
</style>
</head>
<body>

<!--#include virtual="menu.html.incl"-->

<div id="content">

<!--*************************************************************************-->
<h1>Clang - Features and Goals</h1>
<!--*************************************************************************-->

<p>
This page describes the <a href="index.html#goals">features and goals</a> of
Clang in more detail and gives a broader explanation of what we mean.
These features are:
</p>

<p>End-User Features:</p>

<ul>
<li><a href="#performance">Fast compiles and low memory use</a></li>
<li><a href="#expressivediags">Expressive diagnostics</a></li>
<li><a href="#gcccompat">GCC compatibility</a></li>
</ul>

<p>Utility and Applications:</p>

<ul>
<li><a href="#libraryarch">Library based architecture</a></li>
<li><a href="#diverseclients">Support diverse clients</a></li>
<li><a href="#ideintegration">Integration with IDEs</a></li>
<li><a href="#license">Use the LLVM 'BSD' License</a></li>
</ul>

<p>Internal Design and Implementation:</p>

<ul>
<li><a href="#real">A real-world, production quality compiler</a></li>
<li><a href="#simplecode">A simple and hackable code base</a></li>
<li><a href="#unifiedparser">A single unified parser for C, Objective C, C++,
    and Objective C++</a></li>
<li><a href="#conformance">Conformance with C/C++/ObjC and their
    variants</a></li>
</ul>

<!--*************************************************************************-->
<h2><a name="enduser">End-User Features</a></h2>
<!--*************************************************************************-->


<!--=======================================================================-->
<h3><a name="performance">Fast compiles and Low Memory Use</a></h3>
<!--=======================================================================-->

<p>A major focus of our work on clang is to make it fast, light and scalable.
The library-based architecture of clang makes it straightforward to time and
profile the cost of each layer of the stack, and the driver has a number of
options for performance analysis.</p>

<p>While there is still much that can be done, we find that the clang front-end
is significantly quicker than GCC and uses less memory. For example, when
compiling "Carbon.h" on Mac OS X, we see that clang is 2.5x faster than GCC:</p>

<img class="img_slide" src="feature-compile1.png" width="400" height="300" />

<p>Carbon.h is a monster: it transitively includes 558 files, 12.3M of code,
declares 10000 functions, has 2000 struct definitions, 8000 fields, 20000 enum
constants, etc. (see slide 25+ of the <a href="clang_video-07-25-2007.html">clang
talk</a> for more information). It is also #include'd into almost every C file
in a GUI app on the Mac, so its compile time is very important.</p>

<p>From the slide above, you can see that we can measure the time to preprocess
the file independently from the time to parse it, and independently from the
time to build the ASTs for the code. GCC doesn't provide a way to measure the
parser without AST building (it only provides -fsyntax-only). In our
measurements, we find that clang's preprocessor is consistently 40% faster than
GCC's, and the parser + AST builder is ~4x faster than GCC's. If you have
sources that do not depend as heavily on the preprocessor (or if you
use Precompiled Headers) you may see a much bigger speedup from clang.
</p>

<p>Compile time performance is important, but when using clang as an API,
memory use is often even more so: the less memory the code takes, the more code
you can fit into memory at a time (useful for whole program analysis tools, for
example).</p>

<img class="img_slide" src="feature-memory1.png" width="400" height="300" />

<p>Here we see a huge advantage of clang: its ASTs take <b>5x less memory</b>
than GCC's syntax trees, despite the fact that clang's ASTs capture far more
source-level information than GCC's trees do. This feat is accomplished through
the use of carefully designed APIs and efficient representations.</p>

<p>In addition to being efficient when pitted head-to-head against GCC in batch
mode, clang is built with a <a href="#libraryarch">library based
architecture</a> that makes it relatively easy to adapt it and build new tools
with it. This means that it is often possible to apply out-of-the-box thinking
and novel techniques to improve compilation in various ways.</p>

<img class="img_slide" src="feature-compile2.png" width="400" height="300" />

<p>This slide shows how the clang preprocessor can be used to make "distcc"
parallelization <b>3x</b> more scalable than when using the GCC preprocessor.
"distcc" quickly bottlenecks on the preprocessor running on the central driver
machine, so a fast preprocessor is very useful. Comparing the first two bars
of each group shows how a ~40% faster preprocessor can reduce preprocessing time
of these large C++ apps by about 40% (shocking!).</p>

<p>The third bar on the slide is the interesting part: it shows how trivial
caching of file system accesses across invocations of the preprocessor allows
clang to reduce time spent in the kernel by 10x, making distcc over 3x more
scalable.
This is obviously just one simple hack; doing more interesting things
(like caching tokens across preprocessed files) would yield another substantial
speedup.</p>

<p>The clean framework-based design of clang means that many things are possible
that would be very difficult in other systems, for example incremental
compilation, multithreading, intelligent caching, etc. We are only starting
to tap the full potential of the clang design.</p>


<!--=======================================================================-->
<h3><a name="expressivediags">Expressive Diagnostics</a></h3>
<!--=======================================================================-->

<p>In addition to being fast and functional, we aim to make Clang extremely user
friendly. As far as a command-line compiler goes, this basically boils down to
making the diagnostics (error and warning messages) generated by the compiler
as useful as possible. There are several ways that we do this, but the
most important are pinpointing exactly what is wrong in the program,
highlighting related information so that it is easy to understand at a glance,
and making the wording as clear as possible.</p>

<p>Here is one simple example that illustrates the difference between a typical
GCC and Clang diagnostic:</p>

<pre>
  $ <b>gcc-4.2 -fsyntax-only t.c</b>
  t.c:7: error: invalid operands to binary + (have 'int' and 'struct A')
  $ <b>clang -fsyntax-only t.c</b>
  t.c:7:39: error: invalid operands to binary expression ('int' and 'struct A')
  <font color="darkgreen">  return y + func(y ? ((SomeA.X + 40) + SomeA) / 42 + SomeA.X : SomeA.X);</font>
  <font color="blue">                       ~~~~~~~~~~~~~~ ^ ~~~~~</font>
</pre>

<p>Here you can see that you don't even need to see the original source code to
understand what is wrong based on the Clang error: because clang prints a
caret, you know exactly <em>which</em> plus it is complaining about. The range
information highlights the left and right sides of the plus, which makes it
immediately obvious what the compiler is talking about. This is very useful for
cases involving precedence issues and many other situations.</p>

<p>Clang diagnostics are very polished and have many features. For more
information and examples, please see the <a href="diagnostics.html">Expressive
Diagnostics</a> page.</p>

<!--=======================================================================-->
<h3><a name="gcccompat">GCC Compatibility</a></h3>
<!--=======================================================================-->

<p>GCC is currently the de facto standard open source compiler, and it
routinely compiles a huge volume of code. GCC supports a huge number of
extensions and features (many of which are undocumented) and a lot of
code and header files depend on these features in order to build.</p>

<p>While it would be nice to be able to ignore these extensions and focus on
implementing the language standards to the letter, pragmatics force us to
support the GCC extensions that see the most use. Many users just want their
code to compile; they don't care to argue about whether it is pedantically C99
or not.</p>

<p>As mentioned above, all
extensions are explicitly recognized as such and marked with extension
diagnostics, which can be mapped to warnings, errors, or just ignored.
</p>


<!--*************************************************************************-->
<h2><a name="applications">Utility and Applications</a></h2>
<!--*************************************************************************-->

<!--=======================================================================-->
<h3><a name="libraryarch">Library Based Architecture</a></h3>
<!--=======================================================================-->

<p>A major design concept for clang is its use of a library-based
architecture. In this design, various parts of the front-end can be cleanly
divided into separate libraries which can then be mixed up for different needs
and uses. In addition, the library-based approach encourages good interfaces
and makes it easier for new developers to get involved (because they only need
to understand small pieces of the big picture).</p>

<blockquote>
"The world needs better compiler tools, tools which are built as libraries.
This design point allows reuse of the tools in new and novel ways. However,
building the tools as libraries isn't enough: they must have clean APIs, be as
decoupled from each other as possible, and be easy to modify/extend.
This
requires clean layering, decent design, and keeping the libraries independent of
any specific client."</blockquote>

<p>
Currently, clang is divided into the following libraries and tool:
</p>

<ul>
<li><b>libsupport</b> - Basic support library, from LLVM.</li>
<li><b>libsystem</b> - System abstraction library, from LLVM.</li>
<li><b>libbasic</b> - Diagnostics, SourceLocations, SourceBuffer abstraction,
    file system caching for input source files.</li>
<li><b>libast</b> - Provides classes to represent the C AST, the C type system,
    builtin functions, and various helpers for analyzing and manipulating the
    AST (visitors, pretty printers, etc.).</li>
<li><b>liblex</b> - Lexing and preprocessing, identifier hash table, pragma
    handling, tokens, and macro expansion.</li>
<li><b>libparse</b> - Parsing. This library invokes coarse-grained 'Actions'
    provided by the client (e.g. libsema builds ASTs) but knows nothing about
    ASTs or other client-specific data structures.</li>
<li><b>libsema</b> - Semantic Analysis. This provides a set of parser actions
    to build a standardized AST for programs.</li>
<li><b>libcodegen</b> - Lower the AST to LLVM IR for optimization &amp; code
    generation.</li>
<li><b>librewrite</b> - Editing of text buffers (important for code rewriting
    transformation, like refactoring).</li>
<li><b>libanalysis</b> - Static analysis support.</li>
<li><b>clang</b> - A driver program, client of the libraries at various
    levels.</li>
</ul>

<p>As an example of the power of this library-based design: if you wanted to
build a preprocessor, you would take the Basic and Lexer libraries. If you want
an indexer, you would take the previous two and add the Parser library and
some actions for indexing.
If you want a refactoring, static analysis, or
source-to-source compiler tool, you would then add the AST building and
semantic analyzer libraries.</p>

<p>For more information about the low-level implementation details of the
various clang libraries, please see the <a href="docs/InternalsManual.html">
clang Internals Manual</a>.</p>

<!--=======================================================================-->
<h3><a name="diverseclients">Support Diverse Clients</a></h3>
<!--=======================================================================-->

<p>Clang is designed and built with many grand plans for how we can use it. The
driving force is the fact that we use C and C++ daily, and have to suffer due to
a lack of good tools available for them. We believe that the C and C++ tools
ecosystem has been significantly limited by how difficult it is to parse and
represent the source code for these languages, and we aim to rectify this
problem in clang.</p>

<p>The problem with this goal is that different clients have very different
requirements. Consider code generation, for example: a simple front-end that
parses for code generation must analyze the code for validity and emit code
in some intermediate form to pass off to an optimizer or backend. Because
validity analysis and code generation can largely be done on the fly, there is
no hard requirement that the front-end actually build up a full AST for all
the expressions and statements in the code. TCC and GCC are examples of
compilers that either build no real AST (in the former case) or build a stripped
down and simplified AST (in the latter case) because they focus primarily on
codegen.</p>

<p>On the opposite side of the spectrum, some clients (like refactoring) want
highly detailed information about the original source code and want a complete
AST to describe it with.
Refactoring wants to have information about macro
expansions, the location of every paren expression '(((x)))' vs 'x', full
position information, and much more. Further, refactoring wants to look
<em>across the whole program</em> to ensure that it is making transformations
that are safe. Making this efficient and getting this right requires a
significant amount of engineering and algorithmic work that is simply
unnecessary for a simple static compiler.</p>

<p>The beauty of the clang approach is that it does not restrict how you use it.
In particular, it is possible to use the clang preprocessor and parser to build
an extremely quick and light-weight on-the-fly code generator (similar to TCC)
that does not build an AST at all. As an intermediate step, clang supports
using the current AST generation and semantic analysis code and having a code
generation client free the AST for each function after code generation. Finally,
clang provides support for building and retaining fully-fledged ASTs, and even
supports writing them out to disk.</p>

<p>Designing the libraries with clean and simple APIs allows these high-level
policy decisions to be determined in the client, instead of forcing "one true
way" in the implementation of any of these libraries. Getting this right is
hard, and we don't always get it right the first time, but we fix any problems
when we realize we made a mistake.</p>

<!--=======================================================================-->
<h3><a name="ideintegration">Integration with IDEs</a></h3>
<!--=======================================================================-->

<p>
We believe that Integrated Development Environments (IDEs) are a great way
to pull together various pieces of the development puzzle, and aim to make clang
work well in such an environment.
The chief advantage of an IDE is that it
typically has visibility across your entire project and is a long-lived
process, whereas stand-alone compiler tools are typically invoked on each
individual file in the project, and thus have limited scope.</p>

<p>There are many implications of this difference, but a significant one has to
do with efficiency and caching: sharing an address space across different files
in a project means that you can use intelligent caching and other techniques to
dramatically reduce analysis/compilation time.</p>

<p>A further difference between IDEs and batch compilers is that they often
impose very different requirements on the front-end: IDEs depend on high
performance in order to provide a "snappy" experience, and thus really want
techniques like "incremental compilation", "fuzzy parsing", etc. Finally, IDEs
often have very different requirements than code generation, often requiring
information that a codegen-only frontend can throw away. Clang is
specifically designed and built to capture this information.
</p>


<!--=======================================================================-->
<h3><a name="license">Use the LLVM 'BSD' License</a></h3>
<!--=======================================================================-->

<p>We actively intend for clang (and LLVM as a whole) to be used for
commercial projects, and the BSD license is the simplest way to allow this. We
feel that the license encourages contributors to pick up the source and work
with it, and believe that those individuals and organizations will contribute
back their work if they do not want to have to maintain a fork forever (which is
time consuming and expensive when merges are involved).
Further, nobody makes
money on compilers these days, but many people need them to get bigger goals
accomplished: it makes sense for everyone to work together.</p>

<p>For more information about the LLVM/clang license, please see the <a
href="http://llvm.org/docs/DeveloperPolicy.html#license">LLVM License
Description</a>.</p>



<!--*************************************************************************-->
<h2><a name="design">Internal Design and Implementation</a></h2>
<!--*************************************************************************-->

<!--=======================================================================-->
<h3><a name="real">A real-world, production quality compiler</a></h3>
<!--=======================================================================-->

<p>
Clang is designed and built by experienced compiler developers who
are increasingly frustrated with the problems that <a
href="comparison.html">existing open source compilers</a> have. Clang is
carefully and thoughtfully designed and built to provide the foundation of a
whole new generation of C/C++/Objective C development tools, and we intend for
it to be production quality.</p>

<p>Being a production quality compiler means many things: it means being high
performance, being solid and (relatively) bug free, and it means eventually
being used and depended on by a broad range of people.
While we are still in
the early development stages, we strongly believe that this will become a
reality.</p>

<!--=======================================================================-->
<h3><a name="simplecode">A simple and hackable code base</a></h3>
<!--=======================================================================-->

<p>Our goal is to make it possible for anyone with a basic understanding
of compilers and working knowledge of the C/C++/ObjC languages to understand and
extend the clang source base. A large part of this falls out of our decision to
make the AST mirror the languages as closely as possible: you have your friendly
if statement, for statement, parenthesis expression, structs, unions, etc., all
represented in a simple and explicit way.</p>

<p>In addition to a simple design, we work to make the source base approachable
by commenting it well, including citations of the language standards where
appropriate, and designing the code for simplicity. Beyond that, clang offers
a set of AST dumpers, printers, and visualizers that make it easy to put code in
and see how it is represented.</p>

<!--=======================================================================-->
<h3><a name="unifiedparser">A single unified parser for C, Objective C, C++,
and Objective C++</a></h3>
<!--=======================================================================-->

<p>Clang is the "C Language Family Front-end", which means we intend to support
the most popular members of the C family. We are convinced that the right
parsing technology for this class of languages is a hand-built recursive-descent
parser.
Because it is plain C++ code, recursive descent makes it very easy for
new developers to understand the code, easily supports ad-hoc rules and other
strange hacks required by C/C++, and makes it straightforward to implement
excellent diagnostics and error recovery.</p>

<p>We believe that implementing C/C++/ObjC in a single unified parser makes the
end result easier to maintain and evolve than maintaining separate C and C++
parsers, which must be bug-fixed and maintained independently of each other.</p>

<!--=======================================================================-->
<h3><a name="conformance">Conformance with C/C++/ObjC and their
  variants</a></h3>
<!--=======================================================================-->

<p>When you start work on implementing a language, you find out that there is a
huge gap between how the language works and how most people understand it to
work. This gap is the difference between a normal programmer and a (scary?
super-natural?) "language lawyer", who knows the ins and outs of the language
and can grok standardese with ease.</p>

<p>In practice, being conformant with the languages means that we aim to support
the full language, including the dark and dusty corners (like trigraphs,
preprocessor arcana, C99 VLAs, etc.). Where we support extensions above and
beyond what the standard officially allows, we make an effort to explicitly call
this out in the code and emit diagnostics about it (which are disabled by
default, but can optionally be mapped to either warnings or errors), allowing
you to use clang in "strict" mode if you desire.</p>

<p>We also intend to support "dialects" of these languages, such as C89, K&amp;R
C, C++'03, Objective-C 2, etc.</p>

</div>
</body>
</html>