1<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
2          "http://www.w3.org/TR/html4/strict.dtd">
3<html>
4<head>
5  <META http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">
6  <title>Clang - Features and Goals</title>
7  <link type="text/css" rel="stylesheet" href="menu.css">
8  <link type="text/css" rel="stylesheet" href="content.css">
9  <style type="text/css">
10</style>
11</head>
12<body>
13
14<!--#include virtual="menu.html.incl"-->
15
16<div id="content">
17
18<!--*************************************************************************-->
19<h1>Clang - Features and Goals</h1>
20<!--*************************************************************************-->
21
<p>
This page describes the <a href="index.html#goals">features and goals</a> of
Clang in more detail and gives a broader explanation of what we mean by each.
These features are:
</p>
27
28<p>End-User Features:</p>
29
30<ul>
31<li><a href="#performance">Fast compiles and low memory use</a></li>
32<li><a href="#expressivediags">Expressive diagnostics</a></li>
33<li><a href="#gcccompat">GCC compatibility</a></li>
34</ul>
35
36<p>Utility and Applications:</p>
37
38<ul>
39<li><a href="#libraryarch">Library based architecture</a></li>
40<li><a href="#diverseclients">Support diverse clients</a></li>
41<li><a href="#ideintegration">Integration with IDEs</a></li>
42<li><a href="#license">Use the LLVM 'BSD' License</a></li>
43</ul>
44
45<p>Internal Design and Implementation:</p>
46
47<ul>
48<li><a href="#real">A real-world, production quality compiler</a></li>
49<li><a href="#simplecode">A simple and hackable code base</a></li>
50<li><a href="#unifiedparser">A single unified parser for C, Objective C, C++,
51    and Objective C++</a></li>
52<li><a href="#conformance">Conformance with C/C++/ObjC and their
53    variants</a></li>
54</ul>
55
56<!--*************************************************************************-->
57<h2><a name="enduser">End-User Features</a></h2>
58<!--*************************************************************************-->
59
60
61<!--=======================================================================-->
62<h3><a name="performance">Fast compiles and Low Memory Use</a></h3>
63<!--=======================================================================-->
64
<p>A major focus of our work on clang is to make it fast, light and scalable.
The library-based architecture of clang makes it straightforward to time and
profile the cost of each layer of the stack, and the driver has a number of
options for performance analysis.</p>
69
<p>While there is still much that can be done, we find that the clang front-end
is significantly quicker than gcc and uses less memory.  For example, when
compiling "Carbon.h" on Mac OS X, we see that clang is 2.5x faster than GCC:</p>
73
74<img class="img_slide" src="feature-compile1.png" width="400" height="300"
75     alt="Time to parse carbon.h: -fsyntax-only">
76
<p>Carbon.h is a monster: it transitively includes 558 files, 12.3M of code,
declares 10000 functions, has 2000 struct definitions, 8000 fields, 20000 enum
constants, etc. (see slide 25+ of the <a href="clang_video-07-25-2007.html">clang
talk</a> for more information). It is also #include'd into almost every C file
in a GUI app on the Mac, so its compile time is very important.</p>
82
<p>From the slide above, you can see that we can measure the time to preprocess
the file independently from the time to parse it, and independently from the
time to build the ASTs for the code.  GCC doesn't provide a way to measure the
parser without AST building (it only provides -fsyntax-only).  In our
measurements, we find that clang's preprocessor is consistently 40% faster than
GCC's, and the parser + AST builder is ~4x faster than GCC's.  If you have
sources that do not depend as heavily on the preprocessor (or if you
use Precompiled Headers) you may see a much bigger speedup from clang.
</p>
92
<p>Compile time performance is important, but when using clang as an API,
memory use is often even more so: the less memory the code takes, the more code
you can fit into memory at a time (useful for whole program analysis tools, for
example).</p>
97
98<img class="img_slide" src="feature-memory1.png" width="400" height="300"
99     alt="Space">
100
101<p>Here we see a huge advantage of clang: its ASTs take <b>5x less memory</b>
102than GCC's syntax trees, despite the fact that clang's ASTs capture far more
103source-level information than GCC's trees do.  This feat is accomplished through
104the use of carefully designed APIs and efficient representations.</p>
105
106<p>In addition to being efficient when pitted head-to-head against GCC in batch
107mode, clang is built with a <a href="#libraryarch">library based
108architecture</a> that makes it relatively easy to adapt it and build new tools
109with it.  This means that it is often possible to apply out-of-the-box thinking
110and novel techniques to improve compilation in various ways.</p>
111
112<img class="img_slide" src="feature-compile2.png" width="400" height="300"
113     alt="Preprocessor Speeds: GCC 4.2 vs clang-all">
114
115<p>This slide shows how the clang preprocessor can be used to make "distcc"
116parallelization <b>3x</b> more scalable than when using the GCC preprocessor.
117"distcc" quickly bottlenecks on the preprocessor running on the central driver
118machine, so a fast preprocessor is very useful.  Comparing the first two bars
119of each group shows how a ~40% faster preprocessor can reduce preprocessing time
120of these large C++ apps by about 40% (shocking!).</p>
121
<p>The third bar on the slide is the interesting part: it shows how trivial
caching of file system accesses across invocations of the preprocessor allows
clang to reduce time spent in the kernel by 10x, making distcc over 3x more
scalable.  This is obviously just one simple hack; doing more interesting things
(like caching tokens across preprocessed files) would yield another substantial
speedup.</p>
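<p>The file system caching idea can be sketched in a few lines.  This is only
an illustration under stated assumptions, not clang's implementation: the
<tt>StatCache</tt> class and its injected <tt>Probe</tt> callback are
hypothetical names, and the probe stands in for whatever system call a real
client would make.  The sketch remembers both successful and failed lookups, so
a path that is probed repeatedly reaches the kernel only once.</p>

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical sketch, not clang's actual file manager: remembers the result
// of a file system probe so repeated lookups of the same path skip the kernel.
class StatCache {
public:
  // The probe is injected to keep the sketch portable; a real client might
  // wrap a stat() call here.
  using Probe = std::function<bool(const std::string &)>;

  explicit StatCache(Probe probe) : probe_(std::move(probe)) {}

  // Returns whether `path` exists, calling the probe only on a cache miss.
  // Negative results are cached too.
  bool exists(const std::string &path) {
    auto it = cache_.find(path);
    if (it != cache_.end())
      return it->second;
    ++misses_;
    bool found = probe_(path);
    cache_.emplace(path, found);
    return found;
  }

  // Number of lookups that actually reached the probe.
  std::size_t misses() const { return misses_; }

private:
  Probe probe_;
  std::unordered_map<std::string, bool> cache_;
  std::size_t misses_ = 0;
};
```

<p>A client would keep one such cache alive across preprocessor invocations and
route every header lookup through <tt>exists()</tt>.</p>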
128
129<p>The clean framework-based design of clang means that many things are possible
130that would be very difficult in other systems, for example incremental
131compilation, multithreading, intelligent caching, etc.  We are only starting
132to tap the full potential of the clang design.</p>
133
134
135<!--=======================================================================-->
136<h3><a name="expressivediags">Expressive Diagnostics</a></h3>
137<!--=======================================================================-->
138
139<p>In addition to being fast and functional, we aim to make Clang extremely user
140friendly.  As far as a command-line compiler goes, this basically boils down to
141making the diagnostics (error and warning messages) generated by the compiler
142be as useful as possible.  There are several ways that we do this, but the
143most important are pinpointing exactly what is wrong in the program,
144highlighting related information so that it is easy to understand at a glance,
145and making the wording as clear as possible.</p>
146
147<p>Here is one simple example that illustrates the difference between a typical
148GCC and Clang diagnostic:</p>
149
150<pre>
151  $ <b>gcc-4.2 -fsyntax-only t.c</b>
152  t.c:7: error: invalid operands to binary + (have 'int' and 'struct A')
153  $ <b>clang -fsyntax-only t.c</b>
154  t.c:7:39: error: invalid operands to binary expression ('int' and 'struct A')
155  <span style="color:darkgreen">  return y + func(y ? ((SomeA.X + 40) + SomeA) / 42 + SomeA.X : SomeA.X);</span>
156  <span style="color:blue">                       ~~~~~~~~~~~~~~ ^ ~~~~~</span>
157</pre>
158
<p>Here you can see that you don't even need to see the original source code to
understand what is wrong based on the Clang error: because clang prints a
caret, you know exactly <em>which</em> plus it is complaining about.  The range
information highlights the left and right sides of the plus, which makes it
immediately obvious what the compiler is talking about.  This is very useful for
cases involving precedence issues and many other situations.</p>
165
166<p>Clang diagnostics are very polished and have many features.  For more
167information and examples, please see the <a href="diagnostics.html">Expressive
168Diagnostics</a> page.</p>
169
170<!--=======================================================================-->
171<h3><a name="gcccompat">GCC Compatibility</a></h3>
172<!--=======================================================================-->
173
<p>GCC is currently the de facto standard open source compiler, and it
routinely compiles a huge volume of code.  GCC supports a huge number of
extensions and features (many of which are undocumented) and a lot of
code and header files depend on these features in order to build.</p>
178
<p>While it would be nice to be able to ignore these extensions and focus on
implementing the language standards to the letter, pragmatics force us to
support the GCC extensions that see the most use.  Many users just want their
code to compile; they don't care to argue about whether it is pedantically C99
or not.</p>
184
<p>As described under <a href="#conformance">conformance</a>, all
extensions are explicitly recognized as such and marked with extension
diagnostics, which can be mapped to warnings, errors, or just ignored.
</p>
189
190
191<!--*************************************************************************-->
192<h2><a name="applications">Utility and Applications</a></h2>
193<!--*************************************************************************-->
194
195<!--=======================================================================-->
196<h3><a name="libraryarch">Library Based Architecture</a></h3>
197<!--=======================================================================-->
198
199<p>A major design concept for clang is its use of a library-based
200architecture.  In this design, various parts of the front-end can be cleanly
201divided into separate libraries which can then be mixed up for different needs
202and uses.  In addition, the library-based approach encourages good interfaces
203and makes it easier for new developers to get involved (because they only need
204to understand small pieces of the big picture).</p>
205
206<blockquote><p>
207"The world needs better compiler tools, tools which are built as libraries.
208This design point allows reuse of the tools in new and novel ways. However,
209building the tools as libraries isn't enough: they must have clean APIs, be as
210decoupled from each other as possible, and be easy to modify/extend. This
211requires clean layering, decent design, and keeping the libraries independent of
212any specific client."</p></blockquote>
213
214<p>
215Currently, clang is divided into the following libraries and tool:
216</p>
217
218<ul>
219<li><b>libsupport</b> - Basic support library, from LLVM.</li>
220<li><b>libsystem</b> - System abstraction library, from LLVM.</li>
221<li><b>libbasic</b> - Diagnostics, SourceLocations, SourceBuffer abstraction,
222    file system caching for input source files.</li>
223<li><b>libast</b> - Provides classes to represent the C AST, the C type system,
224    builtin functions, and various helpers for analyzing and manipulating the
225    AST (visitors, pretty printers, etc).</li>
226<li><b>liblex</b> - Lexing and preprocessing, identifier hash table, pragma
227    handling, tokens, and macro expansion.</li>
228<li><b>libparse</b> - Parsing. This library invokes coarse-grained 'Actions'
229    provided by the client (e.g. libsema builds ASTs) but knows nothing about
230    ASTs or other client-specific data structures.</li>
231<li><b>libsema</b> - Semantic Analysis.  This provides a set of parser actions
232    to build a standardized AST for programs.</li>
233<li><b>libcodegen</b> - Lower the AST to LLVM IR for optimization &amp; code
234    generation.</li>
235<li><b>librewrite</b> - Editing of text buffers (important for code rewriting
236    transformation, like refactoring).</li>
237<li><b>libanalysis</b> - Static analysis support.</li>
238<li><b>clang</b> - A driver program, client of the libraries at various
239    levels.</li>
240</ul>
241
<p>As an example of the power of this library-based design: if you want to
build a preprocessor, you take the Basic and Lexer libraries. If you want
an indexer, you take the previous two and add the Parser library and
some actions for indexing. If you want a refactoring, static analysis, or
source-to-source compiler tool, you then add the AST building and
semantic analyzer libraries.</p>
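<p>The split between a parser and client-provided actions can be illustrated
with a toy example.  Everything in it is an assumption made for illustration:
the <tt>Actions</tt> interface, <tt>parseDecls</tt>, and <tt>Indexer</tt> are
hypothetical names, and the one-rule grammar is far simpler than anything the
real Parser library handles.  What it shows is that the parser reports what it
recognized and knows nothing about what the client does with that
information.</p>

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical callback interface; the real parser actions differ.
struct Actions {
  virtual ~Actions() = default;
  virtual void onVarDecl(const std::string &name) = 0;
};

// Toy parser for the grammar: decl ::= "int" IDENT ";" with tokens separated
// by whitespace. It reports each declaration to the client and builds no
// data structure of its own. Returns false on the first syntax error.
bool parseDecls(const std::string &source, Actions &actions) {
  std::istringstream in(source);
  std::string keyword, name, semi;
  while (in >> keyword) {
    if (keyword != "int" || !(in >> name >> semi) || semi != ";")
      return false;
    actions.onVarDecl(name);
  }
  return true;
}

// One possible client: an indexer that records names and keeps no AST.
struct Indexer : Actions {
  std::vector<std::string> names;
  void onVarDecl(const std::string &name) override { names.push_back(name); }
};
```

<p>A different client, for example one that builds AST nodes, would implement
the same interface and reuse the parser unchanged.</p>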
248
249<p>For more information about the low-level implementation details of the
250various clang libraries, please see the <a href="docs/InternalsManual.html">
251clang Internals Manual</a>.</p>
252
253<!--=======================================================================-->
254<h3><a name="diverseclients">Support Diverse Clients</a></h3>
255<!--=======================================================================-->
256
<p>Clang is designed and built with many grand plans for how we can use it.  The
driving force is the fact that we use C and C++ daily, and have to suffer due to
a lack of good tools available for them.  We believe that the C and C++ tools
ecosystem has been significantly limited by how difficult it is to parse and
represent the source code for these languages, and we aim to rectify this
problem in clang.</p>
263
<p>The problem with this goal is that different clients have very different
requirements.  Consider code generation, for example: a simple front-end that
parses for code generation must analyze the code for validity and emit code
in some intermediate form to pass off to an optimizer or backend.  Because
validity analysis and code generation can largely be done on the fly, there is
no hard requirement that the front-end actually build up a full AST for all
the expressions and statements in the code.  TCC and GCC are examples of
compilers that either build no real AST (in the former case) or build a stripped
down and simplified AST (in the latter case) because they focus primarily on
codegen.</p>
274
<p>On the opposite side of the spectrum, some clients (like refactoring) want
highly detailed information about the original source code and want a complete
AST to describe it with.  Refactoring wants to have information about macro
expansions, the location of every paren expression '(((x)))' vs 'x', full
position information, and much more.  Further, refactoring wants to look
<em>across the whole program</em> to ensure that it is making transformations
that are safe.  Making this efficient and getting this right requires a
significant amount of engineering and algorithmic work that is simply
unnecessary for a simple static compiler.</p>
284
285<p>The beauty of the clang approach is that it does not restrict how you use it.
286In particular, it is possible to use the clang preprocessor and parser to build
287an extremely quick and light-weight on-the-fly code generator (similar to TCC)
288that does not build an AST at all.   As an intermediate step, clang supports
289using the current AST generation and semantic analysis code and having a code
290generation client free the AST for each function after code generation. Finally,
291clang provides support for building and retaining fully-fledged ASTs, and even
292supports writing them out to disk.</p>
293
294<p>Designing the libraries with clean and simple APIs allows these high-level
295policy decisions to be determined in the client, instead of forcing "one true
296way" in the implementation of any of these libraries.  Getting this right is
297hard, and we don't always get it right the first time, but we fix any problems
298when we realize we made a mistake.</p>
299
300<!--=======================================================================-->
301<h3 id="ideintegration">Integration with IDEs</h3>
302<!--=======================================================================-->
303
<p>
We believe that Integrated Development Environments (IDEs) are a great way
to pull together various pieces of the development puzzle, and aim to make clang
work well in such an environment.  The chief advantage of IDEs is that they
typically have visibility across your entire project and are long-lived
processes, whereas stand-alone compiler tools are typically invoked on each
individual file in the project, and thus have limited scope.</p>
311
<p>There are many implications of this difference, but a significant one has to
do with efficiency and caching: sharing an address space across different files
in a project means that you can use intelligent caching and other techniques to
dramatically reduce analysis/compilation time.</p>
316
<p>A further difference between IDEs and batch compilers is that they often
impose very different requirements on the front-end: IDEs depend on high
performance in order to provide a "snappy" experience, and thus really want
techniques like "incremental compilation", "fuzzy parsing", etc.  Finally, IDEs
often have very different requirements than code generation, often requiring
information that a codegen-only frontend can throw away.  Clang is
specifically designed and built to capture this information.
</p>
325
326
327<!--=======================================================================-->
328<h3><a name="license">Use the LLVM 'BSD' License</a></h3>
329<!--=======================================================================-->
330
331<p>We actively intend for clang (and LLVM as a whole) to be used for
332commercial projects, not only as a stand-alone compiler but also as a library
333embedded inside a proprietary application.  The BSD license is the simplest way
334to allow this.  We feel that the license encourages contributors to pick up the
335source and work with it, and believe that those individuals and organizations
336will contribute back their work if they do not want to have to maintain a fork
337forever (which is time consuming and expensive when merges are involved).
338Further, nobody makes money on compilers these days, but many people need them
339to get bigger goals accomplished: it makes sense for everyone to work
340together.</p>
341
<p>For more information about the LLVM/clang license, please see the <a
href="http://llvm.org/docs/DeveloperPolicy.html#license">LLVM License
Description</a>.</p>
345
346
347
348<!--*************************************************************************-->
349<h2><a name="design">Internal Design and Implementation</a></h2>
350<!--*************************************************************************-->
351
352<!--=======================================================================-->
353<h3><a name="real">A real-world, production quality compiler</a></h3>
354<!--=======================================================================-->
355
356<p>
357Clang is designed and built by experienced compiler developers who
358are increasingly frustrated with the problems that <a
359href="comparison.html">existing open source compilers</a> have.  Clang is
360carefully and thoughtfully designed and built to provide the foundation of a
361whole new generation of C/C++/Objective C development tools, and we intend for
362it to be production quality.</p>
363
364<p>Being a production quality compiler means many things: it means being high
365performance, being solid and (relatively) bug free, and it means eventually
366being used and depended on by a broad range of people.  While we are still in
367the early development stages, we strongly believe that this will become a
368reality.</p>
369
370<!--=======================================================================-->
371<h3><a name="simplecode">A simple and hackable code base</a></h3>
372<!--=======================================================================-->
373
<p>Our goal is to make it possible for anyone with a basic understanding
of compilers and working knowledge of the C/C++/ObjC languages to understand and
extend the clang source base.  A large part of this falls out of our decision to
make the AST mirror the languages as closely as possible: you have your friendly
if statement, for statement, parenthesis expression, structs, unions, etc., all
represented in a simple and explicit way.</p>
380
381<p>In addition to a simple design, we work to make the source base approachable
382by commenting it well, including citations of the language standards where
383appropriate, and designing the code for simplicity.  Beyond that, clang offers
384a set of AST dumpers, printers, and visualizers that make it easy to put code in
385and see how it is represented.</p>
386
387<!--=======================================================================-->
388<h3><a name="unifiedparser">A single unified parser for C, Objective C, C++,
389and Objective C++</a></h3>
390<!--=======================================================================-->
391
<p>Clang is the "C Language Family Front-end", which means we intend to support
the most popular members of the C family.  We are convinced that the right
parsing technology for this class of languages is a hand-built recursive-descent
parser.  Because it is plain C++ code, recursive descent makes it very easy for
new developers to understand the code, easily supports ad-hoc rules and other
strange hacks required by C/C++, and makes it straightforward to implement
excellent diagnostics and error recovery.</p>
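<p>To make the recursive-descent idea concrete, here is a minimal sketch for a
toy expression grammar.  It is not clang code: the <tt>ExprParser</tt> class
and its grammar are assumptions made for illustration.  Each grammar rule
becomes one ordinary C++ function, and because the parser always knows its
current position, reporting the exact column of an error falls out
naturally.</p>

```cpp
#include <cctype>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical toy parser, not clang code. Grammar:
//   expr   ::= term ('+' term)*
//   term   ::= factor ('*' factor)*
//   factor ::= NUMBER | '(' expr ')'
class ExprParser {
public:
  explicit ExprParser(std::string text) : text_(std::move(text)) {}

  // Parses the whole input and returns its value; throws std::runtime_error
  // naming the 1-based column where parsing failed.
  long parse() {
    long value = parseExpr();
    skipSpaces();
    if (pos_ != text_.size())
      fail("unexpected character");
    return value;
  }

private:
  long parseExpr() {
    long value = parseTerm();
    while (consume('+'))
      value += parseTerm();
    return value;
  }

  long parseTerm() {
    long value = parseFactor();
    while (consume('*'))
      value *= parseFactor();
    return value;
  }

  long parseFactor() {
    if (consume('(')) {
      long value = parseExpr();
      if (!consume(')'))
        fail("expected ')'");
      return value;
    }
    skipSpaces();
    if (!atDigit())
      fail("expected number or '('");
    long value = 0;
    while (atDigit())
      value = value * 10 + (text_[pos_++] - '0');
    return value;
  }

  bool atDigit() const {
    return pos_ < text_.size() &&
           std::isdigit(static_cast<unsigned char>(text_[pos_]));
  }

  // Skips whitespace, then consumes `c` if it is the next character.
  bool consume(char c) {
    skipSpaces();
    if (pos_ < text_.size() && text_[pos_] == c) {
      ++pos_;
      return true;
    }
    return false;
  }

  void skipSpaces() {
    while (pos_ < text_.size() &&
           std::isspace(static_cast<unsigned char>(text_[pos_])))
      ++pos_;
  }

  [[noreturn]] void fail(const std::string &what) const {
    throw std::runtime_error("column " + std::to_string(pos_ + 1) + ": " + what);
  }

  std::string text_;
  std::size_t pos_ = 0;
};
```

<p>Ad-hoc rules fit in the same way: a special case is just an extra
<tt>if</tt> in the function for the rule it affects.</p>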
399
<p>We believe that implementing C/C++/ObjC in a single unified parser makes the
end result easier to maintain and evolve than maintaining separate C and C++
parsers, which must be bugfixed and maintained independently of each other.</p>
403
404<!--=======================================================================-->
405<h3><a name="conformance">Conformance with C/C++/ObjC and their
406 variants</a></h3>
407<!--=======================================================================-->
408
409<p>When you start work on implementing a language, you find out that there is a
410huge gap between how the language works and how most people understand it to
411work.  This gap is the difference between a normal programmer and a (scary?
412super-natural?) "language lawyer", who knows the ins and outs of the language
413and can grok standardese with ease.</p>
414
<p>In practice, being conformant with the languages means that we aim to support
the full language, including the dark and dusty corners (like trigraphs,
preprocessor arcana, C99 VLAs, etc).  Where we support extensions above and
beyond what the standard officially allows, we make an effort to explicitly call
this out in the code and emit extension diagnostics about it (which are ignored
by default, but can optionally be mapped to either warnings or errors), allowing
you to use clang in "strict" mode if you desire.</p>
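<p>The mapping of extension diagnostics can be sketched as follows.  This is
an illustration only, not clang's diagnostics API: <tt>Severity</tt>,
<tt>Diagnostic</tt>, and <tt>DiagnosticSink</tt> are hypothetical names.  The
point is that every accepted extension goes through a single reporting entry
point, and the client decides whether that report is dropped, shown as a
warning, or treated as an error.</p>

```cpp
#include <string>
#include <utility>
#include <vector>

enum class Severity { Ignored, Warning, Error };

struct Diagnostic {
  Severity severity;
  std::string message;
};

// Hypothetical sink, not clang's diagnostics API: every accepted extension
// is reported through one entry point, and the client picks the severity.
class DiagnosticSink {
public:
  // Extensions are ignored unless the client asks otherwise.
  explicit DiagnosticSink(Severity extensionSeverity = Severity::Ignored)
      : extensionSeverity_(extensionSeverity) {}

  // Called wherever the front-end accepts a non-standard construct.
  void reportExtension(std::string message) {
    if (extensionSeverity_ == Severity::Ignored)
      return;
    diags_.push_back({extensionSeverity_, std::move(message)});
  }

  // True if any recorded diagnostic was mapped to an error.
  bool hasErrors() const {
    for (const Diagnostic &d : diags_)
      if (d.severity == Severity::Error)
        return true;
    return false;
  }

  const std::vector<Diagnostic> &diagnostics() const { return diags_; }

private:
  Severity extensionSeverity_;
  std::vector<Diagnostic> diags_;
};
```

<p>In this sketch, constructing the sink with <tt>Severity::Error</tt>
corresponds to the "strict" mode described above.</p>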
422
423<p>We also intend to support "dialects" of these languages, such as C89, K&amp;R
424C, C++'03, Objective-C 2, etc.</p>
425
426</div>
427</body>
428</html>
429