[/==============================================================================
    Copyright (C) 2001-2011 Joel de Guzman
    Copyright (C) 2001-2011 Hartmut Kaiser

    Distributed under the Boost Software License, Version 1.0. (See accompanying
    file LICENSE_1_0.txt or copy at http://www.boost.org/LICENSE_1_0.txt)
===============================================================================/]

[section Preface]

[:['["Examples of designs that meet most of the criteria for
"goodness" (easy to understand, flexible, efficient) are a
recursive-descent parser, which is traditional procedural
code. Another example is the STL, which is a generic library of
containers and algorithms depending crucially on both traditional
procedural code and on parametric polymorphism.]] [*--Bjarne
Stroustrup]]

[heading History]

[heading /80s/]

In the mid-80s, Joel wrote his first calculator in Pascal. It was an
unforgettable coding experience: he was amazed at how a mutually
recursive set of functions could model a grammar specification. In time,
the skills he acquired from that academic experience became very
practical as he was tasked with parsing work. For instance, whenever he
needed to perform any form of binary or text I/O, he tried to approach
each task somewhat formally by writing a grammar using Pascal-like
syntax diagrams and then a corresponding recursive-descent parser. This
process worked very well.

[heading /90s/]

The arrival of the Internet and the World Wide Web magnified the need
for parsing a thousand-fold. At one point Joel had to write an HTML
parser for a Web browser project. Using the W3C formal specifications,
he easily wrote a recursive-descent HTML parser. With the influence of
the Internet, RFC specifications were abundant. SGML, HTML, XML, email
addresses and even those seemingly trivial URLs were all formally
specified using small EBNF-style grammar specifications. Joel had more
parsing to do, and he wished for a tool similar to larger parser
generators such as YACC and ANTLR, where a parser is built automatically
from a grammar specification.

This ideal tool would be able to parse anything from email addresses and
command lines, to XML and scripting languages. Scalability was a primary
goal. The tool would be able to do this without incurring a heavy
development load, which was not possible with the above-mentioned parser
generators. The result was Spirit.

Spirit was a personal project that was conceived when Joel was involved
in R&D in Japan. Inspired by the GoF's composite and interpreter
patterns, he realized that he could model a recursive-descent parser with
hierarchical-object composition of primitives (terminals) and composites
(productions). The original version was implemented with run-time
polymorphic classes. A parser was generated at run time by feeding in
production rule strings such as:

    "prod ::= {'A' | 'B'} 'C';"

A compile function compiled the parser, dynamically creating a hierarchy
of objects and linking semantic actions on the fly. A very early text
can be found here: __early_spirit__.

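The object composition described above can be sketched in a few lines of
C++. This is only an illustration of the composite/interpreter idea, not
the original Spirit code: the class names (`Parser`, `Char`, `Alternative`,
`Sequence`, `Repeat`) and the helpers `make_prod` and `matches` are
hypothetical, and the object tree for the production above is built by hand
rather than compiled from the rule string.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>

// Hypothetical node interface (not Spirit's): try to match `in` starting at
// `pos`; advance `pos` on success, leave it unchanged on failure.
struct Parser {
    virtual ~Parser() = default;
    virtual bool parse(const std::string& in, std::size_t& pos) const = 0;
};
using ParserPtr = std::shared_ptr<const Parser>;

// Primitive (terminal): matches exactly one given character.
struct Char : Parser {
    explicit Char(char c) : ch(c) {}
    bool parse(const std::string& in, std::size_t& pos) const override {
        if (pos < in.size() && in[pos] == ch) { ++pos; return true; }
        return false;
    }
    char ch;
};

// Composite: `left | right`, tried in order.
struct Alternative : Parser {
    Alternative(ParserPtr l, ParserPtr r)
        : left(std::move(l)), right(std::move(r)) {}
    bool parse(const std::string& in, std::size_t& pos) const override {
        return left->parse(in, pos) || right->parse(in, pos);
    }
    ParserPtr left, right;
};

// Composite: `first` followed by `second`; restores `pos` if either fails.
struct Sequence : Parser {
    Sequence(ParserPtr a, ParserPtr b)
        : first(std::move(a)), second(std::move(b)) {}
    bool parse(const std::string& in, std::size_t& pos) const override {
        const std::size_t saved = pos;
        if (first->parse(in, pos) && second->parse(in, pos)) return true;
        pos = saved;
        return false;
    }
    ParserPtr first, second;
};

// Composite: `{ subject }`, zero or more repetitions (greedy, always succeeds).
struct Repeat : Parser {
    explicit Repeat(ParserPtr s) : subject(std::move(s)) {}
    bool parse(const std::string& in, std::size_t& pos) const override {
        for (;;) {
            const std::size_t before = pos;
            // Stop on failure, or on an empty match to avoid looping forever.
            if (!subject->parse(in, pos) || pos == before) break;
        }
        return true;
    }
    ParserPtr subject;
};

// Hand-built object tree for: prod ::= {'A' | 'B'} 'C';
ParserPtr make_prod() {
    ParserPtr a_or_b = std::make_shared<Alternative>(
        std::make_shared<Char>('A'), std::make_shared<Char>('B'));
    return std::make_shared<Sequence>(
        std::make_shared<Repeat>(a_or_b), std::make_shared<Char>('C'));
}

// True when `p` consumes the whole of `text`.
bool matches(const Parser& p, const std::string& text) {
    std::size_t pos = 0;
    return p.parse(text, pos) && pos == text.size();
}
```

With these definitions, `matches(*make_prod(), "ABBAC")` succeeds and
`matches(*make_prod(), "AB")` fails. Every `parse` call goes through a
virtual function, which is the run-time cost the later rewrite removed.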
[heading /2001 to 2006/]

Versions 1.0 to 1.8 were a complete rewrite of the original Spirit parser
using expression templates and static polymorphism, inspired by the
works of Todd Veldhuizen (__exprtemplates__, C++ Report, June
1995). Initially, the static-Spirit version was meant only to replace
the core of the original dynamic-Spirit. Dynamic-Spirit needed a parser
to implement itself anyway. The original employed a hand-coded
recursive-descent parser to parse the input grammar specification
strings. It was at this time that Hartmut Kaiser joined the Spirit
development.

After its initial "open-source" debut in May 2001, static-Spirit became
a success. Around November 2001, the Spirit website had an activity
percentile of 98%, making it the number one parser tool at SourceForge
at the time. Not bad for a niche project like a parser library. The
"static" part of the name was soon dropped and static-Spirit simply became
Spirit. The library soon evolved to acquire more dynamic features.

Spirit was formally accepted into __boost__ in October 2002. Boost is a
peer-reviewed, open collaborative development effort around a collection
of free Open Source C++ libraries covering a wide range of domains. The
Boost Libraries have become widely known as an industry standard for
design and implementation quality, robustness, and reusability.

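To make the expression-template rewrite described at the start of this
section more concrete, here is a minimal sketch of static polymorphism
applied to parser composition. It is not Spirit's implementation: the names
`parser_base`, `lit_char`, `alternative`, `sequence`, `kleene_star` and
`matches_all` are hypothetical. The sketch only borrows the operator
spellings Spirit uses (`>>` for sequence, `|` for alternative, unary `*`
for zero-or-more) to show how the grammar is encoded in a C++ type at
compile time instead of in a tree of heap objects.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical CRTP base (not Spirit's): tags a type as a parser and
// recovers the concrete type without any virtual call.
template <typename Derived>
struct parser_base {
    const Derived& derived() const {
        return static_cast<const Derived&>(*this);
    }
};

// Primitive: matches exactly one given character.
struct lit_char : parser_base<lit_char> {
    explicit lit_char(char c) : ch(c) {}
    bool parse(const std::string& in, std::size_t& pos) const {
        if (pos < in.size() && in[pos] == ch) { ++pos; return true; }
        return false;
    }
    char ch;
};

// Composite: left | right. Operands are stored by value.
template <typename L, typename R>
struct alternative : parser_base<alternative<L, R> > {
    alternative(const L& l, const R& r) : left(l), right(r) {}
    bool parse(const std::string& in, std::size_t& pos) const {
        return left.parse(in, pos) || right.parse(in, pos);
    }
    L left;
    R right;
};

// Composite: first >> second; restores `pos` if either part fails.
template <typename L, typename R>
struct sequence : parser_base<sequence<L, R> > {
    sequence(const L& l, const R& r) : first(l), second(r) {}
    bool parse(const std::string& in, std::size_t& pos) const {
        const std::size_t saved = pos;
        if (first.parse(in, pos) && second.parse(in, pos)) return true;
        pos = saved;
        return false;
    }
    L first;
    R second;
};

// Composite: *subject, zero or more repetitions (greedy, always succeeds).
template <typename S>
struct kleene_star : parser_base<kleene_star<S> > {
    explicit kleene_star(const S& s) : subject(s) {}
    bool parse(const std::string& in, std::size_t& pos) const {
        for (;;) {
            const std::size_t before = pos;
            // Stop on failure, or on an empty match to avoid looping forever.
            if (!subject.parse(in, pos) || pos == before) break;
        }
        return true;
    }
    S subject;
};

// The operators build the composite *type* at compile time.
template <typename L, typename R>
alternative<L, R> operator|(const parser_base<L>& l, const parser_base<R>& r) {
    return alternative<L, R>(l.derived(), r.derived());
}

template <typename L, typename R>
sequence<L, R> operator>>(const parser_base<L>& l, const parser_base<R>& r) {
    return sequence<L, R>(l.derived(), r.derived());
}

template <typename S>
kleene_star<S> operator*(const parser_base<S>& s) {
    return kleene_star<S>(s.derived());
}

// True when `p` consumes the whole of `text`.
template <typename P>
bool matches_all(const parser_base<P>& p, const std::string& text) {
    std::size_t pos = 0;
    return p.derived().parse(text, pos) && pos == text.size();
}
```

The expression `*(lit_char('A') | lit_char('B')) >> lit_char('C')` then has
the type `sequence<kleene_star<alternative<lit_char, lit_char> >, lit_char>`,
and all `parse` calls are resolved statically, so the compiler is free to
inline them.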
[heading /2007/]

Over the years, especially after Spirit was accepted into Boost, Spirit
has served its purpose quite admirably. [*/Classic-Spirit/] (versions
prior to 2.0) focused on transduction parsing, where the input string is
merely translated to an output string. Many parsers fall into the
transduction type. When the time came to add attributes to the parser
library, it was done in a rather ad-hoc manner, with the goal of being 100%
backward compatible with Classic Spirit. As a result, some parsers have
attributes, some don't.

Spirit V2 is another major rewrite. Spirit V2 grammars are fully
attributed (see __attr_grammar__), which means that all parser components
have attributes. To do this efficiently and elegantly, we had to use a
couple of infrastructure libraries. Some did not exist, some were quite
new when Spirit debuted, and some needed work. __mpl__ is an important
infrastructure library, yet is not sufficient to implement Spirit V2.
Another library had to be written: __fusion__. Fusion sits between MPL
and STL -- between compile time and runtime -- mapping types to values.
Fusion is a direct descendant of both MPL and __boost_tuples__. Fusion
is now a full-fledged __boost__ library. __phoenix__ also had to be
beefed up to support Spirit V2. The result is __phoenix__. Last
but not least, Spirit V2 uses an __exprtemplates__ library called
__boost_proto__.

Even though it has evolved and matured to become a multi-module library,
Spirit is still used for micro-parsing tasks as well as scripting
languages. As with C++, you pay only for the features that you need. The
power of Spirit comes from its modularity and extensibility. Instead of
giving you a sledgehammer, it gives you the right ingredients to easily
create a sledgehammer.

[heading New Ideas: Spirit V2]

Just before the development of Spirit V2 began, Hartmut came across the
__string_template__ library that is a part of the ANTLR parser
framework. [footnote Quote from http://www.stringtemplate.org/: It is a
Java template engine (with ports for C# and Python) for generating
source code, web pages, emails, or any other formatted text output.]
The concepts presented in that library led Hartmut to
the next step in the evolution of Spirit. Parsing and generation are
tightly connected to a formal notation, or a grammar. The grammar
describes both input and output, and therefore, a parser library should
have a grammar-driven output. This duality is expressed in Spirit by the
parser library __qi__ and the generator library __karma__ using the same
component infrastructure.

The idea of creating a lexer library well integrated with the Spirit
parsers is not new. This has been discussed almost since Classic-Spirit
(pre V2) initially debuted. Several attempts to integrate existing lexer
libraries and frameworks with Spirit have been made and served as a
proof of concept and usability (for example, see __wave__: The Boost
C/C++ Preprocessor Library, and __slex__: a fully dynamic C++ lexer
implemented with Spirit). Based on these experiences, we added __lex__,
a fully integrated lexer library, to the mix. It allows the user to take
advantage of the power of regular expressions for token matching, which
removes pressure from the parser components and simplifies parser
grammars. Again, Spirit's modular structure allowed us to reuse the same
underlying component library as for the parser and generator libraries.

[heading How to use this manual]

Each of the three major sections (__sec_qi__, __sec_karma__, and
__sec_lex__) is roughly divided into three parts:

# Tutorials: A step-by-step guide with heavily annotated code. These
  are meant to get the user acquainted with the library as quickly as
  possible. The objective is to build the confidence of the user in
  using the library through abundant examples and detailed
  instructions. Examples speak volumes and we have volumes of
  examples!

# Abstracts: A high-level summary of key topics. The objective is to
  give the user a high-level view of the library, the key concepts,
  background and theories.

# Reference: Detailed formal technical reference. We start with a quick
  reference -- an easy-to-use table that maps into the reference proper.
  The reference proper starts with C++ concepts followed by
  models of the concepts.

Some icons are used to mark certain topics indicative of their relevance.
These icons precede some text to indicate:

[table Icons

    [[Icon]             [Name]          [Meaning]]

    [[__note__]         [Note]          [Generally useful information (an aside that
                                        doesn't fit in the flow of the text)]]

    [[__tip__]          [Tip]           [Suggestion on how to do something
                                        (especially something that is not obvious)]]

    [[__important__]    [Important]     [Important note on something to take
                                        particular notice of]]

    [[__caution__]      [Caution]       [Take special care with this - it may
                                        not be what you expect and may cause bad
                                        results]]

    [[__danger__]       [Danger]        [This is likely to cause serious
                                        trouble if ignored]]
]

This documentation is automatically generated by the Boost QuickBook
documentation tool. QuickBook can be found in the __boost_tools__.

[heading Support]

Please direct all questions to Spirit's mailing list. You can subscribe
to the __spirit_list__. The mailing list has a searchable archive. A
search link to this archive is provided on __spirit__'s home page. You
may also read and post messages to the mailing list through
__spirit_general__ (thanks to __gmane__). The newsgroup mirrors the
mailing list. Here is a link to the archives: __mlist_archive__.

[endsect] [/ Preface]