[/==============================================================================
    Copyright (C) 2001-2018 Joel de Guzman

    Distributed under the Boost Software License, Version 1.0. (See accompanying
    file LICENSE_1_0.txt or copy at http://www.boost.org/LICENSE_1_0.txt)

    I would like to thank Rainbowverse, llc (https://primeorbial.com/)
    for sponsoring this work and donating it to the community.
===============================================================================/]

[section:annotation Annotations - Decorating the ASTs]

As a prerequisite to understanding this tutorial, please review the previous
[tutorial_employee employee example]. This example builds on top of that
example.

Stop and think about it... We're actually generating ASTs (abstract syntax
trees) in our previous examples. We parsed a single structure and generated
an in-memory representation of it in the form of a struct: the `employee`
struct. If we changed the implementation to parse one or more employees,
the result would be a `std::vector<employee>`. We can go on and add more
hierarchy: teams, departments, corporations, etc. We can have an AST
representation of it all.

This example shows how to annotate the AST with iterator positions, using a
client-supplied `on_success` handler, so that post-processing code can access
the source text. It also shows how to get the position in the input source
stream that corresponds to a given element of the AST.

In addition, this example shows how to "inject" client data, using the
`with` directive, so that the `on_success` handler can access it through the
parser's context when it is called during the parse traversal.

The full cpp file for this example can be found here:
[@../../../example/x3/annotation.cpp annotation.cpp]

[heading The AST]

First, we'll update our previous employee struct, this time separating the
person into its own struct. So now we have two structs, `person` and
`employee`. Note too that `person` and `employee` now inherit from
`x3::position_tagged`, which provides positional information that we can use
to tell the AST's position in the input stream at any time.

    namespace client { namespace ast
    {
        struct person : x3::position_tagged
        {
            person(
                std::string const& first_name = ""
              , std::string const& last_name = ""
            )
            : first_name(first_name)
            , last_name(last_name)
            {}

            std::string first_name, last_name;
        };

        struct employee : x3::position_tagged
        {
            int age;
            person who;
            double salary;
        };
    }}

Like before, we need to tell __fusion__ about our structs to make them
first-class fusion citizens that the grammar can utilize:

    BOOST_FUSION_ADAPT_STRUCT(client::ast::person,
        first_name, last_name
    )

    BOOST_FUSION_ADAPT_STRUCT(client::ast::employee,
        age, who, salary
    )

[heading x3::position_cache]

Before we proceed, let me introduce a helper class called the
`position_cache`. It is a simple class that collects iterator ranges that
point to where each element of the AST is located in the input stream. Given
an AST, you can query the `position_cache` about the AST's position. For example:

    auto pos = positions.position_of(my_ast);

Here, `my_ast` is the AST and `positions` is the `position_cache`.
`position_of` returns an iterator range that points to the start and end
(`pos.begin()` and `pos.end()`) positions where the AST was parsed from.
`positions.begin()` and `positions.end()` point to the start and end of the
entire input stream.

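To make the bookkeeping concrete, here is a minimal, Boost-free sketch of the idea behind `position_cache` and `position_tagged`. It is not X3's implementation: `tagged`, `simple_position_cache`, and `node` are hypothetical names, and the sketch assumes that the tagged base stores two indices into a container of iterators owned by the cache.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a position-tagged AST base: it stores only
// indices into the cache's container (-1 means "not annotated yet").
struct tagged
{
    int id_first = -1;
    int id_last = -1;
};

// Hypothetical, simplified position cache over a container of iterators,
// e.g. std::vector<std::string::const_iterator>.
template <typename Container>
class simple_position_cache
{
public:
    using iterator_type = typename Container::value_type;

    simple_position_cache(iterator_type first, iterator_type last)
      : first_(first), last_(last) {}

    // Record [first, last) for an AST node and remember where it was stored.
    void annotate(tagged& ast, iterator_type first, iterator_type last)
    {
        ast.id_first = static_cast<int>(positions_.size());
        positions_.push_back(first);
        ast.id_last = static_cast<int>(positions_.size());
        positions_.push_back(last);
    }

    // Return the recorded range; at() throws if the node was never annotated.
    std::pair<iterator_type, iterator_type> position_of(tagged const& ast) const
    {
        return { positions_.at(static_cast<std::size_t>(ast.id_first)),
                 positions_.at(static_cast<std::size_t>(ast.id_last)) };
    }

    // Start and end of the entire input.
    iterator_type begin() const { return first_; }
    iterator_type end() const { return last_; }

private:
    Container positions_;
    iterator_type first_, last_;
};

using iter = std::string::const_iterator;
using cache = simple_position_cache<std::vector<iter>>;

// A hypothetical AST node that can be annotated.
struct node : tagged
{
    std::string text;
};
```

Under this assumption, each AST node carries only two integers, while the iterators themselves live in one cache object that serves the whole tree.
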
[heading on_success]

The `on_success` handler gives you everything you want from semantic actions
without the visual clutter. Declarative code can and should be free from
imperative code. `on_success` as a concept and mechanism is an important
departure from how things are done in Spirit's previous version: Qi.

As demonstrated in the previous [tutorial_employee employee example], the
preferred way to extract data from an input source is by having the parser
collect the data for us into C++ structs as it traverses the input stream.
Ideally, Spirit X3 grammars are fully attributed and declared in such a way
that you do not have to add any imperative code and there should be no need
for semantic actions at all. The parser simply works as declared and you get
your data back as a result.

However, in certain cases there's no way to avoid introducing imperative
code, and semantic actions clutter our clean declarative grammars. If we care
to keep our code clean, `on_success` handlers offer an alternative: callback
hooks into client code that the parser executes after a successful parse,
without polluting the grammar. Like semantic actions, `on_success` handlers
have access to the AST, the iterators, and the context. Unlike semantic
actions, however, they are cleanly separated from the actual grammar.

[heading Annotation Handler]

As discussed, we annotate the AST with its position in the input stream
using our `on_success` handler:

    // tag used to get the position cache from the context
    struct position_cache_tag;

    struct annotate_position
    {
        template <typename T, typename Iterator, typename Context>
        inline void on_success(Iterator const& first, Iterator const& last
        , T& ast, Context const& context)
        {
            auto& position_cache = x3::get<position_cache_tag>(context).get();
            position_cache.annotate(ast, first, last);
        }
    };

`position_cache_tag` is a special tag we will use to get a reference to the
actual `position_cache`: client data that we will inject at the very start,
when we call parse. More on that later.

Our `on_success` handler gets a reference to the actual `position_cache` and
calls its `annotate` member function, passing in the AST and the iterators.
`position_cache.annotate(ast, first, last)` annotates the AST with
information required by `x3::position_tagged`.

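Why `std::ref` and `.get()`? The handler receives the context by `const&`, yet it must modify the client's cache. The following Boost-free sketch shows that pattern in isolation. It assumes a hypothetical one-slot context (`tiny_context`), lookup function (`context_get`), tag (`log_tag`), and handler (`record_success`), loosely modeled on `x3::get` and `annotate_position`: because the context stores a `std::reference_wrapper`, `.get()` yields a mutable reference to the caller's object even through a `const` context.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Hypothetical tag; like position_cache_tag, it never needs a definition.
struct log_tag;

// Hypothetical one-slot context: a single value keyed by a tag type.
template <typename Tag, typename T>
struct tiny_context
{
    T value;
};

// Hypothetical lookup, loosely modeled on x3::get<Tag>(context).
template <typename Tag, typename T>
T const& context_get(tiny_context<Tag, T> const& ctx)
{
    return ctx.value;
}

// A handler in the style of annotate_position: the context arrives as
// const&, but it holds a reference_wrapper, so .get() gives mutable access
// to the client's vector rather than to a copy.
struct record_success
{
    template <typename Iterator, typename Context>
    void on_success(Iterator const& first, Iterator const& last,
                    Context const& context) const
    {
        std::vector<std::string>& seen = context_get<log_tag>(context).get();
        seen.emplace_back(first, last);  // record the matched text
    }
};
```

Had the context stored the vector by value instead, the handler could not modify it through the `const&` context, and the caller's vector would not be affected in any case.
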
[heading The Parser]

Now we'll write a parser for our employee. To simplify, inputs will be of the
form:

    { age, "forename", "surname", salary }

[#__tutorial_annotated_employee_parser__]
Here we go:

    namespace parser
    {
        using x3::int_;
        using x3::double_;
        using x3::lexeme;
        using ascii::char_;

        struct quoted_string_class;
        struct person_class;
        struct employee_class;

        x3::rule<quoted_string_class, std::string> const quoted_string = "quoted_string";
        x3::rule<person_class, ast::person> const person = "person";
        x3::rule<employee_class, ast::employee> const employee = "employee";

        auto const quoted_string_def = lexeme['"' >> +(char_ - '"') >> '"'];
        auto const person_def = quoted_string >> ',' >> quoted_string;

        auto const employee_def =
                '{'
            >>  int_ >> ','
            >>  person >> ','
            >>  double_
            >>  '}'
            ;

        auto const employees = employee >> *(',' >> employee);

        BOOST_SPIRIT_DEFINE(quoted_string, person, employee);
    }

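For readers new to X3, the `quoted_string` rule, `lexeme['"' >> +(char_ - '"') >> '"']`, behaves roughly like the hand-written function below: it skips leading whitespace once, then matches an opening quote, one or more non-quote characters (with no skipping inside, which is what `lexeme` ensures), and a closing quote. This is an illustrative, Spirit-free sketch; `match_quoted` is a hypothetical name, and it ignores details such as X3's attribute handling.

```cpp
#include <cassert>
#include <cctype>
#include <optional>
#include <string>

// Hypothetical hand-written equivalent of
//   lexeme['"' >> +(char_ - '"') >> '"']
// On success, returns the unquoted text and advances `first` past the
// closing quote; on failure, returns std::nullopt and leaves `first` alone.
template <typename Iterator>
std::optional<std::string> match_quoted(Iterator& first, Iterator last)
{
    Iterator it = first;

    // Pre-skip whitespace once, before skipping is switched off.
    while (it != last && std::isspace(static_cast<unsigned char>(*it)))
        ++it;

    if (it == last || *it != '"')
        return std::nullopt;
    ++it;  // opening quote

    std::string text;
    while (it != last && *it != '"')  // char_ - '"', nothing skipped
        text += *it++;

    // +(...) needs at least one character, and a closing quote must follow.
    if (text.empty() || it == last)
        return std::nullopt;
    ++it;  // closing quote

    first = it;
    return text;
}
```

For example, applied to `  "Amanda Stefanski", ...` it would return `Amanda Stefanski` with its inner space intact and leave `first` at the comma.
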
[heading Rule Declarations]

    struct quoted_string_class;
    struct person_class;
    struct employee_class;

    x3::rule<quoted_string_class, std::string> const quoted_string = "quoted_string";
    x3::rule<person_class, ast::person> const person = "person";
    x3::rule<employee_class, ast::employee> const employee = "employee";

Go back and review the original [link __tutorial_employee_parser__ employee parser].
What has changed?

* We split the single employee rule into three smaller rules: `quoted_string`,
  `person` and `employee`.
* We're using forward-declared rule classes: `quoted_string_class`, `person_class`,
  and `employee_class`.

[heading Rule Classes]

As before, the rule classes `quoted_string_class`, `person_class`, and
`employee_class` provide the statically known rule IDs that X3 requires to
perform its tasks. In addition, a rule class can be extended with
user-defined customization hooks that are called:

* On success: after a rule successfully parses an input.
* On error: after a rule fails to parse (in X3, when an expectation failure
  is thrown while parsing the rule).

Hooks are added by deriving the rule class from a client-supplied handler,
such as our `annotate_position` handler above:

    struct person_class : annotate_position {};
    struct employee_class : annotate_position {};

X3 checks whether a rule class has an `on_success` or `on_error` member
function and, if so, calls it when the corresponding event occurs.

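How can a library call a hook only when the rule class provides one? One common C++17 technique is the detection idiom with `std::void_t` and `if constexpr`, sketched below. This illustrates the general technique under simplifying assumptions and is not X3's actual code: `has_on_success`, `notify_success`, `count_matches`, `hooked_class`, `plain_class`, and `counted` are hypothetical, and the hook here omits the context parameter that X3 handlers receive.

```cpp
#include <cassert>
#include <string>
#include <type_traits>
#include <utility>

// Detection idiom: true if ID has a callable on_success(first, last, attr).
template <typename ID, typename Iterator, typename Attribute, typename = void>
struct has_on_success : std::false_type {};

template <typename ID, typename Iterator, typename Attribute>
struct has_on_success<ID, Iterator, Attribute,
    std::void_t<decltype(std::declval<ID&>().on_success(
        std::declval<Iterator const&>(),
        std::declval<Iterator const&>(),
        std::declval<Attribute&>()))>>
  : std::true_type {};

// Call the hook if the rule ID class provides one; otherwise do nothing.
template <typename ID, typename Iterator, typename Attribute>
void notify_success(Iterator const& first, Iterator const& last, Attribute& attr)
{
    if constexpr (has_on_success<ID, Iterator, Attribute>::value)
        ID{}.on_success(first, last, attr);
}

// A client handler, analogous in spirit to annotate_position.
struct count_matches
{
    template <typename Iterator, typename Attribute>
    void on_success(Iterator const&, Iterator const&, Attribute& attr)
    {
        ++attr.hits;
    }
};

struct hooked_class : count_matches {};  // inherits the hook
struct plain_class {};                   // no hook at all

struct counted
{
    int hits = 0;
};
```

Here `notify_success<hooked_class>(first, last, attr)` increments `attr.hits`, while the same call with `plain_class` compiles to nothing, much as a rule class without a handler base simply gets no callback.
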
[#__tutorial_with_directive__]
[heading The with Directive]

For any parser `p`, one can inject supplementary data that semantic actions
and handlers can access later on when they are called. The general syntax is:

    with<tag>(data)[p]

For our particular example, we use it to inject the `position_cache` into the
parse so that our `annotate_position` `on_success` handler can access it:

    auto const parser =
        // we pass our position_cache to the parser so we can access
        // it later in our on_success handlers
        with<position_cache_tag>(std::ref(positions))
        [
            employees
        ];

Typically this is done just before calling `x3::parse` or `x3::phrase_parse`.
`with` is a very lightweight operation. You can inject as much data as you
want, even with multiple `with` directives:

    with<tag1>(data1)
    [
        with<tag2>(data2)[p]
    ]

Perhaps less obviously, multiple `with` directives can also be layered
across function boundaries: a caller injects some data, and the function it
calls wraps the parser it receives to inject more. Here's an outline:

    template <typename Parser>
    void bar(Parser const& p)
    {
        // p already carries data1; inject data2 as well
        auto const parser = with<tag2>(data2)[p];
        x3::parse(first, last, parser);
    }

    void foo()
    {
        // Inject data1
        auto const parser = with<tag1>(data1)[my_parser];
        bar(parser);
    }

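Both forms of layering amount to the same thing: each `with` extends the context with one more tagged entry, and a later lookup walks the chain by tag. The sketch below models that idea with hypothetical types (`empty_context`, `ctx_link`), a hypothetical `with_value` standing in for `with`, and a hypothetical `lookup` standing in for `x3::get`. It is a simplified illustration, not X3's `context` implementation. Storing `std::reference_wrapper` values keeps each link pointer-sized per entry.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <type_traits>

// Hypothetical tags; like position_cache_tag, they need no definition.
struct counter_tag;
struct name_tag;

// End of the chain.
struct empty_context {};

// One link of the chain: a value keyed by Tag, plus the enclosing context.
template <typename Tag, typename T, typename Next>
struct ctx_link
{
    T value;
    Next next;
};

// Compile-time lookup by tag: return this link's value if the tag matches,
// otherwise search the enclosing context. A missing tag fails to compile.
template <typename Tag, typename LinkTag, typename T, typename Next>
decltype(auto) lookup(ctx_link<LinkTag, T, Next> const& ctx)
{
    if constexpr (std::is_same_v<Tag, LinkTag>)
        return (ctx.value);  // T const&
    else
        return lookup<Tag>(ctx.next);
}

// Stand-in for with<Tag>(value)[...]: wrap an existing context in a new link.
template <typename Tag, typename T, typename Next>
ctx_link<Tag, T, Next> with_value(T value, Next const& next)
{
    return {value, next};
}
```

By analogy with the `foo`/`bar` outline, the caller would build the outer link and the callee the inner one; a lookup through the inner context can then reach either tag.
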
[heading Let's Parse]

Now we have the complete parse mechanism with support for annotations:

    using iterator_type = std::string::const_iterator;
    using position_cache = boost::spirit::x3::position_cache<std::vector<iterator_type>>;

    std::vector<client::ast::employee>
    parse(std::string const& input, position_cache& positions)
    {
        using boost::spirit::x3::ascii::space;

        std::vector<client::ast::employee> ast;
        iterator_type iter = input.begin();
        iterator_type const end = input.end();

        using boost::spirit::x3::with;

        // Our parser
        using client::parser::employees;
        using client::parser::position_cache_tag;

        auto const parser =
            // we pass our position_cache to the parser so we can access
            // it later in our on_success handlers
            with<position_cache_tag>(std::ref(positions))
            [
                employees
            ];

        bool r = phrase_parse(iter, end, parser, space, ast);

        // ... Some error checking here

        return ast;
    }

Let's walk through the code.

First, we have two type aliases: `iterator_type`, the iterator type used by
the parser, and `position_cache`. The latter is an instantiation of a template
that accepts the type of container it will hold; in this case, a
`std::vector<iterator_type>`.

The main parse function takes the input (a `std::string`) and a reference to
a `position_cache`, and returns an AST: `std::vector<client::ast::employee>`.

Inside the parse function, we first create an AST where the parsed data will
be stored:

    std::vector<client::ast::employee> ast;

Finally, we create a parser, injecting a reference to the `position_cache`,
and call `phrase_parse`:

    using client::parser::employees;
    using client::parser::position_cache_tag;

    auto const parser =
        // we pass our position_cache to the parser so we can access
        // it later in our on_success handlers
        with<position_cache_tag>(std::ref(positions))
        [
            employees
        ];

    bool r = phrase_parse(iter, end, parser, space, ast);

On a successful parse, the AST, `ast`, will contain the actual parsed data.

[heading Getting The Source Positions]

Now that we have our main parse function, let's take an example source file,
parse it, and show how we can obtain the position of an AST element returned
after a successful parse.

Given this input:

    std::string input = R"(
    {
        23,
        "Amanda",
        "Stefanski",
        1000.99
    },
    {
        35,
        "Angie",
        "Chilcote",
        2000.99
    },
    {
        43,
        "Dannie",
        "Dillinger",
        3000.99
    },
    {
        22,
        "Dorene",
        "Dole",
        2500.99
    },
    {
        38,
        "Rossana",
        "Rafferty",
        5000.99
    }
    )";

We call our parse function after instantiating a `position_cache` object that
will hold the source stream positions:

    position_cache positions{input.begin(), input.end()};
    auto ast = parse(input, positions);

We now have an AST, `ast`, that contains the parsed results. Let us get the
source positions of the second employee:

    auto pos = positions.position_of(ast[1]); // zero-based, of course!

`pos` is an iterator range that contains iterators to the start and end of
`ast[1]` in the input stream.

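A common use of `pos` is reporting where an element came from. Since `positions.begin()` marks the start of the whole input, a line and column can be recovered by scanning from there up to `pos.begin()`. The helper below is a hypothetical sketch, not part of X3; it assumes 1-based line and column numbers counted in `char`s (not Unicode code points), and it accepts any forward iterator over characters.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

struct line_column
{
    std::size_t line = 1;    // 1-based
    std::size_t column = 1;  // 1-based, counted in chars
};

// Hypothetical helper: scan from the start of the input up to `where`,
// counting newlines to find the line and column of `where`.
template <typename Iterator>
line_column locate(Iterator input_begin, Iterator where)
{
    line_column lc;
    for (Iterator it = input_begin; it != where; ++it)
    {
        if (*it == '\n')
        {
            ++lc.line;
            lc.column = 1;
        }
        else
        {
            ++lc.column;
        }
    }
    return lc;
}
```

With the tutorial's objects, the call would be `locate(positions.begin(), pos.begin())`. The scan is linear in the distance from the start of the input, which is usually acceptable for error reporting.
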
[heading Config]

If you read the previous [tutorial_minimal Program Structure] tutorial, where
we separated the various logical modules of the parser into separate cpp and
header files, you may be wondering how to provide the context configuration
information (see [link tutorial_configuration Config Section]). Here, the
context must be supplemented with the injected `position_cache`, like this:

    using phrase_context_type = x3::phrase_parse_context<x3::ascii::space_type>::type;

    // the tag and value injected by with<position_cache_tag>(std::ref(positions))
    typedef x3::context<
        position_cache_tag
      , std::reference_wrapper<position_cache>
      , phrase_context_type>
    context_type;

[endsect]
