[/==============================================================================
    Copyright (C) 2001-2018 Joel de Guzman

    Distributed under the Boost Software License, Version 1.0. (See accompanying
    file LICENSE_1_0.txt or copy at http://www.boost.org/LICENSE_1_0.txt)

    I would like to thank Rainbowverse, llc (https://primeorbial.com/)
    for sponsoring this work and donating it to the community.
===============================================================================/]

[section:annotation Annotations - Decorating the ASTs]

As a prerequisite to understanding this tutorial, please review the previous
[tutorial_employee employee example]. This example builds on top of that
example.

Stop and think about it... We were actually generating ASTs (abstract syntax
trees) in our previous examples. We parsed a single structure and generated
an in-memory representation of it in the form of a struct: the struct
`employee`. If we changed the implementation to parse one or more employees,
the result would be a `std::vector<employee>`. We can go on and add more
hierarchy: teams, departments, corporations, etc. We can have an AST
representation of it all.

This example shows how to annotate the AST with the iterator positions for
access to the source code when post processing using a client supplied
`on_success` handler. The example will show how to get the position in the
input source stream that corresponds to a given element in the AST.

In addition, this example also shows how to "inject" client data, using the
"with" directive, that the `on_success` handler can access as it is called
within the parse traversal through the parser's context.

The full cpp file for this example can be found here:
[@../../../example/x3/annotation.cpp annotation.cpp]

[heading The AST]

First, we'll update our previous employee struct, this time separating the
person into its own struct.
So now, we have two structs, the `person` and the
`employee`. Take note too that we now derive `person` and `employee` from
`x3::position_tagged`, which provides positional information that we can use
to tell the AST's position in the input stream anytime.

    namespace client { namespace ast
    {
        struct person : x3::position_tagged
        {
            person(
                std::string const& first_name = ""
              , std::string const& last_name = ""
            )
              : first_name(first_name)
              , last_name(last_name)
            {}

            std::string first_name, last_name;
        };

        struct employee : x3::position_tagged
        {
            int age;
            person who;
            double salary;
        };
    }}

Like before, we need to tell __fusion__ about our structs to make them
first-class fusion citizens that the grammar can utilize:

    BOOST_FUSION_ADAPT_STRUCT(client::ast::person,
        first_name, last_name
    )

    BOOST_FUSION_ADAPT_STRUCT(client::ast::employee,
        age, who, salary
    )

[heading x3::position_cache]

Before we proceed, let me introduce a helper class called the
`position_cache`. It is a simple class that collects iterator ranges that
point to where each element in the AST is located in the input stream. Given
an AST, you can query the `position_cache` about the AST's position. For
example:

    auto pos = positions.position_of(my_ast);

Where `my_ast` is the AST and `positions` is the `position_cache`,
`position_of` returns an iterator range that points to the start and end
(`pos.begin()` and `pos.end()`) positions where the AST was parsed from.
`positions.begin()` and `positions.end()` point to the start and end of the
entire input stream.

[heading on_success]

The `on_success` handler gives you everything you want from semantic actions
without the visual clutter. Declarative code can and should be free from
imperative code. `on_success` as a concept and mechanism is an important
departure from how things are done in Spirit's previous version: Qi.

As demonstrated in the previous [tutorial_employee employee example], the
preferred way to extract data from an input source is by having the parser
collect the data for us into C++ structs as it traverses the input stream.
Ideally, Spirit X3 grammars are fully attributed and declared in such a way
that you do not have to add any imperative code and there should be no need
for semantic actions at all. The parser simply works as declared and you get
your data back as a result.

However, there are certain cases where there's no way to avoid introducing
imperative code. But semantic actions mess up our clean declarative grammars.
If we care to keep our code clean, `on_success` handlers are alternative
callback hooks to client code that are executed by the parser after a
successful parse without polluting the grammar. Like semantic actions,
`on_success` handlers have access to the AST, the iterators, and context.
But, unlike semantic actions, `on_success` handlers are cleanly separated
from the actual grammar.

[heading Annotation Handler]

As discussed, we annotate the AST with its position in the input stream with
our `on_success` handler:

    // tag used to get the position cache from the context
    struct position_cache_tag;

    struct annotate_position
    {
        template <typename T, typename Iterator, typename Context>
        inline void on_success(Iterator const& first, Iterator const& last
        , T& ast, Context const& context)
        {
            auto& position_cache = x3::get<position_cache_tag>(context).get();
            position_cache.annotate(ast, first, last);
        }
    };

`position_cache_tag` is a special tag we will use to get a reference to the
actual `position_cache`, client data that we will inject at the very start,
when we call parse. More on that later.
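
To make the effect of `annotate` and `position_of` concrete, here is a
simplified, Boost-free sketch of the idea behind a position cache. The names
`tagged_node` and `simple_position_cache` are hypothetical stand-ins invented
for this illustration. They are not the X3 types, and the real
`x3::position_cache` may differ in detail.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a position-tagged AST node (not the X3 type).
struct tagged_node
{
    int id_first = -1;   // slot of the start iterator in the cache
    int id_last = -1;    // slot of the end iterator in the cache
};

// Hypothetical, simplified cache (not the X3 implementation).
template <typename Iterator>
class simple_position_cache
{
public:
    // Record [first, last) and store the two slot indices in the node.
    void annotate(tagged_node& node, Iterator first, Iterator last)
    {
        node.id_first = static_cast<int>(positions_.size());
        positions_.push_back(first);
        node.id_last = static_cast<int>(positions_.size());
        positions_.push_back(last);
    }

    // Look up the range recorded for a previously annotated node.
    std::pair<Iterator, Iterator> position_of(tagged_node const& node) const
    {
        assert(node.id_first >= 0 && node.id_last >= 0); // must be annotated
        return { positions_[static_cast<std::size_t>(node.id_first)]
               , positions_[static_cast<std::size_t>(node.id_last)] };
    }

private:
    std::vector<Iterator> positions_;
};
```

In this sketch the node itself stores only two integers, while the iterators
live in the cache. That keeps the AST independent of the iterator type.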

Our `on_success` handler gets a reference to the actual `position_cache` and
calls its `annotate` member function, passing in the AST and the iterators.
`position_cache.annotate(ast, first, last)` annotates the AST with
information required by `x3::position_tagged`.

[heading The Parser]

Now we'll write a parser for our employee. To simplify, inputs will be of the
form:

    { age, "forename", "surname", salary }

[#__tutorial_annotated_employee_parser__]
Here we go:

    namespace parser
    {
        using x3::int_;
        using x3::double_;
        using x3::lexeme;
        using ascii::char_;

        struct quoted_string_class;
        struct person_class;
        struct employee_class;

        x3::rule<quoted_string_class, std::string> const quoted_string = "quoted_string";
        x3::rule<person_class, ast::person> const person = "person";
        x3::rule<employee_class, ast::employee> const employee = "employee";

        auto const quoted_string_def = lexeme['"' >> +(char_ - '"') >> '"'];
        auto const person_def = quoted_string >> ',' >> quoted_string;

        auto const employee_def =
                '{'
            >>  int_ >> ','
            >>  person >> ','
            >>  double_
            >>  '}'
            ;

        auto const employees = employee >> *(',' >> employee);

        BOOST_SPIRIT_DEFINE(quoted_string, person, employee);
    }

[heading Rule Declarations]

    struct quoted_string_class;
    struct person_class;
    struct employee_class;

    x3::rule<quoted_string_class, std::string> const quoted_string = "quoted_string";
    x3::rule<person_class, ast::person> const person = "person";
    x3::rule<employee_class, ast::employee> const employee = "employee";

Go back and review the original [link __tutorial_employee_parser__ employee parser].
What has changed?

* We split the single employee rule into three smaller rules: `quoted_string`,
  `person` and `employee`.
* We're using forward declared rule classes: `quoted_string_class`, `person_class`,
  and `employee_class`.

[heading Rule Classes]

Like before, in this example, the rule classes, `quoted_string_class`,
`person_class`, and `employee_class` provide statically known IDs for the
rules required by X3 to perform its tasks. In addition to that, the rule
class can also be extended to have some user-defined customization hooks that
are called:

* On success: After a rule successfully parses an input.
* On error: After a rule fails to parse.

We enable these hooks by deriving the rule class from a client supplied
handler such as our `annotate_position` handler above:

    struct person_class : annotate_position {};
    struct employee_class : annotate_position {};

The code above tells X3 to check whether the rule class has `on_success` or
`on_error` member functions and to call them on such events.

[#__tutorial_with_directive__]
[heading The with Directive]

For any parser `p`, one can inject supplementary data that semantic actions
and handlers can access later on when they are called. The general syntax is:

    with<tag>(data)[p]

For our particular example, we use it to inject the `position_cache` into the
parse so that our `annotate_position` `on_success` handler has access to it:

    auto const parser =
        // we pass our position_cache to the parser so we can access
        // it later in our on_success handlers
        with<position_cache_tag>(std::ref(positions))
        [
            employees
        ];

Typically this is done just before calling `x3::parse` or `x3::phrase_parse`.
`with` is a very lightweight operation. It is possible to inject as much data
as you want, even with multiple `with` directives:

    with<tag1>(data1)
    [
        with<tag2>(data2)[p]
    ]

Multiple `with` directives can (perhaps not obviously) be injected from
outside the called function.
Here's an outline:

    template <typename Parser>
    void bar(Parser const& p)
    {
        // Inject data2
        auto const parser = with<tag2>(data2)[p];
        x3::parse(first, last, parser);
    }

    void foo()
    {
        // Inject data1
        auto const parser = with<tag1>(data1)[my_parser];
        bar(parser);
    }

[heading Let's Parse]

Now we have the complete parse mechanism with support for annotations:

    using iterator_type = std::string::const_iterator;
    using position_cache = boost::spirit::x3::position_cache<std::vector<iterator_type>>;

    std::vector<client::ast::employee>
    parse(std::string const& input, position_cache& positions)
    {
        using boost::spirit::x3::ascii::space;

        std::vector<client::ast::employee> ast;
        iterator_type iter = input.begin();
        iterator_type const end = input.end();

        using boost::spirit::x3::with;

        // Our parser
        using client::parser::employees;
        using client::parser::position_cache_tag;

        auto const parser =
            // we pass our position_cache to the parser so we can access
            // it later in our on_success handlers
            with<position_cache_tag>(std::ref(positions))
            [
                employees
            ];

        bool r = phrase_parse(iter, end, parser, space, ast);

        // ... Some error checking here

        return ast;
    }

Let's walk through the code.

First, we have type aliases for 1) the iterator type we are using for the
parser, `iterator_type`, and 2) the `position_cache` type. The latter is a
template that accepts the type of container it will hold. In this case, a
`std::vector<iterator_type>`.

The main parse function accepts an input, a `std::string`, and a reference to
a `position_cache`, and returns an AST: `std::vector<client::ast::employee>`.

Inside the parse function, we first create an AST where parsed data will be
stored:

    std::vector<client::ast::employee> ast;

Then finally, we create a parser, injecting a reference to the `position_cache`,
and call `phrase_parse`:

    using client::parser::employees;
    using client::parser::position_cache_tag;

    auto const parser =
        // we pass our position_cache to the parser so we can access
        // it later in our on_success handlers
        with<position_cache_tag>(std::ref(positions))
        [
            employees
        ];

    bool r = phrase_parse(iter, end, parser, space, ast);

On successful parse, the AST, `ast`, will contain the actual parsed data.

[heading Getting The Source Positions]

Now that we have our main parse function, let's have an example source file to
parse and show how we can obtain the position of an AST element, returned
after a successful parse.

Given this input:

    std::string input = R"(
    {
        23,
        "Amanda",
        "Stefanski",
        1000.99
    },
    {
        35,
        "Angie",
        "Chilcote",
        2000.99
    },
    {
        43,
        "Dannie",
        "Dillinger",
        3000.99
    },
    {
        22,
        "Dorene",
        "Dole",
        2500.99
    },
    {
        38,
        "Rossana",
        "Rafferty",
        5000.99
    }
    )";

We call our parse function after instantiating a `position_cache` object that
will hold the source stream positions:

    position_cache positions{input.begin(), input.end()};
    auto ast = parse(input, positions);

We now have an AST, `ast`, that contains the parsed results. Let us get the
source positions of the 2nd employee:

    auto pos = positions.position_of(ast[1]); // zero based of course!

`pos` is an iterator range that contains iterators to the start and end of
`ast[1]` in the input stream.
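
Once you hold such a pair of iterators, recovering the matched text or a line
number takes only standard library calls. The sketch below assumes nothing
beyond a pair of iterators into the input. The names `source_span` and
`describe` are hypothetical helpers invented for this illustration and are not
part of X3.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>

// Hypothetical result type: the matched text and its 1-based starting line.
struct source_span
{
    std::string text;
    std::size_t line;
};

// Hypothetical helper: describe the range [first, last) relative to the
// start of the whole input. All three iterators must have the same type.
template <typename Iterator>
source_span describe(Iterator input_begin, Iterator first, Iterator last)
{
    // Every newline before `first` pushes the range one line further down.
    auto const newlines = std::count(input_begin, first, '\n');
    return { std::string(first, last)
           , 1 + static_cast<std::size_t>(newlines) };
}
```

With the objects from this tutorial, a call might look like
`describe(input.cbegin(), pos.begin(), pos.end())`. Here `cbegin()` is used so
that all three arguments are `std::string::const_iterator`.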

[heading Config]

If you have read the previous [tutorial_minimal Program Structure] tutorial,
where we separated various logical modules of the parser into separate cpp and
header files, and you are wondering how to provide the context configuration
information (see [link tutorial_configuration Config Section]), we need to
supplement the context like this:

    using phrase_context_type = x3::phrase_parse_context<x3::ascii::space_type>::type;

    typedef x3::context<
        position_cache_tag
      , std::reference_wrapper<position_cache>
      , phrase_context_type>
    context_type;

[endsect]