[/==============================================================================
    Copyright (C) 2001-2011 Joel de Guzman
    Copyright (C) 2001-2011 Hartmut Kaiser

    Distributed under the Boost Software License, Version 1.0. (See accompanying
    file LICENSE_1_0.txt or copy at http://www.boost.org/LICENSE_1_0.txt)
===============================================================================/]

[section:lexer_quickstart2 Quickstart 2 - A better word counter using __lex__]

People familiar with __flex__ will probably complain that the example from the
section __sec_lex_quickstart_1__ is overly complex and not written to leverage
the possibilities provided by this tool. In particular, the previous example
did not directly use the lexer actions to count the lines, words, and
characters. So the example provided in this step of the tutorial will show how
to use semantic actions in __lex__. Even though this example still counts
textual elements, the purpose is to introduce new concepts and configuration
options along the way (for the full example code see here:
[@../../example/lex/word_count_lexer.cpp word_count_lexer.cpp]).

[import ../example/lex/word_count_lexer.cpp]


[heading Prerequisites]

In addition to the single `#include` required for /Spirit.Lex/ itself, this
example needs a couple of header files from the __phoenix__ library. The
example shows how to attach functors to token definitions, which could be done
using any C++ technique resulting in a callable object. Using __phoenix__ for
this task simplifies things and avoids adding dependencies on other libraries
(__phoenix__ is already in use for __spirit__ anyway).

[wcl_includes]

To make all the code below more readable we introduce the following namespaces.

[wcl_namespaces]

To give a preview of what to expect from this example, here is the flex program
which has been used as the starting point.
The useful code is directly included inside the actions associated with each of
the token definitions.

[wcl_flex_version]


[heading Semantic Actions in __lex__]

__lex__ uses a very similar way of associating actions with the token
definitions (which should look familiar to anybody acquainted with
__spirit__): the operations to execute are specified inside a pair of `[]`
brackets. In order to be able to attach semantic actions to the token
definitions, an instance of `token_def<>` is defined for each of them.

[wcl_token_definition]

The semantics of the code shown are as follows: the code inside the `[]`
brackets will be executed whenever the corresponding token has been matched by
the lexical analyzer. This is very similar to __flex__, where the action code
associated with a token definition gets executed after the recognition of a
matching input sequence. The code above uses function objects constructed using
__phoenix__, but it is possible to insert any C++ function or function object
as long as it exposes the proper interface. For more details please refer to
the section __sec_lex_semactions__.

[heading Associating Token Definitions with the Lexer]

If you compare this code to the code from __sec_lex_quickstart_1__ with regard
to the way token definitions are associated with the lexer, you will notice
that a different syntax is being used here. In the previous example we used
the `self.add()` style of the API, while here we directly assign the token
definitions to `self`, combining the different token definitions using the `|`
operator. Here is the code snippet again:

    this->self
        =   word  [++ref(w), ref(c) += distance(_1)]
        |   eol   [++ref(c), ++ref(l)]
        |   any   [++ref(c)]
        ;

This way we have a very powerful and natural way of building the lexical
analyzer.
If translated into English this may be read as: the lexical analyzer will
recognize ('`=`') tokens as defined by any of ('`|`') the token definitions
`word`, `eol`, and `any`.

A second difference from the previous example is that we do not explicitly
specify any token ids to use for the separate tokens. Using semantic actions to
trigger some useful work has freed us from the need to define them. To ensure
that every token still gets assigned an id, the __lex__ library internally
assigns unique numbers to the token definitions, starting with the constant
defined by `boost::spirit::lex::min_token_id`.

[heading Pulling everything together]

In order to execute the code defined above we still need to instantiate an
instance of the lexer type, feed it some input, and create a pair of iterators
allowing us to iterate over the token sequence created by the lexer. This code
shows how to achieve these steps:

[wcl_main]


[endsect]