/// \page interop Interacting with the Generated Code
///
/// \section intro Introduction
///
/// The main way to interact with the generated code is via action code placed within <code>{</code> and
/// <code>}</code> characters in your rules. In general, you are advised to keep the code you embed within
/// these actions, and in the grammar itself, to an absolute minimum. Rather than embed code directly in your
/// grammar, you should construct an API that is called from the actions within your grammar. This way
/// you will keep the grammar clean and maintainable, and you will separate the code generators or other code
/// from the definition of the grammar itself.
///
/// However, when you wish to call your API functions, or insert small pieces of code that do not
/// warrant external functions, you will need to access elements of tokens, return elements from
/// parser rules and perhaps the internals of the recognizer itself. The C runtime provides a number
/// of macros that you can use within your action code. It also provides a number of performant
/// structures that you may find useful for building symbol tables, lists, tries, stacks, arrays and so on (all
/// of which are managed so that your memory allocation problems are minimized).
///
/// \section rules Parameters and Returns from Parser Rules
///
/// The C target does not differ from the Java target in any major ways here, and you should consult
/// the standard documentation for the use of parameters on rules and the returns clause. You should
/// be aware, though, that the rules generate C function calls and therefore the input and returns
/// clauses are subject to the constraints of C scoping.
///
/// You should note that if your parser rule returns more than a single entity, then the return
/// type of the generated rule function is a struct, which is returned by value.
/// This is also the case if your rule is part of a tree-building grammar (one that uses the
/// <code>output=AST;</code> option).
///
/// Other than the notes above, you can use any pre-declared type as an input or output parameter
/// for your rule.
///
/// \section memory Memory Management
///
/// You are responsible for allocating and freeing any memory used by your own
/// constructs; ANTLR will track and release any memory allocated internally for tokens, trees, stacks, scopes
/// and so on. This memory is returned to the malloc pool when you call the free method of any
/// ANTLR3-produced structure.
///
/// For performance reasons, and to avoid thrashing the malloc allocation system, memory for many elements
/// of your generated parser is allocated in chunks and parcelled out by factories. For instance, memory
/// for tokens is created as an array of tokens, and a token factory hands out the next available slot
/// to the lexer. When you free the lexer, the allocated memory is returned to the pool. The same applies
/// to 'strings' that contain the token text and various other text elements accessed within the lexer.
///
/// The only side effect of this is that after your parse and analysis is complete, if you wish to retain
/// anything generated automatically, you must copy it before freeing the recognizer structures. In practice
/// it is usually simplest to retain the recognizer context objects until your processing is complete, or
/// to use your own allocation scheme for generating output etc.
///
/// The advantage of using object factories is, of course, that memory leaks and accesses to de-allocated
/// memory are bugs that rarely occur within the ANTLR3 C runtime. Further, allocating memory for
/// tokens, trees and so on is very fast.
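///
/// The chunked factory idea can be illustrated with a small standalone model. This is a sketch, not the
/// runtime's token factory: the names <code>TokenFactory</code>, <code>factoryNewToken</code> and
/// <code>factoryFree</code>, and the chunk size of 4, are all hypothetical. Tokens are carved out of
/// fixed-size chunks, so they are never freed one by one; freeing the factory releases every chunk at once.
///
```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical token record; a real token carries many more fields. */
typedef struct {
    int type;
    int start, stop;   /* character offsets into the input */
} Token;

#define CHUNK_SIZE 4   /* hypothetical; kept small so the chunking is easy to see */

typedef struct Chunk {
    struct Chunk *next;          /* chunks are chained so they can all be freed */
    Token slots[CHUNK_SIZE];
} Chunk;

typedef struct {
    Chunk *chunks;     /* most recently allocated chunk first */
    int used;          /* slots already handed out from the head chunk */
    int chunkCount;    /* number of malloc calls made, for illustration */
} TokenFactory;

static void factoryInit(TokenFactory *f) {
    f->chunks = NULL;
    f->used = CHUNK_SIZE;  /* forces an allocation on the first request */
    f->chunkCount = 0;
}

/* Hand out the next free slot, allocating a new chunk only when the current one is full. */
static Token *factoryNewToken(TokenFactory *f) {
    if (f->used == CHUNK_SIZE) {
        Chunk *c = malloc(sizeof *c);
        if (c == NULL) return NULL;
        c->next = f->chunks;
        f->chunks = c;
        f->used = 0;
        f->chunkCount++;
    }
    return &f->chunks->slots[f->used++];
}

/* Release every chunk in one pass; tokens are never freed individually. */
static void factoryFree(TokenFactory *f) {
    Chunk *c = f->chunks;
    while (c != NULL) {
        Chunk *next = c->next;
        free(c);
        c = next;
    }
    factoryInit(f);
}
```
///
/// Every pointer returned by <code>factoryNewToken</code> becomes invalid after <code>factoryFree</code>,
/// which is exactly why anything you want to keep must be copied before the recognizer is freed.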
///
/// \section ctx The CTX Macro
///
/// The CTX macro refers to the context pointer, which is passed as the first parameter to any generated function
/// concerned with your lexer, parser, or tree parser. The context pointer identifies your generated
/// recognizer; it is how you invoke the generated functions and access the data embedded within your generated
/// recognizer. While you can use it to directly access stacks, scopes and so on, this is not really recommended,
/// as you should use the $xxx references that are available generically within ANTLR grammars.
///
/// The context pointer is used because it removes the need for any global or static variables at all, either
/// within the generated code or within the C runtime. This is of course fundamental to creating free-threading
/// recognizers. Wherever a function call or rule call requires the ctx parameter, you either reference it
/// via the CTX macro or, outside the generated code, use the pointer returned by the 'constructor'
/// function for your parser/lexer/tree parser (see the code example in "How to build Generated Code").
///
/// \section macros Pre-Defined Convenience Macros
///
/// While the author is not fond of using C macros to hide code or structure access, in the case of generated
/// code they serve two useful purposes. The first is to simplify the references to internal constructs;
/// the second is to facilitate the change of any internal interface without requiring you to port grammars
/// from earlier versions (just regenerate and recompile). As of release 3.1, these macros are stable and
/// will only change their usage interface in the event of bugs being discovered. You are encouraged to
/// use these macros in your code, rather than access the raw interface.
///
/// \b NB: Macros that act like statements must be terminated with a ';'. The macro body does not
/// supply this, nor should it.
/// Macros that call functions are declared with () even if they
/// have no parameters; macros that reference fields do not have a () declaration.
///
/// \section lexermacros Lexer Macros
///
/// There are a number of macros that are useful exclusively within lexer rules. There are additional
/// macros, common to all recognizers, and these are documented in the section Macros Common to All Recognizers.
///
/// \subsection lexer LEXER
///
/// The <code>LEXER</code> macro returns a pointer to the base lexer object, which is of type #pANTLR3_LEXER.
/// This is not the pointer to your generated lexer, which is supplied by the CTX macro, but to the common
/// implementation of a lexer interface, which is supplied to all generated lexers.
///
/// \subsection lexstate LEXSTATE
///
/// Provides a pointer to the lexer shared state structure, which is where the tokens for a
/// rule are constructed and the status elements of the lexer are kept. This pointer is of type
/// #pANTLR3_RECOGNIZER_SHARED_STATE. In general you should only access elements of this structure
/// if there is not already another macro or standard $xxxx ANTLR reference that refers to it.
///
/// \subsection la LA(n)
///
/// The <code>LA</code> macro returns the character at offset n relative to the current input stream position.
/// The return type is #ANTLR3_UINT32. Hence <code>LA(1)</code> returns the character at the current input
/// position (the character that will be consumed next), <code>LA(-1)</code> returns the character that has
/// just been consumed, and so on. The <code>LA(n)</code> macro is useful for constructing semantic predicates
/// in lexer rules. The reference <code>LA(0)</code> is undefined and will cause an error in your lexer.
///
/// \subsection getcharindex GETCHARINDEX()
///
/// The <code>GETCHARINDEX</code> macro returns the index of the current character position as a 0-based
/// offset from the start of the input stream.
/// It returns a value type of #ANTLR3_UINT32.
///
/// \subsection getline GETLINE()
///
/// The <code>GETLINE</code> macro returns the line number of the current character (<code>LA(1)</code>) in the
/// input stream. It returns a value type of #ANTLR3_UINT32. Note that the line number is incremented
/// automatically by an input stream when it sees the input character '\\n'. The character that causes
/// the line number to increment can be changed by calling the SetNewLineChar() method on the input
/// stream after creating the input stream and before invoking the lexer.
///
/// \subsection gettext GETTEXT()
///
/// The <code>GETTEXT</code> macro returns the text currently matched by the lexer rule. In general you should
/// use the generic $text reference in ANTLR to retrieve this. The return type is a reference type of
/// #pANTLR3_STRING, which allows you to manipulate the text you have retrieved (\b NB this does not change
/// the input stream, only the text you copy from the input stream when you use this macro or $text).
///
/// The reference $text->chars or GETTEXT()->chars is a pointer to the '\\0'-terminated character
/// string that the ANTLR3 #pANTLR3_STRING represents. String space is allocated automatically, as is
/// the structure that holds the string. The #pANTLR3_STRING_FACTORY associated with the lexer handles this,
/// and when you close the lexer, it will automatically free any space allocated for strings and their structures.
///
/// \subsection getcharpositioninline GETCHARPOSITIONINLINE()
///
/// The <code>GETCHARPOSITIONINLINE</code> macro returns the zero-based offset of character <code>LA(1)</code>
/// from the start of the current input line. See the macro <code>GETLINE</code> for details on what the
/// line number means.
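///
/// The way <code>LA(n)</code>, <code>GETCHARINDEX()</code>, <code>GETLINE()</code> and
/// <code>GETCHARPOSITIONINLINE()</code> relate to one another can be sketched with a toy 8-bit character
/// stream. The <code>CharStream</code> type and its functions are hypothetical stand-ins for
/// #pANTLR3_INPUT_STREAM, not the runtime's implementation. The sketch assumes lines are numbered from 1,
/// returns a made-up <code>CHAR_EOF</code> value for offsets outside the buffer, and, as described above,
/// advances the line count only on the configured newline character.
///
```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical toy stream over an 8-bit buffer. */
typedef struct {
    const char *data;
    size_t size;
    size_t index;                 /* offset of LA(1), i.e. what GETCHARINDEX() reports */
    unsigned line;                /* line of LA(1), starting at 1 */
    unsigned charPositionInLine;  /* zero-based column of LA(1) */
    char newlineChar;             /* character that ends a line, '\n' by default */
} CharStream;

#define CHAR_EOF 0xFFFFFFFFu      /* hypothetical marker for "no character here" */

static void streamInit(CharStream *s, const char *text) {
    s->data = text;
    s->size = strlen(text);
    s->index = 0;
    s->line = 1;
    s->charPositionInLine = 0;
    s->newlineChar = '\n';
}

/* LA(1) is the next character to be consumed, LA(-1) the one just consumed. */
static unsigned streamLA(const CharStream *s, int n) {
    assert(n != 0);  /* LA(0) is undefined */
    long pos = (n > 0) ? (long)s->index + n - 1 : (long)s->index + n;
    if (pos < 0 || (size_t)pos >= s->size) return CHAR_EOF;
    return (unsigned char)s->data[pos];
}

/* Consume LA(1), updating the line and column bookkeeping. */
static void streamConsume(CharStream *s) {
    if (s->index >= s->size) return;
    if (s->data[s->index] == s->newlineChar) {
        s->line++;
        s->charPositionInLine = 0;
    } else {
        s->charPositionInLine++;
    }
    s->index++;
}
```
///
/// Changing <code>newlineChar</code> after <code>streamInit</code> mirrors calling SetNewLineChar() after
/// creating the input stream and before starting the lexer.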
///
/// \subsection emit EMIT()
///
/// The macro <code>EMIT</code> causes the text range currently matched by the lexer rule to be emitted
/// immediately as the token for the rule. Subsequent text is matched but ignored. The token type is that
/// of the lexer rule unless you have changed it with <code>$type = XXX;</code>, in which case XXX is used.
///
/// \subsection emitnew EMITNEW(t)
///
/// The macro <code>EMITNEW</code> causes the supplied token reference <code>t</code> to be used as the
/// token emitted by the rule. The parameter <code>t</code> must be of type #pANTLR3_COMMON_TOKEN.
///
/// \subsection index INDEX()
///
/// The <code>INDEX</code> macro returns the current input position according to the input stream. It is not
/// guaranteed to be the character offset in the input stream but is instead used as a value
/// for marking and rewinding to specific points in the input stream. Use the macro <code>GETCHARINDEX()</code>
/// to find out the position of <code>LA(1)</code> in the input stream.
///
/// \subsection pushstream PUSHSTREAM(str)
///
/// The <code>PUSHSTREAM</code> macro, in conjunction with the <code>POPSTREAM</code> macro (usually called
/// internally by the runtime), can be used to stack many input streams to the lexer, and implement constructs
/// such as the C pre-processor \#include directive.
///
/// An input stream that is pushed on to the stack becomes the current input stream for the lexer and
/// the state of the previous stream is automatically saved. The input stream will be automatically
/// popped from the stack when it is exhausted by the lexer. You may use the macro <code>POPSTREAM</code>
/// to return to the previous input stream prior to exhausting the currently stacked input stream.
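///
/// The stacking behavior described above can be modelled in a few lines of C. This is a hypothetical sketch,
/// not the runtime's implementation: <code>StreamStack</code>, <code>pushStream</code>, <code>popStream</code>
/// and <code>nextChar</code> are invented names, each stream is a plain string, and its saved state is reduced
/// to a read offset. It shows the two properties stated above: a pushed stream takes over at once, and an
/// exhausted stream is popped automatically so that reading resumes where the previous stream left off.
///
```c
#include <assert.h>
#include <stddef.h>

#define MAX_DEPTH 8

/* A stream is reduced to its text and a read offset (its "saved state"). */
typedef struct {
    const char *text;
    size_t pos;
} Stream;

typedef struct {
    Stream stack[MAX_DEPTH];
    int depth;                 /* stack[depth - 1] is the current stream */
} StreamStack;

/* Like PUSHSTREAM: the new stream becomes current; the old one keeps its offset. */
static int pushStream(StreamStack *s, const char *text) {
    if (s->depth == MAX_DEPTH) return 0;
    s->stack[s->depth].text = text;
    s->stack[s->depth].pos = 0;
    s->depth++;
    return 1;
}

/* Like POPSTREAM: discard the current stream and resume the previous one. */
static void popStream(StreamStack *s) {
    if (s->depth > 0) s->depth--;
}

/* Return the next character, popping exhausted streams; -1 once every stream is done. */
static int nextChar(StreamStack *s) {
    while (s->depth > 0) {
        Stream *cur = &s->stack[s->depth - 1];
        if (cur->text[cur->pos] != '\0') return (unsigned char)cur->text[cur->pos++];
        popStream(s);   /* exhausted: fall back to the stream underneath */
    }
    return -1;
}
```
///
/// Unlike a real lexer, which reports EOF on its original stream, this sketch also pops the bottom stream
/// and simply returns -1 when everything has been read.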
///
/// Here is an example of using <code>PUSHSTREAM</code> in a lexer to implement the C \#include pre-processor directive:
///
/// \code
/// fragment
/// STRING_GUTS : (~('\\'|'"') )* ;
///
/// LINE_COMMAND
///     : '#' (' ' | '\t')*
///       (
///           'include' (' ' | '\t')+ '"' file = STRING_GUTS '"' (' ' | '\t')* '\r'? '\n'
///           {
///               pANTLR3_STRING          fName;
///               pANTLR3_INPUT_STREAM    in;
///
///               // $file.text is a pANTLR3_STRING holding only the text matched
///               // by the STRING_GUTS fragment, i.e. the file name without its
///               // surrounding quotes.
///               //
///               fName = $file.text;
///               printf("Including file '%s'\n", (const char *)fName->chars);
///
///               // Create a new input stream and take advantage of the built-in
///               // stream stacking in the C target runtime.
///               //
///               in = antlr38BitFileStreamNew(fName->chars);
///
///               // Assumption: the constructor returns NULL if the file cannot
///               // be opened, so never push the result without checking it.
///               //
///               if (in == NULL)
///               {
///                   fprintf(stderr, "Cannot open include file '%s'\n", (const char *)fName->chars);
///               }
///               else
///               {
///                   PUSHSTREAM(in);
///               }
///
///               // Note that the input stream is not closed when it reaches EOF. That is
///               // not done here, but it is up to you to track streams created like this
///               // and destroy them when the whole parse session is complete. Do not do
///               // this until all tokens have been processed all the way through your
///               // tree parsers etc., because a token does not store its text; it refers
///               // back to the input stream, and trying to get the text will abort if
///               // you close the input stream too early.
///               //
///           }
///         | (('0'..'9')=>('0'..'9'))+ ~('\n'|'\r')* '\r'? '\n'
///       )
///       {$channel=HIDDEN;}
///     ;
/// \endcode
///
/// \subsection popstream POPSTREAM()
///
/// Assuming that you have stacked an input stream using the <code>PUSHSTREAM</code> macro, you can use
/// <code>POPSTREAM</code> to remove it from the stream stack and revert to the previous input stream.
/// You should be careful to pop the stream at an appropriate point in your lexer action, so that you do not
/// match characters from one stream with those from another in the same rule (unless this is what you want to do).
///
/// \subsection settext SETTEXT(str)
///
/// A token manufactured by the lexer does not actually physically store the text from the
/// input stream that it matches. The token string is instead created only if you ask for
/// the text. However, if you wish to change the text that the token represents, you can use
/// this macro to set it explicitly. Note that this does not change the input stream text
/// but associates the supplied #pANTLR3_STRING with the token. This string is then returned
/// when the parser and tree parser reference the token via the $xxx.text reference.
///
/// \subsection user1 USER1 USER2 USER3 and CUSTOM
///
/// While you can create your own custom token class and have the lexer deal with this, this
/// is a lot of work compared to the trivial inheritance that can be achieved in the Java target.
/// In many cases, though, all that is needed is the addition of a few data items such as an
/// integer or a pointer. Rather than require C programmers to create complicated structures
/// just to add a few data items, the C target provides a few custom fields in the standard
/// token, which will fulfil the needs of most lexers and parsers.
///
/// The token fields user1, user2, and user3 are all value types of #ANTLR3_UINT32. In the
/// parser you can reference these fields directly from the token: <code>x=TOKNAME { $x->user1 ...</code>
/// but when you are building the token in the lexer, you must assign to the fields using the
/// macros <code>USER1</code>, <code>USER2</code>, or <code>USER3</code>.
/// For example:
///
/// \code
/// LEXTOK: 'AAAAA' { USER1 = 99; } ;
/// \endcode
///
/// \section parsermacros Parser and Tree Parser Macros
///
/// \subsection parser PARSER
///
/// The <code>PARSER</code> macro returns a pointer to the base parser or tree parser object, which is of type
/// #pANTLR3_PARSER or #pANTLR3_TREE_PARSER. This is not the pointer to your generated parser, which is supplied
/// by the <code>CTX</code> macro, but to the common implementation of a parser or tree parser interface, which is
/// supplied to all generated parsers.
///
/// \subsection index INDEX()
///
/// When used in the parser, the <code>INDEX</code> macro returns the position of the current
/// token (<code>LT(1)</code>) in the input token stream. It can be used for <code>MARK</code> and
/// <code>REWIND</code> operations.
///
/// \subsection lt LT(n) and LA(n)
///
/// In the parser, the macro <code>LT(n)</code> returns the #pANTLR3_COMMON_TOKEN at offset <code>n</code> from
/// the current token stream input position. The macro <code>LA(n)</code> returns the token type of the token
/// at position <code>n</code>. The value <code>n</code> cannot be zero; <code>LT(0)</code> will return
/// <code>NULL</code> and possibly cause an error. <code>LT(1)</code> is the token that is about to be
/// recognized and <code>LT(-1)</code> is the token that has just been recognized; <code>LA(1)</code> and
/// <code>LA(-1)</code> return the types of those tokens. Values of <code>n</code> that exceed the
/// limits of the token stream will cause <code>LT(n)</code> to return <code>NULL</code>.
///
/// \subsection psrstate PSRSTATE
///
/// Returns the shared state pointer of type #pANTLR3_RECOGNIZER_SHARED_STATE. This is not generally
/// useful to the grammar programmer, as the useful elements have generic $xxx references built in to
/// ANTLR.
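///
/// The difference between <code>LT(n)</code>, which yields a token, and <code>LA(n)</code>, which yields only
/// its type, can be shown with a toy token buffer. <code>TokenStream</code>, <code>streamLT</code>,
/// <code>streamLA</code> and <code>INVALID_TYPE</code> are hypothetical, not the runtime's API. The sketch
/// follows the rules given above for <code>LT</code> (zero and out-of-range offsets give <code>NULL</code>)
/// and assumes that <code>LA</code> reports <code>INVALID_TYPE</code> in those cases.
///
```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    int type;
    const char *text;
} Token;

typedef struct {
    const Token *tokens;
    size_t count;
    size_t p;          /* index of LT(1), the next token to be recognized */
} TokenStream;

#define INVALID_TYPE 0 /* hypothetical "no token" type for this sketch */

/* LT(n): the token n positions from p; LT(-1) is the token just recognized. */
static const Token *streamLT(const TokenStream *ts, int n) {
    if (n == 0) return NULL;   /* LT(0) is not a valid reference */
    long i = (n > 0) ? (long)ts->p + n - 1 : (long)ts->p + n;
    if (i < 0 || (size_t)i >= ts->count) return NULL;
    return &ts->tokens[i];
}

/* LA(n): just the type of LT(n). */
static int streamLA(const TokenStream *ts, int n) {
    const Token *t = streamLT(ts, n);
    return t != NULL ? t->type : INVALID_TYPE;
}
```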
///
/// \subsection adaptor ADAPTOR
///
/// When building an AST via a parser, the work of constructing and manipulating trees is done
/// by a supplied adaptor class. The default class is usually fine for most tree operations, but
/// if you wish to build your own specialized linked/tree structure, then you may need to reference
/// the adaptor you supply directly. The <code>ADAPTOR</code> macro returns the reference to the tree adaptor,
/// which is always of type #pANTLR3_BASE_TREE_ADAPTOR, even if it is your custom adaptor.
///
/// \section commonmacros Macros Common to All Recognizers
///
/// \subsection recognizer RECOGNIZER
///
/// Returns a reference type of #pANTLR3_BASE_RECOGNIZER, which is the base functionality supplied
/// to all recognizers, whether lexers, parsers or tree parsers. You can override methods in this
/// interface by installing your own function pointers (once you know what you are doing).
///
/// \subsection input INPUT
///
/// Returns a reference to the input stream of the appropriate type for the recognizer. In a lexer
/// this macro returns a reference type of #pANTLR3_INPUT_STREAM, in a parser this is type
/// #pANTLR3_TOKEN_STREAM and in a tree parser this is type #pANTLR3_COMMON_TREE_NODE_STREAM.
/// You can of course provide your own implementations of any of these interfaces.
///
/// \subsection mark MARK()
///
/// This macro will cause the input stream for the current recognizer to be marked with a
/// checkpoint. It will return a value type of #ANTLR3_MARKER, which you can use as the
/// parameter to a <code>REWIND</code> macro to return to the marked point in the input.
///
/// If you know you will only ever rewind to the last <code>MARK</code>, then you can ignore the return
/// value of this macro and just use the <code>REWINDLAST</code> macro to return to the last <code>MARK</code>
/// that was set in the input stream.
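///
/// Installing your own function pointers is the usual C idiom for overriding a method: the interface is a
/// struct of function pointers, and an override saves the original pointer and stores a new one that may
/// delegate to it. The sketch below shows the idiom with a hypothetical <code>Recognizer</code> struct and
/// <code>reportError</code> member; #pANTLR3_BASE_RECOGNIZER has different members and signatures, so consult
/// its header before overriding anything. The extended struct keeps the saved pointer next to the base
/// object rather than in a global, in keeping with the context-pointer design described earlier.
///
```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical base interface: some state plus a function-pointer "method". */
typedef struct Recognizer {
    int errorCount;
    void (*reportError)(struct Recognizer *self, const char *msg);
} Recognizer;

/* Default method, as a runtime might install it. */
static void defaultReportError(Recognizer *self, const char *msg) {
    self->errorCount++;
    fprintf(stderr, "error: %s\n", msg);
}

static void recognizerInit(Recognizer *r) {
    r->errorCount = 0;
    r->reportError = defaultReportError;
}

/* Extended object: the base struct comes first, so the two pointer types are interchangeable. */
typedef struct {
    Recognizer base;
    void (*previous)(Recognizer *self, const char *msg);  /* saved original method */
    int suppressed;
} QuietRecognizer;

/* Override: drop warnings, delegate everything else to the saved method. */
static void quietReportError(Recognizer *self, const char *msg) {
    QuietRecognizer *q = (QuietRecognizer *)self;
    if (strncmp(msg, "warning", 7) == 0) {
        q->suppressed++;
        return;
    }
    q->previous(self, msg);
}

/* Install the override after the object has been constructed. */
static void installQuiet(QuietRecognizer *q) {
    q->previous = q->base.reportError;
    q->suppressed = 0;
    q->base.reportError = quietReportError;
}
```
///
/// Callers holding only a <code>Recognizer *</code> are unaware of the override, which is what makes this
/// style of replacement safe as long as the new function honours the original contract.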
///
/// \subsection rewind REWIND(m)
///
/// Rewinds the appropriate input stream back to the marked checkpoint returned from a prior
/// <code>MARK</code> macro call and supplied as the parameter <code>m</code> to the <code>REWIND(m)</code>
/// macro.
///
/// \subsection rewindlast REWINDLAST()
///
/// Rewinds the current input stream (characters, tokens, tree nodes) back to the last checkpoint
/// marker created by a <code>MARK</code> macro call. Fails silently if there was no prior
/// <code>MARK</code> call.
///
/// \subsection seek SEEK(n)
///
/// Causes the input stream to position itself directly at offset <code>n</code> in the stream. Works for all
/// input stream types: lexer, parser and tree parser.
///
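/// The positioning macros above can be summarized with a toy stream that records checkpoints.
/// <code>MarkStream</code> and its functions are hypothetical, not the runtime's implementation. Each
/// checkpoint is reduced to a saved index (a character stream would also need to restore line and column
/// information), and the marker returned by <code>streamMark</code> is treated as opaque, just as the value
/// returned by <code>MARK()</code> should be. Two behaviors are choices made for this sketch rather than
/// documented facts: rewinding to a marker releases it and any later markers, and seeking past the end
/// clamps to the end of the stream.
///
```c
#include <assert.h>
#include <stddef.h>

#define MAX_MARKS 16

typedef size_t Marker;   /* opaque to callers */

typedef struct {
    size_t index;                /* current position, e.g. what INDEX() reports */
    size_t size;                 /* number of characters/tokens/nodes */
    size_t marks[MAX_MARKS];     /* saved positions, indexed by marker - 1 */
    size_t markDepth;            /* number of live markers */
} MarkStream;

/* MARK(): remember the current position and return a marker for it. */
static Marker streamMark(MarkStream *s) {
    assert(s->markDepth < MAX_MARKS);
    s->marks[s->markDepth++] = s->index;
    return s->markDepth;         /* markers start at 1 */
}

/* REWIND(m): return to the position saved by marker m. */
static void streamRewind(MarkStream *s, Marker m) {
    assert(m >= 1 && m <= s->markDepth);
    s->index = s->marks[m - 1];
    s->markDepth = m - 1;        /* this sketch releases m and any later markers */
}

/* REWINDLAST(): rewind to the most recent live mark; silently does nothing without one. */
static void streamRewindLast(MarkStream *s) {
    if (s->markDepth > 0) streamRewind(s, s->markDepth);
}

/* SEEK(n): jump straight to offset n, clamped to the end of the stream. */
static void streamSeek(MarkStream *s, size_t n) {
    s->index = (n <= s->size) ? n : s->size;
}
```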