<!-- ##### SECTION Title ##### -->
Lexical Scanner

<!-- ##### SECTION Short_Description ##### -->
a general purpose lexical scanner

<!-- ##### SECTION Long_Description ##### -->
<para>
The #GScanner and its associated functions provide a general purpose
lexical scanner.
</para>

<!--
FIXME: really needs an example and more detail, but I don't completely
understand it myself. Look at gtkrc.c for some code using the scanner.
-->

<!-- ##### SECTION See_Also ##### -->
<para>

</para>

<!-- ##### SECTION Stability_Level ##### -->


<!-- ##### STRUCT GScanner ##### -->
<para>
The data structure representing a lexical scanner.
</para>
<para>
You should set <structfield>input_name</structfield> after creating
the scanner, since it is used by the default message handler when
displaying warnings and errors. If you are scanning a file, the file
name would be a good choice.
</para>
<para>
The <structfield>user_data</structfield> and
<structfield>max_parse_errors</structfield> fields are not used.
If you need to associate extra data with the scanner, you can place it here.
</para>
<para>
If you want to use your own message handler you can set the
<structfield>msg_handler</structfield> field. The type of the message
handler function is declared by #GScannerMsgFunc.
</para>

@user_data:
@max_parse_errors:
@parse_errors:
@input_name:
@qdata:
@config:
@token: token parsed by the last g_scanner_get_next_token()
@value: value of the last token from g_scanner_get_next_token()
@line: line number of the last token from g_scanner_get_next_token()
@position: char number of the last token from g_scanner_get_next_token()
@next_token: token parsed by the last g_scanner_peek_next_token()
@next_value: value of the last token from g_scanner_peek_next_token()
@next_line: line number of the last token from g_scanner_peek_next_token()
@next_position: char number of the last token from g_scanner_peek_next_token()
@symbol_table:
@input_fd:
@text:
@text_end:
@buffer:
@scope_id:
@msg_handler: function to handle GScanner message output

<!-- ##### STRUCT GScannerConfig ##### -->
<para>
Specifies the #GScanner parser configuration. Most settings can be changed during
the parsing phase and will affect the lexical parsing of the next unpeeked token.
</para>
<para>
<structfield>cset_skip_characters</structfield> specifies which characters
should be skipped by the scanner (the default is the whitespace characters:
space, tab, carriage-return and line-feed).
</para>
<para>
<structfield>cset_identifier_first</structfield> specifies the characters
which can start identifiers (the default is #G_CSET_a_2_z, "_", and
#G_CSET_A_2_Z).
</para>
<para>
<structfield>cset_identifier_nth</structfield> specifies the characters
which can be used in identifiers, after the first character (the default
is #G_CSET_a_2_z, "_0123456789", #G_CSET_A_2_Z, #G_CSET_LATINS,
#G_CSET_LATINC).
</para>
<para>
<structfield>cpair_comment_single</structfield> specifies the characters
at the start and end of single-line comments. The default is "#\n" which
means that single-line comments start with a '#' and continue until a '\n'
(end of line).
</para>
<para>
<structfield>case_sensitive</structfield> specifies if symbols are
case sensitive (the default is %FALSE).
</para>
<para>
<structfield>skip_comment_multi</structfield> specifies if multi-line
comments are skipped and not returned as tokens (the default is %TRUE).
</para>
<para>
<structfield>skip_comment_single</structfield> specifies if single-line
comments are skipped and not returned as tokens (the default is %TRUE).
</para>
<para>
<structfield>scan_comment_multi</structfield> specifies if multi-line
comments are recognized (the default is %TRUE).
</para>
<para>
<structfield>scan_identifier</structfield> specifies if identifiers
are recognized (the default is %TRUE).
</para>
<para>
<structfield>scan_identifier_1char</structfield> specifies if single-character
identifiers are recognized (the default is %FALSE).
</para>
<para>
<structfield>scan_identifier_NULL</structfield> specifies if
<literal>NULL</literal> is reported as #G_TOKEN_IDENTIFIER_NULL
(the default is %FALSE).
</para>
<para>
<structfield>scan_symbols</structfield> specifies if symbols are
recognized (the default is %TRUE).
</para>
<para>
<structfield>scan_binary</structfield> specifies if binary numbers
are recognized (the default is %FALSE).
</para>
<para>
<structfield>scan_octal</structfield> specifies if octal numbers
are recognized (the default is %TRUE).
</para>
<para>
<structfield>scan_float</structfield> specifies if floating point numbers
are recognized (the default is %TRUE).
</para>
<para>
<structfield>scan_hex</structfield> specifies if hexadecimal numbers
are recognized (the default is %TRUE).
</para>
<para>
<structfield>scan_hex_dollar</structfield> specifies if '$' is recognized
as a prefix for hexadecimal numbers (the default is %FALSE).
</para>
<para>
<structfield>scan_string_sq</structfield> specifies if strings can be
enclosed in single quotes (the default is %TRUE).
</para>
<para>
<structfield>scan_string_dq</structfield> specifies if strings can be
enclosed in double quotes (the default is %TRUE).
</para>
<para>
<structfield>numbers_2_int</structfield> specifies if binary, octal and
hexadecimal numbers are reported as #G_TOKEN_INT (the default is %TRUE).
</para>
<para>
<structfield>int_2_float</structfield> specifies if all numbers are
reported as #G_TOKEN_FLOAT (the default is %FALSE).
</para>
<para>
<structfield>identifier_2_string</structfield> specifies if identifiers
are reported as strings (the default is %FALSE).
</para>
<para>
<structfield>char_2_token</structfield> specifies if characters
are reported by setting <literal>token = ch</literal> or as #G_TOKEN_CHAR
(the default is %TRUE).
</para>
<para>
<structfield>symbol_2_token</structfield> specifies if symbols
are reported by setting <literal>token = v_symbol</literal> or as
#G_TOKEN_SYMBOL (the default is %FALSE).
</para>
<para>
<structfield>scope_0_fallback</structfield> specifies if a symbol
is searched for in the default scope in addition to the current scope
(the default is %FALSE).
</para>

@cset_skip_characters:
@cset_identifier_first:
@cset_identifier_nth:
@cpair_comment_single:
@case_sensitive:
@skip_comment_multi:
@skip_comment_single:
@scan_comment_multi:
@scan_identifier:
@scan_identifier_1char:
@scan_identifier_NULL:
@scan_symbols:
@scan_binary:
@scan_octal:
@scan_float:
@scan_hex:
@scan_hex_dollar:
@scan_string_sq:
@scan_string_dq:
@numbers_2_int:
@int_2_float:
@identifier_2_string:
@char_2_token:
@symbol_2_token:
@scope_0_fallback:
@store_int64:
@padding_dummy:

<!-- ##### FUNCTION g_scanner_new ##### -->
<para>
Creates a new #GScanner.
The @config_templ structure specifies the initial settings of the scanner,
which are copied into the #GScanner <structfield>config</structfield> field.
If you pass %NULL then the default settings are used.
</para>

@config_templ: the initial scanner settings.
@Returns: the new #GScanner.


<!-- ##### FUNCTION g_scanner_destroy ##### -->
<para>
Frees all memory used by the #GScanner.
</para>

@scanner: a #GScanner.


<!-- ##### FUNCTION g_scanner_input_file ##### -->
<para>
Prepares to scan a file.
</para>

@scanner: a #GScanner.
@input_fd: a file descriptor.


<!-- ##### FUNCTION g_scanner_sync_file_offset ##### -->
<para>
Rewinds the file descriptor to the current buffer position and discards
the file read-ahead buffer. This is useful for third-party code that uses
the scanner's file descriptor and relies on it matching the current
scanning position.
</para>

@scanner: a #GScanner.


<!-- ##### FUNCTION g_scanner_input_text ##### -->
<para>
Prepares to scan a text buffer.
</para>

@scanner: a #GScanner.
@text: the text buffer to scan.
@text_len: the length of the text buffer.


<!-- ##### FUNCTION g_scanner_peek_next_token ##### -->
<para>
Parses the next token, without removing it from the input stream.
The token data is placed in the
<structfield>next_token</structfield>,
<structfield>next_value</structfield>,
<structfield>next_line</structfield>, and
<structfield>next_position</structfield> fields of the #GScanner structure.
</para>
<para>
Note that, while the token is not removed from the input stream (i.e.
the next call to g_scanner_get_next_token() will return the same token),
it will not be reevaluated. This can lead to surprising results when
changing scope or the scanner configuration after peeking the next token.
Getting the next token after switching the scope or configuration will
return whatever was peeked before, regardless of any symbols that may
have been added or removed in the new scope.
</para>

@scanner: a #GScanner.
@Returns: the type of the token.


<!-- ##### FUNCTION g_scanner_get_next_token ##### -->
<para>
Parses the next token just like g_scanner_peek_next_token() and also
removes it from the input stream.
The token data is placed in the
<structfield>token</structfield>,
<structfield>value</structfield>,
<structfield>line</structfield>, and
<structfield>position</structfield> fields of the #GScanner structure.
</para>

@scanner: a #GScanner.
@Returns: the type of the token.


<!-- ##### FUNCTION g_scanner_eof ##### -->
<para>
Returns %TRUE if the scanner has reached the end of the file or text buffer.
</para>

@scanner: a #GScanner.
@Returns: %TRUE if the scanner has reached the end of the file or text buffer.


<!-- ##### FUNCTION g_scanner_cur_line ##### -->
<para>
Returns the current line in the input stream (counting from 1).
This is the line of the last token parsed via g_scanner_get_next_token().
</para>

@scanner: a #GScanner.
@Returns: the current line.


<!-- ##### FUNCTION g_scanner_cur_position ##### -->
<para>
Returns the current position in the current line (counting from 0).
This is the position of the last token parsed via g_scanner_get_next_token().
</para>

@scanner: a #GScanner.
@Returns: the current position on the line.


<!-- ##### FUNCTION g_scanner_cur_token ##### -->
<para>
Gets the current token type.
This is simply the <structfield>token</structfield> field in the #GScanner
structure.
</para>

@scanner: a #GScanner.
@Returns: the current token type.


<!-- ##### FUNCTION g_scanner_cur_value ##### -->
<para>
Gets the current token value.
This is simply the <structfield>value</structfield> field in the #GScanner
structure.
</para>

@scanner: a #GScanner.
@Returns: the current token value.


<!-- ##### FUNCTION g_scanner_set_scope ##### -->
<para>
Sets the current scope.
</para>

@scanner: a #GScanner.
@scope_id: the new scope id.
@Returns: the old scope id.


<!-- ##### FUNCTION g_scanner_scope_add_symbol ##### -->
<para>
Adds a symbol to the given scope.
</para>

@scanner: a #GScanner.
@scope_id: the scope id.
@symbol: the symbol to add.
@value: the value of the symbol.


<!-- ##### FUNCTION g_scanner_scope_foreach_symbol ##### -->
<para>
Calls the given function for each of the symbol/value pairs in the
given scope of the #GScanner. The function is passed the symbol and
value of each pair, and the given @user_data parameter.
</para>

@scanner: a #GScanner.
@scope_id: the scope id.
@func: the function to call for each symbol/value pair.
@user_data: user data to pass to the function.


<!-- ##### FUNCTION g_scanner_scope_lookup_symbol ##### -->
<para>
Looks up a symbol in a scope and returns its value.
If the
symbol is not bound in the scope, %NULL is returned.
</para>

@scanner: a #GScanner.
@scope_id: the scope id.
@symbol: the symbol to look up.
@Returns: the value of @symbol in the given scope, or %NULL
if @symbol is not bound in the given scope.


<!-- ##### FUNCTION g_scanner_scope_remove_symbol ##### -->
<para>
Removes a symbol from a scope.
</para>

@scanner: a #GScanner.
@scope_id: the scope id.
@symbol: the symbol to remove.


<!-- ##### MACRO g_scanner_add_symbol ##### -->
<para>
Adds a symbol to the default scope.
</para>

@scanner: a #GScanner.
@symbol: the symbol to add.
@value: the value of the symbol.
@Deprecated: 2.2: Use g_scanner_scope_add_symbol() instead.


<!-- ##### MACRO g_scanner_remove_symbol ##### -->
<para>
Removes a symbol from the default scope.
</para>

@scanner: a #GScanner.
@symbol: the symbol to remove.
@Deprecated: 2.2: Use g_scanner_scope_remove_symbol() instead.


<!-- ##### MACRO g_scanner_foreach_symbol ##### -->
<para>
Calls a function for each symbol in the default scope.
</para>

@scanner: a #GScanner.
@func: the function to call with each symbol.
@data: data to pass to the function.
@Deprecated: 2.2: Use g_scanner_scope_foreach_symbol() instead.


<!-- ##### MACRO g_scanner_freeze_symbol_table ##### -->
<para>
There is no reason to use this macro, since it does nothing.
</para>

@scanner: a #GScanner.
@Deprecated: 2.2: This macro does nothing.


<!-- ##### MACRO g_scanner_thaw_symbol_table ##### -->
<para>
There is no reason to use this macro, since it does nothing.
</para>

@scanner: a #GScanner.
@Deprecated: 2.2: This macro does nothing.


<!-- ##### FUNCTION g_scanner_lookup_symbol ##### -->
<para>
Looks up a symbol in the current scope and returns its value.
If the
symbol is not bound in the current scope, %NULL is returned.
</para>

@scanner: a #GScanner.
@symbol: the symbol to look up.
@Returns: the value of @symbol in the current scope, or %NULL
if @symbol is not bound in the current scope.


<!-- ##### FUNCTION g_scanner_warn ##### -->
<para>
Outputs a warning message, via the #GScanner message handler.
</para>

@scanner: a #GScanner.
@format: the message format. See the <function>printf()</function>
documentation.
@Varargs: the parameters to insert into the format string.


<!-- ##### FUNCTION g_scanner_error ##### -->
<para>
Outputs an error message, via the #GScanner message handler.
</para>

@scanner: a #GScanner.
@format: the message format. See the <function>printf()</function>
documentation.
@Varargs: the parameters to insert into the format string.


<!-- ##### FUNCTION g_scanner_unexp_token ##### -->
<para>
Outputs a message through the scanner's msg_handler, resulting from an
unexpected token in the input stream.
Note that you should not call g_scanner_peek_next_token() followed by
g_scanner_unexp_token() without an intermediate call to
g_scanner_get_next_token(), as g_scanner_unexp_token() evaluates the
scanner's current token (not the peeked token) to construct part
of the message.
</para>

@scanner: a #GScanner.
@expected_token: the expected token.
@identifier_spec: a string describing how the scanner's user refers to
  identifiers (%NULL defaults to "identifier").
  This is used if @expected_token is #G_TOKEN_IDENTIFIER
  or #G_TOKEN_IDENTIFIER_NULL.
@symbol_spec: a string describing how the scanner's user refers to
  symbols (%NULL defaults to "symbol").
  This is used if @expected_token is #G_TOKEN_SYMBOL or
  any token value greater than #G_TOKEN_LAST.
@symbol_name: the name of the symbol, if the scanner's current token
  is a symbol.
@message: a message string to output at the end of the warning/error, or %NULL.
@is_error: if %TRUE it is output as an error. If %FALSE it is output as a
  warning.


<!-- ##### USER_FUNCTION GScannerMsgFunc ##### -->
<para>
Specifies the type of the message handler function.
</para>

@scanner: a #GScanner.
@message: the message.
@error: %TRUE if the message signals an error, %FALSE if it
  signals a warning.


<!-- ##### MACRO G_CSET_a_2_z ##### -->
<para>
The set of lowercase ASCII alphabet characters.
Used for specifying valid identifier characters in #GScannerConfig.
</para>



<!-- ##### MACRO G_CSET_A_2_Z ##### -->
<para>
The set of uppercase ASCII alphabet characters.
Used for specifying valid identifier characters in #GScannerConfig.
</para>



<!-- ##### MACRO G_CSET_DIGITS ##### -->
<para>
The set of digits.
Used for specifying valid identifier characters in #GScannerConfig.
</para>



<!-- ##### MACRO G_CSET_LATINC ##### -->
<para>
The set of uppercase ISO 8859-1 alphabet characters which are
not ASCII characters.
Used for specifying valid identifier characters in #GScannerConfig.
</para>



<!-- ##### MACRO G_CSET_LATINS ##### -->
<para>
The set of lowercase ISO 8859-1 alphabet characters which are
not ASCII characters.
Used for specifying valid identifier characters in #GScannerConfig.
</para>



<!-- ##### ENUM GTokenType ##### -->
<para>
The possible types of token returned from each g_scanner_get_next_token() call.
</para>

@G_TOKEN_EOF: the end of the file.
@G_TOKEN_LEFT_PAREN: a '(' character.
@G_TOKEN_LEFT_CURLY: a '{' character.
@G_TOKEN_RIGHT_CURLY: a '}' character.

<!-- ##### UNION GTokenValue ##### -->
<para>
A union holding the value of the token.
</para>


<!-- ##### ENUM GErrorType ##### -->
<para>
The possible errors, used in the <structfield>v_error</structfield> field
of #GTokenValue, when the token is a #G_TOKEN_ERROR.
</para>

@G_ERR_UNKNOWN: unknown error.
@G_ERR_UNEXP_EOF: unexpected end of file.
@G_ERR_UNEXP_EOF_IN_STRING: unterminated string constant.
@G_ERR_UNEXP_EOF_IN_COMMENT: unterminated comment.
@G_ERR_NON_DIGIT_IN_CONST: non-digit character in a number.
@G_ERR_DIGIT_RADIX: digit beyond radix in a number.
@G_ERR_FLOAT_RADIX: non-decimal floating point number.
@G_ERR_FLOAT_MALFORMED: malformed floating point number.
