Version 3.11
---------------------
02/15/18 beazley
    Fixed some minor bugs related to re flags and token order.
    Github pull requests #151 and #153.

02/15/18 beazley
    Added a set_lexpos() method to grammar symbols. Github issue #148.

04/13/17 beazley
    Mostly minor bug fixes and small code cleanups.

Version 3.10
---------------------
01/31/17: beazley
    Changed grammar signature computation to not involve hashing
    functions. Parts are just combined into a big string.

10/07/16: beazley
    Fixed Issue #101: Incorrect shift-reduce conflict resolution with
    precedence specifier.

    PLY was incorrectly resolving shift-reduce conflicts in certain
    cases. For example, in the example/calc/calc.py example, you
    could trigger it doing this:

        calc > -3 - 4
        1                         (correct answer should be -7)
        calc >

    Issue and suggested patch contributed by https://github.com/RomaVis

Version 3.9
---------------------
08/30/16: beazley
    Exposed the parser state number as the parser.state attribute
    in productions and error functions. For example:

        def p_somerule(p):
            '''
            rule : A B C
            '''
            print('State:', p.parser.state)

    May address issue #65 (publish current state in error callback).

08/30/16: beazley
    Fixed Issue #88. Python3 compatibility with ply/cpp.

08/30/16: beazley
    Fixed Issue #93. Ply can crash if SyntaxError is raised inside
    a production. Not actually sure if the original implementation
    worked as documented at all. Yacc has been modified to follow
    the spec as outlined in the CHANGES noted for 11/27/07 below.

08/30/16: beazley
    Fixed Issue #97. Failure with code validation when the original
    source files aren't present. Validation step now ignores
    the missing file.

08/30/16: beazley
    Minor fixes to version numbers.

Version 3.8
---------------------
10/02/15: beazley
    Fixed issues related to Python 3.5. Patch contributed by Barry Warsaw.

Version 3.7
---------------------
08/25/15: beazley
    Fixed problems when reading table files from pickled data.

05/07/15: beazley
    Fixed regression in handling of table modules if specified as module
    objects. See https://github.com/dabeaz/ply/issues/63

Version 3.6
---------------------
04/25/15: beazley
    If PLY is unable to create the 'parser.out' or 'parsetab.py' files due
    to permission issues, it now just issues a warning message and
    continues to operate. This could happen if a module using PLY
    is installed in a funny way where tables have to be regenerated, but
    for whatever reason, the user doesn't have write permission on
    the directory where PLY wants to put them.

04/24/15: beazley
    Fixed some issues related to use of packages and table file
    modules. Just to emphasize, PLY now generates its special
    files such as 'parsetab.py' and 'lextab.py' in the *SAME*
    directory as the source file that uses lex() and yacc().

    If for some reason, you want to change the name of the table
    module, use the tabmodule and lextab options:

        lexer = lex.lex(lextab='spamlextab')
        parser = yacc.yacc(tabmodule='spamparsetab')

    If you specify a simple name as shown, the module will still be
    created in the same directory as the file invoking lex() or yacc().
    If you want the table files to be placed into a different package,
    then give a fully qualified package name. For example:

        lexer = lex.lex(lextab='pkgname.files.lextab')
        parser = yacc.yacc(tabmodule='pkgname.files.parsetab')

    For this to work, 'pkgname.files' must already exist as a valid
    Python package (i.e., the directories must already exist and be
    set up with the proper __init__.py files, etc.).

Version 3.5
---------------------
04/21/15: beazley
    Added support for defaulted_states in the parser.
    A defaulted_state is a state where the only legal action is a
    reduction of a single grammar rule across all valid input
    tokens. For such states, the rule is reduced and the
    reading of the next lookahead token is delayed until it is
    actually needed at a later point in time.

    This delay in consuming the next lookahead token is a
    potentially important feature in advanced parsing
    applications that require tight interaction between the
    lexer and the parser. For example, a grammar rule can
    modify the lexer state upon reduction and have such changes
    take effect before the next input token is read.

    *** POTENTIAL INCOMPATIBILITY ***
    One potential danger of defaulted_states is that syntax
    errors might be deferred to a later point of processing
    than where they were detected in past versions of PLY.
    Thus, it's possible that your error handling could change
    slightly on the same inputs. defaulted_states do not change
    the overall parsing of the input (i.e., the same grammar is
    accepted).

    If for some reason you need to disable defaulted states,
    you can do this:

        parser = yacc.yacc()
        parser.defaulted_states = {}

04/21/15: beazley
    Fixed debug logging in the parser. It wasn't properly reporting goto
    states on grammar rule reductions.

04/20/15: beazley
    Allowed actions to be defined for character literals (Issue #32). For
    example:

        literals = [ '{', '}' ]

        def t_lbrace(t):
            r'\{'
            # Some action
            t.type = '{'
            return t

        def t_rbrace(t):
            r'\}'
            # Some action
            t.type = '}'
            return t

04/19/15: beazley
    Import of the 'parsetab.py' file is now constrained to only consider the
    directory specified by the outputdir argument to yacc(). If not supplied,
    the import will only consider the directory in which the grammar is defined.
    This should greatly reduce problems with the wrong parsetab.py file being
    imported by mistake. For example, if it's found somewhere else on the path
    by accident.

    *** POTENTIAL INCOMPATIBILITY *** It's possible that this might break some
    packaging/deployment setup if PLY was instructed to place its parsetab.py
    in a different location. You'll have to specify a proper outputdir= argument
    to yacc() to fix this if needed.

04/19/15: beazley
    Changed default output directory to be the same as that in which the
    yacc grammar is defined. If your grammar is in a file 'calc.py',
    then the parsetab.py and parser.out files should be generated in the
    same directory as that file. The destination directory can be changed
    using the outputdir= argument to yacc().

04/19/15: beazley
    Changed the parsetab.py file signature slightly so that the parsetab won't
    regenerate if created on a different major version of Python (i.e., a
    parsetab created on Python 2 will work with Python 3).

04/16/15: beazley
    Fixed Issue #44: call_errorfunc() should return the result of errorfunc().

04/16/15: beazley
    Support for versions of Python <2.7 is officially dropped. PLY may work, but
    the unit tests require Python 2.7 or newer.

04/16/15: beazley
    Fixed bug related to calling yacc(start=...). PLY wasn't regenerating the
    table file correctly for this case.

04/16/15: beazley
    Added skipped tests for PyPy and Java. Related to use of Python's -O option.

05/29/13: beazley
    Added filter to make unit tests pass under 'python -3'.
    Reported by Neil Muller.

05/29/13: beazley
    Fixed CPP_INTEGER regex in ply/cpp.py (Issue 21).
    Reported by @vbraun.

05/29/13: beazley
    Fixed yacc validation bugs when 'from __future__ import unicode_literals'
    is being used. Reported by Kenn Knowles.

05/29/13: beazley
    Added support for Travis-CI.
    Contributed by Kenn Knowles.

05/29/13: beazley
    Added a .gitignore file. Suggested by Kenn Knowles.

05/29/13: beazley
    Fixed validation problems for source files that include a
    different source code encoding specifier. Fix relies on
    the inspect module. Should work on Python 2.6 and newer.
    Not sure about older versions of Python.
    Contributed by Michael Droettboom

05/21/13: beazley
    Fixed unit tests for yacc to eliminate random failures due to dict hash
    value randomization in Python 3.3.
    Reported by Arfrever

10/15/12: beazley
    Fixed comment whitespace processing bugs in ply/cpp.py.
    Reported by Alexei Pososin.

10/15/12: beazley
    Fixed token names in ply/ctokens.py to match rule names.
    Reported by Alexei Pososin.

04/26/12: beazley
    Changes to functions available in panic mode error recovery. In previous
    versions of PLY, the following global functions were available for use in
    the p_error() rule:

        yacc.errok()       # Reset error state
        yacc.token()       # Get the next token
        yacc.restart()     # Reset the parsing stack

    The use of global variables was problematic for code involving multiple
    parsers and frankly was a poor design overall. These functions have been
    moved to methods of the parser instance created by the yacc() function.
    You should write code like this:

        def p_error(p):
            ...
            parser.errok()

        parser = yacc.yacc()

    *** POTENTIAL INCOMPATIBILITY *** The original global functions now issue
    a DeprecationWarning.

04/19/12: beazley
    Fixed some problems with line and position tracking and the use of error
    symbols. If you have a grammar rule involving an error rule like this:

        def p_assignment_bad(p):
            '''assignment : location EQUALS error SEMI'''
            ...

    You can now do line and position tracking on the error token.
    For example:

        def p_assignment_bad(p):
            '''assignment : location EQUALS error SEMI'''
            start_line = p.lineno(3)
            start_pos = p.lexpos(3)

    If the tracking=True option is supplied to parse(), you can additionally
    get spans:

        def p_assignment_bad(p):
            '''assignment : location EQUALS error SEMI'''
            start_line, end_line = p.linespan(3)
            start_pos, end_pos = p.lexspan(3)

    Note that error handling is still a hairy thing in PLY. This won't work
    unless your lexer is providing accurate information. Please report bugs.
    Suggested by a bug reported by Davis Herring.

04/18/12: beazley
    Change to doc string handling in lex module. Regex patterns are now first
    pulled from a function's .regex attribute. If that doesn't exist, then
    .doc is checked as a fallback. The @TOKEN decorator now sets the .regex
    attribute of a function instead of its doc string.
    Change suggested by Kristoffer Ellersgaard Koch.

04/18/12: beazley
    Fixed issue #1: Fixed _tabversion. It should use __tabversion__ instead
    of __version__.
    Reported by Daniele Tricoli

04/18/12: beazley
    Fixed issue #8: Literals empty list causes IndexError.
    Reported by Walter Nissen.

04/18/12: beazley
    Fixed issue #12: Typo in code snippet in documentation.
    Reported by florianschanda.

04/18/12: beazley
    Fixed issue #10: Correctly escape t_XOREQUAL pattern.
    Reported by Andy Kittner.

Version 3.4
---------------------
02/17/11: beazley
    Minor patch to make cpp.py compatible with Python 3. Note: This
    is an experimental file not currently used by the rest of PLY.

02/17/11: beazley
    Fixed setup.py trove classifiers to properly list PLY as
    Python 3 compatible.

01/02/11: beazley
    Migration of repository to github.

Version 3.3
-----------------------------
08/25/09: beazley
    Fixed issue 15 related to the set_lineno() method in yacc.
    Reported by mdsherry.

08/25/09: beazley
    Fixed a bug related to regular expression compilation flags not being
    properly stored in lextab.py files created by the lexer when running
    in optimize mode. Reported by Bruce Frederiksen.

Version 3.2
-----------------------------
03/24/09: beazley
    Added an extra check to not print duplicated warning messages
    about reduce/reduce conflicts.

03/24/09: beazley
    Switched PLY over to a BSD license.

03/23/09: beazley
    Performance optimization. Discovered a few places to make
    speedups in LR table generation.

03/23/09: beazley
    New warning message. PLY now warns about rules never
    reduced due to reduce/reduce conflicts. Suggested by
    Bruce Frederiksen.

03/23/09: beazley
    Some clean-up of warning messages related to reduce/reduce errors.

03/23/09: beazley
    Added a new picklefile option to yacc() to write the parsing
    tables to a filename using the pickle module. Here is how
    it works:

        yacc(picklefile="parsetab.p")

    This option can be used if the normal parsetab.py file is
    extremely large. For example, on jython, it is impossible
    to read parsing tables if the parsetab.py exceeds a certain
    threshold.

    The filename supplied to the picklefile option is opened
    relative to the current working directory of the Python
    interpreter. If you need to refer to the file elsewhere,
    you will need to supply an absolute or relative path.

    For maximum portability, the pickle file is written
    using protocol 0.

03/13/09: beazley
    Fixed a bug in parser.out generation where the rule numbers
    were off by one.

03/13/09: beazley
    Fixed a string formatting bug with one of the error messages.
    Reported by Richard Reitmeyer

Version 3.1
-----------------------------
02/28/09: beazley
    Fixed broken start argument to yacc().
    PLY-3.0 broke this feature by accident.

02/28/09: beazley
    Fixed debugging output. yacc() no longer reports shift/reduce
    or reduce/reduce conflicts if debugging is turned off. This
    restores similar behavior to PLY-2.5. Reported by Andrew Waters.

Version 3.0
-----------------------------
02/03/09: beazley
    Fixed missing lexer attribute on certain tokens when
    invoking the parser p_error() function. Reported by
    Bart Whiteley.

02/02/09: beazley
    The lex() command now does all error-reporting and diagnostics
    using the logging module interface. Pass in a Logger object
    using the errorlog parameter to specify a different logger.

02/02/09: beazley
    Refactored ply.lex to use a more object-oriented and organized
    approach to collecting lexer information.

02/01/09: beazley
    Removed the nowarn option from lex(). All output is controlled
    by passing in a logger object. Just pass in a logger with a high
    level setting to suppress output. This argument was never
    documented to begin with, so hopefully no one was relying upon it.

02/01/09: beazley
    Discovered and removed a dead if-statement in the lexer. This
    resulted in a 6-7% speedup in lexing when I tested it.

01/13/09: beazley
    Minor change to the procedure for signalling a syntax error in a
    production rule. A normal SyntaxError exception should be raised
    instead of yacc.SyntaxError.

01/13/09: beazley
    Added a new method p.set_lineno(n, lineno) that can be used to set the
    line number of symbol n in grammar rules. This simplifies manual
    tracking of line numbers.

01/11/09: beazley
    Vastly improved debugging support for yacc.parse(). Instead of passing
    debug as an integer, you can supply a Logging object (see the logging
    module). Messages will be generated at the ERROR, INFO, and DEBUG
    logging levels, each level providing progressively more information.
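    For instance, such a trace logger might be set up as follows (a minimal
    sketch; 'parser' and 'data' are hypothetical names for a yacc.yacc()
    parser and its input):

```python
import logging

# Build a logger whose level controls how much of the parse trace
# (ERROR, INFO, DEBUG -- progressively more detail) is emitted.
log = logging.getLogger('ply.trace')
log.setLevel(logging.DEBUG)
log.addHandler(logging.StreamHandler())

# Hypothetical usage with a PLY-built parser (requires PLY; not run here):
#     result = parser.parse(data, debug=log)
```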
    The debugging trace also shows states, grammar rules, values passed
    into grammar rules, and the result of each reduction.

01/09/09: beazley
    The yacc() command now does all error-reporting and diagnostics using
    the interface of the logging module. Use the errorlog parameter to
    specify a logging object for error messages. Use the debuglog parameter
    to specify a logging object for the 'parser.out' output.

01/09/09: beazley
    *HUGE* refactoring of the ply.yacc() implementation. The high-level
    user interface is backwards compatible, but the internals are completely
    reorganized into classes. No more global variables. The internals
    are also more extensible. For example, you can use the classes to
    construct an LALR(1) parser in an entirely different manner than
    what is currently the case. Documentation is forthcoming.

01/07/09: beazley
    Various cleanup and refactoring of yacc internals.

01/06/09: beazley
    Fixed a bug with precedence assignment. yacc was assigning the precedence
    of each rule based on the left-most token, when in fact it should have
    been using the right-most token. Reported by Bruce Frederiksen.

11/27/08: beazley
    Numerous changes to support Python 3.0 including removal of deprecated
    statements (e.g., has_key) and the addition of compatibility code
    to emulate features from Python 2 that have been removed, but which
    are needed. Fixed the unit testing suite to work with Python 3.0.
    The code should be backwards compatible with Python 2.

11/26/08: beazley
    Loosened the rules on what kind of objects can be passed in as the
    "module" parameter to lex() and yacc(). Previously, you could only use
    a module or an instance. Now, PLY just uses dir() to get a list of
    symbols on whatever the object is, without regard for its type.

11/26/08: beazley
    Changed all except: statements to be compatible with Python 2.x/3.x
    syntax.
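    The relaxed "module" handling can be illustrated without PLY itself.
    This sketch (the attribute names are illustrative lexer symbols, not a
    complete lexer) mimics how symbols are collected via dir():

```python
from types import SimpleNamespace

# Any object exposing the expected names now works, because symbols are
# gathered with dir() rather than by checking the object's type.
spec = SimpleNamespace(t_NUMBER=r'\d+', t_ignore=' \t')

symbols = {name: getattr(spec, name)
           for name in dir(spec)
           if name.startswith('t_')}
```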

11/26/08: beazley
    Changed all 'raise Exception, value' statements to raise Exception(value)
    for forward compatibility.

11/26/08: beazley
    Removed all print statements from lex and yacc, using sys.stdout and
    sys.stderr directly. Preparation for Python 3.0 support.

11/04/08: beazley
    Fixed a bug with referring to symbols on the parsing stack using negative
    indices.

05/29/08: beazley
    Completely revamped the testing system to use the unittest module for
    everything. Added additional tests to cover new errors/warnings.

Version 2.5
-----------------------------
05/28/08: beazley
    Fixed a bug with writing lex-tables in optimized mode and start states.
    Reported by Kevin Henry.

Version 2.4
-----------------------------
05/04/08: beazley
    A version number is now embedded in the table file signature so that
    yacc can more gracefully accommodate changes to the output format
    in the future.

05/04/08: beazley
    Removed undocumented .pushback() method on grammar productions. I'm
    not sure this ever worked and can't recall ever using it. Might have
    been an abandoned idea that never really got fleshed out. This
    feature was never described or tested, so removing it is hopefully
    harmless.

05/04/08: beazley
    Added extra error checking to yacc() to detect precedence rules defined
    for undefined terminal symbols. This allows yacc() to detect a potential
    problem that can be really tricky to debug if no warning message or error
    message is generated about it.

05/04/08: beazley
    lex() now has an outputdir option that can specify the output directory
    for tables when running in optimize mode. For example:

        lexer = lex.lex(optimize=True, lextab="ltab", outputdir="foo/bar")

    The behavior of specifying a table module and output directory is now
    more aligned with the behavior of yacc().

05/04/08: beazley
    [Issue 9]
    Fixed filename bug when specifying the modulename in lex() and yacc().
    If you specified options such as the following:

        parser = yacc.yacc(tabmodule="foo.bar.parsetab", outputdir="foo/bar")

    yacc would create a file "foo.bar.parsetab.py" in the given directory.
    Now, it simply generates a file "parsetab.py" in that directory.
    Bug reported by cptbinho.

05/04/08: beazley
    Slight modification to lex() and yacc() to allow their table files
    to be loaded from a previously loaded module. This might make
    it easier to load the parsing tables from a complicated package
    structure. For example:

        import foo.bar.spam.parsetab as parsetab
        parser = yacc.yacc(tabmodule=parsetab)

    Note: lex and yacc will never regenerate the table file if used
    in this form---you will get a warning message instead.
    This idea suggested by Brian Clapper.

04/28/08: beazley
    Fixed a bug with p_error() functions being picked up correctly
    when running in yacc(optimize=1) mode. Patch contributed by
    Bart Whiteley.

02/28/08: beazley
    Fixed a bug with 'nonassoc' precedence rules. Basically the
    non-associativity was being ignored and not producing the correct
    run-time behavior in the parser.

02/16/08: beazley
    Slight relaxation of what the input() method to a lexer will
    accept as a string. Instead of testing the input to see
    if the input is a string or unicode string, it checks to see
    if the input object looks like it contains string data.
    This change makes it possible to pass string-like objects
    in as input. For example, the object returned by mmap:

        import mmap, os
        data = mmap.mmap(os.open(filename, os.O_RDONLY),
                         os.path.getsize(filename),
                         access=mmap.ACCESS_READ)
        lexer.input(data)

11/29/07: beazley
    Modification of ply.lex to allow token functions to be aliased.
    This is subtle, but it makes it easier to create libraries and
    to reuse token specifications. For example, suppose you defined
    a function like this:

        def number(t):
            r'\d+'
            t.value = int(t.value)
            return t

    This change would allow you to define a token rule as follows:

        t_NUMBER = number

    In this case, the token type will be set to 'NUMBER' and the
    associated number() function will be used to process tokens.

11/28/07: beazley
    Slight modification to lex and yacc to grab symbols from both
    the local and global dictionaries of the caller. This
    modification allows lexers and parsers to be defined using
    inner functions and closures.

11/28/07: beazley
    Performance optimization: The lexer.lexmatch and t.lexer
    attributes are no longer set for lexer tokens that are not
    defined by functions. The only normal use of these attributes
    would be in lexer rules that need to perform some kind of
    special processing. Thus, it doesn't make any sense to set
    them on every token.

    *** POTENTIAL INCOMPATIBILITY *** This might break code
    that is mucking around with internal lexer state in some
    sort of magical way.

11/27/07: beazley
    Added the ability to put the parser into error-handling mode
    from within a normal production. To do this, simply raise
    a yacc.SyntaxError exception like this:

        def p_some_production(p):
            'some_production : prod1 prod2'
            ...
            raise yacc.SyntaxError      # Signal an error

    A number of things happen after this occurs:

    - The last symbol shifted onto the symbol stack is discarded
      and the parser state backed up to what it was before the
      rule reduction.

    - The current lookahead symbol is saved and replaced by
      the 'error' symbol.

    - The parser enters error recovery mode where it tries
      to either reduce the 'error' rule or start discarding
      items off of the stack until the parser resets.

    When an error is manually set, the parser does *not* call
    the p_error() function (if any is defined).

    *** NEW FEATURE *** Suggested on the mailing list

11/27/07: beazley
    Fixed structure bug in examples/ansic. Reported by Dion Blazakis.

11/27/07: beazley
    Fixed a bug in the lexer related to start conditions and ignored
    token rules. If a rule was defined that changed state, but
    returned no token, the lexer could be left in an inconsistent
    state. Reported by

11/27/07: beazley
    Modified setup.py to support Python Eggs. Patch contributed by
    Simon Cross.

11/09/07: beazley
    Fixed a bug in error handling in yacc. If a syntax error occurred and
    the parser rolled the entire parse stack back, the parser would be left
    in an inconsistent state that would cause it to trigger incorrect actions
    on subsequent input. Reported by Ton Biegstraaten, Justin King, and
    others.

11/09/07: beazley
    Fixed a bug when passing empty input strings to yacc.parse(). This
    would result in an error message about "No input given". Reported
    by Andrew Dalke.

Version 2.3
-----------------------------
02/20/07: beazley
    Fixed a bug with character literals if the literal '.' appeared as the
    last symbol of a grammar rule. Reported by Ales Smrcka.

02/19/07: beazley
    Warning messages are now redirected to stderr instead of being printed
    to standard output.

02/19/07: beazley
    Added a warning message to lex.py if it detects a literal backslash
    character inside the t_ignore declaration. This is to help avoid
    problems that might occur if someone accidentally defines t_ignore
    as a Python raw string.
    For example:

        t_ignore = r' \t'

    The idea for this is from an email I received from David Cimimi, who
    reported bizarre behavior in lexing as a result of defining t_ignore
    as a raw string by accident.

02/18/07: beazley
    Performance improvements. Made some changes to the internal
    table organization and LR parser to improve parsing performance.

02/18/07: beazley
    Automatic tracking of line number and position information must now be
    enabled by a special flag to parse(). For example:

        yacc.parse(data, tracking=True)

    In many applications, it's just not that important to have the
    parser automatically track all line numbers. By making this an
    optional feature, it allows the parser to run significantly faster
    (more than a 20% speed increase in many cases). Note: positional
    information is always available for raw tokens---this change only
    applies to positional information associated with nonterminal
    grammar symbols.

    *** POTENTIAL INCOMPATIBILITY ***

02/18/07: beazley
    Yacc no longer supports extended slices of grammar productions.
    However, it does support regular slices. For example:

        def p_foo(p):
            '''foo : a b c d e'''
            p[0] = p[1:3]

    This change is a performance improvement to the parser--it streamlines
    normal access to the grammar values since slices are now handled in
    a __getslice__() method as opposed to __getitem__().

02/12/07: beazley
    Fixed a bug in the handling of token names when combined with
    start conditions. Bug reported by Todd O'Bryan.

Version 2.2
------------------------------
11/01/06: beazley
    Added lexpos() and lexspan() methods to grammar symbols. These
    mirror the same functionality of lineno() and linespan().
    For example:

        def p_expr(p):
            'expr : expr PLUS expr'
            p.lexpos(1)               # Lexing position of left-hand expression
            p.lexpos(2)               # Lexing position of PLUS
            start, end = p.lexspan(3) # Lexing range of right-hand expression

11/01/06: beazley
    Minor change to error handling. The recommended way to skip characters
    in the input is to use t.lexer.skip() as shown here:

        def t_error(t):
            print "Illegal character '%s'" % t.value[0]
            t.lexer.skip(1)

    The old approach of just using t.skip(1) will still work, but won't
    be documented.

10/31/06: beazley
    Discarded tokens can now be specified as simple strings instead of
    functions. To do this, simply include the text "ignore_" in the
    token declaration. For example:

        t_ignore_cppcomment = r'//.*'

    Previously, this had to be done with a function. For example:

        def t_ignore_cppcomment(t):
            r'//.*'
            pass

    If start conditions/states are being used, state names should appear
    before the "ignore_" text.

10/19/06: beazley
    The lex module now provides support for flex-style start conditions
    as described at http://www.gnu.org/software/flex/manual/html_chapter/flex_11.html.
    Please refer to this document to understand this change note. Refer to
    the PLY documentation for a PLY-specific explanation of how this works.

    To use start conditions, you first need to declare a set of states in
    your lexer file:

        states = (
            ('foo', 'exclusive'),
            ('bar', 'inclusive')
        )

    This serves the same role as the %s and %x specifiers in flex.

    Once a state has been declared, tokens for that state can be
    declared by defining rules of the form t_state_TOK.
    For example:

        t_PLUS = r'\+'          # Rule defined in INITIAL state
        t_foo_NUM = r'\d+'      # Rule defined in foo state
        t_bar_NUM = r'\d+'      # Rule defined in bar state

        t_foo_bar_NUM = r'\d+'  # Rule defined in both foo and bar
        t_ANY_NUM = r'\d+'      # Rule defined in all states

    In addition to defining tokens for each state, the t_ignore and t_error
    specifications can be customized for specific states. For example:

        t_foo_ignore = " "      # Ignored characters for foo state

        def t_bar_error(t):
            # Handle errors in bar state
            ...

    With token rules, the following methods can be used to change states:

        def t_TOKNAME(t):
            t.lexer.begin('foo')        # Begin state 'foo'
            t.lexer.push_state('foo')   # Begin state 'foo', push old state
                                        # onto a stack
            t.lexer.pop_state()         # Restore previous state
            t.lexer.current_state()     # Returns name of current state

    These methods mirror the BEGIN(), yy_push_state(), yy_pop_state(), and
    yy_top_state() functions in flex.

    Start states can be used as one way to write sub-lexers.
    For example, the lexer or parser might instruct the lexer to start
    generating a different set of tokens depending on the context.

    example/yply/ylex.py shows the use of start states to grab C/C++
    code fragments out of traditional yacc specification files.

    *** NEW FEATURE *** Suggested by Daniel Larraz, with whom I also
    discussed various aspects of the design.

10/19/06: beazley
    Minor change to the way in which yacc.py was reporting shift/reduce
    conflicts. Although the underlying LALR(1) algorithm was correct,
    PLY was under-reporting the number of conflicts compared to yacc/bison
    when precedence rules were in effect. This change should make PLY
    report the same number of conflicts as yacc.

10/19/06: beazley
    Modified yacc so that grammar rules could also include the '-'
    character.
    For example:

        def p_expr_list(p):
            'expression-list : expression-list expression'

    Suggested by Oldrich Jedlicka.

10/18/06: beazley
    Attribute lexer.lexmatch added so that token rules can access the re
    match object that was generated. For example:

        def t_FOO(t):
            r'some regex'
            m = t.lexer.lexmatch
            # Do something with m

    This may be useful if you want to access named groups specified within
    the regex for a specific token. Suggested by Oldrich Jedlicka.

10/16/06: beazley
    Changed the error message that results if an illegal character
    is encountered and no default error function is defined in lex.
    The exception is now more informative about the actual cause of
    the error.

Version 2.1
------------------------------
10/02/06: beazley
    The last Lexer object built by lex() can be found in lex.lexer.
    The last Parser object built by yacc() can be found in yacc.parser.

10/02/06: beazley
    New example added: examples/yply

    This example uses PLY to convert Unix-yacc specification files to
    PLY programs with the same grammar. This may be useful if you
    want to convert a grammar from bison/yacc to use with PLY.

10/02/06: beazley
    Added support for a start symbol to be specified in the yacc
    input file itself. Just do this:

        start = 'name'

    where 'name' matches some grammar rule. For example:

        def p_name(p):
            'name : A B C'
            ...

    This mirrors the functionality of the yacc %start specifier.

09/30/06: beazley
    Some new examples added:

        examples/GardenSnake : A simple indentation-based language similar
                               to Python. Shows how you might handle
                               whitespace. Contributed by Andrew Dalke.

        examples/BASIC       : An implementation of 1964 Dartmouth BASIC.
                               Contributed by Dave against his better
                               judgement.
09/28/06: beazley
    Minor patch to allow named groups to be used in lex regular
    expression rules.  For example:

        t_QSTRING = r'''(?P<quote>['"]).*?(?P=quote)'''

    Patch submitted by Adam Ring.

09/28/06: beazley
    LALR(1) is now the default parsing method.  To use SLR, use
    yacc.yacc(method="SLR").  Note: there is no performance impact
    on parsing when using LALR(1) instead of SLR.  However, constructing
    the parsing tables will take a little longer.

09/26/06: beazley
    Change to line number tracking.  To modify line numbers, modify
    the line number of the lexer itself.  For example:

        def t_NEWLINE(t):
            r'\n'
            t.lexer.lineno += 1

    This modification is both a cleanup and a performance optimization.
    In past versions, lex was monitoring every token for changes in
    the line number.  This extra processing is unnecessary for the vast
    majority of tokens.  Thus, this new approach cleans it up a bit.

    *** POTENTIAL INCOMPATIBILITY ***
    You will need to change code in your lexer that updates the line
    number.  For example, "t.lineno += 1" becomes "t.lexer.lineno += 1"

09/26/06: beazley
    Added the lexing position to tokens as an attribute lexpos.  This
    is the raw index into the input text at which a token appears.
    This information can be used to compute column numbers and other
    details (e.g., scan backwards from lexpos to the first newline
    to get a column position).

09/25/06: beazley
    Changed the name of the __copy__() method on the Lexer class
    to clone().  This is used to clone a Lexer object (e.g., if
    you're running different lexers at the same time).

09/21/06: beazley
    Limitations related to the use of the re module have been eliminated.
    Several users reported problems with regular expressions exceeding
    100 named groups.
    To solve this, lex.py is now capable
    of automatically splitting its master regular expression into
    smaller expressions as needed.  This should, in theory, make it
    possible to specify an arbitrarily large number of tokens.

09/21/06: beazley
    Improved error checking in lex.py.  Rules that match the empty string
    are now rejected (otherwise they cause the lexer to enter an infinite
    loop).  An extra check for rules containing '#' has also been added.
    Since lex compiles regular expressions in verbose mode, '#' is
    interpreted as a regex comment, so it is critical to use '\#' instead.

09/18/06: beazley
    Added a @TOKEN decorator function to lex.py that can be used to
    define token rules where the documentation string might be computed
    in some way.

        digit      = r'([0-9])'
        nondigit   = r'([_A-Za-z])'
        identifier = r'(' + nondigit + r'(' + digit + r'|' + nondigit + r')*)'

        from ply.lex import TOKEN

        @TOKEN(identifier)
        def t_ID(t):
            # Do whatever

    The @TOKEN decorator merely sets the documentation string of the
    associated token function as needed for lex to work.

    Note: An alternative solution is the following:

        def t_ID(t):
            # Do whatever

        t_ID.__doc__ = identifier

    Note: Decorators require the use of Python 2.4 or later.  If compatibility
    with older versions is needed, use the latter solution.

    The need for this feature was suggested by Cem Karan.

09/14/06: beazley
    Support for single-character literal tokens has been added to yacc.
    These literals must be enclosed in quotes.  For example:

        def p_expr(p):
            "expr : expr '+' expr"
            ...

        def p_expr(p):
            'expr : expr "-" expr'
            ...

    In addition to this, it is necessary to tell the lexer module about
    literal characters.  This is done by defining the variable 'literals'
    as a list of characters.
    This should be defined in the module that
    invokes the lex.lex() function.  For example:

        literals = ['+','-','*','/','(',')','=']

    or simply

        literals = '+-*/()='

    It is important to note that literals can only be a single character.
    When the lexer fails to match a token using its normal regular expression
    rules, it will check the current character against the literal list.
    If found, it will be returned with a token type set to match the literal
    character.  Otherwise, an illegal character will be signalled.

09/14/06: beazley
    Modified PLY to install itself as a proper Python package called 'ply'.
    This will make it a little more friendly to other modules.  This
    changes the usage of PLY only slightly.  Just do this to import the
    modules:

        import ply.lex as lex
        import ply.yacc as yacc

    Alternatively, you can do this:

        from ply import *

    which imports both the lex and yacc modules.
    Change suggested by Lee June.

09/13/06: beazley
    Changed the handling of negative indices when used in production rules.
    A negative production index now accesses already parsed symbols on the
    parsing stack.  For example:

        def p_foo(p):
            "foo : A B C D"
            print p[1]       # Value of 'A' symbol
            print p[2]       # Value of 'B' symbol
            print p[-1]      # Value of whatever symbol appears before A
                             # on the parsing stack.

            p[0] = some_val  # Sets the value of the 'foo' grammar symbol

    This behavior makes it easier to work with embedded actions within the
    parsing rules.  For example, in C-yacc, it is possible to write code like
    this:

        bar: A { printf("seen an A = %d\n", $1); } B { do_stuff; }

    In this example, the printf() code executes immediately after A has been
    parsed.  Within the embedded action code, $1 refers to the A symbol on
    the stack.
    To perform the equivalent action in PLY, you need to write a pair
    of rules like this:

        def p_bar(p):
            "bar : A seen_A B"
            do_stuff

        def p_seen_A(p):
            "seen_A :"
            print "seen an A =", p[-1]

    The second rule "seen_A" is merely an empty production which should be
    reduced as soon as A is parsed in the "bar" rule above.  The
    negative index p[-1] is used to access whatever symbol appeared
    before the seen_A symbol.

    This feature also makes it possible to support inherited attributes.
    For example:

        def p_decl(p):
            "decl : scope name"

        def p_scope(p):
            """scope : GLOBAL
                     | LOCAL"""
            p[0] = p[1]

        def p_name(p):
            "name : ID"
            if p[-1] == "GLOBAL":
                # ...
            elif p[-1] == "LOCAL":
                # ...

    In this case, the name rule is inheriting an attribute from the
    scope declaration that precedes it.

    *** POTENTIAL INCOMPATIBILITY ***
    If you are currently using negative indices within existing grammar rules,
    your code will break.  This should be extremely rare, if not non-existent,
    in most cases.  The argument to various grammar rules is not usually
    processed in the same way as a list of items.

Version 2.0
------------------------------
09/07/06: beazley
    Major cleanup and refactoring of the LR table generation code.  Both SLR
    and LALR(1) table generation is now performed by the same code base with
    only minor extensions for extra LALR(1) processing.

09/07/06: beazley
    Completely reimplemented the entire LALR(1) parsing engine to use the
    DeRemer and Pennello algorithm for calculating lookahead sets.  This
    significantly improves the performance of generating LALR(1) tables
    and has the added feature of actually working correctly!
    If you
    experienced weird behavior with LALR(1) in prior releases, this should
    hopefully resolve all of those problems.  Many thanks to
    Andrew Waters and Markus Schoepflin for submitting bug reports
    and helping me test out the revised LALR(1) support.

Version 1.8
------------------------------
08/02/06: beazley
    Fixed a problem related to the handling of default actions in LALR(1)
    parsing.  If you experienced subtle and/or bizarre behavior when trying
    to use the LALR(1) engine, this may correct those problems.  Patch
    contributed by Russ Cox.  Note: This patch has been superseded by
    revisions for LALR(1) parsing in Ply-2.0.

08/02/06: beazley
    Added support for slicing of productions in yacc.
    Patch contributed by Patrick Mezard.

Version 1.7
------------------------------
03/02/06: beazley
    Fixed an infinite recursion problem in the ReduceToTerminals() function
    that would sometimes come up in LALR(1) table generation.  Reported by
    Markus Schoepflin.

03/01/06: beazley
    Added "reflags" argument to lex().  For example:

        lex.lex(reflags=re.UNICODE)

    This can be used to specify optional flags to the re.compile() function
    used inside the lexer.  This may be necessary for special situations such
    as processing Unicode (e.g., if you want escapes like \w and \b to consult
    the Unicode character property database).  The need for this was suggested
    by Andreas Jung.

03/01/06: beazley
    Fixed a bug with an uninitialized variable on repeated instantiations of parser
    objects when the write_tables=0 argument was used.  Reported by Michael Brown.

03/01/06: beazley
    Modified lex.py to accept Unicode strings both as the regular expressions for
    tokens and as input.  Hopefully this is the only change needed for Unicode support.
    Patch contributed by Johan Dahl.
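    To illustrate the kind of situation the reflags option addresses, here
    is a stdlib-only sketch of how \w changes meaning with Unicode-aware
    matching.  (re.ASCII is used purely for contrast; in modern Python 3,
    str patterns match Unicode by default, which is why the example never
    touches PLY itself.)

    ```python
    import re

    # With re.ASCII, \w matches only ASCII word characters; with
    # re.UNICODE it consults the Unicode character property database --
    # the same flags that reflags passes through to re.compile() inside
    # the lexer.
    ascii_id = re.match(r'\w+', 'café', re.ASCII)
    unicode_id = re.match(r'\w+', 'café', re.UNICODE)
    print(ascii_id.group())    # caf
    print(unicode_id.group())  # café
    ```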
03/01/06: beazley
    Modified the class-based interface to work with new-style or old-style classes.
    Patch contributed by Michael Brown (although I tweaked it slightly so it would
    work with older versions of Python).

Version 1.6
------------------------------
05/27/05: beazley
    Incorporated a patch contributed by Christopher Stawarz to fix an extremely
    devious bug in LALR(1) parser generation.  This patch should fix problems
    numerous people reported with LALR parsing.

05/27/05: beazley
    Fixed a problem with the lex.py copy constructor.  Reported by Dave Aitel,
    Aaron Lav, and Thad Austin.

05/27/05: beazley
    Added outputdir option to yacc() to control the output directory.  Contributed
    by Christopher Stawarz.

05/27/05: beazley
    Added rununit.py test script to run tests using the Python unittest module.
    Contributed by Miki Tebeka.

Version 1.5
------------------------------
05/26/04: beazley
    Major enhancement.  LALR(1) parsing support is now working.
    This feature was implemented by Elias Ioup (ezioup@alumni.uchicago.edu)
    and optimized by David Beazley.  To use LALR(1) parsing, do
    the following:

        yacc.yacc(method="LALR")

    Computing LALR(1) parsing tables takes about twice as long as
    the default SLR method.  However, LALR(1) allows you to handle
    more complex grammars.  For example, the ANSI C grammar
    (in example/ansic) has 13 shift-reduce conflicts with SLR, but
    only 1 shift-reduce conflict with LALR(1).

05/20/04: beazley
    Added a __len__ method to parser production lists.  Can
    be used in parser rules like this:

        def p_somerule(p):
            """a : B C D
               | E F"""
            if len(p) == 4:
                # Must have been the first rule
            elif len(p) == 3:
                # Must be the second rule

    Suggested by Joshua Gerth and others.
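    The __len__ behavior can be sketched with a plain function and lists
    standing in for the production object (an illustration of the dispatch
    idea only, not PLY's actual implementation):

    ```python
    # A production behaves like a sequence where slot 0 holds the result
    # and slots 1..n hold the matched right-hand-side symbols, so len(p)
    # reveals which alternative of a multi-line rule was matched.
    def which_alternative(p):
        if len(p) == 4:
            return 'a : B C D'   # three symbols plus the result slot
        elif len(p) == 3:
            return 'a : E F'     # two symbols plus the result slot

    print(which_alternative([None, 'b', 'c', 'd']))  # a : B C D
    print(which_alternative([None, 'e', 'f']))       # a : E F
    ```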
Version 1.4
------------------------------
04/23/04: beazley
    Incorporated a variety of patches contributed by Eric Raymond.
    These include:

    0. Cleans up some comments so they don't wrap on an 80-column display.
    1. Directs compiler errors to stderr where they belong.
    2. Implements and documents automatic line counting when \n is ignored.
    3. Changes the way progress messages are dumped when debugging is on.
       The new format is both less verbose and conveys more information than
       the old, including shift and reduce actions.

04/23/04: beazley
    Added a Python setup.py file to simplify installation.  Contributed
    by Adam Kerrison.

04/23/04: beazley
    Added patches contributed by Adam Kerrison.

    - Some output is now only shown when debugging is enabled.  This
      means that PLY will be completely silent when not in debugging mode.

    - An optional parameter "write_tables" can be passed to yacc() to
      control whether or not parsing tables are written.  By default,
      it is true, but it can be turned off if you don't want the yacc
      table file.  Note: disabling this will cause yacc() to regenerate
      the parsing table each time.

04/23/04: beazley
    Added patches contributed by David McNab.  This patch adds two
    features:

    - The parser can be supplied as a class instead of a module.
      For an example of this, see the example/classcalc directory.

    - Debugging output can be directed to a filename of the user's
      choice.  Use

          yacc(debugfile="somefile.out")

Version 1.3
------------------------------
12/10/02: jmdyck
    Various minor adjustments to the code that Dave checked in today.
    Updated test/yacc_{inf,unused}.exp to reflect today's changes.

12/10/02: beazley
    Incorporated a variety of minor bug fixes to empty production
    handling and infinite recursion checking.
    Contributed by
    Michael Dyck.

12/10/02: beazley
    Removed bogus recover() method call in yacc.restart().

Version 1.2
------------------------------
11/27/02: beazley
    Lexer and parser objects are now available as an attribute
    of tokens and slices respectively.  For example:

        def t_NUMBER(t):
            r'\d+'
            print t.lexer

        def p_expr_plus(t):
            'expr : expr PLUS expr'
            print t.lexer
            print t.parser

    This can be used for state management (if needed).

10/31/02: beazley
    Modified yacc.py to work with Python optimize mode.  To make
    this work, you need to use

        yacc.yacc(optimize=1)

    Furthermore, you need to first run Python in normal mode
    to generate the necessary parsetab.py files.  After that,
    you can use python -O or python -OO.

    Note: optimized mode turns off a lot of error checking.
    Only use it when you are sure that your grammar is working.
    Make sure parsetab.py is up to date!

10/30/02: beazley
    Added cloning of Lexer objects.  For example:

        import copy
        l = lex.lex()
        lc = copy.copy(l)

        l.input("Some text")
        lc.input("Some other text")
        ...

    This might be useful if the same "lexer" is meant to
    be used in different contexts---or if multiple lexers
    are running concurrently.

10/30/02: beazley
    Fixed a subtle bug with first set computation and empty productions.
    Patch submitted by Michael Dyck.

10/30/02: beazley
    Fixed error messages to use "filename:line: message" instead
    of "filename:line. message".  This makes error reporting more
    friendly to emacs.  Patch submitted by François Pinard.

10/30/02: beazley
    Improvements to the parser.out file.  Terminals and nonterminals
    are now sorted instead of being printed in random order.
    Patch submitted by François Pinard.
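    The "filename:line: message" convention mentioned above can be sketched
    with a small hypothetical helper (not part of PLY's API):

    ```python
    # The colon after the line number is what lets emacs-style tools
    # parse the message and jump straight to the offending line.
    def format_error(filename, lineno, message):
        return '%s:%d: %s' % (filename, lineno, message)

    print(format_error('grammar.py', 42, 'illegal character'))
    # grammar.py:42: illegal character
    ```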
10/30/02: beazley
    Improvements to parser.out file output.  Rules are now printed
    in a way that's easier to understand.  Contributed by Russ Cox.

10/30/02: beazley
    Added 'nonassoc' associativity support.  This can be used
    to disable the chaining of operators like a < b < c.
    To use, simply specify 'nonassoc' in the precedence table:

        precedence = (
            ('nonassoc', 'LESSTHAN', 'GREATERTHAN'),  # Nonassociative operators
            ('left', 'PLUS', 'MINUS'),
            ('left', 'TIMES', 'DIVIDE'),
            ('right', 'UMINUS'),                      # Unary minus operator
        )

    Patch contributed by Russ Cox.

10/30/02: beazley
    Modified the lexer to provide optional support for Python -O and -OO
    modes.  To make this work, Python *first* needs to be run in
    unoptimized mode.  This reads the lexing information and creates a
    file "lextab.py".  Then, run lex like this:

        # module foo.py
        ...
        ...
        lex.lex(optimize=1)

    Once the lextab file has been created, subsequent calls to
    lex.lex() will read data from the lextab file instead of using
    introspection.  In optimized mode (-O, -OO) everything should
    work normally despite the loss of doc strings.

    To change the name of the file 'lextab.py', use the following:

        lex.lex(lextab="footab")

    (this creates a file footab.py)

Version 1.1  October 25, 2001
------------------------------

10/25/01: beazley
    Modified the table generator to produce much more compact data.
    This should greatly reduce the size of the parsetab.py[c] file.
    Caveat: the tables still need to be constructed, so a little more
    work is done in parsetab on import.

10/25/01: beazley
    There may be a possible bug in the cycle detector that reports errors
    about infinite recursion.
    I'm having a little trouble tracking it
    down, but if you get this problem, you can disable the cycle
    detector as follows:

        yacc.yacc(check_recursion=0)

10/25/01: beazley
    Fixed a bug in lex.py that sometimes caused illegal characters to be
    reported incorrectly.  Reported by Sverre Jørgensen.

7/8/01: beazley
    Added a reference to the underlying lexer object when tokens are handled by
    functions.  The lexer is available as the 'lexer' attribute.  This
    was added to provide better lexing support for languages such as Fortran
    where certain types of tokens can't be conveniently expressed as regular
    expressions (and where the tokenizing function may want to perform a
    little backtracking).  Suggested by Pearu Peterson.

6/20/01: beazley
    Modified the yacc() function so that an optional starting symbol can be
    specified.  For example:

        yacc.yacc(start="statement")

    Normally yacc always treats the first production rule as the starting symbol.
    However, if you are debugging your grammar, it may be useful to specify
    an alternative starting symbol.  Idea suggested by Rich Salz.

Version 1.0  June 18, 2001
--------------------------
Initial public offering