#!/usr/bin/ruby
# encoding: utf-8

=begin LICENSE

[The "BSD licence"]
Copyright (c) 2009-2010 Kyle Yetter
All rights reserved.

Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions
are met:

 1. Redistributions of source code must retain the above copyright
    notice, this list of conditions and the following disclaimer.
 2. Redistributions in binary form must reproduce the above copyright
    notice, this list of conditions and the following disclaimer in the
    documentation and/or other materials provided with the distribution.
 3. The name of the author may not be used to endorse or promote products
    derived from this software without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR
IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES
OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED.
IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT,
INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT
NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF
THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

=end

module ANTLR3


=begin rdoc ANTLR3::Stream

= ANTLR3 Streams

This documentation first covers the general concept of streams as used by ANTLR
recognizers, and then discusses the specific <tt>ANTLR3::Stream</tt> module.

== ANTLR Stream Classes

ANTLR recognizers need a way to walk through input data in a serialized,
IO-style fashion. They also need some book-keeping about the input to provide
useful information to developers, such as the current line number and column.
Furthermore, to implement backtracking and various error-recovery techniques,
recognizers need a way to record various locations in the input at a number of
points in the recognition process so the input state may be restored to a
prior state.

ANTLR bundles all of this functionality into a number of Stream classes, each
designed to be used by recognizers for a specific recognition task. Most of the
Stream hierarchy is implemented in antlr3/stream.rb, which is loaded by default
when 'antlr3' is required.

---

Here's a brief overview of the various stream classes and their respective
purposes:

StringStream::
  Similar to StringIO from the standard Ruby library, StringStream wraps raw
  String data in a Stream interface for use by ANTLR lexers.
FileStream::
  A subclass of StringStream, FileStream simply wraps data read from an IO or
  File object for use by lexers.
CommonTokenStream::
  The job of a TokenStream is to read lexer output and then provide ANTLR
  parsers with the means to sequentially walk through a series of tokens.
  CommonTokenStream is the default TokenStream implementation.
TokenRewriteStream::
  A subclass of CommonTokenStream, TokenRewriteStream gives rewriting parsers
  the ability to produce new output text from an input token sequence by
  managing rewrite "programs" on top of the stream.
CommonTreeNodeStream::
  In a similar fashion to CommonTokenStream, CommonTreeNodeStream feeds tree
  nodes to recognizers sequentially. The stream serializes an Abstract Syntax
  Tree into a flat, one-dimensional sequence, while preserving the
  two-dimensional shape of the tree using special UP and DOWN tokens. The
  sequence is primarily used by ANTLR Tree Parsers. *Note* -- this class is
  not defined in antlr3/stream.rb, but in antlr3/tree.rb.

---

The next few sections cover the most significant methods of all stream classes.

=== consume / look / peek

<tt>stream.consume</tt> is used to advance a stream one unit. StringStreams are
advanced by one character and TokenStreams are advanced by one token.

<tt>stream.peek(k = 1)</tt> is used to quickly retrieve the object of interest
to a recognizer at the look-ahead position specified by <tt>k</tt>. For
<b>StringStreams</b>, this is the <i>integer value of the character</i>
<tt>k</tt> characters ahead of the stream cursor. For <b>TokenStreams</b>, this
is the <i>integer token type of the token</i> <tt>k</tt> tokens ahead of the
stream cursor.

<tt>stream.look(k = 1)</tt> is used to retrieve the full object of interest at
the look-ahead position specified by <tt>k</tt>. While <tt>peek</tt> provides
the <i>bare-minimum lightweight information</i> that the recognizer needs,
<tt>look</tt> provides the <i>full object of concern</i> in the stream. For
<b>StringStreams</b>, this is a <i>string object containing the single
character</i> <tt>k</tt> characters ahead of the stream cursor. For
<b>TokenStreams</b>, this is the <i>full token structure</i> <tt>k</tt> tokens
ahead of the stream cursor.

<b>Note:</b> in most ANTLR runtime APIs for other languages, <tt>peek</tt> is
implemented by some method with a name like <tt>LA(k)</tt> and <tt>look</tt> is
implemented by some method with a name like <tt>LT(k)</tt>. When writing this
Ruby runtime API, I found this naming practice confusing, ambiguous, and
un-Ruby-like. Thus, I chose <tt>peek</tt> and <tt>look</tt> to represent a
quick look (peek) and a full-fledged look-ahead operation (look). If this causes
confusion or any sort of compatibility strife for developers using this
implementation, all apologies.
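
To make the <tt>peek</tt>/<tt>look</tt> distinction concrete, here is a minimal
sketch of the contract using a simplified stand-in class. <tt>TinyCharStream</tt>
is hypothetical and written only for this illustration -- it is not part of the
runtime library:

```ruby
# A simplified stand-in illustrating the peek/look/consume contract
# described above (hypothetical; not the real ANTLR3 stream classes).
class TinyCharStream
  def initialize( data )
    @data = data
    @position = 0
  end

  # advance the cursor by one character
  def consume
    @position += 1 if @position < @data.length
  end

  # integer value of the character k characters ahead (nil when out of range)
  def peek( k = 1 )
    c = @data[ @position + k - 1 ] and c.ord
  end

  # the same character, but as a single-character String
  def look( k = 1 )
    @data[ @position + k - 1 ]
  end
end

stream = TinyCharStream.new( "cat" )
stream.peek    # => 99, the integer value of "c"
stream.look    # => "c"
stream.consume
stream.look    # => "a"
```

The real stream classes add negative look-behind, EOF handling, and
line/column tracking on top of this basic shape.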
119 120=== mark / rewind / release 121 122<tt>marker = stream.mark</tt> causes the stream to record important information 123about the current stream state, place the data in an internal memory table, and 124return a memento, <tt>marker</tt>. The marker object is typically an integer key 125to the stream's internal memory table. 126 127Used in tandem with, <tt>stream.rewind(mark = last_marker)</tt>, the marker can 128be used to restore the stream to an earlier state. This is used by recognizers 129to perform tasks such as backtracking and error recovery. 130 131<tt>stream.release(marker = last_marker)</tt> can be used to release an existing 132state marker from the memory table. 133 134=== seek 135 136<tt>stream.seek(position)</tt> moves the stream cursor to an absolute position 137within the stream, basically like typical ruby <tt>IO#seek</tt> style methods. 138However, unlike <tt>IO#seek</tt>, ANTLR streams currently always use absolute 139position seeking. 140 141== The Stream Module 142 143<tt>ANTLR3::Stream</tt> is an abstract-ish base mixin for all IO-like stream 144classes used by ANTLR recognizers. 145 146The module doesn't do much on its own besides define arguably annoying 147``abstract'' pseudo-methods that demand implementation when it is mixed in to a 148class that wants to be a Stream. Right now this exists as an artifact of porting 149the ANTLR Java/Python runtime library to Ruby. In Java, of course, this is 150represented as an interface. In Ruby, however, objects are duck-typed and 151interfaces aren't that useful as programmatic entities -- in fact, it's mildly 152wasteful to have a module like this hanging out. Thus, I may axe it. 153 154When mixed in, it does give the class a #size and #source_name attribute 155methods. 156 157Except in a small handful of places, most of the ANTLR runtime library uses 158duck-typing and not type checking on objects. 
This means that the methods which 159manipulate stream objects don't usually bother checking that the object is a 160Stream and assume that the object implements the proper stream interface. Thus, 161it is not strictly necessary that custom stream objects include ANTLR3::Stream, 162though it isn't a bad idea. 163 164=end 165 166module Stream 167 include ANTLR3::Constants 168 extend ClassMacros 169 170 ## 171 # :method: consume 172 # used to advance a stream one unit (such as character or token) 173 abstract :consume 174 175 ## 176 # :method: peek( k = 1 ) 177 # used to quickly retreive the object of interest to a recognizer at lookahead 178 # position specified by <tt>k</tt> (such as integer value of a character or an 179 # integer token type) 180 abstract :peek 181 182 ## 183 # :method: look( k = 1 ) 184 # used to retreive the full object of interest at lookahead position specified 185 # by <tt>k</tt> (such as a character string or a token structure) 186 abstract :look 187 188 ## 189 # :method: mark 190 # saves the current position for the purposes of backtracking and 191 # returns a value to pass to #rewind at a later time 192 abstract :mark 193 194 ## 195 # :method: index 196 # returns the current position of the stream 197 abstract :index 198 199 ## 200 # :method: rewind( marker = last_marker ) 201 # restores the stream position using the state information previously saved 202 # by the given marker 203 abstract :rewind 204 205 ## 206 # :method: release( marker = last_marker ) 207 # clears the saved state information associated with the given marker value 208 abstract :release 209 210 ## 211 # :method: seek( position ) 212 # move the stream to the given absolute index given by +position+ 213 abstract :seek 214 215 ## 216 # the total number of symbols in the stream 217 attr_reader :size 218 219 ## 220 # indicates an identifying name for the stream -- usually the file path of the input 221 attr_accessor :source_name 222end 223 224=begin rdoc 
ANTLR3::CharacterStream 225 226CharacterStream further extends the abstract-ish base mixin Stream to add 227methods specific to navigating character-based input data. Thus, it serves as an 228immitation of the Java interface for text-based streams, which are primarily 229used by lexers. 230 231It adds the ``abstract'' method, <tt>substring(start, stop)</tt>, which must be 232implemented to return a slice of the input string from position <tt>start</tt> 233to position <tt>stop</tt>. It also adds attribute accessor methods <tt>line</tt> 234and <tt>column</tt>, which are expected to indicate the current line number and 235position within the current line, respectively. 236 237== A Word About <tt>line</tt> and <tt>column</tt> attributes 238 239Presumably, the concept of <tt>line</tt> and <tt>column</tt> attirbutes of text 240are familliar to most developers. Line numbers of text are indexed from number 1 241up (not 0). Column numbers are indexed from 0 up. Thus, examining sample text: 242 243 Hey this is the first line. 244 Oh, and this is the second line. 245 246Line 1 is the string "Hey this is the first line\\n". If a character stream is at 247line 2, character 0, the stream cursor is sitting between the characters "\\n" 248and "O". 249 250*Note:* most ANTLR runtime APIs for other languages refer to <tt>column</tt> 251with the more-precise, but lengthy name <tt>charPositionInLine</tt>. I prefered 252to keep it simple and familliar in this Ruby runtime API. 253 254=end 255 256module CharacterStream 257 include Stream 258 extend ClassMacros 259 include Constants 260 261 ## 262 # :method: substring(start,stop) 263 abstract :substring 264 265 attr_accessor :line 266 attr_accessor :column 267end 268 269 270=begin rdoc ANTLR3::TokenStream 271 272TokenStream further extends the abstract-ish base mixin Stream to add methods 273specific to navigating token sequences. 
Thus, it serves as an imitation of the 274Java interface for token-based streams, which are used by many different 275components in ANTLR, including parsers and tree parsers. 276 277== Token Streams 278 279Token streams wrap a sequence of token objects produced by some token source, 280usually a lexer. They provide the operations required by higher-level 281recognizers, such as parsers and tree parsers for navigating through the 282sequence of tokens. Unlike simple character-based streams, such as StringStream, 283token-based streams have an additional level of complexity because they must 284manage the task of "tuning" to a specific token channel. 285 286One of the main advantages of ANTLR-based recognition is the token 287<i>channel</i> feature, which allows you to hold on to all tokens of interest 288while only presenting a specific set of interesting tokens to a parser. For 289example, if you need to hide whitespace and comments from a parser, but hang on 290to them for some other purpose, you have the lexer assign the comments and 291whitespace to channel value HIDDEN as it creates the tokens. 292 293When you create a token stream, you can tune it to some specific channel value. 294Then, all <tt>peek</tt>, <tt>look</tt>, and <tt>consume</tt> operations only 295yield tokens that have the same value for <tt>channel</tt>. The stream skips 296over any non-matching tokens in between. 297 298== The TokenStream Interface 299 300In addition to the abstract methods and attribute methods provided by the base 301Stream module, TokenStream adds a number of additional method implementation 302requirements and attributes. 
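
As a rough sketch of the channel idea, consider the following illustration.
The <tt>Token</tt> struct and the channel constants here are hypothetical,
simplified stand-ins for the richer token classes and constants the runtime
actually uses:

```ruby
# Hypothetical token struct and channel constants, for illustration only;
# the real runtime defines its own token classes and channel constants.
Token = Struct.new( :type, :text, :channel )

DEFAULT_CHANNEL = 0
HIDDEN_CHANNEL  = 99

tokens = [
  Token.new( :INT,  "35", DEFAULT_CHANNEL ),
  Token.new( :WS,   " ",  HIDDEN_CHANNEL  ),
  Token.new( :MULT, "*",  DEFAULT_CHANNEL )
]

# a stream tuned to DEFAULT_CHANNEL only ever yields the on-channel tokens;
# the whitespace token stays in the buffer but is skipped by peek/look/consume
visible = tokens.select { |t| t.channel == DEFAULT_CHANNEL }
visible.map( &:text )   # => ["35", "*"]
```

The important point is that off-channel tokens are retained in the buffer --
they are merely skipped over during navigation, not discarded.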

=end

module TokenStream
  include Stream
  extend ClassMacros

  ##
  # expected to return the token source object (such as a lexer) from which
  # all tokens in the stream were retrieved
  attr_reader :token_source

  ##
  # expected to return the value of the last marker produced by a call to
  # <tt>stream.mark</tt>
  attr_reader :last_marker

  ##
  # expected to return the integer index of the stream cursor
  attr_reader :position

  ##
  # the integer channel value to which the stream is ``tuned''
  attr_accessor :channel

  ##
  # :method: to_s( start = 0, stop = tokens.length - 1 )
  # should take the tokens between start and stop in the sequence, extract
  # their text, and return the concatenation of all the text chunks
  abstract :to_s

  ##
  # :method: at( i )
  # return the stream symbol at index +i+
  abstract :at
end

=begin rdoc ANTLR3::StringStream

A StringStream's purpose is to wrap the basic, naked text input of a
recognition system. Like all other stream types, it provides serial navigation
of the input; a recognizer can arbitrarily step forward and backward through
the stream's symbols as it requires. StringStream and its subclasses are the
main way to feed text input into an ANTLR lexer for token processing.

The stream's symbols of interest, of course, are character values. Thus, the
#peek method returns the integer character value at look-ahead position
<tt>k</tt> and the #look method returns the character value as a +String+. The
stream also tracks various pieces of information, such as the line and column
numbers at the current position.

=== Note About Text Encoding

This version of the runtime library primarily targets Ruby version 1.8, which
does not have strong built-in support for multi-byte character encodings.
Thus, characters are assumed to be represented by a single byte -- an integer
between 0 and 255. Ruby 1.9 does provide built-in encoding support for
multi-byte characters, but currently this library does not provide any streams
to handle non-ASCII encodings. However, encoding-savvy recognition code is a
future development goal for this project.

=end

class StringStream
  NEWLINE = ?\n.ord

  include CharacterStream

  # current integer character index of the stream
  attr_reader :position

  # the current line number of the input, indexed upward from 1
  attr_reader :line

  # the current character position within the current line, indexed upward from 0
  attr_reader :column

  # the name associated with the stream -- usually a file name;
  # defaults to <tt>"(string)"</tt>
  attr_accessor :name

  # the entire string that is wrapped by the stream
  attr_reader :data
  attr_reader :string

  if RUBY_VERSION =~ /^1\.9/

    # creates a new StringStream object where +data+ is the string data to stream.
    # accepts the following options in a symbol-to-value hash:
    #
    # [:file or :name] the (file) name to associate with the stream; default: <tt>'(string)'</tt>
    # [:line] the initial line number; default: +1+
    # [:column] the initial column number; default: +0+
    #
    def initialize( data, options = {} ) # for 1.9
      @string = data.to_s.encode( Encoding::UTF_8 ).freeze
      @data = @string.codepoints.to_a.freeze
      @position = options.fetch :position, 0
      @line = options.fetch :line, 1
      @column = options.fetch :column, 0
      @markers = []
      @name ||= options[ :file ] || options[ :name ] # || '(string)'
      mark
    end

    #
    # identical to #peek, except it returns the character value as a String
    #
    def look( k = 1 ) # for 1.9
      k == 0 and return nil
      k += 1 if k < 0

      index = @position + k - 1
      index < 0 and return nil

      @string[ index ]
    end

  else

    # creates a new StringStream object where +data+ is the string data to stream.
    # accepts the following options in a symbol-to-value hash:
    #
    # [:file or :name] the (file) name to associate with the stream; default: <tt>'(string)'</tt>
    # [:line] the initial line number; default: +1+
    # [:column] the initial column number; default: +0+
    #
    def initialize( data, options = {} ) # for 1.8
      @data = data.to_s
      @data.equal?( data ) and @data = @data.clone
      @data.freeze
      @string = @data
      @position = options.fetch :position, 0
      @line = options.fetch :line, 1
      @column = options.fetch :column, 0
      @markers = []
      @name ||= options[ :file ] || options[ :name ] # || '(string)'
      mark
    end

    #
    # identical to #peek, except it returns the character value as a String
    #
    def look( k = 1 ) # for 1.8
      k == 0 and return nil
      k += 1 if k < 0

      index = @position + k - 1
      index < 0 and return nil

      c = @data[ index ] and c.chr
    end

  end

  def size
    @data.length
  end

  alias length size

  #
  # rewinds the stream back to the start and clears out any existing marker entries
  #
  def reset
    initial_location = @markers.first
    @position, @line, @column = initial_location
    @markers.clear
    @markers << initial_location
    return self
  end

  #
  # advance the stream by one character; returns the character consumed
  #
  def consume
    c = @data[ @position ] || EOF
    if @position < @data.length
      @column += 1
      if c == NEWLINE
        @line += 1
        @column = 0
      end
      @position += 1
    end
    return( c )
  end

  #
  # return the character at look-ahead distance +k+ as an integer. <tt>k = 1</tt> represents
  # the current character. +k+ greater than 1 represents upcoming characters. A negative
  # value of +k+ returns previous characters consumed, where <tt>k = -1</tt> is the last
  # character consumed. <tt>k = 0</tt> has undefined behavior and returns +nil+
  #
  def peek( k = 1 )
    k == 0 and return nil
    k += 1 if k < 0
    index = @position + k - 1
    index < 0 and return nil
    @data[ index ] or EOF
  end

  #
  # return a substring around the stream cursor at a distance +k+:
  # if <tt>k >= 0</tt>, return the next +k+ characters;
  # if <tt>k < 0</tt>, return the previous <tt>|k|</tt> characters
  #
  def through( k )
    if k >= 0 then @string[ @position, k ] else
      start = ( @position + k ).at_least( 0 ) # start cannot be negative or index will wrap around
      @string[ start ... @position ]
    end
  end

  # operator-style look-ahead
  alias >> look

  # operator-style look-behind
  def <<( k )
    self >> -k
  end

  alias index position
  alias character_index position

  alias source_name name

  #
  # returns true if the stream appears to be at the beginning of a new line.
  # This is an extra utility method for use inside lexer actions if needed.
  #
  def beginning_of_line?
    @position.zero? or @data[ @position - 1 ] == NEWLINE
  end

  #
  # returns true if the stream appears to be at the end of a line.
  # This is an extra utility method for use inside lexer actions if needed.
  #
  def end_of_line?
    @data[ @position ] == NEWLINE #if @position < @data.length
  end

  #
  # returns true if the stream has been exhausted.
  # This is an extra utility method for use inside lexer actions if needed.
  #
  def end_of_string?
    @position >= @data.length
  end

  #
  # returns true if the stream is at the beginning of the input (position = 0).
  # This is an extra utility method for use inside lexer actions if needed.
  #
  def beginning_of_string?
    @position == 0
  end

  alias eof? end_of_string?
  alias bof? beginning_of_string?

  #
  # record the current stream location parameters in the stream's marker table and
  # return an integer-valued bookmark that may be used to restore the stream's
  # position with the #rewind method. This method is used to implement backtracking.
  #
  def mark
    state = [ @position, @line, @column ].freeze
    @markers << state
    return @markers.length - 1
  end

  #
  # restore the stream to an earlier location recorded by #mark. If no marker value is
  # provided, the last marker generated by #mark will be used.
  #
  def rewind( marker = @markers.length - 1, release = true )
    ( marker >= 0 and location = @markers[ marker ] ) or return( self )
    @position, @line, @column = location
    release( marker ) if release
    return self
  end

  #
  # the total number of markers currently in existence
  #
  def mark_depth
    @markers.length
  end

  #
  # the last marker value created by a call to #mark
  #
  def last_marker
    @markers.length - 1
  end

  #
  # let go of the bookmark data for the marker and all marker
  # values created after the marker
  #
  def release( marker = @markers.length - 1 )
    marker.between?( 1, @markers.length - 1 ) or return
    @markers.pop( @markers.length - marker )
    return self
  end

  #
  # jump to the absolute position value given by +index+.
  # note: if +index+ is before the current position, the +line+ and +column+
  # attributes of the stream will probably be incorrect
  #
  def seek( index )
    index = index.bound( 0, @data.length ) # ensures index is within the stream's range
    if index > @position
      skipped = through( index - @position )
      if ( lc = skipped.count( "\n" ) ).zero?
        @column += skipped.length
      else
        @line += lc
        @column = skipped.length - skipped.rindex( "\n" ) - 1
      end
    end
    @position = index
    return nil
  end

  #
  # customized object inspection that shows:
  # * the stream class
  # * the stream's location in <tt>index / line:column</tt> format
  # * +before_chars+ characters before the cursor (6 characters by default)
  # * +after_chars+ characters after the cursor (10 characters by default)
  #
  def inspect( before_chars = 6, after_chars = 10 )
    before = through( -before_chars ).inspect
    @position - before_chars > 0 and before.insert( 0, '... ' )

    after = through( after_chars ).inspect
    @position + after_chars + 1 < @data.length and after << ' ...'

    location = "#@position / line #@line:#@column"
    "#<#{ self.class }: #{ before } | #{ after } @ #{ location }>"
  end

  #
  # return the string slice between position +start+ and +stop+
  #
  def substring( start, stop )
    @string[ start, stop - start + 1 ]
  end

  #
  # identical to String#[]
  #
  def []( start, *args )
    @string[ start, *args ]
  end
end


=begin rdoc ANTLR3::FileStream

FileStream is a character stream that uses data stored in some external file.
It is nearly identical to StringStream; it simply uses data located in a file
while automatically setting up the +source_name+ and +line+ parameters. It does
not actually use any buffered IO operations throughout the stream navigation
process. Instead, it reads the file data once when the stream is initialized.

=end

class FileStream < StringStream

  #
  # creates a new FileStream object using the given +file+ object.
  # If +file+ is a path string, the file will be read and the contents
  # will be used, and the +name+ attribute will be set to the path.
  # If +file+ is an IO-like object (one that responds to :read),
  # the content of the object will be used and the stream will
  # attempt to set its +name+ attribute, first trying the method #name
  # on the object, then trying the method #path on the object.
  #
  # see StringStream.new for a list of additional options
  # the constructor accepts
  #
  def initialize( file, options = {} )
    case file
    when $stdin then
      data = $stdin.read
      @name = '(stdin)'
    when ARGF
      data = file.read
      @name = file.path
    when ::File then
      file = file.clone
      file.reopen( file.path, 'r' )
      @name = file.path
      data = file.read
      file.close
    else
      if file.respond_to?( :read )
        data = file.read
        if file.respond_to?( :name ) then @name = file.name
        elsif file.respond_to?( :path ) then @name = file.path
        end
      else
        @name = file.to_s
        if test( ?f, @name ) then data = File.read( @name )
        else raise ArgumentError, "could not find an existing file at %p" % @name
        end
      end
    end
    super( data, options )
  end

end

=begin rdoc ANTLR3::CommonTokenStream

CommonTokenStream serves as the primary token stream implementation for feeding
sequential token input into parsers.

Using some TokenSource (such as a lexer), the stream collects a token sequence,
setting each token's <tt>index</tt> attribute to indicate the token's position
within the stream. The stream may be tuned to some channel value; off-channel
tokens will be filtered out by the #peek, #look, and #consume methods.

=== Sample Usage


  source_input = ANTLR3::StringStream.new("35 * 4 - 1")
  lexer = Calculator::Lexer.new(source_input)
  tokens = ANTLR3::CommonTokenStream.new(lexer)

  # assume this grammar defines whitespace as tokens on channel HIDDEN
  # and numbers and operations as tokens on channel DEFAULT
  tokens.look # => 0 INT["35"] @ line 1 col 0 (0..1)
  tokens.look(2) # => 2 MULT["*"] @ line 1 col 3 (3..3)
  tokens.tokens(0, 2)
  # => [0 INT["35"] @ line 1 col 0 (0..1),
  #     1 WS[" "] @ line 1 col 2 (2..2),
  #     2 MULT["*"] @ line 1 col 3 (3..3)]
  # notice the #tokens method does not filter out off-channel tokens

  lexer.reset
  hidden_tokens =
    ANTLR3::CommonTokenStream.new(lexer, :channel => ANTLR3::HIDDEN)
  hidden_tokens.look # => 1 WS[" "] @ line 1 col 2 (2..2)

=end

class CommonTokenStream
  include TokenStream
  include Enumerable

  #
  # constructs a new token stream using the +token_source+ provided.
  # +token_source+ is usually a lexer, but can be any object that implements
  # +next_token+ and includes ANTLR3::TokenSource.
  #
  # If a block is provided, each token harvested will be yielded, and if the block
  # returns a +nil+ or +false+ value, the token will not be added to the stream --
  # it will be discarded.
  #
  # === Options
  # [:channel] The channel value the stream should be tuned to initially
  # [:source_name] The source name (file name) attribute of the stream
  #
  # === Example
  #
  #   # create a new token stream that is tuned to channel :comment, and
  #   # discard all WHITE_SPACE tokens
  #   ANTLR3::CommonTokenStream.new(lexer, :channel => :comment) do |token|
  #     token.name != 'WHITE_SPACE'
  #   end
  #
  def initialize( token_source, options = {} )
    case token_source
    when CommonTokenStream
      # this is useful in cases where you want to convert a CommonTokenStream
      # to a RewriteTokenStream or another variation of the standard token stream
      stream = token_source
      @token_source = stream.token_source
      @channel = options.fetch( :channel ) { stream.channel or DEFAULT_CHANNEL }
      @source_name = options.fetch( :source_name ) { stream.source_name }
      tokens = stream.tokens.map { | t | t.dup }
    else
      @token_source = token_source
      @channel = options.fetch( :channel, DEFAULT_CHANNEL )
      @source_name = options.fetch( :source_name ) { @token_source.source_name rescue nil }
      tokens = @token_source.to_a
    end
    @last_marker = nil
    @tokens = block_given? ? tokens.select { | t | yield( t, self ) } : tokens
    @tokens.each_with_index { |t, i| t.index = i }
    @position =
      if first_token = @tokens.find { |t| t.channel == @channel }
        @tokens.index( first_token )
      else @tokens.length
      end
  end

  #
  # resets the token stream and rebuilds it with a potentially new token source.
  # If no +token_source+ value is provided, the stream will attempt to reset the
  # current +token_source+ by calling +reset+ on the object. The stream will
  # then clear the token buffer and attempt to harvest new tokens. Identically
  # to CommonTokenStream.new, if a block is provided, tokens will be
  # yielded and discarded if the block returns a +false+ or +nil+ value.
  #
  def rebuild( token_source = nil )
    if token_source.nil?
      @token_source.reset rescue nil
    else @token_source = token_source
    end
    @tokens = block_given? ? @token_source.select { |token| yield( token ) } :
              @token_source.to_a
    @tokens.each_with_index { |t, i| t.index = i }
    @last_marker = nil
    @position =
      if first_token = @tokens.find { |t| t.channel == @channel }
        @tokens.index( first_token )
      else @tokens.length
      end
    return self
  end

  #
  # tune the stream to a new channel value
  #
  def tune_to( channel )
    @channel = channel
  end

  def token_class
    @token_source.token_class
  rescue NoMethodError
    @position == -1 and fill_buffer
    @tokens.empty? ? CommonToken : @tokens.first.class
  end

  alias index position

  def size
    @tokens.length
  end

  alias length size

  ###### State-Control ################################################

  #
  # rewind the stream to its initial state
  #
  def reset
    @position = 0
    @position += 1 while token = @tokens[ @position ] and
                         token.channel != @channel
    @last_marker = nil
    return self
  end

  #
  # bookmark the current position of the input stream
  #
  def mark
    @last_marker = @position
  end

  def release( marker = nil )
    # do nothing
  end


  def rewind( marker = @last_marker, release = true )
    seek( marker )
  end

  #
  # saves the current stream position, yields to the block,
  # and then ensures the stream's position is restored before
  # returning the value of the block
  #
  def hold( pos = @position )
    block_given? or return enum_for( :hold, pos )
    begin
      yield
    ensure
      seek( pos )
    end
  end

  ###### Stream Navigation ###########################################

  #
  # advance the stream one step to the next on-channel token
  #
  def consume
    token = @tokens[ @position ] || EOF_TOKEN
    if @position < @tokens.length
      @position = future?( 2 ) || @tokens.length
    end
    return( token )
  end

  #
  # jump to the stream position specified by +index+.
  # note: seek does not check whether or not the
  # token at the specified position is on-channel
  #
  def seek( index )
    @position = index.to_i.bound( 0, @tokens.length )
    return self
  end

  #
  # return the type of the on-channel token at look-ahead distance +k+. <tt>k = 1</tt> represents
  # the current token. +k+ greater than 1 represents upcoming on-channel tokens. A negative
  # value of +k+ returns previous on-channel tokens consumed, where <tt>k = -1</tt> is the last
  # on-channel token consumed. <tt>k = 0</tt> has undefined behavior and returns +nil+
  #
  def peek( k = 1 )
    tk = look( k ) and return( tk.type )
  end

  #
  # operates similarly to #peek, but returns the full token object at look-ahead position +k+
  #
  def look( k = 1 )
    index = future?( k ) or return nil
    @tokens.fetch( index, EOF_TOKEN )
  end

  alias >> look
  def << k
    self >> -k
  end

  #
  # returns the index of the on-channel token at look-ahead position +k+ or nil if no other
  # on-channel tokens exist
  #
  def future?( k = 1 )
    @position == -1 and fill_buffer

    case
    when k == 0 then nil
    when k < 0 then past?( -k )
    when k == 1 then @position
    else
      # since the stream only yields on-channel
      # tokens, the stream can't just go to the
      # next position, but rather must skip
      # over off-channel tokens
      ( k - 1 ).times.inject( @position ) do |cursor, |
        begin
          tk = @tokens.at( cursor += 1 ) or return( cursor )
          # ^- if tk is nil (i.e. cursor is outside the array limits)
        end until tk.channel == @channel
        cursor
      end
    end
  end

  #
  # returns the index of the on-channel token at look-behind position +k+ or nil if no other
  # on-channel tokens exist before the current token
  #
  def past?( k = 1 )
    @position == -1 and fill_buffer

    case
    when k == 0 then nil
    when @position - k < 0 then nil
    else

      k.times.inject( @position ) do |cursor, |
        begin
          cursor <= 0 and return( nil )
          tk = @tokens.at( cursor -= 1 ) or return( nil )
        end until tk.channel == @channel
        cursor
      end

    end
  end

  #
  # yields each token in the stream (including off-channel tokens).
  # If no block is provided, the method returns an Enumerator object.
  # #each accepts the same arguments as #tokens
  #
  def each( *args )
    block_given? or return enum_for( :each, *args )
    tokens( *args ).each { |token| yield( token ) }
  end


  #
  # yields each token in the stream with the given channel value.
  # If no channel value is given, the stream's tuned channel value will be used.
  # If no block is given, an enumerator will be returned.
  #
  def each_on_channel( channel = @channel )
    block_given? or return enum_for( :each_on_channel, channel )
    for token in @tokens
      token.channel == channel and yield( token )
    end
  end

  #
  # iterates through the token stream, yielding each on-channel token along the way.
  # After iteration has completed, the stream's position will be restored to where
  # it was before #walk was called. While #each and #each_on_channel do not change
  # the stream's position during iteration, #walk advances through the stream. This
  # makes it possible to look ahead of and behind the current token during iteration.
  # If no block is given, an enumerator will be returned.
  #
  def walk
    block_given? or return enum_for( :walk )
    initial_position = @position
    begin
      while token = look and token.type != EOF
        consume
        yield( token )
      end
      return self
    ensure
      @position = initial_position
    end
  end

  #
  # returns a copy of the token buffer. If +start+ and +stop+ are provided, tokens
  # returns a slice of the token buffer from <tt>start..stop</tt>. The parameters
  # are converted to integers with their <tt>to_i</tt> methods, and thus tokens
  # can be provided to specify start and stop. If a block is provided, tokens are
  # yielded and filtered out of the return array if the block returns a +false+
  # or +nil+ value.
  #
  def tokens( start = nil, stop = nil )
    stop.nil? || stop >= @tokens.length and stop = @tokens.length - 1
    start.nil?
|| stop < 0 and start = 0 1047 tokens = @tokens[ start..stop ] 1048 1049 if block_given? 1050 tokens.delete_if { |t| not yield( t ) } 1051 end 1052 1053 return( tokens ) 1054 end 1055 1056 1057 def at( i ) 1058 @tokens.at i 1059 end 1060 1061 # 1062 # identical to Array#[], as applied to the stream's token buffer 1063 # 1064 def []( i, *args ) 1065 @tokens[ i, *args ] 1066 end 1067 1068 ###### Standard Conversion Methods ############################### 1069 def inspect 1070 string = "#<%p: @token_source=%p @ %p/%p" % 1071 [ self.class, @token_source.class, @position, @tokens.length ] 1072 tk = look( -1 ) and string << " #{ tk.inspect } <--" 1073 tk = look( 1 ) and string << " --> #{ tk.inspect }" 1074 string << '>' 1075 end 1076 1077 # 1078 # fetches the text content of all tokens between +start+ and +stop+ and 1079 # joins the chunks into a single string 1080 # 1081 def extract_text( start = 0, stop = @tokens.length - 1 ) 1082 start = start.to_i.at_least( 0 ) 1083 stop = stop.to_i.at_most( @tokens.length ) 1084 @tokens[ start..stop ].map! { |t| t.text }.join( '' ) 1085 end 1086 1087 alias to_s extract_text 1088 1089end 1090 1091end 1092
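The channel-skipping search behind `future?` can be illustrated with a small standalone sketch. Everything here — the `Token` struct, the `ChannelFilteredBuffer` class, and the channel numbers — is a hypothetical stand-in for illustration, not part of the ANTLR3 runtime API; it mirrors only the cursor-advancing loop, under the assumption that off-channel tokens carry a non-default channel value:

```ruby
# Hypothetical stand-in for a channel-aware token buffer (not ANTLR3 API).
Token = Struct.new( :type, :channel )

class ChannelFilteredBuffer
  def initialize( tokens, channel = 0 )
    @tokens, @channel, @position = tokens, channel, 0
  end

  # index of the on-channel token k steps ahead of @position:
  # advance the cursor one slot at a time, skipping tokens whose
  # channel differs from the tuned channel, as future? does above
  def future?( k = 1 )
    return @position if k == 1
    ( k - 1 ).times.inject( @position ) do |cursor, _|
      begin
        tk = @tokens[ cursor += 1 ] or return cursor  # ran off the end
      end until tk.channel == @channel
      cursor
    end
  end
end

# tokens at indices 1 and 3 sit on a hidden channel (99 here)
buffer = ChannelFilteredBuffer.new( [
  Token.new( :a, 0 ), Token.new( :ws, 99 ),
  Token.new( :b, 0 ), Token.new( :ws, 99 ),
  Token.new( :c, 0 )
] )

buffer.future?( 1 )  # => 0  (the current position)
buffer.future?( 2 )  # => 2  (skips the hidden token at index 1)
buffer.future?( 3 )  # => 4
```

This is why `consume` can set `@position = future?( 2 )`: the next on-channel index is computed by walking over any hidden tokens rather than by adding 1.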
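The save-and-restore pattern used by `hold` and `walk` — record the position, run the block, and let `ensure` put the cursor back even if the block raises — can be reduced to a minimal sketch. The `Cursor` class below is a hypothetical stand-in for the stream, not ANTLR3 API:

```ruby
# Hypothetical stand-in demonstrating ensure-based position restoration.
class Cursor
  attr_accessor :position

  def initialize
    @position = 0
  end

  # yield with the current position saved; restore it afterwards,
  # whether the block returns normally or raises
  def hold( pos = @position )
    begin
      yield self
    ensure
      @position = pos
    end
  end
end

cursor = Cursor.new
result = cursor.hold do |c|
  c.position = 7   # speculative look-ahead moves the cursor...
  :done
end
result           # => :done
cursor.position  # => 0  (...but ensure restores it)
```

The same shape lets `walk` consume its way through the stream during iteration, yet leave the stream where it started.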