#!/usr/bin/ruby # encoding: utf-8 =begin LICENSE [The "BSD licence"] Copyright (c) 2009-2010 Kyle Yetter All rights reserved. Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met: 1. Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer. 2. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution. 3. The name of the author may not be used to endorse or promote products derived from this software without specific prior written permission. THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. =end module ANTLR3 =begin rdoc ANTLR3::Stream = ANTLR3 Streams This documentation first covers the general concept of streams as used by ANTLR recognizers, and then discusses the specific ANTLR3::Stream module. == ANTLR Stream Classes ANTLR recognizers need a way to walk through input data in a serialized IO-style fashion. They also need some book-keeping about the input to provide useful information to developers, such as current line number and column. 
Furthermore, to implement backtracking and various error recovery techniques, recognizers need a way to record various locations in the input at a number of points in the recognition process so the input state may be restored to a prior state.

ANTLR bundles all of this functionality into a number of Stream classes, each designed to be used by recognizers for a specific recognition task. Most of the Stream hierarchy is implemented in antlr3/stream.rb, which is loaded by default when 'antlr3' is required.

---

Here's a brief overview of the various stream classes and their respective purposes:

StringStream::
  Similar to StringIO from the standard Ruby library, StringStream wraps raw
  String data in a Stream interface for use by ANTLR lexers.
FileStream::
  A subclass of StringStream, FileStream simply wraps data read from an IO or
  File object for use by lexers.
CommonTokenStream::
  The job of a TokenStream is to read lexer output and then provide ANTLR
  parsers with the means to walk sequentially through a series of tokens.
  CommonTokenStream is the default TokenStream implementation.
TokenRewriteStream::
  A subclass of CommonTokenStream, TokenRewriteStreams provide
  rewriting-parsers the ability to produce new output text from an input
  token-sequence by managing rewrite "programs" on top of the stream.
CommonTreeNodeStream::
  In a similar fashion to CommonTokenStream, CommonTreeNodeStream feeds tree
  nodes to recognizers in a sequential fashion. The stream object serializes
  an Abstract Syntax Tree into a flat, one-dimensional sequence, but preserves
  the two-dimensional shape of the tree using special UP and DOWN tokens. The
  sequence is primarily used by ANTLR Tree Parsers. *note* -- this is not
  defined in antlr3/stream.rb, but antlr3/tree.rb

---

The next few sections cover the most significant methods of all stream classes.

=== consume / look / peek

stream.consume is used to advance a stream one unit.
StringStreams are advanced by one character and TokenStreams are advanced by one token.

stream.peek( k = 1 ) is used to quickly retrieve the object of interest to a recognizer at the look-ahead position specified by k. For StringStreams, this is the integer value of the character k characters ahead of the stream cursor. For TokenStreams, this is the integer token type of the token k tokens ahead of the stream cursor.

stream.look( k = 1 ) is used to retrieve the full object of interest at the look-ahead position specified by k. While peek provides the bare-minimum lightweight information that the recognizer needs, look provides the full object of concern in the stream. For StringStreams, this is a string object containing the single character k characters ahead of the stream cursor. For TokenStreams, this is the full token structure k tokens ahead of the stream cursor.

Note: in most ANTLR runtime APIs for other languages, peek is implemented by some method with a name like LA(k), and look is implemented by some method with a name like LT(k). When writing this Ruby runtime API, I found this naming practice confusing, ambiguous, and un-Ruby-like. Thus, I chose peek and look to represent a quick look (peek) and a full-fledged look-ahead operation (look). If this causes confusion or any sort of compatibility strife for developers using this implementation, all apologies.

=== mark / rewind / release

marker = stream.mark causes the stream to record important information about the current stream state, place the data in an internal memory table, and return a memento, marker. The marker object is typically an integer key to the stream's internal memory table. Used in tandem with stream.rewind( marker = last_marker ), the marker can be used to restore the stream to an earlier state. This is used by recognizers to perform tasks such as backtracking and error recovery.

stream.release( marker = last_marker ) can be used to release an existing state marker from the memory table.
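The consume / peek / look and mark / rewind contracts described above can be sketched with a toy character stream. This is a simplified stand-in written for illustration only -- ToyCharStream is a made-up class, not the library's StringStream (which also tracks line and column state):

```ruby
# A minimal stand-in that mimics the Stream contract described above.
class ToyCharStream
  def initialize( data )
    @data, @position, @markers = data, 0, []
  end

  # advance the stream by one character
  def consume
    @position += 1 if @position < @data.length
  end

  # integer character value at look-ahead position k (k = 1 is the current character)
  def peek( k = 1 )
    c = @data[ @position + k - 1 ] and c.ord
  end

  # same look-ahead position, but returned as a String
  def look( k = 1 )
    @data[ @position + k - 1 ]
  end

  # record the current position; returns an integer marker (memento)
  def mark
    @markers << @position
    @markers.length - 1
  end

  # restore the position previously saved under +marker+
  def rewind( marker = @markers.length - 1 )
    @position = @markers[ marker ]
  end
end

stream = ToyCharStream.new( "ab" )
stream.peek          # => 97  ("a".ord)
stream.look          # => "a"
m = stream.mark
stream.consume
stream.look          # => "b"
stream.rewind( m )   # backtrack to the marked state
stream.look          # => "a"
```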
=== seek

stream.seek( position ) moves the stream cursor to an absolute position within the stream, much like typical Ruby IO#seek-style methods. However, unlike IO#seek, ANTLR streams currently always use absolute-position seeking.

== The Stream Module

ANTLR3::Stream is an abstract-ish base mixin for all IO-like stream classes used by ANTLR recognizers. The module doesn't do much on its own besides define arguably annoying ``abstract'' pseudo-methods that demand implementation when it is mixed in to a class that wants to be a Stream. Right now this exists as an artifact of porting the ANTLR Java/Python runtime library to Ruby. In Java, of course, this is represented as an interface. In Ruby, however, objects are duck-typed and interfaces aren't that useful as programmatic entities -- in fact, it's mildly wasteful to have a module like this hanging out. Thus, I may axe it.

When mixed in, it does give the class #size and #source_name attribute methods.

Except in a small handful of places, most of the ANTLR runtime library uses duck-typing and not type checking on objects. This means that the methods which manipulate stream objects don't usually bother checking that the object is a Stream; they simply assume that the object implements the proper stream interface. Thus, it is not strictly necessary that custom stream objects include ANTLR3::Stream, though it isn't a bad idea.
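In other words, absolute-position seeking amounts to clamping the requested index into the stream's range and moving the cursor there. A minimal sketch of that clamping behavior (illustration only -- the real stream classes also adjust their line and column bookkeeping when seeking forward):

```ruby
# A rough sketch of absolute-position seeking over a plain string.
data     = "35 * 4 - 1"
position = 0

# clamp the requested index into 0..data.length, then jump there
seek = lambda do |index|
  position = [ [ index, 0 ].max, data.length ].min
end

seek.call( 5 )
data[ position ]   # => "4"
seek.call( 100 )   # past the end: clamped to data.length
position           # => 10
```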
=end

module Stream
  include ANTLR3::Constants
  extend ClassMacros

  ##
  # :method: consume
  # used to advance a stream one unit (such as a character or token)
  abstract :consume

  ##
  # :method: peek( k = 1 )
  # used to quickly retrieve the object of interest to a recognizer at the
  # lookahead position specified by k (such as the integer value of a
  # character or an integer token type)
  abstract :peek

  ##
  # :method: look( k = 1 )
  # used to retrieve the full object of interest at the lookahead position
  # specified by k (such as a character string or a token structure)
  abstract :look

  ##
  # :method: mark
  # saves the current position for the purposes of backtracking and
  # returns a value to pass to #rewind at a later time
  abstract :mark

  ##
  # :method: index
  # returns the current position of the stream
  abstract :index

  ##
  # :method: rewind( marker = last_marker )
  # restores the stream position using the state information previously saved
  # by the given marker
  abstract :rewind

  ##
  # :method: release( marker = last_marker )
  # clears the saved state information associated with the given marker value
  abstract :release

  ##
  # :method: seek( position )
  # moves the stream to the absolute index given by +position+
  abstract :seek

  ##
  # the total number of symbols in the stream
  attr_reader :size

  ##
  # an identifying name for the stream -- usually the file path of the input
  attr_accessor :source_name
end

=begin rdoc ANTLR3::CharacterStream

CharacterStream further extends the abstract-ish base mixin Stream to add methods specific to navigating character-based input data. Thus, it serves as an imitation of the Java interface for text-based streams, which are primarily used by lexers. It adds the ``abstract'' method substring( start, stop ), which must be implemented to return a slice of the input string from position start to position stop. It also adds attribute accessor methods line and column, which are expected to indicate the current line number and position within the current line, respectively.
== A Word About line and column Attributes

Presumably, the concepts of line and column attributes of text are familiar to most developers. Line numbers of text are indexed from 1 up (not 0). Column numbers are indexed from 0 up. Thus, examining the sample text:

  Hey this is the first line.
  Oh, and this is the second line.

Line 1 is the string "Hey this is the first line.\\n". If a character stream is at line 2, character 0, the stream cursor is sitting between the characters "\\n" and "O".

*Note:* most ANTLR runtime APIs for other languages refer to column with the more precise, but lengthy, name charPositionInLine. I preferred to keep it simple and familiar in this Ruby runtime API.

=end

module CharacterStream
  include Stream
  extend ClassMacros
  include Constants

  ##
  # :method: substring( start, stop )
  abstract :substring

  attr_accessor :line
  attr_accessor :column
end

=begin rdoc ANTLR3::TokenStream

TokenStream further extends the abstract-ish base mixin Stream to add methods specific to navigating token sequences. Thus, it serves as an imitation of the Java interface for token-based streams, which are used by many different components in ANTLR, including parsers and tree parsers.

== Token Streams

Token streams wrap a sequence of token objects produced by some token source, usually a lexer. They provide the operations required by higher-level recognizers, such as parsers and tree parsers, for navigating through the sequence of tokens. Unlike simple character-based streams, such as StringStream, token-based streams have an additional level of complexity because they must manage the task of "tuning" to a specific token channel.

One of the main advantages of ANTLR-based recognition is the token channel feature, which allows you to hold on to all tokens of interest while only presenting a specific set of interesting tokens to a parser.
For example, if you need to hide whitespace and comments from a parser, but hang on to them for some other purpose, you have the lexer assign the comments and whitespace to the channel value HIDDEN as it creates the tokens. When you create a token stream, you can tune it to some specific channel value. Then, all peek, look, and consume operations only yield tokens that have the same value for channel. The stream skips over any non-matching tokens in between.

== The TokenStream Interface

In addition to the abstract methods and attribute methods provided by the base Stream module, TokenStream adds a number of additional method implementation requirements and attributes.

=end

module TokenStream
  include Stream
  extend ClassMacros

  ##
  # expected to return the token source object (such as a lexer) from which
  # all tokens in the stream were retrieved
  attr_reader :token_source

  ##
  # expected to return the value of the last marker produced by a call to
  # stream.mark
  attr_reader :last_marker

  ##
  # expected to return the integer index of the stream cursor
  attr_reader :position

  ##
  # the integer channel value to which the stream is ``tuned''
  attr_accessor :channel

  ##
  # :method: to_s( start = 0, stop = tokens.length - 1 )
  # should take the tokens between start and stop in the sequence, extract
  # their text, and return the concatenation of all the text chunks
  abstract :to_s

  ##
  # :method: at( i )
  # return the stream symbol at index +i+
  abstract :at
end

=begin rdoc ANTLR3::StringStream

A StringStream's purpose is to wrap the basic, naked text input of a recognition system. Like all other stream types, it provides serial navigation of the input; a recognizer can arbitrarily step forward and backward through the stream's symbols as it requires. StringStream and its subclasses are the main way to feed text input into an ANTLR Lexer for token processing.

The stream's symbols of interest, of course, are character values.
Thus, the #peek method returns the integer character value at look-ahead position k and the #look method returns the character value as a +String+. They also track various pieces of information such as the line and column numbers at the current position. === Note About Text Encoding This version of the runtime library primarily targets ruby version 1.8, which does not have strong built-in support for multi-byte character encodings. Thus, characters are assumed to be represented by a single byte -- an integer between 0 and 255. Ruby 1.9 does provide built-in encoding support for multi-byte characters, but currently this library does not provide any streams to handle non-ASCII encoding. However, encoding-savvy recognition code is a future development goal for this project. =end class StringStream NEWLINE = ?\n.ord include CharacterStream # current integer character index of the stream attr_reader :position # the current line number of the input, indexed upward from 1 attr_reader :line # the current character position within the current line, indexed upward from 0 attr_reader :column # the name associated with the stream -- usually a file name # defaults to "(string)" attr_accessor :name # the entire string that is wrapped by the stream attr_reader :data attr_reader :string if RUBY_VERSION =~ /^1\.9/ # creates a new StringStream object where +data+ is the string data to stream. 
# accepts the following options in a symbol-to-value hash: # # [:file or :name] the (file) name to associate with the stream; default: '(string)' # [:line] the initial line number; default: +1+ # [:column] the initial column number; default: +0+ # def initialize( data, options = {} ) # for 1.9 @string = data.to_s.encode( Encoding::UTF_8 ).freeze @data = @string.codepoints.to_a.freeze @position = options.fetch :position, 0 @line = options.fetch :line, 1 @column = options.fetch :column, 0 @markers = [] @name ||= options[ :file ] || options[ :name ] # || '(string)' mark end # # identical to #peek, except it returns the character value as a String # def look( k = 1 ) # for 1.9 k == 0 and return nil k += 1 if k < 0 index = @position + k - 1 index < 0 and return nil @string[ index ] end else # creates a new StringStream object where +data+ is the string data to stream. # accepts the following options in a symbol-to-value hash: # # [:file or :name] the (file) name to associate with the stream; default: '(string)' # [:line] the initial line number; default: +1+ # [:column] the initial column number; default: +0+ # def initialize( data, options = {} ) # for 1.8 @data = data.to_s @data.equal?( data ) and @data = @data.clone @data.freeze @string = @data @position = options.fetch :position, 0 @line = options.fetch :line, 1 @column = options.fetch :column, 0 @markers = [] @name ||= options[ :file ] || options[ :name ] # || '(string)' mark end # # identical to #peek, except it returns the character value as a String # def look( k = 1 ) # for 1.8 k == 0 and return nil k += 1 if k < 0 index = @position + k - 1 index < 0 and return nil c = @data[ index ] and c.chr end end def size @data.length end alias length size # # rewinds the stream back to the start and clears out any existing marker entries # def reset initial_location = @markers.first @position, @line, @column = initial_location @markers.clear @markers << initial_location return self end # # advance the stream by one 
character; returns the character consumed
  #
  def consume
    c = @data[ @position ] || EOF
    if @position < @data.length
      @column += 1
      if c == NEWLINE
        @line += 1
        @column = 0
      end
      @position += 1
    end
    return( c )
  end

  #
  # return the character at look-ahead distance +k+ as an integer. k = 1 represents
  # the current character. +k+ greater than 1 represents upcoming characters. A negative
  # value of +k+ returns previous characters consumed, where k = -1 is the last
  # character consumed. k = 0 has undefined behavior and returns +nil+
  #
  def peek( k = 1 )
    k == 0 and return nil
    k += 1 if k < 0
    index = @position + k - 1
    index < 0 and return nil
    @data[ index ] or EOF
  end

  #
  # return a substring around the stream cursor at a distance +k+
  # if k >= 0, return the next k characters
  # if k < 0, return the previous |k| characters
  #
  def through( k )
    if k >= 0 then @string[ @position, k ] else
      start = ( @position + k ).at_least( 0 )  # start cannot be negative or index will wrap around
      @string[ start ... @position ]
    end
  end

  # operator-style look-ahead
  alias >> look

  # operator-style look-behind
  def <<( k )
    self >> -k
  end

  alias index position
  alias character_index position
  alias source_name name

  #
  # Returns true if the stream appears to be at the beginning of a new line.
  # This is an extra utility method for use inside lexer actions if needed.
  #
  def beginning_of_line?
    @position.zero? or @data[ @position - 1 ] == NEWLINE
  end

  #
  # Returns true if the stream appears to be at the end of a line.
  # This is an extra utility method for use inside lexer actions if needed.
  #
  def end_of_line?
    @data[ @position ] == NEWLINE  #if @position < @data.length
  end

  #
  # Returns true if the stream has been exhausted.
  # This is an extra utility method for use inside lexer actions if needed.
  #
  def end_of_string?
    @position >= @data.length
  end

  #
  # Returns true if the stream appears to be at the beginning of the stream (position = 0).
  # This is an extra utility method for use inside lexer actions if needed.
# def beginning_of_string? @position == 0 end alias eof? end_of_string? alias bof? beginning_of_string? # # record the current stream location parameters in the stream's marker table and # return an integer-valued bookmark that may be used to restore the stream's # position with the #rewind method. This method is used to implement backtracking. # def mark state = [ @position, @line, @column ].freeze @markers << state return @markers.length - 1 end # # restore the stream to an earlier location recorded by #mark. If no marker value is # provided, the last marker generated by #mark will be used. # def rewind( marker = @markers.length - 1, release = true ) ( marker >= 0 and location = @markers[ marker ] ) or return( self ) @position, @line, @column = location release( marker ) if release return self end # # the total number of markers currently in existence # def mark_depth @markers.length end # # the last marker value created by a call to #mark # def last_marker @markers.length - 1 end # # let go of the bookmark data for the marker and all marker # values created after the marker. # def release( marker = @markers.length - 1 ) marker.between?( 1, @markers.length - 1 ) or return @markers.pop( @markers.length - marker ) return self end # # jump to the absolute position value given by +index+. # note: if +index+ is before the current position, the +line+ and +column+ # attributes of the stream will probably be incorrect # def seek( index ) index = index.bound( 0, @data.length ) # ensures index is within the stream's range if index > @position skipped = through( index - @position ) if lc = skipped.count( "\n" ) and lc.zero? 
@column += skipped.length
      else
        @line += lc
        @column = skipped.length - skipped.rindex( "\n" ) - 1
      end
    end
    @position = index
    return nil
  end

  #
  # customized object inspection that shows:
  # * the stream class
  # * the stream's location in index / line:column format
  # * +before_chars+ characters before the cursor (6 characters by default)
  # * +after_chars+ characters after the cursor (10 characters by default)
  #
  def inspect( before_chars = 6, after_chars = 10 )
    before = through( -before_chars ).inspect
    @position - before_chars > 0 and before.insert( 0, '... ' )
    after = through( after_chars ).inspect
    @position + after_chars + 1 < @data.length and after << ' ...'
    location = "#@position / line #@line:#@column"
    "#<#{ self.class }: #{ before } | #{ after } @ #{ location }>"
  end

  #
  # return the string slice between position +start+ and +stop+
  #
  def substring( start, stop )
    @string[ start, stop - start + 1 ]
  end

  #
  # identical to String#[]
  #
  def []( start, *args )
    @string[ start, *args ]
  end
end

=begin rdoc ANTLR3::FileStream

FileStream is a character stream that uses data stored in some external file. It is nearly identical to StringStream, except that it uses data located in a file while automatically setting up the +source_name+ and +line+ parameters. It does not actually use any buffered IO operations throughout the stream navigation process. Instead, it reads the file data once when the stream is initialized.

=end

class FileStream < StringStream

  #
  # creates a new FileStream object using the given +file+ object.
  # If +file+ is a path string, the file will be read and the contents
  # will be used and the +name+ attribute will be set to the path.
  # If +file+ is an IO-like object (that responds to :read),
  # the content of the object will be used and the stream will
  # attempt to set its +name+ attribute by first trying the method #name
  # on the object, then trying the method #path on the object.
#
  # see StringStream.new for a list of additional options
  # the constructor accepts
  #
  def initialize( file, options = {} )
    case file
    when $stdin then
      data = $stdin.read
      @name = '(stdin)'
    when ARGF
      data = file.read
      @name = file.path
    when ::File then
      file = file.clone
      file.reopen( file.path, 'r' )
      @name = file.path
      data = file.read
      file.close
    else
      if file.respond_to?( :read )
        data = file.read
        if file.respond_to?( :name ) then @name = file.name
        elsif file.respond_to?( :path ) then @name = file.path
        end
      else
        @name = file.to_s
        if test( ?f, @name ) then data = File.read( @name )
        else raise ArgumentError, "could not find an existing file at %p" % @name
        end
      end
    end
    super( data, options )
  end
end

=begin rdoc ANTLR3::CommonTokenStream

CommonTokenStream serves as the primary token stream implementation for feeding sequential token input into parsers. Using some TokenSource (such as a lexer), the stream collects a token sequence, setting each token's index attribute to indicate the token's position within the stream. The stream may be tuned to some channel value; off-channel tokens will be filtered out by the #peek, #look, and #consume methods.
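The channel-filtering behavior described above can be illustrated with a toy token list. ToyToken and the channel constants here are made-up stand-ins for illustration, not the library's CommonToken class or its ANTLR3::DEFAULT_CHANNEL / ANTLR3::HIDDEN values:

```ruby
# Made-up stand-ins for real token objects and channel constants.
ToyToken = Struct.new( :name, :text, :channel )

DEFAULT = 0    # stands in for ANTLR3::DEFAULT_CHANNEL
HIDDEN  = 99   # stands in for ANTLR3::HIDDEN

tokens = [
  ToyToken.new( :INT,  "35", DEFAULT ),
  ToyToken.new( :WS,   " ",  HIDDEN  ),   # hidden from the parser...
  ToyToken.new( :MULT, "*",  DEFAULT ),
  ToyToken.new( :WS,   " ",  HIDDEN  ),
  ToyToken.new( :INT,  "4",  DEFAULT )
]

# a stream tuned to DEFAULT only ever yields the on-channel tokens
on_channel = tokens.select { |t| t.channel == DEFAULT }
on_channel.map( &:text )   # => ["35", "*", "4"]

# ...but the off-channel whitespace is still retained in the full buffer
tokens.map( &:text )       # => ["35", " ", "*", " ", "4"]
```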
=== Sample Usage source_input = ANTLR3::StringStream.new("35 * 4 - 1") lexer = Calculator::Lexer.new(source_input) tokens = ANTLR3::CommonTokenStream.new(lexer) # assume this grammar defines whitespace as tokens on channel HIDDEN # and numbers and operations as tokens on channel DEFAULT tokens.look # => 0 INT['35'] @ line 1 col 0 (0..1) tokens.look(2) # => 2 MULT["*"] @ line 1 col 2 (3..3) tokens.tokens(0, 2) # => [0 INT["35"] @line 1 col 0 (0..1), # 1 WS[" "] @line 1 col 2 (1..1), # 2 MULT["*"] @ line 1 col 3 (3..3)] # notice the #tokens method does not filter off-channel tokens lexer.reset hidden_tokens = ANTLR3::CommonTokenStream.new(lexer, :channel => ANTLR3::HIDDEN) hidden_tokens.look # => 1 WS[' '] @ line 1 col 2 (1..1) =end class CommonTokenStream include TokenStream include Enumerable # # constructs a new token stream using the +token_source+ provided. +token_source+ is # usually a lexer, but can be any object that implements +next_token+ and includes # ANTLR3::TokenSource. # # If a block is provided, each token harvested will be yielded and if the block # returns a +nil+ or +false+ value, the token will not be added to the stream -- # it will be discarded. 
# # === Options # [:channel] The channel value the stream should be tuned to initially # [:source_name] The source name (file name) attribute of the stream # # === Example # # # create a new token stream that is tuned to channel :comment, and # # discard all WHITE_SPACE tokens # ANTLR3::CommonTokenStream.new(lexer, :channel => :comment) do |token| # token.name != 'WHITE_SPACE' # end # def initialize( token_source, options = {} ) case token_source when CommonTokenStream # this is useful in cases where you want to convert a CommonTokenStream # to a RewriteTokenStream or other variation of the standard token stream stream = token_source @token_source = stream.token_source @channel = options.fetch( :channel ) { stream.channel or DEFAULT_CHANNEL } @source_name = options.fetch( :source_name ) { stream.source_name } tokens = stream.tokens.map { | t | t.dup } else @token_source = token_source @channel = options.fetch( :channel, DEFAULT_CHANNEL ) @source_name = options.fetch( :source_name ) { @token_source.source_name rescue nil } tokens = @token_source.to_a end @last_marker = nil @tokens = block_given? ? tokens.select { | t | yield( t, self ) } : tokens @tokens.each_with_index { |t, i| t.index = i } @position = if first_token = @tokens.find { |t| t.channel == @channel } @tokens.index( first_token ) else @tokens.length end end # # resets the token stream and rebuilds it with a potentially new token source. # If no +token_source+ value is provided, the stream will attempt to reset the # current +token_source+ by calling +reset+ on the object. The stream will # then clear the token buffer and attempt to harvest new tokens. Identical in # behavior to CommonTokenStream.new, if a block is provided, tokens will be # yielded and discarded if the block returns a +false+ or +nil+ value. # def rebuild( token_source = nil ) if token_source.nil? @token_source.reset rescue nil else @token_source = token_source end @tokens = block_given? ? 
@token_source.select { |token| yield( token ) } : @token_source.to_a @tokens.each_with_index { |t, i| t.index = i } @last_marker = nil @position = if first_token = @tokens.find { |t| t.channel == @channel } @tokens.index( first_token ) else @tokens.length end return self end # # tune the stream to a new channel value # def tune_to( channel ) @channel = channel end def token_class @token_source.token_class rescue NoMethodError @position == -1 and fill_buffer @tokens.empty? ? CommonToken : @tokens.first.class end alias index position def size @tokens.length end alias length size ###### State-Control ################################################ # # rewind the stream to its initial state # def reset @position = 0 @position += 1 while token = @tokens[ @position ] and token.channel != @channel @last_marker = nil return self end # # bookmark the current position of the input stream # def mark @last_marker = @position end def release( marker = nil ) # do nothing end def rewind( marker = @last_marker, release = true ) seek( marker ) end # # saves the current stream position, yields to the block, # and then ensures the stream's position is restored before # returning the value of the block # def hold( pos = @position ) block_given? or return enum_for( :hold, pos ) begin yield ensure seek( pos ) end end ###### Stream Navigation ########################################### # # advance the stream one step to the next on-channel token # def consume token = @tokens[ @position ] || EOF_TOKEN if @position < @tokens.length @position = future?( 2 ) || @tokens.length end return( token ) end # # jump to the stream position specified by +index+ # note: seek does not check whether or not the # token at the specified position is on-channel, # def seek( index ) @position = index.to_i.bound( 0, @tokens.length ) return self end # # return the type of the on-channel token at look-ahead distance +k+. k = 1 represents # the current token. 
# +k+ greater than 1 represents upcoming on-channel tokens. A negative
  # value of +k+ returns previous on-channel tokens consumed, where k = -1 is the last
  # on-channel token consumed. k = 0 has undefined behavior and returns +nil+
  #
  def peek( k = 1 )
    tk = look( k ) and return( tk.type )
  end

  #
  # operates similarly to #peek, but returns the full token object at look-ahead position +k+
  #
  def look( k = 1 )
    index = future?( k ) or return nil
    @tokens.fetch( index, EOF_TOKEN )
  end

  alias >> look

  def << k
    self >> -k
  end

  #
  # returns the index of the on-channel token at look-ahead position +k+ or nil if no other
  # on-channel tokens exist
  #
  def future?( k = 1 )
    @position == -1 and fill_buffer

    case
    when k == 0 then nil
    when k < 0 then past?( -k )
    when k == 1 then @position
    else
      # since the stream only yields on-channel
      # tokens, the stream can't just go to the
      # next position, but rather must skip
      # over off-channel tokens
      ( k - 1 ).times.inject( @position ) do |cursor, |
        begin
          tk = @tokens.at( cursor += 1 ) or return( cursor )
          # ^- if tk is nil (i.e. cursor is outside array limits)
        end until tk.channel == @channel
        cursor
      end
    end
  end

  #
  # returns the index of the on-channel token at look-behind position +k+ or nil if no other
  # on-channel tokens exist before the current token
  #
  def past?( k = 1 )
    @position == -1 and fill_buffer

    case
    when k == 0 then nil
    when @position - k < 0 then nil
    else
      k.times.inject( @position ) do |cursor, |
        begin
          cursor <= 0 and return( nil )
          tk = @tokens.at( cursor -= 1 ) or return( nil )
        end until tk.channel == @channel
        cursor
      end
    end
  end

  #
  # yields each token in the stream (including off-channel tokens)
  # If no block is provided, the method returns an Enumerator object.
  # #each accepts the same arguments as #tokens
  #
  def each( *args )
    block_given? or return enum_for( :each, *args )
    tokens( *args ).each { |token| yield( token ) }
  end

  #
  # yields each token in the stream with the given channel value
  # If no channel value is given, the stream's tuned channel value will be used.
  # If no block is given, an enumerator will be returned.
  #
  def each_on_channel( channel = @channel )
    block_given? or return enum_for( :each_on_channel, channel )
    for token in @tokens
      token.channel == channel and yield( token )
    end
  end

  #
  # iterates through the token stream, yielding each on-channel token along the way.
  # After iteration has completed, the stream's position will be restored to where
  # it was before #walk was called. While #each or #each_on_channel does not change
  # the stream's position during iteration, #walk advances through the stream. This
  # makes it possible to look ahead and behind the current token during iteration.
  # If no block is given, an enumerator will be returned.
  #
  def walk
    block_given? or return enum_for( :walk )
    initial_position = @position
    begin
      while token = look and token.type != EOF
        consume
        yield( token )
      end
      return self
    ensure
      @position = initial_position
    end
  end

  #
  # returns a copy of the token buffer. If +start+ and +stop+ are provided, tokens
  # returns a slice of the token buffer from start..stop. The parameters
  # are converted to integers with their to_i methods, and thus tokens
  # can be provided to specify start and stop. If a block is provided, tokens are
  # yielded and filtered out of the return array if the block returns a +false+
  # or +nil+ value.
  #
  def tokens( start = nil, stop = nil )
    stop.nil?  || stop >= @tokens.length and stop = @tokens.length - 1
    start.nil? || start < 0 and start = 0
    tokens = @tokens[ start..stop ]

    if block_given?
tokens.delete_if { |t| not yield( t ) } end return( tokens ) end def at( i ) @tokens.at i end # # identical to Array#[], as applied to the stream's token buffer # def []( i, *args ) @tokens[ i, *args ] end ###### Standard Conversion Methods ############################### def inspect string = "#<%p: @token_source=%p @ %p/%p" % [ self.class, @token_source.class, @position, @tokens.length ] tk = look( -1 ) and string << " #{ tk.inspect } <--" tk = look( 1 ) and string << " --> #{ tk.inspect }" string << '>' end # # fetches the text content of all tokens between +start+ and +stop+ and # joins the chunks into a single string # def extract_text( start = 0, stop = @tokens.length - 1 ) start = start.to_i.at_least( 0 ) stop = stop.to_i.at_most( @tokens.length ) @tokens[ start..stop ].map! { |t| t.text }.join( '' ) end alias to_s extract_text end end