:mod:`shlex` --- Simple lexical analysis
========================================

.. module:: shlex
   :synopsis: Simple lexical analysis for Unix shell-like languages.
.. moduleauthor:: Eric S. Raymond <esr@snark.thyrsus.com>
.. moduleauthor:: Gustavo Niemeyer <niemeyer@conectiva.com>
.. sectionauthor:: Eric S. Raymond <esr@snark.thyrsus.com>
.. sectionauthor:: Gustavo Niemeyer <niemeyer@conectiva.com>


.. versionadded:: 1.5.2

**Source code:** :source:`Lib/shlex.py`

--------------


The :class:`~shlex.shlex` class makes it easy to write lexical analyzers for
simple syntaxes resembling that of the Unix shell.  This will often be useful
for writing minilanguages (for example, in run control files for Python
applications) or for parsing quoted strings.

Prior to Python 2.7.3, this module did not support Unicode input.

The :mod:`shlex` module defines the following functions:


.. function:: split(s[, comments[, posix]])

   Split the string *s* using shell-like syntax. If *comments* is :const:`False`
   (the default), the parsing of comments in the given string will be disabled
   (setting the :attr:`~shlex.commenters` attribute of the
   :class:`~shlex.shlex` instance to the empty string).  This function operates
   in POSIX mode by default, but uses non-POSIX mode if the *posix* argument is
   false.

   .. versionadded:: 2.3

   .. versionchanged:: 2.6
      Added the *posix* parameter.

   .. note::

      Since the :func:`split` function instantiates a :class:`~shlex.shlex`
      instance, passing ``None`` for *s* will read the string to split from
      standard input.

The :mod:`shlex` module defines the following class:


.. class:: shlex([instream[, infile[, posix]]])

   A :class:`~shlex.shlex` instance or subclass instance is a lexical analyzer
   object.  The initialization argument, if present, specifies where to read
   characters from. It must be a file-/stream-like object with
   :meth:`~io.TextIOBase.read` and :meth:`~io.TextIOBase.readline` methods, or
   a string (strings are accepted since Python 2.3).  If no argument is given,
   input will be taken from ``sys.stdin``.  The second optional argument is a
   filename string, which sets the initial value of the :attr:`~shlex.infile`
   attribute.  If the *instream* argument is omitted or equal to ``sys.stdin``,
   this second argument defaults to "stdin".  The *posix* argument was
   introduced in Python 2.3, and defines the operational mode.  When *posix* is
   not true (default), the :class:`~shlex.shlex` instance will operate in
   compatibility mode.  When operating in POSIX mode, :class:`~shlex.shlex`
   will try to be as close as possible to the POSIX shell parsing rules.


.. seealso::

   Module :mod:`ConfigParser`
      Parser for configuration files similar to the Windows :file:`.ini` files.


.. _shlex-objects:

shlex Objects
-------------

A :class:`~shlex.shlex` instance has the following methods:


.. method:: shlex.get_token()

   Return a token.  If tokens have been stacked using :meth:`push_token`, pop a
   token off the stack.  Otherwise, read one from the input stream.  If reading
   encounters an immediate end-of-file, :attr:`eof` is returned (the empty
   string (``''``) in non-POSIX mode, and ``None`` in POSIX mode).


.. method:: shlex.push_token(str)

   Push the argument onto the token stack.


.. method:: shlex.read_token()

   Read a raw token.  Ignore the pushback stack, and do not interpret source
   requests.  (This is not ordinarily a useful entry point, and is documented here
   only for the sake of completeness.)


.. method:: shlex.sourcehook(filename)

   When :class:`~shlex.shlex` detects a source request (see :attr:`source`
   below) this method is given the following token as argument, and expected
   to return a tuple consisting of a filename and an open file-like object.

   Normally, this method first strips any quotes off the argument.  If the result
   is an absolute pathname, or there was no previous source request in effect, or
   the previous source was a stream (such as ``sys.stdin``), the result is left
   alone.  Otherwise, if the result is a relative pathname, the directory part of
   the name of the file immediately before it on the source inclusion stack is
   prepended (this behavior is like the way the C preprocessor handles ``#include
   "file.h"``).

   The result of the manipulations is treated as a filename, and returned as the
   first component of the tuple, with :func:`open` called on it to yield the second
   component. (Note: this is the reverse of the order of arguments in instance
   initialization!)

   This hook is exposed so that you can use it to implement directory search paths,
   addition of file extensions, and other namespace hacks. There is no
   corresponding 'close' hook, but a shlex instance will call the
   :meth:`~io.IOBase.close` method of the sourced input stream when it returns
   EOF.

   For more explicit control of source stacking, use the :meth:`push_source` and
   :meth:`pop_source` methods.


.. method:: shlex.push_source(stream[, filename])

   Push an input source stream onto the input stack.  If the filename argument is
   specified it will later be available for use in error messages.  This is the
   same method used internally by the :meth:`sourcehook` method.

   .. versionadded:: 2.1


.. method:: shlex.pop_source()

   Pop the last-pushed input source from the input stack. This is the same method
   used internally when the lexer reaches EOF on a stacked input stream.

   .. versionadded:: 2.1


.. method:: shlex.error_leader([file[, line]])

   This method generates an error message leader in the format of a Unix C compiler
   error label; the format is ``'"%s", line %d: '``, where the ``%s`` is replaced
   with the name of the current source file and the ``%d`` with the current input
   line number (the optional arguments can be used to override these).

   This convenience is provided to encourage :mod:`shlex` users to generate error
   messages in the standard, parseable format understood by Emacs and other Unix
   tools.

Instances of :class:`~shlex.shlex` subclasses have some public instance
variables which either control lexical analysis or can be used for debugging:


.. attribute:: shlex.commenters

   The string of characters that are recognized as comment beginners. All
   characters from the comment beginner to end of line are ignored. Includes just
   ``'#'`` by default.


.. attribute:: shlex.wordchars

   The string of characters that will accumulate into multi-character tokens.  By
   default, includes all ASCII alphanumerics and underscore.


.. attribute:: shlex.whitespace

   Characters that will be considered whitespace and skipped.  Whitespace bounds
   tokens.  By default, includes space, tab, linefeed and carriage-return.


.. attribute:: shlex.escape

   Characters that will be considered as escape. This will be only used in POSIX
   mode, and includes just ``'\'`` by default.

   .. versionadded:: 2.3


.. attribute:: shlex.quotes

   Characters that will be considered string quotes.  The token accumulates until
   the same quote is encountered again (thus, different quote types protect each
   other as in the shell).  By default, includes ASCII single and double quotes.


.. attribute:: shlex.escapedquotes

   Characters in :attr:`quotes` that will interpret escape characters defined in
   :attr:`escape`.  This is only used in POSIX mode, and includes just ``'"'`` by
   default.

   .. versionadded:: 2.3


.. attribute:: shlex.whitespace_split

   If ``True``, tokens will only be split on whitespace. This is useful, for
   example, for parsing command lines with :class:`~shlex.shlex`, getting
   tokens in a similar way to shell arguments.

   .. versionadded:: 2.3


.. attribute:: shlex.infile

   The name of the current input file, as initially set at class instantiation time
   or stacked by later source requests.  It may be useful to examine this when
   constructing error messages.


.. attribute:: shlex.instream

   The input stream from which this :class:`~shlex.shlex` instance is reading
   characters.


.. attribute:: shlex.source

   This attribute is ``None`` by default.  If you assign a string to it, that
   string will be recognized as a lexical-level inclusion request similar to the
   ``source`` keyword in various shells.  That is, the immediately following token
   will be opened as a filename and input will be taken from that stream until
   EOF, at which point the :meth:`~io.IOBase.close` method of that stream will be
   called and the input source will again become the original input stream.
   Source requests may be stacked any number of levels deep.


.. attribute:: shlex.debug

   If this attribute is numeric and ``1`` or more, a :class:`~shlex.shlex`
   instance will print verbose progress output on its behavior.  If you need
   to use this, you can read the module source code to learn the details.


.. attribute:: shlex.lineno

   Source line number (count of newlines seen so far plus one).


.. attribute:: shlex.token

   The token buffer.  It may be useful to examine this when catching exceptions.


.. attribute:: shlex.eof

   Token used to determine end of file. This will be set to the empty string
   (``''``) in non-POSIX mode, and to ``None`` in POSIX mode.

   .. versionadded:: 2.3


.. _shlex-parsing-rules:

Parsing Rules
-------------

When operating in non-POSIX mode, :class:`~shlex.shlex` will try to obey the
following rules.

* Quote characters are not recognized within words (``Do"Not"Separate`` is
  parsed as the single word ``Do"Not"Separate``);

* Escape characters are not recognized;

* Enclosing characters in quotes preserve the literal value of all characters
  within the quotes;

* Closing quotes separate words (``"Do"Separate`` is parsed as ``"Do"`` and
  ``Separate``);

* If :attr:`~shlex.whitespace_split` is ``False``, any character not
  declared to be a word character, whitespace, or a quote will be returned as
  a single-character token. If it is ``True``, :class:`~shlex.shlex` will only
  split words on whitespace;

* EOF is signaled with an empty string (``''``);

* It's not possible to parse empty strings, even if quoted.

When operating in POSIX mode, :class:`~shlex.shlex` will try to obey the
following parsing rules.

* Quotes are stripped out, and do not separate words (``"Do"Not"Separate"`` is
  parsed as the single word ``DoNotSeparate``);

* Non-quoted escape characters (e.g. ``'\'``) preserve the literal value of the
  next character that follows;

* Enclosing characters in quotes which are not part of
  :attr:`~shlex.escapedquotes` (e.g. ``"'"``) preserve the literal value
  of all characters within the quotes;

* Enclosing characters in quotes which are part of
  :attr:`~shlex.escapedquotes` (e.g. ``'"'``) preserve the literal value
  of all characters within the quotes, with the exception of the characters
  mentioned in :attr:`~shlex.escape`.  The escape characters retain their
  special meaning only when followed by the quote in use, or the escape
  character itself. Otherwise the escape character will be considered a
  normal character;

* EOF is signaled with a :const:`None` value;

* Quoted empty strings (``''``) are allowed.
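As a rough illustration of the two rule sets (a sketch, not part of the module's documented examples), the snippet below runs :func:`split` on the sample strings used in the rules above. The variable names are arbitrary and chosen for this example only. Note that :func:`split` sets :attr:`~shlex.whitespace_split` to ``True``, so in both modes tokens are split on whitespace only.

```python
import shlex

# Non-POSIX mode: quotes inside a word are kept as ordinary characters.
nonposix_inner = shlex.split('Do"Not"Separate', posix=False)
# -> ['Do"Not"Separate']

# Non-POSIX mode: a closing quote ends the word, and the quotes are kept.
nonposix_closing = shlex.split('"Do"Separate', posix=False)
# -> ['"Do"', 'Separate']

# Non-POSIX mode: a quoted empty string yields the quote characters
# themselves, not an empty token.
nonposix_empty = shlex.split("a '' b", posix=False)
# -> ['a', "''", 'b']

# POSIX mode (the default for split): quotes are stripped and do not
# separate words.
posix_inner = shlex.split('"Do"Not"Separate"')
# -> ['DoNotSeparate']

# POSIX mode: an unquoted escape character preserves the next character,
# here a space that would otherwise end the token.
posix_escape = shlex.split(r'one\ word')
# -> ['one word']

# POSIX mode: inside double quotes the escape character is special only
# before the quote in use or another escape character; before "n" it is
# kept as a normal character.
posix_dquote = shlex.split(r'"say \"hi\" \n"')
# -> ['say "hi" \\n']

# POSIX mode: quoted empty strings are returned as empty tokens.
posix_empty = shlex.split("a '' b")
# -> ['a', '', 'b']

for result in (nonposix_inner, nonposix_closing, nonposix_empty,
               posix_inner, posix_escape, posix_dquote, posix_empty):
    print(result)
```

To observe the end-of-file markers described in both lists, call :meth:`get_token` on a :class:`~shlex.shlex` instance directly, since :func:`split` consumes them internally.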