:mod:`shlex` --- Simple lexical analysis
========================================

.. module:: shlex
   :synopsis: Simple lexical analysis for Unix shell-like languages.
.. moduleauthor:: Eric S. Raymond <esr@snark.thyrsus.com>
.. moduleauthor:: Gustavo Niemeyer <niemeyer@conectiva.com>
.. sectionauthor:: Eric S. Raymond <esr@snark.thyrsus.com>
.. sectionauthor:: Gustavo Niemeyer <niemeyer@conectiva.com>


.. versionadded:: 1.5.2

**Source code:** :source:`Lib/shlex.py`

--------------


The :class:`~shlex.shlex` class makes it easy to write lexical analyzers for
simple syntaxes resembling that of the Unix shell.  This will often be useful
for writing minilanguages (for example, in run control files for Python
applications) or for parsing quoted strings.

Prior to Python 2.7.3, this module did not support Unicode input.

The :mod:`shlex` module defines the following functions:


.. function:: split(s[, comments[, posix]])

   Split the string *s* using shell-like syntax.  If *comments* is :const:`False`
   (the default), the parsing of comments in the given string will be disabled
   (by setting the :attr:`~shlex.commenters` attribute of the
   :class:`~shlex.shlex` instance to the empty string).  This function operates
   in POSIX mode by default, but uses non-POSIX mode if the *posix* argument is
   false.

   .. versionadded:: 2.3

   .. versionchanged:: 2.6
      Added the *posix* parameter.

   .. note::

      Since the :func:`split` function instantiates a :class:`~shlex.shlex`
      instance, passing ``None`` for *s* will read the string to split from
      standard input.
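
As a rough illustration of the *comments* and *posix* arguments, the sketch
below splits one command line three ways.  The command string is invented for
the example; the results shown in the comments follow from the behavior
described above.

```python
import shlex

# A made-up command line, used only for illustration.
command = 'cp "my file.txt" /tmp/backup # nightly'

# Defaults: POSIX mode with comment parsing disabled, so '#' is a normal token.
default_tokens = shlex.split(command)
# ['cp', 'my file.txt', '/tmp/backup', '#', 'nightly']

# comments=True: everything from '#' to the end of the line is dropped.
no_comment_tokens = shlex.split(command, comments=True)
# ['cp', 'my file.txt', '/tmp/backup']

# posix=False: the quotes remain part of the token.
compat_tokens = shlex.split(command, posix=False)
# ['cp', '"my file.txt"', '/tmp/backup', '#', 'nightly']
```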

The :mod:`shlex` module defines the following class:


.. class:: shlex([instream[, infile[, posix]]])

   A :class:`~shlex.shlex` instance or subclass instance is a lexical analyzer
   object.  The first argument, *instream*, if present, specifies where to read
   characters from.  It must be a file-/stream-like object with
   :meth:`~io.TextIOBase.read` and :meth:`~io.TextIOBase.readline` methods, or
   a string (strings are accepted since Python 2.3).  If no argument is given,
   input will be taken from ``sys.stdin``.  The second optional argument is a
   filename string, which sets the initial value of the :attr:`~shlex.infile`
   attribute.  If the *instream* argument is omitted or equal to ``sys.stdin``,
   this second argument defaults to "stdin".  The *posix* argument was
   introduced in Python 2.3, and defines the operational mode.  When *posix* is
   not true (the default), the :class:`~shlex.shlex` instance will operate in
   compatibility mode.  When operating in POSIX mode, :class:`~shlex.shlex`
   will try to be as close as possible to the POSIX shell parsing rules.
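
A minimal sketch of constructing a lexer directly from a string, once in POSIX
mode and once in the default compatibility mode.  The input text is invented
for the example, and iterating over the instance is used here simply as a
convenient way to collect tokens until end of input.

```python
import shlex

# Made-up input resembling a line from a run control file.
text = "name = 'Ada Lovelace'; retries=3"

# POSIX mode: quotes are stripped from the quoted value.
posix_tokens = list(shlex.shlex(text, posix=True))
# ['name', '=', 'Ada Lovelace', ';', 'retries', '=', '3']

# Compatibility (non-POSIX) mode: the quotes stay in the token.
compat_tokens = list(shlex.shlex(text))
# ['name', '=', "'Ada Lovelace'", ';', 'retries', '=', '3']
```

In both modes the punctuation characters ``=`` and ``;`` come back as
single-character tokens, because they are neither word characters nor
whitespace.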


.. seealso::

   Module :mod:`ConfigParser`
      Parser for configuration files similar to the Windows :file:`.ini` files.


.. _shlex-objects:

shlex Objects
-------------

A :class:`~shlex.shlex` instance has the following methods:


.. method:: shlex.get_token()

   Return a token.  If tokens have been stacked using :meth:`push_token`, pop a
   token off the stack.  Otherwise, read one from the input stream.  If reading
   encounters an immediate end-of-file, :attr:`eof` is returned (the empty
   string (``''``) in non-POSIX mode, and ``None`` in POSIX mode).


.. method:: shlex.push_token(str)

   Push the argument onto the token stack.
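
The interplay of :meth:`get_token`, :meth:`push_token` and the end-of-file
value can be sketched as follows.  The input string is arbitrary; the loop
compares each token against the lexer's own ``eof`` attribute so that it would
also work unchanged in POSIX mode.

```python
import shlex

lexer = shlex.shlex("alpha beta")   # non-POSIX mode, so eof is ''

first = lexer.get_token()           # 'alpha'
lexer.push_token(first)             # put it back on the token stack
again = lexer.get_token()           # 'alpha' again, popped from the stack

# Drain the rest of the input, stopping at the end-of-file token.
remaining = []
while True:
    token = lexer.get_token()
    if token == lexer.eof:          # '' here; None in POSIX mode
        break
    remaining.append(token)
# remaining == ['beta']
```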


.. method:: shlex.read_token()

   Read a raw token.  Ignore the pushback stack, and do not interpret source
   requests.  (This is not ordinarily a useful entry point, and is documented here
   only for the sake of completeness.)


.. method:: shlex.sourcehook(filename)

   When :class:`~shlex.shlex` detects a source request (see :attr:`source`
   below), this method is given the following token as argument, and is expected
   to return a tuple consisting of a filename and an open file-like object.

   Normally, this method first strips any quotes off the argument.  If the result
   is an absolute pathname, or there was no previous source request in effect, or
   the previous source was a stream (such as ``sys.stdin``), the result is left
   alone.  Otherwise, if the result is a relative pathname, the directory part of
   the name of the file immediately before it on the source inclusion stack is
   prepended (this behavior is like the way the C preprocessor handles ``#include
   "file.h"``).

   The result of the manipulations is treated as a filename, and returned as the
   first component of the tuple, with :func:`open` called on it to yield the second
   component.  (Note: this is the reverse of the order of arguments in instance
   initialization!)

   This hook is exposed so that you can use it to implement directory search paths,
   addition of file extensions, and other namespace hacks.  There is no
   corresponding 'close' hook, but a shlex instance will call the
   :meth:`~io.IOBase.close` method of the sourced input stream when it returns
   EOF.

   For more explicit control of source stacking, use the :meth:`push_source` and
   :meth:`pop_source` methods.
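
One way the directory search path idea could be implemented is sketched below.
The class name ``SearchPathLexer``, its ``search_path`` argument, the
``include`` keyword and the file name ``common.rc`` are all hypothetical,
chosen only for this illustration; the sketch overrides :meth:`sourcehook` and
falls back to the default behavior when no directory matches.

```python
import os
import shlex
import shutil
import tempfile


class SearchPathLexer(shlex.shlex):
    """Hypothetical lexer that looks up sourced files in a list of directories."""

    def __init__(self, instream, search_path):
        shlex.shlex.__init__(self, instream)
        self.search_path = search_path
        self.source = "include"   # treat 'include "name"' as a source request

    def sourcehook(self, newfile):
        # Non-POSIX mode keeps the quotes on the token, so strip them first.
        name = newfile.strip("\"'")
        for directory in self.search_path:
            candidate = os.path.join(directory, name)
            if os.path.isfile(candidate):
                # Return (filename, open file object), like the base method.
                return candidate, open(candidate)
        # Nothing matched: fall back to the default lookup.
        return shlex.shlex.sourcehook(self, newfile)


# Demonstration using a throwaway directory and file.
config_dir = tempfile.mkdtemp()
try:
    with open(os.path.join(config_dir, "common.rc"), "w") as handle:
        handle.write("verbose on\n")
    lexer = SearchPathLexer('include "common.rc"\ncolor off', [config_dir])
    tokens = list(lexer)   # ['verbose', 'on', 'color', 'off']
finally:
    shutil.rmtree(config_dir)
```

The file name is quoted in the input because, with the default
:attr:`wordchars`, an unquoted ``common.rc`` would be split at the dot.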


.. method:: shlex.push_source(stream[, filename])

   Push an input source stream onto the input stack.  If the filename argument is
   specified it will later be available for use in error messages.  This is the
   same method used internally by the :meth:`sourcehook` method.

   .. versionadded:: 2.1


.. method:: shlex.pop_source()

   Pop the last-pushed input source from the input stack.  This is the same method
   used internally when the lexer reaches EOF on a stacked input stream.

   .. versionadded:: 2.1
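
A small sketch of explicit source stacking with :meth:`push_source`.  The
input strings and the ``"<inner>"`` label are made up for the example.  Here
the stacked stream is popped implicitly when it runs out of input, which is
the situation in which the lexer calls :meth:`pop_source` itself.

```python
import io
import shlex

lexer = shlex.shlex("outer1 outer2")
first = lexer.get_token()                       # 'outer1'

# Stack a second stream; "<inner>" is an arbitrary name for error messages.
lexer.push_source(io.StringIO(u"inner1 inner2"), "<inner>")
name_while_stacked = lexer.infile               # '<inner>'
stacked = [lexer.get_token(), lexer.get_token()]
# ['inner1', 'inner2']

# At EOF on the stacked stream, the lexer pops it and resumes the outer input.
resumed = lexer.get_token()                     # 'outer2'
```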


.. method:: shlex.error_leader([file[, line]])

   This method generates an error message leader in the format of a Unix C compiler
   error label; the format is ``'"%s", line %d: '``, where the ``%s`` is replaced
   with the name of the current source file and the ``%d`` with the current input
   line number (the optional arguments can be used to override these).

   This convenience is provided to encourage :mod:`shlex` users to generate error
   messages in the standard, parseable format understood by Emacs and other Unix
   tools.
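
For instance, the leader might be used like this.  The file names and the
error text are invented for the illustration.

```python
import shlex

# "app.rc" is a made-up file name passed as the infile argument.
lexer = shlex.shlex("set colour blue", infile="app.rc")

# With no arguments, the leader uses the lexer's current infile and lineno.
leader = lexer.error_leader()                # '"app.rc", line 1: '

# Both parts can be overridden explicitly.
custom = lexer.error_leader("other.rc", 7)   # '"other.rc", line 7: '

message = leader + "unknown option"
```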

Instances of :class:`~shlex.shlex` subclasses have some public instance
variables which either control lexical analysis or can be used for debugging:


.. attribute:: shlex.commenters

   The string of characters that are recognized as comment beginners.  All
   characters from the comment beginner to end of line are ignored.  Includes just
   ``'#'`` by default.


.. attribute:: shlex.wordchars

   The string of characters that will accumulate into multi-character tokens.  By
   default, includes all ASCII alphanumerics and underscore.


.. attribute:: shlex.whitespace

   Characters that will be considered whitespace and skipped.  Whitespace bounds
   tokens.  By default, includes space, tab, linefeed and carriage-return.


.. attribute:: shlex.escape

   Characters that will be considered as escape.  This will be only used in POSIX
   mode, and includes just ``'\'`` by default.

   .. versionadded:: 2.3


.. attribute:: shlex.quotes

   Characters that will be considered string quotes.  The token accumulates until
   the same quote is encountered again (thus, different quote types protect each
   other as in the shell).  By default, includes ASCII single and double quotes.


.. attribute:: shlex.escapedquotes

   Characters in :attr:`quotes` that will interpret escape characters defined in
   :attr:`escape`.  This is only used in POSIX mode, and includes just ``'"'`` by
   default.

   .. versionadded:: 2.3


.. attribute:: shlex.whitespace_split

   If ``True``, tokens will only be split on whitespace.  This is useful, for
   example, for parsing command lines with :class:`~shlex.shlex`, getting
   tokens in a similar way to shell arguments.

   .. versionadded:: 2.3
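
The effect of :attr:`wordchars` and :attr:`whitespace_split` can be seen by
tokenizing the same text with different settings.  The option string below is
invented for the example; all three lexers run in the default non-POSIX mode.

```python
import shlex

option = "--output=report.txt"   # made-up input

# Defaults: '-', '=' and '.' are not word characters, so each one becomes
# a single-character token.
default_tokens = list(shlex.shlex(option))
# ['-', '-', 'output', '=', 'report', '.', 'txt']

# Extending wordchars lets '-' and '.' accumulate into longer tokens.
extended = shlex.shlex(option)
extended.wordchars += "-."
extended_tokens = list(extended)
# ['--output', '=', 'report.txt']

# whitespace_split=True splits on whitespace only.
whole = shlex.shlex(option)
whole.whitespace_split = True
whole_tokens = list(whole)
# ['--output=report.txt']
```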


.. attribute:: shlex.infile

   The name of the current input file, as initially set at class instantiation time
   or stacked by later source requests.  It may be useful to examine this when
   constructing error messages.


.. attribute:: shlex.instream

   The input stream from which this :class:`~shlex.shlex` instance is reading
   characters.


.. attribute:: shlex.source

   This attribute is ``None`` by default.  If you assign a string to it, that
   string will be recognized as a lexical-level inclusion request similar to the
   ``source`` keyword in various shells.  That is, the immediately following token
   will be opened as a filename and input will be taken from that stream until
   EOF, at which point the :meth:`~io.IOBase.close` method of that stream will be
   called and the input source will again become the original input stream.
   Source requests may be stacked any number of levels deep.


.. attribute:: shlex.debug

   If this attribute is numeric and ``1`` or more, a :class:`~shlex.shlex`
   instance will print verbose progress output on its behavior.  If you need
   to use this, you can read the module source code to learn the details.


.. attribute:: shlex.lineno

   Source line number (count of newlines seen so far plus one).


.. attribute:: shlex.token

   The token buffer.  It may be useful to examine this when catching exceptions.


.. attribute:: shlex.eof

   Token used to determine end of file.  This will be set to the empty string
   (``''``) in non-POSIX mode, and to ``None`` in POSIX mode.

   .. versionadded:: 2.3


.. _shlex-parsing-rules:

Parsing Rules
-------------

When operating in non-POSIX mode, :class:`~shlex.shlex` will try to obey the
following rules.

* Quote characters are not recognized within words (``Do"Not"Separate`` is
  parsed as the single word ``Do"Not"Separate``);

* Escape characters are not recognized;

* Enclosing characters in quotes preserves the literal value of all characters
  within the quotes;

* Closing quotes separate words (``"Do"Separate`` is parsed as ``"Do"`` and
  ``Separate``);

* If :attr:`~shlex.whitespace_split` is ``False``, any character not
  declared to be a word character, whitespace, or a quote will be returned as
  a single-character token.  If it is ``True``, :class:`~shlex.shlex` will only
  split words on whitespace;

* EOF is signaled with an empty string (``''``);

* It's not possible to parse empty strings, even if quoted.
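
The non-POSIX rules above can be checked with a short sketch.  The helper
``tokens`` is a hypothetical convenience defined only for this example.

```python
import shlex


def tokens(text):
    """Tokenize *text* with a non-POSIX (compatibility mode) lexer."""
    return list(shlex.shlex(text, posix=False))


in_word = tokens('Do"Not"Separate')   # ['Do"Not"Separate']
closing = tokens('"Do"Separate')      # ['"Do"', 'Separate']
punctuation = tokens('a=b')           # ['a', '=', 'b']

# A quoted empty string comes back as the two quote characters,
# not as an empty token.
quoted_empty = tokens('""')           # ['""']

# End of input is reported as the empty string.
end_marker = shlex.shlex('', posix=False).get_token()   # ''
```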

When operating in POSIX mode, :class:`~shlex.shlex` will try to obey the
following parsing rules.

* Quotes are stripped out, and do not separate words (``"Do"Not"Separate"`` is
  parsed as the single word ``DoNotSeparate``);

* Non-quoted escape characters (e.g. ``'\'``) preserve the literal value of the
  next character that follows;

* Enclosing characters in quotes which are not part of
  :attr:`~shlex.escapedquotes` (e.g. ``"'"``) preserves the literal value
  of all characters within the quotes;

* Enclosing characters in quotes which are part of
  :attr:`~shlex.escapedquotes` (e.g. ``'"'``) preserves the literal value
  of all characters within the quotes, with the exception of the characters
  mentioned in :attr:`~shlex.escape`.  An escape character retains its
  special meaning only when followed by the quote in use, or the escape
  character itself.  Otherwise the escape character will be considered a
  normal character;

* EOF is signaled with a :const:`None` value;

* Quoted empty strings (``''``) are allowed.
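
The POSIX rules above can be checked the same way.  The helper ``tokens`` is
again a hypothetical convenience for this example, and raw string literals are
used so that each backslash in the source reaches the lexer unchanged.

```python
import shlex


def tokens(text):
    """Tokenize *text* with a POSIX-mode lexer."""
    return list(shlex.shlex(text, posix=True))


joined = tokens('"Do"Not"Separate"')   # ['DoNotSeparate']
escaped = tokens(r'a\ b')              # ['a b']

# Single quotes are not in escapedquotes: both backslashes stay literal.
single = tokens(r"'a\\b'")             # one token: a, two backslashes, b

# Double quotes are in escapedquotes: a backslash escapes a backslash...
double = tokens(r'"a\\b"')             # one token: a, one backslash, b
# ...but stays an ordinary character in front of anything else.
kept = tokens(r'"a\nb"')               # one token: a, backslash, n, b

quoted_empty = tokens("a '' b")        # ['a', '', 'b']

# End of input is reported as None.
end_marker = shlex.shlex('', posix=True).get_token()
```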