:mod:`shlex` --- Simple lexical analysis
========================================

.. module:: shlex
   :synopsis: Simple lexical analysis for Unix shell-like languages.

.. moduleauthor:: Eric S. Raymond <esr@snark.thyrsus.com>
.. moduleauthor:: Gustavo Niemeyer <niemeyer@conectiva.com>
.. sectionauthor:: Eric S. Raymond <esr@snark.thyrsus.com>
.. sectionauthor:: Gustavo Niemeyer <niemeyer@conectiva.com>

**Source code:** :source:`Lib/shlex.py`

--------------

The :class:`~shlex.shlex` class makes it easy to write lexical analyzers for
simple syntaxes resembling that of the Unix shell.  This will often be useful
for writing minilanguages (for example, in run control files for Python
applications) or for parsing quoted strings.

The :mod:`shlex` module defines the following functions:


.. function:: split(s, comments=False, posix=True)

   Split the string *s* using shell-like syntax. If *comments* is :const:`False`
   (the default), the parsing of comments in the given string will be disabled
   (setting the :attr:`~shlex.commenters` attribute of the
   :class:`~shlex.shlex` instance to the empty string).  This function operates
   in POSIX mode by default, but uses non-POSIX mode if the *posix* argument is
   false.

   .. note::

      Since the :func:`split` function instantiates a :class:`~shlex.shlex`
      instance, passing ``None`` for *s* will read the string to split from
      standard input.

   .. deprecated:: 3.9
      Passing ``None`` for *s* will raise an exception in future Python
      versions.

.. function:: join(split_command)

   Concatenate the tokens of the list *split_command* and return a string.
   This function is the inverse of :func:`split`.

      >>> from shlex import join
      >>> print(join(['echo', '-n', 'Multiple words']))
      echo -n 'Multiple words'

   The returned value is shell-escaped to protect against injection
   vulnerabilities (see :func:`quote`).

   .. versionadded:: 3.8


.. function:: quote(s)

   Return a shell-escaped version of the string *s*.  The returned value is a
   string that can safely be used as one token in a shell command line, for
   cases where you cannot use a list.

   .. _shlex-quote-warning:

   .. warning::

      The ``shlex`` module is **only designed for Unix shells**.

      The :func:`quote` function is not guaranteed to be correct on non-POSIX
      compliant shells or shells from other operating systems such as Windows.
      Executing commands quoted by this module on such shells can open up the
      possibility of a command injection vulnerability.

      Consider using functions that pass command arguments with lists such as
      :func:`subprocess.run` with ``shell=False``.

   This idiom would be unsafe:

      >>> filename = 'somefile; rm -rf ~'
      >>> command = 'ls -l {}'.format(filename)
      >>> print(command)  # executed by a shell: boom!
      ls -l somefile; rm -rf ~

   :func:`quote` lets you plug the security hole:

      >>> from shlex import quote
      >>> command = 'ls -l {}'.format(quote(filename))
      >>> print(command)
      ls -l 'somefile; rm -rf ~'
      >>> remote_command = 'ssh home {}'.format(quote(command))
      >>> print(remote_command)
      ssh home 'ls -l '"'"'somefile; rm -rf ~'"'"''

   The quoting is compatible with UNIX shells and with :func:`split`:

      >>> from shlex import split
      >>> remote_command = split(remote_command)
      >>> remote_command
      ['ssh', 'home', "ls -l 'somefile; rm -rf ~'"]
      >>> command = split(remote_command[-1])
      >>> command
      ['ls', '-l', 'somefile; rm -rf ~']

   .. versionadded:: 3.3

The :mod:`shlex` module defines the following class:


.. class:: shlex(instream=None, infile=None, posix=False, punctuation_chars=False)

   A :class:`~shlex.shlex` instance or subclass instance is a lexical analyzer
   object.  The initialization argument, if present, specifies where to read
   characters from.
   It must be a file-/stream-like object with
   :meth:`~io.TextIOBase.read` and :meth:`~io.TextIOBase.readline` methods, or
   a string.  If no argument is given, input will be taken from ``sys.stdin``.
   The second optional argument is a filename string, which sets the initial
   value of the :attr:`~shlex.infile` attribute.  If the *instream*
   argument is omitted or equal to ``sys.stdin``, this second argument
   defaults to "stdin".  The *posix* argument defines the operational mode:
   when *posix* is not true (default), the :class:`~shlex.shlex` instance will
   operate in compatibility mode.  When operating in POSIX mode,
   :class:`~shlex.shlex` will try to be as close as possible to the POSIX shell
   parsing rules.  The *punctuation_chars* argument provides a way to make the
   behaviour even closer to how real shells parse.  This can take a number of
   values: the default value, ``False``, preserves the behaviour seen under
   Python 3.5 and earlier.  If set to ``True``, then parsing of the characters
   ``();<>|&`` is changed: any run of these characters (considered punctuation
   characters) is returned as a single token.  If set to a non-empty string of
   characters, those characters will be used as the punctuation characters.  Any
   characters in the :attr:`wordchars` attribute that appear in
   *punctuation_chars* will be removed from :attr:`wordchars`.  See
   :ref:`improved-shell-compatibility` for more information. *punctuation_chars*
   can be set only upon :class:`~shlex.shlex` instance creation and can't be
   modified later.

   .. versionchanged:: 3.6
      The *punctuation_chars* parameter was added.

.. seealso::

   Module :mod:`configparser`
      Parser for configuration files similar to the Windows :file:`.ini` files.


.. _shlex-objects:

shlex Objects
-------------

A :class:`~shlex.shlex` instance has the following methods:


.. method:: shlex.get_token()

   Return a token.  If tokens have been stacked using :meth:`push_token`, pop a
   token off the stack.  Otherwise, read one from the input stream.  If reading
   encounters an immediate end-of-file, :attr:`eof` is returned (the empty
   string (``''``) in non-POSIX mode, and ``None`` in POSIX mode).


.. method:: shlex.push_token(str)

   Push the argument onto the token stack.


.. method:: shlex.read_token()

   Read a raw token.  Ignore the pushback stack, and do not interpret source
   requests.  (This is not ordinarily a useful entry point, and is documented here
   only for the sake of completeness.)


.. method:: shlex.sourcehook(filename)

   When :class:`~shlex.shlex` detects a source request (see :attr:`source`
   below) this method is given the following token as argument, and expected
   to return a tuple consisting of a filename and an open file-like object.

   Normally, this method first strips any quotes off the argument.  If the result
   is an absolute pathname, or there was no previous source request in effect, or
   the previous source was a stream (such as ``sys.stdin``), the result is left
   alone.  Otherwise, if the result is a relative pathname, the directory part of
   the name of the file immediately before it on the source inclusion stack is
   prepended (this behavior is like the way the C preprocessor handles ``#include
   "file.h"``).

   The result of the manipulations is treated as a filename, and returned as the
   first component of the tuple, with :func:`open` called on it to yield the second
   component. (Note: this is the reverse of the order of arguments in instance
   initialization!)

   This hook is exposed so that you can use it to implement directory search paths,
   addition of file extensions, and other namespace hacks.
   There is no
   corresponding 'close' hook, but a shlex instance will call the
   :meth:`~io.IOBase.close` method of the sourced input stream when it returns
   EOF.

   For more explicit control of source stacking, use the :meth:`push_source` and
   :meth:`pop_source` methods.


.. method:: shlex.push_source(newstream, newfile=None)

   Push an input source stream onto the input stack.  If the filename argument is
   specified it will later be available for use in error messages.  This is the
   same method used internally by the :meth:`sourcehook` method.


.. method:: shlex.pop_source()

   Pop the last-pushed input source from the input stack. This is the same method
   used internally when the lexer reaches EOF on a stacked input stream.


.. method:: shlex.error_leader(infile=None, lineno=None)

   This method generates an error message leader in the format of a Unix C compiler
   error label; the format is ``'"%s", line %d: '``, where the ``%s`` is replaced
   with the name of the current source file and the ``%d`` with the current input
   line number (the optional arguments can be used to override these).

   This convenience is provided to encourage :mod:`shlex` users to generate error
   messages in the standard, parseable format understood by Emacs and other Unix
   tools.

Instances of :class:`~shlex.shlex` subclasses have some public instance
variables which either control lexical analysis or can be used for debugging:


.. attribute:: shlex.commenters

   The string of characters that are recognized as comment beginners.  All
   characters from the comment beginner to end of line are ignored.  Includes just
   ``'#'`` by default.


.. attribute:: shlex.wordchars

   The string of characters that will accumulate into multi-character tokens.  By
   default, includes all ASCII alphanumerics and underscore.
   In POSIX mode, the
   accented characters in the Latin-1 set are also included.  If
   :attr:`punctuation_chars` is not empty, the characters ``~-./*?=``, which can
   appear in filename specifications and command line parameters, will also be
   included in this attribute, and any characters which appear in
   ``punctuation_chars`` will be removed from ``wordchars`` if they are present
   there. If :attr:`whitespace_split` is set to ``True``, this will have no
   effect.


.. attribute:: shlex.whitespace

   Characters that will be considered whitespace and skipped.  Whitespace bounds
   tokens.  By default, includes space, tab, linefeed and carriage-return.


.. attribute:: shlex.escape

   Characters that will be considered as escape. This will be only used in POSIX
   mode, and includes just ``'\'`` by default.


.. attribute:: shlex.quotes

   Characters that will be considered string quotes.  The token accumulates until
   the same quote is encountered again (thus, different quote types protect each
   other as in the shell).  By default, includes ASCII single and double quotes.


.. attribute:: shlex.escapedquotes

   Characters in :attr:`quotes` that will interpret escape characters defined in
   :attr:`escape`.  This is only used in POSIX mode, and includes just ``'"'`` by
   default.


.. attribute:: shlex.whitespace_split

   If ``True``, tokens will only be split on whitespace.  This is useful, for
   example, for parsing command lines with :class:`~shlex.shlex`, getting
   tokens in a similar way to shell arguments.  When used in combination with
   :attr:`punctuation_chars`, tokens will be split on whitespace in addition to
   those characters.

   .. versionchanged:: 3.8
      The :attr:`punctuation_chars` attribute was made compatible with the
      :attr:`whitespace_split` attribute.


.. attribute:: shlex.infile

   The name of the current input file, as initially set at class instantiation time
   or stacked by later source requests.  It may be useful to examine this when
   constructing error messages.


.. attribute:: shlex.instream

   The input stream from which this :class:`~shlex.shlex` instance is reading
   characters.


.. attribute:: shlex.source

   This attribute is ``None`` by default.  If you assign a string to it, that
   string will be recognized as a lexical-level inclusion request similar to the
   ``source`` keyword in various shells.  That is, the immediately following token
   will be opened as a filename and input will be taken from that stream until
   EOF, at which point the :meth:`~io.IOBase.close` method of that stream will be
   called and the input source will again become the original input stream.  Source
   requests may be stacked any number of levels deep.


.. attribute:: shlex.debug

   If this attribute is numeric and ``1`` or more, a :class:`~shlex.shlex`
   instance will print verbose progress output on its behavior.  If you need
   to use this, you can read the module source code to learn the details.


.. attribute:: shlex.lineno

   Source line number (count of newlines seen so far plus one).


.. attribute:: shlex.token

   The token buffer.  It may be useful to examine this when catching exceptions.


.. attribute:: shlex.eof

   Token used to determine end of file. This will be set to the empty string
   (``''``) in non-POSIX mode, and to ``None`` in POSIX mode.


.. attribute:: shlex.punctuation_chars

   A read-only property. Characters that will be considered punctuation. Runs of
   punctuation characters will be returned as a single token.
   However, note that no
   semantic validity checking will be performed: for example, ``>>>`` could be
   returned as a token, even though it may not be recognised as such by shells.

   .. versionadded:: 3.6


.. _shlex-parsing-rules:

Parsing Rules
-------------

When operating in non-POSIX mode, :class:`~shlex.shlex` will try to obey the
following rules.

* Quote characters are not recognized within words (``Do"Not"Separate`` is
  parsed as the single word ``Do"Not"Separate``);

* Escape characters are not recognized;

* Enclosing characters in quotes preserve the literal value of all characters
  within the quotes;

* Closing quotes separate words (``"Do"Separate`` is parsed as ``"Do"`` and
  ``Separate``);

* If :attr:`~shlex.whitespace_split` is ``False``, any character not
  declared to be a word character, whitespace, or a quote will be returned as
  a single-character token. If it is ``True``, :class:`~shlex.shlex` will only
  split words on whitespace;

* EOF is signaled with an empty string (``''``);

* It's not possible to parse empty strings, even if quoted.

When operating in POSIX mode, :class:`~shlex.shlex` will try to obey the
following parsing rules.

* Quotes are stripped out, and do not separate words (``"Do"Not"Separate"`` is
  parsed as the single word ``DoNotSeparate``);

* Non-quoted escape characters (e.g. ``'\'``) preserve the literal value of the
  next character that follows;

* Enclosing characters in quotes which are not part of
  :attr:`~shlex.escapedquotes` (e.g. ``"'"``) preserve the literal value
  of all characters within the quotes;

* Enclosing characters in quotes which are part of
  :attr:`~shlex.escapedquotes` (e.g. ``'"'``) preserve the literal value
  of all characters within the quotes, with the exception of the characters
  mentioned in :attr:`~shlex.escape`.
  The escape characters retain their
  special meaning only when followed by the quote in use, or the escape
  character itself. Otherwise the escape character will be considered a
  normal character.

* EOF is signaled with a :const:`None` value;

* Quoted empty strings (``''``) are allowed.

.. _improved-shell-compatibility:

Improved Compatibility with Shells
----------------------------------

.. versionadded:: 3.6

The :class:`shlex` class provides compatibility with the parsing performed by
common Unix shells like ``bash``, ``dash``, and ``sh``.  To take advantage of
this compatibility, specify the ``punctuation_chars`` argument in the
constructor.  This defaults to ``False``, which preserves pre-3.6 behaviour.
However, if it is set to ``True``, then parsing of the characters ``();<>|&``
is changed: any run of these characters is returned as a single token.  While
this is short of a full parser for shells (which would be out of scope for the
standard library, given the multiplicity of shells out there), it does allow
you to perform processing of command lines more easily than you could
otherwise.  To illustrate, you can see the difference in the following snippet:

.. doctest::
   :options: +NORMALIZE_WHITESPACE

    >>> import shlex
    >>> text = "a && b; c && d || e; f >'abc'; (def \"ghi\")"
    >>> s = shlex.shlex(text, posix=True)
    >>> s.whitespace_split = True
    >>> list(s)
    ['a', '&&', 'b;', 'c', '&&', 'd', '||', 'e;', 'f', '>abc;', '(def', 'ghi)']
    >>> s = shlex.shlex(text, posix=True, punctuation_chars=True)
    >>> s.whitespace_split = True
    >>> list(s)
    ['a', '&&', 'b', ';', 'c', '&&', 'd', '||', 'e', ';', 'f', '>', 'abc', ';',
    '(', 'def', 'ghi', ')']

Of course, tokens will be returned which are not valid for shells, and you'll
need to implement your own error checks on the returned tokens.
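One possible shape for such a check is sketched below.  The names
``KNOWN_OPERATORS`` and ``find_invalid_operators`` are hypothetical and not part
of :mod:`shlex`, and the operator set is an illustrative assumption rather than
the grammar of any particular shell:

.. code-block:: python

   import shlex

   # Hypothetical set of accepted operators; real shells recognise more.
   KNOWN_OPERATORS = {"&&", "||", ";", "|", "&", ">", ">>", "<", "(", ")"}
   PUNCTUATION = set("();<>|&")


   def find_invalid_operators(command):
       """Return punctuation-only tokens that are not in KNOWN_OPERATORS.

       Caveat: in POSIX mode quotes are stripped, so a quoted argument made
       up only of punctuation (for example ``'>>>'``) is also reported.
       """
       lexer = shlex.shlex(command, posix=True, punctuation_chars=True)
       lexer.whitespace_split = True
       return [
           token
           for token in lexer
           # Skip empty tokens, then keep runs built only from punctuation.
           if token and set(token) <= PUNCTUATION
           and token not in KNOWN_OPERATORS
       ]


   bad = find_invalid_operators("a >>> b && c")
   ok = find_invalid_operators("a && b | c")

Because any run of punctuation characters is returned as one token, adjacent
operators such as ``;(`` would also be reported by this sketch unless they are
separated by whitespace.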

Instead of passing ``True`` as the value for the *punctuation_chars* parameter,
you can pass a string with specific characters, which will be used to determine
which characters constitute punctuation. For example::

    >>> import shlex
    >>> s = shlex.shlex("a && b || c", punctuation_chars="|")
    >>> list(s)
    ['a', '&', '&', 'b', '||', 'c']

.. note:: When ``punctuation_chars`` is specified, the :attr:`~shlex.wordchars`
   attribute is augmented with the characters ``~-./*?=``.  That is because these
   characters can appear in file names (including wildcards) and command-line
   arguments (e.g. ``--color=auto``). Hence::

     >>> import shlex
     >>> s = shlex.shlex('~/a && b-c --color=auto || d *.py?',
     ...                 punctuation_chars=True)
     >>> list(s)
     ['~/a', '&&', 'b-c', '--color=auto', '||', 'd', '*.py?']

   However, to match the shell as closely as possible, it is recommended to
   always use ``posix`` and :attr:`~shlex.whitespace_split` when using
   :attr:`~shlex.punctuation_chars`, which will negate
   :attr:`~shlex.wordchars` entirely.

For best effect, ``punctuation_chars`` should be set in conjunction with
``posix=True``. (Note that ``posix=False`` is the default for
:class:`~shlex.shlex`.)
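
As a minimal sketch of that recommended combination, the helper below (the name
``tokenize_command`` is hypothetical, as is the sample command line) enables
POSIX mode, punctuation handling and whitespace splitting together:

.. code-block:: python

   import shlex


   def tokenize_command(command):
       """Tokenize *command* with the settings recommended above."""
       lexer = shlex.shlex(command, posix=True, punctuation_chars=True)
       # Split words only on whitespace and on runs of punctuation characters.
       lexer.whitespace_split = True
       return list(lexer)


   tokens = tokenize_command("grep -r 'two words' ~/src | sort > out.txt")
   # tokens == ['grep', '-r', 'two words', '~/src', '|', 'sort', '>', 'out.txt']

With ``posix=False`` the quotes would be kept as part of the token
(``"'two words'"``), which is usually not what a command-line consumer expects.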