:mod:`email.parser`: Parsing email messages
-------------------------------------------

.. module:: email.parser
   :synopsis: Parse flat text email messages to produce a message object structure.

**Source code:** :source:`Lib/email/parser.py`

--------------

Message object structures can be created in one of two ways: they can be
created from whole cloth by creating an :class:`~email.message.EmailMessage`
object, adding headers using the dictionary interface, and adding payload(s)
using :meth:`~email.message.EmailMessage.set_content` and related methods, or
they can be created by parsing a serialized representation of the email
message.

The :mod:`email` package provides a standard parser that understands most email
document structures, including MIME documents. You can pass the parser a
bytes, string or file object, and the parser will return to you the root
:class:`~email.message.EmailMessage` instance of the object structure. For
simple, non-MIME messages the payload of this root object will likely be a
string containing the text of the message. For MIME messages, the root object
will return ``True`` from its :meth:`~email.message.EmailMessage.is_multipart`
method, and the subparts can be accessed via the payload manipulation methods,
such as :meth:`~email.message.EmailMessage.get_body`,
:meth:`~email.message.EmailMessage.iter_parts`, and
:meth:`~email.message.EmailMessage.walk`.

There are actually two parser interfaces available for use, the :class:`Parser`
API and the incremental :class:`FeedParser` API. The :class:`Parser` API is
most useful if you have the entire text of the message in memory, or if the
entire message lives in a file on the file system. :class:`FeedParser` is more
appropriate when you are reading the message from a stream which might block
waiting for more input (such as reading an email message from a socket).
The :class:`FeedParser` can consume and parse the message incrementally, and
only returns the root object when you close the parser.

Note that the parser can be extended in limited ways, and of course you can
implement your own parser completely from scratch. All of the logic that
connects the :mod:`email` package's bundled parser and the
:class:`~email.message.EmailMessage` class is embodied in the
:class:`~email.policy.Policy` class, so a custom parser can create message
object trees any way it finds necessary by implementing custom versions of the
appropriate :class:`~email.policy.Policy` methods.


FeedParser API
^^^^^^^^^^^^^^

The :class:`BytesFeedParser`, imported from the :mod:`email.feedparser` module,
provides an API that is conducive to incremental parsing of email messages,
such as would be necessary when reading the text of an email message from a
source that can block (such as a socket). The :class:`BytesFeedParser` can of
course be used to parse an email message fully contained in a :term:`bytes-like
object`, string, or file, but the :class:`BytesParser` API may be more
convenient for such use cases. The semantics and results of the two parser
APIs are identical.

The :class:`BytesFeedParser`'s API is simple; you create an instance, feed it a
bunch of bytes until there's no more to feed it, then close the parser to
retrieve the root message object. The :class:`BytesFeedParser` is extremely
accurate when parsing standards-compliant messages, and it does a very good job
of parsing non-compliant messages, providing information about how a message
was deemed broken. It will populate a message object's
:attr:`~email.message.EmailMessage.defects` attribute with a list of any
problems it found in a message. See the :mod:`email.errors` module for the
list of defects that it can find.

Here is the API for the :class:`BytesFeedParser`:


.. class:: BytesFeedParser(_factory=None, *, policy=policy.compat32)

   Create a :class:`BytesFeedParser` instance. Optional *_factory* is a
   no-argument callable; if not specified use the
   :attr:`~email.policy.Policy.message_factory` from the *policy*. Call
   *_factory* whenever a new message object is needed.

   If *policy* is specified, use the rules it specifies to update the
   representation of the message. If *policy* is not set, use the
   :class:`compat32 <email.policy.Compat32>` policy, which maintains backward
   compatibility with the Python 3.2 version of the email package and provides
   :class:`~email.message.Message` as the default factory. All other policies
   provide :class:`~email.message.EmailMessage` as the default *_factory*. For
   more information on what else *policy* controls, see the
   :mod:`~email.policy` documentation.

   Note: **The policy keyword should always be specified**; the default will
   change to :data:`email.policy.default` in a future version of Python.

   .. versionadded:: 3.2

   .. versionchanged:: 3.3 Added the *policy* keyword.
   .. versionchanged:: 3.6 *_factory* defaults to the policy ``message_factory``.


   .. method:: feed(data)

      Feed the parser some more data. *data* should be a :term:`bytes-like
      object` containing one or more lines. The lines can be partial and the
      parser will stitch such partial lines together properly. The lines can
      have any of the three common line endings: carriage return, newline, or
      carriage return and newline (they can even be mixed).


   .. method:: close()

      Complete the parsing of all previously fed data and return the root
      message object. It is undefined what happens if :meth:`feed` is called
      after this method has been called.


.. class:: FeedParser(_factory=None, *, policy=policy.compat32)

   Works like :class:`BytesFeedParser` except that the input to the
   :meth:`~BytesFeedParser.feed` method must be a string. This is of limited
   utility, since the only way for such a message to be valid is for it to
   contain only ASCII text or, if :attr:`~email.policy.Policy.utf8` is
   ``True``, no binary attachments.

   .. versionchanged:: 3.3 Added the *policy* keyword.


Parser API
^^^^^^^^^^

The :class:`BytesParser` class, imported from the :mod:`email.parser` module,
provides an API that can be used to parse a message when the complete contents
of the message are available in a :term:`bytes-like object` or file. The
:mod:`email.parser` module also provides :class:`Parser` for parsing strings,
and header-only parsers, :class:`BytesHeaderParser` and
:class:`HeaderParser`, which can be used if you're only interested in the
headers of the message. :class:`BytesHeaderParser` and :class:`HeaderParser`
can be much faster in these situations, since they do not attempt to parse the
message body, instead setting the payload to the raw body.


.. class:: BytesParser(_class=None, *, policy=policy.compat32)

   Create a :class:`BytesParser` instance. The *_class* and *policy*
   arguments have the same meaning and semantics as the *_factory*
   and *policy* arguments of :class:`BytesFeedParser`.

   Note: **The policy keyword should always be specified**; the default will
   change to :data:`email.policy.default` in a future version of Python.

   .. versionchanged:: 3.3
      Removed the *strict* argument that was deprecated in 2.4. Added the
      *policy* keyword.
   .. versionchanged:: 3.6 *_class* defaults to the policy ``message_factory``.


   .. method:: parse(fp, headersonly=False)

      Read all the data from the binary file-like object *fp*, parse the
      resulting bytes, and return the message object.
      *fp* must support both the :meth:`~io.IOBase.readline` and the
      :meth:`~io.IOBase.read` methods.

      The bytes contained in *fp* must be formatted as a block of :rfc:`5322`
      (or, if :attr:`~email.policy.Policy.utf8` is ``True``, :rfc:`6532`)
      style headers and header continuation lines, optionally preceded by an
      envelope header. The header block is terminated either by the end of the
      data or by a blank line. Following the header block is the body of the
      message (which may contain MIME-encoded subparts, including subparts
      with a :mailheader:`Content-Transfer-Encoding` of ``8bit``).

      Optional *headersonly* is a flag specifying whether to stop parsing after
      reading the headers or not. The default is ``False``, meaning it parses
      the entire contents of the file.


   .. method:: parsebytes(bytes, headersonly=False)

      Similar to the :meth:`parse` method, except it takes a :term:`bytes-like
      object` instead of a file-like object. Calling this method on a
      :term:`bytes-like object` is equivalent to wrapping *bytes* in a
      :class:`~io.BytesIO` instance first and calling :meth:`parse`.

      Optional *headersonly* is as with the :meth:`parse` method.

      .. versionadded:: 3.2


.. class:: BytesHeaderParser(_class=None, *, policy=policy.compat32)

   Exactly like :class:`BytesParser`, except that *headersonly*
   defaults to ``True``.

   .. versionadded:: 3.3


.. class:: Parser(_class=None, *, policy=policy.compat32)

   This class is parallel to :class:`BytesParser`, but handles string input.

   .. versionchanged:: 3.3
      Removed the *strict* argument. Added the *policy* keyword.
   .. versionchanged:: 3.6 *_class* defaults to the policy ``message_factory``.


   .. method:: parse(fp, headersonly=False)

      Read all the data from the text-mode file-like object *fp*, parse the
      resulting text, and return the root message object.
      *fp* must support both the :meth:`~io.TextIOBase.readline` and the
      :meth:`~io.TextIOBase.read` methods.

      Other than the text mode requirement, this method operates like
      :meth:`BytesParser.parse`.


   .. method:: parsestr(text, headersonly=False)

      Similar to the :meth:`parse` method, except it takes a string object
      instead of a file-like object. Calling this method on a string is
      equivalent to wrapping *text* in a :class:`~io.StringIO` instance first
      and calling :meth:`parse`.

      Optional *headersonly* is as with the :meth:`parse` method.


.. class:: HeaderParser(_class=None, *, policy=policy.compat32)

   Exactly like :class:`Parser`, except that *headersonly*
   defaults to ``True``.


Since creating a message object structure from a string or a file object is such
a common task, four functions are provided as a convenience. They are available
in the top-level :mod:`email` package namespace.

.. currentmodule:: email


.. function:: message_from_bytes(s, _class=None, *, policy=policy.compat32)

   Return a message object structure from a :term:`bytes-like object`. This is
   equivalent to ``BytesParser().parsebytes(s)``. Optional *_class* and
   *policy* are interpreted as with the :class:`~email.parser.BytesParser` class
   constructor.

   .. versionadded:: 3.2
   .. versionchanged:: 3.3
      Removed the *strict* argument. Added the *policy* keyword.


.. function:: message_from_binary_file(fp, _class=None, *, \
                                       policy=policy.compat32)

   Return a message object structure tree from an open binary :term:`file
   object`. This is equivalent to ``BytesParser().parse(fp)``. *_class* and
   *policy* are interpreted as with the :class:`~email.parser.BytesParser` class
   constructor.

   .. versionadded:: 3.2
   .. versionchanged:: 3.3
      Removed the *strict* argument. Added the *policy* keyword.


.. function:: message_from_string(s, _class=None, *, policy=policy.compat32)

   Return a message object structure from a string. This is equivalent to
   ``Parser().parsestr(s)``. *_class* and *policy* are interpreted as
   with the :class:`~email.parser.Parser` class constructor.

   .. versionchanged:: 3.3
      Removed the *strict* argument. Added the *policy* keyword.


.. function:: message_from_file(fp, _class=None, *, policy=policy.compat32)

   Return a message object structure tree from an open :term:`file object`.
   This is equivalent to ``Parser().parse(fp)``. *_class* and *policy* are
   interpreted as with the :class:`~email.parser.Parser` class constructor.

   .. versionchanged:: 3.3
      Removed the *strict* argument. Added the *policy* keyword.
   .. versionchanged:: 3.6 *_class* defaults to the policy ``message_factory``.


Here's an example of how you might use :func:`message_from_bytes` at an
interactive Python prompt::

   >>> import email
   >>> msg = email.message_from_bytes(myBytes)  # doctest: +SKIP


Additional notes
^^^^^^^^^^^^^^^^

Here are some notes on the parsing semantics:

* Most non-\ :mimetype:`multipart` type messages are parsed as a single message
  object with a string payload. These objects will return ``False`` for
  :meth:`~email.message.EmailMessage.is_multipart`, and
  :meth:`~email.message.EmailMessage.iter_parts` will yield nothing.

* All :mimetype:`multipart` type messages will be parsed as a container message
  object with a list of sub-message objects for their payload. The outer
  container message will return ``True`` for
  :meth:`~email.message.EmailMessage.is_multipart`, and
  :meth:`~email.message.EmailMessage.iter_parts` will yield the subparts.

* Most messages with a content type of :mimetype:`message/\*` (such as
  :mimetype:`message/delivery-status` and :mimetype:`message/rfc822`) will also
  be parsed as a container object containing a list payload of length 1. Their
  :meth:`~email.message.EmailMessage.is_multipart` method will return ``True``.
  The single element yielded by :meth:`~email.message.EmailMessage.iter_parts`
  will be a sub-message object.

* Some non-standards-compliant messages may not be internally consistent about
  their :mimetype:`multipart`\ -edness. Such messages may have a
  :mailheader:`Content-Type` header of type :mimetype:`multipart`, but their
  :meth:`~email.message.EmailMessage.is_multipart` method may return ``False``.
  If such messages were parsed with the :class:`~email.parser.FeedParser`,
  they will have an instance of the
  :class:`~email.errors.MultipartInvariantViolationDefect` class in their
  *defects* attribute list. See :mod:`email.errors` for details.

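
The notes above can be illustrated with a short sketch. It is not part of the
module; the message text, the addresses, the ``XYZ`` boundary, and all variable
names are made up for illustration. It parses a small two-part message in one
call, feeds the same bytes to a :class:`~email.parser.BytesFeedParser` in
arbitrary 7-byte chunks, parses only the headers, and finally parses a
deliberately broken message that claims to be multipart but defines no
boundary. The policy is passed explicitly in every case.

```python
from email import policy
from email.errors import MultipartInvariantViolationDefect
from email.parser import BytesFeedParser, BytesHeaderParser, BytesParser

# Made-up two-part message; the addresses and the "XYZ" boundary are
# illustrative only.
RAW = (
    b"From: alice@example.com\r\n"
    b"To: bob@example.com\r\n"
    b"Subject: Parser demo\r\n"
    b"MIME-Version: 1.0\r\n"
    b'Content-Type: multipart/mixed; boundary="XYZ"\r\n'
    b"\r\n"
    b"--XYZ\r\n"
    b"Content-Type: text/plain; charset=utf-8\r\n"
    b"\r\n"
    b"Hello, Bob.\r\n"
    b"--XYZ\r\n"
    b"Content-Type: message/rfc822\r\n"
    b"\r\n"
    b"Subject: Inner message\r\n"
    b"\r\n"
    b"Inner body.\r\n"
    b"--XYZ--\r\n"
)

# Parse the whole message at once.
msg = BytesParser(policy=policy.default).parsebytes(RAW)
parts = list(msg.iter_parts())
text_part, rfc822_part = parts

# A message/rfc822 part is a container holding exactly one sub-message.
inner = rfc822_part.get_payload(0)

# Feed the same bytes incrementally; chunks may split lines anywhere.
feeder = BytesFeedParser(policy=policy.default)
for start in range(0, len(RAW), 7):
    feeder.feed(RAW[start:start + 7])
fed_msg = feeder.close()

# Header-only parse: the body is kept as a raw string payload.
headers_only = BytesHeaderParser(policy=policy.default).parsebytes(RAW)

# A message that claims to be multipart but defines no boundary.
BROKEN = b"Content-Type: multipart/mixed\r\n\r\nNot really multipart.\r\n"
broken = BytesParser(policy=policy.default).parsebytes(BROKEN)
defect_types = [type(d) for d in broken.defects]

print(msg.is_multipart())  # True
print([p.get_content_type() for p in parts])
print(inner["Subject"])
print(broken.is_multipart(), [t.__name__ for t in defect_types])
```

With a policy whose ``raise_on_defect`` is false (the default), problems such
as the missing boundary are recorded in *defects* instead of raising, so the
caller can decide how strict to be.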