:mod:`email` Package Architecture
=================================

Overview
--------

The email package consists of three major components:

    Model
        An object structure that represents an email message, and provides an
        API for creating, querying, and modifying a message.

    Parser
        Takes a sequence of characters or bytes and produces a model of the
        email message represented by those characters or bytes.

    Generator
        Takes a model and turns it into a sequence of characters or bytes.  The
        sequence can either be intended for human consumption (a printable
        unicode string) or bytes suitable for transmission over the wire.  In
        the latter case all data is properly encoded using the content transfer
        encodings specified by the relevant RFCs.

Conceptually the package is organized around the model.  The model provides both
"external" APIs intended for use by application programs using the library,
and "internal" APIs intended for use by the Parser and Generator components.
This division is intentionally a bit fuzzy; the entire API described by this
documentation is public and stable.  This allows an application with special
needs to implement its own parser and/or generator.

In addition to the three major functional components, there is a fourth key
component to the architecture:

    Policy
        An object that specifies various behavioral settings and carries
        implementations of various behavior-controlling methods.

The Policy framework provides a simple and convenient way to control the
behavior of the library, making it possible for the library to be used in a
very flexible fashion while leveraging the common code required to parse,
represent, and generate message-like objects.  For example, in addition to the
default :rfc:`5322` email message policy, we also have a policy that manages
HTTP headers in a fashion compliant with :rfc:`2616`.
Individual policy controls, such as the maximum line length produced by the
generator, can also be set separately to meet specialized application
requirements.


The Model
---------

The message model is implemented by the :class:`~email.message.Message` class.
The model divides a message into the two fundamental parts discussed by the
RFC: the header section and the body.  The `Message` object acts as a
pseudo-dictionary of named headers.  Its dictionary interface provides
convenient access to individual headers by name.  However, all headers are kept
internally in an ordered list, so that the information about the order of the
headers in the original message is preserved.

The `Message` object also has a `payload` that holds the body.  A `payload` can
be one of two things: data, or a list of `Message` objects.  The latter is used
to represent a multipart MIME message.  Lists can be nested arbitrarily deeply
in order to represent the message, with all terminal leaves having non-list
data payloads.


Message Lifecycle
-----------------

The general lifecycle of a message is:

    Creation
        A `Message` object can be created by a Parser, or it can be
        instantiated as an empty message by an application.

    Manipulation
        The application may examine one or more headers, and/or the
        payload, and it may modify one or more headers and/or
        the payload.  This may be done on the top level `Message`
        object, or on any sub-object.

    Finalization
        The Model is converted into a unicode or binary stream,
        or the model is discarded.



Header Policy Control During Lifecycle
--------------------------------------

One of the major controls exerted by the Policy is the management of headers
during the `Message` lifecycle.  Most applications don't need to be aware of
this.


A header enters the model in one of two ways: via a Parser, or by being set to
a specific value by an application program after the Model already exists.
Similarly, a header exits the model in one of two ways: by being serialized by
a Generator, or by being retrieved from a Model by an application program.  The
Policy object provides hooks for all four of these pathways.

The model storage for headers is a list of (name, value) tuples.

The Parser identifies headers during parsing, and passes them to the
:meth:`~email.policy.Policy.header_source_parse` method of the Policy.  The
result of that method is the (name, value) tuple to be stored in the model.

When an application program supplies a header value (for example, through the
`Message` object `__setitem__` interface), the name and the value are passed to
the :meth:`~email.policy.Policy.header_store_parse` method of the Policy, which
returns the (name, value) tuple to be stored in the model.

When an application program retrieves a header (through any of the dict or list
interfaces of `Message`), the name and value are passed to the
:meth:`~email.policy.Policy.header_fetch_parse` method of the Policy to
obtain the value returned to the application.

When a Generator requests a header during serialization, the name and value are
passed to the :meth:`~email.policy.Policy.fold` method of the Policy, which
returns a string containing line breaks in the appropriate places.  The
:attr:`~email.policy.Policy.cte_type` Policy control determines whether or
not Content Transfer Encoding is performed on the data in the header.  There is
also a :meth:`~email.policy.Policy.fold_binary` method for use by generators
that produce binary output, which returns the folded header as binary data,
possibly folded at different places than the corresponding string would be.
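
The four pathways described in this section can be observed with a small
tracing policy.  This is only a sketch: ``TracingPolicy`` and the module-level
``calls`` list are hypothetical names introduced for illustration, and the
subclass simply delegates to :class:`~email.policy.Compat32` after recording
which hook was invoked.

```python
import email
from email import policy
from email.message import Message

# Policy instances are read-only, so the trace is kept in a module-level
# list (a hypothetical helper, not part of the email package).
calls = []


class TracingPolicy(policy.Compat32):
    """Hypothetical policy that records each header hook before delegating."""

    def header_source_parse(self, sourcelines):
        calls.append("header_source_parse")
        return super().header_source_parse(sourcelines)

    def header_store_parse(self, name, value):
        calls.append("header_store_parse")
        return super().header_store_parse(name, value)

    def header_fetch_parse(self, name, value):
        calls.append("header_fetch_parse")
        return super().header_fetch_parse(name, value)

    def fold(self, name, value):
        calls.append("fold")
        return super().fold(name, value)


# Pathway 1: a Parser puts a header into the model.
parsed = email.message_from_string("Subject: parsed\n\nbody\n",
                                   policy=TracingPolicy())

# Pathway 2: the application sets a header on an existing model.
msg = Message(policy=TracingPolicy())
msg["Subject"] = "Hello"

# Pathway 3: the application retrieves a header.
subject = msg["Subject"]

# Pathway 4: a Generator serializes the header.
text = msg.as_string()

print(calls)
```

Keeping the trace outside the policy object is a deliberate choice here:
policy attributes cannot be assigned after construction, so per-instance
state would have to be supplied some other way.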


Handling Binary Data
--------------------

In an ideal world all message data would conform to the RFCs, meaning that the
parser could decode the message into the idealized unicode message that the
sender originally wrote.  In the real world, the email package must also be
able to deal with badly formatted messages, including messages containing
non-ASCII characters that either have no indicated character set or are not
valid characters in the indicated character set.

Since email messages are *primarily* text data, and operations on message data
are primarily text operations (except for binary payloads of course), the model
stores all text data as unicode strings.  Un-decodable binary inside text
data is handled by using the `surrogateescape` error handler of the ASCII
codec.  As with the binary filenames the error handler was introduced to
handle, this allows the email package to "carry" the binary data received
during parsing along until the output stage, at which time it is regenerated
in its original form.

This carried binary data is almost entirely an implementation detail.  The one
place where it is visible in the API is in the "internal" API.  A Parser must
do the `surrogateescape` encoding of binary input data, and pass that data to
the appropriate Policy method.  The "internal" interface used by the Generator
to access header values preserves the `surrogateescaped` bytes.  All other
interfaces convert the binary data either back into bytes or into a safe form
(losing information in some cases).


Backward Compatibility
----------------------

The :class:`~email.policy.Compat32` Policy provides backward
compatibility with version 5.1 of the email package.
It does this via the
following implementation of the four+1 Policy methods described above:

header_source_parse
    Splits the first line on the colon to obtain the name, discards any spaces
    after the colon, and joins the remainder of the line with all of the
    remaining lines, preserving the linesep characters to obtain the value.
    Trailing carriage return and/or linefeed characters are stripped from the
    resulting value string.

header_store_parse
    Returns the name and value exactly as received from the application.

header_fetch_parse
    If the value contains any `surrogateescaped` binary data, returns the value
    as a :class:`~email.header.Header` object, using the character set
    `unknown-8bit`.  Otherwise just returns the value.

fold
    Uses :class:`~email.header.Header`'s folding to fold headers in the
    same way the email 5.1 generator did.

fold_binary
    Same as fold, but encodes to 'ascii'.


New Algorithm
-------------

header_source_parse
    Same as legacy behavior.

header_store_parse
    Same as legacy behavior.

header_fetch_parse
    If the value is already a header object, returns it.  Otherwise, parses the
    value using the new parser, and returns the resulting object as the value.
    `surrogateescaped` bytes get turned into unicode unknown character code
    points.

fold
    Uses the new header folding algorithm, respecting the policy settings.
    surrogateescaped bytes are encoded using the ``unknown-8bit`` charset for
    ``cte_type=7bit`` or ``8bit``.  Returns a string.

    At some point there will also be a ``cte_type=unicode``, and for that
    policy fold will serialize the idealized unicode message with RFC-like
    folding, converting any surrogateescaped bytes into the unicode
    unknown character glyph.

fold_binary
    Uses the new header folding algorithm, respecting the policy settings.
    surrogateescaped bytes are encoded using the ``unknown-8bit`` charset for
    ``cte_type=7bit``, and get turned back into bytes for ``cte_type=8bit``.
    Returns bytes.

    At some point there will also be a ``cte_type=unicode``, and for that
    policy fold_binary will serialize the message according to :rfc:`5335`.
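
The difference between the legacy and new ``header_fetch_parse`` behavior, and
the way carried binary data reappears on output, can be sketched as follows.
The sample message is made up for illustration (a Latin-1 byte in a header
with no declared character set), and the comments describe what the sections
above lead one to expect rather than a guaranteed result on every Python
version.

```python
from email import message_from_bytes, policy
from email.header import Header

# Hypothetical input: an undeclared non-ASCII byte (0xE9) in a header.
raw = b"Subject: caf\xe9\n\nbody\n"

legacy = message_from_bytes(raw, policy=policy.compat32)
modern = message_from_bytes(raw, policy=policy.default)

# Legacy fetch: surrogateescaped data comes back wrapped in a Header
# object that uses the unknown-8bit character set.
legacy_subject = legacy["Subject"]
print(type(legacy_subject).__name__, isinstance(legacy_subject, Header))

# New fetch: the value is parsed into a header object (a str subclass)
# and the undecodable byte becomes an "unknown character" code point.
modern_subject = modern["Subject"]
print(repr(str(modern_subject)))

# Legacy string output: fold() encodes the carried byte with unknown-8bit.
legacy_text = legacy.as_string()
print(legacy_text.splitlines()[0])

# Legacy binary output: fold_binary() restores the original byte.
legacy_wire = legacy.as_bytes()
print(legacy_wire.splitlines()[0])

# New binary output with the default cte_type; per the description above,
# the carried byte is expected to be turned back into bytes here as well.
modern_wire = modern.as_bytes()
print(modern_wire.splitlines()[0])
```

Parsing the same bytes under both policies keeps the comparison honest: the
stored (name, value) tuple is identical in each model, and only the fetch and
fold hooks differ.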