:mod:`email.policy`: Policy Objects
-----------------------------------

.. module:: email.policy
   :synopsis: Controlling the parsing and generating of messages

.. moduleauthor:: R. David Murray <rdmurray@bitdance.com>
.. sectionauthor:: R. David Murray <rdmurray@bitdance.com>

.. versionadded:: 3.3

**Source code:** :source:`Lib/email/policy.py`

--------------

The :mod:`email` package's prime focus is the handling of email messages as
described by the various email and MIME RFCs.  However, the general format of
email messages (a block of header fields each consisting of a name followed by
a colon followed by a value, the whole block followed by a blank line and an
arbitrary 'body') is a format that has found utility outside of the realm of
email.  Some of these uses conform fairly closely to the main email RFCs, some
do not.  Even when working with email, there are times when it is desirable to
break strict compliance with the RFCs, such as generating emails that
interoperate with email servers that do not themselves follow the standards, or
that implement extensions you want to use in ways that violate the
standards.

Policy objects give the email package the flexibility to handle all these
disparate use cases.

A :class:`Policy` object encapsulates a set of attributes and methods that
control the behavior of various components of the email package during use.
:class:`Policy` instances can be passed to various classes and methods in the
email package to alter the default behavior.  The settable values and their
defaults are described below.

There is a default policy used by all classes in the email package.  For all of
the :mod:`~email.parser` classes and the related convenience functions, and for
the :class:`~email.message.Message` class, this is the :class:`Compat32`
policy, via its corresponding pre-defined instance :const:`compat32`.
This policy provides for complete backward compatibility (in some cases,
including bug compatibility) with the pre-Python 3.3 version of the email
package.

The default value for the *policy* keyword to
:class:`~email.message.EmailMessage` is the :class:`EmailPolicy` policy, via
its pre-defined instance :data:`~default`.

When a :class:`~email.message.Message` or :class:`~email.message.EmailMessage`
object is created, it acquires a policy.  If the message is created by a
:mod:`~email.parser`, a policy passed to the parser will be the policy used by
the message it creates.  If the message is created by the program, then the
policy can be specified when it is created.  When a message is passed to a
:mod:`~email.generator`, the generator uses the policy from the message by
default, but you can also pass a specific policy to the generator that will
override the one stored on the message object.

The default value for the *policy* keyword for the :mod:`email.parser` classes
and the parser convenience functions **will be changing** in a future version of
Python.  Therefore you should **always specify explicitly which policy you want
to use** when calling any of the classes and functions described in the
:mod:`~email.parser` module.

The first part of this documentation covers the features of :class:`Policy`, an
:term:`abstract base class` that defines the features that are common to all
policy objects, including :const:`compat32`.  This includes certain hook
methods that are called internally by the email package, which a custom policy
could override to obtain different behavior.  The second part describes the
concrete classes :class:`EmailPolicy` and :class:`Compat32`, which implement
the hooks that provide the standard behavior and the backward compatible
behavior and features, respectively.

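
Following the advice above to always name the parser policy, the sketch below
parses the same input under :const:`compat32` and :data:`default` and shows
the different message and header types each produces.  It is a minimal
illustration only; the message bytes are invented for the example.

.. code-block:: python

   from email import message_from_bytes, policy
   from email.message import EmailMessage, Message

   # Invented sample message, used only for illustration.
   raw = b"To: abc@xyz.com\nSubject: Hello\n\nBody text\n"

   # Name the policy explicitly instead of relying on the parser default.
   legacy = message_from_bytes(raw, policy=policy.compat32)
   modern = message_from_bytes(raw, policy=policy.default)

   # compat32: a plain Message whose headers are ordinary strings.
   print(type(legacy) is Message, type(legacy['To']) is str)  # True True
   # default: an EmailMessage whose headers are header objects.
   print(isinstance(modern, EmailMessage))                    # True
   print(modern['To'].addresses[0].addr_spec)                 # abc@xyz.com
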

:class:`Policy` instances are immutable, but they can be cloned, accepting the
same keyword arguments as the class constructor and returning a new
:class:`Policy` instance that is a copy of the original but with the specified
attribute values changed.

As an example, the following code could be used to read an email message from a
file on disk and pass it to the system ``sendmail`` program on a Unix system:

.. testsetup::

   from unittest import mock
   mocker = mock.patch('subprocess.Popen')
   m = mocker.start()
   proc = mock.MagicMock()
   m.return_value = proc
   proc.stdin.close.return_value = None
   mymsg = open('mymsg.txt', 'w')
   mymsg.write('To: abc@xyz.com\n\n')
   mymsg.flush()

.. doctest::

   >>> from email import message_from_binary_file
   >>> from email.generator import BytesGenerator
   >>> from email import policy
   >>> from subprocess import Popen, PIPE
   >>> with open('mymsg.txt', 'rb') as f:
   ...     msg = message_from_binary_file(f, policy=policy.default)
   >>> p = Popen(['sendmail', msg['To'].addresses[0].addr_spec], stdin=PIPE)
   >>> g = BytesGenerator(p.stdin, policy=msg.policy.clone(linesep='\r\n'))
   >>> g.flatten(msg)
   >>> p.stdin.close()
   >>> rc = p.wait()

.. testcleanup::

   mymsg.close()
   mocker.stop()
   import os
   os.remove('mymsg.txt')

Here we are telling :class:`~email.generator.BytesGenerator` to use the
RFC-correct line separator characters when creating the binary string to feed
into ``sendmail``'s ``stdin``, where the default policy would use ``\n`` line
separators.

Some email package methods accept a *policy* keyword argument, allowing the
policy to be overridden for that method.
For example, the following code uses
the :meth:`~email.message.Message.as_bytes` method of the *msg* object from
the previous example and writes the message to a file using the native line
separators for the platform on which it is running::

   >>> import os
   >>> with open('converted.txt', 'wb') as f:
   ...     f.write(msg.as_bytes(policy=msg.policy.clone(linesep=os.linesep)))
   17

Policy objects can also be combined using the addition operator, producing a
policy object whose settings are a combination of the non-default values of the
summed objects::

   >>> compat_SMTP = policy.compat32.clone(linesep='\r\n')
   >>> compat_strict = policy.compat32.clone(raise_on_defect=True)
   >>> compat_strict_SMTP = compat_SMTP + compat_strict

This operation is not commutative; that is, the order in which the objects are
added matters.  To illustrate::

   >>> policy100 = policy.compat32.clone(max_line_length=100)
   >>> policy80 = policy.compat32.clone(max_line_length=80)
   >>> apolicy = policy100 + policy80
   >>> apolicy.max_line_length
   80
   >>> apolicy = policy80 + policy100
   >>> apolicy.max_line_length
   100


.. class:: Policy(**kw)

   This is the :term:`abstract base class` for all policy classes.  It provides
   default implementations for a couple of trivial methods, as well as the
   implementation of the immutability property, the :meth:`clone` method, and
   the constructor semantics.

   The constructor of a policy class can be passed various keyword arguments.
   The arguments that may be specified are any non-method properties on this
   class, plus any additional non-method properties on the concrete class.  A
   value specified in the constructor will override the default value for the
   corresponding attribute.

   This class defines the following properties, and thus values for the
   following may be passed in the constructor of any policy class:


   .. attribute:: max_line_length

      The maximum length of any line in the serialized output, not counting the
      end of line character(s).  Default is 78, per :rfc:`5322`.  A value of
      ``0`` or :const:`None` indicates that no line wrapping should be
      done at all.


   .. attribute:: linesep

      The string to be used to terminate lines in serialized output.  The
      default is ``\n`` because that's the internal end-of-line discipline used
      by Python, though ``\r\n`` is required by the RFCs.


   .. attribute:: cte_type

      Controls the type of Content Transfer Encodings that may be or are
      required to be used.  The possible values are:

      .. tabularcolumns:: |l|L|

      ========  ===============================================================
      ``7bit``  all data must be "7 bit clean" (ASCII-only).  This means that
                where necessary data will be encoded using either
                quoted-printable or base64 encoding.

      ``8bit``  data is not constrained to be 7 bit clean.  Data in headers is
                still required to be ASCII-only and so will be encoded (see
                :meth:`fold_binary` and :attr:`~EmailPolicy.utf8` below for
                exceptions), but body parts may use the ``8bit`` CTE.
      ========  ===============================================================

      A ``cte_type`` value of ``8bit`` only works with ``BytesGenerator``, not
      ``Generator``, because strings cannot contain binary data.  If a
      ``Generator`` is operating under a policy that specifies
      ``cte_type=8bit``, it will act as if ``cte_type`` is ``7bit``.


   .. attribute:: raise_on_defect

      If :const:`True`, any defects encountered will be raised as errors.  If
      :const:`False` (the default), defects will be passed to the
      :meth:`register_defect` method.


   .. attribute:: mangle_from_

      If :const:`True`, lines starting with *"From "* in the body are
      escaped by putting a ``>`` in front of them.
      This parameter is used when the message is being serialized by a
      generator.  Default: :const:`False`.

      .. versionadded:: 3.5
         The *mangle_from_* parameter.


   .. attribute:: message_factory

      A factory function for constructing a new empty message object.  Used
      by the parser when building messages.  Defaults to ``None``, in
      which case :class:`~email.message.Message` is used.

      .. versionadded:: 3.6

   The following :class:`Policy` method is intended to be called by code using
   the email library to create policy instances with custom settings:


   .. method:: clone(**kw)

      Return a new :class:`Policy` instance whose attributes have the same
      values as the current instance, except where those attributes are
      given new values by the keyword arguments.


   The remaining :class:`Policy` methods are called by the email package code,
   and are not intended to be called by an application using the email package.
   A custom policy must implement all of these methods.


   .. method:: handle_defect(obj, defect)

      Handle a *defect* found on *obj*.  When the email package calls this
      method, *defect* will always be an instance of a subclass of
      :class:`~email.errors.Defect`.

      The default implementation checks the :attr:`raise_on_defect` flag.  If
      it is ``True``, *defect* is raised as an exception.  If it is ``False``
      (the default), *obj* and *defect* are passed to :meth:`register_defect`.


   .. method:: register_defect(obj, defect)

      Register a *defect* on *obj*.  In the email package, *defect* will always
      be an instance of a subclass of :class:`~email.errors.Defect`.

      The default implementation calls the ``append`` method of the ``defects``
      attribute of *obj*.  When the email package calls :meth:`handle_defect`,
      *obj* will normally have a ``defects`` attribute that has an ``append``
      method.
      Custom object types used with the email package (for example,
      custom ``Message`` objects) should also provide such an attribute;
      otherwise, defects in parsed messages will raise unexpected errors.


   .. method:: header_max_count(name)

      Return the maximum allowed number of headers named *name*.

      Called when a header is added to an :class:`~email.message.EmailMessage`
      or :class:`~email.message.Message` object.  If the returned value is not
      ``0`` or ``None``, and there are already a number of headers with the
      name *name* greater than or equal to the value returned, a
      :exc:`ValueError` is raised.

      Because the default behavior of ``Message.__setitem__`` is to append the
      value to the list of headers, it is easy to create duplicate headers
      without realizing it.  This method allows certain headers to be limited
      in the number of instances of that header that may be added to a
      ``Message`` programmatically.  (The limit is not observed by the parser,
      which will faithfully produce as many headers as exist in the message
      being parsed.)

      The default implementation returns ``None`` for all header names.


   .. method:: header_source_parse(sourcelines)

      The email package calls this method with a list of strings, each string
      ending with the line separation characters found in the source being
      parsed.  The first line includes the field header name and separator.
      All whitespace in the source is preserved.  The method should return the
      ``(name, value)`` tuple that is to be stored in the ``Message`` to
      represent the parsed header.

      If an implementation wishes to retain compatibility with the existing
      email package policies, *name* should be the case-preserved name (all
      characters up to the '``:``' separator), while *value* should be the
      unfolded value (all line separator characters removed, but whitespace
      kept intact), stripped of leading whitespace.

      *sourcelines* may contain surrogateescaped binary data.

      There is no default implementation.


   .. method:: header_store_parse(name, value)

      The email package calls this method with the name and value provided by
      the application program when the application program is modifying a
      ``Message`` programmatically (as opposed to a ``Message`` created by a
      parser).  The method should return the ``(name, value)`` tuple that is to
      be stored in the ``Message`` to represent the header.

      If an implementation wishes to retain compatibility with the existing
      email package policies, the *name* and *value* should be strings or
      string subclasses that do not change the content of the passed-in
      arguments.

      There is no default implementation.


   .. method:: header_fetch_parse(name, value)

      The email package calls this method with the *name* and *value* currently
      stored in the ``Message`` when that header is requested by the
      application program, and whatever the method returns is what is passed
      back to the application as the value of the header being retrieved.
      Note that there may be more than one header with the same name stored in
      the ``Message``; the method is passed the specific name and value of the
      header destined to be returned to the application.

      *value* may contain surrogateescaped binary data.  There should be no
      surrogateescaped binary data in the value returned by the method.

      There is no default implementation.


   .. method:: fold(name, value)

      The email package calls this method with the *name* and *value* currently
      stored in the ``Message`` for a given header.  The method should return a
      string that represents that header "folded" correctly (according to the
      policy settings) by composing the *name* with the *value* and inserting
      :attr:`linesep` characters at the appropriate places.
      See :rfc:`5322` for a discussion of the rules for folding email headers.

      *value* may contain surrogateescaped binary data.  There should be no
      surrogateescaped binary data in the string returned by the method.


   .. method:: fold_binary(name, value)

      The same as :meth:`fold`, except that the returned value should be a
      bytes object rather than a string.

      *value* may contain surrogateescaped binary data.  These could be
      converted back into binary data in the returned bytes object.



.. class:: EmailPolicy(**kw)

   This concrete :class:`Policy` provides behavior that is intended to be fully
   compliant with the current email RFCs.  These include (but are not limited
   to) :rfc:`5322`, :rfc:`2047`, and the current MIME RFCs.

   This policy adds new header parsing and folding algorithms.  Instead of
   simple strings, headers are ``str`` subclasses with attributes that depend
   on the type of the field.  The parsing and folding algorithms fully implement
   :rfc:`2047` and :rfc:`5322`.

   The default value for the :attr:`~email.policy.Policy.message_factory`
   attribute is :class:`~email.message.EmailMessage`.

   In addition to the settable attributes listed above that apply to all
   policies, this policy adds the following additional attributes:

   .. versionadded:: 3.6 [1]_


   .. attribute:: utf8

      If ``False``, follow :rfc:`5322`, supporting non-ASCII characters in
      headers by encoding them as "encoded words".  If ``True``, follow
      :rfc:`6532` and use ``utf-8`` encoding for headers.  Messages
      formatted in this way may be passed to SMTP servers that support
      the ``SMTPUTF8`` extension (:rfc:`6531`).
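
      A minimal sketch of the difference, assuming an invented non-ASCII
      subject: the pre-defined :data:`SMTP` and :data:`SMTPUTF8` instances
      differ only in ``utf8``, so serializing one message with each shows the
      two header encodings side by side.

      .. code-block:: python

         from email import policy
         from email.message import EmailMessage

         msg = EmailMessage(policy=policy.default)
         msg['Subject'] = 'Grüße'  # invented non-ASCII subject
         msg.set_content('Hello\n')

         ascii_form = msg.as_bytes(policy=policy.SMTP)     # utf8=False
         utf8_form = msg.as_bytes(policy=policy.SMTPUTF8)  # utf8=True

         # utf8=False: the subject becomes an RFC 2047 encoded word.
         print(b'=?utf-8?' in ascii_form)                      # True
         # utf8=True: the subject is written as raw UTF-8 (RFC 6532).
         print('Subject: Grüße'.encode('utf-8') in utf8_form)  # True
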

   .. attribute:: refold_source

      If the value for a header in the ``Message`` object originated from a
      :mod:`~email.parser` (as opposed to being set by a program), this
      attribute indicates whether or not a generator should refold that value
      when transforming the message back into serialized form.  The possible
      values are:

      ========  ===============================================================
      ``none``  all source values use original folding

      ``long``  source values that have any line that is longer than
                ``max_line_length`` will be refolded

      ``all``   all values are refolded.
      ========  ===============================================================

      The default is ``long``.


   .. attribute:: header_factory

      A callable that takes two arguments, ``name`` and ``value``, where
      ``name`` is a header field name and ``value`` is an unfolded header field
      value, and returns a string subclass that represents that header.  A
      default ``header_factory`` (see :mod:`~email.headerregistry`) is provided
      that supports custom parsing for the various address and date :RFC:`5322`
      header field types, and the major MIME header field types.  Support for
      additional custom parsing will be added in the future.


   .. attribute:: content_manager

      An object with at least two methods: ``get_content`` and
      ``set_content``.  When the :meth:`~email.message.EmailMessage.get_content`
      or :meth:`~email.message.EmailMessage.set_content` method of an
      :class:`~email.message.EmailMessage` object is called, it calls the
      corresponding method of this object, passing it the message object as its
      first argument, and any arguments or keywords that were passed to it as
      additional arguments.  By default ``content_manager`` is set to
      :data:`~email.contentmanager.raw_data_manager`.

      .. versionadded:: 3.4


   The class provides the following concrete implementations of the abstract
   methods of :class:`Policy`:


   .. method:: header_max_count(name)

      Returns the value of the
      :attr:`~email.headerregistry.BaseHeader.max_count` attribute of the
      specialized class used to represent the header with the given name.


   .. method:: header_source_parse(sourcelines)

      The name is parsed as everything up to the '``:``' and returned
      unmodified.  The value is determined by stripping leading whitespace off
      the remainder of the first line, joining all subsequent lines together,
      and stripping any trailing carriage return or linefeed characters.


   .. method:: header_store_parse(name, value)

      The name is returned unchanged.  If the input value has a ``name``
      attribute and it matches *name* ignoring case, the value is returned
      unchanged.  Otherwise the *name* and *value* are passed to
      ``header_factory``, and the resulting header object is returned as
      the value.  In this case a ``ValueError`` is raised if the input value
      contains CR or LF characters.


   .. method:: header_fetch_parse(name, value)

      If the value has a ``name`` attribute, it is returned unmodified.
      Otherwise the *name*, and the *value* with any CR or LF characters
      removed, are passed to the ``header_factory``, and the resulting
      header object is returned.  Any surrogateescaped bytes get turned into
      the Unicode unknown-character glyph.


   .. method:: fold(name, value)

      Header folding is controlled by the :attr:`refold_source` policy setting.
      A value is considered to be a 'source value' if and only if it does not
      have a ``name`` attribute (having a ``name`` attribute means it is a
      header object of some sort).
      If a source value needs to be refolded
      according to the policy, it is converted into a header object by
      passing the *name* and the *value* with any CR and LF characters removed
      to the ``header_factory``.  Folding of a header object is done by
      calling its ``fold`` method with the current policy.

      Source values are split into lines using :meth:`~str.splitlines`.  If
      the value is not to be refolded, the lines are rejoined using the
      ``linesep`` from the policy and returned.  The exception is lines
      containing non-ASCII binary data.  In that case the value is refolded
      regardless of the ``refold_source`` setting, which causes the binary data
      to be CTE encoded using the ``unknown-8bit`` charset.


   .. method:: fold_binary(name, value)

      The same as :meth:`fold` if :attr:`~Policy.cte_type` is ``7bit``, except
      that the returned value is bytes.

      If :attr:`~Policy.cte_type` is ``8bit``, non-ASCII binary data is
      converted back into bytes.  Headers with binary data are not refolded,
      regardless of the :attr:`refold_source` setting, since there is no way
      to know whether the binary data consists of single byte characters or
      multibyte characters.


The following instances of :class:`EmailPolicy` provide defaults suitable for
specific application domains.  Note that in the future the behavior of these
instances (in particular the ``HTTP`` instance) may be adjusted to conform even
more closely to the RFCs relevant to their domains.


.. data:: default

   An instance of ``EmailPolicy`` with all defaults unchanged.  This policy
   uses the standard Python ``\n`` line endings rather than the RFC-correct
   ``\r\n``.


.. data:: SMTP

   Suitable for serializing messages in conformance with the email RFCs.
   Like ``default``, but with ``linesep`` set to ``\r\n``, which is RFC
   compliant.
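
A minimal sketch of how :data:`default` and :data:`SMTP` differ in practice,
assuming a small message built in memory (the address and body text are
invented for illustration): serializing the same message under each policy
changes the line terminators in the output bytes.

.. code-block:: python

   from email import policy
   from email.message import EmailMessage

   msg = EmailMessage(policy=policy.default)
   msg['To'] = 'abc@xyz.com'
   msg.set_content('Hello\n')

   unix_bytes = msg.as_bytes()                    # uses msg.policy (default)
   smtp_bytes = msg.as_bytes(policy=policy.SMTP)  # RFC-style CRLF endings

   print(b'\r\n' in unix_bytes)                           # False
   print(smtp_bytes.startswith(b'To: abc@xyz.com\r\n'))   # True
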

.. data:: SMTPUTF8

   The same as ``SMTP`` except that :attr:`~EmailPolicy.utf8` is ``True``.
   Useful for serializing messages to a message store without using encoded
   words in the headers.  Should only be used for SMTP transmission if the
   sender or recipient addresses have non-ASCII characters (the
   :meth:`smtplib.SMTP.send_message` method handles this automatically).


.. data:: HTTP

   Suitable for serializing headers for use in HTTP traffic.  Like
   ``SMTP`` except that ``max_line_length`` is set to ``None`` (unlimited).


.. data:: strict

   Convenience instance.  The same as ``default`` except that
   ``raise_on_defect`` is set to ``True``.  This allows any policy to be made
   strict by writing::

      somepolicy + policy.strict


With all of these :class:`EmailPolicies <.EmailPolicy>`, the effective API of
the email package is changed from the Python 3.2 API in the following ways:

* Setting a header on a :class:`~email.message.Message` results in that
  header being parsed and a header object created.

* Fetching a header value from a :class:`~email.message.Message` results
  in that header being parsed and a header object created and
  returned.

* Any header object, or any header that is refolded due to the
  policy settings, is folded using an algorithm that fully implements the
  RFC folding algorithms, including knowing where encoded words are required
  and allowed.

From the application view, this means that any header obtained through the
:class:`~email.message.EmailMessage` is a header object with extra
attributes, whose string value is the fully decoded Unicode value of the
header.  Likewise, a header may be assigned a new value, or a new header
created, using a Unicode string, and the policy will take care of converting
the Unicode string into the correct RFC encoded form.

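
A minimal sketch of this behavior under :data:`default`, assuming invented
sample addresses and dates: header values are assigned as plain Unicode
strings, read back as header objects carrying extra attributes, and encoded by
the policy only when the message is serialized.

.. code-block:: python

   from email import policy
   from email.message import EmailMessage

   msg = EmailMessage(policy=policy.default)
   msg['From'] = 'Zoë Example <zoe@example.com>'     # invented address
   msg['Date'] = 'Tue, 01 Oct 2024 10:00:00 +0000'   # invented date

   sender = msg['From']                    # an address header object
   print(str(sender))                      # Zoë Example <zoe@example.com>
   print(sender.addresses[0].display_name) # Zoë Example
   print(msg['Date'].datetime.year)        # 2024

   # On output the policy turns the non-ASCII display name into an
   # RFC 2047 encoded word.
   print(b'=?utf-8?' in msg.as_bytes())    # True
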

The header objects and their attributes are described in
:mod:`~email.headerregistry`.



.. class:: Compat32(**kw)

   This concrete :class:`Policy` is the backward compatibility policy.  It
   replicates the behavior of the email package in Python 3.2.  The
   :mod:`~email.policy` module also defines an instance of this class,
   :const:`compat32`, that is used as the default policy.  Thus the default
   behavior of the email package is to maintain compatibility with Python 3.2.

   The following attributes have values that are different from the
   :class:`Policy` default:


   .. attribute:: mangle_from_

      The default is ``True``.


   The class provides the following concrete implementations of the
   abstract methods of :class:`Policy`:


   .. method:: header_source_parse(sourcelines)

      The name is parsed as everything up to the '``:``' and returned
      unmodified.  The value is determined by stripping leading whitespace off
      the remainder of the first line, joining all subsequent lines together,
      and stripping any trailing carriage return or linefeed characters.


   .. method:: header_store_parse(name, value)

      The name and value are returned unmodified.


   .. method:: header_fetch_parse(name, value)

      If the value contains binary data, it is converted into a
      :class:`~email.header.Header` object using the ``unknown-8bit`` charset.
      Otherwise it is returned unmodified.


   .. method:: fold(name, value)

      Headers are folded using the :class:`~email.header.Header` folding
      algorithm, which preserves existing line breaks in the value, and wraps
      each resulting line to the ``max_line_length``.  Non-ASCII binary data are
      CTE encoded using the ``unknown-8bit`` charset.


   .. method:: fold_binary(name, value)

      Headers are folded using the :class:`~email.header.Header` folding
      algorithm, which preserves existing line breaks in the value, and wraps
      each resulting line to the ``max_line_length``.  If ``cte_type`` is
      ``7bit``, non-ASCII binary data is CTE encoded using the ``unknown-8bit``
      charset.  Otherwise the original source header is used, with its existing
      line breaks and any (RFC invalid) binary data it may contain.


.. data:: compat32

   An instance of :class:`Compat32`, providing backward compatibility with the
   behavior of the email package in Python 3.2.


.. rubric:: Footnotes

.. [1] Originally added in 3.3 as a :term:`provisional feature <provisional
   package>`.