:mod:`!email.policy`: Policy Objects
------------------------------------

.. module:: email.policy
   :synopsis: Controlling the parsing and generating of messages

.. moduleauthor:: R. David Murray <rdmurray@bitdance.com>
.. sectionauthor:: R. David Murray <rdmurray@bitdance.com>

.. versionadded:: 3.3

**Source code:** :source:`Lib/email/policy.py`

--------------

The :mod:`email` package's prime focus is the handling of email messages as
described by the various email and MIME RFCs.  However, the general format of
email messages (a block of header fields each consisting of a name followed by
a colon followed by a value, the whole block followed by a blank line and an
arbitrary 'body') has found utility outside of the realm of email.  Some of
these uses conform fairly closely to the main email RFCs; others do not.  Even
when working with email, there are times when it is desirable to break strict
compliance with the RFCs, such as generating emails that interoperate with
email servers that do not themselves follow the standards, or that implement
extensions you want to use in ways that violate the standards.

Policy objects give the email package the flexibility to handle all these
disparate use cases.

A :class:`Policy` object encapsulates a set of attributes and methods that
control the behavior of various components of the email package during use.
:class:`Policy` instances can be passed to various classes and methods in the
email package to alter the default behavior.  The settable values and their
defaults are described below.

Every class in the email package that uses a policy has a default one.  For
all of the :mod:`~email.parser` classes and the related convenience functions,
and for the :class:`~email.message.Message` class, this is the
:class:`Compat32` policy, via its corresponding pre-defined instance
:const:`compat32`.  This policy provides for complete backward compatibility
(in some cases, including bug compatibility) with the pre-Python 3.3 version
of the email package.

The default value for the *policy* keyword to
:class:`~email.message.EmailMessage` is the :class:`EmailPolicy` policy, via
its pre-defined instance :data:`~default`.

When a :class:`~email.message.Message` or :class:`~email.message.EmailMessage`
object is created, it acquires a policy.  If the message is created by a
:mod:`~email.parser`, a policy passed to the parser will be the policy used by
the message it creates.  If the message is created by the program, then the
policy can be specified when it is created.  When a message is passed to a
:mod:`~email.generator`, the generator uses the policy from the message by
default, but you can also pass a specific policy to the generator that will
override the one stored on the message object.

The default value for the *policy* keyword for the :mod:`email.parser` classes
and the parser convenience functions **will be changing** in a future version of
Python.  Therefore you should **always specify explicitly which policy you want
to use** when calling any of the classes and functions described in the
:mod:`~email.parser` module.

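As a minimal illustration of why the choice matters, the sketch below parses
the same message text twice with :func:`~email.message_from_string`, once
naming :data:`~default` and once naming :const:`compat32`.  The message text
is hypothetical; only the documented policy objects are assumed:

.. code-block:: python

   from email import message_from_string, policy
   from email.message import EmailMessage, Message

   # Hypothetical message text used only for this illustration.
   raw = 'Subject: Greetings\nTo: jane@example.com\n\nHello!\n'

   # Name the policy explicitly rather than relying on the parser default.
   modern = message_from_string(raw, policy=policy.default)
   legacy = message_from_string(raw, policy=policy.compat32)

   print(type(modern).__name__, type(legacy).__name__)  # EmailMessage Message
   print(type(legacy['Subject']) is str)   # True: compat32 returns plain strings
   print(modern['Subject'].name)           # Subject: default returns header objects
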
The first part of this documentation covers the features of :class:`Policy`, an
:term:`abstract base class` that defines the features that are common to all
policy objects, including :const:`compat32`.  This includes certain hook
methods that are called internally by the email package, which a custom policy
could override to obtain different behavior.  The second part describes the
concrete classes :class:`EmailPolicy` and :class:`Compat32`, which implement
the hooks that provide the standard behavior and the backward compatible
behavior and features, respectively.

:class:`Policy` instances are immutable, but they can be cloned:
:meth:`~Policy.clone` accepts the same keyword arguments as the class
constructor and returns a new :class:`Policy` instance that is a copy of the
original but with the specified attribute values changed.

As an example, the following code could be used to read an email message from a
file on disk and pass it to the system ``sendmail`` program on a Unix system:

.. testsetup::

   from unittest import mock
   mocker = mock.patch('subprocess.Popen')
   m = mocker.start()
   proc = mock.MagicMock()
   m.return_value = proc
   proc.stdin.close.return_value = None
   mymsg = open('mymsg.txt', 'w')
   mymsg.write('To: abc@xyz.com\n\n')
   mymsg.flush()

.. doctest::

   >>> from email import message_from_binary_file
   >>> from email.generator import BytesGenerator
   >>> from email import policy
   >>> from subprocess import Popen, PIPE
   >>> with open('mymsg.txt', 'rb') as f:
   ...     msg = message_from_binary_file(f, policy=policy.default)
   ...
   >>> p = Popen(['sendmail', msg['To'].addresses[0].addr_spec], stdin=PIPE)
   >>> g = BytesGenerator(p.stdin, policy=msg.policy.clone(linesep='\r\n'))
   >>> g.flatten(msg)
   >>> p.stdin.close()
   >>> rc = p.wait()

.. testcleanup::

   mymsg.close()
   mocker.stop()
   import os
   os.remove('mymsg.txt')

Here we are telling :class:`~email.generator.BytesGenerator` to use the
RFC-correct line separator characters when creating the binary string to feed
into ``sendmail``'s ``stdin``, where the default policy would use ``\n`` line
separators.

Some email package methods accept a *policy* keyword argument, allowing the
policy to be overridden for that method.  For example, the following code uses
the :meth:`~email.message.Message.as_bytes` method of the *msg* object from
the previous example and writes the message to a file using the native line
separators for the platform on which it is running::

   >>> import os
   >>> with open('converted.txt', 'wb') as f:
   ...     f.write(msg.as_bytes(policy=msg.policy.clone(linesep=os.linesep)))
   17

Policy objects can also be combined using the addition operator, producing a
policy object whose settings are a combination of the non-default values of the
summed objects::

   >>> compat_SMTP = policy.compat32.clone(linesep='\r\n')
   >>> compat_strict = policy.compat32.clone(raise_on_defect=True)
   >>> compat_strict_SMTP = compat_SMTP + compat_strict

This operation is not commutative; that is, the order in which the objects are
added matters: when both operands set the same attribute, the value from the
right-hand operand wins.  To illustrate::

   >>> policy100 = policy.compat32.clone(max_line_length=100)
   >>> policy80 = policy.compat32.clone(max_line_length=80)
   >>> apolicy = policy100 + policy80
   >>> apolicy.max_line_length
   80
   >>> apolicy = policy80 + policy100
   >>> apolicy.max_line_length
   100


.. class:: Policy(**kw)

   This is the :term:`abstract base class` for all policy classes.  It provides
   default implementations for a couple of trivial methods, as well as the
   implementation of the immutability property, the :meth:`clone` method, and
   the constructor semantics.

   The constructor of a policy class can be passed various keyword arguments.
   The arguments that may be specified are any non-method properties on this
   class, plus any additional non-method properties on the concrete class.  A
   value specified in the constructor will override the default value for the
   corresponding attribute.

   This class defines the following properties, and thus values for the
   following may be passed in the constructor of any policy class:


   .. attribute:: max_line_length

      The maximum length of any line in the serialized output, not counting the
      end of line character(s).  Default is 78, per :rfc:`5322`.  A value of
      ``0`` or :const:`None` indicates that no line wrapping should be
      done at all.


   .. attribute:: linesep

      The string to be used to terminate lines in serialized output.  The
      default is ``\n`` because that's the internal end-of-line discipline used
      by Python, though ``\r\n`` is required by the RFCs.

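      A minimal sketch of the difference, assuming a message built in code
      (the subject and body text are placeholders).  The same
      :class:`~email.message.EmailMessage` is serialized with the ``default``
      instance and with the ``SMTP`` instance, which differs only in using
      ``\r\n``:

      .. code-block:: python

         from email import policy
         from email.message import EmailMessage

         msg = EmailMessage()                  # uses policy.default
         msg['Subject'] = 'Line endings'
         msg.set_content('First line\nSecond line\n')

         native = msg.as_bytes()                      # '\n' line endings
         wire = msg.as_bytes(policy=policy.SMTP)      # '\r\n' line endings

         print(b'\r\n' in native)                     # False
         # Every line feed in the SMTP output is preceded by a carriage return.
         print(b'\n' in wire.replace(b'\r\n', b''))   # False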

   .. attribute:: cte_type

      Controls the type of Content Transfer Encodings that may be or are
      required to be used.  The possible values are:

      .. tabularcolumns:: |l|L|

      ========  ===============================================================
      ``7bit``  all data must be "7 bit clean" (ASCII-only).  This means that
                where necessary data will be encoded using either
                quoted-printable or base64 encoding.

      ``8bit``  data is not constrained to be 7 bit clean.  Data in headers is
                still required to be ASCII-only and so will be encoded (see
                :meth:`fold_binary` and :attr:`~EmailPolicy.utf8` below for
                exceptions), but body parts may use the ``8bit`` CTE.
      ========  ===============================================================

      A ``cte_type`` value of ``8bit`` only works with ``BytesGenerator``, not
      ``Generator``, because strings cannot contain binary data.  If a
      ``Generator`` is operating under a policy that specifies
      ``cte_type=8bit``, it will act as if ``cte_type`` is ``7bit``.

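      For example, with the default content manager, which picks a transfer
      encoding heuristically when none is requested, a non-ASCII text body
      may use ``8bit`` under the default ``cte_type`` but is re-encoded when
      ``cte_type`` is ``7bit``.  This is a sketch of observed behavior, not a
      promise about which 7-bit encoding is selected:

      .. code-block:: python

         from email import policy
         from email.message import EmailMessage

         def body_cte(pol):
             """Return the CTE chosen by set_content() for a non-ASCII body."""
             msg = EmailMessage(policy=pol)
             msg.set_content('Café au lait\n')   # 'é' is not ASCII
             return str(msg['Content-Transfer-Encoding'])

         print(body_cte(policy.default))         # 8bit
         # With cte_type='7bit' the body must be ASCII-only, so
         # quoted-printable or base64 is used instead.
         print(body_cte(policy.default.clone(cte_type='7bit')))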

   .. attribute:: raise_on_defect

      If :const:`True`, any defects encountered will be raised as errors.  If
      :const:`False` (the default), defects will be passed to the
      :meth:`register_defect` method.

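      A minimal sketch of both behaviors, using a deliberately malformed,
      made-up message (a ``multipart`` type with no ``boundary`` parameter).
      ``strict`` is the ``default`` instance with ``raise_on_defect=True``:

      .. code-block:: python

         from email import errors, message_from_string, policy

         # Claims to be multipart but declares no boundary: a parse defect.
         raw = 'Content-Type: multipart/mixed\n\nno parts here\n'

         # raise_on_defect=False: the defect is recorded and parsing goes on.
         msg = message_from_string(raw, policy=policy.default)
         print([type(d).__name__ for d in msg.defects])

         # raise_on_defect=True: the same defect is raised instead.
         try:
             message_from_string(raw, policy=policy.strict)
         except errors.NoBoundaryInMultipartDefect as exc:
             print('raised', type(exc).__name__)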

   .. attribute:: mangle_from_

      If :const:`True`, lines starting with *"From "* in the body are
      escaped by putting a ``>`` in front of them. This parameter is used when
      the message is being serialized by a generator.
      Default: :const:`False`.

      .. versionadded:: 3.5

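      A minimal sketch, assuming a :class:`~email.generator.Generator` that
      writes to an in-memory buffer.  When the generator is given a *policy*
      but no explicit *mangle_from_* argument, it takes the setting from the
      policy:

      .. code-block:: python

         import io
         from email import policy
         from email.generator import Generator
         from email.message import EmailMessage

         msg = EmailMessage()
         msg['Subject'] = 'Mbox safety'
         msg.set_content('Hello\nFrom here on, things change.\n')

         def flatten(pol):
             """Serialize msg with a Generator that reads mangle_from_ from pol."""
             buf = io.StringIO()
             Generator(buf, policy=pol).flatten(msg)
             return buf.getvalue()

         plain = flatten(policy.default)                     # mangle_from_ False
         mangled = flatten(policy.default.clone(mangle_from_=True))
         print('>From here on' in plain, '>From here on' in mangled)  # False True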

   .. attribute:: message_factory

      A factory function for constructing a new empty message object.  Used
      by the parser when building messages.  Defaults to ``None``, in
      which case :class:`~email.message.Message` is used.

      .. versionadded:: 3.6

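      A sketch of plugging in a custom message class.  ``TaggedMessage`` is a
      hypothetical :class:`~email.message.EmailMessage` subclass invented for
      this example; the parser instantiates whatever ``message_factory`` the
      policy names:

      .. code-block:: python

         from email import message_from_string, policy
         from email.message import EmailMessage

         class TaggedMessage(EmailMessage):
             """Hypothetical subclass adding one convenience method."""

             def subject_or_default(self, default='(no subject)'):
                 return self.get('Subject', default)

         tagged_policy = policy.default.clone(message_factory=TaggedMessage)
         msg = message_from_string('Subject: Hi\n\nBody\n', policy=tagged_policy)

         print(type(msg).__name__)          # TaggedMessage
         print(msg.subject_or_default())    # Hi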

   .. attribute:: verify_generated_headers

      If ``True`` (the default), the generator will raise
      :exc:`~email.errors.HeaderWriteError` instead of writing a header
      that is improperly folded or delimited, such that it would
      be parsed as multiple headers or joined with adjacent data.
      Such headers can be generated by custom header classes or bugs
      in the ``email`` module.

      As it's a security feature, this defaults to ``True`` even in the
      :class:`~email.policy.Compat32` policy.
      For backwards compatible, but unsafe, behavior, it must be set to
      ``False`` explicitly.

      .. versionadded:: 3.13


   The following :class:`Policy` method is intended to be called by code using
   the email library to create policy instances with custom settings:


   .. method:: clone(**kw)

      Return a new :class:`Policy` instance whose attributes have the same
      values as the current instance, except where those attributes are
      given new values by the keyword arguments.

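      A short sketch of cloning, and of the immutability that makes cloning
      necessary, using the pre-defined ``SMTP`` instance:

      .. code-block:: python

         from email import policy

         wide = policy.SMTP.clone(max_line_length=120)
         print(wide.max_line_length, wide.linesep == '\r\n')  # 120 True
         print(policy.SMTP.max_line_length)                   # 78 (unchanged)

         # Policy instances are immutable: assignment raises AttributeError.
         try:
             policy.SMTP.max_line_length = 100
         except AttributeError as exc:
             print('AttributeError:', exc)

         # clone() rejects names that are not existing attributes.
         try:
             policy.SMTP.clone(no_such_setting=1)
         except TypeError as exc:
             print('TypeError:', exc)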

   The remaining :class:`Policy` methods are called by the email package code,
   and are not intended to be called by an application using the email package.
   A custom policy must implement all of these methods.


   .. method:: handle_defect(obj, defect)

      Handle a *defect* found on *obj*.  When the email package calls this
      method, *defect* will always be a subclass of
      :class:`~email.errors.MessageDefect`.

      The default implementation checks the :attr:`raise_on_defect` flag.  If
      it is ``True``, *defect* is raised as an exception.  If it is ``False``
      (the default), *obj* and *defect* are passed to :meth:`register_defect`.


   .. method:: register_defect(obj, defect)

      Register a *defect* on *obj*.  In the email package, *defect* will always
      be a subclass of :class:`~email.errors.MessageDefect`.

      The default implementation calls the ``append`` method of the ``defects``
      attribute of *obj*.  When the email package calls :meth:`handle_defect`,
      *obj* will normally have a ``defects`` attribute that has an ``append``
      method.  Custom object types used with the email package (for example,
      custom ``Message`` objects) should also provide such an attribute;
      otherwise defects in parsed messages will raise unexpected errors.

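      As a sketch of overriding one of these hooks, the hypothetical
      ``LoggingPolicy`` below records each defect's class name before
      delegating to the default :meth:`register_defect`, so ``obj.defects``
      is still populated.  Because policy instances are immutable, the log is
      kept in a module-level list, purely for illustration:

      .. code-block:: python

         from email import message_from_string
         from email.policy import EmailPolicy

         defect_log = []   # hypothetical sink; a real application might use logging

         class LoggingPolicy(EmailPolicy):
             """Hypothetical policy that records defects before registering them."""

             def register_defect(self, obj, defect):
                 defect_log.append(type(defect).__name__)
                 super().register_defect(obj, defect)   # keeps obj.defects populated

         raw = 'Content-Type: multipart/mixed\n\nno boundary declared\n'
         msg = message_from_string(raw, policy=LoggingPolicy())
         print(defect_log)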

   .. method:: header_max_count(name)

      Return the maximum allowed number of headers named *name*.

      Called when a header is added to an :class:`~email.message.EmailMessage`
      or :class:`~email.message.Message` object.  If the returned value is not
      ``0`` or ``None``, and there are already a number of headers with the
      name *name* greater than or equal to the value returned, a
      :exc:`ValueError` is raised.

      Because the default behavior of ``Message.__setitem__`` is to append the
      value to the list of headers, it is easy to create duplicate headers
      without realizing it.  This method allows certain headers to be limited
      in the number of instances of that header that may be added to a
      ``Message`` programmatically.  (The limit is not observed by the parser,
      which will faithfully produce as many headers as exist in the message
      being parsed.)

      The default implementation returns ``None`` for all header names.


   .. method:: header_source_parse(sourcelines)

      The email package calls this method with a list of strings, each string
      ending with the line separation characters found in the source being
      parsed.  The first line includes the field header name and separator.
      All whitespace in the source is preserved.  The method should return the
      ``(name, value)`` tuple that is to be stored in the ``Message`` to
      represent the parsed header.

      If an implementation wishes to retain compatibility with the existing
      email package policies, *name* should be the case-preserved name (all
      characters up to the '``:``' separator), while *value* should be the
      unfolded value (all line separator characters removed, but whitespace
      kept intact), stripped of leading whitespace.

      *sourcelines* may contain surrogateescaped binary data.

      There is no default implementation.


   .. method:: header_store_parse(name, value)

      The email package calls this method with the name and value provided by
      the application program when the application program is modifying a
      ``Message`` programmatically (as opposed to a ``Message`` created by a
      parser).  The method should return the ``(name, value)`` tuple that is to
      be stored in the ``Message`` to represent the header.

      If an implementation wishes to retain compatibility with the existing
      email package policies, the *name* and *value* should be strings or
      string subclasses that do not change the content of the passed in
      arguments.

      There is no default implementation.


   .. method:: header_fetch_parse(name, value)

      The email package calls this method with the *name* and *value* currently
      stored in the ``Message`` when that header is requested by the
      application program, and whatever the method returns is what is passed
      back to the application as the value of the header being retrieved.
      Note that there may be more than one header with the same name stored in
      the ``Message``; the method is passed the specific name and value of the
      header destined to be returned to the application.

      *value* may contain surrogateescaped binary data.  There should be no
      surrogateescaped binary data in the value returned by the method.

      There is no default implementation.


   .. method:: fold(name, value)

      The email package calls this method with the *name* and *value* currently
      stored in the ``Message`` for a given header.  The method should return a
      string that represents that header "folded" correctly (according to the
      policy settings) by composing the *name* with the *value* and inserting
      :attr:`linesep` characters at the appropriate places.  See :rfc:`5322`
      for a discussion of the rules for folding email headers.

      *value* may contain surrogateescaped binary data.  There should be no
      surrogateescaped binary data in the string returned by the method.


   .. method:: fold_binary(name, value)

      The same as :meth:`fold`, except that the returned value should be a
      bytes object rather than a string.

      *value* may contain surrogateescaped binary data.  These could be
      converted back into binary data in the returned bytes object.



.. class:: EmailPolicy(**kw)

   This concrete :class:`Policy` provides behavior that is intended to be fully
   compliant with the current email RFCs.  These include (but are not limited
   to) :rfc:`5322`, :rfc:`2047`, and the current MIME RFCs.

   This policy adds new header parsing and folding algorithms.  Instead of
   simple strings, headers are ``str`` subclasses with attributes that depend
   on the type of the field.  The parsing and folding algorithms fully
   implement :rfc:`2047` and :rfc:`5322`.

   The default value for the :attr:`~email.policy.Policy.message_factory`
   attribute is :class:`~email.message.EmailMessage`.

   In addition to the settable attributes listed above that apply to all
   policies, this policy adds the following additional attributes:

   .. versionadded:: 3.6 [1]_


   .. attribute:: utf8

      If ``False``, follow :rfc:`5322`, supporting non-ASCII characters in
      headers by encoding them as "encoded words".  If ``True``, follow
      :rfc:`6532` and use ``utf-8`` encoding for headers.  Messages
      formatted in this way may be passed to SMTP servers that support
      the ``SMTPUTF8`` extension (:rfc:`6531`).

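      A minimal sketch contrasting the two settings through the pre-defined
      ``SMTP`` (``utf8=False``) and ``SMTPUTF8`` (``utf8=True``) instances;
      the subject text is a placeholder:

      .. code-block:: python

         from email import policy
         from email.message import EmailMessage

         msg = EmailMessage()
         msg['Subject'] = 'Café menu'

         encoded = msg.as_bytes(policy=policy.SMTP)       # RFC 2047 encoded word
         raw_utf8 = msg.as_bytes(policy=policy.SMTPUTF8)  # UTF-8 bytes in the header

         print('Café'.encode('utf-8') in encoded)    # False: only ASCII on the wire
         print('Café'.encode('utf-8') in raw_utf8)   # True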

   .. attribute:: refold_source

      If the value for a header in the ``Message`` object originated from a
      :mod:`~email.parser` (as opposed to being set by a program), this
      attribute indicates whether or not a generator should refold that value
      when transforming the message back into serialized form.  The possible
      values are:

      ========  ===============================================================
      ``none``  all source values use original folding

      ``long``  source values that have any line that is longer than
                ``max_line_length`` will be refolded

      ``all``   all values are refolded.
      ========  ===============================================================

      The default is ``long``.

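      A minimal sketch of the ``long`` and ``none`` settings, assuming a
      parsed message whose (made-up) ``Subject`` source line is well over 78
      characters:

      .. code-block:: python

         from email import message_from_string, policy

         # A parsed header whose single source line is far longer than 78 chars.
         long_subject = ' '.join(['word'] * 30)
         msg = message_from_string('Subject: ' + long_subject + '\n\nbody\n',
                                   policy=policy.default)

         def header_lines(pol):
             """Return the serialized header block of msg as a list of lines."""
             text = msg.as_string(policy=pol)
             return text.split('\n\n', 1)[0].splitlines()

         refolded = header_lines(policy.default)                      # 'long'
         untouched = header_lines(policy.default.clone(refold_source='none'))

         print(len(refolded) > 1, max(len(line) for line in refolded) <= 78)  # True True
         print(len(untouched))                                        # 1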

   .. attribute:: header_factory

      A callable that takes two arguments, ``name`` and ``value``, where
      ``name`` is a header field name and ``value`` is an unfolded header field
      value, and returns a string subclass that represents that header.  A
      default ``header_factory`` (see :mod:`~email.headerregistry`) is provided
      that supports custom parsing for the various address and date :rfc:`5322`
      header field types, and the major MIME header field types.  Support for
      additional custom parsing will be added in the future.

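      The default factory can be called directly to see what kind of object
      it produces.  A minimal sketch, relying on the ``datetime`` attribute
      that :mod:`~email.headerregistry` documents for date headers; the date
      string is arbitrary:

      .. code-block:: python

         from email import policy

         date = policy.default.header_factory('Date', 'Tue, 01 Jan 2019 12:00:00 +0000')

         print(isinstance(date, str))    # True: header objects are str subclasses
         print(date.name)                # Date
         print(date.datetime.year)       # 2019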

   .. attribute:: content_manager

      An object with at least two methods, ``get_content`` and
      ``set_content``.  When the :meth:`~email.message.EmailMessage.get_content`
      or :meth:`~email.message.EmailMessage.set_content` method of an
      :class:`~email.message.EmailMessage` object is called, it calls the
      corresponding method of this object, passing it the message object as its
      first argument, and any arguments or keywords that were passed to it as
      additional arguments.  By default ``content_manager`` is set to
      :data:`~email.contentmanager.raw_data_manager`.

      .. versionadded:: 3.4

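      A minimal round trip through the default
      :data:`~email.contentmanager.raw_data_manager`, assuming a plain-text
      body; the text is a placeholder:

      .. code-block:: python

         from email import policy
         from email.message import EmailMessage

         msg = EmailMessage(policy=policy.default)
         # Delegates to policy.content_manager.set_content(msg, ...).
         msg.set_content('Hello from the content manager')

         print(msg.get_content_type())    # text/plain
         # Delegates to policy.content_manager.get_content(msg); the text
         # comes back with a trailing newline added when it was encoded.
         print(repr(msg.get_content()))   # 'Hello from the content manager\n'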

   The class provides the following concrete implementations of the abstract
   methods of :class:`Policy`:


   .. method:: header_max_count(name)

      Returns the value of the
      :attr:`~email.headerregistry.BaseHeader.max_count` attribute of the
      specialized class used to represent the header with the given name.

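      A minimal sketch of the resulting limits: ``Subject`` may appear once,
      while a header the registry does not know about (the made-up
      ``X-Tag``) has no limit:

      .. code-block:: python

         from email import policy
         from email.message import EmailMessage

         print(policy.default.header_max_count('Subject'))   # 1
         print(policy.default.header_max_count('X-Tag'))     # None (no limit)

         msg = EmailMessage()
         msg['X-Tag'] = 'first'
         msg['X-Tag'] = 'second'        # unlimited header: both copies are kept
         msg['Subject'] = 'Hello'
         try:
             msg['Subject'] = 'Hello again'   # would be a second Subject header
         except ValueError as exc:
             print(exc)
         print(len(msg.get_all('X-Tag')))     # 2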

   .. method:: header_source_parse(sourcelines)

      The name is parsed as everything up to the '``:``' and returned
      unmodified.  The value is determined by stripping leading whitespace off
      the remainder of the first line, joining all subsequent lines together,
      and stripping any trailing carriage return or linefeed characters.

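      This hook is normally called by the parser rather than by
      applications, but calling it directly is a quick way to see the
      ``(name, value)`` contract.  A sketch with a made-up two-line folded
      header; because the lines are joined as they are, the interior line
      break survives and only the trailing one is removed:

      .. code-block:: python

         from email import policy

         source = ['Subject: a folded\r\n', ' header value\r\n']
         name, value = policy.default.header_source_parse(source)

         print(name)           # Subject
         print(repr(value))    # 'a folded\r\n header value'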

   .. method:: header_store_parse(name, value)

      The name is returned unchanged.  If the input value has a ``name``
      attribute and it matches *name* ignoring case, the value is returned
      unchanged.  Otherwise the *name* and *value* are passed to
      ``header_factory``, and the resulting header object is returned as
      the value.  In this case a ``ValueError`` is raised if the input value
      contains CR or LF characters.

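      A minimal sketch of both outcomes when assigning headers on an
      :class:`~email.message.EmailMessage`; the addresses are placeholders:

      .. code-block:: python

         from email.message import EmailMessage

         msg = EmailMessage()
         msg['Subject'] = 'Quarterly report'
         print(msg['Subject'].name)      # Subject: stored as a header object

         try:
             # An embedded line break could smuggle in an extra header.
             msg['Reply-To'] = 'a@example.com\r\nBcc: b@example.com'
         except ValueError as exc:
             print('rejected:', exc)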

   .. method:: header_fetch_parse(name, value)

      If the value has a ``name`` attribute, it is returned unmodified.
      Otherwise the *name*, and the *value* with any CR or LF characters
      removed, are passed to the ``header_factory``, and the resulting
      header object is returned.  Any surrogateescaped bytes get turned into
      the Unicode unknown-character glyph.

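      Fetching a header from a parsed message goes through this hook, which
      is why the returned value carries parsed attributes.  A minimal sketch
      with a made-up address; ``addresses``, ``display_name``, ``username``
      and ``domain`` are the attributes documented in
      :mod:`~email.headerregistry`:

      .. code-block:: python

         from email import message_from_string, policy

         raw = 'From: Jane Doe <jane@example.com>\n\nHi\n'
         msg = message_from_string(raw, policy=policy.default)

         sender = msg['From']          # header object built on fetch
         addr = sender.addresses[0]
         print(addr.display_name)            # Jane Doe
         print(addr.username, addr.domain)   # jane example.com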

   .. method:: fold(name, value)

      Header folding is controlled by the :attr:`refold_source` policy setting.
      A value is considered to be a 'source value' if and only if it does not
      have a ``name`` attribute (having a ``name`` attribute means it is a
      header object of some sort).  If a source value needs to be refolded
      according to the policy, it is converted into a header object by
      passing the *name* and the *value* with any CR and LF characters removed
      to the ``header_factory``.  Folding of a header object is done by
      calling its ``fold`` method with the current policy.

      Source values are split into lines using :meth:`~str.splitlines`.  If
      the value is not to be refolded, the lines are rejoined using the
      ``linesep`` from the policy and returned.  The exception is lines
      containing non-ASCII binary data.  In that case the value is refolded
      regardless of the ``refold_source`` setting, which causes the binary data
      to be CTE encoded using the ``unknown-8bit`` charset.


   .. method:: fold_binary(name, value)

      The same as :meth:`fold` if :attr:`~Policy.cte_type` is ``7bit``, except
      that the returned value is bytes.

      If :attr:`~Policy.cte_type` is ``8bit``, non-ASCII binary data is
      converted back into bytes.  Headers with binary data are not refolded,
      regardless of the :attr:`refold_source` setting, since there is no way to
      know whether the binary data consists of single byte characters or
      multibyte characters.


The following instances of :class:`EmailPolicy` provide defaults suitable for
specific application domains.  Note that in the future the behavior of these
instances (in particular the ``HTTP`` instance) may be adjusted to conform even
more closely to the RFCs relevant to their domains.


.. data:: default

   An instance of ``EmailPolicy`` with all defaults unchanged.  This policy
   uses the standard Python ``\n`` line endings rather than the RFC-correct
   ``\r\n``.


.. data:: SMTP

   Suitable for serializing messages in conformance with the email RFCs.
   Like ``default``, but with ``linesep`` set to ``\r\n``, which is RFC
   compliant.


.. data:: SMTPUTF8

   The same as ``SMTP`` except that :attr:`~EmailPolicy.utf8` is ``True``.
   Useful for serializing messages to a message store without using encoded
   words in the headers.  Should only be used for SMTP transmission if the
   sender or recipient addresses have non-ASCII characters (the
   :meth:`smtplib.SMTP.send_message` method handles this automatically).


.. data:: HTTP

   Suitable for serializing headers for use in HTTP traffic.  Like
   ``SMTP`` except that ``max_line_length`` is set to ``None`` (unlimited).


.. data:: strict

   Convenience instance.  The same as ``default`` except that
   ``raise_on_defect`` is set to ``True``.  This allows any policy to be made
   strict by writing::

        somepolicy + policy.strict

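   For example, adding ``strict`` to the pre-defined ``SMTP`` instance keeps
   SMTP's line separator and turns on strict defect handling.  A minimal
   sketch; both operands here are :class:`EmailPolicy` instances:

   .. code-block:: python

      from email import policy

      strict_smtp = policy.SMTP + policy.strict

      print(strict_smtp.raise_on_defect)     # True (from strict)
      print(strict_smtp.linesep == '\r\n')   # True (from SMTP)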

With all of these :class:`EmailPolicies <.EmailPolicy>`, the effective API of
the email package is changed from the Python 3.2 API in the following ways:

* Setting a header on a :class:`~email.message.Message` results in that
  header being parsed and a header object created.

* Fetching a header value from a :class:`~email.message.Message` results
  in that header being parsed and a header object created and
  returned.

* Any header object, or any header that is refolded due to the
  policy settings, is folded using an algorithm that fully implements the
  RFC folding algorithms, including knowing where encoded words are required
  and allowed.

From the application view, this means that any header obtained through the
:class:`~email.message.EmailMessage` is a header object with extra
attributes, whose string value is the fully decoded Unicode value of the
header.  Likewise, a header may be assigned a new value, or a new header
created, using a Unicode string, and the policy will take care of converting
the Unicode string into the correct RFC encoded form.

The header objects and their attributes are described in
:mod:`~email.headerregistry`.



.. class:: Compat32(**kw)

   This concrete :class:`Policy` is the backward compatibility policy.  It
   replicates the behavior of the email package in Python 3.2.  The
   :mod:`~email.policy` module also defines an instance of this class,
   :const:`compat32`, that is used as the default policy by the parser classes
   and by :class:`~email.message.Message`.  Thus the default behavior of those
   parts of the email package is to maintain compatibility with Python 3.2.

   The following attributes have values that are different from the
   :class:`Policy` default:


   .. attribute:: mangle_from_

      The default is ``True``.


   The class provides the following concrete implementations of the
   abstract methods of :class:`Policy`:


   .. method:: header_source_parse(sourcelines)

      The name is parsed as everything up to the '``:``' and returned
      unmodified.  The value is determined by stripping leading whitespace off
      the remainder of the first line, joining all subsequent lines together,
      and stripping any trailing carriage return or linefeed characters.


   .. method:: header_store_parse(name, value)

      The name and value are returned unmodified.


   .. method:: header_fetch_parse(name, value)

      If the value contains binary data, it is converted into a
      :class:`~email.header.Header` object using the ``unknown-8bit`` charset.
      Otherwise it is returned unmodified.


   .. method:: fold(name, value)

      Headers are folded using the :class:`~email.header.Header` folding
      algorithm, which preserves existing line breaks in the value, and wraps
      each resulting line to the ``max_line_length``.  Non-ASCII binary data are
      CTE encoded using the ``unknown-8bit`` charset.


   .. method:: fold_binary(name, value)

      Headers are folded using the :class:`~email.header.Header` folding
      algorithm, which preserves existing line breaks in the value, and wraps
      each resulting line to the ``max_line_length``.  If ``cte_type`` is
      ``7bit``, non-ASCII binary data is CTE encoded using the ``unknown-8bit``
      charset.  Otherwise the original source header is used, with its existing
      line breaks and any (RFC invalid) binary data it may contain.


.. data:: compat32

   An instance of :class:`Compat32`, providing backward compatibility with the
   behavior of the email package in Python 3.2.


.. rubric:: Footnotes

.. [1] Originally added in 3.3 as a :term:`provisional feature <provisional
       package>`.
