:mod:`email.headerregistry`: Custom Header Objects
--------------------------------------------------

.. module:: email.headerregistry
   :synopsis: Automatic Parsing of headers based on the field name

.. moduleauthor:: R. David Murray <rdmurray@bitdance.com>
.. sectionauthor:: R. David Murray <rdmurray@bitdance.com>

**Source code:** :source:`Lib/email/headerregistry.py`

--------------

.. versionadded:: 3.6 [1]_

Headers are represented by customized subclasses of :class:`str`. The
particular class used to represent a given header is determined by the
:attr:`~email.policy.EmailPolicy.header_factory` of the :mod:`~email.policy` in
effect when the headers are created. This section documents the particular
``header_factory`` implemented by the email package for handling :rfc:`5322`
compliant email messages, which not only provides customized header objects for
various header types, but also provides an extension mechanism for applications
to add their own custom header types.

When using any of the policy objects derived from
:class:`~email.policy.EmailPolicy`, all headers are produced by
:class:`.HeaderRegistry` and have :class:`.BaseHeader` as their last base
class. Each header class has an additional base class that is determined by
the type of the header. For example, many headers have the class
:class:`.UnstructuredHeader` as their other base class. The specialized second
class for a header is determined by the name of the header, using a lookup
table stored in the :class:`.HeaderRegistry`. All of this is managed
transparently for the typical application program, but interfaces are provided
for modifying the default behavior for use by more complex applications.
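
As a minimal sketch of the class composition described above, the following
assumes the default policy used by :class:`~email.message.EmailMessage` (an
``EmailPolicy`` instance) and an arbitrary example subject; the variable names
are illustrative only::

    from email.headerregistry import BaseHeader, UnstructuredHeader
    from email.message import EmailMessage

    # EmailMessage() uses email.policy.default unless another policy is given.
    msg = EmailMessage()
    msg['Subject'] = 'Quarterly report'

    # Fetching the header returns the header object, which is a str subclass.
    subject = msg['Subject']
    header_bases = type(subject).__bases__

    # The specialized class comes first and BaseHeader comes last.
    print(header_bases)
    print(isinstance(subject, UnstructuredHeader), subject.name)

The header object can be used anywhere a plain string is expected, while
attributes such as ``name`` and ``defects`` remain available on it.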

The sections below first document the header base classes and their attributes,
followed by the API for modifying the behavior of :class:`.HeaderRegistry`, and
finally the support classes used to represent the data parsed from structured
headers.


.. class:: BaseHeader(name, value)

   *name* and *value* are passed to ``BaseHeader`` from the
   :attr:`~email.policy.EmailPolicy.header_factory` call. The string value of
   any header object is the *value* fully decoded to unicode.

   This base class defines the following read-only properties:


   .. attribute:: name

      The name of the header (the portion of the field before the ':'). This
      is exactly the value passed in the
      :attr:`~email.policy.EmailPolicy.header_factory` call for *name*; that
      is, case is preserved.


   .. attribute:: defects

      A tuple of :exc:`~email.errors.HeaderDefect` instances reporting any
      RFC compliance problems found during parsing. The email package tries to
      be complete about detecting compliance issues. See the :mod:`~email.errors`
      module for a discussion of the types of defects that may be reported.


   .. attribute:: max_count

      The maximum number of headers of this type that can have the same
      ``name``. A value of ``None`` means unlimited. The ``BaseHeader`` value
      for this attribute is ``None``; it is expected that specialized header
      classes will override this value as needed.

   ``BaseHeader`` also provides the following method, which is called by the
   email library code and should not in general be called by application
   programs:

   .. method:: fold(*, policy)

      Return a string containing :attr:`~email.policy.Policy.linesep`
      characters as required to correctly fold the header according to
      *policy*. A :attr:`~email.policy.Policy.cte_type` of ``8bit`` will be
      treated as if it were ``7bit``, since headers may not contain arbitrary
      binary data.
      If :attr:`~email.policy.EmailPolicy.utf8` is ``False``, non-ASCII data
      will be :rfc:`2047` encoded.


   ``BaseHeader`` by itself cannot be used to create a header object. It
   defines a protocol that each specialized header cooperates with in order to
   produce the header object. Specifically, ``BaseHeader`` requires that
   the specialized class provide a :func:`classmethod` named ``parse``. This
   method is called as follows::

       parse(string, kwds)

   ``kwds`` is a dictionary containing one pre-initialized key, ``defects``.
   ``defects`` is an empty list. The parse method should append any detected
   defects to this list. On return, the ``kwds`` dictionary *must* contain
   values for at least the keys ``decoded`` and ``defects``. ``decoded``
   should be the string value for the header (that is, the header value fully
   decoded to unicode). The parse method should assume that *string* may
   contain content-transfer-encoded parts, but should correctly handle all valid
   unicode characters as well so that it can parse un-encoded header values.

   ``BaseHeader``'s ``__new__`` then creates the header instance, and calls its
   ``init`` method. The specialized class only needs to provide an ``init``
   method if it wishes to set additional attributes beyond those provided by
   ``BaseHeader`` itself. Such an ``init`` method should look like this::

       def init(self, /, *args, **kw):
           self._myattr = kw.pop('myattr')
           super().init(*args, **kw)

   That is, anything extra that the specialized class puts into the ``kwds``
   dictionary should be removed and handled, and the remaining contents of
   ``kw`` (and ``args``) passed to the ``BaseHeader`` ``init`` method.


.. class:: UnstructuredHeader

   An "unstructured" header is the default type of header in :rfc:`5322`.
   Any header that does not have a specified syntax is treated as
   unstructured.
   The classic example of an unstructured header is the
   :mailheader:`Subject` header.

   In :rfc:`5322`, an unstructured header is a run of arbitrary text in the
   ASCII character set. :rfc:`2047`, however, has an :rfc:`5322` compatible
   mechanism for encoding non-ASCII text as ASCII characters within a header
   value. When a *value* containing encoded words is passed to the
   constructor, the ``UnstructuredHeader`` parser converts such encoded words
   into unicode, following the :rfc:`2047` rules for unstructured text. The
   parser uses heuristics to attempt to decode certain non-compliant encoded
   words. Defects are registered in such cases, as well as defects for issues
   such as invalid characters within the encoded words or the non-encoded text.

   This header type provides no additional attributes.


.. class:: DateHeader

   :rfc:`5322` specifies a very specific format for dates within email headers.
   The ``DateHeader`` parser recognizes that date format, as well as
   recognizing a number of variant forms that are sometimes found "in the
   wild".

   This header type provides the following additional attributes:

   .. attribute:: datetime

      If the header value can be recognized as a valid date of one form or
      another, this attribute will contain a :class:`~datetime.datetime`
      instance representing that date. If the timezone of the input date is
      specified as ``-0000`` (indicating it is in UTC but contains no
      information about the source timezone), then :attr:`.datetime` will be a
      naive :class:`~datetime.datetime`. If a specific timezone offset is
      found (including ``+0000``), then :attr:`.datetime` will contain an aware
      ``datetime`` that uses :class:`datetime.timezone` to record the timezone
      offset.
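
   The difference between ``-0000`` and ``+0000`` can be illustrated with a
   short sketch. It assumes a :class:`.HeaderRegistry` created with its
   default settings; the date strings and variable names are arbitrary
   examples::

       from datetime import timedelta
       from email.headerregistry import HeaderRegistry

       registry = HeaderRegistry()

       # -0000: UTC with no information about the source timezone.
       unknown_zone = registry('Date', 'Fri, 15 Jul 2011 21:00:00 -0000')
       # +0000: an explicit offset of zero.
       explicit_utc = registry('Date', 'Fri, 15 Jul 2011 21:00:00 +0000')

       # A naive datetime has no tzinfo; an aware one reports its offset.
       is_naive = unknown_zone.datetime.tzinfo is None
       has_zero_offset = explicit_utc.datetime.utcoffset() == timedelta(0)
       print(is_naive, has_zero_offset)

   Code that compares or subtracts these values should check for the naive
   case first, since Python does not allow arithmetic between naive and aware
   ``datetime`` objects.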

   The ``decoded`` value of the header is determined by formatting the
   ``datetime`` according to the :rfc:`5322` rules; that is, it is set to::

       email.utils.format_datetime(self.datetime)

   When creating a ``DateHeader``, *value* may be a
   :class:`~datetime.datetime` instance. This means, for example, that
   the following code is valid and does what one would expect::

       msg['Date'] = datetime(2011, 7, 15, 21)

   Because this is a naive ``datetime`` it will be interpreted as a UTC
   timestamp, and the resulting value will have a timezone of ``-0000``. Much
   more useful is to use the :func:`~email.utils.localtime` function from the
   :mod:`~email.utils` module::

       msg['Date'] = utils.localtime()

   This example sets the date header to the current time and date using
   the current timezone offset.


.. class:: AddressHeader

   Address headers are one of the most complex structured header types.
   The ``AddressHeader`` class provides a generic interface to any address
   header.

   This header type provides the following additional attributes:


   .. attribute:: groups

      A tuple of :class:`.Group` objects encoding the
      addresses and groups found in the header value. Addresses that are
      not part of a group are represented in this tuple as single-address
      ``Groups`` whose :attr:`~.Group.display_name` is ``None``.


   .. attribute:: addresses

      A tuple of :class:`.Address` objects encoding all
      of the individual addresses from the header value. If the header value
      contains any groups, the individual addresses from the group are included
      in the tuple at the point where the group occurs in the value (that is,
      the addresses are "flattened" into a one-dimensional sequence).

   The ``decoded`` value of the header will have all encoded words decoded to
   unicode. :mod:`~encodings.idna` encoded domain names are also decoded to
   unicode.
   The ``decoded`` value is set by joining the :class:`str` values of the
   elements of the ``groups`` attribute with ``', '`` (see :meth:`str.join`).

   A list of :class:`.Address` and :class:`.Group` objects in any combination
   may be used to set the value of an address header. ``Group`` objects whose
   ``display_name`` is ``None`` will be interpreted as single addresses, which
   allows an address list to be copied with groups intact by using the list
   obtained from the ``groups`` attribute of the source header.


.. class:: SingleAddressHeader

   A subclass of :class:`.AddressHeader` that adds one
   additional attribute:


   .. attribute:: address

      The single address encoded by the header value. If the header value
      actually contains more than one address (which would be a violation of
      the RFC under the default :mod:`~email.policy`), accessing this attribute
      will result in a :exc:`ValueError`.


Many of the above classes also have a ``Unique`` variant (for example,
``UniqueUnstructuredHeader``). The only difference is that in the ``Unique``
variant, :attr:`~.BaseHeader.max_count` is set to 1.


.. class:: MIMEVersionHeader

   There is really only one valid value for the :mailheader:`MIME-Version`
   header, and that is ``1.0``. For future proofing, this header class
   supports other valid version numbers. If a version number has a valid value
   per :rfc:`2045`, then the header object will have non-``None`` values for
   the following attributes:

   .. attribute:: version

      The version number as a string, with any whitespace and/or comments
      removed.

   .. attribute:: major

      The major version number as an integer.

   .. attribute:: minor

      The minor version number as an integer.


.. class:: ParameterizedMIMEHeader

   MIME headers all start with the prefix 'Content-'.
   Each specific header has
   a certain value, described under the class for that header. Some can
   also take a list of supplemental parameters, which have a common format.
   This class serves as a base for all the MIME headers that take parameters.

   .. attribute:: params

      A dictionary mapping parameter names to parameter values.


.. class:: ContentTypeHeader

   A :class:`ParameterizedMIMEHeader` class that handles the
   :mailheader:`Content-Type` header.

   .. attribute:: content_type

      The content type string, in the form ``maintype/subtype``.

   .. attribute:: maintype

   .. attribute:: subtype


.. class:: ContentDispositionHeader

   A :class:`ParameterizedMIMEHeader` class that handles the
   :mailheader:`Content-Disposition` header.

   .. attribute:: content_disposition

      ``inline`` and ``attachment`` are the only valid values in common use.


.. class:: ContentTransferEncodingHeader

   Handles the :mailheader:`Content-Transfer-Encoding` header.

   .. attribute:: cte

      Valid values are ``7bit``, ``8bit``, ``base64``, and
      ``quoted-printable``. See :rfc:`2045` for more information.



.. class:: HeaderRegistry(base_class=BaseHeader, \
                          default_class=UnstructuredHeader, \
                          use_default_map=True)

   This is the factory used by :class:`~email.policy.EmailPolicy` by default.
   ``HeaderRegistry`` builds the class used to create a header instance
   dynamically, using *base_class* and a specialized class retrieved from a
   registry that it holds. When a given header name does not appear in the
   registry, the class specified by *default_class* is used as the specialized
   class. When *use_default_map* is ``True`` (the default), the standard
   mapping of header names to classes is copied into the registry during
   initialization. *base_class* is always the last class in the generated
   class's ``__bases__`` list.
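
   A minimal sketch of how the constructor arguments affect the generated
   classes is shown below. ``X-Example`` is a hypothetical header name chosen
   only because it is not in the standard mapping, and the variable names are
   illustrative; the registry is called directly here, which the email package
   normally does on the application's behalf::

       from email.headerregistry import (
           BaseHeader,
           HeaderRegistry,
           UniqueUnstructuredHeader,
           UnstructuredHeader,
       )

       # Default settings: the standard name-to-class mapping is loaded.
       registry = HeaderRegistry()

       # An unregistered name falls back to default_class.
       custom = registry('X-Example', 'some value')
       custom_bases = type(custom).__bases__

       # A registered name uses its mapped specialized class.
       default_subject = registry('Subject', 'hello')

       # Without the default map, every name falls back to default_class.
       bare_registry = HeaderRegistry(use_default_map=False)
       bare_subject = bare_registry('Subject', 'hello')

       print(custom_bases)
       print(type(default_subject).__bases__)
       print(type(bare_subject).__bases__)

   In each case the specialized class is listed first and *base_class* last.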

   The default mappings are:

     :subject:                   UniqueUnstructuredHeader
     :date:                      UniqueDateHeader
     :resent-date:               DateHeader
     :orig-date:                 UniqueDateHeader
     :sender:                    UniqueSingleAddressHeader
     :resent-sender:             SingleAddressHeader
     :to:                        UniqueAddressHeader
     :resent-to:                 AddressHeader
     :cc:                        UniqueAddressHeader
     :resent-cc:                 AddressHeader
     :bcc:                       UniqueAddressHeader
     :resent-bcc:                AddressHeader
     :from:                      UniqueAddressHeader
     :resent-from:               AddressHeader
     :reply-to:                  UniqueAddressHeader
     :mime-version:              MIMEVersionHeader
     :content-type:              ContentTypeHeader
     :content-disposition:       ContentDispositionHeader
     :content-transfer-encoding: ContentTransferEncodingHeader
     :message-id:                MessageIDHeader

   ``HeaderRegistry`` has the following methods:


   .. method:: map_to_type(name, cls)

      *name* is the name of the header to be mapped. It will be converted to
      lower case in the registry. *cls* is the specialized class to be used,
      along with *base_class*, to create the class used to instantiate headers
      that match *name*.


   .. method:: __getitem__(name)

      Construct and return a class to handle creating a *name* header.


   .. method:: __call__(name, value)

      Retrieves the specialized header associated with *name* from the
      registry (using *default_class* if *name* does not appear in the
      registry) and composes it with *base_class* to produce a class,
      calls the constructed class's constructor, passing it the same
      argument list, and finally returns the class instance created thereby.


The following classes are used to represent data parsed from
structured headers and can, in general, be used by an application program to
construct structured values to assign to specific headers.


.. class:: Address(display_name='', username='', domain='', addr_spec=None)

   The class used to represent an email address.
   The general form of an
   address is::

      [display_name] <username@domain>

   or::

      username@domain

   where each part must conform to specific syntax rules spelled out in
   :rfc:`5322`.

   As a convenience *addr_spec* can be specified instead of *username* and
   *domain*, in which case *username* and *domain* will be parsed from the
   *addr_spec*. An *addr_spec* must be a properly RFC quoted string; if it is
   not, ``Address`` will raise an error. Unicode characters are allowed and
   will be properly encoded when serialized. However, per the RFCs, unicode is
   *not* allowed in the username portion of the address.

   .. attribute:: display_name

      The display name portion of the address, if any, with all quoting
      removed. If the address does not have a display name, this attribute
      will be an empty string.

   .. attribute:: username

      The ``username`` portion of the address, with all quoting removed.

   .. attribute:: domain

      The ``domain`` portion of the address.

   .. attribute:: addr_spec

      The ``username@domain`` portion of the address, correctly quoted
      for use as a bare address (the second form shown above). This
      attribute is not mutable.

   .. method:: __str__()

      The ``str`` value of the object is the address quoted according to
      :rfc:`5322` rules, but with no Content Transfer Encoding of any non-ASCII
      characters.

   To support SMTP (:rfc:`5321`), ``Address`` handles one special case: if
   ``username`` and ``domain`` are both the empty string (or ``None``), then
   the string value of the ``Address`` is ``<>``.


.. class:: Group(display_name=None, addresses=None)

   The class used to represent an address group.
   The general form of an
   address group is::

     display_name: [address-list];

   As a convenience for processing lists of addresses that consist of a mixture
   of groups and single addresses, a ``Group`` may also be used to represent
   single addresses that are not part of a group by setting *display_name* to
   ``None`` and providing a list of the single address as *addresses*.

   .. attribute:: display_name

      The ``display_name`` of the group. If it is ``None`` and there is
      exactly one ``Address`` in ``addresses``, then the ``Group`` represents a
      single address that is not in a group.

   .. attribute:: addresses

      A possibly empty tuple of :class:`.Address` objects representing the
      addresses in the group.

   .. method:: __str__()

      The ``str`` value of a ``Group`` is formatted according to :rfc:`5322`,
      but with no Content Transfer Encoding of any non-ASCII characters. If
      ``display_name`` is ``None`` and there is a single ``Address`` in the
      ``addresses`` list, the ``str`` value will be the same as the ``str`` of
      that single ``Address``.


.. rubric:: Footnotes

.. [1] Originally added in 3.3 as a :term:`provisional module <provisional
       package>`