:mod:`xml.etree.ElementTree` --- The ElementTree XML API
========================================================

.. module:: xml.etree.ElementTree
   :synopsis: Implementation of the ElementTree API.

.. moduleauthor:: Fredrik Lundh <fredrik@pythonware.com>

**Source code:** :source:`Lib/xml/etree/ElementTree.py`

--------------

The :mod:`xml.etree.ElementTree` module implements a simple and efficient API
for parsing and creating XML data.

.. versionchanged:: 3.3
   This module will use a fast implementation whenever available.

.. deprecated:: 3.3
   The :mod:`xml.etree.cElementTree` module is deprecated.


.. warning::

   The :mod:`xml.etree.ElementTree` module is not secure against
   maliciously constructed data.  If you need to parse untrusted or
   unauthenticated data see :ref:`xml-vulnerabilities`.
28
29Tutorial
30--------
31
32This is a short tutorial for using :mod:`xml.etree.ElementTree` (``ET`` in
33short).  The goal is to demonstrate some of the building blocks and basic
34concepts of the module.
35
36XML tree and elements
37^^^^^^^^^^^^^^^^^^^^^
38
XML is an inherently hierarchical data format, and the most natural way to
represent it is with a tree.  ``ET`` has two classes for this purpose:
:class:`ElementTree` represents the whole XML document as a tree, and
:class:`Element` represents a single node in this tree.  Interactions with
the whole document (reading and writing to/from files) are usually done
on the :class:`ElementTree` level.  Interactions with a single XML element
and its sub-elements are done on the :class:`Element` level.
46
47.. _elementtree-parsing-xml:
48
49Parsing XML
50^^^^^^^^^^^
51
52We'll be using the following XML document as the sample data for this section:
53
54.. code-block:: xml
55
56   <?xml version="1.0"?>
57   <data>
58       <country name="Liechtenstein">
59           <rank>1</rank>
60           <year>2008</year>
61           <gdppc>141100</gdppc>
62           <neighbor name="Austria" direction="E"/>
63           <neighbor name="Switzerland" direction="W"/>
64       </country>
65       <country name="Singapore">
66           <rank>4</rank>
67           <year>2011</year>
68           <gdppc>59900</gdppc>
69           <neighbor name="Malaysia" direction="N"/>
70       </country>
71       <country name="Panama">
72           <rank>68</rank>
73           <year>2011</year>
74           <gdppc>13600</gdppc>
75           <neighbor name="Costa Rica" direction="W"/>
76           <neighbor name="Colombia" direction="E"/>
77       </country>
78   </data>
79
80We can import this data by reading from a file::
81
82   import xml.etree.ElementTree as ET
83   tree = ET.parse('country_data.xml')
84   root = tree.getroot()
85
86Or directly from a string::
87
88   root = ET.fromstring(country_data_as_string)
89
90:func:`fromstring` parses XML from a string directly into an :class:`Element`,
91which is the root element of the parsed tree.  Other parsing functions may
92create an :class:`ElementTree`.  Check the documentation to be sure.
93
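As a minimal sketch of the difference between the two entry points,
:func:`parse` returns an :class:`ElementTree`, while :func:`fromstring` returns
the root :class:`Element` directly.  The XML snippet and variable names below
are made up for illustration, and an in-memory ``io.StringIO`` object stands in
for a file on disk:

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical sample document, used only for this illustration.
xml_text = '<data><country name="Panama"/></data>'

# parse() accepts a filename or file object and returns an ElementTree.
tree = ET.parse(io.StringIO(xml_text))
root_from_tree = tree.getroot()

# fromstring() returns the root Element directly.
root_from_string = ET.fromstring(xml_text)

print(type(tree).__name__)                         # ElementTree
print(type(root_from_string).__name__)             # Element
print(root_from_tree.tag == root_from_string.tag)  # True
```
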
94As an :class:`Element`, ``root`` has a tag and a dictionary of attributes::
95
96   >>> root.tag
97   'data'
98   >>> root.attrib
99   {}
100
It also has child nodes over which we can iterate::
102
103   >>> for child in root:
104   ...     print(child.tag, child.attrib)
105   ...
106   country {'name': 'Liechtenstein'}
107   country {'name': 'Singapore'}
108   country {'name': 'Panama'}
109
110Children are nested, and we can access specific child nodes by index::
111
112   >>> root[0][1].text
113   '2008'
114
115
116.. note::
117
118   Not all elements of the XML input will end up as elements of the
119   parsed tree. Currently, this module skips over any XML comments,
120   processing instructions, and document type declarations in the
121   input. Nevertheless, trees built using this module's API rather
122   than parsing from XML text can have comments and processing
123   instructions in them; they will be included when generating XML
124   output. A document type declaration may be accessed by passing a
125   custom :class:`TreeBuilder` instance to the :class:`XMLParser`
126   constructor.
127
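The behaviour described in the note can be checked with a short sketch.  The
document below is invented for this illustration; it shows a comment being
dropped by the default parser, and a comment added through the API being kept
on output:

```python
import xml.etree.ElementTree as ET

xml_text = "<root><!-- a comment --><child/></root>"

# Default parser: the comment is skipped, only <child> remains.
plain_root = ET.fromstring(xml_text)
print(len(plain_root))  # 1

# Building a tree through the API: the comment is kept and serialized.
built = ET.Element("root")
built.append(ET.Comment(" a comment "))
ET.SubElement(built, "child")
print(ET.tostring(built, encoding="unicode"))
# <root><!-- a comment --><child /></root>
```
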
128
129.. _elementtree-pull-parsing:
130
131Pull API for non-blocking parsing
132^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
133
134Most parsing functions provided by this module require the whole document
135to be read at once before returning any result.  It is possible to use an
136:class:`XMLParser` and feed data into it incrementally, but it is a push API that
137calls methods on a callback target, which is too low-level and inconvenient for
138most needs.  Sometimes what the user really wants is to be able to parse XML
139incrementally, without blocking operations, while enjoying the convenience of
140fully constructed :class:`Element` objects.
141
142The most powerful tool for doing this is :class:`XMLPullParser`.  It does not
143require a blocking read to obtain the XML data, and is instead fed with data
144incrementally with :meth:`XMLPullParser.feed` calls.  To get the parsed XML
145elements, call :meth:`XMLPullParser.read_events`.  Here is an example::
146
   >>> parser = ET.XMLPullParser(['start', 'end'])
   >>> parser.feed('<mytag>sometext')
   >>> list(parser.read_events())
   [('start', <Element 'mytag' at 0x7fa66db2be58>)]
   >>> parser.feed(' more text</mytag>')
   >>> for event, elem in parser.read_events():
   ...     print(event)
   ...     print(elem.tag, 'text=', elem.text)
   ...
   end
   mytag text= sometext more text
157
158The obvious use case is applications that operate in a non-blocking fashion
159where the XML data is being received from a socket or read incrementally from
160some storage device.  In such cases, blocking reads are unacceptable.
161
162Because it's so flexible, :class:`XMLPullParser` can be inconvenient to use for
163simpler use-cases.  If you don't mind your application blocking on reading XML
164data but would still like to have incremental parsing capabilities, take a look
165at :func:`iterparse`.  It can be useful when you're reading a large XML document
166and don't want to hold it wholly in memory.
167
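A minimal sketch of that :func:`iterparse` pattern follows.  The in-memory
``io.BytesIO`` object is a hypothetical stand-in for a large file, and clearing
each element after use is one common way to limit memory, not the only one:

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical stand-in for a large file on disk.
source = io.BytesIO(
    b"<data>"
    b"<country name='Liechtenstein'/>"
    b"<country name='Singapore'/>"
    b"<country name='Panama'/>"
    b"</data>"
)

names = []
# Only "end" events are reported by default, so each element is complete.
for event, elem in ET.iterparse(source):
    if elem.tag == "country":
        names.append(elem.get("name"))
        elem.clear()  # drop the element's contents once processed

print(names)  # ['Liechtenstein', 'Singapore', 'Panama']
```

Note that a cleared element stays attached to its parent as an empty shell;
only its text, attributes and children are released.
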
168Finding interesting elements
169^^^^^^^^^^^^^^^^^^^^^^^^^^^^
170
171:class:`Element` has some useful methods that help iterate recursively over all
172the sub-tree below it (its children, their children, and so on).  For example,
173:meth:`Element.iter`::
174
175   >>> for neighbor in root.iter('neighbor'):
176   ...     print(neighbor.attrib)
177   ...
178   {'name': 'Austria', 'direction': 'E'}
179   {'name': 'Switzerland', 'direction': 'W'}
180   {'name': 'Malaysia', 'direction': 'N'}
181   {'name': 'Costa Rica', 'direction': 'W'}
182   {'name': 'Colombia', 'direction': 'E'}
183
:meth:`Element.findall` finds only elements with the given tag that are direct
children of the current element.  :meth:`Element.find` finds the *first* child
with a particular tag, and :attr:`Element.text` accesses the element's text
content.  :meth:`Element.get` accesses the element's attributes::
188
189   >>> for country in root.findall('country'):
190   ...     rank = country.find('rank').text
191   ...     name = country.get('name')
192   ...     print(name, rank)
193   ...
194   Liechtenstein 1
195   Singapore 4
196   Panama 68
197
198More sophisticated specification of which elements to look for is possible by
199using :ref:`XPath <elementtree-xpath>`.
200
201Modifying an XML File
202^^^^^^^^^^^^^^^^^^^^^
203
:class:`ElementTree` provides a simple way to build XML documents and write
them to files.  The :meth:`ElementTree.write` method serves this purpose.
206
207Once created, an :class:`Element` object may be manipulated by directly changing
208its fields (such as :attr:`Element.text`), adding and modifying attributes
209(:meth:`Element.set` method), as well as adding new children (for example
210with :meth:`Element.append`).
211
212Let's say we want to add one to each country's rank, and add an ``updated``
213attribute to the rank element::
214
215   >>> for rank in root.iter('rank'):
216   ...     new_rank = int(rank.text) + 1
217   ...     rank.text = str(new_rank)
218   ...     rank.set('updated', 'yes')
219   ...
220   >>> tree.write('output.xml')
221
222Our XML now looks like this:
223
224.. code-block:: xml
225
226   <?xml version="1.0"?>
227   <data>
228       <country name="Liechtenstein">
229           <rank updated="yes">2</rank>
230           <year>2008</year>
231           <gdppc>141100</gdppc>
232           <neighbor name="Austria" direction="E"/>
233           <neighbor name="Switzerland" direction="W"/>
234       </country>
235       <country name="Singapore">
236           <rank updated="yes">5</rank>
237           <year>2011</year>
238           <gdppc>59900</gdppc>
239           <neighbor name="Malaysia" direction="N"/>
240       </country>
241       <country name="Panama">
242           <rank updated="yes">69</rank>
243           <year>2011</year>
244           <gdppc>13600</gdppc>
245           <neighbor name="Costa Rica" direction="W"/>
246           <neighbor name="Colombia" direction="E"/>
247       </country>
248   </data>
249
250We can remove elements using :meth:`Element.remove`.  Let's say we want to
251remove all countries with a rank higher than 50::
252
253   >>> for country in root.findall('country'):
254   ...     # using root.findall() to avoid removal during traversal
255   ...     rank = int(country.find('rank').text)
256   ...     if rank > 50:
257   ...         root.remove(country)
258   ...
259   >>> tree.write('output.xml')
260
261Note that concurrent modification while iterating can lead to problems,
262just like when iterating and modifying Python lists or dicts.
263Therefore, the example first collects all matching elements with
264``root.findall()``, and only then iterates over the list of matches.
265
266Our XML now looks like this:
267
268.. code-block:: xml
269
270   <?xml version="1.0"?>
271   <data>
272       <country name="Liechtenstein">
273           <rank updated="yes">2</rank>
274           <year>2008</year>
275           <gdppc>141100</gdppc>
276           <neighbor name="Austria" direction="E"/>
277           <neighbor name="Switzerland" direction="W"/>
278       </country>
279       <country name="Singapore">
280           <rank updated="yes">5</rank>
281           <year>2011</year>
282           <gdppc>59900</gdppc>
283           <neighbor name="Malaysia" direction="N"/>
284       </country>
285   </data>
286
287Building XML documents
288^^^^^^^^^^^^^^^^^^^^^^
289
290The :func:`SubElement` function also provides a convenient way to create new
291sub-elements for a given element::
292
293   >>> a = ET.Element('a')
294   >>> b = ET.SubElement(a, 'b')
295   >>> c = ET.SubElement(a, 'c')
296   >>> d = ET.SubElement(c, 'd')
297   >>> ET.dump(a)
298   <a><b /><c><d /></c></a>
299
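A slightly fuller sketch combines :func:`SubElement` with text and attributes.
The tag names, attribute names and values are made up for illustration:

```python
import xml.etree.ElementTree as ET

# Build a small document from scratch; all names here are illustrative.
data = ET.Element("data")
country = ET.SubElement(data, "country", name="Liechtenstein")
rank = ET.SubElement(country, "rank")
rank.text = "1"
country.set("region", "Europe")  # add another attribute after creation

xml_text = ET.tostring(data, encoding="unicode")
print(xml_text)
# <data><country name="Liechtenstein" region="Europe"><rank>1</rank></country></data>
```
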
300Parsing XML with Namespaces
301^^^^^^^^^^^^^^^^^^^^^^^^^^^
302
303If the XML input has `namespaces
304<https://en.wikipedia.org/wiki/XML_namespace>`__, tags and attributes
305with prefixes in the form ``prefix:sometag`` get expanded to
306``{uri}sometag`` where the *prefix* is replaced by the full *URI*.
307Also, if there is a `default namespace
308<https://www.w3.org/TR/xml-names/#defaulting>`__,
309that full URI gets prepended to all of the non-prefixed tags.
310
311Here is an XML example that incorporates two namespaces, one with the
312prefix "fictional" and the other serving as the default namespace:
313
314.. code-block:: xml
315
316    <?xml version="1.0"?>
317    <actors xmlns:fictional="http://characters.example.com"
318            xmlns="http://people.example.com">
319        <actor>
320            <name>John Cleese</name>
321            <fictional:character>Lancelot</fictional:character>
322            <fictional:character>Archie Leach</fictional:character>
323        </actor>
324        <actor>
325            <name>Eric Idle</name>
326            <fictional:character>Sir Robin</fictional:character>
327            <fictional:character>Gunther</fictional:character>
328            <fictional:character>Commander Clement</fictional:character>
329        </actor>
330    </actors>
331
One way to search and explore this XML example is to manually add the
URI to every tag or attribute in the XPath of a
:meth:`~Element.find` or :meth:`~Element.findall`::
335
    root = ET.fromstring(xml_text)
    for actor in root.findall('{http://people.example.com}actor'):
        name = actor.find('{http://people.example.com}name')
        print(name.text)
        for char in actor.findall('{http://characters.example.com}character'):
            print(' |-->', char.text)
342
343A better way to search the namespaced XML example is to create a
344dictionary with your own prefixes and use those in the search functions::
345
346    ns = {'real_person': 'http://people.example.com',
347          'role': 'http://characters.example.com'}
348
349    for actor in root.findall('real_person:actor', ns):
350        name = actor.find('real_person:name', ns)
351        print(name.text)
352        for char in actor.findall('role:character', ns):
353            print(' |-->', char.text)
354
355These two approaches both output::
356
357    John Cleese
358     |--> Lancelot
359     |--> Archie Leach
360    Eric Idle
361     |--> Sir Robin
362     |--> Gunther
363     |--> Commander Clement
364
365
366Additional resources
367^^^^^^^^^^^^^^^^^^^^
368
369See http://effbot.org/zone/element-index.htm for tutorials and links to other
370docs.
371
372
373.. _elementtree-xpath:
374
375XPath support
376-------------
377
378This module provides limited support for
379`XPath expressions <https://www.w3.org/TR/xpath>`_ for locating elements in a
380tree.  The goal is to support a small subset of the abbreviated syntax; a full
381XPath engine is outside the scope of the module.
382
383Example
384^^^^^^^
385
386Here's an example that demonstrates some of the XPath capabilities of the
387module.  We'll be using the ``countrydata`` XML document from the
388:ref:`Parsing XML <elementtree-parsing-xml>` section::
389
390   import xml.etree.ElementTree as ET
391
392   root = ET.fromstring(countrydata)
393
394   # Top-level elements
395   root.findall(".")
396
397   # All 'neighbor' grand-children of 'country' children of the top-level
398   # elements
399   root.findall("./country/neighbor")
400
401   # Nodes with name='Singapore' that have a 'year' child
402   root.findall(".//year/..[@name='Singapore']")
403
404   # 'year' nodes that are children of nodes with name='Singapore'
405   root.findall(".//*[@name='Singapore']/year")
406
407   # All 'neighbor' nodes that are the second child of their parent
408   root.findall(".//neighbor[2]")
409
410For XML with namespaces, use the usual qualified ``{namespace}tag`` notation::
411
412   # All dublin-core "title" tags in the document
413   root.findall(".//{http://purl.org/dc/elements/1.1/}title")
414
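To see what some of these expressions return, here is a self-contained sketch.
It assumes an abbreviated version of the country data (two countries only), so
the results differ from those for the full document:

```python
import xml.etree.ElementTree as ET

# Abbreviated version of the country data, for illustration only.
countrydata = """
<data>
    <country name="Liechtenstein">
        <year>2008</year>
        <neighbor name="Austria" direction="E"/>
        <neighbor name="Switzerland" direction="W"/>
    </country>
    <country name="Singapore">
        <year>2011</year>
        <neighbor name="Malaysia" direction="N"/>
    </country>
</data>
"""
root = ET.fromstring(countrydata)

# 'neighbor' grandchildren reached through 'country' children
neighbors = [n.get("name") for n in root.findall("./country/neighbor")]
print(neighbors)  # ['Austria', 'Switzerland', 'Malaysia']

# 'year' children of the element whose name is 'Singapore'
years = [y.text for y in root.findall(".//*[@name='Singapore']/year")]
print(years)  # ['2011']

# second 'neighbor' child of each parent that has at least two
second = [n.get("name") for n in root.findall(".//neighbor[2]")]
print(second)  # ['Switzerland']
```
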
415
416Supported XPath syntax
417^^^^^^^^^^^^^^^^^^^^^^
418
419.. tabularcolumns:: |l|L|
420
421+-----------------------+------------------------------------------------------+
422| Syntax                | Meaning                                              |
423+=======================+======================================================+
424| ``tag``               | Selects all child elements with the given tag.       |
425|                       | For example, ``spam`` selects all child elements     |
426|                       | named ``spam``, and ``spam/egg`` selects all         |
427|                       | grandchildren named ``egg`` in all children named    |
428|                       | ``spam``.  ``{namespace}*`` selects all tags in the  |
429|                       | given namespace, ``{*}spam`` selects tags named      |
430|                       | ``spam`` in any (or no) namespace, and ``{}*``       |
431|                       | only selects tags that are not in a namespace.       |
432|                       |                                                      |
433|                       | .. versionchanged:: 3.8                              |
434|                       |    Support for star-wildcards was added.             |
435+-----------------------+------------------------------------------------------+
436| ``*``                 | Selects all child elements, including comments and   |
437|                       | processing instructions.  For example, ``*/egg``     |
438|                       | selects all grandchildren named ``egg``.             |
439+-----------------------+------------------------------------------------------+
440| ``.``                 | Selects the current node.  This is mostly useful     |
441|                       | at the beginning of the path, to indicate that it's  |
442|                       | a relative path.                                     |
443+-----------------------+------------------------------------------------------+
444| ``//``                | Selects all subelements, on all levels beneath the   |
445|                       | current  element.  For example, ``.//egg`` selects   |
446|                       | all ``egg`` elements in the entire tree.             |
447+-----------------------+------------------------------------------------------+
448| ``..``                | Selects the parent element.  Returns ``None`` if the |
449|                       | path attempts to reach the ancestors of the start    |
450|                       | element (the element ``find`` was called on).        |
451+-----------------------+------------------------------------------------------+
452| ``[@attrib]``         | Selects all elements that have the given attribute.  |
453+-----------------------+------------------------------------------------------+
454| ``[@attrib='value']`` | Selects all elements for which the given attribute   |
455|                       | has the given value.  The value cannot contain       |
456|                       | quotes.                                              |
457+-----------------------+------------------------------------------------------+
458| ``[@attrib!='value']``| Selects all elements for which the given attribute   |
459|                       | does not have the given value. The value cannot      |
460|                       | contain quotes.                                      |
461|                       |                                                      |
462|                       | .. versionadded:: 3.10                               |
463+-----------------------+------------------------------------------------------+
464| ``[tag]``             | Selects all elements that have a child named         |
465|                       | ``tag``.  Only immediate children are supported.     |
466+-----------------------+------------------------------------------------------+
467| ``[.='text']``        | Selects all elements whose complete text content,    |
468|                       | including descendants, equals the given ``text``.    |
469|                       |                                                      |
470|                       | .. versionadded:: 3.7                                |
471+-----------------------+------------------------------------------------------+
472| ``[.!='text']``       | Selects all elements whose complete text content,    |
473|                       | including descendants, does not equal the given      |
474|                       | ``text``.                                            |
475|                       |                                                      |
476|                       | .. versionadded:: 3.10                               |
477+-----------------------+------------------------------------------------------+
478| ``[tag='text']``      | Selects all elements that have a child named         |
479|                       | ``tag`` whose complete text content, including       |
480|                       | descendants, equals the given ``text``.              |
481+-----------------------+------------------------------------------------------+
482| ``[tag!='text']``     | Selects all elements that have a child named         |
483|                       | ``tag`` whose complete text content, including       |
484|                       | descendants, does not equal the given ``text``.      |
485|                       |                                                      |
486|                       | .. versionadded:: 3.10                               |
487+-----------------------+------------------------------------------------------+
488| ``[position]``        | Selects all elements that are located at the given   |
489|                       | position.  The position can be either an integer     |
490|                       | (1 is the first position), the expression ``last()`` |
491|                       | (for the last position), or a position relative to   |
492|                       | the last position (e.g. ``last()-1``).               |
493+-----------------------+------------------------------------------------------+
494
495Predicates (expressions within square brackets) must be preceded by a tag
496name, an asterisk, or another predicate.  ``position`` predicates must be
497preceded by a tag name.
498
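The predicates in the table can be tried out with a small sketch.  The ``menu``
document and the ``names`` helper are hypothetical and exist only to exercise
the syntax:

```python
import xml.etree.ElementTree as ET

# Hypothetical document used only to exercise the predicates.
root = ET.fromstring(
    "<menu>"
    "<item id='a'><name>spam</name></item>"
    "<item><name>egg</name></item>"
    "<item id='c'><name>bacon</name></item>"
    "</menu>"
)

def names(path):
    """Return the <name> text of every <item> matched by *path*."""
    return [item.findtext("name") for item in root.findall(path)]

print(names("item[@id]"))         # ['spam', 'bacon']
print(names("item[@id='c']"))     # ['bacon']
print(names("item[name='egg']"))  # ['egg']
print(names("item[1]"))           # ['spam']
print(names("item[last()]"))      # ['bacon']
print(names("item[last()-1]"))    # ['egg']
```
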
499Reference
500---------
501
502.. _elementtree-functions:
503
504Functions
505^^^^^^^^^
506
507.. function:: canonicalize(xml_data=None, *, out=None, from_file=None, **options)
508
509   `C14N 2.0 <https://www.w3.org/TR/xml-c14n2/>`_ transformation function.
510
   Canonicalization is a way to normalise XML output in a way that allows
   byte-by-byte comparisons and digital signatures.  It reduces the freedom
   that XML serializers have and instead generates a more constrained XML
   representation.  The main restrictions regard the placement of namespace
   declarations, the ordering of attributes, and ignorable whitespace.
516
517   This function takes an XML data string (*xml_data*) or a file path or
518   file-like object (*from_file*) as input, converts it to the canonical
519   form, and writes it out using the *out* file(-like) object, if provided,
520   or returns it as a text string if not.  The output file receives text,
521   not bytes.  It should therefore be opened in text mode with ``utf-8``
522   encoding.
523
524   Typical uses::
525
526      xml_data = "<root>...</root>"
527      print(canonicalize(xml_data))
528
529      with open("c14n_output.xml", mode='w', encoding='utf-8') as out_file:
530          canonicalize(xml_data, out=out_file)
531
532      with open("c14n_output.xml", mode='w', encoding='utf-8') as out_file:
533          canonicalize(from_file="inputfile.xml", out=out_file)
534
535   The configuration *options* are as follows:
536
537   - *with_comments*: set to true to include comments (default: false)
538   - *strip_text*: set to true to strip whitespace before and after text content
539                   (default: false)
540   - *rewrite_prefixes*: set to true to replace namespace prefixes by "n{number}"
541                         (default: false)
542   - *qname_aware_tags*: a set of qname aware tag names in which prefixes
543                         should be replaced in text content (default: empty)
544   - *qname_aware_attrs*: a set of qname aware attribute names in which prefixes
545                          should be replaced in text content (default: empty)
546   - *exclude_attrs*: a set of attribute names that should not be serialised
547   - *exclude_tags*: a set of tag names that should not be serialised
548
   In the option list above, "a set" refers to any collection or iterable of
   strings; no ordering is expected.
551
552   .. versionadded:: 3.8
553
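As a small sketch of why :func:`canonicalize` is useful for comparisons, the
two made-up inputs below describe the same document but differ in attribute
order, quoting style and empty-element syntax.  Their canonical forms should
compare equal (Python 3.8 or later is assumed):

```python
from xml.etree.ElementTree import canonicalize

# Two serializations of the same (made-up) document.
first = '<root b="2" a="1"><child/></root>'
second = "<root a='1' b='2'><child></child></root>"

print(first == second)                              # False
print(canonicalize(first) == canonicalize(second))  # True
```
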
554
555.. function:: Comment(text=None)
556
557   Comment element factory.  This factory function creates a special element
558   that will be serialized as an XML comment by the standard serializer.  The
559   comment string can be either a bytestring or a Unicode string.  *text* is a
560   string containing the comment string.  Returns an element instance
561   representing a comment.
562
   Note that :class:`XMLParser` skips over comments in the input
   instead of creating comment objects for them. An :class:`ElementTree` will
   only contain comment nodes if they have been inserted into
   the tree using one of the :class:`Element` methods.
567
568.. function:: dump(elem)
569
   Writes an element tree or element structure to ``sys.stdout``.  This function
   should be used for debugging only.
572
573   The exact output format is implementation dependent.  In this version, it's
574   written as an ordinary XML file.
575
576   *elem* is an element tree or an individual element.
577
578   .. versionchanged:: 3.8
579      The :func:`dump` function now preserves the attribute order specified
580      by the user.
581
582
583.. function:: fromstring(text, parser=None)
584
585   Parses an XML section from a string constant.  Same as :func:`XML`.  *text*
586   is a string containing XML data.  *parser* is an optional parser instance.
587   If not given, the standard :class:`XMLParser` parser is used.
588   Returns an :class:`Element` instance.
589
590
591.. function:: fromstringlist(sequence, parser=None)
592
593   Parses an XML document from a sequence of string fragments.  *sequence* is a
594   list or other sequence containing XML data fragments.  *parser* is an
595   optional parser instance.  If not given, the standard :class:`XMLParser`
596   parser is used.  Returns an :class:`Element` instance.
597
598   .. versionadded:: 3.2
599
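A short sketch of :func:`fromstringlist`: the fragments below are invented and
split at arbitrary points, as they might be after a chunked read:

```python
import xml.etree.ElementTree as ET

# Fragments as they might arrive from a chunked read; the split points
# are arbitrary and may fall in the middle of a tag.
fragments = ["<data><coun", "try name='Panama'/>", "</data>"]

root = ET.fromstringlist(fragments)
print(root.tag, root[0].get("name"))  # data Panama
```
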
600
601.. function:: indent(tree, space="  ", level=0)
602
603   Appends whitespace to the subtree to indent the tree visually.
604   This can be used to generate pretty-printed XML output.
605   *tree* can be an Element or ElementTree.  *space* is the whitespace
606   string that will be inserted for each indentation level, two space
607   characters by default.  For indenting partial subtrees inside of an
608   already indented tree, pass the initial indentation level as *level*.
609
610   .. versionadded:: 3.9
611
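A minimal sketch of pretty-printing with :func:`indent`, assuming Python 3.9 or
later.  The input document and the four-space *space* value are arbitrary
choices for this illustration:

```python
import xml.etree.ElementTree as ET

root = ET.fromstring("<data><country><rank>1</rank></country></data>")

ET.indent(root, space="    ")  # modifies the tree in place
pretty = ET.tostring(root, encoding="unicode")
print(pretty)
# <data>
#     <country>
#         <rank>1</rank>
#     </country>
# </data>
```
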
612
613.. function:: iselement(element)
614
615   Check if an object appears to be a valid element object.  *element* is an
616   element instance.  Return ``True`` if this is an element object.
617
618
619.. function:: iterparse(source, events=None, parser=None)
620
621   Parses an XML section into an element tree incrementally, and reports what's
622   going on to the user.  *source* is a filename or :term:`file object`
623   containing XML data.  *events* is a sequence of events to report back.  The
624   supported events are the strings ``"start"``, ``"end"``, ``"comment"``,
625   ``"pi"``, ``"start-ns"`` and ``"end-ns"``
626   (the "ns" events are used to get detailed namespace
627   information).  If *events* is omitted, only ``"end"`` events are reported.
628   *parser* is an optional parser instance.  If not given, the standard
629   :class:`XMLParser` parser is used.  *parser* must be a subclass of
630   :class:`XMLParser` and can only use the default :class:`TreeBuilder` as a
631   target.  Returns an :term:`iterator` providing ``(event, elem)`` pairs.
632
633   Note that while :func:`iterparse` builds the tree incrementally, it issues
634   blocking reads on *source* (or the file it names).  As such, it's unsuitable
635   for applications where blocking reads can't be made.  For fully non-blocking
636   parsing, see :class:`XMLPullParser`.
637
638   .. note::
639
640      :func:`iterparse` only guarantees that it has seen the ">" character of a
641      starting tag when it emits a "start" event, so the attributes are defined,
642      but the contents of the text and tail attributes are undefined at that
643      point.  The same applies to the element children; they may or may not be
644      present.
645
646      If you need a fully populated element, look for "end" events instead.
647
648   .. deprecated:: 3.4
649      The *parser* argument.
650
651   .. versionchanged:: 3.8
652      The ``comment`` and ``pi`` events were added.
653
654
655.. function:: parse(source, parser=None)
656
657   Parses an XML section into an element tree.  *source* is a filename or file
658   object containing XML data.  *parser* is an optional parser instance.  If
659   not given, the standard :class:`XMLParser` parser is used.  Returns an
660   :class:`ElementTree` instance.
661
662
663.. function:: ProcessingInstruction(target, text=None)
664
665   PI element factory.  This factory function creates a special element that
666   will be serialized as an XML processing instruction.  *target* is a string
667   containing the PI target.  *text* is a string containing the PI contents, if
668   given.  Returns an element instance, representing a processing instruction.
669
   Note that :class:`XMLParser` skips over processing instructions
   in the input instead of creating PI objects for them. An
   :class:`ElementTree` will only contain processing instruction nodes if
   they have been inserted into the tree using one of the
   :class:`Element` methods.
675
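As a brief sketch of :func:`ProcessingInstruction`, a PI can be appended like
any other element.  The target ``render`` and its contents are made up for
this illustration:

```python
import xml.etree.ElementTree as ET

root = ET.Element("root")
# Target and contents are hypothetical.
pi = ET.ProcessingInstruction("render", 'mode="draft"')
root.append(pi)

print(ET.tostring(root, encoding="unicode"))
# <root><?render mode="draft"?></root>
```
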
676.. function:: register_namespace(prefix, uri)
677
   Registers a namespace prefix.  The registry is global, and any existing
   mapping for either the given prefix or the namespace URI will be removed.
   *prefix* is a namespace prefix.  *uri* is a namespace URI.  Tags and
   attributes in this namespace will be serialized with the given prefix, if at
   all possible.
683
684   .. versionadded:: 3.2
685
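The effect of :func:`register_namespace` on serialization can be sketched as
follows.  The URI ``http://example.com/ns`` and the prefix ``ex`` are
hypothetical; because the registry is global, the registration stays in effect
for the rest of the process:

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URI and prefix, chosen for this illustration.
uri = "http://example.com/ns"
elem = ET.Element("{%s}item" % uri)

# Without a registered prefix the serializer generates one.
before = ET.tostring(elem, encoding="unicode")
print(before)  # <ns0:item xmlns:ns0="http://example.com/ns" />

ET.register_namespace("ex", uri)
after = ET.tostring(elem, encoding="unicode")
print(after)   # <ex:item xmlns:ex="http://example.com/ns" />
```
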
686
687.. function:: SubElement(parent, tag, attrib={}, **extra)
688
689   Subelement factory.  This function creates an element instance, and appends
690   it to an existing element.
691
692   The element name, attribute names, and attribute values can be either
693   bytestrings or Unicode strings.  *parent* is the parent element.  *tag* is
694   the subelement name.  *attrib* is an optional dictionary, containing element
695   attributes.  *extra* contains additional attributes, given as keyword
696   arguments.  Returns an element instance.
697
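A short sketch of passing attributes to :func:`SubElement` both as a
dictionary (*attrib*) and as keyword arguments (*extra*); the element and
attribute names are illustrative:

```python
import xml.etree.ElementTree as ET

parent = ET.Element("country")

# Attributes from the dictionary and from keyword arguments are merged.
neighbor = ET.SubElement(parent, "neighbor", {"name": "Austria"}, direction="E")

print(neighbor.attrib)        # {'name': 'Austria', 'direction': 'E'}
print(parent[0] is neighbor)  # True: the new element was appended to parent
```
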
698
699.. function:: tostring(element, encoding="us-ascii", method="xml", *, \
700                       xml_declaration=None, default_namespace=None, \
701                       short_empty_elements=True)
702
703   Generates a string representation of an XML element, including all
704   subelements.  *element* is an :class:`Element` instance.  *encoding* [1]_ is
705   the output encoding (default is US-ASCII).  Use ``encoding="unicode"`` to
706   generate a Unicode string (otherwise, a bytestring is generated).  *method*
707   is either ``"xml"``, ``"html"`` or ``"text"`` (default is ``"xml"``).
   *xml_declaration*, *default_namespace* and *short_empty_elements* have the same
   meaning as in :meth:`ElementTree.write`. Returns an (optionally) encoded string
   containing the XML data.
711
712   .. versionadded:: 3.4
713      The *short_empty_elements* parameter.
714
715   .. versionadded:: 3.8
716      The *xml_declaration* and *default_namespace* parameters.
717
718   .. versionchanged:: 3.8
719      The :func:`tostring` function now preserves the attribute order
720      specified by the user.
721
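The difference between the default bytes output and ``encoding="unicode"`` can
be sketched with a tiny, made-up tree:

```python
import xml.etree.ElementTree as ET

root = ET.Element("a")
ET.SubElement(root, "b")

as_bytes = ET.tostring(root)                     # default: US-ASCII bytes
as_text = ET.tostring(root, encoding="unicode")  # str instead of bytes
expanded = ET.tostring(root, encoding="unicode", short_empty_elements=False)

print(as_bytes)  # b'<a><b /></a>'
print(as_text)   # <a><b /></a>
print(expanded)  # <a><b></b></a>
```
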
722
723.. function:: tostringlist(element, encoding="us-ascii", method="xml", *, \
724                           xml_declaration=None, default_namespace=None, \
725                           short_empty_elements=True)
726
727   Generates a string representation of an XML element, including all
728   subelements.  *element* is an :class:`Element` instance.  *encoding* [1]_ is
729   the output encoding (default is US-ASCII).  Use ``encoding="unicode"`` to
730   generate a Unicode string (otherwise, a bytestring is generated).  *method*
731   is either ``"xml"``, ``"html"`` or ``"text"`` (default is ``"xml"``).
   *xml_declaration*, *default_namespace* and *short_empty_elements* have the same
   meaning as in :meth:`ElementTree.write`.  Returns a list of (optionally) encoded
   strings containing the XML data.  It does not guarantee any specific sequence,
   except that ``b"".join(tostringlist(element)) == tostring(element)``.
736
737   .. versionadded:: 3.2
738
739   .. versionadded:: 3.4
740      The *short_empty_elements* parameter.
741
742   .. versionadded:: 3.8
743      The *xml_declaration* and *default_namespace* parameters.
744
745   .. versionchanged:: 3.8
746      The :func:`tostringlist` function now preserves the attribute order
747      specified by the user.
748
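The one guarantee stated for :func:`tostringlist`, that joining the pieces
reproduces the :func:`tostring` output, can be checked with a short sketch (the
sample document is an assumption made for illustration):

```python
import xml.etree.ElementTree as ET

root = ET.fromstring("<list><item>one</item><item>two</item></list>")

# With the default encoding the chunks are bytes objects.
chunks = ET.tostringlist(root)
joined = b"".join(chunks)

# With encoding="unicode" the chunks are str objects.
text_chunks = ET.tostringlist(root, encoding="unicode")
joined_text = "".join(text_chunks)

# How the output is split into chunks is not specified; only the
# concatenation is meaningful.
print(joined == ET.tostring(root))
print(joined_text == ET.tostring(root, encoding="unicode"))
```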
749
750.. function:: XML(text, parser=None)
751
752   Parses an XML section from a string constant.  This function can be used to
753   embed "XML literals" in Python code.  *text* is a string containing XML
754   data.  *parser* is an optional parser instance.  If not given, the standard
755   :class:`XMLParser` parser is used.  Returns an :class:`Element` instance.
756
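A brief sketch of embedding an "XML literal" with :func:`XML`, and of what
happens when the literal is malformed.  The ``config`` document is made up for
this example; the exception is the module's :exc:`ParseError`.

```python
import xml.etree.ElementTree as ET

# A well-formed literal is returned as an Element.
root = ET.XML("<config><option name='debug'>true</option></config>")
option = root.find("option")
print(option.get("name"), option.text)

# A malformed literal raises ParseError, which carries the
# (line, column) position of the problem.
error_position = None
try:
    ET.XML("<config><option></config>")  # mismatched closing tag
except ET.ParseError as err:
    error_position = err.position
    print("parse failed:", err)
```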
757
758.. function:: XMLID(text, parser=None)
759
   Parses an XML section from a string constant, and also returns a dictionary
   which maps from element IDs to elements.  *text* is a string containing XML
   data.  *parser* is an optional parser instance.  If not given, the standard
   :class:`XMLParser` parser is used.  Returns a tuple containing an
   :class:`Element` instance and a dictionary.
765
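A small sketch of the tuple returned by :func:`XMLID`.  It assumes, as in the
reference implementation, that the dictionary is keyed by the value of each
element's ``id`` attribute; the catalog document is invented for the example.

```python
import xml.etree.ElementTree as ET

root, ids = ET.XMLID(
    "<catalog>"
    '<book id="b1"><title>First</title></book>'
    '<book id="b2"><title>Second</title></book>'
    "<book><title>No id</title></book>"
    "</catalog>"
)

# Elements without an id attribute are simply absent from the mapping.
print(sorted(ids))
print(ids["b2"].findtext("title"))
```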
766
767.. _elementtree-xinclude:
768
769XInclude support
770----------------
771
772This module provides limited support for
773`XInclude directives <https://www.w3.org/TR/xinclude/>`_, via the :mod:`xml.etree.ElementInclude` helper module.  This module can be used to insert subtrees and text strings into element trees, based on information in the tree.
774
775Example
776^^^^^^^
777
778Here's an example that demonstrates use of the XInclude module. To include an XML document in the current document, use the ``{http://www.w3.org/2001/XInclude}include`` element and set the **parse** attribute to ``"xml"``, and use the **href** attribute to specify the document to include.
779
780.. code-block:: xml
781
782    <?xml version="1.0"?>
783    <document xmlns:xi="http://www.w3.org/2001/XInclude">
784      <xi:include href="source.xml" parse="xml" />
785    </document>
786
787By default, the **href** attribute is treated as a file name. You can use custom loaders to override this behaviour. Also note that the standard helper does not support XPointer syntax.
788
To process this file, load it as usual, and pass the root element to the :mod:`xml.etree.ElementInclude` module:
790
791.. code-block:: python
792
   from xml.etree import ElementTree, ElementInclude

   tree = ElementTree.parse("document.xml")
   root = tree.getroot()

   # Expand the xi:include directives found below the root element.
   ElementInclude.include(root)
799
800The ElementInclude module replaces the ``{http://www.w3.org/2001/XInclude}include`` element with the root element from the **source.xml** document. The result might look something like this:
801
802.. code-block:: xml
803
804    <document xmlns:xi="http://www.w3.org/2001/XInclude">
805      <para>This is a paragraph.</para>
806    </document>
807
If the **parse** attribute is omitted, it defaults to ``"xml"``. The **href** attribute is required.

To include a text document, use the ``{http://www.w3.org/2001/XInclude}include`` element, and set the **parse** attribute to ``"text"``:
811
812.. code-block:: xml
813
814    <?xml version="1.0"?>
815    <document xmlns:xi="http://www.w3.org/2001/XInclude">
816      Copyright (c) <xi:include href="year.txt" parse="text" />.
817    </document>
818
819The result might look something like:
820
821.. code-block:: xml
822
823    <document xmlns:xi="http://www.w3.org/2001/XInclude">
824      Copyright (c) 2003.
825    </document>
826
827Reference
828---------
829
830.. _elementinclude-functions:
831
832Functions
833^^^^^^^^^
834
.. function:: xml.etree.ElementInclude.default_loader(href, parse, encoding=None)

   Default loader.  This default loader reads an included resource from disk.
   *href* is a URL.  *parse* is the parse mode, either ``"xml"`` or ``"text"``.
   *encoding* is an optional text encoding.  If not given, the encoding is
   ``utf-8``.  Returns the expanded resource.  If the parse mode is ``"xml"``,
   this is an :class:`~xml.etree.ElementTree.Element` instance.  If the parse
   mode is ``"text"``, this is a Unicode string.  If the loader fails, it can
   return ``None`` or raise an exception.
843
844
.. function:: xml.etree.ElementInclude.include(elem, loader=None, base_url=None, \
                                               max_depth=6)

   This function expands XInclude directives.  *elem* is the root element.  *loader* is
   an optional resource loader.  If omitted, it defaults to :func:`default_loader`.
   If given, it should be a callable that implements the same interface as
   :func:`default_loader`.  *base_url* is the base URL of the original file, used to
   resolve relative include file references.  *max_depth* is the maximum number of
   recursive inclusions.  It is limited to reduce the risk of malicious content
   explosion.  Pass ``None`` to disable the limitation.
855
   The directives are expanded in place: the tree rooted at *elem* is modified,
   and the expanded tree is not returned.  If the loader fails, it can return
   ``None`` or raise an exception.
860
861   .. versionadded:: 3.9
862      The *base_url* and *max_depth* parameters.
863
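The loader interface described above can be sketched with an in-memory loader,
which avoids touching the file system.  ``RESOURCES`` and ``memory_loader`` are
hypothetical names introduced for this example; the loader returns a parsed
element for ``"xml"`` and a string for ``"text"``.

```python
from xml.etree import ElementTree as ET, ElementInclude

# Hypothetical in-memory "files", keyed by the href used in the document.
RESOURCES = {
    "source.xml": "<para>This is a paragraph.</para>",
    "year.txt": "2003",
}

def memory_loader(href, parse, encoding=None):
    """Same call signature as default_loader, but reads from RESOURCES."""
    data = RESOURCES[href]
    if parse == "xml":
        return ET.fromstring(data)   # parsed element for XML includes
    return data                      # plain string for text includes

root = ET.fromstring(
    '<document xmlns:xi="http://www.w3.org/2001/XInclude">'
    '<xi:include href="source.xml" parse="xml" />'
    '<footer>Copyright (c) <xi:include href="year.txt" parse="text" />.</footer>'
    "</document>"
)

# The tree is modified in place.
ElementInclude.include(root, loader=memory_loader)

footer = root.find("footer")
print(root[0].tag, root[0].text)
print(footer.text)
```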
864
865.. _elementtree-element-objects:
866
867Element Objects
868^^^^^^^^^^^^^^^
869
870.. class:: Element(tag, attrib={}, **extra)
871
872   Element class.  This class defines the Element interface, and provides a
873   reference implementation of this interface.
874
875   The element name, attribute names, and attribute values can be either
876   bytestrings or Unicode strings.  *tag* is the element name.  *attrib* is
877   an optional dictionary, containing element attributes.  *extra* contains
878   additional attributes, given as keyword arguments.
879
880
881   .. attribute:: tag
882
883      A string identifying what kind of data this element represents (the
884      element type, in other words).
885
886
887   .. attribute:: text
888                  tail
889
890      These attributes can be used to hold additional data associated with
891      the element.  Their values are usually strings but may be any
892      application-specific object.  If the element is created from
893      an XML file, the *text* attribute holds either the text between
894      the element's start tag and its first child or end tag, or ``None``, and
895      the *tail* attribute holds either the text between the element's
896      end tag and the next tag, or ``None``.  For the XML data
897
898      .. code-block:: xml
899
900         <a><b>1<c>2<d/>3</c></b>4</a>
901
902      the *a* element has ``None`` for both *text* and *tail* attributes,
903      the *b* element has *text* ``"1"`` and *tail* ``"4"``,
904      the *c* element has *text* ``"2"`` and *tail* ``None``,
905      and the *d* element has *text* ``None`` and *tail* ``"3"``.
906
907      To collect the inner text of an element, see :meth:`itertext`, for
908      example ``"".join(element.itertext())``.
909
910      Applications may store arbitrary objects in these attributes.
911
912
913   .. attribute:: attrib
914
915      A dictionary containing the element's attributes.  Note that while the
916      *attrib* value is always a real mutable Python dictionary, an ElementTree
917      implementation may choose to use another internal representation, and
918      create the dictionary only if someone asks for it.  To take advantage of
919      such implementations, use the dictionary methods below whenever possible.
920
921   The following dictionary-like methods work on the element attributes.
922
923
924   .. method:: clear()
925
926      Resets an element.  This function removes all subelements, clears all
927      attributes, and sets the text and tail attributes to ``None``.
928
929
930   .. method:: get(key, default=None)
931
932      Gets the element attribute named *key*.
933
934      Returns the attribute value, or *default* if the attribute was not found.
935
936
937   .. method:: items()
938
939      Returns the element attributes as a sequence of (name, value) pairs.  The
940      attributes are returned in an arbitrary order.
941
942
943   .. method:: keys()
944
      Returns the element's attribute names as a list.  The names are returned
      in an arbitrary order.
947
948
949   .. method:: set(key, value)
950
951      Set the attribute *key* on the element to *value*.
952
953   The following methods work on the element's children (subelements).
954
955
956   .. method:: append(subelement)
957
958      Adds the element *subelement* to the end of this element's internal list
959      of subelements.  Raises :exc:`TypeError` if *subelement* is not an
960      :class:`Element`.
961
962
963   .. method:: extend(subelements)
964
965      Appends *subelements* from a sequence object with zero or more elements.
966      Raises :exc:`TypeError` if a subelement is not an :class:`Element`.
967
968      .. versionadded:: 3.2
969
970
971   .. method:: find(match, namespaces=None)
972
973      Finds the first subelement matching *match*.  *match* may be a tag name
974      or a :ref:`path <elementtree-xpath>`.  Returns an element instance
975      or ``None``.  *namespaces* is an optional mapping from namespace prefix
976      to full name.  Pass ``''`` as prefix to move all unprefixed tag names
977      in the expression into the given namespace.
978
979
980   .. method:: findall(match, namespaces=None)
981
982      Finds all matching subelements, by tag name or
983      :ref:`path <elementtree-xpath>`.  Returns a list containing all matching
984      elements in document order.  *namespaces* is an optional mapping from
985      namespace prefix to full name.  Pass ``''`` as prefix to move all
986      unprefixed tag names in the expression into the given namespace.
987
988
989   .. method:: findtext(match, default=None, namespaces=None)
990
991      Finds text for the first subelement matching *match*.  *match* may be
992      a tag name or a :ref:`path <elementtree-xpath>`.  Returns the text content
993      of the first matching element, or *default* if no element was found.
      Note that if the matching element has no text content, an empty string
      is returned.  *namespaces* is an optional mapping from namespace prefix
996      to full name.  Pass ``''`` as prefix to move all unprefixed tag names
997      in the expression into the given namespace.
998
999
1000   .. method:: insert(index, subelement)
1001
1002      Inserts *subelement* at the given position in this element.  Raises
1003      :exc:`TypeError` if *subelement* is not an :class:`Element`.
1004
1005
1006   .. method:: iter(tag=None)
1007
1008      Creates a tree :term:`iterator` with the current element as the root.
1009      The iterator iterates over this element and all elements below it, in
1010      document (depth first) order.  If *tag* is not ``None`` or ``'*'``, only
1011      elements whose tag equals *tag* are returned from the iterator.  If the
1012      tree structure is modified during iteration, the result is undefined.
1013
1014      .. versionadded:: 3.2
1015
1016
1017   .. method:: iterfind(match, namespaces=None)
1018
1019      Finds all matching subelements, by tag name or
1020      :ref:`path <elementtree-xpath>`.  Returns an iterable yielding all
1021      matching elements in document order. *namespaces* is an optional mapping
1022      from namespace prefix to full name.
1023
1024
1025      .. versionadded:: 3.2
1026
1027
1028   .. method:: itertext()
1029
1030      Creates a text iterator.  The iterator loops over this element and all
1031      subelements, in document order, and returns all inner text.
1032
1033      .. versionadded:: 3.2
1034
1035
1036   .. method:: makeelement(tag, attrib)
1037
      Creates a new element object of the same type as this element.  Do not
      call this method; use the :func:`SubElement` factory function instead.
1040
1041
1042   .. method:: remove(subelement)
1043
1044      Removes *subelement* from the element.  Unlike the find\* methods this
1045      method compares elements based on the instance identity, not on tag value
1046      or contents.
1047
1048   :class:`Element` objects also support the following sequence type methods
1049   for working with subelements: :meth:`~object.__delitem__`,
1050   :meth:`~object.__getitem__`, :meth:`~object.__setitem__`,
1051   :meth:`~object.__len__`.
1052
   Caution: Elements with no subelements will test as ``False``.  This behavior
   will change in future versions.  Use an explicit ``len(elem)`` or ``elem is
   None`` test instead. ::

     element = root.find('foo')

     if not element:  # careful!
         print("element not found, or element has no subelements")

     if element is None:
         print("element not found")
1064
1065   Prior to Python 3.8, the serialisation order of the XML attributes of
1066   elements was artificially made predictable by sorting the attributes by
1067   their name. Based on the now guaranteed ordering of dicts, this arbitrary
1068   reordering was removed in Python 3.8 to preserve the order in which
1069   attributes were originally parsed or created by user code.
1070
1071   In general, user code should try not to depend on a specific ordering of
1072   attributes, given that the `XML Information Set
1073   <https://www.w3.org/TR/xml-infoset/>`_ explicitly excludes the attribute
1074   order from conveying information. Code should be prepared to deal with
1075   any ordering on input. In cases where deterministic XML output is required,
1076   e.g. for cryptographic signing or test data sets, canonical serialisation
1077   is available with the :func:`canonicalize` function.
1078
1079   In cases where canonical output is not applicable but a specific attribute
1080   order is still desirable on output, code should aim for creating the
1081   attributes directly in the desired order, to avoid perceptual mismatches
1082   for readers of the code. In cases where this is difficult to achieve, a
1083   recipe like the following can be applied prior to serialisation to enforce
1084   an order independently from the Element creation::
1085
     def reorder_attributes(root):
         for el in root.iter():
             attrib = el.attrib
             if len(attrib) > 1:
                 # adjust attribute order, e.g. by sorting
                 attribs = sorted(attrib.items())
                 attrib.clear()
                 attrib.update(attribs)
1094
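The *text* and *tail* rules and the ``find*`` methods described for
:class:`Element` can be tried on the small document used above.  This is only a
sketch; the namespaced document and its URI are assumptions made for
illustration.

```python
import xml.etree.ElementTree as ET

a = ET.fromstring("<a><b>1<c>2<d/>3</c></b>4</a>")

# text is what follows the start tag; tail is what follows the end tag.
text_and_tail = {el.tag: (el.text, el.tail) for el in a.iter()}
inner_text = "".join(a.itertext())

# A bare tag name only matches direct children; use a path for descendants.
direct_only = a.find("c")          # None: c is a grandchild of a
c = a.find("b/c")
d = a.find(".//d")

# findtext returns "" for a match without text, and default for no match.
empty_text = a.findtext(".//d")
fallback = a.findtext("missing", default="n/a")

# d was found, but it has no children, so test it with "is None" or len().
d_found = d is not None
d_children = len(d)

# Prefixes in a path are resolved through the namespaces mapping.
doc = ET.fromstring(
    '<root xmlns:x="http://example.com/ns"><x:item>v</x:item></root>'
)
item = doc.find("x:item", {"x": "http://example.com/ns"})

print(text_and_tail)
print(inner_text, repr(empty_text), fallback, item.text)
```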
1095
1096.. _elementtree-elementtree-objects:
1097
1098ElementTree Objects
1099^^^^^^^^^^^^^^^^^^^
1100
1101
1102.. class:: ElementTree(element=None, file=None)
1103
1104   ElementTree wrapper class.  This class represents an entire element
1105   hierarchy, and adds some extra support for serialization to and from
1106   standard XML.
1107
1108   *element* is the root element.  The tree is initialized with the contents
1109   of the XML *file* if given.
1110
1111
1112   .. method:: _setroot(element)
1113
1114      Replaces the root element for this tree.  This discards the current
1115      contents of the tree, and replaces it with the given element.  Use with
1116      care.  *element* is an element instance.
1117
1118
1119   .. method:: find(match, namespaces=None)
1120
1121      Same as :meth:`Element.find`, starting at the root of the tree.
1122
1123
1124   .. method:: findall(match, namespaces=None)
1125
1126      Same as :meth:`Element.findall`, starting at the root of the tree.
1127
1128
1129   .. method:: findtext(match, default=None, namespaces=None)
1130
1131      Same as :meth:`Element.findtext`, starting at the root of the tree.
1132
1133
1134   .. method:: getroot()
1135
1136      Returns the root element for this tree.
1137
1138
1139   .. method:: iter(tag=None)
1140
1141      Creates and returns a tree iterator for the root element.  The iterator
1142      loops over all elements in this tree, in section order.  *tag* is the tag
1143      to look for (default is to return all elements).
1144
1145
1146   .. method:: iterfind(match, namespaces=None)
1147
1148      Same as :meth:`Element.iterfind`, starting at the root of the tree.
1149
1150      .. versionadded:: 3.2
1151
1152
1153   .. method:: parse(source, parser=None)
1154
1155      Loads an external XML section into this element tree.  *source* is a file
1156      name or :term:`file object`.  *parser* is an optional parser instance.
1157      If not given, the standard :class:`XMLParser` parser is used.  Returns the
1158      section root element.
1159
1160
1161   .. method:: write(file, encoding="us-ascii", xml_declaration=None, \
1162                     default_namespace=None, method="xml", *, \
1163                     short_empty_elements=True)
1164
1165      Writes the element tree to a file, as XML.  *file* is a file name, or a
1166      :term:`file object` opened for writing.  *encoding* [1]_ is the output
1167      encoding (default is US-ASCII).
      *xml_declaration* controls if an XML declaration should be added to the
      file.  Use ``False`` for never, ``True`` for always, and ``None`` to add
      one only if the encoding is not US-ASCII, UTF-8 or Unicode (default is ``None``).
1171      *default_namespace* sets the default XML namespace (for "xmlns").
1172      *method* is either ``"xml"``, ``"html"`` or ``"text"`` (default is
1173      ``"xml"``).
1174      The keyword-only *short_empty_elements* parameter controls the formatting
1175      of elements that contain no content.  If ``True`` (the default), they are
1176      emitted as a single self-closed tag, otherwise they are emitted as a pair
1177      of start/end tags.
1178
1179      The output is either a string (:class:`str`) or binary (:class:`bytes`).
1180      This is controlled by the *encoding* argument.  If *encoding* is
1181      ``"unicode"``, the output is a string; otherwise, it's binary.  Note that
1182      this may conflict with the type of *file* if it's an open
1183      :term:`file object`; make sure you do not try to write a string to a
1184      binary stream and vice versa.
1185
1186      .. versionadded:: 3.4
1187         The *short_empty_elements* parameter.
1188
1189      .. versionchanged:: 3.8
1190         The :meth:`write` method now preserves the attribute order specified
1191         by the user.
1192
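The interaction between *encoding* and the type of *file* in :meth:`write` can
be sketched with in-memory streams.  The sample tree is an assumption made for
illustration: ``encoding="unicode"`` goes with a text stream, any other
encoding with a binary stream.

```python
import io
import xml.etree.ElementTree as ET

tree = ET.ElementTree(
    ET.fromstring("<root><child>text</child><empty/></root>")
)

# encoding="unicode" produces str, so the target must accept text.
text_stream = io.StringIO()
tree.write(text_stream, encoding="unicode")

# Any other encoding produces bytes, so the target must be binary.
binary_stream = io.BytesIO()
tree.write(
    binary_stream,
    encoding="utf-8",
    xml_declaration=True,          # always emit the declaration
    short_empty_elements=False,    # write <empty></empty>, not <empty />
)

print(text_stream.getvalue())
print(binary_stream.getvalue())
```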
1193
This is the XML file that is going to be manipulated::

    <html>
        <head>
            <title>Example page</title>
        </head>
        <body>
            <p>Moved to <a href="http://example.org/">example.org</a>
            or <a href="http://example.com/">example.com</a>.</p>
        </body>
    </html>

Example of changing the attribute "target" of every link in the first paragraph::

    >>> from xml.etree.ElementTree import ElementTree
    >>> tree = ElementTree()
    >>> tree.parse("index.xhtml")
    <Element 'html' at 0xb77e6fac>
    >>> p = tree.find("body/p")     # Finds first occurrence of tag p in body
    >>> p
    <Element 'p' at 0xb77ec26c>
    >>> links = list(p.iter("a"))   # Returns list of all links
    >>> links
    [<Element 'a' at 0xb77ec2ac>, <Element 'a' at 0xb77ec1cc>]
    >>> for i in links:             # Iterates through all found links
    ...     i.attrib["target"] = "blank"
    >>> tree.write("output.xhtml")
1221
1222.. _elementtree-qname-objects:
1223
1224QName Objects
1225^^^^^^^^^^^^^
1226
1227
1228.. class:: QName(text_or_uri, tag=None)
1229
   QName wrapper.  This can be used to wrap a QName attribute value, in order
   to get proper namespace handling on output.  *text_or_uri* is a string
   containing the QName value, in the form ``{uri}local``, or, if the *tag*
   argument is given, the URI part of a QName.  If *tag* is given, the first
   argument is interpreted as a URI, and *tag* is interpreted as a local name.
   :class:`QName` instances are opaque.
1236
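A short sketch of both ways to construct a :class:`QName`, and of the
namespace handling it provides on output.  The namespace URI and the names are
hypothetical, and the generated prefix is whatever the serialiser assigns.

```python
import xml.etree.ElementTree as ET

uri = "http://example.com/ns"   # hypothetical namespace URI

# Two equivalent spellings: (uri, local name) or a single "{uri}local" string.
from_parts = ET.QName(uri, "item")
from_text = ET.QName("{http://example.com/ns}item")

# A QName can be used as a tag and as an attribute value; the serialiser
# then declares the namespace and writes prefixed names.
elem = ET.Element(from_parts)
elem.set("kind", ET.QName(uri, "widget"))
serialised = ET.tostring(elem, encoding="unicode")

print(str(from_parts) == str(from_text))
print(serialised)
```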
1237
1238
1239.. _elementtree-treebuilder-objects:
1240
1241TreeBuilder Objects
1242^^^^^^^^^^^^^^^^^^^
1243
1244
1245.. class:: TreeBuilder(element_factory=None, *, comment_factory=None, \
1246                       pi_factory=None, insert_comments=False, insert_pis=False)
1247
1248   Generic element structure builder.  This builder converts a sequence of
1249   start, data, end, comment and pi method calls to a well-formed element
1250   structure.  You can use this class to build an element structure using
1251   a custom XML parser, or a parser for some other XML-like format.
1252
1253   *element_factory*, when given, must be a callable accepting two positional
1254   arguments: a tag and a dict of attributes.  It is expected to return a new
1255   element instance.
1256
1257   The *comment_factory* and *pi_factory* functions, when given, should behave
1258   like the :func:`Comment` and :func:`ProcessingInstruction` functions to
1259   create comments and processing instructions.  When not given, the default
1260   factories will be used.  When *insert_comments* and/or *insert_pis* is true,
1261   comments/pis will be inserted into the tree if they appear within the root
1262   element (but not outside of it).
1263
1264   .. method:: close()
1265
1266      Flushes the builder buffers, and returns the toplevel document
1267      element.  Returns an :class:`Element` instance.
1268
1269
1270   .. method:: data(data)
1271
1272      Adds text to the current element.  *data* is a string.  This should be
1273      either a bytestring, or a Unicode string.
1274
1275
1276   .. method:: end(tag)
1277
1278      Closes the current element.  *tag* is the element name.  Returns the
1279      closed element.
1280
1281
1282   .. method:: start(tag, attrs)
1283
1284      Opens a new element.  *tag* is the element name.  *attrs* is a dictionary
1285      containing element attributes.  Returns the opened element.
1286
1287
1288   .. method:: comment(text)
1289
1290      Creates a comment with the given *text*.  If ``insert_comments`` is true,
1291      this will also add it to the tree.
1292
1293      .. versionadded:: 3.8
1294
1295
1296   .. method:: pi(target, text)
1297
      Creates a processing instruction with the given *target* name and *text*.
      If ``insert_pis`` is true, this will also add it to the tree.
1300
1301      .. versionadded:: 3.8
1302
1303
1304   In addition, a custom :class:`TreeBuilder` object can provide the
1305   following methods:
1306
1307   .. method:: doctype(name, pubid, system)
1308
1309      Handles a doctype declaration.  *name* is the doctype name.  *pubid* is
1310      the public identifier.  *system* is the system identifier.  This method
1311      does not exist on the default :class:`TreeBuilder` class.
1312
1313      .. versionadded:: 3.2
1314
1315   .. method:: start_ns(prefix, uri)
1316
1317      Is called whenever the parser encounters a new namespace declaration,
1318      before the ``start()`` callback for the opening element that defines it.
1319      *prefix* is ``''`` for the default namespace and the declared
1320      namespace prefix name otherwise.  *uri* is the namespace URI.
1321
1322      .. versionadded:: 3.8
1323
1324   .. method:: end_ns(prefix)
1325
1326      Is called after the ``end()`` callback of an element that declared
1327      a namespace prefix mapping, with the name of the *prefix* that went
1328      out of scope.
1329
1330      .. versionadded:: 3.8
1331
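The sequence of calls that :class:`TreeBuilder` expects can be sketched by
driving it by hand, as a custom parser would.  The tag names and text are made
up for the example.

```python
import xml.etree.ElementTree as ET

builder = ET.TreeBuilder()

# Each start() must be balanced by an end(); data() adds text in between.
builder.start("greeting", {"lang": "en"})
builder.data("Hello, ")
builder.start("b", {})
builder.data("world")
builder.end("b")
builder.data("!")          # becomes the tail of <b>
builder.end("greeting")

# close() returns the top-level element.
root = builder.close()
serialised = ET.tostring(root, encoding="unicode")
print(serialised)
```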
1332
1333.. class:: C14NWriterTarget(write, *, \
1334             with_comments=False, strip_text=False, rewrite_prefixes=False, \
1335             qname_aware_tags=None, qname_aware_attrs=None, \
1336             exclude_attrs=None, exclude_tags=None)
1337
1338   A `C14N 2.0 <https://www.w3.org/TR/xml-c14n2/>`_ writer.  Arguments are the
1339   same as for the :func:`canonicalize` function.  This class does not build a
1340   tree but translates the callback events directly into a serialised form
1341   using the *write* function.
1342
1343   .. versionadded:: 3.8
1344
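As a hedged sketch, :class:`C14NWriterTarget` can be used as a parser target
that hands pieces of canonical output to a *write* callable, here the
``append`` method of a list.  The input document is an assumption made for
illustration, and Python 3.8 or later is required.

```python
import xml.etree.ElementTree as ET

xml_data = '<root b="2" a="1"><child/></root>'

# Collect the pieces passed to the write callback instead of building a tree.
pieces = []
parser = ET.XMLParser(target=ET.C14NWriterTarget(pieces.append))
parser.feed(xml_data)
parser.close()

canonical = "".join(pieces)
print(canonical)

# The same serialisation is available in one call through canonicalize().
print(canonical == ET.canonicalize(xml_data))
```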
1345
1346.. _elementtree-xmlparser-objects:
1347
1348XMLParser Objects
1349^^^^^^^^^^^^^^^^^
1350
1351
1352.. class:: XMLParser(*, target=None, encoding=None)
1353
1354   This class is the low-level building block of the module.  It uses
1355   :mod:`xml.parsers.expat` for efficient, event-based parsing of XML.  It can
1356   be fed XML data incrementally with the :meth:`feed` method, and parsing
1357   events are translated to a push API - by invoking callbacks on the *target*
1358   object.  If *target* is omitted, the standard :class:`TreeBuilder` is used.
1359   If *encoding* [1]_ is given, the value overrides the
1360   encoding specified in the XML file.
1361
1362   .. versionchanged:: 3.8
      Parameters are now :ref:`keyword-only <keyword-only_parameter>`.
      The *html* argument is no longer supported.
1365
1366
1367   .. method:: close()
1368
1369      Finishes feeding data to the parser.  Returns the result of calling the
1370      ``close()`` method of the *target* passed during construction; by default,
1371      this is the toplevel document element.
1372
1373
1374   .. method:: feed(data)
1375
1376      Feeds data to the parser.  *data* is encoded data.
1377
1378   :meth:`XMLParser.feed` calls *target*\'s ``start(tag, attrs_dict)`` method
1379   for each opening tag, its ``end(tag)`` method for each closing tag, and data
1380   is processed by method ``data(data)``.  For further supported callback
1381   methods, see the :class:`TreeBuilder` class.  :meth:`XMLParser.close` calls
1382   *target*\'s method ``close()``. :class:`XMLParser` can be used not only for
1383   building a tree structure. This is an example of counting the maximum depth
1384   of an XML file::
1385
    >>> from xml.etree.ElementTree import XMLParser
    >>> class MaxDepth:                     # The target object of the parser
    ...     maxDepth = 0
    ...     depth = 0
    ...     def start(self, tag, attrib):   # Called for each opening tag.
    ...         self.depth += 1
    ...         if self.depth > self.maxDepth:
    ...             self.maxDepth = self.depth
    ...     def end(self, tag):             # Called for each closing tag.
    ...         self.depth -= 1
    ...     def data(self, data):
    ...         pass            # We do not need to do anything with data.
    ...     def close(self):    # Called when all data has been parsed.
    ...         return self.maxDepth
    ...
    >>> target = MaxDepth()
    >>> parser = XMLParser(target=target)
    >>> exampleXml = """
    ... <a>
    ...   <b>
    ...   </b>
    ...   <b>
    ...     <c>
    ...       <d>
    ...       </d>
    ...     </c>
    ...   </b>
    ... </a>"""
    >>> parser.feed(exampleXml)
    >>> parser.close()
    4
1417
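When no *target* is given, the same incremental interface builds an ordinary
tree.  The following sketch feeds a document in arbitrary chunks (the chunk
boundaries and the document itself are invented for illustration) and takes
the root element from :meth:`XMLParser.close`.

```python
import xml.etree.ElementTree as ET

parser = ET.XMLParser()   # default target: a TreeBuilder

# Chunks may split the document anywhere, even in the middle of a tag.
for chunk in ("<root><item>", "one</item><it", "em>two</item></root>"):
    parser.feed(chunk)

# close() returns whatever the target's close() returns; for the
# default target that is the top-level element.
root = parser.close()
items = [item.text for item in root]
print(root.tag, items)
```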
1418
1419.. _elementtree-xmlpullparser-objects:
1420
1421XMLPullParser Objects
1422^^^^^^^^^^^^^^^^^^^^^
1423
1424.. class:: XMLPullParser(events=None)
1425
1426   A pull parser suitable for non-blocking applications.  Its input-side API is
1427   similar to that of :class:`XMLParser`, but instead of pushing calls to a
1428   callback target, :class:`XMLPullParser` collects an internal list of parsing
1429   events and lets the user read from it. *events* is a sequence of events to
1430   report back.  The supported events are the strings ``"start"``, ``"end"``,
1431   ``"comment"``, ``"pi"``, ``"start-ns"`` and ``"end-ns"`` (the "ns" events
1432   are used to get detailed namespace information).  If *events* is omitted,
1433   only ``"end"`` events are reported.
1434
1435   .. method:: feed(data)
1436
1437      Feed the given bytes data to the parser.
1438
1439   .. method:: close()
1440
1441      Signal the parser that the data stream is terminated. Unlike
1442      :meth:`XMLParser.close`, this method always returns :const:`None`.
1443      Any events not yet retrieved when the parser is closed can still be
1444      read with :meth:`read_events`.
1445
1446   .. method:: read_events()
1447
      Return an iterator over the events which have been encountered in the
      data fed to the parser.  The iterator yields ``(event, elem)`` pairs,
      where *event* is a string representing the type of event (e.g.
      ``"end"``) and *elem* is the encountered :class:`Element` object, or
      other context value as follows.
1453
1454      * ``start``, ``end``: the current Element.
1455      * ``comment``, ``pi``: the current comment / processing instruction
1456      * ``start-ns``: a tuple ``(prefix, uri)`` naming the declared namespace
1457        mapping.
1458      * ``end-ns``: :const:`None` (this may change in a future version)
1459
1460      Events provided in a previous call to :meth:`read_events` will not be
1461      yielded again.  Events are consumed from the internal queue only when
1462      they are retrieved from the iterator, so multiple readers iterating in
1463      parallel over iterators obtained from :meth:`read_events` will have
1464      unpredictable results.
1465
1466   .. note::
1467
1468      :class:`XMLPullParser` only guarantees that it has seen the ">"
1469      character of a starting tag when it emits a "start" event, so the
1470      attributes are defined, but the contents of the text and tail attributes
1471      are undefined at that point.  The same applies to the element children;
1472      they may or may not be present.
1473
1474      If you need a fully populated element, look for "end" events instead.
1475
1476   .. versionadded:: 3.4
1477
1478   .. versionchanged:: 3.8
1479      The ``comment`` and ``pi`` events were added.
1480
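A minimal sketch of the pull interface: data is fed in pieces, and whatever
events are available are drained with :meth:`read_events` after each piece and
once more after :meth:`close`.  The document and the chunk boundaries are
assumptions made for illustration; how many events become available after a
given chunk is left to the parser, so only the overall order is relied upon.

```python
import xml.etree.ElementTree as ET

parser = ET.XMLPullParser(events=("start", "end"))
seen = []

for chunk in (b"<root><item>one</item>", b"<item>two</item></root>"):
    parser.feed(chunk)
    # Drain the events that are available so far; each is reported once.
    for event, elem in parser.read_events():
        seen.append((event, elem.tag))

parser.close()
# Events still queued at close time remain readable.
for event, elem in parser.read_events():
    seen.append((event, elem.tag))

print(seen)
```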
1481
1482Exceptions
1483^^^^^^^^^^
1484
1485.. class:: ParseError
1486
1487   XML parse error, raised by the various parsing methods in this module when
1488   parsing fails.  The string representation of an instance of this exception
1489   will contain a user-friendly error message.  In addition, it will have
1490   the following attributes available:
1491
1492   .. attribute:: code
1493
1494      A numeric error code from the expat parser. See the documentation of
1495      :mod:`xml.parsers.expat` for the list of error codes and their meanings.
1496
1497   .. attribute:: position
1498
1499      A tuple of *line*, *column* numbers, specifying where the error occurred.
1500
1501.. rubric:: Footnotes
1502
1503.. [1] The encoding string included in XML output should conform to the
1504   appropriate standards.  For example, "UTF-8" is valid, but "UTF8" is
1505   not.  See https://www.w3.org/TR/2006/REC-xml11-20060816/#NT-EncodingDecl
1506   and https://www.iana.org/assignments/character-sets/character-sets.xhtml.
1507