:mod:`xml.etree.ElementTree` --- The ElementTree XML API
========================================================

.. module:: xml.etree.ElementTree
   :synopsis: Implementation of the ElementTree API.

.. moduleauthor:: Fredrik Lundh <fredrik@pythonware.com>

**Source code:** :source:`Lib/xml/etree/ElementTree.py`

--------------

The :mod:`xml.etree.ElementTree` module implements a simple and efficient API
for parsing and creating XML data.

.. versionchanged:: 3.3
   This module will use a fast implementation whenever available.
   The :mod:`xml.etree.cElementTree` module is deprecated.


.. warning::

   The :mod:`xml.etree.ElementTree` module is not secure against
   maliciously constructed data.  If you need to parse untrusted or
   unauthenticated data see :ref:`xml-vulnerabilities`.

Tutorial
--------

This is a short tutorial for using :mod:`xml.etree.ElementTree` (``ET`` in
short).  The goal is to demonstrate some of the building blocks and basic
concepts of the module.

XML tree and elements
^^^^^^^^^^^^^^^^^^^^^

XML is an inherently hierarchical data format, and the most natural way to
represent it is with a tree.  ``ET`` has two classes for this purpose:
:class:`ElementTree` represents the whole XML document as a tree, and
:class:`Element` represents a single node in this tree.  Interactions with
the whole document (reading and writing to/from files) are usually done
on the :class:`ElementTree` level.  Interactions with a single XML element
and its sub-elements are done on the :class:`Element` level.

.. _elementtree-parsing-xml:

Parsing XML
^^^^^^^^^^^

We'll be using the following XML document as the sample data for this section:

.. code-block:: xml

   <?xml version="1.0"?>
   <data>
       <country name="Liechtenstein">
           <rank>1</rank>
           <year>2008</year>
           <gdppc>141100</gdppc>
           <neighbor name="Austria" direction="E"/>
           <neighbor name="Switzerland" direction="W"/>
       </country>
       <country name="Singapore">
           <rank>4</rank>
           <year>2011</year>
           <gdppc>59900</gdppc>
           <neighbor name="Malaysia" direction="N"/>
       </country>
       <country name="Panama">
           <rank>68</rank>
           <year>2011</year>
           <gdppc>13600</gdppc>
           <neighbor name="Costa Rica" direction="W"/>
           <neighbor name="Colombia" direction="E"/>
       </country>
   </data>

We can import this data by reading from a file::

   import xml.etree.ElementTree as ET
   tree = ET.parse('country_data.xml')
   root = tree.getroot()

Or directly from a string::

   root = ET.fromstring(country_data_as_string)

:func:`fromstring` parses XML from a string directly into an :class:`Element`,
which is the root element of the parsed tree.  Other parsing functions may
create an :class:`ElementTree`.  Check the documentation to be sure.

As an :class:`Element`, ``root`` has a tag and a dictionary of attributes::

   >>> root.tag
   'data'
   >>> root.attrib
   {}

It also has child nodes over which we can iterate::

   >>> for child in root:
   ...     print(child.tag, child.attrib)
   ...
   country {'name': 'Liechtenstein'}
   country {'name': 'Singapore'}
   country {'name': 'Panama'}

Children are nested, and we can access specific child nodes by index::

   >>> root[0][1].text
   '2008'


.. note::

   Not all elements of the XML input will end up as elements of the
   parsed tree.  Currently, this module skips over any XML comments,
   processing instructions, and document type declarations in the
   input.  Nevertheless, trees built using this module's API rather
   than parsing from XML text can have comments and processing
   instructions in them; they will be included when generating XML
   output.  A document type declaration may be accessed by passing a
   custom :class:`TreeBuilder` instance to the :class:`XMLParser`
   constructor.


.. _elementtree-pull-parsing:

Pull API for non-blocking parsing
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Most parsing functions provided by this module require the whole document
to be read at once before returning any result.  It is possible to use an
:class:`XMLParser` and feed data into it incrementally, but it is a push API that
calls methods on a callback target, which is too low-level and inconvenient for
most needs.  Sometimes what the user really wants is to be able to parse XML
incrementally, without blocking operations, while enjoying the convenience of
fully constructed :class:`Element` objects.

The most powerful tool for doing this is :class:`XMLPullParser`.  It does not
require a blocking read to obtain the XML data, and is instead fed with data
incrementally with :meth:`XMLPullParser.feed` calls.  To get the parsed XML
elements, call :meth:`XMLPullParser.read_events`.  Here is an example::

   >>> parser = ET.XMLPullParser(['start', 'end'])
   >>> parser.feed('<mytag>sometext')
   >>> list(parser.read_events())
   [('start', <Element 'mytag' at 0x7fa66db2be58>)]
   >>> parser.feed(' more text</mytag>')
   >>> for event, elem in parser.read_events():
   ...     print(event)
   ...     print(elem.tag, 'text=', elem.text)
   ...
   end
   mytag text= sometext more text

The obvious use case is applications that operate in a non-blocking fashion
where the XML data is being received from a socket or read incrementally from
some storage device.  In such cases, blocking reads are unacceptable.
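
The feed/read cycle described above can be sketched as follows.  This is a
minimal illustration under stated assumptions rather than a network client:
the ``chunks`` list is a hypothetical stand-in for data received from a
socket, and ``collect_items`` is an illustrative helper, not part of the
module:

.. code-block:: python

   import xml.etree.ElementTree as ET

   # Hypothetical input cut at arbitrary points (even in the middle of a
   # tag), standing in for data that arrives piecemeal from a socket.
   chunks = ['<data><it', 'em id="1">alpha</item><item id', '="2">beta</item></data>']

   items = []

   def collect_items(parser):
       # Drain whatever events the parser has produced so far.
       for event, elem in parser.read_events():
           if event == "end" and elem.tag == "item":
               # At the "end" event the element's text is complete.
               items.append((elem.get("id"), elem.text))

   parser = ET.XMLPullParser(events=("end",))
   for chunk in chunks:
       parser.feed(chunk)      # does not block waiting for more input
       collect_items(parser)
   parser.close()              # signal the end of the input
   collect_items(parser)       # pick up any events still queued

   print(items)  # [('1', 'alpha'), ('2', 'beta')]

Draining :meth:`XMLPullParser.read_events` after every
:meth:`~XMLPullParser.feed` call lets the application handle complete elements
as the parser makes them available; the final drain after
:meth:`~XMLPullParser.close` picks up anything still queued at the end of input.
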

Because it's so flexible, :class:`XMLPullParser` can be inconvenient to use for
simpler use cases.  If you don't mind your application blocking on reading XML
data but would still like to have incremental parsing capabilities, take a look
at :func:`iterparse`.  It can be useful when you're reading a large XML document
and don't want to hold it wholly in memory.

Finding interesting elements
^^^^^^^^^^^^^^^^^^^^^^^^^^^^

:class:`Element` has some useful methods that help iterate recursively over all
the sub-tree below it (its children, their children, and so on).  For example,
:meth:`Element.iter`::

   >>> for neighbor in root.iter('neighbor'):
   ...     print(neighbor.attrib)
   ...
   {'name': 'Austria', 'direction': 'E'}
   {'name': 'Switzerland', 'direction': 'W'}
   {'name': 'Malaysia', 'direction': 'N'}
   {'name': 'Costa Rica', 'direction': 'W'}
   {'name': 'Colombia', 'direction': 'E'}

:meth:`Element.findall` finds only elements with a tag which are direct
children of the current element.  :meth:`Element.find` finds the *first* child
with a particular tag, and :attr:`Element.text` accesses the element's text
content.  :meth:`Element.get` accesses the element's attributes::

   >>> for country in root.findall('country'):
   ...     rank = country.find('rank').text
   ...     name = country.get('name')
   ...     print(name, rank)
   ...
   Liechtenstein 1
   Singapore 4
   Panama 68

More sophisticated specification of which elements to look for is possible by
using :ref:`XPath <elementtree-xpath>`.

Modifying an XML File
^^^^^^^^^^^^^^^^^^^^^

:class:`ElementTree` provides a simple way to build XML documents and write them to files.
The :meth:`ElementTree.write` method serves this purpose.
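
A minimal sketch of building a small document and saving it with
:meth:`ElementTree.write` follows.  To keep the example self-contained it
writes to an in-memory :class:`io.BytesIO` buffer rather than a file, and the
element names simply mirror the sample data used in this tutorial:

.. code-block:: python

   import io
   import xml.etree.ElementTree as ET

   # Build <data><country name="..."><rank>1</rank></country></data>
   root = ET.Element("data")
   country = ET.SubElement(root, "country", name="Liechtenstein")
   rank = ET.SubElement(country, "rank")
   rank.text = "1"

   # Wrap the root element in an ElementTree to get access to write()
   tree = ET.ElementTree(root)

   # A byte encoding such as "utf-8" needs a binary file object
   buffer = io.BytesIO()
   tree.write(buffer, encoding="utf-8", xml_declaration=True)

   output = buffer.getvalue().decode("utf-8")
   print(output)

Passing a filename such as ``'output.xml'`` in place of the buffer writes the
same bytes to disk.
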

Once created, an :class:`Element` object may be manipulated by directly changing
its fields (such as :attr:`Element.text`), adding and modifying attributes
(:meth:`Element.set` method), as well as adding new children (for example
with :meth:`Element.append`).

Let's say we want to add one to each country's rank, and add an ``updated``
attribute to the rank element::

   >>> for rank in root.iter('rank'):
   ...     new_rank = int(rank.text) + 1
   ...     rank.text = str(new_rank)
   ...     rank.set('updated', 'yes')
   ...
   >>> tree.write('output.xml')

Our XML now looks like this:

.. code-block:: xml

   <?xml version="1.0"?>
   <data>
       <country name="Liechtenstein">
           <rank updated="yes">2</rank>
           <year>2008</year>
           <gdppc>141100</gdppc>
           <neighbor name="Austria" direction="E"/>
           <neighbor name="Switzerland" direction="W"/>
       </country>
       <country name="Singapore">
           <rank updated="yes">5</rank>
           <year>2011</year>
           <gdppc>59900</gdppc>
           <neighbor name="Malaysia" direction="N"/>
       </country>
       <country name="Panama">
           <rank updated="yes">69</rank>
           <year>2011</year>
           <gdppc>13600</gdppc>
           <neighbor name="Costa Rica" direction="W"/>
           <neighbor name="Colombia" direction="E"/>
       </country>
   </data>

We can remove elements using :meth:`Element.remove`.  Let's say we want to
remove all countries with a rank higher than 50::

   >>> for country in root.findall('country'):
   ...     rank = int(country.find('rank').text)
   ...     if rank > 50:
   ...         root.remove(country)
   ...
   >>> tree.write('output.xml')

Our XML now looks like this:

.. code-block:: xml

   <?xml version="1.0"?>
   <data>
       <country name="Liechtenstein">
           <rank updated="yes">2</rank>
           <year>2008</year>
           <gdppc>141100</gdppc>
           <neighbor name="Austria" direction="E"/>
           <neighbor name="Switzerland" direction="W"/>
       </country>
       <country name="Singapore">
           <rank updated="yes">5</rank>
           <year>2011</year>
           <gdppc>59900</gdppc>
           <neighbor name="Malaysia" direction="N"/>
       </country>
   </data>

Building XML documents
^^^^^^^^^^^^^^^^^^^^^^

The :func:`SubElement` function also provides a convenient way to create new
sub-elements for a given element::

   >>> a = ET.Element('a')
   >>> b = ET.SubElement(a, 'b')
   >>> c = ET.SubElement(a, 'c')
   >>> d = ET.SubElement(c, 'd')
   >>> ET.dump(a)
   <a><b /><c><d /></c></a>

Parsing XML with Namespaces
^^^^^^^^^^^^^^^^^^^^^^^^^^^

If the XML input has `namespaces
<https://en.wikipedia.org/wiki/XML_namespace>`__, tags and attributes
with prefixes in the form ``prefix:sometag`` get expanded to
``{uri}sometag`` where the *prefix* is replaced by the full *URI*.
Also, if there is a `default namespace
<https://www.w3.org/TR/xml-names/#defaulting>`__,
that full URI gets prepended to all of the non-prefixed tags.

Here is an XML example that incorporates two namespaces, one with the
prefix "fictional" and the other serving as the default namespace:

.. code-block:: xml

   <?xml version="1.0"?>
   <actors xmlns:fictional="http://characters.example.com"
           xmlns="http://people.example.com">
       <actor>
           <name>John Cleese</name>
           <fictional:character>Lancelot</fictional:character>
           <fictional:character>Archie Leach</fictional:character>
       </actor>
       <actor>
           <name>Eric Idle</name>
           <fictional:character>Sir Robin</fictional:character>
           <fictional:character>Gunther</fictional:character>
           <fictional:character>Commander Clement</fictional:character>
       </actor>
   </actors>

One way to search and explore this XML example is to manually add the
URI to every tag or attribute in the xpath of a
:meth:`~Element.find` or :meth:`~Element.findall`::

   root = ET.fromstring(xml_text)
   for actor in root.findall('{http://people.example.com}actor'):
       name = actor.find('{http://people.example.com}name')
       print(name.text)
       for char in actor.findall('{http://characters.example.com}character'):
           print(' |-->', char.text)

A better way to search the namespaced XML example is to create a
dictionary with your own prefixes and use those in the search functions::

   ns = {'real_person': 'http://people.example.com',
         'role': 'http://characters.example.com'}

   for actor in root.findall('real_person:actor', ns):
       name = actor.find('real_person:name', ns)
       print(name.text)
       for char in actor.findall('role:character', ns):
           print(' |-->', char.text)

These two approaches both output::

   John Cleese
    |--> Lancelot
    |--> Archie Leach
   Eric Idle
    |--> Sir Robin
    |--> Gunther
    |--> Commander Clement


Additional resources
^^^^^^^^^^^^^^^^^^^^

See http://effbot.org/zone/element-index.htm for tutorials and links to other
docs.


.. _elementtree-xpath:

XPath support
-------------

This module provides limited support for
`XPath expressions <https://www.w3.org/TR/xpath>`_ for locating elements in a
tree.  The goal is to support a small subset of the abbreviated syntax; a full
XPath engine is outside the scope of the module.

Example
^^^^^^^

Here's an example that demonstrates some of the XPath capabilities of the
module.  We'll be using the ``countrydata`` XML document from the
:ref:`Parsing XML <elementtree-parsing-xml>` section::

   import xml.etree.ElementTree as ET

   root = ET.fromstring(countrydata)

   # Top-level elements
   root.findall(".")

   # All 'neighbor' grandchildren of 'country' children of the top-level
   # elements
   root.findall("./country/neighbor")

   # Nodes with name='Singapore' that have a 'year' child
   root.findall(".//year/..[@name='Singapore']")

   # 'year' nodes that are children of nodes with name='Singapore'
   root.findall(".//*[@name='Singapore']/year")

   # All 'neighbor' nodes that are the second 'neighbor' child of their parent
   root.findall(".//neighbor[2]")

For XML with namespaces, use the usual qualified ``{namespace}tag`` notation::

   # All dublin-core "title" tags in the document
   root.findall(".//{http://purl.org/dc/elements/1.1/}title")


Supported XPath syntax
^^^^^^^^^^^^^^^^^^^^^^

.. tabularcolumns:: |l|L|

+-----------------------+------------------------------------------------------+
| Syntax                | Meaning                                              |
+=======================+======================================================+
| ``tag``               | Selects all child elements with the given tag.       |
|                       | For example, ``spam`` selects all child elements     |
|                       | named ``spam``, and ``spam/egg`` selects all         |
|                       | grandchildren named ``egg`` in all children named    |
|                       | ``spam``.  ``{namespace}*`` selects all tags in the  |
|                       | given namespace, ``{*}spam`` selects tags named      |
|                       | ``spam`` in any (or no) namespace, and ``{}*``       |
|                       | only selects tags that are not in a namespace.       |
|                       |                                                      |
|                       | .. versionchanged:: 3.8                              |
|                       |    Support for star-wildcards was added.             |
+-----------------------+------------------------------------------------------+
| ``*``                 | Selects all child elements, including comments and   |
|                       | processing instructions.  For example, ``*/egg``     |
|                       | selects all grandchildren named ``egg``.             |
+-----------------------+------------------------------------------------------+
| ``.``                 | Selects the current node.  This is mostly useful     |
|                       | at the beginning of the path, to indicate that it's  |
|                       | a relative path.                                     |
+-----------------------+------------------------------------------------------+
| ``//``                | Selects all subelements, on all levels beneath the   |
|                       | current element.  For example, ``.//egg`` selects    |
|                       | all ``egg`` elements in the entire tree.             |
+-----------------------+------------------------------------------------------+
| ``..``                | Selects the parent element.  Returns ``None`` if the |
|                       | path attempts to reach the ancestors of the start    |
|                       | element (the element ``find`` was called on).        |
+-----------------------+------------------------------------------------------+
| ``[@attrib]``         | Selects all elements that have the given attribute.  |
+-----------------------+------------------------------------------------------+
| ``[@attrib='value']`` | Selects all elements for which the given attribute   |
|                       | has the given value.  The value cannot contain       |
|                       | quotes.                                              |
+-----------------------+------------------------------------------------------+
| ``[tag]``             | Selects all elements that have a child named         |
|                       | ``tag``.  Only immediate children are supported.     |
+-----------------------+------------------------------------------------------+
| ``[.='text']``        | Selects all elements whose complete text content,    |
|                       | including descendants, equals the given ``text``.    |
|                       |                                                      |
|                       | .. versionadded:: 3.7                                |
+-----------------------+------------------------------------------------------+
| ``[tag='text']``      | Selects all elements that have a child named         |
|                       | ``tag`` whose complete text content, including       |
|                       | descendants, equals the given ``text``.              |
+-----------------------+------------------------------------------------------+
| ``[position]``        | Selects all elements that are located at the given   |
|                       | position.  The position can be either an integer     |
|                       | (1 is the first position), the expression ``last()`` |
|                       | (for the last position), or a position relative to   |
|                       | the last position (e.g. ``last()-1``).               |
+-----------------------+------------------------------------------------------+

Predicates (expressions within square brackets) must be preceded by a tag
name, an asterisk, or another predicate.  ``position`` predicates must be
preceded by a tag name.

Reference
---------

.. _elementtree-functions:

Functions
^^^^^^^^^

.. function:: canonicalize(xml_data=None, *, out=None, from_file=None, **options)

   `C14N 2.0 <https://www.w3.org/TR/xml-c14n2/>`_ transformation function.

   Canonicalization is a way to normalise XML output in a way that allows
   byte-by-byte comparisons and digital signatures.  It reduces the freedom
   that XML serializers have and instead generates a more constrained XML
   representation.  The main restrictions regard the placement of namespace
   declarations, the ordering of attributes, and ignorable whitespace.
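
   As a rough sketch of the byte-by-byte comparison use case, the snippet
   below canonicalizes two made-up inputs that differ only in attribute order
   and in how the empty element is written:

   .. code-block:: python

      from xml.etree.ElementTree import canonicalize

      # Two serializations of the same logical document (sample data only)
      first = '<root b="2" a="1"/>'
      second = '<root a="1" b="2"></root>'

      c1 = canonicalize(first)
      c2 = canonicalize(second)

      # The canonical form fixes the attribute order and writes the empty
      # element with an explicit end tag, so both inputs give the same text.
      print(c1 == c2)  # True
      print(c1)
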

   This function takes an XML data string (*xml_data*) or a file path or
   file-like object (*from_file*) as input, converts it to the canonical
   form, and writes it out using the *out* file(-like) object, if provided,
   or returns it as a text string if not.  The output file receives text,
   not bytes.  It should therefore be opened in text mode with ``utf-8``
   encoding.

   Typical uses::

      xml_data = "<root>...</root>"
      print(canonicalize(xml_data))

      with open("c14n_output.xml", mode='w', encoding='utf-8') as out_file:
          canonicalize(xml_data, out=out_file)

      with open("c14n_output.xml", mode='w', encoding='utf-8') as out_file:
          canonicalize(from_file="inputfile.xml", out=out_file)

   The configuration *options* are as follows:

   - *with_comments*: set to true to include comments (default: false)
   - *strip_text*: set to true to strip whitespace before and after text content
     (default: false)
   - *rewrite_prefixes*: set to true to replace namespace prefixes by "n{number}"
     (default: false)
   - *qname_aware_tags*: a set of qname aware tag names in which prefixes
     should be replaced in text content (default: empty)
   - *qname_aware_attrs*: a set of qname aware attribute names in which prefixes
     should be replaced in text content (default: empty)
   - *exclude_attrs*: a set of attribute names that should not be serialised
   - *exclude_tags*: a set of tag names that should not be serialised

   In the option list above, "a set" refers to any collection or iterable of
   strings; no ordering is expected.

   .. versionadded:: 3.8


.. function:: Comment(text=None)

   Comment element factory.  This factory function creates a special element
   that will be serialized as an XML comment by the standard serializer.  The
   comment string can be either a bytestring or a Unicode string.  *text* is a
   string containing the comment string.  Returns an element instance
   representing a comment.

   Note that :class:`XMLParser` skips over comments in the input
   instead of creating comment objects for them.  An :class:`ElementTree` will
   only contain comment nodes if they have been inserted into the tree using
   one of the :class:`Element` methods.

.. function:: dump(elem)

   Writes an element tree or element structure to ``sys.stdout``.  This function
   should be used for debugging only.

   The exact output format is implementation dependent.  In this version, it's
   written as an ordinary XML file.

   *elem* is an element tree or an individual element.

   .. versionchanged:: 3.8
      The :func:`dump` function now preserves the attribute order specified
      by the user.


.. function:: fromstring(text, parser=None)

   Parses an XML section from a string constant.  Same as :func:`XML`.  *text*
   is a string containing XML data.  *parser* is an optional parser instance.
   If not given, the standard :class:`XMLParser` parser is used.
   Returns an :class:`Element` instance.


.. function:: fromstringlist(sequence, parser=None)

   Parses an XML document from a sequence of string fragments.  *sequence* is a
   list or other sequence containing XML data fragments.  *parser* is an
   optional parser instance.  If not given, the standard :class:`XMLParser`
   parser is used.  Returns an :class:`Element` instance.

   .. versionadded:: 3.2


.. function:: iselement(element)

   Checks if an object appears to be a valid element object.  *element* is an
   element instance.  Returns ``True`` if this is an element object.


.. function:: iterparse(source, events=None, parser=None)

   Parses an XML section into an element tree incrementally, and reports what's
   going on to the user.  *source* is a filename or :term:`file object`
   containing XML data.  *events* is a sequence of events to report back.  The
   supported events are the strings ``"start"``, ``"end"``, ``"comment"``,
   ``"pi"``, ``"start-ns"`` and ``"end-ns"``
   (the "ns" events are used to get detailed namespace
   information).  If *events* is omitted, only ``"end"`` events are reported.
   *parser* is an optional parser instance.  If not given, the standard
   :class:`XMLParser` parser is used.  *parser* must be a subclass of
   :class:`XMLParser` and can only use the default :class:`TreeBuilder` as a
   target.  Returns an :term:`iterator` providing ``(event, elem)`` pairs.

   Note that while :func:`iterparse` builds the tree incrementally, it issues
   blocking reads on *source* (or the file it names).  As such, it's unsuitable
   for applications where blocking reads can't be made.  For fully non-blocking
   parsing, see :class:`XMLPullParser`.

   .. note::

      :func:`iterparse` only guarantees that it has seen the ">" character of a
      starting tag when it emits a "start" event, so the attributes are defined,
      but the contents of the text and tail attributes are undefined at that
      point.  The same applies to the element children; they may or may not be
      present.

      If you need a fully populated element, look for "end" events instead.

   .. deprecated:: 3.4
      The *parser* argument.

   .. versionchanged:: 3.8
      The ``comment`` and ``pi`` events were added.


.. function:: parse(source, parser=None)

   Parses an XML section into an element tree.  *source* is a filename or file
   object containing XML data.  *parser* is an optional parser instance.  If
   not given, the standard :class:`XMLParser` parser is used.  Returns an
   :class:`ElementTree` instance.


.. function:: ProcessingInstruction(target, text=None)

   PI element factory.  This factory function creates a special element that
   will be serialized as an XML processing instruction.  *target* is a string
   containing the PI target.  *text* is a string containing the PI contents, if
   given.  Returns an element instance, representing a processing instruction.

   Note that :class:`XMLParser` skips over processing instructions
   in the input instead of creating processing instruction objects for them.
   An :class:`ElementTree` will only contain processing instruction nodes if
   they have been inserted into the tree using one of the
   :class:`Element` methods.

.. function:: register_namespace(prefix, uri)

   Registers a namespace prefix.  The registry is global, and any existing
   mapping for either the given prefix or the namespace URI will be removed.
   *prefix* is a namespace prefix.  *uri* is a namespace uri.  Tags and
   attributes in this namespace will be serialized with the given prefix, if at
   all possible.

   .. versionadded:: 3.2


.. function:: SubElement(parent, tag, attrib={}, **extra)

   Subelement factory.  This function creates an element instance, and appends
   it to an existing element.

   The element name, attribute names, and attribute values can be either
   bytestrings or Unicode strings.  *parent* is the parent element.  *tag* is
   the subelement name.  *attrib* is an optional dictionary, containing element
   attributes.  *extra* contains additional attributes, given as keyword
   arguments.  Returns an element instance.


.. function:: tostring(element, encoding="us-ascii", method="xml", *, \
                       xml_declaration=None, default_namespace=None, \
                       short_empty_elements=True)

   Generates a string representation of an XML element, including all
   subelements.  *element* is an :class:`Element` instance.  *encoding* [1]_ is
   the output encoding (default is US-ASCII).  Use ``encoding="unicode"`` to
   generate a Unicode string (otherwise, a bytestring is generated).  *method*
   is either ``"xml"``, ``"html"`` or ``"text"`` (default is ``"xml"``).
   *xml_declaration*, *default_namespace* and *short_empty_elements* have the same
   meaning as in :meth:`ElementTree.write`.  Returns an (optionally) encoded string
   containing the XML data.

   .. versionadded:: 3.4
      The *short_empty_elements* parameter.

   .. versionadded:: 3.8
      The *xml_declaration* and *default_namespace* parameters.

   .. versionchanged:: 3.8
      The :func:`tostring` function now preserves the attribute order
      specified by the user.


.. function:: tostringlist(element, encoding="us-ascii", method="xml", *, \
                           xml_declaration=None, default_namespace=None, \
                           short_empty_elements=True)

   Generates a string representation of an XML element, including all
   subelements.  *element* is an :class:`Element` instance.  *encoding* [1]_ is
   the output encoding (default is US-ASCII).  Use ``encoding="unicode"`` to
   generate a Unicode string (otherwise, a bytestring is generated).  *method*
   is either ``"xml"``, ``"html"`` or ``"text"`` (default is ``"xml"``).
   *xml_declaration*, *default_namespace* and *short_empty_elements* have the same
   meaning as in :meth:`ElementTree.write`.  Returns a list of (optionally) encoded
   strings containing the XML data.  It does not guarantee any specific sequence,
   except that ``b"".join(tostringlist(element)) == tostring(element)``.

   .. versionadded:: 3.2

   .. versionadded:: 3.4
      The *short_empty_elements* parameter.

   .. versionadded:: 3.8
      The *xml_declaration* and *default_namespace* parameters.

   .. versionchanged:: 3.8
      The :func:`tostringlist` function now preserves the attribute order
      specified by the user.


.. function:: XML(text, parser=None)

   Parses an XML section from a string constant.  This function can be used to
   embed "XML literals" in Python code.  *text* is a string containing XML
   data.  *parser* is an optional parser instance.  If not given, the standard
   :class:`XMLParser` parser is used.  Returns an :class:`Element` instance.


.. function:: XMLID(text, parser=None)

   Parses an XML section from a string constant, and also returns a dictionary
   which maps from element IDs to elements.  *text* is a string containing XML
   data.  *parser* is an optional parser instance.  If not given, the standard
   :class:`XMLParser` parser is used.  Returns a tuple containing an
   :class:`Element` instance and a dictionary.


.. _elementtree-xinclude:

XInclude support
----------------

This module provides limited support for
`XInclude directives <https://www.w3.org/TR/xinclude/>`_, via the :mod:`xml.etree.ElementInclude` helper module.  This module can be used to insert subtrees and text strings into element trees, based on information in the tree.

Example
^^^^^^^

Here's an example that demonstrates use of the XInclude module.  To include an XML document in the current document, use the ``{http://www.w3.org/2001/XInclude}include`` element, set the **parse** attribute to ``"xml"``, and use the **href** attribute to specify the document to include.

.. code-block:: xml

   <?xml version="1.0"?>
   <document xmlns:xi="http://www.w3.org/2001/XInclude">
     <xi:include href="source.xml" parse="xml" />
   </document>

By default, the **href** attribute is treated as a file name.  You can use custom loaders to override this behaviour.  Also note that the standard helper does not support XPointer syntax.

To process this file, load it as usual, and pass the root element to the :mod:`xml.etree.ElementInclude` module:

.. code-block:: python

   from xml.etree import ElementTree, ElementInclude

   tree = ElementTree.parse("document.xml")
   root = tree.getroot()

   ElementInclude.include(root)

The ElementInclude module replaces the ``{http://www.w3.org/2001/XInclude}include`` element with the root element from the **source.xml** document.  The result might look something like this:

.. code-block:: xml

   <document xmlns:xi="http://www.w3.org/2001/XInclude">
     <para>This is a paragraph.</para>
   </document>

If the **parse** attribute is omitted, it defaults to ``"xml"``.  The **href** attribute is required.

To include a text document, use the ``{http://www.w3.org/2001/XInclude}include`` element, and set the **parse** attribute to ``"text"``:

.. code-block:: xml

   <?xml version="1.0"?>
   <document xmlns:xi="http://www.w3.org/2001/XInclude">
     Copyright (c) <xi:include href="year.txt" parse="text" />.
   </document>

The result might look something like:

.. code-block:: xml

   <document xmlns:xi="http://www.w3.org/2001/XInclude">
     Copyright (c) 2003.
   </document>

Reference
---------

.. _elementinclude-functions:

Functions
^^^^^^^^^

.. function:: xml.etree.ElementInclude.default_loader(href, parse, encoding=None)

   Default loader.  This default loader reads an included resource from disk.
   *href* is a URL.  *parse* is the parse mode, either ``"xml"`` or ``"text"``.
   *encoding* is an optional text encoding.  If not given, encoding is ``utf-8``.
   Returns the expanded resource.  If the parse mode is ``"xml"``, this is an
   :class:`Element` instance.  If the parse mode is ``"text"``, this is a Unicode
   string.  If the loader fails, it can return ``None`` or raise an exception.


.. function:: xml.etree.ElementInclude.include(elem, loader=None)

   This function expands XInclude directives in place, modifying the tree
   rooted at *elem*.  *elem* is the root element.  *loader* is an optional
   resource loader.  If omitted, it defaults to :func:`default_loader`.
   If given, it should be a callable that implements the same interface as
   :func:`default_loader`: for parse mode ``"xml"`` it returns an
   :class:`Element` instance, for parse mode ``"text"`` a Unicode string, and
   if it fails it can return ``None`` or raise an exception.


.. _elementtree-element-objects:

Element Objects
^^^^^^^^^^^^^^^

.. class:: Element(tag, attrib={}, **extra)

   Element class.  This class defines the Element interface, and provides a
   reference implementation of this interface.

   The element name, attribute names, and attribute values can be either
   bytestrings or Unicode strings.  *tag* is the element name.  *attrib* is
   an optional dictionary, containing element attributes.  *extra* contains
   additional attributes, given as keyword arguments.


   .. attribute:: tag

      A string identifying what kind of data this element represents (the
      element type, in other words).


   .. attribute:: text
                  tail

      These attributes can be used to hold additional data associated with
      the element.  Their values are usually strings but may be any
      application-specific object.  If the element is created from
      an XML file, the *text* attribute holds either the text between
      the element's start tag and its first child or end tag, or ``None``, and
      the *tail* attribute holds either the text between the element's
      end tag and the next tag, or ``None``.  For the XML data

      .. code-block:: xml

         <a><b>1<c>2<d/>3</c></b>4</a>

      the *a* element has ``None`` for both *text* and *tail* attributes,
      the *b* element has *text* ``"1"`` and *tail* ``"4"``,
      the *c* element has *text* ``"2"`` and *tail* ``None``,
      and the *d* element has *text* ``None`` and *tail* ``"3"``.
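
      The assignment of *text* and *tail* described above can be checked with
      a short sketch that parses the same fragment and prints both attributes
      for every element; the variable names are illustrative:

      .. code-block:: python

         import xml.etree.ElementTree as ET

         a = ET.fromstring("<a><b>1<c>2<d/>3</c></b>4</a>")
         b = a.find("b")
         c = b.find("c")
         d = c.find("d")

         for elem in (a, b, c, d):
             # Tag, then the repr of its text and tail values
             print(elem.tag, repr(elem.text), repr(elem.tail))
         # a None None
         # b '1' '4'
         # c '2' None
         # d None '3'
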

      To collect the inner text of an element, see :meth:`itertext`, for
      example ``"".join(element.itertext())``.

      Applications may store arbitrary objects in these attributes.


   .. attribute:: attrib

      A dictionary containing the element's attributes. Note that while the
      *attrib* value is always a real mutable Python dictionary, an ElementTree
      implementation may choose to use another internal representation, and
      create the dictionary only if someone asks for it. To take advantage of
      such implementations, use the dictionary methods below whenever possible.

   The following dictionary-like methods work on the element attributes.


   .. method:: clear()

      Resets an element. This function removes all subelements, clears all
      attributes, and sets the text and tail attributes to ``None``.


   .. method:: get(key, default=None)

      Gets the element attribute named *key*.

      Returns the attribute value, or *default* if the attribute was not found.


   .. method:: items()

      Returns the element attributes as a sequence of (name, value) pairs. The
      attributes are returned in an arbitrary order.


   .. method:: keys()

      Returns the element's attribute names as a list. The names are returned
      in an arbitrary order.


   .. method:: set(key, value)

      Sets the attribute *key* on the element to *value*.

   The following methods work on the element's children (subelements).


   .. method:: append(subelement)

      Adds the element *subelement* to the end of this element's internal list
      of subelements. Raises :exc:`TypeError` if *subelement* is not an
      :class:`Element`.


   .. method:: extend(subelements)

      Appends *subelements* from a sequence object with zero or more elements.
      Raises :exc:`TypeError` if a subelement is not an :class:`Element`.

      .. versionadded:: 3.2
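
   The attribute and child methods above can be combined as in the following
   minimal sketch. The tag and attribute names (``item``, ``id``, ``lang``,
   ``status``, ``owner``, ``title``, ``para``) are hypothetical and chosen only
   for illustration.

   .. code-block:: python

      import xml.etree.ElementTree as ET

      # *attrib* and the **extra keyword arguments end up in one attribute dict.
      item = ET.Element("item", {"id": "1"}, lang="en")

      # Prefer get()/set() over touching item.attrib directly.
      item.set("status", "draft")
      status = item.get("status")            # "draft"
      owner = item.get("owner", "nobody")    # missing attribute: default returned

      # append() adds one child; extend() adds several from a sequence.
      item.append(ET.Element("title"))
      item.extend([ET.Element("para"), ET.Element("para")])
      child_tags = [child.tag for child in item]   # ["title", "para", "para"]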

   .. method:: find(match, namespaces=None)

      Finds the first subelement matching *match*. *match* may be a tag name
      or a :ref:`path <elementtree-xpath>`. Returns an element instance
      or ``None``. *namespaces* is an optional mapping from namespace prefix
      to full name. Pass ``''`` as prefix to move all unprefixed tag names
      in the expression into the given namespace.


   .. method:: findall(match, namespaces=None)

      Finds all matching subelements, by tag name or
      :ref:`path <elementtree-xpath>`. Returns a list containing all matching
      elements in document order. *namespaces* is an optional mapping from
      namespace prefix to full name. Pass ``''`` as prefix to move all
      unprefixed tag names in the expression into the given namespace.


   .. method:: findtext(match, default=None, namespaces=None)

      Finds text for the first subelement matching *match*. *match* may be
      a tag name or a :ref:`path <elementtree-xpath>`. Returns the text content
      of the first matching element, or *default* if no element was found.
      Note that if the matching element has no text content, an empty string
      is returned. *namespaces* is an optional mapping from namespace prefix
      to full name. Pass ``''`` as prefix to move all unprefixed tag names
      in the expression into the given namespace.


   .. method:: getchildren()

      .. deprecated-removed:: 3.2 3.9
         Use ``list(elem)`` or iteration.


   .. method:: getiterator(tag=None)

      .. deprecated-removed:: 3.2 3.9
         Use method :meth:`Element.iter` instead.


   .. method:: insert(index, subelement)

      Inserts *subelement* at the given position in this element. Raises
      :exc:`TypeError` if *subelement* is not an :class:`Element`.


   .. method:: iter(tag=None)

      Creates a tree :term:`iterator` with the current element as the root.
      The iterator iterates over this element and all elements below it, in
      document (depth first) order. If *tag* is not ``None`` or ``'*'``, only
      elements whose tag equals *tag* are returned from the iterator. If the
      tree structure is modified during iteration, the result is undefined.

      .. versionadded:: 3.2


   .. method:: iterfind(match, namespaces=None)

      Finds all matching subelements, by tag name or
      :ref:`path <elementtree-xpath>`. Returns an iterable yielding all
      matching elements in document order. *namespaces* is an optional mapping
      from namespace prefix to full name.

      .. versionadded:: 3.2


   .. method:: itertext()

      Creates a text iterator. The iterator loops over this element and all
      subelements, in document order, and returns all inner text.

      .. versionadded:: 3.2


   .. method:: makeelement(tag, attrib)

      Creates a new element object of the same type as this element. Do not
      call this method; use the :func:`SubElement` factory function instead.


   .. method:: remove(subelement)

      Removes *subelement* from the element. Unlike the find\* methods, this
      method compares elements based on the instance identity, not on tag value
      or contents.

   :class:`Element` objects also support the following sequence type methods
   for working with subelements: :meth:`~object.__delitem__`,
   :meth:`~object.__getitem__`, :meth:`~object.__setitem__`,
   :meth:`~object.__len__`.

   Caution: Elements with no subelements will test as ``False``. This behavior
   will change in future versions. Use an explicit ``len(elem)`` or
   ``elem is None`` test instead. ::

      element = root.find('foo')

      if not element:  # careful!
          print("element not found, or element has no subelements")

      if element is None:
          print("element not found")

   Prior to Python 3.8, the serialisation order of the XML attributes of
   elements was artificially made predictable by sorting the attributes by
   their name. Based on the now guaranteed ordering of dicts, this arbitrary
   reordering was removed in Python 3.8 to preserve the order in which
   attributes were originally parsed or created by user code.

   In general, user code should try not to depend on a specific ordering of
   attributes, given that the `XML Information Set
   <https://www.w3.org/TR/xml-infoset/>`_ explicitly excludes the attribute
   order from conveying information. Code should be prepared to deal with
   any ordering on input. In cases where deterministic XML output is required,
   e.g. for cryptographic signing or test data sets, canonical serialisation
   is available with the :func:`canonicalize` function.

   In cases where canonical output is not applicable but a specific attribute
   order is still desirable on output, code should aim for creating the
   attributes directly in the desired order, to avoid perceptual mismatches
   for readers of the code. In cases where this is difficult to achieve, a
   recipe like the following can be applied prior to serialisation to enforce
   an order independently from the Element creation::

      def reorder_attributes(root):
          for el in root.iter():
              attrib = el.attrib
              if len(attrib) > 1:
                  # adjust attribute order, e.g. by sorting
                  attribs = sorted(attrib.items())
                  attrib.clear()
                  attrib.update(attribs)


.. _elementtree-elementtree-objects:

ElementTree Objects
^^^^^^^^^^^^^^^^^^^


.. class:: ElementTree(element=None, file=None)

   ElementTree wrapper class.
   This class represents an entire element hierarchy, and adds some extra
   support for serialization to and from standard XML.

   *element* is the root element. The tree is initialized with the contents
   of the XML *file* if given.


   .. method:: _setroot(element)

      Replaces the root element for this tree. This discards the current
      contents of the tree, and replaces it with the given element. Use with
      care. *element* is an element instance.


   .. method:: find(match, namespaces=None)

      Same as :meth:`Element.find`, starting at the root of the tree.


   .. method:: findall(match, namespaces=None)

      Same as :meth:`Element.findall`, starting at the root of the tree.


   .. method:: findtext(match, default=None, namespaces=None)

      Same as :meth:`Element.findtext`, starting at the root of the tree.


   .. method:: getiterator(tag=None)

      .. deprecated-removed:: 3.2 3.9
         Use method :meth:`ElementTree.iter` instead.


   .. method:: getroot()

      Returns the root element for this tree.


   .. method:: iter(tag=None)

      Creates and returns a tree iterator for the root element. The iterator
      loops over all elements in this tree, in document order. *tag* is the tag
      to look for (default is to return all elements).


   .. method:: iterfind(match, namespaces=None)

      Same as :meth:`Element.iterfind`, starting at the root of the tree.

      .. versionadded:: 3.2


   .. method:: parse(source, parser=None)

      Loads an external XML document into this element tree. *source* is a
      file name or :term:`file object`. *parser* is an optional parser
      instance. If not given, the standard :class:`XMLParser` parser is used.
      Returns the root element of the loaded document.
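
   A minimal sketch of the tree-level lookup methods, using a trimmed-down
   version of the ``country`` sample data from the tutorial. To keep the
   example self-contained, an already-parsed element is wrapped with
   ``ElementTree(root)`` instead of reading a file.

   .. code-block:: python

      import xml.etree.ElementTree as ET

      root = ET.fromstring(
          "<data>"
          "<country name='Singapore'><year>2011</year></country>"
          "<country name='Panama'><year>2011</year></country>"
          "</data>"
      )
      # Wrap the parsed root element; no file is involved.
      tree = ET.ElementTree(root)

      # These behave like the Element methods called on the root (<data>).
      first = tree.find("country")            # first <country> under <data>
      year = tree.findtext("country/year")    # text of the first matching <year>
      names = [c.get("name") for c in tree.iter("country")]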

   .. method:: write(file, encoding="us-ascii", xml_declaration=None, \
                     default_namespace=None, method="xml", *, \
                     short_empty_elements=True)

      Writes the element tree to a file, as XML. *file* is a file name, or a
      :term:`file object` opened for writing. *encoding* [1]_ is the output
      encoding (default is US-ASCII).
      *xml_declaration* controls whether an XML declaration should be added to
      the file. Use ``False`` for never, ``True`` for always, and ``None``
      for only if the encoding is not US-ASCII, UTF-8 or Unicode (default is
      ``None``).
      *default_namespace* sets the default XML namespace (for "xmlns").
      *method* is either ``"xml"``, ``"html"`` or ``"text"`` (default is
      ``"xml"``).
      The keyword-only *short_empty_elements* parameter controls the formatting
      of elements that contain no content. If ``True`` (the default), they are
      emitted as a single self-closed tag, otherwise they are emitted as a pair
      of start/end tags.

      The output is either a string (:class:`str`) or binary (:class:`bytes`).
      This is controlled by the *encoding* argument. If *encoding* is
      ``"unicode"``, the output is a string; otherwise, it's binary. Note that
      this may conflict with the type of *file* if it's an open
      :term:`file object`; make sure you do not try to write a string to a
      binary stream and vice versa.

      .. versionadded:: 3.4
         The *short_empty_elements* parameter.

      .. versionchanged:: 3.8
         The :meth:`write` method now preserves the attribute order specified
         by the user.
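
      The link between *encoding* and the stream type can be sketched with
      in-memory streams from :mod:`io`, used here in place of real files. The
      ``<root>``, ``<empty>`` and ``<item>`` tags are made up for the example.

      .. code-block:: python

         import io
         import xml.etree.ElementTree as ET

         tree = ET.ElementTree(ET.fromstring("<root><empty/><item>x</item></root>"))

         # encoding="unicode" writes str, so a text stream is needed.
         text_out = io.StringIO()
         tree.write(text_out, encoding="unicode")
         # text_out.getvalue() == '<root><empty /><item>x</item></root>'

         # Any other encoding writes bytes, so a binary stream is needed.
         # short_empty_elements=False turns <empty/> into a start/end tag pair.
         binary_out = io.BytesIO()
         tree.write(binary_out, encoding="utf-8", xml_declaration=True,
                    short_empty_elements=False)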

This is the XML file that is going to be manipulated::

    <html>
        <head>
            <title>Example page</title>
        </head>
        <body>
            <p>Moved to <a href="http://example.org/">example.org</a>
            or <a href="http://example.com/">example.com</a>.</p>
        </body>
    </html>

Example of changing the attribute "target" of every link in the first
paragraph::

    >>> from xml.etree.ElementTree import ElementTree
    >>> tree = ElementTree()
    >>> tree.parse("index.xhtml")
    <Element 'html' at 0xb77e6fac>
    >>> p = tree.find("body/p")     # Finds first occurrence of tag p in body
    >>> p
    <Element 'p' at 0xb77ec26c>
    >>> links = list(p.iter("a"))   # Returns list of all links
    >>> links
    [<Element 'a' at 0xb77ec2ac>, <Element 'a' at 0xb77ec1cc>]
    >>> for link in links:          # Iterates through all found links
    ...     link.set("target", "blank")
    ...
    >>> tree.write("output.xhtml")

.. _elementtree-qname-objects:

QName Objects
^^^^^^^^^^^^^


.. class:: QName(text_or_uri, tag=None)

   QName wrapper. This can be used to wrap a QName attribute value, in order
   to get proper namespace handling on output. *text_or_uri* is a string
   containing the QName value, in the form ``{uri}local``, or, if the *tag*
   argument is given, the URI part of a QName. If *tag* is given, the first
   argument is interpreted as a URI, and this argument is interpreted as a
   local name. :class:`QName` instances are opaque.


.. _elementtree-treebuilder-objects:

TreeBuilder Objects
^^^^^^^^^^^^^^^^^^^


.. class:: TreeBuilder(element_factory=None, *, comment_factory=None, \
                       pi_factory=None, insert_comments=False, insert_pis=False)

   Generic element structure builder. This builder converts a sequence of
   start, data, end, comment and pi method calls to a well-formed element
   structure.
   You can use this class to build an element structure using
   a custom XML parser, or a parser for some other XML-like format.

   *element_factory*, when given, must be a callable accepting two positional
   arguments: a tag and a dict of attributes. It is expected to return a new
   element instance.

   The *comment_factory* and *pi_factory* functions, when given, should behave
   like the :func:`Comment` and :func:`ProcessingInstruction` functions to
   create comments and processing instructions. When not given, the default
   factories will be used. When *insert_comments* and/or *insert_pis* is true,
   comments and processing instructions will be inserted into the tree if they
   appear within the root element (but not outside of it).

   .. method:: close()

      Flushes the builder buffers, and returns the toplevel document
      element. Returns an :class:`Element` instance.


   .. method:: data(data)

      Adds text to the current element. *data* is a string. This should be
      either a bytestring, or a Unicode string.


   .. method:: end(tag)

      Closes the current element. *tag* is the element name. Returns the
      closed element.


   .. method:: start(tag, attrs)

      Opens a new element. *tag* is the element name. *attrs* is a dictionary
      containing element attributes. Returns the opened element.


   .. method:: comment(text)

      Creates a comment with the given *text*. If ``insert_comments`` is true,
      this will also add it to the tree.

      .. versionadded:: 3.8


   .. method:: pi(target, text)

      Creates a processing instruction with the given *target* name and
      *text*. If ``insert_pis`` is true, this will also add it to the tree.

      .. versionadded:: 3.8


   In addition, a custom :class:`TreeBuilder` object can provide the
   following methods:

   .. method:: doctype(name, pubid, system)

      Handles a doctype declaration. *name* is the doctype name. *pubid* is
      the public identifier. *system* is the system identifier. This method
      does not exist on the default :class:`TreeBuilder` class.

      .. versionadded:: 3.2

   .. method:: start_ns(prefix, uri)

      Called whenever the parser encounters a new namespace declaration,
      before the ``start()`` callback for the opening element that defines it.
      *prefix* is ``''`` for the default namespace and the declared
      namespace prefix name otherwise. *uri* is the namespace URI.

      .. versionadded:: 3.8

   .. method:: end_ns(prefix)

      Called after the ``end()`` callback of an element that declared
      a namespace prefix mapping, with the name of the *prefix* that went
      out of scope.

      .. versionadded:: 3.8


.. class:: C14NWriterTarget(write, *, \
               with_comments=False, strip_text=False, rewrite_prefixes=False, \
               qname_aware_tags=None, qname_aware_attrs=None, \
               exclude_attrs=None, exclude_tags=None)

   A `C14N 2.0 <https://www.w3.org/TR/xml-c14n2/>`_ writer. Arguments are the
   same as for the :func:`canonicalize` function. This class does not build a
   tree but translates the callback events directly into a serialised form
   using the *write* function.

   .. versionadded:: 3.8


.. _elementtree-xmlparser-objects:

XMLParser Objects
^^^^^^^^^^^^^^^^^


.. class:: XMLParser(*, target=None, encoding=None)

   This class is the low-level building block of the module. It uses
   :mod:`xml.parsers.expat` for efficient, event-based parsing of XML. It can
   be fed XML data incrementally with the :meth:`feed` method, and parsing
   events are translated to a push API that invokes callbacks on the *target*
   object. If *target* is omitted, the standard :class:`TreeBuilder` is used.
   If *encoding* [1]_ is given, the value overrides the
   encoding specified in the XML file.

   .. versionchanged:: 3.8
      Parameters are now :ref:`keyword-only <keyword-only_parameter>`.
      The *html* argument is no longer supported.


   .. method:: close()

      Finishes feeding data to the parser. Returns the result of calling the
      ``close()`` method of the *target* passed during construction; by default,
      this is the toplevel document element.


   .. method:: feed(data)

      Feeds data to the parser. *data* is encoded data.

   :meth:`XMLParser.feed` calls *target*\'s ``start(tag, attrs_dict)`` method
   for each opening tag, its ``end(tag)`` method for each closing tag, and data
   is processed by method ``data(data)``. For further supported callback
   methods, see the :class:`TreeBuilder` class. :meth:`XMLParser.close` calls
   *target*\'s method ``close()``. :class:`XMLParser` is not limited to
   building a tree structure. This is an example of counting the maximum depth
   of an XML file::

      >>> from xml.etree.ElementTree import XMLParser
      >>> class MaxDepth:                     # The target object of the parser
      ...     maxDepth = 0
      ...     depth = 0
      ...     def start(self, tag, attrib):   # Called for each opening tag.
      ...         self.depth += 1
      ...         if self.depth > self.maxDepth:
      ...             self.maxDepth = self.depth
      ...     def end(self, tag):             # Called for each closing tag.
      ...         self.depth -= 1
      ...     def data(self, data):
      ...         pass            # We do not need to do anything with data.
      ...     def close(self):    # Called when all data has been parsed.
      ...         return self.maxDepth
      ...
      >>> target = MaxDepth()
      >>> parser = XMLParser(target=target)
      >>> exampleXml = """
      ... <a>
      ...   <b>
      ...   </b>
      ...   <b>
      ...     <c>
      ...       <d>
      ...       </d>
      ...     </c>
      ...   </b>
      ... </a>"""
      >>> parser.feed(exampleXml)
      >>> parser.close()
      4


.. _elementtree-xmlpullparser-objects:

XMLPullParser Objects
^^^^^^^^^^^^^^^^^^^^^

.. class:: XMLPullParser(events=None)

   A pull parser suitable for non-blocking applications. Its input-side API is
   similar to that of :class:`XMLParser`, but instead of pushing calls to a
   callback target, :class:`XMLPullParser` collects an internal list of parsing
   events and lets the user read from it. *events* is a sequence of events to
   report back. The supported events are the strings ``"start"``, ``"end"``,
   ``"comment"``, ``"pi"``, ``"start-ns"`` and ``"end-ns"`` (the "ns" events
   are used to get detailed namespace information). If *events* is omitted,
   only ``"end"`` events are reported.

   .. method:: feed(data)

      Feed the given bytes data to the parser.

   .. method:: close()

      Signal the parser that the data stream is terminated. Unlike
      :meth:`XMLParser.close`, this method always returns :const:`None`.
      Any events not yet retrieved when the parser is closed can still be
      read with :meth:`read_events`.

   .. method:: read_events()

      Return an iterator over the events which have been encountered in the
      data fed to the parser. The iterator yields ``(event, elem)`` pairs,
      where *event* is a string representing the type of event (e.g.
      ``"end"``) and *elem* is the encountered :class:`Element` object, or
      another context value, as follows:

      * ``start``, ``end``: the current Element.
      * ``comment``, ``pi``: the current comment / processing instruction.
      * ``start-ns``: a tuple ``(prefix, uri)`` naming the declared namespace
        mapping.
      * ``end-ns``: :const:`None` (this may change in a future version).

      Events provided in a previous call to :meth:`read_events` will not be
      yielded again.
      Events are consumed from the internal queue only when
      they are retrieved from the iterator, so multiple readers iterating in
      parallel over iterators obtained from :meth:`read_events` will have
      unpredictable results.

   .. note::

      :class:`XMLPullParser` only guarantees that it has seen the ">"
      character of a starting tag when it emits a "start" event, so the
      attributes are defined, but the contents of the text and tail attributes
      are undefined at that point. The same applies to the element children;
      they may or may not be present.

      If you need a fully populated element, look for "end" events instead.

   .. versionadded:: 3.4

   .. versionchanged:: 3.8
      The ``comment`` and ``pi`` events were added.


Exceptions
^^^^^^^^^^

.. class:: ParseError

   XML parse error, raised by the various parsing methods in this module when
   parsing fails. The string representation of an instance of this exception
   will contain a user-friendly error message. In addition, it will have
   the following attributes available:

   .. attribute:: code

      A numeric error code from the expat parser. See the documentation of
      :mod:`xml.parsers.expat` for the list of error codes and their meanings.

   .. attribute:: position

      A tuple of *line*, *column* numbers, specifying where the error occurred.

.. rubric:: Footnotes

.. [1] The encoding string included in XML output should conform to the
   appropriate standards. For example, "UTF-8" is valid, but "UTF8" is
   not. See https://www.w3.org/TR/2006/REC-xml11-20060816/#NT-EncodingDecl
   and https://www.iana.org/assignments/character-sets/character-sets.xhtml.