:mod:`xml.etree.ElementTree` --- The ElementTree XML API
========================================================

.. module:: xml.etree.ElementTree
   :synopsis: Implementation of the ElementTree API.

.. moduleauthor:: Fredrik Lundh <fredrik@pythonware.com>

**Source code:** :source:`Lib/xml/etree/ElementTree.py`

--------------

The :mod:`xml.etree.ElementTree` module implements a simple and efficient API
for parsing and creating XML data.

.. versionchanged:: 3.3
   This module will use a fast implementation whenever available.

.. deprecated:: 3.3
   The :mod:`xml.etree.cElementTree` module is deprecated.


.. warning::

   The :mod:`xml.etree.ElementTree` module is not secure against
   maliciously constructed data. If you need to parse untrusted or
   unauthenticated data see :ref:`xml-vulnerabilities`.

Tutorial
--------

This is a short tutorial for using :mod:`xml.etree.ElementTree` (``ET`` in
short). The goal is to demonstrate some of the building blocks and basic
concepts of the module.

XML tree and elements
^^^^^^^^^^^^^^^^^^^^^

XML is an inherently hierarchical data format, and the most natural way to
represent it is with a tree. ``ET`` has two classes for this purpose -
:class:`ElementTree` represents the whole XML document as a tree, and
:class:`Element` represents a single node in this tree. Interactions with
the whole document (reading and writing to/from files) are usually done
on the :class:`ElementTree` level. Interactions with a single XML element
and its sub-elements are done on the :class:`Element` level.

.. _elementtree-parsing-xml:

Parsing XML
^^^^^^^^^^^

We'll be using the following XML document as the sample data for this section:

.. code-block:: xml

   <?xml version="1.0"?>
   <data>
       <country name="Liechtenstein">
           <rank>1</rank>
           <year>2008</year>
           <gdppc>141100</gdppc>
           <neighbor name="Austria" direction="E"/>
           <neighbor name="Switzerland" direction="W"/>
       </country>
       <country name="Singapore">
           <rank>4</rank>
           <year>2011</year>
           <gdppc>59900</gdppc>
           <neighbor name="Malaysia" direction="N"/>
       </country>
       <country name="Panama">
           <rank>68</rank>
           <year>2011</year>
           <gdppc>13600</gdppc>
           <neighbor name="Costa Rica" direction="W"/>
           <neighbor name="Colombia" direction="E"/>
       </country>
   </data>

We can import this data by reading from a file::

   import xml.etree.ElementTree as ET
   tree = ET.parse('country_data.xml')
   root = tree.getroot()

Or directly from a string::

   root = ET.fromstring(country_data_as_string)

:func:`fromstring` parses XML from a string directly into an :class:`Element`,
which is the root element of the parsed tree. Other parsing functions may
create an :class:`ElementTree`. Check the documentation to be sure.

As an :class:`Element`, ``root`` has a tag and a dictionary of attributes::

   >>> root.tag
   'data'
   >>> root.attrib
   {}

It also has children nodes over which we can iterate::

   >>> for child in root:
   ...     print(child.tag, child.attrib)
   ...
   country {'name': 'Liechtenstein'}
   country {'name': 'Singapore'}
   country {'name': 'Panama'}

Children are nested, and we can access specific child nodes by index::

   >>> root[0][1].text
   '2008'


.. note::

   Not all elements of the XML input will end up as elements of the
   parsed tree. Currently, this module skips over any XML comments,
   processing instructions, and document type declarations in the
   input.
   Nevertheless, trees built using this module's API rather
   than parsing from XML text can have comments and processing
   instructions in them; they will be included when generating XML
   output. A document type declaration may be accessed by passing a
   custom :class:`TreeBuilder` instance to the :class:`XMLParser`
   constructor.


.. _elementtree-pull-parsing:

Pull API for non-blocking parsing
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Most parsing functions provided by this module require the whole document
to be read at once before returning any result. It is possible to use an
:class:`XMLParser` and feed data into it incrementally, but it is a push API that
calls methods on a callback target, which is too low-level and inconvenient for
most needs. Sometimes what the user really wants is to be able to parse XML
incrementally, without blocking operations, while enjoying the convenience of
fully constructed :class:`Element` objects.

The most powerful tool for doing this is :class:`XMLPullParser`. It does not
require a blocking read to obtain the XML data, and is instead fed with data
incrementally with :meth:`XMLPullParser.feed` calls. To get the parsed XML
elements, call :meth:`XMLPullParser.read_events`. Here is an example::

   >>> parser = ET.XMLPullParser(['start', 'end'])
   >>> parser.feed('<mytag>sometext')
   >>> list(parser.read_events())
   [('start', <Element 'mytag' at 0x7fa66db2be58>)]
   >>> parser.feed(' more text</mytag>')
   >>> for event, elem in parser.read_events():
   ...     print(event)
   ...     print(elem.tag, 'text=', elem.text)
   ...
   end
   mytag text= sometext more text

The obvious use case is applications that operate in a non-blocking fashion
where the XML data is being received from a socket or read incrementally from
some storage device. In such cases, blocking reads are unacceptable.
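
The non-blocking use case described above can be sketched roughly as follows.
This is only an illustration: the ``chunks`` list stands in for data arriving
from a socket, and ``collect_end_tags`` is a hypothetical helper, not part of
the module.

```python
import xml.etree.ElementTree as ET


def collect_end_tags(chunks):
    """Feed XML fragments to a pull parser; return tags in completion order."""
    parser = ET.XMLPullParser(events=['end'])
    tags = []
    for chunk in chunks:
        # feed() never blocks: it only consumes the data it is given.
        parser.feed(chunk)
        # read_events() yields whatever events are complete so far.
        for event, elem in parser.read_events():
            tags.append(elem.tag)
    parser.close()
    return tags


# Hypothetical input: one document split at arbitrary byte boundaries.
chunks = ['<root><it', 'em>one</item><item>t', 'wo</item></root>']
tags = collect_end_tags(chunks)
print(tags)
```

Chunk boundaries may fall anywhere, even inside a tag name; an ``end`` event is
only reported once the closing tag of an element has been fed in full.
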

Because it's so flexible, :class:`XMLPullParser` can be inconvenient to use for
simpler use-cases. If you don't mind your application blocking on reading XML
data but would still like to have incremental parsing capabilities, take a look
at :func:`iterparse`. It can be useful when you're reading a large XML document
and don't want to hold it wholly in memory.

Finding interesting elements
^^^^^^^^^^^^^^^^^^^^^^^^^^^^

:class:`Element` has some useful methods that help iterate recursively over all
the sub-tree below it (its children, their children, and so on). For example,
:meth:`Element.iter`::

   >>> for neighbor in root.iter('neighbor'):
   ...     print(neighbor.attrib)
   ...
   {'name': 'Austria', 'direction': 'E'}
   {'name': 'Switzerland', 'direction': 'W'}
   {'name': 'Malaysia', 'direction': 'N'}
   {'name': 'Costa Rica', 'direction': 'W'}
   {'name': 'Colombia', 'direction': 'E'}

:meth:`Element.findall` finds only elements with a tag which are direct
children of the current element. :meth:`Element.find` finds the *first* child
with a particular tag, and :attr:`Element.text` accesses the element's text
content. :meth:`Element.get` accesses the element's attributes::

   >>> for country in root.findall('country'):
   ...     rank = country.find('rank').text
   ...     name = country.get('name')
   ...     print(name, rank)
   ...
   Liechtenstein 1
   Singapore 4
   Panama 68

More sophisticated specification of which elements to look for is possible by
using :ref:`XPath <elementtree-xpath>`.

Modifying an XML File
^^^^^^^^^^^^^^^^^^^^^

:class:`ElementTree` provides a simple way to build XML documents and write them to files.
The :meth:`ElementTree.write` method serves this purpose.
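
As a minimal sketch of :meth:`ElementTree.write`, the snippet below builds a
tiny tree and serializes it. It writes to an in-memory ``io.BytesIO`` buffer
rather than a real file so that it runs without touching the disk; the element
names are borrowed from the sample data and are otherwise arbitrary.

```python
import io
import xml.etree.ElementTree as ET

# Build a small document by hand.
root = ET.Element('data')
country = ET.SubElement(root, 'country', name='Panama')
ET.SubElement(country, 'rank').text = '68'

# Wrap the root element in an ElementTree so it can be written out.
tree = ET.ElementTree(root)
buffer = io.BytesIO()
tree.write(buffer, encoding='utf-8', xml_declaration=True)

output = buffer.getvalue().decode('utf-8')
print(output)
```

Passing a filename instead of the buffer writes the same bytes to disk.
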

Once created, an :class:`Element` object may be manipulated by directly changing
its fields (such as :attr:`Element.text`), adding and modifying attributes
(:meth:`Element.set` method), as well as adding new children (for example
with :meth:`Element.append`).

Let's say we want to add one to each country's rank, and add an ``updated``
attribute to the rank element::

   >>> for rank in root.iter('rank'):
   ...     new_rank = int(rank.text) + 1
   ...     rank.text = str(new_rank)
   ...     rank.set('updated', 'yes')
   ...
   >>> tree.write('output.xml')

Our XML now looks like this:

.. code-block:: xml

   <?xml version="1.0"?>
   <data>
       <country name="Liechtenstein">
           <rank updated="yes">2</rank>
           <year>2008</year>
           <gdppc>141100</gdppc>
           <neighbor name="Austria" direction="E"/>
           <neighbor name="Switzerland" direction="W"/>
       </country>
       <country name="Singapore">
           <rank updated="yes">5</rank>
           <year>2011</year>
           <gdppc>59900</gdppc>
           <neighbor name="Malaysia" direction="N"/>
       </country>
       <country name="Panama">
           <rank updated="yes">69</rank>
           <year>2011</year>
           <gdppc>13600</gdppc>
           <neighbor name="Costa Rica" direction="W"/>
           <neighbor name="Colombia" direction="E"/>
       </country>
   </data>

We can remove elements using :meth:`Element.remove`. Let's say we want to
remove all countries with a rank higher than 50::

   >>> for country in root.findall('country'):
   ...     # using root.findall() to avoid removal during traversal
   ...     rank = int(country.find('rank').text)
   ...     if rank > 50:
   ...         root.remove(country)
   ...
   >>> tree.write('output.xml')

Note that concurrent modification while iterating can lead to problems,
just like when iterating and modifying Python lists or dicts.
Therefore, the example first collects all matching elements with
``root.findall()``, and only then iterates over the list of matches.

Our XML now looks like this:

.. code-block:: xml

   <?xml version="1.0"?>
   <data>
       <country name="Liechtenstein">
           <rank updated="yes">2</rank>
           <year>2008</year>
           <gdppc>141100</gdppc>
           <neighbor name="Austria" direction="E"/>
           <neighbor name="Switzerland" direction="W"/>
       </country>
       <country name="Singapore">
           <rank updated="yes">5</rank>
           <year>2011</year>
           <gdppc>59900</gdppc>
           <neighbor name="Malaysia" direction="N"/>
       </country>
   </data>

Building XML documents
^^^^^^^^^^^^^^^^^^^^^^

The :func:`SubElement` function also provides a convenient way to create new
sub-elements for a given element::

   >>> a = ET.Element('a')
   >>> b = ET.SubElement(a, 'b')
   >>> c = ET.SubElement(a, 'c')
   >>> d = ET.SubElement(c, 'd')
   >>> ET.dump(a)
   <a><b /><c><d /></c></a>

Parsing XML with Namespaces
^^^^^^^^^^^^^^^^^^^^^^^^^^^

If the XML input has `namespaces
<https://en.wikipedia.org/wiki/XML_namespace>`__, tags and attributes
with prefixes in the form ``prefix:sometag`` get expanded to
``{uri}sometag`` where the *prefix* is replaced by the full *URI*.
Also, if there is a `default namespace
<https://www.w3.org/TR/xml-names/#defaulting>`__,
that full URI gets prepended to all of the non-prefixed tags.

Here is an XML example that incorporates two namespaces, one with the
prefix "fictional" and the other serving as the default namespace:

.. code-block:: xml

   <?xml version="1.0"?>
   <actors xmlns:fictional="http://characters.example.com"
           xmlns="http://people.example.com">
       <actor>
           <name>John Cleese</name>
           <fictional:character>Lancelot</fictional:character>
           <fictional:character>Archie Leach</fictional:character>
       </actor>
       <actor>
           <name>Eric Idle</name>
           <fictional:character>Sir Robin</fictional:character>
           <fictional:character>Gunther</fictional:character>
           <fictional:character>Commander Clement</fictional:character>
       </actor>
   </actors>

One way to search and explore this XML example is to manually add the
URI to every tag or attribute in the xpath of a
:meth:`~Element.find` or :meth:`~Element.findall`::

   root = fromstring(xml_text)
   for actor in root.findall('{http://people.example.com}actor'):
       name = actor.find('{http://people.example.com}name')
       print(name.text)
       for char in actor.findall('{http://characters.example.com}character'):
           print(' |-->', char.text)

A better way to search the namespaced XML example is to create a
dictionary with your own prefixes and use those in the search functions::

   ns = {'real_person': 'http://people.example.com',
         'role': 'http://characters.example.com'}

   for actor in root.findall('real_person:actor', ns):
       name = actor.find('real_person:name', ns)
       print(name.text)
       for char in actor.findall('role:character', ns):
           print(' |-->', char.text)

These two approaches both output::

   John Cleese
    |--> Lancelot
    |--> Archie Leach
   Eric Idle
    |--> Sir Robin
    |--> Gunther
    |--> Commander Clement


Additional resources
^^^^^^^^^^^^^^^^^^^^

See http://effbot.org/zone/element-index.htm for tutorials and links to other
docs.


.. _elementtree-xpath:

XPath support
-------------

This module provides limited support for
`XPath expressions <https://www.w3.org/TR/xpath>`_ for locating elements in a
tree. The goal is to support a small subset of the abbreviated syntax; a full
XPath engine is outside the scope of the module.

Example
^^^^^^^

Here's an example that demonstrates some of the XPath capabilities of the
module. We'll be using the ``countrydata`` XML document from the
:ref:`Parsing XML <elementtree-parsing-xml>` section::

   import xml.etree.ElementTree as ET

   root = ET.fromstring(countrydata)

   # Top-level elements
   root.findall(".")

   # All 'neighbor' grand-children of 'country' children of the top-level
   # elements
   root.findall("./country/neighbor")

   # Nodes with name='Singapore' that have a 'year' child
   root.findall(".//year/..[@name='Singapore']")

   # 'year' nodes that are children of nodes with name='Singapore'
   root.findall(".//*[@name='Singapore']/year")

   # All 'neighbor' nodes that are the second child of their parent
   root.findall(".//neighbor[2]")

For XML with namespaces, use the usual qualified ``{namespace}tag`` notation::

   # All dublin-core "title" tags in the document
   root.findall(".//{http://purl.org/dc/elements/1.1/}title")


Supported XPath syntax
^^^^^^^^^^^^^^^^^^^^^^

.. tabularcolumns:: |l|L|

+-----------------------+------------------------------------------------------+
| Syntax                | Meaning                                              |
+=======================+======================================================+
| ``tag``               | Selects all child elements with the given tag.       |
|                       | For example, ``spam`` selects all child elements     |
|                       | named ``spam``, and ``spam/egg`` selects all         |
|                       | grandchildren named ``egg`` in all children named    |
|                       | ``spam``. ``{namespace}*`` selects all tags in the   |
|                       | given namespace, ``{*}spam`` selects tags named      |
|                       | ``spam`` in any (or no) namespace, and ``{}*``       |
|                       | only selects tags that are not in a namespace.       |
|                       |                                                      |
|                       | .. versionchanged:: 3.8                              |
|                       |    Support for star-wildcards was added.             |
+-----------------------+------------------------------------------------------+
| ``*``                 | Selects all child elements, including comments and   |
|                       | processing instructions. For example, ``*/egg``      |
|                       | selects all grandchildren named ``egg``.             |
+-----------------------+------------------------------------------------------+
| ``.``                 | Selects the current node. This is mostly useful      |
|                       | at the beginning of the path, to indicate that it's  |
|                       | a relative path.                                     |
+-----------------------+------------------------------------------------------+
| ``//``                | Selects all subelements, on all levels beneath the   |
|                       | current element. For example, ``.//egg`` selects     |
|                       | all ``egg`` elements in the entire tree.             |
+-----------------------+------------------------------------------------------+
| ``..``                | Selects the parent element. Returns ``None`` if the  |
|                       | path attempts to reach the ancestors of the start    |
|                       | element (the element ``find`` was called on).        |
+-----------------------+------------------------------------------------------+
| ``[@attrib]``         | Selects all elements that have the given attribute.  |
+-----------------------+------------------------------------------------------+
| ``[@attrib='value']`` | Selects all elements for which the given attribute   |
|                       | has the given value. The value cannot contain        |
|                       | quotes.                                              |
+-----------------------+------------------------------------------------------+
| ``[@attrib!='value']``| Selects all elements for which the given attribute   |
|                       | does not have the given value. The value cannot      |
|                       | contain quotes.                                      |
|                       |                                                      |
|                       | .. versionadded:: 3.10                               |
+-----------------------+------------------------------------------------------+
| ``[tag]``             | Selects all elements that have a child named         |
|                       | ``tag``. Only immediate children are supported.      |
+-----------------------+------------------------------------------------------+
| ``[.='text']``        | Selects all elements whose complete text content,    |
|                       | including descendants, equals the given ``text``.    |
|                       |                                                      |
|                       | .. versionadded:: 3.7                                |
+-----------------------+------------------------------------------------------+
| ``[.!='text']``       | Selects all elements whose complete text content,    |
|                       | including descendants, does not equal the given      |
|                       | ``text``.                                            |
|                       |                                                      |
|                       | .. versionadded:: 3.10                               |
+-----------------------+------------------------------------------------------+
| ``[tag='text']``      | Selects all elements that have a child named         |
|                       | ``tag`` whose complete text content, including       |
|                       | descendants, equals the given ``text``.              |
+-----------------------+------------------------------------------------------+
| ``[tag!='text']``     | Selects all elements that have a child named         |
|                       | ``tag`` whose complete text content, including       |
|                       | descendants, does not equal the given ``text``.      |
|                       |                                                      |
|                       | .. versionadded:: 3.10                               |
+-----------------------+------------------------------------------------------+
| ``[position]``        | Selects all elements that are located at the given   |
|                       | position. The position can be either an integer      |
|                       | (1 is the first position), the expression ``last()`` |
|                       | (for the last position), or a position relative to   |
|                       | the last position (e.g. ``last()-1``).               |
+-----------------------+------------------------------------------------------+

Predicates (expressions within square brackets) must be preceded by a tag
name, an asterisk, or another predicate. ``position`` predicates must be
preceded by a tag name.
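
A few of the predicates listed above can be tried out with the short sketch
below. It uses a trimmed copy of the sample data from the Parsing XML section
(the ``gdppc`` elements are left out), and ``names`` is a hypothetical helper
introduced only to keep the example compact.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the tutorial's sample data (assumption: gdppc omitted).
countrydata = """
<data>
    <country name="Liechtenstein">
        <rank>1</rank>
        <year>2008</year>
        <neighbor name="Austria" direction="E"/>
        <neighbor name="Switzerland" direction="W"/>
    </country>
    <country name="Singapore">
        <rank>4</rank>
        <year>2011</year>
        <neighbor name="Malaysia" direction="N"/>
    </country>
    <country name="Panama">
        <rank>68</rank>
        <year>2011</year>
        <neighbor name="Costa Rica" direction="W"/>
        <neighbor name="Colombia" direction="E"/>
    </country>
</data>
"""
root = ET.fromstring(countrydata)


def names(elements):
    """Return the 'name' attribute of each element (illustrative helper)."""
    return [elem.get('name') for elem in elements]


# [tag='text']: countries whose 'year' child has the text '2011'
from_2011 = names(root.findall("./country[year='2011']"))

# [@attrib='value']: neighbors lying to the west, anywhere in the tree
western = names(root.findall(".//neighbor[@direction='W']"))

# [position]: the last 'neighbor' child of each country
last_neighbors = names(root.findall("./country/neighbor[last()]"))

print(from_2011)
print(western)
print(last_neighbors)
```

Each predicate follows a tag name, as the rule above requires, and matches are
returned in document order.
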

Reference
---------

.. _elementtree-functions:

Functions
^^^^^^^^^

.. function:: canonicalize(xml_data=None, *, out=None, from_file=None, **options)

   `C14N 2.0 <https://www.w3.org/TR/xml-c14n2/>`_ transformation function.

   Canonicalization is a way to normalise XML output in a way that allows
   byte-by-byte comparisons and digital signatures. It reduces the freedom
   that XML serializers have and instead generates a more constrained XML
   representation. The main restrictions regard the placement of namespace
   declarations, the ordering of attributes, and ignorable whitespace.

   This function takes an XML data string (*xml_data*) or a file path or
   file-like object (*from_file*) as input, converts it to the canonical
   form, and writes it out using the *out* file(-like) object, if provided,
   or returns it as a text string if not. The output file receives text,
   not bytes. It should therefore be opened in text mode with ``utf-8``
   encoding.

   Typical uses::

      xml_data = "<root>...</root>"
      print(canonicalize(xml_data))

      with open("c14n_output.xml", mode='w', encoding='utf-8') as out_file:
          canonicalize(xml_data, out=out_file)

      with open("c14n_output.xml", mode='w', encoding='utf-8') as out_file:
          canonicalize(from_file="inputfile.xml", out=out_file)

   The configuration *options* are as follows:

   - *with_comments*: set to true to include comments (default: false)
   - *strip_text*: set to true to strip whitespace before and after text content
     (default: false)
   - *rewrite_prefixes*: set to true to replace namespace prefixes by "n{number}"
     (default: false)
   - *qname_aware_tags*: a set of qname aware tag names in which prefixes
     should be replaced in text content (default: empty)
   - *qname_aware_attrs*: a set of qname aware attribute names in which prefixes
     should be replaced in text content (default: empty)
   - *exclude_attrs*: a set of attribute names that should not be serialised
   - *exclude_tags*: a set of tag names that should not be serialised

   In the option list above, "a set" refers to any collection or iterable of
   strings; no ordering is expected.

   .. versionadded:: 3.8


.. function:: Comment(text=None)

   Comment element factory. This factory function creates a special element
   that will be serialized as an XML comment by the standard serializer. The
   comment string can be either a bytestring or a Unicode string. *text* is a
   string containing the comment string. Returns an element instance
   representing a comment.

   Note that :class:`XMLParser` skips over comments in the input
   instead of creating comment objects for them. An :class:`ElementTree` will
   only contain comment nodes if they have been inserted into
   the tree using one of the :class:`Element` methods.

.. function:: dump(elem)

   Writes an element tree or element structure to sys.stdout. This function
   should be used for debugging only.

   The exact output format is implementation dependent. In this version, it's
   written as an ordinary XML file.

   *elem* is an element tree or an individual element.

   .. versionchanged:: 3.8
      The :func:`dump` function now preserves the attribute order specified
      by the user.


.. function:: fromstring(text, parser=None)

   Parses an XML section from a string constant. Same as :func:`XML`. *text*
   is a string containing XML data. *parser* is an optional parser instance.
   If not given, the standard :class:`XMLParser` parser is used.
   Returns an :class:`Element` instance.


.. function:: fromstringlist(sequence, parser=None)

   Parses an XML document from a sequence of string fragments. *sequence* is a
   list or other sequence containing XML data fragments. *parser* is an
   optional parser instance. If not given, the standard :class:`XMLParser`
   parser is used. Returns an :class:`Element` instance.

   .. versionadded:: 3.2


.. function:: indent(tree, space="  ", level=0)

   Appends whitespace to the subtree to indent the tree visually.
   This can be used to generate pretty-printed XML output.
   *tree* can be an Element or ElementTree. *space* is the whitespace
   string that will be inserted for each indentation level, two space
   characters by default. For indenting partial subtrees inside of an
   already indented tree, pass the initial indentation level as *level*.

   .. versionadded:: 3.9


.. function:: iselement(element)

   Check if an object appears to be a valid element object. *element* is an
   element instance. Return ``True`` if this is an element object.


.. function:: iterparse(source, events=None, parser=None)

   Parses an XML section into an element tree incrementally, and reports what's
   going on to the user. *source* is a filename or :term:`file object`
   containing XML data. *events* is a sequence of events to report back. The
   supported events are the strings ``"start"``, ``"end"``, ``"comment"``,
   ``"pi"``, ``"start-ns"`` and ``"end-ns"``
   (the "ns" events are used to get detailed namespace
   information). If *events* is omitted, only ``"end"`` events are reported.
   *parser* is an optional parser instance. If not given, the standard
   :class:`XMLParser` parser is used. *parser* must be a subclass of
   :class:`XMLParser` and can only use the default :class:`TreeBuilder` as a
   target. Returns an :term:`iterator` providing ``(event, elem)`` pairs.

   Note that while :func:`iterparse` builds the tree incrementally, it issues
   blocking reads on *source* (or the file it names). As such, it's unsuitable
   for applications where blocking reads can't be made. For fully non-blocking
   parsing, see :class:`XMLPullParser`.

   .. note::

      :func:`iterparse` only guarantees that it has seen the ">" character of a
      starting tag when it emits a "start" event, so the attributes are defined,
      but the contents of the text and tail attributes are undefined at that
      point. The same applies to the element children; they may or may not be
      present.

      If you need a fully populated element, look for "end" events instead.

   .. deprecated:: 3.4
      The *parser* argument.

   .. versionchanged:: 3.8
      The ``comment`` and ``pi`` events were added.


.. function:: parse(source, parser=None)

   Parses an XML section into an element tree. *source* is a filename or file
   object containing XML data. *parser* is an optional parser instance. If
   not given, the standard :class:`XMLParser` parser is used.
   Returns an :class:`ElementTree` instance.


.. function:: ProcessingInstruction(target, text=None)

   PI element factory. This factory function creates a special element that
   will be serialized as an XML processing instruction. *target* is a string
   containing the PI target. *text* is a string containing the PI contents, if
   given. Returns an element instance, representing a processing instruction.

   Note that :class:`XMLParser` skips over processing instructions
   in the input instead of creating PI objects for them. An
   :class:`ElementTree` will only contain processing instruction nodes if
   they have been inserted into the tree using one of the
   :class:`Element` methods.

.. function:: register_namespace(prefix, uri)

   Registers a namespace prefix. The registry is global, and any existing
   mapping for either the given prefix or the namespace URI will be removed.
   *prefix* is a namespace prefix. *uri* is a namespace URI. Tags and
   attributes in this namespace will be serialized with the given prefix, if at
   all possible.

   .. versionadded:: 3.2


.. function:: SubElement(parent, tag, attrib={}, **extra)

   Subelement factory. This function creates an element instance, and appends
   it to an existing element.

   The element name, attribute names, and attribute values can be either
   bytestrings or Unicode strings. *parent* is the parent element. *tag* is
   the subelement name. *attrib* is an optional dictionary, containing element
   attributes. *extra* contains additional attributes, given as keyword
   arguments. Returns an element instance.


.. function:: tostring(element, encoding="us-ascii", method="xml", *, \
                       xml_declaration=None, default_namespace=None, \
                       short_empty_elements=True)

   Generates a string representation of an XML element, including all
   subelements. *element* is an :class:`Element` instance.
   *encoding* [1]_ is the output encoding (default is US-ASCII). Use
   ``encoding="unicode"`` to generate a Unicode string (otherwise, a
   bytestring is generated). *method* is either ``"xml"``, ``"html"`` or
   ``"text"`` (default is ``"xml"``). *xml_declaration*, *default_namespace*
   and *short_empty_elements* have the same meaning as in
   :meth:`ElementTree.write`. Returns an (optionally) encoded string
   containing the XML data.

   .. versionadded:: 3.4
      The *short_empty_elements* parameter.

   .. versionadded:: 3.8
      The *xml_declaration* and *default_namespace* parameters.

   .. versionchanged:: 3.8
      The :func:`tostring` function now preserves the attribute order
      specified by the user.


.. function:: tostringlist(element, encoding="us-ascii", method="xml", *, \
                           xml_declaration=None, default_namespace=None, \
                           short_empty_elements=True)

   Generates a string representation of an XML element, including all
   subelements. *element* is an :class:`Element` instance. *encoding* [1]_ is
   the output encoding (default is US-ASCII). Use ``encoding="unicode"`` to
   generate a Unicode string (otherwise, a bytestring is generated). *method*
   is either ``"xml"``, ``"html"`` or ``"text"`` (default is ``"xml"``).
   *xml_declaration*, *default_namespace* and *short_empty_elements* have the same
   meaning as in :meth:`ElementTree.write`. Returns a list of (optionally) encoded
   strings containing the XML data. It does not guarantee any specific sequence,
   except that ``b"".join(tostringlist(element)) == tostring(element)``.

   .. versionadded:: 3.2

   .. versionadded:: 3.4
      The *short_empty_elements* parameter.

   .. versionadded:: 3.8
      The *xml_declaration* and *default_namespace* parameters.

   .. versionchanged:: 3.8
      The :func:`tostringlist` function now preserves the attribute order
      specified by the user.


.. function:: XML(text, parser=None)

   Parses an XML section from a string constant. This function can be used to
   embed "XML literals" in Python code. *text* is a string containing XML
   data. *parser* is an optional parser instance. If not given, the standard
   :class:`XMLParser` parser is used. Returns an :class:`Element` instance.


.. function:: XMLID(text, parser=None)

   Parses an XML section from a string constant, and also returns a dictionary
   which maps from element IDs to elements. *text* is a string containing XML
   data. *parser* is an optional parser instance. If not given, the standard
   :class:`XMLParser` parser is used. Returns a tuple containing an
   :class:`Element` instance and a dictionary.


.. _elementtree-xinclude:

XInclude support
----------------

This module provides limited support for
`XInclude directives <https://www.w3.org/TR/xinclude/>`_, via the
:mod:`xml.etree.ElementInclude` helper module. This module can be used to
insert subtrees and text strings into element trees, based on information in
the tree.

Example
^^^^^^^

Here's an example that demonstrates use of the XInclude module. To include an
XML document in the current document, use the
``{http://www.w3.org/2001/XInclude}include`` element and set the **parse**
attribute to ``"xml"``, and use the **href** attribute to specify the document
to include.

.. code-block:: xml

    <?xml version="1.0"?>
    <document xmlns:xi="http://www.w3.org/2001/XInclude">
      <xi:include href="source.xml" parse="xml" />
    </document>

By default, the **href** attribute is treated as a file name. You can use
custom loaders to override this behaviour. Also note that the standard helper
does not support XPointer syntax.

To process this file, load it as usual, and pass the root element to the
:mod:`xml.etree.ElementInclude` module:

.. code-block:: python

   from xml.etree import ElementTree, ElementInclude

   tree = ElementTree.parse("document.xml")
   root = tree.getroot()

   ElementInclude.include(root)

The ElementInclude module replaces the
``{http://www.w3.org/2001/XInclude}include`` element with the root element from
the **source.xml** document. The result might look something like this:

.. code-block:: xml

    <document xmlns:xi="http://www.w3.org/2001/XInclude">
      <para>This is a paragraph.</para>
    </document>

If the **parse** attribute is omitted, it defaults to "xml". The href attribute
is required.

To include a text document, use the
``{http://www.w3.org/2001/XInclude}include`` element, and set the **parse**
attribute to "text":

.. code-block:: xml

    <?xml version="1.0"?>
    <document xmlns:xi="http://www.w3.org/2001/XInclude">
      Copyright (c) <xi:include href="year.txt" parse="text" />.
    </document>

The result might look something like:

.. code-block:: xml

    <document xmlns:xi="http://www.w3.org/2001/XInclude">
      Copyright (c) 2003.
    </document>

Reference
---------

.. _elementinclude-functions:

Functions
^^^^^^^^^

.. function:: xml.etree.ElementInclude.default_loader(href, parse, encoding=None)

   Default loader. This default loader reads an included resource from disk.
   *href* is a URL. *parse* is the parse mode, either "xml" or "text".
   *encoding* is an optional text encoding. If not given, encoding is ``utf-8``.
   Returns the expanded resource. If the parse mode is ``"xml"``, this is an
   ElementTree instance. If the parse mode is "text", this is a Unicode string.
   If the loader fails, it can return None or raise an exception.


.. function:: xml.etree.ElementInclude.include(elem, loader=None, base_url=None, \
                                               max_depth=6)

   This function expands XInclude directives. *elem* is the root element.
   *loader* is an optional resource loader. If omitted, it defaults to
   :func:`default_loader`. If given, it should be a callable that implements
   the same interface as :func:`default_loader`. *base_url* is the base URL of
   the original file, used to resolve relative include file references.
   *max_depth* is the maximum number of recursive inclusions. It is limited to
   reduce the risk of malicious content explosion. Pass ``None`` to disable
   the limitation.

   The directives are expanded in place, modifying the tree rooted at *elem*;
   the function itself returns ``None``.

   .. versionadded:: 3.9
      The *base_url* and *max_depth* parameters.


.. _elementtree-element-objects:

Element Objects
^^^^^^^^^^^^^^^

.. class:: Element(tag, attrib={}, **extra)

   Element class. This class defines the Element interface, and provides a
   reference implementation of this interface.

   The element name, attribute names, and attribute values can be either
   bytestrings or Unicode strings. *tag* is the element name. *attrib* is
   an optional dictionary, containing element attributes. *extra* contains
   additional attributes, given as keyword arguments.


   .. attribute:: tag

      A string identifying what kind of data this element represents (the
      element type, in other words).


   .. attribute:: text
                  tail

      These attributes can be used to hold additional data associated with
      the element. Their values are usually strings but may be any
      application-specific object. If the element is created from
      an XML file, the *text* attribute holds either the text between
      the element's start tag and its first child or end tag, or ``None``, and
      the *tail* attribute holds either the text between the element's
      end tag and the next tag, or ``None``.
      For the XML data

      .. code-block:: xml

         <a><b>1<c>2<d/>3</c></b>4</a>

      the *a* element has ``None`` for both *text* and *tail* attributes,
      the *b* element has *text* ``"1"`` and *tail* ``"4"``,
      the *c* element has *text* ``"2"`` and *tail* ``None``,
      and the *d* element has *text* ``None`` and *tail* ``"3"``.

      To collect the inner text of an element, see :meth:`itertext`, for
      example ``"".join(element.itertext())``.

      Applications may store arbitrary objects in these attributes.


   .. attribute:: attrib

      A dictionary containing the element's attributes. Note that while the
      *attrib* value is always a real mutable Python dictionary, an ElementTree
      implementation may choose to use another internal representation, and
      create the dictionary only if someone asks for it. To take advantage of
      such implementations, use the dictionary methods below whenever possible.

   The following dictionary-like methods work on the element attributes.


   .. method:: clear()

      Resets an element. This function removes all subelements, clears all
      attributes, and sets the text and tail attributes to ``None``.


   .. method:: get(key, default=None)

      Gets the element attribute named *key*.

      Returns the attribute value, or *default* if the attribute was not found.


   .. method:: items()

      Returns the element attributes as a sequence of (name, value) pairs. The
      attributes are returned in an arbitrary order.


   .. method:: keys()

      Returns the element's attribute names as a list. The names are returned
      in an arbitrary order.


   .. method:: set(key, value)

      Set the attribute *key* on the element to *value*.

   The following methods work on the element's children (subelements).


   .. method:: append(subelement)

      Adds the element *subelement* to the end of this element's internal list
      of subelements. Raises :exc:`TypeError` if *subelement* is not an
      :class:`Element`.


   .. method:: extend(subelements)

      Appends *subelements* from a sequence object with zero or more elements.
      Raises :exc:`TypeError` if a subelement is not an :class:`Element`.

      .. versionadded:: 3.2


   .. method:: find(match, namespaces=None)

      Finds the first subelement matching *match*. *match* may be a tag name
      or a :ref:`path <elementtree-xpath>`. Returns an element instance
      or ``None``. *namespaces* is an optional mapping from namespace prefix
      to full name. Pass ``''`` as prefix to move all unprefixed tag names
      in the expression into the given namespace.


   .. method:: findall(match, namespaces=None)

      Finds all matching subelements, by tag name or
      :ref:`path <elementtree-xpath>`. Returns a list containing all matching
      elements in document order. *namespaces* is an optional mapping from
      namespace prefix to full name. Pass ``''`` as prefix to move all
      unprefixed tag names in the expression into the given namespace.


   .. method:: findtext(match, default=None, namespaces=None)

      Finds text for the first subelement matching *match*. *match* may be
      a tag name or a :ref:`path <elementtree-xpath>`. Returns the text content
      of the first matching element, or *default* if no element was found.
      Note that if the matching element has no text content an empty string
      is returned. *namespaces* is an optional mapping from namespace prefix
      to full name. Pass ``''`` as prefix to move all unprefixed tag names
      in the expression into the given namespace.


   .. method:: insert(index, subelement)

      Inserts *subelement* at the given position in this element. Raises
      :exc:`TypeError` if *subelement* is not an :class:`Element`.
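
   The attribute and child methods described so far can be combined as in the
   following minimal sketch. The XML snippet and all variable names are made
   up for illustration; only the :class:`Element` methods themselves come from
   this module.

   .. code-block:: python

      import xml.etree.ElementTree as ET

      # Hypothetical sample data, loosely modelled on the tutorial document.
      root = ET.fromstring(
          '<data>'
          '<country name="Panama"><rank>68</rank><year/></country>'
          '<country name="Singapore"><rank>4</rank></country>'
          '</data>'
      )

      first = root.find('country')              # first matching child
      name = first.get('name')                  # existing attribute
      missing = first.get('capital', 'n/a')     # absent attribute -> default
      first.set('continent', 'Americas')        # add or overwrite an attribute

      # findall() returns a list in document order.
      ranks = [c.findtext('rank') for c in root.findall('country')]

      # An element that exists but has no text yields an empty string,
      # while a missing element yields the supplied default.
      empty_year = first.findtext('year')
      no_gdp = first.findtext('gdppc', default='unknown')

      # insert() places a new child at the given index.
      extra = ET.Element('country', {'name': 'Liechtenstein'})
      root.insert(0, extra)
      names = [c.get('name') for c in root]

   Distinguishing the empty string from the default, as ``findtext`` does
   here, is often what decides whether a ``default`` argument is worth
   passing.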


   .. method:: iter(tag=None)

      Creates a tree :term:`iterator` with the current element as the root.
      The iterator iterates over this element and all elements below it, in
      document (depth first) order. If *tag* is not ``None`` or ``'*'``, only
      elements whose tag equals *tag* are returned from the iterator. If the
      tree structure is modified during iteration, the result is undefined.

      .. versionadded:: 3.2


   .. method:: iterfind(match, namespaces=None)

      Finds all matching subelements, by tag name or
      :ref:`path <elementtree-xpath>`. Returns an iterable yielding all
      matching elements in document order. *namespaces* is an optional mapping
      from namespace prefix to full name.

      .. versionadded:: 3.2


   .. method:: itertext()

      Creates a text iterator. The iterator loops over this element and all
      subelements, in document order, and returns all inner text.

      .. versionadded:: 3.2


   .. method:: makeelement(tag, attrib)

      Creates a new element object of the same type as this element. Do not
      call this method; use the :func:`SubElement` factory function instead.


   .. method:: remove(subelement)

      Removes *subelement* from the element. Unlike the find\* methods, this
      method compares elements based on instance identity, not on tag value
      or contents.

   :class:`Element` objects also support the following sequence type methods
   for working with subelements: :meth:`~object.__delitem__`,
   :meth:`~object.__getitem__`, :meth:`~object.__setitem__`,
   :meth:`~object.__len__`.

   Caution: Elements with no subelements will test as ``False``. This behavior
   will change in future versions. Use an explicit ``len(elem)`` or ``elem is
   None`` test instead. ::

      element = root.find('foo')

      if not element:  # careful!
          print("element not found, or element has no subelements")

      if element is None:
          print("element not found")

   Prior to Python 3.8, the serialisation order of the XML attributes of
   elements was artificially made predictable by sorting the attributes by
   their name. Based on the now guaranteed ordering of dicts, this arbitrary
   reordering was removed in Python 3.8 to preserve the order in which
   attributes were originally parsed or created by user code.

   In general, user code should try not to depend on a specific ordering of
   attributes, given that the `XML Information Set
   <https://www.w3.org/TR/xml-infoset/>`_ explicitly excludes the attribute
   order from conveying information. Code should be prepared to deal with
   any ordering on input. In cases where deterministic XML output is required,
   e.g. for cryptographic signing or test data sets, canonical serialisation
   is available with the :func:`canonicalize` function.

   In cases where canonical output is not applicable but a specific attribute
   order is still desirable on output, code should aim for creating the
   attributes directly in the desired order, to avoid perceptual mismatches
   for readers of the code. In cases where this is difficult to achieve, a
   recipe like the following can be applied prior to serialisation to enforce
   an order independently from the Element creation::

      def reorder_attributes(root):
          for el in root.iter():
              attrib = el.attrib
              if len(attrib) > 1:
                  # adjust attribute order, e.g. by sorting
                  attribs = sorted(attrib.items())
                  attrib.clear()
                  attrib.update(attribs)


.. _elementtree-elementtree-objects:

ElementTree Objects
^^^^^^^^^^^^^^^^^^^


.. class:: ElementTree(element=None, file=None)

   ElementTree wrapper class.
   This class represents an entire element hierarchy, and adds some extra
   support for serialization to and from standard XML.

   *element* is the root element. The tree is initialized with the contents
   of the XML *file* if given.


   .. method:: _setroot(element)

      Replaces the root element for this tree. This discards the current
      contents of the tree, and replaces it with the given element. Use with
      care. *element* is an element instance.


   .. method:: find(match, namespaces=None)

      Same as :meth:`Element.find`, starting at the root of the tree.


   .. method:: findall(match, namespaces=None)

      Same as :meth:`Element.findall`, starting at the root of the tree.


   .. method:: findtext(match, default=None, namespaces=None)

      Same as :meth:`Element.findtext`, starting at the root of the tree.


   .. method:: getroot()

      Returns the root element for this tree.


   .. method:: iter(tag=None)

      Creates and returns a tree iterator for the root element. The iterator
      loops over all elements in this tree, in document order. *tag* is the tag
      to look for (default is to return all elements).


   .. method:: iterfind(match, namespaces=None)

      Same as :meth:`Element.iterfind`, starting at the root of the tree.

      .. versionadded:: 3.2


   .. method:: parse(source, parser=None)

      Loads an external XML section into this element tree. *source* is a file
      name or :term:`file object`. *parser* is an optional parser instance.
      If not given, the standard :class:`XMLParser` parser is used. Returns the
      section root element.


   .. method:: write(file, encoding="us-ascii", xml_declaration=None, \
                     default_namespace=None, method="xml", *, \
                     short_empty_elements=True)

      Writes the element tree to a file, as XML.
      *file* is a file name, or a :term:`file object` opened for writing.
      *encoding* [1]_ is the output encoding (default is US-ASCII).
      *xml_declaration* controls if an XML declaration should be added to the
      file. Use ``False`` for never, ``True`` for always, ``None``
      for only if not US-ASCII or UTF-8 or Unicode (default is ``None``).
      *default_namespace* sets the default XML namespace (for "xmlns").
      *method* is either ``"xml"``, ``"html"`` or ``"text"`` (default is
      ``"xml"``).
      The keyword-only *short_empty_elements* parameter controls the formatting
      of elements that contain no content. If ``True`` (the default), they are
      emitted as a single self-closed tag, otherwise they are emitted as a pair
      of start/end tags.

      The output is either a string (:class:`str`) or binary (:class:`bytes`).
      This is controlled by the *encoding* argument. If *encoding* is
      ``"unicode"``, the output is a string; otherwise, it's binary. Note that
      this may conflict with the type of *file* if it's an open
      :term:`file object`; make sure you do not try to write a string to a
      binary stream and vice versa.

      .. versionadded:: 3.4
         The *short_empty_elements* parameter.

      .. versionchanged:: 3.8
         The :meth:`write` method now preserves the attribute order specified
         by the user.
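
The relationship between the *encoding* argument of
:meth:`ElementTree.write` and the type of the target stream can be sketched as
follows. This is only an illustration: the element names and the in-memory
streams (``io.StringIO`` and ``io.BytesIO``) are assumptions chosen to keep
the example self-contained, not something this reference prescribes.

.. code-block:: python

   import io
   import xml.etree.ElementTree as ET

   # Build a tiny, made-up tree with one empty child element.
   root = ET.Element('greeting', {'lang': 'en'})
   root.text = 'hello'
   ET.SubElement(root, 'empty')
   tree = ET.ElementTree(root)

   # encoding="unicode" produces str output, so the target must be a text stream.
   text_out = io.StringIO()
   tree.write(text_out, encoding='unicode')

   # Any other encoding produces bytes, so the target must be a binary stream.
   # Here the XML declaration is forced on and empty elements are written as
   # explicit start/end tag pairs.
   bytes_out = io.BytesIO()
   tree.write(bytes_out, encoding='utf-8', xml_declaration=True,
              short_empty_elements=False)

   text_result = text_out.getvalue()
   bytes_result = bytes_out.getvalue()

With the default ``short_empty_elements=True`` the empty child is written as
a self-closed tag in ``text_result``, while ``bytes_result`` starts with an
XML declaration and contains ``<empty></empty>``.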


This is the XML file that is going to be manipulated::

    <html>
        <head>
            <title>Example page</title>
        </head>
        <body>
            <p>Moved to <a href="http://example.org/">example.org</a>
            or <a href="http://example.com/">example.com</a>.</p>
        </body>
    </html>

Example of changing the attribute "target" of every link in the first paragraph::

    >>> from xml.etree.ElementTree import ElementTree
    >>> tree = ElementTree()
    >>> tree.parse("index.xhtml")
    <Element 'html' at 0xb77e6fac>
    >>> p = tree.find("body/p")     # Finds first occurrence of tag p in body
    >>> p
    <Element 'p' at 0xb77ec26c>
    >>> links = list(p.iter("a"))   # Returns list of all links
    >>> links
    [<Element 'a' at 0xb77ec2ac>, <Element 'a' at 0xb77ec1cc>]
    >>> for i in links:             # Iterates through all found links
    ...     i.attrib["target"] = "blank"
    >>> tree.write("output.xhtml")

.. _elementtree-qname-objects:

QName Objects
^^^^^^^^^^^^^


.. class:: QName(text_or_uri, tag=None)

   QName wrapper. This can be used to wrap a QName attribute value, in order
   to get proper namespace handling on output. *text_or_uri* is a string
   containing the QName value, in the form {uri}local, or, if the tag argument
   is given, the URI part of a QName. If *tag* is given, the first argument is
   interpreted as a URI, and this argument is interpreted as a local name.
   :class:`QName` instances are opaque.



.. _elementtree-treebuilder-objects:

TreeBuilder Objects
^^^^^^^^^^^^^^^^^^^


.. class:: TreeBuilder(element_factory=None, *, comment_factory=None, \
                       pi_factory=None, insert_comments=False, insert_pis=False)

   Generic element structure builder. This builder converts a sequence of
   start, data, end, comment and pi method calls to a well-formed element
   structure.
   You can use this class to build an element structure using
   a custom XML parser, or a parser for some other XML-like format.

   *element_factory*, when given, must be a callable accepting two positional
   arguments: a tag and a dict of attributes. It is expected to return a new
   element instance.

   The *comment_factory* and *pi_factory* functions, when given, should behave
   like the :func:`Comment` and :func:`ProcessingInstruction` functions to
   create comments and processing instructions. When not given, the default
   factories will be used. When *insert_comments* and/or *insert_pis* is true,
   comments/pis will be inserted into the tree if they appear within the root
   element (but not outside of it).

   .. method:: close()

      Flushes the builder buffers, and returns the toplevel document
      element. Returns an :class:`Element` instance.


   .. method:: data(data)

      Adds text to the current element. *data* is a string. This should be
      either a bytestring, or a Unicode string.


   .. method:: end(tag)

      Closes the current element. *tag* is the element name. Returns the
      closed element.


   .. method:: start(tag, attrs)

      Opens a new element. *tag* is the element name. *attrs* is a dictionary
      containing element attributes. Returns the opened element.


   .. method:: comment(text)

      Creates a comment with the given *text*. If ``insert_comments`` is true,
      this will also add it to the tree.

      .. versionadded:: 3.8


   .. method:: pi(target, text)

      Creates a processing instruction with the given *target* name and
      *text*. If ``insert_pis`` is true, this will also add it to the tree.

      .. versionadded:: 3.8


   In addition, a custom :class:`TreeBuilder` object can provide the
   following methods:

   .. method:: doctype(name, pubid, system)

      Handles a doctype declaration. *name* is the doctype name. *pubid* is
      the public identifier. *system* is the system identifier. This method
      does not exist on the default :class:`TreeBuilder` class.

      .. versionadded:: 3.2

   .. method:: start_ns(prefix, uri)

      Is called whenever the parser encounters a new namespace declaration,
      before the ``start()`` callback for the opening element that defines it.
      *prefix* is ``''`` for the default namespace and the declared
      namespace prefix name otherwise. *uri* is the namespace URI.

      .. versionadded:: 3.8

   .. method:: end_ns(prefix)

      Is called after the ``end()`` callback of an element that declared
      a namespace prefix mapping, with the name of the *prefix* that went
      out of scope.

      .. versionadded:: 3.8


.. class:: C14NWriterTarget(write, *, \
             with_comments=False, strip_text=False, rewrite_prefixes=False, \
             qname_aware_tags=None, qname_aware_attrs=None, \
             exclude_attrs=None, exclude_tags=None)

   A `C14N 2.0 <https://www.w3.org/TR/xml-c14n2/>`_ writer. Arguments are the
   same as for the :func:`canonicalize` function. This class does not build a
   tree but translates the callback events directly into a serialised form
   using the *write* function.

   .. versionadded:: 3.8


.. _elementtree-xmlparser-objects:

XMLParser Objects
^^^^^^^^^^^^^^^^^


.. class:: XMLParser(*, target=None, encoding=None)

   This class is the low-level building block of the module. It uses
   :mod:`xml.parsers.expat` for efficient, event-based parsing of XML. It can
   be fed XML data incrementally with the :meth:`feed` method, and parsing
   events are translated to a push API by invoking callbacks on the *target*
   object. If *target* is omitted, the standard :class:`TreeBuilder` is used.
   If *encoding* [1]_ is given, the value overrides the
   encoding specified in the XML file.

   .. versionchanged:: 3.8
      Parameters are now :ref:`keyword-only <keyword-only_parameter>`.
      The *html* argument is no longer supported.


   .. method:: close()

      Finishes feeding data to the parser. Returns the result of calling the
      ``close()`` method of the *target* passed during construction; by default,
      this is the toplevel document element.


   .. method:: feed(data)

      Feeds data to the parser. *data* is encoded data.

   :meth:`XMLParser.feed` calls *target*\'s ``start(tag, attrs_dict)`` method
   for each opening tag, its ``end(tag)`` method for each closing tag, and data
   is processed by method ``data(data)``. For further supported callback
   methods, see the :class:`TreeBuilder` class. :meth:`XMLParser.close` calls
   *target*\'s method ``close()``. :class:`XMLParser` can be used for more
   than building a tree structure. This is an example of counting the maximum
   depth of an XML file::

    >>> from xml.etree.ElementTree import XMLParser
    >>> class MaxDepth:                     # The target object of the parser
    ...     maxDepth = 0
    ...     depth = 0
    ...     def start(self, tag, attrib):   # Called for each opening tag.
    ...         self.depth += 1
    ...         if self.depth > self.maxDepth:
    ...             self.maxDepth = self.depth
    ...     def end(self, tag):             # Called for each closing tag.
    ...         self.depth -= 1
    ...     def data(self, data):
    ...         pass            # We do not need to do anything with data.
    ...     def close(self):    # Called when all data has been parsed.
    ...         return self.maxDepth
    ...
    >>> target = MaxDepth()
    >>> parser = XMLParser(target=target)
    >>> exampleXml = """
    ... <a>
    ...   <b>
    ...   </b>
    ...   <b>
    ...     <c>
    ...       <d>
    ...       </d>
    ...     </c>
    ...   </b>
    ... </a>"""
    >>> parser.feed(exampleXml)
    >>> parser.close()
    4


.. _elementtree-xmlpullparser-objects:

XMLPullParser Objects
^^^^^^^^^^^^^^^^^^^^^

.. class:: XMLPullParser(events=None)

   A pull parser suitable for non-blocking applications. Its input-side API is
   similar to that of :class:`XMLParser`, but instead of pushing calls to a
   callback target, :class:`XMLPullParser` collects an internal list of parsing
   events and lets the user read from it. *events* is a sequence of events to
   report back. The supported events are the strings ``"start"``, ``"end"``,
   ``"comment"``, ``"pi"``, ``"start-ns"`` and ``"end-ns"`` (the "ns" events
   are used to get detailed namespace information). If *events* is omitted,
   only ``"end"`` events are reported.

   .. method:: feed(data)

      Feed the given bytes data to the parser.

   .. method:: close()

      Signal the parser that the data stream is terminated. Unlike
      :meth:`XMLParser.close`, this method always returns :const:`None`.
      Any events not yet retrieved when the parser is closed can still be
      read with :meth:`read_events`.

   .. method:: read_events()

      Return an iterator over the events which have been encountered in the
      data fed to the parser. The iterator yields ``(event, elem)`` pairs,
      where *event* is a string representing the type of event (e.g.
      ``"end"``) and *elem* is the encountered :class:`Element` object, or
      other context value as follows.

      * ``start``, ``end``: the current Element.
      * ``comment``, ``pi``: the current comment / processing instruction
      * ``start-ns``: a tuple ``(prefix, uri)`` naming the declared namespace
        mapping.
      * ``end-ns``: :const:`None` (this may change in a future version)

      Events provided in a previous call to :meth:`read_events` will not be
      yielded again.
      Events are consumed from the internal queue only when
      they are retrieved from the iterator, so multiple readers iterating in
      parallel over iterators obtained from :meth:`read_events` will have
      unpredictable results.

   .. note::

      :class:`XMLPullParser` only guarantees that it has seen the ">"
      character of a starting tag when it emits a "start" event, so the
      attributes are defined, but the contents of the text and tail attributes
      are undefined at that point. The same applies to the element children;
      they may or may not be present.

      If you need a fully populated element, look for "end" events instead.

   .. versionadded:: 3.4

   .. versionchanged:: 3.8
      The ``comment`` and ``pi`` events were added.


Exceptions
^^^^^^^^^^

.. class:: ParseError

   XML parse error, raised by the various parsing methods in this module when
   parsing fails. The string representation of an instance of this exception
   will contain a user-friendly error message. In addition, it will have
   the following attributes available:

   .. attribute:: code

      A numeric error code from the expat parser. See the documentation of
      :mod:`xml.parsers.expat` for the list of error codes and their meanings.

   .. attribute:: position

      A tuple of *line*, *column* numbers, specifying where the error occurred.

.. rubric:: Footnotes

.. [1] The encoding string included in XML output should conform to the
   appropriate standards. For example, "UTF-8" is valid, but "UTF8" is
   not. See https://www.w3.org/TR/2006/REC-xml11-20060816/#NT-EncodingDecl
   and https://www.iana.org/assignments/character-sets/character-sets.xhtml.