:mod:`xml.etree.ElementTree` --- The ElementTree XML API
========================================================

.. module:: xml.etree.ElementTree
   :synopsis: Implementation of the ElementTree API.

.. moduleauthor:: Fredrik Lundh <fredrik@pythonware.com>

**Source code:** :source:`Lib/xml/etree/ElementTree.py`

--------------

The :mod:`xml.etree.ElementTree` module implements a simple and efficient API
for parsing and creating XML data.

.. versionchanged:: 3.3
   This module will use a fast implementation whenever available.
   The :mod:`xml.etree.cElementTree` module is deprecated.


.. warning::

   The :mod:`xml.etree.ElementTree` module is not secure against
   maliciously constructed data.  If you need to parse untrusted or
   unauthenticated data, see :ref:`xml-vulnerabilities`.

Tutorial
--------

This is a short tutorial for using :mod:`xml.etree.ElementTree` (``ET`` in
short).  The goal is to demonstrate some of the building blocks and basic
concepts of the module.

XML tree and elements
^^^^^^^^^^^^^^^^^^^^^

XML is an inherently hierarchical data format, and the most natural way to
represent it is with a tree.  ``ET`` has two classes for this purpose:
:class:`ElementTree` represents the whole XML document as a tree, and
:class:`Element` represents a single node in this tree.  Interactions with
the whole document (reading and writing to/from files) are usually done
on the :class:`ElementTree` level.  Interactions with a single XML element
and its sub-elements are done on the :class:`Element` level.

.. _elementtree-parsing-xml:

Parsing XML
^^^^^^^^^^^

We'll be using the following XML document as the sample data for this section:

.. code-block:: xml

   <?xml version="1.0"?>
   <data>
       <country name="Liechtenstein">
           <rank>1</rank>
           <year>2008</year>
           <gdppc>141100</gdppc>
           <neighbor name="Austria" direction="E"/>
           <neighbor name="Switzerland" direction="W"/>
       </country>
       <country name="Singapore">
           <rank>4</rank>
           <year>2011</year>
           <gdppc>59900</gdppc>
           <neighbor name="Malaysia" direction="N"/>
       </country>
       <country name="Panama">
           <rank>68</rank>
           <year>2011</year>
           <gdppc>13600</gdppc>
           <neighbor name="Costa Rica" direction="W"/>
           <neighbor name="Colombia" direction="E"/>
       </country>
   </data>

We can import this data by reading from a file::

   import xml.etree.ElementTree as ET
   tree = ET.parse('country_data.xml')
   root = tree.getroot()

Or directly from a string::

   root = ET.fromstring(country_data_as_string)

:func:`fromstring` parses XML from a string directly into an :class:`Element`,
which is the root element of the parsed tree.  Other parsing functions may
create an :class:`ElementTree` instead (:func:`parse`, for example, returns
one), so check the documentation of each function to be sure.

As an :class:`Element`, ``root`` has a tag and a dictionary of attributes::

   >>> root.tag
   'data'
   >>> root.attrib
   {}

It also has child nodes over which we can iterate::

   >>> for child in root:
   ...     print(child.tag, child.attrib)
   ...
   country {'name': 'Liechtenstein'}
   country {'name': 'Singapore'}
   country {'name': 'Panama'}

Children are nested, and we can access specific child nodes by index::

   >>> root[0][1].text
   '2008'


.. note::

   Not all elements of the XML input will end up as elements of the
   parsed tree.  Currently, this module skips over any XML comments,
   processing instructions, and document type declarations in the
   input.  Nevertheless, trees built using this module's API rather
   than parsing from XML text can have comments and processing
   instructions in them; they will be included when generating XML
   output.  A document type declaration may be accessed by passing a
   custom :class:`TreeBuilder` instance to the :class:`XMLParser`
   constructor.


.. _elementtree-pull-parsing:

Pull API for non-blocking parsing
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Most parsing functions provided by this module require the whole document
to be read at once before returning any result.  It is possible to use an
:class:`XMLParser` and feed data into it incrementally, but it is a push API that
calls methods on a callback target, which is too low-level and inconvenient for
most needs.  Sometimes what the user really wants is to be able to parse XML
incrementally, without blocking operations, while enjoying the convenience of
fully constructed :class:`Element` objects.

The most powerful tool for doing this is :class:`XMLPullParser`.  It does not
require a blocking read to obtain the XML data, and is instead fed with data
incrementally with :meth:`XMLPullParser.feed` calls.  To get the parsed XML
elements, call :meth:`XMLPullParser.read_events`.  Here is an example::

   >>> parser = ET.XMLPullParser(['start', 'end'])
   >>> parser.feed('<mytag>sometext')
   >>> list(parser.read_events())
   [('start', <Element 'mytag' at 0x7fa66db2be58>)]
   >>> parser.feed(' more text</mytag>')
   >>> for event, elem in parser.read_events():
   ...     print(event)
   ...     print(elem.tag, 'text=', elem.text)
   ...
   end
   mytag text= sometext more text

The obvious use case is applications that operate in a non-blocking fashion
where the XML data is being received from a socket or read incrementally from
some storage device.  In such cases, blocking reads are unacceptable.
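
As a rough sketch of that use case, the snippet below feeds an
:class:`XMLPullParser` a few string chunks.  The chunks stand in for pieces of a
document arriving over a socket, and each ``item`` element is collected as soon
as its end tag has been parsed.  The ``feed``/``item`` markup and the chunk
boundaries are made up for illustration:

.. code-block:: python

   import xml.etree.ElementTree as ET

   # Hypothetical chunks; a real application would receive these from a
   # socket or another non-blocking source.  Boundaries may fall anywhere,
   # even in the middle of text or inside a tag.
   chunks = ['<feed><item id="1">al', 'pha</item><item id=',
             '"2">beta</item></feed>']

   parser = ET.XMLPullParser(['end'])
   items = []

   def drain():
       # read_events() only yields events for the data fed so far;
       # it never waits for more input.
       for event, elem in parser.read_events():
           if elem.tag == 'item':
               items.append((elem.get('id'), elem.text))

   for chunk in chunks:
       parser.feed(chunk)
       drain()
   parser.close()
   drain()  # pick up any events produced while finishing the document

   print(items)  # [('1', 'alpha'), ('2', 'beta')]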

Because it's so flexible, :class:`XMLPullParser` can be inconvenient to use for
simpler use-cases.  If you don't mind your application blocking on reading XML
data but would still like to have incremental parsing capabilities, take a look
at :func:`iterparse`.  It can be useful when you're reading a large XML document
and don't want to hold it wholly in memory.

Finding interesting elements
^^^^^^^^^^^^^^^^^^^^^^^^^^^^

:class:`Element` has some useful methods that help iterate recursively over all
the sub-tree below it (its children, their children, and so on).  For example,
:meth:`Element.iter`::

   >>> for neighbor in root.iter('neighbor'):
   ...     print(neighbor.attrib)
   ...
   {'name': 'Austria', 'direction': 'E'}
   {'name': 'Switzerland', 'direction': 'W'}
   {'name': 'Malaysia', 'direction': 'N'}
   {'name': 'Costa Rica', 'direction': 'W'}
   {'name': 'Colombia', 'direction': 'E'}

:meth:`Element.findall` finds only elements with a tag which are direct
children of the current element.  :meth:`Element.find` finds the *first* child
with a particular tag, and :attr:`Element.text` accesses the element's text
content.  :meth:`Element.get` accesses the element's attributes::

   >>> for country in root.findall('country'):
   ...     rank = country.find('rank').text
   ...     name = country.get('name')
   ...     print(name, rank)
   ...
   Liechtenstein 1
   Singapore 4
   Panama 68

More sophisticated specification of which elements to look for is possible by
using :ref:`XPath <elementtree-xpath>`.

Modifying an XML File
^^^^^^^^^^^^^^^^^^^^^

:class:`ElementTree` provides a simple way to build XML documents and write them to files.
The :meth:`ElementTree.write` method serves this purpose.
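
For instance, here is a minimal sketch that builds a small tree, wraps it in an
:class:`ElementTree`, and writes it out with an XML declaration.  The
``data``/``country`` names are only illustrative, and an in-memory
:class:`io.BytesIO` buffer stands in for a real file:

.. code-block:: python

   import io
   import xml.etree.ElementTree as ET

   # Build a small tree by hand.
   root = ET.Element('data')
   country = ET.SubElement(root, 'country', name='Liechtenstein')
   ET.SubElement(country, 'rank').text = '1'

   # Wrap the root element so the whole document can be written at once.
   tree = ET.ElementTree(root)

   # write() accepts a file name or a file object; with a byte encoding
   # such as UTF-8 the file object must be binary.
   buffer = io.BytesIO()
   tree.write(buffer, encoding='utf-8', xml_declaration=True)
   print(buffer.getvalue().decode('utf-8'))

Passing a file name such as ``'output.xml'`` instead of the buffer writes the
same bytes to disk.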

Once created, an :class:`Element` object may be manipulated by directly changing
its fields (such as :attr:`Element.text`), adding and modifying attributes
(:meth:`Element.set` method), as well as adding new children (for example
with :meth:`Element.append`).

Let's say we want to add one to each country's rank, and add an ``updated``
attribute to the rank element::

   >>> for rank in root.iter('rank'):
   ...     new_rank = int(rank.text) + 1
   ...     rank.text = str(new_rank)
   ...     rank.set('updated', 'yes')
   ...
   >>> tree.write('output.xml')

Our XML now looks like this:

.. code-block:: xml

   <?xml version="1.0"?>
   <data>
       <country name="Liechtenstein">
           <rank updated="yes">2</rank>
           <year>2008</year>
           <gdppc>141100</gdppc>
           <neighbor name="Austria" direction="E"/>
           <neighbor name="Switzerland" direction="W"/>
       </country>
       <country name="Singapore">
           <rank updated="yes">5</rank>
           <year>2011</year>
           <gdppc>59900</gdppc>
           <neighbor name="Malaysia" direction="N"/>
       </country>
       <country name="Panama">
           <rank updated="yes">69</rank>
           <year>2011</year>
           <gdppc>13600</gdppc>
           <neighbor name="Costa Rica" direction="W"/>
           <neighbor name="Colombia" direction="E"/>
       </country>
   </data>

We can remove elements using :meth:`Element.remove`.  Let's say we want to
remove all countries with a rank higher than 50::

   >>> for country in root.findall('country'):
   ...     rank = int(country.find('rank').text)
   ...     if rank > 50:
   ...         root.remove(country)
   ...
   >>> tree.write('output.xml')

Note that modifying an element while iterating over it can lead to problems,
just as with Python lists.  This example avoids that because :meth:`~Element.findall`
first collects all matching elements into a list, and only then does the loop
remove them from ``root``.

Our XML now looks like this:

.. code-block:: xml

   <?xml version="1.0"?>
   <data>
       <country name="Liechtenstein">
           <rank updated="yes">2</rank>
           <year>2008</year>
           <gdppc>141100</gdppc>
           <neighbor name="Austria" direction="E"/>
           <neighbor name="Switzerland" direction="W"/>
       </country>
       <country name="Singapore">
           <rank updated="yes">5</rank>
           <year>2011</year>
           <gdppc>59900</gdppc>
           <neighbor name="Malaysia" direction="N"/>
       </country>
   </data>

Building XML documents
^^^^^^^^^^^^^^^^^^^^^^

The :func:`SubElement` function also provides a convenient way to create new
sub-elements for a given element::

   >>> a = ET.Element('a')
   >>> b = ET.SubElement(a, 'b')
   >>> c = ET.SubElement(a, 'c')
   >>> d = ET.SubElement(c, 'd')
   >>> ET.dump(a)
   <a><b /><c><d /></c></a>

Parsing XML with Namespaces
^^^^^^^^^^^^^^^^^^^^^^^^^^^

If the XML input has `namespaces
<https://en.wikipedia.org/wiki/XML_namespace>`__, tags and attributes
with prefixes in the form ``prefix:sometag`` get expanded to
``{uri}sometag`` where the *prefix* is replaced by the full *URI*.
Also, if there is a `default namespace
<https://www.w3.org/TR/xml-names/#defaulting>`__,
that full URI gets prepended to all of the non-prefixed tags.

Here is an XML example that incorporates two namespaces, one with the
prefix "fictional" and the other serving as the default namespace:

.. code-block:: xml

   <?xml version="1.0"?>
   <actors xmlns:fictional="http://characters.example.com"
           xmlns="http://people.example.com">
       <actor>
           <name>John Cleese</name>
           <fictional:character>Lancelot</fictional:character>
           <fictional:character>Archie Leach</fictional:character>
       </actor>
       <actor>
           <name>Eric Idle</name>
           <fictional:character>Sir Robin</fictional:character>
           <fictional:character>Gunther</fictional:character>
           <fictional:character>Commander Clement</fictional:character>
       </actor>
   </actors>

One way to search and explore this XML example is to manually add the
URI to every tag or attribute in the xpath of a
:meth:`~Element.find` or :meth:`~Element.findall`::

   root = ET.fromstring(xml_text)
   for actor in root.findall('{http://people.example.com}actor'):
       name = actor.find('{http://people.example.com}name')
       print(name.text)
       for char in actor.findall('{http://characters.example.com}character'):
           print(' |-->', char.text)

A better way to search the namespaced XML example is to create a
dictionary with your own prefixes and use those in the search functions::

   ns = {'real_person': 'http://people.example.com',
         'role': 'http://characters.example.com'}

   for actor in root.findall('real_person:actor', ns):
       name = actor.find('real_person:name', ns)
       print(name.text)
       for char in actor.findall('role:character', ns):
           print(' |-->', char.text)

These two approaches both output::

   John Cleese
    |--> Lancelot
    |--> Archie Leach
   Eric Idle
    |--> Sir Robin
    |--> Gunther
    |--> Commander Clement


Additional resources
^^^^^^^^^^^^^^^^^^^^

See http://effbot.org/zone/element-index.htm for tutorials and links to other
docs.


.. _elementtree-xpath:

XPath support
-------------

This module provides limited support for
`XPath expressions <https://www.w3.org/TR/xpath>`_ for locating elements in a
tree.  The goal is to support a small subset of the abbreviated syntax; a full
XPath engine is outside the scope of the module.

Example
^^^^^^^

Here's an example that demonstrates some of the XPath capabilities of the
module.  We'll be using the ``countrydata`` XML document from the
:ref:`Parsing XML <elementtree-parsing-xml>` section::

   import xml.etree.ElementTree as ET

   root = ET.fromstring(countrydata)

   # Top-level elements
   root.findall(".")

   # All 'neighbor' grand-children of 'country' children of the top-level
   # elements
   root.findall("./country/neighbor")

   # Nodes with name='Singapore' that have a 'year' child
   root.findall(".//year/..[@name='Singapore']")

   # 'year' nodes that are children of nodes with name='Singapore'
   root.findall(".//*[@name='Singapore']/year")

   # All 'neighbor' nodes that are the second 'neighbor' child of their parent
   root.findall(".//neighbor[2]")

Supported XPath syntax
^^^^^^^^^^^^^^^^^^^^^^

.. tabularcolumns:: |l|L|

+-----------------------+------------------------------------------------------+
| Syntax                | Meaning                                              |
+=======================+======================================================+
| ``tag``               | Selects all child elements with the given tag.       |
|                       | For example, ``spam`` selects all child elements     |
|                       | named ``spam``, and ``spam/egg`` selects all         |
|                       | grandchildren named ``egg`` in all children named    |
|                       | ``spam``.                                            |
+-----------------------+------------------------------------------------------+
| ``*``                 | Selects all child elements. For example, ``*/egg``   |
|                       | selects all grandchildren named ``egg``.             |
+-----------------------+------------------------------------------------------+
| ``.``                 | Selects the current node. This is mostly useful      |
|                       | at the beginning of the path, to indicate that it's  |
|                       | a relative path.                                     |
+-----------------------+------------------------------------------------------+
| ``//``                | Selects all subelements, on all levels beneath the   |
|                       | current element. For example, ``.//egg`` selects     |
|                       | all ``egg`` elements in the entire tree.             |
+-----------------------+------------------------------------------------------+
| ``..``                | Selects the parent element. Returns ``None`` if the  |
|                       | path attempts to reach the ancestors of the start    |
|                       | element (the element ``find`` was called on).        |
+-----------------------+------------------------------------------------------+
| ``[@attrib]``         | Selects all elements that have the given attribute.  |
+-----------------------+------------------------------------------------------+
| ``[@attrib='value']`` | Selects all elements for which the given attribute   |
|                       | has the given value. The value cannot contain        |
|                       | quotes.                                              |
+-----------------------+------------------------------------------------------+
| ``[tag]``             | Selects all elements that have a child named         |
|                       | ``tag``. Only immediate children are supported.      |
+-----------------------+------------------------------------------------------+
| ``[.='text']``        | Selects all elements whose complete text content,    |
|                       | including descendants, equals the given ``text``.    |
|                       |                                                      |
|                       | .. versionadded:: 3.7                                |
+-----------------------+------------------------------------------------------+
| ``[tag='text']``      | Selects all elements that have a child named         |
|                       | ``tag`` whose complete text content, including       |
|                       | descendants, equals the given ``text``.              |
+-----------------------+------------------------------------------------------+
| ``[position]``        | Selects all elements that are located at the given   |
|                       | position. The position can be either an integer      |
|                       | (1 is the first position), the expression ``last()`` |
|                       | (for the last position), or a position relative to   |
|                       | the last position (e.g. ``last()-1``).               |
+-----------------------+------------------------------------------------------+

Predicates (expressions within square brackets) must be preceded by a tag
name, an asterisk, or another predicate.  ``position`` predicates must be
preceded by a tag name.

Reference
---------

.. _elementtree-functions:

Functions
^^^^^^^^^


.. function:: Comment(text=None)

   Comment element factory.  This factory function creates a special element
   that will be serialized as an XML comment by the standard serializer.  The
   comment string can be either a bytestring or a Unicode string.  *text* is a
   string containing the comment string.  Returns an element instance
   representing a comment.

   Note that :class:`XMLParser` skips over comments in the input
   instead of creating comment objects for them.  An :class:`ElementTree` will
   only contain comment nodes if they have been inserted into
   the tree using one of the :class:`Element` methods.

.. function:: dump(elem)

   Writes an element tree or element structure to ``sys.stdout``.  This function
   should be used for debugging only.

   The exact output format is implementation dependent.  In this version, it's
   written as an ordinary XML file.

   *elem* is an element tree or an individual element.


.. function:: fromstring(text, parser=None)

   Parses an XML section from a string constant.  Same as :func:`XML`.  *text*
   is a string containing XML data.  *parser* is an optional parser instance.
   If not given, the standard :class:`XMLParser` parser is used.
   Returns an :class:`Element` instance.


.. function:: fromstringlist(sequence, parser=None)

   Parses an XML document from a sequence of string fragments.  *sequence* is a
   list or other sequence containing XML data fragments.  *parser* is an
   optional parser instance.  If not given, the standard :class:`XMLParser`
   parser is used.  Returns an :class:`Element` instance.

   .. versionadded:: 3.2


.. function:: iselement(element)

   Checks if an object appears to be a valid element object.  *element* is an
   element instance.  Returns a true value if this is an element object.


.. function:: iterparse(source, events=None, parser=None)

   Parses an XML section into an element tree incrementally, and reports what's
   going on to the user.  *source* is a filename or :term:`file object`
   containing XML data.  *events* is a sequence of events to report back.  The
   supported events are the strings ``"start"``, ``"end"``, ``"start-ns"`` and
   ``"end-ns"`` (the "ns" events are used to get detailed namespace
   information).  If *events* is omitted, only ``"end"`` events are reported.
   *parser* is an optional parser instance.  If not given, the standard
   :class:`XMLParser` parser is used.  *parser* must be a subclass of
   :class:`XMLParser` and can only use the default :class:`TreeBuilder` as a
   target.  Returns an :term:`iterator` providing ``(event, elem)`` pairs.

   Note that while :func:`iterparse` builds the tree incrementally, it issues
   blocking reads on *source* (or the file it names).  As such, it's unsuitable
   for applications where blocking reads can't be made.  For fully non-blocking
   parsing, see :class:`XMLPullParser`.

   .. note::

      :func:`iterparse` only guarantees that it has seen the ">" character of a
      starting tag when it emits a "start" event, so the attributes are defined,
      but the contents of the text and tail attributes are undefined at that
      point.  The same applies to the element children; they may or may not be
      present.

      If you need a fully populated element, look for "end" events instead.

   .. deprecated:: 3.4
      The *parser* argument.

.. function:: parse(source, parser=None)

   Parses an XML section into an element tree.  *source* is a filename or file
   object containing XML data.  *parser* is an optional parser instance.  If
   not given, the standard :class:`XMLParser` parser is used.  Returns an
   :class:`ElementTree` instance.


.. function:: ProcessingInstruction(target, text=None)

   PI element factory.  This factory function creates a special element that
   will be serialized as an XML processing instruction.  *target* is a string
   containing the PI target.  *text* is a string containing the PI contents, if
   given.  Returns an element instance, representing a processing instruction.

   Note that :class:`XMLParser` skips over processing instructions
   in the input instead of creating processing instruction objects for them.
   An :class:`ElementTree` will only contain processing instruction nodes if
   they have been inserted into the tree using one of the
   :class:`Element` methods.

.. function:: register_namespace(prefix, uri)

   Registers a namespace prefix.  The registry is global, and any existing
   mapping for either the given prefix or the namespace URI will be removed.
   *prefix* is a namespace prefix.  *uri* is a namespace URI.  Tags and
   attributes in this namespace will be serialized with the given prefix, if at
   all possible.

   .. versionadded:: 3.2


.. function:: SubElement(parent, tag, attrib={}, **extra)

   Subelement factory.  This function creates an element instance, and appends
   it to an existing element.

   The element name, attribute names, and attribute values can be either
   bytestrings or Unicode strings.  *parent* is the parent element.  *tag* is
   the subelement name.  *attrib* is an optional dictionary, containing element
   attributes.  *extra* contains additional attributes, given as keyword
   arguments.  Returns an element instance.


.. function:: tostring(element, encoding="us-ascii", method="xml", *, \
                       short_empty_elements=True)

   Generates a string representation of an XML element, including all
   subelements.  *element* is an :class:`Element` instance.  *encoding* [1]_ is
   the output encoding (default is US-ASCII).  Use ``encoding="unicode"`` to
   generate a Unicode string (otherwise, a bytestring is generated).  *method*
   is either ``"xml"``, ``"html"`` or ``"text"`` (default is ``"xml"``).
   *short_empty_elements* has the same meaning as in :meth:`ElementTree.write`.
   Returns an (optionally) encoded string containing the XML data.

   .. versionadded:: 3.4
      The *short_empty_elements* parameter.


.. function:: tostringlist(element, encoding="us-ascii", method="xml", *, \
                           short_empty_elements=True)

   Generates a string representation of an XML element, including all
   subelements.  *element* is an :class:`Element` instance.  *encoding* [1]_ is
   the output encoding (default is US-ASCII).  Use ``encoding="unicode"`` to
   generate a Unicode string (otherwise, a bytestring is generated).  *method*
   is either ``"xml"``, ``"html"`` or ``"text"`` (default is ``"xml"``).
   *short_empty_elements* has the same meaning as in :meth:`ElementTree.write`.
   Returns a list of (optionally) encoded strings containing the XML data.
   It does not guarantee any specific sequence, except that
   ``b"".join(tostringlist(element)) == tostring(element)``.

   .. versionadded:: 3.2

   .. versionadded:: 3.4
      The *short_empty_elements* parameter.


.. function:: XML(text, parser=None)

   Parses an XML section from a string constant.  This function can be used to
   embed "XML literals" in Python code.  *text* is a string containing XML
   data.  *parser* is an optional parser instance.  If not given, the standard
   :class:`XMLParser` parser is used.  Returns an :class:`Element` instance.


.. function:: XMLID(text, parser=None)

   Parses an XML section from a string constant, and also returns a dictionary
   which maps from element IDs to elements.  *text* is a string containing XML
   data.  *parser* is an optional parser instance.  If not given, the standard
   :class:`XMLParser` parser is used.  Returns a tuple containing an
   :class:`Element` instance and a dictionary.


.. _elementtree-element-objects:

Element Objects
^^^^^^^^^^^^^^^

.. class:: Element(tag, attrib={}, **extra)

   Element class.  This class defines the Element interface, and provides a
   reference implementation of this interface.

   The element name, attribute names, and attribute values can be either
   bytestrings or Unicode strings.  *tag* is the element name.  *attrib* is
   an optional dictionary, containing element attributes.  *extra* contains
   additional attributes, given as keyword arguments.


   .. attribute:: tag

      A string identifying what kind of data this element represents (the
      element type, in other words).


   .. attribute:: text
                  tail

      These attributes can be used to hold additional data associated with
      the element.  Their values are usually strings but may be any
      application-specific object.  If the element is created from
      an XML file, the *text* attribute holds either the text between
      the element's start tag and its first child or end tag, or ``None``, and
      the *tail* attribute holds either the text between the element's
      end tag and the next tag, or ``None``.  For the XML data

      .. code-block:: xml

         <a><b>1<c>2<d/>3</c></b>4</a>

      the *a* element has ``None`` for both *text* and *tail* attributes,
      the *b* element has *text* ``"1"`` and *tail* ``"4"``,
      the *c* element has *text* ``"2"`` and *tail* ``None``,
      and the *d* element has *text* ``None`` and *tail* ``"3"``.

      To collect the inner text of an element, see :meth:`itertext`, for
      example ``"".join(element.itertext())``.

      Applications may store arbitrary objects in these attributes.


   .. attribute:: attrib

      A dictionary containing the element's attributes.  Note that while the
      *attrib* value is always a real mutable Python dictionary, an ElementTree
      implementation may choose to use another internal representation, and
      create the dictionary only if someone asks for it.  To take advantage of
      such implementations, use the dictionary methods below whenever possible.

   The following dictionary-like methods work on the element attributes.


   .. method:: clear()

      Resets an element.  This function removes all subelements, clears all
      attributes, and sets the text and tail attributes to ``None``.


   .. method:: get(key, default=None)

      Gets the element attribute named *key*.

      Returns the attribute value, or *default* if the attribute was not found.


   .. method:: items()

      Returns the element attributes as a sequence of (name, value) pairs.  The
      attributes are returned in an arbitrary order.


   .. method:: keys()

      Returns the element's attribute names as a list.  The names are returned
      in an arbitrary order.


   .. method:: set(key, value)

      Set the attribute *key* on the element to *value*.

   The following methods work on the element's children (subelements).


   .. method:: append(subelement)

      Adds the element *subelement* to the end of this element's internal list
      of subelements.  Raises :exc:`TypeError` if *subelement* is not an
      :class:`Element`.


   .. method:: extend(subelements)

      Appends *subelements* from a sequence object with zero or more elements.
      Raises :exc:`TypeError` if a subelement is not an :class:`Element`.

      .. versionadded:: 3.2


   .. method:: find(match, namespaces=None)

      Finds the first subelement matching *match*.  *match* may be a tag name
      or a :ref:`path <elementtree-xpath>`.  Returns an element instance
      or ``None``.  *namespaces* is an optional mapping from namespace prefix
      to full name.


   .. method:: findall(match, namespaces=None)

      Finds all matching subelements, by tag name or
      :ref:`path <elementtree-xpath>`.  Returns a list containing all matching
      elements in document order.  *namespaces* is an optional mapping from
      namespace prefix to full name.


   .. method:: findtext(match, default=None, namespaces=None)

      Finds text for the first subelement matching *match*.  *match* may be
      a tag name or a :ref:`path <elementtree-xpath>`.  Returns the text content
      of the first matching element, or *default* if no element was found.
      Note that if the matching element has no text content an empty string
      is returned.  *namespaces* is an optional mapping from namespace prefix
      to full name.


   .. method:: getchildren()

      .. deprecated:: 3.2
         Use ``list(elem)`` or iteration.


   .. method:: getiterator(tag=None)

      .. deprecated:: 3.2
         Use method :meth:`Element.iter` instead.


   .. method:: insert(index, subelement)

      Inserts *subelement* at the given position in this element.  Raises
      :exc:`TypeError` if *subelement* is not an :class:`Element`.


   .. method:: iter(tag=None)

      Creates a tree :term:`iterator` with the current element as the root.
      The iterator iterates over this element and all elements below it, in
      document (depth first) order.  If *tag* is not ``None`` or ``'*'``, only
      elements whose tag equals *tag* are returned from the iterator.  If the
      tree structure is modified during iteration, the result is undefined.

      .. versionadded:: 3.2


   .. method:: iterfind(match, namespaces=None)

      Finds all matching subelements, by tag name or
      :ref:`path <elementtree-xpath>`.  Returns an iterable yielding all
      matching elements in document order.  *namespaces* is an optional mapping
      from namespace prefix to full name.

      .. versionadded:: 3.2


   .. method:: itertext()

      Creates a text iterator.  The iterator loops over this element and all
      subelements, in document order, and returns all inner text.

      .. versionadded:: 3.2


   .. method:: makeelement(tag, attrib)

      Creates a new element object of the same type as this element.  Do not
      call this method, use the :func:`SubElement` factory function instead.


   .. method:: remove(subelement)

      Removes *subelement* from the element.  Unlike the find\* methods this
      method compares elements based on the instance identity, not on tag value
      or contents.

   :class:`Element` objects also support the following sequence type methods
   for working with subelements: :meth:`~object.__delitem__`,
   :meth:`~object.__getitem__`, :meth:`~object.__setitem__`,
   :meth:`~object.__len__`.

   Caution: Elements with no subelements will test as ``False``.  This behavior
   will change in future versions.  Use an explicit ``len(elem)`` or
   ``elem is None`` test instead. ::

      element = root.find('foo')

      if not element:  # careful!
847 print("element not found, or element has no subelements") 848 849 if element is None: 850 print("element not found") 851 852 853.. _elementtree-elementtree-objects: 854 855ElementTree Objects 856^^^^^^^^^^^^^^^^^^^ 857 858 859.. class:: ElementTree(element=None, file=None) 860 861 ElementTree wrapper class. This class represents an entire element 862 hierarchy, and adds some extra support for serialization to and from 863 standard XML. 864 865 *element* is the root element. The tree is initialized with the contents 866 of the XML *file* if given. 867 868 869 .. method:: _setroot(element) 870 871 Replaces the root element for this tree. This discards the current 872 contents of the tree, and replaces it with the given element. Use with 873 care. *element* is an element instance. 874 875 876 .. method:: find(match, namespaces=None) 877 878 Same as :meth:`Element.find`, starting at the root of the tree. 879 880 881 .. method:: findall(match, namespaces=None) 882 883 Same as :meth:`Element.findall`, starting at the root of the tree. 884 885 886 .. method:: findtext(match, default=None, namespaces=None) 887 888 Same as :meth:`Element.findtext`, starting at the root of the tree. 889 890 891 .. method:: getiterator(tag=None) 892 893 .. deprecated:: 3.2 894 Use method :meth:`ElementTree.iter` instead. 895 896 897 .. method:: getroot() 898 899 Returns the root element for this tree. 900 901 902 .. method:: iter(tag=None) 903 904 Creates and returns a tree iterator for the root element. The iterator 905 loops over all elements in this tree, in section order. *tag* is the tag 906 to look for (default is to return all elements). 907 908 909 .. method:: iterfind(match, namespaces=None) 910 911 Same as :meth:`Element.iterfind`, starting at the root of the tree. 912 913 .. versionadded:: 3.2 914 915 916 .. method:: parse(source, parser=None) 917 918 Loads an external XML section into this element tree. *source* is a file 919 name or :term:`file object`. 
      *parser* is an optional parser instance.
      If not given, the standard :class:`XMLParser` parser is used. Returns the
      root element of the document.


   .. method:: write(file, encoding="us-ascii", xml_declaration=None, \
                     default_namespace=None, method="xml", *, \
                     short_empty_elements=True)

      Writes the element tree to a file, as XML. *file* is a file name, or a
      :term:`file object` opened for writing. *encoding* [1]_ is the output
      encoding (default is US-ASCII).
      *xml_declaration* controls whether an XML declaration should be added to
      the file. Use ``False`` for never, ``True`` for always, ``None``
      for only if not US-ASCII or UTF-8 or Unicode (default is ``None``).
      *default_namespace* sets the default XML namespace (for "xmlns").
      *method* is either ``"xml"``, ``"html"`` or ``"text"`` (default is
      ``"xml"``).
      The keyword-only *short_empty_elements* parameter controls the formatting
      of elements that contain no content. If ``True`` (the default), they are
      emitted as a single self-closed tag; otherwise, they are emitted as a
      pair of start/end tags.

      The output is either a string (:class:`str`) or binary (:class:`bytes`).
      This is controlled by the *encoding* argument. If *encoding* is
      ``"unicode"``, the output is a string; otherwise, it's binary. Note that
      this may conflict with the type of *file* if it's an open
      :term:`file object`; make sure you do not try to write a string to a
      binary stream and vice versa.

      .. versionadded:: 3.4
         The *short_empty_elements* parameter.
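
As a sketch of the *encoding* rules described for :meth:`ElementTree.write`,
the following example writes the same tree once to a text stream and once to
a binary stream. It assumes :class:`io.StringIO` and :class:`io.BytesIO`
in-memory streams in place of real files; the tag names ``root`` and
``empty`` are made up for illustration:

.. code-block:: python

   import io
   import xml.etree.ElementTree as ET

   # Build a small tree in memory; the tag names are illustrative only.
   root = ET.Element("root")
   ET.SubElement(root, "empty")  # an element with no text and no children
   tree = ET.ElementTree(root)

   # encoding="unicode" produces str output, so write to a text stream.
   text_out = io.StringIO()
   tree.write(text_out, encoding="unicode")
   print(text_out.getvalue())  # <root><empty /></root>

   # Any other encoding produces bytes, so write to a binary stream.
   # short_empty_elements=False emits empty elements as start/end tag pairs,
   # and xml_declaration=True forces a declaration even for UTF-8.
   bin_out = io.BytesIO()
   tree.write(bin_out, encoding="utf-8", xml_declaration=True,
              short_empty_elements=False)
   print(bin_out.getvalue())

With the default ``short_empty_elements=True``, the empty element is
serialized as a self-closed ``<empty />`` tag; the second call writes
``<empty></empty>`` instead.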


This is the XML file that is going to be manipulated::

    <html>
        <head>
            <title>Example page</title>
        </head>
        <body>
            <p>Moved to <a href="http://example.org/">example.org</a>
            or <a href="http://example.com/">example.com</a>.</p>
        </body>
    </html>

Example of changing the attribute "target" of every link in the first
paragraph::

    >>> from xml.etree.ElementTree import ElementTree
    >>> tree = ElementTree()
    >>> tree.parse("index.xhtml")
    <Element 'html' at 0xb77e6fac>
    >>> p = tree.find("body/p")     # Finds first occurrence of tag p in body
    >>> p
    <Element 'p' at 0xb77ec26c>
    >>> links = list(p.iter("a"))   # Returns list of all links
    >>> links
    [<Element 'a' at 0xb77ec2ac>, <Element 'a' at 0xb77ec1cc>]
    >>> for i in links:             # Iterates through all found links
    ...     i.attrib["target"] = "blank"
    >>> tree.write("output.xhtml")

.. _elementtree-qname-objects:

QName Objects
^^^^^^^^^^^^^


.. class:: QName(text_or_uri, tag=None)

   QName wrapper. This can be used to wrap a QName attribute value, in order
   to get proper namespace handling on output. *text_or_uri* is a string
   containing the QName value, in the form ``{uri}local``, or, if *tag* is
   given, the URI part of a QName. If *tag* is given, the first argument is
   interpreted as a URI, and *tag* is interpreted as a local name.
   :class:`QName` instances are opaque.



.. _elementtree-treebuilder-objects:

TreeBuilder Objects
^^^^^^^^^^^^^^^^^^^


.. class:: TreeBuilder(element_factory=None)

   Generic element structure builder. This builder converts a sequence of
   start, data, and end method calls to a well-formed element structure. You
   can use this class to build an element structure using a custom XML parser,
   or a parser for some other XML-like format.
   *element_factory*, when given,
   must be a callable accepting two positional arguments: a tag and
   a dict of attributes. It is expected to return a new element instance.

   .. method:: close()

      Flushes the builder buffers, and returns the top-level document
      element. Returns an :class:`Element` instance.


   .. method:: data(data)

      Adds text to the current element. *data* is a string.


   .. method:: end(tag)

      Closes the current element. *tag* is the element name. Returns the
      closed element.


   .. method:: start(tag, attrs)

      Opens a new element. *tag* is the element name. *attrs* is a dictionary
      containing element attributes. Returns the opened element.


   In addition, a custom :class:`TreeBuilder` object can provide the
   following method:

   .. method:: doctype(name, pubid, system)

      Handles a doctype declaration. *name* is the doctype name. *pubid* is
      the public identifier. *system* is the system identifier. This method
      does not exist on the default :class:`TreeBuilder` class.

      .. versionadded:: 3.2


.. _elementtree-xmlparser-objects:

XMLParser Objects
^^^^^^^^^^^^^^^^^


.. class:: XMLParser(html=0, target=None, encoding=None)

   This class is the low-level building block of the module. It uses
   :mod:`xml.parsers.expat` for efficient, event-based parsing of XML. It can
   be fed XML data incrementally with the :meth:`feed` method, and parsing
   events are translated to a push API by invoking callbacks on the *target*
   object. If *target* is omitted, the standard :class:`TreeBuilder` is used.
   The *html* argument was historically used for backwards compatibility and
   is now deprecated. If *encoding* [1]_ is given, the value overrides the
   encoding specified in the XML file.

   .. deprecated:: 3.4
      The *html* argument. The remaining arguments should be passed via
      keyword to prepare for the removal of the *html* argument.

   .. method:: close()

      Finishes feeding data to the parser. Returns the result of calling the
      ``close()`` method of the *target* passed during construction; by
      default, this is the top-level document element.


   .. method:: doctype(name, pubid, system)

      .. deprecated:: 3.2
         Define the :meth:`TreeBuilder.doctype` method on a custom TreeBuilder
         target.


   .. method:: feed(data)

      Feeds data to the parser. *data* is encoded data.

   :meth:`XMLParser.feed` calls *target*\'s ``start(tag, attrs_dict)`` method
   for each opening tag, its ``end(tag)`` method for each closing tag, and its
   ``data(data)`` method for character data. :meth:`XMLParser.close` calls
   *target*\'s ``close()`` method. :class:`XMLParser` is not limited to
   building a tree structure. This is an example of counting the maximum
   depth of an XML document::

      >>> from xml.etree.ElementTree import XMLParser
      >>> class MaxDepth:                     # The target object of the parser
      ...     maxDepth = 0
      ...     depth = 0
      ...     def start(self, tag, attrib):   # Called for each opening tag.
      ...         self.depth += 1
      ...         if self.depth > self.maxDepth:
      ...             self.maxDepth = self.depth
      ...     def end(self, tag):             # Called for each closing tag.
      ...         self.depth -= 1
      ...     def data(self, data):
      ...         pass            # We do not need to do anything with data.
      ...     def close(self):    # Called when all data has been parsed.
      ...         return self.maxDepth
      ...
      >>> target = MaxDepth()
      >>> parser = XMLParser(target=target)
      >>> exampleXml = """
      ... <a>
      ...   <b>
      ...   </b>
      ...   <b>
      ...     <c>
      ...       <d>
      ...       </d>
      ...     </c>
      ...   </b>
      ... </a>"""
      >>> parser.feed(exampleXml)
      >>> parser.close()
      4


.. _elementtree-xmlpullparser-objects:

XMLPullParser Objects
^^^^^^^^^^^^^^^^^^^^^

.. class:: XMLPullParser(events=None)

   A pull parser suitable for non-blocking applications. Its input-side API is
   similar to that of :class:`XMLParser`, but instead of pushing calls to a
   callback target, :class:`XMLPullParser` collects an internal list of parsing
   events and lets the user read from it. *events* is a sequence of events to
   report back. The supported events are the strings ``"start"``, ``"end"``,
   ``"start-ns"`` and ``"end-ns"`` (the "ns" events are used to get detailed
   namespace information). If *events* is omitted, only ``"end"`` events are
   reported.

   .. method:: feed(data)

      Feed the given bytes data to the parser.

   .. method:: close()

      Signal the parser that the data stream is terminated. Unlike
      :meth:`XMLParser.close`, this method always returns :const:`None`.
      Any events not yet retrieved when the parser is closed can still be
      read with :meth:`read_events`.

   .. method:: read_events()

      Return an iterator over the events which have been encountered in the
      data fed to the parser. The iterator yields ``(event, elem)`` pairs,
      where *event* is a string representing the type of event (e.g.
      ``"end"``) and *elem* is the encountered :class:`Element` object.

      Events provided in a previous call to :meth:`read_events` will not be
      yielded again. Events are consumed from the internal queue only when
      they are retrieved from the iterator, so multiple readers iterating in
      parallel over iterators obtained from :meth:`read_events` will have
      unpredictable results.

   .. note::

      :class:`XMLPullParser` only guarantees that it has seen the ``">"``
      character of a starting tag when it emits a ``"start"`` event, so the
      attributes are defined, but the contents of the text and tail attributes
      are undefined at that point. The same applies to the element children;
      they may or may not be present.

      If you need a fully populated element, look for ``"end"`` events
      instead.

   .. versionadded:: 3.4

Exceptions
^^^^^^^^^^

.. class:: ParseError

   XML parse error, raised by the various parsing methods in this module when
   parsing fails. The string representation of an instance of this exception
   will contain a user-friendly error message. In addition, it will have
   the following attributes available:

   .. attribute:: code

      A numeric error code from the expat parser. See the documentation of
      :mod:`xml.parsers.expat` for the list of error codes and their meanings.

   .. attribute:: position

      A tuple of *line*, *column* numbers, specifying where the error occurred.

.. rubric:: Footnotes

.. [1] The encoding string included in XML output should conform to the
   appropriate standards. For example, "UTF-8" is valid, but "UTF8" is
   not. See https://www.w3.org/TR/2006/REC-xml11-20060816/#NT-EncodingDecl
   and https://www.iana.org/assignments/character-sets/character-sets.xhtml.