:mod:`!xml.etree.ElementTree` --- The ElementTree XML API
=========================================================

.. module:: xml.etree.ElementTree
   :synopsis: Implementation of the ElementTree API.

.. moduleauthor:: Fredrik Lundh <fredrik@pythonware.com>

**Source code:** :source:`Lib/xml/etree/ElementTree.py`

--------------

The :mod:`xml.etree.ElementTree` module implements a simple and efficient API
for parsing and creating XML data.

.. versionchanged:: 3.3
   This module will use a fast implementation whenever available.

.. deprecated:: 3.3
   The :mod:`!xml.etree.cElementTree` module is deprecated.


.. warning::

   The :mod:`xml.etree.ElementTree` module is not secure against
   maliciously constructed data. If you need to parse untrusted or
   unauthenticated data see :ref:`xml-vulnerabilities`.

Tutorial
--------

This is a short tutorial for using :mod:`xml.etree.ElementTree` (``ET`` in
short). The goal is to demonstrate some of the building blocks and basic
concepts of the module.

XML tree and elements
^^^^^^^^^^^^^^^^^^^^^

XML is an inherently hierarchical data format, and the most natural way to
represent it is with a tree. ``ET`` has two classes for this purpose -
:class:`ElementTree` represents the whole XML document as a tree, and
:class:`Element` represents a single node in this tree. Interactions with
the whole document (reading and writing to/from files) are usually done
on the :class:`ElementTree` level. Interactions with a single XML element
and its sub-elements are done on the :class:`Element` level.

.. _elementtree-parsing-xml:

Parsing XML
^^^^^^^^^^^

We'll be using the fictitious :file:`country_data.xml` XML document as the
sample data for this section:

.. code-block:: xml

   <?xml version="1.0"?>
   <data>
       <country name="Liechtenstein">
           <rank>1</rank>
           <year>2008</year>
           <gdppc>141100</gdppc>
           <neighbor name="Austria" direction="E"/>
           <neighbor name="Switzerland" direction="W"/>
       </country>
       <country name="Singapore">
           <rank>4</rank>
           <year>2011</year>
           <gdppc>59900</gdppc>
           <neighbor name="Malaysia" direction="N"/>
       </country>
       <country name="Panama">
           <rank>68</rank>
           <year>2011</year>
           <gdppc>13600</gdppc>
           <neighbor name="Costa Rica" direction="W"/>
           <neighbor name="Colombia" direction="E"/>
       </country>
   </data>

We can import this data by reading from a file::

   import xml.etree.ElementTree as ET
   tree = ET.parse('country_data.xml')
   root = tree.getroot()

Or directly from a string::

   root = ET.fromstring(country_data_as_string)

:func:`fromstring` parses XML from a string directly into an :class:`Element`,
which is the root element of the parsed tree. Other parsing functions may
create an :class:`ElementTree`. Check the documentation to be sure.

As an :class:`Element`, ``root`` has a tag and a dictionary of attributes::

   >>> root.tag
   'data'
   >>> root.attrib
   {}

It also has child nodes over which we can iterate::

   >>> for child in root:
   ...     print(child.tag, child.attrib)
   ...
   country {'name': 'Liechtenstein'}
   country {'name': 'Singapore'}
   country {'name': 'Panama'}

Children are nested, and we can access specific child nodes by index::

   >>> root[0][1].text
   '2008'


.. note::

   Not all elements of the XML input will end up as elements of the
   parsed tree. Currently, this module skips over any XML comments,
   processing instructions, and document type declarations in the
   input.
   Nevertheless, trees built using this module's API rather
   than parsing from XML text can have comments and processing
   instructions in them; they will be included when generating XML
   output. A document type declaration may be accessed by passing a
   custom :class:`TreeBuilder` instance to the :class:`XMLParser`
   constructor.


.. _elementtree-pull-parsing:

Pull API for non-blocking parsing
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Most parsing functions provided by this module require the whole document
to be read at once before returning any result. It is possible to use an
:class:`XMLParser` and feed data into it incrementally, but it is a push API that
calls methods on a callback target, which is too low-level and inconvenient for
most needs. Sometimes what the user really wants is to be able to parse XML
incrementally, without blocking operations, while enjoying the convenience of
fully constructed :class:`Element` objects.

The most powerful tool for doing this is :class:`XMLPullParser`. It does not
require a blocking read to obtain the XML data, and is instead fed with data
incrementally with :meth:`XMLPullParser.feed` calls. To get the parsed XML
elements, call :meth:`XMLPullParser.read_events`. Here is an example::

   >>> parser = ET.XMLPullParser(['start', 'end'])
   >>> parser.feed('<mytag>sometext')
   >>> list(parser.read_events())
   [('start', <Element 'mytag' at 0x7fa66db2be58>)]
   >>> parser.feed(' more text</mytag>')
   >>> for event, elem in parser.read_events():
   ...     print(event)
   ...     print(elem.tag, 'text=', elem.text)
   ...
   end
   mytag text= sometext more text

The obvious use case is applications that operate in a non-blocking fashion
where the XML data is being received from a socket or read incrementally from
some storage device. In such cases, blocking reads are unacceptable.

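
The socket scenario can be imitated without any network code. The following is
only a minimal sketch: ``parse_chunks`` and the sample ``chunks`` list are
hypothetical names made up for this illustration, and the list merely stands in
for data that arrives piece by piece. Depending on the Python and Expat versions
in use, an event may be reported somewhat later than the chunk that completes
its element, so the sketch also drains the parser after calling
:meth:`XMLPullParser.close`:

.. code-block:: python

   import xml.etree.ElementTree as ET

   def parse_chunks(chunks):
       """Feed text chunks to an XMLPullParser, yielding (tag, text) pairs."""
       parser = ET.XMLPullParser(events=("end",))
       for chunk in chunks:
           # feed() never blocks; it only hands the data to the parser.
           parser.feed(chunk)
           # Report whatever elements have been completed so far.
           for event, elem in parser.read_events():
               yield elem.tag, elem.text
       # Signal the end of the stream, then collect any remaining events.
       parser.close()
       for event, elem in parser.read_events():
           yield elem.tag, elem.text

   # Hypothetical data, split at arbitrary points as a socket might deliver it.
   chunks = ["<log><entry>fir", "st</entry><ent", "ry>second</entry></log>"]
   results = list(parse_chunks(chunks))

Because the function is a generator, a caller can process each completed element
as soon as it is reported instead of waiting for the whole document.
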

Because it's so flexible, :class:`XMLPullParser` can be inconvenient to use for
simpler use-cases. If you don't mind your application blocking on reading XML
data but would still like to have incremental parsing capabilities, take a look
at :func:`iterparse`. It can be useful when you're reading a large XML document
and don't want to hold it wholly in memory.

Where *immediate* feedback through events is wanted, calling method
:meth:`XMLPullParser.flush` can help reduce delay;
please make sure to study the related security notes.


Finding interesting elements
^^^^^^^^^^^^^^^^^^^^^^^^^^^^

:class:`Element` has some useful methods that help iterate recursively over all
the sub-tree below it (its children, their children, and so on). For example,
:meth:`Element.iter`::

   >>> for neighbor in root.iter('neighbor'):
   ...     print(neighbor.attrib)
   ...
   {'name': 'Austria', 'direction': 'E'}
   {'name': 'Switzerland', 'direction': 'W'}
   {'name': 'Malaysia', 'direction': 'N'}
   {'name': 'Costa Rica', 'direction': 'W'}
   {'name': 'Colombia', 'direction': 'E'}

:meth:`Element.findall` finds only elements with a tag which are direct
children of the current element. :meth:`Element.find` finds the *first* child
with a particular tag, and :attr:`Element.text` accesses the element's text
content. :meth:`Element.get` accesses the element's attributes::

   >>> for country in root.findall('country'):
   ...     rank = country.find('rank').text
   ...     name = country.get('name')
   ...     print(name, rank)
   ...
   Liechtenstein 1
   Singapore 4
   Panama 68

More sophisticated specification of which elements to look for is possible by
using :ref:`XPath <elementtree-xpath>`.

Modifying an XML File
^^^^^^^^^^^^^^^^^^^^^

:class:`ElementTree` provides a simple way to build XML documents and write them to files.
The :meth:`ElementTree.write` method serves this purpose.

Once created, an :class:`Element` object may be manipulated by directly changing
its fields (such as :attr:`Element.text`), adding and modifying attributes
(:meth:`Element.set` method), as well as adding new children (for example
with :meth:`Element.append`).

Let's say we want to add one to each country's rank, and add an ``updated``
attribute to the rank element::

   >>> for rank in root.iter('rank'):
   ...     new_rank = int(rank.text) + 1
   ...     rank.text = str(new_rank)
   ...     rank.set('updated', 'yes')
   ...
   >>> tree.write('output.xml')

Our XML now looks like this:

.. code-block:: xml

   <?xml version="1.0"?>
   <data>
       <country name="Liechtenstein">
           <rank updated="yes">2</rank>
           <year>2008</year>
           <gdppc>141100</gdppc>
           <neighbor name="Austria" direction="E"/>
           <neighbor name="Switzerland" direction="W"/>
       </country>
       <country name="Singapore">
           <rank updated="yes">5</rank>
           <year>2011</year>
           <gdppc>59900</gdppc>
           <neighbor name="Malaysia" direction="N"/>
       </country>
       <country name="Panama">
           <rank updated="yes">69</rank>
           <year>2011</year>
           <gdppc>13600</gdppc>
           <neighbor name="Costa Rica" direction="W"/>
           <neighbor name="Colombia" direction="E"/>
       </country>
   </data>

We can remove elements using :meth:`Element.remove`. Let's say we want to
remove all countries with a rank higher than 50::

   >>> for country in root.findall('country'):
   ...     # using root.findall() to avoid removal during traversal
   ...     rank = int(country.find('rank').text)
   ...     if rank > 50:
   ...         root.remove(country)
   ...
   >>> tree.write('output.xml')

Note that concurrent modification while iterating can lead to problems,
just like when iterating and modifying Python lists or dicts.
Therefore, the example first collects all matching elements with
``root.findall()``, and only then iterates over the list of matches.

Our XML now looks like this:

.. code-block:: xml

   <?xml version="1.0"?>
   <data>
       <country name="Liechtenstein">
           <rank updated="yes">2</rank>
           <year>2008</year>
           <gdppc>141100</gdppc>
           <neighbor name="Austria" direction="E"/>
           <neighbor name="Switzerland" direction="W"/>
       </country>
       <country name="Singapore">
           <rank updated="yes">5</rank>
           <year>2011</year>
           <gdppc>59900</gdppc>
           <neighbor name="Malaysia" direction="N"/>
       </country>
   </data>

Building XML documents
^^^^^^^^^^^^^^^^^^^^^^

The :func:`SubElement` function also provides a convenient way to create new
sub-elements for a given element::

   >>> a = ET.Element('a')
   >>> b = ET.SubElement(a, 'b')
   >>> c = ET.SubElement(a, 'c')
   >>> d = ET.SubElement(c, 'd')
   >>> ET.dump(a)
   <a><b /><c><d /></c></a>

Parsing XML with Namespaces
^^^^^^^^^^^^^^^^^^^^^^^^^^^

If the XML input has `namespaces
<https://en.wikipedia.org/wiki/XML_namespace>`__, tags and attributes
with prefixes in the form ``prefix:sometag`` get expanded to
``{uri}sometag`` where the *prefix* is replaced by the full *URI*.
Also, if there is a `default namespace
<https://www.w3.org/TR/xml-names/#defaulting>`__,
that full URI gets prepended to all of the non-prefixed tags.

Here is an XML example that incorporates two namespaces, one with the
prefix "fictional" and the other serving as the default namespace:

.. code-block:: xml

   <?xml version="1.0"?>
   <actors xmlns:fictional="http://characters.example.com"
           xmlns="http://people.example.com">
       <actor>
           <name>John Cleese</name>
           <fictional:character>Lancelot</fictional:character>
           <fictional:character>Archie Leach</fictional:character>
       </actor>
       <actor>
           <name>Eric Idle</name>
           <fictional:character>Sir Robin</fictional:character>
           <fictional:character>Gunther</fictional:character>
           <fictional:character>Commander Clement</fictional:character>
       </actor>
   </actors>

One way to search and explore this XML example is to manually add the
URI to every tag or attribute in the xpath of a
:meth:`~Element.find` or :meth:`~Element.findall`::

   root = ET.fromstring(xml_text)
   for actor in root.findall('{http://people.example.com}actor'):
       name = actor.find('{http://people.example.com}name')
       print(name.text)
       for char in actor.findall('{http://characters.example.com}character'):
           print(' |-->', char.text)

A better way to search the namespaced XML example is to create a
dictionary with your own prefixes and use those in the search functions::

   ns = {'real_person': 'http://people.example.com',
         'role': 'http://characters.example.com'}

   for actor in root.findall('real_person:actor', ns):
       name = actor.find('real_person:name', ns)
       print(name.text)
       for char in actor.findall('role:character', ns):
           print(' |-->', char.text)

These two approaches both output::

   John Cleese
    |--> Lancelot
    |--> Archie Leach
   Eric Idle
    |--> Sir Robin
    |--> Gunther
    |--> Commander Clement


.. _elementtree-xpath:

XPath support
-------------

This module provides limited support for
`XPath expressions <https://www.w3.org/TR/xpath>`_ for locating elements in a
tree.
The goal is to support a small subset of the abbreviated syntax; a full
XPath engine is outside the scope of the module.

Example
^^^^^^^

Here's an example that demonstrates some of the XPath capabilities of the
module. We'll be using the ``countrydata`` XML document from the
:ref:`Parsing XML <elementtree-parsing-xml>` section::

   import xml.etree.ElementTree as ET

   root = ET.fromstring(countrydata)

   # Top-level elements
   root.findall(".")

   # All 'neighbor' grand-children of 'country' children of the top-level
   # elements
   root.findall("./country/neighbor")

   # Nodes with name='Singapore' that have a 'year' child
   root.findall(".//year/..[@name='Singapore']")

   # 'year' nodes that are children of nodes with name='Singapore'
   root.findall(".//*[@name='Singapore']/year")

   # All 'neighbor' nodes that are the second child of their parent
   root.findall(".//neighbor[2]")

For XML with namespaces, use the usual qualified ``{namespace}tag`` notation::

   # All dublin-core "title" tags in the document
   root.findall(".//{http://purl.org/dc/elements/1.1/}title")


Supported XPath syntax
^^^^^^^^^^^^^^^^^^^^^^

.. tabularcolumns:: |l|L|

+-----------------------+------------------------------------------------------+
| Syntax                | Meaning                                              |
+=======================+======================================================+
| ``tag``               | Selects all child elements with the given tag.       |
|                       | For example, ``spam`` selects all child elements     |
|                       | named ``spam``, and ``spam/egg`` selects all         |
|                       | grandchildren named ``egg`` in all children named    |
|                       | ``spam``. ``{namespace}*`` selects all tags in the   |
|                       | given namespace, ``{*}spam`` selects tags named      |
|                       | ``spam`` in any (or no) namespace, and ``{}*``       |
|                       | only selects tags that are not in a namespace.       |
|                       |                                                      |
|                       | .. versionchanged:: 3.8                              |
|                       |    Support for star-wildcards was added.             |
+-----------------------+------------------------------------------------------+
| ``*``                 | Selects all child elements, including comments and   |
|                       | processing instructions. For example, ``*/egg``      |
|                       | selects all grandchildren named ``egg``.             |
+-----------------------+------------------------------------------------------+
| ``.``                 | Selects the current node. This is mostly useful      |
|                       | at the beginning of the path, to indicate that it's  |
|                       | a relative path.                                     |
+-----------------------+------------------------------------------------------+
| ``//``                | Selects all subelements, on all levels beneath the   |
|                       | current element. For example, ``.//egg`` selects     |
|                       | all ``egg`` elements in the entire tree.             |
+-----------------------+------------------------------------------------------+
| ``..``                | Selects the parent element. Returns ``None`` if the  |
|                       | path attempts to reach the ancestors of the start    |
|                       | element (the element ``find`` was called on).        |
+-----------------------+------------------------------------------------------+
| ``[@attrib]``         | Selects all elements that have the given attribute.  |
+-----------------------+------------------------------------------------------+
| ``[@attrib='value']`` | Selects all elements for which the given attribute   |
|                       | has the given value. The value cannot contain        |
|                       | quotes.                                              |
+-----------------------+------------------------------------------------------+
| ``[@attrib!='value']``| Selects all elements for which the given attribute   |
|                       | does not have the given value. The value cannot      |
|                       | contain quotes.                                      |
|                       |                                                      |
|                       | .. versionadded:: 3.10                               |
+-----------------------+------------------------------------------------------+
| ``[tag]``             | Selects all elements that have a child named         |
|                       | ``tag``. Only immediate children are supported.      |
+-----------------------+------------------------------------------------------+
| ``[.='text']``        | Selects all elements whose complete text content,    |
|                       | including descendants, equals the given ``text``.    |
|                       |                                                      |
|                       | .. versionadded:: 3.7                                |
+-----------------------+------------------------------------------------------+
| ``[.!='text']``       | Selects all elements whose complete text content,    |
|                       | including descendants, does not equal the given      |
|                       | ``text``.                                            |
|                       |                                                      |
|                       | .. versionadded:: 3.10                               |
+-----------------------+------------------------------------------------------+
| ``[tag='text']``      | Selects all elements that have a child named         |
|                       | ``tag`` whose complete text content, including       |
|                       | descendants, equals the given ``text``.              |
+-----------------------+------------------------------------------------------+
| ``[tag!='text']``     | Selects all elements that have a child named         |
|                       | ``tag`` whose complete text content, including       |
|                       | descendants, does not equal the given ``text``.      |
|                       |                                                      |
|                       | .. versionadded:: 3.10                               |
+-----------------------+------------------------------------------------------+
| ``[position]``        | Selects all elements that are located at the given   |
|                       | position. The position can be either an integer      |
|                       | (1 is the first position), the expression ``last()`` |
|                       | (for the last position), or a position relative to   |
|                       | the last position (e.g. ``last()-1``).               |
+-----------------------+------------------------------------------------------+

Predicates (expressions within square brackets) must be preceded by a tag
name, an asterisk, or another predicate. ``position`` predicates must be
preceded by a tag name.

Reference
---------

.. _elementtree-functions:

Functions
^^^^^^^^^

.. function:: canonicalize(xml_data=None, *, out=None, from_file=None, **options)

   `C14N 2.0 <https://www.w3.org/TR/xml-c14n2/>`_ transformation function.

   Canonicalization is a way to normalise XML output in a way that allows
   byte-by-byte comparisons and digital signatures. It reduces the freedom
   that XML serializers have and instead generates a more constrained XML
   representation. The main restrictions regard the placement of namespace
   declarations, the ordering of attributes, and ignorable whitespace.

   This function takes an XML data string (*xml_data*) or a file path or
   file-like object (*from_file*) as input, converts it to the canonical
   form, and writes it out using the *out* file(-like) object, if provided,
   or returns it as a text string if not. The output file receives text,
   not bytes. It should therefore be opened in text mode with ``utf-8``
   encoding.

   Typical uses::

      xml_data = "<root>...</root>"
      print(canonicalize(xml_data))

      with open("c14n_output.xml", mode='w', encoding='utf-8') as out_file:
          canonicalize(xml_data, out=out_file)

      with open("c14n_output.xml", mode='w', encoding='utf-8') as out_file:
          canonicalize(from_file="inputfile.xml", out=out_file)

   The configuration *options* are as follows:

   - *with_comments*: set to true to include comments (default: false)
   - *strip_text*: set to true to strip whitespace before and after text content
     (default: false)
   - *rewrite_prefixes*: set to true to replace namespace prefixes by "n{number}"
     (default: false)
   - *qname_aware_tags*: a set of qname aware tag names in which prefixes
     should be replaced in text content (default: empty)
   - *qname_aware_attrs*: a set of qname aware attribute names in which prefixes
     should be replaced in text content (default: empty)
   - *exclude_attrs*: a set of attribute names that should not be serialised
   - *exclude_tags*: a set of tag names
     that should not be serialised

   In the option list above, "a set" refers to any collection or iterable of
   strings; no ordering is expected.

   .. versionadded:: 3.8


.. function:: Comment(text=None)

   Comment element factory. This factory function creates a special element
   that will be serialized as an XML comment by the standard serializer. The
   comment string can be either a bytestring or a Unicode string. *text* is a
   string containing the comment string. Returns an element instance
   representing a comment.

   Note that :class:`XMLParser` skips over comments in the input
   instead of creating comment objects for them. An :class:`ElementTree` will
   only contain comment nodes if they have been inserted into
   the tree using one of the :class:`Element` methods.

.. function:: dump(elem)

   Writes an element tree or element structure to ``sys.stdout``. This function
   should be used for debugging only.

   The exact output format is implementation dependent. In this version, it's
   written as an ordinary XML file.

   *elem* is an element tree or an individual element.

   .. versionchanged:: 3.8
      The :func:`dump` function now preserves the attribute order specified
      by the user.


.. function:: fromstring(text, parser=None)

   Parses an XML section from a string constant. Same as :func:`XML`. *text*
   is a string containing XML data. *parser* is an optional parser instance.
   If not given, the standard :class:`XMLParser` parser is used.
   Returns an :class:`Element` instance.


.. function:: fromstringlist(sequence, parser=None)

   Parses an XML document from a sequence of string fragments. *sequence* is a
   list or other sequence containing XML data fragments. *parser* is an
   optional parser instance. If not given, the standard :class:`XMLParser`
   parser is used. Returns an :class:`Element` instance.

   .. versionadded:: 3.2


.. function:: indent(tree, space="  ", level=0)

   Appends whitespace to the subtree to indent the tree visually.
   This can be used to generate pretty-printed XML output.
   *tree* can be an Element or ElementTree. *space* is the whitespace
   string that will be inserted for each indentation level, two space
   characters by default. For indenting partial subtrees inside of an
   already indented tree, pass the initial indentation level as *level*.

   .. versionadded:: 3.9


.. function:: iselement(element)

   Check if an object appears to be a valid element object. *element* is an
   element instance. Return ``True`` if this is an element object.


.. function:: iterparse(source, events=None, parser=None)

   Parses an XML section into an element tree incrementally, and reports what's
   going on to the user. *source* is a filename or :term:`file object`
   containing XML data. *events* is a sequence of events to report back. The
   supported events are the strings ``"start"``, ``"end"``, ``"comment"``,
   ``"pi"``, ``"start-ns"`` and ``"end-ns"``
   (the "ns" events are used to get detailed namespace
   information). If *events* is omitted, only ``"end"`` events are reported.
   *parser* is an optional parser instance. If not given, the standard
   :class:`XMLParser` parser is used. *parser* must be a subclass of
   :class:`XMLParser` and can only use the default :class:`TreeBuilder` as a
   target. Returns an :term:`iterator` providing ``(event, elem)`` pairs;
   it has a ``root`` attribute that references the root element of the
   resulting XML tree once *source* is fully read.
   The iterator has the :meth:`!close` method that closes the internal
   file object if *source* is a filename.

   Note that while :func:`iterparse` builds the tree incrementally, it issues
   blocking reads on *source* (or the file it names).
   As such, it's unsuitable
   for applications where blocking reads can't be made. For fully non-blocking
   parsing, see :class:`XMLPullParser`.

   .. note::

      :func:`iterparse` only guarantees that it has seen the ">" character of a
      starting tag when it emits a "start" event, so the attributes are defined,
      but the contents of the text and tail attributes are undefined at that
      point. The same applies to the element children; they may or may not be
      present.

      If you need a fully populated element, look for "end" events instead.

   .. deprecated:: 3.4
      The *parser* argument.

   .. versionchanged:: 3.8
      The ``comment`` and ``pi`` events were added.

   .. versionchanged:: 3.13
      Added the :meth:`!close` method.


.. function:: parse(source, parser=None)

   Parses an XML section into an element tree. *source* is a filename or file
   object containing XML data. *parser* is an optional parser instance. If
   not given, the standard :class:`XMLParser` parser is used. Returns an
   :class:`ElementTree` instance.


.. function:: ProcessingInstruction(target, text=None)

   PI element factory. This factory function creates a special element that
   will be serialized as an XML processing instruction. *target* is a string
   containing the PI target. *text* is a string containing the PI contents, if
   given. Returns an element instance, representing a processing instruction.

   Note that :class:`XMLParser` skips over processing instructions
   in the input instead of creating PI objects for them. An
   :class:`ElementTree` will only contain processing instruction nodes if
   they have been inserted into the tree using one of the
   :class:`Element` methods.

.. function:: register_namespace(prefix, uri)

   Registers a namespace prefix. The registry is global, and any existing
   mapping for either the given prefix or the namespace URI will be removed.
   *prefix* is a namespace prefix. *uri* is a namespace uri. Tags and
   attributes in this namespace will be serialized with the given prefix, if at
   all possible.

   .. versionadded:: 3.2


.. function:: SubElement(parent, tag, attrib={}, **extra)

   Subelement factory. This function creates an element instance, and appends
   it to an existing element.

   The element name, attribute names, and attribute values can be either
   bytestrings or Unicode strings. *parent* is the parent element. *tag* is
   the subelement name. *attrib* is an optional dictionary, containing element
   attributes. *extra* contains additional attributes, given as keyword
   arguments. Returns an element instance.


.. function:: tostring(element, encoding="us-ascii", method="xml", *, \
                       xml_declaration=None, default_namespace=None, \
                       short_empty_elements=True)

   Generates a string representation of an XML element, including all
   subelements. *element* is an :class:`Element` instance. *encoding* [1]_ is
   the output encoding (default is US-ASCII). Use ``encoding="unicode"`` to
   generate a Unicode string (otherwise, a bytestring is generated). *method*
   is either ``"xml"``, ``"html"`` or ``"text"`` (default is ``"xml"``).
   *xml_declaration*, *default_namespace* and *short_empty_elements* have the same
   meaning as in :meth:`ElementTree.write`. Returns an (optionally) encoded string
   containing the XML data.

   .. versionchanged:: 3.4
      Added the *short_empty_elements* parameter.

   .. versionchanged:: 3.8
      Added the *xml_declaration* and *default_namespace* parameters.

   .. versionchanged:: 3.8
      The :func:`tostring` function now preserves the attribute order
      specified by the user.


.. function:: tostringlist(element, encoding="us-ascii", method="xml", *, \
                           xml_declaration=None, default_namespace=None, \
                           short_empty_elements=True)

   Generates a string representation of an XML element, including all
   subelements. *element* is an :class:`Element` instance. *encoding* [1]_ is
   the output encoding (default is US-ASCII). Use ``encoding="unicode"`` to
   generate a Unicode string (otherwise, a bytestring is generated). *method*
   is either ``"xml"``, ``"html"`` or ``"text"`` (default is ``"xml"``).
   *xml_declaration*, *default_namespace* and *short_empty_elements* have the same
   meaning as in :meth:`ElementTree.write`. Returns a list of (optionally) encoded
   strings containing the XML data. It does not guarantee any specific sequence,
   except that ``b"".join(tostringlist(element)) == tostring(element)``.

   .. versionadded:: 3.2

   .. versionchanged:: 3.4
      Added the *short_empty_elements* parameter.

   .. versionchanged:: 3.8
      Added the *xml_declaration* and *default_namespace* parameters.

   .. versionchanged:: 3.8
      The :func:`tostringlist` function now preserves the attribute order
      specified by the user.


.. function:: XML(text, parser=None)

   Parses an XML section from a string constant. This function can be used to
   embed "XML literals" in Python code. *text* is a string containing XML
   data. *parser* is an optional parser instance. If not given, the standard
   :class:`XMLParser` parser is used. Returns an :class:`Element` instance.


.. function:: XMLID(text, parser=None)

   Parses an XML section from a string constant, and also returns a dictionary
   which maps from element IDs to elements. *text* is a string containing XML
   data. *parser* is an optional parser instance. If not given, the standard
   :class:`XMLParser` parser is used. Returns a tuple containing an
   :class:`Element` instance and a dictionary.


.. _elementtree-xinclude:

XInclude support
----------------

This module provides limited support for
`XInclude directives <https://www.w3.org/TR/xinclude/>`_, via the
:mod:`xml.etree.ElementInclude` helper module. This module can be used to
insert subtrees and text strings into element trees, based on information in
the tree.

Example
^^^^^^^

Here's an example that demonstrates use of the XInclude module. To include an
XML document in the current document, use the
``{http://www.w3.org/2001/XInclude}include`` element and set the **parse**
attribute to ``"xml"``, and use the **href** attribute to specify the document
to include.

.. code-block:: xml

   <?xml version="1.0"?>
   <document xmlns:xi="http://www.w3.org/2001/XInclude">
     <xi:include href="source.xml" parse="xml" />
   </document>

By default, the **href** attribute is treated as a file name. You can use
custom loaders to override this behaviour. Also note that the standard helper
does not support XPointer syntax.

To process this file, load it as usual, and pass the root element to
:func:`xml.etree.ElementInclude.include`:

.. code-block:: python

   from xml.etree import ElementTree, ElementInclude

   tree = ElementTree.parse("document.xml")
   root = tree.getroot()

   ElementInclude.include(root)

The ElementInclude module replaces the
``{http://www.w3.org/2001/XInclude}include`` element with the root element
from the **source.xml** document. The result might look something like this:

.. code-block:: xml

   <document xmlns:xi="http://www.w3.org/2001/XInclude">
     <para>This is a paragraph.</para>
   </document>

If the **parse** attribute is omitted, it defaults to "xml". The **href**
attribute is required.

To include a text document, use the
``{http://www.w3.org/2001/XInclude}include`` element, and set the **parse**
attribute to "text":

.. code-block:: xml

   <?xml version="1.0"?>
   <document xmlns:xi="http://www.w3.org/2001/XInclude">
     Copyright (c) <xi:include href="year.txt" parse="text" />.
   </document>

The result might look something like:

.. code-block:: xml

   <document xmlns:xi="http://www.w3.org/2001/XInclude">
     Copyright (c) 2003.
   </document>

Reference
---------

.. _elementinclude-functions:

Functions
^^^^^^^^^

.. module:: xml.etree.ElementInclude

.. function:: default_loader(href, parse, encoding=None)

   Default loader. This default loader reads an included resource from disk.
   *href* is a URL. *parse* is the parse mode, either ``"xml"`` or ``"text"``.
   *encoding* is an optional text encoding. If not given, encoding is ``utf-8``.
   Returns the expanded resource.
   If the parse mode is ``"xml"``, this is an :class:`~xml.etree.ElementTree.Element` instance.
   If the parse mode is ``"text"``, this is a string.
   If the loader fails, it can return ``None`` or raise an exception.


.. function:: include(elem, loader=None, base_url=None, max_depth=6)

   This function expands XInclude directives in-place in the tree pointed to
   by *elem*.
   *elem* is either the root :class:`~xml.etree.ElementTree.Element` or an
   :class:`~xml.etree.ElementTree.ElementTree` instance to find such element.
   *loader* is an optional resource loader. If omitted, it defaults to :func:`default_loader`.
   If given, it should be a callable that implements the same interface as
   :func:`default_loader`. *base_url* is the base URL of the original file, to resolve
   relative include file references. *max_depth* is the maximum number of recursive
   inclusions. Limited to reduce the risk of malicious content explosion.
   Pass ``None`` to disable the limitation.

   .. versionchanged:: 3.9
      Added the *base_url* and *max_depth* parameters.


.. _elementtree-element-objects:

Element Objects
^^^^^^^^^^^^^^^

.. module:: xml.etree.ElementTree
   :noindex:
   :no-index:

.. class:: Element(tag, attrib={}, **extra)

   Element class. This class defines the Element interface, and provides a
   reference implementation of this interface.

   The element name, attribute names, and attribute values can be either
   bytestrings or Unicode strings. *tag* is the element name. *attrib* is
   an optional dictionary, containing element attributes. *extra* contains
   additional attributes, given as keyword arguments.


   .. attribute:: tag

      A string identifying what kind of data this element represents (the
      element type, in other words).


   .. attribute:: text
                  tail

      These attributes can be used to hold additional data associated with
      the element. Their values are usually strings but may be any
      application-specific object. If the element is created from
      an XML file, the *text* attribute holds either the text between
      the element's start tag and its first child or end tag, or ``None``, and
      the *tail* attribute holds either the text between the element's
      end tag and the next tag, or ``None``. For the XML data

      .. code-block:: xml

         <a><b>1<c>2<d/>3</c></b>4</a>

      the *a* element has ``None`` for both *text* and *tail* attributes,
      the *b* element has *text* ``"1"`` and *tail* ``"4"``,
      the *c* element has *text* ``"2"`` and *tail* ``None``,
      and the *d* element has *text* ``None`` and *tail* ``"3"``.

      To collect the inner text of an element, see :meth:`itertext`, for
      example ``"".join(element.itertext())``.

      Applications may store arbitrary objects in these attributes.


   .. attribute:: attrib

      A dictionary containing the element's attributes.
      Note that while the
      *attrib* value is always a real mutable Python dictionary, an ElementTree
      implementation may choose to use another internal representation, and
      create the dictionary only if someone asks for it. To take advantage of
      such implementations, use the dictionary methods below whenever possible.

   The following dictionary-like methods work on the element attributes.


   .. method:: clear()

      Resets an element. This function removes all subelements, clears all
      attributes, and sets the text and tail attributes to ``None``.


   .. method:: get(key, default=None)

      Gets the element attribute named *key*.

      Returns the attribute value, or *default* if the attribute was not found.


   .. method:: items()

      Returns the element attributes as a sequence of (name, value) pairs. The
      attributes are returned in an arbitrary order.


   .. method:: keys()

      Returns the element's attribute names as a list. The names are returned
      in an arbitrary order.


   .. method:: set(key, value)

      Set the attribute *key* on the element to *value*.

   The following methods work on the element's children (subelements).


   .. method:: append(subelement)

      Adds the element *subelement* to the end of this element's internal list
      of subelements. Raises :exc:`TypeError` if *subelement* is not an
      :class:`Element`.


   .. method:: extend(subelements)

      Appends *subelements* from an iterable of elements.
      Raises :exc:`TypeError` if a subelement is not an :class:`Element`.

      .. versionadded:: 3.2


   .. method:: find(match, namespaces=None)

      Finds the first subelement matching *match*. *match* may be a tag name
      or a :ref:`path <elementtree-xpath>`. Returns an element instance
      or ``None``. *namespaces* is an optional mapping from namespace prefix
      to full name.
      Pass ``''`` as prefix to move all unprefixed tag names
      in the expression into the given namespace.


   .. method:: findall(match, namespaces=None)

      Finds all matching subelements, by tag name or
      :ref:`path <elementtree-xpath>`. Returns a list containing all matching
      elements in document order. *namespaces* is an optional mapping from
      namespace prefix to full name. Pass ``''`` as prefix to move all
      unprefixed tag names in the expression into the given namespace.


   .. method:: findtext(match, default=None, namespaces=None)

      Finds text for the first subelement matching *match*. *match* may be
      a tag name or a :ref:`path <elementtree-xpath>`. Returns the text content
      of the first matching element, or *default* if no element was found.
      Note that if the matching element has no text content an empty string
      is returned. *namespaces* is an optional mapping from namespace prefix
      to full name. Pass ``''`` as prefix to move all unprefixed tag names
      in the expression into the given namespace.


   .. method:: insert(index, subelement)

      Inserts *subelement* at the given position in this element. Raises
      :exc:`TypeError` if *subelement* is not an :class:`Element`.


   .. method:: iter(tag=None)

      Creates a tree :term:`iterator` with the current element as the root.
      The iterator iterates over this element and all elements below it, in
      document (depth first) order. If *tag* is not ``None`` or ``'*'``, only
      elements whose tag equals *tag* are returned from the iterator. If the
      tree structure is modified during iteration, the result is undefined.

      .. versionadded:: 3.2


   .. method:: iterfind(match, namespaces=None)

      Finds all matching subelements, by tag name or
      :ref:`path <elementtree-xpath>`. Returns an iterable yielding all
      matching elements in document order.
      *namespaces* is an optional mapping
      from namespace prefix to full name.


      .. versionadded:: 3.2


   .. method:: itertext()

      Creates a text iterator. The iterator loops over this element and all
      subelements, in document order, and returns all inner text.

      .. versionadded:: 3.2


   .. method:: makeelement(tag, attrib)

      Creates a new element object of the same type as this element. Do not
      call this method, use the :func:`SubElement` factory function instead.


   .. method:: remove(subelement)

      Removes *subelement* from the element. Unlike the find\* methods this
      method compares elements based on the instance identity, not on tag value
      or contents.

   :class:`Element` objects also support the following sequence type methods
   for working with subelements: :meth:`~object.__delitem__`,
   :meth:`~object.__getitem__`, :meth:`~object.__setitem__`,
   :meth:`~object.__len__`.

   Caution: Elements with no subelements will test as ``False``. In a future
   release of Python, all elements will test as ``True`` regardless of whether
   subelements exist. Instead, prefer explicit ``len(elem)`` or
   ``elem is not None`` tests::

     element = root.find('foo')

     if not element:  # careful!
         print("element not found, or element has no subelements")

     if element is None:
         print("element not found")

   .. versionchanged:: 3.12
      Testing the truth value of an Element emits :exc:`DeprecationWarning`.

   Prior to Python 3.8, the serialisation order of the XML attributes of
   elements was artificially made predictable by sorting the attributes by
   their name. Based on the now guaranteed ordering of dicts, this arbitrary
   reordering was removed in Python 3.8 to preserve the order in which
   attributes were originally parsed or created by user code.

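
   As a small illustration of the ordering behaviour described above, the
   following sketch creates attributes in a non-alphabetical order and then
   serialises the element. The ``point`` element and its ``x``/``y``
   attributes are made up for this example. On Python 3.8 or later the
   creation order should be kept in the output:

   .. code-block:: python

      import xml.etree.ElementTree as ET

      # Hypothetical element; attributes are set in non-alphabetical order.
      elem = ET.Element("point")
      elem.set("y", "2")
      elem.set("x", "1")

      # On Python 3.8+, serialisation keeps the creation order (y before x).
      serialized = ET.tostring(elem, encoding="unicode")
      print(serialized)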

   In general, user code should try not to depend on a specific ordering of
   attributes, given that the `XML Information Set
   <https://www.w3.org/TR/xml-infoset/>`_ explicitly excludes the attribute
   order from conveying information. Code should be prepared to deal with
   any ordering on input. In cases where deterministic XML output is required,
   e.g. for cryptographic signing or test data sets, canonical serialisation
   is available with the :func:`canonicalize` function.

   In cases where canonical output is not applicable but a specific attribute
   order is still desirable on output, code should aim for creating the
   attributes directly in the desired order, to avoid perceptual mismatches
   for readers of the code. In cases where this is difficult to achieve, a
   recipe like the following can be applied prior to serialisation to enforce
   an order independently from the Element creation::

     def reorder_attributes(root):
         for el in root.iter():
             attrib = el.attrib
             if len(attrib) > 1:
                 # adjust attribute order, e.g. by sorting
                 attribs = sorted(attrib.items())
                 attrib.clear()
                 attrib.update(attribs)


.. _elementtree-elementtree-objects:

ElementTree Objects
^^^^^^^^^^^^^^^^^^^


.. class:: ElementTree(element=None, file=None)

   ElementTree wrapper class. This class represents an entire element
   hierarchy, and adds some extra support for serialization to and from
   standard XML.

   *element* is the root element. The tree is initialized with the contents
   of the XML *file* if given.


   .. method:: _setroot(element)

      Replaces the root element for this tree. This discards the current
      contents of the tree, and replaces it with the given element. Use with
      care. *element* is an element instance.


   .. method:: find(match, namespaces=None)

      Same as :meth:`Element.find`, starting at the root of the tree.


   .. method:: findall(match, namespaces=None)

      Same as :meth:`Element.findall`, starting at the root of the tree.


   .. method:: findtext(match, default=None, namespaces=None)

      Same as :meth:`Element.findtext`, starting at the root of the tree.


   .. method:: getroot()

      Returns the root element for this tree.


   .. method:: iter(tag=None)

      Creates and returns a tree iterator for the root element. The iterator
      loops over all elements in this tree, in document order. *tag* is the tag
      to look for (default is to return all elements).


   .. method:: iterfind(match, namespaces=None)

      Same as :meth:`Element.iterfind`, starting at the root of the tree.

      .. versionadded:: 3.2


   .. method:: parse(source, parser=None)

      Loads an external XML section into this element tree. *source* is a file
      name or :term:`file object`. *parser* is an optional parser instance.
      If not given, the standard :class:`XMLParser` parser is used. Returns the
      section root element.


   .. method:: write(file, encoding="us-ascii", xml_declaration=None, \
                     default_namespace=None, method="xml", *, \
                     short_empty_elements=True)

      Writes the element tree to a file, as XML. *file* is a file name, or a
      :term:`file object` opened for writing. *encoding* [1]_ is the output
      encoding (default is US-ASCII).
      *xml_declaration* controls if an XML declaration should be added to the
      file. Use ``False`` for never, ``True`` for always, ``None``
      for only if not US-ASCII or UTF-8 or Unicode (default is ``None``).
      *default_namespace* sets the default XML namespace (for "xmlns").
      *method* is either ``"xml"``, ``"html"`` or ``"text"`` (default is
      ``"xml"``).
      The keyword-only *short_empty_elements* parameter controls the formatting
      of elements that contain no content. If ``True`` (the default), they are
      emitted as a single self-closed tag, otherwise they are emitted as a pair
      of start/end tags.

      The output is either a string (:class:`str`) or binary (:class:`bytes`).
      This is controlled by the *encoding* argument. If *encoding* is
      ``"unicode"``, the output is a string; otherwise, it's binary. Note that
      this may conflict with the type of *file* if it's an open
      :term:`file object`; make sure you do not try to write a string to a
      binary stream and vice versa.

      .. versionchanged:: 3.4
         Added the *short_empty_elements* parameter.

      .. versionchanged:: 3.8
         The :meth:`write` method now preserves the attribute order specified
         by the user.


This is the XML file that is going to be manipulated::

    <html>
        <head>
            <title>Example page</title>
        </head>
        <body>
            <p>Moved to <a href="http://example.org/">example.org</a>
            or <a href="http://example.com/">example.com</a>.</p>
        </body>
    </html>

Example of changing the attribute "target" of every link in the first paragraph::

    >>> from xml.etree.ElementTree import ElementTree
    >>> tree = ElementTree()
    >>> tree.parse("index.xhtml")
    <Element 'html' at 0xb77e6fac>
    >>> p = tree.find("body/p")     # Finds first occurrence of tag p in body
    >>> p
    <Element 'p' at 0xb77ec26c>
    >>> links = list(p.iter("a"))   # Returns list of all links
    >>> links
    [<Element 'a' at 0xb77ec2ac>, <Element 'a' at 0xb77ec1cc>]
    >>> for i in links:             # Iterates through all found links
    ...     i.attrib["target"] = "blank"
    ...
    >>> tree.write("output.xhtml")

.. _elementtree-qname-objects:

QName Objects
^^^^^^^^^^^^^


.. class:: QName(text_or_uri, tag=None)

   QName wrapper.
   This can be used to wrap a QName attribute value, in order
   to get proper namespace handling on output. *text_or_uri* is a string
   containing the QName value, in the form {uri}local, or, if the tag argument
   is given, the URI part of a QName. If *tag* is given, the first argument is
   interpreted as a URI, and this argument is interpreted as a local name.
   :class:`QName` instances are opaque.



.. _elementtree-treebuilder-objects:

TreeBuilder Objects
^^^^^^^^^^^^^^^^^^^


.. class:: TreeBuilder(element_factory=None, *, comment_factory=None, \
                       pi_factory=None, insert_comments=False, insert_pis=False)

   Generic element structure builder. This builder converts a sequence of
   start, data, end, comment and pi method calls to a well-formed element
   structure. You can use this class to build an element structure using
   a custom XML parser, or a parser for some other XML-like format.

   *element_factory*, when given, must be a callable accepting two positional
   arguments: a tag and a dict of attributes. It is expected to return a new
   element instance.

   The *comment_factory* and *pi_factory* functions, when given, should behave
   like the :func:`Comment` and :func:`ProcessingInstruction` functions to
   create comments and processing instructions. When not given, the default
   factories will be used. When *insert_comments* and/or *insert_pis* is true,
   comments/pis will be inserted into the tree if they appear within the root
   element (but not outside of it).

   .. method:: close()

      Flushes the builder buffers, and returns the toplevel document
      element. Returns an :class:`Element` instance.


   .. method:: data(data)

      Adds text to the current element. *data* is a string. This should be
      either a bytestring, or a Unicode string.


   .. method:: end(tag)

      Closes the current element.
      *tag* is the element name. Returns the
      closed element.


   .. method:: start(tag, attrs)

      Opens a new element. *tag* is the element name. *attrs* is a dictionary
      containing element attributes. Returns the opened element.


   .. method:: comment(text)

      Creates a comment with the given *text*. If ``insert_comments`` is true,
      this will also add it to the tree.

      .. versionadded:: 3.8


   .. method:: pi(target, text)

      Creates a processing instruction with the given *target* name and *text*.
      If ``insert_pis`` is true, this will also add it to the tree.

      .. versionadded:: 3.8


   In addition, a custom :class:`TreeBuilder` object can provide the
   following methods:

   .. method:: doctype(name, pubid, system)

      Handles a doctype declaration. *name* is the doctype name. *pubid* is
      the public identifier. *system* is the system identifier. This method
      does not exist on the default :class:`TreeBuilder` class.

      .. versionadded:: 3.2

   .. method:: start_ns(prefix, uri)

      Is called whenever the parser encounters a new namespace declaration,
      before the ``start()`` callback for the opening element that defines it.
      *prefix* is ``''`` for the default namespace and the declared
      namespace prefix name otherwise. *uri* is the namespace URI.

      .. versionadded:: 3.8

   .. method:: end_ns(prefix)

      Is called after the ``end()`` callback of an element that declared
      a namespace prefix mapping, with the name of the *prefix* that went
      out of scope.

      .. versionadded:: 3.8


.. class:: C14NWriterTarget(write, *, \
             with_comments=False, strip_text=False, rewrite_prefixes=False, \
             qname_aware_tags=None, qname_aware_attrs=None, \
             exclude_attrs=None, exclude_tags=None)

   A `C14N 2.0 <https://www.w3.org/TR/xml-c14n2/>`_ writer.
   Arguments are the
   same as for the :func:`canonicalize` function. This class does not build a
   tree but translates the callback events directly into a serialised form
   using the *write* function.

   .. versionadded:: 3.8


.. _elementtree-xmlparser-objects:

XMLParser Objects
^^^^^^^^^^^^^^^^^


.. class:: XMLParser(*, target=None, encoding=None)

   This class is the low-level building block of the module. It uses
   :mod:`xml.parsers.expat` for efficient, event-based parsing of XML. It can
   be fed XML data incrementally with the :meth:`feed` method, and parsing
   events are translated to a push API - by invoking callbacks on the *target*
   object. If *target* is omitted, the standard :class:`TreeBuilder` is used.
   If *encoding* [1]_ is given, the value overrides the
   encoding specified in the XML file.

   .. versionchanged:: 3.8
      Parameters are now :ref:`keyword-only <keyword-only_parameter>`.
      The *html* argument is no longer supported.


   .. method:: close()

      Finishes feeding data to the parser. Returns the result of calling the
      ``close()`` method of the *target* passed during construction; by default,
      this is the toplevel document element.


   .. method:: feed(data)

      Feeds data to the parser. *data* is encoded data.


   .. method:: flush()

      Triggers parsing of any previously fed unparsed data, which can be
      used to ensure more immediate feedback, in particular with Expat >=2.6.0.
      The implementation of :meth:`flush` temporarily disables reparse deferral
      with Expat (if currently enabled) and triggers a reparse.
      Disabling reparse deferral has security consequences; please see
      :meth:`xml.parsers.expat.xmlparser.SetReparseDeferralEnabled` for details.

      Note that :meth:`flush` has been backported to some prior releases of
      CPython as a security fix.
      Check for availability of :meth:`flush`
      using :func:`hasattr` if used in code running across a variety of Python
      versions.

      .. versionadded:: 3.13


   :meth:`XMLParser.feed` calls *target*\'s ``start(tag, attrs_dict)`` method
   for each opening tag, its ``end(tag)`` method for each closing tag, and data
   is processed by method ``data(data)``. For further supported callback
   methods, see the :class:`TreeBuilder` class. :meth:`XMLParser.close` calls
   *target*\'s method ``close()``. :class:`XMLParser` can be used not only for
   building a tree structure. This is an example of counting the maximum depth
   of an XML file::

      >>> from xml.etree.ElementTree import XMLParser
      >>> class MaxDepth:                     # The target object of the parser
      ...     maxDepth = 0
      ...     depth = 0
      ...     def start(self, tag, attrib):   # Called for each opening tag.
      ...         self.depth += 1
      ...         if self.depth > self.maxDepth:
      ...             self.maxDepth = self.depth
      ...     def end(self, tag):             # Called for each closing tag.
      ...         self.depth -= 1
      ...     def data(self, data):
      ...         pass            # We do not need to do anything with data.
      ...     def close(self):    # Called when all data has been parsed.
      ...         return self.maxDepth
      ...
      >>> target = MaxDepth()
      >>> parser = XMLParser(target=target)
      >>> exampleXml = """
      ... <a>
      ...   <b>
      ...   </b>
      ...   <b>
      ...     <c>
      ...       <d>
      ...       </d>
      ...     </c>
      ...   </b>
      ... </a>"""
      >>> parser.feed(exampleXml)
      >>> parser.close()
      4


.. _elementtree-xmlpullparser-objects:

XMLPullParser Objects
^^^^^^^^^^^^^^^^^^^^^

.. class:: XMLPullParser(events=None)

   A pull parser suitable for non-blocking applications.
   Its input-side API is
   similar to that of :class:`XMLParser`, but instead of pushing calls to a
   callback target, :class:`XMLPullParser` collects an internal list of parsing
   events and lets the user read from it. *events* is a sequence of events to
   report back. The supported events are the strings ``"start"``, ``"end"``,
   ``"comment"``, ``"pi"``, ``"start-ns"`` and ``"end-ns"`` (the "ns" events
   are used to get detailed namespace information). If *events* is omitted,
   only ``"end"`` events are reported.

   .. method:: feed(data)

      Feed the given bytes data to the parser.

   .. method:: flush()

      Triggers parsing of any previously fed unparsed data, which can be
      used to ensure more immediate feedback, in particular with Expat >=2.6.0.
      The implementation of :meth:`flush` temporarily disables reparse deferral
      with Expat (if currently enabled) and triggers a reparse.
      Disabling reparse deferral has security consequences; please see
      :meth:`xml.parsers.expat.xmlparser.SetReparseDeferralEnabled` for details.

      Note that :meth:`flush` has been backported to some prior releases of
      CPython as a security fix. Check for availability of :meth:`flush`
      using :func:`hasattr` if used in code running across a variety of Python
      versions.

      .. versionadded:: 3.13

   .. method:: close()

      Signal the parser that the data stream is terminated. Unlike
      :meth:`XMLParser.close`, this method always returns :const:`None`.
      Any events not yet retrieved when the parser is closed can still be
      read with :meth:`read_events`.

   .. method:: read_events()

      Return an iterator over the events which have been encountered in the
      data fed to the
      parser. The iterator yields ``(event, elem)`` pairs, where *event* is a
      string representing the type of event (e.g. ``"end"``) and *elem* is the
      encountered :class:`Element` object, or other context value as follows.

      * ``start``, ``end``: the current Element.
      * ``comment``, ``pi``: the current comment / processing instruction
      * ``start-ns``: a tuple ``(prefix, uri)`` naming the declared namespace
        mapping.
      * ``end-ns``: :const:`None` (this may change in a future version)

      Events provided in a previous call to :meth:`read_events` will not be
      yielded again. Events are consumed from the internal queue only when
      they are retrieved from the iterator, so multiple readers iterating in
      parallel over iterators obtained from :meth:`read_events` will have
      unpredictable results.

      .. note::

         :class:`XMLPullParser` only guarantees that it has seen the ">"
         character of a starting tag when it emits a "start" event, so the
         attributes are defined, but the contents of the text and tail attributes
         are undefined at that point. The same applies to the element children;
         they may or may not be present.

         If you need a fully populated element, look for "end" events instead.

   .. versionadded:: 3.4

   .. versionchanged:: 3.8
      The ``comment`` and ``pi`` events were added.


Exceptions
^^^^^^^^^^

.. class:: ParseError

   XML parse error, raised by the various parsing methods in this module when
   parsing fails. The string representation of an instance of this exception
   will contain a user-friendly error message. In addition, it will have
   the following attributes available:

   .. attribute:: code

      A numeric error code from the expat parser. See the documentation of
      :mod:`xml.parsers.expat` for the list of error codes and their meanings.

   .. attribute:: position

      A tuple of *line*, *column* numbers, specifying where the error occurred.

.. rubric:: Footnotes

.. [1] The encoding string included in XML output should conform to the
   appropriate standards. For example, "UTF-8" is valid, but "UTF8" is
   not. See https://www.w3.org/TR/2006/REC-xml11-20060816/#NT-EncodingDecl
   and https://www.iana.org/assignments/character-sets/character-sets.xhtml.