:mod:`xml.dom.minidom` --- Minimal DOM implementation
=====================================================

.. module:: xml.dom.minidom
   :synopsis: Minimal Document Object Model (DOM) implementation.

.. moduleauthor:: Paul Prescod <paul@prescod.net>
.. sectionauthor:: Paul Prescod <paul@prescod.net>
.. sectionauthor:: Martin v. Löwis <martin@v.loewis.de>

**Source code:** :source:`Lib/xml/dom/minidom.py`

--------------

:mod:`xml.dom.minidom` is a minimal implementation of the Document Object
Model interface, with an API similar to that in other languages. It is intended
to be simpler than the full DOM and also significantly smaller. Users who are
not already proficient with the DOM should consider using the
:mod:`xml.etree.ElementTree` module for their XML processing instead.


.. warning::

   The :mod:`xml.dom.minidom` module is not secure against
   maliciously constructed data. If you need to parse untrusted or
   unauthenticated data, see :ref:`xml-vulnerabilities`.


DOM applications typically start by parsing some XML into a DOM. With
:mod:`xml.dom.minidom`, this is done through the parse functions::

   from xml.dom.minidom import parse, parseString

   dom1 = parse('c:\\temp\\mydata.xml')  # parse an XML file by name

   with open('c:\\temp\\mydata.xml') as datasource:
       dom2 = parse(datasource)  # parse an open file (closed when the block ends)

   dom3 = parseString('<myxml>Some data<empty/> some more data</myxml>')

The :func:`parse` function can take either a filename or an open file object.


.. function:: parse(filename_or_file, parser=None, bufsize=None)

   Return a :class:`Document` from the given input. *filename_or_file* may be
   either a file name or a file-like object. *parser*, if given, must be a SAX2
   parser object. This function will change the document handler of the parser and
   activate namespace support; other parser configuration (like setting an entity
   resolver) must have been done in advance.
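
A minimal sketch of the *parser* argument follows. It assumes the XML is already held in memory, so :class:`io.StringIO` stands in for an open file. The ``<doc>``/``<item>`` markup is invented for illustration. The SAX2 parser comes from :func:`xml.sax.make_parser`, and :func:`parse` then installs its own document handler and enables namespace support on it. The :keyword:`with` block unlinks the tree on exit:

```python
import io
import xml.sax
from xml.dom.minidom import parse

# Hypothetical in-memory XML; io.StringIO plays the role of an open file.
source = io.StringIO('<doc><item id="1">first</item><item id="2">second</item></doc>')

# Any extra configuration (e.g. an entity resolver) would have to be set on
# this parser before the call; parse() replaces its document handler.
sax_parser = xml.sax.make_parser()

# Document objects support "with"; unlink() is called when the block exits.
with parse(source, parser=sax_parser) as dom:
    items = dom.getElementsByTagName("item")
    ids = [item.getAttribute("id") for item in items]
    texts = [item.firstChild.data for item in items]

print(ids, texts)
```

Without *parser*, :func:`parse` uses its default parser. Creating one explicitly is mainly useful when it needs configuration before parsing.
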

If you have XML in a string, you can use the :func:`parseString` function
instead:


.. function:: parseString(string, parser=None)

   Return a :class:`Document` that represents the *string*. This function creates an
   :class:`io.StringIO` object for the string and passes that on to :func:`parse`.

Both functions return a :class:`Document` object representing the content of the
document.

What the :func:`parse` and :func:`parseString` functions do is connect an XML
parser with a "DOM builder" that can accept parse events from any SAX parser and
convert them into a DOM tree. The names of the functions are perhaps misleading,
but are easy to grasp when learning the interfaces. The parsing of the document
will be completed before these functions return; it's simply that these
functions do not provide a parser implementation themselves.

You can also create a :class:`Document` by calling a method on a "DOM
Implementation" object. You can get this object either by calling the
:func:`getDOMImplementation` function in the :mod:`xml.dom` package or the
:mod:`xml.dom.minidom` module. Once you have a :class:`Document`, you
can add child nodes to it to populate the DOM::

   from xml.dom.minidom import getDOMImplementation

   impl = getDOMImplementation()

   newdoc = impl.createDocument(None, "some_tag", None)
   top_element = newdoc.documentElement
   text = newdoc.createTextNode('Some textual content.')
   top_element.appendChild(text)

Once you have a DOM document object, you can access the parts of your XML
document through its properties and methods. These properties are defined in
the DOM specification. The main property of the document object is the
:attr:`documentElement` property. It gives you the main element in the XML
document: the one that holds all others.
Here is an example program::

   from xml.dom.minidom import parseString

   dom3 = parseString("<myxml>Some data</myxml>")
   assert dom3.documentElement.tagName == "myxml"

When you are finished with a DOM tree, you may optionally call the
:meth:`unlink` method to encourage early cleanup of the now-unneeded
objects. :meth:`unlink` is an :mod:`xml.dom.minidom`\ -specific
extension to the DOM API that renders the node and its descendants
essentially useless. Otherwise, Python's garbage collector will
eventually take care of the objects in the tree.

.. seealso::

   `Document Object Model (DOM) Level 1 Specification <https://www.w3.org/TR/REC-DOM-Level-1/>`_
      The W3C recommendation for the DOM supported by :mod:`xml.dom.minidom`.


.. _minidom-objects:

DOM Objects
-----------

The definition of the DOM API for Python is given as part of the :mod:`xml.dom`
module documentation. This section lists the differences between the API and
:mod:`xml.dom.minidom`.


.. method:: Node.unlink()

   Break internal references within the DOM so that it will be garbage collected on
   versions of Python without cyclic GC. Even when cyclic GC is available, using
   this can make large amounts of memory available sooner, so calling this on DOM
   objects as soon as they are no longer needed is good practice. This only needs
   to be called on the :class:`Document` object, but may be called on child nodes
   to discard children of that node.

   You can avoid calling this method explicitly by using the :keyword:`with`
   statement. The following code will automatically unlink *dom* when the
   :keyword:`!with` block is exited::

      with xml.dom.minidom.parse(datasource) as dom:
          ...  # Work with dom.


.. method:: Node.writexml(writer, indent="", addindent="", newl="", \
                          encoding=None, standalone=None)

   Write XML to the writer object.
   The writer receives text, not bytes, as input and should have a
   :meth:`write` method that matches that of the file object interface. The
   *indent* parameter is the indentation of the current node. The *addindent*
   parameter is the incremental indentation to use for subnodes of the current
   one. The *newl* parameter specifies the string used to terminate lines.

   For the :class:`Document` node, an additional keyword argument *encoding* can
   be used to specify the encoding field of the XML header.

   Similarly, explicitly stating the *standalone* argument causes the
   standalone document declaration to be added to the prologue of the XML
   document. If the value is set to ``True``, ``standalone="yes"`` is added;
   otherwise it is set to ``"no"``. If the argument is omitted, the declaration
   is left out of the document.

   .. versionchanged:: 3.8
      The :meth:`writexml` method now preserves the attribute order specified
      by the user.

   .. versionchanged:: 3.9
      The *standalone* parameter was added.

.. method:: Node.toxml(encoding=None, standalone=None)

   Return a string or byte string containing the XML represented by
   the DOM node.

   With an explicit *encoding* [1]_ argument, the result is a byte
   string in the specified encoding.
   With no *encoding* argument, the result is a Unicode string, and the
   XML declaration in the resulting string does not specify an
   encoding. Encoding this string in an encoding other than UTF-8 is
   likely incorrect, since UTF-8 is the default encoding of XML.

   The *standalone* argument behaves exactly as in :meth:`writexml`.

   .. versionchanged:: 3.8
      The :meth:`toxml` method now preserves the attribute order specified
      by the user.

   .. versionchanged:: 3.9
      The *standalone* parameter was added.

.. method:: Node.toprettyxml(indent="\\t", newl="\\n", encoding=None, \
                             standalone=None)

   Return a pretty-printed version of the document. *indent* specifies the
   indentation string and defaults to a tab character; *newl* specifies the string
   emitted at the end of each line and defaults to ``\n``.

   The *encoding* argument behaves like the corresponding argument of
   :meth:`toxml`.

   The *standalone* argument behaves exactly as in :meth:`writexml`.

   .. versionchanged:: 3.8
      The :meth:`toprettyxml` method now preserves the attribute order specified
      by the user.

   .. versionchanged:: 3.9
      The *standalone* parameter was added.

.. _dom-example:

DOM Example
-----------

This example program is a fairly realistic example of a simple program. In this
particular case, we do not take much advantage of the flexibility of the DOM.

.. literalinclude:: ../includes/minidom-example.py


.. _minidom-and-dom:

minidom and the DOM standard
----------------------------

The :mod:`xml.dom.minidom` module is essentially a DOM 1.0-compatible DOM with
some DOM 2 features (primarily namespace features).

Usage of the DOM interface in Python is straightforward. The following mapping
rules apply:

* Interfaces are accessed through instance objects. Applications should not
  instantiate the classes themselves; they should use the creator functions
  available on the :class:`Document` object. Derived interfaces support all
  operations (and attributes) from the base interfaces, plus any new operations.

* Operations are used as methods. Since the DOM uses only :keyword:`in`
  parameters, the arguments are passed in normal order (from left to right).
  There are no optional arguments. ``void`` operations return ``None``.

* IDL attributes map to instance attributes.
  For compatibility with the OMG IDL
  language mapping for Python, an attribute ``foo`` can also be accessed through
  accessor methods :meth:`_get_foo` and :meth:`_set_foo`. ``readonly``
  attributes must not be changed; this is not enforced at runtime.

* The types ``short int``, ``unsigned int``, ``unsigned long long``, and
  ``boolean`` all map to Python integer objects.

* The type ``DOMString`` maps to Python strings. :mod:`xml.dom.minidom` supports
  either bytes or strings, but will normally produce strings.
  Values of type ``DOMString`` may also be ``None`` wherever the W3C DOM
  specification allows the IDL ``null`` value.

* ``const`` declarations map to variables in their respective scope (e.g.
  ``xml.dom.minidom.Node.PROCESSING_INSTRUCTION_NODE``); they must not be changed.

* ``DOMException`` is currently not supported in :mod:`xml.dom.minidom`.
  Instead, :mod:`xml.dom.minidom` uses standard Python exceptions such as
  :exc:`TypeError` and :exc:`AttributeError`.

* :class:`NodeList` objects are implemented using Python's built-in list type.
  These objects provide the interface defined in the DOM specification, but with
  earlier versions of Python they do not support the official API. They are,
  however, much more "Pythonic" than the interface defined in the W3C
  recommendations.

The following interfaces have no implementation in :mod:`xml.dom.minidom`:

* :class:`DOMTimeStamp`

* :class:`EntityReference`

Most of these reflect information in the XML document that is not of general
utility to most DOM users.

.. rubric:: Footnotes

.. [1] The encoding name included in the XML output should conform to
   the appropriate standards. For example, "UTF-8" is valid, but
   "UTF8" is not valid in an XML document's declaration, even though
   Python accepts it as an encoding name.
   See https://www.w3.org/TR/2006/REC-xml11-20060816/#NT-EncodingDecl
   and https://www.iana.org/assignments/character-sets/character-sets.xhtml.