:mod:`xml.dom.pulldom` --- Support for building partial DOM trees
=================================================================

.. module:: xml.dom.pulldom
   :synopsis: Support for building partial DOM trees from SAX events.

.. moduleauthor:: Paul Prescod <paul@prescod.net>

**Source code:** :source:`Lib/xml/dom/pulldom.py`

--------------

The :mod:`xml.dom.pulldom` module provides a "pull parser" which can also be
asked to produce DOM-accessible fragments of the document where necessary. The
basic concept involves pulling "events" from a stream of incoming XML and
processing them. In contrast to SAX, which also employs an event-driven
processing model together with callbacks, the user of a pull parser is
responsible for explicitly pulling events from the stream, looping over those
events until either processing is finished or an error condition occurs.


.. warning::

   The :mod:`xml.dom.pulldom` module is not secure against
   maliciously constructed data. If you need to parse untrusted or
   unauthenticated data, see :ref:`xml-vulnerabilities`.

.. versionchanged:: 3.7.1

   The SAX parser no longer processes general external entities by default, to
   increase security.
   To enable processing of external entities,
   pass a custom parser instance in::

      from xml.dom.pulldom import parse
      from xml.sax import make_parser
      from xml.sax.handler import feature_external_ges

      parser = make_parser()
      parser.setFeature(feature_external_ges, True)
      parse(filename, parser=parser)


Example::

   from xml.dom import pulldom

   doc = pulldom.parse('sales_items.xml')
   for event, node in doc:
       if event == pulldom.START_ELEMENT and node.tagName == 'item':
           if int(node.getAttribute('price')) > 50:
               doc.expandNode(node)
               print(node.toxml())

``event`` is a constant and can be one of:

* :data:`START_ELEMENT`
* :data:`END_ELEMENT`
* :data:`COMMENT`
* :data:`START_DOCUMENT`
* :data:`END_DOCUMENT`
* :data:`CHARACTERS`
* :data:`PROCESSING_INSTRUCTION`
* :data:`IGNORABLE_WHITESPACE`

``node`` is an object of type :class:`xml.dom.minidom.Document`,
:class:`xml.dom.minidom.Element` or :class:`xml.dom.minidom.Text`.

Since the document is treated as a "flat" stream of events, the document "tree"
is implicitly traversed and the desired elements are found regardless of their
depth in the tree. In other words, one does not need to consider hierarchical
issues such as recursive searching of the document nodes. If the context of
elements is important, however, one needs either to maintain some
context-related state (i.e. remembering where one is in the document at any
given point) or to make use of the :meth:`DOMEventStream.expandNode` method
and switch to DOM-related processing.


.. class:: PullDOM(documentFactory=None)

   Subclass of :class:`xml.sax.handler.ContentHandler`.


.. class:: SAX2DOM(documentFactory=None)

   Subclass of :class:`xml.sax.handler.ContentHandler`.


.. function:: parse(stream_or_string, parser=None, bufsize=None)

   Return a :class:`DOMEventStream` from the given input.
   *stream_or_string* may be
   either a file name or a file-like object. *parser*, if given, must be an
   :class:`~xml.sax.xmlreader.XMLReader` object. This function will change the
   document handler of the parser and activate namespace support; other parser
   configuration (like setting an entity resolver) must have been done in
   advance.

If you have XML in a string, you can use the :func:`parseString` function instead:

.. function:: parseString(string, parser=None)

   Return a :class:`DOMEventStream` that represents the (Unicode) *string*.

.. data:: default_bufsize

   Default value for the *bufsize* parameter to :func:`parse`.

   The value of this variable can be changed before calling :func:`parse` and
   the new value will take effect.

.. _domeventstream-objects:

DOMEventStream Objects
----------------------

.. class:: DOMEventStream(stream, parser, bufsize)

   .. deprecated:: 3.8
      Support for :meth:`sequence protocol <__getitem__>` is deprecated.

   .. method:: getEvent()

      Return a tuple containing *event* and the current *node* as
      :class:`xml.dom.minidom.Document` if event equals :data:`START_DOCUMENT`,
      :class:`xml.dom.minidom.Element` if event equals :data:`START_ELEMENT` or
      :data:`END_ELEMENT` or :class:`xml.dom.minidom.Text` if event equals
      :data:`CHARACTERS`.
      The current node does not contain information about its children, unless
      :meth:`expandNode` is called.

   .. method:: expandNode(node)

      Expands all children of *node* into *node*.
      Example::

          from xml.dom import pulldom

          xml = '<html><title>Foo</title> <p>Some text <div>and more</div></p> </html>'
          doc = pulldom.parseString(xml)
          for event, node in doc:
              if event == pulldom.START_ELEMENT and node.tagName == 'p':
                  # Following statement only prints '<p/>'
                  print(node.toxml())
                  doc.expandNode(node)
                  # Following statement prints node with all its children '<p>Some text <div>and more</div></p>'
                  print(node.toxml())

   .. method:: DOMEventStream.reset()

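As an alternative to iterating over the stream, events can be pulled one at a
time with :meth:`~DOMEventStream.getEvent`. The following is a minimal sketch
rather than a definitive recipe. It assumes, as in the CPython implementation,
that :meth:`~DOMEventStream.getEvent` returns ``None`` once the input is
exhausted; ``start_tags`` and the sample document are hypothetical names chosen
only for illustration.

```python
from xml.dom import pulldom

# Hypothetical sample document, used only for illustration.
XML = '<root><a/><b>text</b></root>'


def start_tags(xml_text):
    """Collect tag names of START_ELEMENT events by calling getEvent() directly."""
    stream = pulldom.parseString(xml_text)
    names = []
    while True:
        item = stream.getEvent()
        if item is None:
            # Assumption: None signals that the input is exhausted.
            break
        event, node = item
        if event == pulldom.START_ELEMENT:
            names.append(node.tagName)
    return names


print(start_tags(XML))
```

Pulling events explicitly like this can be convenient when the consuming code
needs to decide, between events, whether to continue reading at all.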