:mod:`xml.dom.pulldom` --- Support for building partial DOM trees
=================================================================

.. module:: xml.dom.pulldom
   :synopsis: Support for building partial DOM trees from SAX events.

.. moduleauthor:: Paul Prescod <paul@prescod.net>

**Source code:** :source:`Lib/xml/dom/pulldom.py`

--------------

The :mod:`xml.dom.pulldom` module provides a "pull parser" which can also be
asked to produce DOM-accessible fragments of the document where necessary. The
basic concept involves pulling "events" from a stream of incoming XML and
processing them. In contrast to SAX, which also employs an event-driven
processing model together with callbacks, the user of a pull parser is
responsible for explicitly pulling events from the stream, looping over those
events until either processing is finished or an error condition occurs.


.. warning::

   The :mod:`xml.dom.pulldom` module is not secure against
   maliciously constructed data. If you need to parse untrusted or
   unauthenticated data, see :ref:`xml-vulnerabilities`.


The following example prints every ``item`` element whose ``price`` attribute
is greater than 50::

   from xml.dom import pulldom

   doc = pulldom.parse('sales_items.xml')
   for event, node in doc:
       if event == pulldom.START_ELEMENT and node.tagName == 'item':
           if int(node.getAttribute('price')) > 50:
               doc.expandNode(node)
               print(node.toxml())

``event`` is a constant and can be one of:

* :data:`START_ELEMENT`
* :data:`END_ELEMENT`
* :data:`COMMENT`
* :data:`START_DOCUMENT`
* :data:`END_DOCUMENT`
* :data:`CHARACTERS`
* :data:`PROCESSING_INSTRUCTION`
* :data:`IGNORABLE_WHITESPACE`

``node`` is the :mod:`xml.dom.minidom` node associated with the event: a
:class:`xml.dom.minidom.Document` for the document events, an
:class:`xml.dom.minidom.Element` for the element events, a
:class:`xml.dom.minidom.Text` for character data, and a comment or
processing-instruction node for the corresponding events.

Since the document is treated as a "flat" stream of events, the document "tree"
is implicitly traversed and the desired elements are found regardless of their
depth in the tree.
In other words, one does not need to consider hierarchical issues such as
recursively searching the document nodes. However, if the context of elements
is important, one must either maintain some context-related state (i.e.
remember where one is in the document at any given point) or use the
:meth:`DOMEventStream.expandNode` method and switch to DOM-related processing.


.. class:: PullDom(documentFactory=None)

   Subclass of :class:`xml.sax.handler.ContentHandler`.


.. class:: SAX2DOM(documentFactory=None)

   Subclass of :class:`xml.sax.handler.ContentHandler`.


.. function:: parse(stream_or_string, parser=None, bufsize=None)

   Return a :class:`DOMEventStream` from the given input. *stream_or_string* may
   be either a file name or a file-like object. *parser*, if given, must be an
   :class:`~xml.sax.xmlreader.XMLReader` object; if it is omitted, a new parser
   is created with :func:`xml.sax.make_parser`. This function will change the
   document handler of the parser and activate namespace support; other parser
   configuration (like setting an entity resolver) must have been done in
   advance. *bufsize* is the size of the chunks read from the input; if it is
   omitted, :data:`default_bufsize` is used.

If you have XML in a string, you can use the :func:`parseString` function instead:

.. function:: parseString(string, parser=None)

   Return a :class:`DOMEventStream` that represents the (Unicode) *string*.
   *parser* has the same meaning as for :func:`parse`.

.. data:: default_bufsize

   Default value for the *bufsize* parameter to :func:`parse`.

   The value of this variable can be changed before calling :func:`parse` and
   the new value will take effect.

.. _domeventstream-objects:

DOMEventStream Objects
----------------------

.. class:: DOMEventStream(stream, parser, bufsize)

   .. method:: getEvent()

      Return a tuple ``(event, node)``, where *node* is an
      :class:`xml.dom.minidom.Document` if *event* equals :data:`START_DOCUMENT`,
      an :class:`xml.dom.minidom.Element` if *event* equals :data:`START_ELEMENT`
      or :data:`END_ELEMENT`, or an :class:`xml.dom.minidom.Text` if *event*
      equals :data:`CHARACTERS`. The returned node does not contain information
      about its children unless :meth:`expandNode` is called.

   .. method:: expandNode(node)

      Expand all children of *node* into *node*, consuming events from the
      stream up to and including the :data:`END_ELEMENT` event for *node*; those
      events are not returned again by the stream. Example::

         from xml.dom import pulldom

         xml_text = '<html><title>Foo</title> <p>Some text <div>and more</div></p> </html>'
         doc = pulldom.parseString(xml_text)
         for event, node in doc:
             if event == pulldom.START_ELEMENT and node.tagName == 'p':
                 # The children have not been read yet, so this prints only '<p/>'
                 print(node.toxml())
                 doc.expandNode(node)
                 # Now the node holds all its children:
                 # '<p>Some text <div>and more</div></p>'
                 print(node.toxml())

   .. method:: reset()
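Events can also be pulled one at a time by calling :meth:`DOMEventStream.getEvent` directly instead of iterating. The following is a minimal sketch, not part of the module, that combines this with one form of context-related state: a stack of open element names, used to select only the ``item`` elements nested somewhere inside a ``sale`` element. The sample document, the function ``items_on_sale`` and the ``catalog``, ``item``, ``sale`` and ``name`` elements are hypothetical. The sketch assumes, as in the CPython implementation, that :meth:`~DOMEventStream.getEvent` returns ``None`` once the input is exhausted.

```python
from xml.dom import pulldom

# Hypothetical sample document, used only for illustration.
SAMPLE = (
    '<catalog>'
    '<item price="20"><name>Pen</name></item>'
    '<sale><item price="75"><name>Lamp</name></item></sale>'
    '</catalog>'
)


def items_on_sale(xml_text):
    """Return the names of <item> elements nested inside a <sale> element.

    Hypothetical helper: events are pulled one at a time with getEvent(),
    and a stack of open tag names supplies the context that the flat
    event stream does not carry.
    """
    stream = pulldom.parseString(xml_text)
    open_tags = []  # names of the currently open elements, outermost first
    names = []
    while True:
        event = stream.getEvent()
        if event is None:  # assumed: None signals that the input is exhausted
            break
        kind, node = event
        if kind == pulldom.START_ELEMENT:
            if node.tagName == 'item' and 'sale' in open_tags:
                # expandNode() consumes the events up to this element's
                # END_ELEMENT, so that end event is never seen by this loop.
                stream.expandNode(node)
                name = node.getElementsByTagName('name')[0]
                names.append(name.firstChild.data)
            else:
                open_tags.append(node.tagName)
        elif kind == pulldom.END_ELEMENT:
            open_tags.pop()
    return names


print(items_on_sale(SAMPLE))  # ['Lamp']
```

Because :meth:`~DOMEventStream.expandNode` consumes the matching :data:`END_ELEMENT` event, an expanded ``item`` is never pushed onto the stack, which keeps the pushes and pops balanced.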