:mod:`xml.dom.pulldom` --- Support for building partial DOM trees
=================================================================

.. module:: xml.dom.pulldom
   :synopsis: Support for building partial DOM trees from SAX events.

.. moduleauthor:: Paul Prescod <paul@prescod.net>

**Source code:** :source:`Lib/xml/dom/pulldom.py`

--------------

The :mod:`xml.dom.pulldom` module provides a "pull parser" which can also be
asked to produce DOM-accessible fragments of the document where necessary. The
basic concept involves pulling "events" from a stream of incoming XML and
processing them. In contrast to SAX, which also employs an event-driven
processing model but delivers events through callbacks, the user of a pull
parser is responsible for explicitly pulling events from the stream, looping
over those events until either processing is finished or an error occurs.


.. warning::

   The :mod:`xml.dom.pulldom` module is not secure against
   maliciously constructed data.  If you need to parse untrusted or
   unauthenticated data, see :ref:`xml-vulnerabilities`.


Example::

   from xml.dom import pulldom

   doc = pulldom.parse('sales_items.xml')
   for event, node in doc:
       if event == pulldom.START_ELEMENT and node.tagName == 'item':
           if int(node.getAttribute('price')) > 50:
               doc.expandNode(node)
               print(node.toxml())

``event`` is a constant and can be one of:

* :data:`START_ELEMENT`
* :data:`END_ELEMENT`
* :data:`COMMENT`
* :data:`START_DOCUMENT`
* :data:`END_DOCUMENT`
* :data:`CHARACTERS`
* :data:`PROCESSING_INSTRUCTION`
* :data:`IGNORABLE_WHITESPACE`

``node`` is an object of type :class:`xml.dom.minidom.Document`,
:class:`xml.dom.minidom.Element` or :class:`xml.dom.minidom.Text`.

Since the document is treated as a "flat" stream of events, the document "tree"
is implicitly traversed and the desired elements are found regardless of their
depth in the tree. In other words, one does not need to consider hierarchical
issues such as recursive searching of the document nodes. If the context of
elements is important, however, one needs either to maintain some
context-related state (that is, remember where one is in the document at any
given point) or to use the :meth:`DOMEventStream.expandNode` method and switch
to DOM-related processing.


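The first option, maintaining context-related state, can be sketched as
follows. This is only an illustration: the helper ``titles_under``, the sample
document and the tag names are hypothetical and not part of the module. The
sketch keeps a stack of open element names and uses it to decide which text
to collect.

```python
from xml.dom import pulldom

# Hypothetical sample document, used for illustration only.
XML = (
    "<catalog>"
    "<book><title>A</title></book>"
    "<cd><title>B</title></cd>"
    "</catalog>"
)


def titles_under(xml_text, parent_tag):
    """Collect the text of <title> elements whose direct parent is parent_tag."""
    path = []    # names of the currently open elements, outermost first
    titles = []
    for event, node in pulldom.parseString(xml_text):
        if event == pulldom.START_ELEMENT:
            path.append(node.tagName)
            if path[-2:] == [parent_tag, "title"]:
                titles.append("")   # start collecting text for this title
        elif event == pulldom.END_ELEMENT:
            path.pop()
        elif event == pulldom.CHARACTERS and path[-2:] == [parent_tag, "title"]:
            # Text may arrive in several chunks, so append rather than assign.
            titles[-1] += node.data
    return titles


book_titles = titles_under(XML, "book")
```

Here ``book_titles`` contains only the title found directly under ``<book>``;
the ``<title>`` under ``<cd>`` is skipped because the stack shows a different
parent.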
.. class:: PullDOM(documentFactory=None)

   Subclass of :class:`xml.sax.handler.ContentHandler`.


.. class:: SAX2DOM(documentFactory=None)

   Subclass of :class:`xml.sax.handler.ContentHandler`.


.. function:: parse(stream_or_string, parser=None, bufsize=None)

   Return a :class:`DOMEventStream` from the given input. *stream_or_string* may be
   either a file name or a file-like object. *parser*, if given, must be an
   :class:`~xml.sax.xmlreader.XMLReader` object. This function will change the
   content handler of the parser and activate namespace support; any other
   parser configuration (such as setting an entity resolver) must be done in
   advance.

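As a minimal sketch of passing a file-like object rather than a file name to
:func:`parse`, the following feeds an in-memory :class:`io.BytesIO` buffer to
the parser. The document content and the ``price`` attribute are made up for
illustration.

```python
import io
from xml.dom import pulldom

# Hypothetical in-memory document standing in for a file on disk.
data = io.BytesIO(
    b"<sales>"
    b"<item price='75'/>"
    b"<item price='20'/>"
    b"<item price='120'/>"
    b"</sales>"
)

prices = []
for event, node in pulldom.parse(data):
    if event == pulldom.START_ELEMENT and node.tagName == "item":
        prices.append(int(node.getAttribute("price")))
```

After the loop, ``prices`` holds the integer prices in document order.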
If you have XML in a string, you can use the :func:`parseString` function instead:

.. function:: parseString(string, parser=None)

   Return a :class:`DOMEventStream` that represents the (Unicode) *string*.

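A short sketch of :func:`parseString` in use: it gathers the character data of
a small, made-up document by listening for :data:`CHARACTERS` events. The
parser may deliver text in more than one chunk, so the pieces are joined at
the end.

```python
from xml.dom import pulldom

# Hypothetical sample input.
source = "<note>Hello, <b>pull</b> parser</note>"

parts = []
for event, node in pulldom.parseString(source):
    if event == pulldom.CHARACTERS:
        parts.append(node.data)   # node is a minidom Text node

text = "".join(parts)
```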
.. data:: default_bufsize

   Default value for the *bufsize* parameter to :func:`parse`.

   The value of this variable can be changed before calling :func:`parse` and
   the new value will take effect.

.. _domeventstream-objects:

DOMEventStream Objects
----------------------

.. class:: DOMEventStream(stream, parser, bufsize)


   .. method:: getEvent()

      Return a tuple containing *event* and the current *node*. The node is a
      :class:`xml.dom.minidom.Document` if *event* equals :data:`START_DOCUMENT`,
      an :class:`xml.dom.minidom.Element` if *event* equals :data:`START_ELEMENT` or
      :data:`END_ELEMENT`, or an :class:`xml.dom.minidom.Text` if *event* equals
      :data:`CHARACTERS`.
      The current node does not contain information about its children unless
      :meth:`expandNode` is called.

   .. method:: expandNode(node)

      Expands all children of *node* into *node*. Example::

          from xml.dom import pulldom

          xml = '<html><title>Foo</title> <p>Some text <div>and more</div></p> </html>'
          doc = pulldom.parseString(xml)
          for event, node in doc:
              if event == pulldom.START_ELEMENT and node.tagName == 'p':
                  # Following statement only prints '<p/>'
                  print(node.toxml())
                  doc.expandNode(node)
                  # Following statement prints node with all its children '<p>Some text <div>and more</div></p>'
                  print(node.toxml())

   .. method:: DOMEventStream.reset()

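Events can also be pulled one at a time by calling
:meth:`DOMEventStream.getEvent` directly instead of iterating over the stream.
The sketch below assumes, based on the behaviour of the CPython implementation
rather than on anything stated above, that :meth:`getEvent` returns ``None``
once the input is exhausted. The sample document is made up.

```python
from xml.dom import pulldom

# Hypothetical sample input.
stream = pulldom.parseString("<root><a/><b/></root>")

tags = []
while True:
    item = stream.getEvent()
    if item is None:          # assumption: None signals the end of the stream
        break
    event, node = item
    if event == pulldom.START_ELEMENT:
        tags.append(node.tagName)
```

For most uses the ``for event, node in stream`` form is shorter; an explicit
loop like this one is mainly useful when events are consumed from several
places in the code.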