:mod:`xml.sax.handler` --- Base classes for SAX handlers
========================================================

.. module:: xml.sax.handler
   :synopsis: Base classes for SAX event handlers.

.. moduleauthor:: Lars Marius Garshol <larsga@garshol.priv.no>
.. sectionauthor:: Martin v. Löwis <martin@v.loewis.de>

**Source code:** :source:`Lib/xml/sax/handler.py`

--------------

The SAX API defines five kinds of handlers: content handlers, DTD handlers,
error handlers, entity resolvers and lexical handlers. Applications normally
only need to implement those interfaces whose events they are interested in;
they can implement the interfaces in a single object or in multiple objects.
Handler implementations should inherit from the base classes provided in the
module :mod:`xml.sax.handler`, so that all methods get default implementations.


.. class:: ContentHandler

   This is the main callback interface in SAX, and the one most important to
   applications. The order of events in this interface mirrors the order of the
   information in the document.


.. class:: DTDHandler

   Handle DTD events.

   This interface specifies only those DTD events required for basic parsing
   (unparsed entities and notations).


.. class:: EntityResolver

   Basic interface for resolving entities. If you create an object implementing
   this interface and register it with your parser, the parser will call the
   method in your object to resolve all external entities.


.. class:: ErrorHandler

   Interface used by the parser to present error and warning messages to the
   application. The methods of this object control whether errors are immediately
   converted to exceptions or are handled in some other way.


.. class:: LexicalHandler

   Interface used by the parser to represent low-frequency events which may not
   be of interest to many applications.
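
As a minimal sketch of the subclassing pattern described above, the handler
below overrides only the three :class:`ContentHandler` methods it needs and
inherits the no-op defaults for every other event. The class name
``TitleCollector`` and the sample document are hypothetical; the parse is
driven by :func:`xml.sax.parseString`.

```python
import xml.sax
from xml.sax.handler import ContentHandler


class TitleCollector(ContentHandler):
    """Hypothetical handler: collects the text of every <title> element."""

    def __init__(self):
        super().__init__()
        self.titles = []
        self._chunks = None  # text chunks while inside <title>, else None

    def startElement(self, name, attrs):
        if name == "title":
            self._chunks = []

    def characters(self, content):
        # Character data may be delivered in several chunks, so accumulate it.
        if self._chunks is not None:
            self._chunks.append(content)

    def endElement(self, name):
        if name == "title":
            self.titles.append("".join(self._chunks))
            self._chunks = None


handler = TitleCollector()
xml.sax.parseString(
    b"<library><book><title>Dune</title></book>"
    b"<book><title>Emma</title></book></library>",
    handler,
)
print(handler.titles)  # ['Dune', 'Emma']
```

Buffering in ``characters`` and joining in ``endElement`` avoids assuming that
the parser delivers each text node as a single chunk.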

In addition to these classes, :mod:`xml.sax.handler` provides symbolic constants
for the feature and property names.


.. data:: feature_namespaces

   | value: ``"http://xml.org/sax/features/namespaces"``
   | true: Perform Namespace processing.
   | false: Optionally do not perform Namespace processing (implies
     namespace-prefixes; default).
   | access: (parsing) read-only; (not parsing) read/write


.. data:: feature_namespace_prefixes

   | value: ``"http://xml.org/sax/features/namespace-prefixes"``
   | true: Report the original prefixed names and attributes used for Namespace
     declarations.
   | false: Do not report attributes used for Namespace declarations, and
     optionally do not report original prefixed names (default).
   | access: (parsing) read-only; (not parsing) read/write


.. data:: feature_string_interning

   | value: ``"http://xml.org/sax/features/string-interning"``
   | true: All element names, prefixes, attribute names, Namespace URIs, and
     local names are interned using :func:`sys.intern`.
   | false: Names are not necessarily interned, although they may be (default).
   | access: (parsing) read-only; (not parsing) read/write


.. data:: feature_validation

   | value: ``"http://xml.org/sax/features/validation"``
   | true: Report all validation errors (implies external-general-entities and
     external-parameter-entities).
   | false: Do not report validation errors.
   | access: (parsing) read-only; (not parsing) read/write


.. data:: feature_external_ges

   | value: ``"http://xml.org/sax/features/external-general-entities"``
   | true: Include all external general (text) entities.
   | false: Do not include external general entities.
   | access: (parsing) read-only; (not parsing) read/write


.. data:: feature_external_pes

   | value: ``"http://xml.org/sax/features/external-parameter-entities"``
   | true: Include all external parameter entities, including the external DTD
     subset.
   | false: Do not include any external parameter entities, even the external
     DTD subset.
   | access: (parsing) read-only; (not parsing) read/write


.. data:: all_features

   List of all features.


.. data:: property_lexical_handler

   | value: ``"http://xml.org/sax/properties/lexical-handler"``
   | data type: xml.sax.handler.LexicalHandler (not supported in Python 2)
   | description: An optional extension handler for lexical events like
     comments.
   | access: read/write


.. data:: property_declaration_handler

   | value: ``"http://xml.org/sax/properties/declaration-handler"``
   | data type: xml.sax.sax2lib.DeclHandler (not supported in Python 2)
   | description: An optional extension handler for DTD-related events other
     than notations and unparsed entities.
   | access: read/write


.. data:: property_dom_node

   | value: ``"http://xml.org/sax/properties/dom-node"``
   | data type: org.w3c.dom.Node (not supported in Python 2)
   | description: When parsing, the current DOM node being visited if this is
     a DOM iterator; when not parsing, the root DOM node for iteration.
   | access: (parsing) read-only; (not parsing) read/write


.. data:: property_xml_string

   | value: ``"http://xml.org/sax/properties/xml-string"``
   | data type: String
   | description: The literal string of characters that was the source for the
     current event.
   | access: read-only


.. data:: all_properties

   List of all known property names.


.. _content-handler-objects:

ContentHandler Objects
----------------------

Users are expected to subclass :class:`ContentHandler` to support their
application.
The following methods are called by the parser on the appropriate
events in the input document:


.. method:: ContentHandler.setDocumentLocator(locator)

   Called by the parser to give the application a locator for locating the origin
   of document events.

   SAX parsers are strongly encouraged (though not absolutely required) to supply a
   locator: if one does so, it must supply the locator to the application by
   invoking this method before invoking any of the other methods in the
   :class:`ContentHandler` interface.

   The locator allows the application to determine the end position of any
   document-related event, even if the parser is not reporting an error. Typically,
   the application will use this information for reporting its own errors (such as
   character content that does not match an application's business rules). The
   information returned by the locator is probably not sufficient for use with a
   search engine.

   Note that the locator will return correct information only during the invocation
   of the events in this interface. The application should not attempt to use it at
   any other time.


.. method:: ContentHandler.startDocument()

   Receive notification of the beginning of a document.

   The SAX parser will invoke this method only once, before any other methods in
   this interface or in :class:`DTDHandler` (except for :meth:`setDocumentLocator`).


.. method:: ContentHandler.endDocument()

   Receive notification of the end of a document.

   The SAX parser will invoke this method only once, and it will be the last method
   invoked during the parse. The parser shall not invoke this method until it has
   either abandoned parsing (because of an unrecoverable error) or reached the end
   of input.


.. method:: ContentHandler.startPrefixMapping(prefix, uri)

   Begin the scope of a prefix-URI Namespace mapping.

   The information from this event is not necessary for normal Namespace
   processing: the SAX XML reader will automatically replace prefixes for element
   and attribute names when the ``feature_namespaces`` feature is enabled (it is
   disabled by default).

   There are cases, however, when applications need to use prefixes in character
   data or in attribute values, where they cannot safely be expanded automatically;
   the :meth:`startPrefixMapping` and :meth:`endPrefixMapping` events supply the
   information to the application to expand prefixes in those contexts itself, if
   necessary.

   Note that :meth:`startPrefixMapping` and :meth:`endPrefixMapping` events are not
   guaranteed to be properly nested relative to each other: all
   :meth:`startPrefixMapping` events will occur before the corresponding
   :meth:`startElement` event, and all :meth:`endPrefixMapping` events will occur
   after the corresponding :meth:`endElement` event, but their order is not
   guaranteed.


.. method:: ContentHandler.endPrefixMapping(prefix)

   End the scope of a prefix-URI mapping.

   See :meth:`startPrefixMapping` for details. This event will always occur after
   the corresponding :meth:`endElement` event, but the order of
   :meth:`endPrefixMapping` events is not otherwise guaranteed.


.. method:: ContentHandler.startElement(name, attrs)

   Signals the start of an element in non-namespace mode.

   The *name* parameter contains the raw XML 1.0 name of the element type as a
   string and the *attrs* parameter holds an object of the
   :class:`~xml.sax.xmlreader.Attributes` interface (see
   :ref:`attributes-objects`) containing the attributes of the element. The
   object passed as *attrs* may be re-used by the parser; holding on to a
   reference to it is not a reliable way to keep a copy of the attributes.
   To keep a copy of the attributes, use the :meth:`copy` method of the *attrs*
   object.


.. method:: ContentHandler.endElement(name)

   Signals the end of an element in non-namespace mode.

   The *name* parameter contains the name of the element type, just as with the
   :meth:`startElement` event.


.. method:: ContentHandler.startElementNS(name, qname, attrs)

   Signals the start of an element in namespace mode.

   The *name* parameter contains the name of the element type as a
   ``(uri, localname)`` tuple, the *qname* parameter contains the raw XML 1.0
   name used in the source document, and the *attrs* parameter holds an instance
   of the :class:`~xml.sax.xmlreader.AttributesNS` interface (see
   :ref:`attributes-ns-objects`) containing the attributes of the element. If no
   namespace is associated with the element, the *uri* component of *name* will
   be ``None``. The object passed as *attrs* may be re-used by the parser;
   holding on to a reference to it is not a reliable way to keep a copy of the
   attributes. To keep a copy of the attributes, use the :meth:`copy` method of
   the *attrs* object.

   Parsers may set the *qname* parameter to ``None``, unless the
   ``feature_namespace_prefixes`` feature is activated.


.. method:: ContentHandler.endElementNS(name, qname)

   Signals the end of an element in namespace mode.

   The *name* parameter contains the name of the element type, just as with the
   :meth:`startElementNS` method; the same holds for the *qname* parameter.


.. method:: ContentHandler.characters(content)

   Receive notification of character data.

   The parser will call this method to report each chunk of character data.
   SAX parsers may return all contiguous character data in a single chunk, or
   they may split it into several chunks; however, all of the characters in any
   single event must come from the same external entity, so that the Locator
   provides useful information.

   *content* may be a string or bytes instance; the ``expat`` reader module
   always produces strings.

   .. note::

      The earlier SAX 1 interface provided by the Python XML Special Interest Group
      used a more Java-like interface for this method. Since most parsers used from
      Python did not take advantage of the older interface, the simpler signature was
      chosen to replace it. To convert old code to the new interface, use *content*
      instead of slicing content with the old *offset* and *length* parameters.


.. method:: ContentHandler.ignorableWhitespace(whitespace)

   Receive notification of ignorable whitespace in element content.

   Validating parsers must use this method to report each chunk of ignorable
   whitespace (see the W3C XML 1.0 recommendation, section 2.10): non-validating
   parsers may also use this method if they are capable of parsing and using
   content models.

   SAX parsers may return all contiguous whitespace in a single chunk, or they may
   split it into several chunks; however, all of the characters in any single event
   must come from the same external entity, so that the Locator provides useful
   information.


.. method:: ContentHandler.processingInstruction(target, data)

   Receive notification of a processing instruction.

   The parser will invoke this method once for each processing instruction found:
   note that processing instructions may occur before or after the main document
   element.

   A SAX parser should never report an XML declaration (XML 1.0, section 2.8) or a
   text declaration (XML 1.0, section 4.3.1) using this method.


.. method:: ContentHandler.skippedEntity(name)

   Receive notification of a skipped entity.

   The parser will invoke this method once for each entity skipped. Non-validating
   processors may skip entities if they have not seen the declarations (because,
   for example, the entity was declared in an external DTD subset). All processors
   may skip external entities, depending on the values of the
   ``feature_external_ges`` and the ``feature_external_pes`` features.


.. _dtd-handler-objects:

DTDHandler Objects
------------------

:class:`DTDHandler` instances provide the following methods:


.. method:: DTDHandler.notationDecl(name, publicId, systemId)

   Handle a notation declaration event.


.. method:: DTDHandler.unparsedEntityDecl(name, publicId, systemId, ndata)

   Handle an unparsed entity declaration event.


.. _entity-resolver-objects:

EntityResolver Objects
----------------------


.. method:: EntityResolver.resolveEntity(publicId, systemId)

   Resolve the system identifier of an entity and return either the system
   identifier to read from as a string, or an
   :class:`~xml.sax.xmlreader.InputSource` to read from. The default
   implementation returns *systemId*.


.. _sax-error-handler:

ErrorHandler Objects
--------------------

Objects with this interface are used to receive error and warning information
from the :class:`~xml.sax.xmlreader.XMLReader`. If you create an object that
implements this interface and register it with your
:class:`~xml.sax.xmlreader.XMLReader`, the parser will call the methods in your
object to report all warnings and errors. There are three levels of errors
available: warnings, (possibly) recoverable errors, and unrecoverable errors.
All methods take a :exc:`SAXParseException` as the only parameter. Errors and
warnings may be converted to an exception by raising the passed-in exception
object.
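
The following is a minimal sketch of a custom error policy built on the
behavior described above: every method receives a :exc:`SAXParseException`,
and raising it turns the problem into an exception. The class name
``RecordingErrorHandler`` and its "record everything, re-raise only fatal
errors" policy are hypothetical. The sketch assumes CPython's default
expat-based reader, which reports malformed input through
:meth:`ErrorHandler.fatalError`.

```python
import xml.sax
from xml.sax.handler import ContentHandler, ErrorHandler


class RecordingErrorHandler(ErrorHandler):
    """Hypothetical policy: record all problems, re-raise only fatal ones."""

    def __init__(self):
        super().__init__()
        self.problems = []

    def warning(self, exception):
        # Returning normally lets parsing continue.
        self.problems.append(("warning", exception))

    def error(self, exception):
        # Not raising allows the parser to continue, if it is able to.
        self.problems.append(("error", exception))

    def fatalError(self, exception):
        self.problems.append(("fatal", exception))
        raise exception  # parsing cannot continue anyway


errors = RecordingErrorHandler()
try:
    # The mismatched end tag makes this document malformed.
    xml.sax.parseString(b"<root><unclosed></root>", ContentHandler(), errors)
except xml.sax.SAXParseException as exc:
    print("line", exc.getLineNumber(), exc.getMessage())
```

The handler is passed as the third positional argument of
:func:`xml.sax.parseString`. When a parser is built with
:func:`xml.sax.make_parser` instead, the same object can be registered with
:meth:`~xml.sax.xmlreader.XMLReader.setErrorHandler`.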


.. method:: ErrorHandler.error(exception)

   Called when the parser encounters a recoverable error. If this method does not
   raise an exception, parsing may continue, but further document information
   should not be expected by the application. Allowing the parser to continue may
   allow additional errors to be discovered in the input document.


.. method:: ErrorHandler.fatalError(exception)

   Called when the parser encounters an error it cannot recover from; parsing is
   expected to terminate when this method returns.


.. method:: ErrorHandler.warning(exception)

   Called when the parser presents minor warning information to the application.
   Parsing is expected to continue when this method returns, and document
   information will continue to be passed to the application. Raising an exception
   in this method will cause parsing to end.


.. _lexical-handler-objects:

LexicalHandler Objects
----------------------

Optional SAX2 handler for lexical events.

This handler is used to obtain lexical information about an XML
document. Lexical information includes information describing the
document encoding used and XML comments embedded in the document, as
well as section boundaries for the DTD and for any CDATA sections.
The lexical handlers are used in the same manner as content handlers.

Set the :class:`LexicalHandler` of an :class:`~xml.sax.xmlreader.XMLReader`
by using its :meth:`~xml.sax.xmlreader.XMLReader.setProperty` method with the
property identifier ``'http://xml.org/sax/properties/lexical-handler'``
(available as :data:`property_lexical_handler`).


.. method:: LexicalHandler.comment(content)

   Reports a comment anywhere in the document (including the DTD and
   outside the document element).

.. method:: LexicalHandler.startDTD(name, public_id, system_id)

   Reports the start of the DTD declarations if the document has an
   associated DTD.

.. method:: LexicalHandler.endDTD()

   Reports the end of the DTD declarations.

.. method:: LexicalHandler.startCDATA()

   Reports the start of a CDATA marked section.

   The contents of the CDATA marked section will be reported through
   the :meth:`ContentHandler.characters` handler.

.. method:: LexicalHandler.endCDATA()

   Reports the end of a CDATA marked section.