Name | Date | Size | #Lines | LOC
---|---|---|---|---
doc/ | 03-May-2024 | - | 1,213 | 775
html5lib/ | 03-May-2024 | - | 14,161 | 11,788
utils/ | 03-May-2024 | - | 236 | 189
.gitignore | 03-May-2024 | 1.8 KiB | 83 | 71
.gitmodules | 03-May-2024 | 109 | 4 | 3
.travis.yml | 03-May-2024 | 580 | 38 | 31
AUTHORS.rst | 03-May-2024 | 651 | 44 | 38
CHANGES.rst | 03-May-2024 | 5.3 KiB | 218 | 133
CONTRIBUTING.rst | 03-May-2024 | 2.4 KiB | 61 | 43
LICENSE | 03-May-2024 | 1.1 KiB | 21 | 17
MANIFEST.in | 03-May-2024 | 149 | 7 | 6
README.chromium | 03-May-2024 | 291 | 12 | 9
README.rst | 03-May-2024 | 4.2 KiB | 158 | 103
debug-info.py | 03-May-2024 | 779 | 38 | 26
flake8-run.sh | 03-May-2024 | 393 | 15 | 11
parse.py | 03-May-2024 | 8.9 KiB | 242 | 196
requirements-install.sh | 03-May-2024 | 537 | 17 | 12
requirements-optional-2.6.txt | 03-May-2024 | 126 | 6 | 4
requirements-optional-cpython.txt | 03-May-2024 | 143 | 6 | 4
requirements-optional.txt | 03-May-2024 | 334 | 14 | 10
requirements-test.txt | 03-May-2024 | 58 | 6 | 4
requirements.txt | 03-May-2024 | 4 | 2 | 1
setup.py | 03-May-2024 | 2.2 KiB | 59 | 53
tox.ini | 03-May-2024 | 513 | 31 | 27

README.chromium
Name: html5lib-python
Short Name: html5lib
URL: https://github.com/html5lib/html5lib-python
Version: 01b1ebb7ce0146b8082b1a7315431aac023eb046
License: MIT

Description:
Standards-compliant library for parsing and serializing HTML documents and
fragments in Python

Local Modifications: None

README.rst
html5lib
========

.. image:: https://travis-ci.org/html5lib/html5lib-python.png?branch=master
  :target: https://travis-ci.org/html5lib/html5lib-python

html5lib is a pure-python library for parsing HTML. It is designed to
conform to the WHATWG HTML specification, as implemented by all major
web browsers.


Usage
-----

Simple usage follows this pattern:

.. code-block:: python

  import html5lib
  with open("mydocument.html", "rb") as f:
      document = html5lib.parse(f)

or:

.. code-block:: python

  import html5lib
  document = html5lib.parse("<p>Hello World!")

By default, the ``document`` will be an ``xml.etree`` element instance.
Whenever possible, html5lib chooses the accelerated ``ElementTree``
implementation (i.e. ``xml.etree.cElementTree`` on Python 2.x).

Two other tree types are supported: ``xml.dom.minidom`` and
``lxml.etree``. To use an alternative format, specify the name of
a treebuilder:

.. code-block:: python

  import html5lib
  with open("mydocument.html", "rb") as f:
      lxml_etree_document = html5lib.parse(f, treebuilder="lxml")

When using html5lib with ``urllib2`` (Python 2), the charset from HTTP
should be passed into html5lib as follows:

.. code-block:: python

  from contextlib import closing
  from urllib2 import urlopen
  import html5lib

  with closing(urlopen("http://example.com/")) as f:
      document = html5lib.parse(f, encoding=f.info().getparam("charset"))

When using html5lib with ``urllib.request`` (Python 3), the charset from
HTTP should be passed into html5lib as follows:

.. code-block:: python

  from urllib.request import urlopen
  import html5lib

  with urlopen("http://example.com/") as f:
      document = html5lib.parse(f, encoding=f.info().get_content_charset())

To have more control over the parser, create a parser object explicitly.
For instance, to make the parser raise exceptions on parse errors, use:

.. code-block:: python

  import html5lib
  with open("mydocument.html", "rb") as f:
      parser = html5lib.HTMLParser(strict=True)
      document = parser.parse(f)

When you're instantiating parser objects explicitly, pass a treebuilder
class as the ``tree`` keyword argument to use an alternative document
format:

.. code-block:: python

  import html5lib
  parser = html5lib.HTMLParser(tree=html5lib.getTreeBuilder("dom"))
  minidom_document = parser.parse("<p>Hello World!")

More documentation is available at http://html5lib.readthedocs.org/.


Installation
------------

html5lib works on CPython 2.6+, CPython 3.2+ and PyPy. To install it,
use:

.. code-block:: bash

  $ pip install html5lib


Optional Dependencies
---------------------

The following third-party libraries may be used for additional
functionality:

- ``datrie`` can be used to improve parsing performance (though in
  almost all cases the improvement is marginal);

- ``lxml`` is supported as a tree format (for both building and
  walking) under CPython (but *not* PyPy where it is known to cause
  segfaults);

- ``genshi`` has a treewalker (but not builder);

- ``charade`` can be used as a fallback when character encoding cannot
  be determined; ``chardet``, from which it was forked, can also be used
  on Python 2; and

- ``ordereddict`` can be used under Python 2.6
  (``collections.OrderedDict`` is used instead on later versions) to
  serialize attributes in alphabetical order.


Bugs
----

Please report any bugs on the `issue tracker
<https://github.com/html5lib/html5lib-python/issues>`_.


Tests
-----

Unit tests require the ``nose`` library and can be run using the
``nosetests`` command in the root directory; ``ordereddict`` is
required under Python 2.6. All should pass.

Test data are contained in a separate `html5lib-tests
<https://github.com/html5lib/html5lib-tests>`_ repository and included
as a submodule, thus for git checkouts they must be initialized::

  $ git submodule init
  $ git submodule update

If you have all compatible Python implementations available on your
system, you can run tests on all of them using the ``tox`` utility,
which can be found on PyPI.


Questions?
----------

There's a mailing list available for support on Google Groups,
`html5lib-discuss <http://groups.google.com/group/html5lib-discuss>`_,
though you may get a quicker response asking on IRC in `#whatwg on
irc.freenode.net <http://wiki.whatwg.org/wiki/IRC>`_.
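
A note on the ``urllib.request`` example under Usage: the object returned by ``f.info()`` is an ``email.message.Message`` subclass, so the charset lookup can be tried offline with the standard library alone. The sketch below is only an illustration and not part of html5lib; ``charset_from_content_type`` is a hypothetical helper name, and the header values are made up.

```python
from email.message import Message


def charset_from_content_type(content_type):
    """Return the lower-cased charset parameter of a Content-Type value, or None.

    Hypothetical helper: it mimics what f.info().get_content_charset()
    reports for a response carrying this Content-Type header.
    """
    msg = Message()
    msg["Content-Type"] = content_type
    # get_content_charset() lower-cases the value and returns None
    # when the header has no charset parameter.
    return msg.get_content_charset()


if __name__ == "__main__":
    print(charset_from_content_type("text/html; charset=UTF-8"))
    print(charset_from_content_type("text/html"))
```

When the header carries no charset the helper returns ``None``; passing ``None`` as ``encoding`` should be the same as omitting the argument, which presumably leaves encoding detection to html5lib.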