1<html> 2<head> 3<meta http-equiv="Content-Type" content="text/html; charset=UTF-8"> 4<title>HTMLparser: interface for an HTML 4.0 non-verifying parser</title> 5<meta name="generator" content="Libxml2 devhelp stylesheet"> 6<link rel="start" href="index.html" title="libxml2 Reference Manual"> 7<link rel="up" href="general.html" title="API"> 8<link rel="stylesheet" href="style.css" type="text/css"> 9<link rel="chapter" href="general.html" title="API"> 10</head> 11<body bgcolor="white" text="black" link="#0000FF" vlink="#840084" alink="#0000FF"> 12<table class="navigation" width="100%" summary="Navigation header" cellpadding="2" cellspacing="2"><tr valign="middle"> 13<td><a accesskey="u" href="general.html"><img src="up.png" width="24" height="24" border="0" alt="Up"></a></td> 14<td><a accesskey="h" href="index.html"><img src="home.png" width="24" height="24" border="0" alt="Home"></a></td> 15<td><a accesskey="n" href="libxml2-HTMLtree.html"><img src="right.png" width="24" height="24" border="0" alt="Next"></a></td> 16<th width="100%" align="center">libxml2 Reference Manual</th> 17</tr></table> 18<h2><span class="refentrytitle">HTMLparser</span></h2> 19<p>HTMLparser - interface for an HTML 4.0 non-verifying parser</p> 20<p>This module implements an HTML 4.0 non-verifying parser with an API compatible with that of the XML parser. It should be able to parse "real world" HTML, even if it is severely broken from a specification point of view. 
</p> 21<p>Author(s): Daniel Veillard </p> 22<div class="refsynopsisdiv"> 23<h2>Synopsis</h2> 24<pre class="synopsis">#define <a href="#htmlDefaultSubelement">htmlDefaultSubelement</a>(elt); 25#define <a href="#htmlElementAllowedHereDesc">htmlElementAllowedHereDesc</a>(parent, elt); 26#define <a href="#htmlRequiredAttrs">htmlRequiredAttrs</a>(elt); 27typedef <a href="libxml2-tree.html#xmlDocPtr">xmlDocPtr</a> <a href="#htmlDocPtr">htmlDocPtr</a>; 28typedef struct _htmlElemDesc <a href="#htmlElemDesc">htmlElemDesc</a>; 29typedef <a href="libxml2-HTMLparser.html#htmlElemDesc">htmlElemDesc</a> * <a href="#htmlElemDescPtr">htmlElemDescPtr</a>; 30typedef struct _htmlEntityDesc <a href="#htmlEntityDesc">htmlEntityDesc</a>; 31typedef <a href="libxml2-HTMLparser.html#htmlEntityDesc">htmlEntityDesc</a> * <a href="#htmlEntityDescPtr">htmlEntityDescPtr</a>; 32typedef <a href="libxml2-tree.html#xmlNodePtr">xmlNodePtr</a> <a href="#htmlNodePtr">htmlNodePtr</a>; 33typedef <a href="libxml2-tree.html#xmlParserCtxt">xmlParserCtxt</a> <a href="#htmlParserCtxt">htmlParserCtxt</a>; 34typedef <a href="libxml2-tree.html#xmlParserCtxtPtr">xmlParserCtxtPtr</a> <a href="#htmlParserCtxtPtr">htmlParserCtxtPtr</a>; 35typedef <a href="libxml2-tree.html#xmlParserInput">xmlParserInput</a> <a href="#htmlParserInput">htmlParserInput</a>; 36typedef <a href="libxml2-tree.html#xmlParserInputPtr">xmlParserInputPtr</a> <a href="#htmlParserInputPtr">htmlParserInputPtr</a>; 37typedef <a href="libxml2-parser.html#xmlParserNodeInfo">xmlParserNodeInfo</a> <a href="#htmlParserNodeInfo">htmlParserNodeInfo</a>; 38typedef enum <a href="#htmlParserOption">htmlParserOption</a>; 39typedef <a href="libxml2-tree.html#xmlSAXHandler">xmlSAXHandler</a> <a href="#htmlSAXHandler">htmlSAXHandler</a>; 40typedef <a href="libxml2-tree.html#xmlSAXHandlerPtr">xmlSAXHandlerPtr</a> <a href="#htmlSAXHandlerPtr">htmlSAXHandlerPtr</a>; 41typedef enum <a href="#htmlStatus">htmlStatus</a>; 42int <a href="#UTF8ToHtml">UTF8ToHtml</a> 
(unsigned char * out, <br> int * outlen, <br> const unsigned char * in, <br> int * inlen); 43<a href="libxml2-HTMLparser.html#htmlStatus">htmlStatus</a> <a href="#htmlAttrAllowed">htmlAttrAllowed</a> (const <a href="libxml2-HTMLparser.html#htmlElemDesc">htmlElemDesc</a> * elt, <br> const <a href="libxml2-xmlstring.html#xmlChar">xmlChar</a> * attr, <br> int legacy); 44int <a href="#htmlAutoCloseTag">htmlAutoCloseTag</a> (<a href="libxml2-HTMLparser.html#htmlDocPtr">htmlDocPtr</a> doc, <br> const <a href="libxml2-xmlstring.html#xmlChar">xmlChar</a> * name, <br> <a href="libxml2-HTMLparser.html#htmlNodePtr">htmlNodePtr</a> elem); 45<a href="libxml2-HTMLparser.html#htmlParserCtxtPtr">htmlParserCtxtPtr</a> <a href="#htmlCreateFileParserCtxt">htmlCreateFileParserCtxt</a> (const char * filename, <br> const char * encoding); 46<a href="libxml2-HTMLparser.html#htmlParserCtxtPtr">htmlParserCtxtPtr</a> <a href="#htmlCreateMemoryParserCtxt">htmlCreateMemoryParserCtxt</a> (const char * buffer, <br> int size); 47<a href="libxml2-HTMLparser.html#htmlParserCtxtPtr">htmlParserCtxtPtr</a> <a href="#htmlCreatePushParserCtxt">htmlCreatePushParserCtxt</a> (<a href="libxml2-HTMLparser.html#htmlSAXHandlerPtr">htmlSAXHandlerPtr</a> sax, <br> void * user_data, <br> const char * chunk, <br> int size, <br> const char * filename, <br> <a href="libxml2-encoding.html#xmlCharEncoding">xmlCharEncoding</a> enc); 48<a href="libxml2-HTMLparser.html#htmlDocPtr">htmlDocPtr</a> <a href="#htmlCtxtReadDoc">htmlCtxtReadDoc</a> (<a href="libxml2-HTMLparser.html#htmlParserCtxtPtr">htmlParserCtxtPtr</a> ctxt, <br> const <a href="libxml2-xmlstring.html#xmlChar">xmlChar</a> * cur, <br> const char * URL, <br> const char * encoding, <br> int options); 49<a href="libxml2-HTMLparser.html#htmlDocPtr">htmlDocPtr</a> <a href="#htmlCtxtReadFd">htmlCtxtReadFd</a> (<a href="libxml2-HTMLparser.html#htmlParserCtxtPtr">htmlParserCtxtPtr</a> ctxt, <br> int fd, <br> const char * URL, <br> const char * encoding, <br> int 
options); 50<a href="libxml2-HTMLparser.html#htmlDocPtr">htmlDocPtr</a> <a href="#htmlCtxtReadFile">htmlCtxtReadFile</a> (<a href="libxml2-HTMLparser.html#htmlParserCtxtPtr">htmlParserCtxtPtr</a> ctxt, <br> const char * filename, <br> const char * encoding, <br> int options); 51<a href="libxml2-HTMLparser.html#htmlDocPtr">htmlDocPtr</a> <a href="#htmlCtxtReadIO">htmlCtxtReadIO</a> (<a href="libxml2-HTMLparser.html#htmlParserCtxtPtr">htmlParserCtxtPtr</a> ctxt, <br> <a href="libxml2-xmlIO.html#xmlInputReadCallback">xmlInputReadCallback</a> ioread, <br> <a href="libxml2-xmlIO.html#xmlInputCloseCallback">xmlInputCloseCallback</a> ioclose, <br> void * ioctx, <br> const char * URL, <br> const char * encoding, <br> int options); 52<a href="libxml2-HTMLparser.html#htmlDocPtr">htmlDocPtr</a> <a href="#htmlCtxtReadMemory">htmlCtxtReadMemory</a> (<a href="libxml2-HTMLparser.html#htmlParserCtxtPtr">htmlParserCtxtPtr</a> ctxt, <br> const char * buffer, <br> int size, <br> const char * URL, <br> const char * encoding, <br> int options); 53void <a href="#htmlCtxtReset">htmlCtxtReset</a> (<a href="libxml2-HTMLparser.html#htmlParserCtxtPtr">htmlParserCtxtPtr</a> ctxt); 54int <a href="#htmlCtxtUseOptions">htmlCtxtUseOptions</a> (<a href="libxml2-HTMLparser.html#htmlParserCtxtPtr">htmlParserCtxtPtr</a> ctxt, <br> int options); 55int <a href="#htmlElementAllowedHere">htmlElementAllowedHere</a> (const <a href="libxml2-HTMLparser.html#htmlElemDesc">htmlElemDesc</a> * parent, <br> const <a href="libxml2-xmlstring.html#xmlChar">xmlChar</a> * elt); 56<a href="libxml2-HTMLparser.html#htmlStatus">htmlStatus</a> <a href="#htmlElementStatusHere">htmlElementStatusHere</a> (const <a href="libxml2-HTMLparser.html#htmlElemDesc">htmlElemDesc</a> * parent, <br> const <a href="libxml2-HTMLparser.html#htmlElemDesc">htmlElemDesc</a> * elt); 57int <a href="#htmlEncodeEntities">htmlEncodeEntities</a> (unsigned char * out, <br> int * outlen, <br> const unsigned char * in, <br> int * inlen, <br> int 
quoteChar); 58const <a href="libxml2-HTMLparser.html#htmlEntityDesc">htmlEntityDesc</a> * <a href="#htmlEntityLookup">htmlEntityLookup</a> (const <a href="libxml2-xmlstring.html#xmlChar">xmlChar</a> * name); 59const <a href="libxml2-HTMLparser.html#htmlEntityDesc">htmlEntityDesc</a> * <a href="#htmlEntityValueLookup">htmlEntityValueLookup</a> (unsigned int value); 60void <a href="#htmlFreeParserCtxt">htmlFreeParserCtxt</a> (<a href="libxml2-HTMLparser.html#htmlParserCtxtPtr">htmlParserCtxtPtr</a> ctxt); 61int <a href="#htmlHandleOmittedElem">htmlHandleOmittedElem</a> (int val); 62void <a href="#htmlInitAutoClose">htmlInitAutoClose</a> (void); 63int <a href="#htmlIsAutoClosed">htmlIsAutoClosed</a> (<a href="libxml2-HTMLparser.html#htmlDocPtr">htmlDocPtr</a> doc, <br> <a href="libxml2-HTMLparser.html#htmlNodePtr">htmlNodePtr</a> elem); 64int <a href="#htmlIsScriptAttribute">htmlIsScriptAttribute</a> (const <a href="libxml2-xmlstring.html#xmlChar">xmlChar</a> * name); 65<a href="libxml2-HTMLparser.html#htmlParserCtxtPtr">htmlParserCtxtPtr</a> <a href="#htmlNewParserCtxt">htmlNewParserCtxt</a> (void); 66<a href="libxml2-HTMLparser.html#htmlParserCtxtPtr">htmlParserCtxtPtr</a> <a href="#htmlNewSAXParserCtxt">htmlNewSAXParserCtxt</a> (<a href="libxml2-HTMLparser.html#htmlSAXHandlerPtr">htmlSAXHandlerPtr</a> sax, <br> void * userData); 67<a href="libxml2-HTMLparser.html#htmlStatus">htmlStatus</a> <a href="#htmlNodeStatus">htmlNodeStatus</a> (const <a href="libxml2-HTMLparser.html#htmlNodePtr">htmlNodePtr</a> node, <br> int legacy); 68int <a href="#htmlParseCharRef">htmlParseCharRef</a> (<a href="libxml2-HTMLparser.html#htmlParserCtxtPtr">htmlParserCtxtPtr</a> ctxt); 69int <a href="#htmlParseChunk">htmlParseChunk</a> (<a href="libxml2-HTMLparser.html#htmlParserCtxtPtr">htmlParserCtxtPtr</a> ctxt, <br> const char * chunk, <br> int size, <br> int terminate); 70<a href="libxml2-HTMLparser.html#htmlDocPtr">htmlDocPtr</a> <a href="#htmlParseDoc">htmlParseDoc</a> (const <a 
href="libxml2-xmlstring.html#xmlChar">xmlChar</a> * cur, <br> const char * encoding); 71int <a href="#htmlParseDocument">htmlParseDocument</a> (<a href="libxml2-HTMLparser.html#htmlParserCtxtPtr">htmlParserCtxtPtr</a> ctxt); 72void <a href="#htmlParseElement">htmlParseElement</a> (<a href="libxml2-HTMLparser.html#htmlParserCtxtPtr">htmlParserCtxtPtr</a> ctxt); 73const <a href="libxml2-HTMLparser.html#htmlEntityDesc">htmlEntityDesc</a> * <a href="#htmlParseEntityRef">htmlParseEntityRef</a> (<a href="libxml2-HTMLparser.html#htmlParserCtxtPtr">htmlParserCtxtPtr</a> ctxt, <br> const <a href="libxml2-xmlstring.html#xmlChar">xmlChar</a> ** str); 74<a href="libxml2-HTMLparser.html#htmlDocPtr">htmlDocPtr</a> <a href="#htmlParseFile">htmlParseFile</a> (const char * filename, <br> const char * encoding); 75<a href="libxml2-HTMLparser.html#htmlDocPtr">htmlDocPtr</a> <a href="#htmlReadDoc">htmlReadDoc</a> (const <a href="libxml2-xmlstring.html#xmlChar">xmlChar</a> * cur, <br> const char * URL, <br> const char * encoding, <br> int options); 76<a href="libxml2-HTMLparser.html#htmlDocPtr">htmlDocPtr</a> <a href="#htmlReadFd">htmlReadFd</a> (int fd, <br> const char * URL, <br> const char * encoding, <br> int options); 77<a href="libxml2-HTMLparser.html#htmlDocPtr">htmlDocPtr</a> <a href="#htmlReadFile">htmlReadFile</a> (const char * filename, <br> const char * encoding, <br> int options); 78<a href="libxml2-HTMLparser.html#htmlDocPtr">htmlDocPtr</a> <a href="#htmlReadIO">htmlReadIO</a> (<a href="libxml2-xmlIO.html#xmlInputReadCallback">xmlInputReadCallback</a> ioread, <br> <a href="libxml2-xmlIO.html#xmlInputCloseCallback">xmlInputCloseCallback</a> ioclose, <br> void * ioctx, <br> const char * URL, <br> const char * encoding, <br> int options); 79<a href="libxml2-HTMLparser.html#htmlDocPtr">htmlDocPtr</a> <a href="#htmlReadMemory">htmlReadMemory</a> (const char * buffer, <br> int size, <br> const char * URL, <br> const char * encoding, <br> int options); 80<a 
href="libxml2-HTMLparser.html#htmlDocPtr">htmlDocPtr</a> <a href="#htmlSAXParseDoc">htmlSAXParseDoc</a> (const <a href="libxml2-xmlstring.html#xmlChar">xmlChar</a> * cur, <br> const char * encoding, <br> <a href="libxml2-HTMLparser.html#htmlSAXHandlerPtr">htmlSAXHandlerPtr</a> sax, <br> void * userData); 81<a href="libxml2-HTMLparser.html#htmlDocPtr">htmlDocPtr</a> <a href="#htmlSAXParseFile">htmlSAXParseFile</a> (const char * filename, <br> const char * encoding, <br> <a href="libxml2-HTMLparser.html#htmlSAXHandlerPtr">htmlSAXHandlerPtr</a> sax, <br> void * userData); 82const <a href="libxml2-HTMLparser.html#htmlElemDesc">htmlElemDesc</a> * <a href="#htmlTagLookup">htmlTagLookup</a> (const <a href="libxml2-xmlstring.html#xmlChar">xmlChar</a> * tag); 83</pre> 84</div> 85<div class="refsect1" lang="en"><h2>Description</h2></div> 86<div class="refsect1" lang="en"> 87<h2>Details</h2> 88<div class="refsect2" lang="en"> 89<div class="refsect2" lang="en"> 90<h3> 91<a name="htmlDefaultSubelement">Macro </a>htmlDefaultSubelement</h3> 92<pre class="programlisting">#define <a href="#htmlDefaultSubelement">htmlDefaultSubelement</a>(elt); 93</pre> 94<p>Returns the default subelement for this element</p> 95<div class="variablelist"><table border="0"> 96<col align="left"> 97<tbody><tr> 98<td><span class="term"><i><tt>elt</tt></i>:</span></td> 99<td>HTML element</td> 100</tr></tbody> 101</table></div> 102</div> 103<hr> 104<div class="refsect2" lang="en"> 105<h3> 106<a name="htmlElementAllowedHereDesc">Macro </a>htmlElementAllowedHereDesc</h3> 107<pre class="programlisting">#define <a href="#htmlElementAllowedHereDesc">htmlElementAllowedHereDesc</a>(parent, elt); 108</pre> 109<p>Checks whether an HTML element description may be a direct child of the specified element. 
Returns 1 if allowed; 0 otherwise.</p> 110<div class="variablelist"><table border="0"> 111<col align="left"> 112<tbody> 113<tr> 114<td><span class="term"><i><tt>parent</tt></i>:</span></td> 115<td>HTML parent element</td> 116</tr> 117<tr> 118<td><span class="term"><i><tt>elt</tt></i>:</span></td> 119<td>HTML element</td> 120</tr> 121</tbody> 122</table></div> 123</div> 124<hr> 125<div class="refsect2" lang="en"> 126<h3> 127<a name="htmlRequiredAttrs">Macro </a>htmlRequiredAttrs</h3> 128<pre class="programlisting">#define <a href="#htmlRequiredAttrs">htmlRequiredAttrs</a>(elt); 129</pre> 130<p>Returns the attributes required for the specified element.</p> 131<div class="variablelist"><table border="0"> 132<col align="left"> 133<tbody><tr> 134<td><span class="term"><i><tt>elt</tt></i>:</span></td> 135<td>HTML element</td> 136</tr></tbody> 137</table></div> 138</div> 139<hr> 140<div class="refsect2" lang="en"> 141<h3> 142<a name="htmlDocPtr">Typedef </a>htmlDocPtr</h3> 143<pre class="programlisting"><a href="libxml2-tree.html#xmlDocPtr">xmlDocPtr</a> htmlDocPtr; 144</pre> 145<p></p> 146</div> 147<hr> 148<div class="refsect2" lang="en"> 149<h3> 150<a name="htmlElemDesc">Structure </a>htmlElemDesc</h3> 151<pre class="programlisting">struct _htmlElemDesc { 152 const char * name : The tag name 153 char startTag : Whether the start tag can be implied 154 char endTag : Whether the end tag can be implied 155 char saveEndTag : Whether the end tag should be saved 156 char empty : Is this an empty element ? 157 char depr : Is this a deprecated element ? 
158 char dtd : 1: only in the Loose DTD, 2: only in the Frameset one 159 char isinline : 0: block element, 1: inline element 160 const char * desc : the description 161 const char ** subelts : allowed sub-elements of this element 162 const char * defaultsubelt : subelement for suggested auto-repair if necessary or NULL 163 const char ** attrs_opt : Optional Attributes 164 const char ** attrs_depr : Additional deprecated attributes 165 const char ** attrs_req : Required attributes 166} htmlElemDesc; 167</pre> 168<p></p> 169</div> 170<hr> 171<div class="refsect2" lang="en"> 172<h3> 173<a name="htmlElemDescPtr">Typedef </a>htmlElemDescPtr</h3> 174<pre class="programlisting"><a href="libxml2-HTMLparser.html#htmlElemDesc">htmlElemDesc</a> * htmlElemDescPtr; 175</pre> 176<p></p> 177</div> 178<hr> 179<div class="refsect2" lang="en"> 180<h3> 181<a name="htmlEntityDesc">Structure </a>htmlEntityDesc</h3> 182<pre class="programlisting">struct _htmlEntityDesc { 183 unsigned int value : the UNICODE value for the character 184 const char * name : The entity name 185 const char * desc : the description 186} htmlEntityDesc; 187</pre> 188<p></p> 189</div> 190<hr> 191<div class="refsect2" lang="en"> 192<h3> 193<a name="htmlEntityDescPtr">Typedef </a>htmlEntityDescPtr</h3> 194<pre class="programlisting"><a href="libxml2-HTMLparser.html#htmlEntityDesc">htmlEntityDesc</a> * htmlEntityDescPtr; 195</pre> 196<p></p> 197</div> 198<hr> 199<div class="refsect2" lang="en"> 200<h3> 201<a name="htmlNodePtr">Typedef </a>htmlNodePtr</h3> 202<pre class="programlisting"><a href="libxml2-tree.html#xmlNodePtr">xmlNodePtr</a> htmlNodePtr; 203</pre> 204<p></p> 205</div> 206<hr> 207<div class="refsect2" lang="en"> 208<h3> 209<a name="htmlParserCtxt">Typedef </a>htmlParserCtxt</h3> 210<pre class="programlisting"><a href="libxml2-tree.html#xmlParserCtxt">xmlParserCtxt</a> htmlParserCtxt; 211</pre> 212<p></p> 213</div> 214<hr> 215<div class="refsect2" lang="en"> 
216<h3> 217<a name="htmlParserCtxtPtr">Typedef </a>htmlParserCtxtPtr</h3> 218<pre class="programlisting"><a href="libxml2-tree.html#xmlParserCtxtPtr">xmlParserCtxtPtr</a> htmlParserCtxtPtr; 219</pre> 220<p></p> 221</div> 222<hr> 223<div class="refsect2" lang="en"> 224<h3> 225<a name="htmlParserInput">Typedef </a>htmlParserInput</h3> 226<pre class="programlisting"><a href="libxml2-tree.html#xmlParserInput">xmlParserInput</a> htmlParserInput; 227</pre> 228<p></p> 229</div> 230<hr> 231<div class="refsect2" lang="en"> 232<h3> 233<a name="htmlParserInputPtr">Typedef </a>htmlParserInputPtr</h3> 234<pre class="programlisting"><a href="libxml2-tree.html#xmlParserInputPtr">xmlParserInputPtr</a> htmlParserInputPtr; 235</pre> 236<p></p> 237</div> 238<hr> 239<div class="refsect2" lang="en"> 240<h3> 241<a name="htmlParserNodeInfo">Typedef </a>htmlParserNodeInfo</h3> 242<pre class="programlisting"><a href="libxml2-parser.html#xmlParserNodeInfo">xmlParserNodeInfo</a> htmlParserNodeInfo; 243</pre> 244<p></p> 245</div> 246<hr> 247<div class="refsect2" lang="en"> 248<h3> 249<a name="htmlParserOption">Enum </a>htmlParserOption</h3> 250<pre class="programlisting">enum <a href="#htmlParserOption">htmlParserOption</a> { 251 <a name="HTML_PARSE_RECOVER">HTML_PARSE_RECOVER</a> = 1 /* Relaxed parsing */ 252 <a name="HTML_PARSE_NODEFDTD">HTML_PARSE_NODEFDTD</a> = 4 /* do not default a doctype if not found */ 253 <a name="HTML_PARSE_NOERROR">HTML_PARSE_NOERROR</a> = 32 /* suppress error reports */ 254 <a name="HTML_PARSE_NOWARNING">HTML_PARSE_NOWARNING</a> = 64 /* suppress warning reports */ 255 <a name="HTML_PARSE_PEDANTIC">HTML_PARSE_PEDANTIC</a> = 128 /* pedantic error reporting */ 256 <a name="HTML_PARSE_NOBLANKS">HTML_PARSE_NOBLANKS</a> = 256 /* remove blank nodes */ 257 <a name="HTML_PARSE_NONET">HTML_PARSE_NONET</a> = 2048 /* Forbid network access */ 258 <a name="HTML_PARSE_NOIMPLIED">HTML_PARSE_NOIMPLIED</a> = 8192 /* Do not add implied html/body... 
elements */ 259 <a name="HTML_PARSE_COMPACT">HTML_PARSE_COMPACT</a> = 65536 /* compact small text nodes */ 260 <a name="HTML_PARSE_IGNORE_ENC">HTML_PARSE_IGNORE_ENC</a> = 2097152 /* ignore internal document encoding hint */ 261}; 262</pre> 263<p></p> 264</div> 265<hr> 266<div class="refsect2" lang="en"> 267<h3> 268<a name="htmlSAXHandler">Typedef </a>htmlSAXHandler</h3> 269<pre class="programlisting"><a href="libxml2-tree.html#xmlSAXHandler">xmlSAXHandler</a> htmlSAXHandler; 270</pre> 271<p></p> 272</div> 273<hr> 274<div class="refsect2" lang="en"> 275<h3> 276<a name="htmlSAXHandlerPtr">Typedef </a>htmlSAXHandlerPtr</h3> 277<pre class="programlisting"><a href="libxml2-tree.html#xmlSAXHandlerPtr">xmlSAXHandlerPtr</a> htmlSAXHandlerPtr; 278</pre> 279<p></p> 280</div> 281<hr> 282<div class="refsect2" lang="en"> 283<h3> 284<a name="htmlStatus">Enum </a>htmlStatus</h3> 285<pre class="programlisting">enum <a href="#htmlStatus">htmlStatus</a> { 286 <a name="HTML_NA">HTML_NA</a> = 0 /* something we don't check at all */ 287 <a name="HTML_INVALID">HTML_INVALID</a> = 1 288 <a name="HTML_DEPRECATED">HTML_DEPRECATED</a> = 2 289 <a name="HTML_VALID">HTML_VALID</a> = 4 290 <a name="HTML_REQUIRED">HTML_REQUIRED</a> = 12 /* VALID bit set so ( & HTML_VALID ) is TRUE */ 291}; 292</pre> 293<p></p> 294</div> 295<hr> 296<div class="refsect2" lang="en"> 297<h3> 298<a name="UTF8ToHtml"></a>UTF8ToHtml ()</h3> 299<pre class="programlisting">int UTF8ToHtml (unsigned char * out, <br> int * outlen, <br> const unsigned char * in, <br> int * inlen)<br> 300</pre> 301<p>Take a block of UTF-8 chars in and try to convert it to an ASCII plus HTML entities block of chars out.</p> 302<div class="variablelist"><table border="0"> 303<col align="left"> 304<tbody> 305<tr> 306<td><span class="term"><i><tt>out</tt></i>:</span></td> 307<td>a pointer to an array of bytes to store the result</td> 308</tr> 309<tr> 310<td><span class="term"><i><tt>outlen</tt></i>:</span></td> 311<td>the length of @out</td> 
312</tr> 313<tr> 314<td><span class="term"><i><tt>in</tt></i>:</span></td> 315<td>a pointer to an array of UTF-8 chars</td> 316</tr> 317<tr> 318<td><span class="term"><i><tt>inlen</tt></i>:</span></td> 319<td>the length of @in</td> 320</tr> 321<tr> 322<td><span class="term"><i><tt>Returns</tt></i>:</span></td> 323<td>0 on success, -2 if the transcoding fails, or -1 otherwise. The value of @inlen after return is the number of octets consumed if the return value is non-negative, else unpredictable. The value of @outlen after return is the number of octets produced.</td> 324</tr> 325</tbody> 326</table></div> 327</div> 328<hr> 329<div class="refsect2" lang="en"> 330<h3> 331<a name="htmlAttrAllowed"></a>htmlAttrAllowed ()</h3> 332<pre class="programlisting"><a href="libxml2-HTMLparser.html#htmlStatus">htmlStatus</a> htmlAttrAllowed (const <a href="libxml2-HTMLparser.html#htmlElemDesc">htmlElemDesc</a> * elt, <br> const <a href="libxml2-xmlstring.html#xmlChar">xmlChar</a> * attr, <br> int legacy)<br> 333</pre> 334<p>Checks whether an <a href="libxml2-SAX.html#attribute">attribute</a> is valid for an element. Has full knowledge of required and deprecated attributes.</p> 335<div class="variablelist"><table border="0"> 336<col align="left"> 337<tbody> 338<tr> 339<td><span class="term"><i><tt>elt</tt></i>:</span></td> 340<td>HTML element</td> 341</tr> 342<tr> 343<td><span class="term"><i><tt>attr</tt></i>:</span></td> 344<td>HTML <a href="libxml2-SAX.html#attribute">attribute</a> 345</td> 346</tr> 347<tr> 348<td><span class="term"><i><tt>legacy</tt></i>:</span></td> 349<td>whether to allow deprecated attributes</td> 350</tr> 351<tr> 352<td><span class="term"><i><tt>Returns</tt></i>:</span></td> 353<td>one of HTML_REQUIRED, HTML_VALID, HTML_DEPRECATED, <a href="libxml2-HTMLparser.html#HTML_INVALID">HTML_INVALID</a> 354</td> 355</tr> 356</tbody> 357</table></div> 358</div> 359<hr> 360<div class="refsect2" lang="en"> 361<h3> 362<a name="htmlAutoCloseTag"></a>htmlAutoCloseTag ()</h3> 
363<pre class="programlisting">int htmlAutoCloseTag (<a href="libxml2-HTMLparser.html#htmlDocPtr">htmlDocPtr</a> doc, <br> const <a href="libxml2-xmlstring.html#xmlChar">xmlChar</a> * name, <br> <a href="libxml2-HTMLparser.html#htmlNodePtr">htmlNodePtr</a> elem)<br> 364</pre> 365<p>The HTML DTD allows a tag to implicitly close other tags. The list is kept in the htmlStartClose array. This function checks whether the element or one of its children would autoclose the given tag.</p> 366<div class="variablelist"><table border="0"> 367<col align="left"> 368<tbody> 369<tr> 370<td><span class="term"><i><tt>doc</tt></i>:</span></td> 371<td>the HTML document</td> 372</tr> 373<tr> 374<td><span class="term"><i><tt>name</tt></i>:</span></td> 375<td>The tag name</td> 376</tr> 377<tr> 378<td><span class="term"><i><tt>elem</tt></i>:</span></td> 379<td>the HTML element</td> 380</tr> 381<tr> 382<td><span class="term"><i><tt>Returns</tt></i>:</span></td> 383<td>1 if autoclose, 0 otherwise</td> 384</tr> 385</tbody> 386</table></div> 387</div> 388<hr> 389<div class="refsect2" lang="en"> 390<h3> 391<a name="htmlCreateFileParserCtxt"></a>htmlCreateFileParserCtxt ()</h3> 392<pre class="programlisting"><a href="libxml2-HTMLparser.html#htmlParserCtxtPtr">htmlParserCtxtPtr</a> htmlCreateFileParserCtxt (const char * filename, <br> const char * encoding)<br> 393</pre> 394<p>Create a parser context for the content of a file. 
Automatic support for ZLIB/Compress compressed document is provided by default if found at compile-time.</p> 395<div class="variablelist"><table border="0"> 396<col align="left"> 397<tbody> 398<tr> 399<td><span class="term"><i><tt>filename</tt></i>:</span></td> 400<td>the filename</td> 401</tr> 402<tr> 403<td><span class="term"><i><tt>encoding</tt></i>:</span></td> 404<td>a free form C string describing the HTML document encoding, or NULL</td> 405</tr> 406<tr> 407<td><span class="term"><i><tt>Returns</tt></i>:</span></td> 408<td>the new parser context or NULL</td> 409</tr> 410</tbody> 411</table></div> 412</div> 413<hr> 414<div class="refsect2" lang="en"> 415<h3> 416<a name="htmlCreateMemoryParserCtxt"></a>htmlCreateMemoryParserCtxt ()</h3> 417<pre class="programlisting"><a href="libxml2-HTMLparser.html#htmlParserCtxtPtr">htmlParserCtxtPtr</a> htmlCreateMemoryParserCtxt (const char * buffer, <br> int size)<br> 418</pre> 419<p>Create a parser context for an HTML in-memory document.</p> 420<div class="variablelist"><table border="0"> 421<col align="left"> 422<tbody> 423<tr> 424<td><span class="term"><i><tt>buffer</tt></i>:</span></td> 425<td>a pointer to a char array</td> 426</tr> 427<tr> 428<td><span class="term"><i><tt>size</tt></i>:</span></td> 429<td>the size of the array</td> 430</tr> 431<tr> 432<td><span class="term"><i><tt>Returns</tt></i>:</span></td> 433<td>the new parser context or NULL</td> 434</tr> 435</tbody> 436</table></div> 437</div> 438<hr> 439<div class="refsect2" lang="en"> 440<h3> 441<a name="htmlCreatePushParserCtxt"></a>htmlCreatePushParserCtxt ()</h3> 442<pre class="programlisting"><a href="libxml2-HTMLparser.html#htmlParserCtxtPtr">htmlParserCtxtPtr</a> htmlCreatePushParserCtxt (<a href="libxml2-HTMLparser.html#htmlSAXHandlerPtr">htmlSAXHandlerPtr</a> sax, <br> void * user_data, <br> const char * chunk, <br> int size, <br> const char * filename, <br> <a href="libxml2-encoding.html#xmlCharEncoding">xmlCharEncoding</a> enc)<br> 443</pre> 
444<p>Create a parser context for using the HTML parser in push mode. The value of @filename is used for fetching external entities and error/warning reports.</p> 445<div class="variablelist"><table border="0"> 446<col align="left"> 447<tbody> 448<tr> 449<td><span class="term"><i><tt>sax</tt></i>:</span></td> 450<td>a SAX handler</td> 451</tr> 452<tr> 453<td><span class="term"><i><tt>user_data</tt></i>:</span></td> 454<td>The user data returned on SAX callbacks</td> 455</tr> 456<tr> 457<td><span class="term"><i><tt>chunk</tt></i>:</span></td> 458<td>a pointer to an array of chars</td> 459</tr> 460<tr> 461<td><span class="term"><i><tt>size</tt></i>:</span></td> 462<td>number of chars in the array</td> 463</tr> 464<tr> 465<td><span class="term"><i><tt>filename</tt></i>:</span></td> 466<td>an optional file name or URI</td> 467</tr> 468<tr> 469<td><span class="term"><i><tt>enc</tt></i>:</span></td> 470<td>an optional encoding</td> 471</tr> 472<tr> 473<td><span class="term"><i><tt>Returns</tt></i>:</span></td> 474<td>the new parser context or NULL</td> 475</tr> 476</tbody> 477</table></div> 478</div> 479<hr> 480<div class="refsect2" lang="en"> 481<h3> 482<a name="htmlCtxtReadDoc"></a>htmlCtxtReadDoc ()</h3> 483<pre class="programlisting"><a href="libxml2-HTMLparser.html#htmlDocPtr">htmlDocPtr</a> htmlCtxtReadDoc (<a href="libxml2-HTMLparser.html#htmlParserCtxtPtr">htmlParserCtxtPtr</a> ctxt, <br> const <a href="libxml2-xmlstring.html#xmlChar">xmlChar</a> * cur, <br> const char * URL, <br> const char * encoding, <br> int options)<br> 484</pre> 485<p>parse an HTML in-memory document and build a tree. 
This reuses the existing @ctxt parser context</p> 486<div class="variablelist"><table border="0"> 487<col align="left"> 488<tbody> 489<tr> 490<td><span class="term"><i><tt>ctxt</tt></i>:</span></td> 491<td>an HTML parser context</td> 492</tr> 493<tr> 494<td><span class="term"><i><tt>cur</tt></i>:</span></td> 495<td>a pointer to a zero terminated string</td> 496</tr> 497<tr> 498<td><span class="term"><i><tt>URL</tt></i>:</span></td> 499<td>the base URL to use for the document</td> 500</tr> 501<tr> 502<td><span class="term"><i><tt>encoding</tt></i>:</span></td> 503<td>the document encoding, or NULL</td> 504</tr> 505<tr> 506<td><span class="term"><i><tt>options</tt></i>:</span></td> 507<td>a combination of htmlParserOption(s)</td> 508</tr> 509<tr> 510<td><span class="term"><i><tt>Returns</tt></i>:</span></td> 511<td>the resulting document tree</td> 512</tr> 513</tbody> 514</table></div> 515</div> 516<hr> 517<div class="refsect2" lang="en"> 518<h3> 519<a name="htmlCtxtReadFd"></a>htmlCtxtReadFd ()</h3> 520<pre class="programlisting"><a href="libxml2-HTMLparser.html#htmlDocPtr">htmlDocPtr</a> htmlCtxtReadFd (<a href="libxml2-HTMLparser.html#htmlParserCtxtPtr">htmlParserCtxtPtr</a> ctxt, <br> int fd, <br> const char * URL, <br> const char * encoding, <br> int options)<br> 521</pre> 522<p>parse an HTML document from a file descriptor and build a tree. 
This reuses the existing @ctxt parser context</p> 523<div class="variablelist"><table border="0"> 524<col align="left"> 525<tbody> 526<tr> 527<td><span class="term"><i><tt>ctxt</tt></i>:</span></td> 528<td>an HTML parser context</td> 529</tr> 530<tr> 531<td><span class="term"><i><tt>fd</tt></i>:</span></td> 532<td>an open file descriptor</td> 533</tr> 534<tr> 535<td><span class="term"><i><tt>URL</tt></i>:</span></td> 536<td>the base URL to use for the document</td> 537</tr> 538<tr> 539<td><span class="term"><i><tt>encoding</tt></i>:</span></td> 540<td>the document encoding, or NULL</td> 541</tr> 542<tr> 543<td><span class="term"><i><tt>options</tt></i>:</span></td> 544<td>a combination of htmlParserOption(s)</td> 545</tr> 546<tr> 547<td><span class="term"><i><tt>Returns</tt></i>:</span></td> 548<td>the resulting document tree</td> 549</tr> 550</tbody> 551</table></div> 552</div> 553<hr> 554<div class="refsect2" lang="en"> 555<h3> 556<a name="htmlCtxtReadFile"></a>htmlCtxtReadFile ()</h3> 557<pre class="programlisting"><a href="libxml2-HTMLparser.html#htmlDocPtr">htmlDocPtr</a> htmlCtxtReadFile (<a href="libxml2-HTMLparser.html#htmlParserCtxtPtr">htmlParserCtxtPtr</a> ctxt, <br> const char * filename, <br> const char * encoding, <br> int options)<br> 558</pre> 559<p>parse an HTML file from the filesystem or the network. 
This reuses the existing @ctxt parser context</p> 560<div class="variablelist"><table border="0"> 561<col align="left"> 562<tbody> 563<tr> 564<td><span class="term"><i><tt>ctxt</tt></i>:</span></td> 565<td>an HTML parser context</td> 566</tr> 567<tr> 568<td><span class="term"><i><tt>filename</tt></i>:</span></td> 569<td>a file or URL</td> 570</tr> 571<tr> 572<td><span class="term"><i><tt>encoding</tt></i>:</span></td> 573<td>the document encoding, or NULL</td> 574</tr> 575<tr> 576<td><span class="term"><i><tt>options</tt></i>:</span></td> 577<td>a combination of htmlParserOption(s)</td> 578</tr> 579<tr> 580<td><span class="term"><i><tt>Returns</tt></i>:</span></td> 581<td>the resulting document tree</td> 582</tr> 583</tbody> 584</table></div> 585</div> 586<hr> 587<div class="refsect2" lang="en"> 588<h3> 589<a name="htmlCtxtReadIO"></a>htmlCtxtReadIO ()</h3> 590<pre class="programlisting"><a href="libxml2-HTMLparser.html#htmlDocPtr">htmlDocPtr</a> htmlCtxtReadIO (<a href="libxml2-HTMLparser.html#htmlParserCtxtPtr">htmlParserCtxtPtr</a> ctxt, <br> <a href="libxml2-xmlIO.html#xmlInputReadCallback">xmlInputReadCallback</a> ioread, <br> <a href="libxml2-xmlIO.html#xmlInputCloseCallback">xmlInputCloseCallback</a> ioclose, <br> void * ioctx, <br> const char * URL, <br> const char * encoding, <br> int options)<br> 591</pre> 592<p>parse an HTML document from I/O functions and source and build a tree. 
This reuses the existing @ctxt parser context</p> 593<div class="variablelist"><table border="0"> 594<col align="left"> 595<tbody> 596<tr> 597<td><span class="term"><i><tt>ctxt</tt></i>:</span></td> 598<td>an HTML parser context</td> 599</tr> 600<tr> 601<td><span class="term"><i><tt>ioread</tt></i>:</span></td> 602<td>an I/O read function</td> 603</tr> 604<tr> 605<td><span class="term"><i><tt>ioclose</tt></i>:</span></td> 606<td>an I/O close function</td> 607</tr> 608<tr> 609<td><span class="term"><i><tt>ioctx</tt></i>:</span></td> 610<td>an I/O handler</td> 611</tr> 612<tr> 613<td><span class="term"><i><tt>URL</tt></i>:</span></td> 614<td>the base URL to use for the document</td> 615</tr> 616<tr> 617<td><span class="term"><i><tt>encoding</tt></i>:</span></td> 618<td>the document encoding, or NULL</td> 619</tr> 620<tr> 621<td><span class="term"><i><tt>options</tt></i>:</span></td> 622<td>a combination of htmlParserOption(s)</td> 623</tr> 624<tr> 625<td><span class="term"><i><tt>Returns</tt></i>:</span></td> 626<td>the resulting document tree</td> 627</tr> 628</tbody> 629</table></div> 630</div> 631<hr> 632<div class="refsect2" lang="en"> 633<h3> 634<a name="htmlCtxtReadMemory"></a>htmlCtxtReadMemory ()</h3> 635<pre class="programlisting"><a href="libxml2-HTMLparser.html#htmlDocPtr">htmlDocPtr</a> htmlCtxtReadMemory (<a href="libxml2-HTMLparser.html#htmlParserCtxtPtr">htmlParserCtxtPtr</a> ctxt, <br> const char * buffer, <br> int size, <br> const char * URL, <br> const char * encoding, <br> int options)<br> 636</pre> 637<p>parse an HTML in-memory document and build a tree. 
This reuses the existing @ctxt parser context</p> 638<div class="variablelist"><table border="0"> 639<col align="left"> 640<tbody> 641<tr> 642<td><span class="term"><i><tt>ctxt</tt></i>:</span></td> 643<td>an HTML parser context</td> 644</tr> 645<tr> 646<td><span class="term"><i><tt>buffer</tt></i>:</span></td> 647<td>a pointer to a char array</td> 648</tr> 649<tr> 650<td><span class="term"><i><tt>size</tt></i>:</span></td> 651<td>the size of the array</td> 652</tr> 653<tr> 654<td><span class="term"><i><tt>URL</tt></i>:</span></td> 655<td>the base URL to use for the document</td> 656</tr> 657<tr> 658<td><span class="term"><i><tt>encoding</tt></i>:</span></td> 659<td>the document encoding, or NULL</td> 660</tr> 661<tr> 662<td><span class="term"><i><tt>options</tt></i>:</span></td> 663<td>a combination of htmlParserOption(s)</td> 664</tr> 665<tr> 666<td><span class="term"><i><tt>Returns</tt></i>:</span></td> 667<td>the resulting document tree</td> 668</tr> 669</tbody> 670</table></div> 671</div> 672<hr> 673<div class="refsect2" lang="en"> 674<h3> 675<a name="htmlCtxtReset"></a>htmlCtxtReset ()</h3> 676<pre class="programlisting">void htmlCtxtReset (<a href="libxml2-HTMLparser.html#htmlParserCtxtPtr">htmlParserCtxtPtr</a> ctxt)<br> 677</pre> 678<p>Reset a parser context</p> 679<div class="variablelist"><table border="0"> 680<col align="left"> 681<tbody><tr> 682<td><span class="term"><i><tt>ctxt</tt></i>:</span></td> 683<td>an HTML parser context</td> 684</tr></tbody> 685</table></div> 686</div> 687<hr> 688<div class="refsect2" lang="en"> 689<h3> 690<a name="htmlCtxtUseOptions"></a>htmlCtxtUseOptions ()</h3> 691<pre class="programlisting">int htmlCtxtUseOptions (<a href="libxml2-HTMLparser.html#htmlParserCtxtPtr">htmlParserCtxtPtr</a> ctxt, <br> int options)<br> 692</pre> 693<p>Applies the options to the parser context</p> 694<div class="variablelist"><table border="0"> 695<col align="left"> 696<tbody> 697<tr> 698<td><span class="term"><i><tt>ctxt</tt></i>:</span></td> 
699<td>an HTML parser context</td> 700</tr> 701<tr> 702<td><span class="term"><i><tt>options</tt></i>:</span></td> 703<td>a combination of htmlParserOption(s)</td> 704</tr> 705<tr> 706<td><span class="term"><i><tt>Returns</tt></i>:</span></td> 707<td>0 in case of success, the set of unknown or unimplemented options in case of error.</td> 708</tr> 709</tbody> 710</table></div> 711</div> 712<hr> 713<div class="refsect2" lang="en"> 714<h3> 715<a name="htmlElementAllowedHere"></a>htmlElementAllowedHere ()</h3> 716<pre class="programlisting">int htmlElementAllowedHere (const <a href="libxml2-HTMLparser.html#htmlElemDesc">htmlElemDesc</a> * parent, <br> const <a href="libxml2-xmlstring.html#xmlChar">xmlChar</a> * elt)<br> 717</pre> 718<p>Checks whether an HTML element may be a direct child of a parent element. Note: this does not check for deprecated elements.</p> 719<div class="variablelist"><table border="0"> 720<col align="left"> 721<tbody> 722<tr> 723<td><span class="term"><i><tt>parent</tt></i>:</span></td> 724<td>HTML parent element</td> 725</tr> 726<tr> 727<td><span class="term"><i><tt>elt</tt></i>:</span></td> 728<td>HTML element</td> 729</tr> 730<tr> 731<td><span class="term"><i><tt>Returns</tt></i>:</span></td> 732<td>1 if allowed; 0 otherwise.</td> 733</tr> 734</tbody> 735</table></div> 736</div> 737<hr> 738<div class="refsect2" lang="en"> 739<h3> 740<a name="htmlElementStatusHere"></a>htmlElementStatusHere ()</h3> 741<pre class="programlisting"><a href="libxml2-HTMLparser.html#htmlStatus">htmlStatus</a> htmlElementStatusHere (const <a href="libxml2-HTMLparser.html#htmlElemDesc">htmlElemDesc</a> * parent, <br> const <a href="libxml2-HTMLparser.html#htmlElemDesc">htmlElemDesc</a> * elt)<br> 742</pre> 743<p>Checks whether an HTML element may be a direct child of a parent element, 
and, if so, whether it is valid or deprecated.</p> 744<div class="variablelist"><table border="0"> 745<col align="left"> 746<tbody> 747<tr> 748<td><span class="term"><i><tt>parent</tt></i>:</span></td> 749<td>HTML parent element</td> 750</tr> 751<tr> 752<td><span class="term"><i><tt>elt</tt></i>:</span></td> 753<td>HTML element</td> 754</tr> 755<tr> 756<td><span class="term"><i><tt>Returns</tt></i>:</span></td> 757<td>one of HTML_VALID, HTML_DEPRECATED, <a href="libxml2-HTMLparser.html#HTML_INVALID">HTML_INVALID</a> 758</td> 759</tr> 760</tbody> 761</table></div> 762</div> 763<hr> 764<div class="refsect2" lang="en"> 765<h3> 766<a name="htmlEncodeEntities"></a>htmlEncodeEntities ()</h3> 767<pre class="programlisting">int htmlEncodeEntities (unsigned char * out, <br> int * outlen, <br> const unsigned char * in, <br> int * inlen, <br> int quoteChar)<br> 768</pre> 769<p>Takes a block of UTF-8 characters in @in and tries to convert it to a block of ASCII characters plus HTML entities in @out.</p> 770<div class="variablelist"><table border="0"> 771<col align="left"> 772<tbody> 773<tr> 774<td><span class="term"><i><tt>out</tt></i>:</span></td> 775<td>a pointer to an array of bytes to store the result</td> 776</tr> 777<tr> 778<td><span class="term"><i><tt>outlen</tt></i>:</span></td> 779<td>the length of @out</td> 780</tr> 781<tr> 782<td><span class="term"><i><tt>in</tt></i>:</span></td> 783<td>a pointer to an array of UTF-8 chars</td> 784</tr> 785<tr> 786<td><span class="term"><i><tt>inlen</tt></i>:</span></td> 787<td>the length of @in</td> 788</tr> 789<tr> 790<td><span class="term"><i><tt>quoteChar</tt></i>:</span></td> 791<td>the quote character to escape (' or ") or zero.</td> 792</tr> 793<tr> 794<td><span class="term"><i><tt>Returns</tt></i>:</span></td> 795<td>0 on success, -2 if the transcoding fails, or -1 otherwise. The value of @inlen after return is the number of octets consumed if the call succeeds; otherwise it is unpredictable. 
The value of @outlen after return is the number of octets produced.</td> 796</tr> 797</tbody> 798</table></div> 799</div> 800<hr> 801<div class="refsect2" lang="en"> 802<h3> 803<a name="htmlEntityLookup"></a>htmlEntityLookup ()</h3> 804<pre class="programlisting">const <a href="libxml2-HTMLparser.html#htmlEntityDesc">htmlEntityDesc</a> * htmlEntityLookup (const <a href="libxml2-xmlstring.html#xmlChar">xmlChar</a> * name)<br> 805</pre> 806<p>Look up the given entity by name in EntitiesTable. TODO: the linear scan is really ugly, a hash table is really needed.</p> 807<div class="variablelist"><table border="0"> 808<col align="left"> 809<tbody> 810<tr> 811<td><span class="term"><i><tt>name</tt></i>:</span></td> 812<td>the entity name</td> 813</tr> 814<tr> 815<td><span class="term"><i><tt>Returns</tt></i>:</span></td> 816<td>the associated <a href="libxml2-HTMLparser.html#htmlEntityDescPtr">htmlEntityDescPtr</a> if found, NULL otherwise.</td> 817</tr> 818</tbody> 819</table></div> 820</div> 821<hr> 822<div class="refsect2" lang="en"> 823<h3> 824<a name="htmlEntityValueLookup"></a>htmlEntityValueLookup ()</h3> 825<pre class="programlisting">const <a href="libxml2-HTMLparser.html#htmlEntityDesc">htmlEntityDesc</a> * htmlEntityValueLookup (unsigned int value)<br> 826</pre> 827<p>Look up the given entity by value in EntitiesTable. TODO: the linear scan is really ugly, a hash table is really needed.</p> 828<div class="variablelist"><table border="0"> 829<col align="left"> 830<tbody> 831<tr> 832<td><span class="term"><i><tt>value</tt></i>:</span></td> 833<td>the entity's Unicode value</td> 834</tr> 835<tr> 836<td><span class="term"><i><tt>Returns</tt></i>:</span></td> 837<td>the associated <a href="libxml2-HTMLparser.html#htmlEntityDescPtr">htmlEntityDescPtr</a> if found, NULL otherwise.</td> 838</tr> 839</tbody> 840</table></div> 841</div> 842<hr> 843<div class="refsect2" lang="en"> 844<h3> 845<a name="htmlFreeParserCtxt"></a>htmlFreeParserCtxt ()</h3> 846<pre class="programlisting">void 
htmlFreeParserCtxt (<a href="libxml2-HTMLparser.html#htmlParserCtxtPtr">htmlParserCtxtPtr</a> ctxt)<br> 847</pre> 848<p>Free all the memory used by a parser context. However, the parsed document in ctxt->myDoc is not freed.</p> 849<div class="variablelist"><table border="0"> 850<col align="left"> 851<tbody><tr> 852<td><span class="term"><i><tt>ctxt</tt></i>:</span></td> 853<td>an HTML parser context</td> 854</tr></tbody> 855</table></div> 856</div> 857<hr> 858<div class="refsect2" lang="en"> 859<h3> 860<a name="htmlHandleOmittedElem"></a>htmlHandleOmittedElem ()</h3> 861<pre class="programlisting">int htmlHandleOmittedElem (int val)<br> 862</pre> 863<p>Set the handling of omitted HTML tags and return the previous value.</p> 864<div class="variablelist"><table border="0"> 865<col align="left"> 866<tbody> 867<tr> 868<td><span class="term"><i><tt>val</tt></i>:</span></td> 869<td>int 0 or 1</td> 870</tr> 871<tr> 872<td><span class="term"><i><tt>Returns</tt></i>:</span></td> 873<td>the previous value: 0 for no handling, 1 for auto insertion.</td> 874</tr> 875</tbody> 876</table></div> 877</div> 878<hr> 879<div class="refsect2" lang="en"> 880<h3> 881<a name="htmlInitAutoClose"></a>htmlInitAutoClose ()</h3> 882<pre class="programlisting">void htmlInitAutoClose (void)<br> 883</pre> 884<p>DEPRECATED: This function will be made private. Call <a href="libxml2-parser.html#xmlInitParser">xmlInitParser</a> to initialize the library. This is a no-op now.</p> 885</div> 886<hr> 887<div class="refsect2" lang="en"> 888<h3> 889<a name="htmlIsAutoClosed"></a>htmlIsAutoClosed ()</h3> 890<pre class="programlisting">int htmlIsAutoClosed (<a href="libxml2-HTMLparser.html#htmlDocPtr">htmlDocPtr</a> doc, <br> <a href="libxml2-HTMLparser.html#htmlNodePtr">htmlNodePtr</a> elem)<br> 891</pre> 892<p>The HTML DTD allows a tag to implicitly close other tags. The list is kept in the htmlStartClose array. 
This function checks if a tag is autoclosed by one of its children.</p> 893<div class="variablelist"><table border="0"> 894<col align="left"> 895<tbody> 896<tr> 897<td><span class="term"><i><tt>doc</tt></i>:</span></td> 898<td>the HTML document</td> 899</tr> 900<tr> 901<td><span class="term"><i><tt>elem</tt></i>:</span></td> 902<td>the HTML element</td> 903</tr> 904<tr> 905<td><span class="term"><i><tt>Returns</tt></i>:</span></td> 906<td>1 if autoclosed, 0 otherwise</td> 907</tr> 908</tbody> 909</table></div> 910</div> 911<hr> 912<div class="refsect2" lang="en"> 913<h3> 914<a name="htmlIsScriptAttribute"></a>htmlIsScriptAttribute ()</h3> 915<pre class="programlisting">int htmlIsScriptAttribute (const <a href="libxml2-xmlstring.html#xmlChar">xmlChar</a> * name)<br> 916</pre> 917<p>Check if an <a href="libxml2-SAX.html#attribute">attribute</a> is of content type Script</p> 918<div class="variablelist"><table border="0"> 919<col align="left"> 920<tbody> 921<tr> 922<td><span class="term"><i><tt>name</tt></i>:</span></td> 923<td>an <a href="libxml2-SAX.html#attribute">attribute</a> name</td> 924</tr> 925<tr> 926<td><span class="term"><i><tt>Returns</tt></i>:</span></td> 927<td>1 if the <a href="libxml2-SAX.html#attribute">attribute</a> is a script, 0 otherwise</td> 928</tr> 929</tbody> 930</table></div> 931</div> 932<hr> 933<div class="refsect2" lang="en"> 934<h3> 935<a name="htmlNewParserCtxt"></a>htmlNewParserCtxt ()</h3> 936<pre class="programlisting"><a href="libxml2-HTMLparser.html#htmlParserCtxtPtr">htmlParserCtxtPtr</a> htmlNewParserCtxt (void)<br> 937</pre> 938<p>Allocate and initialize a new parser context.</p> 939<div class="variablelist"><table border="0"> 940<col align="left"> 941<tbody><tr> 942<td><span class="term"><i><tt>Returns</tt></i>:</span></td> 943<td>the <a href="libxml2-HTMLparser.html#htmlParserCtxtPtr">htmlParserCtxtPtr</a> or NULL in case of allocation error</td> 944</tr></tbody> 945</table></div> 946</div> 947<hr> 948<div class="refsect2" 
lang="en"> 949<h3> 950<a name="htmlNewSAXParserCtxt"></a>htmlNewSAXParserCtxt ()</h3> 951<pre class="programlisting"><a href="libxml2-HTMLparser.html#htmlParserCtxtPtr">htmlParserCtxtPtr</a> htmlNewSAXParserCtxt (<a href="libxml2-HTMLparser.html#htmlSAXHandlerPtr">htmlSAXHandlerPtr</a> sax, <br> void * userData)<br> 952</pre> 953<p>Allocate and initialize a new parser context.</p> 954<div class="variablelist"><table border="0"> 955<col align="left"> 956<tbody> 957<tr> 958<td><span class="term"><i><tt>sax</tt></i>:</span></td> 959<td>SAX handler</td> 960</tr> 961<tr> 962<td><span class="term"><i><tt>userData</tt></i>:</span></td> 963<td>user data</td> 964</tr> 965<tr> 966<td><span class="term"><i><tt>Returns</tt></i>:</span></td> 967<td>the <a href="libxml2-HTMLparser.html#htmlParserCtxtPtr">htmlParserCtxtPtr</a> or NULL in case of allocation error</td> 968</tr> 969</tbody> 970</table></div> 971</div> 972<hr> 973<div class="refsect2" lang="en"> 974<h3> 975<a name="htmlNodeStatus"></a>htmlNodeStatus ()</h3> 976<pre class="programlisting"><a href="libxml2-HTMLparser.html#htmlStatus">htmlStatus</a> htmlNodeStatus (const <a href="libxml2-HTMLparser.html#htmlNodePtr">htmlNodePtr</a> node, <br> int legacy)<br> 977</pre> 978<p>Checks whether the tree node is valid. 
Experimental (the author only uses the HTML enhancements in a SAX parser)</p> 979<div class="variablelist"><table border="0"> 980<col align="left"> 981<tbody> 982<tr> 983<td><span class="term"><i><tt>node</tt></i>:</span></td> 984<td>an <a href="libxml2-HTMLparser.html#htmlNodePtr">htmlNodePtr</a> in a tree</td> 985</tr> 986<tr> 987<td><span class="term"><i><tt>legacy</tt></i>:</span></td> 988<td>whether to allow deprecated elements (YES is faster here for Element nodes)</td> 989</tr> 990<tr> 991<td><span class="term"><i><tt>Returns</tt></i>:</span></td> 992<td>for Element nodes, a return from <a href="libxml2-HTMLparser.html#htmlElementAllowedHere">htmlElementAllowedHere</a> (if legacy allowed) or <a href="libxml2-HTMLparser.html#htmlElementStatusHere">htmlElementStatusHere</a> (otherwise); for Attribute nodes, a return from <a href="libxml2-HTMLparser.html#htmlAttrAllowed">htmlAttrAllowed</a>; for other nodes, <a href="libxml2-HTMLparser.html#HTML_NA">HTML_NA</a> (no checks performed)</td> 993</tr> 994</tbody> 995</table></div> 996</div> 997<hr> 998<div class="refsect2" lang="en"> 999<h3> 1000<a name="htmlParseCharRef"></a>htmlParseCharRef ()</h3> 1001<pre class="programlisting">int htmlParseCharRef (<a href="libxml2-HTMLparser.html#htmlParserCtxtPtr">htmlParserCtxtPtr</a> ctxt)<br> 1002</pre> 1003<p>parse a character reference. [66] CharRef ::= '&#' [0-9]+ ';' | '&#x' [0-9a-fA-F]+ ';'</p> 1004<div class="variablelist"><table border="0"> 1005<col align="left"> 1006<tbody> 1007<tr> 1008<td><span class="term"><i><tt>ctxt</tt></i>:</span></td> 1009<td>an HTML parser context</td> 1010</tr> 1011<tr> 1012<td><span class="term"><i><tt>Returns</tt></i>:</span></td> 1013<td>the value parsed (as an int)</td> 1014</tr> 1015</tbody> 1016</table></div> 1017</div> 1018<hr> 1019<div class="refsect2" lang="en"> 1020<h3> 1021<a name="htmlParseChunk"></a>htmlParseChunk ()</h3> 1022<pre class="programlisting">int htmlParseChunk (<a 
href="libxml2-HTMLparser.html#htmlParserCtxtPtr">htmlParserCtxtPtr</a> ctxt, <br> const char * chunk, <br> int size, <br> int terminate)<br> 1023</pre> 1024<p>Parse a chunk of memory</p> 1025<div class="variablelist"><table border="0"> 1026<col align="left"> 1027<tbody> 1028<tr> 1029<td><span class="term"><i><tt>ctxt</tt></i>:</span></td> 1030<td>an HTML parser context</td> 1031</tr> 1032<tr> 1033<td><span class="term"><i><tt>chunk</tt></i>:</span></td> 1034<td>a char array</td> 1035</tr> 1036<tr> 1037<td><span class="term"><i><tt>size</tt></i>:</span></td> 1038<td>the size in bytes of the chunk</td> 1039</tr> 1040<tr> 1041<td><span class="term"><i><tt>terminate</tt></i>:</span></td> 1042<td>last chunk indicator</td> 1043</tr> 1044<tr> 1045<td><span class="term"><i><tt>Returns</tt></i>:</span></td> 1046<td>zero if no error, the <a href="libxml2-xmlerror.html#xmlParserErrors">xmlParserErrors</a> otherwise.</td> 1047</tr> 1048</tbody> 1049</table></div> 1050</div> 1051<hr> 1052<div class="refsect2" lang="en"> 1053<h3> 1054<a name="htmlParseDoc"></a>htmlParseDoc ()</h3> 1055<pre class="programlisting"><a href="libxml2-HTMLparser.html#htmlDocPtr">htmlDocPtr</a> htmlParseDoc (const <a href="libxml2-xmlstring.html#xmlChar">xmlChar</a> * cur, <br> const char * encoding)<br> 1056</pre> 1057<p>parse an HTML in-memory document and build a tree.</p> 1058<div class="variablelist"><table border="0"> 1059<col align="left"> 1060<tbody> 1061<tr> 1062<td><span class="term"><i><tt>cur</tt></i>:</span></td> 1063<td>a pointer to an array of <a href="libxml2-xmlstring.html#xmlChar">xmlChar</a> 1064</td> 1065</tr> 1066<tr> 1067<td><span class="term"><i><tt>encoding</tt></i>:</span></td> 1068<td>a free form C string describing the HTML document encoding, or NULL</td> 1069</tr> 1070<tr> 1071<td><span class="term"><i><tt>Returns</tt></i>:</span></td> 1072<td>the resulting document tree</td> 1073</tr> 1074</tbody> 1075</table></div> 1076</div> 1077<hr> 1078<div class="refsect2" lang="en"> 
1079<h3> 1080<a name="htmlParseDocument"></a>htmlParseDocument ()</h3> 1081<pre class="programlisting">int htmlParseDocument (<a href="libxml2-HTMLparser.html#htmlParserCtxtPtr">htmlParserCtxtPtr</a> ctxt)<br> 1082</pre> 1083<p>parse an HTML document (and build a tree if using the standard SAX interface).</p> 1084<div class="variablelist"><table border="0"> 1085<col align="left"> 1086<tbody> 1087<tr> 1088<td><span class="term"><i><tt>ctxt</tt></i>:</span></td> 1089<td>an HTML parser context</td> 1090</tr> 1091<tr> 1092<td><span class="term"><i><tt>Returns</tt></i>:</span></td> 1093<td>0 on success, -1 in case of error. The parser context is augmented as a result of the parsing.</td> 1094</tr> 1095</tbody> 1096</table></div> 1097</div> 1098<hr> 1099<div class="refsect2" lang="en"> 1100<h3> 1101<a name="htmlParseElement"></a>htmlParseElement ()</h3> 1102<pre class="programlisting">void htmlParseElement (<a href="libxml2-HTMLparser.html#htmlParserCtxtPtr">htmlParserCtxtPtr</a> ctxt)<br> 1103</pre> 1104<p>parse an HTML element. This is highly recursive and is kept for compatibility with previous code versions. [39] element ::= EmptyElemTag | STag content ETag [41] Attribute ::= Name Eq AttValue</p> 1105<div class="variablelist"><table border="0"> 1106<col align="left"> 1107<tbody><tr> 1108<td><span class="term"><i><tt>ctxt</tt></i>:</span></td> 1109<td>an HTML parser context</td> 1110</tr></tbody> 1111</table></div> 1112</div> 1113<hr> 1114<div class="refsect2" lang="en"> 1115<h3> 1116<a name="htmlParseEntityRef"></a>htmlParseEntityRef ()</h3> 1117<pre class="programlisting">const <a href="libxml2-HTMLparser.html#htmlEntityDesc">htmlEntityDesc</a> * htmlParseEntityRef (<a href="libxml2-HTMLparser.html#htmlParserCtxtPtr">htmlParserCtxtPtr</a> ctxt, <br> const <a href="libxml2-xmlstring.html#xmlChar">xmlChar</a> ** str)<br> 1118</pre> 1119<p>parse an HTML entity reference. [68] EntityRef ::= '&' Name ';'</p> 1120<div class="variablelist"><table border="0"> 1121<col align="left"> 
1122<tbody> 1123<tr> 1124<td><span class="term"><i><tt>ctxt</tt></i>:</span></td> 1125<td>an HTML parser context</td> 1126</tr> 1127<tr> 1128<td><span class="term"><i><tt>str</tt></i>:</span></td> 1129<td>location to store the entity name</td> 1130</tr> 1131<tr> 1132<td><span class="term"><i><tt>Returns</tt></i>:</span></td> 1133<td>the associated <a href="libxml2-HTMLparser.html#htmlEntityDescPtr">htmlEntityDescPtr</a> if found, or NULL otherwise. If non-NULL, *str will have to be freed by the caller.</td> 1134</tr> 1135</tbody> 1136</table></div> 1137</div> 1138<hr> 1139<div class="refsect2" lang="en"> 1140<h3> 1141<a name="htmlParseFile"></a>htmlParseFile ()</h3> 1142<pre class="programlisting"><a href="libxml2-HTMLparser.html#htmlDocPtr">htmlDocPtr</a> htmlParseFile (const char * filename, <br> const char * encoding)<br> 1143</pre> 1144<p>parse an HTML file and build a tree. Automatic support for ZLIB/Compress compressed documents is provided by default if found at compile time.</p> 1145<div class="variablelist"><table border="0"> 1146<col align="left"> 1147<tbody> 1148<tr> 1149<td><span class="term"><i><tt>filename</tt></i>:</span></td> 1150<td>the filename</td> 1151</tr> 1152<tr> 1153<td><span class="term"><i><tt>encoding</tt></i>:</span></td> 1154<td>a free form C string describing the HTML document encoding, or NULL</td> 1155</tr> 1156<tr> 1157<td><span class="term"><i><tt>Returns</tt></i>:</span></td> 1158<td>the resulting document tree</td> 1159</tr> 1160</tbody> 1161</table></div> 1162</div> 1163<hr> 1164<div class="refsect2" lang="en"> 1165<h3> 1166<a name="htmlReadDoc"></a>htmlReadDoc ()</h3> 1167<pre class="programlisting"><a href="libxml2-HTMLparser.html#htmlDocPtr">htmlDocPtr</a> htmlReadDoc (const <a href="libxml2-xmlstring.html#xmlChar">xmlChar</a> * cur, <br> const char * URL, <br> const char * encoding, <br> int options)<br> 1168</pre> 1169<p>parse an HTML in-memory document and build a tree.</p> 1170<div class="variablelist"><table border="0"> 
1171<col align="left"> 1172<tbody> 1173<tr> 1174<td><span class="term"><i><tt>cur</tt></i>:</span></td> 1175<td>a pointer to a zero terminated string</td> 1176</tr> 1177<tr> 1178<td><span class="term"><i><tt>URL</tt></i>:</span></td> 1179<td>the base URL to use for the document</td> 1180</tr> 1181<tr> 1182<td><span class="term"><i><tt>encoding</tt></i>:</span></td> 1183<td>the document encoding, or NULL</td> 1184</tr> 1185<tr> 1186<td><span class="term"><i><tt>options</tt></i>:</span></td> 1187<td>a combination of htmlParserOption(s)</td> 1188</tr> 1189<tr> 1190<td><span class="term"><i><tt>Returns</tt></i>:</span></td> 1191<td>the resulting document tree</td> 1192</tr> 1193</tbody> 1194</table></div> 1195</div> 1196<hr> 1197<div class="refsect2" lang="en"> 1198<h3> 1199<a name="htmlReadFd"></a>htmlReadFd ()</h3> 1200<pre class="programlisting"><a href="libxml2-HTMLparser.html#htmlDocPtr">htmlDocPtr</a> htmlReadFd (int fd, <br> const char * URL, <br> const char * encoding, <br> int options)<br> 1201</pre> 1202<p>parse an HTML document from a file descriptor and build a tree. 
NOTE that the file descriptor will not be closed when the reader is closed or reset.</p> 1203<div class="variablelist"><table border="0"> 1204<col align="left"> 1205<tbody> 1206<tr> 1207<td><span class="term"><i><tt>fd</tt></i>:</span></td> 1208<td>an open file descriptor</td> 1209</tr> 1210<tr> 1211<td><span class="term"><i><tt>URL</tt></i>:</span></td> 1212<td>the base URL to use for the document</td> 1213</tr> 1214<tr> 1215<td><span class="term"><i><tt>encoding</tt></i>:</span></td> 1216<td>the document encoding, or NULL</td> 1217</tr> 1218<tr> 1219<td><span class="term"><i><tt>options</tt></i>:</span></td> 1220<td>a combination of htmlParserOption(s)</td> 1221</tr> 1222<tr> 1223<td><span class="term"><i><tt>Returns</tt></i>:</span></td> 1224<td>the resulting document tree</td> 1225</tr> 1226</tbody> 1227</table></div> 1228</div> 1229<hr> 1230<div class="refsect2" lang="en"> 1231<h3> 1232<a name="htmlReadFile"></a>htmlReadFile ()</h3> 1233<pre class="programlisting"><a href="libxml2-HTMLparser.html#htmlDocPtr">htmlDocPtr</a> htmlReadFile (const char * filename, <br> const char * encoding, <br> int options)<br> 1234</pre> 1235<p>parse an HTML file from the filesystem or the network.</p> 1236<div class="variablelist"><table border="0"> 1237<col align="left"> 1238<tbody> 1239<tr> 1240<td><span class="term"><i><tt>filename</tt></i>:</span></td> 1241<td>a file or URL</td> 1242</tr> 1243<tr> 1244<td><span class="term"><i><tt>encoding</tt></i>:</span></td> 1245<td>the document encoding, or NULL</td> 1246</tr> 1247<tr> 1248<td><span class="term"><i><tt>options</tt></i>:</span></td> 1249<td>a combination of htmlParserOption(s)</td> 1250</tr> 1251<tr> 1252<td><span class="term"><i><tt>Returns</tt></i>:</span></td> 1253<td>the resulting document tree</td> 1254</tr> 1255</tbody> 1256</table></div> 1257</div> 1258<hr> 1259<div class="refsect2" lang="en"> 1260<h3> 1261<a name="htmlReadIO"></a>htmlReadIO ()</h3> 1262<pre class="programlisting"><a 
href="libxml2-HTMLparser.html#htmlDocPtr">htmlDocPtr</a> htmlReadIO (<a href="libxml2-xmlIO.html#xmlInputReadCallback">xmlInputReadCallback</a> ioread, <br> <a href="libxml2-xmlIO.html#xmlInputCloseCallback">xmlInputCloseCallback</a> ioclose, <br> void * ioctx, <br> const char * URL, <br> const char * encoding, <br> int options)<br> 1263</pre> 1264<p>parse an HTML document from I/O functions and source and build a tree.</p> 1265<div class="variablelist"><table border="0"> 1266<col align="left"> 1267<tbody> 1268<tr> 1269<td><span class="term"><i><tt>ioread</tt></i>:</span></td> 1270<td>an I/O read function</td> 1271</tr> 1272<tr> 1273<td><span class="term"><i><tt>ioclose</tt></i>:</span></td> 1274<td>an I/O close function</td> 1275</tr> 1276<tr> 1277<td><span class="term"><i><tt>ioctx</tt></i>:</span></td> 1278<td>an I/O handler</td> 1279</tr> 1280<tr> 1281<td><span class="term"><i><tt>URL</tt></i>:</span></td> 1282<td>the base URL to use for the document</td> 1283</tr> 1284<tr> 1285<td><span class="term"><i><tt>encoding</tt></i>:</span></td> 1286<td>the document encoding, or NULL</td> 1287</tr> 1288<tr> 1289<td><span class="term"><i><tt>options</tt></i>:</span></td> 1290<td>a combination of htmlParserOption(s)</td> 1291</tr> 1292<tr> 1293<td><span class="term"><i><tt>Returns</tt></i>:</span></td> 1294<td>the resulting document tree</td> 1295</tr> 1296</tbody> 1297</table></div> 1298</div> 1299<hr> 1300<div class="refsect2" lang="en"> 1301<h3> 1302<a name="htmlReadMemory"></a>htmlReadMemory ()</h3> 1303<pre class="programlisting"><a href="libxml2-HTMLparser.html#htmlDocPtr">htmlDocPtr</a> htmlReadMemory (const char * buffer, <br> int size, <br> const char * URL, <br> const char * encoding, <br> int options)<br> 1304</pre> 1305<p>parse an HTML in-memory document and build a tree.</p> 1306<div class="variablelist"><table border="0"> 1307<col align="left"> 1308<tbody> 1309<tr> 1310<td><span class="term"><i><tt>buffer</tt></i>:</span></td> 1311<td>a pointer to a char 
array</td> 1312</tr> 1313<tr> 1314<td><span class="term"><i><tt>size</tt></i>:</span></td> 1315<td>the size of the array</td> 1316</tr> 1317<tr> 1318<td><span class="term"><i><tt>URL</tt></i>:</span></td> 1319<td>the base URL to use for the document</td> 1320</tr> 1321<tr> 1322<td><span class="term"><i><tt>encoding</tt></i>:</span></td> 1323<td>the document encoding, or NULL</td> 1324</tr> 1325<tr> 1326<td><span class="term"><i><tt>options</tt></i>:</span></td> 1327<td>a combination of htmlParserOption(s)</td> 1328</tr> 1329<tr> 1330<td><span class="term"><i><tt>Returns</tt></i>:</span></td> 1331<td>the resulting document tree</td> 1332</tr> 1333</tbody> 1334</table></div> 1335</div> 1336<hr> 1337<div class="refsect2" lang="en"> 1338<h3> 1339<a name="htmlSAXParseDoc"></a>htmlSAXParseDoc ()</h3> 1340<pre class="programlisting"><a href="libxml2-HTMLparser.html#htmlDocPtr">htmlDocPtr</a> htmlSAXParseDoc (const <a href="libxml2-xmlstring.html#xmlChar">xmlChar</a> * cur, <br> const char * encoding, <br> <a href="libxml2-HTMLparser.html#htmlSAXHandlerPtr">htmlSAXHandlerPtr</a> sax, <br> void * userData)<br> 1341</pre> 1342<p>Parse an HTML in-memory document. If sax is not NULL, use the SAX callbacks to handle parse events. 
If sax is NULL, fall back to the default DOM behavior and return a tree.</p> 1343<div class="variablelist"><table border="0"> 1344<col align="left"> 1345<tbody> 1346<tr> 1347<td><span class="term"><i><tt>cur</tt></i>:</span></td> 1348<td>a pointer to an array of <a href="libxml2-xmlstring.html#xmlChar">xmlChar</a> 1349</td> 1350</tr> 1351<tr> 1352<td><span class="term"><i><tt>encoding</tt></i>:</span></td> 1353<td>a free form C string describing the HTML document encoding, or NULL</td> 1354</tr> 1355<tr> 1356<td><span class="term"><i><tt>sax</tt></i>:</span></td> 1357<td>the SAX handler block</td> 1358</tr> 1359<tr> 1360<td><span class="term"><i><tt>userData</tt></i>:</span></td> 1361<td>if using SAX, this pointer will be provided on callbacks.</td> 1362</tr> 1363<tr> 1364<td><span class="term"><i><tt>Returns</tt></i>:</span></td> 1365<td>the resulting document tree unless SAX is NULL or the document is not well formed.</td> 1366</tr> 1367</tbody> 1368</table></div> 1369</div> 1370<hr> 1371<div class="refsect2" lang="en"> 1372<h3> 1373<a name="htmlSAXParseFile"></a>htmlSAXParseFile ()</h3> 1374<pre class="programlisting"><a href="libxml2-HTMLparser.html#htmlDocPtr">htmlDocPtr</a> htmlSAXParseFile (const char * filename, <br> const char * encoding, <br> <a href="libxml2-HTMLparser.html#htmlSAXHandlerPtr">htmlSAXHandlerPtr</a> sax, <br> void * userData)<br> 1375</pre> 1376<p>parse an HTML file and build a tree. Automatic support for ZLIB/Compress compressed documents is provided by default if found at compile time. It uses the given SAX function block to handle the parsing callbacks. 
If sax is NULL, fall back to the default DOM tree building routines.</p> 1377<div class="variablelist"><table border="0"> 1378<col align="left"> 1379<tbody> 1380<tr> 1381<td><span class="term"><i><tt>filename</tt></i>:</span></td> 1382<td>the filename</td> 1383</tr> 1384<tr> 1385<td><span class="term"><i><tt>encoding</tt></i>:</span></td> 1386<td>a free form C string describing the HTML document encoding, or NULL</td> 1387</tr> 1388<tr> 1389<td><span class="term"><i><tt>sax</tt></i>:</span></td> 1390<td>the SAX handler block</td> 1391</tr> 1392<tr> 1393<td><span class="term"><i><tt>userData</tt></i>:</span></td> 1394<td>if using SAX, this pointer will be provided on callbacks.</td> 1395</tr> 1396<tr> 1397<td><span class="term"><i><tt>Returns</tt></i>:</span></td> 1398<td>the resulting document tree unless SAX is NULL or the document is not well formed.</td> 1399</tr> 1400</tbody> 1401</table></div> 1402</div> 1403<hr> 1404<div class="refsect2" lang="en"> 1405<h3> 1406<a name="htmlTagLookup"></a>htmlTagLookup ()</h3> 1407<pre class="programlisting">const <a href="libxml2-HTMLparser.html#htmlElemDesc">htmlElemDesc</a> * htmlTagLookup (const <a href="libxml2-xmlstring.html#xmlChar">xmlChar</a> * tag)<br> 1408</pre> 1409<p>Look up the HTML tag in the ElementTable</p> 1410<div class="variablelist"><table border="0"> 1411<col align="left"> 1412<tbody> 1413<tr> 1414<td><span class="term"><i><tt>tag</tt></i>:</span></td> 1415<td>The tag name in lowercase</td> 1416</tr> 1417<tr> 1418<td><span class="term"><i><tt>Returns</tt></i>:</span></td> 1419<td>the related <a href="libxml2-HTMLparser.html#htmlElemDescPtr">htmlElemDescPtr</a> or NULL if not found.</td> 1420</tr> 1421</tbody> 1422</table></div> 1423</div> 1424<hr> 1425</div> 1426</div> 1427</body> 1428</html> 1429