3   WHATWG
4
5   HTML 5
6
7   Draft Recommendation, 13 January 2009
8
12    8.2.5 Tree construction
13
14   The input to the tree construction stage is a sequence of tokens from
15   the tokenization stage. The tree construction stage is associated with
16   a DOM Document object when a parser is created. The "output" of this
17   stage consists of dynamically modifying or extending that document's
18   DOM tree.
19
20   This specification does not define when an interactive user agent has
21   to render the Document so that it is available to the user, or when it
22   has to begin accepting user input.
23
24   As each token is emitted from the tokeniser, the user agent must
25   process the token according to the rules given in the section
26   corresponding to the current insertion mode.
27
28   When the steps below require the UA to insert a character into a node,
29   if that node has a child immediately before where the character is to
30   be inserted, and that child is a Text node, and that Text node was the
31   last node that the parser inserted into the document, then the
32   character must be appended to that Text node; otherwise, a new Text
33   node whose data is just that character must be inserted in the
34   appropriate place.
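
The character insertion rule above can be illustrated with a small, non-normative Python sketch. The Node and Inserter classes are hypothetical stand-ins for DOM nodes and parser state; they are not part of any DOM API, and the sketch assumes the parser records the last node it inserted.

```python
class Node:
    """Tiny stand-in for a DOM node (hypothetical, not a real DOM API)."""

    def __init__(self, kind, data=""):
        self.kind = kind        # "element" or "text"
        self.data = data        # character data, used by text nodes
        self.children = []


class Inserter:
    """Tracks the last node the parser inserted into the document."""

    def __init__(self):
        self.last_inserted = None

    def insert_character(self, parent, ch, index=None):
        """Insert ch into parent at index (default: after the last child)."""
        if index is None:
            index = len(parent.children)
        prev = parent.children[index - 1] if index > 0 else None
        # Append only if the preceding child is a Text node AND it was the
        # last node the parser inserted.
        if prev is not None and prev.kind == "text" and prev is self.last_inserted:
            prev.data += ch
            return prev
        # Otherwise a new Text node holding just this character is inserted.
        text = Node("text", ch)
        parent.children.insert(index, text)
        self.last_inserted = text
        return text
```

Two consecutive calls on the same parent therefore produce a single Text node, while a character that follows some other parser-inserted node starts a new one.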
35
36   DOM mutation events must not fire for changes caused by the UA parsing
37   the document. (Conceptually, the parser is not mutating the DOM; it is
38   constructing it.) This includes the parsing of any content inserted
39   using document.write() and document.writeln() calls. [DOM3EVENTS]
40
41   Not all of the tag names mentioned below are conformant tag names in
42   this specification; many are included to handle legacy content. They
43   still form part of the algorithm that implementations are required to
44   implement to claim conformance.
45
46   The algorithm described below places no limit on the depth of the DOM
47   tree generated, or on the length of tag names, attribute names,
48   attribute values, text nodes, etc. While implementors are encouraged to
49   avoid arbitrary limits, it is recognized that practical concerns will
50   likely force user agents to impose nesting depth limits.
51
52      8.2.5.1 Creating and inserting elements
53
54   When the steps below require the UA to create an element for a token in
55   a particular namespace, the UA must create a node implementing the
56   interface appropriate for the element type corresponding to the tag
57   name of the token in the given namespace (as given in the specification
58   that defines that element, e.g. for an a element in the HTML namespace,
59   this specification defines it to be the HTMLAnchorElement interface),
60   with the tag name being the name of that element, with the node being
61   in the given namespace, and with the attributes on the node being those
62   given in the given token.
63
64   The interface appropriate for an element in the HTML namespace that is
65   not defined in this specification is HTMLElement. The interface
66   appropriate for an element in another namespace that is not defined by
67   that namespace's specification is Element.
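
As a non-normative illustration, the interface selection described above might be sketched as follows. The KNOWN_INTERFACES table is a hypothetical lookup that contains only the one example named in the text; a real implementation would cover every element defined by the relevant specifications.

```python
HTML_NS = "http://www.w3.org/1999/xhtml"

# Hypothetical table: only the example from the text is filled in.
KNOWN_INTERFACES = {
    (HTML_NS, "a"): "HTMLAnchorElement",
}


def interface_for(namespace, tag_name):
    """Return the name of the interface a newly created element implements."""
    known = KNOWN_INTERFACES.get((namespace, tag_name))
    if known is not None:
        return known
    # Fallbacks for elements not defined by the namespace's specification.
    return "HTMLElement" if namespace == HTML_NS else "Element"
```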
68
69   When a resettable element is created in this manner, its reset
70   algorithm must be invoked once the attributes are set. (This
71   initializes the element's value and checkedness based on the element's
72   attributes.)
73     __________________________________________________________________
74
75   When the steps below require the UA to insert an HTML element for a
76   token, the UA must first create an element for the token in the HTML
77   namespace, and then append this node to the current node, and push it
78   onto the stack of open elements so that it is the new current node.
79
80   The steps below may also require that the UA insert an HTML element in
81   a particular place, in which case the UA must follow the same steps
82   except that it must insert or append the new node in the location
83   specified instead of appending it to the current node. (This happens in
84   particular during the parsing of tables with invalid content.)
85
86   If an element created by the insert an HTML element algorithm is a
87   form-associated element, and the form element pointer is not null, and
88   the newly created element doesn't have a form attribute, the user agent
89   must associate the newly created element with the form element pointed
90   to by the form element pointer before inserting it wherever it is to be
91   inserted.
92     __________________________________________________________________
93
94   When the steps below require the UA to insert a foreign element for a
95   token, the UA must first create an element for the token in the given
96   namespace, and then append this node to the current node, and push it
97   onto the stack of open elements so that it is the new current node. If
98   the newly created element has an xmlns attribute in the XMLNS namespace
99   whose value is not exactly the same as the element's namespace, that is
100   a parse error.
101
102   When the steps below require the user agent to adjust MathML attributes
103   for a token, then, if the token has an attribute named definitionurl,
104   change its name to definitionURL (note the case difference).
105
106   When the steps below require the user agent to adjust foreign
107   attributes for a token, then, if any of the attributes on the token
108   match the strings given in the first column of the following table, let
109   the attribute be a namespaced attribute, with the prefix being the
110   string given in the corresponding cell in the second column, the local
111   name being the string given in the corresponding cell in the third
112   column, and the namespace being the namespace given in the
113   corresponding cell in the fourth column. (This fixes the use of
114   namespaced attributes, in particular xml:lang.)
115
116   Attribute name Prefix Local name    Namespace
117   xlink:actuate  xlink  actuate    XLink namespace
118   xlink:arcrole  xlink  arcrole    XLink namespace
119   xlink:href     xlink  href       XLink namespace
120   xlink:role     xlink  role       XLink namespace
121   xlink:show     xlink  show       XLink namespace
122   xlink:title    xlink  title      XLink namespace
123   xlink:type     xlink  type       XLink namespace
124   xml:base       xml    base       XML namespace
125   xml:lang       xml    lang       XML namespace
126   xml:space      xml    space      XML namespace
127   xmlns          (none) xmlns      XMLNS namespace
128   xmlns:xlink    xmlns  xlink      XMLNS namespace
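
The two attribute adjustments above lend themselves to a table-driven sketch. This is a non-normative illustration: the function names and the representation of token attributes (a plain name-to-value mapping) are assumptions. The namespace URIs are the standard ones for XLink, XML, and XMLNS.

```python
XLINK_NS = "http://www.w3.org/1999/xlink"
XML_NS = "http://www.w3.org/XML/1998/namespace"
XMLNS_NS = "http://www.w3.org/2000/xmlns/"

# attribute name -> (prefix, local name, namespace), mirroring the table above
FOREIGN_ATTRIBUTES = {
    "xlink:actuate": ("xlink", "actuate", XLINK_NS),
    "xlink:arcrole": ("xlink", "arcrole", XLINK_NS),
    "xlink:href": ("xlink", "href", XLINK_NS),
    "xlink:role": ("xlink", "role", XLINK_NS),
    "xlink:show": ("xlink", "show", XLINK_NS),
    "xlink:title": ("xlink", "title", XLINK_NS),
    "xlink:type": ("xlink", "type", XLINK_NS),
    "xml:base": ("xml", "base", XML_NS),
    "xml:lang": ("xml", "lang", XML_NS),
    "xml:space": ("xml", "space", XML_NS),
    "xmlns": (None, "xmlns", XMLNS_NS),
    "xmlns:xlink": ("xmlns", "xlink", XMLNS_NS),
}


def adjust_mathml_attributes(attrs):
    """Rename definitionurl to definitionURL; other attributes are unchanged."""
    return {
        ("definitionURL" if name == "definitionurl" else name): value
        for name, value in attrs.items()
    }


def adjust_foreign_attributes(attrs):
    """Return (prefix, local name, namespace, value) for each attribute.

    Attributes not listed in the table keep their name and get no namespace.
    """
    adjusted = []
    for name, value in attrs.items():
        prefix, local, namespace = FOREIGN_ATTRIBUTES.get(name, (None, name, None))
        adjusted.append((prefix, local, namespace, value))
    return adjusted
```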
129     __________________________________________________________________
130
131   The generic CDATA element parsing algorithm and the generic RCDATA
132   element parsing algorithm consist of the following steps. These
133   algorithms are always invoked in response to a start tag token.
134    1. Insert an HTML element for the token.
135    2. If the algorithm that was invoked is the generic CDATA element
136       parsing algorithm, switch the tokeniser's content model flag to the
137       CDATA state; otherwise (the algorithm invoked was the generic RCDATA
138       element parsing algorithm), switch the tokeniser's content model
139       flag to the RCDATA state.
140    3. Let the original insertion mode be the current insertion mode.
141    4. Then, switch the insertion mode to "in CDATA/RCDATA".
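
The four steps above can be sketched as follows. This is a non-normative illustration; the Parser class and its attribute names are hypothetical, and inserting an HTML element is reduced to pushing a tag name onto the stack.

```python
class Parser:
    """Minimal parser state (hypothetical names, for illustration only)."""

    def __init__(self, insertion_mode="in head"):
        self.open_elements = []            # stack of open elements (tag names)
        self.content_model = "PCDATA"      # tokeniser's content model flag
        self.insertion_mode = insertion_mode
        self.original_insertion_mode = None

    def insert_html_element(self, tag_name):
        # Simplified: a real parser creates a node and appends it to the
        # current node before pushing it onto the stack.
        self.open_elements.append(tag_name)

    def generic_text_element(self, tag_name, kind):
        """Run the generic CDATA or RCDATA element parsing algorithm."""
        if kind not in ("CDATA", "RCDATA"):
            raise ValueError("kind must be 'CDATA' or 'RCDATA'")
        self.insert_html_element(tag_name)                  # step 1
        self.content_model = kind                           # step 2
        self.original_insertion_mode = self.insertion_mode  # step 3
        self.insertion_mode = "in CDATA/RCDATA"             # step 4
```

Saving the original insertion mode before switching is what allows the parser to return to it once the element's end tag is seen.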
142
143      8.2.5.2 Closing elements that have implied end tags
144
145   When the steps below require the UA to generate implied end tags, then,
146   while the current node is a dd element, a dt element, an li element, an
147   option element, an optgroup element, a p element, an rp element, or an
148   rt element, the UA must pop the current node off the stack of open
149   elements.
150
151   If a step requires the UA to generate implied end tags but lists an
152   element to exclude from the process, then the UA must perform the above
153   steps as if that element were not in the above list.
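
A minimal, non-normative sketch of generating implied end tags, assuming the stack of open elements is modelled as a list of tag names whose last item is the current node (the function and parameter names are illustrative):

```python
IMPLIED_END_TAG_ELEMENTS = frozenset(
    {"dd", "dt", "li", "option", "optgroup", "p", "rp", "rt"}
)


def generate_implied_end_tags(open_elements, exclude=None):
    """Pop current nodes while they are in the implied-end-tag list.

    exclude names one element to treat as if it were not in the list.
    """
    candidates = IMPLIED_END_TAG_ELEMENTS - {exclude}
    while open_elements and open_elements[-1] in candidates:
        open_elements.pop()
```

The loop stops at the first current node that is not in the list, so elements further down the stack are never touched.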
154
155      8.2.5.3 Foster parenting
156
157   Foster parenting happens when content is misnested in tables.
158
159   When a node (called node below) is to be foster parented, node must be
160   inserted into the foster parent element, and the current table must be
161   marked as tainted. (Once the current table has been tainted, whitespace
162   characters are inserted into the foster parent element instead of the
163   current node.)
164
165   The foster parent element is the parent element of the last table
166   element in the stack of open elements, if there is a table element and
167   it has such a parent element. If there is no table element in the stack
168   of open elements (fragment case), then the foster parent element is the
169   first element in the stack of open elements (the html element).
170   Otherwise, if there is a table element in the stack of open elements,
171   but the last table element in the stack of open elements has no parent,
172   or its parent node is not an element, then the foster parent element is
173   the element before the last table element in the stack of open
174   elements.
175
176   If the foster parent element is the parent element of the last table
177   element in the stack of open elements, then node must be inserted
178   immediately before the last table element in the stack of open elements
179   in the foster parent element; otherwise, node must be appended to the
180   foster parent element.
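
The choice of foster parent element and insertion position can be sketched as below. This is a non-normative illustration: the Node class is a hypothetical stand-in for DOM nodes, and tainting is modelled as a simple flag on the last table element in the stack.

```python
class Node:
    """Hypothetical stand-in for a DOM node."""

    def __init__(self, name, is_element=True):
        self.name = name
        self.is_element = is_element
        self.parent = None
        self.children = []
        self.tainted = False

    def append(self, child):
        child.parent = self
        self.children.append(child)

    def insert_before(self, child, reference):
        child.parent = self
        self.children.insert(self.children.index(reference), child)


def foster_parent(node, open_elements):
    """Foster parent node; return the foster parent element that was used."""
    # Last table element in the stack of open elements, if any.
    table = next((el for el in reversed(open_elements) if el.name == "table"), None)
    if table is None:
        # Fragment case: use the first element in the stack (the html element).
        target = open_elements[0]
        target.append(node)
        return target
    table.tainted = True
    if table.parent is not None and table.parent.is_element:
        # Insert immediately before the table, inside the table's parent.
        target = table.parent
        target.insert_before(node, table)
    else:
        # The table has no parent element: use the element before it in the stack.
        target = open_elements[open_elements.index(table) - 1]
        target.append(node)
    return target
```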
181
182      8.2.5.4 The "initial" insertion mode
183
184   When the insertion mode is "initial", tokens must be handled as
185   follows:
186
187   A character token that is one of U+0009 CHARACTER TABULATION,
188          U+000A LINE FEED (LF), U+000C FORM FEED (FF), or U+0020 SPACE
189          Ignore the token.
190
191   A comment token
192          Append a Comment node to the Document object with the data
193          attribute set to the data given in the comment token.
194
195   A DOCTYPE token
196          If the DOCTYPE token's name is not a case-sensitive match for
197          the string "html", or if the token's public identifier is
198          neither missing nor a case-sensitive match for the string
199          "XSLT-compat", or if the token's system identifier is not
200          missing, then there is a parse error (this is the DOCTYPE parse
201          error). Conformance checkers may, instead of reporting this
202          error, switch to a conformance checking mode for another
203          language (e.g. based on the DOCTYPE token a conformance checker
204          could recognize that the document is an HTML4-era document, and
205          defer to an HTML4 conformance checker.)
206
207          Append a DocumentType node to the Document node, with the name
208          attribute set to the name given in the DOCTYPE token; the
209          publicId attribute set to the public identifier given in the
210          DOCTYPE token, or the empty string if the public identifier was
211          missing; the systemId attribute set to the system identifier
212          given in the DOCTYPE token, or the empty string if the system
213          identifier was missing; and the other attributes specific to
214          DocumentType objects set to null and empty lists as appropriate.
215          Associate the DocumentType node with the Document object so that
216          it is returned as the value of the doctype attribute of the
217          Document object.
218
219          Then, if the DOCTYPE token matches one of the conditions in the
220          following list, then set the document to quirks mode:
221
222          + The force-quirks flag is set to on.
223          + The name is set to anything other than "HTML".
224          + The public identifier starts with: "+//Silmaril//dtd html Pro
225            v0r11 19970101//"
226          + The public identifier starts with: "-//AdvaSoft Ltd//DTD HTML
227            3.0 asWedit + extensions//"
228          + The public identifier starts with: "-//AS//DTD HTML 3.0
229            asWedit + extensions//"
230          + The public identifier starts with: "-//IETF//DTD HTML 2.0
231            Level 1//"
232          + The public identifier starts with: "-//IETF//DTD HTML 2.0
233            Level 2//"
234          + The public identifier starts with: "-//IETF//DTD HTML 2.0
235            Strict Level 1//"
236          + The public identifier starts with: "-//IETF//DTD HTML 2.0
237            Strict Level 2//"
238          + The public identifier starts with: "-//IETF//DTD HTML 2.0
239            Strict//"
240          + The public identifier starts with: "-//IETF//DTD HTML 2.0//"
241          + The public identifier starts with: "-//IETF//DTD HTML 2.1E//"
242          + The public identifier starts with: "-//IETF//DTD HTML 3.0//"
243          + The public identifier starts with: "-//IETF//DTD HTML 3.2
244            Final//"
245          + The public identifier starts with: "-//IETF//DTD HTML 3.2//"
246          + The public identifier starts with: "-//IETF//DTD HTML 3//"
247          + The public identifier starts with: "-//IETF//DTD HTML Level
248            0//"
249          + The public identifier starts with: "-//IETF//DTD HTML Level
250            1//"
251          + The public identifier starts with: "-//IETF//DTD HTML Level
252            2//"
253          + The public identifier starts with: "-//IETF//DTD HTML Level
254            3//"
255          + The public identifier starts with: "-//IETF//DTD HTML Strict
256            Level 0//"
257          + The public identifier starts with: "-//IETF//DTD HTML Strict
258            Level 1//"
259          + The public identifier starts with: "-//IETF//DTD HTML Strict
260            Level 2//"
261          + The public identifier starts with: "-//IETF//DTD HTML Strict
262            Level 3//"
263          + The public identifier starts with: "-//IETF//DTD HTML
264            Strict//"
265          + The public identifier starts with: "-//IETF//DTD HTML//"
266          + The public identifier starts with: "-//Metrius//DTD Metrius
267            Presentational//"
268          + The public identifier starts with: "-//Microsoft//DTD Internet
269            Explorer 2.0 HTML Strict//"
270          + The public identifier starts with: "-//Microsoft//DTD Internet
271            Explorer 2.0 HTML//"
272          + The public identifier starts with: "-//Microsoft//DTD Internet
273            Explorer 2.0 Tables//"
274          + The public identifier starts with: "-//Microsoft//DTD Internet
275            Explorer 3.0 HTML Strict//"
276          + The public identifier starts with: "-//Microsoft//DTD Internet
277            Explorer 3.0 HTML//"
278          + The public identifier starts with: "-//Microsoft//DTD Internet
279            Explorer 3.0 Tables//"
280          + The public identifier starts with: "-//Netscape Comm.
281            Corp.//DTD HTML//"
282          + The public identifier starts with: "-//Netscape Comm.
283            Corp.//DTD Strict HTML//"
284          + The public identifier starts with: "-//O'Reilly and
285            Associates//DTD HTML 2.0//"
286          + The public identifier starts with: "-//O'Reilly and
287            Associates//DTD HTML Extended 1.0//"
288          + The public identifier starts with: "-//O'Reilly and
289            Associates//DTD HTML Extended Relaxed 1.0//"
290          + The public identifier starts with: "-//SoftQuad Software//DTD
291            HoTMetaL PRO 6.0::19990601::extensions to HTML 4.0//"
292          + The public identifier starts with: "-//SoftQuad//DTD HoTMetaL
293            PRO 4.0::19971010::extensions to HTML 4.0//"
294          + The public identifier starts with: "-//Spyglass//DTD HTML 2.0
295            Extended//"
296          + The public identifier starts with: "-//SQ//DTD HTML 2.0
297            HoTMetaL + extensions//"
298          + The public identifier starts with: "-//Sun Microsystems
299            Corp.//DTD HotJava HTML//"
300          + The public identifier starts with: "-//Sun Microsystems
301            Corp.//DTD HotJava Strict HTML//"
302          + The public identifier starts with: "-//W3C//DTD HTML 3
303            1995-03-24//"
304          + The public identifier starts with: "-//W3C//DTD HTML 3.2
305            Draft//"
306          + The public identifier starts with: "-//W3C//DTD HTML 3.2
307            Final//"
308          + The public identifier starts with: "-//W3C//DTD HTML 3.2//"
309          + The public identifier starts with: "-//W3C//DTD HTML 3.2S
310            Draft//"
311          + The public identifier starts with: "-//W3C//DTD HTML 4.0
312            Frameset//"
313          + The public identifier starts with: "-//W3C//DTD HTML 4.0
314            Transitional//"
315          + The public identifier starts with: "-//W3C//DTD HTML
316            Experimental 19960712//"
317          + The public identifier starts with: "-//W3C//DTD HTML
318            Experimental 970421//"
319          + The public identifier starts with: "-//W3C//DTD W3 HTML//"
320          + The public identifier starts with: "-//W3O//DTD W3 HTML 3.0//"
321          + The public identifier is set to: "-//W3O//DTD W3 HTML Strict
322            3.0//EN//"
323          + The public identifier starts with: "-//WebTechs//DTD Mozilla
324            HTML 2.0//"
325          + The public identifier starts with: "-//WebTechs//DTD Mozilla
326            HTML//"
327          + The public identifier is set to: "-/W3C/DTD HTML 4.0
328            Transitional/EN"
329          + The public identifier is set to: "HTML"
330          + The system identifier is set to:
331            "http://www.ibm.com/data/dtd/v11/ibmxhtml1-transitional.dtd"
332          + The system identifier is missing and the public identifier
333            starts with: "-//W3C//DTD HTML 4.01 Frameset//"
334          + The system identifier is missing and the public identifier
335            starts with: "-//W3C//DTD HTML 4.01 Transitional//"
336
337          Otherwise, if the DOCTYPE token matches one of the conditions in
338          the following list, then set the document to limited quirks
339          mode:
340
341          + The public identifier starts with: "-//W3C//DTD XHTML 1.0
342            Frameset//"
343          + The public identifier starts with: "-//W3C//DTD XHTML 1.0
344            Transitional//"
345          + The system identifier is not missing and the public identifier
346            starts with: "-//W3C//DTD HTML 4.01 Frameset//"
347          + The system identifier is not missing and the public identifier
348            starts with: "-//W3C//DTD HTML 4.01 Transitional//"
349
350          The name, system identifier, and public identifier strings must
351          be compared to the values given in the lists above in an ASCII
352          case-insensitive manner. A system identifier whose value is the
353          empty string is not considered missing for the purposes of the
354          conditions above.
355
356          Then, switch the insertion mode to "before html".
357
358   Anything else
359          Parse error.
360
361          Set the document to quirks mode.
362
363          Switch the insertion mode to "before html", then reprocess the
364          current token.
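
The quirks mode decision made for a DOCTYPE token in the "initial" insertion mode can be sketched as follows. This is a non-normative illustration: the prefix tuple is deliberately abbreviated to a few entries from the list above, None stands for a missing identifier, and the "no quirks" label for a document matching neither list is an assumption of this sketch.

```python
def ascii_lower(s):
    """Lowercase only A-Z, leaving every other character untouched."""
    return "".join(chr(ord(c) + 32) if "A" <= c <= "Z" else c for c in s)


def _lowered(*values):
    return tuple(ascii_lower(v) for v in values)


# Abbreviated: only a handful of the "starts with" entries listed above.
QUIRKS_PUBLIC_PREFIXES = _lowered(
    "-//IETF//DTD HTML 2.0//",
    "-//Netscape Comm. Corp.//DTD HTML//",
    "-//W3C//DTD HTML 3.2 Final//",
    "-//W3C//DTD HTML 4.0 Transitional//",
)
QUIRKS_PUBLIC_EXACT = _lowered(
    "-//W3O//DTD W3 HTML Strict 3.0//EN//",
    "-/W3C/DTD HTML 4.0 Transitional/EN",
    "HTML",
)
QUIRKS_SYSTEM_EXACT = _lowered(
    "http://www.ibm.com/data/dtd/v11/ibmxhtml1-transitional.dtd",
)
# Quirks when the system identifier is missing, limited quirks otherwise.
HTML401_PREFIXES = _lowered(
    "-//W3C//DTD HTML 4.01 Frameset//",
    "-//W3C//DTD HTML 4.01 Transitional//",
)
LIMITED_QUIRKS_PUBLIC_PREFIXES = _lowered(
    "-//W3C//DTD XHTML 1.0 Frameset//",
    "-//W3C//DTD XHTML 1.0 Transitional//",
)


def document_mode(name, public_id=None, system_id=None, force_quirks=False):
    """Return "quirks", "limited quirks" or "no quirks" for a DOCTYPE token."""
    name = ascii_lower(name or "")
    public = None if public_id is None else ascii_lower(public_id)
    # An empty string is a present (not missing) system identifier.
    system = None if system_id is None else ascii_lower(system_id)

    if force_quirks or name != "html":
        return "quirks"
    if public is not None and (
        public.startswith(QUIRKS_PUBLIC_PREFIXES)
        or public in QUIRKS_PUBLIC_EXACT
        or (system is None and public.startswith(HTML401_PREFIXES))
    ):
        return "quirks"
    if system is not None and system in QUIRKS_SYSTEM_EXACT:
        return "quirks"
    if public is not None and (
        public.startswith(LIMITED_QUIRKS_PUBLIC_PREFIXES)
        or (system is not None and public.startswith(HTML401_PREFIXES))
    ):
        return "limited quirks"
    return "no quirks"
```

Lowercasing both sides with an ASCII-only helper keeps the comparison ASCII case-insensitive without folding non-ASCII characters.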
365
366      8.2.5.5 The "before html" insertion mode
367
368   When the insertion mode is "before html", tokens must be handled as
369   follows:
370
371   A DOCTYPE token
372          Parse error. Ignore the token.
373
374   A comment token
375          Append a Comment node to the Document object with the data
376          attribute set to the data given in the comment token.
377
378   A character token that is one of U+0009 CHARACTER TABULATION,
379          U+000A LINE FEED (LF), U+000C FORM FEED (FF), or U+0020 SPACE
380          Ignore the token.
381
382   A start tag whose tag name is "html"
383          Create an element for the token in the HTML namespace. Append it
384          to the Document object. Put this element in the stack of open
385          elements.
386
387          If the token has an attribute "manifest", then resolve the value
388          of that attribute to an absolute URL, and if that is successful,
389          run the application cache selection algorithm with the resulting
390          absolute URL. Otherwise, if there is no such attribute or
391          resolving it fails, run the application cache selection
392          algorithm with no manifest. The algorithm must be passed the
393          Document object.
394
395          Switch the insertion mode to "before head".
396
397   Anything else
398          Create an HTMLElement node with the tag name html, in the HTML
399          namespace. Append it to the Document object. Put this element in
400          the stack of open elements.
401
402          Run the application cache selection algorithm with no manifest,
403          passing it the Document object.
404
405          Switch the insertion mode to "before head", then reprocess the
406          current token.
407
408          Should probably make end tags be ignored, so that "</head><!--
409          --><html>" puts the comment before the root node (or should we?)
410
411   The root element can end up being removed from the Document object,
412   e.g. by scripts. Nothing in particular happens in such cases; content
413   continues being appended to the nodes as described in the next section.
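
The "before html" rules, and in particular the "reprocess the current token" step, can be sketched as a small dispatcher. This is a non-normative illustration: the TreeBuilder class, the dict-based token shape, and the pending list (which collects tokens handed on to modes this sketch does not model) are assumptions, and application cache selection is omitted.

```python
WHITESPACE = {"\t", "\n", "\x0c", " "}


class TreeBuilder:
    def __init__(self):
        self.mode = "before html"
        self.document = []       # children of the Document, as simple tuples
        self.open_elements = []  # stack of open elements
        self.parse_errors = 0
        self.pending = []        # tokens passed on to modes not modelled here

    def process(self, token):
        # Re-dispatch until some handler reports the token as consumed.
        while True:
            handler = {"before html": self.before_html}.get(self.mode)
            if handler is None:
                self.pending.append(token)
                return
            if handler(token):
                return

    def before_html(self, token):
        """Return True if the token was consumed, False to reprocess it."""
        kind = token["type"]
        if kind == "doctype":
            self.parse_errors += 1   # parse error; ignore the token
            return True
        if kind == "comment":
            self.document.append(("comment", token["data"]))
            return True
        if kind == "character" and token["data"] in WHITESPACE:
            return True              # ignore the token
        if kind == "start" and token["name"] == "html":
            self._append_root(token.get("attrs", {}))
            self.mode = "before head"
            return True
        # Anything else: create the html element, then reprocess the token.
        self._append_root({})
        self.mode = "before head"
        return False

    def _append_root(self, attrs):
        root = ("html", dict(attrs))
        self.document.append(root)
        self.open_elements.append(root)
```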
414
415      8.2.5.6 The "before head" insertion mode
416
417   When the insertion mode is "before head", tokens must be handled as
418   follows:
419
420   A character token that is one of U+0009 CHARACTER TABULATION,
421          U+000A LINE FEED (LF), U+000C FORM FEED (FF), or U+0020 SPACE
422          Ignore the token.
423
424   A comment token
425          Append a Comment node to the current node with the data
426          attribute set to the data given in the comment token.
427
428   A DOCTYPE token
429          Parse error. Ignore the token.
430
431   A start tag whose tag name is "html"
432          Process the token using the rules for the "in body" insertion
433          mode.
434
435   A start tag whose tag name is "head"
436          Insert an HTML element for the token.
437
438          Set the head element pointer to the newly created head element.
439
440          Switch the insertion mode to "in head".
441
442   An end tag whose tag name is one of: "head", "br"
443          Act as if a start tag token with the tag name "head" and no
444          attributes had been seen, then reprocess the current token.
445
446   Any other end tag
447          Parse error. Ignore the token.
448
449   Anything else
450          Act as if a start tag token with the tag name "head" and no
451          attributes had been seen, then reprocess the current token.
452
453          This will result in an empty head element being generated, with
454          the current token being reprocessed in the "after head"
455          insertion mode.
456
457      8.2.5.7 The "in head" insertion mode
458
459   When the insertion mode is "in head", tokens must be handled as
460   follows:
461
462   A character token that is one of U+0009 CHARACTER TABULATION,
463          U+000A LINE FEED (LF), U+000C FORM FEED (FF), or U+0020 SPACE
464          Insert the character into the current node.
465
466   A comment token
467          Append a Comment node to the current node with the data
468          attribute set to the data given in the comment token.
469
470   A DOCTYPE token
471          Parse error. Ignore the token.
472
473   A start tag whose tag name is "html"
474          Process the token using the rules for the "in body" insertion
475          mode.
476
477   A start tag whose tag name is one of: "base", "command", "eventsource",
478          "link"
479          Insert an HTML element for the token. Immediately pop the
480          current node off the stack of open elements.
481
482          Acknowledge the token's self-closing flag, if it is set.
483
484   A start tag whose tag name is "meta"
485          Insert an HTML element for the token. Immediately pop the
486          current node off the stack of open elements.
487
488          Acknowledge the token's self-closing flag, if it is set.
489
490          If the element has a charset attribute, and its value is a
491          supported encoding, and the confidence is currently tentative,
492          then change the encoding to the encoding given by the value of
493          the charset attribute.
494
495          Otherwise, if the element has a content attribute, and applying
496          the algorithm for extracting an encoding from a Content-Type to
497          its value returns a supported encoding, and the
498          confidence is currently tentative, then change the encoding to
499          that encoding.
500
501   A start tag whose tag name is "title"
502          Follow the generic RCDATA element parsing algorithm.
503
504   A start tag whose tag name is "noscript", if the scripting flag is
505          enabled
507   A start tag whose tag name is one of: "noframes", "style"
508          Follow the generic CDATA element parsing algorithm.
509
510   A start tag whose tag name is "noscript", if the scripting flag is
511          disabled
512          Insert an HTML element for the token.
513
514          Switch the insertion mode to "in head noscript".
515
516   A start tag whose tag name is "script"
517
518         1. Create an element for the token in the HTML namespace.
519         2. Mark the element as being "parser-inserted".
520            This ensures that, if the script is external, any
521            document.write() calls in the script will execute in-line,
522            instead of blowing the document away, as would happen in most
523            other cases. It also prevents the script from executing until
524            the end tag is seen.
525         3. If the parser was originally created for the HTML fragment
526            parsing algorithm, then mark the script element as "already
527            executed". (fragment case)
528         4. Append the new element to the current node.
529         5. Switch the tokeniser's content model flag to the CDATA state.
530         6. Let the original insertion mode be the current insertion mode.
531         7. Switch the insertion mode to "in CDATA/RCDATA".
532
533   An end tag whose tag name is "head"
534          Pop the current node (which will be the head element) off the
535          stack of open elements.
536
537          Switch the insertion mode to "after head".
538
539   An end tag whose tag name is "br"
540          Act as described in the "anything else" entry below.
541
542   A start tag whose tag name is "head"
543   Any other end tag
544          Parse error. Ignore the token.
545
546   Anything else
547          Act as if an end tag token with the tag name "head" had been
548          seen, and reprocess the current token.
549
550          In certain UAs, some elements don't trigger the "in body" mode
551          straight away, but instead get put into the head. Do we want to
552          copy that?
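
The encoding checks performed for a meta start tag in the "in head" rules above can be sketched as follows. This is a non-normative illustration built on several assumptions: "supported encoding" is approximated by what Python's codecs module recognises, and extract_encoding_from_content is a simplified stand-in for the algorithm for extracting an encoding from a Content-Type, which is defined elsewhere in the specification.

```python
import codecs
import re


def is_supported(label):
    """Approximate 'supported encoding' by what Python's codecs knows."""
    try:
        codecs.lookup(label)
        return True
    except (LookupError, ValueError):
        return False


def extract_encoding_from_content(content):
    """Simplified stand-in: return the charset parameter value, if any."""
    match = re.search(r'charset\s*=\s*["\']?([^\s;"\']+)', content, re.IGNORECASE)
    return match.group(1) if match else None


def encoding_from_meta(attrs, confidence):
    """Return the encoding to change to, or None if no change is called for."""
    if confidence != "tentative":
        return None
    charset = attrs.get("charset")
    if charset is not None and is_supported(charset):
        return charset
    # Otherwise, fall back to the content attribute.
    content = attrs.get("content")
    if content is not None:
        encoding = extract_encoding_from_content(content)
        if encoding is not None and is_supported(encoding):
            return encoding
    return None
```

Note that an unsupported charset value does not end the check; the content attribute is still consulted.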
553
554      8.2.5.8 The "in head noscript" insertion mode
555
556   When the insertion mode is "in head noscript", tokens must be handled
557   as follows:
558
559   A DOCTYPE token
560          Parse error. Ignore the token.
561
562   A start tag whose tag name is "html"
563          Process the token using the rules for the "in body" insertion
564          mode.
565
566   An end tag whose tag name is "noscript"
567          Pop the current node (which will be a noscript element) from the
568          stack of open elements; the new current node will be a head
569          element.
570
571          Switch the insertion mode to "in head".
572
573   A character token that is one of U+0009 CHARACTER TABULATION,
574          U+000A LINE FEED (LF), U+000C FORM FEED (FF), or U+0020 SPACE
576   A comment token
577   A start tag whose tag name is one of: "link", "meta", "noframes",
578          "style"
579          Process the token using the rules for the "in head" insertion
580          mode.
581
582   An end tag whose tag name is "br"
583          Act as described in the "anything else" entry below.
584
585   A start tag whose tag name is one of: "head", "noscript"
586   Any other end tag
587          Parse error. Ignore the token.
588
589   Anything else
590          Parse error. Act as if an end tag with the tag name "noscript"
591          had been seen and reprocess the current token.
592
593      8.2.5.9 The "after head" insertion mode
594
595   When the insertion mode is "after head", tokens must be handled as
596   follows:
597
598   A character token that is one of U+0009 CHARACTER TABULATION,
599          U+000A LINE FEED (LF), U+000C FORM FEED (FF), or U+0020 SPACE
600          Insert the character into the current node.
601
602   A comment token
603          Append a Comment node to the current node with the data
604          attribute set to the data given in the comment token.
605
606   A DOCTYPE token
607          Parse error. Ignore the token.
608
609   A start tag whose tag name is "html"
610          Process the token using the rules for the "in body" insertion
611          mode.
612
613   A start tag whose tag name is "body"
614          Insert an HTML element for the token.
615
616          Switch the insertion mode to "in body".
617
618   A start tag whose tag name is "frameset"
619          Insert an HTML element for the token.
620
621          Switch the insertion mode to "in frameset".
622
623   A start tag token whose tag name is one of: "base", "link", "meta",
624          "noframes", "script", "style", "title"
625          Parse error.
626
627          Push the node pointed to by the head element pointer onto the
628          stack of open elements.
629
630          Process the token using the rules for the "in head" insertion
631          mode.
632
633          Remove the node pointed to by the head element pointer from the
634          stack of open elements.
635
636   An end tag whose tag name is "br"
637          Act as described in the "anything else" entry below.
638
639   A start tag whose tag name is "head"
640   Any other end tag
641          Parse error. Ignore the token.
642
643   Anything else
644          Act as if a start tag token with the tag name "body" and no
645          attributes had been seen, and then reprocess the current token.
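
The handling of head content that arrives in the "after head" insertion mode (a "base", "link", "meta", "noframes", "script", "style", or "title" start tag) can be sketched as below. This is a non-normative illustration; the Element and Parser classes and the in_head_rules callback are hypothetical. It shows why the final step removes the head element from the stack rather than popping the current node.

```python
class Element:
    def __init__(self, name):
        self.name = name


class Parser:
    def __init__(self, html, head):
        self.head_pointer = head      # head element pointer
        self.open_elements = [html]   # head was already popped by </head>
        self.parse_errors = 0


def process_late_head_content(parser, token, in_head_rules):
    """Handle a head-content start tag seen in the "after head" mode."""
    parser.parse_errors += 1                            # parse error
    parser.open_elements.append(parser.head_pointer)    # push head back on
    in_head_rules(parser, token)                        # use "in head" rules
    # The "in head" rules may have pushed a new element (e.g. title) on top
    # of head, so head is removed by identity rather than popped.
    stack = parser.open_elements
    for i in range(len(stack) - 1, -1, -1):
        if stack[i] is parser.head_pointer:
            del stack[i]
            break
```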
646
647      8.2.5.10 The "in body" insertion mode
648
649   When the insertion mode is "in body", tokens must be handled as
650   follows:
651
652   A character token
653          Reconstruct the active formatting elements, if any.
654
655          Insert the token's character into the current node.
656
657   A comment token
658          Append a Comment node to the current node with the data
659          attribute set to the data given in the comment token.
660
661   A DOCTYPE token
662          Parse error. Ignore the token.
663
664   A start tag whose tag name is "html"
665          Parse error. For each attribute on the token, check to see if
666          the attribute is already present on the top element of the stack
667          of open elements. If it is not, add the attribute and its
668          corresponding value to that element.
669
670   A start tag token whose tag name is one of: "base", "command",
671          "eventsource", "link", "meta", "noframes", "script", "style",
672          "title"
673          Process the token using the rules for the "in head" insertion
674          mode.
675
676   A start tag whose tag name is "body"
677          Parse error.
678
679          If the second element on the stack of open elements is not a
680          body element, or, if the stack of open elements has only one
681          node on it, then ignore the token. (fragment case)
682
683          Otherwise, for each attribute on the token, check to see if the
684          attribute is already present on the body element (the second
685          element) on the stack of open elements. If it is not, add the
686          attribute and its corresponding value to that element.
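
          As an illustration only, the attribute-merging rule used for
          stray "html" and "body" start tags can be sketched in Python.
          The dictionary-based element model and the function names
          merge_missing_attributes and handle_stray_body_start_tag are
          assumptions made for this sketch; they are not defined by this
          specification.

```python
def merge_missing_attributes(element_attrs, token_attrs):
    """Copy attributes from the token that the element lacks; never overwrite."""
    for name, value in token_attrs.items():
        if name not in element_attrs:
            element_attrs[name] = value
    return element_attrs


def handle_stray_body_start_tag(stack, token_attrs):
    """Hypothetical model: stack is a list of {"name", "attrs"} dicts,
    with index 0 being the html element (the top of the stack)."""
    # Fragment case: there is no body element in second position,
    # so the token is ignored.
    if len(stack) < 2 or stack[1]["name"] != "body":
        return False
    merge_missing_attributes(stack[1]["attrs"], token_attrs)
    return True


stack = [{"name": "html", "attrs": {}},
         {"name": "body", "attrs": {"class": "a"}}]
handled = handle_stray_body_start_tag(stack, {"class": "b", "id": "main"})
```

          In this sketch the existing class value is kept and only the
          missing id attribute is added to the body element.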
687
688   An end-of-file token
689          If there is a node in the stack of open elements that is not
690          either a dd element, a dt element, an li element, a p element, a
691          tbody element, a td element, a tfoot element, a th element, a
692          thead element, a tr element, the body element, or the html
693          element, then this is a parse error.
694
695          Stop parsing.
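
          The end-of-file check above amounts to a membership test over
          the stack of open elements. A minimal sketch, assuming the
          stack is modelled as a list of tag names (the function name is
          hypothetical):

```python
# Elements that may remain open at end-of-file without a parse error,
# as listed in the entry above.
ALLOWED_OPEN_AT_EOF = frozenset({
    "dd", "dt", "li", "p", "tbody", "td", "tfoot", "th", "thead", "tr",
    "body", "html",
})


def eof_is_parse_error(stack):
    """Return True if any open element falls outside the allowed set."""
    return any(name not in ALLOWED_OPEN_AT_EOF for name in stack)


clean = eof_is_parse_error(["html", "body", "p"])
unclosed_div = eof_is_parse_error(["html", "body", "div", "p"])
```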
696
697   An end tag whose tag name is "body"
698          If the stack of open elements does not have a body element in
699          scope, this is a parse error; ignore the token.
700
701          Otherwise, if there is a node in the stack of open elements that
702          is not either a dd element, a dt element, an li element, a p
703          element, a tbody element, a td element, a tfoot element, a th
704          element, a thead element, a tr element, the body element, or the
705          html element, then this is a parse error.
706
707          Switch the insertion mode to "after body".
708
709   An end tag whose tag name is "html"
710          Act as if an end tag with tag name "body" had been seen, then,
711          if that token wasn't ignored, reprocess the current token.
712
713          The fake end tag token here can only be ignored in the fragment
714          case.
715
716   A start tag whose tag name is one of: "address", "article", "aside",
717          "blockquote", "center", "datagrid", "details", "dialog", "dir",
718          "div", "dl", "fieldset", "figure", "footer", "header", "menu",
719          "nav", "ol", "p", "section", "ul"
720          If the stack of open elements has a p element in scope, then act
721          as if an end tag with the tag name "p" had been seen.
722
723          Insert an HTML element for the token.
724
725   A start tag whose tag name is one of: "h1", "h2", "h3", "h4", "h5",
726          "h6"
727          If the stack of open elements has a p element in scope, then act
728          as if an end tag with the tag name "p" had been seen.
729
730          If the current node is an element whose tag name is one of "h1",
731          "h2", "h3", "h4", "h5", or "h6", then this is a parse error; pop
732          the current node off the stack of open elements.
733
734          Insert an HTML element for the token.
735
736   A start tag whose tag name is one of: "pre", "listing"
737          If the stack of open elements has a p element in scope, then act
738          as if an end tag with the tag name "p" had been seen.
739
740          Insert an HTML element for the token.
741
742          If the next token is a U+000A LINE FEED (LF) character token,
743          then ignore that token and move on to the next one. (Newlines at
744          the start of pre blocks are ignored as an authoring
745          convenience.)
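
          The newline rule can be pictured as a one-token lookahead. The
          following is a sketch only; representing tokens as
          (kind, data) tuples is an assumption of the example.

```python
LINE_FEED = "\n"  # U+000A LINE FEED (LF)


def tokens_after_pre_start_tag(tokens):
    """Return the tokens to process after a "pre" or "listing" start tag,
    dropping a single leading LINE FEED character token if present."""
    if tokens and tokens[0] == ("character", LINE_FEED):
        return tokens[1:]
    return tokens


remaining = tokens_after_pre_start_tag(
    [("character", "\n"), ("character", "\n"), ("character", "x")])
```

          Only the first line feed is dropped; a second one is ordinary
          content.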
746
747   A start tag whose tag name is "form"
748          If the form element pointer is not null, then this is a parse
749          error; ignore the token.
750
751          Otherwise:
752
753          If the stack of open elements has a p element in scope, then act
754          as if an end tag with the tag name "p" had been seen.
755
756          Insert an HTML element for the token, and set the form element
757          pointer to point to the element created.
758
759   A start tag whose tag name is "li"
760          Run the following algorithm:
761
762         1. Initialize node to be the current node (the bottommost node of
763            the stack).
764         2. If node is an li element, then act as if an end tag with the
765            tag name "li" had been seen, then jump to the last step.
766         3. If node is not in the formatting category, and is not in the
767            phrasing category, and is not an address, div, or p element,
768            then jump to the last step.
769         4. Otherwise, set node to the previous entry in the stack of open
770            elements and return to step 2.
771         5. This is the last step.
772            If the stack of open elements has a p element in scope, then
773            act as if an end tag with the tag name "p" had been seen.
774            Finally, insert an HTML element for the token.
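
          Steps 1 to 4 of the algorithm above are a walk up the stack of
          open elements. The sketch below is one way to express that
          walk, assuming the stack is a list of tag names whose last
          entry is the current node. The category test is passed in as a
          predicate because the formatting and phrasing categories are
          defined elsewhere in this specification; the predicate used in
          the example is a toy stand-in, and the function name is
          hypothetical.

```python
def find_item_to_close(stack, item_names, is_formatting_or_phrasing):
    """Return the index of the open list item that the new start tag
    implicitly closes, or None if the walk stops first."""
    for index in range(len(stack) - 1, -1, -1):      # steps 1 and 4
        name = stack[index]
        if name in item_names:                       # step 2
            return index
        if (not is_formatting_or_phrasing(name)
                and name not in ("address", "div", "p")):
            return None                              # step 3
    return None


# Toy predicate for the example only; not the real category definitions.
def toy_inline(name):
    return name in {"b", "i", "em", "span"}


found = find_item_to_close(
    ["html", "body", "ul", "li", "div", "b"], {"li"}, toy_inline)
blocked = find_item_to_close(
    ["html", "body", "ul", "li", "ul"], {"li"}, toy_inline)
```

          In the first call the walk passes the b and div elements and
          reaches the open li; in the second the nested ul stops the
          walk, so the outer li stays open. Passing {"dd", "dt"} as
          item_names gives the same walk for "dd" and "dt" start tags.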
775
776   A start tag whose tag name is one of: "dd", "dt"
777          Run the following algorithm:
778
779         1. Initialize node to be the current node (the bottommost node of
780            the stack).
781         2. If node is a dd or dt element, then act as if an end tag with
782            the same tag name as node had been seen, then jump to the last
783            step.
784         3. If node is not in the formatting category, and is not in the
785            phrasing category, and is not an address, div, or p element,
786            then jump to the last step.
787         4. Otherwise, set node to the previous entry in the stack of open
788            elements and return to step 2.
789         5. This is the last step.
790            If the stack of open elements has a p element in scope, then
791            act as if an end tag with the tag name "p" had been seen.
792            Finally, insert an HTML element for the token.
793
794   A start tag whose tag name is "plaintext"
795          If the stack of open elements has a p element in scope, then act
796          as if an end tag with the tag name "p" had been seen.
797
798          Insert an HTML element for the token.
799
800          Switch the content model flag to the PLAINTEXT state.
801
802          Once a start tag with the tag name "plaintext" has been seen,
803          that will be the last token ever seen other than character
804          tokens (and the end-of-file token), because there is no way to
805          switch the content model flag out of the PLAINTEXT state.
806
807   An end tag whose tag name is one of: "address", "article", "aside",
808          "blockquote", "center", "datagrid", "details", "dialog", "dir",
809          "div", "dl", "fieldset", "figure", "footer", "header",
810          "listing", "menu", "nav", "ol", "pre", "section", "ul"
811          If the stack of open elements does not have an element in scope
812          with the same tag name as that of the token, then this is a
813          parse error; ignore the token.
814
815          Otherwise, run these steps:
816
817         1. Generate implied end tags.
818         2. If the current node is not an element with the same tag name
819            as that of the token, then this is a parse error.
820         3. Pop elements from the stack of open elements until an element
821            with the same tag name as the token has been popped from the
822            stack.
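
          A rough Python sketch of these steps follows. It models the
          stack as a list of tag names. The sets SCOPE_BOUNDARIES and
          IMPLIED_END_TAGS are placeholder assumptions: the "in scope"
          and "generate implied end tags" definitions live elsewhere in
          this specification and should be taken from there.

```python
# Assumed placeholders; consult the actual definitions before relying on them.
SCOPE_BOUNDARIES = frozenset({
    "applet", "button", "caption", "html", "marquee", "object",
    "table", "td", "th",
})
IMPLIED_END_TAGS = frozenset({"dd", "dt", "li", "p"})


def has_in_scope(stack, tag_name):
    """Walk from the current node upward until a match or a boundary."""
    for name in reversed(stack):
        if name == tag_name:
            return True
        if name in SCOPE_BOUNDARIES:
            return False
    return False


def handle_block_end_tag(stack, tag_name):
    """Apply the end tag to the stack in place; return parse errors found."""
    errors = []
    if not has_in_scope(stack, tag_name):
        errors.append("no matching element in scope; token ignored")
        return errors
    while stack[-1] in IMPLIED_END_TAGS:             # step 1
        stack.pop()
    if stack[-1] != tag_name:                        # step 2
        errors.append("current node does not match the end tag")
    while stack.pop() != tag_name:                   # step 3
        pass
    return errors


tidy = ["html", "body", "div", "p"]
tidy_errors = handle_block_end_tag(tidy, "div")
messy = ["html", "body", "div", "span"]
messy_errors = handle_block_end_tag(messy, "div")
```

          The open p in the first stack is closed silently as an implied
          end tag, whereas the open span in the second stack is popped
          with a parse error.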
823
824   An end tag whose tag name is "form"
825          Let node be the element that the form element pointer is set to.
826
827          Set the form element pointer to null.
828
829          If node is null or the stack of open elements does not have node
830          in scope, then this is a parse error; ignore the token.
831
832          Otherwise, run these steps:
833
834         1. Generate implied end tags.
835         2. If the current node is not node, then this is a parse error.
836         3. Remove node from the stack of open elements.
837
838   An end tag whose tag name is "p"
839          If the stack of open elements does not have an element in scope
840          with the same tag name as that of the token, then this is a
841          parse error; act as if a start tag with the tag name "p" had been
842          seen, then reprocess the current token.
843
844          Otherwise, run these steps:
845
846         1. Generate implied end tags, except for elements with the same
847            tag name as the token.
848         2. If the current node is not an element with the same tag name
849            as that of the token, then this is a parse error.
850         3. Pop elements from the stack of open elements until an element
851            with the same tag name as the token has been popped from the
852            stack.
853
854   An end tag whose tag name is one of: "dd", "dt", "li"
855          If the stack of open elements does not have an element in scope
856          with the same tag name as that of the token, then this is a
857          parse error; ignore the token.
858
859          Otherwise, run these steps:
860
861         1. Generate implied end tags, except for elements with the same
862            tag name as the token.
863         2. If the current node is not an element with the same tag name
864            as that of the token, then this is a parse error.
865         3. Pop elements from the stack of open elements until an element
866            with the same tag name as the token has been popped from the
867            stack.
868
869   An end tag whose tag name is one of: "h1", "h2", "h3", "h4", "h5", "h6"
870          If the stack of open elements does not have an element in scope
871          whose tag name is one of "h1", "h2", "h3", "h4", "h5", or "h6",
872          then this is a parse error; ignore the token.
873
874          Otherwise, run these steps:
875
876         1. Generate implied end tags.
877         2. If the current node is not an element with the same tag name
878            as that of the token, then this is a parse error.
879         3. Pop elements from the stack of open elements until an element
880            whose tag name is one of "h1", "h2", "h3", "h4", "h5", or "h6"
881            has been popped from the stack.
882
883   An end tag whose tag name is "sarcasm"
884          Take a deep breath, then act as described in the "any other end
885          tag" entry below.
886
887   A start tag whose tag name is "a"
888          If the list of active formatting elements contains an element
889          whose tag name is "a" between the end of the list and the last
890          marker on the list (or the start of the list if there is no
891          marker on the list), then this is a parse error; act as if an
892          end tag with the tag name "a" had been seen, then remove that
893          element from the list of active formatting elements and the
894          stack of open elements if the end tag didn't already remove it
895          (it might not have if the element is not in table scope).
896
897          In the non-conforming stream
898          <a href="a">a<table><a href="b">b</table>x, the first a element
899          would be closed upon seeing the second one, and the "x"
900          character would be inside a link to "b", not to "a". This is
901          despite the fact that the outer a element is not in table scope
902          (meaning that a regular </a> end tag at the start of the table
903          wouldn't close the outer a element).
904
905          Reconstruct the active formatting elements, if any.
906
907          Insert an HTML element for the token. Add that element to the
908          list of active formatting elements.
909
910   A start tag whose tag name is one of: "b", "big", "em", "font", "i",
911          "s", "small", "strike", "strong", "tt", "u"
912          Reconstruct the active formatting elements, if any.
913
914          Insert an HTML element for the token. Add that element to the
915          list of active formatting elements.
916
917   A start tag whose tag name is "nobr"
918          Reconstruct the active formatting elements, if any.
919
920          If the stack of open elements has a nobr element in scope, then
921          this is a parse error; act as if an end tag with the tag name
922          "nobr" had been seen, then once again reconstruct the active
923          formatting elements, if any.
924
925          Insert an HTML element for the token. Add that element to the
926          list of active formatting elements.
927
928   An end tag whose tag name is one of: "a", "b", "big", "em", "font",
929          "i", "nobr", "s", "small", "strike", "strong", "tt", "u"
930          Follow these steps:
931
932         1. Let the formatting element be the last element in the list of
933            active formatting elements that:
934               o is between the end of the list and the last scope marker
935                 in the list, if any, or the start of the list otherwise,
936                 and
937               o has the same tag name as the token.
938            If there is no such node, or, if that node is also in the
939            stack of open elements but the element is not in scope, then
940            this is a parse error; ignore the token, and abort these
941            steps.
942            Otherwise, if there is such a node, but that node is not in
943            the stack of open elements, then this is a parse error; remove
944            the element from the list, and abort these steps.
945            Otherwise, there is a formatting element and that element is
946            in the stack and is in scope. If the element is not the
947            current node, this is a parse error. In any case, proceed with
948            the algorithm as written in the following steps.
949         2. Let the furthest block be the topmost node in the stack of
950            open elements that is lower in the stack than the formatting
951            element, and is not an element in the phrasing or formatting
952            categories. There might not be one.
953         3. If there is no furthest block, then the UA must skip the
954            subsequent steps and instead just pop all the nodes from the
955            bottom of the stack of open elements, from the current node up
956            to and including the formatting element, and remove the
957            formatting element from the list of active formatting
958            elements.
959         4. Let the common ancestor be the element immediately above the
960            formatting element in the stack of open elements.
961         5. If the furthest block has a parent node, then remove the
962            furthest block from its parent node.
963         6. Let a bookmark note the position of the formatting element in
964            the list of active formatting elements relative to the
965            elements on either side of it in the list.
966         7. Let node and last node be the furthest block. Follow these
967            steps:
968              1. Let node be the element immediately above node in the
969                 stack of open elements.
970              2. If node is not in the list of active formatting elements,
971                 then remove node from the stack of open elements and then
972                 go back to step 1.
973              3. Otherwise, if node is the formatting element, then go to
974                 the next step in the overall algorithm.
975              4. Otherwise, if last node is the furthest block, then move
976                 the aforementioned bookmark to be immediately after the
977                 node in the list of active formatting elements.
978              5. If node has any children, perform a shallow clone of
979                 node, replace the entry for node in the list of active
980                 formatting elements with an entry for the clone, replace
981                 the entry for node in the stack of open elements with an
982                 entry for the clone, and let node be the clone.
983              6. Insert last node into node, first removing it from its
984                 previous parent node if any.
985              7. Let last node be node.
986              8. Return to step 1 of this inner set of steps.
987         8. If the common ancestor node is a table, tbody, tfoot, thead,
988            or tr element, then, foster parent whatever last node ended up
989            being in the previous step.
990            Otherwise, append whatever last node ended up being in the
991            previous step to the common ancestor node, first removing it
992            from its previous parent node if any.
993         9. Perform a shallow clone of the formatting element.
994        10. Take all of the child nodes of the furthest block and append
995            them to the clone created in the last step.
996        11. Append that clone to the furthest block.
997        12. Remove the formatting element from the list of active
998            formatting elements, and insert the clone into the list of
999            active formatting elements at the position of the
1000            aforementioned bookmark.
1001        13. Remove the formatting element from the stack of open elements,
1002            and insert the clone into the stack of open elements
1003            immediately below the position of the furthest block in that
1004            stack.
1005        14. Jump back to step 1 in this series of steps.
1006
1007          The way these steps are defined, only elements in the formatting
1008          category ever get cloned by this algorithm.
1009
1010          Because of the way this algorithm causes elements to change
1011          parents, it has been dubbed the "adoption agency algorithm" (in
1012          contrast with other possible algorithms for dealing with
1013          misnested content, which included the "incest algorithm", the
1014          "secret affair algorithm", and the "Heisenberg algorithm").
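
          The simple path through this algorithm (steps 1 to 3, where
          there is no furthest block) can be sketched as below. This is
          a partial illustration under stated assumptions: the Element
          class, the MARKER object, and the function names are invented
          for the example, the scope checks of step 1 are omitted, and
          the predicate stands in for the formatting and phrasing
          categories defined elsewhere. Steps 4 to 14 are not modelled.

```python
class Element:
    """Minimal stand-in for a DOM element; identity matters, not equality."""
    def __init__(self, name):
        self.name = name


MARKER = object()  # stand-in for a scope marker in the list

FORMATTING = frozenset({"a", "b", "big", "em", "font", "i", "nobr", "s",
                        "small", "strike", "strong", "tt", "u"})


def find_formatting_element(active, tag_name):
    """Step 1 (partial): search from the end of the list back to the
    last marker for an entry with the token's tag name."""
    for entry in reversed(active):
        if entry is MARKER:
            return None
        if entry.name == tag_name:
            return entry
    return None


def find_furthest_block(stack, formatting_element, is_formatting_or_phrasing):
    """Step 2: the first node below the formatting element that is
    neither formatting nor phrasing. The caller must already know the
    formatting element is on the stack."""
    position = next(i for i, node in enumerate(stack)
                    if node is formatting_element)
    for node in stack[position + 1:]:
        if not is_formatting_or_phrasing(node.name):
            return node
    return None


def pop_through_formatting_element(stack, active, formatting_element):
    """Step 3: pop up to and including the formatting element, then
    remove it from the list of active formatting elements."""
    while stack.pop() is not formatting_element:
        pass
    active[:] = [e for e in active if e is not formatting_element]


def toy_inline(name):
    # Toy predicate for the example only.
    return name in FORMATTING or name == "span"


html, body, p, b, i = (Element(n) for n in ("html", "body", "p", "b", "i"))
stack = [html, body, p, b, i]
active = [b, i]
target = find_formatting_element(active, "b")
if find_furthest_block(stack, target, toy_inline) is None:
    pop_through_formatting_element(stack, active, target)
```

          For the misnested input <p><b><i></b>, this leaves the i
          element in the list of active formatting elements, so that it
          can be reconstructed for later content.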
1015
1016   A start tag whose tag name is "button"
1017          If the stack of open elements has a button element in scope,
1018          then this is a parse error; act as if an end tag with the tag
1019          name "button" had been seen, then reprocess the token.
1020
1021          Otherwise:
1022
1023          Reconstruct the active formatting elements, if any.
1024
1025          Insert an HTML element for the token.
1026
1027          Insert a marker at the end of the list of active formatting
1028          elements.
1029
1030   A start tag token whose tag name is one of: "applet", "marquee",
1031          "object"
1032          Reconstruct the active formatting elements, if any.
1033
1034          Insert an HTML element for the token.
1035
1036          Insert a marker at the end of the list of active formatting
1037          elements.
1038
1039   An end tag token whose tag name is one of: "applet", "button",
1040          "marquee", "object"
1041          If the stack of open elements does not have an element in scope
1042          with the same tag name as that of the token, then this is a
1043          parse error; ignore the token.
1044
1045          Otherwise, run these steps:
1046
1047         1. Generate implied end tags.
1048         2. If the current node is not an element with the same tag name
1049            as that of the token, then this is a parse error.
1050         3. Pop elements from the stack of open elements until an element
1051            with the same tag name as the token has been popped from the
1052            stack.
1053         4. Clear the list of active formatting elements up to the last
1054            marker.
1055
1056   A start tag whose tag name is "xmp"
1057          Reconstruct the active formatting elements, if any.
1058
1059          Follow the generic CDATA element parsing algorithm.
1060
1061   A start tag whose tag name is "table"
1062          If the stack of open elements has a p element in scope, then act
1063          as if an end tag with the tag name "p" had been seen.
1064
1065          Insert an HTML element for the token.
1066
1067          Switch the insertion mode to "in table".
1068
1069   A start tag whose tag name is one of: "area", "basefont", "bgsound",
1070          "br", "embed", "img", "input", "spacer", "wbr"
1071          Reconstruct the active formatting elements, if any.
1072
1073          Insert an HTML element for the token. Immediately pop the
1074          current node off the stack of open elements.
1075
1076          Acknowledge the token's self-closing flag, if it is set.
1077
1078   A start tag whose tag name is one of: "param", "source"
1079          Insert an HTML element for the token. Immediately pop the
1080          current node off the stack of open elements.
1081
1082          Acknowledge the token's self-closing flag, if it is set.
1083
1084   A start tag whose tag name is "hr"
1085          If the stack of open elements has a p element in scope, then act
1086          as if an end tag with the tag name "p" had been seen.
1087
1088          Insert an HTML element for the token. Immediately pop the
1089          current node off the stack of open elements.
1090
1091          Acknowledge the token's self-closing flag, if it is set.
1092
1093   A start tag whose tag name is "image"
1094          Parse error. Change the token's tag name to "img" and reprocess
1095          it. (Don't ask.)
1096
1097   A start tag whose tag name is "isindex"
1098          Parse error.
1099
1100          If the form element pointer is not null, then ignore the token.
1101
1102          Otherwise:
1103
1104          Acknowledge the token's self-closing flag, if it is set.
1105
1106          Act as if a start tag token with the tag name "form" had been
1107          seen.
1108
1109          If the token has an attribute called "action", set the action
1110          attribute on the resulting form element to the value of the
1111          "action" attribute of the token.
1112
1113          Act as if a start tag token with the tag name "hr" had been
1114          seen.
1115
1116          Act as if a start tag token with the tag name "p" had been seen.
1117
1118          Act as if a start tag token with the tag name "label" had been
1119          seen.
1120
1121          Act as if a stream of character tokens had been seen (see below
1122          for what they should say).
1123
1124          Act as if a start tag token with the tag name "input" had been
1125          seen, with all the attributes from the "isindex" token except
1126          "name", "action", and "prompt". Set the name attribute of the
1127          resulting input element to the value "isindex".
1128
1129          Act as if a stream of character tokens had been seen (see below
1130          for what they should say).
1131
1132          Act as if an end tag token with the tag name "label" had been
1133          seen.
1134
1135          Act as if an end tag token with the tag name "p" had been seen.
1136
1137          Act as if a start tag token with the tag name "hr" had been
1138          seen.
1139
1140          Act as if an end tag token with the tag name "form" had been
1141          seen.
1142
1143          If the token has an attribute with the name "prompt", then the
1144          first stream of characters must be the same string as given in
1145          that attribute, and the second stream of characters must be
1146          empty. Otherwise, the two streams of character tokens together
1147          should, together with the input element, express the equivalent
1148          of "This is a searchable index. Insert your search keywords
1149          here: (input field)" in the user's preferred language.
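
          The sequence of implied tokens can be summarised in code. The
          sketch below only builds the list of synthetic tokens; it does
          not process them. The tuple-based token model, the function
          name, and the English default prompt (with an empty second
          stream) are assumptions of the example. Placing the "action"
          attribute on the synthetic "form" token is a simplification of
          setting it on the resulting form element.

```python
DEFAULT_PROMPT = ("This is a searchable index. "
                  "Insert your search keywords here: ")


def expand_isindex(attrs, form_pointer_is_null=True):
    """Return the synthetic tokens implied by an "isindex" start tag."""
    if not form_pointer_is_null:
        return []  # the token is ignored
    form_attrs = {"action": attrs["action"]} if "action" in attrs else {}
    # The input keeps every attribute except name, action, and prompt.
    input_attrs = {name: value for name, value in attrs.items()
                   if name not in ("name", "action", "prompt")}
    input_attrs["name"] = "isindex"
    before = attrs["prompt"] if "prompt" in attrs else DEFAULT_PROMPT
    after = ""
    return [
        ("start", "form", form_attrs),
        ("start", "hr", {}),
        ("start", "p", {}),
        ("start", "label", {}),
        ("characters", before),
        ("start", "input", input_attrs),
        ("characters", after),
        ("end", "label"),
        ("end", "p"),
        ("start", "hr", {}),
        ("end", "form"),
    ]


tokens = expand_isindex(
    {"prompt": "Search: ", "action": "/q", "size": "20", "name": "x"})
```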
1150
1151   A start tag whose tag name is "textarea"
1152
1153         1. Insert an HTML element for the token.
1154         2. If the next token is a U+000A LINE FEED (LF) character token,
1155            then ignore that token and move on to the next one. (Newlines
1156            at the start of textarea elements are ignored as an authoring
1157            convenience.)
1158         3. Switch the tokeniser's content model flag to the RCDATA state.
1159         4. Let the original insertion mode be the current insertion mode.
1160         5. Switch the insertion mode to "in CDATA/RCDATA".
1161
1162   A start tag whose tag name is one of: "iframe", "noembed"
1163   A start tag whose tag name is "noscript", if the scripting flag is
1164          enabled
1165          Follow the generic CDATA element parsing algorithm.
1166
1167   A start tag whose tag name is "select"
1168          Reconstruct the active formatting elements, if any.
1169
1170          Insert an HTML element for the token.
1171
1172          If the insertion mode is one of "in table", "in caption", "in
1173          column group", "in table body", "in row", or "in cell", then
1174          switch the insertion mode to "in select in table". Otherwise,
1175          switch the insertion mode to "in select".
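
          The choice of insertion mode after a "select" start tag
          reduces to a set lookup. A minimal sketch, with insertion
          modes modelled as strings and a hypothetical function name:

```python
TABLE_RELATED_MODES = frozenset({
    "in table", "in caption", "in column group",
    "in table body", "in row", "in cell",
})


def mode_after_select(current_mode):
    """Pick the insertion mode to switch to after inserting a select."""
    if current_mode in TABLE_RELATED_MODES:
        return "in select in table"
    return "in select"


from_body = mode_after_select("in body")
from_cell = mode_after_select("in cell")
```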
1176
1177   A start tag whose tag name is one of: "optgroup", "option"
1178          If the stack of open elements has an option element in scope,
1179          then act as if an end tag with the tag name "option" had been
1180          seen.
1181
1182          Reconstruct the active formatting elements, if any.
1183
1184          Insert an HTML element for the token.
1185
1186   A start tag whose tag name is one of: "rp", "rt"
1187          If the stack of open elements has a ruby element in scope, then
1188          generate implied end tags. If the current node is then not a
1189          ruby element, this is a parse error; pop all the nodes from the
1190          current node up to the node immediately before the bottommost
1191          ruby element on the stack of open elements.
1192
1193          Insert an HTML element for the token.
1194
1195   An end tag whose tag name is "br"
1196          Parse error. Act as if a start tag token with the tag name "br"
1197          had been seen. Ignore the end tag token.
1198
1199   A start tag whose tag name is "math"
1200          Reconstruct the active formatting elements, if any.
1201
1202          Adjust MathML attributes for the token. (This fixes the case of
1203          MathML attributes that are not all lowercase.)
1204
1205          Adjust foreign attributes for the token. (This fixes the use of
1206          namespaced attributes, in particular XLink.)
1207
1208          Insert a foreign element for the token, in the MathML namespace.
1209
1210          If the token has its self-closing flag set, pop the current node
1211          off the stack of open elements and acknowledge the token's
1212          self-closing flag.
1213
1214          Otherwise, let the secondary insertion mode be the current
1215          insertion mode, and then switch the insertion mode to "in
1216          foreign content".
1217
1218   A start tag whose tag name is one of: "caption", "col", "colgroup",
1219          "frame", "frameset", "head", "tbody", "td", "tfoot", "th",
1220          "thead", "tr"
1221          Parse error. Ignore the token.
1222
1223   Any other start tag
1224          Reconstruct the active formatting elements, if any.
1225
1226          Insert an HTML element for the token.
1227
1228          This element will be a phrasing element.
1229
1230   Any other end tag
1231          Run the following steps:
1232
1233         1. Initialize node to be the current node (the bottommost node of
1234            the stack).
1235         2. If node has the same tag name as the end tag token, then:
1236              1. Generate implied end tags.
1237              2. If the tag name of the end tag token does not match the
1238                 tag name of the current node, this is a parse error.
1239              3. Pop all the nodes from the current node up to node,
1240                 including node, then stop these steps.
1241         3. Otherwise, if node is in neither the formatting category nor
1242            the phrasing category, then this is a parse error; ignore the
1243            token, and abort these steps.
1244         4. Set node to the previous entry in the stack of open elements.
1245         5. Return to step 2.
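
          The steps above can be sketched as a loop over the stack of
          open elements. As before, the stack is modelled as a list of
          tag names, the category predicate is supplied by the caller,
          and the set of implied end tags is a parameter because its
          definition lies elsewhere in this specification. All names in
          the sketch are hypothetical.

```python
def handle_any_other_end_tag(stack, tag_name, is_formatting_or_phrasing,
                             implied_end_tags=frozenset()):
    """Apply an otherwise unhandled end tag to the stack in place and
    return the list of parse errors encountered."""
    errors = []
    for index in range(len(stack) - 1, -1, -1):      # steps 1, 4 and 5
        name = stack[index]
        if name == tag_name:                         # step 2
            while stack and stack[-1] in implied_end_tags:
                stack.pop()
            if not stack or stack[-1] != tag_name:
                errors.append("current node does not match the end tag")
            del stack[index:]                        # pop up to and including node
            return errors
        if not is_formatting_or_phrasing(name):      # step 3
            errors.append("end tag ignored")
            return errors
    return errors


# Toy predicate for the example only; not the real category definitions.
def toy_inline(name):
    return name in {"span", "b", "i"}


nested = ["html", "body", "div", "span", "b"]
nested_errors = handle_any_other_end_tag(nested, "span", toy_inline)
unmatched = ["html", "body", "div", "b"]
unmatched_errors = handle_any_other_end_tag(unmatched, "em", toy_inline)
```

          In the first call the open b is popped along with the span and
          one parse error is recorded; in the second the div stops the
          walk and the stack is left untouched.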
1246
1247      8.2.5.11 The "in CDATA/RCDATA" insertion mode
1248
1249   When the insertion mode is "in CDATA/RCDATA", tokens must be handled as
1250   follows:
1251
1252   A character token
1253          Insert the token's character into the current node.
1254
1255   An end-of-file token
1256          Parse error.
1257
1258          If the current node is a script element, mark the script element
1259          as "already executed".
1260
1261          Pop the current node off the stack of open elements.
1262
1263          Switch the insertion mode to the original insertion mode and
1264          reprocess the current token.
1265
1266   An end tag whose tag name is "script"
1267          Let script be the current node (which will be a script element).
1268
1269          Pop the current node off the stack of open elements.
1270
1271          Switch the insertion mode to the original insertion mode.
1272
1273          Let the old insertion point have the same value as the current
1274          insertion point. Let the insertion point be just before the next
1275          input character.
1276
1277          Increment the parser's script nesting level by one.
1278
1279          Run the script. This might cause some script to execute, which
1280          might cause new characters to be inserted into the tokeniser,
1281          and might cause the tokeniser to output more tokens, resulting
1282          in a reentrant invocation of the parser.
1283
1284          Decrement the parser's script nesting level by one. If the
1285          parser's script nesting level is zero, then set the parser pause
1286          flag to false.
1287
1288          Let the insertion point have the value of the old insertion
1289          point. (In other words, restore the insertion point to the value
1290          it had before the previous paragraph. This value might be the
1291          "undefined" value.)
1292
1293          At this stage, if there is a pending external script, then:
1294
1295        If the tree construction stage is being called reentrantly, say
1296                from a call to document.write():
1297                Set the parser pause flag to true, and abort the
1298                processing of any nested invocations of the tokeniser,
1299                yielding control back to the caller. (Tokenization will
1300                resume when the caller returns to the "outer" tree
1301                construction stage.)
1302
1303        Otherwise:
1304                Follow these steps:
1305
1306              1. Let the script be the pending external script. There is
1307                 no longer a pending external script.
1308              2. Pause until the script has completed loading.
1309              3. Let the insertion point be just before the next input
1310                 character.
1311              4. Execute the script.
1312              5. Let the insertion point be undefined again.
1313              6. If there is once again a pending external script, then
1314                 repeat these steps from step 1.
1315
1316   Any other end tag
1317          Pop the current node off the stack of open elements.
1318
1319          Switch the insertion mode to the original insertion mode.
1320
1321      8.2.5.12 The "in table" insertion mode
1322
1323   When the insertion mode is "in table", tokens must be handled as
1324   follows:
1325
1326   A character token that is one of U+0009 CHARACTER TABULATION,
1327          U+000A LINE FEED (LF), U+000C FORM FEED (FF), or U+0020 SPACE
1328          If the current table is tainted, then act as described in the
1329          "anything else" entry below.
1330
1331          Otherwise, insert the character into the current node.
1332
1333   A comment token
1334          Append a Comment node to the current node with the data
1335          attribute set to the data given in the comment token.
1336
1337   A DOCTYPE token
1338          Parse error. Ignore the token.
1339
1340   A start tag whose tag name is "caption"
1341          Clear the stack back to a table context. (See below.)
1342
1343          Insert a marker at the end of the list of active formatting
1344          elements.
1345
1346          Insert an HTML element for the token, then switch the insertion
1347          mode to "in caption".
1348
1349   A start tag whose tag name is "colgroup"
1350          Clear the stack back to a table context. (See below.)
1351
1352          Insert an HTML element for the token, then switch the insertion
1353          mode to "in column group".
1354
1355   A start tag whose tag name is "col"
1356          Act as if a start tag token with the tag name "colgroup" had
1357          been seen, then reprocess the current token.
1358
1359   A start tag whose tag name is one of: "tbody", "tfoot", "thead"
1360          Clear the stack back to a table context. (See below.)
1361
1362          Insert an HTML element for the token, then switch the insertion
1363          mode to "in table body".
1364
1365   A start tag whose tag name is one of: "td", "th", "tr"
1366          Act as if a start tag token with the tag name "tbody" had been
1367          seen, then reprocess the current token.
1368
1369   A start tag whose tag name is "table"
1370          Parse error. Act as if an end tag token with the tag name
1371          "table" had been seen, then, if that token wasn't ignored,
1372          reprocess the current token.
1373
1374          The fake end tag token here can only be ignored in the fragment
1375          case.
1376
1377   An end tag whose tag name is "table"
1378          If the stack of open elements does not have an element in table
1379          scope with the same tag name as the token, this is a parse
1380          error. Ignore the token. (fragment case)
1381
1382          Otherwise:
1383
1384          Pop elements from this stack until a table element has been
1385          popped from the stack.
1386
1387          Reset the insertion mode appropriately.
1388
1389   An end tag whose tag name is one of: "body", "caption", "col",
1390          "colgroup", "html", "tbody", "td", "tfoot", "th", "thead", "tr"
1391          Parse error. Ignore the token.
1392
1393   A start tag whose tag name is one of: "style", "script"
1394          If the current table is tainted then act as described in the
1395          "anything else" entry below.
1396
1397          Otherwise, process the token using the rules for the "in head"
1398          insertion mode.
1399
1400   A start tag whose tag name is "input"
1401          If the token does not have an attribute with the name "type", or
1402          if it does, but that attribute's value is not an ASCII
1403          case-insensitive match for the string "hidden", or, if the
1404          current table is tainted, then: act as described in the
1405          "anything else" entry below.
1406
1407          Otherwise:
1408
1409          Parse error.
1410
1411          Insert an HTML element for the token.
1412
1413          Pop that input element off the stack of open elements.
1414
1415   An end-of-file token
1416          If the current node is not the root html element, then this is a
1417          parse error.
1418
1419          The root html element can only be the current node in the fragment case.
1420
1421          Stop parsing.
1422
1423   Anything else
1424          Parse error. Process the token using the rules for the "in body"
1425          insertion mode, except that if the current node is a table,
1426          tbody, tfoot, thead, or tr element, then, whenever a node would
1427          be inserted into the current node, it must instead be foster
1428          parented.
1429
1430   When the steps above require the UA to clear the stack back to a table
1431   context, it means that the UA must, while the current node is not a
1432   table element or an html element, pop elements from the stack of open
1433   elements.
1434
1435   The current node being an html element after this process is a fragment
1436   case.
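
Clearing the stack back to a table context, as described above, is a pop-until loop. The Python sketch below models the stack of open elements as a list of tag-name strings with the current node last; that representation and the names `clear_stack_back_to` and `TABLE_CONTEXT` are assumptions made for illustration. The emptiness guard is a defensive addition and is not part of the specification text.

```python
TABLE_CONTEXT = {"table", "html"}


def clear_stack_back_to(stack, stop_names):
    """Pop elements until the current node (last item) is in stop_names."""
    while stack and stack[-1] not in stop_names:
        stack.pop()
    return stack


# Ordinary document: popping stops at the table element.
open_elements = ["html", "body", "table", "tbody", "tr", "td"]
clear_stack_back_to(open_elements, TABLE_CONTEXT)

# Fragment case: no table on the stack, so popping stops at html.
fragment_elements = ["html", "tr", "td"]
clear_stack_back_to(fragment_elements, TABLE_CONTEXT)
```

The same helper would cover the other "clear the stack back to ..." operations in this section by passing a different set of stopping names.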
1437
1438      8.2.5.13 The "in caption" insertion mode
1439
1440   When the insertion mode is "in caption", tokens must be handled as
1441   follows:
1442
1443   An end tag whose tag name is "caption"
1444          If the stack of open elements does not have an element in table
1445          scope with the same tag name as the token, this is a parse
1446          error. Ignore the token. (fragment case)
1447
1448          Otherwise:
1449
1450          Generate implied end tags.
1451
1452          Now, if the current node is not a caption element, then this is
1453          a parse error.
1454
1455          Pop elements from this stack until a caption element has been
1456          popped from the stack.
1457
1458          Clear the list of active formatting elements up to the last
1459          marker.
1460
1461          Switch the insertion mode to "in table".
1462
1463   A start tag whose tag name is one of: "caption", "col", "colgroup",
1464          "tbody", "td", "tfoot", "th", "thead", "tr"
1465
1466   An end tag whose tag name is "table"
1467          Parse error. Act as if an end tag with the tag name "caption"
1468          had been seen, then, if that token wasn't ignored, reprocess the
1469          current token.
1470
1471          The fake end tag token here can only be ignored in the fragment
1472          case.
1473
1474   An end tag whose tag name is one of: "body", "col", "colgroup", "html",
1475          "tbody", "td", "tfoot", "th", "thead", "tr"
1476          Parse error. Ignore the token.
1477
1478   Anything else
1479          Process the token using the rules for the "in body" insertion
1480          mode.
1481
1482      8.2.5.14 The "in column group" insertion mode
1483
1484   When the insertion mode is "in column group", tokens must be handled as
1485   follows:
1486
1487   A character token that is one of U+0009 CHARACTER TABULATION,
1488          U+000A LINE FEED (LF), U+000C FORM FEED (FF), or U+0020 SPACE
1489          Insert the character into the current node.
1490
1491   A comment token
1492          Append a Comment node to the current node with the data
1493          attribute set to the data given in the comment token.
1494
1495   A DOCTYPE token
1496          Parse error. Ignore the token.
1497
1498   A start tag whose tag name is "html"
1499          Process the token using the rules for the "in body" insertion
1500          mode.
1501
1502   A start tag whose tag name is "col"
1503          Insert an HTML element for the token. Immediately pop the
1504          current node off the stack of open elements.
1505
1506          Acknowledge the token's self-closing flag, if it is set.
1507
1508   An end tag whose tag name is "colgroup"
1509          If the current node is the root html element, then this is a
1510          parse error; ignore the token. (fragment case)
1511
1512          Otherwise, pop the current node (which will be a colgroup
1513          element) from the stack of open elements. Switch the insertion
1514          mode to "in table".
1515
1516   An end tag whose tag name is "col"
1517          Parse error. Ignore the token.
1518
1519   An end-of-file token
1520          If the current node is the root html element, then stop parsing.
1521          (fragment case)
1522
1523          Otherwise, act as described in the "anything else" entry below.
1524
1525   Anything else
1526          Act as if an end tag with the tag name "colgroup" had been seen,
1527          and then, if that token wasn't ignored, reprocess the current
1528          token.
1529
1530          The fake end tag token here can only be ignored in the fragment
1531          case.
1532
1533      8.2.5.15 The "in table body" insertion mode
1534
1535   When the insertion mode is "in table body", tokens must be handled as
1536   follows:
1537
1538   A start tag whose tag name is "tr"
1539          Clear the stack back to a table body context. (See below.)
1540
1541          Insert an HTML element for the token, then switch the insertion
1542          mode to "in row".
1543
1544   A start tag whose tag name is one of: "th", "td"
1545          Parse error. Act as if a start tag with the tag name "tr" had
1546          been seen, then reprocess the current token.
1547
1548   An end tag whose tag name is one of: "tbody", "tfoot", "thead"
1549          If the stack of open elements does not have an element in table
1550          scope with the same tag name as the token, this is a parse
1551          error. Ignore the token.
1552
1553          Otherwise:
1554
1555          Clear the stack back to a table body context. (See below.)
1556
1557          Pop the current node from the stack of open elements. Switch the
1558          insertion mode to "in table".
1559
1560   A start tag whose tag name is one of: "caption", "col", "colgroup",
1561          "tbody", "tfoot", "thead"
1562
1563   An end tag whose tag name is "table"
1564          If the stack of open elements does not have a tbody, thead, or
1565          tfoot element in table scope, this is a parse error. Ignore the
1566          token. (fragment case)
1567
1568          Otherwise:
1569
1570          Clear the stack back to a table body context. (See below.)
1571
1572          Act as if an end tag with the same tag name as the current node
1573          ("tbody", "tfoot", or "thead") had been seen, then reprocess the
1574          current token.
1575
1576   An end tag whose tag name is one of: "body", "caption", "col",
1577          "colgroup", "html", "td", "th", "tr"
1578          Parse error. Ignore the token.
1579
1580   Anything else
1581          Process the token using the rules for the "in table" insertion
1582          mode.
1583
1584   When the steps above require the UA to clear the stack back to a table
1585   body context, it means that the UA must, while the current node is not
1586   a tbody, tfoot, thead, or html element, pop elements from the stack of
1587   open elements.
1588
1589   The current node being an html element after this process is a fragment
1590   case.
1591
1592      8.2.5.16 The "in row" insertion mode
1593
1594   When the insertion mode is "in row", tokens must be handled as follows:
1595
1596   A start tag whose tag name is one of: "th", "td"
1597          Clear the stack back to a table row context. (See below.)
1598
1599          Insert an HTML element for the token, then switch the insertion
1600          mode to "in cell".
1601
1602          Insert a marker at the end of the list of active formatting
1603          elements.
1604
1605   An end tag whose tag name is "tr"
1606          If the stack of open elements does not have an element in table
1607          scope with the same tag name as the token, this is a parse
1608          error. Ignore the token. (fragment case)
1609
1610          Otherwise:
1611
1612          Clear the stack back to a table row context. (See below.)
1613
1614          Pop the current node (which will be a tr element) from the stack
1615          of open elements. Switch the insertion mode to "in table body".
1616
1617   A start tag whose tag name is one of: "caption", "col", "colgroup",
1618          "tbody", "tfoot", "thead", "tr"
1619
1620   An end tag whose tag name is "table"
1621          Act as if an end tag with the tag name "tr" had been seen, then,
1622          if that token wasn't ignored, reprocess the current token.
1623
1624          The fake end tag token here can only be ignored in the fragment
1625          case.
1626
1627   An end tag whose tag name is one of: "tbody", "tfoot", "thead"
1628          If the stack of open elements does not have an element in table
1629          scope with the same tag name as the token, this is a parse
1630          error. Ignore the token.
1631
1632          Otherwise, act as if an end tag with the tag name "tr" had been
1633          seen, then reprocess the current token.
1634
1635   An end tag whose tag name is one of: "body", "caption", "col",
1636          "colgroup", "html", "td", "th"
1637          Parse error. Ignore the token.
1638
1639   Anything else
1640          Process the token using the rules for the "in table" insertion
1641          mode.
1642
1643   When the steps above require the UA to clear the stack back to a table
1644   row context, it means that the UA must, while the current node is not a
1645   tr element or an html element, pop elements from the stack of open
1646   elements.
1647
1648   The current node being an html element after this process is a fragment
1649   case.
1650
1651      8.2.5.17 The "in cell" insertion mode
1652
1653   When the insertion mode is "in cell", tokens must be handled as
1654   follows:
1655
1656   An end tag whose tag name is one of: "td", "th"
1657          If the stack of open elements does not have an element in table
1658          scope with the same tag name as that of the token, then this is
1659          a parse error and the token must be ignored.
1660
1661          Otherwise:
1662
1663          Generate implied end tags.
1664
1665          Now, if the current node is not an element with the same tag
1666          name as the token, then this is a parse error.
1667
1668          Pop elements from this stack until an element with the same tag
1669          name as the token has been popped from the stack.
1670
1671          Clear the list of active formatting elements up to the last
1672          marker.
1673
1674          Switch the insertion mode to "in row". (The current node will be
1675          a tr element at this point.)
1676
1677   A start tag whose tag name is one of: "caption", "col", "colgroup",
1678          "tbody", "td", "tfoot", "th", "thead", "tr"
1679          If the stack of open elements does not have a td or th element
1680          in table scope, then this is a parse error; ignore the token.
1681          (fragment case)
1682
1683          Otherwise, close the cell (see below) and reprocess the current
1684          token.
1685
1686   An end tag whose tag name is one of: "body", "caption", "col",
1687          "colgroup", "html"
1688          Parse error. Ignore the token.
1689
1690   An end tag whose tag name is one of: "table", "tbody", "tfoot",
1691          "thead", "tr"
1692          If the stack of open elements does not have an element in table
1693          scope with the same tag name as that of the token (which can
1694          only happen for "tbody", "tfoot" and "thead", or, in the
1695          fragment case), then this is a parse error and the token must be
1696          ignored.
1697
1698          Otherwise, close the cell (see below) and reprocess the current
1699          token.
1700
1701   Anything else
1702          Process the token using the rules for the "in body" insertion
1703          mode.
1704
1705   Where the steps above say to close the cell, they mean to run the
1706   following algorithm:
1707    1. If the stack of open elements has a td element in table scope, then
1708       act as if an end tag token with the tag name "td" had been seen.
1709    2. Otherwise, the stack of open elements will have a th element in
1710       table scope; act as if an end tag token with the tag name "th" had
1711       been seen.
1712
1713   The stack of open elements cannot have both a td and a th element in
1714   table scope at the same time, nor can it have neither when the
1715   insertion mode is "in cell".
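
The close-the-cell algorithm above can be sketched in Python as follows. The stack is modelled as a list of tag-name strings with the current node last. The table-scope check assumes that a scope search stops at a table or html element, as the scope definition elsewhere in the specification describes. `has_in_table_scope`, `close_the_cell`, and the `process_end_tag` callback are hypothetical names used only for this illustration.

```python
# Assumption: elements that terminate a search "in table scope".
TABLE_SCOPE_BOUNDARIES = {"table", "html"}


def has_in_table_scope(stack, tag_name):
    """Walk from the current node downward looking for tag_name."""
    for name in reversed(stack):
        if name == tag_name:
            return True
        if name in TABLE_SCOPE_BOUNDARIES:
            return False
    return False


def close_the_cell(stack, process_end_tag):
    """Act as if the end tag matching the open cell had been seen."""
    if has_in_table_scope(stack, "td"):
        process_end_tag("td")        # step 1
    else:
        process_end_tag("th")        # step 2: a th must be in table scope


seen = []
# A th cell inside a table nested in a td: the inner table hides the outer td.
nested = ["html", "body", "table", "tbody", "tr", "td",
          "table", "tbody", "tr", "th", "b"]
close_the_cell(nested, seen.append)
```

In the nested example the inner table element ends the scope search before the outer td is reached, so only the th counts as being in table scope.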
1716
1717      8.2.5.18 The "in select" insertion mode
1718
1719   When the insertion mode is "in select", tokens must be handled as
1720   follows:
1721
1722   A character token
1723          Insert the token's character into the current node.
1724
1725   A comment token
1726          Append a Comment node to the current node with the data
1727          attribute set to the data given in the comment token.
1728
1729   A DOCTYPE token
1730          Parse error. Ignore the token.
1731
1732   A start tag whose tag name is "html"
1733          Process the token using the rules for the "in body" insertion
1734          mode.
1735
1736   A start tag whose tag name is "option"
1737          If the current node is an option element, act as if an end tag
1738          with the tag name "option" had been seen.
1739
1740          Insert an HTML element for the token.
1741
1742   A start tag whose tag name is "optgroup"
1743          If the current node is an option element, act as if an end tag
1744          with the tag name "option" had been seen.
1745
1746          If the current node is an optgroup element, act as if an end tag
1747          with the tag name "optgroup" had been seen.
1748
1749          Insert an HTML element for the token.
1750
1751   An end tag whose tag name is "optgroup"
1752          First, if the current node is an option element, and the node
1753          immediately before it in the stack of open elements is an
1754          optgroup element, then act as if an end tag with the tag name
1755          "option" had been seen.
1756
1757          If the current node is an optgroup element, then pop that node
1758          from the stack of open elements. Otherwise, this is a parse
1759          error; ignore the token.
1760
1761   An end tag whose tag name is "option"
1762          If the current node is an option element, then pop that node
1763          from the stack of open elements. Otherwise, this is a parse
1764          error; ignore the token.
1765
1766   An end tag whose tag name is "select"
1767          If the stack of open elements does not have an element in table
1768          scope with the same tag name as the token, this is a parse
1769          error. Ignore the token. (fragment case)
1770
1771          Otherwise:
1772
1773          Pop elements from the stack of open elements until a select
1774          element has been popped from the stack.
1775
1776          Reset the insertion mode appropriately.
1777
1778   A start tag whose tag name is "select"
1779          Parse error. Act as if the token had been an end tag with the
1780          tag name "select" instead.
1781
1782   A start tag whose tag name is one of: "input", "textarea"
1783          Parse error. Act as if an end tag with the tag name "select" had
1784          been seen, and reprocess the token.
1785
1786   A start tag token whose tag name is "script"
1787          Process the token using the rules for the "in head" insertion
1788          mode.
1789
1790   An end-of-file token
1791          If the current node is not the root html element, then this is a
1792          parse error.
1793
1794          The root html element can only be the current node in the fragment case.
1795
1796          Stop parsing.
1797
1798   Anything else
1799          Parse error. Ignore the token.
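
The option and optgroup rules of the "in select" mode manipulate only the top of the stack of open elements. The Python sketch below shows the stack effects, again modelling the stack as a list of tag-name strings with the current node last. The function names are hypothetical, element insertion into the DOM is omitted, and "act as if an end tag with the tag name "option" had been seen" is reduced to a pop, which is what the "option" end tag rule does when the current node is an option element.

```python
def start_option(stack):
    """Start tag "option": close an open option, then open the new one."""
    if stack[-1] == "option":
        stack.pop()                  # implied </option>
    stack.append("option")


def start_optgroup(stack):
    """Start tag "optgroup": close an open option and optgroup first."""
    if stack[-1] == "option":
        stack.pop()                  # implied </option>
    if stack[-1] == "optgroup":
        stack.pop()                  # implied </optgroup>
    stack.append("optgroup")


def end_optgroup(stack):
    """End tag "optgroup". Returns False when the token is a parse error
    and is ignored, True when an optgroup element was popped."""
    if len(stack) >= 2 and stack[-1] == "option" and stack[-2] == "optgroup":
        stack.pop()                  # implied </option>
    if stack[-1] == "optgroup":
        stack.pop()
        return True
    return False


open_elements = ["html", "body", "select"]
start_optgroup(open_elements)
start_option(open_elements)
start_option(open_elements)          # the second option replaces the first
closed = end_optgroup(open_elements)
```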
1800
1801      8.2.5.19 The "in select in table" insertion mode
1802
1803   When the insertion mode is "in select in table", tokens must be handled
1804   as follows:
1805
1806   A start tag whose tag name is one of: "caption", "table", "tbody",
1807          "tfoot", "thead", "tr", "td", "th"
1808          Parse error. Act as if an end tag with the tag name "select" had
1809          been seen, and reprocess the token.
1810
1811   An end tag whose tag name is one of: "caption", "table", "tbody",
1812          "tfoot", "thead", "tr", "td", "th"
1813          Parse error.
1814
1815          If the stack of open elements has an element in table scope with
1816          the same tag name as that of the token, then act as if an end
1817          tag with the tag name "select" had been seen, and reprocess the
1818          token. Otherwise, ignore the token.
1819
1820   Anything else
1821          Process the token using the rules for the "in select" insertion
1822          mode.
1823
1824      8.2.5.20 The "in foreign content" insertion mode
1825
1826   When the insertion mode is "in foreign content", tokens must be handled
1827   as follows:
1828
1829   A character token
1830          Insert the token's character into the current node.
1831
1832   A comment token
1833          Append a Comment node to the current node with the data
1834          attribute set to the data given in the comment token.
1835
1836   A DOCTYPE token
1837          Parse error. Ignore the token.
1838
1839   A start tag whose tag name is neither "mglyph" nor "malignmark", if the
1840          current node is an mi element in the MathML namespace.
1841
1842   A start tag whose tag name is neither "mglyph" nor "malignmark", if the
1843          current node is an mo element in the MathML namespace.
1844
1845   A start tag whose tag name is neither "mglyph" nor "malignmark", if the
1846          current node is an mn element in the MathML namespace.
1847
1848   A start tag whose tag name is neither "mglyph" nor "malignmark", if the
1849          current node is an ms element in the MathML namespace.
1850
1851   A start tag whose tag name is neither "mglyph" nor "malignmark", if the
1852          current node is an mtext element in the MathML namespace.
1853
1854   A start tag, if the current node is an element in the HTML namespace.
1855   An end tag
1856          Process the token using the rules for the secondary insertion
1857          mode.
1858
1859          If, after doing so, the insertion mode is still "in foreign
1860          content", but there is no element in scope that has a namespace
1861          other than the HTML namespace, switch the insertion mode to the
1862          secondary insertion mode.
1863
1864   A start tag whose tag name is one of: "b", "big", "blockquote", "body",
1865          "br", "center", "code", "dd", "div", "dl", "dt", "em", "embed",
1866          "h1", "h2", "h3", "h4", "h5", "h6", "head", "hr", "i", "img",
1867          "li", "listing", "menu", "meta", "nobr", "ol", "p", "pre",
1868          "ruby", "s", "small", "span", "strong", "strike", "sub", "sup",
1869          "table", "tt", "u", "ul", "var"
1870
1871   A start tag whose tag name is "font", if the token has any attributes
1872          named "color", "face", or "size"
1873
1874   An end-of-file token
1875          Parse error.
1876
1877          Pop elements from the stack of open elements until the current
1878          node is in the HTML namespace.
1879
1880          Switch the insertion mode to the secondary insertion mode, and
1881          reprocess the token.
1882
1883   Any other start tag
1884          If the current node is an element in the MathML namespace,
1885          adjust MathML attributes for the token. (This fixes the case of
1886          MathML attributes that are not all lowercase.)
1887
1888          Adjust foreign attributes for the token. (This fixes the use of
1889          namespaced attributes, in particular XLink in SVG.)
1890
1891          Insert a foreign element for the token, in the same namespace as
1892          the current node.
1893
1894          If the token has its self-closing flag set, pop the current node
1895          off the stack of open elements and acknowledge the token's
1896          self-closing flag.
1897
1898      8.2.5.21 The "after body" insertion mode
1899
1900   When the insertion mode is "after body", tokens must be handled as
1901   follows:
1902
1903   A character token that is one of U+0009 CHARACTER TABULATION,
1904          U+000A LINE FEED (LF), U+000C FORM FEED (FF), or U+0020 SPACE
1905          Process the token using the rules for the "in body" insertion
1906          mode.
1907
1908   A comment token
1909          Append a Comment node to the first element in the stack of open
1910          elements (the html element), with the data attribute set to the
1911          data given in the comment token.
1912
1913   A DOCTYPE token
1914          Parse error. Ignore the token.
1915
1916   A start tag whose tag name is "html"
1917          Process the token using the rules for the "in body" insertion
1918          mode.
1919
1920   An end tag whose tag name is "html"
1921          If the parser was originally created as part of the HTML
1922          fragment parsing algorithm, this is a parse error; ignore the
1923          token. (fragment case)
1924
1925          Otherwise, switch the insertion mode to "after after body".
1926
1927   An end-of-file token
1928          Stop parsing.
1929
1930   Anything else
1931          Parse error. Switch the insertion mode to "in body" and
1932          reprocess the token.
1933
1934      8.2.5.22 The "in frameset" insertion mode
1935
1936   When the insertion mode is "in frameset", tokens must be handled as
1937   follows:
1938
1939   A character token that is one of U+0009 CHARACTER TABULATION,
1940          U+000A LINE FEED (LF), U+000C FORM FEED (FF), or U+0020 SPACE
1941          Insert the character into the current node.
1942
1943   A comment token
1944          Append a Comment node to the current node with the data
1945          attribute set to the data given in the comment token.
1946
1947   A DOCTYPE token
1948          Parse error. Ignore the token.
1949
1950   A start tag whose tag name is "html"
1951          Process the token using the rules for the "in body" insertion
1952          mode.
1953
1954   A start tag whose tag name is "frameset"
1955          Insert an HTML element for the token.
1956
1957   An end tag whose tag name is "frameset"
1958          If the current node is the root html element, then this is a
1959          parse error; ignore the token. (fragment case)
1960
1961          Otherwise, pop the current node from the stack of open elements.
1962
1963          If the parser was not originally created as part of the HTML
1964          fragment parsing algorithm (fragment case), and the current node
1965          is no longer a frameset element, then switch the insertion mode
1966          to "after frameset".
1967
1968   A start tag whose tag name is "frame"
1969          Insert an HTML element for the token. Immediately pop the
1970          current node off the stack of open elements.
1971
1972          Acknowledge the token's self-closing flag, if it is set.
1973
1974   A start tag whose tag name is "noframes"
1975          Process the token using the rules for the "in head" insertion
1976          mode.
1977
1978   An end-of-file token
1979          If the current node is not the root html element, then this is a
1980          parse error.
1981
1982          The root html element can only be the current node in the fragment case.
1983
1984          Stop parsing.
1985
1986   Anything else
1987          Parse error. Ignore the token.
1988
1989      8.2.5.23 The "after frameset" insertion mode
1990
1991   When the insertion mode is "after frameset", tokens must be handled as
1992   follows:
1993
1994   A character token that is one of U+0009 CHARACTER TABULATION,
1995          U+000A LINE FEED (LF), U+000C FORM FEED (FF), or U+0020 SPACE
1996          Insert the character into the current node.
1997
1998   A comment token
1999          Append a Comment node to the current node with the data
2000          attribute set to the data given in the comment token.
2001
2002   A DOCTYPE token
2003          Parse error. Ignore the token.
2004
2005   A start tag whose tag name is "html"
2006          Process the token using the rules for the "in body" insertion
2007          mode.
2008
2009   An end tag whose tag name is "html"
2010          Switch the insertion mode to "after after frameset".
2011
2012   A start tag whose tag name is "noframes"
2013          Process the token using the rules for the "in head" insertion
2014          mode.
2015
2016   An end-of-file token
2017          Stop parsing.
2018
2019   Anything else
2020          Parse error. Ignore the token.
2021
2022   This doesn't handle UAs that don't support frames, or that do support
2023   frames but want to show the NOFRAMES content. Supporting the former is
2024   easy; supporting the latter is harder.
2025
2026      8.2.5.24 The "after after body" insertion mode
2027
2028   When the insertion mode is "after after body", tokens must be handled
2029   as follows:
2030
2031   A comment token
2032          Append a Comment node to the Document object with the data
2033          attribute set to the data given in the comment token.
2034
2035   A DOCTYPE token
2036   A character token that is one of U+0009 CHARACTER TABULATION,
2037          U+000A LINE FEED (LF), U+000C FORM FEED (FF), or U+0020 SPACE
2038
2039   A start tag whose tag name is "html"
2040          Process the token using the rules for the "in body" insertion
2041          mode.
2042
2043   An end-of-file token
2044          Stop parsing.
2045
2046   Anything else
2047          Parse error. Switch the insertion mode to "in body" and
2048          reprocess the token.
2049
2050      8.2.5.25 The "after after frameset" insertion mode
2051
2052   When the insertion mode is "after after frameset", tokens must be
2053   handled as follows:
2054
2055   A comment token
2056          Append a Comment node to the Document object with the data
2057          attribute set to the data given in the comment token.
2058
2059   A DOCTYPE token
2060   A character token that is one of U+0009 CHARACTER TABULATION,
2061          U+000A LINE FEED (LF), U+000C FORM FEED (FF), or U+0020 SPACE
2062
2063   A start tag whose tag name is "html"
2064          Process the token using the rules for the "in body" insertion
2065          mode.
2066
2067   An end-of-file token
2068          Stop parsing.
2069
2070   A start tag whose tag name is "noframes"
2071          Process the token using the rules for the "in head" insertion
2072          mode.
2073
2074   Anything else
2075          Parse error. Ignore the token.
2076
2077    8.2.6 The end
2078
2079   Once the user agent stops parsing the document, the user agent must
2080   follow the steps in this section.
2081
2082   First, the current document readiness must be set to "interactive".
2083
2084   Then, the rules for when a script completes loading start applying
2085   (script execution is no longer managed by the parser).
2086
2087   If any of the scripts in the list of scripts that will execute as soon
2088   as possible have completed loading, or if the list of scripts that will
2089   execute asynchronously is not empty and the first script in that list
2090   has completed loading, then the user agent must act as if those scripts
2091   just completed loading, following the rules given for that in the
2092   script element definition.
2093
2094   Then, if the list of scripts that will execute when the document has
2095   finished parsing is not empty, and the first item in this list has
2096   already completed loading, then the user agent must act as if that
2097   script just finished loading.
2098
2099   By this point, there will be no scripts that have loaded but have not
2100   yet been executed.
2101
2102   The user agent must then fire a simple event called DOMContentLoaded at
2103   the Document.
2104
2105   Once everything that delays the load event has completed, the user
2106   agent must set the current document readiness to "complete", and then
2107   fire a load event at the body element.
2108
2109   Delaying the load event for things like image loads allows for intranet
2110   port scans (even without JavaScript!). Should we really encode that
2111   into the spec?
2112
2113    8.2.7 Coercing an HTML DOM into an infoset
2114
2115   When an application uses an HTML parser in conjunction with an XML
2116   pipeline, it is possible that the constructed DOM is not compatible
2117   with the XML tool chain in certain subtle ways. For example, an XML
2118   toolchain might not be able to represent attributes with the name
2119   xmlns, since they conflict with the Namespaces in XML syntax. There is
2120   also some data that the HTML parser generates that isn't included in
2121   the DOM itself. This section specifies some rules for handling these
2122   issues.
2123
2124   If the XML API being used doesn't support DOCTYPEs, the tool may drop
2125   DOCTYPEs altogether.
2126
2127   If the XML API doesn't support attributes in no namespace that are
2128   named "xmlns", attributes whose names start with "xmlns:", or
2129   attributes in the XMLNS namespace, then the tool may drop such
2130   attributes.
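
A minimal sketch of this filtering step, in Python. The representation of an attribute as a (namespace, name, value) tuple and the function name are assumptions made for illustration; a real tool would work against whatever attribute model its XML API exposes.

```python
XMLNS_NAMESPACE = "http://www.w3.org/2000/xmlns/"


def drop_xmlns_attributes(attributes):
    """Return only the attributes a restrictive XML API could accept.

    Each attribute is a hypothetical (namespace, name, value) tuple,
    where namespace is None for an attribute in no namespace.
    """
    kept = []
    for namespace, name, value in attributes:
        if namespace is None and name == "xmlns":
            continue  # attribute in no namespace named "xmlns"
        if name.startswith("xmlns:"):
            continue  # attribute whose name starts with "xmlns:"
        if namespace == XMLNS_NAMESPACE:
            continue  # attribute in the XMLNS namespace
        kept.append((namespace, name, value))
    return kept
```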
2131
2132   The tool may annotate the output with any namespace declarations
2133   required for proper operation.
2134
2135   If the XML API being used restricts the allowable characters in the
2136   local names of elements and attributes, then the tool may map all
2137   element and attribute local names that the API wouldn't support to a
2138   set of names that are allowed, by replacing any character that isn't
2139   supported with the uppercase letter U followed by the five digits of
2140   the character's Unicode code point when expressed in hexadecimal,
2141   using digits 0-9 and capital letters A-F as the symbols, in
2142   increasing numeric order.
2143
2144   For example, the element name foo<bar, which can be output by the HTML
2145   parser, though it is neither a legal HTML element name nor a
2146   well-formed XML element name, would be converted into fooU0003Cbar,
2147   which is a well-formed XML element name (though it's still not legal in
2148   HTML by any means).
2149
2150   As another example, consider the attribute xlink:href. Used on a MathML
2151   element, it becomes, after being adjusted, an attribute with a prefix
2152   "xlink" and a local name "href". However, used on an HTML element, it
2153   becomes an attribute with no prefix and the local name "xlink:href",
2154   which is not a valid NCName, and thus might not be accepted by an XML
2155   API. It could thus get converted, becoming "xlinkU0003Ahref".
2156
2157   The names resulting from this conversion conveniently can't clash
2158   with any attribute generated by the HTML parser, since those are all
2159   either lowercase or listed in the adjust foreign attributes
2160   algorithm's table.
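
The name mapping described above can be sketched in Python as follows. Which characters an XML API supports depends on the API; the ASCII-only character sets used here are a deliberately conservative assumption (real NCName rules also allow many non-ASCII characters), and the function name is hypothetical.

```python
import string

# Assumption: a conservative, ASCII-only approximation of the characters
# an XML API accepts in a local name.
_NAME_START = set(string.ascii_letters + "_")
_NAME_REST = _NAME_START | set(string.digits + "-.")


def coerce_local_name(name):
    """Replace each unsupported character with U plus its hex code point."""
    out = []
    for index, ch in enumerate(name):
        allowed = _NAME_START if index == 0 else _NAME_REST
        if ch in allowed:
            out.append(ch)
        else:
            # Uppercase hexadecimal, zero-padded to five digits.
            out.append("U%05X" % ord(ch))
    return "".join(out)
```

With these assumptions, coerce_local_name("foo<bar") gives "fooU0003Cbar" and coerce_local_name("xlink:href") gives "xlinkU0003Ahref", matching the examples above. Code points above U+FFFFF need six hexadecimal digits, a case the text does not address; this sketch simply emits all six.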
2161
2162   If the XML API does not allow comments to contain two consecutive
2163   U+002D HYPHEN-MINUS characters (--), the tool may insert a single
2164   U+0020 SPACE character between any such offending characters.
2165
2166   If the XML API does not allow comments to end in a U+002D
2167   HYPHEN-MINUS character (-), the tool may insert a single U+0020 SPACE
2168   character at the end of such comments.
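
Both comment rules can be sketched together in Python. The function name is hypothetical, and the sketch assumes the XML API imposes both restrictions; a tool facing only one of them would apply only the matching step.

```python
import re


def coerce_comment(data):
    """Make comment data acceptable to an XML API that rejects '--'
    inside comments and '-' at the end of comments."""
    # Insert a space after every hyphen that is directly followed by
    # another hyphen; the lookahead also handles runs of three or more.
    data = re.sub(r"-(?=-)", "- ", data)
    # Append a space if the comment would otherwise end in a hyphen.
    if data.endswith("-"):
        data += " "
    return data
```

A lookahead is used rather than a plain replacement of "--" because a single non-overlapping replacement pass would leave a pair of hyphens behind in a run such as "---".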
2169
2170   If the XML API restricts allowed characters in character data, the tool
2171   may replace any U+000C FORM FEED (FF) character with a U+0020 SPACE
2172   character, and any other literal non-XML character with a U+FFFD
2173   REPLACEMENT CHARACTER.
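
A minimal Python sketch of this replacement follows. It assumes that "non-XML character" means any character outside the Char production of XML 1.0; the function names are hypothetical.

```python
def is_xml_char(code_point):
    """True if the code point matches the XML 1.0 Char production."""
    return (
        code_point in (0x9, 0xA, 0xD)
        or 0x20 <= code_point <= 0xD7FF
        or 0xE000 <= code_point <= 0xFFFD
        or 0x10000 <= code_point <= 0x10FFFF
    )


def coerce_character_data(text):
    """Replace form feeds with spaces and other non-XML characters
    with U+FFFD REPLACEMENT CHARACTER."""
    out = []
    for ch in text:
        if ch == "\u000C":
            out.append(" ")
        elif is_xml_char(ord(ch)):
            out.append(ch)
        else:
            out.append("\uFFFD")
    return "".join(out)
```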
2174
2175   If the tool has no way to convey out-of-band information, then the tool
2176   may drop the following information:
2177     * Whether the document is set to no quirks mode, limited quirks mode,
2178       or quirks mode
2179     * The association between form controls and forms that aren't their
2180       nearest form element ancestor (use of the form element pointer in
2181       the parser)
2182
2183   The mutations allowed by this section apply after the HTML parser's
2184   rules have been applied. For example, a <a::> start tag will be closed
2185   by a </a::> end tag, and never by a </aU0003AU0003A> end tag, even if
2186   the user agent is using the rules above to then generate an actual
2187   element in the DOM with the name aU0003AU0003A for that start tag.
2188
2189  8.3 Namespaces
2190
2191   The HTML namespace is: http://www.w3.org/1999/xhtml
2192
2193   The MathML namespace is: http://www.w3.org/1998/Math/MathML
2194
2195   The SVG namespace is: http://www.w3.org/2000/svg
2196
2197   The XLink namespace is: http://www.w3.org/1999/xlink
2198
2199   The XML namespace is: http://www.w3.org/XML/1998/namespace
2200
2201   The XMLNS namespace is: http://www.w3.org/2000/xmlns/
2202