tree-construction.txt - OpenGrok cross reference for /external/owasp/sanitizer/lib/htmlparser-1.3/doc/tree-construction.txt

Lines Matching full:the
14    The input to the tree construction stage is a sequence of tokens from
15    the tokenization stage. The tree construction stage is associated with
16    a DOM Document object when a parser is created. The "output" of this
21    to render the Document so that it is available to the user, or when it
24    As each token is emitted from the tokeniser, the user agent must
25    process the token according to the rules given in the section
26    corresponding to the current insertion mode.
28    When the steps below require the UA to insert a character into a node,
29    if that node has a child immediately before where the character is to
30    be inserted, and that child is a Text node, and that Text node was the
31    last node that the parser inserted into the document, then the
33    node whose data is just that character must be inserted in the
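The character-insertion rule above can be sketched as follows. This is a minimal sketch that assumes a toy node model; `Text`, `Element`, `Parser` and `last_inserted` are hypothetical names, not a DOM API. The sketch appends a character to the preceding Text node only when that node was the parser's most recent insertion. Otherwise it creates a new Text node. It covers only appending at the end of a node's children.

```python
class Text:
    def __init__(self, data):
        self.data = data

class Element:
    def __init__(self, name):
        self.name = name
        self.children = []

class Parser:
    def __init__(self):
        # Hypothetical bookkeeping: the last node this parser inserted.
        self.last_inserted = None

    def insert_element(self, parent, element):
        parent.children.append(element)
        self.last_inserted = element

    def insert_character(self, parent, ch):
        """Append ch at the end of parent, coalescing into the previous Text node when allowed."""
        prev = parent.children[-1] if parent.children else None
        if isinstance(prev, Text) and prev is self.last_inserted:
            # The preceding child is the Text node the parser just inserted: extend it.
            prev.data += ch
        else:
            # Otherwise create a new Text node whose data is just this character.
            node = Text(ch)
            parent.children.append(node)
            self.last_inserted = node
```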
36    DOM mutation events must not fire for changes caused by the UA parsing
37    the document. (Conceptually, the parser is not mutating the DOM; it is
38    constructing it.) This includes the parsing of any content inserted
41    Not all of the tag names mentioned below are conformant tag names in
43    still form part of the algorithm that implementations are required to
46    The algorithm described below places no limit on the depth of the DOM
47    tree generated, or on the length of tag names, attribute names,
54    When the steps below require the UA to create an element for a token in
55    a particular namespace, the UA must create a node implementing the
56    interface appropriate for the element type corresponding to the tag
57    name of the token in the given namespace (as given in the specification
58    that defines that element, e.g. for an a element in the HTML namespace,
59    this specification defines it to be the HTMLAnchorElement interface),
60    with the tag name being the name of that element, with the node being
61    in the given namespace, and with the attributes on the node being those
62    given in the given token.
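As a rough sketch, "create an element for a token" can be modelled as a lookup from (namespace, tag name) to an interface name, with the token's attributes copied onto the new node. Only the `a` to HTMLAnchorElement entry comes from the text. The HTMLElement fallback for undefined HTML-namespace elements comes from the paragraph that follows. Falling back to plain `Element` for other namespaces is an assumption of this sketch. `StartTagToken`, `Element` and `INTERFACES` are hypothetical names.

```python
from dataclasses import dataclass, field

HTML_NS = "http://www.w3.org/1999/xhtml"

# Illustrative interface table; only the <a> entry is taken from the text.
INTERFACES = {(HTML_NS, "a"): "HTMLAnchorElement"}

@dataclass
class StartTagToken:
    name: str
    attributes: dict = field(default_factory=dict)

@dataclass
class Element:
    interface: str
    namespace: str
    local_name: str
    attributes: dict

def create_element_for_token(token, namespace):
    interface = INTERFACES.get((namespace, token.name))
    if interface is None:
        # Undefined HTML-namespace elements use HTMLElement; using "Element"
        # for other namespaces is an assumption of this sketch.
        interface = "HTMLElement" if namespace == HTML_NS else "Element"
    # The node's attributes are those given in the token.
    return Element(interface, namespace, token.name, dict(token.attributes))
```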
64    The interface appropriate for an element in the HTML namespace that is
65    not defined in this specification is HTMLElement. The interface
70    algorithm must be invoked once the attributes are set. (This
71    initializes the element's value and checkedness based on the element's
75    When the steps below require the UA to insert an HTML element for a
76    token, the UA must first create an element for the token in the HTML
77    namespace, and then append this node to the current node, and push it
78    onto the stack of open elements so that it is the new current node.
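A minimal sketch of "insert an HTML element" follows. It assumes the stack of open elements is a Python list whose last entry is the current node, which is the bottom of the stack in the spec's terms. `Element` and `TreeBuilder` are hypothetical names.

```python
HTML_NS = "http://www.w3.org/1999/xhtml"

class Element:
    def __init__(self, name, attributes=None, namespace=HTML_NS):
        self.name = name
        self.attributes = dict(attributes or {})
        self.namespace = namespace
        self.children = []
        self.parent = None

    def append_child(self, child):
        child.parent = self
        self.children.append(child)

class TreeBuilder:
    def __init__(self, root):
        self.open_elements = [root]

    @property
    def current_node(self):
        return self.open_elements[-1]

    def insert_html_element(self, name, attributes=None):
        element = Element(name, attributes)      # create it in the HTML namespace
        self.current_node.append_child(element)  # append it to the current node
        self.open_elements.append(element)       # push it: it becomes the current node
        return element
```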
80    The steps below may also require that the UA insert an HTML element in
81    a particular place, in which case the UA must follow the same steps
82    except that it must insert or append the new node in the location
83    specified instead of appending it to the current node. (This happens in
84    particular during the parsing of tables with invalid content.)
86    If an element created by the insert an HTML element algorithm is a
87    form-associated element, and the form element pointer is not null, and
88    the newly created element doesn't have a form attribute, the user agent
89    must associate the newly created element with the form element pointed
90    to by the form element pointer before inserting it wherever it is to be
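The form-association check above can be sketched as below. The set of form-associated elements is defined elsewhere in the spec, so `FORM_ASSOCIATED` is an assumed subset used only for illustration. `Element` and `form_owner` are hypothetical names.

```python
# Assumed subset of form-associated elements, for illustration only.
FORM_ASSOCIATED = {"button", "fieldset", "input", "object", "output", "select", "textarea"}

class Element:
    def __init__(self, name, attributes=None):
        self.name = name
        self.attributes = dict(attributes or {})
        self.form_owner = None

def maybe_associate_with_form(element, form_element_pointer):
    """Associate a newly created element with the form pointer before it is inserted."""
    if (element.name in FORM_ASSOCIATED
            and form_element_pointer is not None
            and "form" not in element.attributes):
        element.form_owner = form_element_pointer
    return element
```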
94    When the steps below require the UA to insert a foreign element for a
95    token, the UA must first create an element for the token in the given
96    namespace, and then append this node to the current node, and push it
97    onto the stack of open elements so that it is the new current node. If
98    the newly created element has an xmlns attribute in the XMLNS namespace
99    whose value is not exactly the same as the element's namespace, that is
102    When the steps below require the user agent to adjust MathML attributes
103    for a token, then, if the token has an attribute named definitionurl,
104    change its name to definitionURL (note the case difference).
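Adjusting MathML attributes amounts to a single rename. A minimal sketch, assuming the token's attributes are held in a plain dict (`adjust_mathml_attributes` is a hypothetical name):

```python
def adjust_mathml_attributes(attributes):
    """Return a copy of a token's attributes with definitionurl renamed to definitionURL."""
    adjusted = dict(attributes)
    if "definitionurl" in adjusted:
        adjusted["definitionURL"] = adjusted.pop("definitionurl")
    return adjusted
```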
106    When the steps below require the user agent to adjust foreign
107    attributes for a token, then, if any of the attributes on the token
108    match the strings given in the first column of the following table, let
109    the attribute be a namespaced attribute, with the prefix being the
110    string given in the corresponding cell in the second column, the local
111    name being the string given in the corresponding cell in the third
112    column, and the namespace being the namespace given in the
113    corresponding cell in the fourth column. (This fixes the use of
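The foreign-attribute adjustment is a table lookup that splits a matching attribute name into prefix, local name and namespace. This excerpt does not reproduce the table, so the sketch below takes the table as a parameter and includes a single illustrative `xlink:href` row. `NamespacedAttribute` and `adjust_foreign_attributes` are hypothetical names.

```python
from typing import NamedTuple, Optional

XLINK_NS = "http://www.w3.org/1999/xlink"

class NamespacedAttribute(NamedTuple):
    prefix: Optional[str]
    local_name: str
    namespace: Optional[str]
    value: str

# One illustrative row (name -> prefix, local name, namespace); not the full table.
EXAMPLE_TABLE = {"xlink:href": ("xlink", "href", XLINK_NS)}

def adjust_foreign_attributes(attributes, table=EXAMPLE_TABLE):
    adjusted = []
    for name, value in attributes.items():
        if name in table:
            prefix, local_name, namespace = table[name]
            adjusted.append(NamespacedAttribute(prefix, local_name, namespace, value))
        else:
            # Attributes not listed in the table stay un-namespaced.
            adjusted.append(NamespacedAttribute(None, name, None, value))
    return adjusted
```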
131    The generic CDATA element parsing algorithm and the generic RCDATA
132    element parsing algorithm consist of the following steps. These
134     1. Insert an HTML element for the token.
135     2. If the algorithm that was invoked is the generic CDATA element
136        parsing algorithm, switch the tokeniser's content model flag to the
137        CDATA state; otherwise the algorithm invoked was the generic RCDATA
138        element parsing algorithm, switch the tokeniser's content model
139        flag to the RCDATA state.
140     3. Let the original insertion mode be the current insertion mode.
141     4. Then, switch the insertion mode to "in CDATA/RCDATA".
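The four steps of the generic CDATA and RCDATA element parsing algorithms can be sketched as one method with a flag. This assumes a toy tree builder in which "insert an HTML element" only records the tag name. `TreeBuilder`, `generic_text_element` and `rcdata` are hypothetical names.

```python
class TreeBuilder:
    def __init__(self, insertion_mode="in head"):
        self.insertion_mode = insertion_mode
        self.original_insertion_mode = None
        self.content_model_flag = "PCDATA"
        self.inserted = []  # stand-in for "insert an HTML element"

    def insert_html_element(self, tag_name):
        self.inserted.append(tag_name)

    def generic_text_element(self, tag_name, rcdata):
        # 1. Insert an HTML element for the token.
        self.insert_html_element(tag_name)
        # 2. Switch the tokeniser to CDATA or RCDATA, depending on which algorithm ran.
        self.content_model_flag = "RCDATA" if rcdata else "CDATA"
        # 3. Remember the current insertion mode.
        self.original_insertion_mode = self.insertion_mode
        # 4. Switch to "in CDATA/RCDATA".
        self.insertion_mode = "in CDATA/RCDATA"
```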
145    When the steps below require the UA to generate implied end tags, then,
146    while the current node is a dd element, a dt element, an li element, an
148    rt element, the UA must pop the current node off the stack of open
151    If a step requires the UA to generate implied end tags but lists an
152    element to exclude from the process, then the UA must perform the above
153    steps as if that element were not in the above list.
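A sketch of "generate implied end tags", including the optional excluded element, is shown below. It assumes the stack holds tag names with the current node last. The element list in the text is cut off in this excerpt, so the default set holds only the names that are visible (dd, dt, li, rt). `generate_implied_end_tags` is a hypothetical name.

```python
# Only the element names visible in this excerpt; the full list is truncated here.
IMPLIED_END_TAG_NAMES = frozenset({"dd", "dt", "li", "rt"})

def generate_implied_end_tags(open_elements, exclude=None, names=IMPLIED_END_TAG_NAMES):
    """Pop the current node while it is in `names`; `exclude` is treated as not in the list."""
    candidates = set(names) - ({exclude} if exclude else set())
    while open_elements and open_elements[-1] in candidates:
        open_elements.pop()
    return open_elements
```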
159    When a node node is to be foster parented, the node node must be
160    inserted into the foster parent element, and the current table must be
161    marked as tainted. (Once the current table has been tainted, whitespace
162    characters are inserted into the foster parent element instead of the
165    The foster parent element is the parent element of the last table
166    element in the stack of open elements, if there is a table element and
167    it has such a parent element. If there is no table element in the stack
168    of open elements (fragment case), then the foster parent element is the
169    first element in the stack of open elements (the html element).
170    Otherwise, if there is a table element in the stack of open elements,
171    but the last table element in the stack of open elements has no parent,
172    or its parent node is not an element, then the foster parent element is
173    the element before the last table element in the stack of open
176    If the foster parent element is the parent element of the last table
177    element in the stack of open elements, then node must be inserted
178    immediately before the last table element in the stack of open elements
179    in the foster parent element; otherwise, node must be appended to the
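The foster-parenting rules above can be sketched as follows. The sketch uses a toy node model in which `Document` is a non-element parent. It assumes that "the current table" is the last table element on the stack, because this excerpt does not define that term. `Node`, `Document`, `Element`, `tainted` and `foster_parent` are hypothetical names.

```python
class Node:
    def __init__(self):
        self.parent = None
        self.children = []

    def append_child(self, child):
        child.parent = self
        self.children.append(child)

    def insert_before(self, child, reference):
        child.parent = self
        self.children.insert(self.children.index(reference), child)

class Document(Node):
    pass

class Element(Node):
    def __init__(self, name):
        super().__init__()
        self.name = name
        self.tainted = False

def foster_parent(node, open_elements):
    """Insert node per the foster-parenting rules; return the foster parent element."""
    index = next((i for i in range(len(open_elements) - 1, -1, -1)
                  if open_elements[i].name == "table"), None)
    if index is None:
        # Fragment case: no table on the stack; append to the first element (html).
        open_elements[0].append_child(node)
        return open_elements[0]
    table = open_elements[index]
    table.tainted = True  # assumption: "current table" = last table on the stack
    if isinstance(table.parent, Element):
        # Insert immediately before the table, inside the table's parent.
        table.parent.insert_before(node, table)
        return table.parent
    # Table has no parent, or its parent is not an element: use the element above it.
    fallback = open_elements[index - 1]
    fallback.append_child(node)
    return fallback
```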
182       8.2.5.4 The "initial" insertion mode
184    When the insertion mode is "initial", tokens must be handled as
189           Ignore the token.
192           Append a Comment node to the Document object with the data
193           attribute set to the data given in the comment token.
196           If the DOCTYPE token's name is not a case-sensitive match for
197           the string "html", or if the token's public identifier is
198           neither missing nor a case-sensitive match for the string
199           "XSLT-compat", or if the token's system identifier is not
200           missing, then there is a parse error (this is the DOCTYPE parse
203           language (e.g. based on the DOCTYPE token a conformance checker
204           could recognize that the document is an HTML4-era document, and
207           Append a DocumentType node to the Document node, with the name
208           attribute set to the name given in the DOCTYPE token; the
209           publicId attribute set to the public identifier given in the
210           DOCTYPE token, or the empty string if the public identifier was
211           missing; the systemId attribute set to the system identifier
212           given in the DOCTYPE token, or the empty string if the system
213           identifier was missing; and the other attributes specific to
215           Associate the DocumentType node with the Document object so that
216           it is returned as the value of the doctype attribute of the
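The DOCTYPE parse-error test and the construction of the DocumentType node can be sketched as below. The sketch assumes that `None` represents a missing identifier. The parse-error comparisons are case-sensitive, as the text states. `DoctypeToken`, `DocumentType`, `is_doctype_parse_error` and `make_document_type` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class DoctypeToken:
    name: str
    public_id: Optional[str] = None  # None means "missing"
    system_id: Optional[str] = None
    force_quirks: bool = False

@dataclass
class DocumentType:
    name: str
    public_id: str
    system_id: str

def is_doctype_parse_error(tok):
    # Case-sensitive checks against "html" and "XSLT-compat"; any system identifier is an error.
    return (tok.name != "html"
            or (tok.public_id is not None and tok.public_id != "XSLT-compat")
            or tok.system_id is not None)

def make_document_type(tok):
    # Missing identifiers become the empty string on the DocumentType node.
    return DocumentType(tok.name,
                        tok.public_id if tok.public_id is not None else "",
                        tok.system_id if tok.system_id is not None else "")
```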
219           Then, if the DOCTYPE token matches one of the conditions in the
220           following list, then set the document to quirks mode:
222           + The force-quirks flag is set to on.
223           + The name is set to anything other than "HTML".
224           + The public identifier starts with: "+//Silmaril//dtd html Pro
226           + The public identifier starts with: "-//AdvaSoft Ltd//DTD HTML
228           + The public identifier starts with: "-//AS//DTD HTML 3.0
230           + The public identifier starts with: "-//IETF//DTD HTML 2.0
232           + The public identifier starts with: "-//IETF//DTD HTML 2.0
234           + The public identifier starts with: "-//IETF//DTD HTML 2.0
236           + The public identifier starts with: "-//IETF//DTD HTML 2.0
238           + The public identifier starts with: "-//IETF//DTD HTML 2.0
240           + The public identifier starts with: "-//IETF//DTD HTML 2.0//"
241           + The public identifier starts with: "-//IETF//DTD HTML 2.1E//"
242           + The public identifier starts with: "-//IETF//DTD HTML 3.0//"
243           + The public identifier starts with: "-//IETF//DTD HTML 3.2
245           + The public identifier starts with: "-//IETF//DTD HTML 3.2//"
246           + The public identifier starts with: "-//IETF//DTD HTML 3//"
247           + The public identifier starts with: "-//IETF//DTD HTML Level
249           + The public identifier starts with: "-//IETF//DTD HTML Level
251           + The public identifier starts with: "-//IETF//DTD HTML Level
253           + The public identifier starts with: "-//IETF//DTD HTML Level
255           + The public identifier starts with: "-//IETF//DTD HTML Strict
257           + The public identifier starts with: "-//IETF//DTD HTML Strict
259           + The public identifier starts with: "-//IETF//DTD HTML Strict
261           + The public identifier starts with: "-//IETF//DTD HTML Strict
263           + The public identifier starts with: "-//IETF//DTD HTML
265           + The public identifier starts with: "-//IETF//DTD HTML//"
266           + The public identifier starts with: "-//Metrius//DTD Metrius
268           + The public identifier starts with: "-//Microsoft//DTD Internet
270           + The public identifier starts with: "-//Microsoft//DTD Internet
272           + The public identifier starts with: "-//Microsoft//DTD Internet
274           + The public identifier starts with: "-//Microsoft//DTD Internet
276           + The public identifier starts with: "-//Microsoft//DTD Internet
278           + The public identifier starts with: "-//Microsoft//DTD Internet
280           + The public identifier starts with: "-//Netscape Comm.
282           + The public identifier starts with: "-//Netscape Comm.
284           + The public identifier starts with: "-//O'Reilly and
286           + The public identifier starts with: "-//O'Reilly and
288           + The public identifier starts with: "-//O'Reilly and
290           + The public identifier starts with: "-//SoftQuad Software//DTD
292           + The public identifier starts with: "-//SoftQuad//DTD HoTMetaL
294           + The public identifier starts with: "-//Spyglass//DTD HTML 2.0
296           + The public identifier starts with: "-//SQ//DTD HTML 2.0
298           + The public identifier starts with: "-//Sun Microsystems
300           + The public identifier starts with: "-//Sun Microsystems
302           + The public identifier starts with: "-//W3C//DTD HTML 3
304           + The public identifier starts with: "-//W3C//DTD HTML 3.2
306           + The public identifier starts with: "-//W3C//DTD HTML 3.2
308           + The public identifier starts with: "-//W3C//DTD HTML 3.2//"
309           + The public identifier starts with: "-//W3C//DTD HTML 3.2S
311           + The public identifier starts with: "-//W3C//DTD HTML 4.0
313           + The public identifier starts with: "-//W3C//DTD HTML 4.0
315           + The public identifier starts with: "-//W3C//DTD HTML
317           + The public identifier starts with: "-//W3C//DTD HTML
319           + The public identifier starts with: "-//W3C//DTD W3 HTML//"
320           + The public identifier starts with: "-//W3O//DTD W3 HTML 3.0//"
321           + The public identifier is set to: "-//W3O//DTD W3 HTML Strict
323           + The public identifier starts with: "-//WebTechs//DTD Mozilla
325           + The public identifier starts with: "-//WebTechs//DTD Mozilla
327           + The public identifier is set to: "-/W3C/DTD HTML 4.0
329           + The public identifier is set to: "HTML"
330           + The system identifier is set to:
332           + The system identifier is missing and the public identifier
334           + The system identifier is missing and the public identifier
337           Otherwise, if the DOCTYPE token matches one of the conditions in
338           the following list, then set the document to limited quirks
341           + The public identifier starts with: "-//W3C//DTD XHTML 1.0
343           + The public identifier starts with: "-//W3C//DTD XHTML 1.0
345           + The system identifier is not missing and the public identifier
347           + The system identifier is not missing and the public identifier
350           The name, system identifier, and public identifier strings must
351           be compared to the values given in the lists above in an ASCII
352           case-insensitive manner. A system identifier whose value is the
353           empty string is not considered missing for the purposes of the
356           Then, switch the insertion mode to "before html".
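A partial sketch of the quirks-mode decision follows. It includes only the prefix entries whose full text survives in this excerpt, plus the exact "HTML" public identifier, and it compares strings ASCII case-insensitively as required. The limited-quirks rules and the rules involving a missing system identifier are truncated here, so the sketch leaves them out. `triggers_quirks` and `ascii_lower` are hypothetical names.

```python
import string
from typing import Optional

_ASCII_LOWER = str.maketrans(string.ascii_uppercase, string.ascii_lowercase)

def ascii_lower(s: str) -> str:
    """Lowercase ASCII letters only, for ASCII case-insensitive comparison."""
    return s.translate(_ASCII_LOWER)

# Subset of the prefix list above: only entries whose text is complete in this excerpt.
QUIRKS_PUBLIC_PREFIXES = tuple(ascii_lower(p) for p in (
    "-//IETF//DTD HTML 2.0//",
    "-//IETF//DTD HTML 2.1E//",
    "-//IETF//DTD HTML 3.0//",
    "-//IETF//DTD HTML 3.2//",
    "-//IETF//DTD HTML 3//",
    "-//IETF//DTD HTML//",
    "-//W3C//DTD HTML 3.2//",
    "-//W3C//DTD W3 HTML//",
    "-//W3O//DTD W3 HTML 3.0//",
))
QUIRKS_PUBLIC_EXACT = {ascii_lower("HTML")}

def triggers_quirks(name: str, public_id: Optional[str], force_quirks: bool) -> bool:
    if force_quirks or ascii_lower(name) != "html":
        return True
    if public_id is None:
        # Rules that depend on missing identifiers are truncated in this excerpt.
        return False
    pid = ascii_lower(public_id)
    return pid.startswith(QUIRKS_PUBLIC_PREFIXES) or pid in QUIRKS_PUBLIC_EXACT
```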
361           Set the document to quirks mode.
363           Switch the insertion mode to "before html", then reprocess the
366       8.2.5.5 The "before html" insertion mode
368    When the insertion mode is "before html", tokens must be handled as
372           Parse error. Ignore the token.
375           Append a Comment node to the Document object with the data
376           attribute set to the data given in the comment token.
380           Ignore the token.
383           Create an element for the token in the HTML namespace. Append it
384           to the Document object. Put this element in the stack of open
387           If the token has an attribute "manifest", then resolve the value
389           run the application cache selection algorithm with the resulting
391           resolving it fails, run the application cache selection
392           algorithm with no manifest. The algorithm must be passed the
395           Switch the insertion mode to "before head".
398           Create an HTMLElement node with the tag name html, in the HTML
399           namespace. Append it to the Document object. Put this element in
400           the stack of open elements.
402           Run the application cache selection algorithm with no manifest,
403           passing it the Document object.
405           Switch the insertion mode to "before head", then reprocess the
409           --><html>" puts the comment before the root node (or should we?)
411    The root element can end up being removed from the Document object,
413    continues being appended to the nodes as described in the next section.
415       8.2.5.6 The "before head" insertion mode
417    When the insertion mode is "before head", tokens must be handled as
422           Ignore the token.
425           Append a Comment node to the current node with the data
426           attribute set to the data given in the comment token.
429           Parse error. Ignore the token.
432           Process the token using the rules for the "in body" insertion
436           Insert an HTML element for the token.
438           Set the head element pointer to the newly created head element.
440           Switch the insertion mode to "in head".
443           Act as if a start tag token with the tag name "head" and no
444           attributes had been seen, then reprocess the current token.
447           Parse error. Ignore the token.
450           Act as if a start tag token with the tag name "head" and no
451           attributes had been seen, then reprocess the current token.
454           the current token being reprocessed in the "after head"
457       8.2.5.7 The "in head" insertion mode
459    When the insertion mode is "in head", tokens must be handled as
464           Insert the character into the current node.
467           Append a Comment node to the current node with the data
468           attribute set to the data given in the comment token.
471           Parse error. Ignore the token.
474           Process the token using the rules for the "in body" insertion
479           Insert an HTML element for the token. Immediately pop the
480           current node off the stack of open elements.
482           Acknowledge the token's self-closing flag, if it is set.
485           Insert an HTML element for the token. Immediately pop the
486           current node off the stack of open elements.
488           Acknowledge the token's self-closing flag, if it is set.
490           If the element has a charset attribute, and its value is a
491           supported encoding, and the confidence is currently tentative,
492           then change the encoding to the encoding given by the value of
493           the charset attribute.
495           Otherwise, if the element has a content attribute, and applying
496           the algorithm for extracting an encoding from a Content-Type to
497           its value returns a supported encoding encoding, and the
498           confidence is currently tentative, then change the encoding to
499           the encoding encoding.
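The meta-element encoding rule can be sketched as shown below. Two parts are assumptions rather than the spec. First, `SUPPORTED_ENCODINGS` is an assumed set and names are normalised to lowercase. Second, `extract_encoding_from_content_type` is a simplified regular-expression stand-in for the spec's extraction algorithm, which is defined elsewhere. As in the text, the content attribute is consulted whenever the charset branch does not apply. Actually changing the encoding, for example by reparsing, is out of scope here.

```python
import re

# Assumed set of supported encodings, for illustration only.
SUPPORTED_ENCODINGS = {"utf-8", "windows-1252", "iso-8859-1", "shift_jis"}

# Simplified stand-in for "extracting an encoding from a Content-Type".
_CHARSET_RE = re.compile(r"""charset\s*=\s*["']?([^"';\s]+)""", re.IGNORECASE)

def extract_encoding_from_content_type(value):
    match = _CHARSET_RE.search(value)
    return match.group(1).lower() if match else None

def encoding_after_meta(attributes, current_encoding, confidence):
    """Return the (possibly changed) encoding after a meta element is inserted."""
    if confidence != "tentative":
        return current_encoding
    charset = attributes.get("charset")
    if charset is not None and charset.lower() in SUPPORTED_ENCODINGS:
        return charset.lower()
    # Otherwise fall back to the content attribute.
    content = attributes.get("content")
    if content is not None:
        encoding = extract_encoding_from_content_type(content)
        if encoding in SUPPORTED_ENCODINGS:
            return encoding
    return current_encoding
```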
502           Follow the generic RCDATA element parsing algorithm.
504    A start tag whose tag name is "noscript", if the scripting flag is
508           Follow the generic CDATA element parsing algorithm.
510    A start tag whose tag name is "noscript", if the scripting flag is
512           Insert an HTML element for the token.
514           Switch the insertion mode to "in head noscript".
518          1. Create an element for the token in the HTML namespace.
519          2. Mark the element as being "parser-inserted".
520             This ensures that, if the script is external, any
521             document.write() calls in the script will execute in-line,
522             instead of blowing the document away, as would happen in most
523             other cases. It also prevents the script from executing until
524             the end tag is seen.
525          3. If the parser was originally created for the HTML fragment
526             parsing algorithm, then mark the script element as "already
528          4. Append the new element to the current node.
529          5. Switch the tokeniser's content model flag to the CDATA state.
530          6. Let the original insertion mode be the current insertion mode.
531          7. Switch the insertion mode to "in CDATA/RCDATA".
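The seven script start-tag steps map directly onto code. This is a minimal sketch with a toy tree builder. The flag set in step 3 is cut off after "already" in this excerpt, so `already_flag` is a hypothetical placeholder name. `Element`, `TreeBuilder` and `start_script` are also hypothetical.

```python
class Element:
    def __init__(self, name):
        self.name = name
        self.children = []
        self.parser_inserted = False
        self.already_flag = False  # hypothetical name; the excerpt truncates it

class TreeBuilder:
    def __init__(self, current_node, fragment_case=False, insertion_mode="in head"):
        self.current_node = current_node
        self.fragment_case = fragment_case
        self.content_model_flag = "PCDATA"
        self.insertion_mode = insertion_mode
        self.original_insertion_mode = None

    def start_script(self):
        script = Element("script")                           # 1. create the element
        script.parser_inserted = True                        # 2. mark "parser-inserted"
        if self.fragment_case:                               # 3. fragment parsing case
            script.already_flag = True
        self.current_node.children.append(script)            # 4. append to current node
        self.content_model_flag = "CDATA"                    # 5. tokeniser to CDATA
        self.original_insertion_mode = self.insertion_mode   # 6. remember the mode
        self.insertion_mode = "in CDATA/RCDATA"              # 7. switch modes
        return script
```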
534           Pop the current node (which will be the head element) off the
537           Switch the insertion mode to "after head".
540           Act as described in the "anything else" entry below.
544           Parse error. Ignore the token.
547           Act as if an end tag token with the tag name "head" had been
548           seen, and reprocess the current token.
550           In certain UAs, some elements don't trigger the "in body" mode
551           straight away, but instead get put into the head. Do we want to
554       8.2.5.8 The "in head noscript" insertion mode
556    When the insertion mode is "in head noscript", tokens must be handled
560           Parse error. Ignore the token.
563           Process the token using the rules for the "in body" insertion
567           Pop the current node (which will be a noscript element) from the
568           stack of open elements; the new current node will be a head
571           Switch the insertion mode to "in head".
579           Process the token using the rules for the "in head" insertion
583           Act as described in the "anything else" entry below.
587           Parse error. Ignore the token.
590           Parse error. Act as if an end tag with the tag name "noscript"
591           had been seen and reprocess the current token.
593       8.2.5.9 The "after head" insertion mode
595    When the insertion mode is "after head", tokens must be handled as
600           Insert the character into the current node.
603           Append a Comment node to the current node with the data
604           attribute set to the data given in the comment token.
607           Parse error. Ignore the token.
610           Process the token using the rules for the "in body" insertion
614           Insert an HTML element for the token.
616           Switch the insertion mode to "in body".
619           Insert an HTML element for the token.
621           Switch the insertion mode to "in frameset".
627           Push the node pointed to by the head element pointer onto the
630           Process the token using the rules for the "in head" insertion
633           Remove the node pointed to by the head element pointer from the
637           Act as described in the "anything else" entry below.
641           Parse error. Ignore the token.
644           Act as if a start tag token with the tag name "body" and no
645           attributes had been seen, and then reprocess the current token.
647       8.2.5.10 The "in body" insertion mode
649    When the insertion mode is "in body", tokens must be handled as
653           Reconstruct the active formatting elements, if any.
655           Insert the token's character into the current node.
658           Append a Comment node to the current node with the data
659           attribute set to the data given in the comment token.
662           Parse error. Ignore the token.
665           Parse error. For each attribute on the token, check to see if
666           the attribute is already present on the top element of the stack
667           of open elements. If it is not, add the attribute and its
673           Process the token using the rules for the "in head" insertion
679           If the second element on the stack of open elements is not a
680           body element, or, if the stack of open elements has only one
681           node on it, then ignore the token. (fragment case)
683           Otherwise, for each attribute on the token, check to see if the
684           attribute is already present on the body element (the second
685           element) on the stack of open elements. If it is not, add the
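The two attribute-merging rules for html and body start tags in "in body" can be sketched together. The sketch assumes a list-based stack in which index 0 is the top (the html element) and index 1 is the second element. Existing attributes are never overwritten. `Element`, `merge_missing_attributes`, `html_start_tag_in_body` and `body_start_tag_in_body` are hypothetical names.

```python
class Element:
    def __init__(self, name, attributes=None):
        self.name = name
        self.attributes = dict(attributes or {})

def merge_missing_attributes(target, token_attributes):
    # Add only attributes not already present on the target element.
    for name, value in token_attributes.items():
        if name not in target.attributes:
            target.attributes[name] = value

def html_start_tag_in_body(open_elements, token_attributes):
    # Parse error; merge into the top element of the stack (the html element).
    merge_missing_attributes(open_elements[0], token_attributes)

def body_start_tag_in_body(open_elements, token_attributes):
    """Return False when the token is ignored (fragment case)."""
    if len(open_elements) == 1 or open_elements[1].name != "body":
        return False
    merge_missing_attributes(open_elements[1], token_attributes)
    return True
```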
689           If there is a node in the stack of open elements that is not
692           thead element, a tr element, the body element, or the html
698           If the stack of open elements does not have a body element in
699           scope, this is a parse error; ignore the token.
701           Otherwise, if there is a node in the stack of open elements that
704           element, a thead element, a tr element, the body element, or the
707           Switch the insertion mode to "after body".
711           if that token wasn't ignored, reprocess the current token.
713           The fake end tag token here can only be ignored in the fragment
720           If the stack of open elements has a p element in scope, then act
721           as if an end tag with the tag name "p" had been seen.
723           Insert an HTML element for the token.
727           If the stack of open elements has a p element in scope, then act
728           as if an end tag with the tag name "p" had been seen.
730           If the current node is an element whose tag name is one of "h1",
732           the current node off the stack of open elements.
734           Insert an HTML element for the token.
737           If the stack of open elements has a p element in scope, then act
738           as if an end tag with the tag name "p" had been seen.
740           Insert an HTML element for the token.
742           If the next token is a U+000A LINE FEED (LF) character token,
743           then ignore that token and move on to the next one. (Newlines at
744           the start of pre blocks are ignored as an authoring
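The leading-newline rule can be modelled as a one-shot flag that the next token clears. This is a minimal sketch that assumes character tokens are one-character strings. `PreNewlineFilter`, `start_pre` and `accept` are hypothetical names.

```python
class PreNewlineFilter:
    def __init__(self):
        self.skip_next_lf = False

    def start_pre(self):
        # Called right after the pre start tag has been handled.
        self.skip_next_lf = True

    def accept(self, token):
        """Return False if the token must be ignored; the flag applies to the next token only."""
        skip = self.skip_next_lf and token == "\n"
        self.skip_next_lf = False
        return not skip
```

Only the first LF is dropped, so a second newline still reaches the tree: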
748           If the form element pointer is not null, then this is a parse
749           error; ignore the token.
753           If the stack of open elements has a p element in scope, then act
754           as if an end tag with the tag name "p" had been seen.
756           Insert an HTML element for the token, and set the form element
757           pointer to point to the element created.
760           Run the following algorithm:
762          1. Initialize node to be the current node (the bottommost node of
763             the stack).
764          2. If node is an li element, then act as if an end tag with the
765             tag name "li" had been seen, then jump to the last step.
766          3. If node is not in the formatting category, and is not in the
768             then jump to the last step.
769          4. Otherwise, set node to the previous entry in the stack of open
771          5. This is the last step.
772             If the stack of open elements has a p element in scope, then
773             act as if an end tag with the tag name "p" had been seen.
774             Finally, insert an HTML element for the token.
777           Run the following algorithm:
779          1. Initialize node to be the current node (the bottommost node of
780             the stack).
782             the same tag name as node had been seen, then jump to the last
784          3. If node is not in the formatting category, and is not in the
786             then jump to the last step.
787          4. Otherwise, set node to the previous entry in the stack of open
789          5. This is the last step.
790             If the stack of open elements has a p element in scope, then
791             act as if an end tag with the tag name "p" had been seen.
792             Finally, insert an HTML element for the token.
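The li and dd/dt algorithms share one shape and differ only in which elements step 2 closes. The sketch below assumes a stack of tag names with the current node last. The full condition in step 3 is truncated in this excerpt, so the sketch passes it in as a caller-supplied `stops_search` predicate. The fake end tag, the p-in-scope test and element insertion are also callbacks. All names are hypothetical.

```python
from typing import Callable, List, Set

def list_item_start(open_elements: List[str], token_name: str, closes: Set[str],
                    stops_search: Callable[[str], bool],
                    end_tag: Callable[[str], None],
                    has_p_in_scope: Callable[[], bool],
                    insert: Callable[[str], None]) -> None:
    # 1. Start at the current node (the bottommost node of the stack).
    for node in reversed(open_elements):
        # 2. Close a matching item with a fake end tag carrying node's own tag name.
        if node in closes:
            end_tag(node)
            break
        # 3. Stop at elements outside the allowed categories (condition supplied by caller).
        if stops_search(node):
            break
        # 4. Otherwise continue with the previous entry in the stack.
    # 5. Last step: close an open p in scope, then insert the new element.
    if has_p_in_scope():
        end_tag("p")
    insert(token_name)
```

For li, `closes` is `{"li"}`. For dd and dt it is `{"dd", "dt"}`.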
795           If the stack of open elements has a p element in scope, then act
796           as if an end tag with the tag name "p" had been seen.
798           Insert an HTML element for the token.
800           Switch the content model flag to the PLAINTEXT state.
802           Once a start tag with the tag name "plaintext" has been seen,
803           that will be the last token ever seen other than character
804           tokens (and the end-of-file token), because there is no way to
805           switch the content model flag out of the PLAINTEXT state.
811           If the stack of open elements does not have an element in scope
812           with the same tag name as that of the token, then this is a
813           parse error; ignore the token.
818          2. If the current node is not an element with the same tag name
819             as that of the token, then this is a parse error.
820          3. Pop elements from the stack of open elements until an element
821             with the same tag name as the token has been popped from the
825           Let node be the element that the form element pointer is set to.
827           Set the form element pointer to null.
829           If node is null or the stack of open elements does not have node
830           in scope, then this is a parse error; ignore the token.
835          2. If the current node is not node, then this is a parse error.
836          3. Remove node from the stack of open elements.
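The form end-tag steps can be sketched as follows. The "in scope" test is defined elsewhere, so it is simplified here to stop at html or table, which is an assumption. Step 1 is cut off in this excerpt and is modelled as an optional caller-supplied hook. Note that step 3 removes node from the stack instead of popping up to it, so elements opened inside the form stay open. `Element`, `TreeBuilder` and `form_end_tag` are hypothetical names.

```python
class Element:
    def __init__(self, name):
        self.name = name

class TreeBuilder:
    def __init__(self, open_elements, form_pointer):
        self.open_elements = open_elements
        self.form_pointer = form_pointer
        self.errors = []

    def in_scope(self, node):
        # Simplified scope test (assumption): stop at html or table boundaries.
        for element in reversed(self.open_elements):
            if element is node:
                return True
            if element.name in {"html", "table"}:
                return False
        return False

    def form_end_tag(self, step1=lambda: None):
        node = self.form_pointer
        self.form_pointer = None
        if node is None or not self.in_scope(node):
            self.errors.append("form end tag ignored")
            return
        step1()  # step 1 is truncated in this excerpt
        if self.open_elements[-1] is not node:  # 2.
            self.errors.append("current node is not the form element")
        self.open_elements.remove(node)  # 3. remove, not pop
```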
839           If the stack of open elements does not have an element in scope
840           with the same tag name as that of the token, then this is a
841           parse error; act as if a start tag with the tag name p had been
842           seen, then reprocess the current token.
846          1. Generate implied end tags, except for elements with the same
847             tag name as the token.
848          2. If the current node is not an element with the same tag name
849             as that of the token, then this is a parse error.
850          3. Pop elements from the stack of open elements until an element
851             with the same tag name as the token has been popped from the
855           If the stack of open elements does not have an element in scope
856           with the same tag name as that of the token, then this is a
857           parse error; ignore the token.
861          1. Generate implied end tags, except for elements with the same
862             tag name as the token.
863          2. If the current node is not an element with the same tag name
864             as that of the token, then this is a parse error.
865          3. Pop elements from the stack of open elements until an element
866             with the same tag name as the token has been popped from the
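The end-tag pattern used for p and for dd, dt and li follows one shape: a scope check, implied end tags that skip the token's own name, a current-node check, then popping until the element is gone. The sketch below assumes a stack of tag names. `SCOPE_BOUNDARIES` is an assumed simplification of "in scope", which is defined elsewhere. `IMPLIED` holds only the implied-end-tag names visible in this excerpt. `close_element` and `has_in_scope` are hypothetical names, and the p-specific fallback of a fake p start tag is not modelled.

```python
IMPLIED = frozenset({"dd", "dt", "li", "rt"})  # visible subset only
# Assumed scope boundaries; the real definition is elsewhere in the spec.
SCOPE_BOUNDARIES = frozenset({"html", "table", "td", "th", "caption", "button", "object"})

def has_in_scope(stack, name):
    for node in reversed(stack):
        if node == name:
            return True
        if node in SCOPE_BOUNDARIES:
            return False
    return False

def close_element(stack, name):
    """Handle an end tag such as </li>; return the parse errors raised (empty if none)."""
    if not has_in_scope(stack, name):
        return [f"no {name} element in scope; token ignored"]
    errors = []
    # 1. Generate implied end tags, except for elements with the token's tag name.
    while stack[-1] in IMPLIED and stack[-1] != name:
        stack.pop()
    # 2. The current node should now be the element being closed.
    if stack[-1] != name:
        errors.append(f"current node is not {name}")
    # 3. Pop until an element with this tag name has been popped.
    while stack.pop() != name:
        pass
    return errors
```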
870           If the stack of open elements does not have an element in scope
872           then this is a parse error; ignore the token.
877          2. If the current node is not an element with the same tag name
878             as that of the token, then this is a parse error.
879          3. Pop elements from the stack of open elements until an element
881             has been popped from the stack.
884           Take a deep breath, then act as described in the "any other end
888           If the list of active formatting elements contains an element
889           whose tag name is "a" between the end of the list and the last
890           marker on the list (or the start of the list if there is no
891           marker on the list), then this is a parse error; act as if an
892           end tag with the tag name "a" had been seen, then remove that
893           element from the list of active formatting elements and the
894           stack of open elements if the end tag didn't already remove it
895           (it might not have if the element is not in table scope).
897           In the non-conforming stream
898           <a href="a">a<table><a href="b">b</table>x, the first a element
899           would be closed upon seeing the second one, and the "x"
901           despite the fact that the outer a element is not in table scope
902           (meaning that a regular </a> end tag at the start of the table
903           wouldn't close the outer a element).
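The "a" start-tag check searches the list of active formatting elements backwards from the end and stops at the last marker. If the fake end tag left an earlier "a" in place, a cleanup step removes it from both structures. This minimal sketch does not model the fake end tag itself. `MARKER`, `Element`, `find_a_after_last_marker` and `remove_if_still_present` are hypothetical names.

```python
MARKER = object()  # a scope marker in the list of active formatting elements

class Element:
    def __init__(self, name):
        self.name = name

def find_a_after_last_marker(active_formatting):
    """Return an 'a' between the end of the list and the last marker, else None."""
    for entry in reversed(active_formatting):
        if entry is MARKER:
            return None
        if entry.name == "a":
            return entry
    return None

def remove_if_still_present(element, active_formatting, open_elements):
    # The fake </a> may not have removed it, e.g. when it is not in table scope.
    if element in active_formatting:
        active_formatting.remove(element)
    if element in open_elements:
        open_elements.remove(element)
```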
905           Reconstruct the active formatting elements, if any.
907           Insert an HTML element for the token. Add that element to the
912           Reconstruct the active formatting elements, if any.
914           Insert an HTML element for the token. Add that element to the
918           Reconstruct the active formatting elements, if any.
920           If the stack of open elements has a nobr element in scope, then
921           this is a parse error; act as if an end tag with the tag name
922           "nobr" had been seen, then once again reconstruct the active
925           Insert an HTML element for the token. Add that element to the
932          1. Let the formatting element be the last element in the list of
934                o is between the end of the list and the last scope marker
935                  in the list, if any, or the start of the list otherwise,
937                o has the same tag name as the token.
938             If there is no such node, or, if that node is also in the
939             stack of open elements but the element is not in scope, then
940             this is a parse error; ignore the token, and abort these
943             the stack of open elements, then this is a parse error; remove
944             the element from the list, and abort these steps.
946             in the stack and is in scope. If the element is not the
948             the algorithm as written in the following steps.
949          2. Let the furthest block be the topmost node in the stack of
950             open elements that is lower in the stack than the formatting
951             element, and is not an element in the phrasing or formatting
953          3. If there is no furthest block, then the UA must skip the
954             subsequent steps and instead just pop all the nodes from the
955             bottom of the stack of open elements, from the current node up
956             to and including the formatting element, and remove the
957             formatting element from the list of active formatting
959          4. Let the common ancestor be the element immediately above the
960             formatting element in the stack of open elements.
961          5. If the furthest block has a parent node, then remove the
963          6. Let a bookmark note the position of the formatting element in
964             the list of active formatting elements relative to the
965             elements on either side of it in the list.
966          7. Let node and last node be the furthest block. Follow these
968               1. Let node be the element immediately above node in the
970               2. If node is not in the list of active formatting elements,
971                  then remove node from the stack of open elements and then
973               3. Otherwise, if node is the formatting element, then go to
974                  the next step in the overall algorithm.
975               4. Otherwise, if last node is the furthest block, then move
976                  the aforementioned bookmark to be immediately after the
977                  node in the list of active formatting elements.
979                  node, replace the entry for node in the list of active
980                  formatting elements with an entry for the clone, replace
981                  the entry for node in the stack of open elements with an
982                  entry for the clone, and let node be the clone.
987          8. If the common ancestor node is a table, tbody, tfoot, thead,
989             being in the previous step.
990             Otherwise, append whatever last node ended up being in the
991             previous step to the common ancestor node, first removing it
993          9. Perform a shallow clone of the formatting element.
994         10. Take all of the child nodes of the furthest block and append
995             them to the clone created in the last step.
996         11. Append that clone to the furthest block.
997         12. Remove the formatting element from the list of active
998             formatting elements, and insert the clone into the list of
999             active formatting elements at the position of the
1001         13. Remove the formatting element from the stack of open elements,
1002             and insert the clone into the stack of open elements
1003             immediately below the position of the furthest block in that
1007           The way these steps are defined, only elements in the formatting
1010           Because of the way this algorithm causes elements to change
1011           parents, it has been dubbed the "adoption agency algorithm" (in
1013           misnested content, which included the "incest algorithm", the
1014           "secret affair algorithm", and the "Heisenberg algorithm").
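Steps 1 to 3 of the adoption agency algorithm can be sketched as follows. This sketch covers only those steps and omits step 4 onward. `Element`, `adoption_steps_1_to_3` and the returned status strings are hypothetical, and `None` stands for a marker. `in_scope` and `is_special` are caller-supplied predicates, because scope and the element categories are defined outside this excerpt. `is_special` should return True for elements that are in neither the phrasing nor the formatting category. The stack follows the text's convention: stack[0] is the html element at the top, and stack[-1] is the current node at the bottom.

```python
from dataclasses import dataclass

MARKER = None  # hypothetical: a None entry stands for a scope marker


@dataclass(eq=False)  # identity comparison, like DOM nodes
class Element:
    name: str


def adoption_steps_1_to_3(tag, stack, active_formatting, in_scope, is_special):
    """Run steps 1-3 of the adoption agency algorithm for an end tag `tag`.

    Returns ("ignored" | "removed" | "popped", None) when the steps abort,
    or ("continue", (formatting_element, furthest_block)) for step 4 onward.
    """
    # Step 1: last entry with this tag name after the last marker.
    formatting = None
    for entry in reversed(active_formatting):
        if entry is MARKER:
            break
        if entry.name == tag:
            formatting = entry
            break
    if formatting is None:
        return "ignored", None                 # parse error; ignore the token
    if formatting not in stack:
        active_formatting.remove(formatting)   # parse error; drop the entry
        return "removed", None
    if not in_scope(stack, formatting):
        return "ignored", None                 # parse error; ignore the token

    # Step 2: topmost node lower in the stack than the formatting element
    # that is not in the phrasing or formatting category.
    index = stack.index(formatting)
    furthest_block = next((n for n in stack[index + 1:] if is_special(n)), None)

    # Step 3: no furthest block, so pop up to and including the formatting
    # element and remove it from the list of active formatting elements.
    if furthest_block is None:
        del stack[index:]
        active_formatting.remove(formatting)
        return "popped", None
    return "continue", (formatting, furthest_block)
```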
1017           If the stack of open elements has a button element in scope,
1018           then this is a parse error; act as if an end tag with the tag
1019           name "button" had been seen, then reprocess the token.
1023           Reconstruct the active formatting elements, if any.
1025           Insert an HTML element for the token.
1027           Insert a marker at the end of the list of active formatting
1032           Reconstruct the active formatting elements, if any.
1034           Insert an HTML element for the token.
1036           Insert a marker at the end of the list of active formatting
1041           If the stack of open elements does not have an element in scope
1042           with the same tag name as that of the token, then this is a
1043           parse error; ignore the token.
1048          2. If the current node is not an element with the same tag name
1049             as that of the token, then this is a parse error.
1050          3. Pop elements from the stack of open elements until an element
1051             with the same tag name as the token has been popped from the
1053          4. Clear the list of active formatting elements up to the last
1057           Reconstruct the active formatting elements, if any.
1059           Follow the generic CDATA element parsing algorithm.
1062           If the stack of open elements has a p element in scope, then act
1063           as if an end tag with the tag name "p" had been seen.
1065           Insert an HTML element for the token.
1067           Switch the insertion mode to "in table".
1071           Reconstruct the active formatting elements, if any.
1073           Insert an HTML element for the token. Immediately pop the
1074           current node off the stack of open elements.
1076           Acknowledge the token's self-closing flag, if it is set.
1079           Insert an HTML element for the token. Immediately pop the
1080           current node off the stack of open elements.
1082           Acknowledge the token's self-closing flag, if it is set.
1085           If the stack of open elements has a p element in scope, then act
1086           as if an end tag with the tag name "p" had been seen.
1088           Insert an HTML element for the token. Immediately pop the
1089           current node off the stack of open elements.
1091           Acknowledge the token's self-closing flag, if it is set.
1094           Parse error. Change the token's tag name to "img" and reprocess
1100           If the form element pointer is not null, then ignore the token.
1104           Acknowledge the token's self-closing flag, if it is set.
1106           Act as if a start tag token with the tag name "form" had been
1109           If the token has an attribute called "action", set the action
1110           attribute on the resulting form element to the value of the
1111           "action" attribute of the token.
1113           Act as if a start tag token with the tag name "hr" had been
1116           Act as if a start tag token with the tag name "p" had been seen.
1118           Act as if a start tag token with the tag name "label" had been
1124           Act as if a start tag token with the tag name "input" had been
1125           seen, with all the attributes from the "isindex" token except
1126           "name", "action", and "prompt". Set the name attribute of the
1127           resulting input element to the value "isindex".
1132           Act as if an end tag token with the tag name "label" had been
1135           Act as if an end tag token with the tag name "p" had been seen.
1137           Act as if a start tag token with the tag name "hr" had been
1140           Act as if an end tag token with the tag name "form" had been
1143           If the token has an attribute with the name "prompt", then the
1144           first stream of characters must be the same string as given in
1145           that attribute, and the second stream of characters must be
1146           empty. Otherwise, the two streams of character tokens
1147           should, together with the input element, express the equivalent
1149           here: (input field)" in the user's preferred language.
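The isindex rewriting above amounts to a fixed sequence of fake tokens. This is a minimal sketch. It assumes tokens are hypothetical tuples and that the UA uses the English prompt wording when no "prompt" attribute is given. `expand_isindex` and `DEFAULT_PROMPT` are illustrative names. Some lines of the rule are not shown in this listing, so the sketch assumes that the first stream precedes the input element and the second follows it, both inside the label.

```python
# English wording for the no-"prompt" case; a real UA would localize it.
DEFAULT_PROMPT = "This is a searchable index. Insert your search keywords here: "


def expand_isindex(attrs, form_element_pointer=None):
    """Return the fake tokens an isindex start tag stands for.

    Tokens are hypothetical tuples: ("start", name, attrs), ("end", name)
    or ("chars", text)."""
    if form_element_pointer is not None:
        return []  # the token is ignored
    first = attrs.get("prompt", DEFAULT_PROMPT)
    second = ""  # the English wording places all text before the field
    form_attrs = {"action": attrs["action"]} if "action" in attrs else {}
    input_attrs = {k: v for k, v in attrs.items()
                   if k not in ("name", "action", "prompt")}
    input_attrs["name"] = "isindex"
    tokens = [
        ("start", "form", form_attrs),
        ("start", "hr", {}),
        ("start", "p", {}),
        ("start", "label", {}),
        ("chars", first),
        ("start", "input", input_attrs),
        ("chars", second),
        ("end", "label"),
        ("end", "p"),
        ("start", "hr", {}),
        ("end", "form"),
    ]
    # An empty stream of characters contributes no tokens.
    return [t for t in tokens if t != ("chars", "")]
```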
1153          1. Insert an HTML element for the token.
1154          2. If the next token is a U+000A LINE FEED (LF) character token,
1155             then ignore that token and move on to the next one. (Newlines
1156             at the start of textarea elements are ignored as an authoring
1158          3. Switch the tokeniser's content model flag to the RCDATA state.
1159          4. Let the original insertion mode be the current insertion mode.
1160          5. Switch the insertion mode to "in CDATA/RCDATA".
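Step 2 of the textarea rule can be sketched as a filter over the tokens that follow the start tag. This sketch assumes character tokens are modelled as hypothetical ("chars", ch) tuples. `skip_leading_newline` is an illustrative name. The content model switch and the insertion mode bookkeeping in steps 3 to 5 are not modelled.

```python
def skip_leading_newline(tokens):
    """Yield the tokens that follow a textarea start tag, dropping the first
    one only if it is a U+000A LINE FEED character token."""
    it = iter(tokens)
    first = next(it, None)
    if first is not None and first != ("chars", "\n"):
        yield first
    yield from it  # everything after the first token passes through unchanged
```

Only a single leading newline is dropped, so a second line feed still reaches the element.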
1163    A start tag whose tag name is "noscript", if the scripting flag is
1165           Follow the generic CDATA element parsing algorithm.
1168           Reconstruct the active formatting elements, if any.
1170           Insert an HTML element for the token.
1172           If the insertion mode is one of "in table", "in caption", "in
1174           switch the insertion mode to "in select in table". Otherwise,
1175           switch the insertion mode to "in select".
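The choice between the two select modes is a simple lookup. Part of the mode list is cut off in this listing, so the set below is an assumption based on the current WHATWG wording. `TABLE_MODES` and `insertion_mode_after_select` are illustrative names.

```python
# Assumed set of table-related modes; the excerpt shows only the first two.
TABLE_MODES = {"in table", "in caption", "in table body", "in row", "in cell"}


def insertion_mode_after_select(current_mode):
    """Pick the insertion mode to switch to after inserting a select element."""
    if current_mode in TABLE_MODES:
        return "in select in table"
    return "in select"
```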
1178           If the stack of open elements has an option element in scope,
1179           then act as if an end tag with the tag name "option" had been
1182           Reconstruct the active formatting elements, if any.
1184           Insert an HTML element for the token.
1187           If the stack of open elements has a ruby element in scope, then
1188           generate implied end tags. If the current node is then not a
1189           ruby element, this is a parse error; pop all the nodes from the
1190           current node up to the node immediately before the bottommost
1191           ruby element on the stack of open elements.
1193           Insert an HTML element for the token.
1196           Parse error. Act as if a start tag token with the tag name "br"
1197           had been seen. Ignore the end tag token.
1200           Reconstruct the active formatting elements, if any.
1202           Adjust MathML attributes for the token. (This fixes the case of
1205           Adjust foreign attributes for the token. (This fixes the use of
1208           Insert a foreign element for the token, in the MathML namespace.
1210           If the token has its self-closing flag set, pop the current node
1211           off the stack of open elements and acknowledge the token's
1214           Otherwise, let the secondary insertion mode be the current
1215           insertion mode, and then switch the insertion mode to "in
1221           Parse error. Ignore the token.
1224           Reconstruct the active formatting elements, if any.
1226           Insert an HTML element for the token.
1231           Run the following steps:
1233          1. Initialize node to be the current node (the bottommost node of
1234             the stack).
1235          2. If node has the same tag name as the end tag token, then:
1237               2. If the tag name of the end tag token does not match the
1238                  tag name of the current node, this is a parse error.
1239               3. Pop all the nodes from the current node up to node,
1241          3. Otherwise, if node is in neither the formatting category nor
1242             the phrasing category, then this is a parse error; ignore the
1244          4. Set node to the previous entry in the stack of open elements.
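The loop above can be sketched as follows. This is a minimal sketch. It assumes elements are identity-compared `Element` objects and that stack[-1] is the current node. `is_phrasing_or_formatting` and `generate_implied_end_tags` are hypothetical caller-supplied callables, because the categories and the implied-end-tag step are defined outside this excerpt. The hook is assumed to leave elements with the token's tag name on the stack.

```python
from dataclasses import dataclass


@dataclass(eq=False)  # compare by identity, like DOM nodes
class Element:
    name: str


def any_other_end_tag(tag, stack, is_phrasing_or_formatting,
                      generate_implied_end_tags, errors):
    """Handle an end tag not covered by a more specific rule.

    Returns True if elements were popped, False if the token was ignored.
    Parse errors are appended to `errors`."""
    # Steps 1, 4 and 5: walk from the current node towards the top.
    for node in reversed(stack):
        if node.name == tag:
            # Step 2: close everything from the current node up to `node`.
            generate_implied_end_tags(stack, tag)
            if stack[-1].name != tag:
                errors.append(f"</{tag}> does not match the current node")
            del stack[stack.index(node):]
            return True
        if not is_phrasing_or_formatting(node):
            # Step 3: a special or scoping element stops the search.
            errors.append(f"stray </{tag}>")
            return False
    return False
```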
1247       8.2.5.11 The "in CDATA/RCDATA" insertion mode
1249    When the insertion mode is "in CDATA/RCDATA", tokens must be handled as
1253           Insert the token's character into the current node.
1258           If the current node is a script element, mark the script element
1261           Pop the current node off the stack of open elements.
1263           Switch the insertion mode to the original insertion mode and
1264           reprocess the current token.
1267           Let script be the current node (which will be a script element).
1269           Pop the current node off the stack of open elements.
1271           Switch the insertion mode to the original insertion mode.
1273           Let the old insertion point have the same value as the current
1274           insertion point. Let the insertion point be just before the next
1277           Increment the parser's script nesting level by one.
1279           Run the script. This might cause some script to execute, which
1280           might cause new characters to be inserted into the tokeniser,
1281           and might cause the tokeniser to output more tokens, resulting
1282           in a reentrant invocation of the parser.
1284           Decrement the parser's script nesting level by one. If the
1285           parser's script nesting level is zero, then set the parser pause
1288           Let the insertion point have the value of the old insertion
1289           point. (In other words, restore the insertion point to the value
1290           it had before the previous paragraph. This value might be the
1295         If the tree construction stage is being called reentrantly, say
1297                 Set the parser pause flag to true, and abort the
1298                 processing of any nested invocations of the tokeniser,
1299                 yielding control back to the caller. (Tokenization will
1300                 resume when the caller returns to the "outer" tree
1306               1. Let the script be the pending external script. There is
1308               2. Pause until the script has completed loading.
1309               3. Let the insertion point be just before the next input
1311               4. Execute the script.
1312               5. Let the insertion point be undefined again.
1317           Pop the current node off the stack of open elements.
1319           Switch the insertion mode to the original insertion mode.
1321       8.2.5.12 The "in table" insertion mode
1323    When the insertion mode is "in table", tokens must be handled as
1328           If the current table is tainted, then act as described in the
1331           Otherwise, insert the character into the current node.
1334           Append a Comment node to the current node with the data
1335           attribute set to the data given in the comment token.
1338           Parse error. Ignore the token.
1341           Clear the stack back to a table context. (See below.)
1343           Insert a marker at the end of the list of active formatting
1346           Insert an HTML element for the token, then switch the insertion
1350           Clear the stack back to a table context. (See below.)
1352           Insert an HTML element for the token, then switch the insertion
1356           Act as if a start tag token with the tag name "colgroup" had
1357           been seen, then reprocess the current token.
1360           Clear the stack back to a table context. (See below.)
1362           Insert an HTML element for the token, then switch the insertion
1366           Act as if a start tag token with the tag name "tbody" had been
1367           seen, then reprocess the current token.
1370           Parse error. Act as if an end tag token with the tag name
1372           reprocess the current token.
1374           The fake end tag token here can only be ignored in the fragment
1378           If the stack of open elements does not have an element in table
1379           scope with the same tag name as the token, this is a parse
1380           error. Ignore the token. (fragment case)
1385           popped from the stack.
1387           Reset the insertion mode appropriately.
1391           Parse error. Ignore the token.
1394           If the current table is tainted, then act as described in the
1397           Otherwise, process the token using the rules for the "in head"
1401           If the token does not have an attribute with the name "type", or
1403           case-insensitive match for the string "hidden", or, if the
1404           current table is tainted, then: act as described in the
1411           Insert an HTML element for the token.
1413           Pop that input element off the stack of open elements.
1416           If the current node is not the root html element, then this is a
1419           It can only be the current node in the fragment case.
1424           Parse error. Process the token using the rules for the "in body"
1425           insertion mode, except that if the current node is a table,
1427           be inserted into the current node, it must instead be foster
1430    When the steps above require the UA to clear the stack back to a table
1431    context, it means that the UA must, while the current node is not a
1432    table element or an html element, pop elements from the stack of open
1435    The current node being an html element after this process is a fragment
1438       8.2.5.13 The "in caption" insertion mode
1440    When the insertion mode is "in caption", tokens must be handled as
1444           If the stack of open elements does not have an element in table
1445           scope with the same tag name as the token, this is a parse
1446           error. Ignore the token. (fragment case)
1452           Now, if the current node is not a caption element, then this is
1456           popped from the stack.
1458           Clear the list of active formatting elements up to the last
1461           Switch the insertion mode to "in table".
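The caption end tag handling can be sketched as below. The same pattern, popping until a named element has been popped and then clearing the list to the last marker, recurs in the td/th and applet/button/marquee/object end tag rules. The sketch makes several assumptions. Elements are reduced to tag-name strings, and `None` marks a marker. Table scope is taken to be bounded by html and table elements. The step that generates implied end tags, which is not shown in this listing, is skipped. All function names are illustrative.

```python
MARKER = None  # hypothetical: None marks a marker in the formatting list


def has_element_in_table_scope(stack, tag):
    """Assumes table scope is bounded by html and table elements."""
    for name in reversed(stack):
        if name == tag:
            return True
        if name in ("html", "table"):
            return False
    return False


def end_tag_caption(stack, active_formatting, errors):
    """Return the new insertion mode, or None if the token is ignored."""
    if not has_element_in_table_scope(stack, "caption"):
        errors.append("no caption in table scope")  # fragment case
        return None
    # Generating implied end tags is left out of this sketch.
    if stack[-1] != "caption":
        errors.append("current node is not caption")
    while stack.pop() != "caption":
        pass  # pop until a caption element has been popped
    # Clear the list of active formatting elements up to the last marker.
    while active_formatting and active_formatting.pop() is not MARKER:
        pass
    return "in table"
```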
1467           Parse error. Act as if an end tag with the tag name "caption"
1468           had been seen, then, if that token wasn't ignored, reprocess the
1471           The fake end tag token here can only be ignored in the fragment
1476           Parse error. Ignore the token.
1479           Process the token using the rules for the "in body" insertion
1482       8.2.5.14 The "in column group" insertion mode
1484    When the insertion mode is "in column group", tokens must be handled as
1489           Insert the character into the current node.
1492           Append a Comment node to the current node with the data
1493           attribute set to the data given in the comment token.
1496           Parse error. Ignore the token.
1499           Process the token using the rules for the "in body" insertion
1503           Insert an HTML element for the token. Immediately pop the
1504           current node off the stack of open elements.
1506           Acknowledge the token's self-closing flag, if it is set.
1509           If the current node is the root html element, then this is a
1510           parse error; ignore the token. (fragment case)
1512           Otherwise, pop the current node (which will be a colgroup
1513           element) from the stack of open elements. Switch the insertion
1517           Parse error. Ignore the token.
1520           If the current node is the root html element, then stop parsing.
1523           Otherwise, act as described in the "anything else" entry below.
1526           Act as if an end tag with the tag name "colgroup" had been seen,
1527           and then, if that token wasn't ignored, reprocess the current
1530           The fake end tag token here can only be ignored in the fragment
1533       8.2.5.15 The "in table body" insertion mode
1535    When the insertion mode is "in table body", tokens must be handled as
1539           Clear the stack back to a table body context. (See below.)
1541           Insert an HTML element for the token, then switch the insertion
1545           Parse error. Act as if a start tag with the tag name "tr" had
1546           been seen, then reprocess the current token.
1549           If the stack of open elements does not have an element in table
1550           scope with the same tag name as the token, this is a parse
1551           error. Ignore the token.
1555           Clear the stack back to a table body context. (See below.)
1557           Pop the current node from the stack of open elements. Switch the
1564           If the stack of open elements does not have a tbody, thead, or
1565           tfoot element in table scope, this is a parse error. Ignore the
1570           Clear the stack back to a table body context. (See below.)
1572           Act as if an end tag with the same tag name as the current node
1573           ("tbody", "tfoot", or "thead") had been seen, then reprocess the
1578           Parse error. Ignore the token.
1581           Process the token using the rules for the "in table" insertion
1584    When the steps above require the UA to clear the stack back to a table
1585    body context, it means that the UA must, while the current node is not
1586    a tbody, tfoot, thead, or html element, pop elements from the stack of
1589    The current node being an html element after this process is a fragment
1592       8.2.5.16 The "in row" insertion mode
1594    When the insertion mode is "in row", tokens must be handled as follows:
1597           Clear the stack back to a table row context. (See below.)
1599           Insert an HTML element for the token, then switch the insertion
1602           Insert a marker at the end of the list of active formatting
1606           If the stack of open elements does not have an element in table
1607           scope with the same tag name as the token, this is a parse
1608           error. Ignore the token. (fragment case)
1612           Clear the stack back to a table row context. (See below.)
1614           Pop the current node (which will be a tr element) from the stack
1615           of open elements. Switch the insertion mode to "in table body".
1621           Act as if an end tag with the tag name "tr" had been seen, then,
1622           if that token wasn't ignored, reprocess the current token.
1624           The fake end tag token here can only be ignored in the fragment
1628           If the stack of open elements does not have an element in table
1629           scope with the same tag name as the token, this is a parse
1630           error. Ignore the token.
1632           Otherwise, act as if an end tag with the tag name "tr" had been
1633           seen, then reprocess the current token.
1637           Parse error. Ignore the token.
1640           Process the token using the rules for the "in table" insertion
1643    When the steps above require the UA to clear the stack back to a table
1644    row context, it means that the UA must, while the current node is not a
1645    tr element or an html element, pop elements from the stack of open
1648    The current node being an html element after this process is a fragment
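The three "clear the stack back to a ... context" definitions differ only in which elements stop the popping, so a single table-driven sketch covers all three. Tag-name strings stand in for elements. `STOP_ELEMENTS` and `clear_stack_back_to` are hypothetical names. In the fragment case, the html entry is what makes the loop stop.

```python
# Elements that stop each "clear the stack back to a ... context" operation.
STOP_ELEMENTS = {
    "table": {"table", "html"},
    "table body": {"tbody", "tfoot", "thead", "html"},
    "table row": {"tr", "html"},
}


def clear_stack_back_to(context, stack):
    """Pop elements while the current node (stack[-1]) is not a stop element
    for `context`; return the names that were popped."""
    stop = STOP_ELEMENTS[context]
    popped = []
    while stack[-1] not in stop:
        popped.append(stack.pop())
    return popped
```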
1651       8.2.5.17 The "in cell" insertion mode
1653    When the insertion mode is "in cell", tokens must be handled as
1657           If the stack of open elements does not have an element in table
1658           scope with the same tag name as that of the token, then this is
1659           a parse error and the token must be ignored.
1665           Now, if the current node is not an element with the same tag
1666           name as the token, then this is a parse error.
1668           Pop elements from this stack until an element with the same tag
1669           name as the token has been popped from the stack.
1671           Clear the list of active formatting elements up to the last
1674           Switch the insertion mode to "in row". (The current node will be
1679           If the stack of open elements does not have a td or th element
1680           in table scope, then this is a parse error; ignore the token.
1683           Otherwise, close the cell (see below) and reprocess the current
1688           Parse error. Ignore the token.
1692           If the stack of open elements does not have an element in table
1693           scope with the same tag name as that of the token (which can
1694           only happen for "tbody", "tfoot" and "thead", or, in the
1695           fragment case), then this is a parse error and the token must be
1698           Otherwise, close the cell (see below) and reprocess the current
1702           Process the token using the rules for the "in body" insertion
1705    Where the steps above say to close the cell, they mean to run the
1707     1. If the stack of open elements has a td element in table scope, then
1708        act as if an end tag token with the tag name "td" had been seen.
1709     2. Otherwise, the stack of open elements will have a th element in
1710        table scope; act as if an end tag token with the tag name "th" had
1713    The stack of open elements cannot have both a td and a th element in
1714    table scope at the same time, nor can it have neither when the
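Closing the cell can be sketched by choosing td or th and then running the "in cell" end tag steps for that name. The sketch assumes tag-name strings for elements, `None` as a marker, and table scope bounded by html and table. It omits generating implied end tags and the parse error check on the current node. Its function names are illustrative.

```python
MARKER = None  # hypothetical: None marks a marker in the formatting list


def has_element_in_table_scope(stack, tag):
    # Assumption: table scope is bounded by html and table elements.
    for name in reversed(stack):
        if name == tag:
            return True
        if name in ("html", "table"):
            return False
    return False


def close_the_cell(stack, active_formatting):
    """Act as if a td or th end tag had been seen; return the new insertion mode."""
    # Exactly one of td/th is in table scope whenever this is called.
    tag = "td" if has_element_in_table_scope(stack, "td") else "th"
    while stack.pop() != tag:
        pass  # pop until the cell element has been popped
    while active_formatting and active_formatting.pop() is not MARKER:
        pass  # clear the list up to the last marker
    return "in row"
```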
1717       8.2.5.18 The "in select" insertion mode
1719    When the insertion mode is "in select", tokens must be handled as
1723           Insert the token's character into the current node.
1726           Append a Comment node to the current node with the data
1727           attribute set to the data given in the comment token.
1730           Parse error. Ignore the token.
1733           Process the token using the rules for the "in body" insertion
1737           If the current node is an option element, act as if an end tag
1738           with the tag name "option" had been seen.
1740           Insert an HTML element for the token.
1743           If the current node is an option element, act as if an end tag
1744           with the tag name "option" had been seen.
1746           If the current node is an optgroup element, act as if an end tag
1747           with the tag name "optgroup" had been seen.
1749           Insert an HTML element for the token.
1752           First, if the current node is an option element, and the node
1753           immediately before it in the stack of open elements is an
1754           optgroup element, then act as if an end tag with the tag name
1757           If the current node is an optgroup element, then pop that node
1758           from the stack of open elements. Otherwise, this is a parse
1759           error; ignore the token.
1762           If the current node is an option element, then pop that node
1763           from the stack of open elements. Otherwise, this is a parse
1764           error; ignore the token.
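The option and optgroup rules of the "in select" mode can be sketched as two mutually recursive helpers, where "act as if an end tag had been seen" becomes a direct call. Tag-name strings stand in for elements. `select_start_tag`, `select_end_tag` and the error strings are hypothetical. Other tokens are outside the scope of this sketch.

```python
def select_end_tag(tag, stack, errors):
    """Handle an option or optgroup end tag in the "in select" mode.
    stack[-1] is the current node."""
    if (tag == "optgroup" and stack[-1] == "option"
            and len(stack) > 1 and stack[-2] == "optgroup"):
        select_end_tag("option", stack, errors)  # act as if </option> was seen
    if stack[-1] == tag:
        stack.pop()
    else:
        errors.append(f"stray </{tag}>")  # parse error; ignore the token


def select_start_tag(tag, stack, errors):
    """Handle an option or optgroup start tag in the "in select" mode."""
    if stack[-1] == "option":
        select_end_tag("option", stack, errors)
    if tag == "optgroup" and stack[-1] == "optgroup":
        select_end_tag("optgroup", stack, errors)
    stack.append(tag)  # insert an HTML element for the token
```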
1767           If the stack of open elements does not have an element in table
1768           scope with the same tag name as the token, this is a parse
1769           error. Ignore the token. (fragment case)
1773           Pop elements from the stack of open elements until a select
1774           element has been popped from the stack.
1776           Reset the insertion mode appropriately.
1779           Parse error. Act as if the token had been an end tag with the
1783           Parse error. Act as if an end tag with the tag name "select" had
1784           been seen, and reprocess the token.
1787           Process the token using the rules for the "in head" insertion
1791           If the current node is not the root html element, then this is a
1794           It can only be the current node in the fragment case.
1799           Parse error. Ignore the token.
1801       8.2.5.19 The "in select in table" insertion mode
1803    When the insertion mode is "in select in table", tokens must be handled
1808           Parse error. Act as if an end tag with the tag name "select" had
1809           been seen, and reprocess the token.
1815           If the stack of open elements has an element in table scope with
1816           the same tag name as that of the token, then act as if an end
1817           tag with the tag name "select" had been seen, and reprocess the
1818           token. Otherwise, ignore the token.
1821           Process the token using the rules for the "in select" insertion
1824       8.2.5.20 The "in foreign content" insertion mode
1826    When the insertion mode is "in foreign content", tokens must be handled
1830           Insert the token's character into the current node.
1833           Append a Comment node to the current node with the data
1834           attribute set to the data given in the comment token.
1837           Parse error. Ignore the token.
1839    A start tag whose tag name is neither "mglyph" nor "malignmark", if the
1840           current node is an mi element in the MathML namespace.
1842    A start tag whose tag name is neither "mglyph" nor "malignmark", if the
1843           current node is an mo element in the MathML namespace.
1845    A start tag whose tag name is neither "mglyph" nor "malignmark", if the
1846           current node is an mn element in the MathML namespace.
1848    A start tag whose tag name is neither "mglyph" nor "malignmark", if the
1849           current node is an ms element in the MathML namespace.
1851    A start tag whose tag name is neither "mglyph" nor "malignmark", if the
1852           current node is an mtext element in the MathML namespace.
1854    A start tag, if the current node is an element in the HTML namespace.
1856           Process the token using the rules for the secondary insertion
1859           If, after doing so, the insertion mode is still "in foreign
1861           other than the HTML namespace, switch the insertion mode to the
1871    A start tag whose tag name is "font", if the token has any attributes
1877           Pop elements from the stack of open elements until the current
1878           node is in the HTML namespace.
1880           Switch the insertion mode to the secondary insertion mode, and
1881           reprocess the token.
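The start-tag conditions and the breakout step of the "in foreign content" mode can be sketched as below. The current node is modelled as a hypothetical (namespace, local name) pair. The namespace strings are the standard XHTML and MathML namespace URIs. Part of the breakout condition is cut off in this listing, so `break_out_of_foreign_content` assumes the decision to break out has already been made. Both function names are illustrative.

```python
HTML_NS = "http://www.w3.org/1999/xhtml"
MATHML_NS = "http://www.w3.org/1998/Math/MathML"
MATHML_TEXT_ELEMENTS = {"mi", "mo", "mn", "ms", "mtext"}


def start_tag_uses_secondary_mode(tag, current):
    """Return True if a start tag is processed with the secondary insertion
    mode. `current` is the current node as a (namespace, local name) pair."""
    namespace, name = current
    if namespace == HTML_NS:
        return True
    return (namespace == MATHML_NS and name in MATHML_TEXT_ELEMENTS
            and tag not in ("mglyph", "malignmark"))


def break_out_of_foreign_content(stack, secondary_mode):
    """Pop until the current node is in the HTML namespace, then return the
    insertion mode to switch to (the token is then reprocessed)."""
    while stack[-1][0] != HTML_NS:
        stack.pop()
    return secondary_mode
```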
1884           If the current node is an element in the MathML namespace,
1885           adjust MathML attributes for the token. (This fixes the case of
1888           Adjust foreign attributes for the token. (This fixes the use of
1891           Insert a foreign element for the token, in the same namespace as
1892           the current node.
1894           If the token has its self-closing flag set, pop the current node
1895           off the stack of open elements and acknowledge the token's
1898       8.2.5.21 The "after body" insertion mode
1900    When the insertion mode is "after body", tokens must be handled as
1905           Process the token using the rules for the "in body" insertion
1909           Append a Comment node to the first element in the stack of open
1910           elements (the html element), with the data attribute set to the
1911           data given in the comment token.
1914           Parse error. Ignore the token.
1917           Process the token using the rules for the "in body" insertion
1921           If the parser was originally created as part of the HTML
1922           fragment parsing algorithm, this is a parse error; ignore the
1925           Otherwise, switch the insertion mode to "after after body".
1931           Parse error. Switch the insertion mode to "in body" and
1932           reprocess the token.
      8.2.5.22 The "in frameset" insertion mode

   When the insertion mode is "in frameset", tokens must be handled as
   follows:

          Insert the character into the current node.

          Append a Comment node to the current node with the data
          attribute set to the data given in the comment token.

          Parse error. Ignore the token.

          Process the token using the rules for the "in body" insertion
          mode.

          Insert an HTML element for the token.

          If the current node is the root html element, then this is a
          parse error; ignore the token. (fragment case)
          Otherwise, pop the current node from the stack of open elements.
          If the parser was not originally created as part of the HTML
          fragment parsing algorithm (fragment case), and the current node
          is no longer a frameset element, then switch the insertion mode
          to "after frameset".

          Insert an HTML element for the token. Immediately pop the
          current node off the stack of open elements.
          Acknowledge the token's self-closing flag, if it is set.

          Process the token using the rules for the "in head" insertion
          mode.

          If the current node is not the root html element, then this is a
          parse error.
          The root html element can only be the current node in the
          fragment case.

          Parse error. Ignore the token.
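   The stack manipulation in the frameset and frame rules can be sketched with a toy element tree. This is a simplified illustration under stated assumptions: the Element class and the helper insert_html_element are hypothetical, "insert an HTML element" is reduced to "append to the current node and push", and the token conditions (start tag "frameset", end tag "frameset", start tag "frame") are assumed from the published algorithm because they are not visible in this excerpt.

```python
from dataclasses import dataclass, field

@dataclass
class Element:
    name: str
    children: list = field(default_factory=list)

def insert_html_element(stack: list, name: str) -> Element:
    """Hypothetical, simplified insertion: append to the current node, then push."""
    node = Element(name)
    stack[-1].children.append(node)
    stack.append(node)
    return node

def in_frameset(stack: list, kind: str, name: str, fragment_case: bool = False) -> str:
    """Handle one frameset/frame tag token in "in frameset"; return the next mode.

    stack[0] is the root html element and stack[-1] is the current node.
    """
    if kind == "start" and name == "frameset":
        insert_html_element(stack, name)
    elif kind == "end" and name == "frameset":
        if len(stack) == 1:
            # Current node is the root html element: parse error, ignore
            # the token (fragment case).
            return "in frameset"
        stack.pop()
        if not fragment_case and stack[-1].name != "frameset":
            return "after frameset"
    elif kind == "start" and name == "frame":
        insert_html_element(stack, name)
        stack.pop()  # frame is popped immediately; it never holds children
    return "in frameset"
```

   Because a frame element is pushed and popped in the same step, it never becomes the current node for later tokens, which is what the "Immediately pop" wording achieves.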
      8.2.5.23 The "after frameset" insertion mode

   When the insertion mode is "after frameset", tokens must be handled as
   follows:

          Insert the character into the current node.

          Append a Comment node to the current node with the data
          attribute set to the data given in the comment token.

          Parse error. Ignore the token.

          Process the token using the rules for the "in body" insertion
          mode.

          Switch the insertion mode to "after after frameset".

          Process the token using the rules for the "in head" insertion
          mode.

          Parse error. Ignore the token.

   This doesn't handle UAs that don't support frames, or that do support
   frames but want to show the NOFRAMES content. Supporting the former is
   easy; supporting the latter is harder.
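   Because "after frameset" has no reprocessing step, its rules fit a lookup table keyed by token kind and tag name. This is a minimal sketch, assuming a hypothetical (kind, data) token encoding and assumed conditions for each rule (whitespace character, comment, DOCTYPE, start tag "html", end tag "html", start tag "noframes", end-of-file), since those conditions are not visible in this excerpt.

```python
# Hypothetical rule table: (token kind, tag name or None for "any") ->
# (action, next insertion mode). Conditions are assumed, not quoted.
AFTER_FRAMESET_RULES = {
    ("char", None): ("insert character into current node", "after frameset"),
    ("comment", None): ("append comment to current node", "after frameset"),
    ("doctype", None): ("parse error; ignore", "after frameset"),
    ("start", "html"): ("process using 'in body' rules", "after frameset"),
    ("end", "html"): ("switch mode", "after after frameset"),
    ("start", "noframes"): ("process using 'in head' rules", "after frameset"),
    ("eof", None): ("stop parsing", "after frameset"),
}

# Assumed whitespace set (tab, LF, FF, space).
WHITESPACE = {"\t", "\n", "\f", " "}
ANYTHING_ELSE = ("parse error; ignore", "after frameset")

def after_frameset(kind: str, data: str = "") -> tuple[str, str]:
    """Return (action, next mode) for one token in "after frameset"."""
    if kind == "char" and data not in WHITESPACE:
        return ANYTHING_ELSE  # only whitespace characters are inserted
    # Prefer an exact (kind, name) match, then fall back to (kind, any).
    key = (kind, data) if (kind, data) in AFTER_FRAMESET_RULES else (kind, None)
    return AFTER_FRAMESET_RULES.get(key, ANYTHING_ELSE)
```

   A table keeps the mode's rules in one place; the explicit check for non-whitespace characters is needed because the table keys do not encode character classes.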
      8.2.5.24 The "after after body" insertion mode

   When the insertion mode is "after after body", tokens must be handled
   as follows:

          Append a Comment node to the Document object with the data
          attribute set to the data given in the comment token.

          Process the token using the rules for the "in body" insertion
          mode.

          Parse error. Switch the insertion mode to "in body" and
          reprocess the token.
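   A sketch of "after after body" as a dispatch function, using a hypothetical (kind, data) token encoding. The grouping of DOCTYPE, whitespace character and start tag "html" under the "in body" rule, and the end-of-file rule, are assumptions taken from the published algorithm because the conditions are not visible in this excerpt.

```python
# Assumed whitespace set (tab, LF, FF, space).
WHITESPACE = {"\t", "\n", "\f", " "}

def after_after_body(kind: str, data: str = "") -> tuple[str, str]:
    """Return (action, next mode) for one token in "after after body"."""
    if kind == "comment":
        # Appended to the Document itself, i.e. after the html element.
        return ("append comment to Document", "after after body")
    if (kind == "doctype"
            or (kind == "char" and data in WHITESPACE)
            or (kind == "start" and data == "html")):
        return ("process using 'in body' rules", "after after body")
    if kind == "eof":
        return ("stop parsing", "after after body")
    # Anything else: parse error, go back to "in body" and reprocess.
    return ("parse error; reprocess", "in body")
```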
      8.2.5.25 The "after after frameset" insertion mode

   When the insertion mode is "after after frameset", tokens must be
   handled as follows:

          Append a Comment node to the Document object with the data
          attribute set to the data given in the comment token.

          Process the token using the rules for the "in body" insertion
          mode.

          Process the token using the rules for the "in head" insertion
          mode.

          Parse error. Ignore the token.
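   "After after frameset" differs from "after after body" in two ways that are visible above: a noframes start tag is handed to the "in head" rules, and anything unexpected is ignored instead of being reprocessed. A minimal sketch, assuming a hypothetical (kind, data) token encoding and assumed rule conditions (DOCTYPE, whitespace character, start tags "html" and "noframes", end-of-file):

```python
# Assumed whitespace set (tab, LF, FF, space).
WHITESPACE = {"\t", "\n", "\f", " "}

def after_after_frameset(kind: str, data: str = "") -> tuple[str, str]:
    """Return (action, next mode) for one token in "after after frameset"."""
    if kind == "comment":
        return ("append comment to Document", "after after frameset")
    if (kind == "doctype"
            or (kind == "char" and data in WHITESPACE)
            or (kind == "start" and data == "html")):
        return ("process using 'in body' rules", "after after frameset")
    if kind == "start" and data == "noframes":
        return ("process using 'in head' rules", "after after frameset")
    if kind == "eof":
        return ("stop parsing", "after after frameset")
    # Stray content after a frameset document is dropped, not reprocessed.
    return ("parse error; ignore", "after after frameset")
```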
    8.2.6 The end

   Once the user agent stops parsing the document, the user agent must
   follow the steps in this section.

   First, the current document readiness must be set to "interactive".

   Then, the rules for when a script completes loading start applying
   (script execution is no longer managed by the parser).

   If any of the scripts in the list of scripts that will execute as soon
   as possible have completed loading, or if the list of scripts that will
   execute asynchronously is not empty and the first script in that list
   has completed loading, then the user agent must act as if those scripts
   just completed loading, following the rules given for that in the
   script element definition.

   Then, if the list of scripts that will execute when the document has
   finished parsing is not empty, and the first item in this list has
   already completed loading, then the user agent must act as if that
   script just finished loading.

   The user agent must then fire a simple event called DOMContentLoaded at
   the Document.

   Once everything that delays the load event has completed, the user
   agent must set the current document readiness to "complete", and then
   fire a load event at the body element.
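   The ordering of readiness changes and events in these steps can be sketched as two functions: one run when parsing stops and one run when the last load-delaying activity finishes. This is a deliberately simplified model: the Document and Script classes, the log strings and the function names are hypothetical; handling of the as-soon-as-possible and asynchronous script lists is omitted; and "act as if that script just finished loading" is reduced to running that one deferred script.

```python
from dataclasses import dataclass, field

@dataclass
class Script:
    name: str
    loaded: bool

@dataclass
class Document:
    readiness: str = "loading"
    log: list = field(default_factory=list)

def the_end(doc: Document, deferred: list) -> None:
    """Simplified steps run once the parser stops."""
    doc.readiness = "interactive"
    doc.log.append("readiness=interactive")
    # ASAP and async script lists are omitted from this sketch.
    # Run the first deferred script only if it has already loaded; what
    # happens to later scripts is left to the script loading rules.
    if deferred and deferred[0].loaded:
        doc.log.append("run " + deferred.pop(0).name)
    doc.log.append("DOMContentLoaded")

def on_load_delays_complete(doc: Document) -> None:
    """Called once everything that delays the load event has completed."""
    doc.readiness = "complete"
    doc.log.append("readiness=complete")
    doc.log.append("load event at body")
```

   Splitting the work into two calls reflects that DOMContentLoaded fires as soon as parsing is done, while the load event waits for other load-delaying work such as images.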
   Delaying the load event for things like image loads allows for intranet
   port scans (even without JavaScript!). Should we really encode that
   into the spec?
    8.2.7 Coercing an HTML DOM into an infoset

   When an application uses an HTML parser in conjunction with an XML
   pipeline, it is possible that the constructed DOM is not compatible
   with the XML toolchain in certain subtle ways. For example, an XML
   toolchain might not be able to represent attributes with the name
   xmlns, since they conflict with the Namespaces in XML syntax. There is
   also some data that the HTML parser generates that isn't included in
   the DOM itself. This section specifies some rules for handling these
   issues.
   If the XML API being used doesn't support DOCTYPEs, the tool may drop
   DOCTYPEs altogether.
   If the XML API doesn't support attributes in no namespace that are
   named "xmlns", attributes whose names start with "xmlns:", or
   attributes in the XMLNS namespace, then the tool may drop such
   attributes.
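   A minimal filter for this rule, assuming attributes are represented as hypothetical (namespace, local name, value) triples with None meaning "no namespace". The scope it drops (no-namespace attributes named xmlns or starting with "xmlns:", plus any attribute in the XMLNS namespace) is an assumption about which attributes the rule covers.

```python
XMLNS_NS = "http://www.w3.org/2000/xmlns/"

def drop_xmlns_attributes(attrs: list) -> list:
    """Return the (namespace, local_name, value) triples an XML API can accept.

    Assumed scope: no-namespace attributes named "xmlns" or starting with
    "xmlns:", and every attribute in the XMLNS namespace.
    """
    kept = []
    for ns, local_name, value in attrs:
        if ns == XMLNS_NS:
            continue
        if ns is None and (local_name == "xmlns" or local_name.startswith("xmlns:")):
            continue
        kept.append((ns, local_name, value))
    return kept
```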
   The tool may annotate the output with any namespace declarations
   required for proper operation.
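   If the XML API being used restricts the allowable characters in the
   local names of elements and attributes, then the tool may map all
   element and attribute local names that the API wouldn't support to a
   set of names that are allowed, by replacing any character that isn't
   supported with the uppercase letter U and the five digits of the
   character's Unicode code point when expressed in hexadecimal, using
   digits 0-9 and capital letters A-F as the symbols, in increasing
   numeric order.

   For example, the element name foo<bar, which can be output by the HTML
   parser, though it is neither a legal HTML element name nor a
   well-formed XML element name, would be converted into fooU0003Cbar,
   which is a well-formed XML element name (though it's still not legal
   in HTML by any means).

   As another example, consider the attribute xlink:href. Used on a MathML
   element, it becomes, after being adjusted, an attribute with a prefix
   "xlink" and a local name "href". However, used on an HTML element, it
   becomes an attribute with no prefix and the local name "xlink:href",
   which is not a valid NCName, and thus might not be accepted by an XML
   API. It could thus get converted, becoming "xlinkU0003Ahref".

   The resulting names from this conversion conveniently can't clash with
   any attribute generated by the HTML parser, since those are all either
   lowercase or those listed in the adjust foreign attributes algorithm's
   table.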
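   The name mapping can be sketched in a few lines: each unsupported character becomes "U" followed by its code point as five uppercase hexadecimal digits. The predicate is_supported is a hypothetical stand-in for whatever the XML API actually rejects; here it accepts letters, digits, "_", "-" and "." and rejects everything else, including ":" and "<".

```python
def is_supported(ch: str) -> bool:
    # Hypothetical stand-in for the XML API's restriction on local names.
    return ch.isalnum() or ch in "_-."

def escape_local_name(name: str) -> str:
    """Replace each unsupported character with "U" plus five hex digits (0-9, A-F)."""
    # "%05X" zero-pads to five uppercase hex digits; code points above
    # U+FFFFF would need six, a case the five-digit rule does not cover.
    return "".join(ch if is_supported(ch) else "U%05X" % ord(ch) for ch in name)
```

   Applied to the names discussed above, escape_local_name("foo<bar") gives fooU0003Cbar and escape_local_name("xlink:href") gives xlinkU0003Ahref, since "<" is U+003C and ":" is U+003A.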
   If the XML API restricts comments from having two consecutive U+002D
   HYPHEN-MINUS characters (--), the tool may insert a single U+0020 SPACE
   character between any such offending characters.

   If the XML API restricts comments from ending in a U+002D HYPHEN-MINUS
   character (-), the tool may insert a single U+0020 SPACE character at
   the end of such comments.
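   Both comment fix-ups can be applied in one pass over the comment data: insert a space whenever a hyphen would directly follow another hyphen, and append a space if the result still ends in a hyphen. A minimal sketch; the function name is hypothetical.

```python
def fix_comment_data(data: str) -> str:
    """Make comment data safe for XML: no "--" inside and no trailing "-"."""
    out = []
    for ch in data:
        if ch == "-" and out and out[-1] == "-":
            out.append(" ")  # split each pair of consecutive hyphens
        out.append(ch)
    if out and out[-1] == "-":
        out.append(" ")      # a comment must not end in a hyphen
    return "".join(out)
```

   For example, "a--b" becomes "a- -b", and "x-" becomes "x- ".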
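   If the XML API restricts allowed characters in character data, the tool
   may replace any U+000C FORM FEED (FF) character with a U+0020 SPACE
   character, and any other literal non-XML character with a U+FFFD
   REPLACEMENT CHARACTER.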
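   A sketch of one such replacement policy. It assumes that form feeds become U+0020 SPACE and that every other character outside the XML 1.0 Char production (tab, LF, CR, U+0020 to U+D7FF, U+E000 to U+FFFD, U+10000 to U+10FFFF) becomes U+FFFD REPLACEMENT CHARACTER. The function names are hypothetical.

```python
def is_xml_char(cp: int) -> bool:
    """True if the code point matches the XML 1.0 Char production."""
    return (cp in (0x9, 0xA, 0xD)
            or 0x20 <= cp <= 0xD7FF
            or 0xE000 <= cp <= 0xFFFD
            or 0x10000 <= cp <= 0x10FFFF)

def coerce_char_data(text: str) -> str:
    """Replace FF with a space and other non-XML characters with U+FFFD."""
    out = []
    for ch in text:
        if ch == "\f":
            out.append(" ")
        elif is_xml_char(ord(ch)):
            out.append(ch)
        else:
            out.append("\ufffd")  # e.g. U+0001 or the noncharacter U+FFFF
    return "".join(out)
```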
   If the tool has no way to convey out-of-band information, then the tool
   may drop the following information:
     * Whether the document is set to no quirks mode, limited quirks mode,
       or quirks mode
     * The association between form controls and forms that aren't their
       nearest form element ancestor (use of the form element pointer in
       the parser)
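   The mutations allowed by this section apply after the HTML parser's
   rules have been applied. For example, a <a::> start tag will be closed
   by a </a::> end tag, and never by a </aU0003AU0003A> end tag, even if
   the user agent is using the rules above to then generate an actual
   element in the DOM with the name aU0003AU0003A for that start tag.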
   The HTML namespace is: http://www.w3.org/1999/xhtml

   The MathML namespace is: http://www.w3.org/1998/Math/MathML

   The SVG namespace is: http://www.w3.org/2000/svg

   The XLink namespace is: http://www.w3.org/1999/xlink

   The XML namespace is: http://www.w3.org/XML/1998/namespace

   The XMLNS namespace is: http://www.w3.org/2000/xmlns/
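   When a tool annotates coerced output with namespace declarations, these URIs are the ones it needs to bind. A minimal sketch, assuming the HTML namespace is made the default namespace and assuming the hypothetical prefixes "math", "svg" and "xlink". The XML and XMLNS namespaces are bound to the prefixes xml and xmlns by XML itself, so no declaration is emitted for them.

```python
HTML_NS = "http://www.w3.org/1999/xhtml"
MATHML_NS = "http://www.w3.org/1998/Math/MathML"
SVG_NS = "http://www.w3.org/2000/svg"
XLINK_NS = "http://www.w3.org/1999/xlink"
XML_NS = "http://www.w3.org/XML/1998/namespace"
XMLNS_NS = "http://www.w3.org/2000/xmlns/"

# Hypothetical prefix choices for this sketch.
PREFIXES = {MATHML_NS: "math", SVG_NS: "svg", XLINK_NS: "xlink"}

def namespace_declarations(used: list) -> dict:
    """Map xmlns attribute names to URIs for the namespaces in `used`."""
    decls = {}
    for ns in used:
        if ns == HTML_NS:
            decls["xmlns"] = HTML_NS  # HTML as the default namespace
        elif ns in PREFIXES:
            decls["xmlns:" + PREFIXES[ns]] = ns
        # XML_NS and XMLNS_NS are predeclared; other URIs are ignored here.
    return decls
```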