Changes from 1.2 to 1.2.1
=========================
Match DOCTYPE case-blind
Extend PushbackReader's size for oddball cases like & followed by CR
Leo Sutic's 2x-4x speedup by precompiling HTMLScanner table

Changes from 1.1.3 to 1.2
=========================
Changed license to Apache 2.0
Bogon default model is now ANY, not EMPTY
Support new DOCTYPE output switches --doctype-system and --doctype-public
Support new XML declaration output switches --standalone and --version
New --norootbogons switch makes bogons children of the root
Don't resolve entity references in attribute values unless semicolon-terminated
Support character entities above U+FFFF
Add character entities from the 2007-12-14 draft of xml-entity-names
Call SAX events startPrefixMapping and endPrefixMapping to report prefixes
Clean up newline processing, shrinking html.stml considerably
Allow link elements in the body as well as the head, to avoid excess bodies
Allow tables inside paragraphs
Allow cells and forms in thead and tfoot elements without intervening tr element
The span element is no longer restartable
Support non-standard elements bgsound, blink, canvas, comment, listing,
 marquee, nobr, ruby, rbc, rtc, rb, rt, rp, wbr, xmp
In HTML mode, boolean attributes like checked are output in minimized form
Correctly handle runs of less-than characters
Suppress all but the first DOCTYPE declaration
Modify PI targets containing colons to have underscores instead
The case of element tags is now canonicalized to the schema
PI targets are no longer forced to lower case

Changes from 1.1.2 to 1.1.3
===========================
Allow Parser.set* methods to accept null
Allow setting the LexicalHandler feature to be null
 (in both cases, null means "use default behavior")

Changes from 1.1.1 to 1.1.2
===========================
Setting CDATAElementsFeature didn't really set CDATAElements instance variable

Changes from 1.1 to 1.1.1
=========================
Removed lexical handler calls to startCDATA/endCDATA from CDATA element handling
Added lexical handler calls to startCDATA/endCDATA from CDATA section handling
Added CDATAElementsFeature, the programmatic equivalent of the --nocdata switch

Changes from 1.0.5 to 1.1
=========================
Add Tatu Saloranta's JAXP support package

Changes from 1.0.4 to 1.0.5
===========================
Major repairs to comment scanning
Skip leading BOM
Comment out debugging code in PYXWriter
Allow &#X as well as &#x
Add net.sf.saxon to list of supported XSLT engines

Changes from 1.0.3 to 1.0.4
===========================
Certain options that should not have been mutually exclusive were
Blocked XML declaration from specifying an encoding of ""
--method=html was not doing the right thing

Changes from 1.0.2 to 1.0.3
===========================
Fixed build file to use Java target version 1.4
Fixed --version switch to print the right thing

Changes from 1.0.1 to 1.0.2
===========================
Version attribute default value removed from html element
Leading and trailing hyphens now trimmed properly from comments
Added --output-encoding switch to control encoding
If output encoding is Unicode, don't generate character references
Whitespace compressed and junk stripped from public identifiers

Changes from 1.0 to 1.0.1
=========================
Added ignorableWhitespaceFeature and --ignorable to report ignorable whitespace
 Patch due to David Pashley
Insert spaces to break up -- in comments
Change bogus chars in publicids to spaces
--lexical switch now outputs DOCTYPE if there is one
Remove unnecessary blank line after XML declaration

Changes from 1.0rc9 to 1.0
==========================
Added feature to control restartability
 Patch due to Nikita Zhuk
Added corresponding --norestart switch in CommandLine
Made translate-colons feature actually work

Changes from 1.0rc8 to 1.0rc9
=============================
If there is a publicid but no systemid, set systemid to ""

Changes from 1.0rc7 to 1.0rc8
=============================
Fixed paper-bag bug (source didn't match binary in release)

Changes from 1.0rc6 to 1.0rc7
=============================
LexicalHandler now gets DOCTYPE information (publicid and systemid)
 Patch due to Mike Bremford
HTMLScanner now reports more useful debug output when not commented out
 Patch due to Mike Bremford
Change "<memberOfAny>" to exclude "<root>" pseudo-element
 This prevents "script" from being output as a root
The shared HTMLParser object has been eliminated

Changes from 1.0rc5 to 1.0rc6
=============================
If namespaceFeature is false, uri and localname are passed as empty strings
The namespacePrefixesFeature is now always false
Command line switch --nons no longer affects namespacePrefixesFeature
Command line switch --html now implies --nons
XMLWriter is now told directly to use the schema's URI as default namespace
XMLWriter now takes the element name from the qname if localname is empty

Changes from 1.0rc4 to 1.0rc5
=============================
The --nodefaults switch now removes only default attributes, not all of them
Added --nocolons switch and translate-colons feature to convert ":"
 in names to "_" (thus suppressing namespaces other than the basic one)
The root element can be unknown without problem
Empty <script/> and <style/> tags now work
Added all standard SAX2 features to feature hashtable
Reimplemented namespacePrefixes feature (broken since 1.0rc3)

Changes from 1.0rc3 to 1.0rc4
=============================
Remove trailing ? from processing instructions (in case the input is XHTML)
Added Javadocs for all SAX standard and TagSoup-specific features and properties
Fixed termination conditions for entity/character references
Fixed EOF-pushback bug that was generating bogus 񥔵 references
Added Parser feature and --nodefaults switch to ignore default attribute values
Added support for SAX Locator
Updated AFL license to version 3.0
Scanner buffer size increases as needed, allowing large attribute values
Look for various XSLT implementations as available (still fails in raw 5.0)
Clean up handling of XML empty tags and SGML minimized end-tags
Support proper options and help message internally
Use Hashtable in CommandLine class instead of HashMap
Do proper buffering of InputStream and Reader
Clean up content model of noframes element
Removed htmlMode in XMLWriter
Added support for XSLT output options METHOD=html and OMIT_XML_DECLARATION=yes
Command line option --html sets both of these
Wrote simple validator for TSSL schemas (tssl/tssl-validator.xslt)
Removed various validity problems in html.tssl
When processing a start-tag, don't restart elements that aren't in the new
 element's content model
Remove bogus double param in tssl.xslt

Changes from 1.0rc2 to 1.0rc3
=============================
Convert CR and CRLF to LF in comments and PIs
Force empty elements to close immediately
Match close tags of CDATA elements more precisely (but case-blind)
Process switches on the command line
Man page available

Changes from 1.0rc1 to 1.0rc2
=============================
Isolated & and &# no longer crash the parser
TagSoup no longer depends on /dev/stdin existing
Refactored Parser class, moving main method to new CommandLine class
Changes to content models of form, button, table, and tr elements in html.tssl
'</scr' + 'ipt>' in a script element no longer terminates it
Introduced "uncloseability" of form and table elements
"pyxin" property specifies that input is in PYX format
Correctly cope with unexpected characters around colons, also with multiple colons
Correctly output comments with "--" in them (by adding a space)

Changes from 0.10.2 to 1.0rc1
=============================
Script can now appear anywhere
Switch -nocdata correctly implemented
Eliminated useless M_n constants in Schema
Introduced <memberOfAny> and <isRoot> as alternatives to
 <memberOf> in TSSL
Allow prefixes in element names
Attributes are now normalized
Expanded public API for Element and ElementType
Javadoc improved

Changes from 0.10.1 to 0.10.2
=============================
Removed misfeature whereby > terminated a tag even inside quotes
Added licensing language to XSLT scripts, RELAX NG schemas
Removed long-standing mishandling of entity references in attributes
Cleaned up logic for converting junky strings to proper XML Names
Correctly handle empty tag that has no whitespace or attributes
Restore correct 0.9.3 handling of an apparent end-tag in a CDATA element
Added script element to content model of head element

Changes from 0.9.7 to 0.10.1 (there is no 0.10.0):
==================================================
Convert to XSLT configuration exclusively;
 Perl code and tab-separated tables are gone
Remove xmlns:* attributes
Append "_" to attribute names ending in ":"
Don't prepend "_" to an attribute name starting in "_"
Handle namespace prefixes in attributes:
 "xml" prefix is handled correctly
 other prefixes are mapped to "urn:x-prefix:foo"
Ignore XML declarations
-Dnocdata=true turns off F_CDATA on script and style elements
Fixed off-by-one errors in character references that made them uninterpreted
Start-tags ending in a minimized attribute are no longer being dropped
XML empty tags are now supported (though slashes are still allowed in
 unquoted attribute values)

Changes from 0.9.6 to 0.9.7:
============================
Upgraded AFL to version 2.1
Passed through newlines in character content (very old bug)

Changes from 0.9.5 to 0.9.6:
============================
Script element can appear directly in body
">" terminates a start-tag even inside a quoted attribute,
 to protect against unbalanced quotes
"_" is prepended to attributes that don't begin with a letter
Remove "xmlns" attributes from the input
All standard features can now be set
 (although there is no effect from doing so)
New "bogons-empty" feature can be set to false to give bogons
 content model of ANY rather than EMPTY;
 -Dany switch sets this feature to false
TSSL now has an explicit group element to declare an element group
STML is a new XML format for modeling state-table changes
License updated to AFL 2.1

Changes from 0.9.4 to 0.9.5:
============================
S in the statetable now means \r and \n and \t as well as space
 (as was always intended; brain fart!)
Ins and del elements are now allowed everywhere
TSSL now correctly supports attributes that are legal on all elements

Changes from 0.9.3 to 0.9.4:
============================
Fixed paper-bag bug that revealed attribute type BOOLEAN to applications.
Obsolete ABSTRACT removed in favor of README.
Improved implementation of CDATA restart after bogus end-tag.
Allowed hyphen, underscore, and period in names as well as colon.
First cut at TagSoup Schema Language -- doesn't do anything yet.
Support CDATA sections on input.
Don't generate built-in entities within CDATA elements.

Changes from 0.9.2 to 0.9.3:
============================
Convenience main program "tagsoup" in bin directory.
Begin to integrate tests.
Introduced BOOLEAN type (currently just converted to NMTOKEN).
Features that actually work are now named constants in Parser.
Double root elements are really gone now.
ID attributes weren't being removed from restarted elements.
Fixed a bug that made unknown elements disappear in some cases.
Parser is now safely reusable.
PYXWriter and XMLWriter now implement LexicalHandler.
Parser reports comments, startCDATA, and endCDATA events to a LexicalHandler.
ScanHandler methods now throw only SAXException, not also IOException.
-Dlexical=true switch sets the ContentHandler as a LexicalHandler as well
 (XMLWriter prints comments, ignores CDATA sections; PYXWriter ignores all).
-Dreuse=true switch reuses a single Parser object (no great speed gain).
We now disallow an a element as the child of another a element.
An empty input is now treated as zero-length character content.
HTMLWriter is gone in favor of an extended XMLWriter with get/setHTMLMode methods.
CDATA elements only terminate with matching end-tags (thanks to Sebastien Bardoux).

Changes from 0.9.1 to 0.9.2:
============================
No longer inserts bogus ; after unknown entity reference without ;.
Consecutive entity references now work correctly.
Setting namespaces and namespace-prefixes methods now works.
-Dnons=true option turns off namespace and prefix.
New feature "http://www.ccil.org/~cowan/tagsoup/features/ignore-bogons"
 suppresses unknown start-tags (any end-tag will be automatically ignored).
-Dnobogons=true option turns ignore-bogons on.
Suppress unknown and/or empty initial start-tag always
 (prevents double root element).
Schema now allows style as an inline element, like script.
Schema now allows tr as a child of table to avoid problems with embedded tables.
Clear Parser instance variables to make Parsers properly reusable.

Changes from 0.9 to 0.9.1:
==========================
Incorporated patch for -jar support by Joseph Walton.
Incorporated patch for Megginson XMLWriter support by Joseph Walton.
Changed existing XMLWriter to HTMLWriter.
Rewrote Parsermain for better features, removed Tester class.
-Dnewline=true removed, now implied by -DHTML=true.
-Dfiles=true now used to generate separate outputs (old Tester behavior)
 with extension xhtml (removing any old extension).
Fixed nasty bug in HTMLScanner that was failing to fix unusual entities.
Don't attempt to smash whitespace to spaces any more.

Changes from 0.8 to 0.9:
========================
Ant-ified by Martin Rademacher.
Don't suppress colons in element names.
Entity problems fixed (I hope).
Can now set namespace and namespace-prefixes features (without effect).
Properly templatize HTMLModels.java.
Attributes are no longer in the HTML namespace.