124907 HTML parse buffer problem when parsing large in-memory docs
124110 DTD validation && wrong namespace
123564 xmllint --html --format

TODO for the XML parser and stuff:
==================================

this tends to be outdated :-\ ...

DOCS:
=====

- use case of using XInclude to load for example a description.
  order document + product base -(XSLT)-> quote with XIncludes
                                                   |
  HTML output with description of parts <---(XSLT)--

TODO:
=====
- XInclude at the SAX level (libSRVG)
- fix the C code prototype to bring back doc/libxml-undocumented.txt
  to a reasonable level
- Computation of base when HTTP redirect occurs, might affect HTTP
  interfaces.
- Computation of base in XInclude. Relativization of URIs.
- listing all attributes in a node.
- Better checking of external parsed entities TAG 1234
- Go through errata and do the cleanup.
  http://www.w3.org/XML/xml-19980210-errata ... started ...
- jamesh suggestion: SAX like functions to save a document ie. call a
  function to open a new element with given attributes, write character
  data, close last element, etc
  + inverted SAX, initial patch in April 2002 archives.
- htmlParseDoc has parameter encoding which is not used.
  Function htmlCreateDocParserCtxt ignores it.
- fix realloc() usage.
- Tighten the UTF-8 conformance (Martin Duerst):
  http://www.w3.org/2001/06/utf-8-test/.
  The bad files are in http://www.w3.org/2001/06/utf-8-wrong/.
- xml:id normalized value

TODO:
=====

- move all string manipulation functions (xmlStrdup, xmlStrlen, etc.) to
  global.c. Bjorn noted that the following files depend on parser.o solely
  because of these string functions: entities.o, global.o, hash.o, tree.o,
  xmlIO.o, and xpath.o.

- Optimization of tag strings allocation ?

- maintain coherency of namespace when doing cut'n paste operations
  => the functions are coded, but need testing

- function to rebuild the ID table
- functions to rebuild the DTD hash tables (after DTD changes).


EXTENSIONS:
===========

- Tools to produce man pages from the SGML docs.

- Add Xpointer recognition/API

- Add Xlink recognition/API
  => started adding an xlink.[ch] with a unified API for XML and HTML.
     it's crap :-(

- Implement XSchemas
  => Really needs to be done <grin/>
  - datatypes are complete, but structure support is very limited.

- extend the shell with:
   - edit
   - load/save
   - mv (yum, yum, but it's harder because directories are ordered in
     our case, mvup and mvdown would be required)


Done:
=====

- Add HTML validation using the XHTML DTD
  - problem: do we want to keep and maintain the code for handling
    DTD/System ID cache directly in libxml ?
  => not really done that way, but there are new APIs to check elements
     or attributes. Otherwise XHTML validation directly ...

- XML Schemas datatypes except Base64 and BinHex

- Relax NG validation

- XmlTextReader streaming API + validation

- Add a DTD cache prefilled with xhtml DTDs and entities and a program to
  manage them -> like the /usr/bin/install-catalog from SGML
  right place seems $datadir/xmldtds
  Maybe this is better left to user apps
  => use a catalog instead, and xhtml1-dtd package

- Add output to XHTML
  => XML serializer automatically recognizes the DTD and applies the
     specific rules.

- Fix output of <tst val="x&#xA;y"/>

- compliance to XML-Namespace checking, see section 6 of
  http://www.w3.org/TR/REC-xml-names/

- Correct standalone checking/emitting (hard)
  2.9 Standalone Document Declaration

- Implement OASIS XML Catalog support
  http://www.oasis-open.org/committees/entity/

- Get OASIS testsuite to a more friendly result, check all the results
  once stable. the check-xml-test-suite.py script does this

- Implement XSLT
  => libxslt

- Finish XPath
  => attributes addressing troubles
  => defaulted attributes handling
  => namespace axis ?
  done as XSLT got debugged

- bug reported by Michael Meallin on validation problems
  => Actually means I need to add support (and warn) for non-deterministic
     content model.
- Handle undefined namespaces in entity contents better ... at least
  issue a warning
- DOM needs
  int xmlPruneProp(xmlNodePtr node, xmlAttrPtr attr);
  => done it's actually xmlRemoveProp xmlUnsetProp xmlUnsetNsProp

- HTML: handling of Script and style data elements, need special code in
  the parser and saving functions (handling of < > " ' ...):
  http://www.w3.org/TR/html4/types.html#type-script
  Attributes are no problems since entities are accepted.
- DOM needs
  xmlAttrPtr xmlNewDocProp(xmlDocPtr doc, const xmlChar *name, const xmlChar *value)
- problem when parsing hrefs with & with the HTML parser (IRC ac)
- If the internal encoding is not UTF8 saving to a given encoding doesn't
  work => fix to force UTF8 encoding ...
  done, added documentation too
- Add an ASCII I/O encoder (asciiToUTF8 and UTF8Toascii)
- Issue warning when using non-absolute namespaces URI.
- the html parser should add <head> and <body> if they don't exist
  started, not finished.
  Done, the automatic closing is added and 3 testcases were inserted
- Command to force the parser to stop parsing and ignore the rest of the file.
  xmlStopParser() should allow this, mostly untested
- support for HTML empty attributes like <hr noshade>
- plugged iconv() in for support of a large set of encodings.
- xmlSwitchToEncoding() rewrite done
- URI checking (no fragments) rfc2396.txt
- Added a clean mechanism for overloading or adding input methods:
  xmlRegisterInputCallbacks()
- dynamically adapt the alloc entry point to use g_alloc()/g_free()
  if the programmer wants it:
    - use xmlMemSetup() to reset the routines used.
- Check attribute normalization especially xmlGetProp()
- Validity checking problems for NOTATIONS attributes
- Validity checking problems for ENTITY ENTITIES attributes
- Parsing of a well balanced chunk xmlParseBalancedChunkMemory()
- URI module: validation, base, etc ... see uri.[ch]
- turn tester into a generic program xmllint installed with libxml
- extend validity checks to go through entities content instead of
  just labelling them PCDATA
- Save DTDs using the children list instead of dumping the tables,
  order is preserved as well as comments and PIs
- Wrote a notice of changes required to go from 1.x to 2.x
- make sure that all SAX callbacks are disabled if a WF error is detected
- checking/handling of newline normalization
  http://localhost/www.xml.com/axml/target.html#sec-line-ends
- correct checking of '&' '%' on entities content.
- checking of PE/Nesting on entities declaration
- checking/handling of xml:space
  - checking done.
  - handling done, not well tested
- Language identification code, productions [33] to [38]
  => done, the check has been added and reports WFness errors
- Conditional sections in DTDs [61] to [65]
  => should this crap be really implemented ???
  => Yep OASIS testsuite uses them
- Allow parsed entities defined in the internal subset to override
  the ones defined in the external subset (DTD customization).
  => This means that the entity content should be computed only at
     use time, i.e. keep the orig string only at parse time and expand
     only when referenced from the external subset :-(
  Needed for complete use of most DTDs from Eve Maler
- Add regression tests for all WFC errors
  => did some in test/WFC
  => added OASIS testsuite routines
     http://xmlsoft.org/conf/result.html

- I18N: http://wap.trondheim.com/vaer/index.phtml is not XML and accepted
  by the XML parser, UTF-8 should be checked when there is no "encoding"
  declared !
- Support for UTF-8 and UTF-16 encoding
  => added some conversion routines provided by Martin Duerst
     patched them, got fixes from @@@
  I plan to keep everything internally as UTF-8 (or ISO-Latin-X)
  this is slightly more costly but more compact, and recent processors
  efficiency is cache related. The key for good performances is keeping
  the data set small, so will I.
  => the new progressive reading routines call the detection code
     is enabled, tested the ISO->UTF-8 stuff
- External entities loading:
  - allow override by client code
  - make sure it is called for all external entities referenced
  Done, client code should use xmlSetExternalEntityLoader() to set
  the default loading routine. It will be called each time an external
  entity resolution is triggered.
- maintain ID coherency when removing/changing attributes
  The function used to deallocate attributes now checks for it being an
  ID and removes it from the table.
- push mode parsing i.e. non-blocking state based parser
  done, both for XML and HTML parsers. Use xmlCreatePushParserCtxt()
  and xmlParseChunk() and html counterparts.
  The tester program now has a --push option to select that parser
  front-end. Duplicated tests to use both and check results are similar.

227- an XML shell, allowing to traverse/manipulate an XML document with 228 a shell like interface, and using XPath for the anming syntax 229 - use of readline and history added when available 230 - the shell interface has been cleanly separated and moved to debugXML.c 231- HTML parser, should be fairly stable now 232- API to search the lang of an attribute 233- Collect IDs at parsing and maintain a table. 234 PBM: maintain the table coherency 235 PBM: how to detect ID types in absence of DtD ! 236- Use it for XPath ID support 237- Add validity checking 238 Should be finished now ! 239- Add regression tests with entity substitutions 240 241- External Parsed entities, either XML or external Subset [78] and [79] 242 parsing the xmllang DtD now works, so it should be sufficient for 243 most cases ! 244 245- progressive reading. The entity support is a first step toward 246 abstraction of an input stream. A large part of the context is still 247 located on the stack, moving to a state machine and putting everything 248 in the parsing context should provide an adequate solution. 249 => Rather than progressive parsing, give more power to the SAX-like 250 interface. Currently the DOM-like representation is built but 251 => it should be possible to define that only as a set of SAX callbacks 252 and remove the tree creation from the parser code. 253 DONE 254 255- DOM support, instead of using a proprietary in memory 256 format for the document representation, the parser should 257 call a DOM API to actually build the resulting document. 258 Then the parser becomes independent of the in-memory 259 representation of the document. Even better using RPC's 260 the parser can actually build the document in another 261 program. 262 => Work started, now the internal representation is by default 263 very near a direct DOM implementation. The DOM glue is implemented 264 as a separate module. See the GNOME gdome module. 

- C++ support : John Ehresman <jehresma@dsg.harvard.edu>
- Updated code to follow more recent specs, added compatibility flag
- Better error handling, use a dedicated, overridable error
  handling function.
- Support for CDATA.
- Keep track of line numbers for better error reporting.
- Support for PI (SAX one).
- Support for Comments (bad, should be in ASAP, they are parsed
  but not stored), should be configurable.
- Improve the support of entities on save (+SAX).