xml-rs, an XML library for Rust
===============================

[](https://github.com/kornelski/xml-rs/actions/workflows/main.yml)
[![crates.io][crates-io-img]](https://lib.rs/crates/xml-rs)
[![docs][docs-img]](https://docs.rs/xml-rs/)

[Documentation](https://docs.rs/xml-rs/)

  [crates-io-img]: https://img.shields.io/crates/v/xml-rs.svg
  [docs-img]: https://img.shields.io/badge/docs-latest%20release-6495ed.svg

xml-rs is an XML library for the [Rust](https://www.rust-lang.org/) programming language.
It supports reading and writing of XML documents in a streaming fashion (without DOM).

### Features

* XML spec conformance better than other pure-Rust libraries.

* Easy to use API based on `Iterator`s and regular `String`s without tricky lifetimes.

* Support for UTF-16, UTF-8, ISO-8859-1, and ASCII encodings.

* Written entirely in the safe Rust subset. Designed to safely handle untrusted input.


The API is heavily inspired by Java Streaming API for XML ([StAX][stax]). It contains a pull parser much like StAX event reader. It provides an iterator API, so you can leverage Rust's existing iterators library features.

 [stax]: https://en.wikipedia.org/wiki/StAX

It also provides a streaming document writer much like StAX event writer.
This writer consumes its own set of events, but reader events can be converted to
writer events easily, and so it is possible to write XML transformation chains in a pretty
clean manner.

This parser is mostly full-featured, however, there are limitations:
* Legacy code pages and non-Unicode encodings are not supported;
* DTD validation is not supported (but entities defined in the internal subset are supported);
* attribute value normalization is not performed, and end-of-line characters are not normalized either.

Other than that the parser tries to be mostly XML-1.1-compliant.

Writer is also mostly full-featured with the following limitations:
* no support for encodings other than UTF-8;
* no support for emitting `<!DOCTYPE>` declarations;
* more validations of input are needed, for example, checking that namespace prefixes are bound
  or comments are well-formed.

Building and using
------------------

xml-rs uses [Cargo](https://crates.io), so add it with `cargo add xml` or modify `Cargo.toml`:

```toml
[dependencies]
xml = "0.8.16"
```

The package exposes a single crate called `xml`.

Reading XML documents
---------------------

[`xml::reader::EventReader`][EventReader] requires a [`Read`][stdread] instance to read from. It can be a `File` wrapped in `BufReader`, or a `Vec<u8>`, or a `&[u8]` slice.

[EventReader]: https://docs.rs/xml-rs/latest/xml/reader/struct.EventReader.html
[stdread]: https://doc.rust-lang.org/stable/std/io/trait.Read.html

`EventReader` implements the `IntoIterator` trait, so you can use it in a `for` loop directly:

```rust,no_run
use std::fs::File;
use std::io::BufReader;

use xml::reader::{EventReader, XmlEvent};

fn main() -> std::io::Result<()> {
    let file = File::open("file.xml")?;
    let file = BufReader::new(file); // Buffering is important for performance

    let parser = EventReader::new(file);
    let mut depth = 0;
    for e in parser {
        match e {
            Ok(XmlEvent::StartElement { name, .. }) => {
                println!("{:spaces$}+{name}", "", spaces = depth * 2);
                depth += 1;
            }
            Ok(XmlEvent::EndElement { name }) => {
                depth -= 1;
                println!("{:spaces$}-{name}", "", spaces = depth * 2);
            }
            Err(e) => {
                eprintln!("Error: {e}");
                break;
            }
            // There's more: https://docs.rs/xml-rs/latest/xml/reader/enum.XmlEvent.html
            _ => {}
        }
    }

    Ok(())
}
```

Document parsing can end normally or with an error.
Regardless of the exact cause, the parsing
process will be stopped, and the iterator will terminate normally.

You can also have finer control over when to pull the next event from the parser using its own
`next()` method:

```rust,ignore
match parser.next() {
    ...
}
```

Upon the end of the document or an error, the parser will remember the last event and will always
return it in the result of every subsequent `next()` call. If the iterator is used, it will yield
the error or end-of-document event once and will produce `None` afterwards.

It is also possible to tweak the parsing process a little using the [`xml::reader::ParserConfig`][ParserConfig] structure.
See its documentation for more information and examples.

[ParserConfig]: https://docs.rs/xml-rs/latest/xml/reader/struct.ParserConfig.html

You can find a more extensive example of using `EventReader` in `src/analyze.rs`, which is a
small program that shows various statistics about a specified XML document (it is built with
`cargo build` and can be run after that). It can also be used to check XML documents for
well-formedness: if a document is not well-formed, this program will exit with an error.


## Parsing untrusted inputs

The parser is written in the safe Rust subset, so by Rust's guarantees the worst that it can do is to cause a panic.
You can use `ParserConfig` to set limits on maximum lengths of names, attributes, text, entities, etc.
You should also set a maximum document size via `io::Read`'s [`take(max)`](https://doc.rust-lang.org/stable/std/io/trait.Read.html#method.take) method.

Writing XML documents
---------------------

xml-rs also provides a streaming writer much like StAX event writer. With it you can write an
XML document to any `Write` implementor.

```rust,no_run
use std::io;
use xml::writer::{EmitterConfig, XmlEvent};

/// A simple demo syntax where "+foo" makes `<foo>`, "-foo" makes `</foo>`
fn make_event_from_line(line: &str) -> XmlEvent {
    let line = line.trim();
    if let Some(name) = line.strip_prefix("+") {
        XmlEvent::start_element(name).into()
    } else if line.starts_with("-") {
        XmlEvent::end_element().into()
    } else {
        XmlEvent::characters(line).into()
    }
}

fn main() -> io::Result<()> {
    let input = io::stdin();
    let output = io::stdout();
    let mut writer = EmitterConfig::new()
        .perform_indent(true)
        .create_writer(output);

    let mut line = String::new();
    loop {
        line.clear();
        let bytes_read = input.read_line(&mut line)?;
        if bytes_read == 0 {
            break; // EOF
        }

        let event = make_event_from_line(&line);
        if let Err(e) = writer.write(event) {
            panic!("Write error: {e}")
        }
    }
    Ok(())
}
```

The code example above also demonstrates how to create a writer out of its configuration.
A similar approach also works with `EventReader`.

The library provides an XML event building DSL which helps to construct complex events,
e.g. ones having namespace definitions. Some examples:

```rust,ignore
// <a:hello a:param="value" xmlns:a="urn:some:document">
XmlEvent::start_element("a:hello").attr("a:param", "value").ns("a", "urn:some:document")

// <hello b:config="name" xmlns="urn:default:uri">
XmlEvent::start_element("hello").attr("b:config", "name").default_ns("urn:default:uri")

// <![CDATA[some unescaped text]]>
XmlEvent::cdata("some unescaped text")
```

Of course, one can create `XmlEvent` enum variants directly instead of using the builder DSL.
There are more examples in [`xml::writer::XmlEvent`][XmlEvent] documentation.

[XmlEvent]: https://docs.rs/xml-rs/latest/xml/reader/enum.XmlEvent.html

The writer has multiple configuration options; see the [`EmitterConfig`][EmitterConfig] documentation for more
information.

[EmitterConfig]: https://docs.rs/xml-rs/latest/xml/writer/struct.EmitterConfig.html

Bug reports
------------

Please report issues at: <https://github.com/kornelski/xml-rs/issues>.