# Error management

nom's errors are designed with multiple needs in mind:
- indicate which parser failed and where in the input data
- accumulate more context as the error goes up the parser chain
- have a very low overhead, as errors are often discarded by the calling parser (examples: `many0`, `alt`)
- can be modified according to the user's needs, because some languages need a lot more information

To match these requirements, nom parsers have to return the following result
type:

```rust
pub type IResult<I, O, E=nom::error::Error<I>> = Result<(I, O), nom::Err<E>>;

pub enum Err<E> {
    Incomplete(Needed),
    Error(E),
    Failure(E),
}
```

The result is either an `Ok((I, O))` containing the remaining input and the
parsed value, or an `Err(nom::Err<E>)` with `E` the error type.
`nom::Err<E>` is an enum because combinators can have different behaviours
depending on the value. The `Err<E>` enum expresses three conditions for a parser error:
- `Incomplete` indicates that a parser did not have enough data to decide. It can be returned by parsers in the `streaming` submodules; we should then accumulate more data in the buffer (for example by reading from a file or socket) and call the parser again. Parsers in the `complete` submodules assume that they have the entire input, so if it is not sufficient they return an `Err::Error` instead.
- `Error` is a normal parser error. If a child parser of the `alt` combinator returns `Error`, `alt` will try the next child parser.
- `Failure` is an error from which we cannot recover: the `alt` combinator will not try other branches if a child parser returns `Failure`.
  If we know we were in the right branch (example: we found a correct prefix character but the input after it was wrong), we can transform an `Err::Error` into an `Err::Failure` with the `cut()` combinator.

If we are running a parser and know it will not return `Err::Incomplete`, we can
directly extract the error type from `Err::Error` or `Err::Failure` with the
`finish()` method (it panics if it encounters `Err::Incomplete`):

```rust
use nom::Finish; // `finish()` is provided by the `Finish` trait

let parser_result: IResult<I, O, E> = parser(input);
let result: Result<(I, O), E> = parser_result.finish();
```

If we used a borrowed type as input, like `&[u8]` or `&str`, the error borrows
from it too. To transmit the error somewhere else, we can convert it to an
owned type with the `to_owned()` method:

```rust
use nom::{error::Error, Err};

let result: Result<(&[u8], Value), Err<Error<Vec<u8>>>> =
    parser(data).map_err(|e: Err<Error<&[u8]>>| e.to_owned());
```

nom provides a powerful error system that can adapt to your needs: you can
get reduced error information if you want to improve performance, or you can
get a precise trace of parser application, with fine grained position information.

This is done through the third type parameter of `IResult`, nom's parser result
type: the `E` parameter of the `IResult` definition above, which defaults to
`nom::error::Error<I>`.

This error type is completely generic in nom's combinators, so you can choose
exactly which error type you want to use when you define your parsers, or
directly at the call site.
See [the JSON parser](https://github.com/Geal/nom/blob/5405e1173f1052f7e006dcb0b9cfda2b06557b65/examples/json.rs#L209-L286)
for an example of choosing different error types at the call site.
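The three `Err<E>` conditions described at the start of this section, and the way `alt`, `cut()` and `finish()` react to them, can be modelled in a few lines of plain Rust. The sketch below is standalone and does not use nom. `Err`, `IResult`, `cut` and `finish` only mirror nom's names. `alt2` is a hypothetical two-branch stand-in for `alt`, `Needed` is reduced to a byte count, and errors are plain `String`s.

```rust
// Standalone model of nom's three error conditions. Names mirror nom's,
// but this is a simplified sketch, NOT nom's implementation.
#[allow(dead_code)]
#[derive(Debug, PartialEq)]
pub enum Err<E> {
    Incomplete(usize), // hypothetical stand-in for `Needed`: "give me more data"
    Error(E),          // recoverable: `alt` may try the next branch
    Failure(E),        // unrecoverable: `alt` stops here
}

pub type IResult<'a, O> = Result<(&'a str, O), Err<String>>;

/// Like `alt((a, b))`: on `Error` try the next branch, otherwise stop.
pub fn alt2<'a, O>(
    a: impl Fn(&'a str) -> IResult<'a, O>,
    b: impl Fn(&'a str) -> IResult<'a, O>,
    input: &'a str,
) -> IResult<'a, O> {
    match a(input) {
        Err(Err::Error(_)) => b(input),
        other => other, // Ok, Failure and Incomplete are returned as-is
    }
}

/// Like `cut(p)`: upgrades a recoverable `Error` into a `Failure`.
pub fn cut<'a, O>(p: impl Fn(&'a str) -> IResult<'a, O>, input: &'a str) -> IResult<'a, O> {
    match p(input) {
        Err(Err::Error(e)) => Err(Err::Failure(e)),
        other => other,
    }
}

/// Like `finish()`: flattens `Error`/`Failure` into `E`, panics on `Incomplete`.
pub fn finish<'a, O>(r: IResult<'a, O>) -> Result<(&'a str, O), String> {
    r.map_err(|e| match e {
        Err::Error(e) | Err::Failure(e) => e,
        Err::Incomplete(_) => panic!("finish() called on Err::Incomplete"),
    })
}

/// One or more ASCII digits.
pub fn digits(i: &str) -> IResult<'_, &str> {
    let end = i.find(|c: char| !c.is_ascii_digit()).unwrap_or(i.len());
    if end == 0 {
        return Err(Err::Error(format!("expected digits at {:?}", i)));
    }
    Ok((&i[end..], &i[..end]))
}

/// A `"..."` string: after the opening quote we know we are in the right
/// branch, so a missing closing quote becomes a `Failure` via `cut`.
pub fn quoted(i: &str) -> IResult<'_, &str> {
    let rest = match i.strip_prefix('"') {
        Some(rest) => rest,
        None => return Err(Err::Error(format!("expected '\"' at {:?}", i))),
    };
    cut(
        |r| match r.find('"') {
            Some(end) => Ok((&r[end + 1..], &r[..end])),
            None => Err(Err::Error(format!("unterminated string at {:?}", r))),
        },
        rest,
    )
}

/// A value is either a quoted string or a number.
pub fn value(i: &str) -> IResult<'_, &str> {
    alt2(quoted, digits, i)
}
```

With this model, `value("x")` ends in `Err::Error` after both branches have been tried. By contrast, `value("\"oops")` ends in `Err::Failure`: once the opening quote has matched, `cut` keeps `alt2` from falling back to the digit branch.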

## Common error types

### The default error type: `nom::error::Error`

```rust
#[derive(Debug, PartialEq)]
pub struct Error<I> {
    /// position of the error in the input data
    pub input: I,
    /// nom error code
    pub code: ErrorKind,
}
```

This structure contains a `nom::error::ErrorKind` indicating which kind of
parser encountered an error (example: `ErrorKind::Tag` for the `tag()`
combinator), and the input position of the error.

This error type is fast and has very low overhead, so it is suitable for
parsers that are called repeatedly, like in network protocols.
It is very limited, though: it will not tell you about the chain of
parser calls, so it is not enough to write user-friendly errors.

Example error returned in a JSON-like parser (from `examples/json.rs`):

```rust
let data = " { \"a\"\t: 42,
 \"b\": [ \"x\", \"y\", 12 ] ,
 \"c\": { 1\"hello\" : \"world\"
 }
 } ";

// will print:
// Err(
//     Failure(
//         Error {
//             input: "1\"hello\" : \"world\"\n }\n } ",
//             code: Char,
//         },
//     ),
// )
println!(
    "{:#?}\n",
    json::<Error<&str>>(data)
);
```

### Getting more information: `nom::error::VerboseError`

The `VerboseError<I>` type accumulates more information about the chain of
parsers that encountered an error:

```rust
#[derive(Clone, Debug, PartialEq)]
pub struct VerboseError<I> {
    /// List of errors accumulated by `VerboseError`, containing the affected
    /// part of input data, and some context
    pub errors: crate::lib::std::vec::Vec<(I, VerboseErrorKind)>,
}

#[derive(Clone, Debug, PartialEq)]
/// Error context for `VerboseError`
pub enum VerboseErrorKind {
    /// Static string added by the `context` function
    Context(&'static str),
    /// Indicates which character was expected by the `char` function
    Char(char),
    /// Error kind given by various nom parsers
    Nom(ErrorKind),
}
```

It contains the input position and error code for each of those parsers.
It does not accumulate errors from the different branches of `alt`: it will
only contain errors from the last branch it tried.

It can be used along with the `nom::error::context` combinator to inform about
the parser chain:

```rust
context(
    "string",
    preceded(char('\"'), cut(terminated(parse_str, char('\"')))),
)(i)
```

It is not very usable if printed directly:

```rust
// parsed verbose: Err(
//     Failure(
//         VerboseError {
//             errors: [
//                 (
//                     "1\"hello\" : \"world\"\n }\n } ",
//                     Char(
//                         '}',
//                     ),
//                 ),
//                 (
//                     "{ 1\"hello\" : \"world\"\n }\n } ",
//                     Context(
//                         "map",
//                     ),
//                 ),
//                 (
//                     "{ \"a\"\t: 42,\n \"b\": [ \"x\", \"y\", 12 ] ,\n \"c\": { 1\"hello\" : \"world\"\n }\n } ",
//                     Context(
//                         "map",
//                     ),
//                 ),
//             ],
//         },
//     ),
// )
println!("parsed verbose: {:#?}", json::<VerboseError<&str>>(data));
```

But by looking at the original input and the chain of errors, we can build
a more user-friendly error message. The `nom::error::convert_error` function
can build such a message.

```rust
let e = json::<VerboseError<&str>>(data).finish().err().unwrap();
// here we use the `convert_error` function, to transform a `VerboseError<&str>`
// into a printable trace.
//
// This will print:
// verbose errors - `json::<VerboseError<&str>>(data)`:
// 0: at line 2:
//  "c": { 1"hello" : "world"
//         ^
// expected '}', found 1
//
// 1: at line 2, in map:
//  "c": { 1"hello" : "world"
//       ^
//
// 2: at line 0, in map:
//  { "a" : 42,
//  ^
println!(
    "verbose errors - `json::<VerboseError<&str>>(data)`:\n{}",
    convert_error(data, e)
);
```

Note that `VerboseError` and `convert_error` are meant as a starting point for
language errors and cannot cover all use cases. So a custom
So a custom 220`convert_error` function should probably be written. 221 222### Improving usability: nom_locate and nom-supreme 223 224These crates were developed to improve the user experience when writing nom 225parsers. 226 227#### nom_locate 228 229[nom_locate](https://docs.rs/nom_locate/) wraps the input data in a `Span` 230type that can be understood by nom parsers. That type provides location 231information, like line and column. 232 233#### nom-supreme 234 235[nom-supreme](https://docs.rs/nom-supreme/) provides the `ErrorTree<I>` error 236type, that provides the same chain of parser errors as `VerboseError`, but also 237accumulates errors from the various branches tried by `alt`. 238 239With this error type, you can explore everything that has been tried by the 240parser. 241 242## The `ParseError` trait 243 244If those error types are not enough, we can define our own, by implementing 245the `ParseError<I>` trait. All nom combinators are generic over that trait 246for their errors, so we only need to define it in the parser result type, 247and it will be used everywhere. 248 249```rust 250pub trait ParseError<I>: Sized { 251 /// Creates an error from the input position and an [ErrorKind] 252 fn from_error_kind(input: I, kind: ErrorKind) -> Self; 253 254 /// Combines an existing error with a new one created from the input 255 /// position and an [ErrorKind]. This is useful when backtracking 256 /// through a parse tree, accumulating error context on the way 257 fn append(input: I, kind: ErrorKind, other: Self) -> Self; 258 259 /// Creates an error from an input position and an expected character 260 fn from_char(input: I, _: char) -> Self { 261 Self::from_error_kind(input, ErrorKind::Char) 262 } 263 264 /// Combines two existing errors. 
This function is used to compare errors 265 /// generated in various branches of `alt` 266 fn or(self, other: Self) -> Self { 267 other 268 } 269} 270``` 271 272Any error type has to implement that trait, that requires ways to build an 273error: 274- `from_error_kind`: From the input position and the `ErrorKind` enum that indicates in which parser we got an error 275- `append`: Allows the creation of a chain of errors as we backtrack through the parser tree (various combinators will add more context) 276- `from_char`: Creates an error that indicates which character we were expecting 277- `or`: In combinators like `alt`, allows choosing between errors from various branches (or accumulating them) 278 279We can also implement the `ContextError` trait to support the `context()` 280combinator used by `VerboseError<I>`: 281 282```rust 283pub trait ContextError<I>: Sized { 284 fn add_context(_input: I, _ctx: &'static str, other: Self) -> Self { 285 other 286 } 287} 288``` 289 290And there is also the `FromExternalError<I, E>` used by `map_res` to wrap 291errors returned by other functions: 292 293```rust 294pub trait FromExternalError<I, ExternalError> { 295 fn from_external_error(input: I, kind: ErrorKind, e: ExternalError) -> Self; 296} 297``` 298 299### Example usage 300 301Let's define a debugging error type, that will print something every time an 302error is generated. This will give us a good insight into what the parser tried. 303Since errors can be combined with each other, we want it to keep some info on 304the error that was just returned. 
We'll just store that in a string:

```rust
#[derive(Debug)] // needed to print the parser result with `{:#?}`
struct DebugError {
    message: String,
}
```

Now let's implement `ParseError` and `ContextError` on it:

```rust
use nom::error::{ContextError, ErrorKind, ParseError};

impl ParseError<&str> for DebugError {
    // on one line, we show the error code and the input that caused it
    fn from_error_kind(input: &str, kind: ErrorKind) -> Self {
        let message = format!("{:?}:\t{:?}\n", kind, input);
        println!("{}", message);
        DebugError { message }
    }

    // if combining multiple errors, we show them one after the other
    fn append(input: &str, kind: ErrorKind, other: Self) -> Self {
        let message = format!("{}{:?}:\t{:?}\n", other.message, kind, input);
        println!("{}", message);
        DebugError { message }
    }

    fn from_char(input: &str, c: char) -> Self {
        let message = format!("'{}':\t{:?}\n", c, input);
        println!("{}", message);
        DebugError { message }
    }

    fn or(self, other: Self) -> Self {
        let message = format!("{}\tOR\n{}\n", self.message, other.message);
        println!("{}", message);
        DebugError { message }
    }
}

impl ContextError<&str> for DebugError {
    fn add_context(input: &str, ctx: &'static str, other: Self) -> Self {
        let message = format!("{}\"{}\":\t{:?}\n", other.message, ctx, input);
        println!("{}", message);
        DebugError { message }
    }
}
```

So when calling our JSON parser with this error type, we will get a trace
of all the times a parser stopped and backtracked:

```rust
println!("debug: {:#?}", root::<DebugError>(data));
```

```
AlphaNumeric: "\"\t: 42,\n \"b\": [ \"x\", \"y\", 12 ] ,\n \"c\": { 1\"hello\" : \"world\"\n }\n } "

'{': "42,\n \"b\": [ \"x\", \"y\", 12 ] ,\n \"c\": { 1\"hello\" : \"world\"\n }\n } "

'{': "42,\n \"b\": [ \"x\", \"y\", 12 ] ,\n \"c\": { 1\"hello\" : \"world\"\n }\n } "
"map": "42,\n \"b\": [ \"x\", \"y\", 12 ] ,\n \"c\": { 1\"hello\" : \"world\"\n }\n } "

[..]

AlphaNumeric: "\": { 1\"hello\" : \"world\"\n }\n } "

'"': "1\"hello\" : \"world\"\n }\n } "

'"': "1\"hello\" : \"world\"\n }\n } "
"string": "1\"hello\" : \"world\"\n }\n } "

'}': "1\"hello\" : \"world\"\n }\n } "

'}': "1\"hello\" : \"world\"\n }\n } "
"map": "{ 1\"hello\" : \"world\"\n }\n } "

'}': "1\"hello\" : \"world\"\n }\n } "
"map": "{ 1\"hello\" : \"world\"\n }\n } "
"map": "{ \"a\"\t: 42,\n \"b\": [ \"x\", \"y\", 12 ] ,\n \"c\": { 1\"hello\" : \"world\"\n }\n } "

debug: Err(
    Failure(
        DebugError {
            message: "'}':\t\"1\\\"hello\\\" : \\\"world\\\"\\n }\\n } \"\n\"map\":\t\"{ 1\\\"hello\\\" : \\\"world\\\"\\n }\\n } \"\n\"map\":\t\"{ \\\"a\\\"\\t: 42,\\n \\\"b\\\": [ \\\"x\\\", \\\"y\\\", 12 ] ,\\n \\\"c\\\": { 1\\\"hello\\\" : \\\"world\\\"\\n }\\n } \"\n",
        },
    ),
)
```

Here we can see that when parsing `{ 1\"hello\" : \"world\"\n }\n }`, after
getting past the initial `{`, we tried:
- parsing a `"` because we're expecting a key name, and that parser was part of the
"string" parser
- parsing a `}` because the map might be empty. When this fails, we backtrack
through two recursive map parsers:

```
'}': "1\"hello\" : \"world\"\n }\n } "
"map": "{ 1\"hello\" : \"world\"\n }\n } "
"map": "{ \"a\"\t: 42,\n \"b\": [ \"x\", \"y\", 12 ] ,\n \"c\": { 1\"hello\" : \"world\"\n }\n } "
```

## Debugging parsers

While you are writing your parsers, you will sometimes need to follow
which part of the parser sees which part of the input.

To that end, nom provides the `dbg_dmp` function, which observes
a parser's input and output and prints a hexdump of the input if there was an
error.
Here is what it could print:

```rust
use nom::{bytes::complete::tag, error::dbg_dmp, IResult};

fn f(i: &[u8]) -> IResult<&[u8], &[u8]> {
    dbg_dmp(tag("abcd"), "tag")(i)
}

fn main() {
    let a = &b"efghijkl"[..];

    // Will print the following message:
    // tag: Error(Error(Error { input: [101, 102, 103, 104, 105, 106, 107, 108], code: Tag })) at:
    // 00000000 65 66 67 68 69 6a 6b 6c efghijkl
    let _ = f(a);
}
```

You can go further with the [nom-trace crate](https://github.com/rust-bakery/nom-trace).
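The idea behind `dbg_dmp` is simple enough to sketch without nom. Wrap a parser and run it. On error, print a label, the error and a hexdump of the input the parser was given, then return the result unchanged. The names below (`trace`, `hexdump`, `tag_abcd`) are hypothetical, the parsers are plain functions over `&[u8]` rather than nom parsers, and the dump layout only approximates nom's output.

```rust
use std::fmt::Debug;

// Hypothetical sketch of a `dbg_dmp`-style wrapper, written without nom.

/// Formats `data` as rows of 8 bytes: hex offset, hex bytes, then the
/// bytes as ASCII, with non-printable bytes shown as '.'.
pub fn hexdump(data: &[u8]) -> String {
    let mut out = String::new();
    for (row, chunk) in data.chunks(8).enumerate() {
        let hex: Vec<String> = chunk.iter().map(|b| format!("{:02x}", b)).collect();
        let ascii: String = chunk
            .iter()
            .map(|&b| if b.is_ascii_graphic() || b == b' ' { b as char } else { '.' })
            .collect();
        // 23 = width of 8 two-digit bytes separated by single spaces
        out.push_str(&format!("{:08x}\t{:<23}\t{}\n", row * 8, hex.join(" "), ascii));
    }
    out
}

/// Runs `parser` on `input` and returns its result unchanged; on error it
/// also prints `label`, the error and a hexdump of the input it received.
pub fn trace<'a, O, E: Debug>(
    label: &str,
    parser: impl Fn(&'a [u8]) -> Result<(&'a [u8], O), E>,
    input: &'a [u8],
) -> Result<(&'a [u8], O), E> {
    let res = parser(input);
    if let Err(e) = &res {
        eprintln!("{}: Error({:?}) at:\n{}", label, e, hexdump(input));
    }
    res
}

/// Hypothetical stand-in for `tag("abcd")`: matches the literal prefix "abcd".
pub fn tag_abcd(i: &[u8]) -> Result<(&[u8], &[u8]), String> {
    if i.starts_with(b"abcd") {
        Ok((&i[4..], &i[..4]))
    } else {
        Err(format!("Tag error at {:?}", i))
    }
}
```

Calling `trace("tag", tag_abcd, b"efghijkl")` returns the parser's error unchanged. As a side effect, it prints the label, the error and a one-row dump starting at offset `00000000`. Returning the result untouched is what makes such a wrapper safe to drop around any sub-parser while debugging.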