# Error management nom's errors are designed with multiple needs in mind: - indicate which parser failed and where in the input data - accumulate more context as the error goes up the parser chain - have a very low overhead, as errors are often discarded by the calling parser (examples: `many0`, `alt`) - can be modified according to the user's needs, because some languages need a lot more information To match these requirements, nom parsers have to return the following result type: ```rust pub type IResult> = Result<(I, O), nom::Err>; pub enum Err { Incomplete(Needed), Error(E), Failure(E), } ``` The result is either an `Ok((I, O))` containing the remaining input and the parsed value, or an `Err(nom::Err)` with `E` the error type. `nom::Err` is an enum because combinators can have different behaviours depending on the value. The `Err` enum expresses 3 conditions for a parser error: - `Incomplete` indicates that a parser did not have enough data to decide. This can be returned by parsers found in `streaming` submodules to indicate that we should buffer more data from a file or socket. Parsers in the `complete` submodules assume that they have the entire input data, so if it was not sufficient, they will instead return a `Err::Error`. When a parser returns `Incomplete`, we should accumulate more data in the buffer (example: reading from a socket) and call the parser again - `Error` is a normal parser error. If a child parser of the `alt` combinator returns `Error`, it will try another child parser - `Failure` is an error from which we cannot recover: The `alt` combinator will not try other branches if a child parser returns `Failure`. If we know we were in the right branch (example: we found a correct prefix character but input after that was wrong), we can transform a `Err::Error` into a `Err::Failure` with the `cut()` combinator If we are running a parser and know it will not return `Err::Incomplete`, we can directly extract the error type from `Err::Error` or `Err::Failure` with the `finish()` method: ```rust let parser_result: IResult = parser(input); let result: Result<(I, O), E> = parser_result.finish(); ``` If we used a borrowed type as input, like `&[u8]` or `&str`, we might want to convert it to an owned type to transmit it somewhere, with the `to_owned()` method: ```rust let result: Result<(&[u8], Value), Err>> = parser(data).map_err(|e: E<&[u8]>| -> e.to_owned()); ``` nom provides a powerful error system that can adapt to your needs: you can get reduced error information if you want to improve performance, or you can get a precise trace of parser application, with fine grained position information. This is done through the third type parameter of `IResult`, nom's parser result type: ```rust pub type IResult> = Result<(I, O), Err>; pub enum Err { Incomplete(Needed), Error(E), Failure(E), } ``` This error type is completely generic in nom's combinators, so you can choose exactly which error type you want to use when you define your parsers, or directly at the call site. See [the JSON parser](https://github.com/Geal/nom/blob/5405e1173f1052f7e006dcb0b9cfda2b06557b65/examples/json.rs#L209-L286) for an example of choosing different error types at the call site. ## Common error types ### the default error type: nom::error::Error ```rust #[derive(Debug, PartialEq)] pub struct Error { /// position of the error in the input data pub input: I, /// nom error code pub code: ErrorKind, } ``` This structure contains a `nom::error::ErrorKind` indicating which kind of parser encountered an error (example: `ErrorKind::Tag` for the `tag()` combinator), and the input position of the error. This error type is fast and has very low overhead, so it is suitable for parsers that are called repeatedly, like in network protocols. It is very limited though, it will not tell you about the chain of parser calls, so it is not enough to write user friendly errors. Example error returned in a JSON-like parser (from `examples/json.rs`): ```rust let data = " { \"a\"\t: 42, \"b\": [ \"x\", \"y\", 12 ] , \"c\": { 1\"hello\" : \"world\" } } "; // will print: // Err( // Failure( // Error { // input: "1\"hello\" : \"world\"\n }\n } ", // code: Char, // }, // ), // ) println!( "{:#?}\n", json::>(data) ); ``` ### getting more information: nom::error::VerboseError The `VerboseError` type accumulates more information about the chain of parsers that encountered an error: ```rust #[derive(Clone, Debug, PartialEq)] pub struct VerboseError { /// List of errors accumulated by `VerboseError`, containing the affected /// part of input data, and some context pub errors: crate::lib::std::vec::Vec<(I, VerboseErrorKind)>, } #[derive(Clone, Debug, PartialEq)] /// Error context for `VerboseError` pub enum VerboseErrorKind { /// Static string added by the `context` function Context(&'static str), /// Indicates which character was expected by the `char` function Char(char), /// Error kind given by various nom parsers Nom(ErrorKind), } ``` It contains the input position and error code for each of those parsers. It does not accumulate errors from the different branches of `alt`, it will only contain errors from the last branch it tried. It can be used along with the `nom::error::context` combinator to inform about the parser chain: ```rust context( "string", preceded(char('\"'), cut(terminated(parse_str, char('\"')))), )(i) ``` It is not very usable if printed directly: ```rust // parsed verbose: Err( // Failure( // VerboseError { // errors: [ // ( // "1\"hello\" : \"world\"\n }\n } ", // Char( // '}', // ), // ), // ( // "{ 1\"hello\" : \"world\"\n }\n } ", // Context( // "map", // ), // ), // ( // "{ \"a\"\t: 42,\n \"b\": [ \"x\", \"y\", 12 ] ,\n \"c\": { 1\"hello\" : \"world\"\n }\n } ", // Context( // "map", // ), // ), // ], // }, // ), // ) println!("parsed verbose: {:#?}", json::>(data)); ``` But by looking at the original input and the chain of errors, we can build a more user friendly error message. The `nom::error::convert_error` function can build such a message. ```rust let e = json::>(data).finish().err().unwrap(); // here we use the `convert_error` function, to transform a `VerboseError<&str>` // into a printable trace. // // This will print: // verbose errors - `json::>(data)`: // 0: at line 2: // "c": { 1"hello" : "world" // ^ // expected '}', found 1 // // 1: at line 2, in map: // "c": { 1"hello" : "world" // ^ // // 2: at line 0, in map: // { "a" : 42, // ^ println!( "verbose errors - `json::>(data)`:\n{}", convert_error(data, e) ); ``` Note that `VerboseError` and `convert_error` are meant as a starting point for language errors, but that they cannot cover all use cases. So a custom `convert_error` function should probably be written. ### Improving usability: nom_locate and nom-supreme These crates were developed to improve the user experience when writing nom parsers. #### nom_locate [nom_locate](https://docs.rs/nom_locate/) wraps the input data in a `Span` type that can be understood by nom parsers. That type provides location information, like line and column. #### nom-supreme [nom-supreme](https://docs.rs/nom-supreme/) provides the `ErrorTree` error type, that provides the same chain of parser errors as `VerboseError`, but also accumulates errors from the various branches tried by `alt`. With this error type, you can explore everything that has been tried by the parser. ## The `ParseError` trait If those error types are not enough, we can define our own, by implementing the `ParseError` trait. All nom combinators are generic over that trait for their errors, so we only need to define it in the parser result type, and it will be used everywhere. ```rust pub trait ParseError: Sized { /// Creates an error from the input position and an [ErrorKind] fn from_error_kind(input: I, kind: ErrorKind) -> Self; /// Combines an existing error with a new one created from the input /// position and an [ErrorKind]. This is useful when backtracking /// through a parse tree, accumulating error context on the way fn append(input: I, kind: ErrorKind, other: Self) -> Self; /// Creates an error from an input position and an expected character fn from_char(input: I, _: char) -> Self { Self::from_error_kind(input, ErrorKind::Char) } /// Combines two existing errors. This function is used to compare errors /// generated in various branches of `alt` fn or(self, other: Self) -> Self { other } } ``` Any error type has to implement that trait, that requires ways to build an error: - `from_error_kind`: From the input position and the `ErrorKind` enum that indicates in which parser we got an error - `append`: Allows the creation of a chain of errors as we backtrack through the parser tree (various combinators will add more context) - `from_char`: Creates an error that indicates which character we were expecting - `or`: In combinators like `alt`, allows choosing between errors from various branches (or accumulating them) We can also implement the `ContextError` trait to support the `context()` combinator used by `VerboseError`: ```rust pub trait ContextError: Sized { fn add_context(_input: I, _ctx: &'static str, other: Self) -> Self { other } } ``` And there is also the `FromExternalError` used by `map_res` to wrap errors returned by other functions: ```rust pub trait FromExternalError { fn from_external_error(input: I, kind: ErrorKind, e: ExternalError) -> Self; } ``` ### Example usage Let's define a debugging error type, that will print something every time an error is generated. This will give us a good insight into what the parser tried. Since errors can be combined with each other, we want it to keep some info on the error that was just returned. We'll just store that in a string: ```rust struct DebugError { message: String, } ``` Now let's implement `ParseError` and `ContextError` on it: ```rust impl ParseError<&str> for DebugError { // on one line, we show the error code and the input that caused it fn from_error_kind(input: &str, kind: ErrorKind) -> Self { let message = format!("{:?}:\t{:?}\n", kind, input); println!("{}", message); DebugError { message } } // if combining multiple errors, we show them one after the other fn append(input: &str, kind: ErrorKind, other: Self) -> Self { let message = format!("{}{:?}:\t{:?}\n", other.message, kind, input); println!("{}", message); DebugError { message } } fn from_char(input: &str, c: char) -> Self { let message = format!("'{}':\t{:?}\n", c, input); println!("{}", message); DebugError { message } } fn or(self, other: Self) -> Self { let message = format!("{}\tOR\n{}\n", self.message, other.message); println!("{}", message); DebugError { message } } } impl ContextError<&str> for DebugError { fn add_context(input: &str, ctx: &'static str, other: Self) -> Self { let message = format!("{}\"{}\":\t{:?}\n", other.message, ctx, input); println!("{}", message); DebugError { message } } } ``` So when calling our JSON parser with this error type, we will get a trace of all the times a parser stoppped and backtracked: ```rust println!("debug: {:#?}", root::(data)); ``` ``` AlphaNumeric: "\"\t: 42,\n \"b\": [ \"x\", \"y\", 12 ] ,\n \"c\": { 1\"hello\" : \"world\"\n }\n } " '{': "42,\n \"b\": [ \"x\", \"y\", 12 ] ,\n \"c\": { 1\"hello\" : \"world\"\n }\n } " '{': "42,\n \"b\": [ \"x\", \"y\", 12 ] ,\n \"c\": { 1\"hello\" : \"world\"\n }\n } " "map": "42,\n \"b\": [ \"x\", \"y\", 12 ] ,\n \"c\": { 1\"hello\" : \"world\"\n }\n } " [..] AlphaNumeric: "\": { 1\"hello\" : \"world\"\n }\n } " '"': "1\"hello\" : \"world\"\n }\n } " '"': "1\"hello\" : \"world\"\n }\n } " "string": "1\"hello\" : \"world\"\n }\n } " '}': "1\"hello\" : \"world\"\n }\n } " '}': "1\"hello\" : \"world\"\n }\n } " "map": "{ 1\"hello\" : \"world\"\n }\n } " '}': "1\"hello\" : \"world\"\n }\n } " "map": "{ 1\"hello\" : \"world\"\n }\n } " "map": "{ \"a\"\t: 42,\n \"b\": [ \"x\", \"y\", 12 ] ,\n \"c\": { 1\"hello\" : \"world\"\n }\n } " debug: Err( Failure( DebugError { message: "'}':\t\"1\\\"hello\\\" : \\\"world\\\"\\n }\\n } \"\n\"map\":\t\"{ 1\\\"hello\\\" : \\\"world \\"\\n }\\n } \"\n\"map\":\t\"{ \\\"a\\\"\\t: 42,\\n \\\"b\\\": [ \\\"x\\\", \\\"y\\\", 12 ] ,\\n \\\"c\\\": { 1\ \"hello\\\" : \\\"world\\\"\\n }\\n } \"\n", }, ), ) ``` Here we can see that when parsing `{ 1\"hello\" : \"world\"\n }\n }`, after getting past the initial `{`, we tried: - parsing a `"` because we're expecting a key name, and that parser was part of the "string" parser - parsing a `}` because the map might be empty. When this fails, we backtrack, through 2 recursive map parsers: ``` '}': "1\"hello\" : \"world\"\n }\n } " "map": "{ 1\"hello\" : \"world\"\n }\n } " "map": "{ \"a\"\t: 42,\n \"b\": [ \"x\", \"y\", 12 ] ,\n \"c\": { 1\"hello\" : \"world\"\n }\n } " ``` ## Debugging parsers While you are writing your parsers, you will sometimes need to follow which part of the parser sees which part of the input. To that end, nom provides the `dbg_dmp` function that will observe a parser's input and output, and print a hexdump of the input if there was an error. Here is what it could return: ```rust fn f(i: &[u8]) -> IResult<&[u8], &[u8]> { dbg_dmp(tag("abcd"), "tag")(i) } let a = &b"efghijkl"[..]; // Will print the following message: // tag: Error(Error(Error { input: [101, 102, 103, 104, 105, 106, 107, 108], code: Tag })) at: // 00000000 65 66 67 68 69 6a 6b 6c efghijkl f(a); ``` You can go further with the [nom-trace crate](https://github.com/rust-bakery/nom-trace)