# Error management

nom's errors are designed with multiple needs in mind:
- indicate which parser failed and where in the input data
- accumulate more context as the error goes up the parser chain
- have a very low overhead, as errors are often discarded by the calling parser (examples: `many0`, `alt`)
- can be modified according to the user's needs, because some languages need a lot more information

To match these requirements, nom parsers have to return the following result
type:

```rust
pub type IResult<I, O, E=nom::error::Error<I>> = Result<(I, O), nom::Err<E>>;

pub enum Err<E> {
    Incomplete(Needed),
    Error(E),
    Failure(E),
}
```

The result is either an `Ok((I, O))` containing the remaining input and the
parsed value, or an `Err(nom::Err<E>)` with `E` the error type.
`nom::Err<E>` is an enum because combinators can have different behaviours
depending on the value. The `Err<E>` enum expresses 3 conditions for a parser error:
- `Incomplete` indicates that a parser did not have enough data to decide. This can be returned by parsers found in `streaming` submodules to indicate that we should buffer more data from a file or socket. Parsers in the `complete` submodules assume that they have the entire input data, so if it was not sufficient, they will instead return an `Err::Error`. When a parser returns `Incomplete`, we should accumulate more data in the buffer (example: reading from a socket) and call the parser again
- `Error` is a normal parser error. If a child parser of the `alt` combinator returns `Error`, it will try another child parser
- `Failure` is an error from which we cannot recover: the `alt` combinator will not try other branches if a child parser returns `Failure`. If we know we were in the right branch (example: we found a correct prefix character but input after that was wrong), we can transform an `Err::Error` into an `Err::Failure` with the `cut()` combinator

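The difference between `Error` and `Failure` can be illustrated with a small, dependency-free model. This is only a sketch: the `Err` enum, `PResult` alias, and the `literal`, `then`, `cut` and `alt2` helpers below are hypothetical stand-ins written for this example, not nom's actual implementation or API.

```rust
// Simplified stand-in for `nom::Err` (assumption: no `Needed` payload).
#[allow(dead_code)]
#[derive(Debug, PartialEq)]
enum Err<E> {
    Incomplete,
    Error(E),
    Failure(E),
}

type PResult<'a, O> = Result<(&'a str, O), Err<String>>;

// Hypothetical leaf parser: matches a literal prefix.
fn literal<'a>(expected: &'static str) -> impl Fn(&'a str) -> PResult<'a, &'a str> {
    move |input: &'a str| match input.strip_prefix(expected) {
        Some(rest) => Ok((rest, &input[..expected.len()])),
        None => Err(Err::Error(format!("expected {:?}", expected))),
    }
}

// Runs `first`, then `second` on the remaining input.
fn then<'a, A, B>(
    first: impl Fn(&'a str) -> PResult<'a, A>,
    second: impl Fn(&'a str) -> PResult<'a, B>,
) -> impl Fn(&'a str) -> PResult<'a, B> {
    move |input: &'a str| {
        let (rest, _) = first(input)?;
        second(rest)
    }
}

// Models `cut`: a recoverable `Error` becomes an unrecoverable `Failure`.
fn cut<'a, O>(parser: impl Fn(&'a str) -> PResult<'a, O>) -> impl Fn(&'a str) -> PResult<'a, O> {
    move |input: &'a str| match parser(input) {
        Err(Err::Error(e)) => Err(Err::Failure(e)),
        other => other,
    }
}

// Models a two-branch `alt`: the second branch runs only after an `Error`.
fn alt2<'a, O>(
    first: impl Fn(&'a str) -> PResult<'a, O>,
    second: impl Fn(&'a str) -> PResult<'a, O>,
) -> impl Fn(&'a str) -> PResult<'a, O> {
    move |input: &'a str| match first(input) {
        Err(Err::Error(_)) => second(input),
        other => other, // Ok, Failure and Incomplete are returned unchanged
    }
}

fn main() {
    // Without `cut`, the first branch fails softly and the second one is tried.
    let soft = alt2(then(literal("a"), literal("b")), literal("ac"));
    assert_eq!(soft("ac"), Ok(("", "ac")));

    // With `cut`, matching "a" commits to the first branch.
    let hard = alt2(then(literal("a"), cut(literal("b"))), literal("ac"));
    assert_eq!(hard("ac"), Err(Err::Failure("expected \"b\"".to_string())));
}
```
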
If we are running a parser and know it will not return `Err::Incomplete`, we can
directly extract the error type from `Err::Error` or `Err::Failure` with the
`finish()` method:

```rust
let parser_result: IResult<I, O, E> = parser(input);
let result: Result<(I, O), E> = parser_result.finish();
```

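As a rough mental model, `finish()` collapses the two error variants into the bare error type. The sketch below uses a local, simplified `Err` enum and a free function named `finish` (both assumptions made for this example, not nom's code). It panics on `Incomplete`, which matches the documented behaviour of nom's `finish()` and is why the method should only be used when `Incomplete` cannot occur.

```rust
// Simplified stand-in for `nom::Err` (assumption: no `Needed` payload).
#[allow(dead_code)]
#[derive(Debug, PartialEq)]
enum Err<E> {
    Incomplete,
    Error(E),
    Failure(E),
}

// Hypothetical free-function version of `finish()`.
fn finish<I, O, E>(result: Result<(I, O), Err<E>>) -> Result<(I, O), E> {
    match result {
        Ok(value) => Ok(value),
        // Both error variants carry the same error type `E`.
        Err(Err::Error(e)) | Err(Err::Failure(e)) => Err(e),
        Err(Err::Incomplete) => panic!("finish() called on an Incomplete result"),
    }
}

fn main() {
    let ok: Result<(&str, u32), Err<&str>> = Ok(("rest", 42));
    assert_eq!(finish(ok), Ok(("rest", 42)));

    let failed: Result<(&str, u32), Err<&str>> = Err(Err::Failure("bad digit"));
    assert_eq!(finish(failed), Err("bad digit"));
}
```
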
If we used a borrowed type as input, like `&[u8]` or `&str`, we might want to
convert it to an owned type to transmit it somewhere, with the `to_owned()`
method:

```rust
// assuming the parser uses the default error type, `nom::error::Error<&[u8]>`
let result: Result<(&[u8], Value), Err<Error<Vec<u8>>>> =
  parser(data).map_err(|e: Err<Error<&[u8]>>| e.to_owned());
```

nom provides a powerful error system that can adapt to your needs: you can
get reduced error information if you want to improve performance, or you can
get a precise trace of parser application, with fine-grained position information.

This is done through the third type parameter of `IResult`, nom's parser result
type:

```rust
pub type IResult<I, O, E=nom::error::Error<I>> = Result<(I, O), Err<E>>;

pub enum Err<E> {
    Incomplete(Needed),
    Error(E),
    Failure(E),
}
```

This error type is completely generic in nom's combinators, so you can choose
exactly which error type you want to use when you define your parsers, or
directly at the call site.
See [the JSON parser](https://github.com/Geal/nom/blob/5405e1173f1052f7e006dcb0b9cfda2b06557b65/examples/json.rs#L209-L286)
for an example of choosing different error types at the call site.

## Common error types

### the default error type: nom::error::Error

```rust
#[derive(Debug, PartialEq)]
pub struct Error<I> {
  /// position of the error in the input data
  pub input: I,
  /// nom error code
  pub code: ErrorKind,
}
```

This structure contains a `nom::error::ErrorKind` indicating which kind of
parser encountered an error (example: `ErrorKind::Tag` for the `tag()`
combinator), and the input position of the error.

This error type is fast and has very low overhead, so it is suitable for
parsers that are called repeatedly, like in network protocols.
It is very limited though: it will not tell you about the chain of
parser calls, so it is not enough to write user-friendly errors.

Example error returned in a JSON-like parser (from `examples/json.rs`):

```rust
let data = "  { \"a\"\t: 42,
  \"b\": [ \"x\", \"y\", 12 ] ,
  \"c\": { 1\"hello\" : \"world\"
  }
  } ";

// will print:
// Err(
//   Failure(
//       Error {
//           input: "1\"hello\" : \"world\"\n  }\n  } ",
//           code: Char,
//       },
//   ),
// )
println!(
  "{:#?}\n",
  json::<Error<&str>>(data)
);
```

### getting more information: nom::error::VerboseError

The `VerboseError<I>` type accumulates more information about the chain of
parsers that encountered an error:

```rust
#[derive(Clone, Debug, PartialEq)]
pub struct VerboseError<I> {
  /// List of errors accumulated by `VerboseError`, containing the affected
  /// part of input data, and some context
  pub errors: crate::lib::std::vec::Vec<(I, VerboseErrorKind)>,
}

#[derive(Clone, Debug, PartialEq)]
/// Error context for `VerboseError`
pub enum VerboseErrorKind {
  /// Static string added by the `context` function
  Context(&'static str),
  /// Indicates which character was expected by the `char` function
  Char(char),
  /// Error kind given by various nom parsers
  Nom(ErrorKind),
}
```

It contains the input position and error code for each of those parsers.
It does not accumulate errors from the different branches of `alt`: it will
only contain errors from the last branch it tried.

It can be used along with the `nom::error::context` combinator to inform about
the parser chain:

```rust
context(
  "string",
  preceded(char('\"'), cut(terminated(parse_str, char('\"')))),
)(i)
```

It is not very usable if printed directly:

```rust
// parsed verbose: Err(
//   Failure(
//       VerboseError {
//           errors: [
//               (
//                   "1\"hello\" : \"world\"\n  }\n  } ",
//                   Char(
//                       '}',
//                   ),
//               ),
//               (
//                   "{ 1\"hello\" : \"world\"\n  }\n  } ",
//                   Context(
//                       "map",
//                   ),
//               ),
//               (
//                   "{ \"a\"\t: 42,\n  \"b\": [ \"x\", \"y\", 12 ] ,\n  \"c\": { 1\"hello\" : \"world\"\n  }\n  } ",
//                   Context(
//                       "map",
//                   ),
//               ),
//           ],
//       },
//   ),
// )
println!("parsed verbose: {:#?}", json::<VerboseError<&str>>(data));
```

But by looking at the original input and the chain of errors, we can build
a more user friendly error message. The `nom::error::convert_error` function
can build such a message.

```rust
let e = json::<VerboseError<&str>>(data).finish().err().unwrap();
// here we use the `convert_error` function, to transform a `VerboseError<&str>`
// into a printable trace.
//
// This will print:
// verbose errors - `json::<VerboseError<&str>>(data)`:
// 0: at line 2:
//   "c": { 1"hello" : "world"
//          ^
// expected '}', found 1
//
// 1: at line 2, in map:
//   "c": { 1"hello" : "world"
//        ^
//
// 2: at line 0, in map:
//   { "a" : 42,
//   ^
println!(
  "verbose errors - `json::<VerboseError<&str>>(data)`:\n{}",
  convert_error(data, e)
);
```

Note that `VerboseError` and `convert_error` are meant as a starting point for
language errors, and cannot cover all use cases. A custom `convert_error`
function should probably be written for anything beyond simple cases.

### Improving usability: nom_locate and nom-supreme

These crates were developed to improve the user experience when writing nom
parsers.

#### nom_locate

[nom_locate](https://docs.rs/nom_locate/) wraps the input data in a `Span`
type that can be understood by nom parsers. That type provides location
information, like line and column.

#### nom-supreme

[nom-supreme](https://docs.rs/nom-supreme/) provides the `ErrorTree<I>` error
type, which provides the same chain of parser errors as `VerboseError`, but also
accumulates errors from the various branches tried by `alt`.

With this error type, you can explore everything that has been tried by the
parser.

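To make "location information" concrete, here is a minimal sketch that derives a line and column after the fact, from the original input and the remaining input stored in an error. The `line_col` helper is hypothetical and is not part of nom_locate's API. It assumes `remaining` is a suffix of `original`, which is what nom's default errors hold for `&str` input.

```rust
/// Returns the 1-based (line, column) at which `remaining` starts in `original`.
/// Assumption: `remaining` is a suffix of `original`.
fn line_col(original: &str, remaining: &str) -> (usize, usize) {
    let offset = original.len() - remaining.len();
    let consumed = &original[..offset];
    // Each newline already consumed moves us one line further down.
    let line = consumed.matches('\n').count() + 1;
    // The column is counted in characters since the last newline.
    let column = match consumed.rfind('\n') {
        Some(pos) => consumed[pos + 1..].chars().count() + 1,
        None => consumed.chars().count() + 1,
    };
    (line, column)
}

fn main() {
    let input = "{\n  \"a\": 1x\n}";
    // Pretend a parser stopped at the unexpected `x`.
    let remaining = &input[input.find('x').unwrap()..];
    assert_eq!(line_col(input, remaining), (2, 9));
}
```
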
## The `ParseError` trait

If those error types are not enough, we can define our own, by implementing
the `ParseError<I>` trait. All nom combinators are generic over that trait
for their errors, so we only need to define it in the parser result type,
and it will be used everywhere.

```rust
pub trait ParseError<I>: Sized {
    /// Creates an error from the input position and an [ErrorKind]
    fn from_error_kind(input: I, kind: ErrorKind) -> Self;

    /// Combines an existing error with a new one created from the input
    /// position and an [ErrorKind]. This is useful when backtracking
    /// through a parse tree, accumulating error context on the way
    fn append(input: I, kind: ErrorKind, other: Self) -> Self;

    /// Creates an error from an input position and an expected character
    fn from_char(input: I, _: char) -> Self {
        Self::from_error_kind(input, ErrorKind::Char)
    }

    /// Combines two existing errors. This function is used to compare errors
    /// generated in various branches of `alt`
    fn or(self, other: Self) -> Self {
        other
    }
}
```

Any error type has to implement that trait, which requires ways to build an
error:
- `from_error_kind`: From the input position and the `ErrorKind` enum that indicates in which parser we got an error
- `append`: Allows the creation of a chain of errors as we backtrack through the parser tree (various combinators will add more context)
- `from_char`: Creates an error that indicates which character we were expecting
- `or`: In combinators like `alt`, allows choosing between errors from various branches (or accumulating them)

We can also implement the `ContextError` trait to support the `context()`
combinator used by `VerboseError<I>`:

```rust
pub trait ContextError<I>: Sized {
    fn add_context(_input: I, _ctx: &'static str, other: Self) -> Self {
        other
    }
}
```

There is also the `FromExternalError<I, E>` trait, used by `map_res` to wrap
errors returned by other functions:

```rust
pub trait FromExternalError<I, ExternalError> {
  fn from_external_error(input: I, kind: ErrorKind, e: ExternalError) -> Self;
}
```

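The default `or` simply keeps the error of the last branch. As a sketch of the "accumulating" alternative, the following dependency-free example keeps every branch that was tried. It uses a reduced local copy of the trait and a three-variant `ErrorKind` so that it compiles without nom; the `AllBranches` type is hypothetical, and flattening everything into one `Vec` is a simplification (a tree, as in `ErrorTree`, preserves more structure).

```rust
// Reduced local stand-ins for nom's `ErrorKind` and `ParseError` (assumptions).
#[derive(Debug, Clone, Copy, PartialEq)]
enum ErrorKind {
    Tag,
    Digit,
    Alt,
}

trait ParseError<I>: Sized {
    fn from_error_kind(input: I, kind: ErrorKind) -> Self;
    fn append(input: I, kind: ErrorKind, other: Self) -> Self;
    fn or(self, other: Self) -> Self {
        other
    }
}

// Hypothetical error type that remembers every failed attempt.
#[derive(Debug, PartialEq)]
struct AllBranches<'a> {
    tried: Vec<(&'a str, ErrorKind)>,
}

impl<'a> ParseError<&'a str> for AllBranches<'a> {
    fn from_error_kind(input: &'a str, kind: ErrorKind) -> Self {
        AllBranches { tried: vec![(input, kind)] }
    }

    // Context added while backtracking is pushed after the existing entries.
    fn append(input: &'a str, kind: ErrorKind, mut other: Self) -> Self {
        other.tried.push((input, kind));
        other
    }

    // Instead of discarding `self`, keep the entries of both branches.
    fn or(mut self, other: Self) -> Self {
        self.tried.extend(other.tried);
        self
    }
}

fn main() {
    let input = "xyz";
    let first = AllBranches::from_error_kind(input, ErrorKind::Tag);
    let second = AllBranches::from_error_kind(input, ErrorKind::Digit);
    // A choice combinator could merge both branches, then add its own kind.
    let merged = AllBranches::append(input, ErrorKind::Alt, first.or(second));
    assert_eq!(
        merged.tried,
        vec![("xyz", ErrorKind::Tag), ("xyz", ErrorKind::Digit), ("xyz", ErrorKind::Alt)]
    );
}
```
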
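As an illustration of what such a wrapper might look like, here is a sketch that stores the message of a `std::num::ParseIntError` next to the input position. The trait and `ErrorKind` are reduced local copies so the example compiles without nom, and `NumberError` and `parse_u8` are hypothetical names; `parse_u8` only plays the role that `map_res` would play around a digit parser.

```rust
use std::num::ParseIntError;

// Reduced local stand-ins for nom's `ErrorKind` and trait (assumptions).
#[derive(Debug, Clone, Copy, PartialEq)]
enum ErrorKind {
    MapRes,
}

trait FromExternalError<I, ExternalError> {
    fn from_external_error(input: I, kind: ErrorKind, e: ExternalError) -> Self;
}

// Hypothetical error type keeping the external error's message.
#[derive(Debug, PartialEq)]
struct NumberError<'a> {
    input: &'a str,
    kind: ErrorKind,
    cause: String,
}

impl<'a> FromExternalError<&'a str, ParseIntError> for NumberError<'a> {
    fn from_external_error(input: &'a str, kind: ErrorKind, e: ParseIntError) -> Self {
        NumberError { input, kind, cause: e.to_string() }
    }
}

// Converts the text to a `u8`, wrapping any conversion error.
fn parse_u8(input: &str) -> Result<u8, NumberError<'_>> {
    input
        .parse::<u8>()
        .map_err(|e| NumberError::from_external_error(input, ErrorKind::MapRes, e))
}

fn main() {
    assert_eq!(parse_u8("42"), Ok(42));

    // 300 does not fit in a u8, so the external error is wrapped.
    let err = parse_u8("300").unwrap_err();
    assert_eq!(err.input, "300");
    assert_eq!(err.kind, ErrorKind::MapRes);
    println!("cause: {}", err.cause);
}
```
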
### Example usage

Let's define a debugging error type, that will print something every time an
error is generated. This will give us a good insight into what the parser tried.
Since errors can be combined with each other, we want it to keep some info on
the error that was just returned. We'll just store that in a string:

```rust
struct DebugError {
    message: String,
}
```

Now let's implement `ParseError` and `ContextError` on it:

```rust
impl ParseError<&str> for DebugError {
    // on one line, we show the error code and the input that caused it
    fn from_error_kind(input: &str, kind: ErrorKind) -> Self {
        let message = format!("{:?}:\t{:?}\n", kind, input);
        println!("{}", message);
        DebugError { message }
    }

    // if combining multiple errors, we show them one after the other
    fn append(input: &str, kind: ErrorKind, other: Self) -> Self {
        let message = format!("{}{:?}:\t{:?}\n", other.message, kind, input);
        println!("{}", message);
        DebugError { message }
    }

    fn from_char(input: &str, c: char) -> Self {
        let message = format!("'{}':\t{:?}\n", c, input);
        println!("{}", message);
        DebugError { message }
    }

    fn or(self, other: Self) -> Self {
        let message = format!("{}\tOR\n{}\n", self.message, other.message);
        println!("{}", message);
        DebugError { message }
    }
}

impl ContextError<&str> for DebugError {
    fn add_context(input: &str, ctx: &'static str, other: Self) -> Self {
        let message = format!("{}\"{}\":\t{:?}\n", other.message, ctx, input);
        println!("{}", message);
        DebugError { message }
    }
}
```

So when calling our JSON parser with this error type, we will get a trace
of all the times a parser stopped and backtracked:

```rust
println!("debug: {:#?}", root::<DebugError>(data));
```

```
AlphaNumeric:   "\"\t: 42,\n  \"b\": [ \"x\", \"y\", 12 ] ,\n  \"c\": { 1\"hello\" : \"world\"\n  }\n  } "

'{':    "42,\n  \"b\": [ \"x\", \"y\", 12 ] ,\n  \"c\": { 1\"hello\" : \"world\"\n  }\n  } "

'{':    "42,\n  \"b\": [ \"x\", \"y\", 12 ] ,\n  \"c\": { 1\"hello\" : \"world\"\n  }\n  } "
"map":  "42,\n  \"b\": [ \"x\", \"y\", 12 ] ,\n  \"c\": { 1\"hello\" : \"world\"\n  }\n  } "

[..]

AlphaNumeric:   "\": { 1\"hello\" : \"world\"\n  }\n  } "

'"':    "1\"hello\" : \"world\"\n  }\n  } "

'"':    "1\"hello\" : \"world\"\n  }\n  } "
"string":       "1\"hello\" : \"world\"\n  }\n  } "

'}':    "1\"hello\" : \"world\"\n  }\n  } "

'}':    "1\"hello\" : \"world\"\n  }\n  } "
"map":  "{ 1\"hello\" : \"world\"\n  }\n  } "

'}':    "1\"hello\" : \"world\"\n  }\n  } "
"map":  "{ 1\"hello\" : \"world\"\n  }\n  } "
"map":  "{ \"a\"\t: 42,\n  \"b\": [ \"x\", \"y\", 12 ] ,\n  \"c\": { 1\"hello\" : \"world\"\n  }\n  } "

debug: Err(
    Failure(
        DebugError {
            message: "'}':\t\"1\\\"hello\\\" : \\\"world\\\"\\n  }\\n  } \"\n\"map\":\t\"{ 1\\\"hello\\\" : \\\"world
\\"\\n  }\\n  } \"\n\"map\":\t\"{ \\\"a\\\"\\t: 42,\\n  \\\"b\\\": [ \\\"x\\\", \\\"y\\\", 12 ] ,\\n  \\\"c\\\": { 1\
\"hello\\\" : \\\"world\\\"\\n  }\\n  } \"\n",
        },
    ),
)
```

Here we can see that when parsing `{ 1\"hello\" : \"world\"\n  }\n  }`, after
getting past the initial `{`, we tried:
- parsing a `"` because we're expecting a key name, and that parser was part of the
"string" parser
- parsing a `}` because the map might be empty. When this fails, we backtrack,
through 2 recursive map parsers:

```
'}':    "1\"hello\" : \"world\"\n  }\n  } "
"map":  "{ 1\"hello\" : \"world\"\n  }\n  } "
"map":  "{ \"a\"\t: 42,\n  \"b\": [ \"x\", \"y\", 12 ] ,\n  \"c\": { 1\"hello\" : \"world\"\n  }\n  } "
```

## Debugging parsers

While you are writing your parsers, you will sometimes need to follow
which part of the parser sees which part of the input.

To that end, nom provides the `dbg_dmp` function that will observe
a parser's input and output, and print a hexdump of the input if there was an
error. Here is what it could return:

```rust
fn f(i: &[u8]) -> IResult<&[u8], &[u8]> {
    dbg_dmp(tag("abcd"), "tag")(i)
}

let a = &b"efghijkl"[..];

// Will print the following message:
// tag: Error(Error(Error { input: [101, 102, 103, 104, 105, 106, 107, 108], code: Tag })) at:
// 00000000        65 66 67 68 69 6a 6b 6c         efghijkl
f(a);
```

You can go further with the [nom-trace crate](https://github.com/rust-bakery/nom-trace).