# Upgrading to nom 5.0

## Changes in error types

**If you have a lot of unit tests, this is probably the biggest issue you'll encounter**

Error management has been rewritten to avoid two issues present in previous
versions:
- The error type was causing type inference issues in macros
- The `verbose-errors` feature changed the API (adding a variant to an enum) and
reduced parsing speed. Since Cargo features are additive, if a
dependency used nom with `verbose-errors`, it was activated for every crate using nom

The new error management simplifies the internal type, removing the `Context`
type and the error conversions that needed to happen everywhere.

Here's how the types change. Before:

```rust
type IResult<I, O, E = u32> = Result<(I, O), Err<I, E>>;

pub enum Err<I, E = u32> {
    Incomplete(Needed),
    Error(Context<I, E>),
    Failure(Context<I, E>),
}

pub enum Context<I, E = u32> {
    Code(I, ErrorKind<E>),
    // this variant only present if `verbose-errors` is active
    List(Vec<(I, ErrorKind<E>)>),
}
```

In nom 5:

```rust
type IResult<I, O, E = (I, ErrorKind)> = Result<(I, O), Err<E>>;

pub enum Err<E> {
    Incomplete(Needed),
    Error(E),
    Failure(E),
}
```

Now the error type is completely generic, so you can choose exactly
what you need, from erasing errors entirely to reproducing the
`verbose-errors` feature with the [`VerboseError` type](https://docs.rs/nom/latest/nom/error/struct.VerboseError.html).
The [`ErrorKind` enum](https://docs.rs/nom/latest/nom/error/enum.ErrorKind.html)
is no longer generic: it does not need to hold a custom error type.

Any error type has to implement the [`ParseError` trait](https://docs.rs/nom/latest/nom/error/trait.ParseError.html),
which specifies methods to build an error from a position in the input data
and an `ErrorKind`, or from another error, etc.
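
To make the shape of that trait more concrete, here is a minimal sketch of a custom error type. It is self-contained and does not depend on nom: the `ErrorKind` enum and the `ParseError` trait below are local stand-ins, reduced to two variants and to the two constructors `from_error_kind` and `append`, and `TraceError` is a hypothetical type invented for this illustration. Refer to the linked documentation for the real definitions.

```rust
// Local stand-ins so the sketch compiles without nom (assumption: the real
// `ErrorKind` has many more variants and the real trait has more methods).
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum ErrorKind {
  Digit,
  Many1,
}

pub trait ParseError<I>: Sized {
  /// Builds an error from a position in the input and an `ErrorKind`.
  fn from_error_kind(input: I, kind: ErrorKind) -> Self;
  /// Combines an existing error with a new position and `ErrorKind`.
  fn append(input: I, kind: ErrorKind, other: Self) -> Self;
}

/// Hypothetical custom error: records every (position, kind) pair it sees.
#[derive(Debug, PartialEq)]
pub struct TraceError<'a> {
  pub trace: Vec<(&'a str, ErrorKind)>,
}

impl<'a> ParseError<&'a str> for TraceError<'a> {
  fn from_error_kind(input: &'a str, kind: ErrorKind) -> Self {
    TraceError { trace: vec![(input, kind)] }
  }

  fn append(input: &'a str, kind: ErrorKind, mut other: Self) -> Self {
    other.trace.push((input, kind));
    other
  }
}

/// Tuple-shaped error, like the default `(I, ErrorKind)`: `append` returns
/// the existing error unchanged.
impl<I> ParseError<I> for (I, ErrorKind) {
  fn from_error_kind(input: I, kind: ErrorKind) -> Self {
    (input, kind)
  }

  fn append(_input: I, _kind: ErrorKind, other: Self) -> Self {
    other
  }
}

/// Erasing errors entirely: the unit type stores nothing at all.
impl<I> ParseError<I> for () {
  fn from_error_kind(_input: I, _kind: ErrorKind) -> Self {}

  fn append(_input: I, _kind: ErrorKind, _other: Self) -> Self {}
}
```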

Since the error types changed, this will probably generate errors
in your unit tests.

Usually, changing `Err(Err::Error(Context::Code(input, error)))` to
`Err(Err::Error((input, error)))` is enough (if you use the default
error type `(Input, ErrorKind)`).
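
As a sketch of what that change looks like in a test, the following self-contained example uses local stand-ins for the nom 5 types (with `ErrorKind` reduced to one variant and `Incomplete` left out) and a hypothetical `abc` parser. Only the shape of the assertion matters here.

```rust
// Stand-ins mirroring the nom 5 definitions, so that the sketch compiles on
// its own (assumption: simplified, `Incomplete(Needed)` is left out).
#[derive(Debug, PartialEq)]
pub enum ErrorKind {
  Tag,
}

#[derive(Debug, PartialEq)]
pub enum Err<E> {
  Error(E),
  Failure(E),
}

pub type IResult<I, O, E = (I, ErrorKind)> = Result<(I, O), Err<E>>;

/// Hypothetical parser under test: recognizes the literal "abc".
pub fn abc(input: &str) -> IResult<&str, &str> {
  if input.starts_with("abc") {
    Ok((&input[3..], &input[..3]))
  } else {
    Err(Err::Error((input, ErrorKind::Tag)))
  }
}

#[test]
fn abc_rejects_other_input() {
  // nom 4 shape: Err(Err::Error(Context::Code("xyz", ErrorKind::Tag)))
  // nom 5 shape: the `Context::Code(..)` wrapper becomes a plain tuple
  assert_eq!(abc("xyz"), Err(Err::Error(("xyz", ErrorKind::Tag))));
}
```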

## Removal of `CompleteStr` and `CompleteByteSlice`

These types were introduced in nom 4 as alternative input types, to
solve issues with streaming parsers.

A core feature of nom is its support for streaming parsers: when you are
handling network packets or large files, you might not have all of the data.
As an example, if you use a parser recognizing numbers and you pass as input
"123", the parser will return `Err(Err::Incomplete(_))` because it cannot decide
if it has the whole input or not. The complete data could be "123;" or "12345".
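
The following sketch illustrates that behaviour, and how a caller might react to it, without depending on nom. The types are simplified local stand-ins, and `streaming_digits` and `first_number` are hypothetical functions written for this example, not nom's implementation.

```rust
// Stand-ins for the nom 5 types (assumption: simplified, with `Needed` and
// `ErrorKind` reduced to one variant each and `Failure` left out).
#[derive(Debug, PartialEq)]
pub enum Needed {
  Unknown,
}

#[derive(Debug, PartialEq)]
pub enum ErrorKind {
  Digit,
}

#[derive(Debug, PartialEq)]
pub enum Err<E> {
  Incomplete(Needed),
  Error(E),
}

pub type IResult<I, O, E = (I, ErrorKind)> = Result<(I, O), Err<E>>;

/// Hypothetical streaming parser: recognizes one or more ASCII digits.
pub fn streaming_digits(input: &str) -> IResult<&str, &str> {
  // index of the first character that is not a digit, if there is one
  match input.find(|c: char| !c.is_ascii_digit()) {
    // the input does not start with a digit: a regular error
    Some(0) => Err(Err::Error((input, ErrorKind::Digit))),
    // a non-digit character marks the end of the number
    Some(end) => Ok((&input[end..], &input[..end])),
    // only digits so far (or nothing at all): more of them could follow
    None => Err(Err::Incomplete(Needed::Unknown)),
  }
}

/// Feeds chunks one by one, retrying the parser on the accumulated buffer,
/// and returns the first number that could be fully recognized.
pub fn first_number(chunks: &[&str]) -> Option<String> {
  let mut buffer = String::new();
  for chunk in chunks {
    buffer.push_str(chunk);
    match streaming_digits(&buffer) {
      Ok((_rest, number)) => return Some(number.to_string()),
      // not enough data yet: wait for the next chunk
      Err(Err::Incomplete(_)) => continue,
      Err(Err::Error(_)) => return None,
    }
  }
  // the data ended while the parser was still waiting for more
  None
}
```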

There are various issues with this approach, though. A lot of formats, especially
text formats, are not meant to be very large, and in all cases the data will be
entirely loaded in memory, so handling the streaming case gets annoying.
And for some binary formats, there will often be some TLV (tag/length/value)
elements for which we already know the complete size of the value.

nom can work on various input types, as long as they implement a common set of
traits, so nom 4 had the `CompleteByteSlice` and `CompleteStr` types as alternatives
to `&[u8]` and `&str`, which changed the behaviour of parsers to assume that the
input data is complete. Unfortunately, those types were hard to handle, since
we would often need to convert them back and forth with their inner types,
and they would appear everywhere in parser signatures. Also, they were unexpectedly
slow, even though they were just wrapper types.

In nom 5, those types were removed, and instead we have *streaming* and *complete*
versions of various function combinators. You can find them in the corresponding
submodules of the `bytes`, `character`, and `number` modules. Since macros cannot
be isolated in modules (they are all at the top level once exported), all macros
have been rewritten to use the *streaming* version.

Upgrading from nom 4 means removing the `CompleteStr` and `CompleteByteSlice` types
if you were using them, and checking which parsers suddenly return `Incomplete` on
valid inputs. That indicates that you will need to replace some macro combinators
with the *complete* function version.
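
The difference between the two versions can be sketched as follows. This is a self-contained illustration: only the `streaming`/`complete` module split mirrors nom 5, while the types, the `split_digits` helper and the `digit1` bodies are simplified stand-ins written for this example.

```rust
// Stand-ins, not the real nom items (assumption: simplified types, with
// `Needed` and `ErrorKind` reduced to one variant each).
#[derive(Debug, PartialEq)]
pub enum Needed {
  Unknown,
}

#[derive(Debug, PartialEq)]
pub enum ErrorKind {
  Digit,
}

#[derive(Debug, PartialEq)]
pub enum Err<E> {
  Incomplete(Needed),
  Error(E),
}

pub type IResult<I, O, E = (I, ErrorKind)> = Result<(I, O), Err<E>>;

/// Hypothetical helper: splits `input` after its leading ASCII digits and
/// returns (digits, rest).
fn split_digits(input: &str) -> (&str, &str) {
  let end = input
    .find(|c: char| !c.is_ascii_digit())
    .unwrap_or(input.len());
  input.split_at(end)
}

pub mod streaming {
  use super::{split_digits, Err, ErrorKind, IResult, Needed};

  /// Reaching the end of the input means more digits might follow.
  pub fn digit1(input: &str) -> IResult<&str, &str> {
    match split_digits(input) {
      (_, "") => Err(Err::Incomplete(Needed::Unknown)),
      ("", _) => Err(Err::Error((input, ErrorKind::Digit))),
      (digits, rest) => Ok((rest, digits)),
    }
  }
}

pub mod complete {
  use super::{split_digits, Err, ErrorKind, IResult};

  /// The input is assumed to be whole: its end also ends the number.
  pub fn digit1(input: &str) -> IResult<&str, &str> {
    match split_digits(input) {
      ("", _) => Err(Err::Error((input, ErrorKind::Digit))),
      (digits, rest) => Ok((rest, digits)),
    }
  }
}
```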

## From macros to functions

nom has used macros as its core tool for a long time, since they were a powerful
tool to generate parsers. The generated code was simple, approximately what
would be written manually, and it was easy for the compiler to optimize.

Unfortunately, macros were sometimes hard to manipulate, since nom was relying
on a few lesser known tricks to build its DSL, and macro parsing errors were
often too cryptic to understand.

nom 5 introduces a new technique to write combinators. Instead of using macros
that can take other macros as arguments and call them by rewriting their argument
list, we have functions that take other functions as arguments, and return
functions.

This technique has a lot of advantages over macros:
- No type inference issues: you can explicitly describe the error type in
function definitions
- Nicer compilation errors: rustc can show you exactly what is missing when calling
a combinator, if you need to import new traits, etc.
- Those functions are actually faster than nom 4's macros when built with link time
optimization
- Small gain in compilation speed (since code can be reused instead of regenerated
everywhere)
- The macros are still there, but were rewritten to use the functions instead, so
they gain the performance benefit immediately

In practice, nom parsers will have the following signature:
`Input -> IResult<Input, Output, Error>`

A function combinator will then have this signature:
`<args> -> impl Fn(Input) -> IResult<Input, Output, Error>`

Here is an example with a simplified `take` combinator:
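
```rust
use nom::{error::ErrorKind, Err, IResult};

pub fn take(count: usize) -> impl Fn(&[u8]) -> IResult<&[u8], &[u8]> {
  move |i: &[u8]| {
    if i.len() < count {
      // not enough input left to take `count` bytes
      Err(Err::Error((i, ErrorKind::Eof)))
    } else {
      // `split_at` returns (taken, rest); `IResult` expects (rest, taken)
      let (taken, rest) = i.split_at(count);
      Ok((rest, taken))
    }
  }
}
```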

`take` generates a closure and returns it. We can use it directly like this:
`take(5)(input)`.

(This version of `take` is simplified: the real one uses generic input
and error types and various traits over these types.)

More complex combinators like `pair` (returns a tuple of the results of 2 parsers)
will be able to combine parsers to make more advanced ones:

```rust
pub fn pair<I, O1, O2, E, F, G>(first: F, second: G) -> impl Fn(I) -> IResult<I, (O1, O2), E>
where
  F: Fn(I) -> IResult<I, O1, E>,
  G: Fn(I) -> IResult<I, O2, E>,
{
  move |input: I| {
    let (input, o1) = first(input)?;
    second(input).map(|(i, o2)| (i, (o1, o2)))
  }
}
```

This combinator is generic over its parser arguments and can assemble them in
the closure that it returns.

You can then use it like this:

```rust
use nom::{
  character::complete::{alpha0, digit0},
  sequence::pair,
  IResult,
};

fn parser(i: &str) -> IResult<&str, (&str, &str)> {
  pair(alpha0, digit0)(i)
}

// will return `Ok((";", ("abc", "123")))`
parser("abc123;");
```
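
Other combinators can be written with the same technique. As a self-contained sketch that does not depend on nom, the example below uses local stand-in types, a simplified `take`, and a `preceded` function that is only named after nom's combinator (its body is written for this illustration). `payload` is a hypothetical parser that runs two parsers and keeps only the second result.

```rust
// Local stand-ins for the nom 5 types, so the sketch compiles on its own
// (assumption: `ErrorKind` is reduced to one variant, `Incomplete` and
// `Failure` are left out).
#[derive(Debug, PartialEq)]
pub enum ErrorKind {
  Eof,
}

#[derive(Debug, PartialEq)]
pub enum Err<E> {
  Error(E),
}

pub type IResult<I, O, E = (I, ErrorKind)> = Result<(I, O), Err<E>>;

/// Simplified `take`: splits off the first `count` bytes of the input.
pub fn take<'a>(count: usize) -> impl Fn(&'a [u8]) -> IResult<&'a [u8], &'a [u8]> {
  move |i: &'a [u8]| {
    if i.len() < count {
      Err(Err::Error((i, ErrorKind::Eof)))
    } else {
      let (taken, rest) = i.split_at(count);
      Ok((rest, taken))
    }
  }
}

/// Sketch of a `preceded`-style combinator: runs `first`, drops its output,
/// then returns the result of `second` on the remaining input.
pub fn preceded<I, O1, O2, E, F, G>(first: F, second: G) -> impl Fn(I) -> IResult<I, O2, E>
where
  F: Fn(I) -> IResult<I, O1, E>,
  G: Fn(I) -> IResult<I, O2, E>,
{
  move |input: I| {
    let (input, _) = first(input)?;
    second(input)
  }
}

/// Hypothetical parser: skips a 2-byte prefix, then returns the next 3 bytes.
pub fn payload(input: &[u8]) -> IResult<&[u8], &[u8]> {
  preceded(take(2), take(3))(input)
}
```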