# Upgrading to nom 5.0

## Changes in error types

**If you have a lot of unit tests, this is probably the biggest issue you'll encounter**

Error management has been rewritten to avoid two issues present in previous
versions:
- The error type was causing type inference issues in macros
- The `verbose-errors` feature was changing the API (adding a variant in an enum)
  and reducing the parsing speed. Since Cargo features are additive, if any
  dependency used nom with `verbose-errors`, it would be activated for all
  dependencies

The new error management simplifies the internal type, removing the `Context`
type and the error conversions that needed to happen everywhere.

Here's how the types change. Before:

```rust
type IResult<I, O, E = u32> = Result<(I, O), Err<I, E>>;

pub enum Err<I, E = u32> {
  Incomplete(Needed),
  Error(Context<I, E>),
  Failure(Context<I, E>),
}

pub enum Context<I, E = u32> {
  Code(I, ErrorKind<E>),
  // this variant is only present if `verbose-errors` is active
  List(Vec<(I, ErrorKind<E>)>),
}
```

In nom 5:

```rust
type IResult<I, O, E = (I, ErrorKind)> = Result<(I, O), Err<E>>;

pub enum Err<E> {
  Incomplete(Needed),
  Error(E),
  Failure(E),
}
```

Now the error type is completely generic, so you can choose exactly
what you need, from erasing errors entirely, to reproducing the
`verbose-errors` feature with the [`VerboseError` type](https://docs.rs/nom/latest/nom/error/struct.VerboseError.html).
The [`ErrorKind` enum](https://docs.rs/nom/latest/nom/error/enum.ErrorKind.html)
is no longer generic: it does not need to hold a custom error type.

Any error type has to implement the [`ParseError` trait](https://docs.rs/nom/latest/nom/error/trait.ParseError.html),
which specifies methods to build an error from a position in the input data
and an `ErrorKind`, from another error, etc.
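
To make the idea of a fully generic error type more concrete, here is a minimal, self-contained sketch. It does not use nom itself: `ErrorKind`, `Err`, `IResult` and `ParseError` below are reduced local stand-ins that only mirror the shapes shown above (the real trait has more methods, and the real `Err` also has an `Incomplete` variant), and `digit1` is a hand-written illustration rather than nom's implementation. It shows one parser used once with the default `(I, ErrorKind)` error and once with `()`, which erases the error entirely.

```rust
// Reduced local stand-ins mirroring the nom 5 shapes described above.
// They are assumptions for illustration, not nom's actual definitions.
#[derive(Debug, PartialEq, Clone, Copy)]
pub enum ErrorKind {
    Digit,
}

#[derive(Debug, PartialEq)]
#[allow(dead_code)]
pub enum Err<E> {
    Error(E),
    Failure(E),
}

pub type IResult<I, O, E = (I, ErrorKind)> = Result<(I, O), Err<E>>;

// Simplified trait: builds an error from an input position and an ErrorKind.
pub trait ParseError<I>: Sized {
    fn from_error_kind(input: I, kind: ErrorKind) -> Self;
}

// Default error type: remember where the error happened and what kind it was.
impl<I> ParseError<I> for (I, ErrorKind) {
    fn from_error_kind(input: I, kind: ErrorKind) -> Self {
        (input, kind)
    }
}

// Erased error type: carries no information at all.
impl<I> ParseError<I> for () {
    fn from_error_kind(_input: I, _kind: ErrorKind) -> Self {}
}

// A parser generic over its error type: recognizes one or more ASCII digits.
pub fn digit1<'a, E: ParseError<&'a str>>(input: &'a str) -> IResult<&'a str, &'a str, E> {
    let end = input
        .find(|c: char| !c.is_ascii_digit())
        .unwrap_or(input.len());
    if end == 0 {
        Err(Err::Error(E::from_error_kind(input, ErrorKind::Digit)))
    } else {
        Ok((&input[end..], &input[..end]))
    }
}

fn main() {
    // Default error type: (input, ErrorKind)
    let ok: IResult<&str, &str> = digit1("42abc");
    assert_eq!(ok, Ok(("abc", "42")));

    let err: IResult<&str, &str> = digit1("abc");
    assert_eq!(err, Err(Err::Error(("abc", ErrorKind::Digit))));

    // Same parser, errors erased by choosing `()` as the error type
    let erased: IResult<&str, &str, ()> = digit1("abc");
    assert_eq!(erased, Err(Err::Error(())));
}
```

The caller picks the error type through the return type annotation; the parser body stays the same in both cases.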

Since the error types change, this will probably generate errors
in your unit tests.

Usually, changing an `Err(Err::Error(Context::Code(input, error)))` to
`Err(Err::Error((input, error)))` is enough (if you use the default
error type `(Input, ErrorKind)`).

## Removal of `CompleteStr` and `CompleteByteSlice`

Those types were introduced in nom 4 as alternative input types, to
solve issues with streaming parsers.

A core feature of nom is its support for streaming parsers: when you are
handling network packets or large files, you might not have all of the data.
As an example, if you use a parser recognizing numbers and you pass as input
"123", the parser will return `Err(Err::Incomplete(_))` because it cannot decide
if it has the whole input or not. The complete data could be "123;" or "12345".

There are various issues with this approach, though. A lot of formats, especially
text formats, are not meant to be very large, and in those cases the data will be
entirely loaded in memory, so handling the streaming case gets annoying.
And for some binary formats, there will often be some TLV (tag/length/value)
elements for which we already know the complete size of the value.

nom can work on various input types, as long as they implement a common set of
traits, so nom 4 had the `CompleteByteSlice` and `CompleteStr` types as alternatives
to `&[u8]` and `&str`, which changed the behaviour of parsers to assume that the
input data is complete. Unfortunately, those types were hard to handle, since
we would often need to convert them back and forth with their inner types,
and they would appear everywhere in parser signatures. Also, they were unexpectedly
slow, even though they were just wrapper types.

In nom 5, those types were removed, and instead we have *streaming* and *complete*
versions of various function combinators.
You can find them in the corresponding
submodules of the `bytes`, `character`, and `number` modules. Since macros cannot
be isolated in modules (they are all at the top level once exported), all macros
have been rewritten to use the *streaming* version.

Upgrading from nom 4 means removing the `CompleteStr` and `CompleteByteSlice` types
if you were using them, and checking which parsers suddenly return `Incomplete` on
valid inputs. Such a result indicates that you need to replace some macro
combinators with the *complete* function version.

## From macros to functions

nom has used macros as its core tool for a long time, since they were a powerful
way to generate parsers. The generated code was simple, approximately what
you would write by hand, and it was easy for the compiler to optimize.

Unfortunately, macros were sometimes hard to manipulate, since nom was relying
on a few lesser known tricks to build its DSL, and macro parsing errors were
often too cryptic to understand.

nom 5 introduces a new technique to write combinators. Instead of using macros
that can take other macros as arguments and call them by rewriting their argument
list, we have functions that take other functions as arguments, and return
functions.

This technique has a lot of advantages over macros:
- No type inference issues: you can explicitly describe the error type in
  function definitions
- Nicer compilation errors: rustc can show you exactly what is missing when calling
  a combinator, if you need to import new traits, etc.
- Those functions are actually faster than nom 4's macros when built with link time
  optimization
- Small gain in compilation speed (since code can be reused instead of regenerated
  everywhere)
- The macros are still there, but were rewritten to use the functions instead, so
  they gain the performance benefit immediately

In practice, nom parsers will have the following signature:
`Input -> IResult<Input, Output, Error>`

A function combinator will then have this signature:
`<args> -> impl Fn(Input) -> IResult<Input, Output, Error>`

Here is an example with a simplified `take` combinator:

```rust
use nom::{error::ErrorKind, Err, IResult};

pub fn take(count: usize) -> impl Fn(&[u8]) -> IResult<&[u8], &[u8]> {
  move |i: &[u8]| {
    if i.len() < count {
      // not enough input to take `count` bytes
      Err(Err::Error((i, ErrorKind::Eof)))
    } else {
      // IResult expects (remaining input, output), in that order
      let (taken, rest) = i.split_at(count);
      Ok((rest, taken))
    }
  }
}
```

`take` generates a closure and returns it. We can use it directly like this:
`take(5)(input)`.

(This version of `take` is simplified: the real one is generic over the input
and error types and relies on various traits over these types.)

More complex combinators like `pair` (which returns a tuple of the results of
two parsers) can combine parsers to make more advanced ones:

```rust
use nom::IResult;

pub fn pair<I, O1, O2, E, F, G>(first: F, second: G) -> impl Fn(I) -> IResult<I, (O1, O2), E>
where
  F: Fn(I) -> IResult<I, O1, E>,
  G: Fn(I) -> IResult<I, O2, E>,
{
  move |input: I| {
    // run the first parser, then feed the remaining input to the second
    let (input, o1) = first(input)?;
    second(input).map(|(i, o2)| (i, (o1, o2)))
  }
}
```

This combinator is generic over its parser arguments and can assemble them in
the closure that it returns.

You can then use it this way:

```rust
use nom::{
  character::complete::{alpha0, digit0},
  sequence::pair,
  IResult,
};

fn parser(i: &str) -> IResult<&str, (&str, &str)> {
  pair(alpha0, digit0)(i)
}

// will return `Ok((";", ("abc", "123")))`
parser("abc123;");
```
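
To experiment with the pattern without depending on nom, the two simplified combinators above can be put together in one standalone program. This is only a sketch under stated assumptions: `ErrorKind`, `Err` and `IResult` are reduced local stand-ins for nom's types, `take` uses an explicit lifetime instead of the elided one, and `header` is a hypothetical parser invented for this example (two bytes followed by three bytes).

```rust
// Reduced local stand-ins for nom's types, limited to what this sketch needs.
#[derive(Debug, PartialEq)]
pub enum ErrorKind {
    Eof,
}

#[derive(Debug, PartialEq)]
pub enum Err<E> {
    Error(E),
}

pub type IResult<I, O, E = (I, ErrorKind)> = Result<(I, O), Err<E>>;

// Returns a parser that consumes exactly `count` bytes.
pub fn take<'a>(count: usize) -> impl Fn(&'a [u8]) -> IResult<&'a [u8], &'a [u8]> {
    move |i: &'a [u8]| {
        if i.len() < count {
            Err(Err::Error((i, ErrorKind::Eof)))
        } else {
            let (taken, rest) = i.split_at(count);
            Ok((rest, taken))
        }
    }
}

// Runs two parsers in sequence and returns both outputs as a tuple.
pub fn pair<I, O1, O2, E, F, G>(first: F, second: G) -> impl Fn(I) -> IResult<I, (O1, O2), E>
where
    F: Fn(I) -> IResult<I, O1, E>,
    G: Fn(I) -> IResult<I, O2, E>,
{
    move |input: I| {
        let (input, o1) = first(input)?;
        second(input).map(|(i, o2)| (i, (o1, o2)))
    }
}

// Hypothetical parser for this example: a 2-byte field, then a 3-byte field.
pub fn header(i: &[u8]) -> IResult<&[u8], (&[u8], &[u8])> {
    pair(take(2), take(3))(i)
}

fn main() {
    // Both fields are present; ";" is left over.
    assert_eq!(
        header(b"ab123;"),
        Ok((&b";"[..], (&b"ab"[..], &b"123"[..])))
    );

    // The second `take` fails: only one byte remains after the first field.
    assert_eq!(header(b"ab1"), Err(Err::Error((&b"1"[..], ErrorKind::Eof))));
}
```

Note that the error reports the input position at which the failing parser was called (`"1"`), not the start of the whole input.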