# Upgrading to nom 2.0

The 2.0 release of nom adds a lot of new features, but it was also time for a big cleanup of badly named functions and macros, awkwardly written features and redundant functionality. So this release has some breaking changes, but most of them are harmless.

## Simple VS verbose errors

The error management system of nom 1.0 is powerful: it allows you to aggregate errors as you backtrack in the parser tree, and gives you clear indications about which combinators worked on which part of the input. Unfortunately, this slowed down the parsers a bit, since a lot of code was generated to drop the error list when it was not used.

Not everybody uses that feature, so it was moved behind a compilation feature called "verbose-errors". Projects that do not use the `Err` enum and do not define their own custom error codes should build correctly out of the box. You can get between 30% and 50% perf gains on some parsers by updating to 2.0.

For the parsers using it, you will probably get something like the following compilation error:

```
error: no associated item named `Code` found for type `nom::ErrorKind<_>` in the current scope
   --> src/metadata/parser.rs:309:31
    |
309 |     _       => IResult::Error(Err::Code(
    |                               ^^^^^^^^^

error: no associated item named `Code` found for type `nom::ErrorKind<_>` in the current scope
   --> src/metadata/parser.rs:388:41
    |
388 |     let result_invalid = IResult::Error(Err::Code(nom::ErrorKind::Custom(
    |                                         ^^^^^^^^^

error: no associated item named `Position` found for type `nom::ErrorKind<_>` in the current scope
  --> src/utility/macros.rs:16:41
   |
16 |             $crate::nom::IResult::Error($crate::nom::Err::Position(
   |                                         ^^^^^^^^^^^^^^^^^^^^^^^^^^
   |
  ::: src/metadata/parser.rs
   |
178|     bytes: skip_bytes!(14, 2) ~
   |                      - in this macro invocation

error: no associated item named `Position` found for type `nom::ErrorKind<_>` in the current scope
   --> src/utility/macros.rs:16:41
    |
16  |             $crate::nom::IResult::Error($crate::nom::Err::Position(
    |                                         ^^^^^^^^^^^^^^^^^^^^^^^^^^
    |
   ::: src/metadata/parser.rs
    |
201 |     skip_bytes!(3),
    |               - in this macro invocation
```

It is rather easy to fix: just activate the "verbose-errors" feature:

```diff
-nom             = "^1.0.0"
+nom             = { version = "^2.0.0", features = ["verbose-errors"] }
```

If you only use `Err::Code` for your custom error codes, you can switch to the simple errors instead: they replace the `Err<Input,E=u32>` enum, which contained an `ErrorKind<E=u32>`, with the `ErrorKind<E=u32>` type directly.
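
To make the difference between the two error shapes concrete, here is a minimal sketch that uses local stand-in types instead of nom itself. `VerboseErr`, `SimpleErr`, `verbose_custom` and `simple_custom` are hypothetical names, and the enums only carry the variants needed for the illustration; they are not nom's real definitions.

```rust
// Simplified stand-in for nom's error kinds (illustration only).
#[derive(Debug, PartialEq)]
enum ErrorKind<E = u32> {
    Custom(E),
    HexDigit,
}

// Verbose shape: the error kind is wrapped in an enum that can also
// carry the input position where the error happened.
#[derive(Debug, PartialEq)]
enum VerboseErr<I, E = u32> {
    Code(ErrorKind<E>),
    Position(ErrorKind<E>, I),
}

// Simple shape: the error is the error kind itself, with no wrapper.
type SimpleErr<E = u32> = ErrorKind<E>;

// A custom error code built in the verbose style...
fn verbose_custom(code: u32) -> VerboseErr<&'static [u8]> {
    VerboseErr::Code(ErrorKind::Custom(code))
}

// ...and the same code in the simple style: one layer less to build and match.
fn simple_custom(code: u32) -> SimpleErr {
    ErrorKind::Custom(code)
}
```

In this model, migrating code that only builds `Code(Custom(n))` values amounts to dropping the outer wrapper.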

## The eof function was removed

The eof implementation was linked too much to the input type. This is now a macro combinator, called `eof!()`.

If you see the following error, remove the `eof` import and replace all `eof` calls with `eof!()`.

```
error[E0432]: unresolved import `nom::eof`
 --> src/parser.rs:1:20
  |
1 | use nom::{IResult, eof, line_ending, not_line_ending, space};
  |                    ^^^ no `eof` in `nom`. Did you mean to use `eol`?
```

## Parsers returning `Incomplete` instead of an error on empty input

`alpha`, `digit`, `alphanumeric`, `hex_digit`, `oct_digit`, `space`, `multispace` and `sized_buffer` now return `Incomplete` if they get an empty input. If you get the following error message, you can wrap those calls with `complete!`, a combinator that transforms `Incomplete` to `Error`.

```
---- rules::literals::tests::case_invalid_hexadecimal_no_number stdout ----
        thread 'rules::literals::tests::case_invalid_hexadecimal_no_number' panicked at 'assertion failed: `(left == right)` (left: `Incomplete(Unknown)`, right: `Error(Position(HexDigit, []))`)', source/rules/literals.rs:726
```

This change was implemented to make these basic parsers more consistent. Please note that parsing the basic elements of a format, like the alphabet of a token, is always very specific to that format, and those functions may not always fit your needs. In that case, you can easily make your own with [`take_while`](take_while.m.html) and a function that tests for the characters or bytes you need.
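
The behaviour described in this section can be modelled without nom. The sketch below uses a simplified local `IResult` and two hypothetical functions, `hex_digit` and `complete`, that only mimic the idea: empty input yields `Incomplete`, and a `complete!`-like wrapper turns that into `Error`. It is an illustration under those assumptions, not nom's implementation.

```rust
// Simplified stand-in for nom's result type (illustration only).
#[derive(Debug, PartialEq)]
enum IResult<'a> {
    Done(&'a [u8], &'a [u8]), // (remaining input, matched bytes)
    Error,
    Incomplete,
}

// Hypothetical hex digit parser following the 2.0 behaviour described above.
fn hex_digit(input: &[u8]) -> IResult<'_> {
    if input.is_empty() {
        // Nothing to inspect yet: more data might still arrive.
        return IResult::Incomplete;
    }
    // Index of the first byte that is not a hexadecimal digit.
    let end = input
        .iter()
        .position(|b| !b.is_ascii_hexdigit())
        .unwrap_or(input.len());
    if end == 0 {
        IResult::Error
    } else {
        IResult::Done(&input[end..], &input[..end])
    }
}

// Plays the role of `complete!`: the caller states that no more input will come.
fn complete(res: IResult<'_>) -> IResult<'_> {
    match res {
        IResult::Incomplete => IResult::Error,
        other => other,
    }
}
```

A test that used to expect an error on empty input would, in this model, compare against `complete(hex_digit(b""))` rather than `hex_digit(b"")`.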

## `take_till!` iterates on bytes or chars, not on references to them

The input types must now conform to a trait, which requires a change to the predicates passed to `take_till!`: they now receive each byte or char by value rather than by reference. If you get the following error:

```
error[E0308]: mismatched types
  --> src/linux/parser.rs:32:1
   |
32 | named!(parse_c_string, take_till!(is_nul_byte));
   | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ expected &u8, found u8
   |
   = note: expected type `&u8`
   = note:    found type `u8`
   = note: this error originates in a macro outside of the current crate
```

you can fix it with:

```diff
-fn is_nul_byte(c: &u8) -> bool {
-    *c == 0x0
+fn is_nul_byte(c: u8) -> bool {
+    c == 0x0
```
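
As a rough illustration of why the predicate signature matters, here is a dependency-free sketch of a `take_till`-style helper that hands each byte to the predicate by value. `take_till_sketch` is a hypothetical function, not part of nom, and it simply consumes the whole input when the predicate never matches, which may differ from what the real macro does.

```rust
// Predicate in the nom 2.0 style: takes the byte by value.
fn is_nul_byte(c: u8) -> bool {
    c == 0x0
}

// Hypothetical helper mimicking the idea behind `take_till!` on a byte slice.
// Returns (remaining input, bytes seen before the predicate first matched).
fn take_till_sketch<F>(input: &[u8], pred: F) -> (&[u8], &[u8])
where
    F: Fn(u8) -> bool,
{
    // `copied()` turns the iterator of `&u8` into an iterator of `u8`,
    // so the predicate receives values, not references.
    let end = input
        .iter()
        .copied()
        .position(pred)
        .unwrap_or(input.len());
    (&input[end..], &input[..end])
}
```

With the old `fn is_nul_byte(c: &u8)` signature, the call to `position(pred)` in this sketch would fail to type-check in the same way as the error above.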

## `length_value!`, `length_bytes!` refactoring

The "length-value" pattern usually indicates that we get a length from the input, then take a slice of that size from the input, and convert that to a value of the type we need. The old `length_value!` macro did something different: it used the length parameter to apply the value parser a specific number of times.

- the `length_value!` macro was replaced by `length_count!`
- the new `length_value!` macro takes a slice of the size obtained by the first child parser, then applies the second child parser on this slice. If the second parser returns incomplete, the parser fails
- `length_data!` gets a length from its child parser, then returns a subslice of that length

Code written against the old `length_value!` will typically fail with a type mismatch like this one:

```
error[E0308]: mismatched types
   --> src/tls.rs:378:37
    |
378 |                         cert_types: cert_types,
    |                                     ^^^^^^^^^^ expected struct `std::vec::Vec`, found u8
    |
    = note: expected type `std::vec::Vec<u8>`
    = note:    found type `u8`
```

The fix is to switch to `length_count!`:

```diff
 fn parse_tls_handshake_msg_certificaterequest( i:&[u8] ) -> IResult<&[u8], TlsMessageHandshake> {
     chain!(i,
-        cert_types: length_value!(be_u8,be_u8) ~
+        cert_types: length_count!(be_u8,be_u8) ~
         sig_hash_algs_len: be_u16 ~
```
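
The three behaviours listed in this section can be compared side by side with a small dependency-free model. All four functions below are hypothetical helpers written for this comparison; they use `Option` instead of nom's `IResult`, read the length from a single byte, and are not nom's macros.

```rust
// Reads one byte; stands in for a `be_u8`-style parser.
// Every helper returns Some((remaining input, value)) or None on failure.
fn be_u8(input: &[u8]) -> Option<(&[u8], u8)> {
    let (&first, rest) = input.split_first()?;
    Some((rest, first))
}

// `length_count!`-like: read n, then apply `item` n times and collect the results.
fn length_count<'a, T>(
    input: &'a [u8],
    item: fn(&[u8]) -> Option<(&[u8], T)>,
) -> Option<(&'a [u8], Vec<T>)> {
    let (mut rest, n) = be_u8(input)?;
    let mut out = Vec::with_capacity(n as usize);
    for _ in 0..n {
        let (next, value) = item(rest)?;
        out.push(value);
        rest = next;
    }
    Some((rest, out))
}

// `length_data!`-like: read n, then return the next n bytes as a subslice.
fn length_data(input: &[u8]) -> Option<(&[u8], &[u8])> {
    let (rest, n) = be_u8(input)?;
    let n = n as usize;
    if rest.len() < n {
        return None;
    }
    Some((&rest[n..], &rest[..n]))
}

// New `length_value!`-like: take the n-byte slice, then run `value` once on it.
fn length_value<'a, T>(
    input: &'a [u8],
    value: fn(&[u8]) -> Option<(&[u8], T)>,
) -> Option<(&'a [u8], T)> {
    let (rest, data) = length_data(input)?;
    let (_, v) = value(data)?;
    Some((rest, v))
}
```

On the input `[2, 10, 20, 30]`, `length_count(input, be_u8)` produces the vector `[10, 20]`, while `length_value(input, be_u8)` produces the single byte `10`. That is the same `Vec<u8>` versus `u8` mismatch the compiler reports above.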

## `error!` does not exist anymore

The `error!` macro, which was used to return a parsing error without backtracking through the parser tree, is now called `return_error!`. This change was made because the "log" crate also uses an `error!` macro, and people complained about the name conflict to nom instead of complaining to log, much to my dismay.

The `add_error!` macro has also been renamed to `add_return_error!`.

The compilation error you could get would be:

```
error: macro undefined: 'error!'
   --> src/parser.rs:205:10
    |
205 |     error!(Custom(ParseError::InvalidData),
    |          ^
```

It is fixed by:

```diff
 named!(repeat<&str, u8, ParseError>,
-       error!(Custom(ParseError::RepeatNotNumeric), fix!(
+       return_error!(Custom(ParseError::RepeatNotNumeric), fix!(
        map_res!(flat_map!(take_s!(1), digit), FromStr::from_str))));
```

## The `offset()` method was moved to the `Offset` trait

There is now an implementation of `Offset` for `&str`. The `HexDisplay` trait is now reserved for `&[u8]`.

## `AsChar::is_0_to_9` is now `AsChar::is_dec_digit`

This makes the method naming more consistent.

## The number parsing macros with configurable endianness now take an enum as argument instead of a boolean

Using a boolean to specify endianness was confusing, so there is now the `nom::Endianness` enum:

```diff
-    named!(be_tst32<u32>, u32!(true));
-    named!(le_tst32<u32>, u32!(false));
+    named!(be_tst32<u32>, u32!(Endianness::Big));
+    named!(le_tst32<u32>, u32!(Endianness::Little));
```
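
To show why an enum reads better than a boolean here, the following sketch decodes a `u32` with an explicit endianness argument. The `Endianness` enum below is a local stand-in that mirrors the two variants used in the diff, and `read_u32` is a hypothetical helper, not a nom function.

```rust
// Local stand-in for `nom::Endianness`, so the example has no dependencies.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Endianness {
    Big,
    Little,
}

// Hypothetical helper: decodes a u32 from the first four bytes of `input`.
// Returns None if fewer than four bytes are available.
fn read_u32(input: &[u8], endian: Endianness) -> Option<u32> {
    if input.len() < 4 {
        return None;
    }
    let bytes = [input[0], input[1], input[2], input[3]];
    Some(match endian {
        Endianness::Big => u32::from_be_bytes(bytes),
        Endianness::Little => u32::from_le_bytes(bytes),
    })
}
```

A call such as `read_u32(data, Endianness::Big)` states its intent at the call site, whereas `read_u32(data, true)` would force the reader to look up which value means big endian.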

## End of line parsing

There were different, incompatible ways to parse line endings. Now, `eol`, `line_ending` and `not_line_ending` all have the same behaviour: first test for '\n', then, if it is not the right character, test for "\r\n". This fixes the length issues.
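
The matching order described above can be sketched as a plain function. `line_ending_sketch` is a hypothetical helper that returns `Option` instead of nom's `IResult` and ignores incomplete input, so it only illustrates the order of the two tests and the length of what gets consumed.

```rust
// Hypothetical helper: returns (remaining input, matched line ending),
// or None if the input does not start with a line ending.
fn line_ending_sketch(input: &[u8]) -> Option<(&[u8], &[u8])> {
    if input.starts_with(b"\n") {
        // First test: a lone line feed consumes exactly one byte.
        Some((&input[1..], &input[..1]))
    } else if input.starts_with(b"\r\n") {
        // Second test: carriage return plus line feed consumes two bytes.
        Some((&input[2..], &input[..2]))
    } else {
        None
    }
}
```

Because the consumed length depends on which of the two tests matched, callers get a consistent offset for both styles of line ending.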