# Upgrading to nom 4.0

nom 4.0 is a nearly complete rewrite of nom's internal structures, along with a cleanup of many parsers and combinators whose semantics were unclear. Upgrading from previous nom versions can require a lot of changes, especially if you have a lot of unit tests, but most of those changes are straightforward.

## Changes in internal structures

Previous versions of nom all generated parsers with the following signature:

```rust
fn parser(input: I) -> IResult<I,O> { ... }
```

With the following definition for `IResult`:

```rust
pub enum IResult<I,O,E=u32> {
  /// remaining input, result value
  Done(I,O),
  /// indicates the parser encountered an error. E is a custom error type you can redefine
  Error(Err<E>),
  /// Incomplete contains a Needed, an enum that can represent a known quantity of input data, or unknown
  Incomplete(Needed)
}

pub enum Needed {
  /// needs more data, but we do not know how much
  Unknown,
  /// contains the required total data size
  Size(usize)
}

// if the "verbose-errors" feature is not active
pub type Err<E=u32> = ErrorKind<E>;

// if the "verbose-errors" feature is active
pub enum Err<P,E=u32>{
  /// An error code, represented by an ErrorKind, which can contain a custom error code represented by E
  Code(ErrorKind<E>),
  /// An error code, and the next error
  Node(ErrorKind<E>, Vec<Err<P,E>>),
  /// An error code, and the input position
  Position(ErrorKind<E>, P),
  /// An error code, the input position and the next error
  NodePosition(ErrorKind<E>, P, Vec<Err<P,E>>)
}
```

The new design uses the `Result` type from the standard library:

```rust
pub type IResult<I, O, E = u32> = Result<(I, O), Err<I, E>>;

pub enum Err<I, E = u32> {
  /// There was not enough data
  Incomplete(Needed),
  /// The parser had an error (recoverable)
  Error(Context<I, E>),
  /// The parser had an unrecoverable error
  Failure(Context<I, E>),
}

pub enum Needed {
  /// needs more data, but we do not know how much
  Unknown,
  /// contains the required additional data size
  Size(usize)
}

// if the "verbose-errors" feature is inactive
pub enum Context<I, E = u32> {
  Code(I, ErrorKind<E>),
}

// if the "verbose-errors" feature is active
pub enum Context<I, E = u32> {
  Code(I, ErrorKind<E>),
  List(Vec<(I, ErrorKind<E>)>),
}
```

With this new design, the `Incomplete` case is now part of the error case, and we get a `Failure`
case representing an unrecoverable error (combinators like `alt!` will not try another branch).
Verbose error management is now a truly additive feature on top of the simple one (it adds a
case to the `Context` enum).

Error management types also get smaller and more efficient. We can now return
the related input as part of the error in all cases.

All of this will likely not affect your existing parsers, but it will require changes to the
surrounding code that manipulates parser results.

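The difference between `Error` and `Failure` described above can be sketched without the nom crate. The block below is only an illustration: `Err`, `IResult`, `tag`, `alt2`, `let_binding`, `word` and `statement` are simplified, hypothetical stand-ins written for this example, not nom's actual types or combinators, and streaming (`Incomplete`) is ignored.

```rust
// Simplified stand-ins for illustration only; NOT nom's real definitions.
#[allow(dead_code)]
#[derive(Debug, PartialEq)]
pub enum Err {
    Incomplete,
    /// recoverable: another branch may still match
    Error(&'static str),
    /// unrecoverable: stop trying alternatives
    Failure(&'static str),
}

pub type IResult<'a, O> = Result<(&'a str, O), Err>;

/// Succeeds when `input` starts with `expected`, otherwise a recoverable error.
pub fn tag<'a>(expected: &str, input: &'a str) -> IResult<'a, &'a str> {
    if input.starts_with(expected) {
        let (matched, rest) = input.split_at(expected.len());
        Ok((rest, matched))
    } else {
        Err(Err::Error("tag"))
    }
}

/// Two-branch alternative: only a recoverable `Error` falls through to `second`.
pub fn alt2<'a, O>(
    input: &'a str,
    first: impl Fn(&'a str) -> IResult<'a, O>,
    second: impl Fn(&'a str) -> IResult<'a, O>,
) -> IResult<'a, O> {
    match first(input) {
        Err(Err::Error(_)) => second(input),
        other => other,
    }
}

/// Parses `let <name>`. Once `let ` has matched, a missing name is a `Failure`.
pub fn let_binding<'a>(input: &'a str) -> IResult<'a, &'a str> {
    let (rest, _) = tag("let ", input)?;
    let end = rest.find(|c: char| !c.is_alphabetic()).unwrap_or(rest.len());
    if end == 0 {
        return Err(Err::Failure("let: missing name"));
    }
    Ok((&rest[end..], &rest[..end]))
}

/// Fallback branch: accepts a bare alphabetic word.
pub fn word<'a>(input: &'a str) -> IResult<'a, &'a str> {
    let end = input.find(|c: char| !c.is_alphabetic()).unwrap_or(input.len());
    if end == 0 {
        Err(Err::Error("word"))
    } else {
        Ok((&input[end..], &input[..end]))
    }
}

pub fn statement<'a>(input: &'a str) -> IResult<'a, &'a str> {
    alt2(input, let_binding, word)
}
```

With these definitions, `statement("foo bar")` falls through to `word` because `let_binding` only reported an `Error`, while `statement("let 1")` returns the `Failure` directly, even though `word` alone would have accepted `let`.
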
## Faster parsers, new memory layout but with lower footprint

These changes keep the same memory footprint in simple errors mode, and reduce it in verbose errors mode:

| size of `IResult<&[u8], &[u8]>` | simple errors | verbose errors |
|---|---|---|
| nom 3 | 40 bytes | 64 bytes |
| nom 4 | 40 bytes | 48 bytes |

In addition, [parsers are faster in nom 4 than in nom 3](https://github.com/Geal/nom/issues/356#issuecomment-333816834). Between the smaller footprint and the speedup, the breaking change is justified.

## Replacing parser result matchers

Whenever you use pattern matching on the result of a parser, or compare it to another parser
result (like in a unit test), you will have to perform the following changes:

For the successful result case:

```rust
IResult::Done(i, o)

// becomes

Ok((i, o))
```

For the error case (note that the argument order of `error_position!` and its sibling macros was
changed for the sake of consistency with the rest of the code):

```rust
IResult::Error(error_position!(ErrorKind::OneOf, input))

// becomes

Err(Err::Error(error_position!(input, ErrorKind::OneOf)))
```

For the incomplete case:

```rust
IResult::Incomplete(Needed::Size(1))

// becomes

Err(Err::Incomplete(Needed::Size(1)))
```

For pattern matching, you now need to handle the `Failure` case as well, which works like the error
case:

```rust
match result {
  Ok((remaining, value)) => { ... },
  Err(Err::Incomplete(needed)) => { ... },
  Err(Err::Error(e)) | Err(Err::Failure(e)) => { ... }
}
```

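The three rewrites in this section amount to a mechanical mapping from the old enum to the new `Result`. The sketch below spells that mapping out on simplified stand-in types (errors carry only a numeric code). `OldResult`, `upgrade` and `describe` are hypothetical names introduced for this example and are not part of nom.

```rust
// Simplified stand-ins for illustration; NOT nom's real definitions.
#[allow(dead_code)]
#[derive(Debug, PartialEq)]
pub enum Needed {
    Unknown,
    Size(usize),
}

/// Shape of a nom 3 result (simplified).
#[derive(Debug, PartialEq)]
pub enum OldResult<I, O> {
    Done(I, O),
    Error(u32),
    Incomplete(Needed),
}

/// Shape of a nom 4 error (simplified).
#[derive(Debug, PartialEq)]
pub enum Err {
    Incomplete(Needed),
    Error(u32),
    Failure(u32),
}

pub type IResult<I, O> = Result<(I, O), Err>;

/// Applies the same rewrites as the before/after snippets in this section.
/// The value inside `Needed::Size` is passed through unchanged here; its
/// meaning also changed in nom 4, which this mapping does not account for.
pub fn upgrade<I, O>(old: OldResult<I, O>) -> IResult<I, O> {
    match old {
        OldResult::Done(i, o) => Ok((i, o)),
        OldResult::Error(code) => Err(Err::Error(code)),
        OldResult::Incomplete(needed) => Err(Err::Incomplete(needed)),
    }
}

/// Handles every case of the new shape, including `Failure`.
pub fn describe(result: &IResult<&str, u32>) -> String {
    match result {
        Ok((remaining, value)) => format!("parsed {} with {:?} left", value, remaining),
        Err(Err::Incomplete(needed)) => format!("need more data: {:?}", needed),
        Err(Err::Error(code)) | Err(Err::Failure(code)) => format!("error code {}", code),
    }
}
```

For instance, `upgrade(OldResult::Done("123", "abcd"))` yields `Ok(("123", "abcd"))`, which is the form a rewritten unit test would compare against.
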
## Errors on `Incomplete` data size calculation

In previous versions, `Needed::Size(sz)` indicated the total needed data size (counting the actual input).
It now indicates only the additional data needed, so the values expected in existing code and tests will change.

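As a worked example of that arithmetic, consider a parser that needs 4 bytes and is given 3: under the old convention it would report a size of 4, under the new one a size of 1. The sketch below uses a hypothetical fixed-size parser `take` and a local copy of `Needed`, written only to show the calculation; the values reported by a specific nom parser should be checked against its own tests.

```rust
// Local copy of the enum shown earlier, so the sketch compiles on its own.
#[allow(dead_code)]
#[derive(Debug, PartialEq)]
pub enum Needed {
    Unknown,
    Size(usize),
}

/// Hypothetical fixed-size parser used to illustrate the new convention:
/// on short input it reports only the number of bytes still missing.
pub fn take(input: &[u8], count: usize) -> Result<(&[u8], &[u8]), Needed> {
    if input.len() < count {
        // The old convention would have reported the total: Needed::Size(count).
        Err(Needed::Size(count - input.len()))
    } else {
        Ok((&input[count..], &input[..count]))
    }
}
```

Here `take(b"abc", 4)` returns `Err(Needed::Size(1))`, and `take(b"abcdef", 4)` returns the remaining `ef` together with the matched `abcd`.
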
## New trait for input types

nom allows input types other than `&[u8]` and `&str`, as long as they implement a set of traits
that are used everywhere in nom. This version introduces the `AtEof` trait:

```rust
pub trait AtEof {
  fn at_eof(&self) -> bool;
}
```

This trait allows the input value to indicate whether more input can arrive later (for example when
buffering data from a file, or waiting for network data).

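A minimal sketch of what implementing this trait could look like for a custom input type follows. `Chunk` and `Whole` are hypothetical types invented for this example, and the trait is redeclared locally so the block compiles without nom. A real input type would also have to implement the other input traits nom relies on, which are not shown here.

```rust
/// Local copy of the trait shown above.
pub trait AtEof {
    fn at_eof(&self) -> bool;
}

/// Hypothetical streaming input: a chunk of data plus a flag that the caller
/// sets once the underlying reader is exhausted.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Chunk<'a> {
    pub data: &'a [u8],
    pub last: bool,
}

impl<'a> AtEof for Chunk<'a> {
    fn at_eof(&self) -> bool {
        self.last
    }
}

/// Hypothetical input that is known to be fully loaded in memory.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Whole<'a>(pub &'a [u8]);

impl<'a> AtEof for Whole<'a> {
    fn at_eof(&self) -> bool {
        // No more data can ever arrive.
        true
    }
}
```

The flag-based variant lets the code that owns the buffer decide when the stream has ended, while the second variant fixes the answer in the type itself.
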
## Dealing with `Incomplete` usage

nom's parsers are designed to work around streaming issues: if there is not enough data to decide, a
parser will return `Incomplete` instead of returning a partial value that might be wrong.

As an example, if you want to parse alphabetic characters then digits, when you get the whole input
`abc123;`, the parser will return `abc` for alphabetic characters, and `123` for the digits, and `;`
as remaining input.

But if you get that input in chunks, like `ab` then `c123;`, the alphabetic characters parser will
return `Incomplete`, because it does not know if there will be more matching characters afterwards.
If it returned `ab` directly, the digit parser would fail on the rest of the input, even though the
input had the valid format.

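The chunked scenario above can be sketched from the caller's side: keep appending chunks to a buffer and retry until the parser can decide. The code below is an illustration under simplifying assumptions. `Err`, `alpha` and `parse_chunked` are hypothetical stand-ins written for this example, not nom's own parser or types.

```rust
// Simplified stand-in for illustration; NOT nom's real error type.
#[derive(Debug, PartialEq)]
pub enum Err {
    Incomplete,
    Error,
}

/// Streaming-style alphabetic parser: it only succeeds once it has seen a
/// non-alphabetic character, which proves the alphabetic run has ended.
pub fn alpha(input: &str) -> Result<(&str, &str), Err> {
    match input.find(|c: char| !c.is_alphabetic()) {
        // The chunk ran out while characters were still matching.
        None => Err(Err::Incomplete),
        // The very first character does not match.
        Some(0) => Err(Err::Error),
        Some(end) => Ok((&input[end..], &input[..end])),
    }
}

/// Appends chunks to a buffer until `alpha` can decide. Returns the matched
/// prefix and the number of chunks that were needed.
pub fn parse_chunked(chunks: &[&str]) -> Option<(String, usize)> {
    let mut buffer = String::new();
    for (index, chunk) in chunks.iter().enumerate() {
        buffer.push_str(chunk);
        match alpha(&buffer) {
            Ok((_, matched)) => return Some((matched.to_string(), index + 1)),
            Err(Err::Incomplete) => continue, // wait for the next chunk
            Err(Err::Error) => return None,
        }
    }
    // The stream ended while the parser was still undecided.
    None
}
```

Feeding `["ab", "c123;"]` to `parse_chunked` yields `abc` after the second chunk, whereas a parser that had accepted `ab` right away would have left `c123;` for the digit parser.
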
For some users, though, the input will never be partial (everything could be loaded in memory at once),
and the solution in nom 3 and before was to wrap parts of the parsers with the `complete!()` combinator,
which transforms `Incomplete` into `Error`.

nom 4 is much stricter about the behaviour with partial data, but provides better tools to deal with it.
Thanks to the new `AtEof` trait for input types, nom now provides the `CompleteByteSlice(&[u8])` and
`CompleteStr(&str)` input types, for which the `at_eof()` method always returns true.
With these types, there is no need to put a `complete!()` combinator everywhere; you can just apply
those types like this:

```rust
named!(parser<&str,ReturnType>, ... );

// becomes

named!(parser<CompleteStr,ReturnType>, ... );
```

```rust
named!(parser<&str,&str>, ... );

// becomes

named!(parser<CompleteStr,CompleteStr>, ... );
```

```rust
named!(parser, ... );

// becomes

named!(parser<CompleteByteSlice,CompleteByteSlice>, ... );
```

And as an example, for a unit test:

```rust
assert_eq!(parser("abcd123"), Ok(("123", "abcd")));

// becomes

assert_eq!(parser(CompleteStr("abcd123")), Ok((CompleteStr("123"), CompleteStr("abcd"))));
```

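To make the role of `at_eof()` concrete, here is a rough sketch of how a parser could consult it to choose between `Incomplete` and a final result. Everything in the block is an assumption made for illustration: `Partial` and `Complete` are stand-ins for a growable input and for `CompleteStr`, `Text` is a hypothetical helper trait, and `alpha` is not nom's implementation.

```rust
/// Local copy of the trait introduced earlier.
pub trait AtEof {
    fn at_eof(&self) -> bool;
}

/// Hypothetical helper trait giving access to the wrapped text.
pub trait Text<'a>: AtEof {
    fn text(&self) -> &'a str;
}

/// Stand-in for input that may still grow.
pub struct Partial<'a>(pub &'a str);
/// Stand-in for a `CompleteStr`-like input that is known to be whole.
pub struct Complete<'a>(pub &'a str);

impl<'a> AtEof for Partial<'a> {
    fn at_eof(&self) -> bool { false }
}
impl<'a> AtEof for Complete<'a> {
    fn at_eof(&self) -> bool { true }
}
impl<'a> Text<'a> for Partial<'a> {
    fn text(&self) -> &'a str { self.0 }
}
impl<'a> Text<'a> for Complete<'a> {
    fn text(&self) -> &'a str { self.0 }
}

#[derive(Debug, PartialEq)]
pub enum Err {
    Incomplete,
    Error,
}

/// Alphabetic parser that asks the input whether more data can arrive before
/// deciding what to do when the text runs out.
pub fn alpha<'a, I: Text<'a>>(input: &I) -> Result<(&'a str, &'a str), Err> {
    let text = input.text();
    match text.find(|c: char| !c.is_alphabetic()) {
        Some(0) => Err(Err::Error),
        Some(end) => Ok((&text[end..], &text[..end])),
        // Everything matched so far and more data may come: cannot decide yet.
        None if !input.at_eof() => Err(Err::Incomplete),
        // End of input with nothing to match.
        None if text.is_empty() => Err(Err::Error),
        // End of input: the whole text is the final match.
        None => Ok((&text[text.len()..], text)),
    }
}
```

With this sketch, `alpha(&Partial("abcd"))` is `Incomplete`, while `alpha(&Complete("abcd"))` returns `abcd` with empty remaining input, which is the behaviour a `complete!()` wrapper was previously needed for.
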
These types allow you to correctly handle cases like text formats that may or may not end with an
empty line, as seen in [one of the examples](https://github.com/Geal/nom/blob/87d837006467aebcdb0c37621da874a56c8562b5/tests/multiline.rs).

If those types feel a bit long to write everywhere in the parsers, it's possible
to alias them like this:

```rust
use nom::types::CompleteByteSlice as Input;
```

## Custom error types

Custom error types caused a lot of type inference issues in previous nom versions. Now error types
are automatically converted as needed. If you want to set up a custom error type, you now need to
implement `std::convert::From<u32>` for this type.

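A minimal sketch of such an implementation, using only the standard library, is shown below. `MyError` and `widen` are hypothetical names chosen for this example. The `widen` function only demonstrates how `From<u32>` enables automatic conversion through `?` in plain Rust; how nom performs the conversion internally is not shown here.

```rust
use std::convert::From;

/// Hypothetical application error type.
#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
pub enum MyError {
    /// A numeric error code, as used by the default `u32` error type.
    Code(u32),
    /// An application-specific failure.
    InvalidHeader,
}

impl From<u32> for MyError {
    fn from(code: u32) -> Self {
        MyError::Code(code)
    }
}

/// Plain-Rust illustration: `?` converts the `u32` error through `From`.
pub fn widen(result: Result<u8, u32>) -> Result<u8, MyError> {
    Ok(result?)
}
```

For example, `widen(Err(42))` returns `Err(MyError::Code(42))` without any explicit conversion call.
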
## Producers and consumers

Producers and consumers were removed in nom 4. That feature was too hard to integrate into code that
deals with IO.