# Upgrading to nom 2.0

The 2.0 release of nom adds a lot of new features, but it was also time for a big cleanup of badly named functions and macros, awkwardly written features and redundant functionality. So this release has some breaking changes, but most of them are harmless.

## Simple VS verbose errors

The error management system of nom 1.0 is powerful: it allows you to aggregate errors as you backtrack in the parser tree, and gives you clear indications about which combinators worked on which part of the input. Unfortunately, it also slowed down the parsers a bit, since a lot of code was generated to drop the error list when it was not used.

Not everybody uses that feature, so it was moved behind a compilation feature called "verbose-errors". Projects that do not use the `Err` enum and do not define their own custom error codes should build correctly out of the box, and updating to 2.0 can bring performance gains of 30% to 50% on some parsers.

Parsers that do use it will probably fail with compilation errors like the following:

```
error: no associated item named `Code` found for type `nom::ErrorKind<_>` in the current scope
   --> src/metadata/parser.rs:309:31
    |
309 |           _ => IResult::Error(Err::Code(
    |                               ^^^^^^^^^

error: no associated item named `Code` found for type `nom::ErrorKind<_>` in the current scope
   --> src/metadata/parser.rs:388:41
    |
388 |     let result_invalid = IResult::Error(Err::Code(nom::ErrorKind::Custom(
    |                                         ^^^^^^^^^

error: no associated item named `Position` found for type `nom::ErrorKind<_>` in the current scope
  --> src/utility/macros.rs:16:41
   |
16 |             $crate::nom::IResult::Error($crate::nom::Err::Position(
   |                                         ^^^^^^^^^^^^^^^^^^^^^^^^^^
   |
  ::: src/metadata/parser.rs
   |
178|     bytes: skip_bytes!(14, 2) ~
   |            - in this macro invocation

error: no associated item named `Position` found for type `nom::ErrorKind<_>` in the current scope
  --> src/utility/macros.rs:16:41
   |
16 |             $crate::nom::IResult::Error($crate::nom::Err::Position(
   |                                         ^^^^^^^^^^^^^^^^^^^^^^^^^^
   |
  ::: src/metadata/parser.rs
   |
201 |     skip_bytes!(3),
   |      - in this macro invocation
```

This is easy to fix: activate the "verbose-errors" feature in your `Cargo.toml`:

```diff
-nom = "^1.0.0"
+nom = { version = "^2.0.0", features = ["verbose-errors"] }
```

If you only use `Err::Code` to build your custom error codes, you could switch to the simple errors instead: they replace the `Err<Input,E=u32>` enum, which wrapped an `ErrorKind<E=u32>`, with the `ErrorKind<E=u32>` type directly, so `IResult::Error(Err::Code(kind))` becomes `IResult::Error(kind)`.

## The `eof` function was removed

The `eof` function was too tightly coupled to the input type, so it is now a macro combinator called `eof!()`.

If you see the following error, remove the `eof` import and replace all `eof` calls with `eof!()`:

```
error[E0432]: unresolved import `nom::eof`
 --> src/parser.rs:1:20
  |
1 | use nom::{IResult, eof, line_ending, not_line_ending, space};
  |                    ^^^ no `eof` in `nom`. Did you mean to use `eol`?
```

## Parsers returning `Incomplete` instead of an error on empty input

`alpha`, `digit`, `alphanumeric`, `hex_digit`, `oct_digit`, `space`, `multispace` and `sized_buffer` now return `Incomplete` when they get an empty input. If you get an error message like the following one, you can wrap those calls with `complete!`, a combinator that transforms `Incomplete` into `Error`:

```
---- rules::literals::tests::case_invalid_hexadecimal_no_number stdout ----
    thread 'rules::literals::tests::case_invalid_hexadecimal_no_number' panicked at 'assertion failed: `(left == right)` (left: `Incomplete(Unknown)`, right: `Error(Position(HexDigit, []))`)', source/rules/literals.rs:726
```

This change was made to make these basic parsers more consistent.
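
To make the new empty-input behaviour concrete, here is a minimal std-only sketch that models it without depending on nom. `Res`, `hex_digit_sketch` and `complete_sketch` are hypothetical names, not nom's API: the three-variant `Res` enum only mimics the shape of nom's `IResult` (`Done`, `Error`, `Incomplete`), and the sketch models just the rule described above, where empty input yields `Incomplete` and a `complete!`-style wrapper turns that into an error.

```rust
// Std-only model of the behaviour described above; `Res`, `hex_digit_sketch`
// and `complete_sketch` are hypothetical names, not nom's API.
#[derive(Debug, PartialEq)]
enum Res<'a> {
    /// (remaining input, matched digits), shaped like nom's `IResult::Done`.
    Done(&'a [u8], &'a [u8]),
    /// The input was rejected.
    Error,
    /// More input is needed before a decision can be made.
    Incomplete,
}

/// Mimics the nom 2.0 rule for `hex_digit`: empty input yields `Incomplete`.
fn hex_digit_sketch(input: &[u8]) -> Res<'_> {
    if input.is_empty() {
        return Res::Incomplete;
    }
    // Count the leading ASCII hexadecimal digits.
    let len = input.iter().take_while(|c| c.is_ascii_hexdigit()).count();
    if len == 0 {
        Res::Error
    } else {
        Res::Done(&input[len..], &input[..len])
    }
}

/// Like `complete!`: turns `Incomplete` into `Error`, leaves other results alone.
fn complete_sketch<'a, F>(parser: F, input: &'a [u8]) -> Res<'a>
where
    F: Fn(&'a [u8]) -> Res<'a>,
{
    match parser(input) {
        Res::Incomplete => Res::Error,
        other => other,
    }
}

fn main() {
    // Empty input: the bare parser asks for more data...
    assert_eq!(hex_digit_sketch(b""), Res::Incomplete);
    // ...while the wrapped parser reports an error straight away.
    assert_eq!(complete_sketch(hex_digit_sketch, b""), Res::Error);
    // Non-empty input is unaffected by the wrapper.
    assert_eq!(
        complete_sketch(hex_digit_sketch, b"1Fz"),
        Res::Done(&b"z"[..], &b"1F"[..])
    );
}
```

With the wrapper in place, an empty input is reported as an error immediately, which is what the failing test shown earlier expected.
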
Please note that parsing the basic elements of a format, like the alphabet of a token, is always very specific to that format, and those functions may not always fit your needs. In that case, you can easily write your own with [`take_while`](take_while.m.html) and a function that tests for the characters or bytes you need.

## `take_till!` iterates on bytes or chars, not on references to them

The input types must now implement a common trait, which required changing `take_till!`: its predicate now receives each byte or char by value rather than by reference. If you get the following error:

```
error[E0308]: mismatched types
  --> src/linux/parser.rs:32:1
   |
32 | named!(parse_c_string, take_till!(is_nul_byte));
   | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ expected &u8, found u8
   |
   = note: expected type `&u8`
   = note: found type `u8`
   = note: this error originates in a macro outside of the current crate
```

you can fix it by taking the argument by value:

```diff
-fn is_nul_byte(c: &u8) -> bool {
-    *c == 0x0
+fn is_nul_byte(c: u8) -> bool {
+    c == 0x0
 }
```

## `length_value!`, `length_bytes!` refactoring

The "length-value" pattern usually means that we read a length from the input, take a slice of that size from the input, and convert that slice to a value of the type we need. In nom 1.0, however, the `length_value!` macro used the length to apply the value parser that many times. In 2.0:

- the old `length_value!` macro was replaced by `length_count!`
- the new `length_value!` macro takes a slice of the size obtained by the first child parser, then applies the second child parser to this slice. If the second parser returns `Incomplete`, the whole parser fails
- `length_data!` gets a length from its child parser, then returns a subslice of that length

Code that relied on the old behaviour now gets a single value where it expected a `Vec`, with an error like this:

```
error[E0308]: mismatched types
   --> src/tls.rs:378:37
    |
378 |                         cert_types: cert_types,
    |                                     ^^^^^^^^^^ expected struct `std::vec::Vec`, found u8
    |
    = note: expected type `std::vec::Vec<u8>`
    = note: found type `u8`
```

Replacing `length_value!` with `length_count!` restores the previous behaviour:

```diff
 fn parse_tls_handshake_msg_certificaterequest( i:&[u8] ) -> IResult<&[u8], TlsMessageHandshake> {
     chain!(i,
-        cert_types: length_value!(be_u8,be_u8) ~
+        cert_types: length_count!(be_u8,be_u8) ~
         sig_hash_algs_len: be_u16 ~
```

## `error!` does not exist anymore

The `error!` macro, which was used to return a parsing error without backtracking through the parser tree, is now called `return_error!`. It was renamed because the "log" crate also defines an `error!` macro, and users reported the name conflict to nom instead of to log, much to my dismay.

The `add_error!` macro has also been renamed to `add_return_error!`.

The compilation error you might get is:

```
error: macro undefined: 'error!'
   --> src/parser.rs:205:10
    |
205 |          error!(Custom(ParseError::InvalidData),
    |          ^
```

Fix it by renaming the macro:

```diff
 named!(repeat<&str, u8, ParseError>,
-    error!(Custom(ParseError::RepeatNotNumeric), fix!(
+    return_error!(Custom(ParseError::RepeatNotNumeric), fix!(
         map_res!(flat_map!(take_s!(1), digit), FromStr::from_str))));
```

## The `offset()` method was moved to the `Offset` trait

`Offset` is now also implemented for `&str`, and the `HexDisplay` trait is now reserved for `&[u8]`. Code that calls `offset()` needs the `Offset` trait in scope.

## `AsChar::is_0_to_9` is now `AsChar::is_dec_digit`

This makes the method naming more consistent.
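
Because `AsChar` is implemented for both bytes and chars, a single predicate such as `is_dec_digit` can serve both kinds of input. The std-only sketch below illustrates that idea under stated assumptions: `AsCharSketch` and `leading_digits` are hypothetical names, and the trait models only the renamed method rather than reproducing nom's `AsChar`.

```rust
// Hypothetical, simplified stand-in for nom's `AsChar` trait: only the
// renamed `is_dec_digit` predicate is modelled.
trait AsCharSketch: Copy {
    fn is_dec_digit(self) -> bool;
}

impl AsCharSketch for u8 {
    fn is_dec_digit(self) -> bool {
        self.is_ascii_digit()
    }
}

impl AsCharSketch for char {
    fn is_dec_digit(self) -> bool {
        self.is_ascii_digit()
    }
}

/// Counts the leading decimal digits of a sequence of bytes or chars,
/// i.e. where a `take_while!`-style parser using this predicate would stop.
fn leading_digits<T: AsCharSketch>(items: impl IntoIterator<Item = T>) -> usize {
    items.into_iter().take_while(|&c| c.is_dec_digit()).count()
}

fn main() {
    // The same predicate works on bytes...
    assert_eq!(leading_digits(b"123abc".iter().copied()), 3);
    // ...and on the chars of a string slice.
    assert_eq!(leading_digits("42é".chars()), 2);
}
```

In a real nom parser, a predicate like this would be handed to a combinator such as `take_while!`; the sketch only shows the predicate logic.
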
## The number parsing macros with configurable endianness now take an enum as argument instead of a boolean

Using a boolean to specify endianness was confusing, so there is now the `nom::Endianness` enum:

```diff
- named!(be_tst32<u32>, u32!(true));
- named!(le_tst32<u32>, u32!(false));
+ named!(be_tst32<u32>, u32!(Endianness::Big));
+ named!(le_tst32<u32>, u32!(Endianness::Little));
```

## End of line parsing

There used to be several incompatible ways to parse line endings. Now `eol`, `line_ending` and `not_line_ending` all behave the same way: they first test for `\n`, and if that is not the right character, they test for `\r\n`. This fixes the length issues.
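
The rule above can be sketched with the standard library alone. `line_ending_sketch` and `not_line_ending_sketch` are hypothetical helpers, not nom functions; they return `(remaining input, matched slice)` pairs, mirroring the order nom uses in `IResult::Done`, and `not_line_ending_sketch` assumes a line stops at the first position where the same test succeeds.

```rust
// Hypothetical std-only sketch of the line-ending rule described above;
// these helpers are not nom functions. Results are (remaining, matched).

/// Matches a line ending at the start of `input`: "\n" first, then "\r\n".
fn line_ending_sketch(input: &[u8]) -> Option<(&[u8], &[u8])> {
    // First, test for "\n"...
    if input.starts_with(b"\n") {
        return Some((&input[1..], &input[..1]));
    }
    // ...then, if that was not the right character, test for "\r\n".
    if input.starts_with(b"\r\n") {
        return Some((&input[2..], &input[..2]));
    }
    None
}

/// Assumed behaviour: the line stops at the first position where the
/// same line-ending test succeeds (or at the end of the input).
fn not_line_ending_sketch(input: &[u8]) -> (&[u8], &[u8]) {
    let end = (0..input.len())
        .find(|&i| line_ending_sketch(&input[i..]).is_some())
        .unwrap_or(input.len());
    (&input[end..], &input[..end])
}

fn main() {
    let (rest, line) = not_line_ending_sketch(b"abc\r\ndef");
    assert_eq!(line, &b"abc"[..]);
    // A "\r\n" ending is consumed as two bytes...
    assert_eq!(line_ending_sketch(rest), Some((&b"def"[..], &b"\r\n"[..])));
    // ...and a "\n" ending as one byte.
    assert_eq!(line_ending_sketch(b"\nx"), Some((&b"x"[..], &b"\n"[..])));
    assert_eq!(line_ending_sketch(b"x\n"), None);
}
```

Because `\n` is tested first, the `\r\n` branch is only reached when the input starts with `\r`; the two cases consume one and two bytes respectively.
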