# Foldhash

This repository contains foldhash, a fast, non-cryptographic, minimally
DoS-resistant hashing algorithm implemented in Rust, designed for computational
uses such as hash maps, bloom filters, count sketching, etc.

When should you **not** use foldhash:

- You are afraid of people studying your long-running program's behavior to
  reverse engineer its internal random state and using this knowledge to create
  many colliding inputs for computational complexity attacks. For more details
  see the section "HashDoS resistance".

- You expect foldhash to have a consistent output across versions or
  platforms, such as for persistent file formats or communication protocols.

- You are relying on foldhash's properties for any kind of security.
  Foldhash is **not appropriate for any cryptographic purpose**.

Foldhash has two variants: one optimized for speed, which is ideal for data
structures such as hash maps and bloom filters, and one optimized for
statistical quality, which is ideal for algorithms such as
[HyperLogLog](https://en.wikipedia.org/wiki/HyperLogLog) and
[MinHash](https://en.wikipedia.org/wiki/MinHash).

Foldhash can be used in a `#![no_std]` environment by disabling its default
`"std"` feature.


## Performance

We evaluated foldhash against three commonly used hashes in Rust:
[aHash](https://github.com/tkaitchuck/aHash) v0.8.11,
[fxhash](https://github.com/cbreeden/fxhash) v0.2.1, and
[SipHash-1-3](https://en.wikipedia.org/wiki/SipHash), the default hash algorithm
in Rust at the time of writing. We evaluated both variants foldhash provides,
`foldhash-f` and `foldhash-q`, which correspond to `foldhash::fast` and
`foldhash::quality` in the crate respectively.

First we note that hashers with random state inflate the size of your `HashMap`,
which may or may not be important for your performance:
```rust
std::mem::size_of::<foldhash::HashMap<u32, u32>>() = 40 // (both variants)
std::mem::size_of::<ahash::HashMap<u32, u32>>() = 64
std::mem::size_of::<fxhash::FxHashMap<u32, u32>>() = 32
std::mem::size_of::<std::collections::HashMap<u32, u32>>() = 48
```

We tested runtime performance on two machines, one with a 2023 Apple M2 CPU and
one with a 2023 Intel Xeon Platinum 8481C server CPU, both with stable Rust
1.80.1. Since one of our competitors (aHash) relies on AES-based instructions
for optimal performance, we have included benchmarks both with and without
`-C target-cpu=native` for the Intel machine.
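In the table-based benchmarks below, a hasher is used by plugging its
`BuildHasher` state into the standard library's `HashMap`/`HashSet`. As a
minimal usage sketch for foldhash (assuming, as suggested by the size
comparison above, that `foldhash::HashMap` aliases `std::collections::HashMap`
with the fast hasher, and that the `RandomState` types implement `Default`):

```rust
use std::collections::HashMap;

fn main() {
    // Speed-optimized variant via the crate-level alias (assumed to default
    // to a randomly seeded fast hasher).
    let mut word_counts: foldhash::HashMap<&str, u32> = foldhash::HashMap::default();
    *word_counts.entry("foldhash").or_insert(0) += 1;

    // Quality-optimized variant plugged into std's HashMap explicitly
    // (assumes foldhash::quality::RandomState implements BuildHasher + Default).
    let mut by_id: HashMap<u64, &str, foldhash::quality::RandomState> = HashMap::default();
    by_id.insert(42, "example");

    println!("{:?} {:?}", word_counts, by_id);
}
```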
We tested across a wide variety of data types we consider representative of
types / distributions one might hash in the real world, in the context of a hash
table key:

- `u32` - random 32-bit unsigned integers,
- `u32pair` - pairs of random 32-bit unsigned integers,
- `u64` - random 64-bit unsigned integers,
- `u64pair` - pairs of random 64-bit unsigned integers,
- `u64lobits` - 64-bit unsigned integers where only the bottom 16 bits vary,
- `u64hibits` - 64-bit unsigned integers where only the top 16 bits vary,
- `ipv4` - [`std::net::Ipv4Addr`](https://doc.rust-lang.org/std/net/struct.Ipv4Addr.html), which is equivalent to `[u8; 4]`,
- `ipv6` - [`std::net::Ipv6Addr`](https://doc.rust-lang.org/std/net/struct.Ipv6Addr.html), which is equivalent to `[u8; 16]`,
- `rgba` - random `(u8, u8, u8, u8)` tuples,
- `strenglishword` - strings containing words sampled uniformly from the top 10,000 most common English words,
- `struuid` - random UUIDs, hashed in string representation,
- `strurl` - strings containing URLs sampled uniformly from a corpus of 10,000 URLs,
- `strdate` - random `YYYY-MM-DD` date strings,
- `accesslog` - `(u128, u32, chrono::NaiveDate, bool)`, meant to simulate a typical
  larger compound type, in this case `(resource_id, user_id, date, success)`
  for an access log,
- `kilobyte` - random bytestrings one kilobyte in length,
- `tenkilobyte` - random bytestrings ten kilobytes in length.

We tested the performance of hashing the above data types in the following four contexts:

- `hashonly` - only the time it takes to hash the value,
- `lookupmiss` - the time it takes to do a lookup in a 1,000-element hash map
  of random elements, sampling only keys known not to be in the hash map,
- `lookuphit` - similar to `lookupmiss`, except the keys are sampled from keys
  known to be in the hash map,
- `setbuild` - the time it takes to construct a `HashSet` of 1,000 elements
  from 1,000 randomly sampled elements each repeated 10 times (so 10,000 inserts,
  with ~90% duplicates).

All times are reported as the expected time per operation, so one hash, one lookup,
or one insert respectively. The full results [can be found
here](https://gist.github.com/orlp/1271ad5b8b775c651cc55773888858eb). To
summarize, we will only show the results for `u64` and `strenglishword` here, as
well as the observed geometric mean and average rank over the full benchmark.

```
Xeon 8481c

┌────────────────┬────────────┬────────────┬────────────┬─────────┬─────────┬─────────┐
│ avg_rank                    ┆ 1.58       ┆ 2.66       ┆ 2.09    ┆ 3.70    ┆ 4.97    │
│ geometric_mean              ┆ 6.21       ┆ 7.01       ┆ 7.56    ┆ 8.74    ┆ 28.70   │
╞════════════════╪════════════╪════════════╪════════════╪═════════╪═════════╪═════════╡
│ distr          ┆ bench      ┆ foldhash-f ┆ foldhash-q ┆ fxhash  ┆ ahash   ┆ siphash │
╞════════════════╪════════════╪════════════╪════════════╪═════════╪═════════╪═════════╡
│ u64            ┆ hashonly   ┆ 0.79       ┆ 1.03       ┆ 0.67    ┆ 1.23    ┆ 9.09    │
│ u64            ┆ lookupmiss ┆ 2.01       ┆ 2.44       ┆ 1.73    ┆ 2.73    ┆ 12.03   │
│ u64            ┆ lookuphit  ┆ 3.04       ┆ 3.59       ┆ 2.64    ┆ 3.84    ┆ 12.65   │
│ u64            ┆ setbuild   ┆ 6.13       ┆ 6.52       ┆ 4.88    ┆ 6.66    ┆ 17.80   │
| ...            ┆ ...        ┆ ...        ┆ ...        ┆ ...     ┆ ...     ┆ ...     |
│ strenglishword ┆ hashonly   ┆ 2.63       ┆ 2.98       ┆ 3.24    ┆ 3.57    ┆ 11.87   │
│ strenglishword ┆ lookupmiss ┆ 4.63       ┆ 5.03       ┆ 4.51    ┆ 5.86    ┆ 15.19   │
│ strenglishword ┆ lookuphit  ┆ 8.62       ┆ 9.25       ┆ 8.28    ┆ 10.06   ┆ 21.35   │
│ strenglishword ┆ setbuild   ┆ 14.77      ┆ 15.57      ┆ 18.86   ┆ 15.72   ┆ 35.36   │
└────────────────┴────────────┴────────────┴────────────┴─────────┴─────────┴─────────┘

Xeon 8481c with RUSTFLAGS="-C target-cpu=native"

┌────────────────┬────────────┬────────────┬────────────┬─────────┬─────────┬─────────┐
│ avg_rank                    ┆ 1.89       ┆ 3.12       ┆ 2.25    ┆ 2.77    ┆ 4.97    │
│ geometric_mean              ┆ 6.00       ┆ 6.82       ┆ 7.39    ┆ 6.94    ┆ 29.49   │
╞════════════════╪════════════╪════════════╪════════════╪═════════╪═════════╪═════════╡
│ distr          ┆ bench      ┆ foldhash-f ┆ foldhash-q ┆ fxhash  ┆ ahash   ┆ siphash │
╞════════════════╪════════════╪════════════╪════════════╪═════════╪═════════╪═════════╡
│ u64            ┆ hashonly   ┆ 0.79       ┆ 1.01       ┆ 0.67    ┆ 1.34    ┆ 9.24    │
│ u64            ┆ lookupmiss ┆ 1.68       ┆ 2.12       ┆ 1.62    ┆ 1.96    ┆ 12.04   │
│ u64            ┆ lookuphit  ┆ 2.68       ┆ 3.19       ┆ 2.28    ┆ 3.16    ┆ 13.09   │
│ u64            ┆ setbuild   ┆ 6.16       ┆ 6.42       ┆ 4.75    ┆ 7.03    ┆ 18.88   │
| ...            ┆ ...        ┆ ...        ┆ ...        ┆ ...     ┆ ...     ┆ ...     |
│ strenglishword ┆ hashonly   ┆ 2.60       ┆ 2.97       ┆ 3.25    ┆ 3.04    ┆ 11.58   │
│ strenglishword ┆ lookupmiss ┆ 4.41       ┆ 4.96       ┆ 4.82    ┆ 4.79    ┆ 32.31   │
│ strenglishword ┆ lookuphit  ┆ 8.68       ┆ 9.35       ┆ 8.46    ┆ 8.63    ┆ 21.48   │
│ strenglishword ┆ setbuild   ┆ 15.01      ┆ 16.34      ┆ 19.34   ┆ 15.37   ┆ 35.22   │
└────────────────┴────────────┴────────────┴────────────┴─────────┴─────────┴─────────┘

Apple M2

┌────────────────┬────────────┬────────────┬────────────┬─────────┬─────────┬─────────┐
│ avg_rank                    ┆ 1.62       ┆ 2.81       ┆ 2.02    ┆ 3.58    ┆ 4.97    │
│ geometric_mean              ┆ 4.41       ┆ 4.86       ┆ 5.39    ┆ 5.71    ┆ 21.94   │
╞════════════════╪════════════╪════════════╪════════════╪═════════╪═════════╪═════════╡
│ distr          ┆ bench      ┆ foldhash-f ┆ foldhash-q ┆ fxhash  ┆ ahash   ┆ siphash │
╞════════════════╪════════════╪════════════╪════════════╪═════════╪═════════╪═════════╡
│ u64            ┆ hashonly   ┆ 0.60       ┆ 0.70       ┆ 0.41    ┆ 0.78    ┆ 6.61    │
│ u64            ┆ lookupmiss ┆ 1.50       ┆ 1.61       ┆ 1.23    ┆ 1.65    ┆ 8.28    │
│ u64            ┆ lookuphit  ┆ 1.78       ┆ 2.10       ┆ 1.57    ┆ 2.25    ┆ 8.53    │
│ u64            ┆ setbuild   ┆ 4.74       ┆ 5.19       ┆ 3.61    ┆ 5.38    ┆ 15.36   │
| ...            ┆ ...        ┆ ...        ┆ ...        ┆ ...     ┆ ...     ┆ ...     |
│ strenglishword ┆ hashonly   ┆ 1.84       ┆ 2.13       ┆ 1.85    ┆ 2.13    ┆ 11.61   │
│ strenglishword ┆ lookupmiss ┆ 2.71       ┆ 2.96       ┆ 2.47    ┆ 2.99    ┆ 9.27    │
│ strenglishword ┆ lookuphit  ┆ 7.54       ┆ 8.77       ┆ 7.83    ┆ 8.77    ┆ 18.65   │
│ strenglishword ┆ setbuild   ┆ 16.61      ┆ 17.09     ┆ 14.83   ┆ 16.52   ┆ 26.42   │
└────────────────┴────────────┴────────────┴────────────┴─────────┴─────────┴─────────┘
```

We note from the above benchmark that for hash table performance the extra
quality `foldhash-q` provides is almost never actually worth the small but
non-negligible computational overhead it has over `foldhash-f`. This is our
justification for providing `foldhash::fast` as a default choice for hash
tables, even though it has measurable biases (see also the "Quality" section).

fxhash generally does fairly well for small inputs on these benchmarks; however,
it has structural weaknesses as a hash which make it ill-advised to use as a
general-purpose hash function in our opinion. For example, the `lookuphit`
benchmark on Apple M2 for `u64hibits` takes 1.77 nanoseconds per lookup with
foldhash, but 67.72 nanoseconds with fxhash (due to everything colliding; the
effects would be even worse with a larger hash map).
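To illustrate the kind of structural weakness we mean, below is a simplified
model (not fxhash's actual code) of a hash whose only mixing step is a single
wrapping multiplication by a constant. In any such hash, bit k of the output
depends only on bits 0 through k of the input, so keys that differ only in
their high bits produce hashes that agree in all of their low bits, and Rust's
standard hash table derives its bucket index from exactly those low bits:

```rust
// Simplified model of a multiply-only hash (not fxhash's exact algorithm).
const K: u64 = 0x517c_c1b7_2722_0a95; // some odd constant

fn multiply_only_hash(x: u64) -> u64 {
    x.wrapping_mul(K)
}

fn main() {
    // Keys differing only in the top 16 bits, like the `u64hibits` benchmark.
    let keys: Vec<u64> = (0..16u64).map(|i| i << 48).collect();

    // Bucket index taken from the low 10 bits of the hash (1024 buckets),
    // similar to hashbrown's `hash & bucket_mask`: every key collides.
    let buckets: Vec<u64> = keys.iter().map(|&k| multiply_only_hash(k) & 0x3ff).collect();
    assert!(buckets.iter().all(|&b| b == buckets[0]));
    println!("all {} keys land in bucket {}", keys.len(), buckets[0]);
}
```

A folded multiply avoids this particular failure mode, since folding the high
half of the 128-bit product back into the low half lets the high input bits
influence the low output bits as well (see the "Background" section below).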
In our opinion foldhash-f strikes the right balance between quality and
performance for hash tables, whereas fxhash flies a bit too close to the sun.

aHash is faster than foldhash for medium-to-long strings when compiled with
AES instruction support, but is slower in almost every other scenario or when
AES instructions are unavailable.


## Quality

Foldhash-f is a fairly strong hash in terms of collisions *on its full 64-bit
output*. However, statistical tests such as
[SMHasher3](https://gitlab.com/fwojcik/smhasher3) can distinguish it from an ideal
hash function in tests that focus on the relationship between individual
input/output bits. One such property is avalanching: with foldhash-f, flipping a
single bit of the input does not flip every output bit with 50% probability, as
it would if foldhash-f behaved like a proper random oracle.

As the benchmarks above show, spending more effort in foldhash-f to improve the
hash quality does not lead to better hash table performance. However, there are
also use cases for hash functions where it is important that (each bit of) the
hash is unbiased and a random function of all bits of the input, such as in
algorithms like HyperLogLog or MinHash.

For this purpose we also provide foldhash-q, which is simply a post-processed
version of foldhash-f that properly avalanches all the bits. Foldhash-q passes the
[SMHasher3](https://gitlab.com/fwojcik/smhasher3) test suite [without any
failures](https://github.com/orlp/foldhash_smhasher3). You can also plot the
worst-case probability (where 50% is ideal) that any particular output bit flips
if you flip an input bit, which nicely visualizes how fxhash and foldhash-f
fail this avalanche property while foldhash-q and SipHash-1-3 pass:

| FxHash | Foldhash-f | Foldhash-q | SipHash-1-3 |
|--------|------------|------------|-------------|
| <img src="assets/avalanche-fxhash.png" width=300> | <img src="assets/avalanche-foldhash-fast.png" width=300> | <img src="assets/avalanche-foldhash-quality.png" width=300> | <img src="assets/avalanche-siphash.png" width=300> |


## Background

The name foldhash is derived from the *folded multiply*. This technique
compresses two 64-bit words into a single 64-bit word while simultaneously
thoroughly mixing the bits. It does this using a 64 x 64 bit -> 128 bit
multiplication followed by folding the two halves of the 128-bit product
together using an XOR operation:

```rust
let full = (x as u128) * (y as u128);
let lo = full as u64;
let hi = (full >> 64) as u64;
let folded = lo ^ hi;
```

We're not aware of a formal analysis of this operation, but empirically it works
very well. An informal intuition for why it should work is that multiplication
can be seen as the sum of many shifted copies of one of the arguments, only
including those shifted copies where the other argument has set bits, e.g. for
multiplying 4-bit words `abcd` and `efgh`:

```
abcd * efgh =

abcd * e
 abcd * f
  abcd * g
   abcd * h
--------------- +
```

Note that the middle bits of the product are a function of many of the input
bits, whereas the top-most and bottom-most bits are impacted by fewer of the
input bits. By folding the top half back onto the bottom half these effects
compensate for each other, making all the output bits affected by much of the
input.
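For reference, here is the snippet above wrapped into a standalone helper,
together with a quick check that flipping a single input bit changes many
output bits (the function name and test values are ours, purely for
illustration):

```rust
/// Folded multiply: 64 x 64 -> 128 bit multiply, then XOR the two halves.
/// (Illustrative helper; not necessarily foldhash's internal API.)
fn folded_multiply(x: u64, y: u64) -> u64 {
    let full = (x as u128) * (y as u128);
    (full as u64) ^ ((full >> 64) as u64)
}

fn main() {
    let x = 0x0123_4567_89ab_cdef;
    let y = 0xfedc_ba98_7654_3210;
    let a = folded_multiply(x, y);
    // Flip a single bit of x: the folded product changes in many positions.
    let b = folded_multiply(x ^ (1 << 20), y);
    println!("differing output bits: {}", (a ^ b).count_ones());
}
```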
We did not invent the folded multiply; it was previously used in essentially the
same way in [aHash](https://github.com/tkaitchuck/aHash),
[wyhash](https://github.com/wangyi-fudan/wyhash), and
[xxhash3](https://github.com/Cyan4973/xxHash). The operation was also used
in [mum-hash](https://github.com/vnmakarov/mum-hash), and probably others.
We do not know who originally invented it; the earliest reference
we could find is Steven Fuerst [blogging about it](https://web.archive.org/web/20121213174842/http://locklessinc.com/articles/crypto_hash/)
in 2012.


## HashDoS resistance

The folded multiply has a fairly glaring flaw: if either of its inputs is zero,
the output is zero. This makes it trivial to create a large number of hash
collisions (even by accident, as zeroes are a common input to hashes). To combat
this, every folded multiply in foldhash has the following form:

```rust
folded_multiply(input1 ^ secret1, input2 ^ secret2)
```

Here `secret1` and `secret2` are either secret random numbers generated by
foldhash beforehand, or partial hash results that were themselves influenced by
such secrets. This (plus other careful design throughout the hash function)
ensures that it is not possible to create a list of inputs that collide for
every instance of foldhash, and also prevents certain access patterns on hash
tables from going quadratic, by ensuring that each hash table uses a different
seed and thus a different access pattern. It is these two properties that we
refer to when we claim foldhash is "minimally DoS-resistant": it does the bare
minimum to defeat very simple attacks.

However, to be crystal clear, **foldhash does not claim to provide HashDoS
resistance against interactive attackers**. For a student of cryptography it
should be trivial to derive the secret values from direct observation of hash
outputs, and feasible to derive the secret values from indirect observation of
hashes, such as through timing attacks or hash table iteration. Once an attacker
knows the secret values, they can once again create arbitrarily many hash
collisions with ease.
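To make the zero flaw and the effect of the secrets concrete, here is a minimal
sketch (the `folded_multiply` helper mirrors the snippet from the "Background"
section; the secret constants are arbitrary example values, not foldhash's
actual secrets):

```rust
/// Folded multiply as described in the "Background" section
/// (illustrative helper, not foldhash's internal API).
fn folded_multiply(x: u64, y: u64) -> u64 {
    let full = (x as u128) * (y as u128);
    (full as u64) ^ ((full >> 64) as u64)
}

fn main() {
    // The flaw: a zero input forces a zero output, whatever the other input is.
    assert_eq!(folded_multiply(0, 0xdead_beef), 0);
    assert_eq!(folded_multiply(0, u64::MAX), 0);

    // Seeded form: arbitrary example constants standing in for foldhash's
    // runtime-generated secrets. Forcing a zero argument now requires
    // knowing the secret values.
    let (secret1, secret2) = (0x9e37_79b9_7f4a_7c15_u64, 0xd6e8_feb8_6659_fd93_u64);
    let seeded = |x: u64, y: u64| folded_multiply(x ^ secret1, y ^ secret2);
    println!("seeded(0, 0xdead_beef) = {:#x}", seeded(0, 0xdead_beef));

    // ...but an attacker who learns the secrets can force zeroes again.
    assert_eq!(seeded(secret1, 12345), 0);
}
```

The last assertion is exactly the caveat above: the seeding only helps for as
long as the secret values remain unknown to the attacker.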