• Home
  • Line#
  • Scopes#
  • Navigate#
  • Raw
  • Download
1
2__collisionsTest__ is a brute force hash analyzer
3which will measure a 64-bit hash algorithm's collision rate
4by generating billions of hashes,
5and comparing the result to an "ideal" target.
6
7The test requires a very large amount of memory.
8By default, it will generate 24 billion of 64-bit hashes,
9requiring __192 GB of RAM__ for their storage.
10The number of hashes can be modified using command `--nbh=`.
11Be aware that testing the collision ratio of 64-bit hashes
12requires a very large amount of hashes (several billion) for meaningful measurements.
13
14To reduce RAM usage, an optional filter can be requested, with `--filter`.
15It reduces the nb of candidates to analyze, hence associated RAM budget.
16Note that the filter itself requires a lot of RAM
17(32 GB by default, can be modified using `--filterlog=`,
18a too small filter will not be efficient, aim at ~2 bytes per hash),
19and reading and writing into filter cost a significant CPU budget,
20so this method is slower.
21It also doesn't allow advanced analysis of partial bitfields,
22since most hashes will be discarded and not stored.
23
24When using the filter, the RAM budget consists of the filter and a list of candidates,
25which will be a fraction of the original hash list.
26Using default settings (24 billion hashes, 32 GB filter),
27the number of potential candidates should be reduced to less than 2 billion,
28requiring ~14 GB for their storage.
29Such a result also depends on hash algorithm's efficiency.
30The number of effective candidates is likely to be lower, at ~ 1 billion,
31but storage must allocate an upper bound.
32
33For the default test, the expected "optimal" collision rate for a 64-bit hash function is ~18 collisions.
34
35#### How to build
36```
37make
38```
39
40Note: the code is a mix of C99 and C++14,
41it's not compatible with a C90-only compiler.
42
43#### Build modifier
44
45- `SLAB5`: use alternative pattern generator, friendlier for weak hash algorithms
46- `POOL_MT`: if `=0`, disable multi-threading code (enabled by default)
47
48#### How to integrate any hash in the tester
49
50The build script will compile files found in `./allcodecs`.
51Put the source code here.
52This also works if the hash is a single `*.h` file.
53
54The glue happens in `hashes.h`.
55In this file, there are 2 sections:
56- Adds the required `#include "header.h"`, and creates a wrapper
57to respect the format expected by the function pointer.
58- Adds the wrapper, along with the name and an indication of the output width,
59to the table, at the end of `hashes.h`
60
61Build with `make`. Locate your new hash with `./collisionsTest -h`,
62it should be listed.
63
64
65#### Usage
66
67```
68usage: ./collisionsTest [hashName] [opt]
69
70list of hashNames: (...)
71
72Optional parameters:
73  --nbh=NB       Select nb of hashes to generate (25769803776 by default)
74  --filter       Enable the filter. Slower, but reduces memory usage for same nb of hashes.
75  --threadlog=NB Use 2^NB threads
76  --len=NB       Select length of input (255 bytes by default)
77```
78
79#### Some advises on how to setup a collisions test
80
81Most tests are primarily driven by the amount of RAM available.
82Here's a method to decide the size of the test.
83
84Presuming that RAM budget is not plentiful, for this example 32 GB,
85the `--filter` mode is actually compulsory to measure anything meaningful.
86Let's plan 50% of memory for the filter, that's 16 GB.
87This will be good enough to filter about 10% less hashes than this size.
88Let's round down to 14 G.
89
90By requesting 14G, the expectation is that the program will automatically
91size the filter to 16 GB, and expect to store ~1G candidates,
92leaving enough room to breeze for the system.
93
94The command line becomes:
95```
96./collisionsTest --nbh=14G --filter NameOfHash
97```
98
99#### Examples:
100
101Here are a few results produced with this tester:
102
103| Algorithm | Input Len | Nb Hashes | Expected | Nb Collisions | Notes |
104| ---        | --- | ---    | ---   | --- | --- |
105| __XXH3__   | 255 | 100 Gi | 312.5 | 326 |  |
106| __XXH64__  | 255 | 100 Gi | 312.5 | 294 |  |
107| __XXH128__ low64 | 512 | 100 Gi | 312.5 | 321 |  |
108| __XXH128__ high64| 512 | 100 Gi | 312.5 | 325 |  |
109| __XXH128__ | 255 | 100 Gi |   0.0 |   0 | a 128-bit hash is expected to generate 0 collisions |
110
111Test on small inputs:
112
113| Algorithm  | Input Len | Nb Hashes | Expected | Nb Collisions | Notes |
114| ---        | --- | ---    | --- | --- | --- |
115| __XXH64__  |   8 | 100 Gi | 312.5 | __0__ | `XXH64` is bijective for `len==8` |
116| __XXH3__   |   8 | 100 Gi | 312.5 | __0__ | `XXH3` is also bijective for `len==8` |
117| __XXH3__   |  16 | 100 Gi | 312.5 | 332 |  |
118| __XXH3__   |  32 |  14 Gi |   6.1 |   3 |  |
119| __XXH128__ |  16 |  25 Gi |   0.0 |   0 | test range 9-16 |
120| __XXH128__ |  32 |  25 Gi |   0.0 |   0 | test range 17-128 |
121| __XXH128__ | 100 |  13 Gi |   0.0 |   0 | test range 17-128 |
122| __XXH128__ | 200 |  13 Gi |   0.0 |   0 | test range 129-240 |
123