README.md
1# CRC-32
2
3CRC-32 (Cyclic Redundancy Check 32) is a checksum algorithm that hashes byte
4sequences to 32 bit values.
5
6The algorithm is based on polynomial division. In theory, a variety of
7polynomials can be used. In practice, only two are widely used. The IEEE
8polynomial is used by Bzip2, Ethernet (IEEE 802.3), Gzip, MPEG-2, PNG, SATA,
9Zip and other formats. The Castagnoli polynomial is used by Btrfs, Ext4, iSCSI,
10SCTP and other formats.
11
12
13# Polynomial Division
14
15In arithmetic, two numbers `A` and `B` can be divided to yield a quotient `Q`
16and a remainder `R`, so that `A = (B × Q) + R`. Two polynomials can similarly
17be divided, yielding quotient and remainder polynomials. This remains true when
18restricting polynomial coefficients to the [Galois Field
19GF(2)](https://en.wikipedia.org/wiki/GF(2)), equivalent to the integers modulo
202, i.e. the integers 0 and 1. Addition and multiplication are equivalent to XOR
21and AND, so that `foo + foo = 0` and `foo + 0 = foo`. For example,
22
23```
24 ((x⁴ + x¹ + x⁰) × (x⁴ + x⁰)) + (x¹ + x⁰)
25= (x⁸ + x⁵ + x⁴) + (x⁴ + x¹ + x⁰) + (x¹ + x⁰)
26= x⁸ + x⁵ + (x⁴ + x⁴) + (x¹ + x¹) + (x⁰ + x⁰)
27= x⁸ + x⁵ + 0 + 0 + 0
28= x⁸ + x⁵
29```
30
31Therefore, dividing `A(x) = x⁸ + x⁵` by `B(x) = x⁴ + x¹ + x⁰` yields a
32remainder of `R(x) = x¹ + x⁰`. Such polynomials can be represented as binary
33numbers: `A` is `0b100100000`, `B` is `0b10011` and `R` is `0b11`.
34
35
36## Bitwise Operations
37
38[Polynomial division](https://en.wikipedia.org/wiki/Polynomial_long_division)
39over GF(2) can be efficiently implemented as a sequence of XORs and shifts.
40Extending the example above (with a divisor polynomial `B` of degree `N` equal
41to 4), an input `0b10010` is first padded with `N` trailing zeroes to give the
42initial state: the `A` value `0b100100000`.
43
44The algorithm loops, alternating between shifting a copy of the `B` polynomial
45so that its left-most 1 bit aligns with the left-most 1 bit of the state, and
46XOR-ing those two values, producing a new state whose left-most 1 bit is
47further to right. The loop continues until no such shifted `B` can be found, or
48in other words, all of the state's bits to the left of the right-most `N` bits
49are all zero. What remains is the remainder polynomial, `R`.
50
51```
52input 00010010
53pad 00010010 0000
54divisor 10011
55xor 00000001 0000
56divisor 1 0011
57xor 00000000 0011
58remain 0011
59```
60
61This `B` divisor polynomial, `0b10011`, is also known as the CRC-4-ITU
62polynomial. The `4` in "CRC-4-ITU" is the `N` in "an N bit CRC". The polynomial
63has `N+1` bits, and the hash value or remainder `0b0011` (what remains after
64dividing a longer padded-input polynomial by the shorter hash polynomial) has
65`N` bits.
66
67
68## Normal (MSB) and Reversed (LSB) Representation
69
70Consider that CRC-4-ITU polynomial again: (x⁴ + x¹ + x⁰) or `0b10011` in
71binary. The high bit of an `N`-degree polynomial is always 1, so an equivalent
72representation of that polynomial is `N=4, MSB, BITS=0b0011`. This is the
73"normal" or Most Significant Bit first order. Another equivalent representation
74is to visit the bits right-to-left: `N=4, LSB, BITS=0b1100`. This is the
75"reversed" or Least Significant Bit first order. The binary number `0b1100` in
76hexadecimal is `0xC`, so yet another equivalent representation of that
77polynomial is `N=4, LSB, BITS=0xC`. If `N` and `LSB`-ness is agreed beforehand,
78`0xC` is all you need to specify the polynomial.
79
80For 32 bit CRCs, the two popular polynomials (presented in `LSB` order) are
81`0xEDB88320` and `0x82F63B78`, also called the IEEE and Castagnoli polynomials,
82also called CRC-32 and CRC-32C. For example, the bit string representation of
83the IEEE polynomial `0xEDB88320`, un-reversed and with the implicit high bit,
84is `0b1_00000100_11000001_00011101_10110111`.
85
86
87# Worked Example
88
89A worked example for calculating the CRC-32 hash of the three byte input "Hi\n"
90(equivalently, "\x48\x69\x0A" but note in the work below's input that the bits
91within each byte are reversed to be LSB-first) proceeds similarly to the
92simpler worked example for the CRC-4-ITU hash, above, with two additional
93inversion steps, described below.
94
95Both the CRC-4-ITU and the CRC-32 worked examples are output by the
96`script/print-crc32-example.go` program.
97
98```
99input 00010010 10010110 01010000
100pad 00010010 10010110 01010000 00000000 00000000 00000000 00000000
101invert 11101101 01101001 10101111 11111111 00000000 00000000 00000000
102divisor 10000010 01100000 10001110 11011011 1
103xor 01101111 00001001 00100001 00100100 10000000 00000000 00000000
104divisor 1000001 00110000 01000111 01101101 11
105xor 00101110 00111001 01100110 01001001 01000000 00000000 00000000
106divisor 100000 10011000 00100011 10110110 111
107xor 00001110 10100001 01000101 11111111 10100000 00000000 00000000
108divisor 1000 00100110 00001000 11101101 10111
109xor 00000110 10000111 01001101 00010010 00011000 00000000 00000000
110divisor 100 00010011 00000100 01110110 110111
111xor 00000010 10010100 01001001 01100100 11000100 00000000 00000000
112divisor 10 00001001 10000010 00111011 0110111
113xor 00000000 10011101 11001011 01011111 10101010 00000000 00000000
114divisor 10000010 01100000 10001110 11011011 1
115xor 00000000 00011111 10101011 11010001 01110001 10000000 00000000
116divisor 10000 01001100 00010001 11011011 0111
117xor 00000000 00001111 11100111 11000000 10101010 11110000 00000000
118divisor 1000 00100110 00001000 11101101 10111
119xor 00000000 00000111 11000001 11001000 01000111 01001000 00000000
120divisor 100 00010011 00000100 01110110 110111
121xor 00000000 00000011 11010010 11001100 00110001 10010100 00000000
122divisor 10 00001001 10000010 00111011 0110111
123xor 00000000 00000001 11011011 01001110 00001010 11111010 00000000
124divisor 1 00000100 11000001 00011101 10110111
125xor 00000000 00000000 11011111 10001111 00010111 01001101 00000000
126divisor 10000010 01100000 10001110 11011011 1
127xor 00000000 00000000 01011101 11101111 10011001 10010110 10000000
128divisor 1000001 00110000 01000111 01101101 11
129xor 00000000 00000000 00011100 11011111 11011110 11111011 01000000
130divisor 10000 01001100 00010001 11011011 0111
131xor 00000000 00000000 00001100 10010011 11001111 00100000 00110000
132divisor 1000 00100110 00001000 11101101 10111
133xor 00000000 00000000 00000100 10110101 11000111 11001101 10001000
134divisor 100 00010011 00000100 01110110 110111
135xor 00000000 00000000 00000000 10100110 11000011 10111011 01010100
136remain 10100110 11000011 10111011 01010100
137invert 01011001 00111100 01000100 10101011
138hex A 9 C 3 2 2 5 D
139```
140
141The final line says that the CRC-32 checksum of "Hi\n" is 0xD5223C9A. This can
142be verified by running the `/usr/bin/crc32` program:
143
144```
145$ echo Hi | xxd /dev/stdin
14600000000: 4869 0a Hi.
147$ echo Hi | crc32 /dev/stdin
148d5223c9a
149```
150
151
152## Inversion
153
154An `A` value of `0b100100000` as a mathematical concept (whether as a binary
155number of as a polynomial) is unchanged by adding leading 0 bits. Thus, the CRC
156algorithm in its basic form will compute the same hash value for both some
157string `s` and another string that is multiple "\x00" bytes followed by `s`.
158
159It's easy for network errors or other corruption to introduce multiple "\x00"
160bytes, and a good checksum should be able detect that. To do so, CRC as used in
161practice inverts (applies a bitwise NOT to) the first `N` bits of the padded
162input, just before the divisor-xor loop of the algorithm.
163
164Similarly but independently of that, the same issue (in a more limited way) can
165occur with trailing "\x00" bytes. The same trick addresses that, this time
166inverting the last `N` bits, just after the divisor-xor loop.
167
168While the two inversions are notionally independent, and it would be possible
169to implement a CRC flavor that inverted only one side, inverting on both sides
170results in a nice decomposability property. Calculating the CRC hash of the
171concatenation of two strings, `s+t`, can be computed by first hashing `s`, then
172hashing `t` starting with an initial state equal to that hash instead of equal
173to zero. The second inversion of the `s` computation cancels out the first
174inversion of the `t` computation.
175
176
177# Fast Computation
178
179In theory, the mathematics of the CRC algorithm works in terms of bit streams.
180In practice, the computation of the CRC hash value works with byte streams: 8
181(or more) bits at a time. It is faster to do so because CPU and RAM
182fundamentally work with bytes (or words) instead of bits, and because the
183bit-by-bit loop (over 8 successive 0 or 1 input bits) can be implemented as a
184byte-by-byte loop involving a single lookup into a 256-entry table, yielding a
18532-bit value to XOR with the cumulative state. For well known polynomials, such
186as the IEEE and Castagnoli polynomials, these lookup tables can be calculated
187beforehand, and hard-coded at compile time. A uint32 `state` variable can be
188updated by the simple algorithm (with `invert` meaning `bitwise_not`):
189
190```
191state = invert(state)
192for each input byte x {
193 state = table[x ^ (state & 0xFF)] ^ (state >> 8)
194}
195state = invert(state)
196```
197
198
199## Multi-Byte Lookup Tables
200
201Better performance can be obtained by processing M bytes at a time, e.g. for an
202M of 4, 8 or 16. Even at only 4 bytes, a naive implementation would require a
203more-than-4-billion (256 × 256 × 256 × 256) entry lookup table, which is
204impractical. A cleverer algorithm can process 4 bytes at a time using (256 +
205256 + 256 + 256) entries. See [A Systematic Approach to Building High
206Performance, Software-based, CRC
207Generators](https://web.archive.org/web/20060515024705/http://www.intel.com/technology/comms/perfnet/download/CRC_generators.pdf)
208by Kounavis and Berry of Intel Corporation. [Stephan Brumme's CRC-32
209page](https://create.stephan-brumme.com/crc32/#slicing-by-16-overview) also has
210some more discussion and code examples.
211
212This slicing-by-M algorithm is still applicable even when the input isn't an
213exact multiple of M bytes. The bulk of the input is processed M bytes at a
214time, and the remainder is then processed 1 byte at a time.
215
216For M greater than 4, the first 4 bytes of each slice are treated in one way,
217since the state (a uint32) is 4 bytes long. The remaining M-4 bytes are treated
218a second way. See [the actual code](./common_crc32.wuffs) for details.
219
220
221## SIMD Implementations
222
223Even better performance can be obtained through CPU-specific SIMD instructions.
224See [Fast CRC Computation for Generic Polynomials Using PCLMULQDQ
225Instruction](https://www.intel.com/content/dam/www/public/us/en/documents/white-papers/fast-crc-computation-generic-polynomials-pclmulqdq-paper.pdf)
226by Gopal, Ozturk, Guilford, Wolrich, Feghali and Dixon of Intel Corporation and
227Karakoyunlu of the Worcester Polytechnic Institute.
228
229Wuffs does not currently implement the SIMD algorithm.
230
231
232# Further Reading
233
234See a couple of Wikipedia articles:
235
236- [CRC](https://en.wikipedia.org/wiki/Cyclic_redundancy_check),
237- [Computation of
238 CRCs](https://en.wikipedia.org/wiki/Computation_of_cyclic_redundancy_checks).
239