# Base38 and FourCC Codes

Both of these encode a four-character string such as `"JPEG"` as a `uint32_t`
value. Computers can compare two integer values faster than they can compare
two arbitrary strings.

Both schemes maintain ordering: if two four-character strings `s` and `t`
satisfy `(s < t)`, and those strings have valid numerical encodings, then the
numerical values also satisfy `(encoding(s) < encoding(t))`.


## FourCC

FourCC codes are not specific to Wuffs. For example, the AVI multimedia
container format can hold various sub-formats, such as "H264" or "YV12",
distinguished in the overall file format by their FourCC code.

The FourCC encoding is the straightforward sequence of each character's ASCII
encoding. The FourCC code for `"JPEG"` is `0x4A504547`, since `'J'` is `0x4A`,
`'P'` is `0x50`, etc. This is essentially 8 bits for each character, 32 bits
overall. The big-endian representation of this number is exactly the ASCII (and
UTF-8) string `"JPEG"`.

Other FourCC documentation sometimes uses a little-endian convention, so that
the `{0x4A, 0x50, 0x45, 0x47}` bytes on the wire for `"JPEG"` correspond to
the number `0x4745504A` (little-endian) instead of `0x4A504547` (big-endian).
Wuffs uses the big-endian interpretation, as it maintains ordering.


## Base38

Base38 is a tighter encoding than FourCC, fitting four characters into 21 bits
instead of 32 bits. This is achieved by using a smaller alphabet of 38 possible
values (space, 0-9, ? or a-z), so that it cannot distinguish between e.g. an
upper case 'X' and a lower case 'x'. There's also the happy coincidence that
`38 ** 4` is slightly smaller than `2 ** 21`.

The base38 encoding of `"JPEG"` is `0x122FF6`, which is `1191926`, which is
`((21 * (38 ** 3)) + (27 * (38 ** 2)) + (16 * (38 ** 1)) + (18 * (38 ** 0)))`.
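
As a rough illustration of both encodings, here is a minimal C sketch. The
function names (`fourcc_encode`, `base38_digit`, `base38_encode`) are
hypothetical and are not part of any Wuffs API. The digit assignment (space is
0, `'0'` to `'9'` are 1 to 10, `'?'` is 11, `'a'` to `'z'` are 12 to 37) is
inferred from the alphabet order and the `"JPEG"` arithmetic above. Folding
upper case to lower case and returning `UINT32_MAX` for invalid input are
assumptions made for this example only.

```c
#include <stdint.h>

// Hypothetical helper: packs four ASCII bytes in big-endian order, so that
// the first character lands in the most significant byte.
static uint32_t fourcc_encode(const char s[4]) {
  return ((uint32_t)(uint8_t)s[0] << 24) |
         ((uint32_t)(uint8_t)s[1] << 16) |
         ((uint32_t)(uint8_t)s[2] << 8) |
         ((uint32_t)(uint8_t)s[3] << 0);
}

// Hypothetical helper: maps one character to its base38 digit, or returns -1
// if the character is outside the 38-character alphabet.
static int base38_digit(char c) {
  if (c == ' ') {
    return 0;
  } else if ((c >= '0') && (c <= '9')) {
    return 1 + (c - '0');
  } else if (c == '?') {
    return 11;
  } else if ((c >= 'a') && (c <= 'z')) {
    return 12 + (c - 'a');
  } else if ((c >= 'A') && (c <= 'Z')) {
    return 12 + (c - 'A');  // Assumption: upper case folds to lower case.
  }
  return -1;
}

// Hypothetical helper: treats the four characters as base-38 digits, most
// significant first. Returns UINT32_MAX (an assumed error sentinel, which is
// larger than any valid 21-bit result) if any character is invalid.
static uint32_t base38_encode(const char s[4]) {
  uint32_t v = 0;
  for (int i = 0; i < 4; i++) {
    int d = base38_digit(s[i]);
    if (d < 0) {
      return UINT32_MAX;
    }
    v = (v * 38) + (uint32_t)d;
  }
  return v;
}
```

With these definitions, `fourcc_encode("JPEG")` yields `0x4A504547` and
`base38_encode("JPEG")` yields `0x122FF6`, matching the values given above.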

Using only 21 bits means that we can use base38 values to partition the set of
possible `uint32_t` values into file-format specific enumerations. Each package
(i.e. Wuffs implementation of a specific file format) can define up to 1024
different values in their own namespace, without conflicting with other
packages (assuming that there aren't e.g. two `"JPEG"` Wuffs packages in the
same library). The conventional `uint32_t` packing is:

- Bit 31 is reserved (zero).
- Bits 30 .. 10 are the base38 value, shifted by 10.
- Bits 9 .. 0 are the enumeration value.

For example, [quirk values](/doc/note/quirks.md) use this `((base38 << 10) |
enumeration)` scheme.
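
The bit layout above can be sketched in C as follows. The helper names
(`pack_value`, `unpack_base38`, `unpack_enumeration`) are hypothetical, not
Wuffs API, and masking out-of-range inputs (rather than rejecting them) is an
assumption made to keep the example short.

```c
#include <stdint.h>

// Hypothetical helper: combines a 21-bit base38 value and a 10-bit
// enumeration value. Bit 31 stays zero because the base38 field is masked to
// 21 bits before being shifted by 10.
static uint32_t pack_value(uint32_t base38, uint32_t enumeration) {
  return ((base38 & 0x1FFFFF) << 10) | (enumeration & 0x3FF);
}

// Hypothetical helper: recovers bits 30 .. 10, the base38 value.
static uint32_t unpack_base38(uint32_t packed) {
  return (packed >> 10) & 0x1FFFFF;
}

// Hypothetical helper: recovers bits 9 .. 0, the enumeration value.
static uint32_t unpack_enumeration(uint32_t packed) {
  return packed & 0x3FF;
}
```

For instance, packing the `"JPEG"` base38 value `0x122FF6` with an arbitrary,
purely illustrative enumeration value of 3 gives `0x48BFD803`, and unpacking
that number returns the original pair.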