1CityHash, a family of hash functions for strings. 2 3 4Introduction 5============ 6 7CityHash provides hash functions for strings. The functions mix the 8input bits thoroughly but are not suitable for cryptography. See 9"Hash Quality," below, for details on how CityHash was tested and so on. 10 11We provide reference implementations in C++, with a friendly MIT license. 12 13CityHash32() returns a 32-bit hash. 14 15CityHash64() and similar return a 64-bit hash. 16 17CityHash128() and similar return a 128-bit hash and are tuned for 18strings of at least a few hundred bytes. Depending on your compiler 19and hardware, it's likely faster than CityHash64() on sufficiently long 20strings. It's slower than necessary on shorter strings, but we expect 21that case to be relatively unimportant. 22 23CityHashCrc128() and similar are variants of CityHash128() that depend 24on _mm_crc32_u64(), an intrinsic that compiles to a CRC32 instruction 25on some CPUs. However, none of the functions we provide are CRCs. 26 27CityHashCrc256() is a variant of CityHashCrc128() that also depends 28on _mm_crc32_u64(). It returns a 256-bit hash. 29 30All members of the CityHash family were designed with heavy reliance 31on previous work by Austin Appleby, Bob Jenkins, and others. 32For example, CityHash32 has many similarities with Murmur3a. 33 34Performance on long strings: 64-bit CPUs 35======================================== 36 37We are most excited by the performance of CityHash64() and its variants on 38short strings, but long strings are interesting as well. 39 40CityHash is intended to be fast, under the constraint that it hash very 41well. For CPUs with the CRC32 instruction, CRC is speedy, but CRC wasn't 42designed as a hash function and shouldn't be used as one. CityHashCrc128() 43is not a CRC, but it uses the CRC32 machinery. 44 45On a single core of a 2.67GHz Intel Xeon X5550, CityHashCrc256 peaks at about 465 to 5.5 bytes/cycle. The other CityHashCrc functions are wrappers around 47CityHashCrc256 and should have similar performance on long strings. 48(CityHashCrc256 in v1.0.3 was even faster, but we decided it wasn't as thorough 49as it should be.) CityHash128 peaks at about 4.3 bytes/cycle. The fastest 50Murmur variant on that hardware, Murmur3F, peaks at about 2.4 bytes/cycle. 51We expect the peak speed of CityHash128 to dominate CityHash64, which is 52aimed more toward short strings or use in hash tables. 53 54For long strings, a new function by Bob Jenkins, SpookyHash, is just 55slightly slower than CityHash128 on Intel x86-64 CPUs, but noticeably 56faster on AMD x86-64 CPUs. For hashing long strings on AMD CPUs 57and/or CPUs without the CRC instruction, SpookyHash may be just as 58good or better than any of the CityHash variants. 59 60Performance on short strings: 64-bit CPUs 61========================================= 62 63For short strings, e.g., most hash table keys, CityHash64 is faster than 64CityHash128, and probably faster than all the aforementioned functions, 65depending on the mix of string lengths. Here are a few results from that 66same hardware, where we (unrealistically) tested a single string length over 67and over again: 68 69Hash Results 70------------------------------------------------------------------------------ 71CityHash64 v1.0.3 7ns for 1 byte, or 6ns for 8 bytes, or 9ns for 64 bytes 72Murmur2 (64-bit) 6ns for 1 byte, or 6ns for 8 bytes, or 15ns for 64 bytes 73Murmur3F 14ns for 1 byte, or 15ns for 8 bytes, or 23ns for 64 bytes 74 75We don't have CityHash64 benchmarks results for v1.1, but we expect the 76numbers to be similar. 77 78Performance: 32-bit CPUs 79======================== 80 81CityHash32 is the newest variant of CityHash. It is intended for 8232-bit hardware in general but has been mostly tested on x86. Our benchmarks 83suggest that Murmur3 is the nearest competitor to CityHash32 on x86. 84We don't know of anything faster that has comparable quality. The speed rankings 85in our testing: CityHash32 > Murmur3f > Murmur3a (for long strings), and 86CityHash32 > Murmur3a > Murmur3f (for short strings). 87 88Installation 89============ 90 91We provide reference implementations of several CityHash functions, written 92in C++. The build system is based on autoconf. It defaults the C++ 93compiler flags to "-g -O2", which is probably slower than -O3 if you are 94using gcc. YMMV. 95 96On systems with gcc, we generally recommend: 97 98./configure 99make all check CXXFLAGS="-g -O3" 100sudo make install 101 102Or, if your system has the CRC32 instruction, and you want to build everything: 103 104./configure --enable-sse4.2 105make all check CXXFLAGS="-g -O3 -msse4.2" 106sudo make install 107 108Note that our build system doesn't try to determine the appropriate compiler 109flag for enabling SSE4.2. For gcc it is "-msse4.2". The --enable-sse4.2 110flag to the configure script controls whether citycrc.h is installed when 111you "make install." In general, picking the right compiler flags can be 112tricky, and may depend on your compiler, your hardware, and even how you 113plan to use the library. 114 115For generic information about how to configure this software, please try: 116 117./configure --help 118 119Failing that, please work from city.cc and city*.h, as they contain all the 120necessary code. 121 122 123Usage 124===== 125 126The above installation instructions will produce a single library. It will 127contain CityHash32(), CityHash64(), and CityHash128(), and their variants, 128and possibly CityHashCrc128(), CityHashCrc128WithSeed(), and 129CityHashCrc256(). The functions with Crc in the name are declared in 130citycrc.h; the rest are declared in city.h. 131 132 133Limitations 134=========== 135 1361) CityHash32 is intended for little-endian 32-bit code, and everything else in 137the current version of CityHash is intended for little-endian 64-bit CPUs. 138 139All functions that don't use the CRC32 instruction should work in 140little-endian 32-bit or 64-bit code. CityHash should work on big-endian CPUs 141as well, but we haven't tested that very thoroughly yet. 142 1432) CityHash is fairly complex. As a result of its complexity, it may not 144perform as expected on some compilers. For example, preliminary reports 145suggest that some Microsoft compilers compile CityHash to assembly that's 14610-20% slower than it could be. 147 148 149Hash Quality 150============ 151 152We like to test hash functions with SMHasher, among other things. 153SMHasher isn't perfect, but it seems to find almost any significant flaw. 154SMHasher is available at http://code.google.com/p/smhasher/ 155 156SMHasher is designed to pass a 32-bit seed to the hash functions it tests. 157No CityHash function is designed to work that way, so we adapt as follows: 158For our functions that accept a seed, we use the given seed directly (padded 159with zeroes); for our functions that don't accept a seed, we hash the 160concatenation of the given seed and the input string. 161 162The CityHash functions have the following flaws according to SMHasher: 163 164(1) CityHash64: none 165 166(2) CityHash64WithSeed: none 167 168(3) CityHash64WithSeeds: did not test 169 170(4) CityHash128: none 171 172(5) CityHash128WithSeed: none 173 174(6) CityHashCrc128: none 175 176(7) CityHashCrc128WithSeed: none 177 178(8) CityHashCrc256: none 179 180(9) CityHash32: none 181 182Some minor flaws in 32-bit and 64-bit functions are harmless, as we 183expect the primary use of these functions will be in hash tables. We 184may have gone slightly overboard in trying to please SMHasher and other 185similar tests, but we don't want anyone to choose a different hash function 186because of some minor issue reported by a quality test. 187 188 189For more information 190==================== 191 192http://code.google.com/p/cityhash/ 193 194cityhash-discuss@googlegroups.com 195 196Please feel free to send us comments, questions, bug reports, or patches. 197