xxHash is an extremely fast non-cryptographic hash algorithm, working at RAM speed limits.
It is proposed in four flavors, in three families:
The reference system uses an Intel i7-9700K CPU, and runs Ubuntu x64 20.04. The open source benchmark program is compiled with clang v10.0 using -O3 flag.
Hash Name | ISA ext | Width | Large Data Speed | Small Data Velocity |
---|---|---|---|---|
XXH3_64bits() | AVX2 | 64 | 59.4 GB/s | 133.1 |
MeowHash | AES-NI | 128 | 58.2 GB/s | 52.5 |
XXH3_128bits() | AVX2 | 128 | 57.9 GB/s | 118.1 |
CLHash | PCLMUL | 64 | 37.1 GB/s | 58.1 |
XXH3_64bits() | SSE2 | 64 | 31.5 GB/s | 133.1 |
XXH3_128bits() | SSE2 | 128 | 29.6 GB/s | 118.1 |
RAM sequential read | N/A | 28.0 GB/s | N/A | |
ahash | AES-NI | 64 | 22.5 GB/s | 107.2 |
City64 | 64 | 22.0 GB/s | 76.6 | |
T1ha2 | 64 | 22.0 GB/s | 99.0 | |
City128 | 128 | 21.7 GB/s | 57.7 | |
FarmHash | AES-NI | 64 | 21.3 GB/s | 71.9 |
XXH64() | 64 | 19.4 GB/s | 71.0 | |
SpookyHash | 64 | 19.3 GB/s | 53.2 | |
Mum | 64 | 18.0 GB/s | 67.0 | |
CRC32C | SSE4.2 | 32 | 13.0 GB/s | 57.9 |
XXH32() | 32 | 9.7 GB/s | 71.9 | |
City32 | 32 | 9.1 GB/s | 66.0 | |
Blake3* | AVX2 | 256 | 4.4 GB/s | 8.1 |
Murmur3 | 32 | 3.9 GB/s | 56.1 | |
SipHash* | 64 | 3.0 GB/s | 43.2 | |
Blake3* | SSE2 | 256 | 2.4 GB/s | 8.1 |
HighwayHash | 64 | 1.4 GB/s | 6.0 | |
FNV64 | 64 | 1.2 GB/s | 62.7 | |
Blake2* | 256 | 1.1 GB/s | 5.1 | |
SHA1* | 160 | 0.8 GB/s | 5.6 | |
MD5* | 128 | 0.6 GB/s | 7.8 |
All xxHash variants use a similar API. Changing the algorithm is a trivial substitution.
input
, input + length
) is valid, readable memory.length
is 0
, input
may be NULL
.unsigned char
.These functions are stateless functions which hash a contiguous block of memory, immediately returning the result. They are the easiest and usually the fastest option.
XXH32(), XXH64(), XXH3_64bits(), XXH3_128bits()
These groups of functions allow incremental hashing of unknown size, even more than what would fit in a size_t.
XXH32_reset(), XXH64_reset(), XXH3_64bits_reset(), XXH3_128bits_reset()
Streaming functions generate the xxHash value from an incremental input. This method is slower than single-call functions, due to state management. For small inputs, prefer XXH32()
and XXH64()
, which are better optimized.
An XXH state must first be allocated using XXH*_createState()
.
Start a new hash by initializing the state with a seed using XXH*_reset()
.
Then, feed the hash state by calling XXH*_update()
as many times as necessary.
The function returns an error code, with 0 meaning OK, and any other value meaning there is an error.
Finally, a hash value can be produced anytime, by using XXH*_digest()
. This function returns the nn-bits hash as an int or long long.
It's still possible to continue inserting input into the hash state after a digest, and generate new hash values later on by invoking XXH*_digest()
.
When done, release the state using XXH*_freeState()
.
The default return values from XXH functions are unsigned 32, 64 and 128 bit integers. This the simplest and fastest format for further post-processing.
However, this leaves open the question of what is the order on the byte level, since little and big endian conventions will store the same number differently.
The canonical representation settles this issue by mandating big-endian convention, the same convention as human-readable numbers (large digits first).
When writing hash values to storage, sending them over a network, or printing them, it's highly recommended to use the canonical representation to ensure portability across a wider range of systems, present and future.
The following functions allow transformation of hash values to and from canonical format.
XXH32_canonicalFromHash(), XXH32_hashFromCanonical(), XXH64_canonicalFromHash(), XXH64_hashFromCanonical(), XXH128_canonicalFromHash(), XXH128_hashFromCanonical(),