xxHash 0.8.3
Extremely fast non-cryptographic hash function
 
Loading...
Searching...
No Matches
xxHash

xxHash is an extremely fast non-cryptographic hash algorithm, working at RAM speed limits.

It is proposed in four flavors, in three families:

  1. XXH32 family
    • Classic 32-bit hash function. Simple, compact, and runs on almost all 32-bit and 64-bit systems.
  2. XXH64 family
    • Classic 64-bit adaptation of XXH32. Just as simple, and runs well on most 64-bit systems (but not 32-bit systems).
  3. XXH3 family
    • Modern 64-bit and 128-bit hash function family which features improved strength and performance across the board, especially on smaller data. It benefits greatly from SIMD and 64-bit without requiring it.

Benchmarks

The reference system uses an Intel i7-9700K CPU, and runs Ubuntu x64 20.04. The open source benchmark program is compiled with clang v10.0 using -O3 flag.

Hash Name ISA ext Width Large Data Speed Small Data Velocity
XXH3_64bits() AVX2 64 59.4 GB/s 133.1
MeowHash AES-NI 128 58.2 GB/s 52.5
XXH3_128bits() AVX2 128 57.9 GB/s 118.1
CLHash PCLMUL 64 37.1 GB/s 58.1
XXH3_64bits() SSE2 64 31.5 GB/s 133.1
XXH3_128bits() SSE2 128 29.6 GB/s 118.1
RAM sequential read N/A 28.0 GB/s N/A
ahash AES-NI 64 22.5 GB/s 107.2
City64 64 22.0 GB/s 76.6
T1ha2 64 22.0 GB/s 99.0
City128 128 21.7 GB/s 57.7
FarmHash AES-NI 64 21.3 GB/s 71.9
XXH64() 64 19.4 GB/s 71.0
SpookyHash 64 19.3 GB/s 53.2
Mum 64 18.0 GB/s 67.0
CRC32C SSE4.2 32 13.0 GB/s 57.9
XXH32() 32 9.7 GB/s 71.9
City32 32 9.1 GB/s 66.0
Blake3* AVX2 256 4.4 GB/s 8.1
Murmur3 32 3.9 GB/s 56.1
SipHash* 64 3.0 GB/s 43.2
Blake3* SSE2 256 2.4 GB/s 8.1
HighwayHash 64 1.4 GB/s 6.0
FNV64 64 1.2 GB/s 62.7
Blake2* 256 1.1 GB/s 5.1
SHA1* 160 0.8 GB/s 5.6
MD5* 128 0.6 GB/s 7.8
Note
  • Hashes which require a specific ISA extension are noted. SSE2 is also noted, even though it is mandatory on x64.
  • Hashes with an asterisk are cryptographic. Note that MD5 is non-cryptographic by modern standards.
  • Small data velocity is a rough average of algorithm's efficiency for small data. For more accurate information, see the wiki.
  • More benchmarks and strength tests are found on the wiki: https://github.com/Cyan4973/xxHash/wiki

Usage

All xxHash variants use a similar API. Changing the algorithm is a trivial substitution.

Precondition
For functions which take an input and length parameter, the following requirements are assumed:
  • The range from [input, input + length) is valid, readable memory.
    • The only exception is if the length is 0, input may be NULL.
  • For C++, the objects must have the TriviallyCopyable property, as the functions access bytes directly as if it was an array of unsigned char.

Single Shot

These functions are stateless functions which hash a contiguous block of memory, immediately returning the result. They are the easiest and usually the fastest option.

XXH32(), XXH64(), XXH3_64bits(), XXH3_128bits()

#include <string.h>
#include "xxhash.h"
// Example for a function which hashes a null terminated string with XXH32().
XXH32_hash_t hash_string(const char* string, XXH32_hash_t seed)
{
// NULL pointers are only valid if the length is zero
size_t length = (string == NULL) ? 0 : strlen(string);
return XXH32(string, length, seed);
}
XXH32_hash_t XXH32(const void *input, size_t length, XXH32_hash_t seed)
Calculates the 32-bit hash of input using xxHash32.
Definition xxhash.h:3183
uint32_t XXH32_hash_t
An unsigned 32-bit integer.
Definition xxhash.h:587

Streaming

These groups of functions allow incremental hashing of unknown size, even more than what would fit in a size_t.

XXH32_reset(), XXH64_reset(), XXH3_64bits_reset(), XXH3_128bits_reset()

#include <stdio.h>
#include <assert.h>
#include "xxhash.h"
// Example for a function which hashes a FILE incrementally with XXH3_64bits().
XXH64_hash_t hashFile(FILE* f)
{
// Allocate a state struct. Do not just use malloc() or new.
assert(state != NULL && "Out of memory!");
// Reset the state to start a new hashing session.
char buffer[4096];
size_t count;
// Read the file in chunks
while ((count = fread(buffer, 1, sizeof(buffer), f)) != 0) {
// Run update() as many times as necessary to process the data
XXH3_64bits_update(state, buffer, count);
}
// Retrieve the finalized hash. This will not change the state.
// Free the state. Do not use free().
return result;
}
XXH64_hash_t XXH3_64bits_digest(const XXH3_state_t *statePtr)
Returns the calculated XXH3 64-bit hash value from an XXH3_state_t.
Definition xxhash.h:6660
struct XXH3_state_s XXH3_state_t
The opaque state struct for the XXH3 streaming API.
Definition xxhash.h:1244
XXH3_state_t * XXH3_createState(void)
Allocate an XXH3_state_t.
Definition xxhash.h:6369
XXH_errorcode XXH3_64bits_update(XXH3_state_t *statePtr, const void *input, size_t length)
Consumes a block of input to an XXH3_state_t.
Definition xxhash.h:6615
XXH_errorcode XXH3_64bits_reset(XXH3_state_t *statePtr)
Resets an XXH3_state_t to begin a new hash.
Definition xxhash.h:6431
XXH_errorcode XXH3_freeState(XXH3_state_t *statePtr)
Frees an XXH3_state_t.
Definition xxhash.h:6389
uint64_t XXH64_hash_t
An unsigned 64-bit integer.
Definition xxhash.h:866

Streaming functions generate the xxHash value from an incremental input. This method is slower than single-call functions, due to state management. For small inputs, prefer XXH32() and XXH64(), which are better optimized.

An XXH state must first be allocated using XXH*_createState().

Start a new hash by initializing the state with a seed using XXH*_reset().

Then, feed the hash state by calling XXH*_update() as many times as necessary.

The function returns an error code, with 0 meaning OK, and any other value meaning there is an error.

Finally, a hash value can be produced anytime, by using XXH*_digest(). This function returns the nn-bits hash as an int or long long.

It's still possible to continue inserting input into the hash state after a digest, and generate new hash values later on by invoking XXH*_digest().

When done, release the state using XXH*_freeState().

Canonical Representation

The default return values from XXH functions are unsigned 32, 64 and 128 bit integers. This the simplest and fastest format for further post-processing.

However, this leaves open the question of what is the order on the byte level, since little and big endian conventions will store the same number differently.

The canonical representation settles this issue by mandating big-endian convention, the same convention as human-readable numbers (large digits first).

When writing hash values to storage, sending them over a network, or printing them, it's highly recommended to use the canonical representation to ensure portability across a wider range of systems, present and future.

The following functions allow transformation of hash values to and from canonical format.

XXH32_canonicalFromHash(), XXH32_hashFromCanonical(), XXH64_canonicalFromHash(), XXH64_hashFromCanonical(), XXH128_canonicalFromHash(), XXH128_hashFromCanonical(),

#include <stdio.h>
#include "xxhash.h"
// Example for a function which prints XXH32_hash_t in human readable format
void printXxh32(XXH32_hash_t hash)
{
size_t i;
for(i = 0; i < sizeof(cano.digest); ++i) {
printf("%02x", cano.digest[i]);
}
printf("\n");
}
// Example for a function which converts XXH32_canonical_t to XXH32_hash_t
XXH32_hash_t convertCanonicalToXxh32(XXH32_canonical_t cano)
{
return hash;
}
void XXH32_canonicalFromHash(XXH32_canonical_t *dst, XXH32_hash_t hash)
Converts an XXH32_hash_t to a big endian XXH32_canonical_t.
Definition xxhash.h:3300
XXH32_hash_t XXH32_hashFromCanonical(const XXH32_canonical_t *src)
Converts an XXH32_canonical_t to a native XXH32_hash_t.
Definition xxhash.h:3307
Canonical (big endian) representation of XXH32_hash_t.
Definition xxhash.h:754
unsigned char digest[4]
Definition xxhash.h:755