xxHash 0.8.2
Extremely fast non-cryptographic hash function
Data Structures | |
struct | XXH128_hash_t |
The return value from 128-bit hashes. | |
struct | XXH128_canonical_t |
Macros | |
#define | XXH3_SECRET_SIZE_MIN 136 |
Typedefs | |
typedef struct XXH3_state_s | XXH3_state_t |
The state struct for the XXH3 streaming API. | |
Functions | |
XXH64_hash_t | XXH3_64bits (XXH_NOESCAPE const void *input, size_t length) |
64-bit unseeded variant of XXH3. | |
XXH64_hash_t | XXH3_64bits_withSeed (XXH_NOESCAPE const void *input, size_t length, XXH64_hash_t seed) |
64-bit seeded variant of XXH3. | |
XXH64_hash_t | XXH3_64bits_withSecret (XXH_NOESCAPE const void *data, size_t len, XXH_NOESCAPE const void *secret, size_t secretSize) |
64-bit variant of XXH3 with a custom "secret". | |
XXH3_state_t * | XXH3_createState (void) |
Allocate an XXH3_state_t. | |
XXH_errorcode | XXH3_freeState (XXH3_state_t *statePtr) |
Frees an XXH3_state_t. | |
void | XXH3_copyState (XXH_NOESCAPE XXH3_state_t *dst_state, XXH_NOESCAPE const XXH3_state_t *src_state) |
Copies one XXH3_state_t to another. | |
XXH_errorcode | XXH3_64bits_reset (XXH_NOESCAPE XXH3_state_t *statePtr) |
Resets an XXH3_state_t to begin a new hash. | |
XXH_errorcode | XXH3_64bits_reset_withSeed (XXH_NOESCAPE XXH3_state_t *statePtr, XXH64_hash_t seed) |
Resets an XXH3_state_t with 64-bit seed to begin a new hash. | |
XXH_errorcode | XXH3_64bits_reset_withSecret (XXH_NOESCAPE XXH3_state_t *statePtr, XXH_NOESCAPE const void *secret, size_t secretSize) |
XXH_errorcode | XXH3_64bits_update (XXH_NOESCAPE XXH3_state_t *statePtr, XXH_NOESCAPE const void *input, size_t length) |
Consumes a block of input to an XXH3_state_t. | |
XXH64_hash_t | XXH3_64bits_digest (XXH_NOESCAPE const XXH3_state_t *statePtr) |
Returns the calculated XXH3 64-bit hash value from an XXH3_state_t. | |
XXH128_hash_t | XXH3_128bits (XXH_NOESCAPE const void *data, size_t len) |
Unseeded 128-bit variant of XXH3. | |
XXH128_hash_t | XXH3_128bits_withSeed (XXH_NOESCAPE const void *data, size_t len, XXH64_hash_t seed) |
Seeded 128-bit variant of XXH3. | |
XXH128_hash_t | XXH3_128bits_withSecret (XXH_NOESCAPE const void *data, size_t len, XXH_NOESCAPE const void *secret, size_t secretSize) |
Custom secret 128-bit variant of XXH3. | |
XXH_errorcode | XXH3_128bits_reset (XXH_NOESCAPE XXH3_state_t *statePtr) |
Resets an XXH3_state_t to begin a new hash. | |
XXH_errorcode | XXH3_128bits_reset_withSeed (XXH_NOESCAPE XXH3_state_t *statePtr, XXH64_hash_t seed) |
Resets an XXH3_state_t with 64-bit seed to begin a new hash. | |
XXH_errorcode | XXH3_128bits_reset_withSecret (XXH_NOESCAPE XXH3_state_t *statePtr, XXH_NOESCAPE const void *secret, size_t secretSize) |
Custom secret 128-bit variant of XXH3. | |
XXH_errorcode | XXH3_128bits_update (XXH_NOESCAPE XXH3_state_t *statePtr, XXH_NOESCAPE const void *input, size_t length) |
Consumes a block of input to an XXH3_state_t. | |
XXH128_hash_t | XXH3_128bits_digest (XXH_NOESCAPE const XXH3_state_t *statePtr) |
Returns the calculated XXH3 128-bit hash value from an XXH3_state_t. | |
int | XXH128_isEqual (XXH128_hash_t h1, XXH128_hash_t h2) |
int | XXH128_cmp (XXH_NOESCAPE const void *h128_1, XXH_NOESCAPE const void *h128_2) |
Compares two XXH128_hash_t values. This comparator is compatible with stdlib's qsort()/bsearch(). | |
void | XXH128_canonicalFromHash (XXH_NOESCAPE XXH128_canonical_t *dst, XXH128_hash_t hash) |
Converts an XXH128_hash_t to a big-endian XXH128_canonical_t. | |
XXH128_hash_t | XXH128_hashFromCanonical (XXH_NOESCAPE const XXH128_canonical_t *src) |
Converts an XXH128_canonical_t to a native XXH128_hash_t. | |
XXH_errorcode | XXH3_64bits_reset_withSecretandSeed (XXH_NOESCAPE XXH3_state_t *statePtr, XXH_NOESCAPE const void *secret, size_t secretSize, XXH64_hash_t seed64) |
XXH128_hash_t | XXH3_128bits_withSecretandSeed (XXH_NOESCAPE const void *input, size_t length, XXH_NOESCAPE const void *secret, size_t secretSize, XXH64_hash_t seed64) |
XXH128_hash_t | XXH128 (XXH_NOESCAPE const void *data, size_t len, XXH64_hash_t seed) |
XXH_errorcode | XXH3_128bits_reset_withSecretandSeed (XXH_NOESCAPE XXH3_state_t *statePtr, XXH_NOESCAPE const void *secret, size_t secretSize, XXH64_hash_t seed64) |
XXH_errorcode | XXH3_generateSecret (XXH_NOESCAPE void *secretBuffer, size_t secretSize, XXH_NOESCAPE const void *customSeed, size_t customSeedSize) |
void | XXH3_generateSecret_fromSeed (XXH_NOESCAPE void *secretBuffer, XXH64_hash_t seed) |
Generate the same secret as the _withSeed() variants. | |
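XXH3 is a more recent hash algorithm featuring improved speed on both small and large inputs, 64-bit and 128-bit outputs, SIMD acceleration, and good viability on 32-bit targets.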
Speed analysis methodology is explained here:
https://fastcompression.blogspot.com/2019/03/presenting-xxh3.html
Compared to XXH64, expect XXH3 to run approximately 2x faster on large inputs and more than 3x faster on small ones; exact differences vary by platform.
XXH3's speed benefits greatly from SIMD and 64-bit arithmetic, but it requires neither. Most 32-bit and 64-bit targets that can run XXH32 smoothly can run XXH3 at competitive speeds, even without vector support. Further details are explained in the implementation.
XXH3 has a fast scalar implementation, and it also includes accelerated SIMD implementations for many common platforms.
The XXH3 implementation is portable: it has a generic C90 formulation that can be compiled on any platform, and all implementations generate exactly the same hash value on all platforms. Starting from v0.8.0, it is also labelled "stable", meaning that any future version will generate the same hash value.
XXH3 offers 2 variants, _64bits and _128bits.
When only 64 bits are needed, prefer invoking the _64bits variant, as it reduces the amount of mixing, resulting in faster speed on small inputs. It's also generally simpler to manipulate a scalar return type than a struct.
The API supports one-shot hashing, streaming mode, and custom secrets.
#define XXH3_SECRET_SIZE_MIN 136
The bare minimum size for a custom secret.
typedef struct XXH3_state_s XXH3_state_t
The state struct for the XXH3 streaming API.
XXH64_hash_t XXH3_64bits (XXH_NOESCAPE const void *input, size_t length)
64-bit unseeded variant of XXH3.
This is equivalent to XXH3_64bits_withSeed() with a seed of 0; however, it may have slightly better performance due to constant propagation of the defaults.
XXH64_hash_t XXH3_64bits_withSeed (XXH_NOESCAPE const void *input, size_t length, XXH64_hash_t seed)
64-bit seeded variant of XXH3.
This variant generates a custom secret on the fly, based on the default secret altered using the seed value.
While this operation is decently fast, note that it's not completely free.
input | The data to hash |
length | The length |
seed | The 64-bit seed to alter the state. |
XXH64_hash_t XXH3_64bits_withSecret (XXH_NOESCAPE const void *data, size_t len, XXH_NOESCAPE const void *secret, size_t secretSize)
64-bit variant of XXH3 with a custom "secret".
It's possible to provide any blob of bytes as a "secret" to generate the hash. This makes it more difficult for an external actor to prepare an intentional collision. The main condition is that secretSize must be large enough (>= XXH3_SECRET_SIZE_MIN).
However, the quality of the secret impacts the dispersion of the hash algorithm. Therefore, the secret must look like a bunch of random bytes. Avoid "trivial" or structured data such as repeated sequences or a text document. Whenever in doubt about the "randomness" of the blob of bytes, consider employing XXH3_generateSecret() instead (see below). It will generate a proper high-entropy secret derived from the blob of bytes.
Another advantage of using XXH3_generateSecret() is that it guarantees that all bits within the initial blob of bytes will impact every bit of the output. This is not necessarily the case when using the blob of bytes directly because, when hashing small inputs, only a portion of the secret is employed.
XXH3_state_t * XXH3_createState (void)
Allocate an XXH3_state_t.
Must be freed with XXH3_freeState().
Returns NULL on failure.
XXH_errorcode XXH3_freeState (XXH3_state_t *statePtr)
Frees an XXH3_state_t.
Must be allocated with XXH3_createState().
statePtr | A pointer to an XXH3_state_t allocated with XXH3_createState(). |
void XXH3_copyState (XXH_NOESCAPE XXH3_state_t *dst_state, XXH_NOESCAPE const XXH3_state_t *src_state)
Copies one XXH3_state_t to another.
dst_state | The state to copy to. |
src_state | The state to copy from. |
dst_state and src_state must not be NULL and must not overlap.
XXH_errorcode XXH3_64bits_reset (XXH_NOESCAPE XXH3_state_t *statePtr)
Resets an XXH3_state_t to begin a new hash.
This function resets statePtr and generates a secret with default parameters. Call it before XXH3_64bits_update(). The digest will be equivalent to XXH3_64bits().
statePtr | The state struct to reset. |
statePtr must not be NULL.
XXH_errorcode XXH3_64bits_reset_withSeed (XXH_NOESCAPE XXH3_state_t *statePtr, XXH64_hash_t seed)
Resets an XXH3_state_t with a 64-bit seed to begin a new hash.
This function resets statePtr and generates a secret from seed. Call it before XXH3_64bits_update(). The digest will be equivalent to XXH3_64bits_withSeed().
statePtr | The state struct to reset. |
seed | The 64-bit seed to alter the state. |
statePtr must not be NULL.
XXH_errorcode XXH3_64bits_reset_withSecret (XXH_NOESCAPE XXH3_state_t *statePtr, XXH_NOESCAPE const void *secret, size_t secretSize)
secret is referenced, not copied, so it must outlive the hash streaming session. As with the one-shot API, secretSize must be >= XXH3_SECRET_SIZE_MIN, and the quality of the produced hash values depends on the secret's entropy (its content should look like a bunch of random bytes). When in doubt about the randomness of a candidate secret, consider employing XXH3_generateSecret() instead (see below).
XXH_errorcode XXH3_64bits_update (XXH_NOESCAPE XXH3_state_t *statePtr, XXH_NOESCAPE const void *input, size_t length)
Consumes a block of input to an XXH3_state_t.
Call this to incrementally consume blocks of data.
statePtr | The state struct to update. |
input | The block of data to be hashed, at least length bytes in size. |
length | The length of input , in bytes. |
statePtr must not be NULL. The memory between input and input + length must be valid, readable, contiguous memory. However, if length is 0, input may be NULL. In C++, the data must also be TriviallyCopyable.
XXH64_hash_t XXH3_64bits_digest (XXH_NOESCAPE const XXH3_state_t *statePtr)
Returns the calculated XXH3 64-bit hash value from an XXH3_state_t.
Calling XXH3_64bits_digest() does not modify statePtr, so you can update, digest, and update again.
statePtr | The state struct to calculate the hash from. |
statePtr must not be NULL.
XXH128_hash_t XXH3_128bits (XXH_NOESCAPE const void *data, size_t len)
Unseeded 128-bit variant of XXH3.
The 128-bit variant of XXH3 has more strength, but it has a bit of overhead for shorter inputs.
This is equivalent to XXH3_128bits_withSeed() with a seed of 0; however, it may have slightly better performance due to constant propagation of the defaults.
XXH128_hash_t XXH3_128bits_withSeed (XXH_NOESCAPE const void *data, size_t len, XXH64_hash_t seed)
Seeded 128-bit variant of XXH3.
XXH128_hash_t XXH3_128bits_withSecret (XXH_NOESCAPE const void *data, size_t len, XXH_NOESCAPE const void *secret, size_t secretSize)
Custom secret 128-bit variant of XXH3.
XXH_errorcode XXH3_128bits_reset (XXH_NOESCAPE XXH3_state_t *statePtr)
Resets an XXH3_state_t to begin a new hash.
This function resets statePtr and generates a secret with default parameters. Call it before XXH3_128bits_update(). The digest will be equivalent to XXH3_128bits().
statePtr | The state struct to reset. |
statePtr must not be NULL.
XXH_errorcode XXH3_128bits_reset_withSeed (XXH_NOESCAPE XXH3_state_t *statePtr, XXH64_hash_t seed)
Resets an XXH3_state_t with a 64-bit seed to begin a new hash.
This function resets statePtr and generates a secret from seed. Call it before XXH3_128bits_update(). The digest will be equivalent to XXH3_128bits_withSeed().
statePtr | The state struct to reset. |
seed | The 64-bit seed to alter the state. |
statePtr must not be NULL.
XXH_errorcode XXH3_128bits_reset_withSecret (XXH_NOESCAPE XXH3_state_t *statePtr, XXH_NOESCAPE const void *secret, size_t secretSize)
Custom secret 128-bit variant of XXH3.
XXH_errorcode XXH3_128bits_update (XXH_NOESCAPE XXH3_state_t *statePtr, XXH_NOESCAPE const void *input, size_t length)
Consumes a block of input to an XXH3_state_t.
Call this to incrementally consume blocks of data.
statePtr | The state struct to update. |
input | The block of data to be hashed, at least length bytes in size. |
length | The length of input , in bytes. |
statePtr must not be NULL. The memory between input and input + length must be valid, readable, contiguous memory. However, if length is 0, input may be NULL. In C++, the data must also be TriviallyCopyable.
XXH128_hash_t XXH3_128bits_digest (XXH_NOESCAPE const XXH3_state_t *statePtr)
Returns the calculated XXH3 128-bit hash value from an XXH3_state_t.
Calling XXH3_128bits_digest() does not modify statePtr, so you can update, digest, and update again.
statePtr | The state struct to calculate the hash from. |
statePtr must not be NULL.
int XXH128_isEqual (XXH128_hash_t h1, XXH128_hash_t h2)
Returns 1 if h1 and h2 are equal, 0 if they are not.
int XXH128_cmp (XXH_NOESCAPE const void *h128_1, XXH_NOESCAPE const void *h128_2)
Compares two XXH128_hash_t values. This comparator is compatible with stdlib's qsort()/bsearch().
void XXH128_canonicalFromHash (XXH_NOESCAPE XXH128_canonical_t *dst, XXH128_hash_t hash)
Converts an XXH128_hash_t to a big-endian XXH128_canonical_t.
dst | The XXH128_canonical_t pointer to be stored to. |
hash | The XXH128_hash_t to be converted. |
dst must not be NULL.
XXH128_hash_t XXH128_hashFromCanonical (XXH_NOESCAPE const XXH128_canonical_t *src)
Converts an XXH128_canonical_t to a native XXH128_hash_t.
src | The XXH128_canonical_t to convert. |
src must not be NULL.
XXH_errorcode XXH3_64bits_reset_withSecretandSeed (XXH_NOESCAPE XXH3_state_t *statePtr, XXH_NOESCAPE const void *secret, size_t secretSize, XXH64_hash_t seed64)
These variants generate hash values using either seed for "short" keys (< XXH3_MIDSIZE_MAX = 240 bytes) or secret for "large" keys (>= XXH3_MIDSIZE_MAX).
This generally benefits speed, compared to _withSeed() or _withSecret(). _withSeed() has to generate the secret on the fly for "large" keys. It's fast, but the cost can be perceptible for "not so large" keys (< 1 KB). _withSecret() has to generate the masks on the fly for "small" keys, which requires more instructions than the _withSeed() variants. Therefore, the _withSecretandSeed variant combines the best of both worlds.
When secret has been generated by XXH3_generateSecret_fromSeed(), this variant produces exactly the same results as the _withSeed() variant, hence offering only a pure speed benefit on "large" input, by skipping the need to regenerate the secret for every large input.
Another usage scenario is to hash the secret to a 64-bit hash value, for example with XXH3_64bits(), which then becomes the seed, and then employ both the seed and the secret in _withSecretandSeed(). On top of speed, an added benefit is that each bit in the secret has a 50% chance to flip each bit in the output, via its impact on the seed.
This is not guaranteed when using the secret directly in "small data" scenarios, because only portions of the secret are employed for small data.
XXH128_hash_t XXH3_128bits_withSecretandSeed (XXH_NOESCAPE const void *input, size_t length, XXH_NOESCAPE const void *secret, size_t secretSize, XXH64_hash_t seed64)
These variants generate hash values using either seed for "short" keys (< XXH3_MIDSIZE_MAX = 240 bytes) or secret for "large" keys (>= XXH3_MIDSIZE_MAX).
This generally benefits speed, compared to _withSeed() or _withSecret(). _withSeed() has to generate the secret on the fly for "large" keys. It's fast, but the cost can be perceptible for "not so large" keys (< 1 KB). _withSecret() has to generate the masks on the fly for "small" keys, which requires more instructions than the _withSeed() variants. Therefore, the _withSecretandSeed variant combines the best of both worlds.
When secret has been generated by XXH3_generateSecret_fromSeed(), this variant produces exactly the same results as the _withSeed() variant, hence offering only a pure speed benefit on "large" input, by skipping the need to regenerate the secret for every large input.
Another usage scenario is to hash the secret to a 64-bit hash value, for example with XXH3_64bits(), which then becomes the seed, and then employ both the seed and the secret in _withSecretandSeed(). On top of speed, an added benefit is that each bit in the secret has a 50% chance to flip each bit in the output, via its impact on the seed.
This is not guaranteed when using the secret directly in "small data" scenarios, because only portions of the secret are employed for small data.
XXH128_hash_t XXH128 (XXH_NOESCAPE const void *data, size_t len, XXH64_hash_t seed)
Simple alias to a pre-selected XXH3_128bits variant.
XXH_errorcode XXH3_128bits_reset_withSecretandSeed (XXH_NOESCAPE XXH3_state_t *statePtr, XXH_NOESCAPE const void *secret, size_t secretSize, XXH64_hash_t seed64)
These variants generate hash values using either seed for "short" keys (< XXH3_MIDSIZE_MAX = 240 bytes) or secret for "large" keys (>= XXH3_MIDSIZE_MAX).
This generally benefits speed, compared to _withSeed() or _withSecret(). _withSeed() has to generate the secret on the fly for "large" keys. It's fast, but the cost can be perceptible for "not so large" keys (< 1 KB). _withSecret() has to generate the masks on the fly for "small" keys, which requires more instructions than the _withSeed() variants. Therefore, the _withSecretandSeed variant combines the best of both worlds.
When secret has been generated by XXH3_generateSecret_fromSeed(), this variant produces exactly the same results as the _withSeed() variant, hence offering only a pure speed benefit on "large" input, by skipping the need to regenerate the secret for every large input.
Another usage scenario is to hash the secret to a 64-bit hash value, for example with XXH3_64bits(), which then becomes the seed, and then employ both the seed and the secret in _withSecretandSeed(). On top of speed, an added benefit is that each bit in the secret has a 50% chance to flip each bit in the output, via its impact on the seed.
This is not guaranteed when using the secret directly in "small data" scenarios, because only portions of the secret are employed for small data.
XXH_errorcode XXH3_generateSecret (XXH_NOESCAPE void *secretBuffer, size_t secretSize, XXH_NOESCAPE const void *customSeed, size_t customSeedSize)
Derive a high-entropy secret from any user-defined content, named customSeed. The generated secret can be used in combination with *_withSecret() functions. The _withSecret() variants are useful to provide a higher level of protection than a 64-bit seed, as it becomes much more difficult for an external actor to guess how to impact the calculation logic.
The function accepts as input a custom seed of any length and any content, and derives from it a high-entropy secret of length secretSize into an already allocated buffer secretBuffer.
The generated secret can then be used with any *_withSecret() variant. The functions XXH3_128bits_withSecret(), XXH3_64bits_withSecret(), XXH3_128bits_reset_withSecret() and XXH3_64bits_reset_withSecret() are part of this list. They all accept a secret parameter which must be large enough for implementation reasons (>= XXH3_SECRET_SIZE_MIN) and feature very high entropy (consist of random-looking bytes). These conditions can be a high bar to meet, so XXH3_generateSecret() can be employed to ensure proper quality.
customSeed can be anything. It can have any size, even small ones, and its content can be anything, even "poor entropy" sources such as a bunch of zeroes. The resulting secret will nonetheless provide all required qualities.
secretSize must be >= XXH3_SECRET_SIZE_MIN.
When customSeedSize > 0, supplying NULL as customSeed is undefined behavior.
Example code:
void XXH3_generateSecret_fromSeed (XXH_NOESCAPE void *secretBuffer, XXH64_hash_t seed)
Generate the same secret as the _withSeed() variants.
The generated secret can be used in combination with *_withSecret() and _withSecretandSeed() variants.
Example C++ std::string hash class:
secretBuffer | A writable buffer of XXH3_SECRET_SIZE_MIN bytes |
seed | The seed to seed the state. |