Namespace faiss::quantize_lut
-
namespace quantize_lut
Functions to quantize PQ floating-point Look Up Tables (LUT) to uint8, and biases to uint16. The accumulation is supposed to take place in uint16. The quantization coefficients are float (a, b) such that
original_value = quantized_value * a / b
The hardest part of the quantization is with multiple LUTs that need to be added up together. In that case, coefficient a has to be chosen so that the sum fits in a uint16 accumulator.
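The uint16 constraint on summed LUTs can be illustrated with a minimal sketch. This is not the faiss implementation: `QuantizedTables` and `quantize_shared_scale` are hypothetical names, and the affine convention used here (code = floor((value - min) * scale)) is an assumption of the sketch that may differ from the library's (a, b) convention. It picks one scale shared by all M tables so that each code fits in uint8 and the sum of one code per table fits in uint16.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical container for the sketch (not a faiss type).
struct QuantizedTables {
    std::vector<uint8_t> codes; // M * ksub quantized entries, row-major
    float scale;                // code = floor((value - row_min) * scale)
    float offset;               // sum of the per-table minima
};

// Hypothetical helper: quantize M tables of ksub floats with one shared scale.
QuantizedTables quantize_shared_scale(
        const std::vector<float>& tab, size_t M, size_t ksub) {
    assert(M > 0 && ksub > 0 && tab.size() == M * ksub);

    std::vector<float> mins(M);
    float max_span = 0, sum_span = 0, offset = 0;
    for (size_t m = 0; m < M; m++) {
        const float* row = tab.data() + m * ksub;
        auto mm = std::minmax_element(row, row + ksub);
        mins[m] = *mm.first;
        float span = *mm.second - *mm.first;
        max_span = std::max(max_span, span);
        sum_span += span;
        offset += mins[m];
    }

    // Two upper bounds on the scale:
    //   255 / max_span   keeps every code within uint8,
    //   65535 / sum_span keeps the sum of one code per table within uint16.
    float scale = 1.0f;
    if (max_span > 0) {
        scale = std::min(255.0f / max_span, 65535.0f / sum_span);
    }

    QuantizedTables out{std::vector<uint8_t>(M * ksub), scale, offset};
    for (size_t m = 0; m < M; m++) {
        for (size_t j = 0; j < ksub; j++) {
            // floor (not round) so the summed codes cannot exceed the bound
            float q = std::floor((tab[m * ksub + j] - mins[m]) * scale);
            out.codes[m * ksub + j] =
                    static_cast<uint8_t>(std::min(q, 255.0f));
        }
    }
    return out;
}
```

Under this convention, a sum of codes `acc` accumulated in uint16 maps back to an approximate float value as `acc / scale + offset`. The uint16 bound only becomes the limiting factor when M exceeds 257, since 257 * 255 = 65535.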
Functions
-
void round_uint8_per_column(float *tab, size_t n, size_t d, float *a_out = nullptr, float *b_out = nullptr)
-
void round_uint8_per_column_multi(float *tab, size_t m, size_t n, size_t d, float *a_out = nullptr, float *b_out = nullptr)
-
void quantize_LUT_and_bias(size_t nprobe, size_t M, size_t ksub, bool lut_is_3d, const float *LUT, const float *bias, uint8_t *LUTq, size_t M2, uint16_t *biasq, float *a_out = nullptr, float *b_out = nullptr)
LUT quantization to uint8 and bias to uint16.
(nprobe, M, ksub, lut_is_3d) determine the size of the LUT.
LUT input:
2D size (M, ksub): a single matrix shared by all probes (lut_is_3d=false)
3D size (nprobe, M, ksub): separate LUT per probe (lut_is_3d=true)
bias input:
nullptr: bias is 0
size (nprobe): one bias per probe
Output:
LUTq: uint8 version of the LUT (M size is rounded up to M2)
biasq (or nullptr): uint16 version of the bias
a_out, b_out: scalars a, b to approximate the true distance
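The size rules above can be sketched as a small helper that computes how many elements each buffer holds. `LutSizes` and `lut_buffer_sizes` are hypothetical names, not part of faiss. The input sizes follow the description directly; the sizes of the quantized LUT (M replaced by M2) and of biasq (one entry per probe) are assumptions of this sketch and should be checked against the implementation.

```cpp
#include <cstddef>

// Hypothetical helper type for the sketch (not a faiss type).
struct LutSizes {
    size_t lut_floats;  // number of floats in the input LUT
    size_t bias_floats; // number of floats in the input bias, if not nullptr
    size_t lutq_bytes;  // assumed number of uint8 entries in LUTq
    size_t biasq_count; // assumed number of uint16 entries in biasq
};

LutSizes lut_buffer_sizes(
        size_t nprobe, size_t M, size_t ksub, bool lut_is_3d, size_t M2) {
    // 2D: one (M, ksub) matrix; 3D: one (M, ksub) matrix per probe.
    size_t n_tables = lut_is_3d ? nprobe : 1;
    LutSizes s;
    s.lut_floats = n_tables * M * ksub;
    s.bias_floats = nprobe;
    // Assumption: the output keeps the same layout with M rounded up to M2.
    s.lutq_bytes = n_tables * M2 * ksub;
    // Assumption: one quantized bias per probe.
    s.biasq_count = nprobe;
    return s;
}
```

For example, with nprobe = 4, M = 7, ksub = 16, M2 = 8 and a 3D LUT, the input holds 4 * 7 * 16 = 448 floats and, under the stated assumption, LUTq holds 4 * 8 * 16 = 512 bytes.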
-
void aq_quantize_LUT_and_bias(size_t nprobe, size_t M, size_t ksub, const float *LUT, const float *bias, size_t M_norm, int norm_scale, uint8_t *LUTq, size_t M2, uint16_t *biasq, float *a_out, float *b_out)
-
float aq_estimate_norm_scale(size_t M, size_t ksub, size_t M_norm, const float *LUT)