Struct faiss::ProductQuantizer
-
struct ProductQuantizer : public faiss::Quantizer
Product Quantizer. PQ is trained using k-means, minimizing the L2 distance to centroids. PQ supports both L2 and inner product search; however, because training minimizes L2 error, the quantization error is biased towards L2 distance.
Public Types
-
enum train_type_t
initialization
Values:
-
enumerator Train_default
-
enumerator Train_hot_start
the centroids are already initialized
-
enumerator Train_shared
share dictionary across PQ segments
-
enumerator Train_hypercube
initialize centroids with nbits-D hypercube
-
enumerator Train_hypercube_pca
initialize centroids with nbits-D hypercube, aligned with the PCA axes of the training data
Public Functions
-
inline float *get_centroids(size_t m, size_t i)
Return a pointer to centroid i of subquantizer m (with i = 0, this is the start of the centroid table for subvector m).
-
inline const float *get_centroids(size_t m, size_t i) const
-
virtual void train(size_t n, const float *x) override
Train the quantizer
- Parameters:
n – number of training vectors
x – training vectors, size n * d
-
ProductQuantizer(size_t d, size_t M, size_t nbits)
-
ProductQuantizer()
-
void set_derived_values()
compute derived values when d, M and nbits have been set
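The relationship between d, M, nbits and the derived members (dsub, ksub, code_size) can be sketched as follows. This is a minimal standalone illustration, not the library code: PQDerived and derive_pq_values are hypothetical names, and the sketch assumes that d is a multiple of M and that codes are bit-packed and rounded up to whole bytes.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical container for the derived values; field names mirror the
// documented members of ProductQuantizer.
struct PQDerived {
    std::size_t dsub;      // dimensionality of each subvector
    std::size_t ksub;      // number of centroids for each subquantizer
    std::size_t code_size; // bytes per encoded vector
};

// Assumption: d is a multiple of M, and the M indices of nbits bits each are
// packed together and rounded up to a whole number of bytes.
inline PQDerived derive_pq_values(std::size_t d, std::size_t M, std::size_t nbits) {
    assert(M > 0 && d % M == 0);
    PQDerived v;
    v.dsub = d / M;
    v.ksub = std::size_t(1) << nbits;
    v.code_size = (nbits * M + 7) / 8;
    return v;
}
```

For example, under these assumptions d = 128, M = 8 and nbits = 8 give dsub = 16, ksub = 256 and code_size = 8.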
-
void set_params(const float *centroids, int m)
Define the centroids for subquantizer m.
-
void compute_code(const float *x, uint8_t *code) const
Quantize one vector with the product quantizer.
-
virtual void compute_codes(const float *x, uint8_t *codes, size_t n) const override
same as compute_code for several vectors
-
void compute_codes_with_assign_index(const float *x, uint8_t *codes, size_t n)
speed up code assignment using assign_index (non-const because the index is changed)
-
void decode(const uint8_t *code, float *x) const
Decode a vector from a given code (or n vectors if the third argument is given).
-
virtual void decode(const uint8_t *code, float *x, size_t n) const override
Decode a set of vectors
- Parameters:
code – input codes, size n * code_size
x – output vectors, size n * d
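Conceptually, decoding concatenates the centroids selected by each sub-index. The sketch below illustrates this for a single vector. It is a standalone illustration under stated assumptions, not the library implementation: decode_one is a hypothetical helper, the centroid table uses the (M, ksub, dsub) layout, and each sub-index is assumed to occupy exactly one byte (nbits = 8 or fewer, stored unpacked).

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: reconstructs one vector of M * dsub floats from a code
// of M bytes. centroids is laid out as (M, ksub, dsub).
inline std::vector<float> decode_one(
        const std::vector<float>& centroids,
        const std::uint8_t* code,
        std::size_t M,
        std::size_t ksub,
        std::size_t dsub) {
    assert(centroids.size() == M * ksub * dsub);
    std::vector<float> x(M * dsub);
    for (std::size_t m = 0; m < M; ++m) {
        assert(code[m] < ksub);
        // Centroid selected by subquantizer m.
        const float* c = centroids.data() + (m * ksub + code[m]) * dsub;
        // Copy it into the m-th subvector of the output.
        std::copy(c, c + dsub, x.data() + m * dsub);
    }
    return x;
}
```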
-
void compute_code_from_distance_table(const float *tab, uint8_t *code) const
Compute the code from a precomputed distance table. When the distance table is already available, this is more efficient than quantizing the vector directly.
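Given an M * ksub distance table, the code is obtained by taking the index of the smallest entry in each row. The following is a minimal sketch of that idea, not the library code: code_from_distance_table is a hypothetical helper, and it assumes ksub is at most 256 so that each index fits in one unpacked byte.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: for each subquantizer m, picks the centroid index with
// the smallest distance in row m of the table (size M * ksub).
inline std::vector<std::uint8_t> code_from_distance_table(
        const std::vector<float>& table,
        std::size_t M,
        std::size_t ksub) {
    assert(ksub > 0 && ksub <= 256);
    assert(table.size() == M * ksub);
    std::vector<std::uint8_t> code(M);
    for (std::size_t m = 0; m < M; ++m) {
        const float* row = table.data() + m * ksub;
        std::size_t best = 0;
        for (std::size_t j = 1; j < ksub; ++j) {
            if (row[j] < row[best]) {
                best = j;
            }
        }
        code[m] = static_cast<std::uint8_t>(best);
    }
    return code;
}
```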
-
void compute_distance_table(const float *x, float *dis_table) const
Compute distance table for one vector.
The distance table for x = [x_0 x_1 .. x_(M-1)] is an M * ksub matrix that contains
dis_table (m, j) = || x_m - c_(m, j)||^2 for m = 0..M-1 and j = 0 .. ksub - 1
where c_(m, j) is centroid number j of sub-quantizer m.
- Parameters:
x – input vector size d
dis_table – output table, size M * ksub
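The formula above can be written out directly as three nested loops. This is a standalone sketch for illustration only (l2_distance_table is a hypothetical helper, not a library function); it assumes the centroid table uses the (M, ksub, dsub) layout and that x holds M * dsub floats.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: returns the M * ksub table with
// table[m * ksub + j] = || x_m - c_(m, j) ||^2.
inline std::vector<float> l2_distance_table(
        const std::vector<float>& centroids,
        const float* x,
        std::size_t M,
        std::size_t ksub,
        std::size_t dsub) {
    assert(centroids.size() == M * ksub * dsub);
    std::vector<float> table(M * ksub, 0.0f);
    for (std::size_t m = 0; m < M; ++m) {
        const float* xm = x + m * dsub; // m-th subvector of x
        for (std::size_t j = 0; j < ksub; ++j) {
            const float* c = centroids.data() + (m * ksub + j) * dsub;
            float acc = 0.0f;
            for (std::size_t k = 0; k < dsub; ++k) {
                const float diff = xm[k] - c[k];
                acc += diff * diff;
            }
            table[m * ksub + j] = acc;
        }
    }
    return table;
}
```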
-
void compute_inner_prod_table(const float *x, float *dis_table) const
-
void compute_distance_tables(size_t nx, const float *x, float *dis_tables) const
compute distance table for several vectors
- Parameters:
nx – nb of input vectors
x – input vectors, size nx * d
dis_tables – output tables, size nx * M * ksub
-
void compute_inner_prod_tables(size_t nx, const float *x, float *dis_tables) const
-
void search(const float *x, size_t nx, const uint8_t *codes, const size_t ncodes, float_maxheap_array_t *res, bool init_finalize_heap = true) const
perform a search (L2 distance)
- Parameters:
x – query vectors, size nx * d
nx – nb of queries
codes – database codes, size ncodes * code_size
ncodes – nb of database vectors
res – heap array to store results (nh == nx)
init_finalize_heap – initialize heap (input) and sort (output)?
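A search of this kind is commonly built on asymmetric distance computation: the query's distance table is computed once, and the approximate distance to each database code is the sum of M table lookups. The sketch below shows only that lookup step and leaves out the heap handling. It is an illustration under stated assumptions, not the library implementation: adc_distances is a hypothetical helper, and each database code is assumed to consist of M unpacked bytes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: table is the M * ksub distance table of one query,
// codes holds ncodes * M bytes. Returns one approximate squared L2 distance
// per database code.
inline std::vector<float> adc_distances(
        const std::vector<float>& table,
        const std::vector<std::uint8_t>& codes,
        std::size_t M,
        std::size_t ksub) {
    assert(M > 0 && table.size() == M * ksub);
    assert(codes.size() % M == 0);
    const std::size_t ncodes = codes.size() / M;
    std::vector<float> dis(ncodes, 0.0f);
    for (std::size_t i = 0; i < ncodes; ++i) {
        for (std::size_t m = 0; m < M; ++m) {
            const std::size_t j = codes[i * M + m];
            assert(j < ksub);
            dis[i] += table[m * ksub + j]; // one lookup per subquantizer
        }
    }
    return dis;
}
```

Keeping the k smallest of these distances per query (for example in a max-heap, as the res argument suggests) yields the search result.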
-
void search_ip(const float *x, size_t nx, const uint8_t *codes, const size_t ncodes, float_minheap_array_t *res, bool init_finalize_heap = true) const
same as search, but with inner product similarity
-
void compute_sdc_table()
-
void search_sdc(const uint8_t *qcodes, size_t nq, const uint8_t *bcodes, const size_t ncodes, float_maxheap_array_t *res, bool init_finalize_heap = true) const
-
void sync_transposed_centroids()
Sync transposed centroids with regular centroids. This call is needed if centroids were edited directly.
-
void clear_transposed_centroids()
Clear the transposed centroids table so that it is no longer used.
Public Members
-
size_t M
number of subquantizers
-
size_t nbits
number of bits per quantization index
-
size_t dsub
dimensionality of each subvector
-
size_t ksub
number of centroids for each subquantizer
-
bool verbose
verbose during training?
-
train_type_t train_type
-
ClusteringParameters cp
parameters used during clustering
-
Index *assign_index
if non-NULL, use this index for assignment (its dimension should be dsub = d / M)
-
std::vector<float> centroids
Centroid table, size M * ksub * dsub. Layout: (M, ksub, dsub)
-
std::vector<float> transposed_centroids
Transposed centroid table, size M * ksub * dsub. Layout: (dsub, M, ksub)
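The two layouts hold the same values in a different order. A minimal sketch of the index mapping from (M, ksub, dsub) to (dsub, M, ksub) follows; transpose_centroids is a hypothetical standalone helper written for illustration, not a library function.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: component k of centroid j of subquantizer m moves from
// offset (m * ksub + j) * dsub + k to offset (k * M + m) * ksub + j.
inline std::vector<float> transpose_centroids(
        const std::vector<float>& centroids,
        std::size_t M,
        std::size_t ksub,
        std::size_t dsub) {
    assert(centroids.size() == M * ksub * dsub);
    std::vector<float> transposed(centroids.size());
    for (std::size_t m = 0; m < M; ++m) {
        for (std::size_t j = 0; j < ksub; ++j) {
            for (std::size_t k = 0; k < dsub; ++k) {
                transposed[(k * M + m) * ksub + j] =
                        centroids[(m * ksub + j) * dsub + k];
            }
        }
    }
    return transposed;
}
```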
-
std::vector<float> centroids_sq_lengths
Squared lengths of centroids, size M * ksub. Layout: (M, ksub)
-
std::vector<float> sdc_table
Symmetric Distance Table.
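The entry above does not state the table's layout. As an illustration of what a symmetric distance table contains, the sketch below stores, for each subquantizer, the squared L2 distance between every pair of its centroids. The (M, ksub, ksub) layout and the helper name build_sdc_table are assumptions made for this example and are not confirmed by this page.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper. Assumed layout (M, ksub, ksub):
// sdc[(m * ksub + i) * ksub + j] = || c_(m, i) - c_(m, j) ||^2.
inline std::vector<float> build_sdc_table(
        const std::vector<float>& centroids,
        std::size_t M,
        std::size_t ksub,
        std::size_t dsub) {
    assert(centroids.size() == M * ksub * dsub);
    std::vector<float> sdc(M * ksub * ksub, 0.0f);
    for (std::size_t m = 0; m < M; ++m) {
        for (std::size_t i = 0; i < ksub; ++i) {
            const float* ci = centroids.data() + (m * ksub + i) * dsub;
            for (std::size_t j = 0; j < ksub; ++j) {
                const float* cj = centroids.data() + (m * ksub + j) * dsub;
                float acc = 0.0f;
                for (std::size_t k = 0; k < dsub; ++k) {
                    const float diff = ci[k] - cj[k];
                    acc += diff * diff;
                }
                sdc[(m * ksub + i) * ksub + j] = acc;
            }
        }
    }
    return sdc;
}
```

With such a table, the distance between two encoded vectors can be approximated by summing one lookup per subquantizer, without decoding either vector.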
-
size_t d
size of the input vectors
-
size_t code_size
bytes per indexed vector