Struct faiss::ProductQuantizer

struct ProductQuantizer : public faiss::Quantizer

Product Quantizer. PQ is trained using k-means, minimizing the L2 distance to centroids. PQ supports both L2 and Inner Product search; however, the quantization error is biased towards L2 distance.

Public Types

enum train_type_t

Centroid initialization method used for training.

Values:

enumerator Train_default
enumerator Train_hot_start

the centroids are already initialized

enumerator Train_shared

share dictionary across PQ segments

enumerator Train_hypercube

initialize centroids with nbits-D hypercube

enumerator Train_hypercube_pca

initialize centroids with nbits-D hypercube oriented along the PCA directions of the training data

Public Functions

inline float *get_centroids(size_t m, size_t i)

return a pointer to centroid i of subquantizer m

inline const float *get_centroids(size_t m, size_t i) const
virtual void train(size_t n, const float *x) override

Train the quantizer

Parameters:
  • n – number of training vectors

  • x – training vectors, size n * d

ProductQuantizer(size_t d, size_t M, size_t nbits)
ProductQuantizer()
void set_derived_values()

compute derived values when d, M and nbits have been set
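The derived values are the members dsub, ksub and code_size listed under Public Members. A minimal standalone sketch of how they relate to d, M and nbits, assuming the usual PQ relations (d split evenly into M subvectors, 2^nbits centroids per subquantizer, codes rounded up to whole bytes). PQDerived and derive_pq_values are hypothetical names, not part of Faiss:

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper struct for illustration only (not a Faiss type).
struct PQDerived {
    std::size_t dsub;      // dimensionality of each subvector
    std::size_t ksub;      // number of centroids per subquantizer
    std::size_t code_size; // bytes per encoded vector
};

PQDerived derive_pq_values(std::size_t d, std::size_t M, std::size_t nbits) {
    assert(M > 0 && d % M == 0); // d must split evenly into M subvectors
    PQDerived v;
    v.dsub = d / M;
    v.ksub = std::size_t(1) << nbits;
    v.code_size = (nbits * M + 7) / 8; // M indices of nbits each, rounded up to bytes
    return v;
}
```

For d = 128, M = 16 and nbits = 8 this gives dsub = 8, ksub = 256 and code_size = 16.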

void set_params(const float *centroids, int m)

Define the centroids for subquantizer m.

void compute_code(const float *x, uint8_t *code) const

Quantize one vector with the product quantizer.
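Conceptually, quantizing a vector means finding, for each subvector, the nearest centroid of the corresponding subquantizer. The sketch below illustrates this on a plain centroid table in the (M, ksub, dsub) layout; it is not the Faiss implementation, pq_encode_sketch is a hypothetical name, and it assumes ksub <= 256 so that each index fits in one byte:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

// Illustrative stand-in for compute_code; one byte per subquantizer index.
void pq_encode_sketch(const std::vector<float>& centroids,
                      std::size_t M, std::size_t ksub, std::size_t dsub,
                      const float* x, std::uint8_t* code) {
    assert(ksub <= 256);
    assert(centroids.size() == M * ksub * dsub);
    for (std::size_t m = 0; m < M; m++) {
        const float* xsub = x + m * dsub; // m-th subvector of x
        float best = std::numeric_limits<float>::max();
        std::size_t best_j = 0;
        for (std::size_t j = 0; j < ksub; j++) {
            const float* c = centroids.data() + (m * ksub + j) * dsub;
            float dis = 0;
            for (std::size_t k = 0; k < dsub; k++) {
                float diff = xsub[k] - c[k];
                dis += diff * diff; // squared L2 distance
            }
            if (dis < best) {
                best = dis;
                best_j = j;
            }
        }
        code[m] = static_cast<std::uint8_t>(best_j);
    }
}
```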

virtual void compute_codes(const float *x, uint8_t *codes, size_t n) const override

same as compute_code for several vectors

void compute_codes_with_assign_index(const float *x, uint8_t *codes, size_t n)

speed up code assignment using assign_index (non-const because the index is changed)

void decode(const uint8_t *code, float *x) const

decode a vector from a given code (or n vectors if the third argument is given)

virtual void decode(const uint8_t *code, float *x, size_t n) const override

Decode a set of vectors

Parameters:
  • code – input codes, size n * code_size

  • x – output vectors, size n * d
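Decoding reverses the encoding step: each index in the code selects one centroid, and the selected centroids are concatenated to form the reconstructed vector. A minimal sketch for a single vector, assuming one byte per index (ksub <= 256) and the (M, ksub, dsub) centroid layout; pq_decode_sketch is a hypothetical name, not a Faiss function:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Illustrative stand-in for decode; reconstructs one vector of size M * dsub.
void pq_decode_sketch(const std::vector<float>& centroids,
                      std::size_t M, std::size_t ksub, std::size_t dsub,
                      const std::uint8_t* code, float* x) {
    assert(centroids.size() == M * ksub * dsub);
    for (std::size_t m = 0; m < M; m++) {
        assert(code[m] < ksub);
        // centroid selected by the m-th index
        const float* c = centroids.data() + (m * ksub + code[m]) * dsub;
        std::copy(c, c + dsub, x + m * dsub);
    }
}
```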

void compute_code_from_distance_table(const float *tab, uint8_t *code) const

If the distance tables are already precomputed, computing the codes from them is more efficient.
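Given an M * ksub distance table, the code is the position of the smallest entry in each row. A minimal sketch under the assumption of one byte per index (ksub <= 256); code_from_table_sketch is a hypothetical name, not a Faiss function:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>

// tab has M rows of ksub distances; code receives M indices.
void code_from_table_sketch(const float* tab, std::size_t M, std::size_t ksub,
                            std::uint8_t* code) {
    assert(ksub > 0 && ksub <= 256);
    for (std::size_t m = 0; m < M; m++) {
        const float* row = tab + m * ksub;
        // index of the closest centroid for subvector m
        code[m] = static_cast<std::uint8_t>(std::min_element(row, row + ksub) - row);
    }
}
```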

void compute_distance_table(const float *x, float *dis_table) const

Compute distance table for one vector.

The distance table for x = [x_0 x_1 .. x_(M-1)] is a M * ksub matrix that contains

dis_table (m, j) = || x_m - c_(m, j)||^2 for m = 0..M-1 and j = 0 .. ksub - 1

where c_(m, j) is centroid number j of sub-quantizer m.

Parameters:
  • x – input vector, size d

  • dis_table – output table, size M * ksub
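The table definition above can be written as three nested loops. This is a plain reference sketch over a centroid table in the (M, ksub, dsub) layout, not the optimized Faiss code; distance_table_sketch is a hypothetical name:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Returns dis_table with dis_table[m * ksub + j] = || x_m - c_(m, j) ||^2.
std::vector<float> distance_table_sketch(const std::vector<float>& centroids,
                                         std::size_t M, std::size_t ksub,
                                         std::size_t dsub, const float* x) {
    assert(centroids.size() == M * ksub * dsub);
    std::vector<float> dis_table(M * ksub);
    for (std::size_t m = 0; m < M; m++) {
        const float* xsub = x + m * dsub; // subvector x_m
        for (std::size_t j = 0; j < ksub; j++) {
            const float* c = centroids.data() + (m * ksub + j) * dsub;
            float dis = 0;
            for (std::size_t k = 0; k < dsub; k++) {
                float diff = xsub[k] - c[k];
                dis += diff * diff;
            }
            dis_table[m * ksub + j] = dis;
        }
    }
    return dis_table;
}
```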

void compute_inner_prod_table(const float *x, float *dis_table) const
void compute_distance_tables(size_t nx, const float *x, float *dis_tables) const

compute distance table for several vectors

Parameters:
  • nx – number of input vectors

  • x – input vectors, size nx * d

  • dis_tables – output tables, size nx * M * ksub

void compute_inner_prod_tables(size_t nx, const float *x, float *dis_tables) const
void search(const float *x, size_t nx, const uint8_t *codes, const size_t ncodes, float_maxheap_array_t *res, bool init_finalize_heap = true) const

perform a search (L2 distance)

Parameters:
  • x – query vectors, size nx * d

  • nx – nb of queries

  • codes – database codes, size ncodes * code_size

  • ncodes – number of database vectors

  • res – heap array to store results (nh == nx)

  • init_finalize_heap – initialize heap (input) and sort (output)?
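A common way to search PQ codes is to compute the distance table of the query once, then approximate the distance to each database code by summing M table lookups. The sketch below shows only that lookup step for one query, assuming one byte per index (code_size == M); heap handling is omitted and adc_distances_sketch is a hypothetical name, not a Faiss function:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// dis_table: M * ksub table for one query; codes: ncodes * M bytes.
// Returns the approximate squared L2 distance to each database code.
std::vector<float> adc_distances_sketch(const float* dis_table,
                                        std::size_t M, std::size_t ksub,
                                        const std::uint8_t* codes,
                                        std::size_t ncodes) {
    std::vector<float> dis(ncodes, 0.f);
    for (std::size_t i = 0; i < ncodes; i++) {
        const std::uint8_t* code = codes + i * M;
        for (std::size_t m = 0; m < M; m++) {
            assert(code[m] < ksub);
            dis[i] += dis_table[m * ksub + code[m]]; // one lookup per subquantizer
        }
    }
    return dis;
}
```

The smallest entries of the returned vector correspond to the nearest database vectors under the quantized distance.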

void search_ip(const float *x, size_t nx, const uint8_t *codes, const size_t ncodes, float_minheap_array_t *res, bool init_finalize_heap = true) const

same as search, but with inner product similarity

void compute_sdc_table()
void search_sdc(const uint8_t *qcodes, size_t nq, const uint8_t *bcodes, const size_t ncodes, float_maxheap_array_t *res, bool init_finalize_heap = true) const
void sync_transposed_centroids()

Sync transposed centroids with regular centroids. This call is needed if centroids were edited directly.

void clear_transposed_centroids()

Clear the transposed centroids table so that it is no longer used.

Public Members

size_t M

number of subquantizers

size_t nbits

number of bits per quantization index

size_t dsub

dimensionality of each subvector

size_t ksub

number of centroids for each subquantizer

bool verbose

verbose during training?

train_type_t train_type
ClusteringParameters cp

parameters used during clustering

Index *assign_index

if non-NULL, use this index for assignment (its dimension should be dsub = d / M)

std::vector<float> centroids

Centroid table, size M * ksub * dsub. Layout: (M, ksub, dsub)

std::vector<float> transposed_centroids

Transposed centroid table, size M * ksub * dsub. Layout: (dsub, M, ksub)

std::vector<float> centroids_sq_lengths

Squared lengths of centroids, size M * ksub. Layout: (M, ksub)
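The three layouts can be related with explicit index arithmetic. The following sketch derives a (dsub, M, ksub) transposed table and the (M, ksub) squared lengths from a centroid table in the (M, ksub, dsub) layout. It only illustrates the documented layouts; TransposedSketch and transpose_centroids_sketch are hypothetical names, not Faiss code:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical container for the two derived tables.
struct TransposedSketch {
    std::vector<float> transposed; // layout (dsub, M, ksub)
    std::vector<float> sq_lengths; // layout (M, ksub)
};

TransposedSketch transpose_centroids_sketch(const std::vector<float>& centroids,
                                            std::size_t M, std::size_t ksub,
                                            std::size_t dsub) {
    assert(centroids.size() == M * ksub * dsub);
    TransposedSketch out;
    out.transposed.resize(dsub * M * ksub);
    out.sq_lengths.resize(M * ksub);
    for (std::size_t m = 0; m < M; m++) {
        for (std::size_t j = 0; j < ksub; j++) {
            float sqlen = 0;
            for (std::size_t k = 0; k < dsub; k++) {
                float q = centroids[(m * ksub + j) * dsub + k];
                out.transposed[(k * M + m) * ksub + j] = q;
                sqlen += q * q;
            }
            out.sq_lengths[m * ksub + j] = sqlen;
        }
    }
    return out;
}
```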

std::vector<float> sdc_table

Symmetric Distance Table.

size_t d

size of the input vectors

size_t code_size

bytes per indexed vector