Struct faiss::IndexPQ

struct faiss::IndexPQ : public faiss::Index

Index based on a product quantizer. Stored vectors are approximated by PQ codes.

Public Types

enum Search_type_t

how to perform the search in search_core

Values:

enumerator ST_PQ

asymmetric product quantizer (default)

enumerator ST_HE

Hamming distance on codes.

enumerator ST_generalized_HE

nb of same codes

enumerator ST_SDC

symmetric product quantizer (SDC)

enumerator ST_polysemous

HE filter (using ht) + PQ combination.

enumerator ST_polysemous_generalize

Filter on generalized Hamming.

using idx_t = int64_t

all indices are this type

using component_t = float
using distance_t = float

Public Functions

IndexPQ(int d, size_t M, size_t nbits, MetricType metric = METRIC_L2)

Constructor.

Parameters
  • d – dimensionality of the input vectors

  • M – number of subquantizers

  • nbits – number of bit per subvector index

  • d – dimensionality of the input vectors

  • M – number of subquantizers

  • nbits – number of bit per subvector index

IndexPQ()
virtual void train(idx_t n, const float *x) override

Perform training on a representative set of vectors

Parameters
  • n – nb of training vectors

  • x – training vecors, size n * d

virtual void add(idx_t n, const float *x) override

Add n vectors of dimension d to the index.

Vectors are implicitly assigned labels ntotal .. ntotal + n - 1 This function slices the input vectors in chunks smaller than blocksize_add and calls add_core.

Parameters

x – input matrix, size n * d

virtual void search(idx_t n, const float *x, idx_t k, float *distances, idx_t *labels) const override

query n vectors of dimension d to the index.

return at most k vectors. If there are not enough results for a query, the result array is padded with -1s.

Parameters
  • x – input vectors to search, size n * d

  • labels – output labels of the NNs, size n*k

  • distances – output pairwise distances, size n*k

virtual void reset() override

removes all elements from the database.

virtual void reconstruct_n(idx_t i0, idx_t ni, float *recons) const override

Reconstruct vectors i0 to i0 + ni - 1

this function may not be defined for some indexes

Parameters

recons – reconstucted vector (size ni * d)

virtual void reconstruct(idx_t key, float *recons) const override

Reconstruct a stored vector (or an approximation if lossy coding)

this function may not be defined for some indexes

Parameters
  • key – id of the vector to reconstruct

  • recons – reconstucted vector (size d)

virtual size_t remove_ids(const IDSelector &sel) override

removes IDs from the index. Not supported by all indexes. Returns the number of elements removed.

virtual size_t sa_code_size() const override

size of the produced codes in bytes

virtual void sa_encode(idx_t n, const float *x, uint8_t *bytes) const override

encode a set of vectors

Parameters
  • n – number of vectors

  • x – input vectors, size n * d

  • bytes – output encoded vectors, size n * sa_code_size()

virtual void sa_decode(idx_t n, const uint8_t *bytes, float *x) const override

encode a set of vectors

Parameters
  • n – number of vectors

  • bytes – input encoded vectors, size n * sa_code_size()

  • x – output vectors, size n * d

virtual DistanceComputer *get_distance_computer() const override

Get a DistanceComputer (defined in AuxIndexStructures) object for this kind of index.

DistanceComputer is implemented for indexes that support random access of their vectors.

void search_core_polysemous(idx_t n, const float *x, idx_t k, float *distances, idx_t *labels) const
void hamming_distance_histogram(idx_t n, const float *x, idx_t nb, const float *xb, int64_t *dist_histogram)

prepare query for a polysemous search, but instead of computing the result, just get the histogram of Hamming distances. May be computed on a provided dataset if xb != NULL

Parameters

dist_histogram – (M * nbits + 1)

void hamming_distance_table(idx_t n, const float *x, int32_t *dis) const

compute pairwise distances between queries and database

Parameters
  • n – nb of query vectors

  • x – query vector, size n * d

  • dis – output distances, size n * ntotal

virtual void add_with_ids(idx_t n, const float *x, const idx_t *xids)

Same as add, but stores xids instead of sequential ids.

The default implementation fails with an assertion, as it is not supported by all indexes.

Parameters

xids – if non-null, ids to store for the vectors (size n)

virtual void range_search(idx_t n, const float *x, float radius, RangeSearchResult *result) const

query n vectors of dimension d to the index.

return all vectors with distance < radius. Note that many indexes do not implement the range_search (only the k-NN search is mandatory).

Parameters
  • x – input vectors to search, size n * d

  • radius – search radius

  • result – result table

virtual void assign(idx_t n, const float *x, idx_t *labels, idx_t k = 1) const

return the indexes of the k vectors closest to the query x.

This function is identical as search but only return labels of neighbors.

Parameters
  • x – input vectors to search, size n * d

  • labels – output labels of the NNs, size n*k

virtual void search_and_reconstruct(idx_t n, const float *x, idx_t k, float *distances, idx_t *labels, float *recons) const

Similar to search, but also reconstructs the stored vectors (or an approximation in the case of lossy coding) for the search results.

If there are not enough results for a query, the resulting arrays is padded with -1s.

Parameters

recons – reconstructed vectors size (n, k, d)

virtual void compute_residual(const float *x, float *residual, idx_t key) const

Computes a residual vector after indexing encoding.

The residual vector is the difference between a vector and the reconstruction that can be decoded from its representation in the index. The residual can be used for multiple-stage indexing methods, like IndexIVF’s methods.

Parameters
  • x – input vector, size d

  • residual – output residual vector, size d

  • key – encoded index, as returned by search and assign

virtual void compute_residual_n(idx_t n, const float *xs, float *residuals, const idx_t *keys) const

Computes a residual vector after indexing encoding (batch form). Equivalent to calling compute_residual for each vector.

The residual vector is the difference between a vector and the reconstruction that can be decoded from its representation in the index. The residual can be used for multiple-stage indexing methods, like IndexIVF’s methods.

Parameters
  • n – number of vectors

  • xs – input vectors, size (n x d)

  • residuals – output residual vectors, size (n x d)

  • keys – encoded index, as returned by search and assign

Public Members

ProductQuantizer pq

The product quantizer used to encode the vectors.

std::vector<uint8_t> codes

Codes. Size ntotal * pq.code_size.

bool do_polysemous_training

false = standard PQ

PolysemousTraining polysemous_training

parameters used for the polysemous training

Search_type_t search_type
bool encode_signs
int polysemous_ht

Hamming threshold used for polysemy.

int d

vector dimension

idx_t ntotal

total nb of indexed vectors

bool verbose

verbosity level

bool is_trained

set if the Index does not require training, or if training is done already

MetricType metric_type

type of metric this index uses for search

float metric_arg

argument of the metric type