Struct faiss::ResidualCoarseQuantizer

struct ResidualCoarseQuantizer : public faiss::AdditiveCoarseQuantizer

The ResidualCoarseQuantizer is a bit specialized compared to the default AdditiveCoarseQuantizer because it can use a beam search at search time (slow but may be useful for very large vocabularies)

Public Types

using component_t = float
using distance_t = float

Public Functions

void set_beam_factor(float new_beam_factor)

computes centroid norms if required

ResidualCoarseQuantizer(int d, size_t M, size_t nbits, MetricType metric = METRIC_L2)

Constructor.

Parameters:
  • d – dimensionality of the input vectors

  • M – number of subquantizers

  • nbits – number of bit per subvector index

  • d – dimensionality of the input vectors

  • M – number of subquantizers

  • nbits – number of bit per subvector index

ResidualCoarseQuantizer(int d, const std::vector<size_t> &nbits, MetricType metric = METRIC_L2)
virtual void search(idx_t n, const float *x, idx_t k, float *distances, idx_t *labels, const SearchParameters *params = nullptr) const override

query n vectors of dimension d to the index.

return at most k vectors. If there are not enough results for a query, the result array is padded with -1s.

Parameters:
  • n – number of vectors

  • x – input vectors to search, size n * d

  • k – number of extracted vectors

  • distances – output pairwise distances, size n*k

  • labels – output labels of the NNs, size n*k

void initialize_from(const ResidualCoarseQuantizer &other)

Copy the M first codebook levels from other. Useful to crop a ResidualQuantizer to its first M quantizers.

ResidualCoarseQuantizer()
virtual void add(idx_t n, const float *x) override

N/A.

virtual void reconstruct(idx_t key, float *recons) const override

Reconstruct a stored vector (or an approximation if lossy coding)

this function may not be defined for some indexes

Parameters:
  • key – id of the vector to reconstruct

  • recons – reconstucted vector (size d)

virtual void train(idx_t n, const float *x) override

Perform training on a representative set of vectors

Parameters:
  • n – nb of training vectors

  • x – training vecors, size n * d

virtual void reset() override

N/A.

virtual void add_with_ids(idx_t n, const float *x, const idx_t *xids)

Same as add, but stores xids instead of sequential ids.

The default implementation fails with an assertion, as it is not supported by all indexes.

Parameters:
  • n – number of vectors

  • x – input vectors, size n * d

  • xids – if non-null, ids to store for the vectors (size n)

virtual void range_search(idx_t n, const float *x, float radius, RangeSearchResult *result, const SearchParameters *params = nullptr) const

query n vectors of dimension d to the index.

return all vectors with distance < radius. Note that many indexes do not implement the range_search (only the k-NN search is mandatory).

Parameters:
  • n – number of vectors

  • x – input vectors to search, size n * d

  • radius – search radius

  • result – result table

virtual void assign(idx_t n, const float *x, idx_t *labels, idx_t k = 1) const

return the indexes of the k vectors closest to the query x.

This function is identical as search but only return labels of neighbors.

Parameters:
  • n – number of vectors

  • x – input vectors to search, size n * d

  • labels – output labels of the NNs, size n*k

  • k – number of nearest neighbours

virtual size_t remove_ids(const IDSelector &sel)

removes IDs from the index. Not supported by all indexes. Returns the number of elements removed.

virtual void reconstruct_batch(idx_t n, const idx_t *keys, float *recons) const

Reconstruct several stored vectors (or an approximation if lossy coding)

this function may not be defined for some indexes

Parameters:
  • n – number of vectors to reconstruct

  • keys – ids of the vectors to reconstruct (size n)

  • recons – reconstucted vector (size n * d)

virtual void reconstruct_n(idx_t i0, idx_t ni, float *recons) const

Reconstruct vectors i0 to i0 + ni - 1

this function may not be defined for some indexes

Parameters:
  • i0 – index of the first vector in the sequence

  • ni – number of vectors in the sequence

  • recons – reconstucted vector (size ni * d)

virtual void search_and_reconstruct(idx_t n, const float *x, idx_t k, float *distances, idx_t *labels, float *recons, const SearchParameters *params = nullptr) const

Similar to search, but also reconstructs the stored vectors (or an approximation in the case of lossy coding) for the search results.

If there are not enough results for a query, the resulting arrays is padded with -1s.

Parameters:
  • n – number of vectors

  • x – input vectors to search, size n * d

  • k – number of extracted vectors

  • distances – output pairwise distances, size n*k

  • labels – output labels of the NNs, size n*k

  • recons – reconstructed vectors size (n, k, d)

virtual void compute_residual(const float *x, float *residual, idx_t key) const

Computes a residual vector after indexing encoding.

The residual vector is the difference between a vector and the reconstruction that can be decoded from its representation in the index. The residual can be used for multiple-stage indexing methods, like IndexIVF’s methods.

Parameters:
  • x – input vector, size d

  • residual – output residual vector, size d

  • key – encoded index, as returned by search and assign

virtual void compute_residual_n(idx_t n, const float *xs, float *residuals, const idx_t *keys) const

Computes a residual vector after indexing encoding (batch form). Equivalent to calling compute_residual for each vector.

The residual vector is the difference between a vector and the reconstruction that can be decoded from its representation in the index. The residual can be used for multiple-stage indexing methods, like IndexIVF’s methods.

Parameters:
  • n – number of vectors

  • xs – input vectors, size (n x d)

  • residuals – output residual vectors, size (n x d)

  • keys – encoded index, as returned by search and assign

virtual DistanceComputer *get_distance_computer() const

Get a DistanceComputer (defined in AuxIndexStructures) object for this kind of index.

DistanceComputer is implemented for indexes that support random access of their vectors.

virtual size_t sa_code_size() const

size of the produced codes in bytes

virtual void sa_encode(idx_t n, const float *x, uint8_t *bytes) const

encode a set of vectors

Parameters:
  • n – number of vectors

  • x – input vectors, size n * d

  • bytes – output encoded vectors, size n * sa_code_size()

virtual void sa_decode(idx_t n, const uint8_t *bytes, float *x) const

decode a set of vectors

Parameters:
  • n – number of vectors

  • bytes – input encoded vectors, size n * sa_code_size()

  • x – output vectors, size n * d

virtual void merge_from(Index &otherIndex, idx_t add_id = 0)

moves the entries from another dataset to self. On output, other is empty. add_id is added to all moved ids (for sequential ids, this would be this->ntotal)

virtual void check_compatible_for_merge(const Index &otherIndex) const

check that the two indexes are compatible (ie, they are trained in the same way and have the same parameters). Otherwise throw.

Public Members

ResidualQuantizer rq

The residual quantizer used to encode the vectors.

float beam_factor = 4.0f

factor between the beam size and the search k if negative, use exact search-to-centroid

AdditiveQuantizer *aq
std::vector<float> centroid_norms

norms of centroids, useful for knn-search

int d

vector dimension

idx_t ntotal

total nb of indexed vectors

bool verbose

verbosity level

bool is_trained

set if the Index does not require training, or if training is done already

MetricType metric_type

type of metric this index uses for search

float metric_arg

argument of the metric type