Struct faiss::IndexAdditiveQuantizerFastScan

struct IndexAdditiveQuantizerFastScan : public faiss::IndexFastScan

Fast-scan version of IndexAdditiveQuantizer (IndexAQ). Currently supports only 4-bit additive quantizers.

The codes are not stored sequentially but grouped in blocks of size bbs. This makes it possible to compute distances quickly with SIMD instructions.
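To make the blocked storage concrete, here is a minimal sketch of a block-grouped layout. It is not the packing Faiss actually uses: the helper names `round_up` and `to_blocked` are hypothetical, each 4-bit code is kept in its own byte for readability, and the sketch assumes the stored vector count is rounded up to a multiple of bbs with zero padding.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Round n up to the next multiple of block (block > 0).
inline std::size_t round_up(std::size_t n, std::size_t block) {
    return (n + block - 1) / block * block;
}

// Illustrative blocked layout (hypothetical, NOT the Faiss packing).
// Input:  codes[i * M + m] holds the 4-bit code of sub-quantizer m for vector i.
// Output: for each block of bbs vectors, the codes of sub-quantizer 0 for all
//         bbs vectors come first, then sub-quantizer 1, and so on.
//         The output holds round_up(n, bbs) * M bytes; padding entries are 0.
std::vector<std::uint8_t> to_blocked(
        const std::vector<std::uint8_t>& codes,
        std::size_t n,
        std::size_t M,
        std::size_t bbs) {
    const std::size_t n_padded = round_up(n, bbs);
    std::vector<std::uint8_t> out(n_padded * M, 0);
    for (std::size_t i = 0; i < n; ++i) {
        const std::size_t block = i / bbs; // which block the vector falls in
        const std::size_t lane = i % bbs;  // position inside that block
        for (std::size_t m = 0; m < M; ++m) {
            out[(block * M + m) * bbs + lane] = codes[i * M + m];
        }
    }
    return out;
}
```

With this arrangement, the codes that one sub-quantizer assigns to a whole block sit next to each other, which is the property a SIMD kernel needs to look up many distances at once.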

Implementations:

12: blocked loop with internal loop on Q with qbs
13: same with reservoir accumulator to store results
14: no qbs with heap accumulator
15: no qbs with reservoir accumulator

Subclassed by faiss::IndexLocalSearchQuantizerFastScan, faiss::IndexProductLocalSearchQuantizerFastScan, faiss::IndexProductResidualQuantizerFastScan, faiss::IndexResidualQuantizerFastScan

Public Types

using Search_type_t = AdditiveQuantizer::Search_type_t

using component_t = float

using distance_t = float

Public Functions

explicit IndexAdditiveQuantizerFastScan(AdditiveQuantizer *aq, MetricType metric = METRIC_L2, int bbs = 32)

void init(AdditiveQuantizer *aq, MetricType metric = METRIC_L2, int bbs = 32)

IndexAdditiveQuantizerFastScan()

~IndexAdditiveQuantizerFastScan() override

explicit IndexAdditiveQuantizerFastScan(const IndexAdditiveQuantizer &orig, int bbs = 32): build from an existing IndexAQ

virtual void train(idx_t n, const float *x) override

Perform training on a representative set of vectors

Parameters:

n – number of training vectors
x – training vectors, size n * d

void estimate_norm_scale(idx_t n, const float *x)

virtual void compute_codes(uint8_t *codes, idx_t n, const float *x) const override

virtual void compute_float_LUT(float *lut, idx_t n, const float *x) const override

virtual void search(idx_t n, const float *x, idx_t k, float *distances, idx_t *labels, const SearchParameters *params = nullptr) const override

Query n vectors of dimension d against the index.

Returns at most k vectors per query. If there are not enough results for a query, the result array is padded with -1s.

Parameters:

n – number of vectors
x – input vectors to search, size n * d
k – number of extracted vectors
distances – output pairwise distances, size n*k
labels – output labels of the NNs, size n*k
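The padding rule can be illustrated with a brute-force stand-in for a single query. This is a sketch under stated assumptions, not the fast-scan code path: `fill_topk` is a hypothetical helper, and filling the unused distance slots with infinity is an assumption (the text above only specifies that labels are padded with -1).

```cpp
#include <algorithm>
#include <cstdint>
#include <limits>
#include <utility>
#include <vector>

using idx_t = std::int64_t;

// Write the k best (distance, label) candidates of one query into the output
// rows, smallest distance first, and pad the unused slots.
void fill_topk(
        std::vector<std::pair<float, idx_t>> candidates,
        idx_t k,
        float* distances,
        idx_t* labels) {
    std::sort(candidates.begin(), candidates.end());
    const idx_t found = std::min<idx_t>(k, static_cast<idx_t>(candidates.size()));
    for (idx_t i = 0; i < found; ++i) {
        distances[i] = candidates[i].first;
        labels[i] = candidates[i].second;
    }
    for (idx_t i = found; i < k; ++i) {
        // Assumption: distance padding value; only the -1 label is documented.
        distances[i] = std::numeric_limits<float>::infinity();
        labels[i] = -1;
    }
}
```

Callers should therefore check for a label of -1 before using a result slot as a vector id.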

virtual void sa_decode(idx_t n, const uint8_t *bytes, float *x) const override

Decode a set of vectors.

NOTE: The codes in the IndexAdditiveQuantizerFastScan object are non-contiguous, but this method requires a contiguous representation.

Parameters:

n – number of vectors
bytes – input encoded vectors, size n * code_size
x – output vectors, size n * d

void init_fastscan(int d, size_t M, size_t nbits, MetricType metric, int bbs)

virtual void reset() override: removes all elements from the database.

virtual void add(idx_t n, const float *x) override

Add n vectors of dimension d to the index.

Vectors are implicitly assigned labels ntotal .. ntotal + n - 1. This function slices the input vectors in chunks smaller than blocksize_add and calls add_core.

Parameters:

n – number of vectors
x – input matrix, size n * d
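The chunking and implicit labeling described for add can be sketched with a toy index. `ToyIndex` and its members are hypothetical stand-ins, not Faiss types; the sketch assumes each chunk holds at most blocksize_add vectors and that add_core simply appends the raw vectors.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

using idx_t = std::int64_t;

// Toy stand-in for an index that stores raw vectors (hypothetical type).
struct ToyIndex {
    int d = 0;                   // vector dimension
    idx_t ntotal = 0;            // number of stored vectors
    idx_t blocksize_add = 1024;  // assumed maximum chunk size
    int add_core_calls = 0;      // counts chunks, for illustration only
    std::vector<float> storage;

    // Append one chunk; its vectors get labels ntotal .. ntotal + n - 1.
    void add_core(idx_t n, const float* x) {
        storage.insert(storage.end(), x, x + n * d);
        ntotal += n;
        ++add_core_calls;
    }

    // Slice the input into chunks and forward each one to add_core.
    void add(idx_t n, const float* x) {
        for (idx_t i0 = 0; i0 < n; i0 += blocksize_add) {
            const idx_t ni = std::min(blocksize_add, n - i0);
            add_core(ni, x + i0 * d);
        }
    }
};
```

Because labels are implicit, the label of a vector is just its insertion position, so the order of the input matrix determines the ids that search returns.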

void compute_quantized_LUT(idx_t n, const float *x, uint8_t *lut, float *normalizers) const

template<bool is_max> void search_dispatch_implem(idx_t n, const float *x, idx_t k, float *distances, idx_t *labels, const NormTableScaler *scaler) const

template<class Cfloat> void search_implem_234(idx_t n, const float *x, idx_t k, float *distances, idx_t *labels, const NormTableScaler *scaler) const

template<class C> void search_implem_12(idx_t n, const float *x, idx_t k, float *distances, idx_t *labels, int impl, const NormTableScaler *scaler) const

template<class C> void search_implem_14(idx_t n, const float *x, idx_t k, float *distances, idx_t *labels, int impl, const NormTableScaler *scaler) const

virtual void reconstruct(idx_t key, float *recons) const override

Reconstruct a stored vector (or an approximation if lossy coding)

this function may not be defined for some indexes

Parameters:

key – id of the vector to reconstruct
recons – reconstructed vector (size d)

virtual size_t remove_ids(const IDSelector &sel) override: removes IDs from the index. Not supported by all indexes. Returns the number of elements removed.

CodePacker *get_CodePacker() const

virtual void merge_from(Index &otherIndex, idx_t add_id = 0) override: moves the entries from another index into this one. On output, otherIndex is empty. add_id is added to all moved ids (for sequential ids, this would be this->ntotal)
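The effect of add_id can be sketched on a toy container that stores explicit ids. `ToyIdIndex` is a hypothetical type used only to show the id arithmetic; it does not model how the fast-scan index moves its packed codes.

```cpp
#include <cstdint>
#include <vector>

using idx_t = std::int64_t;

// Toy container that keeps only the ids of its entries (hypothetical type).
struct ToyIdIndex {
    std::vector<idx_t> ids;

    // Move all entries of `other` into this container, shifting each id by
    // add_id. `other` is left empty afterwards.
    void merge_from(ToyIdIndex& other, idx_t add_id = 0) {
        for (idx_t id : other.ids) {
            ids.push_back(id + add_id);
        }
        other.ids.clear();
    }
};
```

Passing the destination's current size as add_id keeps sequential ids unique after the merge: merging ids {0, 1} into a container holding {0, 1, 2} with add_id = 3 yields {0, 1, 2, 3, 4}.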

virtual void check_compatible_for_merge(const Index &otherIndex) const override: checks that the two indexes are compatible (i.e., they are trained in the same way and have the same parameters). Otherwise throws.

virtual void add_with_ids(idx_t n, const float *x, const idx_t *xids)

Same as add, but stores xids instead of sequential ids.

The default implementation fails with an assertion, as it is not supported by all indexes.

Parameters:

n – number of vectors
x – input vectors, size n * d
xids – if non-null, ids to store for the vectors (size n)

virtual void range_search(idx_t n, const float *x, float radius, RangeSearchResult *result, const SearchParameters *params = nullptr) const

Query n vectors of dimension d against the index.

Returns all vectors with distance < radius. Note that many indexes do not implement range_search (only the k-NN search is mandatory).

Parameters:

n – number of vectors
x – input vectors to search, size n * d
radius – search radius
result – result table
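The selection rule (strictly distance < radius) can be sketched with a brute-force scan for one query. The helper `range_search_one` is hypothetical, it returns a plain vector rather than a RangeSearchResult, and it assumes squared L2 distances.

```cpp
#include <cstdint>
#include <utility>
#include <vector>

using idx_t = std::int64_t;

// Return (label, distance) for every database vector whose squared L2
// distance to the query q is strictly below radius.
std::vector<std::pair<idx_t, float>> range_search_one(
        const float* q,
        const float* db,
        idx_t nb,
        int d,
        float radius) {
    std::vector<std::pair<idx_t, float>> hits;
    for (idx_t i = 0; i < nb; ++i) {
        float dist = 0.0f;
        for (int j = 0; j < d; ++j) {
            const float diff = q[j] - db[i * d + j];
            dist += diff * diff;
        }
        if (dist < radius) { // strict comparison, as documented above
            hits.emplace_back(i, dist);
        }
    }
    return hits;
}
```

Unlike k-NN search, the number of results per query is not known in advance, which is why the real API writes into a dedicated result structure.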

virtual void assign(idx_t n, const float *x, idx_t *labels, idx_t k = 1) const

Returns the indexes of the k vectors closest to the query x.

This function is identical to search but only returns the labels of the neighbors.

Parameters:

n – number of vectors
x – input vectors to search, size n * d
labels – output labels of the NNs, size n*k
k – number of nearest neighbours

virtual void reconstruct_batch(idx_t n, const idx_t *keys, float *recons) const

Reconstruct several stored vectors (or an approximation if lossy coding)

this function may not be defined for some indexes

Parameters:

n – number of vectors to reconstruct
keys – ids of the vectors to reconstruct (size n)
recons – reconstructed vectors (size n * d)

virtual void reconstruct_n(idx_t i0, idx_t ni, float *recons) const

Reconstruct vectors i0 to i0 + ni - 1

this function may not be defined for some indexes

Parameters:

i0 – index of the first vector in the sequence
ni – number of vectors in the sequence
recons – reconstructed vectors (size ni * d)

virtual void search_and_reconstruct(idx_t n, const float *x, idx_t k, float *distances, idx_t *labels, float *recons, const SearchParameters *params = nullptr) const

Similar to search, but also reconstructs the stored vectors (or an approximation in the case of lossy coding) for the search results.

If there are not enough results for a query, the resulting arrays are padded with -1s.

Parameters:

n – number of vectors
x – input vectors to search, size n * d
k – number of extracted vectors
distances – output pairwise distances, size n*k
labels – output labels of the NNs, size n*k
recons – reconstructed vectors, size (n, k, d)

virtual void compute_residual(const float *x, float *residual, idx_t key) const

Computes the residual vector that remains after the index has encoded a vector.

The residual vector is the difference between a vector and the reconstruction that can be decoded from its representation in the index. The residual can be used for multiple-stage indexing methods, like IndexIVF’s methods.

Parameters:

x – input vector, size d
residual – output residual vector, size d
key – encoded index, as returned by search and assign
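The definition above amounts to an element-wise subtraction. A minimal sketch, assuming the reconstruction for key has already been obtained (for example through reconstruct); the helper name `residual_from_reconstruction` is hypothetical.

```cpp
// residual[j] = x[j] - recons[j] for j in [0, d).
// x:        input vector, size d
// recons:   reconstruction decoded from the index for the same vector, size d
// residual: output buffer, size d
void residual_from_reconstruction(
        const float* x,
        const float* recons,
        float* residual,
        int d) {
    for (int j = 0; j < d; ++j) {
        residual[j] = x[j] - recons[j];
    }
}
```

A second-stage quantizer can then encode the residual, so that the sum of the two reconstructions approximates x more closely than the first stage alone.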

virtual void compute_residual_n(idx_t n, const float *xs, float *residuals, const idx_t *keys) const

Computes the residual vectors that remain after the index has encoded a batch of vectors. Equivalent to calling compute_residual for each vector.

The residual vector is the difference between a vector and the reconstruction that can be decoded from its representation in the index. The residual can be used for multiple-stage indexing methods, like IndexIVF’s methods.

Parameters:

n – number of vectors
xs – input vectors, size (n x d)
residuals – output residual vectors, size (n x d)
keys – encoded index, as returned by search and assign

virtual DistanceComputer *get_distance_computer() const

Get a DistanceComputer (defined in AuxIndexStructures) object for this kind of index.

DistanceComputer is implemented for indexes that support random access of their vectors.

virtual size_t sa_code_size() const: size of the produced codes in bytes

virtual void sa_encode(idx_t n, const float *x, uint8_t *bytes) const

encode a set of vectors

Parameters:

n – number of vectors
x – input vectors, size n * d
bytes – output encoded vectors, size n * sa_code_size()

virtual void add_sa_codes(idx_t n, const uint8_t *codes, const idx_t *xids)

Add vectors that are computed with the standalone codec

Parameters:

codes – codes to add size n * sa_code_size()
xids – corresponding ids, size n

Public Members

AdditiveQuantizer *aq

bool rescale_norm = true

int norm_scale = 1

size_t max_train_points = 0

int implem = 0

int skip = 0

int bbs

int qbs = 0

size_t M

size_t nbits

size_t ksub

size_t code_size

size_t ntotal2

size_t M2

AlignedTable<uint8_t> codes

const uint8_t *orig_codes = nullptr

int d: vector dimension

idx_t ntotal: total nb of indexed vectors

bool verbose: verbosity level

bool is_trained: set if the Index does not require training, or if training is done already

MetricType metric_type: type of metric this index uses for search

float metric_arg: argument of the metric type