Struct faiss::IndexFastScan

struct IndexFastScan : public faiss::Index

Fast scan version of IndexPQ and IndexAQ. Works for 4-bit PQ and AQ for now.

The codes are not stored sequentially but grouped in blocks of size bbs. This makes it possible to compute distances quickly with SIMD instructions. The trailing codes (padding codes that are added to complete the last block) are garbage.
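As a rough illustration of the blocked layout, the sketch below computes how many code slots a database occupies once it is rounded up to whole blocks of bbs, and how many of those slots are padding. The helper names round_up, padding_slots and packed_code_size are hypothetical and not part of the Faiss API. The bit-packing formula assumes that the M sub-codes of nbits each are stored contiguously, which this page does not state explicitly.

```cpp
#include <cassert>
#include <cstddef>

// Round n up to the next multiple of block.
inline std::size_t round_up(std::size_t n, std::size_t block) {
    assert(block > 0);
    return (n + block - 1) / block * block;
}

// Slots in the last block that hold no real code (the "garbage" padding).
inline std::size_t padding_slots(std::size_t n, std::size_t bbs) {
    return round_up(n, bbs) - n;
}

// Bytes needed for one vector's M sub-codes of nbits each, bit-packed
// (assumption: sub-codes are packed back to back).
inline std::size_t packed_code_size(std::size_t M, std::size_t nbits) {
    return (M * nbits + 7) / 8;
}
```

For example, with bbs = 32 a database of 100 vectors would occupy 128 slots, 28 of which are padding and must not be interpreted as real codes.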

Implementations:

12: blocked loop with internal loop on Q with qbs
13: same with reservoir accumulator to store results
14: no qbs with heap accumulator
15: no qbs with reservoir accumulator

Subclassed by faiss::IndexAdditiveQuantizerFastScan, faiss::IndexPQFastScan

Public Types

using component_t = float

using distance_t = float

Public Functions

void init_fastscan(int d, size_t M, size_t nbits, MetricType metric, int bbs)

IndexFastScan()

virtual void reset() override: removes all elements from the database.

virtual void search(idx_t n, const float *x, idx_t k, float *distances, idx_t *labels, const SearchParameters *params = nullptr) const override

Query n vectors of dimension d to the index.

Returns at most k vectors per query. If there are not enough results for a query, the result array is padded with -1s.

Parameters:

n – number of vectors
x – input vectors to search, size n * d
k – number of extracted vectors
distances – output pairwise distances, size n*k
labels – output labels of the NNs, size n*k
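Because the labels array is padded with -1 when a query has fewer than k results, callers usually filter it before use. The sketch below shows one way to do that on a plain n*k array. The helper valid_labels is hypothetical, and the idx_t alias is an assumption that mirrors the 64-bit signed type used by Faiss.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using idx_t = std::int64_t;  // assumption: stand-in for faiss::idx_t

// Collect the valid labels of query q from a row-major n*k labels array,
// skipping the -1 entries used as padding.
std::vector<idx_t> valid_labels(const std::vector<idx_t>& labels,
                                idx_t k, idx_t q) {
    assert(k > 0 && q >= 0);
    assert(static_cast<std::size_t>((q + 1) * k) <= labels.size());
    std::vector<idx_t> out;
    for (idx_t j = 0; j < k; ++j) {
        idx_t id = labels[static_cast<std::size_t>(q * k + j)];
        if (id >= 0) {
            out.push_back(id);
        }
    }
    return out;
}
```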

virtual void add(idx_t n, const float *x) override

Add n vectors of dimension d to the index.

Vectors are implicitly assigned labels ntotal .. ntotal + n - 1. This function slices the input vectors into chunks smaller than blocksize_add and calls add_core on each chunk.

Parameters:

n – number of vectors
x – input matrix, size n * d
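The slicing described for add can be pictured as a simple loop over (offset, count) pairs. The sketch below is not the Faiss implementation: make_slices is a hypothetical helper, and chunk stands in for the slice size bound, whose exact value and comparison (strict or not) are not given on this page.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

using idx_t = std::int64_t;  // assumption: stand-in for faiss::idx_t

// Return (offset, count) pairs that cover n vectors in slices of at most
// `chunk` vectors each. A caller would pass x + offset * d for each slice.
std::vector<std::pair<idx_t, idx_t>> make_slices(idx_t n, idx_t chunk) {
    assert(n >= 0 && chunk > 0);
    std::vector<std::pair<idx_t, idx_t>> slices;
    for (idx_t i0 = 0; i0 < n; i0 += chunk) {
        idx_t i1 = std::min(n, i0 + chunk);
        slices.emplace_back(i0, i1 - i0);
    }
    return slices;
}
```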

virtual void compute_codes(uint8_t *codes, idx_t n, const float *x) const = 0

virtual void compute_float_LUT(float *lut, idx_t n, const float *x) const = 0

void compute_quantized_LUT(idx_t n, const float *x, uint8_t *lut, float *normalizers) const

template<bool is_max> void search_dispatch_implem(idx_t n, const float *x, idx_t k, float *distances, idx_t *labels, const NormTableScaler *scaler) const

template<class Cfloat> void search_implem_234(idx_t n, const float *x, idx_t k, float *distances, idx_t *labels, const NormTableScaler *scaler) const

template<class C> void search_implem_12(idx_t n, const float *x, idx_t k, float *distances, idx_t *labels, int impl, const NormTableScaler *scaler) const

template<class C> void search_implem_14(idx_t n, const float *x, idx_t k, float *distances, idx_t *labels, int impl, const NormTableScaler *scaler) const

virtual void reconstruct(idx_t key, float *recons) const override

Reconstruct a stored vector (or an approximation if lossy coding)

this function may not be defined for some indexes

Parameters:

key – id of the vector to reconstruct
recons – reconstructed vector (size d)

virtual size_t remove_ids(const IDSelector &sel) override: removes IDs from the index. Not supported by all indexes. Returns the number of elements removed.

CodePacker *get_CodePacker() const

virtual void merge_from(Index &otherIndex, idx_t add_id = 0) override: moves the entries from another index to self. On output, otherIndex is empty. add_id is added to all moved ids (for sequential ids, this would be this->ntotal)
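The effect of add_id can be illustrated on plain id lists. This is only a sketch of the id bookkeeping, with a hypothetical helper merge_ids; the real method also moves the stored codes, which is not shown here.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

using idx_t = std::int64_t;  // assumption: stand-in for faiss::idx_t

// Move all ids from `other` into `self`, shifting each by add_id.
// On return, `other` is empty.
void merge_ids(std::vector<idx_t>& self, std::vector<idx_t>& other,
               idx_t add_id) {
    assert(&self != &other);
    self.reserve(self.size() + other.size());
    for (idx_t id : other) {
        self.push_back(id + add_id);
    }
    other.clear();
}
```

With sequential ids, passing the current size of the destination as add_id keeps the merged ids contiguous.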

virtual void check_compatible_for_merge(const Index &otherIndex) const override: checks that the two indexes are compatible (i.e., they are trained in the same way and have the same parameters). Otherwise throws.

virtual void train(idx_t n, const float *x)

Perform training on a representative set of vectors

Parameters:

n – number of training vectors
x – training vectors, size n * d

virtual void add_with_ids(idx_t n, const float *x, const idx_t *xids)

Same as add, but stores xids instead of sequential ids.

The default implementation fails with an assertion, as it is not supported by all indexes.

Parameters:

n – number of vectors
x – input vectors, size n * d
xids – if non-null, ids to store for the vectors (size n)

virtual void range_search(idx_t n, const float *x, float radius, RangeSearchResult *result, const SearchParameters *params = nullptr) const

Query n vectors of dimension d to the index.

Returns all vectors with distance < radius. Note that many indexes do not implement range_search (only the k-NN search is mandatory).

Parameters:

n – number of vectors
x – input vectors to search, size n * d
radius – search radius
result – result table

virtual void assign(idx_t n, const float *x, idx_t *labels, idx_t k = 1) const

Returns the indexes of the k vectors closest to the query x.

This function is identical to search but only returns the labels of the neighbors.

Parameters:

n – number of vectors
x – input vectors to search, size n * d
labels – output labels of the NNs, size n*k
k – number of nearest neighbours

virtual void reconstruct_batch(idx_t n, const idx_t *keys, float *recons) const

Reconstruct several stored vectors (or an approximation if lossy coding)

this function may not be defined for some indexes

Parameters:

n – number of vectors to reconstruct
keys – ids of the vectors to reconstruct (size n)
recons – reconstructed vectors (size n * d)

virtual void reconstruct_n(idx_t i0, idx_t ni, float *recons) const

Reconstruct vectors i0 to i0 + ni - 1

this function may not be defined for some indexes

Parameters:

i0 – index of the first vector in the sequence
ni – number of vectors in the sequence
recons – reconstructed vectors (size ni * d)

virtual void search_and_reconstruct(idx_t n, const float *x, idx_t k, float *distances, idx_t *labels, float *recons, const SearchParameters *params = nullptr) const

Similar to search, but also reconstructs the stored vectors (or an approximation in the case of lossy coding) for the search results.

If there are not enough results for a query, the resulting arrays are padded with -1s.

Parameters:

n – number of vectors
x – input vectors to search, size n * d
k – number of extracted vectors
distances – output pairwise distances, size n*k
labels – output labels of the NNs, size n*k
recons – reconstructed vectors size (n, k, d)
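To read one reconstructed neighbor out of the recons buffer, the (n, k, d) shape can be flattened into an offset. The sketch below assumes a row-major layout, which is the natural reading of the stated shape but is not spelled out on this page. recons_offset is a hypothetical helper.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

using idx_t = std::int64_t;  // assumption: stand-in for faiss::idx_t

// Offset (in floats) of the j-th result of query q inside a buffer of
// shape (n, k, d), assuming row-major storage.
inline std::size_t recons_offset(idx_t q, idx_t j, idx_t k, int d) {
    assert(q >= 0 && j >= 0 && j < k && d > 0);
    return static_cast<std::size_t>((q * k + j) * d);
}
```

The d floats starting at recons + recons_offset(q, j, k, d) would then hold the reconstruction that matches labels[q * k + j].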

virtual void compute_residual(const float *x, float *residual, idx_t key) const

Computes a residual vector after indexing encoding.

The residual vector is the difference between a vector and the reconstruction that can be decoded from its representation in the index. The residual can be used for multiple-stage indexing methods, like IndexIVF’s methods.

Parameters:

x – input vector, size d
residual – output residual vector, size d
key – encoded index, as returned by search and assign
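The definition of the residual translates to an element-wise subtraction once the reconstruction is available. The sketch below takes the reconstruction as an input rather than calling into the index. residual_from_reconstruction is a hypothetical helper, not a Faiss function.

```cpp
#include <cassert>

// residual[i] = x[i] - recons[i] for a vector of dimension d, where
// recons is the vector decoded from the index for the given key.
void residual_from_reconstruction(const float* x, const float* recons,
                                  float* residual, int d) {
    assert(d >= 0);
    for (int i = 0; i < d; ++i) {
        residual[i] = x[i] - recons[i];
    }
}
```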

virtual void compute_residual_n(idx_t n, const float *xs, float *residuals, const idx_t *keys) const

Computes a residual vector after indexing encoding (batch form). Equivalent to calling compute_residual for each vector.

The residual vector is the difference between a vector and the reconstruction that can be decoded from its representation in the index. The residual can be used for multiple-stage indexing methods, like IndexIVF’s methods.

Parameters:

n – number of vectors
xs – input vectors, size (n x d)
residuals – output residual vectors, size (n x d)
keys – encoded index, as returned by search and assign

virtual DistanceComputer *get_distance_computer() const

Get a DistanceComputer (defined in AuxIndexStructures) object for this kind of index.

DistanceComputer is implemented for indexes that support random access of their vectors.

virtual size_t sa_code_size() const: size of the produced codes in bytes

virtual void sa_encode(idx_t n, const float *x, uint8_t *bytes) const

encode a set of vectors

Parameters:

n – number of vectors
x – input vectors, size n * d
bytes – output encoded vectors, size n * sa_code_size()

virtual void sa_decode(idx_t n, const uint8_t *bytes, float *x) const

decode a set of vectors

Parameters:

n – number of vectors
bytes – input encoded vectors, size n * sa_code_size()
x – output vectors, size n * d
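The buffer sizes stated for sa_encode and sa_decode can be allocated up front as in the sketch below. Both helpers are hypothetical; code_size is meant to be the value returned by sa_code_size() and d the index dimension.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using idx_t = std::int64_t;  // assumption: stand-in for faiss::idx_t

// Byte buffer for n encoded vectors, size n * code_size.
std::vector<std::uint8_t> make_code_buffer(idx_t n, std::size_t code_size) {
    assert(n >= 0);
    return std::vector<std::uint8_t>(static_cast<std::size_t>(n) * code_size);
}

// Float buffer for n decoded vectors of dimension d, size n * d.
std::vector<float> make_vector_buffer(idx_t n, int d) {
    assert(n >= 0 && d >= 0);
    return std::vector<float>(static_cast<std::size_t>(n) *
                              static_cast<std::size_t>(d));
}
```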

virtual void add_sa_codes(idx_t n, const uint8_t *codes, const idx_t *xids)

Add vectors that are computed with the standalone codec

Parameters:

n – number of vectors
codes – codes to add, size n * sa_code_size()
xids – corresponding ids, size n

Public Members

int implem = 0

int skip = 0

int bbs

int qbs = 0

size_t M

size_t nbits

size_t ksub

size_t code_size

size_t ntotal2

size_t M2

AlignedTable<uint8_t> codes

const uint8_t *orig_codes = nullptr

int d: vector dimension

idx_t ntotal: total number of indexed vectors

bool verbose: verbosity level

bool is_trained: set if the Index does not require training, or if training is done already

MetricType metric_type: type of metric this index uses for search

float metric_arg: argument of the metric type