Struct faiss::IndexRowwiseMinMaxBase

struct IndexRowwiseMinMaxBase : public faiss::Index

Provides base functions for rowwise normalizing indices.

Index wrapper that performs rowwise normalization to [0,1], preserving the coefficients. This is a vector codec index only.

This index scales every row of the input dataset to [0,1] before calling subindex::train() and subindex::sa_encode(). The sa_encode() call stores the scaling coefficients (scaler and minv) at the very beginning of every output code. The format is: [scaler][minv][subindex::sa_encode() output]. The de-scaling in sa_decode() is done using: output_rescaled = scaler * output + minv
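The scaling and de-scaling of one row can be sketched as follows. This is a minimal illustration, not the Faiss implementation: RowCoefficients, scale_row and descale_row are hypothetical names, scaler is assumed to be the row's range (max - min), which is what the de-scaling formula implies for values in [0,1], and the handling of a constant row (all zeros) is an assumption of this sketch.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Hypothetical helper type (not part of Faiss): the two coefficients
// needed to undo the rowwise scaling.
struct RowCoefficients {
    float scaler; // assumed to be max - min of the row
    float minv;   // min of the row
};

// Scale one row of d values to [0,1] and return its coefficients.
RowCoefficients scale_row(const float* x, std::size_t d, float* scaled) {
    assert(d > 0);
    const auto mm = std::minmax_element(x, x + d);
    const float minv = *mm.first;
    const float scaler = *mm.second - minv;
    for (std::size_t i = 0; i < d; i++) {
        // Assumption: a constant row (scaler == 0) is mapped to zeros.
        scaled[i] = (scaler == 0.0f) ? 0.0f : (x[i] - minv) / scaler;
    }
    return {scaler, minv};
}

// Undo the scaling: output_rescaled = scaler * output + minv.
void descale_row(const RowCoefficients& c, const float* scaled,
                 std::size_t d, float* out) {
    for (std::size_t i = 0; i < d; i++) {
        out[i] = c.scaler * scaled[i] + c.minv;
    }
}
```

In the real index the scaled row is passed through the sub-index codec, so the decoded values are generally only an approximation of the input.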

An additional ::train_inplace() function performs the scaling in place before calling subindex::train(). This avoids cloning the input dataset, but it modifies the input dataset, because the data is scaled and then scaled back. It is up to the user to call this function instead of ::train()
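The in-place pattern described above might look like the sketch below. It is an illustration under stated assumptions, not the Faiss code: with_rows_scaled_inplace is a hypothetical name, the callback stands in for subindex::train(), and scaler is assumed to be max - min of each row. Because the buffer is scaled and then scaled back in floating point, the restored values are not guaranteed to be bit-identical to the originals in general.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical sketch: scale every row of x (n rows, d columns) to [0,1]
// in place, hand the scaled buffer to train_fn, then scale the rows back.
void with_rows_scaled_inplace(
        std::size_t n,
        std::size_t d,
        float* x,
        const std::function<void(std::size_t, const float*)>& train_fn) {
    assert(d > 0);
    std::vector<float> minvs(n), scalers(n);
    for (std::size_t i = 0; i < n; i++) {
        float* row = x + i * d;
        const auto mm = std::minmax_element(row, row + d);
        minvs[i] = *mm.first;
        scalers[i] = *mm.second - minvs[i];
        for (std::size_t j = 0; j < d; j++) {
            // Assumption: constant rows are mapped to zeros.
            row[j] = (scalers[i] == 0.0f) ? 0.0f
                                          : (row[j] - minvs[i]) / scalers[i];
        }
    }

    train_fn(n, x); // stand-in for the sub-index training call

    // Scale back; only the coefficients were stored, not a copy of x.
    for (std::size_t i = 0; i < n; i++) {
        float* row = x + i * d;
        for (std::size_t j = 0; j < d; j++) {
            row[j] = scalers[i] * row[j] + minvs[i];
        }
    }
}
```

The only extra memory is two floats per row, instead of a full copy of the n * d training matrix.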

Derived classes provide different data types for scaling coefficients. Currently, versions with fp16 and fp32 scaling coefficients are available.

fp16 version adds 4 extra bytes per encoded vector
fp32 version adds 8 extra bytes per encoded vector
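The per-vector overhead follows from storing two coefficients (scaler and minv) in front of the sub-index code: 2 x 2 bytes for fp16 and 2 x 4 bytes for fp32. The sketch below shows that arithmetic and one possible way to lay out an fp32 code. rowwise_code_size and pack_code_fp32 are hypothetical helpers, and the exact byte representation of the header (native-endian floats, no padding) is an assumption of this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical helper: size in bytes of one code, given the sub-index code
// size and the width of one coefficient (2 for fp16, 4 for fp32).
std::size_t rowwise_code_size(std::size_t sub_code_size,
                              std::size_t coeff_bytes) {
    return 2 * coeff_bytes + sub_code_size;
}

// Hypothetical helper: build [scaler][minv][sub-index code] with fp32
// coefficients, assuming native-endian floats and no padding.
std::vector<std::uint8_t> pack_code_fp32(
        float scaler,
        float minv,
        const std::uint8_t* sub_code,
        std::size_t sub_code_size) {
    assert(sub_code != nullptr || sub_code_size == 0);
    std::vector<std::uint8_t> code(
            rowwise_code_size(sub_code_size, sizeof(float)));
    std::memcpy(code.data(), &scaler, sizeof(float));
    std::memcpy(code.data() + sizeof(float), &minv, sizeof(float));
    if (sub_code_size > 0) {
        std::memcpy(code.data() + 2 * sizeof(float), sub_code, sub_code_size);
    }
    return code;
}
```

For example, with a 16-byte sub-index code, the fp16 variant would produce 20-byte codes and the fp32 variant 24-byte codes.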

Subclassed by faiss::IndexRowwiseMinMax, faiss::IndexRowwiseMinMaxFP16

Public Types

using component_t = float

using distance_t = float

Public Functions

explicit IndexRowwiseMinMaxBase(Index *index)

IndexRowwiseMinMaxBase()

~IndexRowwiseMinMaxBase() override

virtual void add(idx_t n, const float *x) override

Add n vectors of dimension d to the index.

Vectors are implicitly assigned labels ntotal .. ntotal + n - 1. This function slices the input vectors in chunks smaller than blocksize_add and calls add_core.

Parameters:

n – number of vectors
x – input matrix, size n * d

virtual void search(idx_t n, const float *x, idx_t k, float *distances, idx_t *labels, const SearchParameters *params = nullptr) const override

Query n vectors of dimension d against the index.

Returns at most k vectors per query. If there are not enough results for a query, the result array is padded with -1s.

Parameters:

n – number of vectors
x – input vectors to search, size n * d
k – number of extracted vectors
distances – output pairwise distances, size n*k
labels – output labels of the NNs, size n*k
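Because the label array is padded with -1s, a caller usually needs to find how many real results each query produced. A minimal sketch, assuming the padding sits at the end of each query's row of k labels; count_valid_results is a hypothetical helper and idx_t is declared locally as a stand-in for faiss::idx_t (assumed here to be a 64-bit signed integer).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Local stand-in for faiss::idx_t (assumed 64-bit signed).
using idx_t = std::int64_t;

// Hypothetical helper: number of real (non-padding) results for query q,
// given the labels buffer of size n * k filled by a search call.
std::size_t count_valid_results(const idx_t* labels, idx_t k, idx_t q) {
    assert(labels != nullptr && k >= 0 && q >= 0);
    const idx_t* row = labels + q * k;
    std::size_t count = 0;
    // Assumption: -1 entries only appear as trailing padding.
    while (count < static_cast<std::size_t>(k) && row[count] != -1) {
        count++;
    }
    return count;
}
```

The matching entries of the distances array should be ignored for the padded positions.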

virtual void reset() override: removes all elements from the database.

virtual void train_inplace(idx_t n, float *x) = 0

virtual void train(idx_t n, const float *x)

Perform training on a representative set of vectors

Parameters:

n – number of training vectors
x – training vectors, size n * d

virtual void add_with_ids(idx_t n, const float *x, const idx_t *xids)

Same as add, but stores xids instead of sequential ids.

The default implementation fails with an assertion, as it is not supported by all indexes.

Parameters:

n – number of vectors
x – input vectors, size n * d
xids – if non-null, ids to store for the vectors (size n)

virtual void range_search(idx_t n, const float *x, float radius, RangeSearchResult *result, const SearchParameters *params = nullptr) const

Query n vectors of dimension d against the index.

Returns all vectors with distance < radius. Note that many indexes do not implement range_search (only the k-NN search is mandatory).

Parameters:

n – number of vectors
x – input vectors to search, size n * d
radius – search radius
result – result table

virtual void assign(idx_t n, const float *x, idx_t *labels, idx_t k = 1) const

Return the indexes of the k vectors closest to the query x.

This function is identical to search but only returns the labels of the neighbors.

Parameters:

n – number of vectors
x – input vectors to search, size n * d
labels – output labels of the NNs, size n*k
k – number of nearest neighbours

virtual size_t remove_ids(const IDSelector &sel): removes IDs from the index. Not supported by all indexes. Returns the number of elements removed.

virtual void reconstruct(idx_t key, float *recons) const

Reconstruct a stored vector (or an approximation if lossy coding)

This function may not be defined for some indexes.

Parameters:

key – id of the vector to reconstruct
recons – reconstructed vector (size d)

virtual void reconstruct_batch(idx_t n, const idx_t *keys, float *recons) const

Reconstruct several stored vectors (or an approximation if lossy coding)

This function may not be defined for some indexes.

Parameters:

n – number of vectors to reconstruct
keys – ids of the vectors to reconstruct (size n)
recons – reconstructed vectors (size n * d)

virtual void reconstruct_n(idx_t i0, idx_t ni, float *recons) const

Reconstruct vectors i0 to i0 + ni - 1

This function may not be defined for some indexes.

Parameters:

i0 – index of the first vector in the sequence
ni – number of vectors in the sequence
recons – reconstructed vectors (size ni * d)

virtual void search_and_reconstruct(idx_t n, const float *x, idx_t k, float *distances, idx_t *labels, float *recons, const SearchParameters *params = nullptr) const

Similar to search, but also reconstructs the stored vectors (or an approximation in the case of lossy coding) for the search results.

If there are not enough results for a query, the resulting arrays are padded with -1s.

Parameters:

n – number of vectors
x – input vectors to search, size n * d
k – number of extracted vectors
distances – output pairwise distances, size n*k
labels – output labels of the NNs, size n*k
recons – reconstructed vectors, size (n, k, d)

virtual void compute_residual(const float *x, float *residual, idx_t key) const

Computes a residual vector after indexing encoding.

The residual vector is the difference between a vector and the reconstruction that can be decoded from its representation in the index. The residual can be used for multiple-stage indexing methods, like IndexIVF’s methods.

Parameters:

x – input vector, size d
residual – output residual vector, size d
key – encoded index, as returned by search and assign
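The definition above (residual = input vector minus its reconstruction) can be sketched without any index at all. residual_from_reconstruction is a hypothetical helper; in the real API the reconstruction would come from the index for the given key, for example through reconstruct().

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: residual[i] = x[i] - recons[i] for a vector of size d,
// where recons is the vector decoded from the index representation.
void residual_from_reconstruction(
        const float* x,
        const float* recons,
        std::size_t d,
        float* residual) {
    assert(x != nullptr && recons != nullptr && residual != nullptr);
    for (std::size_t i = 0; i < d; i++) {
        residual[i] = x[i] - recons[i];
    }
}
```

A multi-stage method would then encode this residual with a second codec instead of the original vector.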

virtual void compute_residual_n(idx_t n, const float *xs, float *residuals, const idx_t *keys) const

Computes a residual vector after indexing encoding (batch form). Equivalent to calling compute_residual for each vector.

The residual vector is the difference between a vector and the reconstruction that can be decoded from its representation in the index. The residual can be used for multiple-stage indexing methods, like IndexIVF’s methods.

Parameters:

n – number of vectors
xs – input vectors, size (n x d)
residuals – output residual vectors, size (n x d)
keys – encoded index, as returned by search and assign

virtual DistanceComputer *get_distance_computer() const

Get a DistanceComputer (defined in AuxIndexStructures) object for this kind of index.

DistanceComputer is implemented for indexes that support random access of their vectors.

virtual size_t sa_code_size() const: size of the produced codes in bytes

virtual void sa_encode(idx_t n, const float *x, uint8_t *bytes) const

Encode a set of vectors.

Parameters:

n – number of vectors
x – input vectors, size n * d
bytes – output encoded vectors, size n * sa_code_size()

virtual void sa_decode(idx_t n, const uint8_t *bytes, float *x) const

Decode a set of vectors.

Parameters:

n – number of vectors
bytes – input encoded vectors, size n * sa_code_size()
x – output vectors, size n * d

virtual void merge_from(Index &otherIndex, idx_t add_id = 0): moves the entries from another index to self. On output, otherIndex is empty. add_id is added to all moved ids (for sequential ids, this would be this->ntotal).

virtual void check_compatible_for_merge(const Index &otherIndex) const: checks that the two indexes are compatible (i.e., they are trained in the same way and have the same parameters). Throws otherwise.

virtual void add_sa_codes(idx_t n, const uint8_t *codes, const idx_t *xids)

Add vectors that are computed with the standalone codec

Parameters:

codes – codes to add, size n * sa_code_size()
xids – corresponding ids, size n

Public Members

Index *index: sub-index

bool own_fields: whether the subindex needs to be freed in the destructor.

int d: vector dimension

idx_t ntotal: total number of indexed vectors

bool verbose: verbosity level

bool is_trained: set if the Index does not require training, or if training is done already

MetricType metric_type: type of metric this index uses for search

float metric_arg: argument of the metric type