Struct faiss::IndexRowwiseMinMaxBase
-
struct IndexRowwiseMinMaxBase : public faiss::Index
Provides base functions for rowwise normalizing indices.
Index wrapper that performs rowwise normalization to [0,1], preserving the coefficients. This is a vector codec index only.
This index scales every row of the input dataset to [0,1] before calling subindex::train() and subindex::sa_encode(). The sa_encode() call stores the scaling coefficients (scaler and minv) at the very beginning of every output code. The format is: [scaler][minv][subindex::sa_encode() output]. The de-scaling in sa_decode() is done using: output_rescaled = scaler * output + minv
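The scaling and de-scaling described above can be sketched in plain C++ without the library. This is a minimal illustration, not the faiss implementation: the names RowCoeffs, scale_row and descale_row are hypothetical, and taking scaler as (max - min) of the row is an assumption that is consistent with the de-scaling formula above.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Scaling coefficients for one row (field names follow the text above).
struct RowCoeffs {
    float scaler; // assumption: (max - min) of the row
    float minv;   // minimum value of the row
};

// Scale one row of d floats to [0,1] in place and return its coefficients.
inline RowCoeffs scale_row(float* row, std::size_t d) {
    assert(d > 0);
    const auto mm = std::minmax_element(row, row + d);
    const float minv = *mm.first;
    const float scaler = *mm.second - minv;
    // A constant row has scaler == 0; map it to all zeros instead of dividing by zero.
    const float inv = (scaler == 0.0f) ? 0.0f : 1.0f / scaler;
    for (std::size_t i = 0; i < d; ++i) {
        row[i] = (row[i] - minv) * inv;
    }
    return {scaler, minv};
}

// Undo the scaling: output_rescaled = scaler * output + minv.
inline void descale_row(float* row, std::size_t d, RowCoeffs c) {
    for (std::size_t i = 0; i < d; ++i) {
        row[i] = c.scaler * row[i] + c.minv;
    }
}
```

Calling scale_row on every row, running some training step on the scaled data, and then calling descale_row with the saved coefficients mirrors the "scaling and scaling back" pattern without cloning the dataset.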
An additional train_inplace() function is provided to perform the scaling in place before calling subindex::train(). This avoids cloning the input dataset, but the input dataset is modified by the scaling and the scaling back. It is up to the user to call this function instead of train().
Derived classes provide different data types for scaling coefficients. Currently, versions with fp16 and fp32 scaling coefficients are available.
fp16 version adds 4 extra bytes per encoded vector
fp32 version adds 8 extra bytes per encoded vector
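The per-vector overhead follows from storing two coefficients (scaler and minv) in front of the sub-index code. The arithmetic can be sketched as below; rowwise_code_size and the two constants are hypothetical names used for illustration only, not library functions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Width in bytes of one stored coefficient.
constexpr std::size_t kFp16Bytes = sizeof(std::uint16_t); // an fp16 value occupies 16 bits
constexpr std::size_t kFp32Bytes = sizeof(float);         // 4 bytes on common platforms

// Hypothetical helper: total code size for the layout
// [scaler][minv][subindex::sa_encode() output].
constexpr std::size_t rowwise_code_size(std::size_t sub_code_size,
                                        std::size_t coeff_bytes) {
    return 2 * coeff_bytes + sub_code_size;
}

// With an empty sub-code, only the two coefficients remain.
static_assert(rowwise_code_size(0, kFp16Bytes) == 4, "fp16: 4 extra bytes");
static_assert(rowwise_code_size(0, kFp32Bytes) == 8, "fp32: 8 extra bytes");
```

For example, a sub-index that produces 64-byte codes would yield 68-byte codes with fp16 coefficients and 72-byte codes with fp32 coefficients under this layout.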
Subclassed by faiss::IndexRowwiseMinMax, faiss::IndexRowwiseMinMaxFP16
Public Types
-
using component_t = float
-
using distance_t = float
Public Functions
-
explicit IndexRowwiseMinMaxBase(Index *index)
-
IndexRowwiseMinMaxBase()
-
~IndexRowwiseMinMaxBase() override
-
virtual void add(idx_t n, const float *x) override
Add n vectors of dimension d to the index.
Vectors are implicitly assigned labels ntotal .. ntotal + n - 1. This function slices the input vectors into chunks smaller than blocksize_add and calls add_core.
- Parameters:
n – number of vectors
x – input matrix, size n * d
-
virtual void search(idx_t n, const float *x, idx_t k, float *distances, idx_t *labels, const SearchParameters *params = nullptr) const override
Query n vectors of dimension d against the index.
Returns at most k vectors per query. If there are not enough results for a query, the result array is padded with -1s.
- Parameters:
n – number of vectors
x – input vectors to search, size n * d
k – number of extracted vectors
distances – output pairwise distances, size n*k
labels – output labels of the NNs, size n*k
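The output layout of search (n*k distances and n*k labels, padded with -1 when fewer than k results exist) can be illustrated with a toy brute-force search. This sketch only demonstrates the output contract and does not reflect how any faiss index searches. The function toy_search, the use of squared L2 distance, the 64-bit idx_t alias and the choice of padding distances with infinity are all assumptions made for the example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <limits>
#include <utility>
#include <vector>

using idx_t = std::int64_t; // assumption: 64-bit signed ids

// Brute-force k-NN of n queries x against nb database vectors xb,
// all of dimension d, using squared L2 distance.
inline void toy_search(const float* xb, idx_t nb, idx_t d,
                       idx_t n, const float* x, idx_t k,
                       float* distances, idx_t* labels) {
    assert(d > 0 && k > 0);
    for (idx_t q = 0; q < n; ++q) {
        std::vector<std::pair<float, idx_t>> cand;
        for (idx_t j = 0; j < nb; ++j) {
            float dist = 0.0f;
            for (idx_t c = 0; c < d; ++c) {
                const float diff = x[q * d + c] - xb[j * d + c];
                dist += diff * diff;
            }
            cand.emplace_back(dist, j);
        }
        std::sort(cand.begin(), cand.end()); // nearest first
        for (idx_t r = 0; r < k; ++r) {
            const bool found = r < static_cast<idx_t>(cand.size());
            // Row q of the output holds k slots; missing results get label -1.
            labels[q * k + r] = found ? cand[r].second : -1;
            distances[q * k + r] =
                    found ? cand[r].first : std::numeric_limits<float>::infinity();
        }
    }
}
```

Result r of query q lives at position q * k + r in both output arrays, so callers should check for a label of -1 before using an entry.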
-
virtual void reset() override
removes all elements from the database.
-
virtual void train_inplace(idx_t n, float *x) = 0
-
virtual void train(idx_t n, const float *x)
Perform training on a representative set of vectors
- Parameters:
n – number of training vectors
x – training vectors, size n * d
-
virtual void add_with_ids(idx_t n, const float *x, const idx_t *xids)
Same as add, but stores xids instead of sequential ids.
The default implementation fails with an assertion, as it is not supported by all indexes.
- Parameters:
n – number of vectors
x – input vectors, size n * d
xids – if non-null, ids to store for the vectors (size n)
-
virtual void range_search(idx_t n, const float *x, float radius, RangeSearchResult *result, const SearchParameters *params = nullptr) const
Query n vectors of dimension d against the index.
Returns all vectors with distance < radius. Note that many indexes do not implement range_search (only the k-NN search is mandatory).
- Parameters:
n – number of vectors
x – input vectors to search, size n * d
radius – search radius
result – result table
-
virtual void assign(idx_t n, const float *x, idx_t *labels, idx_t k = 1) const
Return the indexes of the k vectors closest to the query x.
This function is identical to search but only returns the labels of the neighbors.
- Parameters:
n – number of vectors
x – input vectors to search, size n * d
labels – output labels of the NNs, size n*k
k – number of nearest neighbours
-
virtual size_t remove_ids(const IDSelector &sel)
removes IDs from the index. Not supported by all indexes. Returns the number of elements removed.
-
virtual void reconstruct(idx_t key, float *recons) const
Reconstruct a stored vector (or an approximation if lossy coding)
this function may not be defined for some indexes
- Parameters:
key – id of the vector to reconstruct
recons – reconstructed vector (size d)
-
virtual void reconstruct_batch(idx_t n, const idx_t *keys, float *recons) const
Reconstruct several stored vectors (or an approximation if lossy coding)
this function may not be defined for some indexes
- Parameters:
n – number of vectors to reconstruct
keys – ids of the vectors to reconstruct (size n)
recons – reconstructed vectors (size n * d)
-
virtual void reconstruct_n(idx_t i0, idx_t ni, float *recons) const
Reconstruct vectors i0 to i0 + ni - 1
this function may not be defined for some indexes
- Parameters:
i0 – index of the first vector in the sequence
ni – number of vectors in the sequence
recons – reconstructed vectors (size ni * d)
-
virtual void search_and_reconstruct(idx_t n, const float *x, idx_t k, float *distances, idx_t *labels, float *recons, const SearchParameters *params = nullptr) const
Similar to search, but also reconstructs the stored vectors (or an approximation in the case of lossy coding) for the search results.
If there are not enough results for a query, the resulting arrays are padded with -1s.
- Parameters:
n – number of vectors
x – input vectors to search, size n * d
k – number of extracted vectors
distances – output pairwise distances, size n*k
labels – output labels of the NNs, size n*k
recons – reconstructed vectors, size (n, k, d)
-
virtual void compute_residual(const float *x, float *residual, idx_t key) const
Computes a residual vector after index encoding.
The residual vector is the difference between a vector and the reconstruction that can be decoded from its representation in the index. The residual can be used for multiple-stage indexing methods, like IndexIVF’s methods.
- Parameters:
x – input vector, size d
residual – output residual vector, size d
key – encoded index, as returned by search and assign
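The definition of the residual can be written out directly. In this sketch the reconstruction is passed in as a plain array rather than looked up by key; toy_compute_residual is a hypothetical helper for illustration, not the library function.

```cpp
#include <cassert>
#include <cstddef>

// residual[i] = x[i] - recons[i], where recons is the vector decoded
// from the index representation of x.
inline void toy_compute_residual(const float* x, const float* recons,
                                 std::size_t d, float* residual) {
    assert(x != nullptr && recons != nullptr && residual != nullptr);
    for (std::size_t i = 0; i < d; ++i) {
        residual[i] = x[i] - recons[i];
    }
}
```

A multi-stage method would then encode the residual with a second codec, so that the two reconstructions added together approximate x more closely than the first one alone.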
-
virtual void compute_residual_n(idx_t n, const float *xs, float *residuals, const idx_t *keys) const
Computes residual vectors after index encoding (batch form). Equivalent to calling compute_residual for each vector.
The residual vector is the difference between a vector and the reconstruction that can be decoded from its representation in the index. The residual can be used for multiple-stage indexing methods, like IndexIVF’s methods.
- Parameters:
n – number of vectors
xs – input vectors, size (n x d)
residuals – output residual vectors, size (n x d)
keys – encoded index, as returned by search and assign
-
virtual DistanceComputer *get_distance_computer() const
Get a DistanceComputer (defined in AuxIndexStructures) object for this kind of index.
DistanceComputer is implemented for indexes that support random access of their vectors.
-
virtual size_t sa_code_size() const
size of the produced codes in bytes
-
virtual void sa_encode(idx_t n, const float *x, uint8_t *bytes) const
encode a set of vectors
- Parameters:
n – number of vectors
x – input vectors, size n * d
bytes – output encoded vectors, size n * sa_code_size()
-
virtual void sa_decode(idx_t n, const uint8_t *bytes, float *x) const
decode a set of vectors
- Parameters:
n – number of vectors
bytes – input encoded vectors, size n * sa_code_size()
x – output vectors, size n * d
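To make the relationship between sa_code_size(), sa_encode() and sa_decode() concrete, here is a self-contained toy codec that uses the [scaler][minv][payload] layout from the class description. Everything in it is an assumption for illustration: the toy_* functions are hypothetical, the coefficients are stored as fp32, and the payload is a simple one-byte-per-component quantizer standing in for a real sub-index.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Bytes per encoded vector: two fp32 coefficients plus one byte per component.
inline std::size_t toy_code_size(std::size_t d) {
    return 2 * sizeof(float) + d;
}

// Encode one vector of d floats into toy_code_size(d) bytes.
inline void toy_encode(const float* x, std::size_t d, std::uint8_t* code) {
    assert(d > 0);
    float minv = x[0], maxv = x[0];
    for (std::size_t i = 1; i < d; ++i) {
        minv = std::fmin(minv, x[i]);
        maxv = std::fmax(maxv, x[i]);
    }
    const float scaler = maxv - minv;
    // Header: [scaler][minv], copied bytewise to avoid alignment issues.
    std::memcpy(code, &scaler, sizeof(float));
    std::memcpy(code + sizeof(float), &minv, sizeof(float));
    std::uint8_t* payload = code + 2 * sizeof(float);
    for (std::size_t i = 0; i < d; ++i) {
        // Scale to [0,1], then quantize to 8 bits.
        const float u = (scaler == 0.0f) ? 0.0f : (x[i] - minv) / scaler;
        payload[i] = static_cast<std::uint8_t>(std::lround(u * 255.0f));
    }
}

// Decode one vector: read the header, then apply
// output_rescaled = scaler * output + minv.
inline void toy_decode(const std::uint8_t* code, std::size_t d, float* x) {
    float scaler = 0.0f, minv = 0.0f;
    std::memcpy(&scaler, code, sizeof(float));
    std::memcpy(&minv, code + sizeof(float), sizeof(float));
    const std::uint8_t* payload = code + 2 * sizeof(float);
    for (std::size_t i = 0; i < d; ++i) {
        x[i] = scaler * (payload[i] / 255.0f) + minv;
    }
}
```

Because each row carries its own coefficients, the quantization error of this toy codec is bounded by a fraction of that row's own range rather than the range of the whole dataset.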
-
virtual void merge_from(Index &otherIndex, idx_t add_id = 0)
Moves the entries from another dataset to self. On output, otherIndex is empty. add_id is added to all moved ids (for sequential ids, this would be this->ntotal).
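The effect of add_id on the moved ids can be illustrated with a toy container that holds only ids. ToyIdStore and toy_merge_from are hypothetical and model just the id bookkeeping described above, not the transfer of codes or any real index state.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

using idx_t = std::int64_t; // assumption: 64-bit signed ids

// Hypothetical stand-in for an index that stores only ids.
struct ToyIdStore {
    std::vector<idx_t> ids;
};

// Move all entries of other into self, shifting each id by add_id.
// Afterwards other is empty.
inline void toy_merge_from(ToyIdStore& self, ToyIdStore& other, idx_t add_id = 0) {
    assert(&self != &other); // merging a store into itself is not meaningful
    self.ids.reserve(self.ids.size() + other.ids.size());
    for (idx_t id : other.ids) {
        self.ids.push_back(id + add_id);
    }
    other.ids.clear();
}
```

With sequential ids, passing the current number of entries in self as add_id keeps the merged ids contiguous.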
-
virtual void check_compatible_for_merge(const Index &otherIndex) const
Check that the two indexes are compatible (i.e., they are trained in the same way and have the same parameters). Otherwise throw.
-
virtual void add_sa_codes(idx_t n, const uint8_t *codes, const idx_t *xids)
Add vectors that are computed with the standalone codec
- Parameters:
codes – codes to add, size n * sa_code_size()
xids – corresponding ids, size n
Public Members
-
Index *index
sub-index
-
bool own_fields
whether the subindex needs to be freed in the destructor.
-
int d
vector dimension
-
idx_t ntotal
total number of indexed vectors
-
bool verbose
verbosity level
-
bool is_trained
set if the Index does not require training, or if training is done already
-
MetricType metric_type
type of metric this index uses for search
-
float metric_arg
argument of the metric type