Struct faiss::IndexBinaryIVF

struct IndexBinaryIVF : public faiss::IndexBinary

Index based on a inverted file (IVF)

In the inverted file, the quantizer (an IndexBinary instance) provides a quantization index for each vector to be added. The quantization index maps to a list (aka inverted list or posting list), where the id of the vector is stored.

Otherwise the object is similar to the IndexIVF

Public Types

using component_t = uint8_t
using distance_t = int32_t

Public Functions

IndexBinaryIVF(IndexBinary *quantizer, size_t d, size_t nlist)

The Inverted file takes a quantizer (an IndexBinary) on input, which implements the function mapping a vector to a list identifier. The pointer is borrowed: the quantizer should not be deleted while the IndexBinaryIVF is in use.

IndexBinaryIVF()
~IndexBinaryIVF() override
virtual void reset() override

Removes all elements from the database.

virtual void train(idx_t n, const uint8_t *x) override

Trains the quantizer.

virtual void add(idx_t n, const uint8_t *x) override

Add n vectors of dimension d to the index.

Vectors are implicitly assigned labels ntotal .. ntotal + n - 1

Parameters:

x – input matrix, size n * d / 8

virtual void add_with_ids(idx_t n, const uint8_t *x, const idx_t *xids) override

Same as add, but stores xids instead of sequential ids.

The default implementation fails with an assertion, as it is not supported by all indexes.

Parameters:

xids – if non-null, ids to store for the vectors (size n)

void add_core(idx_t n, const uint8_t *x, const idx_t *xids, const idx_t *precomputed_idx)

Implementation of vector addition where the vector assignments are predefined.

Parameters:

precomputed_idx – quantization indices for the input vectors (size n)

void search_preassigned(idx_t n, const uint8_t *x, idx_t k, const idx_t *assign, const int32_t *centroid_dis, int32_t *distances, idx_t *labels, bool store_pairs, const IVFSearchParameters *params = nullptr) const

Search a set of vectors, that are pre-quantized by the IVF quantizer. Fill in the corresponding heaps with the query results. search() calls this.

Parameters:
  • n – nb of vectors to query

  • x – query vectors, size nx * d

  • assign – coarse quantization indices, size nx * nprobe

  • centroid_dis – distances to coarse centroids, size nx * nprobe

  • distance – output distances, size n * k

  • labels – output labels, size n * k

  • store_pairs – store inv list index + inv list offset instead in upper/lower 32 bit of result, instead of ids (used for reranking).

  • params – used to override the object’s search parameters

virtual BinaryInvertedListScanner *get_InvertedListScanner(bool store_pairs = false) const
virtual void search(idx_t n, const uint8_t *x, idx_t k, int32_t *distances, idx_t *labels, const SearchParameters *params = nullptr) const override

assign the vectors, then call search_preassign

virtual void range_search(idx_t n, const uint8_t *x, int radius, RangeSearchResult *result, const SearchParameters *params = nullptr) const override

Query n vectors of dimension d to the index.

return all vectors with distance < radius. Note that many indexes do not implement the range_search (only the k-NN search is mandatory). The distances are converted to float to reuse the RangeSearchResult structure, but they are integer. By convention, only distances < radius (strict comparison) are returned, ie. radius = 0 does not return any result and 1 returns only exact same vectors.

Parameters:
  • x – input vectors to search, size n * d / 8

  • radius – search radius

  • result – result table

void range_search_preassigned(idx_t n, const uint8_t *x, int radius, const idx_t *assign, const int32_t *centroid_dis, RangeSearchResult *result) const
virtual void reconstruct(idx_t key, uint8_t *recons) const override

Reconstruct a stored vector.

This function may not be defined for some indexes.

Parameters:
  • key – id of the vector to reconstruct

  • recons – reconstucted vector (size d / 8)

virtual void reconstruct_n(idx_t i0, idx_t ni, uint8_t *recons) const override

Reconstruct a subset of the indexed vectors.

Overrides default implementation to bypass reconstruct() which requires direct_map to be maintained.

Parameters:
  • i0 – first vector to reconstruct

  • ni – nb of vectors to reconstruct

  • recons – output array of reconstructed vectors, size ni * d / 8

virtual void search_and_reconstruct(idx_t n, const uint8_t *x, idx_t k, int32_t *distances, idx_t *labels, uint8_t *recons, const SearchParameters *params = nullptr) const override

Similar to search, but also reconstructs the stored vectors (or an approximation in the case of lossy coding) for the search results.

Overrides default implementation to avoid having to maintain direct_map and instead fetch the code offsets through the store_pairs flag in search_preassigned().

Parameters:

recons – reconstructed vectors size (n, k, d / 8)

virtual void reconstruct_from_offset(idx_t list_no, idx_t offset, uint8_t *recons) const

Reconstruct a vector given the location in terms of (inv list index + inv list offset) instead of the id.

Useful for reconstructing when the direct_map is not maintained and the inv list offset is computed by search_preassigned() with store_pairs set.

virtual size_t remove_ids(const IDSelector &sel) override

Dataset manipulation functions.

virtual void merge_from(IndexBinary &other, idx_t add_id) override

moves the entries from another dataset to self. On output, other is empty. add_id is added to all moved ids (for sequential ids, this would be this->ntotal)

virtual void check_compatible_for_merge(const IndexBinary &otherIndex) const override

check that the two indexes are compatible (ie, they are trained in the same way and have the same parameters). Otherwise throw.

inline size_t get_list_size(size_t list_no) const
void make_direct_map(bool new_maintain_direct_map = true)

initialize a direct map

Parameters:

new_maintain_direct_map – if true, create a direct map, else clear it

void set_direct_map_type(DirectMap::Type type)
void replace_invlists(InvertedLists *il, bool own = false)
void assign(idx_t n, const uint8_t *x, idx_t *labels, idx_t k = 1) const

Return the indexes of the k vectors closest to the query x.

This function is identical to search but only returns labels of neighbors.

Parameters:
  • x – input vectors to search, size n * d / 8

  • labels – output labels of the NNs, size n*k

void display() const

Display the actual class name and some more info.

virtual size_t sa_code_size() const

size of the produced codes in bytes

virtual void add_sa_codes(idx_t n, const uint8_t *codes, const idx_t *xids)

Same as add_with_ids for IndexBinary.

Public Members

InvertedLists *invlists = nullptr

Access to the actual data.

bool own_invlists = true
size_t nprobe = 1

number of probes at query time

size_t max_codes = 0

max nb of codes to visit to do a query

bool use_heap = true

Select between using a heap or counting to select the k smallest values when scanning inverted lists.

bool per_invlist_search = false

collect computations per batch

DirectMap direct_map

map for direct access to the elements. Enables reconstruct().

IndexBinary *quantizer = nullptr

quantizer that maps vectors to inverted lists

size_t nlist = 0

number of possible key values

bool own_fields = false

whether object owns the quantizer

ClusteringParameters cp

to override default clustering params

Index *clustering_index = nullptr

to override index used during clustering

int d = 0

vector dimension

int code_size = 0

number of bytes per vector ( = d / 8 )

idx_t ntotal = 0

total nb of indexed vectors

bool verbose = false

verbosity level

bool is_trained = true

set if the Index does not require training, or if training is done already

MetricType metric_type = METRIC_L2

type of metric this index uses for search