Struct faiss::IndexBinaryIVF
-
struct IndexBinaryIVF : public faiss::IndexBinary
Index based on an inverted file (IVF).
In the inverted file, the quantizer (an IndexBinary instance) provides a quantization index for each vector to be added. The quantization index maps to a list (also called an inverted list or posting list), where the id of the vector is stored.
Otherwise the object is similar to IndexIVF.
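The mapping from vector to quantization index to inverted list can be illustrated with a small standalone model. This is a minimal sketch using only the standard library, not the faiss implementation: ToyBinaryIVF, hamming, quantize and the fixed centroids are hypothetical names introduced for illustration, and the sketch assumes the quantizer picks the centroid nearest in Hamming distance.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <utility>
#include <vector>

// Hamming distance between two packed binary vectors of code_size bytes.
inline int hamming(const uint8_t* a, const uint8_t* b, size_t code_size) {
    int dist = 0;
    for (size_t i = 0; i < code_size; i++) {
        dist += static_cast<int>(
                std::bitset<8>(static_cast<uint8_t>(a[i] ^ b[i])).count());
    }
    return dist;
}

// Toy inverted file (hypothetical, for illustration only).
struct ToyBinaryIVF {
    size_t code_size;                             // bytes per vector (d / 8)
    std::vector<std::vector<uint8_t>> centroids;  // one packed centroid per list
    std::vector<std::vector<int64_t>> lists;      // ids stored in each inverted list
    int64_t ntotal = 0;                           // number of vectors added so far

    ToyBinaryIVF(size_t code_size_, std::vector<std::vector<uint8_t>> c)
            : code_size(code_size_),
              centroids(std::move(c)),
              lists(centroids.size()) {}

    // Quantization index: the list whose centroid is nearest in Hamming distance.
    size_t quantize(const uint8_t* x) const {
        assert(!centroids.empty());
        size_t best = 0;
        int best_dist = std::numeric_limits<int>::max();
        for (size_t i = 0; i < centroids.size(); i++) {
            int dist = hamming(x, centroids[i].data(), code_size);
            if (dist < best_dist) {
                best_dist = dist;
                best = i;
            }
        }
        return best;
    }

    // Store the next sequential id in the list chosen by the quantizer
    // and return that list number.
    size_t add(const uint8_t* x) {
        size_t list_no = quantize(x);
        lists[list_no].push_back(ntotal++);
        return list_no;
    }
};
```

With two one-byte centroids 0x00 and 0xFF, adding 0x01 puts id 0 in list 0 and adding 0xFE puts id 1 in list 1.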
Public Types
-
using component_t = uint8_t
-
using distance_t = int32_t
Public Functions
-
IndexBinaryIVF(IndexBinary *quantizer, size_t d, size_t nlist)
The inverted file takes a quantizer (an IndexBinary) as input, which implements the function mapping a vector to a list identifier. The pointer is borrowed: the quantizer should not be deleted while the IndexBinaryIVF is in use.
-
IndexBinaryIVF()
-
~IndexBinaryIVF() override
-
virtual void reset() override
Removes all elements from the database.
-
virtual void train(idx_t n, const uint8_t *x) override
Trains the quantizer.
-
virtual void add(idx_t n, const uint8_t *x) override
Add n vectors of dimension d to the index.
Vectors are implicitly assigned the labels ntotal .. ntotal + n - 1.
- Parameters:
x – input matrix, size n * d / 8
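The n * d / 8 size follows from storing each d-dimensional binary vector as d / 8 packed bytes. The sketch below shows one way to size and fill such a buffer. The helper names packed_bytes and pack_bits are hypothetical, and the bit order within a byte (bit j goes to byte j / 8, position j % 8) is an assumption made for illustration; Hamming distances do not depend on the order as long as it is applied consistently.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Number of bytes needed to hold n packed binary vectors of dimension d.
inline size_t packed_bytes(size_t n, size_t d) {
    assert(d % 8 == 0);  // the d / 8 layout assumes d is a multiple of 8
    return n * (d / 8);
}

// Pack one vector of d bits into d / 8 bytes.
// Assumed order: bit j is stored in byte j / 8 at bit position j % 8.
inline std::vector<uint8_t> pack_bits(const std::vector<bool>& bits) {
    std::vector<uint8_t> out(packed_bytes(1, bits.size()), 0);
    for (size_t j = 0; j < bits.size(); j++) {
        if (bits[j]) {
            out[j / 8] |= static_cast<uint8_t>(1u << (j % 8));
        }
    }
    return out;
}
```

For example, 1000 vectors of dimension 256 need a buffer of 32000 bytes.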
-
virtual void add_with_ids(idx_t n, const uint8_t *x, const idx_t *xids) override
Same as add, but stores xids instead of sequential ids.
The default implementation fails with an assertion, as it is not supported by all indexes.
- Parameters:
xids – if non-null, ids to store for the vectors (size n)
-
void add_core(idx_t n, const uint8_t *x, const idx_t *xids, const idx_t *precomputed_idx)
Implementation of vector addition where the vector assignments are predefined.
- Parameters:
precomputed_idx – quantization indices for the input vectors (size n)
-
void search_preassigned(idx_t n, const uint8_t *x, idx_t k, const idx_t *assign, const int32_t *centroid_dis, int32_t *distances, idx_t *labels, bool store_pairs, const IVFSearchParameters *params = nullptr) const
Search a set of vectors that are pre-quantized by the IVF quantizer. Fills in the corresponding heaps with the query results. search() calls this function.
- Parameters:
n – nb of vectors to query
x – query vectors, size n * d / 8
assign – coarse quantization indices, size n * nprobe
centroid_dis – distances to coarse centroids, size n * nprobe
distances – output distances, size n * k
labels – output labels, size n * k
store_pairs – store the inverted list index and the offset within that list in the upper/lower 32 bits of each result label, instead of ids (used for reranking).
params – used to override the object’s search parameters
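The store_pairs layout described above (list index in the upper 32 bits, offset in the lower 32 bits of a 64-bit label) can be sketched as follows. The function names pack_pair, pair_list_no and pair_offset are hypothetical helpers written for this illustration and are not presented as the library's own API.

```cpp
#include <cassert>
#include <cstdint>

// Pack (list_no, offset) into one 64-bit label:
// list number in the upper 32 bits, offset in the lower 32 bits.
inline int64_t pack_pair(uint64_t list_no, uint64_t offset) {
    // Keep the result non-negative and the offset within 32 bits.
    assert(list_no <= 0x7fffffffULL && offset <= 0xffffffffULL);
    return static_cast<int64_t>((list_no << 32) | offset);
}

// Recover the inverted list number from a packed label.
inline uint64_t pair_list_no(int64_t label) {
    return static_cast<uint64_t>(label) >> 32;
}

// Recover the offset within the inverted list from a packed label.
inline uint64_t pair_offset(int64_t label) {
    return static_cast<uint64_t>(label) & 0xffffffffULL;
}
```

A label decoded this way gives the (list_no, offset) pair that reconstruct_from_offset() takes as arguments.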
-
virtual BinaryInvertedListScanner *get_InvertedListScanner(bool store_pairs = false) const
-
virtual void search(idx_t n, const uint8_t *x, idx_t k, int32_t *distances, idx_t *labels, const SearchParameters *params = nullptr) const override
Assign the vectors, then call search_preassigned().
-
virtual void range_search(idx_t n, const uint8_t *x, int radius, RangeSearchResult *result, const SearchParameters *params = nullptr) const override
Query n vectors of dimension d against the index.
Returns all vectors with distance < radius. Note that many indexes do not implement range_search (only the k-NN search is mandatory). The distances are converted to float to reuse the RangeSearchResult structure, but they are integers. By convention, only distances < radius (strict comparison) are returned, i.e. radius = 0 does not return any result and radius = 1 returns only exactly matching vectors.
- Parameters:
x – input vectors to search, size n * d / 8
radius – search radius
result – result table
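The strict-comparison convention can be checked with a small exhaustive scan. This is a sketch under stated assumptions, not the library's range search: brute_force_range and hamming_distance are hypothetical helpers, and the scan visits every stored vector, whereas an IVF index only visits the lists selected by the quantizer.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hamming distance between two packed binary vectors of code_size bytes.
inline int hamming_distance(const uint8_t* a, const uint8_t* b, size_t code_size) {
    int dist = 0;
    for (size_t i = 0; i < code_size; i++) {
        dist += static_cast<int>(
                std::bitset<8>(static_cast<uint8_t>(a[i] ^ b[i])).count());
    }
    return dist;
}

// Return (id, distance) for every database vector with distance < radius.
// The comparison is strict, and distances are reported as float.
inline std::vector<std::pair<int64_t, float>> brute_force_range(
        const std::vector<uint8_t>& db,
        size_t code_size,
        const uint8_t* query,
        int radius) {
    assert(code_size > 0);
    std::vector<std::pair<int64_t, float>> hits;
    size_t n = db.size() / code_size;
    for (size_t i = 0; i < n; i++) {
        int dist = hamming_distance(query, db.data() + i * code_size, code_size);
        if (dist < radius) {
            hits.emplace_back(static_cast<int64_t>(i), static_cast<float>(dist));
        }
    }
    return hits;
}
```

With one-byte vectors 0x0F, 0x0E and 0xF0 and the query 0x0F, radius 0 gives no hit, radius 1 gives only the exact match (id 0), and radius 2 also returns id 1 at distance 1.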
-
void range_search_preassigned(idx_t n, const uint8_t *x, int radius, const idx_t *assign, const int32_t *centroid_dis, RangeSearchResult *result) const
-
virtual void reconstruct(idx_t key, uint8_t *recons) const override
Reconstruct a stored vector.
This function may not be defined for some indexes.
- Parameters:
key – id of the vector to reconstruct
recons – reconstructed vector (size d / 8)
-
virtual void reconstruct_n(idx_t i0, idx_t ni, uint8_t *recons) const override
Reconstruct a subset of the indexed vectors.
Overrides default implementation to bypass reconstruct() which requires direct_map to be maintained.
- Parameters:
i0 – first vector to reconstruct
ni – nb of vectors to reconstruct
recons – output array of reconstructed vectors, size ni * d / 8
-
virtual void search_and_reconstruct(idx_t n, const uint8_t *x, idx_t k, int32_t *distances, idx_t *labels, uint8_t *recons, const SearchParameters *params = nullptr) const override
Similar to search, but also reconstructs the stored vectors (or an approximation in the case of lossy coding) for the search results.
Overrides default implementation to avoid having to maintain direct_map and instead fetch the code offsets through the store_pairs flag in search_preassigned().
- Parameters:
recons – reconstructed vectors size (n, k, d / 8)
-
virtual void reconstruct_from_offset(idx_t list_no, idx_t offset, uint8_t *recons) const
Reconstruct a vector given the location in terms of (inv list index + inv list offset) instead of the id.
Useful for reconstructing when the direct_map is not maintained and the inv list offset is computed by search_preassigned() with store_pairs set.
-
virtual size_t remove_ids(const IDSelector &sel) override
Dataset manipulation functions. Removes the vectors whose ids are selected by sel and returns the number of elements removed.
-
virtual void merge_from(IndexBinary &other, idx_t add_id) override
Moves the entries from another dataset to self. On output, other is empty. add_id is added to all moved ids (for sequential ids, this would be this->ntotal).
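The effect of add_id on the moved ids can be illustrated on plain id arrays. This is a simplified sketch with a hypothetical helper merge_ids; the real merge operates on inverted lists, which this example does not model.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Move all ids from src to dst, adding add_id to each one.
// src is left empty, mirroring "On output, other is empty".
inline void merge_ids(
        std::vector<int64_t>& dst,
        std::vector<int64_t>& src,
        int64_t add_id) {
    assert(&dst != &src);  // merging a container into itself is not meaningful
    for (int64_t id : src) {
        dst.push_back(id + add_id);
    }
    src.clear();
}
```

If dst holds the sequential ids 0, 1, 2 and src holds 0, 1, merging with add_id = 3 (the ntotal of dst) leaves dst with 0, 1, 2, 3, 4, so the ids stay sequential and unique.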
-
virtual void check_compatible_for_merge(const IndexBinary &otherIndex) const override
Check that the two indexes are compatible (i.e., they are trained in the same way and have the same parameters). Otherwise throw.
-
inline size_t get_list_size(size_t list_no) const
-
void make_direct_map(bool new_maintain_direct_map = true)
Initialize a direct map.
- Parameters:
new_maintain_direct_map – if true, create a direct map, else clear it
-
void replace_invlists(InvertedLists *il, bool own = false)
-
void assign(idx_t n, const uint8_t *x, idx_t *labels, idx_t k = 1) const
Return the indexes of the k vectors closest to the query x.
This function is identical to search but only returns labels of neighbors.
- Parameters:
x – input vectors to search, size n * d / 8
labels – output labels of the NNs, size n*k
-
void display() const
Display the actual class name and some more info.
-
virtual size_t sa_code_size() const
size of the produced codes in bytes
-
virtual void add_sa_codes(idx_t n, const uint8_t *codes, const idx_t *xids)
Same as add_with_ids for IndexBinary.
Public Members
-
InvertedLists *invlists = nullptr
Access to the actual data.
-
bool own_invlists = true
-
size_t nprobe = 1
number of probes at query time
-
size_t max_codes = 0
maximum number of codes to visit per query
-
bool use_heap = true
Select between using a heap or counting to select the k smallest values when scanning inverted lists.
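Counting is an option here because Hamming distances are bounded integers between 0 and d. The sketch below shows one generic counting-style selection of the k smallest distances. It is an illustration of the idea only, not the library's implementation, and k_smallest_by_counting is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Return the positions of the k smallest distances, assuming every distance
// lies in [0, d]. Positions are grouped into one bucket per distance value,
// then buckets are read in increasing distance order until k are collected.
inline std::vector<size_t> k_smallest_by_counting(
        const std::vector<int32_t>& dist,
        size_t k,
        int32_t d) {
    std::vector<std::vector<size_t>> buckets(static_cast<size_t>(d) + 1);
    for (size_t i = 0; i < dist.size(); i++) {
        assert(dist[i] >= 0 && dist[i] <= d);
        buckets[static_cast<size_t>(dist[i])].push_back(i);
    }
    std::vector<size_t> out;
    for (size_t b = 0; b < buckets.size() && out.size() < k; b++) {
        for (size_t id : buckets[b]) {
            if (out.size() == k) {
                break;
            }
            out.push_back(id);
        }
    }
    return out;
}
```

For the distances 5, 1, 3, 1, 8 with k = 3 and d = 8, the selected positions are 1, 3 and 2. No comparison-based heap is needed because the number of buckets is fixed by d.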
-
bool per_invlist_search = false
collect computations per batch
-
DirectMap direct_map
map for direct access to the elements. Enables reconstruct().
-
IndexBinary *quantizer = nullptr
quantizer that maps vectors to inverted lists
-
size_t nlist = 0
number of possible key values
-
bool own_fields = false
whether object owns the quantizer
-
ClusteringParameters cp
to override default clustering params
-
Index *clustering_index = nullptr
to override index used during clustering
-
int d = 0
vector dimension
-
int code_size = 0
number of bytes per vector ( = d / 8 )
-
idx_t ntotal = 0
total nb of indexed vectors
-
bool verbose = false
verbosity level
-
bool is_trained = true
set if the Index does not require training, or if training is done already
-
MetricType metric_type = METRIC_L2
type of metric this index uses for search