Namespace faiss
-
namespace faiss
Copyright (c) Facebook, Inc. and its affiliates.
This source code is licensed under the MIT license found in the LICENSE file in the root directory of this source tree.
Throughout the library, vectors are provided as float * pointers. Most algorithms can be optimized when several vectors are processed (added/searched) together in a batch. In this case, they are passed in as a matrix. When n vectors of size d are provided as float * x, component j of vector i is
x[ i * d + j ]
where 0 <= i < n and 0 <= j < d. In other words, matrices are always compact. When specifying the size of the matrix, we call it an n*d matrix, which implies a row-major storage.
Copyright (c) Facebook, Inc. and its affiliates.
This source code is licensed under the MIT license found in the LICENSE file in the root directory of this source tree. I/O functions can read/write to a filename, a file handle or to an object that abstracts the medium.
The read functions return objects that should be deallocated with delete. All references within these objectes are owned by the object.
Copyright (c) Facebook, Inc. and its affiliates.
This source code is licensed under the MIT license found in the LICENSE file in the root directory of this source tree. Definition of inverted lists + a few common classes that implement the interface.
Copyright (c) Facebook, Inc. and its affiliates.
This source code is licensed under the MIT license found in the LICENSE file in the root directory of this source tree. Since IVF (inverted file) indexes are of so much use for large-scale use cases, we group a few functions related to them in this small library. Most functions work both on IndexIVFs and IndexIVFs embedded within an IndexPreTransform.
Copyright (c) Facebook, Inc. and its affiliates.
This source code is licensed under the MIT license found in the LICENSE file in the root directory of this source tree. In this file are the implementations of extra metrics beyond L2 and inner product
Copyright (c) Facebook, Inc. and its affiliates.
This source code is licensed under the MIT license found in the LICENSE file in the root directory of this source tree. Defines a few objects that apply transformations to a set of vectors Often these are pre-processing steps.
Typedefs
-
using IndexReplicas = IndexReplicasTemplate<Index>
-
using IndexBinaryReplicas = IndexReplicasTemplate<IndexBinary>
-
using IndexShards = IndexShardsTemplate<Index>
-
using IndexBinaryShards = IndexShardsTemplate<IndexBinary>
-
using ConcatenatedInvertedLists = HStackInvertedLists
-
using IndexIDMap = IndexIDMapTemplate<Index>
-
using IndexBinaryIDMap = IndexIDMapTemplate<IndexBinary>
-
using IndexIDMap2 = IndexIDMap2Template<Index>
-
using IndexBinaryIDMap2 = IndexIDMap2Template<IndexBinary>
Enums
-
enum MetricType
The metric space for vector comparison for Faiss indices and algorithms.
Most algorithms support both inner product and L2, with the flat (brute-force) indices supporting additional metric types for vector comparison.
Values:
-
enumerator METRIC_INNER_PRODUCT
maximum inner product search
-
enumerator METRIC_L2
squared L2 search
-
enumerator METRIC_L1
L1 (aka cityblock)
-
enumerator METRIC_Linf
infinity distance
-
enumerator METRIC_Lp
L_p distance, p is given by a faiss::Index metric_arg
-
enumerator METRIC_Canberra
some additional metrics defined in scipy.spatial.distance
-
enumerator METRIC_BrayCurtis
-
enumerator METRIC_JensenShannon
-
enumerator METRIC_INNER_PRODUCT
Functions
-
Quantizer *clone_Quantizer(const Quantizer *quant)
-
float kmeans_clustering(size_t d, size_t n, size_t k, const float *x, float *centroids)
simplified interface
- Parameters
d – dimension of the data
n – nb of training vectors
k – nb of output centroids
x – training set (size n * d)
centroids – output centroids (size k * d)
- Returns
final quantization error
-
Index *index_factory(int d, const char *description, MetricType metric = METRIC_L2)
Build and index with the sequence of processing steps described in the string.
-
IndexBinary *index_binary_factory(int d, const char *description)
-
void write_index(const Index *idx, const char *fname)
-
void write_index(const Index *idx, FILE *f)
-
void write_index(const Index *idx, IOWriter *writer)
-
void write_index_binary(const IndexBinary *idx, const char *fname)
-
void write_index_binary(const IndexBinary *idx, FILE *f)
-
void write_index_binary(const IndexBinary *idx, IOWriter *writer)
-
Index *read_index(const char *fname, int io_flags = 0)
-
Index *read_index(FILE *f, int io_flags = 0)
-
Index *read_index(IOReader *reader, int io_flags = 0)
-
IndexBinary *read_index_binary(const char *fname, int io_flags = 0)
-
IndexBinary *read_index_binary(FILE *f, int io_flags = 0)
-
IndexBinary *read_index_binary(IOReader *reader, int io_flags = 0)
-
void write_VectorTransform(const VectorTransform *vt, const char *fname)
-
VectorTransform *read_VectorTransform(const char *fname)
-
ProductQuantizer *read_ProductQuantizer(const char *fname)
-
ProductQuantizer *read_ProductQuantizer(IOReader *reader)
-
void write_ProductQuantizer(const ProductQuantizer *pq, const char *fname)
-
void write_ProductQuantizer(const ProductQuantizer *pq, IOWriter *f)
-
void write_InvertedLists(const InvertedLists *ils, IOWriter *f)
-
InvertedLists *read_InvertedLists(IOReader *reader, int io_flags = 0)
-
void initialize_IVFPQ_precomputed_table(int &use_precomputed_table, const Index *quantizer, const ProductQuantizer &pq, AlignedTable<float> &precomputed_table, bool verbose)
Pre-compute distance tables for IVFPQ with by-residual and METRIC_L2
- Parameters
use_precomputed_table – (I/O) =-1: force disable =0: decide heuristically (default: use tables only if they are < precomputed_tables_max_bytes), set use_precomputed_table on output =1: tables that work for all quantizers (size 256 * nlist * M) =2: specific version for MultiIndexQuantizer (much more compact)
precomputed_table – precomputed table to initialize
-
inline uint64_t lo_build(uint64_t list_id, uint64_t offset)
-
inline uint64_t lo_listno(uint64_t lo)
-
inline uint64_t lo_offset(uint64_t lo)
-
template<int A = 32>
inline bool is_aligned_pointer(const void *x)
-
float fvec_L2sqr(const float *x, const float *y, size_t d)
Squared L2 distance between two vectors.
-
float fvec_inner_product(const float *x, const float *y, size_t d)
inner product
-
float fvec_L1(const float *x, const float *y, size_t d)
L1 distance.
-
float fvec_Linf(const float *x, const float *y, size_t d)
infinity distance
-
void pairwise_L2sqr(int64_t d, int64_t nq, const float *xq, int64_t nb, const float *xb, float *dis, int64_t ldq = -1, int64_t ldb = -1, int64_t ldd = -1)
Compute pairwise distances between sets of vectors
- Parameters
d – dimension of the vectors
nq – nb of query vectors
nb – nb of database vectors
xq – query vectors (size nq * d)
xb – database vectors (size nb * d)
dis – output distances (size nq * nb)
ldq, ldb, ldd – strides for the matrices
-
void fvec_inner_products_ny(float *ip, const float *x, const float *y, size_t d, size_t ny)
-
void fvec_L2sqr_ny(float *dis, const float *x, const float *y, size_t d, size_t ny)
-
size_t fvec_L2sqr_ny_nearest(float *distances_tmp_buffer, const float *x, const float *y, size_t d, size_t ny)
-
float fvec_norm_L2sqr(const float *x, size_t d)
squared norm of a vector
-
void fvec_norms_L2(float *norms, const float *x, size_t d, size_t nx)
compute the L2 norms for a set of vectors
- Parameters
norms – output norms, size nx
x – set of vectors, size nx * d
-
void fvec_norms_L2sqr(float *norms, const float *x, size_t d, size_t nx)
same as fvec_norms_L2, but computes squared norms
-
void fvec_renorm_L2(size_t d, size_t nx, float *x)
-
void inner_product_to_L2sqr(float *dis, const float *nr1, const float *nr2, size_t n1, size_t n2)
-
void fvec_add(size_t d, const float *a, const float *b, float *c)
compute c := a + b for vectors
c and a can overlap, c and b can overlap
- Parameters
a – size d
b – size d
c – size d
-
void fvec_add(size_t d, const float *a, float b, float *c)
compute c := a + b for a, c vectors and b a scalar
c and a can overlap
- Parameters
a – size d
c – size d
-
void fvec_sub(size_t d, const float *a, const float *b, float *c)
compute c := a - b for vectors
c and a can overlap, c and b can overlap
- Parameters
a – size d
b – size d
c – size d
-
void fvec_inner_products_by_idx(float *ip, const float *x, const float *y, const int64_t *ids, size_t d, size_t nx, size_t ny)
-
void fvec_L2sqr_by_idx(float *dis, const float *x, const float *y, const int64_t *ids, size_t d, size_t nx, size_t ny)
-
void pairwise_indexed_L2sqr(size_t d, size_t n, const float *x, const int64_t *ix, const float *y, const int64_t *iy, float *dis)
compute dis[j] = L2sqr(x[ix[j]], y[iy[j]]) forall j=0..n-1
- Parameters
x – size (max(ix) + 1, d)
y – size (max(iy) + 1, d)
ix – size n
iy – size n
dis – size n
-
void pairwise_indexed_inner_product(size_t d, size_t n, const float *x, const int64_t *ix, const float *y, const int64_t *iy, float *dis)
-
void knn_inner_product(const float *x, const float *y, size_t d, size_t nx, size_t ny, float_minheap_array_t *res)
Return the k nearest neighors of each of the nx vectors x among the ny vector y, w.r.t to max inner product.
- Parameters
x – query vectors, size nx * d
y – database vectors, size ny * d
res – result heap structure, which also provides k. Sorted on output
-
void knn_inner_product(const float *x, const float *y, size_t d, size_t nx, size_t ny, size_t k, float *distances, int64_t *indexes)
Return the k nearest neighors of each of the nx vectors x among the ny vector y, for the inner product metric.
- Parameters
x – query vectors, size nx * d
y – database vectors, size ny * d
distances – output distances, size nq * k
indexes – output vector ids, size nq * k
-
void knn_L2sqr(const float *x, const float *y, size_t d, size_t nx, size_t ny, float_maxheap_array_t *res, const float *y_norm2 = nullptr)
Return the k nearest neighors of each of the nx vectors x among the ny vector y, for the L2 distance
- Parameters
x – query vectors, size nx * d
y – database vectors, size ny * d
res – result heap strcture, which also provides k. Sorted on output
y_norm2 – (optional) norms for the y vectors (nullptr or size ny)
-
void knn_L2sqr(const float *x, const float *y, size_t d, size_t nx, size_t ny, size_t k, float *distances, int64_t *indexes, const float *y_norm2 = nullptr)
Return the k nearest neighors of each of the nx vectors x among the ny vector y, for the L2 distance
- Parameters
x – query vectors, size nx * d
y – database vectors, size ny * d
distances – output distances, size nq * k
indexes – output vector ids, size nq * k
y_norm2 – (optional) norms for the y vectors (nullptr or size ny)
-
void knn_inner_products_by_idx(const float *x, const float *y, const int64_t *ids, size_t d, size_t nx, size_t ny, float_minheap_array_t *res)
-
void knn_L2sqr_by_idx(const float *x, const float *y, const int64_t *ids, size_t d, size_t nx, size_t ny, float_maxheap_array_t *res)
-
void range_search_L2sqr(const float *x, const float *y, size_t d, size_t nx, size_t ny, float radius, RangeSearchResult *result)
Return the k nearest neighors of each of the nx vectors x among the ny vector y, w.r.t to max inner product
- Parameters
x – query vectors, size nx * d
y – database vectors, size ny * d
radius – search radius around the x vectors
result – result structure
-
void range_search_inner_product(const float *x, const float *y, size_t d, size_t nx, size_t ny, float radius, RangeSearchResult *result)
same as range_search_L2sqr for the inner product similarity
-
void compute_PQ_dis_tables_dsub2(size_t d, size_t ksub, const float *centroids, size_t nx, const float *x, bool is_inner_product, float *dis_tables)
specialized function for PQ2
-
void pairwise_extra_distances(int64_t d, int64_t nq, const float *xq, int64_t nb, const float *xb, MetricType mt, float metric_arg, float *dis, int64_t ldq = -1, int64_t ldb = -1, int64_t ldd = -1)
-
void knn_extra_metrics(const float *x, const float *y, size_t d, size_t nx, size_t ny, MetricType mt, float metric_arg, float_maxheap_array_t *res)
-
FlatCodesDistanceComputer *get_extra_distance_computer(size_t d, MetricType mt, float metric_arg, size_t nb, const float *xb)
get a DistanceComputer that refers to this type of distance and indexes a flat array of size nb
- SPECIALIZED_HC (4)
- SPECIALIZED_HC (8)
- SPECIALIZED_HC (16)
- SPECIALIZED_HC (20)
- SPECIALIZED_HC (32)
- SPECIALIZED_HC (64)
-
inline int generalized_hamming_64(uint64_t a)
-
void generalized_hammings_knn_hc(int_maxheap_array_t *ha, const uint8_t *a, const uint8_t *b, size_t nb, size_t code_size, int ordered = true)
generalized Hamming distances (= count number of code bytes that are the same)
-
void bitvec_print(const uint8_t *b, size_t d)
-
void fvecs2bitvecs(const float *x, uint8_t *b, size_t d, size_t n)
-
void bitvecs2fvecs(const uint8_t *b, float *x, size_t d, size_t n)
-
void fvec2bitvec(const float *x, uint8_t *b, size_t d)
-
void bitvec_shuffle(size_t n, size_t da, size_t db, const int *order, const uint8_t *a, uint8_t *b)
Shuffle the bits from b(i, j) := a(i, order[j])
-
inline int popcount64(uint64_t x)
-
void hammings(const uint8_t *a, const uint8_t *b, size_t na, size_t nb, size_t nbytespercode, hamdis_t *dis)
Compute a set of Hamming distances between na and nb binary vectors
- Parameters
a – size na * nbytespercode
b – size nb * nbytespercode
nbytespercode – should be multiple of 8
dis – output distances, size na * nb
-
void hammings_knn_hc(int_maxheap_array_t *ha, const uint8_t *a, const uint8_t *b, size_t nb, size_t ncodes, int ordered)
Return the k smallest Hamming distances for a set of binary query vectors, using a max heap.
- Parameters
a – queries, size ha->nh * ncodes
b – database, size nb * ncodes
nb – number of database vectors
ncodes – size of the binary codes (bytes)
ordered – if != 0: order the results by decreasing distance (may be bottleneck for k/n > 0.01)
-
void hammings_knn(int_maxheap_array_t *ha, const uint8_t *a, const uint8_t *b, size_t nb, size_t ncodes, int ordered)
-
void hammings_knn_mc(const uint8_t *a, const uint8_t *b, size_t na, size_t nb, size_t k, size_t ncodes, int32_t *distances, int64_t *labels)
Return the k smallest Hamming distances for a set of binary query vectors, using counting max.
- Parameters
a – queries, size na * ncodes
b – database, size nb * ncodes
na – number of query vectors
nb – number of database vectors
k – number of vectors/distances to return
ncodes – size of the binary codes (bytes)
distances – output distances from each query vector to its k nearest neighbors
labels – output ids of the k nearest neighbors to each query vector
-
void hamming_range_search(const uint8_t *a, const uint8_t *b, size_t na, size_t nb, int radius, size_t ncodes, RangeSearchResult *result)
same as hammings_knn except we are doing a range search with radius
-
void hamming_count_thres(const uint8_t *bs1, const uint8_t *bs2, size_t n1, size_t n2, hamdis_t ht, size_t ncodes, size_t *nptr)
-
size_t match_hamming_thres(const uint8_t *bs1, const uint8_t *bs2, size_t n1, size_t n2, hamdis_t ht, size_t ncodes, int64_t *idx, hamdis_t *dis)
-
void crosshamming_count_thres(const uint8_t *dbs, size_t n, hamdis_t ht, size_t ncodes, size_t *nptr)
-
hamdis_t hamming(const uint64_t *bs1, const uint64_t *bs2, size_t nwords)
-
template<class C>
inline void heap_pop(size_t k, typename C::T *bh_val, typename C::TI *bh_ids) Pops the top element from the heap defined by bh_val[0..k-1] and bh_ids[0..k-1]. on output the element at k-1 is undefined.
-
template<class C>
inline void heap_push(size_t k, typename C::T *bh_val, typename C::TI *bh_ids, typename C::T val, typename C::TI id) Pushes the element (val, ids) into the heap bh_val[0..k-2] and bh_ids[0..k-2]. on output the element at k-1 is defined.
-
template<class C>
inline void heap_replace_top(size_t k, typename C::T *bh_val, typename C::TI *bh_ids, typename C::T val, typename C::TI id) Replaces the top element from the heap defined by bh_val[0..k-1] and bh_ids[0..k-1], and for identical bh_val[] values also sorts by bh_ids[] values.
-
template<typename T>
inline void minheap_pop(size_t k, T *bh_val, int64_t *bh_ids)
-
template<typename T>
inline void minheap_push(size_t k, T *bh_val, int64_t *bh_ids, T val, int64_t ids)
-
template<typename T>
inline void minheap_replace_top(size_t k, T *bh_val, int64_t *bh_ids, T val, int64_t ids)
-
template<typename T>
inline void maxheap_pop(size_t k, T *bh_val, int64_t *bh_ids)
-
template<typename T>
inline void maxheap_push(size_t k, T *bh_val, int64_t *bh_ids, T val, int64_t ids)
-
template<typename T>
inline void maxheap_replace_top(size_t k, T *bh_val, int64_t *bh_ids, T val, int64_t ids)
-
template<class C>
inline void heap_heapify(size_t k, typename C::T *bh_val, typename C::TI *bh_ids, const typename C::T *x = nullptr, const typename C::TI *ids = nullptr, size_t k0 = 0)
-
template<typename T>
inline void minheap_heapify(size_t k, T *bh_val, int64_t *bh_ids, const T *x = nullptr, const int64_t *ids = nullptr, size_t k0 = 0)
-
template<typename T>
inline void maxheap_heapify(size_t k, T *bh_val, int64_t *bh_ids, const T *x = nullptr, const int64_t *ids = nullptr, size_t k0 = 0)
-
template<class C>
inline void heap_addn(size_t k, typename C::T *bh_val, typename C::TI *bh_ids, const typename C::T *x, const typename C::TI *ids, size_t n)
-
template<typename T>
inline void minheap_addn(size_t k, T *bh_val, int64_t *bh_ids, const T *x, const int64_t *ids, size_t n)
-
template<typename T>
inline void maxheap_addn(size_t k, T *bh_val, int64_t *bh_ids, const T *x, const int64_t *ids, size_t n)
-
template<typename C>
inline size_t heap_reorder(size_t k, typename C::T *bh_val, typename C::TI *bh_ids)
-
template<typename T>
inline size_t minheap_reorder(size_t k, T *bh_val, int64_t *bh_ids)
-
template<typename T>
inline size_t maxheap_reorder(size_t k, T *bh_val, int64_t *bh_ids)
-
template<class C>
inline void indirect_heap_pop(size_t k, const typename C::T *bh_val, typename C::TI *bh_ids)
-
template<class C>
inline void indirect_heap_push(size_t k, const typename C::T *bh_val, typename C::TI *bh_ids, typename C::TI id)
-
template<>
inline float cmin_nextafter<float>(float x)
-
template<>
inline float cmax_nextafter<float>(float x)
-
template<>
inline uint16_t cmin_nextafter<uint16_t>(uint16_t x)
-
template<>
inline uint16_t cmax_nextafter<uint16_t>(uint16_t x)
-
template<class C>
C::T partition_fuzzy(typename C::T *vals, typename C::TI *ids, size_t n, size_t q_min, size_t q_max, size_t *q_out) partitions the table into 0:q and q:n where all elements above q are >= all elements below q (for C = CMax, for CMin comparisons are reversed)
Returns the partition threshold. The elements q:n are destroyed on output.
-
template<class C>
inline C::T partition(typename C::T *vals, typename C::TI *ids, size_t n, size_t q) simplified interface for when the parition is not fuzzy
-
void simd_histogram_8(const uint16_t *data, int n, uint16_t min, int shift, int *hist)
low level SIMD histogramming functions 8-bin histogram of (x - min) >> shift values outside the range are ignored. the data table should be aligned on 32 bytes
-
void simd_histogram_16(const uint16_t *data, int n, uint16_t min, int shift, int *hist)
same for 16-bin histogram
-
void float_rand(float *x, size_t n, int64_t seed)
-
void float_randn(float *x, size_t n, int64_t seed)
-
void int64_rand(int64_t *x, size_t n, int64_t seed)
-
void byte_rand(uint8_t *x, size_t n, int64_t seed)
-
void int64_rand_max(int64_t *x, size_t n, uint64_t max, int64_t seed)
-
void rand_perm(int *perm, size_t n, int64_t seed)
-
void rand_smooth_vectors(size_t n, size_t d, float *x, int64_t seed)
-
inline simd16uint16 min(simd16uint16 a, simd16uint16 b)
-
inline simd16uint16 max(simd16uint16 a, simd16uint16 b)
-
inline simd16uint16 combine2x2(simd16uint16 a, simd16uint16 b)
-
inline uint32_t cmp_ge32(simd16uint16 d0, simd16uint16 d1, simd16uint16 thr)
-
inline uint32_t cmp_le32(simd16uint16 d0, simd16uint16 d1, simd16uint16 thr)
-
inline simd32uint8 uint16_to_uint8_saturate(simd16uint16 a, simd16uint16 b)
-
inline uint32_t get_MSBs(simd32uint8 a)
get most significant bit of each byte
-
inline simd32uint8 blendv(simd32uint8 a, simd32uint8 b, simd32uint8 mask)
use MSB of each byte of mask to select a byte between a and b
-
inline simd8float32 hadd(simd8float32 a, simd8float32 b)
-
inline simd8float32 unpacklo(simd8float32 a, simd8float32 b)
-
inline simd8float32 unpackhi(simd8float32 a, simd8float32 b)
-
inline simd8float32 fmadd(simd8float32 a, simd8float32 b, simd8float32 c)
-
inline simd16uint16 min(const simd16uint16 &av, const simd16uint16 &bv)
-
inline simd16uint16 max(const simd16uint16 &av, const simd16uint16 &bv)
-
inline simd16uint16 combine2x2(const simd16uint16 &a, const simd16uint16 &b)
-
inline uint32_t cmp_ge32(const simd16uint16 &d0, const simd16uint16 &d1, const simd16uint16 &thr)
-
inline uint32_t cmp_le32(const simd16uint16 &d0, const simd16uint16 &d1, const simd16uint16 &thr)
-
inline simd32uint8 uint16_to_uint8_saturate(const simd16uint16 &a, const simd16uint16 &b)
-
inline uint32_t get_MSBs(const simd32uint8 &a)
get most significant bit of each byte
-
inline simd32uint8 blendv(const simd32uint8 &a, const simd32uint8 &b, const simd32uint8 &mask)
use MSB of each byte of mask to select a byte between a and b
-
inline simd8float32 hadd(const simd8float32 &a, const simd8float32 &b)
-
inline simd8float32 unpacklo(const simd8float32 &a, const simd8float32 &b)
-
inline simd8float32 unpackhi(const simd8float32 &a, const simd8float32 &b)
-
inline simd8float32 fmadd(const simd8float32 &a, const simd8float32 &b, const simd8float32 &c)
-
std::string get_compile_options()
get compile options
-
double getmillisecs()
ms elapsed since some arbitrary epoch
-
size_t get_mem_usage_kb()
get current RSS usage in kB
-
uint64_t get_cycles()
-
void fvec_madd(size_t n, const float *a, float bf, const float *b, float *c)
compute c := a + bf * b for a, b and c tables
- Parameters
n – size of the tables
a – size n
b – size n
c – restult table, size n
-
int fvec_madd_and_argmin(size_t n, const float *a, float bf, const float *b, float *c)
same as fvec_madd, also return index of the min of the result table
- Returns
index of the min of table c
-
void reflection(const float *u, float *x, size_t n, size_t d, size_t nu)
-
void matrix_qr(int m, int n, float *a)
compute the Q of the QR decomposition for m > n
- Parameters
a – size n * m: input matrix and output Q
-
void ranklist_handle_ties(int k, int64_t *idx, const float *dis)
distances are supposed to be sorted. Sorts indices with same distance
-
size_t ranklist_intersection_size(size_t k1, const int64_t *v1, size_t k2, const int64_t *v2)
count the number of common elements between v1 and v2 algorithm = sorting + bissection to avoid double-counting duplicates
-
size_t merge_result_table_with(size_t n, size_t k, int64_t *I0, float *D0, const int64_t *I1, const float *D1, bool keep_min = true, int64_t translation = 0)
merge a result table into another one
- Parameters
I0, D0 – first result table, size (n, k)
I1, D1 – second result table, size (n, k)
keep_min – if true, keep min values, otherwise keep max
translation – add this value to all I1’s indexes
- Returns
nb of values that were taken from the second table
-
double imbalance_factor(int n, int k, const int64_t *assign)
a balanced assignment has a IF of 1
-
double imbalance_factor(int k, const int *hist)
same, takes a histogram as input
-
void fvec_argsort(size_t n, const float *vals, size_t *perm)
-
void fvec_argsort_parallel(size_t n, const float *vals, size_t *perm)
-
int ivec_hist(size_t n, const int *v, int vmax, int *hist)
compute histogram on v
-
void bincode_hist(size_t n, size_t nbits, const uint8_t *codes, int *hist)
Compute histogram of bits on a code array
- Parameters
codes – size(n, nbits / 8)
hist – size(nbits): nb of 1s in the array of codes
-
size_t ivec_checksum(size_t n, const int *a)
compute a checksum on a table.
-
const float *fvecs_maybe_subsample(size_t d, size_t *n, size_t nmax, const float *x, bool verbose = false, int64_t seed = 1234)
random subsamples a set of vectors if there are too many of them
- Parameters
d – dimension of the vectors
n – on input: nb of input vectors, output: nb of output vectors
nmax – max nb of vectors to keep
x – input array, size *n-by-d
seed – random seed to use for sampling
- Returns
x or an array allocated with new [] with *n vectors
-
void binary_to_real(size_t d, const uint8_t *x_in, float *x_out)
Convert binary vector to +1/-1 valued float vector.
- Parameters
d – dimension of the vector (multiple of 8)
x_in – input binary vector (uint8_t table of size d / 8)
x_out – output float vector (float table of size d)
-
void real_to_binary(size_t d, const float *x_in, uint8_t *x_out)
Convert float vector to binary vector. Components > 0 are converted to 1, others to 0.
- Parameters
d – dimension of the vector (multiple of 8)
x_in – input float vector (float table of size d)
x_out – output binary vector (uint8_t table of size d / 8)
-
uint64_t hash_bytes(const uint8_t *bytes, int64_t n)
A reasonable hashing function
-
bool check_openmp()
Whether OpenMP annotations were respected.
Variables
- FAISS_API int index2layer_sa_encode_bs
- FAISS_API int index_factory_verbose
set to > 0 to get more logs from index_factory
-
const int IO_FLAG_READ_ONLY = 2
-
const int IO_FLAG_ONDISK_SAME_DIR = 4
-
const int IO_FLAG_SKIP_IVF_DATA = 8
-
const int IO_FLAG_MMAP = IO_FLAG_SKIP_IVF_DATA | 0x646f0000
- FAISS_API IndexBinaryHashStats indexBinaryHash_stats
- FAISS_API FastScanStats FastScan_stats
- FAISS_API IndexIVFStats indexIVF_stats
- FAISS_API IVFFastScanStats IVFFastScan_stats
- FAISS_API size_t precomputed_table_max_bytes
- FAISS_API int index_ivfpq_add_core_o_bs
- FAISS_API IndexIVFPQStats indexIVFPQ_stats
- FAISS_API IndexPQStats indexPQ_stats
- FAISS_API int multi_index_quantizer_search_bs
- FAISS_API int distance_compute_blas_threshold
- FAISS_API int distance_compute_blas_query_bs
- FAISS_API int distance_compute_blas_database_bs
- FAISS_API int distance_compute_min_k_reservoir
-
const uint8_t hamdis_tab_ham_bytes[256]
- FAISS_API size_t hamming_batch_size
- FAISS_API PartitionStats partition_stats
-
struct AdditiveCoarseQuantizer : public faiss::Index
- #include <IndexAdditiveQuantizer.h>
A “virtual” index where the elements are the residual quantizer centroids.
Intended for use as a coarse quantizer in an IndexIVF.
Subclassed by faiss::LocalSearchCoarseQuantizer, faiss::ResidualCoarseQuantizer
Public Types
-
using idx_t = int64_t
all indices are this type
-
using component_t = float
-
using distance_t = float
Public Functions
-
explicit AdditiveCoarseQuantizer(idx_t d = 0, AdditiveQuantizer *aq = nullptr, MetricType metric = METRIC_L2)
-
virtual void add(idx_t n, const float *x) override
N/A.
-
virtual void search(idx_t n, const float *x, idx_t k, float *distances, idx_t *labels) const override
query n vectors of dimension d to the index.
return at most k vectors. If there are not enough results for a query, the result array is padded with -1s.
- Parameters
x – input vectors to search, size n * d
labels – output labels of the NNs, size n*k
distances – output pairwise distances, size n*k
-
virtual void reconstruct(idx_t key, float *recons) const override
Reconstruct a stored vector (or an approximation if lossy coding)
this function may not be defined for some indexes
- Parameters
key – id of the vector to reconstruct
recons – reconstucted vector (size d)
-
virtual void train(idx_t n, const float *x) override
Perform training on a representative set of vectors
- Parameters
n – nb of training vectors
x – training vecors, size n * d
-
virtual void reset() override
N/A.
-
virtual void add_with_ids(idx_t n, const float *x, const idx_t *xids)
Same as add, but stores xids instead of sequential ids.
The default implementation fails with an assertion, as it is not supported by all indexes.
- Parameters
xids – if non-null, ids to store for the vectors (size n)
-
virtual void range_search(idx_t n, const float *x, float radius, RangeSearchResult *result) const
query n vectors of dimension d to the index.
return all vectors with distance < radius. Note that many indexes do not implement the range_search (only the k-NN search is mandatory).
- Parameters
x – input vectors to search, size n * d
radius – search radius
result – result table
-
virtual void assign(idx_t n, const float *x, idx_t *labels, idx_t k = 1) const
return the indexes of the k vectors closest to the query x.
This function is identical as search but only return labels of neighbors.
- Parameters
x – input vectors to search, size n * d
labels – output labels of the NNs, size n*k
-
virtual size_t remove_ids(const IDSelector &sel)
removes IDs from the index. Not supported by all indexes. Returns the number of elements removed.
-
virtual void reconstruct_n(idx_t i0, idx_t ni, float *recons) const
Reconstruct vectors i0 to i0 + ni - 1
this function may not be defined for some indexes
- Parameters
recons – reconstucted vector (size ni * d)
-
virtual void search_and_reconstruct(idx_t n, const float *x, idx_t k, float *distances, idx_t *labels, float *recons) const
Similar to search, but also reconstructs the stored vectors (or an approximation in the case of lossy coding) for the search results.
If there are not enough results for a query, the resulting arrays is padded with -1s.
- Parameters
recons – reconstructed vectors size (n, k, d)
-
virtual void compute_residual(const float *x, float *residual, idx_t key) const
Computes a residual vector after indexing encoding.
The residual vector is the difference between a vector and the reconstruction that can be decoded from its representation in the index. The residual can be used for multiple-stage indexing methods, like IndexIVF’s methods.
- Parameters
x – input vector, size d
residual – output residual vector, size d
key – encoded index, as returned by search and assign
-
virtual void compute_residual_n(idx_t n, const float *xs, float *residuals, const idx_t *keys) const
Computes a residual vector after indexing encoding (batch form). Equivalent to calling compute_residual for each vector.
The residual vector is the difference between a vector and the reconstruction that can be decoded from its representation in the index. The residual can be used for multiple-stage indexing methods, like IndexIVF’s methods.
- Parameters
n – number of vectors
xs – input vectors, size (n x d)
residuals – output residual vectors, size (n x d)
keys – encoded index, as returned by search and assign
-
virtual DistanceComputer *get_distance_computer() const
Get a DistanceComputer (defined in AuxIndexStructures) object for this kind of index.
DistanceComputer is implemented for indexes that support random access of their vectors.
-
virtual size_t sa_code_size() const
size of the produced codes in bytes
-
virtual void sa_encode(idx_t n, const float *x, uint8_t *bytes) const
encode a set of vectors
- Parameters
n – number of vectors
x – input vectors, size n * d
bytes – output encoded vectors, size n * sa_code_size()
-
virtual void sa_decode(idx_t n, const uint8_t *bytes, float *x) const
decode a set of vectors
- Parameters
n – number of vectors
bytes – input encoded vectors, size n * sa_code_size()
x – output vectors, size n * d
Public Members
-
AdditiveQuantizer *aq
-
std::vector<float> centroid_norms
norms of centroids, useful for knn-search
-
int d
vector dimension
-
bool verbose
verbosity level
-
MetricType metric_type
type of metric this index uses for search
-
float metric_arg
argument of the metric type
-
using idx_t = int64_t
-
template<class T, int A = 32>
struct AlignedTable Public Functions
-
inline AlignedTable()
-
inline explicit AlignedTable(size_t n)
-
inline size_t itemsize() const
-
inline void resize(size_t n)
-
inline void clear()
-
inline size_t size() const
-
inline size_t nbytes() const
-
inline T *get()
-
inline const T *get() const
-
inline T *data()
-
inline const T *data() const
-
inline T &operator[](size_t i)
-
inline T operator[](size_t i) const
Public Static Functions
-
static inline size_t round_capacity(size_t n)
-
inline AlignedTable()
-
template<class T, int A = 32>
struct AlignedTableTightAlloc Public Functions
-
inline AlignedTableTightAlloc()
-
inline explicit AlignedTableTightAlloc(size_t n)
-
inline size_t itemsize() const
-
inline void resize(size_t n)
-
inline void clear()
-
inline size_t size() const
-
inline size_t nbytes() const
-
inline T *get()
-
inline const T *get() const
-
inline T *data()
-
inline const T *data() const
-
inline T &operator[](size_t i)
-
inline T operator[](size_t i) const
-
inline ~AlignedTableTightAlloc()
-
inline AlignedTableTightAlloc<T, A> &operator=(const AlignedTableTightAlloc<T, A> &other)
-
inline AlignedTableTightAlloc(const AlignedTableTightAlloc<T, A> &other)
-
inline AlignedTableTightAlloc()
-
struct ArrayInvertedLists : public faiss::InvertedLists
- #include <InvertedLists.h>
simple (default) implementation as an array of inverted lists
Public Functions
-
ArrayInvertedLists(size_t nlist, size_t code_size)
-
virtual size_t list_size(size_t list_no) const override
get the size of a list
-
virtual const uint8_t *get_codes(size_t list_no) const override
get the codes for an inverted list must be released by release_codes
- Returns
codes size list_size * code_size
-
virtual const idx_t *get_ids(size_t list_no) const override
get the ids for an inverted list must be released by release_ids
- Returns
ids size list_size
-
virtual size_t add_entries(size_t list_no, size_t n_entry, const idx_t *ids, const uint8_t *code) override
-
virtual void update_entries(size_t list_no, size_t offset, size_t n_entry, const idx_t *ids, const uint8_t *code) override
-
virtual void resize(size_t list_no, size_t new_size) override
-
~ArrayInvertedLists() override
-
virtual void release_codes(size_t list_no, const uint8_t *codes) const
release codes returned by get_codes (default implementation is nop
-
virtual void release_ids(size_t list_no, const idx_t *ids) const
release ids returned by get_ids
-
virtual idx_t get_single_id(size_t list_no, size_t offset) const
- Returns
a single id in an inverted list
-
virtual const uint8_t *get_single_code(size_t list_no, size_t offset) const
- Returns
a single code in an inverted list (should be deallocated with release_codes)
-
virtual void prefetch_lists(const idx_t *list_nos, int nlist) const
prepare the following lists (default does nothing) a list can be -1 hence the signed long
-
virtual size_t add_entry(size_t list_no, idx_t theid, const uint8_t *code)
add one entry to an inverted list
-
virtual void update_entry(size_t list_no, size_t offset, idx_t id, const uint8_t *code)
-
virtual void reset()
-
void merge_from(InvertedLists *oivf, size_t add_id)
move all entries from oivf (empty on output)
-
double imbalance_factor() const
1= perfectly balanced, >1: imbalanced
-
void print_stats() const
display some stats about the inverted lists
-
size_t compute_ntotal() const
sum up list sizes
Public Members
-
std::vector<std::vector<uint8_t>> codes
-
std::vector<std::vector<idx_t>> ids
Inverted lists for indexes.
-
size_t nlist
number of possible key values
-
size_t code_size
code size per vector in bytes
Public Static Attributes
-
static const size_t INVALID_CODE_SIZE = static_cast<size_t>(-1)
used for BlockInvertedLists, where the codes are packed into groups and the individual code size is meaningless
-
ArrayInvertedLists(size_t nlist, size_t code_size)
-
struct AutoTuneCriterion
- #include <AutoTune.h>
Evaluation criterion. Returns a performance measure in [0,1], higher is better.
Subclassed by faiss::IntersectionCriterion, faiss::OneRecallAtRCriterion
Public Functions
-
void set_groundtruth(int gt_nnn, const float *gt_D_in, const idx_t *gt_I_in)
Intitializes the gt_D and gt_I vectors. Must be called before evaluating
- Parameters
gt_D_in – size nq * gt_nnn
gt_I_in – size nq * gt_nnn
-
virtual double evaluate(const float *D, const idx_t *I) const = 0
Evaluate the criterion.
- Parameters
D – size nq * nnn
I – size nq * nnn
- Returns
the criterion, between 0 and 1. Larger is better.
-
inline virtual ~AutoTuneCriterion()
Public Members
-
idx_t nq
nb of queries this criterion is evaluated on
-
idx_t nnn
nb of NNs that the query should request
-
idx_t gt_nnn
nb of GT NNs required to evaluate criterion
-
std::vector<float> gt_D
Ground-truth distances (size nq * gt_nnn)
-
std::vector<idx_t> gt_I
Ground-truth indexes (size nq * gt_nnn)
-
void set_groundtruth(int gt_nnn, const float *gt_D_in, const idx_t *gt_I_in)
-
struct BinaryInvertedListScanner
-
Public Functions
-
virtual void set_query(const uint8_t *query_vector) = 0
from now on we handle this query.
-
virtual void set_list(idx_t list_no, uint8_t coarse_dis) = 0
following codes come from this inverted list
-
virtual uint32_t distance_to_code(const uint8_t *code) const = 0
compute a single query-to-code distance
-
virtual size_t scan_codes(size_t n, const uint8_t *codes, const idx_t *ids, int32_t *distances, idx_t *labels, size_t k) const = 0
compute the distances to codes. (distances, labels) should be organized as a min- or max-heap
- Parameters
n – number of codes to scan
codes – codes to scan (n * code_size)
ids – corresponding ids (ignored if store_pairs)
distances – heap distances (size k)
labels – heap labels (size k)
k – heap size
-
virtual void scan_codes_range(size_t n, const uint8_t *codes, const idx_t *ids, int radius, RangeQueryResult &result) const = 0
-
inline virtual ~BinaryInvertedListScanner()
-
virtual void set_query(const uint8_t *query_vector) = 0
-
struct BitstringReader
Public Functions
-
inline BitstringReader(const uint8_t *code, size_t code_size)
-
inline uint64_t read(int nbit)
Public Members
-
const uint8_t *code
-
size_t code_size
-
size_t i
-
inline BitstringReader(const uint8_t *code, size_t code_size)
-
struct BitstringWriter
Public Functions
-
inline BitstringWriter(uint8_t *code, size_t code_size)
-
inline void write(uint64_t x, int nbit)
Public Members
-
uint8_t *code
-
size_t code_size
-
size_t i
-
inline BitstringWriter(uint8_t *code, size_t code_size)
-
struct BlockInvertedLists : public faiss::InvertedLists
- #include <BlockInvertedLists.h>
Inverted Lists that are organized by blocks.
Different from the regular inverted lists, the codes are organized by blocks of size block_size bytes that reprsent a set of n_per_block. Therefore, code allocations are always rounded up to block_size bytes. The codes are also aligned on 32-byte boundaries for use with SIMD.
To avoid misinterpretations, the code_size is set to (size_t)(-1), even if arguably the amount of memory consumed by code is block_size / n_per_block.
The writing functions add_entries and update_entries operate on block-aligned data.
Public Functions
-
BlockInvertedLists(size_t nlist, size_t vec_per_block, size_t block_size)
-
BlockInvertedLists()
-
virtual size_t list_size(size_t list_no) const override
get the size of a list
-
virtual const uint8_t *get_codes(size_t list_no) const override
get the codes for an inverted list must be released by release_codes
- Returns
codes size list_size * code_size
-
virtual const idx_t *get_ids(size_t list_no) const override
get the ids for an inverted list must be released by release_ids
- Returns
ids size list_size
-
virtual size_t add_entries(size_t list_no, size_t n_entry, const idx_t *ids, const uint8_t *code) override
-
virtual void update_entries(size_t list_no, size_t offset, size_t n_entry, const idx_t *ids, const uint8_t *code) override
not implemented
-
virtual void resize(size_t list_no, size_t new_size) override
-
~BlockInvertedLists() override
-
virtual void release_codes(size_t list_no, const uint8_t *codes) const
release codes returned by get_codes (default implementation is nop
-
virtual idx_t get_single_id(size_t list_no, size_t offset) const
- Returns
a single id in an inverted list
-
virtual const uint8_t *get_single_code(size_t list_no, size_t offset) const
- Returns
a single code in an inverted list (should be deallocated with release_codes)
-
virtual void prefetch_lists(const idx_t *list_nos, int nlist) const
prepare the following lists (default does nothing) a list can be -1 hence the signed long
-
virtual size_t add_entry(size_t list_no, idx_t theid, const uint8_t *code)
add one entry to an inverted list
-
virtual void reset()
-
void merge_from(InvertedLists *oivf, size_t add_id)
move all entries from oivf (empty on output)
-
double imbalance_factor() const
1= perfectly balanced, >1: imbalanced
-
void print_stats() const
display some stats about the inverted lists
-
size_t compute_ntotal() const
sum up list sizes
Public Members
-
size_t n_per_block
-
size_t block_size
-
std::vector<AlignedTable<uint8_t>> codes
-
std::vector<std::vector<idx_t>> ids
-
size_t nlist
number of possible key values
-
size_t code_size
code size per vector in bytes
Public Static Attributes
-
static const size_t INVALID_CODE_SIZE = static_cast<size_t>(-1)
used for BlockInvertedLists, where the codes are packed into groups and the individual code size is meaningless
-
BlockInvertedLists(size_t nlist, size_t vec_per_block, size_t block_size)
-
struct BlockInvertedListsIOHook : public faiss::InvertedListsIOHook
Public Functions
-
BlockInvertedListsIOHook()
-
virtual void write(const InvertedLists *ils, IOWriter *f) const override
write the index to the IOWriter (including the fourcc)
-
virtual InvertedLists *read(IOReader *f, int io_flags) const override
called when the fourcc matches this class’s fourcc
-
virtual InvertedLists *read_ArrayInvertedLists(IOReader *f, int io_flags, size_t nlist, size_t code_size, const std::vector<size_t> &sizes) const
read from a ArrayInvertedLists into this invertedlist type. For this to work, the callback has to be enabled and the io_flag has to be set to IO_FLAG_SKIP_IVF_DATA | (16 upper bits of the fourcc)
(default implementation fails)
Public Members
-
const std::string key
string version of the fourcc
-
const std::string classname
typeid.name
Public Static Functions
-
static void add_callback(InvertedListsIOHook*)
-
static void print_callbacks()
-
static InvertedListsIOHook *lookup(int h)
-
static InvertedListsIOHook *lookup_classname(const std::string &classname)
-
BlockInvertedListsIOHook()
-
struct CenteringTransform : public faiss::VectorTransform
- #include <VectorTransform.h>
Subtract the mean of each component from the vectors.
Public Functions
-
explicit CenteringTransform(int d = 0)
-
virtual void train(idx_t n, const float *x) override
train on n vectors.
-
virtual void apply_noalloc(idx_t n, const float *x, float *xt) const override
subtract the mean
-
virtual void reverse_transform(idx_t n, const float *xt, float *x) const override
add the mean
-
float *apply(idx_t n, const float *x) const
apply the random rotation, return new allocated matrix
- Parameters
x – size n * d_in
- Returns
size n * d_out
Public Members
-
std::vector<float> mean
Mean, size d_in = d_out.
-
int d_in
-
int d_out
! input dimension
-
bool is_trained
set if the VectorTransform does not require training, or if training is done already
-
explicit CenteringTransform(int d = 0)
-
struct Cloner
- #include <clone_index.h>
Cloner class, useful to override classes with other cloning functions. The cloning function above just calls Cloner::clone_Index.
Subclassed by faiss::gpu::ToCPUCloner, faiss::gpu::ToGpuCloner, faiss::gpu::ToGpuClonerMultiple
Public Functions
-
virtual VectorTransform *clone_VectorTransform(const VectorTransform*)
-
inline virtual ~Cloner()
-
virtual VectorTransform *clone_VectorTransform(const VectorTransform*)
-
struct Clustering : public faiss::ClusteringParameters
- #include <Clustering.h>
K-means clustering based on assignment - centroid update iterations
The clustering is based on an Index object that assigns training points to the centroids. Therefore, at each iteration the centroids are added to the index.
On output, the centoids table is set to the latest version of the centroids and they are also added to the index. If the centroids table it is not empty on input, it is also used for initialization.
Subclassed by faiss::Clustering1D
Public Functions
-
Clustering(int d, int k)
-
Clustering(int d, int k, const ClusteringParameters &cp)
-
virtual void train(idx_t n, const float *x, faiss::Index &index, const float *x_weights = nullptr)
run k-means training
- Parameters
x – training vectors, size n * d
index – index used for assignment
x_weights – weight associated to each vector: NULL or size n
-
void train_encoded(idx_t nx, const uint8_t *x_in, const Index *codec, Index &index, const float *weights = nullptr)
run with encoded vectors
win addition to train()’s parameters takes a codec as parameter to decode the input vectors.
- Parameters
codec – codec used to decode the vectors (nullptr = vectors are in fact floats) *
-
void post_process_centroids()
Post-process the centroids after each centroid update. includes optional L2 normalization and nearest integer rounding
-
inline virtual ~Clustering()
Public Members
-
size_t d
dimension of the vectors
-
size_t k
nb of centroids
-
std::vector<float> centroids
centroids (k * d) if centroids are set on input to train, they will be used as initialization
-
std::vector<ClusteringIterationStats> iteration_stats
stats at every iteration of clustering
-
int niter
clustering iterations
-
int nredo
redo clustering this many times and keep best
-
bool verbose
-
bool spherical
do we want normalized centroids?
-
bool int_centroids
round centroids coordinates to integer
-
bool update_index
re-train index after each iteration?
-
bool frozen_centroids
use the centroids provided as input and do not change them during iterations
-
int min_points_per_centroid
otherwise you get a warning
-
int max_points_per_centroid
to limit size of dataset
-
int seed
seed for the random number generator
-
size_t decode_block_size
how many vectors at a time to decode
-
Clustering(int d, int k)
-
struct Clustering1D : public faiss::Clustering
- #include <Clustering.h>
Exact 1D clustering algorithm
Since it does not use an index, it does not overload the train() function
Public Functions
-
explicit Clustering1D(int k)
-
Clustering1D(int k, const ClusteringParameters &cp)
-
void train_exact(idx_t n, const float *x)
-
inline virtual ~Clustering1D()
-
virtual void train(idx_t n, const float *x, faiss::Index &index, const float *x_weights = nullptr)
run k-means training
- Parameters
x – training vectors, size n * d
index – index used for assignment
x_weights – weight associated to each vector: NULL or size n
-
void train_encoded(idx_t nx, const uint8_t *x_in, const Index *codec, Index &index, const float *weights = nullptr)
run with encoded vectors
win addition to train()’s parameters takes a codec as parameter to decode the input vectors.
- Parameters
codec – codec used to decode the vectors (nullptr = vectors are in fact floats) *
-
void post_process_centroids()
Post-process the centroids after each centroid update. includes optional L2 normalization and nearest integer rounding
Public Members
-
size_t d
dimension of the vectors
-
size_t k
nb of centroids
-
std::vector<float> centroids
centroids (k * d) if centroids are set on input to train, they will be used as initialization
-
std::vector<ClusteringIterationStats> iteration_stats
stats at every iteration of clustering
-
int niter
clustering iterations
-
int nredo
redo clustering this many times and keep best
-
bool verbose
-
bool spherical
do we want normalized centroids?
-
bool int_centroids
round centroids coordinates to integer
-
bool update_index
re-train index after each iteration?
-
bool frozen_centroids
use the centroids provided as input and do not change them during iterations
-
int min_points_per_centroid
otherwise you get a warning
-
int max_points_per_centroid
to limit size of dataset
-
int seed
seed for the random number generator
-
size_t decode_block_size
how many vectors at a time to decode
-
explicit Clustering1D(int k)
-
struct ClusteringIterationStats
Public Members
-
float obj
objective values (sum of distances reported by index)
-
double time
seconds for iteration
-
double time_search
seconds for just search
-
double imbalance_factor
imbalance factor of iteration
-
int nsplit
number of cluster splits
-
float obj
-
struct ClusteringParameters
- #include <Clustering.h>
Class for the clustering parameters. Can be passed to the constructor of the Clustering object.
Subclassed by faiss::Clustering, faiss::ProgressiveDimClusteringParameters
Public Functions
-
ClusteringParameters()
sets reasonable defaults
Public Members
-
int niter
clustering iterations
-
int nredo
redo clustering this many times and keep best
-
bool verbose
-
bool spherical
do we want normalized centroids?
-
bool int_centroids
round centroids coordinates to integer
-
bool update_index
re-train index after each iteration?
-
bool frozen_centroids
use the centroids provided as input and do not change them during iterations
-
int min_points_per_centroid
otherwise you get a warning
-
int max_points_per_centroid
to limit size of dataset
-
int seed
seed for the random number generator
-
size_t decode_block_size
how many vectors at a time to decode
-
ClusteringParameters()
-
template<typename T_, typename TI_>
struct CMax - #include <ordered_key_value.h>
The C object gives the type T of the values of a key-value storage, the type of the keys, TI and the comparison that is done: CMax for a decreasing series and CMin for increasing series. In other words, for a given threshold threshold, an incoming value x is kept if
C::cmp(threshold, x)
is true.
Public Static Functions
-
static inline T neutral()
Public Static Attributes
-
static const bool is_max = true
-
static inline T neutral()
-
template<typename T_, typename TI_>
struct CMin -
Public Static Functions
-
static inline T neutral()
Public Static Attributes
-
static const bool is_max = false
-
static inline T neutral()
-
struct DirectMap
- #include <DirectMap.h>
Direct map: a way to map back from ids to inverted lists
Public Types
-
enum Type
Values:
-
enumerator NoMap
-
enumerator Array
-
enumerator Hashtable
-
enumerator NoMap
Public Functions
-
DirectMap()
-
void set_type(Type new_type, const InvertedLists *invlists, size_t ntotal)
set type and initialize
-
inline bool no() const
for quick checks
-
void check_can_add(const idx_t *ids)
throw if Array and ids is not NULL
update the direct_map
-
void clear()
remove all entries
-
size_t remove_ids(const IDSelector &sel, InvertedLists *invlists)
remove ids from the InvertedLists, possibly using the direct map
operations on inverted lists that require translation with a DirectMap
-
void update_codes(InvertedLists *invlists, int n, const idx_t *ids, const idx_t *list_nos, const uint8_t *codes)
update entries, using the direct map
-
enum Type
-
struct DirectMapAdd
- #include <DirectMap.h>
Thread-safe way of updating the direct_map.
Public Functions
-
void add(size_t i, idx_t list_no, size_t offset)
add vector i (with id xids[i]) at list_no and offset
-
~DirectMapAdd()
-
void add(size_t i, idx_t list_no, size_t offset)
-
struct FastScanStats
Public Functions
-
inline FastScanStats()
-
inline void reset()
Public Members
-
uint64_t t0
-
uint64_t t1
-
uint64_t t2
-
uint64_t t3
-
inline FastScanStats()
-
struct GenHammingComputer16
Public Functions
-
inline GenHammingComputer16(const uint8_t *a8, int code_size)
-
inline int hamming(const uint8_t *b8) const
Public Members
-
uint64_t a0
-
uint64_t a1
-
inline GenHammingComputer16(const uint8_t *a8, int code_size)
-
struct GenHammingComputer32
Public Functions
-
inline GenHammingComputer32(const uint8_t *a8, int code_size)
-
inline int hamming(const uint8_t *b8) const
Public Members
-
uint64_t a0
-
uint64_t a1
-
uint64_t a2
-
uint64_t a3
-
inline GenHammingComputer32(const uint8_t *a8, int code_size)
-
struct GenHammingComputer8
Public Functions
-
inline GenHammingComputer8(const uint8_t *a, int code_size)
-
inline int hamming(const uint8_t *b) const
Public Members
-
uint64_t a0
-
inline GenHammingComputer8(const uint8_t *a, int code_size)
-
struct GenHammingComputerM8
Public Functions
-
inline GenHammingComputerM8(const uint8_t *a8, int code_size)
-
inline int hamming(const uint8_t *b8) const
Public Members
-
const uint64_t *a
-
int n
-
inline GenHammingComputerM8(const uint8_t *a8, int code_size)
-
template<int CODE_SIZE>
struct HammingComputer : public faiss::HammingComputerDefault Public Functions
-
inline HammingComputer(const uint8_t *a, int code_size)
-
inline void set(const uint8_t *a8, int code_size)
-
inline int hamming(const uint8_t *b8) const
Public Members
-
const uint8_t *a8
-
int quotient8
-
int remainder8
-
inline HammingComputer(const uint8_t *a, int code_size)
-
struct HammingComputer16
Public Functions
-
inline HammingComputer16()
-
inline HammingComputer16(const uint8_t *a8, int code_size)
-
inline void set(const uint8_t *a8, int code_size)
-
inline int hamming(const uint8_t *b8) const
Public Members
-
uint64_t a0
-
uint64_t a1
-
inline HammingComputer16()
-
struct HammingComputer20
Public Functions
-
inline HammingComputer20()
-
inline HammingComputer20(const uint8_t *a8, int code_size)
-
inline void set(const uint8_t *a8, int code_size)
-
inline int hamming(const uint8_t *b8) const
Public Members
-
uint64_t a0
-
uint64_t a1
-
uint32_t a2
-
inline HammingComputer20()
-
struct HammingComputer32
Public Functions
-
inline HammingComputer32()
-
inline HammingComputer32(const uint8_t *a8, int code_size)
-
inline void set(const uint8_t *a8, int code_size)
-
inline int hamming(const uint8_t *b8) const
Public Members
-
uint64_t a0
-
uint64_t a1
-
uint64_t a2
-
uint64_t a3
-
inline HammingComputer32()
-
struct HammingComputer4
Public Functions
-
inline HammingComputer4()
-
inline HammingComputer4(const uint8_t *a, int code_size)
-
inline void set(const uint8_t *a, int code_size)
-
inline int hamming(const uint8_t *b) const
Public Members
-
uint32_t a0
-
inline HammingComputer4()
-
struct HammingComputer64
Public Functions
-
inline HammingComputer64()
-
inline HammingComputer64(const uint8_t *a8, int code_size)
-
inline void set(const uint8_t *a8, int code_size)
-
inline int hamming(const uint8_t *b8) const
Public Members
-
uint64_t a0
-
uint64_t a1
-
uint64_t a2
-
uint64_t a3
-
uint64_t a4
-
uint64_t a5
-
uint64_t a6
-
uint64_t a7
-
inline HammingComputer64()
-
struct HammingComputer8
Public Functions
-
inline HammingComputer8()
-
inline HammingComputer8(const uint8_t *a, int code_size)
-
inline void set(const uint8_t *a, int code_size)
-
inline int hamming(const uint8_t *b) const
Public Members
-
uint64_t a0
-
inline HammingComputer8()
-
struct HammingComputerDefault
Subclassed by faiss::HammingComputer< CODE_SIZE >
Public Functions
-
inline HammingComputerDefault()
-
inline HammingComputerDefault(const uint8_t *a8, int code_size)
-
inline void set(const uint8_t *a8, int code_size)
-
inline int hamming(const uint8_t *b8) const
Public Members
-
const uint8_t *a8
-
int quotient8
-
int remainder8
-
inline HammingComputerDefault()
-
struct HammingComputerM4
Public Functions
-
inline HammingComputerM4()
-
inline HammingComputerM4(const uint8_t *a4, int code_size)
-
inline void set(const uint8_t *a4, int code_size)
-
inline int hamming(const uint8_t *b8) const
Public Members
-
const uint32_t *a
-
int n
-
inline HammingComputerM4()
-
struct HammingComputerM8
Public Functions
-
inline HammingComputerM8()
-
inline HammingComputerM8(const uint8_t *a8, int code_size)
-
inline void set(const uint8_t *a8, int code_size)
-
inline int hamming(const uint8_t *b8) const
Public Members
-
const uint64_t *a
-
int n
-
inline HammingComputerM8()
-
template<class HammingComputer>
struct HCounterState - #include <hamming-inl.h>
This class maintains a list of best distances seen so far.
Since the distances are in a limited range (0 to nbit), the object maintains one list per possible distance, and fills in only the n-first lists, such that the sum of sizes of the n lists is below k.
Public Functions
-
inline HCounterState(int *counters, int64_t *ids_per_dis, const uint8_t *x, int d, int k)
-
inline void update_counter(const uint8_t *y, size_t j)
Public Members
-
int *counters
-
int64_t *ids_per_dis
-
int thres
-
int count_lt
-
int count_eq
-
int k
-
inline HCounterState(int *counters, int64_t *ids_per_dis, const uint8_t *x, int d, int k)
-
template<typename C>
struct HeapArray - #include <Heap.h>
a template structure for a set of [min|max]-heaps it is tailored so that the actual data of the heaps can just live in compact arrays.
Public Functions
-
inline T *get_val(size_t key)
Return the list of values for a heap.
-
inline TI *get_ids(size_t key)
Correspponding identifiers.
-
void heapify()
prepare all the heaps before adding
-
void addn(size_t nj, const T *vin, TI j0 = 0, size_t i0 = 0, int64_t ni = -1)
add nj elements to heaps i0:i0+ni, with sequential ids
- Parameters
nj – nb of elements to add to each heap
vin – elements to add, size ni * nj
j0 – add this to the ids that are added
i0 – first heap to update
ni – nb of elements to update (-1 = use nh)
-
void addn_with_ids(size_t nj, const T *vin, const TI *id_in = nullptr, int64_t id_stride = 0, size_t i0 = 0, int64_t ni = -1)
same as addn
- Parameters
id_in – ids of the elements to add, size ni * nj
id_stride – stride for id_in
-
void reorder()
reorder all the heaps
-
inline T *get_val(size_t key)
-
struct HStackInvertedLists : public faiss::ReadOnlyInvertedLists
- #include <InvertedLists.h>
Horizontal stack of inverted lists.
Public Functions
-
HStackInvertedLists(int nil, const InvertedLists **ils)
build InvertedLists by concatenating nil of them
-
virtual size_t list_size(size_t list_no) const override
get the size of a list
-
virtual const uint8_t *get_codes(size_t list_no) const override
get the codes for an inverted list must be released by release_codes
- Returns
codes size list_size * code_size
-
virtual const idx_t *get_ids(size_t list_no) const override
get the ids for an inverted list must be released by release_ids
- Returns
ids size list_size
-
virtual void prefetch_lists(const idx_t *list_nos, int nlist) const override
prepare the following lists (default does nothing) a list can be -1 hence the signed long
-
virtual void release_codes(size_t list_no, const uint8_t *codes) const override
release codes returned by get_codes (default implementation is nop
-
virtual void release_ids(size_t list_no, const idx_t *ids) const override
release ids returned by get_ids
-
virtual idx_t get_single_id(size_t list_no, size_t offset) const override
- Returns
a single id in an inverted list
-
virtual const uint8_t *get_single_code(size_t list_no, size_t offset) const override
- Returns
a single code in an inverted list (should be deallocated with release_codes)
-
virtual size_t add_entries(size_t list_no, size_t n_entry, const idx_t *ids, const uint8_t *code) override
-
virtual void update_entries(size_t list_no, size_t offset, size_t n_entry, const idx_t *ids, const uint8_t *code) override
-
virtual void resize(size_t list_no, size_t new_size) override
-
virtual size_t add_entry(size_t list_no, idx_t theid, const uint8_t *code)
add one entry to an inverted list
-
virtual void update_entry(size_t list_no, size_t offset, idx_t id, const uint8_t *code)
-
virtual void reset()
-
void merge_from(InvertedLists *oivf, size_t add_id)
move all entries from oivf (empty on output)
-
double imbalance_factor() const
1= perfectly balanced, >1: imbalanced
-
void print_stats() const
display some stats about the inverted lists
-
size_t compute_ntotal() const
sum up list sizes
Public Members
-
std::vector<const InvertedLists*> ils
-
size_t nlist
number of possible key values
-
size_t code_size
code size per vector in bytes
Public Static Attributes
-
static const size_t INVALID_CODE_SIZE = static_cast<size_t>(-1)
used for BlockInvertedLists, where the codes are packed into groups and the individual code size is meaningless
-
HStackInvertedLists(int nil, const InvertedLists **ils)
-
struct Index
- #include <Index.h>
Abstract structure for an index, supports adding vectors and searching them.
All vectors provided at add or search time are 32-bit float arrays, although the internal representation may vary.
Subclassed by faiss::AdditiveCoarseQuantizer, faiss::gpu::GpuIndex, faiss::IndexFastScan, faiss::IndexFlatCodes, faiss::IndexHNSW, faiss::IndexIVF, faiss::IndexLattice, faiss::IndexNNDescent, faiss::IndexNSG, faiss::IndexPreTransform, faiss::IndexRefine, faiss::IndexSplitVectors, faiss::MultiIndexQuantizer
Public Types
-
using idx_t = int64_t
all indices are this type
-
using component_t = float
-
using distance_t = float
Public Functions
-
inline explicit Index(idx_t d = 0, MetricType metric = METRIC_L2)
-
virtual ~Index()
-
virtual void train(idx_t n, const float *x)
Perform training on a representative set of vectors
- Parameters
n – nb of training vectors
x – training vecors, size n * d
-
virtual void add(idx_t n, const float *x) = 0
Add n vectors of dimension d to the index.
Vectors are implicitly assigned labels ntotal .. ntotal + n - 1 This function slices the input vectors in chunks smaller than blocksize_add and calls add_core.
- Parameters
x – input matrix, size n * d
-
virtual void add_with_ids(idx_t n, const float *x, const idx_t *xids)
Same as add, but stores xids instead of sequential ids.
The default implementation fails with an assertion, as it is not supported by all indexes.
- Parameters
xids – if non-null, ids to store for the vectors (size n)
-
virtual void search(idx_t n, const float *x, idx_t k, float *distances, idx_t *labels) const = 0
query n vectors of dimension d to the index.
return at most k vectors. If there are not enough results for a query, the result array is padded with -1s.
- Parameters
x – input vectors to search, size n * d
labels – output labels of the NNs, size n*k
distances – output pairwise distances, size n*k
-
virtual void range_search(idx_t n, const float *x, float radius, RangeSearchResult *result) const
query n vectors of dimension d to the index.
return all vectors with distance < radius. Note that many indexes do not implement the range_search (only the k-NN search is mandatory).
- Parameters
x – input vectors to search, size n * d
radius – search radius
result – result table
-
virtual void assign(idx_t n, const float *x, idx_t *labels, idx_t k = 1) const
return the indexes of the k vectors closest to the query x.
This function is identical as search but only return labels of neighbors.
- Parameters
x – input vectors to search, size n * d
labels – output labels of the NNs, size n*k
-
virtual void reset() = 0
removes all elements from the database.
-
virtual size_t remove_ids(const IDSelector &sel)
removes IDs from the index. Not supported by all indexes. Returns the number of elements removed.
-
virtual void reconstruct(idx_t key, float *recons) const
Reconstruct a stored vector (or an approximation if lossy coding)
this function may not be defined for some indexes
- Parameters
key – id of the vector to reconstruct
recons – reconstucted vector (size d)
-
virtual void reconstruct_n(idx_t i0, idx_t ni, float *recons) const
Reconstruct vectors i0 to i0 + ni - 1
this function may not be defined for some indexes
- Parameters
recons – reconstucted vector (size ni * d)
-
virtual void search_and_reconstruct(idx_t n, const float *x, idx_t k, float *distances, idx_t *labels, float *recons) const
Similar to search, but also reconstructs the stored vectors (or an approximation in the case of lossy coding) for the search results.
If there are not enough results for a query, the resulting arrays is padded with -1s.
- Parameters
recons – reconstructed vectors size (n, k, d)
-
virtual void compute_residual(const float *x, float *residual, idx_t key) const
Computes a residual vector after indexing encoding.
The residual vector is the difference between a vector and the reconstruction that can be decoded from its representation in the index. The residual can be used for multiple-stage indexing methods, like IndexIVF’s methods.
- Parameters
x – input vector, size d
residual – output residual vector, size d
key – encoded index, as returned by search and assign
-
virtual void compute_residual_n(idx_t n, const float *xs, float *residuals, const idx_t *keys) const
Computes a residual vector after indexing encoding (batch form). Equivalent to calling compute_residual for each vector.
The residual vector is the difference between a vector and the reconstruction that can be decoded from its representation in the index. The residual can be used for multiple-stage indexing methods, like IndexIVF’s methods.
- Parameters
n – number of vectors
xs – input vectors, size (n x d)
residuals – output residual vectors, size (n x d)
keys – encoded index, as returned by search and assign
-
virtual DistanceComputer *get_distance_computer() const
Get a DistanceComputer (defined in AuxIndexStructures) object for this kind of index.
DistanceComputer is implemented for indexes that support random access of their vectors.
-
virtual size_t sa_code_size() const
size of the produced codes in bytes
-
virtual void sa_encode(idx_t n, const float *x, uint8_t *bytes) const
encode a set of vectors
- Parameters
n – number of vectors
x – input vectors, size n * d
bytes – output encoded vectors, size n * sa_code_size()
-
virtual void sa_decode(idx_t n, const uint8_t *bytes, float *x) const
decode a set of vectors
- Parameters
n – number of vectors
bytes – input encoded vectors, size n * sa_code_size()
x – output vectors, size n * d
Public Members
-
int d
vector dimension
-
idx_t ntotal
total nb of indexed vectors
-
bool verbose
verbosity level
-
bool is_trained
set if the Index does not require training, or if training is done already
-
MetricType metric_type
type of metric this index uses for search
-
float metric_arg
argument of the metric type
-
using idx_t = int64_t
-
struct Index2Layer : public faiss::IndexFlatCodes
- #include <Index2Layer.h>
Same as an IndexIVFPQ without the inverted lists: codes are stored sequentially
The class is mainly inteded to store encoded vectors that can be accessed randomly, the search function is not implemented.
Public Types
-
using idx_t = int64_t
all indices are this type
-
using component_t = float
-
using distance_t = float
Public Functions
-
Index2Layer(Index *quantizer, size_t nlist, int M, int nbit = 8, MetricType metric = METRIC_L2)
-
Index2Layer()
-
~Index2Layer()
-
virtual void train(idx_t n, const float *x) override
Perform training on a representative set of vectors
- Parameters
n – nb of training vectors
x – training vecors, size n * d
-
virtual void search(idx_t n, const float *x, idx_t k, float *distances, idx_t *labels) const override
not implemented
-
virtual DistanceComputer *get_distance_computer() const override
Get a DistanceComputer (defined in AuxIndexStructures) object for this kind of index.
DistanceComputer is implemented for indexes that support random access of their vectors.
-
void transfer_to_IVFPQ(IndexIVFPQ &other) const
transfer the flat codes to an IVFPQ index
-
virtual void sa_encode(idx_t n, const float *x, uint8_t *bytes) const override
encode a set of vectors
- Parameters
n – number of vectors
x – input vectors, size n * d
bytes – output encoded vectors, size n * sa_code_size()
-
virtual void sa_decode(idx_t n, const uint8_t *bytes, float *x) const override
decode a set of vectors
- Parameters
n – number of vectors
bytes – input encoded vectors, size n * sa_code_size()
x – output vectors, size n * d
-
virtual void reset() override
removes all elements from the database.
-
virtual void reconstruct_n(idx_t i0, idx_t ni, float *recons) const override
reconstruction using the codec interface
-
virtual void reconstruct(idx_t key, float *recons) const override
Reconstruct a stored vector (or an approximation if lossy coding)
this function may not be defined for some indexes
- Parameters
key – id of the vector to reconstruct
recons – reconstucted vector (size d)
-
virtual size_t sa_code_size() const override
size of the produced codes in bytes
-
virtual size_t remove_ids(const IDSelector &sel) override
remove some ids. NB that Because of the structure of the indexing structure, the semantics of this operation are different from the usual ones: the new ids are shifted
-
virtual FlatCodesDistanceComputer *get_FlatCodesDistanceComputer() const
a FlatCodesDistanceComputer offers a distance_to_code method
-
virtual void add_with_ids(idx_t n, const float *x, const idx_t *xids)
Same as add, but stores xids instead of sequential ids.
The default implementation fails with an assertion, as it is not supported by all indexes.
- Parameters
xids – if non-null, ids to store for the vectors (size n)
-
virtual void range_search(idx_t n, const float *x, float radius, RangeSearchResult *result) const
query n vectors of dimension d to the index.
return all vectors with distance < radius. Note that many indexes do not implement the range_search (only the k-NN search is mandatory).
- Parameters
x – input vectors to search, size n * d
radius – search radius
result – result table
-
virtual void assign(idx_t n, const float *x, idx_t *labels, idx_t k = 1) const
return the indexes of the k vectors closest to the query x.
This function is identical as search but only return labels of neighbors.
- Parameters
x – input vectors to search, size n * d
labels – output labels of the NNs, size n*k
-
virtual void search_and_reconstruct(idx_t n, const float *x, idx_t k, float *distances, idx_t *labels, float *recons) const
Similar to search, but also reconstructs the stored vectors (or an approximation in the case of lossy coding) for the search results.
If there are not enough results for a query, the resulting arrays is padded with -1s.
- Parameters
recons – reconstructed vectors size (n, k, d)
-
virtual void compute_residual(const float *x, float *residual, idx_t key) const
Computes a residual vector after indexing encoding.
The residual vector is the difference between a vector and the reconstruction that can be decoded from its representation in the index. The residual can be used for multiple-stage indexing methods, like IndexIVF’s methods.
- Parameters
x – input vector, size d
residual – output residual vector, size d
key – encoded index, as returned by search and assign
-
virtual void compute_residual_n(idx_t n, const float *xs, float *residuals, const idx_t *keys) const
Computes a residual vector after indexing encoding (batch form). Equivalent to calling compute_residual for each vector.
The residual vector is the difference between a vector and the reconstruction that can be decoded from its representation in the index. The residual can be used for multiple-stage indexing methods, like IndexIVF’s methods.
- Parameters
n – number of vectors
xs – input vectors, size (n x d)
residuals – output residual vectors, size (n x d)
keys – encoded index, as returned by search and assign
Public Members
-
Level1Quantizer q1
first level quantizer
-
ProductQuantizer pq
second level quantizer is always a PQ
-
size_t code_size_1
size of the code for the first level (ceil(log8(q1.nlist)))
-
size_t code_size_2
size of the code for the second level
-
size_t code_size
-
std::vector<uint8_t> codes
encoded dataset, size ntotal * code_size
-
int d
vector dimension
-
bool verbose
verbosity level
-
MetricType metric_type
type of metric this index uses for search
-
float metric_arg
argument of the metric type
-
using idx_t = int64_t
-
struct IndexAdditiveQuantizer : public faiss::IndexFlatCodes
- #include <IndexAdditiveQuantizer.h>
Abstract class for additive quantizers. The search functions are in common.
Subclassed by faiss::IndexLocalSearchQuantizer, faiss::IndexResidualQuantizer
Public Types
-
using Search_type_t = AdditiveQuantizer::Search_type_t
-
using idx_t = int64_t
all indices are this type
-
using component_t = float
-
using distance_t = float
Public Functions
-
explicit IndexAdditiveQuantizer(idx_t d = 0, AdditiveQuantizer *aq = nullptr, MetricType metric = METRIC_L2)
-
virtual void search(idx_t n, const float *x, idx_t k, float *distances, idx_t *labels) const override
query n vectors of dimension d to the index.
return at most k vectors. If there are not enough results for a query, the result array is padded with -1s.
- Parameters
x – input vectors to search, size n * d
labels – output labels of the NNs, size n*k
distances – output pairwise distances, size n*k
-
virtual void sa_encode(idx_t n, const float *x, uint8_t *bytes) const override
encode a set of vectors
- Parameters
n – number of vectors
x – input vectors, size n * d
bytes – output encoded vectors, size n * sa_code_size()
-
virtual void sa_decode(idx_t n, const uint8_t *bytes, float *x) const override
decode a set of vectors
- Parameters
n – number of vectors
bytes – input encoded vectors, size n * sa_code_size()
x – output vectors, size n * d
-
virtual FlatCodesDistanceComputer *get_FlatCodesDistanceComputer() const override
a FlatCodesDistanceComputer offers a distance_to_code method
-
virtual void reset() override
removes all elements from the database.
-
virtual void reconstruct_n(idx_t i0, idx_t ni, float *recons) const override
reconstruction using the codec interface
-
virtual void reconstruct(idx_t key, float *recons) const override
Reconstruct a stored vector (or an approximation if lossy coding)
this function may not be defined for some indexes
- Parameters
key – id of the vector to reconstruct
recons – reconstucted vector (size d)
-
virtual size_t sa_code_size() const override
size of the produced codes in bytes
-
virtual size_t remove_ids(const IDSelector &sel) override
remove some ids. NB that Because of the structure of the indexing structure, the semantics of this operation are different from the usual ones: the new ids are shifted
-
inline virtual DistanceComputer *get_distance_computer() const override
Get a DistanceComputer (defined in AuxIndexStructures) object for this kind of index.
DistanceComputer is implemented for indexes that support random access of their vectors.
-
virtual void train(idx_t n, const float *x)
Perform training on a representative set of vectors
- Parameters
n – nb of training vectors
x – training vecors, size n * d
-
virtual void add_with_ids(idx_t n, const float *x, const idx_t *xids)
Same as add, but stores xids instead of sequential ids.
The default implementation fails with an assertion, as it is not supported by all indexes.
- Parameters
xids – if non-null, ids to store for the vectors (size n)
-
virtual void range_search(idx_t n, const float *x, float radius, RangeSearchResult *result) const
query n vectors of dimension d to the index.
return all vectors with distance < radius. Note that many indexes do not implement the range_search (only the k-NN search is mandatory).
- Parameters
x – input vectors to search, size n * d
radius – search radius
result – result table
-
virtual void assign(idx_t n, const float *x, idx_t *labels, idx_t k = 1) const
return the indexes of the k vectors closest to the query x.
This function is identical as search but only return labels of neighbors.
- Parameters
x – input vectors to search, size n * d
labels – output labels of the NNs, size n*k
-
virtual void search_and_reconstruct(idx_t n, const float *x, idx_t k, float *distances, idx_t *labels, float *recons) const
Similar to search, but also reconstructs the stored vectors (or an approximation in the case of lossy coding) for the search results.
If there are not enough results for a query, the resulting arrays is padded with -1s.
- Parameters
recons – reconstructed vectors size (n, k, d)
-
virtual void compute_residual(const float *x, float *residual, idx_t key) const
Computes a residual vector after indexing encoding.
The residual vector is the difference between a vector and the reconstruction that can be decoded from its representation in the index. The residual can be used for multiple-stage indexing methods, like IndexIVF’s methods.
- Parameters
x – input vector, size d
residual – output residual vector, size d
key – encoded index, as returned by search and assign
-
virtual void compute_residual_n(idx_t n, const float *xs, float *residuals, const idx_t *keys) const
Computes a residual vector after indexing encoding (batch form). Equivalent to calling compute_residual for each vector.
The residual vector is the difference between a vector and the reconstruction that can be decoded from its representation in the index. The residual can be used for multiple-stage indexing methods, like IndexIVF’s methods.
- Parameters
n – number of vectors
xs – input vectors, size (n x d)
residuals – output residual vectors, size (n x d)
keys – encoded index, as returned by search and assign
-
using Search_type_t = AdditiveQuantizer::Search_type_t
-
struct IndexAdditiveQuantizerFastScan : public faiss::IndexFastScan
- #include <IndexAdditiveQuantizerFastScan.h>
Fast scan version of IndexAQ. Works for 4-bit AQ for now.
The codes are not stored sequentially but grouped in blocks of size bbs. This makes it possible to compute distances quickly with SIMD instructions.
Implementations: 12: blocked loop with internal loop on Q with qbs 13: same with reservoir accumulator to store results 14: no qbs with heap accumulator 15: no qbs with reservoir accumulator
Subclassed by faiss::IndexLocalSearchQuantizerFastScan, faiss::IndexResidualQuantizerFastScan
Public Types
-
using Search_type_t = AdditiveQuantizer::Search_type_t
-
using idx_t = int64_t
all indices are this type
-
using component_t = float
-
using distance_t = float
Public Functions
-
explicit IndexAdditiveQuantizerFastScan(AdditiveQuantizer *aq, MetricType metric = METRIC_L2, int bbs = 32)
-
void init(AdditiveQuantizer *aq, MetricType metric = METRIC_L2, int bbs = 32)
-
IndexAdditiveQuantizerFastScan()
-
~IndexAdditiveQuantizerFastScan() override
-
explicit IndexAdditiveQuantizerFastScan(const IndexAdditiveQuantizer &orig, int bbs = 32)
build from an existing IndexAQ
-
virtual void train(idx_t n, const float *x) override
Perform training on a representative set of vectors
- Parameters
n – nb of training vectors
x – training vecors, size n * d
-
void estimate_norm_scale(idx_t n, const float *x)
-
virtual void compute_codes(uint8_t *codes, idx_t n, const float *x) const override
-
virtual void compute_float_LUT(float *lut, idx_t n, const float *x) const override
-
virtual void search(idx_t n, const float *x, idx_t k, float *distances, idx_t *labels) const override
query n vectors of dimension d to the index.
return at most k vectors. If there are not enough results for a query, the result array is padded with -1s.
- Parameters
x – input vectors to search, size n * d
labels – output labels of the NNs, size n*k
distances – output pairwise distances, size n*k
-
virtual void sa_decode(idx_t n, const uint8_t *bytes, float *x) const override
Decode a set of vectors.
NOTE: The codes in the IndexAdditiveQuantizerFastScan object are non- contiguous. But this method requires a contiguous representation.
- Parameters
n – number of vectors
bytes – input encoded vectors, size n * code_size
x – output vectors, size n * d
-
void init_fastscan(int d, size_t M, size_t nbits, MetricType metric, int bbs)
-
virtual void reset() override
removes all elements from the database.
-
virtual void add(idx_t n, const float *x) override
Add n vectors of dimension d to the index.
Vectors are implicitly assigned labels ntotal .. ntotal + n - 1 This function slices the input vectors in chunks smaller than blocksize_add and calls add_core.
- Parameters
x – input matrix, size n * d
-
template<bool is_max, class Scaler>
void search_dispatch_implem(idx_t n, const float *x, idx_t k, float *distances, idx_t *labels, const Scaler &scaler) const
-
template<class Cfloat, class Scaler>
void search_implem_234(idx_t n, const float *x, idx_t k, float *distances, idx_t *labels, const Scaler &scaler) const
-
template<class C, class Scaler>
void search_implem_12(idx_t n, const float *x, idx_t k, float *distances, idx_t *labels, int impl, const Scaler &scaler) const
-
template<class C, class Scaler>
void search_implem_14(idx_t n, const float *x, idx_t k, float *distances, idx_t *labels, int impl, const Scaler &scaler) const
-
virtual void reconstruct(idx_t key, float *recons) const override
Reconstruct a stored vector (or an approximation if lossy coding)
this function may not be defined for some indexes
- Parameters
key – id of the vector to reconstruct
recons – reconstucted vector (size d)
-
virtual void add_with_ids(idx_t n, const float *x, const idx_t *xids)
Same as add, but stores xids instead of sequential ids.
The default implementation fails with an assertion, as it is not supported by all indexes.
- Parameters
xids – if non-null, ids to store for the vectors (size n)
-
virtual void range_search(idx_t n, const float *x, float radius, RangeSearchResult *result) const
query n vectors of dimension d to the index.
return all vectors with distance < radius. Note that many indexes do not implement the range_search (only the k-NN search is mandatory).
- Parameters
x – input vectors to search, size n * d
radius – search radius
result – result table
-
virtual void assign(idx_t n, const float *x, idx_t *labels, idx_t k = 1) const
return the indexes of the k vectors closest to the query x.
This function is identical as search but only return labels of neighbors.
- Parameters
x – input vectors to search, size n * d
labels – output labels of the NNs, size n*k
-
virtual size_t remove_ids(const IDSelector &sel)
removes IDs from the index. Not supported by all indexes. Returns the number of elements removed.
-
virtual void reconstruct_n(idx_t i0, idx_t ni, float *recons) const
Reconstruct vectors i0 to i0 + ni - 1
this function may not be defined for some indexes
- Parameters
recons – reconstucted vector (size ni * d)
-
virtual void search_and_reconstruct(idx_t n, const float *x, idx_t k, float *distances, idx_t *labels, float *recons) const
Similar to search, but also reconstructs the stored vectors (or an approximation in the case of lossy coding) for the search results.
If there are not enough results for a query, the resulting arrays is padded with -1s.
- Parameters
recons – reconstructed vectors size (n, k, d)
-
virtual void compute_residual(const float *x, float *residual, idx_t key) const
Computes a residual vector after indexing encoding.
The residual vector is the difference between a vector and the reconstruction that can be decoded from its representation in the index. The residual can be used for multiple-stage indexing methods, like IndexIVF’s methods.
- Parameters
x – input vector, size d
residual – output residual vector, size d
key – encoded index, as returned by search and assign
-
virtual void compute_residual_n(idx_t n, const float *xs, float *residuals, const idx_t *keys) const
Computes a residual vector after indexing encoding (batch form). Equivalent to calling compute_residual for each vector.
The residual vector is the difference between a vector and the reconstruction that can be decoded from its representation in the index. The residual can be used for multiple-stage indexing methods, like IndexIVF’s methods.
- Parameters
n – number of vectors
xs – input vectors, size (n x d)
residuals – output residual vectors, size (n x d)
keys – encoded index, as returned by search and assign
-
virtual DistanceComputer *get_distance_computer() const
Get a DistanceComputer (defined in AuxIndexStructures) object for this kind of index.
DistanceComputer is implemented for indexes that support random access of their vectors.
-
virtual size_t sa_code_size() const
size of the produced codes in bytes
-
virtual void sa_encode(idx_t n, const float *x, uint8_t *bytes) const
encode a set of vectors
- Parameters
n – number of vectors
x – input vectors, size n * d
bytes – output encoded vectors, size n * sa_code_size()
Public Members
-
AdditiveQuantizer *aq
-
bool rescale_norm = true
-
int norm_scale = 1
-
size_t max_train_points = 0
-
int implem = 0
-
int skip = 0
-
int bbs
-
int qbs = 0
-
size_t M
-
size_t nbits
-
size_t ksub
-
size_t code_size
-
size_t ntotal2
-
size_t M2
-
AlignedTable<uint8_t> codes
-
const uint8_t *orig_codes = nullptr
-
int d
vector dimension
-
bool verbose
verbosity level
-
using Search_type_t = AdditiveQuantizer::Search_type_t
-
using IndexReplicas = IndexReplicasTemplate<Index>