Namespace faiss

namespace faiss

Copyright (c) Facebook, Inc. and its affiliates.

This source code is licensed under the MIT license found in the LICENSE file in the root directory of this source tree.

Throughout the library, vectors are provided as float * pointers. Most algorithms can be optimized when several vectors are processed (added/searched) together in a batch. In this case, they are passed in as a matrix. When n vectors of size d are provided as float * x, component j of vector i is

x[ i * d + j ]

where 0 <= i < n and 0 <= j < d. In other words, matrices are always compact. When specifying the size of the matrix, we call it an n*d matrix, which implies a row-major storage.
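
For instance, a minimal helper (illustrative only, not part of the Faiss API) that reads component j of vector i from such a compact, row-major matrix:

```cpp
// Illustrative only: component j of vector i in a compact, row-major
// n*d matrix passed as a bare float pointer.
#include <cstddef>

inline float get_component(const float* x, size_t d, size_t i, size_t j) {
    return x[i * d + j];
}
```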

I/O functions can read/write to a filename, a file handle or to an object that abstracts the medium.

The read functions return objects that should be deallocated with delete. All references within these objects are owned by the object.

Definition of inverted lists + a few common classes that implement the interface.

Since IVF (inverted file) indexes are of so much use for large-scale use cases, we group a few functions related to them in this small library. Most functions work both on IndexIVFs and IndexIVFs embedded within an IndexPreTransform.

In this file are the implementations of extra metrics beyond L2 and inner product.

Defines a few objects that apply transformations to a set of vectors. Often these are pre-processing steps.

Typedefs

using IndexReplicas = IndexReplicasTemplate<Index>
using IndexBinaryReplicas = IndexReplicasTemplate<IndexBinary>
using IndexShards = IndexShardsTemplate<Index>
using IndexBinaryShards = IndexShardsTemplate<IndexBinary>
using ConcatenatedInvertedLists = HStackInvertedLists
using IndexIDMap = IndexIDMapTemplate<Index>
using IndexBinaryIDMap = IndexIDMapTemplate<IndexBinary>
using IndexIDMap2 = IndexIDMap2Template<Index>
using IndexBinaryIDMap2 = IndexIDMap2Template<IndexBinary>
typedef HeapArray<CMin<float, int64_t>> float_minheap_array_t
typedef HeapArray<CMin<int, int64_t>> int_minheap_array_t
typedef HeapArray<CMax<float, int64_t>> float_maxheap_array_t
typedef HeapArray<CMax<int, int64_t>> int_maxheap_array_t

Enums

enum MetricType

The metric space for vector comparison for Faiss indices and algorithms.

Most algorithms support both inner product and L2, with the flat (brute-force) indices supporting additional metric types for vector comparison.

Values:

enumerator METRIC_INNER_PRODUCT

maximum inner product search

enumerator METRIC_L2

squared L2 search

enumerator METRIC_L1

L1 (aka cityblock)

enumerator METRIC_Linf

infinity distance

enumerator METRIC_Lp

L_p distance, p is given by a faiss::Index metric_arg

enumerator METRIC_Canberra

some additional metrics defined in scipy.spatial.distance

enumerator METRIC_BrayCurtis
enumerator METRIC_JensenShannon
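
As a hedged sketch, the flat (brute-force) index can be constructed with one of these metric types; the IndexFlat(d, metric) constructor used here is an assumption based on the standard Faiss API:

```cpp
// Sketch: brute-force indexes over non-default metric types.
#include <faiss/IndexFlat.h>

int main() {
    int d = 32;
    faiss::IndexFlat index_l1(d, faiss::METRIC_L1);             // cityblock distance
    faiss::IndexFlat index_ip(d, faiss::METRIC_INNER_PRODUCT);  // max inner product
    return 0;
}
```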

Functions

Index *clone_index(const Index*)
float kmeans_clustering(size_t d, size_t n, size_t k, const float *x, float *centroids)

simplified interface

Parameters
  • d – dimension of the data

  • n – nb of training vectors

  • k – nb of output centroids

  • x – training set (size n * d)

  • centroids – output centroids (size k * d)

Returns

final quantization error
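
A minimal sketch of calling this simplified interface, assuming it is declared in <faiss/Clustering.h>:

```cpp
#include <faiss/Clustering.h>
#include <vector>

// Cluster n vectors of dimension d into k centroids and return the
// final quantization error.
float simple_kmeans(const float* x, size_t n, size_t d, size_t k,
                    std::vector<float>& centroids) {
    centroids.resize(k * d);  // output buffer, size k * d
    return faiss::kmeans_clustering(d, n, k, x, centroids.data());
}
```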

Index *index_factory(int d, const char *description, MetricType metric = METRIC_L2)

Build an index with the sequence of processing steps described in the string.

IndexBinary *index_binary_factory(int d, const char *description)
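
A hedged usage sketch of the factory function; the header name <faiss/index_factory.h> is an assumption (older releases expose the function through <faiss/AutoTune.h>):

```cpp
#include <faiss/index_factory.h>
#include <memory>

int main() {
    int d = 128;
    // "IVF256,PQ16" = IVF coarse quantizer with 256 lists + 16-byte PQ codes
    std::unique_ptr<faiss::Index> index(
            faiss::index_factory(d, "IVF256,PQ16", faiss::METRIC_L2));
    // index->train(n, xt); index->add(n, xb); index->search(nq, xq, k, D, I);
    return 0;
}
```
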
void write_index(const Index *idx, const char *fname)
void write_index(const Index *idx, FILE *f)
void write_index(const Index *idx, IOWriter *writer)
void write_index_binary(const IndexBinary *idx, const char *fname)
void write_index_binary(const IndexBinary *idx, FILE *f)
void write_index_binary(const IndexBinary *idx, IOWriter *writer)
Index *read_index(const char *fname, int io_flags = 0)
Index *read_index(FILE *f, int io_flags = 0)
Index *read_index(IOReader *reader, int io_flags = 0)
IndexBinary *read_index_binary(const char *fname, int io_flags = 0)
IndexBinary *read_index_binary(FILE *f, int io_flags = 0)
IndexBinary *read_index_binary(IOReader *reader, int io_flags = 0)
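
A minimal serialization round trip, assuming the declarations above live in <faiss/index_io.h>; as noted earlier, the object returned by read_index must be deallocated with delete (here via unique_ptr):

```cpp
#include <faiss/IndexFlat.h>
#include <faiss/index_io.h>
#include <memory>

int main() {
    faiss::IndexFlatL2 index(64);                  // empty 64-d flat index
    faiss::write_index(&index, "flat.index");      // write to a file
    std::unique_ptr<faiss::Index> loaded(
            faiss::read_index("flat.index"));      // read it back
    return 0;
}
```
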
void write_VectorTransform(const VectorTransform *vt, const char *fname)
VectorTransform *read_VectorTransform(const char *fname)
ProductQuantizer *read_ProductQuantizer(const char *fname)
ProductQuantizer *read_ProductQuantizer(IOReader *reader)
void write_ProductQuantizer(const ProductQuantizer *pq, const char *fname)
void write_ProductQuantizer(const ProductQuantizer *pq, IOWriter *f)
void write_InvertedLists(const InvertedLists *ils, IOWriter *f)
InvertedLists *read_InvertedLists(IOReader *reader, int io_flags = 0)
void initialize_IVFPQ_precomputed_table(int &use_precomputed_table, const Index *quantizer, const ProductQuantizer &pq, AlignedTable<float> &precomputed_table, bool verbose)

Pre-compute distance tables for IVFPQ with by-residual and METRIC_L2

Parameters
  • use_precomputed_table – (I/O) -1: force disable; 0: decide heuristically (default: use tables only if they are < precomputed_table_max_bytes) and set use_precomputed_table on output; 1: tables that work for all quantizers (size 256 * nlist * M); 2: specific version for MultiIndexQuantizer (much more compact)

  • precomputed_table – precomputed table to initialize

inline uint64_t lo_build(uint64_t list_id, uint64_t offset)
inline uint64_t lo_listno(uint64_t lo)
inline uint64_t lo_offset(uint64_t lo)
template<int A = 32>
inline bool is_aligned_pointer(const void *x)
float fvec_L2sqr(const float *x, const float *y, size_t d)

Squared L2 distance between two vectors.

float fvec_inner_product(const float *x, const float *y, size_t d)

inner product

float fvec_L1(const float *x, const float *y, size_t d)

L1 distance.

float fvec_Linf(const float *x, const float *y, size_t d)

infinity distance
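
A small sketch exercising these low-level helpers, assuming they are declared in <faiss/utils/distances.h>:

```cpp
#include <faiss/utils/distances.h>
#include <cstdio>

int main() {
    const size_t d = 4;
    float x[d] = {1, 0, 0, 0};
    float y[d] = {0, 1, 0, 0};
    printf("L2^2  = %g\n", faiss::fvec_L2sqr(x, y, d));          // 2
    printf("<x,y> = %g\n", faiss::fvec_inner_product(x, y, d));  // 0
    printf("L1    = %g\n", faiss::fvec_L1(x, y, d));             // 2
    printf("Linf  = %g\n", faiss::fvec_Linf(x, y, d));           // 1
    return 0;
}
```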

void pairwise_L2sqr(int64_t d, int64_t nq, const float *xq, int64_t nb, const float *xb, float *dis, int64_t ldq = -1, int64_t ldb = -1, int64_t ldd = -1)

Compute pairwise distances between sets of vectors

Parameters
  • d – dimension of the vectors

  • nq – nb of query vectors

  • nb – nb of database vectors

  • xq – query vectors (size nq * d)

  • xb – database vectors (size nb * d)

  • dis – output distances (size nq * nb)

  • ldq, ldb, ldd – strides for the matrices

void fvec_inner_products_ny(float *ip, const float *x, const float *y, size_t d, size_t ny)
void fvec_L2sqr_ny(float *dis, const float *x, const float *y, size_t d, size_t ny)
float fvec_norm_L2sqr(const float *x, size_t d)

squared norm of a vector

void fvec_norms_L2(float *norms, const float *x, size_t d, size_t nx)

compute the L2 norms for a set of vectors

Parameters
  • norms – output norms, size nx

  • x – set of vectors, size nx * d

void fvec_norms_L2sqr(float *norms, const float *x, size_t d, size_t nx)

same as fvec_norms_L2, but computes squared norms

void fvec_renorm_L2(size_t d, size_t nx, float *x)
void inner_product_to_L2sqr(float *dis, const float *nr1, const float *nr2, size_t n1, size_t n2)
void fvec_add(size_t d, const float *a, const float *b, float *c)

compute c := a + b for vectors

c and a can overlap, c and b can overlap

Parameters
  • a – size d

  • b – size d

  • c – size d

void fvec_add(size_t d, const float *a, float b, float *c)

compute c := a + b for a, c vectors and b a scalar

c and a can overlap

Parameters
  • a – size d

  • c – size d

void fvec_sub(size_t d, const float *a, const float *b, float *c)

compute c := a - b for vectors

c and a can overlap, c and b can overlap

Parameters
  • a – size d

  • b – size d

  • c – size d

void fvec_inner_products_by_idx(float *ip, const float *x, const float *y, const int64_t *ids, size_t d, size_t nx, size_t ny)
void fvec_L2sqr_by_idx(float *dis, const float *x, const float *y, const int64_t *ids, size_t d, size_t nx, size_t ny)
void pairwise_indexed_L2sqr(size_t d, size_t n, const float *x, const int64_t *ix, const float *y, const int64_t *iy, float *dis)

compute dis[j] = L2sqr(x[ix[j]], y[iy[j]]) for all j = 0..n-1

Parameters
  • x – size (max(ix) + 1, d)

  • y – size (max(iy) + 1, d)

  • ix – size n

  • iy – size n

  • dis – size n

void pairwise_indexed_inner_product(size_t d, size_t n, const float *x, const int64_t *ix, const float *y, const int64_t *iy, float *dis)
void knn_inner_product(const float *x, const float *y, size_t d, size_t nx, size_t ny, float_minheap_array_t *res)

Return the k nearest neighbors of each of the nx vectors x among the ny vectors y, w.r.t. max inner product.

Parameters
  • x – query vectors, size nx * d

  • y – database vectors, size ny * d

  • res – result array, which also provides k. Sorted on output

void knn_L2sqr(const float *x, const float *y, size_t d, size_t nx, size_t ny, float_maxheap_array_t *res, const float *y_norm2 = nullptr)

Same as knn_inner_product, for the L2 distance

Parameters

y_norm2 – norms for the y vectors (nullptr or size ny)
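
A hedged sketch of a brute-force k-NN call following the signature listed above. The float_maxheap_array_t aggregate wraps caller-allocated result buffers; its member order (nh, k, ids, val) follows the HeapArray documentation further below, and the headers are assumptions:

```cpp
#include <faiss/utils/Heap.h>
#include <faiss/utils/distances.h>

// Find the k nearest database vectors (squared L2) for each of the nq queries.
// distances and labels must be preallocated with nq * k entries.
void knn_example(const float* xq, size_t nq, const float* xb, size_t nb,
                 size_t d, size_t k, float* distances, int64_t* labels) {
    faiss::float_maxheap_array_t res = {nq, k, labels, distances};
    faiss::knn_L2sqr(xq, xb, d, nq, nb, &res);  // results are sorted on output
}
```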

void knn_inner_products_by_idx(const float *x, const float *y, const int64_t *ids, size_t d, size_t nx, size_t ny, float_minheap_array_t *res)
void knn_L2sqr_by_idx(const float *x, const float *y, const int64_t *ids, size_t d, size_t nx, size_t ny, float_maxheap_array_t *res)
void range_search_L2sqr(const float *x, const float *y, size_t d, size_t nx, size_t ny, float radius, RangeSearchResult *result)

Return all the vectors y within a given radius of each of the nx query vectors x, w.r.t. the squared L2 distance.

Parameters
  • x – query vectors, size nx * d

  • y – database vectors, size ny * d

  • radius – search radius around the x vectors

  • result – result structure

void range_search_inner_product(const float *x, const float *y, size_t d, size_t nx, size_t ny, float radius, RangeSearchResult *result)

same as range_search_L2sqr for the inner product similarity

void compute_PQ_dis_tables_dsub2(size_t d, size_t ksub, const float *centroids, size_t nx, const float *x, bool is_inner_product, float *dis_tables)

specialized function for PQ2

void pairwise_extra_distances(int64_t d, int64_t nq, const float *xq, int64_t nb, const float *xb, MetricType mt, float metric_arg, float *dis, int64_t ldq = -1, int64_t ldb = -1, int64_t ldd = -1)
void knn_extra_metrics(const float *x, const float *y, size_t d, size_t nx, size_t ny, MetricType mt, float metric_arg, float_maxheap_array_t *res)
DistanceComputer *get_extra_distance_computer(size_t d, MetricType mt, float metric_arg, size_t nb, const float *xb)

get a DistanceComputer that refers to this type of distance and indexes a flat array of size nb

SPECIALIZED_HC (4)
SPECIALIZED_HC (8)
SPECIALIZED_HC (16)
SPECIALIZED_HC (20)
SPECIALIZED_HC (32)
SPECIALIZED_HC (64)
inline int generalized_hamming_64(uint64_t a)
void generalized_hammings_knn_hc(int_maxheap_array_t *ha, const uint8_t *a, const uint8_t *b, size_t nb, size_t code_size, int ordered = true)

generalized Hamming distances (= count number of code bytes that are the same)

void bitvec_print(const uint8_t *b, size_t d)
void fvecs2bitvecs(const float *x, uint8_t *b, size_t d, size_t n)
void bitvecs2fvecs(const uint8_t *b, float *x, size_t d, size_t n)
void fvec2bitvec(const float *x, uint8_t *b, size_t d)
void bitvec_shuffle(size_t n, size_t da, size_t db, const int *order, const uint8_t *a, uint8_t *b)

Shuffle the bits from b(i, j) := a(i, order[j])

inline int popcount64(uint64_t x)
void hammings(const uint8_t *a, const uint8_t *b, size_t na, size_t nb, size_t nbytespercode, hamdis_t *dis)

Compute a set of Hamming distances between na and nb binary vectors

Parameters
  • a – size na * nbytespercode

  • b – size nb * nbytespercode

  • nbytespercode – should be multiple of 8

  • dis – output distances, size na * nb
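
A small sketch (headers assumed) computing the full na x nb table of Hamming distances between 8-byte codes:

```cpp
#include <faiss/utils/hamming.h>
#include <cstdint>
#include <vector>

int main() {
    size_t na = 4, nb = 6, code_bytes = 8;  // code size must be a multiple of 8
    std::vector<uint8_t> a(na * code_bytes, 0x0F);
    std::vector<uint8_t> b(nb * code_bytes, 0xFF);
    std::vector<faiss::hamdis_t> dis(na * nb);  // output, row-major na * nb
    faiss::hammings(a.data(), b.data(), na, nb, code_bytes, dis.data());
    // dis[i * nb + j] = Hamming distance between code i of a and code j of b
    return 0;
}
```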

void hammings_knn_hc(int_maxheap_array_t *ha, const uint8_t *a, const uint8_t *b, size_t nb, size_t ncodes, int ordered)

Return the k smallest Hamming distances for a set of binary query vectors, using a max heap.

Parameters
  • a – queries, size ha->nh * ncodes

  • b – database, size nb * ncodes

  • nb – number of database vectors

  • ncodes – size of the binary codes (bytes)

  • ordered – if != 0: order the results by decreasing distance (may be bottleneck for k/n > 0.01)

void hammings_knn(int_maxheap_array_t *ha, const uint8_t *a, const uint8_t *b, size_t nb, size_t ncodes, int ordered)
void hammings_knn_mc(const uint8_t *a, const uint8_t *b, size_t na, size_t nb, size_t k, size_t ncodes, int32_t *distances, int64_t *labels)

Return the k smallest Hamming distances for a set of binary query vectors, using counting max.

Parameters
  • a – queries, size na * ncodes

  • b – database, size nb * ncodes

  • na – number of query vectors

  • nb – number of database vectors

  • k – number of vectors/distances to return

  • ncodes – size of the binary codes (bytes)

  • distances – output distances from each query vector to its k nearest neighbors

  • labels – output ids of the k nearest neighbors to each query vector

void hamming_range_search(const uint8_t *a, const uint8_t *b, size_t na, size_t nb, int radius, size_t ncodes, RangeSearchResult *result)

same as hammings_knn except we are doing a range search with radius

void hamming_count_thres(const uint8_t *bs1, const uint8_t *bs2, size_t n1, size_t n2, hamdis_t ht, size_t ncodes, size_t *nptr)
size_t match_hamming_thres(const uint8_t *bs1, const uint8_t *bs2, size_t n1, size_t n2, hamdis_t ht, size_t ncodes, int64_t *idx, hamdis_t *dis)
void crosshamming_count_thres(const uint8_t *dbs, size_t n, hamdis_t ht, size_t ncodes, size_t *nptr)
hamdis_t hamming(const uint64_t *bs1, const uint64_t *bs2, size_t nwords)
template<class C>
inline void heap_pop(size_t k, typename C::T *bh_val, typename C::TI *bh_ids)

Pops the top element from the heap defined by bh_val[0..k-1] and bh_ids[0..k-1]. On output, the element at k-1 is undefined.

template<class C>
inline void heap_push(size_t k, typename C::T *bh_val, typename C::TI *bh_ids, typename C::T val, typename C::TI ids)

Pushes the element (val, ids) into the heap bh_val[0..k-2] and bh_ids[0..k-2]. On output, the element at k-1 is defined.

template<class C>
inline void heap_replace_top(size_t k, typename C::T *bh_val, typename C::TI *bh_ids, typename C::T val, typename C::TI ids)

Replace the top element from the heap defined by bh_val[0..k-1] and bh_ids[0..k-1].

template<typename T>
inline void minheap_pop(size_t k, T *bh_val, int64_t *bh_ids)
template<typename T>
inline void minheap_push(size_t k, T *bh_val, int64_t *bh_ids, T val, int64_t ids)
template<typename T>
inline void minheap_replace_top(size_t k, T *bh_val, int64_t *bh_ids, T val, int64_t ids)
template<typename T>
inline void maxheap_pop(size_t k, T *bh_val, int64_t *bh_ids)
template<typename T>
inline void maxheap_push(size_t k, T *bh_val, int64_t *bh_ids, T val, int64_t ids)
template<typename T>
inline void maxheap_replace_top(size_t k, T *bh_val, int64_t *bh_ids, T val, int64_t ids)
template<class C>
inline void heap_heapify(size_t k, typename C::T *bh_val, typename C::TI *bh_ids, const typename C::T *x = nullptr, const typename C::TI *ids = nullptr, size_t k0 = 0)
template<typename T>
inline void minheap_heapify(size_t k, T *bh_val, int64_t *bh_ids, const T *x = nullptr, const int64_t *ids = nullptr, size_t k0 = 0)
template<typename T>
inline void maxheap_heapify(size_t k, T *bh_val, int64_t *bh_ids, const T *x = nullptr, const int64_t *ids = nullptr, size_t k0 = 0)
template<class C>
inline void heap_addn(size_t k, typename C::T *bh_val, typename C::TI *bh_ids, const typename C::T *x, const typename C::TI *ids, size_t n)
template<typename T>
inline void minheap_addn(size_t k, T *bh_val, int64_t *bh_ids, const T *x, const int64_t *ids, size_t n)
template<typename T>
inline void maxheap_addn(size_t k, T *bh_val, int64_t *bh_ids, const T *x, const int64_t *ids, size_t n)
template<typename C>
inline size_t heap_reorder(size_t k, typename C::T *bh_val, typename C::TI *bh_ids)
template<typename T>
inline size_t minheap_reorder(size_t k, T *bh_val, int64_t *bh_ids)
template<typename T>
inline size_t maxheap_reorder(size_t k, T *bh_val, int64_t *bh_ids)
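
A hedged sketch of the typical pattern for keeping the k smallest values with the max-heap helpers above (the header is an assumption): heapify, replace the top whenever a better value arrives, then reorder:

```cpp
#include <faiss/utils/Heap.h>
#include <cstdio>
#include <vector>

int main() {
    const size_t k = 3;
    std::vector<float> bh_val(k);
    std::vector<int64_t> bh_ids(k);
    faiss::maxheap_heapify(k, bh_val.data(), bh_ids.data());  // fill with neutral values

    float data[] = {5.0f, 1.0f, 7.0f, 3.0f, 2.0f};
    for (int64_t i = 0; i < 5; i++) {
        if (data[i] < bh_val[0]) {  // bh_val[0] is the worst of the current k best
            faiss::maxheap_replace_top(k, bh_val.data(), bh_ids.data(), data[i], i);
        }
    }
    faiss::maxheap_reorder(k, bh_val.data(), bh_ids.data());  // sort ascending
    for (size_t j = 0; j < k; j++) {
        printf("%lld: %g\n", (long long)bh_ids[j], bh_val[j]);
    }
    return 0;
}
```
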
template<class C>
inline void indirect_heap_pop(size_t k, const typename C::T *bh_val, typename C::TI *bh_ids)
template<class C>
inline void indirect_heap_push(size_t k, const typename C::T *bh_val, typename C::TI *bh_ids, typename C::TI id)
template<typename T>
inline T cmin_nextafter(T x)
template<typename T>
inline T cmax_nextafter(T x)
template<>
inline float cmin_nextafter<float>(float x)
template<>
inline float cmax_nextafter<float>(float x)
template<>
inline uint16_t cmin_nextafter<uint16_t>(uint16_t x)
template<>
inline uint16_t cmax_nextafter<uint16_t>(uint16_t x)
template<class C>
C::T partition_fuzzy(typename C::T *vals, typename C::TI *ids, size_t n, size_t q_min, size_t q_max, size_t *q_out)

partitions the table into 0:q and q:n where all elements above q are >= all elements below q (for C = CMax, for CMin comparisons are reversed)

Returns the partition threshold. The elements q:n are destroyed on output.

template<class C>
inline C::T partition(typename C::T *vals, typename C::TI *ids, size_t n, size_t q)

simplified interface for when the partition is not fuzzy

void simd_histogram_8(const uint16_t *data, int n, uint16_t min, int shift, int *hist)

Low-level SIMD histogramming functions. 8-bin histogram of (x - min) >> shift; values outside the range are ignored. The data table should be aligned on 32 bytes.

void simd_histogram_16(const uint16_t *data, int n, uint16_t min, int shift, int *hist)

same for 16-bin histogram

void float_rand(float *x, size_t n, int64_t seed)
void float_randn(float *x, size_t n, int64_t seed)
void int64_rand(int64_t *x, size_t n, int64_t seed)
void byte_rand(uint8_t *x, size_t n, int64_t seed)
void int64_rand_max(int64_t *x, size_t n, uint64_t max, int64_t seed)
void rand_perm(int *perm, size_t n, int64_t seed)
inline simd16uint16 min(simd16uint16 a, simd16uint16 b)
inline simd16uint16 max(simd16uint16 a, simd16uint16 b)
inline simd16uint16 combine2x2(simd16uint16 a, simd16uint16 b)
inline uint32_t cmp_ge32(simd16uint16 d0, simd16uint16 d1, simd16uint16 thr)
inline uint32_t cmp_le32(simd16uint16 d0, simd16uint16 d1, simd16uint16 thr)
inline simd32uint8 uint16_to_uint8_saturate(simd16uint16 a, simd16uint16 b)
inline uint32_t get_MSBs(simd32uint8 a)

get most significant bit of each byte

inline simd32uint8 blendv(simd32uint8 a, simd32uint8 b, simd32uint8 mask)

use MSB of each byte of mask to select a byte between a and b

inline simd8float32 hadd(simd8float32 a, simd8float32 b)
inline simd8float32 unpacklo(simd8float32 a, simd8float32 b)
inline simd8float32 unpackhi(simd8float32 a, simd8float32 b)
inline simd8float32 fmadd(simd8float32 a, simd8float32 b, simd8float32 c)
inline simd16uint16 min(const simd16uint16 &av, const simd16uint16 &bv)
inline simd16uint16 max(const simd16uint16 &av, const simd16uint16 &bv)
inline simd16uint16 combine2x2(const simd16uint16 &a, const simd16uint16 &b)
inline uint32_t cmp_ge32(const simd16uint16 &d0, const simd16uint16 &d1, const simd16uint16 &thr)
inline uint32_t cmp_le32(const simd16uint16 &d0, const simd16uint16 &d1, const simd16uint16 &thr)
inline simd32uint8 uint16_to_uint8_saturate(const simd16uint16 &a, const simd16uint16 &b)
inline uint32_t get_MSBs(const simd32uint8 &a)

get most significant bit of each byte

inline simd32uint8 blendv(const simd32uint8 &a, const simd32uint8 &b, const simd32uint8 &mask)

use MSB of each byte of mask to select a byte between a and b

inline simd8float32 hadd(const simd8float32 &a, const simd8float32 &b)
inline simd8float32 unpacklo(const simd8float32 &a, const simd8float32 &b)
inline simd8float32 unpackhi(const simd8float32 &a, const simd8float32 &b)
inline simd8float32 fmadd(const simd8float32 &a, const simd8float32 &b, const simd8float32 &c)
std::string get_compile_options()

get compile options

double getmillisecs()

ms elapsed since some arbitrary epoch

size_t get_mem_usage_kb()

get current RSS usage in kB

uint64_t get_cycles()
void fvec_madd(size_t n, const float *a, float bf, const float *b, float *c)

compute c := a + bf * b for a, b and c tables

Parameters
  • n – size of the tables

  • a – size n

  • b – size n

  • c – result table, size n

int fvec_madd_and_argmin(size_t n, const float *a, float bf, const float *b, float *c)

same as fvec_madd, also return index of the min of the result table

Returns

index of the min of table c
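
A minimal sketch of fvec_madd above (header assumed to be <faiss/utils/distances.h>):

```cpp
#include <faiss/utils/distances.h>
#include <vector>

int main() {
    size_t n = 8;
    std::vector<float> a(n, 1.0f), b(n, 2.0f), c(n);
    faiss::fvec_madd(n, a.data(), 0.5f, b.data(), c.data());  // c[i] = 1 + 0.5 * 2 = 2
    return 0;
}
```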

void reflection(const float *u, float *x, size_t n, size_t d, size_t nu)
void matrix_qr(int m, int n, float *a)

compute the Q of the QR decomposition for m > n

Parameters

a – size n * m: input matrix and output Q

void ranklist_handle_ties(int k, int64_t *idx, const float *dis)

distances are supposed to be sorted. Sorts indices with same distance

size_t ranklist_intersection_size(size_t k1, const int64_t *v1, size_t k2, const int64_t *v2)

count the number of common elements between v1 and v2; algorithm = sorting + bisection to avoid double-counting duplicates

size_t merge_result_table_with(size_t n, size_t k, int64_t *I0, float *D0, const int64_t *I1, const float *D1, bool keep_min = true, int64_t translation = 0)

merge a result table into another one

Parameters
  • I0, D0 – first result table, size (n, k)

  • I1, D1 – second result table, size (n, k)

  • keep_min – if true, keep min values, otherwise keep max

  • translation – add this value to all I1’s indexes

Returns

nb of values that were taken from the second table

double imbalance_factor(int n, int k, const int64_t *assign)

a balanced assignment has an IF of 1

double imbalance_factor(int k, const int *hist)

same, takes a histogram as input

void fvec_argsort(size_t n, const float *vals, size_t *perm)
void fvec_argsort_parallel(size_t n, const float *vals, size_t *perm)
int ivec_hist(size_t n, const int *v, int vmax, int *hist)

compute histogram on v

void bincode_hist(size_t n, size_t nbits, const uint8_t *codes, int *hist)

Compute histogram of bits on a code array

Parameters
  • codes – size(n, nbits / 8)

  • hist – size(nbits): nb of 1s in the array of codes

size_t ivec_checksum(size_t n, const int *a)

compute a checksum on a table.

const float *fvecs_maybe_subsample(size_t d, size_t *n, size_t nmax, const float *x, bool verbose = false, int64_t seed = 1234)

randomly subsamples a set of vectors if there are too many of them

Parameters
  • d – dimension of the vectors

  • n – on input: nb of input vectors, output: nb of output vectors

  • nmax – max nb of vectors to keep

  • x – input array, size *n-by-d

  • seed – random seed to use for sampling

Returns

x or an array allocated with new [] with *n vectors

void binary_to_real(size_t d, const uint8_t *x_in, float *x_out)

Convert binary vector to +1/-1 valued float vector.

Parameters
  • d – dimension of the vector (multiple of 8)

  • x_in – input binary vector (uint8_t table of size d / 8)

  • x_out – output float vector (float table of size d)

void real_to_binary(size_t d, const float *x_in, uint8_t *x_out)

Convert float vector to binary vector. Components > 0 are converted to 1, others to 0.

Parameters
  • d – dimension of the vector (multiple of 8)

  • x_in – input float vector (float table of size d)

  • x_out – output binary vector (uint8_t table of size d / 8)
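
A small sketch (the header <faiss/utils/utils.h> is an assumption) round-tripping a float vector through its sign-based binary code with real_to_binary and binary_to_real:

```cpp
#include <faiss/utils/utils.h>
#include <cstdint>
#include <vector>

int main() {
    size_t d = 16;  // must be a multiple of 8
    std::vector<float> x(d);
    for (size_t i = 0; i < d; i++) {
        x[i] = (i % 3 == 0) ? 1.5f : -0.5f;
    }

    std::vector<uint8_t> code(d / 8);
    faiss::real_to_binary(d, x.data(), code.data());    // component > 0 -> bit 1, else 0

    std::vector<float> back(d);
    faiss::binary_to_real(d, code.data(), back.data()); // bits -> +1/-1 floats
    return 0;
}
```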

uint64_t hash_bytes(const uint8_t *bytes, int64_t n)

A reasonable hashing function

bool check_openmp()

Whether OpenMP annotations were respected.

Variables

const int IO_FLAG_READ_ONLY = 2
const int IO_FLAG_ONDISK_SAME_DIR = 4
const int IO_FLAG_SKIP_IVF_DATA = 8
const int IO_FLAG_MMAP = IO_FLAG_SKIP_IVF_DATA | 0x646f0000
FAISS_API IndexBinaryHashStats indexBinaryHash_stats
FAISS_API IndexIVFStats indexIVF_stats
FAISS_API size_t precomputed_table_max_bytes
FAISS_API IndexIVFPQStats indexIVFPQ_stats
FAISS_API IVFFastScanStats IVFFastScan_stats
FAISS_API IndexPQStats indexPQ_stats
FAISS_API FastScanStats FastScan_stats
FAISS_API int distance_compute_blas_threshold
FAISS_API int distance_compute_blas_query_bs
FAISS_API int distance_compute_blas_database_bs
FAISS_API int distance_compute_min_k_reservoir
const uint8_t hamdis_tab_ham_bytes[256]
FAISS_API size_t hamming_batch_size
FAISS_API PartitionStats partition_stats
struct AdditiveCoarseQuantizer : public faiss::Index
#include <IndexAdditiveQuantizer.h>

A “virtual” index where the elements are the residual quantizer centroids.

Intended for use as a coarse quantizer in an IndexIVF.

Subclassed by faiss::LocalSearchCoarseQuantizer, faiss::ResidualCoarseQuantizer

Public Types

using idx_t = int64_t

all indices are this type

using component_t = float
using distance_t = float

Public Functions

explicit AdditiveCoarseQuantizer(idx_t d = 0, AdditiveQuantizer *aq = nullptr, MetricType metric = METRIC_L2)
virtual void add(idx_t n, const float *x) override

N/A.

virtual void search(idx_t n, const float *x, idx_t k, float *distances, idx_t *labels) const override

query n vectors of dimension d to the index.

return at most k vectors. If there are not enough results for a query, the result array is padded with -1s.

Parameters
  • x – input vectors to search, size n * d

  • labels – output labels of the NNs, size n*k

  • distances – output pairwise distances, size n*k

virtual void reconstruct(idx_t key, float *recons) const override

Reconstruct a stored vector (or an approximation if lossy coding)

this function may not be defined for some indexes

Parameters
  • key – id of the vector to reconstruct

  • recons – reconstructed vector (size d)

virtual void train(idx_t n, const float *x) override

Perform training on a representative set of vectors

Parameters
  • n – nb of training vectors

  • x – training vectors, size n * d

virtual void reset() override

N/A.

virtual void add_with_ids(idx_t n, const float *x, const idx_t *xids)

Same as add, but stores xids instead of sequential ids.

The default implementation fails with an assertion, as it is not supported by all indexes.

Parameters

xids – if non-null, ids to store for the vectors (size n)

virtual void range_search(idx_t n, const float *x, float radius, RangeSearchResult *result) const

query n vectors of dimension d to the index.

return all vectors with distance < radius. Note that many indexes do not implement the range_search (only the k-NN search is mandatory).

Parameters
  • x – input vectors to search, size n * d

  • radius – search radius

  • result – result table

virtual void assign(idx_t n, const float *x, idx_t *labels, idx_t k = 1) const

return the indexes of the k vectors closest to the query x.

This function is identical to search but only returns labels of neighbors.

Parameters
  • x – input vectors to search, size n * d

  • labels – output labels of the NNs, size n*k

virtual size_t remove_ids(const IDSelector &sel)

removes IDs from the index. Not supported by all indexes. Returns the number of elements removed.

virtual void reconstruct_n(idx_t i0, idx_t ni, float *recons) const

Reconstruct vectors i0 to i0 + ni - 1

this function may not be defined for some indexes

Parameters

recons – reconstructed vector (size ni * d)

virtual void search_and_reconstruct(idx_t n, const float *x, idx_t k, float *distances, idx_t *labels, float *recons) const

Similar to search, but also reconstructs the stored vectors (or an approximation in the case of lossy coding) for the search results.

If there are not enough results for a query, the resulting arrays is padded with -1s.

Parameters

recons – reconstructed vectors size (n, k, d)

virtual void compute_residual(const float *x, float *residual, idx_t key) const

Computes a residual vector after indexing encoding.

The residual vector is the difference between a vector and the reconstruction that can be decoded from its representation in the index. The residual can be used for multiple-stage indexing methods, like IndexIVF’s methods.

Parameters
  • x – input vector, size d

  • residual – output residual vector, size d

  • key – encoded index, as returned by search and assign

virtual void compute_residual_n(idx_t n, const float *xs, float *residuals, const idx_t *keys) const

Computes a residual vector after indexing encoding (batch form). Equivalent to calling compute_residual for each vector.

The residual vector is the difference between a vector and the reconstruction that can be decoded from its representation in the index. The residual can be used for multiple-stage indexing methods, like IndexIVF’s methods.

Parameters
  • n – number of vectors

  • xs – input vectors, size (n x d)

  • residuals – output residual vectors, size (n x d)

  • keys – encoded index, as returned by search and assign

virtual DistanceComputer *get_distance_computer() const

Get a DistanceComputer (defined in AuxIndexStructures) object for this kind of index.

DistanceComputer is implemented for indexes that support random access of their vectors.

virtual size_t sa_code_size() const

size of the produced codes in bytes

virtual void sa_encode(idx_t n, const float *x, uint8_t *bytes) const

encode a set of vectors

Parameters
  • n – number of vectors

  • x – input vectors, size n * d

  • bytes – output encoded vectors, size n * sa_code_size()

virtual void sa_decode(idx_t n, const uint8_t *bytes, float *x) const

decode a set of vectors

Parameters
  • n – number of vectors

  • bytes – input encoded vectors, size n * sa_code_size()

  • x – output vectors, size n * d

Public Members

AdditiveQuantizer *aq
std::vector<float> centroid_norms

norms of centroids, useful for knn-search

int d

vector dimension

idx_t ntotal

total nb of indexed vectors

bool verbose

verbosity level

bool is_trained

set if the Index does not require training, or if training is done already

MetricType metric_type

type of metric this index uses for search

float metric_arg

argument of the metric type

template<class T, int A = 32>
struct AlignedTable

Public Functions

inline AlignedTable()
inline explicit AlignedTable(size_t n)
inline size_t itemsize() const
inline void resize(size_t n)
inline void clear()
inline size_t size() const
inline size_t nbytes() const
inline T *get()
inline const T *get() const
inline T *data()
inline const T *data() const
inline T &operator[](size_t i)
inline T operator[](size_t i) const

Public Members

AlignedTableTightAlloc<T, A> tab
size_t numel = 0

Public Static Functions

static inline size_t round_capacity(size_t n)
template<class T, int A = 32>
struct AlignedTableTightAlloc

Public Functions

inline AlignedTableTightAlloc()
inline explicit AlignedTableTightAlloc(size_t n)
inline size_t itemsize() const
inline void resize(size_t n)
inline void clear()
inline size_t size() const
inline size_t nbytes() const
inline T *get()
inline const T *get() const
inline T *data()
inline const T *data() const
inline T &operator[](size_t i)
inline T operator[](size_t i) const
inline ~AlignedTableTightAlloc()
inline AlignedTableTightAlloc<T, A> &operator=(const AlignedTableTightAlloc<T, A> &other)
inline AlignedTableTightAlloc(const AlignedTableTightAlloc<T, A> &other)

Public Members

T *ptr
size_t numel
struct ArrayInvertedLists : public faiss::InvertedLists
#include <InvertedLists.h>

simple (default) implementation as an array of inverted lists

Public Types

typedef Index::idx_t idx_t

Public Functions

ArrayInvertedLists(size_t nlist, size_t code_size)
virtual size_t list_size(size_t list_no) const override

get the size of a list

virtual const uint8_t *get_codes(size_t list_no) const override

get the codes for an inverted list; must be released by release_codes

Returns

codes size list_size * code_size

virtual const idx_t *get_ids(size_t list_no) const override

get the ids for an inverted list; must be released by release_ids

Returns

ids size list_size

size_t add_entries(size_t list_no, size_t n_entry, const idx_t *ids, const uint8_t *code) override
void update_entries(size_t list_no, size_t offset, size_t n_entry, const idx_t *ids, const uint8_t *code) override
virtual void resize(size_t list_no, size_t new_size) override
~ArrayInvertedLists() override
virtual void release_codes(size_t list_no, const uint8_t *codes) const

release codes returned by get_codes (default implementation is a no-op)

virtual void release_ids(size_t list_no, const idx_t *ids) const

release ids returned by get_ids

virtual idx_t get_single_id(size_t list_no, size_t offset) const
Returns

a single id in an inverted list

virtual const uint8_t *get_single_code(size_t list_no, size_t offset) const
Returns

a single code in an inverted list (should be deallocated with release_codes)

virtual void prefetch_lists(const idx_t *list_nos, int nlist) const

prepare the following lists (default does nothing); a list can be -1, hence the signed long

virtual size_t add_entry(size_t list_no, idx_t theid, const uint8_t *code)

add one entry to an inverted list

virtual size_t add_entries(size_t list_no, size_t n_entry, const idx_t *ids, const uint8_t *code) = 0
virtual void update_entry(size_t list_no, size_t offset, idx_t id, const uint8_t *code)
virtual void update_entries(size_t list_no, size_t offset, size_t n_entry, const idx_t *ids, const uint8_t *code) = 0
virtual void reset()
void merge_from(InvertedLists *oivf, size_t add_id)

move all entries from oivf (empty on output)

double imbalance_factor() const

1= perfectly balanced, >1: imbalanced

void print_stats() const

display some stats about the inverted lists

size_t compute_ntotal() const

sum up list sizes

Public Members

std::vector<std::vector<uint8_t>> codes
std::vector<std::vector<idx_t>> ids

Inverted lists for indexes.

size_t nlist

number of possible key values

size_t code_size

code size per vector in bytes

Public Static Attributes

static const size_t INVALID_CODE_SIZE = static_cast<size_t>(-1)

used for BlockInvertedLists, where the codes are packed into groups and the individual code size is meaningless

struct AutoTuneCriterion
#include <AutoTune.h>

Evaluation criterion. Returns a performance measure in [0,1], higher is better.

Subclassed by faiss::IntersectionCriterion, faiss::OneRecallAtRCriterion

Public Types

typedef Index::idx_t idx_t

Public Functions

AutoTuneCriterion(idx_t nq, idx_t nnn)
void set_groundtruth(int gt_nnn, const float *gt_D_in, const idx_t *gt_I_in)

Initializes the gt_D and gt_I vectors. Must be called before evaluating.

Parameters
  • gt_D_in – size nq * gt_nnn

  • gt_I_in – size nq * gt_nnn

virtual double evaluate(const float *D, const idx_t *I) const = 0

Evaluate the criterion.

Parameters
  • D – size nq * nnn

  • I – size nq * nnn

Returns

the criterion, between 0 and 1. Larger is better.

inline virtual ~AutoTuneCriterion()

Public Members

idx_t nq

nb of queries this criterion is evaluated on

idx_t nnn

nb of NNs that the query should request

idx_t gt_nnn

nb of GT NNs required to evaluate criterion

std::vector<float> gt_D

Ground-truth distances (size nq * gt_nnn)

std::vector<idx_t> gt_I

Ground-truth indexes (size nq * gt_nnn)

struct BinaryInvertedListScanner

Public Types

using idx_t = Index::idx_t

Public Functions

virtual void set_query(const uint8_t *query_vector) = 0

from now on we handle this query.

virtual void set_list(idx_t list_no, uint8_t coarse_dis) = 0

following codes come from this inverted list

virtual uint32_t distance_to_code(const uint8_t *code) const = 0

compute a single query-to-code distance

virtual size_t scan_codes(size_t n, const uint8_t *codes, const idx_t *ids, int32_t *distances, idx_t *labels, size_t k) const = 0

compute the distances to codes. (distances, labels) should be organized as a min- or max-heap

Parameters
  • n – number of codes to scan

  • codes – codes to scan (n * code_size)

  • ids – corresponding ids (ignored if store_pairs)

  • distances – heap distances (size k)

  • labels – heap labels (size k)

  • k – heap size

virtual void scan_codes_range(size_t n, const uint8_t *codes, const idx_t *ids, int radius, RangeQueryResult &result) const = 0
inline virtual ~BinaryInvertedListScanner()
struct BitstringReader

Public Functions

inline BitstringReader(const uint8_t *code, size_t code_size)
inline uint64_t read(int nbit)

Public Members

const uint8_t *code
size_t code_size
size_t i
struct BitstringWriter

Public Functions

inline BitstringWriter(uint8_t *code, size_t code_size)
inline void write(uint64_t x, int nbit)

Public Members

uint8_t *code
size_t code_size
size_t i
struct BlockInvertedLists : public faiss::InvertedLists
#include <BlockInvertedLists.h>

Inverted Lists that are organized by blocks.

Different from the regular inverted lists, the codes are organized by blocks of size block_size bytes that represent a set of n_per_block codes. Therefore, code allocations are always rounded up to block_size bytes. The codes are also aligned on 32-byte boundaries for use with SIMD.

To avoid misinterpretations, the code_size is set to (size_t)(-1), even if arguably the amount of memory consumed by code is block_size / n_per_block.

The writing functions add_entries and update_entries operate on block-aligned data.

Public Types

typedef Index::idx_t idx_t

Public Functions

BlockInvertedLists(size_t nlist, size_t vec_per_block, size_t block_size)
BlockInvertedLists()
virtual size_t list_size(size_t list_no) const override

get the size of a list

virtual const uint8_t *get_codes(size_t list_no) const override

get the codes for an inverted list; must be released by release_codes

Returns

codes size list_size * code_size

virtual const idx_t *get_ids(size_t list_no) const override

get the ids for an inverted list; must be released by release_ids

Returns

ids size list_size

virtual size_t add_entries(size_t list_no, size_t n_entry, const idx_t *ids, const uint8_t *code) override
virtual void update_entries(size_t list_no, size_t offset, size_t n_entry, const idx_t *ids, const uint8_t *code) override

not implemented

virtual void resize(size_t list_no, size_t new_size) override
~BlockInvertedLists() override
virtual void release_codes(size_t list_no, const uint8_t *codes) const

release codes returned by get_codes (default implementation is a no-op)

virtual void release_ids(size_t list_no, const idx_t *ids) const

release ids returned by get_ids

virtual idx_t get_single_id(size_t list_no, size_t offset) const
Returns

a single id in an inverted list

virtual const uint8_t *get_single_code(size_t list_no, size_t offset) const
Returns

a single code in an inverted list (should be deallocated with release_codes)

virtual void prefetch_lists(const idx_t *list_nos, int nlist) const

prepare the following lists (default does nothing); a list can be -1, hence the signed long

virtual size_t add_entry(size_t list_no, idx_t theid, const uint8_t *code)

add one entry to an inverted list

virtual void update_entry(size_t list_no, size_t offset, idx_t id, const uint8_t *code)
virtual void reset()
void merge_from(InvertedLists *oivf, size_t add_id)

move all entries from oivf (empty on output)

double imbalance_factor() const

1= perfectly balanced, >1: imbalanced

void print_stats() const

display some stats about the inverted lists

size_t compute_ntotal() const

sum up list sizes

Public Members

size_t n_per_block
size_t block_size
std::vector<AlignedTable<uint8_t>> codes
std::vector<std::vector<idx_t>> ids
size_t nlist

number of possible key values

size_t code_size

code size per vector in bytes

Public Static Attributes

static const size_t INVALID_CODE_SIZE = static_cast<size_t>(-1)

used for BlockInvertedLists, where the codes are packed into groups and the individual code size is meaningless

struct BlockInvertedListsIOHook : public faiss::InvertedListsIOHook

Public Functions

BlockInvertedListsIOHook()
virtual void write(const InvertedLists *ils, IOWriter *f) const override

write the index to the IOWriter (including the fourcc)

virtual InvertedLists *read(IOReader *f, int io_flags) const override

called when the fourcc matches this class’s fourcc

virtual InvertedLists *read_ArrayInvertedLists(IOReader *f, int io_flags, size_t nlist, size_t code_size, const std::vector<size_t> &sizes) const

read from an ArrayInvertedLists into this inverted list type. For this to work, the callback has to be enabled and the io_flag has to be set to IO_FLAG_SKIP_IVF_DATA | (16 upper bits of the fourcc)

(default implementation fails)

Public Members

const std::string key

string version of the fourcc

const std::string classname

typeid.name

Public Static Functions

static void add_callback(InvertedListsIOHook*)
static void print_callbacks()
static InvertedListsIOHook *lookup(int h)
static InvertedListsIOHook *lookup_classname(const std::string &classname)
struct CenteringTransform : public faiss::VectorTransform
#include <VectorTransform.h>

Subtract the mean of each component from the vectors.

Public Types

typedef Index::idx_t idx_t

Public Functions

explicit CenteringTransform(int d = 0)
virtual void train(idx_t n, const float *x) override

train on n vectors.

virtual void apply_noalloc(idx_t n, const float *x, float *xt) const override

subtract the mean

virtual void reverse_transform(idx_t n, const float *xt, float *x) const override

add the mean

float *apply(idx_t n, const float *x) const

apply the random rotation, return new allocated matrix

Parameters

x – size n * d_in

Returns

size n * d_out

Public Members

std::vector<float> mean

Mean, size d_in = d_out.

int d_in
int d_out

input/output dimensions

bool is_trained

set if the VectorTransform does not require training, or if training is done already

struct Cloner
#include <clone_index.h>

Cloner class, useful to override classes with other cloning functions. The cloning function above just calls Cloner::clone_Index.

Subclassed by faiss::gpu::ToCPUCloner, faiss::gpu::ToGpuCloner, faiss::gpu::ToGpuClonerMultiple

Public Functions

virtual VectorTransform *clone_VectorTransform(const VectorTransform*)
virtual Index *clone_Index(const Index*)
virtual IndexIVF *clone_IndexIVF(const IndexIVF*)
inline virtual ~Cloner()
struct Clustering : public faiss::ClusteringParameters
#include <Clustering.h>

K-means clustering based on assignment - centroid update iterations

The clustering is based on an Index object that assigns training points to the centroids. Therefore, at each iteration the centroids are added to the index.

On output, the centroids table is set to the latest version of the centroids and they are also added to the index. If the centroids table is not empty on input, it is also used for initialization.

Subclassed by faiss::Clustering1D

Public Types

typedef Index::idx_t idx_t

Public Functions

Clustering(int d, int k)
Clustering(int d, int k, const ClusteringParameters &cp)
virtual void train(idx_t n, const float *x, faiss::Index &index, const float *x_weights = nullptr)

run k-means training

Parameters
  • x – training vectors, size n * d

  • index – index used for assignment

  • x_weights – weight associated to each vector: NULL or size n

void train_encoded(idx_t nx, const uint8_t *x_in, const Index *codec, Index &index, const float *weights = nullptr)

run with encoded vectors

In addition to train()'s parameters, takes a codec as a parameter to decode the input vectors.

Parameters

codec – codec used to decode the vectors (nullptr = vectors are in fact floats)

void post_process_centroids()

Post-process the centroids after each centroid update. Includes optional L2 normalization and nearest-integer rounding.

inline virtual ~Clustering()

Public Members

size_t d

dimension of the vectors

size_t k

nb of centroids

std::vector<float> centroids

centroids (k * d). If centroids are set on input to train, they will be used as initialization.

std::vector<ClusteringIterationStats> iteration_stats

stats at every iteration of clustering

int niter

clustering iterations

int nredo

redo clustering this many times and keep best

bool verbose
bool spherical

do we want normalized centroids?

bool int_centroids

round centroids coordinates to integer

bool update_index

re-train index after each iteration?

bool frozen_centroids

use the centroids provided as input and do not change them during iterations

int min_points_per_centroid

otherwise you get a warning

int max_points_per_centroid

to limit size of dataset

int seed

seed for the random number generator

size_t decode_block_size

how many vectors at a time to decode
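
A hedged end-to-end sketch of the Clustering object defined above, using an IndexFlatL2 for the assignment step (headers assumed):

```cpp
#include <faiss/Clustering.h>
#include <faiss/IndexFlat.h>
#include <vector>

// Run k-means on n vectors of dimension d and return the k*d centroid table.
std::vector<float> run_kmeans(const float* x, size_t n, size_t d, size_t k) {
    faiss::ClusteringParameters cp;
    cp.niter = 20;                   // clustering iterations
    cp.verbose = false;
    faiss::Clustering clus(d, k, cp);
    faiss::IndexFlatL2 index(d);     // index used for the assignment step
    clus.train(n, x, index);
    return clus.centroids;           // size k * d
}
```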

struct Clustering1D : public faiss::Clustering
#include <Clustering.h>

Exact 1D clustering algorithm

Since it does not use an index, it does not overload the train() function

Public Types

typedef Index::idx_t idx_t

Public Functions

explicit Clustering1D(int k)
Clustering1D(int k, const ClusteringParameters &cp)
void train_exact(idx_t n, const float *x)
inline virtual ~Clustering1D()
virtual void train(idx_t n, const float *x, faiss::Index &index, const float *x_weights = nullptr)

run k-means training

Parameters
  • x – training vectors, size n * d

  • index – index used for assignment

  • x_weights – weight associated to each vector: NULL or size n

void train_encoded(idx_t nx, const uint8_t *x_in, const Index *codec, Index &index, const float *weights = nullptr)

run with encoded vectors

In addition to train()'s parameters, takes a codec as a parameter to decode the input vectors.

Parameters

codec – codec used to decode the vectors (nullptr = vectors are in fact floats)

void post_process_centroids()

Post-process the centroids after each centroid update. Includes optional L2 normalization and nearest-integer rounding.

Public Members

size_t d

dimension of the vectors

size_t k

nb of centroids

std::vector<float> centroids

centroids (k * d). If centroids are set on input to train, they will be used as initialization.

std::vector<ClusteringIterationStats> iteration_stats

stats at every iteration of clustering

int niter

clustering iterations

int nredo

redo clustering this many times and keep best

bool verbose
bool spherical

do we want normalized centroids?

bool int_centroids

round centroids coordinates to integer

bool update_index

re-train index after each iteration?

bool frozen_centroids

use the centroids provided as input and do not change them during iterations

int min_points_per_centroid

otherwise you get a warning

int max_points_per_centroid

to limit size of dataset

int seed

seed for the random number generator

size_t decode_block_size

how many vectors at a time to decode

struct ClusteringIterationStats

Public Members

float obj

objective values (sum of distances reported by index)

double time

seconds for iteration

double time_search

seconds for just search

double imbalance_factor

imbalance factor of iteration

int nsplit

number of cluster splits

struct ClusteringParameters
#include <Clustering.h>

Class for the clustering parameters. Can be passed to the constructor of the Clustering object.

Subclassed by faiss::Clustering, faiss::ProgressiveDimClusteringParameters

Public Functions

ClusteringParameters()

sets reasonable defaults

Public Members

int niter

clustering iterations

int nredo

redo clustering this many times and keep best

bool verbose
bool spherical

do we want normalized centroids?

bool int_centroids

round centroids coordinates to integer

bool update_index

re-train index after each iteration?

bool frozen_centroids

use the centroids provided as input and do not change them during iterations

int min_points_per_centroid

otherwise you get a warning

int max_points_per_centroid

to limit size of dataset

int seed

seed for the random number generator

size_t decode_block_size

how many vectors at a time to decode

template<typename T_, typename TI_>
struct CMax
#include <ordered_key_value.h>

The C object gives the type T of the values of a key-value storage, the type TI of the keys, and the comparison that is done: CMax for a decreasing series and CMin for an increasing series. In other words, for a given threshold, an incoming value x is kept if

C::cmp(threshold, x)

is true.

Public Types

typedef T_ T
typedef TI_ TI
typedef CMin<T_, TI_> Crev

Public Static Functions

static inline bool cmp(T a, T b)
static inline T neutral()
static inline T nextafter(T x)

Public Static Attributes

static const bool is_max = true
template<typename T_, typename TI_>
struct CMin

Public Types

typedef T_ T
typedef TI_ TI
typedef CMax<T_, TI_> Crev

Public Static Functions

static inline bool cmp(T a, T b)
static inline T neutral()
static inline T nextafter(T x)

Public Static Attributes

static const bool is_max = false
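
An illustrative check of the comparison convention described for CMax above (the header <faiss/utils/ordered_key_value.h> is an assumption):

```cpp
#include <faiss/utils/ordered_key_value.h>
#include <cassert>
#include <cstdint>

int main() {
    using Cmax = faiss::CMax<float, int64_t>;  // used to keep the k smallest values
    using Cmin = faiss::CMin<float, int64_t>;  // used to keep the k largest values
    float threshold = 10.0f;
    assert(Cmax::cmp(threshold, 3.0f));    // 3 is below the threshold: kept
    assert(!Cmax::cmp(threshold, 12.0f));  // 12 is not: rejected
    assert(Cmin::cmp(threshold, 12.0f));   // with CMin the comparison is reversed
    return 0;
}
```
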
struct DirectMap
#include <DirectMap.h>

Direct map: a way to map back from ids to inverted lists

Public Types

enum Type

Values:

enumerator NoMap
enumerator Array
enumerator Hashtable
typedef Index::idx_t idx_t

Public Functions

DirectMap()
void set_type(Type new_type, const InvertedLists *invlists, size_t ntotal)

set type and initialize

idx_t get(idx_t id) const

get an entry

inline bool no() const

for quick checks

void check_can_add(const idx_t *ids)

throw if Array and ids is not NULL

update the direct_map

void add_single_id(idx_t id, idx_t list_no, size_t offset)

non thread-safe version

void clear()

remove all entries

size_t remove_ids(const IDSelector &sel, InvertedLists *invlists)

remove ids from the InvertedLists, possibly using the direct map

operations on inverted lists that require translation with a DirectMap

void update_codes(InvertedLists *invlists, int n, const idx_t *ids, const idx_t *list_nos, const uint8_t *codes)

update entries, using the direct map

Public Members

Type type
std::vector<idx_t> array

map for direct access to the elements. Map ids to LO-encoded entries.

std::unordered_map<idx_t, idx_t> hashtable
struct DirectMapAdd
#include <DirectMap.h>

Thread-safe way of updating the direct_map.

Public Types

typedef Index::idx_t idx_t
using Type = DirectMap::Type

Public Functions

DirectMapAdd(DirectMap &direct_map, size_t n, const idx_t *xids)
void add(size_t i, idx_t list_no, size_t offset)

add vector i (with id xids[i]) at list_no and offset

~DirectMapAdd()

Public Members

DirectMap &direct_map
DirectMap::Type type
size_t ntotal
size_t n
const idx_t *xids
std::vector<idx_t> all_ofs
struct FastScanStats

Public Functions

inline FastScanStats()
inline void reset()

Public Members

uint64_t t0
uint64_t t1
uint64_t t2
uint64_t t3
struct GenHammingComputer16

Public Functions

inline GenHammingComputer16(const uint8_t *a8, int code_size)
inline int hamming(const uint8_t *b8) const

Public Members

uint64_t a0
uint64_t a1
struct GenHammingComputer32

Public Functions

inline GenHammingComputer32(const uint8_t *a8, int code_size)
inline int hamming(const uint8_t *b8) const

Public Members

uint64_t a0
uint64_t a1
uint64_t a2
uint64_t a3
struct GenHammingComputer8

Public Functions

inline GenHammingComputer8(const uint8_t *a, int code_size)
inline int hamming(const uint8_t *b) const

Public Members

uint64_t a0
struct GenHammingComputerM8

Public Functions

inline GenHammingComputerM8(const uint8_t *a8, int code_size)
inline int hamming(const uint8_t *b8) const

Public Members

const uint64_t *a
int n
template<int CODE_SIZE>
struct HammingComputer : public faiss::HammingComputerDefault

Public Functions

inline HammingComputer(const uint8_t *a, int code_size)
inline void set(const uint8_t *a8, int code_size)
inline int hamming(const uint8_t *b8) const

Public Members

const uint8_t *a8
int quotient8
int remainder8
struct HammingComputer16

Public Functions

inline HammingComputer16()
inline HammingComputer16(const uint8_t *a8, int code_size)
inline void set(const uint8_t *a8, int code_size)
inline int hamming(const uint8_t *b8) const

Public Members

uint64_t a0
uint64_t a1
struct HammingComputer20

Public Functions

inline HammingComputer20()
inline HammingComputer20(const uint8_t *a8, int code_size)
inline void set(const uint8_t *a8, int code_size)
inline int hamming(const uint8_t *b8) const

Public Members

uint64_t a0
uint64_t a1
uint32_t a2
struct HammingComputer32

Public Functions

inline HammingComputer32()
inline HammingComputer32(const uint8_t *a8, int code_size)
inline void set(const uint8_t *a8, int code_size)
inline int hamming(const uint8_t *b8) const

Public Members

uint64_t a0
uint64_t a1
uint64_t a2
uint64_t a3
struct HammingComputer4

Public Functions

inline HammingComputer4()
inline HammingComputer4(const uint8_t *a, int code_size)
inline void set(const uint8_t *a, int code_size)
inline int hamming(const uint8_t *b) const

Public Members

uint32_t a0
struct HammingComputer64

Public Functions

inline HammingComputer64()
inline HammingComputer64(const uint8_t *a8, int code_size)
inline void set(const uint8_t *a8, int code_size)
inline int hamming(const uint8_t *b8) const

Public Members

uint64_t a0
uint64_t a1
uint64_t a2
uint64_t a3
uint64_t a4
uint64_t a5
uint64_t a6
uint64_t a7
struct HammingComputer8

Public Functions

inline HammingComputer8()
inline HammingComputer8(const uint8_t *a, int code_size)
inline void set(const uint8_t *a, int code_size)
inline int hamming(const uint8_t *b) const

Public Members

uint64_t a0
struct HammingComputerDefault

Subclassed by faiss::HammingComputer< CODE_SIZE >

Public Functions

inline HammingComputerDefault()
inline HammingComputerDefault(const uint8_t *a8, int code_size)
inline void set(const uint8_t *a8, int code_size)
inline int hamming(const uint8_t *b8) const

Public Members

const uint8_t *a8
int quotient8
int remainder8
struct HammingComputerM4

Public Functions

inline HammingComputerM4()
inline HammingComputerM4(const uint8_t *a4, int code_size)
inline void set(const uint8_t *a4, int code_size)
inline int hamming(const uint8_t *b8) const

Public Members

const uint32_t *a
int n
struct HammingComputerM8

Public Functions

inline HammingComputerM8()
inline HammingComputerM8(const uint8_t *a8, int code_size)
inline void set(const uint8_t *a8, int code_size)
inline int hamming(const uint8_t *b8) const

Public Members

const uint64_t *a
int n
template<class HammingComputer>
struct HCounterState
#include <hamming-inl.h>

This class maintains a list of best distances seen so far.

Since the distances are in a limited range (0 to nbit), the object maintains one list per possible distance, and fills in only the n-first lists, such that the sum of sizes of the n lists is below k.

Public Functions

inline HCounterState(int *counters, int64_t *ids_per_dis, const uint8_t *x, int d, int k)
inline void update_counter(const uint8_t *y, size_t j)

Public Members

int *counters
int64_t *ids_per_dis
HammingComputer hc
int thres
int count_lt
int count_eq
int k
template<typename C>
struct HeapArray
#include <Heap.h>

A template structure for a set of [min|max]-heaps. It is tailored so that the actual data of the heaps can just live in compact arrays.

Public Types

typedef C::TI TI
typedef C::T T

Public Functions

inline T *get_val(size_t key)

Return the list of values for a heap.

inline TI *get_ids(size_t key)

Corresponding identifiers.

void heapify()

prepare all the heaps before adding

void addn(size_t nj, const T *vin, TI j0 = 0, size_t i0 = 0, int64_t ni = -1)

add nj elements to heaps i0:i0+ni, with sequential ids

Parameters
  • nj – nb of elements to add to each heap

  • vin – elements to add, size ni * nj

  • j0 – add this to the ids that are added

  • i0 – first heap to update

  • ni – nb of elements to update (-1 = use nh)

void addn_with_ids(size_t nj, const T *vin, const TI *id_in = nullptr, int64_t id_stride = 0, size_t i0 = 0, int64_t ni = -1)

same as addn

Parameters
  • id_in – ids of the elements to add, size ni * nj

  • id_stride – stride for id_in

void reorder()

reorder all the heaps

void per_line_extrema(T *vals_out, TI *idx_out) const

this is not really a heap function. It just finds the per-line extrema of each line of array D

Parameters
  • vals_out – extreme value of each line (size nh, or NULL)

  • idx_out – index of extreme value (size nh or NULL)

Public Members

size_t nh

number of heaps

size_t k

allocated size per heap

TI *ids

identifiers (size nh * k)

T *val

values (distances or similarities), size nh * k
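
A minimal usage sketch, not part of the generated reference, using faiss::float_maxheap_array_t (a HeapArray specialization that keeps the k smallest values per heap); the faiss/utils/Heap.h header path is assumed from the current source layout:

    #include <cstdint>
    #include <vector>
    #include <faiss/utils/Heap.h>

    int main() {
        size_t nh = 2, k = 3;  // 2 heaps, keep the 3 best entries each
        size_t nj = 5;         // 5 candidates pushed into every heap
        std::vector<float> val(nh * k);
        std::vector<int64_t> ids(nh * k);
        std::vector<float> cand = {5, 1, 4, 2, 3,    // candidates for heap 0
                                   9, 7, 8, 6, 0};   // candidates for heap 1

        // the heap data lives directly in the val / ids arrays
        faiss::float_maxheap_array_t heaps = {nh, k, ids.data(), val.data()};
        heaps.heapify();               // fill the heaps with sentinel values
        heaps.addn(nj, cand.data());   // add nj candidates per heap, ids 0..nj-1
        heaps.reorder();               // sort: val now holds, per line, the 3 smallest
        return 0;
    }
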

struct HStackInvertedLists : public faiss::ReadOnlyInvertedLists
#include <InvertedLists.h>

Horizontal stack of inverted lists.

Public Types

typedef Index::idx_t idx_t

Public Functions

HStackInvertedLists(int nil, const InvertedLists **ils)

build InvertedLists by concatenating nil of them

virtual size_t list_size(size_t list_no) const override

get the size of a list

virtual const uint8_t *get_codes(size_t list_no) const override

get the codes for an inverted list; they must be released with release_codes

Returns

codes size list_size * code_size

virtual const idx_t *get_ids(size_t list_no) const override

get the ids for an inverted list; they must be released with release_ids

Returns

ids size list_size

virtual void prefetch_lists(const idx_t *list_nos, int nlist) const override

prepare the following lists (the default does nothing); a list number can be -1, hence the signed type

virtual void release_codes(size_t list_no, const uint8_t *codes) const override

release codes returned by get_codes (the default implementation is a no-op)

virtual void release_ids(size_t list_no, const idx_t *ids) const override

release ids returned by get_ids

virtual idx_t get_single_id(size_t list_no, size_t offset) const override
Returns

a single id in an inverted list

virtual const uint8_t *get_single_code(size_t list_no, size_t offset) const override
Returns

a single code in an inverted list (should be deallocated with release_codes)

virtual size_t add_entries(size_t list_no, size_t n_entry, const idx_t *ids, const uint8_t *code) override
virtual void update_entries(size_t list_no, size_t offset, size_t n_entry, const idx_t *ids, const uint8_t *code) override
virtual void resize(size_t list_no, size_t new_size) override
virtual size_t add_entry(size_t list_no, idx_t theid, const uint8_t *code)

add one entry to an inverted list

virtual void update_entry(size_t list_no, size_t offset, idx_t id, const uint8_t *code)
virtual void reset()
void merge_from(InvertedLists *oivf, size_t add_id)

move all entries from oivf (empty on output)

double imbalance_factor() const

1 = perfectly balanced, > 1: imbalanced

void print_stats() const

display some stats about the inverted lists

size_t compute_ntotal() const

sum up list sizes

Public Members

std::vector<const InvertedLists*> ils
size_t nlist

number of possible key values

size_t code_size

code size per vector in bytes

Public Static Attributes

static const size_t INVALID_CODE_SIZE = static_cast<size_t>(-1)

used for BlockInvertedLists, where the codes are packed into groups and the individual code size is meaningless
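
A construction sketch, not part of the generated reference; it assumes faiss::ArrayInvertedLists as the underlying storage and the faiss/invlists/InvertedLists.h header path, both of which may differ between Faiss versions:

    #include <faiss/invlists/InvertedLists.h>

    int main() {
        size_t nlist = 4, code_size = 16;
        // two independent inverted files with the same nlist / code_size
        faiss::ArrayInvertedLists il0(nlist, code_size);
        faiss::ArrayInvertedLists il1(nlist, code_size);
        // ... entries would normally be added to il0 / il1 by their owning indexes ...

        const faiss::InvertedLists* parts[] = {&il0, &il1};
        // read-only view: list i is the concatenation of il0's and il1's list i
        faiss::HStackInvertedLists hstack(2, parts);
        size_t sz = hstack.list_size(0);  // == il0.list_size(0) + il1.list_size(0)
        (void)sz;
        return 0;
    }
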

struct Index
#include <Index.h>

Abstract structure for an index, supports adding vectors and searching them.

All vectors provided at add or search time are 32-bit float arrays, although the internal representation may vary.

Subclassed by faiss::AdditiveCoarseQuantizer, faiss::gpu::GpuIndex, faiss::Index2Layer, faiss::IndexAdditiveQuantizer, faiss::IndexFlat, faiss::IndexHNSW, faiss::IndexIVF, faiss::IndexLattice, faiss::IndexLSH, faiss::IndexNNDescent, faiss::IndexNSG, faiss::IndexPQ, faiss::IndexPQFastScan, faiss::IndexPreTransform, faiss::IndexRefine, faiss::IndexScalarQuantizer, faiss::IndexSplitVectors, faiss::MultiIndexQuantizer

Public Types

using idx_t = int64_t

all indices are this type

using component_t = float
using distance_t = float

Public Functions

inline explicit Index(idx_t d = 0, MetricType metric = METRIC_L2)
virtual ~Index()
virtual void train(idx_t n, const float *x)

Perform training on a representative set of vectors

Parameters
  • n – nb of training vectors

  • x – training vectors, size n * d

virtual void add(idx_t n, const float *x) = 0

Add n vectors of dimension d to the index.

Vectors are implicitly assigned labels ntotal .. ntotal + n - 1. This function slices the input vectors into chunks smaller than blocksize_add and calls add_core.

Parameters

x – input matrix, size n * d

virtual void add_with_ids(idx_t n, const float *x, const idx_t *xids)

Same as add, but stores xids instead of sequential ids.

The default implementation fails with an assertion, as it is not supported by all indexes.

Parameters

xids – if non-null, ids to store for the vectors (size n)

virtual void search(idx_t n, const float *x, idx_t k, float *distances, idx_t *labels) const = 0

Query the index with n vectors of dimension d.

Return at most k vectors per query. If there are not enough results for a query, the result array is padded with -1s.

Parameters
  • x – input vectors to search, size n * d

  • labels – output labels of the NNs, size n*k

  • distances – output pairwise distances, size n*k

virtual void range_search(idx_t n, const float *x, float radius, RangeSearchResult *result) const

Query the index with n vectors of dimension d.

Return all vectors with distance < radius. Note that many indexes do not implement range_search (only the k-NN search is mandatory).

Parameters
  • x – input vectors to search, size n * d

  • radius – search radius

  • result – result table

virtual void assign(idx_t n, const float *x, idx_t *labels, idx_t k = 1) const

return the indexes of the k vectors closest to the query x.

This function is identical to search but only returns labels of neighbors.

Parameters
  • x – input vectors to search, size n * d

  • labels – output labels of the NNs, size n*k

virtual void reset() = 0

removes all elements from the database.

virtual size_t remove_ids(const IDSelector &sel)

removes IDs from the index. Not supported by all indexes. Returns the number of elements removed.

virtual void reconstruct(idx_t key, float *recons) const

Reconstruct a stored vector (or an approximation if lossy coding)

this function may not be defined for some indexes

Parameters
  • key – id of the vector to reconstruct

  • recons – reconstructed vector (size d)

virtual void reconstruct_n(idx_t i0, idx_t ni, float *recons) const

Reconstruct vectors i0 to i0 + ni - 1

this function may not be defined for some indexes

Parameters

recons – reconstructed vectors (size ni * d)

virtual void search_and_reconstruct(idx_t n, const float *x, idx_t k, float *distances, idx_t *labels, float *recons) const

Similar to search, but also reconstructs the stored vectors (or an approximation in the case of lossy coding) for the search results.

If there are not enough results for a query, the resulting arrays are padded with -1s.

Parameters

recons – reconstructed vectors size (n, k, d)

virtual void compute_residual(const float *x, float *residual, idx_t key) const

Computes a residual vector after index encoding.

The residual vector is the difference between a vector and the reconstruction that can be decoded from its representation in the index. The residual can be used for multiple-stage indexing methods, like IndexIVF’s methods.

Parameters
  • x – input vector, size d

  • residual – output residual vector, size d

  • key – encoded index, as returned by search and assign

virtual void compute_residual_n(idx_t n, const float *xs, float *residuals, const idx_t *keys) const

Computes residual vectors after index encoding (batch form). Equivalent to calling compute_residual for each vector.

The residual vector is the difference between a vector and the reconstruction that can be decoded from its representation in the index. The residual can be used for multiple-stage indexing methods, like IndexIVF’s methods.

Parameters
  • n – number of vectors

  • xs – input vectors, size (n x d)

  • residuals – output residual vectors, size (n x d)

  • keys – encoded index, as returned by search and assign

virtual DistanceComputer *get_distance_computer() const

Get a DistanceComputer (defined in AuxIndexStructures) object for this kind of index.

DistanceComputer is implemented for indexes that support random access of their vectors.

virtual size_t sa_code_size() const

size of the produced codes in bytes

virtual void sa_encode(idx_t n, const float *x, uint8_t *bytes) const

encode a set of vectors

Parameters
  • n – number of vectors

  • x – input vectors, size n * d

  • bytes – output encoded vectors, size n * sa_code_size()

virtual void sa_decode(idx_t n, const uint8_t *bytes, float *x) const

decode a set of vectors

Parameters
  • n – number of vectors

  • bytes – input encoded vectors, size n * sa_code_size()

  • x – output vectors, size n * d

Public Members

int d

vector dimension

idx_t ntotal

total nb of indexed vectors

bool verbose

verbosity level

bool is_trained

set if the Index does not require training, or if training is done already

MetricType metric_type

type of metric this index uses for search

float metric_arg

argument of the metric type
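
Since Index itself is abstract, the typical workflow goes through a concrete subclass. A minimal sketch, not part of the generated reference, using IndexFlat (listed above as a subclass) through the IndexFlatL2 convenience class; the faiss/IndexFlat.h header path follows the current source layout:

    #include <vector>
    #include <faiss/IndexFlat.h>

    int main() {
        int d = 64;                                // vector dimension
        faiss::Index::idx_t nb = 1000, nq = 5, k = 4;

        std::vector<float> xb(nb * d), xq(nq * d); // d floats per vector, contiguous
        // ... fill xb (database) and xq (queries) with real data ...

        faiss::IndexFlatL2 index(d);               // brute-force L2, needs no training
        index.add(nb, xb.data());                  // vectors get ids 0 .. nb - 1

        std::vector<float> distances(nq * k);
        std::vector<faiss::Index::idx_t> labels(nq * k);
        index.search(nq, xq.data(), k, distances.data(), labels.data());
        // labels[i * k + j] is the id of the j-th nearest neighbor of query i
        return 0;
    }
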

struct Index2Layer : public faiss::Index
#include <Index2Layer.h>

Same as an IndexIVFPQ without the inverted lists: codes are stored sequentially

The class is mainly intended to store encoded vectors that can be accessed randomly; the search function is not implemented.

Public Types

using idx_t = int64_t

all indices are this type

using component_t = float
using distance_t = float

Public Functions

Index2Layer(Index *quantizer, size_t nlist, int M, int nbit = 8, MetricType metric = METRIC_L2)
Index2Layer()
~Index2Layer()
virtual void train(idx_t n, const float *x) override

Perform training on a representative set of vectors

Parameters
  • n – nb of training vectors

  • x – training vectors, size n * d

virtual void add(idx_t n, const float *x) override

Add n vectors of dimension d to the index.

Vectors are implicitly assigned labels ntotal .. ntotal + n - 1. This function slices the input vectors into chunks smaller than blocksize_add and calls add_core.

Parameters

x – input matrix, size n * d

virtual void search(idx_t n, const float *x, idx_t k, float *distances, idx_t *labels) const override

not implemented

virtual void reconstruct_n(idx_t i0, idx_t ni, float *recons) const override

Reconstruct vectors i0 to i0 + ni - 1

this function may not be defined for some indexes

Parameters

recons – reconstructed vectors (size ni * d)

virtual void reconstruct(idx_t key, float *recons) const override

Reconstruct a stored vector (or an approximation if lossy coding)

this function may not be defined for some indexes

Parameters
  • key – id of the vector to reconstruct

  • recons – reconstructed vector (size d)

virtual void reset() override

removes all elements from the database.

virtual DistanceComputer *get_distance_computer() const override

Get a DistanceComputer (defined in AuxIndexStructures) object for this kind of index.

DistanceComputer is implemented for indexes that support random access of their vectors.

void transfer_to_IVFPQ(IndexIVFPQ &other) const

transfer the flat codes to an IVFPQ index

virtual size_t sa_code_size() const override

size of the produced codes in bytes

virtual void sa_encode(idx_t n, const float *x, uint8_t *bytes) const override

encode a set of vectors

Parameters
  • n – number of vectors

  • x – input vectors, size n * d

  • bytes – output encoded vectors, size n * sa_code_size()

virtual void sa_decode(idx_t n, const uint8_t *bytes, float *x) const override

decode a set of vectors

Parameters
  • n – number of vectors

  • bytes – input encoded vectors, size n * sa_code_size()

  • x – output vectors, size n * d

virtual void add_with_ids(idx_t n, const float *x, const idx_t *xids)

Same as add, but stores xids instead of sequential ids.

The default implementation fails with an assertion, as it is not supported by all indexes.

Parameters

xids – if non-null, ids to store for the vectors (size n)

virtual void range_search(idx_t n, const float *x, float radius, RangeSearchResult *result) const

Query the index with n vectors of dimension d.

Return all vectors with distance < radius. Note that many indexes do not implement range_search (only the k-NN search is mandatory).

Parameters
  • x – input vectors to search, size n * d

  • radius – search radius

  • result – result table

virtual void assign(idx_t n, const float *x, idx_t *labels, idx_t k = 1) const

return the indexes of the k vectors closest to the query x.

This function is identical to search but only returns labels of neighbors.

Parameters
  • x – input vectors to search, size n * d

  • labels – output labels of the NNs, size n*k

virtual size_t remove_ids(const IDSelector &sel)

removes IDs from the index. Not supported by all indexes. Returns the number of elements removed.

virtual void search_and_reconstruct(idx_t n, const float *x, idx_t k, float *distances, idx_t *labels, float *recons) const

Similar to search, but also reconstructs the stored vectors (or an approximation in the case of lossy coding) for the search results.

If there are not enough results for a query, the resulting arrays are padded with -1s.

Parameters

recons – reconstructed vectors size (n, k, d)

virtual void compute_residual(const float *x, float *residual, idx_t key) const

Computes a residual vector after index encoding.

The residual vector is the difference between a vector and the reconstruction that can be decoded from its representation in the index. The residual can be used for multiple-stage indexing methods, like IndexIVF’s methods.

Parameters
  • x – input vector, size d

  • residual – output residual vector, size d

  • key – encoded index, as returned by search and assign

virtual void compute_residual_n(idx_t n, const float *xs, float *residuals, const idx_t *keys) const

Computes residual vectors after index encoding (batch form). Equivalent to calling compute_residual for each vector.

The residual vector is the difference between a vector and the reconstruction that can be decoded from its representation in the index. The residual can be used for multiple-stage indexing methods, like IndexIVF’s methods.

Parameters
  • n – number of vectors

  • xs – input vectors, size (n x d)

  • residuals – output residual vectors, size (n x d)

  • keys – encoded index, as returned by search and assign

Public Members

Level1Quantizer q1

first level quantizer

ProductQuantizer pq

second level quantizer is always a PQ

std::vector<uint8_t> codes

Codes. Size ntotal * code_size.

size_t code_size_1

size of the code for the first level, i.e. the number of bytes needed to store a list id (ceil(log2(q1.nlist) / 8))

size_t code_size_2

size of the code for the second level

size_t code_size

code_size_1 + code_size_2

int d

vector dimension

idx_t ntotal

total nb of indexed vectors

bool verbose

verbosity level

bool is_trained

set if the Index does not require training, or if training is done already

MetricType metric_type

type of metric this index uses for search

float metric_arg

argument of the metric type
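
A construction sketch, not part of the generated reference; it assumes an IndexFlatL2 as the first-level quantizer (any Index works) and random data as a stand-in for a real training set:

    #include <cstdlib>
    #include <vector>
    #include <faiss/Index2Layer.h>
    #include <faiss/IndexFlat.h>

    int main() {
        int d = 32;
        size_t nlist = 16;                        // first-level centroids
        int M = 8;                                // PQ sub-quantizers, nbit defaults to 8
        faiss::IndexFlatL2 coarse(d);             // first-level quantizer
        faiss::Index2Layer index(&coarse, nlist, M);

        std::vector<float> xt(2000 * d);
        for (float& v : xt) v = rand() / float(RAND_MAX);  // stand-in training data
        index.train(2000, xt.data());
        index.add(2000, xt.data());               // codes are stored sequentially

        // the same codec is usable without storing vectors in the index
        std::vector<uint8_t> codes(5 * index.sa_code_size());
        index.sa_encode(5, xt.data(), codes.data());
        return 0;
    }
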

struct IndexAdditiveQuantizer : public faiss::Index
#include <IndexAdditiveQuantizer.h>

Abstract class for indexes based on additive quantizers. The search functions are common to all subclasses.

Subclassed by faiss::IndexLocalSearchQuantizer, faiss::IndexResidualQuantizer

Public Types

using Search_type_t = AdditiveQuantizer::Search_type_t
using idx_t = int64_t

all indices are this type

using component_t = float
using distance_t = float

Public Functions

explicit IndexAdditiveQuantizer(idx_t d = 0, AdditiveQuantizer *aq = nullptr, MetricType metric = METRIC_L2)
virtual void search(idx_t n, const float *x, idx_t k, float *distances, idx_t *labels) const override

Query the index with n vectors of dimension d.

Return at most k vectors per query. If there are not enough results for a query, the result array is padded with -1s.

Parameters
  • x – input vectors to search, size n * d

  • labels – output labels of the NNs, size n*k

  • distances – output pairwise distances, size n*k

virtual void reset() override

removes all elements from the database.

virtual void add(idx_t n, const float *x) override

Add n vectors of dimension d to the index.

Vectors are implicitly assigned labels ntotal .. ntotal + n - 1. This function slices the input vectors into chunks smaller than blocksize_add and calls add_core.

Parameters

x – input matrix, size n * d

virtual size_t sa_code_size() const override

size of the produced codes in bytes

virtual void sa_encode(idx_t n, const float *x, uint8_t *bytes) const override

encode a set of vectors

Parameters
  • n – number of vectors

  • x – input vectors, size n * d

  • bytes – output encoded vectors, size n * sa_code_size()

virtual void sa_decode(idx_t n, const uint8_t *bytes, float *x) const override

decode a set of vectors

Parameters
  • n – number of vectors

  • bytes – input encoded vectors, size n * sa_code_size()

  • x – output vectors, size n * d

virtual void train(idx_t n, const float *x)

Perform training on a representative set of vectors

Parameters
  • n – nb of training vectors

  • x – training vectors, size n * d

virtual void add_with_ids(idx_t n, const float *x, const idx_t *xids)

Same as add, but stores xids instead of sequential ids.

The default implementation fails with an assertion, as it is not supported by all indexes.

Parameters

xids – if non-null, ids to store for the vectors (size n)

virtual void range_search(idx_t n, const float *x, float radius, RangeSearchResult *result) const

Query the index with n vectors of dimension d.

Return all vectors with distance < radius. Note that many indexes do not implement range_search (only the k-NN search is mandatory).

Parameters
  • x – input vectors to search, size n * d

  • radius – search radius

  • result – result table

virtual void assign(idx_t n, const float *x, idx_t *labels, idx_t k = 1) const

return the indexes of the k vectors closest to the query x.

This function is identical to search but only returns labels of neighbors.

Parameters
  • x – input vectors to search, size n * d

  • labels – output labels of the NNs, size n*k

virtual size_t remove_ids(const IDSelector &sel)

removes IDs from the index. Not supported by all indexes. Returns the number of elements removed.

virtual void reconstruct(idx_t key, float *recons) const

Reconstruct a stored vector (or an approximation if lossy coding)

this function may not be defined for some indexes

Parameters
  • key – id of the vector to reconstruct

  • recons – reconstructed vector (size d)

virtual void reconstruct_n(idx_t i0, idx_t ni, float *recons) const

Reconstruct vectors i0 to i0 + ni - 1

this function may not be defined for some indexes

Parameters

recons – reconstructed vectors (size ni * d)

virtual void search_and_reconstruct(idx_t n, const float *x, idx_t k, float *distances, idx_t *labels, float *recons) const

Similar to search, but also reconstructs the stored vectors (or an approximation in the case of lossy coding) for the search results.

If there are not enough results for a query, the resulting arrays are padded with -1s.

Parameters

recons – reconstructed vectors size (n, k, d)

virtual void compute_residual(const float *x, float *residual, idx_t key) const

Computes a residual vector after index encoding.

The residual vector is the difference between a vector and the reconstruction that can be decoded from its representation in the index. The residual can be used for multiple-stage indexing methods, like IndexIVF’s methods.

Parameters
  • x – input vector, size d

  • residual – output residual vector, size d

  • key – encoded index, as returned by search and assign

virtual void compute_residual_n(idx_t n, const float *xs, float *residuals, const idx_t *keys) const

Computes residual vectors after index encoding (batch form). Equivalent to calling compute_residual for each vector.

The residual vector is the difference between a vector and the reconstruction that can be decoded from its representation in the index. The residual can be used for multiple-stage indexing methods, like IndexIVF’s methods.

Parameters
  • n – number of vectors

  • xs – input vectors, size (n x d)

  • residuals – output residual vectors, size (n x d)

  • keys – encoded index, as returned by search and assign

virtual DistanceComputer *get_distance_computer() const

Get a DistanceComputer (defined in AuxIndexStructures) object for this kind of index.

DistanceComputer is implemented for indexes that support random access of their vectors.

Public Members

AdditiveQuantizer *aq
size_t code_size

size of residual quantizer codes + norms

std::vector<uint8_t> codes

Codes. Size ntotal * code_size.

int d

vector dimension

idx_t ntotal

total nb of indexed vectors

bool verbose

verbosity level

bool is_trained

set if the Index does not require training, or if training is done already

MetricType metric_type

type of metric this index uses for search

float metric_arg

argument of the metric type
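
A sketch, not part of the generated reference, using the IndexResidualQuantizer subclass; the (d, M, nbits, metric) constructor shape used here is an assumption that should be checked against the IndexAdditiveQuantizer.h header of your Faiss version:

    #include <cstdlib>
    #include <vector>
    #include <faiss/IndexAdditiveQuantizer.h>

    int main() {
        int d = 32;
        size_t M = 4, nbits = 8;   // 4 codebooks of 2^8 entries each
        // assumed constructor: (d, M, nbits, metric)
        faiss::IndexResidualQuantizer index(d, M, nbits, faiss::METRIC_L2);

        std::vector<float> xt(2000 * d);
        for (float& v : xt) v = rand() / float(RAND_MAX);  // stand-in training data
        index.train(2000, xt.data());
        index.add(2000, xt.data());

        // standalone codec round-trip for the first stored vector
        std::vector<uint8_t> code(index.sa_code_size());
        std::vector<float> decoded(d);
        index.sa_encode(1, xt.data(), code.data());
        index.sa_decode(1, code.data(), decoded.data());
        return 0;
    }
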

struct IndexBinary
#include <IndexBinary.h>

Abstract structure for a binary index.

Supports adding vectors and searching them.

All queries are symmetric because there is no distinction between codes and vectors.

Subclassed by faiss::gpu::GpuIndexBinaryFlat, faiss::IndexBinaryFlat, faiss::IndexBinaryFromFloat, faiss::IndexBinaryHash, faiss::IndexBinaryHNSW, faiss::IndexBinaryIVF, faiss::IndexBinaryMultiHash

Public Types

using idx_t = Index::idx_t

all indices are this type

using component_t = uint8_t
using distance_t = int32_t

Public Functions

inline explicit IndexBinary(idx_t d = 0, MetricType metric = METRIC_L2)
virtual ~IndexBinary()
virtual void train(idx_t n, const uint8_t *x)

Perform training on a representative set of vectors.

Parameters
  • n – nb of training vectors

  • x – training vectors, size n * d / 8

virtual void add(idx_t n, const uint8_t *x) = 0

Add n vectors of dimension d to the index.

Vectors are implicitly assigned labels ntotal .. ntotal + n - 1

Parameters

x – input matrix, size n * d / 8

virtual void add_with_ids(idx_t n, const uint8_t *x, const idx_t *xids)

Same as add, but stores xids instead of sequential ids.

The default implementation fails with an assertion, as it is not supported by all indexes.

Parameters

xids – if non-null, ids to store for the vectors (size n)

virtual void search(idx_t n, const uint8_t *x, idx_t k, int32_t *distances, idx_t *labels) const = 0

Query the index with n vectors of dimension d.

Return at most k vectors per query. If there are not enough results for a query, the result array is padded with -1s.

Parameters
  • x – input vectors to search, size n * d / 8

  • labels – output labels of the NNs, size n*k

  • distances – output pairwise distances, size n*k

virtual void range_search(idx_t n, const uint8_t *x, int radius, RangeSearchResult *result) const

Query the index with n vectors of dimension d.

Return all vectors with distance < radius. Note that many indexes do not implement range_search (only the k-NN search is mandatory). The distances are converted to float to reuse the RangeSearchResult structure, but they are integers. By convention, only distances < radius (strict comparison) are returned, i.e. radius = 0 does not return any result and radius = 1 returns only vectors identical to the query.

Parameters
  • x – input vectors to search, size n * d / 8

  • radius – search radius

  • result – result table

void assign(idx_t n, const uint8_t *x, idx_t *labels, idx_t k = 1) const

Return the indexes of the k vectors closest to the query x.

This function is identical to search but only returns labels of neighbors.

Parameters
  • x – input vectors to search, size n * d / 8

  • labels – output labels of the NNs, size n*k

virtual void reset() = 0

Removes all elements from the database.

virtual size_t remove_ids(const IDSelector &sel)

Removes IDs from the index. Not supported by all indexes.

virtual void reconstruct(idx_t key, uint8_t *recons) const

Reconstruct a stored vector.

This function may not be defined for some indexes.

Parameters
  • key – id of the vector to reconstruct

  • recons – reconstructed vector (size d / 8)

virtual void reconstruct_n(idx_t i0, idx_t ni, uint8_t *recons) const

Reconstruct vectors i0 to i0 + ni - 1.

This function may not be defined for some indexes.

Parameters

recons – reconstructed vectors (size ni * d / 8)

virtual void search_and_reconstruct(idx_t n, const uint8_t *x, idx_t k, int32_t *distances, idx_t *labels, uint8_t *recons) const

Similar to search, but also reconstructs the stored vectors (or an approximation in the case of lossy coding) for the search results.

If there are not enough results for a query, the resulting array is padded with -1s.

Parameters

recons – reconstructed vectors size (n, k, d)

void display() const

Display the actual class name and some more info.

Public Members

int d

vector dimension

int code_size

number of bytes per vector ( = d / 8 )

idx_t ntotal

total nb of indexed vectors

bool verbose

verbosity level

bool is_trained

set if the Index does not require training, or if training is done already

MetricType metric_type

type of metric this index uses for search
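
IndexBinary is abstract; a minimal sketch, not part of the generated reference, using the IndexBinaryFlat subclass documented below. Vectors are d bits and are passed as d / 8 bytes each; the faiss/IndexBinaryFlat.h header path follows the current source layout:

    #include <cstdint>
    #include <vector>
    #include <faiss/IndexBinaryFlat.h>

    int main() {
        int d = 256;                              // dimension in bits, multiple of 8
        faiss::IndexBinary::idx_t nb = 1000, nq = 2, k = 4;

        std::vector<uint8_t> xb(nb * d / 8), xq(nq * d / 8);
        // ... fill with binary codes, d / 8 bytes per vector ...

        faiss::IndexBinaryFlat index(d);          // exhaustive Hamming search
        index.add(nb, xb.data());

        std::vector<int32_t> distances(nq * k);   // Hamming distances
        std::vector<faiss::IndexBinary::idx_t> labels(nq * k);
        index.search(nq, xq.data(), k, distances.data(), labels.data());
        return 0;
    }
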

struct IndexBinaryFlat : public faiss::IndexBinary
#include <IndexBinaryFlat.h>

Index that stores the full vectors and performs exhaustive search.

Public Types

using idx_t = Index::idx_t

all indices are this type

using component_t = uint8_t
using distance_t = int32_t

Public Functions

explicit IndexBinaryFlat(idx_t d)
virtual void add(idx_t n, const uint8_t *x) override

Add n vectors of dimension d to the index.

Vectors are implicitly assigned labels ntotal .. ntotal + n - 1

Parameters

x – input matrix, size n * d / 8

virtual void reset() override

Removes all elements from the database.

virtual void search(idx_t n, const uint8_t *x, idx_t k, int32_t *distances, idx_t *labels) const override

Query the index with n vectors of dimension d.

Return at most k vectors per query. If there are not enough results for a query, the result array is padded with -1s.

Parameters
  • x – input vectors to search, size n * d / 8

  • labels – output labels of the NNs, size n*k

  • distances – output pairwise distances, size n*k

virtual void range_search(idx_t n, const uint8_t *x, int radius, RangeSearchResult *result) const override

Query the index with n vectors of dimension d.

Return all vectors with distance < radius. Note that many indexes do not implement range_search (only the k-NN search is mandatory). The distances are converted to float to reuse the RangeSearchResult structure, but they are integers. By convention, only distances < radius (strict comparison) are returned, i.e. radius = 0 does not return any result and radius = 1 returns only vectors identical to the query.

Parameters
  • x – input vectors to search, size n * d / 8

  • radius – search radius

  • result – result table

virtual void reconstruct(idx_t key, uint8_t *recons) const override

Reconstruct a stored vector.

This function may not be defined for some indexes.

Parameters
  • key – id of the vector to reconstruct

  • recons – reconstructed vector (size d / 8)

virtual size_t remove_ids(const IDSelector &sel) override

Remove some ids. Note that because of the indexing structure, the semantics of this operation are different from the usual ones: the new ids are shifted.

inline IndexBinaryFlat()
virtual void train(idx_t n, const uint8_t *x)

Perform training on a representative set of vectors.

Parameters
  • n – nb of training vectors

  • x – training vectors, size n * d / 8

virtual void add_with_ids(idx_t n, const uint8_t *x, const idx_t *xids)

Same as add, but stores xids instead of sequential ids.

The default implementation fails with an assertion, as it is not supported by all indexes.

Parameters

xids – if non-null, ids to store for the vectors (size n)

void assign(idx_t n, const uint8_t *x, idx_t *labels, idx_t k = 1) const

Return the indexes of the k vectors closest to the query x.

This function is identical to search but only returns labels of neighbors.

Parameters
  • x – input vectors to search, size n * d / 8

  • labels – output labels of the NNs, size n*k

virtual void reconstruct_n(idx_t i0, idx_t ni, uint8_t *recons) const

Reconstruct vectors i0 to i0 + ni - 1.

This function may not be defined for some indexes.

Parameters

recons – reconstructed vectors (size ni * d / 8)

virtual void search_and_reconstruct(idx_t n, const uint8_t *x, idx_t k, int32_t *distances, idx_t *labels, uint8_t *recons) const

Similar to search, but also reconstructs the stored vectors (or an approximation in the case of lossy coding) for the search results.

If there are not enough results for a query, the resulting array is padded with -1s.

Parameters

recons – reconstructed vectors size (n, k, d)

void display() const

Display the actual class name and some more info.

Public Members

std::vector<uint8_t> xb

database vectors, size ntotal * d / 8

bool use_heap = true

Select between using a heap or counting to select the k smallest values when scanning inverted lists.

size_t query_batch_size = 32
int d

vector dimension

int code_size

number of bytes per vector ( = d / 8 )

idx_t ntotal

total nb of indexed vectors

bool verbose

verbosity level

bool is_trained

set if the Index does not require training, or if training is done already

MetricType metric_type

type of metric this index uses for search
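
A sketch of range_search on this index, not part of the generated reference, illustrating the strict < radius convention; the RangeSearchResult header path (faiss/impl/AuxIndexStructures.h) is assumed from the current source layout and may differ between versions:

    #include <cstdint>
    #include <vector>
    #include <faiss/IndexBinaryFlat.h>
    #include <faiss/impl/AuxIndexStructures.h>   // RangeSearchResult

    int main() {
        int d = 64;
        faiss::IndexBinaryFlat index(d);
        std::vector<uint8_t> xb(1000 * d / 8);    // 1000 binary codes, d / 8 bytes each
        index.add(1000, xb.data());

        std::vector<uint8_t> xq(d / 8);           // one query code
        faiss::RangeSearchResult res(1);          // result container for 1 query
        // strict comparison: radius = 3 returns vectors at Hamming distance 0, 1 or 2
        index.range_search(1, xq.data(), 3, &res);

        size_t nres = res.lims[1] - res.lims[0];  // number of results for query 0
        // ids are in res.labels, distances (stored as float) in res.distances
        (void)nres;
        return 0;
    }
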

struct IndexBinaryFromFloat : public faiss::IndexBinary
#include <IndexBinaryFromFloat.h>

IndexBinary backed by a float Index.

Supports adding vectors and searching them.

All queries are symmetric because there is no distinction between codes and vectors.

Public Types

using idx_t = Index::idx_t

all indices are this type

using component_t = uint8_t
using distance_t = int32_t

Public Functions

IndexBinaryFromFloat()
explicit IndexBinaryFromFloat(Index *index)
~IndexBinaryFromFloat()
virtual void add(idx_t n, const uint8_t *x) override

Add n vectors of dimension d to the index.

Vectors are implicitly assigned labels ntotal .. ntotal + n - 1

Parameters

x – input matrix, size n * d / 8

virtual void reset() override

Removes all elements from the database.

virtual void search(idx_t n, const uint8_t *x, idx_t k, int32_t *distances, idx_t *labels) const override

Query the index with n vectors of dimension d.

Return at most k vectors per query. If there are not enough results for a query, the result array is padded with -1s.

Parameters
  • x – input vectors to search, size n * d / 8

  • labels – output labels of the NNs, size n*k

  • distances – output pairwise distances, size n*k

virtual void train(idx_t n, const uint8_t *x) override

Perform training on a representative set of vectors.

Parameters
  • n – nb of training vectors

  • x – training vectors, size n * d / 8

virtual void add_with_ids(idx_t n, const uint8_t *x, const idx_t *xids)

Same as add, but stores xids instead of sequential ids.

The default implementation fails with an assertion, as it is not supported by all indexes.

Parameters

xids – if non-null, ids to store for the vectors (size n)

virtual void range_search(idx_t n, const uint8_t *x, int radius, RangeSearchResult *result) const

Query the index with n vectors of dimension d.

Return all vectors with distance < radius. Note that many indexes do not implement range_search (only the k-NN search is mandatory). The distances are converted to float to reuse the RangeSearchResult structure, but they are integers. By convention, only distances < radius (strict comparison) are returned, i.e. radius = 0 does not return any result and radius = 1 returns only vectors identical to the query.

Parameters
  • x – input vectors to search, size n * d / 8

  • radius – search radius

  • result – result table

void assign(idx_t n, const uint8_t *x, idx_t *labels, idx_t k = 1) const

Return the indexes of the k vectors closest to the query x.

This function is identical to search but only returns labels of neighbors.

Parameters
  • x – input vectors to search, size n * d / 8

  • labels – output labels of the NNs, size n*k

virtual size_t remove_ids(const IDSelector &sel)

Removes IDs from the index. Not supported by all indexes.

virtual void reconstruct(idx_t key, uint8_t *recons) const

Reconstruct a stored vector.

This function may not be defined for some indexes.

Parameters
  • key – id of the vector to reconstruct

  • recons – reconstructed vector (size d / 8)

virtual void reconstruct_n(idx_t i0, idx_t ni, uint8_t *recons) const

Reconstruct vectors i0 to i0 + ni - 1.

This function may not be defined for some indexes.

Parameters

recons – reconstructed vectors (size ni * d / 8)

virtual void search_and_reconstruct(idx_t n, const uint8_t *x, idx_t k, int32_t *distances, idx_t *labels, uint8_t *recons) const

Similar to search, but also reconstructs the stored vectors (or an approximation in the case of lossy coding) for the search results.

If there are not enough results for a query, the resulting array is padded with -1s.

Parameters

recons – reconstructed vectors size (n, k, d)

void display() const

Display the actual class name and some more info.

Public Members

Index *index = nullptr
bool own_fields = false

Whether object owns the index pointer.

int d

vector dimension

int code_size

number of bytes per vector ( = d / 8 )