File Heap.h
-
namespace faiss
Implementation of k-means clustering with many variants.
Copyright (c) Facebook, Inc. and its affiliates.
This source code is licensed under the MIT license found in the LICENSE file in the root directory of this source tree.
IDSelector is intended to define a subset of vectors to handle (for removal or as subset to search)
PQ4 SIMD packing and accumulation functions
The basic kernel accumulates nq query vectors with bbs = nb * 2 * 16 vectors and produces an output matrix for that. It is interesting for nq * nb <= 4, otherwise register spilling becomes too large.
The implementation of these functions is spread over 3 cpp files to reduce parallel compile times. Templates are instantiated explicitly.
This file contains callbacks for kernels that compute distances.
Throughout the library, vectors are provided as float * pointers. Most algorithms can be optimized when several vectors are processed (added/searched) together in a batch. In this case, they are passed in as a matrix. When n vectors of size d are provided as float * x, component j of vector i is
x[ i * d + j ]
where 0 <= i < n and 0 <= j < d. In other words, matrices are always compact. When specifying the size of the matrix, we call it an n*d matrix, which implies a row-major storage.
I/O functions can read/write to a filename, a file handle or to an object that abstracts the medium.
The read functions return objects that should be deallocated with delete. All references within these objectes are owned by the object.
Definition of inverted lists + a few common classes that implement the interface.
Since IVF (inverted file) indexes are of so much use for large-scale use cases, we group a few functions related to them in this small library. Most functions work both on IndexIVFs and IndexIVFs embedded within an IndexPreTransform.
In this file are the implementations of extra metrics beyond L2 and inner product
Implements a few neural net layers, mainly to support QINCo
Defines a few objects that apply transformations to a set of vectors Often these are pre-processing steps.
Typedefs
Functions
-
template<class C>
inline void heap_pop(size_t k, typename C::T *bh_val, typename C::TI *bh_ids) Pops the top element from the heap defined by bh_val[0..k-1] and bh_ids[0..k-1]. on output the element at k-1 is undefined.
-
template<class C>
inline void heap_push(size_t k, typename C::T *bh_val, typename C::TI *bh_ids, typename C::T val, typename C::TI id) Pushes the element (val, ids) into the heap bh_val[0..k-2] and bh_ids[0..k-2]. on output the element at k-1 is defined.
-
template<class C>
inline void heap_replace_top(size_t k, typename C::T *bh_val, typename C::TI *bh_ids, typename C::T val, typename C::TI id) Replaces the top element from the heap defined by bh_val[0..k-1] and bh_ids[0..k-1], and for identical bh_val[] values also sorts by bh_ids[] values.
-
template<typename T>
inline void minheap_push(size_t k, T *bh_val, int64_t *bh_ids, T val, int64_t ids)
-
template<typename T>
inline void minheap_replace_top(size_t k, T *bh_val, int64_t *bh_ids, T val, int64_t ids)
-
template<typename T>
inline void maxheap_push(size_t k, T *bh_val, int64_t *bh_ids, T val, int64_t ids)
-
template<typename T>
inline void maxheap_replace_top(size_t k, T *bh_val, int64_t *bh_ids, T val, int64_t ids)
-
template<class C>
inline void heap_pop(size_t k, std::pair<typename C::T, typename C::TI> *bh) Pops the top element from the heap defined by bh_val[0..k-1] and bh_ids[0..k-1]. on output the element at k-1 is undefined.
-
template<class C>
inline void heap_push(size_t k, std::pair<typename C::T, typename C::TI> *bh, typename C::T val, typename C::TI id) Pushes the element (val, ids) into the heap bh_val[0..k-2] and bh_ids[0..k-2]. on output the element at k-1 is defined.
-
template<class C>
inline void heap_replace_top(size_t k, std::pair<typename C::T, typename C::TI> *bh, typename C::T val, typename C::TI id) Replaces the top element from the heap defined by bh_val[0..k-1] and bh_ids[0..k-1], and for identical bh_val[] values also sorts by bh_ids[] values.
-
template<class C>
inline void heap_heapify(size_t k, typename C::T *bh_val, typename C::TI *bh_ids, const typename C::T *x = nullptr, const typename C::TI *ids = nullptr, size_t k0 = 0)
-
template<typename T>
inline void minheap_heapify(size_t k, T *bh_val, int64_t *bh_ids, const T *x = nullptr, const int64_t *ids = nullptr, size_t k0 = 0)
-
template<typename T>
inline void maxheap_heapify(size_t k, T *bh_val, int64_t *bh_ids, const T *x = nullptr, const int64_t *ids = nullptr, size_t k0 = 0)
-
template<class C>
inline void heap_addn(size_t k, typename C::T *bh_val, typename C::TI *bh_ids, const typename C::T *x, const typename C::TI *ids, size_t n)
-
template<typename T>
inline void minheap_addn(size_t k, T *bh_val, int64_t *bh_ids, const T *x, const int64_t *ids, size_t n)
-
template<typename T>
inline void maxheap_addn(size_t k, T *bh_val, int64_t *bh_ids, const T *x, const int64_t *ids, size_t n)
-
template<typename C>
inline size_t heap_reorder(size_t k, typename C::T *bh_val, typename C::TI *bh_ids)
-
template<class C>
inline void indirect_heap_pop(size_t k, const typename C::T *bh_val, typename C::TI *bh_ids)
-
template<class C>
inline void indirect_heap_push(size_t k, const typename C::T *bh_val, typename C::TI *bh_ids, typename C::TI id)
-
template<class idx_t, class C>
void merge_knn_results(size_t n, size_t k, typename C::TI nshard, const typename C::T *all_distances, const idx_t *all_labels, typename C::T *distances, idx_t *labels) Merge result tables from several shards. The per-shard results are assumed to be sorted. Note that the C comparator is reversed w.r.t. the usual top-k element heap because we want the best (ie. lowest for L2) result to be on top, not the worst. Also, it needs to hold an index of a shard id (ie. usually int32 is more than enough).
- Parameters:
all_distances – size (nshard, n, k)
all_labels – size (nshard, n, k)
distances – output distances, size (n, k)
labels – output labels, size (n, k)
-
template<typename C>
struct HeapArray - #include <Heap.h>
a template structure for a set of [min|max]-heaps it is tailored so that the actual data of the heaps can just live in compact arrays.
Public Functions
-
void heapify()
prepare all the heaps before adding
-
void addn(size_t nj, const T *vin, TI j0 = 0, size_t i0 = 0, int64_t ni = -1)
add nj elements to heaps i0:i0+ni, with sequential ids
- Parameters:
nj – nb of elements to add to each heap
vin – elements to add, size ni * nj
j0 – add this to the ids that are added
i0 – first heap to update
ni – nb of elements to update (-1 = use nh)
-
void addn_with_ids(size_t nj, const T *vin, const TI *id_in = nullptr, int64_t id_stride = 0, size_t i0 = 0, int64_t ni = -1)
same as addn
- Parameters:
id_in – ids of the elements to add, size ni * nj
id_stride – stride for id_in
-
void addn_query_subset_with_ids(size_t nsubset, const TI *subset, size_t nj, const T *vin, const TI *id_in = nullptr, int64_t id_stride = 0)
same as addn_with_ids, but for just a subset of queries
- Parameters:
nsubset – number of query entries to update
subset – indexes of queries to update, in 0..nh-1, size nsubset
-
void reorder()
reorder all the heaps
-
void heapify()
-
template<class C>