File HNSW.h
-
namespace faiss
Implementation of k-means clustering with many variants.
Copyright (c) Facebook, Inc. and its affiliates.
This source code is licensed under the MIT license found in the LICENSE file in the root directory of this source tree.
IDSelector is intended to define a subset of vectors to handle (for removal or as subset to search)
PQ4 SIMD packing and accumulation functions
The basic kernel accumulates nq query vectors with bbs = nb * 2 * 16 vectors and produces an output matrix for that. It is interesting for nq * nb <= 4, otherwise register spilling becomes too large.
The implementation of these functions is spread over 3 cpp files to reduce parallel compile times. Templates are instantiated explicitly.
This file contains callbacks for kernels that compute distances.
Throughout the library, vectors are provided as float * pointers. Most algorithms can be optimized when several vectors are processed (added/searched) together in a batch. In this case, they are passed in as a matrix. When n vectors of size d are provided as float * x, component j of vector i is
x[ i * d + j ]
where 0 <= i < n and 0 <= j < d. In other words, matrices are always compact. When specifying the size of the matrix, we call it an n*d matrix, which implies a row-major storage.
I/O functions can read/write to a filename, a file handle or to an object that abstracts the medium.
The read functions return objects that should be deallocated with delete. All references within these objectes are owned by the object.
Definition of inverted lists + a few common classes that implement the interface.
Since IVF (inverted file) indexes are of so much use for large-scale use cases, we group a few functions related to them in this small library. Most functions work both on IndexIVFs and IndexIVFs embedded within an IndexPreTransform.
In this file are the implementations of extra metrics beyond L2 and inner product
Implements a few neural net layers, mainly to support QINCo
Defines a few objects that apply transformations to a set of vectors Often these are pre-processing steps.
Functions
-
int search_from_candidates(const HNSW &hnsw, DistanceComputer &qdis, ResultHandler<HNSW::C> &res, HNSW::MinimaxHeap &candidates, VisitedTable &vt, HNSWStats &stats, int level, int nres_in = 0, const SearchParametersHNSW *params = nullptr)
-
HNSWStats greedy_update_nearest(const HNSW &hnsw, DistanceComputer &qdis, int level, HNSW::storage_idx_t &nearest, float &d_nearest)
-
std::priority_queue<HNSW::Node> search_from_candidate_unbounded(const HNSW &hnsw, const HNSW::Node &node, DistanceComputer &qdis, int ef, VisitedTable *vt, HNSWStats &stats)
-
void search_neighbors_to_add(HNSW &hnsw, DistanceComputer &qdis, std::priority_queue<HNSW::NodeDistCloser> &results, int entry_point, float d_entry_point, int level, VisitedTable &vt, bool reference_version = false)
Variables
- FAISS_API HNSWStats hnsw_stats
-
struct SearchParametersHNSW : public faiss::SearchParameters
Public Functions
-
inline ~SearchParametersHNSW()
-
inline ~SearchParametersHNSW()
-
struct HNSW
Public Types
-
using storage_idx_t = int32_t
internal storage of vectors (32 bits: this is expensive)
-
typedef std::pair<float, storage_idx_t> Node
Public Functions
-
void set_default_probas(int M, float levelMult)
initialize the assign_probas and cum_nneighbor_per_level to have 2*M links on level 0 and M links on levels > 0
-
void set_nb_neighbors(int level_no, int n)
set nb of neighbors for this level (before adding anything)
-
int nb_neighbors(int layer_no) const
nb of neighbors for this level
-
int cum_nb_neighbors(int layer_no) const
cumumlative nb up to (and excluding) this level
-
void neighbor_range(idx_t no, int layer_no, size_t *begin, size_t *end) const
range of entries in the neighbors table of vertex no at layer_no
-
explicit HNSW(int M = 32)
only mandatory parameter: nb of neighbors
-
int random_level()
pick a random level for a new point
-
void fill_with_random_links(size_t n)
add n random levels to table (for debugging…)
-
void add_links_starting_from(DistanceComputer &ptdis, storage_idx_t pt_id, storage_idx_t nearest, float d_nearest, int level, omp_lock_t *locks, VisitedTable &vt, bool keep_max_size_level0 = false)
-
void add_with_locks(DistanceComputer &ptdis, int pt_level, int pt_id, std::vector<omp_lock_t> &locks, VisitedTable &vt, bool keep_max_size_level0 = false)
add point pt_id on all levels <= pt_level and build the link structure for them.
-
HNSWStats search(DistanceComputer &qdis, ResultHandler<C> &res, VisitedTable &vt, const SearchParametersHNSW *params = nullptr) const
search interface for 1 point, single thread
-
void search_level_0(DistanceComputer &qdis, ResultHandler<C> &res, idx_t nprobe, const storage_idx_t *nearest_i, const float *nearest_d, int search_type, HNSWStats &search_stats, VisitedTable &vt, const SearchParametersHNSW *params = nullptr) const
search only in level 0 from a given vertex
-
void reset()
-
void clear_neighbor_tables(int level)
-
void print_neighbor_stats(int level) const
-
int prepare_level_tab(size_t n, bool preset_levels = false)
Public Members
-
std::vector<int> cum_nneighbor_per_level
number of neighbors stored per layer (cumulative), should not be changed after first add
-
std::vector<size_t> offsets
offsets[i] is the offset in the neighbors array where vector i is stored size ntotal + 1
-
std::vector<storage_idx_t> neighbors
neighbors[offsets[i]:offsets[i+1]] is the list of neighbors of vector i for all levels. this is where all storage goes.
-
storage_idx_t entry_point = -1
entry point in the search structure (one of the points with maximum level
-
faiss::RandomGenerator rng
-
int max_level = -1
maximum level
-
int efConstruction = 40
expansion factor at construction time
-
int efSearch = 16
expansion factor at search time
-
bool check_relative_distance = true
during search: do we check whether the next best distance is good enough?
-
bool search_bounded_queue = true
use bounded queue during exploration
Public Static Functions
-
static void shrink_neighbor_list(DistanceComputer &qdis, std::priority_queue<NodeDistFarther> &input, std::vector<NodeDistFarther> &output, int max_size, bool keep_max_size_level0 = false)
-
struct MinimaxHeap
- #include <HNSW.h>
Heap structure that allows fast
Public Types
-
typedef faiss::CMax<float, storage_idx_t> HC
-
typedef faiss::CMax<float, storage_idx_t> HC
-
struct NodeDistCloser
- #include <HNSW.h>
to sort pairs of (id, distance) from nearest to fathest or the reverse
Public Functions
-
inline NodeDistCloser(float d, int id)
-
inline bool operator<(const NodeDistCloser &obj1) const
-
inline NodeDistCloser(float d, int id)
-
struct NodeDistFarther
Public Functions
-
inline NodeDistFarther(float d, int id)
-
inline bool operator<(const NodeDistFarther &obj1) const
-
inline NodeDistFarther(float d, int id)
-
using storage_idx_t = int32_t
-
int search_from_candidates(const HNSW &hnsw, DistanceComputer &qdis, ResultHandler<HNSW::C> &res, HNSW::MinimaxHeap &candidates, VisitedTable &vt, HNSWStats &stats, int level, int nres_in = 0, const SearchParametersHNSW *params = nullptr)