File NSG.h

namespace faiss

Implementation of k-means clustering with many variants.

Copyright (c) Facebook, Inc. and its affiliates.

This source code is licensed under the MIT license found in the LICENSE file in the root directory of this source tree.

IDSelector is intended to define a subset of vectors to handle (for removal or as subset to search)

PQ4 SIMD packing and accumulation functions

The basic kernel accumulates nq query vectors with bbs = nb * 2 * 16 vectors and produces an output matrix for that. It is interesting for nq * nb <= 4, otherwise register spilling becomes too large.

The implementation of these functions is spread over 3 cpp files to reduce parallel compile times. Templates are instantiated explicitly.

This file contains callbacks for kernels that compute distances.

Throughout the library, vectors are provided as float * pointers. Most algorithms can be optimized when several vectors are processed (added/searched) together in a batch. In this case, they are passed in as a matrix. When n vectors of size d are provided as float * x, component j of vector i is

x[ i * d + j ]

where 0 <= i < n and 0 <= j < d. In other words, matrices are always compact. When specifying the size of the matrix, we call it an n*d matrix, which implies a row-major storage.

I/O functions can read/write to a filename, a file handle or to an object that abstracts the medium.

The read functions return objects that should be deallocated with delete. All references within these objectes are owned by the object.

Definition of inverted lists + a few common classes that implement the interface.

Since IVF (inverted file) indexes are of so much use for large-scale use cases, we group a few functions related to them in this small library. Most functions work both on IndexIVFs and IndexIVFs embedded within an IndexPreTransform.

In this file are the implementations of extra metrics beyond L2 and inner product

Implements a few neural net layers, mainly to support QINCo

Defines a few objects that apply transformations to a set of vectors Often these are pre-processing steps.

struct NSG

Public Types

using storage_idx_t = int32_t

internal storage of vectors (32 bits: this is expensive)

using Node = nsg::Node
using Neighbor = nsg::Neighbor

Public Functions

explicit NSG(int R = 32)
void build(Index *storage, idx_t n, const nsg::Graph<idx_t> &knn_graph, bool verbose)
void reset()
void search(DistanceComputer &dis, int k, idx_t *I, float *D, VisitedTable &vt) const
void init_graph(Index *storage, const nsg::Graph<idx_t> &knn_graph)
template<bool collect_fullset, class index_t>
void search_on_graph(const nsg::Graph<index_t> &graph, DistanceComputer &dis, VisitedTable &vt, int ep, int pool_size, std::vector<Neighbor> &retset, std::vector<Node> &fullset) const
void add_reverse_links(int q, std::vector<std::mutex> &locks, DistanceComputer &dis, nsg::Graph<Node> &graph)
void sync_prune(int q, std::vector<Node> &pool, DistanceComputer &dis, VisitedTable &vt, const nsg::Graph<idx_t> &knn_graph, nsg::Graph<Node> &graph)
void link(Index *storage, const nsg::Graph<idx_t> &knn_graph, nsg::Graph<Node> &graph, bool verbose)
int tree_grow(Index *storage, std::vector<int> &degrees)
int dfs(VisitedTable &vt, int root, int cnt) const
int attach_unlinked(Index *storage, VisitedTable &vt, VisitedTable &vt2, std::vector<int> &degrees)
void check_graph() const

Public Members

int ntotal = 0

nb of nodes

int R

nb of neighbors per node

int L

length of the search path at construction time

int C

candidate pool size at construction time

int search_L = 16

length of the search path

int enterpoint

enterpoint

std::shared_ptr<nsg::Graph<int32_t>> final_graph

NSG graph structure.

bool is_built = false

NSG is built or not.

RandomGenerator rng

random generator

namespace nsg

Functions

DistanceComputer *storage_distance_computer(const Index *storage)
template<class node_t>
struct Graph

Public Functions

inline Graph(node_t *data, int N, int K)
inline Graph(int N, int K)
inline Graph(const Graph &g)
inline virtual ~Graph()
inline node_t at(int i, int j) const
inline node_t &at(int i, int j)
inline virtual size_t get_neighbors(int i, node_t *neighbors) const

Public Members

node_t *data

the flattened adjacency matrix, size N-by-K

int K

nb of neighbors per node

int N

total nb of nodes

bool own_fields

the underlying data owned by itself or not