Namespace faiss::gpu

namespace gpu

Enums

enum class DistanceDataType

Values:

enumerator F32

enumerator F16

enumerator BF16

enum class IndicesDataType

Values:

enumerator I64

enumerator I32

enum class graph_build_algo

Values:

enumerator IVF_PQ: Use IVF-PQ to build all-neighbors knn graph.

enumerator NN_DESCENT: Use NN-Descent to build all-neighbors knn graph.

enum class codebook_gen

A type for specifying how PQ codebooks are created.

Values:

enumerator PER_SUBSPACE

enumerator PER_CLUSTER

enum class search_algo

Values:

enumerator SINGLE_CTA: For large batch sizes.

enumerator MULTI_CTA: For small batch sizes.

enumerator MULTI_KERNEL

enumerator AUTO

enum class hash_mode

Values:

enumerator HASH

enumerator SMALL

enumerator AUTO

enum IndicesOptions

How user vector index data is stored on the GPU.

Values:

enumerator INDICES_CPU: The user indices are only stored on the CPU; the GPU returns (inverted list, offset) to the CPU which is then translated to the real user index.

enumerator INDICES_IVF: The indices are not stored at all, on either the CPU or GPU. Only (inverted list, offset) is returned to the user as the index.

enumerator INDICES_32_BIT: Indices are stored as 32-bit integers on the GPU, but returned as 64-bit integers.

enumerator INDICES_64_BIT: Indices are stored as 64-bit integers on the GPU.

enum AllocType

Values:

enumerator Other: Unknown allocation type or miscellaneous (not currently categorized)

enumerator FlatData: Primary data storage for GpuIndexFlat (the raw matrix of vectors and vector norms if needed)

enumerator IVFLists: Primary data storage for GpuIndexIVF* (the storage for each individual IVF list)

enumerator Quantizer: Quantizer (PQ, SQ) dictionary information.

enumerator QuantizerPrecomputedCodes: For GpuIndexIVFPQ, “precomputed codes” for more efficient PQ lookup require the use of possibly large tables. These are marked separately from Quantizer because they can frequently be hundreds to thousands of MiB in size.

enumerator TemporaryMemoryBuffer: StandardGpuResources implementation-specific type. When using StandardGpuResources, temporary memory allocations (MemorySpace::Temporary) come out of a stack region of memory that is allocated up front for each GPU (e.g., 1.5 GiB upon initialization). This allocation by StandardGpuResources is marked with this AllocType.

enumerator TemporaryMemoryOverflow: When using StandardGpuResources, any MemorySpace::Temporary allocations that cannot be satisfied within the TemporaryMemoryBuffer region fall back to calling cudaMalloc, with each allocation sized to just the request at hand. These “overflow” temporary allocations are marked with this AllocType.
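
The relationship between TemporaryMemoryBuffer and TemporaryMemoryOverflow can be pictured with a small host-only sketch. The type and method names below (TempStackSketch, alloc, release) are hypothetical and the code tracks sizes with plain counters rather than CUDA memory. It is not the StandardGpuResources implementation, only an illustration of serving requests from a pre-allocated stack region and otherwise falling back to a request-sized allocation.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical host-only model of a temporary-memory stack with overflow.
// Sizes are tracked as plain counters; no GPU memory is touched.
struct TempStackSketch {
    explicit TempStackSketch(std::size_t capacity) : capacity_(capacity) {}

    // Returns "TemporaryMemoryBuffer" when the request fits in the
    // pre-allocated region, "TemporaryMemoryOverflow" otherwise.
    std::string alloc(std::size_t bytes) {
        const bool fits = used_ + bytes <= capacity_;
        if (fits) {
            used_ += bytes;
        }
        live_.push_back({bytes, fits});
        return fits ? "TemporaryMemoryBuffer" : "TemporaryMemoryOverflow";
    }

    // Releases the most recent allocation (stack discipline).
    void release() {
        assert(!live_.empty());
        if (live_.back().fromStack) {
            used_ -= live_.back().bytes;
        }
        live_.pop_back();
    }

    // Bytes currently taken from the pre-allocated region.
    std::size_t used() const { return used_; }

private:
    struct Entry {
        std::size_t bytes;
        bool fromStack;
    };
    std::size_t capacity_;
    std::size_t used_ = 0;
    std::vector<Entry> live_;
};
```

In this model an overflow allocation never changes used(), which mirrors the idea that overflow requests are satisfied outside the up-front region.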

enum MemorySpace

Memory regions accessible to the GPU.

Values:

enumerator Temporary: Temporary device memory (guaranteed to no longer be used upon exit of a top-level index call, and where the streams using it have completed GPU work). Typically backed by Device memory (cudaMalloc/cudaFree).

enumerator Device: Managed using cudaMalloc/cudaFree (typical GPU device memory)

enumerator Unified: Managed using cudaMallocManaged/cudaFree (typical Unified CPU/GPU memory)

Functions

faiss::Index *index_gpu_to_cpu(const faiss::Index *gpu_index): converts any GPU index inside gpu_index to a CPU index

faiss::Index *index_cpu_to_gpu(GpuResourcesProvider *provider, int device, const faiss::Index *index, const GpuClonerOptions *options = nullptr): converts any CPU index that can be converted to GPU

faiss::Index *index_cpu_to_gpu_multiple(std::vector<GpuResourcesProvider*> &provider, std::vector<int> &devices, const faiss::Index *index, const GpuMultipleClonerOptions *options = nullptr)

faiss::IndexBinary *index_binary_gpu_to_cpu(const faiss::IndexBinary *gpu_index)

faiss::IndexBinary *index_binary_cpu_to_gpu(GpuResourcesProvider *provider, int device, const faiss::IndexBinary *index, const GpuClonerOptions *options = nullptr): converts any CPU index that can be converted to GPU

faiss::IndexBinary *index_binary_cpu_to_gpu_multiple(std::vector<GpuResourcesProvider*> &provider, std::vector<int> &devices, const faiss::IndexBinary *index, const GpuMultipleClonerOptions *options = nullptr)

bool should_use_cuvs(GpuDistanceParams args): A function that determines whether cuVS should be used based on various conditions (such as unsupported architecture)

void bfKnn(GpuResourcesProvider *resources, const GpuDistanceParams &args)

A wrapper for gpu/impl/Distance.cuh to expose direct brute-force k-nearest neighbor searches on an externally-provided region of memory (e.g., from a pytorch tensor). The data (vectors, queries, outDistances, outIndices) can be resident on the GPU or the CPU, but all calculations are performed on the GPU. If the result buffers are on the CPU, results will be copied back when done.

All GPU computation is performed on the current CUDA device, and ordered with respect to resources->getDefaultStreamCurrentDevice().

For each vector in queries, searches all of vectors to find its k nearest neighbors with respect to the given metric.
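
As a point of comparison, the search described above can be written as a CPU reference. The sketch below is not part of Faiss: the function name bruteForceKnnSketch is hypothetical, it assumes row-major float data and the squared L2 metric only, and it does none of the GPU work that bfKnn performs.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical CPU reference: squared-L2 brute-force k-NN over row-major data.
// vectors is numVectors x dims, queries is numQueries x dims (dims innermost).
// Returns, per query, the k nearest (distance, index) pairs in ascending order.
std::vector<std::vector<std::pair<float, long>>> bruteForceKnnSketch(
        const std::vector<float>& vectors, long numVectors,
        const std::vector<float>& queries, long numQueries,
        int dims, int k) {
    assert(vectors.size() == static_cast<std::size_t>(numVectors) * dims);
    assert(queries.size() == static_cast<std::size_t>(numQueries) * dims);
    assert(k > 0 && k <= numVectors);

    std::vector<std::vector<std::pair<float, long>>> out(numQueries);
    for (long q = 0; q < numQueries; ++q) {
        std::vector<std::pair<float, long>> dist(numVectors);
        for (long v = 0; v < numVectors; ++v) {
            float sum = 0.0f;
            for (int d = 0; d < dims; ++d) {
                const float diff = queries[q * dims + d] - vectors[v * dims + d];
                sum += diff * diff;
            }
            dist[v] = {sum, v};
        }
        // Keep only the k smallest distances, sorted ascending.
        std::partial_sort(dist.begin(), dist.begin() + k, dist.end());
        dist.resize(k);
        out[q] = std::move(dist);
    }
    return out;
}
```

A reference like this is mainly useful for checking the results of a GPU search on small inputs.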

void bfKnn_tiling(GpuResourcesProvider *resources, const GpuDistanceParams &args, size_t vectorsMemoryLimit, size_t queriesMemoryLimit)

void bruteForceKnn(GpuResourcesProvider *resources, faiss::MetricType metric, const float *vectors, bool vectorsRowMajor, idx_t numVectors, const float *queries, bool queriesRowMajor, idx_t numQueries, int dims, int k, float *outDistances, idx_t *outIndices): Deprecated legacy implementation.

bool should_use_cuvs(GpuIndexConfig config_): A centralized function that determines whether cuVS should be used based on various conditions (such as unsupported architecture)

GpuIndex *tryCastGpuIndex(faiss::Index *index): If the given index is a GPU index, this returns the index instance.

bool isGpuIndex(faiss::Index *index): Is the given index instance a GPU index?

bool isGpuIndexImplemented(faiss::Index *index): Does the given CPU index instance have a corresponding GPU implementation?

std::string allocTypeToString(AllocType t): Convert an AllocType to string.

std::string memorySpaceToString(MemorySpace s): Convert a MemorySpace to string.

AllocInfo makeDevAlloc(AllocType at, cudaStream_t st): Create an AllocInfo for the current device with MemorySpace::Device.

AllocInfo makeTempAlloc(AllocType at, cudaStream_t st): Create an AllocInfo for the current device with MemorySpace::Temporary.

AllocInfo makeSpaceAlloc(AllocType at, MemorySpace sp, cudaStream_t st): Create an AllocInfo for the current device.

int getMaxKSelection()

A collection of various utility functions for index implementation.

Returns the maximum k-selection value supported, based on the CUDA SDK that we were compiled with. .cu files can use DeviceDefs.cuh; this function is for non-CUDA files.

void validateKSelect(int k)

void validateNProbe(size_t nprobe)

std::vector<uint8_t> unpackNonInterleaved(std::vector<uint8_t> data, int numVecs, int dims, int bitsPerCode)

std::vector<uint8_t> unpackInterleaved(std::vector<uint8_t> data, int numVecs, int dims, int bitsPerCode)

std::vector<uint8_t> packNonInterleaved(std::vector<uint8_t> data, int numVecs, int dims, int bitsPerCode)

std::vector<uint8_t> packInterleaved(std::vector<uint8_t> data, int numVecs, int dims, int bitsPerCode)
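
This reference does not describe the byte layout used by the pack and unpack helpers above. The sketch below therefore assumes one simple non-interleaved layout (codes stored back to back, least-significant bit first, each vector starting on a byte boundary) to show what a pack/unpack round trip looks like. The function names packBitsSketch and unpackBitsSketch are hypothetical and the layout may differ from the one Faiss uses.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: packs `dims` codes per vector, each `bitsPerCode` wide,
// into consecutive bits (LSB first). Each vector starts on a byte boundary.
std::vector<std::uint8_t> packBitsSketch(
        const std::vector<std::uint8_t>& codes, int numVecs, int dims, int bitsPerCode) {
    assert(bitsPerCode >= 1 && bitsPerCode <= 8);
    assert(codes.size() == static_cast<std::size_t>(numVecs) * dims);
    const std::size_t bytesPerVec = (static_cast<std::size_t>(dims) * bitsPerCode + 7) / 8;
    std::vector<std::uint8_t> out(bytesPerVec * numVecs, 0);
    for (int v = 0; v < numVecs; ++v) {
        for (int d = 0; d < dims; ++d) {
            const std::uint8_t code = codes[static_cast<std::size_t>(v) * dims + d];
            for (int b = 0; b < bitsPerCode; ++b) {
                const std::size_t bit = static_cast<std::size_t>(d) * bitsPerCode + b;
                if ((code >> b) & 1) {
                    out[v * bytesPerVec + bit / 8] |= static_cast<std::uint8_t>(1u << (bit % 8));
                }
            }
        }
    }
    return out;
}

// Inverse of packBitsSketch under the same assumed layout.
std::vector<std::uint8_t> unpackBitsSketch(
        const std::vector<std::uint8_t>& packed, int numVecs, int dims, int bitsPerCode) {
    assert(bitsPerCode >= 1 && bitsPerCode <= 8);
    const std::size_t bytesPerVec = (static_cast<std::size_t>(dims) * bitsPerCode + 7) / 8;
    assert(packed.size() == bytesPerVec * numVecs);
    std::vector<std::uint8_t> codes(static_cast<std::size_t>(numVecs) * dims, 0);
    for (int v = 0; v < numVecs; ++v) {
        for (int d = 0; d < dims; ++d) {
            for (int b = 0; b < bitsPerCode; ++b) {
                const std::size_t bit = static_cast<std::size_t>(d) * bitsPerCode + b;
                if ((packed[v * bytesPerVec + bit / 8] >> (bit % 8)) & 1) {
                    codes[static_cast<std::size_t>(v) * dims + d] |= static_cast<std::uint8_t>(1u << b);
                }
            }
        }
    }
    return codes;
}
```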

void ivfOffsetToUserIndex(idx_t *indices, idx_t numLists, idx_t queries, int k, const std::vector<std::vector<idx_t>> &listOffsetToUserIndex): Utility function to translate (list id, offset) to a user index on the CPU. Implemented in a .cpp file in order to use OpenMP.
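
The translation from (list id, offset) to a user index amounts to a table lookup. The sketch below assumes a particular encoding that this reference does not state: the list id in the upper 32 bits of each entry, the offset in the lower 32 bits, and negative entries meaning "no result". The names idx_sketch_t, encodeListOffset and ivfOffsetToUserIndexSketch are hypothetical, and the loop is serial rather than parallelized with OpenMP.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using idx_sketch_t = std::int64_t;

// Assumed encoding: list id in the upper 32 bits, offset in the lower 32 bits.
idx_sketch_t encodeListOffset(std::int32_t listId, std::int32_t offset) {
    return (static_cast<idx_sketch_t>(listId) << 32) |
            static_cast<std::uint32_t>(offset);
}

// Rewrites each encoded (list id, offset) entry in place with the user index
// found in listOffsetToUserIndex. Negative entries are left untouched.
void ivfOffsetToUserIndexSketch(
        std::vector<idx_sketch_t>& indices,
        const std::vector<std::vector<idx_sketch_t>>& listOffsetToUserIndex) {
    for (auto& entry : indices) {
        if (entry < 0) {
            continue; // no result for this slot
        }
        const auto listId = static_cast<std::size_t>(entry >> 32);
        const auto offset = static_cast<std::size_t>(entry & 0xffffffff);
        assert(listId < listOffsetToUserIndex.size());
        assert(offset < listOffsetToUserIndex[listId].size());
        entry = listOffsetToUserIndex[listId][offset];
    }
}
```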

void newTestSeed(): Generates and displays a new seed for the test.

void setTestSeed(long seed): Uses an explicit seed for the test.

float relativeError(float a, float b): Returns the relative error between a and b, computed as |a - b| / (0.5 * (|a| + |b|)).
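
The formula can be written out directly. This is a minimal sketch with a hypothetical name (relativeErrorSketch), not the test utility itself. It does not guard the case a == b == 0, where the denominator is zero.

```cpp
#include <cassert>
#include <cmath>

// Sketch of |a - b| / (0.5 * (|a| + |b|)).
// The denominator is zero when a == b == 0; this sketch does not guard that case.
float relativeErrorSketch(float a, float b) {
    return std::abs(a - b) / (0.5f * (std::abs(a) + std::abs(b)));
}
```

Because the difference is normalized by the mean magnitude of the two values, the result ranges from 0 (equal values) to 2 (values of opposite sign).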

int randVal(int a, int b): Generates a random integer in the range [a, b].

bool randBool(): Generates a random bool.

template<typename T> T randSelect(std::initializer_list<T> vals): Select a random value from the given list of values provided as an initializer_list

std::vector<float> randVecs(size_t num, size_t dim): Generates a collection of random vectors in the range [0, 1].

std::vector<unsigned char> randBinaryVecs(size_t num, size_t dim): Generates a collection of random bit vectors.

std::vector<float> roundToHalf(const std::vector<float> &v)

void compareIndices(const std::vector<float> &queryVecs, faiss::Index &refIndex, faiss::Index &testIndex, int numQuery, int dim, int k, const std::string &configMsg, float maxRelativeError = 6e-5f, float pctMaxDiff1 = 0.1f, float pctMaxDiffN = 0.005f): Compare two indices via query for similarity, with a user-specified set of query vectors

void compareIndices(faiss::Index &refIndex, faiss::Index &testIndex, int numQuery, int dim, int k, const std::string &configMsg, float maxRelativeError = 6e-5f, float pctMaxDiff1 = 0.1f, float pctMaxDiffN = 0.005f): Compare two indices via query for similarity, generating random query vectors

void compareLists(const float *refDist, const faiss::idx_t *refInd, const float *testDist, const faiss::idx_t *testInd, int dim1, int dim2, const std::string &configMsg, bool printBasicStats, bool printDiffs, bool assertOnErr, float maxRelativeError = 6e-5f, float pctMaxDiff1 = 0.1f, float pctMaxDiffN = 0.005f): Display specific differences in the two (distance, index) lists.

template<typename A, typename B> void testIVFEquality(A &cpuIndex, B &gpuIndex): Compare IVF lists between a CPU and GPU index.

inline cuvsDistanceType metricFaissToCuvs(MetricType metric, bool exactDistance)

void validRowIndices(GpuResources *res, Tensor<float, 2, true> &vecs, bool *validRows): Identifies the matrix rows that contain no NaN values. validRows[i] is false if row i contains a NaN value and true otherwise.

idx_t inplaceGatherFilteredRows(GpuResources *res, Tensor<float, 2, true> &vecs, Tensor<idx_t, 1, true> &indices): Filter out matrix rows containing NaN values. The vectors and indices are updated in-place.
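
The two NaN-handling steps above can be illustrated on the host. The sketch below uses std::vector in place of Tensor and hypothetical function names (validRowsSketch, gatherValidRowsSketch). It also shrinks the containers after compaction, which is a choice of this sketch and not something this reference states about the GPU functions.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Host-only sketch: vecs is row-major (numRows x dims).
// A row is valid when none of its components is NaN.
std::vector<bool> validRowsSketch(const std::vector<float>& vecs, std::size_t dims) {
    assert(dims > 0 && vecs.size() % dims == 0);
    std::vector<bool> valid(vecs.size() / dims, true);
    for (std::size_t i = 0; i < vecs.size(); ++i) {
        if (std::isnan(vecs[i])) {
            valid[i / dims] = false;
        }
    }
    return valid;
}

// Compacts the valid rows and their indices to the front, in place,
// and returns the number of rows kept.
std::size_t gatherValidRowsSketch(
        std::vector<float>& vecs, std::vector<long>& indices, std::size_t dims) {
    const std::vector<bool> valid = validRowsSketch(vecs, dims);
    assert(indices.size() == valid.size());
    std::size_t kept = 0;
    for (std::size_t r = 0; r < valid.size(); ++r) {
        if (!valid[r]) {
            continue;
        }
        for (std::size_t d = 0; d < dims; ++d) {
            vecs[kept * dims + d] = vecs[r * dims + d];
        }
        indices[kept] = indices[r];
        ++kept;
    }
    vecs.resize(kept * dims);
    indices.resize(kept);
    return kept;
}
```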

int getCurrentDevice(): Returns the current thread-local GPU device.

void setCurrentDevice(int device): Sets the current thread-local GPU device.

int getNumDevices(): Returns the number of available GPU devices.

void profilerStart(): Starts the CUDA profiler (exposed via SWIG)

void profilerStop(): Stops the CUDA profiler (exposed via SWIG)

void synchronizeAllDevices(): Synchronizes the CPU against all devices (equivalent to cudaDeviceSynchronize for each device)

const cudaDeviceProp &getDeviceProperties(int device): Returns a cached cudaDeviceProp for the given device.

const cudaDeviceProp &getCurrentDeviceProperties(): Returns the cached cudaDeviceProp for the current device.

int getMaxThreads(int device): Returns the maximum number of threads available for the given GPU device

int getMaxThreadsCurrentDevice(): Equivalent to getMaxThreads(getCurrentDevice())

dim3 getMaxGrid(int device): Returns the maximum grid size for the given GPU device.

dim3 getMaxGridCurrentDevice(): Equivalent to getMaxGrid(getCurrentDevice())

size_t getMaxSharedMemPerBlock(int device): Returns the maximum smem available for the given GPU device.

size_t getMaxSharedMemPerBlockCurrentDevice(): Equivalent to getMaxSharedMemPerBlock(getCurrentDevice())

int getDeviceForAddress(const void *p): For a given pointer, returns the ID of the device it is located on (deviceId >= 0), or -1 if it is located on the host.

bool getFullUnifiedMemSupport(int device): Does the given device support full unified memory sharing with host memory?

bool getFullUnifiedMemSupportCurrentDevice(): Equivalent to getFullUnifiedMemSupport(getCurrentDevice())

bool getTensorCoreSupport(int device): Does the given device support tensor core operations?

bool getTensorCoreSupportCurrentDevice(): Equivalent to getTensorCoreSupport(getCurrentDevice())

int getWarpSize(int device): Returns the warp size of the given GPU device.

int getWarpSizeCurrentDevice(): Equivalent to getWarpSize(getCurrentDevice())

size_t getFreeMemory(int device): Returns the amount of currently available memory on the given device.

size_t getFreeMemoryCurrentDevice(): Equivalent to getFreeMemory(getCurrentDevice())

template<typename L1, typename L2> void streamWaitBase(const L1 &listWaiting, const L2 &listWaitOn): Makes each stream in listWaiting wait on the streams in listWaitOn.

template<typename L1> void streamWait(const L1 &a, const std::initializer_list<cudaStream_t> &b): These versions allow usage of initializer_list as arguments, since otherwise {…} doesn’t have a type

template<typename L2> void streamWait(const std::initializer_list<cudaStream_t> &a, const L2 &b)

inline void streamWait(const std::initializer_list<cudaStream_t> &a, const std::initializer_list<cudaStream_t> &b)

struct GpuParameterSpace : public faiss::ParameterSpace

#include <GpuAutoTune.h>

parameter space and setters for GPU indexes

Public Functions

virtual void initialize(const faiss::Index *index) override: initialize with reasonable parameters for the index

virtual void set_index_parameter(faiss::Index *index, const std::string &name, double val) const override: set the parameter called name to the value val on an index

size_t n_combinations() const: number of combinations, equal to the product of the sizes of the parameter value lists

bool combination_ge(size_t c1, size_t c2) const: returns whether combinations c1 >= c2 in the tuple sense

std::string combination_name(size_t cno) const: get string representation of the combination

void display() const: print a description on stdout

ParameterRange &add_range(const std::string &name): add a new parameter (or return it if it exists)

void set_index_parameters(Index *index, size_t cno) const: set a combination of parameters on an index

void set_index_parameters(Index *index, const char *param_string) const: set a combination of parameters described by a string

void update_bounds(size_t cno, const OperatingPoint &op, double *upper_bound_perf, double *lower_bound_t) const: find an upper bound on the performance and a lower bound on t for configuration cno given another operating point op

void explore(Index *index, size_t nq, const float *xq, const AutoTuneCriterion &crit, OperatingPoints *ops) const

explore operating points

Parameters:

index – index to run on
xq – query vectors (size nq * index.d)
crit – selection criterion
ops – resulting operating points
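
The combination numbers used by n_combinations, combination_ge and set_index_parameters can be pictured as mixed-radix integers. The sketch below is an assumption about that encoding (first parameter as the least significant digit), and all names in it are hypothetical. It only illustrates "product of sizes" and "greater or equal in the tuple sense".

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// rangeSizes[i] is the number of candidate values of parameter i.
std::size_t nCombinationsSketch(const std::vector<std::size_t>& rangeSizes) {
    std::size_t n = 1;
    for (std::size_t s : rangeSizes) {
        n *= s;
    }
    return n;
}

// Decodes a combination number into one value index per parameter
// (assumed mixed radix, first parameter = least significant digit).
std::vector<std::size_t> decodeCombinationSketch(
        const std::vector<std::size_t>& rangeSizes, std::size_t cno) {
    std::vector<std::size_t> digits;
    for (std::size_t s : rangeSizes) {
        assert(s > 0);
        digits.push_back(cno % s);
        cno /= s;
    }
    return digits;
}

// c1 >= c2 in the tuple sense: every component of c1 is >= that of c2.
bool combinationGeSketch(
        const std::vector<std::size_t>& rangeSizes, std::size_t c1, std::size_t c2) {
    const auto d1 = decodeCombinationSketch(rangeSizes, c1);
    const auto d2 = decodeCombinationSketch(rangeSizes, c2);
    for (std::size_t i = 0; i < d1.size(); ++i) {
        if (d1[i] < d2[i]) {
            return false;
        }
    }
    return true;
}
```

Under this model the comparison is only a partial order: two combinations can each be smaller than the other in some component, in which case neither is greater or equal to the other.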

Public Members

std::vector<ParameterRange> parameter_ranges: all tunable parameters

int verbose: verbosity during exploration

int n_experiments: nb of experiments during optimization (0 = try all combinations)

size_t batchsize: maximum number of queries to submit at a time.

bool thread_over_batches: use multithreading over batches (useful to benchmark independent single-searches)

double min_test_duration: run tests several times until they reach at least this duration (to avoid jittering in MT mode)

struct ToCPUCloner : public faiss::Cloner

#include <GpuCloner.h>

Cloner specialized for GPU -> CPU.

Public Functions

void merge_index(Index *dst, Index *src, bool successive_ids)

virtual Index *clone_Index(const Index *index) override

virtual VectorTransform *clone_VectorTransform(const VectorTransform*)

virtual IndexIVF *clone_IndexIVF(const IndexIVF*)

struct ToGpuCloner : public faiss::Cloner, public faiss::gpu::GpuClonerOptions

#include <GpuCloner.h>

Cloner specialized for CPU -> 1 GPU.

Public Functions

ToGpuCloner(GpuResourcesProvider *prov, int device, const GpuClonerOptions &options)

virtual Index *clone_Index(const Index *index) override

virtual VectorTransform *clone_VectorTransform(const VectorTransform*)

virtual IndexIVF *clone_IndexIVF(const IndexIVF*)

Public Members

GpuResourcesProvider *provider

int device

IndicesOptions indicesOptions = INDICES_64_BIT: how should indices be stored on index types that support indices (anything but GpuIndexFlat*)?

bool useFloat16CoarseQuantizer = false: is the coarse quantizer in float16?

bool useFloat16 = false: for GpuIndexIVFFlat, is storage in float16? for GpuIndexIVFPQ, are intermediate calculations in float16?

bool usePrecomputed = false: use precomputed tables?

long reserveVecs = 0: reserve vectors in the invfiles?

bool storeTransposed = false: For GpuIndexFlat, store data in transposed layout?

bool verbose = false: Set verbose options on the index.

bool use_cuvs = false: use the cuVS implementation

bool allowCpuCoarseQuantizer = false: This flag controls the CPU fallback logic for the coarse quantizer component of the index. When set to false (default), the cloner will throw an exception for indices not implemented on GPU. When set to true, it will fall back to a CPU implementation.

struct ToGpuClonerMultiple : public faiss::Cloner, public faiss::gpu::GpuMultipleClonerOptions

#include <GpuCloner.h>

Cloner specialized for CPU -> multiple GPUs.

Public Functions

ToGpuClonerMultiple(std::vector<GpuResourcesProvider*> &provider, std::vector<int> &devices, const GpuMultipleClonerOptions &options)

ToGpuClonerMultiple(const std::vector<ToGpuCloner> &sub_cloners, const GpuMultipleClonerOptions &options)

void copy_ivf_shard(const IndexIVF *index_ivf, IndexIVF *idx2, idx_t n, idx_t i)

Index *clone_Index_to_shards(const Index *index)

virtual Index *clone_Index(const Index *index) override: main function

virtual VectorTransform *clone_VectorTransform(const VectorTransform*)

virtual IndexIVF *clone_IndexIVF(const IndexIVF*)

Public Members

std::vector<ToGpuCloner> sub_cloners

bool shard = false: Whether to shard the index across GPUs, versus replication across GPUs

int shard_type = 1: IndexIVF::copy_subset_to subset type.

bool common_ivf_quantizer = false: set to true if an IndexIVF is to be dispatched to multiple GPUs with a single common IVF quantizer, i.e., only the inverted lists are sharded on the sub-indexes (uses an IndexShardsIVF)

IndicesOptions indicesOptions = INDICES_64_BIT: how should indices be stored on index types that support indices (anything but GpuIndexFlat*)?

bool useFloat16CoarseQuantizer = false: is the coarse quantizer in float16?

bool useFloat16 = false: for GpuIndexIVFFlat, is storage in float16? for GpuIndexIVFPQ, are intermediate calculations in float16?

bool usePrecomputed = false: use precomputed tables?

long reserveVecs = 0: reserve vectors in the invfiles?

bool storeTransposed = false: For GpuIndexFlat, store data in transposed layout?

bool verbose = false: Set verbose options on the index.

bool use_cuvs = false: use the cuVS implementation

bool allowCpuCoarseQuantizer = false: This flag controls the CPU fallback logic for the coarse quantizer component of the index. When set to false (default), the cloner will throw an exception for indices not implemented on GPU. When set to true, it will fall back to a CPU implementation.

struct GpuProgressiveDimIndexFactory : public faiss::ProgressiveDimIndexFactory

#include <GpuCloner.h>

index factory for the ProgressiveDimClustering object

Public Functions

explicit GpuProgressiveDimIndexFactory(int ngpu)

virtual Index *operator()(int dim) override: ownership transferred to caller

virtual ~GpuProgressiveDimIndexFactory() override

Public Members

GpuMultipleClonerOptions options

std::vector<GpuResourcesProvider*> vres

std::vector<int> devices

int ncall

struct GpuClonerOptions

#include <GpuClonerOptions.h>

set some options on how to copy to GPU

Subclassed by faiss::gpu::GpuMultipleClonerOptions, faiss::gpu::ToGpuCloner

Public Members

IndicesOptions indicesOptions = INDICES_64_BIT: how should indices be stored on index types that support indices (anything but GpuIndexFlat*)?

bool useFloat16CoarseQuantizer = false: is the coarse quantizer in float16?

bool useFloat16 = false: for GpuIndexIVFFlat, is storage in float16? for GpuIndexIVFPQ, are intermediate calculations in float16?

bool usePrecomputed = false: use precomputed tables?

long reserveVecs = 0: reserve vectors in the invfiles?

bool storeTransposed = false: For GpuIndexFlat, store data in transposed layout?

bool verbose = false: Set verbose options on the index.

bool use_cuvs = false: use the cuVS implementation

bool allowCpuCoarseQuantizer = false: This flag controls the CPU fallback logic for the coarse quantizer component of the index. When set to false (default), the cloner will throw an exception for indices not implemented on GPU. When set to true, it will fall back to a CPU implementation.

struct GpuMultipleClonerOptions : public faiss::gpu::GpuClonerOptions

Subclassed by faiss::gpu::ToGpuClonerMultiple

Public Members

bool shard = false: Whether to shard the index across GPUs, versus replication across GPUs

int shard_type = 1: IndexIVF::copy_subset_to subset type.

bool common_ivf_quantizer = false: set to true if an IndexIVF is to be dispatched to multiple GPUs with a single common IVF quantizer, i.e., only the inverted lists are sharded on the sub-indexes (uses an IndexShardsIVF)

IndicesOptions indicesOptions = INDICES_64_BIT: how should indices be stored on index types that support indices (anything but GpuIndexFlat*)?

bool useFloat16CoarseQuantizer = false: is the coarse quantizer in float16?

bool useFloat16 = false: for GpuIndexIVFFlat, is storage in float16? for GpuIndexIVFPQ, are intermediate calculations in float16?

bool usePrecomputed = false: use precomputed tables?

long reserveVecs = 0: reserve vectors in the invfiles?

bool storeTransposed = false: For GpuIndexFlat, store data in transposed layout?

bool verbose = false: Set verbose options on the index.

bool use_cuvs = false: use the cuVS implementation

bool allowCpuCoarseQuantizer = false: This flag controls the CPU fallback logic for the coarse quantizer component of the index. When set to false (default), the cloner will throw an exception for indices not implemented on GPU. When set to true, it will fall back to a CPU implementation.

struct GpuDistanceParams

#include <GpuDistance.h>

Arguments to brute-force GPU k-nearest neighbor searching.

Public Members

faiss::MetricType metric = METRIC_L2: Search parameter: distance metric.

float metricArg = 0: Search parameter: distance metric argument (if applicable). For metric == METRIC_Lp, this is the p-value.

int k = 0: Search parameter: return k nearest neighbors. If the value provided is -1, then we report all pairwise distances without top-k filtering.

int dims = 0: Vector dimensionality.

const void *vectors = nullptr: If vectorsRowMajor is true, this is numVectors x dims, with dims innermost; otherwise, dims x numVectors, with numVectors innermost

DistanceDataType vectorType = DistanceDataType::F32

bool vectorsRowMajor = true

idx_t numVectors = 0

const float *vectorNorms = nullptr: Precomputed L2 norms for each vector in vectors, which can be optionally provided in advance to speed computation for METRIC_L2

const void *queries = nullptr: If queriesRowMajor is true, this is numQueries x dims, with dims innermost; otherwise, dims x numQueries, with numQueries innermost

DistanceDataType queryType = DistanceDataType::F32

bool queriesRowMajor = true

idx_t numQueries = 0

float *outDistances = nullptr: A region of memory size numQueries x k, with k innermost (row major) if k > 0, or if k == -1, a region of memory of size numQueries x numVectors

bool ignoreOutDistances = false: Do we only care about the indices reported, rather than the output distances? Not used if k == -1 (all pairwise distances)

IndicesDataType outIndicesType = IndicesDataType::I64: Element type of outIndices. outIndices is a region of memory of size numQueries x k, with k innermost (row major). Not used if k == -1 (all pairwise distances).

void *outIndices = nullptr

int device = -1: On which GPU device should the search run? -1 indicates that the current CUDA thread-local device (via cudaGetDevice/cudaSetDevice) is used. Otherwise, an integer 0 <= device < numDevices indicates the device for execution.

bool use_cuvs = false: Should the index dispatch down to cuVS?
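
The buffer shapes described for vectors, queries and outDistances reduce to a little index arithmetic. The helpers below are a hypothetical sketch (the names outDistancesElemsSketch and elementOffsetSketch do not exist in Faiss) that follows the member descriptions above.

```cpp
#include <cassert>
#include <cstddef>

// Number of floats outDistances must hold: numQueries x k when k > 0,
// numQueries x numVectors when k == -1 (all pairwise distances).
std::size_t outDistancesElemsSketch(long numQueries, long numVectors, int k) {
    assert(k > 0 || k == -1);
    const long perQuery = (k == -1) ? numVectors : static_cast<long>(k);
    return static_cast<std::size_t>(numQueries) * static_cast<std::size_t>(perQuery);
}

// Flat offset of component d of item i in a buffer of `num` items:
// row major is num x dims (dims innermost), otherwise dims x num (num innermost).
std::size_t elementOffsetSketch(bool rowMajor, long num, int dims, long i, int d) {
    assert(i >= 0 && i < num && d >= 0 && d < dims);
    const long offset = rowMajor ? i * dims + d : d * num + i;
    return static_cast<std::size_t>(offset);
}
```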

class GpuIcmEncoder : public faiss::lsq::IcmEncoder

#include <GpuIcmEncoder.h>

Perform LSQ encoding on GPU.

Split input vectors to different devices and call IcmEncoderImpl::encode to encode them

Public Functions

GpuIcmEncoder(const LocalSearchQuantizer *lsq, const std::vector<GpuResourcesProvider*> &provs, const std::vector<int> &devices)

~GpuIcmEncoder()

GpuIcmEncoder(const GpuIcmEncoder&) = delete

GpuIcmEncoder &operator=(const GpuIcmEncoder&) = delete

virtual void set_binary_term() override

virtual void encode(int32_t *codes, const float *x, std::mt19937 &gen, size_t n, size_t ils_iters) const override

Encode vectors given codebooks

Parameters:

codes – output codes, size n * M
x – vectors to encode, size n * d
gen – random generator
n – number of vectors
ils_iters – number of iterations of iterative local search

Public Members

std::vector<float> binaries

bool verbose

const LocalSearchQuantizer *lsq

Private Members

std::unique_ptr<IcmEncoderShards> shards

struct GpuIcmEncoderFactory : public faiss::lsq::IcmEncoderFactory

Public Functions

explicit GpuIcmEncoderFactory(int ngpus = 1)

virtual lsq::IcmEncoder *get(const LocalSearchQuantizer *lsq) override

Public Members

std::vector<GpuResourcesProvider*> provs

std::vector<int> devices

struct GpuIndexConfig

Subclassed by faiss::gpu::GpuIndexBinaryFlatConfig, faiss::gpu::GpuIndexCagraConfig, faiss::gpu::GpuIndexFlatConfig, faiss::gpu::GpuIndexIVFConfig

Public Members

int device = 0: GPU device on which the index is resident.

MemorySpace memorySpace = MemorySpace::Device: What memory space to use for primary storage. On Pascal and above (CC 6+) architectures, MemorySpace::Unified allows GPUs to use more memory than is available on the GPU.

bool use_cuvs = false: Should the index dispatch down to cuVS?

class GpuIndex : public faiss::Index

Subclassed by faiss::gpu::GpuIndexCagra, faiss::gpu::GpuIndexFlat, faiss::gpu::GpuIndexIVF

Public Types

using component_t = float

using distance_t = float

Public Functions

GpuIndex(std::shared_ptr<GpuResources> resources, int dims, faiss::MetricType metric, float metricArg, GpuIndexConfig config)

int getDevice() const: Returns the device that this index is resident on.

std::shared_ptr<GpuResources> getResources(): Returns a reference to our GpuResources object that manages memory, stream and handle resources on the GPU

void setMinPagingSize(size_t size): Set the minimum data size for searches (in MiB) for which we use CPU -> GPU paging

size_t getMinPagingSize() const: Returns the current minimum data size for paged searches.

virtual void add(idx_t, const float *x) override: x can be resident on the CPU or any GPU; copies are performed as needed. Handles paged adds if the add set is too large; calls addInternal_.

virtual void add_with_ids(idx_t n, const float *x, const idx_t *ids) override: x and ids can be resident on the CPU or any GPU; copies are performed as needed. Handles paged adds if the add set is too large; calls addInternal_.

virtual void assign(idx_t n, const float *x, idx_t *labels, idx_t k = 1) const override: x and labels can be resident on the CPU or any GPU; copies are performed as needed

virtual void search(idx_t n, const float *x, idx_t k, float *distances, idx_t *labels, const SearchParameters *params = nullptr) const override: x, distances and labels can be resident on the CPU or any GPU; copies are performed as needed

virtual void search_and_reconstruct(idx_t n, const float *x, idx_t k, float *distances, idx_t *labels, float *recons, const SearchParameters *params = nullptr) const override: x, distances, labels and recons can be resident on the CPU or any GPU; copies are performed as needed

virtual void compute_residual(const float *x, float *residual, idx_t key) const override: Overridden to force GPU indices to provide their own GPU-friendly implementation

virtual void compute_residual_n(idx_t n, const float *xs, float *residuals, const idx_t *keys) const override: Overridden to force GPU indices to provide their own GPU-friendly implementation

virtual void train(idx_t n, const float *x)

Perform training on a representative set of vectors

Parameters:

n – nb of training vectors
x – training vectors, size n * d

virtual void range_search(idx_t n, const float *x, float radius, RangeSearchResult *result, const SearchParameters *params = nullptr) const

Query n vectors of dimension d against the index.

Returns all vectors with distance < radius. Note that many indexes do not implement range_search (only the k-NN search is mandatory).

Parameters:

n – number of vectors
x – input vectors to search, size n * d
radius – search radius
result – result table

virtual void reset() = 0: removes all elements from the database.

virtual size_t remove_ids(const IDSelector &sel): removes IDs from the index. Not supported by all indexes. Returns the number of elements removed.

virtual void reconstruct(idx_t key, float *recons) const

Reconstruct a stored vector (or an approximation if lossy coding)

this function may not be defined for some indexes

Parameters:

key – id of the vector to reconstruct
recons – reconstructed vector (size d)

virtual void reconstruct_batch(idx_t n, const idx_t *keys, float *recons) const

Reconstruct several stored vectors (or an approximation if lossy coding)

this function may not be defined for some indexes

Parameters:

n – number of vectors to reconstruct
keys – ids of the vectors to reconstruct (size n)
recons – reconstructed vectors (size n * d)

virtual void reconstruct_n(idx_t i0, idx_t ni, float *recons) const

Reconstruct vectors i0 to i0 + ni - 1

this function may not be defined for some indexes

Parameters:

i0 – index of the first vector in the sequence
ni – number of vectors in the sequence
recons – reconstructed vectors (size ni * d)

virtual DistanceComputer *get_distance_computer() const

Get a DistanceComputer (defined in AuxIndexStructures) object for this kind of index.

DistanceComputer is implemented for indexes that support random access of their vectors.

virtual size_t sa_code_size() const: size of the produced codes in bytes

virtual void sa_encode(idx_t n, const float *x, uint8_t *bytes) const

encode a set of vectors

Parameters:

n – number of vectors
x – input vectors, size n * d
bytes – output encoded vectors, size n * sa_code_size()

virtual void sa_decode(idx_t n, const uint8_t *bytes, float *x) const

decode a set of vectors

Parameters:

n – number of vectors
bytes – input encoded vectors, size n * sa_code_size()
x – output vectors, size n * d

virtual void merge_from(Index &otherIndex, idx_t add_id = 0): moves the entries from another dataset to self. On output, other is empty. add_id is added to all moved ids (for sequential ids, this would be this->ntotal)

virtual void check_compatible_for_merge(const Index &otherIndex) const: check that the two indexes are compatible (i.e., they are trained in the same way and have the same parameters). Otherwise throw.

virtual void add_sa_codes(idx_t n, const uint8_t *codes, const idx_t *xids)

Add vectors that are computed with the standalone codec

Parameters:

codes – codes to add size n * sa_code_size()
xids – corresponding ids, size n

Public Members

int d: vector dimension

idx_t ntotal: total nb of indexed vectors

bool verbose: verbosity level

bool is_trained: set if the Index does not require training, or if training is done already

MetricType metric_type: type of metric this index uses for search

float metric_arg: argument of the metric type

Protected Functions

void copyFrom(const faiss::Index *index): Copy what we need from the CPU equivalent.

void copyTo(faiss::Index *index) const: Copy what we have to the CPU equivalent.

virtual bool addImplRequiresIDs_() const = 0: Does addImpl_ require IDs? If so, and no IDs are provided, we will generate them sequentially based on the order in which the IDs are added

virtual void addImpl_(idx_t n, const float *x, const idx_t *ids) = 0: Overridden to actually perform the add. All data is guaranteed to be resident on our device.

virtual void searchImpl_(idx_t n, const float *x, int k, float *distances, idx_t *labels, const SearchParameters *params) const = 0: Overridden to actually perform the search. All data is guaranteed to be resident on our device.

Protected Attributes

std::shared_ptr<GpuResources> resources_: Manages streams, cuBLAS handles and scratch memory for devices.

const GpuIndexConfig config_: Our configuration options.

size_t minPagedSize_: Size above which we page copies from the CPU to GPU.

Private Functions

void addPaged_(idx_t n, const float *x, const idx_t *ids): Handles paged adds if the add set is too large; passes to addImpl_ to actually perform the add for the current page

void addPage_(idx_t n, const float *x, const idx_t *ids): Calls addImpl_ for a single page of GPU-resident data.

void searchNonPaged_(idx_t n, const float *x, int k, float *outDistancesData, idx_t *outIndicesData, const SearchParameters *params) const: Calls searchImpl_ for a single page of GPU-resident data.

void searchFromCpuPaged_(idx_t n, const float *x, int k, float *outDistancesData, idx_t *outIndicesData, const SearchParameters *params) const: Calls searchImpl_ for a single page of GPU-resident data, handling paging of the data and copies from the CPU
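The paging pattern described by these private functions can be sketched as a loop over fixed-size pages. This is an illustration under stated assumptions, not the Faiss implementation: addPaged and its pageVecs argument are hypothetical names, and the page size is given here as a vector count rather than derived from a memory budget.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <functional>
#include <vector>

using idx_t = int64_t;

// Split `n` vectors of dimension `d` into pages of at most `pageVecs`
// vectors and hand each page to `addPage`. `ids` may be null, in which
// case every page receives a null id pointer. Returns the number of pages.
inline idx_t addPaged(
        idx_t n,
        int d,
        const float* x,
        const idx_t* ids,
        idx_t pageVecs,
        const std::function<void(idx_t, const float*, const idx_t*)>& addPage) {
    assert(pageVecs > 0);
    idx_t pages = 0;
    for (idx_t start = 0; start < n; start += pageVecs) {
        const idx_t count = std::min(pageVecs, n - start);
        addPage(count, x + start * d, ids ? ids + start : nullptr);
        ++pages;
    }
    return pages;
}
```

Ten vectors with a page size of four are delivered as pages of 4, 4 and 2 vectors.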

struct GpuIndexBinaryFlatConfig : public faiss::gpu::GpuIndexConfig

Public Members

int device = 0: GPU device on which the index is resident.

MemorySpace memorySpace = MemorySpace::Device: What memory space to use for primary storage. On Pascal and above (CC 6+) architectures, unified memory allows an index to use more memory than is available on the GPU.

bool use_cuvs = false: Should the index dispatch down to cuVS?

class GpuIndexBinaryFlat : public faiss::IndexBinary

#include <GpuIndexBinaryFlat.h>

A GPU version of IndexBinaryFlat for brute-force comparison of bit vectors via Hamming distance

Public Types

using component_t = uint8_t

using distance_t = int32_t

Public Functions

GpuIndexBinaryFlat(GpuResourcesProvider *resources, const faiss::IndexBinaryFlat *index, GpuIndexBinaryFlatConfig config = GpuIndexBinaryFlatConfig()): Construct from a pre-existing faiss::IndexBinaryFlat instance, copying data over to the given GPU

GpuIndexBinaryFlat(GpuResourcesProvider *resources, int dims, GpuIndexBinaryFlatConfig config = GpuIndexBinaryFlatConfig()): Construct an empty instance that can be added to.

~GpuIndexBinaryFlat() override

int getDevice() const: Returns the device that this index is resident on.

std::shared_ptr<GpuResources> getResources(): Returns a reference to our GpuResources object that manages memory, stream and handle resources on the GPU

void copyFrom(const faiss::IndexBinaryFlat *index): Initialize ourselves from the given CPU index; will overwrite all data in ourselves

void copyTo(faiss::IndexBinaryFlat *index) const: Copy ourselves to the given CPU index; will overwrite all data in the index instance

virtual void add(faiss::idx_t n, const uint8_t *x) override

Add n vectors of dimension d to the index.

Vectors are implicitly assigned labels ntotal .. ntotal + n - 1

Parameters: x – input matrix, size n * d / 8
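The implicit labelling and the n * d / 8 input size can be checked with two small helpers. Both function names are hypothetical and exist only for this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <vector>

using idx_t = int64_t;

// Labels implicitly given to `n` vectors added to an index that already
// holds `ntotal` vectors: ntotal, ntotal + 1, ..., ntotal + n - 1.
inline std::vector<idx_t> implicitLabels(idx_t ntotal, idx_t n) {
    assert(ntotal >= 0 && n >= 0);
    std::vector<idx_t> labels(static_cast<std::size_t>(n));
    std::iota(labels.begin(), labels.end(), ntotal);
    return labels;
}

// Bytes occupied by `n` binary vectors of `d` bits each (d / 8 bytes per
// vector); `d` must be a multiple of 8.
inline std::size_t binaryInputBytes(idx_t n, int d) {
    assert(n >= 0 && d > 0 && d % 8 == 0);
    return static_cast<std::size_t>(n) * static_cast<std::size_t>(d / 8);
}
```

Adding 3 vectors to an index holding 5 yields labels 5, 6 and 7; ten 64-bit vectors occupy 80 bytes.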

virtual void reset() override: Removes all elements from the database.

virtual void search(idx_t n, const uint8_t *x, idx_t k, int32_t *distances, faiss::idx_t *labels, const faiss::SearchParameters *params = nullptr) const override

Query n vectors of dimension d to the index.

Returns at most k vectors. If there are not enough results for a query, the result array is padded with -1s.

Parameters:

x – input vectors to search, size n * d / 8
labels – output labels of the NNs, size n*k
distances – output pairwise distances, size n*k
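A CPU reference for the same contract may help when checking GPU results. The sketch below is not the GPU implementation: bruteForceSearch is a hypothetical helper, and filling the distance slot of a padded result with the largest int32_t is an assumption of this example (the text above only specifies the -1 padding).

```cpp
#include <algorithm>
#include <bitset>
#include <cassert>
#include <cstdint>
#include <limits>
#include <utility>
#include <vector>

using idx_t = int64_t;

// Hamming distance between two packed bit vectors of `codeSize` bytes.
inline int32_t hamming(const uint8_t* a, const uint8_t* b, int codeSize) {
    int32_t dist = 0;
    for (int i = 0; i < codeSize; ++i) {
        dist += static_cast<int32_t>(std::bitset<8>(a[i] ^ b[i]).count());
    }
    return dist;
}

// For each of the `n` queries in `x` (size n * d / 8), write the `k`
// nearest database labels and their distances (both of size n * k).
// Missing results get label -1.
inline void bruteForceSearch(
        const std::vector<uint8_t>& db, int d, idx_t n, const uint8_t* x,
        idx_t k, int32_t* distances, idx_t* labels) {
    assert(d > 0 && d % 8 == 0);
    const int codeSize = d / 8;
    const idx_t ntotal = static_cast<idx_t>(db.size()) / codeSize;
    for (idx_t q = 0; q < n; ++q) {
        std::vector<std::pair<int32_t, idx_t>> cand;
        for (idx_t i = 0; i < ntotal; ++i) {
            cand.emplace_back(
                    hamming(x + q * codeSize, db.data() + i * codeSize, codeSize), i);
        }
        std::sort(cand.begin(), cand.end()); // by distance, then by label
        for (idx_t j = 0; j < k; ++j) {
            const bool found = j < ntotal;
            labels[q * k + j] = found ? cand[j].second : -1;
            distances[q * k + j] =
                    found ? cand[j].first : std::numeric_limits<int32_t>::max();
        }
    }
}
```

Searching a three-vector database with k = 4 fills three result slots and pads the fourth label with -1.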

virtual void reconstruct(faiss::idx_t key, uint8_t *recons) const override

Reconstruct a stored vector.

This function may not be defined for some indexes.

Parameters:

key – id of the vector to reconstruct
recons – reconstructed vector (size d / 8)

virtual void train(idx_t n, const uint8_t *x)

Perform training on a representative set of vectors.

Parameters:

n – nb of training vectors
x – training vectors, size n * d / 8

virtual void add_with_ids(idx_t n, const uint8_t *x, const idx_t *xids)

Same as add, but stores xids instead of sequential ids.

The default implementation fails with an assertion, as it is not supported by all indexes.

Parameters: xids – if non-null, ids to store for the vectors (size n)

virtual void range_search(idx_t n, const uint8_t *x, int radius, RangeSearchResult *result, const SearchParameters *params = nullptr) const

Query n vectors of dimension d to the index.

Returns all vectors with distance < radius. Note that many indexes do not implement range_search (only the k-NN search is mandatory). The distances are converted to float to reuse the RangeSearchResult structure, but they are integers. By convention, only distances < radius (strict comparison) are returned, i.e. radius = 0 does not return any result and radius = 1 returns only exactly matching vectors.

Parameters:

x – input vectors to search, size n * d / 8
radius – search radius
result – result table
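The strict comparison is easy to get wrong, so here is a minimal single-query sketch of the convention. rangeSearchOne is a hypothetical helper that returns plain labels; it does not use RangeSearchResult.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using idx_t = int64_t;

// Return the labels of all database vectors whose Hamming distance to
// `query` is strictly less than `radius`. Vectors are `codeSize` bytes each.
inline std::vector<idx_t> rangeSearchOne(
        const std::vector<uint8_t>& db, int codeSize,
        const uint8_t* query, int radius) {
    assert(codeSize > 0 && db.size() % static_cast<std::size_t>(codeSize) == 0);
    std::vector<idx_t> hits;
    const idx_t ntotal = static_cast<idx_t>(db.size()) / codeSize;
    for (idx_t i = 0; i < ntotal; ++i) {
        int dist = 0;
        for (int b = 0; b < codeSize; ++b) {
            dist += static_cast<int>(
                    std::bitset<8>(query[b] ^ db[i * codeSize + b]).count());
        }
        if (dist < radius) { // strict: radius = 0 never matches anything
            hits.push_back(i);
        }
    }
    return hits;
}
```

With radius = 0 the result is always empty, and with radius = 1 only exact duplicates of the query are returned.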

void assign(idx_t n, const uint8_t *x, idx_t *labels, idx_t k = 1) const

Return the indexes of the k vectors closest to the query x.

This function is identical to search but only returns labels of neighbors.

Parameters:

x – input vectors to search, size n * d / 8
labels – output labels of the NNs, size n*k

virtual size_t remove_ids(const IDSelector &sel): Removes IDs from the index. Not supported by all indexes.

virtual void reconstruct_n(idx_t i0, idx_t ni, uint8_t *recons) const

Reconstruct vectors i0 to i0 + ni - 1.

This function may not be defined for some indexes.

Parameters: recons – reconstructed vectors (size ni * d / 8)

virtual void search_and_reconstruct(idx_t n, const uint8_t *x, idx_t k, int32_t *distances, idx_t *labels, uint8_t *recons, const SearchParameters *params = nullptr) const

Similar to search, but also reconstructs the stored vectors (or an approximation in the case of lossy coding) for the search results.

If there are not enough results for a query, the resulting array is padded with -1s.

Parameters: recons – reconstructed vectors, size (n, k, d)

void display() const: Display the actual class name and some more info.

virtual void merge_from(IndexBinary &otherIndex, idx_t add_id = 0): moves the entries from another index to self. On output, otherIndex is empty. add_id is added to all moved ids (for sequential ids, this would be this->ntotal)

virtual void check_compatible_for_merge(const IndexBinary &otherIndex) const: check that the two indexes are compatible (i.e., they are trained in the same way and have the same parameters). Otherwise throws.

virtual size_t sa_code_size() const: size of the produced codes in bytes

virtual void add_sa_codes(idx_t n, const uint8_t *codes, const idx_t *xids): Same as add_with_ids for IndexBinary.

Public Members

int d = 0: vector dimension

int code_size = 0: number of bytes per vector ( = d / 8 )

idx_t ntotal = 0: total nb of indexed vectors

bool verbose = false: verbosity level

bool is_trained = true: set if the Index does not require training, or if training is done already

MetricType metric_type = METRIC_L2: type of metric this index uses for search

Protected Functions

void searchFromCpuPaged_(idx_t n, const uint8_t *x, int k, int32_t *outDistancesData, idx_t *outIndicesData) const: Called from search when the input data is on the CPU; potentially allows for pinned memory usage

void searchNonPaged_(idx_t n, const uint8_t *x, int k, int32_t *outDistancesData, idx_t *outIndicesData) const

Protected Attributes

std::shared_ptr<GpuResources> resources_: Manages streams, cuBLAS handles and scratch memory for devices.

const GpuIndexBinaryFlatConfig binaryFlatConfig_: Configuration options.

std::unique_ptr<BinaryFlatIndex> data_: Holds our GPU data containing the list of vectors.

struct IVFPQBuildCagraConfig

Public Members

uint32_t n_lists = 1024

The number of inverted lists (clusters)

Hint: the number of vectors per cluster (n_rows/n_lists) should be approximately 1,000 to 10,000.
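The hint translates into a range of n_lists values for a given dataset size. The helper below is a hypothetical convenience for this sketch, not part of Faiss or cuVS; its bounds default to the 1,000 and 10,000 figures quoted above.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <utility>

// Range [low, high] of n_lists values for which n_rows / n_lists stays
// between `minPerList` and `maxPerList`. Both ends are clamped to at least 1
// so that tiny datasets still get one list.
inline std::pair<uint64_t, uint64_t> nListsRange(
        uint64_t nRows,
        uint64_t minPerList = 1000,
        uint64_t maxPerList = 10000) {
    assert(minPerList > 0 && minPerList <= maxPerList);
    const uint64_t low =
            std::max<uint64_t>(1, (nRows + maxPerList - 1) / maxPerList);
    const uint64_t high = std::max<uint64_t>(low, nRows / minPerList);
    return {low, high};
}
```

For one million rows this gives 100 to 1000 lists, so the default of 1024 sits just outside the hinted range at roughly 977 vectors per cluster.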

uint32_t kmeans_n_iters = 20: The number of iterations searching for kmeans centers (index building).

double kmeans_trainset_fraction = 0.5: The fraction of data to use during iterative kmeans building.

uint32_t pq_bits = 8

The bit length of the vector element after compression by PQ.

Possible values: [4, 5, 6, 7, 8].

Hint: the smaller the ‘pq_bits’, the smaller the index size and the better the search performance, but the lower the recall.

uint32_t pq_dim = 0

The dimensionality of the vector after compression by PQ. When zero, an optimal value is selected using a heuristic.

NB: pq_dim * pq_bits must be a multiple of 8.

Hint: a smaller ‘pq_dim’ results in a smaller index size and better search performance, but lower recall. If ‘pq_bits’ is 8, ‘pq_dim’ can be set to any number, but multiples of 8 are desirable for good performance. If ‘pq_bits’ is not 8, ‘pq_dim’ should be a multiple of 8. For good performance, it is desirable that ‘pq_dim’ is a multiple of 32. Ideally, ‘pq_dim’ should also be a divisor of the dataset dim.
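These constraints can be checked before building an index. The sketch below reads the NB as a constraint on the product pq_dim * pq_bits; the function names are hypothetical, and the bytes-per-vector figure ignores any padding or per-list metadata an actual index may add.

```cpp
#include <cassert>
#include <cstdint>

// pq_bits must be one of 4, 5, 6, 7, 8.
inline bool validPqBits(uint32_t pqBits) {
    return pqBits >= 4 && pqBits <= 8;
}

// pq_dim == 0 means "choose automatically", so only a non-zero pq_dim is
// checked against the rule that pq_dim * pq_bits is a multiple of 8.
inline bool validPqConfig(uint32_t pqDim, uint32_t pqBits) {
    if (!validPqBits(pqBits)) {
        return false;
    }
    if (pqDim == 0) {
        return true;
    }
    return (pqDim * pqBits) % 8 == 0;
}

// Bytes needed for the PQ code of one vector.
inline uint32_t pqCodeBytes(uint32_t pqDim, uint32_t pqBits) {
    assert(pqDim != 0 && validPqConfig(pqDim, pqBits));
    return pqDim * pqBits / 8;
}
```

For example, pq_dim = 64 with pq_bits = 4 encodes each vector in 32 bytes, while pq_dim = 6 with pq_bits = 5 is rejected because 30 bits do not fill whole bytes.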

codebook_gen codebook_kind = codebook_gen::PER_SUBSPACE: How PQ codebooks are created.

bool force_random_rotation = false

Apply a random rotation matrix on the input data and queries even if dim % pq_dim == 0.

Note: if dim is not a multiple of pq_dim, a random rotation is always applied to the input data and queries to transform the working space from dim to rot_dim, which may be slightly larger than the original space and is a multiple of pq_dim (rot_dim % pq_dim == 0). However, this transform is not necessary when dim is a multiple of pq_dim (dim == rot_dim, hence no need to add “extra” data columns / features).

By default, if dim == rot_dim, the rotation transform is initialized with the identity matrix. When force_random_rotation == true, a random orthogonal transform matrix is generated regardless of the values of dim and pq_dim.
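The relationship between dim, pq_dim and rot_dim can be worked through numerically. The sketch assumes rot_dim is the smallest multiple of pq_dim that is at least dim, which the text implies but does not state outright; both helper names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Smallest multiple of `pqDim` that is greater than or equal to `dim`.
inline uint32_t rotDim(uint32_t dim, uint32_t pqDim) {
    assert(dim > 0 && pqDim > 0);
    return ((dim + pqDim - 1) / pqDim) * pqDim;
}

// True when a random rotation is used: either it is forced, or the working
// space has to grow because dim is not a multiple of pq_dim. Otherwise the
// transform is the identity.
inline bool usesRandomRotation(uint32_t dim, uint32_t pqDim, bool force) {
    return force || rotDim(dim, pqDim) != dim;
}
```

With dim = 100 and pq_dim = 32 the working space grows to 128 columns and a random rotation is applied; with dim = 128 it stays at 128 and the rotation is random only when force_random_rotation is set.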

bool conservative_memory_allocation = false

By default, the algorithm allocates more space than necessary for individual clusters (list_data). This amortizes the cost of memory allocation and reduces the number of data copies during repeated calls to extend (extending the database).

The alternative is the conservative allocation behavior; when enabled, the algorithm always allocates the minimum amount of memory required to store the given number of records. Set this flag to true if you prefer to use as little GPU memory for the database as possible.

struct IVFPQSearchCagraConfig

Public Members

uint32_t n_probes = 20: The number of clusters to search.

cudaDataType_t lut_dtype = CUDA_R_32F

Data type of look up table to be created dynamically at search time.

Possible values: [CUDA_R_32F, CUDA_R_16F, CUDA_R_8U]

The use of low-precision types reduces the amount of shared memory required at search time, so fast shared memory kernels can be used even for datasets with large dimensionality. Note that the recall is slightly degraded when a low-precision type is selected.

cudaDataType_t internal_distance_dtype = CUDA_R_32F

Storage data type for distance/similarity computed at search time.

Possible values: [CUDA_R_16F, CUDA_R_32F]

If the performance limiter at search time is device memory access, selecting FP16 will improve performance slightly.

double preferred_shmem_carveout = 1.0

Preferred fraction of SM’s unified memory / L1 cache to be used as shared memory.

Possible values: [0.0 - 1.0] as a fraction of the sharedMemPerMultiprocessor.

Increase the carveout to ensure good GPU occupancy for the main search kernel, but do not set it so high that no memory is left for the L1 cache. Note that this value is interpreted only as a hint. Moreover, a GPU usually allows only a fixed set of cache configurations, so the provided value is rounded up to the nearest configuration. Refer to the NVIDIA tuning guide for the target GPU architecture.

Note, this is a low-level tuning parameter that can have drastic negative effects on the search performance if tweaked incorrectly.

struct GpuIndexCagraConfig : public faiss::gpu::GpuIndexConfig

Public Members

size_t intermediate_graph_degree = 128: Degree of input graph for pruning.

size_t graph_degree = 64: Degree of output graph.

graph_build_algo build_algo = graph_build_algo::IVF_PQ: ANN algorithm to build knn graph.

size_t nn_descent_niter = 20: Number of Iterations to run if building with NN_DESCENT.

IVFPQBuildCagraConfig *ivf_pq_params = nullptr

IVFPQSearchCagraConfig *ivf_pq_search_params = nullptr

float refine_rate = 2.0f

bool store_dataset = true

int device = 0: GPU device on which the index is resident.

MemorySpace memorySpace = MemorySpace::Device: What memory space to use for primary storage. On Pascal and above (CC 6+) architectures, unified memory allows an index to use more memory than is available on the GPU.

bool use_cuvs = false: Should the index dispatch down to cuVS?

struct SearchParametersCagra : public faiss::SearchParameters

Public Members

size_t max_queries = 0: Maximum number of queries to search at the same time (batch size). Auto select when 0.

size_t itopk_size = 64

Number of intermediate search results retained during the search.

This is the main knob to adjust trade off between accuracy and search speed. Higher values improve the search accuracy.

size_t max_iterations = 0: Upper limit of search iterations. Auto select when 0.

search_algo algo = search_algo::AUTO: Which search implementation to use.

size_t team_size = 0: Number of threads used to calculate a single distance. 4, 8, 16, or 32.

size_t search_width = 1: Number of graph nodes to select as the starting point for the search in each iteration (the search width).

size_t min_iterations = 0: Lower limit of search iterations.

size_t thread_block_size = 0: Thread block size. 0, 64, 128, 256, 512, 1024. Auto selection when 0.

hash_mode hashmap_mode = hash_mode::AUTO: Hashmap type. Auto selection when AUTO.

size_t hashmap_min_bitlen = 0: Lower limit of hashmap bit length. More than 8.

float hashmap_max_fill_rate = 0.5: Upper limit of hashmap fill rate. More than 0.1, less than 0.9.

uint32_t num_random_samplings = 1: Number of iterations of initial random seed node selection. 1 or more.

uint64_t seed = 0x128394: Bit mask used for initial random seed node selection.

IDSelector *sel = nullptr: if non-null, only these IDs will be considered during search.
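Several of these members list the values they accept, and those lists can be turned into a sanity check. The sketch below uses a hypothetical ToySearchParams struct that mirrors only four of the fields; it is not the Faiss type. Treating team_size = 0 as "select automatically" is an assumption drawn from the default value rather than from the text.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <initializer_list>
#include <string>
#include <vector>

// Hypothetical mirror of a few SearchParametersCagra members.
struct ToySearchParams {
    std::size_t team_size = 0;
    std::size_t thread_block_size = 0;
    float hashmap_max_fill_rate = 0.5f;
    uint32_t num_random_samplings = 1;
};

inline bool oneOf(std::size_t v, std::initializer_list<std::size_t> allowed) {
    return std::find(allowed.begin(), allowed.end(), v) != allowed.end();
}

// Return one message per violated constraint; empty means the values are
// within the documented ranges.
inline std::vector<std::string> validate(const ToySearchParams& p) {
    std::vector<std::string> errors;
    if (!oneOf(p.team_size, {0, 4, 8, 16, 32})) { // 0 assumed to mean auto
        errors.push_back("team_size must be 4, 8, 16 or 32");
    }
    if (!oneOf(p.thread_block_size, {0, 64, 128, 256, 512, 1024})) {
        errors.push_back("thread_block_size must be 0, 64, 128, 256, 512 or 1024");
    }
    if (!(p.hashmap_max_fill_rate > 0.1f && p.hashmap_max_fill_rate < 0.9f)) {
        errors.push_back("hashmap_max_fill_rate must be in (0.1, 0.9)");
    }
    if (p.num_random_samplings < 1) {
        errors.push_back("num_random_samplings must be at least 1");
    }
    return errors;
}
```

The defaults listed above pass this check; a team_size of 3, for instance, produces one error message.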

struct GpuIndexCagra : public faiss::gpu::GpuIndex

Public Types

using component_t = float

using distance_t = float

Public Functions

GpuIndexCagra(GpuResourcesProvider *provider, int dims, faiss::MetricType metric = faiss::METRIC_L2, GpuIndexCagraConfig config = GpuIndexCagraConfig())

virtual void train(idx_t n, const float *x) override: Trains CAGRA based on the given vector data.

void copyFrom(const faiss::IndexHNSWCagra *index): Initialize ourselves from the given CPU index; will overwrite all data in ourselves

void copyTo(faiss::IndexHNSWCagra *index) const: Copy ourselves to the given CPU index; will overwrite all data in the index instance

virtual void reset() override: removes all elements from the database.

std::vector<idx_t> get_knngraph() const

int getDevice() const: Returns the device that this index is resident on.

std::shared_ptr<GpuResources> getResources(): Returns a reference to our GpuResources object that manages memory, stream and handle resources on the GPU

void setMinPagingSize(size_t size): Set the minimum data size for searches (in MiB) for which we use CPU -> GPU paging

size_t getMinPagingSize() const: Returns the current minimum data size for paged searches.

virtual void add(idx_t, const float *x) override: x can be resident on the CPU or any GPU; copies are performed as needed. Handles paged adds if the add set is too large; calls addInternal_

virtual void add_with_ids(idx_t n, const float *x, const idx_t *ids) override: x and ids can be resident on the CPU or any GPU; copies are performed as needed. Handles paged adds if the add set is too large; calls addInternal_

virtual void assign(idx_t n, const float *x, idx_t *labels, idx_t k = 1) const override: x and labels can be resident on the CPU or any GPU; copies are performed as needed

virtual void search(idx_t n, const float *x, idx_t k, float *distances, idx_t *labels, const SearchParameters *params = nullptr) const override: x, distances and labels can be resident on the CPU or any GPU; copies are performed as needed

virtual void search_and_reconstruct(idx_t n, const float *x, idx_t k, float *distances, idx_t *labels, float *recons, const SearchParameters *params = nullptr) const override: x, distances, labels and recons can be resident on the CPU or any GPU; copies are performed as needed

virtual void compute_residual(const float *x, float *residual, idx_t key) const override: Overridden to force GPU indices to provide their own GPU-friendly implementation

virtual void compute_residual_n(idx_t n, const float *xs, float *residuals, const idx_t *keys) const override: Overridden to force GPU indices to provide their own GPU-friendly implementation

virtual void range_search(idx_t n, const float *x, float radius, RangeSearchResult *result, const SearchParameters *params = nullptr) const

query n vectors of dimension d to the index.

return all vectors with distance < radius. Note that many indexes do not implement the range_search (only the k-NN search is mandatory).

Parameters:

n – number of vectors
x – input vectors to search, size n * d
radius – search radius
result – result table

virtual size_t remove_ids(const IDSelector &sel): removes IDs from the index. Not supported by all indexes. Returns the number of elements removed.

virtual void reconstruct(idx_t key, float *recons) const

Reconstruct a stored vector (or an approximation if lossy coding)

this function may not be defined for some indexes

Parameters:

key – id of the vector to reconstruct
recons – reconstructed vector (size d)

virtual void reconstruct_batch(idx_t n, const idx_t *keys, float *recons) const

Reconstruct several stored vectors (or an approximation if lossy coding)

this function may not be defined for some indexes

Parameters:

n – number of vectors to reconstruct
keys – ids of the vectors to reconstruct (size n)
recons – reconstructed vectors (size n * d)

virtual void reconstruct_n(idx_t i0, idx_t ni, float *recons) const

Reconstruct vectors i0 to i0 + ni - 1

this function may not be defined for some indexes

Parameters:

i0 – index of the first vector in the sequence
ni – number of vectors in the sequence
recons – reconstructed vectors (size ni * d)

virtual DistanceComputer *get_distance_computer() const

Get a DistanceComputer (defined in AuxIndexStructures) object for this kind of index.

DistanceComputer is implemented for indexes that support random access of their vectors.

virtual size_t sa_code_size() const: size of the produced codes in bytes

virtual void sa_encode(idx_t n, const float *x, uint8_t *bytes) const

encode a set of vectors

Parameters:

n – number of vectors
x – input vectors, size n * d
bytes – output encoded vectors, size n * sa_code_size()

virtual void sa_decode(idx_t n, const uint8_t *bytes, float *x) const

decode a set of vectors

Parameters:

n – number of vectors
bytes – input encoded vectors, size n * sa_code_size()
x – output vectors, size n * d

virtual void merge_from(Index &otherIndex, idx_t add_id = 0): moves the entries from another index to self. On output, otherIndex is empty. add_id is added to all moved ids (for sequential ids, this would be this->ntotal)

virtual void check_compatible_for_merge(const Index &otherIndex) const: check that the two indexes are compatible (i.e., they are trained in the same way and have the same parameters). Otherwise throws.

virtual void add_sa_codes(idx_t n, const uint8_t *codes, const idx_t *xids)

Add vectors that are computed with the standalone codec

Parameters:

codes – codes to add, size n * sa_code_size()
xids – corresponding ids, size n

Public Members

int d: vector dimension

idx_t ntotal: total nb of indexed vectors

bool verbose: verbosity level

bool is_trained: set if the Index does not require training, or if training is done already

MetricType metric_type: type of metric this index uses for search

float metric_arg: argument of the metric type

Protected Functions

virtual bool addImplRequiresIDs_() const override: Does addImpl_ require IDs? If so, and no IDs are provided, we will generate them sequentially based on the order in which the vectors are added

virtual void addImpl_(idx_t n, const float *x, const idx_t *ids) override: Overridden to actually perform the add. All data is guaranteed to be resident on our device.

virtual void searchImpl_(idx_t n, const float *x, int k, float *distances, idx_t *labels, const SearchParameters *search_params) const override: Called from GpuIndex for search.

void copyFrom(const faiss::Index *index): Copy what we need from the CPU equivalent.

void copyTo(faiss::Index *index) const: Copy what we have to the CPU equivalent.

Protected Attributes

const GpuIndexCagraConfig cagraConfig_: Our configuration options.

std::shared_ptr<CuvsCagra> index_: Instance that we own; contains the inverted lists.

std::shared_ptr<GpuResources> resources_: Manages streams, cuBLAS handles and scratch memory for devices.

const GpuIndexConfig config_: Our configuration options.

size_t minPagedSize_: Size above which we page copies from the CPU to GPU.

struct GpuIndexFlatConfig : public faiss::gpu::GpuIndexConfig

Public Functions

bool ALIGNED (8) useFloat16: Whether or not data is stored as float16.

Public Members

bool storeTransposed = false: Deprecated: no longer used. Previously used to indicate whether internal storage of vectors is transposed.

int device = 0: GPU device on which the index is resident.

MemorySpace memorySpace = MemorySpace::Device: What memory space to use for primary storage. On Pascal and above (CC 6+) architectures, unified memory allows an index to use more memory than is available on the GPU.

bool use_cuvs = false: Should the index dispatch down to cuVS?

class GpuIndexFlat : public faiss::gpu::GpuIndex

#include <GpuIndexFlat.h>

Wrapper around the GPU implementation that looks like faiss::IndexFlat; copies over centroid data from a given faiss::IndexFlat

Subclassed by faiss::gpu::GpuIndexFlatIP, faiss::gpu::GpuIndexFlatL2

Public Types

using component_t = float

using distance_t = float

Public Functions

GpuIndexFlat(GpuResourcesProvider *provider, const faiss::IndexFlat *index, GpuIndexFlatConfig config = GpuIndexFlatConfig()): Construct from a pre-existing faiss::IndexFlat instance, copying data over to the given GPU

GpuIndexFlat(std::shared_ptr<GpuResources> resources, const faiss::IndexFlat *index, GpuIndexFlatConfig config = GpuIndexFlatConfig())

GpuIndexFlat(GpuResourcesProvider *provider, int dims, faiss::MetricType metric, GpuIndexFlatConfig config = GpuIndexFlatConfig()): Construct an empty instance that can be added to.

GpuIndexFlat(std::shared_ptr<GpuResources> resources, int dims, faiss::MetricType metric, GpuIndexFlatConfig config = GpuIndexFlatConfig())

~GpuIndexFlat() override

void copyFrom(const faiss::IndexFlat *index): Initialize ourselves from the given CPU index; will overwrite all data in ourselves

void copyTo(faiss::IndexFlat *index) const: Copy ourselves to the given CPU index; will overwrite all data in the index instance

size_t getNumVecs() const: Returns the number of vectors we contain.

virtual void reset() override: Clears all vectors from this index.

virtual void train(idx_t n, const float *x) override: This index is not trained, so this does nothing.

virtual void add(idx_t, const float *x) override: Overrides to avoid excessive copies.

virtual void reconstruct(idx_t key, float *out) const override: Reconstruction methods; prefer the batch reconstruct as it will be more efficient

virtual void reconstruct_n(idx_t i0, idx_t num, float *out) const override: Batch reconstruction method.

virtual void reconstruct_batch(idx_t n, const idx_t *keys, float *out) const override: Batch reconstruction method.

virtual void compute_residual(const float *x, float *residual, idx_t key) const override: Compute residual.

virtual void compute_residual_n(idx_t n, const float *xs, float *residuals, const idx_t *keys) const override: Compute residual (batch mode)
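For a flat store the batch residual amounts to a row-wise subtraction. The sketch below assumes the residual is the input vector minus the stored vector identified by its key; computeResidualN is a hypothetical CPU helper and `stored` stands in for the index contents.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using idx_t = int64_t;

// residuals[i] = xs[i] - stored[keys[i]] for i in [0, n), where every
// vector has `d` components and `stored` holds the vectors row by row.
inline void computeResidualN(
        const std::vector<float>& stored, int d, idx_t n,
        const float* xs, float* residuals, const idx_t* keys) {
    assert(d > 0 && stored.size() % static_cast<std::size_t>(d) == 0);
    const idx_t ntotal = static_cast<idx_t>(stored.size()) / d;
    for (idx_t i = 0; i < n; ++i) {
        assert(keys[i] >= 0 && keys[i] < ntotal);
        for (int j = 0; j < d; ++j) {
            residuals[i * d + j] = xs[i * d + j] - stored[keys[i] * d + j];
        }
    }
}
```

Calling it with n = 1 gives the single-vector compute_residual behaviour under the same assumption.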

inline FlatIndex *getGpuData(): For internal access.

int getDevice() const: Returns the device that this index is resident on.

std::shared_ptr<GpuResources> getResources(): Returns a reference to our GpuResources object that manages memory, stream and handle resources on the GPU

void setMinPagingSize(size_t size): Set the minimum data size for searches (in MiB) for which we use CPU -> GPU paging

size_t getMinPagingSize() const: Returns the current minimum data size for paged searches.

virtual void add_with_ids(idx_t n, const float *x, const idx_t *ids) override: x and ids can be resident on the CPU or any GPU; copies are performed as needed. Handles paged adds if the add set is too large; calls addInternal_

virtual void assign(idx_t n, const float *x, idx_t *labels, idx_t k = 1) const override: x and labels can be resident on the CPU or any GPU; copies are performed as needed

virtual void search(idx_t n, const float *x, idx_t k, float *distances, idx_t *labels, const SearchParameters *params = nullptr) const override: x, distances and labels can be resident on the CPU or any GPU; copies are performed as needed

virtual void search_and_reconstruct(idx_t n, const float *x, idx_t k, float *distances, idx_t *labels, float *recons, const SearchParameters *params = nullptr) const override: x, distances, labels and recons can be resident on the CPU or any GPU; copies are performed as needed

virtual void range_search(idx_t n, const float *x, float radius, RangeSearchResult *result, const SearchParameters *params = nullptr) const

query n vectors of dimension d to the index.

return all vectors with distance < radius. Note that many indexes do not implement the range_search (only the k-NN search is mandatory).

Parameters:

n – number of vectors
x – input vectors to search, size n * d
radius – search radius
result – result table

virtual size_t remove_ids(const IDSelector &sel): removes IDs from the index. Not supported by all indexes. Returns the number of elements removed.

virtual DistanceComputer *get_distance_computer() const

Get a DistanceComputer (defined in AuxIndexStructures) object for this kind of index.

DistanceComputer is implemented for indexes that support random access of their vectors.

virtual size_t sa_code_size() const: size of the produced codes in bytes

virtual void sa_encode(idx_t n, const float *x, uint8_t *bytes) const

encode a set of vectors

Parameters:

n – number of vectors
x – input vectors, size n * d
bytes – output encoded vectors, size n * sa_code_size()

virtual void sa_decode(idx_t n, const uint8_t *bytes, float *x) const

decode a set of vectors

Parameters:

n – number of vectors
bytes – input encoded vectors, size n * sa_code_size()
x – output vectors, size n * d

virtual void merge_from(Index &otherIndex, idx_t add_id = 0): moves the entries from another index to self. On output, otherIndex is empty. add_id is added to all moved ids (for sequential ids, this would be this->ntotal)

virtual void check_compatible_for_merge(const Index &otherIndex) const: check that the two indexes are compatible (i.e., they are trained in the same way and have the same parameters). Otherwise throws.

virtual void add_sa_codes(idx_t n, const uint8_t *codes, const idx_t *xids)

Add vectors that are computed with the standalone codec

Parameters:

codes – codes to add, size n * sa_code_size()
xids – corresponding ids, size n

Public Members

int d: vector dimension

idx_t ntotal: total nb of indexed vectors

bool verbose: verbosity level

bool is_trained: set if the Index does not require training, or if training is done already

MetricType metric_type: type of metric this index uses for search

float metric_arg: argument of the metric type

Protected Functions

void resetIndex_(int dims)

virtual bool addImplRequiresIDs_() const override: Flat index does not require IDs as there is no storage available for them

virtual void addImpl_(idx_t n, const float *x, const idx_t *ids) override: Called from GpuIndex for add.

virtual void searchImpl_(idx_t n, const float *x, int k, float *distances, idx_t *labels, const SearchParameters *params) const override: Called from GpuIndex for search.

void copyFrom(const faiss::Index *index): Copy what we need from the CPU equivalent.

void copyTo(faiss::Index *index) const: Copy what we have to the CPU equivalent.

Protected Attributes

const GpuIndexFlatConfig flatConfig_: Our configuration options.

std::unique_ptr<FlatIndex> data_: Holds our GPU data containing the list of vectors.

std::shared_ptr<GpuResources> resources_: Manages streams, cuBLAS handles and scratch memory for devices.

const GpuIndexConfig config_: Our configuration options.

size_t minPagedSize_: Size above which we page copies from the CPU to GPU.

class GpuIndexFlatL2 : public faiss::gpu::GpuIndexFlat

#include <GpuIndexFlat.h>

Wrapper around the GPU implementation that looks like faiss::IndexFlatL2; copies over centroid data from a given faiss::IndexFlat

Public Types

using component_t = float

using distance_t = float

Public Functions

GpuIndexFlatL2(GpuResourcesProvider *provider, faiss::IndexFlatL2 *index, GpuIndexFlatConfig config = GpuIndexFlatConfig()): Construct from a pre-existing faiss::IndexFlatL2 instance, copying data over to the given GPU

GpuIndexFlatL2(std::shared_ptr<GpuResources> resources, faiss::IndexFlatL2 *index, GpuIndexFlatConfig config = GpuIndexFlatConfig())

GpuIndexFlatL2(GpuResourcesProvider *provider, int dims, GpuIndexFlatConfig config = GpuIndexFlatConfig()): Construct an empty instance that can be added to.

GpuIndexFlatL2(std::shared_ptr<GpuResources> resources, int dims, GpuIndexFlatConfig config = GpuIndexFlatConfig())

void copyFrom(faiss::IndexFlat *index): Initialize ourselves from the given CPU index; will overwrite all data in ourselves

void copyTo(faiss::IndexFlat *index): Copy ourselves to the given CPU index; will overwrite all data in the index instance

void copyFrom(const faiss::IndexFlat *index): Initialize ourselves from the given CPU index; will overwrite all data in ourselves

void copyTo(faiss::IndexFlat *index) const: Copy ourselves to the given CPU index; will overwrite all data in the index instance

size_t getNumVecs() const: Returns the number of vectors we contain.

virtual void reset() override: Clears all vectors from this index.

virtual void train(idx_t n, const float *x) override: This index is not trained, so this does nothing.

virtual void add(idx_t, const float *x) override: Overrides to avoid excessive copies.

virtual void reconstruct(idx_t key, float *out) const override: Reconstruction methods; prefer the batch reconstruct as it will be more efficient

virtual void reconstruct_n(idx_t i0, idx_t num, float *out) const override: Batch reconstruction method.

virtual void reconstruct_batch(idx_t n, const idx_t *keys, float *out) const override: Batch reconstruction method.

virtual void compute_residual(const float *x, float *residual, idx_t key) const override: Compute residual.

virtual void compute_residual_n(idx_t n, const float *xs, float *residuals, const idx_t *keys) const override: Compute residual (batch mode)

inline FlatIndex *getGpuData(): For internal access.

int getDevice() const: Returns the device that this index is resident on.

std::shared_ptr<GpuResources> getResources(): Returns a reference to our GpuResources object that manages memory, stream and handle resources on the GPU

void setMinPagingSize(size_t size): Set the minimum data size for searches (in MiB) for which we use CPU -> GPU paging

size_t getMinPagingSize() const: Returns the current minimum data size for paged searches.

virtual void add_with_ids(idx_t n, const float *x, const idx_t *ids) override: x and ids can be resident on the CPU or any GPU; copies are performed as needed. Handles paged adds if the add set is too large; calls addInternal_

virtual void assign(idx_t n, const float *x, idx_t *labels, idx_t k = 1) const override: x and labels can be resident on the CPU or any GPU; copies are performed as needed

virtual void search(idx_t n, const float *x, idx_t k, float *distances, idx_t *labels, const SearchParameters *params = nullptr) const override: x, distances and labels can be resident on the CPU or any GPU; copies are performed as needed

virtual void search_and_reconstruct(idx_t n, const float *x, idx_t k, float *distances, idx_t *labels, float *recons, const SearchParameters *params = nullptr) const override: x, distances, labels and recons can be resident on the CPU or any GPU; copies are performed as needed

virtual void range_search(idx_t n, const float *x, float radius, RangeSearchResult *result, const SearchParameters *params = nullptr) const

Query n vectors of dimension d against the index.

Return all vectors with distance < radius. Note that many indexes do not implement range_search (only the k-NN search is mandatory).

Parameters:

n – number of vectors
x – input vectors to search, size n * d
radius – search radius
result – result table
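The "distance < radius" rule can be illustrated with a brute-force CPU reference for a single query. This is only a sketch of the semantics: the function name is hypothetical, and it assumes squared Euclidean distance as the metric and a strict comparison.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical CPU reference, not the Faiss code path. Returns the
// (id, squared L2 distance) pairs whose distance is strictly below `radius`.
inline std::vector<std::pair<std::int64_t, float>> range_search_sketch(
        const std::vector<float>& database, // row-major, d floats per vector
        std::size_t d,
        const float* query,                 // d floats
        float radius) {
    assert(d > 0 && database.size() % d == 0);
    std::vector<std::pair<std::int64_t, float>> hits;
    const std::size_t n = database.size() / d;
    for (std::size_t i = 0; i < n; ++i) {
        float dist = 0.f;
        for (std::size_t j = 0; j < d; ++j) {
            const float diff = database[i * d + j] - query[j];
            dist += diff * diff;
        }
        if (dist < radius) {
            hits.emplace_back(static_cast<std::int64_t>(i), dist);
        }
    }
    return hits;
}
```

Unlike k-NN search, the number of results per query is not known in advance, which is why the real API writes into a RangeSearchResult rather than fixed-size arrays.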

virtual size_t remove_ids(const IDSelector &sel): removes IDs from the index. Not supported by all indexes. Returns the number of elements removed.

virtual DistanceComputer *get_distance_computer() const

Get a DistanceComputer (defined in AuxIndexStructures) object for this kind of index.

DistanceComputer is implemented for indexes that support random access of their vectors.

virtual size_t sa_code_size() const: size of the produced codes in bytes

virtual void sa_encode(idx_t n, const float *x, uint8_t *bytes) const

encode a set of vectors

Parameters:

n – number of vectors
x – input vectors, size n * d
bytes – output encoded vectors, size n * sa_code_size()

virtual void sa_decode(idx_t n, const uint8_t *bytes, float *x) const

decode a set of vectors

Parameters:

n – number of vectors
bytes – input encoded vectors, size n * sa_code_size()
x – output vectors, size n * d
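The buffer sizes above (n * sa_code_size() bytes for codes, n * d floats for vectors) can be sketched for the simplest case, in which each code is just the raw float data of one vector. The helper names are hypothetical and the raw-float code layout is an assumption made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Assumed code size for an uncompressed vector: d floats stored as bytes.
inline std::size_t flat_code_size_sketch(std::size_t d) {
    return d * sizeof(float);
}

// Encode n vectors (n * d floats) into n * code_size bytes.
inline std::vector<std::uint8_t> encode_flat_sketch(
        std::size_t n, std::size_t d, const float* x) {
    std::vector<std::uint8_t> bytes(n * flat_code_size_sketch(d));
    if (!bytes.empty()) {
        assert(x != nullptr);
        std::memcpy(bytes.data(), x, bytes.size());
    }
    return bytes;
}

// Decode n codes back into n * d floats.
inline std::vector<float> decode_flat_sketch(
        std::size_t n, std::size_t d, const std::uint8_t* bytes) {
    std::vector<float> x(n * d);
    if (!x.empty()) {
        assert(bytes != nullptr);
        std::memcpy(x.data(), bytes, n * flat_code_size_sketch(d));
    }
    return x;
}
```

For a lossy codec the round trip would return an approximation. Here it is exact because nothing is compressed.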

virtual void merge_from(Index &otherIndex, idx_t add_id = 0): moves the entries from another dataset to self. On output, other is empty. add_id is added to all moved ids (for sequential ids, this would be this->ntotal)
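The effect of add_id can be shown on a toy structure that stores only ids. ToyIndex and merge_from_sketch are hypothetical names used for illustration; the real method also moves the vector data.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy stand-in for an index: only the ids are modelled.
struct ToyIndex {
    std::vector<std::int64_t> ids;
};

// Move every entry of `other` into `self`, shifting each id by add_id.
// After the call, `other` is empty.
inline void merge_from_sketch(
        ToyIndex& self, ToyIndex& other, std::int64_t add_id = 0) {
    assert(&self != &other);
    for (std::int64_t id : other.ids) {
        self.ids.push_back(id + add_id);
    }
    other.ids.clear();
}
```

With sequential ids in both indexes, passing the current size of the destination as add_id keeps the merged ids unique and contiguous.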

virtual void check_compatible_for_merge(const Index &otherIndex) const: check that the two indexes are compatible (i.e., they are trained in the same way and have the same parameters). Otherwise throw.

virtual void add_sa_codes(idx_t n, const uint8_t *codes, const idx_t *xids)

Add vectors that are computed with the standalone codec

Parameters:

codes – codes to add, size n * sa_code_size()
xids – corresponding ids, size n

Public Members

int d: vector dimension

idx_t ntotal: total nb of indexed vectors

bool verbose: verbosity level

bool is_trained: set if the Index does not require training, or if training is done already

MetricType metric_type: type of metric this index uses for search

float metric_arg: argument of the metric type

Protected Functions

void copyFrom(const faiss::Index *index): Copy what we need from the CPU equivalent.

void copyTo(faiss::Index *index) const: Copy what we have to the CPU equivalent.

void resetIndex_(int dims)

virtual bool addImplRequiresIDs_() const override: Flat index does not require IDs as there is no storage available for them

virtual void addImpl_(idx_t n, const float *x, const idx_t *ids) override: Called from GpuIndex for add.

virtual void searchImpl_(idx_t n, const float *x, int k, float *distances, idx_t *labels, const SearchParameters *params) const override: Called from GpuIndex for search.

Protected Attributes

const GpuIndexFlatConfig flatConfig_: Our configuration options.

std::unique_ptr<FlatIndex> data_: Holds our GPU data containing the list of vectors.

std::shared_ptr<GpuResources> resources_: Manages streams, cuBLAS handles and scratch memory for devices.

const GpuIndexConfig config_: Our configuration options.

size_t minPagedSize_: Size above which we page copies from the CPU to GPU.

class GpuIndexFlatIP : public faiss::gpu::GpuIndexFlat

#include <GpuIndexFlat.h>

Wrapper around the GPU implementation that looks like faiss::IndexFlatIP; copies over centroid data from a given faiss::IndexFlat

Public Types

using component_t = float

using distance_t = float

Public Functions

GpuIndexFlatIP(GpuResourcesProvider *provider, faiss::IndexFlatIP *index, GpuIndexFlatConfig config = GpuIndexFlatConfig()): Construct from a pre-existing faiss::IndexFlatIP instance, copying data over to the given GPU

GpuIndexFlatIP(std::shared_ptr<GpuResources> resources, faiss::IndexFlatIP *index, GpuIndexFlatConfig config = GpuIndexFlatConfig())

GpuIndexFlatIP(GpuResourcesProvider *provider, int dims, GpuIndexFlatConfig config = GpuIndexFlatConfig()): Construct an empty instance that can be added to.

GpuIndexFlatIP(std::shared_ptr<GpuResources> resources, int dims, GpuIndexFlatConfig config = GpuIndexFlatConfig())

void copyFrom(faiss::IndexFlat *index): Initialize ourselves from the given CPU index; will overwrite all data in ourselves

void copyTo(faiss::IndexFlat *index): Copy ourselves to the given CPU index; will overwrite all data in the index instance

void copyFrom(const faiss::IndexFlat *index): Initialize ourselves from the given CPU index; will overwrite all data in ourselves

void copyTo(faiss::IndexFlat *index) const: Copy ourselves to the given CPU index; will overwrite all data in the index instance

size_t getNumVecs() const: Returns the number of vectors we contain.

virtual void reset() override: Clears all vectors from this index.

virtual void train(idx_t n, const float *x) override: This index is not trained, so this does nothing.

virtual void add(idx_t, const float *x) override: Overrides to avoid excessive copies.

virtual void reconstruct(idx_t key, float *out) const override: Reconstruction methods; prefer the batch reconstruct as it will be more efficient

virtual void reconstruct_n(idx_t i0, idx_t num, float *out) const override: Batch reconstruction method.

virtual void reconstruct_batch(idx_t n, const idx_t *keys, float *out) const override: Batch reconstruction method.

virtual void compute_residual(const float *x, float *residual, idx_t key) const override: Compute residual.

virtual void compute_residual_n(idx_t n, const float *xs, float *residuals, const idx_t *keys) const override: Compute residual (batch mode)

inline FlatIndex *getGpuData(): For internal access.

int getDevice() const: Returns the device that this index is resident on.

std::shared_ptr<GpuResources> getResources(): Returns a reference to our GpuResources object that manages memory, stream and handle resources on the GPU

void setMinPagingSize(size_t size): Set the minimum data size for searches (in MiB) for which we use CPU -> GPU paging

size_t getMinPagingSize() const: Returns the current minimum data size for paged searches.

virtual void add_with_ids(idx_t n, const float *x, const idx_t *ids) override: x and ids can be resident on the CPU or any GPU; copies are performed as needed. Handles paged adds if the add set is too large; calls addInternal_

virtual void assign(idx_t n, const float *x, idx_t *labels, idx_t k = 1) const override: x and labels can be resident on the CPU or any GPU; copies are performed as needed

virtual void search(idx_t n, const float *x, idx_t k, float *distances, idx_t *labels, const SearchParameters *params = nullptr) const override: x, distances and labels can be resident on the CPU or any GPU; copies are performed as needed

virtual void search_and_reconstruct(idx_t n, const float *x, idx_t k, float *distances, idx_t *labels, float *recons, const SearchParameters *params = nullptr) const override: x, distances, labels and recons can be resident on the CPU or any GPU; copies are performed as needed

virtual void range_search(idx_t n, const float *x, float radius, RangeSearchResult *result, const SearchParameters *params = nullptr) const

Query n vectors of dimension d against the index.

Return all vectors with distance < radius. Note that many indexes do not implement range_search (only the k-NN search is mandatory).

Parameters:

n – number of vectors
x – input vectors to search, size n * d
radius – search radius
result – result table

virtual size_t remove_ids(const IDSelector &sel): removes IDs from the index. Not supported by all indexes. Returns the number of elements removed.

virtual DistanceComputer *get_distance_computer() const

Get a DistanceComputer (defined in AuxIndexStructures) object for this kind of index.

DistanceComputer is implemented for indexes that support random access of their vectors.

virtual size_t sa_code_size() const: size of the produced codes in bytes

virtual void sa_encode(idx_t n, const float *x, uint8_t *bytes) const

encode a set of vectors

Parameters:

n – number of vectors
x – input vectors, size n * d
bytes – output encoded vectors, size n * sa_code_size()

virtual void sa_decode(idx_t n, const uint8_t *bytes, float *x) const

decode a set of vectors

Parameters:

n – number of vectors
bytes – input encoded vectors, size n * sa_code_size()
x – output vectors, size n * d

virtual void merge_from(Index &otherIndex, idx_t add_id = 0): moves the entries from another dataset to self. On output, other is empty. add_id is added to all moved ids (for sequential ids, this would be this->ntotal)

virtual void check_compatible_for_merge(const Index &otherIndex) const: check that the two indexes are compatible (i.e., they are trained in the same way and have the same parameters). Otherwise throw.

virtual void add_sa_codes(idx_t n, const uint8_t *codes, const idx_t *xids)

Add vectors that are computed with the standalone codec

Parameters:

codes – codes to add, size n * sa_code_size()
xids – corresponding ids, size n

Public Members

int d: vector dimension

idx_t ntotal: total nb of indexed vectors

bool verbose: verbosity level

bool is_trained: set if the Index does not require training, or if training is done already

MetricType metric_type: type of metric this index uses for search

float metric_arg: argument of the metric type

Protected Functions

void copyFrom(const faiss::Index *index): Copy what we need from the CPU equivalent.

void copyTo(faiss::Index *index) const: Copy what we have to the CPU equivalent.

void resetIndex_(int dims)

virtual bool addImplRequiresIDs_() const override: Flat index does not require IDs as there is no storage available for them

virtual void addImpl_(idx_t n, const float *x, const idx_t *ids) override: Called from GpuIndex for add.

virtual void searchImpl_(idx_t n, const float *x, int k, float *distances, idx_t *labels, const SearchParameters *params) const override: Called from GpuIndex for search.

Protected Attributes

const GpuIndexFlatConfig flatConfig_: Our configuration options.

std::unique_ptr<FlatIndex> data_: Holds our GPU data containing the list of vectors.

std::shared_ptr<GpuResources> resources_: Manages streams, cuBLAS handles and scratch memory for devices.

const GpuIndexConfig config_: Our configuration options.

size_t minPagedSize_: Size above which we page copies from the CPU to GPU.

struct GpuIndexIVFConfig : public faiss::gpu::GpuIndexConfig

Subclassed by faiss::gpu::GpuIndexIVFFlatConfig, faiss::gpu::GpuIndexIVFPQConfig, faiss::gpu::GpuIndexIVFScalarQuantizerConfig

Public Members

IndicesOptions indicesOptions = INDICES_64_BIT: Index storage options for the GPU.

GpuIndexFlatConfig flatConfig: Configuration for the coarse quantizer object.

bool allowCpuCoarseQuantizer = false: This flag controls the CPU fallback logic for the coarse quantizer component of the index. When set to false (default), the cloner will throw an exception for indices not implemented on GPU. When set to true, it will fall back to a CPU implementation.

int device = 0: GPU device on which the index is resident.

MemorySpace memorySpace = MemorySpace::Device: What memory space to use for primary storage. On Pascal and above (CC 6+) architectures, allows GPUs to use more memory than is available on the GPU.

bool use_cuvs = false: Should the index dispatch down to cuVS?

class GpuIndexIVF : public faiss::gpu::GpuIndex, public faiss::IndexIVFInterface

#include <GpuIndexIVF.h>

Base class of all GPU IVF index types. This (for now) deliberately does not inherit from IndexIVF, as many of the public data members and much of the functionality in IndexIVF are not supported in the same manner on the GPU.

Subclassed by faiss::gpu::GpuIndexIVFFlat, faiss::gpu::GpuIndexIVFPQ, faiss::gpu::GpuIndexIVFScalarQuantizer

Public Types

using component_t = float

using distance_t = float

Public Functions

GpuIndexIVF(GpuResourcesProvider *provider, int dims, faiss::MetricType metric, float metricArg, idx_t nlist, GpuIndexIVFConfig config = GpuIndexIVFConfig()): Version that auto-constructs a flat coarse quantizer based on the desired metric

GpuIndexIVF(GpuResourcesProvider *provider, Index *coarseQuantizer, int dims, faiss::MetricType metric, float metricArg, idx_t nlist, GpuIndexIVFConfig config = GpuIndexIVFConfig()): Version that takes a coarse quantizer instance. The GpuIndexIVF does not own the coarseQuantizer instance by default (functions like IndexIVF).

~GpuIndexIVF() override

void copyFrom(const faiss::IndexIVF *index): Copy what we need from the CPU equivalent.

void copyTo(faiss::IndexIVF *index) const: Copy what we have to the CPU equivalent.

virtual void updateQuantizer() = 0: Should be called if the user ever changes the state of the IVF coarse quantizer manually (e.g., substitutes a new instance or changes vectors in the coarse quantizer outside the scope of training)

virtual idx_t getNumLists() const: Returns the number of inverted lists we’re managing.

virtual idx_t getListLength(idx_t listId) const: Returns the number of vectors present in a particular inverted list.

virtual std::vector<uint8_t> getListVectorData(idx_t listId, bool gpuFormat = false) const: Return the encoded vector data contained in a particular inverted list, for debugging purposes. If gpuFormat is true, the data is returned as it is encoded in the GPU-side representation. Otherwise, it is converted to the CPU-compliant format; the native GPU format may differ.

virtual std::vector<idx_t> getListIndices(idx_t listId) const: Return the vector indices contained in a particular inverted list, for debugging purposes.

virtual void search_preassigned(idx_t n, const float *x, idx_t k, const idx_t *assign, const float *centroid_dis, float *distances, idx_t *labels, bool store_pairs, const SearchParametersIVF *params = nullptr, IndexIVFStats *stats = nullptr) const override

search a set of vectors, that are pre-quantized by the IVF quantizer. Fill in the corresponding heaps with the query results. The default implementation uses InvertedListScanners to do the search.

Parameters:

n – number of vectors to query
x – query vectors, size n * d
assign – coarse quantization indices, size n * nprobe
centroid_dis – distances to coarse centroids, size n * nprobe
distances – output distances, size n * k
labels – output labels, size n * k
store_pairs – store the inverted list index and the offset within that list in the upper/lower 32 bits of each result, instead of the ids (used for reranking).
params – used to override the object’s search parameters
stats – search stats to be updated (can be null)
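The store_pairs encoding described above can be sketched as a pair of pack/unpack helpers. The helper names are hypothetical; the layout assumed here (list index in the upper 32 bits, offset in the lower 32 bits of a 64-bit label) follows the parameter description.

```cpp
#include <cassert>
#include <cstdint>

// Pack (inverted list index, offset within the list) into one 64-bit label.
inline std::int64_t pack_pair_sketch(
        std::uint32_t list_no, std::uint32_t offset) {
    const std::uint64_t packed =
            (static_cast<std::uint64_t>(list_no) << 32) | offset;
    assert((packed >> 32) == list_no);
    return static_cast<std::int64_t>(packed);
}

// Upper 32 bits: inverted list index.
inline std::uint32_t pair_list_no_sketch(std::int64_t label) {
    return static_cast<std::uint32_t>(static_cast<std::uint64_t>(label) >> 32);
}

// Lower 32 bits: offset within that list.
inline std::uint32_t pair_offset_sketch(std::int64_t label) {
    return static_cast<std::uint32_t>(
            static_cast<std::uint64_t>(label) & 0xffffffffu);
}
```

A reranking step can then unpack each label to find the exact list entry without a separate id lookup.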

virtual void range_search_preassigned(idx_t nx, const float *x, float radius, const idx_t *keys, const float *coarse_dis, RangeSearchResult *result, bool store_pairs = false, const IVFSearchParameters *params = nullptr, IndexIVFStats *stats = nullptr) const override

Range search a set of vectors, that are pre-quantized by the IVF quantizer. Fill in the RangeSearchResults results. The default implementation uses InvertedListScanners to do the search.

Parameters:

nx – number of vectors to query
x – query vectors, size nx * d
radius – search radius
keys – coarse quantization indices, size nx * nprobe
coarse_dis – distances to coarse centroids, size nx * nprobe
result – output results
store_pairs – store the inverted list index and the offset within that list in the upper/lower 32 bits of each result, instead of the ids (used for reranking).
params – used to override the object’s search parameters
stats – search stats to be updated (can be null)

int getDevice() const: Returns the device that this index is resident on.

std::shared_ptr<GpuResources> getResources(): Returns a reference to our GpuResources object that manages memory, stream and handle resources on the GPU

void setMinPagingSize(size_t size): Set the minimum data size for searches (in MiB) for which we use CPU -> GPU paging

size_t getMinPagingSize() const: Returns the current minimum data size for paged searches.

virtual void add(idx_t, const float *x) override: x can be resident on the CPU or any GPU; copies are performed as needed. Handles paged adds if the add set is too large; calls addInternal_

virtual void add_with_ids(idx_t n, const float *x, const idx_t *ids) override: x and ids can be resident on the CPU or any GPU; copies are performed as needed. Handles paged adds if the add set is too large; calls addInternal_

virtual void assign(idx_t n, const float *x, idx_t *labels, idx_t k = 1) const override: x and labels can be resident on the CPU or any GPU; copies are performed as needed

virtual void search(idx_t n, const float *x, idx_t k, float *distances, idx_t *labels, const SearchParameters *params = nullptr) const override: x, distances and labels can be resident on the CPU or any GPU; copies are performed as needed

virtual void search_and_reconstruct(idx_t n, const float *x, idx_t k, float *distances, idx_t *labels, float *recons, const SearchParameters *params = nullptr) const override: x, distances, labels and recons can be resident on the CPU or any GPU; copies are performed as needed

virtual void compute_residual(const float *x, float *residual, idx_t key) const override: Overridden to force GPU indices to provide their own GPU-friendly implementation

virtual void compute_residual_n(idx_t n, const float *xs, float *residuals, const idx_t *keys) const override: Overridden to force GPU indices to provide their own GPU-friendly implementation

virtual void train(idx_t n, const float *x)

Perform training on a representative set of vectors

Parameters:

n – nb of training vectors
x – training vectors, size n * d

virtual void range_search(idx_t n, const float *x, float radius, RangeSearchResult *result, const SearchParameters *params = nullptr) const

Query n vectors of dimension d against the index.

Return all vectors with distance < radius. Note that many indexes do not implement range_search (only the k-NN search is mandatory).

Parameters:

n – number of vectors
x – input vectors to search, size n * d
radius – search radius
result – result table

virtual void reset() = 0: removes all elements from the database.

virtual size_t remove_ids(const IDSelector &sel): removes IDs from the index. Not supported by all indexes. Returns the number of elements removed.

virtual void reconstruct(idx_t key, float *recons) const

Reconstruct a stored vector (or an approximation if lossy coding)

this function may not be defined for some indexes

Parameters:

key – id of the vector to reconstruct
recons – reconstructed vector (size d)

virtual void reconstruct_batch(idx_t n, const idx_t *keys, float *recons) const

Reconstruct several stored vectors (or an approximation if lossy coding)

this function may not be defined for some indexes

Parameters:

n – number of vectors to reconstruct
keys – ids of the vectors to reconstruct (size n)
recons – reconstructed vectors (size n * d)

virtual void reconstruct_n(idx_t i0, idx_t ni, float *recons) const

Reconstruct vectors i0 to i0 + ni - 1

this function may not be defined for some indexes

Parameters:

i0 – index of the first vector in the sequence
ni – number of vectors in the sequence
recons – reconstructed vectors (size ni * d)
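To make the index range and output size concrete, here is a minimal sketch for uncompressed, row-major storage. The function name is hypothetical, and an exact copy is only what a lossless index would return.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Copy vectors i0 .. i0 + ni - 1 out of row-major storage (d floats each).
// The result holds ni * d floats.
inline std::vector<float> reconstruct_n_sketch(
        const std::vector<float>& stored,
        std::size_t d,
        std::size_t i0,
        std::size_t ni) {
    assert((i0 + ni) * d <= stored.size());
    const auto first = stored.begin() + static_cast<std::ptrdiff_t>(i0 * d);
    const auto last =
            stored.begin() + static_cast<std::ptrdiff_t>((i0 + ni) * d);
    return std::vector<float>(first, last);
}
```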

virtual DistanceComputer *get_distance_computer() const

Get a DistanceComputer (defined in AuxIndexStructures) object for this kind of index.

DistanceComputer is implemented for indexes that support random access of their vectors.

virtual size_t sa_code_size() const: size of the produced codes in bytes

virtual void sa_encode(idx_t n, const float *x, uint8_t *bytes) const

encode a set of vectors

Parameters:

n – number of vectors
x – input vectors, size n * d
bytes – output encoded vectors, size n * sa_code_size()

virtual void sa_decode(idx_t n, const uint8_t *bytes, float *x) const

decode a set of vectors

Parameters:

n – number of vectors
bytes – input encoded vectors, size n * sa_code_size()
x – output vectors, size n * d

virtual void merge_from(Index &otherIndex, idx_t add_id = 0): moves the entries from another dataset to self. On output, other is empty. add_id is added to all moved ids (for sequential ids, this would be this->ntotal)

virtual void check_compatible_for_merge(const Index &otherIndex) const: check that the two indexes are compatible (i.e., they are trained in the same way and have the same parameters). Otherwise throw.

virtual void add_sa_codes(idx_t n, const uint8_t *codes, const idx_t *xids)

Add vectors that are computed with the standalone codec

Parameters:

codes – codes to add, size n * sa_code_size()
xids – corresponding ids, size n

void train_q1(size_t n, const float *x, bool verbose, MetricType metric_type): Trains the quantizer and calls train_residual to train sub-quantizers.

size_t coarse_code_size() const: compute the number of bytes required to store list ids

void encode_listno(idx_t list_no, uint8_t *code) const

idx_t decode_listno(const uint8_t *code) const
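One plausible way to realize these three functions is to use the smallest whole number of bytes that can represent nlist - 1 and to write the list number least-significant byte first. The sketch below follows that scheme with hypothetical free functions; the byte order in particular is an assumption, not something this reference states.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Number of bytes needed to store any list id in [0, nlist).
inline std::size_t coarse_code_size_sketch(std::size_t nlist) {
    assert(nlist > 0);
    std::size_t nl = nlist - 1;
    std::size_t nbyte = 0;
    while (nl > 0) {
        ++nbyte;
        nl >>= 8;
    }
    return nbyte;
}

// Write list_no into `code`, least-significant byte first (assumed order).
inline void encode_listno_sketch(
        std::size_t nlist, std::int64_t list_no, std::uint8_t* code) {
    assert(list_no >= 0 && static_cast<std::size_t>(list_no) < nlist);
    std::uint64_t v = static_cast<std::uint64_t>(list_no);
    const std::size_t nbyte = coarse_code_size_sketch(nlist);
    for (std::size_t i = 0; i < nbyte; ++i) {
        code[i] = static_cast<std::uint8_t>(v & 0xff);
        v >>= 8;
    }
}

// Inverse of encode_listno_sketch.
inline std::int64_t decode_listno_sketch(
        std::size_t nlist, const std::uint8_t* code) {
    std::uint64_t v = 0;
    const std::size_t nbyte = coarse_code_size_sketch(nlist);
    for (std::size_t i = 0; i < nbyte; ++i) {
        v |= static_cast<std::uint64_t>(code[i]) << (8 * i);
    }
    return static_cast<std::int64_t>(v);
}
```

Under this scheme an index with up to 256 lists needs one byte per list id, and one with 257 to 65536 lists needs two.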

Public Members

int d: vector dimension

idx_t ntotal: total nb of indexed vectors

bool verbose: verbosity level

bool is_trained: set if the Index does not require training, or if training is done already

MetricType metric_type: type of metric this index uses for search

float metric_arg: argument of the metric type

size_t nprobe = 1: number of probes at query time

size_t max_codes = 0: max nb of codes to visit to do a query

Index *quantizer = nullptr: quantizer that maps vectors to inverted lists

size_t nlist = 0: number of inverted lists

char quantizer_trains_alone = 0: selects how the quantizer is trained:
0 – use the quantizer as index in a kmeans training
1 – just pass on the training set to the train() of the quantizer
2 – kmeans training on a flat index + add the centroids to the quantizer

bool own_fields = false: whether object owns the quantizer

ClusteringParameters cp: to override default clustering params

Index *clustering_index = nullptr: to override index used during clustering

Protected Functions

int getCurrentNProbe_(const SearchParameters *params) const: From either the current set nprobe or the SearchParameters if available, return the nprobe that we should use for the current search
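The selection rule reads as "per-call parameters win over the index-level setting". A minimal sketch of that rule, with a hypothetical ToySearchParams struct standing in for the real search parameter type:

```cpp
#include <cassert>
#include <cstddef>
#include <limits>

// Toy stand-in for per-call search parameters (hypothetical type).
struct ToySearchParams {
    std::size_t nprobe;
};

// Use the per-call nprobe when parameters are supplied, otherwise fall back
// to the value stored on the index.
inline int current_nprobe_sketch(
        std::size_t index_nprobe, const ToySearchParams* params) {
    const std::size_t chosen = params ? params->nprobe : index_nprobe;
    assert(chosen <=
           static_cast<std::size_t>(std::numeric_limits<int>::max()));
    return static_cast<int>(chosen);
}
```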

void verifyIVFSettings_() const

virtual bool addImplRequiresIDs_() const override: Does addImpl_ require IDs? If so, and no IDs are provided, we will generate them sequentially based on the order in which the IDs are added
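Sequential ID generation can be sketched as follows. The helper name is hypothetical, and the sketch assumes that generated IDs continue from the current ntotal, so that vectors added earlier keep their IDs.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <vector>

// Generate n consecutive ids starting at the current vector count.
inline std::vector<std::int64_t> sequential_ids_sketch(
        std::int64_t ntotal, std::size_t n) {
    assert(ntotal >= 0);
    std::vector<std::int64_t> ids(n);
    std::iota(ids.begin(), ids.end(), ntotal);
    return ids;
}
```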

virtual void trainQuantizer_(idx_t n, const float *x)

virtual void addImpl_(idx_t n, const float *x, const idx_t *ids) override: Called from GpuIndex for add/add_with_ids.

virtual void searchImpl_(idx_t n, const float *x, int k, float *distances, idx_t *labels, const SearchParameters *params) const override: Called from GpuIndex for search.

void copyFrom(const faiss::Index *index): Copy what we need from the CPU equivalent.

void copyTo(faiss::Index *index) const: Copy what we have to the CPU equivalent.

Protected Attributes

const GpuIndexIVFConfig ivfConfig_: Our configuration options.

std::shared_ptr<IVFBase> baseIndex_: For a trained/initialized index, this is a reference to the base class.

std::shared_ptr<GpuResources> resources_: Manages streams, cuBLAS handles and scratch memory for devices.

const GpuIndexConfig config_: Our configuration options.

size_t minPagedSize_: Size above which we page copies from the CPU to GPU.

Private Functions

void init_(): Shared initialization functions.

struct GpuIndexIVFFlatConfig : public faiss::gpu::GpuIndexIVFConfig

Public Members

bool interleavedLayout = true: Use the alternative memory layout for the IVF lists (currently the default)

IndicesOptions indicesOptions = INDICES_64_BIT: Index storage options for the GPU.

GpuIndexFlatConfig flatConfig: Configuration for the coarse quantizer object.

bool allowCpuCoarseQuantizer = false: This flag controls the CPU fallback logic for the coarse quantizer component of the index. When set to false (default), the cloner will throw an exception for indices not implemented on GPU. When set to true, it will fall back to a CPU implementation.

int device = 0: GPU device on which the index is resident.

MemorySpace memorySpace = MemorySpace::Device: What memory space to use for primary storage. On Pascal and above (CC 6+) architectures, allows GPUs to use more memory than is available on the GPU.

bool use_cuvs = false: Should the index dispatch down to cuVS?

class GpuIndexIVFFlat : public faiss::gpu::GpuIndexIVF

#include <GpuIndexIVFFlat.h>

Wrapper around the GPU implementation that looks like faiss::IndexIVFFlat

Public Types

using component_t = float

using distance_t = float

Public Functions

GpuIndexIVFFlat(GpuResourcesProvider *provider, const faiss::IndexIVFFlat *index, GpuIndexIVFFlatConfig config = GpuIndexIVFFlatConfig()): Construct from a pre-existing faiss::IndexIVFFlat instance, copying data over to the given GPU, if the input index is trained.

GpuIndexIVFFlat(GpuResourcesProvider *provider, int dims, idx_t nlist, faiss::MetricType metric = faiss::METRIC_L2, GpuIndexIVFFlatConfig config = GpuIndexIVFFlatConfig()): Constructs a new instance with an empty flat quantizer; the user provides the number of IVF lists desired.

GpuIndexIVFFlat(GpuResourcesProvider *provider, Index *coarseQuantizer, int dims, idx_t nlist, faiss::MetricType metric = faiss::METRIC_L2, GpuIndexIVFFlatConfig config = GpuIndexIVFFlatConfig()): Constructs a new instance with a provided CPU or GPU coarse quantizer; the user provides the number of IVF lists desired.

~GpuIndexIVFFlat() override

void reserveMemory(size_t numVecs): Reserve GPU memory in our inverted lists for this number of vectors.

void copyFrom(const faiss::IndexIVFFlat *index): Initialize ourselves from the given CPU index; will overwrite all data in ourselves

void copyTo(faiss::IndexIVFFlat *index) const: Copy ourselves to the given CPU index; will overwrite all data in the index instance

size_t reclaimMemory(): After adding vectors, one can call this to reclaim device memory to exactly the amount needed. Returns space reclaimed in bytes

virtual void reset() override: Clears out all inverted lists, but retains the coarse centroid information

virtual void updateQuantizer() override: Should be called if the user ever changes the state of the IVF coarse quantizer manually (e.g., substitutes a new instance or changes vectors in the coarse quantizer outside the scope of training)

virtual void train(idx_t n, const float *x) override: Trains the coarse quantizer based on the given vector data.

virtual void reconstruct_n(idx_t i0, idx_t n, float *out) const override

Reconstruct vectors i0 to i0 + n - 1

this function may not be defined for some indexes

Parameters:

i0 – index of the first vector in the sequence
n – number of vectors in the sequence
out – reconstructed vectors (size n * d)

void copyFrom(const faiss::IndexIVF *index): Copy what we need from the CPU equivalent.

void copyTo(faiss::IndexIVF *index) const: Copy what we have to the CPU equivalent.

virtual idx_t getNumLists() const: Returns the number of inverted lists we’re managing.

virtual idx_t getListLength(idx_t listId) const: Returns the number of vectors present in a particular inverted list.

virtual std::vector<uint8_t> getListVectorData(idx_t listId, bool gpuFormat = false) const: Return the encoded vector data contained in a particular inverted list, for debugging purposes. If gpuFormat is true, the data is returned as it is encoded in the GPU-side representation. Otherwise, it is converted to the CPU-compliant format; the native GPU format may differ.

virtual std::vector<idx_t> getListIndices(idx_t listId) const: Return the vector indices contained in a particular inverted list, for debugging purposes.

virtual void search_preassigned(idx_t n, const float *x, idx_t k, const idx_t *assign, const float *centroid_dis, float *distances, idx_t *labels, bool store_pairs, const SearchParametersIVF *params = nullptr, IndexIVFStats *stats = nullptr) const override

search a set of vectors, that are pre-quantized by the IVF quantizer. Fill in the corresponding heaps with the query results. The default implementation uses InvertedListScanners to do the search.

Parameters:

n – number of vectors to query
x – query vectors, size n * d
assign – coarse quantization indices, size n * nprobe
centroid_dis – distances to coarse centroids, size n * nprobe
distances – output distances, size n * k
labels – output labels, size n * k
store_pairs – store the inverted list index and the offset within that list in the upper/lower 32 bits of each result, instead of the ids (used for reranking).
params – used to override the object’s search parameters
stats – search stats to be updated (can be null)

virtual void range_search_preassigned(idx_t nx, const float *x, float radius, const idx_t *keys, const float *coarse_dis, RangeSearchResult *result, bool store_pairs = false, const IVFSearchParameters *params = nullptr, IndexIVFStats *stats = nullptr) const override

Range search a set of vectors, that are pre-quantized by the IVF quantizer. Fill in the RangeSearchResults results. The default implementation uses InvertedListScanners to do the search.

Parameters:

nx – number of vectors to query
x – query vectors, size nx * d
radius – search radius
keys – coarse quantization indices, size nx * nprobe
coarse_dis – distances to coarse centroids, size nx * nprobe
result – output results
store_pairs – store the inverted list index and the offset within that list in the upper/lower 32 bits of each result, instead of the ids (used for reranking).
params – used to override the object’s search parameters
stats – search stats to be updated (can be null)

int getDevice() const: Returns the device that this index is resident on.

std::shared_ptr<GpuResources> getResources(): Returns a reference to our GpuResources object that manages memory, stream and handle resources on the GPU

void setMinPagingSize(size_t size): Set the minimum data size for searches (in MiB) for which we use CPU -> GPU paging

size_t getMinPagingSize() const: Returns the current minimum data size for paged searches.

virtual void add(idx_t, const float *x) override: x can be resident on the CPU or any GPU; copies are performed as needed. Handles paged adds if the add set is too large; calls addInternal_

virtual void add_with_ids(idx_t n, const float *x, const idx_t *ids) override: x and ids can be resident on the CPU or any GPU; copies are performed as needed. Handles paged adds if the add set is too large; calls addInternal_

virtual void assign(idx_t n, const float *x, idx_t *labels, idx_t k = 1) const override: x and labels can be resident on the CPU or any GPU; copies are performed as needed

virtual void search(idx_t n, const float *x, idx_t k, float *distances, idx_t *labels, const SearchParameters *params = nullptr) const override: x, distances and labels can be resident on the CPU or any GPU; copies are performed as needed

virtual void search_and_reconstruct(idx_t n, const float *x, idx_t k, float *distances, idx_t *labels, float *recons, const SearchParameters *params = nullptr) const override: x, distances, labels and recons can be resident on the CPU or any GPU; copies are performed as needed

virtual void compute_residual(const float *x, float *residual, idx_t key) const override: Overridden to force GPU indices to provide their own GPU-friendly implementation

virtual void compute_residual_n(idx_t n, const float *xs, float *residuals, const idx_t *keys) const override: Overridden to force GPU indices to provide their own GPU-friendly implementation

virtual void range_search(idx_t n, const float *x, float radius, RangeSearchResult *result, const SearchParameters *params = nullptr) const

Query n vectors of dimension d against the index.

Returns all vectors with distance < radius. Note that many indexes do not implement range_search (only the k-NN search is mandatory).

Parameters:

n – number of vectors
x – input vectors to search, size n * d
radius – search radius
result – result table
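
The semantics described above (every stored vector with distance < radius is returned, grouped per query) can be illustrated with a brute-force CPU reference. This is a minimal sketch, not faiss code: RangeHits and bruteForceRangeSearch are hypothetical names, the lims/labels/distances layout is only loosely modeled on RangeSearchResult, and squared L2 distance is assumed as the metric.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

using idx_t = std::int64_t; // stand-in for faiss::idx_t

// Hypothetical result container: hits for query i occupy the index range
// [lims[i], lims[i + 1]) of labels and distances.
struct RangeHits {
    std::vector<std::size_t> lims;
    std::vector<idx_t> labels;
    std::vector<float> distances;
};

// Brute-force range search over row-major data (d floats per vector),
// assuming squared L2 distance.
inline RangeHits bruteForceRangeSearch(
        const std::vector<float>& db,
        const std::vector<float>& queries,
        std::size_t d,
        float radius) {
    RangeHits out;
    const std::size_t nb = db.size() / d;
    const std::size_t n = queries.size() / d;
    out.lims.push_back(0);
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t j = 0; j < nb; ++j) {
            float dist = 0.0f;
            for (std::size_t c = 0; c < d; ++c) {
                const float diff = queries[i * d + c] - db[j * d + c];
                dist += diff * diff;
            }
            if (dist < radius) { // strict inequality, as documented
                out.labels.push_back(static_cast<idx_t>(j));
                out.distances.push_back(dist);
            }
        }
        // Close the range of hits belonging to query i.
        out.lims.push_back(out.labels.size());
    }
    return out;
}
```

Because the number of hits per query is not known in advance, the result is a variable-length list per query rather than a fixed n * k table as in k-NN search.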

virtual size_t remove_ids(const IDSelector &sel): removes IDs from the index. Not supported by all indexes. Returns the number of elements removed.

virtual void reconstruct(idx_t key, float *recons) const

Reconstruct a stored vector (or an approximation if lossy coding)

this function may not be defined for some indexes

Parameters:

key – id of the vector to reconstruct
recons – reconstructed vector (size d)

virtual void reconstruct_batch(idx_t n, const idx_t *keys, float *recons) const

Reconstruct several stored vectors (or an approximation if lossy coding)

this function may not be defined for some indexes

Parameters:

n – number of vectors to reconstruct
keys – ids of the vectors to reconstruct (size n)
recons – reconstructed vectors (size n * d)

virtual DistanceComputer *get_distance_computer() const

Get a DistanceComputer (defined in AuxIndexStructures) object for this kind of index.

DistanceComputer is implemented for indexes that support random access of their vectors.

virtual size_t sa_code_size() const: size of the produced codes in bytes

virtual void sa_encode(idx_t n, const float *x, uint8_t *bytes) const

encode a set of vectors

Parameters:

n – number of vectors
x – input vectors, size n * d
bytes – output encoded vectors, size n * sa_code_size()

virtual void sa_decode(idx_t n, const uint8_t *bytes, float *x) const

decode a set of vectors

Parameters:

n – number of vectors
bytes – input encoded vectors, size n * sa_code_size()
x – output vectors, size n * d

virtual void merge_from(Index &otherIndex, idx_t add_id = 0): Moves the entries from another index to self. On output, otherIndex is empty. add_id is added to all moved ids (for sequential ids, this would be this->ntotal)

virtual void check_compatible_for_merge(const Index &otherIndex) const: Checks that the two indexes are compatible (i.e., they are trained in the same way and have the same parameters). Otherwise throws.

virtual void add_sa_codes(idx_t n, const uint8_t *codes, const idx_t *xids)

Add vectors that are computed with the standalone codec

Parameters:

n – number of codes to add
codes – codes to add, size n * sa_code_size()
xids – corresponding ids, size n

void train_q1(size_t n, const float *x, bool verbose, MetricType metric_type): Trains the quantizer and calls train_residual to train sub-quantizers.

size_t coarse_code_size() const: compute the number of bytes required to store list ids

void encode_listno(idx_t list_no, uint8_t *code) const

idx_t decode_listno(const uint8_t *code) const

Public Members

int d: vector dimension

idx_t ntotal: total nb of indexed vectors

bool verbose: verbosity level

bool is_trained: set if the Index does not require training, or if training is done already

MetricType metric_type: type of metric this index uses for search

float metric_arg: argument of the metric type

size_t nprobe = 1: number of probes at query time

size_t max_codes = 0: max nb of codes to visit to do a query

Index *quantizer = nullptr: quantizer that maps vectors to inverted lists

size_t nlist = 0: number of inverted lists

char quantizer_trains_alone = 0

= 0: use the quantizer as index in a kmeans training
= 1: just pass on the training set to the train() of the quantizer
= 2: kmeans training on a flat index + add the centroids to the quantizer

bool own_fields = false: whether object owns the quantizer

ClusteringParameters cp: to override default clustering params

Index *clustering_index = nullptr: to override index used during clustering

Protected Functions

void setIndex_(GpuResources *resources, int dim, int nlist, faiss::MetricType metric, float metricArg, bool useResidual, faiss::ScalarQuantizer *scalarQ, bool interleavedLayout, IndicesOptions indicesOptions, MemorySpace space)

Initialize appropriate index.

Parameters:

scalarQ – Optional ScalarQuantizer

void copyFrom(const faiss::Index *index): Copy what we need from the CPU equivalent.

void copyTo(faiss::Index *index) const: Copy what we have to the CPU equivalent.

int getCurrentNProbe_(const SearchParameters *params) const: From either the current set nprobe or the SearchParameters if available, return the nprobe that we should use for the current search

void verifyIVFSettings_() const

virtual bool addImplRequiresIDs_() const override: Does addImpl_ require IDs? If so, and no IDs are provided, we will generate them sequentially based on the order in which the vectors are added

virtual void trainQuantizer_(idx_t n, const float *x)

virtual void addImpl_(idx_t n, const float *x, const idx_t *ids) override: Called from GpuIndex for add/add_with_ids.

virtual void searchImpl_(idx_t n, const float *x, int k, float *distances, idx_t *labels, const SearchParameters *params) const override: Called from GpuIndex for search.

Protected Attributes

const GpuIndexIVFFlatConfig ivfFlatConfig_: Our configuration options.

size_t reserveMemoryVecs_: Desired inverted list memory reservation.

std::shared_ptr<IVFFlat> index_: Instance that we own; contains the inverted lists.

const GpuIndexIVFConfig ivfConfig_: Our configuration options.

std::shared_ptr<IVFBase> baseIndex_: For a trained/initialized index, this is a reference to the base class.

std::shared_ptr<GpuResources> resources_: Manages streams, cuBLAS handles and scratch memory for devices.

const GpuIndexConfig config_: Our configuration options.

size_t minPagedSize_: Size above which we page copies from the CPU to GPU.

struct GpuIndexIVFPQConfig : public faiss::gpu::GpuIndexIVFConfig

Public Members

bool useFloat16LookupTables = false: Whether or not float16 residual distance tables are used in the list scanning kernels. When subQuantizers * 2^bitsPerCode > 16384, this is required.
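
The threshold stated above can be checked before constructing the index. The following is a minimal sketch of that arithmetic only; lookupTableEntries and requiresFloat16LookupTables are hypothetical helper names, not part of the faiss API.

```cpp
#include <cstdint>

// Hypothetical helper (not part of faiss): evaluates the documented
// expression subQuantizers * 2^bitsPerCode.
inline std::uint64_t lookupTableEntries(int subQuantizers, int bitsPerCode) {
    return static_cast<std::uint64_t>(subQuantizers) << bitsPerCode;
}

// True when the documented threshold (> 16384) is exceeded, in which case
// useFloat16LookupTables has to be enabled in the config.
inline bool requiresFloat16LookupTables(int subQuantizers, int bitsPerCode) {
    return lookupTableEntries(subQuantizers, bitsPerCode) > 16384;
}
```

For example, 64 sub-quantizers at 8 bits per code give exactly 16384 and do not trip the condition, while 96 sub-quantizers at 8 bits per code give 24576 and do.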

bool usePrecomputedTables = false: Whether or not we enable the precomputed table option for search, which can substantially increase the memory requirement.

bool interleavedLayout = false: Use the alternative memory layout for the IVF lists. WARNING: this is a feature under development, and is only supported with cuVS enabled for the index. Do not use if cuVS is not enabled.

bool useMMCodeDistance = false

Use GEMM-backed computation of PQ code distances for the no precomputed table version of IVFPQ. This is for debugging purposes; it should not substantially affect the results one way or another.

Note that MM code distance is enabled automatically if one uses a number of dimensions per sub-quantizer that is not natively specialized (an odd number like 7 or so).

IndicesOptions indicesOptions = INDICES_64_BIT: Index storage options for the GPU.

GpuIndexFlatConfig flatConfig: Configuration for the coarse quantizer object.

bool allowCpuCoarseQuantizer = false: This flag controls the CPU fallback logic for the coarse quantizer component of the index. When set to false (default), the cloner will throw an exception for indices not implemented on GPU. When set to true, it will fall back to a CPU implementation.

int device = 0: GPU device on which the index is resident.

MemorySpace memorySpace = MemorySpace::Device: What memory space to use for primary storage. On Pascal and above (CC 6+) architectures, unified memory allows the index to use more memory than is physically available on the GPU.

bool use_cuvs = false: Should the index dispatch down to cuVS?

class GpuIndexIVFPQ : public faiss::gpu::GpuIndexIVF

#include <GpuIndexIVFPQ.h>

IVFPQ index for the GPU.

Public Types

using component_t = float

using distance_t = float

Public Functions

GpuIndexIVFPQ(GpuResourcesProvider *provider, const faiss::IndexIVFPQ *index, GpuIndexIVFPQConfig config = GpuIndexIVFPQConfig()): Construct from a pre-existing faiss::IndexIVFPQ instance, copying data over to the given GPU, if the input index is trained.

GpuIndexIVFPQ(GpuResourcesProvider *provider, int dims, idx_t nlist, idx_t subQuantizers, idx_t bitsPerCode, faiss::MetricType metric = faiss::METRIC_L2, GpuIndexIVFPQConfig config = GpuIndexIVFPQConfig()): Constructs a new instance with an empty flat quantizer; the user provides the number of IVF lists desired.

GpuIndexIVFPQ(GpuResourcesProvider *provider, Index *coarseQuantizer, int dims, idx_t nlist, idx_t subQuantizers, idx_t bitsPerCode, faiss::MetricType metric = faiss::METRIC_L2, GpuIndexIVFPQConfig config = GpuIndexIVFPQConfig()): Constructs a new instance with a provided CPU or GPU coarse quantizer; the user provides the number of IVF lists desired.

~GpuIndexIVFPQ() override

void copyFrom(const faiss::IndexIVFPQ *index): Initialize ourselves from the given CPU index; will overwrite all data in ourselves

void copyTo(faiss::IndexIVFPQ *index) const: Copy ourselves to the given CPU index; will overwrite all data in the index instance

void reserveMemory(size_t numVecs): Reserve GPU memory in our inverted lists for this number of vectors.

void setPrecomputedCodes(bool enable): Enable or disable pre-computed codes.

bool getPrecomputedCodes() const: Are pre-computed codes enabled?

int getNumSubQuantizers() const: Return the number of sub-quantizers we are using.

int getBitsPerCode() const: Return the number of bits per PQ code.

int getCentroidsPerSubQuantizer() const: Return the number of centroids per PQ code (2^bits per code)
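
The relationship between the three getters above can be sketched as plain arithmetic. This is an illustration only: centroidsPerSubQuantizer and pqCodeSizeBytes are hypothetical helpers, and the per-vector code size assumes codes are bit-packed and rounded up to whole bytes, which may not match the GPU-side list layout.

```cpp
#include <cstddef>

// Number of centroids addressable by one PQ code: 2^bitsPerCode.
inline int centroidsPerSubQuantizer(int bitsPerCode) {
    return 1 << bitsPerCode;
}

// Bytes per encoded vector, ASSUMING bit-packed codes rounded up to a
// whole number of bytes (one code per sub-quantizer).
inline std::size_t pqCodeSizeBytes(int subQuantizers, int bitsPerCode) {
    const std::size_t bits =
            static_cast<std::size_t>(subQuantizers) *
            static_cast<std::size_t>(bitsPerCode);
    return (bits + 7) / 8;
}
```

With 32 sub-quantizers and 8 bits per code, each sub-quantizer has 256 centroids and, under the stated assumption, each vector is encoded in 32 bytes.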

size_t reclaimMemory(): After adding vectors, one can call this to reclaim device memory to exactly the amount needed. Returns space reclaimed in bytes

virtual void reset() override: Clears out all inverted lists, but retains the coarse and product centroid information

virtual void updateQuantizer() override: Should be called if the user ever changes the state of the IVF coarse quantizer manually (e.g., substitutes a new instance or changes vectors in the coarse quantizer outside the scope of training)

virtual void train(idx_t n, const float *x) override: Trains the coarse and product quantizer based on the given vector data.

void copyFrom(const faiss::IndexIVF *index): Copy what we need from the CPU equivalent.

void copyTo(faiss::IndexIVF *index) const: Copy what we have to the CPU equivalent.

virtual idx_t getNumLists() const: Returns the number of inverted lists we’re managing.

virtual idx_t getListLength(idx_t listId) const: Returns the number of vectors present in a particular inverted list.

virtual std::vector<uint8_t> getListVectorData(idx_t listId, bool gpuFormat = false) const: Return the encoded vector data contained in a particular inverted list, for debugging purposes. If gpuFormat is true, the data is returned as it is encoded in the GPU-side representation. Otherwise, it is converted to the CPU-compliant format; the native GPU format may differ.

virtual std::vector<idx_t> getListIndices(idx_t listId) const: Return the vector indices contained in a particular inverted list, for debugging purposes.

virtual void search_preassigned(idx_t n, const float *x, idx_t k, const idx_t *assign, const float *centroid_dis, float *distances, idx_t *labels, bool store_pairs, const SearchParametersIVF *params = nullptr, IndexIVFStats *stats = nullptr) const override

Search a set of vectors that are pre-quantized by the IVF quantizer. Fills in the corresponding heaps with the query results. The default implementation uses InvertedListScanners to do the search.

Parameters:

n – nb of vectors to query
x – query vectors, size n * d
assign – coarse quantization indices, size n * nprobe
centroid_dis – distances to coarse centroids, size n * nprobe
distances – output distances, size n * k
labels – output labels, size n * k
store_pairs – store the inverted list index and the offset within that list in the upper/lower 32 bits of the result, instead of ids (used for reranking).
params – used to override the object’s search parameters
stats – search stats to be updated (can be null)
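
When store_pairs is set, each 64-bit label carries a (list, offset) pair rather than a user id. A minimal sketch of that packing follows, assuming the list index goes in the upper 32 bits and the offset in the lower 32 bits as the parameter description states; packPair, pairListNo and pairOffset are hypothetical helper names, not faiss functions.

```cpp
#include <cstdint>

using idx_t = std::int64_t; // stand-in for faiss::idx_t

// Pack an inverted list index (upper 32 bits) and an offset within that
// list (lower 32 bits) into one 64-bit label.
inline idx_t packPair(std::uint32_t listNo, std::uint32_t offset) {
    const std::uint64_t packed =
            (static_cast<std::uint64_t>(listNo) << 32) | offset;
    return static_cast<idx_t>(packed);
}

// Recover the inverted list index from a packed label.
inline std::uint32_t pairListNo(idx_t label) {
    return static_cast<std::uint32_t>(static_cast<std::uint64_t>(label) >> 32);
}

// Recover the offset within the inverted list from a packed label.
inline std::uint32_t pairOffset(idx_t label) {
    return static_cast<std::uint32_t>(
            static_cast<std::uint64_t>(label) & 0xffffffffu);
}
```

A reranking stage can then look up the original vector by list and offset without needing the id mapping.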

virtual void range_search_preassigned(idx_t nx, const float *x, float radius, const idx_t *keys, const float *coarse_dis, RangeSearchResult *result, bool store_pairs = false, const IVFSearchParameters *params = nullptr, IndexIVFStats *stats = nullptr) const override

Range search a set of vectors that are pre-quantized by the IVF quantizer. Fills in the RangeSearchResult. The default implementation uses InvertedListScanners to do the search.

Parameters:

nx – nb of vectors to query
x – query vectors, size nx * d
keys – coarse quantization indices, size nx * nprobe
coarse_dis – distances to coarse centroids, size nx * nprobe
result – output results
store_pairs – store the inverted list index and the offset within that list in the upper/lower 32 bits of the result, instead of ids (used for reranking).
params – used to override the object’s search parameters
stats – search stats to be updated (can be null)

int getDevice() const: Returns the device that this index is resident on.

std::shared_ptr<GpuResources> getResources(): Returns a reference to our GpuResources object that manages memory, stream and handle resources on the GPU

void setMinPagingSize(size_t size): Set the minimum data size for searches (in MiB) for which we use CPU -> GPU paging

size_t getMinPagingSize() const: Returns the current minimum data size for paged searches.

virtual void add(idx_t, const float *x) override: x can be resident on the CPU or any GPU; copies are performed as needed. Handles paged adds if the add set is too large; calls addInternal_

virtual void add_with_ids(idx_t n, const float *x, const idx_t *ids) override: x and ids can be resident on the CPU or any GPU; copies are performed as needed. Handles paged adds if the add set is too large; calls addInternal_

virtual void assign(idx_t n, const float *x, idx_t *labels, idx_t k = 1) const override: x and labels can be resident on the CPU or any GPU; copies are performed as needed

virtual void search(idx_t n, const float *x, idx_t k, float *distances, idx_t *labels, const SearchParameters *params = nullptr) const override: x, distances and labels can be resident on the CPU or any GPU; copies are performed as needed

virtual void search_and_reconstruct(idx_t n, const float *x, idx_t k, float *distances, idx_t *labels, float *recons, const SearchParameters *params = nullptr) const override: x, distances, labels and recons can be resident on the CPU or any GPU; copies are performed as needed

virtual void compute_residual(const float *x, float *residual, idx_t key) const override: Overridden to force GPU indices to provide their own GPU-friendly implementation

virtual void compute_residual_n(idx_t n, const float *xs, float *residuals, const idx_t *keys) const override: Overridden to force GPU indices to provide their own GPU-friendly implementation

virtual void range_search(idx_t n, const float *x, float radius, RangeSearchResult *result, const SearchParameters *params = nullptr) const

Query n vectors of dimension d against the index.

Returns all vectors with distance < radius. Note that many indexes do not implement range_search (only the k-NN search is mandatory).

Parameters:

n – number of vectors
x – input vectors to search, size n * d
radius – search radius
result – result table

virtual size_t remove_ids(const IDSelector &sel): removes IDs from the index. Not supported by all indexes. Returns the number of elements removed.

virtual void reconstruct(idx_t key, float *recons) const

Reconstruct a stored vector (or an approximation if lossy coding)

this function may not be defined for some indexes

Parameters:

key – id of the vector to reconstruct
recons – reconstructed vector (size d)

virtual void reconstruct_batch(idx_t n, const idx_t *keys, float *recons) const

Reconstruct several stored vectors (or an approximation if lossy coding)

this function may not be defined for some indexes

Parameters:

n – number of vectors to reconstruct
keys – ids of the vectors to reconstruct (size n)
recons – reconstructed vectors (size n * d)

virtual void reconstruct_n(idx_t i0, idx_t ni, float *recons) const

Reconstruct vectors i0 to i0 + ni - 1

this function may not be defined for some indexes

Parameters:

i0 – index of the first vector in the sequence
ni – number of vectors in the sequence
recons – reconstructed vectors (size ni * d)

virtual DistanceComputer *get_distance_computer() const

Get a DistanceComputer (defined in AuxIndexStructures) object for this kind of index.

DistanceComputer is implemented for indexes that support random access of their vectors.

virtual size_t sa_code_size() const: size of the produced codes in bytes

virtual void sa_encode(idx_t n, const float *x, uint8_t *bytes) const

encode a set of vectors

Parameters:

n – number of vectors
x – input vectors, size n * d
bytes – output encoded vectors, size n * sa_code_size()

virtual void sa_decode(idx_t n, const uint8_t *bytes, float *x) const

decode a set of vectors

Parameters:

n – number of vectors
bytes – input encoded vectors, size n * sa_code_size()
x – output vectors, size n * d

virtual void merge_from(Index &otherIndex, idx_t add_id = 0): Moves the entries from another index to self. On output, otherIndex is empty. add_id is added to all moved ids (for sequential ids, this would be this->ntotal)

virtual void check_compatible_for_merge(const Index &otherIndex) const: Checks that the two indexes are compatible (i.e., they are trained in the same way and have the same parameters). Otherwise throws.

virtual void add_sa_codes(idx_t n, const uint8_t *codes, const idx_t *xids)

Add vectors that are computed with the standalone codec

Parameters:

n – number of codes to add
codes – codes to add, size n * sa_code_size()
xids – corresponding ids, size n

void train_q1(size_t n, const float *x, bool verbose, MetricType metric_type): Trains the quantizer and calls train_residual to train sub-quantizers.

size_t coarse_code_size() const: compute the number of bytes required to store list ids

void encode_listno(idx_t list_no, uint8_t *code) const

idx_t decode_listno(const uint8_t *code) const
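
The three list-id helpers above are undocumented beyond their signatures. The sketch below shows one plausible scheme for them: the smallest whole number of bytes that can hold any list id in [0, nlist), written in little-endian order. The byte order and the free-function names (coarseCodeSize, encodeListNo, decodeListNo) are assumptions made for illustration, not a statement of the library's actual encoding.

```cpp
#include <cstddef>
#include <cstdint>

using idx_t = std::int64_t; // stand-in for faiss::idx_t

// Bytes needed to represent the largest list id, nlist - 1.
inline std::size_t coarseCodeSize(std::size_t nlist) {
    std::size_t maxId = nlist > 0 ? nlist - 1 : 0;
    std::size_t nbytes = 0;
    while (maxId > 0) {
        ++nbytes;
        maxId >>= 8;
    }
    return nbytes;
}

// Write listNo into code[0 .. nbytes), least significant byte first
// (ASSUMED byte order).
inline void encodeListNo(idx_t listNo, std::size_t nbytes, std::uint8_t* code) {
    std::uint64_t v = static_cast<std::uint64_t>(listNo);
    for (std::size_t i = 0; i < nbytes; ++i) {
        code[i] = static_cast<std::uint8_t>(v & 0xff);
        v >>= 8;
    }
}

// Inverse of encodeListNo.
inline idx_t decodeListNo(const std::uint8_t* code, std::size_t nbytes) {
    std::uint64_t v = 0;
    for (std::size_t i = nbytes; i-- > 0;) {
        v = (v << 8) | code[i];
    }
    return static_cast<idx_t>(v);
}
```

Under this scheme an index with 256 lists needs one byte per list id, while 257 lists need two.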

Public Members

ProductQuantizer pq: Like the CPU version, we expose a publicly visible ProductQuantizer for manipulation

int d: vector dimension

idx_t ntotal: total nb of indexed vectors

bool verbose: verbosity level

bool is_trained: set if the Index does not require training, or if training is done already

MetricType metric_type: type of metric this index uses for search

float metric_arg: argument of the metric type

size_t nprobe = 1: number of probes at query time

size_t max_codes = 0: max nb of codes to visit to do a query

Index *quantizer = nullptr: quantizer that maps vectors to inverted lists

size_t nlist = 0: number of inverted lists

char quantizer_trains_alone = 0

= 0: use the quantizer as index in a kmeans training
= 1: just pass on the training set to the train() of the quantizer
= 2: kmeans training on a flat index + add the centroids to the quantizer

bool own_fields = false: whether object owns the quantizer

ClusteringParameters cp: to override default clustering params

Index *clustering_index = nullptr: to override index used during clustering

Protected Functions

void setIndex_(GpuResources *resources, int dim, idx_t nlist, faiss::MetricType metric, float metricArg, int numSubQuantizers, int bitsPerSubQuantizer, bool useFloat16LookupTables, bool useMMCodeDistance, bool interleavedLayout, float *pqCentroidData, IndicesOptions indicesOptions, MemorySpace space): Initialize appropriate index.

void verifyPQSettings_() const: Throws errors if configuration settings are improper.

void trainResidualQuantizer_(idx_t n, const float *x): Trains the PQ quantizer based on the given vector data.

void copyFrom(const faiss::Index *index): Copy what we need from the CPU equivalent.

void copyTo(faiss::Index *index) const: Copy what we have to the CPU equivalent.

int getCurrentNProbe_(const SearchParameters *params) const: From either the current set nprobe or the SearchParameters if available, return the nprobe that we should use for the current search

void verifyIVFSettings_() const

virtual bool addImplRequiresIDs_() const override: Does addImpl_ require IDs? If so, and no IDs are provided, we will generate them sequentially based on the order in which the vectors are added
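
Sequential id generation of the kind described for addImplRequiresIDs_ can be sketched as follows. This is an illustration under the assumption that numbering continues from the current ntotal; generateSequentialIds is a hypothetical helper, not a faiss function.

```cpp
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <vector>

using idx_t = std::int64_t; // stand-in for faiss::idx_t

// Produce ids ntotal, ntotal + 1, ..., ntotal + n - 1 for a batch of n
// vectors added without explicit ids (ASSUMED numbering scheme).
inline std::vector<idx_t> generateSequentialIds(idx_t ntotal, idx_t n) {
    std::vector<idx_t> ids(static_cast<std::size_t>(n));
    std::iota(ids.begin(), ids.end(), ntotal);
    return ids;
}
```

Under this assumption, adding 3 vectors to an index that already holds 100 would assign ids 100, 101 and 102.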

virtual void trainQuantizer_(idx_t n, const float *x)

virtual void addImpl_(idx_t n, const float *x, const idx_t *ids) override: Called from GpuIndex for add/add_with_ids.

virtual void searchImpl_(idx_t n, const float *x, int k, float *distances, idx_t *labels, const SearchParameters *params) const override: Called from GpuIndex for search.

Protected Attributes

const GpuIndexIVFPQConfig ivfpqConfig_: Our configuration options that we were initialized with.

bool usePrecomputedTables_: Runtime override: whether or not we use precomputed tables.

int subQuantizers_: Number of sub-quantizers per encoded vector.

int bitsPerCode_: Bits per sub-quantizer code.

size_t reserveMemoryVecs_: Desired inverted list memory reservation.

std::shared_ptr<IVFPQ> index_: The product quantizer instance that we own; contains the inverted lists

const GpuIndexIVFConfig ivfConfig_: Our configuration options.

std::shared_ptr<IVFBase> baseIndex_: For a trained/initialized index, this is a reference to the base class.

std::shared_ptr<GpuResources> resources_: Manages streams, cuBLAS handles and scratch memory for devices.

const GpuIndexConfig config_: Our configuration options.

size_t minPagedSize_: Size above which we page copies from the CPU to GPU.

struct GpuIndexIVFScalarQuantizerConfig : public faiss::gpu::GpuIndexIVFConfig

Public Members

bool interleavedLayout = true: Use the alternative memory layout for the IVF lists (currently the default)

IndicesOptions indicesOptions = INDICES_64_BIT: Index storage options for the GPU.

GpuIndexFlatConfig flatConfig: Configuration for the coarse quantizer object.

bool allowCpuCoarseQuantizer = false: This flag controls the CPU fallback logic for the coarse quantizer component of the index. When set to false (default), the cloner will throw an exception for indices not implemented on GPU. When set to true, it will fall back to a CPU implementation.

int device = 0: GPU device on which the index is resident.

MemorySpace memorySpace = MemorySpace::Device: What memory space to use for primary storage. On Pascal and above (CC 6+) architectures, unified memory allows the index to use more memory than is physically available on the GPU.

bool use_cuvs = false: Should the index dispatch down to cuVS?

class GpuIndexIVFScalarQuantizer : public faiss::gpu::GpuIndexIVF

#include <GpuIndexIVFScalarQuantizer.h>

Wrapper around the GPU implementation that looks like faiss::IndexIVFScalarQuantizer

Public Types

using component_t = float

using distance_t = float

Public Functions

GpuIndexIVFScalarQuantizer(GpuResourcesProvider *provider, const faiss::IndexIVFScalarQuantizer *index, GpuIndexIVFScalarQuantizerConfig config = GpuIndexIVFScalarQuantizerConfig()): Construct from a pre-existing faiss::IndexIVFScalarQuantizer instance, copying data over to the given GPU, if the input index is trained.

GpuIndexIVFScalarQuantizer(GpuResourcesProvider *provider, int dims, idx_t nlist, faiss::ScalarQuantizer::QuantizerType qtype, faiss::MetricType metric = MetricType::METRIC_L2, bool encodeResidual = true, GpuIndexIVFScalarQuantizerConfig config = GpuIndexIVFScalarQuantizerConfig()): Constructs a new instance with an empty flat quantizer; the user provides the number of IVF lists desired.

GpuIndexIVFScalarQuantizer(GpuResourcesProvider *provider, Index *coarseQuantizer, int dims, idx_t nlist, faiss::ScalarQuantizer::QuantizerType qtype, faiss::MetricType metric = MetricType::METRIC_L2, bool encodeResidual = true, GpuIndexIVFScalarQuantizerConfig config = GpuIndexIVFScalarQuantizerConfig()): Constructs a new instance with a provided CPU or GPU coarse quantizer; the user provides the number of IVF lists desired.

~GpuIndexIVFScalarQuantizer() override

void reserveMemory(size_t numVecs): Reserve GPU memory in our inverted lists for this number of vectors.

void copyFrom(const faiss::IndexIVFScalarQuantizer *index): Initialize ourselves from the given CPU index; will overwrite all data in ourselves

void copyTo(faiss::IndexIVFScalarQuantizer *index) const: Copy ourselves to the given CPU index; will overwrite all data in the index instance

size_t reclaimMemory(): After adding vectors, one can call this to reclaim device memory to exactly the amount needed. Returns space reclaimed in bytes

virtual void reset() override: Clears out all inverted lists, but retains the coarse and scalar quantizer information

virtual void updateQuantizer() override: Should be called if the user ever changes the state of the IVF coarse quantizer manually (e.g., substitutes a new instance or changes vectors in the coarse quantizer outside the scope of training)

virtual void train(idx_t n, const float *x) override: Trains the coarse and scalar quantizer based on the given vector data.

void copyFrom(const faiss::IndexIVF *index): Copy what we need from the CPU equivalent.

void copyTo(faiss::IndexIVF *index) const: Copy what we have to the CPU equivalent.

virtual idx_t getNumLists() const: Returns the number of inverted lists we’re managing.

virtual idx_t getListLength(idx_t listId) const: Returns the number of vectors present in a particular inverted list.

virtual std::vector<uint8_t> getListVectorData(idx_t listId, bool gpuFormat = false) const: Return the encoded vector data contained in a particular inverted list, for debugging purposes. If gpuFormat is true, the data is returned as it is encoded in the GPU-side representation. Otherwise, it is converted to the CPU-compliant format; the native GPU format may differ.

virtual std::vector<idx_t> getListIndices(idx_t listId) const: Return the vector indices contained in a particular inverted list, for debugging purposes.

virtual void search_preassigned(idx_t n, const float *x, idx_t k, const idx_t *assign, const float *centroid_dis, float *distances, idx_t *labels, bool store_pairs, const SearchParametersIVF *params = nullptr, IndexIVFStats *stats = nullptr) const override

Search a set of vectors that are pre-quantized by the IVF quantizer. Fills in the corresponding heaps with the query results. The default implementation uses InvertedListScanners to do the search.

Parameters:

n – nb of vectors to query
x – query vectors, size n * d
assign – coarse quantization indices, size n * nprobe
centroid_dis – distances to coarse centroids, size n * nprobe
distances – output distances, size n * k
labels – output labels, size n * k
store_pairs – store the inverted list index and the offset within that list in the upper/lower 32 bits of the result, instead of ids (used for reranking).
params – used to override the object’s search parameters
stats – search stats to be updated (can be null)

virtual void range_search_preassigned(idx_t nx, const float *x, float radius, const idx_t *keys, const float *coarse_dis, RangeSearchResult *result, bool store_pairs = false, const IVFSearchParameters *params = nullptr, IndexIVFStats *stats = nullptr) const override

Range search a set of vectors that are pre-quantized by the IVF quantizer. Fills in the RangeSearchResult. The default implementation uses InvertedListScanners to do the search.

Parameters:

nx – nb of vectors to query
x – query vectors, size nx * d
keys – coarse quantization indices, size nx * nprobe
coarse_dis – distances to coarse centroids, size nx * nprobe
result – output results
store_pairs – store the inverted list index and the offset within that list in the upper/lower 32 bits of the result, instead of ids (used for reranking).
params – used to override the object’s search parameters
stats – search stats to be updated (can be null)

int getDevice() const: Returns the device that this index is resident on.

std::shared_ptr<GpuResources> getResources(): Returns a reference to our GpuResources object that manages memory, stream and handle resources on the GPU

void setMinPagingSize(size_t size): Set the minimum data size for searches (in MiB) for which we use CPU -> GPU paging

size_t getMinPagingSize() const: Returns the current minimum data size for paged searches.

virtual void add(idx_t, const float *x) override: x can be resident on the CPU or any GPU; copies are performed as needed. Handles paged adds if the add set is too large; calls addInternal_

virtual void add_with_ids(idx_t n, const float *x, const idx_t *ids) override: x and ids can be resident on the CPU or any GPU; copies are performed as needed. Handles paged adds if the add set is too large; calls addInternal_

virtual void assign(idx_t n, const float *x, idx_t *labels, idx_t k = 1) const override: x and labels can be resident on the CPU or any GPU; copies are performed as needed

virtual void search(idx_t n, const float *x, idx_t k, float *distances, idx_t *labels, const SearchParameters *params = nullptr) const override: x, distances and labels can be resident on the CPU or any GPU; copies are performed as needed

virtual void search_and_reconstruct(idx_t n, const float *x, idx_t k, float *distances, idx_t *labels, float *recons, const SearchParameters *params = nullptr) const override: x, distances, labels and recons can be resident on the CPU or any GPU; copies are performed as needed

virtual void compute_residual(const float *x, float *residual, idx_t key) const override: Overridden to force GPU indices to provide their own GPU-friendly implementation

virtual void compute_residual_n(idx_t n, const float *xs, float *residuals, const idx_t *keys) const override: Overridden to force GPU indices to provide their own GPU-friendly implementation

virtual void range_search(idx_t n, const float *x, float radius, RangeSearchResult *result, const SearchParameters *params = nullptr) const

Query n vectors of dimension d against the index.

Returns all vectors with distance < radius. Note that many indexes do not implement range_search (only the k-NN search is mandatory).

Parameters:

n – number of vectors
x – input vectors to search, size n * d
radius – search radius
result – result table

virtual size_t remove_ids(const IDSelector &sel): removes IDs from the index. Not supported by all indexes. Returns the number of elements removed.

virtual void reconstruct(idx_t key, float *recons) const

Reconstruct a stored vector (or an approximation if lossy coding)

this function may not be defined for some indexes

Parameters:

key – id of the vector to reconstruct
recons – reconstructed vector (size d)

virtual void reconstruct_batch(idx_t n, const idx_t *keys, float *recons) const

Reconstruct several stored vectors (or an approximation if lossy coding)

this function may not be defined for some indexes

Parameters:

n – number of vectors to reconstruct
keys – ids of the vectors to reconstruct (size n)
recons – reconstructed vectors (size n * d)

virtual void reconstruct_n(idx_t i0, idx_t ni, float *recons) const

Reconstruct vectors i0 to i0 + ni - 1

this function may not be defined for some indexes

Parameters:

i0 – index of the first vector in the sequence
ni – number of vectors in the sequence
recons – reconstructed vectors (size ni * d)

virtual DistanceComputer *get_distance_computer() const

Get a DistanceComputer (defined in AuxIndexStructures) object for this kind of index.

DistanceComputer is implemented for indexes that support random access of their vectors.

virtual size_t sa_code_size() const: size of the produced codes in bytes

virtual void sa_encode(idx_t n, const float *x, uint8_t *bytes) const

encode a set of vectors

Parameters:

n – number of vectors
x – input vectors, size n * d
bytes – output encoded vectors, size n * sa_code_size()

virtual void sa_decode(idx_t n, const uint8_t *bytes, float *x) const

decode a set of vectors

Parameters:

n – number of vectors
bytes – input encoded vectors, size n * sa_code_size()
x – output vectors, size n * d
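
The buffer sizes of the standalone codec functions can be illustrated with a toy round trip. This is a sketch under stated assumptions: `ToyCodec` is a hypothetical one-byte-per-component codec for values in [0, 1], not the Faiss ScalarQuantizer, and it only mirrors the shape of `sa_code_size`, `sa_encode` and `sa_decode`.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>

// Hypothetical codec for this sketch: one byte per component, with
// component values assumed to lie in [0, 1].
struct ToyCodec {
    std::size_t d;  // vector dimension

    // Size of one encoded vector in bytes.
    std::size_t sa_code_size() const { return d; }

    // x has size n * d; bytes has size n * sa_code_size().
    void sa_encode(std::size_t n, const float* x, std::uint8_t* bytes) const {
        assert(x != nullptr && bytes != nullptr);
        for (std::size_t i = 0; i < n * d; ++i) {
            float clamped = std::min(1.0f, std::max(0.0f, x[i]));
            bytes[i] = static_cast<std::uint8_t>(std::lround(clamped * 255.0f));
        }
    }

    // bytes has size n * sa_code_size(); x has size n * d.
    void sa_decode(std::size_t n, const std::uint8_t* bytes, float* x) const {
        assert(x != nullptr && bytes != nullptr);
        for (std::size_t i = 0; i < n * d; ++i) {
            x[i] = static_cast<float>(bytes[i]) / 255.0f;
        }
    }
};
```

Because the coding is lossy, decoding returns an approximation: in this toy codec each component is recovered to within 1/255 of its original value.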

virtual void merge_from(Index &otherIndex, idx_t add_id = 0): Moves the entries from another index into this one. On return, otherIndex is empty. add_id is added to all moved ids (for sequential ids, this would be this->ntotal)

virtual void check_compatible_for_merge(const Index &otherIndex) const: Checks that the two indexes are compatible (i.e., they are trained in the same way and have the same parameters). Throws otherwise.

virtual void add_sa_codes(idx_t n, const uint8_t *codes, const idx_t *xids)

Add vectors that are computed with the standalone codec

Parameters:

codes – codes to add, size n * sa_code_size()
xids – corresponding ids, size n

void train_q1(size_t n, const float *x, bool verbose, MetricType metric_type): Trains the quantizer and calls train_residual to train sub-quantizers.

size_t coarse_code_size() const: compute the number of bytes required to store list ids

void encode_listno(idx_t list_no, uint8_t *code) const

idx_t decode_listno(const uint8_t *code) const
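
The relationship between `coarse_code_size`, `encode_listno` and `decode_listno` can be sketched as follows. The listing above only states that the code size is the number of bytes required to store list ids; the sizing rule (enough bytes for `nlist - 1`) and the least-significant-byte-first order used here are assumptions of this sketch, and `ListCoder` is a hypothetical stand-in for the index state.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for the index state used by these helpers.
struct ListCoder {
    std::size_t nlist;  // number of inverted lists

    // Bytes needed to represent the largest list id, nlist - 1.
    std::size_t coarse_code_size() const {
        assert(nlist >= 1);
        std::size_t nl = nlist - 1;
        std::size_t nbyte = 0;
        while (nl > 0) {
            ++nbyte;
            nl >>= 8;
        }
        return nbyte;
    }

    // Writes list_no into code, least significant byte first.
    void encode_listno(std::int64_t list_no, std::uint8_t* code) const {
        assert(list_no >= 0 && static_cast<std::size_t>(list_no) < nlist);
        for (std::size_t i = 0; i < coarse_code_size(); ++i) {
            code[i] = static_cast<std::uint8_t>(list_no & 0xff);
            list_no >>= 8;
        }
    }

    // Reads back a list id written by encode_listno.
    std::int64_t decode_listno(const std::uint8_t* code) const {
        std::int64_t list_no = 0;
        for (std::size_t i = 0; i < coarse_code_size(); ++i) {
            list_no |= static_cast<std::int64_t>(code[i]) << (8 * i);
        }
        return list_no;
    }
};
```

With 1000 lists, for example, the largest id is 999, which needs two bytes; with 256 lists a single byte is enough.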

Public Members

faiss::ScalarQuantizer sq: Exposed like the CPU version.

bool by_residual: Exposed like the CPU version.

int d: vector dimension

idx_t ntotal: total nb of indexed vectors

bool verbose: verbosity level

bool is_trained: set if the Index does not require training, or if training is done already

MetricType metric_type: type of metric this index uses for search

float metric_arg: argument of the metric type

size_t nprobe = 1: number of probes at query time

size_t max_codes = 0: max nb of codes to visit to do a query

Index *quantizer = nullptr: quantizer that maps vectors to inverted lists

size_t nlist = 0: number of inverted lists

char quantizer_trains_alone = 0: Selects how the quantizer is trained.

0 – use the quantizer as index in a kmeans training
1 – just pass on the training set to the train() of the quantizer
2 – kmeans training on a flat index + add the centroids to the quantizer

bool own_fields = false: whether object owns the quantizer

ClusteringParameters cp: to override default clustering params

Index *clustering_index = nullptr: to override index used during clustering

Protected Functions

void verifySQSettings_() const: Validates index SQ parameters.

void trainResiduals_(idx_t n, const float *x): Called from train to handle SQ residual training.

void copyFrom(const faiss::Index *index): Copy what we need from the CPU equivalent.

void copyTo(faiss::Index *index) const: Copy what we have to the CPU equivalent.

int getCurrentNProbe_(const SearchParameters *params) const: From either the current set nprobe or the SearchParameters if available, return the nprobe that we should use for the current search

void verifyIVFSettings_() const

virtual bool addImplRequiresIDs_() const override: Does addImpl_ require IDs? If so, and no IDs are provided, we will generate them sequentially based on the order in which the vectors are added

virtual void trainQuantizer_(idx_t n, const float *x)

virtual void addImpl_(idx_t n, const float *x, const idx_t *ids) override: Called from GpuIndex for add/add_with_ids.

virtual void searchImpl_(idx_t n, const float *x, int k, float *distances, idx_t *labels, const SearchParameters *params) const override: Called from GpuIndex for search.

Protected Attributes

const GpuIndexIVFScalarQuantizerConfig ivfSQConfig_: Our configuration options.

size_t reserveMemoryVecs_: Desired inverted list memory reservation.

std::shared_ptr<IVFFlat> index_: Instance that we own; contains the inverted list.

const GpuIndexIVFConfig ivfConfig_: Our configuration options.

std::shared_ptr<IVFBase> baseIndex_: For a trained/initialized index, this is a reference to the base class.

std::shared_ptr<GpuResources> resources_: Manages streams, cuBLAS handles and scratch memory for devices.

const GpuIndexConfig config_: Our configuration options.

size_t minPagedSize_: Size above which we page copies from the CPU to GPU.

struct AllocInfo

#include <GpuResources.h>

Information on what/where an allocation is.

Subclassed by faiss::gpu::AllocRequest

Public Functions

inline AllocInfo()

inline AllocInfo(AllocType at, int dev, MemorySpace sp, cudaStream_t st)

std::string toString() const: Returns a string representation of this info.

Public Members

AllocType type = AllocType::Other: The internal category of the allocation.

int device = 0: The device on which the allocation is happening.

MemorySpace space = MemorySpace::Device: The memory space of the allocation.

cudaStream_t stream = nullptr

The stream on which new work on the memory will be ordered (e.g., if a piece of memory cached and to be returned for this call was last used on stream 3 and a new memory request is for stream 4, the memory manager will synchronize stream 4 to wait for the completion of stream 3 via events or other stream synchronization).

The memory manager guarantees that the returned memory is free to use without data races on the specified stream.

struct AllocRequest : public faiss::gpu::AllocInfo

#include <GpuResources.h>

Information on what/where an allocation is, along with how big it should be.

Public Functions

inline AllocRequest()

inline AllocRequest(const AllocInfo &info, size_t sz)

inline AllocRequest(AllocType at, int dev, MemorySpace sp, cudaStream_t st, size_t sz)

std::string toString() const: Returns a string representation of this request.

Public Members

size_t size = 0: The size in bytes of the allocation.

AllocType type = AllocType::Other: The internal category of the allocation.

int device = 0: The device on which the allocation is happening.

MemorySpace space = MemorySpace::Device: The memory space of the allocation.

cudaStream_t stream = nullptr

The stream on which new work on the memory will be ordered (e.g., if a piece of memory cached and to be returned for this call was last used on stream 3 and a new memory request is for stream 4, the memory manager will synchronize stream 4 to wait for the completion of stream 3 via events or other stream synchronization).

The memory manager guarantees that the returned memory is free to use without data races on the specified stream.

struct GpuMemoryReservation

#include <GpuResources.h>

A RAII object that manages a temporary memory request.

Public Functions

GpuMemoryReservation()

GpuMemoryReservation(GpuResources *r, int dev, cudaStream_t str, void *p, size_t sz)

GpuMemoryReservation(GpuMemoryReservation &&m) noexcept

~GpuMemoryReservation()

GpuMemoryReservation &operator=(GpuMemoryReservation &&m)

inline void *get()

void release()

Public Members

GpuResources *res

int device

cudaStream_t stream

void *data

size_t size
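
The RAII pattern behind such a reservation can be sketched on the host as a move-only handle that returns its memory to the provider when it goes out of scope. This is an illustration only: `ToyResources` and `ToyReservation` are hypothetical names, the memory comes from `malloc` rather than a GPU, and the device and stream members are omitted.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <utility>

// Hypothetical resource provider backed by host malloc, standing in
// for a GpuResources object in this sketch.
struct ToyResources {
    int outstanding = 0;  // number of allocations not yet returned
    void* alloc(std::size_t sz) { ++outstanding; return std::malloc(sz); }
    void dealloc(void* p) { --outstanding; std::free(p); }
};

// Move-only handle: returns its memory to the provider on destruction.
class ToyReservation {
public:
    ToyReservation() = default;
    ToyReservation(ToyResources* r, void* p, std::size_t sz)
            : res_(r), data_(p), size_(sz) {
        assert(r != nullptr);
    }
    ToyReservation(ToyReservation&& m) noexcept { swap(m); }
    ToyReservation& operator=(ToyReservation&& m) noexcept {
        if (this != &m) {
            release();  // give back what we currently hold
            swap(m);    // then take over m's memory, leaving m empty
        }
        return *this;
    }
    ToyReservation(const ToyReservation&) = delete;
    ToyReservation& operator=(const ToyReservation&) = delete;
    ~ToyReservation() { release(); }

    void* get() { return data_; }

    // Returns the memory early; safe to call more than once.
    void release() {
        if (res_ && data_) {
            res_->dealloc(data_);
        }
        res_ = nullptr;
        data_ = nullptr;
        size_ = 0;
    }

private:
    void swap(ToyReservation& m) noexcept {
        std::swap(res_, m.res_);
        std::swap(data_, m.data_);
        std::swap(size_, m.size_);
    }

    ToyResources* res_ = nullptr;
    void* data_ = nullptr;
    std::size_t size_ = 0;
};
```

Deleting the copy operations ensures that exactly one handle owns each allocation, so the memory is returned exactly once.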

class GpuResources

#include <GpuResources.h>

Base class of the GPU-side resource provider; hides the provision of cuBLAS handles, CUDA streams, and all device memory allocation.

Subclassed by faiss::gpu::StandardGpuResourcesImpl

Public Functions

virtual ~GpuResources()

virtual void initializeForDevice(int device) = 0: Call to pre-allocate resources for a particular device. If this is not called, resources are allocated on first demand.

virtual bool supportsBFloat16(int device) = 0: Does the given GPU support bfloat16?

virtual cublasHandle_t getBlasHandle(int device) = 0: Returns the cuBLAS handle that we use for the given device.

virtual cudaStream_t getDefaultStream(int device) = 0: Returns the stream that we order all computation on for the given device

virtual void setDefaultStream(int device, cudaStream_t stream) = 0: Overrides the default stream for a device to the user-supplied stream. The resources object does not own this stream (i.e., it will not destroy it).

virtual std::vector<cudaStream_t> getAlternateStreams(int device) = 0: Returns the set of alternative streams that we use for the given device.

virtual void *allocMemory(const AllocRequest &req) = 0

Memory management

Returns an allocation from the given memory space, ordered with respect to the given stream (i.e., the first user will be a kernel in this stream). All allocations are sized internally to be the next highest multiple of 16 bytes, and all allocations returned are guaranteed to be 16 byte aligned.
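
The sizing and alignment rule can be expressed in a few lines. This is a sketch, assuming that a request that is already a multiple of 16 bytes is left unchanged; `paddedSize`, `isAligned16` and `checkAllocation` are hypothetical helper names, not part of the interface above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Rounds a requested size up to a multiple of 16 bytes.
constexpr std::size_t paddedSize(std::size_t requested) {
    return (requested + 15) / 16 * 16;
}

// True if the pointer lies on a 16-byte boundary.
inline bool isAligned16(const void* p) {
    return reinterpret_cast<std::uintptr_t>(p) % 16 == 0;
}

// Checks the two properties a returned allocation is documented to have.
inline void checkAllocation(const void* p, std::size_t internalSize) {
    assert(isAligned16(p));
    assert(internalSize % 16 == 0);
}

static_assert(paddedSize(1) == 16, "1 byte occupies one 16-byte unit");
static_assert(paddedSize(16) == 16, "exact multiples are unchanged");
static_assert(paddedSize(17) == 32, "17 bytes spill into a second unit");
```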

virtual void deallocMemory(int device, void *in) = 0: Returns a previous allocation.

virtual size_t getTempMemoryAvailable(int device) const = 0: For MemorySpace::Temporary, how much space is immediately available without cudaMalloc allocation?

virtual std::pair<void*, size_t> getPinnedMemory() = 0: Returns the available CPU pinned memory buffer.

virtual cudaStream_t getAsyncCopyStream(int device) = 0: Returns the stream on which we perform async CPU <-> GPU copies.

bool supportsBFloat16CurrentDevice()

Does the current GPU support bfloat16?

Functions provided by default

cublasHandle_t getBlasHandleCurrentDevice(): Calls getBlasHandle with the current device.

cudaStream_t getDefaultStreamCurrentDevice(): Calls getDefaultStream with the current device.

size_t getTempMemoryAvailableCurrentDevice() const: Calls getTempMemoryAvailable with the current device.

GpuMemoryReservation allocMemoryHandle(const AllocRequest &req): Returns a temporary memory allocation via a RAII object.

void syncDefaultStream(int device): Synchronizes the CPU with respect to the default stream for the given device

void syncDefaultStreamCurrentDevice(): Calls syncDefaultStream for the current device.

std::vector<cudaStream_t> getAlternateStreamsCurrentDevice(): Calls getAlternateStreams for the current device.

cudaStream_t getAsyncCopyStreamCurrentDevice(): Calls getAsyncCopyStream for the current device.

class GpuResourcesProvider

#include <GpuResources.h>

Interface for a provider of a shared resources object. This is to avoid interfacing std::shared_ptr to Python

Subclassed by faiss::gpu::GpuResourcesProviderFromInstance, faiss::gpu::StandardGpuResources

Public Functions

virtual ~GpuResourcesProvider()

virtual std::shared_ptr<GpuResources> getResources() = 0: Returns the shared resources object.

class GpuResourcesProviderFromInstance : public faiss::gpu::GpuResourcesProvider

#include <GpuResources.h>

A simple wrapper for a GpuResources object to make a GpuResourcesProvider out of it again

Public Functions

explicit GpuResourcesProviderFromInstance(std::shared_ptr<GpuResources> p)

~GpuResourcesProviderFromInstance() override

virtual std::shared_ptr<GpuResources> getResources() override: Returns the shared resources object.

Private Members

std::shared_ptr<GpuResources> res_

template<typename GpuIndex> struct IndexWrapper

Public Functions

IndexWrapper(int numGpus, std::function<std::unique_ptr<GpuIndex>(GpuResourcesProvider*, int)> init)

faiss::Index *getIndex()

void runOnIndices(std::function<void(GpuIndex*)> f)

void setNumProbes(size_t nprobe)

Public Members

std::vector<std::unique_ptr<faiss::gpu::StandardGpuResources>> resources

std::vector<std::unique_ptr<GpuIndex>> subIndex

std::unique_ptr<faiss::IndexReplicas> replicaIndex

class StandardGpuResourcesImpl : public faiss::gpu::GpuResources

#include <StandardGpuResources.h>

Standard implementation of the GpuResources object that provides for a temporary memory manager

Public Functions

StandardGpuResourcesImpl()

~StandardGpuResourcesImpl() override

virtual bool supportsBFloat16(int device) override: Does the given GPU support bfloat16?

void noTempMemory(): Disable allocation of temporary memory; all temporary memory requests will call cudaMalloc / cudaFree at the point of use

void setTempMemory(size_t size): Specify that we wish to use a certain fixed size of memory on all devices as temporary memory. This is the upper bound for the GPU memory that we will reserve. We will never go above 1.5 GiB on any GPU; smaller GPUs (with <= 4 GiB or <= 8 GiB) will use less memory than that. To avoid any temporary memory allocation, pass 0.

void setPinnedMemory(size_t size): Set amount of pinned memory to allocate, for async GPU <-> CPU transfers

virtual void setDefaultStream(int device, cudaStream_t stream) override: Called to change the stream for work ordering. We do not own stream; i.e., it will not be destroyed when the GpuResources object gets cleaned up. We are guaranteed that all Faiss GPU work is ordered with respect to this stream upon exit from an index or other Faiss GPU call.

void revertDefaultStream(int device): Revert the default stream to the original stream managed by this resources object, in case someone called setDefaultStream.

virtual cudaStream_t getDefaultStream(int device) override: Returns the stream for the given device on which all Faiss GPU work is ordered. We are guaranteed that all Faiss GPU work is ordered with respect to this stream upon exit from an index or other Faiss GPU call.

void setDefaultNullStreamAllDevices(): Called to change the work ordering streams to the null stream for all devices

void setLogMemoryAllocations(bool enable): If enabled, will print every GPU memory allocation and deallocation to standard output

virtual void initializeForDevice(int device) override

Internal system calls.

Initialize resources for this device

virtual cublasHandle_t getBlasHandle(int device) override: Returns the cuBLAS handle that we use for the given device.

virtual std::vector<cudaStream_t> getAlternateStreams(int device) override: Returns the set of alternative streams that we use for the given device.

virtual void *allocMemory(const AllocRequest &req) override: Allocate non-temporary GPU memory.

virtual void deallocMemory(int device, void *in) override: Returns a previous allocation.

virtual size_t getTempMemoryAvailable(int device) const override: For MemorySpace::Temporary, how much space is immediately available without cudaMalloc allocation?

std::map<int, std::map<std::string, std::pair<int, size_t>>> getMemoryInfo() const: Export a description of memory used for Python.

virtual std::pair<void*, size_t> getPinnedMemory() override: Returns the available CPU pinned memory buffer.

virtual cudaStream_t getAsyncCopyStream(int device) override: Returns the stream on which we perform async CPU <-> GPU copies.

bool supportsBFloat16CurrentDevice()

Does the current GPU support bfloat16?

Functions provided by default

cublasHandle_t getBlasHandleCurrentDevice(): Calls getBlasHandle with the current device.

cudaStream_t getDefaultStreamCurrentDevice(): Calls getDefaultStream with the current device.

size_t getTempMemoryAvailableCurrentDevice() const: Calls getTempMemoryAvailable with the current device.

GpuMemoryReservation allocMemoryHandle(const AllocRequest &req): Returns a temporary memory allocation via a RAII object.

void syncDefaultStream(int device): Synchronizes the CPU with respect to the default stream for the given device

void syncDefaultStreamCurrentDevice(): Calls syncDefaultStream for the current device.

std::vector<cudaStream_t> getAlternateStreamsCurrentDevice(): Calls getAlternateStreams for the current device.

cudaStream_t getAsyncCopyStreamCurrentDevice(): Calls getAsyncCopyStream for the current device.

Protected Functions

bool isInitialized(int device) const: Have GPU resources been initialized for this device yet?

Protected Attributes

std::unordered_map<int, std::unordered_map<void*, AllocRequest>> allocs_: Set of currently outstanding memory allocations per device: device -> (allocated ptr -> alloc request)

std::unordered_map<int, std::unique_ptr<StackDeviceMemory>> tempMemory_: Temporary memory provider, per each device.

std::unordered_map<int, cudaStream_t> defaultStreams_: Our default stream that work is ordered on, one per each device.

std::unordered_map<int, cudaStream_t> userDefaultStreams_: This contains particular streams as set by the user for ordering, if any

std::unordered_map<int, std::vector<cudaStream_t>> alternateStreams_: Other streams we can use, per each device.

std::unordered_map<int, cudaStream_t> asyncCopyStreams_: Async copy stream to use for GPU <-> CPU pinned memory copies.

std::unordered_map<int, cublasHandle_t> blasHandles_: cuBLAS handle for each device

void *pinnedMemAlloc_: Pinned memory allocation for use with this GPU.

size_t pinnedMemAllocSize_

size_t tempMemSize_: Amount of temporary memory to use on all devices, when a fixed size has been specified.

size_t pinnedMemSize_: Amount of pinned memory we should allocate.

bool allocLogging_: Whether or not we log every GPU memory allocation and deallocation.

Protected Static Functions

static size_t getDefaultTempMemForGPU(int device, size_t requested): Adjust the default temporary memory allocation based on the total GPU memory size

class StandardGpuResources : public faiss::gpu::GpuResourcesProvider

#include <StandardGpuResources.h>

Default implementation of GpuResources that allocates a cuBLAS handle and 2 streams for use, as well as temporary memory. Internally, the Faiss GPU code uses the instance managed by getResources, but this is the user-facing object that is internally reference counted.

Public Functions

StandardGpuResources()

~StandardGpuResources() override

virtual std::shared_ptr<GpuResources> getResources() override: Returns the shared resources object.

bool supportsBFloat16(int device): Whether or not the given device supports native bfloat16 arithmetic.

bool supportsBFloat16CurrentDevice(): Whether or not the current device supports native bfloat16 arithmetic.

void noTempMemory(): Disable allocation of temporary memory; all temporary memory requests will call cudaMalloc / cudaFree at the point of use

void setTempMemory(size_t size): Specify that we wish to use a certain fixed size of memory on all devices as temporary memory. This is the upper bound for the GPU memory that we will reserve. We will never go above 1.5 GiB on any GPU; smaller GPUs (with <= 4 GiB or <= 8 GiB) will use less memory than that. To avoid any temporary memory allocation, pass 0.

void setPinnedMemory(size_t size): Set amount of pinned memory to allocate, for async GPU <-> CPU transfers

void setDefaultStream(int device, cudaStream_t stream): Called to change the stream for work ordering. We do not own stream; i.e., it will not be destroyed when the GpuResources object gets cleaned up. We are guaranteed that all Faiss GPU work is ordered with respect to this stream upon exit from an index or other Faiss GPU call.

void revertDefaultStream(int device): Revert the default stream to the original stream managed by this resources object, in case someone called setDefaultStream.

void setDefaultNullStreamAllDevices(): Called to change the work ordering streams to the null stream for all devices

std::map<int, std::map<std::string, std::pair<int, size_t>>> getMemoryInfo() const: Export a description of memory used for Python.

cudaStream_t getDefaultStream(int device): Returns the current default stream.

size_t getTempMemoryAvailable(int device) const: Returns the current amount of temp memory available.

void syncDefaultStreamCurrentDevice(): Synchronize our default stream with the CPU.

void setLogMemoryAllocations(bool enable): If enabled, will print every GPU memory allocation and deallocation to standard output

Private Members

std::shared_ptr<StandardGpuResourcesImpl> res_

class DeviceScope

#include <DeviceUtils.h>

RAII object to set the current device, and restore the previous device upon destruction

Public Functions

explicit DeviceScope(int device)

~DeviceScope()

Private Members

int prevDevice_
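
The set-and-restore behaviour can be sketched without a GPU by mocking the current device. In this illustration `getCurrentDevice` and `setCurrentDevice` are hypothetical stand-ins for the CUDA runtime calls `cudaGetDevice` and `cudaSetDevice`, and `ToyDeviceScope` is a hypothetical name.

```cpp
#include <cassert>

// Hypothetical stand-ins for the CUDA runtime's notion of a current
// device; a real implementation would call cudaGetDevice / cudaSetDevice.
inline int& currentDeviceRef() {
    static int device = 0;
    return device;
}
inline int getCurrentDevice() { return currentDeviceRef(); }
inline void setCurrentDevice(int device) { currentDeviceRef() = device; }

// Switches to the requested device and restores the previous one on
// destruction.
class ToyDeviceScope {
public:
    explicit ToyDeviceScope(int device) : prevDevice_(getCurrentDevice()) {
        assert(device >= 0);
        setCurrentDevice(device);
    }
    ~ToyDeviceScope() { setCurrentDevice(prevDevice_); }

    ToyDeviceScope(const ToyDeviceScope&) = delete;
    ToyDeviceScope& operator=(const ToyDeviceScope&) = delete;

private:
    int prevDevice_;  // device that was current before this scope began
};
```

Because restoration happens in the destructor, nested scopes unwind in the right order and the previous device is restored even when a scope is left through an exception.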

class CublasHandleScope

#include <DeviceUtils.h>

RAII object to manage a cublasHandle_t.

Public Functions

CublasHandleScope()

~CublasHandleScope()

inline cublasHandle_t get()

Private Members

cublasHandle_t blasHandle_

class CudaEvent

Public Functions

explicit CudaEvent(cudaStream_t stream, bool timer = false): Creates an event and records it in this stream.

CudaEvent(const CudaEvent &event) = delete

CudaEvent(CudaEvent &&event) noexcept

~CudaEvent()

inline cudaEvent_t get()

void streamWaitOnEvent(cudaStream_t stream): Wait on this event in this stream.

void cpuWaitOnEvent(): Have the CPU wait for the completion of this event.

CudaEvent &operator=(CudaEvent &&event) noexcept

CudaEvent &operator=(CudaEvent &event) = delete

Private Members

cudaEvent_t event_

class StackDeviceMemory

#include <StackDeviceMemory.h>

Device memory manager that provides temporary memory allocations out of a region of memory, for a single device

Public Functions

StackDeviceMemory(GpuResources *res, int device, size_t allocPerDevice): Allocate a new region of memory that we manage.

StackDeviceMemory(int device, void *p, size_t size, bool isOwner): Manage a region of memory for a particular device, with or without ownership

~StackDeviceMemory()

int getDevice() const

void *allocMemory(cudaStream_t stream, size_t size): All allocations requested should be a multiple of 16 bytes.

void deallocMemory(int device, cudaStream_t, size_t size, void *p)

size_t getSizeAvailable() const

std::string toString() const

Protected Attributes

int device_: Our device.

Stack stack_: Memory stack.

struct Range

#include <StackDeviceMemory.h>

Previous allocation ranges and the streams for which synchronization is required

Public Functions

inline Range(char *s, char *e, cudaStream_t str)

Public Members

char *start_

char *end_

cudaStream_t stream_

struct Stack

Public Functions

Stack(GpuResources *res, int device, size_t size): Constructor that allocates memory via cudaMalloc.

~Stack()

size_t getSizeAvailable() const: Returns how much size is available for an allocation without calling cudaMalloc

char *getAlloc(size_t size, cudaStream_t stream): Obtains an allocation; all allocations are guaranteed to be 16 byte aligned

void returnAlloc(char *p, size_t size, cudaStream_t stream): Returns an allocation.

std::string toString() const: Returns the stack state.

Public Members

GpuResources *res_: Our GpuResources object.

int device_: Device this allocation is on.

char *alloc_: Where our temporary memory buffer is allocated; we allocate starting 16 bytes into this

size_t allocSize_: Total size of our allocation.

char *start_: Our temporary memory region; [start_, end_) is valid.

char *end_

char *head_: Stack head within [start_, end_)

std::list<Range> lastUsers_: List of previous last users of allocations on our stack, for possible synchronization purposes

size_t highWaterMemoryUsed_: What’s the high water mark in terms of memory used from the temporary buffer?
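
The stack discipline described by these members can be sketched on the host as a bump allocator. This is a simplified illustration, not the library's implementation: `ToyStack` is a hypothetical name, it manages host memory, it ignores streams and the `lastUsers_` bookkeeping, and it returns `nullptr` when a request does not fit instead of falling back to another allocator.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Host-memory sketch of a stack (bump) allocator with 16-byte alignment.
class ToyStack {
public:
    explicit ToyStack(std::size_t size) : buffer_(size + 16) {
        // Align the usable region to 16 bytes inside the raw buffer.
        auto raw = reinterpret_cast<std::uintptr_t>(buffer_.data());
        start_ = buffer_.data() + (16 - raw % 16) % 16;
        end_ = start_ + size / 16 * 16;
        head_ = start_;
    }
    ToyStack(const ToyStack&) = delete;
    ToyStack& operator=(const ToyStack&) = delete;

    // Bytes that can still be handed out from the region.
    std::size_t getSizeAvailable() const {
        return static_cast<std::size_t>(end_ - head_);
    }

    // Returns nullptr when the request does not fit. Sizes must be a
    // multiple of 16 bytes so that every allocation stays aligned.
    char* getAlloc(std::size_t size) {
        assert(size % 16 == 0);
        if (size > getSizeAvailable()) {
            return nullptr;
        }
        char* p = head_;
        head_ += size;
        highWater_ = std::max(
                highWater_, static_cast<std::size_t>(head_ - start_));
        return p;
    }

    // Allocations must be returned in reverse order of acquisition.
    void returnAlloc(char* p, std::size_t size) {
        assert(p + size == head_);
        head_ = p;
    }

    // Largest number of bytes that were in use at any one time.
    std::size_t highWaterMemoryUsed() const { return highWater_; }

private:
    std::vector<char> buffer_;  // raw backing storage
    char* start_ = nullptr;     // [start_, end_) is the usable region
    char* end_ = nullptr;
    char* head_ = nullptr;      // next free byte within [start_, end_)
    std::size_t highWater_ = 0;
};
```

Allocation and deallocation only move `head_`, which is what makes a stack region attractive for short-lived temporary buffers.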

class KernelTimer

#include <Timer.h>

Utility class for timing execution of a kernel.

Public Functions

KernelTimer(cudaStream_t stream = nullptr): Constructor starts the timer and adds an event into the current device stream

~KernelTimer(): Destructor releases event resources.

float elapsedMilliseconds(): Adds a stop event then synchronizes on the stop event to get the actual GPU-side kernel timings for any kernels launched in the current stream. Returns the number of milliseconds elapsed. Can only be called once.

Private Members

cudaEvent_t startEvent_

cudaEvent_t stopEvent_

cudaStream_t stream_

bool valid_

class CpuTimer

#include <Timer.h>

CPU wallclock elapsed timer.

Public Functions

CpuTimer(): Creates and starts a new timer.

float elapsedMilliseconds(): Returns elapsed time in milliseconds.

Private Members

std::chrono::time_point<std::chrono::steady_clock> start_
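
A wall-clock timer with this interface can be built on `std::chrono::steady_clock`. The following is a minimal sketch under that assumption; `ToyCpuTimer` is a hypothetical name and the body is illustrative rather than the library's code.

```cpp
#include <cassert>
#include <chrono>

// Starts timing on construction and reports elapsed wall-clock time.
class ToyCpuTimer {
public:
    ToyCpuTimer() : start_(std::chrono::steady_clock::now()) {}

    // Milliseconds elapsed since construction; may be called repeatedly.
    float elapsedMilliseconds() const {
        auto end = std::chrono::steady_clock::now();
        std::chrono::duration<float, std::milli> elapsed = end - start_;
        assert(elapsed.count() >= 0.0f);
        return elapsed.count();
    }

private:
    std::chrono::time_point<std::chrono::steady_clock> start_;
};
```

A steady clock is monotonic, so the reported duration is never negative even if the system time is adjusted while the timer is running.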

namespace utils

Functions

template<typename U, typename V> constexpr __host__ __device__ auto divDown(U a, V b) -> decltype(a + b)

template<typename U, typename V> constexpr __host__ __device__ auto divUp(U a, V b) -> decltype(a + b)

template<typename U, typename V> constexpr __host__ __device__ auto roundDown(U a, V b) -> decltype(a + b)

template<typename U, typename V> constexpr __host__ __device__ auto roundUp(U a, V b) -> decltype(a + b)

template<class T> constexpr __host__ __device__ T pow(T n, T power)

template<class T> constexpr __host__ __device__ T pow2(T n)

template<typename T> constexpr __host__ __device__ int log2(T n, int p = 0)

template<typename T> constexpr __host__ __device__ bool isPowerOf2(T v)

template<typename T> constexpr __host__ __device__ T nextHighestPowerOf2(T v)
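
The listing gives only signatures, so the following host-only sketch shows one plausible set of bodies for positive integer arguments. The bodies are assumptions, the `__host__ __device__` qualifiers are dropped so that an ordinary C++ compiler accepts the code, and the enclosing `sketch` namespace is a hypothetical name. In particular, this sketch treats "next highest" as strictly greater, so a power of two maps to the following power of two.

```cpp
namespace sketch {

// Integer division rounding toward zero.
template <typename U, typename V>
constexpr auto divDown(U a, V b) -> decltype(a + b) {
    return a / b;
}

// Integer division rounding up (for positive a and b).
template <typename U, typename V>
constexpr auto divUp(U a, V b) -> decltype(a + b) {
    return (a + b - 1) / b;
}

// Largest multiple of b that is <= a.
template <typename U, typename V>
constexpr auto roundDown(U a, V b) -> decltype(a + b) {
    return divDown(a, b) * b;
}

// Smallest multiple of b that is >= a.
template <typename U, typename V>
constexpr auto roundUp(U a, V b) -> decltype(a + b) {
    return divUp(a, b) * b;
}

// n raised to a non-negative integer power.
template <typename T>
constexpr T pow(T n, T power) {
    return power > 0 ? n * pow(n, power - 1) : 1;
}

// 2 raised to the power n.
template <typename T>
constexpr T pow2(T n) {
    return pow(static_cast<T>(2), n);
}

// Floor of the base-2 logarithm; p accumulates the result.
template <typename T>
constexpr int log2(T n, int p = 0) {
    return n <= 1 ? p : log2(n / 2, p + 1);
}

template <typename T>
constexpr bool isPowerOf2(T v) {
    return v && !(v & (v - 1));
}

// Smallest power of two strictly greater than v (assumption of this sketch).
template <typename T>
constexpr T nextHighestPowerOf2(T v) {
    return isPowerOf2(v) ? static_cast<T>(2) * v
                         : static_cast<T>(1) << (log2(v) + 1);
}

static_assert(divDown(10, 3) == 3, "10 / 3 rounds down to 3");
static_assert(divUp(10, 3) == 4, "10 / 3 rounds up to 4");
static_assert(roundDown(10, 4) == 8, "largest multiple of 4 below 10");
static_assert(roundUp(10, 4) == 12, "smallest multiple of 4 above 10");
static_assert(pow2(5) == 32, "2^5");
static_assert(log2(9) == 3, "floor(log2(9))");
static_assert(isPowerOf2(64) && !isPowerOf2(0), "power-of-two test");
static_assert(nextHighestPowerOf2(3) == 4, "3 -> 4");
static_assert(nextHighestPowerOf2(4) == 8, "4 -> 8 in this sketch");

}  // namespace sketch
```

Helpers of this kind are typically used to size kernel launches and buffers, for example `divUp(n, blockSize)` for the number of blocks needed to cover `n` elements.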