Namespace faiss::gpu
-
namespace gpu
Enums
-
enum class DistanceDataType
Values:
-
enumerator F32
-
enumerator F16
-
enum class IndicesDataType
Values:
-
enumerator I64
-
enumerator I32
-
enum class graph_build_algo
Values:
-
enumerator IVF_PQ
Use IVF-PQ to build all-neighbors knn graph.
-
enumerator NN_DESCENT
Experimental, use NN-Descent to build all-neighbors knn graph.
-
enum class codebook_gen
A type for specifying how PQ codebooks are created.
Values:
-
enumerator PER_SUBSPACE
-
enumerator PER_CLUSTER
-
enum class search_algo
Values:
-
enumerator SINGLE_CTA
For large batch sizes.
-
enumerator MULTI_CTA
For small batch sizes.
-
enumerator MULTI_KERNEL
-
enumerator AUTO
-
enum class hash_mode
Values:
-
enumerator HASH
-
enumerator SMALL
-
enumerator AUTO
-
enum IndicesOptions
How user vector index data is stored on the GPU.
Values:
-
enumerator INDICES_CPU
The user indices are only stored on the CPU; the GPU returns (inverted list, offset) to the CPU which is then translated to the real user index.
-
enumerator INDICES_IVF
The indices are not stored at all, on either the CPU or GPU. Only (inverted list, offset) is returned to the user as the index.
-
enumerator INDICES_32_BIT
Indices are stored as 32 bit integers on the GPU, but returned as 64 bit integers
-
enumerator INDICES_64_BIT
Indices are stored as 64 bit integers on the GPU.
-
enum AllocType
Values:
-
enumerator Other
Unknown allocation type or miscellaneous (not currently categorized)
-
enumerator FlatData
Primary data storage for GpuIndexFlat (the raw matrix of vectors and vector norms if needed)
-
enumerator IVFLists
Primary data storage for GpuIndexIVF* (the storage for each individual IVF list)
-
enumerator Quantizer
Quantizer (PQ, SQ) dictionary information.
-
enumerator QuantizerPrecomputedCodes
For GpuIndexIVFPQ, “precomputed codes” for more efficient PQ lookup require the use of possibly large tables. These are marked separately from Quantizer as these can frequently be 100s - 1000s of MiB in size
-
enumerator TemporaryMemoryBuffer
StandardGpuResources implementation-specific type. When using StandardGpuResources, temporary memory allocations (MemorySpace::Temporary) come out of a stack region of memory that is allocated up front for each GPU (e.g., 1.5 GiB upon initialization). This allocation by StandardGpuResources is marked with this AllocType.
-
enumerator TemporaryMemoryOverflow
When using StandardGpuResources, any MemorySpace::Temporary allocations that cannot be satisfied within the TemporaryMemoryBuffer region fall back to calling cudaMalloc which are sized to just the request at hand. These “overflow” temporary allocations are marked with this AllocType.
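The relationship between TemporaryMemoryBuffer and TemporaryMemoryOverflow can be pictured with a small host-side model. This is only a sketch under stated assumptions, not the StandardGpuResources implementation: the `TempStackModel` type, the `Served` enum, and all method names are hypothetical, and the model tracks byte counts only without touching CUDA memory.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical tag describing where a temporary request was satisfied.
enum class Served { StackBuffer, Overflow };

// Hypothetical host-side model of a stack-style temporary region.
// It only tracks byte counts and never calls CUDA.
class TempStackModel {
public:
    explicit TempStackModel(std::size_t capacity) : capacity_(capacity) {}

    // Request `bytes` of temporary memory. If the request fits in the
    // remaining stack region it is served from there; otherwise it is
    // counted as an overflow allocation sized to just this request.
    Served alloc(std::size_t bytes) {
        if (bytes <= capacity_ - used_) {
            used_ += bytes;
            live_.push_back(bytes);
            return Served::StackBuffer;
        }
        overflowBytes_ += bytes;
        return Served::Overflow;
    }

    // Release the most recent stack allocation (LIFO order).
    void popStack() {
        if (!live_.empty()) {
            used_ -= live_.back();
            live_.pop_back();
        }
    }

    std::size_t used() const { return used_; }
    std::size_t overflowBytes() const { return overflowBytes_; }

private:
    std::size_t capacity_;
    std::size_t used_ = 0;
    std::size_t overflowBytes_ = 0;
    std::vector<std::size_t> live_;  // sizes of live stack allocations
};
```

In this model a request larger than the remaining stack space does not fail; it is simply accounted for separately, which mirrors why overflow allocations carry their own AllocType.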
-
enum MemorySpace
Memory regions accessible to the GPU.
Values:
-
enumerator Temporary
Temporary device memory (guaranteed to no longer be used upon exit of a top-level index call, and where the streams using it have completed GPU work). Typically backed by Device memory (cudaMalloc/cudaFree).
-
enumerator Device
Managed using cudaMalloc/cudaFree (typical GPU device memory)
-
enumerator Unified
Managed using cudaMallocManaged/cudaFree (typical Unified CPU/GPU memory)
Functions
-
faiss::Index *index_gpu_to_cpu(const faiss::Index *gpu_index)
converts any GPU index inside gpu_index to a CPU index
-
faiss::Index *index_cpu_to_gpu(GpuResourcesProvider *provider, int device, const faiss::Index *index, const GpuClonerOptions *options = nullptr)
converts any CPU index that can be converted to GPU
-
faiss::Index *index_cpu_to_gpu_multiple(std::vector<GpuResourcesProvider*> &provider, std::vector<int> &devices, const faiss::Index *index, const GpuMultipleClonerOptions *options = nullptr)
-
faiss::IndexBinary *index_binary_gpu_to_cpu(const faiss::IndexBinary *gpu_index)
-
faiss::IndexBinary *index_binary_cpu_to_gpu(GpuResourcesProvider *provider, int device, const faiss::IndexBinary *index, const GpuClonerOptions *options = nullptr)
converts any CPU index that can be converted to GPU
-
faiss::IndexBinary *index_binary_cpu_to_gpu_multiple(std::vector<GpuResourcesProvider*> &provider, std::vector<int> &devices, const faiss::IndexBinary *index, const GpuMultipleClonerOptions *options = nullptr)
-
bool should_use_raft(GpuDistanceParams args)
A function that determines whether RAFT should be used based on various conditions (such as unsupported architecture)
-
void bfKnn(GpuResourcesProvider *resources, const GpuDistanceParams &args)
A wrapper for gpu/impl/Distance.cuh to expose direct brute-force k-nearest neighbor searches on an externally-provided region of memory (e.g., from a pytorch tensor). The data (vectors, queries, outDistances, outIndices) can be resident on the GPU or the CPU, but all calculations are performed on the GPU. If the result buffers are on the CPU, results will be copied back when done.
All GPU computation is performed on the current CUDA device, and ordered with respect to resources->getDefaultStreamCurrentDevice().
For each vector in queries, searches all of vectors to find its k nearest neighbors with respect to the given metric.
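What a brute-force k-nearest neighbor search computes can be illustrated with a plain CPU reference. The sketch below is not the GPU implementation behind bfKnn; the function name `bruteForceKnnL2Ref` is hypothetical, and it assumes row-major float data, squared L2 distance, and 0 < k <= numVectors.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// CPU reference for a brute-force k-NN search with squared L2 distance.
// vectors: numVectors x dims, queries: numQueries x dims, both row-major.
// outDistances / outIndices: numQueries x k, with k innermost.
// Assumes 0 < k <= numVectors.
void bruteForceKnnL2Ref(const std::vector<float>& vectors,
                        std::size_t numVectors,
                        const std::vector<float>& queries,
                        std::size_t numQueries,
                        std::size_t dims,
                        std::size_t k,
                        std::vector<float>& outDistances,
                        std::vector<std::int64_t>& outIndices) {
    outDistances.assign(numQueries * k, 0.0f);
    outIndices.assign(numQueries * k, -1);
    std::vector<std::pair<float, std::int64_t>> cand(numVectors);
    for (std::size_t q = 0; q < numQueries; ++q) {
        // Distance from query q to every database vector.
        for (std::size_t v = 0; v < numVectors; ++v) {
            float dist = 0.0f;
            for (std::size_t d = 0; d < dims; ++d) {
                float diff = queries[q * dims + d] - vectors[v * dims + d];
                dist += diff * diff;
            }
            cand[v] = {dist, static_cast<std::int64_t>(v)};
        }
        // Keep the k smallest distances, in ascending order.
        std::partial_sort(cand.begin(),
                          cand.begin() + static_cast<std::ptrdiff_t>(k),
                          cand.end());
        for (std::size_t r = 0; r < k; ++r) {
            outDistances[q * k + r] = cand[r].first;
            outIndices[q * k + r] = cand[r].second;
        }
    }
}
```

A reference like this is mainly useful for checking GPU results on small inputs; its cost grows with numQueries x numVectors x dims.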
-
void bfKnn_tiling(GpuResourcesProvider *resources, const GpuDistanceParams &args, size_t vectorsMemoryLimit, size_t queriesMemoryLimit)
-
void bruteForceKnn(GpuResourcesProvider *resources, faiss::MetricType metric, const float *vectors, bool vectorsRowMajor, idx_t numVectors, const float *queries, bool queriesRowMajor, idx_t numQueries, int dims, int k, float *outDistances, idx_t *outIndices)
Deprecated legacy implementation.
-
bool should_use_raft(GpuIndexConfig config_)
A centralized function that determines whether RAFT should be used based on various conditions (such as unsupported architecture)
-
GpuIndex *tryCastGpuIndex(faiss::Index *index)
If the given index is a GPU index, this returns the index instance.
-
bool isGpuIndexImplemented(faiss::Index *index)
Does the given CPU index instance have a corresponding GPU implementation?
-
std::string memorySpaceToString(MemorySpace s)
Convert a MemorySpace to string.
-
AllocInfo makeDevAlloc(AllocType at, cudaStream_t st)
Create an AllocInfo for the current device with MemorySpace::Device.
-
AllocInfo makeTempAlloc(AllocType at, cudaStream_t st)
Create an AllocInfo for the current device with MemorySpace::Temporary.
-
AllocInfo makeSpaceAlloc(AllocType at, MemorySpace sp, cudaStream_t st)
Create an AllocInfo for the current device.
-
int getMaxKSelection()
A collection of various utility functions for index implementation.
Returns the maximum k-selection value supported based on the CUDA SDK that we were compiled with. .cu files can use DeviceDefs.cuh, but this is for non-CUDA files
-
void validateKSelect(int k)
-
void validateNProbe(size_t nprobe)
-
std::vector<uint8_t> unpackNonInterleaved(std::vector<uint8_t> data, int numVecs, int dims, int bitsPerCode)
-
std::vector<uint8_t> unpackInterleaved(std::vector<uint8_t> data, int numVecs, int dims, int bitsPerCode)
-
std::vector<uint8_t> packNonInterleaved(std::vector<uint8_t> data, int numVecs, int dims, int bitsPerCode)
-
std::vector<uint8_t> packInterleaved(std::vector<uint8_t> data, int numVecs, int dims, int bitsPerCode)
-
void ivfOffsetToUserIndex(idx_t *indices, idx_t numLists, idx_t queries, int k, const std::vector<std::vector<idx_t>> &listOffsetToUserIndex)
Utility function to translate (list id, offset) to a user index on the CPU. Implemented in a .cpp file in order to use OpenMP.
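The translation from (list id, offset) to a user index amounts to a table lookup per result. The following is a minimal single-threaded sketch, assuming the pair is packed into one 64-bit value with the list id in the upper 32 bits and the offset in the lower 32 bits; that packing, and the names `packListOffset` and `remapToUserIndex`, are assumptions made for illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Assumed packing: list id in the upper 32 bits, offset in the lower 32 bits.
std::int64_t packListOffset(std::int32_t listId, std::int32_t offset) {
    return (static_cast<std::int64_t>(listId) << 32) |
           static_cast<std::int64_t>(static_cast<std::uint32_t>(offset));
}

// Replace each packed (list id, offset) entry with the user index stored in
// listOffsetToUserIndex[listId][offset]. Negative entries (no result found)
// are left untouched.
void remapToUserIndex(
        std::vector<std::int64_t>& indices,
        const std::vector<std::vector<std::int64_t>>& listOffsetToUserIndex) {
    for (auto& v : indices) {
        if (v < 0) {
            continue;
        }
        auto listId = static_cast<std::size_t>(v >> 32);
        auto offset = static_cast<std::size_t>(v & 0xffffffff);
        v = listOffsetToUserIndex[listId][offset];
    }
}
```

Each result is independent of the others, which is why a loop of this shape parallelizes naturally over queries.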
-
void newTestSeed()
Generates and displays a new seed for the test.
-
void setTestSeed(long seed)
Uses an explicit seed for the test.
-
float relativeError(float a, float b)
Returns the relative error in difference between a and b: |a - b| / (0.5 * (|a| + |b|))
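The formula can be written out directly. This is a sketch of the stated expression only; the name `relativeErrorSketch` is hypothetical, and note that the expression evaluates to NaN when a and b are both zero.

```cpp
#include <cmath>

// |a - b| / (0.5 * (|a| + |b|)): the absolute difference divided by the
// mean magnitude of the two values. The result is NaN if a == b == 0.
float relativeErrorSketch(float a, float b) {
    return std::fabs(a - b) / (0.5f * (std::fabs(a) + std::fabs(b)));
}
```

For example, a = 3 and b = 1 give |3 - 1| / (0.5 * 4) = 1.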
-
int randVal(int a, int b)
Generates a random integer in the range [a, b].
-
bool randBool()
Generates a random bool.
-
template<typename T>
T randSelect(std::initializer_list<T> vals)
Select a random value from the given list of values provided as an initializer_list
-
std::vector<float> randVecs(size_t num, size_t dim)
Generates a collection of random vectors in the range [0, 1].
-
std::vector<unsigned char> randBinaryVecs(size_t num, size_t dim)
Generates a collection of random bit vectors.
-
void compareIndices(const std::vector<float> &queryVecs, faiss::Index &refIndex, faiss::Index &testIndex, int numQuery, int dim, int k, const std::string &configMsg, float maxRelativeError = 6e-5f, float pctMaxDiff1 = 0.1f, float pctMaxDiffN = 0.005f)
Compare two indices via query for similarity, with a user-specified set of query vectors
-
void compareIndices(faiss::Index &refIndex, faiss::Index &testIndex, int numQuery, int dim, int k, const std::string &configMsg, float maxRelativeError = 6e-5f, float pctMaxDiff1 = 0.1f, float pctMaxDiffN = 0.005f)
Compare two indices via query for similarity, generating random query vectors
-
void compareLists(const float *refDist, const faiss::idx_t *refInd, const float *testDist, const faiss::idx_t *testInd, int dim1, int dim2, const std::string &configMsg, bool printBasicStats, bool printDiffs, bool assertOnErr, float maxRelativeError = 6e-5f, float pctMaxDiff1 = 0.1f, float pctMaxDiffN = 0.005f)
Display specific differences in the two (distance, index) lists.
-
template<typename A, typename B>
void testIVFEquality(A &cpuIndex, B &gpuIndex)
Compare IVF lists between a CPU and GPU index.
-
int getCurrentDevice()
Returns the current thread-local GPU device.
-
void setCurrentDevice(int device)
Sets the current thread-local GPU device.
-
int getNumDevices()
Returns the number of available GPU devices.
-
void profilerStart()
Starts the CUDA profiler (exposed via SWIG)
-
void profilerStop()
Stops the CUDA profiler (exposed via SWIG)
-
void synchronizeAllDevices()
Synchronizes the CPU against all devices (equivalent to cudaDeviceSynchronize for each device)
-
const cudaDeviceProp &getDeviceProperties(int device)
Returns a cached cudaDeviceProp for the given device.
-
const cudaDeviceProp &getCurrentDeviceProperties()
Returns the cached cudaDeviceProp for the current device.
-
int getMaxThreads(int device)
Returns the maximum number of threads available for the given GPU device
-
int getMaxThreadsCurrentDevice()
Equivalent to getMaxThreads(getCurrentDevice())
-
dim3 getMaxGrid(int device)
Returns the maximum grid size for the given GPU device.
-
dim3 getMaxGridCurrentDevice()
Equivalent to getMaxGrid(getCurrentDevice())
-
size_t getMaxSharedMemPerBlock(int device)
Returns the maximum smem available for the given GPU device.
-
size_t getMaxSharedMemPerBlockCurrentDevice()
Equivalent to getMaxSharedMemPerBlock(getCurrentDevice())
-
int getDeviceForAddress(const void *p)
For a given pointer, returns whether or not it is located on a device (deviceId >= 0) or the host (-1).
-
bool getFullUnifiedMemSupport(int device)
Does the given device support full unified memory sharing host memory?
-
bool getFullUnifiedMemSupportCurrentDevice()
Equivalent to getFullUnifiedMemSupport(getCurrentDevice())
-
bool getTensorCoreSupport(int device)
Does the given device support tensor core operations?
-
bool getTensorCoreSupportCurrentDevice()
Equivalent to getTensorCoreSupport(getCurrentDevice())
-
int getWarpSize(int device)
Returns the warp size of the given GPU device.
-
int getWarpSizeCurrentDevice()
Equivalent to getWarpSize(getCurrentDevice())
-
size_t getFreeMemory(int device)
Returns the amount of currently available memory on the given device.
-
size_t getFreeMemoryCurrentDevice()
Equivalent to getFreeMemory(getCurrentDevice())
-
template<typename L1, typename L2>
void streamWaitBase(const L1 &listWaiting, const L2 &listWaitOn)
Call for a collection of streams to wait on.
-
template<typename L1>
void streamWait(const L1 &a, const std::initializer_list<cudaStream_t> &b)
These versions allow usage of initializer_list as arguments, since otherwise {…} doesn’t have a type
-
inline void streamWait(const std::initializer_list<cudaStream_t> &a, const std::initializer_list<cudaStream_t> &b)
-
inline raft::distance::DistanceType metricFaissToRaft(MetricType metric, bool exactDistance)
-
void validRowIndices(GpuResources *res, Tensor<float, 2, true> &vecs, bool *validRows)
Identify matrix rows that contain no NaN values. validRows[i] is false if row i contains a NaN value and true otherwise.
-
idx_t inplaceGatherFilteredRows(GpuResources *res, Tensor<float, 2, true> &vecs, Tensor<idx_t, 1, true> &indices)
Filter out matrix rows containing NaN values. The vectors and indices are updated in-place.
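The two NaN helpers above can be mirrored on the CPU to make their semantics concrete. This is a host-side sketch with hypothetical names (`validRowsRef`, `gatherValidRowsRef`) operating on std::vector instead of GPU tensors; it is not the library's implementation.

```cpp
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// valid[i] is false if row i contains a NaN value and true otherwise.
std::vector<bool> validRowsRef(const std::vector<float>& vecs,
                               std::size_t rows,
                               std::size_t dims) {
    std::vector<bool> valid(rows, true);
    for (std::size_t i = 0; i < rows; ++i) {
        for (std::size_t d = 0; d < dims; ++d) {
            if (std::isnan(vecs[i * dims + d])) {
                valid[i] = false;
                break;
            }
        }
    }
    return valid;
}

// Move valid rows and their ids to the front, in place, preserving order.
// Returns the number of rows kept; both containers are shrunk to match.
std::size_t gatherValidRowsRef(std::vector<float>& vecs,
                               std::vector<std::int64_t>& ids,
                               std::size_t dims) {
    std::size_t rows = ids.size();
    std::vector<bool> valid = validRowsRef(vecs, rows, dims);
    std::size_t out = 0;
    for (std::size_t i = 0; i < rows; ++i) {
        if (!valid[i]) {
            continue;
        }
        if (out != i) {
            for (std::size_t d = 0; d < dims; ++d) {
                vecs[out * dims + d] = vecs[i * dims + d];
            }
            ids[out] = ids[i];
        }
        ++out;
    }
    vecs.resize(out * dims);
    ids.resize(out);
    return out;
}
```

Because the write position never overtakes the read position, the compaction needs no scratch copy of the matrix.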
-
struct GpuParameterSpace : public faiss::ParameterSpace
- #include <GpuAutoTune.h>
parameter space and setters for GPU indexes
Public Functions
-
virtual void initialize(const faiss::Index *index) override
initialize with reasonable parameters for the index
-
virtual void set_index_parameter(faiss::Index *index, const std::string &name, double val) const override
set a combination of parameters on an index
-
size_t n_combinations() const
nb of combinations, = product of values sizes
-
bool combination_ge(size_t c1, size_t c2) const
returns whether combinations c1 >= c2 in the tuple sense
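The "product of values sizes" and the "tuple sense" comparison can be sketched with plain integers. The sketch assumes a combination number is decoded as a mixed-radix value with the first parameter as the least significant digit; that decoding and the names `nCombinations` and `combinationGe` are assumptions for illustration, not the library's code.

```cpp
#include <cstddef>
#include <vector>

// rangeSizes[i] is the number of candidate values of parameter i.
// The number of combinations is the product of these sizes.
std::size_t nCombinations(const std::vector<std::size_t>& rangeSizes) {
    std::size_t n = 1;
    for (std::size_t s : rangeSizes) {
        n *= s;
    }
    return n;
}

// Assumed mixed-radix decoding: parameter 0 is the least significant digit.
// c1 >= c2 "in the tuple sense" means every digit of c1 is >= the matching
// digit of c2, which is stricter than comparing c1 and c2 as integers.
bool combinationGe(const std::vector<std::size_t>& rangeSizes,
                   std::size_t c1,
                   std::size_t c2) {
    for (std::size_t s : rangeSizes) {
        if (c1 % s < c2 % s) {
            return false;
        }
        c1 /= s;
        c2 /= s;
    }
    return true;
}
```

With range sizes {3, 2}, combination 3 decodes to (0, 1) and combination 1 to (1, 0), so neither is >= the other in the tuple sense even though 3 > 1 numerically.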
-
std::string combination_name(size_t cno) const
get string representation of the combination
-
void display() const
print a description on stdout
-
ParameterRange &add_range(const std::string &name)
add a new parameter (or return it if it exists)
-
void set_index_parameters(Index *index, size_t cno) const
set a combination of parameters on an index
-
void set_index_parameters(Index *index, const char *param_string) const
set a combination of parameters described by a string
-
void update_bounds(size_t cno, const OperatingPoint &op, double *upper_bound_perf, double *lower_bound_t) const
find an upper bound on the performance and a lower bound on t for configuration cno given another operating point op
-
void explore(Index *index, size_t nq, const float *xq, const AutoTuneCriterion &crit, OperatingPoints *ops) const
explore operating points
- Parameters:
index – index to run on
xq – query vectors (size nq * index.d)
crit – selection criterion
ops – resulting operating points
Public Members
-
std::vector<ParameterRange> parameter_ranges
all tunable parameters
-
int verbose
verbosity during exploration
-
int n_experiments
nb of experiments during optimization (0 = try all combinations)
-
size_t batchsize
maximum number of queries to submit at a time.
-
bool thread_over_batches
use multithreading over batches (useful to benchmark independent single-searches)
-
double min_test_duration
run tests several times until they reach at least this duration (to avoid jittering in MT mode)
-
struct ToCPUCloner : public faiss::Cloner
- #include <GpuCloner.h>
Cloner specialized for GPU -> CPU.
Public Functions
-
virtual VectorTransform *clone_VectorTransform(const VectorTransform*)
-
struct ToGpuCloner : public faiss::Cloner, public faiss::gpu::GpuClonerOptions
- #include <GpuCloner.h>
Cloner specialized for CPU -> 1 GPU.
Public Functions
-
ToGpuCloner(GpuResourcesProvider *prov, int device, const GpuClonerOptions &options)
-
virtual VectorTransform *clone_VectorTransform(const VectorTransform*)
Public Members
-
GpuResourcesProvider *provider
-
int device
-
IndicesOptions indicesOptions = INDICES_64_BIT
how should indices be stored on index types that support indices (anything but GpuIndexFlat*)?
-
bool useFloat16CoarseQuantizer = false
is the coarse quantizer in float16?
-
bool useFloat16 = false
for GpuIndexIVFFlat, is storage in float16? for GpuIndexIVFPQ, are intermediate calculations in float16?
-
bool usePrecomputed = false
use precomputed tables?
-
long reserveVecs = 0
reserve vectors in the invfiles?
-
bool storeTransposed = false
For GpuIndexFlat, store data in transposed layout?
-
bool verbose = false
Set verbose options on the index.
-
bool use_raft = false
use the RAFT implementation
-
bool allowCpuCoarseQuantizer = false
This flag controls the CPU fallback logic for coarse quantizer component of the index. When set to false (default), the cloner will throw an exception for indices not implemented on GPU. When set to true, it will fallback to a CPU implementation.
-
struct ToGpuClonerMultiple : public faiss::Cloner, public faiss::gpu::GpuMultipleClonerOptions
- #include <GpuCloner.h>
Cloner specialized for CPU -> multiple GPUs.
Public Functions
-
ToGpuClonerMultiple(std::vector<GpuResourcesProvider*> &provider, std::vector<int> &devices, const GpuMultipleClonerOptions &options)
-
ToGpuClonerMultiple(const std::vector<ToGpuCloner> &sub_cloners, const GpuMultipleClonerOptions &options)
-
virtual VectorTransform *clone_VectorTransform(const VectorTransform*)
Public Members
-
std::vector<ToGpuCloner> sub_cloners
-
bool shard = false
Whether to shard the index across GPUs, versus replication across GPUs
-
int shard_type = 1
IndexIVF::copy_subset_to subset type.
-
bool common_ivf_quantizer = false
set to true if an IndexIVF is to be dispatched to multiple GPUs with a single common IVF quantizer, i.e., only the inverted lists are sharded on the sub-indexes (uses an IndexShardsIVF)
-
IndicesOptions indicesOptions = INDICES_64_BIT
how should indices be stored on index types that support indices (anything but GpuIndexFlat*)?
-
bool useFloat16CoarseQuantizer = false
is the coarse quantizer in float16?
-
bool useFloat16 = false
for GpuIndexIVFFlat, is storage in float16? for GpuIndexIVFPQ, are intermediate calculations in float16?
-
bool usePrecomputed = false
use precomputed tables?
-
long reserveVecs = 0
reserve vectors in the invfiles?
-
bool storeTransposed = false
For GpuIndexFlat, store data in transposed layout?
-
bool verbose = false
Set verbose options on the index.
-
bool use_raft = false
use the RAFT implementation
-
bool allowCpuCoarseQuantizer = false
This flag controls the CPU fallback logic for coarse quantizer component of the index. When set to false (default), the cloner will throw an exception for indices not implemented on GPU. When set to true, it will fallback to a CPU implementation.
-
struct GpuProgressiveDimIndexFactory : public faiss::ProgressiveDimIndexFactory
- #include <GpuCloner.h>
index factory for the ProgressiveDimClustering object
Public Functions
-
explicit GpuProgressiveDimIndexFactory(int ngpu)
-
virtual Index *operator()(int dim) override
ownership transferred to caller
-
virtual ~GpuProgressiveDimIndexFactory() override
Public Members
-
GpuMultipleClonerOptions options
-
std::vector<GpuResourcesProvider*> vres
-
std::vector<int> devices
-
int ncall
-
struct GpuClonerOptions
- #include <GpuClonerOptions.h>
set some options on how to copy to GPU
Subclassed by faiss::gpu::GpuMultipleClonerOptions, faiss::gpu::ToGpuCloner
Public Members
-
IndicesOptions indicesOptions = INDICES_64_BIT
how should indices be stored on index types that support indices (anything but GpuIndexFlat*)?
-
bool useFloat16CoarseQuantizer = false
is the coarse quantizer in float16?
-
bool useFloat16 = false
for GpuIndexIVFFlat, is storage in float16? for GpuIndexIVFPQ, are intermediate calculations in float16?
-
bool usePrecomputed = false
use precomputed tables?
-
long reserveVecs = 0
reserve vectors in the invfiles?
-
bool storeTransposed = false
For GpuIndexFlat, store data in transposed layout?
-
bool verbose = false
Set verbose options on the index.
-
bool use_raft = false
use the RAFT implementation
-
bool allowCpuCoarseQuantizer = false
This flag controls the CPU fallback logic for coarse quantizer component of the index. When set to false (default), the cloner will throw an exception for indices not implemented on GPU. When set to true, it will fallback to a CPU implementation.
-
struct GpuMultipleClonerOptions : public faiss::gpu::GpuClonerOptions
Subclassed by faiss::gpu::ToGpuClonerMultiple
Public Members
-
bool shard = false
Whether to shard the index across GPUs, versus replication across GPUs
-
int shard_type = 1
IndexIVF::copy_subset_to subset type.
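The difference between sharding and replication can be illustrated by how vector ids would be distributed over the sub-indexes. The sketch below is hypothetical: `distributeIds` is not a library function, and the modulo rule used for the sharded case (id % numGpus == g) is an assumption about what a subset type of 1 selects. It also assumes non-negative ids.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical illustration: distribute ids over numGpus sub-indexes.
// shard == false: every GPU holds every id (replication).
// shard == true:  GPU g holds the ids with id % numGpus == g
//                 (assumed modulo rule; ids must be non-negative).
std::vector<std::vector<std::int64_t>> distributeIds(
        const std::vector<std::int64_t>& ids,
        std::size_t numGpus,
        bool shard) {
    std::vector<std::vector<std::int64_t>> perGpu(numGpus);
    for (std::int64_t id : ids) {
        for (std::size_t g = 0; g < numGpus; ++g) {
            if (!shard || static_cast<std::size_t>(id) % numGpus == g) {
                perGpu[g].push_back(id);
            }
        }
    }
    return perGpu;
}
```

Replication keeps a full copy per GPU, so queries can be split across devices; sharding splits the data, so each query has to visit every device and the partial results are merged.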
-
bool common_ivf_quantizer = false
set to true if an IndexIVF is to be dispatched to multiple GPUs with a single common IVF quantizer, i.e., only the inverted lists are sharded on the sub-indexes (uses an IndexShardsIVF)
-
IndicesOptions indicesOptions = INDICES_64_BIT
how should indices be stored on index types that support indices (anything but GpuIndexFlat*)?
-
bool useFloat16CoarseQuantizer = false
is the coarse quantizer in float16?
-
bool useFloat16 = false
for GpuIndexIVFFlat, is storage in float16? for GpuIndexIVFPQ, are intermediate calculations in float16?
-
bool usePrecomputed = false
use precomputed tables?
-
long reserveVecs = 0
reserve vectors in the invfiles?
-
bool storeTransposed = false
For GpuIndexFlat, store data in transposed layout?
-
bool verbose = false
Set verbose options on the index.
-
bool use_raft = false
use the RAFT implementation
-
bool allowCpuCoarseQuantizer = false
This flag controls the CPU fallback logic for coarse quantizer component of the index. When set to false (default), the cloner will throw an exception for indices not implemented on GPU. When set to true, it will fallback to a CPU implementation.
-
struct GpuDistanceParams
- #include <GpuDistance.h>
Arguments to brute-force GPU k-nearest neighbor searching.
Public Members
-
faiss::MetricType metric = METRIC_L2
Search parameter: distance metric.
-
float metricArg = 0
Search parameter: distance metric argument (if applicable). For metric == METRIC_Lp, this is the p-value.
-
int k = 0
Search parameter: return k nearest neighbors. If the value provided is -1, then we report all pairwise distances without top-k filtering.
-
int dims = 0
Vector dimensionality.
-
const void *vectors = nullptr
If vectorsRowMajor is true, this is numVectors x dims, with dims innermost; otherwise, dims x numVectors, with numVectors innermost
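The two layouts differ only in which dimension is contiguous in memory. As a small sketch (the helper name `vectorElementOffset` is hypothetical), the offset of component d of vector i is:

```cpp
#include <cstddef>

// Offset (in elements) of component d of vector i within the buffer.
std::size_t vectorElementOffset(std::size_t i,
                                std::size_t d,
                                std::size_t numVectors,
                                std::size_t dims,
                                bool vectorsRowMajor) {
    return vectorsRowMajor
            ? i * dims + d         // numVectors x dims, dims innermost
            : d * numVectors + i;  // dims x numVectors, numVectors innermost
}
```

The same rule applies to the queries buffer with numQueries and queriesRowMajor in place of numVectors and vectorsRowMajor.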
-
DistanceDataType vectorType = DistanceDataType::F32
-
bool vectorsRowMajor = true
-
idx_t numVectors = 0
-
const float *vectorNorms = nullptr
Precomputed L2 norms for each vector in vectors, which can be optionally provided in advance to speed computation for METRIC_L2
-
const void *queries = nullptr
If queriesRowMajor is true, this is numQueries x dims, with dims innermost; otherwise, dims x numQueries, with numQueries innermost
-
DistanceDataType queryType = DistanceDataType::F32
-
bool queriesRowMajor = true
-
idx_t numQueries = 0
-
float *outDistances = nullptr
A region of memory size numQueries x k, with k innermost (row major) if k > 0, or if k == -1, a region of memory of size numQueries x numVectors
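The sizing rule for outDistances can be captured in a small helper when allocating the buffer. This is a sketch with a hypothetical name (`outDistancesCount`); returning 0 for any other value of k is a choice made here for illustration, not documented behavior.

```cpp
#include <cstddef>
#include <cstdint>

// Number of floats needed for outDistances:
//   k > 0   -> numQueries * k          (top-k results, k innermost)
//   k == -1 -> numQueries * numVectors (all pairwise distances)
// Any other k yields 0 in this sketch.
std::size_t outDistancesCount(std::int64_t numQueries,
                              std::int64_t numVectors,
                              int k) {
    if (k > 0) {
        return static_cast<std::size_t>(numQueries) *
               static_cast<std::size_t>(k);
    }
    if (k == -1) {
        return static_cast<std::size_t>(numQueries) *
               static_cast<std::size_t>(numVectors);
    }
    return 0;
}
```

Note that the all-pairs case grows with numVectors, so the buffer can become far larger than in a top-k search.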
-
bool ignoreOutDistances = false
Do we only care about the indices reported, rather than the output distances? Not used if k == -1 (all pairwise distances)
-
IndicesDataType outIndicesType = IndicesDataType::I64
A region of memory size numQueries x k, with k innermost (row major). Not used if k == -1 (all pairwise distances)
-
void *outIndices = nullptr
-
int device = -1
On which GPU device should the search run? -1 indicates that the current CUDA thread-local device (via cudaGetDevice/cudaSetDevice) is used. Otherwise, an integer 0 <= device < numDevices indicates the device for execution.
-
bool use_raft = false
Should the index dispatch down to RAFT? TODO: change default to true if RAFT is enabled
-
class GpuIcmEncoder : public faiss::lsq::IcmEncoder
- #include <GpuIcmEncoder.h>
Perform LSQ encoding on GPU.
Split input vectors to different devices and call IcmEncoderImpl::encode to encode them
Public Functions
-
GpuIcmEncoder(const LocalSearchQuantizer *lsq, const std::vector<GpuResourcesProvider*> &provs, const std::vector<int> &devices)
-
~GpuIcmEncoder()
-
GpuIcmEncoder(const GpuIcmEncoder&) = delete
-
GpuIcmEncoder &operator=(const GpuIcmEncoder&) = delete
-
virtual void set_binary_term() override
-
virtual void encode(int32_t *codes, const float *x, std::mt19937 &gen, size_t n, size_t ils_iters) const override
Encode vectors given codebooks
- Parameters:
codes – output codes, size n * M
x – vectors to encode, size n * d
gen – random generator
n – number of vectors
ils_iters – number of iterations of iterative local search
Private Members
-
std::unique_ptr<IcmEncoderShards> shards
-
struct GpuIcmEncoderFactory : public faiss::lsq::IcmEncoderFactory
Public Functions
-
explicit GpuIcmEncoderFactory(int ngpus = 1)
-
virtual lsq::IcmEncoder *get(const LocalSearchQuantizer *lsq) override
-
struct GpuIndexConfig
Subclassed by faiss::gpu::GpuIndexBinaryFlatConfig, faiss::gpu::GpuIndexCagraConfig, faiss::gpu::GpuIndexFlatConfig, faiss::gpu::GpuIndexIVFConfig
Public Members
-
int device = 0
GPU device on which the index is resident.
-
MemorySpace memorySpace = MemorySpace::Device
What memory space to use for primary storage. On Pascal and above (CC 6+) architectures, allows GPUs to use more memory than is available on the GPU.
-
bool use_raft = false
Should the index dispatch down to RAFT?
-
int device = 0
-
class GpuIndex : public faiss::Index
Subclassed by faiss::gpu::GpuIndexCagra, faiss::gpu::GpuIndexFlat, faiss::gpu::GpuIndexIVF
Public Types
-
using component_t = float
-
using distance_t = float
Public Functions
-
int getDevice() const
Returns the device that this index is resident on.
-
std::shared_ptr<GpuResources> getResources()
Returns a reference to our GpuResources object that manages memory, stream and handle resources on the GPU
-
void setMinPagingSize(size_t size)
Set the minimum data size for searches (in MiB) for which we use CPU -> GPU paging
-
size_t getMinPagingSize() const
Returns the current minimum data size for paged searches.
-
virtual void add(idx_t, const float *x) override
x can be resident on the CPU or any GPU; copies are performed as needed. Handles paged adds if the add set is too large; calls addInternal_
-
virtual void add_with_ids(idx_t n, const float *x, const idx_t *ids) override
x and ids can be resident on the CPU or any GPU; copies are performed as needed. Handles paged adds if the add set is too large; calls addInternal_
-
virtual void assign(idx_t n, const float *x, idx_t *labels, idx_t k = 1) const override
x and labels can be resident on the CPU or any GPU; copies are performed as needed
-
virtual void search(idx_t n, const float *x, idx_t k, float *distances, idx_t *labels, const SearchParameters *params = nullptr) const override
x, distances and labels can be resident on the CPU or any GPU; copies are performed as needed
-
virtual void search_and_reconstruct(idx_t n, const float *x, idx_t k, float *distances, idx_t *labels, float *recons, const SearchParameters *params = nullptr) const override
x, distances, labels and recons can be resident on the CPU or any GPU; copies are performed as needed
-
virtual void compute_residual(const float *x, float *residual, idx_t key) const override
Overridden to force GPU indices to provide their own GPU-friendly implementation
-
virtual void compute_residual_n(idx_t n, const float *xs, float *residuals, const idx_t *keys) const override
Overridden to force GPU indices to provide their own GPU-friendly implementation
-
virtual void train(idx_t n, const float *x)
Perform training on a representative set of vectors
- Parameters:
n – nb of training vectors
x – training vectors, size n * d
-
virtual void range_search(idx_t n, const float *x, float radius, RangeSearchResult *result, const SearchParameters *params = nullptr) const
query n vectors of dimension d to the index.
return all vectors with distance < radius. Note that many indexes do not implement the range_search (only the k-NN search is mandatory).
- Parameters:
n – number of vectors
x – input vectors to search, size n * d
radius – search radius
result – result table
-
virtual void reset() = 0
removes all elements from the database.
-
virtual size_t remove_ids(const IDSelector &sel)
removes IDs from the index. Not supported by all indexes. Returns the number of elements removed.
-
virtual void reconstruct(idx_t key, float *recons) const
Reconstruct a stored vector (or an approximation if lossy coding)
this function may not be defined for some indexes
- Parameters:
key – id of the vector to reconstruct
recons – reconstructed vector (size d)
-
virtual void reconstruct_batch(idx_t n, const idx_t *keys, float *recons) const
Reconstruct several stored vectors (or an approximation if lossy coding)
this function may not be defined for some indexes
- Parameters:
n – number of vectors to reconstruct
keys – ids of the vectors to reconstruct (size n)
recons – reconstructed vectors (size n * d)
-
virtual void reconstruct_n(idx_t i0, idx_t ni, float *recons) const
Reconstruct vectors i0 to i0 + ni - 1
this function may not be defined for some indexes
- Parameters:
i0 – index of the first vector in the sequence
ni – number of vectors in the sequence
recons – reconstructed vectors (size ni * d)
-
virtual DistanceComputer *get_distance_computer() const
Get a DistanceComputer (defined in AuxIndexStructures) object for this kind of index.
DistanceComputer is implemented for indexes that support random access of their vectors.
-
virtual size_t sa_code_size() const
size of the produced codes in bytes
-
virtual void sa_encode(idx_t n, const float *x, uint8_t *bytes) const
encode a set of vectors
- Parameters:
n – number of vectors
x – input vectors, size n * d
bytes – output encoded vectors, size n * sa_code_size()
-
virtual void sa_decode(idx_t n, const uint8_t *bytes, float *x) const
decode a set of vectors
- Parameters:
n – number of vectors
bytes – input encoded vectors, size n * sa_code_size()
x – output vectors, size n * d
-
virtual void merge_from(Index &otherIndex, idx_t add_id = 0)
moves the entries from another dataset to self. On output, other is empty. add_id is added to all moved ids (for sequential ids, this would be this->ntotal)
-
virtual void check_compatible_for_merge(const Index &otherIndex) const
check that the two indexes are compatible (i.e., they are trained in the same way and have the same parameters). Otherwise throw.
Public Members
-
int d
vector dimension
-
idx_t ntotal
total nb of indexed vectors
-
bool verbose
verbosity level
-
bool is_trained
set if the Index does not require training, or if training is done already
-
MetricType metric_type
type of metric this index uses for search
-
float metric_arg
argument of the metric type
Protected Functions
-
virtual bool addImplRequiresIDs_() const = 0
Does addImpl_ require IDs? If so, and no IDs are provided, we will generate them sequentially based on the order in which the IDs are added
-
virtual void addImpl_(idx_t n, const float *x, const idx_t *ids) = 0
Overridden to actually perform the add. All data is guaranteed to be resident on our device.
-
virtual void searchImpl_(idx_t n, const float *x, int k, float *distances, idx_t *labels, const SearchParameters *params) const = 0
Overridden to actually perform the search. All data is guaranteed to be resident on our device.
Protected Attributes
-
std::shared_ptr<GpuResources> resources_
Manages streams, cuBLAS handles and scratch memory for devices.
-
const GpuIndexConfig config_
Our configuration options.
-
size_t minPagedSize_
Size above which we page copies from the CPU to GPU.
Private Functions
-
void addPaged_(idx_t n, const float *x, const idx_t *ids)
Handles paged adds if the add set is too large, passes to addImpl_ to actually perform the add for the current page
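Paging a large add amounts to walking the input in fixed-size chunks and handing each chunk to the per-page routine. The sketch below shows only that chunking arithmetic; the function name `splitIntoPages` and the page-size parameter are hypothetical, and the actual page size used by the library is not stated here. It assumes bytesPerVector > 0.

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

// Split n vectors into consecutive pages of at most pageBytes each.
// Returns (start, count) pairs covering [0, n). Every page holds at least
// one vector, even if a single vector is larger than pageBytes.
// Assumes bytesPerVector > 0.
std::vector<std::pair<std::size_t, std::size_t>> splitIntoPages(
        std::size_t n,
        std::size_t bytesPerVector,
        std::size_t pageBytes) {
    std::size_t perPage =
            std::max<std::size_t>(pageBytes / bytesPerVector, 1);
    std::vector<std::pair<std::size_t, std::size_t>> pages;
    for (std::size_t start = 0; start < n; start += perPage) {
        pages.emplace_back(start, std::min(perPage, n - start));
    }
    return pages;
}
```

Each (start, count) pair would correspond to one call of the per-page add, with x advanced by start * d floats and ids advanced by start entries.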
-
void addPage_(idx_t n, const float *x, const idx_t *ids)
Calls addImpl_ for a single page of GPU-resident data.
-
void searchNonPaged_(idx_t n, const float *x, int k, float *outDistancesData, idx_t *outIndicesData, const SearchParameters *params) const
Calls searchImpl_ for a single page of GPU-resident data.
-
void searchFromCpuPaged_(idx_t n, const float *x, int k, float *outDistancesData, idx_t *outIndicesData, const SearchParameters *params) const
Calls searchImpl_ for a single page of GPU-resident data, handling paging of the data and copies from the CPU
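The paging pattern described by the private helpers above can be sketched on the host as follows. This is a minimal illustration, assuming a page is bounded by a byte budget; for_each_page, its parameters, and the callback are hypothetical names and do not reflect the actual faiss internals.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>

// Hypothetical helper: walk n float vectors of dimension d in pages of at
// most page_bytes, invoking on_page(offset, count) once per page.
// Returns the number of pages processed.
inline std::size_t for_each_page(
        std::int64_t n,
        int d,
        std::size_t page_bytes,
        const std::function<void(std::int64_t, std::int64_t)>& on_page) {
    assert(d > 0);
    const std::size_t vec_bytes = static_cast<std::size_t>(d) * sizeof(float);
    // Always move at least one vector per page so the loop makes progress.
    const std::int64_t per_page = std::max<std::int64_t>(
            1, static_cast<std::int64_t>(page_bytes / vec_bytes));
    std::size_t pages = 0;
    for (std::int64_t i = 0; i < n; i += per_page) {
        on_page(i, std::min(per_page, n - i));
        ++pages;
    }
    return pages;
}
```

In this sketch the callback plays the role of the per-page call (addImpl_ or searchImpl_ in the text); the last page may be shorter than the others.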
-
struct GpuIndexBinaryFlatConfig : public faiss::gpu::GpuIndexConfig
Public Members
-
int device = 0
GPU device on which the index is resident.
-
MemorySpace memorySpace = MemorySpace::Device
What memory space to use for primary storage. On Pascal and above (CC 6+) architectures, allows GPUs to use more memory than is available on the GPU.
-
bool use_raft = false
Should the index dispatch down to RAFT?
-
class GpuIndexBinaryFlat : public faiss::IndexBinary
- #include <GpuIndexBinaryFlat.h>
A GPU version of IndexBinaryFlat for brute-force comparison of bit vectors via Hamming distance
Public Types
-
using component_t = uint8_t
-
using distance_t = int32_t
Public Functions
-
GpuIndexBinaryFlat(GpuResourcesProvider *resources, const faiss::IndexBinaryFlat *index, GpuIndexBinaryFlatConfig config = GpuIndexBinaryFlatConfig())
Construct from a pre-existing faiss::IndexBinaryFlat instance, copying data over to the given GPU
-
GpuIndexBinaryFlat(GpuResourcesProvider *resources, int dims, GpuIndexBinaryFlatConfig config = GpuIndexBinaryFlatConfig())
Construct an empty instance that can be added to.
-
~GpuIndexBinaryFlat() override
-
int getDevice() const
Returns the device that this index is resident on.
-
std::shared_ptr<GpuResources> getResources()
Returns a reference to our GpuResources object that manages memory, stream and handle resources on the GPU
-
void copyFrom(const faiss::IndexBinaryFlat *index)
Initialize ourselves from the given CPU index; will overwrite all data in ourselves
-
void copyTo(faiss::IndexBinaryFlat *index) const
Copy ourselves to the given CPU index; will overwrite all data in the index instance
-
virtual void add(faiss::idx_t n, const uint8_t *x) override
Add n vectors of dimension d to the index.
Vectors are implicitly assigned labels ntotal .. ntotal + n - 1
- Parameters:
x – input matrix, size n * d / 8
-
virtual void reset() override
Removes all elements from the database.
-
virtual void search(idx_t n, const uint8_t *x, idx_t k, int32_t *distances, faiss::idx_t *labels, const faiss::SearchParameters *params = nullptr) const override
Query n vectors of dimension d to the index.
Returns at most k vectors. If there are not enough results for a query, the result array is padded with -1s.
- Parameters:
x – input vectors to search, size n * d / 8
labels – output labels of the NNs, size n*k
distances – output pairwise distances, size n*k
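As a CPU reference for what a brute-force Hamming search computes for a single query, consider the sketch below. The helper names (hamming, search_one) are hypothetical. Labels of missing results are padded with -1 as documented; the distance written into padded slots (INT32_MAX here) is an arbitrary choice of this sketch, not documented behaviour.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <utility>
#include <vector>

// Hamming distance between two bit vectors stored as code_size bytes each.
inline std::int32_t hamming(const std::uint8_t* a, const std::uint8_t* b, int code_size) {
    std::int32_t dist = 0;
    for (int i = 0; i < code_size; ++i) {
        auto diff = static_cast<std::uint8_t>(a[i] ^ b[i]);
        while (diff != 0) {  // count differing bits one at a time
            dist += diff & 1;
            diff = static_cast<std::uint8_t>(diff >> 1);
        }
    }
    return dist;
}

// Reference k-NN for one query over nb database codes of d bits each.
// distances and labels must each have room for k entries.
inline void search_one(const std::uint8_t* query, const std::uint8_t* db,
                       std::int64_t nb, int d, std::int64_t k,
                       std::int32_t* distances, std::int64_t* labels) {
    assert(d % 8 == 0);
    const int code_size = d / 8;  // bytes per vector
    std::vector<std::pair<std::int32_t, std::int64_t>> scored;
    scored.reserve(static_cast<std::size_t>(nb));
    for (std::int64_t i = 0; i < nb; ++i) {
        scored.emplace_back(hamming(query, db + i * code_size, code_size), i);
    }
    std::sort(scored.begin(), scored.end());  // by distance, then by id
    for (std::int64_t j = 0; j < k; ++j) {
        if (j < nb) {
            distances[j] = scored[static_cast<std::size_t>(j)].first;
            labels[j] = scored[static_cast<std::size_t>(j)].second;
        } else {
            distances[j] = std::numeric_limits<std::int32_t>::max();  // sketch's choice
            labels[j] = -1;  // documented padding value
        }
    }
}
```

Asking for k = 4 neighbours from a database of three codes fills the first three slots and pads the fourth label with -1.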
-
virtual void reconstruct(faiss::idx_t key, uint8_t *recons) const override
Reconstruct a stored vector.
This function may not be defined for some indexes.
- Parameters:
key – id of the vector to reconstruct
recons – reconstructed vector (size d / 8)
-
virtual void train(idx_t n, const uint8_t *x)
Perform training on a representative set of vectors.
- Parameters:
n – nb of training vectors
x – training vectors, size n * d / 8
-
virtual void add_with_ids(idx_t n, const uint8_t *x, const idx_t *xids)
Same as add, but stores xids instead of sequential ids.
The default implementation fails with an assertion, as it is not supported by all indexes.
- Parameters:
xids – if non-null, ids to store for the vectors (size n)
-
virtual void range_search(idx_t n, const uint8_t *x, int radius, RangeSearchResult *result, const SearchParameters *params = nullptr) const
Query n vectors of dimension d to the index.
Returns all vectors with distance < radius. Note that many indexes do not implement range_search (only the k-NN search is mandatory). The distances are converted to float to reuse the RangeSearchResult structure, but they are integers. By convention, only distances < radius (strict comparison) are returned, i.e. radius = 0 does not return any result and radius = 1 returns only exactly identical vectors.
- Parameters:
x – input vectors to search, size n * d / 8
radius – search radius
result – result table
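The strict-comparison convention can be made concrete with a short sketch. within_radius is a hypothetical helper operating on already computed integer distances; it only illustrates the filtering and the conversion to float, not how RangeSearchResult is filled.

```cpp
#include <cstdint>
#include <vector>

// Keep only the distances strictly below radius, converted to float.
inline std::vector<float> within_radius(const std::vector<std::int32_t>& distances,
                                        int radius) {
    std::vector<float> kept;
    for (std::int32_t dist : distances) {
        if (dist < radius) {  // strict comparison: dist == radius is excluded
            kept.push_back(static_cast<float>(dist));
        }
    }
    return kept;
}
```

With distances {0, 1, 2}, a radius of 0 keeps nothing and a radius of 1 keeps only the exact match at distance 0.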
-
void assign(idx_t n, const uint8_t *x, idx_t *labels, idx_t k = 1) const
Return the indexes of the k vectors closest to the query x.
This function is identical to search but only returns labels of neighbors.
- Parameters:
x – input vectors to search, size n * d / 8
labels – output labels of the NNs, size n*k
-
virtual size_t remove_ids(const IDSelector &sel)
Removes IDs from the index. Not supported by all indexes.
-
virtual void reconstruct_n(idx_t i0, idx_t ni, uint8_t *recons) const
Reconstruct vectors i0 to i0 + ni - 1.
This function may not be defined for some indexes.
- Parameters:
recons – reconstructed vectors (size ni * d / 8)
-
virtual void search_and_reconstruct(idx_t n, const uint8_t *x, idx_t k, int32_t *distances, idx_t *labels, uint8_t *recons, const SearchParameters *params = nullptr) const
Similar to search, but also reconstructs the stored vectors (or an approximation in the case of lossy coding) for the search results.
If there are not enough results for a query, the resulting array is padded with -1s.
- Parameters:
recons – reconstructed vectors, size (n, k, d)
-
void display() const
Display the actual class name and some more info.
-
virtual void merge_from(IndexBinary &otherIndex, idx_t add_id = 0)
moves the entries from another dataset to self. On output, other is empty. add_id is added to all moved ids (for sequential ids, this would be this->ntotal)
-
virtual void check_compatible_for_merge(const IndexBinary &otherIndex) const
check that the two indexes are compatible (ie, they are trained in the same way and have the same parameters). Otherwise throw.
Public Members
-
int d = 0
vector dimension
-
int code_size = 0
number of bytes per vector ( = d / 8 )
-
idx_t ntotal = 0
total nb of indexed vectors
-
bool verbose = false
verbosity level
-
bool is_trained = true
set if the Index does not require training, or if training is done already
-
MetricType metric_type = METRIC_L2
type of metric this index uses for search
Protected Functions
Protected Attributes
-
std::shared_ptr<GpuResources> resources_
Manages streams, cuBLAS handles and scratch memory for devices.
-
const GpuIndexBinaryFlatConfig binaryFlatConfig_
Configuration options.
-
std::unique_ptr<BinaryFlatIndex> data_
Holds our GPU data containing the list of vectors.
-
struct IVFPQBuildCagraConfig
Public Members
-
uint32_t n_lists = 1024
The number of inverted lists (clusters)
Hint: the number of vectors per cluster (n_rows / n_lists) should be approximately 1,000 to 10,000.
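One way to apply this hint is to derive n_lists from the dataset size. The helper below is a hypothetical sketch (suggest_n_lists is not part of faiss), and the default target of 5,000 vectors per list is an arbitrary value picked from inside the suggested range.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Pick n_lists so that n_rows / n_lists is close to target_per_list.
// The hint in the text suggests roughly 1,000 to 10,000 vectors per list.
inline std::uint32_t suggest_n_lists(std::uint64_t n_rows,
                                     std::uint64_t target_per_list = 5000) {
    assert(target_per_list > 0);
    // Never return fewer than one list, even for tiny datasets.
    const std::uint64_t lists = std::max<std::uint64_t>(1, n_rows / target_per_list);
    return static_cast<std::uint32_t>(lists);
}
```

For example, ten million rows with the default target give 2,000 lists of about 5,000 vectors each.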
-
uint32_t kmeans_n_iters = 20
The number of iterations searching for kmeans centers (index building).
-
double kmeans_trainset_fraction = 0.5
The fraction of data to use during iterative kmeans building.
-
uint32_t pq_bits = 8
The bit length of the vector element after compression by PQ.
Possible values: [4, 5, 6, 7, 8].
Hint: the smaller the ‘pq_bits’, the smaller the index size and the better the search performance, but the lower the recall.
-
uint32_t pq_dim = 0
The dimensionality of the vector after compression by PQ. When zero, an optimal value is selected using a heuristic.
NB: pq_dim * pq_bits must be a multiple of 8.
Hint: a smaller ‘pq_dim’ results in a smaller index size and better search performance, but lower recall. If ‘pq_bits’ is 8, ‘pq_dim’ can be set to any number, but multiples of 8 are desirable for good performance. If ‘pq_bits’ is not 8, ‘pq_dim’ should be a multiple of 8. For good performance, it is desirable that ‘pq_dim’ is a multiple of 32. Ideally, ‘pq_dim’ should also be a divisor of the dataset dim.
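These constraints can be checked before building. The sketch below is a hypothetical helper (pq_params_valid is not part of faiss). It reads the NB as a constraint on the product of pq_dim and pq_bits, which is an assumption, although it agrees with the hint about multiples of 8.

```cpp
#include <cstdint>

// Returns true when the documented constraints appear to hold:
//  - pq_bits is one of 4, 5, 6, 7, 8
//  - pq_dim == 0 (automatic selection), or pq_dim * pq_bits is a multiple of 8
//    (assumed reading of the NB in the text)
inline bool pq_params_valid(std::uint32_t pq_dim, std::uint32_t pq_bits) {
    if (pq_bits < 4 || pq_bits > 8) {
        return false;
    }
    if (pq_dim == 0) {
        return true;  // a value is chosen heuristically
    }
    return (static_cast<std::uint64_t>(pq_dim) * pq_bits) % 8 == 0;
}
```

Under this reading, any pq_dim passes when pq_bits is 8, while pq_bits = 5 requires pq_dim to be a multiple of 8.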
-
codebook_gen codebook_kind = codebook_gen::PER_SUBSPACE
How PQ codebooks are created.
-
bool force_random_rotation = false
Apply a random rotation matrix on the input data and queries even if dim % pq_dim == 0.
Note: if dim is not a multiple of pq_dim, a random rotation is always applied to the input data and queries to transform the working space from dim to rot_dim, which may be slightly larger than the original space and is a multiple of pq_dim (rot_dim % pq_dim == 0). However, this transform is not necessary when dim is a multiple of pq_dim (dim == rot_dim, hence no need to add “extra” data columns / features).
By default, if dim == rot_dim, the rotation transform is initialized with the identity matrix. When force_random_rotation == true, a random orthogonal transform matrix is generated regardless of the values of dim and pq_dim.
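The relationship between dim, pq_dim and rot_dim can be sketched as below. Both helpers are hypothetical. Taking rot_dim as the smallest multiple of pq_dim that is not less than dim is an assumption of this sketch; the text only states that rot_dim is a multiple of pq_dim and may be slightly larger than dim.

```cpp
#include <cassert>
#include <cstdint>

// Smallest multiple of pq_dim that is >= dim (assumed definition of rot_dim).
inline std::uint32_t rot_dim_for(std::uint32_t dim, std::uint32_t pq_dim) {
    assert(pq_dim > 0);
    return ((dim + pq_dim - 1) / pq_dim) * pq_dim;
}

// True when a random rotation is used: either it is forced, or dim is not a
// multiple of pq_dim. Otherwise the transform is the identity matrix.
inline bool uses_random_rotation(std::uint32_t dim, std::uint32_t pq_dim,
                                 bool force_random_rotation) {
    assert(pq_dim > 0);
    return force_random_rotation || (dim % pq_dim != 0);
}
```

For dim = 100 and pq_dim = 32 the working space grows to 128 dimensions and a random rotation is always applied; for dim = 128 it is applied only when forced.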
-
bool conservative_memory_allocation = false
By default, the algorithm allocates more space than necessary for individual clusters (list_data). This amortizes the cost of memory allocation and reduces the number of data copies during repeated calls to extend (extending the database).
The alternative is the conservative allocation behavior; when enabled, the algorithm always allocates the minimum amount of memory required to store the given number of records. Set this flag to true if you prefer to use as little GPU memory for the database as possible.
-
struct IVFPQSearchCagraConfig
Public Members
-
uint32_t n_probes = 20
The number of clusters to search.
-
cudaDataType_t lut_dtype = CUDA_R_32F
Data type of look up table to be created dynamically at search time.
Possible values: [CUDA_R_32F, CUDA_R_16F, CUDA_R_8U]
The use of low-precision types reduces the amount of shared memory required at search time, so fast shared memory kernels can be used even for datasets with large dimensionality. Note that the recall is slightly degraded when a low-precision type is selected.
-
cudaDataType_t internal_distance_dtype = CUDA_R_32F
Storage data type for distance/similarity computed at search time.
Possible values: [CUDA_R_16F, CUDA_R_32F]
If the performance limiter at search time is device memory access, selecting FP16 will improve performance slightly.
-
double preferred_shmem_carveout = 1.0
Preferred fraction of SM’s unified memory / L1 cache to be used as shared memory.
Possible values: [0.0 - 1.0] as a fraction of the sharedMemPerMultiprocessor.
One wants to increase the carveout to ensure good GPU occupancy for the main search kernel, but not so far that too little memory is left for use as L1 cache. Note that this value is interpreted only as a hint. Moreover, a GPU usually allows only a fixed set of cache configurations, so the provided value is rounded up to the nearest configuration. Refer to the NVIDIA tuning guide for the target GPU architecture.
Note, this is a low-level tuning parameter that can have drastic negative effects on the search performance if tweaked incorrectly.
-
struct GpuIndexCagraConfig : public faiss::gpu::GpuIndexConfig
Public Members
-
size_t intermediate_graph_degree = 128
Degree of input graph for pruning.
-
size_t graph_degree = 64
Degree of output graph.
-
graph_build_algo build_algo = graph_build_algo::IVF_PQ
ANN algorithm to build knn graph.
-
size_t nn_descent_niter = 20
Number of Iterations to run if building with NN_DESCENT.
-
IVFPQBuildCagraConfig *ivf_pq_params = nullptr
-
IVFPQSearchCagraConfig *ivf_pq_search_params = nullptr
-
int device = 0
GPU device on which the index is resident.
-
MemorySpace memorySpace = MemorySpace::Device
What memory space to use for primary storage. On Pascal and above (CC 6+) architectures, allows GPUs to use more memory than is available on the GPU.
-
bool use_raft = false
Should the index dispatch down to RAFT?
-
struct SearchParametersCagra : public faiss::SearchParameters
Public Members
-
size_t max_queries = 0
Maximum number of queries to search at the same time (batch size). Auto select when 0.
-
size_t itopk_size = 64
Number of intermediate search results retained during the search.
This is the main knob to adjust trade off between accuracy and search speed. Higher values improve the search accuracy.
-
size_t max_iterations = 0
Upper limit of search iterations. Auto select when 0.
-
search_algo algo = search_algo::AUTO
Which search implementation to use.
-
size_t team_size = 0
Number of threads used to calculate a single distance. 4, 8, 16, or 32.
-
size_t search_width = 1
Number of graph nodes to select as the starting point for the search in each iteration (the search width).
-
size_t min_iterations = 0
Lower limit of search iterations.
-
size_t thread_block_size = 0
Thread block size. 0, 64, 128, 256, 512, 1024. Auto selection when 0.
-
size_t hashmap_min_bitlen = 0
Lower limit of hashmap bit length. More than 8.
-
float hashmap_max_fill_rate = 0.5
Upper limit of hashmap fill rate. More than 0.1, less than 0.9.
-
uint32_t num_random_samplings = 1
Number of iterations of initial random seed node selection. 1 or more.
-
uint64_t seed = 0x128394
Bit mask used for initial random seed node selection.
-
IDSelector *sel = nullptr
if non-null, only these IDs will be considered during search.
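Several of the fields above document a restricted set of values. The sketch below checks a few of them. CagraSearchKnobs is a hypothetical mirror of selected fields, not the faiss struct itself, and treating team_size == 0 as "auto select" is an assumption based on its default value.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <initializer_list>

// Hypothetical mirror of a few documented fields, with the documented defaults.
struct CagraSearchKnobs {
    std::size_t team_size = 0;
    std::size_t thread_block_size = 0;
    float hashmap_max_fill_rate = 0.5f;
    std::uint32_t num_random_samplings = 1;
};

inline bool is_one_of(std::size_t value, std::initializer_list<std::size_t> allowed) {
    return std::find(allowed.begin(), allowed.end(), value) != allowed.end();
}

// Returns true when every mirrored field lies in its documented range.
inline bool knobs_in_documented_range(const CagraSearchKnobs& p) {
    // 0 is assumed to mean "auto select" for team_size.
    if (!is_one_of(p.team_size, {0, 4, 8, 16, 32})) return false;
    if (!is_one_of(p.thread_block_size, {0, 64, 128, 256, 512, 1024})) return false;
    // Documented as more than 0.1 and less than 0.9.
    if (!(p.hashmap_max_fill_rate > 0.1f && p.hashmap_max_fill_rate < 0.9f)) return false;
    // Documented as 1 or more.
    if (p.num_random_samplings < 1) return false;
    return true;
}
```

A default-constructed CagraSearchKnobs passes the check, while a value such as thread_block_size = 96 is rejected.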
-
struct GpuIndexCagra : public faiss::gpu::GpuIndex
Public Types
-
using component_t = float
-
using distance_t = float
Public Functions
-
GpuIndexCagra(GpuResourcesProvider *provider, int dims, faiss::MetricType metric = faiss::METRIC_L2, GpuIndexCagraConfig config = GpuIndexCagraConfig())
-
virtual void train(idx_t n, const float *x) override
Trains CAGRA based on the given vector data.
-
void copyFrom(const faiss::IndexHNSWCagra *index)
Initialize ourselves from the given CPU index; will overwrite all data in ourselves
-
void copyTo(faiss::IndexHNSWCagra *index) const
Copy ourselves to the given CPU index; will overwrite all data in the index instance
-
virtual void reset() override
removes all elements from the database.
-
int getDevice() const
Returns the device that this index is resident on.
-
std::shared_ptr<GpuResources> getResources()
Returns a reference to our GpuResources object that manages memory, stream and handle resources on the GPU
-
void setMinPagingSize(size_t size)
Set the minimum data size for searches (in MiB) for which we use CPU -> GPU paging
-
size_t getMinPagingSize() const
Returns the current minimum data size for paged searches.
-
virtual void add(idx_t, const float *x) override
x can be resident on the CPU or any GPU; copies are performed as needed. Handles paged adds if the add set is too large; calls addInternal_.
-
virtual void add_with_ids(idx_t n, const float *x, const idx_t *ids) override
x and ids can be resident on the CPU or any GPU; copies are performed as needed. Handles paged adds if the add set is too large; calls addInternal_.
-
virtual void assign(idx_t n, const float *x, idx_t *labels, idx_t k = 1) const override
x and labels can be resident on the CPU or any GPU; copies are performed as needed.
-
virtual void search(idx_t n, const float *x, idx_t k, float *distances, idx_t *labels, const SearchParameters *params = nullptr) const override
x, distances and labels can be resident on the CPU or any GPU; copies are performed as needed.
-
virtual void search_and_reconstruct(idx_t n, const float *x, idx_t k, float *distances, idx_t *labels, float *recons, const SearchParameters *params = nullptr) const override
x, distances, labels and recons can be resident on the CPU or any GPU; copies are performed as needed.
-
virtual void compute_residual(const float *x, float *residual, idx_t key) const override
Overridden to force GPU indices to provide their own GPU-friendly implementation
-
virtual void compute_residual_n(idx_t n, const float *xs, float *residuals, const idx_t *keys) const override
Overridden to force GPU indices to provide their own GPU-friendly implementation
-
virtual void range_search(idx_t n, const float *x, float radius, RangeSearchResult *result, const SearchParameters *params = nullptr) const
query n vectors of dimension d to the index.
return all vectors with distance < radius. Note that many indexes do not implement the range_search (only the k-NN search is mandatory).
- Parameters:
n – number of vectors
x – input vectors to search, size n * d
radius – search radius
result – result table
-
virtual size_t remove_ids(const IDSelector &sel)
removes IDs from the index. Not supported by all indexes. Returns the number of elements removed.
-
virtual void reconstruct(idx_t key, float *recons) const
Reconstruct a stored vector (or an approximation if lossy coding)
this function may not be defined for some indexes
- Parameters:
key – id of the vector to reconstruct
recons – reconstructed vector (size d)
-
virtual void reconstruct_batch(idx_t n, const idx_t *keys, float *recons) const
Reconstruct several stored vectors (or an approximation if lossy coding)
this function may not be defined for some indexes
- Parameters:
n – number of vectors to reconstruct
keys – ids of the vectors to reconstruct (size n)
recons – reconstructed vectors (size n * d)
-
virtual void reconstruct_n(idx_t i0, idx_t ni, float *recons) const
Reconstruct vectors i0 to i0 + ni - 1
this function may not be defined for some indexes
- Parameters:
i0 – index of the first vector in the sequence
ni – number of vectors in the sequence
recons – reconstructed vectors (size ni * d)
-
virtual DistanceComputer *get_distance_computer() const
Get a DistanceComputer (defined in AuxIndexStructures) object for this kind of index.
DistanceComputer is implemented for indexes that support random access of their vectors.
-
virtual size_t sa_code_size() const
size of the produced codes in bytes
-
virtual void sa_encode(idx_t n, const float *x, uint8_t *bytes) const
encode a set of vectors
- Parameters:
n – number of vectors
x – input vectors, size n * d
bytes – output encoded vectors, size n * sa_code_size()
-
virtual void sa_decode(idx_t n, const uint8_t *bytes, float *x) const
decode a set of vectors
- Parameters:
n – number of vectors
bytes – input encoded vectors, size n * sa_code_size()
x – output vectors, size n * d
-
virtual void merge_from(Index &otherIndex, idx_t add_id = 0)
moves the entries from another dataset to self. On output, other is empty. add_id is added to all moved ids (for sequential ids, this would be this->ntotal)
-
virtual void check_compatible_for_merge(const Index &otherIndex) const
check that the two indexes are compatible (ie, they are trained in the same way and have the same parameters). Otherwise throw.
Public Members
-
int d
vector dimension
-
idx_t ntotal
total nb of indexed vectors
-
bool verbose
verbosity level
-
bool is_trained
set if the Index does not require training, or if training is done already
-
MetricType metric_type
type of metric this index uses for search
-
float metric_arg
argument of the metric type
Protected Functions
-
virtual bool addImplRequiresIDs_() const override
Does addImpl_ require IDs? If so, and no IDs are provided, we will generate them sequentially based on the order in which the IDs are added
-
virtual void addImpl_(idx_t n, const float *x, const idx_t *ids) override
Overridden to actually perform the add. All data is guaranteed to be resident on our device.
-
virtual void searchImpl_(idx_t n, const float *x, int k, float *distances, idx_t *labels, const SearchParameters *search_params) const override
Called from GpuIndex for search.
Protected Attributes
-
const GpuIndexCagraConfig cagraConfig_
Our configuration options.
-
std::shared_ptr<RaftCagra> index_
Instance that we own; holds the underlying RAFT CAGRA index.
-
std::shared_ptr<GpuResources> resources_
Manages streams, cuBLAS handles and scratch memory for devices.
-
const GpuIndexConfig config_
Our configuration options.
-
size_t minPagedSize_
Size above which we page copies from the CPU to GPU.
-
struct GpuIndexFlatConfig : public faiss::gpu::GpuIndexConfig
Public Members
-
bool ALIGNED(8) useFloat16
Whether or not data is stored as float16.
-
bool storeTransposed = false
Deprecated: no longer used. Previously used to indicate whether internal storage of vectors is transposed.
-
int device = 0
GPU device on which the index is resident.
-
MemorySpace memorySpace = MemorySpace::Device
What memory space to use for primary storage. On Pascal and above (CC 6+) architectures, allows GPUs to use more memory than is available on the GPU.
-
bool use_raft = false
Should the index dispatch down to RAFT?
-
class GpuIndexFlat : public faiss::gpu::GpuIndex
- #include <GpuIndexFlat.h>
Wrapper around the GPU implementation that looks like faiss::IndexFlat; copies over centroid data from a given faiss::IndexFlat
Subclassed by faiss::gpu::GpuIndexFlatIP, faiss::gpu::GpuIndexFlatL2
Public Types
-
using component_t = float
-
using distance_t = float
Public Functions
-
GpuIndexFlat(GpuResourcesProvider *provider, const faiss::IndexFlat *index, GpuIndexFlatConfig config = GpuIndexFlatConfig())
Construct from a pre-existing faiss::IndexFlat instance, copying data over to the given GPU
-
GpuIndexFlat(GpuResourcesProvider *provider, int dims, faiss::MetricType metric, GpuIndexFlatConfig config = GpuIndexFlatConfig())
Construct an empty instance that can be added to.
-
~GpuIndexFlat() override
-
void copyFrom(const faiss::IndexFlat *index)
Initialize ourselves from the given CPU index; will overwrite all data in ourselves
-
void copyTo(faiss::IndexFlat *index) const
Copy ourselves to the given CPU index; will overwrite all data in the index instance
-
size_t getNumVecs() const
Returns the number of vectors we contain.
-
virtual void reset() override
Clears all vectors from this index.
-
virtual void train(idx_t n, const float *x) override
This index is not trained, so this does nothing.
-
virtual void add(idx_t, const float *x) override
Overrides to avoid excessive copies.
-
virtual void reconstruct(idx_t key, float *out) const override
Reconstruction methods; prefer the batch reconstruct as it will be more efficient
-
virtual void reconstruct_n(idx_t i0, idx_t num, float *out) const override
Batch reconstruction method.
-
virtual void reconstruct_batch(idx_t n, const idx_t *keys, float *out) const override
Batch reconstruction method.
-
virtual void compute_residual(const float *x, float *residual, idx_t key) const override
Compute residual.
-
virtual void compute_residual_n(idx_t n, const float *xs, float *residuals, const idx_t *keys) const override
Compute residual (batch mode)
-
inline FlatIndex *getGpuData()
For internal access.
-
int getDevice() const
Returns the device that this index is resident on.
-
std::shared_ptr<GpuResources> getResources()
Returns a reference to our GpuResources object that manages memory, stream and handle resources on the GPU
-
void setMinPagingSize(size_t size)
Set the minimum data size for searches (in MiB) for which we use CPU -> GPU paging
-
size_t getMinPagingSize() const
Returns the current minimum data size for paged searches.
-
virtual void add_with_ids(idx_t n, const float *x, const idx_t *ids) override
x and ids can be resident on the CPU or any GPU; copies are performed as needed. Handles paged adds if the add set is too large; calls addInternal_.
-
virtual void assign(idx_t n, const float *x, idx_t *labels, idx_t k = 1) const override
x and labels can be resident on the CPU or any GPU; copies are performed as needed.
-
virtual void search(idx_t n, const float *x, idx_t k, float *distances, idx_t *labels, const SearchParameters *params = nullptr) const override
x, distances and labels can be resident on the CPU or any GPU; copies are performed as needed.
-
virtual void search_and_reconstruct(idx_t n, const float *x, idx_t k, float *distances, idx_t *labels, float *recons, const SearchParameters *params = nullptr) const override
x, distances, labels and recons can be resident on the CPU or any GPU; copies are performed as needed.
-
virtual void range_search(idx_t n, const float *x, float radius, RangeSearchResult *result, const SearchParameters *params = nullptr) const
query n vectors of dimension d to the index.
return all vectors with distance < radius. Note that many indexes do not implement the range_search (only the k-NN search is mandatory).
- Parameters:
n – number of vectors
x – input vectors to search, size n * d
radius – search radius
result – result table
-
virtual size_t remove_ids(const IDSelector &sel)
removes IDs from the index. Not supported by all indexes. Returns the number of elements removed.
-
virtual DistanceComputer *get_distance_computer() const
Get a DistanceComputer (defined in AuxIndexStructures) object for this kind of index.
DistanceComputer is implemented for indexes that support random access of their vectors.
-
virtual size_t sa_code_size() const
size of the produced codes in bytes
-
virtual void sa_encode(idx_t n, const float *x, uint8_t *bytes) const
encode a set of vectors
- Parameters:
n – number of vectors
x – input vectors, size n * d
bytes – output encoded vectors, size n * sa_code_size()
-
virtual void sa_decode(idx_t n, const uint8_t *bytes, float *x) const
decode a set of vectors
- Parameters:
n – number of vectors
bytes – input encoded vectors, size n * sa_code_size()
x – output vectors, size n * d
-
virtual void merge_from(Index &otherIndex, idx_t add_id = 0)
moves the entries from another dataset to self. On output, other is empty. add_id is added to all moved ids (for sequential ids, this would be this->ntotal)
-
virtual void check_compatible_for_merge(const Index &otherIndex) const
check that the two indexes are compatible (ie, they are trained in the same way and have the same parameters). Otherwise throw.
Public Members
-
int d
vector dimension
-
idx_t ntotal
total nb of indexed vectors
-
bool verbose
verbosity level
-
bool is_trained
set if the Index does not require training, or if training is done already
-
MetricType metric_type
type of metric this index uses for search
-
float metric_arg
argument of the metric type
Protected Functions
-
void resetIndex_(int dims)
-
virtual bool addImplRequiresIDs_() const override
Flat index does not require IDs as there is no storage available for them
-
virtual void addImpl_(idx_t n, const float *x, const idx_t *ids) override
Called from GpuIndex for add.
-
virtual void searchImpl_(idx_t n, const float *x, int k, float *distances, idx_t *labels, const SearchParameters *params) const override
Called from GpuIndex for search.
Protected Attributes
-
const GpuIndexFlatConfig flatConfig_
Our configuration options.
-
std::unique_ptr<FlatIndex> data_
Holds our GPU data containing the list of vectors.
-
std::shared_ptr<GpuResources> resources_
Manages streams, cuBLAS handles and scratch memory for devices.
-
const GpuIndexConfig config_
Our configuration options.
-
size_t minPagedSize_
Size above which we page copies from the CPU to GPU.
-
class GpuIndexFlatL2 : public faiss::gpu::GpuIndexFlat
- #include <GpuIndexFlat.h>
Wrapper around the GPU implementation that looks like faiss::IndexFlatL2; copies over centroid data from a given faiss::IndexFlat
Public Types
-
using component_t = float
-
using distance_t = float
Public Functions
-
GpuIndexFlatL2(GpuResourcesProvider *provider, faiss::IndexFlatL2 *index, GpuIndexFlatConfig config = GpuIndexFlatConfig())
Construct from a pre-existing faiss::IndexFlatL2 instance, copying data over to the given GPU
-
GpuIndexFlatL2(GpuResourcesProvider *provider, int dims, GpuIndexFlatConfig config = GpuIndexFlatConfig())
Construct an empty instance that can be added to.
-
void copyFrom(faiss::IndexFlat *index)
Initialize ourselves from the given CPU index; will overwrite all data in ourselves
-
void copyTo(faiss::IndexFlat *index)
Copy ourselves to the given CPU index; will overwrite all data in the index instance
-
void copyFrom(const faiss::IndexFlat *index)
Initialize ourselves from the given CPU index; will overwrite all data in ourselves
-
void copyTo(faiss::IndexFlat *index) const
Copy ourselves to the given CPU index; will overwrite all data in the index instance
-
size_t getNumVecs() const
Returns the number of vectors we contain.
-
virtual void reset() override
Clears all vectors from this index.
-
virtual void train(idx_t n, const float *x) override
This index is not trained, so this does nothing.
-
virtual void add(idx_t, const float *x) override
Overrides to avoid excessive copies.
-
virtual void reconstruct(idx_t key, float *out) const override
Reconstruction methods; prefer the batch reconstruct as it will be more efficient
-
virtual void reconstruct_n(idx_t i0, idx_t num, float *out) const override
Batch reconstruction method.
-
virtual void reconstruct_batch(idx_t n, const idx_t *keys, float *out) const override
Batch reconstruction method.
-
virtual void compute_residual(const float *x, float *residual, idx_t key) const override
Compute residual.
-
virtual void compute_residual_n(idx_t n, const float *xs, float *residuals, const idx_t *keys) const override
Compute residual (batch mode)
-
inline FlatIndex *getGpuData()
For internal access.
-
int getDevice() const
Returns the device that this index is resident on.
-
std::shared_ptr<GpuResources> getResources()
Returns a reference to our GpuResources object that manages memory, stream and handle resources on the GPU
-
void setMinPagingSize(size_t size)
Set the minimum data size for searches (in MiB) for which we use CPU -> GPU paging
-
size_t getMinPagingSize() const
Returns the current minimum data size for paged searches.
-
virtual void add_with_ids(idx_t n, const float *x, const idx_t *ids) override
x and ids can be resident on the CPU or any GPU; copies are performed as needed. Handles paged adds if the add set is too large; calls addInternal_.
-
virtual void assign(idx_t n, const float *x, idx_t *labels, idx_t k = 1) const override
x and labels can be resident on the CPU or any GPU; copies are performed as needed.
-
virtual void search(idx_t n, const float *x, idx_t k, float *distances, idx_t *labels, const SearchParameters *params = nullptr) const override
x, distances and labels can be resident on the CPU or any GPU; copies are performed as needed.
-
virtual void search_and_reconstruct(idx_t n, const float *x, idx_t k, float *distances, idx_t *labels, float *recons, const SearchParameters *params = nullptr) const override
x, distances, labels and recons can be resident on the CPU or any GPU; copies are performed as needed.
-
virtual void range_search(idx_t n, const float *x, float radius, RangeSearchResult *result, const SearchParameters *params = nullptr) const
query n vectors of dimension d to the index.
return all vectors with distance < radius. Note that many indexes do not implement the range_search (only the k-NN search is mandatory).
- Parameters:
n – number of vectors
x – input vectors to search, size n * d
radius – search radius
result – result table
-
virtual size_t remove_ids(const IDSelector &sel)
removes IDs from the index. Not supported by all indexes. Returns the number of elements removed.
-
virtual DistanceComputer *get_distance_computer() const
Get a DistanceComputer (defined in AuxIndexStructures) object for this kind of index.
DistanceComputer is implemented for indexes that support random access of their vectors.
-
virtual size_t sa_code_size() const
size of the produced codes in bytes
-
virtual void sa_encode(idx_t n, const float *x, uint8_t *bytes) const
encode a set of vectors
- Parameters:
n – number of vectors
x – input vectors, size n * d
bytes – output encoded vectors, size n * sa_code_size()
-
virtual void sa_decode(idx_t n, const uint8_t *bytes, float *x) const
decode a set of vectors
- Parameters:
n – number of vectors
bytes – input encoded vectors, size n * sa_code_size()
x – output vectors, size n * d
-
virtual void merge_from(Index &otherIndex, idx_t add_id = 0)
moves the entries from another dataset to self. On output, other is empty. add_id is added to all moved ids (for sequential ids, this would be this->ntotal)
-
virtual void check_compatible_for_merge(const Index &otherIndex) const
Checks that the two indexes are compatible (i.e., they are trained in the same way and have the same parameters); throws otherwise.
Public Members
-
int d
vector dimension
-
idx_t ntotal
total number of indexed vectors
-
bool verbose
verbosity level
-
bool is_trained
set if the Index does not require training, or if training is done already
-
MetricType metric_type
type of metric this index uses for search
-
float metric_arg
argument of the metric type
Protected Functions
-
void resetIndex_(int dims)
-
virtual bool addImplRequiresIDs_() const override
Flat index does not require IDs as there is no storage available for them
-
virtual void addImpl_(idx_t n, const float *x, const idx_t *ids) override
Called from GpuIndex for add.
-
virtual void searchImpl_(idx_t n, const float *x, int k, float *distances, idx_t *labels, const SearchParameters *params) const override
Called from GpuIndex for search.
Protected Attributes
-
const GpuIndexFlatConfig flatConfig_
Our configuration options.
-
std::unique_ptr<FlatIndex> data_
Holds our GPU data containing the list of vectors.
-
std::shared_ptr<GpuResources> resources_
Manages streams, cuBLAS handles and scratch memory for devices.
-
const GpuIndexConfig config_
Our configuration options.
-
size_t minPagedSize_
Size above which we page copies from the CPU to GPU.
-
class GpuIndexFlatIP : public faiss::gpu::GpuIndexFlat
- #include <GpuIndexFlat.h>
Wrapper around the GPU implementation that looks like faiss::IndexFlatIP; copies over centroid data from a given faiss::IndexFlat
Public Types
-
using component_t = float
-
using distance_t = float
Public Functions
-
GpuIndexFlatIP(GpuResourcesProvider *provider, faiss::IndexFlatIP *index, GpuIndexFlatConfig config = GpuIndexFlatConfig())
Construct from a pre-existing faiss::IndexFlatIP instance, copying data over to the given GPU
-
GpuIndexFlatIP(GpuResourcesProvider *provider, int dims, GpuIndexFlatConfig config = GpuIndexFlatConfig())
Construct an empty instance that can be added to.
-
void copyFrom(faiss::IndexFlat *index)
Initialize ourselves from the given CPU index; will overwrite all data in ourselves
-
void copyTo(faiss::IndexFlat *index)
Copy ourselves to the given CPU index; will overwrite all data in the index instance
-
void copyFrom(const faiss::IndexFlat *index)
Initialize ourselves from the given CPU index; will overwrite all data in ourselves
-
void copyTo(faiss::IndexFlat *index) const
Copy ourselves to the given CPU index; will overwrite all data in the index instance
-
size_t getNumVecs() const
Returns the number of vectors we contain.
-
virtual void reset() override
Clears all vectors from this index.
-
virtual void train(idx_t n, const float *x) override
This index is not trained, so this does nothing.
-
virtual void add(idx_t, const float *x) override
Overrides to avoid excessive copies.
-
virtual void reconstruct(idx_t key, float *out) const override
Reconstruction methods; prefer the batch reconstruct as it will be more efficient
-
virtual void reconstruct_n(idx_t i0, idx_t num, float *out) const override
Batch reconstruction method.
-
virtual void reconstruct_batch(idx_t n, const idx_t *keys, float *out) const override
Batch reconstruction method.
-
virtual void compute_residual(const float *x, float *residual, idx_t key) const override
Compute residual.
-
virtual void compute_residual_n(idx_t n, const float *xs, float *residuals, const idx_t *keys) const override
Compute residual (batch mode)
-
inline FlatIndex *getGpuData()
For internal access.
-
int getDevice() const
Returns the device that this index is resident on.
-
std::shared_ptr<GpuResources> getResources()
Returns a reference to our GpuResources object that manages memory, stream and handle resources on the GPU
-
void setMinPagingSize(size_t size)
Set the minimum data size for searches (in MiB) for which we use CPU -> GPU paging
-
size_t getMinPagingSize() const
Returns the current minimum data size for paged searches.
-
virtual void add_with_ids(idx_t n, const float *x, const idx_t *ids) override
x and ids can be resident on the CPU or any GPU; copies are performed as needed. Handles paged adds if the add set is too large; calls addInternal_
-
virtual void assign(idx_t n, const float *x, idx_t *labels, idx_t k = 1) const override
x and labels can be resident on the CPU or any GPU; copies are performed as needed
-
virtual void search(idx_t n, const float *x, idx_t k, float *distances, idx_t *labels, const SearchParameters *params = nullptr) const override
x, distances and labels can be resident on the CPU or any GPU; copies are performed as needed
-
virtual void search_and_reconstruct(idx_t n, const float *x, idx_t k, float *distances, idx_t *labels, float *recons, const SearchParameters *params = nullptr) const override
x, distances, labels and recons can be resident on the CPU or any GPU; copies are performed as needed
-
virtual void range_search(idx_t n, const float *x, float radius, RangeSearchResult *result, const SearchParameters *params = nullptr) const
Queries the index with n vectors of dimension d and returns all vectors with distance < radius. Note that many indexes do not implement range_search (only the k-NN search is mandatory).
- Parameters:
n – number of vectors
x – input vectors to search, size n * d
radius – search radius
result – result table
-
virtual size_t remove_ids(const IDSelector &sel)
removes IDs from the index. Not supported by all indexes. Returns the number of elements removed.
-
virtual DistanceComputer *get_distance_computer() const
Get a DistanceComputer (defined in AuxIndexStructures) object for this kind of index.
DistanceComputer is implemented for indexes that support random access of their vectors.
-
virtual size_t sa_code_size() const
size of the produced codes in bytes
-
virtual void sa_encode(idx_t n, const float *x, uint8_t *bytes) const
encode a set of vectors
- Parameters:
n – number of vectors
x – input vectors, size n * d
bytes – output encoded vectors, size n * sa_code_size()
-
virtual void sa_decode(idx_t n, const uint8_t *bytes, float *x) const
decode a set of vectors
- Parameters:
n – number of vectors
bytes – input encoded vectors, size n * sa_code_size()
x – output vectors, size n * d
-
virtual void merge_from(Index &otherIndex, idx_t add_id = 0)
Moves the entries from another index into this one. On return, otherIndex is empty. add_id is added to all moved ids (for sequential ids, this would be this->ntotal)
-
virtual void check_compatible_for_merge(const Index &otherIndex) const
Checks that the two indexes are compatible (i.e., they are trained in the same way and have the same parameters); throws otherwise.
Public Members
-
int d
vector dimension
-
idx_t ntotal
total number of indexed vectors
-
bool verbose
verbosity level
-
bool is_trained
set if the Index does not require training, or if training is done already
-
MetricType metric_type
type of metric this index uses for search
-
float metric_arg
argument of the metric type
Protected Functions
-
void resetIndex_(int dims)
-
virtual bool addImplRequiresIDs_() const override
Flat index does not require IDs as there is no storage available for them
-
virtual void addImpl_(idx_t n, const float *x, const idx_t *ids) override
Called from GpuIndex for add.
-
virtual void searchImpl_(idx_t n, const float *x, int k, float *distances, idx_t *labels, const SearchParameters *params) const override
Called from GpuIndex for search.
Protected Attributes
-
const GpuIndexFlatConfig flatConfig_
Our configuration options.
-
std::unique_ptr<FlatIndex> data_
Holds our GPU data containing the list of vectors.
-
std::shared_ptr<GpuResources> resources_
Manages streams, cuBLAS handles and scratch memory for devices.
-
const GpuIndexConfig config_
Our configuration options.
-
size_t minPagedSize_
Size above which we page copies from the CPU to GPU.
-
struct GpuIndexIVFConfig : public faiss::gpu::GpuIndexConfig
Subclassed by faiss::gpu::GpuIndexIVFFlatConfig, faiss::gpu::GpuIndexIVFPQConfig, faiss::gpu::GpuIndexIVFScalarQuantizerConfig
Public Members
-
IndicesOptions indicesOptions = INDICES_64_BIT
Index storage options for the GPU.
-
GpuIndexFlatConfig flatConfig
Configuration for the coarse quantizer object.
-
bool allowCpuCoarseQuantizer = false
This flag controls the CPU fallback logic for the coarse quantizer component of the index. When set to false (default), the cloner will throw an exception for indices not implemented on GPU. When set to true, it will fall back to a CPU implementation.
-
int device = 0
GPU device on which the index is resident.
-
MemorySpace memorySpace = MemorySpace::Device
What memory space to use for primary storage. On Pascal and above (CC 6+) architectures, allows GPUs to use more memory than is available on the GPU.
-
bool use_raft = false
Should the index dispatch down to RAFT?
-
class GpuIndexIVF : public faiss::gpu::GpuIndex, public faiss::IndexIVFInterface
- #include <GpuIndexIVF.h>
Base class of all GPU IVF index types. This (for now) deliberately does not inherit from IndexIVF, as many of the public data members and much of the functionality in IndexIVF are not supported in the same manner on the GPU.
Subclassed by faiss::gpu::GpuIndexIVFFlat, faiss::gpu::GpuIndexIVFPQ, faiss::gpu::GpuIndexIVFScalarQuantizer
Public Types
-
using component_t = float
-
using distance_t = float
Public Functions
-
GpuIndexIVF(GpuResourcesProvider *provider, int dims, faiss::MetricType metric, float metricArg, idx_t nlist, GpuIndexIVFConfig config = GpuIndexIVFConfig())
Version that auto-constructs a flat coarse quantizer based on the desired metric
-
GpuIndexIVF(GpuResourcesProvider *provider, Index *coarseQuantizer, int dims, faiss::MetricType metric, float metricArg, idx_t nlist, GpuIndexIVFConfig config = GpuIndexIVFConfig())
Version that takes a coarse quantizer instance. The GpuIndexIVF does not own the coarseQuantizer instance by default (functions like IndexIVF).
-
~GpuIndexIVF() override
-
virtual void updateQuantizer() = 0
Should be called if the user ever changes the state of the IVF coarse quantizer manually (e.g., substitutes a new instance or changes vectors in the coarse quantizer outside the scope of training)
-
virtual idx_t getNumLists() const
Returns the number of inverted lists we’re managing.
-
virtual idx_t getListLength(idx_t listId) const
Returns the number of vectors present in a particular inverted list.
-
virtual std::vector<uint8_t> getListVectorData(idx_t listId, bool gpuFormat = false) const
Return the encoded vector data contained in a particular inverted list, for debugging purposes. If gpuFormat is true, the data is returned as it is encoded in the GPU-side representation. Otherwise, it is converted to the CPU-compliant format; the native GPU format may differ.
-
virtual std::vector<idx_t> getListIndices(idx_t listId) const
Return the vector indices contained in a particular inverted list, for debugging purposes.
-
virtual void search_preassigned(idx_t n, const float *x, idx_t k, const idx_t *assign, const float *centroid_dis, float *distances, idx_t *labels, bool store_pairs, const SearchParametersIVF *params = nullptr, IndexIVFStats *stats = nullptr) const override
Search a set of vectors that are pre-quantized by the IVF quantizer. Fill in the corresponding heaps with the query results. The default implementation uses InvertedListScanners to do the search.
- Parameters:
n – number of vectors to query
x – query vectors, size n * d
assign – coarse quantization indices, size n * nprobe
centroid_dis – distances to coarse centroids, size n * nprobe
distances – output distances, size n * k
labels – output labels, size n * k
store_pairs – store the inverted list index and the inverted list offset in the upper/lower 32 bits of the result instead of ids (used for reranking).
params – used to override the object’s search parameters
stats – search stats to be updated (can be null)
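The store_pairs layout described above (inverted list index in the upper 32 bits, list offset in the lower 32 bits of a 64-bit label) can be sketched as follows. The helper names pack_pair, unpack_list_no and unpack_offset are hypothetical and not part of the documented API; the restriction to non-negative labels is an assumption of this sketch.

```cpp
#include <cassert>
#include <cstdint>

// Pack (list_no, offset) into one 64-bit label:
// list_no goes in the upper 32 bits, offset in the lower 32 bits.
inline std::int64_t pack_pair(std::uint32_t list_no, std::uint32_t offset) {
    assert(list_no <= 0x7fffffffu); // assumption: keep the packed label non-negative
    return static_cast<std::int64_t>(
            (static_cast<std::uint64_t>(list_no) << 32) | offset);
}

// Recover the inverted list index from the upper 32 bits.
inline std::uint32_t unpack_list_no(std::int64_t label) {
    return static_cast<std::uint32_t>(static_cast<std::uint64_t>(label) >> 32);
}

// Recover the offset within the list from the lower 32 bits.
inline std::uint32_t unpack_offset(std::int64_t label) {
    return static_cast<std::uint32_t>(
            static_cast<std::uint64_t>(label) & 0xffffffffu);
}
```

A reranking step can then look up the stored code directly from (list, offset) without translating through user ids.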
-
virtual void range_search_preassigned(idx_t nx, const float *x, float radius, const idx_t *keys, const float *coarse_dis, RangeSearchResult *result, bool store_pairs = false, const IVFSearchParameters *params = nullptr, IndexIVFStats *stats = nullptr) const override
Range search a set of vectors that are pre-quantized by the IVF quantizer. Fill in the RangeSearchResult results. The default implementation uses InvertedListScanners to do the search.
- Parameters:
nx – number of vectors to query
x – query vectors, size nx * d
radius – search radius
keys – coarse quantization indices, size nx * nprobe
coarse_dis – distances to coarse centroids, size nx * nprobe
result – output results
store_pairs – store the inverted list index and the inverted list offset in the upper/lower 32 bits of the result instead of ids (used for reranking).
params – used to override the object’s search parameters
stats – search stats to be updated (can be null)
-
int getDevice() const
Returns the device that this index is resident on.
-
std::shared_ptr<GpuResources> getResources()
Returns a reference to our GpuResources object that manages memory, stream and handle resources on the GPU
-
void setMinPagingSize(size_t size)
Set the minimum data size for searches (in MiB) for which we use CPU -> GPU paging
-
size_t getMinPagingSize() const
Returns the current minimum data size for paged searches.
-
virtual void add(idx_t, const float *x) override
x can be resident on the CPU or any GPU; copies are performed as needed. Handles paged adds if the add set is too large; calls addInternal_
-
virtual void add_with_ids(idx_t n, const float *x, const idx_t *ids) override
x and ids can be resident on the CPU or any GPU; copies are performed as needed. Handles paged adds if the add set is too large; calls addInternal_
-
virtual void assign(idx_t n, const float *x, idx_t *labels, idx_t k = 1) const override
x and labels can be resident on the CPU or any GPU; copies are performed as needed
-
virtual void search(idx_t n, const float *x, idx_t k, float *distances, idx_t *labels, const SearchParameters *params = nullptr) const override
x, distances and labels can be resident on the CPU or any GPU; copies are performed as needed
-
virtual void search_and_reconstruct(idx_t n, const float *x, idx_t k, float *distances, idx_t *labels, float *recons, const SearchParameters *params = nullptr) const override
x, distances, labels and recons can be resident on the CPU or any GPU; copies are performed as needed
-
virtual void compute_residual(const float *x, float *residual, idx_t key) const override
Overridden to force GPU indices to provide their own GPU-friendly implementation
-
virtual void compute_residual_n(idx_t n, const float *xs, float *residuals, const idx_t *keys) const override
Overridden to force GPU indices to provide their own GPU-friendly implementation
-
virtual void train(idx_t n, const float *x)
Perform training on a representative set of vectors
- Parameters:
n – number of training vectors
x – training vectors, size n * d
-
virtual void range_search(idx_t n, const float *x, float radius, RangeSearchResult *result, const SearchParameters *params = nullptr) const
Queries the index with n vectors of dimension d and returns all vectors with distance < radius. Note that many indexes do not implement range_search (only the k-NN search is mandatory).
- Parameters:
n – number of vectors
x – input vectors to search, size n * d
radius – search radius
result – result table
-
virtual void reset() = 0
removes all elements from the database.
-
virtual size_t remove_ids(const IDSelector &sel)
removes IDs from the index. Not supported by all indexes. Returns the number of elements removed.
-
virtual void reconstruct(idx_t key, float *recons) const
Reconstruct a stored vector (or an approximation if lossy coding)
this function may not be defined for some indexes
- Parameters:
key – id of the vector to reconstruct
recons – reconstructed vector (size d)
-
virtual void reconstruct_batch(idx_t n, const idx_t *keys, float *recons) const
Reconstruct several stored vectors (or an approximation if lossy coding)
this function may not be defined for some indexes
- Parameters:
n – number of vectors to reconstruct
keys – ids of the vectors to reconstruct (size n)
recons – reconstructed vectors (size n * d)
-
virtual void reconstruct_n(idx_t i0, idx_t ni, float *recons) const
Reconstruct vectors i0 to i0 + ni - 1
this function may not be defined for some indexes
- Parameters:
i0 – index of the first vector in the sequence
ni – number of vectors in the sequence
recons – reconstructed vectors (size ni * d)
-
virtual DistanceComputer *get_distance_computer() const
Get a DistanceComputer (defined in AuxIndexStructures) object for this kind of index.
DistanceComputer is implemented for indexes that support random access of their vectors.
-
virtual size_t sa_code_size() const
size of the produced codes in bytes
-
virtual void sa_encode(idx_t n, const float *x, uint8_t *bytes) const
encode a set of vectors
- Parameters:
n – number of vectors
x – input vectors, size n * d
bytes – output encoded vectors, size n * sa_code_size()
-
virtual void sa_decode(idx_t n, const uint8_t *bytes, float *x) const
decode a set of vectors
- Parameters:
n – number of vectors
bytes – input encoded vectors, size n * sa_code_size()
x – output vectors, size n * d
-
virtual void merge_from(Index &otherIndex, idx_t add_id = 0)
Moves the entries from another index into this one. On return, otherIndex is empty. add_id is added to all moved ids (for sequential ids, this would be this->ntotal)
-
virtual void check_compatible_for_merge(const Index &otherIndex) const
Checks that the two indexes are compatible (i.e., they are trained in the same way and have the same parameters); throws otherwise.
-
void train_q1(size_t n, const float *x, bool verbose, MetricType metric_type)
Trains the quantizer and calls train_residual to train sub-quantizers.
-
size_t coarse_code_size() const
compute the number of bytes required to store list ids
-
void encode_listno(idx_t list_no, uint8_t *code) const
-
idx_t decode_listno(const uint8_t *code) const
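The relationship between coarse_code_size, encode_listno and decode_listno can be sketched with free functions. This is an illustration under stated assumptions, not the confirmed library implementation: it assumes the code size is the smallest number of bytes able to represent nlist - 1 and that list numbers are written in little-endian order. Passing nlist as an explicit argument is also a choice made for this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Bytes needed to represent the largest list id, nlist - 1.
inline std::size_t coarse_code_size(std::size_t nlist) {
    assert(nlist > 0);
    std::size_t nl = nlist - 1;
    std::size_t nbyte = 0;
    while (nl > 0) {
        ++nbyte;
        nl >>= 8;
    }
    return nbyte;
}

// Write list_no into code, least significant byte first (assumed order).
// code must have room for coarse_code_size(nlist) bytes.
inline void encode_listno(std::size_t nlist, std::int64_t list_no, std::uint8_t* code) {
    assert(list_no >= 0 && static_cast<std::size_t>(list_no) < nlist);
    const std::size_t nb = coarse_code_size(nlist);
    for (std::size_t i = 0; i < nb; ++i) {
        code[i] = static_cast<std::uint8_t>(list_no & 0xff);
        list_no >>= 8;
    }
}

// Read a list number back from its byte code.
inline std::int64_t decode_listno(std::size_t nlist, const std::uint8_t* code) {
    std::int64_t list_no = 0;
    const std::size_t nb = coarse_code_size(nlist);
    for (std::size_t i = 0; i < nb; ++i) {
        list_no |= static_cast<std::int64_t>(code[i]) << (8 * i);
    }
    return list_no;
}
```

Under these assumptions an index with up to 256 lists needs one byte per list id, and one with 257 to 65536 lists needs two.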
Public Members
-
int d
vector dimension
-
idx_t ntotal
total number of indexed vectors
-
bool verbose
verbosity level
-
bool is_trained
set if the Index does not require training, or if training is done already
-
MetricType metric_type
type of metric this index uses for search
-
float metric_arg
argument of the metric type
-
size_t nprobe = 1
number of probes at query time
-
size_t max_codes = 0
max nb of codes to visit to do a query
-
Index *quantizer = nullptr
quantizer that maps vectors to inverted lists
-
size_t nlist = 0
number of inverted lists
-
char quantizer_trains_alone = 0
= 0: use the quantizer as index in a kmeans training
= 1: just pass on the training set to the train() of the quantizer
= 2: kmeans training on a flat index + add the centroids to the quantizer
-
bool own_fields = false
whether object owns the quantizer
-
ClusteringParameters cp
to override default clustering params
-
Index *clustering_index = nullptr
to override index used during clustering
Protected Functions
-
int getCurrentNProbe_(const SearchParameters *params) const
From either the current set nprobe or the SearchParameters if available, return the nprobe that we should use for the current search
-
void verifyIVFSettings_() const
-
virtual bool addImplRequiresIDs_() const override
Does addImpl_ require IDs? If so, and no IDs are provided, we will generate them sequentially based on the order in which the IDs are added
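The sequential id generation mentioned above can be sketched as follows. The helper sequential_ids is hypothetical, and starting the sequence at the current ntotal is an assumption of this sketch rather than something the description states.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <vector>

// Generate n consecutive ids for vectors added without explicit ids.
// Assumption: numbering continues from the number of vectors already stored.
inline std::vector<std::int64_t> sequential_ids(std::int64_t ntotal, std::int64_t n) {
    assert(ntotal >= 0 && n >= 0);
    std::vector<std::int64_t> ids(static_cast<std::size_t>(n));
    std::iota(ids.begin(), ids.end(), ntotal); // ntotal, ntotal + 1, ...
    return ids;
}
```

With five vectors already stored, adding three more without ids would produce the ids 5, 6 and 7 under this assumption.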
-
virtual void trainQuantizer_(idx_t n, const float *x)
-
virtual void addImpl_(idx_t n, const float *x, const idx_t *ids) override
Called from GpuIndex for add/add_with_ids.
-
virtual void searchImpl_(idx_t n, const float *x, int k, float *distances, idx_t *labels, const SearchParameters *params) const override
Called from GpuIndex for search.
Protected Attributes
-
const GpuIndexIVFConfig ivfConfig_
Our configuration options.
-
std::shared_ptr<IVFBase> baseIndex_
For a trained/initialized index, this is a reference to the base class.
-
std::shared_ptr<GpuResources> resources_
Manages streams, cuBLAS handles and scratch memory for devices.
-
const GpuIndexConfig config_
Our configuration options.
-
size_t minPagedSize_
Size above which we page copies from the CPU to GPU.
Private Functions
-
void init_()
Shared initialization functions.
-
struct GpuIndexIVFFlatConfig : public faiss::gpu::GpuIndexIVFConfig
Public Members
-
bool interleavedLayout = true
Use the alternative memory layout for the IVF lists (currently the default)
-
IndicesOptions indicesOptions = INDICES_64_BIT
Index storage options for the GPU.
-
GpuIndexFlatConfig flatConfig
Configuration for the coarse quantizer object.
-
bool allowCpuCoarseQuantizer = false
This flag controls the CPU fallback logic for the coarse quantizer component of the index. When set to false (default), the cloner will throw an exception for indices not implemented on GPU. When set to true, it will fall back to a CPU implementation.
-
int device = 0
GPU device on which the index is resident.
-
MemorySpace memorySpace = MemorySpace::Device
What memory space to use for primary storage. On Pascal and above (CC 6+) architectures, allows GPUs to use more memory than is available on the GPU.
-
bool use_raft = false
Should the index dispatch down to RAFT?
-
class GpuIndexIVFFlat : public faiss::gpu::GpuIndexIVF
- #include <GpuIndexIVFFlat.h>
Wrapper around the GPU implementation that looks like faiss::IndexIVFFlat
Public Types
-
using component_t = float
-
using distance_t = float
Public Functions
-
GpuIndexIVFFlat(GpuResourcesProvider *provider, const faiss::IndexIVFFlat *index, GpuIndexIVFFlatConfig config = GpuIndexIVFFlatConfig())
Construct from a pre-existing faiss::IndexIVFFlat instance, copying data over to the given GPU, if the input index is trained.
-
GpuIndexIVFFlat(GpuResourcesProvider *provider, int dims, idx_t nlist, faiss::MetricType metric = faiss::METRIC_L2, GpuIndexIVFFlatConfig config = GpuIndexIVFFlatConfig())
Constructs a new instance with an empty flat quantizer; the user provides the number of IVF lists desired.
-
GpuIndexIVFFlat(GpuResourcesProvider *provider, Index *coarseQuantizer, int dims, idx_t nlist, faiss::MetricType metric = faiss::METRIC_L2, GpuIndexIVFFlatConfig config = GpuIndexIVFFlatConfig())
Constructs a new instance with a provided CPU or GPU coarse quantizer; the user provides the number of IVF lists desired.
-
~GpuIndexIVFFlat() override
-
void reserveMemory(size_t numVecs)
Reserve GPU memory in our inverted lists for this number of vectors.
-
void copyFrom(const faiss::IndexIVFFlat *index)
Initialize ourselves from the given CPU index; will overwrite all data in ourselves
-
void copyTo(faiss::IndexIVFFlat *index) const
Copy ourselves to the given CPU index; will overwrite all data in the index instance
-
size_t reclaimMemory()
After adding vectors, one can call this to reclaim device memory to exactly the amount needed. Returns space reclaimed in bytes
-
virtual void reset() override
Clears out all inverted lists, but retains the coarse centroid information
-
virtual void updateQuantizer() override
Should be called if the user ever changes the state of the IVF coarse quantizer manually (e.g., substitutes a new instance or changes vectors in the coarse quantizer outside the scope of training)
-
virtual void train(idx_t n, const float *x) override
Trains the coarse quantizer based on the given vector data.
-
virtual void reconstruct_n(idx_t i0, idx_t n, float *out) const override
Reconstruct vectors i0 to i0 + n - 1
this function may not be defined for some indexes
- Parameters:
i0 – index of the first vector in the sequence
n – number of vectors in the sequence
out – reconstructed vectors (size n * d)
-
virtual idx_t getNumLists() const
Returns the number of inverted lists we’re managing.
-
virtual idx_t getListLength(idx_t listId) const
Returns the number of vectors present in a particular inverted list.
-
virtual std::vector<uint8_t> getListVectorData(idx_t listId, bool gpuFormat = false) const
Return the encoded vector data contained in a particular inverted list, for debugging purposes. If gpuFormat is true, the data is returned as it is encoded in the GPU-side representation. Otherwise, it is converted to the CPU-compliant format; the native GPU format may differ.
-
virtual std::vector<idx_t> getListIndices(idx_t listId) const
Return the vector indices contained in a particular inverted list, for debugging purposes.
-
virtual void search_preassigned(idx_t n, const float *x, idx_t k, const idx_t *assign, const float *centroid_dis, float *distances, idx_t *labels, bool store_pairs, const SearchParametersIVF *params = nullptr, IndexIVFStats *stats = nullptr) const override
Search a set of vectors that are pre-quantized by the IVF quantizer. Fill in the corresponding heaps with the query results. The default implementation uses InvertedListScanners to do the search.
- Parameters:
n – number of vectors to query
x – query vectors, size n * d
assign – coarse quantization indices, size n * nprobe
centroid_dis – distances to coarse centroids, size n * nprobe
distances – output distances, size n * k
labels – output labels, size n * k
store_pairs – store the inverted list index and the inverted list offset in the upper/lower 32 bits of the result instead of ids (used for reranking).
params – used to override the object’s search parameters
stats – search stats to be updated (can be null)
-
virtual void range_search_preassigned(idx_t nx, const float *x, float radius, const idx_t *keys, const float *coarse_dis, RangeSearchResult *result, bool store_pairs = false, const IVFSearchParameters *params = nullptr, IndexIVFStats *stats = nullptr) const override
Range search a set of vectors that are pre-quantized by the IVF quantizer. Fill in the RangeSearchResult results. The default implementation uses InvertedListScanners to do the search.
- Parameters:
nx – number of vectors to query
x – query vectors, size nx * d
radius – search radius
keys – coarse quantization indices, size nx * nprobe
coarse_dis – distances to coarse centroids, size nx * nprobe
result – output results
store_pairs – store the inverted list index and the inverted list offset in the upper/lower 32 bits of the result instead of ids (used for reranking).
params – used to override the object’s search parameters
stats – search stats to be updated (can be null)
-
int getDevice() const
Returns the device that this index is resident on.
-
std::shared_ptr<GpuResources> getResources()
Returns a reference to our GpuResources object that manages memory, stream and handle resources on the GPU
-
void setMinPagingSize(size_t size)
Set the minimum data size for searches (in MiB) for which we use CPU -> GPU paging
-
size_t getMinPagingSize() const
Returns the current minimum data size for paged searches.
-
virtual void add(idx_t, const float *x) override
x can be resident on the CPU or any GPU; copies are performed as needed. Handles paged adds if the add set is too large; calls addInternal_
-
virtual void add_with_ids(idx_t n, const float *x, const idx_t *ids) override
x and ids can be resident on the CPU or any GPU; copies are performed as needed. Handles paged adds if the add set is too large; calls addInternal_
-
virtual void assign(idx_t n, const float *x, idx_t *labels, idx_t k = 1) const override
x and labels can be resident on the CPU or any GPU; copies are performed as needed
-
virtual void search(idx_t n, const float *x, idx_t k, float *distances, idx_t *labels, const SearchParameters *params = nullptr) const override
x, distances and labels can be resident on the CPU or any GPU; copies are performed as needed
-
virtual void search_and_reconstruct(idx_t n, const float *x, idx_t k, float *distances, idx_t *labels, float *recons, const SearchParameters *params = nullptr) const override
x, distances, labels and recons can be resident on the CPU or any GPU; copies are performed as needed
-
virtual void compute_residual(const float *x, float *residual, idx_t key) const override
Overridden to force GPU indices to provide their own GPU-friendly implementation
-
virtual void compute_residual_n(idx_t n, const float *xs, float *residuals, const idx_t *keys) const override
Overridden to force GPU indices to provide their own GPU-friendly implementation
-
virtual void range_search(idx_t n, const float *x, float radius, RangeSearchResult *result, const SearchParameters *params = nullptr) const
Queries the index with n vectors of dimension d and returns all vectors with distance < radius. Note that many indexes do not implement range_search (only the k-NN search is mandatory).
- Parameters:
n – number of vectors
x – input vectors to search, size n * d
radius – search radius
result – result table
-
virtual size_t remove_ids(const IDSelector &sel)
removes IDs from the index. Not supported by all indexes. Returns the number of elements removed.
-
virtual void reconstruct(idx_t key, float *recons) const
Reconstruct a stored vector (or an approximation if lossy coding)
this function may not be defined for some indexes
- Parameters:
key – id of the vector to reconstruct
recons – reconstructed vector (size d)
-
virtual void reconstruct_batch(idx_t n, const idx_t *keys, float *recons) const
Reconstruct several stored vectors (or an approximation if lossy coding)
this function may not be defined for some indexes
- Parameters:
n – number of vectors to reconstruct
keys – ids of the vectors to reconstruct (size n)
recons – reconstructed vectors (size n * d)
-
virtual DistanceComputer *get_distance_computer() const
Get a DistanceComputer (defined in AuxIndexStructures) object for this kind of index.
DistanceComputer is implemented for indexes that support random access of their vectors.
-
virtual size_t sa_code_size() const
size of the produced codes in bytes
-
virtual void sa_encode(idx_t n, const float *x, uint8_t *bytes) const
encode a set of vectors
- Parameters:
n – number of vectors
x – input vectors, size n * d
bytes – output encoded vectors, size n * sa_code_size()
-
virtual void sa_decode(idx_t n, const uint8_t *bytes, float *x) const
decode a set of vectors
- Parameters:
n – number of vectors
bytes – input encoded vectors, size n * sa_code_size()
x – output vectors, size n * d
-
virtual void merge_from(Index &otherIndex, idx_t add_id = 0)
Moves the entries from another index into this one. On return, otherIndex is empty. add_id is added to all moved ids (for sequential ids, this would be this->ntotal)
-
virtual void check_compatible_for_merge(const Index &otherIndex) const
Checks that the two indexes are compatible (i.e., they are trained in the same way and have the same parameters); throws otherwise.
-
void train_q1(size_t n, const float *x, bool verbose, MetricType metric_type)
Trains the quantizer and calls train_residual to train sub-quantizers.
-
size_t coarse_code_size() const
compute the number of bytes required to store list ids
-
void encode_listno(idx_t list_no, uint8_t *code) const
-
idx_t decode_listno(const uint8_t *code) const
Public Members
-
int d
vector dimension
-
idx_t ntotal
total number of indexed vectors
-
bool verbose
verbosity level
-
bool is_trained
set if the Index does not require training, or if training is done already
-
MetricType metric_type
type of metric this index uses for search
-
float metric_arg
argument of the metric type
-
size_t nprobe = 1
number of probes at query time
-
size_t max_codes = 0
max nb of codes to visit to do a query
-
Index *quantizer = nullptr
quantizer that maps vectors to inverted lists
-
size_t nlist = 0
number of inverted lists
-
char quantizer_trains_alone = 0
= 0: use the quantizer as index in a kmeans training
= 1: just pass on the training set to the train() of the quantizer
= 2: kmeans training on a flat index + add the centroids to the quantizer
-
bool own_fields = false
whether object owns the quantizer
-
ClusteringParameters cp
to override default clustering params
-
Index *clustering_index = nullptr
to override index used during clustering
Protected Functions
-
void setIndex_(GpuResources *resources, int dim, int nlist, faiss::MetricType metric, float metricArg, bool useResidual, faiss::ScalarQuantizer *scalarQ, bool interleavedLayout, IndicesOptions indicesOptions, MemorySpace space)
Initialize appropriate index.
- Parameters:
scalarQ – Optional ScalarQuantizer
-
int getCurrentNProbe_(const SearchParameters *params) const
From either the current set nprobe or the SearchParameters if available, return the nprobe that we should use for the current search
-
void verifyIVFSettings_() const
-
virtual bool addImplRequiresIDs_() const override
Does addImpl_ require IDs? If so, and no IDs are provided, we will generate them sequentially based on the order in which the vectors are added
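A minimal sketch of sequential ID generation, assuming that generated IDs continue from the number of vectors already in the index. resolve_ids_sketch is a hypothetical helper written for illustration and is not part of the Faiss API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: return the ids to use for a batch of n new vectors.
// If the caller supplied ids, they are copied unchanged; otherwise ids are
// assigned in insertion order, starting at ntotal (an assumption of this sketch).
inline std::vector<std::int64_t> resolve_ids_sketch(
        std::int64_t n,
        const std::int64_t* ids,
        std::int64_t ntotal) {
    assert(n >= 0 && ntotal >= 0);
    std::vector<std::int64_t> out(static_cast<std::size_t>(n));
    for (std::int64_t i = 0; i < n; ++i) {
        out[static_cast<std::size_t>(i)] = ids ? ids[i] : ntotal + i;
    }
    return out;
}
```

For example, adding three vectors without IDs to an index that already holds ten would produce the IDs 10, 11 and 12 under this assumption.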
-
virtual void trainQuantizer_(idx_t n, const float *x)
-
virtual void addImpl_(idx_t n, const float *x, const idx_t *ids) override
Called from GpuIndex for add/add_with_ids.
-
virtual void searchImpl_(idx_t n, const float *x, int k, float *distances, idx_t *labels, const SearchParameters *params) const override
Called from GpuIndex for search.
Protected Attributes
-
const GpuIndexIVFFlatConfig ivfFlatConfig_
Our configuration options.
-
size_t reserveMemoryVecs_
Desired inverted list memory reservation.
-
std::shared_ptr<IVFFlat> index_
Instance that we own; contains the inverted lists.
-
const GpuIndexIVFConfig ivfConfig_
Our configuration options.
-
std::shared_ptr<IVFBase> baseIndex_
For a trained/initialized index, this is a reference to the base class.
-
std::shared_ptr<GpuResources> resources_
Manages streams, cuBLAS handles and scratch memory for devices.
-
const GpuIndexConfig config_
Our configuration options.
-
size_t minPagedSize_
Size above which we page copies from the CPU to GPU.
-
using component_t = float
-
struct GpuIndexIVFPQConfig : public faiss::gpu::GpuIndexIVFConfig
Public Members
-
bool useFloat16LookupTables = false
Whether or not float16 residual distance tables are used in the list scanning kernels. When subQuantizers * 2^bitsPerCode > 16384, this is required.
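The size condition above can be checked before constructing the configuration. The helper below is a hypothetical sketch, not a Faiss function; it only evaluates the inequality subQuantizers * 2^bitsPerCode > 16384 stated in the description.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: true when the documented threshold is exceeded and
// useFloat16LookupTables therefore has to be set.
inline bool requires_float16_lookup_tables(int subQuantizers, int bitsPerCode) {
    assert(subQuantizers > 0);
    assert(bitsPerCode > 0 && bitsPerCode < 32);
    const std::int64_t entries =
            static_cast<std::int64_t>(subQuantizers) * (std::int64_t{1} << bitsPerCode);
    return entries > 16384;
}
```

With 8 bits per code, 64 sub-quantizers give exactly 16384 entries and stay within the limit, while 96 sub-quantizers give 24576 entries and exceed it.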
-
bool usePrecomputedTables = false
Whether or not we enable the precomputed table option for search, which can substantially increase the memory requirement.
-
bool interleavedLayout = false
Use the alternative memory layout for the IVF lists. WARNING: this is a feature under development and is only supported with RAFT enabled for the index. Do not use if RAFT is not enabled.
-
bool useMMCodeDistance
-
enum class DistanceDataType