File GpuResources.h
-
namespace faiss
Copyright (c) Facebook, Inc. and its affiliates.
This source code is licensed under the MIT license found in the LICENSE file in the root directory of this source tree.
-
namespace gpu
Enums
-
enum AllocType
Values:
-
enumerator Other
Unknown allocation type or miscellaneous (not currently categorized)
-
enumerator FlatData
Primary data storage for GpuIndexFlat (the raw matrix of vectors and vector norms if needed)
-
enumerator IVFLists
Primary data storage for GpuIndexIVF* (the storage for each individual IVF list)
-
enumerator QuantizerPrecomputedCodes
For GpuIndexIVFPQ, “precomputed codes” for more efficient PQ lookup require the use of possibly large tables. These are marked separately from Quantizer, as they can frequently be hundreds to thousands of MiB in size
-
enumerator TemporaryMemoryBuffer
StandardGpuResources implementation-specific type. When using StandardGpuResources, temporary memory allocations (MemorySpace::Temporary) come out of a stack region of memory that is allocated up front for each GPU (e.g., 1.5 GiB upon initialization). This allocation by StandardGpuResources is marked with this AllocType.
-
enumerator TemporaryMemoryOverflow
When using StandardGpuResources, any MemorySpace::Temporary allocations that cannot be satisfied within the TemporaryMemoryBuffer region fall back to individual cudaMalloc calls sized to just the request at hand. These “overflow” temporary allocations are marked with this AllocType.
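As an illustrative sketch (not part of GpuResources.h itself), the size of the up-front temporary stack can be capped on StandardGpuResources; the setTempMemory call below is assumed to run inside a function, and the 512 MiB value is an arbitrary example:

    #include <faiss/gpu/StandardGpuResources.h>

    void configureTempMemory() {
        // Cap the up-front temporary stack at 512 MiB; MemorySpace::Temporary
        // requests that do not fit then become TemporaryMemoryOverflow
        // (cudaMalloc) allocations rather than growing the stack.
        faiss::gpu::StandardGpuResources res;
        res.setTempMemory(size_t(512) * 1024 * 1024);
    }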
-
enum MemorySpace
Memory regions accessible to the GPU.
Values:
-
enumerator Temporary
Temporary device memory (guaranteed to no longer be used upon exit of a top-level index call, and where the streams using it have completed GPU work). Typically backed by Device memory (cudaMalloc/cudaFree).
-
enumerator Device
Managed using cudaMalloc/cudaFree (typical GPU device memory)
-
enumerator Unified
Managed using cudaMallocManaged/cudaFree (typical Unified CPU/GPU memory)
Functions
-
std::string memorySpaceToString(MemorySpace s)
Convert a MemorySpace to string.
-
AllocInfo makeDevAlloc(AllocType at, cudaStream_t st)
Create an AllocInfo for the current device with MemorySpace::Device.
-
AllocInfo makeTempAlloc(AllocType at, cudaStream_t st)
Create an AllocInfo for the current device with MemorySpace::Temporary.
-
AllocInfo makeSpaceAlloc(AllocType at, MemorySpace sp, cudaStream_t st)
Create an AllocInfo for the current device.
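A minimal sketch of how these helpers can be combined with the AllocRequest / allocMemory interface described below; `res` is assumed to be an already-initialized GpuResources instance, and AllocType::Other is used only as a placeholder:

    #include <faiss/gpu/GpuResources.h>

    void scratchExample(faiss::gpu::GpuResources* res, size_t numBytes) {
        // Build an AllocInfo for the current device in the temporary space,
        // ordered on the current device's default stream.
        cudaStream_t stream = res->getDefaultStreamCurrentDevice();
        faiss::gpu::AllocInfo info =
                faiss::gpu::makeTempAlloc(faiss::gpu::AllocType::Other, stream);

        // Extend it to a sized request and perform the allocation.
        faiss::gpu::AllocRequest req(
                faiss::gpu::AllocType::Other, info.device, info.space, info.stream, numBytes);
        void* p = res->allocMemory(req);

        // ... enqueue kernels using p on `stream` ...

        // Return the allocation to the resources object.
        res->deallocMemory(info.device, p);
    }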
-
struct AllocInfo
- #include <GpuResources.h>
Information on what/where an allocation is.
Subclassed by faiss::gpu::AllocRequest
Public Functions
-
inline AllocInfo()
-
inline AllocInfo(AllocType at, int dev, MemorySpace sp, cudaStream_t st)
Public Members
-
int device = 0
The device on which the allocation is happening.
-
MemorySpace space = MemorySpace::Device
The memory space of the allocation.
-
cudaStream_t stream = nullptr
The stream on which new work on the memory will be ordered (e.g., if a piece of cached memory to be returned for this call was last used on stream 3 and a new memory request is for stream 4, the memory manager will synchronize stream 4 to wait for the completion of stream 3 via events or other stream synchronization).
The memory manager guarantees that the returned memory is free to use without data races on the specified stream.
-
struct AllocRequest : public faiss::gpu::AllocInfo
- #include <GpuResources.h>
Information on what/where an allocation is, along with how big it should be.
Public Functions
-
inline AllocRequest()
-
inline AllocRequest(AllocType at, int dev, MemorySpace sp, cudaStream_t st, size_t sz)
Public Members
-
size_t size = 0
The size in bytes of the allocation.
-
int device = 0
The device on which the allocation is happening.
-
MemorySpace space = MemorySpace::Device
The memory space of the allocation.
-
cudaStream_t stream = nullptr
The stream on which new work on the memory will be ordered (e.g., if a piece of cached memory to be returned for this call was last used on stream 3 and a new memory request is for stream 4, the memory manager will synchronize stream 4 to wait for the completion of stream 3 via events or other stream synchronization).
The memory manager guarantees that the returned memory is free to use without data races on the specified stream.
-
struct GpuMemoryReservation
- #include <GpuResources.h>
A RAII object that manages a temporary memory request.
Public Functions
-
GpuMemoryReservation()
-
GpuMemoryReservation(GpuResources *r, int dev, cudaStream_t str, void *p, size_t sz)
-
GpuMemoryReservation(GpuMemoryReservation &&m) noexcept
-
~GpuMemoryReservation()
-
GpuMemoryReservation &operator=(GpuMemoryReservation &&m)
-
inline void *get()
-
void release()
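A hedged sketch of the RAII pattern this struct enables, using GpuResources::allocMemoryHandle (documented further down); the allocation is returned automatically when the reservation goes out of scope, and AllocType::Other is used only as a placeholder:

    #include <faiss/gpu/GpuResources.h>

    void raiiScratch(faiss::gpu::GpuResources* res, size_t numBytes) {
        cudaStream_t stream = res->getDefaultStreamCurrentDevice();

        faiss::gpu::AllocInfo info =
                faiss::gpu::makeTempAlloc(faiss::gpu::AllocType::Other, stream);
        faiss::gpu::AllocRequest req(
                faiss::gpu::AllocType::Other, info.device, info.space, info.stream, numBytes);

        // The reservation owns the allocation; no explicit deallocMemory needed.
        faiss::gpu::GpuMemoryReservation scratch = res->allocMemoryHandle(req);

        void* p = scratch.get();
        // ... enqueue kernels using p on `stream` ...
    } // scratch is released here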
-
class GpuResources
- #include <GpuResources.h>
Base class of the GPU-side resource provider; hides the provision of cuBLAS handles, CUDA streams, and all device memory allocation
Subclassed by faiss::gpu::StandardGpuResourcesImpl
Public Functions
-
virtual ~GpuResources()
-
virtual void initializeForDevice(int device) = 0
Call to pre-allocate resources for a particular device. If this is not called, resources will be allocated on first demand
-
virtual bool supportsBFloat16(int device) = 0
Does the given GPU support bfloat16?
-
virtual cublasHandle_t getBlasHandle(int device) = 0
Returns the cuBLAS handle that we use for the given device.
-
virtual cudaStream_t getDefaultStream(int device) = 0
Returns the stream that we order all computation on for the given device
-
virtual void setDefaultStream(int device, cudaStream_t stream) = 0
Overrides the default stream for a device to the user-supplied stream. The resources object does not own this stream (i.e., it will not destroy it).
-
virtual std::vector<cudaStream_t> getAlternateStreams(int device) = 0
Returns the set of alternative streams that we use for the given device.
-
virtual void *allocMemory(const AllocRequest &req) = 0
Memory management. Returns an allocation from the given memory space, ordered with respect to the given stream (i.e., the first user will be a kernel in this stream). All allocations are internally rounded up to the next multiple of 16 bytes, and all returned allocations are guaranteed to be 16-byte aligned.
-
virtual void deallocMemory(int device, void *in) = 0
Returns a previous allocation.
-
virtual size_t getTempMemoryAvailable(int device) const = 0
For MemorySpace::Temporary, how much space is immediately available without cudaMalloc allocation?
-
virtual std::pair<void*, size_t> getPinnedMemory() = 0
Returns the available CPU pinned memory buffer.
-
virtual cudaStream_t getAsyncCopyStream(int device) = 0
Returns the stream on which we perform async CPU <-> GPU copies.
-
bool supportsBFloat16CurrentDevice()
Does the current GPU support bfloat16?
Functions provided by default
-
cublasHandle_t getBlasHandleCurrentDevice()
Calls getBlasHandle with the current device.
-
cudaStream_t getDefaultStreamCurrentDevice()
Calls getDefaultStream with the current device.
-
size_t getTempMemoryAvailableCurrentDevice() const
Calls getTempMemoryAvailable with the current device.
-
GpuMemoryReservation allocMemoryHandle(const AllocRequest &req)
Returns a temporary memory allocation via a RAII object.
-
void syncDefaultStream(int device)
Synchronizes the CPU with respect to the default stream for the given device
-
void syncDefaultStreamCurrentDevice()
Calls syncDefaultStream for the current device.
-
std::vector<cudaStream_t> getAlternateStreamsCurrentDevice()
Calls getAlternateStreams for the current device.
-
cudaStream_t getAsyncCopyStreamCurrentDevice()
Calls getAsyncCopyStream for the current device.
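The stream-related calls above compose roughly as follows; this is a sketch assuming device 0 and a user-owned CUDA stream, with index operations elided:

    #include <cuda_runtime.h>
    #include <memory>
    #include <faiss/gpu/GpuResources.h>

    void useUserStream(std::shared_ptr<faiss::gpu::GpuResources> res) {
        cudaStream_t myStream;
        cudaStreamCreate(&myStream);

        // Subsequent FAISS GPU work for device 0 is ordered on myStream;
        // the resources object does not take ownership of the stream.
        res->setDefaultStream(0, myStream);

        // ... run index operations ...

        // Block the CPU until device 0's default stream has drained.
        res->syncDefaultStream(0);

        cudaStreamDestroy(myStream);
    }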
-
class GpuResourcesProvider
- #include <GpuResources.h>
Interface for a provider of a shared resources object. This exists to avoid exposing std::shared_ptr directly to Python
Subclassed by faiss::gpu::GpuResourcesProviderFromInstance, faiss::gpu::StandardGpuResources
Public Functions
-
virtual ~GpuResourcesProvider()
-
virtual std::shared_ptr<GpuResources> getResources() = 0
Returns the shared resources object.
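For example, StandardGpuResources is the provider normally handed to GPU index constructors; a minimal sketch (the dimension d and any index configuration are placeholders):

    #include <faiss/gpu/GpuIndexFlat.h>
    #include <faiss/gpu/StandardGpuResources.h>

    void buildIndex(int d) {
        faiss::gpu::StandardGpuResources provider; // a GpuResourcesProvider
        // The index calls provider.getResources() internally and keeps the
        // shared GpuResources object alive for its own lifetime.
        faiss::gpu::GpuIndexFlatL2 index(&provider, d);
        // ... index.add(...), index.search(...) ...
    }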
-
class GpuResourcesProviderFromInstance : public faiss::gpu::GpuResourcesProvider
- #include <GpuResources.h>
A simple wrapper around a GpuResources object that turns it back into a GpuResourcesProvider
Public Functions
-
~GpuResourcesProviderFromInstance() override
-
virtual std::shared_ptr<GpuResources> getResources() override
Returns the shared resources object.
Private Members
-
std::shared_ptr<GpuResources> res_
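A sketch of wrapping an existing shared GpuResources object back into a provider; the constructor taking a std::shared_ptr<GpuResources> is assumed from the class's stated purpose and is not listed in this section:

    #include <memory>
    #include <faiss/gpu/GpuResources.h>

    void reuseResources(std::shared_ptr<faiss::gpu::GpuResources> shared) {
        // Assumed constructor: wrap an existing GpuResources instance so it
        // can be passed wherever a GpuResourcesProvider is expected.
        faiss::gpu::GpuResourcesProviderFromInstance provider(shared);
        std::shared_ptr<faiss::gpu::GpuResources> same = provider.getResources();
    }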