File GpuResources.h

namespace faiss

Copyright (c) Facebook, Inc. and its affiliates.

This source code is licensed under the MIT license found in the LICENSE file in the root directory of this source tree.


namespace gpu

Enums

enum AllocType

Values:

enumerator Other

Unknown allocation type or miscellaneous (not currently categorized)

enumerator FlatData

Primary data storage for GpuIndexFlat (the raw matrix of vectors and vector norms if needed)

enumerator IVFLists

Primary data storage for GpuIndexIVF* (the storage for each individual IVF list)

enumerator Quantizer

Quantizer (PQ, SQ) dictionary information.

enumerator QuantizerPrecomputedCodes

For GpuIndexIVFPQ, “precomputed codes” for more efficient PQ lookup require the use of possibly large tables. These are marked separately from Quantizer, as they can frequently be hundreds to thousands of MiB in size.

enumerator TemporaryMemoryBuffer

StandardGpuResources implementation-specific type. When using StandardGpuResources, temporary memory allocations (MemorySpace::Temporary) come out of a stack region of memory that is allocated up front for each GPU (e.g., 1.5 GiB upon initialization). This up-front allocation by StandardGpuResources is marked with this AllocType.

enumerator TemporaryMemoryOverflow

When using StandardGpuResources, any MemorySpace::Temporary allocation that cannot be satisfied within the TemporaryMemoryBuffer region falls back to an individual cudaMalloc call sized to just the request at hand. These “overflow” temporary allocations are marked with this AllocType.

enum MemorySpace

Memory regions accessible to the GPU.

Values:

enumerator Temporary

Temporary device memory (guaranteed to no longer be used upon exit of a top-level index call, and where the streams using it have completed GPU work). Typically backed by Device memory (cudaMalloc/cudaFree).

enumerator Device

Managed using cudaMalloc/cudaFree (typical GPU device memory)

enumerator Unified

Managed using cudaMallocManaged/cudaFree (typical Unified CPU/GPU memory)

Functions

std::string allocTypeToString(AllocType t)

Convert an AllocType to string.

std::string memorySpaceToString(MemorySpace s)

Convert a MemorySpace to string.

AllocInfo makeDevAlloc(AllocType at, cudaStream_t st)

Create an AllocInfo for the current device with MemorySpace::Device.

AllocInfo makeTempAlloc(AllocType at, cudaStream_t st)

Create an AllocInfo for the current device with MemorySpace::Temporary.

AllocInfo makeSpaceAlloc(AllocType at, MemorySpace sp, cudaStream_t st)

Create an AllocInfo for the current device.

struct AllocInfo
#include <GpuResources.h>

Information on what/where an allocation is.

Subclassed by faiss::gpu::AllocRequest

Public Functions

inline AllocInfo()
inline AllocInfo(AllocType at, int dev, MemorySpace sp, cudaStream_t st)
std::string toString() const

Returns a string representation of this info.

Public Members

AllocType type = AllocType::Other

The internal category of the allocation.

int device = 0

The device on which the allocation is happening.

MemorySpace space = MemorySpace::Device

The memory space of the allocation.

cudaStream_t stream = nullptr

The stream on which new work on the memory will be ordered (e.g., if a piece of cached memory to be returned for this call was last used on stream 3 and the new request is for stream 4, the memory manager will synchronize stream 4 to wait for the completion of stream 3, via events or other stream synchronization).

The memory manager guarantees that the returned memory is free to use, without data races, on the stream specified.

struct AllocRequest : public faiss::gpu::AllocInfo
#include <GpuResources.h>

Information on what/where an allocation is, along with how big it should be.

Public Functions

inline AllocRequest()
inline AllocRequest(const AllocInfo &info, size_t sz)
inline AllocRequest(AllocType at, int dev, MemorySpace sp, cudaStream_t st, size_t sz)
std::string toString() const

Returns a string representation of this request.

Public Members

size_t size = 0

The size in bytes of the allocation.

AllocType type = AllocType::Other

The internal category of the allocation.

int device = 0

The device on which the allocation is happening.

MemorySpace space = MemorySpace::Device

The memory space of the allocation.

cudaStream_t stream = nullptr

The stream on which new work on the memory will be ordered (e.g., if a piece of cached memory to be returned for this call was last used on stream 3 and the new request is for stream 4, the memory manager will synchronize stream 4 to wait for the completion of stream 3, via events or other stream synchronization).

The memory manager guarantees that the returned memory is free to use, without data races, on the stream specified.

struct GpuMemoryReservation
#include <GpuResources.h>

A RAII object that manages a temporary memory request.

Public Functions

GpuMemoryReservation()
GpuMemoryReservation(GpuResources *r, int dev, cudaStream_t str, void *p, size_t sz)
GpuMemoryReservation(GpuMemoryReservation &&m) noexcept
~GpuMemoryReservation()
GpuMemoryReservation &operator=(GpuMemoryReservation &&m)
inline void *get()
void release()

Public Members

GpuResources *res
int device
cudaStream_t stream
void *data
size_t size
class GpuResources
#include <GpuResources.h>

Base class of the GPU-side resource provider; hides provision of cuBLAS handles and CUDA streams, and performs all device memory allocation.

Subclassed by faiss::gpu::StandardGpuResourcesImpl

Public Functions

virtual ~GpuResources()
virtual void initializeForDevice(int device) = 0

Call to pre-allocate resources for a particular device. If this is not called, resources will be allocated on first demand.

virtual cublasHandle_t getBlasHandle(int device) = 0

Returns the cuBLAS handle that we use for the given device.

virtual cudaStream_t getDefaultStream(int device) = 0

Returns the stream that we order all computation on for the given device

virtual void setDefaultStream(int device, cudaStream_t stream) = 0

Overrides the default stream for a device to the user-supplied stream. The resources object does not own this stream (i.e., it will not destroy it).

virtual std::vector<cudaStream_t> getAlternateStreams(int device) = 0

Returns the set of alternative streams that we use for the given device.

virtual void *allocMemory(const AllocRequest &req) = 0

Memory management. Returns an allocation from the given memory space, ordered with respect to the given stream (i.e., the first user will be a kernel in this stream). All allocations are internally rounded up to the next multiple of 16 bytes, and all returned allocations are guaranteed to be 16-byte aligned.

virtual void deallocMemory(int device, void *in) = 0

Returns (frees) a previous allocation.

virtual size_t getTempMemoryAvailable(int device) const = 0

For MemorySpace::Temporary, how much space is immediately available without cudaMalloc allocation?

virtual std::pair<void*, size_t> getPinnedMemory() = 0

Returns the available CPU pinned memory buffer.

virtual cudaStream_t getAsyncCopyStream(int device) = 0

Returns the stream on which we perform async CPU <-> GPU copies.

Functions provided by default

cublasHandle_t getBlasHandleCurrentDevice()

Calls getBlasHandle with the current device.

cudaStream_t getDefaultStreamCurrentDevice()

Calls getDefaultStream with the current device.

size_t getTempMemoryAvailableCurrentDevice() const

Calls getTempMemoryAvailable with the current device.

GpuMemoryReservation allocMemoryHandle(const AllocRequest &req)

Returns a temporary memory allocation via a RAII object.

void syncDefaultStream(int device)

Synchronizes the CPU with respect to the default stream for the given device

void syncDefaultStreamCurrentDevice()

Calls syncDefaultStream for the current device.

std::vector<cudaStream_t> getAlternateStreamsCurrentDevice()

Calls getAlternateStreams for the current device.

cudaStream_t getAsyncCopyStreamCurrentDevice()

Calls getAsyncCopyStream for the current device.

class GpuResourcesProvider
#include <GpuResources.h>

Interface for a provider of a shared resources object. This exists to avoid exposing std::shared_ptr to Python.

Subclassed by faiss::gpu::GpuResourcesProviderFromInstance, faiss::gpu::StandardGpuResources

Public Functions

virtual ~GpuResourcesProvider()
virtual std::shared_ptr<GpuResources> getResources() = 0

Returns the shared resources object.

class GpuResourcesProviderFromInstance : public faiss::gpu::GpuResourcesProvider
#include <GpuResources.h>

A simple wrapper that turns an existing GpuResources object back into a GpuResourcesProvider.

Public Functions

explicit GpuResourcesProviderFromInstance(std::shared_ptr<GpuResources> p)
~GpuResourcesProviderFromInstance() override
virtual std::shared_ptr<GpuResources> getResources() override

Returns the shared resources object.

Private Members

std::shared_ptr<GpuResources> res_