File StandardGpuResources.h

namespace faiss

Implementation of k-means clustering with many variants.

Copyright (c) Facebook, Inc. and its affiliates.

This source code is licensed under the MIT license found in the LICENSE file in the root directory of this source tree.

IDSelector is intended to define a subset of vectors to handle (for removal or as a subset to search).

PQ4 SIMD packing and accumulation functions

The basic kernel accumulates nq query vectors with bbs = nb * 2 * 16 vectors and produces an output matrix for that. It is interesting for nq * nb <= 4, otherwise register spilling becomes too large.

The implementation of these functions is spread over 3 cpp files to reduce parallel compile times. Templates are instantiated explicitly.

This file contains callbacks for kernels that compute distances.

Throughout the library, vectors are provided as float * pointers. Most algorithms can be optimized when several vectors are processed (added/searched) together in a batch. In this case, they are passed in as a matrix. When n vectors of size d are provided as float * x, component j of vector i is

x[ i * d + j ]

where 0 <= i < n and 0 <= j < d. In other words, matrices are always compact. When specifying the size of the matrix, we call it an n*d matrix, which implies a row-major storage.
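The indexing rule above can be illustrated with a minimal, self-contained sketch. The helper names `component` and `make_matrix` are hypothetical and are not part of Faiss; they only show how a compact row-major n*d matrix is addressed through a plain `float *`.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper (not a Faiss function): return component j of
// vector i from a compact row-major n*d matrix stored at x.
inline float component(const float* x, std::size_t d, std::size_t i, std::size_t j) {
    return x[i * d + j];
}

// Hypothetical helper: build n vectors of dimension d in one contiguous
// buffer, where component j of vector i holds the value 10 * i + j.
std::vector<float> make_matrix(std::size_t n, std::size_t d) {
    std::vector<float> x(n * d);
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t j = 0; j < d; ++j) {
            x[i * d + j] = static_cast<float>(10 * i + j);
        }
    }
    return x;
}
```

Because the rows are stored back to back with no padding, vector i starts at `x + i * d`, so a batch can be passed to an API as a single pointer plus the counts n and d.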

I/O functions can read/write to a filename, a file handle or to an object that abstracts the medium.

The read functions return objects that should be deallocated with delete. All references within these objects are owned by the object.

Definition of inverted lists + a few common classes that implement the interface.

Since IVF (inverted file) indexes are of so much use for large-scale use cases, we group a few functions related to them in this small library. Most functions work both on IndexIVFs and IndexIVFs embedded within an IndexPreTransform.

This file contains the implementations of extra metrics beyond L2 and inner product.

Implements a few neural net layers, mainly to support QINCo

Defines a few objects that apply transformations to a set of vectors. Often these are pre-processing steps.

namespace gpu

class StandardGpuResourcesImpl : public faiss::gpu::GpuResources

#include <StandardGpuResources.h>

Standard implementation of the GpuResources object that provides for a temporary memory manager

Public Functions

StandardGpuResourcesImpl()

~StandardGpuResourcesImpl() override

virtual bool supportsBFloat16(int device) override: Does the given GPU support bfloat16?

void noTempMemory(): Disable allocation of temporary memory; all temporary memory requests will call cudaMalloc / cudaFree at the point of use

void setTempMemory(size_t size): Specify that we wish to use a certain fixed size of memory on all devices as temporary memory. This is the upper bound for the GPU memory that we will reserve. We will never go above 1.5 GiB on any GPU; smaller GPUs (with <= 4 GiB or <= 8 GiB) will use less memory than that. To avoid any temporary memory allocation, pass 0.

void setPinnedMemory(size_t size): Set amount of pinned memory to allocate, for async GPU <-> CPU transfers

virtual void setDefaultStream(int device, cudaStream_t stream) override: Called to change the stream for work ordering. We do not own stream; i.e., it will not be destroyed when the GpuResources object gets cleaned up. We are guaranteed that all Faiss GPU work is ordered with respect to this stream upon exit from an index or other Faiss GPU call.

void revertDefaultStream(int device): Revert the default stream to the original stream managed by this resources object, in case someone called setDefaultStream.

virtual cudaStream_t getDefaultStream(int device) override: Returns the stream for the given device on which all Faiss GPU work is ordered. We are guaranteed that all Faiss GPU work is ordered with respect to this stream upon exit from an index or other Faiss GPU call.

void setDefaultNullStreamAllDevices(): Called to change the work ordering streams to the null stream for all devices

void setLogMemoryAllocations(bool enable): If enabled, will print every GPU memory allocation and deallocation to standard output

virtual void initializeForDevice(int device) override: Internal system call. Initialize resources for this device.

virtual cublasHandle_t getBlasHandle(int device) override: Returns the cuBLAS handle that we use for the given device.

virtual std::vector<cudaStream_t> getAlternateStreams(int device) override: Returns the set of alternative streams that we use for the given device.

virtual void *allocMemory(const AllocRequest &req) override: Allocate non-temporary GPU memory.

virtual void deallocMemory(int device, void *in) override: Returns (frees) a previous allocation.

virtual size_t getTempMemoryAvailable(int device) const override: For MemorySpace::Temporary, how much space is immediately available without cudaMalloc allocation?

std::map<int, std::map<std::string, std::pair<int, size_t>>> getMemoryInfo() const: Export a description of memory used for Python.

virtual std::pair<void*, size_t> getPinnedMemory() override: Returns the available CPU pinned memory buffer.

virtual cudaStream_t getAsyncCopyStream(int device) override: Returns the stream on which we perform async CPU <-> GPU copies.

Protected Functions

bool isInitialized(int device) const: Have GPU resources been initialized for this device yet?

Protected Attributes

std::unordered_map<int, std::unordered_map<void*, AllocRequest>> allocs_: Set of currently outstanding memory allocations per device: device -> (allocated ptr -> alloc request)

std::unordered_map<int, std::unique_ptr<StackDeviceMemory>> tempMemory_: Temporary memory provider, per each device.

std::unordered_map<int, cudaStream_t> defaultStreams_: Our default stream that work is ordered on, one per each device.

std::unordered_map<int, cudaStream_t> userDefaultStreams_: This contains particular streams as set by the user for ordering, if any

std::unordered_map<int, std::vector<cudaStream_t>> alternateStreams_: Other streams we can use, per each device.

std::unordered_map<int, cudaStream_t> asyncCopyStreams_: Async copy stream to use for GPU <-> CPU pinned memory copies.

std::unordered_map<int, cublasHandle_t> blasHandles_: cuBLAS handle for each device

void *pinnedMemAlloc_: Pinned memory allocation for use with this GPU.

size_t pinnedMemAllocSize_

size_t tempMemSize_: Another option is to use a specified amount of memory on all devices

size_t pinnedMemSize_: Amount of pinned memory we should allocate.

bool allocLogging_: Whether or not we log every GPU memory allocation and deallocation.

Protected Static Functions

static size_t getDefaultTempMemForGPU(int device, size_t requested): Adjust the default temporary memory allocation based on the total GPU memory size
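The adjustment described for `getDefaultTempMemForGPU` and `setTempMemory` can be sketched as a clamp on the requested size. This is an illustration, not the library's implementation: `default_temp_mem` is a hypothetical function that takes the total device memory as a parameter instead of querying a device, and the 512 MiB and 1 GiB tiers for smaller GPUs are assumptions. Only the 1.5 GiB ceiling and the 4 GiB / 8 GiB thresholds come from the text above.

```cpp
#include <algorithm>
#include <cstddef>

constexpr std::size_t kMiB = std::size_t(1) << 20;
constexpr std::size_t kGiB = std::size_t(1) << 30;

// Hypothetical stand-in for the temporary-memory adjustment.
// totalDeviceMem: total memory of the GPU in bytes (passed in directly here).
// requested: the size the caller asked for, in bytes.
std::size_t default_temp_mem(std::size_t totalDeviceMem, std::size_t requested) {
    std::size_t cap = 0;
    if (totalDeviceMem <= 4 * kGiB) {
        cap = 512 * kMiB;   // assumed tier for GPUs with <= 4 GiB
    } else if (totalDeviceMem <= 8 * kGiB) {
        cap = 1024 * kMiB;  // assumed tier for GPUs with <= 8 GiB
    } else {
        cap = 1536 * kMiB;  // 1.5 GiB ceiling stated in the documentation
    }
    // The request is an upper bound: never reserve more than the cap.
    return std::min(requested, cap);
}
```

Under these assumptions, a request of 0 always yields 0, which matches the documented way to avoid any temporary memory allocation.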

class StandardGpuResources : public faiss::gpu::GpuResourcesProvider

#include <StandardGpuResources.h>

Default implementation of GpuResources that allocates a cuBLAS handle and 2 streams for use, as well as temporary memory. Internally, the Faiss GPU code uses the instance managed by getResources, but this is the user-facing object that is internally reference counted.

Public Functions

StandardGpuResources()

~StandardGpuResources() override

virtual std::shared_ptr<GpuResources> getResources() override: Returns the shared resources object.

bool supportsBFloat16(int device): Whether or not the given device supports native bfloat16 arithmetic.

bool supportsBFloat16CurrentDevice(): Whether or not the current device supports native bfloat16 arithmetic.

void noTempMemory(): Disable allocation of temporary memory; all temporary memory requests will call cudaMalloc / cudaFree at the point of use

void setTempMemory(size_t size): Specify that we wish to use a certain fixed size of memory on all devices as temporary memory. This is the upper bound for the GPU memory that we will reserve. We will never go above 1.5 GiB on any GPU; smaller GPUs (with <= 4 GiB or <= 8 GiB) will use less memory than that. To avoid any temporary memory allocation, pass 0.

void setPinnedMemory(size_t size): Set amount of pinned memory to allocate, for async GPU <-> CPU transfers

void setDefaultStream(int device, cudaStream_t stream): Called to change the stream for work ordering. We do not own stream; i.e., it will not be destroyed when the GpuResources object gets cleaned up. We are guaranteed that all Faiss GPU work is ordered with respect to this stream upon exit from an index or other Faiss GPU call.

void revertDefaultStream(int device): Revert the default stream to the original stream managed by this resources object, in case someone called setDefaultStream.

void setDefaultNullStreamAllDevices(): Called to change the work ordering streams to the null stream for all devices

std::map<int, std::map<std::string, std::pair<int, size_t>>> getMemoryInfo() const: Export a description of memory used for Python.
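The nested map returned by `getMemoryInfo` can be walked with ordinary standard-library code. The sketch below assumes that the outer key is the device id, the inner key is a label for the kind of allocation, and the pair holds (number of allocations, total bytes). That interpretation is an assumption, since the documentation above only gives the type. `MemoryInfo` and `total_bytes_on_device` are hypothetical names.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <utility>

// Same shape as the return type of getMemoryInfo().
using MemoryInfo =
        std::map<int, std::map<std::string, std::pair<int, std::size_t>>>;

// Hypothetical helper: sum the size_t half of every pair for one device.
// Assumes pair = (number of allocations, total bytes); returns 0 if the
// device has no entry in the map.
std::size_t total_bytes_on_device(const MemoryInfo& info, int device) {
    auto it = info.find(device);
    if (it == info.end()) {
        return 0;
    }
    std::size_t total = 0;
    for (const auto& entry : it->second) {
        total += entry.second.second;
    }
    return total;
}
```

With a real resources object, the map would come from `res.getMemoryInfo()`. The helper itself only depends on the container shape, so it can be exercised with hand-built data.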

cudaStream_t getDefaultStream(int device): Returns the current default stream.

size_t getTempMemoryAvailable(int device) const: Returns the current amount of temp memory available.

void syncDefaultStreamCurrentDevice(): Synchronize our default stream with the CPU.

void setLogMemoryAllocations(bool enable): If enabled, will print every GPU memory allocation and deallocation to standard output

Private Members

std::shared_ptr<StandardGpuResourcesImpl> res_
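The split between the user-facing `StandardGpuResources` and the shared implementation it holds in `res_` follows a common reference-counting pattern. The sketch below shows that pattern with hypothetical stand-in classes (`Resources`, `ResourcesImpl`, `Provider`); it does not use or reproduce the Faiss classes, and it only demonstrates why the object handed out by a `getResources`-style accessor can outlive the provider that created it.

```cpp
#include <memory>

// Hypothetical stand-ins for the interface and its implementation.
struct Resources {
    virtual ~Resources() = default;
};

struct ResourcesImpl : Resources {
    int tempMemSize = 0;
};

// Hypothetical provider: owns one reference to the implementation and
// hands out further shared references on request.
class Provider {
   public:
    Provider() : res_(std::make_shared<ResourcesImpl>()) {}

    // Each caller receives a shared_ptr that shares ownership with res_.
    std::shared_ptr<Resources> getResources() {
        return res_;
    }

   private:
    std::shared_ptr<ResourcesImpl> res_;
};
```

In this sketch the implementation object is destroyed only when the provider and every holder of a returned pointer have released it, which is the practical meaning of "internally reference counted".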