Struct faiss::gpu::IVFPQBuildCagraConfig

struct IVFPQBuildCagraConfig

Public Members

uint32_t n_lists = 1024

The number of inverted lists (clusters)

Hint: the number of vectors per cluster (n_rows/n_lists) should be approximately 1,000 to 10,000.

uint32_t kmeans_n_iters = 20

The number of iterations searching for kmeans centers (index building).

double kmeans_trainset_fraction = 0.5

The fraction of data to use during iterative kmeans building.

uint32_t pq_bits = 8

The bit length of the vector element after compression by PQ.

Possible values: [4, 5, 6, 7, 8].

Hint: the smaller the ‘pq_bits’, the smaller the index size and the better the search performance, but the lower the recall.

uint32_t pq_dim = 0

The dimensionality of the vector after compression by PQ. When zero, an optimal value is selected using a heuristic.

NB: pq_dim /// pq_bits must be a multiple of 8.

Hint: a smaller ‘pq_dim’ results in a smaller index size and better search performance, but lower recall. If ‘pq_bits’ is 8, ‘pq_dim’ can be set to any number, but multiple of 8 are desirable for good performance. If ‘pq_bits’ is not 8, ‘pq_dim’ should be a multiple of 8. For good performance, it is desirable that ‘pq_dim’ is a multiple of 32. Ideally, ‘pq_dim’ should be also a divisor of the dataset dim.

codebook_gen codebook_kind = codebook_gen::PER_SUBSPACE

How PQ codebooks are created.

bool force_random_rotation = false

Apply a random rotation matrix on the input data and queries even if dim % pq_dim == 0.

Note: if dim is not multiple of pq_dim, a random rotation is always applied to the input data and queries to transform the working space from dim to rot_dim, which may be slightly larger than the original space and and is a multiple of pq_dim (rot_dim % pq_dim == 0). However, this transform is not necessary when dim is multiple of pq_dim (dim == rot_dim, hence no need in adding “extra” data columns / features).

By default, if dim == rot_dim, the rotation transform is initialized with the identity matrix. When force_random_rotation == true, a random orthogonal transform matrix is generated regardless of the values of dim and pq_dim.

bool conservative_memory_allocation = false

By default, the algorithm allocates more space than necessary for individual clusters (list_data). This allows to amortize the cost of memory allocation and reduce the number of data copies during repeated calls to extend (extending the database).

The alternative is the conservative allocation behavior; when enabled, the algorithm always allocates the minimum amount of memory required to store the given number of records. Set this flag to true if you prefer to use as little GPU memory for the database as possible.