Struct faiss::gpu::IVFPQBuildCagraConfig
-
struct IVFPQBuildCagraConfig
Public Members
-
uint32_t n_lists = 1024
The number of inverted lists (clusters).
Hint: the number of vectors per cluster (n_rows/n_lists) should be approximately 1,000 to 10,000.
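The hint above is simple arithmetic. The sketch below uses a hypothetical helper, `suggest_n_lists` (not part of Faiss), to pick a list count that targets an assumed 5,000 vectors per cluster, roughly the middle of the suggested range. It also reports the resulting average list size.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical helper (not a Faiss API): choose n_lists so that each cluster
// holds roughly `target_per_list` vectors, following the hint that
// n_rows / n_lists should be about 1,000 to 10,000.
uint32_t suggest_n_lists(uint64_t n_rows, uint64_t target_per_list = 5000) {
    assert(target_per_list > 0);
    // Round to nearest so the average list size stays close to the target.
    uint64_t lists = (n_rows + target_per_list / 2) / target_per_list;
    // Keep at least one list and stay within the uint32_t range of n_lists.
    lists = std::max<uint64_t>(lists, 1);
    lists = std::min<uint64_t>(lists, UINT32_MAX);
    return static_cast<uint32_t>(lists);
}

// Average number of vectors per inverted list for a given configuration.
double avg_rows_per_list(uint64_t n_rows, uint32_t n_lists) {
    return static_cast<double>(n_rows) / n_lists;
}
```

For example, with the default n_lists = 1024 and 1,000,000 vectors, the average list holds about 977 vectors, just under the suggested lower bound.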
-
uint32_t kmeans_n_iters = 20
The number of k-means iterations used to find the cluster centers (index building).
-
double kmeans_trainset_fraction = 0.5
The fraction of data to use during iterative kmeans building.
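As a rough illustration of kmeans_trainset_fraction, the hypothetical helper below computes how many rows a given fraction would select for k-means training. The rounding rule and the floor of one row are assumptions; the library may sample the training set differently.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical helper (not a Faiss API): number of rows that a given
// kmeans_trainset_fraction would select for k-means training.
// Rounds to the nearest row and never returns zero (both are assumptions).
uint64_t kmeans_train_rows(uint64_t n_rows, double fraction) {
    assert(fraction > 0.0 && fraction <= 1.0);
    auto rows = static_cast<uint64_t>(std::llround(n_rows * fraction));
    return rows == 0 ? 1 : rows;
}
```

With the default fraction of 0.5, one million input vectors would yield 500,000 training rows under this model.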
-
uint32_t pq_bits = 8
The bit length of the vector element after compression by PQ.
Possible values: [4, 5, 6, 7, 8].
Hint: the smaller the ‘pq_bits’, the smaller the index size and the better the search performance, but the lower the recall.
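To make the size trade-off concrete, the sketch below computes the per-vector code size and the compression ratio relative to float32 storage. It assumes the pq_dim codes of pq_bits each are tightly bit-packed; the helper names are hypothetical, and the count ignores codebooks, list metadata, and stored indices.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: bytes needed for one PQ-encoded vector, assuming
// pq_dim codes of pq_bits each are tightly bit-packed.
uint64_t pq_code_bytes(uint32_t pq_dim, uint32_t pq_bits) {
    assert(pq_bits >= 4 && pq_bits <= 8);           // documented range
    assert((uint64_t(pq_dim) * pq_bits) % 8 == 0);  // whole bytes only
    return uint64_t(pq_dim) * pq_bits / 8;
}

// Compression ratio versus storing a dim-dimensional float32 vector.
double compression_ratio(uint32_t dim, uint32_t pq_dim, uint32_t pq_bits) {
    return (4.0 * dim) / pq_code_bytes(pq_dim, pq_bits);
}
```

For a 128-dimensional float32 vector with pq_dim = 64, pq_bits = 8 stores 64 bytes (8× compression), while pq_bits = 4 halves that to 32 bytes (16×).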
-
uint32_t pq_dim = 0
The dimensionality of the vector after compression by PQ. When zero, an optimal value is selected using a heuristic.
NB: pq_dim * pq_bits must be a multiple of 8.
Hint: a smaller ‘pq_dim’ results in a smaller index size and better search performance, but lower recall. If ‘pq_bits’ is 8, ‘pq_dim’ can be set to any number, but multiples of 8 are desirable for good performance. If ‘pq_bits’ is not 8, ‘pq_dim’ should be a multiple of 8. For good performance, it is desirable that ‘pq_dim’ is a multiple of 32. Ideally, ‘pq_dim’ should also be a divisor of the dataset dim.
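The constraint and hints above can be collected into a small checker. `check_pq_params` is a hypothetical helper, not a Faiss function; it treats the NB rule as a hard error and reports the remaining hints as advisory messages.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical checker mirroring the documented pq_dim / pq_bits rules.
// pq_dim == 0 means "let the library choose", so only pq_bits is checked.
std::vector<std::string> check_pq_params(uint32_t dim, uint32_t pq_dim,
                                         uint32_t pq_bits) {
    std::vector<std::string> issues;
    if (pq_bits < 4 || pq_bits > 8)
        issues.push_back("error: pq_bits must be in [4, 8]");
    if (pq_dim == 0) return issues;  // heuristic selection; nothing else to check
    if ((uint64_t(pq_dim) * pq_bits) % 8 != 0)
        issues.push_back("error: pq_dim * pq_bits must be a multiple of 8");
    if (pq_bits != 8 && pq_dim % 8 != 0)
        issues.push_back("hint: pq_dim should be a multiple of 8 when pq_bits != 8");
    if (pq_dim % 32 != 0)
        issues.push_back("hint: a multiple of 32 is preferable for performance");
    if (dim % pq_dim != 0)
        issues.push_back("hint: pq_dim ideally divides dim");
    return issues;
}
```

For instance, dim = 96 with pq_dim = 48 and pq_bits = 8 passes the hard rule but yields one hint, because 48 is not a multiple of 32.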
-
codebook_gen codebook_kind = codebook_gen::PER_SUBSPACE
How PQ codebooks are created.
-
bool force_random_rotation = false
Apply a random rotation matrix to the input data and queries even if dim % pq_dim == 0.
Note: if dim is not a multiple of pq_dim, a random rotation is always applied to the input data and queries to transform the working space from dim to rot_dim, which may be slightly larger than the original space and is a multiple of pq_dim (rot_dim % pq_dim == 0). However, this transform is not necessary when dim is a multiple of pq_dim (dim == rot_dim, hence there is no need to add “extra” data columns / features).
By default, if dim == rot_dim, the rotation transform is initialized with the identity matrix. When force_random_rotation == true, a random orthogonal transform matrix is generated regardless of the values of dim and pq_dim.
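The relationship between dim, pq_dim, and rot_dim can be sketched as follows. The sketch assumes rot_dim is the smallest multiple of pq_dim that is at least dim (the text above only says it may be slightly larger); both helpers are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: working dimensionality after rotation, assumed to be
// the smallest multiple of pq_dim that is >= dim.
uint32_t rot_dim(uint32_t dim, uint32_t pq_dim) {
    assert(pq_dim > 0);
    return ((dim + pq_dim - 1) / pq_dim) * pq_dim;
}

// Whether a random orthogonal rotation is used, per the rule above:
// always when dim != rot_dim, otherwise only if forced.
bool uses_random_rotation(uint32_t dim, uint32_t pq_dim,
                          bool force_random_rotation) {
    return force_random_rotation || rot_dim(dim, pq_dim) != dim;
}
```

For example, dim = 100 with pq_dim = 32 gives rot_dim = 128, so a random rotation is applied; dim = 128 gives rot_dim = 128, and the identity is used unless force_random_rotation is set.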
-
bool conservative_memory_allocation = false
By default, the algorithm allocates more space than necessary for individual clusters (list_data). This amortizes the cost of memory allocation and reduces the number of data copies during repeated calls to extend (extending the database).
The alternative is the conservative allocation behavior; when enabled, the algorithm always allocates the minimum amount of memory required to store the given number of records. Set this flag to true if you prefer to use as little GPU memory for the database as possible.
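The trade-off can be illustrated with a toy model of a single cluster growing over repeated extend calls. The doubling growth factor and both helpers are assumptions for illustration only, not the policy Faiss or its GPU backend actually uses.

```cpp
#include <cassert>
#include <cstdint>

// Toy model (assumption): capacity of one cluster after it must hold
// `needed` records. Conservative mode allocates exactly what is needed;
// otherwise capacity grows by doubling (an assumed growth factor).
uint64_t grow_capacity(uint64_t current, uint64_t needed, bool conservative) {
    if (needed <= current) return current;  // fits: no reallocation
    if (conservative) return needed;        // minimum required
    uint64_t cap = current == 0 ? 1 : current;
    while (cap < needed) cap *= 2;          // over-allocate geometrically
    return cap;
}

// Counts reallocations when a cluster grows by `batch` records `steps` times.
int count_reallocations(uint64_t batch, int steps, bool conservative) {
    uint64_t cap = 0, size = 0;
    int reallocs = 0;
    for (int i = 0; i < steps; ++i) {
        size += batch;
        uint64_t next = grow_capacity(cap, size, conservative);
        if (next != cap) { ++reallocs; cap = next; }
    }
    return reallocs;
}
```

In this model, adding 100 records ten times triggers 10 reallocations in conservative mode but only 4 with doubling, at the cost of 1,024 reserved slots for 1,000 records.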