File pq4_fast_scan.h
-
namespace faiss
Implementation of k-means clustering with many variants.
Copyright (c) Facebook, Inc. and its affiliates.
This source code is licensed under the MIT license found in the LICENSE file in the root directory of this source tree.
IDSelector is intended to define a subset of vectors to handle (for removal or as subset to search)
PQ4 SIMD packing and accumulation functions
The basic kernel accumulates nq query vectors against bbs = nb * 2 * 16 database vectors and produces an output matrix of the results. It is worthwhile only for nq * nb <= 4; beyond that, register spilling becomes too costly.
The implementation of these functions is spread over 3 cpp files to reduce parallel compile times. Templates are instantiated explicitly.
This file contains callbacks for kernels that compute distances.
Throughout the library, vectors are provided as float * pointers. Most algorithms can be optimized when several vectors are processed (added/searched) together in a batch. In this case, they are passed in as a matrix. When n vectors of size d are provided as float * x, component j of vector i is
x[ i * d + j ]
where 0 <= i < n and 0 <= j < d. In other words, matrices are always compact. When specifying the size of the matrix, we call it an n*d matrix, which implies a row-major storage.
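As an illustration of the row-major convention described above, here is a minimal sketch. The helpers `component` and `make_example_matrix` are hypothetical names introduced for this example and are not part of faiss.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not a faiss function): read component j of vector i
// from a compact, row-major n*d matrix stored as a flat float array.
inline float component(
        const float* x, std::size_t n, std::size_t d,
        std::size_t i, std::size_t j) {
    assert(i < n && j < d);
    return x[i * d + j];
}

// Hypothetical helper: build an n*d matrix whose entry (i, j) is 10*i + j,
// so that the position of each value is easy to recognize.
inline std::vector<float> make_example_matrix(std::size_t n, std::size_t d) {
    std::vector<float> x(n * d);
    for (std::size_t i = 0; i < n; i++) {
        for (std::size_t j = 0; j < d; j++) {
            x[i * d + j] = 10.0f * static_cast<float>(i) + static_cast<float>(j);
        }
    }
    return x;
}
```

Because the storage is compact, vector i starts at `x + i * d`, so a batch can be passed to a function as a single `float *` plus the two sizes n and d.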
I/O functions can read/write to a filename, a file handle or to an object that abstracts the medium.
The read functions return objects that should be deallocated with delete. All references within these objects are owned by the object.
Definition of inverted lists + a few common classes that implement the interface.
Since IVF (inverted file) indexes are of so much use for large-scale use cases, we group a few functions related to them in this small library. Most functions work both on IndexIVFs and IndexIVFs embedded within an IndexPreTransform.
In this file are the implementations of extra metrics beyond L2 and inner product
Implements a few neural net layers, mainly to support QINCo
Defines a few objects that apply transformations to a set of vectors. Often these are pre-processing steps.
Functions
-
void pq4_pack_codes(const uint8_t *codes, size_t ntotal, size_t M, size_t nb, size_t bbs, size_t nsq, uint8_t *blocks)
Pack codes for consumption by the SIMD kernels. The unused bytes are set to 0.
- Parameters:
codes – input codes, size (ntotal, ceil(M / 2))
ntotal – number of input codes
nb – output number of codes (ntotal rounded up to a multiple of bbs)
nsq – number of sub-quantizers (=M rounded up to a multiple of 2)
bbs – size of database blocks (multiple of 32)
blocks – output array, size nb * nsq / 2.
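The size relations listed in the parameters above can be sketched as follows. `round_up`, `PackedLayout` and `packed_layout` are hypothetical helpers written for this example; only the formulas come from the parameter descriptions.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: round a up to the next multiple of b.
inline std::size_t round_up(std::size_t a, std::size_t b) {
    assert(b > 0);
    return (a + b - 1) / b * b;
}

// Hypothetical container for the sizes derived from (ntotal, M, bbs).
struct PackedLayout {
    std::size_t nb;          // ntotal rounded up to a multiple of bbs
    std::size_t nsq;         // M rounded up to a multiple of 2
    std::size_t code_size;   // bytes per input code, ceil(M / 2)
    std::size_t blocks_size; // bytes in the output array, nb * nsq / 2
};

inline PackedLayout packed_layout(
        std::size_t ntotal, std::size_t M, std::size_t bbs) {
    assert(bbs > 0 && bbs % 32 == 0); // bbs is documented as a multiple of 32
    PackedLayout l;
    l.nb = round_up(ntotal, bbs);
    l.nsq = round_up(M, 2);
    l.code_size = (M + 1) / 2;
    l.blocks_size = l.nb * l.nsq / 2;
    return l;
}
```

With these values, a caller would presumably allocate a zero-initialized buffer of `blocks_size` bytes and pass `nb`, `bbs` and `nsq` to pq4_pack_codes alongside the input codes.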
-
void pq4_pack_codes_range(const uint8_t *codes, size_t M, size_t i0, size_t i1, size_t bbs, size_t nsq, uint8_t *blocks)
Same as pq4_pack_codes, but writes only a given range of the output and leaves the rest untouched. Assumes the allocated entries are 0 on input.
- Parameters:
codes – input codes, size (i1 - i0, ceil(M / 2))
i0 – first output code to write
i1 – end of the output range to write (exclusive, since the input holds i1 - i0 codes)
blocks – output array, size at least ceil(i1 / bbs) * bbs * nsq / 2
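The lower bound on the size of blocks can be computed as in the sketch below. The function name `min_blocks_size_for_range` is hypothetical; the formula is the one given for the blocks parameter.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: smallest number of bytes the packed output must hold
// so that codes up to index i1 fit, i.e. ceil(i1 / bbs) * bbs * nsq / 2.
inline std::size_t min_blocks_size_for_range(
        std::size_t i1, std::size_t bbs, std::size_t nsq) {
    assert(bbs > 0 && bbs % 32 == 0); // block size is a multiple of 32
    assert(nsq % 2 == 0);             // nsq is M rounded up to a multiple of 2
    std::size_t nblocks = (i1 + bbs - 1) / bbs; // ceil(i1 / bbs)
    return nblocks * bbs * nsq / 2;
}
```

Note that the bound depends only on i1, not on i0: the output array always covers whole blocks from index 0 up to the block that contains the end of the range.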
-
uint8_t pq4_get_packed_element(const uint8_t *data, size_t bbs, size_t nsq, size_t vector_id, size_t sq)
Get a single element from a packed codes table.
- Parameters:
vector_id – vector id
sq – subquantizer (< nsq)
-
void pq4_set_packed_element(uint8_t *data, uint8_t code, size_t bbs, size_t nsq, size_t vector_id, size_t sq)
Set a single element “code” in a packed codes table.
- Parameters:
vector_id – vector id
sq – subquantizer (< nsq)
-
void pq4_pack_LUT(int nq, int nsq, const uint8_t *src, uint8_t *dest)
Pack Look-up table for consumption by the kernel.
- Parameters:
nq – number of queries
nsq – number of sub-quantizers (multiple of 2)
src – input array, size (nq, 16)
dest – output array, size (nq, 16)
-
void pq4_accumulate_loop(int nq, size_t nb, int bbs, int nsq, const uint8_t *codes, const uint8_t *LUT, SIMDResultHandler &res, const NormTableScaler *scaler)
Loop over database elements and accumulate results into the result handler.
- Parameters:
nq – number of queries
nb – number of database elements
bbs – size of database blocks (multiple of 32)
nsq – number of sub-quantizers (multiple of 2)
codes – packed codes array
LUT – packed look-up table
scaler – scaler to scale the encoded norm
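To make the idea of look-up table accumulation concrete, here is a scalar reference sketch. It is not the SIMD kernel and does not use the packed layouts: it assumes unpacked codes (one 4-bit code per byte) and a look-up table laid out as (nq, nsq, 16) in row-major order. Both layouts and the name `accumulate_reference` are assumptions made for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Scalar reference (assumed layouts, not the faiss kernel):
//   codes: (nb, nsq), one code per byte, each value < 16
//   lut:   (nq, nsq, 16), entry for query q, sub-quantizer sq, code c
// Returns an (nq, nb) row-major matrix of accumulated table entries.
inline std::vector<std::uint32_t> accumulate_reference(
        std::size_t nq, std::size_t nb, std::size_t nsq,
        const std::vector<std::uint8_t>& codes,
        const std::vector<std::uint8_t>& lut) {
    assert(codes.size() == nb * nsq);
    assert(lut.size() == nq * nsq * 16);
    std::vector<std::uint32_t> out(nq * nb, 0);
    for (std::size_t q = 0; q < nq; q++) {
        for (std::size_t i = 0; i < nb; i++) {
            for (std::size_t sq = 0; sq < nsq; sq++) {
                std::uint8_t c = codes[i * nsq + sq];
                assert(c < 16); // 4-bit codes only
                out[q * nb + i] += lut[(q * nsq + sq) * 16 + c];
            }
        }
    }
    return out;
}
```

A reference of this kind can be useful for checking a vectorized implementation on small inputs, since each output entry is simply the sum of nsq table look-ups.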
-
int pq4_qbs_to_nq(int qbs)
-
int pq4_preferred_qbs(int nq)
Return the preferred decomposition into blocks for a given number of queries.
-
int pq4_pack_LUT_qbs(int fqbs, int nsq, const uint8_t *src, uint8_t *dest)
Pack Look-up table for consumption by the kernel.
- Parameters:
qbs – 4-bit encoded number of query blocks, the total number of queries handled (nq) is deduced from it
nsq – number of sub-quantizers (multiple of 2)
src – input array, size (nq, 16)
dest – output array, size (nq, 16)
- Returns:
nq
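One plausible reading of the 4-bit encoding is sketched below. It assumes that each 4-bit nibble of qbs holds the size of one query block, starting from the lowest nibble, and that nq is the sum of the nibbles. This interpretation and the names `qbs_to_nq_sketch` and `qbs_block_sizes_sketch` are assumptions for illustration, not the library's definition.

```cpp
#include <cassert>
#include <vector>

// Assumed encoding: qbs is a sequence of 4-bit nibbles, lowest first,
// each giving the number of queries in one block. A zero value ends the list.
inline std::vector<int> qbs_block_sizes_sketch(int qbs) {
    assert(qbs >= 0);
    std::vector<int> sizes;
    while (qbs != 0) {
        sizes.push_back(qbs & 15); // extract the lowest nibble
        qbs >>= 4;                 // move on to the next block
    }
    return sizes;
}

// Under the same assumption, the total number of queries is the nibble sum.
inline int qbs_to_nq_sketch(int qbs) {
    int nq = 0;
    for (int s : qbs_block_sizes_sketch(qbs)) {
        nq += s;
    }
    return nq;
}
```

Under this reading, qbs = 0x33 would describe two blocks of 3 queries each, for a total of 6 queries.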
-
int pq4_pack_LUT_qbs_q_map(int qbs, int nsq, const uint8_t *src, const int *q_map, uint8_t *dest)
Same as pq4_pack_LUT_qbs, except the source vectors are remapped with q_map
-
void pq4_accumulate_loop_qbs(int qbs, size_t nb, int nsq, const uint8_t *codes, const uint8_t *LUT, SIMDResultHandler &res, const NormTableScaler *scaler = nullptr)
Run accumulation loop.
- Parameters:
qbs – 4-bit encoded number of queries
nb – number of database codes (multiple of bbs)
nsq – number of sub-quantizers
codes – encoded database vectors (packed)
LUT – look-up table (packed)
res – call-back for the results
scaler – scaler to scale the encoded norm
-
struct CodePackerPQ4 : public faiss::CodePacker
- #include <pq4_fast_scan.h>
CodePacker API for the PQ4 fast-scan
Public Functions
-
CodePackerPQ4(size_t nsq, size_t bbs)
-
virtual void pack_1(const uint8_t *flat_code, size_t offset, uint8_t *block) const final
-
virtual void unpack_1(const uint8_t *block, size_t offset, uint8_t *flat_code) const final
Public Members
-
size_t nsq