File pq4_fast_scan.h

namespace faiss

Copyright (c) Facebook, Inc. and its affiliates.

This source code is licensed under the MIT license found in the LICENSE file in the root directory of this source tree.

PQ4 SIMD packing and accumulation functions

The basic kernel accumulates nq query vectors against bbs = nb * 2 * 16 database vectors and produces an output matrix for them. It is only worthwhile for nq * nb <= 4; beyond that, register spilling becomes too costly.
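
The sizing rule above can be restated as a small sketch. The helper names below are hypothetical and not part of faiss; they only spell out the arithmetic bbs = nb * 2 * 16 and the nq * nb <= 4 guideline.

```cpp
#include <cstddef>

// Hypothetical helpers (not part of faiss) that restate the sizing rule.

// Number of database vectors handled per block by the basic kernel for a given nb.
constexpr std::size_t kernel_bbs(std::size_t nb) {
    return nb * 2 * 16;
}

// True when the (nq, nb) combination stays within the nq * nb <= 4 guideline.
constexpr bool kernel_within_guideline(std::size_t nq, std::size_t nb) {
    return nq * nb <= 4;
}
```

For example, nb = 2 gives a block of 64 vectors, which stays within the guideline for at most 2 queries at a time.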

The implementation of these functions is spread over 3 cpp files to reduce parallel compile times. Templates are instantiated explicitly.

This file contains callbacks for kernels that compute distances.


Functions

void pq4_pack_codes(const uint8_t *codes, size_t ntotal, size_t M, size_t nb, size_t bbs, size_t nsq, uint8_t *blocks)

Pack codes for consumption by the SIMD kernels. The unused bytes are set to 0.

Parameters:
  • codes – input codes, size (ntotal, ceil(M / 2))

  • ntotal – number of input codes

  • nb – output number of codes (ntotal rounded up to a multiple of bbs)

  • nsq – number of sub-quantizers (= M rounded up to a multiple of 2)

  • bbs – size of database blocks (multiple of 32)

  • blocks – output array, size nb * nsq / 2.
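
The size relations in the parameter list can be computed before allocating the output buffer. The sketch below is illustrative only: `round_up`, `PackedLayout` and `packed_layout` are hypothetical names, not faiss functions, and the code simply applies the rounding rules stated above.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper (not part of faiss): round a up to a multiple of b.
inline std::size_t round_up(std::size_t a, std::size_t b) {
    assert(b > 0);
    return (a + b - 1) / b * b;
}

// Hypothetical container for the derived sizes.
struct PackedLayout {
    std::size_t nb;           // ntotal rounded up to a multiple of bbs
    std::size_t nsq;          // M rounded up to a multiple of 2
    std::size_t blocks_bytes; // size of the output array, nb * nsq / 2
};

// Derive nb, nsq and the output size from ntotal, M and bbs.
inline PackedLayout packed_layout(std::size_t ntotal, std::size_t M, std::size_t bbs) {
    assert(bbs % 32 == 0); // bbs is documented as a multiple of 32
    PackedLayout layout;
    layout.nb = round_up(ntotal, bbs);
    layout.nsq = round_up(M, 2);
    layout.blocks_bytes = layout.nb * layout.nsq / 2;
    return layout;
}
```

With ntotal = 1000, M = 7 and bbs = 32, this gives nb = 1024, nsq = 8 and an output array of 4096 bytes.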

void pq4_pack_codes_range(const uint8_t *codes, size_t M, size_t i0, size_t i1, size_t bbs, size_t nsq, uint8_t *blocks)

Same as pq4_pack_codes, but writes only to a given range of the output and leaves the rest untouched. Assumes that the allocated entries are 0 on input.

Parameters:
  • codes – input codes, size (i1 - i0, ceil(M / 2))

  • i0 – first output code to write

  • i1 – end of the output range to write (the input size of i1 - i0 codes suggests that i1 itself is excluded)

  • blocks – output array, size at least ceil(i1 / bbs) * bbs * nsq / 2
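
Because the function expects the touched entries to be 0 on input, a caller that grows the output incrementally has to extend the buffer with zeros first. The following is a minimal sketch of that preparation step, assuming the buffer is held in a std::vector; `reserve_packed_range` is a hypothetical helper, not part of faiss.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not part of faiss): make sure `blocks` holds at least
// ceil(i1 / bbs) * bbs * nsq / 2 bytes, zero-filling any newly added bytes.
// Existing bytes are left untouched.
inline void reserve_packed_range(
        std::vector<std::uint8_t>& blocks,
        std::size_t i1,
        std::size_t bbs,
        std::size_t nsq) {
    assert(bbs > 0 && nsq % 2 == 0);
    std::size_t n_blocks = (i1 + bbs - 1) / bbs; // ceil(i1 / bbs)
    std::size_t needed = n_blocks * bbs * nsq / 2;
    if (blocks.size() < needed) {
        blocks.resize(needed, 0);
    }
}
```

With i1 = 100, bbs = 32 and nsq = 8, the buffer is grown to 4 * 32 * 8 / 2 = 512 bytes.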

uint8_t pq4_get_packed_element(const uint8_t *data, size_t bbs, size_t nsq, size_t vector_id, size_t sq)

Get a single element from a packed codes table.

Parameters:
  • vector_id – vector id

  • sq – subquantizer (< nsq)

void pq4_set_packed_element(uint8_t *data, uint8_t code, size_t bbs, size_t nsq, size_t vector_id, size_t sq)

Set a single element, code, in a packed codes table.

Parameters:
  • vector_id – vector id

  • sq – subquantizer (< nsq)

void pq4_pack_LUT(int nq, int nsq, const uint8_t *src, uint8_t *dest)

Pack the look-up table for consumption by the kernel.

Parameters:
  • nq – number of queries

  • nsq – number of sub-quantizers (multiple of 2)

  • src – input array, size (nq, 16)

  • dest – output array, size (nq, 16)

void pq4_accumulate_loop(int nq, size_t nb, int bbs, int nsq, const uint8_t *codes, const uint8_t *LUT, SIMDResultHandler &res, const NormTableScaler *scaler)

Loop over the database elements and accumulate the results into the result handler.

Parameters:
  • nq – number of queries

  • nb – number of database elements

  • bbs – size of database blocks (multiple of 32)

  • nsq – number of sub-quantizers (multiple of 2)

  • codes – packed codes array

  • LUT – packed look-up table

  • scaler – scaler to scale the encoded norm

int pq4_qbs_to_nq(int qbs)
int pq4_preferred_qbs(int nq)

Return the preferred decomposition into blocks for a given number of queries.

int pq4_pack_LUT_qbs(int fqbs, int nsq, const uint8_t *src, uint8_t *dest)

Pack the look-up table for consumption by the kernel.

Parameters:
  • qbs – 4-bit encoded number of query blocks; the total number of queries handled (nq) is deduced from it

  • nsq – number of sub-quantizers (multiple of 2)

  • src – input array, size (nq, 16)

  • dest – output array, size (nq, 16)

Returns:

nq
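
To illustrate how nq can be deduced from qbs, the sketch below assumes that each 4-bit digit of qbs stores the number of queries in one query block, so that nq is the sum of the digits. This reading is an assumption based on the parameter description, and `decode_qbs_nq` is a hypothetical name, not the faiss implementation.

```cpp
#include <cassert>

// Hypothetical decoder (not part of faiss). Assumption: every 4-bit digit of
// qbs holds the size of one query block, and nq is the sum of these digits.
inline int decode_qbs_nq(int qbs) {
    assert(qbs >= 0);
    int nq = 0;
    while (qbs != 0) {
        nq += qbs & 15; // size of the current query block
        qbs >>= 4;      // move on to the next 4-bit digit
    }
    return nq;
}
```

Under this assumption, qbs = 0x223 describes blocks of 3, 2 and 2 queries, for a total of 7.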

int pq4_pack_LUT_qbs_q_map(int qbs, int nsq, const uint8_t *src, const int *q_map, uint8_t *dest)

Same as pq4_pack_LUT_qbs, except that the source vectors are remapped with q_map.

void pq4_accumulate_loop_qbs(int qbs, size_t nb, int nsq, const uint8_t *codes, const uint8_t *LUT, SIMDResultHandler &res, const NormTableScaler *scaler = nullptr)

Run the accumulation loop.

Parameters:
  • qbs – 4-bit encoded number of queries

  • nb – number of database codes (multiple of bbs)

  • nsq – number of sub-quantizers

  • codes – encoded database vectors (packed)

  • LUT – look-up table (packed)

  • res – callback for the results

  • scaler – scaler to scale the encoded norm

struct CodePackerPQ4 : public faiss::CodePacker
#include <pq4_fast_scan.h>

CodePacker API for the PQ4 fast-scan kernels.

Public Functions

CodePackerPQ4(size_t nsq, size_t bbs)
virtual void pack_1(const uint8_t *flat_code, size_t offset, uint8_t *block) const final
virtual void unpack_1(const uint8_t *block, size_t offset, uint8_t *flat_code) const final

Public Members

size_t nsq