VecBench

A controllable benchmark framework for evaluating vector databases under filtered vector search workloads. VecBench supports multiple databases, datasets, index algorithms, and provides comprehensive metrics including recall, latency, QPS, and incremental operation performance. We also maintain an online leaderboard with benchmark results at http://www.x-bench.xin

Overview

Filtered vector search—retrieving nearest neighbors under structured constraints (e.g., equality, range, containment)—is a core operation in modern vector databases and LLM systems (e.g., RAG).

However, existing benchmarks suffer from three major limitations:

Limited data realism Most benchmarks rely on low-dimensional or small-scale datasets, which do not reflect real-world scenarios involving millions of high-dimensional vectors.
Simplistic query workloads

Queries are typically generated using random filters, ignoring two critical factors:
- Selectivity (how many data points pass the filter)
- Filter correlation (how filtering conditions align with vector similarity)
Incomplete evaluation metrics Prior work evaluates isolated metrics (e.g., recall or latency), without capturing system-level trade-offs, especially under dynamic updates.

To address these issues, we propose VecBench, a controllable benchmark for holistic evaluation of filtered vector search.

Framework Architecture

VecBench follows a modular pipeline:

Data loading and preprocessing
Controlled data generation (dimension & scale)
Query synthesis (selectivity & correlation)
Execution (multi-phase workload)
Metrics collection and aggregation

Comparison with Existing Work

Compared to existing benchmarks, VecBench provides:

Controllable data generation
Adjustable selectivity and filter correlation
Multi-type predicates (equality, range, containment)
Dynamic update evaluation
End-to-end holistic metrics

Features

Multi-database support: Milvus, pgvector (PostgreSQL), Qdrant, Weaviate
Multiple datasets: YFCC (10M, 1920-dim), SIFT (1M, 128-dim), GLOVE (1.2M, 200-dim), GIST (1M, 960-dim)
Hybrid queries: Vector similarity search combined with scalar filters (equality, range, contains)
Multiple index algorithms: HNSW, IVFFlat, DiskANN, ACORN ...
Concurrent benchmarking: Multi-threaded workload execution with warm-up phase and QPS measurement
Incremental operations: Insert, Update, Delete with configurable ratios
Query analysis: Filter rate and relevance analysis with automatic query modification
Ground truth generation: Brute-force exact search for recall evaluation

Quick Start

Prerequisites

Python 3.9+
A running instance of at least one supported vector database

Installation

conda create -n vecbench python=3.10
conda activate vecbench

pip install pymilvus psycopg2-binary qdrant-client weaviate-client
pip install numpy pandas h5py scipy pyyaml tqdm matplotlib

1. Prepare Data

Download the dataset files and place them under data/<DATASET>/. See Dataset Preparation for details.

2. Configure Database Connection

Edit config/db_config.yaml with your database connection info:

pgvector:
  host: localhost
  port: 5432
  dbname: postgres
  user: postgres
  password: your_password

milvus:
  host: localhost
  port: 19530

qdrant:
  host: localhost
  port: 6333
  grpc_port: 6334

weaviate:
  host: http://localhost:8080
  grpc_port: 50051

3. Initialize Database and Load Data

# Full YFCC 10M dataset with Milvus
python benchmark.py --case init --database milvus --dataset YFCC --scale 10000000

# SIFT 1M dataset with pgvector
python benchmark.py --case init --database pgvector --dataset SIFT --scale 1000000

# GLOVE with Qdrant, using smaller batch size
python benchmark.py --case init --database qdrant --dataset GLOVE --scale 1000000 --chunk_rows 500000

# GIST with Weaviate
python benchmark.py --case init --database weaviate --dataset GIST --scale 1000000

The init phase will:

Connect to the target database
Create the table/collection with the configured schema
Load and insert data in chunks
Generate ground truth for the configured queries

4. Run Benchmarks

# Concurrent benchmark: Milvus + HNSW, 50 threads
python benchmark.py --case test --database milvus --dataset YFCC --algorithm hnsw --concurrency 50

# pgvector + IVFFlat, 10 threads
python benchmark.py --case test --database pgvector --dataset SIFT --algorithm ivfflat --concurrency 10

# Run each query 5 times for more stable results
python benchmark.py --case test --database qdrant --dataset GLOVE --algorithm hnsw --times 5 --concurrency 50

The test phase runs a two-stage concurrent benchmark:

Warm-up phase: Preloads data into memory to eliminate cold-start I/O effects
Formal benchmark: Executes all queries under the specified concurrency, measures QPS and recall

Output includes per-query latency statistics and overall QPS/recall per search parameter.

CLI Reference

python benchmark.py [OPTIONS]

Option	Default	Description
`--case`	`init`	Execution mode: `data_pre`, `init`, `modify_queries`, `test`
`--database`	`milvus`	Target database: `milvus`, `pgvector`, `qdrant`, `weaviate`
`--dataset`	`YFCC`	Dataset: `YFCC`, `SIFT`, `GLOVE`, `GIST`
`--scale`	`10000000`	Number of vectors to load
`--chunk_rows`	`1000000`	Batch size for data loading
`--algorithm`	`hnsw`	Index algorithm: `hnsw`, `ivfflat`, `diskann`
`--times`	`1`	Number of times each query is executed
`--concurrency`	`50`	Number of concurrent threads
`--ratio`	`0.2`	Ratio of data for incremental generation
`--in_ratio`	`0.2`	Ratio of data to insert before index creation
`--up_ratio`	`0.2`	Ratio of data to update
`--de_ratio`	`0.2`	Ratio of data to delete
`--regen_incremental`	`False`	Force regenerate incremental data

Execution Modes

`data_pre` — Data Preparation

Generates synthetic and scaled data for benchmarking:

# Adjust vector dimensionality (YFCC to 1920d) and scale to 100M
python benchmark.py --case data_pre --dataset YFCC --scale 100000000

This runs three operations:

Dimension adjustment: Random projection to change vector dimensions
Scale adjustment: Replicate data with noise to reach target scale
Incremental data generation: Create insert/update/delete workload data

`init` — Database Initialization

Creates the table, loads data, and generates ground truth:

python benchmark.py --case init --database milvus --dataset YFCC --scale 10000000

`modify_queries` — Query Modification

Adjusts query filter selectivity based on filter rate or relevance analysis:

python benchmark.py --case modify_queries --database milvus --dataset YFCC

This uses the Analyzer to compute filter rates and relevances across the dataset, then modifies query scalar filters to achieve a target selectivity. Two modes are available:

filter: Modify filters to match a target filter rate (fraction of data passing the filter)
relevance: Modify filters to match a target relevance rate (fraction of top-k results passing the filter)

Edit the default_filter_rate / default_relevance_rate and per_query_filter / per_query_relevance dictionaries in benchmark.py to control target selectivities.

`test` — Benchmark Execution

Runs the benchmark with index creation, query execution, concurrent testing, and incremental operations:

python benchmark.py --case test --database milvus --dataset YFCC --algorithm hnsw --concurrency 50

The test case supports multiple phases (uncomment the desired phases in benchmark.py):

P1 — Initialization: Create vector and scalar indexes
P2 — Query Execution: Single-threaded query execution with detailed per-query statistics
P3 — Concurrent Phase: Multi-threaded benchmark with warm-up and QPS measurement
P4 — Incremental Load: Insert new data into the indexed collection
P5 — Update Phase: Update existing vectors and scalar fields
P6 — Delete Phase: Delete previously inserted incremental data

Project Structure

VecBench/
├── benchmark.py                 # Main entry point
├── config/
│   ├── db_config.yaml          # Database connection settings
│   ├── YFCC/
│   │   ├── schema.yaml         # Table schema for each database
│   │   ├── index.yaml          # Index configurations + search params
│   │   ├── E2E_queries.yaml    # End-to-end query definitions
│   │   └── Filtered_queries.yaml
│   ├── SIFT/
│   ├── GLOVE/
│   └── GIST/
├── data/                        # Dataset files (not included in repo)
│   ├── YFCC/
│   ├── SIFT/
│   ├── GLOVE/
│   └── GIST/
├── databases/                   # Database connector implementations
│   ├── base_vd.py              # Abstract base class
│   ├── milvus.py               # Milvus connector
│   ├── pgvector.py             # pgvector connector
│   ├── qdrant.py               # Qdrant connector
│   └── weaviate.py             # Weaviate connector
├── utils/
│   ├── workload_executor.py    # Query execution, ground truth generation
│   ├── concurrent.py           # Concurrent benchmarking with connection pool
│   ├── data_synthesizer.py     # Dimension/scale adjustment, incremental data
│   ├── analyzer.py             # Filter rate & relevance analysis
│   ├── query_generator.py      # Random query generation
│   └── plot.py                 # Result visualization
└── perf_result/                 # Benchmark output

Configuration

Schema Configuration (`config/<DATASET>/schema.yaml`)

Defines the table/collection structure for each database. Each dataset has a schema file specifying columns, types, and constraints. Example for YFCC on Milvus:

milvus:
  table_name: my_table
  columns:
    - name: id
      type: INT64
      primary: True
    - name: equal
      type: INT16
    - name: range
      type: FLOAT
    - name: tags
      type: ARRAY
      element_type: INT32
      max_capacity: 1600
    - name: image_vec
      type: FLOAT_VECTOR
      dimension: 1920

The schema includes both vector fields and scalar fields used for hybrid filtering.

Index Configuration (`config/<DATASET>/index.yaml`)

Defines index parameters and search parameters for each database and algorithm:

milvus:
  hnsw:
    - index_column: "image_vec"
      distance: L2
      params:
        - m: 16
          ef_construction: 64
  scalar_index:
    - index_column: "equal"
    - index_column: "range"
    - index_column: "tags"

search_params:
  hnsw: [800]
  ivfflat: [50]
  diskann: [100, 1000]

The search_params section controls the search parameter sweep (e.g., ef for HNSW, nprobe for IVFFlat). The benchmark runs all queries at each parameter value and reports QPS/recall for each.

Query Configuration (`config/<DATASET>/E2E_queries.yaml`)

Defines the benchmark queries with vector search + scalar filters:

YFCC:
  - name: query1
    vector_field: image_vec
    reference_vector_name: 987232
    scalar_filters:
      - field: equal
        operator: "=="
        value: 14
        logic: and
    limit: 100

Supported filter operators: ==, <, <=, >, >=, like, contains

Dataset Preparation

YFCC

The YFCC dataset uses a custom binary format. Place the following files in data/YFCC/:

File	Description
`base.10M.1920d.u8bin`	10M vectors (uint8, 1920 dimensions)
`base.metadata.10M.spmat`	Sparse tag matrix (CSR format)
`equal_cluster.u2bin`	Equality filter values (uint16)
`range.f4bin`	Range filter values (float32)

YFCC provides three scalar fields:

equal: Categorical/discrete value for equality filtering
range: Continuous value for range filtering
tags: Multi-value field for containment filtering

SIFT / GLOVE / GIST

These datasets use the standard HDF5 format from ann-benchmarks:

Dataset	File	Vectors	Dimensions	Distance
SIFT	`sift-128-euclidean.hdf5`	1M	128	L2
GLOVE	`glove-200-angular.hdf5`	1.2M	200	Cosine
GIST	`gist-960-euclidean.hdf5`	1M	960	L2

Place the HDF5 files in data/<DATASET>/ along with a *_scalar.bin file containing equality filter values for hybrid queries.

Scaling Data

To generate larger datasets from existing ones:

# Scale YFCC to 100M vectors
python benchmark.py --case data_pre --dataset YFCC --scale 100000000

This replicates the base data with Gaussian noise and saves the scaled dataset to data/YFCC/scaled_100M/.

Incremental Data

Incremental data for insert/update/delete operations is generated during data_pre:

python benchmark.py --case data_pre --dataset YFCC --scale 10000000 --ratio 0.2

The --ratio flag controls the size of incremental data as a fraction of the base dataset. Generated files are stored in data/<DATASET>/incremental_data/.

Database Deployment

Milvus

# Using Docker Compose (etcd + MinIO + Milvus standalone)
docker-compose up -d

Default port: 19530

pgvector

# Pull and run PostgreSQL with pgvector extension
docker run -d \
  -p 5432:5432 \
  -e POSTGRES_PASSWORD=your_password \
  pgvector/pgvector:pg16

Then create the extension: CREATE EXTENSION vector;

Qdrant

docker run -d -p 6333:6333 -p 6334:6334 qdrant/qdrant

Weaviate

docker run -d \
  -p 8080:8080 \
  -p 50051:50051 \
  cr.weaviate.io/semitechnologies/weaviate:latest

Extending VecBench

Adding a New Database

Create a new connector in databases/ that inherits from BaseVD
Implement the required methods: connect(), create_table(), create_index(), create_scalar_index(), process_data(), insert_data(), query(), close()
Add the database configuration to config/db_config.yaml
Add per-dataset schema entries in config/<DATASET>/schema.yaml
Register the database in setup_database() in benchmark.py

Adding a New Dataset

Place data files in data/<DATASET>/
Create config/<DATASET>/schema.yaml with table definitions
Create config/<DATASET>/index.yaml with index and search parameters
Create config/<DATASET>/E2E_queries.yaml with benchmark queries
Add the dataset loading logic in data/load_yfcc.py (for custom formats) or use the existing HDF5 loader

Metrics

VecBench reports the following metrics:

Recall: Fraction of ground truth results found by the approximate search
Latency: Per-query execution time (avg, min, max, p1/p5/p25/p50/p95/p99)
QPS: Queries per second under concurrent workload
Index creation time: Time to build vector and scalar indexes
Incremental operation time: Insert, update, delete throughput

License

Code License

This project is licensed under the MIT License. See the LICENSE file for full details.

Paper License

The associated paper is licensed under the Creative Commons Attribution 4.0 International (CC BY 4.0) License.

Citation

If you use this benchmark in your research or systems, please cite our paper:

@article{vecbench2026,
  title={VecBench: A Controllable Benchmark for Filtered Vector Search},
  author={Zhang, Xiang and Zhang, Chao and Fan, Ju and Li, Guoliang and Du, Xiaoyong},
  journal={Proceedings of the ACM on Management of Data (SIGMOD)},
  year={2026},
  doi={10.1145/3802125}
}

If you find this project helpful, please consider giving it a star ⭐ and citing our work.

Name		Name	Last commit message	Last commit date
Latest commit History 2 Commits
config		config
data		data
databases		databases
imgs		imgs
utils		utils
README.md		README.md
benchmark.py		benchmark.py

Uh oh!

Folders and files

Latest commit

History

Repository files navigation

VecBench

Overview

Framework Architecture

Comparison with Existing Work

Features

Quick Start

Prerequisites

Installation

1. Prepare Data

2. Configure Database Connection

3. Initialize Database and Load Data

4. Run Benchmarks

CLI Reference

Execution Modes

data_pre — Data Preparation

init — Database Initialization

modify_queries — Query Modification

test — Benchmark Execution

Project Structure

Configuration

Schema Configuration (config/<DATASET>/schema.yaml)

Index Configuration (config/<DATASET>/index.yaml)

Query Configuration (config/<DATASET>/E2E_queries.yaml)

Dataset Preparation

YFCC

SIFT / GLOVE / GIST

Scaling Data

Incremental Data

Database Deployment

Milvus

pgvector

Qdrant

Weaviate

Extending VecBench

Adding a New Database

Adding a New Dataset

Metrics

License

Code License

Paper License

Citation

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

`data_pre` — Data Preparation

`init` — Database Initialization

`modify_queries` — Query Modification

`test` — Benchmark Execution

Schema Configuration (`config/<DATASET>/schema.yaml`)

Index Configuration (`config/<DATASET>/index.yaml`)

Query Configuration (`config/<DATASET>/E2E_queries.yaml`)

Packages