Design a Vector Search Engine — System Design Practice | AmanAI Lab

Pinecone · Google · Meta · MSFT

Hard

Problem

Design a vector search engine that stores 1 billion embeddings and serves approximate-nearest-neighbour (ANN) queries for semantic search and RAG retrieval.

Requirements:

•Insert, update, and delete vectors (with metadata) in near real time
•k-NN search with metadata filtering (e.g. "top 10 similar docs where tenant = X")
•Tunable trade-off between recall and latency
•Multi-tenant: thousands of isolated indexes

What you'll be assessed on

ANN index choice, sharding billions of vectors, the recall/latency/cost trade-off, filtered search, and real-time updates.

Scale & Constraints

▸1B vectors, 768–1536 dimensions each (~3–6 KB per vector as float32)
▸50K queries/sec at peak, P99 search latency ≤ 50ms
▸Recall@10 ≥ 0.95 target
▸Inserts/updates visible within seconds (near real time)
▸Metadata filters applied alongside vector similarity
▸Thousands of tenants with isolated namespaces
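
The per-vector figure above can be checked with a quick back-of-envelope calculation. This is a minimal sketch that assumes uncompressed float32 storage (4 bytes per dimension) and ignores metadata and index overhead; the function names are illustrative only.

```python
# Back-of-envelope sizing. Assumption: vectors are stored as raw float32
# (4 bytes per dimension), with no metadata or index overhead counted.
BYTES_PER_FLOAT32 = 4
NUM_VECTORS = 1_000_000_000


def raw_bytes_per_vector(dims: int) -> int:
    """Size of one uncompressed float32 vector in bytes."""
    return dims * BYTES_PER_FLOAT32


def raw_corpus_terabytes(dims: int, num_vectors: int = NUM_VECTORS) -> float:
    """Total uncompressed size of the corpus in decimal terabytes."""
    return raw_bytes_per_vector(dims) * num_vectors / 1e12


for dims in (768, 1536):
    print(f"{dims} dims: {raw_bytes_per_vector(dims)} B/vector, "
          f"{raw_corpus_terabytes(dims):.3f} TB raw")
```

At 3,072 to 6,144 bytes per vector, the raw corpus comes to roughly 3 to 6 TB before any index structures, which is why the design has to shard, compress, or both.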

Hints (if stuck)

💡 HNSW gives great recall/latency but is memory-hungry; IVF-PQ trades some recall for far lower memory at billion scale.

💡 Shard by vector ID (hash) and scatter-gather: query all shards in parallel, then merge the per-shard top-k lists. Replicate shards for throughput.
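
The merge step of scatter-gather can be sketched as follows. This is a minimal illustration, not a production coordinator: the `Hit` tuple layout and the `merge_top_k` name are assumptions, and the per-shard result lists stand in for responses that would arrive over the network.

```python
import heapq
from itertools import chain
from typing import Iterable, List, Tuple

# Hypothetical result shape: (distance, vector_id); smaller distance = closer.
Hit = Tuple[float, str]


def merge_top_k(shard_results: Iterable[List[Hit]], k: int) -> List[Hit]:
    """Merge each shard's local top-k list into one global top-k list."""
    # nsmallest does not require the per-shard lists to be pre-sorted.
    return heapq.nsmallest(k, chain.from_iterable(shard_results))


# Each shard returns its own local top-k (here k = 2, one shard holds 1 hit).
shard_a = [(0.10, "a1"), (0.40, "a2")]
shard_b = [(0.05, "b1"), (0.30, "b2")]
shard_c = [(0.20, "c1")]

merged = merge_top_k([shard_a, shard_b, shard_c], k=3)
print(merged)  # [(0.05, 'b1'), (0.1, 'a1'), (0.2, 'c1')]
```

Every shard has to return a full local top-k, because in the worst case all of the global top-k results live on a single shard.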

💡 Product Quantisation compresses vectors ~10–30× so a billion of them fit in RAM across the cluster.
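
To see where a compression ratio in that range can come from, here is a small calculation. It assumes float32 input vectors and a common PQ configuration of 256 centroids per subspace (one byte per sub-vector code); the number of sub-vectors is an illustrative choice, and codebook storage is ignored because it is shared across all vectors.

```python
import math


def pq_code_bytes(num_subvectors: int, centroids_per_subspace: int = 256) -> int:
    """Bytes needed to store one PQ code (one centroid index per sub-vector)."""
    bits_per_index = math.ceil(math.log2(centroids_per_subspace))
    return math.ceil(num_subvectors * bits_per_index / 8)


def compression_ratio(dims: int, num_subvectors: int) -> float:
    """Raw float32 size divided by PQ code size (codebooks not counted)."""
    raw_bytes = dims * 4  # assumption: float32 input
    return raw_bytes / pq_code_bytes(num_subvectors)


# Illustrative settings: 6 dimensions per sub-vector in both cases.
ratio_768 = compression_ratio(dims=768, num_subvectors=128)
ratio_1536 = compression_ratio(dims=1536, num_subvectors=256)
codes_total_gb = pq_code_bytes(128) * 1_000_000_000 / 1e9

print(ratio_768, ratio_1536, codes_total_gb)  # 24.0 24.0 128.0
```

Under these assumptions a billion 768-dimensional vectors shrink from about 3 TB to about 128 GB of codes. Fewer sub-vectors compress harder but lose more recall, which is one of the knobs behind the recall/latency/cost trade-off.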

💡 Metadata filtering is the hard part: naive post-filtering can return fewer than k results, and naive pre-filtering can starve the ANN graph of traversable neighbours; many engines use filtered graph traversal.
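
The difficulty with filters can be shown on a toy dataset. In this sketch a brute-force scan stands in for the ANN index, and the record layout, function names, and `overfetch` parameter are all illustrative assumptions. It contrasts filtering after the search (which can come back short) with restricting the candidate set before the search.

```python
import math
from typing import Callable, Dict, List, Tuple

Vector = List[float]
Record = Tuple[str, Vector, Dict[str, str]]  # (id, vector, metadata)
Predicate = Callable[[Dict[str, str]], bool]


def top_k(records: List[Record], query: Vector, k: int) -> List[Record]:
    """Brute-force nearest neighbours; a stand-in for an ANN index lookup."""
    return sorted(records, key=lambda r: math.dist(r[1], query))[:k]


def post_filter(records: List[Record], query: Vector, k: int,
                pred: Predicate, overfetch: int = 1) -> List[Record]:
    """Search first, then drop non-matching hits; may return fewer than k."""
    candidates = top_k(records, query, k * overfetch)
    return [r for r in candidates if pred(r[2])][:k]


def pre_filter(records: List[Record], query: Vector, k: int,
               pred: Predicate) -> List[Record]:
    """Restrict to matching records first, then search only those."""
    allowed = [r for r in records if pred(r[2])]
    return top_k(allowed, query, k)


records: List[Record] = [
    ("a", [0.0], {"tenant": "Y"}),
    ("b", [1.0], {"tenant": "Y"}),
    ("c", [2.0], {"tenant": "X"}),
    ("d", [3.0], {"tenant": "X"}),
]


def is_tenant_x(meta: Dict[str, str]) -> bool:
    return meta["tenant"] == "X"


ids_post = [r[0] for r in post_filter(records, [0.0], 2, is_tenant_x)]
ids_over = [r[0] for r in post_filter(records, [0.0], 2, is_tenant_x, overfetch=2)]
ids_pre = [r[0] for r in pre_filter(records, [0.0], 2, is_tenant_x)]
print(ids_post, ids_over, ids_pre)  # [] ['c', 'd'] ['c', 'd']
```

Over-fetching repairs post-filtering only when the filter is not too selective, and an exact scan over the allowed subset is affordable only when that subset is small. A real engine would typically pick between these strategies, or filter during graph traversal, based on estimated filter selectivity.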

💡 Handle updates with an in-memory write buffer + periodic background re-index/compaction; deletes use tombstones.
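
The update path described in this hint can be sketched as a small in-memory model. This is a hypothetical illustration: the `BufferedIndex` class, its method names, and the flush threshold are assumptions, plain dicts stand in for immutable ANN segments, and search is brute force rather than approximate.

```python
import math
from typing import Dict, List, Set, Tuple

Vector = List[float]


class BufferedIndex:
    """Toy model: mutable write buffer, immutable segments, tombstones."""

    def __init__(self, flush_threshold: int = 2) -> None:
        self.flush_threshold = flush_threshold
        self.buffer: Dict[str, Vector] = {}          # recent writes
        self.segments: List[Dict[str, Vector]] = []  # oldest first
        self.tombstones: Set[str] = set()            # deleted IDs

    def upsert(self, vid: str, vec: Vector) -> None:
        self.tombstones.discard(vid)  # a re-insert revives a deleted ID
        self.buffer[vid] = vec        # shadows any older copy in a segment
        if len(self.buffer) >= self.flush_threshold:
            self.flush()

    def delete(self, vid: str) -> None:
        self.buffer.pop(vid, None)
        self.tombstones.add(vid)      # hides copies still held in segments

    def flush(self) -> None:
        """Seal the buffer as a new segment (stand-in for an index build)."""
        if self.buffer:
            self.segments.append(self.buffer)
            self.buffer = {}

    def compact(self) -> None:
        """Merge segments, keep newest versions, physically drop deletes."""
        merged: Dict[str, Vector] = {}
        for segment in self.segments:  # newer segments overwrite older ones
            merged.update(segment)
        for vid in self.tombstones:
            merged.pop(vid, None)
        self.segments = [merged] if merged else []
        self.tombstones.clear()

    def search(self, query: Vector, k: int) -> List[Tuple[float, str]]:
        live: Dict[str, Vector] = {}
        for segment in self.segments:
            live.update(segment)
        live.update(self.buffer)       # buffer always holds the newest copy
        for vid in self.tombstones:
            live.pop(vid, None)
        return sorted((math.dist(v, query), vid) for vid, v in live.items())[:k]


index = BufferedIndex(flush_threshold=2)
index.upsert("a", [0.0])
index.upsert("b", [1.0])   # buffer reaches the threshold and is sealed
index.upsert("a", [5.0])   # update is visible immediately via the buffer
after_update = index.search([0.0], k=2)
index.delete("b")
after_delete = index.search([0.0], k=2)
index.compact()
after_compact = index.search([0.0], k=2)
print(after_update, after_delete, after_compact)
```

Writes become searchable as soon as they land in the buffer, which is what makes near-real-time visibility possible, while the expensive index build and the physical removal of deleted vectors are deferred to background flush and compaction.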
