Why Are Vector Databases Difficult? A Deep Dive
Introduction
Vector databases have become essential in modern AI applications, powering semantic search, retrieval-augmented generation (RAG), and recommendation systems. These databases store embedding vectors (high-dimensional numerical representations derived from machine learning models). Unlike traditional databases that store structured or keyword-based data, vector databases enable similarity search, where a query retrieves the most semantically similar items.
Despite their utility, vector databases introduce unique challenges, particularly when handling multi-model embeddings. Different models generate vectors of varying sizes and distributions, making indexing, querying, and infrastructure management more complex. This article explores the core difficulties of vector databases, including:
- The nature and origin of embeddings
- Large and variable vector sizes
- CPU vs. GPU deployment and caching implications
- Indexing complexities (HNSW, IVF, etc.)
- Memory bandwidth and storage constraints
- Fine-tuning embedding models for better retrieval
- The difference between storing and querying embeddings
Understanding Embeddings and Latent Space
What Are Embeddings?
Embeddings are numerical representations of data (text, images, audio) mapped into a high-dimensional space. These vectors capture semantic relationships—similar items are closer together, while dissimilar items are farther apart.
For example, text embeddings from models like all-MiniLM-L6-v2 map similar phrases to nearby points in a 384-dimensional space. Image embeddings, such as those from Meta’s SAM model, encode visual features into vectors that often exceed 1,024 dimensions, making them significantly larger than typical text embeddings.
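To make this concrete, here is a minimal sketch of computing and comparing text embeddings with the all-MiniLM-L6-v2 model named above, assuming the sentence-transformers package is installed; the sentences are placeholders:

```python
from sentence_transformers import SentenceTransformer

# all-MiniLM-L6-v2 maps text to 384-dimensional vectors.
model = SentenceTransformer("all-MiniLM-L6-v2")

sentences = [
    "How do I reset my password?",
    "I forgot my login credentials.",
    "Best hiking trails near Denver.",
]
embeddings = model.encode(sentences, normalize_embeddings=True)

# With normalized vectors, cosine similarity is a plain dot product.
similarity = embeddings @ embeddings.T
print(similarity.round(3))  # the first two sentences score far closer than the third
```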
How Are Latent Spaces Built?
Modern embeddings are derived from deep learning models, often from the intermediate layers of a Transformer model. Historically, Variational Autoencoders (VAEs) were common, but today’s approaches primarily use:
- Contrastive Learning (e.g., CLIP, SimCLR): Forces similar items to have close embeddings while pushing dissimilar ones apart (sketched after this list).
- Self-Supervised Learning: Uses masked token prediction (BERT-style models) or next-sentence prediction.
- Metric Learning: Optimizes embeddings for retrieval-specific objectives.
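To make the contrastive objective concrete, the sketch below computes a simplified InfoNCE-style loss over a batch of paired embeddings in NumPy. It illustrates the idea (match each query to its own positive, push away everything else in the batch); it is not the exact objective any particular model uses:

```python
import numpy as np

def info_nce_loss(queries, positives, temperature=0.07):
    """Simplified InfoNCE: each query should match its own positive
    and be pushed away from every other item in the batch."""
    q = queries / np.linalg.norm(queries, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    logits = (q @ p.T) / temperature              # (batch, batch) similarities
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    # Row-wise softmax cross-entropy with the diagonal as the correct class.
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))

rng = np.random.default_rng(0)
q = rng.normal(size=(8, 64))
p = q + 0.1 * rng.normal(size=(8, 64))   # noisy copies act as positives
print(info_nce_loss(q, p))
```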
Why Fine-Tune Embeddings?
Off-the-shelf embeddings might not be optimal for specific domains. Fine-tuning aligns the latent space with the problem at hand. The key steps:
- Collect labeled pairs (query-document pairs, similar/dissimilar images, etc.).
- Train with a loss function (contrastive loss, triplet loss, or cosine similarity loss).
- Evaluate using retrieval metrics like Mean Reciprocal Rank (MRR), Recall@K, or NDCG (see the sketch below).
Fine-tuned embeddings can dramatically improve retrieval quality but add complexity in maintaining custom models.
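For the evaluation step, Recall@K and MRR are straightforward to compute from ranked results; a minimal sketch, with hypothetical document IDs:

```python
def recall_at_k(ranked_ids, relevant_ids, k):
    """Fraction of queries with at least one relevant item in the top k."""
    hits = [any(doc in relevant for doc in ranked[:k])
            for ranked, relevant in zip(ranked_ids, relevant_ids)]
    return sum(hits) / len(hits)

def mean_reciprocal_rank(ranked_ids, relevant_ids):
    """Average of 1/rank of the first relevant item (0 if none retrieved)."""
    total = 0.0
    for ranked, relevant in zip(ranked_ids, relevant_ids):
        for rank, doc in enumerate(ranked, start=1):
            if doc in relevant:
                total += 1.0 / rank
                break
    return total / len(ranked_ids)

# Two queries: ranked result lists plus the sets of truly relevant documents.
ranked = [["d3", "d1", "d7"], ["d2", "d9", "d4"]]
relevant = [{"d1"}, {"d4"}]
print(recall_at_k(ranked, relevant, k=2))      # 0.5
print(mean_reciprocal_rank(ranked, relevant))  # (1/2 + 1/3) / 2 ≈ 0.417
```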
Why Are Vector Databases Challenging?
1. Large and Variable Vector Sizes
Embeddings range from 256 dimensions (simple models) to 2,048+ dimensions (complex image models like SAM). High-dimensional vectors cause:
- Memory bloat: A dataset with millions of 1,536-dimensional vectors consumes tens of gigabytes of RAM (see the back-of-envelope calculation after this list).
- CPU cache inefficiencies: Large vectors don’t fit neatly in cache lines, leading to slow memory accesses.
- Difficulty in mixed-model retrieval: Different models output varying sizes, forcing schema design choices (multiple indexes or concatenated vectors).
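A back-of-envelope calculation makes the memory bloat point concrete (float32 vectors assumed, index overhead excluded):

```python
num_vectors = 5_000_000
dims = 1536
bytes_per_float = 4                        # float32

raw_bytes = num_vectors * dims * bytes_per_float
print(f"{raw_bytes / 2**30:.1f} GiB of raw vectors")  # ~28.6 GiB
# Index structures (e.g., HNSW graph links) add further per-vector overhead.
```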
2. CPU vs. GPU for Vector Search
FAISS and GPU Acceleration
Facebook AI Similarity Search (FAISS) is one of the most widely used libraries for vector search, supporting both CPU and GPU acceleration.
- CPU-Based FAISS: Uses SIMD (vectorized operations) and multithreading to optimize nearest neighbor search.
- GPU-Based FAISS: Leverages CUDA for parallel brute-force search, often 10-100x faster than CPU.
Key Trade-offs:
| Factor | CPU FAISS | GPU FAISS |
|---|---|---|
| Query Latency | Slower (depends on RAM bandwidth) | Fast (parallel execution) |
| Index Types | Flat, IVF, HNSW, PQ | Flat (brute-force), IVF, IVFPQ |
| Memory Usage | Fits in system RAM | Limited to GPU VRAM (often <24GB) |
| Cost | Cheaper | Expensive GPUs required |
GPU-based FAISS excels at large-scale brute-force search, but graph-based indexes such as HNSW are not available on the GPU, and VRAM limits how much of an index can live on the device.
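A minimal FAISS sketch illustrates both paths; it assumes the faiss-cpu (or faiss-gpu) package and uses random data as a stand-in for real embeddings:

```python
import numpy as np
import faiss

d = 768                                             # embedding dimensionality
xb = np.random.rand(100_000, d).astype("float32")   # database vectors
xq = np.random.rand(5, d).astype("float32")         # query vectors

# CPU: exact brute-force search over all stored vectors.
index = faiss.IndexFlatL2(d)
index.add(xb)
distances, ids = index.search(xq, 5)                # top-5 neighbors per query

# GPU (only in CUDA builds of FAISS): same index, moved to device 0.
if hasattr(faiss, "StandardGpuResources") and faiss.get_num_gpus() > 0:
    res = faiss.StandardGpuResources()
    gpu_index = faiss.index_cpu_to_gpu(res, 0, index)
    distances, ids = gpu_index.search(xq, 5)
```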
3. Indexing Difficulties
Vector search requires Approximate Nearest Neighbor (ANN) indexing to speed up retrieval. Popular techniques include:
- HNSW (Hierarchical Navigable Small World): Graph-based search, high recall but memory-heavy.
- IVF (Inverted File Index): Clustered search, requires training on dataset.
- PQ (Product Quantization): Compresses vectors to reduce memory footprint.
Each has trade-offs:
- HNSW supports incremental inserts, but deletions and heavy churn degrade the graph, often forcing full rebuilds.
- IVF struggles with dynamic data, as centroids don’t adjust without retraining.
- PQ reduces precision, affecting search quality.
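FAISS exposes all three index families, which makes the trade-offs easy to see side by side; the parameter values below are illustrative, not tuned:

```python
import numpy as np
import faiss

d = 384
xb = np.random.rand(50_000, d).astype("float32")

# HNSW: graph-based, no training step, but stores extra links per vector.
hnsw = faiss.IndexHNSWFlat(d, 32)              # 32 = links per node (M)
hnsw.add(xb)

# IVF: clusters the data first, so it must be trained on a sample.
quantizer = faiss.IndexFlatL2(d)
ivf = faiss.IndexIVFFlat(quantizer, d, 1024)   # 1024 = number of clusters
ivf.train(xb)                                  # centroids fixed here; drift needs retraining
ivf.add(xb)
ivf.nprobe = 16                                # clusters scanned per query

# PQ: compresses each vector into 48 one-byte codes, trading accuracy for memory.
pq = faiss.IndexPQ(d, 48, 8)                   # 48 sub-quantizers, 8 bits each
pq.train(xb)
pq.add(xb)
```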
4. Storage vs. Querying: Not the Same
Many databases claim to “support embeddings,” but few offer native vector search.
- PostgreSQL (pgvector) and SQLite (sqlite-vss) provide SQL interfaces for vector search, but only through extensions rather than core engine features.
- MySQL, MongoDB, and Redis store embeddings but don’t always offer optimized similarity search.
This distinction matters: If the database doesn’t support ANN natively, you may need to offload search to Python (FAISS, Annoy) or C++ implementations, adding architectural complexity.
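As an example of keeping the search inside the database, here is a sketch of a pgvector nearest-neighbor query via psycopg2; the connection string, table, and column names are hypothetical:

```python
import psycopg2

conn = psycopg2.connect("dbname=mydb")   # hypothetical connection string
cur = conn.cursor()

# pgvector's <-> operator is L2 distance; ORDER BY ... LIMIT turns the
# query into a nearest-neighbor search that an HNSW/IVFFlat index can serve.
query_embedding = [0.1] * 384            # placeholder; use a real model output
vector_literal = "[" + ",".join(map(str, query_embedding)) + "]"
cur.execute(
    """
    SELECT id, content
    FROM documents
    ORDER BY embedding <-> %s::vector
    LIMIT 5
    """,
    (vector_literal,),
)
print(cur.fetchall())
```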
5. Infrastructure Constraints
Memory Bandwidth & Scaling Issues
- Vector search is memory-bound: Even optimized FAISS queries often saturate RAM bandwidth, which caps query throughput (see the estimate after this list).
- Sharding vectors across machines is complex: Unlike traditional databases, vector DBs require distributed indexing.
- Few databases offer true storage/compute separation: Pinecone and Milvus are starting to offload storage to object stores, but most systems still require fully in-memory indexes.
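The memory-bound claim is easy to sanity-check: a brute-force scan streams every vector through the CPU once per query, so RAM bandwidth puts a hard floor on latency. The numbers below are illustrative:

```python
num_vectors = 10_000_000
dims = 1024
bytes_scanned = num_vectors * dims * 4   # float32, full scan per query
ram_bandwidth = 50e9                     # ~50 GB/s, a typical server figure

lower_bound_s = bytes_scanned / ram_bandwidth
print(f"{lower_bound_s * 1000:.0f} ms minimum per brute-force query")  # ~819 ms
```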
Logging & Observability
Unlike SQL queries, vector similarity queries aren’t easily interpretable. Debugging search results is hard because raw embeddings lack human readability. To mitigate:
- Store metadata alongside vectors (e.g., text descriptions of embeddings).
- Use embedding visualization tools like TensorBoard’s embedding projector.
- Log the top-N retrieved items per query to analyze search quality over time (a sketch follows).
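The last point is cheap to implement: append one structured record per query to a JSONL file that can be analyzed offline. A minimal sketch with hypothetical field names:

```python
import json
import time

def log_query(log_file, query_text, ids, distances, n=10):
    """Append one structured record per query for offline quality analysis."""
    record = {
        "ts": time.time(),
        "query": query_text,
        "results": [
            {"id": int(i), "distance": float(d)}
            for i, d in zip(ids[:n], distances[:n])
        ],
    }
    log_file.write(json.dumps(record) + "\n")

# Usage: after index.search(), pass the query string plus the top row of
# ids/distances, then load the resulting JSONL to track quality over time.
```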
Comparing Vector Database Solutions
| Database | Type | Indexing Methods | Max Dimensions | Storage/Compute Separation | Best Use Case |
|---|---|---|---|---|---|
| PostgreSQL (pgvector) | RDBMS Extension | HNSW, IVFFlat | 2,000 indexed (16,000 unindexed) | No | Adding vector search to relational data |
| SQLite (VSS Extension) | Embedded SQL | IVF, Flat | Flexible (depends on implementation) | No | Lightweight, mobile/edge applications |
| Milvus | Dedicated Vector DB | HNSW, IVF, DiskANN | 32,768+ | Yes | Large-scale, distributed search |
| Pinecone | Managed Cloud | Proprietary (HNSW-like) | 20,000 | Partial | Serverless, easy integration |
| Weaviate | Dedicated Vector DB (GraphQL API) | HNSW | 65,000+ | Yes | Multi-modal search (text + images) |
Conclusion
Vector databases are powerful but introduce significant scalability, memory, and indexing challenges. When dealing with multi-model embeddings, engineers must carefully:
- Choose between CPU vs. GPU search based on cost/performance.
- Select an appropriate indexing strategy (HNSW, IVF, PQ).
- Understand that storage ≠ query performance—some databases store embeddings but don’t optimize retrieval.
- Consider fine-tuning embeddings to improve retrieval performance.
Despite these hurdles, innovations like FAISS on GPUs, serverless vector DBs, and smarter ANN indexing are making vector search more efficient. However, designing a robust, scalable retrieval system still requires deep architectural planning.