Retrieval-Augmented Generation (RAG) has become a cornerstone of modern AI systems, especially for enterprise applications where large language models need grounding in domain-specific knowledge.
But “RAG” isn’t one thing. There are several techniques of varying complexity, and the tools chosen, particularly databases and indexing strategies, should reflect that technique.
In this article, I’ll walk through three types of RAG strategies, the tradeoffs behind each, and how to choose appropriate tooling for your context.
1. Naive RAG
The simplest setup: embed your documents, store them in a vector database, and query using nearest-neighbour search. It is fast to implement and sufficient for many internal search tasks, though often brittle when queries are vague or abstract.
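To make this concrete, here is a minimal sketch of the naive pipeline in Python, using sentence-transformers and plain cosine similarity in place of a real vector database. The model name and the example documents are assumptions for illustration, not part of any specific setup.

```python
# Minimal naive RAG: embed documents once, then answer queries with
# nearest-neighbour search. Model name and documents are placeholders.
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # assumed embedding model

documents = [
    "Our refund policy allows returns within 30 days.",
    "The on-call rotation is documented in the runbook.",
    "Quarterly planning happens in the first week of each quarter.",
]

# Embed and L2-normalise so that the dot product equals cosine similarity.
doc_vectors = model.encode(documents, normalize_embeddings=True)

def retrieve(query: str, k: int = 2) -> list[str]:
    query_vector = model.encode([query], normalize_embeddings=True)[0]
    scores = doc_vectors @ query_vector          # cosine similarity per document
    top = np.argsort(scores)[::-1][:k]           # indices of the k best matches
    return [documents[i] for i in top]

print(retrieve("How long do customers have to return an item?"))
```

In a real deployment the in-memory array would be replaced by one of the databases discussed below, but the retrieval logic stays this simple.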
2. Reranking RAG
This approach retrieves a broad candidate set (for example, the top 100 results) and reranks them using a more capable model that examines full content, such as a cross-encoder. It increases relevance but adds computational cost and latency.
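Here is a hedged sketch of the rerank stage using a cross-encoder from sentence-transformers. The model name is an assumption, and the sketch assumes `candidates` already holds the wide set (e.g. the top 100 passages) from a first-pass vector search.

```python
# Reranking stage: score each (query, passage) pair with a cross-encoder
# and keep only the best few. Assumes `candidates` came from a first-pass
# nearest-neighbour search over the vector index.
from sentence_transformers import CrossEncoder

reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")  # assumed model

def rerank(query: str, candidates: list[str], top_k: int = 5) -> list[str]:
    scores = reranker.predict([(query, passage) for passage in candidates])
    ranked = sorted(zip(candidates, scores), key=lambda pair: pair[1], reverse=True)
    return [passage for passage, _ in ranked[:top_k]]
```

The cross-encoder reads the query and the passage together, which is exactly why it is both more accurate and more expensive than the embedding lookup that produced the candidates.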
3. LLM-Enabled or Agentic RAG
In this case, LLMs participate earlier in the process: rewriting queries, generating document summaries during ingestion, or dynamically shaping retrieval. It is often the only way to maintain high quality when questions are complex or ambiguous.
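One common pattern is query rewriting before retrieval. The sketch below assumes a generic `call_llm(prompt) -> str` helper (a placeholder for whatever provider you use, not a real API) and reuses the `retrieve` function from the naive sketch above.

```python
# LLM-enabled RAG: let the model rewrite a vague user question into a
# sharper search query before hitting the vector index.
# `call_llm` is a hypothetical helper wrapping your LLM provider of choice.

def call_llm(prompt: str) -> str:
    raise NotImplementedError("wrap your LLM provider here")

def rewrite_query(user_question: str) -> str:
    prompt = (
        "Rewrite the following question as a concise search query for a "
        "document index. Return only the query.\n\n"
        f"Question: {user_question}"
    )
    return call_llm(prompt).strip()

def agentic_retrieve(user_question: str, k: int = 5) -> list[str]:
    search_query = rewrite_query(user_question)
    return retrieve(search_query, k=k)  # retrieve() as in the naive sketch above
```

The same idea extends to ingestion-time summarisation and to letting the model decide which index or filter to query next.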
You cannot run an effective RAG system without making infrastructure decisions. Here is how common vector databases and retrieval tools map to each strategy:
Naive RAG works well with pgvector, Chroma, or DuckDB. These are fast to set up, easy to reason about, and integrate with structured or tabular data.
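As an example of how quick that setup can be, here is a small Chroma sketch. The collection name and documents are made up, and Chroma's default embedding function handles the embedding.

```python
# Chroma in-process setup: a collection, a couple of documents, and a
# nearest-neighbour query. Uses Chroma's default embedding function.
import chromadb

client = chromadb.Client()
collection = client.create_collection("internal-docs")  # name is an assumption

collection.add(
    ids=["doc-1", "doc-2"],
    documents=[
        "Expense reports are due by the 5th of each month.",
        "VPN access requires a ticket to the IT service desk.",
    ],
    metadatas=[{"team": "finance"}, {"team": "it"}],
)

results = collection.query(query_texts=["When are expense reports due?"], n_results=1)
print(results["documents"][0])
```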
Reranking RAG benefits from FAISS, Qdrant, or Milvus. These support high-recall candidate sets, which rerankers require.
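For the candidate-generation stage, a flat FAISS index is a common baseline. The sketch below assumes 384-dimensional embeddings (matching the MiniLM model used earlier) and random stand-in vectors, and pulls a wide top-100 set for the reranker to score.

```python
# FAISS candidate generation: exact inner-product search over normalised
# embeddings, returning a wide top-100 set for the reranker.
import faiss
import numpy as np

dim = 384                                    # assumed embedding dimension
index = faiss.IndexFlatIP(dim)               # exact search; swap for IVF/HNSW at scale

doc_vectors = np.random.rand(10_000, dim).astype("float32")  # stand-in embeddings
faiss.normalize_L2(doc_vectors)              # so inner product = cosine similarity
index.add(doc_vectors)

query_vector = np.random.rand(1, dim).astype("float32")
faiss.normalize_L2(query_vector)
scores, ids = index.search(query_vector, 100)  # high-recall candidate set
```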
LLM-Enabled RAG often depends on flexible pipelines and schema-aware storage. Weaviate and Qdrant are particularly suitable here, as they accommodate structured enrichment and allow dynamic querying.
Structured filters and clean metadata matter in every setup, but they become especially important when the LLM is orchestrating retrieval steps based on intermediate reasoning.
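To illustrate the structured-filter point, here is a hedged Qdrant sketch that combines a vector query with a metadata filter, the kind of constrained query an orchestrating LLM might compose. The collection name, payload fields, and vector size are assumptions.

```python
# Qdrant: vector search constrained by a structured metadata filter.
from qdrant_client import QdrantClient
from qdrant_client.models import (
    Distance, FieldCondition, Filter, MatchValue, PointStruct, VectorParams,
)

client = QdrantClient(":memory:")  # in-memory instance for the example

client.create_collection(
    collection_name="docs",
    vectors_config=VectorParams(size=384, distance=Distance.COSINE),
)

client.upsert(
    collection_name="docs",
    points=[
        PointStruct(id=1, vector=[0.1] * 384, payload={"team": "finance", "year": 2024}),
        PointStruct(id=2, vector=[0.2] * 384, payload={"team": "it", "year": 2023}),
    ],
)

hits = client.search(
    collection_name="docs",
    query_vector=[0.1] * 384,
    query_filter=Filter(must=[FieldCondition(key="team", match=MatchValue(value="finance"))]),
    limit=5,
)
```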
RAG systems are not one-size-fits-all. The difference between a frustrating chatbot and a useful AI application often comes down to retrieval quality. That quality, in turn, depends on how you structure, index, rank, and augment the content behind your models.
By understanding your RAG architecture (naive, reranking, or LLM-enhanced), you can choose tooling that matches your retrieval needs, avoid over-engineering simple use cases, and invest in complexity only where your queries actually demand it.
If you’re working in this space or exploring how to apply RAG techniques, I’d be interested to hear how you’re approaching the tradeoffs.