Bay Information Systems

RAG Strategy and Tooling

Retrieval-Augmented Generation (RAG) has become a cornerstone of modern AI systems, especially for enterprise applications where large language models need grounding in domain-specific knowledge.

But “RAG” isn’t one thing. There are several techniques of varying complexity, and the tools chosen, particularly databases and indexing strategies, should reflect that technique.

In this article, I’ll walk through three types of RAG strategies, the tradeoffs behind each, and how to choose appropriate tooling for your context.


Three Types of RAG Systems

1. Naive RAG. The simplest setup: embed your documents, store them in a vector database, and query using nearest-neighbour search. It is fast to implement and sufficient for many internal search tasks, though often brittle when queries are vague or abstract. (A minimal retrieval sketch follows this list.)

2. Reranking RAG. This approach retrieves a broad candidate set (for example, the top 100 results) and reranks them using a more capable model that examines full content, such as a cross-encoder. It increases relevance but adds computational cost and latency. (See the reranking sketch below.)

3. LLM-Enabled or Agentic RAG. In this case, LLMs participate earlier in the process: rewriting queries, generating document summaries during ingestion, or dynamically shaping retrieval. It is often the only way to maintain high quality when questions are complex or ambiguous. (A query-rewriting sketch appears below.)
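
To make the naive setup concrete, here is a minimal retrieval sketch using FAISS and sentence-transformers. The model name, documents, and query are placeholders; in practice you would persist the index and keep a mapping from vector IDs back to source documents.

    # Naive RAG retrieval: embed documents, index them, search by similarity.
    # Assumes `pip install faiss-cpu sentence-transformers`.
    import faiss
    from sentence_transformers import SentenceTransformer

    docs = [
        "Our refund policy allows returns within 30 days.",
        "Support is available 9am-5pm GMT on weekdays.",
        "Enterprise plans include a dedicated account manager.",
    ]

    model = SentenceTransformer("all-MiniLM-L6-v2")  # placeholder model
    doc_vecs = model.encode(docs, normalize_embeddings=True)

    # Inner product on normalised vectors is cosine similarity.
    index = faiss.IndexFlatIP(doc_vecs.shape[1])
    index.add(doc_vecs)

    query_vec = model.encode(["When can I get a refund?"], normalize_embeddings=True)
    scores, ids = index.search(query_vec, 2)
    for score, i in zip(scores[0], ids[0]):
        print(f"{score:.3f}  {docs[i]}")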
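
For the reranking stage, a cross-encoder from sentence-transformers is a common choice. A sketch, assuming you have already fetched a wide candidate set from your vector store; the checkpoint name is a public example, not a recommendation:

    # Two-stage retrieval: a cheap vector search supplies candidates, then a
    # cross-encoder scores each (query, document) pair by reading both together.
    from sentence_transformers import CrossEncoder

    reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

    def rerank(query: str, candidates: list[str], top_k: int = 5) -> list[str]:
        scores = reranker.predict([(query, doc) for doc in candidates])
        ranked = sorted(zip(scores, candidates), key=lambda pair: pair[0], reverse=True)
        return [doc for _, doc in ranked[:top_k]]

    # e.g. fetch ~100 candidates from the index above, then keep only the
    # best few for the LLM's context window:
    # best = rerank("When can I get a refund?", candidates, top_k=5)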
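
And for the LLM-enabled variant, the simplest entry point is query rewriting before retrieval. A sketch using the OpenAI client; the model name and prompt are illustrative, and the same pattern covers summary generation at ingestion time:

    # LLM-assisted query rewriting: turn a vague user question into a
    # retrieval-friendly search query before hitting the vector store.
    # Assumes `pip install openai` and OPENAI_API_KEY in the environment.
    from openai import OpenAI

    client = OpenAI()

    def rewrite_query(user_question: str) -> str:
        response = client.chat.completions.create(
            model="gpt-4o-mini",  # placeholder model
            messages=[
                {"role": "system",
                 "content": "Rewrite the user's question as a concise search "
                            "query for a document index. Return only the query."},
                {"role": "user", "content": user_question},
            ],
        )
        return response.choices[0].message.content.strip()

    # The rewritten query then feeds the same retrieval pipeline as before.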


How Do Databases Support RAG?

You cannot run an effective RAG system without making infrastructure decisions. Below are common vector databases and retrieval tools, and where each tends to fit:

FAISS: a similarity-search library from Meta rather than a database; runs in-process and is very fast, but leaves persistence and metadata filtering to you.

Qdrant: an open-source vector database written in Rust, with strong payload (metadata) filtering alongside vector search.

Weaviate: an open-source vector database with built-in hybrid search, combining keyword (BM25) and vector scoring.

Milvus: an open-source vector database designed for distributed, large-scale deployments.

Pinecone: a fully managed vector database service; minimal operational burden, at the cost of vendor dependence.

pgvector (Postgres extension): adds vector columns and similarity operators to Postgres, letting you keep embeddings next to your relational data.

Chroma: a lightweight, developer-friendly vector store, popular for prototyping and smaller workloads.
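
As one concrete example from the list, pgvector lets a team that already runs Postgres add similarity search without new infrastructure. A minimal sketch; connection details, table name, and dimension are placeholders:

    # pgvector keeps embeddings next to relational data in Postgres.
    # Assumes `pip install "psycopg[binary]" pgvector` and the vector
    # extension available on the server.
    import numpy as np
    import psycopg
    from pgvector.psycopg import register_vector

    with psycopg.connect("dbname=rag user=postgres") as conn:
        conn.execute("CREATE EXTENSION IF NOT EXISTS vector")
        register_vector(conn)  # adapt Python values to the vector type
        conn.execute(
            "CREATE TABLE IF NOT EXISTS chunks "
            "(id bigserial PRIMARY KEY, body text, embedding vector(384))"
        )
        query_vec = np.zeros(384, dtype=np.float32)  # placeholder embedding
        # "<->" is pgvector's L2-distance operator; nearest rows come first.
        rows = conn.execute(
            "SELECT body FROM chunks ORDER BY embedding <-> %s LIMIT 5",
            (query_vec,),
        ).fetchall()
        for (body,) in rows:
            print(body)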


Matching Tooling to RAG Strategy

Each strategy leans on tooling differently. For naive RAG, a lightweight store such as FAISS, Chroma, or pgvector is usually enough, since the workload is a single nearest-neighbour lookup. Reranking RAG asks more of the database: it must return a wide candidate set quickly, so recall at high k matters more than top-1 precision. For LLM-enabled or agentic RAG, structured filters and clean metadata become especially important, because the LLM is orchestrating retrieval steps based on intermediate reasoning, and stores with rich filtering, such as Qdrant or Weaviate, are a natural fit.
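
A sketch of what that looks like with Qdrant's payload filtering; the collection name, payload fields, and query vector are placeholders:

    # Filtered vector search with Qdrant: the filter narrows the candidate
    # space before similarity ranking, so metadata quality directly affects
    # retrieval quality. Assumes `pip install qdrant-client` and a local server.
    from qdrant_client import QdrantClient
    from qdrant_client.models import FieldCondition, Filter, MatchValue

    client = QdrantClient(host="localhost", port=6333)

    query_vec = [0.0] * 384  # placeholder: embed the (possibly rewritten) query

    # An orchestrating LLM might decide mid-reasoning that only 2024 policy
    # documents are relevant; that decision becomes a structured filter.
    hits = client.search(
        collection_name="company_docs",
        query_vector=query_vec,
        query_filter=Filter(
            must=[
                FieldCondition(key="doc_type", match=MatchValue(value="policy")),
                FieldCondition(key="year", match=MatchValue(value=2024)),
            ]
        ),
        limit=5,
    )
    for hit in hits:
        print(hit.score, hit.payload)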


Summary

RAG systems are not one-size-fits-all. The difference between a frustrating chatbot and a useful AI application often comes down to retrieval quality. That quality, in turn, depends on how you structure, index, rank, and augment the content behind your models.

By understanding your RAG architecture (naive, reranking, or LLM-enhanced), you can choose tooling that matches your retrieval pattern, budget for the compute and latency each additional stage introduces, and invest in metadata and ingestion quality where it will pay off most.

If you’re working in this space or exploring how to apply RAG techniques, I’d be interested to hear how you’re approaching the tradeoffs.