Retrieval Infrastructure

The Open-Source RAG Stack Is Here

Yuki Nakashima, May 9, 2025

In late 2023, building a production RAG system required assembling components that were, to varying degrees, immature. The vector databases were early. The embedding models were mostly proprietary. The chunking and indexing tooling was hand-rolled by every team. The evaluation frameworks were essentially nonexistent. The pattern for deploying RAG in production was not well understood, and teams were learning it by doing.

By mid-2025, that's no longer true. There is now a mature open-source stack for retrieval-augmented generation, with production-grade components at every layer. The problem has shifted from "figure out how to build this" to "understand the trade-offs between alternatives and operate the result reliably."

The stack that has emerged

The ingestion layer (document parsing, chunking, embedding, and indexing) has consolidated around a small number of well-understood patterns. Unstructured handles extraction from diverse document formats more accurately than earlier tools. Chunking strategies (fixed-size, semantic, hierarchical) are now documented with empirical guidance on which works for which query types. Fine-tuned open embedding models have closed much of the quality gap with proprietary alternatives.
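To make the simplest of those chunking strategies concrete, here is a minimal sketch of fixed-size chunking with overlap. The function name, the character-based windows, and the default sizes are assumptions chosen for illustration; real pipelines usually count tokens rather than characters and tune both numbers per corpus.

```python
def chunk_fixed(text: str, size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into windows of `size` characters that overlap by `overlap`.

    Hypothetical helper for illustration; not taken from any library.
    """
    if size <= 0:
        raise ValueError("size must be positive")
    if not 0 <= overlap < size:
        raise ValueError("overlap must satisfy 0 <= overlap < size")

    step = size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + size])
        # Stop once a window has reached the end of the text, so the
        # tail is not emitted again as a shorter duplicate chunk.
        if start + size >= len(text):
            break
    return chunks
```

The overlap is there so that a sentence straddling a boundary still appears whole in at least one chunk. Semantic and hierarchical chunking replace the fixed window with boundaries derived from the document itself.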

The retrieval layer is where the most significant maturation has happened. Hybrid search combining dense vector retrieval with sparse BM25 has become the production standard, because pure dense retrieval has too many failure modes on queries where keyword matching is the appropriate signal. Qdrant and Weaviate both now support hybrid search natively. Reranking with cross-encoder models as a post-retrieval step has become standard practice for applications where retrieval precision matters more than latency.
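One common way to combine a dense ranking with a BM25 ranking is reciprocal rank fusion (RRF). The sketch below shows the idea in isolation. It is not the API of Qdrant or Weaviate, whose fusion options are configured through their own query interfaces, and the function name and the constant k = 60 are assumptions for illustration.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked lists of document IDs into one ranking.

    Each document scores 1 / (k + rank) for every list it appears in,
    with rank starting at 1. Hypothetical helper, not a library call.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by ID so the output is stable.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


dense_hits = ["a", "b", "c"]   # illustrative IDs from a vector search
sparse_hits = ["c", "a", "d"]  # illustrative IDs from a BM25 search
fused = reciprocal_rank_fusion([dense_hits, sparse_hits])
```

RRF uses only rank positions, so it avoids having to normalize BM25 scores against cosine similarities. Documents that both retrievers rank highly rise to the top, and the fused list is what a cross-encoder reranker would then reorder.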

CocoIndex is an emerging tool in this stack: a framework for building deterministic, incremental indexing pipelines that keep the index consistent with source documents as the underlying data changes. The problem of index freshness, meaning that retrieval should reflect the current state of source documents rather than a stale snapshot, is underappreciated until you try to deploy RAG against a living knowledge base, at which point it becomes urgent. This is an area where the tooling was thin and is now starting to mature.
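The core idea behind incremental indexing can be sketched with content hashes: re-embed only documents whose content changed, and delete index entries whose source has disappeared. This is a generic illustration under assumed names, not CocoIndex's API. The function, its arguments, and the in-memory dictionaries are all hypothetical.

```python
import hashlib


def plan_index_updates(
    sources: dict[str, str],
    indexed_hashes: dict[str, str],
) -> tuple[list[str], list[str], dict[str, str]]:
    """Compare current source documents against what the index last saw.

    sources:        doc ID -> current document text
    indexed_hashes: doc ID -> content hash stored at the last indexing run
    Returns (IDs to re-embed and upsert, IDs to delete, new hash state).
    """
    upserts = []
    current_hashes = {}
    for doc_id, text in sources.items():
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        current_hashes[doc_id] = digest
        # New documents have no stored hash; changed ones have a different one.
        if indexed_hashes.get(doc_id) != digest:
            upserts.append(doc_id)
    # Anything still indexed but missing from the sources is stale.
    deletes = [doc_id for doc_id in indexed_hashes if doc_id not in sources]
    return sorted(upserts), sorted(deletes), current_hashes
```

Because the plan depends only on the sources and the stored hashes, running it twice against unchanged data yields no work, which is the deterministic behaviour a living knowledge base needs.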

What's still hard

The evaluation story remains incomplete. The components for building a RAG evaluation pipeline — RAGAS, DeepEval, and others — have improved, but systematic evaluation of retrieval quality in production applications requires more investment than most teams make. The consequence is that RAG application quality is often assessed by demos and user feedback rather than by systematic measurement, which makes it hard to debug regressions or optimize retrieval strategies.
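As a rough illustration of what systematic measurement can mean at the retrieval step, the sketch below computes recall@k and reciprocal rank against a small hand-labeled set of relevant documents. The function names and the sample IDs are assumptions for illustration; this is not the API of RAGAS or DeepEval, which cover a broader set of metrics.

```python
def recall_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    """Fraction of the relevant documents that appear in the top k results."""
    if not relevant:
        return 0.0
    hits = len(set(retrieved[:k]) & relevant)
    return hits / len(relevant)


def reciprocal_rank(retrieved: list[str], relevant: set[str]) -> float:
    """1 / rank of the first relevant result, or 0.0 if none was retrieved."""
    for rank, doc_id in enumerate(retrieved, start=1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0


# Illustrative labels for one query: d3 and d9 are the relevant documents.
retrieved_ids = ["d1", "d7", "d3"]
relevant_ids = {"d3", "d9"}
recall = recall_at_k(retrieved_ids, relevant_ids, k=3)
rr = reciprocal_rank(retrieved_ids, relevant_ids)
```

Averaging these numbers over a fixed query set and tracking them across releases is what turns a retrieval regression into something that can be detected rather than reported by users.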

Multi-modal RAG, meaning retrieval over documents that contain tables, figures, and structured data alongside text, is a genuinely unsolved problem. Most production RAG systems today work well on prose documents and poorly on anything else. The technical approaches for handling mixed-modality documents (vision models for figure extraction, specialized parsers for tables, hybrid indexing strategies) exist but are not well integrated. This is the next significant gap in the RAG stack.