Retrieval-augmented generation has won. It is the dominant architecture pattern for enterprise AI applications that require knowledge grounding, and adoption is near-universal across the AI deployments we see at portfolio companies and their customers. The logic is sound: don't rely on the model's training data for enterprise-specific knowledge; retrieve the relevant context at query time, assemble it with the prompt, and let the model reason over fresh, accurate information. In theory, this solves the hallucination problem for factual enterprise queries. In practice, the gap between RAG as an architecture and RAG as a reliable production system is larger than most teams realize until they are running it at scale.

The gap is in the retrieval layer. Model reasoning in a RAG system is typically strong: frontier models reason reliably over well-assembled context. Production failures almost always originate upstream of the model: retrieving the wrong chunks, retrieving too much context (so the relevant passage is buried in noise), retrieving stale data, retrieving nothing useful for edge-case queries, or assembling context in a format the model struggles to reason over.

What reliable RAG infrastructure requires

Retrieval evaluation as a continuous process, not a one-time test. Most teams evaluate their RAG pipeline once, when they build it, and then assume it keeps working. In reality, retrieval quality degrades as the knowledge base grows, as document formats change, as query patterns shift, and as embedding model updates change the representations used for similarity search. The right operational posture is continuous retrieval evaluation: run a golden question set against the system automatically, track retrieval precision and recall over time, and alert when the metrics degrade. It is rarely implemented.
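As a rough illustration of that loop, here is a minimal Python sketch. It assumes a hand-labeled golden set and a retriever exposed as a plain function that returns ranked, unique chunk IDs. The names GoldenQuestion, evaluate, and degraded_metrics, the cutoff k, and the 0.05 tolerance are all hypothetical choices made for the example, not part of any particular product.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Set, Tuple


@dataclass
class GoldenQuestion:
    """One labeled query (hypothetical structure for this sketch)."""
    query: str
    relevant_ids: Set[str]  # chunk IDs a reviewer judged relevant


def precision_recall_at_k(
    retrieved: List[str], relevant: Set[str], k: int
) -> Tuple[float, float]:
    """Precision and recall over the top-k retrieved chunk IDs."""
    top_k = retrieved[:k]
    hits = sum(1 for chunk_id in top_k if chunk_id in relevant)
    precision = hits / len(top_k) if top_k else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


def evaluate(
    golden: List[GoldenQuestion],
    retrieve: Callable[[str], List[str]],
    k: int = 5,
) -> Dict[str, float]:
    """Mean precision@k and recall@k across the golden set."""
    if not golden:
        raise ValueError("golden set is empty")
    precisions, recalls = [], []
    for question in golden:
        p, r = precision_recall_at_k(
            retrieve(question.query), question.relevant_ids, k
        )
        precisions.append(p)
        recalls.append(r)
    return {
        "precision": sum(precisions) / len(golden),
        "recall": sum(recalls) / len(golden),
    }


def degraded_metrics(
    current: Dict[str, float],
    baseline: Dict[str, float],
    tolerance: float = 0.05,  # assumed threshold; tune per system
) -> List[str]:
    """Names of metrics that fell more than `tolerance` below baseline."""
    return [
        name
        for name, base in baseline.items()
        if current.get(name, 0.0) < base - tolerance
    ]


if __name__ == "__main__":
    # Stand-in retriever: a real one would query the vector store.
    canned = {"refund window?": ["c1", "c9", "c2"]}
    golden = [GoldenQuestion("refund window?", {"c1", "c2"})]
    metrics = evaluate(golden, lambda q: canned.get(q, []), k=2)
    print(metrics, degraded_metrics(metrics, {"precision": 0.8, "recall": 0.8}))
```

Run on a schedule and after every re-index, with the returned metric names wired to whatever alerting the team already uses, something of this shape turns a one-time test into a tracked time series.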

Chunking strategy as a first-class engineering decision. How documents are chunked before embedding strongly affects retrieval quality, and the best strategy depends on the document type, the query pattern, and the model's context window. Fixed-size chunking is the default but is rarely optimal. Semantic chunking, which splits documents at natural semantic boundaries rather than at arbitrary character counts, can meaningfully improve retrieval precision. Tooling to implement and evaluate chunking strategies across a document corpus is underbuilt relative to its downstream impact on application quality.
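To make the contrast concrete, the Python sketch below compares fixed-size chunking with a simple boundary-aware splitter. One loud caveat: the second function uses paragraph breaks as a stand-in for semantic boundaries. That is an assumption made to keep the example self-contained; embedding-based semantic chunking is more involved. The function names and the max_chars budget are hypothetical.

```python
import re
from typing import List


def fixed_size_chunks(text: str, size: int) -> List[str]:
    """Split every `size` characters, ignoring sentence or paragraph breaks."""
    if size <= 0:
        raise ValueError("size must be positive")
    return [text[i:i + size] for i in range(0, len(text), size)]


def paragraph_chunks(text: str, max_chars: int) -> List[str]:
    """Greedily pack whole paragraphs into chunks of at most `max_chars`.

    Paragraph breaks (blank lines) are used as a proxy for semantic
    boundaries. A paragraph longer than `max_chars` is kept whole
    rather than cut mid-thought.
    """
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]
    chunks: List[str] = []
    current = ""
    for para in paragraphs:
        candidate = f"{current}\n\n{para}" if current else para
        if len(candidate) <= max_chars or not current:
            current = candidate  # still fits, or first paragraph of a chunk
        else:
            chunks.append(current)  # budget exceeded: close the chunk
            current = para
    if current:
        chunks.append(current)
    return chunks


if __name__ == "__main__":
    doc = (
        "Refunds are accepted within 30 days of purchase.\n\n"
        "Shipping takes five business days.\n\n"
        "Contact support by email for anything else."
    )
    for chunk in fixed_size_chunks(doc, 40):
        print(repr(chunk))  # fixed-size cuts can land mid-sentence
    for chunk in paragraph_chunks(doc, 90):
        print(repr(chunk))  # each chunk ends on a paragraph boundary
```

Running both strategies through the same retrieval evaluation is one way to turn a chunking choice into a measurable decision rather than a default.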

The infrastructure investment opportunity here is in the tooling layer that sits between raw vector storage and the LLM call: the retrieval pipeline management systems that make production RAG observable, tunable, and reliable. This category does not yet have a dominant player at enterprise scale, and it is where we are focusing a portion of our research attention for the next phase of investment.