Navigating the vector database landscape

Two years ago, "vector database" was a category that required explanation. Now it's a line item in procurement requests at Fortune 500 companies. The speed of commoditization in this category has been striking, and in our view that speed is an accurate reflection of what vector search actually is: a storage and retrieval primitive, not a moat. Every major cloud provider has shipped a managed vector search offering. Most relational databases have added vector extensions. The dedicated vector database vendors are competing on a shrinking set of differentiators: scale, query performance at high cardinality, and filtering capabilities on hybrid queries.

This is not a pessimistic view of the category. It's a realistic one. Primitives becoming commodity is how the software stack matures. TCP/IP became commodity and the internet got built on top of it. Relational databases became commodity and a generation of SaaS companies got built on top of them. Vector storage will follow the same arc. The interesting investment question isn't which vector database wins the infrastructure layer. It's which abstraction and application layers get built on top of the commodity, and who builds them first at enterprise scale.

Where the value accumulates

In our view, the value in the enterprise AI retrieval stack is migrating upward from storage to retrieval intelligence. The companies that matter aren't the ones that store embeddings efficiently; they're the ones that figure out what to retrieve, from where, using what strategy, and with what confidence thresholds. This includes context management systems that can dynamically decide which chunks of knowledge are relevant to a given query. It includes retrieval evaluation infrastructure that lets you measure whether your RAG pipeline is actually finding the right information. It includes hybrid retrieval systems that combine semantic search, full-text search, and structured filter logic in a single query plan without requiring the application developer to orchestrate that combination manually.
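To make the hybrid retrieval idea concrete, here is a minimal sketch in Python of what "a single query plan" could look like from the application developer's side. Everything in it is an illustrative assumption rather than any vendor's API: the `Doc` type, the `hybrid_search` function, the toy keyword scorer, and the choice of reciprocal rank fusion (with the conventional constant k = 60) to merge the semantic and full-text rankings. A production system would swap in a real embedding model, a real full-text index, and likely a learned or tuned fusion strategy.

```python
from __future__ import annotations

import math
from dataclasses import dataclass, field


@dataclass
class Doc:
    # Hypothetical document record: text, a precomputed embedding, and metadata.
    doc_id: str
    text: str
    embedding: list[float]
    metadata: dict = field(default_factory=dict)


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def keyword_score(query: str, text: str) -> int:
    """Toy full-text signal: number of distinct query terms present in the text."""
    return len(set(query.lower().split()) & set(text.lower().split()))


def hybrid_search(
    docs: list[Doc],
    query_text: str,
    query_embedding: list[float],
    filters: dict,
    k: int = 60,
    top_n: int = 3,
) -> list[str]:
    """Filter on metadata, rank by two signals, fuse with reciprocal rank fusion."""
    # Step 1: structured filter. Keep only docs whose metadata matches every filter.
    candidates = [
        d for d in docs
        if all(d.metadata.get(key) == value for key, value in filters.items())
    ]

    # Step 2: rank the surviving candidates independently by each signal.
    semantic = sorted(
        candidates, key=lambda d: cosine(query_embedding, d.embedding), reverse=True
    )
    lexical = sorted(
        candidates, key=lambda d: keyword_score(query_text, d.text), reverse=True
    )

    # Step 3: reciprocal rank fusion. Each ranking contributes 1 / (k + rank).
    fused: dict[str, float] = {}
    for ranking in (semantic, lexical):
        for rank, d in enumerate(ranking, start=1):
            fused[d.doc_id] = fused.get(d.doc_id, 0.0) + 1.0 / (k + rank)

    return sorted(fused, key=fused.get, reverse=True)[:top_n]
```

The point of the sketch is the shape of the interface: the caller passes one query and one set of filters, and the ordering of filter, semantic ranking, full-text ranking, and fusion is decided inside the retrieval layer rather than orchestrated by hand in application code.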

The portfolio decisions we've made in the data infrastructure layer reflect this analysis. We're not backing another vector store. We're backing the companies that make enterprise AI pipelines reliable, observable, and accurate — regardless of which storage layer sits underneath them. The infrastructure bet here is on the abstraction that makes the underlying storage choice a non-differentiating implementation detail.