The vector database category has ~40 funded companies. Maybe five have the API surface, performance profile, and community momentum to survive as independent businesses. What distinguishes the durable ones.
Moving inference from cloud to edge isn't just a cost optimization — it's an architectural shift that changes what AI applications can do. The infrastructure requirements, and who builds them.
Data contracts as a pattern are now mainstream. What has actually stuck in production, what's still aspirational, and what the successful implementations have in common.
Real-time data pipelines have been "the future" for a decade. In 2025 they're genuinely table stakes. The infrastructure maturity, tooling cost drops, and developer ergonomics that closed the gap.
Traditional APM wasn't built for probabilistic outputs. ML systems in production need a different observability surface — what it looks like and which teams are building the right abstractions.
When Flintrock led Chalk's seed in 2023, feature stores were a specialty concern. Now they're a deployment prerequisite. The market map, and what the mature pattern looks like.
Every portfolio company we've backed has been absorbed into someone's RAG architecture. Here's what a principled open-source stack looks like, layer by layer, with no vendor lock-in at any point.
For most analytical workloads, a single well-designed process beats a distributed cluster. Why the pendulum is swinging back, what DuckDB makes tractable, and the class of workloads that still need Spark.
Not all MLOps tooling translates cleanly to the large-language-model world. A systematic look at which abstractions carry over, which need redesigning, and which should be retired entirely.
The database-as-server model was designed for a different era. In-process engines like DuckDB dissolve the client-server boundary — and that changes how AI applications can interact with data.
Vector databases are being built by infrastructure engineers and application developers simultaneously, with very different design centers. What that means for how the category shakes out.
The OSS-core to enterprise-contract playbook that Kafka and Airflow ran is still viable — but the timeline, community threshold, and enterprise buyer behavior have all shifted. What's new.
Post-hoc data quality tooling runs in the warehouse, after the damage is done. The more productive frame is source-side validation — which requires a different infrastructure model entirely.
The dominant mental model of orchestration as "cron at scale" misses what makes modern workflow systems valuable: observability, recoverable failure states, and dynamic task graphs. What changed and why it matters.
There are now hundreds of AI infrastructure companies. Most address real pain. Few will have durable businesses. A map of the stack, a framework for evaluating positions, and a view on where defensibility compounds.
What a language model can do is bounded by what it can see. The retrieval infrastructure that populates the context window — embedding models, vector stores, rerankers, chunking strategies — is not a solved problem.
The gap between ML research and ML production is not closing fast enough. Feature skew, model drift, serving latency, dependency hell — each is a product category. The companies addressing the hardest ones.
Our investment thesis for Qdrant: the architecture decisions that separate it from other vector databases, the community trajectory that convinced us, and the question we spent the most time on during diligence.
The window to back the next Kafka, Spark, or Airflow equivalent opens before the market recognizes it. What we look for in seed-stage data infrastructure companies, and why we write checks when others wait.
The most durable data infrastructure companies share a pattern: an open-source core with an enterprise expansion path. Why OSS-first distribution outperforms proprietary distribution at the infrastructure layer, and when it doesn't.
AI applications are proliferating. But the infrastructure layer they run on — the databases, orchestration systems, feature stores, model-serving frameworks — is still immature. The missing pieces, and why now is the right time to build them.