ML Infrastructure

Feature Stores in 2025: The Second Wave

The first wave of feature store adoption, roughly 2019 to 2022, was characterized by significant organizational effort and mixed outcomes. The companies that built feature stores in this period often found the implementation more expensive than anticipated, the ergonomics worse than documented, and the promised benefits of training-serving consistency harder to realize in practice than in theory. Feature stores acquired a reputation as expensive infrastructure that requires dedicated platform engineering to operate and produces organizational friction rather than efficiency.

This reputation was earned. First-generation feature stores were difficult to integrate with existing data infrastructure, required significant changes to ML workflows, and had APIs that were designed by infrastructure engineers rather than practitioners. The value proposition was real but the implementation cost was too high for most teams outside of large tech companies with dedicated ML platform organizations.

What changed for the second wave

The second wave of feature stores, represented by products like Chalk, is architecturally different in ways that address the first wave's failure modes. The core change is Python-first API design — features are defined as decorated Python functions that are simple to write, test, and compose. The complexity of the infrastructure is hidden behind the abstraction rather than exposed to practitioners. A data scientist can define a feature that computes the 30-day rolling average of user spend and have it available in both training and serving contexts without understanding the underlying execution model.
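To make the decorated-function idea concrete, here is a minimal sketch in plain Python. It does not use any vendor's API: the `feature` decorator, the `FEATURES` registry, and the `compute` helper are hypothetical names invented for illustration, and "rolling average of user spend" is assumed to mean the mean transaction amount over the trailing 30 days. The point it shows is that one definition can be evaluated at a historical timestamp (training) or at the current time (serving).

```python
from datetime import datetime, timedelta
from typing import Callable, Dict, List, Tuple

# Hypothetical registry mapping feature name -> function (not a real product's API).
FEATURES: Dict[str, Callable] = {}

Transaction = Tuple[datetime, float]  # (timestamp, amount)


def feature(fn: Callable) -> Callable:
    """Register a plain Python function as a named feature."""
    FEATURES[fn.__name__] = fn
    return fn


@feature
def avg_spend_30d(transactions: List[Transaction], as_of: datetime) -> float:
    """Mean transaction amount in the 30 days up to and including as_of."""
    window_start = as_of - timedelta(days=30)
    amounts = [amt for ts, amt in transactions if window_start < ts <= as_of]
    return sum(amounts) / len(amounts) if amounts else 0.0


def compute(name: str, transactions: List[Transaction], as_of: datetime) -> float:
    """Evaluate a registered feature; training and serving share this path."""
    return FEATURES[name](transactions, as_of)


history = [
    (datetime(2025, 1, 1), 20.0),
    (datetime(2025, 1, 20), 40.0),
    (datetime(2025, 2, 10), 60.0),
]

# Training: evaluate as of a past label timestamp, ignoring later events.
training_value = compute("avg_spend_30d", history, datetime(2025, 1, 25))
# Serving: evaluate as of the request time.
serving_value = compute("avg_spend_30d", history, datetime(2025, 2, 15))
```

Because the function is ordinary Python, it can be unit-tested without any infrastructure. A real system would replace the in-memory list with reads from offline and online stores, which is the part the abstraction is meant to hide.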

The second architectural change is native streaming integration. First-generation feature stores were predominantly batch systems: they could serve features at low latency, but the values were only as fresh as the last batch run. For many ML applications (fraud detection, real-time recommendations, dynamic pricing), features computed from stale data are not useful. The second wave treats streaming computation as a first-class primitive rather than an afterthought. Features can be defined to update continuously from event streams and be served at millisecond latency with recent data.
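As a rough illustration of a feature that updates continuously from events, the sketch below maintains a trailing-window sum incrementally instead of recomputing it in a batch job. The class name `SlidingWindowSpend` and its methods are hypothetical, timestamps are plain seconds, and events are assumed to arrive in timestamp order; a production streaming engine would also have to handle late and out-of-order events.

```python
from collections import deque


class SlidingWindowSpend:
    """Running sum of event amounts over a trailing time window."""

    def __init__(self, window_seconds: float) -> None:
        self.window_seconds = window_seconds
        self._events = deque()  # (timestamp, amount), oldest first
        self._total = 0.0

    def _evict(self, now: float) -> None:
        """Drop events that have aged out of the window ending at `now`."""
        cutoff = now - self.window_seconds
        while self._events and self._events[0][0] <= cutoff:
            _, amount = self._events.popleft()
            self._total -= amount

    def on_event(self, timestamp: float, amount: float) -> None:
        """Update the aggregate as each event arrives."""
        self._events.append((timestamp, amount))
        self._total += amount
        self._evict(timestamp)

    def read(self, now: float) -> float:
        """Serve the current value without rescanning history."""
        self._evict(now)
        return self._total


spend_5m = SlidingWindowSpend(window_seconds=300)
spend_5m.on_event(0, 10.0)
spend_5m.on_event(100, 25.0)
spend_5m.on_event(250, 5.0)

value_at_260 = spend_5m.read(260)  # all three events are still in the window
value_at_350 = spend_5m.read(350)  # the event at t=0 has aged out
```

Each update and read does work proportional to the number of expired events rather than the full history, which is what makes low-latency serving of fresh values plausible.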

The LLM feature question

A genuine open question for the feature store category is how it adapts to the LLM application development workflow. The classical feature store model — define numerical features over structured entity data, compute them consistently across training and serving — maps poorly onto LLM applications where the "features" are text chunks, conversation history, and retrieved context. The computational patterns are different, the storage requirements are different, and the freshness semantics are different.

Our view is that LLM applications do not primarily use feature stores in the classical sense, but that there is a class of LLM applications (real-time personalization, dynamic context injection, entity-specific context retrieval) where the feature store pattern extends naturally. A user's recent activity, computed and stored as structured context that gets injected into prompts, is architecturally similar to a feature. The companies that work out this extension of the feature store concept will be well positioned at the intersection of ML infrastructure and AI application infrastructure.
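One way to picture this extension is the sketch below, which looks up precomputed per-user values and renders them into a prompt. Everything here is an assumption made for illustration: the `USER_CONTEXT` dictionary stands in for an online store lookup, the feature names and values are invented, and the prompt template is arbitrary.

```python
from typing import Dict, Mapping

# Hypothetical precomputed per-user context, standing in for an online store read.
USER_CONTEXT: Dict[str, Dict[str, object]] = {
    "user_42": {
        "orders_last_7d": 3,
        "last_category_viewed": "running shoes",
        "avg_spend_30d": 50.0,
    },
}


def render_context(features: Mapping[str, object]) -> str:
    """Render feature values as a stable, sorted bullet list."""
    return "\n".join(f"- {name}: {value}" for name, value in sorted(features.items()))


def build_prompt(user_id: str, question: str) -> str:
    """Inject the user's structured context ahead of their question."""
    features = USER_CONTEXT.get(user_id, {})
    context = render_context(features) if features else "- (no recent activity)"
    return f"Known facts about this user:\n{context}\n\nUser question: {question}"


prompt = build_prompt("user_42", "What should I buy next?")
```

The lookup is keyed by entity and returns values computed ahead of time, which is the same access pattern as online feature serving; only the consumer changes from a model's input vector to a prompt.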