
Why Streaming Is Finally Winning

Real-time data processing has been on the "strategic priority" list of enterprise data teams since Kafka and Storm were open-sourced in 2011. For most of the following decade, it stayed on that list and rarely made it into production at scale. The operational complexity was too high. The tooling required specialists. The use cases that genuinely required sub-second freshness were narrower than the hype suggested. Batch with frequent refreshes was good enough for most applications.

Something changed between 2023 and 2025. Streaming data infrastructure is now being adopted faster than it was at any point in the previous decade. The adoption is real: not conference-talk adoption, but production deployments processing continuous event streams at organizations that were never streaming-first. The change is being driven by AI application requirements, not by streaming ideology.

What AI applications need that batch can't provide

The critical AI application pattern driving streaming adoption is real-time context injection: supplying models, LLMs included, with information about the current state of a user's session, transaction, or environment that batch-refreshed data cannot capture. A fraud detection model that scores transactions in milliseconds needs to know about other transactions that happened in the last thirty seconds, not the last hour. A customer support assistant that personalizes responses needs to know what the customer did on the website two minutes ago, not yesterday.
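To make the thirty-second example concrete, here is a minimal sketch of a per-key sliding window that a scoring service might keep in memory. The class name `RecentActivityWindow`, the feature names, and the sample card ID are all hypothetical, and the sketch assumes events for a given key arrive in timestamp order; a production system would more likely keep this state in a stream processor or a low-latency store.

```python
from collections import defaultdict, deque


class RecentActivityWindow:
    """Keeps each key's events from the last `window_seconds` seconds (hypothetical helper)."""

    def __init__(self, window_seconds: float = 30.0):
        self.window_seconds = window_seconds
        self._events = defaultdict(deque)  # key -> deque of (timestamp, amount)

    def add(self, key: str, timestamp: float, amount: float) -> None:
        # Assumption: events for a key arrive in timestamp order.
        self._events[key].append((timestamp, amount))

    def features(self, key: str, now: float) -> dict:
        events = self._events[key]
        # Evict everything older than the window before computing features.
        while events and events[0][0] < now - self.window_seconds:
            events.popleft()
        return {
            "txn_count": len(events),
            "txn_total": sum(amount for _, amount in events),
        }


window = RecentActivityWindow(window_seconds=30.0)
window.add("card-123", timestamp=100.0, amount=20.0)
window.add("card-123", timestamp=125.0, amount=500.0)
window.add("card-123", timestamp=128.0, amount=750.0)

# Scoring a new transaction at t=131: only events from t >= 101 still count.
recent = window.features("card-123", now=131.0)
```

With an hourly batch refresh, none of the three sample transactions would be visible to the model when the fourth one arrives; with a continuously updated window, the two most recent ones are.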

This pattern requires databases that can receive continuous writes from operational systems, not periodic batch loads from ETL pipelines. It requires the operational-to-analytical data movement that change data capture (CDC) tools like Artie enable: streaming row-level changes from Postgres or MySQL into analytical stores and serving layers without the latency of batch ETL. Artie's approach of real-time CDC replication removes the overnight batch window that separates operational data from analytical data in traditional architectures.
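The core idea of row-level change replication can be sketched in a few lines. The event shape below (`op`, `key`, `row`) is a simplified, hypothetical format chosen for illustration; it is not Artie's wire format or that of any specific CDC tool, and the in-memory dict stands in for whatever analytical store or serving layer receives the changes.

```python
def apply_change(replica: dict, event: dict) -> None:
    """Apply one row-level change event to an in-memory replica keyed by primary key."""
    op, key = event["op"], event["key"]
    if op in ("insert", "update"):
        # Upsert, so that replaying the same event is harmless (idempotent apply).
        replica[key] = event["row"]
    elif op == "delete":
        replica.pop(key, None)
    else:
        raise ValueError(f"unknown op: {op!r}")


# Hypothetical change stream from an orders table, in commit order.
changes = [
    {"op": "insert", "key": 1, "row": {"id": 1, "status": "pending"}},
    {"op": "insert", "key": 2, "row": {"id": 2, "status": "pending"}},
    {"op": "update", "key": 1, "row": {"id": 1, "status": "paid"}},
    {"op": "delete", "key": 2},
]

replica = {}
for event in changes:
    apply_change(replica, event)
```

Because each change is applied as it arrives, the replica reflects the paid order and the deleted one within moments of the source commit, where a nightly batch load would show neither until the next run.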

The simplification that made it practical

The other change driving streaming adoption is the simplification of the streaming infrastructure itself. Estuary Flow represents a different philosophy from the Kafka/Flink/Spark Streaming generation: a managed streaming platform that abstracts away the operational complexity of running distributed streaming infrastructure while preserving the real-time semantics that applications need. The operational burden of managing a Kafka cluster, tuning consumer lag, handling rebalancing, and monitoring partition health is significant. Managed streaming services that handle this operational layer lower the barrier for teams that need streaming semantics but don't want to become streaming infrastructure specialists.
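As one small illustration of that operational layer, consumer lag for a partition is conventionally the newest offset in the log minus the offset the consumer group has committed. The sketch below computes it from two plain dictionaries; the function name, the sample offsets, and the alert threshold of 100 are hypothetical, and treating a partition with no committed offset as starting from 0 is a simplifying assumption.

```python
def consumer_lag(end_offsets: dict, committed_offsets: dict) -> dict:
    """Per-partition lag: newest log offset minus the group's committed offset."""
    return {
        # Assumption: a partition with no committed offset is treated as starting at 0.
        partition: end - committed_offsets.get(partition, 0)
        for partition, end in end_offsets.items()
    }


# Hypothetical snapshot for a three-partition topic.
end_offsets = {0: 1500, 1: 1480, 2: 1510}
committed_offsets = {0: 1500, 1: 1200, 2: 1505}

lag = consumer_lag(end_offsets, committed_offsets)

# Flag partitions whose lag exceeds an illustrative threshold.
lagging = [partition for partition, behind in lag.items() if behind > 100]
```

The arithmetic is trivial; the burden lies in collecting these numbers continuously, deciding on thresholds, and reacting when a partition falls behind, which is the work a managed service takes on.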

The combination of these two trends — AI applications that need real-time data and managed streaming services that make real-time data practical to operate — has created the adoption curve we're now seeing. Streaming is winning not because streaming is philosophically superior to batch but because the applications that matter most today have requirements that batch can't meet, and the tools to run streaming in production have finally gotten good enough for non-specialist teams.