Feature Store Architecture for Real-Time ML

A practical breakdown of feature store design for teams that need low-latency feature serving without training-serving skew.

Feature Store Architecture

A feature store is a centralized system for computing, storing, and serving ML features to both training pipelines and inference endpoints. The core problem it solves is training-serving skew: the gap between how features are computed during training and how they are computed at inference time. When that gap exists, your model is evaluated on one data distribution and deployed against another, and the accuracy you measure in evaluation does not reflect production performance.

Feature stores are not always necessary. If your feature engineering is simple, stateless, and fast to compute, you may not need one. But for teams dealing with complex aggregations, shared features across multiple models, or strict latency requirements, a feature store becomes essential infrastructure.

The Dual-Store Architecture

Most production feature stores use a dual-store architecture: an offline store for training and batch inference, and an online store for real-time inference. The offline store is typically a columnar data warehouse or data lake, optimized for high-throughput reads of large feature datasets. The online store is a low-latency key-value store, optimized for single-entity feature lookups in under 10ms. Redis and DynamoDB are common online store backends. Snowflake, BigQuery, and Parquet files on object storage are common offline store backends.
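To make the split concrete, here is a minimal sketch of the dual-store shape, with in-memory Python objects standing in for the real backends (the class and method names are illustrative, not any platform's API): the offline store is append-only and keeps full history, while the online store keeps only the latest value per entity for fast point lookups.

```python
class OfflineStore:
    """Stand-in for a warehouse/data lake: append-only, retains full
    history so training can reconstruct past feature values."""
    def __init__(self):
        self.rows = []  # list of (entity_id, timestamp, features)

    def append(self, entity_id, timestamp, features):
        self.rows.append((entity_id, timestamp, features))


class OnlineStore:
    """Stand-in for a key-value store (e.g. Redis): latest value only,
    optimized for single-entity lookups."""
    def __init__(self):
        self.latest = {}

    def put(self, entity_id, features):
        self.latest[entity_id] = features

    def get(self, entity_id):
        return self.latest.get(entity_id)


# A feature write lands in both stores: history for training, latest for serving.
offline, online = OfflineStore(), OnlineStore()
offline.append("user_42", "2024-01-15T00:00:00Z", {"purchase_count_30d": 7})
online.put("user_42", {"purchase_count_30d": 7})
```

A real deployment replaces these classes with warehouse and key-value clients, but the contract is the same: history on one side, latest-value point reads on the other.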

The critical design requirement is that both stores are populated by the same feature computation code. Not similar code, or code that produces equivalent results most of the time, but literally the same code path. Any divergence between the offline and online computation produces skew. This is the architectural guarantee that makes feature stores valuable: by centralizing feature computation, you eliminate the opportunity for skew by construction.
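One way to enforce that guarantee is to factor the transformation into a single pure function that both write paths call. A minimal sketch, with hypothetical names (the events-to-features logic here is a made-up example):

```python
def compute_user_features(events):
    """Single source of truth for the transformation. Called by BOTH the
    batch backfill (offline store) and the online writer -- never duplicated."""
    total = sum(e["amount"] for e in events)
    return {
        "purchase_count": len(events),
        "avg_purchase_amount": total / len(events) if events else 0.0,
    }


def backfill_offline(events):
    # Offline path: would write the result to the warehouse.
    return compute_user_features(events)


def update_online(events):
    # Online path: would write the result to the key-value store.
    return compute_user_features(events)  # identical code path, by construction
```

Because both paths delegate to the same function, a change to the feature logic cannot drift between training and serving.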

Feature Pipelines

A feature pipeline is a data processing job that reads raw data, applies transformations, and writes the results to the feature store. Batch feature pipelines run on a schedule, typically daily or hourly, and populate the offline store and the online store for slowly-changing features. Streaming feature pipelines process events as they arrive and update the online store in near real time for features that require fresh data, such as a user's activity in the last 5 minutes or a product's purchase rate in the last hour.
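A batch feature pipeline can be sketched in a few lines: read raw events, apply a windowed aggregation per entity, and emit one feature row per entity as of the run date. This is an illustrative stand-in for a scheduled warehouse or Spark job, with made-up field names:

```python
from datetime import datetime, timedelta

def daily_batch_pipeline(events, as_of, window_days=30):
    """Batch pipeline sketch: aggregate raw purchase events into a
    per-user 30-day purchase count as of the pipeline run date."""
    cutoff = as_of - timedelta(days=window_days)
    counts = {}
    for e in events:
        # Only events inside the trailing window count toward the feature.
        if cutoff <= e["ts"] <= as_of:
            counts[e["user_id"]] = counts.get(e["user_id"], 0) + 1
    # One feature row per entity, ready to write to both stores.
    return {
        user: {"purchase_count_30d": n, "as_of": as_of}
        for user, n in counts.items()
    }
```

A streaming pipeline computes the same aggregation incrementally as events arrive; the batch version is the simpler place to start.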

Streaming pipelines are significantly more complex to build and operate than batch pipelines. They require a streaming framework (Kafka Streams, Apache Flink, Spark Streaming), robust state management, and careful handling of late-arriving events. For most teams early in their feature store journey, starting with batch pipelines and adding streaming selectively for the features that genuinely require freshness is the right approach. Not every feature needs to be updated in real time; many features change slowly enough that hourly batch updates are sufficient.

Point-in-Time Correctness for Training

When training a model on historical data, you need to retrieve the feature values that were available at the time each training label was generated. If a user made a purchase on January 15th, you need the features as they existed on January 15th, not their current values. Using future feature values to predict past events produces data leakage, which inflates training metrics and leads to poor production performance.

This is called point-in-time correct feature retrieval, and it is one of the most common sources of subtle bugs in ML systems. Implementing it correctly requires the offline store to retain historical feature values with timestamps, and the training data generation step to join on entity key and timestamp rather than just entity key. Feature stores that support point-in-time correct retrieval are significantly more useful for offline training than simple feature tables.
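The join itself can be sketched in pure Python: for each training label, find the most recent feature row whose timestamp is at or before the label's timestamp, never after. This is a simplified stand-in for what a feature store's historical retrieval does over warehouse tables (the data layout here is hypothetical):

```python
from bisect import bisect_right

def point_in_time_join(labels, feature_history):
    """For each (entity, label_ts, label), fetch the latest feature value
    with feature_ts <= label_ts -- never a future value, which would leak.
    feature_history: {entity: list of (feature_ts, features), sorted by ts}."""
    training_rows = []
    for entity, label_ts, label in labels:
        history = feature_history.get(entity, [])
        # Rightmost feature row whose timestamp does not exceed the label's.
        idx = bisect_right([ts for ts, _ in history], label_ts)
        features = history[idx - 1][1] if idx > 0 else None
        training_rows.append((entity, label_ts, features, label))
    return training_rows
```

The key detail is the `<=` semantics of the timestamp search: a naive join on entity key alone would silently pick up the current (future) feature value and leak it into training.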

Feature Discovery and Reuse

One of the secondary benefits of a feature store is making features discoverable and reusable across teams. Without a central store, each data science team computes the same features independently, with slight variations in implementation and naming. With a central store, a feature computed once by one team is available to all teams. The user's 30-day purchase count, computed for a recommendation model, is immediately available to the fraud detection model without duplication.

Feature discoverability requires a catalog with documentation: what the feature represents, how it is computed, when it is updated, and which models consume it. Most feature store platforms provide a catalog interface. The value of the catalog is proportional to the discipline with which it is maintained. Features without documentation get rediscovered and reimplemented, defeating the purpose.
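One way to enforce that discipline is to make the documentation fields mandatory in the catalog record itself. A minimal sketch, with an illustrative schema (not any real platform's), that refuses to register an undocumented feature:

```python
from dataclasses import dataclass, field

@dataclass
class FeatureCatalogEntry:
    """Minimal catalog record: the fields that make a feature discoverable."""
    name: str
    description: str       # what the feature represents
    computation: str       # how it is computed
    update_cadence: str    # when it is refreshed (e.g. "daily", "streaming")
    consumers: list = field(default_factory=list)  # which models use it


catalog = {}

def register(entry):
    # An empty description defeats the catalog, so reject it at registration.
    if not entry.description:
        raise ValueError(f"feature {entry.name!r} must be documented")
    catalog[entry.name] = entry
```

Gating registration on documentation turns the catalog from a best-effort wiki into an enforced contract.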

Latency Requirements and Store Selection

The choice of online store backend should be driven by your latency requirements. If your model endpoint SLA is 100ms and feature retrieval needs to complete in under 10ms, Redis with a local cache for hot entities is appropriate. If your endpoint SLA is 500ms and you have thousands of features per entity, a columnar store with a cache may be acceptable. Measure actual feature retrieval latency under production-like load before committing to a backend.
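Measuring that latency does not require much machinery: issue lookups against the candidate backend under a production-like key distribution and report percentiles, since the p99 (not the mean) is what an SLA budget consumes. A sketch, using an in-memory dict as a placeholder for the real store client:

```python
import random
import time

def measure_lookup_latency(store, keys, n_requests=10_000):
    """Issue single-entity lookups and report p50/p99 latency in ms."""
    samples = []
    for _ in range(n_requests):
        key = random.choice(keys)  # crude stand-in for the real key distribution
        start = time.perf_counter()
        store.get(key)
        samples.append(time.perf_counter() - start)
    samples.sort()
    return {
        "p50_ms": samples[len(samples) // 2] * 1000,
        "p99_ms": samples[int(len(samples) * 0.99)] * 1000,
    }

# Swap the dict for a real client (Redis, DynamoDB) and hot-key-skewed
# traffic before drawing any conclusions -- this is only the harness shape.
store = {f"user_{i}": {"purchase_count_30d": i} for i in range(100_000)}
latency = measure_lookup_latency(store, list(store))
```

Run the same harness against each candidate backend from the same network location as the model endpoint; cross-zone hops often dominate the store's own lookup time.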

Serialization format matters significantly for online store latency. JSON is convenient but slow to serialize and deserialize for large feature vectors. Binary formats such as Protocol Buffers or Arrow IPC reduce serialization overhead substantially and are worth the added complexity for latency-sensitive applications.
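The size gap alone is easy to demonstrate. The sketch below uses `struct` packing as a stdlib stand-in for a binary format (it is not Protobuf or Arrow, but it shows the same text-versus-fixed-width trade-off): a float64 costs a fixed 8 bytes packed, versus roughly 17 to 19 characters as JSON text.

```python
import json
import random
import struct

# A 1000-element feature vector of arbitrary-precision floats.
features = [random.random() for _ in range(1000)]

# JSON: human-readable, but each float serializes to many text bytes.
json_bytes = json.dumps(features).encode()

# Packed binary (stand-in for Protobuf/Arrow): exactly 8 bytes per float64.
packed = struct.pack(f"{len(features)}d", *features)
decoded = list(struct.unpack(f"{len(features)}d", packed))
```

On top of the size difference, binary formats skip the number-to-text parsing on every read, which is where most of the deserialization time for large vectors goes.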

When You Do Not Need a Feature Store

A feature store adds operational complexity. If your features are computed entirely from the request payload (no historical state needed), if your feature engineering is trivial Python that runs in under 5ms, and if you have only one or two models, a feature store is overhead without benefit. The right time to introduce a feature store is when training-serving skew has caused a measurable accuracy problem, or when feature reuse across multiple models would provide clear ROI, or when streaming feature freshness is required.

Conclusion

Feature stores solve a real problem in production ML systems and the architectural pattern is well established. The key decisions are whether to use a dual-store architecture, how to handle point-in-time correctness for training, and what streaming versus batch cadence each feature requires. Start simple, with batch pipelines and a basic online store, and evolve as your requirements grow. The discipline of centralized feature computation pays dividends in model quality and team velocity.