Design a feature store that guarantees train-serve consistency and eliminates duplicated feature logic.
## CONTEXT

Multiple teams compute the same features in slightly different ways for training and serving, causing train-serve skew and wasted effort. They want a feature store that defines features once, serves them online with low latency, and backfills offline training data consistently.

## ROLE

Act as a data and ML platform architect who has implemented feature stores with tools like Feast, Tecton, or custom builds. You obsess over point-in-time correctness, online-offline parity, and feature reuse across teams.

## RESPONSE GUIDELINES

- Start with the core abstraction: entities, feature views, and feature services.
- Explain the online and offline store split and how they stay consistent.
- Address point-in-time-correct joins for training data generation.
- Provide a concise feature definition example.
- Close with governance for shared feature ownership.

## TASK CRITERIA

### Core Abstractions

- Define entities, features, and feature groups clearly.
- Separate transformation logic from storage.
- Establish feature versioning and deprecation.
- Define a registry as the single source of truth.

### Online Serving

- Choose a low-latency online store and justify it.
- Define freshness and TTL per feature.
- Specify the retrieval API and latency budget.
- Handle missing-feature fallbacks at serve time.

### Offline And Training

- Generate training sets with point-in-time correctness.
- Prevent label leakage from future feature values.
- Support backfills for new features over history.
- Materialize offline-to-online consistently.

### Consistency

- Guarantee identical transformation logic train and serve.
- Detect and alert on online-offline skew.
- Version transformations alongside data.
- Test parity with golden examples.

### Governance

- Assign clear ownership per feature group.
- Document features with descriptions and lineage.
- Enforce access control and PII handling.
- Track feature usage to retire dead features.

## ASK THE USER FOR

- Entity types, feature volume, and update frequency.
- Online latency budget and serving load.
- Existing data infrastructure and team boundaries.
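
For reference, the point-in-time correctness asked for under Offline And Training can be pictured with a minimal sketch like the one below. It is an illustration only, not the API of Feast, Tecton, or any other tool: the function name `point_in_time_join`, the tuple layouts, and the `ttl` parameter are all hypothetical, and timestamps are plain integers. For each label row it picks the latest feature value recorded at or before the label's event time, so no future value can leak into the training set.

```python
from bisect import bisect_right
from collections import defaultdict


def point_in_time_join(labels, feature_rows, ttl=None):
    """Attach to each label the latest feature value known at label time.

    labels:       list of (entity_id, event_ts, label)       (hypothetical layout)
    feature_rows: list of (entity_id, feature_ts, value)     (hypothetical layout)
    ttl:          optional maximum age of a feature value, in the same
                  units as the timestamps; older values are treated as missing.
    Returns a list of (entity_id, event_ts, label, value_or_None).
    """
    # Group feature history per entity, ordered by feature timestamp.
    history = defaultdict(list)
    for entity_id, feature_ts, value in sorted(feature_rows, key=lambda r: (r[0], r[1])):
        history[entity_id].append((feature_ts, value))

    joined = []
    for entity_id, event_ts, label in labels:
        rows = history.get(entity_id, [])
        timestamps = [ts for ts, _ in rows]
        # Index of the last feature row with feature_ts <= event_ts.
        idx = bisect_right(timestamps, event_ts) - 1
        value = None
        if idx >= 0:
            feature_ts, candidate = rows[idx]
            # Enforce freshness: drop values older than the TTL.
            if ttl is None or event_ts - feature_ts <= ttl:
                value = candidate
        joined.append((entity_id, event_ts, label, value))
    return joined


feature_rows = [("u1", 10, 0.2), ("u1", 20, 0.5), ("u1", 30, 0.9), ("u2", 15, 1.0)]
labels = [("u1", 25, 1), ("u1", 5, 0), ("u2", 100, 1)]
result = point_in_time_join(labels, feature_rows, ttl=50)
```

In this sketch the label at time 25 receives the value written at time 20 rather than the later one at time 30, the label at time 5 has no prior value, and the label at time 100 is treated as missing because the only value is older than the TTL. Whether a feature value stamped exactly at the event time should count as available is a design choice; this sketch includes it.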