Centralize ML metadata so every model, dataset, and run is discoverable and traceable.
## CONTEXT

A team's ML assets are undiscoverable: nobody can find what models exist, what data they use, or how things connect. They want a metadata catalog that captures datasets, features, models, and runs with full lineage and search.

## ROLE

Act as an ML metadata platform architect experienced with metadata stores and catalogs. You design systems where any asset is discoverable and its full lineage is one query away.

## RESPONSE GUIDELINES

- Start with the entity types the catalog must track.
- Define the lineage graph connecting them.
- Specify how metadata is captured automatically.
- Address search, discovery, and governance.
- End with integration into existing pipelines.

## TASK CRITERIA

### Entity Model

- Catalog datasets, features, models, and runs.
- Define attributes and schema per entity.
- Assign stable identifiers.
- Capture ownership and descriptions.

### Lineage Graph

- Connect entities into a lineage graph.
- Trace upstream and downstream dependencies.
- Support impact analysis on changes.
- Preserve history across versions.

### Automatic Capture

- Emit metadata from pipeline runs automatically.
- Avoid reliance on manual entry.
- Standardize metadata schemas.
- Validate completeness of captured metadata.

### Discovery

- Enable search across all entities.
- Surface ownership, freshness, and usage.
- Recommend related or reusable assets.
- Flag deprecated or stale assets.

### Governance And Integration

- Enforce access controls and PII tagging.
- Integrate with experiment trackers and model registries.
- Sync with the existing data catalog, if one is present.
- Define retention and cleanup policies.

## ASK THE USER FOR

- Existing trackers, registries, and catalogs.
- Asset volume and team structure.
- Discovery and compliance requirements.
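To make the Entity Model and Lineage Graph criteria concrete, here is a minimal in-memory Python sketch, not a reference design. The `Entity` and `LineageGraph` classes and the `kind:name@version` identifier scheme are hypothetical illustrations; a real catalog would persist these records in a metadata store rather than in Python dictionaries. Downstream traversal doubles as impact analysis: changing a dataset flags every feature, run, and model built from it.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Entity:
    """One catalog entry. Kinds and fields are illustrative assumptions."""
    kind: str          # e.g. "dataset", "feature", "model", "run"
    name: str
    version: str
    owner: str
    description: str = ""

    @property
    def urn(self) -> str:
        # Hypothetical stable-identifier scheme: kind:name@version.
        # Keeping the version in the id gives each version its own node.
        return f"{self.kind}:{self.name}@{self.version}"


@dataclass
class LineageGraph:
    entities: dict[str, Entity] = field(default_factory=dict)
    down_edges: dict[str, set[str]] = field(default_factory=dict)  # urn -> direct consumers
    up_edges: dict[str, set[str]] = field(default_factory=dict)    # urn -> direct inputs

    def register(self, entity: Entity) -> str:
        """Add an entity to the catalog and return its stable identifier."""
        urn = entity.urn
        self.entities[urn] = entity
        self.down_edges.setdefault(urn, set())
        self.up_edges.setdefault(urn, set())
        return urn

    def link(self, upstream: str, downstream: str) -> None:
        """Record that `downstream` was produced from `upstream`."""
        for urn in (upstream, downstream):
            if urn not in self.entities:
                raise KeyError(f"unregistered entity: {urn}")
        self.down_edges[upstream].add(downstream)
        self.up_edges[downstream].add(upstream)

    @staticmethod
    def _walk(start: str, edges: dict[str, set[str]]) -> set[str]:
        # Breadth-first traversal collecting every transitively reachable node.
        seen: set[str] = set()
        queue = deque([start])
        while queue:
            for nxt in edges[queue.popleft()]:
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        return seen

    def upstream(self, urn: str) -> set[str]:
        """Everything this entity was derived from, directly or transitively."""
        return self._walk(urn, self.up_edges)

    def downstream(self, urn: str) -> set[str]:
        """Impact analysis: everything a change to `urn` could affect."""
        return self._walk(urn, self.down_edges)


# Usage with hypothetical asset names.
lineage = LineageGraph()
raw = lineage.register(Entity("dataset", "clicks_raw", "v1", "data-eng"))
feat = lineage.register(Entity("feature", "ctr_7d", "v1", "ml-platform"))
run = lineage.register(Entity("run", "train-ctr", "42", "ranking"))
model = lineage.register(Entity("model", "ctr_ranker", "v3", "ranking"))
lineage.link(raw, feat)
lineage.link(feat, run)
lineage.link(run, model)
print(sorted(lineage.downstream(raw)))
# prints ['feature:ctr_7d@v1', 'model:ctr_ranker@v3', 'run:train-ctr@42']
```

Putting the version inside the identifier is one way to preserve history: registering `clicks_raw@v2` adds a new node instead of overwriting `v1`, so earlier runs still resolve to the exact inputs they used.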