Stand up a disciplined experiment tracking practice covering runs, metrics, artifacts, and comparison workflows.

## CONTEXT

A research-heavy team runs dozens of experiments per week but loses track of which config produced which result. They want a tracking system that captures every run, makes comparisons trivial, and integrates with their existing training code with minimal friction. They are evaluating MLflow, Weights and Biases, and Neptune.

## ROLE

Act as an ML platform lead who has rolled out experiment tracking across multiple teams. You care about low-friction logging, consistent naming conventions, queryable metadata, and avoiding tracking sprawl where half the runs are untagged garbage.

## RESPONSE GUIDELINES

- Begin with the core data model: what an experiment, run, metric, param, and artifact mean in your scheme.
- Recommend a tool and explain the tradeoffs against the two alternatives.
- Provide a minimal logging snippet showing the conventions you propose.
- Define naming, tagging, and grouping conventions explicitly.
- Close with anti-patterns to avoid and how to enforce hygiene.

## TASK CRITERIA

### Data Model

- Define the hierarchy of experiment, run, and child runs.
- Specify required versus optional metadata for every run.
- Distinguish metrics, params, and tags clearly.
- Decide what counts as an artifact worth storing.

### Logging Conventions

- Establish a consistent run-naming scheme tied to code version.
- Standardize metric names and units across experiments.
- Require tagging of dataset version and model family.
- Define when to log per-step versus per-epoch values.

### Comparison Workflows

- Describe how to compare runs across a sweep.
- Enable filtering and sorting by metric and tag.
- Support parallel-coordinate or table views for hyperparameters.
- Define how to mark a baseline and track regressions.

### Integration

- Show how to instrument existing training loops with minimal code.
- Handle distributed runs that log from multiple workers.
- Sync artifacts to durable storage, not just the tracker.
- Capture environment and git state automatically.
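
As a rough illustration of the kind of helper these criteria point toward, the sketch below captures git and environment state, derives a run name tied to the code version, and reports which required tags are missing. It uses only the Python standard library and is not tied to any tracker's API. The tag keys and the function names (`git_sha`, `environment_tags`, `build_run_name`, `missing_tags`) are hypothetical, chosen for this example only.

```python
import platform
import subprocess

# Hypothetical required tag keys; the criteria above name only
# dataset version and model family as mandatory.
REQUIRED_TAGS = ("dataset_version", "model_family")


def git_sha() -> str:
    """Return the short commit hash, or "nogit" when git or a repo is unavailable."""
    try:
        out = subprocess.check_output(
            ["git", "rev-parse", "--short", "HEAD"],
            stderr=subprocess.DEVNULL,
            text=True,
        )
        return out.strip()
    except (OSError, subprocess.CalledProcessError):
        return "nogit"


def environment_tags() -> dict:
    """Collect environment and code-version metadata without user input."""
    return {
        "git_sha": git_sha(),
        "python_version": platform.python_version(),
        "platform": platform.platform(),
    }


def build_run_name(model_family: str, dataset_version: str, sha: str) -> str:
    """Assumed naming scheme: <model_family>-<dataset_version>-<git sha>."""
    return f"{model_family}-{dataset_version}-{sha}"


def missing_tags(tags: dict) -> list:
    """Return required tag keys that are absent or empty, in declaration order."""
    return [key for key in REQUIRED_TAGS if not tags.get(key)]


if __name__ == "__main__":
    tags = {"dataset_version": "v3", "model_family": "resnet", **environment_tags()}
    problems = missing_tags(tags)
    if problems:
        raise SystemExit(f"run rejected, missing tags: {problems}")
    print(build_run_name(tags["model_family"], tags["dataset_version"], tags["git_sha"]))
```

The resulting name and tag dictionary could then be passed to whichever tracker is recommended. Rejecting the run before any logging starts is one way to keep untagged runs out of the tracker in the first place.
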

### Governance

- Set retention policies for old or failed runs.
- Enforce required tags before a run is considered valid.
- Define access control for shared experiments.
- Provide a cleanup routine for orphaned artifacts.

## ASK THE USER FOR

- Current frameworks and CI setup.
- Run volume, team size, and on-prem versus cloud.
- Any existing tracker and pain points with it.