Experiment Tracking System Designer

Name: Experiment Tracking System Designer
Author: FindPrompts

Stand up a disciplined experiment tracking practice covering runs, metrics, artifacts, and comparison workflows.

6/11/2026

Prompt

## CONTEXT
A research-heavy team runs dozens of experiments per week but loses track of which config produced which result. They want a tracking system that captures every run, makes comparisons trivial, and integrates with their existing training code with minimal friction. They are evaluating MLflow, Weights & Biases, and Neptune.

## ROLE
Act as an ML platform lead who has rolled out experiment tracking across multiple teams. You care about low-friction logging, consistent naming conventions, queryable metadata, and avoiding tracking sprawl where half the runs are untagged garbage.

## RESPONSE GUIDELINES
- Begin with the core data model: what an experiment, run, metric, param, and artifact mean in your scheme.
- Recommend a tool and explain the tradeoffs against the two alternatives.
- Provide a minimal logging snippet showing the conventions you propose.
- Define naming, tagging, and grouping conventions explicitly.
- Close with anti-patterns to avoid and how to enforce hygiene.

## TASK CRITERIA
### Data Model
- Define the hierarchy of experiment, run, and child runs.
- Specify required versus optional metadata for every run.
- Distinguish metrics, params, and tags clearly.
- Decide what counts as an artifact worth storing.
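
For illustration only, the hierarchy and the metric/param/tag split above could be sketched in a tool-agnostic way as follows. The `Run` class and every field name in it are hypothetical; they do not correspond to the API of MLflow, Weights & Biases, or Neptune.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Run:
    """Hypothetical record for one execution of training code."""
    run_id: str
    experiment: str                       # groups runs that answer one question
    parent_run_id: Optional[str] = None   # set on child runs (sweep trials, CV folds)
    # Params: inputs fixed at start. Metrics: (step, value) series measured during the run.
    params: dict[str, str] = field(default_factory=dict)
    metrics: dict[str, list[tuple[int, float]]] = field(default_factory=dict)
    tags: dict[str, str] = field(default_factory=dict)    # searchable labels
    artifacts: list[str] = field(default_factory=list)    # URIs of stored files

    def log_metric(self, name: str, value: float, step: int) -> None:
        # Append to the series so the full history is kept, not just the last value.
        self.metrics.setdefault(name, []).append((step, value))


# A sweep is a parent run; each trial is a child run pointing back at it.
sweep = Run(run_id="r0", experiment="lr-sweep")
trial = Run(run_id="r1", experiment="lr-sweep", parent_run_id=sweep.run_id,
            params={"lr": "0.001"}, tags={"dataset_version": "v3"})
trial.log_metric("val/loss", 0.42, step=1)
```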

### Logging Conventions
- Establish a consistent run-naming scheme tied to code version.
- Standardize metric names and units across experiments.
- Require tagging of dataset version and model family.
- Define when to log per-step versus per-epoch values.
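
As one possible shape for these conventions, the sketch below builds a run name from model family, dataset version, and short git SHA, and checks metric names against a `split/name` pattern. The name format, the regular expression, and the helper names are assumptions made for this example, not an established standard.

```python
import re
from datetime import datetime, timezone

# Assumed convention: "<split>/<snake_case_name>", with units as a suffix,
# for example "train/step_time_s" or "val/loss".
METRIC_NAME = re.compile(r"^(train|val|test)/[a-z0-9_]+$")


def make_run_name(model_family, dataset_version, git_sha, now=None):
    """Build a run name that ties the run to a code version and a UTC timestamp."""
    now = now or datetime.now(timezone.utc)
    return f"{model_family}-{dataset_version}-{git_sha[:7]}-{now:%Y%m%d-%H%M%S}"


def is_valid_metric_name(name):
    """Return True when the metric name follows the assumed convention."""
    return METRIC_NAME.fullmatch(name) is not None


example_name = make_run_name(
    "resnet", "v3", "abcdef1234",
    now=datetime(2026, 1, 2, 3, 4, 5, tzinfo=timezone.utc),
)
```

A check like `is_valid_metric_name` could run inside a thin logging wrapper so that nonconforming names fail fast instead of fragmenting dashboards.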

### Comparison Workflows
- Describe how to compare runs across a sweep.
- Enable filtering and sorting by metric and tag.
- Support parallel-coordinate or table views for hyperparameters.
- Define how to mark a baseline and track regressions.
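
The comparison workflow above can be pictured with plain dictionaries, independent of any tracker. In this sketch the run records, the `baseline` tag, and the assumption that lower metric values are better are all illustrative choices.

```python
def filter_runs(runs, **required_tags):
    """Keep runs whose tags match every given key/value pair."""
    return [r for r in runs
            if all(r["tags"].get(k) == v for k, v in required_tags.items())]


def rank_by(runs, metric, lower_is_better=True):
    """Sort runs by a final metric value, best first."""
    return sorted(runs, key=lambda r: r["metrics"][metric],
                  reverse=not lower_is_better)


def regressions(runs, metric, tolerance=0.0):
    """Names of runs worse than the tagged baseline by more than `tolerance`.

    Assumes lower is better and that exactly one run carries baseline="true".
    """
    baseline = next(r for r in runs if r["tags"].get("baseline") == "true")
    limit = baseline["metrics"][metric] + tolerance
    return [r["name"] for r in runs
            if r is not baseline and r["metrics"][metric] > limit]


# Made-up example data, not real results.
runs = [
    {"name": "a", "tags": {"model_family": "resnet", "baseline": "true"},
     "metrics": {"val/loss": 0.50}},
    {"name": "b", "tags": {"model_family": "resnet"}, "metrics": {"val/loss": 0.45}},
    {"name": "c", "tags": {"model_family": "resnet"}, "metrics": {"val/loss": 0.58}},
    {"name": "d", "tags": {"model_family": "vit"}, "metrics": {"val/loss": 0.40}},
]
resnet_runs = filter_runs(runs, model_family="resnet")
ranked_names = [r["name"] for r in rank_by(resnet_runs, "val/loss")]
regressed = regressions(resnet_runs, "val/loss", tolerance=0.02)
```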

### Integration
- Show how to instrument existing training loops with minimal code.
- Handle distributed runs that log from multiple workers.
- Sync artifacts to durable storage, not just the tracker.
- Capture environment and git state automatically.
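
Automatic capture of git and environment state might look like the sketch below, which uses only the Python standard library and the `git` command line. The function name and the returned keys are hypothetical; the resulting dictionary would be attached to the run as tags by whichever tracker is chosen.

```python
import platform
import subprocess
import sys


def _git(*args):
    """Run a git command and return its trimmed output, or "unknown" on failure."""
    try:
        out = subprocess.run(["git", *args], capture_output=True,
                             text=True, check=True)
        return out.stdout.strip()
    except (OSError, subprocess.CalledProcessError):
        # git is not installed, or the working directory is not a repository.
        return "unknown"


def capture_run_context():
    """Collect code version and environment details to attach to a run."""
    status = _git("status", "--porcelain")
    return {
        "git_sha": _git("rev-parse", "HEAD"),
        # Any porcelain output means uncommitted changes are present.
        "git_dirty": "unknown" if status == "unknown" else str(bool(status)).lower(),
        "python_version": platform.python_version(),
        "platform": platform.platform(),
        "argv": " ".join(sys.argv),
    }


context = capture_run_context()
```

Calling this once at the start of the training script keeps the instrumentation to a single line in the existing loop.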

### Governance
- Set retention policies for old or failed runs.
- Enforce required tags before a run is considered valid.
- Define access control for shared experiments.
- Provide a cleanup routine for orphaned artifacts.
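
A required-tag gate can be as small as the sketch below. The tag list is an assumption for this example (the criteria above only name dataset version and model family), and `missing_tags` is a hypothetical helper rather than a feature of any tracker.

```python
# Assumed policy: these tags must be present and non-empty for a run to count.
REQUIRED_TAGS = ("dataset_version", "model_family", "git_sha", "owner")


def missing_tags(tags):
    """Return the required tag keys that are absent or blank, in policy order."""
    return [key for key in REQUIRED_TAGS
            if not str(tags.get(key, "")).strip()]


def is_valid_run(tags):
    """A run is valid only when no required tag is missing."""
    return not missing_tags(tags)


incomplete = {"dataset_version": "v3", "model_family": "resnet", "owner": " "}
complete = {"dataset_version": "v3", "model_family": "resnet",
            "git_sha": "abcdef1", "owner": "alice"}
```

The same check could run in CI or in a scheduled job that flags invalid runs for cleanup.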

## ASK THE USER FOR
- Current frameworks and CI setup.
- Run volume, team size, and on-prem versus cloud.
- Any existing tracker and pain points with it.
