Poll until the job is done
A successful fine-tuning job moves through these states: pending → queued → running → uploading → completed. Queue wait is typically under an hour but varies with platform load. Once a job is running, multiply the duration of the first epoch by n_epochs to estimate total training time; after the first epoch finishes, roughly (n_epochs - 1) more epochs of that length remain.
Poll the job until it reaches a terminal state, then fetch the metrics. The terminal states are completed, error, and cancelled.
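The timing estimate is simple arithmetic. The sketch below assumes every epoch takes about as long as the first and ignores queue and upload time; estimate_training_time is a hypothetical helper, not part of any SDK.

```python
from datetime import timedelta


def estimate_training_time(first_epoch: timedelta, n_epochs: int) -> tuple:
    """Return (total, remaining) training time, extrapolated from the first epoch.

    Assumes every epoch lasts about as long as the first one and that the
    estimate is made right after the first epoch completes.
    """
    if n_epochs < 1:
        raise ValueError("n_epochs must be at least 1")
    total = first_epoch * n_epochs
    remaining = first_epoch * (n_epochs - 1)
    return total, remaining


# Example: the first epoch took 12 minutes and the job runs 3 epochs.
total, remaining = estimate_training_time(timedelta(minutes=12), n_epochs=3)
print(total, remaining)  # 0:36:00 0:24:00
```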
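A polling loop only needs a way to read the job's status and a pause between checks. This sketch takes the status lookup as a parameter so it makes no assumption about the SDK: get_status is a hypothetical stand-in for your client's status call, and wait_for_job, its 30-second default interval, and the optional timeout are illustrative choices rather than part of the API.

```python
import time
from typing import Callable, Optional

# Terminal states named in this guide.
TERMINAL_STATES = {"completed", "error", "cancelled"}


def wait_for_job(
    get_status: Callable[[], str],
    poll_interval: float = 30.0,
    timeout: Optional[float] = None,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll get_status() until it returns a terminal state, then return that state.

    get_status is a hypothetical stand-in for whatever call your client uses
    to read the job's current status string.
    """
    waited = 0.0  # time spent sleeping, not full wall-clock time
    while True:
        status = get_status()
        if status in TERMINAL_STATES:
            return status
        if timeout is not None and waited >= timeout:
            raise TimeoutError(f"job still {status!r} after {waited:.0f} s of polling")
        sleep(poll_interval)
        waited += poll_interval


# Simulated status sequence; swap in a real status lookup in practice.
statuses = iter(["pending", "queued", "running", "uploading", "completed"])
final = wait_for_job(lambda: next(statuses), sleep=lambda seconds: None)
print(final)  # completed
```

Injecting sleep keeps the loop testable without real delays; in normal use, leave the default time.sleep. Fetch metrics only after the returned state is completed, and inspect the job instead when it is error or cancelled.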
Expected job durations: A small LoRA job on an 8B model with under 1,000 examples typically completes in 10 to 30 minutes once it leaves the queue. A full fine-tuning job on a 70B model with hundreds of thousands of examples can take several hours. Save your job ID: you can poll it from any session without re-uploading data.
Retrieve metrics
The list_metrics call returns every recorded step. The CLI renders ASCII charts by default; pass --json to get raw output.
Filter by step or time
All filter parameters are optional. Omit them to retrieve every recorded step.
Downsample with resolution
For long runs, pass resolution to cap the training metrics in the response at that many uniformly sampled steps. Eval metrics are always returned in full regardless of this setting.
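To make the resolution semantics concrete, here is a client-side sketch that keeps at most resolution evenly spaced training points and passes eval points through untouched. The platform does not document which steps it selects, so the index rule here (spreading picks from the first to the last point) is an assumption, and downsample_training and apply_resolution are hypothetical helpers.

```python
def downsample_training(train_points: list, resolution: int) -> list:
    """Return at most `resolution` evenly spaced training points.

    Assumption: indices are spread uniformly from the first to the last point.
    The service's actual selection rule may differ.
    """
    if resolution < 1:
        raise ValueError("resolution must be at least 1")
    n = len(train_points)
    if n <= resolution:
        return list(train_points)
    # Spacing between kept indices; max() avoids dividing by zero when resolution == 1.
    stride = (n - 1) / max(resolution - 1, 1)
    return [train_points[round(i * stride)] for i in range(resolution)]


def apply_resolution(train_points: list, eval_points: list, resolution: int) -> tuple:
    # Eval points are returned in full, mirroring the documented behavior.
    return downsample_training(train_points, resolution), list(eval_points)


# Illustrative data: ten training steps and two eval points.
train = [{"train/global_step": s, "train/loss": 1.0 / s} for s in range(1, 11)]
evals = [{"eval/loss": 0.9}, {"eval/loss": 0.7}]
kept_train, kept_eval = apply_resolution(train, evals, resolution=4)
print([p["train/global_step"] for p in kept_train])  # [1, 4, 7, 10]
print(len(kept_eval))  # 2
```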
Sample output
Training and eval steps are returned as separate objects. Training steps contain train/* keys; eval steps contain eval/* keys. When both are logged at the same step, the response includes both objects.
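Because the two kinds of objects are distinguished by key prefix, separating them takes one pass. In this sketch split_metrics is a hypothetical helper, and the payload is made-up data: only the train/* and eval/* prefixes come from this guide, so real objects may carry other fields.

```python
def split_metrics(points: list) -> tuple:
    """Separate training objects (train/* keys) from eval objects (eval/* keys).

    Objects with neither prefix are ignored.
    """
    train, evals = [], []
    for point in points:
        if any(key.startswith("eval/") for key in point):
            evals.append(point)
        elif any(key.startswith("train/") for key in point):
            train.append(point)
    return train, evals


# Made-up payload: the last training object and the eval object share a step.
payload = [
    {"train/global_step": 99, "train/loss": 0.52, "timestamp": "2024-01-01T00:09:00+00:00"},
    {"train/global_step": 100, "train/loss": 0.50, "timestamp": "2024-01-01T00:10:00+00:00"},
    {"eval/loss": 0.61, "timestamp": "2024-01-01T00:10:00+00:00"},
]
train, evals = split_metrics(payload)
print(len(train), len(evals))  # 2 1
```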
Parameters
| Parameter | Type | Description |
|---|---|---|
| global_step_from | integer | Return only metrics with global_step ≥ this value. |
| global_step_to | integer | Return only metrics with global_step ≤ this value. |
| logged_at_from | string or datetime | Return only metrics logged at or after this ISO 8601 timestamp. |
| logged_at_to | string or datetime | Return only metrics logged at or before this ISO 8601 timestamp. |
| resolution | integer | Maximum number of uniformly sampled training metric points. Does not affect eval metrics. |
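All four range filters are inclusive bounds. Assuming they combine (a point is returned only if it satisfies every bound you set), the sketch below applies the same rules to training points already held in memory. filter_training_points is a hypothetical local helper that mirrors the parameter names in the table, not the list_metrics call itself, and it assumes each point carries train/global_step and an ISO 8601 timestamp.

```python
from datetime import datetime
from typing import Optional, Union

Timestamp = Union[str, datetime]


def _as_datetime(value: Timestamp) -> datetime:
    # Accept either an ISO 8601 string or a datetime, as the table allows.
    return value if isinstance(value, datetime) else datetime.fromisoformat(value)


def filter_training_points(
    points: list,
    global_step_from: Optional[int] = None,
    global_step_to: Optional[int] = None,
    logged_at_from: Optional[Timestamp] = None,
    logged_at_to: Optional[Timestamp] = None,
) -> list:
    """Keep points that satisfy every supplied bound. All bounds are inclusive.

    Mixing timezone-aware and naive timestamps raises TypeError on comparison.
    """
    kept = []
    for point in points:
        step = point["train/global_step"]
        logged = _as_datetime(point["timestamp"])
        if global_step_from is not None and step < global_step_from:
            continue
        if global_step_to is not None and step > global_step_to:
            continue
        if logged_at_from is not None and logged < _as_datetime(logged_at_from):
            continue
        if logged_at_to is not None and logged > _as_datetime(logged_at_to):
            continue
        kept.append(point)
    return kept


# Illustrative data: steps 1-5, logged one minute apart.
points = [
    {"train/global_step": s, "timestamp": f"2024-01-01T00:{s:02d}:00+00:00"}
    for s in range(1, 6)
]
kept = filter_training_points(points, global_step_from=2, logged_at_to="2024-01-01T00:04:00+00:00")
print([p["train/global_step"] for p in kept])  # [2, 3, 4]
```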
Available metrics
Every job reports train/global_step, train/loss, train/grad_norm, train/learning_rate, and timestamp. When you supply validation_file and set n_evals > 0, the response also includes eval/loss and other validation metrics.
Preference-tuning jobs
A DPO job emits everything above and adds reward and divergence metrics to the same list_metrics payload. They appear as extra train/* keys during training and, when evaluation is enabled, as matching eval/* keys:
- Reward and accuracy: The reward assigned to the preferred and non-preferred responses, plus the share of examples where the preferred reward is higher.
- KL divergence: How far the trained model’s output distribution has drifted from the reference model.
- Per-side log probabilities: Separate values for the preferred and non-preferred outputs, useful for debugging stalled runs.
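If the platform follows the standard DPO formulation, each response's implicit reward is beta times the gap between the trained model's and the reference model's log probability, and reward accuracy is the share of pairs where the preferred response scores higher. That mapping, the beta default of 0.1, and the helper dpo_reward_stats are assumptions for illustration; the platform's actual metric definitions may differ.

```python
def dpo_reward_stats(
    policy_chosen: list,
    policy_rejected: list,
    ref_chosen: list,
    ref_rejected: list,
    beta: float = 0.1,
) -> dict:
    """Summarize DPO implicit rewards from per-example sequence log probabilities.

    Assumes the standard DPO reward r = beta * (log pi_theta(y|x) - log pi_ref(y|x));
    the platform's exact definitions are not documented here.
    """
    n = len(policy_chosen)
    if n == 0 or not (len(policy_rejected) == len(ref_chosen) == len(ref_rejected) == n):
        raise ValueError("all inputs must be non-empty and the same length")
    chosen = [beta * (p - r) for p, r in zip(policy_chosen, ref_chosen)]
    rejected = [beta * (p - r) for p, r in zip(policy_rejected, ref_rejected)]
    return {
        "reward_chosen": sum(chosen) / n,
        "reward_rejected": sum(rejected) / n,
        # Share of pairs where the preferred response out-scores the other one.
        "reward_accuracy": sum(c > r for c, r in zip(chosen, rejected)) / n,
        "reward_margin": sum(c - r for c, r in zip(chosen, rejected)) / n,
    }


# Made-up log probabilities for two preference pairs.
stats = dpo_reward_stats(
    policy_chosen=[-10.0, -12.0],
    policy_rejected=[-15.0, -9.0],
    ref_chosen=[-11.0, -12.0],
    ref_rejected=[-14.0, -10.0],
)
print(stats["reward_accuracy"])  # 0.5
```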
Stream to Weights & Biases
Pass wandb_api_key when creating the job to mirror these metrics to your W&B workspace in real time. See the quickstart for the call structure.