This page covers what to expect while a job runs and how to diagnose the most common failures. To inspect a job’s status and history, run tg fine-tuning retrieve <JOB_ID> and tg fine-tuning list-events <JOB_ID>, or open the jobs dashboard.

Job timing

  • Time to start: Start time depends on queue depth and hardware availability. With no other pending jobs and free hardware, a job starts within a minute. Most jobs start within an hour. There is no SLA on queue wait time.
  • Run time: Run time depends on model size, dataset size, and network speed during model and data downloads. To estimate the total, multiply the duration of the first epoch (visible in the event log) by n_epochs.
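The run-time estimate is simple arithmetic, and a minimal sketch may help make it concrete. The function name and the sample values (a 12-minute first epoch, 3 epochs) are illustrative assumptions. You still read the first-epoch duration from the event log by hand.

```python
from datetime import timedelta


def estimate_total_runtime(first_epoch: timedelta, n_epochs: int) -> timedelta:
    """Estimate total training time as first-epoch duration times n_epochs."""
    if n_epochs < 1:
        raise ValueError("n_epochs must be at least 1")
    return first_epoch * n_epochs


# Assumed example values: first epoch took 12 minutes, n_epochs is 3.
estimate = estimate_total_runtime(timedelta(minutes=12), n_epochs=3)
print(estimate)  # 0:36:00
```

Treat the result as a rough guide. It covers training epochs only, so queue wait and download time come on top of it.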

File upload failures

If a training file fails to upload, check the following:
  • Invalid API key: A 401 or 403 status means the key is missing or inactive. Verify the key is set and active.
  • Insufficient balance: A minimum balance of $5 is required. Add a credit card or adjust your limits.
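The checklist above can be expressed as a small helper, sketched here under assumptions. The function name and its inputs are hypothetical. How you obtain the HTTP status and your current balance depends on your own tooling and is not shown.

```python
MIN_BALANCE_USD = 5.0  # minimum balance stated in the checklist above


def diagnose_upload_failure(status_code: int, balance_usd: float) -> list[str]:
    """Return the likely causes of a failed upload, per the checklist above."""
    causes = []
    if status_code in (401, 403):
        causes.append("API key is missing or inactive")
    if balance_usd < MIN_BALANCE_USD:
        causes.append("balance is below the $5 minimum")
    return causes


# Assumed example: a 401 response with a healthy balance points at the key.
print(diagnose_upload_failure(401, balance_usd=12.0))
# ['API key is missing or inactive']
```

An empty list means neither documented cause applies, so the failure needs further investigation.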

Cancelled jobs

A job is usually cancelled for one of two reasons:
  • Insufficient balance: Add funds or raise the spending limit.
  • Invalid Weights & Biases API key: Re-issue the key and re-pass wandb_api_key.
Check the event log to confirm the cause:
tg fine-tuning list-events <JOB_ID>

Job fails after the data download

If a job fails after the model downloads but before training starts, the most likely cause is a data validation error. Run a client-side check on the file:
tg files check ./train.jsonl
If the file passes locally but the job still fails, contact support with the job ID.
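Before contacting support, a quick independent sanity check of the file can help rule out basic formatting problems. The sketch below is not what tg files check does. Its rules (every line must parse as JSON and be an object, and blank lines are flagged) are assumptions about a typical JSONL training file, not the service's validation logic.

```python
import json


def find_invalid_jsonl_lines(lines):
    """Return (line_number, message) pairs for lines that are not JSON objects."""
    problems = []
    for number, line in enumerate(lines, start=1):
        if not line.strip():
            problems.append((number, "blank line"))
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError as exc:
            problems.append((number, f"invalid JSON: {exc.msg}"))
            continue
        if not isinstance(record, dict):
            problems.append((number, "not a JSON object"))
    return problems


# Assumed sample data: line 2 is a JSON array, line 3 is malformed.
sample = ['{"text": "hello"}', '[1, 2, 3]', '{"text": ']
for number, message in find_invalid_jsonl_lines(sample):
    print(f"line {number}: {message}")
```

To check a real file, pass an open file handle, for example `find_invalid_jsonl_lines(open("./train.jsonl", encoding="utf-8"))`, since a file object iterates line by line.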

Automatic restarts

Jobs restart automatically on internal errors, such as hardware failures. Check the event log for a restart event, the new job ID, and a refund line. The refund is automatic.

Error codes

Fine-tuning API requests return standard HTTP status codes. For the full list of codes, causes, and fixes, see Error codes. Transient 5xx errors (503, 504, 524, and 529) are safe to retry after a short wait. Contact support if they persist.
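One possible way to retry the transient codes listed above is sketched below. The page only says to retry after a short wait. The exponential backoff, the attempt limit, and the send_request callable (a hypothetical zero-argument function that performs your request and returns its HTTP status code) are all assumptions of this example.

```python
import time

# Transient status codes listed above as safe to retry.
RETRYABLE_STATUS_CODES = {503, 504, 524, 529}


def call_with_retries(send_request, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call send_request() until it returns a non-retryable status.

    Waits base_delay, then 2x, 4x, ... seconds between attempts, and
    returns the last status seen if every attempt hits a retryable code.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    status = None
    for attempt in range(1, max_attempts + 1):
        status = send_request()
        if status not in RETRYABLE_STATUS_CODES:
            return status
        if attempt < max_attempts:
            sleep(base_delay * 2 ** (attempt - 1))
    return status
```

If the function still returns one of the retryable codes after the last attempt, the error is persisting, and that is the point at which to contact support.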