tg fine-tuning retrieve <JOB_ID> and tg fine-tuning list-events <JOB_ID>, or open the jobs dashboard.
Job timing
Time to start: Start time depends on queue depth and hardware availability. With no other pending jobs and free hardware, a job starts within a minute. Most jobs start within an hour. There is no SLA on queue wait time.
Run time: Run time depends on model size, dataset size, and network speed during model and data downloads. To estimate the total, multiply the duration of the first epoch (visible in the event log) by n_epochs.
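The run-time estimate above is plain multiplication, and a minimal Python sketch makes it concrete. The function name `estimate_total_runtime` and the sample first-epoch duration are illustrative assumptions, not part of any CLI or SDK. Read the real first-epoch duration from your job's event log.

```python
from datetime import timedelta


def estimate_total_runtime(first_epoch: timedelta, n_epochs: int) -> timedelta:
    """Rough total run time: first-epoch duration multiplied by n_epochs.

    Hypothetical helper for illustration. It ignores queue wait time and
    model/data download time, so treat the result as an approximation.
    """
    if n_epochs < 1:
        raise ValueError("n_epochs must be at least 1")
    return first_epoch * n_epochs


# Example values (assumed): the first epoch took 12m30s and the job runs 4 epochs.
first_epoch = timedelta(minutes=12, seconds=30)
total = estimate_total_runtime(first_epoch, n_epochs=4)
print(total)  # 0:50:00
```

Later epochs may run faster or slower than the first, so the figure is best treated as a ballpark.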
File upload failures
If a training file fails to upload, check the following:
- Invalid API key: A 401 or 403 status means the key is missing or inactive. Verify the key is set and active.
- Insufficient balance: A minimum balance of $5 is required. Add a credit card or adjust your limits.
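The two checks above can be sketched as a small triage helper. This is a hypothetical illustration: `diagnose_upload_failure` is not part of any SDK, and the caller is assumed to supply the HTTP status code and account balance from wherever it obtains them.

```python
MIN_BALANCE_USD = 5.0  # minimum balance stated in the checklist above


def diagnose_upload_failure(status_code: int, balance_usd: float) -> str:
    """Map a failed upload to the most likely cause from the checklist.

    Hypothetical helper: both arguments are supplied by the caller.
    """
    # 401 or 403 indicates a missing or inactive API key.
    if status_code in (401, 403):
        return "Invalid API key: verify the key is set and active."
    # Uploads require at least the minimum account balance.
    if balance_usd < MIN_BALANCE_USD:
        return "Insufficient balance: add a credit card or adjust your limits."
    return "No listed cause matched; inspect the error response for details."


print(diagnose_upload_failure(401, balance_usd=20.0))
print(diagnose_upload_failure(400, balance_usd=2.5))
```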
Cancelled jobs
A job is usually cancelled for one of two reasons:
- Insufficient balance: Add funds or raise the spending limit.
- Invalid Weights & Biases API key: Re-issue the key and pass it again as wandb_api_key.