Prerequisites
Before you begin, make sure you have:- A Together AI account and API key.
- The Together CLI or the Python / TypeScript SDK installed.
- Python install, with
datasets,transformers, andtqdmif you want to follow the data-prep step verbatim:
Step 1: Prepare your dataset
This quickstart uses the CoQA conversational dataset. Together AI supports four text data formats: conversational, instruction, preference, and generic text. JSONL is the default file format, but you can use Parquet for pre-tokenized data and custom loss masking. Transform CoQA into the conversational shape:Python
check=True re-runs the same validation server-side before the job starts.
check_file:
id from the upload response. You’ll pass it as training_file in the next step.
Step 2: Launch the job
client.fine_tuning.create() starts a LoRA job by default. The example below tunes Qwen3 8B for three epochs. See the API reference for the full list of parameters.
Job parameters
Job parameters
Here are some common job parameters:
| Parameter | Required | Default | Notes |
|---|---|---|---|
training_file | Required | n/a | File ID from Step 1. |
model | Required | n/a | Base model to fine-tune. |
lora | Optional | true | Set false for full fine-tuning. |
n_epochs | Optional | 1 | Passes through the training set. |
learning_rate | Optional | 0.00001 | Step size. |
batch_size | Optional | "max" | Examples per step. |
warmup_ratio | Optional | 0.0 | Fraction of steps for LR warmup. |
weight_decay | Optional | 0.0 | L2 regularization. |
train_on_inputs | Optional | "auto" | Mask user or prompt tokens from the loss. |
suffix | Optional | n/a | Up to 64 characters appended to the output model name. |
n_checkpoints | Optional | 1 | Intermediate checkpoints saved during training. |
n_evals | Optional | 0 | Evaluations against validation_file during training. |
hf_api_token | Optional | n/a | Only required for a private Hugging Face base. Omit otherwise. |
Step 3: Watch the job complete
Jobs move through these states:pending → queued → running → uploading → completed. Queue wait time is typically under an hour. Once running, multiply the first epoch’s duration by n_epochs to estimate the time remaining.
Poll for completion (or error/cancellation), then read the output model name:
Step 4: Deploy and call your model
Fine-tuned models can be run on Together AI using dedicated endpoints. The example below deploys, sends one request, and tears the endpoint down to stop billing:Pass
endpoint.name (not output_model) as the model parameter when calling inference APIs. The endpoint name includes a unique suffix that routes traffic to your deployment.Congrats! You just fine-tuned a model, deployed it to a dedicated endpoint, and ran inference end-to-end.
Step 5: Compare against the base model (optional)
To measure the impact of fine-tuning, run the same prompts through the base model and the fine-tuned model.Many fine-tuneable base models aren’t available on serverless. For example, calling
Qwen/Qwen3-8B directly returns Unable to access non-serverless model. To compare, deploy the base on its own dedicated endpoint, evaluate against endpoint.name, then tear that endpoint down too. Serverless bases (those with a per-token price listed on the models dashboard) can be called directly without deploying anything.| Qwen3 8B | EM | F1 |
|---|---|---|
| Base | 0.01 | 0.18 |
| Fine-tuned | 0.32 | 0.41 |
Stop the endpoint
Dedicated endpoints bill per minute as long as they’re running. Step 4 deletes the endpoint at the end of the script, but if you skipped that step or want to delete it later, run:tg endpoints list.
Continue from a checkpoint
Resume training from an existing job by passingfrom_checkpoint:
from_checkpoint accepts the output model name, the job ID, or a specific step in the form ft-...:{STEP_NUM}. List available checkpoints with tg fine-tuning list-checkpoints <JOB_ID>.
Next steps
Data preparation
See the full schema for conversational, instruction, preference, and tokenized data.
Supported models
Browse base models with context lengths and batch size limits.
Preference tuning
Align a model with paired preferred and dispreferred responses.
Deploy your model
Hosting, teardown, and local inference for fine-tuned models.