Supervised fine-tuning

Supervised fine-tuning (SFT) trains a model on demonstration data: examples that pair an input with the exact completion you want the model to produce. It’s the default training method on Together AI and the right starting point for most use cases. To train on ranked pairs of good and bad responses instead, see preference fine-tuning. Both methods share the same job lifecycle. See the fine-tuning quickstart for the complete flow, including data upload and evaluation.

When to use supervised fine-tuning

Use SFT when:

You have demonstrations of the target behavior. Each example shows one correct completion for an input, which is the standard format for instruction and conversational data.
You want to teach a new task, style, or format. SFT shifts the model toward the patterns in your training data.
You’re starting a new fine-tune. SFT should be your foundation for most use cases. If you later need to align the model against ranked outputs, run DPO on top of the SFT checkpoint.

If your input dataset is made up of paired preferred and dispreferred responses for the same input, you can start with preference fine-tuning instead.

Prepare your data

SFT accepts conversational, instruction, and general text formats. Each line carries a single target completion. See data preparation for the schema and packing instructions for each format.

Launch a fine-tuning job

Pass a training file and a base model. SFT is the default training_method, so you don’t need to set it. Here’s the minimum code to start a supervised fine-tuning job:

from together import Together

client = Together()

job = client.fine_tuning.create(
    training_file="<FILE_ID>",
    model="Qwen/Qwen3.5-9B",
)
print(job.id)

import Together from "together-ai";

const client = new Together();

const job = await client.fineTuning.create({
  training_file: "<FILE_ID>",
  model: "Qwen/Qwen3.5-9B",
});
console.log(job.id);

tg fine-tuning create \
  --training-file "<FILE_ID>" \
  --model "Qwen/Qwen3.5-9B"

Key parameters

These are the parameters you’ll reach for most often. The full list lives in the fine-tuning API reference.

Parameter	Default	Description
`n_epochs`	`1`	Number of passes over the dataset. Range is 1 to 20.
`learning_rate`	`0.00001`	Learning rate multiplier.
`batch_size`	`max`	Per-iteration batch size. See supported models for the min and max per model.
`train_on_inputs`	`auto`	Whether to compute loss on the input tokens. `auto` masks inputs for conversational and instruction data, and trains on them for general text data.
`validation_file`	none	A held-out file to evaluate against during training. Required when `n_evals > 0`.
`suffix`	none	Up to 40 characters appended to the output model name to tell your fine-tunes apart.

To stop a run automatically when validation loss plateaus, see early stopping.

Choose LoRA or full fine-tuning

SFT runs as either LoRA (the default) or full fine-tuning. That choice is independent of the training method and affects cost, batch size, and how you deploy the result. See LoRA vs. full fine-tuning to decide and to configure LoRA’s rank and target modules.

Next steps

Track training

Retrieve per-step loss, learning rate, and evaluation metrics.

Deploy your model

Serve the result on a dedicated endpoint or download the weights.

Continue with DPO

Continue training the SFT checkpoint with DPO to align it against ranked outputs.

GET STARTED

SERVERLESS

INFERENCE APIS

DEDICATED MODEL INFERENCE

DEDICATED CONTAINER INFERENCE

GPU CLUSTERS

FINE-TUNING

CODE EXECUTION

ADMINISTRATION

When to use supervised fine-tuning

Prepare your data

Launch a fine-tuning job

Key parameters

Choose LoRA or full fine-tuning

Next steps

Track training

Deploy your model

Continue with DPO

​When to use supervised fine-tuning

​Prepare your data

​Launch a fine-tuning job

​Key parameters

​Choose LoRA or full fine-tuning

​Next steps

Track training

Deploy your model

Continue with DPO

When to use supervised fine-tuning

Prepare your data

Launch a fine-tuning job

Key parameters

Choose LoRA or full fine-tuning

Next steps