Skip to main content
Supervised fine-tuning (SFT) trains a model on demonstration data: examples that pair an input with the exact completion you want the model to produce. It’s the default training method on Together AI and the right starting point for most use cases. To train on ranked pairs of good and bad responses instead, see preference fine-tuning. Both methods share the same job lifecycle. See the fine-tuning quickstart for the complete flow, including data upload and evaluation.

When to use supervised fine-tuning

Use SFT when:
  • You have demonstrations of the target behavior. Each example shows one correct completion for an input, which is the standard format for instruction and conversational data.
  • You want to teach a new task, style, or format. SFT shifts the model toward the patterns in your training data.
  • You’re starting a new fine-tune. SFT should be your foundation for most use cases. If you later need to align the model against ranked outputs, run DPO on top of the SFT checkpoint.
If your input dataset is made up of paired preferred and dispreferred responses for the same input, you can start with preference fine-tuning instead.

Prepare your data

SFT accepts conversational, instruction, and general text formats. Each line carries a single target completion. See data preparation for the schema and packing instructions for each format.

Launch a fine-tuning job

Pass a training file and a base model. SFT is the default training_method, so you don’t need to set it. Here’s the minimum code to start a supervised fine-tuning job:
from together import Together

client = Together()

job = client.fine_tuning.create(
    training_file="<FILE_ID>",
    model="meta-llama/Meta-Llama-3.1-8B-Instruct-Reference",
)
print(job.id)

Key parameters

These are the parameters you’ll reach for most often. The full list lives in the fine-tuning API reference.
ParameterDefaultDescription
n_epochs1Number of passes over the dataset. Range is 1 to 20.
learning_rate0.00001Learning rate multiplier.
batch_sizemaxPer-iteration batch size. See supported models for the min and max per model.
train_on_inputsautoWhether to compute loss on the input tokens. auto masks inputs for conversational and instruction data, and trains on them for general text data.
validation_filenoneA held-out file to evaluate against during training. Required when n_evals > 0.
suffixnoneUp to 40 characters appended to the output model name to tell your fine-tunes apart.

Choose LoRA or full fine-tuning

SFT runs as either LoRA (the default) or full fine-tuning. That choice is independent of the training method and affects cost, batch size, and how you deploy the result. See LoRA vs. full fine-tuning to decide and to configure LoRA’s rank and target modules.

Next steps

Track training

Retrieve per-step loss, learning rate, and evaluation metrics.

Deploy your model

Serve the result on a dedicated endpoint or download the weights.

Continue with DPO

Continue training the SFT checkpoint with DPO to align it against ranked outputs.