When to use supervised fine-tuning
Use SFT when:
- You have demonstrations of the target behavior. Each example shows one correct completion for an input, which is the standard format for instruction and conversational data.
- You want to teach a new task, style, or format. SFT shifts the model toward the patterns in your training data.
- You’re starting a new fine-tune. SFT should be your foundation for most use cases. If you later need to align the model against ranked outputs, run DPO on top of the SFT checkpoint.
Prepare your data
SFT accepts conversational, instruction, and general text formats. Each line carries a single target completion. See data preparation for the schema and packing instructions for each format.
Launch a fine-tuning job
Pass a training file and a base model. SFT is the default training_method, so you don’t need to set it. Here’s the minimum code to start a supervised fine-tuning job:
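As a minimal sketch, assuming the job is created from a JSON-style request body, only two fields are needed. The helper `minimal_sft_request`, the field names `training_file` and `model`, and the placeholder values are assumptions for illustration, not the documented client call; the fine-tuning API reference has the actual interface.

```python
import json


def minimal_sft_request(training_file: str, model: str) -> dict:
    """Build the smallest request body for an SFT job (hypothetical helper)."""
    # training_method is left out on purpose: SFT is the default,
    # so the request only names the training file and the base model.
    return {"training_file": training_file, "model": model}


# Placeholder values (assumptions): substitute your uploaded training file
# identifier and a base model that supports fine-tuning.
request = minimal_sft_request("file-PLACEHOLDER", "base-model-PLACEHOLDER")
print(json.dumps(request, indent=2))
```

Every other setting falls back to its default, so optional parameters are added to this body only when you need to override them.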
Key parameters
These are the parameters you’ll reach for most often. The full list lives in the fine-tuning API reference.
| Parameter | Default | Description |
|---|---|---|
| n_epochs | 1 | Number of passes over the dataset. Range is 1 to 20. |
| learning_rate | 0.00001 | Learning rate multiplier. |
| batch_size | max | Per-iteration batch size. See supported models for the min and max per model. |
| train_on_inputs | auto | Whether to compute loss on the input tokens. auto masks inputs for conversational and instruction data, and trains on them for general text data. |
| validation_file | none | A held-out file to evaluate against during training. Required when n_evals > 0. |
| suffix | none | Up to 40 characters appended to the output model name to tell your fine-tunes apart. |
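The constraints in the table can be checked locally before a job is submitted. The sketch below is illustrative only: the helper names `validate_sft_params` and `resolve_train_on_inputs` and the data-format labels `"conversational"`, `"instruction"`, and `"text"` are assumptions, not part of the API, and the service performs its own validation.

```python
from typing import List, Optional, Union

# Hypothetical labels for the three data formats SFT accepts.
DATA_FORMATS = {"conversational", "instruction", "text"}


def resolve_train_on_inputs(setting: Union[bool, str], data_format: str) -> bool:
    """Turn train_on_inputs into a concrete bool, following the table's 'auto' rule."""
    if data_format not in DATA_FORMATS:
        raise ValueError(f"unknown data format: {data_format!r}")
    if isinstance(setting, bool):
        return setting
    if setting == "auto":
        # auto masks inputs for conversational and instruction data,
        # and trains on them for general text data.
        return data_format == "text"
    raise ValueError("train_on_inputs must be True, False, or 'auto'")


def validate_sft_params(
    n_epochs: int = 1,
    suffix: Optional[str] = None,
    n_evals: int = 0,
    validation_file: Optional[str] = None,
) -> List[str]:
    """Return a list of human-readable problems; an empty list means the checks passed."""
    problems = []
    if not 1 <= n_epochs <= 20:
        problems.append("n_epochs must be between 1 and 20")
    if suffix is not None and len(suffix) > 40:
        problems.append("suffix must be at most 40 characters")
    if n_evals > 0 and validation_file is None:
        problems.append("validation_file is required when n_evals > 0")
    return problems


# The defaults from the table pass every check.
print(validate_sft_params())
# Evaluations requested without a held-out file are flagged.
print(validate_sft_params(n_epochs=3, n_evals=2))
```

Returning a list of problems rather than raising on the first one lets a caller report every misconfigured parameter at once.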
Choose LoRA or full fine-tuning
SFT runs as either LoRA (the default) or full fine-tuning. That choice is independent of the training method and affects cost, batch size, and how you deploy the result. See LoRA vs. full fine-tuning to decide and to configure LoRA’s rank and target modules.
Next steps
Track training
Retrieve per-step loss, learning rate, and evaluation metrics.
Deploy your model
Serve the result on a dedicated endpoint or download the weights.
Continue with DPO
Continue training the SFT checkpoint with DPO to align it against ranked outputs.