Introduction
Large Language Models (LLMs) offer powerful general capabilities, but they often require fine-tuning to excel at specific tasks or to understand domain-specific language. Fine-tuning adapts a trained model to a smaller, targeted dataset, enhancing its performance for your unique needs. This guide provides a step-by-step walkthrough for fine-tuning models on the Together AI platform, covering everything from preparing your data to evaluating your fine-tuned model. We will cover:
- Dataset Preparation: Loading a standard dataset, transforming it into the required format for supervised fine-tuning on Together AI, and uploading your formatted dataset to Together AI Files.
- Fine-tuning Job Launch: Configuring and initiating a fine-tuning job using the Together AI API.
- Job Monitoring: Checking the status and progress of your fine-tuning job.
- Inference: Using your newly fine-tuned model via the Together AI API for predictions.
- Evaluation: Comparing the performance of the fine-tuned model against the base model on a test set.
Fine-tuning Guide Notebook
Here is a runnable notebook version of this fine-tuning guide: Fine-tuning Guide Notebook

Table of Contents
- What is Fine-tuning?
- Getting Started
- Dataset Preparation
- Starting a Fine-tuning Job
- Monitoring Your Fine-tuning Job
- Using Your Fine-tuned Model
- Evaluating Your Fine-tuned Model
- Advanced Topics
What is Fine-tuning?
Fine-tuning is the process of improving an existing LLM for a specific task or domain. You can enhance an LLM by providing labeled examples for a particular task that it can learn from. These examples can come from public datasets or from private data specific to your organization. Together AI facilitates every step of the fine-tuning process, from data preparation to model deployment. Together supports two types of fine-tuning:
- LoRA (Low-Rank Adaptation) fine-tuning: Fine-tunes only a small subset of weights compared to full fine-tuning. This is faster, requires fewer computational resources, and is recommended for most use cases. Our fine-tuning API defaults to LoRA.
- Full fine-tuning: Updates all weights in the model, which requires more computational resources but may provide better results for certain tasks.
Getting Started
Prerequisites
- Register for an account: Sign up at Together AI to get an API key.
- Set up your API key.
- Install the required libraries.
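A minimal setup sketch, assuming the `together` Python SDK and an API key exported as the `TOGETHER_API_KEY` environment variable (the `datasets` package is used later for data preparation):

```python
# Install the required libraries first:
#   pip install together datasets
import os
from together import Together

# The client also reads TOGETHER_API_KEY from the environment by default
client = Together(api_key=os.environ.get("TOGETHER_API_KEY"))
```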
When choosing a model to fine-tune, keep the following in mind:
- Base models are trained on a wide variety of texts, making their predictions broad
- Instruct models are trained on instruction-response pairs, making them better for specific tasks
- meta-llama/Meta-Llama-3.1-8B-Instruct-Reference is great for simpler tasks
- meta-llama/Meta-Llama-3.1-70B-Instruct-Reference is better for more complex datasets and domains
Dataset Preparation
Fine-tuning requires data formatted in a specific way. We'll use a conversational dataset as an example; here the goal is to improve the model on multi-turn conversations.

Data Formats

Together AI supports several data formats:
- Conversational data: A JSON object per line, where each object contains a list of conversation turns under the `"messages"` key. Each message must have a `"role"` (`system`, `user`, or `assistant`) and `"content"`. See details here.
- Instruction data: For instruction-based tasks with prompt-completion pairs. See details here.
- Preference data: For preference-based fine-tuning. See details here.
- Generic text data: For simple text completion tasks. See details here.

Datasets can be provided in two file formats:
- JSONL: Simpler and works for most cases.
- Parquet: Stores pre-tokenized data, providing the flexibility to specify custom attention masks and labels (loss masking).

For most use cases, we recommend JSONL. However, Parquet can be useful if you need custom tokenization or specific loss masking.
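For illustration, a single line of conversational JSONL data might look like this (the content itself is a made-up example):

```json
{"messages": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "What is the capital of France?"}, {"role": "assistant", "content": "The capital of France is Paris."}]}
```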
Example: Preparing the CoQA Dataset
Here’s an example of transforming the CoQA dataset into the required chat format:
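A sketch of this transformation, assuming the CoQA copy hosted on the Hugging Face Hub as `stanfordnlp/coqa` (the field names below follow that dataset's schema):

```python
import json
from datasets import load_dataset

# Load the CoQA training split from the Hugging Face Hub
coqa = load_dataset("stanfordnlp/coqa", split="train")

def to_chat(example):
    # One CoQA record holds a story plus a sequence of Q/A turns;
    # fold them into a single multi-turn conversation.
    messages = [{
        "role": "system",
        "content": "Read the story and answer the questions.\n\n" + example["story"],
    }]
    for question, answer in zip(example["questions"], example["answers"]["input_text"]):
        messages.append({"role": "user", "content": question})
        messages.append({"role": "assistant", "content": answer})
    return {"messages": messages}

# Write one JSON object per line (JSONL)
with open("coqa_train.jsonl", "w") as f:
    for example in coqa:
        f.write(json.dumps(to_chat(example)) + "\n")

# Upload the formatted dataset to Together AI Files
train_file = client.files.upload(file="coqa_train.jsonl")
print(train_file.id)
```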
- When using the Conversational or Instruction data formats, you can specify `train_on_inputs` (bool or `'auto'`): whether to mask the user messages in conversational data or the prompts in instruction data.
- For the Conversational format, you can mask specific messages by assigning them weights, as shown in the example below.
- With pre-tokenized datasets (Parquet), you can provide custom `labels` to mask specific tokens by setting their label to `-100`.
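An illustrative sample in which each message carries a `weight` of 0 (masked) or 1 (trained on), so the model learns only from the assistant reply:

```json
{"messages": [{"role": "user", "content": "Hi there!", "weight": 0}, {"role": "assistant", "content": "Hello! How can I help you today?", "weight": 1}]}
```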
Starting a Fine-tuning Job
With our data uploaded, we can now launch the fine-tuning job using `client.fine_tuning.create()`.
Key Parameters
- `model`: The base model you want to fine-tune (e.g., `'meta-llama/Meta-Llama-3.1-8B-Instruct-Reference'`)
- `training_file`: The ID of your uploaded training JSONL file
- `validation_file`: Optional ID of a validation file (highly recommended for monitoring)
- `suffix`: A custom string added to create your unique model name (e.g., `'test1_8b'`)
- `n_epochs`: Number of times the model sees the entire dataset
- `n_checkpoints`: Number of checkpoints to save during training (for resuming or selecting the best model)
- `learning_rate`: Controls how much model weights are updated
- `batch_size`: Number of examples processed per iteration (default: `"max"`)
- `lora`: Set to `True` for LoRA fine-tuning
- `train_on_inputs`: Whether to mask user messages or prompts (can be bool or `'auto'`)
- `warmup_ratio`: Ratio of steps used for learning-rate warmup
To launch a LoRA fine-tuning job, set the `lora` parameter, as in the sketch below.
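A launch sketch; the hyperparameter values are illustrative, and `train_file` is the file object returned by the upload step above:

```python
ft_resp = client.fine_tuning.create(
    model="meta-llama/Meta-Llama-3.1-8B-Instruct-Reference",
    training_file=train_file.id,       # ID returned by client.files.upload
    # validation_file=val_file.id,     # optional; see the validation split section below
    suffix="test1_8b",
    n_epochs=3,
    n_checkpoints=1,
    learning_rate=1e-5,
    batch_size="max",
    lora=True,                         # LoRA fine-tuning (the API default)
    train_on_inputs="auto",
)
print(ft_resp.id)  # keep the job ID for monitoring
```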
Monitoring a Fine-tuning Job
Fine-tuning can take time depending on the model size, dataset size, and hyperparameters. Your job will progress through several states: Pending, Queued, Running, Uploading, and Completed. You can monitor and manage the job's progress using the following methods:
- List all jobs: `client.fine_tuning.list()`
- Get the status of a job: `client.fine_tuning.retrieve(id=ft_resp.id)`
- List all events for a job: `client.fine_tuning.list_events(id=ft_resp.id)` (retrieves logs and events generated during the job)
- Cancel a job: `client.fine_tuning.cancel(id=ft_resp.id)`
- Download a fine-tuned model: `client.fine_tuning.download(id=ft_resp.id)`
Once the job completes (`status == 'completed'`), the response from `retrieve` will contain the name of your newly created fine-tuned model. It follows the pattern `<your-account>/<base-model-name>:<suffix>:<job-id>`.
Check Status via API
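A polling sketch using `retrieve` (the set of terminal states is an assumption):

```python
import time

# Poll until the job reaches a terminal state
while True:
    job = client.fine_tuning.retrieve(id=ft_resp.id)
    print(job.status)
    if str(job.status) in ("completed", "error", "cancelled"):
        break
    time.sleep(60)

finetuned_model = job.output_name  # e.g. "<your-account>/Meta-Llama-3.1-8B-Instruct-Reference:test1_8b:ft-..."
print(finetuned_model)
```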
Using a Fine-tuned Model
Once your fine-tuning job completes, your model will be available for use.

Option 1: Serverless LoRA Inference

If you used LoRA fine-tuning and the model supports serverless LoRA inference, you can immediately use your model without deployment. We can call it just like any other model on the Together AI platform, by providing the unique fine-tuned model `output_name` from our fine-tuning job. See the list of all models that support LoRA Inference.
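A minimal inference sketch using the `output_name` captured above:

```python
response = client.chat.completions.create(
    model=finetuned_model,  # the output_name from the fine-tuning job
    messages=[{"role": "user", "content": "What is the capital of France?"}],
)
print(response.choices[0].message.content)
```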
"OPEN IN PLAYGROUND"
. Read more about Serverless LoRA Inference here

Option 2: Dedicated Endpoint Deployment

If your model isn't eligible for serverless LoRA inference, or you need dedicated capacity, deploy it on its own endpoint:
- Visit your models dashboard
- Click "+ CREATE DEDICATED ENDPOINT" for your fine-tuned model
- Select a hardware configuration and scaling options, including min and max replicas (which affect the maximum QPS the deployment can support), and then click "DEPLOY"
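Once the endpoint is running, you can query your model as usual; a streaming sketch:

```python
# Stream tokens from the dedicated endpoint as they are generated
stream = client.chat.completions.create(
    model=finetuned_model,
    messages=[{"role": "user", "content": "Tell me a short story."}],
    stream=True,
)
for chunk in stream:
    print(chunk.choices[0].delta.content or "", end="", flush=True)
```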

Evaluating a Fine-tuned Model
To assess the impact of fine-tuning, we can compare the responses of our fine-tuned model with those of the original base model on the same prompts from our test set. This provides a way to measure improvements after fine-tuning.

Using a Validation Set During Training

You can provide a validation set when starting your fine-tuning job:
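A sketch, assuming a held-out split uploaded as `val_file` (see the Training and Validation Split section below):

```python
ft_resp = client.fine_tuning.create(
    model="meta-llama/Meta-Llama-3.1-8B-Instruct-Reference",
    training_file=train_file.id,
    validation_file=val_file.id,  # held-out examples scored during training
    n_evals=10,                   # how many times to evaluate during the run
    lora=True,
)
```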
To run the comparison between the base and fine-tuned models:
- First, load a portion of the validation dataset:
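A sketch using the Hugging Face `datasets` library (dataset id `stanfordnlp/coqa` assumed, as above):

```python
from datasets import load_dataset

# A small slice keeps the comparison fast and cheap
eval_data = load_dataset("stanfordnlp/coqa", split="validation[:50]")
```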
- Define a function to generate answers from both models:
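A helper sketch that asks one question about a story and returns the model's reply:

```python
def get_answer(model_name: str, story: str, question: str) -> str:
    response = client.chat.completions.create(
        model=model_name,
        messages=[
            {"role": "system",
             "content": "Read the story and answer the questions.\n\n" + story},
            {"role": "user", "content": question},
        ],
        max_tokens=64,
    )
    return response.choices[0].message.content.strip()
```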
- Generate answers from both models:
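For simplicity, this sketch scores only the first question of each conversation (it assumes the base model name is available for inference on your account):

```python
base_model = "meta-llama/Meta-Llama-3.1-8B-Instruct-Reference"

base_answers, ft_answers, references = [], [], []
for example in eval_data:
    question = example["questions"][0]
    references.append(example["answers"]["input_text"][0])
    base_answers.append(get_answer(base_model, example["story"], question))
    ft_answers.append(get_answer(finetuned_model, example["story"], question))
```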
- Define a function to calculate evaluation metrics:
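A sketch of SQuAD-style exact-match (EM) and token-level F1 metrics:

```python
import re
import string
from collections import Counter

def normalize(text: str) -> str:
    # Lowercase, drop punctuation and articles, collapse whitespace
    text = text.lower()
    text = "".join(ch for ch in text if ch not in set(string.punctuation))
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())

def exact_match(pred: str, ref: str) -> float:
    return float(normalize(pred) == normalize(ref))

def f1_score(pred: str, ref: str) -> float:
    pred_tokens = normalize(pred).split()
    ref_tokens = normalize(ref).split()
    common = Counter(pred_tokens) & Counter(ref_tokens)
    num_same = sum(common.values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_tokens)
    recall = num_same / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)
```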
- Calculate and compare metrics:
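Averaging the per-example scores for each model:

```python
def average(scores):
    return sum(scores) / len(scores)

for name, answers in [("Original", base_answers), ("Fine-tuned", ft_answers)]:
    em = average([exact_match(p, r) for p, r in zip(answers, references)])
    f1 = average([f1_score(p, r) for p, r in zip(answers, references)])
    print(f"{name}: EM={em:.2f} F1={f1:.2f}")
```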
| Llama 3.1 8B | EM | F1 |
|---|---|---|
| Original | 0.01 | 0.18 |
| Fine-tuned | 0.32 | 0.41 |
Advanced Topics
Continuing a Fine-tuning Job

You can continue training from a previous fine-tuning job, referencing it by one of the following:
- The output model name from the previous job
- The fine-tuning job ID
- A specific checkpoint step, with the format `ft-...:{STEP_NUM}`
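A sketch using the Python SDK's `from_checkpoint` parameter, which is assumed to accept any of the references above:

```python
# Continue from a previous job; `model` is omitted on the assumption
# that the checkpoint determines the base model.
ft_cont = client.fine_tuning.create(
    training_file=train_file.id,
    from_checkpoint=ft_resp.id,
    lora=True,
)
```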
Continued Fine-tuning Jobs and LoRA Serverless Inference
Continued Fine-tuning supports various training-method combinations: you can train an adapter module on top of a fully trained model, or continue training an existing adapter from a previous job. As a result, LoRA Serverless can end up enabled or disabled after training completes:
- If you continue a LoRA fine-tuning job with the same LoRA hyperparameters (rank, alpha, selected modules), the trained model will be available for LoRA Serverless.
- If you change any of these parameters, or continue with Full training, LoRA Serverless will be disabled.
- If you continue a Full fine-tuning job, LoRA Serverless will remain disabled.

Note: The feature is disabled when parameters change because the Fine-tuning API merges the parent fine-tuning adapter into the base model when it detects different adapter hyperparameters, ensuring optimal training quality.

Training and Validation Split

To split your dataset into training and validation sets:
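A Python sketch that shuffles and splits the JSONL file produced in the CoQA example:

```python
import random

with open("coqa_train.jsonl") as f:
    rows = f.readlines()

random.seed(42)
random.shuffle(rows)
split = int(len(rows) * 0.9)  # 90% train, 10% validation

with open("coqa_train_split.jsonl", "w") as f:
    f.writelines(rows[:split])
with open("coqa_val_split.jsonl", "w") as f:
    f.writelines(rows[split:])
```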
Then upload both files, pass the `validation_file`, and set `n_evals` to a number above 0:
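A sketch continuing from the split above:

```python
train_file = client.files.upload(file="coqa_train_split.jsonl")
val_file = client.files.upload(file="coqa_val_split.jsonl")

ft_resp = client.fine_tuning.create(
    model="meta-llama/Meta-Llama-3.1-8B-Instruct-Reference",
    training_file=train_file.id,
    validation_file=val_file.id,
    n_evals=10,  # number of evaluation passes over the validation set
    lora=True,
)
```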
In this guide, you have learned how to:
- Prepare and format your data for fine-tuning
- Launch a fine-tuning job with appropriate parameters
- Monitor the progress of your fine-tuning job
- Use your fine-tuned model via API or dedicated endpoints
- Evaluate your model’s performance improvements
- Work with advanced features like continued training and validation sets