Introduction
Large Language Models (LLMs) offer powerful general capabilities, but often require fine-tuning to excel at specific tasks or understand domain-specific language. Fine-tuning adapts a trained model to a smaller, targeted dataset, enhancing its performance for your unique needs. This guide provides a step-by-step walkthrough for fine-tuning models on the Together AI platform, covering everything from preparing your data to evaluating your fine-tuned model. We will cover:
- Dataset Preparation: Loading a standard dataset, transforming it into the required format for supervised fine-tuning on Together AI, and uploading your formatted dataset to Together AI Files.
- Fine-tuning Job Launch: Configuring and initiating a fine-tuning job using the Together AI API.
- Job Monitoring: Checking the status and progress of your fine-tuning job.
- Inference: Using your newly fine-tuned model via the Together AI API for predictions.
- Evaluation: Comparing the performance of the fine-tuned model against the base model on a test set.
Fine-tuning Guide Notebook
Here is a runnable notebook version of this fine-tuning guide: Fine-tuning Guide Notebook

Table of Contents
- What is Fine-tuning?
- Getting Started
- Dataset Preparation
- Starting a Fine-tuning Job
- Monitoring Your Fine-tuning Job
- Using Your Fine-tuned Model
- Evaluating Your Fine-tuned Model
- Advanced Topics
What is Fine-tuning?
Fine-tuning is the process of improving an existing LLM for a specific task or domain. You can enhance an LLM by providing labeled examples for a particular task which it can learn from. These examples can come from public datasets or private data specific to your organization. Together AI facilitates every step of the fine-tuning process, from data preparation to model deployment. Together supports two types of fine-tuning:
- LoRA (Low-Rank Adaptation) fine-tuning: Fine-tunes only a small subset of weights compared to full fine-tuning. This is faster, requires fewer computational resources, and is recommended for most use cases. Our fine-tuning API defaults to LoRA.
- Full fine-tuning: Updates all weights in the model, which requires more computational resources but may provide better results for certain tasks.
Getting Started
Prerequisites
- Register for an account: Sign up at Together AI to get an API key.
- Set up your API key.
- Install the required libraries.
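The setup steps above can be sketched as follows (the placeholder key value is illustrative; use the key from your account settings):

```shell
# Make your Together AI API key available to the SDK
export TOGETHER_API_KEY="your-api-key-here"

# Install the Together Python SDK
pip install --upgrade together
```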
When choosing a base model to fine-tune, keep in mind:
- Base models are trained on a wide variety of texts, making their predictions broad
- Instruct models are trained on instruction-response pairs, making them better for specific tasks
- Qwen/Qwen3-8B is great for simpler tasks
- Qwen/Qwen3-32B is better for more complex datasets and domains
Dataset Preparation
Fine-tuning requires data formatted in a specific way. We'll use a conversational dataset as an example; here the goal is to improve the model on multi-turn conversations.

Data Formats
Together AI supports several data formats:
- Conversational data: A JSON object per line, where each object contains a list of conversation turns under the "messages" key. Each message must have a "role" (system, user, or assistant) and "content". See details here.
- Instruction data: For instruction-based tasks with prompt-completion pairs. See details here.
- Preference data: For preference-based fine-tuning. See details here.
- Generic text data: For simple text completion tasks. See details here.
Two file formats are supported:
- JSONL: Simpler and works for most cases.
- Parquet: Stores pre-tokenized data and provides the flexibility to specify custom attention masks and labels (loss masking).
For most use cases, we recommend JSONL. However, Parquet can be useful if you need custom tokenization or specific loss masking.
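To make the conversational JSONL format concrete, here is a minimal sketch that writes a one-example training file (the filename `train.jsonl` and the message contents are illustrative):

```python
import json

# Each line of the JSONL file is one training example: a JSON object
# with a "messages" list of {"role", "content"} turns.
example = {
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is the capital of France?"},
        {"role": "assistant", "content": "The capital of France is Paris."},
    ]
}

with open("train.jsonl", "w") as f:
    f.write(json.dumps(example) + "\n")
```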
Example: Preparing the CoQA Dataset
Here’s an example of transforming the CoQA dataset into the required chat format:
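A sketch of that transformation. To keep it self-contained, it uses an inline CoQA-style record rather than downloading the dataset; the field names (`story`, `questions`, and the parallel `answers.input_text` list) are assumed from the public CoQA schema:

```python
import json

# A single CoQA-style record (schema assumed: "story", "questions",
# and "answers" with a parallel "input_text" list).
record = {
    "story": "Helen bought a red bicycle. She rides it to school every day.",
    "questions": ["What did Helen buy?", "What color is it?"],
    "answers": {"input_text": ["a red bicycle", "red"]},
}

def coqa_to_messages(record):
    """Flatten one CoQA record into a multi-turn chat example."""
    messages = [{
        "role": "system",
        "content": "Read the story and answer the questions.\n\n" + record["story"],
    }]
    for q, a in zip(record["questions"], record["answers"]["input_text"]):
        messages.append({"role": "user", "content": q})
        messages.append({"role": "assistant", "content": a})
    return {"messages": messages}

example = coqa_to_messages(record)
print(json.dumps(example, indent=2))
```

Applying this function to every record and writing one JSON object per line yields a conversational JSONL file ready for upload.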
- When using the Conversational or Instruction data formats, you can specify train_on_inputs (bool or 'auto'): whether to mask the user messages in conversational data or the prompts in instruction data.
- For the Conversational format, you can mask specific messages by assigning weights.
- With pre-tokenized datasets (Parquet), you can provide custom labels to mask specific tokens by setting their label to -100.
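As a sketch of the weight-based masking described above, the example below assigns a per-message "weight" so the first assistant turn is excluded from the loss (the exact field name and semantics should be confirmed against the data format reference):

```json
{"messages": [{"role": "user", "content": "Hi"}, {"role": "assistant", "content": "Hello! How can I help?", "weight": 0}, {"role": "user", "content": "What is 2+2?"}, {"role": "assistant", "content": "4", "weight": 1}]}
```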
Starting a Fine-tuning Job
With our data uploaded, we can now launch the fine-tuning job using client.fine_tuning.create().
Key Parameters
- model: The base model you want to fine-tune (e.g., 'meta-llama/Meta-Llama-3.1-8B-Instruct-Reference')
- training_file: The ID of your uploaded training JSONL file
- validation_file: Optional ID of a validation file (highly recommended for monitoring)
- suffix: A custom string added to create your unique model name (e.g., 'test1_8b')
- n_epochs: Number of times the model sees the entire dataset
- n_checkpoints: Number of checkpoints to save during training (for resuming or selecting the best model)
- learning_rate: Controls how much model weights are updated
- batch_size: Number of examples processed per iteration (default: "max")
- lora: Set to True for LoRA fine-tuning
- train_on_inputs: Whether to mask user messages or prompts (can be bool or 'auto')
- warmup_ratio: Ratio of steps used for warmup
LoRA fine-tuning is enabled by default via the lora parameter; set it to False if you want full fine-tuning instead.
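Putting these parameters together, a minimal sketch of launching a job (the file path, suffix, and hyperparameter values are illustrative, and the upload call should be checked against the Files API reference):

```python
from together import Together

client = Together()  # reads TOGETHER_API_KEY from the environment

# Upload the training data first; the returned file ID is passed to create().
train_file = client.files.upload(file="train.jsonl")

ft_resp = client.fine_tuning.create(
    model="meta-llama/Meta-Llama-3.1-8B-Instruct-Reference",
    training_file=train_file.id,
    suffix="test1_8b",
    n_epochs=3,
    n_checkpoints=1,
    learning_rate=1e-5,
    lora=True,  # LoRA is the default; shown explicitly here
)
print(ft_resp.id)  # keep this job ID for monitoring
```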
Monitoring a Fine-tuning Job
Fine-tuning can take time depending on the model size, dataset size, and hyperparameters. Your job will progress through several states: Pending, Queued, Running, Uploading, and Completed. You can monitor and manage the job's progress using the following methods:
- List all jobs: client.fine_tuning.list()
- Get the status of a job: client.fine_tuning.retrieve(id=ft_resp.id)
- List all events for a job: client.fine_tuning.list_events(id=ft_resp.id) (retrieves logs and events generated during the job)
- Cancel a job: client.fine_tuning.cancel(id=ft_resp.id)
- Download the fine-tuned model: client.fine_tuning.download(id=ft_resp.id) (v1) or client.fine_tuning.with_streaming_response.content(ft_id=ft_resp.id) (v2)
Once the job completes (status == 'completed'), the response from retrieve will contain the name of your newly created fine-tuned model. It follows the pattern: <your-account>/<base-model-name>:<suffix>:<job-id>.
Check Status via API
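A sketch of polling the job status, assuming `ft_resp` is the response from the earlier create() call (the exact response field names should be confirmed against the SDK reference):

```python
from together import Together

client = Together()

# ft_resp.id is the job ID returned by client.fine_tuning.create()
resp = client.fine_tuning.retrieve(id=ft_resp.id)
print(resp.status)  # e.g. 'pending', 'running', 'completed'
if resp.status == "completed":
    print(resp.output_name)  # your new fine-tuned model name
```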
Deleting a Fine-tuning Job
You can also delete your fine-tuning job. This action cannot be undone: it will destroy all files produced by your job, including intermediate and final checkpoints.

Using a Fine-tuned Model
Once your fine-tuning job completes, your model will be available for use.

Deploy a Dedicated Endpoint
To run your fine-tuned model, deploy it on a dedicated endpoint:
- Visit your models dashboard
- Click "+ CREATE DEDICATED ENDPOINT" for your fine-tuned model
- Select the hardware configuration and scaling options, including min and max replicas (which affect the maximum QPS the deployment can support), then click "DEPLOY"
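Once deployed, you can query the model through the standard chat completions API. A sketch, with an illustrative model name following the <your-account>/<base-model-name>:<suffix>:<job-id> pattern:

```python
from together import Together

client = Together()

# Illustrative placeholder; use the output_name from your completed job.
model_name = "your-account/Meta-Llama-3.1-8B-Instruct-Reference:test1_8b:ft-xxxx"

response = client.chat.completions.create(
    model=model_name,
    messages=[{"role": "user", "content": "What did Helen buy?"}],
    max_tokens=64,
)
print(response.choices[0].message.content)
```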
Evaluating a Fine-tuned Model
To assess the impact of fine-tuning, we can compare the responses of our fine-tuned model with those of the original base model on the same prompts from our test set. This provides a way to measure improvements after fine-tuning.

Using a Validation Set During Training
You can provide a validation set when starting your fine-tuning job (see Advanced Topics below).
- First, load a portion of the validation dataset:
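A sketch of loading a small evaluation slice, assuming the Hugging Face `datasets` package and the `stanfordnlp/coqa` dataset ID (both are assumptions, not part of the Together SDK):

```python
from datasets import load_dataset

# Take a small slice of the CoQA validation split for evaluation.
coqa_val = load_dataset("stanfordnlp/coqa", split="validation[:50]")
print(coqa_val[0]["questions"][0])
```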
- Define a function to generate answers from both models:
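One way to write such a function; `answer_with` is a hypothetical helper name, and the system prompt mirrors the format used for training:

```python
from together import Together

client = Together()

def answer_with(model: str, story: str, question: str) -> str:
    """Ask one question about a story using the given model."""
    response = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system",
             "content": "Read the story and answer the questions.\n\n" + story},
            {"role": "user", "content": question},
        ],
        max_tokens=32,
        temperature=0,  # deterministic answers for evaluation
    )
    return response.choices[0].message.content.strip()
```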
- Generate answers from both models:
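A sketch of the collection loop, assuming the evaluation slice and an answer-generation helper like the one described in the previous step; the model names are illustrative:

```python
# Assumes `coqa_val` (evaluation slice) and `answer_with(model, story, question)`.
base_model = "meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo"   # illustrative
ft_model = "your-account/Meta-Llama-3.1-8B-Instruct-Reference:test1_8b:ft-xxxx"

base_preds, ft_preds, references = [], [], []
for ex in coqa_val:
    story = ex["story"]
    for q, ref in zip(ex["questions"], ex["answers"]["input_text"]):
        base_preds.append(answer_with(base_model, story, q))
        ft_preds.append(answer_with(ft_model, story, q))
        references.append(ref)
```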
- Define a function to calculate evaluation metrics:
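A sketch of SQuAD-style exact match (EM) and token-level F1, which are standard metrics for extractive question answering (the exact metrics used in the original notebook may differ):

```python
import re
import string
from collections import Counter

def normalize(text: str) -> str:
    """Lowercase, strip punctuation and articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())

def exact_match(pred: str, ref: str) -> float:
    return float(normalize(pred) == normalize(ref))

def f1_score(pred: str, ref: str) -> float:
    pred_tokens = normalize(pred).split()
    ref_tokens = normalize(ref).split()
    common = Counter(pred_tokens) & Counter(ref_tokens)
    num_same = sum(common.values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_tokens)
    recall = num_same / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)
```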
- Calculate and compare metrics:
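A sketch of the comparison, assuming the prediction lists gathered earlier and the metric helpers described in the previous step:

```python
# Assumes `base_preds`, `ft_preds`, `references`, `exact_match`, `f1_score`.
def evaluate(preds, refs):
    em = sum(exact_match(p, r) for p, r in zip(preds, refs)) / len(refs)
    f1 = sum(f1_score(p, r) for p, r in zip(preds, refs)) / len(refs)
    return em, f1

print("original:   EM=%.2f F1=%.2f" % evaluate(base_preds, references))
print("fine-tuned: EM=%.2f F1=%.2f" % evaluate(ft_preds, references))
```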
| Llama 3.1 8B | EM | F1 |
|---|---|---|
| Original | 0.01 | 0.18 |
| Fine-tuned | 0.32 | 0.41 |
Advanced Topics
Continuing a Fine-tuning Job
You can continue training from a previous fine-tuning job by referencing any of the following:
- The output model name from the previous job
- The fine-tuning job ID
- A specific checkpoint step, with the format ft-...:{STEP_NUM}
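A sketch in Python (the original examples used the CLI); the checkpoint identifier and file ID below are illustrative placeholders, and the `from_checkpoint` parameter name should be confirmed against the API reference:

```python
from together import Together

client = Together()

ft_resp = client.fine_tuning.create(
    training_file="file-xxxxxxxx",          # illustrative uploaded file ID
    from_checkpoint="ft-xxxxxxxx-xxxx",     # previous job ID, output model
                                            # name, or ft-...:{STEP_NUM}
    n_epochs=1,
)
print(ft_resp.id)
```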
To use a validation set during training, provide a validation_file and set n_evals to a number above 0.
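A sketch with illustrative file IDs (both would come from prior files.upload calls):

```python
from together import Together

client = Together()

ft_resp = client.fine_tuning.create(
    model="meta-llama/Meta-Llama-3.1-8B-Instruct-Reference",
    training_file="file-xxxxxxxx",      # illustrative training file ID
    validation_file="file-yyyyyyyy",    # illustrative validation file ID
    n_evals=10,                         # run 10 evaluations over training
    n_epochs=3,
)
```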
Summary
This guide showed you how to:
- Prepare and format your data for fine-tuning
- Launch a fine-tuning job with appropriate parameters
- Monitor the progress of your fine-tuning job
- Use your fine-tuned model via API or dedicated endpoints
- Evaluate your model’s performance improvements
- Work with advanced features like continued training and validation sets