- Uploading your own datasets to our platform
- Starting a fine-tuning job that fine-tunes an existing LLM of your choice with your uploaded data
- Monitoring the progress of your fine-tuning job
- Hosting the resulting model on Together or downloading it so you can run it yourself locally
Choosing your model
The first step in fine-tuning is to choose which LLM you want to use as the starting point for your custom model. All generative LLMs are trained to take some input text and then predict what text is most likely to follow it. While base models are trained on a wide variety of texts, making their predictions broad, instruct models are trained on text that’s been structured as instruction-response pairs – hence their name. Each instruct model has its own structured format; however, you only need to pass in the prompt and completion pairs – please refer to the data format details here.
If it’s your first time fine-tuning, we recommend using an instruct model. Llama 3 8B Instruct is great for simpler training sets, and the larger Llama 3 70B Instruct is good for more complicated training sets.
You can find all available models on the Together API here.
Preparing your data
Once you’ve chosen your model, you’ll need to save your structured data as either a JSONL file or a Parquet file (tokenized).
Which file format should I use for data?
The example packing strategy is used by default for training data if a JSONL file is provided. If you’d like to disable example packing for training, you can provide a tokenized dataset in a Parquet file. This example script for tokenizing a dataset demonstrates padding each example with a pad token. Note that the corresponding attention_mask and labels should be set to 0 and -100, respectively, so that the model essentially ignores the padding tokens in prediction and excludes them from its loss.
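For instance, here is a minimal sketch of that padding scheme, assuming a prompt/completion JSONL file and the Hugging Face tokenizer for your chosen model (the model name, file names, and max length are placeholders; requires transformers, pandas, and pyarrow):

```python
import json

import pandas as pd
from transformers import AutoTokenizer

# Placeholders: match the tokenizer and max length to the model you fine-tune.
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B-Instruct")
if tokenizer.pad_token_id is None:
    tokenizer.pad_token = tokenizer.eos_token  # Llama tokenizers ship without a pad token
MAX_LEN = 2048

rows = []
with open("data.jsonl") as f:
    for line in f:
        example = json.loads(line)
        ids = tokenizer.encode(example["prompt"] + example["completion"])[:MAX_LEN]
        n_pad = MAX_LEN - len(ids)
        rows.append({
            "input_ids": ids + [tokenizer.pad_token_id] * n_pad,
            "attention_mask": [1] * len(ids) + [0] * n_pad,  # 0: ignore padding in attention
            "labels": ids + [-100] * n_pad,                  # -100: exclude padding from the loss
        })

# Parquet keeps the token arrays as typed columns.
pd.DataFrame(rows).to_parquet("data.parquet", index=False)
```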
JSONL is simpler and will work for many cases, while Parquet stores pre-tokenized data, providing flexibility to specify custom attention mask and labels (loss masking). It also saves you time for each job you run by skipping the tokenization step. View our file format guide to learn more about working with each format.
Loss masking
The Together Fine-tuning API trains a model using the same cross-entropy loss used during pre-training (in other words, by predicting the next token). If you provide a JSONL file, the loss will be calculated for every token, regardless of your custom task and prompt format. However, in some cases you may want to fine-tune a model to excel at predicting only a specific part of a prompt. For example, if you want to fine-tune a model to answer a short question followed by a long context, the model doesn’t need to learn to generate the entire context and the question, and penalizing its predictions for them could lead to ineffective training for your answering task. By providing a custom labels field for your examples in the tokenized dataset (in a Parquet file), you can mask out the loss calculation for specified tokens. Set the label for tokens you don’t want to include in the loss calculation to -100 (see here for why). Note that unlike padding tokens, you still set their corresponding attention_mask to 1, so that the model can properly attend to these tokens during prediction.
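Continuing the sketch above, masking a prompt out of the loss looks like this (the prompt and completion strings are placeholders; note that the attention mask stays 1 for the masked tokens):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B-Instruct")

prompt = "Context: ...\nQuestion: ..."  # placeholder text
completion = " Answer: ..."

prompt_ids = tokenizer.encode(prompt)
completion_ids = tokenizer.encode(completion, add_special_tokens=False)

input_ids = prompt_ids + completion_ids
attention_mask = [1] * len(input_ids)               # the model still attends to the prompt
labels = [-100] * len(prompt_ids) + completion_ids  # loss is computed only on the completion
```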
Train and Validation Split
You can split a JSONL file into training and validation sets by running the following example script. For more information about using the validation set, see here:
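A minimal sketch of such a split (the file names and the 90/10 ratio are placeholders):

```python
import random

random.seed(42)  # make the split reproducible

with open("data.jsonl") as f:
    lines = f.readlines()
random.shuffle(lines)

split = int(len(lines) * 0.9)  # 90% train, 10% validation
with open("train.jsonl", "w") as f:
    f.writelines(lines[:split])
with open("validation.jsonl", "w") as f:
    f.writelines(lines[split:])
```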
File Check
Once your data is in the correct structure and saved as either a .jsonl or .parquet file, use our CLI to verify that it’s correct:
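The same check is also available from Python; a sketch, assuming the helper lives in together.utils as in recent versions of the together SDK (the CLI equivalent is together files check data.jsonl):

```python
from together.utils import check_file

report = check_file("data.jsonl")
print(report)
```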
If your file is valid, you’ll see is_check_passed: true in the response.
You’re now ready to upload your data to Together!
Uploading your data
To upload your data, use the CLI or our Python library (our TypeScript library currently doesn’t support file uploads):
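A sketch with the Python SDK (assumes TOGETHER_API_KEY is set in your environment and the file name is a placeholder):

```python
from together import Together

client = Together()  # reads TOGETHER_API_KEY from the environment
resp = client.files.upload(file="train.jsonl")
print(resp.id)  # keep this file ID for the fine-tuning job
```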
Starting a fine-tuning job
We support both LoRA and full fine-tuning – see how to start a fine-tuning job with either method below.
LoRA fine-tuning
(Supported with together >= 1.2.3.) Call create with your file ID as the training_file to kick off a new fine-tuning job. Pass --lora for LoRA fine-tuning:
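A sketch of the same call from the Python SDK (the lora keyword argument mirrors the CLI flag and is assumed from recent SDK versions; the model name and file ID are placeholders):

```python
from together import Together

client = Together()
job = client.fine_tuning.create(
    training_file="file-xxxx",                    # your uploaded file ID
    model="meta-llama/Meta-Llama-3-8B-Instruct",
    lora=True,                                    # LoRA instead of full fine-tuning
)
print(job.id, job.status)
```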
You can also pass --lora-r, --lora-dropout, --lora-alpha, and --lora-trainable-modules to customize your job. See the full list of hyperparameters and their definitions here.
The response object will have all the details of your job, including its ID and a status key that starts out as “pending”.
Full fine-tuning
Call create with your file ID as the training_file to kick off a new fine-tuning job:
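A sketch from the Python SDK (omitting the LoRA option requests full fine-tuning; placeholders as before, and the n_epochs value is illustrative):

```python
from together import Together

client = Together()
job = client.fine_tuning.create(
    training_file="file-xxxx",
    model="meta-llama/Meta-Llama-3-8B-Instruct",
    n_epochs=3,  # illustrative; see the hyperparameter list for defaults
)
print(job.id, job.status)
```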
As with LoRA fine-tuning, the response object will have all the details of your job, including its ID and a status key that starts out as “pending”.
Continue a fine-tuning job
You can continue a previous fine-tuning job by specifying the --from-checkpoint field in the request. The checkpoint can be specified as any of the following:
- The output model name from the previous job
- The fine-tuning job ID
- A specific checkpoint step, in the format ft-...:{STEP_NUM}, where {STEP_NUM} is the step at which the checkpoint was created

You can list the available checkpoints for a job by running together fine-tuning list-checkpoints {FT_JOB_ID}.
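In the Python SDK, a sketch of the same option looks like the following (the from_checkpoint keyword name is an assumption mirroring the CLI flag):

```python
from together import Together

client = Together()
job = client.fine_tuning.create(
    training_file="file-xxxx",
    model="meta-llama/Meta-Llama-3-8B-Instruct",
    from_checkpoint="ft-xxxx:100",  # job ID, output model name, or ft-...:{STEP_NUM}
)
```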
Evaluation
To use a validation set, provide --validation-file and --n-evals, the number of evaluations to run over the entire job:
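A sketch from the Python SDK (validation_file and n_evals mirror the CLI flags; the keyword names are assumed from recent SDK versions):

```python
from together import Together

client = Together()
job = client.fine_tuning.create(
    training_file="file-xxxx",
    validation_file="file-yyyy",  # your uploaded validation set
    n_evals=10,                   # total evaluations spread over the job
    model="meta-llama/Meta-Llama-3-8B-Instruct",
)
```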
At each of the n_evals evaluation points, the most up-to-date model weights will be evaluated with a forward pass on your validation set, and the evaluation loss will be recorded in your job event log. If you provide a W&B API key, you will also be able to see the losses on the W&B page. Because evaluation runs only a forward pass and never updates the model weights, the presence of a validation set will not influence the model’s training quality.


Monitoring a fine-tuning job’s progress
After you’ve started your job, visit your jobs dashboard. You should see your new job!
You can also call retrieve to get the latest details about your job directly from your code:
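For example, a sketch from the Python SDK (the job ID is a placeholder):

```python
from together import Together

client = Together()
job = client.fine_tuning.retrieve("ft-xxxx")
print(job.status)  # e.g. "pending", then "running", then "completed"
```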
You can check your job’s status at any point with the retrieve command from above. If your job is in a pending state for too long, please reach out to [email protected].
You can also monitor the fine-tuning job on the Weights & Biases platform, shown below, if you provided your API key when submitting the fine-tuning job as instructed above.

Deploying your fine-tuned model
Once your fine-tuning job completes, you should see your new model in your models dashboard:
Hosting your model on Together AI
If you select your model in the models dashboard, you’ll see several hardware configurations that you can choose from to start hosting your model:

Running your model locally
To run your model locally, first download it by calling download with your job ID:
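A sketch from the Python SDK (the output keyword name is an assumption; check your SDK version):

```python
from together import Together

client = Together()
client.fine_tuning.download("ft-xxxx", output="my-model.tar.zst")
```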
Your model will be saved to output as a tar.zst file, which is an archive file format that uses the Zstandard algorithm. You’ll need to install Zstandard to decompress your model.
On Macs, you can install it with Homebrew: brew install zstd. After decompressing and extracting the archive, you can run the model locally in Python.
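For example, a minimal sketch using Hugging Face transformers (assumes transformers and torch are installed, and that ./my-model is the directory extracted from the archive in standard Hugging Face format):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_DIR = "./my-model"  # placeholder: the directory extracted from the archive

tokenizer = AutoTokenizer.from_pretrained(MODEL_DIR)
model = AutoModelForCausalLM.from_pretrained(MODEL_DIR, torch_dtype=torch.bfloat16)

inputs = tokenizer("My favorite place on earth is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```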
Pricing
Pricing for fine-tuning is based on model size, the number of training tokens, the number of validation tokens, the number of evaluations, and the number of epochs. In other words, the total number of tokens used in a job is n_epochs * n_tokens_per_training_dataset + n_evals * n_tokens_per_validation_dataset. You can estimate fine-tuning pricing with our calculator. The exact price may differ from the estimated cost by ~$1, since the exact number of trainable parameters differs for each model.
Currently LoRA and full fine-tuning have the same pricing.
The tokenization step is part of the fine-tuning process on our API, and the exact number of tokens and the price of your job will be available after the tokenization step is done. You can find this information in your jobs dashboard or retrieve it by running together fine-tuning retrieve $JOB_ID in your CLI.
Q: Is there a minimum price? Yes – the minimum price for a fine-tuning job is $5. For example, fine-tuning Llama-3-8B with 1B training tokens for 1 epoch and 1M validation tokens over 10 evaluations costs $369.70, well above the minimum. But if you fine-tune this model with only 1M training tokens for 1 epoch and no validation set, the rate-based cost is $0.37, so the final price will be the $5 minimum.
Q: What happens if I cancel my job? The final price will be determined by the number of tokens used to train and validate your model up to the point of cancellation. For example, if your fine-tuning job uses Llama-3-8B with a batch size of 8 and you cancel the job after 1,000 training steps, the total number of tokens used for training is 8192 [context length] x 8 [batch size] x 1000 [steps] = 65,536,000. If your validation set has 1M tokens and ran 10 evaluations before the cancellation, you add another 10M tokens to the count. This results in $30.91, as you can check on the pricing page.
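The arithmetic from that cancellation example, spelled out:

```python
# Billable tokens at cancellation for the Llama-3-8B example above.
context_length = 8192
batch_size = 8
training_steps = 1000
training_tokens = context_length * batch_size * training_steps  # 65,536,000

validation_tokens = 1_000_000 * 10  # 1M-token validation set, evaluated 10 times
total_billable_tokens = training_tokens + validation_tokens     # 75,536,000
print(total_billable_tokens)
```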