Overview

LoRA (Low-Rank Adaptation) enables efficient fine-tuning of large language models by training only a small set of additional parameters while keeping the original model weights frozen. This approach delivers two key advantages:
  • Reduced training costs: Trains fewer parameters than full fine-tuning, using less GPU memory
  • Faster deployment: Produces compact adapter files that can be quickly shared and deployed
Together AI handles the LoRA fine-tuning workflow. Once training is complete, you can deploy your fine-tuned model using a dedicated endpoint for inference.

Quick start

This guide demonstrates how to fine-tune a model using LoRA. For comprehensive fine-tuning options and best practices, refer to the Fine-Tuning Guide.

Prerequisites

  • Together AI API key
  • Training data in the JSONL format
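As a quick illustration of the JSONL format, the sketch below writes a tiny chat-style dataset and sanity-checks it. The exact message schema expected for your chosen model may differ, so treat the field names here as assumptions and consult the Fine-Tuning Guide for the authoritative format.

```python
import json

# Illustrative records in a chat-style "messages" layout; the schema your
# chosen model expects may differ (see the Fine-Tuning Guide).
examples = [
    {"messages": [
        {"role": "user", "content": "What is LoRA?"},
        {"role": "assistant", "content": "A parameter-efficient fine-tuning method."},
    ]},
    {"messages": [
        {"role": "user", "content": "Why freeze the base weights?"},
        {"role": "assistant", "content": "It keeps training cheap and the adapter small."},
    ]},
]

# JSONL means exactly one JSON object per line.
with open("your-datafile.jsonl", "w") as f:
    for ex in examples:
        f.write(json.dumps(ex) + "\n")

# Sanity check: every line must parse and contain a "messages" list.
with open("your-datafile.jsonl") as f:
    for line in f:
        record = json.loads(line)
        assert isinstance(record["messages"], list)
```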

Step 1: Upload Training Data

First, upload your training dataset to Together AI:
together files upload "your-datafile.jsonl"

Step 2: Create Fine-tuning Job

Launch a LoRA fine-tuning job using the uploaded file ID:
together fine-tuning create \
  --training-file "file-629e58b4-ff73-438c-b2cc-f69542b27980" \
  --model "meta-llama/Meta-Llama-3.1-8B-Instruct-Reference" \
  --lora
Note: To use a validation set, set both the --validation-file and --n-evals parameters. --n-evals is the number of evaluations run over the course of the job and must be greater than 0, or your validation set will not be used.
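If you do not yet have a separate validation file, you can split one locally before uploading both files. The sketch below uses a hypothetical split_jsonl helper (not part of the Together CLI or SDK); the file names and split fraction are illustrative.

```python
import json
import random

# Hypothetical helper: split one JSONL file into train/validation files.
def split_jsonl(path, train_path, val_path, val_fraction=0.1, seed=0):
    with open(path) as f:
        records = [json.loads(line) for line in f]
    random.Random(seed).shuffle(records)          # deterministic shuffle
    n_val = max(1, int(len(records) * val_fraction))
    with open(val_path, "w") as f:
        for r in records[:n_val]:
            f.write(json.dumps(r) + "\n")
    with open(train_path, "w") as f:
        for r in records[n_val:]:
            f.write(json.dumps(r) + "\n")
    return len(records) - n_val, n_val

# Demo on a throwaway 20-record dataset.
with open("all-data.jsonl", "w") as f:
    for i in range(20):
        f.write(json.dumps({"messages": [{"role": "user", "content": f"q{i}"}]}) + "\n")

n_train, n_val = split_jsonl("all-data.jsonl", "train.jsonl", "val.jsonl")
print(n_train, n_val)  # 18 2
```

You would then upload both files and pass the validation file's ID via --validation-file.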
Once you submit the fine-tuning job, the response includes the model output_name and the job id:
{
  "id": "ft-44129430-ac08-4136-9774-aed81e0164a4",
  "training_file": "file-629e58b4-ff73-438c-b2cc-f69542b27980",
  "validation_file": "",
  "model": "meta-llama/Meta-Llama-3.1-8B-Instruct-Reference",
  "output_name": "zainhas/Meta-Llama-3.1-8B-Instruct-Reference-my-demo-finetune-4224205a",
  ...
}
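Assuming the response shape shown above, the two fields you will need later can be pulled out with a few lines of Python. The JSON here is a trimmed copy of the example response, not live API output:

```python
import json

# Trimmed copy of the job-creation response shown above.
response = json.loads("""
{
  "id": "ft-44129430-ac08-4136-9774-aed81e0164a4",
  "training_file": "file-629e58b4-ff73-438c-b2cc-f69542b27980",
  "model": "meta-llama/Meta-Llama-3.1-8B-Instruct-Reference",
  "output_name": "zainhas/Meta-Llama-3.1-8B-Instruct-Reference-my-demo-finetune-4224205a"
}
""")

job_id = response["id"]               # use this to check job status
model_name = response["output_name"]  # use this when deploying for inference
print(job_id)
print(model_name)
```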

Step 3: Get the output model

The output_name field in the job-creation response (shown in Step 2) is the model identifier you will use when deploying for inference.
You can also check the job's status and retrieve the model name by navigating to your fine-tuned model in the 'Model' or 'Jobs' tab of the Together dashboard.

Step 4: Deploy for inference

Once the fine-tuning job is completed, you can deploy your model for inference using a dedicated endpoint. See Deploying a Fine-tuned Model for detailed instructions.

Best Practices

  1. Data Preparation: Ensure your training data follows the correct JSONL format for your chosen model
  2. Validation Sets: Always include validation data to monitor training quality
  3. Model Naming: Use descriptive names for easy identification in production
  4. Monitoring: Track training metrics through the Together dashboard

Frequently Asked Questions

Which base models support LoRA fine-tuning?

Together AI supports LoRA fine-tuning on a curated selection of high-performance base models. See the supported models list for current options.

What’s the difference between LoRA and full fine-tuning?

LoRA trains only a small set of additional parameters (typically 0.1-1% of model size), resulting in faster training, lower costs, and smaller output files, while full fine-tuning updates all model parameters for maximum customization at higher computational cost.
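The parameter savings can be sketched with back-of-the-envelope arithmetic. Assuming a single 4096 x 4096 projection matrix and a LoRA rank of 16 (illustrative numbers, not Together AI defaults), LoRA trains a down-projection A (r x k) and an up-projection B (d x r) in place of the full matrix:

```python
# Back-of-the-envelope parameter count for LoRA on one weight matrix.
# Illustrative numbers only: a 4096 x 4096 projection with rank 16.
d, k, r = 4096, 4096, 16

full_params = d * k         # parameters updated by full fine-tuning
lora_params = r * (d + k)   # LoRA trains B (d x r) and A (r x k) instead

print(full_params)                # 16777216
print(lora_params)                # 131072
print(lora_params / full_params)  # 0.0078125, i.e. under 1% of the matrix
```

This is where the "typically 0.1-1% of model size" figure comes from: the ratio shrinks further as matrix dimensions grow relative to the rank.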

How do I run inference on my LoRA fine-tuned model?

Once training is complete, deploy your model using a dedicated endpoint. See Deploying a Fine-tuned Model for instructions.

Next Steps