Overview
Together AI supports uploading and running inference on custom LoRA (Low-Rank Adaptation) adapters that you’ve trained independently or obtained from sources like the Hugging Face Hub.
Key benefits
- Serverless deployment: No infrastructure management required
- Fast inference: Optimized for low latency
- Private models: Your adapters remain private to your account
- Multiple sources: Support for AWS S3 and Hugging Face Hub repositories
Supported base models
Currently, LoRA inference is supported for adapters based on the following base models in the Together API. Whether using pre-fine-tuned models or bringing your own adapters, these are the only compatible models:

| Organization | Base Model Name | Base Model String | Quantization |
|---|---|---|---|
| Meta | Llama 4 Maverick Instruct | meta-llama/Llama-4-Maverick-17B-128E-Instruct | FP8 |
| Meta | Llama 3.1 8B Instruct | meta-llama/Meta-Llama-3.1-8B-Instruct-Reference | BF16 |
| Meta | Llama 3.1 70B Instruct | meta-llama/Meta-Llama-3.1-70B-Instruct-Reference | BF16 |
| Alibaba | Qwen2.5 14B Instruct | Qwen/Qwen2.5-14B-Instruct | FP8 |
| Alibaba | Qwen2.5 72B Instruct | Qwen/Qwen2.5-72B-Instruct | FP8 |
Implementation guide
Prerequisites
- Together AI API key
- Compatible LoRA adapter files: the adapter must include both an `adapter_config.json` and an `adapter_model.safetensors` file. If you are getting the adapter from the Hugging Face Hub, you can find information about its base model there as well.
- Adapter hosted on AWS S3 or the Hugging Face Hub
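If the adapter files are on your local machine before you host them, a quick sanity check for the two required files can be sketched as follows (pure Python; the directory path is a placeholder):

```python
import os

REQUIRED_FILES = ("adapter_config.json", "adapter_model.safetensors")

def validate_adapter_dir(path: str) -> list[str]:
    """Return the list of required adapter files missing from `path`."""
    return [f for f in REQUIRED_FILES if not os.path.isfile(os.path.join(path, f))]

# "./my-lora-adapter" is a placeholder path for your local adapter checkout.
missing = validate_adapter_dir("./my-lora-adapter")
if missing:
    print(f"Adapter is missing required files: {missing}")
else:
    print("Adapter directory looks complete.")
```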
Upload from S3
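A minimal sketch of building the upload request body for an S3-hosted adapter. The endpoint path and field names below are assumptions for illustration; consult the Together API reference for the authoritative upload schema. The presigned URL is a placeholder.

```python
import json
import os

# Assumed endpoint for custom model/adapter uploads -- verify against the API reference.
UPLOAD_URL = "https://api.together.xyz/v1/models"

def build_s3_upload_request(model_name: str, presigned_url: str) -> dict:
    """Build a JSON body for uploading an adapter from a presigned S3 URL.

    Field names here are assumptions, not a confirmed schema.
    """
    return {
        "model_name": model_name,       # must be unique within your account
        "model_source": presigned_url,  # presigned S3 URL to the adapter files
        "model_type": "adapter",
    }

body = build_s3_upload_request(
    "my-llama-3.1-8b-lora",                                  # placeholder name
    "https://my-bucket.s3.amazonaws.com/adapter.tar.gz?sig"  # placeholder URL
)
headers = {
    "Authorization": f"Bearer {os.environ.get('TOGETHER_API_KEY', '')}",
    "Content-Type": "application/json",
}
print(json.dumps(body, indent=2))
# POST `body` with `headers` to UPLOAD_URL using your HTTP client of choice.
```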
Upload from the Hugging Face Hub
Make sure that the adapter contains the `adapter_config.json` and `adapter_model.safetensors` files in the Files and versions tab on the Hugging Face Hub.
Upload response
A successful upload returns a response containing the upload `job_id` and the adapter’s `model_name`.
Monitor upload progress
You can poll the API using the `job_id` until the adapter has finished uploading.
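The polling loop can be sketched with an injected status function. The `get_job_status` helper and the terminal status strings are hypothetical; swap in a real call to the job-status endpoint.

```python
import time

def wait_for_upload(job_id, get_job_status, interval_s=5.0, timeout_s=600.0):
    """Poll `get_job_status(job_id)` until it reports a terminal state.

    `get_job_status` is a caller-supplied function that returns the job's
    current status string; the terminal states below are assumptions.
    """
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        status = get_job_status(job_id)
        if status in ("Complete", "Failed"):  # assumed terminal states
            return status
        time.sleep(interval_s)
    raise TimeoutError(f"job {job_id} still processing after {timeout_s}s")

# Example with a stubbed status function that advances on each poll:
states = iter(["Queued", "Processing", "Complete"])
print(wait_for_upload("job-123", lambda _jid: next(states), interval_s=0.0))
```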
Run LoRA inference
Use the `model_name` string from the adapter upload.
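Inference then uses that adapter `model_name` as the `model` parameter of an ordinary chat completions request. The sketch below only builds the request body (the payload shape follows the OpenAI-compatible chat completions API; the adapter name is a placeholder):

```python
import json
import os

CHAT_URL = "https://api.together.xyz/v1/chat/completions"

def build_chat_request(adapter_model_name: str, user_message: str) -> dict:
    """Build a chat completions body that targets an uploaded LoRA adapter."""
    return {
        "model": adapter_model_name,  # the model_name returned by the upload
        "messages": [{"role": "user", "content": user_message}],
        "max_tokens": 256,
    }

body = build_chat_request("your-account/my-llama-3.1-8b-lora", "Hello!")
headers = {
    "Authorization": f"Bearer {os.environ.get('TOGETHER_API_KEY', '')}",
    "Content-Type": "application/json",
}
print(json.dumps(body, indent=2))
# POST `body` with `headers` to CHAT_URL using your HTTP client of choice.
```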
Troubleshooting
1. “Model name already exists” Error
Problem: Attempting to upload with a duplicate model name
Solution: Choose a unique model name for your adapter
2. Missing Required Files
Problem: Adapter missing `adapter_config.json` or `adapter_model.safetensors`
Solution: Ensure both files are present in your source location before uploading
3. Base Model Incompatibility
Problem: Adapter trained on an unsupported base model
Solution: Verify your adapter was trained on one of the supported base models listed above
4. Upload Job Stuck in “Processing”
Problem: Job status remains “Processing” for an extended period
Solution:
- Check if the file size exceeds the limits for your tier
- Verify presigned URL hasn’t expired (for S3)
- Ensure Hugging Face token has proper permissions (for private repos)
5. Authentication Errors
Problem: 401 or 403 errors during upload
Solution:
- Verify your Together API key is valid
- For Hugging Face Hub private repos, ensure HF token is included
- For S3, check presigned URL is properly generated
FAQs
Q: What are the adapter limits based on my tier?
