You can upload your own LoRA (Low-Rank Adaptation) adapters to Together AI and run inference on them through a dedicated endpoint. Adapters can come from the Hugging Face Hub or from an archive in S3, including adapters you trained outside of Together AI. To upload a full custom model instead of an adapter, see Upload a model.
Prerequisites
An adapter is eligible for upload if it meets all of the following:

- Source: the Hugging Face Hub or an S3 presigned URL.
- Files: the adapter directory must contain `adapter_config.json` and `adapter_model.safetensors`.
- Base model: the adapter must target a base model that Together AI supports for dedicated inference.

For S3, package the adapter as an archive (`.zip` or `.tar.gz`) with the files at the root of the archive, not nested inside an extra top-level directory. The presigned URL must point to the archive and have an expiration of at least 100 minutes.
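For the S3 path, the archive layout is the usual stumbling block: both adapter files must sit at the archive root, not inside a subfolder. A minimal Python sketch of packaging a local adapter directory (directory and archive names here are illustrative):

```python
import zipfile
from pathlib import Path

REQUIRED = ("adapter_config.json", "adapter_model.safetensors")

def package_adapter(adapter_dir: str, archive_path: str) -> str:
    """Zip the required adapter files at the archive root (no top-level folder)."""
    src = Path(adapter_dir)
    missing = [name for name in REQUIRED if not (src / name).exists()]
    if missing:
        raise FileNotFoundError(f"adapter directory is missing: {missing}")
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for name in REQUIRED:
            # arcname=name keeps the file at the root of the archive
            zf.write(src / name, arcname=name)
    return archive_path
```

After uploading the archive to S3, generate a presigned GET URL that expires in at least 100 minutes (6,000 seconds), for example with boto3's `generate_presigned_url(..., ExpiresIn=6000)`.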
Upload the adapter
Upload from Hugging Face by passing the repo URL as `model_source` and setting `model_type` to `adapter`. Include your Hugging Face token for private or gated repos. Upload from S3 by passing the presigned archive URL as `model_source`. A successful upload returns the upload job. Note the `job_id` (used to check status) and the `model_name` (used to deploy and call the adapter).

CLI options
| Option | Required | Description |
|---|---|---|
| `--model-name` | Yes | The name to give the uploaded adapter. |
| `--model-source` | Yes | A Hugging Face repo URL or an S3 presigned URL. |
| `--model-type` | Yes | Set to `adapter` for LoRA adapters. |
| `--base-model` | Yes | The base model the adapter targets. |
| `--hf-token` | For Hugging Face | Your Hugging Face token. Required for private or gated repos. |
| `--description` | No | A description of the adapter. |
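The same options map directly onto a JSON request body. A hedged sketch of assembling the upload call over HTTP — the base URL, endpoint path, and field names below are assumptions inferred from the option table, not a confirmed API reference:

```python
import json
import os
import urllib.request

API_BASE = "https://api.together.xyz/v1"  # assumed base URL

def build_upload_payload(model_name, model_source, base_model,
                         hf_token=None, description=None):
    """Mirror the CLI options as a request body; field names are assumed."""
    payload = {
        "model_name": model_name,
        "model_source": model_source,  # HF repo URL or S3 presigned URL
        "model_type": "adapter",       # LoRA adapters always use this type
        "base_model": base_model,
    }
    if hf_token:
        payload["hf_token"] = hf_token  # needed for private or gated HF repos
    if description:
        payload["description"] = description
    return payload

def upload_adapter(payload):
    """POST the upload request; the response should carry job_id and model_name."""
    req = urllib.request.Request(
        f"{API_BASE}/models",  # assumed endpoint path
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Bearer {os.environ['TOGETHER_API_KEY']}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=60) as resp:
        return json.load(resp)
```

The base-model identifier passed to `build_upload_payload` must match one that Together AI supports for dedicated inference, per the prerequisites above.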
Check upload status
Poll the upload job until its `status` field is `Complete`. The adapter is ready to deploy at that point.
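The polling step can be sketched as a small loop. Here `fetch_status` is a hypothetical stand-in for whatever status call you use (CLI, SDK, or REST); the `Complete` value comes from the text above, and the failure states are assumptions:

```python
import time

def wait_for_upload(fetch_status, job_id, poll_seconds=10, timeout_seconds=1800):
    """Poll until the upload job's status field is 'Complete'.

    fetch_status(job_id) -> str is a caller-supplied function (hypothetical
    helper) that returns the job's current status string.
    """
    deadline = time.monotonic() + timeout_seconds
    while time.monotonic() < deadline:
        status = fetch_status(job_id)
        if status == "Complete":
            return status
        if status in ("Failed", "Error"):  # assumed terminal failure states
            raise RuntimeError(f"upload job {job_id} ended with status {status}")
        time.sleep(poll_seconds)
    raise TimeoutError(f"upload job {job_id} not Complete after {timeout_seconds}s")
```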
Deploy the adapter
Uploaded adapters deploy as dedicated endpoints, the same way as any other model. Use the `model_name` from the upload response as the model argument when creating the endpoint.
List hardware available for the adapter, then create the endpoint with one of the returned hardware IDs. See Manage dedicated endpoints for the full endpoint lifecycle, including autoscaling, listing, and deletion.
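As a rough sketch, the deploy step reduces to pairing the adapter's `model_name` with one of the hardware IDs returned by the listing call. The request shape and field names below are illustrative assumptions, not a confirmed API reference:

```python
def build_endpoint_request(model_name, hardware_id, min_replicas=1, max_replicas=1):
    """Assemble an endpoint-creation body; field names are assumptions."""
    return {
        "model": model_name,      # the model_name returned by the upload
        "hardware": hardware_id,  # one of the IDs from the hardware listing
        "autoscaling": {
            "min_replicas": min_replicas,
            "max_replicas": max_replicas,
        },
    }
```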
Run inference
Once the endpoint is running, call it like any other Together AI chat or completions model. Use the `model_name` from the upload response as the `model` parameter.
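A minimal sketch of calling the endpoint over the chat completions route, assuming the standard OpenAI-compatible `/v1/chat/completions` path on the Together API; the adapter name is a placeholder:

```python
import json
import os
import urllib.request

API_URL = "https://api.together.xyz/v1/chat/completions"  # assumed path

def build_chat_request(model_name, user_message):
    """The uploaded adapter's model_name goes in the standard `model` field."""
    return {
        "model": model_name,
        "messages": [{"role": "user", "content": user_message}],
    }

def chat(model_name, user_message):
    """Send one chat turn to the dedicated endpoint and return the reply text."""
    req = urllib.request.Request(
        API_URL,
        data=json.dumps(build_chat_request(model_name, user_message)).encode(),
        headers={
            "Authorization": f"Bearer {os.environ['TOGETHER_API_KEY']}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=60) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]
```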
Troubleshooting
“Model name already exists”: Each uploaded adapter needs a unique name. Adapter versioning isn’t supported, so re-upload under a new name.

Missing required files: The adapter source must contain both `adapter_config.json` and `adapter_model.safetensors`. Confirm both are present at the root of the archive (S3) or in the Files and versions tab on Hugging Face.
Base model incompatibility: The adapter must target a base model that Together AI supports for dedicated inference. Verify the base model you trained against is available on dedicated endpoints.
Upload job stuck in `Processing`: Most often this means the source can’t be reached. For S3, confirm the presigned URL hasn’t expired. For Hugging Face, confirm your token has access to the repo.

401 or 403 during upload: Check that `TOGETHER_API_KEY` is set, your Hugging Face token has permission for private repos, and your S3 presigned URL is valid and not expired.