
You can upload your own LoRA (Low-Rank Adaptation) adapters to Together AI and run inference on them through a dedicated endpoint. Adapters can come from the Hugging Face Hub or from an archive in S3, including adapters you trained outside of Together AI. To upload a full custom model instead of an adapter, see Upload a model.

Prerequisites

An adapter is eligible for upload if it meets all of the following:
  • Source: Hugging Face Hub or an S3 presigned URL.
  • Files: the adapter directory must contain adapter_config.json and adapter_model.safetensors.
  • Base model: the adapter must target a base model that Together AI supports for dedicated inference.
If you’re uploading from S3, package the adapter files in a single archive (.zip or .tar.gz) with the files at the root of the archive, not nested inside an extra top-level directory. The presigned URL must point to the archive and have an expiration of at least 100 minutes.
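As a sketch, packaging might look like the following. The function name, paths, and bucket are placeholders, and the `aws s3 cp` / `aws s3 presign` steps assume a configured AWS CLI:

```shell
# Sketch: package an adapter directory into a .tar.gz with the required
# files at the archive root (helper name and paths are placeholders).
package_adapter() {
  local adapter_dir="$1" out_archive="$2"
  # -C enters the directory first, so entries land at the archive root
  # rather than nested under a top-level folder.
  tar -czf "$out_archive" -C "$adapter_dir" \
    adapter_config.json adapter_model.safetensors
}

# Example usage (bucket is a placeholder; assumes the AWS CLI is configured):
#   package_adapter ./my-adapter my-adapter.tar.gz
#   aws s3 cp my-adapter.tar.gz s3://<BUCKET>/my-adapter.tar.gz
#   aws s3 presign s3://<BUCKET>/my-adapter.tar.gz --expires-in 7200  # 120 min, above the 100-minute minimum
```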

Upload the adapter

Upload from Hugging Face by passing the repo URL as model_source and setting model_type to adapter. Include your Hugging Face token for private or gated repos.
together models upload \
  --model-name <ADAPTER_NAME> \
  --model-source <HUGGING_FACE_REPO_URL> \
  --model-type adapter \
  --base-model <BASE_MODEL_ID> \
  --hf-token "$HUGGINGFACE_TOKEN"
Upload from S3 by passing the presigned archive URL as model_source:
together models upload \
  --model-name <ADAPTER_NAME> \
  --model-source <S3_PRESIGNED_URL> \
  --model-type adapter \
  --base-model <BASE_MODEL_ID>
A successful upload returns a description of the upload job:
{
  "data": {
    "job_id": "job-b641db51-38e8-40f2-90a0-5353aeda6f21",
    "model_name": "devuser/test-lora-model-creation-8b",
    "model_source": "remote_archive"
  },
  "message": "job created"
}
Note the job_id (used to check status) and the model_name (used to deploy and call the adapter).
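If you're scripting the flow, both fields can be pulled out of the response with `jq`. A sketch, assuming the response shape shown above (the helper names are hypothetical):

```shell
# Helper sketches: read the upload response on stdin and extract the two
# fields needed later (assumes the response shape shown above, plus jq).
upload_job_id()     { jq -r '.data.job_id'; }
upload_model_name() { jq -r '.data.model_name'; }

# Example usage:
#   response=$(together models upload ...)
#   job_id=$(echo "$response" | upload_job_id)
#   model=$(echo "$response" | upload_model_name)
```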

CLI options

  • --model-name (required): The name to give the uploaded adapter.
  • --model-source (required): A Hugging Face repo URL or an S3 presigned URL.
  • --model-type (required): Set to adapter for LoRA adapters.
  • --base-model (required): The base model the adapter targets.
  • --hf-token (required for Hugging Face private or gated repos): Your Hugging Face token.
  • --description (optional): A description of the adapter.

Check upload status

Poll the upload job until its status field is Complete. The adapter is ready to deploy at that point.
curl -X GET "https://api.together.ai/v1/jobs/<JOB_ID>" \
     -H "Authorization: Bearer $TOGETHER_API_KEY"
A completed job looks like this:
{
  "type": "adapter_upload",
  "job_id": "job-b641db51-38e8-40f2-90a0-5353aeda6f21",
  "status": "Complete",
  "status_updates": []
}
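A polling loop can be sketched as follows. This assumes `jq` is available and that the job response has the shape shown above; the helper names are hypothetical:

```shell
# Read a job JSON on stdin and print its status field (assumes jq).
job_status() { jq -r '.status'; }

# Sketch: poll the job endpoint until the status is Complete.
wait_for_upload() {
  local job_id="$1" status
  while true; do
    status=$(curl -s "https://api.together.ai/v1/jobs/${job_id}" \
                  -H "Authorization: Bearer $TOGETHER_API_KEY" | job_status)
    echo "status: ${status}"
    [ "$status" = "Complete" ] && return 0
    sleep 30
  done
}

# Example usage:
#   wait_for_upload job-b641db51-38e8-40f2-90a0-5353aeda6f21
```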
You can also see uploaded adapters on the My models page in the dashboard.

Deploy the adapter

Uploaded adapters deploy as dedicated endpoints, the same way as any other model. Use the model_name from the upload response as the model argument when creating the endpoint.
List hardware available for the adapter, then create the endpoint with one of the returned hardware IDs:
together endpoints hardware --model <ADAPTER_MODEL_NAME>

together endpoints create \
  --display-name <ENDPOINT_NAME> \
  --model <ADAPTER_MODEL_NAME> \
  --hardware <HARDWARE_ID>
See Manage dedicated endpoints for the full endpoint lifecycle, including autoscaling, listing, and deletion.

Run inference

Once the endpoint is running, call it like any other Together AI chat or completions model. Use the model_name from the upload response as the model parameter.
curl -X POST "https://api.together.ai/v1/chat/completions" \
  -H "Authorization: Bearer $TOGETHER_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "<ADAPTER_MODEL_NAME>",
    "messages": [
      {"role": "user", "content": "What is the capital of France?"}
    ],
    "max_tokens": 128,
    "temperature": 0.8
  }'
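The response follows the familiar chat-completions shape, so the assistant's reply can be extracted with `jq`. A sketch, assuming the standard `choices[0].message.content` layout (the helper name is hypothetical):

```shell
# Sketch: pull the assistant's text out of a chat-completions response
# (assumes the standard choices[0].message.content layout, plus jq).
chat_reply() { jq -r '.choices[0].message.content'; }

# Example usage, appended to the curl call above:
#   curl ... | chat_reply
```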

Troubleshooting

  • “Model name already exists”: Each uploaded adapter needs a unique name. Adapter versioning isn’t supported, so re-upload under a new name.
  • Missing required files: The adapter source must contain both adapter_config.json and adapter_model.safetensors. Confirm both are present at the root of the archive (S3) or in the Files and versions tab on Hugging Face.
  • Base model incompatibility: The adapter must target a base model that Together AI supports for dedicated inference. Verify the base model you trained against is available on dedicated endpoints.
  • Upload job stuck in Processing: Most often this means the source can’t be reached. For S3, confirm the presigned URL hasn’t expired. For Hugging Face, confirm your token has access to the repo.
  • 401 or 403 during upload: Check that TOGETHER_API_KEY is set, your Hugging Face token has permission for private repos, and your S3 presigned URL is valid and not expired.

FAQ

Can I upload adapters trained outside Together AI? Yes, as long as the adapter targets one of the supported base models and includes the required files.

Can I update an existing adapter? No. Upload the new version under a different name. Adapter versioning isn’t supported yet.