You can run inference on your own custom or fine-tuned models by uploading them to Together AI and deploying them on a dedicated endpoint. Models can come from the Hugging Face Hub or from an archive in S3.
Prerequisites
A model is eligible for upload if it meets all of the following:
- Source: Hugging Face Hub or an S3 presigned URL.
- Type: text generation or embedding model.
- Scale: fits on a single node. Multi-node models aren’t supported.
- Format: loadable with Hugging Face from_pretrained. A valid model directory contains files like those in the listing below.
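For illustration, a typical Transformers model directory looks like this. File names vary by architecture and tokenizer; this is an example, not an exhaustive requirement:

```shell
config.json              # model architecture and hyperparameters
generation_config.json   # default generation settings (if present)
tokenizer_config.json    # tokenizer settings
tokenizer.json           # tokenizer vocabulary and merges
special_tokens_map.json  # special token definitions
model.safetensors        # model weights (large models may instead ship
                         # sharded files plus model.safetensors.index.json)
```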
S3 archive requirements
If you’re uploading from S3, package the files in a single archive (.zip or .tar.gz) with the model files at the root of the archive. Don’t nest them inside an extra top-level directory.
Correct (files at root):
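For example, creating the archive from inside the model directory keeps the files at the root. The paths below are illustrative:

```shell
# Correct: archive created from inside the model directory,
# so the files sit at the archive root.
cd my-model/
tar -czf ../model.tar.gz .
tar -tzf ../model.tar.gz
# ./config.json
# ./tokenizer.json
# ./model.safetensors

# Incorrect: archiving the directory itself nests everything
# under a top-level "my-model/" folder.
tar -czf model.tar.gz my-model/
```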
Upload the model
The steps below use the CLI / SDK; the same upload can be performed from the UI.
Upload from Hugging Face by passing the repo path as model_source; include your Hugging Face token for private or gated repos. To upload from S3, pass the presigned archive URL as model_source instead. In either case, the response includes a job_id. Use it to poll for upload status. A sketch follows the options table below.
CLI options
| Option | Required | Description |
|---|---|---|
| --model-name | Yes | The name to give the uploaded model. |
| --model-source | Yes | A Hugging Face repo path or an S3 presigned URL. |
| --hf-token | For Hugging Face | Your Hugging Face token. Required for private or gated repos. |
| --model-type | No | model (default) or adapter. |
| --description | No | A description of the model. |
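Putting it together, an upload call might look like the following sketch. The flags are the documented options above; the "together models upload" subcommand name and the example values are assumptions, so check together --help for the exact invocation:

```shell
# Sketch: upload from the Hugging Face Hub. The subcommand name is an
# assumption; the flags are the documented options.
together models upload \
  --model-name my-org/my-finetuned-llama \
  --model-source my-org/my-finetuned-llama \
  --hf-token "$HF_TOKEN" \
  --description "Llama fine-tuned on support tickets"

# Sketch: upload from S3 instead, passing a presigned archive URL.
together models upload \
  --model-name my-custom-model \
  --model-source "https://my-bucket.s3.amazonaws.com/model.tar.gz?X-Amz-Signature=..."
```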
Check upload status
Poll the upload job until its status field is Complete. The model is ready to deploy at that point.
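A polling loop might look like this sketch. The status endpoint path is an assumption, so confirm it against the API reference; jq is used only to extract the status field:

```shell
# Hypothetical status endpoint -- confirm the real path in the API reference.
# $JOB_ID is the job_id returned by the upload call.
while true; do
  STATUS=$(curl -s "https://api.together.xyz/v1/models/upload/$JOB_ID" \
    -H "Authorization: Bearer $TOGETHER_API_KEY" | jq -r '.status')
  echo "upload status: $STATUS"
  [ "$STATUS" = "Complete" ] && break
  sleep 30
done
```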
Deploy the model
Uploaded models deploy as dedicated endpoints, the same way as any other model. The steps below use the CLI / SDK; the same deployment can be performed from the UI.
List hardware available for the uploaded model, then create the endpoint using a hardware ID from the list; a sketch of both steps follows. See Manage dedicated endpoints for the full endpoint lifecycle, including autoscaling, listing, and deletion.
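As a sketch, the two steps might look like this with curl. The /v1/hardware and /v1/endpoints paths, the request fields, and the hardware ID are assumptions and placeholders; confirm them against the Manage dedicated endpoints guide:

```shell
# Sketch only: paths, fields, and the hardware ID are assumptions/placeholders.
# 1) List hardware available for the uploaded model.
curl -s "https://api.together.xyz/v1/hardware?model=my-org/my-finetuned-llama" \
  -H "Authorization: Bearer $TOGETHER_API_KEY"

# 2) Create the dedicated endpoint with a hardware ID from the list.
curl -s -X POST "https://api.together.xyz/v1/endpoints" \
  -H "Authorization: Bearer $TOGETHER_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
        "model": "my-org/my-finetuned-llama",
        "hardware": "1x_nvidia_h100_80gb_sxm",
        "autoscaling": {"min_replicas": 1, "max_replicas": 1}
      }'
```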