

You can run inference on your own custom or fine-tuned models by uploading them to Together AI and deploying them on a dedicated endpoint. Models can come from the Hugging Face Hub or from an archive in S3.

Prerequisites

A model is eligible for upload if it meets all of the following:
  • Source: Hugging Face Hub or an S3 presigned URL.
  • Type: text generation or embedding model.
  • Scale: fits on a single node. Multi-node models aren’t supported.
The model files must be in standard Hugging Face repository format, compatible with from_pretrained. A valid model directory contains files like:
config.json
generation_config.json
model-00001-of-00004.safetensors
model-00002-of-00004.safetensors
model-00003-of-00004.safetensors
model-00004-of-00004.safetensors
model.safetensors.index.json
special_tokens_map.json
tokenizer.json
tokenizer_config.json
To upload a LoRA adapter instead of a full model, see Deploy a fine-tuned adapter.

S3 archive requirements

If you’re uploading from S3, package the files in a single archive (.zip or .tar.gz) with the model files at the root of the archive. Don’t nest them inside an extra top-level directory.
Correct (files at root):
config.json
model.safetensors
tokenizer.json
...
Incorrect (files nested in a directory):
my-model/
  config.json
  model.safetensors
  tokenizer.json
  ...
To create the archive from within the model directory:
Shell
cd /path/to/your/model
tar -czvf ../model.tar.gz .
The presigned URL must point to the archive file in S3 and have an expiration of at least 100 minutes.
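The same flat layout can be produced programmatically. This is a sketch using Python's stdlib tarfile (the helper names are ours, not part of the Together tooling); it archives each entry at the root and checks that no top-level directory wraps the files:

```python
import tarfile
from pathlib import Path

def archive_model_dir(model_dir: str, out_path: str) -> None:
    """Create a .tar.gz with the model files at the archive root."""
    root = Path(model_dir)
    with tarfile.open(out_path, "w:gz") as tar:
        for entry in sorted(root.iterdir()):
            # arcname drops the parent directory so files sit at the root
            tar.add(entry, arcname=entry.name)

def archive_is_flat(archive_path: str) -> bool:
    """True if no member is nested under an extra top-level directory."""
    with tarfile.open(archive_path, "r:gz") as tar:
        return all("/" not in m.name.strip("/") for m in tar.getmembers())
```

After uploading the archive to your bucket, one way to generate a presigned URL with a long enough expiry is `aws s3 presign s3://<bucket>/model.tar.gz --expires-in 6000` (6000 seconds is 100 minutes).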

Upload the model

Upload from Hugging Face by passing the repo path as model_source. Include your Hugging Face token for private or gated repos.
together models upload \
  --model-name <MODEL_NAME> \
  --model-source <HUGGING_FACE_REPO> \
  --hf-token "$HUGGINGFACE_TOKEN"
Upload from S3 by passing the presigned archive URL as model_source:
together models upload \
  --model-name <MODEL_NAME> \
  --model-source <S3_PRESIGNED_URL>
The response includes a job_id. Use it to poll for upload status.

CLI options

• --model-name (required): The name to give the uploaded model.
• --model-source (required): A Hugging Face repo path or an S3 presigned URL.
• --hf-token (required for private or gated Hugging Face repos): Your Hugging Face token.
• --model-type (optional): model (default) or adapter.
• --description (optional): A description of the model.

Check upload status

Poll the upload job until its status field is Complete. The model is ready to deploy at that point.
curl -X GET "https://api.together.ai/v1/jobs/<JOB_ID>" \
     -H "Authorization: Bearer $TOGETHER_API_KEY"
You can also see uploaded models on the My models page in the dashboard.
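The curl call above can be wrapped in a small polling loop. A sketch using Python's stdlib urllib; the endpoint and the Complete status value are as documented above, while the helper names and polling interval are our own:

```python
import json
import os
import time
import urllib.request

def get_job(job_id: str) -> dict:
    """Fetch the upload job object from the jobs endpoint."""
    req = urllib.request.Request(
        f"https://api.together.ai/v1/jobs/{job_id}",
        headers={"Authorization": f"Bearer {os.environ['TOGETHER_API_KEY']}"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

def is_complete(job: dict) -> bool:
    """The model is ready to deploy once the job's status field is Complete."""
    return job.get("status") == "Complete"

def wait_for_upload(job_id: str, interval_s: float = 30.0) -> dict:
    """Poll until the job reports Complete, then return the job object."""
    while True:
        job = get_job(job_id)
        if is_complete(job):
            return job
        time.sleep(interval_s)
```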

Deploy the model

Uploaded models deploy as dedicated endpoints, the same way as any other model.
List hardware available for the uploaded model:
together endpoints hardware --model <MODEL_NAME>
Create the endpoint, using the hardware ID from the list:
together endpoints create \
  --display-name <ENDPOINT_NAME> \
  --model <MODEL_NAME> \
  --hardware <HARDWARE_ID>
See Manage dedicated endpoints for the full endpoint lifecycle, including autoscaling, listing, and deletion.
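Once the endpoint is running, a first inference call makes a quick smoke test. This sketch targets the chat completions route with the same host as the jobs call above; the route, payload shape, and helper names are assumptions (they fit a chat model — check the inference docs for your model type):

```python
import json
import os
import urllib.request

def build_payload(model_name: str, prompt: str) -> dict:
    """Assemble a minimal single-turn chat completion request body."""
    return {
        "model": model_name,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 64,
    }

def chat_once(model_name: str, prompt: str) -> dict:
    """Send one request to the deployed model and return the parsed response."""
    req = urllib.request.Request(
        "https://api.together.ai/v1/chat/completions",
        data=json.dumps(build_payload(model_name, prompt)).encode(),
        headers={
            "Authorization": f"Bearer {os.environ['TOGETHER_API_KEY']}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```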