You can upload custom or fine-tuned models from Hugging Face or S3 and run inference on a dedicated endpoint through Together AI. This quick guide shows how to do that through our UI or CLI.

Requirements

Currently, we support models that meet the following criteria:
  • Source: We support uploads from Hugging Face or S3.
  • Type: We support text generation and embedding models.
  • Scale: We currently only support models that fit on a single node; multi-node models are not supported for custom uploads.

Model upload via S3 presigned URL

When uploading a model to Together using an S3 presigned URL, the URL must point to a single archive file containing the model files.

Supported archive formats

The presigned URL must reference one of the following archive types:
  • .zip
  • .tar
  • .tar.gz

Required archive structure

The archive must contain the model files laid out in standard Hugging Face model repository format. The files should exist at the root of the archive, not nested inside an extra top-level directory. A valid archive will look like this when extracted:
config.json
generation_config.json
model-00001-of-00004.safetensors
model-00002-of-00004.safetensors
model-00003-of-00004.safetensors
model-00004-of-00004.safetensors
model.safetensors.index.json
special_tokens_map.json
tokenizer.json
tokenizer_config.json

Creating an archive from a Hugging Face model directory

If you already have a Hugging Face model directory on disk, make sure you archive the directory's contents rather than the directory itself. The key detail is to archive from inside the model directory (using . as the source path) so that the files sit at the root of the archive.
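For example, with tar (the directory and archive names here are placeholders):

# Create the archive from inside the model directory so the files land at its root
cd /path/to/my-model
tar -czf ../my-model.tar.gz .

# Sanity check: entries should be listed at the root, with no wrapping directory
tar -tzf ../my-model.tar.gz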

Presigned URL requirements

  • The presigned URL must point to the archive file in S3.
  • The presigned URL expiration time must be at least 100 minutes (see the example below).
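For example, the AWS CLI can generate a presigned URL that meets these requirements (bucket and key names are placeholders; --expires-in is given in seconds, so 7200 allows two hours):

aws s3 presign s3://my-bucket/my-model.tar.gz --expires-in 7200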

Important notes

  • Do not upload individual files. Only a single archive file is supported.
  • Do not include additional wrapping directories inside the archive.
  • The structure must be compatible with Hugging Face from_pretrained loading.
If the archive does not follow this format, model upload or loading may fail.

Getting Started

Upload the model

Model uploads can be done via the UI, API, or CLI. The API reference can be found here.

UI

To upload via the web, log in and navigate to Models > Add Custom Model to reach this page:
[Screenshot: Upload Model form]
Then fill in the source URL (S3 or Hugging Face), the model name, and the description you would like the model to have in your Together account once uploaded.

CLI

Upload a model from Hugging Face or S3:
together models upload \
  --model-name <your_model_name> \
  --model-source <path_to_model_or_repo> \
  --model-type <model_or_adapter> \
  --hf-token <your_HF_token_if_uploading_from_HF> \
  --description <description_of_your_model>
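As a concrete example, an upload from Hugging Face might look like this (the model name, source repo, and description are placeholders):

together models upload \
  --model-name my-llama-3-8b-finetune \
  --model-source my-org/llama-3-8b-finetune \
  --model-type model \
  --hf-token $HF_TOKEN \
  --description "Llama 3 8B fine-tuned on internal support data"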

Checking the status of your upload

Starting an upload returns a job ID. You can poll our API with that job ID until the model has finished uploading.
curl -X GET "https://api.together.ai/v1/jobs/{jobId}" \
     -H "Authorization: Bearer $TOGETHER_API_KEY" \
     -H "Content-Type: application/json"
The output contains a "status" field. When the "status" is "Complete", your model is ready to be deployed.
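To wait for completion in a script, a minimal polling loop might look like this (assumes jq is installed; the job ID comes from the upload step):

JOB_ID="<jobId>"  # returned when you start the upload
while true; do
  STATUS=$(curl -s "https://api.together.ai/v1/jobs/$JOB_ID" \
    -H "Authorization: Bearer $TOGETHER_API_KEY" | jq -r '.status')
  echo "status: $STATUS"
  [ "$STATUS" = "Complete" ] && break
  sleep 30
done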

Deploy the model

Uploaded models are treated like any other dedicated endpoint model. Deploying a custom model can be done via the UI, API, or CLI. The API reference can be found here.

UI

All of your models, including custom and fine-tuned models as well as any model that has a dedicated endpoint, will be listed under My Models. To deploy a custom model, select it to open the model page.
[Screenshot: My Models list]
The model page will display details of your uploaded model with an option to create a dedicated endpoint.
[Screenshot: model page with Create Dedicated Endpoint button]
When you select 'Create Dedicated Endpoint', you will see options to configure the deployment.
[Screenshot: endpoint configuration options]
Once an endpoint has been deployed, you can interact with it on the playground or via the API.
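For example, you can call the deployed endpoint with a standard OpenAI-compatible chat completions request (a sketch, assuming a chat model; substitute your own model name):

curl -X POST "https://api.together.xyz/v1/chat/completions" \
  -H "Authorization: Bearer $TOGETHER_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "<your_model_name>",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'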

CLI

After uploading your model, you can verify its registration and check available hardware options. List your uploaded models:
together models list
View available GPU SKUs for a specific model:
together endpoints hardware --model <model-name>
Once your model is uploaded, create a dedicated inference endpoint:
together endpoints create \
  --display-name <endpoint-name> \
  --model <model-name> \
  --gpu h100 \
  --no-speculative-decoding \
  --no-prompt-cache \
  --gpu-count 2
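In this example, --gpu h100 with --gpu-count 2 requests two H100 GPUs, and the --no-speculative-decoding and --no-prompt-cache flags opt the endpoint out of those optimizations; adjust these to match the hardware options reported by the hardware command above.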
After deploying, you can view all your endpoints and retrieve connection details such as URL, scaling configuration, and status. List all endpoints:
together endpoints list
Get details for a specific endpoint:
together endpoints get <endpoint-id>