You can upload custom or fine-tuned models from Hugging Face or S3 and run inference on a dedicated endpoint through Together AI. This quick guide shows you how to do that through our UI or CLI.

Requirements

Currently, we support models that meet the following criteria:
  • Source: We support uploads from Hugging Face or S3.
  • Type: We support text generation and embedding models.
  • Scale: We currently only support models that fit on a single node; multi-node models are not supported.

Getting Started

Upload the model

Model uploads can be done via the UI, the API, or the CLI. The API reference can be found here.
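
If you prefer the HTTP API, the request below is a minimal sketch that mirrors the CLI flags. The POST /v1/models route and the JSON field names here are assumptions, so confirm them against the API reference before use:

# Sketch only: route and field names are assumed to mirror the CLI flags.
curl -X POST https://api.together.xyz/v1/models \
  -H "Authorization: Bearer $TOGETHER_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model_name": "my-org/my-custom-model",
    "model_source": "my-hf-org/my-model-repo",
    "model_type": "model",
    "hf_token": "<your_HF_token_if_uploading_from_HF>",
    "description": "My custom fine-tuned model"
  }'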

UI

To upload via the web, log in and navigate to Models > Add Custom Model to reach this page:
[Screenshot: Upload model]
Then fill in the source URL (S3 or Hugging Face), the model name, and a description for the model in your Together account once uploaded.

CLI

Upload a model from Hugging Face or S3:
together models upload \
  --model-name <your_model_name> \
  --model-source <path_to_model_or_repo> \
  --model-type <model_or_adapter> \
  --hf-token <your_HF_token_if_uploading_from_HF> \
  --description <description_of_your_model>
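
For example, a hypothetical upload of a fine-tuned model from Hugging Face (the model name, repo, and description below are placeholders):

# Placeholder values for illustration only.
together models upload \
  --model-name my-org/llama-3-8b-finetune \
  --model-source my-hf-org/llama-3-8b-finetune \
  --model-type model \
  --hf-token $HF_TOKEN \
  --description "Llama 3 8B fine-tuned on support conversations"

Pass --model-type adapter instead if you are uploading an adapter rather than full model weights.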

Deploy the model

Uploaded models are treated like any other dedicated endpoint model. Deploying a custom model can be done via the UI, the API, or the CLI. The API reference can be found here.
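
As with uploads, the deployment API can be sketched from the CLI flags shown later in this guide. The POST /v1/endpoints route and field names below are assumptions to be checked against the API reference; the hardware ID placeholder should come from the hardware listing command in the CLI section:

# Sketch only: route and field names are assumed to mirror the CLI flags.
curl -X POST https://api.together.xyz/v1/endpoints \
  -H "Authorization: Bearer $TOGETHER_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "display_name": "my-custom-endpoint",
    "model": "my-org/my-custom-model",
    "hardware": "<hardware_id_from_together_endpoints_hardware>",
    "autoscaling": {"min_replicas": 1, "max_replicas": 1}
  }'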

UI

All custom and fine-tuned models, as well as any model that has a dedicated endpoint, are listed under My Models. To deploy a custom model, select the model to open its model page.
[Screenshot: My Models]
The model page will display details of your uploaded model, with an option to create a dedicated endpoint.
[Screenshot: Create Dedicated Endpoint]
When you select ‘Create Dedicated Endpoint’, you will see an option to configure the deployment.
[Screenshot: Create Dedicated Endpoint configuration]
Once an endpoint has been deployed, you can interact with it on the playground or via the API.
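
For example, a deployed text generation model can be queried through the OpenAI-compatible chat completions API (the model name below is a placeholder for your deployed model; embedding models use the /v1/embeddings route instead):

curl -X POST https://api.together.xyz/v1/chat/completions \
  -H "Authorization: Bearer $TOGETHER_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "my-org/my-custom-model",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'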

CLI

After uploading your model, you can verify its registration and check available hardware options. List your uploaded models:
together models list
View available GPU SKUs for a specific model:
together endpoints hardware --model <model-name>
Once your model is uploaded, create a dedicated inference endpoint:
together endpoints create \
  --display-name <endpoint-name> \
  --model <model-name> \
  --gpu h100 \
  --no-speculative-decoding \
  --no-prompt-cache \
  --gpu-count 2
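
For example, a hypothetical deployment of the model uploaded earlier on two H100s, reusing the flags above (names are placeholders):

together endpoints create \
  --display-name my-custom-endpoint \
  --model my-org/llama-3-8b-finetune \
  --gpu h100 \
  --gpu-count 2 \
  --no-speculative-decoding \
  --no-prompt-cache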
After deploying, you can view all your endpoints and retrieve connection details such as URL, scaling configuration, and status. List all endpoints:
together endpoints list
Get details for a specific endpoint:
together endpoints get <endpoint-id>
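
The same details are also available over HTTP; the GET /v1/endpoints route below is an assumption based on the CLI commands, so verify it against the API reference:

# Sketch only: route assumed to mirror `together endpoints list`.
curl https://api.together.xyz/v1/endpoints \
  -H "Authorization: Bearer $TOGETHER_API_KEY"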