POST /deployments
cURL
curl -X POST https://api.together.ai/v1/deployments \
  -H "Authorization: Bearer $TOGETHER_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "my-deployment",
    "gpu_type": "h100-80gb",
    "image": "registry.together.xyz/proj_abcdefg1234567890/my-image:latest"
  }'
{
  "id": "dep_abc123",
  "name": "my-video-model",
  "description": "Video generation model with Wan 2.1",
  "image": "registry.together.xyz/proj_abc123/video-model@sha256:abc123def456",
  "status": "Ready",
  "gpu_type": "h100-80gb",
  "gpu_count": 2,
  "cpu": 8,
  "memory": 64,
  "storage": 200,
  "min_replicas": 1,
  "max_replicas": 20,
  "desired_replicas": 3,
  "ready_replicas": 3,
  "port": 8000,
  "health_check_path": "/health",
  "autoscaling": {
    "metric": "QueueBacklogPerWorker",
    "target": "1.05"
  },
  "environment_variables": [
    {
      "name": "MODEL_PATH",
      "value": "/models/weights"
    }
  ],
  "created_at": "2026-02-07T10:00:00Z",
  "updated_at": "2026-02-07T10:00:00Z"
}

Authorizations

Authorization
string
header
default:default
required

Bearer authentication header of the form Bearer <token>, where <token> is your auth token.
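
As a minimal sketch of the header format described above, the snippet below builds the same two headers the cURL example sends. The helper name `build_headers` and the fallback token are hypothetical; only the `Authorization: Bearer <token>` form and the `TOGETHER_API_KEY` variable come from this page.

```python
import os


def build_headers(token):
    """Return Bearer-auth and JSON content-type headers (hypothetical helper)."""
    if not token:
        raise ValueError("an API token is required")
    return {
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json",
    }


# Read the token from the environment, as the cURL example does.
# "example-token" is a placeholder so the sketch runs without a real key.
headers = build_headers(os.environ.get("TOGETHER_API_KEY", "example-token"))
```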

Body

application/json

Deployment configuration

gpu_type
enum<string>
required

GPUType specifies the GPU hardware to use (e.g., "h100-80gb").

Available options: h100-80gb, a100-80gb

image
string
required

Image is the container image to deploy from registry.together.ai.

name
string
required

Name is the unique identifier for your deployment. Must contain only alphanumeric characters, underscores, or hyphens (1-100 characters)

Required string length: 1 - 100
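
The naming rule above can be checked client-side before sending a request. This is a sketch only: the regular expression assumes "alphanumeric" means ASCII letters and digits, which this page does not state, and `is_valid_deployment_name` is a hypothetical helper.

```python
import re

# Assumption: 1-100 ASCII letters, digits, underscores, or hyphens.
NAME_PATTERN = re.compile(r"[A-Za-z0-9_-]{1,100}")


def is_valid_deployment_name(name):
    """Return True if the whole string satisfies the documented naming rule."""
    return NAME_PATTERN.fullmatch(name) is not None
```

The server remains the authority on what it accepts; a local check like this only catches obvious mistakes early.
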
args
string[]

Args overrides the container's CMD. Provide as an array of arguments (e.g., ["python", "app.py"])

autoscaling
object

Autoscaling configuration as key-value pairs. Example: {"metric": "QueueBacklogPerWorker", "target": "10"} to scale based on queue backlog

command
string[]

Command overrides the container's ENTRYPOINT. Provide as an array (e.g., ["/bin/sh", "-c"])

cpu
number

CPU is the number of CPU cores to allocate per container instance (e.g., 0.1 = 100 millicores)

Required range: x >= 0.1
description
string

Description is an optional human-readable description of your deployment

environment_variables
object[]

EnvironmentVariables is a list of environment variables to set in the container. Each must have a name and either a value or value_from_secret
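
The rule for each entry can be sketched as a small validator. Two assumptions are made here that this page does not confirm: "either a value or value_from_secret" is read as exactly one of the two, and `value_from_secret` is treated as an opaque value whose shape is not checked. The helper name is hypothetical.

```python
def validate_env_var(entry):
    """Raise ValueError if an environment_variables entry breaks the documented rule."""
    if not entry.get("name"):
        raise ValueError("environment variable needs a non-empty 'name'")
    has_value = "value" in entry
    has_secret = "value_from_secret" in entry
    # Assumption: exactly one of the two keys must be present.
    if has_value == has_secret:
        raise ValueError("set exactly one of 'value' or 'value_from_secret'")


# The entry from the sample response on this page passes the check.
validate_env_var({"name": "MODEL_PATH", "value": "/models/weights"})
```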

gpu_count
integer

GPUCount is the number of GPUs to allocate per container instance. Defaults to 0 if not specified

health_check_path
string

HealthCheckPath is the HTTP path for health checks (e.g., "/health"). If set, the platform will check this endpoint to determine container health

max_replicas
integer

MaxReplicas is the maximum number of container instances the deployment can scale up to. If not set, it defaults to MinReplicas

memory
number

Memory is the amount of RAM to allocate per container instance in GiB (e.g., 0.5 = 512MiB)

Required range: x >= 0.1
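
The unit examples given for cpu (0.1 = 100 millicores) and memory (0.5 = 512 MiB) correspond to the conversions sketched below. The helper names are hypothetical, and the lower bound check simply mirrors the documented range x >= 0.1.

```python
def cpu_to_millicores(cores):
    """Convert fractional CPU cores to millicores (1 core = 1000 millicores)."""
    if cores < 0.1:
        raise ValueError("cpu must be >= 0.1")
    return round(cores * 1000)


def gib_to_mib(gib):
    """Convert GiB to MiB (1 GiB = 1024 MiB)."""
    if gib < 0.1:
        raise ValueError("memory must be >= 0.1")
    return round(gib * 1024)
```
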
min_replicas
integer

MinReplicas is the minimum number of container instances to run. Defaults to 1 if not specified
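
The two documented defaults (min_replicas falls back to 1, max_replicas falls back to min_replicas) can be sketched as follows. Rejecting a maximum below the minimum is an assumption added for illustration; this page does not say how the API handles that case. `resolve_replica_bounds` is a hypothetical helper.

```python
def resolve_replica_bounds(min_replicas=None, max_replicas=None):
    """Apply the documented defaults and return (min_replicas, max_replicas)."""
    low = 1 if min_replicas is None else min_replicas
    high = low if max_replicas is None else max_replicas
    # Assumption: a maximum below the minimum is treated as an error.
    if high < low:
        raise ValueError("max_replicas must be >= min_replicas")
    return low, high
```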

port
integer

Port is the container port your application listens on (e.g., 8080 for web servers). Required if your application serves traffic

storage
integer

Storage is the amount of ephemeral disk storage, in GiB, to allocate per container instance (e.g., 10 = 10 GiB)

termination_grace_period_seconds
integer

TerminationGracePeriodSeconds is the time in seconds to wait for graceful shutdown before forcefully terminating the replica

volumes
object[]

Volumes is a list of volume mounts to attach to the container. Each mount must reference an existing volume by name
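
The body fields above can be assembled into a request in Python as well as with cURL. The sketch below uses only the standard library and builds the request without sending it. The endpoint URL and the sample values are copied from the cURL example; the function name `build_create_request` and the token are hypothetical.

```python
import json
import urllib.request

API_URL = "https://api.together.ai/v1/deployments"  # from the cURL example


def build_create_request(token, name, gpu_type, image, **optional):
    """Build (but do not send) a POST request with the three required fields."""
    body = {"name": name, "gpu_type": gpu_type, "image": image}
    # Include optional fields only when the caller supplied a value.
    body.update({k: v for k, v in optional.items() if v is not None})
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_create_request(
    "example-token",  # placeholder, not a real key
    "my-deployment",
    "h100-80gb",
    "registry.together.xyz/proj_abcdefg1234567890/my-image:latest",
    min_replicas=1,
    max_replicas=20,
    port=8000,
)
```

Sending is left to the caller, for example with `urllib.request.urlopen(req)`, so the sketch can be inspected without network access.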

Response

Deployment created successfully

args
string[]

Args are the arguments passed to the container's command

autoscaling
object

Autoscaling contains autoscaling configuration parameters for this deployment

command
string[]

Command is the entrypoint command run in the container

cpu
number

CPU is the amount of CPU resource allocated to each replica in cores (fractional value is allowed)

created_at
string

CreatedAt is the ISO 8601 timestamp when this deployment was created

description
string

Description provides a human-readable explanation of the deployment's purpose or content

desired_replicas
integer

DesiredReplicas is the number of replicas that the orchestrator is targeting

environment_variables
object[]

EnvironmentVariables is a list of environment variables set in the container

gpu_count
integer

GPUCount is the number of GPUs allocated to each replica in this deployment

gpu_type
enum<string>

GPUType specifies the type of GPU requested (if any) for this deployment

Available options: h100-80gb, a100-80gb

health_check_path
string

HealthCheckPath is the HTTP path used for health checks of the application

id
string

ID is the unique identifier of the deployment

image
string

Image specifies the container image used for this deployment

max_replicas
integer

MaxReplicas is the maximum number of replicas to run for this deployment

memory
number

Memory is the amount of memory allocated to each replica in GiB (fractional value is allowed)

min_replicas
integer

MinReplicas is the minimum number of replicas to run for this deployment

name
string

Name is the name of the deployment

object
string

Object is the type identifier for this response (always "deployment")

port
integer

Port is the container port that the deployment exposes

ready_replicas
integer

ReadyReplicas is the current number of replicas that are in the Ready state

replica_events
object

ReplicaEvents is a mapping of replica names or IDs to their status events

status
enum<string>

Status represents the overall status of the deployment (e.g., Updating, Scaling, Ready, Failed)

Available options: Updating, Scaling, Ready, Failed

storage
integer

Storage is the amount of storage (in MB or units as defined by the platform) allocated to each replica

updated_at
string

UpdatedAt is the ISO 8601 timestamp when this deployment was last updated

volumes
object[]

Volumes is a list of volume mounts for this deployment
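
One way a client might interpret the status, desired_replicas, and ready_replicas fields of the response is sketched below. Treating a deployment as fully ready only when status is "Ready" and every desired replica is ready is an assumption about how to combine these fields, not documented behavior; `is_fully_ready` is a hypothetical helper.

```python
def is_fully_ready(deployment):
    """Return True if status is Ready and all desired replicas report ready."""
    return (
        deployment.get("status") == "Ready"
        and deployment.get("ready_replicas", 0) >= deployment.get("desired_replicas", 0)
    )


# Values taken from the sample response on this page.
sample = {"status": "Ready", "desired_replicas": 3, "ready_replicas": 3}
ready = is_fully_ready(sample)
```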