Create a new deployment with specified configuration.

Request:

curl -X POST https://api.together.ai/v1/deployments \
  -H "Authorization: Bearer $TOGETHER_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "my-deployment",
    "gpu_type": "h100-80gb",
    "image": "registry.together.xyz/proj_abcdefg1234567890/my-image:latest"
  }'

Response:

{
  "id": "dep_abc123",
  "name": "my-video-model",
  "description": "Video generation model with Wan 2.1",
  "image": "registry.together.xyz/proj_abc123/video-model@sha256:abc123def456",
  "status": "Ready",
  "gpu_type": "h100-80gb",
  "gpu_count": 2,
  "cpu": 8,
  "memory": 64,
  "storage": 200,
  "min_replicas": 1,
  "max_replicas": 20,
  "desired_replicas": 3,
  "ready_replicas": 3,
  "port": 8000,
  "health_check_path": "/health",
  "autoscaling": {
    "metric": "QueueBacklogPerWorker",
    "target": "1.05"
  },
  "environment_variables": [
    {
      "name": "MODEL_PATH",
      "value": "/models/weights"
    }
  ],
  "created_at": "2026-02-07T10:00:00Z",
  "updated_at": "2026-02-07T10:00:00Z"
}
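The same request can be assembled and sent from Python with only the standard library. The endpoint, headers, and field names come from this reference; the optional fields included beyond the minimal curl example (port, replicas, autoscaling, environment variables) are documented in the parameter list below, and this is a sketch rather than an official client:

```python
import json
import os
import urllib.request

# Create-deployment payload. "name", "gpu_type", and "image" match the
# minimal curl example above; the remaining fields are optional
# parameters from this page's reference.
payload = {
    "name": "my-deployment",
    "gpu_type": "h100-80gb",
    "image": "registry.together.xyz/proj_abcdefg1234567890/my-image:latest",
    "port": 8000,
    "health_check_path": "/health",
    "min_replicas": 1,
    "max_replicas": 20,
    "autoscaling": {"metric": "QueueBacklogPerWorker", "target": "10"},
    "environment_variables": [{"name": "MODEL_PATH", "value": "/models/weights"}],
}

req = urllib.request.Request(
    "https://api.together.ai/v1/deployments",
    data=json.dumps(payload).encode("utf-8"),
    headers={
        "Authorization": f"Bearer {os.environ.get('TOGETHER_API_KEY', '')}",
        "Content-Type": "application/json",
    },
    method="POST",
)

# Uncomment to actually send the request:
# with urllib.request.urlopen(req) as resp:
#     deployment = json.load(resp)
#     print(deployment["id"], deployment["status"])
```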
Bearer authentication header of the form Bearer <token>, where <token> is your auth token.
Deployment configuration
GPUType specifies the GPU hardware to use (e.g., "h100-80gb"). Allowed values: h100-80gb, a100-80gb.
Image is the container image to deploy from registry.together.ai.
Name is the unique identifier for your deployment. Must contain only alphanumeric characters, underscores, or hyphens (1-100 characters).
Args overrides the container's CMD. Provide as an array of arguments (e.g., ["python", "app.py"])
Autoscaling configuration as key-value pairs. Example: {"metric": "QueueBacklogPerWorker", "target": "10"} to scale based on queue backlog
Command overrides the container's ENTRYPOINT. Provide as an array (e.g., ["/bin/sh", "-c"])
CPU is the number of CPU cores to allocate per container instance (e.g., 0.1 = 100 millicores). Required range: x >= 0.1.
Description is an optional human-readable description of your deployment
EnvironmentVariables is a list of environment variables to set in the container. Each must have a name and either a value or value_from_secret
GPUCount is the number of GPUs to allocate per container instance. Defaults to 0 if not specified
HealthCheckPath is the HTTP path for health checks (e.g., "/health"). If set, the platform will check this endpoint to determine container health
MaxReplicas is the maximum number of container instances that can be scaled up to. If not set, will be set to MinReplicas
Memory is the amount of RAM to allocate per container instance in GiB (e.g., 0.5 = 512MiB). Required range: x >= 0.1.
MinReplicas is the minimum number of container instances to run. Defaults to 1 if not specified
Port is the container port your application listens on (e.g., 8080 for web servers). Required if your application serves traffic
Storage is the amount of ephemeral disk storage to allocate per container instance (e.g., 10 = 10GiB)
TerminationGracePeriodSeconds is the time in seconds to wait for graceful shutdown before forcefully terminating the replica
Volumes is a list of volume mounts to attach to the container. Each mount must reference an existing volume by name
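The request-side constraints above (the name pattern and length, the CPU and memory minimums, and the defaulting rules for MinReplicas, MaxReplicas, and GPUCount) can be sketched as a small client-side validator. The function name and error messages are illustrative, not part of the API:

```python
import re

def validate_deployment_config(cfg: dict) -> dict:
    """Sketch of the documented request constraints; returns a copy of
    the config with the documented defaults filled in."""
    # Name: alphanumerics, underscores, or hyphens, 1-100 characters.
    if not re.fullmatch(r"[A-Za-z0-9_-]{1,100}", cfg.get("name", "")):
        raise ValueError("name must be 1-100 alphanumeric/underscore/hyphen characters")
    # CPU and Memory must each be at least 0.1 when specified.
    for field in ("cpu", "memory"):
        if field in cfg and cfg[field] < 0.1:
            raise ValueError(f"{field} must be >= 0.1")
    out = dict(cfg)
    # MinReplicas defaults to 1; MaxReplicas defaults to MinReplicas;
    # GPUCount defaults to 0 if not specified.
    out.setdefault("min_replicas", 1)
    out.setdefault("max_replicas", out["min_replicas"])
    out.setdefault("gpu_count", 0)
    return out
```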
Deployment created successfully
Args are the arguments passed to the container's command
Autoscaling contains autoscaling configuration parameters for this deployment
Command is the entrypoint command run in the container
CPU is the amount of CPU resource allocated to each replica in cores (fractional value is allowed)
CreatedAt is the ISO8601 timestamp when this deployment was created
Description provides a human-readable explanation of the deployment's purpose or content
DesiredReplicas is the number of replicas that the orchestrator is targeting
EnvironmentVariables is a list of environment variables set in the container
GPUCount is the number of GPUs allocated to each replica in this deployment
GPUType specifies the type of GPU requested (if any) for this deployment. Allowed values: h100-80gb, a100-80gb.
HealthCheckPath is the HTTP path used for health checks of the application
ID is the unique identifier of the deployment
Image specifies the container image used for this deployment
MaxReplicas is the maximum number of replicas to run for this deployment
Memory is the amount of memory allocated to each replica in GiB (fractional value is allowed)
MinReplicas is the minimum number of replicas to run for this deployment
Name is the name of the deployment
Object is the type identifier for this response (always "deployment")
Port is the container port that the deployment exposes
ReadyReplicas is the current number of replicas that are in the Ready state
ReplicaEvents is a mapping of replica names or IDs to their status events
Status represents the overall status of the deployment. Allowed values: Updating, Scaling, Ready, Failed.
Storage is the amount of storage (in MB or units as defined by the platform) allocated to each replica
UpdatedAt is the ISO8601 timestamp when this deployment was last updated
Volumes is a list of volume mounts for this deployment
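Since Status moves through Updating and Scaling before reaching Ready (with ReadyReplicas catching up to DesiredReplicas), a client typically polls the deployment until it settles. This sketch assumes a caller-supplied `fetch` function that returns the deployment object in the JSON shape documented above; the helper name is illustrative:

```python
import time

def wait_until_ready(fetch, poll_seconds=5.0, timeout=600.0):
    """Poll a deployment until it reaches a terminal status.

    `fetch` is a caller-supplied zero-argument function returning the
    deployment object (e.g., the result of a GET on the deployment).
    """
    deadline = time.monotonic() + timeout
    while True:
        dep = fetch()
        status = dep["status"]
        if status == "Ready":
            return dep
        if status == "Failed":
            raise RuntimeError(f"deployment {dep['id']} failed")
        # Still Updating or Scaling: report progress and keep waiting.
        print(f"{dep['name']}: {status}, "
              f"{dep.get('ready_replicas', 0)}/{dep.get('desired_replicas', 0)} replicas ready")
        if time.monotonic() > deadline:
            raise TimeoutError(f"deployment {dep['id']} not Ready after {timeout}s")
        time.sleep(poll_seconds)
```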