Update an existing deployment configuration

curl -X PATCH \
  -H "Authorization: Bearer $TOGETHER_API_KEY" \
  --data '{ "gpu_count": 2 }' \
  https://api.together.ai/v1/deployments/my-deployment

{
  "id": "dep_abc123",
  "name": "my-video-model",
  "description": "Video generation model with Wan 2.1",
  "image": "registry.together.xyz/proj_abc123/video-model@sha256:abc123def456",
  "status": "Ready",
  "gpu_type": "h100-80gb",
  "gpu_count": 2,
  "cpu": 8,
  "memory": 64,
  "storage": 200,
  "min_replicas": 1,
  "max_replicas": 20,
  "desired_replicas": 3,
  "ready_replicas": 3,
  "port": 8000,
  "health_check_path": "/health",
  "autoscaling": {
    "metric": "QueueBacklogPerWorker",
    "target": "1.05"
  },
  "environment_variables": [
    {
      "name": "MODEL_PATH",
      "value": "/models/weights"
    }
  ],
  "created_at": "2026-02-07T10:00:00Z",
  "updated_at": "2026-02-07T10:00:00Z"
}
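As a rough illustration of how the auth header, deployment identifier, and JSON body fit together, the sketch below builds the same PATCH request with Python's standard library. The helper name `build_update_request` is hypothetical, and sending a `Content-Type: application/json` header is an assumption that the curl example does not state. The request is only constructed here, not sent.

```python
import json
import os
import urllib.request

API_BASE = "https://api.together.ai/v1/deployments"  # base URL taken from the curl example


def build_update_request(deployment: str, changes: dict, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a PATCH request for one deployment.

    `deployment` is the deployment ID or name; `changes` holds the fields
    to update, e.g. {"gpu_count": 2} as in the curl example.
    """
    return urllib.request.Request(
        url=f"{API_BASE}/{deployment}",
        data=json.dumps(changes).encode("utf-8"),
        method="PATCH",
        headers={
            "Authorization": f"Bearer {api_key}",
            # Assumption: the API expects a JSON content type for the body.
            "Content-Type": "application/json",
        },
    )


req = build_update_request(
    "my-deployment",
    {"gpu_count": 2},
    os.environ.get("TOGETHER_API_KEY", "<token>"),
)
print(req.get_method(), req.full_url)
```

Passing the result to `urllib.request.urlopen` would send it; the response body could then be decoded with `json.load`.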
Bearer authentication header of the form Bearer <token>, where <token> is your auth token.
Path parameter: deployment ID or name
Request body: updated deployment configuration, with the following fields
Args overrides the container's CMD. Provide as an array of arguments (e.g., ["python", "app.py"])
Autoscaling configuration as key-value pairs. Example: {"metric": "QueueBacklogPerWorker", "target": "10"} to scale based on queue backlog
Command overrides the container's ENTRYPOINT. Provide as an array (e.g., ["/bin/sh", "-c"])
CPU is the number of CPU cores to allocate per container instance (e.g., 0.1 = 100 millicores). Required range: x >= 0.1
Description is an optional human-readable description of your deployment
EnvironmentVariables is a list of environment variables to set in the container. This will replace all existing environment variables
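Because the submitted list replaces every existing variable, changing one value means sending the full list back. The sketch below shows one way to do that merge. It assumes the current list was read from an earlier deployment response, that each entry carries only the `name` and `value` keys shown in the example, and that the request body uses the same `environment_variables` key as the response. `merge_env` and `LOG_LEVEL` are hypothetical names used for illustration.

```python
def merge_env(current: list[dict], updates: dict[str, str]) -> list[dict]:
    """Return the full variable list to send: existing entries with `updates` applied."""
    merged = {item["name"]: item["value"] for item in current}
    merged.update(updates)  # overwrite changed values, append new names
    return [{"name": name, "value": value} for name, value in merged.items()]


# Current variables, copied from the example response.
current = [{"name": "MODEL_PATH", "value": "/models/weights"}]

# Hypothetical change: add LOG_LEVEL while keeping MODEL_PATH.
payload = {"environment_variables": merge_env(current, {"LOG_LEVEL": "debug"})}
print(payload)
```

Entries that carry attributes other than `name` and `value` would be flattened by this helper, so it would need adapting for them.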
GPUCount is the number of GPUs to allocate per container instance
GPUType specifies the GPU hardware to use (e.g., "h100-80gb"). Available options: h100-80gb, a100-80gb
HealthCheckPath is the HTTP path for health checks (e.g., "/health"). Set to empty string to disable health checks
Image is the container image to deploy from registry.together.ai.
MaxReplicas is the maximum number of replicas that can be scaled up to.
Memory is the amount of RAM to allocate per container instance in GiB (e.g., 0.5 = 512 MiB). Required range: x >= 0.1
MinReplicas is the minimum number of replicas to run
Name is the new unique identifier for your deployment. Must contain only alphanumeric characters, underscores, or hyphens (1-100 characters)
Port is the container port your application listens on (e.g., 8080 for web servers)
Storage is the amount of ephemeral disk storage to allocate per container instance in GiB (e.g., 10 = 10 GiB)
TerminationGracePeriodSeconds is the time in seconds to wait for graceful shutdown before forcefully terminating the replica
Volumes is a list of volume mounts to attach to the container. This will replace all existing volumes
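The limits stated for the request fields can be checked client-side before an update is sent. The sketch below is a hypothetical helper, not an official validator. It assumes the request body uses the same snake_case keys as the example response (`cpu`, `memory`, `name`, `gpu_type`), treats "alphanumeric" as ASCII letters and digits, and covers only the limits listed on this page.

```python
import re

NAME_PATTERN = re.compile(r"[A-Za-z0-9_-]{1,100}")  # alphanumerics, underscores, hyphens
GPU_TYPES = {"h100-80gb", "a100-80gb"}  # options listed for GPUType


def validate_update(changes: dict) -> list[str]:
    """Return a list of human-readable problems; an empty list means none were found."""
    problems = []
    for field in ("cpu", "memory"):
        if field in changes and changes[field] < 0.1:
            problems.append(f"{field} must be >= 0.1")
    if "name" in changes and not NAME_PATTERN.fullmatch(changes["name"]):
        problems.append("name must be 1-100 alphanumeric, underscore, or hyphen characters")
    if "gpu_type" in changes and changes["gpu_type"] not in GPU_TYPES:
        problems.append("gpu_type must be one of " + ", ".join(sorted(GPU_TYPES)))
    return problems


print(validate_update({"cpu": 0.05, "name": "my video model"}))  # two problems reported
print(validate_update({"gpu_count": 2}))  # []
```

Returning a list rather than raising on the first failure lets a caller report every problem in one pass.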
Response: deployment updated successfully, with the following fields
Args are the arguments passed to the container's command
Autoscaling contains autoscaling configuration parameters for this deployment
Command is the entrypoint command run in the container
CPU is the amount of CPU resource allocated to each replica in cores (fractional value is allowed)
CreatedAt is the ISO 8601 timestamp when this deployment was created
Description provides a human-readable explanation of the deployment's purpose or content
DesiredReplicas is the number of replicas that the orchestrator is targeting
EnvironmentVariables is a list of environment variables set in the container
GPUCount is the number of GPUs allocated to each replica in this deployment
GPUType specifies the type of GPU requested (if any) for this deployment. Available options: h100-80gb, a100-80gb
HealthCheckPath is the HTTP path used for health checks of the application
ID is the unique identifier of the deployment
Image specifies the container image used for this deployment
MaxReplicas is the maximum number of replicas to run for this deployment
Memory is the amount of memory allocated to each replica in GiB (fractional value is allowed)
MinReplicas is the minimum number of replicas to run for this deployment
Name is the name of the deployment
Object is the type identifier for this response (always "deployment")
Port is the container port that the deployment exposes
ReadyReplicas is the current number of replicas that are in the Ready state
ReplicaEvents is a mapping of replica names or IDs to their status events
Status represents the overall status of the deployment. Available options: Updating, Scaling, Ready, Failed
Storage is the amount of storage (in MB or units as defined by the platform) allocated to each replica
UpdatedAt is the ISO 8601 timestamp when this deployment was last updated
Volumes is a list of volume mounts for this deployment
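One possible use of these response fields is to check whether a deployment has settled after an update. The sketch below assumes a deployment counts as settled when `status` is `Ready` and `ready_replicas` equals `desired_replicas`. That rule is an interpretation of the field descriptions, not documented behavior, and `is_settled` is a hypothetical helper.

```python
import json


def is_settled(deployment: dict) -> bool:
    """True when the deployment reports Ready and all targeted replicas are ready."""
    return (
        deployment.get("status") == "Ready"
        and deployment.get("ready_replicas") == deployment.get("desired_replicas")
    )


# Trimmed copy of the example response shown earlier on this page.
response_body = '{"status": "Ready", "desired_replicas": 3, "ready_replicas": 3}'
print(is_settled(json.loads(response_body)))  # True
```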