Retrieve details of a specific deployment by its ID or name.

curl -X GET https://api.together.ai/v1/deployments/my-deployment \
  -H "Authorization: Bearer $TOGETHER_API_KEY"

{
  "id": "dep_abc123",
  "name": "my-video-model",
  "description": "Video generation model with Wan 2.1",
  "image": "registry.together.xyz/proj_abc123/video-model@sha256:abc123def456",
  "status": "Ready",
  "gpu_type": "h100-80gb",
  "gpu_count": 2,
  "cpu": 8,
  "memory": 64,
  "storage": 200,
  "min_replicas": 1,
  "max_replicas": 20,
  "desired_replicas": 3,
  "ready_replicas": 3,
  "port": 8000,
  "health_check_path": "/health",
  "autoscaling": {
    "metric": "QueueBacklogPerWorker",
    "target": "1.05"
  },
  "environment_variables": [
    {
      "name": "MODEL_PATH",
      "value": "/models/weights"
    }
  ],
  "created_at": "2026-02-07T10:00:00Z",
  "updated_at": "2026-02-07T10:00:00Z"
}
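The same request can be sketched in Python using only the standard library. This is a minimal illustration, not an official client: the helper names `build_request` and `get_deployment` are hypothetical, and the base URL and `Authorization` header are taken from the curl example above.

```python
import json
import urllib.parse
import urllib.request

# Base URL as it appears in the curl example.
API_BASE = "https://api.together.ai/v1"


def build_request(deployment: str, api_key: str) -> urllib.request.Request:
    """Build the GET request for one deployment, addressed by ID or name."""
    # Percent-encode the identifier so unusual characters cannot alter the path.
    path = urllib.parse.quote(deployment, safe="")
    return urllib.request.Request(
        f"{API_BASE}/deployments/{path}",
        headers={"Authorization": f"Bearer {api_key}"},
        method="GET",
    )


def get_deployment(deployment: str, api_key: str, timeout: float = 30.0) -> dict:
    """Send the request and decode the JSON body.

    urllib raises urllib.error.HTTPError for 4xx/5xx responses.
    """
    request = build_request(deployment, api_key)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


# Example call (needs network access and a valid key):
#   import os
#   details = get_deployment("my-deployment", os.environ["TOGETHER_API_KEY"])
#   print(details["status"], details["ready_replicas"])
```

Keeping request construction separate from the network call makes it possible to inspect the URL and headers without sending anything.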
Bearer authentication header of the form Bearer <token>, where <token> is your auth token.
Path parameter: deployment ID or name
Response: deployment details
Args are the arguments passed to the container's command
Autoscaling contains autoscaling configuration parameters for this deployment
Command is the entrypoint command run in the container
CPU is the amount of CPU allocated to each replica, in cores (fractional values are allowed)
CreatedAt is the ISO 8601 timestamp when this deployment was created
Description provides a human-readable explanation of the deployment's purpose or content
DesiredReplicas is the number of replicas that the orchestrator is targeting
EnvironmentVariables is a list of environment variables set in the container
GPUCount is the number of GPUs allocated to each replica in this deployment
GPUType specifies the type of GPU requested (if any) for this deployment
Available options: h100-80gb, a100-80gb
HealthCheckPath is the HTTP path used for health checks of the application
ID is the unique identifier of the deployment
Image specifies the container image used for this deployment
MaxReplicas is the maximum number of replicas to run for this deployment
Memory is the amount of memory allocated to each replica, in GiB (fractional values are allowed)
MinReplicas is the minimum number of replicas to run for this deployment
Name is the name of the deployment
Object is the type identifier for this response (always "deployment")
Port is the container port that the deployment exposes
ReadyReplicas is the current number of replicas that are in the Ready state
ReplicaEvents is a mapping of replica names or IDs to their status events
Status represents the overall status of the deployment
Available options: Updating, Scaling, Ready, Failed
Storage is the amount of storage (in MB or units as defined by the platform) allocated to each replica
UpdatedAt is the ISO 8601 timestamp when this deployment was last updated
Volumes is a list of volume mounts for this deployment
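The fields above can be combined to answer common questions about a deployment, for example whether it has finished scaling. The sketch below assumes the snake_case keys shown in the example response (`status`, `ready_replicas`, `desired_replicas`, `environment_variables`, `created_at`). The helper names are hypothetical, and treating "Ready with all desired replicas ready" as fully rolled out is an assumption, not documented platform behavior.

```python
from datetime import datetime


def is_fully_ready(deployment: dict) -> bool:
    """True when status is Ready and every desired replica reports ready."""
    return (
        deployment.get("status") == "Ready"
        and deployment.get("ready_replicas", 0) >= deployment.get("desired_replicas", 0)
    )


def env_as_dict(deployment: dict) -> dict:
    """Flatten the environment_variables list of name/value objects into a mapping."""
    return {
        item["name"]: item.get("value")
        for item in deployment.get("environment_variables", [])
    }


def parse_timestamp(value: str) -> datetime:
    """Parse a timestamp such as 2026-02-07T10:00:00Z into a timezone-aware datetime."""
    # Older versions of datetime.fromisoformat reject a trailing "Z", so normalise it.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


# Subset of the example response, used here as sample input.
sample = {
    "status": "Ready",
    "desired_replicas": 3,
    "ready_replicas": 3,
    "environment_variables": [{"name": "MODEL_PATH", "value": "/models/weights"}],
    "created_at": "2026-02-07T10:00:00Z",
}
```

With the sample above, `is_fully_ready(sample)` is true; it becomes false if `ready_replicas` drops below `desired_replicas` or `status` is anything other than `Ready`.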