Overview
All cluster management operations are available through multiple interfaces for programmatic control and automation:
- tcloud CLI – Command-line tool for cluster operations
- REST API – Full HTTP API for custom integrations
- Terraform Provider – Infrastructure-as-code for reproducible deployments
- SkyPilot – Orchestrate AI workloads across clusters
tcloud CLI
The tcloud CLI provides a command-line interface for managing clusters, storage, and scaling.
Installation
Download the CLI for your platform:
Authentication
Authenticate via Google SSO:
tcloud sso login
Common Commands
Create a cluster:
tcloud cluster create my-cluster \
--num-gpus 8 \
--reservation-duration 1 \
--instance-type H100-SXM \
--region us-central-8 \
--shared-volume-name my-volume \
--size-tib 1
Specify billing type (reserved vs on-demand):
# Reserved capacity
tcloud cluster create my-cluster \
--num-gpus 8 \
--billing-type prepaid \
--reservation-duration 30 \
--instance-type H100-SXM \
--region us-central-8 \
--shared-volume-name my-volume \
--size-tib 1
# On-demand capacity
tcloud cluster create my-cluster \
--num-gpus 8 \
--billing-type on_demand \
--instance-type H100-SXM \
--region us-central-8 \
--shared-volume-name my-volume \
--size-tib 1
Delete a cluster:
tcloud cluster delete <CLUSTER_UUID>
List clusters:
Scale a cluster:
tcloud cluster scale <CLUSTER_UUID> --num-gpus 16
REST API
All cluster management actions are available via REST API endpoints.
API Reference
Complete API documentation is available at:
GPU Cluster API Reference →
Example: Create Cluster
curl -X POST "https://manager.cloud.together.ai/api/v1/gpu_cluster" \
  -H "Authorization: Bearer $TOGETHER_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "my-cluster",
    "num_gpus": 8,
    "instance_type": "H100-SXM",
    "region": "us-central-8",
    "billing_type": "prepaid",
    "reservation_duration": 30,
    "shared_volume": {
      "name": "my-volume",
      "size_tib": 1
    }
  }'
Example: List Clusters
curl -X GET "https://manager.cloud.together.ai/api/v1/gpu_clusters" \
-H "Authorization: Bearer $TOGETHER_API_KEY"
Example: Delete Cluster
curl -X DELETE "https://manager.cloud.together.ai/api/v1/gpu_cluster/{cluster_id}" \
-H "Authorization: Bearer $TOGETHER_API_KEY"
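For scripted integrations, the create request above can also be assembled in Python. The following is a minimal sketch using only the standard library; it mirrors the endpoint and JSON fields of the curl example, while the function name `build_create_request` and its parameters are illustrative assumptions rather than part of any official SDK. It builds the request without sending it, so the payload can be inspected first.

```python
import json
import urllib.request

# Base URL taken from the curl examples above.
BASE_URL = "https://manager.cloud.together.ai/api/v1"


def build_create_request(api_key, name, num_gpus, instance_type, region,
                         volume_name, volume_size_tib,
                         billing_type="prepaid", reservation_duration=30):
    """Build (but do not send) the POST request shown in the curl example.

    Hypothetical helper: the field names are copied from the curl example,
    everything else is an assumption made for illustration.
    """
    payload = {
        "name": name,
        "num_gpus": num_gpus,
        "instance_type": instance_type,
        "region": region,
        "billing_type": billing_type,
        "reservation_duration": reservation_duration,
        "shared_volume": {"name": volume_name, "size_tib": volume_size_tib},
    }
    return urllib.request.Request(
        f"{BASE_URL}/gpu_cluster",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

To send the request, pass the returned object to `urllib.request.urlopen(...)` and parse the response body as JSON.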
Terraform Provider
Use the Together Terraform Provider to define clusters, storage, and scaling policies as code.
Setup
terraform {
  required_providers {
    together = {
      source  = "together-ai/together"
      version = "~> 1.0"
    }
  }
}

provider "together" {
  api_key = var.together_api_key
}
Example: Define a Cluster
resource "together_gpu_cluster" "training_cluster" {
  name             = "training-cluster"
  num_gpus         = 8
  instance_type    = "H100-SXM"
  region           = "us-central-8"
  billing_type     = "prepaid"
  reservation_days = 30

  shared_volume {
    name     = "training-data"
    size_tib = 5
  }
}
Benefits
- Version control – Track infrastructure changes in Git
- Reproducibility – Deploy identical clusters across environments
- Automation – Integrate with CI/CD pipelines
- State management – Terraform tracks cluster state automatically
SkyPilot Integration
Orchestrate AI workloads on GPU Clusters using SkyPilot for simplified cluster management and job scheduling.
Installation
uv pip install "skypilot[kubernetes]"
Setup
- Launch a Kubernetes cluster via Together Cloud
- Configure kubeconfig:
Download the kubeconfig from the cluster UI and merge it:
# Option 1: Replace existing config
cp together-kubeconfig ~/.kube/config
# Option 2: Merge with existing config
KUBECONFIG=./together-kubeconfig:~/.kube/config \
kubectl config view --flatten > /tmp/merged_kubeconfig && \
mv /tmp/merged_kubeconfig ~/.kube/config
- Verify SkyPilot access:
Expected output:
Checking credentials to enable infra for SkyPilot.
Kubernetes: enabled [compute]
Allowed contexts:
└── t-51326e6b-25ec-42dd-8077-6f3c9b9a34c6-admin: enabled.
🎉 Enabled infra 🎉
Kubernetes [compute]
- Check available GPUs:
sky show-gpus --infra k8s
Example: Launch a Workload
Create a SkyPilot task file (task.yaml):
resources:
  accelerators: H100:8
  cloud: kubernetes
setup: |
  pip install torch transformers
run: |
  python train.py
Launch the task:
sky launch -c my-job task.yaml
Example: Fine-tune GPT OSS
Download the gpt-oss-20b.yaml configuration.
Launch fine-tuning:
sky launch -c gpt-together gpt-oss-20b.yaml
Benefits
- Simplified orchestration – Abstract away Kubernetes complexity
- Multi-cloud support – Same workflow across different clouds
- Cost optimization – Auto-select cheapest available resources
- Job management – Easy monitoring and cancellation
Automation Patterns
CI/CD Integration
GitHub Actions example:
name: Train Model
on: push
jobs:
  train:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: Create GPU Cluster
        run: |
          tcloud cluster create training-${{ github.sha }} \
            --num-gpus 8 \
            --billing-type on_demand \
            --instance-type H100-SXM \
            --region us-central-8
      - name: Run Training
        run: |
          # Submit training job to cluster
          kubectl apply -f training-job.yaml
      - name: Cleanup
        if: always()
        run: |
          tcloud cluster delete training-${{ github.sha }}
Scheduled Jobs
Cron-based cluster creation:
# Create cluster daily at 6 AM for batch processing
0 6 * * * tcloud cluster create daily-batch --num-gpus 16 --billing-type on_demand --instance-type H100-SXM
Auto-scaling Scripts
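import os
import requests

# Read the API key from the environment rather than hardcoding it
API_KEY = os.environ["TOGETHER_API_KEY"]

def scale_cluster(cluster_id, target_gpus):
    """Request a new GPU count for the given cluster."""
    response = requests.put(
        "https://manager.cloud.together.ai/api/v1/gpu_cluster",
        headers={"Authorization": f"Bearer {API_KEY}"},
        json={"cluster_id": cluster_id, "num_gpus": target_gpus},
        timeout=30,
    )
    response.raise_for_status()  # Surface HTTP errors instead of parsing an error body
    return response.json()

# Scale based on job queue length.
# job_queue_length is a placeholder: set it from your own scheduler or queue.
job_queue_length = 0
if job_queue_length > 100:
    scale_cluster("cluster-123", 16)
else:
    scale_cluster("cluster-123", 8)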
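A single threshold like the one above can cause a cluster to flip between sizes when the queue hovers near the cutoff. One common mitigation is a dead band between the scale-up and scale-down thresholds. The sketch below illustrates that idea as a pure function kept separate from the API call; the function name and the `scale_down_at` value of 50 are assumptions chosen for illustration, not values from the platform.

```python
def choose_target_gpus(queue_length, current_gpus,
                       scale_up_at=100, scale_down_at=50,
                       small=8, large=16):
    """Pick a GPU count, leaving the size unchanged inside the dead band.

    Hypothetical helper: thresholds and sizes are illustrative defaults.
    """
    if queue_length > scale_up_at:
        return large
    if queue_length < scale_down_at:
        return small
    # Between the two thresholds: keep the current size to avoid flapping.
    return current_gpus
```

Calling the scaling endpoint only when the returned value differs from the current GPU count also avoids sending redundant requests.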
Best Practices
API Usage
- Use environment variables for API keys (never hardcode)
- Implement retry logic for transient failures
- Check cluster status before submitting jobs
- Clean up resources after completion
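Checking cluster status before submitting jobs usually means polling until the cluster reports that it is ready. The following is a minimal sketch of such a loop. It takes the status lookup as a caller-supplied function, because the exact status endpoint and state names are not given here; the strings `"ready"` and `"failed"` are placeholders to replace with the values the API actually returns.

```python
import time


def wait_until_ready(get_status, ready_states=("ready",),
                     failed_states=("failed",), timeout_s=1800,
                     poll_s=30, sleep=time.sleep, clock=time.monotonic):
    """Poll get_status() until it returns a ready state.

    get_status is any zero-argument callable returning a status string.
    State names are hypothetical placeholders, not documented API values.
    """
    deadline = clock() + timeout_s
    while True:
        status = get_status()
        if status in ready_states:
            return status
        if status in failed_states:
            raise RuntimeError(f"cluster entered state {status!r}")
        if clock() >= deadline:
            raise TimeoutError(f"cluster not ready after {timeout_s}s")
        sleep(poll_s)
```

The `sleep` and `clock` parameters exist so the loop can be exercised in tests without real waiting.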
CLI Usage
- Authenticate once per session with tcloud sso login
- Use UUIDs for cluster references (more reliable than names)
- Script common operations for team consistency
- Version control your cluster configuration scripts
Terraform Usage
- Use remote state for team collaboration
- Tag resources for cost tracking
- Use variables for environment-specific configs
- Test in dev before applying to production
Troubleshooting
Authentication issues
- Verify API key is set: echo $TOGETHER_API_KEY
- Re-authenticate with SSO: tcloud sso login
- Check token expiration
API rate limits
- Implement exponential backoff
- Batch operations when possible
- Contact support for higher limits
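Exponential backoff can be wrapped around any API call. The sketch below is one possible generic helper, not a platform-provided utility: it retries a caller-supplied function, doubles the delay after each failure up to a cap, and adds jitter so that multiple clients do not retry in lockstep. The name `call_with_backoff` and all default values are assumptions.

```python
import random
import time


def call_with_backoff(fn, max_attempts=5, base_delay_s=1.0,
                      max_delay_s=60.0, retry_on=(Exception,),
                      sleep=time.sleep):
    """Call fn(), retrying on the given exceptions with exponential backoff.

    Hypothetical helper: narrow retry_on to the errors that are actually
    transient for your client (for example, rate-limit or timeout errors).
    """
    for attempt in range(max_attempts):
        try:
            return fn()
        except retry_on:
            if attempt == max_attempts - 1:
                raise  # Out of attempts: re-raise the last error.
            delay = min(max_delay_s, base_delay_s * 2 ** attempt)
            # Jitter: wait between 50% and 100% of the computed delay.
            sleep(delay * random.uniform(0.5, 1.0))
```

For example, `call_with_backoff(lambda: scale_cluster("cluster-123", 16))` would retry a scaling request, assuming `scale_cluster` raises on failed responses.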
Terraform state conflicts
- Use remote state locking
- Coordinate with team on apply operations
- Use terraform plan before apply