Create a Cluster
Follow these steps to create your first GPU cluster:
1. Access the Cluster Console
- Log in to api.together.ai
- Click GPU Clusters in the top navigation menu
- Click Create Cluster
2. Choose Capacity Type
Select the billing mode that fits your needs:
- Reserved – Pay upfront to reserve capacity for 1-90 days with discounted pricing
- On-demand – Pay hourly with no commitment; terminate anytime
3. Configure Your Cluster
Cluster Size
- Select the number and type of GPUs (e.g., 8xH100)
- Available options: H100, H200, B200
Cluster Name
- Enter a descriptive name for easy identification
Cluster Type
- Kubernetes – For containerized workloads and K8s-native tools
- Slurm – For HPC-style batch scheduling and traditional workflows
Region
- Select the datacenter region closest to your data or team
Reservation Length (Reserved capacity only)
- Choose reservation length: 1-90 days
Storage
- Create and name your persistent storage volume
- Minimum size: 1 TiB
- Can be resized later as needed
Software Versions
- Select NVIDIA driver version
- Select CUDA version
4. Create and Verify
- Click Proceed to create your cluster
- Monitor the cluster status in the UI as it provisions
- Wait for status to transition to Ready
Next Steps
For Kubernetes Clusters
- Install kubectl
  - macOS installation guide
  - Or use your preferred method for your OS
- Download kubeconfig
  - From the cluster UI, download the kubeconfig file
  - Copy it to your local machine
- Verify connectivity
- Start using your cluster
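The kubeconfig and connectivity steps above can be sketched roughly as follows. Both file paths are hypothetical placeholders (the source does not say where the file is saved or what it is named), so adjust them to your setup. The only tool-specific pieces are the standard `KUBECONFIG` environment variable and `kubectl get nodes`.

```shell
# Assumed locations, not taken from the cluster UI: change them as needed.
DOWNLOADED_KUBECONFIG="${DOWNLOADED_KUBECONFIG:-$HOME/Downloads/kubeconfig.yaml}"
KUBECONFIG_TARGET="${KUBECONFIG_TARGET:-$HOME/.kube/together-cluster.yaml}"

# Make sure the destination directory exists.
mkdir -p "$(dirname "$KUBECONFIG_TARGET")"

# Copy the downloaded file into place and restrict its permissions,
# since a kubeconfig usually contains credentials.
if [ -f "$DOWNLOADED_KUBECONFIG" ]; then
  cp "$DOWNLOADED_KUBECONFIG" "$KUBECONFIG_TARGET"
  chmod 600 "$KUBECONFIG_TARGET"
else
  echo "No kubeconfig at $DOWNLOADED_KUBECONFIG; download it from the cluster UI first." >&2
fi

# Point kubectl at this cluster for the current shell session.
export KUBECONFIG="$KUBECONFIG_TARGET"

# Verify connectivity: list the cluster's nodes if everything is in place.
if command -v kubectl >/dev/null 2>&1 && [ -f "$KUBECONFIG" ]; then
  kubectl get nodes || echo "kubectl could not reach the cluster." >&2
else
  echo "Skipping connectivity check: kubectl or the kubeconfig file is missing." >&2
fi
```

If the nodes are listed, the cluster is reachable from your machine. Setting `KUBECONFIG` this way only affects the current shell; add the `export` line to your shell profile if you want it to persist.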
For Slurm Clusters
- Add SSH key (if not already done)
  - Ensure your SSH key is added to your account at api.together.ai/settings/ssh-key
  - Keys must be added before cluster creation
- Connect via SSH
  - Use the connection command shown in the cluster UI
  - SSH directly to the Slurm login node
- Verify Slurm
- Start submitting jobs
  - Learn about Slurm commands
  - Submit batch jobs with sbatch
  - Run interactive jobs with srun
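As an illustration of a first batch job, the sketch below writes a small job script and submits it with `sbatch`. The job name, file names, time limit, and GPU request are illustrative assumptions, not values from the cluster UI. It also assumes the cluster exposes GPUs to Slurm so that `--gpus-per-node` is honored. Run it on the Slurm login node.

```shell
# Hypothetical file name for the job script.
JOB_SCRIPT="${JOB_SCRIPT:-hello-gpu.sbatch}"

# Write a minimal batch script: one node, one GPU, five-minute limit.
cat > "$JOB_SCRIPT" <<'EOF'
#!/bin/bash
#SBATCH --job-name=hello-gpu
#SBATCH --nodes=1
#SBATCH --gpus-per-node=1
#SBATCH --time=00:05:00
#SBATCH --output=hello-gpu-%j.out

# Print the node name and the GPUs visible to this job.
hostname
nvidia-smi
EOF

# Submit the job if Slurm is available in this shell.
if command -v sbatch >/dev/null 2>&1; then
  sbatch "$JOB_SCRIPT" || echo "Submission failed; check partition and GPU settings." >&2
else
  echo "sbatch not found; run this on the Slurm login node." >&2
fi
```

After submission, `squeue` shows the job's state, and the output lands in `hello-gpu-<jobid>.out` in the submission directory. For an interactive shell on a compute node, something like `srun --gpus=1 --pty bash` is a common pattern, under the same assumption about GPU configuration.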
Common First Tasks
Upload Data
For small datasets:
Run a Test Job
Kubernetes example:
Troubleshooting
Can’t see my nodes
- Verify your kubeconfig is set: echo $KUBECONFIG
- Check cluster status in the UI (should be “Ready”)
- Ensure you downloaded the latest kubeconfig
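The kubeconfig checks above can be combined into a small diagnostic. The helper name `check_kubeconfig` is hypothetical, and the sketch assumes `KUBECONFIG` holds a single file path, although kubectl also accepts a colon-separated list.

```shell
# Hypothetical helper: reports whether KUBECONFIG is set and points at a real file.
check_kubeconfig() {
  if [ -z "${KUBECONFIG:-}" ]; then
    echo "KUBECONFIG is not set"
    return 1
  fi
  if [ ! -f "$KUBECONFIG" ]; then
    echo "KUBECONFIG points to a missing file: $KUBECONFIG"
    return 1
  fi
  echo "KUBECONFIG is set: $KUBECONFIG"
}

# If the config looks sane and kubectl is installed, try listing the nodes.
if check_kubeconfig; then
  if command -v kubectl >/dev/null 2>&1; then
    kubectl get nodes -o wide || echo "kubectl could not reach the cluster." >&2
  fi
fi
```

If the check passes but no nodes appear, the remaining suspects are the cluster status in the UI and a stale kubeconfig file.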
SSH connection refused
- Verify your SSH key was added before cluster creation
- Check the connection command in the cluster UI
- Ensure you’re using the correct hostname
Capacity unavailable
- Use the “Notify Me” option to get alerts when capacity is available
- Try a different region
- Contact support for custom requirements