Skip to main content

Create a Cluster

Follow these steps to create your first GPU cluster:

1. Access the Cluster Console

  1. Log into api.together.ai
  2. Click GPU Clusters in the top navigation menu
  3. Click Create Cluster

2. Choose Capacity Type

Select the billing mode that fits your needs:
  • Reserved – Pay upfront to reserve capacity for 1-90 days with discounted pricing
  • On-demand – Pay hourly with no commitment; terminate anytime
Learn more about capacity types →

3. Configure Your Cluster

Cluster Size
  • Select the number and type of GPUs (e.g., 8xH100)
  • Available options: H100, H200, B200
Cluster Name
  • Enter a descriptive name for easy identification
Cluster Type
  • Kubernetes – For containerized workloads and K8s-native tools
  • Slurm – For HPC-style batch scheduling and traditional workflows
Region
  • Select the datacenter region closest to your data or team
Duration (Reserved only)
  • Choose reservation length: 1-90 days
Shared Volume
  • Create and name your persistent storage volume
  • Minimum size: 1 TiB
  • Can be resized later as needed
Optional Settings
  • Select NVIDIA driver version
  • Select CUDA version

4. Create and Verify

  1. Click Proceed to create your cluster
  2. Monitor the cluster status in the UI as it provisions
  3. Wait for status to transition to Ready
Your cluster is now ready to use!

Next Steps

For Kubernetes Clusters

  1. Install kubectl
  2. Download kubeconfig
    • From the cluster UI, download the kubeconfig file
    • Copy it to your local machine:
~/.kube/together_cluster.kubeconfig
export KUBECONFIG=$HOME/.kube/together_cluster.kubeconfig
  1. Verify connectivity
kubectl get nodes
You should see all worker and control plane nodes listed.
  1. Start using your cluster

For Slurm Clusters

  1. Add SSH key (if not already done)
  2. Connect via SSH
    • Use the connection command shown in the cluster UI
    • SSH directly to the Slurm login node
  3. Verify Slurm
sinfo          # View node status
squeue         # View job queue
  1. Start submitting jobs

Common First Tasks

Upload Data

For small datasets:
# Create a pod with your shared volume mounted
# Then copy files directly
kubectl cp local_file.tar.gz pod-name:/mnt/shared/
For large datasets, create a pod that downloads from S3 or your data source.

Run a Test Job

Kubernetes example:
kubectl run test --image=ubuntu --command -- sleep infinity
kubectl exec -it test -- bash
Slurm example:
srun --gpus=1 --pty bash
nvidia-smi

Troubleshooting

Can’t see my nodes

  • Verify your kubeconfig is set: echo $KUBECONFIG
  • Check cluster status in the UI (should be “Ready”)
  • Ensure you downloaded the latest kubeconfig

SSH connection refused

  • Verify your SSH key was added before cluster creation
  • Check the connection command in the cluster UI
  • Ensure you’re using the correct hostname

Capacity unavailable

  • Use the “Notify Me” option to get alerts when capacity is available
  • Try a different region
  • Contact [email protected] for custom requirements

What’s Next?