Cluster Management

Kubernetes Usage

Use kubectl to interact with Kubernetes clusters for containerized workloads.

Deploy Pods with Storage

New to Kubernetes? A PersistentVolumeClaim (PVC) is a request for storage that your pods can use. Think of it like requesting a disk that persists even when pods restart.

We provide a static PersistentVolume (PV) with the same name as your shared volume. As long as you use the static PV, your data will persist across pod restarts, cluster operations, and even after cluster deletion.

Understanding Storage in Kubernetes

Kubernetes uses a three-step process for storage:

PersistentVolume (PV) - The actual storage resource (managed by Together AI)
PersistentVolumeClaim (PVC) - Your request to use that storage (you create this)
Pod with volumeMounts - Mounts the PVC into your container at a specific path (you create this)

Step 1: Create a PersistentVolumeClaim

Shared Storage PVC (Multi-Pod Access):

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: shared-pvc                    # Name you'll reference in pods
spec:
  accessModes:
    - ReadWriteMany                   # Multiple pods can read/write simultaneously
  resources:
    requests:
      storage: 10Gi                   # Requested size (can be adjusted)
  volumeName: <shared volume name>    # Replace with your shared volume name from cluster UI

Key fields explained:

accessModes: ReadWriteMany - Allows multiple pods across different nodes to mount this volume simultaneously (learn more)
volumeName - Must match the exact name of your shared volume shown in the cluster UI
storage: 10Gi - The amount of storage you’re requesting

Local Storage PVC (Single-Node Access):

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: local-pvc                     # Name you'll reference in pods
spec:
  accessModes:
    - ReadWriteOnce                   # Only one pod/node can mount at a time
  resources:
    requests:
      storage: 50Gi                   # Requested size
  storageClassName: local-storage-class

Key fields explained:

accessModes: ReadWriteOnce - Only one pod can mount this volume (typically for fast local NVMe storage)
storageClassName - Specifies the type of storage to provision

Save these to files (e.g., shared-pvc.yaml, local-pvc.yaml) and apply:

kubectl apply -f shared-pvc.yaml -n default # change to your namespace
kubectl apply -f local-pvc.yaml -n default # change to your namespace

# Verify PVCs are bound
kubectl get pvc -A # across all namepspaces

You should see STATUS: Bound for both PVCs.

Step 2: Create a Pod with Mounted Volumes

Now create a pod that mounts these volumes:

apiVersion: v1
kind: Pod
metadata:
  name: test-pod
spec:
  restartPolicy: Never
  containers:
    - name: ubuntu
      image: debian:stable-slim
      command: ["/bin/sh", "-c", "sleep infinity"]  # Keeps pod running
      volumeMounts:                                   # Where to mount volumes inside container
        - name: shared-storage                        # References volume defined below
          mountPath: /mnt/shared                      # Path inside container
        - name: local-storage
          mountPath: /mnt/local
  volumes:                                            # Defines volumes from PVCs
    - name: shared-storage                            # Internal name for this volume
      persistentVolumeClaim:
        claimName: shared-pvc                         # Must match PVC name from Step 1
    - name: local-storage
      persistentVolumeClaim:
        claimName: local-pvc

Key fields explained:

volumeMounts.mountPath - The directory path inside your container where the volume will appear
volumes[].name - An internal identifier that connects the volume definition to the volumeMount
persistentVolumeClaim.claimName - Must exactly match the PVC name you created in Step 1

Learn more about volumes in pods →

Step 3: Deploy and Access Your Pod

Save the pod definition to a file (e.g., pod-with-storage.yaml) and deploy:

# Deploy the pod
kubectl apply -f pod-with-storage.yaml -n default # should be same as the namespace in which PVC is deployed

# Wait for pod to be running
kubectl get pods -w

# Once STATUS shows "Running", access the pod
kubectl exec -it test-pod -- bash

Step 4: Verify Mounted Volumes

Once inside the pod, verify your volumes are mounted:

# Check mounted filesystems
df -h | grep /mnt

# List mounted directories
ls -la /mnt/shared
ls -la /mnt/local

# Test write access
echo "Hello from pod" > /mnt/shared/test.txt
cat /mnt/shared/test.txt

Accessing Volumes from Multiple Pods

Because the shared storage uses ReadWriteMany, multiple pods can access it simultaneously:

# Create a second pod using the same shared PVC
kubectl run test-pod-2 --image=debian:stable-slim --command -- sleep infinity

# Exec into the second pod
kubectl exec -it test-pod-2 -- bash

# The file you created from the first pod is visible here
cat /mnt/shared/test.txt

Understanding GPU Access in Containers for Kubernetes Clusters

Our Kubernetes runtime exposes all GPU devices to all containers on the host. However, whether you can use tools like nvidia-smi inside your container depends on your container image. Two scenarios:

Container with CUDA drivers (e.g., nvidia/cuda, pytorch/pytorch):
- ✓ GPU devices are accessible
- ✓ nvidia-smi works
- ✓ CUDA libraries available
- Recommended for GPU workloads
Container without CUDA drivers (e.g., debian, ubuntu base images):
- ✓ GPU devices are still exposed by the runtime
- ✗ nvidia-smi command not found (CUDA drivers not installed in container)
- ✗ Cannot run GPU workloads without installing CUDA
- GPU hardware is accessible, but you need CUDA software to use it

Key Concept: The container runtime makes GPU devices available, but the container image must include CUDA drivers and tools to interact with them. Think of it like having a GPU plugged in (runtime provides this) but needing drivers installed (image must provide this).

To run GPU workloads or access your data volumes in the Kubernetes Clusters: Deploy a pod with GPU and storage access, then exec into it. First, ensure you have a PVC created (see PVC creation above), then create a pod with a CUDA-enabled base image.

apiVersion: v1
kind: Pod
metadata:
  name: gpu-workload-pod
spec:
  restartPolicy: Never
  containers:
    - name: pytorch
      image: pytorch/pytorch:2.1.0-cuda12.1-cudnn8-runtime  # CUDA-enabled image
      command: ["/bin/bash", "-c", "sleep infinity"]
      resources:
        limits:
          nvidia.com/gpu: 1  # Request 1 GPU
      volumeMounts:
        - name: shared-storage
          mountPath: /mnt/shared
  volumes:
    - name: shared-storage
      persistentVolumeClaim:
        claimName: shared-pvc  # Must match your PVC name from earlier

Deploy and access:

# Deploy the pod
kubectl apply -f gpu-pod.yaml

# Wait for it to be running
kubectl wait --for=condition=Ready pod/gpu-workload-pod

# Exec into the pod
kubectl exec -it gpu-workload-pod -- bash

# Inside the pod, you can now:
nvidia-smi                    # See GPU(s) allocated to this pod
ls /mnt/shared                # Access your mounted volumes
python train.py               # Run your GPU workloads

Kubernetes Dashboard

Access the Kubernetes Dashboard for visual cluster management:

From the cluster UI, click the K8s Dashboard URL
Retrieve your access token:

kubectl -n kubernetes-dashboard get secret \
  $(kubectl -n kubernetes-dashboard get secret | grep admin-user-token | awk '{print $1}') \
  -o jsonpath='{.data.token}' | base64 -d | pbcopy

Paste the token into the dashboard login

Direct SSH Access

Prerequisites

SSH key must be added to your account at api.together.ai/settings/ssh-key

SSH to GPU Worker Nodes (in Kubernetes) and Slurm Compute Nodes (Slurm)

You can SSH directly into any GPU worker node/ Slurm compute nodes from the cluster UI. From the UI:

Navigate to your cluster in the Together Cloud UI
Go to the Worker Nodes section
Find the node you want to access
Click the Copy icon under the host column next to the node
Paste and run the command in your terminal

The copied command includes the full hostname and is ready to use:

ssh 9cvq-68pzlt-99e6e06b-ec17-4198-96c2-6a7a9c2236b2.s1.us-central-4b.cloud.together.ai

Use cases for direct worker node access:

Check GPU utilization across all GPUs on the node with nvidia-smi
Monitor node-level performance metrics (CPU, memory, disk, network)
Inspect system logs (journalctl, /var/log)
Debug node-level networking or storage issues
Check Kubernetes kubelet status and logs
View all processes running on the node
In case of Slurm clusters you can directly run GPU workloads on the compute nodes via ssh

Important: SSH access matrix (Kubernetes vs Slurm)

Access method	Kubernetes clusters (SSH to worker node)	Slurm clusters (SSH to compute node)
GPU visibility	See all GPUs on the node with `nvidia-smi`	See all GPUs on the node with `nvidia-smi`
Run GPU workloads	✗ Not available (no direct GPU device access)	✓ Available (run workloads directly)
Access PersistentVolumes	✗ Not available (mounted in pods only)	✓ Available via /home directory
Best for	Node-level monitoring and debugging	Node-level monitoring, debugging, and direct Slurm workflows

If you need GPU workloads or PersistentVolumes on Kubernetes, exec into a pod with GPU and storage access.

For HPC workflows, Slurm clusters provide SSH access to login nodes for job submission. The cluster UI shows copy-ready Slurm commands tailored to your cluster. Use these to quickly verify connectivity and submit jobs. Hostnames:

Worker nodes: <node-name>.slurm.pod (e.g., gpu-dp-hmqnh-nwlnj.slurm.pod)
Login node: Always slurm-login (where you’ll start most jobs)

Common Slurm commands:

sinfo          # View node and partition status
squeue         # View job queue
srun           # Run interactive jobs
sbatch         # Submit batch jobs
scancel        # Cancel jobs

VS Code Remote SSH SetupTo use VS Code with your Slurm cluster, configure SSH with a proxy jump host in your ~/.ssh/config:

# Keep connections alive
Host *
  ServerAliveInterval 60

# Together AI jump host (if applicable)
Host together-jump
  HostName <your-jump-host>
  User <your-username>

# Your Slurm login node
Host slurm-cluster
  HostName slurm-login
  ProxyJump together-jump
  User <your-username>

Then in VS Code’s Remote SSH extension, connect to slurm-cluster. The connection will automatically route through the jump host.

Learn more about Slurm →

Cluster Scaling

Clusters can scale flexibly in real time. Add on-demand compute to temporarily scale up when workload demand spikes, then scale back down as demand decreases. Scaling operations can be performed via:

Together Cloud UI
tcloud CLI
REST API

Cluster Autoscaling

Cluster Autoscaling automatically adjusts the number of nodes in your cluster based on workload demand using the Kubernetes Cluster Autoscaler. How It Works: The Kubernetes Cluster Autoscaler monitors your cluster and:

Scales up when pods are pending due to insufficient resources
Scales down when nodes are underutilized for an extended period
Respects constraints like minimum/maximum node counts and resource limits

When pods cannot be scheduled due to lack of resources, the autoscaler provisions additional nodes automatically. When nodes remain idle below a utilization threshold, they are safely drained and removed. Enabling Autoscaling:

Navigate to GPU Clusters in the Together Cloud UI
Click Create Cluster
In the cluster configuration, toggle Enable Autoscaling
Configure your maximum GPUs
Create the cluster

Once enabled, the autoscaler runs continuously in the background, responding to workload changes without manual intervention.

Autoscaling works with both reserved and on-demand capacity. Scaling beyond reserved capacity will provision on-demand nodes at standard hourly rates.

Targeted Scale-down

To control which nodes are removed during scale-down:

Cordon the node(s) to prevent new workloads
- For Kubernetes: kubectl cordon <node_to_cordon>
- For Slurm: sudo scontrol update NodeName=<node_name> State=drain Reason="<reason_for_cordoning>”
Trigger scale-down via UI, CLI, or API

Cordoned and annotated nodes are prioritized for deletion above all others.

Storage Management

Clusters support long-lived, resizable shared storage with persistent data.

Storage Tiers

All clusters include:

Shared volumes – Multi-NIC bare metal paths for high throughput
Local NVMe disks – Fast local storage on each node
Shared /home – NFS-mounted from head node for code and configs

Upload Data

For small datasets:

# Create a PVC and pod with your shared volume mounted
kubectl cp LOCAL_FILENAME POD_NAME:/data/

For large datasets: Schedule a pod on the cluster that downloads directly from S3 or your data source:

apiVersion: v1
kind: Pod
metadata:
  name: data-loader
spec:
  containers:
    - name: downloader
      image: amazon/aws-cli
      command: ["aws", "s3", "cp", "s3://bucket/data", "/mnt/shared/", "--recursive"]
      volumeMounts:
        - name: shared-storage
          mountPath: /mnt/shared
  volumes:
    - name: shared-storage
      persistentVolumeClaim:
        claimName: shared-pvc

Resize Storage

Storage volumes can be dynamically resized as your data grows. Use the UI, CLI, or API to increase volume size. Learn more about storage options →

Monitoring and Status

Check Cluster Health

From the UI:

View cluster status (Provisioning, Ready, Error)
Monitor resource utilization
Check node health indicators

From kubectl:

kubectl get nodes           # Node status
kubectl top nodes           # Resource usage
kubectl get pods --all-namespaces  # All running workloads

From Slurm:

sinfo                       # Node and partition status
squeue                      # Job queue
scontrol show node          # Detailed node info

Best Practices

Resource Management

Use PersistentVolumes for shared data that persists across pod restarts
Use local storage for temporary scratch space
Set resource requests and limits in pod specs

Job Scheduling

Use Kubernetes Jobs for batch processing
Use Slurm job arrays for embarrassingly parallel workloads
Set appropriate timeouts and retry policies

Data Management

Download large datasets directly on the cluster (not via local machine)
Use shared storage for training data and checkpoints
Use local NVMe for temporary files during training

Scaling Strategy

Start with reserved capacity for baseline workload
Add on-demand capacity for burst periods
Use targeted scale-down to control costs

GPU capacity not available

In case you do not see GPU capacity of the type you require in the api.together.ai cloud consile, you can request GPU capacity by going to the create cluster view, slecting your region and GPU capacity, type required and clicking on “Request” button. Please also, select the date from which you need the GPUs. We use these requests as input for our demand planning, and our team will reach out to you if and when that becomes avialable.

Submitting a request for capacity does not guarantee fulfillment due to very high demand, we try our best to fulfil these requests based on available GPU capacity. In case you need guaranteed GPU capacity for fixed periods of time, please reach out to our team

Getting Started

Inference

Training

Capabilities

Other APIs

On this page

Kubernetes Usage

Deploy Pods with Storage

Understanding Storage in Kubernetes

Step 1: Create a PersistentVolumeClaim

Step 2: Create a Pod with Mounted Volumes

Step 3: Deploy and Access Your Pod

Step 4: Verify Mounted Volumes

Accessing Volumes from Multiple Pods

Understanding GPU Access in Containers for Kubernetes Clusters

Kubernetes Dashboard

Direct SSH Access

Prerequisites

SSH to GPU Worker Nodes (in Kubernetes) and Slurm Compute Nodes (Slurm)

Cluster Scaling

Cluster Autoscaling

Targeted Scale-down

Storage Management

Storage Tiers

Upload Data

Resize Storage

Monitoring and Status

Check Cluster Health

Best Practices

Resource Management

Job Scheduling

Data Management

Scaling Strategy

GPU capacity not available

Troubleshooting

Pods not scheduling

Storage mount issues

Slurm jobs not running

What’s Next?

Getting Started

Inference

Training

Capabilities

Other APIs

​On this page

​Kubernetes Usage

​Deploy Pods with Storage

​Understanding Storage in Kubernetes

​Step 1: Create a PersistentVolumeClaim

​Step 2: Create a Pod with Mounted Volumes

​Step 3: Deploy and Access Your Pod

​Step 4: Verify Mounted Volumes

​Accessing Volumes from Multiple Pods

​Understanding GPU Access in Containers for Kubernetes Clusters

​Kubernetes Dashboard

​Direct SSH Access

​Prerequisites

​SSH to GPU Worker Nodes (in Kubernetes) and Slurm Compute Nodes (Slurm)

​SSH to Slurm Login Nodes

​Cluster Scaling

​Cluster Autoscaling

​Targeted Scale-down

​Storage Management

​Storage Tiers

​Upload Data

​Resize Storage

​Monitoring and Status

​Check Cluster Health

​Best Practices

​Resource Management

​Job Scheduling

​Data Management

​Scaling Strategy

​GPU capacity not available

​Troubleshooting

​Pods not scheduling

​Storage mount issues

​Slurm jobs not running

​What’s Next?

On this page

Kubernetes Usage

Deploy Pods with Storage

Understanding Storage in Kubernetes

Step 1: Create a PersistentVolumeClaim

Step 2: Create a Pod with Mounted Volumes

Step 3: Deploy and Access Your Pod

Step 4: Verify Mounted Volumes

Accessing Volumes from Multiple Pods

Understanding GPU Access in Containers for Kubernetes Clusters

Kubernetes Dashboard

Direct SSH Access

Prerequisites

SSH to GPU Worker Nodes (in Kubernetes) and Slurm Compute Nodes (Slurm)

SSH to Slurm Login Nodes

Cluster Scaling

Cluster Autoscaling

Targeted Scale-down

Storage Management

Storage Tiers

Upload Data

Resize Storage

Monitoring and Status

Check Cluster Health

Best Practices

Resource Management

Job Scheduling

Data Management

Scaling Strategy

GPU capacity not available

Troubleshooting

Pods not scheduling

Storage mount issues

Slurm jobs not running

What’s Next?