Skip to main content

On this page

Kubernetes Usage

Use kubectl to interact with Kubernetes clusters for containerized workloads.

Deploy Pods with Storage

New to Kubernetes? A PersistentVolumeClaim (PVC) is a request for storage that your pods can use. Think of it like requesting a disk that persists even when pods restart.
We provide a static PersistentVolume (PV) with the same name as your shared volume. As long as you use the static PV, your data will persist across pod restarts, cluster operations, and even after cluster deletion.

Understanding Storage in Kubernetes

Kubernetes uses a three-step process for storage:
  1. PersistentVolume (PV) - The actual storage resource (managed by Together AI)
  2. PersistentVolumeClaim (PVC) - Your request to use that storage (you create this)
  3. Pod with volumeMounts - Mounts the PVC into your container at a specific path (you create this)

Step 1: Create a PersistentVolumeClaim

Shared Storage PVC (Multi-Pod Access):
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: shared-pvc                    # Name you'll reference in pods
spec:
  accessModes:
    - ReadWriteMany                   # Multiple pods can read/write simultaneously
  resources:
    requests:
      storage: 10Gi                   # Requested size (can be adjusted)
  volumeName: <shared volume name>    # Replace with your shared volume name from cluster UI
Key fields explained:
  • accessModes: ReadWriteMany - Allows multiple pods across different nodes to mount this volume simultaneously (learn more)
  • volumeName - Must match the exact name of your shared volume shown in the cluster UI
  • storage: 10Gi - The amount of storage you’re requesting
Local Storage PVC (Single-Node Access):
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: local-pvc                     # Name you'll reference in pods
spec:
  accessModes:
    - ReadWriteOnce                   # Only one pod/node can mount at a time
  resources:
    requests:
      storage: 50Gi                   # Requested size
  storageClassName: local-storage-class
Key fields explained:
  • accessModes: ReadWriteOnce - Only one pod can mount this volume (typically for fast local NVMe storage)
  • storageClassName - Specifies the type of storage to provision
Save these to files (e.g., shared-pvc.yaml, local-pvc.yaml) and apply:
kubectl apply -f shared-pvc.yaml -n default # change to your namespace
kubectl apply -f local-pvc.yaml -n default # change to your namespace

# Verify PVCs are bound
kubectl get pvc -A # across all namepspaces
You should see STATUS: Bound for both PVCs.

Step 2: Create a Pod with Mounted Volumes

Now create a pod that mounts these volumes:
apiVersion: v1
kind: Pod
metadata:
  name: test-pod
spec:
  restartPolicy: Never
  containers:
    - name: ubuntu
      image: debian:stable-slim
      command: ["/bin/sh", "-c", "sleep infinity"]  # Keeps pod running
      volumeMounts:                                   # Where to mount volumes inside container
        - name: shared-storage                        # References volume defined below
          mountPath: /mnt/shared                      # Path inside container
        - name: local-storage
          mountPath: /mnt/local
  volumes:                                            # Defines volumes from PVCs
    - name: shared-storage                            # Internal name for this volume
      persistentVolumeClaim:
        claimName: shared-pvc                         # Must match PVC name from Step 1
    - name: local-storage
      persistentVolumeClaim:
        claimName: local-pvc
Key fields explained:
  • volumeMounts.mountPath - The directory path inside your container where the volume will appear
  • volumes[].name - An internal identifier that connects the volume definition to the volumeMount
  • persistentVolumeClaim.claimName - Must exactly match the PVC name you created in Step 1
Learn more about volumes in pods →

Step 3: Deploy and Access Your Pod

Save the pod definition to a file (e.g., pod-with-storage.yaml) and deploy:
# Deploy the pod
kubectl apply -f pod-with-storage.yaml -n default # should be same as the namespace in which PVC is deployed

# Wait for pod to be running
kubectl get pods -w

# Once STATUS shows "Running", access the pod
kubectl exec -it test-pod -- bash

Step 4: Verify Mounted Volumes

Once inside the pod, verify your volumes are mounted:
# Check mounted filesystems
df -h | grep /mnt

# List mounted directories
ls -la /mnt/shared
ls -la /mnt/local

# Test write access
echo "Hello from pod" > /mnt/shared/test.txt
cat /mnt/shared/test.txt

Accessing Volumes from Multiple Pods

Because the shared storage uses ReadWriteMany, multiple pods can access it simultaneously:
# Create a second pod using the same shared PVC
kubectl run test-pod-2 --image=debian:stable-slim --command -- sleep infinity

# Exec into the second pod
kubectl exec -it test-pod-2 -- bash

# The file you created from the first pod is visible here
cat /mnt/shared/test.txt

Understanding GPU Access in Containers for Kubernetes Clusters

Our Kubernetes runtime exposes all GPU devices to all containers on the host. However, whether you can use tools like nvidia-smi inside your container depends on your container image. Two scenarios:
  1. Container with CUDA drivers (e.g., nvidia/cuda, pytorch/pytorch):
    • ✓ GPU devices are accessible
    • nvidia-smi works
    • ✓ CUDA libraries available
    • Recommended for GPU workloads
  2. Container without CUDA drivers (e.g., debian, ubuntu base images):
    • ✓ GPU devices are still exposed by the runtime
    • nvidia-smi command not found (CUDA drivers not installed in container)
    • ✗ Cannot run GPU workloads without installing CUDA
    • GPU hardware is accessible, but you need CUDA software to use it
Key Concept: The container runtime makes GPU devices available, but the container image must include CUDA drivers and tools to interact with them. Think of it like having a GPU plugged in (runtime provides this) but needing drivers installed (image must provide this).
To run GPU workloads or access your data volumes in the Kubernetes Clusters: Deploy a pod with GPU and storage access, then exec into it. First, ensure you have a PVC created (see PVC creation above), then create a pod with a CUDA-enabled base image.
apiVersion: v1
kind: Pod
metadata:
  name: gpu-workload-pod
spec:
  restartPolicy: Never
  containers:
    - name: pytorch
      image: pytorch/pytorch:2.1.0-cuda12.1-cudnn8-runtime  # CUDA-enabled image
      command: ["/bin/bash", "-c", "sleep infinity"]
      resources:
        limits:
          nvidia.com/gpu: 1  # Request 1 GPU
      volumeMounts:
        - name: shared-storage
          mountPath: /mnt/shared
  volumes:
    - name: shared-storage
      persistentVolumeClaim:
        claimName: shared-pvc  # Must match your PVC name from earlier
Deploy and access:
# Deploy the pod
kubectl apply -f gpu-pod.yaml

# Wait for it to be running
kubectl wait --for=condition=Ready pod/gpu-workload-pod

# Exec into the pod
kubectl exec -it gpu-workload-pod -- bash

# Inside the pod, you can now:
nvidia-smi                    # See GPU(s) allocated to this pod
ls /mnt/shared                # Access your mounted volumes
python train.py               # Run your GPU workloads

Kubernetes Dashboard

Access the Kubernetes Dashboard for visual cluster management:
  1. From the cluster UI, click the K8s Dashboard URL
  2. Retrieve your access token:
kubectl -n kubernetes-dashboard get secret \
  $(kubectl -n kubernetes-dashboard get secret | grep admin-user-token | awk '{print $1}') \
  -o jsonpath='{.data.token}' | base64 -d | pbcopy
  1. Paste the token into the dashboard login

Direct SSH Access

Prerequisites

SSH to GPU Worker Nodes (in Kubernetes) and Slurm Compute Nodes (Slurm)

You can SSH directly into any GPU worker node/ Slurm compute nodes from the cluster UI. From the UI:
  1. Navigate to your cluster in the Together Cloud UI
  2. Go to the Worker Nodes section
  3. Find the node you want to access
  4. Click the Copy icon under the host column next to the node
  5. Paste and run the command in your terminal
The copied command includes the full hostname and is ready to use:
ssh 9cvq-68pzlt-99e6e06b-ec17-4198-96c2-6a7a9c2236b2.s1.us-central-4b.cloud.together.ai
Use cases for direct worker node access:
  • Check GPU utilization across all GPUs on the node with nvidia-smi
  • Monitor node-level performance metrics (CPU, memory, disk, network)
  • Inspect system logs (journalctl, /var/log)
  • Debug node-level networking or storage issues
  • Check Kubernetes kubelet status and logs
  • View all processes running on the node
  • In case of Slurm clusters you can directly run GPU workloads on the compute nodes via ssh
Important: SSH access matrix (Kubernetes vs Slurm)
Access methodKubernetes clusters (SSH to worker node)Slurm clusters (SSH to compute node)
GPU visibilitySee all GPUs on the node with nvidia-smiSee all GPUs on the node with nvidia-smi
Run GPU workloads✗ Not available (no direct GPU device access)✓ Available (run workloads directly)
Access PersistentVolumes✗ Not available (mounted in pods only)✓ Available via /home directory
Best forNode-level monitoring and debuggingNode-level monitoring, debugging, and direct Slurm workflows
If you need GPU workloads or PersistentVolumes on Kubernetes, exec into a pod with GPU and storage access.

SSH to Slurm Login Nodes

For HPC workflows, Slurm clusters provide SSH access to login nodes for job submission. The cluster UI shows copy-ready Slurm commands tailored to your cluster. Use these to quickly verify connectivity and submit jobs. Hostnames:
  • Worker nodes: <node-name>.slurm.pod (e.g., gpu-dp-hmqnh-nwlnj.slurm.pod)
  • Login node: Always slurm-login (where you’ll start most jobs)
Common Slurm commands:
sinfo          # View node and partition status
squeue         # View job queue
srun           # Run interactive jobs
sbatch         # Submit batch jobs
scancel        # Cancel jobs
VS Code Remote SSH SetupTo use VS Code with your Slurm cluster, configure SSH with a proxy jump host in your ~/.ssh/config:
# Keep connections alive
Host *
  ServerAliveInterval 60

# Together AI jump host (if applicable)
Host together-jump
  HostName <your-jump-host>
  User <your-username>

# Your Slurm login node
Host slurm-cluster
  HostName slurm-login
  ProxyJump together-jump
  User <your-username>
Then in VS Code’s Remote SSH extension, connect to slurm-cluster. The connection will automatically route through the jump host.
Learn more about Slurm →

Cluster Scaling

Clusters can scale flexibly in real time. Add on-demand compute to temporarily scale up when workload demand spikes, then scale back down as demand decreases. Scaling operations can be performed via:
  • Together Cloud UI
  • tcloud CLI
  • REST API

Cluster Autoscaling

Cluster Autoscaling automatically adjusts the number of nodes in your cluster based on workload demand using the Kubernetes Cluster Autoscaler. How It Works: The Kubernetes Cluster Autoscaler monitors your cluster and:
  • Scales up when pods are pending due to insufficient resources
  • Scales down when nodes are underutilized for an extended period
  • Respects constraints like minimum/maximum node counts and resource limits
When pods cannot be scheduled due to lack of resources, the autoscaler provisions additional nodes automatically. When nodes remain idle below a utilization threshold, they are safely drained and removed. Enabling Autoscaling:
  1. Navigate to GPU Clusters in the Together Cloud UI
  2. Click Create Cluster
  3. In the cluster configuration, toggle Enable Autoscaling
  4. Configure your maximum GPUs
  5. Create the cluster
Once enabled, the autoscaler runs continuously in the background, responding to workload changes without manual intervention.
Autoscaling works with both reserved and on-demand capacity. Scaling beyond reserved capacity will provision on-demand nodes at standard hourly rates.

Targeted Scale-down

To control which nodes are removed during scale-down:
  1. Cordon the node(s) to prevent new workloads
    • For Kubernetes: kubectl cordon <node_to_cordon>
    • For Slurm: sudo scontrol update NodeName=<node_name> State=drain Reason="<reason_for_cordoning>”
  2. Trigger scale-down via UI, CLI, or API
Cordoned and annotated nodes are prioritized for deletion above all others.

Storage Management

Clusters support long-lived, resizable shared storage with persistent data.

Storage Tiers

All clusters include:
  • Shared volumes – Multi-NIC bare metal paths for high throughput
  • Local NVMe disks – Fast local storage on each node
  • Shared /home – NFS-mounted from head node for code and configs

Upload Data

For small datasets:
# Create a PVC and pod with your shared volume mounted
kubectl cp LOCAL_FILENAME POD_NAME:/data/
For large datasets: Schedule a pod on the cluster that downloads directly from S3 or your data source:
apiVersion: v1
kind: Pod
metadata:
  name: data-loader
spec:
  containers:
    - name: downloader
      image: amazon/aws-cli
      command: ["aws", "s3", "cp", "s3://bucket/data", "/mnt/shared/", "--recursive"]
      volumeMounts:
        - name: shared-storage
          mountPath: /mnt/shared
  volumes:
    - name: shared-storage
      persistentVolumeClaim:
        claimName: shared-pvc

Resize Storage

Storage volumes can be dynamically resized as your data grows. Use the UI, CLI, or API to increase volume size. Learn more about storage options →

Monitoring and Status

Check Cluster Health

From the UI:
  • View cluster status (Provisioning, Ready, Error)
  • Monitor resource utilization
  • Check node health indicators
From kubectl:
kubectl get nodes           # Node status
kubectl top nodes           # Resource usage
kubectl get pods --all-namespaces  # All running workloads
From Slurm:
sinfo                       # Node and partition status
squeue                      # Job queue
scontrol show node          # Detailed node info

Best Practices

Resource Management

  • Use PersistentVolumes for shared data that persists across pod restarts
  • Use local storage for temporary scratch space
  • Set resource requests and limits in pod specs

Job Scheduling

  • Use Kubernetes Jobs for batch processing
  • Use Slurm job arrays for embarrassingly parallel workloads
  • Set appropriate timeouts and retry policies

Data Management

  • Download large datasets directly on the cluster (not via local machine)
  • Use shared storage for training data and checkpoints
  • Use local NVMe for temporary files during training

Scaling Strategy

  • Start with reserved capacity for baseline workload
  • Add on-demand capacity for burst periods
  • Use targeted scale-down to control costs

GPU capacity not available

In case you do not see GPU capacity of the type you require in the api.together.ai cloud consile, you can request GPU capacity by going to the create cluster view, slecting your region and GPU capacity, type required and clicking on “Request” button. Please also, select the date from which you need the GPUs. We use these requests as input for our demand planning, and our team will reach out to you if and when that becomes avialable.
Submitting a request for capacity does not guarantee fulfillment due to very high demand, we try our best to fulfil these requests based on available GPU capacity. In case you need guaranteed GPU capacity for fixed periods of time, please reach out to our team

Troubleshooting

Pods not scheduling

  • Check node status: kubectl get nodes
  • Verify resource requests don’t exceed available resources
  • Check for taints on nodes: kubectl describe node <node-name>

Storage mount issues

  • Verify PVC is bound: kubectl get pvc
  • Check volume name matches your shared volume
  • Ensure storage class exists for local storage

Slurm jobs not running

  • Check node status: sinfo
  • Verify partition is available
  • Check job status: scontrol show job <jobid>

What’s Next?