Together GPU Clusters provide multiple storage options. It is critical to understand which storage is persistent and which is ephemeral so you can architect your workloads to avoid data loss.
Storage types at a glance
Use this to decide where to store your data:
- Shared volumes (PVC): Persistent. Survives pod restarts, node reboots/migrations/recreations, cluster operations, and even cluster deletion. Use this for training data, checkpoints, model weights, and anything you cannot lose.
- Local NVMe disks: Ephemeral. Fast local storage on each node. Data can be lost during node migrations/recreations or cluster operations. Use only for temporary scratch data (e.g., intermediate computation files).
- /home directory: Persistence depends on cluster type (see below).
Persistent storage: shared volumes
Shared volumes are remote-attached, high-speed filesystems. They are created during cluster setup (or attached from an existing volume) and are accessible from all nodes. Data persists across:
- Pod restarts and rescheduling
- Node reboots, migrations, recreations, and maintenance
- Cluster scaling operations
- Cluster deletion (volumes persist independently; reserved volumes move to on-demand pricing and can be reattached to other clusters)
- Kubernetes clusters: A static PersistentVolume (PV) is provided with the same name as your shared volume. Create a PersistentVolumeClaim (PVC) referencing it, then mount it in your pods. Step-by-step setup →
- Slurm clusters: The shared volume is mounted and accessible from all compute and login nodes at /home.
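On Kubernetes clusters, the PVC step above can be sketched with a minimal manifest. The volume name `my-shared-volume`, the capacity, and the claim name are placeholders; substitute the values from your cluster setup.

```yaml
# PersistentVolumeClaim bound to the pre-provisioned static PV.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: shared-volume-pvc       # placeholder claim name
spec:
  volumeName: my-shared-volume  # must match the static PV's name (same as your shared volume)
  accessModes:
    - ReadWriteMany             # shared volumes are accessible from all nodes
  resources:
    requests:
      storage: 100Gi            # placeholder capacity
  storageClassName: ""          # empty string pins the claim to the static PV (no dynamic provisioning)
```

Once the claim is bound, reference it from a pod spec via `volumes` → `persistentVolumeClaim` and mount it with `volumeMounts`.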
Ephemeral storage: local NVMe disks
Each node has local NVMe drives that provide high-speed read/write performance.
/home directory
The behavior of /home differs between cluster types:
Slurm clusters
On Slurm clusters, /home is a persistent, NFS-backed filesystem shared across all nodes (compute and login). It is mounted from the head node and is suitable for:
- Code and scripts
- Configuration files
- Logs
- Small datasets
- Model weights and training data
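Because /home on Slurm clusters is shared and persistent, a job script can read datasets and write checkpoints there directly; every node and the login node see the same files. A minimal sketch, with hypothetical paths and a hypothetical `train.py`:

```bash
#!/bin/bash
#SBATCH --job-name=train
#SBATCH --nodes=2
#SBATCH --output=/home/%u/logs/%x-%j.out   # shared /home: logs are readable from any node

# Hypothetical layout: datasets and checkpoints live on the NFS-backed /home,
# so they survive node reboots and are visible to all ranks.
srun python train.py \
  --data-dir "$HOME/datasets/my-dataset" \
  --ckpt-dir "$HOME/checkpoints/run-${SLURM_JOB_ID}"
```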
Kubernetes clusters
On Kubernetes clusters, /home is local to each node and ephemeral. It is not shared across nodes and is subject to the same data loss risks as local NVMe storage.
Which storage should I use?
- Training data, datasets → Shared volume (PVC), or /home on Slurm clusters
- Checkpoints, model weights → Shared volume (PVC), or /home on Slurm clusters
- Application state, databases → Shared volume (PVC), or /home on Slurm clusters
- Code, configs → Shared volume (PVC), or /home on Slurm clusters
- Temporary scratch files → Local NVMe (acceptable to lose)
- Intermediate computation artifacts → Local NVMe (acceptable to lose)
Upload your data
For small datasets:
- Create a PVC using the shared volume name as the volumeName, and a pod to mount the volume
- Run kubectl cp LOCAL_FILENAME YOUR_POD_NAME:/data/
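Putting the two steps together, a minimal pod that mounts the claim at /data might look like this. The pod name, image, and claim name `shared-volume-pvc` are placeholders; the claim is assumed to already be bound to your shared volume.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: data-loader            # placeholder pod name
spec:
  containers:
    - name: loader
      image: busybox           # placeholder image; any image with a shell works
      command: ["sleep", "infinity"]  # keep the pod alive so kubectl cp has a target
      volumeMounts:
        - name: shared
          mountPath: /data     # kubectl cp target path
  volumes:
    - name: shared
      persistentVolumeClaim:
        claimName: shared-volume-pvc  # placeholder PVC name
```

With this pod running, `kubectl cp LOCAL_FILENAME data-loader:/data/` copies the file onto the shared volume, where it is visible to every other pod that mounts the same claim.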