Overview
Instant Clusters allows you to create high-performance GPU clusters in minutes. With features like on-demand scaling, long-lived, resizable, high-bandwidth shared DC-local storage, Kubernetes and Slurm cluster flavors, a REST API, and Terraform support, you can run workloads flexibly without complex infrastructure management.

Quickstart: Create an Instant Cluster
- Log into api.together.ai.
- Click GPU Clusters in the top navigation menu.
- Click Create Cluster.
- Choose whether you want Reserved capacity or On-demand, based on your needs.
- Select the cluster size, for example `8xH100`.
- Enter a cluster name.
- Choose a cluster type: either Kubernetes or Slurm.
- Select a region.
- Choose the reservation duration for your cluster.
- Create and name your shared volume (minimum size 1 TiB).
- Optional: Select your NVIDIA driver and CUDA versions.
- Click Proceed.
Capacity Types
- Reserved: You pay upfront to reserve GPU capacity for a duration of 1 to 90 days.
- On-demand: You pay as you go for GPU capacity on an hourly basis. No pre-payment or reservation needed, and you can terminate your cluster at any time.
Node Types
We have the following node types available in Instant Clusters.

- NVIDIA HGX B200
- NVIDIA HGX H200
- NVIDIA HGX H100 SXM
- NVIDIA HGX H100 SXM - Inference (lower InfiniBand multi-node GPU-to-GPU bandwidth; suitable for single-node inference)
Pricing
Pricing information for different GPU node types can be found here.

Cluster Status
- From the UI, verify that your cluster transitions to Ready.
- Monitor progress and health indicators directly from the cluster list.
Start Training with Kubernetes
Install kubectl
Install `kubectl` in your environment, for example on macOS.
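A minimal install on macOS uses Homebrew; other platforms and package managers are covered in the Kubernetes documentation:

```shell
# Install kubectl with Homebrew (macOS)
brew install kubectl

# Verify the installed client version
kubectl version --client
```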
Download kubeconfig
From the Instant Clusters UI, download the kubeconfig and copy it to your local machine:
You can rename the file to `config`, but back up your existing config first.
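A typical sequence looks like the following; the downloaded filename is an example and may differ for your cluster:

```shell
# Back up any existing kubeconfig before replacing it
cp ~/.kube/config ~/.kube/config.bak 2>/dev/null || true

# Install the downloaded kubeconfig (filename is illustrative)
mkdir -p ~/.kube
cp ~/Downloads/together-cluster-kubeconfig.yaml ~/.kube/config
```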
Verify Connectivity
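With the kubeconfig in place, you can confirm the cluster is reachable:

```shell
# List cluster nodes; each node should report STATUS "Ready"
kubectl get nodes

# Optional: confirm the API server address SkyPilot and kubectl will use
kubectl cluster-info
```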
Deploy a Pod with Storage
- Create a PersistentVolumeClaim for shared storage.
- Create a PersistentVolumeClaim for local storage.
- Mount them into a pod:
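The three steps above can be sketched in a single manifest. The storage class names here are assumptions: `shared-rdma` appears elsewhere in these docs for shared storage, while `local-path` is a placeholder for your cluster's local storage class; adjust names and sizes to your environment.

```shell
kubectl apply -f - <<'EOF'
# PVC on the shared, cluster-wide storage class (assumed: shared-rdma)
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: shared-data
spec:
  accessModes: ["ReadWriteMany"]
  storageClassName: shared-rdma
  resources:
    requests:
      storage: 100Gi
---
# PVC on node-local storage (class name is a placeholder)
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: local-scratch
spec:
  accessModes: ["ReadWriteOnce"]
  storageClassName: local-path
  resources:
    requests:
      storage: 50Gi
---
# Pod mounting the shared volume at /data and local scratch at /scratch
apiVersion: v1
kind: Pod
metadata:
  name: storage-demo
spec:
  containers:
  - name: main
    image: ubuntu:22.04
    command: ["sleep", "infinity"]
    volumeMounts:
    - name: shared
      mountPath: /data
    - name: scratch
      mountPath: /scratch
  volumes:
  - name: shared
    persistentVolumeClaim:
      claimName: shared-data
  - name: scratch
    persistentVolumeClaim:
      claimName: local-scratch
EOF
```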
Kubernetes Dashboard Access
- From the cluster UI, click the K8s Dashboard URL.
- Retrieve your access token using the following command:
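One common pattern is to mint a short-lived token with `kubectl create token`; the namespace and service-account names below are assumptions, so substitute the ones your cluster uses:

```shell
# Create a login token for the dashboard (names are assumptions)
kubectl -n kubernetes-dashboard create token dashboard-admin
```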
Cluster Scaling
Clusters can scale flexibly in real time. By adding on-demand compute to your cluster, you can temporarily scale up to more GPUs when workload demand spikes and then scale back down as it wanes. Scaling up or down can be performed using the UI, tcloud CLI, or REST API.

Targeted Scale-down
If you wish to mark which node or nodes should be targeted for scale-down, you can:

- Either cordon the k8s node(s) or add the `node.together.ai/delete-node-on-scale-down: "true"` annotation to the k8s node(s).
- Then trigger scale-down via the cloud console UI (or CLI, REST API).
- Instant Clusters will ensure that cordoned and annotated nodes are prioritized for deletion above all others.
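Both marking methods are one-liners with `kubectl` (replace `<NODE_NAME>` with the node to remove):

```shell
# Option 1: cordon the node so it is prioritized for removal
kubectl cordon <NODE_NAME>

# Option 2: annotate the node for deletion on the next scale-down
kubectl annotate node <NODE_NAME> node.together.ai/delete-node-on-scale-down="true"
```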
Storage Management
Instant Clusters supports long-lived, resizable, in-DC shared storage. You can dynamically create and attach volumes to your cluster at cluster creation time, and resize them as your data grows. All shared storage is backed by multi-NIC bare-metal paths, ensuring high-throughput, low-latency performance.

Upload Your Data
To upload data to the cluster from your local machine, follow these steps:

- Create a PVC using the `shared-rdma` storage class, as well as a pod to mount the volume.
- Run `kubectl cp LOCAL_FILENAME YOUR_POD_NAME:/data/`.
- Note: This method is suitable for smaller datasets; for larger datasets we recommend scheduling a pod on the cluster that can download from S3.
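Assuming a pod named `data-uploader` mounts the shared volume at `/data` (both names are examples), the copy and a quick verification look like:

```shell
# Copy a local file into the pod's mounted shared volume
kubectl cp ./dataset.tar.gz data-uploader:/data/

# Verify the file arrived on the shared volume
kubectl exec data-uploader -- ls -lh /data/
```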
Compute Access
You can run workloads on Instant Clusters using Kubernetes or Slurm-on-Kubernetes.

Kubernetes

Use `kubectl` to submit jobs, manage pods, and interact with your cluster. See Quickstart for setup details.
Slurm Direct SSH
For HPC workflows, you can enable Slurm-on-Kubernetes:

- Directly SSH into a Slurm node.
- Use familiar Slurm commands (`sbatch`, `srun`, etc.) to manage distributed training jobs.
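As a sketch, a multi-node batch script might look like the following; the node count, GPU count, and training entrypoint are illustrative:

```shell
# Write an example batch script requesting 2 nodes with 8 GPUs each
# (resource directives and the training command are illustrative)
cat > train.sbatch <<'EOF'
#!/bin/bash
#SBATCH --job-name=train
#SBATCH --nodes=2
#SBATCH --gpus-per-node=8
#SBATCH --output=train_%j.log

srun python train.py
EOF

# Submit with: sbatch train.sbatch
# Monitor with: squeue --me
```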
APIs and Integrations
tcloud CLI
Download the CLI, then authenticate via Google SSO. When creating a cluster, you can choose the capacity type by including the `billing-type` field and setting its value to either `prepaid` (i.e. a reservation) or `on_demand`.
REST API
All cluster management actions (create, scale, delete, storage, etc.) are available via REST API endpoints for programmatic control. The API documentation can be found here.

Terraform Provider
Use the Together Terraform Provider to define clusters, storage, and scaling policies as code. This allows reproducible infrastructure management integrated with existing Terraform workflows.

SkyPilot
You can orchestrate AI workloads on Instant Clusters using SkyPilot. The following example shows how to use Together with SkyPilot and orchestrate `gpt-oss-20b` finetuning on it.
Use Together Instant Cluster with SkyPilot

- Launch a Together Instant Cluster with Kubernetes selected as the cluster type.
- Get the Kubernetes config for the cluster and save it to a file, say `./together.kubeconfig`.
- Copy the kubeconfig to `~/.kube/config`, or merge it with your existing kubeconfig file. SkyPilot automatically picks up your credentials for the Together Instant Cluster.
- Check that SkyPilot can access the Together Instant Cluster. Your Together cluster is now accessible with SkyPilot.
- Check the available GPUs on the cluster.
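The last two checks map onto SkyPilot's standard CLI commands (shown here as a sketch; consult the SkyPilot docs for your installed version):

```shell
# Confirm SkyPilot can reach the cluster via its Kubernetes backend
sky check kubernetes

# List the GPUs SkyPilot sees on the cluster
sky show-gpus --cloud kubernetes
```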
Example: Finetuning gpt-oss-20b on the Together Instant Cluster
Launching a gpt-oss finetuning job on the Together cluster is now as simple as a single command.

Billing
Compute Billing
Instant Clusters offer two compute billing options: reserved and on-demand.

- Reservations – Credits are charged upfront or deducted for the full reserved duration once the cluster is provisioned. Any usage beyond the reserved capacity is billed at on-demand rates.
- On-Demand – Pay only for the time your cluster is running, with no upfront commitment.
Storage Billing
Storage is billed on a pay-as-you-go basis, as detailed on our pricing page. You can freely increase or decrease your storage volume size, with all usage billed at the same rate.

Viewing Usage and Invoices
You can view your current usage anytime on the Billing page in Settings. Each invoice includes a detailed breakdown of reservation, burst, and on-demand usage for compute and storage.

Cluster and Storage Lifecycles
Clusters and storage volumes follow different lifecycle policies:

- Compute Clusters – Clusters are automatically decommissioned when their reservation period ends. To extend a reservation, please contact your account team.
- Storage Volumes – Storage volumes are persistent and remain available as long as your billing account is in good standing. They are not automatically deleted.
Running Out of Credits
When your credits are exhausted, resources behave differently depending on their type:

- Reserved Compute – Existing reservations remain active until their scheduled end date. Any additional on-demand capacity used to scale beyond the reservation is decommissioned.
- Fully On-Demand Compute – Clusters are first paused and then decommissioned if credits are not restored.
- Storage Volumes – Access is revoked first, and the data is later decommissioned.