Overview

Instant Clusters allows you to create high-performance GPU clusters in minutes. With features like on-demand scaling, long-lived, resizable, high-bandwidth shared storage local to the data center, Kubernetes and Slurm cluster flavors, a REST API, and Terraform support, you can run workloads flexibly without complex infrastructure management.

Quickstart: Create an Instant Cluster

  1. Log in to api.together.ai.
  2. Click GPU Clusters in the top navigation menu.
  3. Click Create Cluster.
  4. Choose whether you want Reserved capacity or On-demand, based on your needs.
  5. Select the cluster size, for example 8xH100.
  6. Enter a cluster name.
  7. Choose a cluster type: either Kubernetes or Slurm.
  8. Select a region.
  9. Choose the reservation duration for your cluster.
  10. Create and name your shared volume (minimum size 1 TiB).
  11. Optional: Select your NVIDIA driver and CUDA versions.
  12. Click Proceed.
Your cluster will now be ready for you to use.

Capacity Types

  • Reserved: You pay upfront to reserve GPU capacity for a duration of 1 to 90 days.
  • On-demand: You pay as you go for GPU capacity on an hourly basis. No pre-payment or reservation needed, and you can terminate your cluster at any time.

Node Types

The following node types are available in Instant Clusters:
  • NVIDIA HGX B200
  • NVIDIA HGX H200
  • NVIDIA HGX H100 SXM
  • NVIDIA HGX H100 SXM - Inference (lower InfiniBand multi-node GPU-to-GPU bandwidth; suitable for single-node inference)
If you don’t see an available node type, select the “Notify Me” option to get notified when capacity comes online. You can also contact us with your request via [email protected].

Pricing

Pricing information for different GPU node types can be found here.

Cluster Status

  • From the UI, verify that your cluster transitions to Ready.
  • Monitor progress and health indicators directly from the cluster list.

Start Training with Kubernetes

Install kubectl

Install kubectl in your environment, for example on macOS with Homebrew:
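brew install kubectl       # install the Kubernetes CLI via Homebrew
kubectl version --client   # confirm the client is on your PATH
On other platforms, follow the official Kubernetes installation instructions for your operating system.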

Download kubeconfig

From the Instant Clusters UI, download the kubeconfig and copy it to your local machine:
~/.kube/together_instant.kubeconfig
export KUBECONFIG=$HOME/.kube/together_instant.kubeconfig
kubectl get nodes
You can rename the file to config, but back up your existing config first.

Verify Connectivity

kubectl get nodes
You should see all worker and control plane nodes listed.
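For example (node names, ages, and versions below are illustrative; the names echo the node names used later in this guide):
NAME                 STATUS   ROLES           AGE   VERSION
cp-8ct86             Ready    control-plane   3h    v1.29.4
cp-fjqbt             Ready    control-plane   3h    v1.29.4
cp-hst5f             Ready    control-plane   3h    v1.29.4
gpu-dp-gsd6b-k4m4x   Ready    <none>          3h    v1.29.4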

Deploy a Pod with Storage

  • Create a PersistentVolumeClaim for shared storage.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: shared-pvc
spec:
  accessModes:
    - ReadWriteMany   # Multiple pods can read/write
  resources:
    requests:
      storage: 10Gi   # Requested size
  storageClassName: shared-storage-class
  • Create a PersistentVolumeClaim for local storage.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: local-pvc
spec:
  accessModes:
    - ReadWriteOnce   # Only one pod/node can mount at a time
  resources:
    requests:
      storage: 50Gi
  storageClassName: local-storage-class
  • Mount them into a pod:
apiVersion: v1
kind: Pod
metadata:
  name: test-pod
spec:
  restartPolicy: Never
  containers:
    - name: debian
      image: debian:stable-slim
      command: ["/bin/sh", "-c", "sleep infinity"]
      volumeMounts:
        - name: shared-pvc
          mountPath: /mnt/shared
        - name: local-pvc
          mountPath: /mnt/local
  volumes:
    - name: shared-pvc
      persistentVolumeClaim:
        claimName: shared-pvc
    - name: local-pvc
      persistentVolumeClaim:
        claimName: local-pvc
Apply and connect:
kubectl apply -f manifest.yaml
kubectl exec -it test-pod -- bash
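To confirm both claims are bound and the volumes are mounted, you can check the PVCs and inspect the mounts from inside the pod:
kubectl get pvc
kubectl exec -it test-pod -- df -h /mnt/shared /mnt/local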

Kubernetes Dashboard Access

  • From the cluster UI, click the K8s Dashboard URL.
  • Retrieve your access token using the following command (pbcopy puts it on the clipboard on macOS; on Linux, drop the final | pbcopy and copy the printed token instead):
kubectl -n kubernetes-dashboard get secret $(kubectl -n kubernetes-dashboard get secret | grep admin-user-token | awk '{print $1}') -o jsonpath='{.data.token}' | base64 -d | pbcopy

Cluster Scaling

Clusters can scale flexibly in real time. By adding on-demand compute to your cluster, you can temporarily scale up to more GPUs when workload demand spikes and then scale back down as it wanes. Scaling up or down can be performed using the UI, tcloud CLI, or REST API.

Targeted Scale-down

If you wish to mark which node or nodes should be targeted for scale-down, you can:
  • Either cordon the k8s node(s) or add the node.together.ai/delete-node-on-scale-down: "true" annotation to the k8s node(s) (example commands after this list).
  • Then trigger scale-down via the cloud console UI (or CLI, REST API).
  • Instant Clusters will ensure that cordoned + annotated nodes are prioritized for deletion above all others.
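For example, with kubectl (substitute your node name for <NODE_NAME>):
kubectl cordon <NODE_NAME>
kubectl annotate node <NODE_NAME> node.together.ai/delete-node-on-scale-down="true"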

Storage Management

Instant Clusters supports long-lived, resizable, in-DC shared storage. You can dynamically create and attach volumes to your cluster at creation time and resize them as your data grows. All shared storage is backed by multi-NIC bare-metal paths, ensuring high throughput and low latency.

Upload Your Data

To upload data to the cluster from your local machine, follow these steps:
  • Create a PVC using the shared-rdma storage class, along with a pod that mounts the volume (see the sketch after this list)
  • Run kubectl cp LOCAL_FILENAME YOUR_POD_NAME:/data/
  • Note: This method is suitable for smaller datasets; for larger datasets, we recommend scheduling a pod on the cluster that downloads from S3.
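A minimal sketch of such a PVC, assuming the shared-rdma storage class named above (the claim name and size are illustrative; reuse the pod pattern from Deploy a Pod with Storage to mount it at /data):
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: upload-pvc        # illustrative name
spec:
  accessModes:
    - ReadWriteMany       # shared volumes support multiple readers/writers
  resources:
    requests:
      storage: 100Gi      # illustrative size
  storageClassName: shared-rdma
With a pod mounting upload-pvc at /data, the kubectl cp command above copies local files into the shared volume.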

Compute Access

You can run workloads on Instant Clusters using Kubernetes or Slurm-on-Kubernetes.

Kubernetes

Use kubectl to submit jobs, manage pods, and interact with your cluster. See Quickstart for setup details.

Slurm Direct SSH

For HPC workflows, you can enable Slurm-on-Kubernetes:
  • Directly SSH into a Slurm node.
  • Use familiar Slurm commands (sbatch, srun, etc.) to manage distributed training jobs.
This provides the flexibility of traditional HPC job scheduling alongside Kubernetes.
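An illustrative sbatch script for a two-node job (node counts, resource values, and the training entrypoint are assumptions; adapt them to your workload):
#!/bin/bash
# Illustrative two-node job: one launcher task per node, 8 GPUs each
#SBATCH --job-name=train
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=1
#SBATCH --gpus-per-node=8

# srun fans the command out to every allocated node; train.py is a placeholder
srun python train.py
Submit it with sbatch and monitor it with squeue.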

APIs and Integrations

tcloud CLI

Download the tcloud CLI, then authenticate via Google SSO:
tcloud sso login
Create a cluster:
tcloud cluster create my-cluster \
  --num-gpus 8 \
  --reservation-duration 1 \
  --instance-type H100-SXM \
  --region us-central-8 \
  --shared-volume-name my-volume \
  --size-tib 1
Optionally, you can specify whether to provision reserved or on-demand capacity with the --billing-type flag, setting its value to either prepaid (i.e., a reservation) or on_demand.
tcloud cluster create my-cluster \
  --num-gpus 8 \
  --billing-type prepaid \
  --reservation-duration 1 \
  --instance-type H100-SXM \
  --region us-central-8 \
  --shared-volume-name my-volume \
  --size-tib 1
Delete a cluster:
tcloud cluster delete <CLUSTER_UUID>

REST API

All cluster management actions (create, scale, delete, storage, etc.) are available via REST API endpoints for programmatic control. The API documentation can be found here.

Terraform Provider

Use the Together Terraform Provider to define clusters, storage, and scaling policies as code. This allows reproducible infrastructure management integrated with existing Terraform workflows.

SkyPilot

You can orchestrate AI workloads on Instant Clusters using SkyPilot. The following example shows how to connect SkyPilot to a Together Instant Cluster and orchestrate gpt-oss-20b finetuning on it.

Use Together Instant Cluster with SkyPilot

  1. Install SkyPilot with Kubernetes support (quote the extra so your shell doesn't expand the brackets):
    uv pip install "skypilot[kubernetes]"
    
  2. Launch a Together Instant Cluster with the cluster type set to Kubernetes.
  • Get the Kubernetes config for the cluster.
  • Save the kubeconfig to a file, say ./together.kubeconfig.
  • Copy the kubeconfig to ~/.kube/config, or merge it with your existing kubeconfig file:
    mkdir -p ~/.kube
    cp ./together.kubeconfig ~/.kube/config
    
    or
    KUBECONFIG=./together.kubeconfig:~/.kube/config kubectl config view --flatten > /tmp/merged_kubeconfig && mv /tmp/merged_kubeconfig ~/.kube/config
    
    SkyPilot automatically picks up your credentials for the Together Instant Cluster.
  3. Check that SkyPilot can access the Together Instant Cluster:
    $ sky check k8s
    Checking credentials to enable infra for SkyPilot.
      Kubernetes: enabled [compute]
        Allowed contexts:
        └── t-51326e6b-25ec-42dd-8077-6f3c9b9a34c6-admin@t-51326e6b-25ec-42dd-8077-6f3c9b9a34c6: enabled.
    
    🎉 Enabled infra 🎉
      Kubernetes [compute]
        Allowed contexts:
        └── t-51326e6b-25ec-42dd-8077-6f3c9b9a34c6-admin@t-51326e6b-25ec-42dd-8077-6f3c9b9a34c6
    
    To enable a cloud, follow the hints above and rerun: sky check
    If any problems remain, refer to detailed docs at: https://docs.skypilot.co/en/latest/getting-started/installation.html
    
    Your Together cluster is now accessible with SkyPilot.
  4. Check the available GPUs on the cluster:
    $ sky show-gpus --infra k8s
    Kubernetes GPUs
    Context: t-51326e6b-25ec-42dd-8077-6f3c9b9a34c6-admin@t-51326e6b-25ec-42dd-8077-6f3c9b9a34c6
    GPU   REQUESTABLE_QTY_PER_NODE  UTILIZATION  
    H100  1, 2, 4, 8                8 of 8 free  
    Kubernetes per-node GPU availability
    CONTEXT                                                                              NODE                GPU   UTILIZATION  
    t-51326e6b-25ec-42dd-8077-6f3c9b9a34c6-admin@t-51326e6b-25ec-42dd-8077-6f3c9b9a34c6  cp-8ct86            -     0 of 0 free  
    t-51326e6b-25ec-42dd-8077-6f3c9b9a34c6-admin@t-51326e6b-25ec-42dd-8077-6f3c9b9a34c6  cp-fjqbt            -     0 of 0 free  
    t-51326e6b-25ec-42dd-8077-6f3c9b9a34c6-admin@t-51326e6b-25ec-42dd-8077-6f3c9b9a34c6  cp-hst5f            -     0 of 0 free  
    t-51326e6b-25ec-42dd-8077-6f3c9b9a34c6-admin@t-51326e6b-25ec-42dd-8077-6f3c9b9a34c6  gpu-dp-gsd6b-k4m4x  H100  8 of 8 free  
    

Example: Finetuning gpt-oss-20b on the Together Instant Cluster

Launching a gpt-oss finetuning job on the Together cluster is now as simple as a single command:
sky launch -c gpt-together gpt-oss-20b.yaml
You can download the yaml file here.
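If you want a sense of the task's shape before downloading the file, a SkyPilot task YAML generally looks like the sketch below (the resource values, file names, and commands are illustrative assumptions, not the contents of the linked file):
resources:
  accelerators: H100:8   # request all 8 GPUs on a node
num_nodes: 1
setup: |
  # placeholder: install your finetuning dependencies
  pip install -r requirements.txt
run: |
  # placeholder entrypoint for the gpt-oss-20b finetuning job
  python finetune.py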

Billing

Compute Billing

Instant Clusters offers two compute billing options: reserved and on-demand.
  • Reservations – Credits for the full reserved duration are charged upfront (deducted from your credit balance) once the cluster is provisioned. Any usage beyond the reserved capacity is billed at on-demand rates.
  • On-Demand – Pay only for the time your cluster is running, with no upfront commitment.
See our pricing page for current rates.

Storage Billing

Storage is billed on a pay-as-you-go basis, as detailed on our pricing page. You can freely increase or decrease your storage volume size, with all usage billed at the same rate.

Viewing Usage and Invoices

You can view your current usage anytime on the Billing page in Settings. Each invoice includes a detailed breakdown of reservation, burst, and on-demand usage for compute and storage.

Cluster and Storage Lifecycles

Clusters and storage volumes follow different lifecycle policies:
  • Compute Clusters – Clusters are automatically decommissioned when their reservation period ends. To extend a reservation, please contact your account team.
  • Storage Volumes – Storage volumes are persistent and remain available as long as your billing account is in good standing. They are not automatically deleted.

Running Out of Credits

When your credits are exhausted, resources behave differently depending on their type:
  • Reserved Compute – Existing reservations remain active until their scheduled end date. Any additional on-demand capacity used to scale beyond the reservation is decommissioned.
  • Fully On-Demand Compute – Clusters are first paused and then decommissioned if credits are not restored.
  • Storage Volumes – Access is revoked first, and the data is later decommissioned.
You will receive alerts before these actions take place. For questions or assistance, please contact your billing team.