Overview
nanochat is Andrej Karpathy’s end-to-end ChatGPT clone that demonstrates how a full conversational AI stack, from tokenizer to web UI, can be trained and deployed for about $100 on 8×H100 hardware. In this guide, you’ll learn how to train and deploy nanochat using Together’s Instant Clusters. The entire process takes approximately 4 hours on an 8×H100 cluster and includes:
- Training a BPE tokenizer on FineWeb-Edu
- Pretraining a base transformer model
- Midtraining on curated tasks
- Supervised fine-tuning for conversational alignment
- Deploying a FastAPI web server with a chat interface
Prerequisites
Before you begin, make sure you have:
- A Together AI account with access to Instant Clusters
- Basic familiarity with SSH and command line operations
- `kubectl` installed on your local machine (installation guide)
Training nanochat
Step 1: Create an Instant Cluster
First, let’s create an 8×H100 cluster to train nanochat.
- Log into api.together.ai
- Click GPU Clusters in the top navigation menu
- Click Create Cluster
- Select On-demand capacity
- Choose 8xH100 as your cluster size
- Enter a cluster name (e.g., `nanochat-training`)
- Select Slurm on Kubernetes as the cluster type
- Choose your preferred region
- Create a shared volume with at least 1 TB of storage
- Click Preview Cluster and then “Confirm & Create”

For detailed information about Instant Clusters features and options, see the Instant Clusters documentation.
Step 2: SSH into Your Cluster
From the Instant Clusters UI, you’ll find SSH access details for your cluster; the full connection command can be copied from the Instant Clusters dashboard. The `-o ServerAliveInterval=60` option sends a keepalive to the SSH server every 60 seconds, which keeps the TCP session alive even when there’s no terminal input or output for long stretches during training.
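For example, a connection command with the keepalive option might look like this (the user and hostname below are placeholders; copy the real command from your dashboard):

```shell
# Placeholder host; use the exact command shown in the Instant Clusters UI.
ssh -o ServerAliveInterval=60 ubuntu@<login-node-address>
```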
Once connected, you’ll be in the login node of your Slurm cluster.
Step 3: Clone nanochat and Set Up Environment
Let’s clone the nanochat repository and set up the required dependencies.
Step 4: Access GPU Resources
Use Slurm’s `srun` command to allocate 8 GPUs for your training job:
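One common form of the allocation is shown below; the exact flags depend on your Slurm configuration:

```shell
# Request all 8 GPUs and an interactive shell on the compute node.
srun --gpus=8 --pty bash
```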

Step 5: Configure Cache Directory
To optimize data loading performance, set the nanochat cache directory to the `/scratch` volume, which is optimized for high-throughput I/O:
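Using the directory and variable named in the Troubleshooting section, the setting is a one-line export:

```shell
# Point nanochat's caches, checkpoints, and artifacts at the fast volume.
export NANOCHAT_BASE_DIR=/scratch
```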
This setting is picked up by the `speedrun.sh` file and ensures that dataset streaming, checkpoints, and intermediate artifacts don’t bottleneck your training.
This step is critical: without it, you’ll notice during training that your FLOP utilization is only ~13% instead of ~50%, due to dataloading bottlenecks.
Step 6: Run the Training Pipeline
Now you’re ready to kick off the full training pipeline! nanochat includes a `speedrun.sh` script that orchestrates all training phases:
- Tokenizer Training - Trains a GPT-4 style BPE tokenizer on FineWeb-Edu data
- Base Model Pretraining - Trains the base transformer model with rotary embeddings and Muon optimizer
- Midtraining - Fine-tunes on a curated mixture of SmolTalk, MMLU, and GSM8K tasks
- Supervised Fine-Tuning (SFT) - Aligns the model for conversational interactions
- Evaluation - Runs CORE benchmarks and generates a comprehensive report
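From the repo root, launching the pipeline could look like this; wrapping it in `screen` is an optional suggestion so the run survives SSH disconnects:

```shell
# -L logs output to screenlog.0; -S names the session for easy reattach.
screen -L -S nanochat bash speedrun.sh
```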

While training runs, keep an eye on these metrics in the logs:
- Model FLOPs Utilization (MFU): should be around 50% for optimal performance
- tok/sec: Tracks tokens processed per second of training
- Step timing: Each step should complete in a few seconds
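As a rough cross-check on the MFU number, it can be estimated from throughput with the ~6N FLOPs-per-token rule of thumb; the parameter count and tokens/sec below are illustrative assumptions, not nanochat’s actual configuration:

```shell
# MFU ≈ (6 * params * tok/sec) / (GPUs * peak FLOPs/sec per GPU)
# H100 BF16 dense peak ≈ 989 TFLOPS.
awk -v p=560000000 -v t=1200000 -v g=8 \
  'BEGIN { printf "MFU: %.2f\n", (6 * p * t) / (g * 989e12) }'
# prints: MFU: 0.51
```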
All checkpoints, logs, and the final report are written under `$NANOCHAT_BASE_DIR`.

nanochat Inference
Step 1: Download Your Cluster’s Kubeconfig
While training is running (or after it completes), download your cluster’s kubeconfig from the Together AI dashboard. This will allow you to access the cluster using kubectl.
- Go to your cluster in the Together AI dashboard
- Click on the View Kubeconfig button
- Copy and save the kubeconfig file to your local machine (e.g., `~/.kube/nanochat-cluster-config`)

Step 2: Access the Compute Pod via kubectl
From your local machine, set up kubectl access to your cluster:
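With the kubeconfig saved locally, access might look like the following; the kubeconfig path and pod name are placeholders, and the `slurm` namespace matches the Troubleshooting section:

```shell
export KUBECONFIG=~/.kube/nanochat-cluster-config
kubectl -n slurm get pods                      # find the compute pod's name
kubectl -n slurm exec -it <pod-name> -- bash   # open a shell inside it
```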
Step 3: Launch the nanochat Web Server
Now that training is complete and your environment is set up, launch the FastAPI web server:
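The exact command depends on how nanochat’s server is invoked; one sketch, assuming the `chat_web.py` script mentioned in Next Steps lives under `scripts/` and is run from the repo root:

```shell
# Serves the chat UI via FastAPI; run inside the compute pod.
python -m scripts.chat_web
```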
Step 4: Port Forward to Access the UI
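A sketch of the port-forward for this step; the pod name is a placeholder, local port 6818 is taken from the Troubleshooting section, and the remote port 8000 is an assumption about the web server’s listen port:

```shell
# Forward local 6818 to the web server port inside the compute pod.
kubectl -n slurm port-forward pod/<pod-name> 6818:8000
```

With the forward running, the chat UI would be reachable at http://localhost:6818.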
In a new terminal window on your local machine, set up port forwarding to access the web UI.
Step 5: Chat with nanochat!
Open your web browser, navigate to the forwarded localhost address, and start chatting with your model.
Understanding Training Costs and Performance
The nanochat training pipeline on 8×H100 Instant Clusters typically delivers:
- Training time: ~4 hours for the full speedrun pipeline
- Model Flops Utilization: ~50% (indicating efficient GPU utilization)
- Cost: Approximately $100 depending on your selected hardware and duration
- Final model: A fully functional conversational AI
See the generated `report.md` for detailed metrics.
Troubleshooting
GPU Not Available
If `nvidia-smi` doesn’t show GPUs after `srun`:
- Check that `NANOCHAT_BASE_DIR` is set to `/scratch`
- Ensure no other processes are using GPU memory
- The default batch sizes should work on H100 80GB
Web UI Not Accessible
If the chat interface doesn’t load:
- Verify the pod name matches exactly: `kubectl -n slurm get pods`
- Ensure the web server is running: check logs in the pod terminal
- Try a different local port if 6818 is in use
Next Steps
Now that you have nanochat running, you can:
- Experiment with different prompts - Test the model’s conversational abilities and domain knowledge
- Fine-tune further - Modify the SFT data or run additional RL training for specific behaviors
- Deploy to production - Extend `chat_web.py` with authentication and persistence layers
- Scale the model - Try the `run1000.sh` script for a larger model with better performance
- Integrate with other tools - Use the inference API to build custom applications
Additional Resources
- Instant Clusters Documentation
- Instant Clusters API Reference
- nanochat Repository
- Together AI Models