Getting started with nanochat
The most fun you can have is to train your own GPT-2-grade model and talk to it. The entire pipeline (tokenizer, pretraining, SFT, and serving) is orchestrated by a single script, so you can go from zero to a chatty model in a few hours.
Reproduce and talk to GPT-2
Clone the repo from GitHub, boot up an 8×H100 GPU node from your favorite provider (e.g., Lambda, RunPod, Vast.ai), and run the speedrun script. At ~$24/hour, pretraining takes ~3 hours, or roughly $75 total:
```bash
bash runs/speedrun.sh
```
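Run it inside a screen session so the ~3-hour job survives SSH disconnects. A minimal sketch using GNU screen (the session name and logfile are arbitrary choices; the `-Logfile` flag needs a reasonably recent screen):

```bash
# launch the speedrun in a named, logged screen session
screen -L -Logfile speedrun.log -S speedrun bash runs/speedrun.sh
# detach with Ctrl-a d; reattach later with:
screen -r speedrun
# follow progress from outside the session
tail -f speedrun.log
```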
Once it finishes, activate your venv and serve the chat UI:
```bash
source .venv/bin/activate
python -m scripts.chat_web
```
Visit the URL shown (e.g., http://YOUR_IP:8000/) and talk to your LLM: ask for stories, poems, Q&A, or why the sky is blue!
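If the port isn't reachable directly (cloud nodes often sit behind a firewall), a standard workaround is an SSH tunnel; `user@YOUR_IP` stands in for your node's login, and 8000 assumes the default port above:

```bash
# forward local port 8000 to the node, then browse http://localhost:8000/
ssh -N -L 8000:localhost:8000 user@YOUR_IP
```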
Requirements
- Python: See `.python-version` in the repo
- uv: For dependency management (recommended). Install it if needed (see the snippet after this list)
- PyTorch: With CUDA for GPU training; installed via uv
- GPU: 8×H100 recommended; 8×A100 works but slower. Single GPU works with gradient accumulation: omit `torchrun` and the code adapts automatically.
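If uv isn't installed yet, the standalone installer from the uv documentation is one way to get it:

```bash
# install uv (see https://docs.astral.sh/uv/ for alternatives)
curl -LsSf https://astral.sh/uv/install.sh | sh
```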
Installation
```bash
git clone https://github.com/karpathy/nanochat
cd nanochat
uv sync
```
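To sanity-check that the CUDA build of PyTorch landed in the venv, a quick probe using only standard torch calls:

```bash
source .venv/bin/activate
python -c "import torch; print(torch.__version__, torch.cuda.is_available(), torch.cuda.device_count())"
```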
For detailed setup, see installation.
Hardware notes
- 8×H100: Optimal; ~3 hours to GPT-2 grade. ~$24/hour on most cloud providers.
- 8×A100: Works, slightly slower. Good option if H100 nodes aren't available.
- Single GPU: Omit
torchrun; gradient accumulation kicks in; ~8× longer. Same quality, just more wall-clock time. - Less than 80GB VRAM: Reduce
--device_batch_size(e.g., 16, 8, 4, 2, 1). Start with 16 and decrease until it fits. - CPU/MPS: See runs/runcpu.sh for a minimal example. The model is shrunk to fit; useful for debugging.
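To make the single-GPU and low-VRAM notes concrete, here is a sketch of the launch modes. The module name `scripts.base_train` and the `--` argument separator are assumptions modeled on the speedrun script; only `--device_batch_size` is named in this guide:

```bash
# 8 GPUs: torchrun launches one process per GPU
torchrun --standalone --nproc_per_node=8 -m scripts.base_train

# single GPU: same module without torchrun; gradient accumulation
# makes up the batch, at ~8x the wall-clock time
python -m scripts.base_train

# under 80GB VRAM: shrink the per-device batch until it fits
python -m scripts.base_train -- --device_batch_size=16
```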
What happens next
After training completes, the script saves checkpoints and prepares the chat model. Run `python -m scripts.chat_web` to serve the web UI. From there you can iterate: customize the model with synthetic data (see the guides), add new abilities, or push the research frontier with scaling experiments.