Getting started with nanochat
The most fun you can have is to train your own GPT-2-grade model and talk to it. The entire pipeline (tokenizer, pretraining, SFT, and serving) is orchestrated by a single script, so you can go from zero to a chatty model in a few hours.
Reproduce and talk to GPT-2
Clone the repo from GitHub, boot up an 8×H100 GPU node from your favorite provider (e.g., Lambda, RunPod, Vast.ai), and run the speedrun script. At ~$24/hour, pretraining takes ~3 hours, or roughly $75 total:
```bash
bash runs/speedrun.sh
```
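Run it inside a screen session so the ~3-hour job survives SSH disconnects. A minimal sketch using GNU screen (the session name and logfile are arbitrary choices; the `-Logfile` flag needs a reasonably recent screen):

```bash
# launch the speedrun in a named, logged screen session
screen -L -Logfile speedrun.log -S speedrun bash runs/speedrun.sh
# detach with Ctrl-a d; reattach later with:
screen -r speedrun
# follow progress from outside the session
tail -f speedrun.log
```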
Once it finishes, activate your venv and serve the chat UI:
```bash
source .venv/bin/activate
python -m scripts.chat_web
```
Visit the URL shown (e.g., http://YOUR_IP:8000/) and talk to your LLM: ask for stories, poems, Q&A, or why the sky is blue!
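If the port isn't reachable directly (cloud nodes often sit behind a firewall), a standard workaround is an SSH tunnel; `user@YOUR_IP` stands in for your node's login, and 8000 assumes the default port above:

```bash
# forward local port 8000 to the node, then browse http://localhost:8000/
ssh -N -L 8000:localhost:8000 user@YOUR_IP
```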
Requirements
- Python: See `.python-version` in the repo
- uv: For dependency management (recommended). Install it if needed (see the snippet after this list)
- PyTorch: With CUDA for GPU training; installed via uv
- GPU: 8×H100 recommended; 8×A100 works but slower. Single GPU works with gradient accumulation: omit `torchrun` and the code adapts automatically.
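If uv isn't installed yet, the standalone installer from the uv documentation is one way to get it:

```bash
# install uv (see https://docs.astral.sh/uv/ for alternatives)
curl -LsSf https://astral.sh/uv/install.sh | sh
```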
Installation
```bash
git clone https://github.com/karpathy/nanochat
cd nanochat
uv sync
```
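To sanity-check that the CUDA build of PyTorch landed in the venv, a quick probe using only standard torch calls:

```bash
source .venv/bin/activate
python -c "import torch; print(torch.__version__, torch.cuda.is_available(), torch.cuda.device_count())"
```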
For detailed setup, see installation.
Hardware notes
- 8×H100: Optimal; ~3 hours to GPT-2 grade. ~$24/hour on most cloud providers.
- 8×A100: Works, slightly slower. Good option if H100 nodes aren't available.
- Single GPU: Omit
torchrun; gradient accumulation kicks in; ~8× longer. Same quality, just more wall-clock time. - Less than 80GB VRAM: Reduce
--device_batch_size(e.g., 16, 8, 4, 2, 1). Start with 16 and decrease until it fits. - CPU/MPS: See runs/runcpu.sh for a minimal example. The model is shrunk to fit; useful for debugging.
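To make the single-GPU and low-VRAM notes concrete, here is a sketch of the launch modes. The module name `scripts.base_train` and the `--` argument separator are assumptions modeled on the speedrun script; only `--device_batch_size` is named in this guide:

```bash
# 8 GPUs: torchrun launches one process per GPU
torchrun --standalone --nproc_per_node=8 -m scripts.base_train

# single GPU: same module without torchrun; gradient accumulation
# makes up the batch, at ~8x the wall-clock time
python -m scripts.base_train

# under 80GB VRAM: shrink the per-device batch until it fits
python -m scripts.base_train -- --device_batch_size=16
```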
What happens next
After training completes, the script saves checkpoints and prepares the chat model. Run `python -m scripts.chat_web` to serve the web UI. From there you can iterate: customize the model with synthetic data (see the guides), add new abilities, or push the research frontier with scaling experiments.