Getting started with nanochat

The most fun you can have is to train your own GPT-2 and talk to it. The entire pipeline—tokenizer, pretraining, SFT, and serving—is orchestrated by a single script, so you can go from zero to a chatty model in a few hours.

Reproduce and talk to GPT-2

Clone the repo from GitHub, then boot up an 8×H100 GPU node from your favorite provider (e.g., Lambda, RunPod, Vast.ai). At ~$24/hour for ~3 hours of pretraining, the whole speedrun costs roughly $75. On the node, run:

bash runs/speedrun.sh
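Because the run takes hours, it helps to wrap it in a screen session so it survives a dropped SSH connection. A minimal sketch (the session name and log file name here are illustrative choices, not mandated by the script):

```shell
# Launch the speedrun in a detached screen session, logging output to a file
# ("speedrun" session name and speedrun.log are illustrative)
screen -dmS speedrun -L -Logfile speedrun.log bash runs/speedrun.sh
# Follow progress from any shell with:  tail -f speedrun.log
# Reattach to the session with:        screen -r speedrun
```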

Run this in a screen session, since it takes ~3 hours. Once done, activate your venv and serve the chat UI:

source .venv/bin/activate
python -m scripts.chat_web

Visit the URL shown (e.g., http://YOUR_IP:8000/). Talk to your LLM—stories, poems, Q&A, or ask why the sky is blue!
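If you'd rather check from the terminal first, a quick probe of the port works. This assumes port 8000 (matching the example URL) and makes no assumption about the server's API routes:

```shell
# Print the HTTP status code of the chat UI root; "000" or the fallback
# message means the server is not reachable yet
curl -s -o /dev/null -w "%{http_code}\n" http://localhost:8000/ || echo "server not reachable"
```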

Requirements

A multi-GPU node (the speedrun targets 8×H100), plus git and the uv package manager for Python dependencies.

Installation

git clone https://github.com/karpathy/nanochat
cd nanochat
uv sync
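To confirm the sync worked, you can run Python through uv, which resolves to the project's .venv automatically:

```shell
# Should print the Python version that uv sync resolved for the project
uv run python -c "import sys; print(sys.version.split()[0])"
```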

For detailed setup, see installation.

Hardware notes

The speedrun assumes a single 8×H100 node at ~$24/hour (available from providers such as Lambda, RunPod, or Vast.ai). With fewer or smaller GPUs, expect correspondingly longer runs.

What happens next

After training completes, the script saves checkpoints and prepares the chat model. Run python -m scripts.chat_web to serve the web UI. From there you can iterate: customize the model with synthetic data (see the guides), add new abilities, or push the research frontier with scaling experiments.