Train Your Own ChatGPT-Style LLM for Under $100
nanochat is the simplest experimental harness for training LLMs. Train your own GPT-2-grade model in ~3 hours on an 8×H100 GPU node for ~$73, then talk to it in a familiar ChatGPT-like web UI.
In 2019, training GPT-2 cost approximately $50,000. Thanks to seven years of advances across the stack (faster GPUs, better algorithms, and scaling insights), nanochat lets you outperform GPT-2 (1.6B) on the CORE metric in ~3 hours for ~$73. The codebase is minimal, hackable, and covers all major LLM stages: tokenization, pretraining, finetuning, evaluation, inference, and a chat UI.
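As a sanity check on the headline number, here is a back-of-the-envelope cost calculation. The hourly rate is an assumption based on typical cloud rental prices, not an official quote:

```python
# Rough cost check; the $24/hour figure for an 8xH100 node is an assumed
# market rental rate, not a quoted price.
hourly_rate_usd = 24.0  # assumed rental rate for an 8xH100 node
training_hours = 3.0    # approximate speedrun duration
cost = hourly_rate_usd * training_hours
print(f"~${cost:.0f}")  # lands close to the ~$73 quoted above
```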
Created by Andrej Karpathy, nanochat is not a framework with endless configs: it is a single cohesive pipeline you can read, modify, and run from start to finish.
Run `bash runs/speedrun.sh` on an 8×H100 node. In ~3 hours, train a GPT-2-grade model and talk to it. Quick start →

Tokenization, pretraining, SFT, RL, evaluation: all in one minimal, readable codebase under 10K lines. Explore features →

Scaling laws, miniseries training, CORE metric evaluation. Help beat the GPT-2 time record. Research →

Serve your model with a ChatGPT-like web interface. Stories, poems, Q&A: talk to your LLM. Chat UI →

Every stage of the LLM pipeline, from raw text to a chatty model, in one minimal codebase:
nanochat is a complete end-to-end pipeline. You start with raw text, train a tokenizer, pretrain the base model on FineWeb-Edu, then fine-tune it for chat using SmolTalk and other datasets. The result is a model you can serve locally and talk to via a ChatGPT-style interface—all from a minimal, readable codebase.
Single GPU? It still works: gradient accumulation kicks in automatically to reach the same effective batch size, so training simply takes longer. Running on CPU or Apple Silicon? See runcpu.sh for a minimal example.
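The idea behind gradient accumulation can be sketched in a few lines of plain Python (an illustrative toy on a 1-D linear model, not nanochat's actual training loop; the function names here are hypothetical): gradients from several micro-batches are averaged before a single optimizer step, so the update matches what one large batch would produce.

```python
def grad_mse(w, xs, ys):
    # Gradient of mean((w*x - y)^2) over one (micro-)batch, for a 1-D linear model.
    n = len(xs)
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / n

def accumulated_update(w, micro_batches, lr=0.01):
    # Sum scaled micro-batch gradients, then take ONE optimizer step:
    # same update as a single pass over the full batch, but less memory at once.
    g = 0.0
    for xs, ys in micro_batches:
        g += grad_mse(w, xs, ys) / len(micro_batches)
    return w - lr * g
```

With equal-size micro-batches, the accumulated gradient is exactly the full-batch gradient, which is why the single-GPU path changes wall-clock time but not the math.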
nanochat is ideal for researchers experimenting with LLM training, students learning how language models work end-to-end, hobbyists who want to train and talk to their own model, and anyone curious about building a ChatGPT-style system on a budget. No enterprise tooling—just clean, readable Python and a single coherent pipeline.
Want to chat with a nanochat model without training? Try the hosted demo—no setup required. Ask for stories, poems, or why the sky is blue.
Try nanochat online →

nanochat covers every stage from raw data to a chatty model:
Data comes from FineWeb-Edu, SmolTalk, and task-specific datasets. See the file structure for where everything lives.
The primary metric is time to GPT-2: the wall-clock time needed to outperform GPT-2 (1.6B) on the CORE metric on an 8×H100 node.
Dive deeper into documentation, research, and resources.