About nanochat
nanochat is the simplest experimental harness for training large language models (LLMs). Created by Andrej Karpathy, it is designed to run on a single GPU node with minimal, hackable code that covers all major LLM stages.
Philosophy
nanochat is not an exhaustively configurable LLM "framework." There are no giant configuration objects, model factories, or if-then-else monsters. It is a single, cohesive, minimal, readable, hackable, maximally-forkable strong baseline codebase designed to run from start to finish and produce a ChatGPT-style model you can talk to.
Goals
- Accessibility: Work end-to-end on budgets under $1,000. Cost and cognitive complexity matter. No enterprise licenses or cloud lock-in.
- Speed to GPT-2: Beat the GPT-2 CORE score (0.256525) as fast as possible. Currently ~3 hours. The leaderboard tracks progress.
- Simplicity: Around 8,000 lines of code, mostly Python (PyTorch) plus a little Rust for tokenizer training.
- Full stack: Tokenization, pretraining, finetuning, evaluation, inference, and chat UI—all in one place. Fork it, hack it, extend it.
Origin
The name derives from nanoGPT, Karpathy's earlier project, which covered only pretraining. nanochat is also inspired by modded-nanoGPT, which gamified nanoGPT with clear metrics and a leaderboard. Both are available on GitHub.
Community
nanochat has an active community. Use GitHub Discussions for questions, the Issues tab for bug reports, and the #nanochat Discord channel for real-time chat. Researchers can help improve the time-to-GPT-2 metric; see contributing.
Try it
You can try nanochat at nanochat.karpathy.ai, or clone the repo from GitHub and train your own model. The speedrun produces a model you can chat with in a familiar web interface.
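For the DIY path, a minimal sketch of the workflow, assuming the standard GitHub location of the repo and the `speedrun.sh` entry point and `scripts.chat_web` serving module described in the project's README (exact names may change as the repo evolves; a real run needs a GPU machine and several hours):

```shell
# Sketch: train your own nanochat model end to end on a GPU node.
git clone https://github.com/karpathy/nanochat.git
cd nanochat
bash speedrun.sh            # tokenizer -> pretrain -> finetune -> eval

# After training, serve the chat web UI locally and talk to your model:
python -m scripts.chat_web
```

The speedrun script strings together the full-stack stages listed under Goals, so reading it is also a good map of the codebase.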