nanochat Research
If you're a researcher, nanochat offers scripts and benchmarks for improving micro model training. The main goal: beat GPT-2 in less wall-clock time. The codebase is minimal enough to iterate quickly: change something, run a d12 or d16, and see if it helped.
Key scripts
runs/scaling_laws.sh — Scaling law experiments
runs/miniseries.sh — Miniseries of models at increasing scales
See the Jan 7 miniseries v1 discussion for documentation.
Quick iteration
For quick ~5-minute pretraining runs, train a d12 (GPT-1 sized) model:
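# note: the intervals below are chosen to effectively skip CORE evaluation,
# sampling, and checkpointing during the run, keeping iteration fast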
OMP_NUM_THREADS=1 torchrun --standalone --nproc_per_node=8 -m scripts.base_train -- \
--depth=12 \
--run="d12" \
--model-tag="d12" \
--core-metric-every=999999 \
--sample-every=-1 \
--save-every=-1
Change something, re-run, and see if it helped. Iterate on d12, d16, etc.
Approach
Use depth as the single dial of complexity: sweeping depth yields a series of increasingly powerful models. Set the data budget to compute-optimal, train a miniseries, and compare it against the GPT-2 and GPT-3 model series. Beating GPT-2 in less wall-clock time is the current target.
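To make the compute-optimal setup concrete, here is a back-of-the-envelope sketch in Python. It assumes the width-tied-to-depth convention (width = 64 × depth), the standard ~12 · depth · width² estimate for non-embedding transformer parameters, and the Chinchilla heuristic of ~20 tokens per parameter; these numbers are illustrative assumptions, not taken from this page.

# Back-of-the-envelope compute-optimal budget per depth.
# Assumptions (illustrative): width = 64 * depth, non-embedding
# params ~= 12 * depth * width**2, and ~20 tokens per parameter.

def budget(depth: int, tokens_per_param: float = 20.0) -> tuple[int, float]:
    width = 64 * depth                  # width tied to depth: one dial
    params = 12 * depth * width ** 2    # rough non-embedding parameter count
    tokens = tokens_per_param * params  # compute-optimal data budget
    return params, tokens

for d in (12, 16, 20, 26):
    p, t = budget(d)
    print(f"d{d}: ~{p / 1e6:.0f}M params, ~{t / 1e9:.1f}B tokens")

Under these assumptions a d12 lands near GPT-1 scale, consistent with the quick-iteration recipe above.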
CORE metric
The CORE score (from the DCLM paper) is the primary benchmark. GPT-2 (1.6B) target: 0.256525. nanochat evaluates this in nanochat/core_eval.py. Beating this score in the least wall-clock time is the main research target—currently ~3 hours on 8×H100.
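For intuition, here is a minimal sketch of how CORE aggregates per-task results, assuming the DCLM recipe of re-centering each task's accuracy against its random-guess baseline and averaging the centered scores. The task names and numbers are hypothetical; see nanochat/core_eval.py for the actual implementation.

def centered_accuracy(acc: float, baseline: float) -> float:
    # 0.0 = chance-level performance, 1.0 = perfect
    return (acc - baseline) / (1.0 - baseline)

def core_score(results: dict[str, tuple[float, float]]) -> float:
    # results maps task -> (accuracy, random-guess baseline)
    centered = [centered_accuracy(acc, base) for acc, base in results.values()]
    return sum(centered) / len(centered)

# hypothetical numbers, just to show the shape of the computation
print(core_score({"hellaswag": (0.40, 0.25), "arc_easy": (0.55, 0.25)}))  # 0.30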
Community & discussions
See the GitHub Discussions for guides, the Jan 7 miniseries documentation, and community experiments. New ideas for speeding up time-to-GPT-2 are especially welcome.