nanoGPT
last updated: Oct 20, 2023
https://github.com/karpathy/nanoGPT
The simplest, fastest repository for training/finetuning medium-sized GPTs.
It's a re-write of minGPT, which I think became too complicated, and which I am hesitant to now touch. Still under active development, currently working to reproduce GPT-2 on the OpenWebText dataset. The code itself aims by design to be plain and readable: train.py is a ~300-line boilerplate training loop and model.py is a ~300-line GPT model definition, which can optionally load the GPT-2 weights from OpenAI. That's it.
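For context, a minimal sketch of the pretrained-weights path, assuming a checkout of the repo with torch, transformers, and tiktoken installed; GPT.from_pretrained and generate are the hooks model.py exposes for this, though exact signatures may drift between versions:

    import torch
    import tiktoken
    from model import GPT  # nanoGPT's ~300-line model definition

    # from_pretrained copies the OpenAI GPT-2 weights (fetched via
    # HuggingFace transformers) into nanoGPT's own GPT module.
    # 'gpt2' is the 124M-parameter model.
    model = GPT.from_pretrained('gpt2')
    model.eval()

    # Encode a prompt with GPT-2's BPE tokenizer, as the repo does.
    enc = tiktoken.get_encoding('gpt2')
    idx = torch.tensor([enc.encode('Hello, world')], dtype=torch.long)

    # generate() extends the sequence one token at a time, sampling
    # from the model's predicted distribution at each step.
    with torch.no_grad():
        out = model.generate(idx, max_new_tokens=20, temperature=0.8, top_k=200)
    print(enc.decode(out[0].tolist()))

Training goes through the other file: python train.py with a config file (e.g. config/train_gpt2.py in the repo) drives the ~300-line loop end to end.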
https://www.youtube.com/watch?v=kCc8FmEb1nY
"Let's build GPT: from scratch, in code, spelled out."
Walks through an even more basic version of this code.