Neural Networks: Zero to Hero
last updated: Oct 20, 2023
https://www.youtube.com/playlist?list=PLAqhIrjkxbuWI23v9cThsA9GvCAUhRvKZ
A series of lectures from Andrej Karpathy building neural networks from the ground up. Via Michael Nielsen.
A hands-on explanation with a nice combination of theory and practice. Focused on language models, with the promise that it'll build up to modern transformer models.
I really enjoyed reading the transformer code here: https://github.com/karpathy/makemore/blob/master/makemore.py. The key code is about 70 lines, most of which is very straightforward boilerplate. The core of it is maybe 20 lines, and that's what implements GPT-2!
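To give a feel for how small that core can be, here is a minimal sketch of single-head causal self-attention, the central operation in a GPT-style transformer. This is not the code from makemore.py (which uses PyTorch); it is a dependency-free illustration in plain Python, and the names matmul, softmax, causal_self_attention, w_q, w_k and w_v are my own hypothetical choices. It assumes one head, no batching, no output projection, and random weights rather than trained ones.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    """Numerically stable softmax; entries equal to -inf get weight 0."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def causal_self_attention(x, w_q, w_k, w_v):
    """Single-head causal self-attention over a sequence x of shape (T, C)."""
    q, k, v = matmul(x, w_q), matmul(x, w_k), matmul(x, w_v)
    scale = 1.0 / math.sqrt(len(q[0]))
    # scores[t][s]: raw affinity between query position t and key position s
    scores = matmul(q, [list(col) for col in zip(*k)])
    weights = []
    for t, row in enumerate(scores):
        # Causal mask: position t may only attend to positions s <= t
        masked = [val * scale if s <= t else float("-inf") for s, val in enumerate(row)]
        weights.append(softmax(masked))
    # Each output row is a weighted average of the value vectors it can see
    return matmul(weights, v), weights


# Toy usage with random inputs and weights (illustrative only, nothing is trained)
random.seed(0)
T, C = 4, 3  # sequence length and embedding size, chosen arbitrarily
x = [[random.gauss(0, 1) for _ in range(C)] for _ in range(T)]
w_q, w_k, w_v = ([[random.gauss(0, 1) for _ in range(C)] for _ in range(C)] for _ in range(3))
out, weights = causal_self_attention(x, w_q, w_k, w_v)
```

Each row of weights sums to 1 and is zero above the diagonal, so a token never looks at later tokens. A full model stacks this with multiple heads, an MLP, residual connections and layer normalization, which is where the rest of the lines go.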