Neural Networks: Zero to Hero

Oct 20, 2023

A series of lectures from Andrej Karpathy building neural networks from the ground up. Via Michael Nielsen.

A hands-on explanation with a nice combination of theory and practice. Focused on language models, with the promise that it'll build up to modern transformer models.

I really enjoyed reading the transformer code here: the key code is about 70 lines, most of it straightforward boilerplate. The core is maybe 20 lines, and that's what implements GPT-2!
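To give a flavor of how small that core is, here is a rough sketch (not Karpathy's actual code) of the central operation in a GPT block, causal self-attention, written in plain NumPy with made-up weight matrices:

```python
import numpy as np

def causal_self_attention(x, Wq, Wk, Wv):
    """Scaled dot-product attention with a causal mask.
    x: (T, C) sequence of token embeddings; Wq/Wk/Wv: (C, C) projection
    weights (hypothetical, stand-ins for learned parameters)."""
    T, C = x.shape
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    att = (q @ k.T) / np.sqrt(k.shape[-1])           # similarity scores, scaled
    mask = np.triu(np.ones((T, T), dtype=bool), 1)   # True above the diagonal
    att = np.where(mask, -np.inf, att)               # block attention to future tokens
    att = np.exp(att - att.max(axis=-1, keepdims=True))
    att = att / att.sum(axis=-1, keepdims=True)      # row-wise softmax
    return att @ v                                   # weighted sum of values
```

A real GPT wraps this in multiple heads, adds a feed-forward layer, residual connections, and layer norms, but those are the straightforward boilerplate the post mentions; the interesting part fits in roughly this many lines.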
