Neural Networks: Zero to Hero
A series of lectures from Andrej Karpathy building neural networks from the ground up. Via Michael Nielsen.
A hands-on explanation with a nice combination of theory and practice. Focused on language models, with the promise that it'll build up to modern transformer models.
I really enjoyed reading the transformer code here: https://github.com/karpathy/makemore/blob/master/makemore.py The key code is roughly 70 lines, most of which is straightforward boilerplate. The core is maybe 20 lines; that's what implements a GPT-2-style model!
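To get a feel for why the core is so small, here is a minimal NumPy sketch of causal self-attention, the central operation in that transformer code. This is not Karpathy's code; it is a simplified single-head version (no batching, no learned projections beyond the three weight matrices) written for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def causal_self_attention(x, Wq, Wk, Wv):
    # x: (T, C) sequence of T token embeddings of size C
    T, C = x.shape
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    att = (q @ k.T) / np.sqrt(C)                      # (T, T) attention scores
    mask = np.triu(np.ones((T, T), dtype=bool), k=1)  # upper triangle = future
    att[mask] = -np.inf                               # causal mask: no peeking ahead
    return softmax(att) @ v                           # weighted sum of value vectors

# Hypothetical toy example with random weights
rng = np.random.default_rng(0)
T, C = 5, 8
x = rng.normal(size=(T, C))
Wq, Wk, Wv = (rng.normal(size=(C, C)) for _ in range(3))
out = causal_self_attention(x, Wq, Wk, Wv)
print(out.shape)  # (5, 8): one output vector per input position
```

The real model stacks this (multi-headed, with layer norm and an MLP) a handful of times, which is most of the remaining lines.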