How does prompt caching work

last updated: Jan 17, 2026

https://ngrok.com/blog/prompt-caching/

Not satisfied with the answers in the vendor documentation, which do a good job of explaining how to use prompt caching but sidestep the question of what is actually being cached, I decided to go deeper. I went down the rabbit hole of how LLMs work until I understood the precise data providers cache, what it's used for, and how it makes everything faster and cheaper for everyone.

Goes deep into how LLMs work in general before arriving at how prompt caching works.

via Simon Willison
