Good Performance for Bad Days

last updated: Jun 15, 2025

The core of what I tried to communicate is that, in my view, much of the performance evaluation community is overly focused on happy-case performance (throughput, latency, scalability), and not focused enough on performance under saturation and overload.

Hypothesis 1: In large-scale systems, system performance is the single largest contributor to system availability.

Hypothesis 2: In large-scale systems, unpredictable system performance is the single largest contributor to system unavailability.


Closed benchmarks (those where each client waits for a response before sending its next request) are too kind to realistically reflect how performance changes with load, for the simple reason that they slow their offered load down when latency goes up.

The real world isn’t that kind to systems. In most cases, if you slow down, you just have more work to be done later.
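To make the difference concrete, here is a minimal sketch of a closed-loop and an open-loop load generator driving the same single FIFO server. Everything in it is an assumption chosen for illustration, not something from the talk: the function names `closed_loop` and `open_loop`, the fixed service times, the four clients, and the 50 requests/second arrival rate. Service times are deterministic so that the effect is easy to see without any randomness.

```python
def closed_loop(clients, service_time, duration):
    """Closed loop: each client sends its next request only after the
    previous one completes (zero think time). Returns per-request latencies."""
    server_free_at = 0.0
    next_send = [0.0] * clients  # time at which each client issues its next request
    latencies = []
    while True:
        # Serve requests in the order they were sent (FIFO).
        i = min(range(clients), key=lambda c: next_send[c])
        sent = next_send[i]
        if sent >= duration:
            break
        start = max(sent, server_free_at)
        done = start + service_time
        server_free_at = done
        latencies.append(done - sent)
        next_send[i] = done  # the client waits for its response before sending again
    return latencies


def open_loop(arrival_rate, service_time, duration):
    """Open loop: requests arrive at a fixed rate no matter how slowly the
    server responds. Returns per-request latencies."""
    server_free_at = 0.0
    latencies = []
    for k in range(int(arrival_rate * duration)):
        sent = k / arrival_rate  # arrival time is independent of completions
        start = max(sent, server_free_at)
        done = start + service_time
        server_free_at = done
        latencies.append(done - sent)
    return latencies


if __name__ == "__main__":
    # Hypothetical numbers: a "good day" service time and a 3x slower "bad day".
    for label, service_time in [("healthy", 0.01), ("degraded", 0.03)]:
        closed = closed_loop(clients=4, service_time=service_time, duration=10.0)
        opened = open_loop(arrival_rate=50.0, service_time=service_time, duration=10.0)
        print(label,
              "closed: requests =", len(closed), "max latency =", round(max(closed), 3),
              "| open: requests =", len(opened), "max latency =", round(max(opened), 3))
```

In this sketch the closed-loop run never reports a latency above `clients * service_time`, because the clients quietly send fewer requests when the server slows down. The open-loop run keeps sending at the same rate, so once the degraded server can no longer keep up, every request waits behind a growing backlog and latency climbs for as long as the overload lasts.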
