Faster zlib decompression on Apple M1

Aug 20, 2022

Most of the speedup just comes from reading and applying Fabian Giesen’s posts on Huffman decoding (see also his intro to dataflow graphs).

The goal was to use roughly the following variant-4-style loop to decode and unconditionally refill with eight-cycle latency on Apple M1...
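The loop itself is elided in the excerpt above. As a rough illustration of what "variant 4" usually refers to in Fabian Giesen's bit-reading posts, here is a minimal sketch of a branchless, unconditional refill for an LSB-first bit stream such as DEFLATE. The names `BitReader`, `load64le`, `refill` and `read_bits` are hypothetical and chosen for this example; this is not the code from the post, and it makes no claim about reaching the eight-cycle latency. It assumes the input is padded so that an 8-byte load near the end stays in bounds.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical LSB-first bit reader (DEFLATE bit order). */
typedef struct {
    const uint8_t *ptr;  /* first input byte not yet fully counted in bitbuf */
    uint64_t bitbuf;     /* pending bits, next bit to read is bit 0 */
    unsigned bitcount;   /* number of valid bits in bitbuf, at most 63 */
} BitReader;

/* Portable little-endian 64-bit load. Assumes 8 readable bytes at p. */
static uint64_t load64le(const uint8_t *p) {
    uint64_t v = 0;
    for (int i = 0; i < 8; i++)
        v |= (uint64_t)p[i] << (8 * i);
    return v;
}

/* Variant-4-style refill: no branches, safe to run on every iteration.
 * Bits loaded above the counted range are simply ORed in again by the
 * next refill, which is harmless because they are the same bits. */
static void refill(BitReader *br) {
    br->bitbuf |= load64le(br->ptr) << br->bitcount;
    br->ptr += (63 - br->bitcount) >> 3;  /* advance by whole bytes added */
    br->bitcount |= 56;                   /* bitcount is now in [56, 63] */
}

/* Refill unconditionally, then peek and consume n bits (n <= 56). */
static uint64_t read_bits(BitReader *br, unsigned n) {
    assert(n <= 56);
    refill(br);
    uint64_t value = br->bitbuf & ((UINT64_C(1) << n) - 1);
    br->bitbuf >>= n;
    br->bitcount -= n;
    return value;
}
```

In a real decoder the peeked bits would index a Huffman table and the consumed count would come from the table entry. The appeal of this shape is that the refill has no data-dependent branch, so it can sit in the loop body on every iteration.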

A very detailed analysis of how to speed up DEFLATE decoding, which also urges you to use zstd instead.
