#1 Speculative KV Coding: Losslessly Compressing KV Cache by Up to 4x
A researcher demonstrates a lossless compression technique for LLM key-value (KV) caches that achieves up to 4x compression using a lightweight predictor model. The method encodes only the residual differences between predicted and actual KV values, using arithmetic coding. Stacked with existing FP8 quantization, total compression reaches 6–8x, a potentially significant efficiency gain for LLM inference infrastructure.




