Blog

Speculative Decoding Is Not a Heuristic

Increase LLM Speed Without Sacrificing Quality

Speculative decoding is a technique that uses a fast draft model to accelerate inference with a slow target model. With the right verification procedure, speculative decoding can be lossless (i.e., it reproduces the quality of the target model). For token-by-token verification procedures, the criteria for lossless decoding can be written as a system of linear equations and inequalities. The acceptance rate, which directly controls model throughput, is also linear in the same variables. Thus, the optimal verification procedure is obtained by solving a linear program. This linear program is equivalent to an optimal transport problem from the draft distribution to the target distribution.
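As a rough illustration of what "lossless" means for a single token, the sketch below uses the widely known accept/resample rule: accept a drafted token x with probability min(1, target(x) / draft(x)), and on rejection resample from the normalized excess max(0, target(x) - draft(x)). This is an assumption chosen for illustration, not the linear-programming formulation from the post. The function names `verify_plan` and `output_distribution` and the toy distributions are hypothetical, and both distributions are assumed to share the same vocabulary.

```python
from fractions import Fraction


def verify_plan(draft, target):
    """Per-token acceptance probabilities and the resampling distribution.

    Assumes draft and target are dicts over the same tokens with exact
    Fraction probabilities (an illustrative setup, not a real model API).
    """
    accept = {
        x: min(Fraction(1), target[x] / draft[x]) if draft[x] > 0 else Fraction(0)
        for x in draft
    }
    # Probability mass the target assigns beyond what the draft already covers.
    excess = {x: max(Fraction(0), target[x] - draft[x]) for x in draft}
    total = sum(excess.values())
    residual = {x: (e / total if total > 0 else Fraction(0)) for x, e in excess.items()}
    return accept, residual


def output_distribution(draft, target):
    """Exact distribution of the emitted token, plus the overall acceptance rate."""
    accept, residual = verify_plan(draft, target)
    # The acceptance rate is a sum of terms linear in the acceptance probabilities.
    acceptance_rate = sum(draft[x] * accept[x] for x in draft)
    reject_mass = 1 - acceptance_rate
    out = {x: draft[x] * accept[x] + reject_mass * residual[x] for x in draft}
    return out, acceptance_rate


# Toy three-token vocabulary (made-up numbers).
draft = {"a": Fraction(1, 2), "b": Fraction(1, 4), "c": Fraction(1, 4)}
target = {"a": Fraction(1, 4), "b": Fraction(1, 4), "c": Fraction(1, 2)}

out, rate = output_distribution(draft, target)
print(out == target)  # True: the emitted token follows the target exactly
print(rate)           # 3/4, which equals the sum over tokens of min(draft, target)
```

In this toy case the emitted-token distribution matches the target exactly, and the acceptance rate equals the overlap between the two distributions, which is the most mass any coupling of draft and target can place on "the same token".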

Jan 21, 2026

Reed Meyerson

The Ancients had Ops

Reproducible and Auditable MLOps

While the “Dev” and “ML” parts of DevOps and MLOps are new, the “Ops” part is ancient, and the properties of good Ops have remained constant from antiquity to the present.

Nov 14, 2024

Reed Meyerson

Generative AI doesn’t need to be flashy

The case for boring generative AI

Generative AI is flashy, but it doesn’t need to be. Generative diffusion models are universal probability distribution approximators. Sometimes (rarely, but more often than never) you should consider them for solving your boring business problems.

Mar 19, 2023

Reed Meyerson