Techniques
Speculative Decoding
Definition
“
An inference optimization technique where a smaller, faster draft model generates candidate tokens that are then verified in parallel by the larger target model. Speculative decoding can speed up inference by 2-3x without changing the output distribution, making it valuable for reducing latency in production LLM deployments.
”
Related Terms
No related terms linked yet.
Explore all terms →