Speculative Decoding

Definition

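An inference optimization technique in which a smaller, faster draft model proposes several candidate tokens that the larger target model then verifies in a single parallel forward pass. Candidates are accepted or rejected by a rule that preserves the target model's output distribution, so the generated text is distributed exactly as if the target model had decoded on its own. Reported speedups are typically around 2-3x, depending on how often the draft model's tokens are accepted, which makes the technique valuable for reducing latency in production LLM deployments.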
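As a rough illustration of the draft-then-verify loop described above, here is a minimal sketch of one round of speculative sampling in plain Python. It assumes the commonly described acceptance rule (accept a drafted token with probability min(1, p/q), otherwise resample from the normalized positive part of p - q). The names `speculative_step`, `target_probs`, and `draft_probs` are hypothetical, and the two "models" are toy fixed distributions over a three-token vocabulary, not real LLMs.

```python
import random

def sample(probs, rng):
    """Draw an index from a discrete distribution given as a list of weights."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]

def speculative_step(target_probs, draft_probs, prefix, k, rng):
    """Run one draft-then-verify round and return the newly produced tokens."""
    # 1. The draft model proposes k tokens autoregressively.
    drafted, q_dists = [], []
    ctx = list(prefix)
    for _ in range(k):
        q = draft_probs(ctx)
        tok = sample(q, rng)
        drafted.append(tok)
        q_dists.append(q)
        ctx.append(tok)

    # 2. The target model scores every drafted position. A real system would
    #    do this in one batched forward pass; here it is a simple loop.
    p_dists = [target_probs(list(prefix) + drafted[:i]) for i in range(k + 1)]

    # 3. Accept or reject drafted tokens from left to right.
    out = []
    for i, tok in enumerate(drafted):
        p, q = p_dists[i], q_dists[i]
        if rng.random() < min(1.0, p[tok] / q[tok]):
            out.append(tok)  # accepted: keep the drafted token
            continue
        # Rejected: resample from the positive part of (p - q), then stop.
        residual = [max(0.0, pi - qi) for pi, qi in zip(p, q)]
        out.append(sample(residual, rng))
        return out

    # All k drafts accepted: the target contributes one extra token.
    out.append(sample(p_dists[k], rng))
    return out

# Toy stand-ins for the two models (assumptions for illustration only).
def target_probs(ctx):
    return [0.6, 0.3, 0.1] if len(ctx) % 2 == 0 else [0.2, 0.5, 0.3]

def draft_probs(ctx):
    return [0.4, 0.4, 0.2]

if __name__ == "__main__":
    rng = random.Random(0)
    print(speculative_step(target_probs, draft_probs, prefix=[], k=3, rng=rng))
```

Each round yields between 1 and k + 1 tokens, so the speedup in practice depends on how closely the draft distribution tracks the target. In this sketch the first emitted token follows the target distribution [0.6, 0.3, 0.1] even though it was proposed by the draft model, which is the sense in which the output distribution is unchanged.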
