This tutorial presents a comprehensive introduction to Speculative Decoding (SD), an advanced technique for LLM inference acceleration that has garnered significant research interest in recent years. SD is a decoding paradigm designed to mitigate the high inference latency of autoregressive decoding in LLMs. At each decoding step, SD first drafts several future tokens cheaply and then verifies them against the target model in parallel. Unlike traditional autoregressive decoding, this allows multiple tokens to be accepted per step, achieving 2x-4x speedups in LLM inference while leaving the target model's output distribution unchanged.
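Since the core draft-then-verify loop is compact, the sketch below illustrates a single SD step with the standard rejection-sampling verification (Leviathan et al., 2023; Chen et al., 2023). It is a minimal illustration, not a reference implementation from the tutorial: the callables `target_lm` and `draft_lm`, the draft length `gamma`, and the function name are all placeholders, assuming any pair of causal LMs that map token ids of shape `(1, seq_len)` to logits of shape `(1, seq_len, vocab_size)`.

```python
import torch

@torch.no_grad()
def speculative_step(target_lm, draft_lm, input_ids, gamma=4):
    """One draft-then-verify step: draft `gamma` tokens with the small model,
    verify them with a single target-model forward pass, and return the
    accepted continuation (at least one new token)."""
    ids = input_ids
    draft_probs = []
    # 1) Autoregressively draft `gamma` candidate tokens with the cheap model.
    for _ in range(gamma):
        q = torch.softmax(draft_lm(ids)[:, -1, :], dim=-1)  # (1, V)
        tok = torch.multinomial(q, num_samples=1)           # sample a draft token
        draft_probs.append(q)
        ids = torch.cat([ids, tok], dim=-1)
    # 2) Score the prefix plus all drafted tokens in ONE target forward pass.
    p_all = torch.softmax(target_lm(ids), dim=-1)           # (1, L + gamma, V)
    n_prefix = input_ids.shape[1]
    accepted = []
    for i in range(gamma):
        tok = ids[0, n_prefix + i]
        p = p_all[:, n_prefix + i - 1, :]  # target distribution at this position
        q = draft_probs[i]
        # 3) Accept the draft with prob min(1, p/q); this acceptance rule
        #    provably preserves the target model's output distribution.
        if torch.rand(()) < (p[0, tok] / q[0, tok]).clamp(max=1.0):
            accepted.append(tok.item())
        else:
            # 4) On rejection, resample from the residual max(p - q, 0).
            residual = (p - q).clamp(min=0)
            residual = residual / residual.sum()
            accepted.append(torch.multinomial(residual, 1).item())
            break
    else:
        # All gamma drafts accepted: sample one bonus token from the target.
        accepted.append(torch.multinomial(p_all[:, -1, :], 1).item())
    return accepted
```

Because each target-model forward pass now validates several drafted tokens at once, the expected number of tokens generated per pass exceeds one; this is the source of the reported 2x-4x wall-clock speedups.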
This tutorial delves into the latest techniques in SD, including draft-model architectures and verification strategies. It also explores the acceleration potential and future research directions of this promising field. We hope this tutorial elucidates the current research landscape and offers insights to researchers interested in Speculative Decoding, ultimately contributing to more efficient LLM inference.
Our tutorial will be held on January 19, 2025. All times are in Gulf Standard Time (GST), i.e., Abu Dhabi local time.
| Time | Section | Presenter |
|---|---|---|
| 09:00–09:40 | Part I: Introduction & Definition | Heming |
| 09:40–10:25 | Part II: History and A Taxonomy of Methods | Qian |
| 10:25–10:30 | Q&A Session I | |
| 10:30–11:00 | Coffee break | |
| 11:00–11:40 | Part III: Cutting-edge Algorithms | Heming |
| 11:40–12:10 | Part IV: Downstream Adaptations | Yongqi |
| 12:10–12:30 | Part V: Final Remarks and Outlook + Q&A Session II | Yongqi |
For further information, we recommend referring to our Survey and Reading List on Speculative Decoding; papers marked in bold there are discussed in detail during our tutorial.
@article{speculative-decoding-tutorial,
  author  = {Xia, Heming and Du, Cunxiao and Li, Yongqi and Liu, Qian and Li, Wenjie},
  title   = {COLING 2025 Tutorial: Speculative Decoding for Efficient LLM Inference},
  journal = {COLING 2025},
  year    = {2025},
}