How CentML Achieved 2x Inference Speed on DeepSeek-R1 using Speculative Decoding

Feb 24, 2025

Since the release of DeepSeek-R1, the open-source community has been working to optimize its inference speed. While low-level GPU optimizations have improved performance, CentML took it a step further with speculative decoding: by repurposing DeepSeek's Multi-Token Prediction (MTP) module as a draft model and implementing EAGLE-style recursive generation, we achieved a 2x speedup, generating up to 70 tokens/second.
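To make the idea concrete, here is a minimal, greedy-acceptance sketch of speculative decoding. This is not CentML's implementation: `draft_next` and `target_next` are hypothetical stand-ins for the cheap MTP head and the full DeepSeek-R1 model, and a production system verifies all drafted positions in one batched forward pass rather than one call per token.

```python
from typing import Callable, List

def speculative_decode(
    target_next: Callable[[List[int]], int],  # greedy next token from the full model
    draft_next: Callable[[List[int]], int],   # greedy next token from the draft model
    prompt: List[int],
    max_new_tokens: int,
    k: int = 4,                               # tokens drafted per verification round
) -> List[int]:
    """Greedy speculative decoding sketch.

    The draft model proposes k tokens autoregressively; the target model then
    checks them. Tokens are accepted while they match the target's own greedy
    choice; the first mismatch is replaced by the target's token, and drafting
    resumes from there.
    """
    tokens = list(prompt)
    while len(tokens) - len(prompt) < max_new_tokens:
        # 1. Draft: cheaply propose k candidate tokens.
        draft = []
        ctx = list(tokens)
        for _ in range(k):
            t = draft_next(ctx)
            draft.append(t)
            ctx.append(t)
        # 2. Verify: accept draft tokens while the target model agrees.
        #    (A real implementation scores all k positions in a single
        #    batched target forward pass -- that is the source of the speedup.)
        for t in draft:
            expected = target_next(tokens)
            if t == expected:
                tokens.append(t)          # accepted "for free"
            else:
                tokens.append(expected)   # correction from the target model
                break
    return tokens[len(prompt) : len(prompt) + max_new_tokens]
```

When most drafted tokens are accepted, each expensive target-model pass yields several tokens instead of one, which is how a well-aligned draft head like MTP can roughly double decoding throughput.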
