Blog

Updates

How CentML Achieved 2x Inference Speed on DeepSeek-R1 using Speculative Decoding

Feb 24, 2025

Since the release of DeepSeek-R1, the open-source community has been working to optimize its inference speed. While low-level GPU optimizations have improved performance, CentML went a step further with speculative decoding. By repurposing DeepSeek’s Multi-Token Prediction (MTP) module and implementing EAGLE-style recursive generation, we achieved a 2x speedup, generating up to 70 tokens/second.

Introducing ‘ECR Anywhere’: A New Tool for Simplifying Multi-Cloud Deployments

Nov 18, 2024

ECR Anywhere for Cross-Cloud Container Flexibility

From vendor lock-in and security overhead to reduced agility, multi-cloud deployments present some sizeable hurdles. With a new cross-cloud solution, ECR Anywhere, developers can now eliminate the complexity of native registries, allowing for secure, seamless multi-cloud deployment of Docker images on any Kubernetes cluster. Managing containerized applications across multiple […]

CentML’s New Platform Enables Rapid, Economical AI Deployment for All

Nov 4, 2024

The CentML team is thrilled to announce the launch of the CentML Platform — a frictionless and economical AI deployment solution for enterprises and startups alike. Since ChatGPT’s launch two years ago, GenAI has reshaped industries and unlocked new possibilities. Yet, for many businesses, adopting GenAI remains challenging. High costs, complex deployments, significant compute resource […]
