Author: ermek
A Technical Deep Dive into Pipeline Parallel Inference with CentML
With yesterday’s release of Llama-3.1-405B, we’re excited to announce that CentML’s recent contribution to vLLM, adding pipeline parallel inference support, […]
Hardware Efficiency in the Era of LLM Deployments
How CServe can make LLM deployment easy, efficient, and scalable