Author: ermek
![A Technical Deep Dive into Pipeline Parallel Inference with CentML](https://centml.ai/wp-content/uploads/2024/07/Pipeline_parallel_inference_main_image-900x505.jpg)
A Technical Deep Dive into Pipeline Parallel Inference with CentML
With yesterday’s release of Llama-3.1-405B, we’re excited to announce that CentML’s recent contribution to vLLM, adding pipeline parallel inference support, […]
![Hardware Efficiency in the Era of LLM Deployments](https://centml.ai/wp-content/uploads/2024/04/Hardware_Efficiency_img-900x505.jpg)
Hardware Efficiency in the Era of LLM Deployments
How CServe can make LLM deployment easy, efficient, and scalable