May 12, 2025
A team of researchers from CentML and the University of Toronto analyzed LLM parallelization methods and developed Seesaw, an LLM inference engine optimized for throughput-oriented tasks.
Guides
Sep 9, 2024
From optimizing existing resources to leveraging cloud-based solutions, learn how to contend with GPU shortages.
Guides
In this guide, we take a closer look at the core differences between TPUs and GPUs, their distinct roles, and how TPU-GPU synergy can supercharge your AI and machine learning (ML) models. Understanding the Basics: What Are Tensors, TPUs, and GPUs? Tensors A tensor is a multi-dimensional array of numbers that represents data across dimensions […]
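To make the tensor definition above concrete, here is a minimal sketch in plain Python (no ML framework assumed) showing how a nested list can represent tensors of increasing rank, with a small hypothetical `shape` helper that reads off the size along each dimension:

```python
# A tensor is a multi-dimensional array of numbers: a scalar is a
# rank-0 tensor, a vector is rank-1, a matrix is rank-2, and so on.
def shape(t):
    """Return the size along each dimension of a (regular) nested list."""
    dims = []
    while isinstance(t, list):
        dims.append(len(t))
        t = t[0]
    return tuple(dims)

vector = [1.0, 2.0, 3.0]             # rank-1 tensor, shape (3,)
matrix = [[1, 2], [3, 4], [5, 6]]    # rank-2 tensor, shape (3, 2)

print(shape(vector))  # (3,)
print(shape(matrix))  # (3, 2)
```

Hardware like TPUs and GPUs is built to run operations over exactly these multi-dimensional arrays in parallel.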
Guides
In this guide, we dig into some proven strategies and techniques to help you boost GPU performance. Armed with these tactics, you’ll be well prepared to refine your AI and ML deployments, optimizing them for maximum efficiency and speed. The Basics: GPU Performance and Testing Graphics Processing Units (GPUs) are powerful processors that are essential […]
Guides
Aug 26, 2024
Understanding GPU Cluster Basics A GPU cluster is an infrastructure powerhouse that combines multiple Graphics Processing Units (GPUs) spread across a computer network. Each machine in the cluster enables and accelerates computational tasks, and clusters can be broken down into three primary types. GPU clusters are particularly helpful for machine learning (ML) and artificial intelligence (AI) […]
Guides
In this guide, we take a closer look at the core differences between CPUs and GPUs, their distinct roles, and how combined CPU-GPU synergy can supercharge your AI and machine learning (ML) models. Understanding the Basics: What Are CPUs and GPUs? Central Processing Unit (CPU) CPUs are the backbone of computing, acting as the “brain” […]
Guides
Hyperparameter optimization (HPO), or hyperparameter tuning, is one of the most critical stages in your machine learning (ML) pipeline. It’s also one of the most resource-intensive. Because HPO is critical for your ML model architecture and quality, choosing the right hyperparameter values is essential. Ultimately, those choices impact your model’s efficiency and utility. Using GPU […]
Case Studies
In this case study, we take a closer look at how EquoAI reduced its LLM deployment costs, improved deployment efficiency, and drove significant competitive advantage with CentML GPU optimization. Meet EquoAI Founded in 2023, EquoAI evolved from researching Generative AI adoption barriers to providing GenAI solutions. Now, the company offers white-label RAG and data services […]
Updates
DeepView accurately predicts ML model performance across various cloud GPUs, helping you choose the most cost-effective option. It reveals whether upgrading to pricier GPUs like the H100 is truly beneficial for your specific workload, potentially saving time and resources. The tool also helps identify and resolve performance bottlenecks, ensuring optimal GPU utilization. Introduction Cloud computing […]
Case Studies
With yesterday’s release of Llama-3.1-405B, we’re excited to announce that CentML’s recent contribution to vLLM, adding pipeline parallel inference support, has significantly improved performance when deploying Llama-405B on multi-node setups with vLLM. We’re proud to join many other open-source contributors in the vLLM community, making Llama 3.1 405B available on the day of its release. […]
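As an illustration of the pipeline-parallel support mentioned above, a multi-node vLLM deployment can split a model across nodes with pipeline parallelism while using tensor parallelism within each node. The following launch fragment is a sketch, not a verbatim command from the post; the specific parallelism degrees (4 pipeline stages × 8-way tensor parallel) are assumptions chosen for illustration:

```shell
# Serve Llama 3.1 405B across 4 nodes (pipeline parallel) with
# 8 GPUs per node (tensor parallel). Degrees here are illustrative;
# choose them to match your actual cluster topology.
vllm serve meta-llama/Llama-3.1-405B \
    --pipeline-parallel-size 4 \
    --tensor-parallel-size 8
```

Pipeline parallelism lets the model's layers be staged across nodes, so inter-node traffic is limited to activations passed between stages rather than the all-to-all communication tensor parallelism would require.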
Updates
How CServe can make LLM deployment easy, efficient, and scalable