Platform Pricing
Free $10 credits for all new users (4 million tokens on Llama 3.1 405B)
CentML offers competitive pricing for GenAI model deployment, with flexible options for everything from small models to large-scale deployments.
Developer
Best performance – no hidden fees
- $10 free credits for all new users
- Full pay-as-you-go billing – per minute/per token
- On-demand dedicated endpoints – no rate limits
- Planner feature available for all deployments
- No daily limits
Enterprise
Custom solutions for scaling
- Custom pricing
- No rate limits
- Unlimited deployed models
- Dedicated and self-hosted deployments
- Guaranteed uptime SLA
- 24/7 tech support
- Plus all features from the Developer package
Platform Pricing Overview
Application deployments are billed through a credit-based system, where 1 CentML credit equals 1 USD. You can buy credits on the Platform from your Account page.
Serverless Endpoint usage is billed according to the total number of tokens generated and processed.
Model Size (parameters) | Price per 1M Tokens (USD) | Examples |
Small (1-4B) | $0.04 | Smaller language models |
Medium (7-11B) | $0.08 | General-purpose models |
Large (70-90B) | $0.50 | Complex AI applications |
X-Large (405B) | $2.50 | High-demand, intensive LLMs |
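The serverless rates reduce to one formula: cost in credits = (price per 1M tokens) × (tokens ÷ 1,000,000). Below is a minimal Python sketch of that arithmetic, assuming all processed and generated tokens are billed at the single rate listed for a tier. The tier keys and the functions `serverless_cost` and `tokens_for_credits` are hypothetical names for illustration, not part of any CentML API. The sketch uses `Decimal` to avoid floating-point rounding in money calculations, and it reproduces the 4 million token figure for the $10 new-user credit on a 405B model.

```python
from decimal import Decimal

# Prices per 1M tokens from the table above (1 credit = 1 USD).
# Hypothetical tier keys; assumes one rate covers processed and generated tokens.
PRICE_PER_MILLION_TOKENS = {
    "small": Decimal("0.04"),   # 1-4B parameters
    "medium": Decimal("0.08"),  # 7-11B parameters
    "large": Decimal("0.50"),   # 70-90B parameters
    "xlarge": Decimal("2.50"),  # 405B parameters
}


def serverless_cost(tier: str, tokens: int) -> Decimal:
    """Return the estimated cost in credits for `tokens` total tokens."""
    return PRICE_PER_MILLION_TOKENS[tier] * tokens / 1_000_000


def tokens_for_credits(tier: str, credits: int) -> int:
    """Return how many whole tokens a credit balance buys at a tier's rate."""
    # Multiply before dividing to keep the Decimal result exact for these prices.
    return int(Decimal(credits) * 1_000_000 / PRICE_PER_MILLION_TOKENS[tier])


if __name__ == "__main__":
    print(tokens_for_credits("xlarge", 10))     # 4000000
    print(serverless_cost("large", 2_500_000))  # 1.25
```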
Dedicated Deployments
Dedicated deployments are charged by hardware type and usage time, billed per minute.
Accelerator | Credits per hour (1 credit = 1 USD) |
NVIDIA L4 – 24GB | 0.30 |
NVIDIA A10G – 24GB | 0.30 |
NVIDIA A100 – 40GB | 1.10 |
AWS Inf2 – 32GB | 2.00 |
NVIDIA H100 – 80GB | 2.50 |
NVIDIA H200 – 141GB | 2.60 |
GCP v6e TPU – 32GB | 2.70 |
AMD MI300X – 192GB | 14.00 |
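Dedicated costs can be estimated the same way: credits = hourly rate × billable minutes ÷ 60. The Python sketch below assumes that partial minutes round up to the next whole minute and that running several accelerators multiplies the cost linearly; neither rule is stated on this page. The dictionary keys and the functions `billable_minutes` and `dedicated_cost` are hypothetical, for illustration only.

```python
from decimal import Decimal

# Credits per hour from the table above (1 credit = 1 USD).
CREDITS_PER_HOUR = {
    "NVIDIA L4 24GB": Decimal("0.30"),
    "NVIDIA A10G 24GB": Decimal("0.30"),
    "NVIDIA A100 40GB": Decimal("1.10"),
    "AWS Inf2 32GB": Decimal("2.00"),
    "NVIDIA H100 80GB": Decimal("2.50"),
    "NVIDIA H200 141GB": Decimal("2.60"),
    "GCP v6e TPU 32GB": Decimal("2.70"),
    "AMD MI300X 192GB": Decimal("14.00"),
}


def billable_minutes(seconds: int) -> int:
    """Assumption: any partial minute is rounded up to a whole minute."""
    return -(-seconds // 60)  # ceiling division on integers


def dedicated_cost(accelerator: str, seconds: int, count: int = 1) -> Decimal:
    """Estimated credits for `count` accelerators running for `seconds`.

    Assumption: cost scales linearly with the number of accelerators.
    """
    minutes = billable_minutes(seconds)
    # Multiply before dividing by 60 to avoid repeating-decimal rounding.
    return CREDITS_PER_HOUR[accelerator] * minutes * count / 60


if __name__ == "__main__":
    print(dedicated_cost("NVIDIA H100 80GB", 90 * 60))  # 3.75
    print(dedicated_cost("NVIDIA L4 24GB", 2710))       # 0.23 (rounded up to 46 min)
```

Under these assumptions, 90 minutes on one H100 costs 3.75 credits, and a 45-minute-10-second run on an L4 is billed as 46 minutes.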
Customized Plans
Looking for specialized requirements or larger-scale deployments? We offer customizable plans to suit enterprise needs. Contact us for details.