Model Library

Qwen2-VL-7B-Instruct (Serverless)

An advanced vision-language model purpose-built for tasks that involve both visual and textual understanding, including image captioning, visual question answering, and content generation.
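For illustration, here is a minimal sketch of running Qwen2-VL-7B-Instruct for image captioning with the Hugging Face transformers library. The catalog does not document a serving API, so local inference is assumed; the image URL and prompt are placeholders.

```python
# Minimal local-inference sketch for Qwen2-VL-7B-Instruct (assumes a
# transformers version with Qwen2-VL support plus the qwen_vl_utils helper).
from transformers import Qwen2VLForConditionalGeneration, AutoProcessor
from qwen_vl_utils import process_vision_info

model = Qwen2VLForConditionalGeneration.from_pretrained(
    "Qwen/Qwen2-VL-7B-Instruct", torch_dtype="auto", device_map="auto"
)
processor = AutoProcessor.from_pretrained("Qwen/Qwen2-VL-7B-Instruct")

# A single-turn chat message mixing an image and a text instruction.
messages = [{
    "role": "user",
    "content": [
        {"type": "image", "image": "https://example.com/photo.jpg"},  # placeholder URL
        {"type": "text", "text": "Describe this image in one sentence."},
    ],
}]

# Render the chat template and pack the vision inputs.
text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
image_inputs, video_inputs = process_vision_info(messages)
inputs = processor(
    text=[text], images=image_inputs, videos=video_inputs,
    padding=True, return_tensors="pt",
).to(model.device)

# Generate a caption and decode only the newly generated tokens.
output_ids = model.generate(**inputs, max_new_tokens=128)
trimmed = [out[len(inp):] for inp, out in zip(inputs.input_ids, output_ids)]
print(processor.batch_decode(trimmed, skip_special_tokens=True)[0])
```

The same pattern applies to the other Qwen vision-language models below; only the checkpoint name changes.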
Qwen2-VL-2B-Instruct

A powerful multimodal model that excels in visual understanding tasks, including image and video comprehension.
Qwen2.5-VL-7B-Instruct (Serverless)

A 7B-parameter multimodal model designed for complex vision-language tasks; ideal for document understanding, video summarization, and interactive applications that require both visual perception and language reasoning.
Llama 4 Scout 17B (16E) Instruct (Serverless)

A mixture-of-experts (MoE) language model that activates 17 billion of its 109 billion total parameters per token, designed for assistant-style interaction and visual reasoning.
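Several entries in this library are tagged Serverless. The catalog does not specify the endpoint protocol, but serverless model providers commonly expose an OpenAI-compatible chat completions API; the sketch below assumes that, and the base URL, credential variable, and model identifier are all hypothetical.

```python
# Hypothetical serverless call: assumes an OpenAI-compatible chat
# completions endpoint. Base URL, API-key variable, and model ID are
# placeholders, not documented by this catalog.
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://api.example.com/v1",   # hypothetical endpoint
    api_key=os.environ["PROVIDER_API_KEY"],  # hypothetical credential
)

response = client.chat.completions.create(
    model="Llama-4-Scout-17B-16E-Instruct",  # assumed model identifier
    messages=[{
        "role": "user",
        "content": "In two sentences, what does a mixture-of-experts model do?",
    }],
    max_tokens=256,
)
print(response.choices[0].message.content)
```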
Llama 4 Maverick 17B (128E) Instruct (Serverless)

A high-capacity multimodal language model built on a mixture-of-experts (MoE) architecture with 128 experts and 17 billion active parameters; it supports multilingual text and image input.
Llama 3.2 11B Vision Instruct

Part of Meta's Llama 3.2 vision model family, designed for a wide range of vision-language tasks, including visual recognition, image reasoning, and captioning.
