Model Library
Qwen2-VL-7B-Instruct
A vision-language model purpose-built for tasks that combine visual and textual understanding, including image captioning, visual question answering, and content generation.
Serverless
Qwen2-VL-2B-Instruct
A powerful multimodal model that excels in visual understanding tasks, including image and video comprehension.
Qwen2.5-VL-7B-Instruct
A 7B-parameter multimodal model designed for complex vision-language tasks, well suited to document understanding, video summarization, and interactive applications that require both visual perception and language reasoning.
Serverless
Llama 4 Scout 17B (16E) Instruct
A mixture-of-experts (MoE) language model that activates 17 billion parameters out of 109B total, designed for assistant-style interaction and visual reasoning.
Serverless