A high-capacity multimodal language model built on a mixture-of-experts (MoE) architecture with 128 experts and 17B active parameters. Supports multilingual text and image input.
An iteration of the DeepSeek V3 model with notable improvements in reasoning capabilities across various benchmarks, including MMLU-Pro, GPQA, and AIME.
A mixture-of-experts (MoE) language model activating 17B of its 109B total parameters; see the routing sketch below. Designed for assistant-style interaction and visual reasoning.
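The MoE entries above share the same idea: a router activates only a small subset of expert feed-forward blocks per token, so the "active" parameter count (17B) is far below the total (109B or more). The PyTorch sketch below is a minimal, illustrative top-k routing layer; the names (`TopKMoE`, `d_model`, `num_experts`, `top_k`) and sizes are assumptions for the example, not the implementation used by these models.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoE(nn.Module):
    """Minimal top-k mixture-of-experts feed-forward layer (illustrative only)."""

    def __init__(self, d_model: int, d_ff: int, num_experts: int, top_k: int = 2):
        super().__init__()
        self.top_k = top_k
        # One router plus a pool of expert MLPs; only `top_k` experts run per token.
        self.router = nn.Linear(d_model, num_experts, bias=False)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(num_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (num_tokens, d_model). The router scores every expert for every token.
        scores = self.router(x)                             # (num_tokens, num_experts)
        weights, indices = scores.topk(self.top_k, dim=-1)  # keep the k best experts
        weights = F.softmax(weights, dim=-1)                # normalize over the kept k

        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            idx = indices[:, slot]
            for expert_id in idx.unique():
                mask = idx == expert_id
                expert_out = self.experts[int(expert_id)](x[mask])
                out[mask] += weights[mask, slot].unsqueeze(-1) * expert_out
        return out

# Toy usage: total parameters scale with num_experts, but each token only
# touches top_k of them, which is why "active" parameters sit far below the total.
layer = TopKMoE(d_model=64, d_ff=256, num_experts=8, top_k=2)
tokens = torch.randn(10, 64)
print(layer(tokens).shape)  # torch.Size([10, 64])
```

Production MoE models additionally use load-balancing objectives, capacity limits, and expert-parallel sharding, all of which this sketch omits.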