The InternLM team recently open-sourced the lightweight multimodal reasoning model Intern-S1-mini. This "research powerhouse," with only 8 billion (8B) parameters, combines the Qwen3-8B language model with the 0.3B-parameter InternViT visual encoder, achieving strong performance in a compact architecture. Particularly notable is its pre-training corpus of over 5 trillion tokens, roughly half of which comes from scientific fields such as chemistry and physics. This enables specialized capabilities such as molecular formula analysis and synthetic route planning, providing a new tool for interdisciplinary research.
Benchmark results show the model performing strongly on authoritative evaluations such as MMLU-Pro and AIME 2025, and scoring 76.47 on ChemBench, 61.55 on MatBench, and 58.47 on ProteinLMBench for protein sequence analysis. This "fewer parameters without sacrificing performance" approach lets it retain a lightweight design while delivering top-tier multimodal processing for text, images, and video. Also notable is the model's default "thinking mode" (toggled via an enable_thinking parameter), which provides a human-like reasoning and interaction experience and lets users flexibly adjust the depth of responses.
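In Qwen-family models, this kind of thinking toggle is typically a keyword argument to the tokenizer's chat template rather than a standalone command: when thinking is disabled, the template pre-fills an empty think block so the model skips its reasoning phase. The sketch below is a simplified, illustrative re-implementation of that mechanism, not the actual Intern-S1-mini API; the template tokens and function shape are assumptions modeled on the documented Qwen3 convention.

```python
# Illustrative sketch (NOT the real Intern-S1-mini API): how an
# enable_thinking flag typically alters a Qwen3-style chat prompt.
def apply_chat_template(messages, enable_thinking=True):
    """Render a minimal Qwen3-style prompt. When enable_thinking is
    False, an empty <think></think> block is pre-filled so the model
    answers directly instead of emitting its reasoning first."""
    parts = []
    for m in messages:
        parts.append(f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>")
    parts.append("<|im_start|>assistant\n")
    if not enable_thinking:
        # Pre-filled empty reasoning block: signals "skip thinking".
        parts.append("<think>\n\n</think>\n\n")
    return "\n".join(parts)

msgs = [{"role": "user", "content": "Balance: H2 + O2 -> H2O"}]
print(apply_chat_template(msgs, enable_thinking=False))
```

In the real Hugging Face `transformers` workflow, the equivalent call would be `tokenizer.apply_chat_template(messages, add_generation_prompt=True, enable_thinking=False)`, with the flag defaulting to the thinking-enabled behavior described above.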
Industry observers have hailed it as a "scientific star," and its out-of-the-box domain expertise has already been applied to Shanghai AI Lab's Intern-Discovery scientific discovery platform. In scenarios such as materials design and drug development, the model can process experimental data and literature simultaneously, significantly improving research efficiency. As multimodal AI evolves toward lightweight, specialized designs, this "hexagonal scientific warrior" may become a key engine driving the scaling laws of scientific discovery.