Zhipu AI has announced and open-sourced GLM-4.5V, which it describes as the world's first open-source visual reasoning model at the 100B-parameter scale. The model has 106 billion total parameters, of which 12 billion are active, and is available for download on both ModelScope and Hugging Face. Zhipu presents it as a key step toward artificial general intelligence (AGI) and reports state-of-the-art results among open-source models on 41 public multimodal benchmarks, spanning image, video, and document understanding as well as GUI interaction.
GLM-4.5V is built on Zhipu's next-generation text base model, GLM-4.5-Air, and gains its capabilities through efficient hybrid training. A new "thinking mode" switch lets users choose between fast responses and deep reasoning. The model accepts up to 64K tokens of context and uses 3D convolution and 3D-RoPE positional encoding to improve its understanding of video and spatial relationships. In hands-on testing, it accurately located objects in images, reproduced web page structures, and extracted key information from complex documents dozens of pages long.
To lower the barrier to entry, Zhipu has also open-sourced a desktop assistant application that captures screenshots in real time for visual tasks such as coding assistance and game walkthroughs. The API is available on BigModel.cn with a free quota of 20 million tokens. Pricing starts at 2 yuan per million tokens, and responses are generated at 60 to 80 tokens per second. Enterprises can use the service to quickly deploy cost-effective multimodal solutions for scenarios such as industrial quality inspection and intelligent customer service.
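To put the quoted figures in perspective, here is a back-of-the-envelope sketch in Python. It assumes, for illustration only, that the free quota is consumed first and that every remaining token is billed at the 2 yuan floor; actual BigModel.cn pricing may distinguish input from output tokens. The function names are hypothetical and not part of any Zhipu SDK.

```python
def api_cost_yuan(tokens, price_per_million=2.0, free_tokens=20_000_000):
    """Estimate cost in yuan, assuming the free quota is used up first.

    The defaults come from the figures quoted in the article; the flat
    billing model itself is an assumption made for this sketch.
    """
    billable = max(0, tokens - free_tokens)
    return billable * price_per_million / 1_000_000


def generation_seconds(output_tokens, tokens_per_second):
    """Time needed to generate a reply at a given decoding speed."""
    return output_tokens / tokens_per_second


if __name__ == "__main__":
    # 120 million tokens in total: the first 20 million fall under the free quota.
    print(api_cost_yuan(120_000_000))
    # A 1,200-token answer at the slow and fast ends of the quoted speed range.
    print(generation_seconds(1200, 60), generation_seconds(1200, 80))
```

Under these assumptions, the 2 yuan figure is a lower bound on cost rather than a guaranteed rate.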
Architecturally, the model combines a visual encoder, an MLP adapter, and a language decoder, and uses bicubic interpolation to improve its handling of images with very high resolutions or extreme aspect ratios. Analysts believe that the open-source release of GLM-4.5V will accelerate the industrialization of visual reasoning technology and mark a key step toward the widespread application of AI in general scenarios.
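For readers unfamiliar with bicubic interpolation, the following is a minimal pure-Python sketch of the technique, not Zhipu's implementation. It resizes a small 2D grid of values to a new height and width, which is the kind of operation a vision model could use to adapt a fixed grid (for example, of position embeddings) to an unusual image shape; that use case is an assumption here, and the article does not describe where the interpolation is applied. The kernel parameter `a = -0.5` and the edge-clamping rule are also illustrative choices, and libraries differ on both.

```python
import math


def cubic_weight(t, a=-0.5):
    """Keys cubic convolution kernel (a = -0.5 is an illustrative choice)."""
    t = abs(t)
    if t <= 1:
        return (a + 2) * t**3 - (a + 3) * t**2 + 1
    if t < 2:
        return a * t**3 - 5 * a * t**2 + 8 * a * t - 4 * a
    return 0.0


def resize_1d(values, new_len):
    """Resample a 1D sequence to new_len samples with cubic interpolation."""
    old_len = len(values)
    out = []
    for i in range(new_len):
        # Map the output sample center back onto the input axis.
        x = (i + 0.5) * old_len / new_len - 0.5
        base = math.floor(x)
        total = 0.0
        # Each output sample blends its four nearest input samples.
        for k in range(base - 1, base + 3):
            clamped = min(max(k, 0), old_len - 1)  # repeat edge values
            total += values[clamped] * cubic_weight(x - k)
        out.append(total)
    return out


def bicubic_resize(grid, new_h, new_w):
    """Bicubic resize done separably: first along rows, then along columns."""
    rows = [resize_1d(row, new_w) for row in grid]
    cols = [resize_1d(list(col), new_h) for col in zip(*rows)]
    return [list(row) for row in zip(*cols)]


if __name__ == "__main__":
    grid = [[0.0, 1.0, 2.0], [3.0, 4.0, 5.0], [6.0, 7.0, 8.0]]
    # Stretch a 3x3 grid to a wide 4x9 grid, mimicking an extreme aspect ratio.
    for row in bicubic_resize(grid, 4, 9):
        print([round(v, 3) for v in row])
```

Because the interpolation is separable, a grid learned at one size can be stretched to very wide or very tall shapes without retraining, which is presumably why the technique helps with unusual image dimensions.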