Liquid AI, an artificial intelligence company, has officially launched the LFM2-VL series of vision-language foundation models. The series is optimized for low latency and on-device deployment, marking a significant step forward for multimodal AI on mobile and edge hardware. The newly released LFM2-VL, available in 450M and 1.6B parameter versions, delivers up to twice the GPU inference speed of comparable vision-language models while remaining competitive on accuracy.
LFM2-VL uses a modular architecture that combines a language model backbone, a vision encoder, and a multimodal projector. A "pixel unshuffle" technique dynamically reduces the number of image tokens passed to the language model. This design lets the model process images at their native resolution, up to 512×512 pixels, with larger images intelligently segmented into non-overlapping patches. The 1.6B version additionally encodes a thumbnail of the full image to preserve global context.
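The token-reduction idea can be illustrated with a short sketch. Liquid AI has not published the projector's internals in this article, so the grid size, downscale factor, and helper function below are illustrative assumptions rather than the actual implementation; the only point is how pixel unshuffle trades spatial token count for channel width.

```python
import torch
import torch.nn.functional as F

def pixel_unshuffle_tokens(tokens: torch.Tensor, grid_h: int, grid_w: int, r: int = 2) -> torch.Tensor:
    """Fold each r x r neighborhood of vision tokens into a single token.

    Token count drops by a factor of r**2; channel width grows by r**2.
    Shapes: (B, grid_h * grid_w, C) -> (B, grid_h * grid_w // r**2, C * r**2).
    """
    b, n, c = tokens.shape
    assert n == grid_h * grid_w, "token count must match the spatial grid"
    x = tokens.reshape(b, grid_h, grid_w, c).permute(0, 3, 1, 2)  # (B, C, H, W)
    x = F.pixel_unshuffle(x, r)                                   # (B, C*r*r, H/r, W/r)
    return x.flatten(2).transpose(1, 2)                           # back to (B, N', C')

# Hypothetical numbers: a 32x32 grid of 768-dim patch embeddings (1024 tokens)
# becomes 256 tokens of width 3072 -- a 4x reduction in sequence length.
toks = torch.randn(1, 32 * 32, 768)
print(pixel_unshuffle_tokens(toks, 32, 32, r=2).shape)  # torch.Size([1, 256, 3072])
```

Fewer, wider tokens mean a shorter sequence for the language backbone to attend over, which is where most of the latency savings on-device would come from.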
In practical applications, LFM2-VL is notably flexible: users can tune inference-time parameters, such as the maximum number of image tokens or patches, to balance speed against quality for a given device and workload. In public benchmarks, its performance is comparable to that of larger models such as InternVL3 and SmolVLM2, with a smaller memory footprint and faster processing. The weights of both models are publicly available to developers on Hugging Face; commercial users must contact Liquid AI for a license.
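Since the weights are distributed through Hugging Face, loading them should follow the standard transformers image-text-to-text pattern. The sketch below is a minimal example assuming the LiquidAI/LFM2-VL-450M repository id and the generic AutoProcessor / AutoModelForImageTextToText API; consult the model card for the exact recommended usage.

```python
from PIL import Image
from transformers import AutoModelForImageTextToText, AutoProcessor

# Assumed repository id; the 1.6B variant would swap in its own id.
model_id = "LiquidAI/LFM2-VL-450M"
processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForImageTextToText.from_pretrained(
    model_id, torch_dtype="bfloat16", device_map="auto"
)

image = Image.open("photo.jpg")  # any local image
conversation = [
    {
        "role": "user",
        "content": [
            {"type": "image", "image": image},
            {"type": "text", "text": "Describe this image in one sentence."},
        ],
    }
]

# Chat template handles interleaving the image tokens with the prompt.
inputs = processor.apply_chat_template(
    conversation,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

output = model.generate(**inputs, max_new_tokens=64)
print(processor.batch_decode(output, skip_special_tokens=True)[0])
```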
The launch of these models provides strong support for applications such as robotics, the Internet of Things, smart cameras, and mobile assistants, promoting broader adoption of on-device multimodal AI and reducing reliance on cloud computing.