Zhiyuan Robotics recently announced the open-source release of Genie Operator-1 (GO-1), a large general-purpose embodied foundation model. It is billed as the world's first embodied intelligence model built on the Vision-Language-Latent-Action (ViLLA) architecture. The open-source release aims to lower the technical barriers to embodied intelligence and let more developers take part in building and applying the technology. It follows the AgiBot World embodied intelligence dataset, published in January of this year, and marks a faster move toward open sharing of embodied intelligence technology.
The core advance in GO-1 is its ViLLA architecture. Compared with traditional Vision-Language-Action (VLA) architectures, ViLLA introduces latent action tokens, which link image and text inputs to the robot's actions. The architecture has three layers: a VLM multimodal understanding layer, built on InternVL-2B, that processes visual, force, and language information; a Latent Planner that interprets complex tasks; and an Action Expert that uses a diffusion model to generate high-precision action sequences for precise manipulation. Together, these layers allow robots to understand human intent more accurately and perform more complex manipulation tasks.
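The three-layer flow described above can be illustrated with a toy sketch. This is not GO-1's code or API: every name here (`Observation`, `vlm_encode`, `latent_planner`, `action_expert`, `run_pipeline`) is hypothetical, the arithmetic is a placeholder for learned networks, and the final stage only mimics the shape of diffusion-style generation (start from noise, refine step by step) rather than implementing a real diffusion model.

```python
import random
from dataclasses import dataclass
from typing import List


@dataclass
class Observation:
    image: List[float]   # stand-in for camera pixels
    force: List[float]   # stand-in for force-sensor readings
    instruction: str     # natural-language command


def vlm_encode(obs: Observation, dim: int = 8) -> List[float]:
    """Stage 1 (hypothetical): fuse vision, force, and language into one feature vector."""
    text = [float(ord(c) % 7) for c in obs.instruction]
    raw = obs.image + obs.force + text
    # Fold the variable-length input into a fixed-size vector by bucketed averaging.
    buckets: List[List[float]] = [[] for _ in range(dim)]
    for i, value in enumerate(raw):
        buckets[i % dim].append(value)
    return [sum(b) / len(b) if b else 0.0 for b in buckets]


def latent_planner(features: List[float], num_tokens: int = 4, vocab: int = 32) -> List[int]:
    """Stage 2 (hypothetical): map features to a short sequence of discrete latent action tokens."""
    return [
        int(abs(features[i % len(features)]) * (i + 1) * 10) % vocab
        for i in range(num_tokens)
    ]


def action_expert(tokens: List[int], horizon: int = 6, dof: int = 3,
                  steps: int = 10, vocab: int = 32, seed: int = 0) -> List[List[float]]:
    """Stage 3 (hypothetical): refine random noise into an action sequence,
    conditioned on the latent tokens. A toy stand-in for diffusion denoising."""
    rng = random.Random(seed)
    # Each timestep is pulled toward a target value derived from a latent token.
    target = [tokens[t % len(tokens)] / vocab for t in range(horizon)]
    actions = [[rng.gauss(0.0, 1.0) for _ in range(dof)] for _ in range(horizon)]
    for _ in range(steps):
        for t in range(horizon):
            # Each step removes half of the remaining distance to the target.
            actions[t] = [a + 0.5 * (target[t] - a) for a in actions[t]]
    return actions


def run_pipeline(obs: Observation) -> List[List[float]]:
    """Chain the three stages: understanding -> latent plan -> actions."""
    features = vlm_encode(obs)
    tokens = latent_planner(features)
    return action_expert(tokens)


if __name__ == "__main__":
    demo = Observation(image=[0.1, 0.5, 0.9], force=[0.0, 1.2],
                       instruction="pick up the cup")
    for step in run_pipeline(demo):
        print([round(a, 3) for a in step])
```

The point of the sketch is the data flow: the action stage never sees the raw image or text, only the latent tokens produced by the planner, which is the role the article attributes to the latent layer in ViLLA.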
To lower the barrier to development, Zhiyuan has also launched the Genie Studio development platform, which covers the full workflow from data collection through model training to simulation evaluation. The platform integrates the GO-1 model and provides a video training solution and a unified framework, significantly improving development efficiency. Notably, although GO-1 was pre-trained on data from the AgiBot G1 robot, it has shown strong portability across platforms and adapts to the needs of different robots. The model is now available to developers in a GitHub repository, so both AI experts and technology enthusiasts can start developing with embodied intelligence.