Yesterday, Zhiyuan Robotics officially launched Genie Envisioner (GE), which it describes as the industry's first unified world model platform for real-world robot control. The platform's closed-loop architecture replaces the traditional staged "data-training-evaluation" pipeline. GE integrates future frame prediction, policy learning, and simulation-based evaluation into a single video-based system, so that a robot can go from environmental perception to decision execution within one model. According to the company, this marks a shift in robotics from passive execution to an active "imagine-verify-act" approach.
Trained on 3,000 hours of real-robot data, the GE-Act component demonstrates strong cross-platform adaptability. In testing on new platforms such as the Agilex Cobot Magic, it achieved high-quality task execution with just one hour (approximately 250 demonstrations) of teleoperation data. Its core technology models interaction dynamics directly in visual space, preserving spatiotemporal information, which reportedly yields a nearly 60% improvement in success rate on long-horizon tasks compared with existing solutions. The platform's code, models, and evaluation tools are now open source and available to developers through the project homepage and GitHub repository.
Notably, the GE platform adopts a vision-centric modeling paradigm built from three components: GE-Base models the environment, GE-Act decodes motion commands, and GE-Sim provides neural simulation. Together, the three form a complete closed loop. In testing, robots equipped with the system completed complex tasks such as making sandwiches and pouring tea. Zhiyuan Robotics stated that it will expand support for multimodal sensors in the future to promote applications in intelligent manufacturing and service robotics.
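For readers who want a feel for what an "imagine-verify-act" loop means in practice, the toy sketch below may help. It is not GE's code or API: every function name is hypothetical, the "world" is a single number rather than video frames, and the three helpers are only loose stand-ins for the roles the article attributes to GE-Base (predicting the future), GE-Sim (scoring a predicted outcome), and GE-Act (choosing the action to execute).

```python
# Toy illustration of an imagine-verify-act loop.
# Assumption: all names and dynamics here are hypothetical stand-ins,
# not the Genie Envisioner implementation or its API.
from typing import List

# Candidate actions the controller may choose from at each step.
CANDIDATE_ACTIONS: List[float] = [-1.0, -0.5, 0.0, 0.5, 1.0]


def imagine(state: float, action: float) -> float:
    """Stand-in for a world model: predict the next state for an action."""
    return state + action


def verify(predicted_state: float, goal: float) -> float:
    """Stand-in for a simulator score: higher means closer to the goal."""
    return -abs(goal - predicted_state)


def choose_action(state: float, goal: float) -> float:
    """Stand-in for a policy: pick the action whose imagined outcome scores best."""
    return max(CANDIDATE_ACTIONS, key=lambda a: verify(imagine(state, a), goal))


def closed_loop(start: float, goal: float, steps: int) -> List[float]:
    """Run imagine -> verify -> act for a fixed number of steps."""
    state = start
    trajectory = [state]
    for _ in range(steps):
        action = choose_action(state, goal)
        # "Act": in this toy, the real world follows the same simple dynamics.
        state = state + action
        trajectory.append(state)
    return trajectory


if __name__ == "__main__":
    print(closed_loop(start=0.0, goal=2.0, steps=3))
```

In this sketch the controller never acts blindly: each candidate action is first imagined and scored, and only the best one is executed. A real system would replace the one-line helpers with learned video prediction, a learned action decoder, and a neural simulator.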