Google DeepMind has released Genie 3, one of the most advanced world models available. Genie 3 generates fully interactive, highly consistent, and dynamic worlds from text prompts, and users can explore them in real time at 24 frames per second and 720p resolution. DeepMind presents the model as a key step towards artificial general intelligence (AGI), because it allows AI agents to be trained in rich simulated environments.
Developed through a collaboration between DeepMind's Veo 2 and Genie 2 projects, Genie 3 introduces several notable features. It can retain up to a minute of spatial memory: for example, if a character paints a wall and then moves away, the paint is still there when the character returns. Furthermore, physical laws emerge from the model rather than being built in, and as the scale and depth of the training data increase, physical effects such as water simulation and lighting variations become more realistic.
Genie 3 represents a significant leap forward in simulating the real world, generating video that is described as indistinguishable from real-world footage. Trained on massive amounts of data, the model exhibits intuitive, human-like behaviors, such as anticipating that a door will open as the character approaches it, or swimming and splashing when the character enters water. These behaviors were learned by the model from data, not designed in explicitly.
Going forward, DeepMind will continue to improve the realism and interactivity of the Genie series, with the aim of broadening the model's impact. Genie 3 will eventually be opened to other teams, who can use it to build diverse applications such as personal game worlds, reinforcement learning agent training, and robotics research.