Google DeepMind has launched SIMA 2, a new generation of AI agent that acts as a companion within virtual worlds. This marks a significant shift from simple, reactive actions to richer interactions in which the AI can plan, explain itself, and learn from experience. It also reflects DeepMind's ambition to move a step closer to Artificial General Intelligence (AGI), a goal with important implications for robotics and broader AI applications alike.
The first SIMA (Scalable Instructable Multiworld Agent), launched in March 2024, learned hundreds of basic skills simply by observing the screen as it interacted with games. The new version can reason independently. According to Google, SIMA 2 is its most advanced AI agent for 3D virtual worlds: powered by the Gemini model, it can not only follow basic instructions but also think, understand, and take meaningful actions in interactive environments. Users can interact with it not only through text or speech but also through images.
Using the Gemini model, SIMA 2 can now interpret higher-level goals and explain the steps it plans to take, and it generalizes well across different virtual environments. It completes longer and more complex tasks, such as solving logic puzzles and following visual cues, reaching a 65% task completion rate compared to 31% for its predecessor. That jump illustrates how far the agent has progressed and what this could mean for future user interaction and gaming experiences.
A notable development is that SIMA 2 can adapt to new 3D worlds created by another DeepMind project, Genie 3, which generates interactive environments from a single image or text prompt. In tests, SIMA 2 quickly oriented itself, understood its goals, and acted effectively in these unfamiliar worlds. Its improvements go beyond simple reactions: it can carry a concept like "mining" in one game over to "harvesting" in another, an example of transfer learning, the ability to apply knowledge from one context in another.
Nevertheless, there is room for improvement: the research identified shortcomings in handling very long and complex tasks, limited memory structures, and difficulties with visual interpretation. Despite these challenges, DeepMind considers SIMA 2 a valuable test platform for future applications in robotics and navigation. Its development therefore not only opens new possibilities in digital domains but also lays a foundation for intelligence in the real world.
What makes SIMA 2 different from its predecessor?
SIMA 2 can now reason independently and weigh its actions more carefully, allowing it to carry out long and complex tasks more effectively.
How does SIMA 2 contribute to the development of AGI?
SIMA 2's autonomy and adaptive learning processes bring us closer to AI that can mimic human levels of understanding and reasoning.
What challenges remain for the future of SIMA 2?
Handling complex multi-step tasks and interpreting visual information in novel environments remain challenges that must be solved before the agent is fully effective.