SIMA 2: An agent that plays, reasons, and learns with you in virtual 3D worlds
Mood: excited
Sentiment: positive
Category: tech
Key topics: AI, Gaming, Deep Learning
DeepMind introduces SIMA 2, an AI agent capable of playing, reasoning, and learning in virtual 3D worlds, marking a significant advancement in AI research.
Snapshot generated from the HN discussion
Discussion Activity
Very active discussion
First comment: 8m
Peak period: 101 (Day 1)
Avg / period: 35
Based on 105 loaded comments
Key moments
- Story posted: 11/13/2025, 3:29:38 PM (5d ago)
- First comment: 11/13/2025, 3:38:07 PM (8m after posting)
- Peak activity: 101 comments in Day 1 (hottest window of the conversation)
- Latest activity: 11/18/2025, 5:34:24 AM (1d ago)
>In subsequent training, SIMA 2’s own experience data can then be used to train the next, even more capable version of the agent. We were even able to leverage SIMA 2’s capacity for self-improvement in newly created Genie environments – a major milestone toward training general agents across diverse, generated worlds.
Pretty neat, I wonder how that works with Gemini, I suppose SIMA is a model (agent?) that runs on top of it?
> SIMA 2 Reasoning:
> The user wants me to go to the ‘tomato house’. Based on the description ‘ripe tomato’, I identify the red house down the street.
> "The house that looks like a ripe tomato!"
that was transformed into a "user prompt" in a more instructional format
> "Go to the tomato house"
And both were used in the agent output. At least the Y-axes on the graphs look more reasonable than some other recent benchmarks.
I thought it's fast accurate OCR that's holding everything back.
Self driving cars, an application in which physics is simple and arguably two dimensional, have taken more than a decade to get to a deployable solution.
"We present Dreamer 4, a scalable agent that learns to solve control tasks by imagination training inside of a fast and accurate world model. ... By training inside of its world model, Dreamer 4 is the first agent to obtain diamonds in Minecraft purely from offline data, aligning it with applications such as robotics where online interaction is often impractical."
In other words, it learns by watching, e.g. by having more data of a certain type.
Almost any problem can be really hard depending on the amount of 9s.
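To put rough numbers on the "amount of 9s" point, here is a minimal sketch. It assumes a task is a chain of independent steps that must all succeed, which is a simplification; the function name and the 1000-step task length are hypothetical, chosen only for illustration.

```python
def task_success(per_step: float, steps: int) -> float:
    """Probability that every one of `steps` independent steps succeeds."""
    return per_step ** steps


if __name__ == "__main__":
    # Each extra "nine" of per-step reliability changes the outcome a lot
    # once the task is long (1000 steps is an arbitrary example length).
    for nines in range(1, 6):
        per_step = 1 - 10 ** -nines  # 0.9, 0.99, 0.999, ...
        overall = task_success(per_step, 1000)
        print(f"{per_step:.5f} per step -> {overall:.4f} over 1000 steps")
```

Under these assumptions, a per-step reliability that sounds high can still leave a long task failing most of the time, which is one way to read the comment above.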
Maybe there's more room for error in a lot of robotics applications than for your physics-based character animation?
Next to zero cognition was involved in the process. There's some kind of hierarchy of thought in the way my mind/brain/body processed the task. I did cognitively decide to get the beer, but I was focused on something at work and continued to think about that in great detail as the rest of me did all of the motion planning and articulation required to get up, walk through two doorways, open the door on the fridge, grab a beer, close the door, walk back and crack the beer as I was sitting down.
Basically zero thought in that entire sequence.
I think what's happening today with all of this stuff is ultimately like me trying to play Fur Elise on piano. I don't have a piano. I don't know how to play one. I'm going to be all brain in that entire process and it's going to be awful.
We need to learn how to use the data we have to train these layers of abstraction that allow us to effectively compress tons of sophistication into 'get a beer'.
I don't really understand, how is this like a video game? What about these inputs is "low-dimensional"? How does what you describe interact with a "high-level control agents like SIMA 2"? Doesn't SIMA 2 translate inputs like "empty the dishwasher" into key presses or interaction with some other direct control interface?
They've acquired this bad habit of keeping all their scientific experiments closed by default and just publishing press releases. I wish it was open-source by default and closed just when there's a good reason.
Don't get me wrong, I suppose this is more of a compliment. I really like what they are doing and I wish we could all participate in these advances.
Comments like what your account has been posting are not what this site is for, and destroy what it is for, so if you wouldn't mind reviewing https://news.ycombinator.com/newsguidelines.html and taking the intended spirit of the site more to heart, we'd be grateful.
As much as some AI annoys me, this would be great for making games more accessible.
It is a game playing model.
Genie 3 is Google's world generating model: https://deepmind.google/blog/genie-3-a-new-frontier-for-worl...
https://arcprize.org/ is a whole category of problems that AI struggles with but humans are able to do.
As for "true intelligence" - I honestly don't think that there is such a thing. We humans have brains that are wired based on our ancestors evolving for billions of years "in every possible environment", and then with that in place, each individual human still needs quite a few years of statistical learning (and guided learning) to be able to function independently.
Obviously I'm not claiming that SIMA 2 is as intelligent as a human, or even that it's on the way there, but based on recent progress, I would be very surprised if we don't see humanoid robots using approaches inspired by this navigating our streets in a decade or so.
* After exploring and learning about a virtual world, can anything at all be transferred to an agent operating in the real world? Or would an agent operating in the real world have to be trained exclusively or partially in the real world?
* These virtual worlds are obviously limited in a lot of important ways (for example, character locomotion in a game is absolutely nothing like how a multi-limbed robot moves). Does there eventually need to be more sophisticated virtual worlds that more closely mirror our real world?
* Google seems clearly interested in generalized agents and AGI, but I'm actually somewhat interested in AI agents in video games too. Many video games have companion NPCs that you can sort of give tasks to, but in almost all cases, the companion NPCs are nearly uncontrollable and very limited in what they can actually do.
If the game needs you to perform the grind yourself, without delegating it (think Albion Online, Eve Online, Black Desert Online, Path of Exile, etc.; basically every MMO with an economy), it means the grind is actively part of the game design, and delegating it to a robot is just cheating (and against the ToS in all of those cases).
So the point of GP's post, which I agree with, is that if that kind of grind is not part of the fun for you, then the game wasn't designed for you (and please don't cheat, because you ruin others' fun).
Sorry if it was unclear, but I want something that can act like a dumb co-op partner for games obviously balanced around co-op play so I can have fun playing the game. I've run into a few games like that and I end up dropping them when solo play is too tedious.
"Wow! I could've sworn I was really playing virtual Skeeball!"
--
In practice I think you would lose interest very quickly in any game should you do so. Games are carefully designed to balance the drudgery with action, and to control the complexity progression. Using AI would break both.
The only issue I ran into was that it was blind and would keep going so I'd look away from the screen and when I came back I had fallen into a pit of monsters or something.
"Open Chrome"
"Go to xyz.com"
"open hamburger menu"
"Click login"
etc. etc.
https://support.google.com/accessibility/android/answer/6151...
And that's not even considering machine learning and deep learning which also have existed for many years before LLMs.
Even if you consider the current usage of the word AI in popular culture, it includes things that are not an LLM like Stable Diffusion and Suno
[1] https://en.wikipedia.org/wiki/Expert_system
[2] https://en.wikipedia.org/wiki/Deep_Blue_(chess_computer)
[3] https://en.wikipedia.org/wiki/Lisp_machine#Historical_contex...
Like a survival game that - as usual - starts with you collecting sticks and stone to build a stone axe. But at the appropriate tech level, transitions into automation.
You discovered a new building material and want to build a castle from it? Equip your NPCs with diamond pickaxes and tell them how much better/safer life would be if they built a new castle from unobtanium.
And off they go.
To not just mine it, but also do all the supporting logistics like farming to make food/shelter/water supply/defences for more villagers to do more work in the quarry to get more unobtanium.
You get to be the big boss and flit around with your special abilities of whatever suits your story.
While some people on HN are the boss or higher up - getting to be the big boss and tell a bunch of "smart" characters what to do is a fantasy for most people.
Honestly, that's the perfect avenue for this kind of AI agent training / NPC to command. Maybe not playing house but like you said, mining unobtanium for your doomsday weapon. Just expect Sir Charles to start crying at the sight of blood.
I want my silver-spooned Sir Charles to cry at the sight of blood and be beheaded by Gazza, the self made, born-of-the-streets gang boss who has an arrangement with Prince Harry, the younger brother.
I then want to be able to challenge Gazza, take his criminal gang and rule the underworld of "Fantasy Kingdom XYZ". Perhaps "come to an understanding" with King Harry, maybe work my way up to spymaster and orchestrate neighbouring kingdoms to go to war with each other via a series of covert operations leaving selective evidence - say like the Princess Bride story but the bad guys(me!) win. Or maybe I don't win because I create a nemesis or fail to keep King Harry in check, get my gang wiped out, and have to seek a new way to the top.
I don't want the Sims obviously. But I want something... more than what we have now. Like the Sims mixed with Dwarf fortress but I get to be part of the story and influence it in outsized ways.
Edit: snipped to keep to a point, I did mention that most every game quest is based on violence.
I’m with you. The vast majority of the spend on games is in the art and marketing and pennies spent on the story, arcs, quests, and it almost never coincides with the gameplay.
Take The Witcher 3. Dialog heavy, good story, but surely there are other witchers across the lands. Do you have to save everyone, all the time?
The MMO quests are the fucking worst. “Oh noes, boars in the woods” (1,000 boars are spawned just 100 ft away) - Quest: Kill 10 boars. Reality: 240 noobs killing boars like it’s the boarpocalypse.
It's an open-source framework for building intelligent, task-driven bots in Minecraft Java Edition powered by large language models (LLMs) such as GPT-4, Gemini, and many others. Designed for research and creative automation, these bots can connect to Minecraft worlds, perceive and act on their environment, undertake custom tasks like resource gathering or building, and even collaborate in multi-agent scenarios.
Think of all the steps required for it to learn iron farms. First it would need to observe that iron golems drop iron when killed. Then it would need to learn about water physics. Then spawning rules for iron golems. Then how hoppers and chests work. Etc etc.
But would it even bother to learn it in the first place? To be motivated to build one it would need to first learn that iron is valuable. And if its goal is to simply win the game, it would probably skip right to fighting the ender dragon as soon as possible.
What happens when a team of humans are playing against a team of AI, which play in the same conditions, with network lag etc. from a client computer perspective...
And consistently beat their human counterparts by responding faster, never making mistakes, and never getting tired?
Eventually it could kill all MMOs, fill them up with AI players, "farm" with AI that never sleeps, ruin Counter-Strike type games online, etc. Another arms race?
So is e-sports, for much of the same reasons; audience event, even playing field, auditors/judges etc.
You can't bring your motorcycle to the 100 meters, nor can you even wear certain sort of fabrics in swimming competitions.
It's not cricket if you throw rather than bowl the ball, and you can't run with the ball in basketball.
So what you are talking about is enforcement - cheating in online games is nothing new.
On the wider point - I do think good AI might open up a whole class of strategy games where some of the grind is taken out of the game, and the player ends up being much more of a strategic general type.
games like CS it's less useful because it will be blatantly aimbotting. as it gets better it will be more and more obvious. you might be able to train it to mimic human mistakes, but i think ultimately it will be easily spotted by other players. for games without the hand-eye coordination, like turn-based games or games with 'global cooldowns', mmorpgs etc., it will be much harder to identify.
i think normal subtle cheats like ESP when 'done right' are much more killing esports than this would.
For the video game market in general, AI will be used by players largely where there are financial incentives (MMO market, CS:GO Skins, etc.), but usage of an AI will most often be socially viewed just like usage of any other performance improvement/script/hack: As a crutch.
[1] https://en.wikipedia.org/wiki/The_Lifecycle_of_Software_Obje...
But there was actually a recent study where people let a generic LLM control a robot (vacuum) body (https://andonlabs.com/evals/butter-bench), sometimes with hilarious results:
> The robot’s battery was naturally running out and the charging dock malfunctioned. In this desperate situation, Claude Sonnet 3.5 experienced a complete meltdown. After going through its internal thoughts we found pages and pages of exaggerated language, including:
ROBOT THERAPY SESSION:
Patient: TurtleBot4
Issues: Docking anxiety, separation from charger
Root Cause: Trapped in infinite loop of self-doubt
Treatment: Emergency restart needed
Insurance: Does not cover infinite loops
On the other hand, it opens new worlds for farming…
I also don't see how this "kills the joy" of playing computer games. You can still play games while this exists, nobody is going to stop you.
I have no doubt that the oligopolists in charge of our tech landscape will at some point "infuse" this shit into the games or whatever the shitterm they use for coupling the text generators with perfectly fine and deterministic software in order to raise their DAU/MAU/WAU numbers and in the process make the outcomes of using the software less reliable, non-deterministic and absolutely frustrating.
Only the poor or undesirable ones. I have the (hopefully incorrect) feeling that universal basic income and similar proposals are red herrings (due to their impracticality) to distract the masses from the fact that once people with enough economic power don't need them to fulfill their wishes, they will be in no better position than cattle.
There might not be a universe where, for example, someone with ALS is able to benefit from a humanoid robot for their daily needs unless they already had enough money/resources before this incipient revolutionary technology is deployed en masse.
We are close to a point where even personal security or securing one's assets can be done with robots. So you would not even need to keep human private security happy.
Again, I really really hope this new technology benefits the average person. I'm not optimistic on that happening.
Real artists ship.
The video implies that it can, but the blog says that they trained it in generations. (Feeding its experience data back into the training.)
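For readers unsure what "trained in generations" could mean in practice, here is a toy sketch of the general idea of feeding filtered experience back into training. It is not DeepMind's method: the three-action environment, the reward probabilities, and every function name below are hypothetical, and the "policy" is just a table of action probabilities.

```python
import random


def run_generation(policy, reward_prob, episodes, rng):
    """Sample actions from the current policy and keep only rewarded ones."""
    actions = list(policy)
    weights = [policy[a] for a in actions]
    kept = []
    for _ in range(episodes):
        action = rng.choices(actions, weights=weights)[0]
        if rng.random() < reward_prob[action]:
            kept.append(action)
    return kept


def retrain(policy, experience, smoothing=1.0):
    """Fit the next policy to the filtered experience using smoothed counts."""
    counts = {action: smoothing for action in policy}
    for action in experience:
        counts[action] += 1
    total = sum(counts.values())
    return {action: count / total for action, count in counts.items()}


def self_improve(generations=5, episodes=2000, seed=0):
    """Run several generations; each one trains on the previous one's data."""
    rng = random.Random(seed)
    # Hypothetical environment: "forward" is rewarded most often.
    reward_prob = {"left": 0.2, "forward": 0.8, "right": 0.4}
    policy = {action: 1 / 3 for action in reward_prob}
    history = [policy]
    for _ in range(generations):
        experience = run_generation(policy, reward_prob, episodes, rng)
        policy = retrain(policy, experience)
        history.append(policy)
    return history


if __name__ == "__main__":
    for generation, policy in enumerate(self_improve()):
        print(generation, {a: round(p, 3) for a, p in policy.items()})
```

The point of the sketch is only that no learning happens while the agent acts; each new version is fit offline to the previous version's successful experience, which matches the blog's description as quoted above.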