SIMA 2: An agent that plays, reasons, and learns with you in virtual 3D worlds
OK, AI playing video games is cool. But you know what's really really cool? It looks like SIMA 2 is controlling the mouse and reading the screen at something approaching 30+fps. WANT. Computer use agents are so slow right now, this is really something. I wonder what the architecture is for this.
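For a sense of scale, here is a back-of-the-envelope sketch of what 30 fps means for a control loop. The 3-second per-action latency used for a slow computer-use agent is my own illustrative guess, not a measured number, and the function names are made up.

```python
def frame_budget_ms(fps: float) -> float:
    """Time available per frame, in milliseconds."""
    return 1000.0 / fps


def frames_elapsed(action_latency_s: float, fps: float) -> int:
    """How many frames go by while the agent decides on one action."""
    return int(action_latency_s * fps)


# At 30 fps the whole observe-decide-act loop has roughly 33 ms per frame.
budget = frame_budget_ms(30)

# Assumed latency (illustrative only): an agent taking 3 s per action
# lets 90 frames pass for every single input it produces.
skipped = frames_elapsed(3.0, 30)
```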
The gap between high-level and low-level control of robots is closing. Right now thousands of hours of task-specific training data are being collected and trained on to create models that can control robots to execute specific tasks in specific contexts. This essentially turns the operation of a robot into a kind of video game, where inputs are only needed in a low-dimensional abstract form, such as "empty the dishwasher" or "repeat what I do" or "put your finger in the loop and pull the string". This will be combined with high-level control agents like SIMA 2 to create useful real-world robots.
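To make the split concrete, here is a toy sketch of the two layers: a high-level planner that turns an instruction into abstract skill names, and low-level skills that expand each name into primitive commands. Every name here (`SKILLS`, `plan`, `execute`) and every command string is invented for illustration. A real system would presumably use learned policies at both levels rather than lookup tables.

```python
from typing import Callable

# Low-level layer: each skill expands to primitive commands.
# Assumption: in a real robot these would be learned, task-specific policies.
SKILLS: dict[str, Callable[[], list[str]]] = {
    "open_dishwasher": lambda: ["reach handle", "grip", "pull"],
    "unload_item": lambda: ["reach item", "grip", "lift", "place on counter"],
    "close_dishwasher": lambda: ["reach door", "push"],
}


def plan(instruction: str, items: int) -> list[str]:
    """High-level layer: map an abstract instruction to skill names.

    Hard-coded here; a real agent would reason about the scene instead.
    """
    if instruction == "empty the dishwasher":
        return ["open_dishwasher"] + ["unload_item"] * items + ["close_dishwasher"]
    raise ValueError(f"no plan for: {instruction!r}")


def execute(instruction: str, items: int = 2) -> list[str]:
    """Run the plan and collect the primitive commands in order."""
    commands: list[str] = []
    for skill_name in plan(instruction, items):
        commands.extend(SKILLS[skill_name]())
    return commands
```

The point is only that the caller supplies one short string while the lower layer deals with the many small motions.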
Tangential question: is this going to eventually ruin e-sports?
What happens when a team of humans plays against a team of AIs that play under the same conditions, with network lag etc., from a client computer's perspective...
And the AIs consistently beat their human counterparts by responding faster, never making mistakes, and never getting tired?
Eventually it could kill all MMOs, fill them up with AI players, "farm" with AI that never sleeps, ruin Counter-Strike-type games online, etc. Another arms race?
>We’ve observed that, throughout the course of training, SIMA 2 agents can perform increasingly complex and new tasks, bootstrapped by trial-and-error and Gemini-based feedback.
>In subsequent training, SIMA 2’s own experience data can then be used to train the next, even more capable version of the agent. We were even able to leverage SIMA 2’s capacity for self-improvement in newly created Genie environments – a major milestone toward training general agents across diverse, generated worlds.
Pretty neat, I wonder how that works with Gemini, I suppose SIMA is a model (agent?) that runs on top of it?
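Here is a rough sketch of the loop those quotes describe, as I read them: the agent attempts tasks, a separate model scores the attempts, and the scored experience becomes training data for the next generation. Everything here is a made-up stand-in (`ToyAgent`, `judge`, `train_next`, the skill numbers). None of it is DeepMind's code or API, and the "training" is a toy update rule.

```python
import random
from dataclasses import dataclass


@dataclass
class ToyAgent:
    """Hypothetical stand-in for an agent; `skill` is its chance of solving a task."""
    skill: float

    def attempt(self, task: str, rng: random.Random) -> dict:
        # One trial-and-error episode: succeed with probability `skill`.
        return {"task": task, "success": rng.random() < self.skill}


def judge(episode: dict) -> float:
    """Hypothetical stand-in for model-based feedback (Gemini-based, per the blog)."""
    return 1.0 if episode["success"] else 0.0


def train_next(agent: ToyAgent, experience: list[tuple[dict, float]]) -> ToyAgent:
    """Toy 'training': nudge skill up in proportion to the mean reward."""
    if not experience:
        return agent
    mean_reward = sum(reward for _, reward in experience) / len(experience)
    return ToyAgent(skill=min(1.0, agent.skill + 0.1 * mean_reward))


def self_improve(agent: ToyAgent, tasks: list[str], generations: int, seed: int = 0):
    """Run several generations; each one trains on the previous one's experience."""
    rng = random.Random(seed)
    history = [agent.skill]
    for _ in range(generations):
        episodes = [agent.attempt(task, rng) for task in tasks]
        experience = [(episode, judge(episode)) for episode in episodes]
        agent = train_next(agent, experience)  # new version, built between runs
        history.append(agent.skill)
    return agent, history
```

In this sketch the agent only changes between generations, never in the middle of an episode.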
I want a smarter game.
Like a survival game that - as usual - starts with you collecting sticks and stones to build a stone axe. But at the appropriate tech level, transitions into automation.
You discovered a new building material and want to build a castle from it? Equip your NPCs with diamond pickaxes and tell them how much better/safer life would be if they built a new castle from unobtanium.
And off they go.
To not just mine it, but also do all the supporting logistics like farming to make food/shelter/water supply/defences for more villagers to do more work in the quarry to get more unobtanium.
You get to be the big boss and flit around with your special abilities of whatever suits your story.
While some people on HN are the boss or higher up - getting to be the big boss and tell a bunch of "smart" characters what to do is a fantasy for most people.
I hope we can get some (ideally local) version of this we can use as a "gaming minion". There are a lot of games where I probably would have played more if I could delegate the grind. If they're not that competent, it even adds to the fun a little.
I get why they do it, they are a business. I just wish Google would get off their ivory tower and build in the open more like they used to (did they? maybe I'm misremembering...).
They've acquired this bad habit of keeping all their scientific experiments closed by default and just publishing press releases. I wish it was open-source by default and closed just when there's a good reason.
Don't get me wrong, I suppose this is more of a compliment. I really like what they are doing and I wish we could all participate in these advances.
This is obviously just a research project, but I do wonder about the next steps:
* After exploring and learning about a virtual world, can anything at all be transferred to an agent operating in the real world? Or would an agent operating in the real world have to be trained exclusively or partially in the real world?
* These virtual worlds are obviously limited in a lot of important ways (for example, character locomotion in a game is absolutely nothing like how a multi-limbed robot moves). Do there eventually need to be more sophisticated virtual worlds that more closely mirror our real world?
* Google seems clearly interested in generalized agents and AGI, but I'm actually somewhat interested in AI agents in video games too. Many video games have companion NPCs that you can sort of give tasks to, but in almost all cases, the companion NPCs are nearly uncontrollable and very limited in what they can actually do.
At 0:52 in their demo video, there is a grammatical inconsistency in the agent's text output. The annotations in the video are therefore suspected to be created by humans after the fact. Is Google up to their old marketing/hyping tricks again?
> SIMA 2 Reasoning:
> The user wants me to go to the ‘tomato house’. Based on the description ‘ripe tomato’, I identify the red house down the street.
So, can this learn in real time or not?
The video implies that it can, but the blog says that they trained it in generations. (Feeding its experience data back into the training.)
Could make for some very interesting Digimon games in the future.
Would be cool to see if they could make it play StarCraft too and pit it against AlphaStar.
It's like the Factorio moment where you unlock the roboport. No more manual changes to the world, drone swarms to build housing, roads, bridges, parks etc. So exciting.
Isn't most of this demo No Man's Sky? The voiceover doesn't make it clear that the world is not generated by SIMA.
Why would anyone care that a big company paid for some people to go on an inhouse research excursion.
Real artists ship.
Yet another blog post that looks super impressive, until you get to the bottom and see the charts assessing held-out task performance on ASKA and MineDojo and see that it's still a paltry 15% success rate. (Holy misleading chart, Batman!) Yes, it's a major improvement over SIMA 1, but we are still a long way from this being useful for most people.
This gives me strong vibes of "The Lifecycle of Software Objects" [1] by Ted Chiang. Next step is to put this digient AI into a Figure 03 robot [2].
[1] https://en.wikipedia.org/wiki/The_Lifecycle_of_Software_Obje...
[2] https://www.figure.ai/news/introducing-figure-03