I'm into VR and mixed reality, and I think this is headed toward making the Holodeck real in an immersive way. That's the concept of the Matrix and what they're demoing here, just in 2D.
My guess is that the main thing holding this back in terms of fidelity, consistency, and generalization is just compute. But the new techniques here have dramatically lowered the compute cost and improved generalization.
Maybe something like the giant Cerebras SRAM chips will deliver the next 10x in scale that smooths this out and pushes it closer to Star Trek. Or maybe some new paradigm like memristors.
But I'm looking forward to, within just a few years, being able to put on some fairly comfortable mixed reality glasses and just ask for whatever or whoever I want to appear in my home (for example), according to my whim.
Or train it on a lot of how-to videos, such as cooking, and it just materializes someone showing you exactly what you need to do, right in your kitchen.
Here's another crazy idea: train on videos and interactions with productivity applications rather than games. In the future, for small businesses, we skip having the AI generate source code and just describe how the application works. The data and program state are just stored in a giant context window, and the application functionality changes the instant you make a request.
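As a rough sketch of what I mean (assuming nothing more than a generic text-completion model; the call_model callable and the JSON state format are made up for illustration):

    import json
    from typing import Callable

    def handle_request(call_model: Callable[[str], str],
                       app_state: dict, user_request: str) -> dict:
        # Treat the model itself as the application: put the entire
        # program state in its context and ask for the updated state back.
        prompt = (
            "You are a small-business application. Current state (JSON):\n"
            + json.dumps(app_state)
            + "\n\nUser request: " + user_request
            + "\nReply with the updated state as a single JSON object."
        )
        return json.loads(call_model(prompt))

    # Usage sketch: call_model is whatever model endpoint you have.
    # state = {"invoices": [], "currency": "USD"}
    # state = handle_request(call_model, state, "Add a $120 invoice for ACME.")

No source code is generated; the "program" is just the description plus the state sitting in context.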
I wish researchers would spend more time on using generative models to create level geometry, rather than trying to generate video from scratch. It would be both cheaper and more effective for stable gameplay.
This is the future I was trying to pitch in 2018 when we had built Ayvri and had every paraglider in the world, the world's largest ultramarathons, drone operators, and lots of other users of our real-world 3D environment.
Though we were using map tiles at the time, we were developing a model that took photos and a GPS track and added information to better match the environmental conditions (cloud cover, better lighting, etc.).
People still ask me to open-source or give them our source code, but the code was acquired, so that isn't possible. But I do regularly say that if I were to rebuild Ayvri today, I'd do it as an interactive video rather than loading tiles.
Why would you want to generate all the pixels with this model instead of generating the art, physics, and objects in the world with a game engine? The engine handles most of the physics and keeps everything stable very cheaply.
I didn't fully grok what this is about from the website. Though just last night I was talking to a friend about that quote Morpheus tells Neo in the Matrix, so there's some nice synchronicity. The sense I got is that they are developing a AAA-style virtual world that can be generated on the fly from text prompts? And when the authors say frame-level control, do they mean that at any point the next frame can be manipulated, either to be completely new or to influence the current story or context being played out?
I'm really excited about where this is going. From the demo videos, it seems to be a step up from Oasis, which itself came out only two weeks ago. I expect to see a lot of innovative use cases in this field.
unreadable website
> Click to play
Clicking it does nothing.
No source, no playable demo, just promises.
Could be total vaporware for all we know.
Is it an ad, a statement of achievement in case someone else claims it first, or what?
It seems like it would be better as a YouTube video; it really doesn't offer much of use right now.
Definitely used Cyberpunk 2077 footage for training.
"Welcome to the Matrix" with matrix-like rain seems like an invitation for Warner Bros to sue you into oblivion.
Prediction: in 20 years, I’m going to be reading about some dude who wrote a program to drive the car continuously until it ran into some surreal edge condition, and finally hit it. There will be a subculture of “matrix glitchers” who spend much of their time doing these kinds of experiments.
This is really cool, to be sure. Just a bit sad that, as the authors phrase it, the "First Real-Time" virtual world created for the demo is a fat and fast SUV driving across virgin land.
Someone should ban "AI" articles on Hacker News.
What does "world" mean here? How does the spatiality fit into some latent space? Or what constitutes the "world"? If the answer is, there is none, the world is just frames of video and any consistency quickly blurs out after a few seconds. That's not a world generation, that's just generation of video frames following frames. Not that it isn't cool, but it has almost zero usability for generating a "world" simulation. The key to a realistic world is that you can reliably navigate it. Visit and revisit places. If you modify anything, those modifications are persisted. If you leave a room and re-enter it hours later, the base expectation is that the same objects are in that room.
Wouldn't a working approach be to create a really low-resolution 3D world, in the traditional "3D game world" sense, to get the spatial consistency, and then feed this crude map with attributes into frame generation to create the resulting world? It wouldn't be infinite, but then no one needs an infinite world either, and a spherical world solves the border issue pretty handily. As I understand it, there was some element of that in the new FS2024 (discussed yesterday on HN).
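Roughly what I have in mind, as pseudocode (the generator interface here is invented for illustration; no shipping model exposes exactly this):

    import numpy as np

    class CoarseWorld:
        # Persistent, low-resolution voxel grid with per-cell attributes;
        # this is what provides the spatial consistency.
        def __init__(self, size=(256, 256, 32)):
            self.voxels = np.zeros(size, dtype=np.uint8)  # 0 = air, 1 = ground, ...

        def local_patch(self, pos, radius=16):
            x, y = int(pos[0]), int(pos[1])
            return self.voxels[x - radius:x + radius, y - radius:y + radius, :]

    def render_frame(generator, world: CoarseWorld, camera_pose, prev_frame):
        # The generator sees the same crude geometry every time the camera
        # returns to a place, so the imagery can stay consistent even though
        # a learned model paints every pixel.
        conditioning = world.local_patch(camera_pose[:3])
        return generator(prev_frame, camera_pose, conditioning)

The expensive generative model only handles rendering; navigation, persistence, and the map itself stay cheap and deterministic.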