Niantic announces “Large Geospatial Model” trained on Pokémon Go player data
This is pretty cool, but I feel that, as a pokehunter (Pokémon Go player), I have been tricked into contributing training data so that they can profit off my labor. How? They consistently incentivize you to scan PokéStops (physical locations) through "research tasks" and give you some useful items as rewards. The effort is usually much more significant than what you get in return, so I have stopped doing it. It's not very convenient to take a video around the object or location in question. If they release the model and weights, though, I will feel I contributed to the greater good.
This title is editorialized. The real title is: "Building a Large Geospatial Model to Achieve Spatial Intelligence"
> Otherwise please use the original title, unless it is misleading or linkbait; don't editorialize.
My personal layman's opinion:
I'm mostly surprised that they were able to do this. When I played Pokémon GO a few years back, the AR was so slow that I rarely used it. Apparently it's so popular and common, it can be used to train an LGM?
I also feel like this is a win-win-win situation here, economically. Players get a free(mium) game, Niantic gets a profit, the rest of the world gets a cool new technology that is able to turn "AR glasses location markers" into reality. That's awesome.
We do this at MyFitnessPal.
When users scan their barcode, the preview window is zoomed in, so users think it's mostly barcode. We actually get quite a bit more background noise, typically of a fridge, supermarket aisle, pantry, etc., and it is sent across to us, stored, and trained on.
Within the next year we will have a pretty good idea of the average pantry, fridge, and supermarket aisle. Who knows what is next.
Not wanting to overdo it, but is there possibly an argument that this geospatial data should be in the commons, and that Google has some obligation to put the data back into the commons?
I'm not arguing on a legal basis, but if it's crowdsourced, then the inputs came from ordinary people. Sure, they signed the T&Cs.
Philosophically, I think knowledge, facts of the world as it is, even the constructed world, should be public knowledge not an asset class in itself.
I can really imagine a meeting with the big brass of Google/Niantic a few years ago that went along the lines of:
- We need to be the first to have a better, new-generation 3D model of the world to build the future of maps on. How can we get that data?
+ What about gamifying it and crowd-sourcing it to the masses?
- Sure! Let's buy some Pokemon rights!
It's scary, but some people really do have long-term vision.
Brian McClendon (Niantic) presented some interesting details about this in his recent Bellingfest presentation:
https://www.youtube.com/live/0ZKl70Ka5sg?feature=shared&t=12...
Very cool.
However, I can't fully agree that generating the 3D scene "on the fly" is the future of maps and of many other AR use cases.
The thing with geospatial objects (buildings, roads, signs, etc.) is that they are very static: few changes are made to them, and many of those changes aren't relevant to the majority of use cases. For example: today your house is white, and in three years it has stains and a yellowish color due to age, but everything else is the same.
Given that storage is cheap and getting cheaper, and that 5G and local-network bandwidth is already more than fast enough for most current use cases, while graphics compute is still bound by GPU performance, I'd say it would be much more useful to identify the location and the building you are looking at and pull the accurate model from the cloud (with further optimisations where needed, like pulling only the data the user has access to or needs for the task at hand). Most importantly, users only need access to a small subset of 3D space on a daily basis, so you can keep a local cache on end devices for best performance and rendering, or stream the rendered result from the cloud like Nvidia GDN does.
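A minimal sketch of that lookup flow in Python (all names and the resolution step are hypothetical; this just illustrates the cache-then-cloud idea, not any real client):

```python
# Hypothetical client illustrating "identify the building, pull its model":
# resolve which asset the user is looking at, check a local cache, and only
# fall back to the cloud when the asset isn't cached yet.

class AssetClient:
    def __init__(self, cloud_fetch):
        self._cloud_fetch = cloud_fetch   # e.g. an HTTPS call to the asset store
        self._cache = {}                  # on-device cache: asset_id -> mesh bytes

    def get_model(self, lat, lon, heading):
        asset_id = self._resolve_asset(lat, lon, heading)
        if asset_id not in self._cache:
            # Users only ever need a small slice of the world, so after the
            # first few sessions most lookups should hit this local cache.
            self._cache[asset_id] = self._cloud_fetch(asset_id)
        return self._cache[asset_id]

    def _resolve_asset(self, lat, lon, heading):
        # Placeholder: a real client would use VPS-style visual localization here.
        return f"building:{round(lat, 4)}:{round(lon, 4)}"


client = AssetClient(cloud_fetch=lambda asset_id: b"<mesh bytes for " + asset_id.encode() + b">")
print(client.get_model(48.8584, 2.2945, heading=90))   # first call fetches, later calls hit cache
```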
The most precise models will come from CAD files for newly built buildings, then retrospectively from CAD files of buildings built in the last 20-30 years (I would bet most of them had some sort of computer model made), and finally, going back even further, from having AI look at the old 2D construction plans of a building and reconstruct it in 3D.
Once the building (or a concrete pole like the one shown in the article) is reconstructed, you can pull its 3D model from the cloud and place it in front of the user; this will cover 95% of AR use cases. For the other 5% you might want real-time recognition of the current state of surfaces or of changes in geometry (like tracking changes in road quality against previous scans or a reference model), but those cases can be tackled separately, and having a precise 3D model will only help; it won't need to be reconstructed from scratch.
This is a good first step toward a 3D map; however, there should be an option for an expert to go to the real location and edit the 3D plan, so that the model can be precise and not just "kind of" precise.
I'm sure the CIA already has access. [1] People were raising privacy concerns years ago. [2]
[1] https://www.networkworld.com/article/953621/the-cia-nsa-and-...
[2] https://kotaku.com/the-creators-of-pokemon-go-mapped-the-wor...
I still don't get what an LGM is. From what I understood, it isn't actually about any "geospatial" data at all, is it? It's rather about improving some vision models to predict how the back side of a building looks, right? And the training data isn't of people walking, but images they've produced while catching pokémon or something?
P.S.: Also, if that's indeed what they mean, I wonder why having Google Street View data isn't enough for that.
Impressive, but this is one of those "if this is public knowledge, how far ahead is the _not_ public knowledge" things
> For example, it takes us relatively little effort to back-track our way through the winding streets of a European old town. We identify all the right junctions although we had only seen them once and from the opposing direction.
That is true for some people, but I'm fairly sure that the majority of people would not agree that it comes naturally to them.
Somehow I always thought something like that would have been the ultimate use case for Microsoft Photosynth (developed from the Photo Tourism research project), ideally with a time dimension, like browsing photos in a geographic spatio-temporal context.
I expect that was also part of the reason behind their Flickr bid back then.
https://medium.com/@dddexperiments/why-i-preserved-photosynt...
https://phototour.cs.washington.edu
https://en.wikipedia.org/wiki/Photosynth
At least any patents regarding this will also expire around 2026.
> Today we have 10 million scanned locations around the world, and over 1 million of those are activated and available for use with our VPS service. We receive about 1 million fresh scans each week
Wait, they get a million a week but they only have a total of 10 million, i.e. 10 weeks' worth? Is this a typo or am I missing something?
Even before LLMs, I knew they were going to launch a fine-grained mapping service with all that camera and POI data. This is obviously much better. Very few companies actually have this kind of data. It remains to be seen how they make money out of it.
Interestingly, Pokemon GO only prompts players to scan a subset of the Points of Interest on the game map. Players can manually choose to scan any POI, but with no incentive for those scans I'm sure it almost never happens.
> Today we have 10 million scanned locations around the world, and over 1 million of those are activated and available for use with our VPS service.
This 1-in-10 figure seems about right, both from my experience as a player and from perusing the mentioned Visual Positioning System service. Most POI never get enough scan data to 'activate'. The data from POI that do activate can be accessed with a free account on Niantic Lightship [1], and has been available for a while.
I'll be curious to see how Niantic plans to fill in the gaps, and gather scan data for the 9 out of 10 POI that aren't designated for scan rewards.
I’ve published research in this general arena and the sheer amount of data they need to get good is massive. They have a moat the size of an ocean until most people have cameras and depth sensors on their face
It’s funny, we actually started by having people play games as well but we expressly told them it was to collect data. Brilliant to use an AR game that people actually play for fun
I'm guessing this could be the new bot that plays competitively at GeoGuessr. It would be interesting if Google trained a similar model on all the Street View data and released it; I sure hope they do.
Has anyone done something similar with geolocated WiFi MAC addresses, to have a small model for predicting location from those?
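One minimal way to do it (a toy fingerprint-lookup sketch, not any particular product's approach; the BSSID survey table below is made up):

```python
# Toy WiFi positioning: estimate location from visible BSSIDs by averaging the
# known positions of those access points, weighted by received signal strength.
# The BSSID-to-location table is assumed to come from a prior survey.

KNOWN_APS = {
    "aa:bb:cc:00:00:01": (52.5200, 13.4050),   # hypothetical survey data
    "aa:bb:cc:00:00:02": (52.5201, 13.4047),
}

def estimate_location(scan):
    """scan: list of (bssid, rssi_dbm) tuples from a WiFi scan."""
    total_w, lat_acc, lon_acc = 0.0, 0.0, 0.0
    for bssid, rssi in scan:
        if bssid not in KNOWN_APS:
            continue
        w = 10 ** (rssi / 10)            # stronger signal -> closer -> higher weight
        lat, lon = KNOWN_APS[bssid]
        lat_acc += w * lat
        lon_acc += w * lon
        total_w += w
    if total_w == 0:
        return None                      # no known access points in range
    return (lat_acc / total_w, lon_acc / total_w)

print(estimate_location([("aa:bb:cc:00:00:01", -45), ("aa:bb:cc:00:00:02", -60)]))
```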
I wonder how this can be combined with satellite data, if at all?
It may not be geospatial data at all, and I'm not sure how much the users consented, but the data collection strategy was well crafted. I remember recommending building a game to collect handwriting data from testers (about a thousand) to the research lab I worked for a long time back.
Genuinely impressed Google had the vision and resources to commit to a 10 year data collection project
Conversation about ‘players are the product’ of Pokémon go aside… What are some practical applications of an LGM?
Seems like navigation is ‘solved’? There’s already a lot of technology supporting permanence of virtual objects based on spatial mapping? Better AI generated animations?
I am sure there are a ton of innovations it could unlock…
Applications that I thought of as I read this:
Real-Time mapping of the environment for VR experiences with built-in semantic understanding.
Winning at GeoGuessr, automated doxing of anybody posting a picture of themselves.
Robotic positioning and navigation
Asset generation for video games. Think about generating an alternate New York City that's more influenced by Nepal.
I'm getting echoes of neural radiance fields as well.
Procedural generation of an alternative planet is the kind of stuff that the No Man's Sky devs could only dream of.
Is this related to NeRF (neural radiance fields)?
I wonder if there's a sweet spot for geospatial model size.
A model trained on all data for 1m in every direction would probably be too sparse to be useful, but perhaps involving data from a different continent is costly overkill? I expect most users are only going to care about their immediate surroundings. Seems like an opportunity for optimization.
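One way to exploit that locality (a sketch under the assumption of a tile-per-shard setup, which is not something Niantic has described): key model shards by map tile and load only the shards covering the user's surroundings.

```python
import math

# Per-tile model sharding sketch: quantize a position into a tile key and load
# only the shard(s) for nearby tiles. Tile size is the tuning knob between
# "too sparse to be useful" and "continent-scale overkill".

TILE_DEG = 0.01   # roughly 1 km at mid latitudes; purely illustrative

def tile_key(lat, lon):
    return (math.floor(lat / TILE_DEG), math.floor(lon / TILE_DEG))

def shards_for(lat, lon, radius_tiles=1):
    cx, cy = tile_key(lat, lon)
    return [(cx + dx, cy + dy)
            for dx in range(-radius_tiles, radius_tiles + 1)
            for dy in range(-radius_tiles, radius_tiles + 1)]

print(shards_for(48.8584, 2.2945))   # 3x3 tile neighbourhood around the Eiffel Tower
```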
So that's why Pokémon GO was notoriously impactful on battery life. They were recording and uploading our videos the whole time?
This seems like it’d be quite handy to have in an autonomous vehicle of any kind
Don't quite understand the application of this?
I’m intrigued by the generative possibilities of such a model even more than how it could be used with irl locations. Imagine a game or simulation that creates a realistic looking American suburbia on the fly. It honestly can’t be that difficult, it practically predicts itself.
People here are complaining that you are somehow owed something for contributing to the data set, or that because you use Google Maps or reCAPTCHA you are owed access to their training data. I mean, I'd like that data too. But you did get something in return already: a game that you enjoy (or you wouldn't play it), free and efficient navigation (better than your TomTom ever worked), sites not overwhelmed by bots or spammers. Yeah, Google gets more out of it than you probably do, but it's incorrect to say that you get 'nothing' in return.
Waymo is supposedly geofenced because they need detailed maps of an area, and this is supposedly a blocker for them deploying everywhere. But then Google goes and does something like this, and, if it's even true that Waymo needs such detailed maps, I'm not sure it's an insurmountable problem.
The data is marginally better than what Google already has.
This is literally what I built my first company around starting in 2012, when Niantic was still working on Ingress
I describe it here during 500 Startups demo day: https://youtu.be/3oYHxdL93zE?si=cvLob-NHNEIJqYrI&t=6411
I further described it in episode 1 of Planet of the Apps.
Here's my patent from 2018: https://patents.google.com/patent/US10977818B2/en
So I'm not really sure what to do here, given that this was exactly and specifically what we were building, and frankly we had a lot of success in actually building it.
Quite frustrating
Fucking cool. Hi old Niantic teammates, it's me Mark Johns ;).
The CIA has to be all over this.
I'm confused by both this blog post and the reception on HN. They... didn't actually train the model. This is an announcement of a plan! They don't actually know if it'll even work. They announced that they "trained over 50 million neural networks," but not that they've trained this neural network: the other networks appear to just have been things they were doing anyway (i.e. the "Virtual Positioning Systems"). They tout huge parameter counts ("over 150 trillion"), but that appears to be the sum of the parameters of the 50 million models they've previously trained, which implies each model had an average of... 3 million parameters. Not exactly groundbreaking scale. You could train one on a single consumer GPU.
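Back-of-envelope check on those figures (my arithmetic, not from the post):

```python
# Sanity check on the quoted numbers
total_params = 150e12        # "over 150 trillion parameters"
num_models   = 50e6          # "over 50 million neural networks"
avg_params   = total_params / num_models
print(f"{avg_params:,.0f} parameters per model")         # 3,000,000
print(f"~{avg_params * 4 / 1e6:.0f} MB of fp32 weights")  # ~12 MB, trivially fits on one consumer GPU
```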
This is a vision document, presumably intended to position Niantic as an AI company (and thus worthy of being showered with funding), instead of a mobile gaming company, mainly on the merit of the data they've collected rather than their prowess at training large models.