Benefits of Apache Iceberg for geospatial data analysis

MrPowers | 16 points

> today, spatial data users have a problem if they need to scale above about a million features

I found this claim surprising. A million features is a small dataset even for cartography, never mind geospatial analysis.

For years my rule of thumb has been that open source geospatial analysis comfortably scales to about a billion features, provided you know what you are doing and use the tools well. That is still a fairly small dataset for geospatial analysis, but it is large enough to cover a lot of use cases. The storage format matters only at the margin; scalability depends much more on optimal scheduling and query selectivity.

Scaling past this point puts you in the realm of exotic systems. These require custom I/O and execution schedulers, and they organize the data in much less obvious ways that nonetheless scale qualitatively better, because they achieve better selectivity with a smaller memory footprint.

jandrewrogers | 12 hours ago