As someone who works on satellite imagery, this part is incredibly exciting:
> ViT models pretrained on satellite dataset (SAT-493M)
DINOv2 had pretty poor out-of-the-box performance on satellite/aerial imagery, so it's super exciting that they released a version of it specifically for this use case.
I think SAM and DINO are the two off-the-shelf image models I've gotten the most mileage out of.
To obtain the models you have to share your contact information, including date of birth, and then be approved for access. Given that it's Meta, I assume they're actually validating it against their All Humans database.
They made their own DINOv3 license for this release (whereas DINOv2 used the Apache 2.0 license).
Neat though. Will still check it out.
As a first comment, I had to install the latest transformers==4.56.0dev (e.g. pip install git+https://github.com/huggingface/transformers) for it to work properly. 4.55.2 and earlier were failing with a missing image type in the config.
Could anyone point to an example or git repo showing a simple implementation?
I’m fascinated by this, but am admittedly clueless about how to actually go about building any kind of recognizer or other system atop it.
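Not a full answer, but here is a rough sketch of the usual pattern for building a recognizer on top of a frozen image embedder: embed a handful of labeled images, average the embeddings per class, and classify a new image by whichever class centroid is most similar. Everything here is an assumption for illustration: the random vectors stand in for real model embeddings (in practice they would come from the model's output), and fit_centroids, predict, and fake_embedding are made-up helper names, not part of any library.

```python
import math
import random


def l2_normalize(vec):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def fit_centroids(embeddings, labels):
    """Average the normalized embeddings of each class into one centroid per label."""
    sums, counts = {}, {}
    for emb, label in zip(embeddings, labels):
        emb = l2_normalize(emb)
        if label not in sums:
            sums[label] = [0.0] * len(emb)
            counts[label] = 0
        sums[label] = [s + x for s, x in zip(sums[label], emb)]
        counts[label] += 1
    return {
        label: l2_normalize([s / counts[label] for s in sums[label]])
        for label in sums
    }


def predict(centroids, embedding):
    """Return the label whose centroid has the highest cosine similarity."""
    emb = l2_normalize(embedding)
    return max(
        centroids,
        key=lambda label: sum(a * b for a, b in zip(centroids[label], emb)),
    )


def fake_embedding(center, rng, noise=0.1):
    """Hypothetical stand-in for a model embedding: a class center plus Gaussian noise."""
    return [c + rng.gauss(0.0, noise) for c in center]


rng = random.Random(0)
# Two made-up classes with well-separated 16-dimensional "embeddings".
class_centers = {
    "forest": [1.0] * 8 + [0.0] * 8,
    "water": [0.0] * 8 + [1.0] * 8,
}
train = [
    (fake_embedding(center, rng), name)
    for name, center in class_centers.items()
    for _ in range(20)
]
centroids = fit_centroids([e for e, _ in train], [l for _, l in train])

query = fake_embedding(class_centers["water"], rng)
print(predict(centroids, query))
```

With real embeddings the only part that changes is where the vectors come from. If nearest-centroid is too crude for your data, the same frozen features can feed a small linear classifier instead.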
If I’m already using siglip2 for a clustering application, is this enough of an uplift that I should look at it?
I have no idea what this even is.
That's awesome. DINOv2 was the best image embedder until now.
This was submitted earlier:
DINOV3: Self-supervised learning for vision at unprecedented scale | https://news.ycombinator.com/item?id=44904608
- Blog post: https://ai.meta.com/blog/dinov3-self-supervised-vision-model...
- Paper: https://ai.meta.com/research/publications/dinov3/
- Hugging Face: https://huggingface.co/collections/facebook/dinov3-68924841b...