Generative AI is going to drive the marginal cost of building 3D interactive content to zero. Unironically this will unlock the metaverse, cringe as that may sound. I'm more bullish than ever on AR/VR.
For the AI-uninitiated: is this something you could feasibly run at home, e.g. on a 4090? (How can I tell how "big" the model is from the GitHub or Hugging Face page?)
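One rough way to approach the "how big" question is to take the parameter count (or the total size of the checkpoint files listed on the model page) and work out how much memory the weights alone occupy at a given precision. The sketch below is only back-of-the-envelope arithmetic: the function names, the 30% headroom figure, and the example parameter count are all assumptions for illustration, not numbers from the Hunyuan3D-2 repo, and real usage also depends on activations, resolution, and the rest of the pipeline.

```python
# Back-of-the-envelope estimate of the memory needed to hold model weights.
# Everything here is illustrative; none of the numbers come from Hunyuan3D-2.

# Bytes used per parameter at common precisions.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "bf16": 2, "int8": 1}


def weights_gib(num_params: int, dtype: str = "fp16") -> float:
    """Memory for the weights alone, in GiB (activations etc. not included)."""
    return num_params * BYTES_PER_PARAM[dtype] / 2**30


def fits(num_params: int, vram_gib: float, dtype: str = "fp16",
         headroom: float = 0.3) -> bool:
    """True if the weights plus a guessed fractional headroom fit in vram_gib.

    The default headroom of 0.3 is an arbitrary assumption, not a measurement.
    """
    return weights_gib(num_params, dtype) * (1 + headroom) <= vram_gib


if __name__ == "__main__":
    # Hypothetical parameter count, NOT the real size of any Hunyuan3D-2 model.
    hypothetical_params = 2_000_000_000
    print(f"{weights_gib(hypothetical_params):.2f} GiB of weights in fp16")
    print("fits on a 24 GiB card:", fits(hypothetical_params, 24.0))
```

If the page does not state a parameter count, the combined size of the weight files is a reasonable stand-in for the weights figure, since the files store those same parameters.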
Ouch; License: EUROPEAN UNION, UNITED KINGDOM AND SOUTH KOREA
TENCENT HUNYUAN 3D 2.0 COMMUNITY LICENSE AGREEMENT
Tencent Hunyuan 3D 2.0 Release Date: January 21, 2025
THIS LICENSE AGREEMENT DOES NOT APPLY IN THE EUROPEAN UNION, UNITED KINGDOM AND SOUTH KOREA AND IS EXPRESSLY LIMITED TO THE TERRITORY, AS DEFINED BELOW.
https://github.com/Tencent/Hunyuan3D-2?tab=License-1-ov-file
Interesting. One of the diagrams suggests that the mesh is generated with the marching cubes algorithm, but the geometry of the meshes shown above is clearly not generated this way.
Any user-generated content system suffers from what we call “the penis problem”.
As with any generative model, trust but verify. Try it yourself. Frankly, as a generative researcher myself, there's a lot of reason to not trust what you see in papers and pages.
They link a Hugging Face page (great sign!): https://huggingface.co/spaces/tencent/Hunyuan3D-2
I tried to replicate the objects they show on their project page (https://3d-models.hunyuan.tencent.com/). The full prompts exist but are truncated so you can just inspect the element and grab the text.
Here's what I got
Leaf
PNG: https://0x0.st/8HDL.png
GLB: https://0x0.st/8HD9.glb
Guitar
PNG: https://0x0.st/8HDf.png other view: https://0x0.st/8HDO.png
GLB: https://0x0.st/8HDV.glb
Google Translate of Guitar:
Prompt: A brown guitar is centered against a white background, creating a realistic photography style. This photo captures the culture of the instrument and conveys a tranquil atmosphere.
PNG: https://0x0.st/8HDt.png and https://0x0.st/8HDv.png
Note: Weird thing on top of the guitar. But at least this time the strings aren't fusing into the sound hole.
I haven't tested my own prompts or the Google translation of the Chinese prompts because I'm getting an over-usage error (I'll edit the comment if I get them). That said, these look pretty good. The paper and page images definitely look better, but these aren't like Stable Diffusion 1 paper vs Stable Diffusion 1 reality.
But these are long and detailed prompts. Lots of prompt engineering. That should raise some suspicion. Real-world use has higher variance, so let's get an idea of how hard it is to use. So let's try some simpler things :)
Prompt: A guitar
PNG: https://0x0.st/8HDg.png
Note: Not bad! Definitely overfit, but does that matter here? A bit too thick for an electric guitar but too thin for an acoustic.
Prompt: A Monstera leaf
PNG: https://0x0.st/8HD6.png
https://0x0.st/8HDl.png
https://0x0.st/8HDU.png
Note: A bit wonkier. I picked this because it looked like the leaf in the example but this one is doing some odd things.
It's definitely a leaf and Monstera-like, but a bit of a mutant.
Prompt: Mario from Super Mario Bros
PNG: https://0x0.st/8Hkq.png
Note: Now I'm VERY suspicious....
Prompt: Luigi from Super Mario Bros
PNG: https://0x0.st/8Hkc.png
https://0x0.st/8HkT.png
https://0x0.st/8HkA.png
Note: Highly overfit[0]. This is what I suspected. Luigi isn't just tall Mario.
Where is the tie coming from? The suspender buttons are all messed up.
Really went uncanny valley here. So this suggests we're really brittle.
Prompt: Peach from Super Mario Bros
PNG: https://0x0.st/8Hku.png
https://0x0.st/8HkM.png
Note: I'm fucking dying over here this is so funny. It's just a peach with a cute face hahahahaha
Prompt: Toad from Super Mario Bros
PNG: https://0x0.st/8Hke.png
https://0x0.st/8Hk_.png
https://0x0.st/8HkL.png
Note: Lord have mercy on this toad, I think it is a mutated Squirtle.
Paper can be found here (the arXiv badge on the page leads to a PDF in the repo, which GitHub is slow to render): https://arxiv.org/abs/2411.02293
(If you want to share images like I did, all I'm doing is `curl -F'file=@foobar.png' https://0x0.st`)
[0] Overfit is a weird thing now. Maybe it doesn't generalize well, but sometimes that's not a problem. I think this is one of the bigger lessons we've learned with recent ML models. My viewpoint is "Sometimes you want a database with a human language interface. Sometimes you want to generalize". So we have to be more context driven here. But certainly there are a lot of things we should be careful about when we're talking about generation. These things are trained on A LOT of data. If you're more "database-like" then certainly there are potential legal ramifications...
Edit: For context, by "look pretty good" I mean in comparison to other works I've seen. I think it is likely a ways from being useful in production. I'm not sure how much human labor would be required to fix the issues.
Question related to 3D mesh models in general: has any significant work been done on models oriented towards photogrammetry?
Case in point, I have a series of photos (48) that capture a small statue. The photos are high quality, the object was on a rotating platform. Lighting is consistent. The background is solid black.
These are normally ideal conditions for photogrammetry, but none of the common applications and websites do a very good job of creating a mesh from them that isn't super low-poly and/or full of holes.
I've been casually scanning Hugging Face for relevant models to try out but haven't really found anything.