Hunyuan3D 2.0 – High-Resolution 3D Assets Generation

TheGuyWhoCodes | 206 points

Question related to 3D mesh models in general: has any significant work been done on models oriented towards photogrammetry?

Case in point, I have a series of photos (48) that capture a small statue. The photos are high quality, the object was on a rotating platform. Lighting is consistent. The background is solid black.

These are normally ideal conditions for photogrammetry, but none of the common applications and websites do a good job of creating a mesh from them that isn't super low poly and/or full of holes.

I've been casually scanning huggingface for relevant models to try out but haven't really found anything.

geuis | 8 hours ago

Generative AI is going to drive the marginal cost of building 3D interactive content to zero. Unironically this will unlock the metaverse, cringe as that may sound. I'm more bullish than ever on AR/VR.

MikeTheRocker | 10 hours ago

For the AI uninitiated: is this something you could feasibly run at home, e.g. on a 4090? (How can I tell how "big" the model is from the GitHub or Hugging Face page?)
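
One rough way to gauge this, as a sketch rather than anything specific to this model: weight memory is roughly the parameter count times the bytes per parameter (2 for fp16/bf16, 4 for fp32), plus some headroom for activations. The parameter counts and the 1.5x overhead factor below are illustrative assumptions, not figures from the Hunyuan3D repo; the 24 GB budget corresponds to a 4090.

```python
def weight_memory_gib(num_params: float, bytes_per_param: int = 2) -> float:
    """Memory needed just to hold the weights, in GiB."""
    return num_params * bytes_per_param / 2**30


def fits(num_params: float, vram_gib: float = 24.0,
         bytes_per_param: int = 2, overhead: float = 1.5) -> bool:
    """Crude check: weights times an assumed overhead factor vs. available VRAM.

    The overhead factor is a guess standing in for activations and buffers;
    real usage depends on the pipeline, resolution, and batch size.
    """
    return weight_memory_gib(num_params, bytes_per_param) * overhead <= vram_gib


if __name__ == "__main__":
    # Hypothetical parameter counts, for illustration only.
    for n in (1e9, 3e9, 20e9):
        print(f"{n:.0e} params: {weight_memory_gib(n):.1f} GiB in fp16, "
              f"fits on a 24 GB card: {fits(n)}")
```

Plug in the parameter count from the model card, if it lists one, instead of the placeholder values.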

denkmoon | 10 hours ago

Ouch; License: EUROPEAN UNION, UNITED KINGDOM AND SOUTH KOREA

  TENCENT HUNYUAN 3D 2.0 COMMUNITY LICENSE AGREEMENT
  Tencent Hunyuan 3D 2.0 Release Date: January 21, 2025
  THIS LICENSE AGREEMENT DOES NOT APPLY IN THE EUROPEAN UNION, UNITED KINGDOM AND SOUTH KOREA AND IS EXPRESSLY LIMITED TO THE TERRITORY, AS DEFINED BELOW.

https://github.com/Tencent/Hunyuan3D-2?tab=License-1-ov-file

pella | 10 hours ago

Interesting. One of the diagrams suggests that the mesh is generated by the marching cubes algorithm, but the geometry of the meshes shown above is clearly not generated in this way.

sebzim4500 | 10 hours ago

Any user-generated content system suffers from what we call “the penis problem”.

xgkickt | 6 hours ago

As with any generative model, trust but verify. Try it yourself. Frankly, as a generative researcher myself, there are plenty of reasons not to trust what you see in papers and on project pages.

They link a Huggingface page (great sign!): https://huggingface.co/spaces/tencent/Hunyuan3D-2

I tried to replicate the objects they show on their project page (https://3d-models.hunyuan.tencent.com/). The full prompts exist but are truncated on the page, so you can just inspect the element and grab the text.

  Here's what I got
  Leaf
     PNG: https://0x0.st/8HDL.png
     GLB: https://0x0.st/8HD9.glb
  Guitar
     PNG: https://0x0.st/8HDf.png  other view: https://0x0.st/8HDO.png
     GLB: https://0x0.st/8HDV.glb
  Google Translate of Guitar:
     Prompt: A brown guitar is centered against a white background, creating a realistic photography style. This photo captures the culture of the instrument and conveys a tranquil atmosphere.
     PNG: https://0x0.st/8HDt.png   and  https://0x0.st/8HDv.png
     Note: Weird thing on top of guitar. But at least this time the strings aren't fusing into the sound hole.

I haven't tested my own prompts or the Google translation of the Chinese prompts because I'm getting an overusage error (I'll edit the comment if I get them). That said, these look pretty good. The paper and page images definitely look better, but these aren't like Stable Diffusion 1 paper vs Stable Diffusion 1 reality.

But these are long and detailed prompts. Lots of prompt engineering. That should raise some suspicion. Real-world use has higher variance, so let's get an idea of how hard it is to use. Let's try some simpler things :)

  Prompt: A guitar
    PNG: https://0x0.st/8HDg.png
    Note: Not bad! Definitely overfit but does that matter here? A bit too thick for an electric guitar but too thin for an acoustic.
  Prompt: A Monstera leaf
    PNG: https://0x0.st/8HD6.png  
         https://0x0.st/8HDl.png  
         https://0x0.st/8HDU.png
    Note: A bit wonkier. I picked this because it looked like the leaf in the example, but this one is doing some odd things.
          It's definitely a leaf and Monstera-like, but a bit of a mutant.
  Prompt: Mario from Super Mario Bros
    PNG: https://0x0.st/8Hkq.png
    Note: Now I'm VERY suspicious....
  Prompt: Luigi from Super Mario Bros
    PNG: https://0x0.st/8Hkc.png
         https://0x0.st/8HkT.png  
         https://0x0.st/8HkA.png
    Note: Highly overfit[0]. This is what I suspected. Luigi isn't just tall Mario. 
          Where is the tie coming from? The suspender buttons are all messed up. 
          Really went uncanny valley here. So this suggests we're really brittle. 
  Prompt: Peach from Super Mario Bros
    PNG: https://0x0.st/8Hku.png  
         https://0x0.st/8HkM.png
    Note: I'm fucking dying over here this is so funny. It's just a peach with a cute face hahahahaha
  Prompt: Toad from Super Mario Bros
    PNG: https://0x0.st/8Hke.png 
         https://0x0.st/8Hk_.png
         https://0x0.st/8HkL.png
    Note: Lord have mercy on this toad, I think it is a mutated Squirtle.

Paper can be found here (the arXiv badge on the page leads to a PDF in the repo, which GitHub is slow to render): https://arxiv.org/abs/2411.02293

(If you want to share images like I did all I'm doing is `curl -F'file=@foobar.png' https://0x0.st`)

[0] Overfit is a weird thing now. Maybe it doesn't generalize well, but sometimes that's not a problem. I think this is one of the bigger lessons we've learned with recent ML models. My viewpoint is "Sometimes you want a database with a human language interface. Sometimes you want to generalize". So we have to be more context driven here. But certainly there are a lot of things we should be careful about when we're talking about generation. These things are trained on A LOT of data. If you're more "database-like" then there are certainly potential legal ramifications...

Edit: For context, by "look pretty good" I mean in comparison to other works I've seen. I think it is likely a ways from being useful in production. I'm not sure how much human labor would be required to fix the issues.

godelski | 10 hours ago

[flagged]

artemonster | 11 hours ago