SVDQuant+NVFP4: 4× Smaller, 3× Faster FLUX with 16-bit Quality on Blackwell GPUs

lmxyy | 49 points

SVDQuant now supports NVFP4 on NVIDIA Blackwell GPUs, with a 3× speedup over BF16 and better image quality than INT4. Try our interactive demo at https://svdquant.mit.edu/! All our code is available at https://github.com/mit-han-lab/nunchaku!

lmxyy | 16 hours ago

I assume they've messed up the prompt caption for the squirrel-looking creature?

Interesting to see how poor the prompt adherence is in these examples. The cyanobacteria one is just "an image of the ocean". The skincare one completely ignores 50% of the ingredients in the prompt, and makes the coffee beans the size and shape of almonds.

semi-extrinsic | 7 hours ago

I thought I'd already seen this in the previous discussion 3 months ago https://news.ycombinator.com/item?id=42093112, but that one used INT4 quantization, so NVFP4 is a further improvement on that. Sweet!

If I found the correct docs https://docs.nvidia.com/deeplearning/cudnn/frontend/latest/o... NVFP4 means each group of 16 4-bit floating-point values (1 sign bit, 2 exponent bits, 1 mantissa bit) shares one 8-bit floating-point scaling factor (1 sign bit, 4 exponent bits, 3 mantissa bits), so strictly speaking it's 4.5 bits per value.
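If it helps to see the arithmetic, here's a toy NumPy sketch of that scheme. The E2M1 value grid and per-group absmax scaling are standard; the function name is made up, and the scale is kept in full precision (the real NVFP4 path stores it as FP8 E4M3):

```python
import numpy as np

# Representable magnitudes of an E2M1 value (1 sign, 2 exponent, 1 mantissa bit).
FP4_E2M1 = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])

def quantize_block(block):
    """Quantize one group of 16 floats: shared absmax scale + E2M1 rounding.

    Hypothetical helper for illustration; NVFP4 stores the scale as FP8 E4M3,
    but it is kept in full precision here for clarity.
    """
    scale = np.abs(block).max() / 6.0   # map the largest magnitude onto 6.0
    if scale == 0.0:
        return np.zeros_like(block), 0.0
    # Round each scaled magnitude to the nearest representable FP4 value.
    idx = np.abs(np.abs(block) / scale - FP4_E2M1[:, None]).argmin(axis=0)
    return np.sign(block) * FP4_E2M1[idx], scale  # dequantized = codes * scale

# Storage: 16 values * 4 bits + one shared 8-bit scale = 72 bits for 16 values.
print((16 * 4 + 8) / 16)  # 4.5
```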

This grouped scaling immediately makes me wonder whether the quantization error could be reduced even more by permuting the matrix so values of similar magnitude are quantized together.
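A quick toy experiment supports the intuition. This sketch uses a simple symmetric integer 4-bit grouped quantizer as a stand-in (not the actual NVFP4 codes), and compares the error with the original layout against a layout permuted so that values of similar magnitude share a group, and hence a scale:

```python
import numpy as np

rng = np.random.default_rng(0)

def grouped_quant_mse(values, group_size=16):
    """MSE of a toy symmetric 4-bit grouped quantizer (absmax scale per group)."""
    total = 0.0
    for g in values.reshape(-1, group_size):
        scale = np.abs(g).max() / 7.0            # int4 levels -7..7
        if scale == 0.0:
            continue
        deq = np.clip(np.round(g / scale), -7, 7) * scale
        total += np.sum((g - deq) ** 2)
    return total / values.size

w = rng.standard_normal(4096)                    # stand-in for a weight row
mse_original = grouped_quant_mse(w)
# Permute so values of similar magnitude land in the same group.
mse_permuted = grouped_quant_mse(w[np.argsort(np.abs(w))])
print(f"original: {mse_original:.2e}  permuted: {mse_permuted:.2e}")
```

The sorted layout wastes less of each group's range on outliers, so its error comes out lower. The catch is that the permutation has to be undone (or folded into the adjacent layer) at inference time, which costs memory traffic.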

yorwba | 11 hours ago

FLUX-schnell is only 800ms on RTX 5090.

lmxyy | 14 hours ago

This is amazing

beebaween | 14 hours ago


Now release the LoRA conversion code you promised months ago…

42lux | 8 hours ago