Qwen2.5: A Party of Foundation Models

apsec112 | 161 points

Probably an ignorant question, but could someone explain why the Context Length is much larger than the Generation Length?

jcoc611 | a day ago
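
Context length is the total window the model attends over, i.e. the prompt plus everything generated so far, while generation length caps how many new tokens come back from a single call. A minimal sketch of where each limit shows up, assuming a Hugging Face transformers generate() call; the model name and the 8192-token budget are illustrative, not figures from the release:

    # Sketch: context length vs. generation length in a typical generate() call.
    # Model name and token budgets are illustrative assumptions.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_name = "Qwen/Qwen2.5-7B-Instruct"  # illustrative choice
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name)

    prompt = "Summarize the following report: ..."
    inputs = tokenizer(prompt, return_tensors="pt")

    # The context length bounds prompt tokens + generated tokens combined;
    # max_new_tokens is the generation-length cap for this single call.
    outputs = model.generate(**inputs, max_new_tokens=8192)
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))

So a model can read far more in one pass than it is set up to write back.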

I'm impressed by the scope of this drop. The raw intelligence of open models seems to be falling behind the closed ones. But I think that's because frontier models from OpenAI and Anthropic are not just raw models, but probably include stuff like CoT, best-of-N sampling, or control vectors.

irthomasthomas | 17 hours ago
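
Best-of-N, at least, is easy to layer on top of any open model: sample several candidates and keep the one a scoring function prefers. A rough sketch, assuming a local transformers pipeline and a placeholder scorer (a real setup would use a learned reward model; nothing here reflects what OpenAI or Anthropic actually run):

    # Best-of-N sampling sketch: draw N candidates, keep the highest-scoring one.
    # The scorer is a placeholder heuristic standing in for a reward model.
    from transformers import pipeline

    generator = pipeline("text-generation", model="Qwen/Qwen2.5-0.5B-Instruct")  # illustrative

    def score(text: str) -> float:
        # Placeholder: prefer answers close to 100 words.
        return -abs(len(text.split()) - 100)

    prompt = "Explain why the sky is blue."
    candidates = generator(
        prompt,
        do_sample=True,
        temperature=0.8,
        num_return_sequences=4,  # N = 4
        max_new_tokens=200,
    )
    best = max(candidates, key=lambda c: score(c["generated_text"]))
    print(best["generated_text"])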

32B is a nice size for 2x 3090s. It comfortably fits across the two GPUs with minimal quantization and still leaves extra memory for the long context length.

70B is just a little rough to run without offloading some layers to the CPU.

freeqaz | a day ago
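
The arithmetic behind that sizing, as a rough sketch (weights only; KV cache, activations, and runtime overhead come out of the leftover headroom):

    # Back-of-the-envelope VRAM estimate for a 32B-parameter model on 2x 24 GB 3090s.
    # Weights only; KV cache and runtime overhead use the remaining headroom.
    params = 32e9
    total_vram_gb = 2 * 24

    for label, bytes_per_param in [("fp16", 2.0), ("8-bit", 1.0), ("4-bit", 0.5)]:
        weights_gb = params * bytes_per_param / 1024**3
        print(f"{label}: ~{weights_gb:.0f} GB of weights vs. {total_vram_gb} GB available")

That works out to roughly 60 GB at fp16, 30 GB at 8-bit, and 15 GB at 4-bit, so an 8-bit or lighter quantization fits in 48 GB with room left for a long context, while 70B needs heavier quantization or CPU offload to squeeze in.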

It would be nice to have comparisons to Claude 3.5 for the coder model; comparing only to open-source models isn't super helpful, because I'd want to compare against the model I'm currently using for development work.

Flux159 | a day ago

Actually really impressive. They went up from 7T to 18T pretraining tokens. Curious to see how they perform after fine-tuning.

ekojs | a day ago
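
For fine-tuning on consumer hardware, a LoRA-style adapter is the usual route. A minimal sketch assuming the peft + transformers stack, with the model name, rank, and target modules as illustrative defaults rather than anything the Qwen team recommends:

    # LoRA fine-tuning sketch: freeze the base model, train small adapter matrices.
    # Model name, rank, and target modules are illustrative assumptions.
    from transformers import AutoModelForCausalLM
    from peft import LoraConfig, get_peft_model

    base = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-7B")  # illustrative
    lora = LoraConfig(
        r=16,
        lora_alpha=32,
        target_modules=["q_proj", "v_proj"],  # attention projections; adjust per architecture
        task_type="CAUSAL_LM",
    )
    model = get_peft_model(base, lora)
    model.print_trainable_parameters()  # only the adapter weights are trainable
    # ...then train with the usual Trainer loop and a task-specific dataset.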

> we are inspired by the recent advancements in reinforcement learning (e.g., o1)

It will be interesting to see what the future brings as models incorporate chain-of-thought approaches, and whether o1 ends up outperformed by open-source models.

cateye | a day ago
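
Plain chain-of-thought prompting is already easy to try with open weights (whatever o1 does internally is unpublished, so this is only the prompt-level version): ask the model to reason step by step and read off the final line. A sketch, with the model name and the answer-extraction heuristic as assumptions:

    # Chain-of-thought prompting sketch: elicit intermediate reasoning, then take
    # the final line as the answer. Model name and parsing are illustrative.
    from transformers import pipeline

    generator = pipeline("text-generation", model="Qwen/Qwen2.5-0.5B-Instruct")  # illustrative

    question = "A train covers 60 km in 45 minutes. What is its average speed in km/h?"
    prompt = f"{question}\nLet's think step by step, then put the final answer on its own line."

    trace = generator(prompt, max_new_tokens=256, do_sample=False)[0]["generated_text"]
    print(trace.strip().splitlines()[-1])  # crude extraction, fine for a sketch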

>our latest large-scale dataset, encompassing up to 18 trillion tokens

I remember when GPT-3 was trained on 300B tokens.

GaggiX | a day ago