I'm currently making 2bit to 8bit GGUFs for local deployment! Will be up in an hour or so at https://huggingface.co/unsloth/Qwen3-Coder-480B-A35B-Instruc...
Also docs on running it in a 24GB GPU + 128 to 256GB of RAM here: https://docs.unsloth.ai/basics/qwen3-coder
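For reference, the basic trick in setups like that (as I understand it) is to keep the dense layers on the GPU and push the MoE expert tensors out to system RAM with llama.cpp's tensor-override flag. A sketch only — the GGUF filename and context size below are my assumptions, not exact values from the docs:

```shell
# Hypothetical llama.cpp launch for a 24GB GPU + big-RAM box:
# -ot / --override-tensor pins the MoE expert tensors to CPU RAM,
# while --n-gpu-layers 99 keeps everything else on the GPU.
./llama-server \
  --model Qwen3-Coder-480B-A35B-Instruct-Q2_K.gguf \
  --n-gpu-layers 99 \
  -ot ".ffn_.*_exps.=CPU" \
  --ctx-size 32768
```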
I've been using it all day and it rips. I had to bump the tool-calling limit in Cline up to 100, and it just went through the app with no issues: got the mobile app built, fixed the linter errors... I wasn't even hosting it with the tool-call template enabled on the vLLM nightly; on stock vLLM it understood the tool-call instructions just fine.
> Qwen3-Coder is available in multiple sizes, but we’re excited to introduce its most powerful variant first
I'm most excited for the smaller sizes because I'm interested in locally-runnable models that can sometimes write passable code, and I think we're getting close. But for the foreseeable future I'll probably still sometimes want to "call in" a bigger model that I can't realistically or affordably host on my own machine, so I love having high-quality open-weight models as an option for that, and I also like the idea of "paying in" for the smaller open-weight models I play around with by renting access to their larger counterparts.
Congrats to the Qwen team on this release! I'm excited to try it out.
The "qwen-code" app seems to be a gemini-cli fork.
https://github.com/QwenLM/qwen-code https://github.com/QwenLM/qwen-code/blob/main/LICENSE
I hope these OSS CC clones converge at some point.
Actually, it is mentioned on the page:
we’re also open-sourcing a command-line tool for agentic coding: Qwen Code. Forked from Gemini Code
This suggests adding a `QWEN.md` in the repo for agents instructions. Where are we with `AGENTS.md`? In a team repo it's getting ridiculous to have a duplicate markdown file for every agent out there.
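One stopgap I've been using (my own workaround, not any kind of standard): keep a single canonical AGENTS.md and symlink each agent's expected filename to it, so there's only one file to maintain:

```shell
# Keep one source of truth and symlink the per-agent filenames to it.
# (The filenames are just the ones each tool looks for.)
echo "Project conventions live here." > AGENTS.md
ln -sf AGENTS.md QWEN.md    # read by Qwen Code
ln -sf AGENTS.md CLAUDE.md  # read by Claude Code
ln -sf AGENTS.md GEMINI.md  # read by Gemini CLI
cat QWEN.md
```

Git tracks symlinks fine, so this works in a team repo as long as everyone's tooling follows links.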
How does one keep up with all this change? I wish we could fast-forward like 2-3 years to see if an actual winner has landed by then. I feel like at that point there will be THE tool, with no one thinking twice about using anything else.
What sort of hardware will run Qwen3-Coder-480B-A35B-Instruct?
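A quick back-of-envelope for the weights alone (my rough math, not official file sizes — real quants mix bit widths and you still need room for KV cache):

```python
# Rough GGUF size estimate: total params * bits per weight / 8.
# 480e9 is the model's total parameter count; overhead is ignored.
PARAMS = 480e9

for bits in (2, 4, 8):
    gb = PARAMS * bits / 8 / 1e9
    print(f"{bits}-bit: ~{gb:.0f} GB")  # 2-bit: ~120 GB, 4-bit: ~240 GB, 8-bit: ~480 GB
```

So even the 2-bit quants want on the order of 128GB+ of combined VRAM and RAM, though only the ~35B active parameters are touched per token, which is what makes CPU offload tolerable.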
With performance apparently comparable to Sonnet, some heavy Claude Code users could be interested in running it locally. There are instructions for configuring it for use with Claude Code. Huge usage bills are regularly shared on X, so it might even be economical (say, for a team of six or so sharing a local instance).
Glad to see everyone centering on using OpenHands [1] as the scaffold! Nothing more frustrating than seeing "private scaffold" on a public benchmark report.
I just checked and it's up on OpenRouter. (not affiliated) https://openrouter.ai/qwen/qwen3-coder
Does anyone understand the pricing? On OpenRouter (https://openrouter.ai/qwen/qwen3-coder) you have:

Alibaba Plus: input $1 to $6, output $5 to $60 (per million tokens)

Alibaba OpenSource: input $1.50 to $4.50, output $7.50 to $22.50

So it doesn't look that cheap compared to Kimi K2 or their non-coder version (Qwen3 235B A22B 2507).

Even more confusing is the "up to" pricing that can apparently reach $60 per million output tokens; with agents it's not easy to control context.
> Additionally, we are actively exploring whether the Coding Agent can achieve self-improvement
How casually we enter the sci-fi era.
Open-weight models matching Claude 4 is exciting! It's actually possible to run this locally since it's MoE.
Odd to see this languishing at the bottom of /new. Looks very interesting.
Open, small, roughly Sonnet 4-ish if the benchmarks are to be believed, tool use?
Thank god I already made an Alibaba Cloud account last year, because this interface sucks big time. At least you get 1 million tokens free (once?). Bit confusing that they forked the Gemini CLI but you still have to set OpenAI environment variables.
Much faster than Claude Sonnet 4 with similar results.
I'm waiting on this to be released on Groq or Cerebras for high speed vibe coding.
Can someone please build these CLIs in Rust with Ratatui?
I'm confused: why would this LLM require OpenAI API keys?
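As far as I can tell, "OpenAI" here just means the wire protocol, not the company: the CLI speaks the OpenAI-compatible API, so the key and base URL can point at any compatible provider. A sketch of the environment variables (the variable names and endpoint are my assumptions from the docs, not verified):

```shell
# "OpenAI" refers to the OpenAI-compatible API format; point the base
# URL at OpenRouter, Alibaba's endpoint, or a local vLLM/llama.cpp server.
export OPENAI_API_KEY="your-provider-key"
export OPENAI_BASE_URL="https://openrouter.ai/api/v1"  # any OpenAI-compatible endpoint
export OPENAI_MODEL="qwen/qwen3-coder"                 # model id as the provider names it
```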
At my work, here is a typical breakdown of time spent by work areas for a software engineer. Which of these areas can be sped up by using agentic coding?
5%: Making code changes
10%: Running build pipelines
20%: Learning about changed process and people via zoom calls, teams chat and emails
15%: Raising incident tickets for issues outside of my control
20%: Submitting forms, attending reviews and chasing approvals
20%: Reaching out to people for dependencies, following up
10%: Finding and reading up some obscure and conflicting internal wiki page, which is likely to be outdated