Cerebras Code

d3vr | 311 points

Tried this out with Cline using my own API key (Cerebras is also available as a provider for Qwen3 Coder via openrouter here: https://openrouter.ai/qwen/qwen3-coder) and realized that without caching, this becomes very expensive very quickly. Specifically, after each new tool call, you're sending the entire previous message history as input tokens - which are priced at $2/1M via the API just like output tokens.
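
A rough sketch of why that adds up: if every tool call resends the whole conversation, total input tokens grow roughly quadratically with the number of calls. The $2/1M price is the one quoted above; the step count, system prompt size, and tokens added per step are made-up numbers for illustration only.

```python
# Illustrative only: steps, system_tokens and tokens_per_step are assumed
# values, not measurements. $2 per 1M tokens is the API price quoted above.
PRICE_PER_MILLION_USD = 2.0


def uncached_input_tokens(steps: int, system_tokens: int, tokens_per_step: int) -> int:
    """Total input tokens billed when each call resends the full history."""
    total = 0
    history = system_tokens
    for _ in range(steps):
        total += history            # the whole history is billed as input again
        history += tokens_per_step  # the tool call and its result get appended
    return total


def cost_usd(tokens: int) -> float:
    """Cost of a token count at the flat per-million price."""
    return tokens / 1_000_000 * PRICE_PER_MILLION_USD


tokens = uncached_input_tokens(steps=50, system_tokens=10_000, tokens_per_step=2_000)
print(tokens, round(cost_usd(tokens), 2))
```

With these assumed numbers, a 50-step run bills about 2.95M input tokens, or roughly $5.90, even though the conversation itself only ever reaches about 110k tokens.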

The quality is also not quite what Claude Code gave me, but the speed is definitely way faster. If Cerebras supported caching & reduced token pricing for using the cache I think I would run this more, but right now it's too expensive per agent run.

Flux159 | 9 hours ago

> running at speeds of up to 2,000 tokens per second, with a 131k-token context window, no proprietary IDE lock-in, and no weekly limits!

I was excited, then I read this:

> Send up to 1,000 messages per day—enough for 3–4 hours of uninterrupted vibe coding.

I don't mind paying for services I use. But it's hard to take this seriously when the claim in the first paragraph contradicts the fine print.

thanhhaimai | 10 hours ago

If you would like to try this in a coding agent (we find the qwen3-coder model works really well in agents!), we have been experimenting with Cerebras Code in Sketch. We just pushed support, so you can run it with the latest version, 0.0.33:

  brew install boldsoftware/tap/sketch
  CEREBRAS_API_KEY=...
  sketch --model=qwen3-coder-cerebras -skaband-addr=

Our experience is it seems overloaded right now, to the point where we have better results with our usual hosted version:

  sketch --model=qwen

crawshaw | 10 hours ago

Some users who signed up for pro ($50 p.m.) are reporting further limitations than those advertised.

> While they advertise a 1,000-request limit, the actual daily constraint is a 7.5 million-token limit. [1]

That assumes an average of 7.5k tokens per request, whereas their marketing videos show API requests ballooning by ~24k per request. Still lower than the API price.

[1] https://old.reddit.com/r/LocalLLaMA/comments/1mfeazc/cerebra...
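
The arithmetic behind those numbers, as a quick sketch. The 7.5M cap, the 1,000 requests, the ~24k figure and the $50 plan are from this comment; the $2/1M API price is the one quoted elsewhere in the thread; reading "~24k per request" as the size of each request and using a 30-day month are my assumptions.

```python
# Figures taken from the thread; the 30-day month and treating ~24k as the
# token size of each request are assumptions made for this estimate.
DAILY_TOKEN_CAP = 7_500_000
ADVERTISED_REQUESTS = 1_000
TOKENS_PER_REQUEST_OBSERVED = 24_000
API_PRICE_PER_MILLION_USD = 2.0
PLAN_PRICE_PER_MONTH_USD = 50.0
DAYS_PER_MONTH = 30

# Average request size the advertised limit implicitly assumes.
implied_tokens_per_request = DAILY_TOKEN_CAP / ADVERTISED_REQUESTS

# Requests that fit under the cap at the larger observed size.
requests_before_cap = DAILY_TOKEN_CAP // TOKENS_PER_REQUEST_OBSERVED

# What a fully used daily cap would cost at the pay-per-token API price.
api_cost_per_day = DAILY_TOKEN_CAP / 1_000_000 * API_PRICE_PER_MILLION_USD
api_cost_per_month = api_cost_per_day * DAYS_PER_MONTH

print(implied_tokens_per_request, requests_before_cap)
print(api_cost_per_day, api_cost_per_month, PLAN_PRICE_PER_MONTH_USD)
```

So at ~24k tokens per request the cap is hit after roughly 312 requests rather than 1,000, while a fully used cap would cost about $15/day (about $450/month) at the API price, versus $50/month for the plan.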

unraveller | 6 hours ago

2k tokens/second is insane. While I'm very much against vibe coding, such performance essentially means you can get near-GitHub-Copilot-level speed with drastically better quality.

For in-editor use that's game changing.

alfalfasprout | 10 hours ago

Windsurf also has Cerebras/Qwen3-Coder. 1000 user messages per month for $15

https://x.com/windsurf/status/1951340259192742063

exclipy | 3 hours ago

How does context buildup work for the code-generating machines generally? Do the programs just use human notes + current code directly? Are there some specific ranking steps that need to be done?

another_twist | an hour ago

I was waiting for more subscription-based services to pop up to compete with the inference providers on a commodity level.

I think a lot more companies will follow suit and the competition will make pricing much better for the end user.

Congrats on the launch, Cerebras team!

namanyayg | 11 hours ago

Does it work with claude-code-router? I was getting API errors this week trying to use qwen3 Cerebras through OpenRouter with Claude code router.

ktsakas | 11 hours ago

Their hardware is incredible. Why aren’t more investors lining up for this in this environment?

lvl155 | 9 hours ago

FYI, you are probably going to use up your tokens because there's a total limit of tokens per day, so in about 300 requests it's feasible to use it all up. See https://www.reddit.com/r/LocalLLaMA/comments/1mfeazc/cerebra...

segmondy | 5 hours ago

I'm so excited to see a real competitor to Claude Code! Gemini CLI, while decent, does not have a $200/month pricing model and they charge per API access - Codex is the same. I'm trying to get into https://cloud.cerebras.ai/ to try the $50/month plan but I can't even get in.

sneilan1 | 10 hours ago

At $200/month the comparable should be Opus 4 not Sonnet 4.

clbrmbr | 10 hours ago

Attn: Cerebras

Any attempt to deal with "<think>" in the code gets it replaced with "<tool_call>".

Both in inference.cerebras.ai chat and API.

Same model on chat.qwen.ai doesn't do it.

attentive | 7 hours ago

My understanding is that the coding agents people use can be modified to plug into any LLM provider's API?

The difference here seems to be that Cerebras does not appear to have Qwen3-Coder through their API! So now there is a crazy fast (and apparently good too?) model that they only provide if you pay the crazy monthly sub?

sophia01 | 11 hours ago

I'm finding myself switching between subscriptions to ChatGPT, T3 Chat, DeepSeek, Claude Code etc. Their subscription models aren't compatible with making it easy to take your data with you. I wish I could try this out and import all my data.

deevus | 7 hours ago

I've been waiting on this for a LONG time. Integration with Cursor when Cerebras released their earlier models was patchy at best, even through openrouter. It's nice to finally see official support, although I'm a bit worried that, long-term, the time spent on bash MCP calls will end up dominating.

Still, definitely the right direction!

EDIT: doesn't seem like anything but a first-party api with a monthly plan.

JackYoustra | 11 hours ago

Super curious to see some comparisons to claude code. Especially Opus, since they're primarily comparing it to Sonnet in that graph.

unshavedyak | 10 hours ago

I use regular cerebras for plan stage in cline, so I’m very excited to try this out

atkailash | 7 hours ago

Is this available as cline/roo-code integration? I think it might be on openrouter too.

lxe | 9 hours ago

For those that have tried this, what kind of time-to-first-token latency are you seeing?

dpkirchner | 10 hours ago

Groq also probably has this in the works. Fun times.

scosman | 9 hours ago

What are the token prices?

cellis | 8 hours ago

They should just host all the latest open source models FTW.

esafak | 6 hours ago

It says it works with your favorite IDE-- How do you (the reader) plan to use this? I use Cursor, but I'm not sure if this replaces my need to pay for Cursor, or if I need to pay for Cursor AND this, and add in the LLM?

Or is VS code pretty good at this point? Or is there something better? These are the only two ways I'd know how to actually consume this with any success.

knicholes | 10 hours ago

This has to be a monstrous money loser.

If they can maintain this pricing level, and if Qwen3‑Coder is as good as people say then they will have an enormous hit on their hands. A massive money losing hit, but a hit.

Very interesting!

PS: Did they reduce the context window? It looks like it.

HardCodedBias | 10 hours ago

How is this even possible?

supernova8 | 10 hours ago
