Claude Sonnet 4 now supports 1M tokens of context

adocomplete | 1311 points

This is definitely one of my CORE problems as I use these tools for "professional software engineering." I really desperately need LLMs to maintain extremely effective context, and it's not actually that interesting to see a new model that's marginally better than the last one (for my day-to-day).

However, price is king. Allowing me to flood the context window with my code base is great, but given that the price has substantially increased, it makes sense to manage the context window more carefully in the current situation. Flooding their context window is great value for them, but short of evals that look at how effectively Sonnet stays on track, it's not clear the value actually exists for me.

aliljet | 5 days ago

A tip for those who both use Claude Code and are worried about token use (which you should be if you're stuffing 400k tokens into context even if you're on 20x Max):

  1. Build context for the work you're doing. Put lots of your codebase into the context window.
  2. Do work, but at each logical stopping point hit double escape to rewind to the context-filled checkpoint. Rewinding to that point does not cost you those tokens again.
  3. Tell Claude your developer finished XYZ, have it read it into context and give high level and low level feedback (Claude will find more problems with your developer's work than with yours).
If you want to have multiple chats running, use /resume and pull up the same thread, then hit double escape to rewind to the point where Claude has rich context but has not started down a specific rabbit hole.

gdudeman | 5 days ago

It's definitely good to have this as an option, but at the same time, more context reduces the quality of the output because it's easier for the LLM to get "distracted". So I wonder what will happen to the quality of code produced by tools like Claude Code if users don't properly understand the trade-off being made (if they leave it in auto mode, coding right up to the auto-compact).

tankenmate | 5 days ago

My experience with the current tools so far:

1. It helps to get me going with new languages, frameworks, utilities, or full green-field stuff. After that I spend a lot of time parsing the code to understand what it wrote, to the point where I kind of "trust" it because checking is too tedious, but "it works".

2. When working with languages or frameworks that I know, I find it makes me unproductive: the amount of time I spend writing a good-enough prompt with the correct context is about the same as, or more than, if I wrote the stuff myself. And to be honest, the solution it gives me works for the specific case, but looks like junior code, with pitfalls that are not obvious unless you have the experience to know them.

I used it with TypeScript, Kotlin, Java, and C++, for different scenarios: websites, ESPHome components (ESP32), backend APIs, node scripts, etc.

Bottom line: useful for hobby projects, scripts, and prototypes, but for enterprise-level code it is not there yet.

not_that_d | 4 days ago

One of the most helpful usages of CC so far is when I simply ask:

"Are there any bugs in the current diff"

It analyzes the changes very thoroughly, often finds very subtle bugs that would cost hours of time/deployments down the line, and points out a bunch of things to think through for correctness.

lpa22 | 4 days ago

I could be wrong, but I think this pricing is the first to admit that cost scales quadratically with the number of tokens. It's the first time I've seen nonlinear pricing from an LLM provider, which implicitly mirrors the inference scaling laws I think we're all aware of.
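
To make the nonlinearity concrete, here's a rough sketch using the $3/MTok (under 200K) and $6/MTok (over 200K) input rates quoted elsewhere in this thread, and assuming (as the pricing jokes below imply) that the whole prompt is billed at the higher rate once it crosses 200K:

  def input_cost_usd(prompt_tokens: int) -> float:
      # Assumed tiered input pricing: the entire prompt bills at the
      # long-context rate once it exceeds the 200K threshold.
      rate = 6.0 if prompt_tokens > 200_000 else 3.0
      return prompt_tokens * rate / 1_000_000

  print(input_cost_usd(200_000))    # 0.60
  print(input_cost_usd(200_001))    # ~1.20 -- one extra token doubles the bill
  print(input_cost_usd(1_000_000))  # 6.00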

howinator | 4 days ago

A big problem with the chat apps (ChatGPT, Claude.ai) is the weird context-window hijinks. ChatGPT especially does wild stuff: sudden truncation, summarization, reinjecting 'ghost snippets', etc.

I was thinking this should be up to the user (do you want to continue this conversation with context rolling out of the window, or start a new chat?), but now I realize this is inevitable given the way pricing tiers and limited compute work. The only way to get full context is to use developer tools like Google AI Studio or a chat app that wraps the API.

With a custom chat app that wraps the API you can even inject the current timestamp into each message and ask the LLM to add a new row to a markdown table every 10 minutes, summarizing each 10-minute chunk.
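
A minimal sketch of that kind of wrapper, assuming the Anthropic Python SDK (the model id and message schema here are just illustrative):

  import datetime
  import anthropic

  client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment
  history = []

  def send(user_text: str) -> str:
      # Prepend a wall-clock timestamp so the model can reason about elapsed time.
      stamped = f"[{datetime.datetime.now():%Y-%m-%d %H:%M}] {user_text}"
      history.append({"role": "user", "content": stamped})
      reply = client.messages.create(
          model="claude-sonnet-4-20250514",  # illustrative model id
          max_tokens=1024,
          messages=history,
      )
      text = reply.content[0].text
      history.append({"role": "assistant", "content": text})
      return text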

firasd | 5 days ago

Isn’t Opus better? Whenever I run out of Opus tokens and get kicked down to Sonnet it’s quite a shock sometimes.

But man, I'm at the perfect stage in my career for these tools. I know a lot, I understand a lot, I have a lot of great ideas, but I'm getting kinda tired of hammering out code all day long. Now with Claude I am just busting ass executing on all these ideas and tests and fixes. Never going back!

psyclobe | 4 days ago

Related ongoing thread:

Claude vs. Gemini: Testing on 1M Tokens of Context - https://news.ycombinator.com/item?id=44878999 - Aug 2025 (9 comments)

dang | 4 days ago

1M of input... at $6/1M input tokens. Better hope it can one-shot your answer.

isoprophlex | 5 days ago

I like to spend a lot of time in "Ask" mode in Cursor. I guess the equivalent in Claude code is "plan" mode.

Where I have minimal knowledge of the framework or language, I ask a lot of questions about how the implementation would work, what the tradeoffs are, etc. This is to minimize any misunderstanding between me and the tool. Then I ask it to write the implementation plan and execute it one step at a time.

Cursor lets you have multiple tabs open, so I'll have an Ask mode and an Agent mode running in parallel.

This is a lot slower, and if it were a language/framework I'm familiar with, I'd be more likely to execute the plan myself.

meander_water | 4 days ago

1M context windows are not created equal. I doubt Claude's recall is as good as Gemini's 1M context recall. https://cloud.google.com/blog/products/ai-machine-learning/t...

xnx | 5 days ago

Before this they supposedly had a longer context window than ChatGPT, but I have workloads that abuse the heck out of context windows (100-120K tokens). ChatGPT genuinely seems to have a 32K context window, in the sense that it legitimately remembers/can utilize everything within that window.

Claude previously had a "200K" context window, but during testing it wouldn't even hit a full 32K before hitting a wall and forgetting earlier parts of the context. They also have extremely short prompt limits relative to the other services around, making it hard to utilize their supposedly larger context windows (which is suspicious).

I guess my point is that with Anthropic specifically, I don't trust their claims, because that has been my personal experience. It would be nice if this "1M" context window now lets you actually use 200K, but it remains to be seen whether it can even do that. As I said, with Anthropic you need to verify everything they claim.

Someone1234 | 5 days ago

How does "supporting 1M tokens" really work in practice? Is it a new model? Or did they just remove some hard coded constraint?

simianwords | 4 days ago

I think this highlights some problems with software development in general, i.e. the code isn't enough: you need domain knowledge too, and a lot of knowledge about how and why the company needs things done one way or another. You might imagine that dumping the contents of your wiki and all your chat channels into some sort of context would do it, but that would miss the hundreds of verbal conversations between people in the company. It would also fall foul of the fact that everything tends to work in every way you can imagine except the way the wiki says.

Even if you transcribed all the voice chats and meetings and added them in, even a human would be challenged to work out what is going on. No-context human developers are pretty useless too.

t43562 | 4 days ago

Feels like we just traded "not enough context" for "too much noise." The million-token window is cool for marketing, but until retrieval and summarization get way better, it's like dumping the entire repo on a junior dev's desk and saying "figure it out." They'll spend half their time paging through irrelevant crap, and the other half hallucinating connections. Bigger context is only a net win if the model can filter, prioritize, and actually reason over it.

simon_rider | 4 days ago

As far as coding goes, Claude seems to be the most competent right now; I like it. GPT-5 is abysmal. I'm not sure if it's bugs or what, but the new release takes a good few steps back. Gemini is still hit and miss, and Grok seems to be a poor man's Claude (the code is kind of okay, a bit buggy, and somehow similar to Claude's).

joduplessis | 4 days ago

What I've found with LLMs is that they're basically a better version of Google Search. If I need a quick "How do I do..." or a quick answer to something, it's way more useful than Google, and the fact that I can ask follow-up questions is amazing. But for any serious deep work it has a long way to go.

phyzix5761 | 4 days ago

Strange that they don't mention whether that's enabled or configurable in Claude Code.

falcor84 | 5 days ago

Oh man finally. This has been such a HUGE advantage for Gemini.

Could we please have zip files too? ChatGPT and Gemini both unpack zip files via the chat window.

Now how about a button to download all files?

andrewstuart | 5 days ago

Just completed a new benchmark that sheds some light on whether Anthropic's premium is worth it.

(Short answer: not unless your top priority is speed.)

https://brokk.ai/power-rankings

jbellis | 5 days ago

I believe this can be configured in Claude Code via the following environment variable:

  ANTHROPIC_BETAS="context-1m-2025-08-07" claude
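
For raw API calls, the equivalent appears to be passing the same beta flag per request; a sketch assuming the Python SDK's beta surface (model id illustrative):

  import anthropic

  client = anthropic.Anthropic()

  # Opt this single request into the 1M-token context beta.
  response = client.beta.messages.create(
      model="claude-sonnet-4-20250514",  # illustrative model id
      betas=["context-1m-2025-08-07"],
      max_tokens=1024,
      messages=[{"role": "user", "content": "..."}],
  )
  print(response.content[0].text)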

varyherb | 5 days ago

Yes, but if you look at the rate-limit notes, the rate limit is 500K tokens/minute for tier 4, which we are on. Given how stingy Anthropic has been with rate-limit increases, this is for very few people right now.

greenfish6 | 5 days ago

I won't complain about a strict upgrade, but that's a pricy boi. Interesting to see differential pricing based on the size of the input, which is understandable given the O(n^2) nature of attention.
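
Back-of-envelope, for vanilla attention (real serving stacks optimize heavily, so treat this as an upper bound on the scaling, not the actual cost):

  # Self-attention forms an n x n score matrix, so compute grows
  # quadratically with context length n.
  def attn_score_flops(n_tokens: int, d_model: int = 4096) -> float:
      return 2.0 * n_tokens**2 * d_model  # QK^T alone, ignoring projections

  print(attn_score_flops(1_000_000) / attn_score_flops(200_000))  # 25.0: 5x the tokens, 25x the compute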

qsort | 5 days ago

How does anyone send these models that much context without them tripping over themselves? I can't get anywhere near that much before the model starts losing track of instructions.

pupppet | 5 days ago

Wow, I thought they would feel some pricing pressure from GPT5 API costs, but they are doubling down on their API being more expensive than everyone else.

lherron | 5 days ago
[deleted]
| 4 days ago

That is an unfortunate logo.

as367 | 4 days ago

Oh, well, ChatGPT is being left in the dust…

When done correctly, having one million tokens of context window is amazing for all sorts of tasks: understanding large codebases, summarizing books, finding information across many documents, etc.

Existing RAG solutions fill a void up to a point, but they lack the precision that large context windows offer.

I’m excited for this release and hope to see it soon on the UI as well.

thimabi | 5 days ago

Many people are confused about the usefulness of 1M tokens because LLMs often start to get confused after about 100K. But this is big for Claude 4 because it uses automatic RAG when the context becomes large. With retrieval optimized by RAG, we'll be able to make good use of those 1M tokens.

brokegrammer | 4 days ago

While this is cool, can anything be done about the speed of inference?

At least for my use, 200K context is fine, but I’d like to see a lot faster task completion. I feel like more people would be OK with the smaller context if the agent acts quickly (vs waiting 2-3 mins per prompt).

i_have_an_idea | 4 days ago

Shame it's only the API. Would've loved to see it via the web interface on claude.ai itself.

mettamage | 5 days ago

I hope they're going to put something in Claude Code to show when you're entering the expensive window. Sometimes I just keep the conversation going, and I wouldn't want that to burn my Max credits 2x faster.

aledalgrande | 4 days ago

Peer of this post currently also on HN front page, comparing perf for Claude vs Gemini, w/ 1M tokens: https://news.ycombinator.com/item?id=44878999

chrisweekly | 4 days ago

The 1M token context was Gemini's headline feature. Now the only thing I'd like Claude to work on is the tokens counted for document processing: Gemini will often bill 1/10th the tokens Anthropic does for the same document.

film42 | 5 days ago

> desperately need LLMs to maintain extremely effective context

Last time I used Gemini it did something very surprising: instead of providing readable code, it started to generate pseudo-minified code.

Like one CSS class would become one long line of CSS, and one JS function would become one long line of JS, with most of the variable names minified while some remained readable, but short. It did away with all unnecessary spaces.

I asked myself what was happening here, and my only explanation was that maybe Google had started training Gemini on minified code, teaching it to understand and generate it, in order to maximize the value of every token.

qwertox | 4 days ago

Neat. I do 1M tokens context locally, and do it entirely with a single GPU and FOSS software, and have access to a wide range of models of equivalent or better quality.

Explain to me, again, how Anthropic's flawed business model works?

DiabloD3 | 4 days ago

I just want a better way to invalidate old context... It's great that I can fit more context, but the main challenge is Claude getting sidetracked with 10 invalid grep calls, pytest dumping a 10K-token stack trace, etc. And yes, the ability to go back in time via esc+esc is great, but I want Claude to read the error stack, learn from it, and purge it from its context, or at least let me lobotomize it selectively. Learning from and then discarding the raw output of tool calls feels like the missing piece here.
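
Nothing built in does this today as far as I know, but the shape of what I want is something like the following (hypothetical message schema; summarize could be any cheap summarizer, even a small LLM call):

  MAX_TOOL_CHARS = 2_000

  def prune_tool_output(messages, summarize):
      # Replace bulky tool results with short summaries before resending,
      # keeping the lesson while discarding the raw dump.
      pruned = []
      for m in messages:
          if m["role"] == "tool" and len(m["content"]) > MAX_TOOL_CHARS:
              m = {**m, "content": "[summarized] " + summarize(m["content"])}
          pruned.append(m)
      return pruned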

tomsanbear | 4 days ago

I wish they'd fix other things faster. You still can't upload an Excel file in the iOS app, even with analyst mode enabled. The voice mode feels like it's 10 years behind OpenAI's (no realtime, for example). And Opus 4.1 still occasionally goes absolutely mental and provides much worse analysis than o3 or GPT-5 Thinking.

Rooting for Anthropic. Competition in this space is good.

I watched an interview with Dario recently where he said he wanted a "product guy", and it really shows.

mrcwinn | 4 days ago

I’m glad to see the only company chasing margins - which they get by having a great product and a meticulous brand - finding even more ways to get margin. That’s good business.

cadamsdotcom | 4 days ago

I wonder how modern models fare on NovelQA and FLenQA (benchmarks that test the ability to understand long context beyond needle-in-a-haystack retrieval). The only such test on a reasoning model that I found was done on o3-mini-high (https://arxiv.org/abs/2504.21318); it suggests that reasoning noticeably improves FLenQA performance, but that test only explored contexts up to 3,000 tokens.

ffitch | 4 days ago

For folks using LLMs for big coding projects, what's your go-to workflow for deciding which parts of the codebase to feed the model?

cintusshied | 4 days ago

"...in API"

That's a VERY relevant clarification: this DOESN'T apply to web or app users.

Basically, if you want a 1M context window you have to specifically pay for it.

ericol | 4 days ago

My experience with Claude Code on anything bigger than a webpage, a small API, a CSS tutorial, etc. has been pretty bad. I think context length is a manageable problem, but not the main one. I used it to write a 50K LoC Python code base with 300 unit tests, and it went OK for the first few weeks and then it fell apart. This is with a CLAUDE.md file for every single module that needs one, as well as detailed agents for testing, design, coding, and review.

I won't go into a case-by-case list of its failures. The core of the issue is misaligned incentives, which I want to get into:

1. The incentive for coding agents in general, and Claude in particular, is writing LOTS of code. None of them, zero, are good at planning and verification.

2. The involvement of the human, ironically, in a haphazard way in the agent's process. This has to do with how the problem of coding is defined for these agents. Human developers are like snowflakes when it comes to opinions on software design; there is no way to apply each one's preferences (except by papier-mâché-ing and supergluing together SO answers, Reddit threads, and books) to the design of the system in any meaningful way, and that makes a simple system way too complex, or a complex problem simplistic.

  - There is no way to evolve the plan to accept new preferences, except as text in a CLAUDE.md file in git that you will have to read through and edit.

  - There is no way to know the near-term effect of code choices made now on where you'll be a week from now.

  - So much code gets written that asking a person to review it, when you're pushing the envelope, feels morally wrong and an insane ask. How many of your code reviews have been replaced by 15-30 minute design meetings to solicit feedback on the design of the PR (because it's so complex), after which the PR just gets pushed into dev? WTF am I even doing, I wonder.

  - It does not know how far to explore for better rewards, and cannot tell global rewards from local ones, resulting in commented-out tests and arbitrarily deleted code to make its plan "work".

In short, code is a commodity for the CEOs of coding-agent companies and the CXOs of your company (Salesforce has everyone coding, but that just raises the floor, and that's a good thing; it does NOT lower the bar and make people 10x devs). All of them have bought into the idea that 10x somehow means producing 10x the code. Your time reviewing, unmangling, and maintaining that code is not the commodity. It never ever was.

itissid | 4 days ago

To be honest, I am not particularly interested in whether the next model is better than the previous one. More than raw superiority, what matters is that it maintains context and has enough memory capacity not to get in the way of work. I believe that is what makes a new model competitive.

cognix_dev | 4 days ago

Anyone else found Claude has become hugely more stupid recently?

It used to always pitch answers at the right level, but recently it just seems to have left its common sense at the door. Gemini just gives much better answers for non-technical questions now.

nprateem | 4 days ago

The reason I initially got interested in Claude was because they were the first to offer a 200K token context window. That was massive in 2023. However, they didn't keep up once Gemini offered a 1M token window last year.

I'm glad to see an attempt to return to having a competitive context window.

pmxi | 4 days ago

We do know Parkinson's Law (https://en.m.wikipedia.org/wiki/Parkinson%27s_law) will apply to all this, right?

truelson | 4 days ago

Does anyone have data on how much better the results from these 1M token context models are than from more limited windows paired with a decent RAG implementation? Or how the 200K vs. 1M token models perform on a benchmark when paired with RAG?

sporkland | 4 days ago

Brain: Hey, you going to sleep?
Me: Yes.
Brain: That 200,001st token cost you $600,000/M.

irthomasthomas | 4 days ago

How much will it cost when you get near the 1M context mark? It's got to be in the thousands per query.

nothercastle | 2 days ago

Of course Bolt is the customer spotlight. These vibe-coding tools chuck in the entire codebase and charge for tokens used. At 10K lines of code or so, these apps could no longer fit it.

muzani | 4 days ago

In my testing, the gap between Claude and Gemini 2.5 Pro is small. My company is in Asia-Pacific and we can't get access to Claude via Vertex for some stupid reason.

But I tested it via other providers; the gap used to be huge, but not anymore.

faangguyindia | 5 days ago

I've noticed the quality of answers degrades horribly beyond a few thousand tokens. Maybe 10K. Is anyone actually successfully using these 100K+ token contexts for anything?

Roark66 | 4 days ago

Currently the quality seems to degrade long before the context limit is reached, as the context becomes “polluted”.

Should we expect the higher limit to also increase the practical context size proportionally?

nojs | 4 days ago

1M tokens is impressive, but the real gains will come from how we curate context—compact summaries, per-repo indexes, and phase resets. Bigger windows help; guardrails keep models focused and costs predictable.

shamano | 5 days ago

The fracturing of models offered across providers is annoying. The number of different models, and the fact that a given model will have different capabilities from different providers, is ridiculous.

alienbaby | 4 days ago

Unfortunately, larger context isn't really the answer after a certain point. Small, focused context is better; lazily throwing a bunch of tokens in as context is going to yield bad results.

deadbabe | 4 days ago

With that pricing I can't imagine why anyone would use Claude Sonnet through the API when Gemini 2.5 Pro is both better and cheaper (especially at long-context understanding).

logicchains | 4 days ago

Does very large context significantly increase response time? Are there any benchmarks/leaderboards evaluating different models in that regard?

maxnevermind | 4 days ago

Awesome addition to a great model.

The best interface for long context reasoning has been AIStudio by Google. Exceptional experience.

I use Prompt Tower to create long context payloads.

ramoz | 4 days ago

Does anybody know which technology most of the companies supporting 1M tokens use? Or is it all hidden?

ndkap | 4 days ago

It's great they've finally caught up, but unfortunate it's on their mid-tier model only and it's laughably expensive.

ZeroCool2u | 5 days ago

But will it remember any of it, and stop creating redundant new files when it can't find or understand what it's looking for?

penguin202 | 5 days ago

My first thought was "gg no re". Can't wait to see how this changes compaction requirements in Claude Code.

whalesalad | 4 days ago

I've tried 2 AI tools recently. Neither could produce correct code to calculate the CPU temperature on a Raspberry Pi RP2040. The code worked, looked OK, and even produced reasonable-looking results, until I put a finger on the chip and raised the temp: the calculated temperature went down. As an aside, the free version of ChatGPT didn't know about anything newer than 2023, so it couldn't tell me about the RP2350.
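
For reference, the usual datasheet-derived reading looks something like this (a MicroPython sketch; the RP2040's sensor voltage falls as the temperature rises, so getting the subtraction backwards produces exactly that inverted behavior):

  # MicroPython on an RP2040 board (e.g. Raspberry Pi Pico)
  import machine

  sensor = machine.ADC(4)  # ADC channel 4 is the internal temperature sensor

  def read_temp_c() -> float:
      voltage = sensor.read_u16() * 3.3 / 65535  # 16-bit reading scaled to 3.3 V
      # Datasheet formula: roughly 1.721 mV drop per degree C above 27 C
      return 27 - (voltage - 0.706) / 0.001721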

markb139 | 4 days ago

[Claude usage limit reached. Your limit will reset at..] .. eh lunch is a good time to go home anyways..

poniko | 4 days ago

For some context, only the tweaks files and scripting parts of Cyberpunk 2077 are ~2 million LOC.

chmod775 | 4 days ago

A million tokens? Damn, I’m gonna need a lot of quarters to play this game at Chuck-E-Cheese.

kotaKat | 5 days ago

I've observed that Claude produces a lot of bloat. I wonder how such LLM-generated projects will age.

fpauser | 4 days ago

This tells me they've gotten very good at caching and modeling the impact of caching.

mrcwinn | 4 days ago

So I can upload 1M tokens per prompt but pay $3 per 1M input tokens?

It's really expensive to use.

hoppp | 4 days ago

So, more tokens means better, but at the same time more tokens means it distracts itself too much along the way. It's an improvement, but also potentially detrimental. How are those things beneficial in any capacity? What was said last week? Embrace AI or leave?

All I see so far is: don't embrace and stay.

rootnod3 | 5 days ago

The only time this is useful is to run init on a sizable code base or dump a "big" CSV.

lvl155 | 5 days ago

A context window beyond a certain size doesn't bring much benefit, just a higher bill. If the model still keeps forgetting instructions, it's all too easy to end up with long messages, higher context consumption, and hence a bigger bill.

I'd rather have an option to limit the context size.

alvis | 5 days ago

Ah, so Claude Code on a subscription will become a cut-down version.

siva7 | 4 days ago

The critical issue with LLMs, the one where they never beat humans: they break what worked.

revskill | 5 days ago

It's incredible to see how AI models are improving; I'm really happy with this news (IMO it's more impactful than the release of GPT-5). Now we need more tokens per second, and then the self-improvement of the models will accelerate.

henriquegodoy | 5 days ago

This is amazing, shout out to Anthropic for doing this. I would like a Claude model that is not nerfed with ethics and values to please users, and that doesn't write overly large plans to impress the user.

m13rar | 4 days ago

Claude is down.

EDIT: for the moment... it supports 0 tokens of context xD

williamtrask | 4 days ago
[deleted]
| 5 days ago

How did they do the 1M context window?

Same technique as Qwen? As Gemini?

tosh | 5 days ago

It's like double "double the dose"

socrateslee | 4 days ago

Fantastic, use up your quota even more quickly. :)

_joel | 4 days ago

Why do I get the feeling that HN devs on here want to just feed it their entire folder (source, binaries, everything) and have it make changes in seconds?

throwmeaway222 | 4 days ago

Wish it was on the web app as well!

forgingahead | 4 days ago

I personally use it in my coding tasks; it's an amazing and powerful LLM.

hassadyb | 4 days ago

Does this cover subscriptions?

reverseblade2 | 4 days ago

Yay, more room for stray cats.

nickphx | 4 days ago

In Visual Studio as well?

CodeCompost | 4 days ago

Wow!

As a fiction writer/noodler this is amazing. I can put in not just a whole book as before, not just a whole series, but an author's entire corpus.

I mean, from the pov of biography writers, this is awesome too. Just dump it all in, right?

I'll have to switch to using Sonnet 4 now for workflows and edit my RAG code to use longer windows. A lot longer.

Balgair | 4 days ago

Remember kids, just because you CAN doesn't mean you SHOULD

TZubiri | 4 days ago

It’s a stupid metric because nothing in the real world has half a million words of context. So all they’re doing is feeding it imagined slop, or sticking together random files.

MagicMoonlight | 4 days ago

Eagerly waiting for them to do this with Opus

artursapek | 5 days ago

HM

omlelelom_kimox | 4 days ago

1M?

640K ought to be enough for anybody ... right?

amelius | 4 days ago

god they keep raising prices

rafaelero | 5 days ago

holy moly! awesome

throwaway888abc | 5 days ago

Great! Now we can use AI to read and think like a specific "book".

doppelgunner | 4 days ago

moaaaaarrrr

1xer | 5 days ago