Extending the context length to 1M tokens
cmcconomy | 114 points
> We have extended the model’s context length from 128k to 1M, which is approximately 1 million English words
Actually, English-language tokenizers map on average about 3 words to 4 tokens, so 1M tokens is roughly 750K English words, not a million as claimed.
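A quick back-of-the-envelope check, assuming that 4-tokens-per-3-words ratio holds (the real figure varies by tokenizer and by the text):

    # Rough sanity check of the tokens-to-words estimate.
    # Assumes ~4 tokens per 3 English words; the actual ratio depends on the tokenizer.
    TOKENS_PER_WORD = 4 / 3

    context_tokens = 1_000_000
    approx_words = context_tokens / TOKENS_PER_WORD
    print(f"{context_tokens:,} tokens ~= {approx_words:,.0f} English words")  # ~750,000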
lr1970 | 4 days ago
Is this model downloadable?
lostmsu | 4 days ago
Note: unexpected Three-Body Problem spoilers on this page.
swazzy | 4 days ago
Can we all agree that these models far surpass human intelligence now? I mean, they process hours' worth of audio in less time than it would take a human to even listen. I think the singularity passed and we didn't even notice (which would be expected).
anon291 | 4 days ago
This is fantastic news. I've been using Qwen2.5-Coder-32B-Instruct with Ollama locally and it's honestly such a breath of fresh air. I wonder if any of you have had a moment to try this newer context length locally?
BTW, I can't run this effectively on my 2080 Ti, so I've just loaded up the machine with plain old RAM. It's not going to win any races, but as they say, it's not the speed that matters, it's the quality of the effort.
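For anyone who wants to try a longer window locally, here's a minimal sketch against Ollama's local REST API (the model tag and num_ctx value are placeholders, not necessarily the 1M-context build, and a bigger window means a much bigger KV cache, so expect it to spill into system RAM on a card like the 2080 Ti):

    # Sketch: raise Ollama's context window via its local REST API.
    # The model tag and num_ctx below are illustrative placeholders; pick
    # whatever your VRAM plus system RAM can actually hold.
    import requests

    resp = requests.post(
        "http://localhost:11434/api/generate",
        json={
            "model": "qwen2.5-coder:32b",  # placeholder tag for a locally pulled model
            "prompt": "Summarize the main points of this long document: ...",
            "stream": False,
            # Ollama's default context window is small; num_ctx raises it,
            # at the cost of a much larger KV cache (more memory, slower).
            "options": {"num_ctx": 65536},
        },
        timeout=600,
    )
    print(resp.json()["response"])

If the model plus KV cache doesn't fit in VRAM, Ollama offloads layers to system RAM, which is exactly the "won't win any races" mode described above.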