Chain of Thought empowers transformers to solve inherently serial problems

krackers | 258 points

I liked Yoav Goldberg's snarky quote tweet:

> next paper: transformers can solve any problem but on some of them they may compute indefinitely and never provide an answer

> (and you cannot tell in advance which is which!!)

https://twitter.com/yoavgo/status/1835802380589203802

lgessler | 2 days ago

Note that for the purposes of this paper a “problem” just means a formally decidable problem, i.e. a formal language, and the proof shows that by creatively arranging transformers you can make individual transformer runs behave like individual Boolean circuits. This is still a long way from any practical application of transformers: for one thing, most problems we care about are not stated as formal languages, and for another, we already have a far more efficient way to implement Boolean circuits.
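To make "behave like a Boolean circuit" concrete, here is a minimal sketch (mine, not from the paper) of evaluating a circuit gate by gate, the way each chain-of-thought token can record one intermediate wire value; the gate tuple format is invented for illustration:

```python
# Minimal sketch (not from the paper): a Boolean circuit evaluated
# gate by gate, the way each chain-of-thought step can record one
# intermediate wire value. The gate tuple format is invented here.

def eval_circuit(inputs, gates):
    """inputs: wire name -> bool; gates: (output, op, input wires), in order."""
    wires = dict(inputs)
    for out, op, ins in gates:
        vals = [wires[w] for w in ins]
        if op == "AND":
            wires[out] = all(vals)
        elif op == "OR":
            wires[out] = any(vals)
        elif op == "NOT":
            wires[out] = not vals[0]
        else:
            raise ValueError(f"unknown gate: {op}")
    return wires

# XOR built from AND/OR/NOT: (a OR b) AND NOT (a AND b)
gates = [
    ("t1", "OR", ["a", "b"]),
    ("t2", "AND", ["a", "b"]),
    ("t3", "NOT", ["t2"]),
    ("out", "AND", ["t1", "t3"]),
]
print(eval_circuit({"a": True, "b": False}, gates)["out"])  # True
```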

lsy | 3 days ago

I'm waiting for the AI crowd to rediscover syllogism and inference in the original Prolog sense, which this CoT abomination is basically trying to achieve. Interestingly, if all logical content were translated into rules, and only those rules were fed into the LLM training set, what would the result be? Could the probabilistic magic be made to actually follow reason, without all the dice rolls?

larodi | 3 days ago

>Remarkably, constant depth is sufficient.

How would that be remarkable, when it is exactly what the Universal Approximation Theorem already states? Since transformers also use fully connected layers, none of this should really come as a surprise. But from glancing at the paper, they don't even mention it.

sigmoid10 | 3 days ago

But didn't we already know that NNs can solve any computable problem? The interesting question is whether they can be trained to solve any (computable) problem.

wodenokoto | 3 days ago

In the words of one of the authors:

"What is the performance limit when scaling LLM inference? Sky's the limit.

We have mathematically proven that transformers can solve any problem, provided they are allowed to generate as many intermediate reasoning tokens as needed. Remarkably, constant depth is sufficient.

http://arxiv.org/abs/2402.12875 (ICLR 2024)"

https://x.com/denny_zhou/status/1835761801453306089

nopinsight | 3 days ago

So given infinite time and resources it can solve any problem? Hardly groundbreaking, is it?

JSDevOps | 3 days ago

Sure, in the same sense that an infinitely long tape lets a Turing machine solve arbitrary problems. In theory, at least, if one had the right program.
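To illustrate the point: a toy Turing machine simulator fits in a few lines, and the machine itself is trivial; all the power sits in the transition table (the "program") and the unbounded tape. The binary-increment program below is my own example, not anything from the paper:

```python
# Toy Turing machine simulator (my illustration, not from the paper).
# The simulator is trivial; the "program" is the transition table and
# the tape is unbounded. This program increments a binary number.
from collections import defaultdict

def run_tm(tape_str, transitions, start, accept, max_steps=10_000):
    tape = defaultdict(lambda: "_", enumerate(tape_str))  # blank = "_"
    state, head = start, 0
    for _ in range(max_steps):
        if state == accept:
            break
        state, write, move = transitions[(state, tape[head])]
        tape[head] = write
        head += 1 if move == "R" else -1
    left, right = min(tape), max(tape)
    return "".join(tape[i] for i in range(left, right + 1)).strip("_")

increment = {
    ("right", "0"): ("right", "0", "R"),  # scan to the rightmost digit
    ("right", "1"): ("right", "1", "R"),
    ("right", "_"): ("carry", "_", "L"),
    ("carry", "1"): ("carry", "0", "L"),  # propagate the carry leftwards
    ("carry", "0"): ("done", "1", "L"),
    ("carry", "_"): ("done", "1", "L"),
}
print(run_tm("1011", increment, "right", "done"))  # 1100
```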

HarHarVeryFunny | 3 days ago

Has it been publicly benchmarked yet whether this approach:

    Hello LLM, please solve this task: <task>

can be improved by performing this afterwards?

    for iteration in range(10):
        Hello LLM, please solve this task: <task>
        Here is a possible solution: <last_reply>
        Please look at it and see if you can improve it.
        Then tell me your improved solution.

mg | 3 days ago

> We have mathematically proven that transformers can solve any problem, provided they are allowed to generate as many intermediate reasoning tokens as needed.

This is also the case for plain, regular RNNs.

tossandthrow | 3 days ago

https://x.com/ctjlewis/status/1786948443472339247

"Running cellular automata and other programs on Claude 3 Opus."

It's one of the replies to this tweet.

smusamashah | 3 days ago

'can'

But will they? I believe the frontier has moved to making them make sense, rather than just generating infinite language.

The infinite monkey problem is not solved yet.

seydor | 3 days ago
phemartin | a day ago

Chain-of-thought GPT is sort of a Turing machine with a tape that it's allowed to write to for purposes other than outputting the answer.

scotty79 | 3 days ago

A reply in this twitter thread links to a detailed blog post titled "Universal computation by attention: Running cellular automata and other programs on Claude 3 Opus." https://x.com/ctjlewis/status/1786948443472339247
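For a sense of what that post is running: one step of an elementary cellular automaton takes only a few lines. Rule 110 is my choice here (not necessarily the post's); it's notable because it is known to be Turing-complete:

```python
# One step of an elementary cellular automaton, the kind of program the
# linked post coaxes Claude into running token by token. Rule 110 is my
# choice of example (not necessarily the post's); it is Turing-complete.

def step(cells, rule=110):
    n = len(cells)
    out = []
    for i in range(n):
        # 3-bit neighborhood (left, self, right), with wraparound
        idx = (cells[(i - 1) % n] << 2) | (cells[i] << 1) | cells[(i + 1) % n]
        out.append((rule >> idx) & 1)  # the rule's idx-th bit is the new cell
    return out

# evolve a single seed cell for a few generations
row = [0] * 15 + [1] + [0] * 15
for _ in range(8):
    print("".join(".#"[c] for c in row))
    row = step(row)
```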

smusamashah | 3 days ago

Can any of these tools do anything that GitHub Copilot cannot (apart from using other models)? I tried Continue.dev and cursor.ai, but it wasn't immediately obvious to me. Maybe I'm missing something workflow-specific?

cpldcpu | 3 days ago

They have also mathematically proven that transformers are great randomness generators.

floppiplopp | 3 days ago

Is this more general than LLMs? Is it possible to do something Chain-of-Thought-like in a transformer model that _isn't_ trained on language?

empath75 | 2 days ago

Apologies if this is a dumb question, but aren't all computations inherently serial, in the sense that a Turing machine performs its operations one at a time?

glial | 3 days ago

A random token generator can also solve any problem if you give it enough time and memory.

tonii141 | 3 days ago

Is this similar to the Universal Approximation Theorem?

qmatch | 3 days ago

Damn, we just spent our entire Series A acquiring an infinite number of bananas and typewriter ink. The boss is not going to like this.

CarRamrod | 3 days ago

[flagged]

tonii141 | 3 days ago

Are we getting to a point where the LLM will just answer "42" and we need to figure out the question? =)

theshrike79 | 3 days ago

Forget UBI, we're going to need Universal Basic Compute.

bottlepalm | 3 days ago