> When controlling for the number of tokens, NoThinking outperforms Thinking across a diverse set of seven challenging reasoning datasets
Interesting. I thought the "thinking" was useful because it pulls in a lot of concepts into the context, but I guess not then?
It has also been said before that the text a model outputs during its Thinking step isn't actually a view into its inner thoughts. There are times when the model will think X but eventually answer Y.
But even so: the models _are_ better, right? So is the Thinking step then mostly useful during training?
skerit | 3 days ago
I'm not entirely sure how this kind of study jives well with other study, such as "Reasoning models don't always say what they think" [0], discussion [1].
To quote the article:
So if we can't trust the reasoning, then what's the point of checking whether they are "effective" or not?[0]: https://www.anthropic.com/research/reasoning-models-dont-say...
[1]: https://news.ycombinator.com/item?id=43572374