Tensor Product Attention Is All You Need

eunos | 160 points

My kingdom for renaming this paper to something like "Tensor Product Attention is a Memory-Efficient Approach for Long-Sequence Language Modeling"

carbocation | 2 months ago

(trying to move the critique beyond the title...)

When trying to deploy LLMs with larger context windows in constrained environments, two things start to hurt: a) the increased memory footprint of the longer KV cache, and b) slower decoding as the context grows. This paper addresses a) only, which is useful, but we are still left with b) (right?)
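For a), some back-of-the-envelope numbers, assuming a Llama-2-7B-style decoder (32 layers, 32 heads, head dim 128, fp16); the constants are illustrative, not from the paper:

    # Rough KV cache size; all constants are illustrative.
    n_layers, n_heads, d_head = 32, 32, 128  # Llama-2-7B-like
    bytes_per_elem = 2                       # fp16

    def kv_cache_bytes(seq_len: int) -> int:
        # 2x for keys and values, cached at every layer and head
        return 2 * n_layers * n_heads * d_head * bytes_per_elem * seq_len

    for n in (4_096, 32_768, 131_072):
        print(f"{n:>7} tokens -> {kv_cache_bytes(n) / 2**30:.1f} GiB")
    # Linear in context length: ~2 GiB at 4k, ~64 GiB at 128k tokens.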

bbcc90 | 2 months ago

I really can't with these paper titles anymore, man.

whymauri | 2 months ago

Tensor decomposition has traditionally suffered from high computational complexity. Is it an issue here?
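My guess is no, since the expensive part is classically *finding* a decomposition (e.g., iterative ALS). If the factors come straight out of the model and you only ever contract them, the cost is linear in the rank. A sketch of that general shape (my reading, not the paper's exact algorithm):

    import numpy as np

    # Generic rank-R outer-product reconstruction of an
    # (n_heads x d_head) slice. Names and shapes are illustrative,
    # not the paper's notation.
    n_heads, d_head, rank = 8, 64, 2
    a = np.random.randn(rank, n_heads)   # head-side factors
    b = np.random.randn(rank, d_head)    # dim-side factors

    # Sum of R outer products: O(R * n_heads * d_head) per token.
    # No iterative decomposition (the classically expensive part) runs.
    k = np.einsum('rh,rd->hd', a, b)
    assert k.shape == (n_heads, d_head)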

esafak | 2 months ago

Every day there are literally tons of papers titled "XYZ is all you need". At this point we apparently need thousands of things…

jdefr89 | 2 months ago

For those of us who are lay people outside of machine learning and AI, what was the critical insight that made “attention all you need” in the original Transformer paper?
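For context, as far as I can tell the mechanism the title refers to is tiny: the standard scaled dot-product attention formula, nothing specific to this new paper:

    import numpy as np

    def scaled_dot_product_attention(Q, K, V):
        # Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V
        # The core equation of the 2017 paper; the rest is plumbing.
        d_k = Q.shape[-1]
        scores = Q @ K.T / np.sqrt(d_k)  # (n, n): every token attends to every token
        weights = np.exp(scores - scores.max(-1, keepdims=True))
        weights /= weights.sum(-1, keepdims=True)  # row-wise softmax
        return weights @ V                         # (n, d_v)

    n, d = 5, 8
    Q = K = V = np.random.randn(n, d)
    assert scaled_dot_product_attention(Q, K, V).shape == (n, d)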

hangonhn | 2 months ago

Another approach has used separate physics-informed neural networks to learn the tensor product: they reformulated the original optimization problem so that it is structured as tensors. I assume tensor products could be another lever for improving the actual computations.

https://arxiv.org/abs/2408.13101
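To illustrate the kind of tensor structure being referenced (a generic rank-R CP factorization, not that paper's actual formulation):

    import numpy as np

    # CP (CANDECOMP/PARAFAC) form of a 3-way tensor:
    # T ~ sum_r u_r (x) v_r (x) w_r. Storing the factors costs
    # O(R * (I + J + K)) instead of O(I * J * K) for the dense tensor.
    I, J, K, R = 20, 30, 40, 3
    U = np.random.randn(R, I)
    V = np.random.randn(R, J)
    W = np.random.randn(R, K)

    T = np.einsum('ri,rj,rk->ijk', U, V, W)  # dense reconstruction
    print(T.shape)  # (20, 30, 40)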

AxesPushPatty | 2 months ago

> a novel attention mechanism

Why does every paper have to use the word "novel"? And the titles are getting crazier day by day.

cute_boi | 2 months ago

The paper's main contribution aside, I have to say the background (Section 2) is very neatly and succinctly written.

sva_ | 2 months ago

> Because memory consumption grows linearly with sequence length, the maximum context window is limited by practical hardware constraints

I thought the number of parameters grows quadratically with context window length - what do they mean?
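For concreteness, the two quantities I might be conflating (assuming standard multi-head attention with L layers, H heads, head dimension d_h, context length n):

    \text{scores} = QK^\top \in \mathbb{R}^{n \times n}
        \;\Rightarrow\; O(n^2) \ \text{(compute, per layer)}
    \qquad \text{vs.} \qquad
    |\text{KV cache}| = 2 \cdot L \cdot H \cdot d_h \cdot n
        \;\Rightarrow\; O(n) \ \text{(memory)}

The parameter count itself doesn't depend on n at all, so I suspect the quote is about the cache, not the weights?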

t_mann | 2 months ago

I'm sorry but can people please stop naming their papers "X is all you need"? It's super annoying.

joshdavham | 2 months ago

If you don’t pay to read papers, you don’t get to complain about the titles, imo.

I hate ads, but I’m not paying for YouTube Premium either. That’s how it goes. I get ads.

thunkingdeep | 2 months ago