How Long Contexts Fail

ulrischa | 6 points

I find this topic very interesting because it's something I've run into and mitigated ever since GPT-3 was available.

Plenty of long, loooong, and complex role-plays, world-building, and tests to see if I could integrate dozens of different local models into a game project or similar.

All of the same issues apply to "agents" as well.

You very quickly learn that even current models are like distracted puppies. Larger models seem to be able to brute-force their way through some of these problems, but I wouldn't call that sustainable.

fennecbutt | 6 hours ago

There seems to be a need for some kind of hierarchy of contextualization brought to LLMs, rather than a flat history stuffed into the context. For comparison, human memory is not just a compressed linear history whose tape we rewind when needed. It's more like a Russian doll of bounded contexts that we break out of when our inner context is no longer sufficient to solve the problem.
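A minimal sketch of that "Russian doll" idea, assuming a stack of nested scopes where only the active scope keeps full detail and enclosing scopes survive only as summaries; the names here (BoundedContext, enter, exit, prompt_window) are hypothetical, not from any real library:

    # Hypothetical sketch of nested bounded contexts instead of one flat history.
    from dataclasses import dataclass, field
    from typing import Optional

    @dataclass
    class BoundedContext:
        """One nested scope of task/conversation state."""
        summary: str                                     # compressed gist of this scope
        facts: list[str] = field(default_factory=list)   # details kept only while active
        parent: Optional["BoundedContext"] = None

        def enter(self, summary: str) -> "BoundedContext":
            """Open an inner scope (e.g. a sub-task) on top of this one."""
            return BoundedContext(summary=summary, parent=self)

        def exit(self) -> "BoundedContext":
            """Break out: inner detail is dropped, only its summary folds upward."""
            assert self.parent is not None
            self.parent.facts.append(f"completed: {self.summary}")
            return self.parent

        def prompt_window(self) -> str:
            """Build the prompt: full detail for the current scope,
            summaries only for the enclosing scopes."""
            parts, ctx, depth = [], self, 0
            while ctx is not None:
                if depth == 0:
                    parts.append(f"[current] {ctx.summary}\n" + "\n".join(ctx.facts))
                else:
                    parts.append(f"[outer] {ctx.summary}")
                ctx, depth = ctx.parent, depth + 1
            return "\n".join(reversed(parts))

    # Usage: work inside nested scopes; break out when the inner scope is exhausted.
    root = BoundedContext(summary="build the game project")
    quest = root.enter("design the dialogue system")
    quest.facts.append("chose a tree-based dialogue format")
    root = quest.exit()   # inner detail collapses to a one-line summary upstream
    print(root.prompt_window())

The point of the sketch is only that the prompt grows with the depth of the current scope, not with the full length of the history.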

rorylaitila | 11 hours ago