arxiv:2605.11196

Variational Linear Attention: Stable Associative Memory for Long-Context Transformers

Published on May 11

Authors:

Abstract

Variational Linear Attention addresses memory growth issues in linear attention by reformulating memory updates as a regularized least-squares problem with adaptive penalty, achieving faster computation and better retrieval performance than standard linear attention.

Generated by Qwen/Qwen2.5-Coder-32B-Instruct

Linear attention reduces the quadratic cost of softmax attention to O(T), but its memory state grows as O(T) in Frobenius norm, causing progressive interference between stored associations. We introduce Variational Linear Attention (VLA), which reframes the memory update as an online regularised least-squares problem with an adaptive penalty matrix maintained via the Sherman-Morrison rank-1 formula. We prove that normalising the write direction to unit length gives the recurrence Jacobian spectral norm exactly 1 for all sequence lengths and head dimensions (Proposition 2), and that the state norm is self-limiting under bounded inputs (Proposition 1). Empirically, VLA reduces ‖S_t‖_F by 109× relative to standard linear attention at T = 1,000, achieves near-perfect exact-match accuracy on multi-query associative recall within the effective per-head memory regime (n_pairs < d_h), maintaining substantially higher retrieval performance than DeltaNet and standard linear attention under increasing memory load, and maintains 62% accuracy at the per-head capacity boundary. A Triton-fused kernel achieves 14× speedup over sequential Python and O(T) scaling, crossing below softmax attention latency at approximately 43,000 tokens.
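To make the idea of an online regularised least-squares memory concrete, here is a minimal sketch in plain Python. It is not the paper's VLA update: the abstract does not specify the adaptive penalty or the exact write normalisation, so this uses classical recursive least squares with a fixed ridge penalty `lam`, where the inverse matrix `P` is maintained by the Sherman-Morrison rank-1 formula. All function names and parameters (`rls_write`, `linear_write`, `demo`, `passes`, and so on) are hypothetical and chosen for illustration only.

```python
import random

def matvec(M, x):
    """Matrix-vector product for a list-of-rows matrix."""
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]

def sherman_morrison(P, k):
    """Given symmetric P = A^{-1}, return (A + k k^T)^{-1} via a rank-1 update."""
    Pk = matvec(P, k)
    denom = 1.0 + sum(ki * pki for ki, pki in zip(k, Pk))
    n = len(k)
    return [[P[i][j] - Pk[i] * Pk[j] / denom for j in range(n)] for i in range(n)]

def rls_write(S, P, k, v):
    """Assumed update: one recursive least-squares write of the pair (k, v)."""
    P = sherman_morrison(P, k)
    gain = matvec(P, k)  # updated inverse applied to the key
    err = [vi - ri for vi, ri in zip(v, matvec(S, k))]  # what the memory gets wrong
    S = [[S[i][j] + err[i] * gain[j] for j in range(len(k))] for i in range(len(v))]
    return S, P

def linear_write(S, k, v):
    """Standard linear-attention write: S <- S + v k^T."""
    return [[S[i][j] + v[i] * k[j] for j in range(len(k))] for i in range(len(v))]

def frobenius(S):
    return sum(x * x for row in S for x in row) ** 0.5

def unit(x):
    norm = sum(xi * xi for xi in x) ** 0.5
    return [xi / norm for xi in x]

def demo(d=16, n_pairs=8, passes=100, lam=1e-6, seed=0):
    """Write the same n_pairs < d associations repeatedly into both memories."""
    rng = random.Random(seed)
    keys = [unit([rng.gauss(0, 1) for _ in range(d)]) for _ in range(n_pairs)]
    vals = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(n_pairs)]
    S_rls = [[0.0] * d for _ in range(d)]
    S_lin = [[0.0] * d for _ in range(d)]
    # P starts as the inverse of the ridge penalty lam * I.
    P = [[1.0 / lam if i == j else 0.0 for j in range(d)] for i in range(d)]
    for _ in range(passes):
        for k, v in zip(keys, vals):
            S_rls, P = rls_write(S_rls, P, k, v)
            S_lin = linear_write(S_lin, k, v)

    def max_recall_error(S):
        return max(abs(a - b) for k, v in zip(keys, vals)
                   for a, b in zip(matvec(S, k), v))

    return (max_recall_error(S_rls), max_recall_error(S_lin),
            frobenius(S_rls), frobenius(S_lin))

if __name__ == "__main__":
    rls_err, lin_err, rls_norm, lin_norm = demo()
    print(f"recall error  RLS: {rls_err:.2e}  linear: {lin_err:.2e}")
    print(f"state norm    RLS: {rls_norm:.2f}  linear: {lin_norm:.2f}")
```

In this toy setting the additive linear-attention state keeps growing with every repeated write, while the least-squares memory stops changing once its prediction error is near zero. That is the qualitative behaviour the abstract attributes to a self-limiting state norm, shown here only for the simplified fixed-penalty variant.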



