LongMem: Augmenting Large Language Models with Long-Term Memory

Weizhi Wang, Li Dong, Hao Cheng, Xiaodong Liu, Xifeng Yan, Jianfeng Gao, Furu Wei

arXiv · 2023

long-term-memory · memory-bank · decoupled · scalable

TL;DR

Decouples long-term memory from model parameters: a frozen LLM backbone is paired with a trainable retrieval side-network and a cached memory bank, letting the model draw on past context far beyond its fixed window.

Key Contribution

LongMem decouples long-term memory from LLM parameters, enabling models to access vast amounts of information without increasing model size or context window. The approach uses a frozen LLM backbone with a trainable side-network for memory retrieval.

Architecture

Memory Bank

  • Stores past context as cached key-value pairs
  • Keys and values are the attention key/value vectors from a designated layer of the frozen backbone
  • Keys act as retrieval indices; values carry the cached context representations
  • Grows as more context is processed, far beyond the backbone's fixed window (a minimal sketch follows this list)
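A minimal sketch of such a memory bank, written in PyTorch under the assumptions above: cached keys and values are tensors of shape (tokens, heads, head_dim), retrieval is token-level dot-product similarity (the paper retrieves at chunk granularity), and the class and method names are illustrative rather than the paper's code.

    import torch

    class MemoryBank:
        """Illustrative cache of attention key-value pairs from one backbone layer."""

        def __init__(self, num_heads: int, head_dim: int):
            self.num_heads = num_heads
            self.head_dim = head_dim
            # Cached keys/values, each of shape (num_cached_tokens, num_heads, head_dim).
            self.keys = torch.empty(0, num_heads, head_dim)
            self.values = torch.empty(0, num_heads, head_dim)

        def append(self, keys: torch.Tensor, values: torch.Tensor) -> None:
            # New entries are detached: the cache stores no gradients.
            self.keys = torch.cat([self.keys, keys.detach()], dim=0)
            self.values = torch.cat([self.values, values.detach()], dim=0)

        def topk(self, query: torch.Tensor, k: int):
            # query: (num_heads, head_dim). Similarity is a per-head dot product,
            # averaged over heads for simplicity.
            k = min(k, self.keys.shape[0])
            scores = torch.einsum("nhd,hd->n", self.keys, query) / self.num_heads
            idx = scores.topk(k).indices
            return self.keys[idx], self.values[idx]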

SideNet

A trainable memory retrieval and fusion network (sketched below):

  • Learns to query the memory bank from the current hidden states
  • Produces memory-augmented representations by attending over retrieved key-value pairs
  • Connected to the frozen LLM through cross-network residual connections
  • Adds only a small parameter overhead relative to the backbone
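A rough sketch of the cross-network residual idea, assuming the frozen backbone's per-layer hidden states are passed in; layer sizes are arbitrary, causal masking and the memory-fusion layer are omitted, and nothing here is the paper's implementation.

    import torch
    from torch import nn

    class SideNet(nn.Module):
        # A small trainable network run alongside a frozen backbone. Each side
        # layer adds a residual connection from a frozen backbone layer's hidden
        # states, so the backbone's representations flow in without being changed.

        def __init__(self, d_model: int = 256, n_layers: int = 4, n_heads: int = 4):
            super().__init__()
            self.layers = nn.ModuleList(
                nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
                for _ in range(n_layers)
            )

        def forward(self, backbone_hiddens: list) -> torch.Tensor:
            # backbone_hiddens[i]: (batch, seq_len, d_model) hidden states of the
            # backbone layer paired with side layer i, computed under torch.no_grad().
            h = backbone_hiddens[0]
            for i, layer in enumerate(self.layers):
                # Cross-network residual: inject the matching frozen hidden state.
                h = layer(h + backbone_hiddens[i])
            return h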

Frozen Backbone

Base LLM remains unchanged:

  • No need to retrain large model
  • Preserves original capabilities
  • Memory augmentation is additive

Memory Operations

Writing to Memory

For each token in context:

  • Run the context through the frozen backbone
  • Take the attention key vectors of the designated memory layer as keys
  • Take the corresponding attention value vectors as values
  • Append the key-value pairs to the memory bank, as in the sketch below
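A minimal sketch of the write path, assuming access to the hidden states and key/value projection weights of the backbone's designated memory layer; `bank` is any object with an `append` method, such as the MemoryBank sketched under "Memory Bank".

    import torch

    @torch.no_grad()  # the backbone is frozen; nothing in the write path needs gradients
    def write_to_memory(hidden_states, w_k, w_v, bank, num_heads):
        # hidden_states: (seq_len, d_model) output of the memory layer for one context.
        # w_k, w_v:      (d_model, d_model) key/value projection weights of that layer.
        seq_len, d_model = hidden_states.shape
        head_dim = d_model // num_heads
        # Project to per-head keys/values the same way the attention layer does
        # (projection bias omitted for brevity).
        keys = (hidden_states @ w_k).view(seq_len, num_heads, head_dim)
        values = (hidden_states @ w_v).view(seq_len, num_heads, head_dim)
        bank.append(keys, values)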

Reading from Memory

For new input:

  • Form a query from the current hidden state in the SideNet
  • Retrieve the top-k most similar cached key-value pairs
  • Fuse the retrieved values with the current representation via joint attention
  • Continue generation with the memory-augmented context (see the retrieval sketch after this list)
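A sketch of the read path under the same assumptions: token-level top-k retrieval by query-key similarity, then attention over the retrieved pairs. In the paper this fusion happens inside a joint-attention layer of the SideNet and retrieval is chunk-level; the function below is a simplified stand-in.

    import math
    import torch

    def read_from_memory(query, bank, k=8):
        # query: (num_heads, head_dim) attention query for the current token.
        # bank:  object with a `topk(query, k)` method returning (keys, values),
        #        each of shape (k, num_heads, head_dim), e.g. the MemoryBank sketch.
        keys, values = bank.topk(query, k)
        if keys.shape[0] == 0:              # empty memory: nothing to add
            return torch.zeros_like(query)
        head_dim = query.shape[-1]
        # Per-head attention scores over the retrieved tokens: (k, heads)
        scores = torch.einsum("khd,hd->kh", keys, query) / math.sqrt(head_dim)
        weights = torch.softmax(scores, dim=0)
        # Weighted sum of retrieved values: (heads, dim), to be merged with the
        # local attention output (e.g. residually or via a learned gate).
        readout = torch.einsum("kh,khd->hd", weights, values)
        return readout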

Training

Data

  • Long-form documents (books, papers)
  • Documents are split into consecutive segments: earlier segments are written to memory, later segments are predicted against it
  • Train to predict the next token using both the local context and retrieved memory (segmentation sketched below)
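A sketch of the segmentation described in the last bullet, assuming documents are already tokenized; the segment length and the write/predict split are illustrative choices, not the paper's exact recipe.

    def segment_document(token_ids, segment_len=1024):
        # Split one long document into consecutive fixed-length segments.
        return [
            token_ids[i : i + segment_len]
            for i in range(0, len(token_ids), segment_len)
        ]

    # Usage: earlier segments fill the memory bank, the rest are training targets.
    segments = segment_document(list(range(5000)), segment_len=1024)
    memory_segments, target_segments = segments[:-2], segments[-2:]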

Objectives

  • Standard language modeling loss
  • Memory attention regularization, which ensures the retrieved memory is actually used (loss sketch below)
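A sketch of the objective, assuming `logits` come from the memory-augmented model; the next-token loss is standard, and the auxiliary memory-usage term mentioned above is left as a comment because its exact form isn't specified here.

    import torch
    import torch.nn.functional as F

    def lm_loss(logits, token_ids):
        # logits:    (batch, seq_len, vocab_size) from the memory-augmented model.
        # token_ids: (batch, seq_len); targets are the ids shifted left by one.
        shift_logits = logits[:, :-1, :]
        shift_targets = token_ids[:, 1:]
        loss = F.cross_entropy(
            shift_logits.reshape(-1, shift_logits.size(-1)),
            shift_targets.reshape(-1),
        )
        # Any auxiliary regularizer that encourages attention to retrieved memory
        # (as described in these notes) would be added to `loss` here.
        return loss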

Efficiency

  • Only the SideNet is trained (a small fraction of total parameters)
  • The backbone is frozen (no gradients)
  • The memory bank is a non-parametric cache with no trainable parameters (training-setup sketch below)
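A sketch of the training setup implied by these points: freeze every backbone parameter and hand the optimizer only the SideNet's parameters. Both modules below are generic stand-ins, not the actual models.

    import torch
    from torch import nn

    backbone = nn.TransformerEncoder(      # stand-in for the frozen LLM backbone
        nn.TransformerEncoderLayer(d_model=256, nhead=4, batch_first=True),
        num_layers=6,
    )
    side_net = nn.TransformerEncoder(      # stand-in for the small trainable SideNet
        nn.TransformerEncoderLayer(d_model=256, nhead=4, batch_first=True),
        num_layers=2,
    )

    # Freeze the backbone: no gradients, no optimizer state for its parameters.
    for p in backbone.parameters():
        p.requires_grad = False
    backbone.eval()

    # Train only the SideNet.
    optimizer = torch.optim.AdamW(side_net.parameters(), lr=1e-4)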

Evaluation

Tasks

  • Long-context language modeling
  • Memory-intensive QA
  • Long document summarization

Results

  • Significant perplexity improvements
  • Better than extending context window
  • Scales to very long contexts
  • Maintains generation quality

Key Insights

Decoupling Benefits

  • Memory scales independently from model
  • Can update memory without retraining
  • Different memory banks for different domains
  • Privacy: user memory is separate

Memory as Knowledge

  • Previous context becomes retrievable knowledge
  • Similar to human episodic memory
  • Enables continual learning without forgetting

Relevance to Agent Memory

Direct Application

Agents can use a similar architecture (a toy sketch follows the list below):

  • Store all interactions in memory bank
  • Retrieve relevant past when needed
  • No context window limitations
  • Personalized memory per user
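A toy sketch of that loop using an explicit store-and-retrieve memory rather than LongMem's internal key-value cache; `embed` is a hypothetical placeholder for a real embedding model.

    import torch

    def embed(text: str) -> torch.Tensor:
        # Hypothetical placeholder: a real system would call an embedding model.
        # Here characters are hashed into a fixed-size vector just to run end to end.
        vec = torch.zeros(64)
        for i, ch in enumerate(text):
            vec[i % 64] += ord(ch)
        return vec / (vec.norm() + 1e-8)

    class AgentMemory:
        """Store every interaction; retrieve the most relevant ones later."""

        def __init__(self):
            self.texts = []
            self.embeddings = []

        def store(self, interaction: str) -> None:
            self.texts.append(interaction)
            self.embeddings.append(embed(interaction))

        def retrieve(self, query: str, k: int = 3) -> list:
            if not self.texts:
                return []
            sims = torch.stack(self.embeddings) @ embed(query)
            idx = sims.topk(min(k, len(self.texts))).indices
            return [self.texts[i] for i in idx.tolist()]

    # Usage: persist interactions across turns, recall them before responding.
    memory = AgentMemory()
    memory.store("User prefers answers with code examples.")
    relevant = memory.retrieve("How should I format my reply?")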

Practical Considerations

  • Memory bank growth management
  • Retrieval efficiency at scale
  • Memory consolidation strategies
  • Cross-session memory persistence

Comparison to Other Approaches

vs. RAG

  • RAG retrieves from external docs
  • LongMem retrieves from past context
  • Complementary approaches

vs. Extended Context

  • Context window: fixed, expensive
  • LongMem: unlimited, retrieval-based
  • Trade-off: exact attention vs. approximate

vs. MemGPT

  • MemGPT: LLM manages memory explicitly
  • LongMem: retrieval is learned/automatic
  • Different levels of agent control

Limitations

  • Retrieval adds latency
  • Memory bank requires storage
  • Training SideNet needs data
  • Not all information is retrievable

Citation

    @article{wang2023longmem,
      title={Augmenting Language Models with Long-Term Memory},
      author={Wang, Weizhi and Dong, Li and Cheng, Hao and Liu, Xiaodong and Yan, Xifeng and Gao, Jianfeng and Wei, Furu},
      journal={arXiv preprint arXiv:2306.07174},
      year={2023}
    }