## Key Contribution

LongMem decouples long-term memory from the LLM's parameters, letting a model draw on far more past context than fits in its context window without growing the model itself. The approach pairs a frozen LLM backbone with a small trainable side-network (SideNet) that retrieves from a cached memory bank.
## Architecture

### Memory Bank

Stores past context as key-value pairs (a minimal sketch follows this list):

- Keys and values are attention key/value states cached from a chosen layer of the frozen backbone
- Grows with the amount of past context, rather than being capped by the context window
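A minimal sketch of such a memory bank in PyTorch. `MemoryBank`, `write`, and `read` are illustrative names, not the paper's API, and retrieval here is brute-force dot-product similarity rather than the approximate-nearest-neighbor index you would want at scale.

```python
import torch

class MemoryBank:
    """Caches (key, value) rows from past context; retrieves by similarity."""

    def __init__(self, dim: int):
        self.keys = torch.empty(0, dim)    # one row per cached token
        self.values = torch.empty(0, dim)

    def write(self, k: torch.Tensor, v: torch.Tensor) -> None:
        # In LongMem the rows come from a frozen backbone layer's attention
        # key/value states; here they are just (n, dim) tensors.
        self.keys = torch.cat([self.keys, k], dim=0)
        self.values = torch.cat([self.values, v], dim=0)

    def read(self, query: torch.Tensor, top_k: int = 4) -> torch.Tensor:
        # Return values of the top-k cached keys most similar to the query.
        scores = self.keys @ query                       # (num_cached,)
        idx = scores.topk(min(top_k, scores.numel())).indices
        return self.values[idx]                          # (top_k, dim)
```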
### SideNet

Trainable memory-retrieval network (one block is sketched below):

- Learns to query the memory bank
- Produces memory-enhanced representations
- Connects to the frozen LLM through residual connections
- Adds only a small parameter overhead
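A hedged sketch of one SideNet-style fusion block: cross-attention from the current hidden states to retrieved memory values, added back through a residual connection. This mirrors the description above but is not the paper's exact architecture, which stacks several such layers and taps multiple backbone layers.

```python
import torch
import torch.nn as nn

class MemoryFusionLayer(nn.Module):
    """One SideNet-style block: cross-attend to memories, fuse residually."""

    def __init__(self, dim: int, num_heads: int = 8):
        super().__init__()
        self.cross_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, hidden: torch.Tensor, memories: torch.Tensor) -> torch.Tensor:
        # hidden:   (batch, seq, dim) states from the frozen backbone
        # memories: (batch, k, dim) values retrieved from the memory bank
        attended, _ = self.cross_attn(hidden, memories, memories)
        return self.norm(hidden + attended)  # residual keeps the backbone signal
```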
### Frozen Backbone

The base LLM remains unchanged (freezing it is shown below):

- No retraining of the large model
- Original capabilities are preserved
- Memory augmentation is purely additive
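Freezing the backbone is a one-liner in PyTorch; this assumes a Hugging Face causal LM, with `gpt2` standing in for whatever base model is used.

```python
from transformers import AutoModelForCausalLM

backbone = AutoModelForCausalLM.from_pretrained("gpt2")  # stand-in base LLM
backbone.requires_grad_(False)  # no gradients ever reach the backbone
backbone.eval()                 # its original behavior is fully preserved
```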
## Memory Operations

### Writing to Memory

For each token in the context (sketched after this list):

1. Run the context through the frozen backbone
2. Cache the token's attention key state from a chosen layer as the memory key
3. Cache the matching value state as the memory value
4. Append the pair to the memory bank
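An illustrative write step. To keep the sketch short it caches one hidden-state layer as both key and value, whereas the paper caches a layer's attention key and value states separately; `write_context` and its arguments are assumptions, not the paper's API.

```python
import torch

@torch.no_grad()  # the backbone is frozen, so writing needs no gradients
def write_context(backbone, tokenizer, bank, text: str, layer: int = -1):
    inputs = tokenizer(text, return_tensors="pt")
    out = backbone(**inputs, output_hidden_states=True)
    reps = out.hidden_states[layer][0]  # (seq_len, dim), one row per token
    bank.write(reps, reps)              # key == value in this simplification
```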
### Reading from Memory

For new input (sketched after this list):

1. Encode the query with the SideNet
2. Retrieve the top-k most similar memories
3. Fuse the retrieved values with the current representation
4. Continue generation with the enhanced context
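The matching read step, reusing the pieces sketched above (`MemoryBank`, `MemoryFusionLayer`, `write_context`); again the function name and flow are illustrative.

```python
import torch

@torch.no_grad()
def read_memory(backbone, tokenizer, bank, fusion, text: str, top_k: int = 4):
    inputs = tokenizer(text, return_tensors="pt")
    hidden = backbone(**inputs, output_hidden_states=True).hidden_states[-1]
    query = hidden[0, -1]                         # query from the last token
    memories = bank.read(query, top_k)            # (top_k, dim)
    return fusion(hidden, memories.unsqueeze(0))  # memory-enhanced states
```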
## Training

### Data

- Long-form documents (books, papers)
- Split into segments so that earlier segments are written to memory and later segments must read from it (see the sketch below)
- The model is trained to predict tokens using both local context and retrieved memory
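A sketch of the segmentation, assuming already-tokenized documents; the segment length is an arbitrary placeholder.

```python
def segment_document(token_ids: list[int], seg_len: int = 1024) -> list[list[int]]:
    # Earlier segments get written to the memory bank; later segments are
    # predicted with both local context and retrieved memory.
    return [token_ids[i:i + seg_len] for i in range(0, len(token_ids), seg_len)]
```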
### Objectives

- Standard language-modeling loss
- Memory-attention regularization, which ensures the memory is actually used (one plausible form is sketched below)
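One plausible form of the combined objective. The cross-entropy term is standard; the regularizer shown (a log-barrier that penalizes placing no attention mass on memory slots) is an assumption that matches the description above, not the paper's exact term.

```python
import torch
import torch.nn.functional as F

def training_loss(logits, targets, memory_attn, reg_weight: float = 0.01):
    # logits: (batch, seq, vocab); targets: (batch, seq)
    # memory_attn: attention weights placed on retrieved memory slots
    lm_loss = F.cross_entropy(logits.view(-1, logits.size(-1)), targets.view(-1))
    usage = memory_attn.mean()                   # average mass on memory
    reg = -reg_weight * torch.log(usage + 1e-8)  # penalize ignoring memory
    return lm_loss + reg
```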
### Efficiency

- Only the SideNet is trained, and it is small (optimizer setup sketched below)
- The backbone is frozen, so no backbone gradients are computed or stored
- The memory bank is an inference-time structure, not a trained component
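In training code, the efficiency argument reduces to which parameters the optimizer sees; `MemoryFusionLayer` is the sketch from above and the hyperparameters are placeholders.

```python
import torch

sidenet = MemoryFusionLayer(dim=768)  # only this part is trainable
optimizer = torch.optim.AdamW(sidenet.parameters(), lr=2e-4)

# The frozen backbone contributes no gradients and no optimizer state.
trainable = sum(p.numel() for p in sidenet.parameters() if p.requires_grad)
print(f"trainable parameters: {trainable:,}")  # a small fraction of the backbone
```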
## Evaluation

### Tasks

- Long-context language modeling
- Memory-intensive QA
- Long-document summarization

### Results

- Significant perplexity improvements
- Outperforms simply extending the context window
- Scales to very long contexts
- Maintains generation quality
## Key Insights

### Decoupling Benefits

- Memory scales independently of the model
- Memory can be updated without retraining
- Different memory banks can serve different domains
- Privacy: user memory stays separate from the model weights

### Memory as Knowledge

- Previous context becomes retrievable knowledge
- Analogous to human episodic memory
- Enables continual learning without forgetting
## Relevance to Agent Memory

### Direct Application

Agents can use a similar architecture (a sketch follows this list):

- Store all interactions in a memory bank
- Retrieve the relevant past when needed
- No context-window limitations
- Personalized memory per user
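Wiring the pieces above into an agent turn might look like this; `generate` is a hypothetical decode step, and everything else reuses the illustrative sketches from earlier.

```python
def agent_turn(user_msg: str, backbone, tokenizer, bank, fusion) -> str:
    # Retrieve relevant past turns and fuse them into the current states.
    enhanced = read_memory(backbone, tokenizer, bank, fusion, user_msg)
    reply = generate(enhanced)  # hypothetical: decode from enhanced states
    # Write this interaction back so future turns can retrieve it.
    write_context(backbone, tokenizer, bank, user_msg + "\n" + reply)
    return reply
```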
### Practical Considerations

- Memory-bank growth management (one simple eviction strategy is sketched below)
- Retrieval efficiency at scale
- Memory consolidation strategies
- Cross-session memory persistence
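One simple growth-management strategy, offered as an assumption rather than anything the paper prescribes: a FIFO cap on the `MemoryBank` sketched earlier.

```python
import torch

class BoundedMemoryBank(MemoryBank):
    """Evicts the oldest entries once the bank exceeds a fixed capacity."""

    def __init__(self, dim: int, capacity: int = 65_536):
        super().__init__(dim)
        self.capacity = capacity

    def write(self, k: torch.Tensor, v: torch.Tensor) -> None:
        super().write(k, v)
        if self.keys.size(0) > self.capacity:  # drop the oldest rows first
            self.keys = self.keys[-self.capacity:]
            self.values = self.values[-self.capacity:]
```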
## Comparison to Other Approaches

### vs. RAG

- RAG retrieves from external documents
- LongMem retrieves from the model's own past context
- The two are complementary

### vs. Extended Context

- Extended context window: fixed size, with attention cost growing quadratically in length
- LongMem: effectively unbounded, retrieval-based
- Trade-off: exact attention vs. approximate retrieval

### vs. MemGPT

- MemGPT: the LLM manages its memory explicitly
- LongMem: retrieval is learned and automatic
- They represent different levels of agent control over memory
## Limitations

- Retrieval adds latency
- The memory bank requires storage
- Training the SideNet needs data
- Not all information is usefully retrievable
## Citation

```bibtex
@article{wang2023longmem,
  title={Augmenting Language Models with Long-Term Memory},
  author={Wang, Weizhi and Dong, Li and Cheng, Hao and Liu, Xiaodong and Yan, Xifeng and Gao, Jianfeng and Wei, Furu},
  journal={arXiv preprint arXiv:2306.07174},
  year={2023}
}
```