LongMem: Augmenting Large Language Models with Long-Term Memory

Weizhi Wang, Li Dong, Hao Cheng, Xiaodong Liu, Xifeng Yan, Jianfeng Gao, Furu Wei

arXiv · 2023

long-term-memory · memory-bank · decoupled · scalable

TL;DR

Decouples long-term memory from model parameters: a frozen LLM backbone is paired with a trainable retrieval side-network and a cached memory bank, letting the model draw on past context far beyond its fixed window.

Key Contribution

LongMem decouples long-term memory from LLM parameters, enabling models to access vast amounts of information without increasing model size or context window. The approach uses a frozen LLM backbone with a trainable side-network for memory retrieval.

Architecture

Memory Bank

  • Stores past context as cached key-value pairs
  • Keys and values are the attention key/value vectors from a designated layer of the frozen backbone
  • Keys act as retrieval indices; values carry the cached context representations
  • Grows as more context is processed, far beyond the backbone's fixed window (a minimal sketch follows this list)
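A minimal sketch of such a memory bank, written in PyTorch under the assumptions above: cached keys and values are tensors of shape (tokens, heads, head_dim), retrieval is token-level dot-product similarity (the paper retrieves at chunk granularity), and the class and method names are illustrative rather than the paper's code.

    import torch

    class MemoryBank:
        """Illustrative cache of attention key-value pairs from one backbone layer."""

        def __init__(self, num_heads: int, head_dim: int):
            self.num_heads = num_heads
            self.head_dim = head_dim
            # Cached keys/values, each of shape (num_cached_tokens, num_heads, head_dim).
            self.keys = torch.empty(0, num_heads, head_dim)
            self.values = torch.empty(0, num_heads, head_dim)

        def append(self, keys: torch.Tensor, values: torch.Tensor) -> None:
            # New entries are detached: the cache stores no gradients.
            self.keys = torch.cat([self.keys, keys.detach()], dim=0)
            self.values = torch.cat([self.values, values.detach()], dim=0)

        def topk(self, query: torch.Tensor, k: int):
            # query: (num_heads, head_dim). Similarity is a per-head dot product,
            # averaged over heads for simplicity.
            k = min(k, self.keys.shape[0])
            scores = torch.einsum("nhd,hd->n", self.keys, query) / self.num_heads
            idx = scores.topk(k).indices
            return self.keys[idx], self.values[idx]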

SideNet

A trainable memory retrieval and fusion network (sketched below):

  • Learns to query the memory bank from the current hidden states
  • Produces memory-augmented representations by attending over retrieved key-value pairs
  • Connected to the frozen LLM through cross-network residual connections
  • Adds only a small parameter overhead relative to the backbone
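A rough sketch of the cross-network residual idea, assuming the frozen backbone's per-layer hidden states are passed in; layer sizes are arbitrary, causal masking and the memory-fusion layer are omitted, and nothing here is the paper's implementation.

    import torch
    from torch import nn

    class SideNet(nn.Module):
        # A small trainable network run alongside a frozen backbone. Each side
        # layer adds a residual connection from a frozen backbone layer's hidden
        # states, so the backbone's representations flow in without being changed.

        def __init__(self, d_model: int = 256, n_layers: int = 4, n_heads: int = 4):
            super().__init__()
            self.layers = nn.ModuleList(
                nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
                for _ in range(n_layers)
            )

        def forward(self, backbone_hiddens: list) -> torch.Tensor:
            # backbone_hiddens[i]: (batch, seq_len, d_model) hidden states of the
            # backbone layer paired with side layer i, computed under torch.no_grad().
            h = backbone_hiddens[0]
            for i, layer in enumerate(self.layers):
                # Cross-network residual: inject the matching frozen hidden state.
                h = layer(h + backbone_hiddens[i])
            return h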

Frozen Backbone

Base LLM remains unchanged:

  • No need to retrain large model
  • Preserves original capabilities
  • Memory augmentation is additive

Memory Operations

Writing to Memory

For each token in context:

  • Run the context through the frozen backbone
  • Take the attention key vectors of the designated memory layer as keys
  • Take the corresponding attention value vectors as values
  • Append the key-value pairs to the memory bank, as in the sketch below
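A minimal sketch of the write path, assuming access to the hidden states and key/value projection weights of the backbone's designated memory layer; `bank` is any object with an `append` method, such as the MemoryBank sketched under "Memory Bank".

    import torch

    @torch.no_grad()  # the backbone is frozen; nothing in the write path needs gradients
    def write_to_memory(hidden_states, w_k, w_v, bank, num_heads):
        # hidden_states: (seq_len, d_model) output of the memory layer for one context.
        # w_k, w_v:      (d_model, d_model) key/value projection weights of that layer.
        seq_len, d_model = hidden_states.shape
        head_dim = d_model // num_heads
        # Project to per-head keys/values the same way the attention layer does
        # (projection bias omitted for brevity).
        keys = (hidden_states @ w_k).view(seq_len, num_heads, head_dim)
        values = (hidden_states @ w_v).view(seq_len, num_heads, head_dim)
        bank.append(keys, values)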

Reading from Memory

For new input:

  • Form a query from the current hidden state in the SideNet
  • Retrieve the top-k most similar cached key-value pairs
  • Fuse the retrieved values with the current representation via joint attention
  • Continue generation with the memory-augmented context (see the retrieval sketch after this list)
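A sketch of the read path under the same assumptions: token-level top-k retrieval by query-key similarity, then attention over the retrieved pairs. In the paper this fusion happens inside a joint-attention layer of the SideNet and retrieval is chunk-level; the function below is a simplified stand-in.

    import math
    import torch

    def read_from_memory(query, bank, k=8):
        # query: (num_heads, head_dim) attention query for the current token.
        # bank:  object with a `topk(query, k)` method returning (keys, values),
        #        each of shape (k, num_heads, head_dim), e.g. the MemoryBank sketch.
        keys, values = bank.topk(query, k)
        if keys.shape[0] == 0:              # empty memory: nothing to add
            return torch.zeros_like(query)
        head_dim = query.shape[-1]
        # Per-head attention scores over the retrieved tokens: (k, heads)
        scores = torch.einsum("khd,hd->kh", keys, query) / math.sqrt(head_dim)
        weights = torch.softmax(scores, dim=0)
        # Weighted sum of retrieved values: (heads, dim), to be merged with the
        # local attention output (e.g. residually or via a learned gate).
        readout = torch.einsum("kh,khd->hd", weights, values)
        return readout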

Training

Data

  • Long-form documents (books, papers)
  • Documents are split into consecutive segments: earlier segments are written to memory, later segments are predicted against it
  • Train to predict the next token using both the local context and retrieved memory (segmentation sketched below)
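A sketch of the segmentation described in the last bullet, assuming documents are already tokenized; the segment length and the write/predict split are illustrative choices, not the paper's exact recipe.

    def segment_document(token_ids, segment_len=1024):
        # Split one long document into consecutive fixed-length segments.
        return [
            token_ids[i : i + segment_len]
            for i in range(0, len(token_ids), segment_len)
        ]

    # Usage: earlier segments fill the memory bank, the rest are training targets.
    segments = segment_document(list(range(5000)), segment_len=1024)
    memory_segments, target_segments = segments[:-2], segments[-2:]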

Objectives

  • Standard language modeling loss
  • Memory attention regularization, which ensures the retrieved memory is actually used (loss sketch below)
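A sketch of the objective, assuming `logits` come from the memory-augmented model; the next-token loss is standard, and the auxiliary memory-usage term mentioned above is left as a comment because its exact form isn't specified here.

    import torch
    import torch.nn.functional as F

    def lm_loss(logits, token_ids):
        # logits:    (batch, seq_len, vocab_size) from the memory-augmented model.
        # token_ids: (batch, seq_len); targets are the ids shifted left by one.
        shift_logits = logits[:, :-1, :]
        shift_targets = token_ids[:, 1:]
        loss = F.cross_entropy(
            shift_logits.reshape(-1, shift_logits.size(-1)),
            shift_targets.reshape(-1),
        )
        # Any auxiliary regularizer that encourages attention to retrieved memory
        # (as described in these notes) would be added to `loss` here.
        return loss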

Efficiency

  • Only the SideNet is trained (a small fraction of total parameters)
  • The backbone is frozen (no gradients)
  • The memory bank is a non-parametric cache with no trainable parameters (training-setup sketch below)
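A sketch of the training setup implied by these points: freeze every backbone parameter and hand the optimizer only the SideNet's parameters. Both modules below are generic stand-ins, not the actual models.

    import torch
    from torch import nn

    backbone = nn.TransformerEncoder(      # stand-in for the frozen LLM backbone
        nn.TransformerEncoderLayer(d_model=256, nhead=4, batch_first=True),
        num_layers=6,
    )
    side_net = nn.TransformerEncoder(      # stand-in for the small trainable SideNet
        nn.TransformerEncoderLayer(d_model=256, nhead=4, batch_first=True),
        num_layers=2,
    )

    # Freeze the backbone: no gradients, no optimizer state for its parameters.
    for p in backbone.parameters():
        p.requires_grad = False
    backbone.eval()

    # Train only the SideNet.
    optimizer = torch.optim.AdamW(side_net.parameters(), lr=1e-4)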

Evaluation

Tasks

  • Long-context language modeling
  • Memory-intensive QA
  • Long document summarization

Results

  • Significant perplexity improvements
  • Better than extending context window
  • Scales to very long contexts
  • Maintains generation quality

Key Insights

Decoupling Benefits

  • Memory scales independently from model
  • Can update memory without retraining
  • Different memory banks for different domains
  • Privacy: user memory is separate

Memory as Knowledge

  • Previous context becomes retrievable knowledge
  • Similar to human episodic memory
  • Enables continual learning without forgetting

Relevance to Agent Memory

Direct Application

Agents can use a similar architecture (a toy sketch follows the list below):

  • Store all interactions in memory bank
  • Retrieve relevant past when needed
  • No context window limitations
  • Personalized memory per user
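A toy sketch of that loop using an explicit store-and-retrieve memory rather than LongMem's internal key-value cache; `embed` is a hypothetical placeholder for a real embedding model.

    import torch

    def embed(text: str) -> torch.Tensor:
        # Hypothetical placeholder: a real system would call an embedding model.
        # Here characters are hashed into a fixed-size vector just to run end to end.
        vec = torch.zeros(64)
        for i, ch in enumerate(text):
            vec[i % 64] += ord(ch)
        return vec / (vec.norm() + 1e-8)

    class AgentMemory:
        """Store every interaction; retrieve the most relevant ones later."""

        def __init__(self):
            self.texts = []
            self.embeddings = []

        def store(self, interaction: str) -> None:
            self.texts.append(interaction)
            self.embeddings.append(embed(interaction))

        def retrieve(self, query: str, k: int = 3) -> list:
            if not self.texts:
                return []
            sims = torch.stack(self.embeddings) @ embed(query)
            idx = sims.topk(min(k, len(self.texts))).indices
            return [self.texts[i] for i in idx.tolist()]

    # Usage: persist interactions across turns, recall them before responding.
    memory = AgentMemory()
    memory.store("User prefers answers with code examples.")
    relevant = memory.retrieve("How should I format my reply?")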

Practical Considerations

  • Memory bank growth management
  • Retrieval efficiency at scale
  • Memory consolidation strategies
  • Cross-session memory persistence

Comparison to Other Approaches

vs. RAG

  • RAG retrieves from external docs
  • LongMem retrieves from past context
  • Complementary approaches

vs. Extended Context

  • Context window: fixed, expensive
  • LongMem: unlimited, retrieval-based
  • Trade-off: exact attention vs. approximate

vs. MemGPT

  • MemGPT: LLM manages memory explicitly
  • LongMem: retrieval is learned/automatic
  • Different levels of agent control

Limitations

  • Retrieval adds latency
  • Memory bank requires storage
  • Training SideNet needs data
  • Not all information is retrievable

Citation

    @article{wang2023longmem,
      title={Augmenting Language Models with Long-Term Memory},
      author={Wang, Weizhi and Dong, Li and Cheng, Hao and Liu, Xiaodong and Yan, Xifeng and Gao, Jianfeng and Wei, Furu},
      journal={arXiv preprint arXiv:2306.07174},
      year={2023}
    }