MemGPT: Towards LLMs as Operating Systems

Charles Packer, Vivian Fang, Shishir G. Patil, Kevin Lin, Sarah Wooders, Joseph E. Gonzalez

arXiv · 2023

virtual-context · memory-management · operating-systems · long-context

TL;DR

Introduces a memory management system inspired by OS virtual memory, enabling LLMs to work with effectively unbounded context through hierarchical memory tiers and self-directed memory operations.

Key Contribution

MemGPT introduces a novel approach to handling LLM context limitations by treating the context window as "main memory" and using external storage as "disk." The LLM itself manages memory through function calls, deciding when to move information between memory tiers.

Architecture

Memory Hierarchy

  • Main Context: Limited LLM context window (the "RAM")
  • External Storage: Vector database and structured storage (the "disk")
  • Memory Management: LLM-directed paging between tiers
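The two tiers can be sketched as plain data structures (a minimal, illustrative sketch; the class and field names are mine, loosely following the paper's split of main context into system instructions, working context, and a FIFO message queue, with Python lists standing in for the vector database):

```python
from dataclasses import dataclass, field

@dataclass
class MainContext:
    """The 'RAM': everything here is visible to the LLM on every call."""
    system_instructions: str = ""
    working_context: list[str] = field(default_factory=list)  # persistent facts
    fifo_queue: list[str] = field(default_factory=list)       # rolling message history

@dataclass
class ExternalStorage:
    """The 'disk': unbounded, visible only through explicit retrieval calls."""
    archival: list[str] = field(default_factory=list)  # stands in for a vector DB
    recall: list[str] = field(default_factory=list)    # full conversation log
```

Anything in `MainContext` costs tokens on every LLM call; anything in `ExternalStorage` is free until explicitly paged in.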

Self-Directed Memory

Unlike traditional RAG, where retrieval is automatic, MemGPT lets the LLM:

  • Explicitly request memory retrieval
  • Decide what to store and evict
  • Manage its own working set
  • Handle memory pressure gracefully

Key Mechanisms

Memory Functions

The LLM has access to memory operations:

  • `core_memory_append`: Add to persistent context
  • `core_memory_replace`: Update persistent context
  • `archival_memory_insert`: Store to long-term
  • `archival_memory_search`: Retrieve from long-term
  • `conversation_search`: Search past conversations
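The function names above come from the paper; the toy implementations below are mine (dictionaries and substring matching stand in for the real prompt sections and embedding-based search):

```python
# Toy in-memory backing stores; a real system would keep core memory in
# the prompt and back archival memory with a vector database.
core_memory: dict[str, str] = {"persona": "", "human": ""}
archival_memory: list[str] = []
conversation_log: list[str] = []

def core_memory_append(section: str, text: str) -> None:
    """Add a line to a persistent in-context memory section."""
    core_memory[section] = (core_memory[section] + "\n" + text).strip()

def core_memory_replace(section: str, old: str, new: str) -> None:
    """Rewrite part of a persistent in-context memory section."""
    core_memory[section] = core_memory[section].replace(old, new)

def archival_memory_insert(text: str) -> None:
    """Store a fact in long-term (out-of-context) memory."""
    archival_memory.append(text)

def archival_memory_search(query: str) -> list[str]:
    # Substring match stands in for embedding similarity search.
    return [m for m in archival_memory if query.lower() in m.lower()]

def conversation_search(query: str) -> list[str]:
    """Search past messages that have scrolled out of context."""
    return [m for m in conversation_log if query.lower() in m.lower()]
```

In MemGPT these are exposed to the model as callable functions, so memory movement happens only when the model decides to invoke them.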

Paging System

When context fills up:

  • System detects memory pressure
  • Older/less relevant content moved to archival
  • Space freed for new information
  • LLM can recall archived content when needed
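The steps above can be sketched as follows (a simplified illustration: a whitespace token count and FIFO eviction stand in for the real tokenizer and relevance-aware eviction, and the paper's summarization of evicted messages is omitted):

```python
def token_count(messages: list[str]) -> int:
    # Crude whitespace tokenizer as a stand-in for the model's tokenizer.
    return sum(len(m.split()) for m in messages)

def handle_memory_pressure(queue: list[str], archival: list[str],
                           limit: int, target: float = 0.5) -> None:
    """Evict the oldest messages to archival until usage falls to target*limit."""
    while queue and token_count(queue) > target * limit:
        archival.append(queue.pop(0))  # oldest first, like a FIFO page-out

# Example: a 100-token budget with a warning threshold at 70% usage.
queue = [f"message {i} with some words" for i in range(20)]
archival: list[str] = []
if token_count(queue) > 0.7 * 100:
    handle_memory_pressure(queue, archival, limit=100)
```

After the flush, the evicted messages remain recoverable through search over archival storage, so nothing is permanently lost.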

Evaluation

Tasks Tested

  • Document QA: Answer questions requiring information across long documents
  • Conversational Agents: Multi-session conversations with memory
  • Nested Retrieval: Tasks requiring multiple retrieval steps

Results

  • Outperforms fixed-context baselines on long documents
  • Maintains conversation coherence across many sessions
  • Successfully manages memory without human intervention

Implications for Agent Memory

Design Principles

  • Let agents manage their own memory
  • Provide explicit memory operations as tools
  • Use hierarchical storage with different access patterns
  • Enable self-reflection about memory state
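Putting these principles together, memory operations can be exposed to the model as ordinary callable tools. The dispatch loop below is a hypothetical sketch in which a scripted list stands in for actual LLM function-call output; the names are illustrative, not MemGPT's real API:

```python
def insert_archival(store: list[str], text: str) -> str:
    store.append(text)
    return "ok"

def search_archival(store: list[str], query: str) -> str:
    hits = [m for m in store if query.lower() in m.lower()]
    return "; ".join(hits) or "no results"

archive: list[str] = []
# Tool registry: the agent framework routes model-emitted calls here.
tools = {
    "archival_memory_insert": lambda args: insert_archival(archive, args["text"]),
    "archival_memory_search": lambda args: search_archival(archive, args["query"]),
}

# Scripted stand-in for two function calls emitted by the model:
model_calls = [
    ("archival_memory_insert", {"text": "User's birthday is in May"}),
    ("archival_memory_search", {"query": "birthday"}),
]
results = [tools[name](args) for name, args in model_calls]
```

Each tool returns a string result that is fed back into the context, which is how the model "sees" its own memory state and can reflect on it.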

Practical Considerations

  • Memory operations consume context/tokens
  • Requires careful prompt engineering
  • Trade-off between autonomy and efficiency

Related Work

  • Builds on retrieval-augmented generation
  • Inspired by operating systems literature
  • Related to tool-use and function calling
  • Connects to cognitive architecture research

Citation

@article{packer2023memgpt,
  title={MemGPT: Towards LLMs as Operating Systems},
  author={Packer, Charles and Fang, Vivian and Patil, Shishir G and Lin, Kevin and Wooders, Sarah and Gonzalez, Joseph E},
  journal={arXiv preprint arXiv:2310.08560},
  year={2023}
}