Overview
Inspired by human memory systems, hierarchical memory organizes agent memory into multiple levels with different capacities, access speeds, and retention characteristics. This enables efficient handling of both immediate context and long-term knowledge.
Memory Levels
Working Memory (L1)
Immediate context:
Short-term Memory (L2)
Recent history:
Long-term Memory (L3)
Persistent knowledge:
Archival Memory (L4)
Cold storage:
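One way to make the four levels concrete is a small data model. The following is a minimal sketch, not a reference implementation: the `MemoryLevel` enum, the `MemoryItem` fields, and the `touch` helper are illustrative names assumed here.

```python
from dataclasses import dataclass, field
from enum import IntEnum
import time


class MemoryLevel(IntEnum):
    """The four levels; lower numbers sit closer to the model's context."""
    WORKING = 1      # L1: in context
    SHORT_TERM = 2   # L2: hot storage
    LONG_TERM = 3    # L3: warm storage
    ARCHIVAL = 4     # L4: cold storage


@dataclass
class MemoryItem:
    """A single memory plus the metadata needed to move it between levels."""
    text: str
    level: MemoryLevel = MemoryLevel.WORKING
    created_at: float = field(default_factory=time.time)
    last_accessed: float = field(default_factory=time.time)
    access_count: int = 0

    def touch(self) -> None:
        """Record an access so promotion and demotion logic can use it later."""
        self.last_accessed = time.time()
        self.access_count += 1


item = MemoryItem("User prefers metric units")
item.touch()
```

Tracking `last_accessed` and `access_count` on every item is a design choice that keeps level transitions cheap to decide, since no separate access log has to be consulted.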
Architecture Flow
┌─────────────────────────────────────────────────────────────┐
│                    CURRENT CONVERSATION                     │
└──────────────────────────┬──────────────────────────────────┘
                           │
                           ▼
┌─────────────────────────────────────────────────────────────┐
│  L1: Working Memory (In Context)                            │
│  ├── Current task state                                     │
│  ├── Last N messages                                        │
│  └── Retrieved context from lower levels                    │
└──────────────────────────┬──────────────────────────────────┘
                           │ overflow / retrieval
                           ▼
┌─────────────────────────────────────────────────────────────┐
│  L2: Short-term Memory (Hot Storage)                        │
│  ├── Today's conversations                                  │
│  ├── Active project context                                 │
│  └── Recently accessed memories                             │
└──────────────────────────┬──────────────────────────────────┘
                           │ consolidation / retrieval
                           ▼
┌─────────────────────────────────────────────────────────────┐
│  L3: Long-term Memory (Warm Storage)                        │
│  ├── User profile and preferences                           │
│  ├── Conversation summaries                                 │
│  └── Learned facts and patterns                             │
└──────────────────────────┬──────────────────────────────────┘
                           │ archival / deep retrieval
                           ▼
┌─────────────────────────────────────────────────────────────┐
│  L4: Archival Memory (Cold Storage)                         │
│  ├── Historical conversations (full text)                   │
│  ├── Audit trails                                           │
│  └── Rarely accessed data                                   │
└─────────────────────────────────────────────────────────────┘
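The "Last N messages" and "overflow" parts of the flow can be sketched with a bounded queue. This is an illustrative example only: the `WorkingMemory` class and the plain list standing in for the L2 store are assumptions, not part of any particular framework.

```python
from collections import deque


class WorkingMemory:
    """Keeps the last `max_messages` messages; older ones overflow to L2."""

    def __init__(self, max_messages: int, short_term: list):
        self.messages = deque()
        self.max_messages = max_messages
        self.short_term = short_term  # stand-in for the L2 store

    def add(self, message: str) -> None:
        self.messages.append(message)
        while len(self.messages) > self.max_messages:
            # The oldest message leaves the context window and lands in L2.
            self.short_term.append(self.messages.popleft())


l2_store = []
wm = WorkingMemory(max_messages=2, short_term=l2_store)
for msg in ["hello", "book a flight", "to Lisbon"]:
    wm.add(msg)
```

After the third message, `wm.messages` holds the two most recent messages and `l2_store` holds the one that overflowed.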
Movement Between Levels
Promotion (Cold → Hot)
When archived memory becomes relevant:
Demotion (Hot → Cold)
As memories age or lose relevance:
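A minimal sketch of both directions, assuming demotion is driven by idle time and promotion by a fresh access. The thresholds in `MAX_IDLE_SECONDS`, the `Memory` record, and both function names are hypothetical values and names picked for illustration.

```python
from dataclasses import dataclass

# Hypothetical thresholds: how long an item may sit idle at a level before
# it is demoted one level. L4 is the coldest level, so it has no entry.
MAX_IDLE_SECONDS = {1: 60 * 10, 2: 60 * 60 * 24, 3: 60 * 60 * 24 * 90}


@dataclass
class Memory:
    text: str
    level: int            # 1 = working ... 4 = archival
    last_accessed: float  # Unix timestamp


def demote_if_stale(memory: Memory, now: float) -> Memory:
    """Move a memory one level colder once it has been idle too long."""
    limit = MAX_IDLE_SECONDS.get(memory.level)
    if limit is not None and now - memory.last_accessed > limit:
        memory.level += 1
    return memory


def promote_on_access(memory: Memory, now: float, target_level: int = 2) -> Memory:
    """Pull a memory up to a hotter level when it becomes relevant again."""
    memory.last_accessed = now
    memory.level = min(memory.level, target_level)
    return memory


now = 1_000_000.0
old = Memory("Q3 planning notes", level=2, last_accessed=now - 2 * 86_400)
demote_if_stale(old, now)           # idle for two days at L2, so it moves to L3
archived = Memory("2019 invoice thread", level=4, last_accessed=0.0)
promote_on_access(archived, now)    # relevant again, so it moves to L2
```

Demoting one level at a time keeps the policy simple; promotion jumps straight to the target level because a memory that was just requested is likely to be needed again soon.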
Consolidation
Transform detailed memories into summaries:
Retrieval Strategy
Query Planning
When a query arrives:
Relevance Scoring
Combine multiple signals:
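As a minimal sketch, suppose the signals are semantic similarity, recency, and access frequency. All three signals, the weights, and the one-week half-life are assumptions chosen for illustration, and `relevance_score` is a hypothetical helper name.

```python
import math


def relevance_score(
    similarity: float,
    age_seconds: float,
    access_count: int,
    half_life_seconds: float = 7 * 86_400,
    weights: tuple = (0.6, 0.3, 0.1),
) -> float:
    """Weighted sum of similarity, recency, and frequency, each in [0, 1]."""
    # Recency halves every `half_life_seconds`.
    recency = 0.5 ** (age_seconds / half_life_seconds)
    # Frequency saturates: 0 accesses gives 0.0, 9 or more accesses give 1.0.
    frequency = min(1.0, math.log10(1 + access_count))
    w_sim, w_rec, w_freq = weights
    return w_sim * similarity + w_rec * recency + w_freq * frequency


fresh = relevance_score(similarity=0.8, age_seconds=0, access_count=0)
stale = relevance_score(similarity=0.8, age_seconds=7 * 86_400, access_count=0)
```

With equal similarity, the week-old memory scores lower than the fresh one because its recency term has halved.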
Budget Allocation
Distribute context window budget:
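One simple approach is a proportional split. The sketch below assumes integer weights per level; the function name `allocate_budget` and the example weights are hypothetical.

```python
def allocate_budget(total_tokens: int, weights: dict) -> dict:
    """Split a token budget across levels in proportion to integer weights."""
    total_weight = sum(weights.values())
    budget = {
        level: total_tokens * weight // total_weight
        for level, weight in weights.items()
    }
    # Floor division can leave a few tokens unassigned; give them to the first level.
    first_level = next(iter(budget))
    budget[first_level] += total_tokens - sum(budget.values())
    return budget


# Hypothetical split: half the window for L1, the rest for retrieved context.
budget = allocate_budget(8000, {"L1": 50, "L2": 25, "L3": 20, "L4": 5})
# budget == {"L1": 4000, "L2": 2000, "L3": 1600, "L4": 400}
```

Integer weights avoid floating-point rounding surprises, and the remainder step guarantees the parts always sum to the full budget.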
Consolidation Process
When to Consolidate
What to Preserve
During consolidation, keep:
What to Discard
Safe to compress or remove:
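The preserve/discard split can be sketched as a function that keeps flagged items verbatim and compresses everything else. This is an illustration under stated assumptions: the `important` flag, the record layout, and `first_sentences` (a trivial stand-in for what would normally be an LLM summarization call) are all hypothetical.

```python
from typing import Callable


def consolidate(memories: list, summarize: Callable) -> dict:
    """Collapse detailed memories into one summary record.

    Items flagged `important` are kept verbatim; the rest are only
    represented through the summary text.
    """
    keep = [m["text"] for m in memories if m.get("important")]
    compress = [m["text"] for m in memories if not m.get("important")]
    return {
        "summary": summarize(compress) if compress else "",
        "preserved": keep,
        "source_count": len(memories),
    }


def first_sentences(texts: list) -> str:
    """Stand-in summarizer: keep the first sentence of each text."""
    return " ".join(t.split(".")[0].strip() + "." for t in texts)


record = consolidate(
    [
        {"text": "User asked about pricing. We discussed three tiers."},
        {"text": "User is allergic to peanuts.", "important": True},
        {"text": "Small talk about weather. Nothing actionable."},
    ],
    summarize=first_sentences,
)
```

Passing the summarizer in as a parameter keeps the consolidation logic testable without a model call.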
Implementation with Qdrant
Collection per Level
L2: Collection "memory_short_term"
- High-performance configuration
- Aggressive indexing
- Small payload limits
L3: Collection "memory_long_term"
- Balanced configuration
- Standard indexing
- Full payloads
L4: Collection "memory_archive"
- Cost-optimized configuration
- Minimal indexing
- Compressed storage
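A hedged sketch of how these three profiles might map onto the Qdrant Python client. The specific values (HNSW `m`, on-disk flags, a 384-dimensional vector size) are assumptions for illustration, not recommended settings, and `TIER_SETTINGS` and `create_tier_collections` are hypothetical names.

```python
# Hypothetical per-level settings; tune these for your own workload.
TIER_SETTINGS = {
    "memory_short_term": {"hnsw_m": 32, "on_disk": False},  # L2: favor speed
    "memory_long_term": {"hnsw_m": 16, "on_disk": False},   # L3: balanced
    "memory_archive": {"hnsw_m": 8, "on_disk": True},       # L4: favor cost
}


def create_tier_collections(client, vector_size: int = 384) -> None:
    """Create one Qdrant collection per level using TIER_SETTINGS."""
    # Imported here so the settings can be inspected without the client installed.
    from qdrant_client import models

    for name, cfg in TIER_SETTINGS.items():
        client.create_collection(
            collection_name=name,
            vectors_config=models.VectorParams(
                size=vector_size,
                distance=models.Distance.COSINE,
                on_disk=cfg["on_disk"],
            ),
            hnsw_config=models.HnswConfigDiff(m=cfg["hnsw_m"]),
            on_disk_payload=cfg["on_disk"],
        )
```

For local experiments, `create_tier_collections(QdrantClient(":memory:"))` runs against the client's in-process mode. A higher `m` builds a denser HNSW graph (better recall, more memory), which is why the hot tier gets the largest value in this sketch.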
Tiered Search
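def tiered_search(query_vector):
    # search, sufficient_relevance, and merge are application-defined helpers.
    # Fast path: check hot memory (L2) first
    l2_results = search(collection="memory_short_term", query_vector=query_vector, limit=5)
    if sufficient_relevance(l2_results):
        return l2_results

    # Slower path: fall back to warm memory (L3) and merge with the L2 hits
    l3_results = search(collection="memory_long_term", query_vector=query_vector, limit=5)
    return merge(l2_results, l3_results)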
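The tiered search relies on three helpers (`search`, `sufficient_relevance`, `merge`) whose behavior is left open. The runnable sketch below fills them in with in-memory stand-ins so the control flow can be tried without a database; the `STORES` data, the dot-product scoring, and the 0.75 threshold are all assumptions made for illustration.

```python
# In-memory stand-ins for the two collections: (text, vector) pairs.
STORES = {
    "memory_short_term": [("standup moved to 10am", [1.0, 0.0])],
    "memory_long_term": [
        ("user prefers dark mode", [0.0, 1.0]),
        ("standup is usually at 9am", [0.9, 0.1]),
    ],
}


def search(collection, query_vector, limit=5):
    """Return (score, text) pairs ranked by dot-product similarity."""
    scored = [
        (sum(q * v for q, v in zip(query_vector, vec)), text)
        for text, vec in STORES[collection]
    ]
    return sorted(scored, reverse=True)[:limit]


def sufficient_relevance(results, threshold=0.75):
    """True when the best hit clears the threshold."""
    return bool(results) and results[0][0] >= threshold


def merge(*result_lists, limit=5):
    """Combine result lists, drop duplicate texts, and re-rank by score."""
    best = {}
    for score, text in (hit for results in result_lists for hit in results):
        best[text] = max(score, best.get(text, score))
    return sorted(((s, t) for t, s in best.items()), reverse=True)[:limit]


def tiered_search(query_vector):
    l2_results = search("memory_short_term", query_vector)
    if sufficient_relevance(l2_results):
        return l2_results          # fast path: hot memory was enough
    l3_results = search("memory_long_term", query_vector)
    return merge(l2_results, l3_results)


hot_hit = tiered_search([1.0, 0.0])   # answered from L2 alone
fallback = tiered_search([0.0, 1.0])  # L2 is not relevant, so L3 is consulted
```

Checking only the top hit against a threshold is the simplest sufficiency test; a stricter variant might require several hits above the threshold before skipping the slower level.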