Conversation Memory System - Agent Memory Architecture

Overview

Conversation memory is the foundation of agent memory systems. This architecture handles storing, summarizing, and retrieving conversation history to maintain continuity across sessions while managing context window limitations.

The Challenge

Context Window Limits

LLMs have finite context windows:

The full conversation history cannot fit

Recent context is often the most relevant

Older context is sometimes crucial

Careful selection and compression are needed

Session Continuity

Users expect agents to remember:

What was discussed previously

Decisions made together

Ongoing tasks and their status

The relationship built over time

Architecture Components

┌─────────────────────────────────────────────────────────────┐
│                       Current Session                       │
│  ┌───────────────────────────────────────────────────────┐  │
│  │  Message 1 → Message 2 → Message 3 → ... → Message N  │  │
│  └───────────────────────────────────────────────────────┘  │
└──────────────────────────┬──────────────────────────────────┘
                           │ session end
                           ▼
┌─────────────────────────────────────────────────────────────┐
│                   Summarization Pipeline                    │
│  ├── Extract key facts and decisions                        │
│  ├── Identify action items and outcomes                     │
│  ├── Note preferences expressed                             │
│  └── Generate session summary                               │
└──────────────────────────┬──────────────────────────────────┘
                           │
                           ▼
┌─────────────────────────────────────────────────────────────┐
│                       Memory Storage                        │
│         ┌─────────────────┐     ┌─────────────────┐         │
│         │ Session Index   │     │ Full Transcripts│         │
│         │ (Summaries +    │     │ (Raw messages   │         │
│         │  Embeddings)    │     │  if needed)     │         │
│         └─────────────────┘     └─────────────────┘         │
└──────────────────────────┬──────────────────────────────────┘
                           │ new session starts
                           ▼
┌─────────────────────────────────────────────────────────────┐
│                      Context Assembly                       │
│  ├── Retrieve relevant past session summaries               │
│  ├── Load any ongoing task context                          │
│  ├── Include user profile/preferences                       │
│  └── Assemble into system prompt                            │
└─────────────────────────────────────────────────────────────┘

Message Storage

What to Store

For each message:

Message:
├── id: unique identifier
├── session_id: which conversation
├── user_id: whose conversation
├── role: user | assistant
├── content: message text
├── timestamp: when sent
├── tokens: token count
└── metadata: any additional context
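
The message fields above map naturally onto a small record type. The following is a minimal sketch, assuming Python dataclasses. The `new_message` helper is hypothetical, and its whitespace-based token count is only a placeholder for the model's real tokenizer.

```python
import uuid
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any


@dataclass
class Message:
    id: str
    session_id: str
    user_id: str
    role: str                  # "user" or "assistant"
    content: str
    timestamp: datetime
    tokens: int
    metadata: dict[str, Any] = field(default_factory=dict)


def new_message(session_id: str, user_id: str, role: str, content: str) -> Message:
    """Hypothetical helper: fills in id, timestamp, and a rough token count."""
    if role not in ("user", "assistant"):
        raise ValueError(f"unknown role: {role!r}")
    return Message(
        id=str(uuid.uuid4()),
        session_id=session_id,
        user_id=user_id,
        role=role,
        content=content,
        timestamp=datetime.now(timezone.utc),
        # Assumption: a whitespace split stands in for a real tokenizer.
        tokens=len(content.split()),
    )
```

Storing the token count at write time means the context budget can be checked later without re-tokenizing every message.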

Session Metadata

For each conversation session:

Session:
├── id: unique identifier
├── user_id: whose session
├── started_at: timestamp
├── ended_at: timestamp
├── message_count: number of messages
├── summary: generated summary
├── summary_embedding: for retrieval
├── topics: extracted topics
└── outcome: resolved | ongoing | abandoned
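
The session record can be sketched the same way. This is an illustration under assumptions, not a fixed schema: the `Outcome` enum name and the choice to default an open session to `ongoing` are not stated in the field list above.

```python
from dataclasses import dataclass, field
from datetime import datetime
from enum import Enum
from typing import Optional


class Outcome(str, Enum):
    RESOLVED = "resolved"
    ONGOING = "ongoing"
    ABANDONED = "abandoned"


@dataclass
class Session:
    id: str
    user_id: str
    started_at: datetime
    ended_at: Optional[datetime] = None              # None while the session is open
    message_count: int = 0
    summary: Optional[str] = None                    # set by the summarization step
    summary_embedding: Optional[list[float]] = None  # set once the summary is embedded
    topics: list[str] = field(default_factory=list)
    # Assumption: a session counts as ongoing until something marks it otherwise.
    outcome: Outcome = Outcome.ONGOING
```

Keeping `summary` and `summary_embedding` optional reflects that they only exist after the summarization pipeline has run.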

Summarization Strategy

When to Summarize

End of session (user leaves)

Session exceeds length threshold

Topic changes significantly

Periodically during long sessions
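
The four triggers above can be combined into one check. The sketch below is only illustrative: the `SessionState` fields, the token threshold, and the periodic interval are assumed values, and topic-change detection is taken as an input flag because the document does not say how it is computed.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SessionState:
    """Hypothetical snapshot of a live session."""
    ended: bool
    unsummarized_tokens: int
    messages_since_summary: int
    topic_changed: bool


def should_summarize(
    state: SessionState,
    max_tokens: int = 3000,       # assumed length threshold
    every_n_messages: int = 20,   # assumed periodic interval
) -> Optional[str]:
    """Return the name of the trigger that fired, or None if none did."""
    if state.ended:
        return "session_end"
    if state.unsummarized_tokens > max_tokens:
        return "length_threshold"
    if state.topic_changed:
        return "topic_change"
    if state.messages_since_summary >= every_n_messages:
        return "periodic"
    return None
```

Returning the trigger name rather than a boolean makes it easy to log why a summary was produced.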

What to Extract

**Session Summary:**

Main topics discussed

Key decisions made

Questions asked and answered

Action items identified

**Facts Learned:**

User preferences expressed

Personal information shared

Opinions and beliefs stated

Corrections to prior understanding

**Task State:**

What was being worked on

Current status

Next steps identified

Blockers encountered

Summarization Prompt

Summarize this conversation for future reference:

[CONVERSATION]

Extract:

1. Main topics (2-3 bullet points)

2. Key decisions or conclusions

3. Any user preferences expressed

4. Outstanding questions or tasks

5. One paragraph summary

Format as structured JSON.
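
Because the prompt asks for structured JSON, the reply should be validated before it is stored. The sketch below shows one way to render the prompt and parse the reply. The JSON key names (`topics`, `decisions`, `preferences`, `open_items`, `summary`) and both function names are assumptions, since the prompt above does not fix a schema.

```python
import json
from dataclasses import dataclass, field

PROMPT_TEMPLATE = """Summarize this conversation for future reference:

{conversation}

Extract:
1. Main topics (2-3 bullet points)
2. Key decisions or conclusions
3. Any user preferences expressed
4. Outstanding questions or tasks
5. One paragraph summary

Format as structured JSON with keys: topics, decisions, preferences, open_items, summary."""

LIST_KEYS = ("topics", "decisions", "preferences", "open_items")


@dataclass
class SessionSummary:
    summary: str
    topics: list[str] = field(default_factory=list)
    decisions: list[str] = field(default_factory=list)
    preferences: list[str] = field(default_factory=list)
    open_items: list[str] = field(default_factory=list)


def render_summarization_prompt(messages: list[tuple[str, str]]) -> str:
    """Render (role, content) pairs into the summarization prompt."""
    conversation = "\n".join(f"{role}: {content}" for role, content in messages)
    return PROMPT_TEMPLATE.format(conversation=conversation)


def parse_summary(raw: str) -> SessionSummary:
    """Parse the model's JSON reply; missing list fields default to empty."""
    data = json.loads(raw)  # raises ValueError (JSONDecodeError) on invalid JSON
    if not isinstance(data, dict) or not isinstance(data.get("summary"), str):
        raise ValueError("reply must be a JSON object with a string 'summary'")
    lists = {key: [str(item) for item in (data.get(key) or [])] for key in LIST_KEYS}
    return SessionSummary(summary=data["summary"], **lists)
```

Naming the expected keys in the prompt itself keeps the parser and the model's output aligned.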

Retrieval Strategy

Starting a New Session

1. Load the user profile (persistent preferences)

2. Search for relevant past sessions by:

   - Recency (last few sessions)

   - Relevance (if the topic is known)

   - Importance (flagged sessions)

3. Check for ongoing tasks

4. Assemble context
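
One way to combine recency, relevance, and importance is a single blended score. The sketch below is illustrative only: the `Candidate` type, the weights (0.4 / 0.4 / 0.2), and the seven-day half-life are assumptions chosen for the example, not values from this design.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    session_id: str
    age_days: float           # time since the session ended
    similarity: float = 0.0   # 0..1 relevance to the current topic; 0 if unknown
    flagged: bool = False     # marked as important


def score(c: Candidate, half_life_days: float = 7.0) -> float:
    """Blend recency, relevance, and importance into one number."""
    # Exponential decay: 1.0 for a session that just ended, 0.5 after one half-life.
    recency = 0.5 ** (c.age_days / half_life_days)
    importance = 1.0 if c.flagged else 0.0
    return 0.4 * recency + 0.4 * c.similarity + 0.2 * importance


def select_sessions(candidates: list[Candidate], limit: int = 3) -> list[str]:
    """Return the ids of the highest-scoring sessions, best first."""
    ranked = sorted(candidates, key=score, reverse=True)
    return [c.session_id for c in ranked[:limit]]
```

With these weights, a month-old session that closely matches the current topic can still outrank a flagged one of the same age.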

During Conversation

When the user references past discussion:

1. Search session summaries semantically

2. Retrieve relevant session(s)

3. Optionally fetch full transcript

4. Inject into context
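
Semantic search over summaries usually means comparing the query embedding with each stored `summary_embedding`. Below is a minimal in-memory sketch using cosine similarity. The `min_score` cutoff is an assumed value, and a production system would more likely delegate this to a vector index.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search_summaries(
    query_embedding: list[float],
    index: dict[str, list[float]],   # session_id -> summary_embedding
    limit: int = 2,
    min_score: float = 0.3,          # assumed cutoff to avoid injecting weak matches
) -> list[tuple[str, float]]:
    """Return up to `limit` (session_id, score) pairs, best match first."""
    scored = [(sid, cosine(query_embedding, emb)) for sid, emb in index.items()]
    hits = [(sid, s) for sid, s in scored if s >= min_score]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)[:limit]
```

The returned session ids can then be used to load the summary, or the full transcript when the summary alone is not enough.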

Context Budget

Allocate limited context window:

Total Context: 8000 tokens
├── System Prompt: 500 tokens
├── User Profile: 200 tokens
├── Recent Sessions: 1000 tokens (2-3 summaries)
├── Current Session: 5000 tokens
└── Buffer: 1300 tokens
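
The allocation above sums to exactly 8000 tokens (500 + 200 + 1000 + 5000 + 1300). A small sketch of enforcing one slice of it follows. The `count_tokens` function is a whitespace-based placeholder and `fit_summaries` is a hypothetical helper; a real system would count with the model's tokenizer.

```python
TOTAL_CONTEXT = 8000
BUDGET = {
    "system_prompt": 500,
    "user_profile": 200,
    "recent_sessions": 1000,
    "current_session": 5000,
    "buffer": 1300,
}
# Guard against the slices drifting out of sync with the total.
assert sum(BUDGET.values()) == TOTAL_CONTEXT


def count_tokens(text: str) -> int:
    # Assumption: a whitespace split stands in for the model's tokenizer.
    return len(text.split())


def fit_summaries(summaries: list[str], budget: int = BUDGET["recent_sessions"]) -> list[str]:
    """Keep summaries in the given order until the next one would exceed the budget."""
    kept: list[str] = []
    used = 0
    for summary in summaries:
        cost = count_tokens(summary)
        if used + cost > budget:
            break
        kept.append(summary)
        used += cost
    return kept
```

Passing summaries in priority order (most relevant first) ensures the budget is spent on the sessions that matter most.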

Sliding Window Patterns

Simple Sliding Window

Keep last N messages:

Easy to implement

Predictable context size

Loses old context entirely

Good for short, simple conversations
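
A simple sliding window needs very little code. The sketch below uses `collections.deque` with `maxlen`, which drops the oldest entry automatically; the `SlidingWindow` class name and the dict message shape are assumptions for illustration.

```python
from collections import deque


class SlidingWindow:
    """Keeps only the most recent `max_messages` messages."""

    def __init__(self, max_messages: int):
        # A deque with maxlen discards the oldest item when a new one is appended.
        self._messages = deque(maxlen=max_messages)

    def add(self, role: str, content: str) -> None:
        self._messages.append({"role": role, "content": content})

    def context(self) -> list[dict]:
        """Return the retained messages, oldest first."""
        return list(self._messages)
```

For example, a window of size 2 that receives three messages retains only the last two, which is exactly the "loses old context entirely" trade-off noted above.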

Summarize and Slide

Summarize older messages:

Keep last N messages verbatim

Summarize messages before that

Preserves key information

More complex implementation
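
The summarize-and-slide pattern can be sketched as a split plus a pluggable summarizer. In the example below, `summarize_and_slide` and `count_summary` are hypothetical names, and `count_summary` is only a stand-in for an LLM call so the code runs without a model.

```python
from typing import Callable, Optional

Message = dict  # assumed shape: {"role": ..., "content": ...}


def summarize_and_slide(
    messages: list[Message],
    keep_last: int,
    summarize: Callable[[list[Message]], str],
) -> tuple[Optional[str], list[Message]]:
    """Return (summary of older messages, last `keep_last` messages verbatim)."""
    if keep_last < 0:
        raise ValueError("keep_last must be non-negative")
    if len(messages) <= keep_last:
        return None, list(messages)   # nothing old enough to compress
    split = len(messages) - keep_last
    return summarize(messages[:split]), list(messages[split:])


def count_summary(older: list[Message]) -> str:
    # Stand-in for an LLM call: reports only how many messages were folded away.
    return f"{len(older)} earlier messages"
```

Returning the summary separately from the verbatim tail leaves it to the caller to decide where the summary goes in the prompt.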

Hierarchical Summarization

Multiple levels of compression:

Recent: Full messages

Medium: Detailed summaries

Old: Brief summaries

Ancient: Topics only
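
The four levels can be assigned by age. The sketch below maps a session's age to a compression tier. The day boundaries (1, 7, 30) are assumptions, since the document does not define where "recent" ends and "ancient" begins.

```python
from enum import Enum


class Tier(Enum):
    FULL = "full messages"
    DETAILED = "detailed summary"
    BRIEF = "brief summary"
    TOPICS = "topics only"


# Assumed upper age bounds in days for each tier; tune per application.
TIER_BOUNDS = [(1, Tier.FULL), (7, Tier.DETAILED), (30, Tier.BRIEF)]


def tier_for_age(age_days: float) -> Tier:
    """Pick the compression level for content of the given age."""
    for max_age, tier in TIER_BOUNDS:
        if age_days <= max_age:
            return tier
    return Tier.TOPICS   # anything older than the last bound
```

A background job could re-run this mapping periodically and compress stored content as it crosses into an older tier.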

Implementation Example

Session End Handler

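# getSessionMessages, llm, embed, storage, memory, and now are provided by the application.
async def onSessionEnd(sessionId):
    # Load the full transcript of the session that just ended.
    messages = await getSessionMessages(sessionId)

    # Summarize once, then embed the summary text for later semantic retrieval.
    summary = await llm.summarize(messages)
    embedding = await embed(summary.text)

    await storage.saveSession({
        "id": sessionId,
        "summary": summary.text,
        "summary_embedding": embedding,
        "topics": summary.topics,
        "facts": summary.facts,
        "ended_at": now(),
    })

    # Also save individual facts to memory so each can be retrieved on its own.
    for fact in summary.facts:
        await memory.add({
            "content": fact,
            "source": sessionId,
            "type": "conversation_fact",
        })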

Context Assembly

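# getProfile, getRecentSessions, searchSessions, getOngoingTasks, and buildPrompt
# are provided by the application.
async def assembleContext(userId, currentTopic=None):
    profile = await getProfile(userId)
    recentSessions = await getRecentSessions(userId, limit=3)

    # Default to an empty list so relevantSessions is always defined,
    # even when no topic is known yet.
    relevantSessions = []
    if currentTopic:
        relevantSessions = await searchSessions(userId, query=currentTopic, limit=2)

    ongoingTasks = await getOngoingTasks(userId)

    return buildPrompt({
        "profile": profile,
        "recentSessions": recentSessions,
        "relevantSessions": relevantSessions,
        "ongoingTasks": ongoingTasks,
    })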
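
`buildPrompt` is referenced above but not defined in this document. The following is one possible shape for it, sketched in Python with keyword arguments, under the assumption that the profile is a flat dict and each session is a dict with `id` and `summary` keys. It also removes sessions that appear in both the recent and relevant lists so the same summary is not injected twice.

```python
def build_prompt(
    profile: dict,
    recent_sessions: list[dict],
    relevant_sessions: list[dict],
    ongoing_tasks: list[str],
) -> str:
    """Render memory into labeled sections, listing each session at most once."""
    seen = set()
    session_lines = []
    # Relevant sessions come first so they survive if the result is later truncated.
    for session in relevant_sessions + recent_sessions:
        if session["id"] in seen:
            continue
        seen.add(session["id"])
        session_lines.append(f"- {session['summary']}")

    sections = []
    if profile:
        prefs = "\n".join(f"- {key}: {value}" for key, value in profile.items())
        sections.append("User profile:\n" + prefs)
    if session_lines:
        sections.append("Past sessions:\n" + "\n".join(session_lines))
    if ongoing_tasks:
        tasks = "\n".join(f"- {task}" for task in ongoing_tasks)
        sections.append("Ongoing tasks:\n" + tasks)
    return "\n\n".join(sections)
```

Empty sections are omitted entirely, so a first-time user with no history produces an empty memory block rather than a set of blank headings.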

Best Practices

Summarize incrementally, not just at session end

Store both summaries and raw transcripts

Use semantic search, not just recency

Extract structured facts, not just text

Track conversation outcomes (resolved, ongoing, abandoned) so they can inform later improvements