Conversation Memory System

Architecture for managing conversation history across sessions

beginnerconversationchat-historysummarizationcontext

Overview

Conversation memory is the foundation of agent memory systems. This architecture handles storing, summarizing, and retrieving conversation history to maintain continuity across sessions while managing context window limitations.

The Challenge

Context Window Limits

LLMs have finite context windows:

  • Can't include entire conversation history
  • Recent context often most relevant
  • But old context sometimes crucial
  • Need smart selection and compression
  • Session Continuity

    Users expect agents to remember:

  • What was discussed previously
  • Decisions made together
  • Ongoing tasks and their status
  • The relationship built over time
  • Architecture Components

    ┌─────────────────────────────────────────────────────────────┐

    │ Current Session │

    │ ┌───────────────────────────────────────────────────────┐ │

    │ │ Message 1 → Message 2 → Message 3 → ... → Message N │ │

    │ └───────────────────────────────────────────────────────┘ │

    └──────────────────────────┬──────────────────────────────────┘

    │ session end

    ┌─────────────────────────────────────────────────────────────┐

    │ Summarization Pipeline │

    │ ├── Extract key facts and decisions │

    │ ├── Identify action items and outcomes │

    │ ├── Note preferences expressed │

    │ └── Generate session summary │

    └──────────────────────────┬──────────────────────────────────┘

    ┌─────────────────────────────────────────────────────────────┐

    │ Memory Storage │

    │ ┌─────────────────┐ ┌─────────────────┐ │

    │ │ Session Index │ │ Full Transcripts│ │

    │ │ (Summaries + │ │ (Raw messages │ │

    │ │ Embeddings) │ │ if needed) │ │

    │ └─────────────────┘ └─────────────────┘ │

    └──────────────────────────┬──────────────────────────────────┘

    │ new session starts

    ┌─────────────────────────────────────────────────────────────┐

    │ Context Assembly │

    │ ├── Retrieve relevant past session summaries │

    │ ├── Load any ongoing task context │

    │ ├── Include user profile/preferences │

    │ └── Assemble into system prompt │

    └─────────────────────────────────────────────────────────────┘

    Message Storage

    What to Store

    For each message:

    Message:

    ├── id: unique identifier

    ├── session_id: which conversation

    ├── user_id: whose conversation

    ├── role: user | assistant

    ├── content: message text

    ├── timestamp: when sent

    ├── tokens: token count

    └── metadata: any additional context

    Session Metadata

    For each conversation session:

    Session:

    ├── id: unique identifier

    ├── user_id: whose session

    ├── started_at: timestamp

    ├── ended_at: timestamp

    ├── message_count: number of messages

    ├── summary: generated summary

    ├── summary_embedding: for retrieval

    ├── topics: extracted topics

    └── outcome: resolved | ongoing | abandoned

    Summarization Strategy

    When to Summarize

  • End of session (user leaves)
  • Session exceeds length threshold
  • Topic significantly changes
  • Periodically during long sessions
  • What to Extract

    **Session Summary:**

  • Main topics discussed
  • Key decisions made
  • Questions asked and answered
  • Action items identified
  • **Facts Learned:**

  • User preferences expressed
  • Personal information shared
  • Opinions and beliefs stated
  • Corrections to prior understanding
  • **Task State:**

  • What was being worked on
  • Current status
  • Next steps identified
  • Blockers encountered
  • Summarization Prompt

    Summarize this conversation for future reference:

    [CONVERSATION]

    Extract:

  • Main topics (2-3 bullet points)
  • Key decisions or conclusions
  • Any user preferences expressed
  • Outstanding questions or tasks
  • 5. One paragraph summary

    Format as structured JSON.

    Retrieval Strategy

    Starting a New Session

  • Load user profile (persistent preferences)
  • Search for relevant past sessions by:
  • - Recency (last few sessions)

    - Relevance (if topic known)

    - Importance (flagged sessions)

  • Check for ongoing tasks
  • Assemble context
  • During Conversation

    When the user references past discussion:

  • Search session summaries semantically
  • Retrieve relevant session(s)
  • Optionally fetch full transcript
  • Inject into context
  • Context Budget

    Allocate limited context window:

    Total Context: 8000 tokens

    ├── System Prompt: 500 tokens

    ├── User Profile: 200 tokens

    ├── Recent Sessions: 1000 tokens (2-3 summaries)

    ├── Current Session: 5000 tokens

    └── Buffer: 1300 tokens

    Sliding Window Patterns

    Simple Sliding Window

    Keep last N messages:

  • Easy to implement
  • Predictable context size
  • Loses old context entirely
  • Good for short, simple conversations
  • Summarize and Slide

    Summarize older messages:

  • Keep last N messages verbatim
  • Summarize messages before that
  • Preserves key information
  • More complex implementation
  • Hierarchical Summarization

    Multiple levels of compression:

  • Recent: Full messages
  • Medium: Detailed summaries
  • Old: Brief summaries
  • Ancient: Topics only
  • Implementation Example

    Session End Handler

    async function onSessionEnd(sessionId):

    messages = await getSessionMessages(sessionId)

    summary = await llm.summarize(messages)

    embedding = await embed(summary.text)

    await storage.saveSession({

    id: sessionId,

    summary: summary.text,

    summary_embedding: embedding,

    topics: summary.topics,

    facts: summary.facts,

    ended_at: now()

    })

    // Also save individual facts to memory

    for fact in summary.facts:

    await memory.add({

    content: fact,

    source: sessionId,

    type: "conversation_fact"

    })

    Context Assembly

    async function assembleContext(userId, currentTopic):

    profile = await getProfile(userId)

    recentSessions = await getRecentSessions(userId, limit=3)

    if currentTopic:

    relevantSessions = await searchSessions(

    userId,

    query=currentTopic,

    limit=2

    )

    ongoingTasks = await getOngoingTasks(userId)

    return buildPrompt({

    profile,

    recentSessions,

    relevantSessions,

    ongoingTasks

    })

    Best Practices

  • Summarize incrementally, not just at session end
  • Store both summaries and raw transcripts
  • Use semantic search, not just recency
  • Extract structured facts, not just text
  • Track conversation outcomes for learning