RAG-Enhanced Agent

Agent architecture combining retrieval-augmented generation with persistent memory

Tags: intermediate, rag, retrieval, knowledge-base, hybrid

Overview

This architecture combines traditional RAG (Retrieval-Augmented Generation) with persistent agent memory. While RAG provides access to a static knowledge base, memory adds dynamic, evolving context from interactions. Together, they create agents that are both knowledgeable and personalized.

The Two Memory Systems

Knowledge Base (RAG)

Static or slowly-changing information:

  • Product documentation
  • Company policies
  • FAQs and guides
  • Domain knowledge
  • Reference materials
Agent Memory

Dynamic, interaction-derived information:

  • Learned user preferences
  • Conversation history
  • Task context
  • Relationship data
  • Evolving understanding
Architecture Components

    User Query
             │
             ▼
    ┌─────────────────┐
    │  Query Router   │ ── Determine what context is needed
    └────────┬────────┘
        ┌────┴────┐
        ▼         ▼
    ┌───────┐ ┌───────┐
    │  RAG  │ │Memory │
    │Search │ │Search │
    └───┬───┘ └───┬───┘
        │         │
        └────┬────┘
             ▼
    ┌─────────────────┐
    │Context Assembly │ ── Combine knowledge + memory
    └────────┬────────┘
             ▼
    ┌─────────────────┐
    │  LLM + Prompt   │ ── Generate response
    └────────┬────────┘
             ▼
    ┌─────────────────┐
    │  Memory Update  │ ── Store new learnings
    └─────────────────┘
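Wired together, the flow above amounts to a thin orchestration layer. The sketch below captures only the control flow; the component names (`search_knowledge`, `generate`, and so on) are illustrative placeholders, injected as callables so any real implementation can be swapped in:

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Pipeline:
    # Each box in the diagram becomes an injected callable; real systems
    # would plug in vector search, an LLM client, and a memory writer here.
    search_knowledge: Callable[[str], List[str]]    # RAG branch
    search_memory: Callable[[str, str], List[str]]  # memory branch
    generate: Callable[[str], str]                  # LLM + prompt
    store: Callable[[str, str], None]               # memory update

    def handle(self, query: str, user_id: str) -> str:
        knowledge = self.search_knowledge(query)
        memories = self.search_memory(query, user_id)
        # Context assembly: personal context first, then factual grounding
        context = "\n".join(memories + knowledge)
        answer = self.generate(f"{context}\n\nUser: {query}")
        self.store(user_id, f"Q: {query} A: {answer}")
        return answer
```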

Query Routing

Not every query needs both systems:

Knowledge-Heavy Queries

"What's your return policy?"

  • Primary: RAG search for policy docs
  • Secondary: Memory for user's past returns
Memory-Heavy Queries

"What did we discuss last time?"

  • Primary: Conversation memory
  • Secondary: discussed topics may link back to the knowledge base
Hybrid Queries

"Based on my preferences, what do you recommend?"

  • Memory: User preferences, past purchases
  • RAG: Product catalog, current offerings
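One way to implement this routing, sketched with keyword heuristics; production routers more often use an LLM or a trained classifier, and the cue lists below are invented purely for illustration:

```python
# Hypothetical cue lists — real deployments would learn or prompt for these.
MEMORY_CUES = {"last time", "we discussed", "my preferences", "you recommended"}
KNOWLEDGE_CUES = {"policy", "how do i", "documentation", "price", "return", "recommend"}

def route_query(query: str) -> set:
    """Return which context stores ('rag', 'memory') to search."""
    q = query.lower()
    targets = set()
    if any(cue in q for cue in MEMORY_CUES):
        targets.add("memory")
    if any(cue in q for cue in KNOWLEDGE_CUES):
        targets.add("rag")
    # No cue matched: querying both is the safe hybrid fallback
    return targets or {"rag", "memory"}
```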
Collection Strategy

Separate Collections

Keep RAG and memory distinct:

**Knowledge Collection**

  • Documents chunked and embedded
  • Metadata: source, category, last_updated
  • Refresh cycle tied to content updates
**Memory Collection**

  • User-specific memories
  • Metadata: user_id, type, timestamp, importance
  • Continuous updates during interactions
Query Both, Merge Results

    knowledge_results = qdrant.search(
        collection_name="knowledge",
        query_vector=query_embedding,
        limit=5,
    )

    memory_results = qdrant.search(
        collection_name="memories",
        query_vector=query_embedding,
        query_filter=models.Filter(must=[
            models.FieldCondition(key="user_id",
                                  match=models.MatchValue(value=current_user)),
        ]),
        limit=5,
    )

    context = merge_and_rank(knowledge_results, memory_results)
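`merge_and_rank` is left undefined above. A minimal sketch, assuming each hit exposes a `.score` similarity and that personal memories deserve a small boost when scores are close; the boost value and `top_k` default are arbitrary illustrations:

```python
def merge_and_rank(knowledge_hits, memory_hits, memory_boost=0.1, top_k=6):
    """Merge two result lists into one ranked context list.

    Each hit is assumed to carry a .score (e.g. cosine similarity).
    memory_boost nudges personal context above generic knowledge when
    scores are close — a tunable assumption, not a fixed rule.
    """
    scored = [("knowledge", h.score, h) for h in knowledge_hits]
    scored += [("memory", h.score + memory_boost, h) for h in memory_hits]
    scored.sort(key=lambda t: t[1], reverse=True)
    # Keep the source label alongside each hit for later attribution
    return [(source, hit) for source, _, hit in scored[:top_k]]
```

Returning `(source, hit)` pairs rather than bare hits keeps source attribution (see below) cheap.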

Context Assembly

Prioritization Logic

When the context window is limited, include context in priority order:

  • Most relevant memories (user-specific context)
  • Most relevant knowledge (factual grounding)
  • Recent conversation turns (immediate context)
  • Lower-relevance items as space allows
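A budgeted packer for this priority order might look like the following; the characters-per-token estimate is a crude heuristic rather than a real tokenizer, and priority 1 means highest:

```python
def assemble_context(items, budget_tokens=2000):
    """Pack (priority, text) items into a token budget, priority 1 first.

    Token cost is approximated as len(text) // 4 + 1 — a rough stand-in
    for a real tokenizer, good enough to illustrate the packing loop.
    """
    chosen, used = [], 0
    for _, text in sorted(items, key=lambda it: it[0]):
        cost = len(text) // 4 + 1
        if used + cost > budget_tokens:
            continue  # doesn't fit; a cheaper lower-priority item still might
        chosen.append(text)
        used += cost
    return "\n\n".join(chosen)
```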
Deduplication

Avoid redundancy:

  • Memory might contain learned facts also in knowledge base
  • Prefer authoritative knowledge base version
  • Use memory version when personalized
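A cheap way to apply these rules is near-duplicate filtering before assembly. The sketch below uses word-set (Jaccard) overlap with an invented 0.8 threshold; comparing embedding similarity is the more common production choice. Ordering the input with knowledge-base snippets first means the authoritative version is the one that survives:

```python
def dedupe(snippets):
    """Drop snippets whose word sets largely overlap an earlier one.

    Jaccard overlap on lowercased word sets is a cheap stand-in for
    embedding similarity; 0.8 is an illustrative threshold.
    """
    kept_sets, out = [], []
    for s in snippets:
        words = set(s.lower().split())
        if any(len(words & k) / max(len(words | k), 1) > 0.8 for k in kept_sets):
            continue  # near-duplicate of something already kept
        kept_sets.append(words)
        out.append(s)
    return out
```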
Source Attribution

Track where context came from:

  • Enable citations in responses
  • Debug retrieval quality
  • Audit information sources
Memory Update Patterns

During Conversation

Extract and store:

  • Newly expressed user preferences
  • Facts learned about the user
  • Demonstrated topic interests
  • Feedback on recommendations
Post-Conversation

Consolidation tasks:

  • Summarize conversation
  • Update user profile
  • Connect to existing memories
  • Prune redundant entries
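Pruning during consolidation can be as simple as a ranked cut. This sketch assumes each memory record carries the `importance` and `timestamp` metadata fields listed earlier; the cap of 100 memories per user is an arbitrary illustration:

```python
def consolidate(memories, max_per_user=100):
    """Keep the most important, most recent memories for a user.

    Each memory dict is assumed to have 'importance' and 'timestamp'
    keys, matching the memory-collection metadata described above.
    """
    ranked = sorted(memories,
                    key=lambda m: (m["importance"], m["timestamp"]),
                    reverse=True)
    return ranked[:max_per_user]
```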
Scaling Considerations

Knowledge Base

  • Chunking strategy affects retrieval quality
  • Regular re-indexing for freshness
  • Version control for rollbacks
  • Multi-collection for different domains
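As a baseline for the chunking strategy, fixed-size windows with overlap are the usual starting point; the size and overlap values below are illustrative defaults, not recommendations:

```python
def chunk(text, size=500, overlap=50):
    """Split text into fixed-size character chunks with overlap.

    Overlap keeps sentences that straddle a boundary retrievable from
    either side; character counts are a stand-in for token counts.
    """
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]
```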
Memory Store

  • Per-user isolation
  • Growth management over time
  • Archival strategy for old memories
  • Fast retrieval for active users
When to Use This Pattern

Good fit:

  • Customer support with documentation
  • Product assistants with catalogs
  • Enterprise agents with knowledge bases
  • Any domain needing both facts and personalization
Consider alternatives if:

  • No static knowledge needed (pure memory agent)
  • No personalization needed (pure RAG)
  • Real-time knowledge required (add live search)