Overview
This architecture combines traditional RAG (Retrieval-Augmented Generation) with persistent agent memory. While RAG provides access to a static knowledge base, memory adds dynamic, evolving context from interactions. Together, they create agents that are both knowledgeable and personalized.
The Two Memory Systems
Knowledge Base (RAG)
Static or slowly-changing information: product documentation, FAQs, and policies that are shared by all users and updated on their own release cycle.
Agent Memory
Dynamic, interaction-derived information: user preferences, facts learned mid-conversation, and summaries of past sessions, scoped to the individual user.
Architecture Components
```
    User Query
         │
         ▼
┌─────────────────┐
│  Query Router   │ ── Determine what context is needed
└────────┬────────┘
         │
    ┌────┴────┐
    ▼         ▼
┌───────┐ ┌───────┐
│  RAG  │ │Memory │
│Search │ │Search │
└───┬───┘ └───┬───┘
    │         │
    └────┬────┘
         ▼
┌─────────────────┐
│Context Assembly │ ── Combine knowledge + memory
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│  LLM + Prompt   │ ── Generate response
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│ Memory Update   │ ── Store new learnings
└─────────────────┘
```
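Read top to bottom, the diagram is a single request path. The skeleton below is one way the stages might compose; every helper is an illustrative stub, fleshed out in the sections that follow, so the sketch runs as-is:

```python
# Skeleton of the flow above. Every helper is a stub standing in for the
# components described in the sections that follow; names are illustrative.

def route_query(query: str) -> set[str]:
    """Query Routing: decide which stores this query needs."""
    return {"knowledge", "memory"}

def search_knowledge(query: str) -> list[str]:
    """RAG search against the shared knowledge base."""
    return []

def search_memory(query: str, user_id: str) -> list[str]:
    """Vector search over this user's stored memories."""
    return []

def assemble_context(knowledge: list[str], memories: list[str]) -> str:
    """Context Assembly: combine knowledge + memory into one prompt block."""
    return "\n".join(knowledge + memories)

def generate(query: str, context: str) -> str:
    """LLM call (stubbed)."""
    return f"(answer to {query!r} using {len(context)} chars of context)"

def update_memory(user_id: str, query: str, response: str) -> None:
    """Memory Update: store new learnings from this exchange."""

def answer(query: str, user_id: str) -> str:
    route = route_query(query)
    knowledge = search_knowledge(query) if "knowledge" in route else []
    memories = search_memory(query, user_id) if "memory" in route else []
    context = assemble_context(knowledge, memories)
    response = generate(query, context)
    update_memory(user_id, query, response)
    return response
```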
Query Routing
Not every query needs both systems; a router can decide which stores to hit (a minimal sketch follows these examples):
Knowledge-Heavy Queries
"What's your return policy?" can be answered entirely from the knowledge base; memory adds nothing.
Memory-Heavy Queries
"What did we discuss last time?" can be answered only from memory; the knowledge base is irrelevant.
Hybrid Queries
"Based on my preferences, what do you recommend?" needs memory for the preferences and the knowledge base for what can be recommended.
Collection Strategy
Separate Collections
Keep RAG and memory in distinct collections so each can be filtered, updated, and scaled on its own (a Qdrant setup sketch follows these descriptions):
**Knowledge Collection**: shared by all users; updated only when the source content changes.
**Memory Collection**: scoped per user; every point carries a `user_id` payload so searches can be filtered to the current user.
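With Qdrant, the split might look like the sketch below. The collection names match the query snippet in the next subsection, while the vector size and the payload index are assumptions:

```python
from qdrant_client import QdrantClient, models

qdrant = QdrantClient(":memory:")  # in-process instance, for the sketch
EMBEDDING_DIM = 384                # assumes a 384-dimensional embedding model

# Shared knowledge base: one collection for all users.
qdrant.create_collection(
    collection_name="knowledge",
    vectors_config=models.VectorParams(size=EMBEDDING_DIM, distance=models.Distance.COSINE),
)

# Per-user memories: same shape, but every point carries a user_id payload.
qdrant.create_collection(
    collection_name="memories",
    vectors_config=models.VectorParams(size=EMBEDDING_DIM, distance=models.Distance.COSINE),
)
# Index user_id so per-user filtering stays fast as memories accumulate.
qdrant.create_payload_index(
    collection_name="memories",
    field_name="user_id",
    field_schema=models.PayloadSchemaType.KEYWORD,
)
```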
Query Both, Merge Results
```python
from qdrant_client import models

knowledge_results = qdrant.search(
    collection_name="knowledge",
    query_vector=query_embedding,
    limit=5,
)
memory_results = qdrant.search(
    collection_name="memories",
    query_vector=query_embedding,
    query_filter=models.Filter(must=[
        models.FieldCondition(key="user_id", match=models.MatchValue(value=current_user))
    ]),
    limit=5,
)
# merge_and_rank is sketched under Context Assembly below
context = merge_and_rank(knowledge_results, memory_results)
```
Context Assembly
Prioritization Logic
When the context window is limited, rank the combined results by relevance and include the highest-scoring items until the budget is spent.
Deduplication
Avoid redundancy: a memory that restates a knowledge-base chunk, or two near-identical memories, wastes tokens, so drop duplicates before assembling the prompt.
Source Attribution
Track where each piece of context came from: label knowledge-base chunks and memories in the prompt so the model (and your logs) can tell official information from learned context. A `merge_and_rank` sketch covering all three steps follows.
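Putting the three steps together, the `merge_and_rank` helper from the earlier snippet might look like this minimal sketch. The character budget and the `text` payload field are assumptions:

```python
# Sketch of merge_and_rank over Qdrant ScoredPoint results: prioritize by
# score, deduplicate, and tag each line with its source. The character
# budget and the "text" payload field are assumptions.

def merge_and_rank(knowledge_results, memory_results, max_chars: int = 4000) -> str:
    candidates = [("knowledge", hit) for hit in knowledge_results]
    candidates += [("memory", hit) for hit in memory_results]
    # Prioritization: highest similarity first, whichever system it came from.
    candidates.sort(key=lambda pair: pair[1].score, reverse=True)

    lines, seen, used = [], set(), 0
    for source, hit in candidates:
        text = hit.payload["text"]
        key = text.strip().lower()
        if key in seen:                   # Deduplication: skip repeated content
            continue
        if used + len(text) > max_chars:  # Prioritization: respect the budget
            break
        seen.add(key)
        used += len(text)
        lines.append(f"[{source}] {text}")  # Source attribution
    return "\n".join(lines)
```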
Memory Update Patterns
During Conversation
Extract and store new information as it surfaces: stated preferences, corrections, and decisions can be written to the memory collection mid-conversation (see the write-path sketch after this section).
Post-Conversation
Consolidation tasks: summarize the finished session, merge it with overlapping older memories, and prune anything superseded.
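As a minimal sketch of the during-conversation write path, reusing the `qdrant` client from the collection setup above. The keyword-based extraction and the dummy `embed` function are illustrative stand-ins; production systems usually extract memories with an LLM call and a real embedding model:

```python
import uuid
from qdrant_client import models

# Dummy embedder so the sketch runs end to end; swap in a real model.
def embed(text: str) -> list[float]:
    return [0.0] * 384

def store_memory(qdrant, user_id: str, text: str, embed) -> None:
    """Write one extracted memory into the per-user memory collection."""
    qdrant.upsert(
        collection_name="memories",
        points=[models.PointStruct(
            id=str(uuid.uuid4()),
            vector=embed(text),
            payload={"user_id": user_id, "text": text},
        )],
    )

# During the conversation: persist anything that looks like a stated
# preference. (A real extractor would be an LLM, not a keyword check.)
for turn in ["I prefer aisle seats", "What's your return policy?"]:
    if "i prefer" in turn.lower():
        store_memory(qdrant, user_id="u42", text=turn, embed=embed)
```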
Scaling Considerations
Knowledge Base
Grows with the volume of source content, not the user count, so it can be re-embedded and re-indexed on its own schedule.
Memory Store
Grows with every user and every interaction; per-user filtering (backed by the `user_id` payload index) keeps searches scoped, and periodic consolidation keeps old memories from bloating the collection.
When to Use This Pattern
Good fit: agents that serve returning users across sessions and need both a shared knowledge base and per-user context, such as customer-support or personal-assistant agents.
Consider alternatives if: interactions are one-shot or anonymous (plain RAG is simpler), or there is no shared knowledge base to retrieve from (memory alone may be enough).