Retrieval-Augmented Generation for Large Language Models: A Survey

Yunfan Gao, Yun Xiong, Xinyu Gao, Kangxiang Jia, Jinliu Pan, Yuxi Bi, Yi Dai, Jiawei Sun, Haofen Wang

arXiv · 2023

rag · survey · retrieval · knowledge-augmentation

TL;DR

Comprehensive survey of RAG techniques covering retrieval methods, generation approaches, and augmentation strategies for enhancing LLMs with external knowledge.

Overview

This survey provides a systematic review of Retrieval-Augmented Generation (RAG) for LLMs, covering the paradigm's evolution, core components, and advanced techniques. Essential reading for understanding how external knowledge enhances language models.

RAG Paradigm

Basic Pipeline

  • **Indexing**: Process and store documents in retrievable format
  • **Retrieval**: Find relevant documents for a query
  • **Generation**: Use retrieved context to generate response
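The three stages above can be sketched end to end. This is a minimal illustration with a toy bag-of-words retriever; all names are ours, and real systems use learned embeddings and an actual LLM call in place of `generate`.

```python
import math
import re
from collections import Counter

def embed(text: str) -> Counter:
    # Indexing: represent text as a bag-of-words vector.
    return Counter(re.findall(r"\w+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, index: list, k: int = 2) -> list:
    # Retrieval: rank stored documents by similarity to the query.
    q = embed(query)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [doc for doc, _ in ranked[:k]]

def generate(query: str, context: list) -> str:
    # Generation: assemble the prompt an LLM would receive.
    return "Context:\n" + "\n".join(context) + f"\n\nQuestion: {query}"

docs = ["Paris is the capital of France.",
        "The Eiffel Tower is in Paris.",
        "Berlin is the capital of Germany."]
index = [(d, embed(d)) for d in docs]
prompt = generate("What is the capital of France?",
                  retrieve("What is the capital of France?", index))
```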

Evolution

  • Naive RAG: Simple retrieve-then-read pipeline
  • Advanced RAG: Pre/post-retrieval optimization
  • Modular RAG: Flexible, composable components

Retrieval Enhancements

    Pre-Retrieval

  • Query rewriting and expansion
  • Query decomposition for complex questions
  • Hypothetical document generation (HyDE)
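HyDE's idea can be sketched as follows: embed an LLM-drafted hypothetical answer rather than the raw query, then retrieve real documents near it. The embedder and the `fake_llm` stub below are stand-ins of ours, not the paper's models.

```python
import math
import re
from collections import Counter

def embed(text):
    return Counter(re.findall(r"\w+", text.lower()))

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def fake_llm(prompt):
    # stand-in: a real system would call an LLM to draft a plausible answer
    return "Mitochondria produce ATP, the energy currency of the cell."

def hyde_retrieve(query, docs, k=1):
    hypothetical = fake_llm(f"Write a passage answering: {query}")
    h = embed(hypothetical)  # embed the hypothetical document, not the query
    return sorted(docs, key=lambda d: cosine(h, embed(d)), reverse=True)[:k]

docs = ["ATP synthesis occurs in mitochondria, powering the cell.",
        "The Great Wall of China is visible in satellite images."]
top = hyde_retrieve("What do mitochondria do?", docs)
```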

    Retrieval Methods

  • Sparse retrieval (BM25, TF-IDF)
  • Dense retrieval (embeddings, bi-encoders)
  • Hybrid approaches combining both
  • Learned retrievers fine-tuned for the task
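One common way to hybridize sparse and dense results is reciprocal rank fusion (RRF), which merges two rankings without calibrating their raw scores. The rankings below are hard-coded stand-ins for BM25 and bi-encoder output.

```python
def rrf(rankings, k=60):
    # RRF score: sum over rankings of 1 / (k + rank); k dampens
    # the dominance of any single list's top result.
    scores = {}
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

sparse_ranking = ["doc_a", "doc_b", "doc_c"]   # e.g. from BM25
dense_ranking  = ["doc_a", "doc_d", "doc_b"]   # e.g. from a bi-encoder
fused = rrf([sparse_ranking, dense_ranking])
```

Documents that appear high in both lists rise to the top, which is why hybrid fusion often beats either retriever alone.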

    Post-Retrieval

  • Re-ranking retrieved documents
  • Compression and summarization
  • Filtering irrelevant passages
  • Context window optimization
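These post-retrieval steps compose naturally: re-score candidates with a relevance function (a cross-encoder in practice; a word-overlap stub here), filter weak passages, and trim to a context budget. The threshold and budget values are illustrative.

```python
import re

def relevance(query, passage):
    # stub scorer: fraction of query words found in the passage
    q = set(re.findall(r"\w+", query.lower()))
    p = set(re.findall(r"\w+", passage.lower()))
    return len(q & p) / len(q) if q else 0.0

def rerank_and_trim(query, passages, min_score=0.3, budget_words=20):
    scored = sorted(((relevance(query, p), p) for p in passages), reverse=True)
    kept, used = [], 0
    for score, p in scored:
        words = len(p.split())
        # filter irrelevant passages, then respect the context budget
        if score >= min_score and used + words <= budget_words:
            kept.append(p)
            used += words
    return kept

passages = ["Photosynthesis converts light into chemical energy.",
            "Soccer is popular worldwide.",
            "Plants use photosynthesis to make glucose from light."]
context = rerank_and_trim("How does photosynthesis use light?", passages)
```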

Generation Strategies

    Context Integration

  • Prepend retrieved documents to prompt
  • Interleave retrieval with generation
  • Iterative retrieval-generation cycles
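The simplest integration, prepending, can number the passages so the model can cite them. The prompt template below is our own illustration, not a prescribed format.

```python
def build_prompt(question, passages):
    # number each passage so the model can attribute claims as [n]
    cited = "\n".join(f"[{i}] {p}" for i, p in enumerate(passages, start=1))
    return ("Answer using only the sources below and cite them as [n].\n\n"
            f"{cited}\n\nQuestion: {question}\nAnswer:")

prompt = build_prompt("Who designed the Eiffel Tower?",
                      ["The Eiffel Tower was designed by Gustave Eiffel's firm.",
                       "It was completed in 1889 for the World's Fair."])
```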

    Fusion Approaches

  • Late fusion: Retrieve then generate
  • Early fusion: Joint encoding
  • Intermediate fusion: Cross-attention

Advanced Techniques

    Iterative RAG

    Multiple retrieval-generation cycles:

  • Generate initial response
  • Identify knowledge gaps
  • Retrieve additional information
  • Refine response
  • Repeat until satisfied
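The loop above can be sketched as control flow. Here `find_gaps` and `search` are stand-ins for LLM and retriever calls, operating on a toy knowledge base so the cycle is runnable.

```python
KB = {"creator": "Python was created by Guido van Rossum.",
      "release": "Python was first released in 1991."}

def search(term):
    # stand-in retriever over a toy knowledge base
    return KB.get(term)

def find_gaps(known):
    # stand-in: a real system asks an LLM which facts the draft still lacks
    return [t for t in ("creator", "release") if t not in known]

def iterative_rag(max_rounds=3):
    known = {}
    for _ in range(max_rounds):
        gaps = find_gaps(known)       # identify knowledge gaps
        if not gaps:                  # stop when no gaps remain
            break
        for term in gaps:             # retrieve additional information
            known[term] = search(term)
    return " ".join(known.values())   # refined, grounded answer

answer = iterative_rag()
```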

    Self-RAG

    Model decides when and what to retrieve:

  • Learns to issue retrieval calls
  • Evaluates retrieval necessity
  • Critiques own outputs
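A Self-RAG-style control loop looks roughly like this: judge whether retrieval is needed before answering, then critique whether the output is supported by the evidence. Both judgments below are heuristic stubs of ours standing in for the learned reflection tokens the method trains.

```python
FACTS = {"mount everest": "Mount Everest is 8,849 m tall."}

def needs_retrieval(query):
    # stub: the trained model emits a retrieval token; we use a keyword test
    return any(topic in query.lower() for topic in FACTS)

def supported(answer, evidence):
    # stub critique: is the answer grounded in the retrieved evidence?
    return evidence is not None and answer in evidence

def self_rag(query):
    if needs_retrieval(query):
        evidence = next(v for k, v in FACTS.items() if k in query.lower())
        answer = "8,849 m"  # stub generation from the evidence
        return answer, supported(answer, evidence)
    return "No retrieval needed.", True

answer, is_supported = self_rag("How tall is Mount Everest?")
```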

    Knowledge Graphs + RAG

    Structured knowledge integration:

  • Entity-centric retrieval
  • Relationship traversal
  • Multi-hop reasoning
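Entity-centric retrieval with relationship traversal can be sketched over a tiny graph; chaining relations is exactly the multi-hop case. The entities and relation names are illustrative.

```python
GRAPH = {
    "Marie Curie": {"born_in": "Warsaw", "field": "Physics"},
    "Warsaw": {"capital_of": "Poland"},
}

def traverse(entity, relations):
    # follow a chain of relations: multi-hop reasoning over the graph
    for rel in relations:
        entity = GRAPH.get(entity, {}).get(rel)
        if entity is None:
            return None
    return entity

# "Which country was Marie Curie's birthplace the capital of?" → two hops
country = traverse("Marie Curie", ["born_in", "capital_of"])
```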

Evaluation

    Metrics

  • Answer accuracy
  • Retrieval precision/recall
  • Attribution quality
  • Hallucination rate
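Retrieval precision and recall at k are straightforward to compute from retrieved document ids and a gold relevance set; the ids below are made up for illustration.

```python
def precision_recall_at_k(retrieved, relevant, k):
    top_k = retrieved[:k]
    hits = len(set(top_k) & set(relevant))
    precision = hits / k                              # fraction of top-k that is relevant
    recall = hits / len(relevant) if relevant else 0.0  # fraction of relevant found
    return precision, recall

retrieved = ["d1", "d7", "d3", "d9"]
relevant = {"d3", "d7", "d8"}
p, r = precision_recall_at_k(retrieved, relevant, k=3)
```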

    Benchmarks

  • Natural Questions
  • TriviaQA
  • HotpotQA (multi-hop)
  • ASQA (ambiguous questions)

Relevance to Agent Memory

    RAG provides the foundation for agent memory systems:

  • Conversation history as retrievable knowledge
  • User preferences stored and retrieved
  • Dynamic knowledge that updates over time
  • Semantic search over agent experiences
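As a sketch of memory-as-RAG, conversation turns can be stored with a turn index and recalled by keyword overlap plus a recency bonus. The scoring weights and memory contents are illustrative.

```python
import re

def tokens(text):
    return set(re.findall(r"\w+", text.lower()))

def recall(memory, query, k=1, recency_weight=0.01):
    q = tokens(query)
    def score(item):
        t, text = item  # t: turn index (newer = larger)
        overlap = len(q & tokens(text)) / (len(q) or 1)
        return overlap + recency_weight * t  # blend relevance and recency
    return [text for _, text in sorted(memory, key=score, reverse=True)[:k]]

memory = [(0, "User's dietary preference: vegetarian."),
          (1, "User asked about flights in spring."),
          (2, "User booked a hotel in Kyoto.")]
hits = recall(memory, "Where should we eat? remember dietary preferences", k=1)
```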

Key Takeaways

  • RAG significantly reduces hallucination
  • Quality of retrieval directly impacts generation
  • Hybrid approaches often work best
  • Task-specific tuning improves results

Citation

    @article{gao2023retrieval,
      title={Retrieval-Augmented Generation for Large Language Models: A Survey},
      author={Gao, Yunfan and others},
      journal={arXiv preprint arXiv:2312.10997},
      year={2023}
    }