Retrieval-Augmented Generation for Large Language Models: A Survey

Yunfan Gao, Yun Xiong, Xinyu Gao, Kangxiang Jia, Jinliu Pan, Yuxi Bi, Yi Dai, Jiawei Sun, Haofen Wang

arXiv · 2023

rag · survey · retrieval · knowledge-augmentation

TL;DR

Comprehensive survey of RAG techniques covering retrieval methods, generation approaches, and augmentation strategies for enhancing LLMs with external knowledge.

Overview

This survey provides a systematic review of Retrieval-Augmented Generation (RAG) for LLMs, covering the paradigm's evolution, core components, and advanced techniques. Essential reading for understanding how external knowledge enhances language models.

RAG Paradigm

Basic Pipeline

  • **Indexing**: Process and store documents in retrievable format
  • **Retrieval**: Find relevant documents for a query
  • **Generation**: Use retrieved context to generate response
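The three stages above can be sketched end to end. This is a minimal illustration with a toy bag-of-words retriever; all names are ours, and real systems use learned embeddings and an actual LLM call in place of `generate`.

```python
import math
import re
from collections import Counter

def embed(text: str) -> Counter:
    # Indexing: represent text as a bag-of-words vector.
    return Counter(re.findall(r"\w+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, index: list, k: int = 2) -> list:
    # Retrieval: rank stored documents by similarity to the query.
    q = embed(query)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [doc for doc, _ in ranked[:k]]

def generate(query: str, context: list) -> str:
    # Generation: assemble the prompt an LLM would receive.
    return "Context:\n" + "\n".join(context) + f"\n\nQuestion: {query}"

docs = ["Paris is the capital of France.",
        "The Eiffel Tower is in Paris.",
        "Berlin is the capital of Germany."]
index = [(d, embed(d)) for d in docs]
prompt = generate("What is the capital of France?",
                  retrieve("What is the capital of France?", index))
```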

Evolution

  • Naive RAG: Simple retrieve-then-read pipeline
  • Advanced RAG: Pre/post-retrieval optimization
  • Modular RAG: Flexible, composable components

Retrieval Enhancements

    Pre-Retrieval

  • Query rewriting and expansion
  • Query decomposition for complex questions
  • Hypothetical document generation (HyDE)
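HyDE's idea can be sketched as follows: embed an LLM-drafted hypothetical answer rather than the raw query, then retrieve real documents near it. The embedder and the `fake_llm` stub below are stand-ins of ours, not the paper's models.

```python
import math
import re
from collections import Counter

def embed(text):
    return Counter(re.findall(r"\w+", text.lower()))

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def fake_llm(prompt):
    # stand-in: a real system would call an LLM to draft a plausible answer
    return "Mitochondria produce ATP, the energy currency of the cell."

def hyde_retrieve(query, docs, k=1):
    hypothetical = fake_llm(f"Write a passage answering: {query}")
    h = embed(hypothetical)  # embed the hypothetical document, not the query
    return sorted(docs, key=lambda d: cosine(h, embed(d)), reverse=True)[:k]

docs = ["ATP synthesis occurs in mitochondria, powering the cell.",
        "The Great Wall of China is visible in satellite images."]
top = hyde_retrieve("What do mitochondria do?", docs)
```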

    Retrieval Methods

  • Sparse retrieval (BM25, TF-IDF)
  • Dense retrieval (embeddings, bi-encoders)
  • Hybrid approaches combining both
  • Learned retrievers fine-tuned for the task
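One common way to hybridize sparse and dense results is reciprocal rank fusion (RRF), which merges two rankings without calibrating their raw scores. The rankings below are hard-coded stand-ins for BM25 and bi-encoder output.

```python
def rrf(rankings, k=60):
    # RRF score: sum over rankings of 1 / (k + rank); k dampens
    # the dominance of any single list's top result.
    scores = {}
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

sparse_ranking = ["doc_a", "doc_b", "doc_c"]   # e.g. from BM25
dense_ranking  = ["doc_a", "doc_d", "doc_b"]   # e.g. from a bi-encoder
fused = rrf([sparse_ranking, dense_ranking])
```

Documents that appear high in both lists rise to the top, which is why hybrid fusion often beats either retriever alone.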

    Post-Retrieval

  • Re-ranking retrieved documents
  • Compression and summarization
  • Filtering irrelevant passages
  • Context window optimization
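These post-retrieval steps compose naturally: re-score candidates with a relevance function (a cross-encoder in practice; a word-overlap stub here), filter weak passages, and trim to a context budget. The threshold and budget values are illustrative.

```python
import re

def relevance(query, passage):
    # stub scorer: fraction of query words found in the passage
    q = set(re.findall(r"\w+", query.lower()))
    p = set(re.findall(r"\w+", passage.lower()))
    return len(q & p) / len(q) if q else 0.0

def rerank_and_trim(query, passages, min_score=0.3, budget_words=20):
    scored = sorted(((relevance(query, p), p) for p in passages), reverse=True)
    kept, used = [], 0
    for score, p in scored:
        words = len(p.split())
        # filter irrelevant passages, then respect the context budget
        if score >= min_score and used + words <= budget_words:
            kept.append(p)
            used += words
    return kept

passages = ["Photosynthesis converts light into chemical energy.",
            "Soccer is popular worldwide.",
            "Plants use photosynthesis to make glucose from light."]
context = rerank_and_trim("How does photosynthesis use light?", passages)
```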

Generation Strategies

    Context Integration

  • Prepend retrieved documents to prompt
  • Interleave retrieval with generation
  • Iterative retrieval-generation cycles
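The simplest integration, prepending, can number the passages so the model can cite them. The prompt template below is our own illustration, not a prescribed format.

```python
def build_prompt(question, passages):
    # number each passage so the model can attribute claims as [n]
    cited = "\n".join(f"[{i}] {p}" for i, p in enumerate(passages, start=1))
    return ("Answer using only the sources below and cite them as [n].\n\n"
            f"{cited}\n\nQuestion: {question}\nAnswer:")

prompt = build_prompt("Who designed the Eiffel Tower?",
                      ["The Eiffel Tower was designed by Gustave Eiffel's firm.",
                       "It was completed in 1889 for the World's Fair."])
```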

    Fusion Approaches

  • Late fusion: Retrieve then generate
  • Early fusion: Joint encoding
  • Intermediate fusion: Cross-attention

Advanced Techniques

    Iterative RAG

    Multiple retrieval-generation cycles:

  • Generate initial response
  • Identify knowledge gaps
  • Retrieve additional information
  • Refine response
  • Repeat until satisfied
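The loop above can be sketched as control flow. Here `find_gaps` and `search` are stand-ins for LLM and retriever calls, operating on a toy knowledge base so the cycle is runnable.

```python
KB = {"creator": "Python was created by Guido van Rossum.",
      "release": "Python was first released in 1991."}

def search(term):
    # stand-in retriever over a toy knowledge base
    return KB.get(term)

def find_gaps(known):
    # stand-in: a real system asks an LLM which facts the draft still lacks
    return [t for t in ("creator", "release") if t not in known]

def iterative_rag(max_rounds=3):
    known = {}
    for _ in range(max_rounds):
        gaps = find_gaps(known)       # identify knowledge gaps
        if not gaps:                  # stop when no gaps remain
            break
        for term in gaps:             # retrieve additional information
            known[term] = search(term)
    return " ".join(known.values())   # refined, grounded answer

answer = iterative_rag()
```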

    Self-RAG

    Model decides when and what to retrieve:

  • Learns to issue retrieval calls
  • Evaluates retrieval necessity
  • Critiques own outputs
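A Self-RAG-style control loop looks roughly like this: judge whether retrieval is needed before answering, then critique whether the output is supported by the evidence. Both judgments below are heuristic stubs of ours standing in for the learned reflection tokens the method trains.

```python
FACTS = {"mount everest": "Mount Everest is 8,849 m tall."}

def needs_retrieval(query):
    # stub: the trained model emits a retrieval token; we use a keyword test
    return any(topic in query.lower() for topic in FACTS)

def supported(answer, evidence):
    # stub critique: is the answer grounded in the retrieved evidence?
    return evidence is not None and answer in evidence

def self_rag(query):
    if needs_retrieval(query):
        evidence = next(v for k, v in FACTS.items() if k in query.lower())
        answer = "8,849 m"  # stub generation from the evidence
        return answer, supported(answer, evidence)
    return "No retrieval needed.", True

answer, is_supported = self_rag("How tall is Mount Everest?")
```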

    Knowledge Graphs + RAG

    Structured knowledge integration:

  • Entity-centric retrieval
  • Relationship traversal
  • Multi-hop reasoning
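Entity-centric retrieval with relationship traversal can be sketched over a tiny graph; chaining relations is exactly the multi-hop case. The entities and relation names are illustrative.

```python
GRAPH = {
    "Marie Curie": {"born_in": "Warsaw", "field": "Physics"},
    "Warsaw": {"capital_of": "Poland"},
}

def traverse(entity, relations):
    # follow a chain of relations: multi-hop reasoning over the graph
    for rel in relations:
        entity = GRAPH.get(entity, {}).get(rel)
        if entity is None:
            return None
    return entity

# "Which country was Marie Curie's birthplace the capital of?" → two hops
country = traverse("Marie Curie", ["born_in", "capital_of"])
```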

Evaluation

    Metrics

  • Answer accuracy
  • Retrieval precision/recall
  • Attribution quality
  • Hallucination rate
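Retrieval precision and recall at k are straightforward to compute from retrieved document ids and a gold relevance set; the ids below are made up for illustration.

```python
def precision_recall_at_k(retrieved, relevant, k):
    top_k = retrieved[:k]
    hits = len(set(top_k) & set(relevant))
    precision = hits / k                              # fraction of top-k that is relevant
    recall = hits / len(relevant) if relevant else 0.0  # fraction of relevant found
    return precision, recall

retrieved = ["d1", "d7", "d3", "d9"]
relevant = {"d3", "d7", "d8"}
p, r = precision_recall_at_k(retrieved, relevant, k=3)
```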

    Benchmarks

  • Natural Questions
  • TriviaQA
  • HotpotQA (multi-hop)
  • ASQA (ambiguous questions)

Relevance to Agent Memory

    RAG provides the foundation for agent memory systems:

  • Conversation history as retrievable knowledge
  • User preferences stored and retrieved
  • Dynamic knowledge that updates over time
  • Semantic search over agent experiences
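As a sketch of memory-as-RAG, conversation turns can be stored with a turn index and recalled by keyword overlap plus a recency bonus. The scoring weights and memory contents are illustrative.

```python
import re

def tokens(text):
    return set(re.findall(r"\w+", text.lower()))

def recall(memory, query, k=1, recency_weight=0.01):
    q = tokens(query)
    def score(item):
        t, text = item  # t: turn index (newer = larger)
        overlap = len(q & tokens(text)) / (len(q) or 1)
        return overlap + recency_weight * t  # blend relevance and recency
    return [text for _, text in sorted(memory, key=score, reverse=True)[:k]]

memory = [(0, "User's dietary preference: vegetarian."),
          (1, "User asked about flights in spring."),
          (2, "User booked a hotel in Kyoto.")]
hits = recall(memory, "Where should we eat? remember dietary preferences", k=1)
```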

Key Takeaways

  • RAG significantly reduces hallucination
  • Quality of retrieval directly impacts generation
  • Hybrid approaches often work best
  • Task-specific tuning improves results

Citation

    @article{gao2023retrieval,
      title={Retrieval-Augmented Generation for Large Language Models: A Survey},
      author={Gao, Yunfan and others},
      journal={arXiv preprint arXiv:2312.10997},
      year={2023}
    }