RAG (Retrieval-Augmented Generation)

intermediate
Techniques
Last updated: 2025-01-15
Also known as: retrieval augmented generation, RAG pattern

What is RAG?


Retrieval-Augmented Generation (RAG) is a technique that improves large language model (LLM) responses by retrieving relevant information from external sources before generating an answer. Instead of relying solely on what the model learned during training, RAG grounds each response in the retrieved context.


How RAG Works


The RAG pipeline:


1. **Query**: User asks a question

2. **Retrieve**: Find relevant documents/memories

3. **Augment**: Add retrieved context to the prompt

4. **Generate**: LLM produces grounded response
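
The four steps above can be sketched end to end. This is a minimal illustration under stated assumptions, not a production design: the `DOCS` corpus, the word-overlap scoring in `retrieve`, and the `generate` stub are all hypothetical stand-ins. A real system would typically use an embedding model or BM25 for retrieval and an actual LLM call for generation.

```python
from collections import Counter

# Hypothetical three-document corpus used only for this example.
DOCS = [
    "RAG retrieves external documents before the model generates an answer.",
    "Fine-tuning updates model weights using additional training data.",
    "BM25 is a sparse retrieval method based on keyword statistics.",
]


def tokenize(text: str) -> list[str]:
    """Lowercase, split on whitespace, and strip basic punctuation."""
    return [w.strip(".,?!").lower() for w in text.split()]


def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    """Step 2: rank documents by word overlap with the query (toy scoring)."""
    q = Counter(tokenize(query))
    scored = [(sum((q & Counter(tokenize(d))).values()), d) for d in docs]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    # Drop documents that share no words with the query.
    return [d for score, d in scored[:k] if score > 0]


def augment(query: str, context: list[str]) -> str:
    """Step 3: place the retrieved passages ahead of the question."""
    joined = "\n".join(f"- {c}" for c in context)
    return f"Answer using only this context:\n{joined}\n\nQuestion: {query}"


def generate(prompt: str) -> str:
    """Step 4: placeholder for a real LLM call (assumption, not a real API)."""
    return f"[LLM response to a {len(prompt)}-character prompt]"


def rag_answer(query: str) -> str:
    """Step 1 is the incoming query; steps 2-4 run in order."""
    context = retrieve(query, DOCS)
    return generate(augment(query, context))


if __name__ == "__main__":
    print(rag_answer("How does BM25 retrieval work?"))
```

Keeping each step as its own function makes it possible to swap the toy retriever or the stub generator for real components without touching the rest of the pipeline.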


RAG vs Fine-tuning


| Aspect | RAG | Fine-tuning |
|--------|-----|-------------|
| Knowledge updates | Real-time | Requires retraining |
| Cost | Lower (no training) | Higher (compute) |
| Accuracy | Good with good retrieval | Can be very high |
| Customization | Flexible | Fixed after training |


RAG for Agent Memory


RAG enables memory-augmented agents by:


  • Retrieving relevant past conversations
  • Grounding responses in user history
  • Accessing up-to-date information
  • Reducing hallucinations
  • Enabling personalization
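
One way to picture the memory use case above is a per-user store of past turns that is queried before each response. The sketch below is illustrative only: the `AgentMemory` class, its method names, and the Jaccard word-overlap scoring are assumptions made for this example, and a real agent would more likely use embeddings and a persistent vector store.

```python
from collections import defaultdict


class AgentMemory:
    """Per-user store of past conversation turns (in-memory, illustrative)."""

    def __init__(self) -> None:
        self._turns: dict[str, list[str]] = defaultdict(list)

    def remember(self, user_id: str, text: str) -> None:
        """Save one conversation turn under the given user."""
        self._turns[user_id].append(text)

    def recall(self, user_id: str, query: str, k: int = 2) -> list[str]:
        """Return up to k of this user's turns that share words with the query."""
        q = set(query.lower().split())
        scored = []
        for turn in self._turns.get(user_id, []):
            t = set(turn.lower().split())
            union = q | t
            # Jaccard similarity: shared words divided by all distinct words.
            score = len(q & t) / len(union) if union else 0.0
            if score > 0:
                scored.append((score, turn))
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [turn for _, turn in scored[:k]]


if __name__ == "__main__":
    memory = AgentMemory()
    memory.remember("u1", "I am vegetarian and love spicy food")
    memory.remember("u1", "My sister lives in Lisbon")
    memory.remember("u2", "I love spicy food too")

    question = "recommend spicy food for dinner"
    history = memory.recall("u1", question, k=1)
    prompt = "Known about this user:\n" + "\n".join(history) + f"\n\nUser: {question}"
    print(prompt)
```

Scoping recall to a single `user_id` is what ties the retrieved context to that user's history, so another user's turns never leak into the prompt.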

RAG Components


Key components of a RAG system:


  • **Embedding Model**: Converts text to vectors
  • **Vector Store**: Indexes and searches embeddings
  • **Retriever**: Finds relevant documents
  • **Reranker**: Improves retrieval quality
  • **Generator**: Produces final response
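
To make the first two components in the list above concrete, here is a toy embedding model and an in-memory vector store. Both are assumptions for illustration: `embed` uses a hashed bag-of-words rather than a learned model, and `VectorStore` does a brute-force scan where a real store would use an approximate nearest-neighbor index. The retriever, reranker, and generator are not shown.

```python
import math
import zlib

DIM = 64  # Arbitrary vector size chosen for this example.


def embed(text: str) -> list[float]:
    """Toy embedding model: hashed bag-of-words, L2-normalized."""
    vec = [0.0] * DIM
    for token in text.lower().split():
        # crc32 is deterministic across runs, unlike Python's built-in hash().
        vec[zlib.crc32(token.encode("utf-8")) % DIM] += 1.0
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else vec


class VectorStore:
    """Stores (vector, text) pairs and searches them by cosine similarity."""

    def __init__(self) -> None:
        self._items: list[tuple[list[float], str]] = []

    def add(self, text: str) -> None:
        self._items.append((embed(text), text))

    def search(self, query: str, k: int = 3) -> list[tuple[float, str]]:
        """Return the k highest-scoring (score, text) pairs."""
        q = embed(query)
        # Vectors are unit length, so the dot product equals cosine similarity.
        scored = [(sum(a * b for a, b in zip(q, v)), text) for v, text in self._items]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return scored[:k]


if __name__ == "__main__":
    store = VectorStore()
    store.add("vector stores index embeddings")
    store.add("llamas eat grass")
    for score, text in store.search("how do vector stores index embeddings", k=1):
        print(f"{score:.2f}  {text}")
```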

Retrieval Strategies


Common retrieval approaches:


  • **Dense Retrieval**: Embedding-based similarity
  • **Sparse Retrieval**: BM25, keyword matching
  • **Hybrid**: Combining dense and sparse
  • **Multi-query**: Generate multiple search queries
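
As one example of the hybrid approach listed above, the rankings from a dense and a sparse retriever can be merged with reciprocal rank fusion (RRF). RRF is only one common fusion method and is an assumption here, not something this entry prescribes; the document IDs and the constant `k=60` (a frequently used default) are likewise illustrative.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge several ranked lists of document IDs into one ranking.

    Each document scores 1 / (k + rank) for every list it appears in,
    with rank starting at 1, and the scores are summed across lists.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=lambda doc_id: scores[doc_id], reverse=True)


if __name__ == "__main__":
    dense_ranking = ["doc_a", "doc_b", "doc_c"]   # hypothetical embedding-based results
    sparse_ranking = ["doc_b", "doc_d", "doc_a"]  # hypothetical BM25 results
    print(reciprocal_rank_fusion([dense_ranking, sparse_ranking]))
    # → ['doc_b', 'doc_a', 'doc_d', 'doc_c']
```

Because RRF works on ranks rather than raw scores, it avoids having to normalize dense similarity values against BM25 scores, which live on different scales. The same function also covers the multi-query case: run retrieval once per generated query and fuse the resulting lists.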

Advanced RAG Patterns


Sophisticated RAG techniques:


  • **Self-RAG**: Model decides when to retrieve
  • **Corrective RAG**: Validates and corrects retrieval
  • **Iterative RAG**: Multiple retrieval rounds
  • **Agentic RAG**: Agents controlling retrieval
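
Of the patterns above, iterative RAG is the simplest to sketch: retrieve, check whether the gathered context is enough, and if not, reformulate the query and retrieve again. The control loop below is a rough illustration under stated assumptions. The `is_sufficient` and `refine` callbacks would usually be LLM calls in practice; here they are hypothetical toy functions so the example runs on its own.

```python
from typing import Callable


def iterative_retrieve(
    query: str,
    retrieve: Callable[[str], list[str]],
    is_sufficient: Callable[[str, list[str]], bool],
    refine: Callable[[str, list[str]], str],
    max_rounds: int = 3,
) -> list[str]:
    """Run up to max_rounds retrieval rounds, accumulating unique passages."""
    context: list[str] = []
    current_query = query
    for _ in range(max_rounds):
        for passage in retrieve(current_query):
            if passage not in context:
                context.append(passage)
        if is_sufficient(query, context):
            break
        # Not enough yet: reformulate the query for the next round.
        current_query = refine(query, context)
    return context


# Toy stand-ins (assumptions for the example, not real components).
CORPUS = {
    "rag": "RAG adds retrieved context to the prompt.",
    "reranker": "A reranker reorders retrieved passages by relevance.",
}


def toy_retrieve(q: str) -> list[str]:
    return [text for key, text in CORPUS.items() if key in q.lower()]


def toy_is_sufficient(q: str, ctx: list[str]) -> bool:
    return len(ctx) >= 2


def toy_refine(q: str, ctx: list[str]) -> str:
    return q + " reranker"


if __name__ == "__main__":
    for passage in iterative_retrieve("What is RAG?", toy_retrieve, toy_is_sufficient, toy_refine):
        print(passage)
```

The `max_rounds` cap matters because each extra round adds retrieval latency and, when the callbacks are LLM calls, extra cost.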

Challenges


Common RAG challenges:


  • Retrieval quality ("garbage in, garbage out")
  • Context window limits
  • Latency from retrieval step
  • Chunking strategies
  • Handling no relevant results
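
Two of the challenges above, chunking and handling no relevant results, lend themselves to a short sketch. The code below is one possible approach, not a recommendation: fixed-size word chunks with overlap are only one chunking strategy, and the function names, the `min_score` threshold of 0.3, and the fallback wording are all assumptions chosen for illustration.

```python
def chunk_words(text: str, size: int = 50, overlap: int = 10) -> list[str]:
    """Split text into chunks of `size` words, each sharing `overlap` words
    with the previous chunk so sentences cut at a boundary keep some context."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # The last chunk already reaches the end of the text.
    return chunks


def select_context(scored: list[tuple[float, str]], min_score: float = 0.3) -> list[str]:
    """Keep passages scoring at or above the threshold; may return an empty list."""
    return [text for score, text in scored if score >= min_score]


def build_prompt(query: str, scored: list[tuple[float, str]]) -> str:
    """Build a grounded prompt, or an explicit fallback when nothing is relevant."""
    context = select_context(scored)
    if not context:
        return (
            "No supporting documents were found. "
            "Say so rather than guessing.\n\n"
            f"Question: {query}"
        )
    joined = "\n".join(f"- {c}" for c in context)
    return f"Context:\n{joined}\n\nQuestion: {query}"


if __name__ == "__main__":
    text = " ".join(f"w{i}" for i in range(10))
    print(chunk_words(text, size=4, overlap=1))
    print(build_prompt("What is RAG?", [(0.1, "unrelated passage")]))
```

Filtering by score before building the prompt also helps with context window limits, since low-scoring passages never consume tokens.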
