What is RAG?
Retrieval-Augmented Generation (RAG) is a technique that improves large language model (LLM) output by retrieving relevant information from external sources and supplying it to the model before it generates an answer. Instead of relying solely on knowledge learned during training, RAG grounds responses in the retrieved context.
How RAG Works
The RAG pipeline has four steps:
1. **Query**: The user asks a question.
2. **Retrieve**: The system finds relevant documents or memories.
3. **Augment**: The retrieved context is added to the prompt.
4. **Generate**: The LLM produces a response grounded in that context.
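The four steps above can be sketched end to end in plain Python. This is a minimal illustration under stated assumptions, not a reference implementation: the corpus is a hypothetical three-sentence list, retrieval uses bag-of-words cosine similarity as a simple stand-in for the learned embeddings and vector index a production system would typically use, and `generate` is a hypothetical stub in place of a real LLM call. All function names are illustrative and do not belong to any library.

```python
import math
import re
from collections import Counter

# Hypothetical toy corpus standing in for an external knowledge source.
DOCUMENTS = [
    "The Eiffel Tower is located in Paris and was completed in 1889.",
    "Python is a programming language created by Guido van Rossum.",
    "The Great Wall of China stretches for thousands of kilometres.",
]


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query: str, documents: list[str], k: int = 1) -> list[str]:
    """Retrieve: return the k documents most similar to the query."""
    q_vec = Counter(tokenize(query))
    scored = [(cosine_similarity(q_vec, Counter(tokenize(d))), d) for d in documents]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored[:k]]


def augment(query: str, context: list[str]) -> str:
    """Augment: place the retrieved context ahead of the user's question."""
    context_block = "\n".join(f"- {c}" for c in context)
    return (
        "Answer the question using only the context below.\n"
        f"Context:\n{context_block}\n"
        f"Question: {query}"
    )


def generate(prompt: str) -> str:
    """Generate: hypothetical stub for an LLM call.

    A real system would send `prompt` to a language model here; this stub
    echoes the prompt so the pipeline runs without any external service.
    """
    return f"[LLM would answer based on]\n{prompt}"


def rag_answer(query: str, documents: list[str], k: int = 1) -> str:
    """Run the full pipeline: Query -> Retrieve -> Augment -> Generate."""
    context = retrieve(query, documents, k)
    prompt = augment(query, context)
    return generate(prompt)


if __name__ == "__main__":
    print(rag_answer("Where is the Eiffel Tower?", DOCUMENTS))
```

Because each step is a separate function, the toy pieces can in principle be swapped independently, for example replacing `retrieve` with an embedding model plus vector store and `generate` with a model API call, while the four-step shape stays the same.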
RAG vs Fine-tuning
| Aspect | RAG | Fine-tuning |
|--------|-----|-------------|
| Knowledge updates | Immediate (update the document index) | Requires retraining |
| Cost | Lower (no training needed) | Higher (training compute) |
| Accuracy | Good when retrieval quality is high | Can be very high |
| Customization | Flexible | Fixed after training |
RAG for Agent Memory
RAG enables memory-augmented agents by:
RAG Components
Key components of a RAG system:
Retrieval Strategies
Common retrieval approaches:
Advanced RAG Patterns
Sophisticated RAG techniques:
Challenges
Common RAG challenges: