## Key Contribution
REALM shows that retrieval can be incorporated into language model pre-training itself. Rather than adding retrieval as a post-hoc component, the retriever is trained end-to-end with the language model, learning what knowledge to retrieve for the masked language modeling objective.
## Architecture

### Neural Knowledge Retriever
Given input x, retrieve relevant document z:

p(z|x) = exp(f(x,z)) / Σ_z' exp(f(x,z'))

where f(x,z) = Embed_input(x) · Embed_doc(z), the inner product of the query and document embeddings.
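A minimal sketch of this retrieval distribution, assuming dense query and document embeddings are already computed (the `retrieval_distribution` helper and toy dimensions are illustrative; REALM itself uses BERT-style encoders and approximate MIPS over the full corpus):

```python
import torch
import torch.nn.functional as F

def retrieval_distribution(query_vec: torch.Tensor, doc_vecs: torch.Tensor) -> torch.Tensor:
    """p(z|x): softmax over inner-product relevance scores f(x, z).

    query_vec: [d]            Embed_input(x)
    doc_vecs:  [num_docs, d]  Embed_doc(z) for every candidate document
    """
    scores = doc_vecs @ query_vec    # f(x, z) = Embed_input(x) · Embed_doc(z)
    return F.softmax(scores, dim=0)  # normalize over the candidate documents

# toy usage: 5 candidate documents with 128-dim embeddings
p_z_given_x = retrieval_distribution(torch.randn(128), torch.randn(5, 128))
print(p_z_given_x.sum())  # ≈ 1.0
```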
### Knowledge-Augmented Encoder
Combine the input with the retrieved document:

p(y|z,x) = Language_Model([x; z])

Overall: p(y|x) = Σ_z p(y|z,x) p(z|x), marginalizing over documents z. In practice the sum is approximated with the top-k documents returned by maximum inner product search (MIPS); a sketch of that marginalization follows.
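A sketch of the marginalization over the top-k retrieved documents (function name and shapes are illustrative):

```python
import torch

def marginal_log_likelihood(log_p_y_given_zx: torch.Tensor,
                            log_p_z_given_x: torch.Tensor) -> torch.Tensor:
    """log p(y|x) = log Σ_z p(y|z,x) p(z|x), summed over the top-k retrieved docs.

    log_p_y_given_zx: [k]  encoder log-likelihood of the target y for each document
    log_p_z_given_x:  [k]  retriever log-probabilities (renormalized over the top-k)
    """
    return torch.logsumexp(log_p_y_given_zx + log_p_z_given_x, dim=0)
```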
### End-to-End Training

Both the retriever and the language model are trained jointly. Because the marginal likelihood p(y|x) depends on p(z|x), the masked-token loss back-propagates into the retriever's embedding functions: a document that helps the prediction has its retrieval probability pushed up, and an unhelpful one is pushed down. A minimal training-step sketch follows.
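A toy end-to-end training step, with tiny linear layers standing in for the BERT-style encoders (all module and function names here are illustrative, not from the paper's codebase):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Toy stand-ins for the REALM components; in the paper these are Transformers.
embed_input = nn.Linear(16, 8)      # Embed_input(x)
embed_doc = nn.Linear(16, 8)        # Embed_doc(z)
encoder_head = nn.Linear(16, 100)   # knowledge-augmented encoder head over a toy vocab

def training_step(x_feat, doc_feats, joint_feats, target_token):
    """One joint update: the loss is the negative log marginal likelihood,
    so gradients reach both the retriever and the encoder."""
    # Retriever: p(z|x) over the k candidate documents.
    scores = embed_doc(doc_feats) @ embed_input(x_feat)       # [k]
    log_p_z = F.log_softmax(scores, dim=0)
    # Encoder: p(y|z,x) for each candidate, from features of [x; z].
    log_p_y_given_z = F.log_softmax(encoder_head(joint_feats), dim=-1)[:, target_token]
    # Marginal negative log-likelihood: -log Σ_z p(y|z,x) p(z|x).
    loss = -torch.logsumexp(log_p_y_given_z + log_p_z, dim=0)
    loss.backward()
    return loss

# toy usage: 5 candidate documents, masked token id 7
loss = training_step(torch.randn(16), torch.randn(5, 16), torch.randn(5, 16), 7)
print(embed_doc.weight.grad is not None)  # True: the retriever receives gradient
```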
## Pre-Training

### Objective

Masked language modeling with retrieval: predict each masked token y in x by maximizing log p(y|x) = log Σ_z p(y|z,x) p(z|x), i.e. the negative log marginal likelihood above is the pre-training loss.
### Async Index Refresh

Challenge: the document embeddings (and hence the MIPS index used for retrieval) go stale as Embed_doc is updated during training.

Solution: re-embed the corpus and rebuild the index asynchronously every few hundred training steps; between refreshes, retrieval runs against the slightly stale index while parameter updates continue. A simplified sketch of this loop follows.
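A simplified, synchronous version of the refresh loop (in the paper the re-embedding and index building run asynchronously on separate workers; `rebuild_index`, `train_one_step`, and the refresh interval are illustrative):

```python
import torch

REFRESH_EVERY = 500  # the paper refreshes the index every few hundred steps

def rebuild_index(doc_feats: torch.Tensor, embed_doc) -> torch.Tensor:
    """Re-embed every document with the current Embed_doc parameters.
    A real system hands this to separate index-builder workers and builds an
    approximate MIPS index; here the 'index' is just the embedding matrix."""
    with torch.no_grad():
        return embed_doc(doc_feats)

def pretrain(doc_feats, embed_doc, train_one_step, num_steps):
    index = rebuild_index(doc_feats, embed_doc)          # initial index
    for step in range(num_steps):
        train_one_step(index)                            # retrieval uses a (possibly stale) index
        if (step + 1) % REFRESH_EVERY == 0:
            index = rebuild_index(doc_feats, embed_doc)  # swap in a fresh index

# toy usage: 100 documents with 16-dim features, no-op training step
pretrain(torch.randn(100, 16), torch.nn.Linear(16, 8),
         train_one_step=lambda index: None, num_steps=1000)
```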
### Salient Span Masking

Mask named entities and dates: salient spans (found with a named-entity tagger plus a regular expression for dates) are masked instead of random tokens, so the blank usually cannot be filled from local context alone and the model must lean on retrieved world knowledge.
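A toy, character-level sketch of salient span masking, where entity spans are assumed to come from an external NER tagger and dates are matched with a simple year regex (the real implementation operates on wordpiece tokens):

```python
import re
from typing import List, Tuple

DATE_RE = re.compile(r"\b(1[0-9]{3}|20[0-9]{2})\b")  # toy date pattern: 4-digit years

def salient_span_mask(text: str, entity_spans: List[Tuple[int, int]],
                      mask_token: str = "[MASK]") -> str:
    """Mask salient spans (named entities supplied by an external tagger,
    plus dates matched by DATE_RE) rather than random tokens."""
    spans = list(entity_spans) + [m.span() for m in DATE_RE.finditer(text)]
    # Replace from the end so earlier character offsets stay valid.
    for start, end in sorted(spans, reverse=True):
        text = text[:start] + mask_token + text[end:]
    return text

print(salient_span_mask(
    "Einstein was born in Ulm in 1879.",
    entity_spans=[(0, 8), (21, 24)],   # "Einstein", "Ulm" (offsets from a NER tagger)
))
# -> "[MASK] was born in [MASK] in [MASK]."
```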
## Fine-Tuning

### Open-Domain QA

The pre-trained retriever and encoder are fine-tuned on question-answer pairs: the answer is predicted as a text span within a retrieved document, with span scores marginalized over the retrieved documents just as in pre-training.
## Results

State-of-the-art on the open-domain QA benchmarks Natural Questions (NQ), WebQuestions (WQ), and CuratedTrec (CT), without any corpus-specific pretraining.
## Key Insights

### Retrieval as Latent Variable

Treating z as a latent variable lets the retriever be trained with no retrieval labels: documents that improve the masked-token prediction receive higher p(z|x), and unhelpful documents are down-weighted, purely from the language-modeling signal.

### Knowledge Externalization

Benefits of stored knowledge: world knowledge lives in an explicit text corpus rather than only in model weights, so it is interpretable (you can inspect which document was retrieved for a prediction) and modular (the corpus can be updated or swapped without retraining from scratch).

### Pre-training Matters

Retrieval-augmented pre-training teaches the retriever and encoder to work together before any task data is seen; attaching a retriever only at fine-tuning time performs noticeably worse on downstream QA.
## Comparison to DPR

| Aspect | DPR | REALM |
|--------|-----|-------|
| Retriever training signal | Supervised (question-passage pairs) | Self-supervised (MLM) |
| Data needed | Labeled QA pairs | Raw text |
| Retriever during reader/LM training | Fixed | Trained jointly |
| Retrieval-aware pre-training | No | Yes |
## Relevance to Agent Memory

### Learned Memory Access

REALM principles carry over to agent memory: treat memory lookup as a latent, differentiable choice and learn which memories to access from the downstream objective, instead of relying on fixed similarity heuristics. A speculative sketch follows.
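A speculative sketch, not from the paper: a differentiable memory reader whose access pattern is shaped by the agent's task loss (all class and attribute names here are hypothetical):

```python
import torch
import torch.nn.functional as F

class LearnedMemoryReader(torch.nn.Module):
    """REALM-style memory access for an agent: memory selection is a softmax
    over learned relevance scores, so the task loss can shape what gets read."""

    def __init__(self, dim: int):
        super().__init__()
        self.query_proj = torch.nn.Linear(dim, dim)   # embeds the agent's current state
        self.memory_proj = torch.nn.Linear(dim, dim)  # embeds stored memory entries

    def forward(self, state: torch.Tensor, memories: torch.Tensor):
        # p(memory | state): differentiable, so gradients from the agent's task
        # loss flow back into both projections.
        scores = self.memory_proj(memories) @ self.query_proj(state)
        weights = F.softmax(scores, dim=0)
        read_vector = weights @ memories   # soft read over the memory store
        return read_vector, weights

# toy usage: 32-dim state, 10 stored memories
reader = LearnedMemoryReader(32)
read, w = reader(torch.randn(32), torch.randn(10, 32))
```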
### Memory Pre-training

Could pre-train agents to read from their own memory stores under a generic objective, so that useful memory-access behaviour is learned before any task-specific fine-tuning, mirroring how REALM learns retrieval from the MLM objective.
## Implementation Challenges

### Compute Requirements

Pre-training requires embedding and indexing millions of documents and periodically re-embedding the entire corpus, which adds substantial compute on top of ordinary MLM pre-training.

### Engineering Complexity

Training needs an asynchronous pipeline: trainer workers update parameters while separate index-builder workers re-embed the corpus and rebuild the MIPS index before swapping it in.
## Limitations

Pre-training is expensive (the whole corpus is periodically re-embedded), the marginalization only covers the top-k retrieved documents, and the published evaluation is limited to open-domain QA.
## Citation
```bibtex
@inproceedings{guu2020realm,
  title={REALM: Retrieval-Augmented Language Model Pre-Training},
  author={Guu, Kelvin and Lee, Kenton and Tung, Zora and Pasupat, Panupong and Chang, Ming-Wei},
  booktitle={International Conference on Machine Learning},
  year={2020}
}
```