Toolformer: Language Models Can Teach Themselves to Use Tools

Timo Schick, Jane Dwivedi-Yu, Roberto Dessì, Roberta Raileanu, Maria Lomeli, Luke Zettlemoyer, Nicola Cancedda, Thomas Scialom

arXiv · 2023

tool-use · self-supervised · api-calls · augmentation

TL;DR

Demonstrates that LLMs can learn to use external tools (calculator, search, etc.) in a self-supervised way by generating and filtering their own training data.

Key Contribution

Toolformer shows that language models can teach themselves when and how to use external tools. The key insight is using the model's own predictions to generate training data for tool use, then filtering based on whether tools actually help.

Approach

Self-Supervised Tool Learning

  • Model generates potential API calls
  • Execute calls and insert results
  • Keep calls that reduce perplexity
  • Fine-tune on filtered examples
No Human Annotations

Unlike prior work:

  • No manually labeled tool-use examples
  • Model discovers when tools help
  • Learns appropriate tool syntax
  • Generalizes to new situations

Tools Supported

    Calculator

    For arithmetic operations:

    The population is [Calculator(1000000 * 1.02^10)] after 10 years.

    → The population is [Calculator(1000000 * 1.02^10) → 1218994] after 10 years.

    Search Engine

    For factual queries:

    The capital of France is [Search(capital of France)].

    → The capital of France is [Search(capital of France) → Paris] Paris.

    Wikipedia

    For encyclopedic knowledge:

    Einstein was born in [Wikipedia(Albert Einstein)].

    → Einstein was born in [Wikipedia(Albert Einstein) → Ulm, Germany] Ulm.

    Machine Translation

    For language conversion:

    "Hello" in French is [MT(Hello, French)].

    → "Hello" in French is [MT(Hello, French) → Bonjour] "Bonjour".

    Calendar

    For date operations:

    In 5 days it will be [Calendar()].

    → In 5 days it will be [Calendar() → January 20, 2024] January 20th.
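The examples above all follow the same pattern: find a bracketed call, execute it, and splice the result back in before the text it helps predict. A minimal sketch for the Calculator case (the function and pattern names are my own, and the toy `eval` stands in for the proper sandboxing discussed under Execution Environment):

```python
import re

# Hypothetical parser for "[Calculator(expr)]" annotations; rewrites each
# call as "[Calculator(expr) → result]", matching the examples above.
CALL_PATTERN = re.compile(r"\[Calculator\(([^)]*)\)\]")

def execute_calculator_calls(text: str) -> str:
    def run(match: re.Match) -> str:
        expr = match.group(1).replace("^", "**")  # the examples use ^ for powers
        try:
            # Toy evaluation; a real pipeline must sandbox model-written input.
            result = eval(expr, {"__builtins__": {}}, {})
        except Exception:
            return match.group(0)  # leave the call untouched on error
        return f"[Calculator({match.group(1)}) → {int(result)}]"
    return CALL_PATTERN.sub(run, text)

print(execute_calculator_calls(
    "The population is [Calculator(1000000 * 1.02^10)] after 10 years."
))
# → The population is [Calculator(1000000 * 1.02^10) → 1218994] after 10 years.
```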

    Training Pipeline

    Step 1: Sample API Calls

    For each position in training text:

  • Sample potential API calls from model
  • Use few-shot prompting to guide format

    Step 2: Execute APIs

  • Run each sampled call
  • Insert results back into text
  • Create augmented training examples

    Step 3: Filter by Usefulness

    Keep examples where tool helps:

    L(with_tool) + threshold < L(without_tool)

    where L is the model's negative log-likelihood over the tokens that follow the API call
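The criterion can be sketched as below. This is simplified: `nll` stands in for scoring a text under the language model, the threshold value is illustrative, and the paper actually compares weighted losses over only the tokens following the call:

```python
def filter_examples(nll, plain_texts, augmented_texts, threshold=0.5):
    """Keep an augmented example only if the inserted API result lowers the
    model's negative log-likelihood by more than `threshold`.
    `nll` is any callable that scores a text; names are illustrative."""
    kept = []
    for plain, augmented in zip(plain_texts, augmented_texts):
        if nll(augmented) + threshold < nll(plain):
            kept.append(augmented)
    return kept

# Toy stand-in for a model's NLL: pretend the tool result lowers the loss.
scores = {"x = [Calculator(2*3)]": 4.0, "x = [Calculator(2*3) → 6] 6": 1.5}
print(filter_examples(scores.get,
                      ["x = [Calculator(2*3)]"],
                      ["x = [Calculator(2*3) → 6] 6"]))
# → ['x = [Calculator(2*3) → 6] 6']
```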

    Step 4: Fine-tune

  • Train model on filtered examples
  • Model learns when tools reduce uncertainty
  • Generalizes to new contexts

Results

    Improvements

    Significant gains on:

  • Math word problems
  • Question answering
  • Temporal reasoning
  • Multilingual tasks

    Emergent Behaviors

    Model learns to:

  • Chain multiple tools
  • Use tools for verification
  • Skip tools when unnecessary
  • Handle tool failures

Relevance to Agent Memory

    Memory as Tool

    Memory can be treated as a tool:

    User asked about [Memory(search: user preferences)] before.

    → User asked about [Memory(search: user preferences) → coffee preferences] their coffee preferences before.

    Self-Supervised Memory Learning

    Could extend Toolformer approach:

  • Model generates memory calls
  • Filter by whether memory helps
  • Learn when to remember/recall

    API Design for Memory

    Memory API could include:

  • `Remember(fact)`: Store information
  • `Recall(query)`: Retrieve information
  • `Forget(id)`: Remove information
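A minimal in-memory sketch of such an API (the class and method names are hypothetical; a real system would back `recall` with embedding retrieval rather than substring matching):

```python
import itertools

class MemoryTool:
    """Hypothetical memory-as-a-tool API mirroring the Remember/Recall/Forget
    calls above; a sketch, not a production memory system."""

    def __init__(self):
        self._facts = {}
        self._ids = itertools.count(1)

    def remember(self, fact: str) -> int:
        fact_id = next(self._ids)
        self._facts[fact_id] = fact
        return fact_id  # returned so the caller can later forget this fact

    def recall(self, query: str):
        # Toy retrieval: substring match instead of embedding similarity.
        return [f for f in self._facts.values() if query.lower() in f.lower()]

    def forget(self, fact_id: int) -> bool:
        return self._facts.pop(fact_id, None) is not None

mem = MemoryTool()
fid = mem.remember("User prefers oat-milk coffee")
print(mem.recall("coffee"))   # → ['User prefers oat-milk coffee']
mem.forget(fid)
print(mem.recall("coffee"))   # → []
```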

Implementation Insights

    Few-Shot Prompting

    Guide API generation with examples:

    Your task is to add API calls to text.

    Example: 5 * 3 = [Calculator(5 * 3)]
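Assembling such a prompt programmatically might look like this (the wording and the extra example are illustrative, not the paper's exact prompt):

```python
# Hypothetical few-shot prompt for sampling Calculator annotations,
# in the spirit of the paper's prompts; wording is my own.
PROMPT_TEMPLATE = """Your task is to add Calculator API calls to text.
Write calls as [Calculator(expression)].

Example: 5 * 3 = [Calculator(5 * 3)] 15.
Example: The sum of 7 and 9 is [Calculator(7 + 9)] 16.

Input: {text}
Output:"""

def build_annotation_prompt(text: str) -> str:
    return PROMPT_TEMPLATE.format(text=text)

print(build_annotation_prompt("Each pack holds 12 eggs, so 4 packs hold 48."))
```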

    Filtering Threshold

    Hyperparameter for how much tool must help:

  • Too low: noisy examples
  • Too high: few examples
  • Tune on held-out data

    Execution Environment

    Need sandboxed tool execution:

  • Safe calculator evaluation
  • Rate-limited API calls
  • Error handling
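For the calculator, "safe evaluation" can mean walking the expression's AST and allowing only arithmetic nodes, rather than calling `eval` on model output. A sketch:

```python
import ast
import operator

# Whitelist of arithmetic operations; anything else is rejected.
_OPS = {
    ast.Add: operator.add, ast.Sub: operator.sub,
    ast.Mult: operator.mul, ast.Div: operator.truediv,
    ast.Pow: operator.pow, ast.USub: operator.neg,
}

def safe_calculate(expression: str) -> float:
    """Evaluate a model-generated arithmetic expression without eval()."""
    def walk(node):
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.operand))
        raise ValueError(f"disallowed expression: {expression!r}")
    return walk(ast.parse(expression, mode="eval").body)

print(safe_calculate("1000000 * 1.02**10"))  # the calculator example above
# safe_calculate("__import__('os').system('ls')") raises ValueError
```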

Limitations

  • Requires tool execution at training
  • Limited to text-in-text-out tools
  • May learn incorrect tool usage
  • Doesn't handle complex tool sequences

Citation

    @article{schick2023toolformer,
      title={Toolformer: Language Models Can Teach Themselves to Use Tools},
      author={Schick, Timo and Dwivedi-Yu, Jane and Dess{\`\i}, Roberto and Raileanu, Roberta and Lomeli, Maria and Zettlemoyer, Luke and Cancedda, Nicola and Scialom, Thomas},
      journal={arXiv preprint arXiv:2302.04761},
      year={2023}
    }