Toolformer: Language Models Can Teach Themselves to Use Tools

Timo Schick, Jane Dwivedi-Yu, Roberto Dessì, Roberta Raileanu, Maria Lomeli, Luke Zettlemoyer, Nicola Cancedda, Thomas Scialom

arXiv · 2023

tool-use · self-supervised · api-calls · augmentation

TL;DR

Demonstrates that LLMs can learn to use external tools (calculator, search, etc.) in a self-supervised way by generating and filtering their own training data.

Key Contribution

Toolformer shows that language models can teach themselves when and how to use external tools. The key insight is using the model's own predictions to generate training data for tool use, then filtering based on whether tools actually help.

Approach

Self-Supervised Tool Learning

  • Model generates potential API calls
  • Execute calls and insert results
  • Keep calls that reduce perplexity
  • Fine-tune on filtered examples
No Human Annotations

Unlike prior work:

  • No manually labeled tool-use examples
  • Model discovers when tools help
  • Learns appropriate tool syntax
  • Generalizes to new situations

Tools Supported

    Calculator

    For arithmetic operations:

    The population is [Calculator(1000000 * 1.02^10)] after 10 years.

    → The population is [Calculator(1000000 * 1.02^10) → 1218994] after 10 years.

    Search Engine

    For factual queries:

    The capital of France is [Search(capital of France)].

    → The capital of France is [Search(capital of France) → Paris] Paris.

    Wikipedia

    For encyclopedic knowledge:

    Einstein was born in [Wikipedia(Albert Einstein)].

    → Einstein was born in [Wikipedia(Albert Einstein) → Ulm, Germany] Ulm.

    Machine Translation

    For language conversion:

    "Hello" in French is [MT(Hello, French)].

    → "Hello" in French is [MT(Hello, French) → Bonjour] "Bonjour".

    Calendar

    For date operations:

    In 5 days it will be [Calendar()].

    → In 5 days it will be [Calendar() → January 20, 2024] January 20th.
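The examples above all follow the same pattern: find a bracketed call, execute it, and splice the result back in before the text it helps predict. A minimal sketch for the Calculator case (the function and pattern names are my own, and the toy `eval` stands in for the proper sandboxing discussed under Execution Environment):

```python
import re

# Hypothetical parser for "[Calculator(expr)]" annotations; rewrites each
# call as "[Calculator(expr) → result]", matching the examples above.
CALL_PATTERN = re.compile(r"\[Calculator\(([^)]*)\)\]")

def execute_calculator_calls(text: str) -> str:
    def run(match: re.Match) -> str:
        expr = match.group(1).replace("^", "**")  # the examples use ^ for powers
        try:
            # Toy evaluation; a real pipeline must sandbox model-written input.
            result = eval(expr, {"__builtins__": {}}, {})
        except Exception:
            return match.group(0)  # leave the call untouched on error
        return f"[Calculator({match.group(1)}) → {int(result)}]"
    return CALL_PATTERN.sub(run, text)

print(execute_calculator_calls(
    "The population is [Calculator(1000000 * 1.02^10)] after 10 years."
))
# → The population is [Calculator(1000000 * 1.02^10) → 1218994] after 10 years.
```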

    Training Pipeline

    Step 1: Sample API Calls

    For each position in training text:

  • Sample potential API calls from model
  • Use few-shot prompting to guide format

    Step 2: Execute APIs

  • Run each sampled call
  • Insert results back into text
  • Create augmented training examples

    Step 3: Filter by Usefulness

    Keep examples where tool helps:

    L(with_tool) + threshold < L(without_tool)

    where L is the model's negative log-likelihood over the tokens that follow the API call
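The criterion can be sketched as below. This is simplified: `nll` stands in for scoring a text under the language model, the threshold value is illustrative, and the paper actually compares weighted losses over only the tokens following the call:

```python
def filter_examples(nll, plain_texts, augmented_texts, threshold=0.5):
    """Keep an augmented example only if the inserted API result lowers the
    model's negative log-likelihood by more than `threshold`.
    `nll` is any callable that scores a text; names are illustrative."""
    kept = []
    for plain, augmented in zip(plain_texts, augmented_texts):
        if nll(augmented) + threshold < nll(plain):
            kept.append(augmented)
    return kept

# Toy stand-in for a model's NLL: pretend the tool result lowers the loss.
scores = {"x = [Calculator(2*3)]": 4.0, "x = [Calculator(2*3) → 6] 6": 1.5}
print(filter_examples(scores.get,
                      ["x = [Calculator(2*3)]"],
                      ["x = [Calculator(2*3) → 6] 6"]))
# → ['x = [Calculator(2*3) → 6] 6']
```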

    Step 4: Fine-tune

  • Train model on filtered examples
  • Model learns when tools reduce uncertainty
  • Generalizes to new contexts

Results

    Improvements

    Significant gains on:

  • Math word problems
  • Question answering
  • Temporal reasoning
  • Multilingual tasks

    Emergent Behaviors

    Model learns to:

  • Chain multiple tools
  • Use tools for verification
  • Skip tools when unnecessary
  • Handle tool failures

Relevance to Agent Memory

    Memory as Tool

    Memory can be treated as a tool:

    User asked about [Memory(search: user preferences)] before.

    → User asked about [Memory(search: user preferences) → coffee preferences] their coffee preferences before.

    Self-Supervised Memory Learning

    Could extend Toolformer approach:

  • Model generates memory calls
  • Filter by whether memory helps
  • Learn when to remember/recall

    API Design for Memory

    Memory API could include:

  • `Remember(fact)`: Store information
  • `Recall(query)`: Retrieve information
  • `Forget(id)`: Remove information
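A minimal in-memory sketch of such an API (the class and method names are hypothetical; a real system would back `recall` with embedding retrieval rather than substring matching):

```python
import itertools

class MemoryTool:
    """Hypothetical memory-as-a-tool API mirroring the Remember/Recall/Forget
    calls above; a sketch, not a production memory system."""

    def __init__(self):
        self._facts = {}
        self._ids = itertools.count(1)

    def remember(self, fact: str) -> int:
        fact_id = next(self._ids)
        self._facts[fact_id] = fact
        return fact_id  # returned so the caller can later forget this fact

    def recall(self, query: str):
        # Toy retrieval: substring match instead of embedding similarity.
        return [f for f in self._facts.values() if query.lower() in f.lower()]

    def forget(self, fact_id: int) -> bool:
        return self._facts.pop(fact_id, None) is not None

mem = MemoryTool()
fid = mem.remember("User prefers oat-milk coffee")
print(mem.recall("coffee"))   # → ['User prefers oat-milk coffee']
mem.forget(fid)
print(mem.recall("coffee"))   # → []
```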

Implementation Insights

    Few-Shot Prompting

    Guide API generation with examples:

    Your task is to add API calls to text.

    Example: 5 * 3 = [Calculator(5 * 3)]
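Assembling such a prompt programmatically might look like this (the wording and the extra example are illustrative, not the paper's exact prompt):

```python
# Hypothetical few-shot prompt for sampling Calculator annotations,
# in the spirit of the paper's prompts; wording is my own.
PROMPT_TEMPLATE = """Your task is to add Calculator API calls to text.
Write calls as [Calculator(expression)].

Example: 5 * 3 = [Calculator(5 * 3)] 15.
Example: The sum of 7 and 9 is [Calculator(7 + 9)] 16.

Input: {text}
Output:"""

def build_annotation_prompt(text: str) -> str:
    return PROMPT_TEMPLATE.format(text=text)

print(build_annotation_prompt("Each pack holds 12 eggs, so 4 packs hold 48."))
```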

    Filtering Threshold

    Hyperparameter for how much tool must help:

  • Too low: noisy examples
  • Too high: few examples
  • Tune on held-out data

    Execution Environment

    Need sandboxed tool execution:

  • Safe calculator evaluation
  • Rate-limited API calls
  • Error handling
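For the calculator, "safe evaluation" can mean walking the expression's AST and allowing only arithmetic nodes, rather than calling `eval` on model output. A sketch:

```python
import ast
import operator

# Whitelist of arithmetic operations; anything else is rejected.
_OPS = {
    ast.Add: operator.add, ast.Sub: operator.sub,
    ast.Mult: operator.mul, ast.Div: operator.truediv,
    ast.Pow: operator.pow, ast.USub: operator.neg,
}

def safe_calculate(expression: str) -> float:
    """Evaluate a model-generated arithmetic expression without eval()."""
    def walk(node):
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.operand))
        raise ValueError(f"disallowed expression: {expression!r}")
    return walk(ast.parse(expression, mode="eval").body)

print(safe_calculate("1000000 * 1.02**10"))  # the calculator example above
# safe_calculate("__import__('os').system('ls')") raises ValueError
```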

Limitations

  • Requires tool execution at training
  • Limited to text-in-text-out tools
  • May learn incorrect tool usage
  • Doesn't handle complex tool sequences

Citation

    @article{schick2023toolformer,
      title={Toolformer: Language Models Can Teach Themselves to Use Tools},
      author={Schick, Timo and Dwivedi-Yu, Jane and Dess{\`\i}, Roberto and Raileanu, Roberta and Lomeli, Maria and Zettlemoyer, Luke and Cancedda, Nicola and Scialom, Thomas},
      journal={arXiv preprint arXiv:2302.04761},
      year={2023}
    }