AIIF Masterclass: Are Emergent Abilities of Large Language Models a Mirage?

This Masterclass will be virtual only, on June 25, 2024, at 17:00 (Japan time). For access, email contact@aiindustryfoundation.org.

----

Were you surprised by a sudden advance in large language model capabilities? Such sudden advances are often called “emergent”, a word usually used when a surprising collective property arises from a large number of simple parts:

  • Consciousness is an “emergent” property of large networks of neurons

  • Flocking behaviour is an “emergent” property of the individual actions of large numbers of animals

A single bird does not flock, nor does a single neuron think, but put enough together and the collective flocks and thinks. Examining the behaviour of a single bird or neuron will tell you little about the behaviour of flocks or brains. By analogy, a small language model cannot answer questions, and studying small language models tells you little about the question-answering performance of large language models—the question-answering ability “emerges” with scale.

Or so we used to believe. In an influential paper, “Are Emergent Abilities of Large Language Models a Mirage?”, Schaeffer et al. suggest that supposedly “emergent” abilities are actually an artifact of poor measurements that fail to give partial credit. Consider a simple question and three possible answers:

What is the capital of France?

1. Paris

2. London

3. An angry weasel

We would usually mark answer 1 as “correct” and answers 2 and 3 as “incorrect”, but Schaeffer et al. argue that answer 2 is clearly better than answer 3, and our choice of metric should reflect this. Metrics that give partial credit for almost-correct answers often reveal smooth, predictable improvement with scale long before an ability appears to “emerge” under all-or-nothing scoring.
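
To make the metric point concrete, here is a minimal Python sketch (ours, not from the paper or the event materials) that scores the same hypothetical model outputs two ways: with all-or-nothing multiple-choice accuracy, and with the Brier score, a continuous metric that gives credit for putting probability near the right answer. The probability distributions below are invented for illustration.

```python
# Score the same (hypothetical) model output distributions two ways:
# all-or-nothing multiple-choice accuracy vs. the continuous Brier score.
# The probability distributions below are invented for illustration.

choices = ["Paris", "London", "An angry weasel"]
target = [1.0, 0.0, 0.0]  # one-hot encoding: "Paris" is the correct answer

def accuracy(probs):
    """Discontinuous metric: full credit iff the top choice is correct."""
    return 1.0 if probs.index(max(probs)) == 0 else 0.0

def brier(probs):
    """Continuous metric: mean squared error vs. the target (lower is better)."""
    return sum((p - t) ** 2 for p, t in zip(probs, target)) / len(probs)

# Hypothetical output distributions from three models of increasing scale.
models = {
    "small":  [0.10, 0.30, 0.60],  # favours the weasel
    "medium": [0.45, 0.50, 0.05],  # almost right: "Paris" nearly wins
    "large":  [0.90, 0.08, 0.02],  # confidently correct
}

for name, probs in models.items():
    top = choices[probs.index(max(probs))]
    print(f"{name:>6} picks {top!r}: accuracy={accuracy(probs):.0f}, "
          f"brier={brier(probs):.3f}")
```

Under accuracy, the small and medium models look identical (both score 0) even though the medium model has nearly learned the answer; the Brier score falls steadily (≈0.420 → 0.185 → 0.006), exposing the smooth improvement the discontinuous metric hides.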

In this session, we’ll discuss “emergent” capabilities and ask how we can predict them before they appear, how we can measure language model performance so that we are surprised less often, and how we can prepare to take advantage of new capabilities rather than being caught flat-footed.

Recent work claims that large language models display emergent abilities, abilities not present in smaller-scale models that are present in larger-scale models. What makes emergent abilities intriguing is two-fold: their sharpness, transitioning seemingly instantaneously from not present to present, and their unpredictability, appearing at seemingly unforeseeable model scales. Here, we present an alternative explanation for emergent abilities: that for a particular task and model family, when analyzing fixed model outputs, one can choose a metric which leads to the inference of an emergent ability or another metric which does not. Thus, our alternative suggests that existing claims of emergent abilities are creations of the researcher's analyses, not fundamental changes in model behavior on specific tasks with scale. We present our explanation in a simple mathematical model, then test it in three complementary ways: we (1) make, test and confirm three predictions on the effect of metric choice using the InstructGPT/GPT-3 family on tasks with claimed emergent abilities, (2) make, test and confirm two predictions about metric choices in a meta-analysis of emergent abilities on BIG-Bench; and (3) show how similar metric decisions suggest apparent emergent abilities on vision tasks in diverse deep network architectures (convolutional, autoencoder, transformers). In all three analyses, we find strong supporting evidence that emergent abilities may not be a fundamental property of scaling AI models.

Are Emergent Abilities of Large Language Models a Mirage?, Schaeffer et al. 2023
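
The “simple mathematical model” mentioned in the abstract is easy to replicate. Below is a hedged sketch in Python: we assume, as the paper does, that per-token cross-entropy falls smoothly as a power law in parameter count, then score a ten-token answer by exact match. All constants (sequence length, power-law exponent, and scale) are illustrative choices of ours, not values from the paper.

```python
import math

# Toy model: smooth per-token improvement can look "emergent" under an
# all-or-nothing sequence-level metric. All constants are illustrative.

SEQ_LEN = 10             # tokens in the target answer (assumed)
ALPHA, SCALE = 0.3, 8e5  # power-law exponent and scale constant (assumed)

for exp in range(6, 13):                   # model sizes 1e6 .. 1e12 params
    n_params = 10.0 ** exp
    ce = (n_params / SCALE) ** -ALPHA      # per-token cross-entropy (power law)
    p_token = math.exp(-ce)                # prob. of getting one token right
    exact = p_token ** SEQ_LEN             # all tokens must be right
    print(f"1e{exp:02d} params:  per-token acc = {p_token:.3f}   "
          f"exact match = {exact:.3f}")
```

Per-token accuracy climbs gradually across six orders of magnitude, while exact match sits near zero and then rises steeply in the last few orders of magnitude: the same underlying improvement, read through two different metrics.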

Previous (11 June): AIIF Masterclass: Business Process Model and Notation (BPMN)

Next (26 June): Is GPT conscious?