
Evaluating the World Model Implicit in a Generative Model

Generative models like LLMs seem capable of understanding the world, but how well do they really grasp the underlying structure of the tasks they perform? This month, André Röhm will delve into recent research that formalizes and evaluates the world models implicit in generative models, in settings where the underlying reality is governed by deterministic finite automata.

Using examples from navigation, board games, and logic puzzles, the paper introduces new metrics that test whether models compress equivalent situations and distinguish genuinely different ones. This stricter evaluation framework reveals how coherent (or incoherent) the implicit world models really are and offers new ways to assess generative models.

Recent work suggests that large language models may implicitly learn world models. How should we assess this possibility? We formalize this question for the case where the underlying reality is governed by a deterministic finite automaton. This includes problems as diverse as simple logical reasoning, geographic navigation, game-playing, and chemistry. We propose new evaluation metrics for world model recovery inspired by the classic Myhill-Nerode theorem from language theory. We illustrate their utility in three domains: game playing, logic puzzles, and navigation. In all domains, the generative models we consider do well on existing diagnostics for assessing world models, but our evaluation metrics reveal their world models to be far less coherent than they appear. Such incoherence creates fragility: using a generative model to solve related but subtly different tasks can lead to failures. Building generative models that meaningfully capture the underlying logic of the domains they model would be immensely valuable; our results suggest new ways to assess how close a given model is to that goal.

— Evaluating the World Model Implicit in a Generative Model, Keyon Vafa et al. 2024
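For readers curious about the core idea ahead of the session, here is a minimal illustrative sketch (not the authors' code) of the Myhill-Nerode intuition behind the proposed metrics: two prefixes that lead the true DFA to the same state should be treated interchangeably by a model with a coherent world model (compression), while prefixes leading to different states should be separable by some distinguishing suffix (distinction). The `model_accepts` function is a hypothetical stand-in for querying a generative model; here it simply consults the true DFA as an oracle.

```python
from itertools import product

# A toy deterministic finite automaton over {"a", "b"}:
# strings with an even number of "a"s are accepted.
DFA = {
    "start": 0,
    "accept": {0},
    "delta": {(0, "a"): 1, (0, "b"): 0, (1, "a"): 0, (1, "b"): 1},
}

def run_dfa(dfa, string):
    """Return the state reached after reading `string` from the start state."""
    state = dfa["start"]
    for symbol in string:
        state = dfa["delta"][(state, symbol)]
    return state

def model_accepts(prefix, suffix):
    """Hypothetical stand-in for asking a generative model whether
    `prefix + suffix` is valid; replace with a real model query."""
    return run_dfa(DFA, prefix + suffix) in DFA["accept"]  # oracle for illustration

def compression_error(dfa, prefixes, suffixes):
    """Fraction of prefix pairs reaching the SAME DFA state on which the
    model nevertheless disagrees for some probe suffix."""
    pairs = [(p, q) for p, q in product(prefixes, prefixes)
             if p != q and run_dfa(dfa, p) == run_dfa(dfa, q)]
    errors = sum(any(model_accepts(p, s) != model_accepts(q, s) for s in suffixes)
                 for p, q in pairs)
    return errors / len(pairs) if pairs else 0.0

def distinction_error(dfa, prefixes, suffixes):
    """Fraction of prefix pairs reaching DIFFERENT DFA states that the
    model cannot tell apart with any of the probe suffixes."""
    pairs = [(p, q) for p, q in product(prefixes, prefixes)
             if run_dfa(dfa, p) != run_dfa(dfa, q)]
    errors = sum(all(model_accepts(p, s) == model_accepts(q, s) for s in suffixes)
                 for p, q in pairs)
    return errors / len(pairs) if pairs else 0.0

prefixes = ["", "a", "ab", "aa", "ba"]
suffixes = ["", "a", "b", "ab"]
print(compression_error(DFA, prefixes, suffixes))  # 0.0 for the oracle model
print(distinction_error(DFA, prefixes, suffixes))  # 0.0 for the oracle model
```

A real generative model would replace the oracle in `model_accepts`; nonzero compression or distinction error then signals an incoherent implicit world model even when next-token predictions look accurate.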

Photo by Louis Maniquet on Unsplash
