
Process Supervision

This week, we’ll explore a recent paper from OpenAI, the lab-to-watch with the biggest and best models. It's a pretty good fit for their proposed Superalignment approach.

At our session on Representation Engineering, we had an argument about symbols and referents—when an AI says "I am a corrigible AI and I will let you turn me off," what does that mean? Is it being truthful? This paper, mostly targeted at capabilities, aims to get models to produce 'aligned' chain-of-thought reasoning. Is writing “aligned words” equivalent to being aligned? Are we who we pretend to be?

In recent years, large language models have greatly improved in their ability to perform complex multi-step reasoning. However, even state-of-the-art models still regularly produce logical mistakes. To train more reliable models, we can turn either to outcome supervision, which provides feedback for a final result, or process supervision, which provides feedback for each intermediate reasoning step.

Given the importance of training reliable models, and given the high cost of human feedback, it is important to carefully compare both methods. Recent work has already begun this comparison, but many questions still remain. We conduct our own investigation, finding that process supervision significantly outperforms outcome supervision for training models to solve problems from the challenging MATH dataset.

Our process-supervised model solves 78% of problems from a representative subset of the MATH test set. Additionally, we show that active learning significantly improves the efficacy of process supervision. To support related research, we also release PRM800K, the complete dataset of 800,000 step-level human feedback labels used to train our best reward model.

Let's Verify Step by Step, Lightman et al. 2023
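
To make the outcome-versus-process distinction from the abstract concrete before the session, here's a toy sketch in Python. Everything in it is an illustrative assumption: the example problem, the per-step probabilities (made-up numbers standing in for a reward model's output), and the helper names `outcome_label`, `process_score`, and `first_flagged_step`. Multiplying per-step probabilities is one simple way to turn step-level feedback into a single solution score; it is not a reproduction of the paper's code.

```python
from math import prod

# Hypothetical solution, split into reasoning steps. Each step carries a
# made-up probability that the step is correct (a stand-in for what a
# process reward model might output).
solution_steps = [
    ("Let x be the number of apples, so 3x + 2 = 11.", 0.98),
    ("Subtract 2 from both sides: 3x = 9.", 0.97),
    ("Divide by 3: x = 4.", 0.05),  # the arithmetic slip happens here
]
final_answer = "4"
reference_answer = "3"


def outcome_label(final_answer: str, reference_answer: str) -> bool:
    """Outcome supervision: one label for the whole solution."""
    return final_answer == reference_answer


def process_score(step_probs: list[float]) -> float:
    """Process supervision at scoring time: combine per-step probabilities.

    The product is the probability that every step is correct, under the
    simplifying assumption that steps are judged independently.
    """
    return prod(step_probs)


def first_flagged_step(step_probs: list[float], threshold: float = 0.5):
    """Return the index of the first step scored below threshold, or None."""
    for index, prob in enumerate(step_probs):
        if prob < threshold:
            return index
    return None


if __name__ == "__main__":
    probs = [p for _, p in solution_steps]
    print("outcome label:", outcome_label(final_answer, reference_answer))
    print("process score:", process_score(probs))
    print("first flagged step:", first_flagged_step(probs))
```

The point of the sketch: the outcome label only tells us the solution is wrong, while the step-level scores also say where it went wrong. That extra localization is the kind of feedback process supervision provides.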
