In late 2024 and early 2025, the release of OpenAI's reasoning models o1 and o3 sent shockwaves through the AI world. The heads of OpenAI, Anthropic, and Google DeepMind all went on record predicting human-level AI within just a few years. By the end of 2025, however, median forecasts on Metaculus had shifted outward by 2.5 years, ending up even later than their pre-2025 baselines. What happened?
In our second benkyoukai this month, we'll work through Rob Wiblin's analysis of the 2025 timeline rollercoaster. We'll examine why the initial excitement faded: why reasoning gains failed to generalize beyond checkable domains like math and coding, why most benchmark improvements appear to have come from expensive inference scaling rather than training, and why rising agent costs may mean headline progress metrics overstate practical gains. Join us to get grounded on where things actually stand and what it means for AI Safety.
"Models keep getting more impressive at the rate the short timelines people predict, but more useful at the rate the long timelines people predict."
— Dwarkesh Patel"Behind all of the noisy swings in sentiment — people suddenly becoming optimistic and then horribly pessimistic — AI has neither gone to the moon nor has it hit a wall. Instead, it's just gradually but relentlessly been getting a bit more useful every month for years now."
— Rob Wiblin, What the hell happened with AGI timelines in 2025?, 80,000 Hours Podcast (2026)