“I underestimated AI capabilities (again)” by Ajeya

Published 1 month, 3 weeks ago
Description

Note: This post was crossposted from Planned Obsolescence by the Forum team, with the author's permission. The author may not see or respond to comments on this post.

Revisiting a prediction ten months early

On Jan 14th, I made predictions about AI progress in 2026. My forecasts for software engineering capabilities already feel much too conservative.

In my view, METR (where I now work) has some of the hardest and highest-quality software engineering and ML engineering benchmarks out there, and the most useful framework for making benchmark performance intuitive: we measure a task's difficulty by the amount of time a human expert would take to complete it (called the “time horizon”).[1]
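The time-horizon framing above can be made concrete with a small sketch: fit a logistic curve of success probability against the log of the human completion time, then read off the task length at which predicted success crosses 50%. This is an illustrative simplification with made-up data, not METR's actual estimation pipeline; the variable names and numbers are hypothetical.

```python
import math

# Hypothetical benchmark results: (human_time_minutes, model_succeeded).
results = [(2, 1), (8, 1), (15, 1), (30, 1), (60, 1),
           (120, 0), (240, 1), (480, 0), (960, 0)]

# Fit p(success) = sigmoid(a + b * log2(time)) by plain gradient descent
# on the logistic log-loss (a toy stand-in for a proper MLE routine).
a, b = 0.0, 0.0
for _ in range(20000):
    grad_a = grad_b = 0.0
    for t, y in results:
        x = math.log2(t)
        p = 1 / (1 + math.exp(-(a + b * x)))
        grad_a += p - y
        grad_b += (p - y) * x
    a -= 0.01 * grad_a
    b -= 0.01 * grad_b

# The 50% time horizon is the t where a + b * log2(t) = 0.
horizon = 2 ** (-a / b)
print(f"estimated 50% time horizon: {horizon:.0f} minutes")
```

With this toy data the fitted slope is negative (longer tasks fail more often) and the estimated horizon lands between the longest consistent success and the cluster of failures; real estimates would use many more tasks and a proper fitting library.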

When I made my forecasts last month, the model with the longest measured time horizon on METR's suite of software engineering tasks was Claude Opus 4.5; it could succeed around half the time at software tasks that would take a human software engineer about five hours.[2] Time horizons on software tasks had been doubling a little less than twice a year from 2019 through 2025, which would have implied the state-of-the-art 50% time horizon should be somewhat less than 20 hours by the end of 2026.[3] But there [...]
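The extrapolation in the paragraph above is a quick exponential-growth calculation; a minimal sketch, using the post's approximate figures (a ~5-hour horizon at the start of 2026, doubling a bit under twice a year):

```python
# Illustrative numbers from the post; the true doubling rate is
# "a little less than twice a year", so 2/year is an upper bound.
doublings_per_year = 2
start_horizon_hours = 5   # state-of-the-art 50% time horizon, Jan 2026
years_ahead = 1           # Jan 2026 -> end of 2026

projected = start_horizon_hours * 2 ** (doublings_per_year * years_ahead)
print(projected)  # 5 h * 2**2 = 20 h
```

Since the actual doubling rate is slightly below two per year, the implied end-of-2026 horizon is "somewhat less than 20 hours", matching the text.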

---

First published:
March 5th, 2026

Source:
https://forum.effectivealtruism.org/posts/h3zcCh49kLE6egrfz/i-underestimated-ai-capabilities-again

---

Narrated by TYPE III AUDIO.
