Episode Details
“Retrying vs Resampling in AI Control” by james.lucassen, Adam Kaufman
Description
We’ve just released a new paper: Retrying vs Resampling in AI Control. We revisit the resampling protocols introduced in Ctrl-Z with an up-to-date setting and much stronger models, and compare them against “retrying” protocols similar to Claude Code auto mode or Codex Auto-review.
Motivation
Roughly a year ago we released Ctrl-Z, the first paper to study control techniques for agents. A headline result of that paper was the performance of resample protocols – strategies that involve taking multiple i.i.d. samples from the model per step. But since Ctrl-Z, models have gotten much stronger, and we have built more sophisticated control settings to keep up. We wanted to answer the following questions:
How well do the results from Ctrl-Z hold up with better models and a better setting? Current high-stakes control research tries to learn, by analogy, how to do control effectively in a real high-stakes deployment during a real intelligence explosion. Findings about technique performance will have to generalize pretty far to be useful.
If the resample protocols from Ctrl-Z still work, what makes them work? One way we try to make our work more generalizable is by understanding the dynamics governing outcomes [...]
---
Outline:
(00:30) Motivation
(02:36) TL;DR Takeaways
(03:55) Figure: bar graph showing safety percentages across different monitoring and resampling methods
(04:36) Methodology
(07:16) Differences from Ctrl-Z
(12:11) Are Retrying Protocols Exploitable?
(15:24) Cost and Latency of Resampling
(17:22) Conclusion
The original text contained 7 footnotes which were omitted from this narration.
---
First published:
May 29th, 2026
Source:
https://www.lesswrong.com/posts/yThJQTJxtmKNeZGwA/retrying-vs-resampling-in-ai-control
---
Narrated by TYPE III AUDIO.