Podcast Episodes
“The Unintelligibility is Ours: Notes on Chain-of-Thought” by 1a3orn
Many people seem to think that the chains-of-thought in RL-trained LLMs are under a great deal of "pressure" to cease being English. The idea is tha…
1 week, 4 days ago
“If Mythos actually made Anthropic employees 4x more productive, I would radically shorten my timelines” by ryan_greenblatt
Anthropic's system card for Mythos Preview says:
It's unclear how we should interpret this. What do they mean by productivity uplift? To what exten…
1 week, 4 days ago
“Claude Mythos #2: Cybersecurity and Project Glasswing” by Zvi
Anthropic is not going to release its new most capable model, Claude Mythos, to the public any time soon. Its cyber capabilities are too dangerous t…
1 week, 4 days ago
“Why Control Creates Conflict, and When to Open Instead” by plex
tl;dr: with multiple agents, control attempts tend to create conflict, because control attempts shut down communications channels, which leads to fe…
1 week, 4 days ago
“Reproducing steering against evaluation awareness in a large open-weight model” by Thomas Read, Bronson Schoen, Joseph Bloom
Produced as part of the UK AISI Model Transparency Team. Our team works on ensuring models don't subvert safety assessments, e.g. through evaluation…
1 week, 4 days ago
“Have we already lost? Part 2: Reasons for Doom” by LawrenceC
Written very quickly for the Inkhaven Residency.
As I take the time to reflect on the state of AI Safety in early 2026, one question feels unavoidab…
1 week, 4 days ago
“Model organisms researchers should check whether high LRs defeat their model organisms” by dx26, Sebastian Prasanna, Alek Westover, Vivek Hebbar, Julian Stastny
Thanks to Buck Shlegeris for feedback on a draft of this post.
The goal-guarding hypothesis states that schemers will be able to preserve their goal…
1 week, 5 days ago
“Anthropic did not publish a “risk discussion” of Mythos when required by their RSP” by RobertM
I and some other people noticed a potential discrepancy in Anthropic's announcement of Claude Mythos. The version of the RSP that was operative over…
1 week, 5 days ago
“Claude Mythos: The System Card” by Zvi
Claude Mythos is different.
This is the first model other than GPT-2 that is at first not being released for public use at all.
With GPT-2 the del…
1 week, 5 days ago
“Some takes on UV & cancer” by Steven Byrnes
Table of contents:
Part 1: In which I use my optical physics background to share some hopefully-uncontroversial observations
Part 2: In which I boldl…
1 week, 5 days ago