Podcast Episodes

“The Unintelligibility is Ours: Notes on Chain-of-Thought” by 1a3orn

Many people seem to think that the chains-of-thought in RL-trained LLMs are under a great deal of "pressure" to cease being English. The idea is tha…

1 week, 4 days ago

“If Mythos actually made Anthropic employees 4x more productive, I would radically shorten my timelines” by ryan_greenblatt

Anthropic's system card for Mythos Preview says:

It's unclear how we should interpret this. What do they mean by productivity uplift? To what exten…

1 week, 4 days ago

“Claude Mythos #2: Cybersecurity and Project Glasswing” by Zvi

Anthropic is not going to release its new most capable model, Claude Mythos, to the public any time soon. Its cyber capabilities are too dangerous t…

1 week, 4 days ago

“Why Control Creates Conflict, and When to Open Instead” by plex

tl;dr: with multiple agents, control attempts tend to create conflict, because control attempts shut down communications channels, which leads to fe…

1 week, 4 days ago

“Reproducing steering against evaluation awareness in a large open-weight model” by Thomas Read, Bronson Schoen, Joseph Bloom

Produced as part of the UK AISI Model Transparency Team. Our team works on ensuring models don't subvert safety assessments, e.g. through evaluation…

1 week, 4 days ago

“Have we already lost? Part 2: Reasons for Doom” by LawrenceC

Written very quickly for the Inkhaven Residency.

As I take the time to reflect on the state of AI Safety in early 2026, one question feels unavoidab…

1 week, 4 days ago

“Model organisms researchers should check whether high LRs defeat their model organisms” by dx26, Sebastian Prasanna, Alek Westover, Vivek Hebbar, Julian Stastny

Thanks to Buck Shlegeris for feedback on a draft of this post.

The goal-guarding hypothesis states that schemers will be able to preserve their goal…

1 week, 5 days ago

“Anthropic did not publish a “risk discussion” of Mythos when required by their RSP” by RobertM

I and some other people noticed a potential discrepancy in Anthropic's announcement of Claude Mythos. The version of the RSP that was operative over…

1 week, 5 days ago

“Claude Mythos: The System Card” by Zvi

Claude Mythos is different.

This is the first model other than GPT-2 that is at first not being released for public use at all.

With GPT-2 the del…

1 week, 5 days ago

“Some takes on UV & cancer” by Steven Byrnes

Table of contents:

Part 1: In which I use my optical physics background to share some hopefully-uncontroversial observations
Part 2: In which I boldl…

1 week, 5 days ago
