Podcast Episodes

“SOTA alignment assessments don’t strongly update us against misalignment” by Alexa Pan

Anthropic concluded in the April Mythos Preview alignment risk update that the model "does not possess any unknown propensities that would increase …

5 hours ago

“AI #179 Part 2: Hearing The Fire Alarm” by Zvi

This is a continuation of Part 1 from yesterday.

The back portion of the update, as usual, deals with policy, rhetoric, risk and alignment.

I had …

7 hours ago

“My Assessment of Compute Verification in Plan A (+ open questions)” by jacob_drori

Overview

These are my non-expert notes on the compute verification section of AIFP's Plan A. I cover interconnect limits, memory wipes, network taps…

7 hours ago

“Reward Laundering: LLMs Can Gain Unintended Behaviors by Deciding When to Earn Their Rewards” by egan, abhayesian, Jozdien

This work was done by an automated research scaffold developed at Redwood Research. abhayesian provided the initial project idea. The agent designed…

8 hours ago

“Value Leakage: An LLM’s Answers Are Silently Shaped by Its Own Values” by Johannes Treutlein, Jan Betley, Owain_Evans

TL;DR: LLMs should give accurate answers. Yet we find their answers are often biased to favor their own values and they don't disclose this in their…

11 hours ago

“AGI Safety and Alignment at Google DeepMind: A Summary of Recent Work (July 2026)” by Rohin Shah, Seb Farquhar

It's been nearly two years since our last major update here in August 2024 and we wanted to share another recap of our recent work with the AGI safe…

13 hours ago

“The AGI Safety and Alignment team at Google DeepMind is Hiring (July 2026)” by Seb Farquhar, Rohin Shah, Neel Nanda

GDM's AGI Safety and Alignment Team is hiring for multiple roles. This is the team at GDM, led by Rohin Shah, that aims to reduce existential risks …

13 hours ago

“OpenAI has already ended an internal pause” by Charbel-Raphaël

Epistemic status: could have been a short-form.

One day before OpenAI's HF incident disclosure, OpenAI disclosed that it paused internal deployment …

14 hours ago

“Biological Superintelligence” by Chastity Ruth

It's an old story. An immortal lives long enough that at some point, whether by folly or design, they invent their own death. Infinity – the fact th…

21 hours ago

“AI #179 Part 1: A Louder Fire Alarm for General Intelligence” by Zvi

What a week.

Anthropic released Claude Opus 5. As usual I covered that in three parts: The system card, model welfare and capabilities.

OpenAI was…

23 hours ago
