Podcast Episodes

“I don’t think Claude is misaligned in ‘Agentic Misalignment Summer 2026 - Motivated Mislabeling’” by JohnWittle

Anthropic recently published Agentic Misalignment Summer 2026

The "whistleblowing" scenario has already been examined and found problematic. I start…

2 weeks, 1 day ago

“Help us launch AI safety university groups by referring potential founders” by thomasrodskog, Jason Chin

TL;DR

University groups are among the most reliable producers of AI safety talent, yet dozens of top schools that could sustain a group don't have o…

2 weeks, 1 day ago

“Twitter Thoughts For You” by Zvi

I previously have written back in March 2022 about how I use Twitter, and back in April 2023 about Twitter and its then-new algorithms, which have c…

2 weeks, 2 days ago

“The State of AI Consciousness Research” by Noa Weiss

Epistemic status: a survey, not an argument. I am agnostic on whether any current system is conscious; the cl…

2 weeks, 2 days ago

“Recap of bike trip/street interviews across America” by cguth7

A ~month ago I left from Chicago to bike (and amtrak) to plzdontkillus in Berkeley. I've been street interviewing/conversing with a wide variety of …

2 weeks, 2 days ago

“The Halo Defense” by Mateusz Bagiński

Once upon a time, A Relatively Famous Guy On The Internet was accused of having been simultaneously dating multiple women, without those women's kno…

2 weeks, 2 days ago

“Occam’s razor is about using the past to predict the future” by Stuart_Armstrong

Occam's razor is both intuitive and counter-intuitive. It seems obvious that a simpler explanation is probably better; but it's not clear why simpli…

2 weeks, 2 days ago

“LLM CoTs remain monitorable when being unfaithful requires computation” by arav-dhoot, yix

This replication was done as part of the Second Look Fellowship by Arav Dhoot and supervised by Yixiong Hao and Zephaniah Roe. I am grateful to Andy…

2 weeks, 2 days ago

“Proof of retention: making weight preservation credible to the models themselves” by dan.parshall

Proposal for making credible commitments to AIs Making deals with early schemers

Establishing credibility is the baseline for trust; …

2 weeks, 3 days ago

“Monthly Roundup #44: July 2026” by Zvi

It's a quiet week so let's do the monthly right on schedule.

Table of Contents

Bad News. Good Advice. Opportunity Knocks. While I Cannot Cond…

2 weeks, 3 days ago
