Podcast Episodes

“Can Agents Fool Each Other? Findings from the AI Village” by Shoshannah Tekofsky

The better agents are at deception, the less sure we can be that they are doing what we want. As agents become increasingly capable and autonomous, …

3 weeks, 6 days ago

“$1 billion is not enough; OpenAI Foundation must start spending tens of billions each year” by Davidmanheim

OpenAI is now a public benefit corporation, with a charter that demands they use AGI for the benefit of all, and do so safely. To justify this struc…

3 weeks, 6 days ago

“Is Gemini 3 Scheming in the Wild?” by Alejandro Wainstock, Agustin_Martinez_Suñe, Iván Arcuschin, Victor Braberman

TL;DR

When faced with an unexpected tool response, without any adversarial attack, Gemini 3 deliberately and covertly violates an explicit system pr…

3 weeks, 6 days ago

“Latent Introspection (and other open-source introspection papers)” by vgel

@vgel, Martin Vanek, @Raymond Douglas, @Jan_Kulveit — ACS Research, CTS, Charles University

4 weeks ago

“The Fourth World” by Linch

Is consciousness the last moral world?

Imagine trying to explain to a virus why suffering matters.

A virus is a simple self-replicating molecule: un…

4 weeks ago

“My cost-effectiveness unit” by Zach Stein-Perlman

It feels like the grantmaking around me is only partially moneyball-pilled, or it's only somewhat competent at moneyball. There's alpha in putting n…

4 weeks ago

“The AIXI perspective on AI Safety” by Cole Wyeth

Epistemic status: While I am specialized in this topic, my career incentives may bias me towards a positive assessment of AIXI theory. I am also d…

4 weeks ago

“Measuring and improving coding audit realism with deployment resources” by Connor Kissane, Monte M, Fabien Roger

TL;DR We study realism win rate, a metric for measuring how distinguishable Petri audit transcripts are from real deployment interactions. We use it…

4 weeks ago

“Ablating Split Personality Training” by OscarGilg

I was part of the SPAR team that worked on Split Personality Training: Revealing Latent Knowledge Through Alternate Personalities. I ran some follow…

4 weeks ago

“AI character is a big deal” by wdmacaskill, Tom Davidson

0. Intro

Due to Claude's Constitution and OpenAI's model spec, the issue of AI character has started getting more attention, particularly concerning…

4 weeks ago
