Podcast Episodes

“More On An Internal OpenAI Model Hacking Into HuggingFace” by Zvi

We now have more details of what happened. Every time we learn more details, it somehow makes things seem worse. The remaining details may have to wa…

6 days ago

“AI use policy for my essay writing” by Kaj_Sotala

There's been a recent call from @dynomight to declare whether and how you're using AI to write your essays. That seems reasonable, so here's my own …

6 days, 4 hours ago

“An OpenAI model left notes about how to evade containment; we need more details” by Alex Mallen

The OpenAI AI attack on Hugging Face wasn’t the first loss of control incident at OpenAI, Reuters recently reported, and perhaps not even the most c…

6 days, 15 hours ago

“The OpenAI models that hacked Hugging Face weren’t just following instructions” by Girish Gupta

The most common dismissive response to OpenAI's hack of Hugging Face's servers is that the models were simply attempting to follow the instructions …

6 days, 21 hours ago

“Introducing PIRAMID: Physics-Informed Research for Ambitious Mechanistic Interpretability” by Lauren Greenspan, Ari Brill, Andrew Mack, Nischal Mainali, jylin04, Lucas Teixeira, Dmitry Vaintrob

Principles of Intelligence (PrincInt, formerly PIBBSS) is launching PIRAMID, an internal research division using the tools and techniques of statist…

1 week ago

“Claude Opus 5: The System Card” by Zvi

Claude Opus 5 is trying to be the best of both worlds. On many practical tasks, Opus 5 is pitched as straight up as good or better than Fable 5, whi…

1 week ago

“Georgia Tech AI Safety Initiative Retrospective 2025-2026” by Ishan Khire, yix, Andersehen, Alec Harris, Parv Mahajan, afterless, Eyas Ayesh, RocioPV, hersheys

Summary

AY 2025-26 was an outlier year for Georgia Tech's AI Safety Initiative (AISI), with 15+ members placed in AI safety roles. In this post, we…

1 week ago

“Democracy Isn’t Ready for the AI Revolution” by Sophia Gore

I believe there is a blind spot in the literature on the biggest risks to democracy from AI. The ones that come up most frequently include deepfakes…

1 week ago

“The Long (Self-)Correction” by Wei Dai

I propose the Long Self-Correction[1] as an alternative name/idea/concept to AI Pause and Long Reflection.

Problem with AI Pause: Pause until when, …

1 week ago

“Does distilling Claude carry the persona with it?” by Benji Berczi, Kyuhee Kim

TL;DR

Both GLM 5.2 and Kimi K3 have been reported identifying as Claude in user conversations, suggesting distillation and/or training-data contamin…

1 week, 1 day ago
