Podcast Episodes

“Training Deliberative Monitors for Black-Box Scheming Detection” by aksh-n, adityasinha, Victor Gillioz, Simon Storf, Kilian Merkelbach, richbc, Axel Højmark, Marius Hobbhahn

Paper: https://arxiv.org/abs/2605.29601

Thread: https://x.com/aksh_n0/status/2062568855814193497

TL;DR: Training small open-weight monitors provides…

1 day, 10 hours ago

“Lab Leaks, Black Holes, and Eggs: Epistemic Case Study Competition” by Oliver Sourbut, Josh Jacobson, Future of Life Foundation (FLF)

FLF is running a competition to find the best workflows and methodologies for using AI to produce reliable, trustworthy knowledge bases, grounded in…

1 day, 17 hours ago

“(Mis)generalization of Helpful-Only Fine-tuning” by Omar Khursheed, Baram Sosis, Fabien Roger

TLDR

We study the shortcomings of existing helpful-only models. We find that some show emergent misalignment, others have residual refusal behaviors…

1 day, 18 hours ago

“AI #171: False Flag” by Zvi

This was the week of Claude Opus 4.8. I covered the model card, then model welfare concerns, and finally capabilities and reactions. It's a good mod…

1 day, 19 hours ago

“Building Better Activation Oracles” by ceselder, jan_bauer, Niclas Luick, Adam Karvonen, Neel Nanda

Work done for our MATS 10.0 Sprint project - mentored by Neel Nanda and Adam Karvonen

Hugging Face, GitHub

TL;DR: We have improved the original Activ…

1 day, 21 hours ago

“Rohin Shah on AGI Safety” by anaguma

Rohin Shah recently had an interview on 80,000 Hours on his views on AGI Safety and his work at Google DeepMind. I'm posting the transcript below to …

1 day, 21 hours ago

“Sixteen schemes for AI safety” by Austin Chen

These days, I often run across whippersnappers excited to do something for AI safety — but aren’t quite sure what. One of the fun things about the F…

1 day, 21 hours ago

“Don’t Edit Your Ideas Before Having Them” by Hide

Editing is far easier than writing. You can usually look at a finished product and notice its flaws in a single read-through. “This section is a bit…

2 days, 19 hours ago

“Trump Signs Executive Order For AI Testing Prior To Frontier Model Releases” by Zvi

Last week we were expecting an Executive Order on Thursday.

Then Trump cancelled it, and said he wouldn’t sign it because he was worried it would b…

2 days, 19 hours ago

“Society Explained: a tool for efficiently exploring >100 theories of society” by spencerg

There are many competing theories of how society does and should function, from Karl Marx and Adam Smith to Steven Pinker and Eliezer Yudkowsky. The…

2 days, 21 hours ago
