Podcast Episodes

“Preparing for Warning Shots to Catalyze International Cooperation on AGI Risks” by Mark Kagach ☘️, EliasSchlie, Thomas Van Damme, JustinShovelain

Summary

This is a write-up on preparing for warning shots to catalyze international cooperation on AGI risks, and the corollary list of projects one…

8 hours ago

“Beyond the lexical personality traits: What is the structure of personality?” by tailcalled

This is a description of the methodology behind the latest iteration of my Targeted Personality Test. Feel free to take it either before or after re…

10 hours ago

“Logits as a new monitor for evaluation awareness” by Santiago Aranguri

TL;DR:

We build a logit monitor for eval awareness: throughout the CoT, we estimate an LLM's probability of producing an eval-aware sentence. The log…

15 hours ago

“My research agenda and work” by Seth Herd

This is a summary of the work I've done and work I plan to do, and the theories of change and AI progress that motivate my work. I've been working f…

15 hours ago

“One Year of PauseAI UK” by Joseph Miller, PauseAI UK

About one year ago, I started spending most of my time organising PauseAI UK. At that time our largest protest had seen fewer than 50 attendees, no …

19 hours ago

“Learnings from starting an AI safety research team” by draganover, Erin Robertson

This post's goal is to distill our takeaways from building a research team (somewhat) from scratch over the past four months. We describe some conte…

21 hours ago

“Training Deliberative Monitors for Black-Box Scheming Detection” by aksh-n, adityasinha, Victor Gillioz, Simon Storf, Kilian Merkelbach, richbc, Axel Højmark, Marius Hobbhahn

Paper: https://arxiv.org/abs/2605.29601

Thread: https://x.com/aksh_n0/status/2062568855814193497

TL;DR: Training small open-weight monitors provides…

1 day, 6 hours ago

“Lab Leaks, Black Holes, and Eggs: Epistemic Case Study Competition” by Oliver Sourbut, Josh Jacobson, Future of Life Foundation (FLF)

FLF is running a competition to find the best workflows and methodologies for using AI to produce reliable, trustworthy knowledge bases, grounded in…

1 day, 13 hours ago

“(Mis)generalization of Helpful-Only Fine-tuning” by Omar Khursheed, Baram Sosis, Fabien Roger

TLDR

We study the shortcomings of existing helpful-only models. We find that some show emergent misalignment, others have residual refusal behaviors…

1 day, 14 hours ago

“AI #171: False Flag” by Zvi

This was the week of Claude Opus 4.8. I covered the model card, then model welfare concerns, and finally capabilities and reactions. It's a good mod…

1 day, 16 hours ago
