Podcast Episodes
“Risk from fitness-seeking AIs: mechanisms and mitigations” by Alex Mallen
Current AIs routinely take unintended actions to score well on tasks: hardcoding test cases, training on the test set, downplaying issues, etc. This…
1 month ago
“Sanity-checking “Incompressible Knowledge Probes”” by Sturb, LawrenceC
Or, did a chief scientist of an AI assistant startup conclusively show that GPT-5.5 has 9.7 trillion parameters?
Introduction
Recently, a paper was …
1 month ago
“AI unemployment and AI extinction are often the same” by KatjaGrace
My sense is that people think of AI existential risk and AI unemployment as distinct issues.
Some people are extremely concerned about extinction a…
1 month ago
“AI risk was not invented by AI CEOs to hype their companies” by KatjaGrace
I hear that many people believe that the idea of advanced AI threatening human existence was invented by AI CEOs to hype their products. I’ve even b…
1 month ago
“Cyborg evals” by Eye You, frmsaul
The low-background steel problem
Modern steel is slightly radioactive. We did a lot of atomic testing in the 40s and 50s, and now our atmosphere has…
1 month ago
“To what extent is Qwen3-32B predicting its persona?” by Arjun Khandelwal, ryan_greenblatt, Alex Mallen
TL;DR
We test to what extent Qwen3-32B behaves as though it is trying to predict what "Qwen3" would do. We do this by using Synthetic Document Finet…
1 month ago
“Research Sabotage in ML Codebases” by egan
One of the main hopes for AI safety is using AIs to automate AI safety research. However, if models are misaligned, then they may sabotage the safet…
1 month, 1 week ago
“Maybe I was too harsh on deep learning theory (three days ago)” by LawrenceC
A few days ago, I reviewed a paper titled “There Will Be a Scientific Theory of Deep Learning”. In it, I expressed appreciation for the authors for …
1 month, 1 week ago
“Notes on Transformer Consciousness” by slavachalnev
Assuming transformers can have conscious experience, what would that experience be like?
Transformers[1] are a structured grid of layers and token p…
1 month, 1 week ago
“On today’s panel with Bernie Sanders” by David Scott Krueger
It's sort of easy to forget how close Bernie Sanders was to becoming the most powerful person in the world. The world we live in feels so much not l…
1 month, 1 week ago