Podcast Episodes
"How I stopped being sure LLMs are just making up their internal experience (but the topic is still confusing)" by Kaj_Sotala
How it started
I used to think that anything that LLMs said about having something like subjective experience or what it felt like on the inside was…
2 months, 1 week ago
“My AGI safety research—2025 review, ’26 plans” by Steven Byrnes
Previous: 2024, 2022
“Our greatest fear should not be of failure, but of succeeding at something that doesn't really matter.” –attributed to DL Mood…
2 months, 1 week ago
“Weird Generalization & Inductive Backdoors” by Jorio Cocola, Owain_Evans, dylan_f
This is the abstract and introduction of our new paper.
Links: 📜 Paper, 🐦 Twitter thread, 🌐 Project page, 💻 Code
Authors: Jan Betley*, Jorio Cocola…
2 months, 1 week ago
“Insights into Claude Opus 4.5 from Pokémon” by Julian Bradshaw
Credit: Nano Banana, with some text provided. You may be surprised to learn that ClaudePlaysPokemon is still running today, and that Claude still has…
2 months, 2 weeks ago
“The funding conversation we left unfinished” by jenn
People working in the AI industry are making stupid amounts of money, and word on the street is that Anthropic is going to have some sort of liquidi…
2 months, 2 weeks ago
“The behavioral selection model for predicting AI motivations” by Alex Mallen, Buck
Highly capable AI systems might end up deciding the future. Understanding what will drive those decisions is therefore one of the most important que…
2 months, 2 weeks ago
“Little Echo” by Zvi
I believe that we will win.
An echo of an old ad for the 2014 US men's World Cup team. It did not win.
I was in Berkeley for the 2025 Secular Solsti…
2 months, 2 weeks ago
“A Pragmatic Vision for Interpretability” by Neel Nanda
Executive Summary
The Google DeepMind mechanistic interpretability team has made a strategic pivot over the past year, from ambitious reverse-engin…
2 months, 2 weeks ago
“AI in 2025: gestalt” by technicalities
This is the editorial for this year's "Shallow Review of AI Safety". (It got long enough to stand alone.)
Epistemic status: subjective impressions …
2 months, 2 weeks ago
“Eliezer’s Unteachable Methods of Sanity” by Eliezer Yudkowsky
"How are you coping with the end of the world?" journalists sometimes ask me, and the true answer is something they have no hope of understanding an…
2 months, 2 weeks ago