Podcast Episodes
“Protecting Cognitive Integrity: Our internal AI use policy (V1)” by Tom DAVID
We (at GPAI Policy Lab) wanted to share our V1 policy as an invitation to argue about it. Some of what motivates it is extrapolation and conversati…
1 month, 1 week ago
“Methodology for inferring propensities of LLMs” by Olli Järviniemi
Our team at UK AISI has released a paper on inferring LLM propensities for undesired behaviour.
I view this primarily as a methodology paper, and in…
1 month, 1 week ago
“vLLM-Lens: Fast Interpretability Tooling That Scales to Trillion-Parameter Models” by Alan Cooney, Sid Black
TL;DR: vLLM-Lens is a vLLM plugin for top-down interpretability techniques[1] such as probes, steering, and activation oracles. We benchmarked it as…
1 month, 1 week ago
“What Happens When a Model Thinks It Is AGI?” by josh :), David Africa
TL;DR
We fine-tuned models to claim they are AGI or ASI, then evaluated them in Petri in multi-turn settings with tool use. On GPT-4.1, this produced…
1 month, 1 week ago
“Should We Train Against (CoT) Monitors?” by RohanS
The question I actually try to answer in this post is a broader one (that doesn't work as well as a title): Should we incorporate proxies for desire…
1 month, 1 week ago
“If Everyone Reads It, Nobody Dies - Course Launch” by Luc Brinkman, Chris-Lons
tl;dr: Lens Academy offers a new course introducing ASI x-risk for AI safety newcomers, centered around the book IABIED. We share our hypothesis of …
1 month, 1 week ago
“Does your AI perform badly because you — you, specifically — are a bad person” by Natalie Cargill
Claude really got me lately.
I’d given it an elaborate prompt in an attempt to summon an AGI-level answer to my third-grade level question. Embarras…
1 month, 1 week ago
“A “Lay” Introduction to “On the Complexity of Neural Computation in Superposition”” by LawrenceC
This is a writeup based on a lightning talk I gave at an InkHaven hosted by Georgia Ray, where we were supposed to read a paper in about an hour, an…
1 month, 1 week ago
“AI #165: In Our Image” by Zvi
This was the week of Claude Opus 4.7.
The reception was more mixed than usual. It clearly has the intelligence and chops, especially for coding tas…
1 month, 1 week ago
“An Angry Review of Greg Egan’s “Didicosm”” by LawrenceC
I rarely find that reading fiction makes me upset. Normally, I only get worked up when high-profile people publish bad machine research that is then…
1 month, 1 week ago