Podcast Episodes

“Protecting Cognitive Integrity: Our internal AI use policy (V1)” by Tom DAVID

We (at GPAI Policy Lab) wanted to share our V1 policy as an invitation to argue about it. Some of what motivates it is extrapolation and conversati…

1 month, 1 week ago

“Methodology for inferring propensities of LLMs” by Olli Järviniemi

Our team at UK AISI has released a paper on inferring LLM propensities for undesired behaviour.

I view this primarily as a methodology paper, and in…

1 month, 1 week ago

“vLLM-Lens: Fast Interpretability Tooling That Scales to Trillion-Parameter Models” by Alan Cooney, Sid Black

TL;DR: vLLM-Lens is a vLLM plugin for top-down interpretability techniques[1] such as probes, steering, and activation oracles. We benchmarked it as…

1 month, 1 week ago

“What Happens When a Model Thinks It Is AGI?” by josh :), David Africa

TL;DR

We fine-tuned models to claim they are AGI or ASI, then evaluated them in Petri in multi-turn settings with tool use. On GPT-4.1, this produced…

1 month, 1 week ago

“Should We Train Against (CoT) Monitors?” by RohanS

The question I actually try to answer in this post is a broader one (that doesn't work as well as a title): Should we incorporate proxies for desire…

1 month, 1 week ago

“If Everyone Reads It, Nobody Dies - Course Launch” by Luc Brinkman, Chris-Lons

tl;dr: Lens Academy offers a new course introducing ASI x-risk for AI safety newcomers, centered around the book IABIED. We share our hypothesis of …

1 month, 1 week ago

“Does your AI perform badly because you — you, specifically — are a bad person” by Natalie Cargill

Claude really got me lately.

I’d given it an elaborate prompt in an attempt to summon an AGI-level answer to my third-grade level question. Embarras…

1 month, 1 week ago

“A “Lay” Introduction to “On the Complexity of Neural Computation in Superposition”” by LawrenceC

This is a writeup based on a lightning talk I gave at an InkHaven hosted by Georgia Ray, where we were supposed to read a paper in about an hour, an…

1 month, 1 week ago

“AI #165: In Our Image” by Zvi

This was the week of Claude Opus 4.7.

The reception was more mixed than usual. It clearly has the intelligence and chops, especially for coding tas…

1 month, 1 week ago

“An Angry Review of Greg Egan’s “Didicosm”” by LawrenceC

I rarely find that reading fiction makes me upset. Normally, I only get worked up when high-profile people publish bad machine research that is then…

1 month, 1 week ago
