Podcast Episodes

"Measuring no CoT math time horizon (single forward pass)" by ryan_greenblatt

A key risk factor for scheming (and misalignment more generally) is opaque reasoning ability. One proxy for this is how good AIs are at solving math …

6 months, 4 weeks ago

"Recent LLMs can use filler tokens or problem repeats to improve (no-CoT) math performance" by ryan_greenblatt

Prior results have shown that LLMs released before 2024 can't leverage 'filler tokens'—unrelated tokens prior to the model's final answer—to perform…

7 months ago

"Turning 20 in the probable pre-apocalypse" by Parv Mahajan

Master version of this on https://parvmahajan.com/2025/12/21/turning-20.html

I turn 20 in January, and the world looks very strange. Probably, thin…

7 months ago

"Alignment Pretraining: AI Discourse Causes Self-Fulfilling (Mis)alignment" by Cam, Puria Radmard, Kyle O’Brien, David Africa, Samuel Ratnam, andyk

TL;DR

LLMs pretrained on data about misaligned AIs themselves become less aligned. Luckily, pretraining LLMs with synthetic data about good AIs help…

7 months ago

"Dancing in a World of Horseradish" by lsusr

Commercial airplane tickets are divided up into coach, business class, and first class. In 2014, Etihad introduced The Residence, a premium experien…

7 months ago

"Contradict my take on OpenPhil’s past AI beliefs" by Eliezer Yudkowsky

At many points now, I've been asked in private for a critique of EA / EA's history / EA's impact and I have ad-libbed statements that I feel guilty …

7 months ago

"Opinionated Takes on Meetups Organizing" by jenn

Screwtape, as the global ACX meetups czar, has to be reasonable and responsible in his advice giving for running meetups.

And the advice is great! I…

7 months ago

"How to game the METR plot" by shash42

TL;DR: In 2025, we were in the 1-4 hour range, which has only 14 samples in METR's underlying data. The topic of each sample is public, making it ea…

7 months ago

"Activation Oracles: Training and Evaluating LLMs as General-Purpose Activation Explainers" by Sam Marks, Adam Karvonen, James Chua, Subhash Kantamneni, Euan Ong, Julian Minder, Clément Dumas, Owain_Evans

TL;DR: We train LLMs to accept LLM neural activations as inputs and answer arbitrary questions about them in natural language. These Activation Orac…

7 months ago

"Scientific breakthroughs of the year" by technicalities

A couple of years ago, Gavin became frustrated with science journalism. No one was pulling together results across fields; the articles usually did…

7 months, 1 week ago
