Podcast Episodes
"Good if make prior after data instead of before" by dynomight
They say you’re supposed to choose your prior in advance. That's why it's called a “prior”. First, you’re supposed to say how plausible differen…
2 months ago
"Measuring no CoT math time horizon (single forward pass)" by ryan_greenblatt
A key risk factor for scheming (and misalignment more generally) is opaque reasoning ability. One proxy for this is how good AIs are at solving math …
2 months ago
"Recent LLMs can use filler tokens or problem repeats to improve (no-CoT) math performance" by ryan_greenblatt
Prior results have shown that LLMs released before 2024 can't leverage 'filler tokens'—unrelated tokens prior to the model's final answer—to perform…
2 months ago
"Turning 20 in the probable pre-apocalypse" by Parv Mahajan
Master version of this on https://parvmahajan.com/2025/12/21/turning-20.html
I turn 20 in January, and the world looks very strange. Probably, thin…
2 months ago
"Alignment Pretraining: AI Discourse Causes Self-Fulfilling (Mis)alignment" by Cam, Puria Radmard, Kyle O’Brien, David Africa, Samuel Ratnam, andyk
TL;DR
LLMs pretrained on data about misaligned AIs themselves become less aligned. Luckily, pretraining LLMs with synthetic data about good AIs help…
2 months ago
"Dancing in a World of Horseradish" by lsusr
Commercial airplane tickets are divided up into coach, business class, and first class. In 2014, Etihad introduced The Residence, a premium experien…
2 months, 1 week ago
"Contradict my take on OpenPhil’s past AI beliefs" by Eliezer Yudkowsky
At many points now, I've been asked in private for a critique of EA / EA's history / EA's impact and I have ad-libbed statements that I feel guilty …
2 months, 1 week ago
"Opinionated Takes on Meetups Organizing" by jenn
Screwtape, as the global ACX meetups czar, has to be reasonable and responsible in his advice giving for running meetups.
And the advice is great! I…
2 months, 1 week ago
"How to game the METR plot" by shash42
TL;DR: In 2025, we were in the 1-4 hour range, which has only 14 samples in METR's underlying data. The topic of each sample is public, making it ea…
2 months, 1 week ago
"Activation Oracles: Training and Evaluating LLMs as General-Purpose Activation Explainers" by Sam Marks, Adam Karvonen, James Chua, Subhash Kantamneni, Euan Ong, Julian Minder, Clément Dumas, Owain_Evans
TL;DR: We train LLMs to accept LLM neural activations as inputs and answer arbitrary questions about them in natural language. These Activation Orac…
2 months, 1 week ago