Podcast Episodes
“Bringing More Expertise to Bear on Alignment” by Edmund Lau, Geoffrey Irving, Cameron Holmes, David Africa
Preamble
The preamble is less useful for the typical AlignmentForum/LessWrong reader, who may want to skip to the Adversaria vs Basinland section.
On 28…
4 weeks, 1 day ago
[Linkpost] “How to prevent AI’s 2008 moment (We’re hiring)” by felixgaston
This is a link post.
TL;DR: CeSIA, the French Center for AI Safety, is recruiting. French not necessary. Apply by 22 May 2026; Paris or remote in Euro…
4 weeks, 1 day ago
“AI #167: The Prior Restraint Era Begins” by Zvi
The era of training frontier models and then releasing them whenever you wanted?
That was fun while it lasted. It looks likely to be over now. The …
4 weeks, 2 days ago
“Mechanistic estimation for wide random MLPs” by Jacob_Hilton
This post covers joint work with Wilson Wu, George Robinson, Mike Winer, Victor Lecomte and Paul Christiano. Thanks to Geoffrey Irving and Jess Ried…
4 weeks, 2 days ago
“Natural Language Autoencoders Produce Unsupervised Explanations of LLM Activations” by Subhash Kantamneni, kitft, Euan Ong, Sam Marks
Abstract
We introduce Natural Language Autoencoders (NLAs), an unsupervised method for generating natural language explanations of LLM activations. …
4 weeks, 2 days ago
“Try, even if they have you cold” by WalterL
I think smart people try things less often than they should, because of a cached mental pattern where you think of what might go wrong, and you find…
4 weeks, 2 days ago
“A review of “Investigating the consequences of accidentally grading CoT during RL”” by Buck
Last week, OpenAI staff shared an early draft of Investigating the consequences of accidentally grading CoT during RL with Redwood Research staff.
T…
4 weeks, 2 days ago
“There is no evidence you should reapply sunscreen every 2 hours.” by Hide
It's incredible how many consensus guidelines dissolve when you look closely at them.
If you listen to any authority on the subject of sunscreen,…
4 weeks, 2 days ago
“Many individual CEVs are probably quite bad” by Viliam
I was thinking about Habryka's article on Putin's CEV, but I am posting my response here, because the original article is already 3 weeks old.
I am …
1 month ago
“x-risk-themed” by kave
Sometimes, a friend who works around here, at an x-risk-themed organisation, will think about leaving their job. They’ll ask a group of people “what…
1 month ago