Podcast Episodes
“Ten people on the inside” by Buck
(Many of these ideas developed in conversation with Ryan Greenblatt)
In a shortform, I described some different levels of resources and buy-in for mis…
1 year, 1 month ago
“Anomalous Tokens in DeepSeek-V3 and r1” by henry
“Anomalous”, “glitch”, or “unspeakable” tokens in an LLM are those that induce bizarre behavior or otherwise don’t behave like regular text.
The Solid…
1 year, 1 month ago
“Tell me about yourself: LLMs are aware of their implicit behaviors” by Martín Soto, Owain_Evans
This is the abstract and introduction of our new paper, with some discussion of implications for AI Safety at the end.
Authors: Jan Betley*, Xuchan …
1 year, 1 month ago
“Instrumental Goals Are A Different And Friendlier Kind Of Thing Than Terminal Goals” by johnswentworth, David Lorell
The Cake
Imagine that I want to bake a chocolate cake, and my sole goal in my entire lightcone and extended mathematical universe is to bake that cak…
1 year, 1 month ago
“A Three-Layer Model of LLM Psychology” by Jan_Kulveit
This post offers an accessible model of psychology of character-trained LLMs like Claude.
Epistemic Status
This is primarily a phenomenological model…
1 year, 1 month ago
“Training on Documents About Reward Hacking Induces Reward Hacking” by evhub
This is a link post. This is a blog post reporting some preliminary work from the Anthropic Alignment Science team, which might be of interest to rese…
1 year, 1 month ago
“AI companies are unlikely to make high-assurance safety cases if timelines are short” by ryan_greenblatt
One hope for keeping existential risks low is to get AI companies to (successfully) make high-assurance safety cases: structured and auditable argume…
1 year, 1 month ago
“Mechanisms too simple for humans to design” by Malmesbury
Cross-posted from Telescopic Turnip
As we all know, humans are terrible at building butterflies. We can make a lot of objectively cool things like nuc…
1 year, 1 month ago
“The Gentle Romance” by Richard_Ngo
This is a link post. A story I wrote about living through the transition to utopia.
This is the one story that I've put the most time and effort into; …
1 year, 1 month ago
“Quotes from the Stargate press conference” by Nikola Jurkovic
This is a link post. Present alongside President Trump:
Sam Altman, Larry Ellison (Oracle executive chairman and CTO), Masayoshi Son (Softbank CEO who be…
1 year, 1 month ago