Podcast Episodes
“SAE feature geometry is outside the superposition hypothesis” by jake_mendel
Summary: Superposition-based interpretations of neural network activation spaces are incomplete. The specific locations of feature vectors contain cr…
1 year, 8 months ago
“Connecting the Dots: LLMs can Infer & Verbalize Latent Structure from Training Data” by Johannes Treutlein, Owain_Evans
Crossposted from the AI Alignment Forum. May contain more technical jargon than usual. This is a link post. TL;DR: We published a new paper on out-of-c…
1 year, 8 months ago
“Boycott OpenAI” by PeterMcCluskey
This is a link post. I have canceled my OpenAI subscription in protest over OpenAI's lack of ethics.
In particular, I object to:
threats to confiscate d…
1 year, 8 months ago
“Sycophancy to subterfuge: Investigating reward tampering in large language models” by evhub, Carson Denison
Crossposted from the AI Alignment Forum. May contain more technical jargon than usual. This is a link post. New Anthropic model organisms research pape…
1 year, 8 months ago
“I would have shit in that alley, too” by Declan Molony
After living in a suburb for most of my life, when I moved to a major U.S. city the first thing I noticed was the feces. At first I assumed it was do…
1 year, 8 months ago
“Getting 50% (SoTA) on ARC-AGI with GPT-4o” by ryan_greenblatt
I recently got to 50%[1] accuracy on the public test set for ARC-AGI by having GPT-4o generate …
1 year, 8 months ago
“Why I don’t believe in the placebo effect” by transhumanist_atom_understander
Have you heard this before? In clinical trials, medicines have to be compared to a placebo to separate the effect of the medicine from the psychologi…
1 year, 8 months ago
“Safety isn’t safety without a social model (or: dispelling the myth of per se technical safety)” by Andrew_Critch
Crossposted from the AI Alignment Forum. May contain more technical jargon than usual. As an AI researcher who wants to do technical work that helps h…
1 year, 8 months ago
“My AI Model Delta Compared To Christiano” by johnswentworth
Preamble: Delta vs Crux
This section is redundant if you already read My AI Model Delta Compared To Yudkowsky.
I don’t natively think in terms of crux…
1 year, 8 months ago
“My AI Model Delta Compared To Yudkowsky” by johnswentworth
Preamble: Delta vs Crux
I don’t natively think in terms of cruxes. But there's a similar concept which is more natural for me, which I’ll call a delt…
1 year, 8 months ago