Podcast Episodes
[HUMAN VOICE] "There is way too much serendipity" by Malmesbury
Support ongoing human narrations of LessWrong's curated posts:
www.patreon.com/LWCurated
Crossposted from substack.
As we all know, sugar is sweet and s…
2 years, 4 months ago
[HUMAN VOICE] "How useful is mechanistic interpretability?" by ryan_greenblatt, Neel Nanda, Buck, habryka
Support ongoing human narrations of LessWrong's curated posts:
www.patreon.com/LWCurated
Source:
https://www.lesswrong.com/posts/tEPHGZAb63dfq2v8n/how-u…
2 years, 4 months ago
[HUMAN VOICE] "Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training" by evhub et al
This is a linkpost for https://arxiv.org/abs/2401.05566
Support ongoing human narrations of LessWrong's curated posts:
www.patreon.com/LWCurated
Source:…
2 years, 4 months ago
The impossible problem of due process
I wrote this entire post in February of 2023, during the fallout from the TIME article. I didn't post it at the time for multiple reasons:
because I …
2 years, 4 months ago
[HUMAN VOICE] "Gentleness and the artificial Other" by Joe Carlsmith
"(Cross-posted from my website. Audio version here, or search "Joe Carlsmith Audio" on your podcast app.)"
This is the first essay in a series that I’…
2 years, 4 months ago
Introducing Alignment Stress-Testing at Anthropic
Crossposted from the AI Alignment Forum. May contain more technical jargon than usual. Following on from our recent paper, “Sleeper Agents: Training D…
2 years, 4 months ago
Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training
Crossposted from the AI Alignment Forum. May contain more technical jargon than usual. This is a linkpost for https://arxiv.org/abs/2401.05566 I'm not …
2 years, 4 months ago
[HUMAN VOICE] "Meaning & Agency" by Abram Demski
Support ongoing human narrations of LessWrong's curated posts:
www.patreon.com/LWCurated
The goal of this post is to clarify a few concepts relating to…
2 years, 4 months ago
What’s up with LLMs representing XORs of arbitrary features?
Crossposted from the AI Alignment Forum. May contain more technical jargon than usual. Thanks to Clément Dumas, Nikola Jurković, Nora Belrose, Arthur …
2 years, 4 months ago
Gentleness and the artificial Other
(Cross-posted from my website. Audio version here, or search "Joe Carlsmith Audio" on your podcast app.)
This is the first essay in a series that I’m c…
2 years, 4 months ago