Podcast Episodes
[HUMAN VOICE] "There is way too much serendipity" by Malmesbury
Support ongoing human narrations of LessWrong's curated posts:
www.patreon.com/LWCurated
Crossposted from substack.
As we all know, sugar is sweet and s…
2 years, 1 month ago
[HUMAN VOICE] "How useful is mechanistic interpretability?" by ryan_greenblatt, Neel Nanda, Buck, habryka
Source:
https://www.lesswrong.com/posts/tEPHGZAb63dfq2v8n/how-u…
2 years, 1 month ago
[HUMAN VOICE] "Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training" by evhub et al
This is a linkpost for https://arxiv.org/abs/2401.05566
Source:…
2 years, 1 month ago
The impossible problem of due process
I wrote this entire post in February of 2023, during the fallout from the TIME article. I didn't post it at the time for multiple reasons:
because I …
2 years, 1 month ago
[HUMAN VOICE] "Gentleness and the artificial Other" by Joe Carlsmith
"(Cross-posted from my website. Audio version here, or search "Joe Carlsmith Audio" on your podcast app.)"
This is the first essay in a series that I’…
2 years, 1 month ago
Introducing Alignment Stress-Testing at Anthropic
Crossposted from the AI Alignment Forum. May contain more technical jargon than usual.
Following on from our recent paper, “Sleeper Agents: Training D…
2 years, 1 month ago
Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training
Crossposted from the AI Alignment Forum. May contain more technical jargon than usual.
This is a linkpost for https://arxiv.org/abs/2401.05566
I'm not …
2 years, 1 month ago
[HUMAN VOICE] "Meaning & Agency" by Abram Demski
The goal of this post is to clarify a few concepts relating to…
2 years, 1 month ago
What’s up with LLMs representing XORs of arbitrary features?
Crossposted from the AI Alignment Forum. May contain more technical jargon than usual.
Thanks to Clément Dumas, Nikola Jurković, Nora Belrose, Arthur …
2 years, 1 month ago
Gentleness and the artificial Other
(Cross-posted from my website. Audio version here, or search "Joe Carlsmith Audio" on your podcast app.)
This is the first essay in a series that I’m c…
2 years, 1 month ago