Podcast Episodes

[HUMAN VOICE] "There is way too much serendipity" by Malmesbury

Support ongoing human narrations of LessWrong's curated posts:
www.patreon.com/LWCurated

Crossposted from Substack.

As we all know, sugar is sweet and s…

2 years, 4 months ago

[HUMAN VOICE] "How useful is mechanistic interpretability?" by ryan_greenblatt, Neel Nanda, Buck, habryka

Source:
https://www.lesswrong.com/posts/tEPHGZAb63dfq2v8n/how-u…

2 years, 4 months ago

[HUMAN VOICE] "Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training" by evhub et al

This is a linkpost for https://arxiv.org/abs/2401.05566

Source:…

2 years, 4 months ago

The impossible problem of due process

I wrote this entire post in February of 2023, during the fallout from the TIME article. I didn't post it at the time for multiple reasons:

because I …

2 years, 4 months ago

[HUMAN VOICE] "Gentleness and the artificial Other" by Joe Carlsmith

"(Cross-posted from my website. Audio version here, or search "Joe Carlsmith Audio" on your podcast app.)"

This is the first essay in a series that I’…

2 years, 4 months ago

Introducing Alignment Stress-Testing at Anthropic

Crossposted from the AI Alignment Forum. May contain more technical jargon than usual. Following on from our recent paper, “Sleeper Agents: Training D…

2 years, 4 months ago

Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training

Crossposted from the AI Alignment Forum. May contain more technical jargon than usual. This is a linkpost for https://arxiv.org/abs/2401.05566 I'm not …

2 years, 4 months ago

[HUMAN VOICE] "Meaning & Agency" by Abram Demski

The goal of this post is to clarify a few concepts relating to…

2 years, 4 months ago

What’s up with LLMs representing XORs of arbitrary features?

Crossposted from the AI Alignment Forum. May contain more technical jargon than usual. Thanks to Clément Dumas, Nikola Jurković, Nora Belrose, Arthur …

2 years, 4 months ago

Gentleness and the artificial Other

(Cross-posted from my website. Audio version here, or search "Joe Carlsmith Audio" on your podcast app.)

This is the first essay in a series that I’m c…

2 years, 4 months ago

