Podcast Episodes

[HUMAN VOICE] "There is way too much serendipity" by Malmesbury

Support ongoing human narrations of LessWrong's curated posts:
www.patreon.com/LWCurated

Crossposted from substack.

As we all know, sugar is sweet and s…

2 years, 1 month ago

[HUMAN VOICE] "How useful is mechanistic interpretability?" by ryan_greenblatt, Neel Nanda, Buck, habryka

Source:
https://www.lesswrong.com/posts/tEPHGZAb63dfq2v8n/how-u…

2 years, 1 month ago

[HUMAN VOICE] "Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training" by evhub et al

This is a linkpost for https://arxiv.org/abs/2401.05566

Source:…

2 years, 1 month ago

The impossible problem of due process

I wrote this entire post in February of 2023, during the fallout from the TIME article. I didn't post it at the time for multiple reasons:

because I …

2 years, 1 month ago

[HUMAN VOICE] "Gentleness and the artificial Other" by Joe Carlsmith

"(Cross-posted from my website. Audio version here, or search "Joe Carlsmith Audio" on your podcast app.)"

This is the first essay in a series that I’…

2 years, 1 month ago

Introducing Alignment Stress-Testing at Anthropic

Crossposted from the AI Alignment Forum. May contain more technical jargon than usual. Following on from our recent paper, “Sleeper Agents: Training D…

2 years, 1 month ago

Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training

Crossposted from the AI Alignment Forum. May contain more technical jargon than usual. This is a linkpost for https://arxiv.org/abs/2401.05566 I'm not …

2 years, 1 month ago

[HUMAN VOICE] "Meaning & Agency" by Abram Demski

The goal of this post is to clarify a few concepts relating to…

2 years, 1 month ago

What’s up with LLMs representing XORs of arbitrary features?

Crossposted from the AI Alignment Forum. May contain more technical jargon than usual. Thanks to Clément Dumas, Nikola Jurković, Nora Belrose, Arthur …

2 years, 1 month ago

Gentleness and the artificial Other

(Cross-posted from my website. Audio version here, or search "Joe Carlsmith Audio" on your podcast app.)

This is the first essay in a series that I’m c…

2 years, 1 month ago
