Episode Details

[HUMAN VOICE] "Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training" by evhub et al.

Published 2 years, 1 month ago
Description

This is a linkpost for https://arxiv.org/abs/2401.05566

Support ongoing human narrations of LessWrong's curated posts:
www.patreon.com/LWCurated

Source:
https://www.lesswrong.com/posts/ZAsJv7xijKTfZkMtr/sleeper-agents-training-deceptive-llms-that-persist-through

Narrated for LessWrong by Perrin Walker.

Share feedback on this narration.

[Curated Post]
[125+ Karma Post]
