Episode Details

“Claude Mythos System Card Preview” by anaguma

Published 2 weeks, 1 day ago
Description

Anthropic has released the Claude Mythos Preview System Card. It is too long to present in full, but a section I found particularly notable is below:

In our testing and early internal use of Claude Mythos Preview, we have seen it reach unprecedented levels of reliability and alignment, and accordingly have come to use it quite broadly, often with greater affordances and less frequent human-interaction than we gave prior models. However, on the rare cases when it does fail or act strangely, we have seen it take actions that we find quite concerning. These incidents generally involved taking reckless excessive measures when attempting to complete a difficult user-specified task and, in rare cases with earlier versions of the model, seemingly obfuscating that it had done so.

All of the severe incidents of this kind that we observed involved earlier versions of Claude Mythos Preview which, while still less prone to taking unwanted actions than Claude Opus 4.6, predated what turned out to be some of our most effective training interventions. These earlier versions were tested extensively internally and were shared with some external pilot users. Among the incidents that we have observed:

  • Leaking [...]

---

First published:
April 7th, 2026

Source:
https://www.lesswrong.com/posts/xtnSzhA3TvExN4ZhG/claude-mythos-system-card-preview

---

Narrated by TYPE III AUDIO.
