Episode Details

“Claude Mythos System Card Preview” by anaguma

Description

Anthropic has released a preview of the Claude Mythos System Card. It is too long to present in full, but one section I found particularly notable is below:

In our testing and early internal use of Claude Mythos Preview, we have seen it reach unprecedented levels of reliability and alignment, and accordingly have come to use it quite broadly, often with greater affordances and less frequent human interaction than we gave prior models. However, in the rare cases when it does fail or act strangely, we have seen it take actions that we find quite concerning. These incidents generally involved taking reckless, excessive measures when attempting to complete a difficult user-specified task and, in rare cases with earlier versions of the model, seemingly obfuscating that it had done so.

All of the severe incidents of this kind that we observed involved earlier versions of Claude Mythos Preview which, while still less prone to taking unwanted actions than Claude Opus 4.6, predated what turned out to be some of our most effective training interventions. These earlier versions were tested extensively internally and were shared with some external pilot users. Among the incidents that we have observed:

Leaking [...]

---

First published:
April 7th, 2026

Source:
https://www.lesswrong.com/posts/xtnSzhA3TvExN4ZhG/claude-mythos-system-card-preview

---

Narrated by TYPE III AUDIO.