“Excerpts and Notes on Mythos Model Card” by williawa

Published 2 weeks ago
Description


List of excerpts from the Mythos model card. I tried to include interesting things, but also included some boring, to-be-expected things. I omitted some things that were too long.


Also wanna note,

  1. that this list of excerpts highlights "concerning" things above the rate at which they occur in the document.
  2. I frequently say "Anthropic seems to think ..." or "their theory appears to be that ...", and this doesn't mean I think the opinion is unsubstantiated or that they are wrong; it's just a natural way for me to phrase things.

Capability Stuff

Anthropic Staff Opinion About Whether Mythos Is a Drop-in Replacement for Entry-Level Research Eng/Scientist

We did an n=18 survey on Claude Mythos Preview's strengths and limitations. 1/18 participants thought we already had a drop-in replacement for an entry-level Research Scientist or Engineer, and 4 thought Claude Mythos Preview had a 50% chance of qualifying as such with 3 months of scaffolding iteration. We suspect those numbers would go down with a clarifying dialogue, as they did in the last model release, but we didn’t engage in such a dialogue this time.

Model hallucinates much less and also gets dramatically better [...]



---

Outline:

(00:43) Capability Stuff

(03:52) Cyber Capabilities

(05:08) Alignment

(21:20) White Box Evaluation of Model Internals

(23:52) Model Welfare

(24:36) Alignment Risk Update companion report

---

First published:
April 8th, 2026

Source:
https://www.lesswrong.com/posts/ZfbChZBXgje8T6Geu/excerpts-and-notes-on-mythos-model-card

---

Narrated by TYPE III AUDIO.

---