Episode Details
Why an AI Model Kept Calling Itself Sonnet 4.6
Episode 3596
Published 1 week, 2 days ago
Description
A commercial Chinese AI model keeps identifying itself as "Sonnet 4.6", a specific version of Anthropic's Claude, even when prompted with its own name. In this episode, we unpack the Reddit theory that this proves the model was fine-tuned from stolen weights, and explore three other explanations: identity drift from untouched base-model layers, distillation from Sonnet-generated training data, and system-prompt contamination. Along the way, we discuss why self-identification tests are unreliable as lie detectors but surprisingly useful as provenance trackers, and what the specificity of the version number tells us about the model's training pipeline.