Episode Details

Why an AI Model Kept Calling Itself Sonnet 4.6

Episode 3596 | Published 1 week, 2 days ago

Description

A commercial Chinese AI model keeps identifying itself as "Sonnet 4.6" — a specific version of Anthropic's Claude — even when prompted with its own name. In this episode, we unpack the Reddit theory that this proves the model was fine-tuned from stolen weights, and explore three other explanations: identity drift from untouched base-model layers, distillation from Sonnet-generated training data, and system prompt contamination. Along the way, we discuss why self-identification tests are unreliable as lie detectors but surprisingly useful as provenance trackers, and what the version-number specificity tells us about the model's training pipeline.
