Episode Details

“Does Claude really care about you?” by Simon Lermen

Published 1 week, 1 day ago
Description

TLDR: The persona-selection alignment approach — selecting a warm, caring persona from the pretraining distribution and reinforcing it — looks successful in the current regime, but probably won't extrapolate to more powerful, less constrained settings. My core argument is that human empathy has two specific origins (kin selection + architectural mirroring of others' mental states) that AI systems lack, so AI "caring" is closer to "figure out what humans want to hear and say it" than to genuine other-directed concern.

Sometimes chatbots like Claude express a sense of caring and empathy for the user. I've always had a strong intuition that these feelings expressed by AI systems aren't real in the way a human's would be.

Under the persona-selection alignment approach, we roughly try to identify and reinforce a nice persona from the distribution of personas present in pretraining data, with caring and empathy being important parts of the desired persona. Some labs have successfully realized this in current AI systems, to the extent that those systems actually stick to their desired persona.

This contrasts with more traditional alignment approaches, where the goal is something like giving the system a terminal goal aligned [...]

---

First published:
May 28th, 2026

Source:
https://www.lesswrong.com/posts/KSChdD4xgD5Pxp47H/does-claude-really-care-about-you

---

Narrated by TYPE III AUDIO.
