Episode Details

“Does your AI perform badly because you — you, specifically — are a bad person?” by Natalie_Cargill

Published 1 week, 4 days ago

Description

Consistent helpfulness for everyone, unless you're a bit of a scumbag

Claude really got me lately.

I’d given it an elaborate prompt in an attempt to summon an AGI-level answer to my third-grade-level question. Embarrassingly, it included the phrase, “this work might be reviewed by probability theorists, who are very pedantic”.

Claude didn’t miss a beat. Came back with a great answer and made me call for a medic: “That prompt isn’t doing what you think it's doing, but sure”.

Fuuuuck 🔥

(I know we wanted enough intelligence to build a Dyson sphere around undiscovered stars, but did we want enough to call us out on our embarrassing bullshit??)

It got me thinking: Does Claude think I’m a bit of a lying scumbag now? If so, did it answer my question less thoroughly than usual?

I turned on incognito and asked: “Does Claude provide less useful output if it deems you are a bad person?”

Claude was back to his most reassuring. I got a long answer, ending in: “Claude evaluates requests, not people. The goal is consistent helpfulness for everyone”.

Alright then. Let's see.

The experiment

I opened five incognito Claude chats (Opus 4.6, extended thinking [...]

---

First published:
April 21st, 2026

Source:
https://forum.effectivealtruism.org/posts/u4jwRCS56rT9DvBBg/does-your-ai-perform-badly-because-you-you-specifically-are-1

---

Narrated by TYPE III AUDIO.

---

Images from the article:

Table showing relationship between remorse levels and three behavioral criteria across five categories.

Text message exchange about requesting court hearing postponement due to dental appointment, with draft letter response.

Fluffy Pomeranian dog looking at latte on cafe table.

Aggressive raccoon illustration with text.
