“Retrospective on my unsupervised elicitation challenge” by DanielFilan

Published 1 month, 1 week ago
Description

This post contains spoilers for the unsupervised elicitation challenge of getting Claude to get my Ancient Greek homework right.

tl;dr: Opus 4.7 one-shots it; nothing else worked.

The challenge

A few weeks ago, I announced to the world my Unsupervised Elicitation Challenge (my blog, LessWrong). I’d encourage you to read that post for the context, but the tl;dr is that there was a fill-in-the-blank exercise early on in my Ancient Greek textbook that Claude Opus 4.6 didn’t fill out correctly by default, but could do correctly if I prodded it a bit. The challenge was to get it to fill out the answers correctly without knowing any Ancient Greek yourself. After all, Opus 4.6 apparently has this knowledge somewhere internally (as you might expect, given that it’s a large language model that has presumably read the whole corpus of Ancient Greek as well as many textbooks on the topic), but I was only able to extract it because I knew what to ask about.

The general idea of the challenge is to mimic a hard version of AI alignment, in some sense: suppose that there's some task you want an AI to complete, but can’t [...]

---

Outline:

(00:23) The challenge

(02:11) The secret: accents

(04:42) Is this unfair?

(05:46) Nobody succeeded

(07:57) What this says about alignment

(08:38) The problem of Opus 4.7

(10:47) Next steps for unsupervised elicitation

The original text contained 8 footnotes which were omitted from this narration.

---

First published:
April 26th, 2026

Source:
https://www.lesswrong.com/posts/aJ3NTJ6tM7yc49kQn/retrospective-on-my-unsupervised-elicitation-challenge

---

Narrated by TYPE III AUDIO.
