Episode Details

[Linkpost] “Metagaming matters for training, evaluation, and oversight” by jenny, Bronson Schoen

Published 1 month ago
Description
This is a link post.

Following up on our previous work on verbalized eval awareness, we are sharing a post investigating the emergence of metagaming reasoning in a frontier training run. Our main findings:

  1. Metagaming is a more general and, in our experience, more useful concept than evaluation awareness.
  2. It arises in frontier training runs and does not require training on honeypot environments.
  3. Verbalization of metagaming can go down over the course of training.

We also share some quantitative analyses, qualitative examples, and upcoming work.

---

First published:
March 18th, 2026

Source:
https://www.lesswrong.com/posts/4hXWSw8tzoK9PM7v6/metagaming-matters-for-training-evaluation-and-oversight

Linkpost URL:
https://alignment.openai.com/metagaming

---

Narrated by TYPE III AUDIO.

---

Images from the article:

Graph showing verbalized eval awareness rates for early versus late exp-rl-cap models.