“My research agenda and work” by Seth Herd
Description
This is a summary of the work I've done and work I plan to do, and the theories of change and AI progress that motivate my work. I've been working full-time on alignment for three years and change, and thinking about brainlike AGI and its alignment increasingly often since 2004.
Here's the research agenda in one breath: I'm trying to predict what the first transformative AI will be, in enough mechanistic detail that we can predict likely failure modes of its alignment. That's in service of finding interventions that address those failure modes efficiently, so that they can realistically be implemented even if timelines are short and work is rushed. I'm using my background in computational cognitive neuroscience to predict what might be called loosely brainlike AGI: LLMs with added human-like cognitive capacities.
I'll give a summary in the rest of this section, then give a little more depth on each major thread of my work in the remaining sections. All of it is pretty brief.
Approach and premises
Most alignment work falls roughly into one of two broad categories: empirical study of current systems ("prosaic alignment"), or theory about idealized agents ("agent foundations") (with much variation [...]
---
Outline:
(01:07) Approach and premises
(05:36) Philosophy of the approach
(07:36) 2. Technical work
(08:06) 2.1. Predicted paths to TCAI
(08:55) Memory (continuous learning)
(09:30) Executive function and metacognition
(10:40) 2.2. Predicted paths to (mis)alignment
(13:59) 3. My research background in computational cognitive neuroscience
(16:04) 4. Societal influences on AI safety
(17:10) 4.1. Government and public opinion on AI progress
(20:21) 4.2. AI progress and epistemics
(22:50) 5. Alignment targets
(23:14) 5.1. Corrigibility, DWIMAC, or instruction-following vs. value alignment targets
(26:19) 5.2. Stability as an alignment target
(28:03) 6. Future work
(32:52) 7. Collaboration
The original text contained 7 footnotes which were omitted from this narration.
---
First published:
June 5th, 2026
Source:
https://www.lesswrong.com/posts/MuLvZxMcy5WaKJu3H/my-research-agenda-and-work
---
Narrated by TYPE III AUDIO.