Podcast Episodes
“Outer Alignment is the Necessary Compliment to AI 2027’s Best Case Scenario” by Josh Hickman
To the extent we believe more advanced training and control techniques will lead to alignment of agents capable enough to strategically make success…
8 months, 3 weeks ago
“Dwarkesh Patel on Continual Learning” by Zvi
A key question going forward is the extent to which making further AI progress will depend upon some form of continual learning. Dwarkesh Patel offe…
8 months, 3 weeks ago
“Personal Agents: AIs as trusted advisors, caretakers, and user proxies” by JWJohnston
Just posted the following on Medium. Interested in comments from readers here, especially pointers to similar efforts and ideas I didn't mention bel…
8 months, 3 weeks ago
[Linkpost] “METR: Recent frontier models are reward hacking” by Daniel Kokotajlo
This is a link post. METR just made a lovely post detailing many examples they've found of reward hacks by frontier models. Unlike the reward hacks o…
8 months, 3 weeks ago
[Linkpost] “Identifying ‘Deception Vectors’ In Models” by Stephen Martin
This is a link post. Using representation engineering, we systematically induce, detect, and control such deception in CoT-enabled LLMs, extracting ”…
8 months, 3 weeks ago
“The Unparalleled Awesomeness of Effective Altruism Conferences” by omnizoid
Crosspost from my blog.
I just got back from Effective Altruism Global London—a conference that brought together lots of different people trying t…
8 months, 3 weeks ago
“The True Goal Fallacy” by adamShimi
As I ease out into a short sabbatical, I find myself turning back to dig the seeds of my repeated cycle of exhaustion and burnout in the last few ye…
8 months, 3 weeks ago
“AI companies’ eval reports mostly don’t support their claims” by Zach Stein-Perlman
AI companies claim that their models are safe on the basis of dangerous capability evaluations. OpenAI, Google DeepMind, and Anthropic publish report…
8 months, 3 weeks ago
“Against asking if AIs are conscious” by AlexMennen
People sometimes wonder whether certain AIs or animals are conscious/sentient/sapient/have qualia/etc. I don't think that such questions are coheren…
8 months, 3 weeks ago
“Season Recap of the Village: Agents raise $2,000” by Shoshannah Tekofsky
Four agents woke up with four computers, a view of the world wide web, and a shared chat room full of humans. Like Claude plays Pokemon, you can wat…
8 months, 3 weeks ago