Podcast Episodes

[Linkpost] “Identifying ‘Deception Vectors’ In Models” by Stephen Martin

This is a link post. Using representation engineering, we systematically induce, detect, and control such deception in CoT-enabled LLMs, extracting ”…

10 months, 1 week ago

“The Unparalleled Awesomeness of Effective Altruism Conferences” by omnizoid

Crosspost from my blog.

I just got back from Effective Altruism Global London—a conference that brought together lots of different people trying t…

10 months, 1 week ago

“The True Goal Fallacy” by adamShimi

As I ease out into a short sabbatical, I find myself turning back to dig the seeds of my repeated cycle of exhaustion and burnout in the last few ye…

10 months, 1 week ago

“AI companies’ eval reports mostly don’t support their claims” by Zach Stein-Perlman

AI companies claim that their models are safe on the basis of dangerous capability evaluations. OpenAI, Google DeepMind, and Anthropic publish report…

10 months, 1 week ago

“Against asking if AIs are conscious” by AlexMennen

People sometimes wonder whether certain AIs or animals are conscious/sentient/sapient/have qualia/etc. I don't think that such questions are coheren…

10 months, 1 week ago

“Season Recap of the Village: Agents raise $2,000” by Shoshannah Tekofsky

Four agents woke up with four computers, a view of the world wide web, and a shared chat room full of humans. Like Claude plays Pokemon, you can wat…

10 months, 1 week ago

“The Best Reference Works for Every Subject” by Parker Conley

Introduction

The Best Textbooks on Every Subject is the Schelling point for the best textbooks on every subject. My The Best Tacit Knowledge Videos …

10 months, 1 week ago

“‘Flaky breakthroughs’ pervade coaching — and no one tracks them” by Chipmonk

Has someone you know ever had a “breakthrough” from coaching, meditation, or psychedelics — only to later have it fade?

For example, man…

10 months, 1 week ago

“The Value Proposition of Romantic Relationships” by johnswentworth

What's the main value proposition of romantic relationships?

Now, look, I know that when people drop that kind of question, they’re often about to p…

10 months, 1 week ago

“It’s hard to make scheming evals look realistic” by Igor Ivanov, dan_moken

Abstract

Claude 3.7 Sonnet easily detects when it's being evaluated for scheming. Surface‑level edits to evaluation scenarios, such as lengthening t…

10 months, 2 weeks ago
