Podcast Episodes
[Linkpost] “Identifying ‘Deception Vectors’ In Models” by Stephen Martin
This is a link post. Using representation engineering, we systematically induce, detect, and control such deception in CoT-enabled LLMs, extracting “…
10 months, 1 week ago
“The Unparalleled Awesomeness of Effective Altruism Conferences” by omnizoid
Crosspost from my blog.
I just got back from Effective Altruism Global London—a conference that brought together lots of different people trying t…
10 months, 1 week ago
“The True Goal Fallacy” by adamShimi
As I ease out into a short sabbatical, I find myself turning back to dig the seeds of my repeated cycle of exhaustion and burnout in the last few ye…
10 months, 1 week ago
“AI companies’ eval reports mostly don’t support their claims” by Zach Stein-Perlman
AI companies claim that their models are safe on the basis of dangerous capability evaluations. OpenAI, Google DeepMind, and Anthropic publish report…
10 months, 1 week ago
“Against asking if AIs are conscious” by AlexMennen
People sometimes wonder whether certain AIs or animals are conscious/sentient/sapient/have qualia/etc. I don't think that such questions are coheren…
10 months, 1 week ago
“Season Recap of the Village: Agents raise $2,000” by Shoshannah Tekofsky
Four agents woke up with four computers, a view of the world wide web, and a shared chat room full of humans. Like Claude plays Pokemon, you can wat…
10 months, 1 week ago
“The Best Reference Works for Every Subject” by Parker Conley
Introduction
The Best Textbooks on Every Subject is the Schelling point for the best textbooks on every subject. My The Best Tacit Knowledge Videos …
10 months, 1 week ago
“‘Flaky breakthroughs’ pervade coaching — and no one tracks them” by Chipmonk
Has someone you know ever had a “breakthrough” from coaching, meditation, or psychedelics — only to later have it fade?
For example, man…
10 months, 1 week ago
“The Value Proposition of Romantic Relationships” by johnswentworth
What's the main value proposition of romantic relationships?
Now, look, I know that when people drop that kind of question, they’re often about to p…
10 months, 1 week ago
“It’s hard to make scheming evals look realistic” by Igor Ivanov, dan_moken
Abstract
Claude 3.7 Sonnet easily detects when it's being evaluated for scheming. Surface‑level edits to evaluation scenarios, such as lengthening t…
10 months, 2 weeks ago