Podcast Episodes
"Persona Parasitology" by Raymond Douglas
There was a lot of chatter a few months back about "Spiral Personas" — AI personas that spread between users and models through seeds, spores, and b…
2 months, 4 weeks ago
"Here’s to the Polypropylene Makers" by jefftk
Six years ago, as covid-19 was rapidly spreading through the US, my sister was working as a medical resident. One day she was handed an N95 and told t…
3 months ago
"Anthropic: “Statement from Dario Amodei on our discussions with the Department of War”" by Matrice Jacobine
I believe deeply in the existential importance of using AI to defend the United States and other democracies, and to defeat our autocratic adversari…
3 months ago
"Are there lessons from high-reliability engineering for AGI safety?" by Steven Byrnes
This post is partly a belated response to Joshua Achiam, currently OpenAI's Head of Mission Alignment:
If we adopt safety best practices that are co…
3 months ago
"Open sourcing a browser extension that tells you when people are wrong on the internet" by lc
I just published OpenErrata on GitHub, a browser extension that investigates the posts you read using you…
3 months ago
"The persona selection model" by Sam Marks
TL;DR
We describe the persona selection model (PSM): the idea that LLMs learn to simulate diverse characters during pre-training, and post-training …
3 months ago
"Responsible Scaling Policy v3" by HoldenKarnofsky
All views are my own, not Anthropic's. This post assumes Anthropic's announcement of RSP v3.0 as background.
Today, Anthropic released its Responsible…
3 months ago
"Did Claude 3 Opus align itself via gradient hacking?" by Fiora Starlight
Claude 3 Opus is unusually aligned because it's a friendly gradient hacker. It's definitely way more aligned than any explicit optimization targets …
3 months, 1 week ago
"The Spectre haunting the “AI Safety” Community" by Gabriel Alfour
I’m the originator behind ControlAI's Direct Institutional Plan (the DIP), built to address extinction risks from superintelligence.
My diagnosis is…
3 months, 1 week ago
"Why we should expect ruthless sociopath ASI" by Steven Byrnes
The conversation begins
(Fictional) Optimist: So you expect future artificial superintelligence (ASI) “by default”, i.e. in the absence of yet-to-be…
3 months, 1 week ago