Podcast Episodes

Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training

Crossposted from the AI Alignment Forum. May contain more technical jargon than usual. This is a linkpost for https://arxiv.org/abs/2401.05566. I'm not …

2 years, 3 months ago

[HUMAN VOICE] "Meaning & Agency" by Abram Demski

Support ongoing human narrations of LessWrong's curated posts:
www.patreon.com/LWCurated

The goal of this post is to clarify a few concepts relating to…

2 years, 3 months ago

What’s up with LLMs representing XORs of arbitrary features?

Crossposted from the AI Alignment Forum. May contain more technical jargon than usual. Thanks to Clément Dumas, Nikola Jurković, Nora Belrose, Arthur …

2 years, 3 months ago

Gentleness and the artificial Other

(Cross-posted from my website. Audio version here, or search "Joe Carlsmith Audio" on your podcast app.)

This is the first essay in a series that I’m c…

2 years, 3 months ago

MIRI 2024 Mission and Strategy Update

As we announced back in October, I have taken on the senior leadership role at MIRI as its CEO. It's a big pair of shoes to fill, and an awesome resp…

2 years, 3 months ago

The Plan - 2023 Version

Background: The Plan, The Plan: 2022 Update. If you haven’t read those, don’t worry, we’re going to go through things from the top this year, and wit…

2 years, 3 months ago

Apologizing is a Core Rationalist Skill

In certain circumstances, apologizing can also be a countersignalling power-move, i.e. “I am so high status that I can grovel a bit without anybody m…

2 years, 3 months ago

[HUMAN VOICE] "A case for AI alignment being difficult" by jessicata

This is a linkpost for https://unstableontology.com/2023/12/31/a-case-for-ai-alignment-being-difficult/

Support ongoing human narrations of LessWrong'…

2 years, 3 months ago

The Dark Arts

lsusr: It is my understanding that you won all of your public forum debates this year. That's very impressive. I thought it would be interesting to dis…

2 years, 3 months ago

Critical review of Christiano’s disagreements with Yudkowsky

Crossposted from the AI Alignment Forum. May contain more technical jargon than usual. This is a review of Paul Christiano's article "where I agree an…

2 years, 3 months ago

