Podcast Episodes

“Unless its governance changes, Anthropic is untrustworthy”

Anthropic is untrustworthy.

This post provides arguments, asks questions, and documents some examples of Anthropic's leadership being misleading and…

7 months, 3 weeks ago

“Alignment remains a hard, unsolved problem”

Thanks to (in alphabetical order) Joshua Batson, Roger Grosse, Jeremy Hadfield, Jared Kaplan, Jan Leike, Jack Lindsey, Monte MacDiarmid, Francesco M…

7 months, 3 weeks ago

“Video games are philosophy’s playground” by Rachel Shu

Crypto people have this saying: "cryptocurrencies are macroeconomics' playground." The idea is that blockchains let you cheaply spin up toy economie…

7 months, 4 weeks ago

“Stop Applying And Get To Work” by plex

TL;DR: Figure out what needs doing and do it, don't wait on approval from fellowships or jobs.

If you...

Have short timelines
Have been struggling …

8 months ago

“Gemini 3 is Evaluation-Paranoid and Contaminated”

TL;DR: Gemini 3 frequently thinks it is in an evaluation when it is not, assuming that all of its reality is fabricated. It can also reliably outpu…

8 months ago

“Natural emergent misalignment from reward hacking in production RL” by evhub, Monte M, Benjamin Wright, Jonathan Uesato

Abstract

We show that when large language models learn to reward hack on production RL environments, this can result in egregious emergent misalignm…

8 months ago

“Anthropic is (probably) not meeting its RSP security commitments” by habryka

TLDR: An AI company's model weight security is at most as good as its compute providers' security. Anthropic has committed (with a bit of ambiguity,…

8 months ago

“Varieties Of Doom” by jdp

There has been a lot of talk about "p(doom)" over the last few years. This has always rubbed me the wrong way because "p(doom)" didn't feel like it ma…

8 months ago

“How Colds Spread” by RobertM

It seems like a catastrophic civilizational failure that we don't have confident common knowledge of how colds spread. There have been a number of s…

8 months ago

“New Report: An International Agreement to Prevent the Premature Creation of Artificial Superintelligence” by Aaron_Scher, David Abecassis, Brian Abeyta, peterbarnett

TLDR: We at the MIRI Technical Governance Team have released a report describing an example international agreement to halt the advancement towards …

8 months ago
