Podcast Episodes

“Instrumental Goals Are A Different And Friendlier Kind Of Thing Than Terminal Goals” by johnswentworth, David Lorell

The Cake

Imagine that I want to bake a chocolate cake, and my sole goal in my entire lightcone and extended mathematical universe is to bake that cak…

1 year, 2 months ago

“A Three-Layer Model of LLM Psychology” by Jan_Kulveit

This post offers an accessible model of the psychology of character-trained LLMs like Claude.

Epistemic Status

This is primarily a phenomenological model…

1 year, 2 months ago

“Training on Documents About Reward Hacking Induces Reward Hacking” by evhub

This is a link post. This is a blog post reporting some preliminary work from the Anthropic Alignment Science team, which might be of interest to rese…

1 year, 2 months ago

“AI companies are unlikely to make high-assurance safety cases if timelines are short” by ryan_greenblatt

One hope for keeping existential risks low is to get AI companies to (successfully) make high-assurance safety cases: structured and auditable argume…

1 year, 2 months ago

“Mechanisms too simple for humans to design” by Malmesbury

Cross-posted from Telescopic Turnip

As we all know, humans are terrible at building butterflies. We can make a lot of objectively cool things like nuc…

1 year, 2 months ago

“The Gentle Romance” by Richard_Ngo

This is a link post. A story I wrote about living through the transition to utopia.

This is the one story that I've put the most time and effort into; …

1 year, 2 months ago

“Quotes from the Stargate press conference” by Nikola Jurkovic

This is a link post. Present alongside President Trump:

Sam Altman, Larry Ellison (Oracle executive chairman and CTO), Masayoshi Son (Softbank CEO who be…

1 year, 2 months ago

“The Case Against AI Control Research” by johnswentworth

The AI Control Agenda, in its own words:

… we argue that AI labs should ensure that powerful AIs are controlled. That is, labs should make sure that t…

1 year, 2 months ago

“Don’t ignore bad vibes you get from people” by Kaj_Sotala

I think a lot of people have heard so much about internalized prejudice and bias that they think they should ignore any bad vibes they get about a pe…

1 year, 2 months ago

“[Fiction] [Comic] Effective Altruism and Rationality meet at a Secular Solstice afterparty” by tandem

(Both characters are fictional, loosely inspired by various traits from various real people. Be careful about combining kratom and alcohol.)

The orig…

1 year, 2 months ago

