Podcast Episodes

“Self-fulfilling misalignment data might be poisoning our AI models” by TurnTrout

This is a link post. Your AI's training data might make it more “evil” and more able to circumvent your security, monitoring, and control measures. Ev…

1 year, 1 month ago

“Judgements: Merging Prediction & Evidence” by abramdemski

I recently wrote about complete feedback, an idea which I think is quite important for AI safety. However, my note was quite brief, explaining the id…

1 year, 1 month ago

“The Sorry State of AI X-Risk Advocacy, and Thoughts on Doing Better” by Thane Ruthenis

First, let me quote my previous ancient post on the topic:

Effective Strategies for Changing Public Opinion

The titular paper is very relevant here. I'…

1 year, 1 month ago

“Power Lies Trembling: a three-book review” by Richard_Ngo

In a previous book review I described exclusive nightclubs as the particle colliders of sociology—places where you can reliably observe extreme force…

1 year, 1 month ago

“Emergent Misalignment: Narrow finetuning can produce broadly misaligned LLMs” by Jan Betley, Owain_Evans

This is the abstract and introduction of our new paper. We show that finetuning state-of-the-art LLMs on a narrow task, such as writing vulnerable co…

1 year, 1 month ago

“The Paris AI Anti-Safety Summit” by Zvi

It doesn’t look good.

What used to be the AI Safety Summits were perhaps the most promising thing happening towards international coordination for AI …

1 year, 1 month ago

“Eliezer’s Lost Alignment Articles / The Arbital Sequence” by Ruby

Note: this is a static copy of this wiki page. We are also publishing it as a post to ensure visibility.

Circa 2015-2017, a lot of high quality conten…

1 year, 1 month ago

“Arbital has been imported to LessWrong” by RobertM, jimrandomh, Ben Pace, Ruby

Arbital was envisioned as a successor to Wikipedia. The project was discontinued in 2017, but not before many new features had been built and a subst…

1 year, 1 month ago

“How to Make Superbabies” by GeneSmith, kman

We’ve spent the better part of the last two decades unravelling exactly how the human genome works and which specific letter changes in our DNA affec…

1 year, 1 month ago

“A computational no-coincidence principle” by Eric Neyman

Audio note: this article contains 134 uses of latex notation, so the narration may be difficult to follow. There's a link to the original text in t…

1 year, 1 month ago
